Copy number variations (CNVs) are genomic alterations that result in an abnormal number of copies of one or more genes. Structural genomic events such as duplications, deletions, translocations, and inversions can cause CNVs.
Like single-nucleotide polymorphisms (SNPs), particular CNVs have been associated with susceptibility to diseases such as cancer, inherited genetic disorders, autoimmune diseases, and others.
At Bionano Genomics, we equip clinical research labs with NxClinical, which we believe may be the most comprehensive and up-to-date cytogenetics and molecular genetics solution. It’s one system for analyzing and interpreting all genomic variants from microarray and next-generation sequencing (NGS) data.
This guide briefly introduces whole-exome CNV analysis, how it works, and how labs are taking advantage of it today.
The development of NGS technology has dramatically improved our ability to detect all types of genomic variation, from single-nucleotide variants (SNVs) to CNVs and other structural variations. Using NGS data for CNV analysis has gained considerable attention in recent years thanks to new technologies and better algorithms that enable the simultaneous detection of CNVs and SNVs.
NGS is now the most common and widely accepted method for high-throughput assessment of sequence variants (SeqVar). The ability to also obtain the CNV and loss of heterozygosity (LOH) status of a sample from the same NGS data is therefore very appealing: it means a single workflow and reduced costs.
NGS-based CNV analysis techniques also enable labs to map the precise location of a variant (depending on the detection approach).
There are four main methods of detecting CNVs with NGS data: read-pair, split-read, read-depth, and assembly.
Each of these four methods specializes in detecting a specific form or size range of CNV, and each makes a different trade-off in breakpoint accuracy. None of them is perfect; each brings advantages and disadvantages. To address this, many labs combine methods, such as read-depth with read-pair or read-depth with split-read, to achieve a more holistic analysis.
As Dr. Fen Guo, Clinical Laboratory Director at PerkinElmer Genomics, notes, the utility of these methods often hinges on the quality of the NGS data available.
“There’s a general sense that some methods are better than others—for example, that the split-read method is superior for accurate breakpoint identification because of the nature of this methodology, while the read-depths can detect the dosages of CNVs and works better on a wide range of CNV sizes from small to large CNVs in the genome. But in addition to recognizing the inherent differences between these methods and what they’re capable of, so much depends on the quality of the data—the read depths, the coverage, and the data uniformity.”
—Dr. Fen Guo, Ph.D., FACMG, FCCMG, Clinical Laboratory Director, PerkinElmer Genomics
To give a little more background and tease out some of these important nuances, we briefly summarize each NGS CNV calling method below.
The read-pair methodology was the first to demonstrate the usefulness of NGS data for CNV detection.
It works by comparing the distance between the two reads of each pair, as mapped to a reference genome, with the expected insert size. Labs using this method identify CNVs from discordant pairs: mapped paired reads whose distance from each other differs significantly from the predetermined average insert size.
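The comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not a production caller: the function name `classify_pairs`, the three-standard-deviation cutoff, and the example insert sizes are all assumptions made for this sketch. It relies on the general observation that pairs mapping much farther apart than expected suggest a deletion in the sample, while pairs mapping much closer together suggest an insertion.

```python
def classify_pairs(insert_sizes, expected_mean, expected_sd, n_sd=3.0):
    """Label each read pair by comparing its mapped insert size to the expected size.

    n_sd is a hypothetical cutoff: pairs further than n_sd standard
    deviations from the expected mean are treated as discordant.
    """
    upper = expected_mean + n_sd * expected_sd
    lower = expected_mean - n_sd * expected_sd
    labels = []
    for size in insert_sizes:
        if size > upper:
            # Reads map too far apart: sequence is missing in the sample.
            labels.append("deletion_candidate")
        elif size < lower:
            # Reads map too close together: extra sequence in the sample.
            labels.append("insertion_candidate")
        else:
            labels.append("concordant")
    return labels


# Illustrative values only: a library with a 300 bp mean insert size (SD 30 bp).
labels = classify_pairs([290, 310, 1500, 120], expected_mean=300, expected_sd=30)
print(labels)
```

In practice, a caller would also require several discordant pairs to cluster at the same location before reporting an event, since a single discordant pair is often a mapping artifact.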
The split-read methodology uses paired-end sequencing reads where only one read of the pair maps reliably and the other either entirely or partially fails to map to the genome. The position at which the second read stops aligning can point to a breakpoint.
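One way to picture the "partially fails to map" case is through soft clipping. The sketch below assumes the aligner reports partial alignments as soft-clipped bases (`S`) in a SAM-style CIGAR string, as many aligners do. The helper names `soft_clip_lengths` and `is_split_candidate` and the 20-base minimum clip length are hypothetical choices for illustration.

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clip_lengths(cigar):
    """Return (left, right) soft-clipped lengths from a CIGAR string."""
    # Hard clips (H) are dropped so that a soft clip next to them is still found.
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar) if op != "H"]
    if not ops:
        return (0, 0)
    left = ops[0][0] if ops[0][1] == "S" else 0
    right = ops[-1][0] if len(ops) > 1 and ops[-1][1] == "S" else 0
    return (left, right)


def is_split_candidate(cigar, min_clip=20):
    """Flag reads with a long unaligned end as possible breakpoint evidence."""
    return max(soft_clip_lengths(cigar)) >= min_clip


print(soft_clip_lengths("60M40S"))   # 60 aligned bases, then 40 unaligned
print(is_split_candidate("60M40S"))
print(is_split_candidate("95M5S"))   # short clip, likely just noise
```

A real split-read caller would go further and realign the clipped portion elsewhere in the genome to confirm the event; this sketch only covers the first filtering step.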
The read-depth method assumes that the depth of coverage of a genomic region correlates with the copy number of that region: fewer reads than expected suggest a loss, and more reads than expected suggest a gain.
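As a rough illustration of the read-depth idea, the sketch below compares per-region depth in a sample against a reference, normalizes each by its median so that overall sequencing yield cancels out, and converts the ratio to a log2 value and a copy number estimate. The function name, the median normalization, the diploid baseline, and the depth values are assumptions for this example, not a description of any particular tool.

```python
import math
from statistics import median


def copy_number_estimates(sample_depths, reference_depths, ploidy=2):
    """Estimate (log2 ratio, copy number) per region from read depth.

    Assumes every reference depth is non-zero and that most regions
    are copy-neutral, so the median reflects the normal state.
    """
    sample_median = median(sample_depths)
    reference_median = median(reference_depths)
    results = []
    for sample, reference in zip(sample_depths, reference_depths):
        ratio = (sample / sample_median) / (reference / reference_median)
        log2_ratio = math.log2(ratio) if ratio > 0 else float("-inf")
        results.append((log2_ratio, ploidy * ratio))
    return results


# Illustrative depths for five regions; region 3 looks like a one-copy loss
# and region 5 like a one-copy gain.
sample = [100, 100, 50, 100, 150]
reference = [100, 100, 100, 100, 100]
for log2_ratio, copies in copy_number_estimates(sample, reference):
    print(f"log2 ratio {log2_ratio:+.2f}, estimated copies {copies:.1f}")
```

Real WES data needs more correction than this (for example, for GC content and capture efficiency), which is one reason data uniformity matters so much for read-depth calling.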
In theory, all forms of genetic variation—including CNVs—can be detected by the assembly of short reads if the reads are sufficiently long and accurate.
Watch our free webinar—Copy Number Variant Detection by NGS: Coverage, Uniformity & Resolution—to see Dr. Guo introduce the main methods utilized for calling CNVs using NGS data and share clinical cases that illustrate how the coverage and uniformity of NGS data contribute to the resolution of CNV calling.
Whole-exome sequencing (WES) is a form of next-generation sequencing that targets only the exons (the protein-coding regions of the genome). WES data can be used to detect CNVs, SNPs, and somatic mutations.
WES data is useful for the clinical interpretation of genetic variation discovered in exomes. It typically offers a more cost-effective and higher-throughput alternative to whole-genome sequencing (WGS), which sequences every base pair of an organism’s genome rather than selected parts.
By contrast, WES requires much less data storage and processing power while still providing sufficient coverage for many types of analyses.
“Before labs recognized that they can use NGS data to call CNVs, WES data was mainly used for calling SNVs or indels. Using NGS data for CNV detection has really stood out in recent years due to the capability of detecting CNV and SNVs simultaneously, especially with nowadays the cost of next-generation sequencing reduced dramatically. In addition to CNV detection, detecting loss of heterozygosity (LOH) is another bonus. This is very important for imprinting disorders. In addition, Copy-Neutral absence of heterozygosity is also part of the disease mechanism for many somatic diseases which labs don’t want to miss.”
— Dr. Fen Guo, Ph.D., FACMG, FCCMG, Clinical Laboratory Director at PerkinElmer Genomics
As with all types of NGS data, labs should carefully assess sensitivity and specificity when calling CNVs from WES data. Because WES data lacks coverage in intronic and other non-coding regions, some calls may be missed, a risk that can require a manual review step. Labs should acknowledge the limitations of their assays and make those limitations clear in their reporting.
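For labs validating a CNV workflow against samples with known results, sensitivity and specificity reduce to simple ratios of true and false calls. The sketch below uses the standard definitions; the function names and the counts in the example are made up for illustration and do not come from any validation study.

```python
def sensitivity(true_positives, false_negatives):
    """Fraction of real CNVs that the assay detected."""
    return true_positives / (true_positives + false_negatives)


def specificity(true_negatives, false_positives):
    """Fraction of CNV-free regions that the assay correctly left uncalled."""
    return true_negatives / (true_negatives + false_positives)


# Hypothetical validation counts: 45 known CNVs detected, 5 missed,
# 90 normal regions correctly uncalled, 10 called in error.
print(f"Sensitivity: {sensitivity(45, 5):.0%}")
print(f"Specificity: {specificity(90, 10):.0%}")
```

Reporting both numbers together matters because tuning a caller to raise one usually lowers the other.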
Watch our free webinar, “Genome sequencing reveals cause of multi-generational split hand/split foot with long bone deficiency,” to see how Dr. Raymond C. Caylor, Assistant Director, Molecular Diagnostic Laboratory at Greenwood Genetic Center, used genome sequencing and Bionano’s NxClinical software to provide a diagnosis for a multi-generational family with split hand/split foot with long bone deficiency.
High-quality detection of CNVs from NGS data has been a long-standing challenge for clinical research labs. Most “out-of-the-box” NGS analysis software tools can’t easily detect or visualize CNVs. Their capabilities are typically limited to certain variant types and sizes or focused on detecting SNVs.
Without robust and convenient CNV calling capabilities, labs are left with an incomplete picture of genomic aberrations and, therefore, can’t thoroughly investigate their patient samples and provide complete results.
Today’s software tools for detecting, analyzing, and interpreting CNVs from NGS data can be broadly divided into two categories: homegrown tools and commercial software.
Homegrown CNV tools, while sometimes advantageous from a cost perspective if the lab has very specific and unchanging CNV calling needs, bring several disadvantages that can exact high practical and efficiency costs on a lab.
For example:
Commercial CNV software, on the other hand, enables teams to invest in efficiencies and capabilities that don’t always require in-house bioinformatics or development expertise. These tools tend to be far more user-friendly and keep pace with new developments in NGS capabilities. However, not all CNV software is equal in performance, capability, and ease of use.
As Dr. Guo explains, many of the commercial tools in use today treat CNV analysis as an add-on capability:
“From my experience using several software platforms, many commercial platforms that tout CNV analysis were built for SNV calling and interpretation. CNV calling was added on, but the primary interface is still designed for SNV analysis. Many labs needing to call CNVs need to interface with this data at the genomic level and get the whole picture—especially labs coming from the microarray world that want to use a familiar platform.”
— Dr. Fen Guo, Ph.D., FACMG, FCCMG, Clinical Laboratory Director at PerkinElmer Genomics
Dr. Guo urges teams to be thoughtful when evaluating commercial tools against their particular needs—both today and tomorrow:
“You have to be very careful when thinking about the best commercial tool for the type of CNV calling you need to do. Think about the primary purpose you’ll be using it for. Are you only going to be using panels? Exome data only? Or do you think you’ll want software that analyzes all types of NGS data? Here at PerkinElmer Genomics, we use panels, exome, and genome data, which is why we use software [Bionano’s NxClinical] that covers everything.
Secondly, most CNV software will give you deletions, duplications, and copy numbers. But not all of them call AOH, which is important for imprinting disorders and cancer.
Thirdly, you have to consider the differences in analytical performance between software. You don’t want a high false-positive rate or false-negative rate.
And lastly—and most importantly for me—if you or anyone on your team is a naturally visual person, you need to look at the data visualization and user interface. It needs to be user-friendly and not get in its own way. The copy number events across the genome should be easy to visualize and identify.”
— Dr. Fen Guo, Ph.D., FACMG, FCCMG, Clinical Laboratory Director at PerkinElmer Genomics
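So, to quickly recap the key considerations when evaluating commercial CNV calling software: the types of NGS data it supports (panels, exome, genome), whether it calls AOH, its analytical performance (false-positive and false-negative rates), and the quality of its data visualization and user interface.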
Here at Bionano Genomics, we equip labs with the single-source software solution they need to overcome these challenges.
We believe NxClinical may be the most comprehensive and up-to-date solution for cytogenetics and molecular genetics in one system for analyzing and interpreting all genomic variants, including CNVs, from microarray and NGS data.
We’ve perfected two algorithms for the detection of CNV and AOH from almost all NGS assays.
Both are available with NxClinical, the genomics software solution that enables labs to detect CNVs and AOH regions, and visualize SNVs in context, across all microarray and NGS platforms simultaneously—all from a single screen.
Unlike the numerous algorithms available for calling CNVs from WES data that suffer from poor sensitivity or too many false-positive calls, the MSR algorithm has been able to offer the best balance of these competing measures, detecting small true positives without generating many false positives.
The image below shows a small 12 kb deletion overlapping part of the MECP2 gene, covered by only two virtual probes that indicate a small copy number loss. Even at this level of sensitivity, only four other CNVs passed the basic filtering stage, demonstrating a very low false-positive rate.
Free tutorial for NxClinical users
Are you an active NxClinical user considering an update? In this 25-minute webinar, Soheil Shams, Founder & CEO of BioDiscovery, a Bionano Genomics company, uses multiple example oncology cases to demonstrate the most effective workflow and case review benefits of the Knowledgebase in NxClinical 6.0.
Book a free personalized demo to assess fit and see NxClinical in action. Let us know you’re interested and we’ll connect on an initial consultation to answer questions and dive a little deeper before demonstrating NxClinical—either with example data or your own.
*NxClinical software is for research use only. It is designed to assist clinicians and it is not intended as a primary diagnostic tool. It is each lab’s responsibility to use the software in accordance with internal policies as well as in compliance with applicable regulations.
© Copyright 2022 Bionano Genomics, Inc. All rights reserved. All trademarks are the property of Bionano Genomics, Inc. or their respective owners.