Copy number variations (CNVs) are genomic alterations that result in an abnormal number of copies of one or more genes. Structural genomic events such as duplications, deletions, translocations, and inversions can cause CNVs.
Like single-nucleotide polymorphisms (SNPs), particular CNVs have been associated with susceptibility to diseases such as cancer, inherited genetic disorders, autoimmune diseases, and others.
At Bionano Genomics, we equip clinical research labs with NxClinical, which we believe may be the most comprehensive and up-to-date cytogenetics and molecular genetics solution. It’s one system for analyzing and interpreting all genomic variants from microarray and next-generation sequencing (NGS) data.
This guide briefly introduces whole-genome CNV analysis, how it works, and how labs are taking advantage of it today.
The development of NGS technology has dramatically improved our ability to detect all types of genomic variation, from single-nucleotide variants (SNVs) to CNVs and other structural variations. Using NGS data for CNV analysis has gained considerable attention in recent years thanks to new technologies and better algorithms that enable the simultaneous detection of CNVs and SNVs.
Since NGS is now the most common and widely accepted method for high-throughput assessment of sequence variants (SeqVar), the ability to also obtain the CNV and loss of heterozygosity (LOH) status of a sample from NGS data is very appealing: it would mean a single workflow and reduced cost.
NGS-based CNV analysis techniques also enable labs to map the precise location of a variant (depending on the detection approach).
There are four main methods of detecting CNVs with NGS data: read-pair, split-read, read-depth, and assembly.
Each of these four methods specializes in detecting a specific form or size range of CNV, resulting in a trade-off in breakpoint accuracy. None of these methodologies is perfect; each brings advantages and disadvantages. To address this, many labs combine methods, such as read-depth with read-pair or read-depth with split-read, to achieve a more holistic analysis.
As Dr. Fen Guo, Clinical Laboratory Director at PerkinElmer Genomics, notes, the utility of these methods often hinges on the quality of the NGS data available.
“There’s a general sense that some methods are better than others—for example, that the split-read method is superior for accurate breakpoint identification because of the nature of this methodology, while the read-depths can detect the dosages of CNVs and works better on a wide range of CNV sizes from small to large CNVs in the genome. But in addition to recognizing the inherent differences between these methods and what they’re capable of, so much depends on the quality of the data—the read depths, the coverage, and the data uniformity.”
Dr. Fen Guo, Ph.D., FACMG, FCCMG
Clinical Laboratory Director
PerkinElmer Genomics
To give a little more background and tease out some of these important nuances, we briefly summarize each NGS CNV calling method below.
The read-pair methodology was the first to demonstrate the usefulness of NGS data for CNV detection.
It works by comparing the mapped distance between the two reads of each pair (the insert size) with the expected size based on a reference genome. Labs using this method identify candidate CNVs from discordant pairs, that is, mapped read pairs whose distance differs significantly from the predetermined average insert size.
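The read-pair idea can be sketched in a few lines of Python. This is a minimal illustration, not how any particular CNV caller works: the function name `flag_discordant_pairs`, the three-standard-deviation cutoff, and the example insert sizes are all assumptions chosen for clarity.

```python
def flag_discordant_pairs(insert_sizes, expected_mean, expected_sd, n_sd=3.0):
    """Label each observed insert size relative to the expected library distribution.

    All parameter names and the n_sd cutoff are illustrative assumptions.
    """
    calls = []
    for size in insert_sizes:
        # How many standard deviations the mapped distance is from the expectation.
        z = (size - expected_mean) / expected_sd
        if z > n_sd:
            # Mates map farther apart on the reference than expected:
            # consistent with sequence missing from the sample (deletion).
            calls.append("deletion_candidate")
        elif z < -n_sd:
            # Mates map closer together than expected:
            # consistent with extra sequence in the sample (insertion).
            calls.append("insertion_candidate")
        else:
            calls.append("concordant")
    return calls


# Hypothetical mapped distances (bp) for six read pairs from a 400 bp library.
observed = [410, 395, 1850, 402, 120, 398]
calls = flag_discordant_pairs(observed, expected_mean=400, expected_sd=30)
print(list(zip(observed, calls)))
```

In practice a caller would require several discordant pairs clustered at the same locus before reporting an event; a single outlier pair is usually treated as noise.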
The split-read methodology uses reads from paired-end sequencing where only one read of the pair maps reliably, and the other either entirely or partially fails to map to the genome. The position at which a partially mapped read stops aligning marks a candidate breakpoint.
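As a rough illustration of the split-read signal, the sketch below looks for soft-clipped ends in SAM-style CIGAR strings, which is one common way a partially mapped read shows up in an alignment. The function name `clipped_breakpoint` and the 20-base minimum clip length are assumptions made for this example; hard clips and supplementary alignments are deliberately ignored.

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM format).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def clipped_breakpoint(pos, cigar, min_clip=20):
    """Return (side, reference_position) for a candidate breakpoint, or None.

    pos is the 1-based leftmost aligned reference position, as in SAM.
    min_clip is an assumed threshold that filters out short, noisy clips.
    """
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    if not ops:
        return None
    # Reference bases covered by the aligned part of the read.
    ref_len = sum(n for n, op in ops if op in "MDN=X")
    first_n, first_op = ops[0]
    last_n, last_op = ops[-1]
    if last_op == "S" and last_n >= min_clip:
        # Read stops aligning on its right side: breakpoint at last aligned base.
        return ("right", pos + ref_len - 1)
    if first_op == "S" and first_n >= min_clip:
        # Read starts aligning after a clipped prefix: breakpoint at first aligned base.
        return ("left", pos)
    return None


print(clipped_breakpoint(1000, "60M40S"))
print(clipped_breakpoint(5000, "35S65M"))
print(clipped_breakpoint(200, "100M"))
```

Because the clip position is read directly from the alignment, this signal can localize a breakpoint to a single base, which is why the split-read approach is associated with accurate breakpoint identification.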
The read-depth method is based on the hypothesis of a correlation between the depth of coverage of a genomic region and the copy number of the region.
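The depth-to-copy-number relationship can be illustrated with a small Python sketch. It assumes depth has already been counted in fixed genomic bins for both the sample and a copy-neutral reference, normalizes each by its median, and scales the ratio by an assumed diploid baseline. The function name, the median normalization, and the example depths are illustrative assumptions, not a description of any specific tool.

```python
import math
from statistics import median


def estimate_copy_number(sample_depth, reference_depth, ploidy=2):
    """Return (log2_ratio, rounded_copy_number) per bin.

    Assumes every reference bin has non-zero depth and that most bins
    in the sample are copy-neutral, so the median is a fair baseline.
    """
    sample_median = median(sample_depth)
    reference_median = median(reference_depth)
    results = []
    for s, r in zip(sample_depth, reference_depth):
        # Normalize each depth by its own median to remove library-size effects.
        ratio = (s / sample_median) / (r / reference_median)
        log2_ratio = math.log2(ratio) if ratio > 0 else float("-inf")
        results.append((round(log2_ratio, 2), round(ploidy * ratio)))
    return results


# Hypothetical mean depths for six consecutive bins.
sample = [30, 31, 15, 14, 29, 45]
reference = [30, 30, 30, 30, 30, 30]
results = estimate_copy_number(sample, reference)
copy_numbers = [cn for _, cn in results]
print(copy_numbers)
```

Here the two bins at roughly half the baseline depth suggest a single-copy loss and the bin at roughly 1.5 times the baseline suggests a gain. Real data is noisier, so callers typically segment adjacent bins before assigning a copy number.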
In theory, all forms of genetic variation—including CNVs—can be detected by the assembly of short reads if the reads are sufficiently long and accurate.
Watch our free webinar—Copy Number Variant Detection by NGS: Coverage, Uniformity & Resolution—to see Dr. Guo introduce the main methods utilized for calling CNVs using NGS data and share clinical cases that illustrate how the coverage and uniformity of NGS data contribute to the resolution of CNV calling.
Whole-genome data has broad utility as it can detect SNVs, insertions/deletions, copy number changes, and both large and small structural variants. Thanks to recent technological innovations, the latest genome sequencers can perform whole-genome sequencing more efficiently than ever.
Unlike narrower approaches to detecting and characterizing CNVs from NGS data such as whole-exome sequencing or gene panels, which analyze a limited portion of the genome, whole-genome data delivers a comprehensive view of the entire genome and has a higher resolution compared to capture-based methods.
“Take the DMD gene, for example: the nature of the gene is small exons interspersed by large introns. Using traditional capture-based methodology to enrich the coding region only, you’ll likely lose the resolution you need to call tiny events, such as a single exon deletion or duplication, which is an important portion of the variant spectrum. Using genome sequencing or a specifically designed genome-level DMD assay, you can achieve uniform coverage across the gene. The uniform coverage not only facilitates the identification of smaller deletions and/or duplications but also helps to precisely identify the breakpoint, which is critical for accurate copy number variant assessment.”
— Dr. Fen Guo, Ph.D., FACMG, FCCMG, Clinical Laboratory Director at PerkinElmer Genomics
Compared to exome data, which only captures one to two percent of the genome and relies on capture-based or PCR-based enrichment, genome data comprises the entire genome—sequencing the coding regions and the non-coding regions. Recent research has suggested that many disease-causing variants may be found in the non-coding regions and are therefore missed by analyzing exomes alone.
Whole-genome libraries can be prepared without PCR amplification or capture, which avoids the biases those enrichment steps introduce. As a result, PCR-free sequencing methodologies used to call CNVs from whole-genome data provide more uniform coverage across both coding and non-coding regions of DNA. This uniform coverage can increase the likelihood of finding a disease-causing mutation.
Also, because of the uniform coverage, whole-genome data requires a relatively lower coverage depth across the genome. Making the same CNV calls from exome data may, for example, require 100x coverage, while the same results could be achieved with only 40x coverage from genome data.
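To make the coverage figures concrete, mean coverage is commonly approximated as the number of reads multiplied by the read length, divided by the size of the sequenced target. The sketch below applies that approximation. The 3 Gb genome size, 45 Mb exome target, and 150 bp read length are round-number assumptions for illustration only, and the calculation ignores duplicates, off-target reads, and unmappable regions.

```python
def mean_coverage(n_reads, read_length_bp, target_size_bp):
    """Approximate mean depth: total sequenced bases divided by target size."""
    return n_reads * read_length_bp / target_size_bp


def reads_needed(target_coverage, read_length_bp, target_size_bp):
    """Approximate number of reads required to reach a mean depth."""
    total_bases = target_coverage * target_size_bp
    return -(-total_bases // read_length_bp)  # ceiling division on integers


# Assumed round numbers, not measured values.
GENOME_BP = 3_000_000_000
EXOME_BP = 45_000_000
READ_LEN = 150

wgs_reads = reads_needed(40, READ_LEN, GENOME_BP)
exome_reads = reads_needed(100, READ_LEN, EXOME_BP)
print(f"40x genome: {wgs_reads:,} reads")
print(f"100x exome: {exome_reads:,} reads")
```

The exome still needs far fewer reads in total because its target is so much smaller. The comparison in the text concerns the depth required at each position for reliable CNV calling, which is lower when coverage is uniform.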
Whole-genome sequencing is also widely regarded as the superior data modality for accurate breakpoint detection.
“In many cases, whole-genome data enables you to identify breakpoints even at the single nucleotide level because of the uniform coverage across the genome. In addition, whole-genome data also provides insight into some challenging regions such as those involved with trinucleotide repeat disorders.”
— Dr. Fen Guo, Ph.D., FACMG, FCCMG, Clinical Laboratory Director at PerkinElmer Genomics
Watch our free webinar, "Genome sequencing reveals cause of multi-generational split hand/split foot with long bone deficiency," to see how Dr. Raymond C. Caylor, Assistant Director, Molecular Diagnostic Laboratory at Greenwood Genetic Center, utilized genome sequencing and Bionano’s NxClinical software to provide a diagnosis for a multi-generational family with split hand/split foot with long bone deficiency.
High-quality detection of CNVs from NGS data has been a long-standing challenge for clinical research labs. Most “out-of-the-box” NGS analysis software tools can’t easily detect or visualize CNVs. Their capabilities are typically limited to certain variant types and sizes or focused on detecting SNVs.
Without robust and convenient CNV calling capabilities, labs are left with an incomplete picture of genomic aberrations and, therefore, can’t thoroughly investigate their patient samples and provide complete results.
Today’s software tools for detecting, analyzing, and interpreting CNVs from NGS data can be broadly divided into two categories: homegrown tools and commercial software.
Homegrown CNV tools, while sometimes advantageous from a cost perspective if the lab has very specific and unchanging CNV calling needs, bring several disadvantages that can exact high practical and efficiency costs on a lab.
For example:
Commercial CNV software, on the other hand, enables teams to invest in efficiencies and capabilities that don’t always require in-house bioinformatics or development expertise. These tools tend to be far more user-friendly and keep pace with new developments in NGS capabilities. However, not all CNV software is equal in performance, capability, and ease of use.
As Dr. Guo explains, many of the commercial tools in use today treat CNV analysis as an add-on capability:
“From my experience using several software platforms, many commercial platforms that tout CNV analysis were built for SNV calling and interpretation. CNV calling was added on, but the primary interface is still designed for SNV analysis. Many labs needing to call CNVs need to interface with this data at the genomic level and get the whole picture—especially labs coming from the microarray world that want to use a familiar platform.”
— Dr. Fen Guo, Ph.D., FACMG, FCCMG, Clinical Laboratory Director at PerkinElmer Genomics
Dr. Guo urges teams to be thoughtful when evaluating commercial tools against their particular needs—both today and tomorrow:
“You have to be very careful when thinking about the best commercial tool for the type of CNV calling you need to do. Think about the primary purpose you’ll be using it for. Are you only going to be using panels? Exome data only? Or do you think you’ll want software that analyzes all types of NGS data? Here at PerkinElmer Genomics, we use panels, exome, and genome data, which is why we use software [Bionano’s NxClinical] that covers everything.
Secondly, most CNV software will give you deletions, duplications, and copy numbers. But not all of them call AOH [absence of heterozygosity], which is important for imprinting disorders and cancer.
Thirdly, you have to consider the differences in analytical performance between software. You don’t want a high false-positive rate or false-negative rate.
And lastly—and most importantly for me—if you or anyone on your team is a naturally visual person, you need to look at the data visualization and user interface. It needs to be user-friendly and not get in its own way. The copy number events across the genome should be easy to visualize and identify.”
— Dr. Fen Guo, Ph.D., FACMG, FCCMG, Clinical Laboratory Director at PerkinElmer Genomics
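So, to quickly recap the key considerations when evaluating commercial CNV calling software: the types of NGS data it supports (panels, exome, and genome); whether it calls AOH in addition to deletions, duplications, and copy numbers; its analytical performance, including false-positive and false-negative rates; and the quality of its data visualization and user interface.
Here at Bionano Genomics, we equip labs with the single software solution they need to overcome these challenges.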
We believe NxClinical may be the most comprehensive and up-to-date solution for cytogenetics and molecular genetics in one system for analyzing and interpreting all genomic variants, including CNVs, from microarray and NGS data.
We’ve perfected two algorithms for the detection of CNV and AOH from almost all NGS assays.
Both are available with NxClinical, the genomics software solution that enables labs to detect CNVs and AOH regions, and visualize SNVs in context, across all microarray and NGS platforms simultaneously—all from a single screen.
With higher-depth NGS, smaller CNVs can be detected and integrated with sequence variants to provide a holistic view of the sample.
In “Figure 3” below, the ideogram shows regions of copy number gain (blue bars), loss (red bars), AOH (yellow shading), Allelic Imbalance (purple shading), as well as various types of Sequence Variants (e.g., SNV, In/Del, etc.) as colored “lollipops”.
As described in Chaubey et al., Journal of Molecular Diagnostics, vol. 22, no. 6, June 2020, researchers used 10x WGS and validated that the NxClinical algorithm detected all CNVs and AOH regions that were found by high-resolution SNP arrays.
“Figure 2” above shows a small exonic deletion detected using 10x WGS with the MSR algorithm.
Are you an active NxClinical user considering an update? In this 25-minute webinar, Soheil Shams, Founder & CEO of BioDiscovery, a Bionano Genomics company, uses multiple example oncology cases to demonstrate the most effective workflow and case review benefits of the Knowledgebase in NxClinical 6.0.
Book a free personalized demo to assess fit and see NxClinical in action. Let us know you’re interested and we’ll connect on an initial consultation to answer questions and dive a little deeper before demonstrating NxClinical—either with example data or your own.
*NxClinical software is for research use only. It is designed to assist clinicians and it is not intended as a primary diagnostic tool. It is each lab’s responsibility to use the software in accordance with internal policies as well as in compliance with applicable regulations.
© Copyright 2022 Bionano Genomics, Inc. All rights reserved. All trademarks are the property of Bionano Genomics, Inc. or their respective owners.