I’ve had the pleasure of having a bioinformatics student from UCI, Ryan Bertwell, working with me this summer. He, along with my colleague Alessio, recently reviewed the paper on the CoNIFER algorithm. I thought it would be good to share their insights with our readers – below is their review of the algorithm.
With the increase in the use of NGS technology such as exome sequencing for SNV detection, researchers are looking to see what else they can gain from the data already in hand. Many are looking to obtain copy numbers from such sequence reads. Here we review the approach by Niklas Krumm et al. in their paper, "Copy number variation detection and genotyping from exome sequence data," which explores the techniques and accuracy of their copy number calculation algorithm for exome sequencing data. They’ve dubbed their algorithm CoNIFER (Copy Number Inference from Exome Reads).
The authors used data from HapMap individuals, ASD (Autism Spectrum Disorders) trios, and the NHLBI Exome Sequencing Project (ESP). Sequencing was performed on the Illumina HiSeq 2000 or Illumina GAII platforms. Results were validated against data from arrayCGH, qPCR, whole-genome shotgun sequencing, and targeted clone sequencing.
Their main technique for determining copy number variation is singular value decomposition (SVD) normalization of the exome data. This technique finds and removes large sources of bias from the exome sets and includes X chromosome normalization so that samples can be assayed independent of sex. For discovery of rare CNVs, the authors found that a large portion of the systematic bias and variance stemmed from the first 10-15 singular values, and that removing these strongest components improved the normalization. These components could be removed because the variation expected from real CNVs and CNPs is small, while the variation carried by the first 10-15 components was orders of magnitude higher. For discovery of CNP regions, they removed fewer components (only five) to preserve the real signals from highly copy-number-polymorphic loci.
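To make the idea concrete, here is a minimal sketch of SVD-based normalization in Python with NumPy. It is not the CoNIFER implementation: the function names, the synthetic read-depth matrix, the samples-by-exons layout, and the choice to remove a single component are all assumptions made for illustration. It standardizes each exon across samples, zeroes the largest singular values, and rebuilds the matrix from what remains.

```python
import numpy as np


def zscore_per_exon(rpkm):
    """Center each exon (column) on its median and scale by its
    standard deviation across samples (rows)."""
    center = np.median(rpkm, axis=0)
    spread = rpkm.std(axis=0)
    return (rpkm - center) / spread


def remove_top_components(z, n_remove):
    """Zero the n_remove largest singular values of z and rebuild it."""
    u, s, vt = np.linalg.svd(z, full_matrices=False)
    s_kept = s.copy()
    s_kept[:n_remove] = 0.0  # singular values are returned largest first
    return (u * s_kept) @ vt


# Synthetic data only (hypothetical): 30 samples x 200 exons.
rng = np.random.default_rng(0)
n_samples, n_exons = 30, 200
noise = rng.normal(0.0, 1.0, size=(n_samples, n_exons))
# One strong rank-1 "batch" effect standing in for systematic bias.
batch_bias = 20.0 * np.outer(rng.normal(size=n_samples),
                             rng.normal(size=n_exons))
rpkm = 100.0 + noise + batch_bias
rpkm[3, 50:60] += 5.0  # made-up duplication-like shift in one sample

z = zscore_per_exon(rpkm)
singular_values = np.linalg.svd(z, compute_uv=False)
cleaned = remove_top_components(z, n_remove=1)

print("largest singular values before:", singular_values[:3])
print("mean signal in shifted region after:", cleaned[3, 50:60].mean())
```

In this toy setup one component dominates, so only one is removed. The paper's choice of 10-15 components for rare CNVs (and five for CNP regions) reflects the trade-off visible even here: removing more components strips more bias, but also risks removing real copy number signal.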
The authors propose that CoNIFER can be used to discover CNVs that might be missed by standard practices. Using a large sample base (366 exomes), they demonstrated that CoNIFER's copy number predictions were very accurate (94% overall) and strongly correlated with whole-genome data (average r² = 0.91). These assessments rest on experimentally produced CNV data from methods known to be accurate, including quantitative PCR and whole-genome sequencing.
If you already have exome sequencing data and a large cohort, using CoNIFER may be a good way to take advantage of the data and capture copy number variations without much additional expense. If genome-wide detection of CNVs is the main goal, high-density aCGH or SNP arrays are the preferable route at this time. If you can also obtain exome data, CNVs derived from exome sequencing can complement the array data.
In a 30-minute webinar, Dr. Elan Hahn discusses how the team at Children’s Hospital of Los Angeles (CHLA) took its WES data one step further by using BioDiscovery’s NxClinical software to analyze copy number variants (CNVs). This simple workflow addition allowed them to resolve ~9% of previously undiagnosed Mendelian conditions that had not originally been identified by WES and, in some cases, by chromosomal microarray.
Paper: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3409265/#B13
CoNIFER software: http://conifer.sourceforge.net/index.html
Python: http://www.python.org/download/