In a groundbreaking study published in the March 4 issue of Nature Communications, a team of scientists from UCSF, Drexel University, and The Chinese University of Hong Kong used Bionano to analyze structural variation in a record 154 humans. The genomes of these individuals had been previously sequenced as part of the 1000 Genomes Project.
Bionano identified 8.5 times more large insertions and 35% more large deletions in the same samples than the 1000 Genomes Project had previously reported using short-read sequencing. Not surprisingly, many of the large structural variations were flanked by repetitive elements, which short-read sequencing fails to resolve accurately, rendering the structural variants they enclose undetectable.
Many of the structural variants (SVs) and copy number variants (CNVs) associated with disease phenotypes show significant ethnic variation as well. This finding highlights a severe weakness in current practice in personalized medicine and population-wide sequencing projects, where short-read sequences are aligned to just a single reference genome. Without a thorough characterization of the underlying structural variation, the study authors state, “alignment of short-reads to these ethnically variable disease-associated regions will lead to errors in analysis.” Interestingly, the study was able to replicate the human phylogenetic tree (shown in the image above) just as well as can be done based on SNPs.
The long-range information provided by the megabase-size molecules that only Bionano is capable of mapping allowed the authors to resolve variation in many of the most intractable regions of the genome, such as segmental duplications, subtelomeric regions, pericentromeric regions and repetitive regions of the Y chromosome. In addition, the study identified ~60 Mb of non-redundant genome content not found in the hg38 reference genome.
This study of the largest human population mapped on the Bionano platform demonstrates the power of Bionano’s technology to reveal structural variation missed by short-read sequencing. The ethnically diverse study reveals that one reference genome does not fit all, and that it is impossible for a genome analysis based on short-read sequencing alone to correctly characterize all clinically relevant genome variation at the root of human disease in individuals across different populations. A similar point was already made in several news stories covering this publication, including those in STAT News and GenomeWeb.
The sheer size of this study demonstrates that Bionano genome mapping is a high-speed, cost-effective technology suitable for large population-scale studies. Given recent advancements in the workflow for Saphyr, studies like this one can now be done on a routine basis for only $500 per sample, with just 4 weeks of data collection.
We are excited to see the results of this massive study published, and expect to see Saphyr in many more discovery studies!