The development of reference standards for genomics.

Access & Terms of Use
open access
Embargoed until 2022-12-10
Copyright: Martins Reis, Andre Luiz
Abstract
Decades after the publication of its first draft, the human reference genome remains incomplete, with many difficult regions still unresolved. Next-generation sequencing (NGS) has become a central tool for detecting genetic variation in biological research and clinical diagnosis. However, despite its advantages, NGS suffers from errors and biases that can confound the detection of true variants. Reference standards are control materials with known properties against which performance can be tested. In recent years, the decreasing cost of DNA synthesis has enabled the creative design of synthetic reference controls for genomics. This thesis describes the development of synthetic DNA controls to qualitatively and quantitatively analyse genome features. First, I describe a synthetic ladder, a single DNA molecule carrying artificial sequence elements at known copy numbers, which accurately measures sequence abundance in NGS libraries. The synthetic ladder provides a universal reference to quantify and mitigate the impact of technical variation, independent of the human genome, improving quantitative comparisons within and between samples. Secondly, I describe a synthetic chromosome containing mirrored representations of diverse clinically relevant variants and human genome features, such as HLA alleles and immune receptors. The synthetic chromosome provides a ground-truth reference, with unambiguous representation of low-confidence and difficult-to-sequence genome regions. I therefore used the synthetic chromosome to benchmark different experimental and analytical pipelines, highlighting their strengths and weaknesses and providing best-practice guidelines. Finally, the COVID-19 pandemic revealed the importance of controls for accurate and standardised diagnosis of SARS-CoV-2 infection. I therefore made a pair of chimeric A/B standards for SARS-CoV-2 diagnostic RT-PCR testing. Each standard contains multiple target sequences joined in tandem, where targets present in standard A are absent from B, and vice versa. This enables cross-validation of the controls, unambiguously distinguishing control failures from test failures. In summary, the rapid development of genome technologies often results in a diverse but fragmented landscape in genomics that hinders data compatibility and interoperability. To bridge that gap, my research provides a standardised quantitative reference, more diverse representations of challenging human genome features, and alternative design principles for diagnostic test controls.
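To illustrate how a ladder of sequence elements at known copy numbers can serve as a quantitative reference, the following is a minimal sketch, not the method used in the thesis. It assumes that read counts scale with copy number and fits a straight line on a log2 scale. The function names, the ladder copy numbers and the read counts are all hypothetical values chosen for illustration.

```python
import math


def fit_ladder(copy_numbers, observed_counts):
    """Least-squares fit of log2(count) = slope * log2(copies) + intercept."""
    xs = [math.log2(c) for c in copy_numbers]
    ys = [math.log2(o) for o in observed_counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_copies(count, slope, intercept):
    """Invert the fitted line to estimate abundance from an observed count."""
    return 2 ** ((math.log2(count) - intercept) / slope)


# Hypothetical ladder: elements at 1, 2, 4 and 8 copies (assumed design).
copy_numbers = [1, 2, 4, 8]
# Made-up read counts, perfectly proportional to copy number.
observed_counts = [100, 200, 400, 800]

slope, intercept = fit_ladder(copy_numbers, observed_counts)
estimated = estimate_copies(300, slope, intercept)
```

Under these assumptions, a fitted slope close to 1 would suggest that counts track abundance proportionally, while a slope that departs from 1 would point to technical bias in the library. Real data would need replicate handling and error estimates that this sketch leaves out.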
Author(s)
Martins Reis, Andre Luiz
Publication Year
2021
Resource Type
Thesis
Degree Type
PhD Doctorate
Files
public version.pdf (13.09 MB, Adobe Portable Document Format)