Access & Terms of Use
open access
Embargoed until 2022-12-10
Copyright: Martins Reis, Andre Luiz
Abstract
Decades after the publication of the first draft, the human reference genome
remains incomplete, with many difficult regions still unresolved. Next-generation sequencing
(NGS) has become a central tool for the detection of genetic variation in biological
research and clinical diagnosis. However, despite its advantages, NGS suffers from errors
and biases that can confound the detection of true variants. Reference standards are
control materials, with known properties, against which to test performance. In recent
years, the decreasing costs of DNA synthesis have enabled the creative design of synthetic
reference controls for genomics. This thesis describes the development of synthetic DNA
controls to qualitatively and quantitatively analyse genome features. First, I present a synthetic
ladder, a single DNA molecule with artificial sequence elements at known
copy numbers, for accurately measuring sequence abundance in NGS libraries. The synthetic
ladder provides a universal reference to quantify and mitigate the impact of technical
variation, independent of the human genome, improving quantitative comparisons within
and between samples. Second, I present a synthetic chromosome containing mirrored
representations of diverse clinically relevant variants and human genome features, such
as HLA alleles and immune receptors. The synthetic chromosome provides a ground-truth
reference, with unambiguous representation of low-confidence and difficult-to-sequence
genome regions. I used the synthetic chromosome to benchmark different
experimental and analytical pipelines, highlighting their strengths and weaknesses and
providing best-practice guidelines. Finally, the COVID-19 pandemic revealed the
importance of controls for accurate and standardised diagnosis of SARS-CoV-2. Therefore,
I made a pair of chimeric A/B standards for SARS-CoV-2 diagnostic RT-PCR testing. Each
standard contains multiple target sequences joined in tandem, where targets present in
standard A are absent in B, and vice versa. This enables cross-validation of the controls,
unambiguously distinguishing control failures from test failures. In summary, the rapid
development of genome technologies often results in a diverse but fragmented
landscape in genomics that hinders data compatibility and interoperability. To bridge
that gap, my research provides a standardised quantitative reference, more diverse
representations of challenging human genome features and alternative design principles
for diagnostic test controls.