Designing synthetic spike-in controls for next-generation sequencing and beyond

Access & Terms of Use
open access
Embargoed until 2019-07-01
Copyright: Hardwick, Simon
Abstract
Next-generation sequencing (NGS) is a revolutionary tool that can be used for a myriad of applications, ranging from clinical genome sequencing, to gene expression profiling with RNA sequencing (RNA-seq), to the detection of microbes within environmental samples or isolates. However, significant analytical challenges remain with NGS data due to the complexity of genome architecture, as well as a range of biases introduced during library preparation, sequencing and analysis. These biases and challenges can be understood and mitigated through the use of spike-in controls – DNA or RNA oligonucleotides with known sequence and length that are added to samples prior to library preparation. While spike-in controls have previously been developed for transcriptomics, they were designed for technologies that predated the advent of NGS and consequently suffer from several limitations. In this thesis, I present a novel design framework for synthetic spike-in standards (‘sequins’) that can be applied to a range of NGS applications, and demonstrate how sequins can be used as internal controls to assist in the analysis of accompanying samples. In Chapter 1, I develop a set of spliced synthetic RNA standards that are encoded by artificial gene loci on an accompanying in silico chromosome. RNA sequins enable the assessment of important but previously intractable RNA-seq properties including split-read alignment, alternative splicing, isoform-level quantification and fusion gene detection. In Chapter 2, I present the design of a set of DNA sequins comprising a synthetic community of artificial microbial genomes, which can be used in metagenome sequencing and analysis. Importantly, DNA sequins facilitate the accurate resolution of microbial abundance shifts between samples, which are otherwise imperceptible with NGS. Finally, in Chapter 3, I show how RNA sequins can be used in the analysis of complex brain transcriptomes generated using targeted RNA-seq. This includes an assessment of capture efficiency, quantitative accuracy, and the setting of empirical thresholds to distinguish signal from noise. These transcriptomes are presented as an atlas that can be used to link gene expression with neurological phenotypes. The technologies, associated datasets and analytical methods developed herein provide a qualitative and quantitative reference with which to navigate the complexity of genome biology.
Author(s)
Hardwick, Simon
Supervisor(s)
Mercer, Tim
Mattick, John
Smith, Martin
Publication Year
2018
Resource Type
Thesis
Degree Type
PhD Doctorate
UNSW Faculty
Files
download public version.pdf 32.41 MB Adobe Portable Document Format