Cloud based computing technologies for genomic medicine

Access & Terms of Use
open access
Copyright: Yang, Andrian
Abstract
Recent advances in single-cell RNA-sequencing (scRNA-seq) methods have enabled the study of cellular heterogeneity at single-cell resolution. However, current tools for processing and analysing RNA-seq data are not equipped to handle the large amount of data generated in single-cell studies. With the exponential growth in the number of gene expression profiles generated by scRNA-seq methods, there is a need for scalable tools for large-scale data analysis and interpretation. In this thesis, I report several new scalable bioinformatics methods that I have developed for the analysis of scRNA-seq data:

1. Falco: a new cloud-based framework for processing large-scale scRNA-seq data. Falco uses standard Big Data frameworks such as Apache Hadoop and Apache Spark to enable scalable data analysis. The Falco framework performs read processing, alignment, gene expression quantification, and transcript reconstruction, all in a parallel and distributed manner. We demonstrated Falco's scalability using real data sets, with Falco achieving a speedup of 1.7x to 145x compared to single-node execution. Falco also allows for cost-efficient analysis, providing cost savings of up to 65%.

2. Scavenger: a new pipeline to recover false-negative non-aligned reads (reads that should have aligned but did not) in RNA-seq data. Scavenger uses a novel mechanism that recovers such reads based on their similarity to aligned reads. Using real data, we demonstrated that Scavenger recovers a good portion of non-aligned reads and that the recovered reads have more variance than aligned reads. Genes with a substantial increase in expression after recovery are typically lowly expressed and are enriched for pseudogenes, suggesting that the expression of pseudogenes may be under-reported.

3. Starmap: a new tool for visualising scRNA-seq data to help with the exploration and interpretation of large amounts of data. Starmap combines two visual paradigms, the 3D scatter plot and the star plot, to allow visualisation of both the high-level structure of the data and cell-level features. Starmap is designed to be cross-platform and supports an immersive mode that allows visualisation with low-cost VR headsets.
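The speedup and cost-saving figures reported for Falco are two different ratios, and a small worked calculation helps keep them apart. The Python sketch below is illustrative only and is not Falco's code: the `Run` class, the `speedup` and `cost_saving` helpers, and all numbers are hypothetical, and billing is assumed to be a flat price per node-hour with no cluster start-up overhead or minimum billing increment.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """One hypothetical pipeline execution."""
    hours: float                # wall-clock time of the run
    nodes: int                  # number of nodes billed for the whole run
    price_per_node_hour: float  # assumed flat hourly price per node

    @property
    def cost(self) -> float:
        # Assumes every node is billed for the full wall-clock time.
        return self.hours * self.nodes * self.price_per_node_hour


def speedup(baseline: Run, parallel: Run) -> float:
    """How many times faster the parallel run finishes than the baseline."""
    return baseline.hours / parallel.hours


def cost_saving(baseline: Run, parallel: Run) -> float:
    """Fraction of the baseline cost saved (negative if the parallel run costs more)."""
    return 1.0 - parallel.cost / baseline.cost


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to illustrate the arithmetic.
    single = Run(hours=10.0, nodes=1, price_per_node_hour=2.0)
    cluster = Run(hours=2.0, nodes=8, price_per_node_hour=0.5)
    print(f"speedup: {speedup(single, cluster):.1f}x")
    print(f"cost saving: {cost_saving(single, cluster):.0%}")
```

With these made-up inputs the cluster run is 5x faster and costs $8 instead of $20, a 60% saving. A faster run is only cheaper when the speedup outweighs the extra node-hours billed, which is why speedup and cost saving need not move together.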
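The abstract says Scavenger recovers non-aligned reads based on their similarity to aligned reads, but does not specify the similarity measure. As a loose illustration of that idea only, the Python (3.9+) sketch below swaps in a simple k-mer Jaccard similarity: each non-aligned read is assigned to the gene of its most similar aligned read, provided the similarity reaches a threshold. The function names, the input layout, `k`, and `min_similarity` are all hypothetical. This is not Scavenger's actual mechanism, and a real pipeline would read sequences and alignments from FASTQ/BAM files rather than small in-memory dictionaries.

```python
from collections import defaultdict


def kmers(seq: str, k: int) -> set[str]:
    """All length-k substrings of seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def recover_reads(aligned: dict[str, tuple[str, str]],
                  unaligned: dict[str, str],
                  k: int = 5,
                  min_similarity: float = 0.5) -> dict[str, str]:
    """Assign non-aligned reads to genes via their most similar aligned read.

    aligned:   read id -> (sequence, gene it aligned to)   (hypothetical layout)
    unaligned: read id -> sequence
    Returns read id -> gene for reads whose best k-mer Jaccard similarity
    to any aligned read is at least min_similarity.
    """
    # Inverted index from k-mer to the aligned reads containing it, so each
    # query is only compared against reads sharing at least one k-mer.
    aligned_kmers = {rid: kmers(seq, k) for rid, (seq, _) in aligned.items()}
    index: dict[str, set[str]] = defaultdict(set)
    for rid, ks in aligned_kmers.items():
        for km in ks:
            index[km].add(rid)

    recovered: dict[str, str] = {}
    for read_id, seq in unaligned.items():
        query = kmers(seq, k)
        candidates = set().union(*(index.get(km, set()) for km in query))
        best_gene, best_sim = None, 0.0
        for cand in sorted(candidates):  # sorted for deterministic tie-breaking
            target = aligned_kmers[cand]
            sim = len(query & target) / len(query | target)
            if sim > best_sim:
                best_gene, best_sim = aligned[cand][1], sim
        if best_gene is not None and best_sim >= min_similarity:
            recovered[read_id] = best_gene
    return recovered


if __name__ == "__main__":
    aligned = {"a1": ("ACGTACGTGGA", "GENE1"), "a2": ("TTTTCCCCGGGG", "GENE2")}
    unaligned = {"u1": "ACGTACGTGGT", "u2": "AAAAAAAAAA", "u3": "ACGTACGTCCC"}
    print(recover_reads(aligned, unaligned))
```

In this toy example, `u1` differs from aligned read `a1` only in its last base (Jaccard 6/8 = 0.75) and is recovered to `GENE1`; `u3` shares 4 of 10 distinct k-mers with `a1` (Jaccard 0.4) and stays unrecovered at the default threshold; `u2` shares no k-mers with any aligned read.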
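Starmap pairs a 3D scatter plot (one point per cell) with a star plot of per-cell features. One common way to build a star glyph, sketched below in Python under stated assumptions, is to draw each feature as a spoke at an evenly spaced angle, with spoke length equal to the feature value min-max scaled to a fixed radius, centred on the cell's 3D position. The `star_glyph` function, the per-feature bounds, and the choice to draw the glyph flat in the plane z = constant are hypothetical simplifications, not Starmap's rendering code.

```python
import math


def star_glyph(center: tuple[float, float, float],
               values: list[float],
               vmin: list[float],
               vmax: list[float],
               radius: float = 1.0) -> list[tuple[float, float, float]]:
    """Vertices of a star-plot glyph drawn in the plane z = center z.

    Feature i is drawn along a spoke at angle 2*pi*i/k (k = number of
    features); its length is the value min-max scaled to [0, radius]
    using the per-feature bounds vmin[i] and vmax[i].
    """
    k = len(values)
    cx, cy, cz = center
    vertices = []
    for i, (v, lo, hi) in enumerate(zip(values, vmin, vmax)):
        # Scale to [0, 1]; a constant feature (hi == lo) gets zero length.
        scaled = (v - lo) / (hi - lo) if hi > lo else 0.0
        scaled = min(max(scaled, 0.0), 1.0)  # clamp values outside the bounds
        theta = 2 * math.pi * i / k
        r = radius * scaled
        vertices.append((cx + r * math.cos(theta), cy + r * math.sin(theta), cz))
    return vertices


if __name__ == "__main__":
    # A hypothetical cell at the origin with four features scaled to [0, 1].
    for vertex in star_glyph((0.0, 0.0, 0.0), [1.0, 0.5, 0.0, 1.0],
                             [0.0] * 4, [1.0] * 4):
        print(tuple(round(c, 3) for c in vertex))
```

Connecting the returned vertices in order, and back to the first, gives the star outline. Computing `vmin` and `vmax` over all cells, rather than per cell, keeps glyphs comparable from one cell to the next.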
Author(s)
Yang, Andrian
Supervisor(s)
Ho, Joshua
Suter, Catherine
Publication Year
2019
Resource Type
Thesis
Degree Type
PhD Doctorate
UNSW Faculty
Files
public version.pdf (24.84 MB, Adobe Portable Document Format)