Visual-analytics-driven bioinformatics methods for the analysis of biomolecular data

Access & Terms of Use: open access
Copyright: Kaur, Sandeep
Advances in molecular biology data collection have led to the accumulation of large amounts of diverse data, calling for novel computational approaches to enable their effective analysis. This thesis explored the application of visual-analytics-driven bioinformatics approaches to four biomolecular data-driven challenges.

For analysing time-series omic and multiomic data, a novel method, Minardo-Model, was developed. Minardo-Model identifies key events (e.g. phosphorylation) in such time-series data and orders them temporally. To visualise the inferred order of events, two novel visualisation approaches, event maps and event sparklines, were developed. Minardo-Model was tested on two time-series datasets; in both cases, the event orderings it derived were consistent with prior knowledge.

To streamline the use of experimental 3D protein structures for analysing sequence variants, a novel method was developed and integrated into Aquaria. For variants specified in HGVS notation, the method identifies and displays the best-matching structure. Additionally, for each specified variant, it summarises all structures spanning the variant, the structures containing the exact variant (missense variants only), and sequence features retrieved from external resources. The approach was used to analyse variants in human ACE2 and the SARS-CoV-2 spike protein, revealing novel insights.

For pathogenic bacterial isolates characterised using multilevel genome typing (MGT), the MGTdb web service was developed. MGTdb enables isolates to be uploaded as sequence reads or extracted alleles, which are then processed and assigned MGT identifiers. MGTdb features such as interactive visualisation tools, data download and export to external software enable epidemiological exploration in the context of the local or global database of isolates. The usability of MGTdb was demonstrated through three case studies.
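The Aquaria variant summary described above separates structures that merely span a variant position from those that already contain the exact missense variant. The minimal Python sketch below illustrates that distinction only; it is not the thesis's implementation. It assumes protein-level HGVS substitutions written with three-letter codes (e.g. p.Asn501Tyr), and the `Structure` record, its fields, the helper names and the toy sequences are all hypothetical. Each structure's sequence is assumed to be ungapped and already mapped onto the query protein's residue numbering.

```python
import re
from dataclasses import dataclass

# Standard three-letter to one-letter amino-acid codes.
AA3_TO_1 = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

# Protein-level HGVS substitution with three-letter codes, e.g. "p.Asn501Tyr".
MISSENSE_RE = re.compile(r"^p\.([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})$")


@dataclass
class Structure:
    """Hypothetical structure record mapped onto the query protein's numbering."""
    structure_id: str
    start: int     # first query residue covered by the structure
    end: int       # last query residue covered (inclusive)
    sequence: str  # one-letter residues for start..end, assumed ungapped


def parse_missense(hgvs: str) -> tuple[str, int, str]:
    """Return (reference residue, position, alternate residue) as one-letter codes."""
    match = MISSENSE_RE.match(hgvs)
    if match is None:
        raise ValueError(f"not a three-letter protein substitution: {hgvs}")
    ref3, pos, alt3 = match.groups()
    if ref3 not in AA3_TO_1 or alt3 not in AA3_TO_1:
        raise ValueError(f"unsupported residue code in {hgvs}")
    return AA3_TO_1[ref3], int(pos), AA3_TO_1[alt3]


def summarise_variant(hgvs: str, structures: list[Structure]):
    """Split structures into those spanning the variant position and those
    that already carry the alternate residue at that position."""
    _, pos, alt = parse_missense(hgvs)
    spanning = [s for s in structures if s.start <= pos <= s.end]
    exact = [s for s in spanning if s.sequence[pos - s.start] == alt]
    return spanning, exact


# Toy structures, not real PDB entries.
toy = [
    Structure("toyA", 499, 503, "PTNGV"),  # reference residue N at 501
    Structure("toyB", 499, 503, "PTYGV"),  # carries Y at 501
    Structure("toyC", 1, 300, "A" * 300),  # does not span 501
]
spanning, exact = summarise_variant("p.Asn501Tyr", toy)
print([s.structure_id for s in spanning])  # ['toyA', 'toyB']
print([s.structure_id for s in exact])     # ['toyB']
```

A real tool would additionally need residue-level mappings between sequence and structure coordinates, handling of alignment gaps, one-letter HGVS forms and non-missense variant types.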
For identifying insertion sequences (IS) from short-read sequencing data, a novel method, WiIS, was developed. WiIS was tested on Bordetella pertussis isolates for which both short-read (test data) and long-read (ground truth) sequences were available; it achieved high precision and recall and outperformed other published tools in identifying IS in B. pertussis genomes. Together, the bioinformatics methods developed in this thesis enable new analyses of a wide variety of data, thereby providing insight into diverse biomolecular processes.
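For reference, precision is TP / (TP + FP) and recall is TP / (TP + FN), where IS insertion sites predicted from short reads are compared with the sites in the long-read ground truth. The Python sketch below shows one way such a comparison could be scored. It is an illustrative assumption, not WiIS's actual evaluation: a prediction counts as correct if it lies within a hypothetical `tolerance` (in bp) of a not-yet-matched ground-truth coordinate, pairing is greedy and one-to-one, and IS family, orientation and circular-genome wrap-around are ignored.

```python
def match_sites(predicted, truth, tolerance=50):
    """Greedily pair each predicted site with the closest unmatched truth site
    within `tolerance` bp; return (TP, FP, FN)."""
    unmatched = list(truth)
    tp = 0
    for p in sorted(predicted):
        candidates = [t for t in unmatched if abs(p - t) <= tolerance]
        if candidates:
            best = min(candidates, key=lambda t: abs(p - t))
            unmatched.remove(best)  # each truth site can be matched only once
            tp += 1
    return tp, len(predicted) - tp, len(truth) - tp


def precision_recall(predicted, truth, tolerance=50):
    """Compute precision and recall, returning 0.0 when a denominator is zero."""
    tp, fp, fn = match_sites(predicted, truth, tolerance)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy coordinates (bp), not real B. pertussis data.
p, r = precision_recall([100, 1020, 5000], [110, 1000, 3000, 7000])
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.67 recall=0.50
```

Greedy matching keeps the sketch short but is not guaranteed to find the optimal pairing when predicted and true sites cluster closely; an assignment-based matching would avoid that.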
Degree Type: PhD Doctorate
UNSW Faculty