Analysis and optimisation of selected genomic algorithms

Bayat, Arash

doi:10.26190/unsworks/21174

dc.contributor.advisor	Parameswaran, Sri	en_US
dc.contributor.advisor	Ignjatovic, Aleksandar	en_US
dc.contributor.author	Bayat, Arash	en_US
dc.date.accessioned	2022-03-23T10:05:35Z
dc.date.available	2022-03-23T10:05:35Z
dc.date.issued	2018	en_US
dc.description.abstract	The importance of genomic applications in medicine, agriculture, environmental science and other fields has focused attention on genomic computation over the last two decades. New technologies make it affordable to extract genomic information (sequencing) on a scale hitherto unknown. The resulting fall in the price of sequencing has increased the number of areas in which sequencing data is used, so there is a need to assemble more and more genomes. Significant computational effort is needed to process this sequenced data, both to assemble it and to search for variations. It has been predicted that genomic data will exceed astronomical data in volume in the near future. The growth in computational capacity described by Moore’s law cannot keep pace with this increased computational demand. This thesis is motivated by the extensive demand for processing of sequenced data. The author identifies several important related processes and aims to improve each of them. First, recent assembly pipelines are comprehensively reviewed and evaluated. The study reveals important facts that are used to design an efficient assembly practice. Second, a novel assembly pipeline is introduced that successfully balances the trade-off between speed and accuracy. Third, a fast and accurate sequence alignment algorithm is proposed; sequence alignment is at the core of several steps in the assembly workflow, as well as a wide range of other related analyses. Finally, a new data normalisation method is designed. Due to the probabilistic nature of genome assembly, evaluating accuracy is critical, and normalisation is a vital part of the evaluation process. Along with the normalisation method, the author proposes a metric to measure how well the data is normalised. Such a metric is proposed here for the first time.
The proposed assembly pipeline is 6 times faster than SPAdes and achieves 100 times larger contiguity than SOAPdenovo2. The proposed alignment algorithm is 14 times faster than the Smith-Waterman algorithm, yet for 99.99% of input sequence pairs it produces the same alignment as the Smith-Waterman algorithm. Finally, the proposed normalisation method is 949 times more accurate than vt-Normalize.	en_US
dc.identifier.uri	http://hdl.handle.net/1959.4/61762
dc.language	English
dc.language.iso	EN	en_US
dc.publisher	UNSW, Sydney	en_US
dc.rights	CC BY-NC-ND 3.0	en_US
dc.rights.uri	https://creativecommons.org/licenses/by-nc-nd/3.0/au/	en_US
dc.subject.other	Optimisation	en_US
dc.subject.other	Bioinformatics	en_US
dc.subject.other	Algorithm	en_US
dc.title	Analysis and optimisation of selected genomic algorithms	en_US
dc.type	Thesis	en_US
dcterms.accessRights	open access
dcterms.rightsHolder	Bayat, Arash
dspace.entity.type	Publication	en_US
unsw.accessRights.uri	https://purl.org/coar/access_right/c_abf2
unsw.identifier.doi	https://doi.org/10.26190/unsworks/21174
unsw.relation.faculty	Engineering
unsw.relation.originalPublicationAffiliation	Bayat, Arash, Computer Science & Engineering, Faculty of Engineering, UNSW	en_US
unsw.relation.originalPublicationAffiliation	Parameswaran, Sri, Computer Science & Engineering, Faculty of Engineering, UNSW	en_US
unsw.relation.originalPublicationAffiliation	Ignjatovic, Aleksandar, Computer Science & Engineering, Faculty of Engineering, UNSW	en_US
unsw.relation.school	School of Computer Science and Engineering	*
unsw.thesis.degreetype	PhD Doctorate	en_US

Files

Original bundle

Name: public version.pdf
Size: 5.19 MB
Format: application/pdf