Publication:
Analysis and optimisation of selected genomic algorithms

dc.contributor.advisor Parameswaran, Sri en_US
dc.contributor.advisor Ignjatovic, Aleksandar en_US
dc.contributor.author Bayat, Arash en_US
dc.date.accessioned 2022-03-23T10:05:35Z
dc.date.available 2022-03-23T10:05:35Z
dc.date.issued 2018 en_US
dc.description.abstract The importance of genomic applications in medicine, agriculture, the environment and other fields has focused attention on genomic computation over the last two decades. New technologies make it affordable to extract genomic information (sequencing) on a scale hitherto unknown. This has lowered the price of sequencing and increased the number of areas in which sequencing data is used, so there is a need to assemble more and more genomes. Significant computational effort is needed to process this sequenced data (assembly) and to search for variations. It has been predicted that the volume of genomic data will exceed that of astronomical data in the near future. The growth in computational capacity, based on Moore’s law, cannot keep pace with this increased computational demand. This thesis is motivated by the extensive demand for processing sequenced data. The author identifies several important related processes and aims to improve each of them. First, a comprehensive review of recent assembly pipelines is carried out to evaluate them. The results of the study reveal important facts that are used to design an efficient assembly practice. Second, a novel assembly pipeline is introduced that successfully balances the trade-off between speed and accuracy. Third, a fast and accurate sequence alignment algorithm is proposed that is the core of several steps in the assembly workflow, as well as of a wide range of other related analyses. Finally, a new data normalisation method is designed. Due to the probabilistic nature of genome assembly, evaluating accuracy is critical, and normalisation is a vital part of the evaluation process. Along with the normalisation method, the author proposes a metric to measure how well the data is normalised. This is the first time such a metric has been proposed.
The proposed assembly pipeline is 6 times faster than SPAdes and achieves 100 times greater contiguity than SOAPdenovo2. The proposed alignment algorithm is 14 times faster than the Smith-Waterman algorithm, yet for 99.99% of input sequence pairs it produces the same alignment as the Smith-Waterman algorithm. Finally, the proposed normalisation method is 949 times more accurate than vt-Normalize. en_US
dc.identifier.uri http://hdl.handle.net/1959.4/61762
dc.language English
dc.language.iso EN en_US
dc.publisher UNSW, Sydney en_US
dc.rights CC BY-NC-ND 3.0 en_US
dc.rights.uri https://creativecommons.org/licenses/by-nc-nd/3.0/au/ en_US
dc.subject.other Optimisation en_US
dc.subject.other Bioinformatic en_US
dc.subject.other Algorithm en_US
dc.title Analysis and optimisation of selected genomic algorithms en_US
dc.type Thesis en_US
dcterms.accessRights open access
dcterms.rightsHolder Bayat, Arash
dspace.entity.type Publication en_US
unsw.accessRights.uri https://purl.org/coar/access_right/c_abf2
unsw.identifier.doi https://doi.org/10.26190/unsworks/21174
unsw.relation.faculty Engineering
unsw.relation.originalPublicationAffiliation Bayat, Arash, Computer Science & Engineering, Faculty of Engineering, UNSW en_US
unsw.relation.originalPublicationAffiliation Parameswaran, Sri, Computer Science & Engineering, Faculty of Engineering, UNSW en_US
unsw.relation.originalPublicationAffiliation Ignjatovic, Aleksandar, Computer Science & Engineering, Faculty of Engineering, UNSW en_US
unsw.relation.school School of Computer Science and Engineering *
unsw.thesis.degreetype PhD Doctorate en_US
Files
Original bundle
Name:
public version.pdf
Size:
5.19 MB
Format:
application/pdf