Publication:
From organism diversity to micro-heterogeneity: confident assessment of fine-scale variation within metagenomic data

dc.contributor.advisor Cavicchioli, Ricardo en_US
dc.contributor.author Amos, Timothy en_US
dc.date.accessioned 2022-03-21T10:57:52Z
dc.date.available 2022-03-21T10:57:52Z
dc.date.issued 2011 en_US
dc.description.abstract The metagenome of a microbial community contains a large quantity of information about the inter-strain genetic variation present in that community. Genome assemblers using algorithms designed for use with isolate genomes obscure the inter-strain variation within metagenomic data. Analysing this variation in metagenomic data is further complicated by sequencing errors that add noise to the system by making base assignments ambiguous. In order to develop improved computational methods for metagenome analysis, simulations were performed using genome data of individual species. A software program, MetaSim, was used to generate simulated reads. Assemblies of these reads were used to investigate the development of an error model to confidently identify SNPs (Single Nucleotide Polymorphisms). This approach proved limited due to the nature of the MetaSim software and the insufficient availability of consistent, well-documented data. As an alternative approach, a graphical analysis of unitigs (high confidence contigs) was developed. This approach provided accurate predictions of whether each unitig in an assembly of simulated reads consisted of only one strain, or more. The approach included developing a system of rules describing the relationship between the number and proportions of strains in an assembly and the positioning of clusters in scatter plots. The differences in densities of clusters were used to help distinguish between ambiguous cluster patterns. Idealised assemblies of simulated reads without sequencing errors were produced, to examine how sequence quality affects the ability to make inferences about inter-strain variation. Computational clustering was investigated as a means of automating the analysis. Having established an approach to analyse unitigs, environmental metagenome data was analysed. This graphical analysis provided a well-supported and parsimonious interpretation of the number of strains present in metagenome data of an Antarctic lake community, and their proportions. en_US
dc.identifier.uri http://hdl.handle.net/1959.4/51820
dc.language English
dc.language.iso EN en_US
dc.publisher UNSW, Sydney en_US
dc.rights CC BY-NC-ND 3.0 en_US
dc.rights.uri https://creativecommons.org/licenses/by-nc-nd/3.0/au/ en_US
dc.subject.other Diversity en_US
dc.subject.other Metagenomics en_US
dc.subject.other Bioinformatics en_US
dc.subject.other Strains en_US
dc.title From organism diversity to micro-heterogeneity: confident assessment of fine-scale variation within metagenomic data en_US
dc.type Thesis en_US
dcterms.accessRights open access
dcterms.rightsHolder Amos, Timothy
dspace.entity.type Publication en_US
unsw.accessRights.uri https://purl.org/coar/access_right/c_abf2
unsw.identifier.doi https://doi.org/10.26190/unsworks/15386
unsw.relation.faculty Science
unsw.relation.originalPublicationAffiliation Amos, Timothy, Biotechnology & Biomolecular Sciences, Faculty of Science, UNSW en_US
unsw.relation.originalPublicationAffiliation Cavicchioli, Ricardo, Biotechnology & Biomolecular Sciences, Faculty of Science, UNSW en_US
unsw.relation.school School of Biotechnology & Biomolecular Sciences
unsw.thesis.degreetype Masters Thesis en_US
Files
Original bundle
Name: whole.pdf
Size: 1.16 MB
Format: application/pdf