Towards the detection of horizontal gene transfer in metagenomic datasets

Access & Terms of Use
open access
Copyright: Song, Weizhi
Abstract
Horizontal gene transfer (HGT) is thought to be an important driving force for microbial evolution and adaptation, including the development of antibiotic resistance and niche adaptation. Metagenomics provides an opportunity to study HGT at the level of microbial communities; however, analysis methods for this are currently lacking. Here, I developed three bioinformatic pipelines to aid the detection of HGT in metagenomic datasets. Firstly, Binning_refiner was developed to improve the quality of genome bins derived from metagenomic datasets through the combination of different binning programs. The results demonstrated that Binning_refiner can significantly reduce the contamination level of genome bins and increase the total size of contamination-free genome bins. Secondly, HgtSIM was developed to simulate HGT events among microbial community members with user-defined mutation levels. It was developed for testing and benchmarking pipelines for recovering HGTs from complex microbial communities. Thirdly, MetaCHIP was developed to identify HGTs at the community level through the combination of best-match and phylogenetic approaches. Assessment of its performance on both simulated and real datasets showed that it can effectively predict HGTs with various degrees of genetic divergence from microbial communities. The results also showed that the detection of very recent gene transfers (i.e. those with genetic divergence < 5%) from metagenomic datasets is affected by the read assembly step, as the genomic background of recently transferred genes cannot be recovered with currently available assemblers. Finally, the potential application of long-read sequencing (PacBio) for metagenomics was explored. A simulated metagenome of pooled genomic DNA from ten marine bacteria with various degrees of genome similarity was sequenced on the PacBio sequencing platform, and ten complete high-quality genomes were assembled.
The findings and developments made here, including a reference-based read phasing approach for the assembly of highly similar genomes, can be used in the future to design strategies to analyse long-read sequencing data for mixed bacterial communities.
Author(s)
Song, Weizhi
Supervisor(s)
Thomas, Torsten
Egan, Suhelen
Publication Year
2019
Resource Type
Thesis
Degree Type
PhD Doctorate
Files
download public version.pdf 12.06 MB Adobe Portable Document Format