In silico virulence prediction and virulence gene discovery of Streptococcus agalactiae

Access & Terms of Use
open access
Copyright: Lin, Frank Po-Yen
Abstract
Physicians frequently face challenges in predicting which bacterial subpopulations are likely to cause severe infections. A more accurate prediction of virulence would improve diagnostics and limit the extent of antibiotic resistance. Bacterial pathogens can now be typed with high accuracy using advanced genotyping technologies. However, effective translation of bacterial genotyping data into assessments of clinical risk remains largely unexplored. The discovery of unknown virulence genes is another key determinant of successful prediction of infectious disease outcomes. The trial-and-error method for virulence gene discovery is time-consuming and resource-intensive. Selecting candidate genes with higher precision can thus reduce the number of futile trials. Several in silico candidate gene prioritisation (CGP) methods have been proposed to aid the search for genes responsible for inherited diseases in humans. How the CGP concept can assist with virulence gene discovery in bacterial pathogens has not yet been investigated. The main contribution of this thesis is to demonstrate the value of translational bioinformatics methods in addressing challenges in virulence prediction and virulence gene discovery.
This thesis studied an important perinatal bacterial pathogen, group B streptococcus (GBS), the leading cause of neonatal sepsis and meningitis in developed countries. While several antibiotic prophylactic programs have successfully reduced the number of early-onset neonatal diseases (infections that occur within 7 days of life), the prevalence of late-onset infections (infections that occur between 7–30 days of life) remained constant. In addition, the widespread use of intrapartum prophylactic antibiotics may introduce undue risk of penicillin allergy and may trigger the development of antibiotic-resistant microorganisms. To minimise such potential harm, a more targeted approach to antibiotic use is required.
Distinguishing virulent GBS strains from colonising counterparts thus lays the cornerstone for achieving the goal of tailored therapy. There are three aims of this thesis:
1. Prediction of virulence by analysis of bacterial genotype data: To identify markers that may be associated with GBS virulence, statistical analysis was performed on GBS genotype data consisting of 780 invasive and 132 colonising S. agalactiae isolates. Of a panel of 18 molecular markers studied, only the alp3 gene (which encodes a surface protein antigen commonly associated with serotype V) showed an increased association with invasive disease (OR=2.93, p=0.0003, Fisher’s exact test). Molecular serotype II (OR=10.0, p=0.0007) was found to have a significant association with early-onset neonatal disease when compared with late-onset disease. To investigate whether clinical outcomes can be predicted by the panel of genotype markers, logistic regression and machine learning algorithms were applied to distinguish invasive isolates from colonising isolates. However, the predictive analysis yielded only weak predictive power (area under the ROC curve, AUC: 0.56–0.71, stratified 10-fold cross-validation). It was concluded that a definitive predictive relationship between the molecular markers and clinical outcomes may be lacking, and that more discriminative markers of GBS virulence need to be investigated.
2. Development of two computational CGP methods to assist with functional discovery of prokaryotic genes: Two in silico CGP methods were developed based on comparative genomics: statistical CGP exploits the differences in gene frequency across phenotypic groups, while inductive CGP applies supervised machine learning to identify genes with similar occurrence patterns across a range of bacterial genomes. Three rediscovery experiments were carried out to evaluate the CGP methods:
a) Rediscovery of peptidoglycan genes was attempted with 417 published bacterial genome sequences.
Both CGP methods achieved their best AUC of >0.911 in the Escherichia coli K-12 genome and >0.978 in the Streptococcus agalactiae 2603 (SA-2603) genome, with an average improvement in precision of >3.2-fold and a maximum of >27-fold using statistical CGP. A median AUC of >0.95 could still be achieved with as few as 10 genome examples in each group in the rediscovery of the peptidoglycan metabolism genes.
b) A maximum 109-fold improvement in precision was achieved in the rediscovery of anaerobic fermentation genes.
c) In the rediscovery experiment with genes of 31 metabolic pathways in SA-2603, 14 pathways achieved an AUC >0.9 and 28 pathways achieved an AUC >0.8 with the best inductive CGP algorithms.
The results from the rediscovery experiments demonstrated that the two CGP methods can assist with the study of functionally uncategorised genomic regions and the discovery of bacterial gene-function relationships.
3. Application of the CGP methods to discover GBS virulence genes: Both statistical and inductive CGP were applied to assist with the discovery of unknown GBS virulence factors. Among a list of hypothetical protein genes, several highly ranked genes were plausibly involved in molecular mechanisms of GBS pathogenesis, including several genes encoding family 8 glycosyltransferases, family 1 and family 2 glycosyltransferases, multiple adhesins, streptococcal neuraminidase, staphylokinase, and other factors that may contribute to GBS virulence. Such genes may be candidates for further biological validation. In addition, the co-occurrence of these genes with currently known virulence factors suggested that the virulence mechanisms of GBS in causing perinatal diseases are multifactorial. The procedure demonstrated in this prioritisation task should assist with the discovery of virulence genes in other pathogenic bacteria.
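The abstract does not describe how statistical CGP is implemented, so the following is only a minimal sketch of the general idea it names: ranking genes by how unevenly they occur between two phenotypic groups of genomes, using a two-sided Fisher's exact test. The function names, the toy genome and gene identifiers, and the Haldane-Anscombe correction (adding 0.5 to each cell so the odds ratio stays finite) are assumptions made for illustration and are not taken from the thesis.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over every table that is no more likely than the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)


def corrected_odds_ratio(a, b, c, d):
    """Odds ratio with 0.5 added to each cell (an illustrative choice)."""
    return ((a + 0.5) * (d + 0.5)) / ((b + 0.5) * (c + 0.5))


def rank_genes(presence, phenotype):
    """Rank genes by Fisher's exact p-value, smallest first.

    presence:  gene name -> set of genome ids that carry the gene
    phenotype: genome id -> True if the genome shows the phenotype
    """
    pos = {g for g, has_it in phenotype.items() if has_it}
    neg = set(phenotype) - pos
    rows = []
    for gene, genomes in presence.items():
        a = len(genomes & pos)  # phenotype present, gene present
        b = len(pos) - a        # phenotype present, gene absent
        c = len(genomes & neg)  # phenotype absent, gene present
        d = len(neg) - c        # phenotype absent, gene absent
        rows.append((gene,
                     corrected_odds_ratio(a, b, c, d),
                     fisher_exact_two_sided(a, b, c, d)))
    return sorted(rows, key=lambda row: row[2])


# Hypothetical toy data: eight genomes, four of which show the phenotype.
phenotype = {f"g{i}": i <= 4 for i in range(1, 9)}
presence = {
    "geneA": {"g1", "g2", "g3", "g4"},  # found only in phenotype-positive genomes
    "geneB": {"g1", "g5"},              # evenly spread across both groups
    "geneC": set(phenotype),            # found in every genome
}
ranked = rank_genes(presence, phenotype)
for gene, oratio, p in ranked:
    print(f"{gene}\tOR={oratio:.2f}\tp={p:.4f}")
```

In this toy example the gene restricted to the phenotype-positive genomes is ranked first, while genes that are evenly distributed or universally present carry no signal. A real analysis over hundreds of genomes and thousands of genes would also need to account for multiple testing and for the phylogenetic relatedness of the genomes, neither of which this sketch addresses.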
Author(s)
Lin, Frank Po-Yen
Supervisor(s)
Coiera, Enrico
Sintchenko, Vitali
Publication Year
2009
Resource Type
Thesis
Degree Type
PhD Doctorate
UNSW Faculty