Modelling protein structure using discrete backbone torsion angles

Access & Terms of Use
open access
Copyright: Smith, Philip Raymond
Abstract
Since the first attempts at modelling proteins, a large variety of simplified approximations have been used to reduce the computational complexity associated with an all-atom unconstrained model. One simplification is to restrict the allowable backbone torsion angles to a discrete subset of predetermined values in an effort to reduce the conformational space available to the molecule. Discrete-backbone models can be classified as on-lattice, where the allowable conformations can be placed onto a grid in Cartesian space, or off-lattice, where the only constraint imposed is that backbone torsion angles must come from a discrete set. The advantage of off-lattice models is that they can approximate real structures with a relatively small number of representative torsion angles. This work presents an analysis of the accuracy of an off-lattice model in which the conformation of each standard amino acid (and of amino acids preceding proline) is approximated by its own set of discrete phi and psi angles. To select which angles are included in the discrete set, the problem is treated as an instance of the more general k-means clustering problem. In k-means clustering, a set of representative points must be computed for a data set such that the overall distance from each point in the data set to its nearest representative is minimized. To determine the relationship between the complexity of a model and its accuracy, the structures of real proteins are approximated using a varying number of representative angles for each residue. Discrete approximations are constructed using two algorithms, one that aims to preserve local similarity and another that aims to maintain the overall conformation. Discrete structures are compared to native structures using local measures of similarity, such as deviation in torsion angles, as well as global measures, including the average distance between corresponding atoms when the real and discrete structures are superimposed.
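The selection of representative angles by k-means clustering can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names, the deterministic initialization from the first k points, and the use of a circular mean as the centre update are all assumptions made for illustration. The circular mean is a common choice for periodic data, but it is not exactly the minimizer of squared angular deviation.

```python
import math

def angle_diff(a, b):
    """Signed smallest difference a - b in degrees, wrapped to [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0

def torus_dist2(p, q):
    """Squared distance between two (phi, psi) pairs, respecting periodicity."""
    return angle_diff(p[0], q[0]) ** 2 + angle_diff(p[1], q[1]) ** 2

def circular_mean(angles):
    """Mean direction of a list of angles given in degrees."""
    s = sum(math.sin(math.radians(a)) for a in angles)
    c = sum(math.cos(math.radians(a)) for a in angles)
    return math.degrees(math.atan2(s, c))

def kmeans_torsions(points, k, iterations=50):
    """Return k representative (phi, psi) pairs for the given points.

    Hypothetical helper: initialization simply takes the first k points,
    which keeps the sketch deterministic but can give a poor local optimum.
    """
    if k < 1 or k > len(points):
        raise ValueError("k must be between 1 and the number of points")
    centres = [points[i] for i in range(k)]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centre.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: torus_dist2(p, centres[j]))
            clusters[nearest].append(p)
        # Update step: move each centre to the circular mean of its members.
        new_centres = []
        for j, members in enumerate(clusters):
            if members:
                new_centres.append((circular_mean([m[0] for m in members]),
                                    circular_mean([m[1] for m in members])))
            else:
                new_centres.append(centres[j])  # keep an empty cluster's centre
        if new_centres == centres:
            break
        centres = new_centres
    return centres

# Illustrative data only: three helix-like and three sheet-like (phi, psi) pairs.
sample = [(-60.0, -45.0), (-120.0, 130.0), (-65.0, -40.0),
          (-115.0, 135.0), (-55.0, -50.0), (-125.0, 125.0)]
representatives = kmeans_torsions(sample, 2)
```

Running the clustering separately on the torsion angles of each residue type would give a per-residue set of representative angles, while pooling all residues would give the single shared set.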
Additionally, the effect of using a single set of representative angles for all residue types versus a separate set of angles for each residue type is investigated. To identify protein segments that are not approximated well by a discrete representation, an empirical energy function is derived from the distribution of torsion angles in a database of known structures. The energy function is based on the Boltzmann distribution, which relates the probability of a system being in a particular state to the energy of that state. It is found that a discrete backbone representation is able to approximate a real structure well in terms of overall conformation as well as fine local structure. The average distance between corresponding atoms is less than 3 Å using only 5 representative angles per residue. Alpha-helices are well conserved using a discrete model, but hydrogen bonding in beta-sheets is often disrupted. Examination of the empirical energy shows that the extremities of regular structural elements (the termini of helices and the edges of sheets) deviate significantly from the standard geometry of such structures.
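The idea of deriving an energy from observed torsion-angle frequencies can be sketched by inverting the Boltzmann relation, E = -kT ln P. The following is a minimal illustration under stated assumptions, not the energy function used in the thesis: the 30-degree bin width, the pseudocount for unobserved bins, the reduced units (kT = 1), and the function names are all hypothetical.

```python
import math
from collections import Counter

def bin_index(phi, psi, bin_width=30.0):
    """Map a (phi, psi) pair in degrees to a 2D histogram bin."""
    n_bins = int(round(360.0 / bin_width))
    i = min(int(((phi + 180.0) % 360.0) // bin_width), n_bins - 1)
    j = min(int(((psi + 180.0) % 360.0) // bin_width), n_bins - 1)
    return i, j

def empirical_energy(observed, bin_width=30.0, pseudocount=1.0, kT=1.0):
    """Return {bin: energy} from observed (phi, psi) pairs.

    Each bin's probability is estimated from its count (plus a pseudocount,
    an assumption that keeps empty bins finite), and the energy follows
    from E = -kT * ln(P).
    """
    n_bins = int(round(360.0 / bin_width))
    counts = Counter(bin_index(phi, psi, bin_width) for phi, psi in observed)
    total = len(observed) + pseudocount * n_bins * n_bins
    energy = {}
    for i in range(n_bins):
        for j in range(n_bins):
            p = (counts[(i, j)] + pseudocount) / total
            energy[(i, j)] = -kT * math.log(p)
    return energy

# Illustrative data only: a frequently observed conformation and a rare one.
observations = [(-60.0, -45.0)] * 8 + [(-120.0, 130.0)]
energies = empirical_energy(observations)
common = energies[bin_index(-60.0, -45.0)]   # lower energy (more probable)
rare = energies[bin_index(-120.0, 130.0)]    # higher energy (less probable)
```

In this scheme, frequently observed torsion angles receive low energies and rarely observed ones receive high energies, so residues in unusual conformations stand out when energies are summed or plotted along a chain.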
Author(s)
Smith, Philip Raymond
Supervisor(s)
Curmi, Paul
Publication Year
2010
Resource Type
Thesis
Degree Type
Masters Thesis