Evolutionary Algorithms for Multiple Sequence Alignments

Download files
Access & Terms of Use
open access
Copyright: Naznin, Farhana
Altmetric
Abstract
Multiple sequence alignment (MSA) is a valuable technique for studying molecular evolution, analysing sequence structure, and drug design. To date, in the literature, many algorithms for solving the MSA problems, based on progressive, iterative and stochastic approaches, have been proposed. As their performances vary depending on the characteristics of the test problems, there is a need for an effective algorithm that consistently performs better over a range of problems. Therefore, the objective of this thesis was to develop a new algorithm for solving MSA problems more effectively. However, our systematic analyses of the existing approaches encouraged us to propose not only a new algorithm but also the sequential development of increasingly effective algorithms. Firstly, this thesis presents a new iterative method, namely MSA-IT, which has two new mechanisms: the first generates guide trees with randomly selected sequences; and the second shuffles the sequences inside these trees, within an iterative process, to generate new and better guide trees. This method is simple and fast. Its performance was tested on standard benchmark problems and, compared with well-known algorithms, was convincing. However, as its solutions made clear that there was room for further improvement, we developed another algorithm based on a genetic algorithm with a progressive alignment approach, namely GAPAM. In GAPAM, the initial population is generated using the two mechanisms used in MSA-IT, and two new genetic search operators are implemented. To test its performance on benchmark datasets, the results from GAPAM were compared with those from MSA-IT as well as the well-known algorithms. The comparisons showed that GAPAM not only provides better solutions than MSA-IT and the other algorithms for most of the test problems, but also achieves an overall better performance. However, analyses of the results uncovered the fact that, for sequences of longer length, GAPAM was trapped in local optima. To overcome this local optima problem, the vertical decomposition technique was used with GAPAM. In vertical decomposition, sequences are vertically divided into a number of shorter length MSA problems which are solved individually using a guide tree approach and then joined them together to form a complete MSA. This technique was applied to the solutions of both initial and child generations inside a GA and its algorithm called the Vertical Decomposition with Genetic Algorithm (VDGA). The performance of this method was compared with those of GAPAM, MSA-IT and the well-known algorithms which confirmed that VDGA outperformed all of them. The results also showed that, for shorter sequences, GAPAM and VDGA were not significantly different, and it was established that VDGA with three vertical divisions was the most successful variant compared with the other divisions considered. In this thesis, the Weigthed Sum of Pairs method is taken as the fitness measure and the benchmark datasets from the BAliBase 2.0 database are used for experimentation. For comparison, the algorithms which provide individual BAliscore (benchmark score) results for the BAliBase 2.0 datasets are considered, that is, MSA-GA, MSA-GA w/prealign, CLUSTAL W, RBT-GA, PRRP, SAGA, CLUSTAL X, MULTALIGN, PILEUP8, DIALIGN, ML_PIMA, HMMT and SB_ PIMA.
Persistent link to this record
Link to Publisher Version
Link to Open Access Version
Additional Link
Author(s)
Naznin, Farhana
Supervisor(s)
Sarker, Ruhul
Essam, Daryl
Creator(s)
Editor(s)
Translator(s)
Curator(s)
Designer(s)
Arranger(s)
Composer(s)
Recordist(s)
Conference Proceedings Editor(s)
Other Contributor(s)
Corporate/Industry Contributor(s)
Publication Year
2011
Resource Type
Thesis
Degree Type
PhD Doctorate
UNSW Faculty
Files
download whole.pdf 3.29 MB Adobe Portable Document Format
Related dataset(s)