Evolutionary Algorithms for Multiple Sequence Alignments

Naznin, Farhana

doi:10.26190/unsworks/15176

Evolutionary Algorithms for Multiple Sequence Alignments

Download files

Access & Terms of Use

open access
Copyright: Naznin, Farhana

CC BY-NC-ND 3.0

Abstract

Multiple sequence alignment (MSA) is a valuable technique for studying molecular evolution, analysing sequence structure, and drug design. To date, in the literature, many algorithms for solving the MSA problems, based on progressive, iterative and stochastic approaches, have been proposed. As their performances vary depending on the characteristics of the test problems, there is a need for an effective algorithm that consistently performs better over a range of problems. Therefore, the objective of this thesis was to develop a new algorithm for solving MSA problems more effectively. However, our systematic analyses of the existing approaches encouraged us to propose not only a new algorithm but also the sequential development of increasingly effective algorithms. Firstly, this thesis presents a new iterative method, namely MSA-IT, which has two new mechanisms: the first generates guide trees with randomly selected sequences; and the second shuffles the sequences inside these trees, within an iterative process, to generate new and better guide trees. This method is simple and fast. Its performance was tested on standard benchmark problems and, compared with well-known algorithms, was convincing. However, as its solutions made clear that there was room for further improvement, we developed another algorithm based on a genetic algorithm with a progressive alignment approach, namely GAPAM. In GAPAM, the initial population is generated using the two mechanisms used in MSA-IT, and two new genetic search operators are implemented. To test its performance on benchmark datasets, the results from GAPAM were compared with those from MSA-IT as well as the well-known algorithms. The comparisons showed that GAPAM not only provides better solutions than MSA-IT and the other algorithms for most of the test problems, but also achieves an overall better performance. However, analyses of the results uncovered the fact that, for sequences of longer length, GAPAM was trapped in local optima. To overcome this local optima problem, the vertical decomposition technique was used with GAPAM. In vertical decomposition, sequences are vertically divided into a number of shorter length MSA problems which are solved individually using a guide tree approach and then joined them together to form a complete MSA. This technique was applied to the solutions of both initial and child generations inside a GA and its algorithm called the Vertical Decomposition with Genetic Algorithm (VDGA). The performance of this method was compared with those of GAPAM, MSA-IT and the well-known algorithms which confirmed that VDGA outperformed all of them. The results also showed that, for shorter sequences, GAPAM and VDGA were not significantly different, and it was established that VDGA with three vertical divisions was the most successful variant compared with the other divisions considered. In this thesis, the Weigthed Sum of Pairs method is taken as the fitness measure and the benchmark datasets from the BAliBase 2.0 database are used for experimentation. For comparison, the algorithms which provide individual BAliscore (benchmark score) results for the BAliBase 2.0 datasets are considered, that is, MSA-GA, MSA-GA w/prealign, CLUSTAL W, RBT-GA, PRRP, SAGA, CLUSTAL X, MULTALIGN, PILEUP8, DIALIGN, ML_PIMA, HMMT and SB_ PIMA.

Persistent link to this record

http://hdl.handle.net/1959.4/51535

DOI

https://doi.org/10.26190/unsworks/15176

Author(s)

Naznin, Farhana

Supervisor(s)

Sarker, Ruhul

Essam, Daryl

Publication Year

2011

Resource Type

Thesis

Degree Type

PhD Doctorate

UNSW Faculty

Files

whole.pdf

3.29 MB

Adobe Portable Document Format

View full record Show statistics

Library

Evolutionary Algorithms for Multiple Sequence Alignments

Access & Terms of Use

Altmetric

Abstract

Persistent link to this record

DOI

Link to Publisher Version

Link to Open Access Version

Additional Link

Author(s)

Supervisor(s)

Creator(s)

Editor(s)

Translator(s)

Curator(s)

Designer(s)

Arranger(s)

Composer(s)

Recordist(s)

Conference Proceedings Editor(s)

Other Contributor(s)

Corporate/Industry Contributor(s)

Publication Year

Resource Type

Degree Type

UNSW Faculty

Files

Related dataset(s)