Intelligent Systems Group
Computer Science Department
University of the Western Cape


"A fundamental goal of biology is to understand life at the level of genes, proteins and cells. Molecular biology and genetics are undergoing revolutionary changes. Emphasis has shifted from the study of individual genes and proteins to the exploration of the entire genome of an organism and the study of networks of genes and proteins. As the level of aspiration rises and the amount of available data grows by orders of magnitude, the field becomes increasingly dependent on mathematical and statistical modeling, mathematical analysis and computation." - Richard M. Karp

The use of information processing techniques in biological applications such as gene sequencing has led to the field of bioinformatics, where organisms are seen as vast information systems that can be modelled using traditional pattern recognition tools. As a result, the field has a growing need for researchers with a background in computer science. Specific problems of interest include (to name a few) alignment and comparison of biological sequences, description of sequence families, gene finding, discovery of protein binding sites and analysis of gene expression.

The projects below are a collaboration with the South African National Bioinformatics Institute, an international leader in its field that is well placed to identify which problems matter for practical applications such as cancer and AIDS research. These are preliminary suggestions; further topics will also be available.

Metrics for sequence comparison

We are investigating approximate metrics for performing sequence comparison in linear or near-linear time. Standard metrics of this type include the d^2 metric. This project will focus on some probabilistic extensions of this metric that have been shown to be effective in modelling phoneme sequence data.
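As a rough illustration of why a word-count metric of this kind runs in linear time, the sketch below computes the simplest whole-sequence form of a d^2-style distance: the sum of squared differences between the counts of every length-k word in two sequences. The function names and the default word length are assumptions made for this example; windowing, weighting and the probabilistic extensions the project will study are not shown.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping word of length k in seq (one pass over seq)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2_distance(a, b, k=2):
    """Sum of squared differences between the k-word counts of a and b.

    Assumption: plain, unweighted counts over the whole of each sequence.
    """
    counts_a = kmer_counts(a, k)
    counts_b = kmer_counts(b, k)
    # A Counter returns 0 for words it has not seen, so the union of
    # keys covers words that occur in only one of the two sequences.
    words = counts_a.keys() | counts_b.keys()
    return sum((counts_a[w] - counts_b[w]) ** 2 for w in words)


if __name__ == "__main__":
    print(d2_distance("ACGTACGT", "ACGTTCGT", k=2))
```

For a fixed word length, each sequence is scanned once and no alignment is computed, which is what keeps the cost proportional to the combined sequence length.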

Higher order hidden Markov modelling of sequence data

While first order hidden Markov models (HMMs) have become the standard approach to modelling biological sequence data, higher order models have traditionally been shunned because of their computational complexity. However, failure to model higher order dependencies is seen as a primary weakness of standard HMMs. This project will apply recent developments in the field of higher order HMMs that reduce the complexity of training these models, thereby making it possible to use them in practical applications.
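The complexity concern can be made concrete with a small sketch. One well-known way to handle a second order model is to rewrite it as a first order model whose states are pairs of the original states, which squares the size of the state space. The code below performs that expansion for the transition probabilities only; the function name and the dictionary layout are assumptions made for this example, and this is the naive construction, not the reduced-complexity training method the project refers to.

```python
from itertools import product


def expand_second_order(states, trans2):
    """Rewrite second order transitions as first order ones over state pairs.

    Assumed layout: trans2[(i, j)][k] = P(next = k | previous two = i, j).
    Result: trans1[(i, j)][(j, k)] holds the same probability, so an
    N-state second order chain becomes an N*N-state first order chain.
    """
    pairs = list(product(states, repeat=2))
    trans1 = {pair: {} for pair in pairs}
    for (i, j) in pairs:
        for k in states:
            # The pair (i, j) can only move to a pair that starts with j.
            trans1[(i, j)][(j, k)] = trans2[(i, j)][k]
    return trans1


if __name__ == "__main__":
    states = ["A", "B"]
    uniform = {pair: {s: 0.5 for s in states}
               for pair in product(states, repeat=2)}
    expanded = expand_second_order(states, uniform)
    print(len(expanded), "pair states")
```

With N original states, an order-n model expanded this way has N^n composite states, which is why the cost of standard training grows so quickly with model order.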

Last updated: September 2002