The use of information processing techniques in biological applications such as gene sequencing has led to the field of bioinformatics, where organisms are seen as vast information systems that can be modelled using traditional pattern recognition tools. As a result, the field has a growing need for researchers with a background in computer science. Specific problems of interest include (to name a few) alignment and comparison of biological sequences, description of sequence families, gene finding, discovery of protein binding sites, and analysis of gene expression.
The projects below are a collaboration with the South African National Bioinformatics Institute, an international leader in its field that is in close touch with the problems relevant to practical applications such as cancer and AIDS research. These are preliminary suggestions; further topics will also be available.
Metrics for sequence comparison
We are investigating approximate metrics for performing sequence comparison in linear or near-linear time. Standard metrics of this type include the d^2 metric. This project will focus on some probabilistic extensions of this metric that have been shown to be effective in modelling phoneme sequence data.
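As a rough illustration of why word-count comparisons run in near-linear time, the sketch below computes a simplified form of the d^2 measure: the squared Euclidean distance between the k-word count vectors of two sequences. This is an assumption-laden simplification (a single word length, no windowing, no weighting) and does not include the probabilistic extensions the project will study. The function names and the default word length k=3 are hypothetical choices made for this example.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping word of length k in seq (one linear pass)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2_distance(seq_a, seq_b, k=3):
    """Squared Euclidean distance between the k-word count vectors.

    Simplified sketch: one fixed word length, whole sequences compared.
    """
    counts_a = kmer_counts(seq_a, k)
    counts_b = kmer_counts(seq_b, k)
    # Only words that occur in at least one sequence can contribute.
    words = counts_a.keys() | counts_b.keys()
    # Counter returns 0 for words it has not seen.
    return sum((counts_a[w] - counts_b[w]) ** 2 for w in words)


if __name__ == "__main__":
    print(d2_distance("ACGTACGT", "ACGTTCGT", k=2))
```

Each sequence is scanned once to build its counts, so the cost grows with the combined sequence length rather than with the product of the lengths, as it would for a full dynamic-programming alignment.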
Higher order hidden Markov modelling of sequence data
While first order hidden Markov models (HMMs) have become a standard approach in modelling biological sequence data, higher order models have traditionally been shunned because of their computational complexity. However, failure to model higher order dependencies is seen as a primary weakness of standard HMMs. This project will apply recent developments in the field of higher order HMMs that reduce the complexity of training these models, thereby making it possible to use them in practical applications.
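To make the complexity concern concrete, the sketch below shows the textbook way of handling a second order model: rewriting it as a first order model whose states are pairs of the original states. This is only the naive expansion, not the recent developments the project refers to. The function name and the layout of the `trans2` table are assumptions made for this example.

```python
from itertools import product


def expand_second_order(trans2, n_states):
    """Build a first order transition matrix over state pairs.

    trans2[i][j][k] is assumed to hold P(s_t = k | s_{t-2} = i, s_{t-1} = j).
    The pair (i, j) is given the index i * n_states + j.
    """
    size = n_states * n_states
    expanded = [[0.0] * size for _ in range(size)]
    for i, j, k in product(range(n_states), repeat=3):
        # The pair (i, j) can only move to a pair that starts with j.
        expanded[i * n_states + j][j * n_states + k] = trans2[i][j][k]
    return expanded


if __name__ == "__main__":
    # Hypothetical two-state model, for illustration only.
    trans2 = [[[0.9, 0.1], [0.5, 0.5]],
              [[0.2, 0.8], [0.3, 0.7]]]
    for row in expand_second_order(trans2, 2):
        print(row)
```

With N original states, an order m model expands to N^m pair states with N^(m+1) possible transitions, so each step of the standard forward or Viterbi recursions does work proportional to N^(m+1). That growth is what makes the naive approach impractical for larger orders.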
Last updated: September 2002