Protein family classification using sparse Markov transducers
Eleazar Eskin
William Noble Grundy
Yoram Singer
Proceedings of the Eighth International Conference on
Intelligent Systems for Molecular Biology. August 20-23, 2000.
To appear.
Abstract
In this paper we present a method for classifying proteins into
families using sparse Markov transducers (SMTs). Sparse Markov
transducers, like probabilistic suffix trees, estimate a probability
distribution over an output symbol conditioned on an input sequence. SMTs
generalize probabilistic suffix trees by allowing for wild-cards
in the conditioning sequences. Because substitutions of amino
acids are common in protein families, incorporating wild-cards
into the model significantly improves classification performance.
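To make the role of wild-cards concrete, here is a minimal sketch in Python. It is not the authors' SMT algorithm. It assumes a context is a list of amino-acid symbols in which a hypothetical `WILDCARD` marker (here `None`) matches any residue. It estimates the next-symbol distribution by simple maximum-likelihood counting rather than by the mixture over sparse trees that SMTs use. The sketch shows why a substitution (G to L in the middle position) breaks a fixed context of the kind a probabilistic suffix tree would use, while a sparse context with a wild-card still matches.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence

WILDCARD = None  # hypothetical marker: this position matches any amino acid


def matches(context: Sequence[Optional[str]], history: str) -> bool:
    """True if `context` matches the last len(context) symbols of `history`.

    context[-1] aligns with the most recent symbol; WILDCARD positions match anything.
    """
    n = len(context)
    if n > len(history):
        return False
    window = history[len(history) - n:]
    return all(c is WILDCARD or c == h for c, h in zip(context, window))


def next_symbol_distribution(context: Sequence[Optional[str]],
                             sequences: List[str]) -> Dict[str, float]:
    """Maximum-likelihood estimate of P(next symbol | context) from training sequences.

    A simplified count-based estimate for illustration only, not the SMT mixture.
    """
    counts: Counter = Counter()
    for seq in sequences:
        # i is the index of the symbol to predict; seq[:i] is the history before it.
        for i in range(len(context), len(seq)):
            if matches(context, seq[:i]):
                counts[seq[i]] += 1
    total = sum(counts.values())
    return {s: c / total for s, c in counts.items()} if total else {}


if __name__ == "__main__":
    train = ["AGCK", "ALCK", "AGCR"]  # toy sequences; "ALCK" carries a G->L substitution
    print(next_symbol_distribution(["A", "G", "C"], train))     # fixed (PST-style) context
    print(next_symbol_distribution(["A", WILDCARD, "C"], train))  # sparse context
```

With the fixed context `A G C`, the substituted sequence `ALCK` contributes nothing, so the estimate rests on two examples. With the sparse context `A * C`, all three sequences contribute, which is the intuition behind adding wild-cards to the conditioning sequences.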
We present two models for building protein family classifiers
using SMTs. We also present efficient data structures that reduce
the memory requirements of the models. We evaluate SMTs by building
protein family classifiers using the Pfam database and compare our
results to previously published results.