Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes

J Mol Biol. 2001 Jan 19;305(3):567-80. doi: 10.1006/jmbi.2000.4315.

Abstract

We describe and validate a new membrane protein topology prediction method, TMHMM, based on a hidden Markov model. We present a detailed analysis of TMHMM's performance and show that it correctly predicts 97-98% of the transmembrane helices. Additionally, TMHMM can discriminate between soluble and membrane proteins with both specificity and sensitivity better than 99%, although the accuracy drops when signal peptides are present. This high degree of accuracy allowed us to reliably predict integral membrane proteins in a large collection of genomes. Based on these predictions, we estimate that 20-30% of all genes in most genomes encode membrane proteins, in agreement with previous estimates. We further found that proteins with N(in)-C(in) topologies are strongly preferred in all examined organisms except Caenorhabditis elegans, where the large number of seven-transmembrane (7TM) receptors increases the counts for N(out)-C(in) topologies. We discuss the possible relevance of this finding for our understanding of membrane protein assembly mechanisms. A TMHMM prediction service is available at http://www.cbs.dtu.dk/services/TMHMM/.
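The abstract states only that TMHMM is built on a hidden Markov model, so the topology comes from the most probable path of hidden states through a sequence. As a rough illustration of that general idea, here is a minimal sketch using a deliberately tiny, hypothetical model. It is not TMHMM's architecture or parameters, which are considerably more elaborate. The assumptions are: four states (cytoplasmic loop, inside-to-outside helix, non-cytoplasmic loop, outside-to-inside helix), a two-symbol hydrophobic/polar alphabet with an assumed hydrophobic residue set, and made-up probabilities. Splitting the helix into two direction-specific states forces consecutive loops to alternate sides of the membrane. The 0.6/0.4 start probabilities are arbitrary and serve only to break the inside/outside symmetry. The functions `viterbi` and `topology` are illustrative names, not part of any TMHMM interface.

```python
import math

# Hypothetical toy model for illustration only; NOT TMHMM's published model.
HYDROPHOBIC = set("AILMFVWC")  # assumed hydrophobic residue set

STATES = ["in", "helix_io", "out", "helix_oi"]
LABEL = {"in": "i", "helix_io": "M", "out": "o", "helix_oi": "M"}

# Arbitrary start probabilities; helices may not start at the N-terminus here.
START = {"in": 0.6, "helix_io": 0.0, "out": 0.4, "helix_oi": 0.0}
# Direction-specific helix states make loops alternate between in and out.
TRANS = {
    "in": {"in": 0.95, "helix_io": 0.05},
    "helix_io": {"helix_io": 0.9, "out": 0.1},
    "out": {"out": 0.95, "helix_oi": 0.05},
    "helix_oi": {"helix_oi": 0.9, "in": 0.1},
}
# Emissions over a reduced alphabet: 'h' (hydrophobic) or 'p' (polar).
EMIT = {
    "in": {"h": 0.3, "p": 0.7},
    "helix_io": {"h": 0.9, "p": 0.1},
    "out": {"h": 0.3, "p": 0.7},
    "helix_oi": {"h": 0.9, "p": 0.1},
}


def log(p):
    """Log-probability, with log(0) mapped to -inf."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(seq):
    """Return the most probable label string ('i', 'M', 'o') for a protein sequence."""
    if not seq:
        return ""
    obs = ["h" if aa in HYDROPHOBIC else "p" for aa in seq.upper()]
    # score[s]: best log-probability of any state path ending in state s.
    score = {s: log(START[s]) + log(EMIT[s][obs[0]]) for s in STATES}
    back = []  # back[t][s]: best predecessor of state s at position t + 1
    for o in obs[1:]:
        new_score, ptr = {}, {}
        for s in STATES:
            best_prev = max(STATES, key=lambda r: score[r] + log(TRANS[r].get(s, 0.0)))
            new_score[s] = score[best_prev] + log(TRANS[best_prev].get(s, 0.0)) + log(EMIT[s][o])
            ptr[s] = best_prev
        score = new_score
        back.append(ptr)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return "".join(LABEL[s] for s in path)


def topology(labels):
    """Count helices and report the side of the N- and C-terminal loops.

    Relies on the toy model never starting in a helix, so every helix is
    preceded by an 'i' or 'o' residue.
    """
    n_helices = labels.count("iM") + labels.count("oM")
    loops = labels.replace("M", "")
    side = {"i": "in", "o": "out"}
    return n_helices, side[loops[0]], side[loops[-1]]
```

For example, `viterbi("K" * 10 + "L" * 20 + "K" * 10)` labels the ten lysines as inside, the twenty leucines as one helix, and the last ten lysines as outside, so `topology` reports one helix with an N(in)-C(out) orientation. Adding a second polar/hydrophobic/polar stretch yields two helices and an N(in)-C(in) topology, the class the abstract reports as strongly preferred.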

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Bacterial Proteins / chemistry
  • Computational Biology / methods*
  • Databases as Topic
  • Fungal Proteins / chemistry
  • Genome*
  • Internet
  • Markov Chains*
  • Membrane Proteins / chemistry*
  • Plant Proteins / chemistry
  • Porins / chemistry
  • Protein Sorting Signals
  • Protein Structure, Secondary
  • Reproducibility of Results
  • Research Design
  • Sensitivity and Specificity
  • Software
  • Solubility

Substances

  • Bacterial Proteins
  • Fungal Proteins
  • Membrane Proteins
  • Plant Proteins
  • Porins
  • Protein Sorting Signals