Comprehensive de novo structure prediction in a systems-biology context for the archaea Halobacterium sp. NRC-1

Genome Biol. 2004;5(8):R52. doi: 10.1186/gb-2004-5-8-r52. Epub 2004 Jul 12.

Abstract

Background: Large fractions of all fully sequenced genomes code for proteins of unknown function. Annotating these proteins remains a critical bottleneck for systems biology and is crucial to understanding the biological relevance of genome-wide changes in mRNA and protein expression and of protein-protein and protein-DNA interactions. The work reported here demonstrates that de novo structure prediction is now a viable option for providing general function information for many proteins of unknown function.

Results: We used Rosetta de novo structure prediction to predict three-dimensional structures for 1,185 proteins and protein domains (<150 residues in length) found in Halobacterium NRC-1, a widely studied halophilic archaeon. Predicted structures were searched against the Protein Data Bank to identify fold similarities and extrapolate putative functions. They were then analyzed in the context of a predicted association network built from several sources of functional association, including predicted protein interactions, predicted operons, phylogenetic profile similarity and domain fusion. To illustrate this approach, we highlight three cases where our combined procedure has provided novel insights into chemotaxis, possible prophage remnants in Halobacterium NRC-1 and archaeal transcriptional regulators.
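
As a rough illustration of how an association network of this kind might be assembled, the sketch below merges edge lists from several evidence sources into one undirected network and lists the partners of a query protein. It is a minimal sketch and not the authors' implementation: the function names (build_network, neighbors), the source labels and the protein identifiers (protA to protD) are hypothetical placeholders, and ranking partners by the number of supporting sources is an assumption made for illustration only.

```python
from collections import defaultdict

# Hypothetical evidence: each source maps to a list of (protein_a, protein_b)
# pairs. The identifiers are placeholders, not real Halobacterium NRC-1 genes.
evidence = {
    "predicted_interaction": [("protA", "protB"), ("protB", "protC")],
    "predicted_operon": [("protA", "protB")],
    "phylogenetic_profile": [("protC", "protD")],
    "domain_fusion": [("protB", "protA")],
}


def build_network(evidence):
    """Merge per-source edge lists into one undirected network.

    Returns a dict mapping each unordered protein pair (a frozenset) to the
    set of evidence sources that support that pair.
    """
    network = defaultdict(set)
    for source, pairs in evidence.items():
        for a, b in pairs:
            if a != b:  # skip self-associations
                network[frozenset((a, b))].add(source)
    return dict(network)


def neighbors(network, protein):
    """List (partner, sorted sources) for every edge touching `protein`.

    Partners are ordered by number of supporting sources (most first),
    then by name. This ordering is an illustrative assumption.
    """
    hits = []
    for pair, sources in network.items():
        if protein in pair:
            (partner,) = pair - {protein}
            hits.append((partner, sorted(sources)))
    return sorted(hits, key=lambda hit: (-len(hit[1]), hit[0]))


network = build_network(evidence)
for partner, sources in neighbors(network, "protB"):
    print(partner, sources)
```

In a workflow like the one described, a protein of unknown function with a predicted fold could be looked up this way to see whether its network partners point to the same putative function.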

Conclusions: Simultaneous analysis of the association network, coordinated mRNA level changes in microarray experiments and genome-wide structure prediction has allowed us to glean significant biological insights into the roles of several Halobacterium NRC-1 proteins of previously unknown function, and to significantly reduce the number of proteins encoded in the genome of this haloarchaeon for which no annotation is available.

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Archaeal Proteins / chemistry*
  • Archaeal Proteins / genetics
  • Archaeal Proteins / metabolism*
  • Bacteriophages / genetics
  • Bacteriophages / physiology
  • Chemotaxis
  • Computational Biology*
  • Databases, Genetic
  • Evolution, Molecular
  • Genome, Archaeal
  • Genomics
  • Halobacterium / chemistry*
  • Halobacterium / classification
  • Halobacterium / genetics
  • Halobacterium / metabolism*
  • Oligonucleotide Array Sequence Analysis
  • Protein Binding
  • Protein Structure, Tertiary
  • Proteomics*
  • RNA, Messenger / genetics
  • RNA, Messenger / metabolism
  • Software
  • Structure-Activity Relationship
  • Systems Biology*
  • Transcription Factors / chemistry
  • Transcription Factors / genetics
  • Transcription Factors / metabolism

Substances

  • Archaeal Proteins
  • RNA, Messenger
  • Transcription Factors