Identification of pseudogenes in the Drosophila melanogaster genome

Nucleic Acids Res. 2003 Feb 1;31(3):1033-7. doi: 10.1093/nar/gkg169.

Abstract

Pseudogenes are copies of genes that cannot produce a protein. They can be detected from disruptions to their apparent coding sequence, such as frameshifts and premature stop codons. They are classed either as processed pseudogenes, made by reverse transcription from an mRNA, or as duplicated pseudogenes, which arise from duplication in the genomic DNA and subsequent disablement. Historically, there has been anecdotal evidence that the fruit fly (Drosophila melanogaster) has few pseudogenes. Investigators have linked this to a high rate of deletion of genomic DNA, for which genetic experiments on genome size provide evidence. Here, we apply to the fruit fly a homology-based pipeline, developed previously to identify pseudogenes in other eukaryotic genomes, to derive the first complete survey of its pseudogene population. We find approximately 100 pseudogenes, at least a sixth of which are candidate processed pseudogenes. Relative to the size of the proteome, this is a much lower proportion of pseudogenes than in the other eukaryotic genomes for which data are available (human, nematode and budding yeast). The closest-matching proteins to Drosophila pseudogenes are significantly longer than the average protein in the fly proteome (up to approximately 60% longer), in contrast to the situation in the three other eukaryotic genomes. This may be due to the persistence of fragments of longer genes. In the fly pseudogene population, we found the most pseudogenes for serine proteases (which are more abundant in the Drosophila lineage than in the other eukaryotes), immunoglobulin-motif-containing proteins and cytochromes P450. Data on the sequences and positions of the putative pseudogenes are available at: http://www.pseudogene.org/fly.
The small number of pseudogenes detected in the Drosophila genome and the greater mean length of their closest-matching proteins (possibly because remnants of genes encoding longer proteins are more likely to persist) are further evidence for a high rate of deletion of genomic DNA in the fruit fly. The data are useful for studies of molecular evolution in Drosophila.

MeSH terms

  • Animals
  • Chromosome Mapping
  • Cytochrome P-450 Enzyme System / genetics
  • Drosophila Proteins / genetics
  • Drosophila Proteins / physiology
  • Drosophila melanogaster / genetics*
  • Genes, Insect*
  • Genome
  • Immunoglobulins / genetics
  • Pseudogenes*
  • Sequence Deletion
  • Serine Endopeptidases / genetics

Substances

  • Drosophila Proteins
  • Immunoglobulins
  • Cytochrome P-450 Enzyme System
  • Serine Endopeptidases