Number of protein coding genes

Value ~ unitless Range: ~12500 unitless
Organism Amoeba Dictyostelium discoideum
Reference Eichinger et al., The genome of the social amoeba Dictyostelium discoideum. Nature. 2005 May 5; 435(7038):43-57. p. 47 right column 2nd paragraph. PubMed ID 15875012
Method "Full details are provided in the Supplementary Information. Briefly, automated gene prediction was performed using a combination of programs that had been trained on well-characterized D. discoideum genes, and the results integrated with reference to D. discoideum complementary DNA sequences and homology to genes in other species. Other features in the predicted proteins, and other sequence features, were identified using a variety of software packages."
Comments "Of the 13,541 predicted proteins, 47.5% are represented by qualified ESTs [expressed sequence tags], reflecting the inevitable bias in EST sampling. Among the shortest predicted proteins, fewer are represented by ESTs (for example, 21% of those of <60 amino acids); this is at least partly due to a higher level of overprediction. On the basis of the simplifying assumption that 50% of all genes coding for proteins of <100 amino acids are mis-predictions, researchers estimate the true number of genes at roughly 12,500. This number is closer to that seen in multicellular organisms rather than in most unicellular eukaryotes (Table 2)."
Entered by Uri M
ID 105514
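
Note on the arithmetic in the Comments field: the correction described there amounts to subtracting the assumed mis-predicted fraction of short (<100 amino acid) predictions from the 13,541 total. The sketch below is only an illustration of that reasoning. This entry does not state how many predicted proteins are shorter than 100 amino acids, so the figure of about 2,082 is back-calculated from the rounded 12,500 estimate and is an assumption, not a value from the paper. The function names are hypothetical.

```python
# Illustrative sketch of the correction described in the Comments field.
# Values taken from this entry: 13,541 predicted proteins, ~12,500 estimated
# true genes, and an assumed 50% mis-prediction rate for proteins <100 aa.

PREDICTED_TOTAL = 13541        # predicted proteins (from the entry)
ESTIMATED_TRUE = 12500         # rounded estimate (from the entry)
MISPREDICTED_FRACTION = 0.5    # simplifying assumption quoted in the entry


def estimate_true_gene_count(predicted_total, short_predictions,
                             mispredicted_fraction=MISPREDICTED_FRACTION):
    """Subtract the assumed mis-predicted share of short predictions."""
    return predicted_total - mispredicted_fraction * short_predictions


def implied_short_predictions(predicted_total, estimated_true,
                              mispredicted_fraction=MISPREDICTED_FRACTION):
    """Back-calculate how many short predictions the estimate implies.

    This is NOT a number reported in the entry; it is inferred from the
    rounded estimate and is therefore only approximate.
    """
    return (predicted_total - estimated_true) / mispredicted_fraction


if __name__ == "__main__":
    short = implied_short_predictions(PREDICTED_TOTAL, ESTIMATED_TRUE)
    print(f"Implied predictions of <100 aa: about {short:.0f}")
    print(f"Corrected gene count: "
          f"{estimate_true_gene_count(PREDICTED_TOTAL, short):.0f}")
```

Because the 12,500 figure is itself rounded ("roughly 12,500"), the implied count of short predictions should be read as an order-of-magnitude check rather than a reported result.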