Number of coding DNA sequences

Value 552 unitless
Organism Archaea Nanoarchaeum equitans
Reference Waters E et al., The genome of Nanoarchaeum equitans: insights into early archaeal evolution and derived parasitism. Proc Natl Acad Sci U S A. 2003 Oct 28; 100(22):12984-8. p. 12985 left column bottom line. PubMed ID 14566062
Method "A set of computational methods was applied to the N. equitans genome. Two gene prediction programs, GLIMMER (12) and CRITICA (13), were run on the assembled sequences. The results of the two programs were merged to generate a unique set of genes. When the two programs selected different start codons for genes with the same stop codon, the longer gene was included in the set for further analysis. Additional genes were identified in the intergenic regions by using TBLASTN to compare DNA sequences with protein sequences from other archaeal genomes. The unique set of genes was then translated into amino acid sequences and subjected to BLASTP searches (with an E value cutoff of 10^-10) against the nonredundant amino acid protein database (http://ncbi.nlm.nih.gov) (14). The predicted protein set was searched against the InterPro database release 3.1 (15) by using software modified from the original iprscan programs provided by InterPro. The predicted protein set was also searched against the NCBI Clusters of Orthologous Groups database mid-2001 update (16). Finally, gene family analysis was performed by using the NCBI BLASTCLUST program." (numbers in parentheses point to refs in article).
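The merging rule quoted in the Method (same stop codon, different start codons, keep the longer gene) can be illustrated with a minimal sketch. This is not the authors' pipeline: the `Gene` record, the `merge_predictions` function and the toy coordinates are hypothetical, and the sketch assumes linear coordinates with no wrap-around at the origin of the circular chromosome.

```python
from typing import Iterable, List, NamedTuple


class Gene(NamedTuple):
    """Hypothetical gene call: start-codon and stop-codon coordinates plus strand."""
    start: int
    stop: int
    strand: str  # "+" or "-"


def gene_length(gene: Gene) -> int:
    # Length in bp, valid for either strand (assumes no wrap-around).
    return abs(gene.stop - gene.start) + 1


def merge_predictions(set_a: Iterable[Gene], set_b: Iterable[Gene]) -> List[Gene]:
    """Merge two prediction sets into one unique set.

    Calls sharing a stop codon (same position and strand) are treated as the
    same gene; when their start codons differ, the longer call is kept.
    """
    best = {}
    for gene in list(set_a) + list(set_b):
        key = (gene.stop, gene.strand)
        current = best.get(key)
        if current is None or gene_length(gene) > gene_length(current):
            best[key] = gene
    # Order the merged set by leftmost genome coordinate.
    return sorted(best.values(), key=lambda g: min(g.start, g.stop))


# Toy input standing in for the output of two gene finders (made-up coordinates).
calls_a = [Gene(100, 400, "+"), Gene(900, 600, "-")]
calls_b = [Gene(130, 400, "+"), Gene(950, 600, "-"), Gene(1200, 1500, "+")]
merged = merge_predictions(calls_a, calls_b)
```

Keying on stop codon and strand reflects the fact that two predictions of the same open reading frame can disagree only about where the gene starts.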
Comments "The genome of N. equitans (GenBank accession no. AACL01000000) consists of a single, circular chromosome of 490,885 bp [BNID 105503] and has an average G+C content of 31.6%. All 61 sense codons are used, but in line with the low G+C content the third codon position is mainly A or T. [Researchers] identified 552 coding DNA sequences (CDS) with an average length of 827 bp."
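As a rough consistency check on the figures quoted in the Comments, the approximate fraction of the chromosome covered by CDS can be estimated from the CDS count, the average CDS length and the genome size. The sketch below is only an estimate: it assumes the rounded average length is representative and that CDS do not overlap, and the variable names are illustrative.

```python
# Figures taken from the Comments field of this entry.
genome_length_bp = 490_885
cds_count = 552
mean_cds_length_bp = 827  # rounded average, so the total below is approximate

# Approximate total CDS length, ignoring any overlap between neighbouring CDS.
total_cds_bp = cds_count * mean_cds_length_bp

# Approximate fraction of the chromosome occupied by CDS.
coding_fraction = total_cds_bp / genome_length_bp

print(f"Total CDS length: about {total_cds_bp} bp")
print(f"Approximate coding fraction: {coding_fraction:.1%}")
```

Under these assumptions the CDS sum to about 456,504 bp, or roughly 93% of the chromosome.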
Entered by Uri M
ID 105502