Number of protein-coding genes

Value 20210 Unitless
Organism Mouse Mus musculus
Reference Church et al., Lineage-specific biology revealed by a finished genome assembly of the mouse. PLoS Biol. 2009 May 5;7(5):e1000112. Abstract & p.2 right column 3rd paragraph. PubMed ID 19468303
Method "Using gene predictions for human and mouse from both NCBI [29] and Ensembl [30], [researchers] retained only those that were conserved either within or between the two species. Gene models were assessed for their reliability by: (i) comparing the exon boundaries in alignments of predicted orthologous and paralogous genes, (ii) considering whether mouse and human homologues lay within regions of conserved synteny, and (iii) automatically inspecting genes for reading frame disrupting mutations [31]."[Numbers point to refs in article]
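The reliability checks quoted above can be pictured as a filter over predicted gene models. The sketch below is illustrative only: the `GeneModel` record, its field names, and the rule that a model must pass all three checks (a strict AND) are hypothetical assumptions, not the authors' actual pipeline, which describes the checks as assessments rather than necessarily as hard cutoffs.

```python
from dataclasses import dataclass


# Hypothetical record for one predicted gene model. Field names are
# illustrative assumptions, not taken from the paper, NCBI, or Ensembl.
@dataclass
class GeneModel:
    gene_id: str
    conserved: bool              # conserved within or between mouse and human
    exon_boundaries_agree: bool  # (i) exon boundaries agree in homologue alignments
    in_conserved_synteny: bool   # (ii) homologue lies in a region of conserved synteny
    frame_disrupted: bool        # (iii) a reading-frame-disrupting mutation was found


def is_reliable(model: GeneModel) -> bool:
    """Assumed rule: keep a conserved model only if it passes all three checks."""
    return (
        model.conserved
        and model.exon_boundaries_agree
        and model.in_conserved_synteny
        and not model.frame_disrupted
    )


def filter_models(models: list[GeneModel]) -> list[str]:
    """Return the IDs of models that the assumed rule retains."""
    return [m.gene_id for m in models if is_reliable(m)]


# Usage with two made-up models: the second fails the synteny check.
models = [
    GeneModel("g1", True, True, True, False),
    GeneModel("g2", True, True, False, False),
]
print(filter_models(models))  # ['g1']
```

A real pipeline would likely weigh these criteria differently (for example, rodent-specific duplicates need not lie in conserved synteny), so treat the AND rule as a simplification.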
Comments "In a comprehensive analysis of this revised genome sequence, [researchers] are now able to define 20,210 protein-coding genes, over a thousand more than predicted in the human genome (19,042 genes)." "The availability of finished sequence for human, and now mouse, enables more-complete surveys of protein-coding genes in both species. [Researchers] now estimate that mouse and human reference genomes contain 20,210 and 19,042 protein-coding genes, respectively. [A total of 2,185] mouse genes had been missing or substantially disrupted in the previous MGSCv3 assembly. The majority of these arise from rodent lineage-specific duplications, often (61%) embedded within segmentally duplicated regions that were recalcitrant to WGSA. Many of these mouse-specific genes may contribute to rodent-specific functions and, with their inclusion in the assembly, are now available for further investigation." For 19,735 protein-coding genes in C. elegans, see BNID 101364.
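The comparison quoted above ("over a thousand more" genes in mouse than in human) can be made explicit with a short calculation. This sketch uses only the counts stated in the entry; expressing the 2,185 previously missing or disrupted genes as a fraction of the 20,210 total is an added illustration, not a figure reported by the authors.

```python
# Counts taken from the quoted comments above.
mouse_genes = 20_210
human_genes = 19_042
missing_in_mgscv3 = 2_185  # mouse genes missing or disrupted in MGSCv3

difference = mouse_genes - human_genes              # exact gap between the two counts
ratio = mouse_genes / human_genes                   # mouse count relative to human
missing_fraction = missing_in_mgscv3 / mouse_genes  # share of the current mouse set

print(f"{difference:,} more protein-coding genes in mouse ({ratio:.3f}x human)")
print(f"{missing_fraction:.1%} of mouse genes were missing or disrupted in MGSCv3")
# 1,168 more protein-coding genes in mouse (1.061x human)
# 10.8% of mouse genes were missing or disrupted in MGSCv3
```

The gap of 1,168 genes is what the source rounds to "over a thousand more".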
Entered by Ron Milo - Admin
ID 100310