Number of protein-coding genes

Value 20210 Unitless
Organism Mouse Mus musculus
Reference Church et al., Lineage-specific biology revealed by a finished genome assembly of the mouse. PLoS Biol. 2009 May 5; 7(5):e1000112. Abstract & p.2 right column 3rd paragraph. PubMed ID 19468303
Method P.5 left column bottom paragraph: "Using gene predictions for human and mouse from both NCBI [ref 29] and Ensembl [ref 30], [researchers] retained only those that were conserved either within or between the two species. Gene models were assessed for their reliability by: (i) comparing the exon boundaries in alignments of predicted orthologous and paralogous genes, (ii) considering whether mouse and human homologues lay within regions of conserved synteny, and (iii) automatically inspecting genes for reading frame disrupting mutations [ref 31]."
Comments Abstract: "In a comprehensive analysis of this revised genome sequence, [researchers] are now able to define 20,210 protein-coding genes, over a thousand more than predicted in the human genome (19,042 genes)." P.2 right column 2nd paragraph: "The availability of finished sequence for human, and now mouse, enables more-complete surveys of protein-coding genes in both species. [Researchers] now estimate that mouse and human reference genomes contain 20,210 and 19,042 protein-coding genes, respectively. The number of mouse genes [that] had been missing or substantially disrupted in the previous MGSCv3 assembly is 2,185. The majority of these arise from rodent lineage-specific duplications, often (61%) embedded within segmentally duplicated regions that were recalcitrant to WGSA [Whole Genome Sequence and Assembly]. Many of these mouse-specific genes may contribute to rodent-specific functions and, with their inclusion in the assembly, are now available for further investigation." For 19,735 protein-coding genes in C. elegans see BNID 101364
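The "over a thousand more" comparison in the Comments can be checked with simple arithmetic. The sketch below uses only the three counts quoted above. The variable names are illustrative, and the percentage assumes that the 2,185 previously missing or disrupted genes are counted among the 20,210 mouse genes, which the quoted text implies but does not state outright.

```python
# Gene counts as quoted in the Comments field (Church et al. 2009).
mouse_genes = 20210
human_genes = 19042
# Mouse genes missing or substantially disrupted in the MGSCv3 assembly.
previously_missing_or_disrupted = 2185

# Difference behind the "over a thousand more" statement.
difference = mouse_genes - human_genes

# Assumption: the 2,185 genes are a subset of the 20,210 mouse genes.
share_missing = previously_missing_or_disrupted / mouse_genes

print(f"Mouse minus human: {difference}")
print(f"Share missing or disrupted in MGSCv3: {share_missing:.1%}")
```

Under that assumption, the mouse count exceeds the human count by 1,168 genes, and roughly one in nine mouse genes was missing or substantially disrupted in the earlier assembly.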
Entered by Ron Milo - Admin
ID 100310