Coding DNA repeated throughout intergenic regions of the Arabidopsis thaliana genome: evolutionary footprints of RNA silencing

Mol Biosyst. 2009 Dec;5(12):1679-87. doi: 10.1039/b903031j. Epub 2009 Apr 30.

Abstract

Pyknons are non-random sequence patterns that are significantly repeated throughout non-coding genomic DNA and that also appear at least once among coding genes. They are interesting because they portend an unforeseen connection between coding and non-coding DNA. To date, pyknons have been reported only in the human genome, so it is unknown whether they have wider biological relevance or are simply a phenomenon of the human genome. To address this, DNA sequence patterns from the Arabidopsis thaliana genome were detected using a probability-based method. A total of 24 654 statistically significant sequence patterns, 16 to 24 nucleotides long and repeating 10 or more times in non-coding DNA, also appeared in 46% of A. thaliana protein-coding genes. A. thaliana pyknons exhibit features similar to human pyknons: they are distinct sequence patterns, have multiple instances in genes, and show remarkable similarity to small RNA sequences with roles in gene silencing. Chromosomal position mapping revealed that genomic pyknon density is concordant with the positional density of siRNAs and transposable elements. Because the A. thaliana and human genomes have approximately the same number of genes but drastically different amounts of non-coding DNA, these data reveal that pyknons represent a biologically important link between coding and non-coding DNA. Because of the association of pyknons with siRNAs and their localization to silenced regions of heterochromatin, we postulate that RNA-mediated gene silencing leads to the accumulation of gene sequences in non-coding DNA regions.
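The selection criteria stated in the abstract (patterns of a given length, repeated 10 or more times in non-coding DNA, and present at least once in coding sequence) can be illustrated with a minimal sketch. This is not the probability-based method used in the study: it counts exact fixed-length k-mers on a single strand and applies no test of statistical significance. The function names, the parameters `k` and `min_repeats`, and the toy sequences are all assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, List


def count_kmers(sequences: Iterable[str], k: int) -> Counter:
    """Count every overlapping k-mer across a collection of sequences."""
    counts: Counter = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def candidate_patterns(noncoding: List[str], coding: List[str],
                       k: int = 16, min_repeats: int = 10) -> Dict[str, int]:
    """Return k-mers seen at least `min_repeats` times in non-coding DNA
    that also occur at least once in a coding sequence.

    Hypothetical simplification: exact matches only, one strand only,
    and no statistical significance filter.
    """
    noncoding_counts = count_kmers(noncoding, k)
    coding_kmers = set(count_kmers(coding, k))
    return {kmer: n for kmer, n in noncoding_counts.items()
            if n >= min_repeats and kmer in coding_kmers}


def fraction_of_genes_hit(coding: List[str], patterns: Iterable[str],
                          k: int) -> float:
    """Fraction of coding sequences containing at least one candidate pattern."""
    pattern_set = set(patterns)
    if not coding:
        return 0.0
    hits = 0
    for gene in coding:
        gene = gene.upper()
        if any(gene[i:i + k] in pattern_set
               for i in range(len(gene) - k + 1)):
            hits += 1
    return hits / len(coding)


# Toy data with short k and a low threshold so the result is easy to check.
toy_noncoding = ["ACGTACGTTT", "GGACGTAC"]
toy_coding = ["TTACGTACAA", "CCCCCCCC"]
toy_patterns = candidate_patterns(toy_noncoding, toy_coding, k=6, min_repeats=2)
print(toy_patterns)                                        # {'ACGTAC': 2}
print(fraction_of_genes_hit(toy_coding, toy_patterns, 6))  # 0.5
```

To mirror the 16 to 24 nucleotide range in the abstract, such a sketch would be run once per value of `k`. A real analysis would also need to handle reverse complements and variable-length patterns, which this example leaves out.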

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Arabidopsis / genetics*
  • Base Sequence
  • Chromosome Mapping
  • DNA, Intergenic / chemistry
  • DNA, Intergenic / genetics*
  • Evolution, Molecular
  • Genome, Human
  • Genome, Plant*
  • Genomics / methods
  • Humans
  • Molecular Sequence Data
  • Open Reading Frames / genetics*
  • RNA Interference
  • Regulatory Elements, Transcriptional / genetics*
  • Sequence Analysis, DNA

Substances

  • DNA, Intergenic