Characteristic enrichment of DNA repeats in different genomes

Proc Natl Acad Sci U S A. 1997 May 13;94(10):5237-42. doi: 10.1073/pnas.94.10.5237.

Abstract

Using computer programs developed for this purpose, we searched for various repeated sequences including inverted, direct tandem, and homopurine-homopyrimidine mirror repeats in various prokaryotes, eukaryotes, and an archaebacterium. Comparison of observed frequencies with expectations revealed that in bacterial genomes and organelles the frequency of different repeats is either random or enriched for inverted and/or direct tandem repeats. By contrast, in all eukaryotic genomes studied, we observed an overrepresentation of all repeats, especially homopurine-homopyrimidine mirror repeats. Analysis of the genomic distribution of all abundant repeats showed that they are virtually excluded from coding sequences. Unexpectedly, the frequencies of abundant repeats normalized for their expectations were almost perfect exponential functions of their size, and for a given repeat this function was indistinguishable between different genomes.
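
The comparison of observed repeat frequencies with expectations described above can be sketched in a few lines. This is a minimal illustration, not the authors' programs: the function names (`count_repeats`, `expected_repeats`, `enrichment`) are hypothetical, and the expectation assumes independent bases drawn from the sequence's own base composition, which the abstract does not specify. It counts only spacer-free repeats with an arm length of exactly `n` and assumes an uppercase sequence over A, C, G, T.

```python
# Sketch only: hypothetical helper names; the null model (independent bases
# at the observed composition) is an assumption, not taken from the paper.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(s):
    """Return the reverse complement of an uppercase ACGT string."""
    return "".join(COMPLEMENT[b] for b in reversed(s))


def is_homopurine_or_homopyrimidine(s):
    """True if s contains only purines (A, G) or only pyrimidines (C, T)."""
    return set(s) <= {"A", "G"} or set(s) <= {"C", "T"}


def count_repeats(seq, n):
    """Count windows of length 2n whose two arms form each repeat type."""
    counts = {"direct": 0, "inverted": 0, "hh_mirror": 0}
    for i in range(len(seq) - 2 * n + 1):
        left, right = seq[i:i + n], seq[i + n:i + 2 * n]
        if right == left:                       # direct tandem repeat
            counts["direct"] += 1
        if right == reverse_complement(left):   # inverted repeat (palindrome)
            counts["inverted"] += 1
        if right == left[::-1] and is_homopurine_or_homopyrimidine(left):
            counts["hh_mirror"] += 1            # homopurine-homopyrimidine mirror
    return counts


def expected_repeats(seq, n):
    """Expected counts if bases were independent with the observed composition."""
    p = {b: seq.count(b) / len(seq) for b in "ACGT"}
    windows = max(len(seq) - 2 * n + 1, 0)
    same = sum(p[b] ** 2 for b in "ACGT")                 # P(two bases match)
    paired = sum(p[b] * p[COMPLEMENT[b]] for b in "ACGT")  # P(two bases pair)
    purine = p["A"] ** 2 + p["G"] ** 2                    # P(match and purine)
    pyrimidine = p["C"] ** 2 + p["T"] ** 2                # P(match and pyrimidine)
    return {
        "direct": windows * same ** n,
        "inverted": windows * paired ** n,
        "hh_mirror": windows * (purine ** n + pyrimidine ** n),
    }


def enrichment(seq, n):
    """Observed count divided by expected count for each repeat type."""
    obs, exp = count_repeats(seq, n), expected_repeats(seq, n)
    return {k: obs[k] / exp[k] if exp[k] > 0 else float("nan") for k in obs}
```

Calling `enrichment(genome, n)` for a range of `n` and plotting the logarithm of each ratio against `n` is one way to look for the exponential dependence on repeat size reported in the abstract, under the assumptions stated above.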

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Algorithms
  • Animals
  • Base Composition
  • Base Sequence
  • Caenorhabditis elegans / genetics
  • Cyanobacteria / genetics
  • DNA / chemistry*
  • DNA / genetics
  • Escherichia coli / genetics
  • Genetic Code
  • Genetic Markers
  • Genome*
  • Genome, Bacterial
  • Haemophilus influenzae / genetics
  • Humans
  • Methanococcus / genetics
  • Nucleic Acid Conformation
  • Organelles
  • Probability
  • Purines
  • Pyrimidines
  • Repetitive Sequences, Nucleic Acid*
  • Saccharomyces cerevisiae / genetics
  • Species Specificity

Substances

  • Genetic Markers
  • Purines
  • Pyrimidines
  • DNA