K-Mer-Based Genome Size Estimation in Theory and Practice

Methods Mol Biol. 2023:2672:79-113. doi: 10.1007/978-1-0716-3226-0_4.

Abstract

Recent advances in sequencing technologies have made genome sequencing of non-model organisms with very large and complex genomes possible. The data can be used to estimate diverse genome characteristics, including genome size, repeat content, and levels of heterozygosity. K-mer analysis is a powerful biocomputational approach with a wide range of applications, including estimation of genome sizes. However, interpretation of the results is not always straightforward. Here, I review k-mer-based genome size estimation, focusing specifically on k-mer theory and peak calling in k-mer frequency histograms. I highlight common pitfalls in data analysis and result interpretation, and provide a comprehensive overview of current methods and programs developed to conduct these analyses.
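
As a rough illustration of the peak-calling idea mentioned above, the following minimal Python sketch estimates genome size as the total number of non-error k-mers divided by the multiplicity of the main peak. It is a simplified textbook heuristic under stated assumptions, not the method of any program listed in the keywords, several of which fit statistical models instead. The function name `estimate_genome_size`, the first-valley error cutoff, and the toy histogram are all hypothetical and chosen only for illustration.

```python
def estimate_genome_size(histogram):
    """Naive genome size estimate from a k-mer frequency histogram.

    histogram: list of (multiplicity, number_of_distinct_kmers) pairs,
    sorted by increasing multiplicity.
    Returns (peak_multiplicity, estimated_genome_size).
    """
    # Assumption: low-multiplicity k-mers are mostly sequencing errors.
    # Use the first local minimum (valley) as the cutoff.
    valley_idx = None
    for i in range(1, len(histogram)):
        if histogram[i][1] > histogram[i - 1][1]:
            valley_idx = i - 1
            break
    if valley_idx is None:
        raise ValueError("no valley found; histogram has no distinct peak")

    genomic = histogram[valley_idx:]

    # Main peak: the multiplicity with the most distinct k-mers after the
    # valley. Assumption: this is the homozygous peak, which may not hold
    # for highly heterozygous or repeat-rich genomes.
    peak_mult, _ = max(genomic, key=lambda pair: pair[1])

    # Total k-mer occurrences at or above the valley, divided by peak depth.
    total_kmers = sum(mult * count for mult, count in genomic)
    return peak_mult, total_kmers / peak_mult


# Toy histogram (made-up numbers, not real data).
toy_histogram = [
    (1, 1000), (2, 200), (3, 50), (4, 80), (5, 150),
    (6, 200), (7, 150), (8, 80), (9, 30),
]
peak, size = estimate_genome_size(toy_histogram)
print(peak, size)
```

In this sketch, choosing the wrong peak (for example, the heterozygous peak at half the homozygous depth) would roughly double the estimate, which is one of the interpretation pitfalls that peak calling has to address.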

Keywords: BB-tools; CovEST; FindGSE; GCE; GenomeScope; Jellyfish; KSA; Kmergenie; RESPECT.

Publication types

  • Review

MeSH terms

  • Algorithms*
  • Base Sequence
  • Chromosome Mapping
  • Genome Size
  • Sequence Analysis, DNA / methods
  • Software*