K-Mer-Based Genome Size Estimation in Theory and Practice

Uljana Hesse

doi:10.1007/978-1-0716-3226-0_4

Methods Mol Biol. 2023;2672:79-113. doi: 10.1007/978-1-0716-3226-0_4.

Author

Uljana Hesse¹

Affiliation

¹ Department of Biotechnology, University of the Western Cape, Bellville, South Africa. uhesse@uwc.ac.za.

PMID: 37335470

Abstract

Recent advances in sequencing technologies have made it possible to sequence the genomes of non-model organisms, including those with very large and complex genomes. The resulting data can be used to estimate diverse genome characteristics, including genome size, repeat content, and level of heterozygosity. K-mer analysis is a powerful biocomputational approach with a wide range of applications, one of which is genome size estimation. However, interpreting the results is not always straightforward. Here, I review k-mer-based genome size estimation, focusing specifically on k-mer theory and on peak calling in k-mer frequency histograms. I highlight common pitfalls in data analysis and result interpretation, and provide a comprehensive overview of current methods and programs developed to conduct these analyses.
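
As a rough illustration of the basic idea behind peak calling, the sketch below estimates genome size from a k-mer frequency histogram by dividing the total number of k-mers by the multiplicity at the main peak. It is a textbook simplification and not the procedure of the chapter or of any of the listed programs: it assumes a single clear peak, ignores heterozygosity and repeat structure, and treats everything below the first valley as sequencing error. The function name `estimate_genome_size`, the input layout (pairs of multiplicity and number of distinct k-mers), and the toy numbers are all hypothetical.

```python
def estimate_genome_size(histogram):
    """Naive peak-based estimate from (multiplicity, n_distinct_kmers) pairs.

    Assumption: one dominant peak; low-multiplicity k-mers before the
    first valley are treated as sequencing errors and discarded.
    """
    counts = dict(histogram)
    mults = sorted(counts)

    # First valley: the last multiplicity before the counts start to rise.
    # If the counts never rise, fall back to the lowest multiplicity.
    valley = mults[0]
    for prev, cur in zip(mults, mults[1:]):
        if counts[cur] > counts[prev]:
            valley = prev
            break

    # Main peak: the multiplicity with the most distinct k-mers at or
    # beyond the valley.
    kept = [m for m in mults if m >= valley]
    peak = max(kept, key=lambda m: counts[m])

    # Total k-mer occurrences (multiplicity x distinct k-mers), divided
    # by the peak multiplicity, approximates the genome size in bases.
    total_kmers = sum(m * counts[m] for m in kept)
    return peak, round(total_kmers / peak)


# Toy histogram (made-up numbers, for illustration only).
toy_histogram = [(1, 1000), (2, 200), (3, 50), (4, 80),
                 (5, 150), (6, 80), (7, 20)]
peak, genome_size = estimate_genome_size(toy_histogram)
print(peak, genome_size)
```

On real data the histogram is often multi-modal (for example, separate heterozygous and homozygous peaks), which is one reason a simple maximum search like this one can pick the wrong peak.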

Keywords: BB-tools; CovEST; FindGSE; GCE; GenomeScope; Jellyfish; KSA; Kmergenie; RESPECT.

Publication types

Review

MeSH terms

Algorithms*
Base Sequence
Chromosome Mapping
Genome Size
Sequence Analysis, DNA / methods
Software*