k-mer sparse matrix model for genetic sequence and its applications in sequence comparison

J Theor Biol. 2014 Dec 21:363:145-50. doi: 10.1016/j.jtbi.2014.08.028. Epub 2014 Aug 23.

Abstract

Based on the k-mer model for genetic sequences, a k-mer sparse matrix representation is proposed to denote the types and sites of the k-mers appearing in a genetic sequence; there is a one-to-one relationship between a genetic sequence and its associated k-mer sparse matrix. From the singular value decomposition of the k-mer sparse matrix, a k-mer singular value vector is constructed and used to numerically quantify the characteristics of a genetic sequence. We investigate and evaluate the optimum value k* for the k-mer sparse matrix model. To show its usefulness, the model is applied to the comparison of genetic sequences, and the results demonstrate that the proposed method is powerful in analyzing and determining the relationships among genetic sequences.
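The abstract does not give the exact matrix layout, so the following Python sketch is only one plausible reading of it, not the authors' implementation. It assumes a 0/1 matrix with one row per k-mer type (all 4^k words over A, C, G, T, in lexicographic order) and one column per start site, and it takes the singular values of that matrix as the sequence descriptor. The function names, the zero-padding of the singular value vector to length 4^k, and the Euclidean distance used for comparison are all illustrative assumptions.

```python
from itertools import product

import numpy as np

ALPHABET = "ACGT"  # assumption: sequences contain only these four letters


def kmer_matrix(seq, k):
    """0/1 matrix: rows are k-mer types, columns are start sites (assumed layout)."""
    words = ["".join(p) for p in product(ALPHABET, repeat=k)]
    row_of = {w: i for i, w in enumerate(words)}
    n_sites = len(seq) - k + 1
    m = np.zeros((len(words), n_sites))
    for site in range(n_sites):
        # Exactly one k-mer type occurs at each site, so each column has a single 1.
        # Any other character (e.g. N) raises KeyError in this sketch.
        m[row_of[seq[site:site + k]], site] = 1.0
    return m, words


def sequence_from_matrix(m, words):
    """Recover the sequence, illustrating the one-to-one relationship."""
    rows = m.argmax(axis=0)  # k-mer type present at each site
    return words[rows[0]] + "".join(words[r][-1] for r in rows[1:])


def singular_value_vector(seq, k):
    """Singular values of the k-mer matrix, zero-padded to length 4**k."""
    m, _ = kmer_matrix(seq, k)
    sv = np.linalg.svd(m, compute_uv=False)  # returned in descending order
    padded = np.zeros(len(ALPHABET) ** k)
    padded[:len(sv)] = sv
    return padded


def distance(seq_a, seq_b, k):
    """Euclidean distance between singular value vectors (assumed comparison metric)."""
    return float(np.linalg.norm(singular_value_vector(seq_a, k)
                                - singular_value_vector(seq_b, k)))
```

Under this particular layout each column holds a single 1, so the nonzero singular values are the square roots of the k-mer counts; the construction in the paper may differ and carry more positional information. A pairwise matrix of `distance` values over a set of sequences could then be passed to a standard tree-building method for phylogenetic analysis.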

Keywords: Optimum value; Phylogenetic analysis; Singular value decomposition; k-mer Model.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Base Sequence / genetics*
  • Computational Biology / methods*
  • Models, Genetic*
  • Sequence Analysis / methods*