iNuc-STNC: a sequence-based predictor for identification of nucleosome positioning in genomes by extending the concept of SAAC and Chou's PseAAC

Mol Biosyst. 2016 Jul 19;12(8):2587-93. doi: 10.1039/c6mb00221h.

Abstract

The nucleosome is the fundamental unit of eukaryotic chromatin and participates in regulating various cellular processes. Owing to the rapid growth in the number of newly sequenced DNA primary sequences, an automated identification model is indispensable. However, identifying nucleosomes in novel DNA sequences with conventional methods is difficult or sometimes impossible because of vague motifs and the intricate structure of DNA. In this regard, an effective, high-throughput automated model, "iNuc-STNC", is proposed to identify nucleosome positioning in genomes accurately and reliably. In the proposed model, DNA sequences are represented using three distinct feature extraction strategies: dinucleotide composition, trinucleotide composition and split trinucleotide composition (STNC). Various statistical models were used as learning hypotheses, and the jackknife test was employed to evaluate the success rates of the proposed model. The experimental results showed that SVM in combination with STNC achieved outstanding performance on all benchmark datasets, and the success rates of "iNuc-STNC" are higher than those of the current state-of-the-art methods in the literature. It is anticipated that the "iNuc-STNC" model will provide a basic framework for the pharmaceutical industry in drug design.
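The abstract names three composition-based representations of a DNA sequence. The sketch below shows one plausible way to compute them, assuming dinucleotide (DNC) and trinucleotide (TNC) composition are the normalized frequencies of the 16 and 64 overlapping k-mers. The abstract does not say how STNC splits a sequence; by analogy with split amino acid composition (SAAC), which the title cites, this sketch assumes a split into a 5′ head segment, a middle segment and a 3′ tail segment, with TNC computed on each. The segment length `terminal_len` and all function names are hypothetical and are not taken from the paper.

```python
from itertools import product

NUCLEOTIDES = "ACGT"


def kmer_composition(seq, k):
    """Normalized frequency of each of the 4**k k-mers, using overlapping windows."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product(NUCLEOTIDES, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows containing ambiguous bases such as N
            counts[window] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]


def dinucleotide_composition(seq):
    """DNC: 16-dimensional feature vector."""
    return kmer_composition(seq, 2)


def trinucleotide_composition(seq):
    """TNC: 64-dimensional feature vector."""
    return kmer_composition(seq, 3)


def split_trinucleotide_composition(seq, terminal_len=20):
    """Hypothetical STNC (assumed SAAC-style split): TNC of the first
    terminal_len bases, the middle region and the last terminal_len bases,
    concatenated into a 3 * 64 = 192-dimensional vector."""
    if terminal_len < 3:
        raise ValueError("terminal_len must be at least 3")
    if len(seq) < 2 * terminal_len + 3:
        raise ValueError("sequence too short for the chosen terminal_len")
    head = seq[:terminal_len]
    middle = seq[terminal_len:len(seq) - terminal_len]
    tail = seq[len(seq) - terminal_len:]
    return (trinucleotide_composition(head)
            + trinucleotide_composition(middle)
            + trinucleotide_composition(tail))


if __name__ == "__main__":
    toy = "ACGT" * 30  # 120 bp toy sequence, not from any benchmark dataset
    print(len(dinucleotide_composition(toy)),
          len(trinucleotide_composition(toy)),
          len(split_trinucleotide_composition(toy)))  # → 16 64 192
```

Each function returns a fixed-length vector regardless of sequence length, which is what allows such features to be passed to a classifier such as an SVM; the paper's actual segment lengths and normalization may differ from this sketch.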

MeSH terms

  • Algorithms
  • Amino Acid Sequence
  • Animals
  • Base Sequence
  • Codon
  • Computational Biology / methods*
  • DNA / chemistry
  • DNA / genetics
  • DNA / metabolism
  • Databases, Genetic
  • Genome*
  • Humans
  • Models, Biological*
  • Models, Statistical
  • Nucleosomes / chemistry*
  • Nucleosomes / metabolism*
  • Sensitivity and Specificity
  • Support Vector Machine

Substances

  • Codon
  • Nucleosomes
  • DNA