iNuc-STNC: a sequence-based predictor for identification of nucleosome positioning in genomes by extending the concept of SAAC and Chou's PseAAC

Muhammad Tahir; Maqsood Hayat

doi:10.1039/c6mb00221h

Mol Biosyst. 2016 Jul 19;12(8):2587-93. doi: 10.1039/c6mb00221h.

Authors

Muhammad Tahir¹, Maqsood Hayat¹

Affiliation

¹ Department of Computer Science, Abdul Wali Khan University, Mardan, Pakistan. Maqsood.hayat@gmail.com, m.hayat@awkum.edu.pk

PMID: 27271822

Abstract

The nucleosome is the fundamental unit of eukaryotic chromatin and participates in regulating a range of cellular processes. Because newly sequenced DNA is accumulating rapidly, an automated model for analyzing it is indispensable. Analyzing novel sequences with conventional methods is difficult, and sometimes impossible, because of vague motifs and the intricate structure of DNA. To address this, an effective, high-throughput automated model, "iNuc-STNC", is proposed to identify nucleosome positioning in genomes accurately and reliably. In the proposed model, DNA sequences are represented using three distinct feature extraction strategies: dinucleotide composition, trinucleotide composition and split trinucleotide composition (STNC). Various statistical models were used as learning hypotheses. A jackknife test was employed to evaluate the success rates of the proposed model. The experimental results showed that the support vector machine (SVM), in combination with STNC, achieved outstanding performance on all benchmark datasets. The success rates of "iNuc-STNC" are higher than those of the state-of-the-art methods reported in the literature so far. It is anticipated that the "iNuc-STNC" model will provide a rudimentary framework for the pharmaceutical industry in drug design.
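
The three feature representations and the jackknife test named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not define STNC, so the sketch assumes, by analogy with split amino acid composition (SAAC), that the sequence is cut into three contiguous segments and a trinucleotide composition is computed for each (3 x 64 = 192 values). All function names are hypothetical, the toy sequences are invented for illustration, and a nearest-centroid rule stands in for the SVM so that the example needs only the standard library.

```python
from itertools import product

ALPHABET = "ACGT"


def kmer_composition(seq, k):
    """Normalized frequency of every overlapping k-mer (4**k values).

    k=2 gives dinucleotide composition, k=3 trinucleotide composition.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows with ambiguous bases such as N
            counts[window] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]


def split_trinucleotide_composition(seq, parts=3):
    """ASSUMED STNC: trinucleotide composition of each contiguous segment,
    concatenated into one vector of parts * 64 values."""
    n = len(seq)
    bounds = [round(i * n / parts) for i in range(parts + 1)]
    features = []
    for start, end in zip(bounds, bounds[1:]):
        features.extend(kmer_composition(seq[start:end], 3))
    return features


def fit_centroids(xs, ys):
    """Stand-in learner (NOT an SVM): mean feature vector per class."""
    centroids = {}
    for label in set(ys):
        rows = [x for x, y in zip(xs, ys) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict_centroid(centroids, x):
    """Assign x to the class whose centroid is closest (squared Euclidean)."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: dist(centroids[label]))


def jackknife_accuracy(xs, ys, fit, predict):
    """Jackknife (leave-one-out) test: hold out each sample once, train on
    the rest, and report the fraction of held-out samples predicted correctly."""
    correct = 0
    for i in range(len(xs)):
        model = fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        correct += predict(model, xs[i]) == ys[i]
    return correct / len(xs)


# Toy data, invented for illustration only (1 = nucleosomal, 0 = linker).
toy_seqs = ["AAAAAAAA", "AAAATAAA", "GGGGGGGG", "GGGGCGGG"]
toy_labels = [1, 1, 0, 0]
toy_features = [kmer_composition(s, 2) for s in toy_seqs]
toy_accuracy = jackknife_accuracy(
    toy_features, toy_labels, fit_centroids, predict_centroid
)
```

To approximate the configuration the abstract reports as most successful, `split_trinucleotide_composition` would supply the features and an SVM implementation would replace `fit_centroids` and `predict_centroid`; the jackknife loop itself is independent of the learner.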

MeSH terms

Algorithms
Amino Acid Sequence
Animals
Base Sequence
Codon
Computational Biology / methods*
DNA / chemistry
DNA / genetics
DNA / metabolism
Databases, Genetic
Genome*
Humans
Models, Biological*
Models, Statistical
Nucleosomes / chemistry*
Nucleosomes / metabolism*
Sensitivity and Specificity
Support Vector Machine

Substances

Codon
Nucleosomes
DNA