AliBaba2: context specific identification of transcription factor binding sites

In Silico Biol. 2002;2(1):S1-15.

Abstract

Currently, transcription factor binding sites are widely predicted using matrices collected from the literature. This leads to several problems: we cannot actively control the conservation of the matrices, systematically use all available binding sites, know which sites were used and which were discarded during matrix construction, easily compare and evaluate matrices, detect redundancy, or control sensitivity and specificity. In short, we lack control over the identification process. This paper proposes a method to overcome these problems. It assumes that each binding site has an unknown context which determines its sequence. This leads to the idea of constructing specific matrices for each sequence under analysis. To do so, the identification of binding sites has to be regarded as a general process that starts from a dataset of known binding sites and ends with the identification of a potential new binding site. Such a process is presented in this paper. Besides overcoming the problems mentioned, the implementation also reaches a significantly higher accuracy than current approaches. Evaluations were performed on all binding sites of TRANSFAC 3.5 public. The resulting tool, AliBaba2, is available at http://wwwiti.cs.uni-magdeburg.de/grabe/alibaba2.
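To make the general idea of matrix-based site prediction concrete, the following is a minimal sketch of how a matrix can be built from a set of known, already aligned binding sites and then used to scan a sequence. It is not the AliBaba2 algorithm, which the abstract does not specify in detail. The function names `build_pwm` and `scan`, the pseudocount, the uniform background frequency of 0.25, the score threshold, and the example sites are all assumptions made for illustration.

```python
import math

BASES = "ACGT"

def build_pwm(sites, pseudocount=1.0):
    """Build a log-odds weight matrix from equal-length, aligned sites.

    Assumes a uniform background (0.25 per base) and sites made up only
    of A, C, G, T; both are simplifications chosen for this sketch.
    """
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    pwm = []
    for pos in range(length):
        # Start every base at the pseudocount to avoid log(0).
        counts = {b: pseudocount for b in BASES}
        for s in sites:
            counts[s[pos].upper()] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2((counts[b] / total) / 0.25) for b in BASES})
    return pwm

def scan(sequence, pwm, threshold):
    """Return (start, window, score) for windows scoring >= threshold."""
    seq = sequence.upper()
    width = len(pwm)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(b not in BASES for b in window):
            continue  # skip windows with ambiguous symbols such as N
        score = sum(pwm[i][b] for i, b in enumerate(window))
        if score >= threshold:
            hits.append((start, window, score))
    return hits

# Hypothetical, made-up sites; not taken from TRANSFAC.
example_sites = ["TATAAA", "TATAAA", "TATATA", "TATAAA"]
example_pwm = build_pwm(example_sites)
example_hits = scan("GGCTATAAAGG", example_pwm, threshold=4.0)
```

In this sketch the choice of which known sites go into `build_pwm` and the value of `threshold` are exactly the parameters that a fixed, literature-derived matrix hides from the user. Building the matrix per analysed sequence, as the abstract proposes, would correspond to selecting `sites` differently for each input.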

Publication types

  • Comparative Study

MeSH terms

  • Algorithms*
  • Binding Sites
  • Databases, Nucleic Acid
  • Databases, Protein
  • Software
  • Transcription Factors / genetics
  • Transcription Factors / metabolism*

Substances

  • Transcription Factors