STOP: searching for transcription factor motifs using gene expression

Bioinformatics. 2007 Jul 15;23(14):1737-43. doi: 10.1093/bioinformatics/btm249. Epub 2007 May 8.

Abstract

Motivation: Existing computational methods that identify transcription factor (TF) binding sites on a gene's promoter are plagued by significant inaccuracies. Binding of a TF to a particular sequence is assessed by comparing the sequence's similarity score, computed from the TF's known position weight matrix (PWM), to a threshold. If the similarity score is above the threshold, the sequence is considered a putative binding site. Determining this threshold is a central part of the problem, for which no satisfactory biologically based solution exists.
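The thresholding step described above can be illustrated with a minimal sketch. It is not the STOP implementation: the count matrix, the pseudocount, the uniform background, the log-odds scoring scheme and the threshold value are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import math

BASES = "ACGT"

def log_odds_pwm(counts, pseudocount=1.0, background=0.25):
    """Turn per-position base counts into log2-odds scores (assumed scoring scheme)."""
    pwm = []
    for column in counts:
        total = sum(column[b] for b in BASES) + pseudocount * len(BASES)
        pwm.append({
            b: math.log2(((column[b] + pseudocount) / total) / background)
            for b in BASES
        })
    return pwm

def score_window(pwm, window):
    """Similarity score of one window: sum of the per-position log-odds."""
    return sum(column[base] for column, base in zip(pwm, window))

def putative_sites(pwm, promoter, threshold):
    """Return (start, score) for every window whose score exceeds the threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(promoter) - width + 1):
        score = score_window(pwm, promoter[start:start + width])
        if score > threshold:
            hits.append((start, score))
    return hits

# Toy count matrix for a 4-bp motif "ACGT" (illustrative data, not a real TF).
counts = [
    {"A": 10, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 10, "G": 0, "T": 0},
    {"A": 0, "C": 0, "G": 10, "T": 0},
    {"A": 0, "C": 0, "G": 0, "T": 10},
]
pwm = log_odds_pwm(counts)
hits = putative_sites(pwm, "TTACGTAA", threshold=5.0)  # threshold is arbitrary here
print(hits)
```

In this sketch the threshold is a hand-picked constant, which is exactly the arbitrary choice the abstract identifies as the weak point of sequence-only methods.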

Results: We present a method that integrates gene expression data with sequence-based scoring of TF binding sites to determine a global score threshold for each TF. We validate our method, STOP (Searching TFs Of Promoters), in several ways: (1) We calculate the average expression values of groups of human putative target genes of each TF and compare them to the corresponding averages for random gene groups. The groups of putative targets show significantly higher relative average expression. (2) We find high consistency between the induced lists of putative targets in human and in mouse. (3) The expression patterns of genes ordered by PWM score for each TF are highly similar between human and mouse, indicating that our method has a firm biological basis. (4) Comparison of results obtained by STOP and PRIMA (Elkon et al., 2003) suggests that determining the score threshold using gene expression, as is done in STOP, is more biologically tuned.
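Validation (1) compares the average expression of a TF's putative targets with averages for random gene groups. One common way to make such a comparison is an empirical (resampling) p-value, sketched below. The abstract does not state which statistic or test the authors used, so the procedure, the function name, the number of random groups and the toy expression values are all assumptions.

```python
import random
from statistics import mean

def empirical_p_value(expression, targets, n_random=1000, seed=0):
    """Compare the mean expression of `targets` with same-sized random gene groups.

    Returns the observed mean and the fraction of random groups (with the usual
    +1 correction) whose mean is at least as high as the observed one.
    """
    rng = random.Random(seed)
    genes = list(expression)
    observed = mean(expression[g] for g in targets)
    at_least_as_high = 0
    for _ in range(n_random):
        group = rng.sample(genes, len(targets))
        if mean(expression[g] for g in group) >= observed:
            at_least_as_high += 1
    return observed, (at_least_as_high + 1) / (n_random + 1)

# Toy data: 100 hypothetical genes with expression values 0..99.
expression = {f"g{i}": float(i) for i in range(100)}
# Hypothetical putative targets: the five most highly expressed genes.
targets = [f"g{i}" for i in range(95, 100)]

observed, p_value = empirical_p_value(expression, targets)
print(observed, p_value)
```

A small p-value here means that random groups rarely reach the targets' average expression, which is the kind of evidence validation (1) describes.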

Availability: The software package will be available to academic users upon request.

Supplementary information: Supplementary data are available on Bioinformatics online.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Amino Acid Motifs / genetics*
  • Animals
  • Computational Biology / methods*
  • Gene Expression Regulation*
  • Genomics
  • Humans
  • Mice
  • Models, Genetic
  • Models, Statistical
  • Oligonucleotide Array Sequence Analysis
  • Promoter Regions, Genetic
  • Software
  • Species Specificity
  • Transcription Factors / chemistry*
  • Transcription Factors / genetics*

Substances

  • Transcription Factors