   PSSM Viewer Help

What do the sequence conservation levels (complete, high, moderate, low) mean?

Sequence conservation in the PSSM Viewer is measured using the "information content" of each column in the PSSM. For each column that contains more than one residue type, the information content C is defined as the following sum over all residue types i in the column (excluding gap characters):

C = SUM over i { f(i) * log2[ f(i) / q(i) ] }

where
f(i) = weighted frequency of residue i (see help: How are the frequency bars calculated?)
log2 = logarithm to base 2
q(i) = background frequency of residue i in the BLOSUM62 matrix

Thus, if a column has the same residue frequencies as the BLOSUM62 background, C will be zero. If a column's residue frequencies are more concentrated on particular residues than the BLOSUM62 background, C will be positive. High values of C therefore indicate substantially more conservation than expected from BLOSUM62.
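
As an illustration of the sum above, here is a minimal Python sketch. The function name information_content, the dictionary-based inputs, and the uniform background used in the example are assumptions made for this illustration; the uniform values are a placeholder and are not the actual BLOSUM62 background frequencies, and this is not the PSSM Viewer's own code.

```python
import math

def information_content(freqs, background):
    """Return C = sum over i of f(i) * log2(f(i) / q(i)).

    freqs      -- dict mapping residue letter to weighted frequency f(i)
    background -- dict mapping residue letter to background frequency q(i)
    Gap characters ("-") and residues with zero frequency are skipped.
    """
    total = 0.0
    for residue, f in freqs.items():
        if residue == "-" or f <= 0.0:
            continue
        total += f * math.log2(f / background[residue])
    return total

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Placeholder background (assumption): uniform over the 20 amino acids.
# Substitute the real BLOSUM62 background frequencies in practice.
uniform_q = {aa: 1.0 / 20 for aa in AMINO_ACIDS}

# A column whose frequencies match the background gives C = 0.
matches_background = dict(uniform_q)

# A column split evenly between two residue types gives a positive C.
two_residue_column = {"L": 0.5, "I": 0.5}

c_same = information_content(matches_background, uniform_q)
c_two = information_content(two_residue_column, uniform_q)
print(c_same, c_two)
```

With the placeholder uniform background, the two-residue column evaluates to log2(10), roughly 3.32; with the real BLOSUM62 background the value would depend on which residues are present.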

The sequence conservation levels are defined as follows:
Full (complete) - 100% conservation in the column (only one residue type observed)
High - C > 3.0
Moderate - 1.5 < C <= 3.0
Low - C <= 1.5
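
The thresholds above can be sketched as a small Python function. The name conservation_level and its two parameters are hypothetical, chosen for this illustration only; the cutoffs are the ones listed above.

```python
def conservation_level(n_residue_types, c):
    """Map a column to a conservation level using the thresholds listed above.

    n_residue_types -- number of distinct residue types observed in the column
    c               -- information content C of the column (ignored when the
                       column contains a single residue type)
    """
    if n_residue_types == 1:
        return "Full"       # 100% conservation, only one residue type
    if c > 3.0:
        return "High"
    if c > 1.5:
        return "Moderate"   # 1.5 < C <= 3.0
    return "Low"            # C <= 1.5

for n_types, c_value in [(1, 0.0), (2, 3.2), (3, 3.0), (5, 1.5)]:
    print(n_types, c_value, conservation_level(n_types, c_value))
```

Checking the residue count first mirrors the definition: C is only defined for columns with more than one residue type, so a single-residue column is classified as Full without consulting C.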