GOLD: Genomewide Optimization of Locus Description

The paper is accessible here

The supplementary material is here


Background:

Identifying the genes, an essential task in post-genome biology, requires aligning the many cDNA sequences available in the public databases to the genome. However, the various programs used at the main genome annotation sites propose different solutions for the exact exon structure in about half the cases.

Results:

To help resolve this problem, we pool the mRNA-to-genome alignments proposed by NCBI, UCSC, Ensembl, AceView, and H-Inv for 74,106 mRNAs from 29,194 human genes. We carefully define a cost function and let “GOLD”, Genomewide Optimization of Locus Description, select the best alignment for each clone. We annotate the Gold alignments, discuss the distribution of introns and the minimal size of exons, classify the frequent rearrangements, and propose that variable tandem repeat number and micro-introns below 65 bp, which occur in 9% of the genes, are micro-polymorphisms. We show striking chromosomal and regional specificity in the control of gene duplication and discover that exact duplicates of genes containing introns are all clustered within 3.1 megabases of each other. We also observe interchromosomal and regional variability in the levels of base mismatch and rearrangements, annotate suspected defects, including frameshifts, in the genome and the cDNAs, and discuss the high frequency of intronless genes. Finally, we identify difficult alignments by comparing the programs. The current Gold alignments, their annotations, the C program and the acedb schema are available online, or from www.ncbi.nlm.nih.gov/IEB/Research/Acembly/GOLD. Contributions of new alignments are encouraged.
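
The selection step described above (pool the candidate alignments for each clone, score each with a cost function, keep the cheapest) can be illustrated with a minimal Python sketch. This is not the GOLD C program: the `Alignment` fields, the `cost` formula and its weight of 5 per non-standard intron are hypothetical choices made only to show the shape of the procedure; the actual cost function is the one defined in the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alignment:
    """One candidate mRNA-to-genome alignment (all fields are hypothetical)."""
    clone: str                # clone identifier
    source: str               # site that proposed the alignment
    clone_length: int         # length of the mRNA in bases
    aligned_bases: int        # bases of the mRNA placed on the genome
    mismatches: int           # base mismatches within the aligned part
    nonstandard_introns: int  # introns without standard boundaries


def cost(a: Alignment) -> int:
    """Illustrative cost only, NOT the paper's cost function.

    Penalizes unaligned bases, mismatches, and non-standard introns
    (the weight 5 is an arbitrary assumption).
    """
    unaligned = a.clone_length - a.aligned_bases
    return unaligned + a.mismatches + 5 * a.nonstandard_introns


def select_best(pool: list[Alignment]) -> dict[str, Alignment]:
    """Keep, for each clone, the pooled alignment with the lowest cost.

    On a tie, the alignment seen first in the pool is kept.
    """
    best: dict[str, Alignment] = {}
    for a in pool:
        current = best.get(a.clone)
        if current is None or cost(a) < cost(current):
            best[a.clone] = a
    return best
```

Calling `select_best` on a list that mixes alignments from several sites returns a dictionary keyed by clone, so the chosen alignment for each clone, and the site that proposed it, can be looked up directly.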

Conclusions:

Because GOLD extracts the best solutions to difficult alignment problems from all programs, it opens a new dimension toward a precise annotation of the human genes and genome.


The aim of the Gold project is to gather the most accurate alignments of human cDNAs from large-scale public projects on the human genome, and to use them to extend our understanding of the human genes.

The paper describes our main results and is linked to detailed supplementary material, itself linked to many lists of clones.

We would be happy to receive your questions, comments, or suggestions; please send us an email.


Last modified: Sat Sep 3 00:19:03 EDT 2005