Cael_CB4856_1.0

Organism name: Caenorhabditis elegans (roundworm)
Infraspecific name: Strain: CB4856
BioSample: SAMN03334911
BioProject: PRJNA275000
Submitter: University of Washington
Date: 2015/04/21
Assembly level: Chromosome
Genome representation: full
GenBank assembly accession: GCA_000975215.1 (latest)
RefSeq assembly accession: n/a
RefSeq assembly and GenBank assembly identical: n/a
WGS Project: JZEW01
Assembly method: RecursiveHomologyBasedAssemblyPlusJRAssembler v. 2014-07-01
Expected final version: no
Reference guided assembly: GCA_000002985.2
Genome coverage: 69.0x
Sequencing technology: Illumina

IDs: 338541 [UID] 1770628 [GenBank]

There are 26 assemblies for this organism

Comment

C. elegans Hawaiian strain CB4856 genome sequence 1.0

We have undertaken the construction of a C. elegans Hawaiian strain CB4856 reference genome sequence. The Hawaiian strain, CB4856, was isolated in 1972 by Linda Holden from a pineapple field on the ... Hawaiian island of Maui (under the name HA8). To complete our reference genome we took advantage of several very deep coverage MPS datasets for the Hawaiian genome, a new de novo assembly program (Chu et al. 2013), end sequences from a fosmid library for the Hawaiian genome (Perkins, J. D., 2010 Comparison of fosmid libraries made from two geographic isolates of Caenorhabditis elegans, University of British Columbia), recently released RNA-seq data, and low coverage genome sequence data from 49 recombinant inbred lines (RILs) (Li et al. 2006) and 60 introgression lines (ILs) (Doroszuk et al. 2009). Exploiting these resources and using a variety of software tools, we have modified the C. elegans N2 reference genome to generate a draft reference sequence for the Hawaiian genome.

Using a strategy similar to that employed in the analysis of different Arabidopsis accessions (Gan et al. 2011; Schneeberger et al. 2011), we first aligned the random genomic reads (69.5X coverage: 34.7M paired-end sequences of 104 bases each, from clones with an insert size of 321 bp, totaling 7220M bases) to the N2 reference genome, identified SNVs and indels, modified the N2 reference accordingly and realigned the reads, repeating the process 19 times to create a first version of the Hawaiian genome (20 cycles total). This process allowed extension of sequence into regions of high divergence, closed large deletions and built sequence into insertions. We used the JR-Assembler v1.0.4 (Chu et al. 2013) to create de novo assemblies of the same sequence reads, assessed their quality using the program REAPR (Hunt et al. 2013), breaking contigs as needed, and aligned the resultant contigs to the Hawaiian genome. To identify deletions previously missed, we scanned the genome for regions devoid of coverage, merging adjacent regions if they were separated only by short segments of either very low coverage or repeated sequences. For regions flanked by adjacent segments of the de novo assembled contigs, we used the contig to close the gap. To confirm that such segments were properly placed in the genome, we used the RIL data to establish their chromosomal location. The result is an initial draft reference Hawaiian genome with a total length of 97 Mb. Regions of excess coverage (>99x) suggest that we have failed to represent some duplicated segments, which total some 0.5 Mb in length. Also, the de novo assembly generated 22 contigs of 16 kb total length that we were unable to locate in the reference. We included only the 9 that were at least 500 bases in length. Just as the N2 reference has been improved through continuous community input, we expect that users will provide improvements here.
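The scan for regions devoid of coverage, with merging of nearby regions, can be illustrated with a minimal Python sketch. This is not the submitters' code: the function names, the per-base depth list, and the thresholds `max_sep` and `low_cov` are all hypothetical, and the sketch models only the "very low coverage" merging criterion, not the repeated-sequence criterion mentioned in the comment.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) in 0-based coordinates


def zero_coverage_regions(depth: List[int]) -> List[Interval]:
    """Return maximal runs of positions whose read depth is exactly zero."""
    regions: List[Interval] = []
    start = None
    for pos, d in enumerate(depth):
        if d == 0 and start is None:
            start = pos                      # a zero-depth run begins
        elif d != 0 and start is not None:
            regions.append((start, pos))     # the run ended just before pos
            start = None
    if start is not None:                    # run extends to the end
        regions.append((start, len(depth)))
    return regions


def merge_close_regions(
    regions: List[Interval],
    depth: List[int],
    max_sep: int = 3,   # hypothetical: longest separating segment to bridge
    low_cov: int = 2,   # hypothetical: depth still counted as "very low"
) -> List[Interval]:
    """Merge adjacent regions separated only by a short, low-depth segment."""
    merged: List[Interval] = []
    for start, end in regions:
        if merged:
            prev_start, prev_end = merged[-1]
            between = depth[prev_end:start]
            if len(between) <= max_sep and all(d <= low_cov for d in between):
                merged[-1] = (prev_start, end)
                continue
        merged.append((start, end))
    return merged


# Toy per-base depth profile (invented for illustration only).
depth = [30, 0, 0, 1, 0, 0, 0, 25, 28, 0, 0, 31]
candidates = merge_close_regions(zero_coverage_regions(depth), depth)
print(candidates)  # [(1, 7), (9, 11)]
```

In the toy profile, the first two zero-depth runs are bridged because the single base between them has depth 1, while the third run stays separate because the intervening bases are well covered.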

Credits:

Illumina Production sequencing - Leonid Kruglyak/Erik C. Andersen, Princeton University, Princeton, NJ USA.

Fosmid sequencing - Don Moerman, University of British Columbia, Canada.

Sequence assembly and data integration for creation of chromosomal files - Owen A. Thompson and Robert H. Waterston, Department of Genome Sciences at University of Washington School of Medicine, Seattle, WA USA

Global statistics

Total sequence length: 98,302,807
Total ungapped length: 98,298,214
Gaps between scaffolds: 0
Number of scaffolds: 16
Scaffold N50: 17,183,857
Scaffold L50: 3
Number of contigs: 17
Contig N50: 14,890,789
Contig L50: 3
Total number of chromosomes and plasmids: 7
Number of component sequences (WGS or clone): 16
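As an illustration of how the scaffold N50 and L50 values relate to scaffold lengths, here is a small Python sketch using the per-molecule lengths reported in the Assembly statistics table of this record. The function name `n50_l50` is hypothetical. One simplification is assumed: the 9 unplaced scaffolds are entered as a single 11,391-base lump because their individual lengths are not listed here. They are far smaller than the chromosomes, so this does not change the result.

```python
from typing import Iterable, Tuple


def n50_l50(lengths: Iterable[int]) -> Tuple[int, int]:
    """Return (N50, L50): walking from the longest sequence down, N50 is the
    length at which the running total first reaches half of the overall
    length, and L50 is how many sequences it took to get there."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, count
    raise ValueError("no lengths given")


scaffold_lengths = [
    14_890_789,  # Chromosome I
    14_885_952,  # Chromosome II
    13_596_826,  # Chromosome III
    17_183_857,  # Chromosome IV
    20_182_852,  # Chromosome V
    17_537_347,  # Chromosome X
    13_793,      # Mitochondrion MT
    11_391,      # assumption: 9 unplaced scaffolds lumped into one entry
]

print(n50_l50(scaffold_lengths))  # (17183857, 3)
```

The three longest scaffolds (chromosomes V, X and IV) together exceed half of the 98,302,807-base total, which is why the N50 equals the length of chromosome IV.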

Global assembly definition

Assembly Unit Name
Primary Assembly
non-nuclear
Assembly Unit: Primary Assembly (GCA_000975225.1)
Molecule name     GenBank sequence          RefSeq sequence   Unlocalized sequences count
Chromosome I      CM003206.1         n/a    n/a               0
Chromosome II     CM003207.1         n/a    n/a               0
Chromosome III    CM003208.1         n/a    n/a               0
Chromosome IV     CM003209.1         n/a    n/a               0
Chromosome V      CM003210.1         n/a    n/a               0
Chromosome X      CM003211.1         n/a    n/a               0
unplaced          n/a                n/a    n/a               9

Assembly statistics

Molecule           Total Length   Scaffold Count   Ungapped Length   Scaffold N50   Spanned Gaps   Unspanned Gaps
All                98,289,014     15               98,284,421        17,183,857     1              0
Chromosome I       14,890,789     1                14,890,789        14,890,789     0              0
Chromosome II      14,885,952     1                14,885,952        14,885,952     0              0
Chromosome III     13,596,826     1                13,596,826        13,596,826     0              0
Chromosome IV      17,183,857     1                17,183,857        17,183,857     0              0
Chromosome V       20,182,852     1                20,182,852        20,182,852     0              0
Chromosome X       17,537,347     1                17,532,754        17,537,347     1              0
unplaced           11,391         9                11,391            1,400          0              0

Molecule           Total Length   Scaffold Count   Ungapped Length   Scaffold N50   Spanned Gaps   Unspanned Gaps
Mitochondrion MT   13,793         1                13,793            13,793         0              0
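
The per-molecule figures can be cross-checked against the global statistics with simple arithmetic. The following Python sketch is only an illustrative consistency check on the numbers shown in this record; the variable names are hypothetical.

```python
# Total lengths of the Primary Assembly molecules, as listed in this record.
primary_lengths = {
    "Chromosome I": 14_890_789,
    "Chromosome II": 14_885_952,
    "Chromosome III": 13_596_826,
    "Chromosome IV": 17_183_857,
    "Chromosome V": 20_182_852,
    "Chromosome X": 17_537_347,
    "unplaced": 11_391,
}
mitochondrion_length = 13_793  # non-nuclear assembly unit

# Sum of the Primary Assembly rows should match the "All" row.
primary_total = sum(primary_lengths.values())

# Adding the mitochondrion should give the global "Total sequence length".
assembly_total = primary_total + mitochondrion_length

# Chromosome X is the only molecule with a spanned gap; the gap size is the
# difference between its total and ungapped lengths.
x_gap_bases = 17_537_347 - 17_532_754

# Removing the gap bases should give the global "Total ungapped length".
ungapped_total = assembly_total - x_gap_bases

print(primary_total)   # 98289014
print(assembly_total)  # 98302807
print(x_gap_bases)     # 4593
print(ungapped_total)  # 98298214
```

The single spanned gap on chromosome X accounts for the whole difference between the total and ungapped lengths, which is consistent with the 17 contigs reported for 16 scaffolds.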