IDs: 704988 [UID] 704988 [GenBank] 779818 [RefSeq]
Macaca fascicularis (cynomolgus macaque) Sequence Assembly Release Notes

The cynomolgus macaque DNA used for shotgun sequencing was derived from a female, 5.8 years old, provided by Dr. Jay Kaplan. The animal originated from "Tinjil", which is not a native location for cynomolgus macaques; rather, it is an island off the south coast of Java that was seeded with monkeys by the Washington National Primate Center. The original animal was trapped in eastern Sumatra.

Sequences were generated on the Illumina HiSeq for assisted and de novo assembly. Sequence genome coverage for each paired-end read type is as follows: 50x from 300-500 bp inserts, 10x from 3 kb inserts, and 2x from 8 kb inserts.

Two independent assemblies were built with all sequence data, using an assisted assembler and the de novo assembler SOAP. The workflow for the assisted assembly is as follows:

1) Map reads to the reference and filter alignments using SRprism (unpublished, but in the process of being written up), which reports all alignments of equally good quality. Filtering is done by first computing the histogram of per-library insert sizes seen in alignments, deciding which range to use (usually the tightest 99th percentile), and then retaining paired reads that have the correct orientation and an insert size in the desired range. Different data types (Illumina, traces, SOLiD, 454) have slightly different filtering criteria.
2) Use the mapped and filtered reads to build consensus contigs.
3) Find consecutive contigs that are bridged by mate pairs having 30-mers on either side of the gap, then perform de novo assembly in the gaps between bridged contigs. 30-mers from reads are used to build an index for de novo assembly; only filtered-out reads and reads mapped to contig ends go into that index. A predefined maximum gap size and number of iterations limit the resources spent on any particular gap.
4) Find structural differences between the scaffolds built and the reference by using paired reads with mates on different scaffolds, and do de novo gap filling between reordered scaffolds [in progress].

The reference genomes used to align cynomolgus macaque reads were the published version (MMUL_1) of rhesus macaque and an updated rhesus macaque assembly not yet published (courtesy of Aleksey Zimin). Using the assisted assembly as the reference, we aligned and merged the de novo assembly using the GAA tool.

In the final assembly, referred to as Macaca_fascicularis_5.0, there were 102,878 contigs with an N50 contig length of 85 kb. There were 7,627 supercontigs (scaffolds) with an N50 supercontig length of 144 Mb. A total of 2.8 Gb was assembled in contigs.

****************************************************
Macaca fascicularis Sequence and Assembly Credits

DNA source - Dr. Jay Kaplan, Wake Forest Primate Facility, Wake Forest, NC.
Genome Sequence - The Genome Institute, Washington University School of Medicine, St Louis, MO.
Sequence Assembly - Richa Agarwala, Sergey Shiryaev, NCBI and The Genome Institute, Washington University School of Medicine, St Louis, MO.
Assembly curation - LaDeana Hillier, The Genome Institute, Washington University School of Medicine, St Louis, MO.
FISH mapping data - Mariano Rocchi, Department of Biology, University of Bari, Bari, Italy.

Funding for the sequence characterization of the cynomolgus macaque genome was provided by NHGRI.

Author List: Richard K. Wilson, Wesley C. Warren
****************************************************
Chromosome lengths

Column 1 = Chromosome
Column 2 = Chromosome length in bp (including estimated gap sizes)
Column 3 = Chromosome sequence length in bp (without estimated gap sizes)

MFA1   227556264  217433370
MFA10   96509753   90761517
MFA11  137757926  132144036
MFA12  132586672  127191125
MFA13  111193037  106335528
MFA14  130733371  123895447
MFA15  112612857  107712928
MFA16   80997621   74103573
MFA17   96864807   92008008
MFA18   75711847   71766527
MFA19   59248254   51391499
MFA2   192460366  186559336
MFA20   78541002   72393001
MFA3   192294377  180410849
MFA4   170955103  164881207
MFA5   189454096  183527297
MFA6   181584905  175247550
MFA7   171882078  164071319
MFA8   146850525  140657447
MFA9   133195287  127272501
MFAX   152835861  144357465

An additional 69.8 Mb of sequence is unlocalized.

***********************************************************************************
Assembly statistics:

*** Contiguity: Contig ***
Total contig number: 102878
Total contig bases: 2805274345 bp
Average contig length: 27268 bp
Maximum contig length: 764150 bp
N50 contig length: 85974 bp
N50 contig number: 9304
Major contig (> 500 bp) number: 81458
Major_contig bases: 2801848696 bp
Major_contig avg contig length: 34396
Major_contig N50 contig length: 86137
Major_contig N50 contig number: 9284

*** Contiguity: Supercontig ***
Total supercontig number: 7627
Average supercontig length: 367808 bp
Maximum supercontig length: 221345846 bp
N50 supercontig length: 144445942 bp
N50 supercontig number: 8
Major supercontig (> 500 bp) number: 7587
Major_supercontig bases: 2805261977 bp
Major_supercontig avg ...
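
For readers unfamiliar with the N50 figures in the assembly statistics, the following is a minimal sketch of how such values are commonly computed. It assumes the standard definition: sort sequence lengths from longest to shortest and accumulate them until at least half of the total bases are covered. The length reached at that point is the N50 length, and the number of sequences accumulated is the "N50 number". The function name `n50` and the toy lengths are illustrative only and are not taken from the release notes or from the tools used for this assembly.

```python
def n50(lengths):
    """Return (N50 length, number of sequences needed to reach half the total).

    Assumes the common definition: walk the lengths from longest to shortest
    and stop once the running sum covers at least half of all bases.
    """
    if not lengths:
        raise ValueError("lengths must be non-empty")
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, count


# Toy contig lengths (hypothetical, not the Macaca_fascicularis_5.0 data).
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
n50_length, n50_count = n50(toy_contigs)
print(n50_length, n50_count)
```

Under this definition, the reported "N50 supercontig number: 8" would mean that the eight longest supercontigs together hold at least half of the assembled supercontig bases.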
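
The insert-size filtering described in step 1 of the assisted-assembly workflow can be sketched as follows. This is not SRprism and not the authors' code. It assumes that "tightest 99th percentile" means the narrowest interval containing 99% of the insert sizes observed for a library, which is one plausible reading. The names `tightest_range` and `filter_pairs`, and the `(insert_size, correctly_oriented)` tuple used to represent an aligned read pair, are hypothetical.

```python
import math


def tightest_range(insert_sizes, fraction=0.99):
    """Narrowest [low, high] interval holding at least `fraction` of the sizes."""
    if not insert_sizes:
        raise ValueError("insert_sizes must be non-empty")
    ordered = sorted(insert_sizes)
    # Small epsilon guards against float round-up (e.g. 297.00000000000006).
    window = max(1, math.ceil(fraction * len(ordered) - 1e-9))
    best = None
    # Slide a fixed-count window over the sorted sizes and keep the narrowest.
    for start in range(len(ordered) - window + 1):
        low, high = ordered[start], ordered[start + window - 1]
        if best is None or high - low < best[1] - best[0]:
            best = (low, high)
    return best


def filter_pairs(pairs, fraction=0.99):
    """Keep pairs with correct orientation and insert size in the chosen range.

    `pairs` is a list of (insert_size, correctly_oriented) tuples for one
    library; this representation is an assumption made for illustration.
    """
    low, high = tightest_range([size for size, _ in pairs], fraction)
    return [(size, ok) for size, ok in pairs if ok and low <= size <= high]


# Toy library (hypothetical values); 0.75 is used so four pairs suffice.
toy_pairs = [(100, True), (400, True), (410, False), (420, True)]
print(tightest_range([size for size, _ in toy_pairs], fraction=0.75))
print(filter_pairs(toy_pairs, fraction=0.75))
```

The range is computed per library because each library has its own insert-size distribution. The release notes also say that different data types use slightly different criteria, which this sketch does not model.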