GEO Accession viewer

Series GSE40522

Status

Public on Aug 31, 2012

Title

ENCODE PSU Hardison RnaSeq

Project

Mouse ENCODE

Organism

Mus musculus

Experiment type

Expression profiling by high throughput sequencing

Summary

This data was generated by ENCODE. If you have questions about the data, contact the submitting laboratory directly (Ross Hardison mailto:rch8@psu.edu). If you have questions about the Genome Browser track associated with this data, contact ENCODE (mailto:genome@soe.ucsc.edu).
Knowledge of the function of genomic DNA sequences comes from three basic approaches. Genetics uses changes in behavior or structure of a cell or organism in response to changes in DNA sequence to infer function of the altered sequence. Biochemical approaches monitor states of histone modification, binding of specific transcription factors, accessibility to DNases and other epigenetic features along genomic DNA. In general, these are associated with gene activity, but the precise relationships remain to be established. The third approach is evolutionary, using comparisons among homologous DNA sequences to find segments that are evolving more slowly or more rapidly than expected given the local rate of neutral change. These are inferred to be under negative or positive selection, respectively, and interpreted as DNA sequences needed for a preserved (negative selection) or adaptive (positive selection) function.
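As a rough illustration of the evolutionary approach described above, the sketch below compares a segment's observed substitution count with the count expected from the local neutral rate. The function name, the ±20% tolerance band and the example numbers are hypothetical. Real constraint analyses use phylogenetic models and significance tests rather than a fixed cutoff, so treat this only as a picture of the "slower or faster than neutral" logic.

```python
def classify_segment(observed_subs: int, length: int, neutral_rate: float,
                     tolerance: float = 0.2) -> str:
    """Compare observed substitutions with the neutral expectation.

    Hypothetical illustration: expected = length * neutral_rate, and a fixed
    tolerance band stands in for a proper statistical test.
    """
    expected = length * neutral_rate
    if expected <= 0:
        raise ValueError("expected number of substitutions must be positive")
    ratio = observed_subs / expected
    if ratio < 1 - tolerance:
        # Fewer changes than neutral: inferred to be under negative selection.
        return "negative selection (constrained)"
    if ratio > 1 + tolerance:
        # More changes than neutral: inferred to be under positive selection.
        return "positive selection (accelerated)"
    return "consistent with neutral"


if __name__ == "__main__":
    # A 200-bp segment with an assumed neutral rate of 0.25 substitutions/bp
    # is expected to show 50 substitutions; observing only 10 suggests constraint.
    print(classify_segment(10, 200, 0.25))
```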
The ENCODE project aims to discover all the DNA sequences associated with various epigenetic features, with the reasonable expectation that these will also be functional (best tested by genetic methods). However, it is not clear how to relate these results to those from evolutionary analyses. The mouse ENCODE project aims to make this connection explicitly and with moderate breadth. Assays identical to those used in the human ENCODE project are performed in mouse cell types that are similar or homologous to those studied in the human project. Thus we will be able to discover which epigenetic features are conserved between mouse and human, and to examine the extent to which these overlap with DNA sequences under negative selection. The relative contributions of DNA whose function is preserved across mammals and of DNA with a function in only one species will then be determined.
One of the epigenetic features most closely related to genomic activity is the production of stable RNA, including transcripts of both protein-coding and noncoding genes. These genome-wide compilations of transcripts, or transcriptomes, are primary determinants of how cells function, respond and differentiate, both through the proteins translated from coding transcripts and through the regulatory activity of untranslated noncoding transcripts. Noncoding RNAs regulate gene expression through diverse mechanisms, ranging from reducing chromatin accessibility (affecting large regions or whole chromosomes) to precise fine-tuning of the expression of specific genes, e.g. via RNAi.
Even though a large proportion of mammalian genomes is transcribed, many of the transcribed segments have yet to be assigned any function. The ENCODE project aims to create a comprehensive, quantitative annotation of the human transcriptome in several cell and tissue types, and to understand the regulation of transcriptomes by establishing the relationships between regulatory factors and their targets. Mapping the mouse transcriptome in similar tissues will allow us to assess the conservation of transcriptome profiles between mouse and human, to discover species-specific transcription patterns, and to infer conserved versus species-specific regulatory mechanisms. The results will have a significant impact on our understanding of the evolution of gene regulation.

For data usage terms and conditions, please refer to http://www.genome.gov/27528022 and http://www.genome.gov/Pages/Research/ENCODE/ENCODEDataReleasePolicyFinal2008.pdf

Overall design

Cells were grown according to the approved ENCODE cell culture protocols (http://hgwdev.cse.ucsc.edu/ENCODE/protocols/cell/mouse).
Total RNA was extracted from 5-10 million cells using TRIzol reagent. This was followed by mRNA selection, fragmentation and cDNA synthesis, which were performed as described previously (Mortazavi et al., 2009). Double-stranded cDNA samples were processed for library construction for Illumina sequencing, using the Illumina ChIP-seq Sample Preparation Kit.
Strand-specific libraries were generated in the same way, with two modifications described previously (Parkhomchuk et al., 2009). Briefly, dUTP was used instead of dTTP during second-strand cDNA synthesis, labeling the second strand with uracil. During library preparation, and before the PCR amplification step, the dUTP-labeled cDNA was treated with uracil-N-glycosylase (UNG) to remove uracil from the second strand; the DNA was then heated to cleave the second strand at the resulting abasic sites. Because the second strand can no longer be amplified, the sequenced reads retain information about which strand the original RNA came from.
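The practical consequence of the dUTP method is that read 1 (or a single-end read) is antisense to the RNA, while read 2 of a pair is sense. The minimal sketch below, assuming that standard first-strand convention, maps an aligned read's strand back to the transcript strand. The function name is hypothetical; in TopHat this library type is called `fr-firststrand`, but the record does not state which TopHat options were used.

```python
def transcript_strand(read_strand: str, is_read1: bool = True) -> str:
    """Infer the transcript strand from a read in a dUTP (first-strand) library.

    Assumes the dUTP convention: read 1 / single-end reads come from the
    first-strand cDNA, which is complementary (antisense) to the RNA, so
    their strand is flipped; read 2 already matches the RNA strand.
    """
    if read_strand not in ("+", "-"):
        raise ValueError("read_strand must be '+' or '-'")
    flip = {"+": "-", "-": "+"}
    return flip[read_strand] if is_read1 else read_strand


if __name__ == "__main__":
    # A single-end read aligned to the plus strand implies a minus-strand transcript.
    print(transcript_strand("+"))
```

Non-directional libraries carry no such information, which is why the record distinguishes directional from non-directional reads.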
Cluster generation, linearization, blocking and sequencing primer reagents were supplied in the Illumina Cluster Amplification kits. All samples are considered biological replicates.
Sequencing was performed on the Illumina Genome Analyzer IIx and the Illumina HiSeq 2000. FASTQ files for the resulting reads (single-read and paired-end, directional and non-directional) were moved to a data library in Galaxy, and tools implemented in Galaxy were used for further processing via workflows (Giardine et al., 2005; Blankenberg et al., 2010; Goecks et al., 2010). Data processing was also performed on the CyberSTAR high-performance computing system at Penn State. Reads were mapped to the mouse genome (mm9 assembly) with TopHat (Langmead et al., 2009; Trapnell et al., 2009). Signal tracks were created using BEDTools (Quinlan and Hall, 2010) and SAMtools (Li, Handsaker et al., 2009).
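A signal track of the kind described above records read depth along the genome. The pure-Python sketch below shows what such a track contains by turning aligned read intervals (0-based, half-open, one chromosome) into bedGraph rows of constant, nonzero depth, merging adjacent runs of equal depth. It is only an illustration: the authors used BEDTools and SAMtools, and the exact commands are not given here. Tools such as `bedtools genomecov -bg` produce this output at scale, and UCSC `bedGraphToBigWig` converts it to the bigWig files supplied with this series. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple

Row = Tuple[str, int, int, int]  # (chrom, start, end, depth)


def coverage_bedgraph(chrom: str, intervals: Iterable[Tuple[int, int]]) -> List[Row]:
    """Sweep-line coverage: +1 at each read start, -1 at each read end."""
    delta = defaultdict(int)
    for start, end in intervals:
        if end <= start:
            continue  # skip empty or malformed intervals
        delta[start] += 1
        delta[end] -= 1
    rows: List[Row] = []
    depth = 0
    prev = None
    for pos in sorted(delta):
        if prev is not None and depth > 0:
            if rows and rows[-1][2] == prev and rows[-1][3] == depth:
                # Extend the previous run rather than emit two rows of equal depth.
                rows[-1] = (chrom, rows[-1][1], pos, depth)
            else:
                rows.append((chrom, prev, pos, depth))
        depth += delta[pos]
        prev = pos
    return rows


def to_bedgraph_lines(rows: List[Row]) -> List[str]:
    """Format rows as tab-separated bedGraph lines."""
    return [f"{c}\t{s}\t{e}\t{d}" for c, s, e, d in rows]


if __name__ == "__main__":
    reads = [(0, 10), (5, 15), (10, 20)]
    print("\n".join(to_bedgraph_lines(coverage_bedgraph("chr1", reads))))
```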

Web link

http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=mm9&g=wgEncodePsuRnaSeq
http://www.ncbi.nlm.nih.gov/geo/info/ENCODE.html

Contributor(s)

Hardison R, Paulson R, Bodine D, Weiss M, Mishra T, Keller C, Giardine B, Taylor J

Citation missing

BioProject

PRJNA66167

Submission date

Aug 31, 2012

Last update date

May 15, 2019

Contact name

ENCODE DCC

E-mail(s)

encode-help@lists.stanford.edu

Organization name

ENCODE DCC

Street address

300 Pasteur Dr

City

Stanford

State/province

CA

ZIP/Postal code

94305-5120

Country

USA

Platforms (2)

GPL11002	Illumina Genome Analyzer IIx (Mus musculus)
GPL13112	Illumina HiSeq 2000 (Mus musculus)

Samples (17)

GSM995525	PSU_RnaSeq_MEP_paired-end (superseded by GSE90218)
GSM995526	PSU_RnaSeq_MEL_DMSO_2.0pct_single (superseded by GSE93476)
GSM995527	PSU_RnaSeq_G1E-ER4_diffProtD_14hr_paired-end (superseded by GSE90211)

Relations

SRA

SRP015338

Download family	Format
SOFT formatted family file(s)	SOFT
MINiML formatted family file(s)	MINiML
Series Matrix File(s)	TXT
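The Series Matrix files listed above are tab-separated text in which metadata rows begin with `!` and values are quoted. For RNA-seq series such as this one, the sample-level metadata (accessions, titles) is usually the useful part. The sketch below collects the `!Sample_*` rows that appear before `!series_matrix_table_begin` and pairs accessions with titles. It assumes that layout and gzip compression for downloaded files. The helper names and any local file path are hypothetical, and the example rows are abbreviated from the sample list on this page.

```python
import gzip
from typing import Dict, Iterable, List


def parse_sample_metadata(lines: Iterable[str]) -> Dict[str, List[List[str]]]:
    """Collect '!Sample_*' rows as {field: [row_values, ...]}."""
    fields: Dict[str, List[List[str]]] = {}
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("!series_matrix_table_begin"):
            break  # the metadata header ends where the data table begins
        if not line.startswith("!Sample_"):
            continue
        key, *values = line.split("\t")
        name = key[len("!Sample_"):]
        # Some keys (e.g. characteristics) repeat, so every row is kept.
        fields.setdefault(name, []).append([v.strip('"') for v in values])
    return fields


def accession_to_title(fields: Dict[str, List[List[str]]]) -> Dict[str, str]:
    """Pair each GSM accession with its sample title (first row of each)."""
    return dict(zip(fields["geo_accession"][0], fields["title"][0]))


def read_series_matrix(path: str) -> Dict[str, List[List[str]]]:
    """Parse a gzip-compressed Series Matrix file (path is hypothetical)."""
    with gzip.open(path, "rt") as handle:
        return parse_sample_metadata(handle)


if __name__ == "__main__":
    example = [
        '!Series_title\t"ENCODE PSU Hardison RnaSeq"',
        '!Sample_title\t"PSU_RnaSeq_MEP_paired-end"\t"PSU_RnaSeq_MEL_DMSO_2.0pct_single"',
        '!Sample_geo_accession\t"GSM995525"\t"GSM995526"',
        "!series_matrix_table_begin",
    ]
    print(accession_to_title(parse_sample_metadata(example)))
```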

Supplementary file	Size	File type/resource
GSE40522_RAW.tar	3.3 GB	TAR (of BIGWIG)
Raw data are available in SRA
Processed data provided as supplementary file