GEO Accession viewer

Sample GSM1808042

Status

Public on Jan 18, 2016

Title

Adipose_1

Sample type

SRA

Source name

Adipose (Subcutaneous)

Organism

Homo sapiens

Characteristics

tissue: Adipose

Extracted molecule

total RNA

Extraction protocol

Total RNA was isolated from tissues using TRIzol Reagent.
In brief, 0.5-1.0 μg of total RNA was twice selected for mRNA by oligo(dT) and then fragmented by heating. First-strand cDNA was synthesized using SuperScript III reverse transcriptase and random hexamer primers. After second-strand synthesis by DNA polymerase I, with dUTP in place of dTTP, double-stranded cDNA was end-repaired and A-tailed prior to ligation of Illumina adaptors including DNA indices. Libraries were made strand-specific by digestion with Uracil-DNA Glycosylase prior to PCR amplification. Bead-based clean-up was incorporated after each enzymatic reaction, and libraries were checked by flash gel and Bioanalyzer analysis.

Library strategy

RNA-Seq

Library source

transcriptomic

Library selection

cDNA

Instrument model

Illumina HiSeq 2000

Data processing

Insert sizes were estimated using the Picard Tools (v1.79) programs SortSam.jar and CollectInsertSizeMetrics.jar on SAM files generated by Bowtie2 (v2.0.0-beta7) from a subset of 500,000 paired-end reads per sample (bowtie2 options -q --very-fast --phred33).
Raw reads were mapped to the human genome sequence using TopHat v2.0.6, allowing for four mismatches per 100-bp read, a maximum edit distance of 6 per read, and one mismatch in the splice anchor region.
Using Picard Tools, the alignment files were sorted by genomic coordinates, read-group data was added, and duplicate reads were marked and removed.
Transcript structure assembly was performed with Cufflinks (v2.0.2) on each sample for each tissue type. The Gencode v12 annotation was used as a reference to guide assembly (--GTF-guide); additional parameters included upper-quartile normalization (--upper-quartile-norm), library type (--library-type fr-firststrand), and maximum bundle length (--max-bundle-length 7500000).
Cuffmerge (v2.0.2) was then used to merge all the Cufflinks assemblies and the reference annotation into one large set of transcript structures. Splice junctions not present in the reference annotation were required to pass a Shannon entropy score threshold of 2.
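As an illustration of the entropy filter mentioned above, the following is a minimal sketch in Python. It assumes the score is the Shannon entropy (in bits) of the distribution of read start offsets supporting a junction, which is one common definition; the record does not spell out the formula. The function names and the strict "greater than" comparison are assumptions made for this example.

```python
import math
from collections import Counter


def junction_entropy(offsets):
    """Shannon entropy (bits) of the read start offsets supporting one junction.

    `offsets` is a hypothetical list with one integer per supporting read.
    """
    counts = Counter(offsets)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def passes_threshold(offsets, threshold=2.0):
    """Assumed rule: keep a junction only if its entropy exceeds the threshold."""
    return junction_entropy(offsets) > threshold
```

Under this definition, a junction supported only by reads stacked at a single offset scores 0, while reads spread evenly over eight offsets score 3 and would pass a threshold of 2.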
The mapped reads for each sample were subsampled down to 20 million using multiple runs of the Picard tool DownsampleSam, and 18 samples per tissue type were selected to create a subsampled dataset. Subsequent steps were performed separately on both subsampled and non-subsampled datasets.
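The idea behind probabilistic subsampling to a fixed depth can be sketched as follows. This is not Picard's implementation: it is a small Python illustration that assumes each read pair is kept with probability target/total and that both mates share one decision. The function names, the seed, and the read-name representation are hypothetical.

```python
import random


def keep_probability(total_reads, target_reads=20_000_000):
    """Fraction of reads to keep so the expected count is about `target_reads`."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return min(1.0, target_reads / total_reads)


def downsample(read_names, probability, seed=1):
    """Keep each read name with the given probability, deciding once per name
    so that both mates of a pair are kept or dropped together."""
    rng = random.Random(seed)
    decisions = {}
    kept = []
    for name in read_names:
        if name not in decisions:
            decisions[name] = rng.random() < probability
        if decisions[name]:
            kept.append(name)
    return kept
```

Because selection is random, the number of reads retained is only approximately the target, so a sample with 80 million mapped reads would be sampled at a probability of 0.25 and end up near, not exactly at, 20 million.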
To calculate the expression level of each gene in each tissue type, Cuffdiff (v2.2.1) was run with default parameters and the option --library-type fr-firststrand on the subsampled read files with 18 samples per tissue type used as ‘replicates’ and the merged set of transcript structures used as the reference annotation. Gene expression for each tissue was calculated by summing all isoforms for a given gene in a given tissue from the isoforms.fpkm_tracking file generated by Cuffdiff. Genes with any isoform with a status of “HIDATA” do not have a calculated FPKM value. Gene expression for each individual was calculated by summing all isoforms for a given gene for each individual from the isoforms.read_group_tracking file generated by Cuffdiff. This file has, for each individual, one value per isoform reported for each gene.
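The per-tissue summation described above can be sketched in Python. The column names (`gene_id`, `<sample>_FPKM`, `<sample>_status`) follow the Cufflinks tracking-file layout as commonly documented, but the reduced column set, the sample label `Adipose`, and the transcript and gene identifiers in the example are assumptions for illustration, not values from this dataset.

```python
import csv
import io


def gene_fpkm(tracking_text, sample):
    """Sum isoform FPKMs per gene; a gene with any HIDATA isoform gets None."""
    fpkm_col = f"{sample}_FPKM"
    status_col = f"{sample}_status"
    totals = {}
    reader = csv.DictReader(io.StringIO(tracking_text), delimiter="\t")
    for row in reader:
        gene = row["gene_id"]
        # Once a gene is flagged (None), it stays without a value.
        if row[status_col] == "HIDATA" or totals.get(gene, 0.0) is None:
            totals[gene] = None
        else:
            totals[gene] = totals.get(gene, 0.0) + float(row[fpkm_col])
    return totals


# Hypothetical, heavily reduced tracking table (tab-separated).
EXAMPLE = "\n".join("\t".join(fields) for fields in [
    ("tracking_id", "gene_id", "Adipose_FPKM", "Adipose_status"),
    ("TCONS_1", "XLOC_1", "1.5", "OK"),
    ("TCONS_2", "XLOC_1", "2.5", "OK"),
    ("TCONS_3", "XLOC_2", "7.0", "OK"),
    ("TCONS_4", "XLOC_2", "0", "HIDATA"),
])

print(gene_fpkm(EXAMPLE, "Adipose"))
```

In this toy table the first gene sums to 4.0, and the second has no value because one of its isoforms is flagged HIDATA.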
Splice junctions were identified using the JuncBASE package on reads that overlapped protein-coding genes. Only splice junctions with a Shannon entropy score greater than 2 using all subsampled reads were used. Junctions were called non-annotated if they were present in neither the Gencode v12 annotation nor the Ensembl annotation. Reported JuncBASE results were transformed into groups of mutually exclusive junction sets, each with defined length-normalized read counts (reads/100bp) and ‘percent spliced in’ (PSI) values. Complex events (e.g. cassette exon + alternate 3’ splice site) with ambiguously mapped reads and intron retention events were filtered out.
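To make the two reported quantities concrete, here is a minimal Python sketch of length normalization (reads per 100 bp) and a PSI value for one pair of mutually exclusive junction sets. It assumes PSI is the inclusion share of the combined normalized counts, expressed as a percentage; the function names and the handling of zero coverage are assumptions for this example, not JuncBASE's code.

```python
def length_normalized(read_count, length_bp):
    """Read count scaled to reads per 100 bp of the counted region."""
    if length_bp <= 0:
        raise ValueError("length_bp must be positive")
    return read_count * 100.0 / length_bp


def psi(inclusion, exclusion):
    """Percent spliced in from length-normalized inclusion and exclusion counts.

    Returns None when neither isoform has supporting reads.
    """
    total = inclusion + exclusion
    if total == 0:
        return None
    return 100.0 * inclusion / total
```

For example, 30 inclusion reads over a 200-bp region normalize to 15 reads/100bp; paired with a normalized exclusion count of 5, the PSI is 75.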
Genome_build: hg19
Supplementary_files_format_and_content: FPKM values by individual and by tissue are reported for 389 pharmacogenes in subsampled and non-subsampled datasets; Percent Spliced In (PSI) values are reported for 389 pharmacogenes by individual in subsampled and non-subsampled datasets.

Submission date

Jul 03, 2015

Last update date

May 15, 2019

Contact name

Kathleen M Giacomini

Organization name

University of California, San Francisco

Street address

1550 4th St, Mission Bay, RH 581

City

San Francisco