GEO Accession viewer

Sample GSM2214179

Status

Public on Aug 01, 2016

Title

mTAIL-seq exp #9 wt-actegg(0-2hr)2

Sample type

SRA

Source name

Drosophila activated egg

Organism

Drosophila melanogaster

Characteristics

genotype: w1118
developmental stage: activated egg 0-2hr

Treatment protocol

No treatment

Growth protocol

HeLa cells were maintained in DMEM (Welgene) supplemented with 10% fetal bovine serum (Welgene). All fly strains were obtained from the Bloomington Stock Center. w1118 was used as the wild-type control. wispKG5287 was previously described as a null allele of wisp (Benoit et al., 2008). Immature and mature oocytes were collected by hand dissection in Grace′s Unsupplemented Insect Media (Gibco, 11595-030) from 3- or 4-day-old female flies. Unfertilized activated eggs were produced from w1118 virgin females mated to sterile males (sons of tud1 mothers). Fly eggs and embryos were collected on grape juice plates over the designated time frame at 25°C.

Extracted molecule

total RNA

Extraction protocol

Total RNA was extracted from HeLa cells or Drosophila samples with TRIzol reagent (Invitrogen, 15596-018).
Total RNA (~1–5 µg) was ligated overnight to a 3′ hairpin adaptor using T4 RNA ligase 2 (NEB, M0239). The 3′-ligated RNA was partially digested with RNase T1 (Ambion, AM2283) and captured on streptavidin beads (Invitrogen, 11206D). 5′ phosphorylation with PNK (Takara, 2021B) and endonucleolytic cleavage with APE1 (NEB, M0282) were performed on the beads. The RNA was then eluted with 2X RNA loading dye and gel-purified on a 6% urea-PAGE gel, selecting the 300–750 nucleotide range. The purified RNA was ligated to a 5′ adaptor, reverse-transcribed (Invitrogen, 18080-085) and amplified by PCR using Phusion DNA polymerase (Thermo, F-530L). PCR products were purified with AMPure XP beads (Beckman, A63881).

Library strategy

RNA-Seq

Library source

transcriptomic

Library selection

cDNA

Instrument model

Illumina MiSeq

Description

mTAIL-seq for activated egg (0-2hr) of wt, rep #2

Data processing

The base calls and signal intensities were processed with Illumina RTA 1.17.28 for MiSeq. Read 1 sequences were re-basecalled with AYB 2 for more sensitive base calling. Read 1 sequences were then aligned with GSNAP 2013-03-31 (maximum 5% mismatches) to a set of common contaminants composed of rDNA repeat units (GenBank accession U13369.1), the PhiX genome (GenBank accession J02482.1), Illumina TruSeq primer sequences, and all 5S and 5.8S rRNA sequences of the respective species (retrieved from Rfam 11.0 of the Wellcome Trust Sanger Institute). Clusters with any match to the contaminants were removed from subsequent analyses. Clusters with identical nucleotides in cycles 21–35 of read 1 (a representative region of the insert) and cycles 1–15 of read 2 (degenerate bases in the 3′ adapter) were deduplicated, keeping only the cluster with the maximum PHRED quality sum of read 1. The degenerate and fixed delimiter sequences of the 3′ adapter were clipped from read 2 by searching for a perfect match to the delimiter sequence (‘GTCAG’, in the read 2 direction) between cycles 14 and 16 of read 2. Clusters lacking the delimiter sequence, or with low diversity in the degenerate region (i.e., fewer than two occurrences of any of A, C, G and T), were removed from further analyses.
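The deduplication and read 2 clipping rules above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the `Cluster` record, the function names, and the reading of "between the 14th and 16th cycles" as "the delimiter starts at cycle 14, 15 or 16" are all assumptions, and only the cycle ranges, the delimiter `GTCAG`, and the two-occurrence diversity threshold come from the text.

```python
from typing import Dict, List, NamedTuple, Optional

DELIMITER = "GTCAG"  # fixed delimiter, read in the read 2 direction


class Cluster(NamedTuple):
    """Hypothetical record for one sequencing cluster."""
    read1: str
    read2: str
    read1_quals: List[int]  # per-cycle PHRED scores of read 1


def dedup_key(c: Cluster) -> str:
    # Cycles 21-35 of read 1 (0-based slice 20:35) plus cycles 1-15 of read 2.
    return c.read1[20:35] + "|" + c.read2[0:15]


def deduplicate(clusters: List[Cluster]) -> List[Cluster]:
    """Keep one cluster per key: the one with the highest read 1 quality sum."""
    best: Dict[str, Cluster] = {}
    for c in clusters:
        key = dedup_key(c)
        if key not in best or sum(c.read1_quals) > sum(best[key].read1_quals):
            best[key] = c
    return list(best.values())


def find_delimiter(read2: str) -> Optional[int]:
    """0-based start of a perfect delimiter match starting at cycle 14, 15 or 16."""
    for start in (13, 14, 15):  # assumption: window refers to the match start
        if read2[start:start + len(DELIMITER)] == DELIMITER:
            return start
    return None


def is_diverse(degenerate: str, min_count: int = 2) -> bool:
    """True if each of A, C, G and T occurs at least `min_count` times."""
    return all(degenerate.count(base) >= min_count for base in "ACGT")


def clip_read2(read2: str) -> Optional[str]:
    """Return read 2 after the delimiter, or None if the cluster is discarded."""
    start = find_delimiter(read2)
    if start is None or not is_diverse(read2[:start]):
        return None
    return read2[start + len(DELIMITER):]
```

For example, `clip_read2("ACGTACGTACGTACGGTCAGTTTT")` finds the delimiter at cycle 16 and returns the remaining `"TTTT"`, while a read whose degenerate region is all `A` is discarded.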
The fluorescence signal intensities were processed into a “Relative T signal” as described in our previous paper (Chang et al., 2014). Signals from each spike-in sample were cleaned with an outlier filter based on robust Mahalanobis distance (mvoutlier package 1.9.9; quan=0.5, alpha=0.025). For each spike-in, 500 randomly chosen clusters were used to estimate the parameters of a Gaussian mixture hidden Markov model (GMHMM). We trained the model with the Baum-Welch algorithm implemented in the GHMM library (http://ghmm.org), using the topology and initial parameters shown in fig. S1A and tables S7 and S8 (1,000 iterations). Training maximized likelihood without using any property of the spike-ins (e.g., the designed poly(A) tail length). Relative T signals outside the range [-5, 5] were clipped to that range for both training and later calculations.
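The signal clipping and the per-spike-in training split described above can be sketched like this. It is only an illustration under stated assumptions: the input layout (a dict from spike-in name to a list of per-cluster signal vectors), the function names and the fixed random seed are hypothetical, and neither the mvoutlier filter nor GHMM's Baum-Welch training is reproduced here.

```python
import random
from typing import Dict, List, Sequence, Tuple

CLIP_MIN, CLIP_MAX = -5.0, 5.0
N_TRAIN = 500  # clusters per spike-in used for parameter fitting

Signals = Dict[str, List[List[float]]]  # hypothetical: spike-in name -> clusters


def clip_signal(signal: Sequence[float]) -> List[float]:
    """Clip relative T signals into [CLIP_MIN, CLIP_MAX]."""
    return [min(CLIP_MAX, max(CLIP_MIN, x)) for x in signal]


def split_spike_ins(spike_ins: Signals, n_train: int = N_TRAIN,
                    seed: int = 0) -> Tuple[Signals, Signals]:
    """Randomly pick up to n_train clusters per spike-in for training.

    The remaining clusters are returned separately so that performance can be
    estimated only on clusters not used for parameter fitting.
    """
    rng = random.Random(seed)
    train: Signals = {}
    held_out: Signals = {}
    for name, clusters in spike_ins.items():
        k = min(n_train, len(clusters))
        chosen = set(rng.sample(range(len(clusters)), k))
        train[name] = [clip_signal(s) for i, s in enumerate(clusters) if i in chosen]
        held_out[name] = [clip_signal(s) for i, s in enumerate(clusters) if i not in chosen]
    return train, held_out
```

Keeping the held-out clusters explicit mirrors the requirement, stated for the performance estimate, that the clusters used for fitting are excluded.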
Poly(A) tail length was first measured with the base-call-based “Strategy II” described in fig. S1A. For clusters whose measured length was shorter than 8 nt, this length was taken as the final poly(A) tail length. For the remaining clusters, normalized T signals starting from the first position of the T-stretch detected by Strategy II were analyzed with the GMHMM. Hidden states were decoded with the standard Viterbi algorithm implemented in the GHMM library, and the number of cycles assigned to states 1 and 2 was taken as the poly(A) tail length. To estimate performance, we applied this procedure to all spike-in samples, excluding the clusters used for fitting the model parameters.
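The length-calling rule above (keep the Strategy II length if it is under 8 nt, otherwise count the Viterbi-decoded poly(A) states) can be sketched with a small Gaussian-emission Viterbi decoder. This is a hedged stand-in for the GHMM implementation: the three-state left-to-right topology and every number in `HYPOTHETICAL_MODEL` are invented for illustration and are not the parameters of fig. S1A or tables S7 and S8; single Gaussians replace the mixtures; states 1 and 2 of the text correspond to indices 0 and 1 here.

```python
import math
from typing import List, Sequence

NEG_INF = float("-inf")


def log(p: float) -> float:
    """Natural log that maps probability 0 to -inf (forbidden transition)."""
    return math.log(p) if p > 0 else NEG_INF


def gaussian_logpdf(x: float, mean: float, sd: float) -> float:
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def viterbi(obs, start, trans, means, sds) -> List[int]:
    """Most likely hidden state path for a Gaussian-emission HMM (log space)."""
    n = len(means)
    score = [log(start[s]) + gaussian_logpdf(obs[0], means[s], sds[s]) for s in range(n)]
    back: List[List[int]] = []  # back[t-1][s]: best predecessor of state s at t
    for x in obs[1:]:
        new_score, ptr = [], []
        for s in range(n):
            prev = max(range(n), key=lambda p: score[p] + log(trans[p][s]))
            new_score.append(score[prev] + log(trans[prev][s])
                             + gaussian_logpdf(x, means[s], sds[s]))
            ptr.append(prev)
        score = new_score
        back.append(ptr)
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):  # trace back from the last position
        state = ptr[state]
        path.append(state)
    return path[::-1]


# Hypothetical model: states 0 and 1 are poly(A), state 2 is non-A.
HYPOTHETICAL_MODEL = (
    [1.0, 0.0, 0.0],                                    # start probabilities
    [[0.9, 0.05, 0.05], [0.0, 0.9, 0.1], [0.0, 0.0, 1.0]],  # transitions
    [2.0, 1.0, -2.0],                                   # emission means
    [1.0, 1.0, 1.0],                                    # emission SDs
)


def call_tail_length(strategy2_length: int, t_signal: Sequence[float],
                     model=HYPOTHETICAL_MODEL) -> int:
    if strategy2_length < 8:
        return strategy2_length  # short tails: base-call length is final
    path = viterbi(t_signal, *model)
    return sum(1 for s in path if s in (0, 1))  # cycles in poly(A) states
```

With ten high T signals followed by five low ones, `call_tail_length(10, [2.0] * 10 + [-2.0] * 5)` decodes ten poly(A) cycles.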
Reads remaining after the contaminant filter and the first deduplication filter were then aligned to the genome sequences (UCSC hg19; splice junction positions were obtained from the UCSC Genome Browser database, version of Jan 24, 2013) using GSNAP 2013-03-31. Three different genome alignments were used in this study: (1) R1 alignment, using only the full 51-nt read 1 sequences, which was used to identify each cluster; (2) R2 short alignment, using only the 40 nt immediately adjacent to the 3′ adapter in read 2, which was used to search for poly(A)-free 3′ hydroxyl ends; and (3) paired alignment, using the full read 1 sequences and the part of read 2 remaining after trimming the degenerate bases and delimiter, which was used to filter out genome-encoded poly(A) stretches. All alignments were performed with a maximum mismatch rate of 5% and a minimum mapping quality of 3, and all multi-mapped reads were removed. Remaining PCR artifacts with few mismatches were removed in a second round using the R1 alignments together with the 15 degenerate bases in the 3′ adapter region. To detect such artifacts, we first clustered R1 alignments whose mapped positions were at most 10 bp apart, and then sub-clustered each cluster by the degenerate bases in read 2 of the respective reads using CD-HIT-EST 4.5.4 (word size = 6, sequence identity = 0.85). From each set of detected duplicates, we kept the read with the maximum PHRED quality sum in read 1.
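The two-stage PCR-duplicate removal above can be approximated as shown below. This is a sketch under explicit assumptions, not the authors' code: positions are chained when consecutive sorted alignments on the same chromosome and strand are at most 10 bp apart (the text does not say whether the 10 bp limit is single- or complete-linkage), and CD-HIT-EST is replaced by a simple greedy clustering with ungapped per-position identity. The `Alignment` record and all function names are hypothetical.

```python
from typing import List, NamedTuple


class Alignment(NamedTuple):
    """Hypothetical R1 alignment record."""
    read_id: str
    chrom: str
    strand: str
    pos: int
    degenerate: str  # 15 degenerate bases from read 2
    qual_sum: int    # PHRED quality sum of read 1


def group_by_position(alns: List[Alignment], max_dist: int = 10) -> List[List[Alignment]]:
    """Chain alignments whose sorted positions are <= max_dist apart."""
    groups: List[List[Alignment]] = []
    for a in sorted(alns, key=lambda a: (a.chrom, a.strand, a.pos)):
        last = groups[-1][-1] if groups else None
        if (last is not None and last.chrom == a.chrom and last.strand == a.strand
                and a.pos - last.pos <= max_dist):
            groups[-1].append(a)
        else:
            groups.append([a])
    return groups


def identity(x: str, y: str) -> float:
    """Ungapped per-position identity (a simplification of CD-HIT-EST)."""
    return sum(p == q for p, q in zip(x, y)) / max(len(x), len(y))


def collapse_group(group: List[Alignment], min_identity: float = 0.85) -> List[Alignment]:
    """Greedy clustering, best quality first: a read becomes a new
    representative only if it is not similar to any existing one, so each
    representative is the highest-quality read of its duplicate set."""
    reps: List[Alignment] = []
    for a in sorted(group, key=lambda a: a.qual_sum, reverse=True):
        if all(identity(a.degenerate, r.degenerate) < min_identity for r in reps):
            reps.append(a)
    return reps


def remove_pcr_duplicates(alns: List[Alignment]) -> List[Alignment]:
    kept: List[Alignment] = []
    for g in group_by_position(alns):
        kept.extend(collapse_group(g))
    return kept
```

With 15 degenerate bases, an identity threshold of 0.85 treats barcodes differing at up to two positions (13/15 ≈ 0.867) as duplicates, while three mismatches (12/15 = 0.8) keep the reads separate.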
For classification and transcript-level analyses, we compiled reference annotations for human and mouse from the NCBI RefSeq, RepeatMasker, gtRNAdb, Rfam and miRBase databases (the first three were downloaded from the UCSC Genome Browser on Apr 25, 2013; Rfam version 11; miRBase version 19). R1 alignments were annotated by intersecting them with the compiled annotations using BEDTools (Quinlan, 2010). When an alignment overlapped multiple annotations, statistics requiring an exclusive genomic source type assigned a single class using the following priority (highest first): miRNA, rRNA, tRNA, Mt-tRNA, snoRNA, scRNA, srpRNA, snRNA, lncRNA, RNA, ncRNA, misc_RNA, Cis-reg, ribozyme, RC, IRES, frameshift_element, LINE, SINE, Simple_repeat, Low_complexity, Satellite, DNA, LTR, CDS, 3′ UTR, 5′ UTR, intron, Other, Unknown. Transcript-level analyses were performed using our custom non-redundant RefSeq (nrRefSeq) transcript set, a reduced set that retains only the longest isoform or transcript where regions overlap. The positions of read 1 in nrRefSeq transcripts were determined by BEDTools intersection between the genome alignments and the nrRefSeq annotation set, and then converted to transcript-level coordinates with in-house software.
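The exclusive class assignment above amounts to picking the overlapping annotation with the smallest index in the priority list. A minimal sketch follows; the priority order is copied from the text, while the function name and the choice to return "Unknown" when no recognised class overlaps are assumptions.

```python
from typing import Iterable

# Priority order from the text, highest priority first.
PRIORITY = [
    "miRNA", "rRNA", "tRNA", "Mt-tRNA", "snoRNA", "scRNA", "srpRNA", "snRNA",
    "lncRNA", "RNA", "ncRNA", "misc_RNA", "Cis-reg", "ribozyme", "RC", "IRES",
    "frameshift_element", "LINE", "SINE", "Simple_repeat", "Low_complexity",
    "Satellite", "DNA", "LTR", "CDS", "3′ UTR", "5′ UTR", "intron", "Other",
    "Unknown",
]
RANK = {name: i for i, name in enumerate(PRIORITY)}


def assign_class(overlapping: Iterable[str]) -> str:
    """Return the highest-priority class among overlapping annotations."""
    ranked = [c for c in overlapping if c in RANK]
    if not ranked:
        return "Unknown"  # assumption: no recognised overlap -> Unknown
    return min(ranked, key=RANK.__getitem__)
```

For instance, an alignment overlapping an intron, a CDS and a SINE element is counted as SINE, because repeat classes rank above gene-structure classes in this list.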
Because poly(A) tails were initially detected with the constraint that they must begin within the first 30 cycles, detectable 3′ end modifications of poly(A) tails were limited to the last 30 nucleotides of the insert. To exclude A stretches clearly encoded in the genomic sequence (with or without 3′ end modifications), we masked the detected poly(A) tail ranges with the read 2 alignments, so that positions up to the 3′-most alignable (unclipped) position were removed from the poly(A) tail or its 3′ end modifications. All statistics on transcript-level modification rates were calculated for transcripts with more than 200 tags bearing poly(A) tails longer than 8 nt.
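The transcript filter for modification-rate statistics can be sketched as follows. The input layout, one `(transcript_id, tail_length, modification)` tuple per tag with `None` meaning no 3′ end modification, and the function name are hypothetical; only the thresholds (tails longer than 8 nt, more than 200 such tags) come from the text, and the genomic masking step is not reproduced.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple

MIN_TAIL_NT = 8   # tails must be longer than this
MIN_TAGS = 200    # transcripts need more than this many qualifying tags

Tag = Tuple[str, int, Optional[str]]  # hypothetical: (transcript, tail nt, modification)


def modification_rates(tags: Iterable[Tag]) -> Dict[str, float]:
    """Fraction of qualifying tags carrying a 3′ end modification, per transcript."""
    counts: Dict[str, int] = defaultdict(int)
    modified: Dict[str, int] = defaultdict(int)
    for tx, tail_len, mod in tags:
        if tail_len > MIN_TAIL_NT:
            counts[tx] += 1
            if mod is not None:
                modified[tx] += 1
    return {tx: modified[tx] / n for tx, n in counts.items() if n > MIN_TAGS}
```

A transcript with exactly 200 qualifying tags is excluded, and tags with 8-nt tails are not counted toward either total.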
Genome_build: hg38, dm6
Supplementary_files_format_and_content: The spreadsheet files contain the poly(A) tail length distribution and 3' end modification frequencies next to poly(A) tails for all detected transcripts

Submission date

Jun 26, 2016

Last update date

May 15, 2019

Contact name

Jaechul Lim

E-mail(s)

jaechul.lim@snu.ac.kr

Organization name

Seoul National University

Street address

1 Gwanak-ro, 1 Gwanak-gu

City

Seoul

State/province

ZIP/Postal code

08826

Country

South Korea

Platform ID

GPL16479

Series (2)

GSE83731	mTAIL-seq reveals dynamic poly(A) tail regulation in oocyte-to-embryo development [RNA-seq and mTAIL-seq Human and Drosphila]
GSE83732	mTAIL-seq reveals dynamic poly(A) tail regulation in oocyte-to-embryo development

Relations

Reanalyzed by

GSM3281509

BioSample

SAMN05293646

SRA

SRX1878970

Supplementary file	Size	File type/resource
GSM2214179_mTS9_wt-actegg_0-2hr2.csv.gz	522.4 Kb	CSV
Raw data are available in SRA
Processed data provided as supplementary file