Sample GSM3311095
Status Public on Feb 28, 2020
Title DGRP Line 49 Female Rep 2
Sample type SRA
 
Source name DGRP Line 49 Female
Organism Drosophila melanogaster
Characteristics strain: DGRP_49
Sex: Female
age: 3-5 days old
medium: cornmeal-molasses-agar
tissue: Whole flies
Treatment protocol Not applicable
Growth protocol We used 200 inbred, sequenced DGRP lines, established by 20 generations of full sib inbreeding from gravid females collected at the Raleigh, NC USA Farmer’s Market. All lines were reared on cornmeal-molasses-agar medium at 25°C, 60–75% relative humidity and a 12-hr light-dark cycle at equal larval densities. We collected two replicates of 25 females and 30 males per line, for a total of 800 samples. We used a strict randomized experimental design for sample collection. We collected mated 3-5 day old flies between 1-3 pm. We transferred the flies into empty culture vials and froze them over ice supplemented with liquid nitrogen, and sexed the frozen flies. The samples were transferred to 2.0 ml nuclease-free microcentrifuge tubes (Ambion) and stored at -80°C until ready to process.
Extracted molecule total RNA
Extraction protocol Total RNA was extracted with Trizol using the RNeasy Mini Kit (Qiagen, Inc.). rRNA was depleted using the Ribo-Zero™ Gold Kit (Epicentre, Inc.) with 5 µg total RNA input.
rRNA-depleted RNA was fragmented and converted to first strand cDNA. During the synthesis of second strand cDNA, dUTP instead of dTTP was incorporated to label the second strand cDNA. cDNA from each RNA sample was used to produce barcoded cDNA libraries using NEXTflex™ DNA Barcodes (Bioo Scientific, Inc.) with an Illumina TruSeq compatible protocol. Library size was selected using Agencourt Ampure XP Beads (Beckman Coulter, Inc.) and centered around 250 bp with the insert size ~130 bp. Second strand DNA was digested with Uracil-DNA Glycosylase before amplification to produce directional cDNA libraries. Libraries were quantified using Qubit dsDNA HS Kits (Life Technologies, Inc.) and Bioanalyzer (Agilent Technologies, Inc.) to calculate molarity. Libraries were then diluted to equal molarity and re-quantified. A total of 50 pools of 16 libraries were made, again randomly assigning samples to each pool. Pooled library samples were quantified again to calculate final molarity and then denatured and diluted to 14 pM. Pooled library samples were clustered on an Illumina cBot; each pool was sequenced on one lane of an Illumina HiSeq 2500 using 125 bp single-read v4 chemistry. Libraries with fewer than 5 million reads uniquely aligned to the D. melanogaster reference genome were re-sequenced to achieve sufficient read depth.
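The random pooling step (50 pools of 16 libraries) can be illustrated with a minimal sketch. The sample labels, the function name assign_pools, and the fixed seed are hypothetical and are not taken from the submitters' protocol; the sketch only shows one simple way to assign 800 libraries to equal-sized pools at random.

```python
import random

def assign_pools(sample_ids, pool_size=16, seed=0):
    """Shuffle samples and split them into consecutive pools of pool_size."""
    if len(sample_ids) % pool_size != 0:
        raise ValueError("sample count must be a multiple of pool_size")
    rng = random.Random(seed)  # fixed seed only so the sketch is reproducible
    shuffled = list(sample_ids)
    rng.shuffle(shuffled)
    return [shuffled[i:i + pool_size] for i in range(0, len(shuffled), pool_size)]

# Hypothetical labels: 200 lines x 2 sexes x 2 replicates = 800 libraries.
samples = [f"line{line}_{sex}_rep{rep}"
           for line in range(1, 201) for sex in "FM" for rep in (1, 2)]
pools = assign_pools(samples)
print(len(pools), "pools of", len(pools[0]), "libraries")
```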
 
Library strategy RNA-Seq
Library source transcriptomic
Library selection cDNA
Instrument model Illumina HiSeq 2500
 
Data processing Barcoded sequence reads were demultiplexed using the Illumina pipeline v1.9.
Adapter sequences were trimmed using cutadapt v1.6, and trimmed sequences shorter than 50 bp were discarded from further analysis.
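As a rough illustration of the trim-then-discard rule, the sketch below removes a 3' adapter by exact string match and drops reads shorter than 50 bp afterwards. This is a simplification and not cutadapt's error-tolerant algorithm; the function name and the adapter sequence used in the example are assumptions chosen for illustration only.

```python
def trim_and_filter(reads, adapter, min_len=50):
    """Cut each read at the first exact adapter match, then keep reads >= min_len."""
    kept = []
    for read in reads:
        pos = read.find(adapter)
        trimmed = read if pos == -1 else read[:pos]
        if len(trimmed) >= min_len:  # shorter trimmed reads are discarded
            kept.append(trimmed)
    return kept

# Hypothetical adapter and toy reads, for illustration only.
ADAPTER = "AGATCGGAAGAGC"
toy_reads = [
    "C" * 60 + ADAPTER + "TT",  # trimmed to 60 bp, kept
    "C" * 40 + ADAPTER,         # trimmed to 40 bp, discarded
    "G" * 125,                  # no adapter, kept at full length
]
print([len(r) for r in trim_and_filter(toy_reads, ADAPTER)])
```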
Trimmed sequences were then aligned to multiple target sequence databases in the following order, using BWA v0.7.10 (MEM algorithm with parameters ‘-v 2 -t 4’):
(1) all trimmed sequences were aligned against a database containing the complete 5S, 18S-5p8S-2S-28S, mt:lrRNA, and mt:srRNA sequences to filter out residual rRNA that escaped depletion during library preparation;
(2) remaining sequences were then aligned against a custom database of potential microbiome component species (see below);
(3) sequences that did not align to either the rRNA or microbiome databases were aligned to all D. melanogaster sequences in RepBase.
The remaining sequences that did not align to any of the databases above were then aligned to the D. melanogaster genome (BDGP5) and known transcriptome (FlyBase v5.57) using STAR v2.4.0e.
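The ordered filtering described above amounts to a cascade in which each read is assigned to the first database it aligns to and only unaligned reads move on to the next stage. The sketch below shows that control flow only; the matcher functions are toy stand-ins for the BWA and STAR alignments, and all names are hypothetical.

```python
def sequential_filter(reads, stages):
    """Assign each read to the first stage whose matcher accepts it."""
    assigned = {name: [] for name, _ in stages}
    assigned["unaligned"] = []
    for read in reads:
        for name, matches in stages:
            if matches(read):
                assigned[name].append(read)
                break  # later stages never see this read
        else:
            assigned["unaligned"].append(read)
    return assigned

def tag_matcher(tag):
    """Toy matcher: a read 'aligns' if its label lists the given tag."""
    return lambda read: tag in read.split("|")

# Stage order follows the record: rRNA, microbiome, RepBase, then genome.
stages = [(tag, tag_matcher(tag)) for tag in ("rRNA", "microbiome", "repbase", "genome")]
toy_reads = ["rRNA|genome", "microbiome", "repbase|genome", "genome", "none"]
result = sequential_filter(toy_reads, stages)
print({name: len(group) for name, group in result.items()})
```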
Generation of microbiome database: We first performed a preliminary alignment of RNA-seq reads by filtering only rRNA sequences, and then aligning directly to the D. melanogaster genome using the tools and parameters described above. Sequences that did not align to the rRNA database or D. melanogaster reference genome were then analyzed with Trinity v2.1.1 to perform de novo assembly of longer sequences from the short reads. We discarded assembled sequences shorter than 1 kb, and the remaining sequences were then run through a BLAST search against the refseq_genomic database (downloaded from NCBI on 1/27/16). We then compiled a list of all refseq genomes that were found as a top BLAST hit for at least two assembled sequences. We compiled all fasta files for each of these refseq genomes into a single database for alignment with BWA.
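The genome selection rule (assembled sequences of at least 1 kb, and genomes that are the top BLAST hit for at least two of them) can be sketched as a simple count. The input tuple layout, the function name, and the genome labels are hypothetical; parsing of real BLAST output is not shown.

```python
from collections import Counter

def select_genomes(contigs, min_len=1000, min_contigs=2):
    """contigs: iterable of (contig_id, length_bp, top_hit_genome or None)."""
    counts = Counter(
        hit for _, length, hit in contigs
        if length >= min_len and hit is not None  # drop short or unmatched contigs
    )
    return sorted(genome for genome, n in counts.items() if n >= min_contigs)

# Toy input with made-up contig and genome labels.
toy_contigs = [
    ("c1", 1500, "genome_A"),
    ("c2", 2200, "genome_A"),
    ("c3", 900, "genome_B"),   # shorter than 1 kb, ignored
    ("c4", 1800, "genome_B"),  # only one qualifying hit for genome_B
    ("c5", 3000, None),        # no BLAST hit
]
print(select_genomes(toy_contigs))
```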
Genotype validation: To validate the DGRP line assigned to each RNA-seq sample, we identified single nucleotide polymorphisms (SNPs) from the RNA-seq reads that aligned to the D. melanogaster reference genome using STAR as described above. We retained only those SNP calls that were covered by at least 3 reads and for which at least 75% of all reads supported the major genotype (note that DGRP lines are inbred and therefore the majority of SNPs are homozygous). This filtering process produced >400k usable SNPs per sample, primarily located in transcribed regions of the genome. We then performed two validation tests of the DGRP line assigned to each sample X by comparing its SNP calls to the previously published genotype calls for each DGRP line (http://dgrp2.gnets.ncsu.edu/data/website/dgrp2.tgeno).
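The SNP retention rule can be written as a small predicate on per-site allele counts. This is a minimal sketch under the assumption that each site is summarized as a mapping from allele to supporting read count; the function name and data layout are hypothetical.

```python
def passes_snp_filter(allele_counts, min_depth=3, min_major_frac=0.75):
    """Keep a site if depth >= min_depth and the major allele has >= min_major_frac of reads."""
    depth = sum(allele_counts.values())
    if depth < min_depth:
        return False
    return max(allele_counts.values()) / depth >= min_major_frac

# Toy sites: allele -> number of supporting reads.
print(passes_snp_filter({"A": 3}))          # depth 3, all reads agree
print(passes_snp_filter({"A": 3, "G": 1}))  # major allele at exactly 75%
print(passes_snp_filter({"A": 2, "G": 2}))  # major allele at only 50%
```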
Inference of novel transcripts: We constructed a de novo transcriptome for each individual sample by inputting the RNA-seq reads aligned to the D. melanogaster reference genome into Cufflinks v2.2.1. We also considered the novel transcribed regions (NTRs) identified in a previous study based on unstranded pooled RNA sequencing of the DGRP lines. However, the previously published data do not provide strand-specific signal, while our current RNA-seq data use a strand-specific library preparation. Therefore, we assigned each previously published NTR to the strand supported by the greater number of total aligned reads across all samples. We then merged all de novo sample transcriptomes and the previously published NTRs using the cuffmerge tool included with Cufflinks v2.2.1, then removed all merged transcript models in which any exon overlapped, on the same strand, any exon in the known D. melanogaster transcriptome. We defined the known transcriptome here as all gene models in FlyBase v5.57 plus all subsequently added gene models in FlyBase v6.11 to account for recently discovered lncRNA sequences. Thus, the final output of this analysis was a set of NTRs constructed from both our current RNA-seq data and previously published pooled RNA-seq data that do not overlap any known gene exons on the same strand.
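Two rules in this step lend themselves to a short sketch: choosing a strand from read support, and dropping transcript models with any same-strand exon overlap against known exons. The code below assumes exons are (chromosome, start, end, strand) tuples with half-open coordinates and breaks strand ties in favor of '+'; both conventions, and all names, are assumptions made for illustration and are not stated in the record.

```python
def choose_strand(plus_reads, minus_reads):
    """Pick the strand with more aligned reads; ties default to '+' (an assumption)."""
    return "+" if plus_reads >= minus_reads else "-"

def exons_overlap(a, b):
    """True if two (chrom, start, end, strand) exons share bases on the same strand."""
    return a[0] == b[0] and a[3] == b[3] and a[1] < b[2] and b[1] < a[2]

def novel_transcripts(transcripts, known_exons):
    """Keep transcript models with no exon overlapping a known exon on the same strand."""
    return {
        tid: exons for tid, exons in transcripts.items()
        if not any(exons_overlap(e, k) for e in exons for k in known_exons)
    }

# Toy models with made-up identifiers and coordinates.
toy_models = {
    "NTR1": [("2L", 100, 200, "+")],                          # overlaps known exon, same strand
    "NTR2": [("2L", 150, 250, "-")],                          # overlaps only on the opposite strand
    "NTR3": [("2L", 500, 600, "+"), ("2L", 700, 800, "+")],   # no same-strand overlap
}
toy_known = [("2L", 150, 300, "+"), ("2L", 750, 900, "-")]
print(sorted(novel_transcripts(toy_models, toy_known)))
```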
Gene expression estimation: Read counts were computed for known and novel gene models using HTSeq-count with the ‘intersection-nonempty’ assignment method. Read counts for each gene were then normalized across all samples using edgeR as follows. First, genes with low expression overall (<10 aligned reads in >75% of the libraries) were excluded from the analysis by modeling the overall distribution of expression levels as a mixture of background noise and foreground signal to determine an empirical threshold for sufficiently expressed genes. Library sizes were re-computed as the sum of reads assigned to the remaining genes, and further normalized using the Trimmed Mean of M-values (TMM) method. We further adjusted gene expression values by estimating and removing the effect of alignment bias resulting from higher rates of non-reference variants clustering in some lines. We computed the alignment bias score A(g,L) defined as the number of non-reference nucleotides per kb in all exons of gene g in DGRP line L, based on the previous map of genomic variation in the DGRP. We then fit a linear model for each endogenous gene: Y = A + ε, where Y is the normalized expression profile for gene g after the read counting and edgeR normalization described above. After fitting these linear models, ε represents the alignment bias-corrected expression, and was used as the normalized gene expression in all subsequent analyses.
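The alignment bias correction can be illustrated for a single gene as an ordinary least squares fit of Y on A followed by taking residuals. The sketch assumes the model includes an intercept and a single slope for A, which the record does not state explicitly; the function name and the toy values are hypothetical.

```python
def bias_corrected_expression(y, a):
    """Regress expression y on bias score a (with intercept) and return the residuals."""
    n = len(y)
    mean_y = sum(y) / n
    mean_a = sum(a) / n
    sxx = sum((ai - mean_a) ** 2 for ai in a)
    sxy = sum((ai - mean_a) * (yi - mean_y) for ai, yi in zip(a, y))
    slope = sxy / sxx if sxx > 0 else 0.0  # constant bias score: nothing to remove
    intercept = mean_y - slope * mean_a
    return [yi - (intercept + slope * ai) for ai, yi in zip(a, y)]

# Toy gene: expression falls as the non-reference variant density rises.
bias_score = [0.0, 1.0, 2.0, 3.0]
expression = [5.0, 4.4, 4.1, 3.5]
residuals = bias_corrected_expression(expression, bias_score)
print([round(r, 3) for r in residuals])
```

With an intercept in the model the residuals sum to zero, so the corrected values are centered per gene.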
Genetics of gene expression: For each gene in each sex, we fit mixed-effect models to the gene expression vector across all samples corresponding to: Y = W + L + ε, where Y is the observed log2(normalized read count), W is Wolbachia infection status, L is DGRP line, and ε is the residual error. We identified genetically variable transcripts as those that passed a 5% FDR threshold (based on Benjamini-Hochberg corrected P-values) for the L terms. We computed the broad sense heritabilities (H^2) for each gene expression trait separately for males and females as H^2 = V/(V+R), where V and R are, respectively, the among line and within line variance components.
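The two quantities in this step, H^2 = V/(V+R) and Benjamini-Hochberg adjusted P-values, can be sketched as follows. Note the substitution: instead of the mixed-effect model fit described in the record, the sketch estimates V and R with the method-of-moments estimator from a balanced one-way ANOVA and ignores the Wolbachia term W. The function names and toy data are hypothetical.

```python
def broad_sense_h2(groups):
    """groups: per-line replicate lists of equal length n >= 2; returns V / (V + R)."""
    k, n = len(groups), len(groups[0])
    if any(len(g) != n for g in groups):
        raise ValueError("balanced design required for this sketch")
    grand = sum(sum(g) for g in groups) / (k * n)
    means = [sum(g) / n for g in groups]
    ms_between = n * sum((m - grand) ** 2 for m in means) / (k - 1)
    ms_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g) / (k * (n - 1))
    v = max((ms_between - ms_within) / n, 0.0)  # among-line component, truncated at 0
    r = ms_within                               # within-line component
    return v / (v + r) if v + r > 0 else 0.0

def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Toy data: three lines with two replicates each, and four per-gene P-values.
h2 = broad_sense_h2([[1.0, 1.2], [2.0, 2.2], [3.0, 3.2]])
significant = [q <= 0.05 for q in bh_adjust([0.01, 0.04, 0.03, 0.005])]
print(round(h2, 4), significant)
```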
Genome_build: BDGP R5/dm3
Supplementary_files_format_and_content: Tab-delimited text files contain normalized log2 FPKM expression values, variant density correction factors, and within-sex line means for all genes.
 
Submission date Jul 30, 2018
Last update date Feb 28, 2020
Contact name Wen Huang
E-mail(s) whuang.ustc@gmail.com
Phone 5173539136
Organization name Michigan State University
Department Animal Science
Lab Wen Huang
Street address 474 S Shaw Ln, Anthony Hall 1205F
City East Lansing
State/province MI
ZIP/Postal code 48824
Country USA
 
Platform ID GPL17275
Series (1)
GSE117850 Gene Expression Networks in the Drosophila Genetic Reference Panel
Relations
BioSample SAMN09738108
SRA SRX4483939

Supplementary data files not provided
Raw data are available in SRA
Processed data are available on Series record
