GEO Accession viewer

Sample GSM7766815

Status

Public on Sep 11, 2023

Title

P152

Sample type

SRA

Source name

3rd and 4th rosette leaves

Organism

Arabidopsis thaliana

Characteristics

tissue: 3rd and 4th rosette leaves
genotype: Ws-2

Growth protocol

Seeds of the Arabidopsis thaliana ecotype Wassilewskija (Ws-2) were surface sterilized and then grown on MS-agar plates containing 1% sucrose. The plates were cold treated at 4 °C in the dark for 4 days and then moved to 16 h light / 8 h dark cycles at a constant temperature of 21 °C. Seedlings were transferred to soil (with 5% sand) after 10 days, and the 3rd and 4th true rosette leaves were tracked for future reference. After 10 days in soil, 75 plants were selected from the 225 plants grown, chosen to ensure a diversity of bolting statuses. At ZT4 (4 h after lights on) the following day, the 3rd and 4th rosette leaves of each individual plant were pooled and flash frozen in liquid nitrogen.

Extracted molecule

polyA RNA

Extraction protocol

Total RNA was isolated from these samples using the Qiagen RNeasy Plant Mini Kit (Cat no. 74904). Residual genomic DNA was removed using the Invitrogen Turbo DNA-free kit (Cat no. AM1907), according to the manufacturer’s protocol.
Libraries were prepared with the NEBNext Ultra II Directional Library Prep Kit for Illumina (Cat no. E7765), using the NEBNext poly(A) magnetic isolation module (Cat no. E7490). Quality control was performed with the Agilent 2100 Bioanalyzer instrument (Part no. G2939BA). Finally, a total of 70 libraries were pooled and sequenced by Novogene, using one lane on an Illumina NovaSeq system.

Library strategy

RNA-Seq

Library source

transcriptomic

Library selection

cDNA

Instrument model

Illumina NovaSeq 6000

Description

4 days of stratification at 4C; 10 days grown on plates (16h light / 8h dark); 10 days grown in soil (same photoperiod); then sampled

Data processing

Before analysis of the raw sequencing data, FastQC v0.11.7 (Andrews et al. 2012) was used to assess read quality.
Illumina adapters were trimmed using CutAdapt v3.4 (Martin 2011). (parameters: -a AGATCGGAAGAG -A AGATCGGAAGAG -j 0 -q 10,10 -m 30 -u 10 -U 10)
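The effect of two of the CutAdapt options above can be illustrated with a minimal Python sketch. It is not CutAdapt itself: it only mimics the unconditional 10-base clip (-u 10 / -U 10) and the minimum length filter (-m 30), and it ignores adapter matching and quality trimming. The function name and constants are hypothetical, and the assumption that a pair is discarded when either mate is too short reflects what is understood to be CutAdapt's default pair filter.

```python
from typing import Optional, Tuple

CLIP_5PRIME = 10  # mirrors -u 10 (read 1) and -U 10 (read 2)
MIN_LENGTH = 30   # mirrors -m 30


def clip_and_filter_pair(read1: str, read2: str) -> Optional[Tuple[str, str]]:
    """Clip the first CLIP_5PRIME bases from each mate, then drop the pair
    if either clipped mate is shorter than MIN_LENGTH (assumed behaviour)."""
    clipped1 = read1[CLIP_5PRIME:]
    clipped2 = read2[CLIP_5PRIME:]
    if len(clipped1) < MIN_LENGTH or len(clipped2) < MIN_LENGTH:
        return None  # pair discarded
    return clipped1, clipped2
```

Under these assumptions a read must be at least 40 bases long before clipping for its pair to survive, even if no adapter or low-quality bases are removed.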
Reads were quantified using Salmon v1.6.0 (Patro et al. 2017) and the TAIR10 transcriptome (Berardini et al. 2015). (parameters: -l A --validateMappings)
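As a hedged illustration of how the Salmon output might be consumed, the sketch below parses the text of a quant.sf file into a transcript-to-TPM mapping. It assumes the standard tab-separated quant.sf header (Name, Length, EffectiveLength, TPM, NumReads); the function name is hypothetical, and the location of quant.sf inside the supplementary .tar file is not stated in this record.

```python
import csv
import io
from typing import Dict


def read_salmon_tpm(quant_sf_text: str) -> Dict[str, float]:
    """Parse the text of a Salmon quant.sf file into {transcript: TPM}.

    Assumes a tab-separated table with 'Name' and 'TPM' columns.
    """
    reader = csv.DictReader(io.StringIO(quant_sf_text), delimiter="\t")
    return {row["Name"]: float(row["TPM"]) for row in reader}
```

The text can come from any source, for example a file extracted from the per-sample archive, so the parsing step stays independent of how the archive is laid out.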
To analyse genetic variation within the population, we followed GATK guidelines for short variant discovery from RNA-seq data.
We used STAR (v2.7.10b; two-pass mode) to align reads to the TAIR10 genome, revision 56 (Dobin et al. 2013). (parameters: --runThreadN 20 --readFilesIn ${file}_1.fq.gz ${file}_2.fq.gz --readFilesCommand "gunzip -c" --outSAMtype BAM SortedByCoordinate --limitBAMsortRAM 5000000000 --twopassMode Basic --outFilterScoreMinOverLread 0 --outFilterMatchNminOverLread 0 --outFilterMatchNmin 0 --outFilterMismatchNmax 1)
We then pre-processed the aligned reads using the commands ‘MarkDuplicates’ (parameters: gatk --java-options '-Xmx28G' MarkDuplicates -ASO coordinate --VERBOSITY DEBUG --MAX_RECORDS_IN_RAM 5000000) and ‘SplitNCigarReads’ (parameters: gatk --java-options '-Xmx28G' SplitNCigarReads --verbosity DEBUG) from GATK (v4.3.0.0) (Poplin et al. 2018).
We used the command ‘HaplotypeCaller’ (parameters: -ERC GVCF --dont-use-soft-clipped-bases true --standard-min-confidence-threshold-for-calling 20) to produce genomic variant calling format (gVCF) files per sample.
Finally, we combined the gVCF files using ‘CombineGVCFs’ (parameters: gatk --java-options '-Xmx60G' CombineGVCFs) and identified single-nucleotide variants (SNVs) and insertions/deletions (indels) that were confidently called across the whole population, using ‘GenotypeGVCFs’ (parameters: gatk --java-options "-Xmx60g" GenotypeGVCFs).
For further processing of these initial SNVs and indels, we followed the filtering guidelines suggested by Cruz et al. (2020). Specifically, we selected only biallelic variants with a minimum genotype quality of 40 that were called in at least 80% of all samples, using VCFtools (v0.1.16) (Danecek et al. 2011). (parameters: --minGQ 40 --max-missing 0.8 --out joint_genotyping/all_genotyped_filtered.vcf --recode)
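The combined effect of the genotype quality and call rate thresholds can be sketched in Python as follows. This is a simplified illustration, not VCFtools itself: it assumes that genotypes with GQ below 40 are treated as missing before the call rate is computed, and the function names and the list-of-GQ input format are hypothetical.

```python
from typing import List, Optional

MIN_GQ = 40          # mirrors --minGQ 40
MIN_CALL_RATE = 0.8  # mirrors --max-missing 0.8


def retained_genotypes(gqs: List[Optional[int]]) -> List[bool]:
    """True where a genotype is kept: GQ is present and at least MIN_GQ."""
    return [gq is not None and gq >= MIN_GQ for gq in gqs]


def site_passes(gqs: List[Optional[int]]) -> bool:
    """Keep a site if the fraction of retained genotypes reaches MIN_CALL_RATE."""
    if not gqs:
        return False
    kept = retained_genotypes(gqs)
    return sum(kept) / len(kept) >= MIN_CALL_RATE
```

Here `None` stands for a genotype that was already missing, so a site with five samples survives only if at least four of them have a genotype with GQ of 40 or more.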
We then imputed missing genotypes using Beagle (v5.4, 22Jul22, 46e) on default settings (Browning, Zhou, and Browning 2018). (parameters: java -Xmx10240m -jar beagle.22Jul22.46e.jar nthreads=2) Note that, due to pre-processing by Beagle, the variant calling format (VCF) file includes two versions of the same heterozygous haplotype (‘0|1’ and ‘1|0’).
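Because ‘0|1’ and ‘1|0’ describe the same heterozygous genotype at a biallelic site, downstream code may want to collapse them. A minimal sketch, assuming biallelic diploid genotypes written in standard VCF GT notation; the function name is hypothetical:

```python
import re
from typing import Optional


def alt_allele_dosage(gt: str) -> Optional[int]:
    """Count non-reference alleles in a GT string such as '0|1' or '1/1'.

    Phased ('|') and unphased ('/') separators are treated the same, so
    '0|1' and '1|0' both give 1. Returns None if any allele is missing.
    """
    alleles = re.split(r"[|/]", gt)
    if "." in alleles:
        return None
    return sum(1 for allele in alleles if allele != "0")
```

Working with dosages (0, 1 or 2) rather than raw GT strings avoids counting the two heterozygous encodings as different genotypes.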
Finally, we selected variants with a minor allele frequency (MAF) of at least 0.05, using VCFtools. (parameters: vcftools --maf 0.05 --recode)
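The MAF threshold can be illustrated with a short Python sketch. It assumes biallelic sites, diploid samples and no missing genotypes (as would be expected after imputation), and takes per-sample alternate allele dosages (0, 1 or 2) as input; the function names are hypothetical.

```python
from typing import List

MIN_MAF = 0.05  # mirrors --maf 0.05


def minor_allele_frequency(dosages: List[int]) -> float:
    """MAF at a biallelic site from diploid alternate allele dosages."""
    if not dosages:
        raise ValueError("at least one genotype is required")
    alt_freq = sum(dosages) / (2 * len(dosages))
    return min(alt_freq, 1.0 - alt_freq)


def passes_maf(dosages: List[int]) -> bool:
    """Keep the site if its MAF is at least MIN_MAF."""
    return minor_allele_frequency(dosages) >= MIN_MAF
```

For example, a single heterozygous plant among 20 samples gives a MAF of 1/40 = 0.025, which would fall below the 0.05 threshold.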
Assembly: TAIR10
Supplementary files format and content: Salmon quantification folder for each sample, included as a .tar file
Supplementary files format and content: A final VCF file for all samples

Submission date

Sep 07, 2023

Last update date

Sep 11, 2023

Contact name

Ethan James Redmond

E-mail(s)

ethan.redmond@york.ac.uk

Organization name

University of York

Department

Department of Biology

Lab

Ezer lab

Street address

Wentworth Way

City

York

ZIP/Postal code

YO10 5DD

Country

United Kingdom

Platform ID

GPL26208

Series (1)

GSE242681

Single-plant-omics reveals the cascade of transcriptional changes during the vegetative-to-reproductive transition

Relations

BioSample

SAMN37318740

SRA

SRX21665226

Supplementary file	Size	Download	File type/resource
GSM7766815_P152_quant.tar.gz	617.7 Kb	(ftp)(http)	TAR
Raw data are available in SRA
Processed data provided as supplementary file