Sample GSM6070092
Status Public on Apr 30, 2022
Title medullary thymic epithelial cells, immature, donor 221 [pt221_lo_5'Cap]
Sample type SRA
Source name thymus
Organism Homo sapiens
Characteristics tissue: thymus
cell type: medullary thymic epithelial cells
phenotype: MHCII low
donor age: 4 months
donor sex: male
Extracted molecule total RNA
Extraction protocol mTECs were sorted as CD45-, CDR2-, EPCAM+ cells; MHCII (HLA-DR) expression was used to separate the immature mTEClo and mature mTEChi cell populations.
RNA from sorted mTEChi and mTEClo populations was extracted using the High Pure RNA Isolation Kit (Roche).
The libraries were prepared manually as described in Pelechano (2014) Nature Protocols, yielding libraries containing only molecules derived from 5'Cap RNA.
Library strategy RNA-Seq
Library source transcriptomic
Library selection CAGE
Instrument model Illumina HiSeq 2000
Description Chr: Transcription start region (TSR) chromosome
Start: TSR start location
Stop: TSR stop location
Samples: Samples with TSR
Sample_indices: Sample-level TSRs, with internal indices, that are present in the consensus TSR cluster
Strand: Strand
Listing: Number of sample indices in TSR
Listing_subject: Number of subjects in TSR
Hi: Number of mTEChi samples with TSR
Lo: Number of mTEClo samples with TSR
$pt_$sampletype_Reads: Number of reads in TSR for each sample
$pt_$sampletype_TPM: Power-law normalized TSR expression in tags per million for each sample
Annotation: HOMER gene type annotation
Distance_to_TSS: HOMER distance to known transcription start site
Nearest_Ensembl: HOMER closest Ensembl gene to each TSR
Gene_Name: Gene name for nearest Ensembl gene
Gene_Description: Gene description for nearest Ensembl gene
Annotations_short: Shortened version of gene Annotation
CpG%: TSR CpG%
GC%: TSR GC%
Housekeeping: In list of known housekeeping genes (True/False)
TRA: In list of known tissue restricted antigens (True/False)
Aire_dep: In list of known (derived from mouse orthologues) AIRE dependent genes (True/False)
Fezf2_dep: In list of known (derived from mouse orthologues) FEZF2 dependent genes (True/False)
Aire_dep_TRA: TRA AND Aire_dep (True/False)
Fezf2_dep_TRA: TRA AND Fezf2_dep (True/False)
Other_TRA: TRA AND not Aire_dep AND not Fezf2_dep (True/False)
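The three combined TRA columns are plain boolean combinations of the TRA, Aire_dep and Fezf2_dep flags. A minimal sketch of that logic is given here for readers who want to recompute or sanity-check the columns; the function name tra_flags and the example inputs are hypothetical and are not taken from the deposited table.

```python
def tra_flags(tra: bool, aire_dep: bool, fezf2_dep: bool) -> dict:
    """Derive the combined True/False columns from the three base flags.

    Follows the column definitions in the Description field:
      Aire_dep_TRA  = TRA AND Aire_dep
      Fezf2_dep_TRA = TRA AND Fezf2_dep
      Other_TRA     = TRA AND not Aire_dep AND not Fezf2_dep
    """
    return {
        "Aire_dep_TRA": tra and aire_dep,
        "Fezf2_dep_TRA": tra and fezf2_dep,
        "Other_TRA": tra and not aire_dep and not fezf2_dep,
    }


# Hypothetical example: a TRA that is AIRE dependent but not FEZF2 dependent
example = tra_flags(tra=True, aire_dep=True, fezf2_dep=False)
print(example)
```

Under these definitions a TSR can be flagged as both Aire_dep_TRA and Fezf2_dep_TRA, whereas Other_TRA excludes both of them.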
Data processing Reads were demultiplexed, UMIs were trimmed via umi_tools extract (umi_tools v1.1), and reads were screened for contamination with fastq_screen (v0.14.0).
Reads were aligned to GRCh38/Gencode annotation (release 33) using STAR (v2.7.2b) and deduplicated based on UMIs using umi_tools dedup.
Deduplicated, aligned reads were filtered for uniquely mapped reads via samtools view -b -q 255 (samtools v1.11).
Quality control of sequencing and alignment was conducted using FastQC (v0.11.8) and picard CollectRnaSeqMetrics (v2.18.20) and summarized with multiqc (v1.9).
Raw reads were processed with bamboozle (Ziegenhain, Nature Communications, 2021) to remove genetic variation that could help identify samples, i.e. any genetic variant was replaced with the corresponding reference genome allele.
TSS calling: only the forward read (i.e. the 5' end) of each read pair was retained via samtools view -h -f 0x40. TSSs were then defined as the 5' position of the uniquely mapped forward reads. The expression levels of the TSSs for each mTEC sample were normalized to tags per million using the power law normalization implemented in CAGEr (v1.32). After normalization, removeBatchEffect in limma (v3.46) was used to remove sequencing batch effects.
Normalized TSSs in each sample were combined into TSRs using paraclu (v9) with a minimum tag cluster expression of 2 tags per million and a maximum cluster length of 20bp. TSRs across samples were combined with bedops (v2.4.38), and consensus, strand-specific TSRs in human mTEC samples were called by merging TSRs derived from different samples within 20bp proximity using bedtools merge (v2.29.2).
TSRs were annotated using HOMER (v4.11.1), including mapping to the closest gene, calculation of TSR CpG/GC content (-CpG) and a TATA motif search.
Assembly: GRCh38/Gencode annotation (release 33)
Supplementary files format and content: tab-delimited text files include RPKM values for each Sample
Submission date Apr 27, 2022
Last update date Apr 30, 2022
Contact name Hannah Verena Meyer
Organization name Cold Spring Harbor Laboratory
Street address 1 Bungtown Road
City Cold Spring Harbor
State/province New York
ZIP/Postal code 11724
Country USA
Platform ID GPL11154
Series (2)
GSE201718 Transcriptome diversity in human medullary thymic epithelial cells - 5'Cap sequencing
GSE201720 Transcriptome diversity in human medullary thymic epithelial cells
BioSample SAMN27918936
SRA SRX15018773

Supplementary data files not provided
Raw data are available in SRA
Processed data are available on Series record