|
Status |
Public on Apr 30, 2022 |
Title |
medullary thymic epithelial cells, mature, donor 221 [pt221_hi_5'Cap] |
Sample type |
SRA |
|
|
Source name |
thymus
|
Organism |
Homo sapiens |
Characteristics |
tissue: thymus; cell type: medullary thymic epithelial cells; phenotype: MHCII high; donor age: 4 months; donor sex: male
|
Extracted molecule |
total RNA |
Extraction protocol |
mTECs were sorted as CD45-, CDR2-, EPCAM+ cells, and MHCII (HLA-DR) was used to separate immature mTEClo and mature mTEChi cell populations. RNA from sorted mTEChi and mTEClo populations was extracted using the High Pure RNA Isolation Kit (Roche). The libraries were prepared manually as described in Pelechano (2014) Nature Protocols, yielding libraries containing only molecules derived from 5'Cap RNA.
|
|
|
Library strategy |
RNA-Seq |
Library source |
transcriptomic |
Library selection |
CAGE |
Instrument model |
Illumina HiSeq 2000 |
|
|
Description |
Chr: Transcription start region (TSR) chromosome; Start: TSR start location; Stop: TSR stop location; Samples: Samples with TSR; Sample_indices: Sample-level TSRs with internal indices that are present in the consensus TSR cluster; Strand: TSR strand; Listing: Number of sample indices in TSR; Listing_subject: Number of subjects in TSR; Hi: Number of mTEChi samples with TSR; Lo: Number of mTEClo samples with TSR; $pt_$sampletype_Reads: Number of reads in TSR for each sample; $pt_$sampletype_TPM: Power-law normalized TSR expression in tags per million for each sample; Annotation: HOMER gene type annotation; Distance_to_TSS: HOMER distance to known transcription start site; Nearest_Ensembl: HOMER closest Ensembl gene to each TSR; Gene_Name: Gene name for nearest Ensembl gene; Gene_Description: Gene description for nearest Ensembl gene; Annotations_short: Shortened version of gene Annotation; CpG%: TSR CpG%; GC%: TSR GC%; Housekeeping: In list of known housekeeping genes (True/False); TRA: In list of known tissue restricted antigens (True/False); Aire_dep: In list of known (derived from mouse orthologues) AIRE dependent genes (True/False); Fezf2_dep: In list of known (derived from mouse orthologues) FEZF2 dependent genes (True/False); Aire_dep_TRA: TRA AND Aire_dep (True/False); Fezf2_dep_TRA: TRA AND Fezf2_dep (True/False); Other_TRA: TRA AND not Aire_dep AND not Fezf2_dep (True/False)
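The three composite columns (Aire_dep_TRA, Fezf2_dep_TRA, Other_TRA) are Boolean combinations of the TRA, Aire_dep and Fezf2_dep columns as defined above. A minimal Python sketch of that logic follows; the `TsrFlags` class and `composite_flags` function are hypothetical names used only for illustration and are not part of the submitters' code.

```python
from dataclasses import dataclass


@dataclass
class TsrFlags:
    """Per-TSR membership flags, named after the table columns (illustrative)."""
    tra: bool        # TRA column
    aire_dep: bool   # Aire_dep column
    fezf2_dep: bool  # Fezf2_dep column


def composite_flags(flags: TsrFlags) -> dict:
    """Derive the composite columns exactly as the column descriptions state."""
    return {
        "Aire_dep_TRA": flags.tra and flags.aire_dep,
        "Fezf2_dep_TRA": flags.tra and flags.fezf2_dep,
        "Other_TRA": flags.tra and not flags.aire_dep and not flags.fezf2_dep,
    }


# A TRA that is AIRE dependent but not FEZF2 dependent.
example = composite_flags(TsrFlags(tra=True, aire_dep=True, fezf2_dep=False))
```

Under these definitions a TSR can be flagged in both Aire_dep_TRA and Fezf2_dep_TRA, whereas Other_TRA excludes both of them.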
|
Data processing |
Reads were demultiplexed, UMIs trimmed via umi_tools extract (umi_tools v1.1) and screened for contamination with fastq_screen (v0.14.0). Reads were aligned to GRCh38/Gencode annotation (release 33) using STAR (v2.7.2b) and deduplicated based on UMIs using umi_tools dedup. Deduplicated, aligned reads were filtered for uniquely mapped reads via samtools view -b -q 255 (samtools v1.11). Quality control of sequencing and alignment was conducted using FastQC (v0.11.8) and picard CollectRnaSeqMetrics (v2.18.20) and summarized with multiqc (v1.9). Raw reads were processed with BAMboozle (Ziegenhain, Nature Communications, 2021) to remove genetic variation that could be used to identify sample donors, i.e. any genetic variant was replaced with the corresponding reference genome allele. TSS calling: only the forward read (i.e. the 5' end) of each read pair was retained via samtools view -h -f 0x40. TSSs were then defined as the 5' position of the uniquely mapped forward reads. The expression levels of the TSSs for each mTEC sample were normalized to tags per million using the power-law normalization implemented in CAGEr (v1.32). After normalization, removeBatchEffect in limma (v3.46) was used to remove sequencing batch effects. Normalized TSSs in each sample were combined into TSRs using paraclu (v9) with a minimum tag cluster expression of 2 tags per million and a maximum cluster length of 20 bp. TSRs across samples were combined with bedops (v2.4.38), and consensus, strand-specific TSRs in human mTEC samples were called by merging TSRs derived from different samples within 20 bp of each other using bedtools merge (v2.29.2). TSRs were annotated using HOMER (v4.11.1), including mapping to the closest gene, calculation of TSR CpG/GC content (-CpG) and TATA motif search. Assembly: GRCh38/Gencode annotation (release 33). Supplementary files format and content: tab-delimited text files include RPKM values for each Sample
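Two steps above lend themselves to a short illustration: taking the 5' position of a forward read as the TSS, and merging per-sample TSRs on the same strand that lie within 20 bp of each other. The Python sketch below is not the submitters' pipeline (which used samtools and bedtools merge). It assumes 0-based, half-open coordinates as in BED, and it assumes that intervals whose gap is at most `max_gap` bases are merged, which is how the bedtools merge distance option is understood to behave. The function names `five_prime_position` and `merge_tsrs` are hypothetical.

```python
from collections import defaultdict


def five_prime_position(start: int, end: int, strand: str) -> int:
    """Return the 0-based 5' position of a read aligned to [start, end)."""
    if strand == "+":
        return start       # plus strand: 5' end is the leftmost base
    if strand == "-":
        return end - 1     # minus strand: 5' end is the rightmost base
    raise ValueError(f"unknown strand: {strand!r}")


def merge_tsrs(tsrs, max_gap=20):
    """Merge (chrom, start, end, strand) intervals per chromosome and strand.

    Intervals on the same chromosome and strand are merged when the next
    interval starts no more than max_gap bases after the current one ends.
    """
    groups = defaultdict(list)
    for chrom, start, end, strand in tsrs:
        groups[(chrom, strand)].append((start, end))

    merged = []
    for (chrom, strand), intervals in sorted(groups.items()):
        intervals.sort()
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end + max_gap:
                cur_end = max(cur_end, end)   # extend the current cluster
            else:
                merged.append((chrom, cur_start, cur_end, strand))
                cur_start, cur_end = start, end
        merged.append((chrom, cur_start, cur_end, strand))
    return merged


# Made-up intervals for illustration only.
sample_tsrs = [
    ("chr1", 100, 110, "+"),
    ("chr1", 125, 130, "+"),   # 15 bp after the first: merged
    ("chr1", 160, 170, "+"),   # 30 bp after the merged cluster: kept separate
    ("chr1", 105, 115, "-"),   # opposite strand: never merged with "+"
]
consensus = merge_tsrs(sample_tsrs, max_gap=20)
```

Grouping by strand before merging is what keeps the consensus TSRs strand-specific; overlapping intervals on opposite strands remain separate clusters.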
|
|
|
Submission date |
Apr 27, 2022 |
Last update date |
Apr 30, 2022 |
Contact name |
Hannah Verena Meyer |
E-mail(s) |
hmeyer@cshl.edu
|
Organization name |
Cold Spring Harbor Laboratory
|
Street address |
1 Bungtown Road
|
City |
Cold Spring Harbor |
State/province |
New York |
ZIP/Postal code |
11724 |
Country |
USA |
|
|
Platform ID |
GPL11154 |
Series (2) |
GSE201718 |
Transcriptome diversity in human medullary thymic epithelial cells - 5'Cap sequencing |
GSE201720 |
Transcriptome diversity in human medullary thymic epithelial cells |
|
Relations |
BioSample |
SAMN27918937 |
SRA |
SRX15018772 |