Question

TCGA RNA Quantification

23 months ago

shivangi.agarwal800 ▴ 120

Hi folks,

I have downloaded RNA-Seq data for breast cancer cases from TCGA. When I open an individual file, it looks like this:

#################ba295155-272e-43eb-9d6a-e4c9c392e68b.rna_seq.augmented_star_gene_counts.tsv###############################

gene-model: GENCODE v36
gene_id             gene_name  gene_type       unstranded  stranded_first  stranded_second  tpm_unstranded  fpkm_unstranded  fpkm_uq_unstranded
N_unmapped                                     1884156     1884156         1884156
N_multimapping                                 5772894     5772894         5772894
N_noFeature                                    3331800     41101584        41057602
N_ambiguous                                    6974507     1651293         1654961
ENSG00000000003.15  TSPAN6     protein_coding  4370        2153            2217             56.2216         13.5542          13.0317
ENSG00000000005.6   TNMD       protein_coding  7           2               5                0.2768          0.0667           0.0642
ENSG00000000419.13  DPM1       protein_coding  2625        1314            1312             126.9161        30.5977          29.4181
ENSG00000000457.14  SCYL3      protein_coding  3005        2389            2252             25.4778         6.1423           5.9055
ENSG00000000460.17  C1orf112   protein_coding  1578        1728            1697             15.4251         3.7188           3.5754
ENSG00000000938.13  FGR        protein_coding  599         301             298              10.3359         2.4918           2.3958
ENSG00000000971.16  CFH        protein_coding  4864        2453            2411             35.5701         8.5755           8.2449
ENSG00000001036.14  FUCA2      protein_coding  1944        1645            1610             40.2007         9.6918           9.3182
ENSG00000001084.13  GCLC       protein_coding  1958        1196            1170             13.2587         3.1965           3.0733
ENSG00000001167.14  NFYA       protein_coding  4597        2540            2504             70.3931         16.9708          16.3166
ENSG00000001460.18  STPG1      protein_coding  669         375             367              4.5871          1.1059           1.0633
ENSG00000001461.17  NIPAL3     protein_coding  3220        1671            1623             19.9990         4.8215           4.6356
ENSG00000001497.18  LAS1L      protein_coding  3766        1791            1979             17.5048         4.2202           4.0575

The header lists unstranded, stranded_first, stranded_second, tpm_unstranded, and fpkm_uq_unstranded columns. I want to run a differential expression analysis with DESeq2, but I am not sure which column to use. Should I take the stranded or the unstranded values? Thanks in advance, guys. Please give your suggestions.

Thanks

RNA-Seq TCGA • 1.7k views

Updated 23 months ago by jv ★ 1.8k • written 23 months ago by shivangi.agarwal800 ▴ 120

From the GDC documentation (https://docs.gdc.cancer.gov/Data/Bioinformatics_Pipelines/Expression_mRNA_Pipeline/#introduction):

To facilitate harmonization across samples, all RNA-Seq reads are treated as unstranded during analyses

Reply • 23 months ago by jv ★ 1.8k
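In practice this means taking the raw integers from the unstranded column, since DESeq2 expects un-normalized counts rather than the TPM or FPKM columns. A minimal sketch of pulling that column out of one file is shown here, using only the Python standard library. The function name `read_unstranded_counts` and the inline `SAMPLE_TSV` excerpt are hypothetical and exist only for illustration; the sketch assumes the real file is tab-separated and that everything before the `gene_id` header row can be skipped.

```python
import csv
import io
from itertools import dropwhile

# Small tab-separated excerpt modelled on the file shown in the question
# (assumption: real files are tab-delimited, with empty cells on N_ rows).
SAMPLE_TSV = (
    "# gene-model: GENCODE v36\n"
    "gene_id\tgene_name\tgene_type\tunstranded\tstranded_first\t"
    "stranded_second\ttpm_unstranded\tfpkm_unstranded\tfpkm_uq_unstranded\n"
    "N_unmapped\t\t\t1884156\t1884156\t1884156\t\t\t\n"
    "N_noFeature\t\t\t3331800\t41101584\t41057602\t\t\t\n"
    "ENSG00000000003.15\tTSPAN6\tprotein_coding\t4370\t2153\t2217\t"
    "56.2216\t13.5542\t13.0317\n"
    "ENSG00000000005.6\tTNMD\tprotein_coding\t7\t2\t5\t0.2768\t0.0667\t0.0642\n"
)


def read_unstranded_counts(handle, column="unstranded"):
    """Return {gene_id: raw count} for one gene-counts file (hypothetical helper)."""
    # Drop any leading lines (e.g. the gene-model line) until the header row.
    body = dropwhile(lambda line: not line.startswith("gene_id"), handle)
    reader = csv.DictReader(body, delimiter="\t")
    counts = {}
    for row in reader:
        # N_unmapped, N_multimapping, N_noFeature and N_ambiguous are
        # mapping summaries, not genes, so they are left out.
        if row["gene_id"].startswith("N_"):
            continue
        counts[row["gene_id"]] = int(row[column])
    return counts


counts = read_unstranded_counts(io.StringIO(SAMPLE_TSV))
print(counts)
```

Running the same function over every sample file and joining the dictionaries on gene_id would give the genes-by-samples count matrix that DESeq2 takes as input.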

score 0 · Answer 1 · 2022-12-09

23 months ago

Zhenyu Zhang ★ 1.2k

In general, use the unstranded column, unless you know that all your samples are stranded and come from the same project (so batch effects are smaller).
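If you are unsure whether a sample is stranded, one rough check is to compare the two stranded columns: in an unstranded library the gene counts split about evenly between stranded_first and stranded_second, while a stranded library puts most counts in one of them. The sketch below illustrates that idea on the first five gene rows from the question. The function name `strandedness_hint` and the 0.8 cutoff are assumptions chosen for illustration, not values taken from the GDC or STAR documentation.

```python
def strandedness_hint(pairs, dominance=0.8):
    """Guess library strandedness from (stranded_first, stranded_second) gene counts.

    `dominance` is an arbitrary illustrative cutoff (assumption): the share of
    counts one column must hold before the library is called stranded.
    """
    pairs = list(pairs)
    first = sum(p[0] for p in pairs)
    second = sum(p[1] for p in pairs)
    total = first + second
    if total == 0:
        return "undetermined"
    share_first = first / total
    if share_first >= dominance:
        return "stranded_first"
    if share_first <= 1 - dominance:
        return "stranded_second"
    return "unstranded"


# (stranded_first, stranded_second) for the first five genes in the question.
question_rows = [(2153, 2217), (2, 5), (1314, 1312), (2389, 2252), (1728, 1697)]
hint = strandedness_hint(question_rows)
print(hint)  # prints "unstranded"
```

On a real file you would feed in all gene rows rather than five, and treat the result as a hint to confirm against the project metadata.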
