TCGA RNA Quantification
23 months ago

Hi folks,

I have downloaded RNA-Seq data for breast cancer cases from TCGA. When I open an individual file, it looks like this:

File: ba295155-272e-43eb-9d6a-e4c9c392e68b.rna_seq.augmented_star_gene_counts.tsv

gene-model: GENCODE v36

gene_id             gene_name  gene_type       unstranded  stranded_first  stranded_second  tpm_unstranded  fpkm_unstranded  fpkm_uq_unstranded
N_unmapped                                     1884156     1884156         1884156
N_multimapping                                 5772894     5772894         5772894
N_noFeature                                    3331800     41101584        41057602
N_ambiguous                                    6974507     1651293         1654961
ENSG00000000003.15  TSPAN6     protein_coding  4370        2153            2217             56.2216         13.5542          13.0317
ENSG00000000005.6   TNMD       protein_coding  7           2               5                0.2768          0.0667           0.0642
ENSG00000000419.13  DPM1       protein_coding  2625        1314            1312             126.9161        30.5977          29.4181
ENSG00000000457.14  SCYL3      protein_coding  3005        2389            2252             25.4778         6.1423           5.9055
ENSG00000000460.17  C1orf112   protein_coding  1578        1728            1697             15.4251         3.7188           3.5754
ENSG00000000938.13  FGR        protein_coding  599         301             298              10.3359         2.4918           2.3958
ENSG00000000971.16  CFH        protein_coding  4864        2453            2411             35.5701         8.5755           8.2449
ENSG00000001036.14  FUCA2      protein_coding  1944        1645            1610             40.2007         9.6918           9.3182
ENSG00000001084.13  GCLC       protein_coding  1958        1196            1170             13.2587         3.1965           3.0733
ENSG00000001167.14  NFYA       protein_coding  4597        2540            2504             70.3931         16.9708          16.3166
ENSG00000001460.18  STPG1      protein_coding  669         375             367              4.5871          1.1059           1.0633
ENSG00000001461.17  NIPAL3     protein_coding  3220        1671            1623             19.9990         4.8215           4.6356
ENSG00000001497.18  LAS1L      protein_coding  3766        1791            1979             17.5048         4.2202           4.0575

The header lists unstranded, stranded_first, stranded_second, tpm_unstranded, fpkm_unstranded, and fpkm_uq_unstranded values. I want to run a differential expression analysis with DESeq2, but I am not sure which column to use. Should I take the stranded or the unstranded counts? Please share your suggestions.

Thanks in advance.

RNA-Seq TCGA • 1.7k views

From the GDC documentation (https://docs.gdc.cancer.gov/Data/Bioinformatics_Pipelines/Expression_mRNA_Pipeline/#introduction):

"To facilitate harmonization across samples, all RNA-Seq reads are treated as unstranded during analyses."
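
In practice that means pulling the raw "unstranded" column from every sample file and stacking the columns into a gene-by-sample integer matrix, which is the kind of input DESeq2 expects. Here is a rough Python sketch of that step, not an official GDC tool. It assumes the files are tab-delimited, that the header row starts with "gene_id", and that the four N_* rows are summary lines to skip. The function names are my own, and the sample_b values are made-up numbers included only to show the matrix shape.

```python
import csv


def read_unstranded_counts(tsv_text, column="unstranded"):
    """Return {gene_id: raw count} for one augmented_star_gene_counts.tsv file."""
    lines = tsv_text.splitlines()
    # Skip anything above the header (e.g. the gene-model comment line).
    header_at = next(i for i, ln in enumerate(lines) if ln.startswith("gene_id"))
    reader = csv.DictReader(lines[header_at:], delimiter="\t")
    counts = {}
    for row in reader:
        # N_unmapped, N_multimapping, N_noFeature, N_ambiguous are not genes.
        if row["gene_id"].startswith("N_"):
            continue
        counts[row["gene_id"]] = int(row[column])
    return counts


def build_count_matrix(files):
    """files maps sample name -> file text. Returns (genes, samples, matrix)."""
    per_sample = {name: read_unstranded_counts(text) for name, text in files.items()}
    samples = sorted(per_sample)
    # Keep only genes present in every sample so the matrix has no holes.
    genes = sorted(set.intersection(*(set(c) for c in per_sample.values())))
    matrix = [[per_sample[s][g] for s in samples] for g in genes]
    return genes, samples, matrix


HEADER = "\t".join([
    "gene_id", "gene_name", "gene_type", "unstranded", "stranded_first",
    "stranded_second", "tpm_unstranded", "fpkm_unstranded", "fpkm_uq_unstranded",
])

# Rows copied from the file shown in the question.
sample_a = "\n".join([
    "# gene-model: GENCODE v36",
    HEADER,
    "N_unmapped\t\t\t1884156\t1884156\t1884156\t\t\t",
    "ENSG00000000003.15\tTSPAN6\tprotein_coding\t4370\t2153\t2217\t56.2216\t13.5542\t13.0317",
    "ENSG00000000005.6\tTNMD\tprotein_coding\t7\t2\t5\t0.2768\t0.0667\t0.0642",
])

# Hypothetical second sample: made-up numbers, for illustration only.
sample_b = "\n".join([
    HEADER,
    "ENSG00000000003.15\tTSPAN6\tprotein_coding\t4100\t2000\t2100\t50.0\t12.0\t11.5",
    "ENSG00000000005.6\tTNMD\tprotein_coding\t12\t5\t7\t0.4\t0.1\t0.1",
])

genes, samples, matrix = build_count_matrix({"sample_a": sample_a, "sample_b": sample_b})
```

Write the resulting matrix out as a CSV and load it in R to build the DESeq2 dataset. Use the raw counts rather than the TPM or FPKM columns, since DESeq2 does its own normalization.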

23 months ago
Zhenyu Zhang ★ 1.2k

In general you should use the unstranded column, unless you know that all of your data are stranded and come from the same project (so batch effects are smaller).
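
If you are unsure whether a library is stranded, one common rule of thumb (a heuristic, not something the GDC pipeline reports) is to compare the stranded_first and stranded_second totals over the gene rows. Roughly equal totals suggest an unstranded library, while a strong skew toward one column suggests a stranded one. A minimal Python sketch, using three genes from the file in the question; the 0.4 to 0.6 window is an arbitrary threshold I picked for illustration, and the function name is my own:

```python
def first_strand_fraction(rows):
    """rows: iterable of (stranded_first, stranded_second) counts per gene."""
    first = sum(r[0] for r in rows)
    second = sum(r[1] for r in rows)
    return first / (first + second)


def describe_strandedness(fraction, low=0.4, high=0.6):
    # low/high are illustrative cutoffs, not values from any official source.
    if low <= fraction <= high:
        return "roughly balanced (looks unstranded)"
    return "skewed (may be stranded)"


# (stranded_first, stranded_second) for TSPAN6, TNMD and DPM1 from the question.
example_rows = [(2153, 2217), (2, 5), (1314, 1312)]

fraction = first_strand_fraction(example_rows)
label = describe_strandedness(fraction)
print(f"{fraction:.3f} -> {label}")
```

On a real file you would sum over all gene rows rather than three, and check every sample before deciding to switch away from the unstranded column.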
