Hi folks,
I have downloaded RNA-Seq data for breast cancer cases from TCGA. Each individual file looks like this:
#################ba295155-272e-43eb-9d6a-e4c9c392e68b.rna_seq.augmented_star_gene_counts.tsv###############################
gene-model: GENCODE v36
gene_id gene_name gene_type unstranded stranded_first stranded_second tpm_unstranded fpkm_unstranded fpkm_uq_unstranded
N_unmapped 1884156 1884156 1884156
N_multimapping 5772894 5772894 5772894
N_noFeature 3331800 41101584 41057602
N_ambiguous 6974507 1651293 1654961
ENSG00000000003.15 TSPAN6 protein_coding 4370 2153 2217 56.2216 13.5542 13.0317
ENSG00000000005.6 TNMD protein_coding 7 2 5 0.2768 0.0667 0.0642
ENSG00000000419.13 DPM1 protein_coding 2625 1314 1312 126.9161 30.5977 29.4181
ENSG00000000457.14 SCYL3 protein_coding 3005 2389 2252 25.4778 6.1423 5.9055
ENSG00000000460.17 C1orf112 protein_coding 1578 1728 1697 15.4251 3.7188 3.5754
ENSG00000000938.13 FGR protein_coding 599 301 298 10.3359 2.4918 2.3958
ENSG00000000971.16 CFH protein_coding 4864 2453 2411 35.5701 8.5755 8.2449
ENSG00000001036.14 FUCA2 protein_coding 1944 1645 1610 40.2007 9.6918 9.3182
ENSG00000001084.13 GCLC protein_coding 1958 1196 1170 13.2587 3.1965 3.0733
ENSG00000001167.14 NFYA protein_coding 4597 2540 2504 70.3931 16.9708 16.3166
ENSG00000001460.18 STPG1 protein_coding 669 375 367 4.5871 1.1059 1.0633
ENSG00000001461.17 NIPAL3 protein_coding 3220 1671 1623 19.9990 4.8215 4.6356
ENSG00000001497.18 LAS1L protein_coding 3766 1791 1979 17.5048 4.2202 4.0575
The header lists the columns unstranded, stranded_first, stranded_second, tpm_unstranded, fpkm_unstranded, and fpkm_uq_unstranded. I want to run a differential expression analysis with DESeq2, but I am not sure which column to use as input. Should I take the stranded or the unstranded counts? Any suggestions are welcome.
Thanks
From the GDC documentation: https://docs.gdc.cancer.gov/Data/Bioinformatics_Pipelines/Expression_mRNA_Pipeline/#introduction
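
DESeq2 expects raw integer counts, so the tpm_* and fpkm_* columns are not suitable as input; that leaves unstranded, stranded_first, and stranded_second. Which of those three is right depends on how the library was prepared. In the excerpt above, stranded_first and stranded_second are each roughly half of unstranded, which is the pattern one would expect from an unstranded library, but please confirm that against the protocol for your samples. Below is a minimal Python sketch (standard library only) of how one file could be parsed and how the three count columns could be compared. The function names `read_counts` and `strand_totals` are made up for illustration, the real files are assumed to be tab-delimited, and the sample text only reproduces a few rows from the post.

```python
import csv
import io


def read_counts(handle, column="unstranded"):
    """Return {gene_id: count} for one count column of a STAR gene-counts TSV.

    Assumes a tab-delimited file whose header row starts with 'gene_id'.
    Anything before that row (e.g. the gene-model line) is ignored.
    """
    header = None
    idx = None
    counts = {}
    for fields in csv.reader(handle, delimiter="\t"):
        if not fields:
            continue
        if header is None:
            # Keep skipping until the real header row is found.
            if fields[0] == "gene_id":
                header = fields
                idx = header.index(column)
            continue
        gene_id = fields[0]
        # N_unmapped, N_multimapping, N_noFeature and N_ambiguous are
        # alignment summary rows, not genes, so they are dropped.
        if gene_id.startswith("N_"):
            continue
        counts[gene_id] = int(fields[idx])
    if header is None:
        raise ValueError("no header row starting with 'gene_id' found")
    return counts


def strand_totals(text):
    """Sum each of the three count columns to compare them side by side."""
    columns = ("unstranded", "stranded_first", "stranded_second")
    return {c: sum(read_counts(io.StringIO(text), c).values()) for c in columns}


# A few rows copied from the post, rebuilt as tab-delimited text.
SAMPLE_ROWS = [
    ["gene_id", "gene_name", "gene_type", "unstranded", "stranded_first",
     "stranded_second", "tpm_unstranded", "fpkm_unstranded", "fpkm_uq_unstranded"],
    ["N_unmapped", "", "", "1884156", "1884156", "1884156", "", "", ""],
    ["ENSG00000000003.15", "TSPAN6", "protein_coding",
     "4370", "2153", "2217", "56.2216", "13.5542", "13.0317"],
    ["ENSG00000000005.6", "TNMD", "protein_coding",
     "7", "2", "5", "0.2768", "0.0667", "0.0642"],
    ["ENSG00000000419.13", "DPM1", "protein_coding",
     "2625", "1314", "1312", "126.9161", "30.5977", "29.4181"],
]
SAMPLE = "# gene-model: GENCODE v36\n" + "\n".join(
    "\t".join(row) for row in SAMPLE_ROWS) + "\n"

counts = read_counts(io.StringIO(SAMPLE))
print(counts["ENSG00000000003.15"])  # 4370

totals = strand_totals(SAMPLE)
print(totals)  # {'unstranded': 7002, 'stranded_first': 3469, 'stranded_second': 3534}
```

If the two stranded totals are close to each other, as they are here, the unstranded column is the natural candidate; if one of them were close to the unstranded total and the other near zero, the library would look stranded and the larger column would be the one to take. The per-sample dictionaries can then be merged by gene_id into a genes-by-samples count matrix for DESeq2.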