Question

TCGA expression dataset handling

0

3.8 years ago

imaparna27 ▴ 20

Hi, I have downloaded open-access TCGA FPKM samples for expression analysis, along with all the relevant manifest and metadata files, from the TCGA cart. I had to download the data separately because of the different subtypes and the different raw-data processing types (e.g. RNA-seq, miRNA-seq).

Now, how do I build a matrix for DEG analysis, given that every sample is an individual txt file and I need to put, say, 100 samples and their sample conditions together in one csv file?

I am using DESeq2 in R for the DEG analysis. How can I solve this? Also, is there an alternative to the FPKM files from TCGA for expression analysis of TCGA data?

expression analysis TCGA samples FPKM data • 1.8k views

ADD COMMENT • link updated 2.6 years ago by munibabashir27 • 0 • written 3.8 years ago by imaparna27 ▴ 20

score 0 · Answer 1 · 2021-02-04

0

3.8 years ago

dsull ★ 6.9k

Generally, I find it easier to obtain TCGA data from https://xenabrowser.net/datapages/ -- you get gene expression data for all samples into a single file (and then you can just read that file into R and select your samples of interest).
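
As a rough illustration of the "select your samples of interest" step, here is a minimal sketch in Python using only the standard library (the same idea carries over to R). It assumes the downloaded matrix is tab-separated with gene IDs in the first column and one column per sample barcode. The function name `select_samples`, the gene IDs, and two of the barcodes in the demo are made up for the example.

```python
import csv
import io

def select_samples(matrix_file, wanted):
    """Keep the gene-ID column plus the sample columns named in `wanted`."""
    reader = csv.reader(matrix_file, delimiter="\t")
    header = next(reader)
    # Column 0 holds gene IDs; keep it, plus every requested sample column.
    keep = [0] + [i for i, name in enumerate(header) if i > 0 and name in wanted]
    new_header = [header[i] for i in keep]
    rows = [[row[i] for i in keep] for row in reader]
    return new_header, rows

# Tiny invented matrix standing in for the downloaded file (assumed layout).
demo = io.StringIO(
    "sample\tTCGA-CJ-4875-01\tTCGA-XX-0001-11\tTCGA-XX-0002-01\n"
    "ENSG_A\t10\t20\t30\n"
    "ENSG_B\t0\t5\t7\n"
)
header, rows = select_samples(demo, {"TCGA-CJ-4875-01", "TCGA-XX-0002-01"})
print(header)
print(rows)
```

For a real download you would pass an opened file instead of the `io.StringIO` demo. It is also worth checking the unit listed on the dataset page: some matrices are stored log-transformed, and the values would then need converting back to counts before going into DESeq2.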

From that site, you can also get counts that you can use for DEG analysis. You need to use counts, not FPKMs. You unfortunately can't use FPKMs for statistically sound DEG analysis.
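
A small worked example may help show why. FPKM divides each count by the gene length and the library size, so the number of reads actually observed can no longer be recovered, while DESeq2 expects un-normalized counts as input. The sketch below uses the standard FPKM formula; the gene length, read counts, and library sizes are invented numbers chosen only for illustration.

```python
def fpkm(read_count, gene_length_bp, total_mapped_reads):
    """Fragments per kilobase of transcript per million mapped reads."""
    return read_count * 1_000_000_000 / (gene_length_bp * total_mapped_reads)

# Same hypothetical 2000 bp gene in two samples of very different depth.
shallow = fpkm(read_count=10, gene_length_bp=2000, total_mapped_reads=1_000_000)
deep = fpkm(read_count=1000, gene_length_bp=2000, total_mapped_reads=100_000_000)

# Both samples give the same FPKM, although one estimate rests on 10 reads
# and the other on 1000 reads.
print(shallow, deep)
```

The two FPKM values are identical, yet the estimate based on 10 reads is far noisier than the one based on 1000. A count-based model can account for that difference only if it sees the counts themselves.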

I prefer using the files generated by "UCSC Toil RNA-seq Recompute" on that site -- that pipeline is more up-to-date than GDC's.

ADD COMMENT • link 3.8 years ago by dsull ★ 6.9k

0

Thanks for your response. I went through the data and matrix you suggested; however, I couldn't find the sample information, i.e. whether a sample is control or tumor. The matrix seems to contain only counts, with TCGA barcodes/identifiers as the columns.

ADD REPLY • link 3.8 years ago by imaparna27 ▴ 20

1

The TCGA barcodes look something like: TCGA-CJ-4875-01

To figure out what this means (e.g. if it's tumor or normal), please refer to: https://docs.gdc.cancer.gov/Encyclopedia/pages/TCGA_Barcode/

(Hint: The example I provided above is tumor, not normal)
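
A minimal Python sketch of that lookup, assuming the barcode follows the layout described on the GDC page: the fourth dash-separated field starts with a two-digit sample type code, where 01 to 09 are tumor types, 10 to 19 are normal types, and 20 to 29 are control samples. The function name `sample_type`, the `coldata.csv`-style output, and the second barcode are invented for the example.

```python
import csv
import io

def sample_type(barcode):
    """Classify a TCGA barcode as tumor, normal, or control from its sample code."""
    fields = barcode.split("-")
    if len(fields) < 4 or fields[0] != "TCGA":
        raise ValueError(f"not a sample-level TCGA barcode: {barcode!r}")
    code = int(fields[3][:2])  # a vial letter may follow the two digits, e.g. "01A"
    if 1 <= code <= 9:
        return "tumor"
    if 10 <= code <= 19:
        return "normal"
    if 20 <= code <= 29:
        return "control"
    return "unknown"

# Build a two-column sample table (sample, condition) of the kind DESeq2 needs.
barcodes = ["TCGA-CJ-4875-01", "TCGA-XX-0001-11A"]  # second barcode is made up
out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerow(["sample", "condition"])
for bc in barcodes:
    writer.writerow([bc, sample_type(bc)])
coldata_csv = out.getvalue()
print(coldata_csv)
```

Applied to the column names of the expression matrix, this gives the sample-condition table that goes alongside the count matrix; writing to a real file instead of `io.StringIO` would produce a csv that R can read.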

ADD REPLY • link 3.8 years ago by dsull ★ 6.9k

0

Hi, from your comment I didn't understand how to identify our samples of interest. I'm new to genomic data analysis and R.

I downloaded datasets for two tumours from GDC and combined all of the samples into one file covering 60483 gene_ids. My goal is to find the genes common to both tumours. I ran DESeq2, which gave me a result file listing differentially expressed genes. I'm not sure whether I'm doing this right. Please guide me. Thanks

ADD REPLY • link 2.6 years ago by munibabashir27 • 0