Hi, I have downloaded open-access TCGA FPKM files for expression analysis, along with the relevant manifest and metadata files, from the TCGA cart. I had to download the data separately for each subtype and for each data type (e.g. RNA-seq, miRNA-seq).
Now, how do I build a matrix for DEG analysis, given that each sample is an individual txt file and I need to put, say, 100 samples and their sample conditions together in one CSV file?
I am using DESeq2 in R for the DEG analysis. How can I solve this? Also, is there an alternative to FPKM for expression analysis of TCGA data?
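A minimal sketch of the matrix-building step, written in Python rather than R and not taken from any reply in this thread. It assumes each per-sample file is plain tab-separated text with a gene ID in the first column and a value in the second, and that rows starting with "__" are htseq-count summary rows to skip. The function names (parse_counts, build_matrix, write_matrix) are made up for illustration. Note that DESeq2 expects raw, un-normalized integer counts rather than FPKM, so the sketch parses integers and would suit raw count files, if those are available for your project.

```python
import csv


def parse_counts(lines):
    """Parse tab-separated 'gene_id<TAB>count' lines into {gene_id: count}."""
    counts = {}
    for line in lines:
        line = line.strip()
        # Skip blank lines and summary rows such as '__no_feature' (assumed format).
        if not line or line.startswith("__"):
            continue
        gene_id, value = line.split("\t")[:2]
        counts[gene_id] = int(value)
    return counts


def build_matrix(samples):
    """Turn {sample_id: {gene_id: count}} into (header, rows), one row per gene."""
    if not samples:
        return ["gene_id"], []
    sample_ids = sorted(samples)
    # Keep only genes that are present in every sample.
    genes = sorted(set.intersection(*(set(c) for c in samples.values())))
    header = ["gene_id"] + sample_ids
    rows = [[g] + [samples[s][g] for s in sample_ids] for g in genes]
    return header, rows


def write_matrix(paths_by_sample, out_csv):
    """Read one counts file per sample and write the combined matrix as CSV.

    paths_by_sample is a hypothetical mapping {sample_id: path_to_txt_file}.
    """
    samples = {}
    for sample_id, path in paths_by_sample.items():
        with open(path) as fh:
            samples[sample_id] = parse_counts(fh)
    header, rows = build_matrix(samples)
    with open(out_csv, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(header)
        writer.writerows(rows)
```

The resulting CSV has genes as rows and samples as columns, which is the orientation DESeq2's DESeqDataSetFromMatrix expects for its countData argument. The sample conditions would go in a second table with one row per sample.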
Thanks for your response. I went through the data and the matrix you suggested, but I couldn't find the sample information, i.e. whether a sample is control or tumor. The matrix seems to contain only counts, with TCGA barcodes/identifiers as the column names.
The TCGA barcodes look something like: TCGA-CJ-4875-01
To figure out what this means (e.g. if it's tumor or normal), please refer to: https://docs.gdc.cancer.gov/Encyclopedia/pages/TCGA_Barcode/
(Hint: The example I provided above is tumor, not normal)
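To make the hint concrete, here is a small Python sketch of reading the sample type from a barcode. It assumes the standard TCGA layout, in which the fourth dash-separated field starts with a two-digit sample type code: 01 to 09 for tumor, 10 to 19 for normal and 20 to 29 for control samples. The function name sample_type is made up for illustration, so please check the codes against the page linked above before relying on it.

```python
def sample_type(barcode):
    """Classify a TCGA barcode as 'tumor', 'normal', 'control' or 'unknown'."""
    parts = barcode.split("-")
    if len(parts) < 4 or parts[0] != "TCGA":
        raise ValueError(f"not a TCGA sample barcode: {barcode!r}")
    # The fourth field may carry a vial letter (e.g. '01A'); keep the two digits.
    code = int(parts[3][:2])
    if 1 <= code <= 9:
        return "tumor"
    if 10 <= code <= 19:
        return "normal"
    if 20 <= code <= 29:
        return "control"
    return "unknown"


print(sample_type("TCGA-CJ-4875-01"))
```

Applying this to every column name of the count matrix gives the condition column for the sample table used in the DEG analysis.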
Hi, from your comment I didn't understand how to identify our samples of interest. I'm new to genomic data analysis and R.
I downloaded data sets for two tumors from the GDC and combined all of the samples into one file covering 60483 gene_ids. My goal is to find the genes common to both tumors. I ran DESeq2, which gave me a results file listing differentially expressed genes. Now I'm unsure whether I'm doing this correctly. Please guide me. Thanks