The query below is rather detailed, but it may be simple to answer for experienced users. I have spent a lot of time trying to figure this out and have concluded that I need help from the experts.
I downloaded "miRNA gene quantification" data from the TCGA harmonized data and saved it as a CSV file. This data set is essentially an assay of counts: the rownames are miRNAs and the colnames are Patient IDs, e.g. TCGA-G9-6356-01A-11R-1788-13. The data contain two values of "sample.type": 1) Primary Solid Tumor, 2) Solid Tissue Normal.
Similarly, I downloaded the "clinical" data of type "patient". Here the rownames are Patient IDs, e.g. TCGA-G9-6356, and the colnames are the different clinical parameters. This file is similar to a "colData" table holding the clinical information for edgeR. The "vital-status" parameter contains "dead" or "alive", which I want to use as the contrast.
My questions:
1. How do I link the Patient ID in the "miRNA gene quantification" data, where it is TCGA-G9-6356-01A-11R-1788-13, to the "clinical" file, where the Patient ID is TCGA-G9-6356?
2. How do I find the DE miRNAs between "dead" and "alive" patients? This is not a survival analysis; I want a list of DE miRNAs in the above-mentioned sample.type groups in relation to vital.status (dead or alive).
3. If I use edgeR separately on the above data, can I skip a) TCGAanalyze_Filtering and b) TCGAanalyze_DEA entirely?
Kevin Blighe, thank you so much for taking the time to explain in detail. I really appreciate it. For the first query, I used the following commands:
This creates a new CSV file containing only the first 12 characters of each sample identifier, which allows it to be linked to the sample identifier in the clinical data file.
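For readers following along, here is a minimal sketch of that truncation step in Python. It is not the commands used above; it only illustrates the idea. The function names, the second barcode (TCGA-XX-0000-...) and the vital status values are made up for illustration. It assumes the standard TCGA barcode layout, where the first 12 characters identify the patient and characters 14-15 hold the sample type code (01 = Primary Solid Tumor, 11 = Solid Tissue Normal).

```python
# Sample type codes relevant to this data set (TCGA barcode characters 14-15).
SAMPLE_TYPES = {"01": "Primary Solid Tumor", "11": "Solid Tissue Normal"}


def patient_id(barcode):
    """Return the 12-character patient part of a TCGA barcode."""
    return barcode[:12]


def sample_type_code(barcode):
    """Return the two-digit sample type code (characters 14-15)."""
    return barcode[13:15]


def link_samples(barcodes, clinical):
    """Attach patient ID, sample type and vital status to each full barcode.

    `clinical` is a hypothetical dict mapping patient ID -> vital status.
    """
    rows = []
    for barcode in barcodes:
        pid = patient_id(barcode)
        rows.append({
            "barcode": barcode,  # keep the full barcode as the sample key
            "patient_id": pid,
            "sample_type": SAMPLE_TYPES.get(sample_type_code(barcode), "other"),
            "vital_status": clinical.get(pid),  # None if patient is missing
        })
    return rows


# Made-up example input; the vital status values are NOT real clinical data.
barcodes = ["TCGA-G9-6356-01A-11R-1788-13", "TCGA-XX-0000-11A-11R-0000-13"]
clinical = {"TCGA-G9-6356": "alive", "TCGA-XX-0000": "dead"}

linked = link_samples(barcodes, clinical)
for row in linked:
    print(row)
```

One thing to keep in mind: if a patient has both a tumor and a normal sample, both barcodes truncate to the same 12-character ID. It is probably safer to keep the full barcode as the sample key and use the truncated ID only for looking up the clinical record, as the sketch does.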
Yes, is everything now okay, in that case?