Hello everyone,
I would like to download somatic SNP data from TCGA. When I look at the data matrix, however, there are two color-coded categories, "Tumor, matched normal" and "Normal, matched tumor". I looked them up in the online guide and in the Getting Started with the Data Matrix guide.
They explain them as follows:
- TN (Tumor, matched normal) – Data for a tumor tissue for which matched normal tissue exists.
- NT (Normal, matched tumor) – Data for normal tissue for which matched tumor tissue exists.
But what is the difference between the two?
Maybe some of you are more experienced with TCGA than I am.
All the best,
Mario
@vchris_ngs,
I have not yet fully understood how to separate the T/N samples, but there seems to be an identifier in the BAM file name. I have not had time to go through it yet; I will look at it in detail after I finish the analysis of our data.
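As a rough sketch of what that identifier could look like: if the file name contains a standard TCGA barcode, the two-digit sample type code in the fourth field separates tumor (01-09) from normal (10-19) and control (20-29) samples. The function name `sample_class` and the example file names are made up for illustration, and not every TCGA BAM file name is guaranteed to carry a barcode.

```python
import re

# TCGA barcode layout: TCGA-<site>-<participant>-<sample type><vial>-...
# Sample type codes: 01-09 tumor, 10-19 normal, 20-29 control.
BARCODE_RE = re.compile(r"TCGA-[A-Z0-9]{2}-[A-Z0-9]{4}-(\d{2})[A-Z]?")

def sample_class(name):
    """Return 'tumor', 'normal', 'control', or None if no barcode is found."""
    match = BARCODE_RE.search(name)
    if match is None:
        return None
    code = int(match.group(1))
    if 1 <= code <= 9:
        return "tumor"
    if 10 <= code <= 19:
        return "normal"
    if 20 <= code <= 29:
        return "control"
    return None

# Hypothetical file names, for illustration only.
print(sample_class("TCGA-A1-A0SB-01A-11D-A142-09.bam"))
print(sample_class("TCGA-A1-A0SB-10A-01D-A142-09.bam"))
```

Files for which this returns None would need to be resolved through their metadata instead.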
@Chirag Nepal,
I have a fair understanding of the data types in TCGA, although I have not found an expression dataset there that fits my own needs. I suggest you go through the breast cancer datasets in TCGA. There, under RNASeq and RNASeqV2, you have both TN and NT data, which means you get expression data both for tumors for which a matched normal exists and for normals for which a matched tumor exists. So you can download both types. One is color-coded blue (TN) and the other yellow (NT). You can then filter locally to get the matched pairs from the same patient and build your cohort. If you are not looking for exact matches, you can also just download a random set of TN (blue) samples, take a similar number of NT (yellow) samples, and run your analysis on those.
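The local filtering step could look roughly like the sketch below. It assumes you have collected the TCGA sample barcodes from your TN and NT downloads into one list; the first three barcode fields identify the patient, and the two-digit code in the fourth field is 01-09 for tumor and 10-19 for normal. The function name `matched_pairs` and the barcodes in the example are made up.

```python
from collections import defaultdict

def matched_pairs(barcodes):
    """Group barcodes by patient; keep patients with both tumor and normal."""
    by_patient = defaultdict(lambda: {"tumor": [], "normal": []})
    for barcode in barcodes:
        fields = barcode.split("-")
        # Skip anything that does not look like a sample-level barcode.
        if len(fields) < 4 or not fields[3][:2].isdigit():
            continue
        patient = "-".join(fields[:3])   # e.g. TCGA-A1-A0SB
        code = int(fields[3][:2])        # sample type code
        if 1 <= code <= 9:
            by_patient[patient]["tumor"].append(barcode)
        elif 10 <= code <= 19:
            by_patient[patient]["normal"].append(barcode)
    # Keep only patients that have at least one sample of each kind.
    return {p: s for p, s in by_patient.items() if s["tumor"] and s["normal"]}

# Made-up barcodes, for illustration only.
example = [
    "TCGA-A1-A0SB-01A-11R-A144-07",  # tumor
    "TCGA-A1-A0SB-11A-11R-A144-07",  # normal, same patient
    "TCGA-B2-C3D4-01A-11R-A144-07",  # tumor without a matched normal
]
for patient, samples in matched_pairs(example).items():
    print(patient, samples["tumor"], samples["normal"])
```

Patients that appear in only one of the two downloads drop out of the result, which leaves just the matched cohort.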