I am trying to download the Kidney Renal Papillary Cell Carcinoma data from GDC. I want to know how to identify which samples are WT and which samples are mutated. How can I get only those samples?
There are different ways of determining this:
If you have the TCGA barcode, this is by far the easiest way. Look at the 'Sample' field: codes 01 to 09 are tumour types, 10 to 19 are normal types, and 20 to 29 are control samples.
[source: https://wiki.nci.nih.gov/display/TCGA/TCGA+barcode]
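As a minimal sketch of the barcode check, the snippet below reads the two-digit code at the start of the 'Sample' field (the fourth hyphen-separated part) and maps it to tumour, normal, or control. The function name `classify_tcga_barcode` and the barcodes in the usage note are made up for illustration; only the code ranges come from the TCGA barcode documentation.

```python
def classify_tcga_barcode(barcode: str) -> str:
    """Return 'tumour', 'normal' or 'control' from the Sample field of a TCGA barcode.

    Hypothetical helper for illustration; it is not part of any GDC/TCGA library.
    """
    parts = barcode.strip().split("-")
    if len(parts) < 4 or parts[0] != "TCGA":
        raise ValueError(f"not a sample-level TCGA barcode: {barcode!r}")

    sample_field = parts[3]       # e.g. "01A": two-digit sample type plus vial letter
    code = int(sample_field[:2])  # raises ValueError if the field is not numeric

    if 1 <= code <= 9:
        return "tumour"
    if 10 <= code <= 19:
        return "normal"
    if 20 <= code <= 29:
        return "control"
    raise ValueError(f"unrecognised sample type code in {barcode!r}")
```

With made-up barcodes, `classify_tcga_barcode("TCGA-XX-0000-01A-11R-0000-07")` gives `"tumour"` and `classify_tcga_barcode("TCGA-XX-0000-11A-11R-0000-07")` gives `"normal"`.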
If you don't happen to have the TCGA barcode, then it's most likely a UUID, Case ID, or just a file-name that may have some ID in its name. In these situations, you can search for these manually at the GDC Data Portal in the search box and then follow links in order to see if it's tumour or normal.
For example, if I have:
0b0e0b62-b823-4fdb-b37b-4a2731e648a7, this relates to a primary tumour
3ba5d6ec-dcce-49bb-82e5-85d3903a2aa1.htseq.counts.gz, this relates to UUID c247b168-3b4b-40ae-8e1a-32dda1b34397 and is a normal sample
There are other automated ways of doing this but the ones that I tried appeared to be outdated when I recently used them (open to being corrected if wrong, though).
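One automated route that may be worth trying is the GDC REST API, which can be asked for the sample type attached to a file. The sketch below is untested against the live service and rests on assumptions: that the ID you hold is a file UUID, that the `files` endpoint accepts the `fields` parameter shown, and that the response nests samples under `data.cases[].samples[]`. Check these against the current GDC API documentation. The function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumed public endpoint of the GDC API for file metadata.
GDC_FILES_ENDPOINT = "https://api.gdc.cancer.gov/files"


def build_file_query_url(file_uuid: str) -> str:
    """Build a URL asking for the sample type(s) linked to one file UUID."""
    fields = "file_name,cases.samples.sample_type,cases.samples.submitter_id"
    query = urllib.parse.urlencode({"fields": fields})
    return f"{GDC_FILES_ENDPOINT}/{file_uuid}?{query}"


def extract_sample_types(response: dict) -> list:
    """Collect (sample barcode, sample type) pairs from a decoded response.

    Assumes the layout {"data": {"cases": [{"samples": [...]}]}}.
    """
    pairs = []
    for case in response.get("data", {}).get("cases", []):
        for sample in case.get("samples", []):
            pairs.append((sample.get("submitter_id"), sample.get("sample_type")))
    return pairs


def fetch_sample_types(file_uuid: str) -> list:
    """Query the GDC API over the network and return the parsed pairs."""
    with urllib.request.urlopen(build_file_query_url(file_uuid), timeout=30) as resp:
        return extract_sample_types(json.load(resp))
```

Calling `fetch_sample_types` with a file UUID should then return a list of pairs whose second element is a label such as "Primary Tumor" or "Solid Tissue Normal", provided the assumptions above hold.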
Kevin
@Kevin Blighe I would like to know, if I get the exon data for the normal and tumor samples, what are the possibilities for checking for differences? For example, do you know a way to check for mutations, or to check the effect of specific genes across the two conditions? Would you use gene expression or FIRMA? And how do you deal with it?
I would download the raw count htseq files and then re-analyse them. Are you familiar with RNA-seq methodologies? All that you would need is DESeq2 for normalisation and differential expression.
For mutations, only MAF files are available in the TCGA open access data.
You can also check cBioPortal, which may already have all information for your gene of interest.
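To get from a MAF file to "mutated" versus "WT" groups for one gene, a rough sketch follows. It relies on the standard MAF columns `Hugo_Symbol`, `Variant_Classification` and `Tumor_Sample_Barcode`. Everything else is an assumption made for illustration: the function name, the list of classifications treated as silent, and matching samples on the first 12 characters of the barcode (the patient part). Note that "WT" here only means that no non-silent mutation is reported in the MAF; deletions and other copy-number changes are not covered.

```python
import csv

# Assumption: classifications treated as "not mutated" here; adjust to your needs.
SILENT_CLASSES = {"Silent", "Intron", "3'UTR", "5'UTR", "3'Flank", "5'Flank", "IGR", "RNA"}


def split_by_mutation(maf_lines, gene, all_samples, key_length=12):
    """Split samples into (mutated, wild_type) sets for one gene.

    maf_lines   : iterable of text lines from a MAF file (e.g. an open file handle)
    gene        : gene symbol as written in the Hugo_Symbol column
    all_samples : barcodes of the samples you have data for (e.g. count files)
    key_length  : number of leading barcode characters used for matching
    """
    # MAF files may start with '#' comment lines before the tab-separated header.
    data_lines = (line for line in maf_lines if not line.startswith("#"))
    reader = csv.DictReader(data_lines, delimiter="\t")

    mutated_keys = set()
    for row in reader:
        if row["Hugo_Symbol"] == gene and row["Variant_Classification"] not in SILENT_CLASSES:
            mutated_keys.add(row["Tumor_Sample_Barcode"][:key_length])

    sample_keys = {barcode[:key_length] for barcode in all_samples}
    mutated = sample_keys & mutated_keys
    wild_type = sample_keys - mutated_keys
    return mutated, wild_type
```

The two sets can then be used to label the columns of the count matrix before running the differential expression analysis.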
@Kevin Blighe why would you try to download the raw files? The problem is that the files are controlled, so I cannot download them. How can I use cBioPortal? Does it have all the data that exist in TCGA? In general, I am trying to do the following: I want to look at the wild-type samples for a specific gene to see if there is a change between the wild-type and the mutated/deleted samples (this can be done with DESeq2). I also want to know if I can derive any biological processes linked to a specific gene (where they are unregulated in wild-type samples).
I appreciate your help. Thanks.