Hello everyone,
I'm using GBM data from FireBrowse. I downloaded the RNA-seq data (file name: GBM.rnaseqv2__illuminahiseq_rnaseqv2__unc_edu__Level_3__RSEM_genes_normalized__data.data), which contains 171 samples in total.
I want to find the samples in the GBM data that carry IDH1 mutations. By following this video, I was able to identify the samples with an IDH1 mutation. Here is the list of those samples (14 in total):
TCGA-06-0128-01A-01D-1490-08
TCGA-06-2570-01A-01D-1495-08
TCGA-06-6389-01A-11D-1696-08
TCGA-06-6701-01A-11D-1845-08
TCGA-15-1444-01A-02D-1696-08
TCGA-26-1442-01A-01D-1696-08
TCGA-32-4208-01A-01D-1353-08
TCGA-02-2483-01A-01D-1494-08
TCGA-06-0129-01A-01D-1490-08
TCGA-06-5417-01A-01D-1486-08
TCGA-14-1456-01B-01D-1494-08
TCGA-14-4157-01A-01D-1353-08
TCGA-19-2629-01A-01D-1495-08
TCGA-27-2521-01A-01D-1494-08
According to the barcodes, these are DNA samples (analyte code "D"). To find the corresponding RNA samples in the expression set, I matched the barcodes after substituting "D" with "R", for example: TCGA-27-2521-01A-01D-1494-08 -> TCGA-27-2521-01A-01R-1494-08. This way I found only 8 of the samples in the RNA-seq data. I'm not sure whether this is the correct way to find the mutated samples in the RNA-seq data (expression set), so suggestions on this process are welcome.
My questions:
1) How can I identify the mutation status of the samples? The mutation of interest is IDH1.
2) How can I find wild-type samples to go with the mutated samples identified above?
Thank you all in advance :)
I'm not sure that's the best approach either. Just match them by the shortened TCGA barcode, i.e., TCGA-27-2521. Otherwise, you may find very few matches.
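The shortened-barcode matching could be sketched as follows. This is a minimal illustration, not a tested pipeline: the helper names are made up, and the RNA column names below are invented placeholders (only the two DNA barcodes come from the list above). The idea is that the first three barcode fields identify the patient, while the trailing plate and center fields belong to the individual aliquot and so need not coincide between the DNA and RNA aliquots of the same sample.

```python
def barcode_prefix(barcode, n_fields=3):
    """Return the first n dash-separated fields, e.g. 'TCGA-27-2521'."""
    return "-".join(barcode.split("-")[:n_fields])


def match_rna_columns(mutated_dna_barcodes, rna_columns, n_fields=3):
    """Keep the RNA-seq columns whose barcode prefix matches a mutated sample."""
    wanted = {barcode_prefix(b, n_fields) for b in mutated_dna_barcodes}
    return [c for c in rna_columns if barcode_prefix(c, n_fields) in wanted]


# Two DNA barcodes taken from the list in the question.
dna = ["TCGA-27-2521-01A-01D-1494-08", "TCGA-06-0128-01A-01D-1490-08"]

# Hypothetical RNA-seq column names, invented for illustration only.
rna = [
    "TCGA-27-2521-01A-01R-0000-07",
    "TCGA-06-0128-01A-01R-0000-07",
    "TCGA-AA-0001-01A-01R-0000-07",
]

matched = match_rna_columns(dna, rna)
print(matched)
```

Matching on three fields pairs a patient with every RNA sample from that patient, including any normal-tissue sample (sample type 11). Passing n_fields=4 would also require the same sample and vial (e.g. 01A), which may be preferable if only tumour samples should match.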
The wild-type 'normal' mutation data is controlled/restricted access only. You will find 'normal' RNA-seq expression data, though.
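If "wild type" is taken to mean tumour samples with no IDH1 mutation call, one possible way to split the RNA-seq columns is sketched below. It assumes you have two sets of patient IDs: those with an IDH1 mutation, and all patients that appear in the mutation data at all. A patient missing from the mutation data was not necessarily wild type, so such samples are set aside as unknown. The function name and the example column names are hypothetical, and restricting to sample type 01 (primary tumour) is a choice made for this sketch.

```python
def split_by_idh1_status(rna_columns, mutated_patients, profiled_patients):
    """Group primary-tumour RNA-seq columns into mutant / wild_type / unknown."""
    groups = {"mutant": [], "wild_type": [], "unknown": []}
    for column in rna_columns:
        fields = column.split("-")
        patient = "-".join(fields[:3])
        # Sample type is the first two characters of the fourth field;
        # 01 is a primary tumour, 11 is solid tissue normal.
        if fields[3][:2] != "01":
            continue
        if patient in mutated_patients:
            groups["mutant"].append(column)
        elif patient in profiled_patients:
            groups["wild_type"].append(column)
        else:
            groups["unknown"].append(column)
    return groups


# Hypothetical column names and patient sets, invented for illustration.
rna_columns = [
    "TCGA-27-2521-01A-01R-0000-07",
    "TCGA-AA-0001-01A-01R-0000-07",
    "TCGA-AA-0002-01A-01R-0000-07",
    "TCGA-AA-0003-11A-01R-0000-07",
]
mutated = {"TCGA-27-2521"}
profiled = {"TCGA-27-2521", "TCGA-AA-0001"}

groups = split_by_idh1_status(rna_columns, mutated, profiled)
print(groups)
```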
Thank you for your suggestions. I really appreciate it.
Hi,
I tried to follow your instructions to get the URL for LGG samples that have an IDH1 mutation, but the URL didn't work when I pasted it into my console with wget. Am I doing this correctly, or is there some other way of downloading this?
Looking forward to your reply. Thanks
Hey, can you show the exact commands that you used and provide a reproducible example?
Sure, here it is, along with the error I got.
Can you please help? I'm really stuck.
Any help would be appreciated. Thanks.
You received HTTP error code 400 (Bad Request), which means that the server could not understand the request, typically because the URL that you provided is malformed.
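One common cause of a 400 response is a query string that contains unencoded characters such as spaces, quotes, or braces. The sketch below shows how such a URL can be built with percent-encoding using Python's standard library. The endpoint and the filter field names are hypothetical placeholders, not the URL from this thread, and no request is actually sent.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical endpoint, used only to illustrate the encoding step.
BASE = "https://example.org/api/cases"

# Hypothetical JSON filter; the field names are placeholders.
filters = {"op": "in", "content": {"field": "gene.symbol", "value": ["IDH1"]}}

# urlencode percent-encodes braces and quotes and turns spaces into '+'.
query = urlencode({"filters": json.dumps(filters), "size": 100})
url = f"{BASE}?{query}"
print(url)

# Decoding the query string recovers the original parameter values.
decoded = parse_qs(urlsplit(url).query)
```

When a URL like this is pasted into a shell for wget, wrapping it in single quotes keeps the shell from interpreting characters such as & itself.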
What, exactly, are you trying to do? Where did you even find that URL?
I want to find all LGG samples that have an IDH gene mutation. Isn't this the correct way to do it? Can you please suggest an alternative method?
Thanks
I do not know if it is the right way. From where did you get the information to run this command:
Did you see this in some tutorial?
Please check the clinical and somatic variant data for LGG at the GDC Data Portal. I have configured a search for you, HERE
I found this very helpful.
http://bioinformaticsfmrp.github.io/TCGAbiolinks/subtypes.html
Thanks
If you wish to use TCGAbiolinks to obtain your mutation data, then I can tell you that it is a very common R package for obtaining TCGA data.
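Whichever tool supplies the mutation data, somatic calls are commonly delivered as a MAF table, which is tab-separated and includes the columns Hugo_Symbol, Variant_Classification, and Tumor_Sample_Barcode. Assuming such a table, the samples carrying an IDH1 mutation could be pulled out roughly as below. The rows are toy data invented for illustration, the function name is hypothetical, and skipping "Silent" variants is a choice made for this sketch rather than a rule.

```python
import csv
import io

# Toy MAF content with invented barcodes; a real file has many more columns.
MAF_TEXT = (
    "#version 2.4\n"
    "Hugo_Symbol\tVariant_Classification\tTumor_Sample_Barcode\n"
    "IDH1\tMissense_Mutation\tTCGA-AA-0001-01A-01D-0000-08\n"
    "TP53\tMissense_Mutation\tTCGA-AA-0002-01A-01D-0000-08\n"
    "IDH1\tSilent\tTCGA-AA-0003-01A-01D-0000-08\n"
    "IDH1\tMissense_Mutation\tTCGA-AA-0004-01A-01D-0000-08\n"
)


def mutated_samples(maf_handle, gene="IDH1", skip=("Silent",)):
    """Return sorted tumour barcodes with a non-skipped variant in the gene."""
    # Drop comment lines (starting with '#') before parsing the header row.
    rows = (line for line in maf_handle if not line.startswith("#"))
    reader = csv.DictReader(rows, delimiter="\t")
    return sorted(
        {
            row["Tumor_Sample_Barcode"]
            for row in reader
            if row["Hugo_Symbol"] == gene
            and row["Variant_Classification"] not in skip
        }
    )


idh1_samples = mutated_samples(io.StringIO(MAF_TEXT))
print(idh1_samples)
```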
Yes I used TCGAbiolinks and it worked for me. Thanks