Hi all,
I have downloaded all the CNV files for a cancer type from the GDC portal.
I also have the clinical data for all patients; however, I cannot map the file names to the submitter IDs.
A file name looks something like "AMAZE_p_TCGASNP_b86_87_88_N_GenomeWideSNP_6_A01_735406.hg18.seg.txt", while a submitter ID looks something like "TCGA-DJ-A2QA".
Can anybody guide me on how to match these two names?
Thank you in advance
Nazanin
Could you give us an example, for one patient, of what you have downloaded, with links and/or pictures, please?
I downloaded all the CNV files using the TCGA2bed software.
These are some of the CNV files that were downloaded: "AMAZE_p_TCGASNP_b86_87_88_N_GenomeWideSNP_6_A01_735406.hg18.seg.txt",
"AMAZE_p_TCGASNP_b86_87_88_N_GenomeWideSNP_6_A02_735476.hg18.seg.txt"
I want to map these files to the clinical file that I downloaded previously.
In the clinical file, only the submitter ID and the patient ID are available.
Sample names might be inside the text files. Did you check the headers of the files?
Hi,
No, the header just includes the results, something like this:
Sample  Chromosome  Start  End  Num_Probes  Segment_Mean
AMAZE_p_TCGASNP_b86_87_88_N_GenomeWideSNP_6_A01_735406  1  51598  9250000  4679  0.0076
AMAZE_p_TCGASNP_b86_87_88_N_GenomeWideSNP_6_A01_735406  1  9250070  9324990  55  0.5138
Do you have the annotations.txt file that comes with the CNV files?
In this file you will find the entity_id, which can also be found in the clinical files.
Hi, yes, I have also downloaded the annotation file. However, it does not include the names of the CNV files, so I cannot use it for matching. The following is the header of the annotation file:
Coming with your AMAZE_p_TCGASNP_b86_87_88_N_GenomeWideSNP_6_A01_735406.hg18.seg.txt you have an annotation file where you can find an entity_id, which I think in this case is this one: 29ba39af-b266-547a-b2c9-7795eba2e202, corresponding to case_id in your clinical file. To check.
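If the entity_id values in the annotation file really do match case_id values in the clinical file, the lookup can be done for every row at once instead of by hand. The following is only a rough sketch: it assumes both files are tab-delimited and that the columns are literally named entity_id, case_id and submitter_id, which should be checked against the actual headers. The IDs in the demo are made-up placeholders, not real TCGA records.

```python
import csv
import io


def load_case_to_submitter(clinical_lines, delimiter="\t"):
    """Build a case_id -> submitter_id lookup from the clinical table.

    The column names 'case_id' and 'submitter_id' are assumptions.
    """
    reader = csv.DictReader(clinical_lines, delimiter=delimiter)
    return {row["case_id"]: row["submitter_id"] for row in reader}


def annotate_entities(annotation_lines, case_to_submitter, delimiter="\t"):
    """Return (entity_id, submitter_id) pairs for each annotation row.

    submitter_id is None when the entity_id has no match in the clinical table.
    The column name 'entity_id' is an assumption.
    """
    reader = csv.DictReader(annotation_lines, delimiter=delimiter)
    return [(row["entity_id"], case_to_submitter.get(row["entity_id"]))
            for row in reader]


if __name__ == "__main__":
    # Placeholder data standing in for the two real files.
    clinical = io.StringIO("case_id\tsubmitter_id\nuuid-0001\tTCGA-XX-0001\n")
    annotation = io.StringIO("entity_id\tcategory\nuuid-0001\tnote\nuuid-0002\tnote\n")
    lookup = load_case_to_submitter(clinical)
    for entity_id, submitter_id in annotate_entities(annotation, lookup):
        print(entity_id, submitter_id)
```

With real files, the two io.StringIO objects would be replaced by open file handles. Rows that come back with None are the ones where the assumed correspondence between entity_id and case_id does not hold.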
The problem is that I downloaded the CNV files for 507 patients with TCGA2bed. I know that I can look up the patient or submitter ID via GDC, but I cannot do this manually for all 507 cases, so I am looking for a way to find the corresponding patient or submitter IDs automatically.
In other words, I want to find the patient or submitter ID based on "AMAZE_p_TCGASNP_b86_87_88_N_GenomeWideSNP_6_A01_735406.hg18.seg.txt". In the annotation file there is no column that includes any part of this file name.
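One way to confirm this for all files rather than by eye is to strip the file suffix and search every field of the annotation file for the remaining stem. This is a minimal sketch, assuming the annotation file is tab-delimited with a header row; the helper names file_stem and rows_mentioning are made up for illustration.

```python
import csv

# Suffix seen on the downloaded files; the remaining stem is what
# appears in the 'Sample' column of the seg files.
SEG_SUFFIX = ".hg18.seg.txt"


def file_stem(file_name, suffix=SEG_SUFFIX):
    """Strip the seg-file suffix from a file name, if present."""
    if file_name.endswith(suffix):
        return file_name[: -len(suffix)]
    return file_name


def rows_mentioning(stem, annotation_lines, delimiter="\t"):
    """Return the annotation rows in which any field contains `stem`."""
    reader = csv.DictReader(annotation_lines, delimiter=delimiter)
    return [row for row in reader
            if any(stem in (value or "") for value in row.values())]
```

An empty list from rows_mentioning for every file stem would confirm that the annotation file cannot be joined on the file name, and that the link has to come from somewhere else.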
What commands did you use?
TCGA2bed is a graphical tool in which you can select between annotation and experiment. After selecting the tumor type, you have to select the type of data: CNV, RNASeq, ...
As I don't know this API and it's not open source, I can't really help you more. Your CNV files contain sample names, so you can try to build a list of them.
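Building that list could look roughly like the sketch below. It assumes the seg files are tab-delimited, have a header row with a column named Sample (as in the excerpt posted above), and all sit in one directory; the function names are made up for illustration.

```python
import csv
from pathlib import Path


def sample_names_in_seg(lines, delimiter="\t"):
    """Return the distinct 'Sample' values of one seg file, in first-seen order."""
    reader = csv.DictReader(lines, delimiter=delimiter)
    seen = {}
    for row in reader:
        seen.setdefault(row["Sample"], None)  # dict keeps insertion order
    return list(seen)


def collect_sample_names(directory, pattern="*.seg.txt"):
    """Map each seg file name in `directory` to the sample names it contains."""
    result = {}
    for path in sorted(Path(directory).glob(pattern)):
        with path.open(newline="") as handle:
            result[path.name] = sample_names_in_seg(handle)
    return result
```

Calling collect_sample_names on the download directory gives one entry per file, which can then be flattened into a single list of sample names for whatever lookup service is used next.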
Then, I found this R package (https://cran.r-project.org/web/packages/TCGAretriever/TCGAretriever.pdf), with which I think you can query the TCGA database using your list of sample names.
Or you can try to contact the authors of this publication (https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-016-1419-5).
Hi,
I'm getting an error:
Can anyone assist please?
Please actually show the code that produced the error