I need to do some statistical calculations on CNV data that is publicly available on the TCGA website. I come from a computer science background and have no idea what these data mean. I tried searching for the meaning of the files and how they are structured, but nothing I found was helpful. It would be nice if someone could give a short overview of the meaning of the data in each column. Here is a sample of the data in a CNV file:
Sample Chromosome Start End Num_Probes Segment_Mean
DEBUT_p_TCGAb45_81_wRedosSNP_N_GenomeWideSNP_6_G03_729530 1 61735 415164 28 -0.0504
DEBUT_p_TCGAb45_81_wRedosSNP_N_GenomeWideSNP_6_G03_729530 1 462793 629241 4 1.822
DEBUT_p_TCGAb45_81_wRedosSNP_N_GenomeWideSNP_6_G03_729530 1 668210 2138242 350 -0.0311
So I want to know what the Sample column means and how the samples are named; the sample name always seems to be very long. I understand the Chromosome, Start and End columns, but I don't understand the meaning of Num_Probes and Segment_Mean.
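For anyone who wants to inspect such a file programmatically, here is a minimal Python sketch that parses the rows shown above. It assumes the columns are whitespace-delimited exactly as in the sample, and it assumes that Segment_Mean is log2(copy number / 2), which is how TCGA/GDC copy number segment files are usually described. The function names parse_segments and estimated_copy_number are hypothetical helpers made up for this illustration, not part of any TCGA tool.

```python
# Sample rows copied from the question; real files may be tab-delimited,
# which str.split() with no argument handles as well.
SAMPLE = """Sample Chromosome Start End Num_Probes Segment_Mean
DEBUT_p_TCGAb45_81_wRedosSNP_N_GenomeWideSNP_6_G03_729530 1 61735 415164 28 -0.0504
DEBUT_p_TCGAb45_81_wRedosSNP_N_GenomeWideSNP_6_G03_729530 1 462793 629241 4 1.822
DEBUT_p_TCGAb45_81_wRedosSNP_N_GenomeWideSNP_6_G03_729530 1 668210 2138242 350 -0.0311
"""


def parse_segments(text):
    """Parse a segment table into a list of dicts with typed fields."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()
    rows = []
    for ln in lines[1:]:
        rec = dict(zip(header, ln.split()))
        rec["Start"] = int(rec["Start"])
        rec["End"] = int(rec["End"])
        rec["Num_Probes"] = int(rec["Num_Probes"])
        rec["Segment_Mean"] = float(rec["Segment_Mean"])
        rows.append(rec)
    return rows


def estimated_copy_number(segment_mean):
    """Invert log2(copy number / 2); ASSUMES that definition of Segment_Mean."""
    return 2 * 2 ** segment_mean


if __name__ == "__main__":
    for seg in parse_segments(SAMPLE):
        length = seg["End"] - seg["Start"]
        cn = estimated_copy_number(seg["Segment_Mean"])
        print(seg["Chromosome"], seg["Start"], seg["End"], length,
              seg["Num_Probes"], round(cn, 2))
```

Under that assumption, a Segment_Mean near 0 corresponds to roughly two copies, while a clearly positive value such as 1.822 suggests a gain. Segments supported by only a few probes (Num_Probes of 4 here) are usually treated with more caution than those supported by hundreds.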
Very nice answer. I have a couple of questions. Now that the data are on GRCh38, how can we run GISTIC, since we need a new marker file etc.? Any input is appreciated. Also, sample names look like, e.g., CYANS_p_TCGAb_422_423_424_NSP_GenomeWideSNP_6_B03_1513914. How can we convert them into TCGA barcodes? Thanks.
You can try this function to convert a filename to a TCGA barcode: C: problem in matching the names between file names and patients Id in TCGA
It will not work for all file names, though. Be acutely aware of the legacy = TRUE/FALSE parameter that is passed to the function.