I am conducting CNV analysis on TCGA Level 3 SNP 6.0 data. For a few of the downloaded tumor samples, I found more than one seg file associated with the same TCGA submitter ID. For example,
TCGA-44-2656-01A CUTCH_p_TCGAb_355_37_52_NSP_GenomeWideSNP_6_H10_1376764.nocnv_grch38.seg
TCGA-44-2656-01A EGGAR_p_TCGAb33and37_SNP_N_GenomeWideSNP_6_H04_585228.nocnv_grch38.seg
TCGA-44-2656-01A HILLY_p_TCGA_b90_wRedos_SNP_N_GenomeWideSNP_6_A04_748062.nocnv_grch38.seg
The first file has 415 rows, while the second and third files have 197 and 271 rows, respectively. All three files report a mean segment score for chromosomes 1 to 22 and X.
In this situation, what factors should I consider when selecting one of the three files for my downstream analysis?
If I decide to combine the seg data from the 3 files instead, should I merge the chromosome regions that overlap fully or partially among them?
The UUIDs of the above 3 samples, in the same order, are:
89327245-3da1-4a96-bee3-5b84ae43401a
f19650bb-8ead-490b-9f91-d7c4b06bfe6b
0e6071db-c44c-4958-ab95-087d44620893
Is there a way to find out more about the 3 samples with the above UUIDs to help with the file selection?
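One way to compare replicate seg files before picking one is to summarise each file side by side. The snippet below is only a minimal sketch in base R: it assumes the usual GDC seg columns (GDC_Aliquot, Chromosome, Start, End, Num_Probes, Segment_Mean), uses invented toy values rather than the real files, and the 0.2 cutoff for calling a segment "altered" is an arbitrary assumption, not a TCGA recommendation. The helper name summarise_seg is hypothetical.

```r
# Toy segments in the GDC seg layout; all values are invented for illustration.
seg <- data.frame(
  GDC_Aliquot  = rep(c("fileA", "fileB"), each = 3),
  Chromosome   = c("1", "1", "2", "1", "1", "2"),
  Start        = c(1, 1001, 1, 1, 501, 1),
  End          = c(1000, 2000, 3000, 500, 2000, 3000),
  Num_Probes   = c(100, 90, 300, 40, 150, 280),
  Segment_Mean = c(0.02, -0.50, 0.01, 0.03, -0.45, 0.00),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one summary row per seg file.
summarise_seg <- function(df, cutoff = 0.2) {
  len <- df$End - df$Start + 1                     # segment length in bases
  data.frame(
    n_segments    = nrow(df),                      # more segments can mean a noisier array
    total_probes  = sum(df$Num_Probes),
    weighted_mean = sum(df$Segment_Mean * len) / sum(len),
    frac_altered  = sum(len[abs(df$Segment_Mean) > cutoff]) / sum(len)
  )
}

# Split by file (aliquot) and stack the per-file summaries.
per_file <- do.call(rbind, lapply(split(seg, seg$GDC_Aliquot), summarise_seg))
print(per_file)
```

With real data you would read each .seg file with read.delim() and rbind them first. A file with many more segments than its replicates (like the 415-row one here) may be worth a closer look before it is used or merged.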
Thanks! I got the same output as you showed above. As a newbie, I would like to share that you will need the following libraries before you can run the R script above.
Then load the GenomeInfoDb
# if not installed the first time, then
# Check whether installed properly #
Then the following 2 libraries
Then execute the above R script provided by Sean.
You can find more up-to-date info on GenomicDataCommons at the URL below:
https://github.com/Bioconductor/GenomicDataCommons
Thanks for the great details. Minor adjustment--GenomicDataCommons is now in Bioconductor, so
biocLite('GenomicDataCommons')
is the preferred approach for installing.
One more detail...
You will need R version 3.4 or above for GenomicDataCommons to install properly.
I had version 3.3.3 originally, and the initial installation attempt failed.
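To check both points (R version and installed packages) before running the script, something like the following may help. It is a rough sketch: the 3.4.0 threshold comes from the post above, and the two package names are just the ones mentioned in this thread, not necessarily the full list the script needs.

```r
# Minimum R version reported in this thread for GenomicDataCommons.
min_r <- "3.4.0"
r_ok <- getRversion() >= min_r

# Packages named in this thread; requireNamespace() returns FALSE instead of
# raising an error when a package is missing.
pkgs <- c("GenomeInfoDb", "GenomicDataCommons")
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

cat("R version OK:", r_ok, "\n")
print(installed)

# Assumption: on current Bioconductor releases, missing packages are installed
# with BiocManager::install(pkgs[!installed]); biocLite() was the installer at
# the time of this thread.
```

If the version check prints FALSE, upgrading R first should avoid the failed install described above.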