6.4 years ago
stanley.ju
What do the different components of a TCGA data filename mean?
For example, here's one file I was looking at: HORNS_p_TCGA_b110_113_SNP_N_GenomeWideSNP_6_C10_772388.grch38.seg.v2.txt
Some parts are self-explanatory. This comes from a genome-wide SNP array, I assume the 6 is Affymetrix 6.0. But what does "HORNS" mean? And "b110"? Etc.
Where did you get this file from? Also, b110 is incomplete, I think: it should be considered with the 113 that follows, b110_113.
I see, so b110_113 is some sort of sample marker?
This particular file name came from a download from TCGA Data Portal --> Uterine Corpus Endometrial Carcinoma --> Copy Number Variation. It was just the first file in the archive after I downloaded (straight from the web, since they're pretty small) all of the CNV data for endometrial carcinoma.
This paper might help, though I don't know how useful it is for deciphering TCGA filenames.
https://link.springer.com/protocol/10.1007/978-1-4939-3578-9_6
HORNS could be code for an institute of sample origin or something like that; I wouldn't worry too much about it.
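If it helps to look at the pieces side by side, here is a minimal Python sketch that just splits the example filename on underscores and dots. It does not decode anything: the function name `split_tcga_name` is made up for illustration, and treating tokens 3 and 4 as one `b110_113` unit is only the guess made earlier in this thread, not a documented TCGA rule.

```python
def split_tcga_name(filename):
    """Split a filename into underscore-delimited tokens and dotted suffix parts."""
    # Everything before the first dot is the stem; the rest is the suffix.
    stem, _, suffix = filename.partition(".")
    tokens = stem.split("_")
    suffix_parts = suffix.split(".") if suffix else []
    return tokens, suffix_parts


name = "HORNS_p_TCGA_b110_113_SNP_N_GenomeWideSNP_6_C10_772388.grch38.seg.v2.txt"
tokens, suffix_parts = split_tcga_name(name)

# Assumption from the thread: "b110" and "113" belong together.
combined = "_".join(tokens[3:5])

for index, token in enumerate(tokens):
    print(index, token)
print("suffix:", suffix_parts)
print("combined token:", combined)
```

Running the same split over every file in the archive and comparing which tokens vary from file to file (and which stay constant) would probably say more about what each field encodes than any single name does.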