I have sample-wise data from TCGA for TNBC that contains approx. 60,000 Ensembl IDs for each patient's sample, such as ENSG00000242268.2, ENSG00000270112.3, ENSG00000167578.15, ENSG00000273842.1, ENSG00000078237.5, ENSG00000146083.10, and so on.
What do these IDs belong to? If they were different transcripts, they would start with "ENST".
Can anyone suggest?
Thanks Devon and fin swimmer.
If these are gene IDs, then I got 60483 such IDs in one patient sample, and all of them are unique.
Also, what about the digits after the decimal point? I think it is the Ensembl version, but then how do I convert them to gene IDs?
But the 60483 IDs aren't unique, are they? Or do some have the same ID but a different version?
What do you mean by "convert them to gene id"?
EDIT:
I took a look at the statistics page of the current Ensembl release. If I sum up all known genes, including pseudogenes and non-coding genes, in the primary assembly and alternative sequences, I come to 60327 genes. That's close to your number. Could this be it?
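To check the two points raised above (whether the IDs are genes or transcripts, and whether any ID appears more than once under different versions), something like the following could be used. This is only a minimal sketch: the list `ids` holds the six examples from the question, and the helper names `strip_version` and `feature_type` are made up for illustration. In practice the IDs would be read from the first column of the TCGA file.

```python
from collections import Counter

# Example IDs copied from the question; a real run would load all ~60,000
# IDs from the sample file instead.
ids = [
    "ENSG00000242268.2",
    "ENSG00000270112.3",
    "ENSG00000167578.15",
    "ENSG00000273842.1",
    "ENSG00000078237.5",
    "ENSG00000146083.10",
]


def strip_version(ensembl_id):
    """Return the stable ID without the trailing '.<version>' suffix."""
    return ensembl_id.split(".", 1)[0]


def feature_type(ensembl_id):
    """Classify a human Ensembl ID by its four-letter prefix."""
    prefixes = {"ENSG": "gene", "ENST": "transcript", "ENSP": "protein"}
    return prefixes.get(ensembl_id[:4], "unknown")


# Unversioned gene IDs, e.g. ENSG00000242268.2 becomes ENSG00000242268
stable_ids = [strip_version(i) for i in ids]

# Stable IDs that occur more than once, i.e. same ID with different versions
counts = Counter(stable_ids)
duplicates = [sid for sid, n in counts.items() if n > 1]

# Tally of feature types found in the list
types = Counter(feature_type(i) for i in ids)

print(stable_ids)
print("duplicated stable IDs:", duplicates)
print("feature types:", dict(types))
```

If `duplicates` comes back empty for the full list, every versioned ID maps to a distinct gene, and the unversioned IDs can be used directly for lookups in Ensembl or BioMart.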
shivangi.agarwal800: Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized.