Strange Gene IDs in TCGA
10.9 years ago
jack ▴ 520

Hi all,

I've downloaded RNA-seq data from TCGA, and when I look at the expression files, the IDs of the first few genes look strange. Does anybody know why?

gene
?|100130426
?|100133144
?|100134869
?|10357
?|10431
?|136542
?|155060
?|26823
?|280660
?|317712
?|340602
?|388795
tcga genomic annotation rna-seq • 8.7k views

I want to know this as well. Will find out for you.


What cancer type and which files specifically?


For example: sample TCGA-A6-2683-01.


I have another question.

Some of the gene IDs have the suffix "_calculated".

What does it mean?

Example:

==> OV__bcgsc.ca__illuminahiseq_rnaseq__gene.quantification__Jul-08-2014.txt <==
Hybridization REF
gene
?|100132510_calculated
?|100134860_calculated
?|10357_calculated
?|10431_calculated
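The thread never explains what "_calculated" means, so the following is only a minimal Python sketch for separating that suffix from the rest of the field, so that the ID part stays parseable while the flag is kept for later inspection. The function name split_calculated_suffix is hypothetical, and treating the suffix as a plain trailing label is an assumption.

```python
SUFFIX = "_calculated"

def split_calculated_suffix(field):
    """Separate a trailing '_calculated' label from a TCGA gene field.

    Returns (field_without_suffix, had_suffix). This does not interpret
    the suffix; it only keeps the label apart from the gene ID.
    """
    if field.endswith(SUFFIX):
        return field[: -len(SUFFIX)], True
    return field, False

# Rows taken from the OV excerpt above, plus the same ID without the suffix.
for field in ["?|10357_calculated", "?|10357"]:
    base, flagged = split_calculated_suffix(field)
    print(field, "->", base, flagged)
```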

Cheers

Cankut CUBUK
Computational Genomics Program - Systems Genomics Lab
Centro de Investigación Príncipe Felipe (CIPF)
C/ Eduardo Primo Yúfera nº3
46012 Valencia, Spain
http://bioinfo.cipf.es


Please post this as a new question rather than adding it as an answer to a year-old question.


Ok I will do, thanks

10.9 years ago
Ryan D ★ 3.4k

According to the description file, these should be Entrez/LocusLink gene IDs.

For instance, the first one is LOC100130426, a hypothetical locus. This may explain why many of them don't have HGNC symbols. Check out the description in the workflow.

---snip---

File: *.trimmed.annotated.gene.quantification.txt

  • gene: This is the Entrez/LocusLink gene symbol followed by the Entrez/LocusLink gene ID.
  • raw_counts: The number of reads mapping to this gene.
  • median_length_normalized: This is the total aligned bases to all transcript models associated with this gene divided by the mean transcript length.
  • RPKM: See the DESCRIPTION.txt file in the mage-tab bundle for information on how this is calculated.
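Following that description, the gene field can be split on the pipe to recover the symbol and the Entrez ID. Below is a minimal Python sketch, assuming the layout seen in the question (symbol, "|", Entrez ID) and assuming "?" simply stands for a missing symbol. The function name parse_tcga_gene_id is hypothetical, and "TP53|7157" is only an illustrative row, not taken from the files above.

```python
def parse_tcga_gene_id(field):
    """Split a TCGA RNA-seq 'gene' field such as '?|10357' or 'TP53|7157'.

    Returns (symbol, entrez_id). symbol is None when the field uses '?',
    i.e. no gene symbol was given for that Entrez ID.
    """
    symbol, sep, entrez_id = field.partition("|")
    if not sep or not entrez_id:
        raise ValueError(f"expected 'SYMBOL|ENTREZ_ID', got {field!r}")
    return (None if symbol == "?" else symbol), entrez_id

for field in ["?|100130426", "?|10357", "TP53|7157"]:
    print(field, "->", parse_tcga_gene_id(field))
```

For "?|100130426" this gives no symbol and the Entrez ID 100130426, which matches the LOC100130426 locus mentioned above.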

Thanks for the solution, Ryan, but the links you posted are now broken. Could you please update them? Since the "TCGA Data Portal is no longer operational", where can we find the mapping from TCGA gene IDs to Entrez Gene IDs? To be specific, I'm working with the BRCA dataset and would like to get the Entrez IDs corresponding to my TCGA IDs.
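If the BRCA files use the same "SYMBOL|ENTREZ_ID" layout as the files discussed above (an assumption worth checking against your own download), the Entrez ID is already embedded in each row, so a mapping can be built from the file itself. A minimal Python sketch, assuming a tab-delimited file whose first column holds the gene field; the function name entrez_map_from_table and the two-sample excerpt are hypothetical.

```python
import csv
import io

def entrez_map_from_table(handle):
    """Map each 'SYMBOL|ENTREZ_ID' value in the first column to its Entrez ID.

    Rows whose first column contains no '|' (for example the
    'Hybridization REF' and 'gene' header rows) are skipped. Any suffix
    after the ID, such as '_calculated', is left untouched.
    """
    mapping = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or "|" not in row[0]:
            continue
        mapping[row[0]] = row[0].split("|", 1)[1]
    return mapping

# Hypothetical excerpt with placeholder values; real files are much larger.
example = io.StringIO(
    "Hybridization REF\tSAMPLE_A\tSAMPLE_B\n"
    "gene\traw_count\traw_count\n"
    "?|100130426\t10\t20\n"
    "TP53|7157\t30\t40\n"
)
print(entrez_map_from_table(example))
```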
