TCGA data query (GDCquery): some external_gene_name values are missing
3.5 years ago

Hi,

I got some odd output from a TCGA dataset. As the picture below shows, some of the external_gene_name values are missing. Could you please help me with this issue? Thank you.

enter image description here

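library(TCGAbiolinks)

# Query HTSeq raw counts for TCGA-BRCA tumor and normal samples
query.seq <- GDCquery(project = "TCGA-BRCA", 
                      data.category = "Transcriptome Profiling", 
                      data.type = "Gene Expression Quantification",
                      sample.type = c("Solid Tissue Normal", "Primary Tumor"),
                      workflow.type = "HTSeq - Counts")

# Download the files matched by the query
GDCdownload(query.seq)

# Build a SummarizedExperiment from the downloaded files
seq.brca <- GDCprepare(query = query.seq, summarizedExperiment = TRUE)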
TCGABiolinks R TCGA • 1.6k views
3.5 years ago
GenoMax 147k

ENSG00000281904 is annotated as a novel gene, which is why it has no official gene name. This gene was manually annotated by Ensembl. The others may be similar.
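
If it helps, one common workaround is to fall back to the Ensembl ID wherever the name is empty, so every row still has a usable label. Below is a minimal sketch in base R. It assumes the gene annotation has already been copied into a plain data frame with columns `ensembl_gene_id` and `external_gene_name`; the `ensembl_gene_id` column name, the `label` column, and the toy rows are assumptions for illustration, not taken from your object.

```r
# Toy annotation table; in practice this would come from the prepared object.
gene_info <- data.frame(
  ensembl_gene_id    = c("ENSG00000141510", "ENSG00000281904", "ENSG00000012048"),
  external_gene_name = c("TP53", NA, "BRCA1"),
  stringsAsFactors   = FALSE
)

# Treat both NA and empty strings as "missing".
missing_name <- is.na(gene_info$external_gene_name) |
  gene_info$external_gene_name == ""

# Fall back to the Ensembl ID so every gene keeps a non-empty label.
gene_info$label <- ifelse(missing_name,
                          gene_info$ensembl_gene_id,
                          gene_info$external_gene_name)

print(gene_info)
```

This keeps the unnamed genes in the table instead of dropping them, which leaves the decision about filtering for later.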


Thank you. So do you mean that I can neglect them in my analysis? Actually, when I used the "gencode.gene.info.v22.csv" file from TCGA, it assigned names to them (the highlighted part in the first attached picture).

enter image description here

On the other hand, a friend of mine got gene names using "gencode.gene.info.v22.csv" a year ago, and they are not the same as the ones in figure 1; they appear to be aliases. For example:

enter image description here

RP11-418H16.1 = AC007389.5

CH17-132F21.5 = AC233263.6

So I'm wondering how I can get the same gene names ("AC007389.5", "AC233263.6", ...)?
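
One way this kind of renaming is often handled is to treat the Ensembl gene ID as the key and look up the name in whichever annotation release uses the naming you want. The sketch below shows the idea in base R with two small tables. The gene IDs are placeholders (I do not know the real IDs for these genes), and the column names `gene_id` and `gene_name` as well as the helper `strip_version` are assumptions; only the two name pairs come from this thread.

```r
# Older annotation table; the gene IDs are placeholders, not real Ensembl IDs.
old_anno <- data.frame(
  gene_id   = c("ENSG_PLACEHOLDER_1.1", "ENSG_PLACEHOLDER_2.3"),
  gene_name = c("RP11-418H16.1", "CH17-132F21.5"),
  stringsAsFactors = FALSE
)

# Newer annotation for the same genes; version suffixes may differ.
new_anno <- data.frame(
  gene_id   = c("ENSG_PLACEHOLDER_2.5", "ENSG_PLACEHOLDER_1.2"),
  gene_name = c("AC233263.6", "AC007389.5"),
  stringsAsFactors = FALSE
)

# Strip the ".N" version suffix so IDs from different releases can match.
strip_version <- function(id) sub("\\.[0-9]+$", "", id)
old_anno$gene_key <- strip_version(old_anno$gene_id)
new_anno$gene_key <- strip_version(new_anno$gene_id)

# Look up the newer name for each old row; unmatched rows get NA.
idx <- match(old_anno$gene_key, new_anno$gene_key)
old_anno$new_name <- new_anno$gene_name[idx]

print(old_anno[, c("gene_name", "new_name")])
```

Rows that come back as NA would be genes absent from the newer table, so those are worth checking by hand before relying on the mapping.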


I mean, are you realistically interested in genes like these? They probably even have 0 counts across all of your samples. Unless you are specifically studying low-expressed predicted genes, maybe just filter these out.
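
For reference, dropping genes with zero counts in every sample can be sketched in base R as below. The matrix, the gene and sample labels, and the `> 0` threshold are toy assumptions; a real analysis might prefer a stricter expression filter.

```r
# Toy count matrix: genes in rows, samples in columns.
counts <- matrix(
  c(10, 0, 3,
     0, 0, 0,
     5, 2, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"), c("s1", "s2", "s3"))
)

# Keep only genes with at least one read in at least one sample.
keep <- rowSums(counts) > 0
filtered <- counts[keep, , drop = FALSE]

print(rownames(filtered))
```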


Thanks again. Yes, I need them in my analysis, provided I can get gene names such as "AC007389.5" instead of "RP11-418H16.1", as I mentioned above.
