Hi,
I'm currently trying to calculate a signature score for immune cells in the tumor environment using RNAseq data from the TCGA database. To do so, I'm using the TCGAbiolinks and GVSA package:
query.gbm <- GDCquery(project = "TCGA-GBM",
data.category = "Gene expression",
data.type = "Gene expression quantification",
platform = "Illumina HiSeq",
file.type = "normalized_results",
experimental.strategy = "RNA-Seq",
legacy = TRUE)
data <- GDCprepare(query.gbm)
gene_expression <- assay(data)
mic <- c("Epb41l5","Ysk4","Fam72a","Cdk18","Adora1","Olfml2b","Slamf1","Sccpdh","1110021L09Rik","Ank3","Kitl","Pmel","Sec14l2","Snap47","Zfp867","Fam83g","Rab34","Sarm1","Leprel4","Hsd17b1","Plekhh3","Etv4","Rundc3a","Ccdc46","Apob","Wdr35", ...)
genelist_mic <- list(mic)
score <- gsva(gene_expression, genelist_mic,
method= "ssgsea",
rnaseq=TRUE,
abs.ranking=FALSE,
min.sz=1,
max.sz=Inf,
no.bootstraps=0,
bootstrap.percent = .632,
parallel.sz=0,
parallel.type="SOCK",
mx.diff=TRUE,
tau=switch(method, gsva=1, ssgsea=0.25, NA),
kernel=TRUE,
ssgsea.norm=TRUE,
verbose=TRUE)
Everything seems to work just fine. I get an expression matrix with gene ids in rows and samples in the corresponding columns but I always get the following error message for the gsva() function:
Error in .local(expr, gset.idx.list, ...) : No identifiers in the gene sets could be matched to the identifiers in the expression data.
If I adjust my gene list (genelist_mic) to some of the gene identifiers from my gene expression matrix (gene_expression) that I get from TCGA, the code works. I also checked the gene identifiers of both the gene expression matrix and my cell type specific gene list with the David tool and it looks like they are all Gene Official Symbols. So I don't understand why it doesn't recognize the gene identifiers.
The gene identifiers from the gene expression matrix that I get back from TCGA look like this:
A1BG A2M NAT1 NAT2 RP11-986E7.7 SERPINA3 AADAC AAMP AANAT AARS ABAT ABCA1 ABCA2 ABCA3 ABCB7 ABCF1
Does anyone know why this might happen?
I would highly appreciate your help!
Thanks, Moritz
Hi Mike,
thanks for your fast reply! So I downloaded the data, checked again and noticed that the gene identifiers in the gene expression matrix were capital letters as opposed to my genelist. So I converted my genelist to all capital letters but now I get the following error message, which I don't understand:
These are the steps before the error message occurs:
Where do you receive this messages? What does the
traceback()
function says? Have you searched in the support of Bioconductor (Where this question should probably be)?So this is what my
traceback()
function returns after calculating for about 10 minutes:Yes, I did and unfortunately I wasn't able to find anything. Thanks for the note though, if I can't solve the problem with this post here, I will try it there.
the genelist_mic is a vector, a list or a GeneSetCollection? Altough the problem might be in mclapp not exporting the right namespace to the different sockets.
Agree with @Lluís R point on the support of Bioconductor.
Anyways, I think there is a problem in reading geneset file. Try with gmt format.
How many geneset are you using ?
Why do you think it's not working with a matrix? I have 172 datasets with around 20,000 genes each. I just don't understand it because calculating the z-scores with the gsva() function and rnaseq=FALSE works perfectly. It also calculates for around 10 minutes and stops then.
So here it what I tried:
I guess that is a problem because gene_expression is a matrix and getGmt expects a character vector. But how would I solve that?
read geneset file in
getGmt
, notexpression_matrix
.Hi,
How is your problem solved? I met the same issue and I have check the row name of my expression data and the ids in the gene set"gmt" file. They are the same in format. Most puzzling,I use the same code and same data several days ago but it just didn't work when I try it yesterday and today. I would highly appreciate your help.
Clover very puzzling!
Please do not add answers unless you're answering the top level question. Use
Add Comment
orAdd Reply
instead as appropriate.