Hello everyone,
I'm trying to do a gene set varian analysis using R to detect a specific gene set signature of a specific pathway from 20 samples of RNA-seq. I have this files in BAM format but I don't know what to do in order to get a panel like this one in the pic!
which R packages are required?
on the Y axis there is the score for a specific singling pathway while the x axis represents the different samples.
Error in .local(expr, gset.idx.list, ...) :
Less than two genes in the input expression data matrix
In addition: Warning messages:
1: In .local(expr, gset.idx.list, ...) :
55046 genes with constant expression values throuhgout the samples.
2: In .local(expr, gset.idx.list, ...) :
Since argument method!="ssgsea", genes with constant expression values are discarded.
I thought that maybe separator could be a problem so I removed it, now because you told me I converted the file from .csv to .txt without specifying the separator and I have a different result:
> mydata <- read.table(file= "mytable_genename.txt",sep="",header=TRUE, row.names = 1)
> table(apply(mydata, 2, function(x) var(x, na.rm = TRUE) == 0))
FALSE
18
> table(apply(mydata, 1, function(x) var(x, na.rm = TRUE) == 0))
FALSE TRUE
50640 4406
> mydata<-data.matrix(mydata)
> esrnaseq <- gsva(mydata, canonicalC2BroadSets, min.sz=5, max.sz=500,
+ kcdf="Poisson", mx.diff=TRUE, verbose=FALSE, parallel.sz=1)
Error in .local(expr, gset.idx.list, ...) :
No identifiers in the gene sets could be matched to the identifiers in the expression data.
In addition: Warning messages:
1: In .local(expr, gset.idx.list, ...) :
4406 genes with constant expression values throuhgout the samples.
2: In .local(expr, gset.idx.list, ...) :
Since argument method!="ssgsea", genes with constant expression values are discarded.
Maybe I should be using the ensemble for the gene name..
My time is quite limited - sorry. You literally just need to look in the file to see how the columns are separated. It is most likely a comma or tab-space. So, sep = ',' or sep = '\t' should work.
Yes, the problem is that you have 4406 genes with 0 variance, so, there is nothing that can be done with these. You will have to remove them prior to performing GSVA. With the apply() function, you can basically create a vector of boolean values for TRUE and FALSE.
I have one more Question,
In the GSVA help page it is suggested to use tokcdf="Poisson" for a suitable modelisation of RNA-seq (count Data)
By default,kcdf="Gaussian"which is suitable wheninput expression values are continuous,
such as microarray fluorescent units inlogarithmic scale, RNA-seq log-CPMs, log-RPKMs or log-TPMs.
When in-put expression values are integer counts, such as those derived from RNA-seqexperiments,
then this argument should be set tokcdf="Poisson"
but in your comment you suggest a 2 step normalisation of raw counts (DESeq2) and log transformation
in the DESeq2 package we can do
Hello Kevin Blighe,
thanks for your reply,
it works great with the argument kcdf="Poisson" and the counts formalisation with DESeq2.
But what is for you the best use (kcdf option and DESeq2 normalisation method) for ssgsea ?
Hi Kevin,
I created the rlog matrix.txt and run GSVA, but I m doing something wrong, To run GSVA I m following this vignette:
https://bioconductor.org/packages/release/bioc/vignettes/GSVA/inst/doc/GSVA.pdf
but at this point I have an error : Error in gsva(mydata, canonicalC2BroadSets, min.sz = 5, max.sz = 500, : could not find function "gsva"
I'm not familiar with this script, can you tell me how to use it?
Do I need of others libraries?
thank you very much
Hey Morris, I think that you also need to load the main package, like this: