I know that I can pass a gene-level RSEM matrix to limma and work with it.
But can I use it in DESeq?
I have an RSEM matrix from Firebrowse (gene-level, ~20,000 rows). DESeq requires raw counts, and since I only have the gene-by-sample matrix, I can't use tximport().
You can use these with DESeq2 if you round the numbers to whole integers and then pass them to DESeq2 with DESeqDataSetFromMatrix(). This is not ideal: tximport would be preferred, as it adjusts for transcript length and transcript isoform abundances.
If you won't take my word for it, take that of the DESeq2 developer: see the thread "DESeq2 Following RSEM".
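For anyone who wants to see the rounding route spelled out, here is a minimal sketch in R. The matrix `rsem` and the sample table `coldata` are made-up stand-ins (not Firebrowse data), and the `condition` column is a hypothetical grouping; swap in your own matrix and sample annotation. The DESeq2 call is guarded so the base-R part still runs if the package is not installed.

```r
# Toy stand-in for an RSEM gene-level expected-count matrix (genes x samples).
# Values are non-integer on purpose, as RSEM expected counts usually are.
set.seed(1)
rsem <- matrix(rgamma(60, shape = 2, scale = 50), nrow = 10,
               dimnames = list(paste0("gene", 1:10), paste0("sample", 1:6)))

# Round to whole numbers and store as integers, which is what DESeq2 expects.
counts <- round(rsem)
storage.mode(counts) <- "integer"

# Hypothetical sample table; its row names must match the columns of `counts`,
# in the same order.
coldata <- data.frame(condition = factor(rep(c("A", "B"), each = 3)),
                      row.names = colnames(counts))

if (requireNamespace("DESeq2", quietly = TRUE)) {
  dds <- DESeq2::DESeqDataSetFromMatrix(countData = counts,
                                        colData   = coldata,
                                        design    = ~ condition)
  # On a real data set you would continue with:
  # dds <- DESeq2::DESeq(dds)
  # res <- DESeq2::results(dds)
}
```

Note that this only makes sense for the RSEM expected counts. Rounding FPKM/TPM or otherwise normalized values does not turn them into counts.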
Hello, I used an RSEM gene-level count estimates matrix and just rounded the values using the round() function in R to feed them to DESeq2, but the results are very odd (a VERY low number of differentially expressed genes). Is this really a solid way to go about differential expression analysis, or should I resort to starting from the raw data again?
NOTE: The study I got the data from provided the data as an RSEM gene-level count matrix and an FPKM-normalized matrix. I had to use the RSEM matrix since I know DESeq2 only takes non-normalized counts. The study used an independent t-test on the FPKM file to do the analysis, but I read somewhere that this is highly discouraged.
Using a t-test on FPKM data is an improper analysis. In fact, no statistical inferences in the realm of differential expression analysis can be performed using FPKM data.
It may be a genuine result that there are few or no differentially expressed genes in your data. How do the dispersion plot, the MA plot(s), and the volcano plot(s) look? Are the sample groups imbalanced (e.g. 3 normal versus 20 disease)?
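In case it helps, the diagnostics asked about above can be produced roughly like this. This is only a sketch: it uses DESeq2's built-in simulated data set (`makeExampleDESeqDataSet()`) in place of the real one, so `dds` and `res` here are placeholders for your own objects, and the `ylim` value is an arbitrary choice.

```r
if (requireNamespace("DESeq2", quietly = TRUE)) {
  suppressPackageStartupMessages(library(DESeq2))

  # Simulated stand-in data: 1000 genes, 6 samples, two conditions (A and B).
  set.seed(1)
  dds <- makeExampleDESeqDataSet(n = 1000, m = 6)
  dds <- DESeq(dds)
  res <- results(dds)

  # Group sizes, to check whether the design is badly imbalanced.
  print(table(dds$condition))

  # Dispersion plot: gene-wise estimates, fitted trend, and shrunken values.
  plotDispEsts(dds)

  # MA plot: log2 fold change against mean normalized count.
  plotMA(res, ylim = c(-3, 3))

  # Basic volcano plot from the unadjusted p-values.
  plot(res$log2FoldChange, -log10(res$pvalue),
       xlab = "log2 fold change", ylab = "-log10(p-value)", pch = 20)

  # Overview of how many genes pass the adjusted p-value threshold.
  summary(res)
}
```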
Hello! Thank you for your timely reply. The differential expression is between 3 grades of glioma: 2, 3 and 4. Basically pairwise comparisons using DESeq2. The plots are the following:
The distribution of samples is fairly equal between the conditions (grades).
These plots look okay... the dispersion trend seems a little strange (the large 'belly' at the bottom), but perhaps that's due to RSEM. For the volcano plot, we would usually plot the unadjusted p-values.
For all intents and purposes, the result that you have may be genuine. There may be another confounding factor for which you need to control, but I do not know your experiment well enough to say what that factor may be.
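To illustrate what controlling for an extra factor and running the pairwise grade comparisons could look like, here is a sketch on simulated data. The `batch` column is purely hypothetical (nothing in this thread says what the confounder is, or that there is one), and the `grade` labels are assigned arbitrarily to the simulated samples.

```r
if (requireNamespace("DESeq2", quietly = TRUE)) {
  suppressPackageStartupMessages(library(DESeq2))

  # Simulated stand-in data: 1000 genes, 12 samples.
  set.seed(1)
  dds <- makeExampleDESeqDataSet(n = 1000, m = 12)

  # Made-up annotation: three grades (4 samples each) and a hypothetical
  # two-level batch factor that is balanced across the grades.
  dds$grade <- factor(rep(c("2", "3", "4"), each = 4))
  dds$batch <- factor(rep(c("b1", "b2"), times = 6))

  # Put the nuisance factor first and the variable of interest last.
  design(dds) <- ~ batch + grade
  dds <- DESeq(dds)

  # Pairwise comparisons; each contrast is c(factor, numerator, denominator).
  res_3_vs_2 <- results(dds, contrast = c("grade", "3", "2"))
  res_4_vs_2 <- results(dds, contrast = c("grade", "4", "2"))
  res_4_vs_3 <- results(dds, contrast = c("grade", "4", "3"))

  # Number of genes with adjusted p-value below 0.05 in one comparison.
  print(sum(res_4_vs_2$padj < 0.05, na.rm = TRUE))
}
```

Fitting all three grades in one model and extracting contrasts uses every sample for the dispersion estimates, which is generally preferable to subsetting the data for each pair.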