I could/should have asked this question at bioC forum, but the answers
there are usually (just) over my head.
I wondered how to use a table of "Expected counts" from rsem to
obtain DEGs using DeSeq2/EdgeR? The expected counts are from UCSC Xena- processed GTEX data.
There seems to be more than one (proposed) workflows present for this kind of study. Before delving deeper into calculations, I wanted to know which approach is more often used:
As the tximport vignette at bioC explains
https://bioconductor.org/packages/release/bioc/vignettes/tximport/inst/doc/tximport.html#rsem
. However, in the vignette, the starting txi object is
built differently from my data (I still have not checked the structure of the resulting R object, so that I might create it manually; waiting for the approval of this method over the simpler method 1 above ) .
tximport was developed by the same guys that developed edgeR. The concern that lead to the creation of the package as I understand it is that people make omission of the assumptions these packages work on (in this case, that counts are raw counts and not extimates), so they decided to create a package to couple in the pipeline. As for what exactly it does for DeSeq2, you should look at the code for the function
If I remember correctly, the txi object is a list with three slots: $abundance contains a matrix of TPMs, with one row per transcript and one column per sample, $count contains the same matrix for estimated counts, $length contains a vector with the length of each transcript. This shouldn't be too hard to create.
You can then use the tximport functions for collapsing transcript counts to gene counts, and creating a DESeq dataset object or edgeR object from that collapsed txi. The reason to go from transcript counts rather than gene counts, is that tximport uses the transcript counts to create a weighted effective gene length for use as an offset in the DE model. This protects against splicing changes making the counts from the same gene incomparable between samples, because the effective gene length is different.
I seem to remember that RSEM is able to do something similar, but I can't quite be sure.
tximport was developed by the same guys that developed edgeR. The concern that lead to the creation of the package as I understand it is that people make omission of the assumptions these packages work on (in this case, that counts are raw counts and not extimates), so they decided to create a package to couple in the pipeline. As for what exactly it does for DeSeq2, you should look at the code for the function