Question

Differential gene expression, starting from RSEM expected count values.

3


4.8 years ago

english.server ▴ 300

Hi!

I could/should have asked this question at bioC forum, but the answers there are usually (just) over my head.

I wondered how to use a table of "expected counts" from RSEM to obtain DEGs using DESeq2/edgeR. The expected counts are from UCSC Xena-processed GTEx data. There seems to be more than one proposed workflow for this kind of study. Before delving deeper into the calculations, I wanted to know which approach is more often used:

1. Rounding the expected counts: Question about how to transform RSEM expected_count of TCGA TARGET GTEX to integers?
2. Using tximport, as the tximport vignette at bioC explains: https://bioconductor.org/packages/release/bioc/vignettes/tximport/inst/doc/tximport.html#rsem . However, in the vignette, the starting txi object is built differently from my data (I have not yet checked the structure of the resulting R object to see whether I could create it manually; I am waiting for approval of this method over the simpler method 1 above).
```
library(tximport)
library(DESeq2)
txi <- tximport(files, type = "rsem", txIn = FALSE, txOut = FALSE)
dds <- DESeqDataSetFromTximport(txi, sampleTable, ~condition)
```
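
For comparison, option 1 (rounding) can be sketched in base R. This is only an illustration under stated assumptions: the matrix `log2_expected` and its values are invented, and the back-transform step assumes the table is stored as log2(expected_count + 1), which is worth checking in the dataset description before applying it. The DESeq2 call is left as a comment and uses a hypothetical `sampleTable`.

```r
# Toy stand-in for an RSEM expected-count table (genes x samples); values are invented.
log2_expected <- matrix(
  c(3.2, 0.0, 7.9, 3.5, 1.0, 8.1),
  nrow = 3,
  dimnames = list(c("geneA", "geneB", "geneC"), c("s1", "s2"))
)

# Assumption: values are log2(expected_count + 1); skip this step if they are not.
expected <- 2^log2_expected - 1

# Round to integers, since DESeq2 requires integer counts.
counts <- round(expected)
storage.mode(counts) <- "integer"

# With DESeq2 installed, and a hypothetical sampleTable with a `condition` column:
# dds <- DESeq2::DESeqDataSetFromMatrix(counts, colData = sampleTable, design = ~condition)
```

Rounding keeps the workflow simple but discards the length information that the tximport route carries along.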

rsem expected-count deseq2 • 4.0k views

updated 4.8 years ago by i.sudbery 20k • written 4.8 years ago by english.server ▴ 300

1


tximport was developed by people who also work on DESeq2 and edgeR. As I understand it, the concern that led to the package was that people overlook the assumptions these tools rely on (in this case, that the counts are raw counts and not estimates), so they created a package to slot into the pipeline. As for what exactly it does for DESeq2, you should look at the code for the function.

4.8 years ago by biofalconch ★ 1.3k

score 5 · Accepted Answer · 2020-02-20

Definitely use tximport if you can.

If I remember correctly, the txi object is a list with three main slots: $abundance contains a matrix of TPMs, with one row per transcript (or per gene, if the input is already gene-level) and one column per sample; $counts contains the same matrix for estimated counts; and $length contains a matrix of effective lengths with the same dimensions. There is also a $countsFromAbundance element, which is "no" by default. This shouldn't be too hard to create.
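
A minimal sketch of building such a list by hand, assuming you have matching gene-level tables of TPMs, expected counts, and effective lengths. All values and the helper `mk` are invented for illustration; note that the element tximport returns is named `counts`, and that `length` is a matrix with one column per sample. The DESeq2 call is commented out and refers to a hypothetical `sampleTable`.

```r
# Toy gene-level matrices (genes x samples); all values are invented.
genes <- c("geneA", "geneB")
samples <- c("s1", "s2")
mk <- function(v) matrix(v, nrow = 2, dimnames = list(genes, samples))

txi <- list(
  abundance = mk(c(10.5, 200.1, 12.0, 180.3)),  # TPM
  counts    = mk(c(50.3, 900.7, 61.2, 850.0)),  # RSEM expected counts
  length    = mk(c(1500, 2200, 1480, 2250)),    # effective lengths, per sample
  countsFromAbundance = "no"
)

# All three matrices must share dimensions and row/column names.
stopifnot(identical(dimnames(txi$counts), dimnames(txi$length)))
stopifnot(identical(dimnames(txi$counts), dimnames(txi$abundance)))

# With DESeq2 installed and a hypothetical sampleTable whose rows match `samples`:
# dds <- DESeq2::DESeqDataSetFromTximport(txi, colData = sampleTable, design = ~condition)
```

If only the expected counts are available, with no length or TPM tables, this route is not really open and rounding is the fallback.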

You can then use the tximport functions for collapsing transcript counts to gene counts (summarizeToGene), and for creating a DESeq2 dataset object or edgeR object from that collapsed txi. The reason to start from transcript counts rather than gene counts is that tximport uses the transcript-level estimates to compute a weighted effective gene length for each sample, which is used as an offset in the DE model. This protects against splicing changes making the counts from the same gene incomparable between samples because the effective gene length differs.
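
To make the weighted effective gene length concrete, here is a small base-R sketch of the idea: an abundance-weighted average of transcript lengths per gene and per sample. It is an illustration of the calculation, not tximport's own code, and the gene, transcripts, and numbers are all invented.

```r
# Toy transcript-level data for one gene with two isoforms; values are invented.
tx2gene <- c(tx1 = "geneA", tx2 = "geneA")
samples <- c("s1", "s2")
abundance <- matrix(c(90, 10, 10, 90), nrow = 2,
                    dimnames = list(names(tx2gene), samples))  # TPM
tx_length <- matrix(c(1000, 3000, 1000, 3000), nrow = 2,
                    dimnames = list(names(tx2gene), samples))  # effective lengths

# Abundance-weighted average transcript length per gene and sample.
gene_abundance <- rowsum(abundance, group = tx2gene)
gene_length <- rowsum(abundance * tx_length, group = tx2gene) / gene_abundance

gene_length
```

In this toy case the isoform switch moves the effective length of geneA from 1200 in s1 to 2800 in s2, so the same number of transcripts would yield very different read counts in the two samples unless the length is accounted for.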

I seem to remember that RSEM is able to do something similar, but I can't quite be sure.