Question

salmon with long reads

1

15 months ago

shinyjj ▴ 60

Hi all,

I have matched short-read and long-read RNA-seq data and would like to estimate relative isoform abundance from these datasets. I found out that salmon works with long reads as well (ref), and there are a number of other long-read tools that do isoform quantification, such as bambu and flair.

https://www.nature.com/articles/s41592-023-01908-w/figures/3

This is a naive question. According to Fig. 3a (linked above), there is not a big difference between salmon and bambu. If salmon already works well with long reads, what differentiates the dedicated long-read tools from salmon? Doesn't salmon already do isoform quantification well with long reads?

I am asking because I am not sure whether I should use salmon or a dedicated long-read tool for my long-read data.

salmon • 1.4k views

updated 15 months ago by Rob 7.1k • written 15 months ago by shinyjj ▴ 60

score 3 · Answer 1 · 2024-01-17

15 months ago

Rob 7.1k

Salmon author here. Yes, salmon works reasonably well for long-read quantification (comparable to other good long-read quantification methods). For the record, I think that there are, in general, gains to be had for long-read quantification, but none of the existing tools employ models that obtain those gains yet, so that is still an area for future work.

In terms of the argument for other tools, I believe that the biggest "selling point" is that most other long-read tools are built primarily around isoform discovery / identification, and they also just happen to include a quantification model. So, for example, the main contribution of bambu, arguably, is its novel model for transcript identification. It also happens to perform quantification (and does so reasonably well), but that's not the primary purpose of the tool.

So, I would argue it's completely reasonable to use salmon for your long-read quantification (we do ;P). If you want or need to do isoform discovery, you'll have to use another tool for that part of the analysis. Once you do discovery, you can always include those new isoforms in your catalog and quantify them with salmon as well.
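As a rough illustration of "include those new isoforms in your catalog", here is a minimal Python sketch that merges a reference transcript FASTA with a FASTA of newly discovered isoforms. The function names (`read_fasta`, `merge_catalogs`, `to_fasta`) and the toy records are hypothetical and not part of salmon or any discovery tool. The sketch assumes the discovery tool can export its novel transcripts as FASTA, and that skipping ID clashes and exact duplicate sequences is the behaviour you want.

```python
from typing import Dict, Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (transcript_id, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Use the first whitespace-delimited token of the header as the ID.
            name, chunks = line[1:].split()[0], []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def merge_catalogs(reference: Iterable[str], novel: Iterable[str]) -> Dict[str, str]:
    """Combine annotated and newly discovered transcripts into one catalog.

    Reference records win on ID clashes, and novel records whose sequence
    is already in the catalog are skipped (an assumption of this sketch).
    """
    catalog = dict(read_fasta(reference))
    seen = set(catalog.values())
    for name, seq in read_fasta(novel):
        if name in catalog or seq in seen:
            continue
        catalog[name] = seq
        seen.add(seq)
    return catalog


def to_fasta(catalog: Dict[str, str], width: int = 60) -> str:
    """Serialise the catalog back to FASTA text with wrapped sequence lines."""
    out = []
    for name, seq in catalog.items():
        out.append(f">{name}")
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"


# Toy records standing in for real files (hypothetical IDs and sequences).
reference = [">tx1 gene=G1", "ACGTACGT", "AC", ">tx2", "GGGTTT"]
novel = [">novel1", "ACGTTTTT", ">novel2", "GGGTTT"]  # novel2 duplicates tx2
merged = merge_catalogs(reference, novel)
print(to_fasta(merged))
```

With real data, the two inputs would be open file handles, and the merged FASTA would serve as the transcriptome that the long reads are quantified against.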

15 months ago by Rob 7.1k

0

Thank you, Rob, for your reply. To sum up your answer, it seems that LR & salmon already works reasonably well. Does LR & salmon work better than SR & salmon because the reads are longer, even though the coverage is shallower than with SR? I would guess that error-corrected long reads with better coverage, quantified with salmon, will be another option in the future? ;p

15 months ago by shinyjj ▴ 60

1

So it's not true that one data modality is always better than the other, and it really depends on the specific data.

Specifically, while LR data typically has less assignment ambiguity (the reads are longer and much less frequently map to many different isoforms), it also often provides less overall coverage than SR. This means that you may detect the presence of isoforms at low to mid abundance in SR data that you simply don't observe at all in LR data. Of course, for the reads that you do observe, there will generally be less ambiguity as to their origin in long read data — that is true.
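The trade-off between ambiguity and coverage can be made concrete with a toy example. The sketch below runs a plain expectation-maximization (EM) loop over read-compatibility classes, which is the general idea behind EM-based quantifiers but deliberately not salmon's actual model: it ignores transcript lengths, error models, and bias terms. The function name `em_abundance`, the three isoforms, and all read counts are made up for illustration.

```python
from typing import Dict, List, Tuple

# Maps the tuple of isoforms a read is compatible with to the number of such reads.
EqClasses = Dict[Tuple[str, ...], int]


def em_abundance(transcripts: List[str], eq_classes: EqClasses,
                 n_iter: int = 200) -> Dict[str, float]:
    """Estimate relative abundances with a plain EM loop (lengths ignored)."""
    total = sum(eq_classes.values())
    theta = {t: 1.0 / len(transcripts) for t in transcripts}
    if total == 0:
        return theta
    for _ in range(n_iter):
        assigned = {t: 0.0 for t in transcripts}
        # E-step: split each class's reads in proportion to current abundances.
        for members, count in eq_classes.items():
            denom = sum(theta[t] for t in members)
            if denom == 0.0:
                continue
            for t in members:
                assigned[t] += count * theta[t] / denom
        # M-step: abundances become the fraction of reads assigned to each isoform.
        theta = {t: assigned[t] / total for t in transcripts}
    return theta


isoforms = ["A", "B", "C"]

# Short-read-like toy data: deep, but most reads are compatible with both A and B.
short_like = {("A",): 300, ("B",): 100, ("A", "B"): 580, ("C",): 20}

# Long-read-like toy data: every read is unambiguous, but the low-abundance
# isoform C was never sampled at this depth.
long_like = {("A",): 7, ("B",): 3}

sr = em_abundance(isoforms, short_like)
lr = em_abundance(isoforms, long_like)
print({t: round(v, 3) for t, v in sr.items()})  # roughly A 0.735, B 0.245, C 0.02
print({t: round(v, 3) for t, v in lr.items()})  # roughly A 0.7, B 0.3, C 0.0
```

In the short-read-like case the estimate for A and B leans on how the 580 ambiguous reads are split, while C is still detected. In the long-read-like case no splitting is needed, but C receives an abundance of zero simply because no read from it was observed.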

As you suggest, moving forward, having higher-quality (e.g. corrected or lower error-rate) long-read sequencing will help further improve long-read quantification, as will higher sequencing depths for long-read data (e.g. higher-throughput nanopore sequencing and techniques such as MAS IsoSeq).

Finally, as we continue to develop new algorithms and methods specifically for long read data, they too may help further improve quantification accuracy (we are working on some of these ourselves, but they're not quite ready for public use yet).

15 months ago by Rob 7.1k