Hi all,
I have matched short-read and long-read RNA-seq data and would like to look at relative isoform abundance using these datasets. I found out that salmon works with long reads as well (ref), and there are a number of other long-read tools that do isoform quantification, such as bambu, flair, etc.
https://www.nature.com/articles/s41592-023-01908-w/figures/3
This is a naive question. According to Fig. 3a at the link above, there is not a big difference between salmon and bambu. If salmon already works well with long reads, what differentiates the long-read tools from salmon? Doesn't salmon already do isoform quantification well with long reads?
I am asking because I am not sure whether I should use salmon or one of the dedicated long-read tools for my long-read data.
Thank you, Rob, for your reply. To sum up your answer, it seems that LR with salmon already works reasonably well. Does LR with salmon work better than SR with salmon because the reads are longer, even though the coverage is shallower than with SR? I would guess that error-corrected long reads with better coverage, quantified with salmon, will be another option in the future? ;p
So it's not true that one data modality is always better than the other, and it really depends on the specific data.
Specifically, while LR data typically has less assignment ambiguity (the reads are longer and much less frequently map to many different isoforms), it also often provides less overall coverage than SR. This means that you may detect the presence of isoforms at low to mid abundance in SR data that you simply don't observe at all in LR data. Of course, for the reads that you do observe, there will generally be less ambiguity as to their origin in long read data — that is true.
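As a rough illustration of the coverage point, here is a minimal Python sketch (not part of salmon or any of the long-read tools) that computes per-gene relative isoform abundance from two salmon `quant.sf`-style tables and lists the isoforms that show up in SR but not in LR. The column names `Name` and `TPM` follow salmon's `quant.sf` output; the transcript names, the transcript-to-gene mapping, the toy TPM values, the `min_tpm` threshold, and the helper function names are all made up for the example.

```python
import csv
import io
from collections import defaultdict


def read_tpm(handle):
    """Return {transcript_name: TPM} from a salmon quant.sf-style TSV."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["Name"]: float(row["TPM"]) for row in reader}


def isoform_fractions(tpm, tx2gene):
    """Divide each transcript's TPM by the summed TPM of its gene."""
    gene_total = defaultdict(float)
    for tx, value in tpm.items():
        gene_total[tx2gene[tx]] += value
    fractions = {}
    for tx, value in tpm.items():
        total = gene_total[tx2gene[tx]]
        # Genes with no signal at all get a fraction of 0 for every isoform.
        fractions[tx] = value / total if total > 0 else 0.0
    return fractions


def seen_only_in_short_reads(sr_tpm, lr_tpm, min_tpm=1.0):
    """Transcripts with TPM >= min_tpm in SR but zero (or missing) in LR."""
    return sorted(
        tx for tx, value in sr_tpm.items()
        if value >= min_tpm and lr_tpm.get(tx, 0.0) == 0.0
    )


# Hypothetical toy tables in quant.sf layout (values invented for illustration).
HEADER = "Name\tLength\tEffectiveLength\tTPM\tNumReads\n"
SR_QUANT = HEADER + "tx1\t2000\t1800\t60\t600\ntx2\t1500\t1300\t30\t250\ntx3\t1200\t1000\t10\t70\n"
LR_QUANT = HEADER + "tx1\t2000\t2000\t75\t30\ntx2\t1500\t1500\t25\t10\ntx3\t1200\t1200\t0\t0\n"

# Hypothetical transcript-to-gene mapping; in practice this comes from the annotation.
TX2GENE = {"tx1": "geneA", "tx2": "geneA", "tx3": "geneA"}

sr_tpm = read_tpm(io.StringIO(SR_QUANT))
lr_tpm = read_tpm(io.StringIO(LR_QUANT))

sr_frac = isoform_fractions(sr_tpm, TX2GENE)
lr_frac = isoform_fractions(lr_tpm, TX2GENE)
sr_only = seen_only_in_short_reads(sr_tpm, lr_tpm)

for tx in sorted(sr_frac):
    print(tx, round(sr_frac[tx], 3), round(lr_frac.get(tx, 0.0), 3))
print("Detected in SR only:", sr_only)
```

With real data you would replace the `io.StringIO` objects with open file handles to the two `quant.sf` files. Isoforms that land in the SR-only list are not necessarily absent from the sample; they may simply be too lowly expressed to have been sampled at the long-read depth.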
As you suggest, moving forward having higher quality (e.g. corrected or lower error-rate) long read sequencing will help further improve long read quantification, as will higher sequencing depths for long read data (e.g. higher throughput nanopore sequencing and techniques such as MAS IsoSeq).
Finally, as we continue to develop new algorithms and methods specifically for long read data, they too may help further improve quantification accuracy (we are working on some of these ourselves, but they're not quite ready for public use yet).