Hi all,
Are there any tools that can quantify transcript abundance (e.g. TPM) from long reads? As far as I know, salmon only works with short-read data.
Salmon does work with long-read data, but you cannot use its built-in alignment mode. Rather, you'd have to align the reads against the transcriptome using a tool like minimap2 (e.g. as in the Oxford Nanopore pipeline here) and then pass those alignments to salmon. In fact, salmon has a dedicated --ont flag that is designed to improve quantification based on the error profiles of long-read alignments (specifically ONT alignments).
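To make the two steps concrete, here is a minimal sketch in Python that just assembles the two command lines (align with minimap2, then quantify the alignments with salmon using --ont). The file names, thread count, the `-N 100` secondary-alignment limit and the `-l U` library type are my assumptions for illustration, not recommendations from the salmon or ONT docs, so check them against your own data and tool versions.

```python
import shlex


def minimap2_cmd(transcriptome, reads, sam_out, threads=4, max_secondary=100):
    """Build a minimap2 command that aligns ONT reads to a transcriptome.

    -a            write SAM rather than PAF
    -x map-ont    minimap2 preset for noisy Nanopore reads
    -N            keep up to this many secondary alignments, so that
                  multi-mapping reads stay visible (value is an assumption)
    """
    return [
        "minimap2", "-a", "-x", "map-ont",
        "-N", str(max_secondary),
        "-t", str(threads),
        "-o", sam_out,
        transcriptome, reads,
    ]


def salmon_cmd(transcriptome, alignments, out_dir, threads=4, libtype="U"):
    """Build a salmon command that quantifies existing alignments.

    -t / -a   transcript FASTA plus SAM/BAM alignments (alignment-based mode)
    --ont     long-read (ONT) error model mentioned above
    -l        library type; "U" is an assumption, set it for your protocol
    """
    return [
        "salmon", "quant", "--ont",
        "-l", libtype,
        "-t", transcriptome,
        "-a", alignments,
        "-o", out_dir,
        "-p", str(threads),
    ]


if __name__ == "__main__":
    # Hypothetical file names; print the commands instead of running them.
    steps = (
        minimap2_cmd("transcripts.fa", "reads.fastq", "aln.sam"),
        salmon_cmd("transcripts.fa", "aln.sam", "salmon_out"),
    )
    for cmd in steps:
        print(shlex.join(cmd))
```

To actually run them you could hand each list to `subprocess.run(cmd, check=True)`. Keep the alignments in the order minimap2 writes them (grouped by read) rather than coordinate-sorting them before giving them to salmon.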
Interestingly, despite what one might think, multi-mapping is still quite common in long-read data. This is often the result of incomplete reads, or of complete reads from partially degraded transcripts, so a non-trivial amount of transcript-level multi-mapping remains.
In addition to salmon, there are several other tools dedicated to long-read quantification and transcript discovery, but I've not used them extensively, so I can't comment much on them. You may want to look into e.g. bambu and espresso.
Bambu, TALON, StringTie2, FLAMES, IsoQuant, FLAIR.
I may have forgotten a few.
If the reads in question are full-length transcript reads, is multi-mapping less likely to happen?
Yes. Typically there is less multi-mapping in long-read data than in short-read data. However, there is often still a non-trivial amount. Attributing a cause with high confidence can be difficult, but many cases clearly arise because either (a) the read is not full length (e.g. it's truncated early, stuck in the pore, or terminated due to structure when doing direct RNA sequencing) or (b) the transcript itself isn't full length (it's partially degraded).