Hi all,
It's a bit of a bait question, but also something I genuinely want to ask. I have been using the default nf-core pipeline for RNA-seq analysis, and the output was always fine for DGE. Recently, however, I started using the ENCODE GTF instead, because of the nf-core warning that the AWS iGenomes option may not be the most accurate. Now I have genes that share the same name but have multiple Ensembl IDs.
Here comes the question: if I want to simply add up the counts of the genes sharing the same name, I need to know whether multimapped reads are counted multiple times, counted once, or dropped. When aligning with STAR, the pipeline does not set the flag --outFilterMultimapNmax, so it stays at the default of 10, and the output should have tagged the reads that are multimapped. The quantification is then done with salmon, using the STAR output. I have read through the documentation, but I could not find an explanation of whether salmon, as a quantifier, is aware of these multimapped reads, or how it handles them if it knows they are multimapped.
Would anyone be able to point me to the right documentation or explain to me how it works? Thanks a lot.
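To make concrete what I mean by adding up the counts, here is a minimal sketch in plain Python. The Ensembl IDs, gene names, and numbers are made-up placeholders, not real pipeline output, and sum_by_gene_name is a hypothetical helper, not something nf-core or salmon provides.

```python
from collections import defaultdict

# Hypothetical per-gene counts keyed by Ensembl ID (made-up numbers).
counts = {
    "ENSG_A": 120.0,
    "ENSG_B": 30.5,
    "ENSG_C": 8.0,
}

# Hypothetical Ensembl ID -> gene name mapping, as it might be read from the GTF.
gene_names = {
    "ENSG_A": "GENE1",
    "ENSG_B": "GENE1",
    "ENSG_C": "GENE2",
}


def sum_by_gene_name(counts, gene_names):
    """Sum counts over all Ensembl IDs that share the same gene name."""
    totals = defaultdict(float)
    for gene_id, value in counts.items():
        # Fall back to the Ensembl ID itself when no gene name is known.
        totals[gene_names.get(gene_id, gene_id)] += value
    return dict(totals)


collapsed = sum_by_gene_name(counts, gene_names)
print(collapsed)
```

Whether this sum is meaningful depends on the answer to the question above: it is only safe if a multimapped read contributes a total of one count across the IDs being merged.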
Thanks, GenoMax. I have read those posts, but I was under the impression that they were referring to the quasi-mapping mode. Or have I misunderstood, and in fact both the quasi-mapping mode and the alignment mode use the same EM model for quantification, the only difference being a pseudoalignment input versus an actual alignment input?
Rob confirmed that salmon accounts for multimapping in both modes.
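For readers who want intuition for what "accounting for multimapping" with an EM model means, here is a toy sketch in Python. It is not salmon's code: it ignores transcript lengths, alignment scores, and bias models, and toy_em and the read counts are invented for illustration. It only shows the general idea that each multimapped read is split fractionally across its compatible transcripts in proportion to the current abundance estimates, so every read contributes a total of one count.

```python
def toy_em(read_compat, n_transcripts, n_iter=200):
    """Toy EM: fractionally assign reads to the transcripts they are compatible with.

    read_compat: one tuple of transcript indices per read.
    Returns the expected read count per transcript.
    """
    n_reads = len(read_compat)
    # Start from uniform relative abundances.
    abundance = [1.0 / n_transcripts] * n_transcripts
    expected = [0.0] * n_transcripts
    for _ in range(n_iter):
        # E-step: split each read across its compatible transcripts.
        expected = [0.0] * n_transcripts
        for compat in read_compat:
            total = sum(abundance[t] for t in compat)
            for t in compat:
                expected[t] += abundance[t] / total
        # M-step: re-estimate abundances from the expected counts.
        abundance = [c / n_reads for c in expected]
    return expected


# Made-up example: 8 reads unique to transcript 0, 2 unique to transcript 1,
# and 10 reads that map equally well to both.
reads = [(0,)] * 8 + [(1,)] * 2 + [(0, 1)] * 10
est = toy_em(reads, n_transcripts=2)
print([round(x, 3) for x in est])
```

In this example the 10 ambiguous reads end up split roughly 8:2, following the evidence from the uniquely mapped reads, and the estimated counts still sum to the 20 input reads.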
Thank you very much for the answer and the patience!