Hi guys,
I've already posted a question about Salmon earlier, but this one is totally different.
I used two different mappers, Hisat2 and Salmon, to map my reads. I got good overall mapping rates with both of them.
I used HTSeq-count to quantify the reads mapped with Hisat2, but Salmon's quant.sf file gave me different results. On the one hand, some counts are quite similar between the HTSeq-count results and quant.sf; on the other hand, there can be a 40-fold difference between the Salmon and HTSeq-count values for the same exon. Moreover, HTSeq-count can report 0 reads for an exon while Salmon reports 15 reads for the same one. I really don't understand these results. I found some papers discussing this, but nothing that gave me a good answer.
In addition, here are my results (just the first 10 lines):
HTSeq-count:
AT1G01010:exon:1     1
AT1G01010:exon:2     2
AT1G01010:exon:3     0
AT1G01010:exon:4     2
AT1G01010:exon:5     2
AT1G01010:exon:6     3
AT1G01020:exon:1     0
AT1G01020:exon:10    0
AT1G01020:exon:11    0
AT1G01020:exon:12    1
Salmon quant.sf (I've deleted the 3 middle columns of the file to make it easier to read):
AT1G01010.1    34
AT1G01020.2    43.6719
AT1G01020.6    13.3601
AT1G01020.1    54.2279
AT1G01020.4    0
AT1G01020.5    12.2951
AT1G01020.3    21.4449
AT1G01030.2    8.79053e-05
AT1G01030.1    19.9999
Is it due to Salmon's expectation-maximization algorithm that some features from HTSeq-count (for example AT1G01010:exon:4) are not found in the Salmon file?
Best, Vincent
Salmon counts multi-mapping reads, whereas HTSeq discards them. Most likely this is the cause of the discrepancy.
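To make the difference concrete, here is a minimal toy sketch in Python. It is not Salmon's or HTSeq's actual code: the reads, the transcript names `T1`/`T2`, and both functions are made up for illustration, and the EM step ignores transcript length and bias corrections. It only shows why discarding shared reads and splitting them probabilistically give different numbers, and why the Salmon values are fractional.

```python
# Toy illustration only: hypothetical reads and transcripts, not real tool code.
# Each read is represented by the set of transcripts it is compatible with.
reads = [
    {"T1"},          # unique to T1
    {"T1"},          # unique to T1
    {"T1", "T2"},    # multi-mapping: shared by both isoforms
    {"T1", "T2"},    # multi-mapping: shared by both isoforms
    {"T2"},          # unique to T2
]
transcripts = ["T1", "T2"]


def unique_only_counts(reads, transcripts):
    """Count only reads compatible with exactly one transcript."""
    counts = {t: 0 for t in transcripts}
    for compatible in reads:
        if len(compatible) == 1:
            (t,) = compatible
            counts[t] += 1
    return counts


def em_counts(reads, transcripts, n_iter=50):
    """Simplified EM: split every read across its compatible transcripts in
    proportion to the current abundance estimates, then re-estimate the
    abundances from the resulting fractional counts."""
    abundance = {t: 1.0 / len(transcripts) for t in transcripts}
    counts = {t: 0.0 for t in transcripts}
    for _ in range(n_iter):
        counts = {t: 0.0 for t in transcripts}
        for compatible in reads:
            total = sum(abundance[t] for t in compatible)
            for t in compatible:
                counts[t] += abundance[t] / total
        abundance = {t: counts[t] / len(reads) for t in transcripts}
    return counts


print(unique_only_counts(reads, transcripts))
print(em_counts(reads, transcripts))
```

In this toy case the unique-only strategy keeps 3 of the 5 reads, while the EM-style allocation assigns all 5 and produces non-integer counts.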
If you are mapping to the transcriptome, HTSeq-count is not appropriate, as it will discard too many reads that map to different isoforms of the same gene. HTSeq should be used to count reads mapped to the genome - it will still discard multi-mapped reads, but you won't have multi-mappers due to isoforms.
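Note also that the two excerpts in the question are at different levels: HTSeq-count reports per exon, Salmon per transcript (isoform). A rough way to put them side by side is to sum both to the gene level. The sketch below does this with the values copied from the question; the helper names `gene_of` and `sum_by_gene` are hypothetical, and the ID parsing assumes TAIR-style identifiers such as `AT1G01010:exon:1` and `AT1G01010.1`. Summing exon counts is only an approximation of a proper gene-level count, and only AT1G01010 appears to be fully listed in the HTSeq excerpt.

```python
from collections import defaultdict

# Values copied from the excerpts posted in the question.
htseq_exon_counts = {
    "AT1G01010:exon:1": 1, "AT1G01010:exon:2": 2, "AT1G01010:exon:3": 0,
    "AT1G01010:exon:4": 2, "AT1G01010:exon:5": 2, "AT1G01010:exon:6": 3,
    "AT1G01020:exon:1": 0, "AT1G01020:exon:10": 0, "AT1G01020:exon:11": 0,
    "AT1G01020:exon:12": 1,
}
salmon_num_reads = {
    "AT1G01010.1": 34, "AT1G01020.2": 43.6719, "AT1G01020.6": 13.3601,
    "AT1G01020.1": 54.2279, "AT1G01020.4": 0, "AT1G01020.5": 12.2951,
    "AT1G01020.3": 21.4449, "AT1G01030.2": 8.79053e-05, "AT1G01030.1": 19.9999,
}


def gene_of(feature_id):
    """Strip a ':exon:N' or '.N' suffix (assumes TAIR-style identifiers)."""
    return feature_id.split(":")[0].split(".")[0]


def sum_by_gene(counts):
    """Sum feature-level values (exons or transcripts) per gene."""
    totals = defaultdict(float)
    for feature_id, value in counts.items():
        totals[gene_of(feature_id)] += value
    return dict(totals)


htseq_by_gene = sum_by_gene(htseq_exon_counts)
salmon_by_gene = sum_by_gene(salmon_num_reads)

# Print the genes present in both excerpts side by side.
for gene in sorted(set(htseq_by_gene) & set(salmon_by_gene)):
    print(gene, htseq_by_gene[gene], round(salmon_by_gene[gene], 2))
```

For real data, a dedicated importer that uses a transcript-to-gene map from the annotation is the safer choice; the string splitting here is just for illustration.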
I mapped my reads to the genome with Hisat2.
Does featureCounts discard multi-mapping reads?
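By default featureCounts discards multi-mapping reads. It has three parameters that alter how such reads are handled: -M (count multi-mapping reads), -O (assign reads that overlap several features to all of them) and --fraction (used together with -M and/or -O to assign fractional counts instead of a full count per alignment).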
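As a small sketch of how these flags combine on the command line, the Python snippet below only builds and prints a featureCounts call; it does not run it. The function `build_featurecounts_cmd` and the file names `sample.bam`, `annotation.gtf` and `counts.txt` are placeholders I made up; `-a` (annotation file) and `-o` (output file) are the standard featureCounts options.

```python
import shlex


def build_featurecounts_cmd(bam, annotation, output,
                            multi_mapping=False, multi_overlap=False,
                            fraction=False):
    """Assemble a featureCounts argument list (hypothetical helper)."""
    cmd = ["featureCounts", "-a", annotation, "-o", output]
    if multi_mapping:
        cmd.append("-M")          # also count multi-mapping reads
    if multi_overlap:
        cmd.append("-O")          # count reads overlapping several features
    if fraction:
        cmd.append("--fraction")  # fractional counts; needs -M and/or -O
    cmd.append(bam)
    return cmd


# Placeholder file names, for illustration only.
cmd = build_featurecounts_cmd("sample.bam", "annotation.gtf", "counts.txt",
                              multi_mapping=True, fraction=True)
print(shlex.join(cmd))
```

The printed line can be pasted into a shell, or the list can be passed to `subprocess.run` once the real paths are filled in.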