Hi,
I have multiple paired-end bam files from rnaseq experiments (hisat2 aligned). I extracted properly-paired reads, sorted, indexed and ran featureCounts using the following command (as per http://bioinf.wehi.edu.au/featureCounts/):
featureCounts -p -t exon -g gene_id -a species.gtf -o bam.featureCounts *.bam
While it is still running, I can see the following in the log file:
|| Threads : 1 ||
|| Level : meta-feature level ||
|| Paired-end : yes ||
|| Strand specific : no ||
|| Multimapping reads : not counted ||
|| Multi-overlapping reads : not counted ||
|| Min overlapping bases : 1 ||
|| ||
|| Chimeric reads : counted ||
|| Both ends mapped : not required ||
I am confused regarding whether I should count multimapping reads
or not and how it would effect the differential gene expression analysis? I have gone through A: Dealing with multimapping reads in featureCounts post, but would like to know more opinion.
Also the userguide says: "Due to the mapping ambiguity, it is recommended that multi-mapping reads should be excluded from read counting (default behavior of featureCounts program) to produce as accurate counts as possible."
Thanks!
A read from a gene can map to both the parent gene and as well as to a similar pseudogene. Removing ambiguous reads will under represent the parent gene even though it was expressed and counting them will over-represent the pseudogene. What do we do in such a situation?
We still ignore multimappers if your tool needs integers.