I noticed that the gene expression count from HT-Seq / featureCount followed by STAR is much more higher than what RSEM got. It seems RSEM assign more much alignments as non-primary.
I did some digging and got vague impression that this is because RSEM consider more than just counting the reads aligned to gene region like HT-Seq / featureCount does.
I never got a clear explanation what causes this difference. Can anyone share some comments. Really appreciate.
You should be observing that RSEM get somewhat higher counts than the other methods, since it doesn't ignore multimappers, whereas the other methods do. If you're seeing anything other than that then you did something wrong.
I went back and checked the aligners' output. For some reasons, RSEM generates huge amount of non-primary alignment (almost 50%) while STAR only got a little non-primary alignment (<10%). This is the reason why ht-seq followed by RSEM got really low expression count.
I know that RSEM uses STAR internally as well. I can only imagine that the difference is because RSEM used different parameters during STAR alignment. Although I could not figure out which parameters contributed to the difference......
I used basically default parameter:
While RSEM uses:
I have no idea what you mean by "ht-seq followed by RSEM", RSEM should follow STAR and never be used in conjunction with htseq-count. RSEM should produce a lot of multimappers, it's using STAR (or another aligner) to align to the transcriptome.