Question

STAR-HT-Seq/featureCount got much more gene expression count than RSEM did

0

Entering edit mode

6.6 years ago

CY ▴ 750

I noticed that the gene expression count from HT-Seq / featureCount followed by STAR is much more higher than what RSEM got. It seems RSEM assign more much alignments as non-primary.

I did some digging and got vague impression that this is because RSEM consider more than just counting the reads aligned to gene region like HT-Seq / featureCount does.

I never got a clear explanation what causes this difference. Can anyone share some comments. Really appreciate.

RNA-Seq STAR RSEM HT-Seq gene expression • 4.3k views

ADD COMMENT • link 6.6 years ago by CY ▴ 750

1

Entering edit mode

You should be observing that RSEM get somewhat higher counts than the other methods, since it doesn't ignore multimappers, whereas the other methods do. If you're seeing anything other than that then you did something wrong.

ADD REPLY • link 6.6 years ago by Devon Ryan 105k

0

Entering edit mode

I went back and checked the aligners' output. For some reasons, RSEM generates huge amount of non-primary alignment (almost 50%) while STAR only got a little non-primary alignment (<10%). This is the reason why ht-seq followed by RSEM got really low expression count.

I know that RSEM uses STAR internally as well. I can only imagine that the difference is because RSEM used different parameters during STAR alignment. Although I could not figure out which parameters contributed to the difference......

I used basically default parameter:

STAR --limitSjdbInsertNsj 1800000 --genomeDir ${STARGenomeFolder} --readFilesCommand zcat --readFilesIn ${File1} ${File2} --sjdbGTFfile ${GTF} --outFileNamePrefix ${name}_ --outSAMtype BAM SortedByCoordinate --outSAMstrandField intronMotifi --outReadsUnmapped Fastx

While RSEM uses:

STAR --genomeDir $STARgenomeDir --readFilesIn $read1 $read2 --outSAMunmapped Within --outFilterType BySJout --outSAMattributes NH HI AS NM MD --outFilterMultimapNmax 20 --outFilterMismatchNmax 999 --outFilterMismatchNoverReadLmax 0.04 --alignIntronMin 20 --alignIntronMax 1000000 --alignMatesGapMax 1000000 --alignSJoverhangMin 8 --alignSJDBoverhangMin 1 --sjdbScore 1 --readFilesCommand zcat

ADD REPLY • link 6.6 years ago by CY ▴ 750

1

Entering edit mode

I have no idea what you mean by "ht-seq followed by RSEM", RSEM should follow STAR and never be used in conjunction with htseq-count. RSEM should produce a lot of multimappers, it's using STAR (or another aligner) to align to the transcriptome.

ADD REPLY • link 6.6 years ago by Devon Ryan 105k

score 1 · Answer 1 · 2018-05-23

1

Entering edit mode

6.6 years ago

CY ▴ 750

I used htseq-count after RSEM just to compare STAR-htseq vs RSEM-htseq. Now I see non-trivial difference in htseq results. I think it is partially because RSEM generates large amount of multimappers which will not be counted by htseq.

Also, I think you are right about never using RSEM-htseq together.

ADD COMMENT • link 6.6 years ago by CY ▴ 750

1

Entering edit mode

htseq-count should never be used after RSEM, you should use the output of RSEM directly.

ADD REPLY • link 6.6 years ago by Devon Ryan 105k

0

Entering edit mode

I'm agreeing with Devon Ryan here but he could be using the bam output from RSEM and (re-)use that as a bam input file for htseq-count, no?

ADD REPLY • link 6.6 years ago by lieven.sterck 15k

0

Entering edit mode

That was exactly what I did. I know RSEM + HTSeq are not used together usually. I just did this once trying to make a comparison of the expression count (RSEM vs STAR)

Then I found out STAR + HTSeq got much high expression count than RSEM + HTSeq. Digging deeper I found RSEM generate much more non-primary alignments (are they multimappers mentioned by @Devon Ryan?) which are not counted by HTSeq.

Besides, I guess I should not expect so many non-primary alignments because I was looking at gene.bam, not isoform.bam

ADD REPLY • link 6.6 years ago by CY ▴ 750