HTSeq-count vs. FeatureCounts: Significant Difference in Results
1
0
Entering edit mode
8 weeks ago
Dora ▴ 10
htseq-count -s reverse -f bam A.bam human.gtf > htseq.A.count.txt &

./featureCounts -p --countReadPairs -a human.gtf -o A.featurecount.txt -T 4 -s 2 A.bam &

I am analyzing RNA-Seq data using HTSeq-count and FeatureCounts, and I've found that the fragment quantification results from FeatureCounts are about half of those from HTSeq-count. This observation doesn't align with the paper , which suggests that HTSeq-count results are generally slightly lower than those from FeatureCounts. I am confused.

Maybe someone know the reason or have ways to trace the problem?


  • htseq-count result (show a few randomly)

     ENSG00000284681 274
     ENSG00000284634 135
     ENSG00000284526 139
     ENSG00000284431 8
     ENSG00000198804 608623
    
  • featureCounts result

      ENSG00000284681 172
      ENSG00000284634 97
      ENSG00000284526 82
      ENSG00000284431 9
      ENSG00000198804 307752
    

sorry I don't know how to show the table properly at here.

FeatureCounts HTseq-count • 755 views
ADD COMMENT
1
Entering edit mode

If you are not sure about the best method to count reads or if you want to compare methods, I would pick some gene IDs with small counts differences between methods on a number of countable reads (like ENSG00000284431, one can count 10 reads by eye) and check your bam file on IGV for this specific gene ID to get an idea of how the counting could have been done. Maybe you will see that 1 read is overlapping 2 genes that could be discarded in one method but not the other.

ADD REPLY
0
Entering edit mode

Thank you for your help. I'll definitely try the method you suggested. I was just a bit puzzled because I had initially thought that htseq-count defaults to counting reads rather than read pairs (fragments). However, I found in the Q&A section of htseq-count that it states that for paired-end data, it actually defaults to counting fragments.

https://htseq.readthedocs.io/en/master/htseqcount.html

For paired-end data, does htseq-count count reads or read pairs?
Read pairs. The script is designed to count “units of evidence” for gene expression. If both mates map to the same gene, this still only shows that one cDNA fragment originated from that gene. Hence, it should be counted only once.

I choose one of the gene where featureCounts counts as 2 but htseq-count counts as 0. But... I don't know how to read the number of fragments from the screen .... (oh no.....

igv

ADD REPLY
0
Entering edit mode

These tools will account for fragments at the moment you define paired-end reads as input, as the 2 reads of the same fragments are sequencing the same molecule.

Which gene are we talking about ? You have to check where the fragments start and end to understand why one method count them in or out

ADD REPLY
1
Entering edit mode
8 weeks ago

You should read the manual for your tools so that you understand how they work.

You have -s reverse in the htseq-count call, I don't see a similar param for featureCounts - my guess is the default is unstranded. That would probably explain the discrepancies...

ADD COMMENT
0
Entering edit mode

thank you for your reply, for featureCounts, I use -s 2 to specify the strandness, which is equivalent to the -s reverse in htseq-count

# Strandness
 -s <int or string>  Perform strand-specific read counting. A single integer
                      value (applied to all input files) or a string of comma-
                      separated values (applied to each corresponding input
                      file) should be provided. Possible values include:
                      0 (unstranded), 1 (stranded) and 2 (reversely stranded).
                      Default value is 0 (ie. unstranded read counting carried
                      out for all input files).
ADD REPLY
1
Entering edit mode

Given you are using paired end reads you should activate the correct ordering using -r in ht-seq, usually BAMs are sorted by pos which is not the default option.

ADD REPLY
0
Entering edit mode

omg, you are amazing! Yes, that's the problem!

The bam file was in default coordinate sort since I use samtools sort default parameters.

Once I add the -r pos parameter, the results back to normal and also align to the paper mentioned in the question.

Even though I am not sure whether the technical terms 'sort by coordinate' and 'sort by position' are the same, the -r pos did solve the problem.

Thank you very much!!!

ADD REPLY

Login before adding your answer.

Traffic: 2070 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6