Entering edit mode
8.4 years ago
lolec
▴
20
Hello everyone!
I am using htseq-count (version 0.6.1p1) on paired-end RNA seq data. I did the analysis sorting by name and sorting by position, however I obtained different results.
By name:
samtools sort -n accepted_hits.bam -o accepted_nsorted.bam
python -m HTSeq.scripts.count -f bam -r name -s reverse accepted_nsorted.bam GRCh37.75.gtf > htseq_accepted_nsorted.txt
tail htseq_accepted_nsorted.txt
ENSG00000273487 2
ENSG00000273488 2
ENSG00000273489 3
ENSG00000273492 5
ENSG00000273493 1
__no_feature 1372372
__ambiguous 820117
__too_low_aQual 0
__not_aligned 0
__alignment_not_unique 1836341
By position:
samtools sort accepted_hits.bam -o accepted_psorted.bam
python -m HTSeq.scripts.count -f bam -r pos -s reverse accepted_psorted.bam GRCh37.75.gtf > htseq_accepted_psorted.txt
Message:
Warning: Mate pairing was ambiguous for 23096 records; mate key for first such record: ('HWI-1KL149:106:C8CUGACXX:5:2102:5222:87333', 'first', '1', 146410, '1', 146469, 9416).
29201104 SAM alignment pairs processed.
tail htseq_accepted_psorted.txt
ENSG00000273487 2
ENSG00000273488 2
ENSG00000273489 3
ENSG00000273492 5
ENSG00000273493 1
__no_feature 1467721
__ambiguous 866870
__too_low_aQual 0
__not_aligned 0
__alignment_not_unique 2028312
What is wrong in this analysis ? How can we explain the different counts obtained ? Is there a solution to this problem ?
Looks weird. Probably related to the warning you received.
See htseq-count manual:
Makes me guess that the names of your read (pairs) are not unique? Just something I would check. How did you obtain the bam?
There is no 'warning' when sorted by name. Then I don't thing the problem is coming from reads' name. The bam was obtained with TOPHAT2.
I wouldn't expect a warning when sorting by name, since in that case htseq-count expects the both pairs to be adjacent. If sorted by position, htseq-count will search for the other read of the pair (to perform the mate pairing). And this gave a warning as you can see. Maybe it's the best to contact the author of the tool.
Okay, I will do that. Thank you for your help.
Did you manage to find any solution? (thanks in advance)
Unfortunately no. It might be inherent to how HTSEQ-count works.