Optimise htseq-count performance by choosing proper samtools sort options
4.9 years ago
2822462298 ▴ 120

Hi all,

I am currently using samtools to sort my BAM files by position (the default) and then htseq-count to obtain read counts. Initially, I got a massive number of 'Mate records missing' warnings. I then realized that htseq-count assumes the files are sorted by name, so I added the '-r pos' option and re-ran it. I now get fewer 'Mate records missing' warnings, but they are still there. So my questions are: 1. Is there a way to eliminate the warnings completely? 2. Which of the following pipelines is better?

  1. samtools sort by name + htseq-count without -r pos
  2. samtools sort by position + htseq-count with -r pos
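
For reference, the two pipelines above can be written out as command lines. This is a minimal sketch that only builds and prints the commands; it does not run them. The file names (`sample.bam`, `genes.gtf`, `name_sorted.bam`, `pos_sorted.bam`) and the helper `pipeline_commands` are hypothetical. It passes `-r name` explicitly for pipeline 1, which should be equivalent to omitting `-r`, since htseq-count assumes name order by default.

```python
import shlex


def pipeline_commands(bam, gtf, order):
    """Return the (sort, count) command lines for one sort order.

    order is "name" or "pos"; all file names are placeholders.
    """
    if order not in ("name", "pos"):
        raise ValueError("order must be 'name' or 'pos'")
    sorted_bam = f"{order}_sorted.bam"

    sort_cmd = ["samtools", "sort"]
    if order == "name":
        sort_cmd.append("-n")  # sort by read name instead of coordinate
    sort_cmd += ["-o", sorted_bam, bam]

    # -f bam: input format; -r: how the input file is ordered
    count_cmd = ["htseq-count", "-f", "bam", "-r", order, sorted_bam, gtf]
    return sort_cmd, count_cmd


if __name__ == "__main__":
    for order in ("name", "pos"):
        for cmd in pipeline_commands("sample.bam", "genes.gtf", order):
            print(shlex.join(cmd))
```

The only difference between the two pipelines is the `-n` flag on `samtools sort` and the matching value given to `-r`; the two must agree, otherwise htseq-count cannot pair mates reliably.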

I read the developer's posts at https://github.com/simon-anders/htseq/issues/37, but I still couldn't figure out how to improve the process properly.

htseq RNA-Seq RNA rna-seq samtools • 2.2k views

As @Devon suggested in an earlier question, you should use featureCounts instead. It is much faster, can sort files automatically as needed, and creates an analysis-ready count matrix from the set of BAM files you provide, which makes downstream import easy.
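
A minimal sketch of what such a featureCounts call might look like, assuming paired-end data and a GTF annotation. The file names and the helper `featurecounts_command` are hypothetical, and the code only builds the command line. Note that in recent Subread releases `-p` only declares the input as paired-end, and `--countReadPairs` is additionally needed to count fragments rather than reads; check the behaviour of your installed version.

```python
import shlex


def featurecounts_command(bams, gtf, out="counts.txt", paired=True,
                          threads=4, count_read_pairs=False):
    """Build a featureCounts command for one or more BAM files."""
    cmd = ["featureCounts", "-T", str(threads), "-a", gtf, "-o", out]
    if paired:
        cmd.append("-p")  # input contains paired-end reads
        if count_read_pairs:
            # Newer Subread versions: count fragments instead of reads
            cmd.append("--countReadPairs")
    cmd += list(bams)  # all BAM files go into one count matrix
    return cmd


if __name__ == "__main__":
    print(shlex.join(featurecounts_command(["a.bam", "b.bam"], "genes.gtf")))
```

Passing all BAM files in a single call is what yields one table with a count column per sample.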


Thanks! In that case, I do not need to sort the BAM file with samtools, right?


The BAM file still needs to be sorted, and AFAIK there are slightly different requirements for paired-end (fragment) and single-end (read) quantification. Basically, featureCounts will try to fix the mate pairs if it detects inconsistencies, but that step is much slower than the actual read counting, so it's best to make sure your files are sorted correctly. Samtools has options to fix unpaired mate reads or remove unpaired reads altogether.
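
One possible reading of the samtools options mentioned above, as a hedged sketch: `samtools fixmate` (which expects name-sorted input) fills in mate information, and `samtools view -f 1 -F 12` keeps only paired reads where both the read and its mate are mapped. The file names and helper names are hypothetical, and the code only builds the commands. `keeps_read` mirrors the `-f 1 -F 12` filter on a SAM flag so the logic can be checked without a BAM file.

```python
import shlex

PAIRED = 0x1         # read is part of a pair
UNMAPPED = 0x4       # read itself is unmapped
MATE_UNMAPPED = 0x8  # mate is unmapped


def cleanup_commands(bam):
    """Name-sort, fix mate fields, then drop reads without a mapped mate."""
    return [
        ["samtools", "sort", "-n", "-o", "name_sorted.bam", bam],
        ["samtools", "fixmate", "name_sorted.bam", "fixmate.bam"],
        # -f 1: require paired; -F 12: exclude read-unmapped or mate-unmapped
        ["samtools", "view", "-b", "-f", "1", "-F", "12",
         "-o", "paired_only.bam", "fixmate.bam"],
    ]


def keeps_read(flag):
    """Return True if a read with this SAM flag passes -f 1 -F 12."""
    required_ok = (flag & PAIRED) == PAIRED
    excluded_ok = (flag & (UNMAPPED | MATE_UNMAPPED)) == 0
    return required_ok and excluded_ok


if __name__ == "__main__":
    for cmd in cleanup_commands("sample.bam"):
        print(shlex.join(cmd))
```

The output of this sketch is name-sorted, so it would need to be re-sorted by position before being used with any tool or option that expects coordinate order.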


Thanks! Actually, I have tried name-sorted, position-sorted, and unsorted BAM files with featureCounts. The outputs were pretty much the same, with minor differences.
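
To pin down where such minor differences occur, the count tables from two runs can be compared gene by gene. The sketch below assumes the usual featureCounts table layout (a leading `#` comment line, a header row, then tab-separated rows with the gene ID in the first column and the first sample's counts in the seventh); the helper names and the example data are made up for illustration.

```python
import csv
import io


def read_counts(text, sample_col=6):
    """Parse featureCounts-style table text into {gene_id: count}."""
    lines = (line for line in io.StringIO(text) if not line.startswith("#"))
    rows = csv.reader(lines, delimiter="\t")
    next(rows)  # skip the header row (Geneid, Chr, Start, ...)
    return {row[0]: int(row[sample_col]) for row in rows if row}


def differing_genes(a, b):
    """Return {gene_id: (count_a, count_b)} for genes whose counts differ."""
    genes = sorted(set(a) | set(b))
    return {g: (a.get(g, 0), b.get(g, 0))
            for g in genes if a.get(g, 0) != b.get(g, 0)}


if __name__ == "__main__":
    header = "# made-up example\nGeneid\tChr\tStart\tEnd\tStrand\tLength\tsample.bam\n"
    name_run = header + "geneA\tchr1\t1\t100\t+\t100\t10\ngeneB\tchr1\t200\t300\t+\t101\t5\n"
    pos_run = header + "geneA\tchr1\t1\t100\t+\t100\t10\ngeneB\tchr1\t200\t300\t+\t101\t4\n"
    print(differing_genes(read_counts(name_run), read_counts(pos_run)))
```

Listing the affected genes makes it easier to judge whether the differences come from a handful of reads with missing mates or from something more systematic.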
