Is it necessary to name-sort single-end reads before htseq-count?
8.7 years ago
natsterbug ▴ 10

I have single-end RNA-seq data that I aligned with TopHat2, and I would now like to use htseq-count to generate counts for edgeR. I have read that the accepted_hits.bam file must be sorted by name when using paired-end reads (http://www-huber.embl.de/users/anders/HTSeq/doc/count.html). Is that also necessary for single-end reads? Further, since the default for the order option is name, if I do not need to sort single-end reads by name, is it necessary to set the pos option instead?

Currently, I am using the command below and receive the following counts:

htseq-count -m intersection-nonempty --format=bam tophat_Kalkaska_control/tophat_K18C/accepted_hits.bam PGSC_DM_V403_genes_strand_filtered.gtf > htseq_counts_control/K18C_counts.txt

htseq RNAseq sam bam
8.7 years ago

If your data is single-end, set the order option to pos (--order=pos), since your data is coordinate-sorted. Just to be on the safe side.


Will do. Thank you so much.

8.7 years ago
h.mon 35k

It should not make any difference for single-end data, but you can easily test: just run htseq-count with both options, one at a time.

The --order option (-r), which takes either name or pos, tells htseq-count how paired-end reads are sorted in the SAM/BAM file, so that it can choose internally how to find read pairs.
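The test suggested above can be sketched roughly as follows: run htseq-count once per value of the `--order` option (`-r`, which accepts `name` or `pos`) and compare the two count tables. The input paths are the ones from the question. The output file names `counts_name.txt` and `counts_pos.txt` and the helper `compare_counts` are hypothetical, introduced here only for illustration. This is a minimal sketch, not a definitive recipe.

```shell
# Input files taken from the question; adjust to your own layout.
bam="tophat_Kalkaska_control/tophat_K18C/accepted_hits.bam"
gtf="PGSC_DM_V403_genes_strand_filtered.gtf"

# compare_counts: hypothetical helper, not part of HTSeq.
# Prints "identical" if the two count tables match byte for byte,
# otherwise prints "different".
compare_counts() {
    if diff -q "$1" "$2" >/dev/null; then
        echo "identical"
    else
        echo "different"
    fi
}

# Only run the comparison if htseq-count and both inputs are available.
if command -v htseq-count >/dev/null 2>&1 && [ -f "$bam" ] && [ -f "$gtf" ]; then
    # Same settings as in the question, differing only in --order.
    htseq-count -m intersection-nonempty --format=bam --order=name \
        "$bam" "$gtf" > counts_name.txt
    htseq-count -m intersection-nonempty --format=bam --order=pos \
        "$bam" "$gtf" > counts_pos.txt
    compare_counts counts_name.txt counts_pos.txt
fi
```

If the helper reports "different", running `diff counts_name.txt counts_pos.txt` directly should show which genes or summary lines changed.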
