Hi there, as per thread title.
If I am running HTSeq-count on paired-end mapped BAM files that are unsorted, and I leave the -s option at its default of yes, is that advisable?
Paired-end BAM files need to be sorted either by read name or by alignment position before running htseq-count. You can use samtools sort to sort the file, then use the -r option of htseq-count to specify whether the BAM file is sorted by read name (name) or by alignment position (pos).
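As a rough illustration of the two steps above, here is a minimal Python sketch that only builds the two command lines. The helper name build_commands and the file names sample.bam and genes.gtf are made up for this example, and the sketch assumes a samtools 1.x style "samtools sort -o" call, so check it against the versions you have installed.

```python
def build_commands(bam, gtf, order="name", stranded="yes"):
    """Return (sort_cmd, count_cmd) as argument lists.

    order    -- "name" or "pos"; must match the value given to htseq-count -r
    stranded -- value for htseq-count -s ("yes", "no" or "reverse")
    """
    if order not in ("name", "pos"):
        raise ValueError("order must be 'name' or 'pos'")

    # Hypothetical output name: sample.bam -> sample.name_sorted.bam
    stem = bam[:-4] if bam.endswith(".bam") else bam
    sorted_bam = f"{stem}.{order}_sorted.bam"

    # samtools sort orders by position by default; -n sorts by read name.
    sort_cmd = ["samtools", "sort"]
    if order == "name":
        sort_cmd.append("-n")
    sort_cmd += ["-o", sorted_bam, bam]

    # -f bam: input format, -r: sort order of the input, -s: strandedness.
    count_cmd = ["htseq-count", "-f", "bam", "-r", order,
                 "-s", stranded, sorted_bam, gtf]
    return sort_cmd, count_cmd


if __name__ == "__main__":
    for cmd in build_commands("sample.bam", "genes.gtf"):
        print(" ".join(cmd))
```

To actually run the commands, each list could be passed to subprocess.run(cmd, check=True). Note that htseq-count writes its counts to standard output, so you would normally redirect that to a file.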
The -s option of htseq-count is completely unrelated to the issue of sorting. It specifies whether the data come from a stranded (strand-specific) library or not, which depends on how the sequencing library was prepared.
-s <yes/no/reverse>, --stranded=<yes/no/reverse>
    whether the data is from a strand-specific assay (default: yes)
    For stranded=no, a read is considered overlapping with a feature regardless of whether it is mapped to the same or the opposite strand as the feature. For stranded=yes and single-end reads, the read has to be mapped to the same strand as the feature. For paired-end reads, the first read has to be on the same strand and the second read on the opposite strand. For stranded=reverse, these rules are reversed.
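To make the rule quoted above concrete, here is a small Python sketch of it. This is only my reading of the documentation text, not htseq-count's actual code, and the function name strand_compatible and its arguments are invented for the example.

```python
def strand_compatible(read_strand, feature_strand, stranded, is_second_mate=False):
    """Return True if a read may be counted for a feature under the -s rule.

    read_strand, feature_strand -- "+" or "-"
    stranded                    -- "yes", "no" or "reverse"
    is_second_mate              -- True for the second read of a pair
    """
    if stranded == "no":
        # Strand is ignored entirely.
        return True
    if stranded not in ("yes", "reverse"):
        raise ValueError("stranded must be 'yes', 'no' or 'reverse'")

    same_strand = read_strand == feature_strand

    # stranded=yes: single-end reads and first mates must match the feature
    # strand; second mates must be on the opposite strand.
    expect_same = not is_second_mate

    # stranded=reverse flips both expectations.
    if stranded == "reverse":
        expect_same = not expect_same

    return same_strand == expect_same


if __name__ == "__main__":
    # First mate on "+", second mate on "-", feature on "+", stranded=yes
    print(strand_compatible("+", "+", "yes"))
    print(strand_compatible("-", "+", "yes", is_second_mate=True))
```

The practical upshot is that with an unstranded library and the default -s yes, reads on the "wrong" strand would not be counted under this rule, so the setting should match the library preparation.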
So if I do not sort the BAM file and just plop it into htseq-count, will the output be inaccurate?
Never mind, figured it out.
Thanks a lot!