Hi,
This is a question for users of htseq-count; a program for counting reads into gene annotation.
I am using version 0.6.1p1. There is an option --samout
that looks like a simple way to identify actually which reads are used in the counting. The manual definition is:
write out all SAM alignment records into an output SAM file called SAMOUT, annotating each line with its feature assignment (as an optional field with tag 'XF')
However, I am only getting a file with the XF tag, e.g.:
XF:Z:rab-11.1
XF:Z:rab-11.1
XF:Z:col-81
XF:Z:col-81
Is anyone familiar with this option and able to comment? Perhaps the idea is to concatenate this to the existing SAM.
Thanks.
EDIT
I have confirmed that the input SAM file is the same length as the --samout
output. Therefore, is the rest of the SAM line missing intentionally?
EDIT 2 (command line used - file names changed for brevity)
htseq-count --mode=union --stranded=no --type=exon --idattr=gene_id --format=bam --order=name --samout=out.sam in.bam genes.gtf > out.txt
EDIT 3
Just found that this has already been covered in seqanswers http://seqanswers.com/forums/showthread.php?t=45600. It does appear to be a bug when trying to use a BAM file directly.
That's odd. Presumably it's a bug. What's the exact command that you're giving?
Please see "EDIT 2" in main. It has just occurred to me that it might not like input as
--format=BAM
.Hmm, possible. I suspect this is just a bug. I haven't used the --samout option with the most recent version, though I could do that real quick just to confirm this bug.
I can confirm this bug with one of my datasets using 0.6.1p1 (it also seems to have affected 0.6.1). Have you already notified Simon Anders?
I have contacted him, thanks for your help.