When using HTseq-count I get an warning at the beguining :
Warning: Read SRR1503280.2 claims to have an aligned mate which could not be found in an adjacent line.
and at the end of the sam file:
74100000 SAM alignment record pairs processed.
74200000 SAM alignment record pairs processed.
74300000 SAM alignment record pairs processed.
74400000 SAM alignment record pairs processed.
Error occured when processing SAM input (line 74468230 of file accepted_hits_filtsort_Controle.sam):
'pair_alignments' needs a sequence of paired-end alignments
[Exception type: ValueError, raised in __init__.py:603]
and my outputfiles were empty:
0 HTseq_Controle_output.txt
0 HTseq_Huntington_Output.txt
I use this code :
python ~/programs/HTSeq-0.6.1/scripts/htseq-count accepted_hits_filtsort_Controle.sam \
~/database/Mus_musculus.primary.gtf > HTseq_Controle_output.txt
I am very newbie but, I really try to look at all the other similar problems people already posted in the past and could not find a solution. Maybe it is trivial and I only need a tip. Could some one help me?
Have a look at that line (
awk '{if(NR==74468230) print $0;}' accepted_hits_filtsort_Controle.sam
), my guess is that it's truncated. Alternatively, perhaps it's marked as paired but is the last read in the file.Thanks for your help. I am still learning this basic commands. Now I learn 'awk'. That line I get with awk is :
That looks fine. What happens if you use the
-o some_file.sam
option with htseq-count? That should output the reads with annotation as to how each is treated. Then you can justtail some_file.sam
to confirm where it's failing and thengrep -A 1 -B 10
the read name in the original SAM file to get some context. In general, the error you saw suggests that you have a single-end read when htseq-count is expecting a paired-end read. While the read you posted is a singleton, I'd be surprised if that's sufficient to cause the error.The last read in the some_file.sam I've got was:
And using grep I've got:
What happens if you run
grep -A 3 -B 3 SRR1503281.46709388 accepted_hits_filtsort_Controle.sam
?I had to stop for a while because other sequences of other project but, I can not see anything wrong with grep result, can you?
So the first 5 reads were originally paired-end and the last 2 were single-end. I don't think that htseq-count supports mixing the two (you can have orphaned reads mixed in, but not pure single-end reads).
What script/program can I use to cut off this orphaned reads?