I get following error when trying to extract aligned reads
"for alnmt in bam_reader:
... print alnmt"
"Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "../HTSeq/__init__.py", line 832, in __iter__
yield SAM_Alignment.from_pysam_AlignedRead( pa, sf )
File "_HTSeq.pyx", line 1251, in HTSeq._HTSeq.SAM_Alignment.from_pysam_AlignedRead (src/_HTSeq.c:21907)
File "csamtools.pyx", line 800, in pysam.csamtools.Samfile.getrname (pysam/csamtools.c:8816)
ValueError: tid -1 out of range 0<=tid<25"
A tid of -1 would mean a * in the RNAME column (the 3rd column). I'd need to go through the code, but does your BAM file perhaps have a read with a * in the RNAME column that lacks a 0x4 bit set in the FLAG? I imagine that could cause this error.
Edit: BTW, you can test whether this is the case with something like samtools view foo.bam | awk '{if($3 == "*") { if(!and($2, 0x4)) { print $0; }}}', which should print all reads that might cause this error.
Hmm, apparently mawk is installed on ubuntu (perhaps that's what you're using, it's what I have in lab) and it lacks awk's bitwise operators (OS X must have gawk installed, since that awk command worked on my laptop last night!). I'll put together a quick script that should work regardless and post that as an answer.
Since the bitwise operators in awk that I used in my comment apparently don't exist on all system (that's good to know!), here's a simply python solution. You could do this with perl to, but I loath the language.
#!/usr/bin/env python
import csv
import sys
f = csv.reader(sys.stdin, dialect="excel-tab")
of = csv.writer(sys.stdout, dialect="excel-tab")
for line in f :
#Don't try to modify the header!!!
if(line[1][0] == "@") :
of.writerow(line)
continue
if(line[2] == "*") :
if((int(line[1]) & 4) != 4) :
line[1] = "%s" % (int(line[1]) | 4)
of.writerow(line)
If you saved that as clean.py, then you can probably just samtools view namesorted.bam | python clean.py | htseq-count.py options - annotation.gtf if you want to directly use htseq-count. You can also just try to make a cleaned BAM file with samtools view -h something.bam | python clean.py | samtools view -bS - > something.clean.bam (if you really need that to be faster, you can implement the whole thing in C with the samtools API, but that's likely overkill!).
Interesting. I too use STAR, but haven't seen this behavior. Have you reported this to the author yet? If not, please do so, this problem is going to bite a number of people!
A tid of -1 would mean a
*
in theRNAME
column (the 3rd column). I'd need to go through the code, but does your BAM file perhaps have a read with a * in the RNAME column that lacks a 0x4 bit set in the FLAG? I imagine that could cause this error.Edit: BTW, you can test whether this is the case with something like
samtools view foo.bam | awk '{if($3 == "*") { if(!and($2, 0x4)) { print $0; }}}'
, which should print all reads that might cause this error.Thanks. I wasnt able to make that awk statement work but I think you are right. Do you know any efficient way to correct this?
Hmm, apparently mawk is installed on ubuntu (perhaps that's what you're using, it's what I have in lab) and it lacks awk's bitwise operators (OS X must have gawk installed, since that awk command worked on my laptop last night!). I'll put together a quick script that should work regardless and post that as an answer.
You can use
samtools
's built in required filter-f
:samtools view -f 4 xyz.bam | awk '{if($3 == "*") { print $0; }}'