7.4 years ago
pupatel
I'm running a pysam-based program and it gives me the following error. A quick check showed that pysam enforces a length limit on query names (252 characters), and some of my sequence headers are longer than that. How can I get around this? Is there a way to change the limit in pysam, or a flag to ignore it while fetching reads from the SAM file?
> [E::sam_parse1] query name too long
> [W::sam_read1] parse error at line 5105
> Traceback (most recent call last):
> File "/opt/sqanti/sqanti.py", line 1662, in <module>
> main()
> File "/opt/sqanti/sqanti.py", line 371, in main
> run(args)
> File "/opt/sqanti/sqanti.py", line 1371, in run
> (indelsJunc, indelsTotal) = indels(corrSAM)
> File "/opt/sqanti/utilities/indels_annot.py", line 24, in indels
> for read in sam.fetch():
> File "pysam/calignmentfile.pyx", line 1855, in pysam.calignmentfile.IteratorRowAll.__next__ (pysam/calignmentfile.c:20651)
> IOError: truncated file
This is a problem with htslib 1.4; see this GitHub issue. It will probably truncate the BAM file, but shortening the query names that are longer than 252 characters, for example with awk, might help. What do you want to do with the BAM file in the downstream analysis? (I ask so that I can give a more useful answer.)
Thanks for replying. I'm using a program that calls the pysam module to extract splice junction information. Renaming the sequences with names longer than 252 characters might be the only easy fix, since some files other than the .sam also contain names longer than 252 characters. Shortening the sequence names from the start will be the only easy way.
Yes, then shortening the query names is your option. Another option could be to downgrade htslib and pysam.
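For the renaming route, here is a minimal Python sketch rather than a tested fix. It rewrites only the QNAME column of a SAM file as plain text, so it does not go through pysam at all. The 252-character limit is taken from the question; the function names (`shorten_name`, `shorten_sam_line`, `shorten_sam`) and the "prefix plus short hash" scheme are assumptions made for illustration, not anything pysam or htslib provides.

```python
import hashlib

MAX_QNAME = 252  # limit reported in the question; adjust if your htslib differs


def shorten_name(name, max_len=MAX_QNAME):
    """Return name unchanged if it fits, else a prefix plus a short hash.

    The hash suffix is an illustrative choice: it keeps long names that
    share the same prefix from collapsing into one identical short name.
    """
    if len(name) <= max_len:
        return name
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()[:8]
    return name[: max_len - len(digest) - 1] + "_" + digest


def shorten_sam_line(line, max_len=MAX_QNAME):
    """Shorten the QNAME (first tab-separated field) of one SAM line."""
    if line.startswith("@"):  # header lines are passed through untouched
        return line
    qname, sep, rest = line.partition("\t")
    return shorten_name(qname, max_len) + sep + rest


def shorten_sam(in_path, out_path, max_len=MAX_QNAME):
    """Copy a SAM file, shortening over-long query names on the way."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            dst.write(shorten_sam_line(line, max_len))
```

Because `shorten_name` is deterministic, the same function could be applied to the headers in the other files (for example the FASTA) so that names still match across files, assuming those files use the same identifiers.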