I am working on a project using command-line BLAT right now. I need to take the output of a BLAT run, in any of the supported formats, and convert it into a format that can be fed back into another BLAT run. Eventually, my goal is to be able to iterate my BLAT runs. For reference, BLAT can output psl, pslx, maf, sim4, axt, blast-tab, and blast-text formats but takes only fasta, nib, and 2bit as input. I found a Biopython module called BlatIO (BlatIO on github.com) that supports parsing .psl and .pslx files, and I attempted to convert this .psl output into FASTA format using my own code:
import sys
sys.path.insert(1, 'C:\\Python27\Lib\site-packages\Bio\BlatIO.py')
from Bio.AlignIO import BlatIO
from Bio import SearchIO
from Bio.SearchIO._model import QueryResult, Hit, HSP, HSPFragment
alignments = SearchIO.parse(input_file, 'blat-psl', pslx=True)
line1= QueryResult.id
line2= HSPFragment.query
print ('>', line1)
print (line2)
The output is not an ID and a sequence like I would expect though. Instead I get this:
('>', <property object at 0x029BC9F0>)
<property object at 0x029BC3C0>
I am open to all suggestions about how to get ANY of the BLAT output formats into ANY of the BLAT input formats, either by fixing the code I have started above or by some other method.
THANK YOU!
(PS - I have already done this project in BLAST, so please don't tell me to just use BLAST. I know that BLAST has different and in some ways better output formatting options, but I really need to use BLAT, not BLAST. PPS - I am aware of tools like those at usegalaxy.org that convert files; however, I really need code or a package to do this, preferably in Python or Perl, and not a web browser tool!)
Your best friends for sorting out things like this are:
In your case, when you print the object, it gives you the string representation of that object, which is not all that helpful (OK, it is atrocious).
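To illustrate where a "property object" printout can come from, here is a minimal sketch in plain Python. The `Record` class is a hypothetical stand-in, not Biopython's actual `QueryResult`; it only assumes that the attribute in question is defined with `@property`, which is what the printed output suggests.

```python
class Record(object):
    """Hypothetical stand-in for a class such as QueryResult."""

    def __init__(self, record_id):
        self._id = record_id

    @property
    def id(self):
        # Only meaningful on an instance, which carries the stored value.
        return self._id


on_class = Record.id                # the property descriptor itself
on_instance = Record("query1").id   # the value stored on one instance

print(repr(on_class))      # something like <property object at 0x...>
print(repr(on_instance))   # the actual ID string
```

Reading the attribute from the class gives back the descriptor, while reading it from an instance gives the data, so the ID and sequence have to come from the objects the parser yields rather than from the class names.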
I don't know Biopython, but I think the problem is that line1 is a sort of 'class reference', not a result object instance. It seems intuitive that you need to loop over all alignments (at least in BioPerl you need to do this) and then extract the data via accessor methods. So it doesn't look like your program could work at all (note that I know nothing about Python, so maybe there is some kind of weird magic). I'd look for a class that writes FASTA files (something like SeqIO, which is Bio::SeqIO in BioPerl), pass it the alignment object, and see what happens.
Btw, if I see correctly, BlatIO inherits from SearchIO, and the object returned by SearchIO.parse should have the same interface as any object returned by SearchIO.parse, so you just have to look for example code for the class/interface SearchIO and it should work. That is, provided the factory pattern of SearchIO.parse works as I assume.