Hi Camilla,
There are two BUGS in your code. score
and gaps
are not member of alignment
object. Correct code should be:
print('score:', hsp.score)
print('gaps:', hsp.gaps)
Use dir()
to check members:
>>> dir(item.alignments[0])
['__class__', '__delattr__', '__dict__', '__doc__', '__format__', '__getattribute__', '__hash__', '__init__', '__module__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', 'accession', 'hit_def', 'hit_id', 'hsps', 'length', 'title']
>>> dir(item.alignments[0].hsps[0])
['__class__', '__delattr__', '__dict__', '__doc__', '__format__', '__getattribute__', '__hash__', '__init__', '__module__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', 'align_length', 'bits', 'expect', 'frame', 'gaps', 'identities', 'match', 'num_alignments', 'positives', 'query', 'query_end', 'query_start', 'sbjct', 'sbjct_end', 'sbjct_start', 'score', 'strand']
And here is how your code would run if you fix the bugs (I am using it on my BLAST file):
>>> for alignment in item.alignments:
... for hsp in alignment.hsps:
... if hsp.expect < 0.01:
... print('****Alignment****')
... print('sequence:', alignment.title)
... print('length:', alignment.length)
... print('score:', hsp.score)
... print('gaps:', hsp.gaps)
... print(hsp.query[0:90] + '...')
... print(hsp.match[0:90] + '...')
... print(hsp.sbjct[0:90] + '...')
...
****Alignment****
('sequence:', u'gi|568824607|ref|XM_006466626.1| PREDICTED: Citrus sinensis cold-regulated 413 plasma membrane protein 2-like (LOC102620025), transcript variant X5, mRNA')
('length:', 878)
('score:', 412.0)
('gaps:', 11)
AAATGGGGAGAGAAATGAAGTACTTGGCCATGAAAACTGATCAATTGGCCGTGGCTAATATGATCGATTCCGATATCAATGAGCTTAAAA...
||||||||||| |||| || ||||| |||||||||||| | || | || | ||||| || ||| ||||||||||||| |...
AAATGGGGAGAT---TGAATTATTTGGCTATGAAAACTGATGATCAGGTTGCAGCAGAGTTGATCAGCTCTGATTTCAATGAGCTTAAGA...
****Alignment****
('sequence:', u'gi|568824605|ref|XM_006466625.1| PREDICTED: Citrus sinensis cold-regulated 413 plasma membrane protein 2-like (LOC102620025), transcript variant X4, mRNA')
('length:', 911)
('score:', 412.0)
('gaps:', 11)
AAATGGGGAGAGAAATGAAGTACTTGGCCATGAAAACTGATCAATTGGCCGTGGCTAATATGATCGATTCCGATATCAATGAGCTTAAAA...
||||||||||| |||| || ||||| |||||||||||| | || | || | ||||| || ||| ||||||||||||| |...
AAATGGGGAGAT---TGAATTATTTGGCTATGAAAACTGATGATCAGGTTGCAGCAGAGTTGATCAGCTCTGATTTCAATGAGCTTAAGA...
****Alignment****
('sequence:', u'gi|568824603|ref|XM_006466624.1| PREDICTED: Citrus sinensis cold-regulated 413 plasma membrane protein 2-like (LOC102620025), transcript variant X3, mRNA')
('length:', 894)
('score:', 412.0)
('gaps:', 11)
AAATGGGGAGAGAAATGAAGTACTTGGCCATGAAAACTGATCAATTGGCCGTGGCTAATATGATCGATTCCGATATCAATGAGCTTAAAA...
||||||||||| |||| || ||||| |||||||||||| | || | || | ||||| || ||| ||||||||||||| |...
AAATGGGGAGAT---TGAATTATTTGGCTATGAAAACTGATGATCAGGTTGCAGCAGAGTTGATCAGCTCTGATTTCAATGAGCTTAAGA...
****Alignment****
('sequence:', u'gi|568824601|ref|XM_006466623.1| PREDICTED: Citrus sinensis cold-regulated 413 plasma membrane protein 2-like (LOC102620025), transcript variant X2, mRNA')
('length:', 922)
('score:', 412.0)
('gaps:', 11)
AAATGGGGAGAGAAATGAAGTACTTGGCCATGAAAACTGATCAATTGGCCGTGGCTAATATGATCGATTCCGATATCAATGAGCTTAAAA...
||||||||||| |||| || ||||| |||||||||||| | || | || | ||||| || ||| ||||||||||||| |...
AAATGGGGAGAT---TGAATTATTTGGCTATGAAAACTGATGATCAGGTTGCAGCAGAGTTGATCAGCTCTGATTTCAATGAGCTTAAGA...
Best Wishes,
Umer
Is there a reason you're using XML output? A tab-delimited output (
-outfmt 6
), would be much easier to parse with standard python, or Linux commands. You can also run BLAST from inside your python script with BioPython.Biopython doesn't totally support the plain text parsing now, not enough stable, so XML is the best recommanded output for Biopython parsing.
So, how to extract the query coverage from XML output?
Hello, I'm trying to get also the features of the alignment (the one is written in the picture) but I can't figure out how to do it.
My file is a xml that corresponds to several blast sequences, so I also have the question how to distinguish between queries. For each query I would have to get the one with highest score from genomics sequences (no transcript) and the feature corresponding to the name of the transcript (name of the gene that codify for protein, in this example Apoptosis regulator BCL2).
Thank you in advance,
Miguel
Please open a new question and reference this post (how to parse blast output using biopython) there. Do not add an answer unless you're answering the top level question.