How can I extract the first hit's title from BLAST XML files?
5.1 years ago

Dear All!

I'm new to bioinformatics and I'm working on an archaeogenetics project. My first task is to search part of a genome for contaminating, non-human segments. The data are ~500 shotgun sequences.

I have two questions:
- How can I print only the first hit for each query from the XML?
- How can I write a counter for the non-human hits that gives the number of hits and the name of each organism?

I work with Biopython.

Thank you in advance!

biopython python • 954 views

Are you required to use the XML output? Tabular output (-outfmt 6) is much easier to parse. https://www.ncbi.nlm.nih.gov/books/NBK279684/
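To illustrate the tabular route, here is a minimal sketch that keeps only the first listed hit per query. It assumes the default 12-column -outfmt 6 layout, where column 1 is the query ID and column 2 is the subject ID, and that BLAST lists the hits of each query best first. The function name first_hit_per_query is made up for this example. Note that the default columns do not include the subject title; BLAST+ lets you request extra columns such as stitle in the -outfmt string, so check the manual linked above for the exact specifiers.

```python
import csv


def first_hit_per_query(lines):
    """Map each query ID to the subject ID of its first listed hit.

    `lines` is any iterable of tab-separated text lines, for example
    an open file produced with -outfmt 6.
    """
    first = {}
    for row in csv.reader(lines, delimiter="\t"):
        # Skip blank lines and comment lines (present with -outfmt 7).
        if not row or row[0].startswith("#"):
            continue
        qseqid, sseqid = row[0], row[1]
        # setdefault keeps the first subject seen for each query.
        first.setdefault(qseqid, sseqid)
    return first
```

With a real file this would be called as first_hit_per_query(open("full_result.tsv")), where the file name is only a placeholder.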

If you really need to use XML files, you could use something like ElementTree or the approach in the Biopython tutorial: http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc95

ElementTree: https://stackoverflow.com/questions/1912434/how-do-i-parse-xml-in-python
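For the ElementTree suggestion, a rough sketch could look like the following. It assumes the classic BLAST XML layout, in which each Iteration element holds an Iteration_query-def and an Iteration_hits list of Hit elements, each with a Hit_def. The function name first_hit_defs is invented for this example, and queries without any hit are simply skipped.

```python
import xml.etree.ElementTree as ET


def first_hit_defs(source):
    """Yield (query definition, first hit definition) per BLAST iteration.

    `source` is a file name or a binary file object with BLAST XML.
    """
    # iterparse reports each element once it is complete ("end" event),
    # so large result files do not have to be loaded in one go.
    for _event, elem in ET.iterparse(source):
        if elem.tag == "Iteration":
            query = elem.findtext("Iteration_query-def")
            # find() returns the first Hit in document order, or None.
            first = elem.find("Iteration_hits/Hit")
            if first is not None:
                yield query, first.findtext("Hit_def")
            # Drop the processed iteration to keep memory use low.
            elem.clear()
```

Something like `for query, hit in first_hit_defs("full_result.xml"): print(query, hit)` would then print one line per query that has a hit.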


Thank you for the fast answer! I wrote a short script, but my problem is that I don't know how to reach the hit_num field in the XML. My code is:

from Bio.Blast import NCBIXML

x = 1
for record in NCBIXML.parse(open("full_result.xml")):
    if record.alignments:
        print("\n")
        print("query: %s" % record.query[:100])
        for align in record.alignments:
            # This is the line that fails: hit_num and hit are not defined anywhere
            if hit_num in hit.alignments == x:
                print("match: %s" % align.title[:100])

So basically I just want to print the query title and the title of the first alignment.


I can't help you much further; I have never used the parser. It helps to just print everything out and look at what is inside record.

So start with:

from Bio.Blast import NCBIXML

for record in NCBIXML.parse(open("full_result.xml")):
    print(record)

Or if you already know that hit_num is inside record.alignments:

for record in NCBIXML.parse(open("full_result.xml")):
    for x in record.alignments:
        print(x)
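
For later readers, here is one possible sketch for both original questions, not a definitive solution. It relies on the alignments of a record being stored in the order they appear in the XML, so record.alignments[0] corresponds to the first hit, and on each alignment carrying title and hit_def attributes as the NCBIXML parser sets them. The helper names (first_hits, guess_organism, count_non_human, report) are made up for this example. Taking the first two words of hit_def as the organism name is only a heuristic and an assumption: it breaks on descriptions such as "PREDICTED: ...", so check it against your own hits.

```python
from collections import Counter


def first_hits(records):
    """Yield (query, first alignment) for every record that has hits."""
    for record in records:
        if record.alignments:
            # Index 0 is the first hit listed for this query.
            yield record.query, record.alignments[0]


def guess_organism(hit_def):
    """Heuristic (assumption): the first two words are 'Genus species'."""
    return " ".join(hit_def.split()[:2])


def count_non_human(records, human="Homo sapiens"):
    """Count first hits per guessed organism, skipping human hits."""
    counts = Counter()
    for _query, alignment in first_hits(records):
        organism = guess_organism(alignment.hit_def)
        if organism != human:
            counts[organism] += 1
    return counts


def report(xml_path):
    """Print first-hit titles and organism counts for a BLAST XML file."""
    # Imported here so the helpers above can be tried without Biopython.
    from Bio.Blast import NCBIXML

    with open(xml_path) as handle:
        records = list(NCBIXML.parse(handle))
    for query, alignment in first_hits(records):
        print("query: %s" % query[:100])
        print("match: %s" % alignment.title[:100])
    for organism, n in count_non_human(records).most_common():
        print(n, organism)
```

Calling report("full_result.xml") would print each query with its first hit, followed by the non-human organisms ordered by how often they were the top hit.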