Dear community,
I have a XML file contained 50,000 genes Blast result with 10 hits for each gene. I want to mine desire genes Blast result from that XML file and the output file is still in XML format. The output file should have a full Blast result in XML form of desire genes.
I tired with the Python script shared by juefish on this posts, however, I only get partial of one gene Blast XML result parsed instead a list of my desire genes. I quote juefish's Python script here to make the discussion easier.
#!/usr/bin/env python
import sys
import os
import sets
import Bio
from sets import Set
from Bio.Blast import NCBIXML
# Usage.
if len(sys.argv) < 2:
print ""
print "This program extracts blast results from an xml file given a list of query sequences"
print "Usage: %s -list file1 -xml file2 > outfile"
print "-list: list of sequence names"
print "-xml: blast xml output file"
print ""
sys.exit()
# Parse args.
for i in range(len(sys.argv)):
if sys.argv[i] == "-list":
infile1 = sys.argv[i+1]
elif sys.argv[i] == "-xml":
infile2 = sys.argv[i+1]
fls = [infile1,infile2]
results_handle = open(fls[1], "r")
fin1 = open(fls[0],"r")
geneContigs = Set([])
#establish list of names of queries to extract from xml file
for line in fin1:
temp=line.lstrip('>').split()
geneContigs.add(temp[0])
fin1.close()
genes2 = []
marker = False
header = True
#cycle through xml file and printing results that match to query list
for line in results_handle:
line = line.rstrip('\r\n')
if(header==True):
if(line.count("<Iteration>")>0):
header=False
print "%s" % (line)
continue
print "%s" % (line)
else:
if(line.count("<Iteration>")>0):
genes2=[]
if(marker==False):
genes2.append(line)
if(marker==True):
if(line.count("</Iteration>")<1):
print "%s" % (line)
elif(line.count("</Iteration>")>0):
print "%s" % (line)
marker=False
if(line.count("<Iteration_query-def>")>0):
split1 = line.split(">")
split2 = split1[1].split("<")
if(split2[0] in geneContigs):
for i in range(len(genes2)):
print "%s" % (genes2[i])
marker=True
genes2=[]
I'm not good in Python nor BioPython, it would be great if the community could share with me your experience or suggestion.
Thanks for your time.
Hi Pierre; I am trying to compile using the command you mentioned. But it say javac: file not found: Biostar75204.java
Thanks
did you just download the file Biostar75204.java ?
thanks you Pierre. I used the script and was able to compile. Sorry it took a while to reply. I am trying on a large xml output.