Hi, hopefully someone can help me with this. I have prepared a script to extract data from a file; this part works very well and does what I need. The problem comes when I use glob.glob and subprocess to call the function. I keep getting the error message below, and I do not know how to handle it. Error message:
  File "parsing_blast.py", line 45, in <module>
    my_file=subprocess.Popen(cmd)
  File "/usr/lib64/python2.6/subprocess.py", line 642, in __init__
    errread, errwrite)
  File "/usr/lib64/python2.6/subprocess.py", line 1238, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory
Thanks for your help.
from Bio.Blast import NCBIXML
from Bio import SeqIO, SearchIO
import sys, glob, subprocess, os
folders = glob.glob('/home/me/my_folder/H*')
print folders
for folder in folders:
    my_files=glob.glob(folder + '/*.xml')
    print my_files
    def parsing_blast():
        results_handle=open(my_files[0])
        blast_results=NCBIXML.parse(results_handle)
        #blast_results=NCBIXML.parse(results_handle)
        output_handle=open(folder + ' my_data_parse.xml','w')
        #to extract some information from the blast file
        for blast_result in blast_results:
            sequence_length=blast_result.query_letters #this is the length of the sequence
            gene=blast_result.query #gene name
            #print 'The length is:', sequence_length #check point
            #print gene #check point
            for description in blast_result.descriptions:
                title=description.title #query seq name
                #print description.title #check point
            for alignment in blast_result.alignments:
                for hsp in alignment.hsps:
                    identity=hsp.identities #matching bases
                    num_gaps=hsp.gaps #number of gaps
                    #print identity #check point
                    #print num_gaps #check point
                    per_identities=float(identity)/float(sequence_length)*float(100)
                    #print per_identities #check point
                    #sys.exit()
                    extracted_data= (gene + ',' + title + ','+ 'number_gaps: ' + str(num_gaps) +','+ 'per_identity: '+ str(per_identities) +'\n')
                    output_handle.write(extracted_data)
        output_handle.close()
    #sys.exit()
    parsing_blast()
    print 'The file has been created'
Why do you use subprocess to call the function?!
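For context: subprocess.Popen starts an external program, so it cannot call a Python function defined in the same script. Handing it a name that is not a program on disk is one common way to get Errno 2. A minimal sketch in Python 3 syntax (the helper try_launch and the command name are made up for illustration; the original cmd is not shown in the question):

```python
import errno
import subprocess


def try_launch(cmd):
    """Start cmd as an external program.

    Returns None if the program could be started, otherwise the errno
    carried by the OSError that Popen raised.
    """
    try:
        proc = subprocess.Popen(cmd)
    except OSError as exc:
        return exc.errno
    proc.wait()
    return None


# Hypothetical command: a name that does not exist as a program on disk.
result = try_launch(['parsing_blast_is_not_a_program'])
print(result == errno.ENOENT)  # ENOENT is "No such file or directory" (Errno 2)
```

If the goal is only to run a function from the same script, calling it directly is enough.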
I have a hundred folders, all starting with H, and each of them contains an XML file I would like to parse. I do not want to do it one by one. So I want the script to get the information I need from the file within each H* folder and store that information in another file. Once the file is created in one folder, it should move on to the next folder, and so on. I have used glob.glob and subprocess before, but within a function. I just wanted to use them from outside the function so I could add another function.
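The folder-by-folder workflow described here can probably be done with glob and a plain loop, without subprocess. A rough sketch in Python 3 syntax, assuming the parsing step is wrapped in a function that takes one file path (process_folders and handle_file are hypothetical names, not part of the original script):

```python
import glob
import os


def process_folders(pattern, handle_file):
    """Call handle_file(xml_path) for every .xml file in each matching folder.

    pattern     -- glob pattern for the folders, e.g. '/home/me/my_folder/H*'
    handle_file -- any function taking one path (stand-in for the parser)
    Returns the list of paths that were handled, in sorted order.
    """
    handled = []
    for folder in sorted(glob.glob(pattern)):
        if not os.path.isdir(folder):
            continue  # skip plain files that happen to match the pattern
        for xml_path in sorted(glob.glob(os.path.join(folder, '*.xml'))):
            handle_file(xml_path)
            handled.append(xml_path)
    return handled
```

Something like process_folders('/home/me/my_folder/H*', parse_one_file) would then visit each folder in turn, where parse_one_file is whatever function does the parsing.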
As far as I understand, this is an entirely different use case. To parallelize a function across many files, you need the multiprocessing module instead of subprocess.
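To illustrate the multiprocessing suggestion, here is a small sketch in Python 3 syntax. It assumes the work per file can be put into one top-level function; parse_one below is only a stand-in that counts lines, not the real BLAST parsing, and parse_all and the worker count are made up for the example:

```python
import glob
import multiprocessing


def parse_one(xml_path):
    """Stand-in for the real parsing: returns (path, number of lines)."""
    with open(xml_path) as handle:
        return xml_path, sum(1 for _ in handle)


def parse_all(xml_paths, workers=4):
    """Run parse_one on every path using a pool of worker processes."""
    pool = multiprocessing.Pool(processes=workers)
    try:
        # map() splits the paths across the workers and keeps input order
        return pool.map(parse_one, xml_paths)
    finally:
        pool.close()
        pool.join()


if __name__ == '__main__':
    # Folder pattern taken from the question; adjust to your own layout.
    paths = glob.glob('/home/me/my_folder/H*/*.xml')
    for path, n_lines in parse_all(paths):
        print(path, n_lines)
```

For a hundred small files a plain loop may well be fast enough; the pool only pays off when each file takes noticeable time to parse.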
Hi, I have re-edited the script, and now it works perfectly. But now I need to find out how to tell the program to store the created file inside the H folders. Any help with that, please?
I edited my answer to set the output_handle file to a file within the XML file source directory. Is that what you meant?
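In case it helps, one way to build such a path is with os.path.dirname and os.path.join. This is only a sketch of the idea, not necessarily what the edited answer does; output_path_for is a hypothetical helper and the default file name is the one used in the question's script:

```python
import os


def output_path_for(xml_path, name='my_data_parse.xml'):
    """Return the path of `name` placed in the same directory as xml_path."""
    return os.path.join(os.path.dirname(xml_path), name)


# Hypothetical input file inside one of the H* folders.
print(output_path_for(os.path.join('my_folder', 'H1', 'blast.xml')))
```

The result can be passed straight to open(..., 'w'). One thing to watch: if the output name also ends in .xml, a later glob for '*.xml' in that folder will match it as well, so a different extension may be safer.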