Hi All!
I am trying to build a BLAST database from an input FASTA file. I am doing this with subprocess rather than from the command line because I'm making a reciprocal best hit pipeline. I have a pre-computed BLAST+ makeblastdb database against which the user's input sequences are BLASTed, and I then need all of the input sequences to be turned into a database in turn, against which the best hits of the previous BLAST run can be BLASTed (i.e. it needs to take whatever the user inputs, on a case-by-case basis, programmatically). Sequences that 'hit' and/or don't 'hit' each other are then labelled appropriately.
Subprocess code:
import subprocess

def run_process(cmd):
    if type(cmd)==list:
        shell_bool=False
    elif type(cmd)==str:
        shell_bool=True
    else:
        return 'Bad input cmd'
    try:
        cmd = subprocess.run(cmd, check=True, capture_output=True, shell=shell_bool)
        return cmd
    except FileNotFoundError as e1:
        print('FileNotFoundError:')
        print('\n', e1)
        return e1
    except subprocess.CalledProcessError as e2:
        print('CalledProcessError:')
        print('\n', e2)
        print('\n', e2.stderr)
        print('\n', e2.stdout)
        return e2

db_type_str=' -dbtype prot'
makeblastdb_path= r'"C:\Users\u03132tk\.spyder-py3\modulesData\NCBI\blast-2.10.1+\bin\makeblastdb.exe"'
fasta_db_path= r' -in "C:\Users\u03132tk\.spyder-py3\modulesData\fasta_sequences_SMCOG_efetch_only.txt"'

cmd_str=makeblastdb_path + fasta_db_path + db_type_str
#std_err = Error: mdb_env_open: There is not enough space on the disk.
cmd1=[makeblastdb_path, fasta_db_path, db_type_str]
#-->[WinError 2] The system cannot find the file specified
cmd2=[makeblastdb_path + fasta_db_path + db_type_str]
#-->PermissionError: [WinError 5] Access is denied

test=run_process(cmd_str)
#or cmd1 or cmd2
I've been looking into subprocess and think my code should work. However, I've tried passing the command as a string (cmd_str) and as a list, with the list holding either one big string (cmd2, [makeblastdb_path + fasta_db_path + db_type_str]) or separate arguments (cmd1, [makeblastdb_path, fasta_db_path, db_type_str]), and none of these works.
I've made the most progress using cmd_str: it does seem to start running makeblastdb. However, I get CalledProcessError with return code 255, and stderr is "Error: mdb_env_open: There is not enough space on the disk". Whilst this seems like a clear-cut issue, when I copy cmd_str into the command line it works fine, suggesting disk space isn't the problem. Any ideas why this is?
I have already created the BLASTDB_LMDB_MAP_SIZE=1000000 environment variable, as discussed in "makeblastdb Fasta file with 25 sequences gives Error: mdb_env_open: There is not enough space on the disk", but I don't think this is the issue, given my successful command line runs and the different names involved (mdb_env_open in the error vs BLASTDB_LMDB_MAP_SIZE).
Cheers for reading!
Tim
EDIT 1 - Working code:
import subprocess
import os

def run_process(cmd):
    if type(cmd)==list:
        shell_bool=False
    elif type(cmd)==str:
        shell_bool=True
    else:
        return 'Bad input cmd'
    envp = {
        **os.environ,
        'BLASTDB_LMDB_MAP_SIZE':'1000000',
    }
    print(envp)
    try:
        cmd = subprocess.run(cmd, check=True, capture_output=True, shell=shell_bool, env=envp)
        return cmd
    except FileNotFoundError as e1:
        print('FileNotFoundError:')
        print('\n', e1)
        return e1
    except subprocess.CalledProcessError as e2:
        print('CalledProcessError:')
        print('\n', e2)
        print('\n', e2.stderr)
        print('\n', e2.stdout)
        return e2

db_type_str='prot'
makeblastdb_path= r'C:\Users\u03132tk\.spyder-py3\modulesData\NCBI\blast-2.10.1+\bin\makeblastdb.exe'
fasta_db_path= r'C:\Users\u03132tk\.spyder-py3\modulesData\fasta_sequences_SMCOG_efetch_only.txt'

cmd1=[makeblastdb_path, '-in', fasta_db_path, '-dbtype', db_type_str]
test=run_process(cmd1)
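
For anyone comparing the two versions of cmd1: the working list has one token per element, with no embedded quotes and no leading spaces. A minimal sketch of why that matters is below. It assumes a hypothetical helper build_makeblastdb_cmd and placeholder paths (neither is part of BLAST+ or my pipeline), and it uses subprocess.list2cmdline, a helper in the subprocess module that is not part of its formally documented API, purely to preview the command line that would be built on Windows.

```python
import subprocess

def build_makeblastdb_cmd(exe_path, fasta_path, db_type="prot"):
    """Hypothetical helper: return the makeblastdb call as an argument list."""
    # One token per element; no surrounding quotes and no leading spaces.
    return [exe_path, "-in", fasta_path, "-dbtype", db_type]

# Placeholder paths for illustration only.
cmd = build_makeblastdb_cmd(r"C:\blast\bin\makeblastdb.exe", r"C:\data\my seqs.fasta")

# Preview of the Windows command line: subprocess adds quotes itself,
# and only around the argument that contains a space.
preview = subprocess.list2cmdline(cmd)
print(preview)

# Quotes typed into an element are escaped and so become part of the name,
# which is consistent with the "cannot find the file" error from the old cmd1.
bad_preview = subprocess.list2cmdline(['"a.exe"'])
print(bad_preview)
```

The same list can be handed to run_process unchanged, since a list input means shell=False there.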
Hi Massa (you legend, this has been bugging me for hours),
Thanks for your reply! I looked into environments (I didn't really know what these were) and you're right: the BLASTDB_LMDB_MAP_SIZE environment variable wasn't updated. I updated the code above to set the environment variable and all seems well. I then sorted out the formatting of cmd1 (I had spaces and speech marks that weren't necessary, as subprocess was doing that formatting) and the list input without shell also works. The working code has been added as an edit for anyone with similar issues. Can you change your comment to an answer so I can accept it please?
Cheers! Tim
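
In case it helps anyone debugging the same thing, here is a rough sketch of how to check what a child process actually sees for BLASTDB_LMDB_MAP_SIZE. The helper name child_sees is made up for this example, and the child is just another Python interpreter rather than makeblastdb, so it only tests how the environment is passed on, not BLAST itself. One possible reason the command line and the IDE can differ is that the Python session was started before the variable was created, but that is a guess on my part.

```python
import os
import subprocess
import sys

def child_sees(name, env=None):
    """Hypothetical helper: value of `name` as seen by a child process."""
    # The child prints the variable's value, or '<unset>' if it is missing.
    code = "import os, sys; sys.stdout.write(os.environ.get(sys.argv[1], '<unset>'))"
    result = subprocess.run(
        [sys.executable, "-c", code, name],
        check=True, capture_output=True, text=True, env=env,
    )
    return result.stdout

# env=None: the child inherits whatever this Python process has,
# as in the original run_process.
print(child_sees("BLASTDB_LMDB_MAP_SIZE"))

# Explicit copy of the environment with the variable set,
# as in the working code.
envp = {**os.environ, "BLASTDB_LMDB_MAP_SIZE": "1000000"}
print(child_sees("BLASTDB_LMDB_MAP_SIZE", env=envp))
```

If the first call reports the variable as unset while the second shows the value, the variable is not reaching the child by inheritance and passing env explicitly is the fix.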