RunningTBLASTN many times with python
4
0
Entering edit mode
7.1 years ago
trini1god • 0

Hello, I am trying to align a protein sequences to a nucleotide database so am using TBLASTN. Each protein sequence is contained in different files (like over 2000 files). So I opened the directory using glob and want to loop through each file and run the TBLASTN on them but it is giving me this error:

Command line argument error: Argument "query". File is not accessible:  `conSeqfile'

Below is my code:

import os
from os import path
import glob

datadir = '/home/chisom/somtee/chisom/multifasta_files/multi_fasta/Muscle_files/Consensus'
os.chdir(datadir)
readfilenames = glob.glob(os.path.join(datadir, '*.fasta'))

db = '/home/chisom/somtee/chisom/multifasta_files/multi_fasta/Mesculenta_305_v6.1.cds.fa'

#A directory for the output data
outdir = '/home/chisom/somtee/chisom/multifasta_files/multi_fasta/Blast_output'


for conSeqfile in readfilenames:
     print conSeqfile      #printed here to make it is printing the file

    out_blast = path.join(outdir, "blastres_" + conSeqfile[4:] + ".txt")

    cmd = "tblastn -query conSeqfile -db db -out out_blast -outfmt 6 -evalue 0.00001"
    os.system(cmd)

I have been on it for hours and also sourced for similar problems online but still cant figure it out. Any help is appreciated. Thanks

blast alignment python biopython • 2.2k views
ADD COMMENT
2
Entering edit mode

Change the line:

cmd = "tblastn -query conSeqfile -db db -out out_blast -outfmt 6 -evalue 0.00001"

into this:

cmd = "tblastn -query {} -db {} -out {} -outfmt 6 -evalue 0.00001".format(conSeqfile, db, out_blast)
ADD REPLY
0
Entering edit mode

Thanks @a.zielezinski, your answer helped me. I changed it and it worked

ADD REPLY
1
Entering edit mode
7.1 years ago
st.ph.n ★ 2.7k

You're on the right track, and your code can be simplified:

#!/usr/bin/env python

import os, glob, subprocess

os.chdir('/home/chisom/somtee/chisom/multifasta_files/multi_fasta/Muscle_files/Consensus')

db = '/home/chisom/somtee/chisom/multifasta_files/multi_fasta/Mesculenta_305_v6.1.cds.fa'

outfile = "/home/chisom/somtee/chisom/multifasta_files/multi_fasta/Blast_output/blastres_" + conSeqfile[4:] + ".txt"

for file in glob.glob("*.fasta"):
    print "Running tBLASTn for: " + file

    subprocess.call("tblastn -query " + file + " -db " + db + " -outfmt 6 -evalue 0.00001 -out " + outfile, shell=True)
ADD COMMENT
1
Entering edit mode
7.1 years ago
Joe 21k

Do you want to use python for this specifically? This would be less cumbersome as a simple shell script in my opinion. It looks like you're just trying to start the same command on several input fastas?

I'd just do (without any attempt to parallelise):

for file in /path/to/files/*.fasta ; do tblastn -query "$file" -db ~/somtee/chisom/multifasta_files/multi_fasta/Mesculenta_305_v6.1.cds.fa -out /path/to/outputdir/"${file%.*}"_out_blast.txt -outfmt 6 -evalue 0.00001 ; done
ADD COMMENT
0
Entering edit mode
7.1 years ago

not python, but a simple Makefile. This can be parallelized using

make -j number_of_jobs

ADD COMMENT
0
Entering edit mode
7.1 years ago
Joe 21k

Invoke your python command via GNU parallel or explore the threading and multiprocessing modules within python if you want to handle it within the script.

ADD COMMENT

Login before adding your answer.

Traffic: 1832 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6