Hello all,
I need some assistance in programing a script to run a basic nt BLAST on contigs and then save the requested files in a output file (csv) for easy sorting. I have an understanding of python but do not practice enough to be very good at it and since I spend a large portion of my time doing this simple task I am trying to speed it up.
I have a test file of 3 contigs in basic fasta fomat with the ">" as the contig id followed by the sequence I would like to have my script work by reading the file into a dict and then passing the key and value pairs to the blast+ wrapper (found it expects each file to be one search) and then save to the output file and append the next search result until done with the file.
This is my script so far...
import sys
from Bio import SeqIO
from Bio.Blast.Applications import NcbitblastnCommandline
#varable input files
file_of_seq = sys.argv[1]
#make a dictinary for the sequences to be ran using BLAST
record_dict = SeqIO.to_dict(SeqIO.parse(file_of_seq,"fasta")
len(record_dict)
The issue I'm running into is when I run this script in command line it doesn't return the len
of the dictionary it just returns an syntax error
~$ python blast.py test_query.fasta
File "blast.py", line 10
len(record_dict)
^
I do not know why it doesn't return 3 to the console since that is how many key,val pairs it should be making. So I 'm not really sure I'm using Biopython correctly in making the dictionary object.
Any advise would be appreciated.
Thanks,
Sean
Perhaps I misunderstood your description, but just give your multiple sequence file to BLAST+ as the query argument. That should work, there is no reason to make individual FASTA files with a single query sequence each.
I have tried that however my non-test file contains 153 contigs that I would like to blast and it seems to only return the first 20ish contigs the couple of times I have tried. The console looks like its running just fine but the outfile stops growing in size no matter how long I wait. The console never returns, the file stops getting big and there is no error message. So I took this to be a batch/job size issue and was wanting to use Biopython to feed it in one by one so I could blast them. Am I wrong in this line of thinking? Is there an option I'm missing in the command line tool that would let blast so many or make it do them one by one?
Hi Peter,
It appears I may have been just to impatient with the command line tool. I have been letting it run for a longer amount of time and it just finished and gave me the result I was looking for. Thank you for your patience and advice.
Sean
I'm glad you've solved this :)