Hi everyone,
I am running blastp through NCBIWWW in Biopython and need to BLAST 50-100 sequences at a time. Right now I am just going through the list one by one, but I would like to submit several at once.
Is there a way to do this?
Hi pawlowac,
I had a very similar question a couple of months ago (Using Biopython and BLAST+ to automate de novo viral contig sorting), and what Peter says in it is true. The short answer is that you do not need Biopython: you can just use standalone BLAST+ with the file that has your sequences in it as the query. It works with any number of sequences; it will just take some time.
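If you still want to drive it from Python, here is a minimal sketch of that approach. It assumes BLAST+ is installed and on your PATH and that all the queries sit in one multi-FASTA file. The file names and the `nr` database are placeholders chosen for illustration, not something from this thread; `-remote` sends the search to NCBI's servers rather than a local database.

```python
import subprocess


def build_blastp_command(query_fasta, out_xml, db="nr"):
    """Build a blastp command line that searches every record in one multi-FASTA file."""
    return [
        "blastp",
        "-query", query_fasta,  # multi-FASTA file; all records are searched in one run
        "-db", db,              # assumption: NCBI's nr protein database
        "-remote",              # run the search on NCBI's servers
        "-outfmt", "5",         # BLAST XML output
        "-out", out_xml,
    ]


def run_blastp(query_fasta, out_xml, db="nr"):
    """Run the search; raises CalledProcessError if blastp exits with an error."""
    subprocess.run(build_blastp_command(query_fasta, out_xml, db), check=True)


# Example with hypothetical file names:
# run_blastp("proteins.fasta", "results.xml")
```

The XML output (`-outfmt 5`) is chosen so the results can be parsed with Biopython afterwards.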
Thanks for the answer. I had first used BLAST+ to do this but kept getting timeout errors. I tried again after your suggestion (with the exact same command) and it works great now. It must be the NCBI connection being unreliable, as always.
Great, I'm glad it worked for you. :)
OK, I take it back. It worked once, but now it says CPU limit exceeded. There were 150 proteins I was trying to BLAST...
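One thing that may be worth trying for the CPU limit is splitting the query file into smaller batches and submitting each batch as its own search. No guarantee that it gets around the limit. A rough sketch using only the standard library; the batch size of 10 is an arbitrary assumption, and both function names are made up for this example.

```python
def read_fasta_records(text):
    """Split multi-FASTA text into records (header line plus its sequence lines)."""
    records = []
    current = []
    for line in text.splitlines():
        if line.startswith(">") and current:
            # A new header closes the record collected so far.
            records.append("\n".join(current))
            current = []
        if line.strip():
            current.append(line)
    if current:
        records.append("\n".join(current))
    return records


def batch_records(records, batch_size=10):
    """Group records into multi-FASTA strings of at most batch_size sequences each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [
        "\n".join(records[i:i + batch_size]) + "\n"
        for i in range(0, len(records), batch_size)
    ]
```

Each string returned by `batch_records` can be written to its own file and used as a separate query.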
You can limit the number of results in the search parameters by using the 'max_target_seqs' flag. I think the manual has the default set to 500 or something quite high. If you only need a few close hits, you can run it with a fixed number. For mine I was only concerned with the closest match, so I run it with
Unfortunately I need the diversity, and there is significant overlap in results between the sequences. I end up parsing the XML results using Biopython, grabbing sequence IDs that meet certain conditions, and then checking for duplicates before using efetch to grab the FASTA files. Oh well, back to the drawing board.
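For anyone following along, the parse-and-deduplicate step described above could look roughly like this. It is only a sketch: the actual conditions are not given in the thread, so an E-value cutoff of 1e-5 stands in as an assumed example, and both function names are invented here. The records are expected to look like what `Bio.Blast.NCBIXML.parse` yields (`.alignments`, each with `.hit_id` and `.hsps`, each HSP with `.expect`).

```python
def collect_unique_hit_ids(blast_records, max_evalue=1e-5):
    """Return hit IDs that pass the E-value cutoff, in first-seen order, without duplicates."""
    seen = set()
    unique_ids = []
    for record in blast_records:
        for alignment in record.alignments:
            # Best (lowest) E-value among this hit's HSPs; hits without HSPs never pass.
            best = min((hsp.expect for hsp in alignment.hsps), default=float("inf"))
            if best <= max_evalue and alignment.hit_id not in seen:
                seen.add(alignment.hit_id)
                unique_ids.append(alignment.hit_id)
    return unique_ids


def unique_ids_from_xml(xml_path, max_evalue=1e-5):
    """Parse a BLAST XML file with Biopython and collect the unique hit IDs."""
    from Bio.Blast import NCBIXML  # requires Biopython; imported only when needed

    with open(xml_path) as handle:
        return collect_unique_hit_ids(NCBIXML.parse(handle), max_evalue)
```

Collecting IDs across all query results before fetching means each overlapping hit is only requested once from efetch.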