Hi Biostars,
I'm running a program called BUSCO on several vertebrate genome assemblies (not so much for genome assessment as to collect single-copy orthologs from their database). BUSCO mostly wraps several programs together (tblastn/augustus/hmmer). One genome in particular, though, the opossum (Monodelphis domestica), has been giving me some trouble.
BUSCO first runs makeblastdb, which took nearly 24 hours for the opossum (compared to about 10 minutes for the dog and about 20 for the wallaby). BUSCO also runs threaded tblastn. For all of my other genomes I've run it with 16-20 cores and the whole search takes no longer than 2 hours, but when I run it on the opossum genome it only uses 4 cores even if I request 16, and 24 hours in it's still going (though it only adds something to the results file every few hours).
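For reference, here is a minimal sketch of the two BLAST steps run on their own and timed, so the same commands can be compared across genomes. The file names, database prefix, e-value, and tabular output format are placeholders I picked for illustration, not necessarily what BUSCO passes internally; only the flags themselves (`-in`, `-dbtype`, `-out`, `-query`, `-db`, `-outfmt`, `-evalue`, `-num_threads`) are standard BLAST+ options.

```python
import subprocess
import time
from typing import List


def makeblastdb_cmd(genome_fasta: str, db_prefix: str) -> List[str]:
    """Build the command that formats a nucleotide BLAST database."""
    return ["makeblastdb", "-in", genome_fasta, "-dbtype", "nucl", "-out", db_prefix]


def tblastn_cmd(query_fasta: str, db_prefix: str, out_tsv: str,
                threads: int = 16, evalue: float = 1e-3) -> List[str]:
    """Build a threaded tblastn command (protein queries vs. nucleotide db).

    The e-value and tabular output format are assumptions for this sketch.
    """
    return [
        "tblastn",
        "-query", query_fasta,
        "-db", db_prefix,
        "-out", out_tsv,
        "-outfmt", "6",
        "-evalue", str(evalue),
        "-num_threads", str(threads),
    ]


def timed_run(cmd: List[str]) -> float:
    """Run a command, raise if it exits non-zero, and return wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    return time.perf_counter() - start
```

Calling `timed_run(makeblastdb_cmd("opossum.fa", "opossum_db"))` and then `timed_run(tblastn_cmd("orthologs.faa", "opossum_db", "opossum.tsv"))` (all hypothetical file names) for each genome gives directly comparable timings.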
I've tried running makeblastdb and tblastn separately and observed the same problem, but only on the opossum. I've also checked the formatting and it looks identical to the other Ensembl genomes (headers use the same format as dog, lines organized in the same way, all are repeat masked and use "N" for masked data). Blast also gives me no error messages and doesn't exit, but it will hang for long stretches and will stop using resources on my computer.
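Beyond eyeballing headers and line layout, a rough way to compare the files is to summarize each FASTA numerically. The sketch below is only an assumption about what might be worth comparing (record count, longest record, fraction of N, longest run of N); the function name and the returned keys are made up for illustration.

```python
import re
from typing import Dict, TextIO


def summarize_fasta(handle: TextIO) -> Dict[str, float]:
    """Collect simple statistics from an open FASTA file handle."""
    n_records = 0
    total_bases = 0
    n_bases = 0
    longest_record = 0
    longest_n_run = 0
    current_len = 0      # length of the record being read
    current_n_run = 0    # run of N carried over from the previous line

    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # New record: close out the previous one and reset counters.
            longest_record = max(longest_record, current_len)
            n_records += 1
            current_len = 0
            current_n_run = 0
            continue
        seq = line.upper()
        current_len += len(seq)
        total_bases += len(seq)
        n_bases += seq.count("N")
        if set(seq) == {"N"}:
            # Whole line is masked: extend the run that spans line breaks.
            current_n_run += len(seq)
        else:
            leading = len(seq) - len(seq.lstrip("N"))
            longest_n_run = max(longest_n_run, current_n_run + leading)
            for run in re.findall(r"N+", seq):
                longest_n_run = max(longest_n_run, len(run))
            current_n_run = len(seq) - len(seq.rstrip("N"))
        longest_n_run = max(longest_n_run, current_n_run)

    longest_record = max(longest_record, current_len)
    return {
        "records": n_records,
        "total_bases": total_bases,
        "longest_record": longest_record,
        "n_fraction": n_bases / total_bases if total_bases else 0.0,
        "longest_n_run": longest_n_run,
    }
```

Running it on each genome, e.g. `with open("opossum.fa") as fh: print(summarize_fasta(fh))` (file name is a placeholder), makes it easy to see whether one assembly stands out in sequence length or masking even when the formatting looks the same.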
I can't for the life of me figure out why this genome is behaving so differently when it is formatted correctly and being run the exact same way as my other genomes. Any ideas?
Why don't you run BLAST by itself, outside BUSCO? The program might have bugs.
As I said in the post: "I've tried running makeblastdb and tblastn separately and observed the same problem, but only on the opossum."
I don't think the current release of BLAST has a bug. It also wouldn't make sense that, out of several identically formatted Ensembl release genomes, the problem would arise for only one.
Did you try the BLAST package from the Ubuntu repository, or from another Linux distribution's repository?