Hi all,
Just wondering if anyone has experienced a similar issue. I'm using BLAST+ version 2.11 on my school's cluster to blast FASTA files of about 23-60 Mb (~1000-4000 sequences).
Specifically I am using blastn with the nt database.
What is weird is that once my BLAST run finishes, sequences are missing. For example, I am blasting contigs from metaSPAdes, and if my contig file has 1000 sequences, BLAST only gives me hits for 500 of them. I have run it a few times, and it is always the same contigs that are missing. I am not using any thresholds because I just want to see what is being matched to these sequences. I thought that maybe these sequences just don't have any hits, so I extracted the missing sequences from the contig file and blasted a few of them separately, and I do get a result!
Why is blastn just skipping some contigs entirely?? The exact command I am using is below:
blastn -db nt -num_threads 32 -max_target_seqs 1 -max_hsps 1 \
-outfmt "6 qseqid sseqid qlen slen pident evalue score staxids stitle" \
-query contig.fasta > contig_blast.output && echo "DONE" contig_blast.output
I just want the top hit for each of my contigs, but I need it for every contig in the file.
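One way to pin down exactly which contigs are missing is to compare the headers in the query FASTA with the qseqid column (column 1) of the tabular report. This is a minimal sketch, assuming that the first whitespace-delimited word of each FASTA header is what BLAST reports as qseqid (usually the case for metaSPAdes-style contig names); missing_queries is a hypothetical helper, not part of BLAST+.

```shell
# missing_queries FASTA REPORT
# Hypothetical helper: print IDs that appear as FASTA headers but never in
# column 1 (qseqid) of a BLAST tabular (-outfmt 6) report.
# Assumption: the first whitespace-delimited word of each header is the ID
# that BLAST reports as qseqid.
missing_queries() {
  mq_fasta="$1"
  mq_report="$2"
  mq_ids=$(mktemp)
  mq_hits=$(mktemp)
  # Query IDs: strip the leading '>' and everything after the first whitespace
  grep '^>' "$mq_fasta" | sed 's/^>//; s/[[:space:]].*//' | LC_ALL=C sort -u > "$mq_ids"
  # Queries that received at least one hit line in the report
  cut -f1 "$mq_report" | LC_ALL=C sort -u > "$mq_hits"
  # Lines only in the first file = queries with no reported hit
  LC_ALL=C comm -23 "$mq_ids" "$mq_hits"
  rm -f "$mq_ids" "$mq_hits"
}

# Example:
#   missing_queries contig.fasta contig_blast.output > missing_ids.txt
```

The resulting ID list (one per line) can then be used to pull those records out of the contig file for a separate run.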
Hi, the number 500 is oddly specific and suggests some threshold is in play. blastn has two settings that default to 500: num_descriptions and max_target_seqs (from https://www.ncbi.nlm.nih.gov/books/NBK279684/). By the way, do those "missing" hits have a higher or lower bitscore when you find them separately than the ones reported in the big run?
That was just an example, and it isn't always 500. In general it seems that about half are being blasted and half are being skipped, but it isn't a strict 50% reduction. They have high bitscores. In fact, I thought contig length was coming into play and maybe these were super long, but the missing ones are often the shorter contigs. They also have high percent identity and good e-values, so I really have no explanation as to why these are being skipped...
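A quick way to test the threshold idea is to compare the number of queries submitted with the number of distinct qseqid values in the report: a count that always lands on a round default such as 500 would point to a limit, while a count that varies from file to file makes a fixed limit less likely. Note that the BLAST+ documentation describes max_target_seqs and num_descriptions as limits on database sequences per query, not on the number of queries, so a hard cap on queries would be unexpected. count_queries is a hypothetical helper written for illustration.

```shell
# count_queries FASTA REPORT
# Hypothetical helper: print how many sequences were submitted and how many
# distinct qseqid values (column 1) appear in a BLAST -outfmt 6 report.
count_queries() {
  cq_submitted=$(grep -c '^>' "$1")
  cq_reported=$(cut -f1 "$2" | sort -u | wc -l)
  # $(( )) strips any padding that some wc implementations add
  echo "submitted=$cq_submitted reported=$((cq_reported))"
}

# Example:
#   count_queries contig.fasta contig_blast.output
```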
What happens to the output when you get rid of && echo "DONE" contig_blast.output? Perhaps the installation is buggy? Have you tried grabbing a version from conda? Or try another sequence search tool (e.g., MMseqs2).
I did do a run before without "DONE", but I thought I might be missing an error and wanted some confirmation that BLAST was finishing cleanly, so I added the "DONE" to check this. It didn't matter: sequences are skipped either way. Unfortunately our servers don't support conda, so I can't get it from there. I had never heard of MMseqs2, but I'll check it out, thanks!
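Because the shell only runs the command after && when blastn exits with status 0, "DONE" appearing does confirm a clean exit. A slightly more informative check keeps blastn's stderr (where BLAST prints its warnings and errors) in a dedicated log file and reports the exit status explicitly. This is a minimal sketch; run_logged is a hypothetical helper, while -out is blastn's own option for writing the report to a file instead of redirecting stdout.

```shell
# run_logged LOGFILE COMMAND [ARGS...]
# Hypothetical helper: run COMMAND with its stderr saved to LOGFILE, print the
# exit status to stderr, and return that same status.
run_logged() {
  rl_log="$1"
  shift
  "$@" 2> "$rl_log"
  rl_code=$?
  if [ "$rl_code" -eq 0 ]; then
    echo "OK (exit 0); messages, if any, are in $rl_log" >&2
  else
    echo "FAILED (exit $rl_code); see $rl_log" >&2
  fi
  return "$rl_code"
}

# Example (same search as in the question, written to a file with -out):
#   run_logged blast.err blastn -db nt -num_threads 32 -max_target_seqs 1 -max_hsps 1 \
#     -outfmt "6 qseqid sseqid qlen slen pident evalue score staxids stitle" \
#     -query contig.fasta -out contig_blast.output
```

A zero exit status with an empty or harmless log would at least rule out the run being cut short (for example, killed partway through), which narrows the search to how BLAST is reporting the queries.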
Your servers don't support conda? As in, the user cannot write anything, even to their own directories? That doesn't make any sense.