Hi community, greetings!
I am running remote BLASTx from my Linux machine on a few nucleotide sequences (FASTA format) from some bacterial species. This is the command I'm using: /home/ncbi-blast-2.16.0+/bin/blastx -remote -query filtered_regions.fasta -db nr -entrez_query "E.coli [organism]" -out k12_blastx_res.txt
It reports, say, 30 regions with no hits. But when I specify a strain (e.g. -entrez_query "E.coli K-12 [organism]"), the number of no-hit regions drops drastically, to around 12-15.
Can anyone explain why this is happening? Logically, the general taxon (E. coli) should give fewer no-hit regions, because it also covers the other strains, and specifying a strain (E. coli K-12) should give more, because the other strains are excluded. I'm seeing the exact opposite.
Any help would be appreciated.
Thank you.
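To compare runs precisely, it helps to count the no-hit regions from a tabular report rather than by eye. The following is a minimal sketch that assumes each run is repeated with -outfmt 6 (or 7) instead of the default pairwise report. Tabular output lists only queries that produced hits, so the no-hit regions are the FASTA IDs that never appear in column 1. It also assumes that column 1 (qseqid) matches the first word of each FASTA header, which is usually true for simple headers such as >region_1. The function name no_hit_ids is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print query IDs from a FASTA file that have no hits
# in a BLAST tabular result (-outfmt 6 or 7).
# Assumption: qseqid (column 1) equals the first word of each FASTA header.
# Usage: no_hit_ids query.fasta hits.tsv
no_hit_ids() {
  # The hits file is passed first so it can be recognised via ARGV[1]
  # (this also works when the hits file is empty).
  awk '
    FILENAME == ARGV[1] {
      # Record every query ID that has at least one hit; skip -outfmt 7 comments.
      if (NF && $1 !~ /^#/) hit[$1] = 1
      next
    }
    /^>/ {
      # FASTA header: ID is the first word without the leading ">".
      id = substr($1, 2)
      if (!(id in hit) && !(id in seen)) { print id; seen[id] = 1 }
    }
  ' "$2" "$1"
}

# Example (hypothetical file names):
#   no_hit_ids filtered_regions.fasta species_hits.tsv | wc -l
```

Running it once per Entrez filter gives directly comparable no-hit counts for the same query file.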
How about using the online service, where you can specify the organism? https://blast.ncbi.nlm.nih.gov/Blast.cgi?PROGRAM=blastx&PAGE_TYPE=BlastSearch&LINK_LOC=blasthome
Hi, thanks for your response. I actually started with the online service, but the project needs a BLAST step that can be automated (containerized) within the pipeline, which is why I switched to running remote BLAST from the Linux command line.
What do you mean by that? That 30 sequences from your query show no hits?
Yes. If I have 100 regions in my query file, 70 of them get hits (hypothetical/non-redundant proteins) and 30 get no hits (i.e. neutral regions, meaning they don't code for any kind of protein).
By my logic, if I give the whole organism as the Entrez query, it should cover all strains of that organism and give fewer neutral (no-hit) regions; if I give a strain, it should ignore the other strains and give more neutral regions.
But I'm getting the opposite.
And E. coli is just an example; it's happening with other organisms as well.
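The expectation described above can be checked directly: every region with a hit in the strain-restricted run should also have a hit in the species-level run. Below is a minimal sketch, assuming both runs were written as BLAST tabular output (-outfmt 6 or 7, query ID in column 1). It prints the query IDs that have at least one hit in the second file but none in the first. The function name hits_only_in_second is hypothetical. By the reasoning above this list should be empty; any IDs it prints are the regions worth inspecting, for example by comparing their evalue columns between the two runs.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print query IDs that have hits in the second tabular
# BLAST result but none in the first.
# Usage: hits_only_in_second species_hits.tsv strain_hits.tsv
hits_only_in_second() {
  awk '
    FILENAME == ARGV[1] {
      # First file: remember every query ID that has a hit (skip comments).
      if (NF && $1 !~ /^#/) first[$1] = 1
      next
    }
    # Second file: report each unseen query ID once, in order of appearance.
    NF && $1 !~ /^#/ && !($1 in first) && !($1 in seen) {
      print $1
      seen[$1] = 1
    }
  ' "$1" "$2"
}
```

Swapping the arguments lists the regions that hit only at the species level, which is the direction the original expectation predicts.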
Try using the taxID for the organism, since that may be a better filter. You should also check the default values of the other parameters (word size, etc.) to make sure they suit what you are trying to do. Any value you don't change stays at its default, which people tend to forget at times.
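The taxID suggestion could look roughly like the sketch below. It assumes blastx from BLAST+ is on PATH (or that the hypothetical BLASTX variable holds its full path). It also uses Entrez's txid<N>[Organism:exp] syntax, which selects a taxon and everything below it. The helper name run_blastx_taxid is illustrative, and the flags are standard BLAST+ options, but check blastx -help for your version. -evalue 10 is the BLAST+ default and is written out only so that both runs visibly use the same cutoff. Other parameters can be pinned the same way.

```bash
#!/usr/bin/env bash
# Hypothetical helper: remote BLASTx against nr, restricted by an NCBI
# Taxonomy ID instead of an organism name.
# BLASTX may point to the blastx binary; otherwise blastx on PATH is used.
run_blastx_taxid() {
  local query="$1" taxid="$2" out="$3"
  # txid<N>[Organism:exp]: taxon N plus all taxa below it (e.g. strains).
  # -evalue 10 is the default, stated explicitly so both runs match.
  # Tabular output keeps the evalue column for comparing runs.
  "${BLASTX:-blastx}" -remote \
    -query "$query" \
    -db nr \
    -entrez_query "txid${taxid}[Organism:exp]" \
    -evalue 10 \
    -outfmt "6 qseqid sseqid pident length evalue bitscore" \
    -out "$out"
}

# Example usage (taxIDs to double-check in NCBI Taxonomy:
# 562 = Escherichia coli, 83333 = E. coli K-12):
#   run_blastx_taxid filtered_regions.fasta 562   species_hits.tsv
#   run_blastx_taxid filtered_regions.fasta 83333 k12_hits.tsv
```

Keeping every option in one function means the species-level and strain-level runs differ only in the taxID, so any change in the no-hit count comes from the filter rather than from mismatched settings.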
I tried a taxID earlier but ran into the same issue. I'll look into setting the parameters. Thanks for your suggestion.