Question

Remote BLASTx : Strain specific search.

12 weeks ago

Saransh • 0

Hi community, Greetings!

I am using remote BLASTx on my Linux machine for some bacterial species (a few nucleotide sequences in FASTA format). This is the command I'm using: /home/ncbi-blast-2.16.0+/bin/blastx -remote -query filtered_regions.fasta -db nr -entrez_query "E.coli [organism]" -out k12_blastx_res.txt

And it is providing me let's say 30 regions with No hits. But when I specify a strain (e.g. -entrez_query "E.coli K-12 [organism]"), the number of regions with No hits decreases drastically (to around 12-15).

Can anyone explain why this is happening? Logically, using the general taxon (E.coli) should give fewer No hits, since it must cover the other strains as well, and specifying the strain (E.coli K-12) should give more No hits, since it must ignore the other strains. But I'm seeing exactly the opposite.

Your help will be appreciated.

Thank you.

Strain_specific_search BLASTx BLAST • 527 views

ADD COMMENT • link 11 weeks ago by Saransh • 0

How about using the online service, where you can specify the organism? https://blast.ncbi.nlm.nih.gov/Blast.cgi?PROGRAM=blastx&PAGE_TYPE=BlastSearch&LINK_LOC=blasthome

ADD REPLY • link 12 weeks ago by shenwei356 8.7k

Hi, thanks for your response. In the beginning I was using the online service, but the project needs a BLAST that can be automated (containerized) within the pipeline. That's why I switched to Linux-based remote BLAST.

ADD REPLY • link 12 weeks ago by Saransh • 0

And it is providing me let's say 30 regions with No hits.

What does this mean? 30 sequences from your query show no hits?

ADD REPLY • link 12 weeks ago by GenoMax 147k

Yes. If I had 100 regions in my query file, 70 regions would get hits (hypothetical/non-redundant proteins) and 30 regions would get No hits (i.e. neutral regions, meaning they don't code for any kind of protein).

According to my logic: if I provide the whole organism as the Entrez query, it should cover all the strains of that organism and give fewer neutral regions (with No hits), and if I provide a strain in the Entrez query, it should ignore the other strains and give more neutral regions.

But I'm getting the opposite.

And E. coli is just an example; it's happening with other organisms as well.

ADD REPLY • link 12 weeks ago by Saransh • 0
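
To make the "regions with No hits" count concrete, here is a minimal sketch that lists which queries got no hits in a report. It assumes the default pairwise text output (no -outfmt given, as in the command from the question), where each record starts with a "Query= " line and a query without hits contains "***** No hits found *****". The function name and the sample report are made up for illustration.

```python
def no_hit_queries(report_text):
    """Return the IDs of queries whose record contains 'No hits found'.

    Assumes default pairwise BLAST text output, where every record
    starts with a line beginning 'Query= '.
    """
    missing = []
    current = None  # ID of the record currently being read
    for line in report_text.splitlines():
        if line.startswith("Query= "):
            parts = line[len("Query= "):].split()
            current = parts[0] if parts else ""
        elif "No hits found" in line and current is not None:
            missing.append(current)
            current = None  # count each record at most once
    return missing


# Hypothetical two-record report, trimmed to the lines that matter here.
SAMPLE_REPORT = """Query= region_1

Length=300

***** No hits found *****

Query= region_2

Length=450

Sequences producing significant alignments:
"""

print(no_hit_queries(SAMPLE_REPORT))
```

Running this on the reports from the species-level and the strain-level search and comparing the two lists (for example with a set difference) shows exactly which regions change status between the runs, which is more informative than the raw counts.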

Entrez query, It should cover all the strains of that organism

Try using taxID for the organism since that may be a better filter. You should also look at default values for other parameters (word length etc) to make sure they are optimal for what you are trying to do. If you don't change a value then the default values are always in use. People tend to forget that at times.

ADD REPLY • link 11 weeks ago by GenoMax 147k
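
One way the taxID suggestion could look inside a pipeline is sketched below. It only builds the argument list and does not run anything. The `txid<N>[Organism:exp]` form is Entrez syntax for a taxon plus its descendants; the taxIDs 562 (Escherichia coli) and 83333 (Escherichia coli K-12) are NCBI Taxonomy IDs. The function name, its parameters, and the `extra` argument are assumptions for illustration, and whether this changes the No hits counts is untested.

```python
def build_blastx_cmd(query, taxid, out, blastx="blastx", extra=None):
    """Build a remote blastx command restricted to one NCBI taxon.

    'extra' is an optional list of further arguments, so that values
    such as -evalue or -word_size are set explicitly rather than
    left at their defaults.
    """
    cmd = [
        blastx, "-remote",
        "-query", query,
        "-db", "nr",
        # Taxon plus all of its descendants (strains, substrains)
        "-entrez_query", f"txid{taxid}[Organism:exp]",
        "-out", out,
    ]
    if extra:
        cmd.extend(extra)
    return cmd


# Species-level and strain-level searches over the same query file.
species_cmd = build_blastx_cmd("filtered_regions.fasta", 562, "ecoli_blastx_res.txt")
strain_cmd = build_blastx_cmd("filtered_regions.fasta", 83333, "k12_blastx_res.txt")
print(" ".join(species_cmd))
print(" ".join(strain_cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` in the pipeline; keeping everything except the taxID identical makes the two result files directly comparable.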

I tried with taxid earlier but was facing the same issue. I'll surely look into setting the parameters. Thanks for your suggestion.

ADD REPLY • link 11 weeks ago by Saransh • 0