standalone BLAST parametes
1
0
Entering edit mode
7.2 years ago
cvu ▴ 180

Hi All,

I've predicted genes in genome. Now I want to identify proteins, For that, I've blasted all predicted proteins against uniprot database.

blast parameters

blastp -query proteins.fasta -db Uniprotdb -max_target_seqs 1 -max_hsps 1 -out output.blastp -outfmt 6 -evalue 0.001

what should be my blastp parameters, to get only significant match ?

Thank you in advance!!

genome gene blast • 2.0k views
ADD COMMENT
0
Entering edit mode

What is a significant match for you ? If you expect to always find nearly perfect matches in Uniprot, then restrictive parameters would work but if you also expect imperfect matches you need to have parameters to accommodate them. It may be easier to let blast report more hits then filter these with your favourite scripting language.

ADD REPLY
0
Entering edit mode

Thanks for the reply. I want perfect matches, but which parameters to set to get a good match?

ADD REPLY
0
Entering edit mode

If you're only looking for identical matches blast is the wrong tool for the job. Just use grep or the string matching function of a scripting language or an implementation of a global alignment algorithm (e.g. needle in the EMBOSS suite). If you insist on blast, filter the output on alignment length and percent identity, i.e. only keep alignments (HSPs) that are full length relative to the query and 100% identity.

ADD REPLY
0
Entering edit mode
7.2 years ago

what should be my blastp parameters, to get only significant match ?

This question groups together with "what is the cure of cancer".

Functional annotation is a pain in the a**, you have to deal with it. There are no "optimal" parameters, and many results in the databases are either wrong or "unknown", "uncharacterized", "undefined".

From your command I can see that you are limiting the hits to 1 and the high-scoring pairs to 1 (hsp). Why? Are you working in a non-model organism I assume (like I do) so you did a gene prediction because there was none. However, you could allow for more hsps than 1 because many times you have more than 1 hsp per sequence. There is one script (this: find-best-hit.py ) which allows you to find the best combination of HSPs in a blast run from an xml output file. Give it a try!

ADD COMMENT
0
Entering edit mode

Thanks for the reply. I want only one best match for each protein, Hence I set hsp 1.

ADD REPLY
0
Entering edit mode

"HSP corresponds to the matching region between the query sequence and the database hit sequence." from High Scoring Pairs (HSP) in BLAST output

There can be many HSPs per match, and limiting your blast run to one per match may reduce / underestimate / misestimate the overall sequence identity.

I think you should make the effort to read the literature about it before doing your analysis blindly.

EDIT: this is the literature you need to read! http://jeff.wintersinger.org/posts/2014/07/designing-an-algorithm-to-compute-the-optimal-set-of-blast-hits/

ADD REPLY

Login before adding your answer.

Traffic: 1880 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6