Hi All,
I've predicted genes in a genome and now want to identify the corresponding proteins. To do that, I ran blastp with all predicted proteins against the UniProt database.
BLAST parameters:
blastp -query proteins.fasta -db Uniprotdb -max_target_seqs 1 -max_hsps 1 -out output.blastp -outfmt 6 -evalue 0.001
What should my blastp parameters be to get only significant matches?
Thank you in advance!!
What counts as a significant match for you? If you expect to always find nearly perfect matches in UniProt, then restrictive parameters would work, but if you also expect imperfect matches, you need parameters that accommodate them. It may be easier to let BLAST report more hits and then filter them with your favourite scripting language.
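As a rough illustration of the "report more, filter later" approach, here is a minimal Python sketch that reads the default 12-column `-outfmt 6` table and keeps hits above an identity cutoff and below an e-value cutoff. The function name `filter_hits`, the cutoff values, and the two example rows are made up for illustration; pick thresholds that fit your own definition of "significant".

```python
import csv
import io

# Default column order of `blastp -outfmt 6`
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def filter_hits(handle, min_pident=90.0, max_evalue=1e-10):
    """Yield hits (as dicts) that pass both cutoffs.

    The default cutoffs are illustrative assumptions, not recommendations.
    """
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines and comment lines (e.g. from -outfmt 7)
        if not row or row[0].startswith("#"):
            continue
        hit = dict(zip(COLUMNS, row))
        if float(hit["pident"]) >= min_pident and float(hit["evalue"]) <= max_evalue:
            yield hit


# Made-up example rows standing in for output.blastp
example = io.StringIO(
    "prot1\tsubjA\t98.500\t200\t3\t0\t1\t200\t1\t200\t1e-120\t400\n"
    "prot2\tsubjB\t35.000\t80\t50\t2\t10\t89\t5\t84\t5e-04\t45.1\n"
)
kept = list(filter_hits(example))
print([hit["qseqid"] for hit in kept])
```

For a real run you would pass `open("output.blastp")` instead of the in-memory example.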
Thanks for the reply. I want perfect matches, but which parameters should I set to get a good match?
If you're only looking for identical matches, BLAST is the wrong tool for the job. Just use grep, the string-matching function of a scripting language, or an implementation of a global alignment algorithm (e.g. needle in the EMBOSS suite). If you insist on BLAST, filter the output on alignment length and percent identity, i.e. only keep alignments (HSPs) that are full length relative to the query and have 100% identity.
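A minimal Python sketch of that filter follows. One caveat: the default `-outfmt 6` columns do not include the query length, so this assumes the search was rerun with `-outfmt "6 std qlen"`, which appends `qlen` as a 13th column. The function names and the example rows are hypothetical. Instead of comparing the rounded `pident` value to 100, the sketch requires zero mismatches and zero gap openings, which amounts to the same thing.

```python
import io

# Columns produced by `-outfmt "6 std qlen"` (assumed here)
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore", "qlen"]


def is_perfect_full_length(fields):
    """True if the HSP spans the whole query with no mismatches or gaps."""
    hit = dict(zip(COLUMNS, fields))
    return (
        int(hit["mismatch"]) == 0
        and int(hit["gapopen"]) == 0
        and int(hit["qstart"]) == 1
        and int(hit["qend"]) == int(hit["qlen"])
    )


def perfect_hits(handle):
    """Yield (query, subject) pairs for full-length, identical HSPs."""
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != len(COLUMNS):
            continue  # skip blank, comment, or malformed lines
        if is_perfect_full_length(fields):
            yield fields[0], fields[1]


# Made-up rows: a perfect hit, a partial hit, and a hit with one mismatch
example = io.StringIO(
    "prot1\tsubjA\t100.000\t150\t0\t0\t1\t150\t1\t150\t1e-100\t310\t150\n"
    "prot2\tsubjB\t100.000\t90\t0\t0\t11\t100\t1\t90\t2e-50\t180\t120\n"
    "prot3\tsubjC\t99.000\t100\t1\t0\t1\t100\t1\t100\t1e-60\t200\t100\n"
)
result = list(perfect_hits(example))
print(result)
```

This only checks coverage of the query, as described above. If the subject should also be covered end to end, adding `slen` to the output format and checking `sstart` and `send` against it would be one way to do that.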