Hi guys,
I am a little stuck choosing the right BLAST result. I have a protein sequence from a non-model organism I am interested in (Primula veris, a plant). I have successfully BLASTed this protein against two databases: one from A. thaliana (one of the best-reviewed and most reliable genomes) and one from Actinidia chinensis, the closest relative of my study species with a fully annotated genome. Here are the BLAST results:
A. thaliana BLAST: Protein ID: NP_191888.2, Identity: 46.36%, E-value: 4e-118, bitscore: 353 (2-oxoglutarate (2OG) and Fe(II)-dependent oxygenase superfamily protein)
A. chinensis BLAST: Protein ID: PSR94821.1, Identity: 58.60%, E-value: 1e-154, bitscore: 448 (Serine--tRNA ligase)
Both look like good hits (very low E-values). Identity is higher against A. chinensis for many of the protein sequences I have tested, which is expected given the smaller genetic distance. How do I decide which of these proteins is the right one?
Cheers
Can you add some information about how long your query protein sequence is? What does the alignment look like in terms of how much of that sequence is covered by the hit? At face value, those two proteins appear unrelated (unless they share common domains, which is where your hit would be). I would conservatively say that the hit against the genome closest to the one you are working with is probably more reliable, IF everything has been done right.
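To put a number on "how much of that sequence is covered", one option is the fraction of query residues that fall inside at least one HSP. The sketch below assumes you already have the HSP query coordinates (`qstart`, `qend`, 1-based and inclusive, as in BLAST+ tabular output) and the query length; the function name `query_coverage` is just an illustration. BLAST+ can also report coverage directly via the `qcovs` and `qcovhsp` fields of `-outfmt`, which may be simpler if you can rerun the search.

```python
def query_coverage(hsps, qlen):
    """Percent of the query covered by at least one HSP (hypothetical helper).

    hsps: iterable of (qstart, qend) pairs, 1-based inclusive coordinates.
    qlen: length of the query sequence in residues.
    """
    # Normalise orientation in case start > end, then sort by start position.
    intervals = sorted((min(s, e), max(s, e)) for s, e in hsps)
    covered = 0
    cur_start = cur_end = None
    for start, end in intervals:
        if cur_end is None or start > cur_end + 1:
            # Gap before this HSP: close the previous merged interval.
            if cur_end is not None:
                covered += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent HSP: extend the current merged interval.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start + 1
    return 100.0 * covered / qlen
```

Overlapping HSPs are merged first, so residues aligned by two HSPs are not counted twice. A strong hit that covers only a short stretch of the query (for example, one shared domain) would show up here as low coverage.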
What happens if you just run DELTA-BLAST with your protein at NCBI? What protein family do you get in the results?

Thanks for the fast reply! I appreciate it. I used command-line BLAST, so the values I posted come from that run. Now I have BLASTed the sequence in the browser at NCBI. The results for A. thaliana are the same. The results for A. chinensis, however, are completely different, with no good hits... Strange, since I downloaded the FASTA files from NCBI, built databases for both of them in the same way, and used the same BLAST parameters. Thanks again for replying.
Do keep in mind (and this is important!) that E-values are NOT transferable/comparable between searches against different databases, because an E-value depends on the size of the database searched. Moreover, the implementation of the online NCBI BLAST is a little different from that of the standalone version.
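A quick way to see why E-values do not carry over between databases: in BLAST statistics the expected number of chance hits for a bit score S' is approximately E = m · n · 2^(-S'), where m is the query length and n is the total number of residues in the database. The sketch below ignores the effective-length corrections BLAST actually applies, so treat it as an approximation; the database sizes in the usage note are hypothetical.

```python
def approx_evalue(bitscore, query_len, db_residues):
    """Approximate E-value from a bit score: E = m * n * 2**(-S').

    Ignores BLAST's edge-effect (effective length) corrections.
    """
    return query_len * db_residues * 2.0 ** (-bitscore)
```

For example, `approx_evalue(50, 300, 1e7)` and `approx_evalue(50, 300, 1e8)` differ by exactly a factor of ten: the same alignment quality gets a tenfold larger E-value in a database ten times bigger. The bit score itself does not depend on database size, which is why it is the more useful number to compare across searches.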
I'm also keen to know what your "definition of best hit" is. Best hit for what purpose? Or do you just want the most similar sequence?
Why don't you create a single DB containing both of these datasets, run one BLAST search against it, and pick the top hit?
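If both proteomes go into one database (for example, concatenate the two FASTA files and run `makeblastdb -in combined.faa -dbtype prot` on the result; the file name is hypothetical), picking the top hit per query from tabular output can be sketched as below. This assumes the default BLAST+ `-outfmt 6` column order and ranks hits by bit score; `best_hits` is an illustrative name, and "top hit" here means most similar, not necessarily the ortholog.

```python
import csv

# Default column order of BLAST+ "-outfmt 6"; adjust if you requested custom columns.
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(lines):
    """Return {qseqid: record} keeping the highest-bitscore hit per query."""
    best = {}
    for row in csv.reader(lines, delimiter="\t"):
        # Skip blank lines and comment lines (e.g. from -outfmt 7).
        if not row or row[0].startswith("#"):
            continue
        rec = dict(zip(COLUMNS, row))
        for field in ("pident", "evalue", "bitscore"):
            rec[field] = float(rec[field])
        query = rec["qseqid"]
        if query not in best or rec["bitscore"] > best[query]["bitscore"]:
            best[query] = rec
    return best
```

Usage would be along the lines of `with open("hits.tsv") as fh: top = best_hits(fh)` (hypothetical file name). Because both proteomes are searched in one run, their hits share the same search space, so the bit scores and E-values are directly comparable.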
It would still be useful to get answers to the questions I asked in my comment above.
Did you run blastp and then DELTA-BLAST on the NCBI site?

What is the "right" protein for you? If you're trying to find orthologs, you should probably build a phylogenetic tree.