11 months ago
Raman
How should I interpret BLAST results? For example, when I BLAST Gohir.D13G088900 against Arabidopsis thaliana, I get two main hits:
1. At1g62040 with e-value 1e-67, percent identity 88%, and score 200 bits (508)
2. At2g05630 with e-value 2e-66, percent identity 92%, and score 197 bits (500)
Of these two hits, which one should be considered the best hit, and why? Please explain.
I would suggest looking at the alignments to see which is the better hit when comparing two results. One could be a high percent identity match over a much shorter part of the query, whereas the other could be a much longer match with lower percent identity. In that scenario I would probably select the latter as the better hit, but it really depends on what I'm looking for.
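To make the "short but high identity" versus "long but lower identity" comparison concrete, here is a minimal sketch that computes query coverage for each hit. It assumes you exported tabular BLAST output with the `pident`, `qstart`, `qend`, `qlen` and `bitscore` columns. The hit names and numbers are hypothetical and are not taken from the results in the question.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    subject: str
    pident: float    # percent identity over the aligned region only
    qstart: int      # 1-based start of the alignment on the query
    qend: int        # 1-based end of the alignment on the query
    qlen: int        # full length of the query sequence
    bitscore: float


def query_coverage(hit: Hit) -> float:
    """Percent of the query that is covered by the alignment."""
    return 100.0 * (hit.qend - hit.qstart + 1) / hit.qlen


def identity_over_query(hit: Hit) -> float:
    """Percent identity rescaled to the full query length."""
    return hit.pident * query_coverage(hit) / 100.0


# Hypothetical hits: hitA is more identical but covers only half the query.
hits = [
    Hit("hitA", pident=92.0, qstart=1, qend=100, qlen=200, bitscore=197.0),
    Hit("hitB", pident=88.0, qstart=1, qend=190, qlen=200, bitscore=200.0),
]

# Rank by bit score first, then by coverage as a tie-breaker.
ranked = sorted(hits, key=lambda h: (h.bitscore, query_coverage(h)), reverse=True)

for h in ranked:
    print(h.subject, h.bitscore, query_coverage(h), identity_over_query(h))
```

In this made-up case the hit with lower percent identity ranks first because it aligns over most of the query. Whether that is the right call still depends on what you are looking for.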
Basically, I am trying to do genome-wide identification of some particular genes in a Gossypium species by running blastp with known Arabidopsis sequences against the Gossypium genome. The hits I get in Gossypium are then aligned back to Arabidopsis to confirm the best hit. I am having difficulty handling those results because I want a single best hit when aligning my Gossypium results back to Arabidopsis, but the top hit in Arabidopsis differs depending on whether I rank by e-value or by percent identity, as discussed above. So I am confused about which one to choose. Can I rely on the pairwise alignment to decide the best hit?
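The forward-then-back search described here is usually called a reciprocal best hit check. A minimal sketch of it follows, assuming each search has been reduced to rows of (query, subject, e-value, bit score). Using the bit score as the ranking key, with the e-value as a tie-breaker, is an assumption of this sketch, not something established in this thread. The gene names and values are hypothetical.

```python
def best_hit_per_query(rows):
    """Return {query: subject}, keeping the top-scoring subject per query.

    Rows are (query, subject, evalue, bitscore). A higher bit score wins,
    and a lower e-value breaks ties.
    """
    best = {}
    for query, subject, evalue, bitscore in rows:
        key = (bitscore, -evalue)
        if query not in best or key > best[query][0]:
            best[query] = (key, subject)
    return {query: subject for query, (_, subject) in best.items()}


def reciprocal_best_hits(forward_rows, reverse_rows):
    """Pairs (a, b) where b is a's best hit and a is b's best hit."""
    forward = best_hit_per_query(forward_rows)
    reverse = best_hit_per_query(reverse_rows)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Hypothetical forward search: Arabidopsis queries against Gossypium.
forward_rows = [
    ("AtGene1", "GhGeneA", 1e-70, 210.0),
    ("AtGene1", "GhGeneB", 1e-60, 180.0),
    ("AtGene2", "GhGeneB", 1e-50, 150.0),
]

# Hypothetical reverse search: the Gossypium hits back against Arabidopsis.
reverse_rows = [
    ("GhGeneA", "AtGene1", 1e-67, 200.0),
    ("GhGeneA", "AtGene2", 2e-66, 197.0),
    ("GhGeneB", "AtGene1", 1e-55, 170.0),
]

pairs = reciprocal_best_hits(forward_rows, reverse_rows)
print(pairs)
```

Ranking on one fixed key gives a single best hit per query, so the comparison no longer flips between e-value and percent identity. Pairs that fail the reciprocal test, like AtGene2 and GhGeneB in this toy data, are the ones worth checking with an alignment or a tree.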
You may want to build a real phylogenetic tree from a multiple alignment of the amino acid sequences.
Yes, I am doing that.
So indeed - take the gene family (perhaps also including several species between Arabidopsis and Gossypium), make a multiple alignment with Mafft or similar, then build a tree... you should be able to see what are one-to-one vs one-to-many orthologs
In that case maybe clustering genes into orthology groups might be useful. Depending on the number of genes of interest you have, you could use something like OrthoFinder to cluster. It also emits a phylogeny and MSA for each orthogroup it defines. Though it does feel a little overkill for just 2 genomes.