Hi, I have a DNA sequence (about 388 bp) that I am comparing with GenBank sequences using BlastX. I understand that BlastX translates a DNA sequence in all six possible reading frames, but the outcome puzzles me: three different reading frames show similarity to the same protein (in a conserved region of the Peptidase M1 superfamily). Also, when I look closely at the alignments, the similarities in the three frames occur within the same region. The similarity is approximately 76% max identity, with an E-value of 2e-11.
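As a minimal sketch of what "all six reading frames" means, the Python below translates a DNA string in the three forward frames and in the three frames of its reverse complement, using only the standard genetic code (NCBI translation table 1). The function names are hypothetical. Stops are shown as "*", ambiguous codons as "X", and incomplete trailing codons are dropped. BlastX itself may number the minus-strand frames and handle ambiguity codes differently.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Only A, C, G, T and N are complemented; other IUPAC codes are left as-is.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate complete codons from position 0; unknown codons become 'X', stops '*'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frame_translation(seq: str) -> dict:
    """Translate seq in frames +1..+3 (forward) and -1..-3 (reverse complement)."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


if __name__ == "__main__":
    for frame, protein in six_frame_translation("ATGGCCTAA").items():
        print(frame, protein)
```

For the toy input ATGGCCTAA, frame +1 gives MA* (Met, Ala, stop) and frame -1 gives LGH, so even a 9 bp string yields six different peptides.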
Is this similarity most likely due to chance?
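For reference, the E-value that BLAST reports comes from Karlin-Altschul statistics. In its basic form, the expected number of alignments scoring at least S purely by chance is:

```latex
\[
  E = K \, m \, n \, e^{-\lambda S}
\]
```

Here m and n are the effective lengths of the query and the database, S is the raw alignment score, and K and λ are parameters of the scoring system. An E-value of 2e-11 therefore means that, if these statistical assumptions hold (for example, no strong compositional bias in the sequences), about 2 × 10^-11 alignments this good would be expected by chance in a search of this size. Query length enters through m, so the E-value already takes the short query into account.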
There are two things that make me suspect this:
1) My sequence is much shorter than the M1 peptidase sequences in GenBank (>1000 bp).
2) When I look at the translated reading frames of my sequence, there are stop codons spread throughout. Or could this be due to sequencing errors?
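One quick way to look at point 2 is to count the stop codons in every frame and find the longest stop-free stretch. The Python sketch below does this with the standard genetic code; the function names are hypothetical. It also gives a rough baseline under a simplifying assumption of uniform, independent base composition: 3 of the 64 codons are stops, so a frame of about 129 codons (388 bp / 3) would be expected to contain roughly 129 × 3/64 ≈ 6 stops by chance. Real sequences deviate from that assumption, so treat it only as an order-of-magnitude reference.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STOP_CODONS = {"".join(c) for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS) if aa == "*"}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def stop_codon_summary(seq: str) -> dict:
    """Map each frame (+1..+3, -1..-3) to (stop count, longest stop-free run in codons)."""
    seq = seq.upper()
    summary = {}
    for sign, strand in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            stops = longest = run = 0
            for i in range(offset, len(strand) - 2, 3):
                if strand[i:i + 3] in STOP_CODONS:
                    stops += 1
                    run = 0  # a stop codon breaks the open stretch
                else:
                    run += 1
                    longest = max(longest, run)
            summary[f"{sign}{offset + 1}"] = (stops, longest)
    return summary


def expected_random_stops(n_codons: int) -> float:
    """Expected stops if all 64 codons were equally likely (3 of them are stops)."""
    return n_codons * 3 / 64


if __name__ == "__main__":
    print(stop_codon_summary("ATGGCCTAA"))
    print(round(expected_random_stops(388 // 3), 1))
```

A frame that stays stop-free across the region aligned to the M1 peptidase hit is what one would expect from an intact coding sequence; stops inside that region in every frame are consistent with frameshifting sequencing errors, or with the region not being coding.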
Thanks for any help!
Repeating this comment regarding the use of BlastX with a frame shift penalty (-w option): I've found an interesting discussion here. I wonder what frame shift penalty values are typically used with BlastX. I bet the 3 reading frames are in the same direction, right?
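A small made-up example (not the poster's sequence) shows why sequencing errors can spread one real protein hit across several frames on the same strand: after a single inserted base, everything downstream of the insertion is read in the next forward frame, while a one-base deletion moves it from frame +1 to +3. An indel never moves the hit to the opposite strand. Roughly, a frame shift penalty in a frameshift-aware search lets a single alignment bridge such a shift at a cost, instead of reporting separate hits per frame. The helper names below are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate_frame(seq: str, frame: int) -> str:
    """Translate a forward frame (1, 2 or 3) using complete codons only."""
    start = frame - 1
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(start, len(seq) - 2, 3))


def frame_after_indel(frame: int, indel: int) -> int:
    """Forward frame in which downstream codons end up after an indel.

    indel > 0 is an insertion of that many bases, indel < 0 a deletion.
    """
    return (frame - 1 + indel) % 3 + 1


# Hypothetical coding sequence read in frame +1: M A A K G L E V
original = "ATGGCTGCTAAAGGTCTGGAAGTT"
# Simulated sequencing error: one extra base inserted after the third codon.
mutated = original[:9] + "C" + original[9:]

if __name__ == "__main__":
    print(translate_frame(original, 1))   # original protein
    print(translate_frame(mutated, 1))    # upstream part still in frame +1
    new_frame = frame_after_indel(1, +1)
    print(new_frame, translate_frame(mutated, new_frame))  # downstream part moved
```

Frame +1 of the mutated read starts with MAA and then turns into unrelated residues (MAAQRSGS), while frame +2 ends with KGLEV, the downstream part of the original protein. With several such errors in one read, the same protein region can show up in all three forward frames.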