Question

How Can 3 Different Reading Frames of the Same Sequence Show Similarity in BLASTX?

13.8 years ago

Worldalive

Hi, I have a DNA sequence (about 388 bp) that I am comparing with GenBank sequences using BLASTX. I understand that BLASTX translates a DNA sequence in all six reading frames, but the result puzzles me: three different reading frames show similarity to the same protein (a conserved region of the Peptidase M1 superfamily). Also, when I look closely at the alignments, the similarities in the three frames fall within the same region. The hits have a maximum identity of about 76% and an E-value of 2e-11.
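
For readers unsure what BLASTX does to the query, here is a minimal sketch of six-frame translation, assuming the standard genetic code (NCBI table 1). The function names and the toy sequence are hypothetical, and the frame labels follow one common convention (+1 to +3 start at offsets 0, 1 and 2 of the forward strand, -1 to -3 at the same offsets of the reverse complement). BLASTX does this internally; this is only an illustration, not its implementation.

```python
from itertools import product

# Standard genetic code (NCBI table 1); codons enumerated in TCAG order,
# first base varying slowest.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate complete codons; '*' marks stops, 'X' marks codons with non-ACGT bases."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def six_frames(dna: str) -> dict[str, str]:
    """Frames +1..+3 start at offsets 0..2 of the forward strand, -1..-3 of the reverse complement."""
    forward, reverse = dna.upper(), reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(forward[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames


if __name__ == "__main__":
    # Hypothetical toy sequence, not the poster's read.
    for frame, protein in six_frames("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG").items():
        print(frame, protein)
```

Frames +1, +2 and +3 are the same strand read from three different starting offsets, so a hit in several of them means different parts of one read matched in different offsets.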

Is this "similarity" most likely due to chance?

Two things make me think so:

1) My sequence is much shorter than the >1000 bp M1 peptidase sequences in GenBank.

2) When I look at the translated reading frames of my sequence, there are stop codons scattered throughout. Or could these be due to sequencing errors?
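
One quick way to look at point 2 is to count stop codons per frame and measure the longest stop-free stretch. The sketch below assumes the standard genetic code; `stop_summary` and the test sequence are hypothetical. A genuine coding region read in its correct frame normally gives one long stop-free stretch, so stops scattered through every frame of a ~388 bp read are consistent with either non-coding sequence or frameshifting sequencing errors; the counts alone cannot tell these apart.

```python
from itertools import product

# Standard genetic code (NCBI table 1), TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna: str) -> str:
    """Translate complete codons; '*' marks stops, 'X' marks codons with non-ACGT bases."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def stop_summary(dna: str) -> dict[str, tuple[int, int]]:
    """Per frame: (number of stop codons, longest stop-free stretch in codons)."""
    forward = dna.upper()
    reverse = forward.translate(COMPLEMENT)[::-1]
    summary = {}
    for strand, seq in (("+", forward), ("-", reverse)):
        for offset in range(3):
            protein = translate(seq[offset:])
            longest = max(len(segment) for segment in protein.split("*"))
            summary[f"{strand}{offset + 1}"] = (protein.count("*"), longest)
    return summary


if __name__ == "__main__":
    # Hypothetical toy sequence, not the poster's read.
    for frame, (stops, longest) in stop_summary("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG").items():
        print(f"{frame}: {stops} stops, longest stop-free stretch {longest} codons")
```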

Thanks for any help!

Updated 13.7 years ago by Larry_Parnell.

Repeating this comment about running BLASTX with a frameshift penalty (the -w option): I found an interesting discussion here. I wonder what frameshift penalty values are typically used with BLASTX.

Written 13.8 years ago by Woa; updated 5.2 years ago by Ram.

I bet the 3 reading frames are in the same direction, right?

Written 13.7 years ago by Chris Evelo.

score 4 · Answer 1 · 2011-02-17

13.8 years ago

Ketil

This is probably too obvious, but this could happen if the sequence is a low-complexity or repeat region. BLAST normally masks low-complexity regions (LCRs), but perhaps you ran it with -F F (filtering turned off)?
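
To get a rough feel for whether a translated region is low complexity, you can scan it with a sliding-window Shannon entropy. This is a crude stand-in, not BLAST's filter: BLAST's own filtering uses SEG for protein sequences and DUST for nucleotide sequences. The window of 12 residues and the 2.2-bit threshold are arbitrary illustrative values, and the function names and test peptide are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(seq: str) -> float:
    """Shannon entropy, in bits per residue, of the residue composition of seq."""
    counts = Counter(seq)
    total = len(seq)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def low_complexity_spans(protein: str, window: int = 12, threshold: float = 2.2) -> list[tuple[int, int]]:
    """Merge every window whose entropy is below threshold into (start, end) spans.

    window and threshold are illustrative choices; this is not a reimplementation of SEG.
    """
    spans: list[tuple[int, int]] = []
    for start in range(len(protein) - window + 1):
        if shannon_entropy(protein[start:start + window]) < threshold:
            end = start + window
            if spans and start <= spans[-1][1]:
                spans[-1] = (spans[-1][0], end)  # overlaps the previous span: extend it
            else:
                spans.append((start, end))
    return spans


if __name__ == "__main__":
    # Hypothetical peptide: a poly-Q run flanked by varied residues.
    peptide = "ACDEFGHIKLMN" + "Q" * 20 + "PRSTVWYACDEF"
    print(low_complexity_spans(peptide))  # [(7, 37)]
```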

score 2 · Answer 2 · 2011-02-17

Sequencing errors can introduce indels that shift the reading frame. It is common for a single nucleotide sequence to produce several BLAST high-scoring segment pairs (HSPs) against the same reference protein in different reading frames. Does your sequence come from a 454 experiment? The typical 454 errors, indels in homopolymer runs, usually cause frameshifts that could explain what you see. It would also help to see the BLAST output you get.
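
The frameshift mechanism can be seen with a toy example. The sketch below, assuming the standard genetic code and a made-up 30 bp coding sequence (not real data), deletes a single base and translates the three forward frames. The protein upstream of the deletion stays in frame +1 while the downstream part reappears in frame +3, which is how one read can give two HSPs against the same protein in different frames; a single-base insertion would move the downstream part to frame +2 instead. `delete_base` and `forward_frames` are hypothetical helpers.

```python
from itertools import product

# Standard genetic code (NCBI table 1), TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Translate complete codons; '*' marks stops, 'X' marks codons with non-ACGT bases."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def delete_base(dna: str, position: int) -> str:
    """Simulate a single-base deletion, like an indel introduced by a sequencing error."""
    return dna[:position] + dna[position + 1:]


def forward_frames(dna: str) -> dict[str, str]:
    return {f"+{offset + 1}": translate(dna[offset:]) for offset in range(3)}


if __name__ == "__main__":
    # Hypothetical coding sequence: ATG GCT AAA GAA TGG CAT CGT TTA GGT CCA
    coding = "ATGGCTAAAGAATGGCATCGTTTAGGTCCA"
    print("reference:", translate(coding))  # MAKEWHRLGP
    read = delete_base(coding, 15)  # drop the first base of the CAT codon
    for frame, protein in forward_frames(read).items():
        print(frame, protein)
    # +1 MAKEWIV*V  (upstream MAKEW kept)
    # +2 WLKNGSFRS
    # +3 G*RMDRLGP  (downstream RLGP recovered)
```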

score 2 · Answer 3 · 2011-02-17

It is not just low-complexity regions that give the result you describe, but any repetitive sequence. This becomes a problem when a repeat is wrongly incorporated into a gene model: sequence that should be annotated as a genomic repeat or low-complexity region then ends up in the protein database.

Try it yourself: take a human Alu sequence and run it against a protein database. I'm sure many of those hits come from bad gene models.
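
As a toy illustration of why repetitive DNA can match in several frames, consider a perfect tandem repeat whose unit length is not a multiple of 3. Each forward frame then encodes the same periodic peptide, only rotated, so a hit in one frame tends to recur in the other two. The sketch assumes the standard genetic code; the unit `GCAT` and the function names are hypothetical. An Alu element is not a simple tandem repeat, so this shows only one mechanism, not what happens with Alu specifically.

```python
from itertools import product

# Standard genetic code (NCBI table 1), TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Translate complete codons; '*' marks stops, 'X' marks codons with non-ACGT bases."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def repeat_periods(unit: str, copies: int = 10) -> dict[str, str]:
    """Translate the forward frames of unit * copies and return one period
    (len(unit) residues) from each frame."""
    if len(unit) % 3 == 0:
        raise ValueError("unit length must not be a multiple of 3")
    if copies < 5:
        raise ValueError("need at least 5 copies to fill one period in every frame")
    repeat = (unit * copies).upper()
    return {f"+{offset + 1}": translate(repeat[offset:])[: len(unit)] for offset in range(3)}


def is_rotation(a: str, b: str) -> bool:
    """True if b is a cyclic rotation of a."""
    return len(a) == len(b) and b in a + a


if __name__ == "__main__":
    periods = repeat_periods("GCAT")  # hypothetical repeat unit
    print(periods)  # {'+1': 'ACMH', '+2': 'HACM', '+3': 'MHAC'}
    print(all(is_rotation(periods["+1"], p) for p in periods.values()))  # True
```

Because 3 and the unit length share no common factor, each frame steps through every offset within the unit, which is why the three peptides are rotations of one another.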