Question

Frame Shifts And Multiple Alignments

2

13.5 years ago

Gimly_Gloin ▴ 70

Is there a way to align DNA reads (454 raw reads) to a protein reference sequence that is already aligned? The purpose is to remove or fix frameshifts in the multiple alignment. As far as I can see, blastx will show whether there are frameshifts, but will not show a "fixed" alignment.

Any ideas? Thanks in advance.

multiple dna protein • 4.9k views

ADD COMMENT • link updated 13.5 years ago by Ari ▴ 120 • written 13.5 years ago by Gimly_Gloin ▴ 70

0

What do you mean by "that is already aligned"? Aligned to other protein sequences?

ADD REPLY • link 13.4 years ago by Ahdf-Lell-Kocks ★ 1.6k

0

I mean that I have a protein alignment that I know is correct (from other sources). I want to align my 454 reads to it (as a reference alignment), but my reads have frameshifts. I've been trying to find software that would fix those frameshifts to get a better alignment, but have so far had no luck. The protein alignment is amino acids; the reads are DNA. I've translated the reads in all 6 reading frames but I still have frameshifts. I could fix them manually, but would prefer an automated scheme, as I have about 500 reads.

ADD REPLY • link 13.4 years ago by Gimly_Gloin ▴ 70

score 1 · Answer 1 · 2012-03-06

1

13.4 years ago

Hamish ★ 3.3k

This sounds like you want to map the reads, which may contain frame shifts, on to a set of reference proteins to identify the mappings and frame-shifts. Assuming this is the case then a possible approach, using the FASTA suite, would be to:

Create a fasta sequence format database from your protein sequences.
Search the database using 'fastx' (faster, but limited to between-codon shifts) or 'fasty' (slower, but handles both in-codon and between-codon shifts), see "Comparison of DNA Sequences with Protein Sequences" (PMID:9403055 or PDF). The alignments produced indicate frame-shifts using the '/' and '\' characters in the query sequence displayed in the alignment, for positive and negative shifts.

From this point you know where the frame-shifts should occur in the reads and can adjust accordingly.

This is basically just a variation on EST to protein mapping techniques. A quick search in the literature finds "EST2Prot: mapping EST sequences to proteins." which might be worth a look.

ADD COMMENT • link 13.4 years ago by Hamish ★ 3.3k
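As a rough illustration of the last step, here is a minimal Python sketch that takes the translated query line from one fastx/fasty alignment and reports where the '/' and '\' frame-shift marks fall, then strips them to give a frame-corrected protein string. It assumes you have already pulled the query line out of the report as one string and that '-' is the gap character; the function names are hypothetical and not part of the FASTA suite or BioPerl.

```python
# Hypothetical helpers; the input is the translated query line of a
# fastx/fasty alignment, already extracted from the report as one string.
FRAMESHIFT_MARKS = {"/", "\\"}


def locate_frameshifts(aligned_query, gap="-"):
    """Return (residue_index, mark) for each frame-shift mark.

    residue_index is the number of amino acids seen before the mark,
    so it is independent of alignment gaps.
    """
    positions = []
    residues = 0
    for ch in aligned_query:
        if ch in FRAMESHIFT_MARKS:
            positions.append((residues, ch))
        elif ch != gap:
            residues += 1
    return positions


def corrected_protein(aligned_query, gap="-"):
    """Drop frame-shift marks and gaps, leaving the translated residues."""
    return "".join(
        ch for ch in aligned_query
        if ch not in FRAMESHIFT_MARKS and ch != gap
    )


if __name__ == "__main__":
    query_line = "MKV/LT-A\\GG"  # made-up example, not real output
    print(locate_frameshifts(query_line))
    print(corrected_protein(query_line))
```

The corrected protein strings can then be added to the reference protein alignment with a normal protein aligner. Mapping the marks back to exact DNA coordinates needs the alignment start position and strand from the report, which this sketch does not handle.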

0

I'm still reading through the documentation. Do you know if there is a way to extract the best-scoring alignments as fasta sequences?

ADD REPLY • link 13.4 years ago by Gimly_Gloin ▴ 70

0

I believe that BioPerl supports 'fastx'/'fasty' output, so you could look there for something suitable. Alternatively, you could try using MView together with the various output control options ('-b', '-d', '-E', '-F' and '-m') of the FASTA programs to select the required alignment and convert it into an easier format.

ADD REPLY • link 13.4 years ago by Hamish ★ 3.3k
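For use in a pipeline, one possible route is to have the search write a tab-separated hit table and keep the top hit per read. The sketch below assumes a BLAST-style 12-column table (query in column 1, subject in column 2, bit score in column 12); whether your FASTA version can write that layout through '-m' is something to check in its documentation. The function name is hypothetical.

```python
import csv


def best_hits(lines):
    """Map each query id to (subject id, bit score) of its best hit.

    `lines` is any iterable of tab-separated rows in the assumed
    BLAST-style layout; rows starting with '#' are treated as comments.
    """
    best = {}
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        query, subject, bits = row[0], row[1], float(row[11])
        if query not in best or bits > best[query][1]:
            best[query] = (subject, bits)
    return best


if __name__ == "__main__":
    # Made-up rows for illustration only.
    table = [
        "# comment line",
        "read1\tprotA\t90.0\t50\t5\t0\t1\t150\t1\t50\t1e-20\t85.5",
        "read1\tprotB\t95.0\t50\t2\t0\t1\t150\t1\t50\t1e-25\t99.0",
        "read2\tprotA\t80.0\t40\t8\t0\t4\t123\t10\t49\t1e-10\t60.2",
    ]
    for read_id, (protein_id, score) in best_hits(table).items():
        print(read_id, protein_id, score)
```

With the best protein per read known, the matching alignment can be selected from the full report, or the search can be rerun for just that read and protein pair.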

0

Do you know of a quick way to pull out the best results? Something I can use in a pipeline? MView just hangs on me.

ADD REPLY • link 13.4 years ago by Gimly_Gloin ▴ 70

score 1 · Answer 2 · 2012-03-06

PAGAN can add new sequence fragments to an existing reference alignment and it also models 454 homopolymer errors. However, it can't correct frame-shifts and, in your case, the protein reference alignment should first be converted to a matching DNA alignment and the queries then aligned to that. There should be several tools available for this conversion.
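The protein-to-DNA conversion is usually done by threading each protein's coding sequence onto its aligned row, one codon per residue and three gap characters per gap column. A minimal Python sketch of that idea follows. It assumes you have the coding DNA for every reference protein, in frame and starting at the first codon; the function name is hypothetical and no translation check is done.

```python
def thread_codons(aligned_protein, cds, gap="-"):
    """Expand one aligned protein row into a codon-level DNA row.

    Each residue is replaced by the next codon from `cds` and each
    gap column by three gap characters. A trailing stop codon in
    `cds` is ignored.
    """
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length is not a multiple of 3")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    n_residues = sum(1 for ch in aligned_protein if ch != gap)
    if len(codons) not in (n_residues, n_residues + 1):
        raise ValueError("coding sequence does not match protein length")

    out = []
    next_codon = 0
    for ch in aligned_protein:
        if ch == gap:
            out.append(gap * 3)
        else:
            out.append(codons[next_codon])
            next_codon += 1
    return "".join(out)


if __name__ == "__main__":
    # Made-up example: protein row "M-K" with coding sequence ATG AAA.
    print(thread_codons("M-K", "ATGAAA"))
```

Applying this to every row of the protein alignment gives a DNA alignment with the same column structure, which can then serve as the reference for the reads.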