Question

How To "Assemble" Proteins From Amino Acid Fragments: How Do I Align Fragments Of A Protein To Get The Best Overall Protein Sequence?



12.7 years ago

Gimly_Gloin ▴ 70

I've just attempted a frameshift correction of my DNA sequences using fasty36. The output is a set of protein sequences that are, hopefully, in the correct frame. I have a huge number of duplicates. I've tried to reduce them by clustering, which has worked somewhat, but I still have a large number of reads where the hits to my reference sequences give different translations of the same read. (My reference protein database seems to have its own ambiguities/frameshifts that change the protein sequence in places. I could scrutinize these, but I'd rather not, because they may be biologically significant.)
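Before clustering, exact duplicate translations can be collapsed per read, which leaves only the reads whose translations genuinely disagree. A minimal sketch, assuming the fasty36 translations have already been parsed into (read_id, protein) pairs; the function name `distinct_translations` and the toy sequences are hypothetical:

```python
from collections import defaultdict


def distinct_translations(records):
    """Group (read_id, protein) pairs by read and drop exact duplicates.

    Returns {read_id: [protein, ...]} with each distinct translation listed
    once, in first-seen order.
    """
    by_read = defaultdict(dict)  # inner dict used as an insertion-ordered set
    for read_id, protein in records:
        by_read[read_id].setdefault(protein, None)
    return {read_id: list(seqs) for read_id, seqs in by_read.items()}


records = [
    ("read1", "MKVLAAG"),
    ("read1", "MKVLAAG"),  # exact duplicate: collapsed
    ("read1", "MKVLSAG"),  # a different translation of the same read
    ("read2", "GGTEQKW"),
]
groups = distinct_translations(records)
# Reads with more than one distinct translation still need a decision.
ambiguous = {r: seqs for r, seqs in groups.items() if len(seqs) > 1}
print(ambiguous)  # {'read1': ['MKVLAAG', 'MKVLSAG']}
```

This only removes identical strings; near-identical translations still need clustering or alignment to be grouped.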

I want to align the candidate translations of each read (I now have several possibilities per read) against each other AND choose the one that best fits the reference sequences (well over 2000 sequences for the MSA). So I was wondering: is there a way to align them end to end, similar to a DNA assembly?
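Aligning fragments end to end is essentially overlap-based assembly. As a toy illustration of the idea (not a tool recommendation), the sketch below applies the greedy shortest-common-superstring heuristic to protein strings: drop fragments contained in longer ones, then repeatedly merge the pair with the longest exact suffix/prefix overlap. It assumes error-free, exact overlaps of at least `min_len` residues (with `min_len` of 1 or more); real translated reads with sequencing errors or residual frameshifts would need mismatch-tolerant overlaps, and the all-pairs search is too slow for very large read sets. The function names and sequences are hypothetical.

```python
from itertools import permutations


def overlap(a, b, min_len):
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 if no such overlap of at least min_len residues exists.
    """
    if len(b) < min_len:
        return 0
    start = 0
    while True:
        start = a.find(b[:min_len], start)  # candidate start of the overlap in a
        if start == -1:
            return 0
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(fragments, min_len=3):
    """Greedily merge fragments by their longest exact overlaps; returns contigs."""
    # Remove duplicates and fragments contained within a longer fragment.
    contigs = []
    for frag in sorted(set(fragments), key=len, reverse=True):
        if not any(frag in longer for longer in contigs):
            contigs.append(frag)

    while True:
        best_len, best_a, best_b = 0, None, None
        for a, b in permutations(contigs, 2):
            olen = overlap(a, b, min_len)
            if olen > best_len:
                best_len, best_a, best_b = olen, a, b
        if best_len == 0:
            return contigs  # nothing left to merge
        contigs.remove(best_a)
        contigs.remove(best_b)
        contigs.append(best_a + best_b[best_len:])


print(greedy_assemble(["MKVLAAG", "AAGTEQK", "TEQKWLR", "VLAA"]))
# ['MKVLAAGTEQKWLR']
```

Ties between equally long overlaps are broken by list order, so on real data with repeats the greedy choice can differ from the true arrangement.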

In fasty, if I set the E-value cutoff very small, I lose a lot of sequences.
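Instead of tightening the E-value cutoff until reads disappear, one option is to keep a permissive cutoff and then retain only the highest-scoring hit for each read. A minimal sketch, assuming the hits were written in the 12-column, tab-separated BLAST-style layout (query, subject, % identity, length, mismatches, gap opens, q.start, q.end, s.start, s.end, E-value, bit score); FASTA36's `-m 8` output option is intended to produce this layout, but check the documentation for your version. The function name and the example hits are hypothetical:

```python
def best_hit_per_read(tabular_text, max_evalue=1e-3):
    """Return {read_id: (subject_id, evalue, bitscore)} for the best hit per read.

    Hits with an E-value above max_evalue are ignored; among the rest, the hit
    with the highest bit score wins.
    """
    best = {}
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):  # skip blanks and comment lines
            continue
        fields = line.split("\t")
        read_id, subject_id = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue > max_evalue:
            continue
        if read_id not in best or bitscore > best[read_id][2]:
            best[read_id] = (subject_id, evalue, bitscore)
    return best


hits = (
    "read1\trefA\t95.0\t60\t3\t0\t1\t180\t10\t69\t1e-30\t120.5\n"
    "read1\trefB\t80.0\t55\t11\t1\t1\t165\t5\t59\t1e-12\t60.2\n"
    "read2\trefA\t60.0\t40\t16\t2\t3\t120\t200\t239\t0.5\t25.0\n"
)
print(best_hit_per_read(hits))  # {'read1': ('refA', 1e-30, 120.5)}
print(best_hit_per_read(hits, max_evalue=1.0)["read2"])  # ('refA', 0.5, 25.0)
```

Keeping a single hit per read discards the alternative translations, so reads whose top two bit scores are close may deserve a manual look.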

So far I've done the following:

1) Ran fasty36 with my DNA reads against a protein reference sequence

2) Extracted the "corrected" sequences, which contain "/", "\", "*", and "-" characters

3) Removed those characters for the next step

4) Ran fasta36 with my translated DNA against my protein reference sequence, using a high E-value, to collect the "perfect" alignments (<-- probably an unneeded step)
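Steps 2 and 3 (removing the frameshift, stop, and gap symbols from the fasty36-corrected sequences) can be sketched in a few lines. This assumes the corrected sequences are already in FASTA format and that "/", "\", "*", and "-" are the only symbols to remove; the function names are hypothetical. Deleting "/" and "\" also discards where the frameshifts were, so the sketch records their positions first.

```python
FRAMESHIFT_STOP_GAP = "/\\*-"  # frameshift (/ and \), stop (*), and gap (-) symbols
_DELETE = str.maketrans("", "", FRAMESHIFT_STOP_GAP)


def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def strip_symbols(seq):
    """Remove frameshift, stop, and gap symbols from a translated sequence."""
    return seq.translate(_DELETE)


def frameshift_positions(seq):
    """0-based positions of '/' and '\\' in the original corrected sequence."""
    return [i for i, ch in enumerate(seq) if ch in "/\\"]


corrected = ">read1\nMKV/LA\\A-G*\n>read2\nGGTE\nQKW\n"
for header, seq in read_fasta(corrected):
    print(header, strip_symbols(seq), frameshift_positions(seq))
# read1 MKVLAAG [3, 6]
# read2 GGTEQKW []
```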

Thoughts, advice, and alternative solutions or suggestions are much appreciated.

fasta alignment reference assembly

Updated 11.1 years ago by Biostar 20