I've just been attempting frameshift correction of my DNA sequences using fasty36. The output is a set of protein sequences that are "hopefully" in the correct frame. I have a huge number of duplicates. I've tried to reduce them by clustering, which has worked somewhat, but I still have many reads for which the hits to my reference sequence give different translations of the same read. (My reference protein database seems to have its own ambiguities/frameshifts that change the protein sequence in places. I could scrutinize these, but I'd rather not, because they may be biologically significant.)
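Before any clustering, exact duplicates can be collapsed per read so that only the genuinely different translations remain to compare. This is a minimal sketch that assumes you have already parsed your results into hypothetical `(read_id, protein_seq)` pairs; the function name `collapse_duplicates` is illustrative, not part of any tool.

```python
def collapse_duplicates(records):
    """Collapse identical translations of the same read.

    `records` is an iterable of (read_id, protein_seq) pairs (a hypothetical
    input format). Returns a dict mapping read_id -> list of unique
    translations, in first-seen order.
    """
    unique = {}
    for read_id, seq in records:
        seqs = unique.setdefault(read_id, [])
        if seq not in seqs:  # keep only translations not yet seen for this read
            seqs.append(seq)
    return unique


if __name__ == "__main__":
    demo = [
        ("read1", "MKTAYIAK"),
        ("read1", "MKTAYIAK"),  # exact duplicate, dropped
        ("read1", "MKTAYLAK"),  # differs by one residue, kept
        ("read2", "GSHMLE"),
    ]
    for read_id, seqs in collapse_duplicates(demo).items():
        print(read_id, seqs)
```

Reads that still have more than one entry afterwards are the ones with conflicting translations.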
I want to align the candidate translations of each read (now that I have several possibilities for the same read) against each other AND choose the one that best fits the reference sequences (well over 2000 sequences for the MSA). So... is there a way to align them end to end, similar to a DNA assembly?
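To make the "end to end, like an assembly" idea concrete, here is a naive greedy overlap merge applied to protein fragments: repeatedly join the pair with the longest exact suffix/prefix overlap. It is a sketch under stated assumptions, not a real assembler. It only recognises exact overlaps of at least `min_overlap` residues (the default of 5 is an arbitrary choice), so translations that differ by even one residue in the overlap will not merge; handling those would need alignment-based overlaps instead. The function names are hypothetical.

```python
def best_overlap(left, right, min_overlap=5):
    """Length of the longest suffix of `left` equal to a prefix of `right`,
    or 0 if there is no such overlap of at least `min_overlap` residues."""
    if min_overlap < 1:
        raise ValueError("min_overlap must be >= 1")
    for k in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left[-k:] == right[:k]:
            return k
    return 0


def greedy_assemble(fragments, min_overlap=5):
    """Naive greedy assembly of protein fragments by exact suffix/prefix overlap."""
    # Drop exact duplicates, keeping first-seen order.
    frags = list(dict.fromkeys(fragments))
    # Drop fragments wholly contained in another one (they add no new sequence).
    frags = [f for f in frags if not any(f != g and f in g for g in frags)]
    while True:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = best_overlap(a, b, min_overlap)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return frags  # no overlaps left: these are the final contigs
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for n, f in enumerate(frags) if n not in (best_i, best_j)]
        frags.append(merged)


if __name__ == "__main__":
    reads = ["MKTAYIAKQR", "IAKQRQISFV", "QISFVKSHFS"]
    print(greedy_assemble(reads))
```

The all-pairs search is quadratic per merge, which is fine for a handful of translations from one locus but not for thousands of reads at once.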
In fasty, if I set the E-value cutoff very low (i.e. very strict), I lose a lot of sequences.
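One alternative to a strict global E-value cutoff is to keep a permissive cutoff and then retain only the best-scoring translation for each read, so weak-but-best hits are not thrown away. This minimal sketch assumes the hits have been parsed into hypothetical `(read_id, translation, evalue)` tuples; lower E-value wins, and ties keep the first hit seen.

```python
def best_hit_per_read(hits):
    """Keep the lowest-E-value translation for each read.

    `hits` is an iterable of (read_id, translation, evalue) tuples; this layout
    is a hypothetical one you would build when parsing your own fasty36 output.
    Ties keep the first hit seen.
    """
    best = {}
    for read_id, translation, evalue in hits:
        if read_id not in best or evalue < best[read_id][1]:
            best[read_id] = (translation, evalue)
    return {read_id: translation for read_id, (translation, _) in best.items()}


if __name__ == "__main__":
    hits = [
        ("r1", "MKTAYIAK", 1e-5),
        ("r1", "MKTAYLAK", 1e-20),  # better hit for r1, kept
        ("r2", "GSHMLE", 0.5),      # weak, but the only hit for r2, kept
    ]
    print(best_hit_per_read(hits))
```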
So far I've done:
1) A fasty36 run of my DNA reads against a protein reference sequence
2) Extracted the "corrected" sequences, which contain "/", "\", "*", "-", ...
3) Removed those characters for the next step
4) Ran fasta36 of my translated DNA against my protein reference sequence, using a high E-value, to collect the "perfect" alignments (<--- probably an unneeded step)
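Steps 2 and 3 can be sketched as below: strip the non-residue symbols from each extracted sequence and write the results as FASTA for the next search. This assumes the sequences are already extracted as strings; the helper names and the 60-column line width are illustrative choices. Note that deleting "*" silently joins the residues on either side of an internal stop, so you may prefer to flag such reads rather than clean them.

```python
import re

# Symbols removed in step 3: "/" and "\" (frameshift markers), "-" (alignment
# gaps) and "*" (stop codons in the translated read).
STRIP_CHARS = re.compile(r"[/\\*\-]")


def clean_translation(aligned_seq):
    """Remove frameshift, gap and stop symbols, leaving only residue letters."""
    return STRIP_CHARS.sub("", aligned_seq)


def to_fasta(records, width=60):
    """Format (name, sequence) pairs as FASTA text, wrapping at `width` columns."""
    lines = []
    for name, seq in records:
        lines.append(f">{name}")
        for i in range(0, len(seq), width):
            lines.append(seq[i:i + width])
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    raw = [("read1", "MK/T\\AY-IA*K")]
    cleaned = [(name, clean_translation(seq)) for name, seq in raw]
    print(to_fasta(cleaned), end="")
```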
Thoughts, advice, and alternative solutions/suggestions much appreciated...