Hello folks,
Can you think of any existing tool that would take a set of sequences in a protein FASTA file that ARE already ALIGNED and simply merge them into a sort of chimeric concatenation? Pieces might not overlap, and mismatches would produce X's or something similar, as in this simple example:
>.. AAAAA---ABA------BAAAA
>.. ---AAAA-------BBBAB---
>output AAAAAAA-ABA---BBBXXAAA
The need seems obvious, but all I ever come across among existing tools are options like "concatenate only non-overlapping pieces" or "restrict to the longest containing sequence". The sequences are protein, and the tool needs to be suitable for piping, so no GUI is needed. Can anyone suggest something? Thanks in advance.
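In case no ready-made tool turns up, the rule in the example is short to script column by column. A gap where every sequence has a gap, the residue where all non-gap residues agree, and X where they disagree. Below is a minimal sketch in Python, not an existing tool. It assumes standard FASTA input (header and sequence on separate lines), all sequences of equal aligned length, and "-" as the only gap character. The output header name `merged` is an arbitrary choice. It reads stdin and writes stdout, so it can sit in a pipe.

```python
import sys

GAP = "-"        # assumed: the only gap character in the alignment
MISMATCH = "X"   # emitted where aligned residues disagree


def read_fasta(lines):
    """Parse FASTA lines into a list of (header, sequence) pairs.

    Sequence lines that appear before the first header are ignored."""
    records = []
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def merge_columns(seqs):
    """Merge aligned sequences into one chimeric sequence, column by column."""
    if not seqs:
        raise ValueError("no sequences to merge")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences differ in length; are they aligned?")
    merged = []
    for column in zip(*seqs):
        residues = {c for c in column if c != GAP}
        if not residues:
            merged.append(GAP)             # gap in every sequence
        elif len(residues) == 1:
            merged.append(residues.pop())  # single residue, or all agree
        else:
            merged.append(MISMATCH)        # conflicting residues
    return "".join(merged)


def main():
    records = read_fasta(sys.stdin)
    if not records:
        print("no FASTA records on stdin", file=sys.stderr)
        return
    merged = merge_columns([seq for _, seq in records])
    sys.stdout.write(">merged\n")
    # Wrap at 60 residues per line, a common FASTA convention.
    for i in range(0, len(merged), 60):
        sys.stdout.write(merged[i:i + 60] + "\n")


if __name__ == "__main__":
    main()
```

Saved under a hypothetical name such as `merge_aln.py`, it would be used as `python merge_aln.py < aligned.fasta > merged.fasta`. On the two sequences in the example above, `merge_columns` returns `AAAAAAA-ABA---BBBXXAAA`. Case is compared literally, so lowercase and uppercase forms of the same residue would count as a mismatch.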
Hi,
Sorry, I did not test it or go into the details. If you want "-" instead of "n", simply replace it. Glad it worked for you.
Hi Puli, thanks for the reply. First, the program outputs a binary value (\00) for a character (-) that it does not find in its conversion table. OK, I can convert the binary output back to text, but a neater way would be to add a new character to the table so that the consensus keeps the original "-" (I have thousands of files). When I tested this, it did not work. Please let me know if you think the table cannot accept a new character state. Thanks again!
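As a stopgap until the table accepts "-", the \00 bytes could be mapped back to gaps in the pipe. Below is a minimal sketch, assuming every NUL byte in the program's output stands for a gap and nothing else. The function name `nul_to_gap` is hypothetical.

```python
import sys


def nul_to_gap(data: bytes) -> bytes:
    """Replace every NUL byte (0x00) with an ASCII '-' (0x2D)."""
    return data.replace(b"\x00", b"-")


if __name__ == "__main__":
    # Work on the raw byte streams so no text decoding touches the NULs.
    sys.stdout.buffer.write(nul_to_gap(sys.stdin.buffer.read()))
```

The standard `tr` utility does the same job without Python, e.g. `tr '\000' '-' < consensus.out > consensus.fa`. The file names are placeholders, and a shell loop can apply it across many files.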