I need to concatenate ClustalW-aligned multi-FASTA alignment files. I have tried many concatenation tools, but none of them recognises the gaps (---)
in the alignment, and they all end with an error stating that the sequences have dissimilar lengths.
Is there any way to do this?
For example,
I have the alignment files a.fasta and b.fasta as given below,
a.fasta
>or1
ATGTCT-----TGA
>or2
-----TGATAG-----
>or3
TGATA-----TAGTT
b.fasta
>or2
ATGATGTATGATGATA
>or1
GTAGATAGATAGAG
>or3
ATGCTAGATAGATAG
The expected output is given below,
c.fasta
>or1
ATGTCT-----TGAGTAGATAGATAGAG
>or2
-----TGATAG-----ATGATGTATGATGATA
>or3
TGATA-----TAGTTATGCTAGATAGATAG
Are all those (aligned) sequences on a single line?
If so, then something like this should work:
Dear @lieven.sterck, I only showed an example sequence here. In reality, I have multi-line FASTA files; the sequences are not all on a single line.
You can first linearise them, if that does not hinder the downstream analysis, e.g. using this:
https://gist.github.com/lindenb/2c0d4e11fd8a96d4c345
But I must admit that shenwei356's solution using
seqkit concat
is much neater ;)
Thank you @lieven.sterck for your valuable suggestion.
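For anyone who prefers a script over the tools mentioned above, here is a minimal Python sketch of the same idea: read both files, join wrapped sequence lines (so multi-line FASTA is fine), and append the b.fasta sequence to the a.fasta sequence with the same ID. The function names (read_fasta, concat_by_id, concat_files) are made up for this example, and it assumes every ID occurs exactly once in each file. It is not a replacement for seqkit, just an illustration.

```python
def read_fasta(handle):
    """Parse FASTA from a file-like object into {id: sequence}.

    Wrapped (multi-line) sequences are joined into a single string.
    The ID is the first whitespace-delimited word after '>'.
    """
    records = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {key: "".join(parts) for key, parts in records.items()}


def concat_by_id(first, second):
    """Append second[id] to first[id]; record order follows `first`."""
    unmatched = set(first) ^ set(second)
    if unmatched:
        raise ValueError(f"IDs not present in both inputs: {sorted(unmatched)}")
    return {name: first[name] + second[name] for name in first}


def concat_files(path_a, path_b, path_out):
    """Concatenate two FASTA files by ID and write one sequence per line."""
    with open(path_a) as fa, open(path_b) as fb:
        merged = concat_by_id(read_fasta(fa), read_fasta(fb))
    with open(path_out, "w") as out:
        for name, seq in merged.items():
            out.write(f">{name}\n{seq}\n")
```

With the files from the question, calling concat_files("a.fasta", "b.fasta", "c.fasta") would write the concatenated records in the order of a.fasta. Note that it does not check whether the rows within each input have equal length; it only pairs records by ID.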
Update: I didn't realize that b.fasta was aligned. The following method only works if the second FASTA file is unaligned. If you have the raw sequences for b.fasta, you can use this method. It is especially useful if you repeatedly want to align sequences to a.fasta: you can build a profile from the a.fasta alignment (a.hmm) and use hmmalign to align other sequences to it.
You can use hmmbuild to build a profile from a.fasta and then use hmmalign to align b.fasta to the hmm profile.
Or even better: convert the aligned FASTA to Stockholm format.
I think hmmalign works best with Stockholm format, but you can try it with the aligned FASTA (a.fasta) and see if it works.
Thank you @Fatima. However, it shows the following error:
All sequences in an aligned fasta should have the same length. or1 is 14 but or2 is 16.
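The error is consistent with the example a.fasta as posted: its rows do not all have the same length. A quick way to check this before running any tool is to count the characters per record, gaps included. The sketch below is only an illustration; the helper names fasta_lengths and is_aligned are made up for this example.

```python
def fasta_lengths(text):
    """Return {id: total sequence length}, counting gap characters too."""
    lengths = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            name = line[1:].split()[0]
            lengths[name] = 0
        elif line and name is not None:
            lengths[name] += len(line)
    return lengths


def is_aligned(lengths):
    """True when every record has the same length."""
    return len(set(lengths.values())) <= 1


# The a.fasta example exactly as posted in the question.
example = """>or1
ATGTCT-----TGA
>or2
-----TGATAG-----
>or3
TGATA-----TAGTT
"""

lengths = fasta_lengths(example)
print(lengths)              # {'or1': 14, 'or2': 16, 'or3': 15}
print(is_aligned(lengths))  # False
```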
Hi Fatima,
Can you elaborate on how (or why) this answers the original question?
If the b.fasta alignments do not cover the same region as a.fasta, then hmmalign will not align them either, no?
My bad. Because b.fasta didn't have any gaps, I thought it was an unaligned FASTA.