Appending sequences in fasta file with identical headers
8.3 years ago
EVR ▴ 610

Hi,

I have a FASTA file like the following:

>A1
aatggggta
>A1
atttggta
>A1
actcaagt
>B1
agctaaa
>B1
ttaagc
>C1
aattatggc

I would like to concatenate the sequences with identical headers, as follows:

>A1
aatggggtaatttggtaactcaagt
>B1
agctaaattaagc
>C1
aattatggc

Is there a tool I can use to produce the above output? Any guidance would be appreciated.


Out of curiosity... why would you want to do that?

8.3 years ago
awk '/^>/ {if(prev!=$0) {if(prev!="") printf("\n"); prev=$0; printf("%s\n",$0);} next;} {printf("%s",$0);} END {printf("\n");}' input.fa

>A1
aatggggtaatttggtaactcaagt
>B1
agctaaattaagc
>C1
aattatggc
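
Note that the one-liner above only compares each header with the previous one, so it merges identical headers only when they are adjacent in the file. If duplicates can be scattered, or a header may carry a stray space after ">", something like the following could work. This is a minimal sketch, not a tested tool: the function name merge_fasta is made up for illustration, and it assumes the whole file fits in memory.

```shell
# merge_fasta [FILE...]: hypothetical helper, reads stdin when no file is given.
# Concatenates sequences that share a header, keeping first-seen header order.
merge_fasta() {
  awk '
    /^>/ {
      hdr = $0
      sub(/^>[ \t]+/, ">", hdr)            # treat "> B1" the same as ">B1"
      if (!(hdr in seq)) {                 # first time this header is seen
        order[++n] = hdr
        seq[hdr] = ""
      }
      next
    }
    { seq[hdr] = seq[hdr] $0 }             # append sequence line to current header
    END {
      for (i = 1; i <= n; i++)
        printf("%s\n%s\n", order[i], seq[order[i]])
    }
  ' "$@"
}
```

Usage would be: merge_fasta input.fa > merged.fa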
8.3 years ago
venu 7.1k

How about this:

cat file.fa | paste - - | datamash -s -g 1 collapse 2 | sed 's/,//g' | tr '\t' '\n' > new_file.fa

Output:

>A1
aatggggtaatttggtaactcaagt
>B1
agctaaattaagc
>C1
aattatggc
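
One caveat: paste - - pairs every two lines, so the pipeline above assumes each record is exactly one header line followed by one sequence line. If the sequences are wrapped over several lines, they could be unwrapped first. A rough sketch (linearize_fasta is a hypothetical name, not an existing tool):

```shell
# linearize_fasta [FILE...]: hypothetical helper, reads stdin when no file is given.
# Rewrites each record as one header line followed by one sequence line.
linearize_fasta() {
  awk '
    /^>/ {
      if (seen) printf("\n")               # close the previous sequence line
      seen = 1
      print                                # header on its own line
      next
    }
    { printf("%s", $0) }                   # join wrapped sequence lines
    END { if (seen) printf("\n") }
  ' "$@"
}
```

The output of linearize_fasta file.fa could then be piped into paste - - in place of cat file.fa.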

P.S.: this requires GNU datamash.
