Append characters to end of fasta sequences in a multifasta file
3.8 years ago
sid5427 ▴ 20

Hello all,

I am trying to edit fasta sequences in a large multifasta file to serve as input for some tools.

Essentially I would like to append some characters to the end of each fasta sequence. I can write a script using python or perl, but was wondering if there is a quick and easy way to do it - say using sed or awk?

Assume I want to append a * character to the end of each fasta sequence.

I have tried this -

perl -pe 's/\n*/*\n>/' test.fasta
sed -r 's/\n>/*\n>/' test.fasta

but these did not work, instead just printing out the same sequences with no change.

However, this does work, though it appends a * character to the end of every line in the fasta file:

sed -r 's/\n/*\n/' test.fasta

I have tried other line-ending characters in place of '\n', such as '\r' and '/M', with no luck. I think I might be missing some other line-ending character or symbol, or my entire logic might be off.

Essentially, I want my output sequences to look like this:

>header_name
ATATCGACGCGACGTCGACGTCGACG
ATATCGACGCGACGTC*
>header_name
ATATCGAGACGTCGACGTATCGAGACG
ATATCGGAAGTC*

Any help would be appreciated!

fasta sed perl • 2.4k views

sed 's/$/*/' is what you want... but that will add a star to the end of each line, including the header

sed "s/^\([^>]\+\)$/\1*/" will add a star to each sequence line (which is a problem for formatted fasta) *

If you're dealing with formatted fasta, I'd read the file record by record. Whenever a new record is encountered, write the previous one and add the star to the end of its last sequence line. Otherwise you can use the second command above.
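
The record-by-record idea above could be sketched in Python roughly as follows. This is only a minimal sketch, not a tested tool: the function name `append_to_records` is made up for illustration, and it assumes every record starts with a `>` header line and that blank lines can be dropped.

```python
import io


def append_to_records(lines, suffix="*"):
    """Yield FASTA lines, adding `suffix` to the last sequence line of each record.

    `append_to_records` is a hypothetical helper name, not part of any library.
    """
    previous = None  # most recent sequence line, held back until we see what follows
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith(">"):
            # A new header closes the previous record, so its last line gets the suffix.
            if previous is not None:
                yield previous + suffix
                previous = None
            yield line
        elif line:  # assumption: blank lines are skipped
            if previous is not None:
                yield previous
            previous = line
    if previous is not None:
        # The final record has no following header, so it is closed here.
        yield previous + suffix


# Small in-memory demo; a real file handle can be passed in the same way.
demo = io.StringIO(">header_name\nATATCG\nATATC\n>header_name\nGGCC\n")
print("\n".join(append_to_records(demo)))
```

Because it only holds one line back at a time, this should cope with a large file. For the file in the question, something like `with open("test.fasta") as fh:` followed by a loop over `append_to_records(fh)` would be the equivalent.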

* This doesn't work out of the box on a Mac (zsh).


Hello,

You could try a conditional sed, which only edits lines that start with an uppercase letter:

sed '/^[A-Z]/ s/$/\*/g' test.fasta
3.8 years ago

If you have multi-line fasta, try:

$ seqkit replace -sp '$' -r '*' test.fa

If you have flattened fasta, try:

$ sed '/^>/! s/$/*/' test.fa
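
As for why the `\n>` attempts in the question changed nothing: sed and `perl -p` both apply the expression to one line at a time, so the newline and the `>` of the next header are never in the same buffer. The substitution only makes sense on the whole text at once. A minimal Python sketch of that idea follows. The function name is hypothetical, and it assumes Unix line endings, a file small enough to hold in memory, and that every header is followed by at least one sequence line.

```python
def append_star_whole_text(text, suffix="*"):
    """Append `suffix` before every newline-plus-header and at the end of the text.

    Hypothetical helper for illustration; assumes "\n" line endings.
    """
    body = text.rstrip("\n")
    if not body:
        return text  # nothing to do for empty input
    # "\n>" marks the boundary between one record and the next header.
    # The last record has no such boundary, so the suffix is added separately.
    return body.replace("\n>", suffix + "\n>") + suffix + "\n"


print(append_star_whole_text(">header_name\nATATCG\nATATC\n>header_name\nGGCC\n"), end="")
```

For a large multifasta file, reading everything into memory may not be practical, so a streaming approach is probably the safer choice.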

Ah yes - I forgot about the differences between multiline and flattened fasta files. Thanks - the code snippet for the multi-line fasta worked!
