Hello all,
I am trying to edit fasta sequences in a large multifasta file to serve as input for some tools.
Essentially I would like to append some characters to the end of each fasta sequence. I can write a script using python or perl, but was wondering if there is a quick and easy way to do it - say using sed or awk?
Assuming I want to append a * character to the end of each fasta sequence, I have tried this -
perl -pe 's/\n*/*\n>/' test.fasta
sed -r 's/\n>/*\n>/' test.fasta
but these did not work, instead just printing out the same sequences with no change.
Even though this does work, appending a * character at the end of each line in the fasta file:
sed -r 's/\n/*\n/' test.fasta
I have tried other line-ending characters in place of '\n', such as '\r' and '^M' - no luck.
I think I might be missing some other line ending character or symbol - or my entire logic might be off.
Essentially, I want my output sequences to look like this -
>header_name
ATATCGACGCGACGTCGACGTCGACG
ATATCGACGCGACGTC*
>header_name
ATATCGAGACGTCGACGTATCGAGACG
ATATCGGAAGTC*
Any help would be appreciated!
sed 's/$/*/'
is what you want... but that will add a star to the end of every line, including the headers.
sed "s/^\([^>]\+\)$/\1*/"
will add a star to each sequence line only (which is a problem for formatted fasta, where one sequence wraps over several lines). *
If you're dealing with formatted fasta, I'd read the file record by record. Whenever a new record is encountered, write the previous one and add the star to the end of its last sequence line. Otherwise you can use the second command above.
* This doesn't work out of the box on a Mac (zsh).
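For the record-by-record approach described above, something along these lines might do it in Python. This is only a rough sketch: the function name append_to_records is made up for illustration, and it assumes headers start with ">" and that blank lines can be dropped. It holds each line back until it sees what follows, so the star only lands on the last sequence line of a record.

```python
def append_to_records(lines, suffix="*"):
    """Yield FASTA lines, adding `suffix` to the last sequence line of each record.

    `append_to_records` is a hypothetical helper name, not part of any library.
    Assumes header lines start with '>' and that blank lines can be skipped.
    """
    previous = None  # most recent line, held back until we know what follows it
    for raw in lines:
        line = raw.rstrip("\r\n")  # also strips Windows-style \r line endings
        if not line:
            continue  # ignore blank lines so they never receive the suffix
        if previous is not None:
            # A header right after a sequence line means `previous` closed a record.
            if line.startswith(">") and not previous.startswith(">"):
                previous += suffix
            yield previous
        previous = line
    if previous is not None:
        # End of input closes the final record.
        if not previous.startswith(">"):
            previous += suffix
        yield previous


# Small demo using the sequences from the question.
example = [
    ">header_name",
    "ATATCGACGCGACGTCGACGTCGACG",
    "ATATCGACGCGACGTC",
    ">header_name",
    "ATATCGAGACGTCGACGTATCGAGACG",
    "ATATCGGAAGTC",
]
for out_line in append_to_records(example):
    print(out_line)
```

To run it on a real file, pass an open file handle instead of the list, e.g. append_to_records(open("test.fasta")), and write the yielded lines to a new file. Only one line is held in memory at a time, so it should cope with a large multifasta.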
Hello,
You could try a conditioned sed