Replace names in FASTA headers from the first white space
2.7 years ago
bionix ▴ 10

Hello, I have a list of sequences in a fasta file which looks like this:

>WP00001_00001 HP Protein 1
ATGCATGATCAGTTGACGT
>WP00002_00022 Protein Like/Protein1
ATGACTGACGTTGACGTAC
>WP00002_00007 Protein cluster2
ATGGCTAGCCATGTACATT

I want to replace the first white space in each header with a pipe (|) and all other white space in the header (the description) with an underscore (_), so that the final output file looks like this:

>WP00001_00001|HP_Protein_1
ATGCATGATCAGTTGACGT
>WP00002_00022|Protein_Like/Protein1
ATGACTGACGTTGACGTAC
>WP00002_00007|Protein_cluster2
ATGGCTAGCCATGTACATT

Could you please help me with that?

Regards, PSP

fasta space headers • 972 views

What have you tried? The forum has a number of "edit FASTA/Q header" posts.

2.7 years ago
JC 13k

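Perl one-liner (anchored to lines starting with ">", and using chomp so that a header with no description keeps its line break):

$ perl -pe 'if (/^>/) { chomp; s/\s/|/; s/\s/_/g; $_ .= "\n" }' < in.fasta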
>WP00001_00001|HP_Protein_1
ATGCATGATCAGTTGACGT
>WP00002_00022|Protein_Like/Protein1
ATGACTGACGTTGACGTAC
>WP00002_00007|Protein_cluster2
ATGGCTAGCCATGTACATT
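
For anyone who would rather do this in a script, here is a minimal Python sketch of the same logic: on header lines only, the first white space character becomes a pipe and every remaining one becomes an underscore. The function name `rewrite_header` and the inline sample records are my own choices for illustration (the records are copied from the question); this is not part of the one-liner above.

```python
import re


def rewrite_header(line):
    """Rewrite one FASTA line; only header lines (starting with '>') change."""
    if not line.startswith(">"):
        return line  # sequence lines pass through untouched
    header = line.rstrip("\r\n")  # drop the line ending before substituting
    # The first white space character separates the ID from the description.
    header = re.sub(r"\s", "|", header, count=1)
    # Every remaining white space character becomes an underscore.
    header = re.sub(r"\s", "_", header)
    return header + "\n"


# Sample records taken from the question.
records = [
    ">WP00001_00001 HP Protein 1\n",
    "ATGCATGATCAGTTGACGT\n",
    ">WP00002_00022 Protein Like/Protein1\n",
    "ATGACTGACGTTGACGTAC\n",
]

for record in records:
    print(rewrite_header(record), end="")
```

To process a real file, the same function can be applied to each line read from an open file handle instead of the sample list. A header without any description is returned unchanged, since the line ending is removed before the substitutions run.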

Thanks a lot! It solved my problem.

2.7 years ago
$ sed -re '/^>/ { s/\s/|/; s/\s/_/g }' test.fa

>WP00001_00001|HP_Protein_1
ATGCATGATCAGTTGACGT
>WP00002_00022|Protein_Like/Protein1
ATGACTGACGTTGACGTAC
>WP00002_00007|Protein_cluster2
ATGGCTAGCCATGTACATT

Thanks a lot! It worked.
