Duplicate fasta header and append it with a pipe
1
0
Entering edit mode
2.6 years ago
bionix ▴ 10

Hello All,

I have a multi-FASTA file with millions of sequences. For each header, I want to keep only the first part (the sequence ID), duplicate it, and join the two copies with a pipe. The rest of the header should be deleted.

Let's say I have a FASTA file, input.fasta, which looks like this:

>Gene1 wbdfwbf
ATGCCGATGCAGTGACG
>Gene2 wbdwe
ATGCAGTGACGTAGCAG
>Gene3 wdbwd
TGACGTAGCGTAGCAG

I want to convert it to:

>Gene1|Gene1
ATGCCGATGCAGTGACG
>Gene2|Gene2
ATGCAGTGACGTAGCAG
>Gene3|Gene3
TGACGTAGCGTAGCAG

First, I used cut -d ' ' -f 1 < input.fasta > out1.fasta to delete the first space and everything after it from each header. Then I appended a pipe to each header with perl -p -e 's/^(>.*)$/$1\|/g' out1.fasta > out2.fasta

out2.fasta looks like this:

>Gene1|
ATGCCGATGCAGTGACG
>Gene2|
ATGCAGTGACGTAGCAG
>Gene3|
TGACGTAGCGTAGCAG

This is where I am stuck. I have found many posts on the forum about removing duplicates, but none about duplicating a FASTA header. Could you please help me with this, or point me to a solution if it has already been discussed?

Many Thanks, PSP

fasta headers duplicate • 600 views
2.6 years ago
sed '/^>/s/>\([^ ]*\).*/>\1|\1/' in.fa
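
For anyone who wants to try this before running it on millions of sequences, here is a small sketch that applies the same sed expression to the example from the question. The file names input.fasta and output.fasta are just placeholders, and it assumes the ID is separated from the rest of the header by a space, as in the example.

```shell
# Recreate the example input from the question (file names are placeholders).
cat > input.fasta <<'EOF'
>Gene1 wbdfwbf
ATGCCGATGCAGTGACG
>Gene2 wbdwe
ATGCAGTGACGTAGCAG
>Gene3 wdbwd
TGACGTAGCGTAGCAG
EOF

# /^>/       : act on header lines only; sequence lines pass through unchanged
# \([^ ]*\)  : capture everything after '>' up to the first space as \1
# .*         : match the rest of the header so that it gets dropped
# >\1|\1     : write the captured ID twice, joined by a pipe
sed '/^>/s/>\([^ ]*\).*/>\1|\1/' input.fasta > output.fasta

cat output.fasta
```

If some headers use a tab rather than a space after the ID, swapping [^ ] for [^[:space:]] should cover both cases.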
Thank you very much!
