How to add a specific word to the first word in a FASTA header
9.7 years ago
empyrean999 ▴ 180

I have a FASTA file with different headers. It contains assembled sequences, and some of them come from different assembly versions, so they share the same FASTA names. When I tried to make a BLAST database with -parse_seqids, it complained about duplicate IDs. So I would like to add an extension with the assembly version to the FASTA headers.

Examples

Input fasta sequences: (I am just showing headers here)

>Contig1_Node1_length20_cov30 Date:03/01/2015 Sequence_Organism:Other
>Contig2_deg1 Date:03/01/2015 Sequence_Organism:Other
>Contig3_jcg20839 Date:03/01/2015 Sequence_Organism:Other

Output fasta sequences:

>Contig1_Node1_length20_cov30_V2 Date:03/01/2015 Sequence_Organism:Other
>Contig2_deg1_V2 Date:03/01/2015 Sequence_Organism:Other
>Contig3_jcg20839_V2 Date:03/01/2015 Sequence_Organism:Other

Omit -parse_seqids when creating the BLAST database. It will then not report the duplicate-ID error.


True, but I need -parse_seqids to extract sequences from the FASTA file.

9.7 years ago

Hi, here is a sed solution:

sed -e '/^>/ s/ /_V2 /' input.fa > output.fa

and an awk solution (it appends to the first field and keeps every remaining header field):

awk '/^>/ {$1 = $1 "_V2"} {print}' input.fa > output.fa
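One caveat with matching on the first space: a header that has no description after the ID contains no space, so the sed command above would leave it unchanged. A minimal sketch of a variant that anchors on the ID itself is below. The sample file and the `>Contig4` header are made up for illustration, and the ID is assumed to be everything between `>` and the first whitespace.

```shell
# Small sample file; the second header has no description (hypothetical case).
printf '>Contig2_deg1 Date:03/01/2015 Sequence_Organism:Other\nACGT\n>Contig4\nGGCC\n' > input.fa

# On header lines only, append _V2 to the ID (the text up to the first whitespace).
sed -e '/^>/ s/^>[^[:space:]]*/&_V2/' input.fa > output.fa

# Same idea in awk: rewrite the ID in place, then print every line unchanged otherwise.
awk '/^>/ { sub(/^>[^ \t]+/, "&_V2") } { print }' input.fa > output_awk.fa
```

Both commands leave sequence lines and the rest of the header untouched, so the original spacing in the description is preserved.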
9.7 years ago
mxs ▴ 530

OK, the simplest way is again either a perl or an awk script:

perl -lne 'chomp;if(/>(.*?)\s+(.*)/){print ">$1_V2 $2"}else{print $_}' input.fa > output.fa

_V2 in the above line is the extension you are adding. If there are several different extensions, create a key table and preload it as a hash table. Again, everything can be done in a single line.
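The key-table idea could look roughly like the sketch below, written in awk rather than perl. It assumes a hypothetical tab-separated file `suffix_map.tsv` with one sequence ID and its suffix per line; the file name, its layout, and the `_V3` suffix are illustrative and not from the original post.

```shell
# Hypothetical key table: sequence ID <TAB> suffix to append.
printf 'Contig1_Node1_length20_cov30\t_V2\nContig2_deg1\t_V3\n' > suffix_map.tsv

# Sample input using the headers from the question, with placeholder sequences.
printf '>Contig1_Node1_length20_cov30 Date:03/01/2015 Sequence_Organism:Other\nACGT\n>Contig2_deg1 Date:03/01/2015 Sequence_Organism:Other\nGGCC\n>Contig3_jcg20839 Date:03/01/2015 Sequence_Organism:Other\nTTAA\n' > input.fa

awk -F'\t' '
    NR == FNR { suffix[$1] = $2; next }    # first file: load the key table into an array
    /^>/ {
        split($0, parts, " ")              # parts[1] is ">ID"
        id = substr(parts[1], 2)           # drop the leading ">"
        if (id in suffix) sub(/^>[^ \t]+/, "&" suffix[id])
    }
    { print }                              # IDs missing from the table pass through unchanged
' suffix_map.tsv input.fa > output.fa
```

Headers whose ID is not in the table are written out as they are, so it is worth checking the output for any IDs that were expected to change but did not.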

Hope this helps

Cheers
mxs

PS: please ask if anything is unclear regarding the above solution
