Adding numbers after duplicate headers in fasta files
2.1 years ago

I have a FASTA file containing gene sequences from several species. How do I append a number to each duplicated header, while leaving unique headers unchanged, like this:

Before:

>Homo Sapiens
ABCDEFG

>Mus Musculus
EDFGHIK

>Homo Sapiens
XYGFS

After:

>Homo Sapiens_1
ABCDEFG

>Mus Musculus
EDFGHIK

>Homo Sapiens_2
XYGFS
2.1 years ago

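Here's a seqkit answer too. The -n flag checks for duplicates using the full header rather than only the ID (the part before the first space), which matters here because the headers contain spaces.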

seqkit rename -n file.fasta

2.1 years ago
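# Linearize each record (header TAB sequence), sort by header, number the headers within each group, then split the fields back onto separate lines. Sorting changes the record order.
awk '/^>/ {printf("%s%s\t",(N>0?"\n":""),$0);N++;next;} {printf("%s",$0);} END {printf("\n");}' < in.fa |\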
sort -t $'\t' -k1,1 |\
awk -F '\t' '{N++;if($1!=P) N=1;printf("%s_%d\t%s\n",$1,N,$2);P=$1;}' |\
tr "\t" "\n"

That still adds _1 to the headers of species that are not duplicated.
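
One way around that is to count the headers first and only number the ones that occur more than once. Below is a minimal Python sketch of that idea, not a tested replacement for the tools above. The function name number_duplicate_headers is made up for illustration, and it assumes duplicate headers match exactly (same spelling, case and spacing). It also keeps the records in their original order.

```python
from collections import Counter


def number_duplicate_headers(lines):
    """Append _1, _2, ... to headers that occur more than once.

    Hypothetical helper: headers that appear only once are left unchanged,
    and the original record order is preserved.
    """
    lines = [line.rstrip("\n") for line in lines]
    # First pass: count how many times each header line occurs.
    totals = Counter(line for line in lines if line.startswith(">"))
    seen = Counter()
    out = []
    # Second pass: number only the headers that were seen more than once.
    for line in lines:
        if line.startswith(">") and totals[line] > 1:
            seen[line] += 1
            line = f"{line}_{seen[line]}"
        out.append(line)
    return out


# The example from the question, held in memory instead of read from a file.
example = [
    ">Homo Sapiens", "ABCDEFG",
    ">Mus Musculus", "EDFGHIK",
    ">Homo Sapiens", "XYGFS",
]
print("\n".join(number_duplicate_headers(example)))
```

For a real file you could pass the open file handle instead of the list, since the function only iterates over lines. It reads the whole file into memory, so a very large FASTA might need a two-pass approach that reads the file twice.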
