Changing a fasta header
19 months ago
Diana Nadia ▴ 10

Hi, I have an annotated FASTA file, and in each header I want to insert the word that follows 'Similar to' right after the >. For example, given this record:

   >_Anouracaudifer_00017283-RA transcript Name:"Similar to Chid1 Chitinase domain-containing protein 1 (Rattus norvegicus OX=10116)" offset:0 AED:0.30 eAED:0.30 QI:0|0|0|1|1|1|12|0|393
ATGAAGGCGCTCCTGCATGTGCTCTGGCTCACTCTGGCCTGCGGCTCTGCTCACACCACCCTGTCGAAGTCGGATGCCAAGAAGTCTGCCTCCAAGACACTGCAGGAGAAGACTCAGCTCTCAGAGACACCTGTGCAGGACCGGGGTCTGGTGGTAACAGACCCCCGAGCCGAGGACG

I want the output to look like this:

>Chid1_Anouracaudifer_00017283-RA transcript Name:"Similar to Chid1 Chitinase domain-containing protein 1 (Rattus norvegicus OX=10116)" offset:0 AED:0.30 eAED:0.30 QI:0|0|0|1|1|1|12|0|393
ATGAAGGCGCTCCTGCATGTGCTCTGGCTCACTCTGGCCTGCGGCTCTGCTCACACCACCCTGTCGAAGTCGGATGCCAAGAAGTCTGCCTCCAAGACACTGCAGGAGAAGACTCAGCTCTCAGAGACACCTGTGCAGGACCGGGGTCTGGTGGTAACAGACCCCCGAGCCGAGGACG

How can I do this? I already tried:

sed -E 's/(Similar to )(\w+)/>CHIA_\2\1\2/' file.txt > new_file_2.txt

I stored the result in a new file and tried to paste it into the headers, but it does not work. Any ideas?

fasta changingheader preprocessing header • 768 views
19 months ago

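Your sed idea was close; here's a working implementation. It captures the leading >, everything up to and including "Similar to ", the next word, and the rest of the line, then re-emits them with the captured word moved to the front. Note that \S is a GNU sed extension.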

sed -E 's/(^>)(.+Similar to )(\S+)(.+)/\1\3\2\3\4/' in.fasta

A similar approach works with seqkit, which I tend to prefer for FASTA manipulation:

seqkit replace -p "(.+Similar to )(\S+)(.+)" -r "\$2\$1\$2\$3" in.fasta
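
If neither GNU sed nor seqkit is available, the same capture-and-reorder idea can be sketched in Python. This is only an illustration, not a tested pipeline: the names `prefix_header` and `rewrite_fasta` are made up for this example, and it assumes each header contains at most one "Similar to" phrase (the non-greedy match picks the first occurrence, whereas the greedy sed pattern above picks the last).

```python
import re

# Group 1: everything after ">" up to and including "Similar to ".
# Group 2: the next run of non-whitespace characters (the gene name).
HEADER_RE = re.compile(r"^>(.*?Similar to )(\S+)")


def prefix_header(line: str) -> str:
    """Move a copy of the word after 'Similar to' to the front of a header.

    Sequence lines and headers without 'Similar to' are returned unchanged.
    """
    if not line.startswith(">"):
        return line
    # Re-emit ">", the gene name, then the original matched text.
    return HEADER_RE.sub(r">\2\1\2", line, count=1)


def rewrite_fasta(in_path: str, out_path: str) -> None:
    """Stream a FASTA file line by line, rewriting only the header lines."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            dst.write(prefix_header(line))
```

Calling `rewrite_fasta("in.fasta", "out.fasta")` (both file names are placeholders) writes the modified records to a new file and leaves the input untouched.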
19 months ago
YiweiZhu ▴ 30

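I like to use bioawk to edit FASTA files. Note that the substr call in the command below takes exactly five characters after "Similar to ", so it assumes a five-letter gene name such as Chid1.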

>_Anouracaudifer_00017283-RA transcript Name:"Similar to Chid1 Chitinase domain-containing protein 1 (Rattus norvegicus OX=10116)" offset:0 AED:0.30 eAED:0.30 QI:0|0|0|1|1|1|12|0|393
ATGAAGGCGCTCCTGCATGTGCTCTGGCTCACTCTGGCCTGCGGCTCTGCTCACACCACCCTGTCGAAGTCGG

bioawk -c fastx '{a=match($4,"Similar to");b=substr($4,a+11,5);print ">"b$name" "$4"\n"$seq}' test.fa

>Chid1_Anouracaudifer_00017283-RA transcript Name:"Similar to Chid1 Chitinase domain-containing protein 1 (Rattus norvegicus OX=10116)" offset:0 AED:0.30 eAED:0.30 QI:0|0|0|1|1|1|12|0|393
ATGAAGGCGCTCCTGCATGTGCTCTGGCTCACTCTGGCCTGCGGCTCTGCTCACACCACCCTGTCGAAGTCGG