How to replace the headers in FASTA files with the same keyword?
2.6 years ago
sunnykevin97 ▴ 990

Hi,

I have thousands of FASTA files in one directory, and I want to replace every header in all of them with the same keyword, >Gast_superba.

Any suggestions?

gene genome • 1.1k views
$ awk '/>/{print ">Gast_superba"}!/>/' test.fa
$ sed '/^>/ s/.*/>Gast_superba/' test.fa

cpad0112, thanks. I solved it by writing a loop.


Do you want to add Gast_superba to the description line of each entry, right after the >, and keep the rest of the existing description line? A few examples of what you have and what you want for each entry in the file would help make sure you get the best advice. I ask because I wonder if I have it backwards, and you instead want to replace all those occurrences with something different, similar to here.

It sounds like the fastest way to do this would be to use a shell script to iterate over the FASTA files matching your extension, see here, and then use sed to do a find/replace on the header line, see here. If I understand correctly, you would replace every header line (each line starting with >) with >Gast_superba.
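The loop-plus-sed idea described above could look roughly like the sketch below. The function name replace_headers and the temporary-file naming are assumptions made for illustration, and the new header text is assumed not to contain / or &, which sed would treat specially.

```shell
# replace_headers is a hypothetical helper: the first argument is the new
# header text (without the leading ">"), the rest are the files to rewrite.
replace_headers() {
    new_header=$1
    shift
    for f in "$@"; do
        [ -f "$f" ] || continue        # skip an unmatched glob or a non-file
        tmp="$f.tmp.$$"                # scratch copy next to the original
        # Lines starting with ">" are replaced whole; sequence lines pass through.
        if sed "s/^>.*/>$new_header/" "$f" > "$tmp"; then
            mv "$tmp" "$f"             # only overwrite once sed has succeeded
        else
            rm -f "$tmp"
            return 1
        fi
    done
}

# Example call, assuming the files end in .fasta:
# replace_headers Gast_superba *.fasta
```

Writing to a scratch file first and then moving it into place means a failed run does not leave a half-written FASTA file behind.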

There are a lot of similar answers already on Biostars if you look around.


I needed to replace the header name in each FASTA file (10000's) in a directory, and I solved it by writing a loop.

awk '/>/{print ">Gast_superba"}!/>/' *.fasta > con.fasta

Original file

head -n1 data1.fasta
>clostridium buryricum TOA

Modified file

head -n1 data1.fasta
>Gast_superba
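Note that redirecting the awk output to con.fasta puts every record into that one file. If one output file per input is wanted instead, a single awk call can do that too. This is only a sketch: the helper name rename_to_copies and the .renamed suffix are made up for illustration.

```shell
# rename_to_copies is a hypothetical helper: for each input file X it writes
# X.renamed with every header replaced, and leaves X itself untouched.
rename_to_copies() {
    awk -v hdr=">Gast_superba" '
        FNR == 1 {                        # first line of a new input file
            if (out != "") close(out)     # keep the number of open files low
            out = FILENAME ".renamed"
        }
        /^>/ { print hdr > out; next }    # header line: write the keyword instead
        { print > out }                   # sequence line: copy unchanged
    ' "$@"
}

# Example call, assuming the files end in .fasta:
# rename_to_copies *.fasta
```

Closing each output before moving to the next file matters with thousands of inputs, since awk can otherwise hit the limit on simultaneously open files.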
2.6 years ago
Jeremy ▴ 930

At the risk of sounding redundant after Wayne's answer, I think you should be able to use the following command in Unix:

sed 's/>.*/>Gast_superba/' *.fasta
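As written, that sed command prints the modified records to standard output and leaves the files as they were. To change the files themselves, one option is sed's -i flag, which is not part of POSIX but is accepted by both GNU and BSD sed when a backup suffix is attached directly to it. The sketch below assumes that, uses find so that a very large number of files does not overflow the command line, and wraps it in a hypothetical helper called inplace_rename.

```shell
# inplace_rename is a hypothetical helper; its optional argument is the
# directory holding the .fasta files (defaults to the current directory).
inplace_rename() {
    dir=${1:-.}
    # -i.bak edits each file in place and keeps the original as FILE.bak.
    # "{} +" hands the files to sed in batches rather than all at once.
    find "$dir" -maxdepth 1 -type f -name '*.fasta' \
        -exec sed -i.bak 's/^>.*/>Gast_superba/' {} +
}

# Example call:
# inplace_rename /path/to/fasta_dir
```

Once the results look right, the .bak copies can be deleted.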