Rename fasta headers
2.0 years ago
diecasfranco ▴ 10

Hi all, I'm looking for a simple solution for renaming fasta headers.

I have this fasta header

>trpE___AA_HMM___6fa05435949258489b608db9e58e5ba38821f2f26fffe5755daff43abin_id:MALBOS1|source:AA_HMM|e_value:5.2e99|contig:MALBOS1_000000117228|gene_callers_id:113772|start:215745|stop:217260|length:1515

and I would like to shorten it to just this:

>MALBOS1_000000117228

That is, remove everything up to and including "contig:" and everything from "|gene_callers_id" onward.

Any ideas?

thanks

fasta regex • 1.5k views
2.0 years ago
JC 13k
perl -pe 's/>.+contig:(.+?)\|.+/>$1/' < FASTA_IN > FASTA_OUT
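
For anyone without Perl at hand, roughly the same substitution can be sketched with sed. This is only a sketch under assumptions: the function name `rename_to_contig` is made up for illustration, and it assumes every header carries a `contig:` field that is followed by a `|`, as in the example header above. Headers that do not match are left unchanged.

```shell
# Hypothetical helper: reads FASTA on stdin, writes renamed FASTA on stdout.
# Assumes each header has a "contig:<name>|" field, as in the example header.
rename_to_contig() {
  # /^>/ restricts the substitution to header lines; sequence lines pass through.
  # ([^|]+) captures everything after "contig:" up to the next "|".
  sed -E '/^>/ s/^>.*contig:([^|]+)\|.*$/>\1/'
}
```

It would be used the same way as the Perl one-liner: `rename_to_contig < FASTA_IN > FASTA_OUT`.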

Thanks JC, it's a great solution.

Now I have a different issue: I need the gene name at the beginning of the fasta header.

>trpE___AA_HMM___6fa05435949258489b608db9e58e5ba38821f2f26fffe5755daff43abin_id:MALBOS1|source:AA_HMM|e_value:5.2e99|contig:MALBOS1_000000117228|gene_callers_id:113772|start:215745|stop:217260|length:1515

And the output should be

>trpE_MALBOS1_000000117228

Any ideas?


If the headers are consistently formatted (starting with >GENENAME__), you can use:

perl -pe 's/>(.+?_)_.+contig:(.+?)\|.+/>$1$2/' < FASTA_IN > FASTA_OUT
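
In case it helps, here is an awk sketch of the same idea that takes the gene name as whatever precedes the first "___", so gene names containing a single underscore should survive intact. The function name `rename_gene_contig` is hypothetical, and the "___" separator is an assumption taken from the example header; headers without a `contig:` field are printed unchanged.

```shell
# Hypothetical helper: reads FASTA on stdin, writes renamed FASTA on stdout.
# Assumes headers look like ">GENE___...|contig:<name>|..." as in the example.
rename_gene_contig() {
  awk '
    /^>/ && match($0, /contig:[^|]+/) {
      contig = substr($0, RSTART + 7, RLENGTH - 7)  # drop the 7-character "contig:" prefix
      gene = substr($0, 2)                          # header without the leading ">"
      sub(/___.*/, "", gene)                        # keep only the text before the first "___"
      print ">" gene "_" contig
      next
    }
    { print }                                       # sequence lines and non-matching headers
  '
}
```

Usage would be `rename_gene_contig < FASTA_IN > FASTA_OUT`.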

I have a different issue: I need the chromosome name at the beginning of the fasta header.

Chr01:10181894..10189044_INT#LTR/Copia|Class_I/LTR/Ty1_copia/Ale:Ty1-RT ID=Chr01:10181894..10189044_INT#LTR/Copia|Class_I/LTR/Ty1_copia/Ale:Ty1-RT;gene=RT;clade=Ale;evalue=2.7e-66;coverage=26.1;probability=0.98

To

Chr01_Ale_Ty1-RT

Any ideas?
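
One possible sketch for this case, using sed. It rests on several assumptions: that the real header lines start with ">" (the example above is shown without it), that the chromosome name is everything before the first ":", and that the first space-delimited token ends in "/CLADE:DOMAIN" (here "/Ale:Ty1-RT"). The function name `rename_chr_clade` is made up for illustration.

```shell
# Hypothetical helper: reads FASTA on stdin, writes renamed FASTA on stdout.
# Assumes headers like ">Chr01:START..END_...|.../Ale:Ty1-RT ID=...".
rename_chr_clade() {
  # "@" is used as the s-command delimiter so "/" needs no escaping.
  # \1 = text before the first ":"            (e.g. Chr01)
  # \2 = text after the last "/" in token 1   (e.g. Ale)
  # \3 = text after the following ":"         (e.g. Ty1-RT)
  sed -E '/^>/ s@^>([^:]+):[^ ]*/([^/: ]+):([^ ]+).*$@>\1_\2_\3@'
}
```

Usage would be `rename_chr_clade < FASTA_IN > FASTA_OUT`; headers that do not fit the assumed pattern are left as they are.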

2.0 years ago
Jeremy ▴ 930

Here's a solution using cut:

cut -d '|' -f 4 start.fasta | sed 's/contig:/>/' > end.fasta
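
The cut approach relies on `contig:` always being the fourth "|"-separated field. If that position might vary, a sketch like the following looks the field up by its `contig:` prefix instead. The function name `contig_from_fields` is hypothetical, and headers without a `contig:` field are printed unchanged.

```shell
# Hypothetical helper: reads FASTA on stdin, writes renamed FASTA on stdout.
# Finds the "contig:" field by name rather than by position.
contig_from_fields() {
  awk -F'|' '
    /^>/ {
      for (i = 1; i <= NF; i++) {
        if (index($i, "contig:") == 1) {   # field starts with "contig:"
          print ">" substr($i, 8)          # text after the 7-character prefix
          next
        }
      }
    }
    { print }                              # sequence lines and headers without a contig field
  '
}
```

Usage would be `contig_from_fields < start.fasta > end.fasta`.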