Question

Add Information to Protein Fasta Headers

22 months ago

ahmadjoyyia ▴ 20

Hi, I have a protein FASTA file whose headers look like '>evm.model.chr.9.52'. There are more than 30k proteins. I have performed functional annotation and added all of that information to the gene structure file we get from EVM. Those files had columns, so I basically merged the information. Now I am performing some analyses, and I want to add at least the protein name or a GO term to each FASTA header, which would make things a lot easier for me. I want something like:

 >evm.model.chr.9.52 GO:1234678

Can I do this with grep, SeqKit, or some other method? Any help would be highly appreciated. Thanks!!!

protein fasta functional-annotation header • 1.0k views

ADD COMMENT • link 22 months ago by ahmadjoyyia ▴ 20

There is not enough information about where these GO terms or protein names are going to come from. Do you have a separate file which has this information?

ADD REPLY • link 22 months ago by GenoMax 147k

Thank you for the reply! I have the GO terms in a separate tab-delimited file.

ADD REPLY • link 22 months ago by ahmadjoyyia ▴ 20

You may want to post example lines from that file.

ADD REPLY • link 22 months ago by GenoMax 147k

I already gave an example of what my protein FASTA headers look like. I ran functional annotation against many sources and built a custom file, which looks like this:

>chr.1.1128 - KOG1072@1|root,KOG1072@2759|Eukaryota,37PDR@33090|Viridiplantae,3G9CK@35493|Streptophyta,4JJK7@91835|fabids GO:0005575,GO:0005622,GO:0005623,GO:0005737,GO:0005829,GO:0008150,GO:0019222,GO:0031323,GO:0043455,GO:0044424,GO:0044444,GO:0044464,GO:0050789,GO:0050794,GO:0065007,GO:2000762 F-box kelch-repeat protein O80582 RecName: Full=F-box/kelch-repeat protein At2g44130 [Arabidopsis thaliana] substrate(PAL) adaptor of SCF E3 ubiquitin ligase *(KFB-PAL) & swissprot: F-box/kelch-repeat protein At2g44130 & original description: none'

It is a tab-delimited file with proper column names; for example, the 5th column is GO_Description. Let's say I want to copy GO_Description or GO from this file into my protein FASTA file for the matching ID, which in this case would be 'chr.1.1128'. How can I do this? Thanks!!

ADD REPLY • link updated 22 months ago by GenoMax 147k • written 22 months ago by ahmadjoyyia ▴ 20

GenoMax, hi. I can see you have edited my reply, and it is what I am looking for. Could you please tell me how you did it, and how I can do it myself?

ADD REPLY • link 22 months ago by ahmadjoyyia ▴ 20
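
The task described in this thread (appending a GO term or protein name from a tab-delimited annotation table to the matching FASTA header) could be sketched in Python roughly as follows. This is a minimal sketch under several assumptions that are not confirmed in the thread: the annotation file has a header row; the column names `ID` and `GO` are hypothetical placeholders for the real ones; annotation IDs such as 'chr.1.1128' correspond to FASTA IDs such as 'evm.model.chr.1.1128' once the 'evm.model.' prefix is removed; and a lone '-' means "no annotation". The sequence letters in the demo are placeholders, not real data.

```python
import csv
import io


def load_annotations(handle, id_column, value_column):
    """Map each ID to one annotation value from a tab-delimited table with a header row."""
    # QUOTE_NONE keeps quote characters inside descriptions from confusing the parser.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    table = {}
    for row in reader:
        # Assumption: IDs may carry a leading '>' as in the example line in the thread.
        key = (row.get(id_column) or "").lstrip(">").strip()
        value = (row.get(value_column) or "").strip()
        # Assumption: '-' marks an empty field, so such rows are skipped.
        if key and value and value != "-":
            table[key] = value
    return table


def annotate_fasta(fasta_lines, table, prefix="evm.model."):
    """Yield FASTA lines, appending the annotation to headers whose ID is in the table."""
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            fields = line[1:].split()
            seq_id = fields[0] if fields else ""
            # Assumption: the annotation table uses the ID without the 'evm.model.' prefix.
            short_id = seq_id[len(prefix):] if seq_id.startswith(prefix) else seq_id
            extra = table.get(short_id) or table.get(seq_id)
            if extra:
                line = f"{line} {extra}"
        yield line


# Toy in-memory demo; replace the StringIO objects with open("...") on real files.
annotation_file = io.StringIO("ID\tGO\nchr.9.52\tGO:1234678\n")
fasta_file = io.StringIO(">evm.model.chr.9.52\nMKV\n>evm.model.chr.1.1\nMAA\n")

go_by_id = load_annotations(annotation_file, id_column="ID", value_column="GO")
for out_line in annotate_fasta(fasta_file, go_by_id):
    print(out_line)
```

Headers without a match are written unchanged, so the output stays a valid FASTA file even if some proteins have no annotation. To append the description instead of the GO terms, pass the corresponding column name (for example `GO_Description`, if that is how the column is spelled in the file) as `value_column`.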