Add Information to Protein Fasta Headers
22 months ago
ahmadjoyyia ▴ 20

Hi, I have a protein FASTA file whose headers look like '>evm.model.chr.9.52'. There are about 30k proteins. I have performed functional annotation and also added all of that information to the gene structure file produced by EVM. Since those files are column-based, I could simply merge the information there. Now I am running some analyses, and it would make things a lot easier if I could add at least the protein name, or even the GO terms, to each FASTA header. I want something like:

 >evm.model.chr.9.52 GO:1234678

Can I do this with grep, SeqKit, or some other method? Any help would be highly appreciated. Thanks!
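One common approach, sketched here in Python rather than grep or SeqKit: put the ID-to-annotation pairs in a dictionary, then stream the FASTA file and append the annotation to every header whose ID has a match. This assumes the sequence ID is the first whitespace-delimited token after `>`; the function name `annotate_headers` is a hypothetical illustration.

```python
from typing import Dict, Iterable, Iterator


def annotate_headers(lines: Iterable[str], annotations: Dict[str, str]) -> Iterator[str]:
    """Yield FASTA lines, appending annotations[ID] to each matching header.

    Assumption: the ID is the first whitespace-delimited token after '>'.
    Headers without a match, and all sequence lines, pass through unchanged.
    """
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            parts = line[1:].split(maxsplit=1)
            seq_id = parts[0] if parts else ""
            note = annotations.get(seq_id)
            if note:
                line = f"{line} {note}"
        yield line


if __name__ == "__main__":
    # Toy input; in practice the dict would be built from the annotation table
    fasta = [">evm.model.chr.9.52\n", "MSTNPKPQRK\n", ">evm.model.chr.9.53\n", "MKV\n"]
    mapping = {"evm.model.chr.9.52": "GO:1234678"}
    print("\n".join(annotate_headers(fasta, mapping)))
```

For a real file, iterate over an open handle and write each yielded line plus `"\n"` to a new file (file names are up to you). Because the FASTA is streamed line by line, only the annotation dictionary needs to fit in memory, which is small for ~30k proteins.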

There is not enough information about where these GO terms or protein names are going to come from. Do you have a separate file which has this information?

Thank you for the reply! I have the GO terms in a separate tab-delimited file.

You may want to post example lines from that file.

I already gave an example of what my protein FASTA headers look like. I did functional annotation with several sources and built a custom file that looks like this:

>chr.1.1128 - KOG1072@1|root,KOG1072@2759|Eukaryota,37PDR@33090|Viridiplantae,3G9CK@35493|Streptophyta,4JJK7@91835|fabids GO:0005575,GO:0005622,GO:0005623,GO:0005737,GO:0005829,GO:0008150,GO:0019222,GO:0031323,GO:0043455,GO:0044424,GO:0044444,GO:0044464,GO:0050789,GO:0050794,GO:0065007,GO:2000762 F-box kelch-repeat protein O80582 RecName: Full=F-box/kelch-repeat protein At2g44130 [Arabidopsis thaliana] substrate(PAL) adaptor of SCF E3 ubiquitin ligase *(KFB-PAL) & swissprot: F-box/kelch-repeat protein At2g44130 & original description: none'

It is a tab-delimited file with proper column names; for example, the 5th column is GO_Description. Let's say I want to copy the GO_Description or GO column from this file into my protein FASTA headers by matching IDs, which in this case would be 'chr.1.1128'. How can I do this? Thanks!
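Note that the IDs in the two files differ: the FASTA headers use `evm.model.chr.9.52`, while the table uses `>chr.1.1128`, so a direct lookup would find nothing. The Python sketch below builds an ID-to-value dictionary from one named column of the tab-delimited table and rewrites the IDs into the FASTA form. It assumes that the ID is in the first column (possibly with a leading `>`), that the file has a header row, and that each FASTA ID is exactly `evm.model.` plus the table ID; the function name `load_column`, the constant `FASTA_PREFIX`, and the column name `query` in the toy data are hypothetical.

```python
import csv
import io
from typing import Dict, TextIO

# Assumption: FASTA ID = this prefix + table ID (e.g. evm.model.chr.1.1128)
FASTA_PREFIX = "evm.model."


def load_column(handle: TextIO, column: str) -> Dict[str, str]:
    """Map FASTA-style ID -> value of `column` from a tab-delimited table.

    Assumptions: a header row with column names is present, the ID is in
    the first column and may start with '>', and rows with an empty value
    in `column` are skipped.
    """
    # QUOTE_NONE so quote characters inside descriptions are kept literally
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader)
    col = header.index(column)  # raises ValueError if the column name is wrong
    mapping: Dict[str, str] = {}
    for row in reader:
        if len(row) <= col or not row[col].strip():
            continue
        table_id = row[0].strip().lstrip(">")
        mapping[FASTA_PREFIX + table_id] = row[col].strip()
    return mapping


if __name__ == "__main__":
    # Toy table with shortened values; real column positions may differ
    table = io.StringIO(
        "query\tGO\tGO_Description\n"
        ">chr.1.1128\tGO:0005575,GO:0005622\tF-box kelch-repeat protein\n"
    )
    print(load_column(table, "GO"))
    print(load_column(io.StringIO(table.getvalue()), "GO_Description"))
```

Passing `"GO"` or `"GO_Description"` selects which column ends up in the headers; the resulting dictionary can then be used to append values to the matching FASTA headers. If the prefix assumption does not hold for every ID, checking how many FASTA IDs are found in the dictionary is a quick sanity test.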

Hi GenoMax, I can see you edited my reply, and the result is what I am looking for. Could you please tell me how you did it, and how I can do it myself?
