22 months ago
ahmadjoyyia
Hi, I have a protein FASTA file whose headers look like '>evm.model.chr.9.52'. There are 30k+ proteins. I have performed functional annotation and added all of that information to the gene structure we get from EVM. Those files had columns, so I basically merged the information. Now I am performing some analysis, and I want to add at least the protein name or even the GO term to the FASTA header, which would make things a lot easier for me. I want something like:
>evm.model.chr.9.52 GO:1234678
Can I do it with grep, SeqKit, or any other method? Any help would be highly appreciated. Thanks!
There is not enough information about where these GO terms or protein names are going to come from. Do you have a separate file which has this information?
Thank you for the reply! I have the GO terms in a separate tab-delimited file.
You may want to post example lines from that file.
I already gave an example of how my protein FASTA header looks. I did functional annotation from many sources and made a custom file which looks like this:
That is a tab-delimited file with proper column names. For example, the 5th column is GO_Description. Let's say I want to copy GO_Description or GO from this file into my protein FASTA file for the matching ID, which in my case would be 'chr.1.1128'. How can I do this? Thanks!
GenoMax Hi, I can see you have edited my reply, and it is what I am looking for. Can you please tell me how you did it, and how I can do it?
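For the header question itself, one possible approach is a short Python script using only the standard library. This is a minimal sketch, not a tested solution for your exact files. It assumes the annotation table has a header row, that the ID column is called `ID` and the GO column `GO` (both hypothetical names; only `GO_Description` is mentioned above), and that the table IDs look like 'chr.1.1128' while the FASTA IDs carry an extra 'evm.model.' prefix. Adjust the column names and the prefix to whatever your file actually contains.

```python
import csv
import io


def load_annotations(tsv_handle, id_column, value_column):
    """Build a dict mapping each ID in the table to the text to append."""
    reader = csv.DictReader(tsv_handle, delimiter="\t")
    annotations = {}
    for row in reader:
        value = row.get(value_column)
        if value:  # skip rows where the chosen column is empty
            annotations[row[id_column]] = value
    return annotations


def annotate_fasta(fasta_handle, out_handle, annotations, id_prefix=""):
    """Copy a FASTA file, appending the matching annotation to each header.

    id_prefix is stripped from the FASTA ID before the lookup, for the
    assumed case where the table lacks the 'evm.model.' prefix.
    Headers without a match are written unchanged.
    """
    for line in fasta_handle:
        if line.startswith(">"):
            header = line[1:].rstrip("\r\n")
            fields = header.split()
            seq_id = fields[0] if fields else ""
            key = seq_id
            if id_prefix and seq_id.startswith(id_prefix):
                key = seq_id[len(id_prefix):]
            value = annotations.get(key)
            if value:
                line = f">{header} {value}\n"
        out_handle.write(line)


if __name__ == "__main__":
    # Tiny in-memory demo; the table content here is made up for illustration.
    table = io.StringIO("ID\tGO\nchr.9.52\tGO:1234678\n")
    fasta = io.StringIO(">evm.model.chr.9.52\nMKV\n")
    out = io.StringIO()
    annotate_fasta(fasta, out, load_annotations(table, "ID", "GO"), "evm.model.")
    print(out.getvalue(), end="")
```

With real files you would replace the `io.StringIO` objects with `open("annotations.tsv")`, `open("proteins.fasta")` and `open("proteins.annotated.fasta", "w")` (file names are placeholders). To append the description instead of the GO ID, pass `"GO_Description"` as the value column. If the IDs in your table already include 'evm.model.', leave `id_prefix` empty.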