Gene names/ids on annotated protein files
21 months ago
nitinra ▴ 50

Hello all,

I have annotated around 276 protein files using eggNOG-mapper. The headers in the protein files look like this (example from one of the species):

 head Spodoptera_frugiperda.fa

>file_1_file_1_g22553.t1 gene=file_1_file_1_g22553
MNRLGMIVDLSHVGENTTRAAIKLSRAPVVFTHSSVYSLCNHKRNVPDDIIHSLKENGGIIMVNFFPDFVKCAPNATISDVAEHFHYIKRMVGADYVGIGGDFDGVNRVPRGLEDVSRYPELFAELLRSGQWTVQELKNLAGLNMLRVMRQVEKVRDEMRTNGVEPEEHPDSPNDNGNCTSNAFYTEYV

The annotation file from eggNOG-mapper contains the following:

  head Spodoptera_frugiperda.softmasked.prot.fa.emapper.annotations

> file_1_file_1_g22553.t1   13037.EHJ66618  2.39e-121   357.0   COG2355@1|root,KOG4127@2759|Eukaryota,38D9H@33154|Opisthokonta,3BCAM@33208|Metazoa,3CRIG@33213|Bilateria,41U16@6656|Arthropoda,3SJQR@50557|Insecta,4488J@7088|Lepidoptera   6656|Arthropoda O   Membrane dipeptidase (Peptidase family M19) -   -   3.4.13.19   ko:K01273   -   -   ko00000,ko00537,ko01000,ko01002,ko04147 -   -   -   Peptidase_M19

How do I replace the gene names in all the headers of the protein file with the names from the annotation file? In the case of the example, gene=file_1_file_1_g22553 should become Peptidase_M19.

Thank you!!

bash Protein GFF annotation • 424 views
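
One possible approach, as a minimal sketch rather than a definitive answer: read the annotation file into a lookup table with awk, then rewrite each FASTA header whose ID has an entry. The function name `rename_headers` is hypothetical. The sketch assumes the real annotation file is tab-separated, that the query ID is in the first column without a leading `>`, and that the label you want (Peptidase_M19 in the example) is in the last column. It keeps the protein ID and only swaps the value after `gene=`, because the same label can occur on many proteins and replacing the whole header would give duplicate sequence names.

```shell
# Hypothetical helper: rename_headers ANNOTATIONS FASTA
# Assumptions (not confirmed by the post): ANNOTATIONS is tab-separated,
# column 1 is the query ID, and the last column holds the label to use.
rename_headers() {
    awk -F'\t' '
        # First file (annotations): build an ID -> label map.
        # Comment lines and rows whose last column is "-" or empty are skipped.
        NR == FNR {
            if ($0 !~ /^#/ && $NF != "-" && $NF != "") label[$1] = $NF
            next
        }
        # Second file (FASTA): rewrite headers that have a label.
        /^>/ {
            id = substr($0, 2)          # drop the leading ">"
            sub(/[ \t].*/, "", id)      # ID = text up to the first whitespace
            if (id in label) {
                print ">" id " gene=" label[id]
                next
            }
        }
        # Sequence lines and unannotated headers pass through unchanged.
        { print }
    ' "$1" "$2"
}

# Example call (the output file name is made up for illustration):
# rename_headers Spodoptera_frugiperda.softmasked.prot.fa.emapper.annotations \
#     Spodoptera_frugiperda.fa > Spodoptera_frugiperda.renamed.fa
```

Headers without a matching annotation are left as they are, so it is easy to check afterwards how many were not renamed. If the last column lists several comma-separated names, the whole list ends up in the header. For all 276 species the same call could be wrapped in a loop over the file pairs.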