I have assigned KEGG IDs to my newly sequenced protein sequences using KEGG/KAAS, so I now have a list of IDs. How do I assign them to pathway maps? I need to know which of the genes (proteins) is in which family.
Could I ask you to provide an example of an input file and an example of your desired output? It might help us to better understand your question.
Perhaps you might find this tool useful? https://github.com/endrebak/kg
I solved this by using GhostKOALA.
You just need to provide your query amino acid sequences in FASTA format and specify which KEGG GENES database file should be searched. You will get an email when your results are ready. On the results page, if you go to "reconstruct pathway", it will tell you how many proteins match each family and also which of the genes is in each family. Hope it helps!
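If you already have a gene-to-KO list (from KAAS or GhostKOALA) and want to do the grouping yourself, here is a rough sketch of one way to do it. It assumes the list is a two-column text file (gene ID, then K number, with unannotated genes left blank) and that the KO-to-pathway links come from the KEGG REST `link` operation, which returns tab-separated lines such as `ko:K00844<TAB>path:map00010`. The function names, the gene IDs, and the small sample data are made up for illustration; check the column layout of your own file before relying on this.

```python
from collections import defaultdict
from urllib.request import urlopen

# KEGG REST "link" operation: all KO-to-pathway links as tab-separated text.
KEGG_LINK_URL = "https://rest.kegg.jp/link/pathway/ko"


def fetch_ko_pathway_links(url=KEGG_LINK_URL):
    """Download the KO-to-pathway link table (needs network access)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")


def parse_gene_ko(text):
    """Parse 'gene<whitespace>KO' lines; genes without a KO are skipped.

    Assumption: one gene per line, K number in the second column.
    """
    gene_to_ko = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            gene_to_ko[parts[0]] = parts[1]
    return gene_to_ko


def parse_ko_pathway_links(text):
    """Parse 'ko:Kxxxxx<TAB>path:mapNNNNN' lines into {KO: {pathway, ...}}."""
    ko_to_paths = defaultdict(set)
    for line in text.splitlines():
        parts = line.split("\t")
        if len(parts) != 2:
            continue
        ko = parts[0].split(":", 1)[-1]
        path = parts[1].split(":", 1)[-1]
        # Each pathway is listed under a 'map' and a 'ko' prefix; keep 'map' only.
        if path.startswith("map"):
            ko_to_paths[ko].add(path)
    return ko_to_paths


def genes_by_pathway(gene_to_ko, ko_to_paths):
    """Return {pathway: sorted list of genes whose KO is linked to it}."""
    grouped = defaultdict(list)
    for gene, ko in gene_to_ko.items():
        for path in ko_to_paths.get(ko, ()):
            grouped[path].append(gene)
    return {path: sorted(genes) for path, genes in grouped.items()}


# Illustrative data only: hypothetical gene IDs, links written in the REST format.
sample_gene_ko = "geneA\tK00844\ngeneB\ngeneC\tK00001\n"
sample_links = (
    "ko:K00844\tpath:map00010\n"
    "ko:K00844\tpath:ko00010\n"
    "ko:K00001\tpath:map00010\n"
    "ko:K00001\tpath:map00071\n"
)

pathways = genes_by_pathway(
    parse_gene_ko(sample_gene_ko),
    parse_ko_pathway_links(sample_links),
)
for pathway_id in sorted(pathways):
    print(pathway_id, ",".join(pathways[pathway_id]))
```

For real data, replace `sample_links` with the text returned by `fetch_ko_pathway_links()` and `sample_gene_ko` with the contents of your own annotation file.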
Do the FASTA-formatted amino acid sequences have to be divided into proteins, like this:
>PROKKA_00002 hypothetical protein
MSINSSLQQLAGGIAAAIGGMIVVQKDNFSPIEHYDTLALVVAIFVGICVYVLSLVSKIV
RDKNKA*
>PROKKA_00003 ATP-dependent RNA helicase RhlE
LEALNRFKAGKTRVLVTTDLLARGIDIQFLPFVINYELPRSPKDYIHRIGRTVRAEASGE
AISFVSPEDQHHFKVIQKKMKKWVTMVEGDGLV*
>PROKKA_00004 Long-chain-fatty-acid--CoA ligase FadD13
MIIRGGENIYSSEVENILYEHPAVTDAALVGIPHQTLGEEPAAVVHLAPGMTATEEELRH
YVSERLAKFKVPVKIIFTQDTLPRNANGKILKRDLKALF*
I mean, they have to be, right? Otherwise, how would the program tell where one protein ends and the next one begins?
Welcome @willnotburn: As far as I am aware, partial protein sequences can be used as input too. This means that you can input sequences that either do not start with M (5prime_partial) or do not end with * (3prime_partial). I did not have any problem with internal protein sequences either.
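To make the point about record boundaries concrete: in a multi-FASTA file each `>` header line starts a new protein, and the sequence lines that follow belong to it until the next `>`. Below is a minimal sketch (not how GhostKOALA itself is implemented) that splits such a file into records and labels each one using the start/stop rule described above. The function names are made up, and the rule is only a heuristic: some gene callers keep a non-M residue for alternative start codons, and many FASTA files omit the trailing `*`.

```python
def read_fasta(text):
    """Yield (header, sequence) pairs; each '>' line starts a new record."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            # Sequence lines are concatenated until the next header.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def completeness(seq):
    """Label a protein by whether it starts with M and ends with '*'."""
    has_start = seq.startswith("M")
    has_stop = seq.endswith("*")
    if has_start and has_stop:
        return "complete"
    if has_stop:
        return "5prime_partial"   # start of the protein is missing
    if has_start:
        return "3prime_partial"   # end of the protein is missing
    return "internal"


# Two of the records posted above, used as sample input.
SAMPLE = """>PROKKA_00002 hypothetical protein
MSINSSLQQLAGGIAAAIGGMIVVQKDNFSPIEHYDTLALVVAIFVGICVYVLSLVSKIV
RDKNKA*
>PROKKA_00003 ATP-dependent RNA helicase RhlE
LEALNRFKAGKTRVLVTTDLLARGIDIQFLPFVINYELPRSPKDYIHRIGRTVRAEASGE
AISFVSPEDQHHFKVIQKKMKKWVTMVEGDGLV*
"""

records = list(read_fasta(SAMPLE))
for header, seq in records:
    print(header.split()[0], len(seq), completeness(seq))
```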
The input is a file of more than 5,000 protein sequences,
e.g.
The output should look like this: