Hello everyone,
I have several bacterial genome files in .gbk (GenBank) format. I am looking for a script or software that lets me extract the part of a .gbk file that belongs to particular genes, selected by name. For example, from the file excerpt below I want to obtain just the part whose product is named "IVa2 protein".
Input:
CDS 1879..3366
/gene="E1B"
/codon_start=1
/product="55K protein"
/protein_id="ADM66105.1"
/translation="MEPGHPTEQGLHPGLRSHAPVEGLDQAAGTENLELLASTASSSG
SSSSTQTNIHVGGRNEAGHGREPEERPGPSVGRGAGLNQVSSLYPELSKVLTSMARGV
KRERSDGGNTGMMTELTASLMNRKRPERLTWYELQQECRDELGLMQDKYGLEQIKTHW
LNPDEDWEEAIKKYAKIALRPDCKYIVTKTVNIRHACYISGNGAEVVIDTLDKAAFRC
CMMGMRAGVMNMNSMIFMNMKFNGEKFNGVLFMANSHMTLHGCSFFGFNNMCAEVWGA
SKIRGCKFYGCWMGVVGRPKSEMSVKQCVFEKCYLGVSTEGNARVRHCSSLETGCFCL
VKGTASLKHNMVKGCTDERMYNMLTCDSGVCHILKNIHVTSHPRKKWPVFENNLLIKC
HMHLGARRGTFQPYQCNFSQTKLLLENDAFSRVNLNGIFDMDVSVYKILRYDETKSRV
RACECGGRHTRMQPVALDVTEELRPDHLVMACTGTEFSSSGEDTD"
gene 3454..3858
/gene="IX"
CDS 3454..3858
/gene="IX"
/codon_start=1
/product="IX protein"
/protein_id="ADM66106.1"
/translation="MNGTGGAFEGGLFSPYLTTRLPGWAGVRQNVMGSTVDGRPVLPA
NSSTMTYATVGSSSLDSTAAAAAAAAAMTATRLASSYMPSSGSSPSVPSSIIAEEKLL
ALLAELEALSRQLAALTQQVSELREQQQQQNK"
gene complement(3902..5526)
/gene="IVa2"
CDS complement(join(3902..5235,5514..5526))
/gene="IVa2"
/codon_start=1
/product="IVa2 protein"
/protein_id="ADM66107.1"
/translation="METRGRRPCPFQHQQDESQAHPCKRPARGSPLHRDGDHPHSDPE
TLEGHDAGRAGRPSSRALQSQSSQPPKRGSLLDRDAVEHVTELWDRLELLSQTLAKMP
MADGLKPLKNFASLQELLSLGGDRLLGELVRENLQVRDMLNEVAPLLRDDGSCMSLNY
HLQPVIGVIYGPTGCGKSQLLRNLLSSQLITPAPETVFFIAPQVDMIPPSEMKAWEMQ
ICEGNFAPGPEGTIVPQSGTLRPKFIKMSYDDLTQEHNYDVSDPRNVFAKAAAHGPIA
IIMDECMENLGGHKGVSKFFHAFPSKLHDKFPKCTGYTVLVVLHNMNPRRDLGGNIAN
LKIQAKLHIISPRMHPSQLNRFANTYTKGLPVAISLLLKDIIQHHAQRPCYDWIIYNT
TPEHEAMQWCYLHPRDGLMPMYLNIQSHLYRVLEKIHRTLNDRERWTRAYRARKNK"
gene complement(5005..13458)
/gene="E2B"
CDS complement(join(5005..8526,13450..13458))
/gene="E2B"
/codon_start=1
/product="DNA polymerase"
/protein_id="ADM66108.1"
/translation="MALVQSHGARGLHAEAADPGCQPPRRRARQRSQGAAPGPARAPR
RRASAAPARGARTAAAAGSTPATPLLKAHRGTVVAPRSYGLMQCVDTTTNSPVEIKYH
LHLKHALTRLYEVNLRTLPPDLDLRDTMDSSQLRALVFALRPRRAEIWTWLPRGLVSL
SVLEEPQGESHAGEHESHQPGPPLLKFLLKGRAVYLVDEVQPVQRCEYCGRFYKHQHE
CSVRRRDFYFHHINSHSSNWWQEIQFFPIGSHPRTERLFVTYDVETYTWMGSFGKQLV
PFMLVMKFSGEPELVALARDLAVRLRWDRWERDPLTFYCVTPEKMAVGQQFRLFRDEL
QTLMARELWASFMQANPHLQEWALEQHGLQCPEDLTYEELKKLPHIKGRPRFMELYIV
GHNINGFDEIVLAAQVINNRASVPGPFRITRNFMPRAGKILFNDVTFALPNPLSKKRT
DFELWEHGGCDDSDFKYQFLKVMVRDTFALTHTSLRKAAQAYALPVEKGCCPYKAVNH
FYMLGSYRADDRGFPLREYWKDDEEYALNRELWEKKGEAGYDIIRETLDYCAMDVLVT
AELVAKLQDSYAHFIRDSVRLPHAHFNIFQRPTISSNSHAIFRQIVFRAEQPQRTNLG
PAFLAPSHELYDYVRASIRGGRCYPTYIGILSEPIYVYDICGMYASALTHPMPWGPPL
NPYERALAAREWQMALDDASSKIDYFDKELCPGIFTIDADPPDEHLLDVLPPFCSRKG
GRLCWTNEPLRGEVATSVDLVTLHNRGWRVRIVPDERTTVFPEWKCVAREYVQLNIAA
KERADRDKNQTMRSIAKLLSNALYGSFATKLDNKKIVFSDQMDESLLKSIAAGQANIK
SSSFLETDNLSAEVMPALEREYLPQQLALVDSDAEESEDEHRPAPFYTPPSGTPGHVA
YTYKPITFLDAEEGDMCLHTVEKVDPLVDNDRYPSHVASFVLAWTRAFVSEWSEFLYE
EDRGTPLQDRPIKSVYGDTDSLFVTERGHRLMETRGKKRIKKNGGKLVFDPEQPELTW
LVECETVCAHCGADAFAPESVFLAPKLYALQSLLCPACGRSSKGKLRAKGHAAEVLNY
ELMVNCYLADSQGEDRARFSTSRMSLKRTLASAQPGAHPFTVTETTLTRTLRPWKDMT
LAALDAHRLVPYSRSRPNPRNEEVCWIEMP"
Output:
gene complement(3902..5526)
/gene="IVa2"
CDS complement(join(3902..5235,5514..5526))
/gene="IVa2"
/codon_start=1
/product="IVa2 protein"
/protein_id="ADM66107.1"
/translation="METRGRRPCPFQHQQDESQAHPCKRPARGSPLHRDGDHPHSDPE
TLEGHDAGRAGRPSSRALQSQSSQPPKRGSLLDRDAVEHVTELWDRLELLSQTLAKMP
MADGLKPLKNFASLQELLSLGGDRLLGELVRENLQVRDMLNEVAPLLRDDGSCMSLNY
HLQPVIGVIYGPTGCGKSQLLRNLLSSQLITPAPETVFFIAPQVDMIPPSEMKAWEMQ
ICEGNFAPGPEGTIVPQSGTLRPKFIKMSYDDLTQEHNYDVSDPRNVFAKAAAHGPIA
IIMDECMENLGGHKGVSKFFHAFPSKLHDKFPKCTGYTVLVVLHNMNPRRDLGGNIAN
LKIQAKLHIISPRMHPSQLNRFANTYTKGLPVAISLLLKDIIQHHAQRPCYDWIIYNT
TPEHEAMQWCYLHPRDGLMPMYLNIQSHLYRVLEKIHRTLNDRERWTRAYRARKNK"
hi,
you may want to try this one
Have you searched the forum? There is no end of solutions for parsing genes out of GenBank files. Take a look at the "Similar posts" on the right-hand side of this page.
I would recommend learning some Biopython (or a similar library) to achieve this.
I have already searched various topics on the site, and I still have difficulty getting this information out of the .gbk file.