6.2 years ago
steven.marshall2
Hi
I am trying to parse a GenBank file using Python 2.7 and Biopython 1.73.
Below is the first entry in my file. The information I would like to save to a new file is the accession, the organism, the kpc gene, and its translation.
I would like to save the same information from every record in my file.
Thanks in advance to anyone who can help.
LOCUS MH558576 11275 bp DNA linear BCT 03-SEP-2018
DEFINITION Klebsiella pneumoniae strain KP21-KPC plasmid, partial sequence.
ACCESSION MH558576
VERSION MH558576.1
KEYWORDS .
SOURCE Klebsiella pneumoniae
ORGANISM Klebsiella pneumoniae
Bacteria; Proteobacteria; Gammaproteobacteria; Enterobacterales;
Enterobacteriaceae; Klebsiella.
REFERENCE 1 (bases 1 to 11275)
AUTHORS Wang,P., Hu,Y., Yi,G., Shen,X., Wang,Z., Ma,R., Shan,B. and Wang,Y.
TITLE Clone dissemination of blaKPC-2 and blaNDM-1 co-producing clinical
isolates of Klebsiella pneumoniae in a Chinese teaching hospital
JOURNAL Unpublished
REFERENCE 2 (bases 1 to 11275)
AUTHORS Wang,P., Hu,Y., Yi,G., Shen,X., Wang,Z., Ma,R., Shan,B. and Wang,Y.
TITLE Direct Submission
JOURNAL Submitted (02-JUL-2018) Department of Key Laboratory, The 2nd
Affiliated Hospital of Kunming Medical University, 374 Dian Mian
Road, Kunming, Yunnan 650101, China
COMMENT ##Assembly-Data-START##
Sequencing Technology :: Sanger dideoxy sequencing
##Assembly-Data-END##
FEATURES Location/Qualifiers
source 1..11275
/organism="Klebsiella pneumoniae"
/mol_type="genomic DNA"
/strain="KP21-KPC"
/isolation_source="urine"
/db_xref="taxon:573"
/plasmid="unnamed"
/country="China: Kunming,YN"
/collection_date="2010"
CDS 43..759
/codon_start=1
/transl_table=11
/product="IS6-like element IS26 family transposase"
/protein_id="AXS01185.1"
/translation="MELHMNPFKGRHFQRDIILWAVRWYCKYGISYRELQEMLAERGV
NVDHSTIYRWVQRYAPEMEKRLRWYWRNPSDLCPWHMDETYVKVNGRWAYLYRAVDSR
GRTVDFYLSSRRNSKAAYRFLGKILNNVKKWQIPRFINTDKAPAYGRALALLKREGRC
PSDVEHRQIKYRNNVIECDHGKLKRIIGATLGFKSMKTAYATIKGIEVMRALRKGQAS
AFYYGDPLGEMRLVSRVFEM"
gene 862..1272
/gene="tnpR"
/note="truncated TnpR resolvase"
CDS 1395..2375
/codon_start=1
/transl_table=11
/product="IS481-like element ISKpn27 family transposase"
/protein_id="AXS01186.1"
/translation="MTQALHSQARTTHLIREEIRNSTLPQAELARMYNVTRQTIRKWQ
NRESPEDKSHAPNKMYTTLTPEQELIVVELRKTLLLPTDDLLAVTREFINPAVSRAGL
GRCLRRHGVSDLRNLVEQEGTAPATKKTFKDYEPGFVHIDIKYLPQMPDETARRYLFV
AIDRATRWVFIELYADQTDGSSGDFLNKVQQACPVKIVKLLTDNGSQFTDRFTAGGKK
KEPSGTHVFDRLCKQLGIEHRLIPPRHPQTNGMVERFNGRISDIVNQTRFGSAAELES
TLRNYVKIYNHSIPQRALQHKTPVQALKEWHEKRPELFRKRVYNQPGLDI"
gene 2651..3532
/gene="kpc"
/note="carbapenem-hydrolyzing class A beta-lactamase
KPC-2"
CDS 2651..3532
/gene="kpc"
/codon_start=1
/transl_table=11
/product="carbapenem-hydrolyzing class A beta-lactamase
KPC-2"
/protein_id="AXS01187.1"
/translation="MSLYRRLVLLSCLSWPLAGFSATALTNLVAEPFAKLEQDFGGSI
GVYAMDTGSGATVSYRAEERFPLCSSFKGFLAAAVLARSQQQAGLLDTPIRYGKNALV
PWSPISEKYLTTGMTVAELSAAAVQYSDNAAANLLLKELGGPAGLTAFMRSIGDTTFR
LDRWELELNSAIPGDARDTSSPRAVTESLQKLTLGSALAAPQRQQFVDWLKGNTTGNH
RIRAAVPADWAVGDKTGTCGVYGTANDYAVVWPTGRAPIVLAVYTRAPNKDDKHSEAV
IAAAARLALEGLGVNGQ"
gene complement(4767..5063)
/gene="korC"
/note="transcriptional repressor protein KorC"
CDS complement(4767..5063)
/gene="korC"
/codon_start=1
/transl_table=11
/product="transcriptional repressor protein KorC"
/protein_id="AXS01188.1"
/translation="MIRPETLRPFAEDWQAPTADEIKEVLELIRQRKGLSKPLSGVDV
ADLVGLPGERGSGKGTRTFRRWVSKTNPSPIAYGAWSILAHLAGFGAIWDADRD"
gene complement(5392..5817)
/gene="klca"
/note="antirestriction protein"
CDS complement(5392..5817)
/gene="klca"
/codon_start=1
/transl_table=11
/product="antirestriction protein"
/protein_id="AXS01189.1"
/translation="MMQTELNPLICSLVATPRRMAAMPRYVGRFYVVFESMLYQQMKG
LCREYRGAYWLMWELSNGGFYMAPGRRDEMLNIEAMNYFSGQMSADAAGITACLYLYS
HLSFHTEGADQERFSRLYHSLRDWACEHDEKEAILAAID"
CDS complement(5928..6206)
/codon_start=1
/transl_table=11
/product="hypothetical protein"
/protein_id="AXS01190.1"
/translation="MIHTANRTFHQLYREWIRERREHMHNVLTWERDRYGARLVGLFY
RYCKVANPFPRCTLNTRINYRAHAVNLPDWPARSLELNKMWLSWREKK"
CDS 7749..8309
/codon_start=1
/transl_table=11
/product="TnpR resolvase"
/protein_id="AXS01191.1"
/translation="MQGHRIGYVRVSSFDQNPERQLEQTQVSKVFTDKASGKDTQRPQ
LEALLSFVREGDTVVVHSMDRLARNLDDLRRLVQKLTQRGVRIEFLKEGLVFTGEDSP
MANLMLSVMGAFAEFERALIRERQREGIALAKQRGAYRGRKKALSDEQAATLRQRATA
GEPKAQLAREFNISRETLYQYLRTDD"
CDS 8313..>11275
/codon_start=1
/transl_table=11
/product="Tn3-like element TnAs1 family transposase"
/protein_id="AXS01192.1"
/translation="MPRRLILSATERDTLLALPESQDDLIRYYTFNDSDLSLIRQRRG
DANRLGFAVQLCLLRYPGYALGTDSELPEPVILWVAKQVQAEPASWAKYGERDVTRRE
HAQELRTYLQLAPFGLSDFRALVRELTELAQQTDKGLLLAGQALESLRQKRRILPALS
VIDRACSEAIARANRRVYRALVEPLTDSHRAKLDELLKLKAGSSITWLTWLRQAPLKP
NSRHMLEHIERLKTFQLVDLPEGLGRHIHQNRLLKLAREGGQMTPKDLGKFEPQRRYA
TLAAVVLESTATVIDELVDLHDRILVKLFSGAKHKHQQQFQKQGKAINDKVRLYSRIG
QALLEAKESGSDPYAAIEAVIPWDEFTESVSEAELLARPEGFDHLHLVGENFATLRRY
TPALLEVLELRAAPAAQGVLAAVQTLREMNADNLRKVPADAPTAFIKPRWKPLVITPE
GLDRKFYEICALSELKNALRSGDIWVKGSRQFRDFDDYLLPAEKFAALKREQALPLAI
NPNSDQYLEERLQLLDEQLATVTRLAKDNELPDAILTESGLKITPLDAAVPDRAQALI
DQTSQLLPRIKITELLMDVDDWTGFSRHFTHLKDGAEAKDRTLLLSAILGDAINLGLT
KMAESSPGLTYAKLSWLQAWHIRDETYSAALAELVNHQYRHAFAAHWGDGTTSSSDGQ
RFRAGGRGESTGHVNPKYGSEPGRLFYTHISDQYAPFSTRVVNVGVRDSTYVLDGLLY
HESDLRIEEHYTDTAGFTDHVFALMHLLGFRFAPRIRDLGETKLYVPQGVQAYPTLRP
LIGGTLNIKHVRAHWDDILRLASSIKQGTVTASLMLRKLGSYPRQNGLAVALRELGRI
ERTLFILDWLQSVELRRRVHAGLNKGEARNSLARAVFFNRLGEIRDRSFEQQRYRASG
LNLVTAAIVLWNTVYLERATQGLVEAGKPVDGELLQFLSPLGWEHINLTGDYVWRQSR
RLEDGKFRPLRMPGKP"
ORIGIN
1 gcaaatagtc ggtggtgata aacttatcat ccccttttgc tgatggagct gcacatgaac
61 ccattcaaag gccggcattt tcagcgtgac atcattctgt gggccgtacg ctggtactgc
121 aaatacggca tcagttaccg tgagctgcag gagatgctgg ctgaacgcgg agtgaatgtc
181 gatcactcca cgatttaccg ctgggttcag cgttatgcgc ctgaaatgga aaaacggctg
241 cgctggtact ggcgtaaccc ttccgatctt tgcccgtggc acatggatga aacctacgtg
301 aaggtcaatg gccgctgggc gtatctgtac cgggccgtcg acagccgggg ccgcactgtc
361 gatttttatc tctcctcccg tcgtaacagc aaagctgcat accggtttct gggtaaaatc
Thanks. With the tutorial and some other information I found online, I was able to put together a script that gives me the following result:
Klebsiella pneumoniae strain KP21-KPC plasmid, partial sequence
['kpc']
['carbapenem-hydrolyzing class A beta-lactamase KPC-2']
['MSLYRRLVLLSCLSWPLAGFSATALTNLVAEPFAKLEQ....
This is exactly the information I want (apart from the brackets and quotation marks around everything). However, I had to know that the feature I wanted was the 5th one. Given multiple records in a GenBank file, how can I select the feature I want by a qualifier such as the gene name?
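One possible approach, sketched below, is to loop over every feature of every record and test the /gene qualifier instead of relying on the feature's position. The function names (find_gene_cds, summarize, write_summary) and the file names in the usage note are made up for illustration and are not part of Biopython. The sketch assumes each record exposes .id, .annotations and .features, and each feature exposes .type and .qualifiers, as Biopython's SeqRecord and SeqFeature objects do. Qualifier values are lists, which is where the brackets and quotes in the output come from, so taking element [0] gives the plain string.

```python
def find_gene_cds(record, gene_name):
    """Yield (product, translation) for each CDS whose /gene qualifier matches."""
    for feature in record.features:
        if feature.type != "CDS":
            continue
        # Qualifier values are lists such as ['kpc']; .get() avoids a KeyError
        # for CDS features that carry no /gene qualifier at all.
        if gene_name in feature.qualifiers.get("gene", []):
            product = feature.qualifiers.get("product", [""])[0]
            translation = feature.qualifiers.get("translation", [""])[0]
            yield product, translation


def summarize(records, gene_name):
    """Return rows of (accession, organism, gene, product, translation)."""
    rows = []
    for record in records:
        # For GenBank input, record.id is normally the versioned accession.
        organism = record.annotations.get("organism", "")
        for product, translation in find_gene_cds(record, gene_name):
            rows.append((record.id, organism, gene_name, product, translation))
    return rows


def write_summary(genbank_path, gene_name, out_path):
    """Parse a GenBank file and write one tab-separated line per matching CDS."""
    # Imported here so the helpers above can be used without Biopython installed.
    from Bio import SeqIO

    rows = summarize(SeqIO.parse(genbank_path, "genbank"), gene_name)
    with open(out_path, "w") as out:
        for row in rows:
            out.write("\t".join(row) + "\n")
```

A call such as write_summary("sequences.gb", "kpc", "kpc_summary.tsv") (both file names are placeholders) would then write one line per kpc CDS across all records. The match is case-sensitive and exact, so a record annotated as "blaKPC" or "KPC" would be skipped unless the comparison is loosened.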