Splitting Gbff files
2.3 years ago
schlogl ▴ 160

Hi there, Is there an easy way (or a hard one) to split a multi-record Gbff file (7815) into single gbk files? I appreciate any help! Paulo

Gbff • 1.6k views
2.3 years ago

You could use BioPython like so:

from Bio import SeqIO

# Parse GenBank with multiple records
stream = SeqIO.parse("genomes.gb", format="genbank")

# Write each record as a separate file
for rec in stream:
    SeqIO.write(rec, f"{rec.id}.gb", "genbank")
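
If Biopython is not at hand, or the original text of each record should be kept untouched, a rough alternative is to split on the `//` line that ends every GenBank record. This is only a sketch and not part of the answer above: the function names `split_genbank` and `_write_record`, the output directory `split_records`, and the choice to name files after a running index plus the LOCUS name are all assumptions made for illustration. Anything after the last `//` line (for example trailing blank lines) is dropped.

```python
from pathlib import Path


def split_genbank(path, outdir="split_records"):
    """Split a multi-record GenBank/GBFF file into one file per record.

    Records are delimited by a line containing only '//'.
    Returns the list of files written.
    """
    out = Path(outdir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    lines = []
    with open(path) as handle:
        for line in handle:
            lines.append(line)
            if line.strip() == "//":  # end-of-record marker
                written.append(_write_record(lines, out, len(written) + 1))
                lines = []
    return written


def _write_record(lines, out, index):
    # Hypothetical naming scheme: running index + LOCUS name, so that
    # repeated names never overwrite each other.
    name = "record"
    for line in lines:
        if line.startswith("LOCUS"):
            fields = line.split()
            if len(fields) > 1:
                name = fields[1]
            break
    target = out / f"{index:05d}_{name}.gbk"
    target.write_text("".join(lines))
    return target
```

Calling `split_genbank("genomes.gb")` would then write one `.gbk` file per record into `split_records/`. Because the index is part of the filename, two records with the same LOCUS name still end up in separate files.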

@Istvan Albert To be honest, I tried this, but I may have messed something up, because my code was reading the input as a single file 8(. I will check it out. Thank you for your time. Paulo


The code is really simple. The parsing step reads the original multi-record GenBank file, then the loop writes each record to a file named after its record id. Change it to

from Bio import SeqIO

# Parse GenBank with multiple records
stream = SeqIO.parse("genomes.gb", format="genbank")

# Print the id of each record
for rec in stream:
    print(rec.id)

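to see each record id in your file. Perhaps the file reuses ids, in which case every write to the same filename overwrites the previous record and you end up with fewer files than records.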
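
If the ids do turn out to repeat, one possible workaround is to add a numeric suffix to every repeated id before using it as a filename. The helper below is a sketch under that assumption; `unique_filenames` is a hypothetical name, and the ids in the example are made up rather than taken from the file in question.

```python
from collections import Counter


def unique_filenames(ids, ext=".gb"):
    """Yield one filename per id, adding a numeric suffix to repeated ids."""
    seen = Counter()
    for rec_id in ids:
        seen[rec_id] += 1
        if seen[rec_id] == 1:
            yield f"{rec_id}{ext}"
        else:
            # Second and later occurrences get _2, _3, ...
            yield f"{rec_id}_{seen[rec_id]}{ext}"


# Made-up ids, with "idA" appearing twice
names = list(unique_filenames(["idA", "idB", "idA"]))
print(names)  # ['idA.gb', 'idB.gb', 'idA_2.gb']
```

In the writing loop, the name produced for each record would replace `f"{rec.id}.gb"`. The sketch does not guard against an id that already ends in a suffix like `_2`.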
