6.2 years ago
beginner_problem
I am trying to extract the gene positions from a GenBank file using Biopython. This is the function I have written so far:
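from Bio import SeqIO

def get_CDS(file):
    # Parse a GenBank file that contains a single record
    record = SeqIO.read(file, "genbank")
    cds = []
    for feature in record.features:
        if feature.type == 'CDS':
            print(feature.location)
            # start/end span the whole feature, even for join(...) locations
            start_i = feature.location.start
            end_i = feature.location.end
            cds.append((start_i, end_i))
    return cds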
However, I noticed that some entries look like this:
join{[4585844:4586295](-), [4584940:4585845](-)}
In that case, the start and end positions returned are 4584940 and 4586295, which span the whole join.
Does anyone know how I can also get the positions of each part of the gene separately, i.e. first [4585844:4586295] and then [4584940:4585845]?
Could you please provide the accession number of the GenBank file you are trying to parse with this code?
For example, one of the Yersinia pestis genomes causes this problem: NC_003143
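One possible approach, as a minimal sketch rather than a definitive answer: in reasonably recent Biopython versions, a joined location is a CompoundLocation whose `parts` attribute lists the individual segments in order, and a simple location also exposes `parts` with itself as the only element. The helper names `get_cds_parts` and `get_cds_parts_from_file` below are made up for illustration, and the file name in the usage note is an assumption.

```python
def get_cds_parts(record):
    """Return one list per CDS, holding a (start, end, strand) tuple per segment.

    Assumes each feature location exposes `.parts`, as Biopython's
    simple and compound locations do in recent releases.
    """
    cds = []
    for feature in record.features:
        if feature.type != "CDS":
            continue
        # A join(...) yields several parts; a plain location yields one.
        segments = [
            (int(part.start), int(part.end), part.strand)
            for part in feature.location.parts
        ]
        cds.append(segments)
    return cds


def get_cds_parts_from_file(path):
    """Hypothetical convenience wrapper: read a single-record GenBank file."""
    # Imported here so get_cds_parts can be used on any record-like object.
    from Bio import SeqIO

    return get_cds_parts(SeqIO.read(path, "genbank"))
```

Calling `get_cds_parts_from_file("NC_003143.gb")` (assuming the record was saved under that name) should then give the two segments of the joined CDS as separate tuples instead of one overall span. Note that Biopython reports positions Python-style, with a 0-based start and an exclusive end.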