I need a Python script that reads a GenBank file annotated with intron-exon boundaries and gives me the sequences at those boundaries. I think it should be easy with Biopython's SeqIO, but I don't know how to use it that way.
As others have said, it would be good if you could explain what features your GenBank files have and exactly what you want to get from them. Presuming you have genomic DNA with "exon" features like this one, and you want the sequences on either side of each exon-intron boundary, you can do this:
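from Bio import SeqIO

# Read a single GenBank record and keep only its "exon" features
record = SeqIO.read("LDH.gb", "genbank")
exons = [f for f in record.features if f.type == "exon"]

# Print the two bases spanning each exon start (one base before, one base after)
for start in [int(e.location.start) for e in exons]:
    print(record.seq[start - 1 : start + 1])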
You should also check out the Biopython tutorial, which describes the SeqRecord and SeqFeature objects.
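The loop above only looks at exon starts and a fixed two-base window. If you want both ends of every exon with an adjustable flank, something along these lines might work. This is only a sketch: `boundary_windows`, `exon_coordinates` and the `flank` parameter are names made up for illustration, and it assumes forward-strand "exon" features on a genomic record. Keep in mind that the start of the first exon and the end of the last exon are not intron boundaries.

```python
def boundary_windows(seq, exons, flank=2):
    """Return (kind, position, window) for each exon edge.

    seq   : any sliceable sequence (a str or a Bio.Seq.Seq)
    exons : iterable of (start, end) pairs, 0-based and end-exclusive,
            i.e. int(f.location.start) and int(f.location.end)
    flank : number of bases to keep on each side of the edge
    """
    windows = []
    for start, end in sorted(exons):
        for kind, pos in (("exon_start", start), ("exon_end", end)):
            lo = max(pos - flank, 0)         # clamp so a negative index never wraps around
            hi = min(pos + flank, len(seq))  # clamp at the end of the sequence
            windows.append((kind, pos, str(seq[lo:hi])))
    return windows


def exon_coordinates(path):
    """Hypothetical helper: read one GenBank record and list its exon spans."""
    # Imported here so boundary_windows can be used without Biopython installed.
    from Bio import SeqIO

    record = SeqIO.read(path, "genbank")
    spans = [(int(f.location.start), int(f.location.end))
             for f in record.features if f.type == "exon"]
    return record.seq, spans
```

You would then call `seq, spans = exon_coordinates("LDH.gb")` followed by `boundary_windows(seq, spans, flank=5)`. For genes on the reverse strand you would additionally need to reverse-complement each window, which this sketch does not do.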
I'm not sure that the heterogeneous annotations in genbank are enough to extract those positions.
I think Pierre is correct in his assumption.
Dror, it would be helpful to edit your post to include a specific GenBank file you want to process and also some more details about what you are hoping to achieve. Your post title mentions drawing intron boundaries, while your actual question asks for sequences. Specifics will help with generating useful answers.
Dror: Dear Brad, you're right, sorry about the title. What I need is to take several RefSeq entries of orthologs and extract the first two exons and the first intron from each one. Then I want to look at the boundaries of the first intron. I think this intron is conserved in all metazoans. I did this manually for some genes, and I want to automate it so I can look at many others. Later, I want to draw the intron-exon boundaries of these genes graphically, showing the conserved features of the boundaries. I do not know how to change the title.
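For the "first two exons plus first intron" part, a rough sketch could look like the following. It assumes the record is genomic (so the intron sequence is actually present), that the gene is on the forward strand, and that exon spans are given as 0-based, end-exclusive `(start, end)` pairs, for example from `int(f.location.start)` and `int(f.location.end)` on Biopython "exon" features. The function name `first_intron_region` is invented here for illustration.

```python
def first_intron_region(seq, exons):
    """Split out exon 1, intron 1 and exon 2 from a forward-strand gene.

    seq   : any sliceable sequence (a str or a Bio.Seq.Seq)
    exons : iterable of (start, end) pairs, 0-based and end-exclusive
    """
    ordered = sorted(exons)  # order exons by genomic start
    if len(ordered) < 2:
        raise ValueError("need at least two exons to define an intron")
    (s1, e1), (s2, e2) = ordered[0], ordered[1]
    if s2 < e1:
        raise ValueError("first two exons overlap")
    return {
        "exon1": str(seq[s1:e1]),
        "intron1": str(seq[e1:s2]),  # everything between the two exons
        "exon2": str(seq[s2:e2]),
    }
```

Running this over each ortholog would give you the three pieces per gene, and the first and last few bases of `intron1` are the donor and acceptor boundaries you could then compare or draw.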