I am trying to get a list of the proteins encoded in the genomic region surrounding a protein of interest, which I identify by its protein accession number. I am using Biopython and its Entrez module. I start with a list of protein accessions from GenBank, and now I want to look at the genomic region surrounding each protein.
I am using Entrez.efetch(db="protein", id=rec, rettype="ipg", retmode="text")
to get the nucleotide accession number and the start/stop coordinates of the protein's CDS.
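A minimal sketch of that parsing step, assuming the IPG text comes back as a tab-separated table with a header row that includes the columns "Nucleotide Accession", "Start", "Stop", "Strand" and "Protein". Those column names are an assumption about the report layout and should be checked against real output; parse_ipg is a hypothetical helper name.

```python
import csv
import io


def parse_ipg(ipg_text):
    """Turn tab-separated IPG text into a list of dicts with CDS coordinates.

    Assumes a header row with the columns named below (an assumption,
    verify against the text that efetch actually returns).
    """
    reader = csv.DictReader(
        io.StringIO(ipg_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    rows = []
    for row in reader:
        start = (row.get("Start") or "").strip()
        stop = (row.get("Stop") or "").strip()
        # Skip entries that carry no usable nucleotide coordinates.
        if not row.get("Nucleotide Accession") or not (start.isdigit() and stop.isdigit()):
            continue
        rows.append(
            {
                "nuc_acc": row["Nucleotide Accession"],
                "start": int(start),
                "stop": int(stop),
                "strand": row.get("Strand", ""),
                "protein": row.get("Protein", ""),
            }
        )
    return rows
```

The text passed in would be whatever handle.read() returns for the efetch call above. One protein can map to several nucleotide records, so the result is a list rather than a single hit.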
My question is: how do I then download a GenBank file representing a 30 kb region around the CDS?
Can someone point me in the right direction?
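One possible direction, sketched under assumptions: the E-utilities efetch call accepts seq_start and seq_stop for sequence databases, and Biopython passes these keyword arguments through, so only the window of interest needs to be downloaded. The helper names region_around and fetch_region, and the choice of 15 kb on each side of the CDS, are illustrative and not taken from the thread.

```python
def region_around(start, stop, flank=15000):
    """Return 1-based (seq_start, seq_stop) padded by `flank` bases on each side."""
    lo, hi = sorted((start, stop))
    # Clamp at the first base; the upper end is not checked against the
    # record length here, which is a simplification of this sketch.
    return max(1, lo - flank), hi + flank


def fetch_region(nuc_acc, start, stop, email, flank=15000):
    """Download a GenBank flat file covering only the padded window."""
    # Imported here so region_around can be used without Biopython installed.
    from Bio import Entrez

    Entrez.email = email  # NCBI asks for a contact address
    seq_start, seq_stop = region_around(start, stop, flank)
    handle = Entrez.efetch(
        db="nuccore",
        id=nuc_acc,
        rettype="gb",
        retmode="text",
        seq_start=seq_start,
        seq_stop=seq_stop,
    )
    try:
        return handle.read()
    finally:
        handle.close()
```

The returned text can be written to a .gb file or parsed with Bio.SeqIO. If the record is assembled from contigs and comes back without sequence, rettype="gbwithparts" is another documented option worth trying.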
I don't think this is the best way to tackle this question.
You would be better off downloading a BED or GTF file for your organism of interest and getting the neighbouring genes from that, e.g. using bedtools.
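As a rough illustration of that idea in plain Python (similar in spirit to bedtools window with a -w value, though not a replacement for it), the sketch below collects features from BED lines that overlap a padded window around the gene of interest. The function name neighbours_from_bed and the 15 kb default flank are assumptions made for the example.

```python
def neighbours_from_bed(bed_lines, chrom, start, end, flank=15000):
    """Return sorted (start, end, name) tuples for BED features near a gene.

    BED coordinates are 0-based and half-open, so two intervals overlap
    when each one starts before the other ends.
    """
    lo, hi = max(0, start - flank), end + flank
    hits = []
    for line in bed_lines:
        # Skip blank lines and the optional header/comment lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        f_chrom, f_start, f_end = fields[0], int(fields[1]), int(fields[2])
        name = fields[3] if len(fields) > 3 else ""
        if f_chrom == chrom and f_start < hi and f_end > lo:
            hits.append((f_start, f_end, name))
    return sorted(hits)
```

The bed_lines argument can be an open file handle, since iterating over a file yields one line at a time.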
I am interested in comparing biosynthetic gene clusters in bacteria that share common enzymes required for the synthesis of the backbone of a natural product. I would rather not download a larger file than I need for the sake of time and bandwidth.