Given a set of locations in a genome, what is the preferred way to find the first genes upstream or downstream of those locations?
I have been looking into processing a GFF file using BioPython, but it takes an insanely long time (2h+) to parse a file that other tools can parse in seconds. I also considered using Ensembl, but it doesn't seem to have a good Python API either.
So what would be my best course of action here? Use MySQLdb to hook straight into the Ensembl database? Parse the GFF file myself?
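For reference, the "parse it myself" option could look roughly like the sketch below. It assumes a tab-separated GFF with the usual nine columns and only keeps rows whose type column is `gene`. The function names `load_genes` and `flanking_genes` are made up for illustration. It also ignores strand, so it reports the nearest genes to the left and right in coordinate terms; mapping those onto upstream and downstream would need the strand column as well.

```python
import bisect
from collections import defaultdict


def load_genes(gff_lines):
    """Collect (start, end, attributes) for 'gene' rows, grouped by sequence name."""
    genes = defaultdict(list)
    for line in gff_lines:
        # Skip blank lines and comment/directive lines.
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "gene":
            continue
        genes[cols[0]].append((int(cols[3]), int(cols[4]), cols[8]))
    # Sort each chromosome's genes by start so they can be binary-searched.
    for chrom in genes:
        genes[chrom].sort()
    return genes


def flanking_genes(genes, chrom, pos):
    """Return (left, right): the nearest gene ending before pos and
    the nearest gene starting after pos, or None where there is none."""
    features = genes.get(chrom, [])
    starts = [g[0] for g in features]
    # Index of the first gene whose start is strictly greater than pos.
    i = bisect.bisect_right(starts, pos)
    right = features[i] if i < len(features) else None
    # Among genes starting at or before pos, keep those that end before pos
    # and take the one whose end is closest to pos.
    ended_before = [g for g in features[:i] if g[1] < pos]
    left = max(ended_before, key=lambda g: g[1]) if ended_before else None
    return left, right


if __name__ == "__main__":
    example = [
        "##gff-version 3",
        "chr1\tsrc\tgene\t100\t200\t.\t+\t.\tID=geneA",
        "chr1\tsrc\tgene\t500\t600\t.\t-\t.\tID=geneB",
    ]
    print(flanking_genes(load_genes(example), "chr1", 300))
```

The left-hand lookup is a linear scan and the list of starts is rebuilt on every call, which keeps the sketch short; for many queries it would be worth precomputing the start lists once per chromosome.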
That seems to do exactly what I want, thank you! So I guess the best way to do this in Python is to not use Python...
There is also a Python wrapper for bedtools, pybedtools, in case you need to use it as part of a script: http://packages.python.org/pybedtools/