Michael Dondrup's comment is probably the best solution if you want to do anything complex with ranges, but for the problem as stated (intergenic distance calculation), you really only need a few lines of Python code:
# Each gene is a (name, chromosome, start, end) tuple
gene_positions = [('geneA', 'chr1', 100, 200), ('geneB', 'chr1', 300, 400),
                  ('geneC', 'chr1', 401, 450), ('geneD', 'chr2', 100, 200)]
# Sort by chromosome, then by start, so neighbouring genes are adjacent in the list
gene_positions.sort(key=lambda x: (x[1], x[2]))
intergenic_distances = []
# Walk over consecutive pairs of genes
for gene1, gene2 in zip(gene_positions, gene_positions[1:]):
    (gene1_name, gene1_chr, gene1_start, gene1_end) = gene1
    (gene2_name, gene2_chr, gene2_start, gene2_end) = gene2
    # Only compare genes on the same chromosome
    if gene1_chr == gene2_chr:
        intergenic_distances.append((gene2_start - gene1_end,
                                     gene1_name, gene2_name))
I don't know what you want to do with the data afterward, but you get a list of (intergenic_distance, gene1_name, gene2_name) tuples, which you can loop over like this:
for distance, gene1, gene2 in intergenic_distances:
    print("Distance between %s and %s is %s" % (gene1, gene2, distance))
Of course, you probably already knew that.
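If you need this in more than one place, the same logic can be wrapped in a function. This is only a minimal sketch: the function name `intergenic_distances_for` and the extra `geneE` entry are made up for illustration, and it assumes the same (name, chromosome, start, end) tuples as above. It keeps the `start - end` arithmetic unchanged, so overlapping neighbours come out with a negative distance.

```python
def intergenic_distances_for(genes):
    """Return (distance, gene1_name, gene2_name) for neighbouring genes.

    `genes` is a list of (name, chromosome, start, end) tuples.
    The name of this function is hypothetical, chosen for this example.
    """
    # sorted() returns a new list, so the caller's list is left untouched
    ordered = sorted(genes, key=lambda g: (g[1], g[2]))
    distances = []
    for gene1, gene2 in zip(ordered, ordered[1:]):
        name1, chr1, start1, end1 = gene1
        name2, chr2, start2, end2 = gene2
        if chr1 != chr2:
            # Never measure across a chromosome boundary
            continue
        # Negative when gene2 starts before gene1 ends (overlapping genes)
        distances.append((start2 - end1, name1, name2))
    return distances


# Illustrative data; geneE is an invented entry that overlaps geneD
example_genes = [('geneA', 'chr1', 100, 200), ('geneB', 'chr1', 300, 400),
                 ('geneC', 'chr1', 401, 450), ('geneD', 'chr2', 100, 200),
                 ('geneE', 'chr2', 150, 250)]

for distance, gene1, gene2 in intergenic_distances_for(example_genes):
    print("Distance between %s and %s is %s" % (gene1, gene2, distance))
```

A negative distance is a cheap way to spot overlapping genes. If you would rather skip or clamp those pairs, that is a one-line change inside the loop.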
There are several python packages and other alternatives mentioned here: http://biostar.stackexchange.com/questions/2245/what-is-the-quickest-algorithm-for-range-overlap
I remember that package was called 'pygr', but I haven't used it myself.
I think it would be better if you pasted a small portion of your input file in your question.
Hi Abhi. Can you post your input file?