Suppose I have a BED file, a, containing three overlapping records:
a
chr1 1 20 s1 1 +
chr1 5 20 s2 1 +
chr1 10 20 s3 1 +
I want to write a function that parses a and returns a BED file, b, containing non-overlapping (genome-unique) windows, each annotated with the list of records that span it, i.e.:
b
chr1 1 5 s1 1 +
chr1 5 10 s1,s2 1 +
chr1 10 20 s1,s2,s3 1 +
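One way to build b is a boundary sweep. Every record start and end is a potential window edge, so you can collect those positions per chromosome, sort them, and for each consecutive pair list the records that cover that interval. BED intervals are half-open [start, end), so s3 (which starts at 10) does not cover the 5-10 window. Below is a minimal sketch in Python, assuming 6-column whitespace-separated BED input. The helper names `parse_bed` and `partition`, and the choice to report "." for score or strand when the covering records disagree, are illustrative assumptions, not an existing library API.

```python
from collections import defaultdict


# Hypothetical helpers for illustration; not a library API.
def parse_bed(lines):
    """Parse 6-column BED lines into (chrom, start, end, name, score, strand) tuples."""
    records = []
    for line in lines:
        line = line.strip()
        # Skip blank lines and header/comment lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end, name, score, strand = line.split()[:6]
        records.append((chrom, int(start), int(end), name, score, strand))
    return records


def partition(records):
    """Split overlapping records into disjoint windows, each listing the records covering it."""
    by_chrom = defaultdict(list)
    for rec in records:
        by_chrom[rec[0]].append(rec)

    windows = []
    for chrom in sorted(by_chrom):
        recs = by_chrom[chrom]
        # Every start and end position is a potential window boundary.
        bounds = sorted({r[1] for r in recs} | {r[2] for r in recs})
        for lo, hi in zip(bounds, bounds[1:]):
            # No boundary lies strictly inside [lo, hi), so a half-open record
            # overlaps this window exactly when start <= lo and end >= hi.
            covering = [r for r in recs if r[1] <= lo and r[2] >= hi]
            if not covering:
                continue  # gap between records: no window emitted
            names = ",".join(r[3] for r in covering)
            scores = {r[4] for r in covering}
            strands = {r[5] for r in covering}
            # Assumption: keep score/strand only when all covering records agree.
            score = scores.pop() if len(scores) == 1 else "."
            strand = strands.pop() if len(strands) == 1 else "."
            windows.append((chrom, lo, hi, names, score, strand))
    return windows


if __name__ == "__main__":
    bed_a = """chr1 1 20 s1 1 +
chr1 5 20 s2 1 +
chr1 10 20 s3 1 +"""
    for window in partition(parse_bed(bed_a.splitlines())):
        print("\t".join(map(str, window)))
```

For the three records in a, this yields the windows 1-5 (s1), 5-10 (s1,s2) and 10-20 (s1,s2,s3). The coverage check is O(windows × records) per chromosome, which is fine for small files; large inputs would call for a sweep over sorted start/end events that maintains the set of currently active records.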
The goal is a set of genome-unique BED records (essentially an index) from which each original record can be rebuilt by combining the windows that list it; for example, s2 is the union of the windows chr1 5 10 and chr1 10 20.
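The index property can be checked by going the other way: for each record name, gather the windows that list it, sort them, and merge windows that touch end-to-start. Below is a minimal sketch, assuming windows are given as hypothetical (chrom, start, end, comma-separated names) tuples and that record names are unique. If two distinct records shared a name and were adjacent, they would be merged into one span.

```python
from collections import defaultdict


# Hypothetical helper for illustration; assumes record names are unique.
def reconstruct(windows):
    """Rebuild each record's span(s) from the disjoint windows that list its name."""
    pieces = defaultdict(list)
    for chrom, start, end, names in windows:
        for name in names.split(","):
            pieces[name].append((chrom, start, end))

    spans = {}
    for name, parts in pieces.items():
        parts.sort()
        merged = [list(parts[0])]
        for chrom, start, end in parts[1:]:
            last = merged[-1]
            if chrom == last[0] and start == last[2]:
                last[2] = end  # contiguous window: extend the current span
            else:
                merged.append([chrom, start, end])  # gap: start a new span
        spans[name] = [tuple(m) for m in merged]
    return spans


if __name__ == "__main__":
    b = [("chr1", 1, 5, "s1"), ("chr1", 5, 10, "s1,s2"), ("chr1", 10, 20, "s1,s2,s3")]
    for name, span in sorted(reconstruct(b).items()):
        print(name, span)
```

Applied to the windows of b, each name should map back to exactly one span equal to its original record (s1 to 1-20, s2 to 5-20, s3 to 10-20).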
I don't know where to start with something like this. Any suggestions would be greatly appreciated.