I have a BED file with regions of interest, and I would like to split it into the minimum number of BED files such that, within each file, all regions are separated from each other by at least N bases. Any ideas what tool I could use?
E.g. input.bed
chr1 1000 1050
chr1 1080 1130
chr1 2000 2050
Would be split by:
split_by_distance -n 150 -i input.bed
And would produce:
input0001.bed
chr1 1000 1050
input0002.bed
chr1 1080 1130
chr1 2000 2050
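A short Python script can produce this kind of split. Below is a minimal sketch, not an existing tool: `read_bed`, `split_by_distance` and `write_split` are hypothetical names. It assumes that "separated by N bases" means the number of bases between two regions in BED half-open coordinates (later start minus earlier end), that regions on different chromosomes never clash, and that N is at least 0. It sorts the regions and puts each one into the first output file whose most recent region on that chromosome is far enough away, opening a new file only when every existing file clashes. Processing regions in start order like this is the standard greedy for interval partitioning, and it uses the minimum number of files.

```python
import os


def read_bed(path):
    """Read (chrom, start, end, original_line) tuples, skipping header lines."""
    regions = []
    with open(path) as fh:
        for line in fh:
            if not line.strip() or line.startswith(("#", "track", "browser")):
                continue
            fields = line.split()
            regions.append((fields[0], int(fields[1]), int(fields[2]), line.rstrip("\n")))
    return regions


def split_by_distance(regions, n):
    """Assign regions to the fewest groups in which same-chromosome regions
    are at least n bases apart (assumes n >= 0).

    Each region is a tuple whose first three items are chrom, start, end; any
    extra items (e.g. the original BED line) are carried along unchanged.
    Distance is later_start - earlier_end (BED half-open coordinates), so
    overlapping regions get a negative distance and always clash.
    """
    groups = []    # groups[i]: regions destined for output file i + 1
    last_end = []  # last_end[i][chrom]: end of the latest region in group i on chrom
    for region in sorted(regions):
        chrom, start, end = region[:3]
        # First fit: take the lowest-numbered group whose latest region on this
        # chromosome ends at least n bases before this one starts. Regions arrive
        # in start order and each group is spaced by n >= 0, so that latest region
        # is also the one ending furthest right: it is the only one to check.
        for i, ends in enumerate(last_end):
            if chrom not in ends or start - ends[chrom] >= n:
                break
        else:
            # Every existing group clashes (or there are none yet): open a new one.
            groups.append([])
            last_end.append({})
            i = len(groups) - 1
        groups[i].append(region)
        last_end[i][chrom] = end
    return groups


def write_split(path, n):
    """Write input0001.bed, input0002.bed, ... next to `path`; return the names."""
    prefix = os.path.splitext(path)[0]
    names = []
    for k, group in enumerate(split_by_distance(read_bed(path), n), start=1):
        name = f"{prefix}{k:04d}.bed"
        with open(name, "w") as out:
            out.writelines(region[3] + "\n" for region in group)
        names.append(name)
    return names
```

On the example above, `write_split("input.bed", 150)` would put `chr1 1000 1050` and `chr1 2000 2050` into input0001.bed and `chr1 1080 1130` into input0002.bed. That grouping differs from the one in the question but is equally valid and also uses two files: the third region is far enough from both of the others, so it can go with either. Within each output file, regions are sorted by chromosome name and start.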
Explained graphically (# = a region, x = the minimum distance):
Original file has 3 entries.
Minimum distance:
[xxxxxxxxx]
First and second are too close:
          [xxxxxxxxx]
##########  ##########                    ##########
Output is:
file1: first
file2: second and third
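Two files is the minimum here because the first and second regions are closer than the minimum distance, so no single file can hold both. The same idea gives a general lower bound. In the minimal Python sketch below (`min_files_needed` is a hypothetical name, and distance is assumed to be later start minus earlier end in BED half-open coordinates), every region is stretched N bases to the right: two regions on the same chromosome are too close exactly when their stretched intervals overlap, so the largest number of stretched intervals covering one position is the largest set of mutually too-close regions. Because these are intervals, that bound is also achievable (interval graphs can always be coloured with as many colours as their largest clique).

```python
from collections import defaultdict


def min_files_needed(regions, n):
    """Minimum number of output files for (chrom, start, end) regions.

    Stretch each region to [start, end + n): two regions on one chromosome
    are too close (later_start - earlier_end < n) exactly when their
    stretched intervals overlap. Return the deepest overlap of stretched
    intervals on any chromosome.
    """
    events = defaultdict(list)
    for chrom, start, end in regions:
        events[chrom].append((start, 1))     # stretched interval opens
        events[chrom].append((end + n, -1))  # and closes (half-open)
    best = 0
    for chrom_events in events.values():
        depth = 0
        # At equal positions, closes (-1) sort before opens (+1), so a region
        # starting exactly n bases after another one ends is not a clash.
        for _, delta in sorted(chrom_events):
            depth += delta
            best = max(best, depth)
    return best


example = [("chr1", 1000, 1050), ("chr1", 1080, 1130), ("chr1", 2000, 2050)]
print(min_files_needed(example, 150))  # 2
```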
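Another example (each region is closer than the minimum distance to its neighbours, but not to the region two places away):
Minimum distance: [xxxxxx]
      [xxxxxx]   [xxxxxx]   [xxxxxx]  [xxxxxx]
AAAAAAAAAA BBBBBBBBBB CCCCCCCCCC DDDDDDDDD EEEEEEEEEE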
Output files:
File1:
AAAAAAAAAA CCCCCCCCCC EEEEEEEEEE
File2:
BBBBBBBBBB DDDDDDDDD
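To check that a proposed split respects the rule, whether it came from a script or was done by hand, a pairwise check within each file is enough. This is a minimal Python sketch: `is_valid_split` is a hypothetical name, distance is assumed to be later start minus earlier end in BED half-open coordinates, and since the example above gives no numbers, the coordinates are made up: 10-base regions (D is 9 bases, as drawn) with 1-base gaps and a minimum distance of 6, so each region clashes with its neighbours but not with the region two places away.

```python
from itertools import combinations


def is_valid_split(files, n):
    """Return True if, inside every file, each pair of regions on the same
    chromosome is at least n bases apart (later start minus earlier end).
    This does not check whether the number of files is minimal."""
    for regions in files:
        # sorted() orders by (chrom, start, end); combinations() keeps that
        # order, so in each pair the first region never starts after the second.
        for (c1, s1, e1), (c2, s2, e2) in combinations(sorted(regions), 2):
            if c1 == c2 and s2 - e1 < n:
                return False
    return True


# Hypothetical coordinates for the example above (minimum distance 6).
A, B, C, D, E = [("chr1", s, e) for s, e in [(0, 10), (11, 21), (22, 32), (33, 42), (43, 53)]]
print(is_valid_split([[A, C, E], [B, D]], 6))  # True
print(is_valid_split([[A, B, C], [D, E]], 6))  # False: A and B are 1 base apart
```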
Thx
I don't think this completely solves the problem, but bedtools makewindows might be worth looking into. Unfortunately it does not write the results to multiple BED files, but it's a start.
So row1 is in its own file because row2 is less than 150bp away, but row3 is with row2 because it's further than 150bp away?
wat.
I think he might have added the 1 in -n 150 by accident. We shouldn't assume, but it seems that he wants to split his input BED file into multiple BED files based on the -n number, so that every n bases the input is split into a different file. I'm not sure WHY exactly.
I could understand that, but I think he actually wants to split the file so that each subfile represents a "lone interval" or a cluster of intervals, perhaps from a peak caller. Also, not exactly sure why :P