Is there any way to group the first three lines while leaving the last line alone?
I know BEDTools has the mergeBed function, which merges overlapping spans; however, it would also include the last line.
This may sound like a purely computational question, but I'm curious whether existing tools can already handle this kind of problem.
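To make the problem concrete, here is a rough Python sketch of the kind of merging mergeBed performs: overlapping (and, as with `bedtools merge` at its default distance of 0, book-ended) intervals on the same chromosome are collapsed into one span. The coordinates for segments A to D are hypothetical, since the original lines are not shown; they were only chosen so that D lies inside A.

```python
from typing import List, Tuple

# A BED interval: (chrom, start, end), 0-based half-open as in BED.
Interval = Tuple[str, int, int]

def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or book-ended intervals on the same chromosome,
    roughly mirroring mergeBed's default behaviour."""
    merged: List[Interval] = []
    for chrom, start, end in sorted(intervals):
        # Extend the previous span if this interval touches or overlaps it.
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            last_chrom, last_start, last_end = merged[-1]
            merged[-1] = (last_chrom, last_start, max(last_end, end))
        else:
            merged.append((chrom, start, end))
    return merged

# Hypothetical coordinates for A, B, C and D (D sits inside A).
segments = [("chr1", 100, 400), ("chr1", 150, 420),
            ("chr1", 200, 450), ("chr1", 300, 340)]
print(merge_intervals(segments))  # [('chr1', 100, 450)]
```

The single merged span swallows D as well, which is exactly why plain merging does not answer the question.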
I would second brentp's comment. It is not clear what you are asking. To my eye the relationship between these coordinates can be represented as follows (not to scale):
A ------------------------------
B -----------------------
C -------------------------
D ----
All four segments have some overlap with the other three. D is contained within another segment, while A, B, and C are not completely contained within any other segment. Is that what you mean? If so, there is probably a way to identify such cases (possibly in two steps) using BEDTools.
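If that reading is right, the two steps could look something like the Python sketch below: first set aside every segment that lies completely within a strictly larger one, then cluster the remaining segments by overlap. This is not a BEDTools recipe, just an illustration of the logic; the names `is_contained` and `group_uncontained` and the coordinates are hypothetical. Identical duplicates are treated as not containing each other, so both copies are kept.

```python
from typing import List, Tuple

# (chrom, start, end, name); 0-based, half-open coordinates as in BED.
Bed = Tuple[str, int, int, str]

def is_contained(a: Bed, b: Bed) -> bool:
    """True if a lies completely within b and b is strictly longer."""
    return (a[0] == b[0] and b[1] <= a[1] and a[2] <= b[2]
            and (b[2] - b[1]) > (a[2] - a[1]))

def group_uncontained(records: List[Bed]) -> Tuple[List[List[str]], List[str]]:
    """Step 1: set aside records contained in another record.
    Step 2: cluster the remaining records by overlap (book-ended counts)."""
    contained = [r for r in records if any(is_contained(r, o) for o in records)]
    kept = sorted(r for r in records if r not in contained)

    groups: List[List[str]] = []
    cluster_chrom, cluster_end = None, -1
    for chrom, start, end, name in kept:
        if groups and chrom == cluster_chrom and start <= cluster_end:
            groups[-1].append(name)          # overlaps the current cluster
            cluster_end = max(cluster_end, end)
        else:
            groups.append([name])            # start a new cluster
            cluster_chrom, cluster_end = chrom, end
    return groups, [r[3] for r in contained]

# Hypothetical coordinates: A, B, C overlap without nesting; D sits inside A.
records = [("chr1", 100, 400, "A"), ("chr1", 150, 420, "B"),
           ("chr1", 200, 450, "C"), ("chr1", 300, 340, "D")]
groups, left_alone = group_uncontained(records)
print(groups, left_alone)  # [['A', 'B', 'C']] ['D']
```

The containment check is quadratic in the number of records, which is fine for a handful of lines but would need a sorted sweep (or a tool like BEDTools) for genome-scale files.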
My understanding of the BED format is that each line is independent of the others; there is no data field that relates one line to another. That said, you could still hack something together, such as giving lines that should be grouped a common key in the "name" field, or a common color in the itemRgb field. Alternatively, you could convert from BED to a format that is aware of line groups, such as GFF (which has a group field), making the data accessible to existing tools.
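As a minimal sketch of the "common key in the name field" hack, the Python snippet below collects tab-separated BED lines that share the same value in column 4. The keys `group1` and `single_D` and the coordinates are hypothetical; nothing in the BED format itself assigns meaning to them, so any downstream tool would need to be told to group on that column.

```python
from collections import defaultdict
from typing import Dict, List

def group_by_name(bed_text: str) -> Dict[str, List[str]]:
    """Collect BED lines that share the same value in column 4 (name)."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for line in bed_text.splitlines():
        # Skip blank lines and header/comment lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        if len(fields) < 4:
            raise ValueError(f"need at least 4 columns for a name field: {line!r}")
        groups[fields[3]].append(line)
    return dict(groups)

# Hypothetical BED4 lines: the first three share a group key, D gets its own.
bed = "\n".join([
    "chr1\t100\t400\tgroup1",
    "chr1\t150\t420\tgroup1",
    "chr1\t200\t450\tgroup1",
    "chr1\t300\t340\tsingle_D",
])
for key, lines in group_by_name(bed).items():
    print(key, len(lines))
```

This prints `group1 3` followed by `single_D 1`.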
As brentp asked: what is the nature of your grouping, and why choose the BED format?
What is the rule for doing the grouping?