I have a large set of footprint intervals ranging from 11 to 25 bp. For motif discovery, I would like to extend all intervals to a fixed length of, for example, 50 bp, extending each interval equally on both sides. I would usually use 'bedtools slop' for fixed-length intervals, but it does not appear to work with variable-length intervals, since each interval would need a different amount of padding to reach the same final length.
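If the per-interval arithmetic is easier to see in code, here is a minimal Python sketch. It is not a bedtools feature, and the function names extend_to_length and extend_bed_line are hypothetical. It assumes BED-style 0-based, half-open coordinates and tab-separated input with no header or track lines. It computes a separate left and right flank for each interval so the final length is always exactly the target, and gives the spare base to the right flank when the padding is odd. It only guards the chromosome start; checking chromosome ends would need a table of chromosome sizes, similar to the genome file that bedtools slop takes with -g.

```python
def extend_to_length(start, end, target):
    """Return (new_start, new_end) spanning exactly `target` bp around [start, end).

    Assumes BED-style 0-based, half-open coordinates.
    """
    length = end - start
    extra = target - length      # total bases to add (a negative value trims)
    left = extra // 2            # floor division: the smaller half goes left
    right = extra - left         # the remainder goes right, so the total is exact
    new_start, new_end = start - left, end + right
    if new_start < 0:
        # Ran past the chromosome start: shift the window right, keeping its length.
        new_start, new_end = 0, target
    return new_start, new_end


def extend_bed_line(line, target):
    """Extend one tab-separated BED line, keeping any extra columns unchanged."""
    fields = line.rstrip("\n").split("\t")
    start, end = int(fields[1]), int(fields[2])
    new_start, new_end = extend_to_length(start, end, target)
    return "\t".join([fields[0], str(new_start), str(new_end)] + fields[3:])
```

Usage, with a hypothetical file name: for line in open("footprints.bed"): print(extend_bed_line(line, 50)).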
It would be great if anyone could advise me how to do this with bedtools or with another tool. I have a nagging feeling I am missing something obvious, so apologies in advance!
Thank you for your answer! I was going to ask how it handles odd lengths. It is OK if one side gets an extra base, as long as every final interval has the same length.
Thanks for the addition. After discussing this with a colleague this morning it was pointed out that finding the mid-point of each region and then extending out works equally well. I knew I had missed something!
Yes, either way gets you to the same answer, but you'd still need to shift the midpoint up or down a base when dividing an even-numbered length in half, since an even-length interval has no single middle base.
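The midpoint version can be sketched the same way, again in Python with a hypothetical function name and BED-style half-open coordinates assumed. Floor division settles the even-length tie by always taking the right-hand of the two middle bases, and the window is then built outward from that base, so every output is exactly target bp long regardless of whether the input length is odd or even.

```python
def extend_from_midpoint(start, end, target):
    """Return a window of exactly `target` bp centred on the interval's midpoint.

    Assumes BED-style 0-based, half-open coordinates.
    """
    # Floor division: for an even length this picks the right-hand of the two
    # middle bases; for an odd length it is the true middle base.
    mid = start + (end - start) // 2
    new_start = mid - target // 2
    if new_start < 0:
        # Clamp at the chromosome start, keeping the window length fixed.
        new_start = 0
    return new_start, new_start + target
```

Because the window length is fixed by construction, the only choice left is which middle base to centre on; picking the left-hand one instead would just mean using (end - start - 1) // 2.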
Probably an obvious answer I'm not thinking of, but why does this code sometimes create intervals slightly off the target length of 50, such as 49 or 51? Thanks!
Dividing 'diff' by 2 when it is an odd number, probably. The int() function chops off the fractional part. That sounds like it is not a problem for the original question, but it might be an issue for you. This is tough because genomic coordinates are integers, so a decision has to be made about where the leftover base goes.
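To see the effect, here is a hypothetical reconstruction of the pattern being described, in Python (not the original code): the same int(diff / 2) is added to both flanks. For an 11 bp footprint and a target of 50, diff is 39, each flank gets 19, and the result is 49 bp long; a 12 bp footprint (diff 38) comes out at exactly 50.

```python
def naive_flanks(start, end, target):
    # Hypothetical reconstruction: the same truncated half of diff is added
    # to both flanks, so an odd diff loses one base overall.
    diff = target - (end - start)
    half = int(diff / 2)   # int() drops the fractional part: int(39 / 2) == 19
    return start - half, end + half


for length in (11, 12):
    s, e = naive_flanks(100, 100 + length, 50)
    print(length, e - s)   # 11 -> 49, 12 -> 50
```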
As one approach, perhaps first check whether the total interval length would be one more or one less than the target length, and subtract or add one base on either the left or the right flank as needed (perhaps flipping a coin to decide which) to reach the target length.
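A minimal Python sketch of that suggestion follows; the function name is hypothetical, and it assumes BED-style coordinates with no chromosome-boundary checks. It starts from int(diff / 2) on each side, works out the leftover base (+1 or -1 when diff is odd, 0 otherwise), and gives it to a randomly chosen flank. Passing a seeded random.Random instance makes the choice reproducible between runs.

```python
import random


def extend_with_coin_flip(start, end, target, rng=random):
    """Extend [start, end) to exactly `target` bp, splitting the padding evenly.

    When the padding is odd, the leftover base goes to a randomly chosen flank.
    Assumes BED-style 0-based, half-open coordinates; no boundary checks.
    """
    diff = target - (end - start)      # total bases to add (negative trims)
    half = int(diff / 2)               # truncates toward zero, as int() does
    left = right = half
    leftover = diff - 2 * half         # 0 if diff is even, +1 or -1 if odd
    if leftover:
        # Coin flip: put the spare base on the left or the right flank.
        if rng.random() < 0.5:
            left += leftover
        else:
            right += leftover
    return start - left, end + right
```

The coin flip avoids nudging every window toward the same side of its footprint; if that bias does not matter for motif discovery, always adding the spare base on one side is simpler and fully deterministic.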