Let's say you are working with hg19 chromosome extents (a sorted BED file called hg19.extents.bed):
To solve your problem, you can use bedops --chop to split this extents file into 60k-base increments, and then run a bedops set operation to capture all the elements that fall within each increment.
First, split the extents:
$ bedops --chop 60000 hg19.extents.bed > hg19.60k.bed
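For orientation, the chopped file is just a list of consecutive windows per chromosome. Here is a rough awk stand-in that shows the shape of that output on a made-up two-line extents input. The function name chop_windows and the chrA/chrB sizes are hypothetical (not real hg19 values), and I'm assuming the last window on each chromosome is cut off at the chromosome end. It's only an illustration, not a replacement for bedops --chop.

```shell
# chop_windows WIDTH: read "chrom<TAB>start<TAB>end" lines on stdin and print
# consecutive WIDTH-base windows for each line (illustrative awk stand-in).
chop_windows() {
  awk -v w="$1" 'BEGIN { FS = OFS = "\t" }
    {
      for (s = $2; s < $3; s += w) {
        e = (s + w < $3) ? s + w : $3   # assumed: last window ends at the region end
        print $1, s, e
      }
    }'
}

# Toy extents with made-up sizes (not real hg19 values).
printf 'chrA\t0\t150000\nchrB\t0\t60000\n' | chop_windows 60000
```

With the toy input, chrA gets three windows (the last one only 30k wide) and chrB gets one.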
Second, sort your input file with BEDOPS sort-bed, to prep it for use with BEDOPS tools:
$ sort-bed input.bed > input.sorted.bed
Third, run the equivalent of bedops --element-of 1 input.sorted.bed increment_i.bed for each increment i.
Here is a bash shell command that will do this:
$ incCounter=0; \
while IFS= read -r incLine; \
do \
incCounterPadded=$(printf '%07d' "${incCounter}"); \
chromosomeName=$(printf '%s\n' "${incLine}" | cut -f1); \
outputFn="output_${chromosomeName}_${incCounterPadded}.bed"; \
printf '%s\n' "${incLine}" | bedops --element-of 1 input.sorted.bed - > "${outputFn}"; \
incCounter=$((incCounter+1)); \
done < hg19.60k.bed
Each of the files output_chr1_0000000.bed, output_chr1_0000001.bed, etc. contains the elements of input.sorted.bed that overlap one 60k window of hg19 by at least one base.
Note that this will create many, many thousands of files for hg19. You may want to do a bit more work to write the output into folders named by chromosome, or widen your window size, or apply other strategies to more sensibly manage the output from this script. Hopefully this gets you started.
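As one possible way to do the per-chromosome folders, here is a small helper that builds the output path and creates the folder on the fly. The name window_path is hypothetical (my own, not part of BEDOPS), and this is just a sketch of the idea.

```shell
# window_path CHROM INDEX: create a folder named after the chromosome (if it
# does not exist yet) and print the output path for that window.
# Hypothetical helper, not a BEDOPS tool.
window_path() {
  mkdir -p "$1"
  printf '%s/output_%s_%07d.bed\n' "$1" "$1" "$2"
}

# In the loop above, the outputFn assignment could then become:
#   outputFn=$(window_path "${chromosomeName}" "${incCounter}")
window_path chr1 12
```

That keeps the same file names as before but puts each chromosome's windows in its own directory.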
Sjneph is also correct that my method will cause "double-counting" where an input element spans two adjoining increments. This may or may not be an issue for your analysis, but his bedmap approach also yields much more manageable output while highlighting potentially problematic element overlaps. I'd give his answer more attention, depending on what you're trying to do.
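If you want a quick look at how many elements would be double-counted, something like the following awk filter may help. It assumes windows start at coordinate 0 on every chromosome and that the input uses the usual 0-based, half-open BED coordinates; the function name spans_boundary is my own. Treat it as a sketch, not a substitute for the bedmap approach.

```shell
# spans_boundary WIDTH: read BED lines on stdin and print those whose first
# and last bases land in different WIDTH-sized windows (windows assumed to
# start at 0; BED end is exclusive, so the last base is end - 1).
spans_boundary() {
  awk -v w="$1" 'BEGIN { FS = "\t" }
    int($2 / w) != int(($3 - 1) / w)'
}

# Toy input: the first element crosses the 60000 boundary, the second does not.
printf 'chr1\t59990\t60010\nchr1\t100\t200\n' | spans_boundary 60000
```

Running it on input.sorted.bed and piping to wc -l gives a rough count of elements that would show up in more than one output file.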
Thank you, but I need a step of 60000 across each whole chromosome, not only up to 180000. I am doing whole-genome sequencing. I split the BED file by chromosome, and after that I need to split each chromosome (chr1, for example) into BED files by coordinate, with a step of 60000.
Well, you can create a text file listing the ranges you want separated, read the ranges from that file, and then adapt the logic to suit your needs.