Question

Change the interval size of a BED file

2.9 years ago

the ▴ 10

Hi,

I have a bed file containing mappability values for a reference genome. It starts at position 10,000 and goes on from there. The interval for each line varies, but there are no gaps or overlaps. For example, the first line covers 10,000 to 10,039 and the second line covers 10,039 to 10,040. I want to change the interval to 1000 base pairs for each line and take the mean of the mappability values within each new interval.

Let's say: Input

chr1 10,000 10,039 0.4
...     10,039  .....      0.2

Output

chr1 10,000 11,000 0.2 (mean mappability value between positions 10,000 and 11,000)

bed • 1.6k views

Updated 20 months ago by Ram 44k • written 2.9 years ago by the ▴ 10

2.9 years ago

ATpoint 85k

That is what bedtools map can do for you. Check its documentation https://bedtools.readthedocs.io/en/latest/content/tools/map.html


Thank you for the answer! From the link, I understand that bedtools map works with a second bed file, but I have only one file, and I want to change its start-end intervals while keeping the mappability score. Is that possible with bedtools map?

Reply • 2.9 years ago by the ▴ 10

You would need to create a second file with the bins of interest first.
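As a rough sketch of what "bins of interest" could look like: the awk loop below writes 1 kb windows from a chromosome sizes file, and the guarded bedtools map call then averages column 4 per window. The file names (genome.sizes, windows.1kb.bed, mappability.bed) and the toy chromosome lengths are assumptions made for illustration. bedtools makewindows -g genome.sizes -w 1000 should produce equivalent windows if you prefer the built-in tool. Both inputs to bedtools map need the same sort order (for example sort -k1,1 -k2,2n), and -o mean is an unweighted mean of the overlapping scores.

```shell
# Hypothetical chromosome sizes file: name<TAB>length (toy values, not a real genome).
printf 'chr1\t2500\nchr2\t1000\n' > genome.sizes

# Emit one BED line per 1 kb window; the last window is clipped to the chromosome end.
awk -v FS='\t' -v OFS='\t' -v w=1000 '{
  for (s = 0; s < $2; s += w) {
    e = (s + w < $2) ? s + w : $2
    print $1, s, e
  }
}' genome.sizes > windows.1kb.bed

# Only runs if bedtools is installed and a sorted, comma-free 4-column file
# (assumed name: mappability.bed) is present in the working directory.
if command -v bedtools >/dev/null 2>&1 && [ -s mappability.bed ]; then
  bedtools map -a windows.1kb.bed -b mappability.bed -c 4 -o mean > mappability.1kb.bed
fi
```

Windows that overlap no scored interval are reported by bedtools map with a "." placeholder in place of a mean.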

Reply • 2.9 years ago by ATpoint 85k

score 2 · Accepted Answer · 2022-01-19

You can do this with the BEDOPS tools bedops and bedmap (set and mean operations), using the Kent utilities to build 1 kb windows:

$ tr -d ',' < elements_with_commas.bed | sort-bed - > elements.bed
$ fetchChromSizes hg38 | grep -v '_*_' | awk -v FS="\t" -v OFS="\t" '{ print $1, "0", $2 }' | sort-bed - | bedops --chop 1000 - > hg38.1kb.bed
$ bedops --everything elements.bed | awk -v FS="\t" -v OFS="\t" '{ print $1, $2, $3, ".", $4 }' > union.bed
$ bedmap --echo --mean --delim '\t' hg38.1kb.bed union.bed > answer.bed

Replace hg38 with your reference genome name (e.g., mm10 for mouse, etc.).
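The awk step in the third command is there because bedmap reads scores from the fifth column, while the input file carries the score in the fourth. A minimal illustration of that reshuffle, using the two rows from the question with commas removed (the .demo.bed file names are made up for this sketch):

```shell
# Two rows from the question, comma-free and tab-separated; the score is in column 4.
printf 'chr1\t10000\t10039\t0.4\nchr1\t10039\t10040\t0.2\n' > elements.demo.bed

# Insert a "." placeholder as the ID column so the score lands in column 5.
awk -v FS='\t' -v OFS='\t' '{ print $1, $2, $3, ".", $4 }' elements.demo.bed > union.demo.bed

cat union.demo.bed
```

If your file already has an ID in column 4 and the score in column 5, this step can be skipped.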

You can also use process substitutions to avoid creating intermediate files, bundling the set and mean operations into a more efficient one-liner:

$ tr -d ',' < elements_with_commas.bed | sort-bed - > elements.bed
$ bedmap --echo --mean --delim '\t' <(fetchChromSizes hg38 | grep -v '_*_' | awk -v FS="\t" -v OFS="\t" '{ print $1, "0", $2 }' | sort-bed - | bedops --chop 1000 -) <(bedops --everything elements.bed | awk -v FS="\t" -v OFS="\t" '{ print $1, $2, $3, ".", $4 }') > answer.bed

However, the first set of commands is probably easier to read, troubleshoot, and modify.

Another demonstration appears in a related Biostars answer, "How can I bin my bed files into 500bp bins?", which uses the --sum operation instead of --mean; the principle is the same.

The bedmap command offers several score summary operations in addition to --sum and --mean, including median, min, max, and weighted and trimmed means. Run bedmap --help or review the documentation for a full listing.
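Because the input intervals vary in length, an unweighted mean of the overlapping scores can differ from a per-base average. The weighted mean operation (listed as --wmean in the bedmap help, as far as I know) is the in-tool route. To show what a length-weighted mean per 1 kb bin works out to, here is a minimal awk-only sketch. It assumes a sorted, comma-free, tab-separated 4-column file; the toy rows beyond the first two are invented for illustration, and the file names are hypothetical.

```shell
# Toy input: the first two rows mirror the question, the last two are made up.
printf 'chr1\t10000\t10039\t0.4\nchr1\t10039\t10040\t0.2\nchr1\t10040\t11000\t1.0\nchr1\t11000\t11500\t0.5\n' > toy.bed

# Split each interval across fixed-width bins and accumulate score * covered bases.
awk -v FS='\t' -v OFS='\t' -v w=1000 '
{
  s = $2 + 0; e = $3 + 0
  while (s < e) {
    b = int(s / w) * w                      # start of the bin containing s
    stop = (e < b + w) ? e : b + w          # clip the piece at the bin boundary
    key = $1 OFS b
    if (!(key in cov)) order[++n] = key     # remember first-seen order of bins
    cov[key] += stop - s                    # bases of this bin that carry a score
    tot[key] += (stop - s) * $4             # score weighted by covered bases
    s = stop
  }
}
END {
  for (i = 1; i <= n; i++) {
    split(order[i], f, OFS)
    printf "%s\t%d\t%d\t%.4f\n", f[1], f[2], f[2] + w, tot[order[i]] / cov[order[i]]
  }
}' toy.bed > toy.1kb.wmean.bed

cat toy.1kb.wmean.bed
```

The mean is divided by the number of covered bases, so a partially covered bin (such as the last one here) is averaged over its scored positions only; divide by w instead if uncovered bases should count as zero.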