Change the interval of a BED file
2.9 years ago
the ▴ 10

Hi,

I have a BED file containing mappability values for a reference genome. It starts at position 10,000 and continues from there. The interval on each line varies in length, but there are no gaps or overlaps. For example, the first line covers 10,000 to 10,039 and the second covers 10,039 to 10,040. I want to change the interval to 1,000 base pairs per line and take the mean of the mappability values within each interval.

Let's say: Input

chr1 10,000 10,039 0.4
...     10,039  .....      0.2

Output

chr1 10,000 11,000 0.2 (mean mappability value between 10,000 and 11,000 position)
2.9 years ago

You can do this with the BEDOPS tools bedops and bedmap (set and mean operations), plus the Kent utility fetchChromSizes to build 1 kb windows:

$ tr -d ',' < elements_with_commas.bed | sort-bed - > elements.bed
$ fetchChromSizes hg38 | grep -v '_*_' | awk -v FS="\t" -v OFS="\t" '{ print $1, "0", $2 }' | sort-bed - | bedops --chop 1000 - > hg38.1kb.bed
$ bedops --everything elements.bed | awk -v FS="\t" -v OFS="\t" '{ print $1, $2, $3, ".", $4 }' > union.bed
$ bedmap --echo --mean --delim '\t' hg38.1kb.bed union.bed > answer.bed

Replace hg38 with the name of your reference genome (e.g., mm10 for mouse).

You can also use process substitutions to avoid creating intermediate files, bundling the set and mean operations into a more efficient one-liner:

$ tr -d ',' < elements_with_commas.bed | sort-bed - > elements.bed
$ bedmap --echo --mean --delim '\t' <(fetchChromSizes hg38 | grep -v '_*_' | awk -v FS="\t" -v OFS="\t" '{ print $1, "0", $2 }' | sort-bed - | bedops --chop 1000 -) <(bedops --everything elements.bed | awk -v FS="\t" -v OFS="\t" '{ print $1, $2, $3, ".", $4 }') > answer.bed

However, the first set of commands is probably easier to read, troubleshoot, and modify.

A related Biostars answer, "How can I bin my bed files into 500bp bins?", demonstrates the same principle using the --sum operation instead of --mean.

In addition to --sum and --mean, the bedmap command offers several other score summary operations, including median, min, max, and weighted and trimmed means. Run bedmap --help or review the documentation for a full listing.
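Since the input intervals vary in length, a length-weighted mean may be closer to "mean mappability per base" than a plain mean of the overlapping scores. As a rough cross-check that needs no extra tools, here is a minimal awk sketch of that idea. It is not part of BEDOPS; the file names and the three sample intervals are made up for illustration, and it assumes a tab-separated, 4-column input with commas already stripped and no overlaps.

```shell
# Tiny made-up input: chrom, start, end, mappability (tab-separated).
printf 'chr1\t10000\t10039\t0.4\nchr1\t10039\t10500\t0.2\nchr1\t10500\t11200\t1.0\n' > elements.bed

awk -v FS='\t' -v OFS='\t' -v w=1000 '
{
    # Walk over every w-sized bin that the interval [$2, $3) touches.
    for (b = int($2 / w) * w; b < $3; b += w) {
        lo = ($2 > b) ? $2 : b             # start of the overlap with this bin
        hi = ($3 < b + w) ? $3 : b + w     # end of the overlap with this bin
        key = $1 OFS b
        if (!(key in cov)) order[++n] = key    # remember bins in first-seen order
        sum[key] += $4 * (hi - lo)         # score weighted by overlapping bases
        cov[key] += hi - lo                # bases of the bin covered by input
    }
}
END {
    for (i = 1; i <= n; i++) {
        split(order[i], f, OFS)
        # Mean is taken over covered bases only, not over the full bin width.
        printf "%s\t%d\t%d\t%.4f\n", f[1], f[2], f[2] + w, sum[order[i]] / cov[order[i]]
    }
}' elements.bed > binned.bed

cat binned.bed
```

For the sample input, the 10000-11000 bin comes out as 0.6078, whereas an unweighted mean of the three scores would be about 0.533. Bins are aligned to multiples of 1000, and a partially covered bin (here 11000-12000) is averaged only over the bases the input actually covers.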

It worked. Thank you!!!!!!

2.9 years ago
ATpoint 85k

bedtools map can do this for you. See its documentation: https://bedtools.readthedocs.io/en/latest/content/tools/map.html

Thank you for the answer! From the link, I understand that bedtools map works with a second BED file, but I have only one file, whose start-end intervals I want to change while keeping the mappability score. Is that possible with bedtools map?

You would need to create a second file with the bins of interest first.
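One way to build that second file is bedtools makewindows, followed by bedtools map on column 4. The following is only a sketch under some assumptions: bedtools is installed, scores.bed is a tab-separated 4-column file (chrom, start, end, score) with the commas removed, and genome.txt is a hypothetical two-column file of chromosome names and lengths. The sample data and file names are made up for illustration.

```shell
# Made-up sample data; replace with your own files.
printf 'chr1\t10000\t10039\t0.4\nchr1\t10039\t10040\t0.2\n' > scores.bed
printf 'chr1\t12000\n' > genome.txt

if command -v bedtools >/dev/null 2>&1; then
    # 1. Build the bins of interest: fixed 1000-bp windows per chromosome.
    bedtools makewindows -g genome.txt -w 1000 | sort -k1,1 -k2,2n > windows.bed
    # 2. bedtools map expects both inputs sorted the same way.
    sort -k1,1 -k2,2n scores.bed > scores.sorted.bed
    # 3. For each window, take the mean of column 4 over overlapping intervals.
    bedtools map -a windows.bed -b scores.sorted.bed -c 4 -o mean > windows_mean.bed
else
    echo "bedtools not found on PATH" >&2
fi
```

Windows with no overlapping intervals are reported with "." in the last column by default. Note that this mean is taken over the overlapping intervals, not weighted by their lengths.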
