Bedtools merge minimum overlap?
1
0
Entering edit mode
6 months ago
SJP • 0

I have a bedfile with multiple mapped sequences like below:

chr1    100     200     region1    +
chr1    180     300     region2    +
chr1    300     350     region3    +
chr1    320     400     region4    +
chr1    330     410     region5    +

I am using bedtools merge to merge sequences that overlap - however, I really need to set a minimum overlap cut-off. Bedtools merge will overlap bookended sequences, and sequences overlapping by just one basepair, and there is no way to change this.

E.g. regions 1 and 2 overlap by only 20bp, and in this case I would not like them merged. However, regions 4 and 5 mostly overlap, and I would like these sequences to be merged. Its also really important that I am able to retain the ID and strand information in the resulting merged bed entry, as well as enforcing strandedness when merging.

Does anyone know of a program or method to do this? I have also tried bedOPS with no success. I am new to Python programming so not quite sure how I would be able to code this myself!

bedtools genomics • 637 views
ADD COMMENT
1
Entering edit mode

Not sure if you wanting to do like this-

cat your_input.bed 
chr1    100     200     region1    +
chr1    180     300     region2    +
chr1    300     350     region3    +
chr1    320     400     region4    +
chr1    330     410     region5    +

#BEDOPS bedmap to merge beds if there greater than 20% overlaps between beds:
bedmap --count --echo-map-range --fraction-both 0.2 --delim "\t" your_input.bed  |cut -f2- -|sort-bed -|uniq -
chr1    100 200
chr1    180 300
chr1    300 410
ADD REPLY
0
Entering edit mode

Note that, in place of sort-bed - | uniq -, you can pass the --unique option to sort-bed.

The input is not strictly a BED6 file, but you can strand-separate it by way of awk:

awk -vFS="\t" -vOFS="\t" '($5=="+")' input.bed | sort-bed - > output.forward.bed

Repeat for reverse-stranded elements:

awk -vFS="\t" -vOFS="\t" '($5=="-")' input.bed | sort-bed - > output.reverse.bed

May as well sort, if the sort order of input.bed is unknown.

Then run BEDOPS bedmap on each file, as usual.

You can use bedops --everything to take the multiset union of the two bedmap results:

bedops --everything answer.forward.bed answer.reverse.bed > answer.bed

The file answer.bed will be sorted correctly.

To preserve IDs in the fourth column, you can use the --echo-map-id or --echo-map-id-uniq operator with bedmap.

ADD REPLY
0
Entering edit mode
6 months ago

Bedtools intersect allows you to specify the fraction of overlap between two BED (or BAM) files using the F/f/r flags. You could split your sort-ordered BED file into two files of alternating entries, intersect per your specifications to generate a list of intervals to be merged, perform Bedtools merge on that subset, add back the unmerged intervals, and sort.

Inelegant solution, but it does the job.

ADD COMMENT

Login before adding your answer.

Traffic: 1515 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6