I have two bed files, and I have run bedtools intersect to calculate the overlapping regions.
bedtools intersect -a fileA.bed -b fileB.bed -wao > overlaps.bed
I need to calculate the number of overlaps that are longer than 100 bp (i.e. a value greater than 100 in the 8th column of the overlaps.bed file, if I am right) for each feature in fileA.bed, as well as the total length covered by those overlaps.
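For reference, the per-feature counting described here can also be sketched directly on overlaps.bed with awk. This is only a minimal sketch under some assumptions: the feature from fileA.bed is identified by its first three fields, and the overlap length is the last field of each -wao line (the exact column number depends on how many columns the two input files have). The function name summarize_overlaps and the sample lines are made up for illustration.

```shell
# Hypothetical helper: per-feature count and total length of overlaps > 100 bp.
# Assumes fields 1-3 identify the A feature and the overlap length is the
# last field of each `bedtools intersect -wao` output line.
summarize_overlaps() {
  awk 'BEGIN { FS = OFS = "\t" }
       {
         key = $1 OFS $2 OFS $3
         # Remember first-seen order so features without long overlaps still print.
         if (!(key in count)) { order[++n] = key; count[key] = 0; total[key] = 0 }
         if ($NF > 100) { count[key]++; total[key] += $NF }
       }
       END { for (i = 1; i <= n; i++) print order[i], count[order[i]], total[order[i]] }' "$@"
}

# Made-up sample in the assumed 7-column layout (BED3 A, BED3 B, overlap length).
sample=$(printf 'chr1\t0\t1000\tchr1\t100\t350\t250\nchr1\t0\t1000\tchr1\t900\t950\t50\nchr1\t0\t1000\tchr1\t400\t600\t200\nchr2\t0\t500\t.\t-1\t-1\t0\n')
result=$(printf '%s\n' "$sample" | summarize_overlaps)
printf '%s\n' "$result"
```

On real data this would be called as summarize_overlaps overlaps.bed; each output line is the feature followed by the number of qualifying overlaps and their summed length.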
The bedmap command takes all overlaps between map.bed and an element in reference.bed, and applies the --echo-overlap-size operator, which prints the sizes of all overlapping elements.
The awk command filters overlap sizes by lengths greater than 100, sums their sizes and prints the sum and the number of filtered elements.
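The filtering step can be sketched roughly as follows. It assumes bedmap's default output for --echo-overlap-size, where each line holds the overlap sizes for one reference element separated by semicolons. The function name filter_sizes and the sample input are made up, and the command in the original answer may have differed.

```shell
# Hypothetical helper: for each line of ';'-separated overlap sizes,
# print the summed size and the count of overlaps longer than 100 bases.
filter_sizes() {
  awk -F';' '{
    sum = 0; n = 0
    for (i = 1; i <= NF; i++)
      if ($i > 100) { sum += $i; n++ }
    print sum "\t" n
  }' "$@"
}

# Made-up input: one line per reference element; an empty line means no overlaps.
sizes_out=$(printf '250;50;200\n30\n\n101;100\n' | filter_sizes)
printf '%s\n' "$sizes_out"
```

Note that the comparison is strict, so an overlap of exactly 100 bases is not counted.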
Cool. Here's a modified version that skips writing the intermediate file tmp, which should speed things up a bit more:
The hyphen - is a common placeholder for standard input from upstream processes. This placeholder can be used with BEDOPS tools like bedops, bedmap etc. as well as with common Unix utilities, like awk, paste, gunzip, etc.
Piping input from one process to the next in this way is a good way to skip making intermediate files, which can take time to generate and unnecessarily use up disk space.
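As a small illustration of the hyphen placeholder, the sketch below uses only awk and paste, so it does not depend on BEDOPS being installed. The sample coordinates and variable names are made up.

```shell
# Two made-up BED lines are piped in; '-' tells awk to read standard input.
piped=$(printf 'chr1\t100\t350\nchr1\t400\t600\n' | awk -F'\t' '{ print $3 - $2 }' -)

# paste reads standard input once per '-', so two input lines become one row.
joined=$(printf '%s\n' "$piped" | paste - -)
printf '%s\n' "$joined"
```

No file is written between the two steps; each command reads what the previous one printed.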
The --bp-ovr 100 operator in bedmap requires at least 100 bases of overlap between an element in map.bed and the element in reference.bed; use --bp-ovr 101 if overlaps must be strictly longer than 100 bases. The --echo-overlap-size operator behaves as described in my previous answer.
The resulting overlap sizes are then passed to awk, which prints the count of qualifying overlaps and the sum of their sizes.
Please use the formatting bar (especially the code option) to present your post better. I've done it for you this time.
Thanks! I will next time, sorry about that.