You can use BEDOPS bedops --intersect
and --element-of
to do what you want.
But this is also a good use case for bedmap (also part of BEDOPS) to create a more general and powerful solution.
Say you want to do this for any three mark categories. At some point, you decide you want to redo your experiment by excluding Pol II, instead of H3K4me3. Instead of building a new script of specifically-ordered bedops operations, you could just grep
a result set, in which there are any three marks that overlap an annotation. Grep will let you filter a file for matches on a keyword or pattern, as well as filter the file to exclude any matches with a keyword or pattern.
By way of specifics, let's say that you have ChIP-seq marks in one file called marks.bed
, where the first three columns are the genomic position of the mark (chromosome, start and stop), and the fourth column is a string labeling the mark type: H3K4me1
, H3K27Ac
, Pol_II
, and H3K4me3
:
$ bedops --everything H3K4me1.bed H3K27Ac.bed Pol_II.bed H3K4me3.bed > marks.bed
This is a typical four-column BED file. The fourth column is an ID field, so we can use bedmap --echo-map-id-uniq
and awk
to map the peak type to the annotation (intergenic) and count the number of overlaps:
$ bedmap --echo-map-id-uniq --echo --echo-map-id-uniq --delim '\t' intergenic.bed marks.bed \
| awk '{ \
if (split($1, a, ";") == 3) { \
print $0; \
} \
}' \
| cut -f2- \
> three_mark_types_overlapping_intergenic_regions.bed
The result contains the annotation element (intergenic region in this example, but this could be an exonic region, intronic region, etc.), where exactly three unique ID types overlap. Note that these are any three types, which could include H3K4me3 and two other unique types.
We use --echo-map-id-uniq
twice. That's not a typo. We use it once to put the IDs in the first column, and the second use is to again put the IDs at the end of the line. The first column result gets processed with awk
, which prints out the line if there are any three mark types. The cut
command cuts off that prefix, and what's left is a BED file.
Now we just use grep
to exclude the ID we don't want:
$ grep -v H3K4me3 three_mark_type_overlapping_intergenic_regions.bed > three_mark_type_overlapping_intergenic_regions_without_H3K4me3_mark_overlaps.bed
Reuse this result down the road for any other ID type, if you want:
$ grep -v Pol_II three_mark_type_overlapping_intergenic_regions.bed > three_mark_type_overlapping_intergenic_regions_without_Pol_II_marks.bed
I want to thank you for this immensely detailed answer. Give me 24 hours to attempt this on my own and I will edit this post with any questions / or feedback I might have.