Question

Determing sites that contain 3 marks, but lack 1.

3

Entering edit mode

9.1 years ago

Vov ▴ 70

I have been given four separate ChIP-Seq marks that I have aligned, and called using Homer and MACS2 respectively. I have annotated the peak files, and saved them in .txt format.

I am interested in determining which sites (intergenic, introns, exons, intragenic, etc) in the human genome (hg19) contain H3K4me1 + H3K27Ac + Pol II but lack H3K4me3.

What software would allow me to do this? What function?

Eventually I will overlap this cluster with more marks.

Let me know if the question isn't clarified well enough.

Thanks in advance

ChIP-Seq • 2.0k views

ADD COMMENT • link updated 2.2 years ago by Ram 44k • written 9.1 years ago by Vov ▴ 70

Ram · Answer 1 · 2015-10-15

You can use BEDOPS bedops --intersect and --element-of to do what you want.

But this is also a good use case for bedmap (also part of BEDOPS) to create a more general and powerful solution.

Say you want to do this for any three mark categories. At some point, you decide you want to redo your experiment by excluding Pol II, instead of H3K4me3. Instead of building a new script of specifically-ordered bedops operations, you could just grep a result set, in which there are any three marks that overlap an annotation. Grep will let you filter a file for matches on a keyword or pattern, as well as filter the file to exclude any matches with a keyword or pattern.

By way of specifics, let's say that you have ChIP-seq marks in one file called marks.bed, where the first three columns are the genomic position of the mark (chromosome, start and stop), and the fourth column is a string labeling the mark type: H3K4me1, H3K27Ac, Pol_II, and H3K4me3:

$ bedops --everything H3K4me1.bed H3K27Ac.bed Pol_II.bed H3K4me3.bed > marks.bed

This is a typical four-column BED file. The fourth column is an ID field, so we can use bedmap --echo-map-id-uniq and awk to map the peak type to the annotation (intergenic) and count the number of overlaps:

$ bedmap --echo-map-id-uniq --echo --echo-map-id-uniq --delim '\t' intergenic.bed marks.bed \
    | awk '{ \
          if (split($1, a, ";") == 3) { \
              print $0; \
          } \
      }' \
    | cut -f2- \
    > three_mark_types_overlapping_intergenic_regions.bed

The result contains the annotation element (intergenic region in this example, but this could be an exonic region, intronic region, etc.), where exactly three unique ID types overlap. Note that these are any three types, which could include H3K4me3 and two other unique types.

We use --echo-map-id-uniq twice. That's not a typo. We use it once to put the IDs in the first column, and the second use is to again put the IDs at the end of the line. The first column result gets processed with awk, which prints out the line if there are any three mark types. The cut command cuts off that prefix, and what's left is a BED file.

Now we just use grep to exclude the ID we don't want:

$ grep -v H3K4me3 three_mark_type_overlapping_intergenic_regions.bed > three_mark_type_overlapping_intergenic_regions_without_H3K4me3_mark_overlaps.bed

Reuse this result down the road for any other ID type, if you want:

$ grep -v Pol_II three_mark_type_overlapping_intergenic_regions.bed > three_mark_type_overlapping_intergenic_regions_without_Pol_II_marks.bed

Ram · Answer 2 · 2015-10-15

0

Entering edit mode

9.1 years ago

h.mon 35k

See bedtools and BEDOPS.

ADD COMMENT • link updated 5.0 years ago by Ram 44k • written 9.1 years ago by h.mon 35k

1

Entering edit mode

BEDOPS seems like it might be able to do exactly what I want. Let me know if this sounds about right:

1) Use BEDOPS intersect function to find overlap off my three marks and merge them into 1 bed file.

2) Use BEDOPS -not-element-of function to show elements in the reference file (merged bed file from step 1) that do not overlap elements in other sets.

What do you think? I'm sure i'm missing something, but being new I can't quite see it.

ADD REPLY • link 9.1 years ago by Vov ▴ 70