Question

Should I merge my two datasets together or something else?

8.9 years ago

dally ▴ 210

I have a very fundamental question that I can't seem to find an answer to.

I have a variety of TF and histone mark datasets for an untreated cell line and a treated cell line. I ran a tool that returned a BED file of 'potential' enhancer regions. The untreated cell line yielded 36k possible enhancer sites, while the treated cell line yielded 34k. If I am interested in seeing whether a histone mark or two are enriched / de-enriched at these enhancer sites, should I merge these two datasets together to generate one large dataset? Or should I take only the common enhancer regions (intersectBed) that appear in both datasets?

Why is it 'correct' to merge them together as opposed to identifying common regions? Or vice versa? Or is there something else I should be doing that is entirely different?

I have not worked with untreated vs treated cell types before so I don't wish to proceed too far before determining this.

datasets enhancers • 2.1k views


Ram · Accepted Answer · 2016-02-03

I would first check how many enhancer sites are common between the two conditions.

If the overlap is high (>90%), I would conclude that the treatment has no significant effect on the enhancer set, and would proceed by plotting the enrichment of histone marks/TFs on the intersection of the enhancer regions (or on the union, if the overlap is >95%).
If the overlap is low, or the enhancer sets are very different, I would say the treatment has an effect, and would work out which enhancers are common, which are the "new" enhancer sites, and which ones got lost. You can then go forward with GO analysis of the neighbouring genes etc. for each of these groups. Once the groups are clearly defined, go forward with the enrichment calculations.
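
To make the first step concrete, here is a rough Python sketch of how the overlap check and the common / new / lost grouping could be done. The function names (`read_bed`, `split_by_overlap`) and the toy coordinates are made up for illustration, and "overlap" is assumed to mean at least 1 bp shared, which you may well want to tighten. On real files, `bedtools intersect` with `-u` and `-v` should give you the same groups much faster.

```python
from collections import defaultdict

def read_bed(lines):
    """Parse BED lines into {chrom: sorted list of (start, end)} (hypothetical helper)."""
    regions = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank lines and header lines
        chrom, start, end = line.split()[:3]
        regions[chrom].append((int(start), int(end)))
    for chrom in regions:
        regions[chrom].sort()
    return regions

def overlaps_any(interval, others):
    """True if the half-open interval shares at least 1 bp with any interval in others."""
    start, end = interval
    return any(s < end and start < e for s, e in others)

def split_by_overlap(a, b):
    """Split the regions of a into (shared with b, unique to a)."""
    shared, unique = [], []
    for chrom, intervals in a.items():
        others = b.get(chrom, [])
        for iv in intervals:
            (shared if overlaps_any(iv, others) else unique).append((chrom, *iv))
    return shared, unique

# Toy data standing in for the two enhancer BED files (assumed coordinates).
untreated = read_bed(["chr1\t100\t200", "chr1\t500\t600", "chr2\t50\t150"])
treated = read_bed(["chr1\t150\t250", "chr2\t300\t400"])

common, lost = split_by_overlap(untreated, treated)   # lost = untreated only
_, gained = split_by_overlap(treated, untreated)      # gained = treated only

# Fraction of untreated enhancers that are also called in the treated sample.
overlap_fraction = len(common) / (len(common) + len(lost))
print(f"common={len(common)} lost={len(lost)} gained={len(gained)} "
      f"overlap={overlap_fraction:.0%}")
```

The overlap fraction is what you would compare against the thresholds above. The `common`, `gained` and `lost` lists are the groups to carry into the GO analysis and the enrichment plots. The scan here is quadratic per chromosome, so treat it as a way to see the logic rather than something tuned for large peak sets.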