Question

What is the appropriate approach for ChIPseq protein genome occupancy % calculation?

0

9.4 years ago

AlexAbdulkaderKheirallah ▴ 120

Hello All,

I want to generate a pie chart of protein occupancy across different genomic features. I considered using bedtools to query a BAM file against an annotation in BED format to retrieve the number of tags aligning to those features. Is this OK, or is it more appropriate to compare the peak caller's BED output against the BED annotation? I thought that querying the BAM file could introduce considerable noise from non-specific reads.

Thanks

ChIP-Seq • 3.6k views

ADD COMMENT • link updated 23 months ago by Ram 44k • written 9.4 years ago by AlexAbdulkaderKheirallah ▴ 120

2

Please don't make pie charts, they're terrible at conveying information.

ADD REPLY • link updated 23 months ago by Ram 44k • written 9.4 years ago by Devon Ryan 104k

0

It doesn't matter whether it is a pie chart or another way of presenting % distributions. But why would pie charts be terrible?

ADD REPLY • link 9.4 years ago by AlexAbdulkaderKheirallah ▴ 120

1

It turns out that humans are really bad at accurately estimating and comparing percentages represented in pie charts. A table is typically preferred, though if you have time course or other longitudinal data then there are other graphical options.

ADD REPLY • link 9.4 years ago by Devon Ryan 104k

0

Your data are not categorical (an occupancy site could intersect with many genomic features), so a pie chart (rarely the right approach) is definitely wrong here.

ADD REPLY • link updated 23 months ago by Ram 44k • written 9.4 years ago by Simon Cockell 7.4k

0

This is the first time I have seen somebody comment on data that he has not even seen. Anyway, what you said doesn't make sense and you're mistaken. Even if the protein intersects with several genomic features, you can still quantify the number of tags specific to these features and get an idea of the protein's genome-wide occupancy. In fact, % distributions are quite common in novel ChIPseq analyses, which is the case for me.

ADD REPLY • link 9.4 years ago by AlexAbdulkaderKheirallah ▴ 120

1

If the pie chart you're after is like the ones shown in the link provided in Ido's answer below, then I am not mistaken. It is perfectly possible for a ChIP region to be both in a promoter of one gene, and downstream of another gene - so the data are not categorical & a pie chart is wrong.

Just because it is common, doesn't mean it's correct.
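
To make the overlap point concrete, here is a minimal Python sketch using made-up intervals. The coordinates, the feature labels, and the helper names `overlaps` and `classes_per_peak` are all hypothetical and not taken from any real annotation or tool. It counts, per feature class, how many peaks overlap at least one feature of that class; because a single peak can hit several classes, the percentages do not have to sum to 100.

```python
from collections import Counter

# Hypothetical toy data: (start, end) peaks and (start, end, label) features
# on a single chromosome, half-open coordinates as in BED.
peaks = [(100, 200), (450, 550), (900, 950)]
features = [
    (0, 150, "promoter"),       # promoter of a first gene
    (150, 500, "gene_body"),    # body of that first gene
    (480, 600, "promoter"),     # promoter of a second, overlapping gene
    (800, 1000, "intergenic"),
]


def overlaps(a_start, a_end, b_start, b_end):
    """True if two half-open intervals share at least one base."""
    return a_start < b_end and b_start < a_end


def classes_per_peak(peaks, features):
    """Return, for each peak, the set of feature labels it overlaps."""
    return [
        {label for f_start, f_end, label in features
         if overlaps(p_start, p_end, f_start, f_end)}
        for p_start, p_end in peaks
    ]


hits = classes_per_peak(peaks, features)
counts = Counter(label for labels in hits for label in labels)
percentages = {label: 100 * n / len(peaks) for label, n in counts.items()}
print(percentages)
print("total:", sum(percentages.values()))  # exceeds 100 for this toy data
```

In this toy set two of the three peaks fall in both a promoter and a gene body, so the slices of a pie chart built from these numbers would not represent disjoint parts of a whole.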

ADD REPLY • link updated 5.0 years ago by Ram 44k • written 9.4 years ago by Simon Cockell 7.4k

0

It is not about whether it is common but about what kind of information you are after. I agree that a ChIP region can be at the promoter of one gene and in the gene body of another, but that doesn't exclude the validity of estimating % distributions. What you want to know is whether the protein has a preference for promoters, gene bodies, or whatever region you are interested in, which is a perfectly valid question to ask. You can actually generate average profiles over gene bodies, starting at TSS -1000 and ending at TES +1000, which would give you an idea of the most common occupancy.
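
As a rough illustration of such an average profile, here is a small Python sketch. Everything in it is an assumption made for the example: the per-base coverage list, the gene coordinates, the function names, and the simplification that all genes are on the + strand. It is also cruder than what dedicated tools do, since it rescales each gene's whole window, flanks included, to the same number of bins instead of keeping the flanks at a fixed size. The `flank` parameter stands in for the 1000 on either side.

```python
def bin_means(values, n_bins):
    """Average a list into n_bins consecutive chunks of near-equal size."""
    if len(values) < n_bins:
        raise ValueError("window is shorter than the number of bins")
    size = len(values)
    means = []
    for i in range(n_bins):
        chunk = values[i * size // n_bins:(i + 1) * size // n_bins]
        means.append(sum(chunk) / len(chunk))
    return means


def gene_profile(coverage, tss, tes, flank, n_bins):
    """Binned coverage from tss - flank to tes + flank (+ strand gene)."""
    start, end = tss - flank, tes + flank
    if start < 0 or end > len(coverage):
        raise ValueError("window runs off the end of the coverage vector")
    return bin_means(coverage[start:end], n_bins)


def average_profile(coverage, genes, flank, n_bins):
    """Mean of the per-gene binned profiles, bin by bin."""
    profiles = [gene_profile(coverage, tss, tes, flank, n_bins)
                for tss, tes in genes]
    return [sum(column) / len(profiles) for column in zip(*profiles)]


# Hypothetical per-base coverage with a short gene and a longer one.
coverage = (
    [0, 0, 4, 4, 2, 2, 0, 0]                                # gene 1 window
    + [0, 0, 0, 0]                                          # spacer
    + [0, 0, 8, 8, 8, 8, 8, 8, 4, 4, 4, 4, 4, 4, 0, 0]      # gene 2 window
)
genes = [(2, 6), (14, 26)]  # (TSS, TES), half-open

print(average_profile(coverage, genes, flank=2, n_bins=4))
```

Note that a profile like this shows where signal tends to sit along a scaled gene model; it does not by itself resolve regions that belong to two genes at once.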

ADD REPLY • link updated 23 months ago by Ram 44k • written 9.4 years ago by AlexAbdulkaderKheirallah ▴ 120

0

Unless you're after the wrong kind of information, any collating statistic (sum, mean, median, mode, max, min) based on overlapping regions and expressed in a pie chart will cause issues, my man.

I know it can seem like a non-issue at first, but since many marks sit right at the start of two genes transcribed in opposite directions, TSS ± anything is going to really mess things up. I'm telling you, overlapping intersections are the bane of epigenetics because they're so easy to do wrong and are rarely documented properly in the methods :/

EDIT: to make the comment a little more positive: you can do pie charts if you only collate signal on regions without overlaps. However, it's almost always better to plot signal distributions (not just a sum or a mean) and look at those. They tell you a lot more.
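
One way to read "only collate signal on regions without overlaps" is sketched below in Python. This is an interpretation, not a documented method: the toy peaks, their signal values, the feature labels, and the function names are all hypothetical. Peaks that overlap exactly one feature class contribute their signal to that class; peaks that overlap zero or several classes are set aside and reported separately. The per-class values are kept as lists so that their distribution, not just a sum, can be inspected.

```python
from statistics import median

# Hypothetical toy data: (start, end, signal) peaks and (start, end, label)
# features on one chromosome, half-open coordinates as in BED.
peaks = [(100, 140, 8.0), (450, 550, 5.0), (900, 950, 2.0), (960, 990, 4.0)]
features = [
    (0, 150, "promoter"),
    (150, 500, "gene_body"),
    (480, 600, "promoter"),
    (800, 1000, "intergenic"),
]


def labels_for(start, end, features):
    """Set of feature labels whose interval shares a base with [start, end)."""
    return {label for f_start, f_end, label in features
            if start < f_end and f_start < end}


def split_by_ambiguity(peaks, features):
    """Group signal by class for single-class peaks; set the rest aside."""
    per_class, excluded = {}, []
    for start, end, signal in peaks:
        labels = labels_for(start, end, features)
        if len(labels) == 1:
            per_class.setdefault(next(iter(labels)), []).append(signal)
        else:
            # Overlaps zero or several classes, so it has no single category.
            excluded.append((start, end, signal))
    return per_class, excluded


per_class, excluded = split_by_ambiguity(peaks, features)
for label, signals in sorted(per_class.items()):
    print(label, "n =", len(signals), "median =", median(signals),
          "range =", (min(signals), max(signals)))
print("peaks set aside:", excluded)
```

Reporting how many peaks were set aside matters as much as the per-class numbers, since that is exactly the detail that tends to go missing from methods sections.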

ADD REPLY • link updated 23 months ago by Ram 44k • written 9.4 years ago by John 13k

score 2 · Answer 1 · 2015-06-24

2

9.4 years ago

Ido Tamir 5.2k

http://liulab.dfci.harvard.edu/CEAS/usermanual.html

ADD COMMENT • link 9.4 years ago by Ido Tamir 5.2k

0

That's useful, thanks.

ADD REPLY • link 9.4 years ago by AlexAbdulkaderKheirallah ▴ 120

score 1 · Answer 2 · 2015-06-24

1

9.4 years ago

Friederike 9.0k

I recommend normalizing the ChIP against an input sample, i.e. supply a WIG file of normalized read counts to CEAS (if that's what you end up using).

You can use bamCompare from the deepTools suite for that: dx.doi.org/10.1093/nar/gku365
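
To show the idea behind input normalization, here is a conceptual Python sketch. It is not the deepTools implementation: the function name `log2_ratio`, the per-bin read counts, the choice to scale both samples down to the smaller library, and the pseudocount of 1 are all assumptions made for illustration.

```python
import math


def log2_ratio(chip_counts, input_counts, pseudocount=1.0):
    """Per-bin log2(ChIP / input) after scaling both to the smaller library."""
    chip_total, input_total = sum(chip_counts), sum(input_counts)
    if chip_total == 0 or input_total == 0:
        raise ValueError("both samples need at least one read")
    target = min(chip_total, input_total)
    chip_scale, input_scale = target / chip_total, target / input_total
    # The pseudocount avoids division by zero and log2(0) in empty bins.
    return [
        math.log2((c * chip_scale + pseudocount)
                  / (i * input_scale + pseudocount))
        for c, i in zip(chip_counts, input_counts)
    ]


# Hypothetical read counts in four genomic bins.
chip = [30, 6, 2, 2]
control = [5, 5, 5, 5]
print(log2_ratio(chip, control))
```

Bins with a positive value are enriched over input after depth scaling, and bins near or below zero are at or under background, which is the kind of normalized signal that would then go into the downstream annotation step.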

ADD COMMENT • link 9.4 years ago by Friederike 9.0k