Question

csaw - is there a way to identify composition bias in a dataset?

11 months ago

GLG ▴ 10

Hello. I was wondering whether composition bias / systematic differential binding (DB) can be identified with csaw. If so, I would normalize the data set for composition bias; otherwise I'd prefer to normalize for trended bias, as it seems to perform better according to some papers.

We are probing a factor that is thought to be recruited to H3K27me3 sites in the genome. Our experiment compares the distribution of this factor with and without an inhibitor of K27me3 deposition.

Since we also have evidence that our factor distributes to many sites without K27me3 enrichment, I wanted to avoid assumptions such as "there will be a global reduction in this factor binding just because there is a global decrease in K27me3 enrichment".

Of course, the most reliable way would be to blot for this factor in both conditions and see if there is a global decrease in signal. However, this experiment involves a 45-day drug treatment, and the CUT&RUN files are already available, so it would be more convenient if I could get a glimpse of any systematic DB/global changes from the outputs of csaw alone.

Does anyone know whether this is possible and if so, how to do this?

Parameters:

small windows -> 150 bp, 50 bp spacing.

large windows -> 2000 bp, 500 bp spacing.

windows are consolidated using tol = 100 and max.width = 20000.

fold-change cutoff for filtering windows by the global background -> 3.0 for small windows and 2.0 for large windows

Normalization factors after normalizing for composition bias using 10kb bins:

(normfacs <- data.small.filt$norm.factors)
0.9741764 0.9745805 1.0287688 1.0238278
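
As a quick check on these numbers, the factors can be turned into log2 offsets between samples. This is a minimal base-R sketch, assuming (as in the composition-bias MA plots of the csaw user guide, to the best of my understanding) that the red line for "sample 1 vs sample j" is drawn at the log2 ratio of the two normalization factors. The variable names `offsets` and the pairing against sample 1 are illustrative choices, not part of the original analysis.

```r
# Normalization factors as posted above (samples 1-2 control, 3-4 treatment).
normfacs <- c(0.9741764, 0.9745805, 1.0287688, 1.0238278)

# Assumption: the red line for "sample 1 vs sample j" sits at
# log2(normfacs[1]) - log2(normfacs[j]).
offsets <- log2(normfacs[1]) - log2(normfacs[-1])
names(offsets) <- paste("1 vs", 2:4)
print(round(offsets, 3))

# The same differences expressed as scaling ratios between libraries.
print(round(2^abs(offsets), 3))
```

All three offsets are below 0.1 log2 units in absolute value, which corresponds to the treatment libraries being rescaled by roughly 5 to 6% relative to sample 1.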

MA plots after normalizing for composition bias. Samples 1 and 2 are control replicates and samples 3 and 4 are treatment replicates.

Normalization for trended bias -- before

Normalization for trended bias -- after

Do any of the above scream, or at least indicate, that there were global changes in enrichment of our factor? This would help me decide on normalization methods both for the DB analysis and for bigWig generation for visualization. Thanks a lot!

chip-seq csaw normalization composition-bias • 650 views

updated 10 months ago by ATpoint 86k • written 11 months ago by GLG ▴ 10

I use pretty much the same diagnostic as you: MA plots (average logCPM vs. log ratio, aka log fold change) for all samples against all, which I then just judge by eye.

Important for this is that you prefilter against windows with small counts (in my case peaks, I never use windows), so you can really see whether the bulk of meaningful points is well centered on y=0. I personally prefer not to use these density clouds but to plot individual points. The right-hand part of the plot (the "arrowhead") should sit at y=0. That looks reasonable in your case, I would say, though as mentioned these density plots put the focus on regions with low counts (because they are abundant), so it's a bit difficult to tell. You do not seem to have trended bias.

Usually I just do default DESeq2/edgeR-style normalization with a count matrix based on called peaks. Sometimes I subset to the top 20% of regions with the largest counts in case there are a lot of differential regions, to protect the normalization from being skewed towards up- or downregulated regions. Thinking aloud here, hence the sloppy language.
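
The diagnostic described above can be sketched roughly as follows. This is a minimal base-R illustration on simulated counts, not the poster's actual code: the data, the filter threshold of 20 reads, and all variable names are hypothetical, and plain library-size scaling stands in for the edgeR/DESeq2 normalization one would normally use.

```r
# Simulated example; every name and number here is hypothetical.
set.seed(1)
n <- 5000
mu <- exp(rnorm(n, mean = 4, sd = 1))        # baseline abundance per region
counts <- cbind(
  ctrl  = rpois(n, mu),                      # control library
  treat = rpois(n, 1.5 * mu)                 # treatment library, 1.5x deeper
)

# Prefilter regions with small counts so the noisy low-abundance cloud is dropped.
keep <- rowSums(counts) >= 20
counts <- counts[keep, , drop = FALSE]

# Library-size-normalized log2 CPM with a small prior count to avoid log(0).
lib.size <- colSums(counts)
logcpm <- log2(t((t(counts) + 0.5) / (lib.size + 1)) * 1e6)

A <- rowMeans(logcpm)                        # average abundance
M <- logcpm[, "treat"] - logcpm[, "ctrl"]    # log fold change

# Centre of the "arrowhead": median M among the 20% most abundant regions.
top <- A >= quantile(A, 0.8)
arrowhead.centre <- median(M[top])

# Individual points rather than a density cloud.
plot(A, M, pch = ".", col = ifelse(top, "black", "grey60"),
     xlab = "A (mean log2 CPM)", ylab = "M (log2 treat/ctrl)")
abline(h = 0, col = "red", lty = 2)          # where the arrowhead should sit
abline(h = arrowhead.centre, col = "blue")   # where it actually sits
```

Because the simulation contains no real change between conditions, `arrowhead.centre` lands close to 0 here. On real data, a blue line sitting clearly away from the dashed red line after normalization would be the cue to revisit the normalization.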

11 months ago by ATpoint 86k

So to clarify, if the "arrowhead" in the righthand-part of the MA-plot is around y=0, does that indicate that composition bias was not present, or that it was present and was removed?

Would you know what the logratio of normalization factors (red line) indicates when it's not at y=0? I know the csaw manual says that if the red line crosses the center of the dark cloud, then the composition bias was identified and removed, but I wonder whether the red line being very close to y=0 means that there was no composition bias to begin with. (I have H3K27me3 samples where we actually know that systematic DB/composition bias is present, and the offset there is much larger, e.g. the red line is around y = -1, but I'm not sure what this offset of the red line means.)

10 months ago by GLG ▴ 10

> So to clarify, if the "arrowhead" in the righthand-part of the MA-plot is around y=0, does that indicate that composition bias was not present, or that it was present and was removed?

Yes, that's my usual interpretation.

> Would you know what the logratio of normalization factors (red line) indicates when it's not at y=0

The red line is a loess regression fit, afaik. If it is consistently biased towards -1, then I would say you might need to adapt the normalization, but for this I would need to see the data. I always "look" at these plots by eye and judge dataset by dataset. Mostly the default normalization is fine, but sometimes it isn't.

10 months ago by ATpoint 86k