Question

MACS: How Low Can One Go for the Mfold Parameter, and What Does "Uneven Treatment and Control Tags" Mean?

2

11.5 years ago

Jordan ★ 1.3k

Hi,

I'm working on ChIP-seq data and I have a couple of questions about MACS.

How low can you go with the mfold parameter? The default is 10,30. I tried 5,30 and the model could still only call 392 peaks. Is it all right to go below 5?
I get the warning "Treatment and control tags are uneven! FDR may be wrong". How can I fix this? Or is it all right to ignore it? I was wondering whether it affects my analysis in any way.

Thanks in advance!

macs chip-seq • 11k views

updated 11.5 years ago by KCC ★ 4.1k • written 11.5 years ago by Jordan ★ 1.3k

0

One possible way to get rid of the "Treatment and control tags are uneven! FDR may be wrong" warning, which worked pretty well for me, is:

1.- Try to get the read length of control and sample to be the same
2.- Downsample whichever of the two has more reads to the number of reads in the one that has fewer

I know that MACS scales the samples when calling peaks, but the solution I describe is based on trial and error.
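
To make step 2 concrete, here is a minimal sketch of downsampling the larger read set to the size of the smaller one. It is only an illustration: the read lists are toy identifiers, and the function names `downsample` and `balance` are made up for this example, not part of MACS or any other tool. On real data you would do the equivalent on your alignment files.

```python
import random


def downsample(reads, target_count, seed=0):
    """Return a random subset of `reads` with at most `target_count` items."""
    if target_count >= len(reads):
        return list(reads)
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    return rng.sample(reads, target_count)


def balance(treatment, control, seed=0):
    """Downsample the larger of the two read lists to the size of the smaller."""
    n = min(len(treatment), len(control))
    return downsample(treatment, n, seed), downsample(control, n, seed)


# Toy data: identifiers stand in for aligned reads (hypothetical).
treatment = [f"t{i}" for i in range(1000)]
control = [f"c{i}" for i in range(250)]

t_bal, c_bal = balance(treatment, control)
print(len(t_bal), len(c_bal))
```

Fixing the seed keeps the subset reproducible, which is useful if you want to rerun the peak calling on exactly the same reads.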

Hope it helps!

11.5 years ago by daniel.soronellas ▴ 330

score 7 · Answer 1 · 2013-05-15

How many tags do you have for treatment and how many do you have for control? It sounds like there is just a big difference in the amount of data you have for each.

I think it's okay to use mfold with a lower value, for instance 3,30, although it depends. Let me explain. The mfold parameter is used to build the shift model. The shift model is important because it determines how far you have to shift your tags on the forward strand and the reverse strand. The theory is that when a transcription factor is bound at a particular spot, the tags on the two strands end up offset from that spot in opposite directions, because fragments are sequenced from their 5' ends: forward-strand tags pile up upstream of the binding site and reverse-strand tags downstream of it. Once MACS figures out how much to shift things, it will shift tags forward on the forward strand and backward on the reverse strand.
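
As a rough illustration of that last step, here is a small sketch of shifting tags by a fixed amount depending on strand. This is not MACS code; the function name `shift_tags`, the `(position, strand)` representation and the toy coordinates are all assumptions made for the example.

```python
def shift_tags(tags, shift):
    """Shift forward-strand tags forward and reverse-strand tags backward.

    `tags` is a list of (position, strand) pairs, with strand '+' or '-'.
    """
    shifted = []
    for pos, strand in tags:
        if strand == "+":
            shifted.append(pos + shift)
        elif strand == "-":
            shifted.append(pos - shift)
        else:
            raise ValueError(f"unknown strand: {strand!r}")
    return shifted


# Toy example: a binding site near position 1000, with forward tags
# about 50 bp before it and reverse tags about 50 bp after it.
tags = [(948, "+"), (950, "+"), (952, "+"),
        (1048, "-"), (1050, "-"), (1052, "-")]

shifted = shift_tags(tags, shift=50)
print(shifted)
```

With the right shift, the two strand-specific pile-ups land on the same positions, which is what lets the peak caller see a single sharp peak at the binding site.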

To build this shift model, you want 'real' peaks, so that your model will be accurate. The mfold parameter defines what counts as a peak for this purpose. So, -m 10,30 means that peaks that are about 10-fold to 30-fold enriched are going to be used as real peaks. The default values are just a guess (probably based on trial and error), and the right values for your data could be different.
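
To show what the two mfold numbers do conceptually, here is a toy sketch that keeps only regions whose fold enrichment falls between the lower and upper bound. The region names, fold values and the function `select_model_regions` are invented for illustration. How MACS actually computes enrichment, and whether its bounds are inclusive, is not reproduced here.

```python
def select_model_regions(regions, lower=10, upper=30):
    """Keep regions whose fold enrichment lies within [lower, upper].

    Inclusive bounds are an assumption of this sketch.
    """
    return [r for r in regions if lower <= r["fold"] <= upper]


# Hypothetical candidate regions with made-up fold-enrichment values.
regions = [
    {"name": "r1", "fold": 4.0},
    {"name": "r2", "fold": 12.5},
    {"name": "r3", "fold": 28.0},
    {"name": "r4", "fold": 55.0},
    {"name": "r5", "fold": 6.5},
]

default_set = select_model_regions(regions)           # like 10,30
relaxed_set = select_model_regions(regions, lower=5)  # like 5,30

print([r["name"] for r in default_set])
print([r["name"] for r in relaxed_set])
```

Lowering the first number lets weaker regions into the model, which gives more peaks to build from but also raises the chance that some of them are not real.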

Anyway, in theory you could figure out what the shift in your data is yourself. There are ways to tackle this problem. (You can judge how well the current shift model fits by running the R script that MACS produces.) Once you know the right shift size for your data, the whole issue of what mfold to use is irrelevant. You should just supply your desired shift size yourself and use the '--nomodel' option. I don't think MACS does a great job of figuring out the shift size anyway.
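
One common way to estimate the shift yourself is a strand cross-correlation: find the offset at which forward-strand tags line up best with reverse-strand tags. The sketch below is a deliberately simple version of that idea on toy positions for a single chromosome. It is not what MACS does internally, and the function name `estimate_fragment_size` and the `max_lag` default are assumptions for this example.

```python
from collections import Counter


def estimate_fragment_size(forward, reverse, max_lag=300):
    """Return the lag d (in bp) that best aligns forward tags with reverse tags.

    The score for a lag is the number of (forward, reverse) tag pairs
    that are exactly `lag` bp apart.
    """
    fwd_counts = Counter(forward)
    rev_counts = Counter(reverse)
    best_lag, best_score = 0, -1
    for lag in range(max_lag + 1):
        score = sum(n * rev_counts.get(pos + lag, 0)
                    for pos, n in fwd_counts.items())
        if score > best_score:  # ties keep the smallest lag
            best_lag, best_score = lag, score
    return best_lag


# Toy tag positions around two binding sites (made-up numbers).
forward = [948, 950, 950, 952, 1948, 1950, 1952]
reverse = [1048, 1050, 1050, 1052, 2048, 2050, 2052]

d = estimate_fragment_size(forward, reverse)
shift = d // 2  # each strand is moved by half the forward/reverse distance
print(d, shift)
```

The estimated distance `d` is the gap between the forward and reverse pile-ups, so each strand only needs to move by half of it. Check the help text of your MACS version for the exact option that takes this value alongside '--nomodel'.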