Using the same cutoff on peak calling?
0
2
Entering edit mode
3.4 years ago
babanski ▴ 20

Hi!

When using macs2 peak caller to ChIP-seq data, I pick up some pvalues from --cutoff-analysis and test the pvalues until I got a reasonable balance between accurate peak call and false positive calls of background noises. But I found the best pvalues differs between samples. For example, given p= 0.0001 was the best for CTCF.rep1, performing peak call -pvalue 0.0001 to CTCF.rep2 ended up losing most of peaks, and instead, p=0.01 turn out the best for CTCF.rep2. This is maybe due to the difference of coverage of each sample.

I have four samples of CTCF peaks. Control.rep1, control.rep2, treatment.rep1 and treatment.rep2. The impression by my eye suggested pvalues of 0.01, 0.001, 0.0001 and 0,00001 are the best, respectively. When peak calling by p=0.01, I got a lot of background noise in treatment.rep2 sample, so I want to use p=0.01 to control.rep1 and p=0.00001 to treatment.rep2, if possible.

I think there are some options.

  • Using a single p value to all of these samples.
  • p=0.01 for the control samples and p=0.0001 for the treatment samples. (the same p to each treatment group)
  • Or, maybe use the same p to each replicate.
  • Using four independent p values to each samples.

Is it OK to use different p values, or I should use a single cutoff p value to all of my samples?

Thanks in advance, Taisuke

macs2 cutoff • 2.2k views
ADD COMMENT
0
Entering edit mode

First of all it would be nice to know about your replicates, have you consider a batch effect? Also, have you done any trimming in your fastq files? and were all the samples mapped using the same parameters? Saying this, in most of the case the reproducibility of peaks depends on sequencing depth : do you have a similar coverage in both replicates to detect the peaks? If this is the case, I would call the peaks using the same p value.

ADD REPLY
0
Entering edit mode

Thank you for replying!

I did the CUT&RUN on the same day and submitted the samples to the same company, so the batch effect is considerably low. I trimmed the fastq file using a program in cutruntools and mapped them with the same parameters.

The mapping summary is as following;

Factor Treatment Replicate SequencingDepth
CTCF NTC rep1 15071942
CTCF NTC rep2 16927612
CTCF SH rep1 12005923
CTCF SH rep2 8656545

As you can see, the coverage is very different in one sample. If using different p values are not acceptable way, down-sampling of the other samples may help get a reasonable result. What do you think?

ADD REPLY
0
Entering edit mode

Looking at your depth, it seems that there is something wrong with rep 2 SH, because rest of the samples have nearly two times more... could you figure what is wrong with it? An error with the sequencing? missing something during trimming? mapping? Saying that, if you thing everything is good with rep2, I will try to sequence it again (if possible) but if not, I would not include it in the analysis.

ADD REPLY
0
Entering edit mode

The file size of SH.rep2_R1_001.fastq.gz and R2_001.fastq.gz are much smaller than the others. So, simply the seq data I got was fewer in this sample. Otherwise, FASTQ qualities are similar, alignment rates are similar... Also, the duplication rate is high in both of these rep2 samples around 30%. Though CUT&RUN tends to create duplicated reads, but still this seems high. Maybe, the library prep material was too low and saturated in these samples and got fewer data in SH.rep2??? Anyway, something seems to go wrong with these. Excluding these rep2 samples may be safer. Thanks a lot.

ADD REPLY
0
Entering edit mode

Maybe, the library prep material was too low and saturated in these samples

seems to be the problem

ADD REPLY

Login before adding your answer.

Traffic: 1792 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6