peak calling of ChIP-seq
7.6 years ago
Ben ▴ 60

I have many ChIP-seq datasets, including replicates.

First, I aligned the FASTQ files to the reference genome separately, then merged the resulting BAM files into one larger BAM file and used MACS to call peaks on it. However, many papers do not merge the BAM files; they call peaks on each file separately and then merge the peaks produced by MACS. Does anyone know which method is better? And how do I merge the peaks generated by MACS?

ChIP-Seq • 3.9k views

If these are biological replicates, follow the ENCODE IDR analysis. Assess the signal-to-noise ratio with SPP cross-correlation analysis, as EagleEye suggested. Also run CHANCE in parallel to understand the quality of the signal. Finally, call peaks with MACS2 (I hope you are using the latest version). MACS2 can also call peaks on several samples at once, given one input and all the BAM files for your samples.

Check the link
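The pooled call described above (one input, all sample BAM files) could be assembled roughly as follows. This is only a sketch: the file names, the sample name `pooled`, and the helper `macs2_callpeak_cmd` are hypothetical, and the genome size shortcut `hs` assumes human data. The snippet only builds and prints the command; it does not run MACS2.

```python
import shlex


def macs2_callpeak_cmd(treatment_bams, control_bam, name, genome="hs", paired_end=False):
    """Build a `macs2 callpeak` command that pools several treatment BAMs against one input."""
    if not treatment_bams:
        raise ValueError("need at least one treatment BAM")
    return [
        "macs2", "callpeak",
        "-t", *treatment_bams,                    # all ChIP replicates, pooled by MACS2
        "-c", control_bam,                        # the single input/control
        "-f", "BAMPE" if paired_end else "BAM",   # BAMPE for paired-end alignments
        "-g", genome,                             # effective genome size (assumed: human)
        "-n", name,                               # prefix for the output files
    ]


# Hypothetical file names, for illustration only.
cmd = macs2_callpeak_cmd(["rep1.bam", "rep2.bam"], "input.bam", "pooled")
print(shlex.join(cmd))
```

To actually run it, the list could be passed to `subprocess.run(cmd, check=True)` on a machine where MACS2 is installed.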


Please note that the SPP/IDR protocol can only be used for single-end read data, so it is better to use the MACS2 peak caller, which can handle both single-end and paired-end data. Please check my reply for more details.


OP did not mention whether it's SE or PE.


Thanks for your suggestions! But I have another question: you said that I should merge the common peaks from multiple peak callings. What are the common peaks? In fact, I do not know how to merge peaks from multiple files.

7.6 years ago
EagleEye 7.6k

Hi,

I recommend using the phantompeakqualtools cross-correlation analysis:

  • Check the values in column 11. If the replicates have values close to each other, you can merge those samples and do a single peak calling. Otherwise, do peak calling separately and merge or take the common peaks from both peak callings.

    COL11: QualityTag: Quality tag based on thresholded RSC (codes: -2:veryLow,-1:Low,0:Medium,1:High,2:veryHigh)
    
  • Also recheck/verify the samples using 'plotFingerprint' from deepTools.
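The column 11 check above could be scripted along these lines. This is a minimal sketch, assuming the phantompeakqualtools results are tab-delimited with one row per sample, the file name in column 1, and no header line. The example rows are invented, and the rule "tags differ by at most `max_spread`" is a hypothetical reading of "close to each other", not an official threshold.

```python
import csv
import io

# Invented example rows in an 11-column, tab-delimited layout.
# Only column 1 (file name) and column 11 (QualityTag) are used below.
EXAMPLE = (
    "rep1.bam\t20000000\t180\t0.25\t50\t0.20\t1500\t0.15\t1.667\t2.0\t2\n"
    "rep2.bam\t18000000\t175\t0.22\t50\t0.19\t1500\t0.15\t1.467\t1.75\t2\n"
    "rep3.bam\t15000000\t160\t0.16\t50\t0.18\t1500\t0.15\t1.067\t0.333\t-1\n"
)


def read_quality_tags(handle):
    """Return {file name: QualityTag} taken from column 11 of each row."""
    tags = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 11:
            continue  # skip blank or truncated lines
        tags[row[0]] = int(row[10])
    return tags


def should_pool(tags, max_spread=0):
    """True if all QualityTags lie within max_spread of each other (assumed rule)."""
    if not tags:
        raise ValueError("no samples given")
    values = list(tags.values())
    return max(values) - min(values) <= max_spread


tags = read_quality_tags(io.StringIO(EXAMPLE))
print(tags)
print("pool rep1 + rep2:", should_pool({k: tags[k] for k in ("rep1.bam", "rep2.bam")}))
print("pool all three:", should_pool(tags))
```

With real data, `io.StringIO(EXAMPLE)` would be replaced by an open file handle on the results file.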

7.6 years ago
dnamonk ▴ 10

The best approach is to do peak calling separately on each replicate (make sure to use input) and then assess the quality of each replicate with either:

phantompeakqualtools, if you have single-end read data (reference: https://sites.google.com/site/anshulkundaje/projects/idr)

OR

ChiLin (https://www.ncbi.nlm.nih.gov/pubmed/27716038), if you have paired-end data.

Please remember that SPP can only be used for single-end read data, so you are better off using the MACS2 peak caller.

In recent papers, calculating Pearson's correlation of read density over the peaks shared between replicates is regarded as a better approach than IDR, so you should give it a try as well.

Then select only those replicates that overlap significantly. After that, you can merge the peaks from each replicate. It is best to perform downstream analysis only on the overlapping peaks. Use bedtools to merge peaks.
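To make "overlapping peaks" concrete, here is a small pure-Python sketch of the idea: keep only peaks from one replicate that overlap a peak in the other, then merge each overlapping group into one interval. In practice bedtools would do this on the peak files directly; the function name `common_peaks`, the `min_overlap` parameter, and the coordinates are all made up for illustration. Intervals are BED-style `(chrom, start, end)` tuples.

```python
from collections import defaultdict


def common_peaks(peaks_a, peaks_b, min_overlap=1):
    """Merge peaks from A and B that overlap each other by at least min_overlap bp."""
    by_chrom_b = defaultdict(list)
    for chrom, start, end in peaks_b:
        by_chrom_b[chrom].append((start, end))

    # Collect the union of every overlapping A/B pair.
    hits = []
    for chrom, a_start, a_end in peaks_a:
        for b_start, b_end in by_chrom_b.get(chrom, ()):
            overlap = min(a_end, b_end) - max(a_start, b_start)
            if overlap >= min_overlap:
                hits.append((chrom, min(a_start, b_start), max(a_end, b_end)))

    # Merge unions that touch or overlap, so each region is reported once.
    merged = []
    for chrom, start, end in sorted(hits):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            merged[-1] = (chrom, merged[-1][1], max(merged[-1][2], end))
        else:
            merged.append((chrom, start, end))
    return merged


# Made-up peak coordinates for two replicates.
rep1 = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 150)]
rep2 = [("chr1", 150, 250), ("chr1", 900, 950), ("chr2", 140, 300)]

print(common_peaks(rep1, rep2))
print(common_peaks(rep1, rep2, min_overlap=20))
```

The nested loop is quadratic per chromosome, which is fine for a toy example but not for genome-wide peak sets; that is one reason to use bedtools on real data.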

Good luck!


I agree about using Pearson's correlation to check the read density. I reckon something like this is implemented in bamCompare of deepTools, if I am not wrong.
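For reference, the Pearson check on read density boils down to something like the sketch below, assuming you already have read counts for the same set of regions in both replicates. The counts are invented, and the log2 transform with a pseudocount of 1 is a common convention rather than something prescribed in this thread.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def log2_counts(counts):
    """log2(count + 1), to damp the influence of a few very strong peaks."""
    return [math.log2(c + 1) for c in counts]


# Invented read counts over the same six regions in two replicates.
rep1_counts = [120, 35, 400, 15, 80, 220]
rep2_counts = [110, 40, 380, 30, 60, 250]

r = pearson(log2_counts(rep1_counts), log2_counts(rep2_counts))
print(f"Pearson r on log2 counts: {r:.3f}")
```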


Could you comment on the differences between IDR and Pearson? I understand what each approach is doing, but if Pearson's correlation of the read counts at the peak summits is, let's say, >= 0.9, is it still possible that IDR would mark these two replicates as unacceptable? Essentially, is a good linear correlation sufficient to assess the reproducibility of a replicate?
