Question

Peak calling with MACS for paired end DNase-seq data

0

Entering edit mode

6.6 years ago

Bioinformatist Newbie ▴ 270

I have a paired-end DNAseq data for wild-type and mutant conditions. I aligned the reads with GRCh38 genome by using BWA mem and wanted to call peaks from paired BAM file by using macs2 with -f BAMPE, but I am getting very few number of peaks (80 peaks with q-value 0.05) and I am unable to understand why. Can anybody here please help me if I am doing something wrong here? Aligning the reads with BWA-mem by:

bwa mem -t 16 genome.fa Read1.fastq Read2.fastq > aln_pe.sam

Converted this sam to bam by:

samtools view -bS aln_pe.sam > aln_pe.bam

Sorted the bam by:

samtools sort aln_pe.bam -o aln_pe_sorted.bam

Alignment stats by:

samtools flagstat aln_pe_sorted.bam

samtools flagstat output for wild-type:

63043224 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
150278 + 0 supplementary
0 + 0 duplicates
62528448 + 0 mapped (99.18% : N/A)
62892946 + 0 paired in sequencing
31446473 + 0 read1
31446473 + 0 read2
61214012 + 0 properly paired (97.33% : N/A)
62062360 + 0 with itself and mate mapped
315810 + 0 singletons (0.50% : N/A)
654324 + 0 with mate mapped to a different chr
457479 + 0 with mate mapped to a different chr (mapQ>=5)

samtools flagstat output for mutant:

59028279 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
145735 + 0 supplementary
0 + 0 duplicates
56304928 + 0 mapped (95.39% : N/A)
58882544 + 0 paired in sequencing
29441272 + 0 read1
29441272 + 0 read2
53646420 + 0 properly paired (91.11% : N/A)
54564956 + 0 with itself and mate mapped
1594237 + 0 singletons (2.71% : N/A)
784174 + 0 with mate mapped to a different chr
585883 + 0 with mate mapped to a different chr (mapQ>=5)

Now when I use macs2 for calling peaks I get very few peaks and I am unable to understand if that is because of my samples or I am doing something wrong. I have tried different ways, the most peaks which I was able to get were 80 by trying this:

macs2 callpeak -t MUT1_sorted.bam -c WT1_sorted.bam -f BAMPE -g hs -n MUT1_WT1 -q 0.05 --nomodel --shift 27 --extsize 54

Can anybody help me and point out what is wrong here? macs output:

# Command line: callpeak -t MUT1_sorted.bam -c WT1_sorted.bam -f BAMPE -g hs -n MUT1_WT1 -q 0.05 --nomodel --shift 27 --extsize 54
# ARGUMENTS LIST:
# name = MUT1_WT1
# format = BAMPE
# ChIP-seq file = ['MUT1_sorted.bam']
# control file = ['WT1_sorted.bam']
# effective genome size = 2.70e+09
# band width = 300
# model fold = [5, 50]
# qvalue cutoff = 5.00e-02
# Larger dataset will be scaled towards smaller dataset.
# Range for calculating regional lambda is: 1000 bps and 10000 bps
# Broad region calling is off
# Paired-End mode is on

INFO  @ Thu, 03 May 2018 19:24:36: #1 read fragment files... 
INFO  @ Thu, 03 May 2018 19:24:36: #1 read treatment fragments... 
INFO  @ Thu, 03 May 2018 19:24:47:  1000000 
.......
INFO  @ Thu, 03 May 2018 19:29:35:  26000000 
INFO  @ Thu, 03 May 2018 19:30:19: #1.2 read input fragments... 
INFO  @ Thu, 03 May 2018 19:30:29:  1000000 
.......
INFO  @ Thu, 03 May 2018 19:35:36:  30000000 
INFO  @ Thu, 03 May 2018 19:36:18: #1 mean fragment size is determined as 109 bp from treatment 
INFO  @ Thu, 03 May 2018 19:36:18: #1 note: mean fragment size in control is 140 bp -- value ignored 
INFO  @ Thu, 03 May 2018 19:36:18: #1 fragment size = 109 
INFO  @ Thu, 03 May 2018 19:36:18: #1  total fragments in treatment: 26823210 
INFO  @ Thu, 03 May 2018 19:36:18: #1 user defined the maximum fragments... 
INFO  @ Thu, 03 May 2018 19:36:18: #1 filter out redundant fragments by allowing at most 1 identical fragment(s) 
INFO  @ Thu, 03 May 2018 19:37:28: #1  fragments after filtering in treatment: 26291999 
INFO  @ Thu, 03 May 2018 19:37:28: #1  Redundant rate of treatment: 0.02 
INFO  @ Thu, 03 May 2018 19:37:28: #1  total fragments in control: 30607006 
INFO  @ Thu, 03 May 2018 19:37:28: #1 user defined the maximum fragments... 
INFO  @ Thu, 03 May 2018 19:37:28: #1 filter out redundant fragments by allowing at most 1 identical fragment(s) 
INFO  @ Thu, 03 May 2018 19:38:48: #1  fragments after filtering in control: 30017115 
INFO  @ Thu, 03 May 2018 19:38:48: #1  Redundant rate of control: 0.02 
INFO  @ Thu, 03 May 2018 19:38:48: #1 finished! 
INFO  @ Thu, 03 May 2018 19:38:48: #2 Build Peak Model... 
INFO  @ Thu, 03 May 2018 19:38:48: #2 Skipped... 
INFO  @ Thu, 03 May 2018 19:38:48: #2 Use 109 as fragment length 
INFO  @ Thu, 03 May 2018 19:38:48: #3 Call peaks... 
INFO  @ Thu, 03 May 2018 19:38:48: #3 Pre-compute pvalue-qvalue table... 
INFO  @ Thu, 03 May 2018 19:43:32: #3 Call peaks for each chromosome... 
INFO  @ Thu, 03 May 2018 19:45:51: #4 Write output xls file... MUT1_WT1_peaks.xls 
INFO  @ Thu, 03 May 2018 19:45:51: #4 Write peak in narrowPeak format file... MUT1_WT1_peaks.narrowPeak 
INFO  @ Thu, 03 May 2018 19:45:51: #4 Write summits bed file... MUT1_WT1_summits.bed 
INFO  @ Thu, 03 May 2018 19:45:51: Done!

macs2 DNA-seq paired-end BWA • 4.7k views

ADD COMMENT • link updated 6.6 years ago by igor 13k • written 6.6 years ago by Bioinformatist Newbie ▴ 270

0

Entering edit mode

Could you please look at your two samples in parallel on IGV? Does it look that you are missing many peak? Could you please describe your experimental setup, wild type and mutant in more details?

ADD REPLY • link 6.6 years ago by Petr Ponomarenko ★ 2.8k

0

Entering edit mode

Thank you for replying. I think I am missing many peak regions even if I call peaks without using the control. Please see this IGV Image. As per my understanding, I should observe a peak for this region as many of the reads are mapping in this region but In actual I don't see a narrow peak for this region. Wild type and mutant are brain tissues, where mutant carries a mutation in gene which caused exon skipping of a gene. I think there should not be much difference in chromatin accessibility as that gene is not a TF or pioneer TF but still calling peaks only on WT or MUT bam files should give a reasonable number of peaks not like 2574 only (In case of MUT where control was not used for peak calling).

ADD REPLY • link 6.6 years ago by Bioinformatist Newbie ▴ 270

score 2 · Answer 1 · 2018-05-06

2

Entering edit mode

6.6 years ago

igor 13k

I think there is a problem with your command:

macs2 callpeak -t MUT1_sorted.bam -c WT1_sorted.bam

While mutant is -t/--treatment, your -c/--control should probably be skipped. Both your mutant and WT samples would be considered treatment. MACS was originally designed for standard ChIP-seq where the control sample would be the input or IgG sample.

ADD COMMENT • link 6.6 years ago by igor 13k

0

Entering edit mode

Thank you for your suggestion. By trying this: macs2 callpeak -t MUT1_sorted.bam -f BAMPE -g hs -n MUT1_WT1 -q 0.05 --nomodel --shift 27 --extsize 54 I get only 2574 peaks. I think this number is not reasonable. Let's say if I associate each peak to a gene, even then, only 2574 genes are accessible at the chromatin level. Do you think I can try something else?

ADD REPLY • link 6.6 years ago by Bioinformatist Newbie ▴ 270

0

Entering edit mode

It may indicate a problem with your sample prep. You can change the significance cutoff if you just want to see more peaks. I don't have any experience with DNase-seq, but for ATAC-seq, I've seen many people use unadjusted p-value which will produce a lot more peaks.

ADD REPLY • link 6.6 years ago by igor 13k

0

Entering edit mode

Thank you for replying. Can you please guide me towards any such reference of using unadjusted pvalue. From unadjusted p-value, do you mean something like -p 0.5 or so. So that I get more peak but point in this case would be, are those peaks reliable enough? I have got this data from our collaborators and I do not have any idea about how they prepared library. Can you please confirm if I am not doing something wrong here in terms of the steps I listed in my questions. Thanks a lot!

ADD REPLY • link 6.6 years ago by Bioinformatist Newbie ▴ 270

0

Entering edit mode

Did you get this resolved by using specific parameters? I also see that you are changing the --shift option which is recommended to keep as 0 with format BAMPE. See MACS options. How did you determined the --extsize to 54?

ADD REPLY • link 6.1 years ago by sutturka ▴ 190