I have a paired-end DNAseq data for wild-type and mutant conditions. I aligned the reads with GRCh38 genome by using BWA mem and wanted to call peaks from paired BAM file by using macs2 with -f BAMPE
, but I am getting very few number of peaks (80 peaks with q-value 0.05) and I am unable to understand why. Can anybody here please help me if I am doing something wrong here?
Aligning the reads with BWA-mem by:
bwa mem -t 16 genome.fa Read1.fastq Read2.fastq > aln_pe.sam
Converted this sam to bam by:
samtools view -bS aln_pe.sam > aln_pe.bam
Sorted the bam by:
samtools sort aln_pe.bam -o aln_pe_sorted.bam
Alignment stats by:
samtools flagstat aln_pe_sorted.bam
samtools flagstat output for wild-type:
63043224 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
150278 + 0 supplementary
0 + 0 duplicates
62528448 + 0 mapped (99.18% : N/A)
62892946 + 0 paired in sequencing
31446473 + 0 read1
31446473 + 0 read2
61214012 + 0 properly paired (97.33% : N/A)
62062360 + 0 with itself and mate mapped
315810 + 0 singletons (0.50% : N/A)
654324 + 0 with mate mapped to a different chr
457479 + 0 with mate mapped to a different chr (mapQ>=5)
samtools flagstat output for mutant:
59028279 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
145735 + 0 supplementary
0 + 0 duplicates
56304928 + 0 mapped (95.39% : N/A)
58882544 + 0 paired in sequencing
29441272 + 0 read1
29441272 + 0 read2
53646420 + 0 properly paired (91.11% : N/A)
54564956 + 0 with itself and mate mapped
1594237 + 0 singletons (2.71% : N/A)
784174 + 0 with mate mapped to a different chr
585883 + 0 with mate mapped to a different chr (mapQ>=5)
Now when I use macs2 for calling peaks I get very few peaks and I am unable to understand if that is because of my samples or I am doing something wrong. I have tried different ways, the most peaks which I was able to get were 80 by trying this:
macs2 callpeak -t MUT1_sorted.bam -c WT1_sorted.bam -f BAMPE -g hs -n MUT1_WT1 -q 0.05 --nomodel --shift 27 --extsize 54
Can anybody help me and point out what is wrong here? macs output:
# Command line: callpeak -t MUT1_sorted.bam -c WT1_sorted.bam -f BAMPE -g hs -n MUT1_WT1 -q 0.05 --nomodel --shift 27 --extsize 54
# ARGUMENTS LIST:
# name = MUT1_WT1
# format = BAMPE
# ChIP-seq file = ['MUT1_sorted.bam']
# control file = ['WT1_sorted.bam']
# effective genome size = 2.70e+09
# band width = 300
# model fold = [5, 50]
# qvalue cutoff = 5.00e-02
# Larger dataset will be scaled towards smaller dataset.
# Range for calculating regional lambda is: 1000 bps and 10000 bps
# Broad region calling is off
# Paired-End mode is on
INFO @ Thu, 03 May 2018 19:24:36: #1 read fragment files...
INFO @ Thu, 03 May 2018 19:24:36: #1 read treatment fragments...
INFO @ Thu, 03 May 2018 19:24:47: 1000000
.......
INFO @ Thu, 03 May 2018 19:29:35: 26000000
INFO @ Thu, 03 May 2018 19:30:19: #1.2 read input fragments...
INFO @ Thu, 03 May 2018 19:30:29: 1000000
.......
INFO @ Thu, 03 May 2018 19:35:36: 30000000
INFO @ Thu, 03 May 2018 19:36:18: #1 mean fragment size is determined as 109 bp from treatment
INFO @ Thu, 03 May 2018 19:36:18: #1 note: mean fragment size in control is 140 bp -- value ignored
INFO @ Thu, 03 May 2018 19:36:18: #1 fragment size = 109
INFO @ Thu, 03 May 2018 19:36:18: #1 total fragments in treatment: 26823210
INFO @ Thu, 03 May 2018 19:36:18: #1 user defined the maximum fragments...
INFO @ Thu, 03 May 2018 19:36:18: #1 filter out redundant fragments by allowing at most 1 identical fragment(s)
INFO @ Thu, 03 May 2018 19:37:28: #1 fragments after filtering in treatment: 26291999
INFO @ Thu, 03 May 2018 19:37:28: #1 Redundant rate of treatment: 0.02
INFO @ Thu, 03 May 2018 19:37:28: #1 total fragments in control: 30607006
INFO @ Thu, 03 May 2018 19:37:28: #1 user defined the maximum fragments...
INFO @ Thu, 03 May 2018 19:37:28: #1 filter out redundant fragments by allowing at most 1 identical fragment(s)
INFO @ Thu, 03 May 2018 19:38:48: #1 fragments after filtering in control: 30017115
INFO @ Thu, 03 May 2018 19:38:48: #1 Redundant rate of control: 0.02
INFO @ Thu, 03 May 2018 19:38:48: #1 finished!
INFO @ Thu, 03 May 2018 19:38:48: #2 Build Peak Model...
INFO @ Thu, 03 May 2018 19:38:48: #2 Skipped...
INFO @ Thu, 03 May 2018 19:38:48: #2 Use 109 as fragment length
INFO @ Thu, 03 May 2018 19:38:48: #3 Call peaks...
INFO @ Thu, 03 May 2018 19:38:48: #3 Pre-compute pvalue-qvalue table...
INFO @ Thu, 03 May 2018 19:43:32: #3 Call peaks for each chromosome...
INFO @ Thu, 03 May 2018 19:45:51: #4 Write output xls file... MUT1_WT1_peaks.xls
INFO @ Thu, 03 May 2018 19:45:51: #4 Write peak in narrowPeak format file... MUT1_WT1_peaks.narrowPeak
INFO @ Thu, 03 May 2018 19:45:51: #4 Write summits bed file... MUT1_WT1_summits.bed
INFO @ Thu, 03 May 2018 19:45:51: Done!
Could you please look at your two samples in parallel on IGV? Does it look that you are missing many peak? Could you please describe your experimental setup, wild type and mutant in more details?
Thank you for replying. I think I am missing many peak regions even if I call peaks without using the control. Please see this IGV Image. As per my understanding, I should observe a peak for this region as many of the reads are mapping in this region but In actual I don't see a narrow peak for this region. Wild type and mutant are brain tissues, where mutant carries a mutation in gene which caused exon skipping of a gene. I think there should not be much difference in chromatin accessibility as that gene is not a TF or pioneer TF but still calling peaks only on WT or MUT bam files should give a reasonable number of peaks not like 2574 only (In case of MUT where control was not used for peak calling).