Question

DiffBind: no peaks in DBA

0

Entering edit mode

14 days ago

yvonneh ▴ 10

Hi,

I'm analyzing my chipseq data and am encountering some issues. I'm new to chipseq so any help is appreciated. Here is my pipeline:

I have triplicates for my treatment and triplicates for control.

Align filtered & trimmed Illumina fastq (paired end) to ref genome using bowtie.
Convert SAM file to sorted BAM file using samtools.
Use MACS2 to call peaks. This was done on Galaxy and I used all standard parameters except for changing lower mfold to 1 because I wanted to make it less specific just to see more peaks.
The output showed 5 peaks, I converted the Peaks.txt file to csv and used that as input for next step.
I used the above .cvs file for DiffBind in R, code below:

peaks <- dba.analyze("tS_cM.csv")

The error I'm getting is:

Loading sample sheet... 1 NA raw 2 NA raw 3 NA raw 4 NA raw 5 NA raw 6 NA raw 7 NA raw Error: DBA object has no peaks.

I've also included the txt version of my .csv file

# This file is generated by MACS version 2.2.9.1
#Command line: callpeak -t /data/dnb10/galaxy_db/files/7/5/9/dataset_759c3204-8cd2-4a01-9d1a-625a880c13b9.dat /data/dnb10/galaxy_db/files/f/8/3/dataset_f8366859-d501-4460-b617-6d716216d811.dat /data/dnb10/galaxy_db/files/8/f/a/dataset_8fa3c242-c607-44f0-8cad-6738b0e8f681.dat --name S1_sorted_bam -c /data/dnb10/galaxy_db/files/3/c/b/dataset_3cb5e1c5-e77d-439c-91a0-537f8f4ea857.dat /data/dnb10/galaxy_db/files/c/c/7/dataset_cc784e5b-2cc9-458b-8daf-e226ed2c59fa.dat --format BAMPE --gsize 3700000 --keep-dup 1 --d-min 20 --buffer-size 100000 --bdg --qvalue 0.05 --mfold 1 50 --bw 300
# ARGUMENTS LIST:
# name = S1_sorted_bam
# format = BAMPE
# ChIP-seq file = ['/data/dnb10/galaxy_db/files/7/5/9/dataset_759c3204-8cd2-4a01-9d1a-625a880c13b9.dat', '/data/dnb10/galaxy_db/files/f/8/3/dataset_f8366859-d501-4460-b617-6d716216d811.dat', '/data/dnb10/galaxy_db/files/8/f/a/dataset_8fa3c242-c607-44f0-8cad-6738b0e8f681.dat']
# control file = ['/data/dnb10/galaxy_db/files/3/c/b/dataset_3cb5e1c5-e77d-439c-91a0-537f8f4ea857.dat', '/data/dnb10/galaxy_db/files/c/c/7/dataset_cc784e5b-2cc9-458b-8daf-e226ed2c59fa.dat']
# effective genome size = 3.70e+06
# band width = 300
# model fold = [1, 50]
# qvalue cutoff = 5.00e-02
# The maximum gap between significant sites is assigned as the read length/tag size.
# The minimum length of peaks is assigned as the predicted fragment length "d".
# Larger dataset will be scaled towards smaller dataset.
# Range for calculating regional lambda is: 1000 bps and 10000 bps
# Broad region calling is off
# Paired-End mode is on

# fragment size is determined as 210 bps
# total fragments in treatment: 613205
# fragments after filtering in treatment: 544611
# maximum duplicate fragments in treatment = 1
# Redundant rate in treatment: 0.11
# total fragments in control: 565624
# fragments after filtering in control: 489201
# maximum duplicate fragments in control = 1
# Redundant rate in control: 0.14
# d = 210
chr start   end length  abs_summit  pileup  -log10(pvalue)  fold_enrichment -log10(qvalue)  name
NC_014034.1 513165  513418  254 513402  62  6.11744 1.91471 2.57173 S1_sorted_bam_peak_1
NC_014034.1 3003464 3003706 243 3003490 67  5.96006 1.84696 2.52564 S1_sorted_bam_peak_2
NC_014034.1 3118257 3118497 241 3118334 75  3.97106 1.55513 1.32946 S1_sorted_bam_peak_3
NC_014034.1 3399189 3399404 216 3399353 70  8.24044 2.09495 2.6286  S1_sorted_bam_peak_4
NC_014034.1 3409689 3409913 225 3409745 68  5.66339 1.80183 2.36274 S1_sorted_bam_peak_5
NC_014034.1 3490720 3490940 221 3490816 67  6.15106 1.87075 2.58832 S1_sorted_bam_peak_6
NC_014034.1 3637800 3638077 278 3637887 65  4.68416 1.70072 1.77008 S1_sorted_bam_peak_7

Why isn't DiffBind identifying these peaks? Any help would be greatly appreciated.

MACS2 DiffBind ChIP-seq • 272 views

ADD COMMENT • link updated 14 days ago by jared.andrews07 ★ 16k • written 14 days ago by yvonneh ▴ 10

0

Entering edit mode

I've also included the txt version of my .csv file

No, you added a screenshot of plain text content, which is counterproductive. You can copy paste the content directly here (using the code formatting option shown below), or use a GitHub Gist if the content volume exceeds allowed length here.

code_formatting

ADD REPLY • link 14 days ago by Ram 43k

1

Entering edit mode

Ok fixed!

ADD REPLY • link 14 days ago by yvonneh ▴ 10

score 0 · Answer 1 · 2024-05-02

Because dba.analyze is not meant to be run directly on a peaks file. Have you read the documentation?

In addition, a ChIP-seq experiment with only 5 peaks called is pretty indicative of worthless data. Assess the data to ensure the experiment actually worked before spending more time on it. Throw your bam in IGV and see if it looks any different than your IgG/Input controls. With only 5 peaks called, my guess is that it doesn't. Peaks should be easily distinguishable by eye in a successful ChIP-seq experiment.