Hi,
I'm analyzing my chipseq data and am encountering some issues. I'm new to chipseq so any help is appreciated. Here is my pipeline:
I have triplicates for my treatment and triplicates for control.
- Align filtered & trimmed Illumina fastq (paired end) to ref genome using bowtie.
- Convert SAM file to sorted BAM file using samtools.
- Use MACS2 to call peaks. This was done on Galaxy and I used all standard parameters except for changing lower mfold to 1 because I wanted to make it less specific just to see more peaks.
- The output showed 5 peaks, I converted the Peaks.txt file to csv and used that as input for next step.
- I used the above .cvs file for DiffBind in R, code below:
peaks <- dba.analyze("tS_cM.csv")
The error I'm getting is:
Loading sample sheet... 1 NA raw 2 NA raw 3 NA raw 4 NA raw 5 NA raw 6 NA raw 7 NA raw Error: DBA object has no peaks.
I've also included the txt version of my .csv file
# This file is generated by MACS version 2.2.9.1
#Command line: callpeak -t /data/dnb10/galaxy_db/files/7/5/9/dataset_759c3204-8cd2-4a01-9d1a-625a880c13b9.dat /data/dnb10/galaxy_db/files/f/8/3/dataset_f8366859-d501-4460-b617-6d716216d811.dat /data/dnb10/galaxy_db/files/8/f/a/dataset_8fa3c242-c607-44f0-8cad-6738b0e8f681.dat --name S1_sorted_bam -c /data/dnb10/galaxy_db/files/3/c/b/dataset_3cb5e1c5-e77d-439c-91a0-537f8f4ea857.dat /data/dnb10/galaxy_db/files/c/c/7/dataset_cc784e5b-2cc9-458b-8daf-e226ed2c59fa.dat --format BAMPE --gsize 3700000 --keep-dup 1 --d-min 20 --buffer-size 100000 --bdg --qvalue 0.05 --mfold 1 50 --bw 300
# ARGUMENTS LIST:
# name = S1_sorted_bam
# format = BAMPE
# ChIP-seq file = ['/data/dnb10/galaxy_db/files/7/5/9/dataset_759c3204-8cd2-4a01-9d1a-625a880c13b9.dat', '/data/dnb10/galaxy_db/files/f/8/3/dataset_f8366859-d501-4460-b617-6d716216d811.dat', '/data/dnb10/galaxy_db/files/8/f/a/dataset_8fa3c242-c607-44f0-8cad-6738b0e8f681.dat']
# control file = ['/data/dnb10/galaxy_db/files/3/c/b/dataset_3cb5e1c5-e77d-439c-91a0-537f8f4ea857.dat', '/data/dnb10/galaxy_db/files/c/c/7/dataset_cc784e5b-2cc9-458b-8daf-e226ed2c59fa.dat']
# effective genome size = 3.70e+06
# band width = 300
# model fold = [1, 50]
# qvalue cutoff = 5.00e-02
# The maximum gap between significant sites is assigned as the read length/tag size.
# The minimum length of peaks is assigned as the predicted fragment length "d".
# Larger dataset will be scaled towards smaller dataset.
# Range for calculating regional lambda is: 1000 bps and 10000 bps
# Broad region calling is off
# Paired-End mode is on
# fragment size is determined as 210 bps
# total fragments in treatment: 613205
# fragments after filtering in treatment: 544611
# maximum duplicate fragments in treatment = 1
# Redundant rate in treatment: 0.11
# total fragments in control: 565624
# fragments after filtering in control: 489201
# maximum duplicate fragments in control = 1
# Redundant rate in control: 0.14
# d = 210
chr start end length abs_summit pileup -log10(pvalue) fold_enrichment -log10(qvalue) name
NC_014034.1 513165 513418 254 513402 62 6.11744 1.91471 2.57173 S1_sorted_bam_peak_1
NC_014034.1 3003464 3003706 243 3003490 67 5.96006 1.84696 2.52564 S1_sorted_bam_peak_2
NC_014034.1 3118257 3118497 241 3118334 75 3.97106 1.55513 1.32946 S1_sorted_bam_peak_3
NC_014034.1 3399189 3399404 216 3399353 70 8.24044 2.09495 2.6286 S1_sorted_bam_peak_4
NC_014034.1 3409689 3409913 225 3409745 68 5.66339 1.80183 2.36274 S1_sorted_bam_peak_5
NC_014034.1 3490720 3490940 221 3490816 67 6.15106 1.87075 2.58832 S1_sorted_bam_peak_6
NC_014034.1 3637800 3638077 278 3637887 65 4.68416 1.70072 1.77008 S1_sorted_bam_peak_7
Why isn't DiffBind identifying these peaks? Any help would be greatly appreciated.
No, you added a screenshot of plain text content, which is counterproductive. You can copy paste the content directly here (using the code formatting option shown below), or use a GitHub Gist if the content volume exceeds allowed length here.
Ok fixed!