I am trying to run a differential binding analysis with DiffBind on MYC ChIP-seq peaks from a publicly available dataset.
I posted about this before, but no matter what I tried (variations on the manual, ChatGPT, forums, etc.), I could not get DiffBind to work. The dataset I am analyzing is almost identical to the one used at the very beginning of DiffBind's own documentation.
I have four cell lines (MCF7 and T47D, which are ER+, and MDAMB231 and BT549, which are TNBC), each with three replicates. I want to group the cell lines and obtain both cell-line-specific and group-specific (ER+ vs TNBC) results. I tried several metadata files, shown below.
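Because each cell line belongs to exactly one group, Tissue (cell line) and Condition (ER+ vs TNBC) are confounded, so a single design containing both terms is not full rank. The base R sketch below checks this on a toy copy of the sample layout; `design_is_full_rank()` is a hypothetical helper, not part of DiffBind. If I read the DiffBind 3 documentation correctly, this points toward running the group comparison with `dba.contrast(dbObj, design = "~Condition")` and the cell-line comparisons with `design = "~Tissue"` (plus e.g. `contrast = c("Tissue", "MCF7", "T47D")`) as separate analyses rather than one combined design.

```r
# Toy sample sheet mirroring the layout above: four cell lines x three replicates.
sheet <- data.frame(
  Tissue    = rep(c("MCF7", "T47D", "MDAMB231", "BT549"), each = 3),
  Condition = rep(c("ER+", "ER+", "TNBC", "TNBC"), each = 3),
  Replicate = rep(1:3, times = 4),
  stringsAsFactors = TRUE
)

# Hypothetical helper: TRUE if the model matrix for `formula`
# has full column rank on the given sample sheet.
design_is_full_rank <- function(formula, data) {
  mm <- model.matrix(formula, data = data)
  qr(mm)$rank == ncol(mm)
}

design_is_full_rank(~ Condition, sheet)           # TRUE: group-level design
design_is_full_rank(~ Tissue, sheet)              # TRUE: cell-line-level design
design_is_full_rank(~ Tissue + Condition, sheet)  # FALSE: Condition is implied by Tissue
```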
==> formatted_metadata_diffbind_myc.tsv <==
SampleID Tissue Factor Condition Treatment Replicate bamReads ControlID Peaks PeakCaller
M7_1 mcf7_myc_rep1_peaks.narrowPeak macs mcf7_myc_rep1_bowtie_sorted_q20_dupmarked.bam MCF7in
M7_2 mcf7_myc_rep2_peaks.narrowPeak macs mcf7_myc_rep2_bowtie_sorted_q20_dupmarked.bam MCF7in
M7_3 mcf7_myc_rep3_peaks.narrowPeak macs mcf7_myc_rep3_bowtie_sorted_q20_dupmarked.bam MCF7in
T4_1 t47d_myc_rep1_peaks.narrowPeak macs t47d_myc_rep1_bowtie_sorted_q20_dupmarked.bam T47Din
T4_2 t47d_myc_rep2_peaks.narrowPeak macs t47d_myc_rep2_bowtie_sorted_q20_dupmarked.bam T47Din
T4_3 t47d_myc_rep3_peaks.narrowPeak macs t47d_myc_rep3_bowtie_sorted_q20_dupmarked.bam T47Din
M231_1 mdamb231_myc_rep1_peaks.narrowPeakn macs mdamb231_myc_rep1_bowtie_sorted_q20_dupmarked.bam MDAMB231in
M231_2 mdamb231_myc_rep2_peaks.narrowPeakn macs mdamb231_myc_rep2_bowtie_sorted_q20_dupmarked.bam MDAMB231in
M231_3 mdamb231_myc_rep3_peaks.narrowPeakn macs mdamb231_myc_rep3_bowtie_sorted_q20_dupmarked.bam MDAMB231in
BT_1 bt549_myc_rep1_peaks.narrowPeak macs bt549_myc_rep1_bowtie_sorted_q20_dupmarked.bam BT549in
BT_2 bt549_myc_rep2_peaks.narrowPeak macs bt549_myc_rep2_bowtie_sorted_q20_dupmarked.bam BT549in
BT_3 BT549 MYC TNBC Non 3 bt549_myc_rep3_bowtie_sorted_q20_dupmarked.bam BT549in bt549_myc_rep3_peaks.narrowPeak macs
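In the TSV listing above, the rows M7_1 through BT_2 show only five values (SampleID, Peaks, PeakCaller, bamReads, ControlID, in that order), while the header and the BT_3 row have ten, and the MDAMB231 rows end in `.narrowPeakn`. This sheet also has a ControlID column but no bamControl column. If the file on disk really looks like this, DiffBind would be reading values under the wrong columns. A quick base R check, assuming the file is tab-delimited as its `.tsv` extension suggests; both helpers are hypothetical:

```r
# Hypothetical helper: report records whose field count differs from the header's.
# Blank lines are skipped by count.fields(), so `row` counts non-blank lines
# with the header as row 1.
check_field_counts <- function(path, sep = "\t") {
  n <- count.fields(path, sep = sep, quote = "\"", comment.char = "")
  bad <- which(n != n[1])
  data.frame(row = bad, fields = n[bad], expected = rep(n[1], length(bad)))
}

# Hypothetical helper: return Peaks entries that do not end in ".narrowPeak".
check_peak_suffix <- function(peaks) {
  peaks[!grepl("\\.narrowPeak$", peaks)]
}
```

For example, `check_field_counts("formatted_metadata_diffbind_myc.tsv")` should list every row whose field count does not match the header.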
==> metadata_diffbind_fixed.csv <==
"SampleID","Condition","Replicate","Peaks","bamReads","bamControl"
"M7_1","ER+",1,"mcf7_myc_rep1_peaks.narrowPeak","mcf7_myc_rep1_bowtie_sorted_q20_dupmarked.bam","mcf7_input_bowtie_sorted_q20_dupmarked.bam"
"M7_2","ER+",2,"mcf7_myc_rep2_peaks.narrowPeak","mcf7_myc_rep2_bowtie_sorted_q20_dupmarked.bam","mcf7_input_bowtie_sorted_q20_dupmarked.bam"
"M7_3","ER+",3,"mcf7_myc_rep3_peaks.narrowPeak","mcf7_myc_rep3_bowtie_sorted_q20_dupmarked.bam","mcf7_input_bowtie_sorted_q20_dupmarked.bam"
"T4_1","ER+",1,"t47d_myc_rep1_peaks.narrowPeak","t47d_myc_rep1_bowtie_sorted_q20_dupmarked.bam","t47d_input_bowtie_sorted_q20_dupmarked.bam"
"T4_2","ER+",2,"t47d_myc_rep2_peaks.narrowPeak","t47d_myc_rep2_bowtie_sorted_q20_dupmarked.bam","t47d_input_bowtie_sorted_q20_dupmarked.bam"
"T4_3","ER+",3,"t47d_myc_rep3_peaks.narrowPeak","t47d_myc_rep3_bowtie_sorted_q20_dupmarked.bam","t47d_input_bowtie_sorted_q20_dupmarked.bam"
"M231_1","TNBC",1,"mdamb231_myc_rep1_peaks.narrowPeak","mdamb231_myc_rep1_bowtie_sorted_q20_dupmarked.bam","mdamb231_input_bowtie_sorted_q20_dupmarked.bam"
"M231_2","TNBC",2,"mdamb231_myc_rep2_peaks.narrowPeak","mdamb231_myc_rep2_bowtie_sorted_q20_dupmarked.bam","mdamb231_input_bowtie_sorted_q20_dupmarked.bam"
"M231_3","TNBC",3,"mdamb231_myc_rep3_peaks.narrowPeak","mdamb231_myc_rep3_bowtie_sorted_q20_dupmarked.bam","mdamb231_input_bowtie_sorted_q20_dupmarked.bam"
"BT_1","TNBC",1,"bt549_myc_rep1_peaks.narrowPeak","bt549_myc_rep1_bowtie_sorted_q20_dupmarked.bam","bt549_input_bowtie_sorted_q20_dupmarked.bam"
"BT_2","TNBC",2,"bt549_myc_rep2_peaks.narrowPeak","bt549_myc_rep2_bowtie_sorted_q20_dupmarked.bam","bt549_input_bowtie_sorted_q20_dupmarked.bam"
"BT_3","TNBC",3,"bt549_myc_rep3_peaks.narrowPeak","bt549_myc_rep3_bowtie_sorted_q20_dupmarked.bam","bt549_input_bowtie_sorted_q20_dupmarked.bam"
==> metadata_diffbind_myc.csv <==
SampleID,Tissue,Condition,Replicate,Factor,Treatment,PeakCaller,Peaks,bamReads,bamControl,ControlID
M7_1,MCF7,ER+,1,MYC,Non,macs,mcf7_myc_rep1_peaks.narrowPeak,mcf7_myc_rep1_bowtie_sorted_q20_dupmarked.bam,mcf7_input_bowtie_sorted_q20_dupmarked.bam,MCF7in
M7_2,MCF7,ER+,2,MYC,Non,macs,mcf7_myc_rep2_peaks.narrowPeak,mcf7_myc_rep2_bowtie_sorted_q20_dupmarked.bam,mcf7_input_bowtie_sorted_q20_dupmarked.bam,MCF7in
M7_3,MCF7,ER+,3,MYC,Non,macs,mcf7_myc_rep3_peaks.narrowPeak,mcf7_myc_rep3_bowtie_sorted_q20_dupmarked.bam,mcf7_input_bowtie_sorted_q20_dupmarked.bam,MCF7in
T4_1,T47D,ER+,1,MYC,Non,macs,t47d_myc_rep1_peaks.narrowPeak,t47d_myc_rep1_bowtie_sorted_q20_dupmarked.bam,t47d_input_bowtie_sorted_q20_dupmarked.bam,T47Din
T4_2,T47D,ER+,2,MYC,Non,macs,t47d_myc_rep2_peaks.narrowPeak,t47d_myc_rep2_bowtie_sorted_q20_dupmarked.bam,t47d_input_bowtie_sorted_q20_dupmarked.bam,T47Din
T4_3,T47D,ER+,3,MYC,Non,macs,t47d_myc_rep3_peaks.narrowPeak,t47d_myc_rep3_bowtie_sorted_q20_dupmarked.bam,t47d_input_bowtie_sorted_q20_dupmarked.bam,T47Din
M231_1,MDAMB231,TNBC,1,MYC,Non,macs,mdamb231_myc_rep1_peaks.narrowPeak,mdamb231_myc_rep1_bowtie_sorted_q20_dupmarked.bam,mdamb231_input_bowtie_sorted_q20_dupmarked.bam,MDAMB231in
M231_2,MDAMB231,TNBC,2,MYC,Non,macs,mdamb231_myc_rep2_peaks.narrowPeak,mdamb231_myc_rep2_bowtie_sorted_q20_dupmarked.bam,mdamb231_input_bowtie_sorted_q20_dupmarked.bam,MDAMB231in
M231_3,MDAMB231,TNBC,3,MYC,Non,macs,mdamb231_myc_rep3_peaks.narrowPeak,mdamb231_myc_rep3_bowtie_sorted_q20_dupmarked.bam,mdamb231_input_bowtie_sorted_q20_dupmarked.bam,MDAMB231in
BT_1,BT549,TNBC,1,MYC,Non,macs,bt549_myc_rep1_peaks.narrowPeak,bt549_myc_rep1_bowtie_sorted_q20_dupmarked.bam,bt549_input_bowtie_sorted_q20_dupmarked.bam,BT549in
BT_2,BT549,TNBC,2,MYC,Non,macs,bt549_myc_rep2_peaks.narrowPeak,bt549_myc_rep2_bowtie_sorted_q20_dupmarked.bam,bt549_input_bowtie_sorted_q20_dupmarked.bam,BT549in
BT_3,BT549,TNBC,3,MYC,Non,macs,bt549_myc_rep3_peaks.narrowPeak,bt549_myc_rep3_bowtie_sorted_q20_dupmarked.bam,bt549_input_bowtie_sorted_q20_dupmarked.bam,BT549in
==> myc_metadata.csv <==
SampleID,Tissue,Factor,Condition,Treatment,Replicate,bamReads,ControlID,bamControl,Peaks,PeakCaller
M7_1,MCF7,MYC,ER+,Non,1,mcf7_myc_rep1_bowtie_sorted_q20_dupmarked.bam,MCF7in,mcf7_input_bowtie_sorted_q20_dupmarked.bam,mcf7_myc_rep1_peaks.narrowPeak,raw
M7_2,MCF7,MYC,ER+,Non,2,mcf7_myc_rep2_bowtie_sorted_q20_dupmarked.bam,MCF7in,mcf7_input_bowtie_sorted_q20_dupmarked.bam,mcf7_myc_rep2_peaks.narrowPeak,raw
M7_3,MCF7,MYC,ER+,Non,3,mcf7_myc_rep3_bowtie_sorted_q20_dupmarked.bam,MCF7in,mcf7_input_bowtie_sorted_q20_dupmarked.bam,mcf7_myc_rep3_peaks.narrowPeak,raw
T4_1,T47D,MYC,ER+,Non,1,t47d_myc_rep1_bowtie_sorted_q20_dupmarked.bam,T47Din,t47d_input_bowtie_sorted_q20_dupmarked.bam,t47d_myc_rep1_peaks.narrowPeak,raw
T4_2,T47D,MYC,ER+,Non,2,t47d_myc_rep2_bowtie_sorted_q20_dupmarked.bam,T47Din,t47d_input_bowtie_sorted_q20_dupmarked.bam,t47d_myc_rep2_peaks.narrowPeak,raw
T4_3,T47D,MYC,ER+,Non,3,t47d_myc_rep3_bowtie_sorted_q20_dupmarked.bam,T47Din,t47d_input_bowtie_sorted_q20_dupmarked.bam,t47d_myc_rep3_peaks.narrowPeak,raw
M231_1,MDAMB231,MYC,TNBC,Non,1,mdamb231_myc_rep1_bowtie_sorted_q20_dupmarked.bam,MDAMB231in,mdamb231_input_bowtie_sorted_q20_dupmarked.bam,mdamb231_myc_rep1_peaks.narrowPeak,raw
M231_2,MDAMB231,MYC,TNBC,Non,2,mdamb231_myc_rep2_bowtie_sorted_q20_dupmarked.bam,MDAMB231in,mdamb231_input_bowtie_sorted_q20_dupmarked.bam,mdamb231_myc_rep2_peaks.narrowPeak,raw
M231_3,MDAMB231,MYC,TNBC,Non,3,mdamb231_myc_rep3_bowtie_sorted_q20_dupmarked.bam,MDAMB231in,mdamb231_input_bowtie_sorted_q20_dupmarked.bam,mdamb231_myc_rep3_peaks.narrowPeak,raw
BT_1,BT549,MYC,TNBC,Non,1,bt549_myc_rep1_bowtie_sorted_q20_dupmarked.bam,BT549in,bt549_input_bowtie_sorted_q20_dupmarked.bam,bt549_myc_rep1_peaks.narrowPeak,raw
BT_2,BT549,MYC,TNBC,Non,2,bt549_myc_rep2_bowtie_sorted_q20_dupmarked.bam,BT549in,bt549_input_bowtie_sorted_q20_dupmarked.bam,bt549_myc_rep2_peaks.narrowPeak,raw
BT_3,BT549,MYC,TNBC,Non,3,bt549_myc_rep3_bowtie_sorted_q20_dupmarked.bam,BT549in,bt549_input_bowtie_sorted_q20_dupmarked.bam,bt549_myc_rep3_peaks.narrowPeak,raw
I also tried a version with full (absolute) paths.
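Whichever sheet is used, it may help to confirm before calling `dba()` that the key columns are present and that every path in Peaks, bamReads and bamControl resolves from R's current working directory. As I read the DiffBind documentation, when there is no PeakFormat column the PeakCaller value also determines how the peak file is parsed: `macs` is listed as MACS `.xls` output and `raw` as a text file with the score in the fourth column, while `narrow` corresponds to narrowPeak files, so the sketch also flags narrowPeak rows whose PeakCaller is not `narrow`. `validate_sheet()` is a hypothetical helper, and the default set of required columns is my assumption of a minimal set.

```r
# Hypothetical helper: sanity-check a DiffBind-style sample sheet (CSV).
validate_sheet <- function(path, required = c("SampleID", "Peaks", "bamReads")) {
  # check.names = FALSE keeps column names exactly as written in the file
  sheet <- read.csv(path, stringsAsFactors = FALSE, check.names = FALSE)

  missing_cols <- setdiff(required, names(sheet))

  # Gather paths from whichever file columns exist and keep those not found on disk
  path_cols <- intersect(c("Peaks", "bamReads", "bamControl"), names(sheet))
  missing_files <- as.character(unique(unlist(lapply(path_cols, function(col) {
    paths <- sheet[[col]]
    paths[!file.exists(paths)]
  }))))

  # Row numbers of narrowPeak files paired with a PeakCaller other than "narrow"
  caller_mismatch <- integer(0)
  if (all(c("Peaks", "PeakCaller") %in% names(sheet))) {
    is_np <- grepl("\\.narrowPeak$", sheet$Peaks)
    caller_mismatch <- which(is_np & sheet$PeakCaller != "narrow")
  }

  list(missing_cols = missing_cols,
       missing_files = missing_files,
       caller_mismatch = caller_mismatch)
}
```

For instance, `validate_sheet("metadata_diffbind_myc.csv")` run from the directory containing the BAM and peak files would return empty results for all three checks if everything is consistent.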
I got as far as the steps before normalization, but I cannot get past normalizing. I am getting errors such as this one:
dbObj <- dba.normalize(dbObj)
Error in sum(sapply(pv$peaks, nrow)) : invalid argument 'type' (list)
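One plausible reading of this error (an assumption, not something confirmed against the DiffBind source): `sapply(pv$peaks, nrow)` returns a list instead of a numeric vector only when `nrow()` gives `NULL` for at least one element, for example a peakset that is `NULL` because its file could not be read, and `sum()` refuses a list. The base R sketch below reproduces that behaviour and adds a hypothetical helper, `peak_file_summary()`, that reads each peak file directly so a missing, empty or oddly shaped file stands out.

```r
# Reproduce the failure mode: nrow(NULL) is NULL, so sapply() cannot
# simplify to a vector and returns a list, which sum() rejects.
peaks <- list(data.frame(chr = "chr1", start = 1, end = 10), NULL)
counts <- sapply(peaks, nrow)
is.list(counts)  # TRUE
msg <- tryCatch(sum(counts), error = function(e) conditionMessage(e))

# Hypothetical helper: summarise each peak file (existence, size, rows, columns).
# narrowPeak files are tab-separated, headerless, with 10 columns.
peak_file_summary <- function(paths) {
  do.call(rbind, lapply(paths, function(p) {
    found <- file.exists(p)
    size <- if (found) file.size(p) else NA
    tab <- if (found && size > 0) {
      tryCatch(read.delim(p, header = FALSE), error = function(e) NULL)
    } else NULL
    data.frame(file = p, exists = found, bytes = size,
               rows = if (is.null(tab)) NA else nrow(tab),
               cols = if (is.null(tab)) NA else ncol(tab))
  }))
}
```

Passing the Peaks column of a sheet, e.g. `peak_file_summary(read.csv("metadata_diffbind_myc.csv")$Peaks)`, would show at a glance which files are missing, empty, or not 10 columns wide.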
I am really desperate. If you can help, I would be more than grateful. Thank you.
Thank you so much. Which metadata format do you think I should proceed with?
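On the format question: myc_metadata.csv and metadata_diffbind_myc.csv both use the same columns as the example sample sheet in the DiffBind vignette (SampleID, Tissue, Factor, Condition, Treatment, Replicate, bamReads, ControlID, bamControl, Peaks, PeakCaller), so one option is to regenerate a single clean sheet from the file-naming pattern used throughout this post instead of editing files by hand. `make_sheet()` is a hypothetical helper, and the default `peak_caller = "narrow"` is an assumption based on my reading of the DiffBind documentation, where `narrow` denotes narrowPeak files.

```r
# Hypothetical helper: build a DiffBind-style sample sheet from the
# file-naming pattern used in this post (one row per cell line x replicate).
make_sheet <- function(peak_caller = "narrow", reps = 1:3) {
  lines <- data.frame(
    prefix    = c("mcf7", "t47d", "mdamb231", "bt549"),
    id        = c("M7", "T4", "M231", "BT"),
    Tissue    = c("MCF7", "T47D", "MDAMB231", "BT549"),
    Condition = c("ER+", "ER+", "TNBC", "TNBC"),
    stringsAsFactors = FALSE
  )
  idx <- rep(seq_len(nrow(lines)), each = length(reps))
  rep_no <- rep(reps, times = nrow(lines))
  cl <- lines[idx, ]
  data.frame(
    SampleID   = paste0(cl$id, "_", rep_no),
    Tissue     = cl$Tissue,
    Factor     = "MYC",
    Condition  = cl$Condition,
    Treatment  = "Non",
    Replicate  = rep_no,
    bamReads   = sprintf("%s_myc_rep%d_bowtie_sorted_q20_dupmarked.bam", cl$prefix, rep_no),
    ControlID  = paste0(cl$Tissue, "in"),
    bamControl = sprintf("%s_input_bowtie_sorted_q20_dupmarked.bam", cl$prefix),
    Peaks      = sprintf("%s_myc_rep%d_peaks.narrowPeak", cl$prefix, rep_no),
    PeakCaller = peak_caller,
    stringsAsFactors = FALSE
  )
}

sheet <- make_sheet()
```

The result could then be written with `write.csv(sheet, "myc_samplesheet.csv", row.names = FALSE)` (a hypothetical file name) and loaded with `dba(sampleSheet = "myc_samplesheet.csv")`, run from the directory that holds the BAM and peak files.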