I have a couple of CUT&RUN samples with control and treatment conditions. This is my first time analyzing this type of data, so I need a little bit of help. I need to compare the peaks between control and treatment. To this end:
I need to compare the number of peaks in control and treatment samples, which I have no idea how to compute.
Then I think I need to apply peak calling using 'seacr', but I don't know how should be the input and how the output will look like, especially in case that I need to compare between samples. According to the help page they should be something like below:
bedtools bamtobed -bedpe -i $sample.bam > $sample.bed
awk '$1==$4 && $6-$2 < 1000 {print $0}' $sample.bed > $sample.clean.bed
cut -f 1,2,6 $sample.clean.bed | sort -k1,1 -k2,2n -k3,3n > $sample.fragments.bed
bedtools genomecov -bg -i $sample.fragments.bed -g my.genome > $sample.fragments.bedgraph
<output prefix>.stringent.bed OR <output prefix>.relaxed.bed (BED file of enriched regions)
bash SEACR_1.3.sh target.bedgraph IgG.bedgraph norm stringent output
I have no idea what should be exactly the input to compare control and treatment samples and how to use the output to draw heatmap plots. So I tried to skip this step and do as follows but I am going on the wrong direction and I need desperate help to fix this:
I used bamCompare
function from deeptools
to compare the control and treatment bam files and convert the result to a single bigwig file:
bamCompare -b1 treatment_sorted.bam -b2 control_sorted.bam -o control.vs.treatment.bw --scaleFactorsMethod None --normalizeUsing RPKM --blackListFileName ENCFF356LFX.bed.gz
Then I used the bamCompare
output, to draw heatmap using control and treatment bed files using computeMatrix
and PlotHeatmap
:
computeMatrix scale-regions -S control.vs.treatment.bw -R control_sorted.bed treatment_sorted.bed -b 1000 --blackListFileName ENCFF356LFX.bed.gz --outFileName MAt.mat.gz
plotHeatmap -m MAt.mat.gz -o plotHeatmap.MAt.mat.gz.png --colorList blue,darkred
However I have no idea how to perform peak calling and where to use it in my situation. Any idea would be appreciated.
from
SEACR
githubREADME.md
Your first code chunk example is detailing how to do this.
But before you can do this step you need to align your reads to your reference genome to generate
sample.bam
.So your workflow would be something like: fastq QC --> align reads --> convert BAM to bedgraph --> call peaks with SEACR