Hello,
I am a computer science student working with biomedical data, learning and applying bioinformatics techniques.
I have ChIP-seq data generated by my lab for the histone modification H3K27me3, from three different samples. I ran the whole pipeline and was able to call peaks using MACS2. The peak counts in the results are as expected: the sample expected to show the highest number of peaks has the most, and the one that should show fewer has fewer.
But when I do the downstream analysis using ChIPseeker, following the vignette, the frequency plot shows an almost similar number of peaks, whereas the Excel peak files in the MACS2 results contain 194577 and 48752 peaks respectively. The code that I am using for ChIPseeker is:
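library(ChIPseeker)
# txdb (the annotation object) and files (the peak files) are defined earlier in my script
promoter <- getPromoters(TxDb=txdb, upstream=3000, downstream=3000)
tagMatrixList <- lapply(files, getTagMatrix, windows=promoter)
plotAvgProf(tagMatrixList, xlim=c(-3000, 3000), conf=0.95, resample=500, facet="row")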
The MACS2 command that I am using is:
macs2 callpeak -t treatedfle.bam -c inputfile.bam --gsize 3.0e9 --bdg --broad --broad-cutoff 0.1 --nomodel --extsize 125 --name treated_ --outdir /home/labs/chip/
Any help or suggestions are welcome.
Hi,
How do you read the number of peaks from the frequency plot / average profile?
Could you show the plot?
This is the peak count frequency plot that I am referring to. The first sample is supposed to have the fewest peaks, but its frequency is the highest. Am I interpreting it wrong?
The y-axis is peak count frequency, not the number of peaks. I tried to understand it by reading the R script of plotAvgProf. The peak count frequency is calculated for each sample separately (using an apply function), and the per-sample results are then collated using the facet. The profile plot is telling you the distribution of peaks around the TSS (+/- 1 kb), with y-axis limits ranging up to 0.0007. If any of your samples has peaks with a higher frequency at any given genomic position, the scale, i.e. the y-limit, will increase. Also, the peak numbers you mentioned are not low, even though they differ a lot from each other. So when the frequency is calculated, you can end up with a y-axis range that is about the same for every sample.
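A minimal sketch of the effect, assuming the frequency at each position is that position's count divided by the sample's own total. The numbers are made up, and awk only stands in for the per-sample calculation; this is not the plotAvgProf code itself. Two toy samples with a 4-fold difference in coverage end up with the same frequency profile:

```shell
#!/bin/sh
# Toy per-position peak coverage around the TSS (made-up numbers).
# sample_b has exactly 4x the coverage of sample_a at every position.
sample_a="1 2 4 2 1"
sample_b="4 8 16 8 4"

# Divide each position by the sample's own total (assumed normalization).
normalize() {
  echo "$1" | LC_ALL=C awk '{
    total = 0
    for (i = 1; i <= NF; i++) total += $i
    for (i = 1; i <= NF; i++) printf "%s%.2f", (i > 1 ? " " : ""), $i / total
    printf "\n"
  }'
}

freq_a=$(normalize "$sample_a")
freq_b=$(normalize "$sample_b")

echo "sample_a: $freq_a"   # sample_a: 0.10 0.20 0.40 0.20 0.10
echo "sample_b: $freq_b"   # sample_b: 0.10 0.20 0.40 0.20 0.10
```

So the height of the curve reflects where the peaks sit relative to the TSS within each sample, not how many peaks the sample has in total.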
Thank you so much, Ankit, for your detailed answer.
Hello, I have another query related to this. The data that I am analysing was analysed previously as well, and those results show a dip at the TSS region, whereas mine show the highest signal at the TSS. Can you suggest what the reason could be? I would really appreciate your response.
What is your txdb?
Can you explain your experiment in detail?