Hi everyone, first post here!
I'm currently working on a ChIPseq data set enriching for H3K27me3 between four Arabidopsis genotypes.
My current problem is at the annotation step using annotatepeak from ChIPseeker. Because the peaks I have called (using MACS2) are broad, they cover multiple loci in the genome, but only become annotated with the closest TSS instead of say, three different loci. As a result, identifying differentially methylated regions between my WT and mutants is quite error prone when I look at the bed files in IGV. I've tried to show an example in the image attached.
This is the line of code I use in R:
WT_consensus_anno <- annotatePeak(WT_covered_granges, tssRegion=c(-3000, 3000),
TxDb=txdb, annoDb="org.At.tair.db")
Where txdb is extracted from the TxDb.Athaliana.BioMart.plantsmart28 package, and WT_covered_granges is a GRanges object containing the called peaks from MACS2.
Any help would be much appreciated, I've been puzzling over this for a few days and searched everywhere for answers.
If I've left out any information please let me know
Thanks a lot!
One thing that ChIPseeker has is the flank option. See here. It defines a flanking region around the peaks, and reports the genes that overlap the flanking region.
I am not 100% sure how is the output defined. What I did was bypassing the ChIPseeker. I needed to find the distances from the TSS only and not annotate other elements. So I extracted the TSS from the GTF file. Then with bedtools window, I have reported the TSS which overlapped my peaks file and calculated the distance from it using awk.