Hi,
I have a list of ChIP-seq peak regions and I am curious about those peaks overlapping annotated promoter regions. Inspired by this tutorial
(more specifically, sections 5.3 and 5.4), I have come up with two approaches using two different findOverlaps methods (Genomic Alignments R package) that leave me with slightly different and thus confusing results:
Approach 1: Using subsetByOverlaps()
# retrieving promoter regions
promoters <- promoters(genes, upstream=2000, downstream=200)
# retrieving peaks that overlap a promoter region
peaksProm <- subsetByOverlaps(peaks, promoters) # 1695 ranges
# retrieving genes that have a peak overlapping their promoter region
genesWithPromPeaks <- subsetByOverlaps(promoters, peaks) # 2231 ranges
Approach 2: Using findOverlaps()
# retrieving promoter regions
promoters <- promoters(genes, upstream=2000, downstream=200)
PromPeaks <- findOverlaps(peaks, prom) # 2246 hits
# retrieving peaks that overlap a promoter region
peaksProm <- peaks[queryHits(PromPeaks)] # 2246 ranges
# retrieving genes that have a peak overlapping their promoter region
genesWithPromPeaks <- genes[subjectHits(PromPeaks)] # 2246 ranges
Approach 2 gives the same number of genes and peaks, which is what I would expect. I am confused by the different results between the two approaches, but also by the different number of genes and peaks retrieved with approach 1. I was expecting both functions to yield the same results, but I might be missing something here. I would be thankful for any input and help!