15 months ago
DARLOR
Hi, I have a list of peaks in BED format and I would like to remove the peaks that fall within promoters (+/- 500 bp around the TSS). I tried the approach below, but I'm not sure whether it is correct. Can anyone confirm it?
> peaks
GRanges object with 336 ranges and 3 metadata columns:
seqnames ranges strand | gene_name region other_value
<Rle> <IRanges> <Rle> | <character> <character> <numeric>
[1] chr19 25406474-25407399 * | Kank1 chr19-25406474-25407.. 0.992174
[2] chr13 83517922-83518810 * | Mef2c chr13-83517922-83518.. 0.988074
[3] chr14 27058403-27059242 * | Il17rd chr14-27058403-27059.. 0.984493
[4] chr19 25400147-25400902 * | Kank1 chr19-25400147-25400.. 0.982824
[5] chr3 68493632-68494504 * | Schip1 chr3-68493632-68494504 0.982311
... ... ... ... . ... ... ...
[332] chr12 76552016-76552928 * | Plekhg3 chr12-76552016-76552.. 0.503659
[333] chr16 22434324-22435033 * | Etv5 chr16-22434324-22435.. 0.502546
[334] chr2 77064877-77065682 * | Ccdc141 chr2-77064877-77065682 0.500821
[335] chr11 7129366-7130259 * | Adcy1 chr11-7129366-7130259 0.500202
[336] chr14 27122377-27123107 * | Il17rd chr14-27122377-27123.. 0.500010
-------
seqinfo: 19 sequences from an unspecified genome; no seqlengths
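# Data-frame copy of the peaks, used for the final filtering step
peaks_df <- as.data.frame(peaks)
# Promoter windows: TSS +/- 500 bp for mm10
TSS <- promoterRegions("mm10", upstream=500, downstream=500)
TSS_gr <- makeGRangesFromDataFrame(TSS, keep.extra.columns = TRUE)
# Peaks that overlap at least one promoter window
on_promoter <- as.data.frame(subsetByOverlaps(peaks, TSS_gr))
# Drop those peaks by region ID (assumes every region ID is unique)
peaks_noprom <- peaks_df[!peaks_df$region %in% on_promoter$region,]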
Starting from 336 peaks, I obtained 298 that do not fall within promoter regions. Is this correct? Any suggestions?
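
For comparison, the same filter can be written directly on the GRanges objects, without the round trip through a data frame and region IDs. This is only a minimal sketch on toy data: `peaks_toy` and `promoters_toy` are hypothetical stand-ins for `peaks` and `TSS_gr`, and the coordinates are made up. It uses `subsetByOverlaps(..., invert = TRUE)` and `overlapsAny()` from GenomicRanges, and ends with a count check that should also hold for the real data.

```r
library(GenomicRanges)

# Toy peaks standing in for `peaks` (hypothetical coordinates)
peaks_toy <- GRanges(
  seqnames = c("chr1", "chr1", "chr2"),
  ranges   = IRanges(start = c(100, 5000, 300), end = c(600, 5800, 900)),
  region   = c("chr1-100-600", "chr1-5000-5800", "chr2-300-900")
)

# Toy promoter windows standing in for `TSS_gr` (hypothetical coordinates)
promoters_toy <- GRanges(
  seqnames = c("chr1", "chr2"),
  ranges   = IRanges(start = c(500, 2000), end = c(1500, 3000))
)

# Keep only peaks that overlap no promoter window
noprom_a <- subsetByOverlaps(peaks_toy, promoters_toy, invert = TRUE)

# Equivalent form using a logical index
noprom_b <- peaks_toy[!overlapsAny(peaks_toy, promoters_toy)]

# Sanity check: overlapping and non-overlapping peaks add up to the input
n_on <- length(subsetByOverlaps(peaks_toy, promoters_toy))
stopifnot(n_on + length(noprom_a) == length(peaks_toy))
```

Applied to the real objects, the equivalent check would be that `on_promoter` has 336 - 298 = 38 rows. If that holds and the region IDs are unique, the two approaches should select the same peaks.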