Hello all,
I'm having a bit of trouble with bioconductor. I have several different GRange objects (peak files from MACS, specifically) that I'd like to merge by performing a left join. Examples files look like this:
> head(c8.grange)
GRanges object with 6 ranges and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
Grouped_C8_NoExt_WithD_cond1_1 1 [ 2215700, 2216098] *
Grouped_C8_NoExt_WithD_cond1_2 1 [ 2309602, 2309838] *
Grouped_C8_NoExt_WithD_cond1_3 1 [ 9958346, 9958709] *
Grouped_C8_NoExt_WithD_cond1_4 1 [18401454, 18401797] *
Grouped_C8_NoExt_WithD_cond1_5 1 [22603688, 22603920] *
Grouped_C8_NoExt_WithD_cond1_6 1 [23866394, 23866612] *
-------
seqinfo: 29 sequences from an unspecified genome; no seqlengths
> head(v0.grange)
GRanges object with 6 ranges and 2 metadata columns:
seqnames ranges strand | score fold
<Rle> <IRanges> <Rle> | <integer> <numeric>
V0_Peaks_NoExt_peak_1 1 [11160, 11408] * | 91 4.31469
V0_Peaks_NoExt_peak_2 1 [13468, 13668] * | 64 3.71956
V0_Peaks_NoExt_peak_3 1 [16855, 17058] * | 42 3.12443
V0_Peaks_NoExt_peak_4 1 [27463, 27918] * | 983 16.81243
V0_Peaks_NoExt_peak_5 1 [28820, 29068] * | 58 3.57078
V0_Peaks_NoExt_peak_6 1 [36493, 37032] * | 546 11.45626
-------
seqinfo: 457 sequences from an unspecified genome; no seqlengths
I'd like to:
- Find where the peaks overlap, and subsequently:
- Merge based on that overlap.
So far:
> c8.v0<-findOverlapsOfPeaks(c8.grange,v0.grange,minoverlap = 200)
> overlaps<-c8.v0$peaklist[["c8.grange///v0.grange"]]
> overlaps
GRanges object with 475 ranges and 1 metadata column:
seqnames ranges strand | peakNames
<Rle> <IRanges> <Rle> | <CharacterList>
[1] 1 [ 2215669, 2216098] * | v0.grange__V0_Peaks_NoExt_peak_98,c8.grange__Grouped_C8_NoExt_WithD_cond1_1
[2] 1 [18401238, 18402379] * | v0.grange__V0_Peaks_NoExt_peak_698a,c8.grange__Grouped_C8_NoExt_WithD_cond1_4
[3] 1 [22603511, 22603993] * | v0.grange__V0_Peaks_NoExt_peak_784,c8.grange__Grouped_C8_NoExt_WithD_cond1_5
[4] 1 [23866184, 23866789] * | v0.grange__V0_Peaks_NoExt_peak_816,c8.grange__Grouped_C8_NoExt_WithD_cond1_6
[5] 1 [25970340, 25971358] * | v0.grange__V0_Peaks_NoExt_peak_887a,c8.grange__Grouped_C8_NoExt_WithD_cond1_7
... ... ... ... . ...
[471] KN150112.1 [ 1834, 2755] * | v0.grange__V0_Peaks_NoExt_peak_42395,c8.grange__Grouped_C8_NoExt_WithD_cond1_607
[472] KN150127.1 [26653, 27288] * | v0.grange__V0_Peaks_NoExt_peak_42417,c8.grange__Grouped_C8_NoExt_WithD_cond1_608
[473] KN150342.1 [24017, 24364] * | c8.grange__Grouped_C8_NoExt_WithD_cond1_609,v0.grange__V0_Peaks_NoExt_peak_42582
[474] KN150342.1 [24739, 25115] * | c8.grange__Grouped_C8_NoExt_WithD_cond1_610,v0.grange__V0_Peaks_NoExt_peak_42583
[475] KN150342.1 [25567, 25951] * | c8.grange__Grouped_C8_NoExt_WithD_cond1_611,v0.grange__V0_Peaks_NoExt_peak_42584
-------
seqinfo: 457 sequences from an unspecified genome; no seqlengths
What I'm attempting to do is join several different files (v0,v1,v2,v3, etc.) to this singular base file (c8) to have a way to compare the peak scores for overlapping peaks from different samples. Outside of cutting up the peakNames
string to try and do a merge that way, does anyone know of a base function within ChIPpeakAnno that can do this for me? Thanks!
> sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.11.6 (El Capitan)
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] grid stats4 parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] dplyr_0.5.0 purrr_0.2.2 readr_1.0.0 tidyr_0.6.0 tibble_1.2 ggplot2_2.1.0 tidyverse_1.0.0 ChIPpeakAnno_3.6.5
[9] VennDiagram_1.6.17 futile.logger_1.4.3 GenomicRanges_1.24.3 GenomeInfoDb_1.8.7 Biostrings_2.40.2 XVector_0.12.1 IRanges_2.6.1 S4Vectors_0.10.3
[17] BiocGenerics_0.18.0 BiocInstaller_1.22.3 gplots_3.0.1 edgeR_3.14.0 limma_3.28.21
loaded via a namespace (and not attached):
[1] Rcpp_0.12.7 lattice_0.20-34 GO.db_3.3.0 Rsamtools_1.24.0 gtools_3.5.0 assertthat_0.1
[7] digest_0.6.10 mime_0.5 R6_2.1.3 plyr_1.8.4 futile.options_1.0.0 RSQLite_1.0.0
[13] httr_1.2.1 zlibbioc_1.18.0 GenomicFeatures_1.24.5 lazyeval_0.2.0 gdata_2.17.0 Matrix_1.2-7.1
[19] splines_3.3.1 BiocParallel_1.6.6 AnnotationHub_2.4.2 RCurl_1.95-4.8 biomaRt_2.28.0 munsell_0.4.3
[25] shiny_0.14 httpuv_1.3.3 rtracklayer_1.32.2 multtest_2.28.0 htmltools_0.3.5 SummarizedExperiment_1.2.3
[31] interactiveDisplayBase_1.10.3 idr_1.2 matrixStats_0.50.2 XML_3.98-1.4 GenomicAlignments_1.8.4 MASS_7.3-45
[37] bitops_1.0-6 RBGL_1.48.1 xtable_1.8-2 gtable_0.2.0 DBI_0.5-1 magrittr_1.5
[43] scales_0.4.0 graph_1.50.0 KernSmooth_2.23-15 seqinr_3.3-1 lambda.r_1.1.9 ensembldb_1.4.7
[49] tools_3.3.1 ade4_1.7-4 BSgenome_1.40.1 Biobase_2.32.0 survival_2.39-5 AnnotationDbi_1.34.4
[55] colorspace_1.2-6 regioneR_1.4.2 caTools_1.17.1 memoise_1.0.0
You know how to find overlaps, so write a function to do the merging (since it's not entirely clear how you want that done).
mergeByOverlaps exists but it won't allow you to select for any particular type of overlap.
As mentionned by Devon Ryan if you want more specific merging a function build around findOverlaps will be the best option.