Merge peak BED file into GRanges
3.9 years ago
wax4001

I have a BED file of pre-defined super-enhancer regions and a GRanges object of hg19 CDS regions. How can I combine the two so that the GRanges gets a score showing how many peaks/super-enhancers fall within the range of each selected gene? For example, the original GRanges looks like this (ignore the specific ranges):

## GRanges object with 12 ranges and 0 metadata columns:
##        seqnames        ranges strand
##           <Rle>     <IRanges>  <Rle>
##    [1]     chr1   1-249250621      +
##    [2]     chr1         1-100      -
##    [3]     chr1 112-249250621      -
##    [4]     chr1   1-249250621      *
##    [5]     chr2         1-101      +
##    ...      ...           ...    ...
##    [8]     chr2   1-243199373      *
##    [9]     chr3   1-198022430      +
##   [10]     chr3         1-109      -
##   [11]     chr3 121-198022430      -
##   [12]     chr3   1-198022430      *
##   -------
##   seqinfo: 3 sequences from an unspecified genome

and I have a BED file with some example regions:

chr1  213941196  213942363
chr1  213942363  213943530
chr1  213943530  213944697
chr2  158364697  158365864
chr2  158365864  158367031
chr3  127477031  127478198
chr3  127478198  127479365
chr3  127479365  127480532
chr3  127480532  127481699

Ignore hid, seq and the other columns; what I want is the count column, i.e. the number of peaks within each region defined by the original GRanges, like this:

## GRanges object with 6 ranges and 10 metadata columns:
##       seqnames        ranges strand |       hid     count  eligible
##          <Rle>     <IRanges>  <Rle> | <integer> <numeric> <numeric>
##   [1]        1   69091-70008      + |         1      <NA>         0
##   [2]        1 367640-368634      + |         2      <NA>         0
##   [3]        1 621059-622053      - |         3      <NA>         0
##   [4]        1 860260-879955      + |         4         0       193
##   [5]        1 879584-894689      - |         5         1       634
##   [6]        1 895967-901095      + |         6      <NA>         0
##        query.id ReplicationTiming                 C                 G
##       <integer>         <numeric>         <numeric>         <numeric>
##   [1]         1              <NA>              <NA>              <NA>
##   [2]         2              <NA>              <NA>              <NA>
##   [3]         3              <NA>              <NA>              <NA>
##   [4]         4  1.69339861733204            0.3204             0.353
##   [5]         5  1.69542176470588 0.333717647058824 0.289047058823529
##   [6]         6              <NA>              <NA>              <NA>
##       Heterochromatin    LungExpression frac.eligible
##             <numeric>         <numeric>     <numeric>
##   [1]            <NA>              <NA>             0
##   [2]            <NA>              <NA>             0
##   [3]            <NA>              <NA>             0
##   [4]               0 0.645780413675521      0.009799
##   [5]               0  1.76972436511895       0.04197
##   [6]            <NA>              <NA>             0
##   -------
##   seqinfo: 49 sequences from an unspecified genome

You should include some example data and an example of the desired output to make it easier for people to help.

3.9 years ago

I made some example data similar to yours as a reproducible example. You can import the ranges from your BED file into a GRanges object using rtracklayer::import("file.bed").
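A minimal sketch of that import step, assuming the file is tab-delimited as the BED format expects. A temporary file stands in for "file.bed" here, filled with three rows copied from the question; the names bed_file and peaks are just illustrative. BED coordinates are 0-based and half-open, and import() converts them to 1-based GRanges coordinates, so each start comes out 1 higher than in the file.

```r
# Sketch: import BED intervals into a GRanges object with rtracklayer.
# A temporary file stands in for "file.bed"; the rows are copied from the
# question and written tab-delimited, as the BED format expects.
library(rtracklayer)

bed_file <- tempfile(fileext = ".bed")
writeLines(c(
  "chr1\t213941196\t213942363",
  "chr2\t158364697\t158365864",
  "chr3\t127477031\t127478198"
), bed_file)

# BED starts are 0-based (half-open intervals); import() returns 1-based,
# closed GRanges coordinates, so each start is shifted by +1.
peaks <- import(bed_file, format = "BED")
peaks
```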

library("plyranges")

# Toy CDS ranges: four 6-bp ranges on chrI with mixed strands
CDSs <- data.frame(seqnames="chrI", start=seq(10, 25, 5), end=seq(15, 30, 5), strand=c("-", "*", "+", "+")) %>%
  as_granges

> CDSs
GRanges object with 4 ranges and 0 metadata columns:
      seqnames    ranges strand
         <Rle> <IRanges>  <Rle>
  [1]     chrI     10-15      -
  [2]     chrI     15-20      *
  [3]     chrI     20-25      +
  [4]     chrI     25-30      +
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths

# Toy peaks, unstranded like ranges imported from a BED file without a strand column
peaks <- data.frame(seqnames="chrI", start=c(12, 17, 29), end=c(18, 22, 36), strand="*") %>%
  as_granges

> peaks
GRanges object with 3 ranges and 0 metadata columns:
      seqnames    ranges strand
         <Rle> <IRanges>  <Rle>
  [1]     chrI     12-18      *
  [2]     chrI     17-22      *
  [3]     chrI     29-36      *
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths

plyranges makes it easy to add the overlap counts as a new metadata column with mutate() and count_overlaps():

# For each range in CDSs, count the peaks that overlap it (any overlap)
overlaps <- mutate(CDSs, n_overlaps=count_overlaps(CDSs, peaks))

> overlaps
GRanges object with 4 ranges and 1 metadata column:
      seqnames    ranges strand | n_overlaps
         <Rle> <IRanges>  <Rle> |  <integer>
  [1]     chrI     10-15      - |          1
  [2]     chrI     15-20      * |          2
  [3]     chrI     20-25      + |          1
  [4]     chrI     25-30      + |          1
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths
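If you would rather stay in base Bioconductor, the same count can be sketched with GenomicRanges::countOverlaps(), rebuilding the toy CDSs and peaks from above. Note that countOverlaps(), like count_overlaps(), counts any overlap, whereas the question asks for peaks "within" each region; the sketch therefore also shows one way to count only peaks lying entirely inside a CDS, via findOverlaps(type = "within") with the peaks as the query. The column names n_overlaps and n_within are illustrative.

```r
# Sketch: per-CDS peak counts with GenomicRanges only (no plyranges),
# using the same toy ranges as the plyranges example.
library(GenomicRanges)

CDSs <- GRanges(seqnames = rep("chrI", 4),
                ranges = IRanges(start = seq(10, 25, 5), end = seq(15, 30, 5)),
                strand = c("-", "*", "+", "+"))
peaks <- GRanges(seqnames = rep("chrI", 3),
                 ranges = IRanges(start = c(12, 17, 29), end = c(18, 22, 36)))

# Any overlap, regardless of strand (gives 1, 2, 1, 1 for these ranges).
# With unstranded "*" peaks, as here, ignore.strand makes no difference.
CDSs$n_overlaps <- countOverlaps(CDSs, peaks, ignore.strand = TRUE)

# Full containment: type = "within" asks whether the query (a peak) lies
# entirely inside the subject (a CDS), so the hits are tallied per subject.
hits <- findOverlaps(peaks, CDSs, type = "within", ignore.strand = TRUE)
CDSs$n_within <- tabulate(subjectHits(hits), nbins = length(CDSs))

CDSs
```

On this toy data every peak extends past the CDS it touches, so n_within is 0 for all four ranges while n_overlaps is 1, 2, 1, 1. Which definition to use depends on whether partially overlapping super-enhancers should be counted.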