Convert BED file to GRangesList object
8.1 years ago

I have a BED file as follows:

chr1    100194350       100194710       ARID3A  1000 
chr1    151430604       151430964       ARID3A  1000  
chr1    20301327        20301687        ARID3A  1000 
chr1    229267393       229267573       ARID3A  1000  
chr1    8802108         8802375         ARID3A  1000
chr1    109289093       109289349       ATF1    1000  
chr1    110527180       110527436       ATF1    1000
chr1    110950342       110950486       ATF1    1000  
chr1    115124275       115124409       ATF1    1000  
chr1    115259380       115259491       ATF1    1000  
...

and I would like to convert it to a GRangesList object in R, like this:

load("data.rda")
head(data)
$ARID3A
GRanges object with 8999 ranges and 0 metadata columns:
       seqnames                 ranges strand
          <Rle>              <IRanges>  <Rle>
     1     chr1     [1307917, 1308277]      *
     2     chr1     [1407080, 1407440]      *
     3     chr1     [1858670, 1859030]      *
     4     chr1     [2175900, 2176260]      *
     5     chr1     [2290655, 2291015]      *
   ...      ...                    ...    ...
  8966     chrX [154495495, 154495855]      *
  8967     chrX [154799333, 154799693]      *
  8968     chrX [154819952, 154820312]      *
  8969     chrX [154840885, 154841245]      *
  8970     chrX [155434904, 155435264]      *
  -------
  seqinfo: 23 sequences from an unspecified genome; no seqlengths

$ATF1
GRanges object with 14883 ranges and 0 metadata columns:
        seqnames                 ranges strand
           <Rle>              <IRanges>  <Rle>
      1     chr1     [ 778593,  778805]      *
      2     chr1     [1000794, 1001007]      *
      3     chr1     [1032962, 1033218]      *
      4     chr1     [1109781, 1110037]      *
      5     chr1     [1185572, 1185828]      *
    ...      ...                    ...    ...
  14846     chrX [155026863, 155027119]      *
  14847     chrX [155057436, 155057692]      *
  14848     chrX [155881105, 155881361]      *
  14849     chrX [155881673, 155881929]      *
  14850     chrX [155893620, 155893876]      *
  -------
  seqinfo: 23 sequences from an unspecified genome; no seqlengths

  ...

I checked GenomicRanges, but I could not find a way to build this from a BED file.

Thank you so much for helping me.

8.1 years ago
zx8754 12k

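Split the data frame by the name column, then use lapply to build one GRanges object per group. The result is a plain list of GRanges objects; wrap it in GRangesList() if you need an actual GRangesList.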

# dummy data
df1 <- read.table(text = "chr1    100194350       100194710       ARID3A  1000
                  chr1    151430604       151430964       ARID3A  1000
                  chr1    20301327        20301687        ARID3A  1000
                  chr1    229267393       229267573       ARID3A  1000
                  chr1    8802108         8802375         ARID3A  1000
                  chr1    109289093       109289349       ATF1    1000
                  chr1    110527180       110527436       ATF1    1000
                  chr1    110950342       110950486       ATF1    1000
                  chr1    115124275       115124409       ATF1    1000
                  chr1    115259380       115259491       ATF1    1000", header = FALSE)


library(GenomicRanges)

# split by name (column V4) and convert each group to a GRanges object
res <- 
  lapply(split(df1, df1$V4), function(i){
    GRanges(seqnames = i$V1,
            ranges = IRanges(start = i$V2,
                             end = i$V3,
                             names = i$V4))
  })

# result
res

$ARID3A
GRanges object with 5 ranges and 0 metadata columns:
         seqnames                 ranges strand
            <Rle>              <IRanges>  <Rle>
  ARID3A     chr1 [100194350, 100194710]      *
  ARID3A     chr1 [151430604, 151430964]      *
  ARID3A     chr1 [ 20301327,  20301687]      *
  ARID3A     chr1 [229267393, 229267573]      *
  ARID3A     chr1 [  8802108,   8802375]      *
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths

$ATF1
GRanges object with 5 ranges and 0 metadata columns:
       seqnames                 ranges strand
          <Rle>              <IRanges>  <Rle>
  ATF1     chr1 [109289093, 109289349]      *
  ATF1     chr1 [110527180, 110527436]      *
  ATF1     chr1 [110950342, 110950486]      *
  ATF1     chr1 [115124275, 115124409]      *
  ATF1     chr1 [115259380, 115259491]      *
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths
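
Two caveats on the approach above: lapply returns a plain list rather than a GRangesList, and BED start coordinates are 0-based while GRanges starts are 1-based. Here is a minimal sketch that handles both, assuming the five columns are chrom, start, end, name and score (these column names are my own labels, not something from the original file):

```r
library(GenomicRanges)

# Two rows from the question; the column names are assumed labels
df1 <- read.table(text = "chr1 100194350 100194710 ARID3A 1000
chr1 109289093 109289349 ATF1 1000",
                  header = FALSE,
                  col.names = c("chrom", "start", "end", "name", "score"))

# Build one GRanges, shifting the 0-based BED starts to 1-based
gr <- makeGRangesFromDataFrame(df1,
                               seqnames.field = "chrom",
                               start.field = "start",
                               end.field = "end",
                               keep.extra.columns = TRUE,
                               starts.in.df.are.0based = TRUE)

# Splitting a GRanges by a grouping vector gives a GRangesList
grl <- split(gr, gr$name)
class(grl)
```

Whether you want the start shift depends on whether your file really follows the BED convention, so check a few known regions before relying on it.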
8.1 years ago
igor 13k

You can do this with the rtracklayer package, which reads the fourth BED column into a metadata column called name:

library(rtracklayer)
gr_obj = import("file.bed")
library(GenomicRanges)
gr_list = split(gr_obj, gr_obj$name)

More info here: https://www.bioconductor.org/packages/release/bioc/vignettes/rtracklayer/inst/doc/rtracklayer.pdf
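
To try this without a real file, here is a small self-contained sketch. The temporary file and its two rows are made up for illustration (taken from the question's example); only import() and split() come from the answer above.

```r
library(rtracklayer)
library(GenomicRanges)

# Write a tiny tab-separated BED file to a temporary path (illustrative data)
bed_path <- tempfile(fileext = ".bed")
writeLines(c("chr1\t100194350\t100194710\tARID3A\t1000",
             "chr1\t109289093\t109289349\tATF1\t1000"),
           bed_path)

# import() converts the 0-based BED starts to 1-based GRanges starts
gr_obj <- import(bed_path, format = "BED")

# One list element per value of the name column
gr_list <- split(gr_obj, gr_obj$name)
names(gr_list)
```

Note that the imported start is one higher than the value in the file, because BED intervals are 0-based and half-open while GRanges intervals are 1-based and closed.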

5.9 years ago
bernatgel ★ 3.4k

The function toGRanges from the regioneR package works with either a data frame or a file with a BED-like structure (for files, it internally calls rtracklayer::import, the function used in igor's answer, to read the data).

library(regioneR)

dd <- toGRanges("data.bed")
dd <- split(dd, f = dd$name)
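
For the data frame route, a minimal sketch could look like the following. The data frame and its column names are made up for illustration; toGRanges treats the first three columns as chromosome, start and end, and keeps the remaining columns as metadata.

```r
library(regioneR)

# Illustrative data frame: first three columns are chr, start, end
df <- data.frame(chr   = c("chr1", "chr1"),
                 start = c(100194350, 109289093),
                 end   = c(100194710, 109289349),
                 name  = c("ARID3A", "ATF1"),
                 score = c(1000, 1000))

dd <- toGRanges(df)

# Split on the metadata column carried over from the data frame
dd_list <- split(dd, f = dd$name)
names(dd_list)
```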