Setting elementMetadata in genomicranges list objects
3
1
Entering edit mode
10.5 years ago
reviewer3 ▴ 20

I thought this would be easy, but once again working with genomicranges is not intuitive to me.

Here's a toy genomic ranges object. I want to generate a GR list for each chromosome, calculate the distance between the start of each feature (irrespective of strand, so in this case it's all 0, but it doesn't matter) and add the 'distance' column to each GR in the list:

gr.toy <- GRanges(seqnames = Rle(c("chr1", "chr2", "chr3"), c(3, 3, 3)),
                  ranges = IRanges(1:9, width = 1, names = head(letters,9)),
                  strand = Rle(strand(c("+", "-", "+")), c(2, 4, 3)),
                  score = 1:9,GC = seq(1, 0, length=9))

lgr.toy<-split(gr.toy, seqnames(gr.toy))
lgr.tss<-lapply(lgr.toy, flank, 0)

ldis<-lapply(lgr.tss, function(X){strand(X)<-Rle(strand("*"), length(strand(X))); distanceToNearest(X)@elementMetadata})

Now I try to add ldis to lgr.toy:

lgr.toy
GRangesList of length 3:
$chr1
GRanges with 3 ranges and 2 metadata columns:
    seqnames    ranges strand |     score        GC
       <Rle> <IRanges>  <Rle> | <integer> <numeric>
  a     chr1    [1, 1]      + |         1         1
  b     chr1    [2, 2]      + |         2     0.875
  c     chr1    [3, 3]      - |         3      0.75

$chr2
GRanges with 3 ranges and 2 metadata columns:
    seqnames ranges strand | score    GC
  d     chr2 [4, 4]      - |     4 0.625
  e     chr2 [5, 5]      - |     5   0.5
  f     chr2 [6, 6]      - |     6 0.375

$chr3
GRanges with 3 ranges and 2 metadata columns:
    seqnames ranges strand | score    GC
  g     chr3 [7, 7]      + |     7  0.25
  h     chr3 [8, 8]      + |     8 0.125
  i     chr3 [9, 9]      + |     9     0

---
seqlengths:
chr1 chr2 chr3
   NA   NA   NA

ldis
$chr1
DataFrame with 3 rows and 1 column
   distance
  <integer>
1         0
2         0
3         0

$chr2
DataFrame with 3 rows and 1 column
   distance
  <integer>
1         0
2         0
3         0

$chr3
DataFrame with 3 rows and 1 column
   distance
  <integer>
1         0
2         0
3         0

for(i in 1:length(lgr.toy))   { mcols(lgr.toy[[i]])$distance<-ldis[[i]][,1] }

Error in .Method(..., deparse.level = deparse.level) :
  number of columns for arg 2 do not match those of first arg

I get the same result if I try to use values() or elementMetadata()

It works just fine if I pull the GR for one chromosome out and add the data separately. Why is this so hard when working from a list??? Thank you!

gr1<-lgr.toy$chr1

mcols(gr1)$distance<-ldis[[1]][,1]

gr1
GRanges with 3 ranges and 3 metadata columns:
    seqnames    ranges strand |     score        GC  distance
       <Rle> <IRanges>  <Rle> | <integer> <numeric> <integer>
  a     chr1    [1, 1]      + |         1         1         0
  b     chr1    [2, 2]      + |         2     0.875         0
  c     chr1    [3, 3]      - |         3      0.75         0
  ---
  seqlengths:
   chr1 chr2 chr3
     NA   NA   NA
mcols values genomicranges elementMetaData • 11k views
ADD COMMENT
2
Entering edit mode
10.5 years ago

distanceToNearest() respects chromosomes, so just:

gr.toy <- GRanges(seqnames = Rle(c("chr1", "chr2", "chr3"), c(3, 3, 3)),
                  ranges = IRanges(1:9, width = 1, names = head(letters,9)),
                  strand = Rle(strand(c("+", "-", "+")), c(2, 4, 3)),
                  score = 1:9,GC = seq(1, 0, length=9))
#so we don't overwrite the original strands (alternatively, just save them 
#to an object
gr2.toy <- gr.toy
strand(gr2.toy) <- "*"
mcols(gr.toy)$distance <- as.data.frame(distanceToNearest(gr2.toy))$distance

BTW, you can also end(gr2.toy)=start(gr2.toy) if you really do just want the distance to be 5' only.

ADD COMMENT
0
Entering edit mode

Thank you, Devon! I understand that distanceToNearest() respects chromosomes, but I have to do more analysis once I have the distances and I'd like to do it chromosome by chromosome as they differ in gene content, etc. Also, end(gr2.toy) = start(gr2.toy) isn't strand-sensitive, so the true 5' of a (-) strand feature is actually listed as 'end'. That's why I had to use flank, although maybe there's a better approach.

I +1 your answer, and happy to accept it if you can help address this in a list-context. Thank you!

ADD REPLY
1
Entering edit mode
8.5 years ago

I also experienced difficulty trying to assign metadata to GRanges objects inside of a GRangesList. @DevonRyan's answer helps us add metadata to a GRanges object that is not within a list. @petervangalen's answer helps us add metadata to a GRangesList object, but not to GRanges objects within them.

I finally found an answer in the Bioconductor mailing list archives [https://stat.ethz.ch/pipermail/bioconductor/2011-December/042628.html]. The reason you get the error "number of columns for arg 2 do not match those of first arg" is that all of the objects inside a given GRangesList must have exactly the same columns. So you can't use a for loop to iterate through and add columns.

Instead, unlist it, add your desired columns, and then relist:

unlisted <- unlist(lgr.toy)
unlisted$distance <- "desired metadata"
lgr.toy.relisted <- relist(unlisted, lgr.toy)
ADD COMMENT
0
Entering edit mode
8.7 years ago
petervangalen ▴ 210

The following line works to add an "id" metadata column to a GRangesList:

queries.grl <- GRangesList( lapply(queries.grl, function(x) {
    x$id <- paste0( seqnames(x), ":", start(x), "-", end(x) )
    return(x) }) )
ADD COMMENT

Login before adding your answer.

Traffic: 1869 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6