I thought this would be easy, but once again working with genomicranges is not intuitive to me.
Here's a toy genomic ranges object. I want to generate a GR list for each chromosome, calculate the distance between the start of each feature (irrespective of strand, so in this case it's all 0, but it doesn't matter) and add the 'distance' column to each GR in the list:
gr.toy <- GRanges(seqnames = Rle(c("chr1", "chr2", "chr3"), c(3, 3, 3)),
ranges = IRanges(1:9, width = 1, names = head(letters,9)),
strand = Rle(strand(c("+", "-", "+")), c(2, 4, 3)),
score = 1:9,GC = seq(1, 0, length=9))
lgr.toy<-split(gr.toy, seqnames(gr.toy))
lgr.tss<-lapply(lgr.toy, flank, 0)
ldis<-lapply(lgr.tss, function(X){strand(X)<-Rle(strand("*"), length(strand(X))); distanceToNearest(X)@elementMetadata})
Now I try to add ldis to lgr.toy:
lgr.toy
GRangesList of length 3:
$chr1
GRanges with 3 ranges and 2 metadata columns:
seqnames ranges strand | score GC
<Rle> <IRanges> <Rle> | <integer> <numeric>
a chr1 [1, 1] + | 1 1
b chr1 [2, 2] + | 2 0.875
c chr1 [3, 3] - | 3 0.75
$chr2
GRanges with 3 ranges and 2 metadata columns:
seqnames ranges strand | score GC
d chr2 [4, 4] - | 4 0.625
e chr2 [5, 5] - | 5 0.5
f chr2 [6, 6] - | 6 0.375
$chr3
GRanges with 3 ranges and 2 metadata columns:
seqnames ranges strand | score GC
g chr3 [7, 7] + | 7 0.25
h chr3 [8, 8] + | 8 0.125
i chr3 [9, 9] + | 9 0
---
seqlengths:
chr1 chr2 chr3
NA NA NA
ldis
$chr1
DataFrame with 3 rows and 1 column
distance
<integer>
1 0
2 0
3 0
$chr2
DataFrame with 3 rows and 1 column
distance
<integer>
1 0
2 0
3 0
$chr3
DataFrame with 3 rows and 1 column
distance
<integer>
1 0
2 0
3 0
for(i in 1:length(lgr.toy)) { mcols(lgr.toy[[i]])$distance<-ldis[[i]][,1] }
Error in .Method(..., deparse.level = deparse.level) :
number of columns for arg 2 do not match those of first arg
I get the same result if I try to use values()
or elementMetadata()
It works just fine if I pull the GR for one chromosome out and add the data separately. Why is this so hard when working from a list??? Thank you!
gr1<-lgr.toy$chr1
mcols(gr1)$distance<-ldis[[1]][,1]
gr1
GRanges with 3 ranges and 3 metadata columns:
seqnames ranges strand | score GC distance
<Rle> <IRanges> <Rle> | <integer> <numeric> <integer>
a chr1 [1, 1] + | 1 1 0
b chr1 [2, 2] + | 2 0.875 0
c chr1 [3, 3] - | 3 0.75 0
---
seqlengths:
chr1 chr2 chr3
NA NA NA
Thank you, Devon! I understand that
distanceToNearest()
respects chromosomes, but I have to do more analysis once I have the distances and I'd like to do it chromosome by chromosome as they differ in gene content, etc. Also,end(gr2.toy) = start(gr2.toy)
isn't strand-sensitive, so the true 5' of a (-) strand feature is actually listed as 'end'. That's why I had to use flank, although maybe there's a better approach.I +1 your answer, and happy to accept it if you can help address this in a list-context. Thank you!