4.8 years ago
jeni
Hi everyone!
I have a data frame with some genomic intervals and their corresponding coverage in several samples:
        sample1  sample2  sample3
1:1-3   30       NA       NA
1:1-4   NA       40       35
1:4-5   35       NA       NA
1:5-7   NA       50       50
1:6-7   60       NA       NA
I would like to obtain the same dataframe but for genomic positions:
        sample1  sample2  sample3
1:1     30       40       35
1:2     30       40       35
1:3     30       40       35
1:4     35       40       35
1:5     35       50       50
1:6     60       50       50
1:7     60       50       50
How could I get this?
The intervals can be obtained first with rownames. Then use strsplit to get the chromosome (first element) and the range (second and third elements). You can either put this into a data frame and then use makeGRangesFromDataFrame, or use GRanges directly to construct a GRanges object. The coverages could be stored as elementMetadata in the resulting GRanges object. I suggest you try that out; it is good practice for improving your skills.
Okay, thanks! I have already done that.
But now how can I get genomic positions from each interval, indicating the coverage value of each sample for each position?
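The construction suggested in the answer above (rownames, then strsplit, then a GRanges object with the coverages as metadata columns) could look roughly like the sketch below. It is only an illustration: the object names `cov`, `parts`, `iv` and `gr` are made up here, the data are the example values from the question, and the split pattern assumes chromosome names contain neither `:` nor `-`. The GRanges step runs only if the GenomicRanges package is installed.

```r
# Example data from the question: intervals as row names, one column per sample.
cov <- data.frame(
  sample1 = c(30, NA, 35, NA, 60),
  sample2 = c(NA, 40, NA, 50, NA),
  sample3 = c(NA, 35, NA, 50, NA),
  row.names = c("1:1-3", "1:1-4", "1:4-5", "1:5-7", "1:6-7")
)

# Split "chr:start-end" on ':' and '-' into three pieces per interval.
parts <- do.call(rbind, strsplit(rownames(cov), "[:-]"))
iv <- data.frame(
  chr   = parts[, 1],
  start = as.integer(parts[, 2]),
  end   = as.integer(parts[, 3]),
  stringsAsFactors = FALSE
)

# Build the GRanges object only if GenomicRanges is available;
# the sample columns are kept as metadata columns.
if (requireNamespace("GenomicRanges", quietly = TRUE)) {
  gr <- GenomicRanges::makeGRangesFromDataFrame(
    cbind(iv, cov),
    keep.extra.columns = TRUE
  )
  print(gr)
}
```

Passing the combined data frame to makeGRangesFromDataFrame keeps the parsing step in base R, so the coordinates can be checked before any Bioconductor object is created.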
Can you show what you have done?
Sure! I've transformed my data frame into a GRanges object (I first split the genomic coordinates into this format: chr start end):
Now, I have tried:
and I get this:
In this example I cannot obtain all the positions, but in my real data frame I can, because I have a lot of overlapping intervals. The problem I have now is that I don't know how to keep and adapt the metadata columns. What I would like to obtain is this:
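One possible way to carry the sample columns along, sketched in base R rather than with GRanges, is to expand every interval into its single positions and then, for each sample, take the first non-NA coverage among the intervals covering each position. This is a sketch under assumptions, not the method used in the thread: the names `cov`, `iv`, `idx`, `first_non_na` and `result` are made up for illustration, and "first non-NA value wins" is an assumed rule for positions where a sample has several overlapping intervals with values.

```r
# Example data from the question.
cov <- data.frame(
  sample1 = c(30, NA, 35, NA, 60),
  sample2 = c(NA, 40, NA, 50, NA),
  sample3 = c(NA, 35, NA, 50, NA),
  row.names = c("1:1-3", "1:1-4", "1:4-5", "1:5-7", "1:6-7")
)

# Parse "chr:start-end" row names into coordinates.
parts <- do.call(rbind, strsplit(rownames(cov), "[:-]"))
iv <- data.frame(chr = parts[, 1],
                 start = as.integer(parts[, 2]),
                 end = as.integer(parts[, 3]),
                 stringsAsFactors = FALSE)

# For every expanded position, remember which interval (row) it came from.
idx <- rep(seq_len(nrow(iv)), iv$end - iv$start + 1L)
pos <- unlist(Map(seq, iv$start, iv$end))
key <- paste0(iv$chr[idx], ":", pos)

# Unique positions, ordered by chromosome (as text) and then position.
ord_keys <- unique(key[order(iv$chr[idx], pos)])

# Assumed rule: the first non-NA value covering a position wins.
first_non_na <- function(x) {
  x <- x[!is.na(x)]
  if (length(x)) x[1] else NA_real_
}

# Collapse each sample column to one value per position.
per_pos <- sapply(cov, function(v) {
  as.vector(tapply(v[idx], factor(key, levels = ord_keys), first_non_na))
})
rownames(per_pos) <- ord_keys
result <- as.data.frame(per_pos)
print(result)
```

Expanding to one row per base can become large for long intervals or whole genomes, so on real data it may be worth processing one chromosome at a time.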
Related SO post: