6.6 years ago
user_g
Hello, I am looking for a way to split my data into groups, where each group spans a window of a fixed size that I define.
Chrom Start End
chr1 1 10
chr1 11 20
chr1 21 30
chr1 31 40
For example, if I want a window size of 20, then the groups would be: 1-20, 11-30, 21-40. Rows keep being added to a group as long as the group's total span does not exceed 20.
I tried using the split function but couldn't get it to work this way. Is there a way around this?
Some questions:
Do you have a dataframe or a GRanges object? Your example data looks like a dataframe, but you mentioned GRanges.
Can the same row go into different groups?
Also, does each value in the start column automatically start a new dataframe? For example, if you had a row c("chr1","16","25"), this would create a dataframe from 16 to 35. In that case you would have as many dataframes as rows. What do you want to achieve after that splitting?
I am alternating between data frames and GRanges to find the best way to achieve this, so if I can find a way to do this with GRanges then I will convert my data frame into a GRanges object.
Yes, the same row can be in more than one group.
Yes, that's true, I will end up with the same number of clusters as rows, but each cluster will be a set of rows in a data frame or GRanges object, rather than each row being an independent data frame.
I need these clusters to study them further in the next stage.
Why not use a for loop over your dataframe and then do your processing inside the loop?
Something like this :
If you really need all your GRanges at the same time, you can create a list of GRanges before the loop, append to it inside the loop, and use it after the loop.
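A loop that collects every group in a list could look roughly like the base R sketch below. It is only an illustration on the example data from the question: the names window_size, groups and in_window are made up here, plain data frames stand in for GRanges, and a row is assumed to belong to a window only if it lies entirely inside it.

```r
# Example data from the question
df <- data.frame(
  Chrom = c("chr1", "chr1", "chr1", "chr1"),
  Start = c(1, 11, 21, 31),
  End   = c(10, 20, 30, 40)
)
window_size <- 20  # hypothetical name; same units as Start/End

# Pre-allocate the list rather than growing it inside the loop
groups <- vector("list", nrow(df))

for (i in seq_len(nrow(df))) {
  # Each row opens a window that begins at its own Start
  window_end <- df$Start[i] + window_size - 1

  # Keep rows on the same chromosome that fit entirely inside the window
  in_window <- df$Chrom == df$Chrom[i] &
    df$Start >= df$Start[i] &
    df$End <= window_end

  groups[[i]] <- df[in_window, ]
}

# groups[[1]] holds the rows covering 1-20, groups[[2]] those covering 11-30, ...
```

Each row is compared against the whole table, so the work grows with the square of the number of rows.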
Hello, yes, I tried using a for loop, but with large data it became very slow, which is why I am looking for another way.
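One possible way to avoid a row-by-row loop, offered as a sketch and not as a tested solution for real data: if the rows are sorted by Chrom and Start and do not overlap (so End is non-decreasing within a chromosome), the last row of each window can be looked up with base R's findInterval(). The function name window_groups and its output columns are hypothetical.

```r
window_groups <- function(df, window_size) {
  # Sort so that each chromosome occupies a contiguous block of rows
  df <- df[order(df$Chrom, df$Start), ]
  n <- nrow(df)

  # Each row opens a window starting at its own Start
  window_end <- df$Start + window_size - 1
  last <- integer(n)

  # Loop over chromosomes only (few iterations), not over rows
  for (idx in split(seq_len(n), df$Chrom)) {
    # Number of rows on this chromosome whose End is <= the window end;
    # requires End to be non-decreasing within the chromosome
    k <- findInterval(window_end[idx], df$End[idx])
    last[idx] <- idx[1] - 1L + k
  }

  # first_row/last_row index into the SORTED data;
  # last_row < first_row means the row itself is longer than the window
  data.frame(
    Chrom = df$Chrom,
    window_start = df$Start,
    window_end = window_end,
    first_row = seq_len(n),
    last_row = last
  )
}

df <- data.frame(
  Chrom = c("chr1", "chr1", "chr1", "chr1"),
  Start = c(1, 11, 21, 31),
  End   = c(10, 20, 30, 40)
)
res <- window_groups(df, 20)
```

Here each group is described by a first and last row index instead of a copy of its rows, which keeps memory use low when there are as many groups as rows. If the data is converted to GRanges, findOverlaps() with type = "within" answers the same containment question.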