Hi guys,
I have a file (named DP.2L) which looks like this:
CHROM POS SAMPLE_1
1 1168 47
1 1197 40
1 1202 45
POS ranges from 1168 to 49359284. I generated a sequence vector containing so that I get bin sizes starting from 1168, to 49359284 by 10,000 steps:
> seqWrapper <- function(lb, ub, by=1) {
+ s <- c()
+ if(!ub <= lb) s <- seq(lb,ub, by=by)
+ return(s)
+ }
> x<-seqWrapper(lb=1168, ub=49359284,by=10000)
I want to sequentially calculate the average values of SAMPLE_1 if the value of x is less than or equal to the value of POS but more than the value of the next value of x. For example if x values are :1168 11168 21168 , then I want to first look at the first x value (1168) and compare it with the POS column. I want it to calculate the average of the corresponding rows in SAMPLE_1 if the value of its corresponding POS is less than or equal to the first values of x (1168). But for the next x value (11168), I want the average should only be calculated x is <= POS but > the first x value (1168).
I got as far as this but its not right....
t<- for (i in x) { if (DP.2L<=x & DP.2L > x[i]) {(mean(DP.2L$SAMPLE_1))} }
Your analysis is not really 'sliding-window' but 'consecutive windows' because in sliding-windows are overlapping in my understanding, but according to your definition your regions do not overlap. I suggest to consider whether this is really what you want and if it is appropriate at all. I can just guess that you have data from a a tiling array or ChIP-chip experiment, but in this case for plotting, a real sliding window approach might be preferable.
Yes, Michael what I actually meant to say was a non-overlapping window. As I understood a sliding window can be overlapping or non-overlapping, because the term "sliding" refers to the analysis sliding across the data.