12.7 years ago
Zev.Kronenberg
Greetings all,
I am no stranger to R, but I will admit that Bioconductor always throws me for a loop. I am trying to compute a sliding mean or a binned mean (let's say 1 kb) across all scaffolds in my data. Since I am working with SNV data only, start = stop for every range. Could I construct my data format differently and bin from the start?
head of data:
CHRM POS Q_FREQ_POP1 Q_FREQ_POP2 FREQ_DIFF HET_POP1 HET_POP2 FST
1 scaffold_0 1257 0.3846154 0.4545455 0.06993007 0.4733728 0.4958678 0.004995005
2 scaffold_0 1302 0.4000000 0.4615385 0.06153846 0.4800000 0.4970414 0.003846154
3 scaffold_0 2072 0.3888889 0.5294118 0.14052288 0.4753086 0.4982699 0.019876591
the call:
my.ranged.dat<-RangedData(ranges=IRanges(start=dat$POS, end=dat$POS), space=dat$CHRM, score=dat$FST)
RangedData with 6 rows and 1 value column across 8166 spaces
space ranges | score
<factor> <IRanges> | <numeric>
1 scaffold_0 [1257, 1257] | 0.004995005
2 scaffold_0 [1302, 1302] | 0.003846154
3 scaffold_0 [2072, 2072] | 0.019876591
4 scaffold_0 [3513, 3513] | 0.001382604
5 scaffold_0 [4392, 4392] | 0.000637690
6 scaffold_0 [4469, 4469] | 0.006060606
Some questions back: Do you have only a single score per SNV position, or do you have a score for every base position (zero or NA for most positions)? And when computing the average scores, how should missing scores be treated? I think ignoring them makes the most sense; otherwise all the averages will be ~0.
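If the answer is one score per SNV position, with unscored positions ignored, a binned mean could be sketched in plain base R without Bioconductor. This is only a minimal sketch under those assumptions: the three rows are copied from the head of the data above, while `bin_size`, `BIN_START` and `binned` are names made up for the illustration.

```r
# Toy input: the first three SNVs from the question (only the columns needed).
dat <- data.frame(
  CHRM = c("scaffold_0", "scaffold_0", "scaffold_0"),
  POS  = c(1257, 1302, 2072),
  FST  = c(0.004995005, 0.003846154, 0.019876591)
)

# Assumed bin width of 1 kb; bins are 1-based: 1-1000, 1001-2000, ...
bin_size <- 1000

# Start coordinate of the bin that each SNV falls into (hypothetical column).
dat$BIN_START <- (dat$POS - 1) %/% bin_size * bin_size + 1

# Mean FST per scaffold and bin. Only positions that carry a score contribute;
# the formula interface of aggregate() drops rows with NA by default.
binned <- aggregate(FST ~ CHRM + BIN_START, data = dat, FUN = mean)
print(binned)
```

Bins that contain no SNVs simply do not appear in the result, which matches the "ignore missing scores" treatment; if every base should count as zero instead, the sum per bin would have to be divided by `bin_size` rather than by the number of SNVs.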