Entering edit mode
7.1 years ago
pedroamandrade
▴
40
Hi everyone,
I am trying to create a simple script to sum z-scores of several pairwise FST comparisons (basically, replicate the diFST statistic from this paper: http://www.pnas.org/content/107/3/1160.full)
I already found awk scripts to calculate average and standard deviation of a field, but I was looking for something more specific: integrate these values within the script itself. It would be something such as this:
DATASET
0.19393939 0.00000000 0.00000000
0.00000000 0.00000000 0.07407407
0.13658455 0.08052603 0.06750392
0.03704075 0.09428103 0.05263158
0.06046107 0.02883006 0.05263158
0.07591252 0.03383722 0.01593773
0.10475719 0.02040816 0.09956710
0.05010618 0.12556459 0.11561691
0.09554731 0.24450549 0.17279412
0.04499541 0.04938272 0.03703704
MOCK-UP SCRIPT
awk '{print ($1-(average$1))/(stdev$1) + ($2-(average$2))/(stdev$2) + ($3-(average$3))/(stdev$3)}'
OUTPUT
-0.2179645555
-2.2434375436
1.1659017773
-0.7342745412
-1.1920991459
-1.5720200178
0.419356693
1.1658334403
4.7111786566
-1.5024747631
If you could help with just the awk '{print ($1-(average$1))/(stdev$1)}' the rest I can easily do. Thank you for the help!
how about using R ?
Hi Pierre, the reason I asked for awk is because I'm used to doing most simple calculations in this language.
R is good for handling row-oriented data, it's not the tool of choice for columns/whole-file.
This outputs the mean, variance, and sd. I'm not sure what you are doing after that (?)
Hi Kevin,
My objective is to incorporate the mean and standard deviation of the column into the formula itself. For example, given a column $1, do:
($1-average$1)/(stdev$1)
Basically, I want to convert each value of $1 into a zscore (for example, following this: https://statistics.laerd.com/statistical-guides/standard-score-2.php). I know how to do this in several steps, first calculating the mean and standard deviation separately for the column and then incorporate these values into awk, but I imagine there must be a way to do this in a single step.
I see, I wrote my awk code above for summarising rows (not columns).
It may be complex in awk as you'd have to read over the file multiple times. The syntax in awk or that is: awk 'NR=FNR {first pass actions; next} {second pass actions}' DATASET DATASET
As Pierre said, what about R?
Here's a R script that should be executable in your shell that will convert your data to Z scores:
Hi Kevin,
Thanks for the script! That does exactly what I needed!