Hello all,
I have columns of population genetic data. Here's an example:
line chromosome fst_score
1 1 0.3
2 1 0.7
3 1 0.3
4 1 0.15
5 1 0.4
6 2 0.6
7 2 0.94
8 2 0.17
9 2 0.19
I want to calculate the average of the values in column 3 (the Fst score), not for the whole column but in bins (windows) of a certain size. To start with, I'd just like to know how to calculate the average for every 10 rows of column 3. I know that to calculate the average for the whole column I can do something like this:
list_to_average = []
next(fileObj)  # skip the header row
for line in fileObj:
    lineList = line.strip().split()
    # collect the Fst score (column 3) of every row
    list_to_average.append(float(lineList[2]))
# average over the whole column
average = sum(list_to_average) / len(list_to_average)
but I am not sure how to set up a reliable counter that would do this for every 10 rows and output each average along with the value in the first column (so that I know which lines the average comes from).
This is for chromosome data, so in the real file the values in column 2 are chromosome positions, and I will use these to define the rows that need to be averaged together as 1 Mb windows. That is trickier, as I will need to detect when the distance between rows has reached 1 Mb as I iterate through the file. For now I would just like to solve the first challenge of calculating an average for every 10 lines in a file.
Please let me know if I can format the question better or if there is already an answer on this forum (I have looked).
Any help is appreciated. I'm still getting to grips with programming!
Thanks
Rubal7
Are you committed to a Python solution? This would be very easy in R.
I'm open to R solutions too, especially if they are simpler. But I am even less familiar with R syntax.
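For what it's worth, here is a minimal sketch in plain Python of both steps described in the question. It is only one way to do it, not a definitive answer. The function names (parse_rows, average_every_n_rows, average_by_distance) are made up for illustration. It assumes whitespace-separated rows with three columns and a header row starting with "line". For the 1 Mb version it also assumes that column 2 holds integer positions, sorted in ascending order, on a single chromosome.

```python
def parse_rows(lines):
    """Yield (line_id, column2, fst) from whitespace-separated text lines.

    Assumption: three columns per row; blank lines and the header are skipped.
    """
    for line in lines:
        fields = line.split()
        if len(fields) < 3 or fields[0] == "line":
            continue  # blank line or the "line chromosome fst_score" header
        yield fields[0], fields[1], float(fields[2])


def average_every_n_rows(rows, n=10):
    """Return (first_id, last_id, mean_fst) for each consecutive block of n rows."""
    results = []
    ids, scores = [], []
    for line_id, _, fst in rows:
        ids.append(line_id)
        scores.append(fst)
        if len(scores) == n:  # block is full: record it and start a new one
            results.append((ids[0], ids[-1], sum(scores) / n))
            ids, scores = [], []
    if scores:  # leftover rows that did not fill a whole block
        results.append((ids[0], ids[-1], sum(scores) / len(scores)))
    return results


def average_by_distance(rows, window=1_000_000):
    """Return (first_id, last_id, mean_fst) per stretch of `window` positions.

    A window opens at the position of its first row. The first row that lies
    `window` or more positions past that start closes it and opens the next.
    Assumption: positions are integers, sorted, and from one chromosome.
    """
    results = []
    ids, scores, start = [], [], None
    for line_id, pos, fst in rows:
        pos = int(pos)
        if start is not None and pos - start >= window:
            results.append((ids[0], ids[-1], sum(scores) / len(scores)))
            ids, scores, start = [], [], None
        if start is None:
            start = pos  # this row opens a new window
        ids.append(line_id)
        scores.append(fst)
    if scores:  # close the final, possibly short, window
        results.append((ids[0], ids[-1], sum(scores) / len(scores)))
    return results


# The example rows from the question, averaged in blocks of 3 for brevity.
sample_lines = [
    "line chromosome fst_score",
    "1 1 0.3", "2 1 0.7", "3 1 0.3", "4 1 0.15", "5 1 0.4",
    "6 2 0.6", "7 2 0.94", "8 2 0.17", "9 2 0.19",
]
for first, last, mean in average_every_n_rows(parse_rows(sample_lines), n=3):
    print(first, last, round(mean, 3))
```

With a real file you could pass the open file handle straight in, e.g. average_every_n_rows(parse_rows(fileObj), n=10), since parse_rows only needs something it can iterate over line by line. Each result carries the first and last value from column 1, so you can see which lines an average came from. If the real file spans several chromosomes, the distance version would need to restart its window whenever the chromosome changes. That is not handled here.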