Counting the number of values within a range in a specific column
1
1
Entering edit mode
4.8 years ago
arsala521 ▴ 50

Hello everyone,

I want to use awk to count number of values within a range in a specific column, and to print ranges with the count in output. For example, 4th column of my input is like this

1000
4000
5001
7050
4000
3500

With bin size of 5000, I want to know the number of values in range 1 - 5000, 5001 - 10000 and so on. My values are greater than 100,000. I want to count the values as many times as they appear. For example 4000 should be counted twice. The output should be showing two columns showing ranges and the count like

1000-5000 4
5000-10000 2

I found many relevant answers on different forums but nothing exactly serves my purpose. I found one code but it is counting duplicate values only once.

Thanks

awk • 3.0k views
ADD COMMENT
4
Entering edit mode
4.8 years ago
zx8754 12k

Using awk:

$ awk '{print (int($1/5000)+1)*5000}' temp.txt | sort -g | uniq -c
# 4 5000
# 2 10000

Using R:

# example input
x <- c(1000, 4000, 5001, 7050, 4000, 3500)

# using cut into 5K intervals
table(cut(x, seq(0, 10000, 5000)))
# (0,5e+03] (5e+03,1e+04] 
#         4             2 

# using round
table(round(x, -4))
# 0 10000 
# 4     2
ADD COMMENT

Login before adding your answer.

Traffic: 1884 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6