Question

awk command to count specific field

6.6 years ago

saadleeshehreen ▴ 140

Hi, I have a file with the following content. I want to count how many lines have 1 in the second field.

Bacteroides fragilis,0
Bacteroides fragilis,0
Salmonella enterica,1
Salmonella enterica,1
Salmonella enterica,1
Bacteroides fragilis,0
.................................
..................................

I used the following command :

cat f1.txt | awk {'$2 == 1'} | wc -l

But it doesn't give me the answer. Please help!

command • 4.2k views

ADD COMMENT • link updated 6.6 years ago by cpad0112 21k • written 6.6 years ago by saadleeshehreen ▴ 140

I am not able to understand the input file format. You can use the following command to count the number of 1s in field 2:

grep -o ",1" input.txt  | wc -l
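One caveat: an unanchored `,1` also matches values such as `,10`. A minimal sketch of a stricter variant, assuming the 0/1 flag is the last comma-separated field on each line (the sample rows, including the `Made-up species` line, are invented for illustration):

```shell
# Hypothetical sample file; the last row ends in ,10 to show the difference.
tmp=$(mktemp)
printf '%s\n' \
  'Bacteroides fragilis,0' \
  'Salmonella enterica,1' \
  'Salmonella enterica,1' \
  'Made-up species,10' > "$tmp"

# Unanchored: counts every occurrence of ",1", including the one inside ",10".
loose=$(grep -o ',1' "$tmp" | wc -l | tr -d ' ')

# Anchored with $: counts only lines whose last field is exactly 1.
strict=$(grep -c ',1$' "$tmp")

echo "loose=$loose strict=$strict"
rm -f "$tmp"
```

If the flag is not the last field, the anchor would need to change accordingly.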

ADD REPLY • link 6.6 years ago by MSM55 ▴ 160

score 1 · Answer 1 · 2018-05-23

The format of your file as posted (if that is really how it looks) is hard to interpret. But I'll explain how awk would work here, given a properly formatted tab-separated table.

Suppose your file (say file.txt) looks like this. The <tab> and <space> markers are only for illustration; your actual file will contain real whitespace (tabs and spaces) at the positions shown:

Bacteroides<space>fragilis<tab>0
Bacteroides<space>fragilis<tab>0
Salmonella<space>enterica<tab>1
Salmonella<space>enterica<tab>1
Salmonella<space>enterica<tab>1
Bacteroides<space>fragilis<tab>0

Now if you say

awk '$2==1{print}' file.txt | wc -l

It may not work because, by default, awk splits each line on any run of whitespace (spaces and tabs alike). The space inside "Bacteroides fragilis" therefore separates two fields, so $2 is "fragilis" and the 0/1 value ends up in $3.

Hence, you must set the field separator to a tab with -F:

awk -F "\t" '$2==1{print}' file.txt | wc -l
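The effect of the separator can be checked on a small sample. This is only a sketch: the three rows are a shortened, tab-separated version of the data above, written to a temporary file.

```shell
# Small tab-separated sample (species name contains a space, then a tab, then the flag).
tmp=$(mktemp)
printf 'Bacteroides fragilis\t0\nSalmonella enterica\t1\nSalmonella enterica\t1\n' > "$tmp"

# Default separator: any run of blanks, so the species name is split in two.
default_f2=$(awk 'NR == 1 {print $2}' "$tmp")

# Tab separator: field 2 is now the 0/1 flag.
tab_f2=$(awk -F '\t' 'NR == 1 {print $2}' "$tmp")

# Count the lines whose second tab-separated field is 1, without needing wc.
count=$(awk -F '\t' '$2 == 1 {n++} END {print n + 0}' "$tmp")

echo "default \$2=$default_f2, tab \$2=$tab_f2, count=$count"
rm -f "$tmp"
```

The `n + 0` in the END block makes awk print 0 rather than an empty line when nothing matches.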

Pierre Lindenbaum · Answer 2 · 2018-05-23

1

6.6 years ago

Pierre Lindenbaum 164k

Set the field separator (-F) to a comma and increment a counter N each time column 2 is 1. At the end, print the value of N.

awk -F, '($2==1){N++;}END{print N;}' file.txt

but I think most people would use

cut -d, -f 2 file.txt | grep -c -w 1
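As a quick check, both approaches can be run against the six sample lines from the question. This is a sketch that writes the lines to a temporary file; the variable names are arbitrary.

```shell
# The six comma-separated sample lines from the question.
tmp=$(mktemp)
printf '%s\n' \
  'Bacteroides fragilis,0' \
  'Bacteroides fragilis,0' \
  'Salmonella enterica,1' \
  'Salmonella enterica,1' \
  'Salmonella enterica,1' \
  'Bacteroides fragilis,0' > "$tmp"

# awk: count lines whose second comma-separated field equals 1.
# n + 0 prints 0 instead of an empty line when no line matches.
awk_count=$(awk -F, '$2 == 1 {n++} END {print n + 0}' "$tmp")

# cut + grep: extract field 2, then count lines containing the whole word 1.
grep_count=$(cut -d, -f2 "$tmp" | grep -c -w 1)

echo "awk=$awk_count grep=$grep_count"
rm -f "$tmp"
```

Both commands should agree as long as field 2 holds a single value per line.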

ADD COMMENT • link 6.6 years ago by Pierre Lindenbaum 164k

Hi, thanks. I have another, related problem. My file looks like this:

10 Lachnoclostridium sp.   0       0       0       0       1
11 Haemophilus ducreyi     0       0       0       0       1
12 Clostridiales bacterium 0       0       0       0       1
13 Escherichia albertii    0       1       0       0       1

It has 8 fields. I just want to count the lines whose value is 1 in both field 7 and field 8. How can I do that? I used the following, but it doesn't give the right output.

awk '$4 == 0; $5 == 0; $6 == 0; $7 == 1; $8 ==1' file.txt

ADD REPLY • link updated 6.6 years ago by Pierre Lindenbaum 164k • written 6.6 years ago by saadleeshehreen ▴ 140

https://www.gnu.org/software/gawk/manual/html_node/Boolean-Ops.html

ADD REPLY • link 6.6 years ago by Pierre Lindenbaum 164k

Hi, you can use the following command:

awk  '$7==1 && $8==1 {print}' input.txt

ADD REPLY • link 6.6 years ago by MSM55 ▴ 160

or just awk '$7==1 && $8==1' input.txt
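Since the goal was a count rather than the matching lines, the same condition can drive a counter. This is a minimal sketch with made-up rows in the same 8-column layout. It assumes every species name is exactly two words, otherwise the field numbers shift.

```shell
# Made-up rows: id, genus, species, then five 0/1 flags (8 whitespace-separated fields).
tmp=$(mktemp)
printf '%s\n' \
  '10 Lachnoclostridium sp. 0 0 0 0 1' \
  '11 Haemophilus ducreyi 0 0 0 1 1' \
  '12 Clostridiales bacterium 0 0 0 1 1' \
  '13 Escherichia albertii 0 1 0 0 1' > "$tmp"

# Increment n only when both field 7 and field 8 equal 1, then print the total.
count=$(awk '$7 == 1 && $8 == 1 {n++} END {print n + 0}' "$tmp")

echo "$count"
rm -f "$tmp"
```

Piping the printing version into `wc -l` gives the same number.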

ADD REPLY • link 6.6 years ago by Pierre Lindenbaum 164k

Thanks. It works for me. :)

ADD REPLY • link 6.6 years ago by saadleeshehreen ▴ 140

If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted.

ADD REPLY • link 6.6 years ago by Pierre Lindenbaum 164k

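Try this as well. It relies on awk treating any non-zero $7 as true, so it matches the previous command only when field 7 holds nothing but 0 or 1: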

awk '$7 && $8==1'  input.txt
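The difference between the two conditions shows up once field 7 can hold something other than 0 or 1. A small sketch with invented rows (the `a b c` placeholders stand in for the id and species columns):

```shell
# Invented rows: field 7 is 1, 2 and 0 respectively; field 8 is always 1.
tmp=$(mktemp)
printf '%s\n' \
  'a b c 0 0 0 1 1' \
  'a b c 0 0 0 2 1' \
  'a b c 0 0 0 0 1' > "$tmp"

# Truthiness test: any non-zero $7 passes, so the row with 2 is counted too.
truthy=$(awk '$7 && $8 == 1 {n++} END {print n + 0}' "$tmp")

# Explicit comparison: only rows where $7 is exactly 1 are counted.
strict=$(awk '$7 == 1 && $8 == 1 {n++} END {print n + 0}' "$tmp")

echo "truthy=$truthy strict=$strict"
rm -f "$tmp"
```

For strictly binary columns the shorter form is fine; the explicit comparison is the safer default.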

ADD REPLY • link 6.6 years ago by cpad0112 21k

score 0 · Answer 3 · 2018-05-23

Though OP wants a solution in awk, here is a datamash solution.

Counts of 0s and 1s per species (organism):

$ datamash -s -t "," -g 1,2 count 2 < test.txt | sed 's/,/\t/g'
Bacteroides fragilis    0    3
Salmonella enterica     1    3

Counts of 0s and 1s only:

$ datamash -s -t "," -g 2 count 2 < test.txt | sed 's/,/\t/g'
0   3
1   3

input (from OP):

$ cat test.txt
Bacteroides fragilis,0
Bacteroides fragilis,0
Salmonella enterica,1
Salmonella enterica,1
Salmonella enterica,1
Bacteroides fragilis,0
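For comparison, the same grouped counts can be obtained with awk alone using associative arrays. This is a sketch, not a replacement for datamash. It keeps the comma as the output separator and pipes through `sort` because the iteration order of awk's `for (k in c)` is unspecified.

```shell
# Same six lines as test.txt above, written to a temporary file.
tmp=$(mktemp)
printf '%s\n' \
  'Bacteroides fragilis,0' \
  'Bacteroides fragilis,0' \
  'Salmonella enterica,1' \
  'Salmonella enterica,1' \
  'Salmonella enterica,1' \
  'Bacteroides fragilis,0' > "$tmp"

# Count lines per (species, flag) pair; the array key is "species,flag".
by_species=$(awk -F, '{c[$1 FS $2]++} END {for (k in c) print k FS c[k]}' "$tmp" | sort)

# Count lines per flag value only.
by_flag=$(awk -F, '{c[$2]++} END {for (k in c) print k FS c[k]}' "$tmp" | sort)

printf '%s\n' "$by_species" "$by_flag"
rm -f "$tmp"
```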