How to use a bash variable in awk calculation
4.2 years ago

I am using the following code to analyze a sequence file with seqtk and pipe its output to awk, which prints the total number of nucleotides and the number of nucleotides marked as N.

seqtk comp "$FILE" | awk '{x+=$2}END{print "Total Nucleotides: " x}' 
seqtk comp "$FILE" | awk '{x+=$9}END{print "Total Unknown Nucleotides: " x}' 
seqtk comp "$FILE" | awk '{x+=$9 ; y+=$2}END{print "Percent of Unknown Nucleotides: " (x/y)*100}'

Since the code above runs the seqtk calculation three times, I want to run it once, store the result in a variable, and reuse it. I have tried the following.

RESULT= seqtk comp "$FILE"
echo "$RESULT" | awk '{x+=$2}END{print "Total Nucleotides: " x}' 
echo "$RESULT" | awk '{x+=$9}END{print "Total Unknown Nucleotides: " x}' 
echo "$RESULT" | awk '{x+=$9 ; y+=$2}END{print "Percent of Unknown Nucleotides: " (x/y)*100}'

But the value of x comes out equal to 0.

bash awk • 1.3k views
4.2 years ago

Use awk's -v option to pass a shell value in as an awk variable:

awk -v awk_var=<your bash var>

awk -v x=$RESULT in your specific case
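As a rough illustration of what -v is suited for, here is a minimal sketch that passes a single scalar value from bash into awk. The variable names MIN_LEN and min and the two-line sample input are made up for the example; they are not from the question.

```bash
#!/usr/bin/env bash
# Hypothetical threshold; any single value works the same way.
MIN_LEN=60

# Made-up sample input: name <TAB> length.
# -v copies the shell value into the awk variable "min" before the program runs.
LONG=$(printf 'chr1\t100\nchr2\t50\n' | awk -v min="$MIN_LEN" '$2 >= min {print $1}')
echo "$LONG"
```

Note the quotes around "$MIN_LEN": they keep the value as one argument even if it contains whitespace.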


I tried the following

 awk -v x= $RESULT '{x+=$2}END{print "Total Nucleotides: " x}'

It didn't produce any output.


There should be no space after the x= part.

Also check whether $RESULT is set correctly (there should be no space after the = sign there either).


The $RESULT variable is empty. Not sure why.


Did you remove the space after RESULT= on your command line?

Moreover, to capture a command's output you need command substitution: either the older backtick operator (`...`) or the more modern $() operator, e.g. RESULT=$(seqtk comp "$FILE")
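To make the difference concrete, here is a minimal sketch. Since seqtk may not be installed everywhere, it uses a hypothetical stand-in function, fake_comp, that prints two made-up tab-separated records in the same column layout the question relies on (length in column 2, N count in column 9).

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for `seqtk comp "$FILE"`; the numbers are made up.
fake_comp() {
    printf 'chr1\t100\t30\t20\t20\t20\t0\t0\t10\n'
    printf 'chr2\t50\t10\t10\t10\t15\t0\t0\t5\n'
}

# With a space after "=", the shell assigns an empty string and then simply
# runs the command; its output is not captured (discarded here for clarity).
RESULT= fake_comp > /dev/null
echo "after 'RESULT= cmd': [${RESULT}]"

# Command substitution captures the command's standard output.
RESULT=$(fake_comp)
echo "$RESULT" | awk '{x+=$2} END {print "Total Nucleotides: " x}'
```

With the real tool, the second form would read RESULT=$(seqtk comp "$FILE"), as suggested above.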


After switching to the $() operator, $RESULT is now set correctly. However, I now get this error: awk: cannot open 11820664 (No such file or directory). I have tried awk -v x=$RESULT '{print "Total Nucleotides: " x[2]}' as well as awk -v x=$RESULT '{x=$2}END{print "Total Nucleotides: " x}'. 11820664 is the third value in the line of data. Thanks.
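The error is consistent with shell word splitting: an unquoted $RESULT is broken on whitespace before awk starts, so only the first field ends up in x, the second field is taken as the awk program, and the third is treated as a file name to open. The sketch below shows the splitting with a helper function, count_args, and a made-up two-record sample; both are hypothetical and only for illustration.

```bash
#!/usr/bin/env bash
# Made-up two-record sample standing in for the seqtk comp output.
RESULT=$(printf 'chr1\t100\t30\nchr2\t50\t10\n')

# Hypothetical helper: reports how many arguments it received.
count_args() { echo "$#"; }

# Unquoted: $RESULT is split on tabs and newlines into separate arguments.
unquoted=$(count_args -v x=$RESULT '{print x}')
# Quoted: the whole value stays attached to x= as a single argument.
quoted=$(count_args -v x="$RESULT" '{print x}')
echo "unquoted: $unquoted arguments, quoted: $quoted arguments"

# Positions as awk would receive them in the unquoted case.
set -- -v x=$RESULT '{print x}'
echo "taken as program text: $3, taken as first file name: $4"
```

Even when quoted, -v hands awk one string rather than input records, so field references such as $2 do not apply to it; feeding the data on standard input is usually the simpler route for multi-line output.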


Finally, I got it to work with the following code: awk '{x=$2}END{print "Total Nucleotides: " x}' <<< "$RESULT"


OK, if the value of RESULT is multi-line, it all gets a bit trickier.
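For multi-line output, one option is to skip -v entirely and let awk read the data as ordinary input, either straight from the pipe or from the stored variable via a here-string. The sketch below computes all three figures in a single awk pass. It assumes the column layout from the question (length in column 2, N count in column 9) and uses a hypothetical stand-in function, fake_comp, with made-up numbers in place of seqtk comp "$FILE". Note that it accumulates with +=, since a plain x=$2 would keep only the last record's value.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for `seqtk comp "$FILE"`; the numbers are made up.
fake_comp() {
    printf 'chr1\t100\t30\t20\t20\t20\t0\t0\t10\n'
    printf 'chr2\t50\t10\t10\t10\t15\t0\t0\t5\n'
}

# One pass over the stream: accumulate both sums, then report in END.
REPORT=$(fake_comp | awk '
    { total += $2; unknown += $9 }
    END {
        print "Total Nucleotides: " total
        print "Total Unknown Nucleotides: " unknown
        if (total > 0)   # avoid dividing by zero on empty input
            print "Percent of Unknown Nucleotides: " (unknown / total) * 100
    }')
echo "$REPORT"

# Same idea when the output is already stored in a variable (bash here-string).
RESULT=$(fake_comp)
TOTAL=$(awk '{x += $2} END {print x}' <<< "$RESULT")
echo "Total via here-string: $TOTAL"
```

The single-pass form runs the upstream command once and needs no intermediate variable; the here-string form is handy when the stored output is reused elsewhere in the script.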
