Struggle with awk command
6.2 years ago

Hello, I've been fighting with my awk command since yesterday.

I have a file (locus.txt) containing some IgH loci from mm10. It has no header, but the columns are: chr, start, end, strand and name_of_the_locus, separated by tabs.

chr12   113363298   113365156   -   gamma3
chr12   113330756   113338695   -   gamma1
chr12   113308036   113314227   -   gamma2b
chr12   113274557   113277035   -   gammaepsilon
chr12   113260153   113264625   -   alpha
chr12   113289248   113295541   -   gamma2a
chr12   113423027   113426701   -   muIgh
chr12   113225832   113255223   -   3'RR
chr12   113416247   113418358   -   IgD

What I want to do is find the minimum position in this file, i.e. the minimum of the start column (second column: 113225832, for 3'RR).

Then I want to subtract this minimum from all my positions and rearrange the file like this:

gamma3  137466   139324
gamma1  104924   112863
...etc

What I have tried so far

Search for minimum value, saved in $min :

min=`awk -v min=1000000000 '{if($2<min){min=$2}}END{print min}' locus.txt`

Then subtract the minimum from the positions and rearrange the file:

awk -F $'\t' '{$1=$4=""; print $5"\t"$2-$min"\t"$3-$min}' locus.txt

But I got this :

gamma3  0   1858
gamma1  0   7939
gamma2b 0   6191
gammaepsilon    0   2478
alpha   0   4472
gamma2a 0   6293
muIgh   0   3674
3'RR    0   29391
IgD 0   2111

The only correct result is 29391 for 3'RR

It doesn't seem like a complex problem, but I can't find a way out of this...

I suspect a casting problem, but I'm not even sure. Thanks for your help!

6.2 years ago
ATpoint 85k
## First get the minimum:
MIN=$(bc <<< $(sort -k2,2n in.file | awk 'NR == 1 {print $2}'))

## Then subtract and rearrange:
awk -v min="$MIN" 'BEGIN {OFS="\t"} {print $5, $2-min, $3-min}' in.file > out.rearranged
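
For reference, here is a small self-contained sketch of these two steps run against three of the sample rows from the question. The scratch directory and the names `workdir` and `result` are only illustrative, and the `bc` call is left out on the assumption that all coordinates are integers.

```shell
# Recreate a few rows of the question's locus.txt (tab-separated) in a scratch directory.
workdir=$(mktemp -d)
printf 'chr12\t113363298\t113365156\t-\tgamma3\n' >  "$workdir/locus.txt"
printf 'chr12\t113330756\t113338695\t-\tgamma1\n' >> "$workdir/locus.txt"
printf "chr12\t113225832\t113255223\t-\t3'RR\n"   >> "$workdir/locus.txt"

# Step 1: smallest start coordinate (column 2), taken from a numeric sort.
MIN=$(sort -k2,2n "$workdir/locus.txt" | awk 'NR == 1 {print $2}')

# Step 2: hand the shell value to awk with -v, subtract it, and print name, start, end.
result=$(awk -v min="$MIN" 'BEGIN {OFS="\t"} {print $5, $2-min, $3-min}' "$workdir/locus.txt")
printf '%s\n' "$result"

rm -r "$workdir"
```

With these rows the first output line should be gamma3, 137466 and 139324 separated by tabs, which matches the result the question asks for.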

Thanks, could you explain the bc <<< part, please?

bc is a command-line calculator (a standalone utility, not part of bash itself) that can handle floating-point numbers; <<< is a bash here-string that feeds the text to bc's standard input. For example, bc <<< "scale=9;100/4545454" gives the division result to nine decimal places. Admittedly it is not necessary in your situation: simply MIN=$(sort -k2,2n in.file | awk 'NR == 1 {print $2}') will do just fine, but I got into the habit of always including that bc call.

Outside awk, the min variable can also be obtained with GNU datamash: min=$(datamash min 2 < test.txt)

6.2 years ago

To pass a variable to awk you can use the -v option like this (not tested):

awk -v min=$min -F $'\t' '{$1=$4=""; print $5 "\t" $2 - min "\t" $3 - min}' locus.txt
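
As a rough sketch of why the original attempt printed 0 for every start: inside the single-quoted awk program the shell never expands `$min`, so awk reads it as "field number min". awk's own `min` is unset and counts as 0, so `$min` is `$0`, the whole record. After `$1=$4=""` the rebuilt record starts with a blank followed by the start coordinate, so its numeric value is the start itself, and each row gives start - start and end - start. The scratch file and the names `demo`, `broken` and `fixed` are only illustrative; `-F '\t'` is used here instead of `$'\t'` so the snippet does not depend on bash.

```shell
# One tab-separated row from the question, written to a scratch file.
demo=$(mktemp)
printf 'chr12\t113363298\t113365156\t-\tgamma3\n' > "$demo"
min=113225832

# Broken: inside single quotes the shell never sees $min. awk's own variable
# min is unset (0), so $min means $0, the whole record.
broken=$(awk -F '\t' '{$1=$4=""; print $5 "\t" $2-$min "\t" $3-$min}' "$demo")

# Fixed: -v copies the shell value into an awk variable, referenced without "$".
fixed=$(awk -v min="$min" -F '\t' '{print $5 "\t" $2-min "\t" $3-min}' "$demo")

printf 'broken: %s\nfixed:  %s\n' "$broken" "$fixed"
rm "$demo"
```

The broken line should reproduce the 0 and 1858 seen in the question for gamma3, while the fixed line gives 137466 and 139324. Blanking `$1` and `$4` is not needed for the fixed version, since only `$5`, `$2` and `$3` are printed.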

Thought this was intuitive, thanks
