5.2 years ago
Sam
I have a big file containing a huge amount of SNP data. I would like to search multiple columns (here columns 3 to 5) and keep any line in which at least one of those columns has a value between 0.5 and 1 (inclusive).
SNP NN1 NN2 NN3
1 AG 0.0 0.0 1.0
2 CC 0.2 0.0 2.0
3 CC 0.3 0.5 0.0
4 AA 0.5 0.5 1.0
5 GG 0.4 0.0 0.0
Expected output:
SNP NN1 NN2 NN3
1 AG 0.0 0.0 1.0
3 CC 0.3 0.5 0.0
4 AA 0.5 0.5 1.0
I could find awk scripts that filter on a single field, but none that query multiple fields/columns. For example:
awk -F "," '$3 >= 0.5 && $3 <= 1' input.csv > out.csv
Any help will be appreciated. Thanks a lot.
How to add images to a Biostars post
Edit by ATpoint: You have to use the image button (right of the 10101 button in the format bar), pasting the full path/link to the picture including the suffix, so https://i.ibb.co/92CjrYk/bst.png instead of the link you originally posted.
Thanks, Arup, for formatting the data. It would also be helpful to know how to paste Excel data on Biostars effectively.
You can share data snippets on GitHub and link them in the Biostars thread.
Grouping the conditions and adding the "or" operator || should do the trick:
awk -F "," '($3 >= 0.5 && $3 <= 1) || ($4 >= 0.5 && $4 <= 1) || ($5 >= 0.5 && $5 <= 1)' input.csv > out.csv
Thanks, Sus... But I have more than 100 columns. Is it possible to use a range of columns as the condition, for example anything from $3 to $100?
I suggest you embed this in a for loop within awk, iterating from the 3rd to the nth column, with an option to exit the loop (= print the line and go to the next one) as soon as one column fulfils the condition.
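A minimal sketch of that loop, assuming a comma-separated file as in the commands above. Since the expected output keeps a line when any one of the columns is in range, the loop prints the line and moves on at the first match. The file names sample.csv and out.csv, the header name GT for the genotype column, and the variable names first, last, lo and hi are made up for illustration; they are not from the original post.

```shell
# Hypothetical sample file built from the rows in the question.
# "GT" is a placeholder header for the genotype column (not in the original post).
cat > sample.csv <<'EOF'
SNP,GT,NN1,NN2,NN3
1,AG,0.0,0.0,1.0
2,CC,0.2,0.0,2.0
3,CC,0.3,0.5,0.0
4,AA,0.5,0.5,1.0
5,GG,0.4,0.0,0.0
EOF

# first/last: column range to scan; lo/hi: inclusive value range.
awk -F "," -v first=3 -v last=5 -v lo=0.5 -v hi=1 '
NR == 1 { print; next }                  # always keep the header line
{
    # Scan the requested columns, never running past the last field.
    for (i = first; i <= last && i <= NF; i++) {
        v = $i + 0                       # force a numeric comparison
        if (v >= lo && v <= hi) {
            print                        # one matching column is enough
            next                         # leave the loop, go to the next line
        }
    }
}' sample.csv > out.csv

cat out.csv
```

For a file with more than 100 columns, the same command should work with -v last=100; setting last to a very large number scans every column from first onward, because the loop also stops at NF. Note that non-numeric cells evaluate to 0 with this approach, so they never match the 0.5 to 1 range.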