Question

Value extraction from large file with bash

1

5.6 years ago

mel22 ▴ 100

Hi, I have many large files with association results. Each file contains 8 columns (the 3rd one is the p value). I need to create from each file a new one containing only the observations where the p value is < 10 e-5. How can I do this with bash? Here is a small example from these files:

         SNP      N         P        p2        or1     or2    q        q1           
 c10_pos5974849   2      0.1881      0.1881  1.1931  1.1931  0.5707    0.00
 c10_pos5975482   2      0.3225      0.3225  0.8670  0.8670  0.8840    0.00
 c11_pos68438345   2      0.6537        0.66  0.9705  0.9690  0.2856   12.29
 c11_pos107693921   2      0.8938      0.8558  1.0133  1.0250  0.1755   45.52
 c12_pos67499221   2      0.8351      0.8351  1.0236  1.0236  0.6413    0.00
 c14_pos67844869   2      0.1103      0.1915  0.7334  0.7229  0.2039   38.05
 c14_pos68073026   2     0.09954      0.1298  0.6383  0.6215  0.2662   19.11
 c14_pos68087872   2      0.3704      0.3704  1.2500  1.2500  0.7319    0.00

Thank you

SNP • 1.1k views

ADD COMMENT • link updated 5.6 years ago by Pierre Lindenbaum 166k • written 5.6 years ago by mel22 ▴ 100

1

one word : awk ! ;)

In general: if you're working with column-like data, always consider awk for processing it.
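As a minimal sketch of that advice for a single file: the awk one-liner below keeps the header line plus every row whose third field is below 1e-5. The file names `demo_assoc.txt` and `demo_assoc.filtered.txt` are hypothetical, the two `demo_snp` rows are invented so that something passes the filter, and the columns are assumed to be whitespace-separated as in the sample above.

```shell
# Build a small demo file in a temporary directory.
# The demo_snp rows are made-up data, added only so the filter has hits.
workdir=$(mktemp -d)
cat > "$workdir/demo_assoc.txt" <<'EOF'
SNP N P p2 or1 or2 q q1
c10_pos5974849 2 0.1881 0.1881 1.1931 1.1931 0.5707 0.00
c14_pos68073026 2 0.09954 0.1298 0.6383 0.6215 0.2662 19.11
demo_snp1 2 3.2e-07 4.1e-07 1.8000 1.8000 0.9000 0.00
demo_snp2 2 0.000002 0.000002 0.5500 0.5500 0.4000 0.00
EOF

# NR == 1 keeps the header line.
# ($3 + 0) forces column 3 to be compared as a number, not as a string.
awk 'NR == 1 || ($3 + 0) < 1e-5' "$workdir/demo_assoc.txt" \
    > "$workdir/demo_assoc.filtered.txt"

cat "$workdir/demo_assoc.filtered.txt"
```

awk splits on runs of spaces or tabs by default, so this form should also cope with the leading and repeated spaces in the example; pass `-F '\t'` only if the files are strictly tab-delimited.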

ADD REPLY • link 5.6 years ago by lieven.sterck 15k

1

with gnu-parallel and awk:

$ parallel --dry-run  "awk -F \"\t\" 'NR==1 {print}; \$3<=10^-5 {print}' {} > out/{.}.filter.txt" ::: *.txt

Create a folder named "out" in the current directory and run the command from that directory. With --dry-run, parallel only prints the commands it would run; remove --dry-run to actually execute them.
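If GNU parallel is not installed, roughly the same filtering can be sketched as a sequential bash loop. This is a variant under stated assumptions, not the command above: the demo files `a.txt` and `b.txt` and the `demo_snp1` row are hypothetical, the files are assumed to be tab-delimited with the p value in column 3, and the threshold is written as `1e-5` instead of `10^-5`.

```shell
# Set up a throwaway directory with two tab-delimited demo files.
# demo_snp1 is invented data so that one row passes the filter.
demo=$(mktemp -d)
mkdir "$demo/out"
printf 'SNP\tN\tP\tp2\tor1\tor2\tq\tq1\n' > "$demo/a.txt"
printf 'c10_pos5974849\t2\t0.1881\t0.1881\t1.1931\t1.1931\t0.5707\t0.00\n' >> "$demo/a.txt"
printf 'demo_snp1\t2\t3.2e-07\t4.1e-07\t1.8000\t1.8000\t0.9000\t0.00\n' >> "$demo/a.txt"
cp "$demo/a.txt" "$demo/b.txt"

# Filter each input file into out/<name>.filter.txt,
# keeping the header line and rows with column 3 <= 1e-5.
for f in "$demo"/*.txt; do
    base=$(basename "$f" .txt)
    awk -F '\t' 'NR == 1 || ($3 + 0) <= 1e-5' "$f" > "$demo/out/$base.filter.txt"
done

ls "$demo/out"
```

The loop handles one file at a time, so it is slower than parallel on many large files, but it has no extra dependency and the output naming mirrors `out/{.}.filter.txt`.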

ADD REPLY • link 5.6 years ago by cpad0112 21k

0

That's great thanks cpad0112

ADD REPLY • link 5.6 years ago by mel22 ▴ 100

score 6 · Accepted Answer · 2020-01-08

6

5.6 years ago

Pierre Lindenbaum 166k

find . -type f -name "*.common.suffix" | while read -r F ; do awk '($3=="P" || $3 < 1E-5)' "$F" > "${F}.subset.txt" ; done

ADD COMMENT • link 5.6 years ago by Pierre Lindenbaum 166k