Parsing the output into a table format?
2.4 years ago
sunnykevin97 ▴ 990

Hello everyone,

I'd like to parse the output into a nice tabular form.

Original file -

head -n 15 out.txt

OG0017773
lnL(ntime: 17  np: 21):   -308.613954      +0.000000
kappa (ts/tv) =  1.19167
tree length =   1.00231
\n
OG0017774
lnL(ntime: 17  np: 21):  -1176.541361      +0.000000
kappa (ts/tv) =  1.81927
tree length =   0.14755
\n

Expected output -

ID          lnL        kappa   tree-length
OG0017773 -308.613954  1.19167 1.00231
OG0017774 -1176.541361 1.81927 0.14755

Suggestions appreciated.

2.4 years ago

How about

paste - - - - - < out.txt | awk 'BEGIN{OFS="\t";print "ID","lnL","kappa","tree-length"}{print $1,$6,$11,$15}'
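
For reference, a small sketch of where those field numbers come from: `paste - - - - -` joins every five consecutive lines into one tab-separated row, and awk then splits that row on runs of whitespace. The record below is copied from the question; the file name `sample.txt` is only a placeholder for this illustration.

```shell
# Recreate the first record from the question: 5 lines, the last one a literal \n.
printf '%s\n' \
  'OG0017773' \
  'lnL(ntime: 17  np: 21):   -308.613954      +0.000000' \
  'kappa (ts/tv) =  1.19167' \
  'tree length =   1.00231' \
  '\n' > sample.txt

# List every whitespace-separated field with its index,
# to see which positions hold the ID, lnL, kappa and tree length.
paste - - - - - < sample.txt | awk '{ for (i = 1; i <= NF; i++) print i, $i }'
```

This only holds as long as every record spans exactly five lines; one missing or extra line shifts all records that follow.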

Only rows 1-679 were in the proper format; the others were not (see below).

   1 ID      lnL     kappa   tree-length
   2 OG0017768       -1240.508840    1.59033 0.24283
   3 OG0017769       -2607.526498    2.46138 0.35359
   4 OG0017771       -2675.448504    2.01208 0.46143
   5 OG0017773       -308.613954     1.19167 1.00231
   6 OG0017774       -1176.541361    1.81927 0.14755
   7 OG0017775       -1170.122630    1.53226 0.28171
   8 OG0017777       -716.309630     0.92042 0.82997
   9 OG0017778       -2045.420800    2.87851 0.40813
  10 OG0017779       -1402.736338    4.25832 0.48478
  11 OG0017780       -1206.121141    2.56273 0.31509
  12 OG0017781       -2127.133366    2.13188 0.37264
  13 OG0017782       -1101.531974    2.51309 0.23550
  14 OG0017784       -2929.188210    2.73322 0.27905
  .......
  .......

 680  OG0018547       -953.772958     4.69054 0.50710
 679 OG0018548       np:     (ts/tv)
 680 tree    OG0018550       -1088.130393    =
 681 tree    OG0018551       -1190.469363    =
 682 tree    OG0018552       -108.683048     =
 683 tree    OG0018554       -2086.501529    =
 684 tree    OG0018555       -4413.209468    =
 685 tree    OG0018556       -994.403381     =
 686 tree    OG0018557       -1006.241283    =
 ......
 ......
 ......

tree    OG0018692       -2079.088360    =
 803 tree    OG0018693       -796.984367     =
 804 tree    OG0018694       -2302.475465    =
 805 tree    OG0018695       -1368.123380    =
 806 tree    OG0018696
 807 lnL(ntime:      +0.000000       tree    \n
 808 lnL(ntime:      +0.000000       tree    \n
 809 lnL(ntime:      +0.000000       tree    \n
 810 lnL(ntime:      +0.000000       tree    \n
 811 lnL(ntime:      +0.000000       tree    \n
 812 lnL(ntime:      +0.000000       tree    \n
 813 lnL(ntime:      +0.000000       tree    \n
 814 lnL(ntime:      +0.000000       tree    \n
 815 lnL(ntime:      +0.000000       tree    \n
 816 lnL(ntime:      +0.000000       tree    \n

3341 lnL(ntime:      +0.000000       tree    \n
3342 lnL(ntime:      +0.000000       tree    \n
3343 lnL(ntime:      +0.000000       tree    \n
3344 lnL(ntime:      +0.000000       tree    \n
3345 lnL(ntime:      +0.000000       tree    \n
3346 lnL(ntime:      +0.000000       tree    \n
3347 \n      21):    =       =
3348 \n      21):    =       =
3349 \n      21):    =       =
3350 \n      21):    =       =
3351 \n      21):    =       =
3352 \n      21):    =       =
3353 \n      21):    =       =
3354 \n      21):    =       =
3355 \n      21):    =       =
3356 \n      21):    =       =
3357 \n      21):    =       =
3358 \n      21):    =       =
3359 \n      21):    =       =
3360 \n      21):    =       =
3361 \n      21):    =       =
3362 \n      21):    =       =
3363 \n      21):    =       =
3364 \n      21):    =       =
3365 \n      21):    =       =
3366 \n

Well, evidently the fields are shifted, possibly because those lines contain more or less whitespace than the others (either in the sample names or due to the tool's output). You could try to squeeze repeated whitespace into a single character:

echo "Here     are      too   many whitespaces" | tr -s '[:blank:]'

So give this a spin:

paste - - - - - < out.txt | tr -s '[:blank:]' | awk 'BEGIN{OFS="\t";print "ID","lnL","kappa","tree-length"}{print $1,$6,$11,$15}'
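
Since awk's default field splitting already treats runs of blanks as a single separator, a shift like the one shown above can also come from records that do not span exactly five lines. A rough check, assuming every record starts with an ID of the form `OG` followed by digits (as in the sample), is to report records whose length differs from five. The file `sample.txt` and its contents are made up for this illustration; on the real data the awk part would be run on out.txt.

```shell
# Hypothetical sample: the second record lacks its trailing separator line.
printf '%s\n' \
  'OG0017773' 'lnL(ntime: 17  np: 21):   -308.613954      +0.000000' \
  'kappa (ts/tv) =  1.19167' 'tree length =   1.00231' '\n' \
  'OG0017774' 'lnL(ntime: 17  np: 21):  -1176.541361      +0.000000' \
  'kappa (ts/tv) =  1.81927' 'tree length =   0.14755' \
  'OG0017775' 'lnL(ntime: 17  np: 21):  -1170.122630      +0.000000' \
  'kappa (ts/tv) =  1.53226' 'tree length =   0.28171' '\n' > sample.txt

# For each ID line, compare its line number with that of the previous ID line;
# any distance other than 5 marks a record that breaks the paste pattern.
awk '
$1 ~ /^OG[0-9]+$/ {
    if (prev && (NR - prev) != 5)
        print "record starting at line " prev " has " (NR - prev) " lines"
    prev = NR
}
END {
    # The last record runs from its ID line to the end of the file.
    if (prev && (NR - prev + 1) != 5)
        print "record starting at line " prev " has " (NR - prev + 1) " lines"
}
' sample.txt > breaks.txt

cat breaks.txt
```

The first reported line number is where the table starts to go wrong, which is a good place to look at the raw file.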

Ultimately, you will need to troubleshoot this yourself based on the data:

  • Try [:space:] instead of [:blank:].
  • Change the columns to be printed if they are off: use $14 instead of $15 to print the fourteenth instead of the fifteenth column.
  • If you can pinpoint what differs between the lines that work and those that don't, a regex in the awk command could be used to adjust the output accordingly.
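
As a sketch of that last idea, the values can be picked by what each line looks like rather than by its position, so a missing or extra line no longer shifts the columns. This assumes IDs are `OG` followed by digits and that the other lines begin with `lnL`, `kappa` and `tree length` as in the sample; `sample.txt` and `table.tsv` are placeholder names, and on the real data the input would be out.txt.

```shell
# Sample in the layout from the question; the first record deliberately
# lacks its trailing separator line, which would throw off "paste - - - - -".
printf '%s\n' \
  'OG0017773' \
  'lnL(ntime: 17  np: 21):   -308.613954      +0.000000' \
  'kappa (ts/tv) =  1.19167' \
  'tree length =   1.00231' \
  'OG0017774' \
  'lnL(ntime: 17  np: 21):  -1176.541361      +0.000000' \
  'kappa (ts/tv) =  1.81927' \
  'tree length =   0.14755' \
  '\n' > sample.txt

awk '
# Print the collected record (if any) and reset the fields.
function emit() {
    if (id != "") print id, lnl, kappa, tlen
    id = lnl = kappa = tlen = ""
}
BEGIN { OFS = "\t"; print "ID", "lnL", "kappa", "tree-length" }
$1 ~ /^OG[0-9]+$/              { emit(); id = $1; next }   # a new record starts
$1 ~ /^lnL/                    { lnl = $(NF - 1); next }   # value before +0.000000
$1 == "kappa"                  { kappa = $NF; next }
$1 == "tree" && $2 == "length" { tlen = $NF; next }
END { emit() }
' sample.txt > table.tsv

cat table.tsv
```

A record with a missing line then simply gets an empty cell in that column, which makes the problematic IDs easy to spot.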