Sed: Remove Bootstraps From MrBayes Trees
12.4 years ago
Louis ▴ 50

Hello again Biostars,

I have what should be a simple sed question, but I'm having some difficulty. I'm trying to use sed to remove bootstrap values from a NEXUS tree file created in MrBayes. Apparently, the newest version of MrBayes does not allow you to omit these values from the output, which is a problem because the next program in my pipeline has difficulty parsing them. Given the first block below, I would like to remove the value between every ":" and the "," that follows it, while keeping the ",". The edited output should look like the second block. My attempts so far have been too greedy, such as sed -e 's/:.*,//g'. Thanks!

  72 BE982029,
  73 RIMD,
  74 TX2103,
  75 3631,
  76 3646,
  77 T3937;
  tree gen.0 =  (28:1.000000000000000e-01,((33:1.000000000000000e-01,(((64:1.000000000000000e-01,54:1.000000000000000e-01):1.000000000000000e-01,35:1.000000000000000e-01):1.000000000000000e-01,(((61:1.000000000000000e-01,55:1.000000000000000e-01):1.000000000000000e-01,47:1.000000000000000e-01):1.000000000000000e-01,((31:1.000000000000000e-01,(77:1.000000000000000e-01,30:1.000000000000000e-01):1.000000000000000e-01)

  72 BE982029,
  73 RIMD,
  74 TX2103,
  75 3631,
  76 3646,
  77 T3937;
  tree gen.0 =  (28,((33,(((64,54),35),(((61,55),47),((31,(77,30))
12.4 years ago
Andreas ★ 2.5k

Use

sed -e 's/:[0-9\.e\-]\+//g'

Short explanation: substitute everything that starts with a colon and is followed by one or more characters from the set in brackets, namely a digit ([0-9]), a dot, an e, or a minus, and replace it with nothing. The backslashes inside the brackets are harmless here but not strictly needed: within a bracket expression a dot is already literal, and a minus placed last is literal too.

Edit (see also comments below): This uses GNU sed, which needs the backslash in front of the +. Note that on a Mac you very likely have BSD sed installed. Changing the backslash-plus to an asterisk will do the job as well.
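
For a form that does not rely on the GNU-only \+ escape, here is a minimal sketch. It uses the POSIX interval \{1,\}, which means "one or more" in basic regular expressions and so should behave the same under GNU and BSD sed. The tree string is a shortened, made-up example in the same format as the question, and the bracket set assumes every value is written like 1.000000000000000e-01 (no uppercase E, no plus sign).

```shell
# Shortened, hypothetical tree line in the same format as the question.
tree='(28:1.000000000000000e-01,(33:1.000000000000000e-01,64:1.000000000000000e-01):1.000000000000000e-01)'

# \{1,\} is the POSIX basic-regex spelling of "one or more",
# so the GNU-specific \+ is not needed. Inside the brackets the dot
# is literal, and the minus is literal because it comes last.
stripped=$(printf '%s\n' "$tree" | sed 's/:[0-9.e-]\{1,\}//g')

printf '%s\n' "$stripped"
```

If the values in a real file can also contain E or +, those characters would have to be added to the bracket set.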

Sorry, it doesn't work on my 64-bit Debian. Which system did you test on?

CentOS 6.1 using GNU sed.

Try again, the markup swallowed some backslashes earlier on.

Sorry, it still doesn't work! :) It spits the same line back at me.

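The backslash before the + has to be removed, and then it works fine, provided sed is run with extended regular expressions (sed -E, or sed -r with GNU sed): 's/:[0-9\.e\-]+//g'. Without that option, an unescaped + is treated as a literal plus sign.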

12.4 years ago
Arun 2.4k

As you rightly mention, sed is greedy with regular expressions: the .* matches the longest possible string, and all of it gets replaced. Instead, match a pattern that starts with :, followed by any number of characters that are not :, and then a ,.

> echo "tree gen.0 =  (28:1.000000000000000e-01,((33:1.000000000000000e-01,(((64:1.000000000000000e-01,54:1.000000000000000e-01):1.000000000000000e-01,35:1.000000000000000e-01):1.000000000000000e-01,(((61:1.000000000000000e-01,55:1.000000000000000e-01):1.000000000000000e-01,47:1.000000000000000e-01):1.000000000000000e-01,((31:1.000000000000000e-01,(77:1.000000000000000e-01,30:1.000000000000000e-01):1.000000000000000e-01)" > test.txt
> sed -e 's/:[^:]*,/,/g' test.txt
tree gen.0 =  (28,((33,(((64,54:1.000000000000000e-01),35:1.000000000000000e-01),(((61,55:1.000000000000000e-01),47:1.000000000000000e-01),((31,(77,30:1.000000000000000e-01):1.000000000000000e-01)
# there is also a pattern with )
> sed -e 's/:[^:]*,/,/g; s/:[^:]*)/)/g' test.txt
tree gen.0 =  (28,((33,(((64,54),35),(((61,55),47),((31,(77,30))
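
The difference between the three patterns is easier to see on a short string. This is a small sketch using a made-up tree fragment (not taken from the MrBayes file) with the same substitutions as above:

```shell
# Shortened, hypothetical tree fragment for illustration.
tree='((1:0.1,2:0.2):0.3,3:0.4)'

# Greedy: .* runs from the first colon to the LAST comma on the line,
# so taxa in between are swallowed as well.
greedy=$(printf '%s\n' "$tree" | sed 's/:.*,//g')

# Negated class: [^:]* cannot cross the next colon, so each match stays
# within one value. Values followed by ")" are left alone.
comma_only=$(printf '%s\n' "$tree" | sed 's/:[^:]*,/,/g')

# The second substitution also handles values followed by ")".
both=$(printf '%s\n' "$tree" | sed 's/:[^:]*,/,/g; s/:[^:]*)/)/g')

printf '%s\n%s\n%s\n' "$greedy" "$comma_only" "$both"
```

The three printed lines should be ((13:0.4), then ((1,2:0.2),3:0.4), then ((1,2),3).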
12.4 years ago
Louis ▴ 50

Andreas, your script did not work on my Mac and I'm not certain why. Regardless, thank you for the reply.

Arun, your script did the trick. I also found an additional possibility (shown below) that does the same.

sed -e 's/:[^,)]*//g'
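
When running this over a whole file, it may be safer to restrict the substitution to the tree lines and write the result to a new file. The sketch below assumes tree lines can be recognised by the text "tree " and that no other line should be edited. The file names and the shortened tree are hypothetical; the taxon lines are copied from the question.

```shell
# Build a small sample file in a temporary directory (hypothetical names).
workdir=$(mktemp -d)
cat > "$workdir/in.nex" <<'EOF'
  76 3646,
  77 T3937;
  tree gen.0 =  (28:1.000000000000000e-01,(77:1.000000000000000e-01,30:1.000000000000000e-01):1.000000000000000e-01)
EOF

# The address /tree / limits the substitution to lines containing "tree ",
# so a colon on any other line would be left untouched.
# Writing to a new file keeps the original MrBayes output intact.
sed -e '/tree /s/:[^,)]*//g' "$workdir/in.nex" > "$workdir/out.nex"

cat "$workdir/out.nex"
```

Redirecting to a new file also avoids the in-place option, whose syntax differs between GNU and BSD sed.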

oh my... how did I miss that! :)

Oh, and your reply should normally be a comment on his answer or mine, rather than a new answer.
