I'm having a bit of trouble reformatting this messed-up run log. I want to remove the strings of characters that did not translate correctly from Linux terminal stdout into the log file and replace each of those strings with a \t, a \n, or whitespace. I'm doing this for a large number of files, so I need a command-line solution.
The following malformed strings repeat for every entry in the log:
- ^[[3J^[[H^[[2J^[[1;33m
- ^[[0m^[[0;33m
- ^[[0m^[[1;33m
- ^[[0m|^H/^H-^H^H
- ^[[1;37m
- ^[[0m^[[0;37m
- ^[[0m^[[1;37m
- ^[[0m^[[0;37m
- ^[[0m^[[0;37m^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^H^[[0m^[[0;37m
- ^[[0m^[[1;32m
- ^[[0m^[[0;32m
I've tried numerous GNU sed regexes with escaped special characters to capture these, but I keep getting 's/ ' unterminated errors (I think mainly due to the opening ^ in the strings?). Any pointers on how to go about doing this with sed or awk? Is there an easier way, perhaps with some sort of find-and-replace Python/Perl script?
This is my current regex:
sed 's/\^\[\[3J\^\[\[H\^\[\[2J\^\[\[1;33m//g; s/\^\[\[0m\^\[\[0;33m//g; s/\^\[\[0m\^\[\[1;33m//g; s/\^\[\[0m|\^H\/\^H\-\^H\^H//g; s/\^\[\[1;37m//g; s/\^\[\[0m\^\[\[0;37m//g; s/\^\[\[0m//g; s/\^H//g; s/\^\[\[1;32m//g; s/\^\[\[0;32m//g' run.log > run_clean.log
I tried your command on a sample file and it worked for me.
This link might help:
https://unix.stackexchange.com/questions/14684/removing-control-chars-including-console-codes-colours-from-script-output
It's helpful to know that it works for you and that my regex is at least correct. Something else must be going wrong, then.
Based on your suggestion about color codes, I think the explanation might be that sed is a stream editor and these are ANSI terminal escape codes. If you cat the log file, the terminal renders the progress bars and colors, as shown below.
https://pasteboard.co/JvYUOyh.png
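So sed probably fails to match because the file contains the raw control bytes (ESC and backspace), which are only displayed as ^[ and ^H, while my patterns look for those literal characters.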
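For what it's worth, here is a minimal sketch of one way to strip these, assuming the log holds the real ESC (0x1b) and backspace (0x08) bytes and that ^[ and ^H are just how a viewer such as cat -v or an editor prints them. The function name strip_ansi is made up for illustration, and the patterns only cover the color/clear-screen sequences and the spinner shown in the question, not every possible terminal code.

```shell
#!/bin/sh
# strip_ansi is a hypothetical helper name, not something from the thread.
# It reads a log on stdin and writes the cleaned text to stdout.
strip_ansi() {
    # Build the real control bytes with printf, so the sed patterns match
    # the bytes in the file instead of the printable "^[" / "^H" notation.
    esc=$(printf '\033')   # ESC, usually displayed as ^[
    bs=$(printf '\010')    # backspace, usually displayed as ^H

    # 1st expression: ESC [ <digits and semicolons> <letter>, e.g. ESC[1;33m or ESC[2J
    # 2nd expression: the spinner frame |^H/^H-^H^H from the question
    # 3rd expression: any backspaces that are still left over
    sed -e "s/${esc}\[[0-9;]*[A-Za-z]//g" \
        -e "s/|${bs}\/${bs}-${bs}${bs}//g" \
        -e "s/${bs}//g"
}

# Example use on one file (run.log as in the question):
#   strip_ansi < run.log > run_clean.log
# Or across many files (assumes they match *.log in the current directory):
#   for f in *.log; do strip_ansi < "$f" > "$f.clean"; done
```

Note that this only deletes the control bytes. Text that the backspaces were meant to erase on screen (for example the progress bar) stays in the output, and if a tab or space is wanted in place of a sequence, the empty replacement in the relevant expression can be changed accordingly.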
Is this a bioinformatics question?
More of a raw-data skills question, sure. I'm working on a bioinformatics pipeline for mtDNA deletion calling using the eKLIPse deletion caller. So yes, it is related to bioinformatics in that I'm trying to clean up the eKLIPse logs.