I've added a couple of things to my bashrc that I use pretty often. Some of the simpler ones:
Setting LC_ALL=C greatly reduces the time for a lot of text-processing operations:
export LC_ALL=C
e.g. sorting a 1.8M-line BED file goes from 43 seconds with LC_ALL="" to 3.2 seconds with LC_ALL=C.
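If exporting LC_ALL=C for the whole session feels too broad, the same setting can be scoped to a single command. A minimal sketch, where the function name csort is made up for illustration and is not part of the original bashrc:

```shell
# Hypothetical helper: run sort in the C locale without touching the session locale.
csort() {
    LC_ALL=C sort "$@"
}

# In the C locale lines are compared byte by byte, so uppercase letters sort before lowercase.
printf 'b\nA\na\nB\n' | csort
```

Because the assignment is prefixed to the command, it only applies to that one sort invocation; everything else in the shell keeps its normal locale.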
A quick check to make sure all lines have the same number of columns:
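# Print the distinct field counts; a single value means every line has the same number of columns.
function check-columns(){
    awk 'BEGIN { FS = "\t" } { print NF }' "$1" | sort -u
}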
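When the check does report more than one field count, it can help to see how many lines fall into each. A possible variant, assuming the same tab-delimited input; the name count_columns and the sample data are hypothetical:

```shell
# Hypothetical variant: print how many lines have each field count.
count_columns() {
    awk 'BEGIN { FS = "\t" } { print NF }' "$1" | sort -n | uniq -c
}

# Small example file in which the last line is missing a field.
tmp=$(mktemp)
printf 'chr1\t10\t20\nchr1\t30\t40\nchr2\t5\n' > "$tmp"
count_columns "$tmp"
rm -f "$tmp"
```

Each output line is a line count followed by a field count, so here the two-field line shows up as the odd one out.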
Display tab-delimited output with the columns aligned:
alias cols="column -s$'\t' -t"
Use it like this:
head some.bed | cols
Nice question. Looking forward to the answers. I keep bumping into this LC_ALL thing. Do you have a nice link that explains what a locale is and why it matters?
IIUC, LC_ALL=C tells any program that honors it that strings are not multi-byte (e.g. Unicode), so no conversion is needed and text can be handled byte by byte.
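One way to see the "not multi-byte" part in practice is to count characters in a string containing a two-byte UTF-8 character. This is only a rough sketch: it assumes a wc that follows the locale for -m, and the sample bytes are chosen purely for illustration:

```shell
# \303\251 is the two-byte UTF-8 encoding of a single accented letter (e with acute).
# wc -c always counts bytes.
printf '\303\251' | wc -c

# wc -m counts characters; in the C locale every byte is treated as its own character,
# so the count should match the byte count rather than report one character.
printf '\303\251' | LC_ALL=C wc -m
```

Running the second command under a UTF-8 locale instead (if one is installed) should report a single character, which is the extra decoding work that the C locale skips.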