Oh yeah… I almost forgot.
In addition to my rant below…
A coworker today (an Ubuntu user) was trying to use the GNU tool ‘sort’. ‘sort’ has a switch to ignore case, but we didn’t want it to ignore case (in essence, we wanted it to preserve case, i.e. sort asciibetically), so that it would sort words alphabetically but put the capitalized words in order first and then the lowercase words in order. Like:
Alpha
Beta
alpha
beta
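For reference, the two orderings under discussion can be sketched in Python. This is only an illustration of the orderings, not how GNU ‘sort’ is implemented: comparing the encoded bytes gives the asciibetical order above, while a lowercased key gives a case-insensitive order.

```python
words = ["beta", "Alpha", "alpha", "Beta"]

# Byte-value ("asciibetical") order: every uppercase ASCII letter (65-90)
# sorts before every lowercase one (97-122).
asciibetical = sorted(words, key=lambda w: w.encode("ascii"))

# Case-insensitive order; Python's sort is stable, so words that compare
# equal after lowercasing keep their original input order.
folded = sorted(words, key=str.lower)

print(asciibetical)  # ['Alpha', 'Beta', 'alpha', 'beta']
print(folded)        # ['Alpha', 'alpha', 'beta', 'Beta']
```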
However, ‘sort’ relies on your LC_* environment variables (the locale). From the man page:
*** WARNING *** The locale specified by the environment affects sort order. Set LC_ALL=C to get the traditional sort order that uses native byte values.
In other words, if your distro sets the LC vars to use UTF8 for internationalization, it will break sort. That is all well and good, and I am a fan of being international. But Christ, man, you broke sort!! Gentoo, however, uses “POSIX” for its LC settings, thus preserving ASCII byte order, and that is what I want. No more, no less. I WANT MY ASCIIBETICAL SORT!!
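To see the man page’s advice in action, here is a small Python sketch that pipes the example words through ‘sort’ twice: once with the inherited environment and once with LC_ALL=C forced for that single child process. It assumes a ‘sort’ binary (e.g. from GNU coreutils) is on the PATH. The first result depends on whatever locale your system sets, so only the C-locale result is predictable.

```python
import os
import subprocess

words = "beta\nAlpha\nalpha\nBeta\n"

def run_sort(extra_env):
    # Copy the current environment and override only the given variables,
    # so the setting applies to this one `sort` process and nothing else.
    env = {**os.environ, **extra_env}
    return subprocess.run(
        ["sort"], input=words, capture_output=True, text=True, env=env, check=True
    ).stdout

inherited = run_sort({})                 # order depends on your LC_* settings
byte_order = run_sort({"LC_ALL": "C"})   # "native byte values" per the man page

print("inherited:", inherited.split())
print("LC_ALL=C: ", byte_order.split())  # ['Alpha', 'Beta', 'alpha', 'beta']
```

In an en_US locale the first list typically interleaves the upper- and lowercase words. The shell equivalent of the second call is `LC_ALL=C sort file`, which sets the variable for that one command instead of exporting it for the whole session.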
So you prolly think I am a Neanderthal because I want sort to work, and you might be right, but remember this:
I don’t care.
***EDIT*** Apparently, according to Mark, the part of the LC setting that matters is ‘en’, not UTF8. ***EDIT***
Actually, I find that very interesting. Did you change the var, or did you do something like break the data up, sort it, and cat it back together?
Looks like maybe it’s affecting the output of grep also. I tried ‘egrep ^[A-Z] testfile’ and it also outputs lines starting with lowercase a-z. It does ignore lines starting with non-alpha characters. That’s even more irritating than messing up sort’s case sensitivity, IMHO. (Of course, I may just have a buggy regex. Tell me if that’s the case. Please (-: )
Hey Josiah,
I just did ‘export LC_ALL=C’
Yep, if I do that, it fixes grep. How annoying!
I am not sure. The man page seems to suggest that a “bracket expression” (which I always called a character class) is affected by the LC_* env vars. However, I wouldn’t expect the regex you posted to sort based on anything other than the order the words appear in the file.
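The grep behavior can be checked the same way. This Python sketch runs ‘grep -E’ (the same as ‘egrep’) under LC_ALL=C, assuming a ‘grep’ binary is on the PATH and using made-up test lines. In the C locale a range like [A-Z] means byte values 65 to 90. In some locales, older grep versions interpret a range using the locale’s collation order, where upper- and lowercase letters are interleaved, which would explain lowercase lines matching. A named class such as [[:upper:]] sidesteps ranges entirely.

```python
import os
import subprocess

text = "Alpha\nbeta\n1st line\nGamma\ndelta\n"

def grep(pattern, extra_env):
    env = {**os.environ, **extra_env}
    # grep exits with status 1 when nothing matches, so no check=True here.
    result = subprocess.run(
        ["grep", "-E", pattern], input=text, capture_output=True, text=True, env=env
    )
    return result.stdout.splitlines()

# Range expression: in the C locale this is exactly the ASCII letters A..Z.
range_c = grep("^[A-Z]", {"LC_ALL": "C"})

# Character class: "uppercase letters" as the locale defines them,
# with no dependence on collation order.
class_c = grep("^[[:upper:]]", {"LC_ALL": "C"})

print(range_c)  # ['Alpha', 'Gamma']
print(class_c)  # ['Alpha', 'Gamma']
```

From the shell, quoting the pattern (`grep -E '^[[:upper:]]' testfile`) also keeps the brackets from being treated as a filename glob.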
Hah
Stop it! You are posting while I am posting :)
As an aside, I downloaded the source for sort last night and had a look. I think what is needed is a “preserve case” or ASCIIbetical switch.
Nate, ya big pansy!
LC_* are meant to act that way: it also affects “ls” and a multitude of other utils. I prefer to see my files in a case-insensitive order.
And it isn’t the UTF8 bit that is causing your “trouble”: it’s the “en” bit.
I am sure it is meant to act this way. However, having a switch to ignore case is stupid if there isn’t one to preserve case.
I agree; the flexibility to easily go back and forth makes sense.
So is Mark suggesting that really the tools are broken, or the configuration? I’m starting to think it’s the tools.
I downloaded the source for ‘sort’, which, if you are interested, is part of the GNU coreutils package. I just took a cursory look through it. However, it does appear that it tries to “do the right thing” depending upon the environment. So while it’s not technically broken, it would appear that it has no way of “doing the right thing” when the env vars are set to certain values. Meaning: if there is a switch to ignore case, then not passing it should preserve case; if that doesn’t happen, then the tool itself is “broken”. What is needed is a -p, preserve case, switch.
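Here is a rough idea of what the proposed switch could look like, written as a hypothetical Python function rather than a patch to coreutils. The name sort_lines and its parameters are made up for illustration. With preserve_case, it compares raw code points, which for UTF-8 text gives the same order as comparing bytes under LC_ALL=C. Otherwise it defers to the locale, as ‘sort’ does today.

```python
import locale

def sort_lines(lines, ignore_case=False, preserve_case=False):
    """Hypothetical sort offering both switches discussed above.

    preserve_case (the proposed -p): compare raw code points.
    ignore_case: compare case-folded text.
    Neither: use the current locale's collation rules.
    """
    if preserve_case and ignore_case:
        raise ValueError("ignore_case and preserve_case are mutually exclusive")
    if preserve_case:
        return sorted(lines)
    if ignore_case:
        return sorted(lines, key=str.casefold)
    # locale.strxfrm builds a sort key from the LC_COLLATE setting that is
    # currently in effect for this process.
    return sorted(lines, key=locale.strxfrm)


words = ["beta", "Alpha", "alpha", "Beta"]
print(sort_lines(words, preserve_case=True))  # ['Alpha', 'Beta', 'alpha', 'beta']
print(sort_lines(words, ignore_case=True))    # ['Alpha', 'alpha', 'beta', 'Beta']
```

Python leaves LC_COLLATE at "C" unless a program calls `locale.setlocale(locale.LC_COLLATE, "")`, so the locale branch only picks up your LC_* variables after that call.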