Running is therapy

a blog about running, cats and Linux.

Oh yeah…I almost forgot

Published by Nathan Powell on November 29, 2005 08:31 pm under computers

In addition to my rant below…

A coworker (an Ubuntu user) was trying to use the GNU tool ‘sort’ today. ‘sort’ has a switch to ignore case, but we didn’t want it to ignore case. We wanted it to preserve case (in essence, an asciibetical sort), so that the capitalized words come first in alphabetical order, followed by the lowercase words in alphabetical order. Like:

Alpha
Beta
alpha
beta

However, ‘sort’ relies on your LC_* env vars. From the man page:


***  WARNING  *** The locale specified by the environment affects sort order.  Set LC_ALL=C to get the traditional sort order that uses native byte values.

In other words, if your distro sets the LC vars to use UTF8 for internationalization, it will break sort. That is all well and good, and I am a fan of being international. But christ man, you broke sort!! Gentoo, however, uses “POSIX” for its LC settings, thus preserving ASCII, and that is what I want. No more, no less. I WANT MY ASCIIBETICAL SORT!!
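For what it’s worth, here is a minimal sketch of the workaround the man page points at: setting LC_ALL=C for a single sort invocation rather than for the whole session. The function name asciibetical_sort is made up for this example; it is not something sort or coreutils provides.

```shell
# Hypothetical helper (the name is invented for this example):
# run sort with byte-value collation for this one command only.
asciibetical_sort() {
    LC_ALL=C sort "$@"
}

# The four words from the example above, deliberately shuffled.
printf '%s\n' beta Alpha alpha Beta | asciibetical_sort
# Alpha
# Beta
# alpha
# beta
```

Because the variable is set only as a prefix on the sort command, the rest of the shell keeps whatever locale the distro configured.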

So you prolly think I am a Neanderthal because I want sort to work, and you might be right, but remember this:

I don’t care.

***EDIT*** Apparently, according to Mark, the LC part that matters is ‘en’, not UTF8. ***EDIT***
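Since the ordering rules live in the collation part of the locale, a narrower sketch is to override only LC_COLLATE and leave the character encoding settings alone. This assumes nothing else is forcing the locale: a non-empty LC_ALL takes precedence over the individual LC_* variables, so the example blanks it for that one command. The helper name sort_c_collate is hypothetical.

```shell
# Hypothetical helper: change only the collation order to byte values.
# LC_ALL is set to empty for this command because a non-empty LC_ALL
# would override LC_COLLATE.
sort_c_collate() {
    LC_ALL= LC_COLLATE=C sort "$@"
}

printf '%s\n' beta Alpha alpha Beta | sort_c_collate
# Alpha
# Beta
# alpha
# beta
```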

12 Comments so far

  1. Josiah Ritchie on November 30th, 2005

    Actually, I find that very interesting. Did you change the var or did you do something like break the data up, sort and cat it?

  2. Josiah Ritchie on November 30th, 2005

    Looks like maybe it’s affecting the output of grep also. I tried ‘egrep ^[A-Z] testfile’ and it outputs lines starting with a-z also. It does ignore lines starting with non-alpha characters. That’s even more irritating than messing up sort’s case sensitivity IMHO. (Of course, I may just have a buggy regex. Tell me if that be the case. Please (-: )

  3. Administrator on November 30th, 2005

    Hey Josiah,

    I just did ‘export LC_ALL=C’

  4. Josiah Ritchie on November 30th, 2005

    Yep, if I do that, it fixes grep. How annoying!
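A small sketch of what Josiah describes here: under LC_ALL=C the range [A-Z] covers only the 26 uppercase ASCII letters. The pattern is quoted so the shell cannot expand it as a filename glob, and starts_upper is a made-up name for this example.

```shell
# Hypothetical helper: print only the lines that start with an
# uppercase ASCII letter, using byte-value ranges for this command.
starts_upper() {
    LC_ALL=C grep '^[A-Z]' "$@"
}

printf '%s\n' Alpha beta 9lives Gamma | starts_upper
# Alpha
# Gamma
```

Another option that does not depend on the collation order is the POSIX character class, as in grep '^[[:upper:]]' testfile.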

  5. Administrator on November 30th, 2005

    I am not sure. The man page seems to suggest that a “bracket expression” (which I always called a character class) is affected by the LC_* env vars. However, I wouldn’t expect the regex you posted to sort based on anything other than the order the words appear in the file.

  6. Administrator on November 30th, 2005

    Hah
    Stop it! You are posting while I am posting :)

  7. Administrator on November 30th, 2005

    As an aside, I downloaded the source for sort last night and had a look. I think what is needed is a “preserve case” or ASCIIbetical switch.

  8. Mark A. Hershberger on November 30th, 2005

    Nate, ya big pansy!

    LC_* are meant to act that way: it also affects “ls” and a multitude of other utils. I prefer to see my files in a case-insensitive order.

    And, it isn’t the UTF8 bit that is causing your “trouble”: it’s the “en” bit.

  9. Administrator on November 30th, 2005

    I am sure it is meant to act this way. However, having a switch to ignore case is stupid if there isn’t one to preserve case.

  10. Josiah Ritchie on December 2nd, 2005

    I agree, the flexibility to easily go back and forth makes sense.

    So is Mark suggesting that it is really the tools that are broken, or the configuration? I’m starting to think it’s the tools.

  11. Administrator on December 3rd, 2005

    I downloaded the source for ‘sort’, which, if you are interested, is part of the GNU coreutils package. I just did a cursory look through it. However, it does appear that it tries to “do the right thing” depending upon the env. So while not technically broken, it would appear that it has no way of “doing the right thing” when the env vars are set to certain values. Meaning, if there is a switch to ignore case, and not passing it would preserve case, but that doesn’t happen…then the tool itself is “broken”. What is needed is a -p, preserve case switch.
