Alphabetizing

Posted on May 12, 2007

Now I’ll get into the whole alphabetical order thing. Lingo uses Unicode (stored as UTF-8, which is one way of encoding Unicode) to display letters and characters. This isn’t anything special, as most modern web programs use Unicode. Unicode is sort of like an international character set. Think of it as a huge alphabet made from the alphabets of other languages. It’s not the best analogy, because languages like Chinese have ideograms, not alphabets. Unicode is supposed to encompass every letter, character, numeral, and punctuation mark from all of the major languages and some of the smaller ones, too (whether you can actually see them all depends on the fonts installed on your PC).

Unicode can be alphabetized, sort of. It’s more like it puts things in a certain order. Before Unicode there was ASCII, so for backwards compatibility, the first 128 spots in Unicode were taken up by the ASCII character set, in the exact same order. After some invisible control characters, it starts out with some punctuation and math symbols, moves into numerals, does some more punctuation and symbols, then does the capital letters of the English alphabet, more symbols, the lowercase English alphabet, and some more symbols. Each character in Unicode is assigned a number, called a code point, usually written in hexadecimal (capital A is U+0041, for example). This number provides the “order” for where the character sits in the Unicode set.
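To make that ordering concrete, here is a small sketch that prints the code point of a few characters. It assumes PHP 7.2 or later with the mbstring extension (for `mb_ord()`) and a source file saved as UTF-8. The helper name `describeCodePoints` is made up for this example.

```php
<?php
// Hypothetical helper: label each character with its Unicode code point.
// Assumes PHP 7.2+ with mbstring, and that this file is saved as UTF-8.
function describeCodePoints(array $chars): array
{
    $lines = [];
    foreach ($chars as $ch) {
        // mb_ord() returns the code point as an integer; %04X prints it in hex.
        $lines[] = sprintf('%s = U+%04X', $ch, mb_ord($ch, 'UTF-8'));
    }
    return $lines;
}

echo implode("\n", describeCodePoints(['!', '0', 'A', 'Y', 'a', 'я'])), "\n";
// ! = U+0021
// 0 = U+0030
// A = U+0041
// Y = U+0059
// a = U+0061
// я = U+044F
```

The numbers climb in the order described above: punctuation, numerals, capitals, lowercase, and then the Cyrillic letters well past the ASCII range.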

This causes a quirk with the sorting. Because every capital letter has a lower number than every lowercase letter, anything that is capitalized (like city names) will show up before any word that isn’t capitalized. For example, in the Russian/English dictionary I’m working on, Yekaterinburg comes right before abacus. Eventually, I want to make it a case-insensitive sort, but I haven’t worked out how yet. I spent most of the day just getting it to sort in “alphabetical-ASCII” order, so I’m too tired to tackle it now. I’m sure there’s a way to do this without changing the case of the stored words; I just have to find it. Until then, it’s good enough.
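One possible way to get a case-insensitive sort without touching the stored words is to compare lowercased copies inside a `usort()` callback. This is only a sketch, not necessarily how Lingo will end up doing it: the function name `sortIgnoringCase` and the sample words are made up, and it assumes PHP 5.3 or later with the mbstring extension and a UTF-8 source file.

```php
<?php
// Hypothetical helper: sort words ignoring case, leaving the words themselves unchanged.
// Assumes PHP 5.3+ with mbstring, and that the strings are valid UTF-8.
function sortIgnoringCase(array $words): array
{
    usort($words, function ($a, $b) {
        // Lowercase temporary copies only for the comparison;
        // mb_strtolower() handles Cyrillic as well as Latin letters.
        return strcmp(mb_strtolower($a, 'UTF-8'), mb_strtolower($b, 'UTF-8'));
    });
    return $words;
}

$words = ['zebra', 'Москва', 'Yekaterinburg', 'азбука', 'abacus'];

// Plain byte-order sort: capitals come first within each alphabet.
$plain = $words;
sort($plain, SORT_STRING);
echo implode(', ', $plain), "\n";
// Yekaterinburg, abacus, zebra, Москва, азбука

echo implode(', ', sortIgnoringCase($words)), "\n";
// abacus, Yekaterinburg, zebra, азбука, Москва
```

This is still code point order underneath, just with case folded out, so it won’t follow every rule of a real Russian dictionary. If the intl extension is available, its `Collator` class is meant for proper language-aware sorting.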

As a tip to other coders: you can put UTF-8 data into a MySQL table that isn’t set up for UTF-8. You can also pull it out and sort it with PHP. My webhost doesn’t have the utf8 character set for MySQL, which is why I’ve been digging into PHP to do it.


Filed Under: Code, Journal, LAMP, Net, Open Source, TESOL
