May 12 2007

Alphabetizing

Published by lafnlab at 12:35 am under Code,Journal,LAMP,Net,Open Source,TESOL

Now I’ll get into the whole alphabetical order thing. Lingo uses Unicode (encoded as utf-8) to display the letters and characters. This isn’t anything special, as most modern web programs use Unicode. Unicode is sort of like an international character set. Think of it as a huge alphabet made from the alphabets of many languages. It’s not the best analogy, because languages like Chinese have ideograms, not alphabets. Unicode (if you have fonts installed on your PC that cover it) is supposed to encompass every letter, character, numeral, and punctuation mark from all of the major languages and some of the smaller ones, too.

Unicode can be alphabetized, sort of. It’s more like it puts things in a certain order. Before Unicode there was ASCII, so for backwards compatibility, the first spots in Unicode were taken up by the ASCII character set, in the exact same order. After the control characters, it starts out with some punctuation and math symbols, moves into numerals, does some more punctuation and symbols, then does the capital letters of the English alphabet, more symbols, the lower case English alphabet, and some more symbols. Each character in Unicode is assigned a number called a code point, usually written in hexadecimal, which is why it looks alphanumeric. This number provides the “order” for where the character is in the Unicode set.
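To make that ordering concrete, here is a small sketch in Python (not the PHP that Lingo runs on) that prints the code point for a handful of characters. The sample characters are my own picks for illustration.

```python
# A few sample characters, chosen only for illustration.
samples = ["!", "0", "A", "Z", "a", "z", "я"]

# ord() returns the Unicode code point of a single character.
code_points = {ch: ord(ch) for ch in samples}

for ch, cp in code_points.items():
    # Show the code point in the usual U+XXXX hex form and in decimal.
    print(f"{ch!r} -> U+{cp:04X} ({cp})")
```

The capital letters get lower numbers than the lower case ones, and the Cyrillic letter lands well past the whole ASCII range.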

This causes a quirk with the sorting. Any word that is capitalized (like a city name) will show up before any word that isn’t capitalized, because every capital letter has a lower code than every lower case letter. For example, in the Russian/English dictionary I’m working on, Yekaterinburg comes right before abacus. Eventually, I want to make it a case-insensitive sort, but I haven’t figured that out yet. I spent most of the day just getting it to sort in “alphabetical-ASCII” order, so I’m too tired to work on it now. I’m sure there’s a way to do this without changing the case of the stored words; I just have to find it. Until then, it’s good enough.
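For what it’s worth, here is a rough sketch of what a case-insensitive sort could look like. It is in Python rather than PHP, and the word list is made up for the example, so treat it as an illustration of the idea and not as Lingo’s code.

```python
# Hypothetical word list, standing in for dictionary entries.
words = ["abacus", "Yekaterinburg", "zebra", "Moscow", "bread"]

# The default sort compares code points, so capitalized words come first.
code_point_order = sorted(words)

# Sorting on a case-folded key ignores case for the comparison only;
# the words themselves are returned unchanged.
case_insensitive_order = sorted(words, key=str.casefold)

print(code_point_order)
print(case_insensitive_order)
```

The point is that the comparison uses a case-folded copy of each word while the original spelling stays as it was, which is the “without changing cases” part. The same approach, comparing on a normalized key, should carry over to other languages.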

A tip for other coders: you can put utf-8 data into a MySQL table that isn’t utf-8. You can also pull it out and sort it with PHP. My webhost doesn’t have the utf8 character set for MySQL, so that’s why I’ve been digging into PHP to do it.
