Messing around with code

May 20, 2016

Coding is something I do so rarely that I often have to look online for tips and go through alot* of trial and error to get things done. On the current project, I wanted to do something differently.

In most database tables, it's common to create an ID field holding a number that serves as a unique identifier for each record (known as a primary key). It's easy to set up because the database does all the work, typically by handing each new row the next number in sequence (auto-increment), which is why it's such a popular choice. However, primary keys don't have to be numbers. The only requirement is that they be unique within the table.

For this project I wanted each primary key to be 10 characters long and consist of numbers and letters. Ten digits plus 26 lowercase letters plus 26 uppercase letters make 62 possible characters for each of the ten positions, which means (10 + 26 + 26)^10 = 62^10 = 839,299,365,868,340,224 possible unique keys, or roughly 839 quadrillion. That should be enough for any database… that I'm working on.
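To double-check that arithmetic, here is a small PHP sketch (the helper names key_space and key_space_no_repeats are hypothetical, made up for this example). The first function counts strings where every position is chosen independently; the second counts strings where no character may repeat, which is what you get if you shuffle the pool and keep the first ten characters. Both results fit in PHP's integers on a 64-bit build, so no floating-point rounding creeps in.

```php
<?php
// Possible IDs when each of $length positions can be any of $alphabet
// characters, repeats allowed: alphabet^length.
function key_space(int $alphabet, int $length): int {
    return $alphabet ** $length;
}

// Possible IDs when no character may repeat (shuffle the pool, keep the
// first $length): alphabet * (alphabet - 1) * ... * (alphabet - length + 1).
function key_space_no_repeats(int $alphabet, int $length): int {
    $count = 1;
    for ($i = 0; $i < $length; $i++) {
        $count *= $alphabet - $i;
    }
    return $count;
}

$alphabet = 10 + 26 + 26; // digits + lowercase + uppercase = 62

echo key_space($alphabet, 10), "\n";            // 839299365868340224
echo key_space_no_repeats($alphabet, 10), "\n"; // 390164706723052800
```

Forbidding repeats cuts the count to a bit under half, which is still far more keys than any table of mine will ever need.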

While playing around with the code, I realized there were a lot of other possibilities. I didn't have to limit myself to numbers and the Latin alphabet. In fact, I didn't need to use them at all. The code below is the same function I came up with, just with a different set of characters.


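// Generates a 10-character random ID from a pool of multibyte UTF-8 symbols
function id_gen() {
    // The pool of characters to draw from
    $chars = "⩽⩾⩿⪀⪁⪂⪃⪄⪅⪆⪇⪈⪉⪊⪋⪌⪍⪎⪏⪐⪑⪒⪓⪔⪕⪖⪗⪘⪙⪚⪛⪜⪝⪞⪟⪠⪯⪰⪱⪲⪳⪴⪵⪶⪷⪸⪹⪺";
    // Split into an array of whole characters; the u modifier makes the split UTF-8 aware
    $tmp = preg_split("//u", $chars, -1, PREG_SPLIT_NO_EMPTY);
    // Put the characters in random order
    shuffle($tmp);
    // Join them back into a single string
    $tmp2 = implode("", $tmp);
    // Keep the first 10 characters (characters, not bytes)
    return mb_substr($tmp2, 0, 10, "UTF-8");
}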
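Because id_gen() shuffles the whole pool and keeps the first ten characters, no symbol can ever appear twice in one ID, which shrinks the number of possible keys somewhat. It also relies on shuffle(), which the PHP manual notes is not cryptographically secure. As a hypothetical alternative (the name id_gen_independent and its $length parameter are my own, not part of the original), here is a minimal sketch that picks each character independently with random_int(), which draws from a cryptographically secure source. It assumes PHP 7 or later.

```php
<?php
// Hypothetical variant of id_gen(): each position is drawn independently,
// so characters may repeat and every pool^length string is possible.
function id_gen_independent(int $length = 10): string {
    // Same symbol pool as id_gen(); any UTF-8 string would work here
    $chars = "⩽⩾⩿⪀⪁⪂⪃⪄⪅⪆⪇⪈⪉⪊⪋⪌⪍⪎⪏⪐⪑⪒⪓⪔⪕⪖⪗⪘⪙⪚⪛⪜⪝⪞⪟⪠⪯⪰⪱⪲⪳⪴⪵⪶⪷⪸⪹⪺";
    // Split into whole characters (u modifier = UTF-8 aware)
    $pool = preg_split("//u", $chars, -1, PREG_SPLIT_NO_EMPTY);
    $last = count($pool) - 1;
    $id = "";
    for ($i = 0; $i < $length; $i++) {
        // random_int() uses a cryptographically secure random source
        $id .= $pool[random_int(0, $last)];
    }
    return $id;
}

echo id_gen_independent(), "\n";
```

Neither version checks whether an ID is already in use, so it's the table's primary key constraint that would reject a (very unlikely) duplicate, and the code doing the insert would need to generate a new ID and try again.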

Calling id_gen() in some PHP code would return something like this: ⩾⪺⪐⪒⪞⪠⪎⪉⪚⪷. Imagine a table full of those as primary keys. The key to the function is PHP's built-in mb_substr function. The normal substr function cuts a string to a given length measured in bytes, which is fine for plain ASCII but can slice a multibyte UTF-8 character in half (each of these symbols takes three bytes). The mb_substr function counts characters instead of bytes, which is why it's used in this function. The u modifier on the preg_split pattern does the same job when the string is first split into individual characters.
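A quick way to see the difference is to cut a short string of three of these symbols both ways. This is a small illustrative sketch (the variable names are mine), and it assumes the mbstring extension is installed, which id_gen() already needs for mb_substr().

```php
<?php
// Three symbols from the id_gen() pool; each is 3 bytes in UTF-8
$s = "⩽⩾⩿";

// substr() counts bytes: 4 bytes = one whole symbol plus the first byte
// of the next one, which leaves an invalid UTF-8 sequence
$bytes = substr($s, 0, 4);

// mb_substr() counts characters: the first two whole symbols (6 bytes)
$chars = mb_substr($s, 0, 2, "UTF-8");

var_dump(strlen($s));                         // int(9)
var_dump(mb_check_encoding($bytes, "UTF-8")); // bool(false)
var_dump($chars === "⩽⩾");                    // bool(true)
```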

* Fuck prescriptivist grammarians. “Alot” is a perfectly legitimate English word. If it gets used and is understood in context, it’s part of the language.