This class can be used to generate "friendly url's" where numerical id's are replaced with words. A well known example is Gfycat which uses "adjectiveadjectiveanimal
": https://gfycat.com/gracefulspanishgemsbuck
.
This class has two methods: toURL(int $id)
which returns a string generated from the id and parseUrl(string $text)
which returns the id from the parsed text. It supports custom word / category lists, an optional separator and optional formatting.
Usage:
// Load words data
$words = json_decode(file_get_contents('data/words.json'), true);
// Create new instance of UrlGenerator
$urlgen = new UrlGenerator\UrlGenerator($words);
// Convert ID to URL
$url = $urlgen->toUrl(96712);
echo sprintf("URL: '%s'\n", $url);
// Convert URL to ID
$id = $urlgen->parseUrl($url);
echo sprintf("Decoded ID: %d\n", $id);
Output (depending on the wordlist used):
URL: 'sick-viridian-wolf'
Decoded ID: 96712
The UrlGenerator
has a constructor with 4 arguments; all of which but the first are optional:
$words
: The words-structure (see below) to use as 'dictionary'$categories
(optional, see below): if you want to use a different order for categories than the default order (which is the order of the keys of the$words
argument)$separator
(optional, see below): the separator, if any, to use$format
(optional, see below): the format to use
The UrlGenerator
has two public methods:
toUrl(int $id)
: Converts an integer into a URLparseUrl(string $text)
: Converts text into an integer
You can use custom word lists; you can store these anywhere you want like in a JSON file or in a database. As long as you initialize the UrlGenerator
class with the following data structure:
[
"adjectives" => ["big", "smart", "funky"],
"colors" => ["red", "green", "blue"],
"animals" => ["cow","whale","monkey"]
]
The UrlGenerator
will automatically determine which 'categories' are available. In the above example generated url's would take the form adjective-color-animal
. Whenever this should turn out to be not enough, the UrlGenerator
automatically repeats the first category as often as needed; so this would result in adjective-adjective-color-animal
or even adjective-adjective-adjective-color-animal
and so on. However, the order of the categories can be specified by passing an array of words to the $categories
argument of the UrlGenerator
class. You could, for example, pass ['colors','adjectives','animals']
which will result in url's that take the form color-adjective-animal
or, again, when this should not be enough: color-color-color-adjective-animal
.
By default UrlGenerator
uses the -
character to separate words. This results in urls like big-red-whale
. You can specify any desired string as a separator; it helps if the separator string is not contained in any of the words. It is possible to specify an empty (''
) or null
separator. This wil result in urls like bigredwhale
. This is the closest to what Gfycat url's look like. However, you need to take extreme care that the words don't overlap. If, for example, the adjectives would contain both old
and cold
a url like genericoldpanda
will result in an ambiguous result ("generi", "cold", "panda" vs. "generic", "old", "panda"). With a carefully generated wordlist this shouldn't have to be a problem.
A few formats are supported which can, again, be specified when constructing an instance of the UrlGenerator
class. The currently supported formats are ucfirst
, lcfirst
, upper
, lower
and no-format (either an empty string (''
) or null
). All options do what their name implies; so ucfirst
would result in Big-Red-Whale
and upper
in BIG-RED-WHALE
. The "no-format" option just keeps the words in tact.
The toUrl(int $id)
method takes the ID and, basically, does a base conversion similarly to how you would convert the decimal value 967
to the hexadecimal value 3C7
. However, this time we don't have 16 'digits' (0..9, A..F) but any number of words representing a digit.
The parseUrl(string $text)
method does, basically the opposite of the toUrl(int $id)
method; it takes a string and tries to do another base conversion, similarly as how you would convert the hexadecimal value 3C7
to the decimal value 967
. However, this time it's a bit more complicated...
If we could assume there will always be a separator 'per digit' in the string, we could simply split the string at the separator and do our calculations. Even if the separator would not be used but, for example, the ucfirst
option (resulting in BigRedWhale
) we could split out the words pretty easily. However; I wanted to stay as close as possible to the Gfycat implementation. And that complicates things. It, basically, meant I had the following requirements: the url should be case insensitive and contain an optional separator.
The 'decoding' of a URL relies on a lookup table which is created when the UrlGenerator
class is initialized (which, by the way, is a pretty expensive operation; keep the instance around as long as you can if you need to generate or parse more than one URL!). I won't go into too much detail, but in essence a tree is created on a per-character-basis in reverse order. When decoding a URL the algorithm starts at the end working it's way to the beginning of the string while meanwhile working it's way down this tree and looking up word indices in their respective categories. Whenever an index is determined it can be used in the 'base-N' conversion and the algorithm continues until the beginning of the URL is reached or a lookup failed.
- Don't change your wordlist once you go into production. Imagine reassigning or reordering the value of the values
A..F
in the hexadecimal system. It will be very hard, if not impossible, to make this work correctly without resulting in incorrectly converted URL's to ID's or causing ambiguous results etc. - Use large word-lists. Don't go overboard, but categories with a handful of words don't help much (unless you don't mind either long url's (
red-blue-blue-red-red-blue-funky-monkey
for example) or have some more smaller categories). - Whatever wordlist/separator/format you decide on, once you picked it, you're stuck with it (unless you want to break all your URL's or you'll need to do some (on-the-fly?) conversion.
- When not using any separator, try to use longer, unique, words that are not contained in other words (so, for example, avoid "
old
,cold
" or "expensive
,inexpensive
") - If ambiguous words without a separator are unavoidable or desired you could consider only using the
toUrl(int $id)
method and storing the result alongside theid
in your data. Apply a unique constraint and index if you can. That way you can use the field with the URL value to do a lookup - As mentioned earlier; try to keep this class around for as long as possible. The constructor contains some fairly CPU intensive code (building the lookup table), so ideally you keep an instance alive for as long as possible.
The 'ambiguous words' problem can probably be solved in a later version by changing the iteratively lookup process into a recusive algorithm; that way when a lookup turns out to fail (again, cold
, old
for example) the next value can be tried recursively until the URL is completely decoded correctly (or still fail as a whole).
Licensed under MIT license. See LICENSE for details.