
base62's People

Contributors

1ma, grahamcampbell, magnetik, peter279k, rican7, tuupola

base62's Issues

Can't successfully convert low-value UUIDs without blowing up.

The Problem

When I tried to use this library with my phpexpertsinc/ConciseUUID project, every test failed on low-value UUIDs with this error:

InvalidArgumentException: $bytes string should contain 16 characters.

Here is a test case:

use Ramsey\Uuid\Uuid;
use Tuupola\Base62Proxy as Base62;

// $conciseUuid is one of the base62 strings from the test data below.
$uuid = Uuid::fromBytes(Base62::decode($conciseUuid));

Here's the test data:

    $badUuids = [
        '0023a441-a3a3-4d9e-bd65-de3381c3a226' => '00GHs6XflJ51yCvZ4TwH4g',
        '1ee9a026-48ef-4592-9d87-88ceea7bc35e' => '0wKXIE87UgfjIvSPLkAHao',
        '0e0aa2a8-1a10-45e4-a67a-c97b9c5a7d19' => '0QUkgNC86JAY1A8JhVZ7iT',
        '1ad0d525-97c9-4c08-ad56-59acd47e3f7c' => '0obEi3noEliUnbTQbhMrLo',
    ];

The Solution

I solved this using ext-gmp via:

    // 3. We pad zeros to the beginning, as the result returned by gmp_strval after base conversion
    // is not always 22 characters long.
    $uuid = str_pad($uuid, 22, '0', STR_PAD_LEFT);
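
The padding above is applied to the base62 string. A similar guard could be applied on the decoding side, to the bytes themselves, before they reach Uuid::fromBytes. The following is only a sketch: normalizeUuidBytes is a hypothetical helper, not part of tuupola/base62 or ramsey/uuid, and it assumes the only damage is a wrong number of leading 0x00 bytes.

```php
<?php
/**
 * Hypothetical helper: normalise decoded bytes to the 16 bytes a UUID needs.
 * Assumes the decoder only lost (or added) leading 0x00 bytes.
 */
function normalizeUuidBytes(string $bytes): string
{
    // Drop every leading NUL byte, then restore exactly as many as required.
    $significant = ltrim($bytes, "\0");
    if (strlen($significant) > 16) {
        throw new InvalidArgumentException('Decoded value does not fit in 16 bytes.');
    }

    return str_pad($significant, 16, "\0", STR_PAD_LEFT);
}

// 15 bytes: the first test UUID above with its leading 0x00 byte missing.
$short = hex2bin('23a441a3a34d9ebd65de3381c3a226');
echo bin2hex(normalizeUuidBytes($short)), PHP_EOL;
// prints "0023a441a3a34d9ebd65de3381c3a226"
```

The result of normalizeUuidBytes(Base62::decode($conciseUuid)) would then always have the length that Uuid::fromBytes expects.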

[Suggestion] Add support to encode hex directly

$hex = '123456abcdef';
Base62::encode($hex);  // "JngPBzse6O99qumM"
Base62::encode(hex2bin($hex)); // "5gOMRIbf"

It would be better to add encodeHex and decodeHex methods so that hex can be encoded directly, e.g. Base62::encodeHex($hex). Internally, encode could then delegate to encodeHex for string input:

public function encodeHex($data)
{
    return ctype_xdigit($data) ? gmp_strval(gmp_init($data, 16), 62) : '';
}

public function encode($data, $integer = false)
{
    if (is_integer($data) || true === $integer) {
        return gmp_strval(gmp_init($data, 10), 62);
    }

    return $this->encodeHex(bin2hex($data));
}
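
For anyone who wants to try the idea without ext-gmp, here is a minimal standalone sketch. It swaps the GMP calls for plain-PHP long division, assumes the GMP-style alphabet (0-9, A-Z, a-z), and, like any number-based conversion, drops leading zero digits. The function names encodeHex, decodeHex and convertBase are illustrative and are not the library's API.

```php
<?php
const BASE62_ALPHABET = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz';

/** Convert a big-endian digit array from one base to another by long division. */
function convertBase(array $digits, int $from, int $to): array
{
    $result = [];
    while (count($digits) > 0) {
        $quotient = [];
        $remainder = 0;
        foreach ($digits as $digit) {
            $acc = $remainder * $from + $digit;
            $q = intdiv($acc, $to);
            $remainder = $acc % $to;
            // Skip leading zeros of the quotient so the loop terminates.
            if (count($quotient) > 0 || $q > 0) {
                $quotient[] = $q;
            }
        }
        array_unshift($result, $remainder);
        $digits = $quotient;
    }

    return $result;
}

function encodeHex(string $hex): string
{
    if (!ctype_xdigit($hex)) {
        throw new InvalidArgumentException('Input must be a non-empty hex string.');
    }
    $out = '';
    foreach (convertBase(array_map('hexdec', str_split($hex)), 16, 62) as $d) {
        $out .= BASE62_ALPHABET[$d];
    }

    return $out;
}

function decodeHex(string $encoded): string
{
    $digits = [];
    foreach (str_split($encoded) as $char) {
        $pos = strpos(BASE62_ALPHABET, $char);
        if ($pos === false) {
            throw new InvalidArgumentException('Input contains a non-base62 character.');
        }
        $digits[] = $pos;
    }
    $hex = '';
    foreach (convertBase($digits, 62, 16) as $d) {
        $hex .= dechex($d);
    }

    return $hex;
}

echo encodeHex('123456abcdef'), PHP_EOL; // prints "5gOMRIbf"
echo decodeHex('5gOMRIbf'), PHP_EOL;     // prints "123456abcdef"
```

The first result matches the Base62::encode(hex2bin($hex)) value quoted above, which is the behaviour the suggestion asks for.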

Intolerant of truncated data, inelegant failure mode

This is something of an edge case, but it's reasonably important if encoded strings are embedded in URLs - incomplete data is better than trashed data. If I encode a string using base62, then remove 1 or more chars from the end of the string, then decode it again, it results in complete garbage output.

This is in contrast to similar encoders that degrade more gracefully, such as PHP's built-in base64 functions. In this example I've chosen an input length that does not result in padding in the base64 encoder, as deleting padding is always harmless:

$str = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ01234567';

$base62 = new Tuupola\Base62;
$e = $base62->encode($str);
$d = $base62->decode(substr($e, 0, -1));
var_dump($str, $e, $d);

$e = base64_encode($str);
$d = base64_decode(substr($e, 0, -1));
var_dump($str, $e, $d);

Output:

string(60) "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ01234567"
string(81) "4p1Nflx8o0gcd8e3xWM0Idxg3Fuxk4R0eRc9fYLj2B83X5FRXw82YB3KaoyYLuNBn1LJW2KwVqMIAhO5P"
string(60) "���`�G�`�܎���"�d�3i��3y�`��S�d6ϙ�}��6��\&}7	�l�}O�\000�!ۮG�"
string(60) "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ01234567"
string(80) "YWJjZGVmZ2hpamtsbW5vcHFyc3R1dnd4eXpBQkNERUZHSElKS0xNTk9QUVJTVFVWV1hZWjAxMjM0NTY3"
string(59) "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456"

Note that while the base62 output is unusable, the base64 output just loses a single character at the end.

I suspect this may be because the entire input string is treated as a single GMP number, rather than treating it as a byte stream, or encoding in chunks of a few chars at a time. It's noticeable when you encode nearly identical strings - in base62 they result in completely different output after the difference, rather than only differing in the areas where the input strings are different (as seen in base64). For example:

$str1 = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ01234567';
$str2 = 'abcdefghijklmnopqrstuvwxyz_BCDEFGHIJKLMNOPQRSTUVWXYZ01234567';
var_dump($base62->encode($str1), $base62->encode($str2), base64_encode($str1), base64_encode($str2));

Output (^ indicates difference in output):

string(81) "4p1Nflx8o0gcd8e3xWM0Idxg3Fuxk4R0eRc9fYLj2B83X5FRXw82YB3KaoyYLuNBn1LJW2KwVqMIAhO5P"
string(81) "4p1Nflx8o0gcd8e3xWM0Idxg3Fuxk4R0eRcBcpzqd0KogZYJMSqRuFsDRn11xfrgKLFGaYUmPtLLlF5V9"
                                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
string(80) "YWJjZGVmZ2hpamtsbW5vcHFyc3R1dnd4eXpBQkNERUZHSElKS0xNTk9QUVJTVFVWV1hZWjAxMjM0NTY3"
string(80) "YWJjZGVmZ2hpamtsbW5vcHFyc3R1dnd4eXpfQkNERUZHSElKS0xNTk9QUVJTVFVWV1hZWjAxMjM0NTY3"
                                               ^
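
To illustrate the chunked alternative suspected above, here is a rough sketch of a block-wise scheme: every 4 input bytes become 6 base62 characters (62^6 > 2^32), so a change or truncation only affects the block it falls in. This is a hypothetical design, not what the library does and not compatible with its output. The names chunkedEncode and chunkedDecode are made up, and 64-bit PHP integers are assumed.

```php
<?php
const ALPHABET = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz';
// Characters needed for a chunk of 1..4 bytes (smallest n with 62^n >= 256^bytes).
const CHARS_FOR_BYTES = [1 => 2, 2 => 3, 3 => 5, 4 => 6];

function chunkedEncode(string $data): string
{
    if ($data === '') {
        return '';
    }
    $out = '';
    foreach (str_split($data, 4) as $chunk) {
        // Read the chunk as a big-endian unsigned integer.
        $value = 0;
        foreach (str_split($chunk) as $byte) {
            $value = ($value << 8) | ord($byte);
        }
        // Emit a fixed number of base62 digits for this chunk size.
        $chars = '';
        for ($i = CHARS_FOR_BYTES[strlen($chunk)]; $i > 0; $i--) {
            $chars = ALPHABET[$value % 62] . $chars;
            $value = intdiv($value, 62);
        }
        $out .= $chars;
    }

    return $out;
}

function chunkedDecode(string $encoded): string
{
    if ($encoded === '') {
        return '';
    }
    $bytesForChars = array_flip(CHARS_FOR_BYTES); // [2 => 1, 3 => 2, 5 => 3, 6 => 4]
    $out = '';
    foreach (str_split($encoded, 6) as $group) {
        $length = strlen($group);
        if (!isset($bytesForChars[$length])) {
            break; // damaged trailing group: keep what was decoded so far
        }
        $value = 0;
        foreach (str_split($group) as $char) {
            $pos = strpos(ALPHABET, $char);
            if ($pos === false) {
                throw new InvalidArgumentException('Input contains a non-base62 character.');
            }
            $value = $value * 62 + $pos;
        }
        $bytes = '';
        for ($i = $bytesForChars[$length]; $i > 0; $i--) {
            $bytes = chr($value & 0xFF) . $bytes;
            $value >>= 8;
        }
        $out .= $bytes;
    }

    return $out;
}

$str = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ01234567';
$e = chunkedEncode($str);
var_dump(chunkedDecode($e) === $str);
var_dump(substr(chunkedDecode(substr($e, 0, -1)), 0, 56) === substr($str, 0, 56));
```

With one character removed, the first 56 bytes still decode correctly and only the final block is wrong. The cost is longer output (90 characters here versus 81 for the whole-number encoding), which is the trade-off such a design would have to accept.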

Leading 0x00 stripped from binary data

While running the tests I hit a failure on a binary string that starts with \0: the leading byte is lost in the round trip. Is this expected?

There was 1 failure:

1) Tuupola\Base62\Base62Test::testShouldEncodeAndDecodeRandomBytes
Failed asserting that two strings are equal.
--- Expected
+++ Actual
@@ @@
-Binary String: 0x00486eea2de87439fc081b892616a3b0f1f098df86e2cdd23e7d21f5f046a30a1a6662fff6c3c017b1d4853a1fdd7dc00975016d9c2801b9df659fadc6abe1109b1e1f3960367603e75bb9ddf9d8097af5948f74df585d05bbee61aff992f3d35577e31aafce7d4342d3a68da0d5ca8d46bde2f7e7f555cf6a1938c4f52bdd43
+Binary String: 0x486eea2de87439fc081b892616a3b0f1f098df86e2cdd23e7d21f5f046a30a1a6662fff6c3c017b1d4853a1fdd7dc00975016d9c2801b9df659fadc6abe1109b1e1f3960367603e75bb9ddf9d8097af5948f74df585d05bbee61aff992f3d35577e31aafce7d4342d3a68da0d5ca8d46bde2f7e7f555cf6a1938c4f52bdd43

/private/tmp/base62/tests/Base62Test.php:41

FAILURES!
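
One common remedy, used by Base58-style encoders, is to count the leading 0x00 bytes separately and emit one zero digit for each, since a big-number conversion cannot represent them. The sketch below shows that idea in plain PHP. It is not taken from the library, and the function names are illustrative.

```php
<?php
const B62 = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz';

/** Convert a big-endian digit array from one base to another by long division. */
function convertBase(array $digits, int $from, int $to): array
{
    $result = [];
    while (count($digits) > 0) {
        $quotient = [];
        $remainder = 0;
        foreach ($digits as $digit) {
            $acc = $remainder * $from + $digit;
            $q = intdiv($acc, $to);
            $remainder = $acc % $to;
            if (count($quotient) > 0 || $q > 0) {
                $quotient[] = $q;
            }
        }
        array_unshift($result, $remainder);
        $digits = $quotient;
    }

    return $result;
}

function encodePreservingZeros(string $data): string
{
    // Leading 0x00 bytes carry no numeric value, so track them separately.
    $rest = ltrim($data, "\0");
    $out = str_repeat(B62[0], strlen($data) - strlen($rest));
    if ($rest !== '') {
        foreach (convertBase(array_values(unpack('C*', $rest)), 256, 62) as $d) {
            $out .= B62[$d];
        }
    }

    return $out;
}

function decodePreservingZeros(string $encoded): string
{
    // Each leading zero digit stands for one 0x00 byte.
    $rest = ltrim($encoded, B62[0]);
    $out = str_repeat("\0", strlen($encoded) - strlen($rest));
    if ($rest !== '') {
        $digits = [];
        foreach (str_split($rest) as $char) {
            $pos = strpos(B62, $char);
            if ($pos === false) {
                throw new InvalidArgumentException('Input contains a non-base62 character.');
            }
            $digits[] = $pos;
        }
        $out .= pack('C*', ...convertBase($digits, 62, 256));
    }

    return $out;
}

$binary = hex2bin('00486eea2de87439fc');
var_dump(decodePreservingZeros(encodePreservingZeros($binary)) === $binary);
```

Because the numeric part never starts with a zero digit, the leading zeros in the encoded string can be mapped back to bytes without ambiguity.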

Which base62 algorithm does this implement?

My understanding is that there is no formal spec for base62, but that the "glowfall" implementation (despite its lack of stars) has become the de facto implementation (used the most across the most repos).

Does this follow that spec? Or a different one? Or create a new one?
