suminb / base62
Python module for base62 encoding; a URL-safe encoding for arbitrary data
License: Other
Hello,
first of all, thank you for this library. I am using it to encode 16-byte blocks, and I have noticed that leading bytes equal to 0x00 are ignored during encoding. This is due to the conversion to an integer that the library does internally. I believe this is not correct behavior: without knowing the length of the input block, you cannot reconstruct (decode) the original input from the output. In encryption (and many other areas), all bytes matter, including leading zero bytes.
I'll give an example using base64, which does this correctly:
from base64 import b64decode, b64encode
encoded = b64encode(b'\x00\x00\x01').decode()
print(encoded)
decoded = b64decode(encoded)
print(decoded)
This code yields:
AAAB
b'\x00\x00\x01'
Now your library:
import base62
encoded = base62.encodebytes(b'\x00\x00\x01')
print(encoded)
decoded = base62.decodebytes(encoded)
print(decoded)
Yields:
1
b'\x01'
As you can see, the decoded output is not equal to the input: the two leading zero bytes are missing.
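The loss can be reproduced without the library. The following is a minimal sketch, assuming the encoder converts its input to an integer with `int.from_bytes` (an assumption based on the description above, not a copy of the library's code). It also shows one possible workaround: counting the leading zero bytes separately and restoring them after the round trip. Both helper names are hypothetical.

```python
def strip_via_int(data: bytes) -> bytes:
    """Round-trip bytes through an integer, as an int-based encoder would."""
    n = int.from_bytes(data, "big")
    # An integer has no notion of leading zero bytes, so they are lost here.
    length = (n.bit_length() + 7) // 8
    return n.to_bytes(length, "big")


def roundtrip_preserving_zeros(data: bytes) -> bytes:
    """Hypothetical workaround: remember the number of leading zero bytes."""
    stripped = data.lstrip(b"\x00")
    zeros = len(data) - len(stripped)
    return b"\x00" * zeros + strip_via_int(stripped)


print(strip_via_int(b"\x00\x00\x01"))               # leading zeros dropped
print(roundtrip_preserving_zeros(b"\x00\x00\x01"))  # leading zeros restored
```

In a real encoder the zero count would have to be carried in the encoded string itself, for example as one leading zero digit per zero byte.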
I ran this code and it returns an incorrect result:
import base62
hex_id = "18815C41CB3F4E98AB4C55183553E82B"
base62.encodebytes(bytes.fromhex(hex_id)).swapcase()
# returns `KeVTaiPBuBS3akxitQdAf`, which is wrong
It should be 0KeVTaiPBuBS3akxitQdAf.
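The expected value differs from the returned one only by a leading zero digit, which suggests the reporter wants a fixed-width encoding. The following is a rough sketch of padding an encoded string to the width needed for a given input length. It assumes `0` is the zero digit of the alphabet, as in the expected value above; `base62_width` and `pad_encoded` are hypothetical helpers, not part of the library.

```python
def base62_width(n_bytes: int) -> int:
    """Smallest number of base62 digits that can hold any n_bytes-long value."""
    width = 1
    while 62 ** width < 256 ** n_bytes:
        width += 1
    return width


def pad_encoded(encoded: str, n_bytes: int, zero_char: str = "0") -> str:
    """Left-pad an encoded string to the fixed width for n_bytes of input."""
    return encoded.rjust(base62_width(n_bytes), zero_char)


print(base62_width(16))
print(pad_encoded("KeVTaiPBuBS3akxitQdAf", 16))
```

For 16-byte inputs the width is 22 digits, so the 21-character result gains one leading `0`.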
Please consider adding Python 3 trove classifiers (e.g. Programming Language :: Python :: 3) to this package to advertise Python 3 compatibility. This will also require releasing a new version on PyPI.
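As an illustration, the classifiers are usually declared as a list of strings that is passed to `setup()` through its `classifiers` argument. The excerpt below is a hypothetical fragment, not the project's actual `setup.py`, and the exact set of classifiers to list is an assumption.

```python
# Hypothetical excerpt for setup.py; pass this list as setup(classifiers=CLASSIFIERS).
CLASSIFIERS = [
    "Programming Language :: Python :: 2",
    "Programming Language :: Python :: 3",
]
```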
Our preliminary test shows promising results. Building base62 with Cython results in a great improvement in performance.
import random
import base62
for _ in range(1000000):
    value = random.randint(0, 0xffffffff)
    encoded = base62.encode(value)
    assert base62.decode(encoded) == value
(pybase62) ➜ base62 git:(develop) ✗ time python test.py
python test.py 8.41s user 0.00s system 99% cpu 8.420 total
(pybase62) ➜ base62 git:(develop) ✗ time python test.py
python test.py 6.63s user 0.00s system 99% cpu 6.636 total
That's a reduction in run time of approximately 21% (from 8.41 s to 6.63 s). More thorough tests will be conducted in the near future.
Thank you for the great package, @suminb!
Would you consider publishing wheels on PyPI in addition to the source distribution, for faster installation? This is particularly relevant for projects with long lists of dependencies made up of small packages like this one. 😅
I'm interested in base62 encoding the bytes digest of a hash: greenelab/deep-review#298.
It looks like base62.encode() takes an int, not a bytes object. I assume the solution is to convert the bytes object to an int. @suminb do you have a suggestion for the best method to use?
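One way to do the conversion on Python 3 is `int.from_bytes`. The following is a minimal sketch: `encode_int` is a hypothetical stand-in for an integer base62 encoder, and its alphabet order (digits, then uppercase, then lowercase) is an assumption.

```python
import hashlib

# Assumed alphabet order: digits, uppercase letters, lowercase letters.
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"


def encode_int(n: int) -> str:
    """Stand-in for an integer base62 encoder (not the library's code)."""
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n > 0:
        n, rem = divmod(n, 62)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))


digest = hashlib.sha256(b"example").digest()      # 32 bytes
as_int = int.from_bytes(digest, byteorder="big")  # bytes -> int (Python 3)
print(encode_int(as_int))
```

Note that any leading zero bytes in the digest are not represented in the integer, so the encoded length can vary slightly between digests.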
Someone sent me an email asking this question:
I saw your base62 script on GitHub at https://github.com/suminb/base62/blob/develop/base62.py. It's an awesome project and it saved me lots of time. But I have a question about the code:
def decode(b):
    """Decodes a base62 encoded value ``b``."""
    if b.startswith('0z'):
        b = b[2:]
    l, i, v = len(b), 0, 0
    for x in b:
        v += _value(x) * (BASE ** (l - (i + 1)))
        i += 1
    return v
In the code above, what does if b.startswith('0z') do? Why is it needed? I don't understand. Thanks.
This is a great question, in fact, and I would like to share this question along with my answer with everyone else.
The prefix 0z is something that I used to use in my old code, a decade ago or so. At that time I wanted to use different encodings within a single project. For example, 123456 (decimal), 0xf10a (hexadecimal), and 0zExid (base62) were all in use. The code that checks for the prefix 0z is there merely for backward-compatibility reasons and is not necessary in ordinary cases. If you ask base62 to encode an integer, it will not add the prefix 0z. Likewise, if you ask it to decode a base62 string without the prefix, it will work fine.
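The behavior described above can be sketched as follows. This is a simplified re-implementation for illustration, not the library's code, and the alphabet order is an assumption.

```python
# Assumed alphabet order: digits, uppercase letters, lowercase letters.
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
BASE = len(ALPHABET)


def decode(s: str) -> int:
    """Decode a base62 string, tolerating an optional legacy '0z' prefix."""
    if s.startswith("0z"):
        s = s[2:]  # drop the prefix; it carries no value
    value = 0
    for ch in s:
        value = value * BASE + ALPHABET.index(ch)
    return value


# Both spellings decode to the same integer.
print(decode("0zExid") == decode("Exid"))
```

One consequence of this design is that a zero-padded string whose digits happen to start with `0z` would be read as prefixed, which matters only if padded inputs are in use.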
P.S. It is generally okay to send me an email, but I highly recommend posting your question on GitHub for two reasons:
Changing n /= BASE to n //= BASE makes this work with Python 3, without sacrificing backward compatibility.
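The reason is that `/` on two integers is true division in Python 3 and yields a float, while `//` is floor division on both Python 2 and Python 3. The following is a small sketch of an encoding loop using `//`; it is an illustration with an assumed alphabet, not the library's exact code.

```python
# Assumed alphabet order: digits, uppercase letters, lowercase letters.
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
BASE = len(ALPHABET)


def encode(n: int) -> str:
    """Encode a non-negative integer as a base62 string."""
    if n == 0:
        return ALPHABET[0]
    out = []
    while n > 0:
        out.append(ALPHABET[n % BASE])
        n //= BASE  # floor division keeps n an int on Python 2 and 3
    return "".join(reversed(out))


print(type(124 / 62), type(124 // 62))  # on Python 3: float versus int
print(encode(62))
```

With `n /= BASE` on Python 3, `n` would become a float after the first iteration, and a float cannot be used as a string index.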