This is a C library for compressing and decompressing short strings individually. General-purpose compression utilities such as zip and gzip do not compress short strings well and often expand them. They also use a lot of memory, which makes them unusable in constrained environments such as Arduino.
Note: The present byte-code version is 2 and it replaces Unishox 1. Unishox 1 is still available as unishox1.c, but it will have to be compiled manually if it is needed.
- Compression for low memory devices such as Arduino and ESP8266
- Compression of chat application text exchanges, including emojis
- Storing compressed text in a database
- Faster retrieval speed when used as join keys
- Bandwidth and storage cost reduction for Cloud
Unishox is a hybrid encoder that combines entropy, dictionary and delta coding. It assigns fixed prefix-free codes to each letter in the Character Set (entropy coding). It also encodes repeating letter sets separately (dictionary coding). For Unicode characters, delta coding is used.
The model used for arriving at the prefix-free code is shown below:
The complete specification can be found in this article: Unishox 2 - Guaranteed Configurable Compression for Short Strings using Entropy, Dictionary and Delta encoding techniques.
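To make the delta coding idea concrete, here is a minimal sketch in C. It decodes UTF-8 text into code points and records the difference between each code point and the previous one. The helper names `utf8_next` and `utf8_deltas` are hypothetical and are not part of Unishox. The actual bit-level encoding of the deltas is defined in the specification and is not reproduced here. The sketch only shows why text in a single script tends to produce small deltas.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one UTF-8 sequence starting at s (n bytes available).
 * Stores the code point in *cp and returns the number of bytes consumed,
 * or 0 on truncated or malformed input.
 * Simplified: overlong forms and surrogates are not rejected. */
static size_t utf8_next(const unsigned char *s, size_t n, int32_t *cp)
{
    size_t len;
    int32_t v;

    if (n == 0) return 0;
    if (s[0] < 0x80)                { len = 1; v = s[0]; }
    else if ((s[0] & 0xE0) == 0xC0) { len = 2; v = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; v = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; v = s[0] & 0x07; }
    else return 0;                  /* stray continuation or invalid lead byte */
    if (len > n) return 0;          /* sequence runs past the end of input */

    for (size_t i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80) return 0;
        v = (v << 6) | (s[i] & 0x3F);
    }
    *cp = v;
    return len;
}

/* Hypothetical helper: writes the difference between each code point and
 * the previous one (the first delta is relative to 0).
 * Returns the number of deltas written, or -1 on malformed input or when
 * more than max deltas would be needed. */
int utf8_deltas(const char *in, size_t len, int32_t *deltas, int max)
{
    const unsigned char *s = (const unsigned char *)in;
    int32_t prev = 0;
    int count = 0;
    size_t pos = 0;

    while (pos < len) {
        int32_t cp;
        size_t used = utf8_next(s + pos, len - pos, &cp);
        if (used == 0 || count >= max) return -1;
        deltas[count++] = cp - prev;
        prev = cp;
        pos += used;
    }
    return count;
}
```

For the Greek letters alpha, beta, gamma (U+03B1, U+03B2, U+03B3), only the first delta is large. The following two are both 1, which is the kind of small value a delta coder can represent in few bits.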
To compile, run make, or use gcc as follows:
gcc -o test_unishox2 test_unishox2.c unishox2.c
For testing the compiled program, use:
./test_unishox2 -t
The simple API consists of two functions:
int unishox2_compress_simple(const char *in, int len, char *out);
int unishox2_decompress_simple(const char *in, int len, char *out);
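A round trip through these two functions might look like the sketch below. Several things here are assumptions rather than documented behaviour: that both functions return the number of bytes written to `out`, that a negative return signals failure, and that the fixed buffer sizes chosen are large enough for the short inputs allowed. The names `codec_fn`, `copy_codec` and `roundtrip_ok` are hypothetical. The codec is passed in through function pointers so that the sketch builds without the library. `copy_codec` is only a stand-in that copies bytes.

```c
#include <string.h>

/* Pointer type matching the two signatures listed above. */
typedef int (*codec_fn)(const char *in, int len, char *out);

/* Arbitrary limits for this sketch, not values taken from the library. */
enum { MAX_TEXT = 128, BUF_SIZE = 4 * MAX_TEXT };

/* Stand-in codec that only copies bytes, so the sketch builds on its own.
 * It is NOT part of Unishox. */
int copy_codec(const char *in, int len, char *out)
{
    memcpy(out, in, (size_t)len);
    return len;
}

/* Hypothetical helper: returns 1 if text survives compression followed by
 * decompression, 0 if it does not, and -1 if text is longer than MAX_TEXT. */
int roundtrip_ok(codec_fn compress, codec_fn decompress, const char *text)
{
    char packed[BUF_SIZE];
    char unpacked[BUF_SIZE];
    size_t n = strlen(text);

    if (n > MAX_TEXT)
        return -1;

    /* Assumed: the return value is the number of bytes written to out. */
    int clen = compress(text, (int)n, packed);
    if (clen < 0)
        return 0;

    int dlen = decompress(packed, clen, unpacked);
    return dlen == (int)n && memcmp(text, unpacked, n) == 0;
}
```

With unishox2.c linked in and the library's header included, the same check would be called as `roundtrip_ok(unishox2_compress_simple, unishox2_decompress_simple, "Hello World")`.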
To see Unishox in action, simply try to compress a string:
./test_unishox2 "Hello World"
To compress and decompress a file, use:
./test_unishox2 -c <input_file> <compressed_file>
./test_unishox2 -d <compressed_file> <decompressed_file>
Unishox does not achieve good compression ratios on large files or binary files.
Unishox supports the entire Unicode character set. As of now it supports UTF-8 as input and output encoding.
- Unishox Compression Library for Arduino Progmem
- Sqlite3 User Defined Function as loadable extension
- Sqlite3 Library for ESP32
- Sqlite3 Library for ESP8266
- Port of Unishox 1 to Python and C++ by Stephan Hadinger for Tasmota
- Python bindings for Unishox2
- Thanks to Jonathan Greenblatt for his port of Unishox2 that works on Particle Photon
- Thanks to Chris Partridge for his port of Unishox2 to CPython and his comprehensive tests using Hypothesis and extensive performance tests.
- Thanks to Stephan Hadinger for his port of Unishox1 to Python for Tasmota
In case of any issues, please email the author (Arundale Ramanathan) at [email protected] or create a GitHub issue.