phoboslab / qoi

The “Quite OK Image Format” for fast, lossless image compression

License: MIT License

C 98.46% Makefile 1.54%

qoi's Issues

GDIFF_8, GDIFF_16, GDIFF_24

So I did a quick test: I tried to replace DIFF_8, DIFF_16 and DIFF_24 with GDIFF without adding new opcodes.
Here are my results. I think that 222 diff_8 + 454 gdiff_16 + 474 gdiff_24 is optimal.

original
kodak		771	
misc		398	
screenshots	2582
textures	184	
wallpaper	10674

222 gdiff_8 + 373 gdiff_16 + 474 gdiff_24
kodak		693	
misc		412	
screenshots	2401
textures	191	
wallpaper	10501

222 gdiff_8 + 454 gdiff_16 + 555 gdiff_24
kodak		719	
misc		401	
screenshots	2501
textures	178	
wallpaper	10412

222 gdiff_8 + 454 diff_16 + 555 diff_24
kodak		772	
misc		401	
screenshots	2587
textures	185	
wallpaper	10773

222 diff_8 + 454 gdiff_16 + 555 diff_24
kodak		722	
misc		399	
screenshots	2508
textures	178	
wallpaper	10307

222 diff_8 + 454 diff_16 + 555 gdiff_24
kodak		768	
misc		398	
screenshots	2579
textures	183	
wallpaper	10616

222 diff_8 + 454 gdiff_16 + 555 gdiff_24
kodak		721	
misc		399
screenshots	2504
textures	178	
wallpaper	10296

222 diff_8 + 454 gdiff_16 + 474 gdiff_24
kodak		694	
misc		404
screenshots	2425
textures	178	
wallpaper	10184

222 diff_8 + 373 gdiff_16 + 555 gdiff_24
kodak		699	
misc		404
screenshots	2405
textures	188	
wallpaper	10352

Add notice that this library should not be used on untrusted input

First let me say that this project is very cool and I'm looking forward to seeing where it's going. I saw the announcement of this project and noticed there was no description of where this library is safe to use and no fuzz tests. I wrote a super basic libfuzzer harness that can trigger ASAN violations pretty quickly with no corpus:

#define QOI_IMPLEMENTATION
#include "qoi.h"
#include <stddef.h>
#include <stdint.h>

int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
  int w, h;
  if (size < 4) {
    return 0;
  }

  qoi_decode((void*)(data + 4), (int)(size - 4), &w, &h, *((int *)data));
  return 0;
}

$ clang -fsanitize=address,fuzzer fuzz.c && ./a.out

IMO since this project is getting a fair amount of attention rather quickly it may be wise to note that this reference implementation is purely experimental at this time and should not be used on untrusted inputs.

The final(?) specification

I want to apologize.

I may have been too quick with announcing the file format to be finished. I'm frankly overwhelmed with the attention this is getting. With all the implementations already out there, I thought it was a good idea to finalize the specification ASAP. I'm no longer sure if that was the right decision.

QOI is probably good enough the way it is now, but I'm wondering if there are things that could be done better — without sacrificing the simplicity or performance of this format.

One of these things is the fact that QOI_RUN_16 was determined to be pretty useless, and QOI could become even simpler by just removing it. Maybe there are more easy wins with a different hash function or distributing some bits differently? I don't know.

At the risk of annoying everyone: how do you all feel about giving QOI a bit more time to mature?

To be clear, the things I'd be willing to discuss here are fairly limited:

I don't want more features (higher bit depth, custom headers, more meta info...).
I'm generally against any ideas that make the format more complex (e.g. mode switches, transforms into YUV colorspace...)
I don't want to make decoding of the chunks to be dependent on some information in the header (e.g. different behaviours for 3 or 4 channels)

What I'm looking for specifically is:

Changes that make the format simpler
Changes that would yield better performance when en-/decoding
Changes that improve compression without making QOI more complex

Should we set a deadline in 2-3 weeks to produce the really-final (pinky promise) specification? Or should we just leave it as it is?

Again, I'm very sorry for the confusing messaging!

Edit: Thanks for your feedback. Let's produce the final spec by 2021.12.20.

Compiler warning; unsequenced modifications.

Compiler: Apple clang version 13.0.0 (clang-1300.0.29.3)

./qoi.h:382:14: warning: multiple unsequenced modifications to 'p' [-Wunsequenced]
int magic = QOI_READ_32(bytes, p);
^~~~~~~~~~~~~~~~~~~~~
./qoi.h:245:28: note: expanded from macro 'QOI_READ_32'
#define QOI_READ_32(B, P) (QOI_READ_16(B, P) << 16 | QOI_READ_16(B, P))
^~~~~~~~~~~~~~~~~
./qoi.h:244:33: note: expanded from macro 'QOI_READ_16'
#define QOI_READ_16(B, P) (((B[P++] & 0xff) << 8) | (B[P++] & 0xff))

The order in which P is incremented is unsequenced, so the behavior is undefined.
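
One way to sidestep the warning is to read each byte in its own statement, so that every increment of the position is sequenced. A minimal sketch, assuming helper functions replace the macros; the names read_u16_be and read_u32_be are hypothetical and not part of qoi.h:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a big-endian 16-bit value and advance *p by two bytes.
 * Each increment happens in its own statement, so the order is well defined. */
static uint32_t read_u16_be(const uint8_t *bytes, size_t *p) {
    assert(bytes != NULL && p != NULL);
    uint32_t hi = bytes[(*p)++];
    uint32_t lo = bytes[(*p)++];
    return (hi << 8) | lo;
}

/* Read a big-endian 32-bit value from two sequenced 16-bit reads. */
static uint32_t read_u32_be(const uint8_t *bytes, size_t *p) {
    uint32_t hi = read_u16_be(bytes, p);
    uint32_t lo = read_u16_be(bytes, p);
    return (hi << 16) | lo;
}
```

The same effect could probably be had by keeping the macros and splitting the reads over several statements at the call site; the function form just makes the sequencing explicit.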

16 bit variant?

I was thinking this might be cool in OpenEXR ~ at first glance, it looks like a 16 bit variant would be straightforward?

QOI codec in Rebol

Hi,

I've just included QOI codec in my Rebol version.
Oldes/Rebol3@15de62c

Here is my current result:

Cheers!

Delete "Lenna" (cropped pornography) from your test corpus

Hi, my business partner, who is an ex-Unity/Facebook engineer (and a woman), noticed you have cropped pornography in your test corpus:
https://phoboslab.org/files/qoibench/

The cropped pornography image in question is "misc/lenna.png". It should be deleted. This is very unprofessional.

Consider adding a _clang-format file

...to enforce consistency when contributing

Change wire format to little-endian

Spun out of #28 (comment) where I said:

I would instead advise to make everything little endian. Not just the header fields, but also the bytecodes. Almost all CPUs used today are little-endian, and support unaligned loads.

I just uploaded a proof of concept which showed an approx 1.05x improvement in decode speed. Copy/pasting the commit message:

Change wire format to little-endian

This is a backwards incompatible change to the wire format. For example,
the QOI_DIFF_24 bit-packing encoding changes from:

| MSB   6   5   4   3   2   1 LSB |
+---------------------------------+
|   1   1   1   0  r4  r3  r2  r1 | addr+0
|  r0  g4  g3  g2  g1  g0  b4  b3 | addr+1
|  b2  b1  b0  a4  a3  a2  a1  a0 | addr+2

to:

| MSB   6   5   4   3   2   1 LSB |
+---------------------------------+
|  r3  r2  r1  r0   0   1   1   1 | addr+0
|  b1  b0  g4  g3  g2  g1  g0  r4 | addr+1
|  a4  a3  a2  a1  a0  b4  b3  b2 | addr+2

----

Decode speed-up on an Intel NUC (Comet Lake, BXNUC10i5FNKPA):
1.07x  images/kodak
1.04x  images/misc
1.01x  images/screenshots
1.06x  images/textures
1.05x  images/wallpaper

In detail:

images/kodak
        decode ms   encode ms   decode mpps   encode mpps   size kb
libpng:       8.0       144.2         49.02          2.73       717
stbi:         8.8        84.2         44.43          4.67       979
qoi:          3.6         4.2        109.80         93.41       771
qoile:        3.3         4.2        117.70         93.41       771

images/misc
        decode ms   encode ms   decode mpps   encode mpps   size kb
libpng:       9.3        89.1         82.18          8.57       335
stbi:         8.2        77.5         93.17          9.85       497
qoi:          2.8         3.1        275.03        245.73       451
qoile:        2.7         3.1        285.92        245.95       451

images/screenshots
        decode ms   encode ms   decode mpps   encode mpps   size kb
libpng:      45.3       519.5        181.85         15.84      2219
stbi:        35.2       622.5        233.73         13.22      2821
qoi:         24.3        23.9        339.08        344.26      2582
qoile:       23.9        23.9        343.80        344.20      2582

images/textures
        decode ms   encode ms   decode mpps   encode mpps   size kb
libpng:       2.6        33.1         50.84          3.92       163
stbi:         2.5        19.8         52.26          6.56       232
qoi:          0.9         1.0        149.74        130.18       184
qoile:        0.8         1.0        158.48        130.22       184

images/wallpaper
        decode ms   encode ms   decode mpps   encode mpps   size kb
libpng:     154.4      2289.3         60.71          4.09      9224
stbi:       190.0      1455.1         49.32          6.44     13299
qoi:         71.2        76.6        131.57        122.35     10647
qoile:       67.9        76.6        138.11        122.37     10647
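
The qoile decoder reads four bytes at once through a helper called peek_u32le, which the excerpt quoted in this issue does not define. A plausible sketch of it, together with the QOI_DIFF_24 field extraction from the little-endian layout shown in the commit message; the struct name diff24 and the function unpack_diff24 are hypothetical, and the real proof of concept may differ:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Load p[0..3] as a little-endian 32-bit value. Written byte by byte so it
 * works for any host byte order and alignment. The caller must make sure
 * four bytes are readable, e.g. thanks to padding at the end of the file. */
static uint32_t peek_u32le(const uint8_t *p) {
    assert(p != NULL);
    return  (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

typedef struct { int r, g, b, a; } diff24; /* per-channel differences */

/* Extract the four 5-bit fields of a little-endian QOI_DIFF_24 chunk.
 * The 4-bit tag sits in bits 0..3; each field is stored with a bias of 15. */
static diff24 unpack_diff24(uint32_t b) {
    diff24 d;
    d.r = (int)((b >>  4) & 0x1f) - 15;
    d.g = (int)((b >>  9) & 0x1f) - 15;
    d.b = (int)((b >> 14) & 0x1f) - 15;
    d.a = (int)((b >> 19) & 0x1f) - 15;
    return d;
}
```

Every field comes out of the same 32-bit value with one shift and one mask, which is presumably where the decode speed-up comes from.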

The key difference between qoi.h and qoile.h is:

391c183
< void *qoi_decode(const void *data, int size, int *out_w, int *out_h, int channels) {
---
> void *qoile_decode(const void *data, int size, int *out_w, int *out_h, int channels) {
427c219
< 			int b1 = bytes[p++];
---
> 			uint32_t b = peek_u32le(bytes + p);
429,462c221,257
< 			if ((b1 & QOIBE_MASK_2) == QOIBE_INDEX) {
< 				px = index[b1 ^ QOIBE_INDEX];
< 			}
< 			else if ((b1 & QOIBE_MASK_3) == QOIBE_RUN_8) {
< 				run = (b1 & 0x1f);
< 			}
< 			else if ((b1 & QOIBE_MASK_3) == QOIBE_RUN_16) {
< 				int b2 = bytes[p++];
< 				run = (((b1 & 0x1f) << 8) | (b2)) + 32;
< 			}
< 			else if ((b1 & QOIBE_MASK_2) == QOIBE_DIFF_8) {
< 				px.rgba.r += ((b1 >> 4) & 0x03) - 1;
< 				px.rgba.g += ((b1 >> 2) & 0x03) - 1;
< 				px.rgba.b += ( b1       & 0x03) - 1;
< 			}
< 			else if ((b1 & QOIBE_MASK_3) == QOIBE_DIFF_16) {
< 				int b2 = bytes[p++];
< 				px.rgba.r += (b1 & 0x1f) - 15;
< 				px.rgba.g += (b2 >> 4) - 7;
< 				px.rgba.b += (b2 & 0x0f) - 7;
< 			}
< 			else if ((b1 & QOIBE_MASK_4) == QOIBE_DIFF_24) {
< 				int b2 = bytes[p++];
< 				int b3 = bytes[p++];
< 				px.rgba.r += (((b1 & 0x0f) << 1) | (b2 >> 7)) - 15;
< 				px.rgba.g +=  ((b2 & 0x7c) >> 2) - 15;
< 				px.rgba.b += (((b2 & 0x03) << 3) | ((b3 & 0xe0) >> 5)) - 15;
< 				px.rgba.a +=   (b3 & 0x1f) - 15;
< 			}
< 			else if ((b1 & QOIBE_MASK_4) == QOIBE_COLOR) {
< 				if (b1 & 8) { px.rgba.r = bytes[p++]; }
< 				if (b1 & 4) { px.rgba.g = bytes[p++]; }
< 				if (b1 & 2) { px.rgba.b = bytes[p++]; }
< 				if (b1 & 1) { px.rgba.a = bytes[p++]; }
---
> 			if ((b & QOILE_MASK_2) == QOILE_INDEX) {
> 				px = index[(b >> 2) & 63];
> 				p += 1;
> 			}
> 			else if ((b & QOILE_MASK_3) == QOILE_RUN_8) {
> 				run = ((b >> 3) & 0x1F);
> 				p += 1;
> 			}
> 			else if ((b & QOILE_MASK_3) == QOILE_RUN_16) {
> 				run = ((b >> 3) & 0x1FFF);
> 				p += 2;
> 			}
> 			else if ((b & QOILE_MASK_2) == QOILE_DIFF_8) {
> 				px.rgba.r += ((b >> 2) & 0x03) - 1;
> 				px.rgba.g += ((b >> 4) & 0x03) - 1;
> 				px.rgba.b += ((b >> 6) & 0x03) - 1;
> 				p += 1;
> 			}
> 			else if ((b & QOILE_MASK_3) == QOILE_DIFF_16) {
> 				px.rgba.r += ((b >>  3) & 0x1F) - 15;
> 				px.rgba.g += ((b >>  8) & 0x0F) - 7;
> 				px.rgba.b += ((b >> 12) & 0x0F) - 7;
> 				p += 2;
> 			}
> 			else if ((b & QOILE_MASK_4) == QOILE_DIFF_24) {
> 				px.rgba.r += ((b >>  4) & 0x1F) - 15;
> 				px.rgba.g += ((b >>  9) & 0x1F) - 15;
> 				px.rgba.b += ((b >> 14) & 0x1F) - 15;
> 				px.rgba.a += ((b >> 19) & 0x1F) - 15;
> 				p += 3;
> 			}
> 			else if ((b & QOILE_MASK_4) == QOILE_COLOR) {
> 				p += 1;
> 				if (b & 0x10) { px.rgba.r = bytes[p++]; }
> 				if (b & 0x20) { px.rgba.g = bytes[p++]; }
> 				if (b & 0x40) { px.rgba.b = bytes[p++]; }
> 				if (b & 0x80) { px.rgba.a = bytes[p++]; }

Makefile for macOS big sur

Hi, I just noticed your cool project on Hackaday.
I wanted to test this awesome tool on my MacBook.
Anyway, here's a Makefile for macOS Big Sur (sorry, no time to create a pull request...):

# Makefile that works on macos big sur with brew install libpng and then
# just running make
CC        = gcc
CFLAGS    = -Wall -O2 -arch x86_64 -I. 
HEADERS   = qoi.h
CONVERTER   = qoiconv
BENCHMARK   = qoibench
INSTALL_DIR = /usr/local/bin

all: $(CONVERTER) $(BENCHMARK)

$(CONVERTER): qoiconv.c $(HEADERS)
	$(CC) $(CFLAGS) qoiconv.c -o $(CONVERTER)
	strip $(CONVERTER)

$(BENCHMARK): qoibench.c $(HEADERS)
	$(CC) $(CFLAGS) qoibench.c -lpng -o $(BENCHMARK)
	strip $(BENCHMARK)


clean:
	@rm -rf $(CONVERTER) $(BENCHMARK)

Usage: just run make to create the 2 binaries:

make
gcc -Wall -O2 -arch x86_64 -I.  qoiconv.c -o qoiconv
strip qoiconv
gcc -Wall -O2 -arch x86_64 -I.  qoibench.c -lpng -o qoibench 
... some warnings related to libpng stuff ...
6 warnings generated.
strip qoibench

And then run qoiconv or qoibench:

$ ./qoiconv 
Usage: qoiconv <infile> <outfile>
Examples:
  qoiconv input.png output.qoi
  qoiconv input.qoi output.png

Keep up the great work.

Kind regards,
Walter

pure go implementation

See https://github.com/xfmoulet/qoi : a pure Go implementation of the QOI file format.
Performance (not tuned) is around half the C version when compiled with gcc -O3

Header endianness

Hi!
First of all, very nice work!

I've noticed a potential issue with qoi_header_t. The QOI file format is generally big-endian, but the width, height and size integers in the header are written to the output file as either little- or big-endian, depending on the machine architecture. This unfortunately makes QOI files non-portable across machines with different byte orders.

If this was not intentional, could you consider placing htonl/ntohl and htons/ntohs calls before/after packing/unpacking the mentioned integers in qoi_header_t? I can submit a PR if you don't have the time.
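
As an alternative to htonl/htons, which need platform-specific headers, the fields could be written byte by byte with shifts. A minimal sketch, assuming the header is serialized into a byte buffer; the function names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Append v at bytes[*p] as two big-endian bytes, whatever the host order. */
static void write_u16_be(uint8_t *bytes, size_t *p, uint16_t v) {
    assert(bytes != NULL && p != NULL);
    bytes[(*p)++] = (uint8_t)(v >> 8);
    bytes[(*p)++] = (uint8_t)(v & 0xff);
}

/* Append v at bytes[*p] as four big-endian bytes, whatever the host order. */
static void write_u32_be(uint8_t *bytes, size_t *p, uint32_t v) {
    assert(bytes != NULL && p != NULL);
    bytes[(*p)++] = (uint8_t)(v >> 24);
    bytes[(*p)++] = (uint8_t)(v >> 16);
    bytes[(*p)++] = (uint8_t)(v >> 8);
    bytes[(*p)++] = (uint8_t)(v & 0xff);
}
```

Because the shifts operate on values rather than on memory, the output is identical on little- and big-endian machines.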

Implementing in WebAssembly

This may be a dumb and obvious question but how would you implement this in WebAssembly so you can use it with things like Node Serverless Functions and Edge Workers?

a proposed change of encoding

first, thanks for this amazingly simple and efficient idea!
I hope you don't mind if I use something similar in my personal projects - namely as a preprocessing step before deflating directional lightmaps in my engine

I've noticed though that it has a problem with images with lots of variation in the alpha channel

I made a couple of changes in my from-scratch implementation and get ~8% size improvement over QOI on the kodim set
a side note: sometimes the full block with mask (color block in qoi) might encode better than the 24-bit block, but gives only marginal gains

new proposed encoding (also delta ranges are 2's complement, say -16..15)

the encoding differs in two modes of operation: the default one is color mode, and the other is alpha mode

bit prefix:
00    use 6-bit index (same for both modes)
01    use rgb delta -2..1 (same for both modes)
10    - color mode: use 4 bits for blue and 5 bits for red and green delta
      - alpha mode: use 4-bit for rgb and 2-bit delta for alpha
110   rle mode + 5-bit rep
      the rle mode is special, however. if another rle command follows this one, rep count is merged like this: (rep << 5) + lo 5 bits of cmd. the encoder has to make sure the chained rle commands go in big endian order to decode properly. ultimately rep-1 is encoded
1110  (24-bit code)
      - color mode: 6 bits blue, 7 bits red and green delta
      - alpha mode: 5 bits for r,g,b and a
1111  full with mask, same for alpha and color mode
      exactly the same as qoi: 4 bit channel mask, then individual channel bytes if present
      except that if mask is 0, the decoder should flip between modes (color<->alpha) and continue processing (no pixel is output in this case)

currently I simply output the mode switch byte before encoding a pixel which has alpha > 0 and < 255, this helps with images with complicated alpha. a more advanced encoder might use this byte to switch back to color mode at some point, but I don't do this

I'm also using a more sophisticated hashing (xor/shift + some mults on 32-bit pixel value), but the benefit of this is dubious
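
the chained rle commands described above could be sketched like this. this is only my reading of the description (110 prefix, 5 payload bits, rep-1 stored, most significant group first); the names rle_encode and rle_decode are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RLE_TAG  0xc0 /* 110xxxxx */
#define RLE_MASK 0xe0 /* top three bits */

/* Write count (>= 1) as chained rle commands, most significant 5 bits first.
 * Returns the number of bytes written (at most 7 for a 32-bit count). */
static size_t rle_encode(uint32_t count, uint8_t *out) {
    assert(count >= 1 && out != NULL);
    uint32_t rep = count - 1;              /* rep-1 is what gets stored */
    int shift = 0;
    size_t n = 0;
    while ((rep >> shift) > 0x1f) {        /* locate the top 5-bit group */
        shift += 5;
    }
    for (; shift >= 0; shift -= 5) {
        out[n++] = (uint8_t)(RLE_TAG | ((rep >> shift) & 0x1f));
    }
    return n;
}

/* Merge consecutive rle commands starting at in[*p]; returns the run length. */
static uint32_t rle_decode(const uint8_t *in, size_t size, size_t *p) {
    assert(*p < size && (in[*p] & RLE_MASK) == RLE_TAG);
    uint32_t rep = 0;
    while (*p < size && (in[*p] & RLE_MASK) == RLE_TAG) {
        rep = (rep << 5) + (in[*p] & 0x1f);
        (*p)++;
    }
    return rep + 1;
}
```

a run of 33 pixels then takes two bytes (groups 1 and 0), and longer runs need no dedicated 16-bit opcode.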

two passes?

First of all, this is awesome - great work!

Since encoding is 20x-50x faster, would it make sense to run a 2nd pass, with a different direction, and then keep the smallest of the 2 results?

So, basically run a "horizontal" pass, then a "vertical" pass, and keep the smallest in size.

Most likely should not be the default, but if I was using QOI, I would like to have this option.

(I haven't read the code yet)

C# implementation (.NET 6)

C# implementation:

https://github.com/NUlliiON/QoiSharp

Performance will improve over time.
Benchmarks will be added soon
NuGet package uploaded

Paint.NET plugin + C# implementation

Hi!

I really like QOI and decided to use it in my projects.
So to help myself and others I decided to make a Paint.NET file type plugin to be able to load, view, create, convert and save QOI images.

I hope you and/or others will find it useful.

https://github.com/iOrange/QoiFileTypeNet

Consider adding a sRGB flag to the header

There's no way to tell if the file contains pixels/texels in the sRGB colorspace, or not. It's a single bit, and in some applications this is very valuable.

'qoi_encode' uses hard-coded '4' instead of 'QOI_PADDING' when determining 'max_size'

Spotted while reading through "qoi.h" after finding this on reddit.

Current:

int max_size = w * h * (channels + 1) + sizeof(qoi_header_t) + 4;

Suggested:

int max_size = w * h * (channels + 1) + sizeof(qoi_header_t) + QOI_PADDING;

Is it valid to embed data in padding?

QOI format has 4-Byte padding at the end.
But, in current implementation (81b438c), it seems that we can embed data in the padding area.

For example, consider this:

#!/bin/sh

#    | "qoif"    | wid | hei | size      | QOI_COLOR    |
echo '71 6f 69 66 00 01 00 01 00 00 00 05 ff a0 b0 c0 d0' | xxd -revert -plain > foo.qoi

./qoiconv foo.qoi foo.png

This successfully creates a 1x1 image (pixel value is (0xA0, 0xB0, 0xC0, 0xD0)). Is it valid?

Need a Windows viewer app

Hi - I can add QOI to the Basis Universal repo:
https://github.com/BinomialLLC/basis_universal/

QOI is valuable because it's so fast, and I read/write A LOT of PNG's during development. So many that I optimized lodepng to be faster.

However, the library must be fuzzed with at least zzuf or we can't legally use it. That's required. I can do this as our time permits, but hopefully others will do this before us. Fuzzing is extremely important and required or we cannot use it.

Also, I need a Windows viewer app. It needs the ability to display the alpha channel as grayscale. Know of anything?

My understanding is wrong?

I tested a 4096 * 4096, 32-bit BMP (platform: Win10 x64):
Libpng encoding takes about 30 ms (BMP2PNG), decoding takes about 30 ms (PNG2BMP).
LibQOI encoding takes about 234 ms (BMP2QOI), decoding takes about 187 ms (QOI2BMP).

Why is it so different from your test? Is my understanding wrong?
test.zip

test source code: https://github.com/dbyoung720/TestQOI.git

Dual license: MIT or Public domain (like stb_image.h)

The public domain license makes it easier for us to include your header in our open source project. Otherwise we have to get permission from our corporate customers to use it. Making the license/public domain declaration compatible with stb_image.h will make it easier for commercial users to use your library.

Pure Java 8 implementation

Here is a pure Java 8 implementation of QOI:

https://github.com/saharNooby/qoi-java (no AWT dependency, so can be used on Android -- not tested though)
https://github.com/saharNooby/qoi-java-awt (BufferedImage converter)

Performance is reasonable, but the library is not heavily optimized yet.

Add column to QOIBench for compression ratio

When benching files of different size, compression ratio is a better statistic than size. Similarly, it probably makes sense to get rid of encode and decode time, since the rates are more reliable and useful info.
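
For illustration, both size-independent statistics are simple to derive from what the benchmark already measures. A rough sketch with hypothetical function names; it assumes "ratio" means encoded size as a percentage of the raw pixel data, which may not be the definition the benchmark ends up using:

```c
#include <assert.h>
#include <stddef.h>

/* Encoded size as a percentage of the raw pixel data (smaller is better). */
static double size_rate_percent(size_t raw_bytes, size_t encoded_bytes) {
    assert(raw_bytes > 0);
    return 100.0 * (double)encoded_bytes / (double)raw_bytes;
}

/* Throughput in megapixels per second for `pixels` handled in `ms` milliseconds. */
static double megapixels_per_second(size_t pixels, double ms) {
    assert(ms > 0.0);
    return ((double)pixels / 1e6) / (ms / 1e3);
}
```

Unlike raw times and sizes, both numbers can be averaged across images of different dimensions.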

Sample files

A set of simple .qoi files and corresponding .png files would be helpful for testing independent implementations.

What is the purpose of the magic number 0x2020

There's a magic number 0x2020 used at:

qoi/qoi.h

Line 406 in fda5167

(run == 0x2020 || px.v != px_prev.v || px_pos == px_end)

As it's 32d, it looks like it is the boundary condition; however, I do not immediately understand how it could ever be reached in the given context.

I'd like to suggest that it should at least be a constant with a proper name.
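
Going by the decoder excerpt quoted in the little-endian issue (QOI_RUN_8 carries 5 bits, QOI_RUN_16 carries 13 bits plus an offset of 32), 0x2020 appears to be the longest run a single QOI_RUN_16 can express: 32 + 8192 = 8224 pixels. A sketch of how that could be named, assuming a stored value v stands for v + 1 repetitions; all identifiers here are hypothetical:

```c
#include <assert.h>

enum {
    QOI_RUN_8_BITS  = 5,                     /* payload bits of QOI_RUN_8  */
    QOI_RUN_16_BITS = 13,                    /* payload bits of QOI_RUN_16 */
    QOI_RUN_8_MAX   = 1 << QOI_RUN_8_BITS,   /* longest 1-byte run: 32     */
    /* longest 2-byte run: 32 + 8192 = 8224 = 0x2020 */
    QOI_RUN_16_MAX  = QOI_RUN_8_MAX + (1 << QOI_RUN_16_BITS)
};

/* Nonzero when a pending run has hit the limit and must be written out. */
static int qoi_run_is_full(int run) {
    assert(run >= 0 && run <= QOI_RUN_16_MAX);
    return run == QOI_RUN_16_MAX;
}
```

With a name like this, the condition reads as "flush when the run is full, the pixel changes, or the input ends".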

Recent changes in experimental branch

With all that we learned through the analysis and ideas of a lot of people here, I refined QOI quite a bit. More than I thought I would.

The current state is in the experimental branch.

First of all, benchmark results for the new test suite using
qoibench 1 images/ --nopng --onlytotals

## Total for images/textures_photo/
        decode ms   encode ms   decode mpps   encode mpps   size kb    rate
master:       8.2        11.6        127.43         90.52      2522   61.6%
experi:       5.9         8.2        178.14        127.56      1981   48.4%

## Total for images/textures_pk01/
master:       0.7         1.0        186.23        126.11       184   36.4%
experi:       0.6         0.9        214.67        145.87       178   35.2%

## Total for images/screenshot_game/
master:       2.7         3.9        231.42        162.40       534   21.6%
experi:       2.6         3.4        245.06        187.25       519   21.0%

## Total for images/textures_pk/
master:       0.3         0.5        138.31         93.64        83   48.1%
experi:       0.3         0.4        159.63        110.87        75   43.5%

## Total for images/textures_pk02/
master:       2.0         2.8        155.27        110.31       504   42.5%
experi:       1.7         2.3        182.73        133.00       479   40.4%

## Total for images/icon_64/
master:       0.0         0.0        251.06        163.38         4   28.3%
experi:       0.0         0.0        343.60        266.60         5   31.3%

## Total for images/icon_512/
master:       0.6         0.9        474.50        308.36        80    7.8%
experi:       0.6         0.7        474.62        378.76       102   10.1%

## Total for images/photo_kodak/
master:       2.9         4.2        137.76         92.77       771   50.2%
experi:       2.4         3.5        166.17        111.66       671   43.7%

## Total for images/textures_plants/
master:       3.8         6.2        281.64        170.37       951   22.9%
experi:       3.3         5.0        324.00        211.07       922   22.2%

## Total for images/screenshot_web/
master:      18.1        28.2        449.27        287.79      2775    8.7%
experi:      17.5        23.2        464.81        350.15      2649    8.3%

## Total for images/pngimg/
master:       6.5        10.0        279.91        180.57      1415   20.0%
experi:       5.9         8.6        307.44        210.93      1445   20.5%

## Total for images/photo_tecnick/
master:      10.1        15.2        142.74         95.00      2710   48.2%
experi:       8.8        13.6        163.36        105.69      2527   44.9%

## Total for images/photo_wikipedia/
master:       7.8        11.7        138.75         92.50      2260   53.4%
experi:       6.7        10.4        161.91        104.37      2102   49.6%

# Grand total for images/
master:       2.1         3.1        220.85        148.50       485   26.8%
experi:       1.9         2.7        245.67        173.24       465   25.7%

As you can see throughput improved a lot, as did the compression ratio for all files without an alpha channel (icon_*/ and pngimg/ suffered a bit, but the overall compression ratio for these files is already quite high. textures_plants/ still saw improvements). For photos or photo-like images QOI now often beats libpng!

What changed? After I switched the tags for QOI_RUN (previously 2-bit tag) and QOI_GDIFF_16 (previously 4-bit tag) I noticed that QOI_GDIFF covered almost all(!) cases that were previously encoded by QOI_DIFF_16/24. So... why not remove them?

#define QOI_OP_INDEX  0x00 // 00xxxxxx
#define QOI_OP_DIFF   0x40 // 01xxxxxx (aka QOI_DIFF_8)
#define QOI_OP_LUMA   0x80 // 10xxxxxx (aka QOI_GDIFF_16)
#define QOI_OP_RUN    0xc0 // 11xxxxxx
#define QOI_OP_RGB    0xfe // 11111110 (aka QOI_COLOR with RGB)
#define QOI_OP_RGBA   0xff // 11111111 (aka QOI_COLOR with RGBA)

(see the experimental file format documentation for the details)

That is, most tags are now 2-bit, while the run-length is limited to 62 and thus leaves some room for the two 8-bit QOI_OP_RGB and QOI_OP_RGBA tags. So QOI would be even simpler than before and (probably?) gain a lot more possibilities for performance improvements:

there is no multi-byte run
there are no tags with a variable byte-length
there are no more values that cross byte boundaries
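
With this layout, classifying a chunk comes down to one comparison and one mask. A sketch under two assumptions that are not spelled out above: the mask name QOI_MASK_2, and a run bias of -1 (payload 0..61 meaning 1..62 pixels, with payloads 62 and 63 taken by the RGB/RGBA tags). The helper names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

#define QOI_OP_INDEX 0x00 /* 00xxxxxx */
#define QOI_OP_DIFF  0x40 /* 01xxxxxx */
#define QOI_OP_LUMA  0x80 /* 10xxxxxx */
#define QOI_OP_RUN   0xc0 /* 11xxxxxx */
#define QOI_OP_RGB   0xfe /* 11111110 */
#define QOI_OP_RGBA  0xff /* 11111111 */
#define QOI_MASK_2   0xc0 /* top two bits (assumed name) */

/* Return the op a tag byte belongs to. The two 8-bit tags are tested first
 * because they share the 11 prefix with QOI_OP_RUN. */
static int qoi_op_of(uint8_t b1) {
    if (b1 == QOI_OP_RGB || b1 == QOI_OP_RGBA) {
        return b1;
    }
    return b1 & QOI_MASK_2;
}

/* Run length of a QOI_OP_RUN byte, assuming a bias of -1. */
static int qoi_run_length(uint8_t b1) {
    assert(qoi_op_of(b1) == QOI_OP_RUN);
    return (b1 & 0x3f) + 1;
}
```

Every op has a fixed size that follows from the tag alone, so a decoder knows how far to advance without inspecting the payload.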

Yes, it means that a change in the alpha channel will always be encoded as a 5-byte QOI_OP_RGBA, but using the current test suite of images, this seems to be totally fine. The alpha channel is mostly either 255 or 0. The famous dice.png and FLIF's fish.png seem to be awfully "artificial" uses of PNG. (For comparison, in the experimental branch with the original tag-layout and QOI_DIFF_16/24 still present, the overall compression ratio was at 24.6% - but the win in simplicity and performance is imho worth this 1%).

The hash function changed to the following:

#define QOI_COLOR_HASH(C) (C.rgba.r * 3 + C.rgba.g * 5 + C.rgba.b * 7)

This is seriously the best performing hash function I could find and I tried quite a few. This also ignores the alpha channel, making it even more of a second-class citizen.
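
To make the effect on the index concrete, here is a small sketch of how the hash could map a pixel to a slot. The reduction modulo 64 is an assumption that follows from the 6-bit QOI_OP_INDEX payload; the type rgba_t and the function color_index are hypothetical names:

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint8_t r, g, b, a; } rgba_t; /* hypothetical pixel type */

/* Slot in a 64-entry index for pixel c. Alpha takes no part in the hash,
 * so pixels that differ only in alpha land in the same slot. */
static int color_index(rgba_t c) {
    int slot = (c.r * 3 + c.g * 5 + c.b * 7) % 64;
    assert(slot >= 0 && slot < 64);
    return slot;
}
```

For example, (10, 20, 30, 255) and (10, 20, 30, 0) both hash to slot 20, so they evict each other from the index.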

You may not like it (and I'm truly sorry for all the work that would need to be done in existing implementations), but I strongly believe that this is The Right Thing To Do™.

Thoughts?

Benchmark results must be updated

Benchmark results on this page must be updated. I think they were calculated before this commit 30f8a39

Separate file format from compression scheme

I find the compression scheme itself kinda interesting, regardless of the header format. Separating it from the file format might encourage people to take this simple but quite OK scheme (pun intended) and use it on its own for their specific use cases.

It's not like anything will change, but I believe being precise with the definition (separating two different things: the compression scheme vs. the file format) is the right idea.

XZ on QOI is better than zopflipng

I am always curious about data compression.
I tried your corpus and compared it with zopflipng. What surprised me a lot is that QOI + XZ is smaller than zopflipng. XZ was run with -9 and zopflipng with --prefix -m.

Kodak set

tests/images/kodak on  master [?] ❯ du -c kodim*.png | tail -1
15072	total
tests/images/kodak on  master [?] ❯ du -c zopfli_kodim*.png | tail -1
14424	total
tests/images/kodak on  master [?] ❯ du -c kodim*.qoi | tail -1
18568	total
tests/images/kodak on  master [?] ❯ du -c kodim*.qoi.xz | tail -1
13760	total
tests/images/kodak on  master [?] ❯ du -c zopfli_kodim*.png.xz | tail -1
14424	total

Screenshot set is much more impressive

tests/images/screenshots on  master [?] ❯ find -name '*.png' ! -name 'zopfli_*' -print0 | xargs -0 du -c | tail -1
33216	total
tests/images/screenshots on  master [?] ❯ du -c zopfli_*.png | tail -1
21912	total
tests/images/screenshots on  master [?] ❯ du -c * | tail -1
128364	total
tests/images/screenshots on  master [?] ❯ du -c *.qoi | tail -1
33600	total
tests/images/screenshots on  master [?] ❯ du -c *.qoi.xz | tail -1
18132	total
tests/images/screenshots on  master [?] ❯ du -c zopfli*.xz | tail -1
21504	total

Is there any explanation on why it can beat out PNG like this?

Using C99 fixed width integer types

I noticed that you use the old way of defining integer variables (unsigned short, int, etc.). This makes the code and the file format dependent on the architecture of the machine where it was compiled/generated.

Instead, use the fixed-width integer types that C99 has, and QOI would work even on 8-, 16- and 32-bit machines (a fast and simple image format is useful for retro computers) without any portability issues. Especially if the endianness problem is fixed (see #10).

I could make a PR with these changes if you like.

The QOI File Format Specification

After a discussion in #28, the QOI data format changes to accommodate some of the concerns. This will serve as the basis for final specification for QOI.

~~These changes are not yet reflected in the code of this repository. I'm working on it!~~ The code in qoi.h now implements all these changes.

Changes from the original implementation

all values are encoded in big-endian byte order (already happened in c03edb2)
the range of QOI_DIFF will shift -1, to be consistent with the range of a two's complement int
QOI_DIFF will explicitly be allowed to wrap around. Whether the encoder makes use of this is outside of the spec. The decoder must account for this wrapping.
the size field in the header will be removed
width and height in the header will be widened to 32bit
a channels field will be added to the header. This is purely informative and will not change the behavior of the en-/decoder
a colorspace bitmap will be added to the header. This is purely informative and will not change the behavior of the en-/decoder.
the spec will mandate that the alpha channel is un-premultiplied

The header then looks like this:

struct qoi_header_t {
    char magic[4];  // magic bytes "qoif"
    u32 width;      // image width in pixels (BE)
    u32 height;     // image height in pixels (BE)
     u8 channels;   // must be 3 (RGB) or 4 (RGBA)
     u8 colorspace; // a bitmap 0000rgba where 
                    //   - a zero bit indicates sRGBA, 
                    //   - a one bit indicates linear (user interpreted)
                    //   colorspace for each channel
};
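
Since the header is 14 bytes with big-endian fields, it can be read from a byte buffer without relying on struct layout or host byte order. A minimal sketch using C99 fixed-width types; the names qoi_header_fields, load_u32_be and parse_header are hypothetical and not part of qoi.h:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define QOI_HEADER_SIZE 14 /* 4 + 4 + 4 + 1 + 1 bytes */

typedef struct {
    uint32_t width;
    uint32_t height;
    uint8_t  channels;
    uint8_t  colorspace;
} qoi_header_fields;

/* Load four bytes as a big-endian 32-bit value. */
static uint32_t load_u32_be(const uint8_t *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Parse the header; returns 1 on success, 0 if the data is too short,
 * the magic bytes are wrong, or channels is neither 3 nor 4. */
static int parse_header(const uint8_t *bytes, size_t size, qoi_header_fields *out) {
    assert(bytes != NULL && out != NULL);
    if (size < QOI_HEADER_SIZE || memcmp(bytes, "qoif", 4) != 0) {
        return 0;
    }
    out->width      = load_u32_be(bytes + 4);
    out->height     = load_u32_be(bytes + 8);
    out->channels   = bytes[12];
    out->colorspace = bytes[13];
    return out->channels == 3 || out->channels == 4;
}
```

The colorspace byte is passed through untouched here, in line with it being purely informative.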

The ranges for QOI_DIFF change to:

2bit: -2..1 instead of the original range -1..2
4bit: -8..7 instead of the original range -7..8
5bit: -16..15 instead of the original range -15..16
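
To illustrate the new ranges together with the wrap-around rule, here is a sketch of how an encoder might compute a wrapped difference and test whether it fits a field, and how a decoder applies it. The function names are hypothetical, and how the value is packed into the chunk is left out on purpose:

```c
#include <assert.h>
#include <stdint.h>

/* Difference cur - prev on a wrapping 8-bit channel, mapped to -128..127. */
static int wrapped_diff(uint8_t prev, uint8_t cur) {
    int d = (int)cur - (int)prev;   /* -255..255 */
    if (d > 127)  { d -= 256; }
    if (d < -128) { d += 256; }
    return d;
}

/* Nonzero if d fits a field of `bits` bits with a two's-complement range,
 * e.g. -2..1 for 2 bits, -8..7 for 4 bits, -16..15 for 5 bits. */
static int fits_bits(int d, int bits) {
    assert(bits >= 1 && bits <= 8);
    int lo = -(1 << (bits - 1));
    int hi = (1 << (bits - 1)) - 1;
    return d >= lo && d <= hi;
}

/* Decoder side: adding the difference back wraps modulo 256. */
static uint8_t apply_diff(uint8_t prev, int d) {
    return (uint8_t)(prev + d);
}
```

A step from 255 to 0 then counts as a difference of +1 and fits even the 2-bit field, which is the case the wrap-around rule is meant to cover.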

The channels field in the header serves only as a hint to the user on how to handle this image. It is valid for a QOI image to still encode alpha changes in a file with a header that denotes 3 channels. It is not the responsibility of the decoder to mask off alpha values. The color hash will always be computed as r^g^b^a, regardless of the number of channels denoted in the header.

Consistency of computing color hashes and zero-initializing of `px`

Just a thought, for consistency sake. IIUC:

Currently, for 4 channels, color hashes are computed as r ^ g ^ b ^ a.
For 3 channels, they are computed as r ^ g ^ b ^ 255 (solely because of px = px_prev line).

If px = px_prev was replaced with zero initialization of px, color hash would also be a simple xor of all 3 or 4 components (e.g. might make it a tiny bit easier to implement in a generic way in other languages).
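
A tiny sketch of why the initial alpha value matters for the index slot. The reduction modulo 64 is an assumption based on the 6-bit index, and the function name xor_hash is hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* XOR hash over all four channel values, reduced to a 64-entry index. */
static int xor_hash(uint8_t r, uint8_t g, uint8_t b, uint8_t a) {
    int slot = (r ^ g ^ b ^ a) % 64;
    assert(slot >= 0 && slot < 64);
    return slot;
}
```

For the RGB value (1, 2, 4), an inherited alpha of 255 gives slot 56, while a zero-initialized alpha gives slot 7, so an encoder and decoder written in another language have to agree on which of the two is used.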

Clarify that RGBA means non-premultiplied alpha

It's probably worth mentioning in the spec (or, if there's not a spec, in qoi.h) whether RGBA means premultiplied (associated) alpha or non-premultiplied (straight, unassociated) alpha.

https://en.wikipedia.org/wiki/Alpha_compositing#Straight_versus_premultiplied

If it's "whatever PNG does" then it's non-premultiplied alpha.

Lock output file before writing

First, this project looks super awesome. Very impressed.

Minor suggestion would be to call fopen in qoi_write() prior to invoking encode, in alignment with 'fail fast' philosophy.

Logic: if I wanted to write a batch processor that warns on failure, it would be much faster for cases where files already exist/are unwritable.
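
A rough sketch of the suggested ordering, in C. `encode_stub` is a hypothetical stand-in for the encoder so that the example is self-contained (the real qoi_encode has a different signature), and `write_fail_fast` is likewise an invented name:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the encoder: it only copies the input. */
static void *encode_stub(const void *pixels, size_t len, size_t *out_len) {
    void *out = malloc(len ? len : 1);
    if (out) {
        memcpy(out, pixels, len);
        *out_len = len;
    }
    return out;
}

/* Open the output file first, so an unwritable path fails before any
 * encoding work is done. Returns 1 on success, 0 on failure. */
static int write_fail_fast(const char *filename, const void *pixels, size_t len) {
    FILE *f = fopen(filename, "wb");
    if (!f) {
        return 0; /* fail fast: nothing has been encoded yet */
    }
    size_t out_len = 0;
    void *encoded = encode_stub(pixels, len, &out_len);
    if (!encoded) {
        fclose(f);
        return 0;
    }
    size_t written = fwrite(encoded, 1, out_len, f);
    fclose(f);
    free(encoded);
    return written == out_len;
}
```

One trade-off of this ordering: opening with "wb" truncates an existing file before encoding starts, so a later encoder failure leaves an empty file behind.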

The number of channels should be encoded in the QOI header

The function
qoi_read(const char *filename, int *out_w, int *out_h, int channels)
needs the number of channels as parameter. This should not be needed.
The QOI file should know with which number of channels it was encoded.

To solve this the header could be improved to contain also the number of channels.
BTW: The header should also contain a file format version. This would allow future improvements.

Upcoming breaking changes & locking in the data format

Saying that I'm surprised by the amount of attention this is getting would be an understatement. There's lots of discussion going on about how the data format and compression could be improved and what features could be added.

I want to give my views here and discuss how to go forward.

First and foremost, I want QOI to be simple. Please keep this in mind. I consider the general compression scheme to be done. There's lots of interesting ideas on how to improve compression. I want to tinker with these ideas - but not for QOI.

QOI will not be versioned. There will only be one version of QOI's data format. I'm hoping we will be able to strictly define what exactly that is in the coming days.

QOI will only support 24bit RGB and 32bit RGBA data. I acknowledge there's some need for fewer or more channels and also for higher bit depths or paletted color - QOI will not serve these needs.

So, with all that said, there's some breaking changes that are probably worthwhile. I want to discuss if and how to implement those.

Proposed changes

1) width, height and size in the header should be stored as big endian for consistency with the rest of the format (this change already happened in c03edb2)
2) Color differences (QOI_DIFF_*) should ~~be stored~~ have the same range as two's-complement. That means:

2bit: -2..1 instead of the current range -1..2
4bit: -8..7 instead of the current range -7..8
5bit: -16..15 instead of the current range -15..16

3) The header should accommodate some more info. Currently there's demand for
3a) number of channels (#16)
3b) the colorspace (#25 and this huge discussion on HN)
3c) un-/premultiplied alpha (#13)
3d) user-defined values

So, 1) is already implemented; 2) seems like the right thing to do (any objections?); 3) is imho worth discussing.

3a) Storing the number of channels (3 or 4) in the header would allow a user of this library to omit whether they want RGB or RGBA, and files would be more descriptive of their contents. You would still be able to enforce 3 or 4 channels when loading. This is consistent with what stbi_load does:

int x,y,n;
unsigned char *data = stbi_load(filename, &x, &y, &n, 0);
// ... process data if not NULL ...
// ... x = width, y = height, n = # 8-bit components per pixel ...
// ... replace '0' with '1'..'4' to force that many components per pixel
// ... but 'n' will always be the number that it would have been if you said 0

It is my opinion that the channels header value should be purely informative. Meaning, en-/decoder will do exactly the same, regardless of the number of channels. The extra 5bit for alpha in QOI_DIFF_24 will still be wasted for RGB files.

3b) I don't understand enough about the colorspace issue to gauge the significance. If we implement this however, I would suggest to give this a full byte in the header, where 0 = sRGB and any non-zero value is another, user-defined(?) colorspace.

3c) I'm against an option for premultiplied alpha, because it puts more burden on any QOI implementation to decode in the right pixel format. We should just specify that QOI images have un-premultiplied alpha.

3d) For simplicity's sake I'd like to put 3a) and 3b) as one byte each into the header. I'm uncertain if we then should "pad" the u32 size in the header with two more bytes. This would make the size 4byte aligned again, but there's probably no need for it!? A u16 unused could also cause more confusion when other QOI libraries suddenly specify any of these bits to mean something.

With all this, the header would then be the following 16 bytes:

struct qoi_header_t {
	char [4];       // magic bytes "qoif"
	u16 width;      // image width in pixels (BE)
	u16 height;     // image height in pixels (BE)
	 u8 channels;   // must be 3 (RGB) or 4 (RGBA)
	 u8 colorspace; // 0 = sRGB (other values currently undefined)
	u16 unused;     // free for own use
	u32 size;       // number of data bytes following this header (BE)
};
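
As a rough illustration of this 16-byte layout, the header could be written byte by byte so that the result does not depend on the host's endianness. `write_header`, `put_u16_be` and `put_u32_be` are hypothetical names, not part of qoi.h, and the field order simply follows the struct as proposed here:

```c
#include <stdint.h>

#define HEADER_SIZE 16

/* Store a 16-bit value most significant byte first (big endian). */
static void put_u16_be(uint8_t *p, uint16_t v) {
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)(v & 0xff);
}

/* Store a 32-bit value most significant byte first (big endian). */
static void put_u32_be(uint8_t *p, uint32_t v) {
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)((v >> 16) & 0xff);
    p[2] = (uint8_t)((v >> 8) & 0xff);
    p[3] = (uint8_t)(v & 0xff);
}

/* Hypothetical writer for the proposed header. */
static void write_header(uint8_t out[HEADER_SIZE], uint16_t width, uint16_t height,
                         uint8_t channels, uint8_t colorspace, uint32_t size) {
    out[0] = 'q'; out[1] = 'o'; out[2] = 'i'; out[3] = 'f'; /* magic bytes */
    put_u16_be(out + 4, width);
    put_u16_be(out + 6, height);
    out[8] = channels;        /* 3 (RGB) or 4 (RGBA) */
    out[9] = colorspace;      /* 0 = sRGB */
    out[10] = 0;              /* unused, written as zero in this sketch */
    out[11] = 0;
    put_u32_be(out + 12, size);
}
```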

The one issue I have with this is how to give these extra header values to the user of this library. qoi_read("file.qoi", &w, &h, &channels_in_file, &colorspace, want_channels) looks like an ugly API. So maybe that would rather be implemented as qoi_read_ex() and qoi_read() stays as it is. I'm still not sure if I want that extended header...

What's the opinion of the other library authors?

QOI support for Pillow/Python

Since QOI is a C library, it could be wrapped as a Python extension and used as a plugin for Pillow.

Some easy optimizations are available

Hey, thanks for the work you're putting in to this.

I've written a Rust implementation of your image format (https://github.com/steven-joruk/qoi) and added some optimizations you might want to take as well.

The biggest gain is by factoring out writing the QOI_RUN command which lets you get rid of a bunch of redundant comparisons and a couple of branches: steven-joruk/qoi@3f3ee0a

You can reduce some more branches when writing QOI_COLOUR: https://github.com/steven-joruk/qoi/blob/3f3ee0ae7ecbb62a4b293f932d28580099989159/src/encode.rs#L158

And I'm unsure if this has a real effect, but you only need to store the previous colour when it has changed (move the assignment into the px_prev != px block).

When I hacked those into my local qoi.h I saw improvements of around 16% for dice.png; I haven't measured other files. The Rust benchmark encodes dice.qoi (from raw) in around 2.3ms compared to qoibench's 3.7ms (3.4ms with the above changes). I haven't compared the assembly or profiles to see what else may be going on.

Some int types should be long or size_t

This issue was factored out of PR #6 (comment)

Generally, when passing around a pointer-length pair for a block of memory, the length should be size_t.

The qoi_decode function still takes (const void* data, int size, etc).

Note that ftell and fread return long and size_t respectively, not int.

The qoi_read function can also still overflow here.
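
A sketch of what reading a file with the matching types might look like. `read_whole_file` is a hypothetical helper, not the actual qoi_read; it treats empty files as an error for simplicity:

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads a whole file into a malloc'd buffer.
 * Returns NULL on failure; on success the caller frees the buffer. */
static void *read_whole_file(const char *filename, size_t *out_size) {
    FILE *f = fopen(filename, "rb");
    if (!f) {
        return NULL;
    }
    if (fseek(f, 0, SEEK_END) != 0) {
        fclose(f);
        return NULL;
    }
    long end = ftell(f); /* ftell returns long, -1L on error */
    if (end <= 0 || fseek(f, 0, SEEK_SET) != 0) {
        fclose(f);
        return NULL;
    }
    size_t size = (size_t)end; /* safe: end is known to be positive here */
    void *data = malloc(size);
    if (!data) {
        fclose(f);
        return NULL;
    }
    size_t got = fread(data, 1, size, f); /* fread returns size_t */
    fclose(f);
    if (got != size) {
        free(data);
        return NULL;
    }
    *out_size = size;
    return data;
}
```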

Consider BGRA instead of RGBA

Speaking of Windows (#24)... IIUC the default Windows color order is BGRA (not RGBA) and likewise for Linux (X11) and I think MacOS / iOS too. Can't remember what Android is.

Anyway, if we're talking about finalizing the file format (#28), consider QOI producing BGRA, not RGBA. Especially as QOI is about being fast to decode, this would avoid what libpng calls the PNG_TRANSFORM_BGR step.

Transforming BGRA <-> RGBA is cheap, especially with SIMD, but it's not free.
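
For reference, the scalar form of that transformation is a swap of the first and third byte of every pixel. This is only a sketch with a hypothetical function name (`swap_r_b`); a SIMD version would do the same thing with a byte shuffle:

```c
#include <stddef.h>
#include <stdint.h>

/* Swap the first and third byte of every 4-byte pixel in place.
 * Applied to RGBA data this yields BGRA, and vice versa. */
static void swap_r_b(uint8_t *pixels, size_t pixel_count) {
    for (size_t i = 0; i < pixel_count; i++) {
        uint8_t *p = pixels + i * 4;
        uint8_t tmp = p[0];
        p[0] = p[2];
        p[2] = tmp;
    }
}
```

It is one extra pass over the whole image, which is the cost the issue refers to.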

stb_image is so bad at png compression it makes the benchmark unreasonable

I tried playing with qoiconv on a few images I had around (since I had a hard time believing something so simple could get so close to png). So far, it fails to decode some png inputs at all, so it can't convert them to qoi. Others it can read and convert, but when you convert back to png the file size is almost double what it originally was. Running ImageMagick convert on the png generated by qoiconv results in about a 40% reduction in file size.

I think the benchmark needs to switch to libpng or something else if it is to be a serious comparison against png compression, although the compression ratio is surprisingly good for something so simple.

For example:
https://ae27ff.meme.tips/res/klmmlyby.png original is 212153 bytes
qoi is 289133 bytes
qoiconv back to png is 354424 bytes
output from ImageMagick convert on the file generated by qoiconv is 220817 bytes

So whatever stb_image does is not good at all, and that makes me question if the encode/decode times are even valid either versus libpng which seems to be just about the standard for png.

Consider versioning the header

The header is currently 12 bytes:

struct qoi_header_t {
  char [4];              // magic bytes "qoif"
  unsigned short width;  // image width in pixels
  unsigned short height; // image height in pixels
  unsigned int size;     // number of data bytes following this header
};

Endianness is already discussed in #10.

If the file format isn't set in stone yet, consider:

changing the 4-byte magic so that it's not ASCII text, so that accidental matches are less likely. This is part of why the PNG magic header starts with an 0x89. A suggestion: [0x71 0xf8 0x69 0x66] is invalid UTF-8 but "qøif" in Latin-1.
widening the width and height to 24 or 32 bits. 65536 pixel wide images might seem "impractically large" right now but it's less than an order of magnitude more than what my phone camera can produce, and it wasn't so long ago that 640x480 was considered "high resolution".
adding a version number somewhere, to enable future extensions to the format.

You might find some inspiration in NIE's 16 byte header:
https://github.com/google/wuffs/blob/main/doc/spec/nie-spec.md

There's also, as mentioned in the Hacker News discussion, the idea of re-using the IFF / RIFF container format.

License?

Can people use this for anything?

Split data in rectangle chunks for parallelizable processing

The idea is to add two more fields to the header: chunk width and chunk height (u8?), which define blocks of data that can be processed independently of each other (each has its own "64 last known pixels"). The number of chunks can easily be determined by dividing the image's dimensions by the chunk dimensions.
There are several advantages to this:

It will enable parallel processing on the CPU, resulting in much faster decoding and encoding, and allow for a more efficient GPU decoder implementation than would otherwise be possible;
It will improve pixel locality, as the "64 last known pixels" only work on the width but not on the height.

The drawback is the added complexity, but I think the sacrifice would be worth it. I'm waiting for your feedback before I start implementing and benchmarking this.

Also, each chunk could be stored contiguously for a better data (and not pixel) locality. Maybe better, maybe too complex for this, dunno...
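
The chunk count mentioned above could be computed as in the following sketch. It assumes that an image whose size is not a multiple of the chunk size gets a final, partial chunk per row and column (the proposal does not say how edges are handled), and the function names are made up for illustration:

```c
#include <stdint.h>

/* Number of chunks along one axis, rounding up so that a partial chunk
 * at the edge is counted. chunk_dim must be non-zero. */
static uint32_t chunks_along(uint32_t image_dim, uint32_t chunk_dim) {
    return image_dim / chunk_dim + (image_dim % chunk_dim != 0 ? 1u : 0u);
}

/* Total number of independently decodable chunks in the image. */
static uint32_t chunk_count(uint32_t width, uint32_t height,
                            uint32_t chunk_w, uint32_t chunk_h) {
    return chunks_along(width, chunk_w) * chunks_along(height, chunk_h);
}
```

For example, a 640x480 image with 64x64 chunks gives 10 columns and 8 rows (the last row only 32 pixels high).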

What do you think?

Comprehensive Test Suite of Images

I have now assembled a pretty comprehensive suite of test images. These all come with the proper license information (CC or public domain). It includes:

an icon set in 64px & 512px (thanks @nigeltao)
various photo sets
various texture sets
game screenshots
website screenshots
a random sample from https://pngimg.com/

Here's the full set:
https://phoboslab.org/files/qoibench/qoi_benchmark_suite.tar (1.1 GB) — very proudly excluding lenna.jpg

All images in this test suite are PNGs. I will add QOI images once the specification has been finalized (related #20).

To make it a bit easier to test tweaks for qoi, qoibench.c (in the experimental branch) can now descend into subdirectories, prints a grand total and has gained various options:

Usage: qoibench <iterations> <directory> [options]
Options:
    --nowarmup ... don't perform a warmup run
    --nopng ...... don't run png encode/decode
    --noverify ... don't verify qoi roundtrip
    --noencode ... don't run encoders
    --nodecode ... don't run decoders
    --norecurse .. don't descend into directories
Examples
    qoibench 10 images/textures/
    qoibench 1 images/textures/ --nopng --nowarmup

E.g. if you just want to check the overall compression ratio for qoi as fast as possible:
./qoibench 1 images/ --nowarmup --nopng --noverify --nodecode

Stream-like input and output

Hi!

It would be great to additionally support stream reading and writing functions like in libpng (png_set_read_fn) or libtiff (TIFFClientOpen).

Ideally, this feature would need just a single user-supplied reading (or writing) function, like png_set_read_fn().

This feature is needed for image reading libraries like FreeImage or SAIL to simplify QOI integration.
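
To illustrate the shape such an interface might take, here is a purely hypothetical sketch. None of these names (`read_fn`, `mem_source`, `mem_read`, `has_qoi_magic`) exist in qoi.h; the example only checks the "qoif" magic bytes through a user-supplied callback:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical callback type: fill buf with up to len bytes and return
 * the number of bytes actually read. */
typedef size_t (*read_fn)(void *user, void *buf, size_t len);

/* Example source: an in-memory buffer with a read cursor. */
typedef struct {
    const uint8_t *data;
    size_t size;
    size_t pos;
} mem_source;

static size_t mem_read(void *user, void *buf, size_t len) {
    mem_source *src = (mem_source *)user;
    size_t left = src->size - src->pos;
    size_t n = len < left ? len : left;
    memcpy(buf, src->data + src->pos, n);
    src->pos += n;
    return n;
}

/* A decoder built on the callback would pull its bytes like this.
 * Returns 1 if the stream starts with "qoif", 0 otherwise. */
static int has_qoi_magic(read_fn rd, void *user) {
    uint8_t magic[4];
    if (rd(user, magic, sizeof magic) != sizeof magic) {
        return 0;
    }
    return memcmp(magic, "qoif", 4) == 0;
}
```

A library such as FreeImage or SAIL would then pass its own I/O handle as `user` instead of the in-memory source.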

comments :heart:

probably close this as soon as you see it, as it isn't a real issue

quoting source:

// -----------------------------------------------------------------------------
// libpng encode/decode wrappers
// Seriously, who thought this was a good abstraction for an API to read/write
// images?

I haven't laughed this hard reading someone else's code since I learned that the candela is a poor unit (no direct links for line numbers, so please search for "I think the candela is a scam.")

Your code is fun to read.

Create a water-proof specification

I wrote the Zig implementation of QOI and made it a clean-room implementation, thus testing the specification in qoi.h.

Some problems I noticed in both the implementation and the description: While the QOI format assumes big endian byte order for a lot of things, the implementation is only suitable to run on little endian machines.

Also, the bit order is unclear for cross-byte fields:

qoi/qoi.h

Lines 128 to 134 in dd0b04b

    
QOI_DIFF_24 {
	u8 tag  :  4;   // b1110
	u8 dr   :  5;   // 5-bit   red channel difference: -15..16
	u8 dg   :  5;   // 5-bit green channel difference: -15..16
	u8 db   :  5;   // 5-bit  blue channel difference: -15..16
	u8 da   :  5;   // 5-bit alpha channel difference: -15..16
}

dr, for example, crosses the first byte boundary, and it is unspecified how the bits are ordered here. To me, it was unclear whether the 128-bit or the 1-bit of the second byte provides the additional bit. It's also unclear which position that bit occupies in the final u5 value.

A good alternative that makes this unmistakably clear would be something like this:

|                                 QOI_DIFF_24                                 |
|        Byte + 0         |        Byte + 1         |        Byte + 2         |
|  7  6  5  4  3  2  1  0 |  7  6  5  4  3  2  1  0 |  7  6  5  4  3  2  1  0 |
|-------------------------|-------------------------|-------------------------|
|  1  1  1  0 r4 r3 r2 r1 | r0 g4 g3 g2 g1 g0 b4 b3 | b2 b1 b0 a4 a3 a2 a1 a0 |

With:
  r4...r0 forming the red channel difference between -15..16
  g4...g0 forming the green channel difference between -15..16
  b4...b0 forming the blue channel difference between -15..16
  a4...a0 forming the alpha channel difference between -15..16
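
Read as code, the layout in the diagram above could be unpacked as follows. This is a sketch, not the reference decoder: `diff24_t` and `unpack_diff24` are illustrative names, and it assumes that each 5-bit field stores the difference plus 15, so that the raw values 0..31 map onto the stated range -15..16:

```c
#include <stdint.h>

/* Illustrative result type (not part of qoi.h). */
typedef struct { int dr, dg, db, da; } diff24_t;

/* Unpack three bytes laid out as in the diagram:
 *   b0 = 1  1  1  0  r4 r3 r2 r1
 *   b1 = r0 g4 g3 g2 g1 g0 b4 b3
 *   b2 = b2 b1 b0 a4 a3 a2 a1 a0
 * Assumption: every field is biased by 15. */
static diff24_t unpack_diff24(uint8_t b0, uint8_t b1, uint8_t b2) {
    diff24_t d;
    d.dr = (((b0 & 0x0f) << 1) | (b1 >> 7)) - 15; /* r4..r1 from b0, r0 from b1 */
    d.dg = ((b1 >> 2) & 0x1f) - 15;               /* g4..g0 inside b1 */
    d.db = (((b1 & 0x03) << 3) | (b2 >> 5)) - 15; /* b4..b3 from b1, b2..b0 from b2 */
    d.da = (b2 & 0x1f) - 15;                      /* a4..a0 are the low bits of b2 */
    return d;
}
```

With most-significant-bit-first ordering, the extra bit of dr is the 128-bit of the second byte and becomes the lowest bit of the 5-bit value.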

I'm happy to create a PR for this change
