phoboslab / qoa
The “Quite OK Audio Format” for fast, lossy audio compression
License: MIT License
Each file has a very loud noise at the beginning of the file once I convert from WAV to QOA and back to WAV.
Steps:
qoaplay and MP3/FLAC bits in Makefile
make
./qoaconv input.wav input.qoa && ./qoaconv input.qoa output.wav
output.wav has a noticeable beep, occurs on all tested WAV files
This loud part at the beginning should not be present in the output WAV:
Using a MacBook Pro M1 Max, macOS 12.5.1.
This causes a segmentation fault if built with -fsanitize=address.
So many qoaenc-encoded files are invalid in the last frame.
The workaround is to just set the missing bytes to 0, as they are not actually used in decoding, but this complicates the code path, ultimately breaks optimizations, and makes sanity checks within the decoder less straightforward.
While there's still some details to discuss (specifically fields in the file & frame headers), I started working on the file format specification. The current draft can be found here:
https://qoaformat.org/qoa-specification-draft-01.pdf
I'm sure I forgot to mention some details and/or need to clarify things. Please let me know!
The C99 spec states that
The result of the / operator is the quotient from the division of the first operand by the second; the result of the % operator is the remainder. In both operations, if the value of the second operand is zero, the behavior is undefined.
In qoaconv.c L270:
double psnr = 1.0/0.0;
This refuses to compile on some compilers like MSVC.
Changing 1.0/0.0 to the INFINITY macro seems to work fine.
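A minimal sketch of the suggested change, assuming a C99 `<math.h>` that provides `INFINITY`. The helper name `initial_psnr` is hypothetical and only wraps the initialization so it can be shown in isolation:

```c
#include <math.h>   /* INFINITY (C99) */

/* Hypothetical helper: the PSNR starting value before any error is measured.
   INFINITY is a constant expression, so no division by zero appears in the
   source for the compiler to reject. */
static double initial_psnr(void) {
    double psnr = INFINITY;
    return psnr;
}
```

The float-typed `INFINITY` converts to a `double` infinity on assignment, so later comparisons against finite PSNR values should behave as they did with `1.0/0.0`.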
I was reading through some of the code here after seeing this project mentioned in the new raylib release. I am curious why the number of channels and the sample rate are present in every frame if they are consistent (and therefore redundant?) in a valid file. Why not move this into the header and save ~4 bytes per frame?
Is this based on wanting 64-bit aligned reads specifically? Sorry, just curious. I love how straightforward this is to implement and wish you success!
Hi,
I just tried this compression and it works fine on hundreds of different files. However, on one specific file, the reconstructed sound has bugs in it. If I lower the volume of the source file, the bug no longer appears.
Check below the source file (test.WAV) and the reconstructed file with the bugs (raw PCM 16 bits signed mono)
https://kodamo.org/share2cRGh/test.WAV
https://kodamo.org/share2cRGh/reconst.raw
Is it possible that this exact sound triggers some bug in the encode/decode algorithm?
QOA decoding is implemented in https://github.com/AuburnSounds/audio-formats
along with seeking. It is also a chunked decoding library. The goal is to follow the format if it changes.
Any chance you could make some Windows binaries available? Github Actions?
32 bits is sufficient for typical use, but it could be a limitation for more exotic use cases (like storing MF/HF radio signals, and anything else that may have large time and/or frequency). Instead of anything that increases the size of file_header I propose this simple change:
struct {
char magic[3]; // magic bytes 'qoa'
uint40_t samples; // number of samples per channel in this file
} file_header; // = 64 bits
Doing this would match wavpack's 40 bit total sample count limit. It would be hard for an exotic use case to be so exotic that 40 bits is a limitation.
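C has no `uint40_t`, so an implementation of this proposal would have to pack the count by hand. The sketch below is one way to do it, assuming the 40-bit count is stored big-endian directly after the three magic bytes. The function names `pack_file_header` and `unpack_samples` are hypothetical and not part of qoa.h:

```c
#include <assert.h>
#include <stdint.h>

/* Write the proposed 64-bit header: 'q','o','a' followed by a 40-bit
   big-endian sample count (byte order is an assumption of this sketch). */
static void pack_file_header(uint64_t samples, unsigned char out[8]) {
    assert(samples < (UINT64_C(1) << 40));   /* must fit in 40 bits */
    out[0] = 'q';
    out[1] = 'o';
    out[2] = 'a';
    for (int i = 0; i < 5; i++) {
        out[3 + i] = (unsigned char)(samples >> (8 * (4 - i)));
    }
}

/* Read the 40-bit sample count back from bytes 3..7 of the header. */
static uint64_t unpack_samples(const unsigned char in[8]) {
    uint64_t samples = 0;
    for (int i = 3; i < 8; i++) {
        samples = (samples << 8) | in[i];
    }
    return samples;
}
```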
Compiling with clang qoaplay.c -std=gnu99 -O3 -o qoaplay.exe, playback of any .qoa file results in a quick greeting of garbage output (thankfully I had followed the warning and set my headphones low!) and afterwards the player seemingly crashes. I also tried zig cc to similar effect.
I compiled qoaconv and thankfully it converts to and from .qoa normally.
I've added some very simple noise shaping to the encoder (to the noise_shaping branch). This does not change the decoder or the data format. The noise shaping should help to move quantization noise into the higher, less audible frequencies.
Here's a comparison page with all samples with and without noise shaping: https://phoboslab.org/files/qoa-samples/noiseshaping.html
The difference for some samples is night & day. Listen to 32_triangles-triangle_roll_stereo at 00:43 or 35_glockenspiel_arpegio_melodious_phrase_stereo at 00:39.
However, this noise shaping has an adverse effect for some other samples. I tried to contain it by only applying most of the shaping when our prediction is "bad" anyway. But still, I feel that some samples sound more "crunchy" now. Listen to 21_trumpet_arpegio_melodious_phrase_stereo right at the beginning for instance. Vocals in julien_baker_sprained_ankle and others also seem to have lost a bit of "smoothness".
Maybe someone with better ears (and/or equipment :D) can take a listen? What's the usual strategy here, to adaptively correct for quantization noise?
I'm testing my encoder and I noticed the sample archive doesn't match what the reference encoder produces:
C:\0\ci\qoa\qoa>git status
On branch master
Your branch is up to date with 'origin/master'.
nothing to commit, working tree clean
C:\0\ci\qoa\qoa>git rev-parse HEAD
2d74551f8c5b2cb8c94fe68959dbe1cf9977a793
C:\0\ci\qoa\qoa>cc -O3 -o qoaconv.exe qoaconv.c
C:\0\ci\qoa\qoa>cd ..\test
C:\0\ci\qoa\test>unzip ../qoa_test_samples.zip bandcamp/allegaeon-beasts-and-worms.wav bandcamp/qoa/allegaeon-beasts-and-worms.qoa
Archive: ../qoa_test_samples.zip
inflating: bandcamp/allegaeon-beasts-and-worms.wav
inflating: bandcamp/qoa/allegaeon-beasts-and-worms.qoa
C:\0\ci\qoa\test>..\qoa\qoaconv bandcamp/allegaeon-beasts-and-worms.wav allegaeon-beasts-and-worms.qoa
bandcamp/allegaeon-beasts-and-worms.wav: channels: 2, samplerate: 44100 hz, samples per channel: 6712438, duration: 152 sec
allegaeon-beasts-and-worms.qoa: size: 5295 kb (5422440 bytes) = 278.32 kbit/s, psnr: 39.44 db
C:\0\ci\qoa\test>md5sum ../qoa_test_samples.zip bandcamp/allegaeon-beasts-and-worms.wav bandcamp/qoa/allegaeon-beasts-and-worms.qoa allegaeon-beasts-and-worms.qoa
9b6ae38bb2980466c73b7453e638aa84 *../qoa_test_samples.zip
98b16dfc3041f2d6384c1e378bee72aa *bandcamp/allegaeon-beasts-and-worms.wav
50e5121fb63bc93553a963b9b221ac05 *bandcamp/qoa/allegaeon-beasts-and-worms.qoa
5508a8f6194af9e046873e44a552674f *allegaeon-beasts-and-worms.qoa
C:\0\ci\qoa\test>fc /b bandcamp\qoa\allegaeon-beasts-and-worms.qoa allegaeon-beasts-and-worms.qoa | head
Comparing files BANDCAMP\QOA\allegaeon-beasts-and-worms.qoa and ALLEGAEON-BEASTS-AND-WORMS.QOA
00275B00: F7 0F
00275B01: DC F6
00275B02: 50 DB
00275B03: BB 7F
00275B04: 0D B6
00275B05: AD DF
00275B06: 2D FD
00275B07: C1 B7
00275B10: C3 FF
Hi, how does QOA work at 1-bit? Does it only store the residual at 1-bit, or something else?
Is it possible for QOA to achieve lower bitrates for speech, like 8 kbit/s or 16 kbit/s?
I discovered this while working on a codec with similar design principles. (Hi!) Specifically, I discovered this after learning about QOA's dynamic slice quantization, and the optimization it performs to the process of brute forcing them.
If you make certain assumptions about the "shape" of the quantization error as you loop over scaling factors, you can skip large chunks of scaling factors entirely. This assumption is that it mostly decreases towards the ideal scaling factor and then increases when going away from it. If it starts increasing, you can skip to the zeroth scaling factor; and if, after skipping to the zeroth scaling factor, it starts increasing rapidly, you can break out of the loop entirely.
Doing this results in a small loss of quality (around 0.05db), but this quality loss can be controlled by adjusting a fudge factor.
$ time ./qoaconv.exe "test ultra new.wav" out_old.qoa
test ultra new.wav: channels: 2, samplerate: 44100 hz, samples per channel: 3155783, duration: 71 sec
out.qoa: size: 2489 kb (2549328 bytes) = 278.32 kbit/s, psnr: 46.87 db
real 0m0.384s
user 0m0.343s
sys 0m0.046s
$ time ./qoaconv.exe "test ultra new.wav" out_new.qoa
test ultra new.wav: channels: 2, samplerate: 44100 hz, samples per channel: 3155783, duration: 71 sec
out_new.qoa: size: 2489 kb (2549328 bytes) = 278.32 kbit/s, psnr: 46.82 db
real 0m0.332s
user 0m0.311s
sys 0m0.030s
The fudge factor I chose was n < (slice_len >> 2). If the encoding loop only made it 25% of the way through before breaking, the slice is assumed to have excessively large error.
Test audio attached: test ultra new.zip
Not really an issue, more of a discussion and sharing of my ideas.
I've been wondering if QOA might be a good codec for implementing pulseaudio/pipewire's audio over the network. I've played with those in the past, but once you get to 6 channels over wifi it gets laggy. Maybe a more efficient codec like QOA would help.
Before I try to convince any pipewire folks that we should use a lossy codec, I wanted to see what it would sound like. For this I created a jack client that takes its inputs and sends them to the outputs (so I can use it between any of my apps and physical speakers), with the slight difference that it encodes the audio to QOA and then immediately decodes it. If there are any artifacts, one should be able to hear them, live, using this qoa_quality_test jack client. If nothing more, this is a pretty simple example of how to hook up qoa to a jack client.
Since this is streaming audio, I'm using only the lower-level functions (e.g. only the qoa_{encode,decode}_frame functions, without qoa_{encode,decode}). I think it might be useful to have the default qoa LMS initialization in some kind of common function so other users can call it too (just like me).
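A sketch of what such a common function might look like. The struct and the name `lms_init_default` are hypothetical stand-ins, and the starting weights {0, 0, -(1<<13), (1<<14)} with a zeroed history are an assumption about what the reference encoder sets up internally. They should be checked against qoa.h before relying on them:

```c
#include <assert.h>

#define LMS_LEN 4   /* assumed LMS filter length */

/* Local stand-in for the per-channel LMS state; not the type from qoa.h. */
typedef struct {
    int history[LMS_LEN];
    int weights[LMS_LEN];
} lms_state;

/* Hypothetical shared initializer for streaming users that only call the
   frame-level functions. The weight values are an assumption (see above). */
static void lms_init_default(lms_state *lms) {
    assert(lms != 0);
    lms->weights[0] = 0;
    lms->weights[1] = 0;
    lms->weights[2] = -(1 << 13);
    lms->weights[3] = (1 << 14);
    for (int i = 0; i < LMS_LEN; i++) {
        lms->history[i] = 0;
    }
}
```

A streaming client would call this once per channel before encoding its first frame, then let the encoder carry the state forward from frame to frame.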
The 16 bits in use to communicate the frame size are not necessary, since the first 8 bits of each frame's header contains the number of channels and from that you can calculate the total frame size, because all arrays are only dependent on the number of channels:
sizeof(frame_header) + num_channels * (sizeof(lms_state) + sizeof(qoa_slice_t) * 256)
By dropping the frame size bits, and by extension its value limit of an unsigned 16-bit int, we could theoretically also drop this channel limit:
#define QOA_MAX_CHANNELS 8
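The frame size formula above can be sketched as follows. The byte sizes are assumptions taken from the layout discussed on this page: an 8-byte frame header, 16 bytes of LMS state per channel (four 16-bit history samples plus four 16-bit weights), and 64-bit slices. The name `full_frame_size` is hypothetical, and the sketch only covers full frames of 256 slices; a shorter final frame would need the slice count from elsewhere.

```c
#include <assert.h>
#include <stddef.h>

enum {
    FRAME_HEADER_SIZE = 8,    /* assumed: 64-bit frame header        */
    LMS_STATE_SIZE    = 16,   /* assumed: int16 history[4] + weights[4] */
    SLICE_SIZE        = 8,    /* assumed: one 64-bit slice           */
    SLICES_PER_FRAME  = 256   /* slices per channel in a full frame  */
};

/* Size in bytes of a full frame, derived only from the channel count. */
static size_t full_frame_size(unsigned num_channels) {
    assert(num_channels > 0);
    return FRAME_HEADER_SIZE
         + (size_t)num_channels * (LMS_STATE_SIZE + SLICE_SIZE * SLICES_PER_FRAME);
}
```

Under these assumptions a stereo frame is 4136 bytes, and a full frame stops fitting into an unsigned 16-bit size field at 32 channels (66056 bytes).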
As a suggestion, I would propose to include more metadata to improve seekability through frames:
As it stands, each frame can change the number of channels and/or sample rate which means that you need to read each frame header to be able to seek through an audio stream. Even if the number of channels remains constant, you can only seek to certain sample offsets and cannot seek to certain timestamps or even calculate the timestamp after seeking without decoding all frames in between.
If we use the free 16 bits to encode additional metadata, we can include something like this in the frame header
bits 0123456789abcdef
tvvvvvvvvvvvvvvv
t = 0: you need to decode all frames
t = 1: the value `v` (15 bits) indicates the number of following frames that do not deviate in number of channels or sample rate
For live streaming you would use t=0, but for streams encoded ahead of time you could set t=1 and set the number of frames for which the encoder is certain that the number of channels or sample rate isn't going to change. Even if the encoder isn't sure ahead of time, if the encoder's output is seekable it could write the correct value after the fact.
This way, you only need to decode the first frame to be able to seek to any timestamp as well.
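The proposed 16-bit field could be packed and unpacked roughly like this. The sketch assumes that bit 0 in the diagram is the most significant bit, so `t` lands in bit 15 of the 16-bit value; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Pack the flag t (0 or 1) and the 15-bit frame count v into 16 bits.
   Assumption: the leftmost bit of the diagram is the most significant. */
static uint16_t pack_seek_hint(int t, unsigned v) {
    assert(v <= 0x7FFFu);   /* v must fit in 15 bits */
    return (uint16_t)(((t ? 1u : 0u) << 15) | (v & 0x7FFFu));
}

/* 1 if the count v is valid, 0 if every frame header must be read. */
static int seek_hint_t(uint16_t field) {
    return field >> 15;
}

/* Number of following frames with unchanged channels and sample rate. */
static unsigned seek_hint_v(uint16_t field) {
    return field & 0x7FFFu;
}
```

With 15 bits, one hint can cover at most 32767 following frames, so a long file would need the hint refreshed periodically.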
While convert_wav.sh does the job of converting all files, I wanted additional features for testing purposes (e.g. make check). Here is my suggested Makefile (which uses QOA's Makefile) with the script that generates dependencies (gen_deps.sh):
Makefile:
-include deps.mk
all: $(QOA_FILES) $(WAV_FILES)
../qoaconv:
	make -C .. conv
md5: qoa.md5 wav.md5
qoa.md5:
	md5sum */qoa/*.qoa > $@
wav.md5:
	md5sum */qoa_wav/*.qoa.wav > $@
check: rebuild_conv all
	md5sum --quiet -c qoa.md5
	md5sum --quiet -c wav.md5
rebuild_conv:
	make -C .. conv
ifndef MAKE_RESTARTS
deps.mk: .UPDATE_DEPS
	./gen_deps.sh > $@
.PHONY: .UPDATE_DEPS
.UPDATE_DEPS:
endif
clean:
	rm -f $(QOA_FILES) $(WAV_FILES)
	make -C .. $@
	rm -f deps.mk
gen_deps.sh:
#!/bin/bash
for file_type in QOA,qoa,qoa WAV,qoa_wav,qoa.wav; do
    PREFIX=$(echo $file_type | cut -d, -f1)
    SUBDIR=$(echo $file_type | cut -d, -f2)
    EXTENS=$(echo $file_type | cut -d, -f3)
    # Create Makefile list of file names
    echo "${PREFIX}_FILES= \\"
    for directory in */; do
        dir_only=$(echo "$directory" | sed 's-/*$--g')
        for wav_orig in "$dir_only"/*.wav; do
            name=$(basename "${wav_orig%.*}")
            # Add file name to the Makefile list
            echo -e "\t$dir_only/$SUBDIR/$name.$EXTENS \\"
        done
    done
    echo # Empty line
done
QOACONV=../qoaconv
for directory in */; do
    dir_only=$(echo "$directory" | sed 's-/*$--g')
    # Create the subdirectories
    mkdir -p "$dir_only"/qoa
    mkdir -p "$dir_only"/qoa_wav
    # Create Makefile rules
    echo "$dir_only"/qoa/%.qoa: "$dir_only"/%.wav $QOACONV
    echo -e "\t$QOACONV" '$< $@\n'
    echo "$dir_only"/qoa_wav/%.qoa.wav: "$dir_only"/qoa/%.qoa $QOACONV
    echo -e "\t$QOACONV" '$< $@\n'
done
Hello. The quality of QOA is not very good at lower sampling rates, can anyone make a 4bit version of QOA?
SerenityOS now has a system-wide QOA loader (SerenityOS/serenity#17512), which could be mentioned in the README list.
I'm creating this as an issue and not as a PR because based on the experience with QOI, I'm not sure what exact format you want for the list here :^)
Hi, for the 1-bit version of QOA, is it possible to have a fixed bitrate of 8kbps?
The calculation error * error is performed using signed 32-bit arithmetic and it overflows for the allegaeon-beasts-and-worms test (probably many others as well).
The C and C++ standards only define unsigned arithmetic overflow, while signed overflow is undefined behavior.
A possible fix is to change the variable to:
long long error = (sample - reconstructed);
(wrapped with some typedef probably),
or perhaps check if error is in range -sqrt(INT_MAX)..sqrt(INT_MAX) - not sure if it's guaranteed that at least one scale factor will be in range.
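A minimal sketch of the first option, using `int64_t` from `<stdint.h>` in place of a bare `long long`. The function name `squared_error` is hypothetical; the point is only that widening before the subtraction and multiplication keeps the square of any difference between two 16-bit samples in range.

```c
#include <stdint.h>

/* Squared difference computed in 64-bit arithmetic. For 16-bit samples the
   difference is at most 65535 in magnitude, and 65535 * 65535 does not fit
   in a signed 32-bit int, which is where the original overflow comes from. */
static uint64_t squared_error(int sample, int reconstructed) {
    int64_t error = (int64_t)sample - reconstructed;
    return (uint64_t)(error * error);
}
```

Any difference of magnitude 46341 or more already exceeds `INT_MAX` when squared in 32 bits, so the overflow is not limited to extreme inputs.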
The current implementation is dependent on 64-bit integers. This does not work on many (old) ANSI-C compilers that don't have a 64-bit integer type such as long long. I have seen that the format itself does not need 64-bit integers at all. Is it possible to make the implementation independent of 64-bit ints and therefore much more compatible with a wide range of systems?
Little-endian seems to make more sense IMO. Every common architecture uses little endian (x86/arm/risc-v) meaning using big-endian forces the vast majority of hardware in use to do a bswap. Not an expensive operation but more expensive than not having to do anything at all.
Adding const before char* removes a warning at compile time:
qoaplay_desc *qoaplay_open(char *path)
Hi, Is it possible we could see a very simplistic encoder/decoder program that only reads/writes raw audio files?
Is there a way to convert planar audio data (float 32 : LLLLLL.......RRRRRR) to qoa format with minimum data loss?
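One possible approach, sketched under the assumption that the encoder consumes interleaved signed 16-bit samples: convert the planar floats to interleaved `int16_t` first, clamping and rounding each value. Both function names are hypothetical, and the reduction to 16 bits is itself a lossy step that happens before QOA compression.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Convert one float sample in [-1, 1] to signed 16-bit, rounding to nearest
   and clamping out-of-range input. NaN is mapped to silence. */
static int16_t float_to_s16(float x) {
    float scaled = x * 32767.0f;
    if (x != x) return 0;                      /* NaN */
    if (scaled >= 32767.0f) return 32767;
    if (scaled <= -32768.0f) return -32768;
    return (int16_t)(scaled + (scaled >= 0.0f ? 0.5f : -0.5f));
}

/* planar: channel 0 samples, then channel 1 samples, and so on
   (LLLL...RRRR). out: interleaved frames (LRLR...), frames * channels long. */
static void planar_to_interleaved_s16(const float *planar, size_t frames,
                                      unsigned channels, int16_t *out) {
    assert(planar != NULL && out != NULL);
    for (unsigned c = 0; c < channels; c++) {
        for (size_t i = 0; i < frames; i++) {
            out[i * channels + c] = float_to_s16(planar[(size_t)c * frames + i]);
        }
    }
}
```

Scaling by 32767 keeps the mapping symmetric around zero; adding dither before rounding is a common refinement that this sketch leaves out.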
As pointed out on HN there are some audible periodic clicks, particularly in the 24_tuba_arpegio_melodious_phrase_stereo test sample. As I have now confirmed, this is indeed happening because of the quantization of weights and history samples for the frame headers here.
Currently, the quantization is just history[i] >> 8. These clicks go away when quantizing with something that remains more accurate at the lower end (e.g. sqrt(weights[i])). However, I think the added complexity of having sqrt or some kind of exponent + mantissa representation that fits into 8 bits there is not worth it.
Or is there some simple way to quantize 16 bit ints into 8 bit with something that approximates sqrt()?
Anyway, my current plan is to just store the LMS state unquantized. That is: 16 bit for each weight and history sample. This increases the bitrate to 278kbits/s (44100hz, stereo). I would also change the layout of the frame header, so that the weights and history samples can be stored/loaded as 64bits each. So instead of:
struct {
struct {
int8_t history; // quantized to 8 bits
int8_t weight; // quantized to 8 bits
} lms_entry[4];
} lms_state[num_channels];
we would have
struct {
int16_t history[4];
int16_t weights[4];
} lms_state[num_channels];
Thoughts?
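The 278 kbit/s figure can be reproduced with a short calculation. This sketch assumes an 8-byte frame header, 16 bytes of unquantized LMS state per channel, and 256 slices of 8 bytes per channel with 20 samples per slice. It divides by 1024 because that matches the kbit/s values printed in the qoaconv logs on this page, and it ignores the one-off file header. The function name is hypothetical.

```c
#include <assert.h>

/* Approximate bitrate in kbit/s (1 kbit = 1024 bit here) for full frames. */
static double qoa_bitrate_kbit(unsigned channels, unsigned samplerate) {
    assert(channels > 0 && samplerate > 0);
    const double frame_bytes = 8.0 + channels * (16.0 + 8.0 * 256.0);
    const double samples_per_frame = 256.0 * 20.0;   /* per channel */
    const double frames_per_second = samplerate / samples_per_frame;
    return frame_bytes * 8.0 * frames_per_second / 1024.0;
}
```

For two channels at 44100 Hz this gives roughly 278.32, of which the 16-byte LMS state per channel accounts for only about 2 kbit/s.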
https://godbolt.org/z/Ezd47sd5M suggests that the current qoa_read_u64 function produces a surprising amount of x86_64 machine code. Certainly a lot more than "just a mov (and bswap)".
This is possibly because something like the (*p)+5 in (uint64_t)bytes[(*p)+5] << 16 can overflow an unsigned int (but an unsigned int is not a size_t). But I'm just guessing.
Since it's called pretty deep in the qoa_decode_frame loop, this qoa_read_u64 function might be worth optimizing.
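One variant worth trying on godbolt is sketched below. It is not the reference function: the name `read_u64_be` and the `size_t` index are assumptions of this sketch. The idea is to compute the base pointer once and index it with constants, so no per-byte index arithmetic can wrap. Whether a given compiler reduces this to a single load plus byte swap has to be checked in the generated code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a big-endian 64-bit value at bytes[*p] and advance *p by 8. */
static uint64_t read_u64_be(const unsigned char *bytes, size_t *p) {
    assert(bytes != NULL && p != NULL);
    const unsigned char *b = bytes + *p;   /* base pointer computed once */
    *p += 8;
    return ((uint64_t)b[0] << 56) | ((uint64_t)b[1] << 48) |
           ((uint64_t)b[2] << 40) | ((uint64_t)b[3] << 32) |
           ((uint64_t)b[4] << 24) | ((uint64_t)b[5] << 16) |
           ((uint64_t)b[6] <<  8) |  (uint64_t)b[7];
}
```

This stays portable across host byte orders because it never reinterprets the buffer as a wider type.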