binomialllc / bc7e
Binomial's fast high quality full-featured SIMD BC7 encoder.
License: Apache License 2.0
This isn't an issue, just a minor suggestion for improving image quality in the future.
The encoder takes input that has already been rounded to u8. Most game source images these days are u16, so if the encoder accepted u16 or float input instead, it could match the original image more accurately.
I think this would be particularly useful for normal maps: rounding to u8 and then having the BC7 encoder round a second time can be quite destructive to quality.
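To make the double-rounding concern concrete, here is a minimal sketch that quantizes a 16-bit channel value to a low endpoint precision either directly or via an intermediate u8, and counts the inputs where the two paths disagree. The function names are hypothetical, and the 5-bit target is only an example (it matches the color endpoint precision of BC7 mode 2); real BC7 error also depends on p-bits, index interpolation, and the encoder's endpoint search, all of which this ignores.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: round(v * out_max / in_max), halves rounded up,
   using integer arithmetic only. */
static uint32_t requantize(uint32_t v, uint32_t in_max, uint32_t out_max) {
    assert(in_max > 0);
    return (2u * v * out_max + in_max) / (2u * in_max);
}

/* Quantize a u16 channel straight to `bits` of precision (single rounding). */
static uint32_t direct_u16(uint16_t v, unsigned bits) {
    assert(bits >= 1 && bits <= 8);
    return requantize(v, 65535u, (1u << bits) - 1u);
}

/* Quantize via an intermediate u8 first, i.e. the "round twice" path. */
static uint32_t via_u8(uint16_t v, unsigned bits) {
    assert(bits >= 1 && bits <= 8);
    uint32_t v8 = requantize(v, 65535u, 255u);
    return requantize(v8, 255u, (1u << bits) - 1u);
}

/* Count u16 inputs whose quantized value changes because of the u8 step. */
static unsigned count_double_rounding_mismatches(unsigned bits) {
    unsigned mismatches = 0;
    for (uint32_t v = 0; v <= 65535u; ++v)
        if (direct_u16((uint16_t)v, bits) != via_u8((uint16_t)v, bits))
            ++mismatches;
    return mismatches;
}
```

For example, the u16 value 1100 rounds directly to 5-bit level 1, but the u8 step first rounds it to 4, which then rounds down to 5-bit level 0. At 8-bit precision the two paths always agree, so the extra error only appears when the encoder quantizes below 8 bits.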
The screenshot at https://cdn.discordapp.com/attachments/780432012766347315/834102101083947049/unknown.png shows what this looks like for a black texture against a bright background. Most pixels have alpha 255, the same as the source image (which I double-checked does not have this problem), but an erratic pattern of alpha-254 pixels appears across the whole texture. It happens on every texture I tried and seemingly at all compression/speed levels.
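One format-level mechanism that can yield exactly 254 is worth checking: in BC7 mode 6, each alpha endpoint is 7 bits plus a per-endpoint p-bit that becomes the least significant bit, so 255 is only reachable when an endpoint is 127 with p-bit 1. The sketch below is a hypothesis to test, not a diagnosis of bc7e; the function names are hypothetical, and the interpolation uses the BC7 weighting formula ((64 - w) * e0 + w * e1 + 32) >> 6.

```c
#include <assert.h>
#include <stdint.h>

/* BC7 mode 6: 7-bit alpha endpoint plus a per-endpoint p-bit appended
   as the least significant bit, giving an 8-bit endpoint value. */
static uint8_t mode6_alpha_endpoint(uint8_t a7, uint8_t pbit) {
    assert(a7 <= 127 && pbit <= 1);
    return (uint8_t)((a7 << 1) | pbit);
}

/* BC7 index interpolation between two 8-bit endpoints; w is a weight
   in [0, 64] taken from the mode's weight table. */
static uint8_t bc7_interpolate(uint8_t e0, uint8_t e1, unsigned w) {
    assert(w <= 64);
    return (uint8_t)(((64u - w) * e0 + w * e1 + 32u) >> 6);
}
```

If both endpoints are at most 254, every interpolated value is also at most 254, so an encoder that settles on p-bit 0 for an opaque block's top alpha endpoint would produce exactly the 254 pattern described above.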
Mostly for learning purposes, I'm porting bc7e to a GPU compute shader. It's a world of pain due to shader compilers, drivers, etc.; that has nothing to do with bc7e itself, the compute shader ecosystem just leaves a lot to be desired. I noticed that in some cases the results differ depending on the compute shader threadgroup size.
It seems that in the whole bc7e codebase there is exactly one place that does anything across SIMD lanes (everything else is per-lane code that never interacts with other lanes). It's this piece in estimate_partition_list():
    if ((total_subsets == 2) && (partition == BC7E_2SUBSET_CHECKERBOARD_PARTITION_INDEX))
    {
        if (all(i >= HIGH_FREQUENCY_SORTED_PARTITION_THRESHOLD))
            break;
    }
If I disable that part of the ISPC code, the GPU results match the CPU results.
If I understand this code correctly, it's a heuristic optimization along the lines of "if we got this far into the partition list, skip checking later ones since they likely won't matter".
But I'm not sure why it uses all(i >= threshold), which can make the results depend on the SIMD width (e.g. SSE4 vs. AVX2 vs. AVX-512). Using a plain per-lane i >= threshold would make the results independent of SIMD width.
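A small simulation shows why all() makes the outcome width-dependent: whether a given lane takes the early break depends on which other lanes happen to share its gang. This is a sketch under simplifying assumptions: execution masks are ignored, the per-lane values of i are made up, and HYPOTHETICAL_THRESHOLD stands in for HIGH_FREQUENCY_SORTED_PARTITION_THRESHOLD.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for HIGH_FREQUENCY_SORTED_PARTITION_THRESHOLD (value made up). */
#define HYPOTHETICAL_THRESHOLD 10

/* Mimics ISPC's all(): true only if the predicate holds in every lane of
   the gang containing `lane`. Gangs are consecutive runs of gang_width
   lanes; execution masks are ignored for simplicity. */
static bool gang_all_ge(const int *i_per_lane, size_t n_lanes,
                        size_t gang_width, size_t lane, int threshold) {
    size_t start = (lane / gang_width) * gang_width;
    size_t end = start + gang_width;
    if (end > n_lanes)
        end = n_lanes;
    for (size_t k = start; k < end; ++k)
        if (i_per_lane[k] < threshold)
            return false;
    return true;
}

/* Per-lane test: depends only on the lane's own value, never on the
   gang width. */
static bool lane_ge(const int *i_per_lane, size_t lane, int threshold) {
    return i_per_lane[lane] >= threshold;
}
```

With per-lane values {12, 12, 12, 12, 12, 3, 12, 12}, lane 0 breaks out with a gang width of 4 (its gang is lanes 0 to 3) but not with a width of 8, because lane 5 is below the threshold. The per-lane test gives the same answer for lane 0 at any width, which is the determinism property the GPU port needs.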
Can bc7e be embedded in a single executable without depending on OpenMP, the UCRT, or the CRT?
I remember ISPC not being able to do this 3-4 years ago.
This is for use in Godot Engine.