nfrechette / acl
Animation Compression Library
License: MIT License
We should document the code as well as add a page under docs
to show example code with how to populate the structures and what they are used for.
https://bestpractices.coreinfrastructure.org/
See https://github.com/nlohmann/json for an example
Easier to read that way.
If the variable bit rate optimization algorithm fails to find a suitable quantized bit rate with an acceptable error, it falls back to 32 bits per component, stored as bit-aligned float32 values. When this happens, range reduction can needlessly reduce the accuracy. Since we are already storing full floats, we might as well store the original clip values without any range reduction. This will considerably increase the accuracy of that special bit rate and avoid issues with exotic world-space clips where range reduction hurts us.
Some error functions employ recursion. This is bad for very long bone chains. It should be easy enough to remove.
https://fgiesen.wordpress.com/2012/08/15/linear-interpolation-past-present-and-future/
We should look into the lerp formulation: lerp_1(t, a, b) = (1 - t) * a + t * b
Measure and publish the results.
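The formulations discussed in Giesen's article can be sketched as follows. `lerp_1` matches the note above; `lerp_2` and `lerp_fma` are the usual alternatives from the article, shown here for comparison (these are illustrative helpers, not ACL code):

```cpp
#include <cassert>
#include <cmath>

// lerp_1 is exact at both endpoints; lerp_2 uses one fewer multiply but may
// not return exactly 'b' at t == 1 due to rounding.
inline float lerp_1(float t, float a, float b) { return (1.0f - t) * a + t * b; }
inline float lerp_2(float t, float a, float b) { return a + t * (b - a); }

// A fused-multiply-add variant of lerp_1, exact at both endpoints since
// fma(-t, a, a) yields exactly 0 when t == 1.
inline float lerp_fma(float t, float a, float b) {
    return std::fma(t, b, std::fma(-t, a, a));
}
```

Measuring would compare accuracy at the endpoints and throughput of each variant.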
It is imperative that the error metric function be as close as possible to what the host game engine uses internally to compute and blend poses.
For example, if we use matrices within the engine, we must use matrices to compute the error metric. Failing to do so could lead to the compression algorithm not seeing the same error as the game engine. AffineMatrix_32 does not perform at all like Transform_32 when scale is present. This would also allow support for VQM transforms.
Sometimes monotonic time updating isn't desired between keys. This could be to give a retro look and feel to animations (e.g. lego movie) or to handle camera cuts in cinematics where we teleport the character and do not wish to interpolate between some keys.
Can we add android to CI somehow?
Add android to cmake and make.py.
We should document the code as well as add a page under docs
to show example code with how to populate the structures and what they are used for.
How many bytes needed in clip header?
Segment header?
Constant track data?
Clip range data?
Segment track formats?
Segment range data?
Animated data is already tracked
Once we have this information, we can trivially compute how many bytes and how many cache lines are touched to sample 1 bone or 1 pose.
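Given the byte offset and size of the data read for a sample, the cache-line count is a simple interval computation. A minimal sketch, assuming 64-byte cache lines (helper name is hypothetical):

```cpp
#include <cassert>
#include <cstdint>

// Number of 64-byte cache lines touched by a read of num_bytes starting at
// first_byte_offset: count the lines spanned by the closed byte interval.
inline uint32_t num_cache_lines_touched(uint32_t first_byte_offset, uint32_t num_bytes) {
    if (num_bytes == 0)
        return 0;
    uint32_t first_line = first_byte_offset / 64;
    uint32_t last_line = (first_byte_offset + num_bytes - 1) / 64;
    return last_line - first_line + 1;
}
```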
There are already some unit tests for math functions.
Make sure we have 100% coverage or as much as reasonably possible.
It needs to be broken down for every platform we support.
It needs to cover the bare minimum needed for integration: allocator, error handling, populating raw clip structures, compressing, and decompressing.
A section on contributing with details on: how to run the unit tests, the make.py script, the various tools, etc.
We have a lot of appveyor and travis jobs at the moment and they often fail on travis when installing packages due to download timeouts. Considering that each build is fairly fast, there is no need to have one job per configuration permutation.
It would make sense to have 1 job per compiler and do both debug/release and x86/x64 on it. Two jobs for appveyor (vs2015, vs2017) and five for travis (gcc5, clang4, clang5, xcode8, xcode9).
See discussion in issue #63.
See https://github.com/nlohmann/json as an example
Could be useful to compare how it measures against the other error metrics that support scale.
We should validate the various memory_utils.h functionalities with unit tests.
It is relevant to track and I already have the data, just need to write it down.
We should document the code as well as add a page under docs
to show examples with how to use it.
We should document the code as well as add a page under docs
to show example code with how to use it.
We should document the code as well as add a page under docs
to show example code with how to implement the interface.
Allocator should be renamed AnsiAllocator and derive from a new IAllocator interface.
A new DebugAllocator should be created that simply passes allocations through and asserts at destruction that the number of live allocations is zero, to do rudimentary tracking of memory leaks and double frees.
This allocator should be used in the tools and unit tests that we provide to ensure there are no memory leaks or double frees, etc.
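A minimal sketch of what this could look like; `IAllocator` and `DebugAllocator` follow the names proposed above, but the exact interface shown here is an assumption, not the ACL API:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical allocator interface as proposed above.
class IAllocator {
public:
    virtual ~IAllocator() = default;
    virtual void* allocate(size_t size) = 0;
    virtual void deallocate(void* ptr) = 0;
};

// Passes allocations through to malloc/free while counting live allocations.
// Asserts at destruction that everything was freed: rudimentary leak tracking.
class DebugAllocator final : public IAllocator {
public:
    ~DebugAllocator() override { assert(m_live_allocations == 0); }

    void* allocate(size_t size) override {
        ++m_live_allocations;
        return std::malloc(size);
    }

    void deallocate(void* ptr) override {
        if (ptr != nullptr) {
            --m_live_allocations;
            std::free(ptr);
        }
    }

    int live_allocation_count() const { return m_live_allocations; }

private:
    int m_live_allocations = 0;
};
```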
When there is no scale present, the TransformMatrixErrorMetric never normalizes the rotation quaternion. If the bone chain is long, error could accumulate. Try adding normalization after every transform_mul, or adding it just at the end, and compare the results.
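A sketch of the first variant, normalizing after every multiply while accumulating rotations down a chain (standalone `Quat` type and helpers for illustration, not the ACL math library; the cheaper variant hoists the normalize out of the loop):

```cpp
#include <cassert>
#include <cmath>

struct Quat { float x, y, z, w; };

// Hamilton product a * b.
inline Quat quat_mul(const Quat& a, const Quat& b) {
    return {
        a.w * b.x + a.x * b.w + a.y * b.z - a.z * b.y,
        a.w * b.y - a.x * b.z + a.y * b.w + a.z * b.x,
        a.w * b.z + a.x * b.y - a.y * b.x + a.z * b.w,
        a.w * b.w - a.x * b.x - a.y * b.y - a.z * b.z };
}

inline Quat quat_normalize(const Quat& q) {
    float len = std::sqrt(q.x * q.x + q.y * q.y + q.z * q.z + q.w * q.w);
    return { q.x / len, q.y / len, q.z / len, q.w / len };
}

// Accumulate rotations down a bone chain, renormalizing after every multiply
// so error cannot compound along long chains.
inline Quat accumulate_chain(const Quat* chain, int num_bones) {
    Quat result = { 0.0f, 0.0f, 0.0f, 1.0f };
    for (int i = 0; i < num_bones; ++i)
        result = quat_normalize(quat_mul(result, chain[i]));
    return result;
}
```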
This would be critical for production use and allow a fallback algorithm to be used if the error isn't good enough.
Add the max error to: OutputStats
iOS needs to support decompression, compression is optional and not really required for now.
Add iOS to cmake and make.py.
Can we add iOS to Travis CI?
Can we run unit tests on iOS?
Make sure unaligned loads are handled properly. On ARM, __packed is required!
Can be tested in UE 4.15.
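One portable way to handle the unaligned loads mentioned above is a memcpy-based helper, sketched here as an alternative to `__packed` (hypothetical helper name, not ACL code):

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Portable unaligned load: memcpy compiles to a single load on x86 and to the
// appropriate instruction sequence on ARM, avoiding the undefined behavior of
// dereferencing a misaligned pointer (the problem __packed also works around).
inline uint32_t unaligned_load_u32(const uint8_t* ptr) {
    uint32_t value;
    std::memcpy(&value, ptr, sizeof(value));
    return value;
}
```

The expected value below assumes a little-endian host.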
See here for details on how it works at a high level: http://nfrechette.github.io/2016/12/22/anim_compression_error_compensation/
This would help dramatically for the few remaining exotic clips in the Paragon data set where the max error is unusually high.
Storing bone transforms in local space of the bind pose. For translation in particular, this reduces the range of values that we compress, increasing the accuracy and reducing the memory footprint a bit. At runtime when we decompress, we simply add back the bind pose.
Should be optional, this might very well be best done by the game. Perhaps we can provide only helper functions that the game can call. Maybe do nothing at all and let them deal with it?
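If provided as helper functions, they could look roughly like this for translation (hypothetical names and standalone types, not the ACL API):

```cpp
#include <cassert>

struct Vec3 { float x, y, z; };

// Store translations relative to the bind pose before compression: the values
// to quantize become smaller, so range reduction gets tighter ranges.
// At decompression we simply add the bind pose back.
inline Vec3 make_relative_to_bind_pose(const Vec3& raw, const Vec3& bind) {
    return { raw.x - bind.x, raw.y - bind.y, raw.z - bind.z };
}

inline Vec3 apply_bind_pose(const Vec3& relative, const Vec3& bind) {
    return { relative.x + bind.x, relative.y + bind.y, relative.z + bind.z };
}
```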
Lots of modern processors support pop_count and count_leading_zero type instructions. These can speed up bit set manipulation considerably and could be used to optimize the decompression and the bone chain iterator.
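A sketch of how such instructions can drive bit set iteration: instead of testing every bit, peel off set bits with count-trailing-zeros. GCC/Clang builtins are assumed here; MSVC has equivalent intrinsics (`__popcnt`, `_BitScanForward`):

```cpp
#include <cassert>
#include <cstdint>

inline int pop_count(uint32_t mask) { return __builtin_popcount(mask); }

// Visit the index of every set bit, lowest first, in O(popcount) iterations.
template <typename Fn>
inline void for_each_set_bit(uint32_t mask, Fn fn) {
    while (mask != 0) {
        int bit_index = __builtin_ctz(mask); // index of lowest set bit
        fn(bit_index);
        mask &= mask - 1;                    // clear lowest set bit
    }
}
```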
Instead of always dropping the W component, we should attempt to drop the largest component and store 2 bits somewhere to remember which component is dropped. This should improve accuracy considerably when W is small.
Where to store the extra 2 bits:
Note that because the component dropped might change from sample to sample or segment to segment (depending on the above variant), we will have to store the full 4 component range information for the clip/segment. This is unfortunate but we will likely need the 4th component anyway in order to mix in full quaternion variable bit rate (no component dropping) when precision requires it.
For full precision mode and for constant samples, we can store the 2 bits as part of the 3 remaining floats. Because rotations have their values between [-1.0, 1.0], we only use a subset of the floating point range: our unbiased exponent is always 0 or smaller. With IEEE-754, the exponent is stored biased as exponent + 127 on 8 bits, meaning our stored exponent value is always smaller than 128 and the first exponent bit in our floating point number is always 0. We can use the first two floats to store our 2 bits and clear them after the load to reconstruct the original exponent. This can be very cheap. We also have a spare bit in the 3rd remaining component which could be used to reconstruct the sign of the stripped component. Note that this means rotations cannot safely encode infinity/NaN, which is fine.
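The exponent trick can be sketched in isolation: bit 30 is the top exponent bit, and for values with magnitude below 2.0 it is always 0, so it can carry one payload bit (helper names are hypothetical):

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

inline float bit_cast_float(uint32_t u) { float f; std::memcpy(&f, &u, sizeof(f)); return f; }
inline uint32_t bit_cast_u32(float f) { uint32_t u; std::memcpy(&u, &f, sizeof(u)); return u; }

// For |value| < 2.0 the biased exponent is < 128, so bit 30 (the top exponent
// bit, mask 0x40000000) is always 0 and can store one flag bit. Clearing it
// after the load restores the original float exactly.
inline float embed_flag(float value, bool flag) {
    return bit_cast_float(bit_cast_u32(value) | (flag ? 0x40000000u : 0u));
}
inline bool extract_flag(float value) {
    return (bit_cast_u32(value) & 0x40000000u) != 0;
}
inline float clear_flag(float value) {
    return bit_cast_float(bit_cast_u32(value) & ~0x40000000u);
}
```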
Measure and publish the results.
See also:
https://gafferongames.com/post/snapshot_compression/
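A sketch of dropping the largest component and reconstructing it from the unit-length constraint, as described above (standalone types for illustration, not ACL code; sign handling by negating the quaternion is omitted):

```cpp
#include <cassert>
#include <cmath>

struct Quat { float x, y, z, w; };

// Index (0..3 for x..z, w) of the component with the largest magnitude.
// Because the dropped component is the largest, the three stored components
// are each at most 1/sqrt(2) in magnitude, improving quantization accuracy.
inline int largest_component_index(const Quat& q) {
    float abs_v[4] = { std::fabs(q.x), std::fabs(q.y), std::fabs(q.z), std::fabs(q.w) };
    int best = 0;
    for (int i = 1; i < 4; ++i)
        if (abs_v[i] > abs_v[best])
            best = i;
    return best;
}

// Rebuild the dropped component from |q| == 1; assumes it was non-negative
// (ensured by negating the quaternion before dropping, since q and -q
// represent the same rotation).
inline Quat reconstruct(float a, float b, float c, int dropped_index) {
    float d = std::sqrt(std::fmax(0.0f, 1.0f - a * a - b * b - c * c));
    switch (dropped_index) {
    case 0: return { d, a, b, c };
    case 1: return { a, d, b, c };
    case 2: return { a, b, d, c };
    default: return { a, b, c, d };
    }
}
```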
We should document the code as well as add a page under docs
to show example code.
Move the sjson writer to sjson-cpp and fix other changes made by ACL.
Include a full version under external, same as catch.
In the clip reader/writer which use the sjson stuff, add a check if the corresponding sjson header has ALREADY been included. Force the user to include SJSON manually, they can then either use their own dependency or the one included in external.
Appveyor already builds x86 but it has not been tested beyond the unit tests passing.
acl_compressor.py needs to be run on CMU to properly validate with: vs2015, vs2017, gcc5, clang5.
Add x86 support to Travis CI.
While investigating an exotic clip from Paragon with an unusually high error (~9cm), I found out that when we drop the W component of a quaternion, it can yield a large error which is compounded by a deep hierarchy and excessively high scale (8000.0) and translation values (20000.0).
Attempting to use AffineMatrix_64 did not help at all; the issue isn't with the arithmetic or the rounding, but with the fact that a small error in quat.w yields a small error in the matrix itself, and it compounds. It is not possible to ortho-normalize the matrix at every bone because it contains scale.
When comparing against UE 4.15, the same clip has an error of ~170cm using the ACL error metric. However, using the UE 4.15 error metric, it is quite acceptable (<1cm). I also confirmed within UE 4.15 that the animation clip looks very clean; there is no visible error. This means that, at least for this clip, the UE 4.15 error metric is much more accurate than ACL's when scale is present in this fashion.
We should have some documentation in the code as well as example code under docs showing how to populate the structures and what they are used for.
The segment range extent is always bounded by (1.0 - min value): if the min value is, say, 0.6, the extent can be at most 0.4.
Instead of doing: mul_add(value, extent, min)
Try: mul_add(value, (1.0 - min) * extent_scaled, min)
The smaller the range extent, the more precise our bits become.
The same also holds for the range extent for rotation tracks since the boundaries are known: [-1.0 .. 1.0]
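The rescaled-extent idea above can be sketched as follows: store the extent divided by its known upper bound (1.0 - min) so the full quantization range is used, and fold the bound back in at decode time (hypothetical helper names, not ACL code):

```cpp
#include <cassert>
#include <cmath>

// Baseline decode: mul_add(value, extent, min).
inline float decode_normalized(float value, float min, float extent) {
    return value * extent + min;
}

// The extent can never exceed (1.0 - min), so storing extent / (1.0 - min)
// uses the full range of the quantized extent, making its bits more precise.
inline float encode_scaled_extent(float extent, float min) {
    float max_extent = 1.0f - min;
    return max_extent > 0.0f ? extent / max_extent : 0.0f;
}

// Decode variant: mul_add(value, (1.0 - min) * extent_scaled, min).
inline float decode_scaled(float value, float min, float extent_scaled) {
    return value * ((1.0f - min) * extent_scaled) + min;
}
```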
We should document the code as well as add a page under docs
to show example code with how to populate the structures and what they are used for.
Range reduction sometimes causes accuracy loss. Investigate fixed point arithmetic to see if it can improve accuracy.
Perhaps a mix of fixed point/float32 arithmetic should be used for optimal results?
Also keep in mind performance implications for the decompression.
http://x86asm.net/articles/fixed-point-arithmetic-and-tricks/
https://en.wikipedia.org/wiki/Fixed-point_arithmetic
Use a rotation track from CMU for a segment, 16 rotations.
Compare with current float32 code path.
Compare with float64 code path.
Compare with fixed point code path (possibly various precision settings).
Exhaustive comparison for every possible bit rate?
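A rough sketch of the comparison harness: reconstruct a quantized value through float32, through a 32.32 fixed point path, and compare both against a float64 ground truth. Illustrative only (assumes extent < 1.0 to avoid overflow; not ACL code):

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Ground truth: float64 range-reduction reconstruction.
inline double reconstruct_f64(uint32_t q, int num_bits, double min, double extent) {
    double normalized = double(q) / double((1u << num_bits) - 1);
    return normalized * extent + min;
}

// Current float32 code path.
inline float reconstruct_f32(uint32_t q, int num_bits, float min, float extent) {
    float normalized = float(q) / float((1u << num_bits) - 1);
    return normalized * extent + min;
}

// 32.32 fixed point path: normalize and scale in integer arithmetic,
// converting to float only at the end. Assumes extent < 1.0.
inline float reconstruct_fixed(uint32_t q, int num_bits, float min, float extent) {
    uint64_t max_q = (uint64_t(1) << num_bits) - 1;
    uint64_t normalized_fp = (uint64_t(q) << 32) / max_q;          // value in 32.32
    uint64_t extent_fp = uint64_t(double(extent) * 4294967296.0);  // extent in 32.32
    uint64_t result_fp = (normalized_fp * extent_fp) >> 32;        // product back in 32.32
    return float(double(result_fp) / 4294967296.0) + min;
}
```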
https://software.intel.com/en-us/forums/intel-isa-extensions/topic/301988
http://codesuppository.blogspot.ca/2015/02/sse2neonh-porting-guide-and-header-file.html
https://blog.molecular-matters.com/2013/05/24/a-faster-quaternion-vector-multiplication/
Classic:
v' = q * v * conjugate(q)
Different formulation:
t = 2 * cross(q.xyz, v)
v' = v + q.w * t + cross(q.xyz, t)
Is the accuracy better or worse?
Is the performance better or worse?
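The second formulation can be sketched as follows (standalone types for illustration, not ACL's math library):

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };
struct Quat { float x, y, z, w; };

inline Vec3 cross(const Vec3& a, const Vec3& b) {
    return { a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x };
}

// Rotate v by unit quaternion q using the formulation above:
//   t  = 2 * cross(q.xyz, v)
//   v' = v + q.w * t + cross(q.xyz, t)
inline Vec3 quat_rotate(const Quat& q, const Vec3& v) {
    Vec3 qv = { q.x, q.y, q.z };
    Vec3 t = cross(qv, v);
    t = { 2.0f * t.x, 2.0f * t.y, 2.0f * t.z };
    Vec3 c = cross(qv, t);
    return { v.x + q.w * t.x + c.x, v.y + q.w * t.y + c.y, v.z + q.w * t.z + c.z };
}
```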
OS X needs to support compression and decompression as well as at least the acl_compression.py script.
Unit tests must also pass.
Add OS X to cmake and make.py.
OS X needs to be added to Travis CI as well.
Single segment clips do not benefit from segment range reduction since the extent will be 1.0 and the min will be 0.0, adding no value, just overhead.
CMU does not have that many short clips but Paragon and most games do.
We should document the code as well as add a page under docs
to show examples.
Once the unit tests are extended and in place, adding support for this should be trivial and simply require adding them to travis.
The various packing and unpacking functions should be properly unit tested.
Take 100 clips from CMU, some exotic, others picked based on their duration so we get a good mix.
Uniform sampling should be compressed with various methods and the decompression validated against an error output. See main.cpp in acl_compressor.
Ideally we want to test only the variants that are reasonably expected to be used otherwise the unit tests might take too long to execute. TBD
See https://github.com/nlohmann/json as an example
Additive animation clips can be implemented in one of two ways:
transform_mul(transform_inverse(reference), value)
In the latter format, the 3D scale can be zero, which is problematic.
Ideally when compressing we must measure the error after the clip has been applied to the base clip to ensure the highest accuracy when it is played back. As such we must add the option for a clip to have a reference clip.
Some additive clips use a single frame as a reference while others use the whole clip time scaled.
Note that on the decompression side, the base clip isn't added. This is left for the game engine to perform at its leisure. For now anyway.
We should document the code as well as add a page under docs
to show examples with how to use it.