tmf's Introduction

TMF - high compression ratio (up to 89%), blazing fast 3D model format

What is TMF?

tmf is a model format focused on:

  1. Preserving graphical fidelity
  2. Achieving high compression ratio
  3. Being very fast
  4. Giving a very friendly and explicit API, with high-quality documentation.

What is tmf best at?

tmf works best when operating on moderately sized 3D models (<100k triangles) with a fairly consistent LOD.

How good is TMF at achieving its goals?

  1. As for visual quality, you can judge for yourself from the render comparison below.
  2. Compression ratio usually falls between 86-89%, depending on quality settings. On stricter settings (preserving the exact order of all vertex data), tmf achieves a compression ratio of around 70%.
  3. Decode speeds are very high, in some cases outperforming readers of uncompressed formats by an order of magnitude. For example, decoding the Blender test monkey (Suzanne, file in the tests directory) subdivided 2 times (15.7k triangles, 8.2k points) takes just 678.18 µs (0.67818 ms)! Thanks to built-in tokio integration, decoding may be automatically split between threads, pushing decode speeds even higher. Decoding the mesh containing the bust of Nefertiti, a 3D model with around 2 million triangles, takes 220-240 ms on a single thread, and only 84 ms on 8 threads on an 8-logical-core system (4 physical cores). This means that tmf is very fast. Please note, however, that tmf's compression algorithm struggles with very large models, so models with millions of triangles benefit far less in terms of file size.
  4. The TMF API centres mostly around 2 types: TMFMesh, representing a mesh and all operations that may be done with it, and TMFPrecisionInfo, specifying quality settings. All of TMF's functions and types are well documented, very often with multiple examples showing exactly how to use them, greatly improving ease of use. All operations on a mesh are explicit.

Model render comparison

Uncompressed .obj vs. compressed .tmf file (default settings, data reordering allowed)

Is tmf the right fit for your project?

When it is not the right fit:

  1. You don't care about read speeds at all. In that case, just use Draco. It is way slower (around 10-20x in my tests), but it is also better at compressing.
  2. Your meshes are very big (millions of triangles). tmf was optimised for and tested with much more modest meshes (<150k triangles). Its compression becomes worse the more triangles and points you have. It is still very fast; the compression is just not well suited for such tasks.

When it is the right fit:

  1. You need your models to be smaller, but don't want to sacrifice much of the read speed.
  2. Your meshes are modestly sized or small (<150k triangles).
  3. You only need your meshes to look exactly the same, and are fine with some unnoticeable changes.

How are high compression speeds achieved?

Currently, on default settings, TMF uses bit-wise operations (bit shifts and ORs) to read data, which lets it read data at very high speeds. Additionally, TMF is thread-safe and has built-in, optional multi-threading, allowing many parts of one model to be decoded at the same time by many cores, increasing speed even further.
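
Below is a minimal sketch of this kind of bit-wise read (a simplified illustration, not tmf's actual reader, which works on whole words at a time): it extracts an n-bit unsigned value starting at an arbitrary bit offset, using only shifts and ORs.

// Minimal sketch: read an `n`-bit unsigned value starting at bit
// offset `bit_pos` from a byte slice, using only shifts and ORs.
fn read_bits(data: &[u8], bit_pos: usize, n: u32) -> u64 {
    assert!(n <= 64);
    let mut value: u64 = 0;
    for i in 0..n as usize {
        let bit = bit_pos + i;
        // Extract one bit and OR it into place.
        let bit_val = ((data[bit / 8] >> (7 - (bit % 8))) & 1) as u64;
        value = (value << 1) | bit_val;
    }
    value
}

fn main() {
    let data = [0b1011_0100u8, 0b0111_0000];
    assert_eq!(read_bits(&data, 0, 5), 0b10110); // first 5 bits
    assert_eq!(read_bits(&data, 5, 6), 0b100011); // bits 5..11 cross a byte boundary
}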

How does it work?

While I label tmf a "lossy compression format" in the classical meaning of the phrase, it does not really compress anything (at least for now). The bulk of the space savings comes from storing the model data in different data structures that better reflect the data they hold, and from saving data with exactly the precision it needs (e.g. 9- or 23-bit data types).

Comparisons

The model used in the tests is the Blender monkey (Suzanne). TMF files were saved with default settings (TMFPrecisionInfo::default()).

File size comparison

Format | Size
.obj | 1.3 MB
zip (deflate) compressed .obj | 367.7 kB
.fbx | 651.0 kB
zip (deflate) compressed .fbx | 600.6 kB
.gltf | 476.5 kB
zip (deflate) compressed .gltf | 302.1 kB
.glb | 356.6 kB
zip (deflate) compressed .glb | 267.5 kB
.tmf | 308.3 kB
.tmf with pre-encode optimisations applied | 161.9 kB
.tmf with pre-encode optimisations and hand-picked quality settings | 142.4 kB
zip (deflate) compressed .tmf | 307.9 kB
zip (deflate) compressed .tmf, with pre-encode optimisations | 160.2 kB
zip (deflate) compressed .tmf, with pre-encode optimisations and hand-picked quality settings | 141.0 kB
draco on max compression settings | ~22 kB

TMF vs. Draco

Draco is noticeably better at compression than TMF. If all you are looking for is reduced file size, just use Draco. But if you are looking for both high compression and fast reads, tmf can be a viable alternative.

A comparison of some pros and cons

NOTE: when compression ratios/percentages are given, all formats (e.g. tmf, draco, fbx) are compared against .obj as the uncompressed baseline.

Category | Draco | TMF
Compression ratio | Generally better at compressing data; depending on the compression settings, between ~80-98% | Around 87.3%
3D model (Suzanne) read time | 7-10 ms | ~0.6 ms
Impact of compression on read time | Read time increases with compression level | For most settings, read time decreases with compression level
3D model (Suzanne) write time | 10-18 ms | ~7 ms
Language | C++ | Rust
Official Rust support | None | Native
Build dependencies | C++ compiler, cmake, make | Only the standard Rust toolchain
Using in a Rust project | Requires manual linking | Installs and links automatically using cargo

What can lead to compression of a particular mesh being less efficient?

Greatly varying LOD: The save system dynamically adjusts to the LOD of the mesh. For example, a low-poly castle mesh may be saved with a precision of 10 cm, while a strawberry model may be saved with 1 mm precision. Saving those two objects in one mesh (not file!) will force the castle mesh to be saved with the higher precision, wasting space. Because most meshes naturally have a consistent LOD, and meshes that don't would almost always cause issues elsewhere, this problem is rarely encountered.

Examples

Mesh loading

Loading one mesh

use tmf::TMFMesh;
use std::fs::File;
let mut input = File::open("suzanne.tmf").expect("Could not open .tmf file!");
let (mesh, name) = TMFMesh::read_tmf_one(&mut input).expect("Could not read TMF file!");
// Getting mesh data
let vertices = mesh.get_vertices().expect("No vertices!");
let vertex_triangles = mesh.get_vertex_triangles().expect("No vertex triangle array!");
let normals = mesh.get_normals().expect("No normals!");
let normal_triangles = mesh.get_normal_triangles().expect("No normal triangle array!");
let uvs = mesh.get_uvs().expect("No uvs!");
let uv_triangles = mesh.get_uv_triangles().expect("No uv triangle array!");
// Can provide arrays laid out like OpenGL buffers for ease of use when developing games!
let buff_vert_array = mesh.get_vertex_buffer();
let buff_norm_array = mesh.get_normal_buffer();
let buff_uv_array = mesh.get_uv_buffer();

Loading multiple meshes

use tmf::TMFMesh;
use std::fs::File;
let mut input = File::open("suzanne.tmf").expect("Could not open .tmf file!");
let meshes = TMFMesh::read_tmf(&mut input).expect("Could not read TMF file!");
for (mesh, name) in meshes {
    do_something(mesh, name);
}

Mesh Saving

Saving one mesh

use tmf::{TMFMesh, TMFPrecisionInfo};
use std::fs::File;
// `mesh` and `name` are assumed to come from an earlier load.
let mut output = File::create("suzanne.tmf").expect("Could not create output file!");
let settings = TMFPrecisionInfo::default();

// Reorganize the TMF mesh to have better laid-out data. This can save significant amounts of space.
mesh.unify_index_data();

mesh.write_tmf_one(&mut output, &settings, name).expect("Could not save TMF mesh!");

Saving multiple meshes

use tmf::{TMFMesh, TMFPrecisionInfo};
use std::fs::File;
let mut output = File::create("suzanne.tmf").expect("Could not create .tmf file!");
let settings = TMFPrecisionInfo::default();
TMFMesh::write_tmf(meshes, &mut output, &settings).expect("Could not write TMF meshes!");

Features

0.1 (Current version)

  • Exporting .obj
  • Importing .obj
  • Importing non-triangulated .obj models - experimental, supports only convex polygons
  • Writing/Reading .tmf files
  • Point Positions
  • Point Normals
  • Point UV coordinates
  • Mesh triangles
  • Point clouds
  • Multiple meshes in one file
  • Fully customizable save precision settings
  • Full documentation
  • Examples for each function in the crate
  • Tangent data
  • Custom mesh data
  • Support for RGBA vertex colors (grayscale support using a float attribute), and float/integer vertex attributes.

Planned Features

  • Vertex groups
  • Materials (some initial work already done)

More in-depth explanation of compression

Math-based savings

Many formats used for saving 3D models are shockingly wasteful. There are a lot of opportunities to reduce file size, even when using lossless compression. For example, many model formats treat surface normal vectors like any other vectors. But they aren't like other vectors! They have special properties which can be exploited to save them more efficiently. Namely:

  1. All components of a normal vector fall into the range [-1, 1]. This means that values such as 1.3, 123.0, 69.323, or even 6.50e+12 can never occur in a normal vector, so saving normals in a format which supports those values is wasteful.
  2. All normal vectors fulfil the condition x^2 + y^2 + z^2 = 1. This means that there are a lot of vectors that have all their components in the range [-1, 1] but aren't valid surface normals. If saving those invalid values is supported, space is wasted. So, by taking those properties of normals into consideration, they can be saved in such a way that each combination of saved bits corresponds to a different normal, wasting no space!

An analogous approach is taken for each and every element of model data, reducing the size even further.
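
To make this concrete, here is a minimal sketch of storing normals so that no bit patterns go to waste - an illustration using a simple (angle, z) parameterization of the sphere, not necessarily tmf's exact scheme:

use std::f64::consts::PI;

// Quantize a unit normal into `angle_bits` + `z_bits` bits.
fn encode_normal(n: [f64; 3], angle_bits: u32, z_bits: u32) -> (u64, u64) {
    let angle = n[1].atan2(n[0]); // in [-pi, pi]
    let angle_max = ((1u64 << angle_bits) - 1) as f64;
    let z_max = ((1u64 << z_bits) - 1) as f64;
    let qa = ((angle + PI) / (2.0 * PI) * angle_max).round() as u64;
    let qz = ((n[2] + 1.0) / 2.0 * z_max).round() as u64;
    (qa, qz)
}

// Every (qa, qz) pair decodes to a valid unit vector - no bit patterns are wasted.
fn decode_normal(qa: u64, qz: u64, angle_bits: u32, z_bits: u32) -> [f64; 3] {
    let angle_max = ((1u64 << angle_bits) - 1) as f64;
    let z_max = ((1u64 << z_bits) - 1) as f64;
    let angle = qa as f64 / angle_max * 2.0 * PI - PI;
    let z = qz as f64 / z_max * 2.0 - 1.0;
    // x^2 + y^2 + z^2 = 1 is guaranteed, so x and y need no extra storage.
    let r = (1.0 - z * z).sqrt();
    [r * angle.cos(), r * angle.sin(), z]
}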

Bits vs. bytes based savings

A disadvantage of using byte-aligned data types is the lack of granularity in precision when saving data. A good example is a UV coordinate that should represent a point on a 1024-pixel texture with a precision of .25 pixels. Doing some quick back-of-the-napkin maths, it can be determined that a precision of log2(1024/.25) = log2(4096) = 12 bits is required. But the only available byte-aligned data types are either too small (u8) or way too big (u16 - 25% of the disk space would go to waste!). The solution is forgoing byte alignment. This comes with a slight performance penalty (bit shifts are required) and the inability to use pre-built compression algorithms (they assume byte alignment), but it brings the huge advantage of using data types just wide enough to save what is needed and not a bit wider. Data is laid out in what I call a UBA (Unaligned Binary Array). A UBA consists of a series of elements of arbitrary bit width, where consecutive elements may cross byte boundaries, start or end at any point within a byte, and there is no padding. The element width is usually specified before the UBA itself. For some widths, like 9 bits, the savings from using UBAs can reach as much as 44%!
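
As an illustration of the savings, here is a minimal sketch of packing fixed-width elements into such an unpadded bit stream (a simplified layout, not tmf's exact on-disk format):

// Pack `width`-bit values into a contiguous, unpadded bit stream (a UBA).
fn pack_uba(values: &[u64], width: u32) -> Vec<u8> {
    let mut out = vec![0u8; (values.len() * width as usize + 7) / 8];
    let mut bit_pos = 0;
    for &v in values {
        for i in (0..width).rev() {
            let bit = ((v >> i) & 1) as u8;
            out[bit_pos / 8] |= bit << (7 - (bit_pos % 8));
            bit_pos += 1;
        }
    }
    out
}

fn main() {
    // 1000 12-bit UV coordinates: 1500 bytes in a UBA,
    // versus 2000 bytes as u16s (25% of which would be padding).
    let uvs = vec![0xABC_u64; 1000];
    assert_eq!(pack_uba(&uvs, 12).len(), 1500);
}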


tmf's Issues

[META] Preparations for 0.2: What is missing.

0.2 was planned to be released very soon.
I plan on releasing 0.2 in roughly its current state at 20:00 CET on the 10th of June, because after that I will be unable to work on the project for more than a week.
I am happy with the amount of features in this release. Is there anything missing?

[FEATURE] Support arbitrary vertex attributes

TMF looks super cool!

But I'd not only want to store normal, position and uv data. Formats like glTF also store other attributes per vertex, such as: color, tangent, secondary uvs, skeletal animation bone weights.

Game engines do use more than normals, positions and uvs, and it seems reasonable to expect a mesh serialization format to support additional attributes.

Implementation

I'm conscious this is not a trivial task, because each attribute can have an arbitrary representation, which probably makes it hard to apply useful heuristics. So I'm not venturing to propose an implementation. But the most basic API I'd like is a pair of methods on TMFMesh:

fn set_attribute<T: VertexAttribute>(&mut self, attribute_id: usize, buffer: &[T]) {}
fn get_attribute<T: VertexAttribute>(&self, attribute_id: usize) -> Option<&[T]> {}

(this is an extremely primitive API that can fairly trivially be improved, not a template for a final API)

A method on VertexAttribute could provide TMFMesh with information so that it can apply good compression heuristics, and specifically allow erasing the type (so that it could be stored internally as a Box<[u8]> and cast when accessed).

What is this for?

I'm maintaining the bevy_fbx crate, and bevy is landing a new asset loader with a post-processing step. The FBX format suuuuucks and is wasteful when it comes to game assets (it even leaks private information); using an intermediary representation that is fast to write and read is a necessity tbh.

My fbx loader currently only supports normals, positions and uvs, so I can already use TMF! However, storing tangents in the baked meshes would be super useful; it would avoid having to compute tangents at runtime, which can be fairly expensive.

How does this compare to draco?

Draco is a mesh data compression scheme that is supported e.g. in glTF (with an extension that e.g. Blender supports out of the box) or as a standalone mesh format.

The README shows some comparisons against zipping up some common 3D formats, but based on the description, the impression I got was that glb+Draco would be a much closer alternative than a zipped fbx, so that would be interesting to see.

Draco is not the smallest library so it might also be interesting to take a look at the code size & runtime speed of the different solutions.

EDIT: another similar library is meshoptimizer which, while the main focus seems to be on optimizing meshes for more efficient rendering, also does some tricks to reduce file size.

[FEATURE] Support for quads

I wanted to ask if this is on the to-do list. There is literally no way out there to compress quad meshes effectively right now. Converting to triangles means you lose all the precious clean quad topology.

[FEATURE] Delta encoding

Currently, indices are by far the biggest (size-wise) part of the final file (60%). While previously attempted remedies (e.g. splitting index arrays to allow lower indices to be saved with fewer bits) helped, the issue still persists. A potentially good solution would be delta encoding, modified to better fit this particular use case.

A potential approach could look something like this:

  1. Delta encoding gets assigned a compression type (marked in the segment header).
  2. A delta-encoded segment will start with the following:
    2.a. A field describing the number of elements in the segment (u64).
    2.b. Raw data precision bits (u8).
    2.c. Delta precision bits (u8).
    2.d. Delta min/max (a UBA field, Raw data precision bits wide). IMPORTANT: preceded by a sign bit!
  3. After that, a continuous array of indices, encoded as follows:
    3.a. DeltaOrRaw (u1) - marks whether the next item is delta-encoded (0) or raw (1).
    3.b.0. If delta, a number Delta precision bits long will be encoded, containing a value between Delta min and Delta max. This value should be added to the value of the previous index.
    3.b.1. If raw, a number Raw data precision bits long, encoding the value of the index directly.

This compression type will be selected during encoding if it is beneficial and the TMFPrecisionInfo field allow_delta_encoding is not set to false.

Possible issues:

  1. This relies on neighbouring indices being mostly very similar, which is the case for most meshes (not all!).
  2. May increase decode time. How much? At worst, I expect it to be ~2x slower than normal decoding. There is a slim chance it will be faster. But since it is planned to be opt-out during encoding, eventual performance issues should not be a problem.
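
For illustration, a minimal sketch of decoding such a segment (my reading of the proposal above; the bit-level layout and the handling of deltas as offsets from Delta min are assumptions):

// A tiny MSB-first bit reader over a byte slice.
struct BitReader<'a> { data: &'a [u8], pos: usize }

impl<'a> BitReader<'a> {
    fn read(&mut self, n: u32) -> u64 {
        let mut v = 0u64;
        for _ in 0..n {
            v = (v << 1) | ((self.data[self.pos / 8] >> (7 - self.pos % 8)) & 1) as u64;
            self.pos += 1;
        }
        v
    }
}

// Decode a delta-encoded index segment as described in the proposal.
fn decode_deltas(r: &mut BitReader<'_>, count: u64, raw_bits: u32, delta_bits: u32, delta_min: i64) -> Vec<u64> {
    let mut out = Vec::with_capacity(count as usize);
    let mut last: u64 = 0;
    for _ in 0..count {
        let is_raw = r.read(1) == 1; // the DeltaOrRaw marker bit
        last = if is_raw {
            r.read(raw_bits) // the index value, stored directly
        } else {
            // Deltas are assumed to be stored as unsigned offsets from Delta min.
            (last as i64 + delta_min + r.read(delta_bits) as i64) as u64
        };
        out.push(last);
    }
    out
}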

[FEATURE] Tangent vertex attribute

I'd like to be able to store tangents in my tmf files.

Storing tangents in a serialized mesh format is a way to avoid expensive computation at runtime. Hence reducing load time. In fact glTF specifies tangents as a standard field for their model file format. Quoting https://registry.khronos.org/glTF/specs/2.0/glTF-2.0.html#meshes-overview

XYZW vertex tangents where the XYZ portion is normalized, and the W component is a sign value (-1 or +1) indicating handedness of the tangent basis

Those look very much like a normal (with one additional ±1 component).

[BUG] tmf Doesn't compile with the `fast_trig` feature enabled

I get a compilation error if I enable the fast_trig feature:

tmf = { version = "0.1.1", features = ["fast_trig"] }

When running cargo build I get the following compilation error:

error[E0412]: cannot find type `fprec` in this scope
  --> /github.com-1ecc6299db9ec823/tmf-0.1.1/src/normals.rs:98:27
   |
98 |     let x = fsin(asine as fprec) as FloatType;
   |                           ^^^^^ not found in this scope
   |
note: type alias `crate::utilis::fprec` exists but is inaccessible
  --> /github.com-1ecc6299db9ec823/tmf-0.1.1/src/utilis.rs:21:1
   |
21 | type fprec = f64;
   | ^^^^^^^^^^^^^^^^^ not accessible

I would expect tmf to compile regardless of the enabled features. tmf compiles correctly when fast_trig is not enabled.

Octahedron normals

Hey there!

Have you considered encoding normals using octahedron mapping? It's a neat way to map a direction vector into two components and it has a more uniform distribution than storing (angle, z).

I haven't profiled it against the current implementation, but here's some sample code:

use glam::{Vec2, Vec3};

/// Encode a 3d direction vector to a 2d vector using octahedron mapping.
/// The output vector is in the range [-1..1]. The input vector doesn't have to be normalized.
pub fn encode_oct(dir: Vec3) -> Vec2 {
    let norm = dir.x.abs() + dir.y.abs() + dir.z.abs();
    let nx = dir.x / norm;
    let ny = dir.y / norm;
    if dir.z.is_sign_positive() {
        Vec2::new(nx, ny)
    } else {
        // fold over negative z
        Vec2::new(
            (1.0 - ny.abs()) * nx.signum(),
            (1.0 - nx.abs()) * ny.signum(),
        )
    }
}

/// Decode an octahedron mapped direction vector back to the original one.
/// The output is normalized.
pub fn decode_oct(mut oct: Vec2) -> Vec3 {
    let z = 1.0 - oct.x.abs() - oct.y.abs();
    oct += oct.signum() * z.min(0.0);
    Vec3::new(oct.x, oct.y, z).normalize()
}
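
A quick roundtrip check of the sketch above (a hypothetical test, assuming the glam crate):

fn main() {
    let dir = Vec3::new(0.3, -0.7, 0.2).normalize();
    let roundtrip = decode_oct(encode_oct(dir));
    // Decoding should reproduce the direction up to float error.
    assert!(dir.abs_diff_eq(roundtrip, 1e-5));
}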

If you want to, I could open a PR to compare it to the current impl.

[FEATURE] Shared index segment type

Index segments currently make up most of the mesh size. Different approaches to reducing their size have yielded insufficient results. The previous attempts were small, universal improvements for saving all kinds of meshes. Some meshes can, however, be compressed further by increasing the size of one segment to shrink another. This is where a SharedIndexSegment would come in.

A SharedIndexSegment is a segment that stores indices which are identical, but would normally end up duplicated across completely different segments. A bitmask at the beginning of the segment signals which segments the indices belong to.
This alone will not do a lot, since a large range of indices being shared across segments is very rare. This is where the cost of increasing the size of some other segments comes in.

Let us imagine this hypothetical scenario:
We have a set of vertices:
[va,vb,vc,vd]
and uvs:
[ua,ub,uc,ud]
combined into two triangles, with vertex indices:
[0,1,2,3,1,2]
and uv indices:
[1,2,3,0,1,3]
There are 5 unique combinations of vertex and uv indices:
[(0,1),(1,2),(2,3),(3,0),(1,1)]
If we change the vertex array to look like this:
[va,vb,vc,vd,vb]
and the uv array to look like this:
[ub,uc,ud,ua,ub]
each unique combination of uv and vertex data can be represented with a single index!
[0,1,2,3,4,2]
This has its downsides:

  1. Both the vertex and uv arrays now contain duplicate data.
  2. The highest value in the unified index array is now bigger.
  3. Computing the new data will increase write times.

This is why it is not something that will fit every mesh, and it should be done on a per-mesh basis. Additionally, reordering data may not be acceptable for some user-generated meshes - someone might not want their mesh data reordered. This is why this will be an explicit function: [unify_index_data].

This has the potential to drastically reduce the size of some meshes.
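
A minimal sketch of the unification step described above (an illustration over plain index slices, not tmf's real internal types):

use std::collections::HashMap;

// Merge separate vertex/uv index arrays into a single index array,
// duplicating vertex and uv entries for each unique (vertex, uv) pair.
fn unify_indices(vertex_idx: &[u32], uv_idx: &[u32]) -> (Vec<u32>, Vec<u32>, Vec<u32>) {
    let mut pair_to_new: HashMap<(u32, u32), u32> = HashMap::new();
    let mut new_vertex = Vec::new();
    let mut new_uv = Vec::new();
    let mut unified = Vec::with_capacity(vertex_idx.len());
    for (&v, &u) in vertex_idx.iter().zip(uv_idx) {
        let next = new_vertex.len() as u32;
        let idx = *pair_to_new.entry((v, u)).or_insert_with(|| {
            new_vertex.push(v); // index into the original vertex array
            new_uv.push(u);     // index into the original uv array
            next
        });
        unified.push(idx);
    }
    (unified, new_vertex, new_uv)
}

fn main() {
    // The example from above: 5 unique (vertex, uv) pairs.
    let (unified, verts, uvs) = unify_indices(&[0, 1, 2, 3, 1, 2], &[1, 2, 3, 0, 1, 3]);
    assert_eq!(unified, vec![0, 1, 2, 3, 4, 2]);
    assert_eq!(verts, vec![0, 1, 2, 3, 1]); // -> [va, vb, vc, vd, vb]
    assert_eq!(uvs, vec![1, 2, 3, 0, 1]);   // -> [ub, uc, ud, ua, ub]
}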

What is needed for this to work?

  • Write the unify_index_data function.
  • Find identical fragments of segments, before encoding them
  • Add support for writing/reading shared index segments.

[BUG] Triangle segment spilling has some weird issues.

Describe the bug
Triangle segment spilling should not change the mesh in any way besides reducing file size. Basic tests show that there are no issues with spilling segments, but when a real mesh (Suzanne) is saved/read and then exported, one triangle is always wrong.

To Reproduce
Save/read and then export a mesh.

Expected behavior
Exported mesh is the same as imported one.

Question: Why have distinct index buffers per attribute?

The API exposes set_normal_triangles, set_uv_triangles and set_vertex_triangles. This allows the user to use a different index buffer per attribute.

In my experience, there is only a single set of indices per mesh. So all those foo_triangles arrays are duplicates of the same index buffer!

So why do they exist as separate entities? Shouldn't they be merged into a single one? It seems this could reduce memory usage as well.

Add size comparisons with glTF

Is your feature request related to a problem? Please describe.
glTF is gaining massive traction as a reliable, royalty-free, general-purpose scene description format. Consider adding comparisons to a model-only glTF file.

Describe the solution you'd like
Just as OBJ and Draco have sections in the readme, so should glTF.

Describe alternatives you've considered
N/A

Additional context
It would be nice to compare tmf to glTF, as more and more models are being distributed in it. A glTF->TMF converter would also be nice.
