milchratchet / luminary

CUDA based Pathtracing Offline and Realtime Renderer

License: MIT License

CMake 1.68% C 50.14% Cuda 48.18%

c cuda global-illumination gpu graphics path-tracing ray-tracing raytracing

luminary's Issues

Wavefront Path Tracing

Currently, the main work is separated into a tracing and a shading kernel. The shading kernel can be plagued by a good amount of divergence. Hence it could be beneficial to split it into many small kernels handling each case of hit. To do this it will be necessary to come up with a clever way of storing the samples between kernels. We cannot afford to use more memory than we already do.

Right now I am thinking of storing the samples in per-thread lists, where each thread keeps track of how many samples of each category it has. Additionally, one could balance the lists with a small kernel in which each thread balances the workload within one warp. If this kernel can be made fast, it could be beneficial.
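To make the per-thread bookkeeping concrete, here is a minimal host-side C sketch. It counts the tasks of one list per hit category and then groups them in place, so that no second buffer is needed. The `Task` layout, the `HIT_*` categories and the function names are all assumptions made for illustration and are not taken from Luminary.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical hit categories; the real set depends on the renderer. */
enum { HIT_GEOMETRY, HIT_OCEAN, HIT_SKY, HIT_TOY, HIT_TYPE_COUNT };

/* Hypothetical task layout: just enough to show the grouping. */
typedef struct {
  uint32_t pixel;   /* pixel index the sample belongs to */
  uint8_t hit_type; /* one of the HIT_* values */
} Task;

/* Counts the tasks per category and derives the start offset of each category. */
void count_tasks(const Task* tasks, int n, int counts[HIT_TYPE_COUNT], int offsets[HIT_TYPE_COUNT]) {
  memset(counts, 0, sizeof(int) * HIT_TYPE_COUNT);
  for (int i = 0; i < n; i++)
    counts[tasks[i].hit_type]++;

  int sum = 0;
  for (int t = 0; t < HIT_TYPE_COUNT; t++) {
    offsets[t] = sum;
    sum += counts[t];
  }
}

/* Groups the tasks by category in place by swapping each task into its category's range. */
void group_tasks(Task* tasks, int n, int counts[HIT_TYPE_COUNT], int offsets[HIT_TYPE_COUNT]) {
  count_tasks(tasks, n, counts, offsets);

  int next[HIT_TYPE_COUNT]; /* next free slot inside each category's range */
  memcpy(next, offsets, sizeof(next));

  for (int t = 0; t < HIT_TYPE_COUNT; t++) {
    const int end = offsets[t] + counts[t];
    while (next[t] < end) {
      const Task task = tasks[next[t]];
      if (task.hit_type == t) {
        next[t]++; /* already in the right range */
      } else {
        const int dst = next[task.hit_type]++;
        tasks[next[t]] = tasks[dst]; /* pull the displaced task here and look at it next */
        tasks[dst] = task;
      }
    }
  }
}
```

After grouping, a kernel for category `t` would only need `offsets[t]` and `counts[t]` to find its work. On the GPU, each thread would run this on its own list, and the warp-level balancing step would then even out `counts` between threads.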

Note that with this, I think having multiple samples per pixel should be dropped as that gives 5% performance at best which is negligible in the context of rendering multiple samples and makes everything more complicated. This also implies that we can leave out the memory for the results, records and albedo buffers in the samples. This leaves samples small and helps with the current memory pressure.

This is kind of a different approach to #19.

Offline Mode Postprocess Menu

After rendering, an SDL window should pop up in which you can change properties like exposure. This should be optional through -p --postprocess. It should simply be a realtime window which has a special subset of the menu and does not actually render.

Probability Rejection

Right now the importance sampling runs into the issue that very rare samples sometimes get drawn. These hinder convergence, as pixels that received such a sample will take many more samples to converge from there on.

I simply had the idea that we could reject samples which are too rare. This may also render the conditional light sampling superfluous.
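As a minimal sketch of the idea, the C snippet below averages importance-sampled contributions and skips every sample whose probability density falls below a threshold. The `Sample` struct, the function name and the `min_pdf` parameter are assumptions for illustration. Note that discarding samples without reweighting changes the estimator, so the threshold would have to be chosen with the resulting bias in mind.

```c
#include <stdbool.h>

/* Hypothetical sample: a contribution and the probability density it was drawn with. */
typedef struct {
  float value;
  float pdf;
} Sample;

/* A sample is accepted only if it is not "too rare". */
bool accept_sample(Sample s, float min_pdf) {
  return s.pdf >= min_pdf;
}

/* Averages value / pdf over the accepted samples and reports how many were rejected. */
float estimate_with_rejection(const Sample* samples, int n, float min_pdf, int* rejected) {
  float sum = 0.0f;
  int used = 0;
  *rejected = 0;

  for (int i = 0; i < n; i++) {
    if (!accept_sample(samples[i], min_pdf)) {
      (*rejected)++;
      continue;
    }
    sum += samples[i].value / samples[i].pdf;
    used++;
  }

  return (used > 0) ? sum / (float) used : 0.0f;
}
```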

BVH Traversal - Dynamic Ray Fetching Issue

Dynamic Ray Fetching causes artifacts which somewhat resemble the bounding boxes. Deactivating dynamic ray fetching alleviates this issue.

The issue is independent of the number of samples per pixel and thus probably independent of #6.

ReSTIR

Just some days ago, the paper ReSTIR for global illumination was released. While I have not read it yet, it looks very promising at first glance. I would argue that this is the most important thing to implement going forward. Together with temporal reprojection, this could deliver some incredible quality in realtime mode.

Atomics

The current use of atomics is superfluous. We can either keep track of finished tasks within a kernel and then perform just one atomic call, or we could simply schedule kernels for the maximum number of iterations regardless of whether we are done before that or not.
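The first option can be illustrated with a host-side C11 analogy: finished tasks are counted in a plain local variable and the shared counter is touched only once. The function name and signature are assumptions; in a CUDA kernel the same pattern would use a register as the local counter and a single `atomicAdd` at the end.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Counts finished tasks locally and publishes the total with a single atomic add. */
int publish_finished(const bool* finished, int n, atomic_int* counter) {
  int local = 0;

  for (int i = 0; i < n; i++) {
    if (finished[i])
      local++; /* no atomic needed, this variable is private */
  }

  if (local > 0)
    atomic_fetch_add(counter, local); /* the only atomic operation */

  return local;
}
```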

Screenshot Meta Information

It would be nice to include meta information like camera position and rotation into the screenshots.

Realtime UI Menu

All realtime settings should be available in a UI menu. In terms of design I would suggest something like the forge menus in the Halo games. This will be necessary especially with the more and more options that are planned for Realtime mode.

Settings Serialization

There should be a way to create a *.lum file based on the current settings in Realtime mode.

Ocean - Sun reflection gives weird colors

When the sun is at a certain angle (around 25 degrees), the reflection in the ocean starts to have all kinds of weird colors.

Mesh path as start command / Soft Reset at Runtime

Especially in combination with #20 it should be possible to just specify the path to a mesh. This way users would not have to create a lum file. The lum file can then be created by Luminary. The only issue would be specifying things like width or height. While it would be possible to just ask for these next to the mesh path, I think it would be best to make these variables changeable during runtime. This would pretty much require a soft reset procedure which frees the raytracing instance and initializes a new one. This will also require an integer panel in the UI.

Improved Obj Loader

A faster obj loader could look like this:

Load some fixed amount of bytes from the file (like 4kb)
Take a struct which acts like a questionnaire, read through the string char by char and fill out the questionnaire while doing so.
A questionnaire is done once \n is found. Then we process the questionnaire and start a new one.
If we find \0 we load again some more bytes and keep only the bytes that are still relevant from the current set.
The issue is keeping the amount of logic statements low. The advantages are few file reads, efficient traversal through the string and we can even support quads.
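The steps above can be sketched in C as follows. This is a reduced illustration under stated assumptions: the `Questionnaire` only records the kind of line and the number of fields, so it counts vertices, triangles and quads but does not parse any numbers. All names are hypothetical. Since the questionnaire carries the state across chunk boundaries, this variant does not need to keep leftover bytes or look for \0; it relies on the byte count returned by `fread` instead.

```c
#include <stdio.h>
#include <string.h>

#define CHUNK_SIZE 4096

typedef struct {
  int vertices;
  int triangles;
  int quads;
} ObjStats;

/* Hypothetical questionnaire: filled out char by char for one line. */
typedef struct {
  char kind;    /* first char of the first token, e.g. 'v' or 'f' */
  int kind_len; /* length of the first token, so "v" and "vn" can be told apart */
  int tokens;   /* number of completed tokens on this line */
  int in_token; /* nonzero while inside a token */
} Questionnaire;

/* Processes a finished questionnaire and starts a new one. */
void finish_line(Questionnaire* q, ObjStats* stats) {
  const int tokens = q->tokens + (q->in_token ? 1 : 0);
  const int fields = tokens - 1;

  if (tokens > 0 && q->kind_len == 1) {
    if (q->kind == 'v')
      stats->vertices++;
    else if (q->kind == 'f' && fields == 3)
      stats->triangles++;
    else if (q->kind == 'f' && fields == 4)
      stats->quads++;
  }

  memset(q, 0, sizeof(*q));
}

/* Feeds one chunk of bytes into the questionnaire. */
void scan_chunk(Questionnaire* q, ObjStats* stats, const char* data, size_t n) {
  for (size_t i = 0; i < n; i++) {
    const char c = data[i];

    if (c == '\n') {
      finish_line(q, stats);
    } else if (c == ' ' || c == '\t' || c == '\r') {
      if (q->in_token) {
        q->in_token = 0;
        q->tokens++;
      }
    } else {
      q->in_token = 1;
      if (q->tokens == 0) {
        if (q->kind_len == 0)
          q->kind = c;
        q->kind_len++;
      }
    }
  }
}

/* Reads the file in fixed-size chunks; the last line may lack a trailing \n. */
ObjStats scan_obj(FILE* file) {
  ObjStats stats = {0, 0, 0};
  Questionnaire q;
  char chunk[CHUNK_SIZE];
  size_t n;

  memset(&q, 0, sizeof(q));
  while ((n = fread(chunk, 1, sizeof(chunk), file)) > 0)
    scan_chunk(&q, &stats, chunk, n);
  finish_line(&q, &stats);

  return stats;
}
```

A full loader would extend the questionnaire with the parsed numbers of the current line, which is where keeping the number of branches low becomes the actual challenge.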

Binary Save States

Alongside #20, it should be possible to save the memory to disk and load it again later. The idea is that you can then load a scene quickly without having to wait for the setup.

I suggest a file format like in the Ratchet and Clank games with the header being a list of pointers to the individual memory blocks.
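A minimal C sketch of such a layout is shown below: a fixed header stores the offset and size of every memory block, followed by the raw blocks. The block count, the magic number and all names are assumptions for illustration and do not describe an existing Luminary format. Writing a struct verbatim like this is also only valid when the file is read back by a build with the same struct layout and endianness.

```c
#include <stdint.h>
#include <stdio.h>

#define BLOCK_COUNT 3            /* hypothetical: e.g. triangles, BVH nodes, textures */
#define SAVE_MAGIC 0x4C554D53u   /* hypothetical file identifier */

typedef struct {
  uint32_t magic;
  uint32_t block_count;
  uint64_t offsets[BLOCK_COUNT]; /* byte offset of each block from the file start */
  uint64_t sizes[BLOCK_COUNT];   /* byte size of each block */
} SaveHeader;

/* Lays the blocks out back to back, directly after the header. */
SaveHeader build_header(const uint64_t sizes[BLOCK_COUNT]) {
  SaveHeader header;
  uint64_t offset = sizeof(SaveHeader);

  header.magic = SAVE_MAGIC;
  header.block_count = BLOCK_COUNT;
  for (int i = 0; i < BLOCK_COUNT; i++) {
    header.offsets[i] = offset;
    header.sizes[i] = sizes[i];
    offset += sizes[i];
  }

  return header;
}

/* Writes the header followed by all blocks. Returns 0 on success, -1 on a write error. */
int write_state(FILE* file, const void* const* blocks, const uint64_t sizes[BLOCK_COUNT]) {
  const SaveHeader header = build_header(sizes);

  if (fwrite(&header, sizeof(header), 1, file) != 1)
    return -1;

  for (int i = 0; i < BLOCK_COUNT; i++) {
    const size_t size = (size_t) sizes[i];
    if (size > 0 && fwrite(blocks[i], 1, size, file) != size)
      return -1;
  }

  return 0;
}
```

Loading would then read the header first and use `offsets` and `sizes` to copy each block straight into its destination buffer.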

Optix Pixel Count Limit

It seems that Optix cannot handle more than ~16 million pixels when tensor cores are available. Hence for offline mode a tiled denoising approach is necessary. With that I may as well look into a tiled rendering approach with which one can render insanely large images without memory issues.
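As a rough sketch of the tiling step, the C function below splits an image into horizontal strips that each stay below a pixel budget. The `Tile` struct, the function name and the strip layout are assumptions. A real tiled denoiser would most likely also need overlapping borders between tiles to avoid visible seams, which this sketch leaves out.

```c
typedef struct {
  int x;
  int y;
  int width;
  int height;
} Tile;

/*
 * Splits a width x height image into full-width strips of at most max_pixels each.
 * Returns the number of tiles written, or -1 if the input is invalid or tiles is too small.
 */
int make_strips(int width, int height, long max_pixels, Tile* tiles, int max_tiles) {
  if (width <= 0 || height <= 0 || max_pixels < width)
    return -1;

  long rows_per_tile = max_pixels / width;
  if (rows_per_tile > height)
    rows_per_tile = height;

  const int rows = (int) rows_per_tile;
  const int count = (height + rows - 1) / rows;
  if (count > max_tiles)
    return -1;

  for (int i = 0; i < count; i++) {
    const int y = i * rows;
    tiles[i].x = 0;
    tiles[i].y = y;
    tiles[i].width = width;
    tiles[i].height = (height - y < rows) ? height - y : rows;
  }

  return count;
}
```

For example, an 8192x4096 image with a budget of 16 million pixels is split into three strips of 1953, 1953 and 190 rows.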

Different Ray Generation Approach

I could try using a more common approach of sending one ray per pixel and on a hit we generate diffuse, specular and light rays. The diffuse and specular rays spawn a light ray on hit while a light ray does not spawn another ray on hit.

This would require some rewriting of the kernel and probably cost a lot of memory. However, the current kernel should be extendable by handling each stage like an iteration in the current kernel.

This would not be beneficial for the offline mode but should be better regarding time to noise ratio.

Architecture Selection

Alongside #61 the build process should allow for different GPU archs. Kernel launch parameter tuning will only be available on the arch that I develop for.

This could simply be implemented through a cmake option which defaults to the arch I use.

qoi image support

Luminary needs more support for image formats other than png. Most of these are not so trivial/require extensive libraries. Qoi is a new and pretty promising format. While it is not supported by anything really at the moment, it is easy to implement.

Optix Raytracer

At this point, adding Optix for the ray tracing as an option could be interesting. I have recently seen a comparison of Vulkan, DX12 and Compute Shader for Raytracing on an RTX 2080 Ti, and hardware accelerated RT was more than 4x faster than compute. This is quite large, in fact larger than I expected. With this in mind, as Optix uses the RT cores, it could be very interesting to see how much of a performance improvement could be gained. With that said, it will probably only ever be an opt-in feature, as the compute variant remains the default. In the beginning of this project I didn't want to use Optix because I wanted to learn how to do BVH building and traversal, but now that I have pretty much reached the current limit of static geometry BVHs, I guess there is not much left to learn right now and thus Optix is now valid simply as a feature.

Stack Trace

I figured out how I could implement a pseudo stack trace tracker. I simply keep a static stack of strings containing the names of the functions. Then whenever a function is entered, we call a function that puts the name of the function onto the stack. Whenever a crash_message call happens, we can then print the whole stack. This should be quite elegant and performant, since we could simply allocate a large enough stack so that reallocations are never necessary. The only issue is that we need to keep track of when a function returns. It would be simple to just add a function call at the end of each function, but that would be messy and error prone. With that said, I am not quite sure yet how else to do that.

foo* bar(...) {
  PUSH_FUNC();

  /* Do very exquisite instruction execution */

  POP_FUNC();

  return foobar;
}
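One possible way to avoid the explicit pop, sketched here under the assumption that the code is built with GCC or Clang, is the `cleanup` variable attribute: it calls a function automatically when a local variable goes out of scope, including on early returns. The `TRACE_FUNC` macro, the fixed stack size and the helper names are hypothetical. The attribute is a compiler extension and is not available in MSVC.

```c
#include <stdio.h>

#define FUNC_STACK_SIZE 256 /* hypothetical fixed capacity, so no reallocation is needed */

static const char* func_stack[FUNC_STACK_SIZE];
static int func_stack_depth = 0;

void push_func(const char* name) {
  if (func_stack_depth < FUNC_STACK_SIZE)
    func_stack[func_stack_depth] = name;
  func_stack_depth++; /* keep counting so that pops stay balanced on overflow */
}

void pop_func(void) {
  if (func_stack_depth > 0)
    func_stack_depth--;
}

/* Called by the compiler when the guard variable leaves its scope. */
void pop_func_cleanup(int* guard) {
  (void) guard;
  pop_func();
}

/* GCC/Clang only: pushes now and pops automatically on every return path. */
#define TRACE_FUNC() \
  int trace_guard_ __attribute__((cleanup(pop_func_cleanup), unused)) = (push_func(__func__), 0)

/* Prints the recorded functions, innermost last. */
void print_func_stack(FILE* out) {
  const int depth = (func_stack_depth < FUNC_STACK_SIZE) ? func_stack_depth : FUNC_STACK_SIZE;
  for (int i = 0; i < depth; i++)
    fprintf(out, "  [%d] %s\n", i, func_stack[i]);
}

/* Example functions: the inner one reports the depth while both are active. */
int example_inner(void) {
  TRACE_FUNC();
  return func_stack_depth;
}

int example_outer(void) {
  TRACE_FUNC();
  return example_inner();
}
```

With this, `example_outer()` sees a depth of 2 inside `example_inner` and the stack is empty again once both have returned. A crash handler would call `print_func_stack(stderr)` before exiting.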

Framerate independent input polling

Handle input polling in a separate thread. Communication with the main thread should be done through double buffering and mutexes. Currently I'm thinking about an interface like this:

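/* Opaque types; the typedefs make the short names usable in C. */
typedef struct InputHandler InputHandler;
typedef struct Input Input;

InputHandler init_input_handler(void);
void start_input_handler(InputHandler*);
Input get_input(InputHandler*);
void destroy_input_handler(InputHandler*);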

I am looking to use the windows pthread implementation found on Github.

Sky / Cloud Improvements

There are some tasks left after #11 and #15.

~~Options for rayleigh, mie and ozone coefficients.~~
More cloud types and variety.
Cloud multiple scattering.

Volumetrics

Volumetric lighting on densities which at most depend on height. No 3d textures or other things are intended here. For water it may be useful to improve visuals and make the water more blue.

Volumetric Fog
~~Volumetric Underwater~~

Volumetric Fog Rework

The current fog implementation was quickly thrown together and I am unsure whether it even works at all at the moment. Given the new knowledge in Volumetrics and BRDFs it should be quite doable to get a decent version up and running. Since the fog will have huge light sampling demands I would suggest working on this once #22 is done.

Correlation between frames

In the realtime mode, there is a correlation between frames which causes the image to converge to stripes.

These seem to be related to light sampling as turning off or forcing light samples fixes the issue.

Using more samples per pixel, or computing only a single frame, alleviates the issue, which is why it is not seen in offline mode.

Wireframe Rendering Output

Rendering wireframes should be doable by using the internal triangle coordinates of a ray hit.
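A minimal C sketch of this test is given below. It assumes the ray hit provides barycentric coordinates `u` and `v`, with the third coordinate being `1 - u - v`; a hit lies close to an edge when the smallest of the three is near zero. The function name and the `threshold` parameter are assumptions. Note that a constant threshold in barycentric space produces lines whose on-screen width depends on the size of the triangle.

```c
#include <stdbool.h>

/* Returns true if the hit at barycentric coordinates (u, v) lies close to a triangle edge. */
bool is_on_wireframe(float u, float v, float threshold) {
  const float w = 1.0f - u - v;

  float min_coord = (u < v) ? u : v;
  if (w < min_coord)
    min_coord = w;

  return min_coord < threshold;
}
```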

UI Resizing

The UI height should be resizable. Further, when moving it to the left or right border, the UI should extend to full height, as windows do in most operating systems.

GNU Toolchain

Luminary is currently developed using the Microsoft C/C++ compilers. Since I recently realized how simple it is to use cmake projects in Windows without using Visual Studio, Luminary should probably switch over to GCC/Clang. Most of the code should already work but especially the special _s functions need to be replaced.

BRDF Correctness

The current material workflow, especially the weighting of directions, is shoddy and probably mostly inaccurate. (Fog is so bad I don't even want to start talking about it.)

There are material models that combine specular and diffuse BRDFs. This is probably better than the current approach. The current BRDF has quite a significant energy loss.

Light Source Generation Bug

Sometimes connected meshes of a light source end up spawning more than one light source.

Workload Sorting

Sorting the samples should provide better performance. However, fast sorting algorithms require a lot of extra memory. Thus it is first necessary to make it possible to render only parts of the image based on a pixel offset, so that the samples take up less space and leave enough room for the sorting.

Partial Rendering
Sorting

Realtime Bloom

Bloom is currently only available in Offline mode. The kernel needs to run on the GPU, be of high quality and take less than 0.5ms on 1920x1080.

Interactive Toy Primitive

One should be able to have a primitive that one can move, resize and rotate around as desired. Its material should be fully customizable in realtime, that includes emission. This should not be included in the BVH and thus is limited to 1 primitive. However, there should be multiple kinds of primitives available, all we need is a function which describes its shape.

BVH Construction - Parallel

The BVH construction can be parallelized and in general made faster by quite some margin.

A. Ebert, V. Fuetterling, C. Lojewski and F. Pfreundt, Parallel Spatial Splits in Bounding Volume Hierarchies, Eurographics Symposium on Parallel Graphics and Visualization, 2016.

Light Source Debug Visualization

There should be a mode where the light sources are traced as spheres which you can then see. This could be useful to analyze the current light source generation algorithm but also to improve scenes.

Overlapping light sampling angles

When from a given point multiple light sources overlap over one direction, then this direction is biased and causes overly bright spots. I am unsure at the moment how to fix this without hurting performance too much.

Particle Downfall System

It would be nice to have generated particles like snow or rain drops. They would make up their own hit type.

BVH Traversal Performance Spikes

In some scenes and camera angles, the BVH traversal takes 100-1000 times as long as usual. Further, while the traversal may take this long in one particular frame, the next frame with the same camera angle may run fine. I suspect that there are some very special rays that for some reason trigger an exhaustive hierarchy traversal. Either I can find and eliminate the reason, or I implement a limit which aborts traversal if too many iterations have passed.

Edit:
In general this happens on the second or third BVH Traversal Kernel start, never on the first one. I suspected that it has something to do with no pixels being terminated yet. However, the balance kernel was found to not be at fault. Profilers suggest that only a few threads cause this hiccup. Limiting the iteration count fixed the hiccups but caused NaN pixels. Hence I assume that these few pixels end up with some garbage values at some point, which causes the geometry kernel to output NaNs and causes the BVH traversal to be exhaustive.

Edit2:
So turns out that the issue was normals in the scene having NaNs. Pull request #31 takes care of that.

Build Meta Data

Insert data like git branch etc using cmake configuration files. This should help identify builds better but things like build time should be excluded as builds of the same commit should always be identical.

Proper Logging

There should be a logging API that allows for uniform output.

Auto Exposure

In Realtime mode there should be a toggle for auto exposure. It is open how to determine it but one may try to use the average brightness obtained through the optix denoiser.

Realtime Upscaling

Realtime Mode should have the option to render at a lower resolution than the window resolution. All I need is a kernel that upscales the rendered image. This could possibly be combined into the kernel that transforms the float image to the 8bit image.

The reason behind this is to achieve high framerates without having a small window due to the low resolution. SDL's built in rescaling seems way too slow for some reason.
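The combined step could look roughly like the following host-side C sketch, which upscales with nearest-neighbor lookup and converts to 8 bit in the same pass. The pixel structs and function names are assumptions, and tone mapping and gamma are left out. In a CUDA kernel, each thread would compute one output pixel instead of looping over all of them.

```c
#include <stdint.h>

typedef struct {
  float r, g, b;
} RGBF; /* hypothetical float pixel of the rendered image */

typedef struct {
  uint8_t r, g, b, a;
} RGBA8; /* hypothetical 8bit pixel of the window buffer */

/* Clamps to [0, 1] and rounds to 8 bit; NaN is mapped to 0. */
uint8_t to_8bit(float v) {
  if (!(v > 0.0f))
    v = 0.0f;
  if (v > 1.0f)
    v = 1.0f;
  return (uint8_t) (v * 255.0f + 0.5f);
}

/* Nearest-neighbor upscaling fused with the float to 8bit conversion. */
void upscale_to_8bit(const RGBF* src, int src_w, int src_h, RGBA8* dst, int dst_w, int dst_h) {
  for (int y = 0; y < dst_h; y++) {
    const int sy = (int) ((long long) y * src_h / dst_h);
    for (int x = 0; x < dst_w; x++) {
      const int sx = (int) ((long long) x * src_w / dst_w);
      const RGBF p = src[sy * src_w + sx];

      RGBA8 out;
      out.r = to_8bit(p.r);
      out.g = to_8bit(p.g);
      out.b = to_8bit(p.b);
      out.a = 255;
      dst[y * dst_w + x] = out;
    }
  }
}
```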

Temporal Reprojection

Right now we have temporal accumulation. We should generate motion vectors and then use temporal reprojection. It does not have to be very robust as scenes are static anyway. The image during movement should simply be a bit more stable than it is right now. The motion vectors may then also be fed to Optix even though in its current state this only makes the denoising more stable. Optix does not use it in a temporal reprojection manner yet.

Generate Motion Vectors
Temporal Reprojection

Edit:
Motion vectors are discussed in Ray Tracing Gems 2.

Shadow Terminator Problem

The issue is that we use low poly meshes to approximate smooth shapes which causes shadow artifacts. I had looked at this issue in the past but I found no literature on it since I could not figure out what this issue is called. My own attempts could resolve the issue in some cases but turned out to be not robust enough to be actually used. Luckily, this problem is presented in Ray Tracing Gems 2.

Baking from baked file crashes

Loading from a baked file and then creating a new baked file crashes the program.

Enhanced Sky

The Sky should include Moons and Stars. The atmosphere should be tweakable to make for denser or thinner atmospheres or be completely turned off to allow scenes outside of planets. Positions should be based on a fixed sun position and position of the planet and moon like in a solar system. In realtime mode, there should be a way to change positions of celestial bodies through time advancing. Reference:

To Do:

Water molecule simulation
Use position, this includes making sun visibility on the horizon height dependent
Moon
Stars
Objects in the sky are properly traced instead of using angles
Purkinje Shift

Optix 7.4

Optix 7.4 is out now. Since Luminary is still using Optix 7.2, we should update now.

Clouds

Clouds should be procedurally generated and rely on the fact that they are above all geometry. In other words, all light data for the clouds comes directly from the atmosphere and does not require BVH traversals as we assume no hit. They should be able to move as a whole but deformation is not required.

Edit:
This kind of volumetrics is discussed in Ray Tracing Gems 2.

Greyscale
Sepia
4 Color Greyscale (Gameboy)
CRT Filter