zeux / niagara Goto Github PK

View Code? Open in Web Editor NEW

1.2K 1.2K 71.0 1.01 MB

A Vulkan renderer written from scratch on stream

License: MIT License

C++ 78.97% GLSL 13.58% C 5.49% CMake 1.96%

vulkan

niagara's People

Contributors

Stargazers

Watchers

Forkers

bombomby vr3d dariusz1989 shuaibarshad victorlyn cappah frostyfeng kyawkyaw cg-enths ccf19881030 kayru gwihlidal swq0553 jaw egohleweyn congyue1977 abrarzshahriar sebastianrueclausen joseeteixeira yunhsiao zsb534923374 fanghao6666 jaybaird 5433d-r32433 wsqjny amilacg rubbyzhang lonegamedev mortal2000 asdlei99 alehacksp kkhomyakov3d fromasmtodisasm m0nguss pingpongcat divecoder donnhenry matcheydj rkun08 lastexile-ch xiao7540916 dummbyy mgubisch picanumber swordlegend orgail mslinklater ssdemajia icodein jaycelai sid1980 mistycube qingrenjia chenyangchenyang drvkize0 axinging yangtzehina brucelevis deccer sket4 wdspjm incodegames nasingfaund herculesyuan alexermolovich tcantenot forksnd toryder gustavo-tomas nyran

niagara's Issues

Call `SetMeshOutputsEXT` before writing to `gl_MeshVerticesEXT`/`gl_PrimitiveTriangleIndicesEXT`

In your mesh shader, you call SetMeshOutputsEXT after writing to gl_MeshVerticesEXT and gl_PrimitiveTriangleIndicesEXT.

niagara/src/shaders/meshlet.mesh.glsl

Line 113 in 57e1022

SetMeshOutputsEXT(vertexCount, triangleCount);

In the documentation I read (https://github.com/KhronosGroup/Vulkan-Docs/blob/main/proposals/VK_EXT_mesh_shader.adoc), it says:

Mesh shaders must specify the actual number of primitives and vertices being emitted before writing them, via the OpSetMeshOutputsEXT instruction.

I`m not sure if its really required to do so, but as I understand it is.

It makes no visual difference on the NVIDIA driver, but I still thought I'd write it to you because it's supposed to run poorly on some architectures.

Correctly handle maxTaskWorkGroupCount restriction

maxTaskWorkGroupCountX/Y/Z is limited to 65535 (which is a min guaranteed spec limit as well as an actual command encoding restriction on some drivers).
maxTaskWorkGroupTotalCount is limited to 2^22 (min guaranteed spec limit).

This means that when we use taskSubmit we need another small shader that converts the total task WG count into X/Y/Z, e.g. hardcoding Y as 64, Z as 1, and padding the task commands to a multiple of 64 with dummy 0-count entries to handle overflow.

Without this, when we disable LOD or culling in the new pipeline, we're going over the limit.

Just want to say a big thanks

THANKS!
Really really helpful <3

Could you support CMake to build?

Mesh shader discussion

Not sure what is the best place to talk about this so decided maybe we can discuss it here. Hope this is okay.

Looking at the current code, I noticed that the mesh shader workgroup size is 64, but the shader has: max_vertices = 64, max_primitives = 124 this means that the shader is going to have poor occupancy on AMD HW, effectively leaving 50% of shader invocations under-utilized. Note that this is also suboptimal on NVidia HW which prefers a workgroup size of 32.

I recommend to have a compile-time constant for each of these values (similar to what you do for MESH_WGSIZE) and configure it like this:

AMD: workgroup size == max output vertices == max output primitives >= 64, all outputs accessed by local invocation index
Nvidia: workgroup size = 32, max output vertices = 64, max output primitives >= 64

You can achieve this by using a "compile-time loop" (a loop using the compile-time constants) which will be optimal on both AMD and NVidia.

Kitten starts to flicker when turn off frustum culling and lod together

When I press C to turn off frustum culling and then press L to disable lod , the screen start to flickering as the video shows. If disable either frustum culling or lod, it will be stable again.

flickering.mp4

LOD detail

Hi, I come up with two questions about the LOD implementation.

In drawcull.comp.glsl, the indirect draw command is set by:

niagara/src/shaders/drawcull.comp.glsl

Lines 136 to 137 in 98f5d5a

    
           drawCommands[dci].taskCount = (lod.meshletCount + 31) / 32; 
        
           drawCommands[dci].firstTask = lod.meshletOffset / 32;

then, the meshlet index set as mi = gl_WorkGroupID.x * 32 + gl_LocalInvocationID.x in meshlet.task.glsl.
If lod.meshletOffset is not divisible by 32, can we still derive the correct meshlet index in task shader?
For example, let lod.meshletOffset = 62 and lod.taskCount = 64, we may get mi in range [32, 96), but the correct range should be [62, 126].

Is it possible to compute LOD level in task shader, as described in this NVIDIA blog?

Fauly occlusion with camera

Hello Zeux ! I've been dealing with a bug in the "fast/compact" sphere projection

vec4 ProjectedSphereToAABB( vec3 viewSpaceCenter, float r, float projWidth, float projHeight )
{
	vec2 cXZ = viewSpaceCenter.xz;
	vec2 vXZ = vec2( sqrt( dot( cXZ, cXZ ) - r * r ), r );
	vec2 minX = mat2( vXZ.x, vXZ.y, -vXZ.y, vXZ.x ) * cXZ;
	vec2 maxX = mat2( vXZ.x, -vXZ.y, vXZ.y, vXZ.x ) * cXZ;

	vec2 cYZ =viewSpaceCenter.yz;
	vec2 vYZ = vec2( sqrt( dot( cYZ, cYZ ) - r * r ), r );
	vec2 minY = mat2( vYZ.x, vYZ.y, -vYZ.y, vYZ.x ) * cYZ;
	vec2 maxY = mat2( vYZ.x, -vYZ.y, vYZ.y, vYZ.x ) * cYZ;



	// NOTE: quick and dirty projection
	vec4 aabb = vec4( ( minX.x / minX.y ) * projWidth,
					  ( minY.x / minY.y ) * projHeight,
					  ( maxX.x / maxX.y ) * projWidth,
					  ( maxY.x / maxY.y ) * projHeight );

	// NOTE: from NDC to texture UV space 
	aabb = aabb.xwzy * vec4( 0.5, -0.5, 0.5, -0.5 ) + vec4( 0.5 );

	return aabb;
}

Basically , I was getting occlusion culling on "the object in question" but only in a "Bermuda Tiangle" region, and it shouldn't have happen because it was visible.
The issue is produced by the quick and dirty projection, namely it lacks the 1/zNear as a factor:

vec4 aabb = vec4( ( minX.x / minX.y ) * projWidth / zNear,
					  ( minY.x / minY.y ) * projHeight/ zNear,
					  ( maxX.x / maxX.y ) * projWidth / zNear,
					  ( maxY.x / maxY.y ) * projHeight / zNear );

Took me a while to figure out :)
Cheers !

Frustum culling with symmetry doesn't work for non-identity view matrix.

The other components of this :
vec4 frustumX = normalizePlane(projectionT[3] + projectionT[0]); // x + w < 0
vec4 frustumY = normalizePlane(projectionT[3] + projectionT[1]); // y + w < 0

cullData.frustum[0] = frustumX.x;
cullData.frustum[1] = frustumX.z;
cullData.frustum[2] = frustumY.y;
cullData.frustum[3] = frustumY.z;

are not always 0 , so discarding them will result into faulty culling.
( something similar : https://drive.google.com/file/d/1lTC3M-KOs6g4Z38tmGTbieJOkYjNhWUO/view?usp=sharing, this is from my engine btw )

But the number of planes can be reduced from 6 to 4 since the near and far planes result in a trivial test.

Assertion Failed

i have this error :

C:\Users\Donovane>C:\Users\Donovane\Desktop\niagara-master\src\Release\niagara.exe C:\Users\Donovane\Desktop\niagara-master\src\Release\monitor.obj
GPU0: NVIDIA GeForce RTX 2060
Selected GPU NVIDIA GeForce RTX 2060
Assertion failed: rcs, file C:\Users\Donovane\Desktop\niagara-master\src\niagara.cpp, line 547

how to solve them ?

Doesn't run on AMD GPUs on Win because drivers missing VK_KHR_push_descriptor..

Makes sense to ask to make AMD Vulkan driver comptabile your Vulkan playground?
thanks.

Custom Build tool Cmd line needs some quotes

If there are spaces in the path to your project or in the path to the vulkan SDK, the custom build step won't work.
=>
"$(VULKAN_SDK)\Bin\glslangValidator" "%(FullPath)" -V --target-env vulkan1.1 -o "shaders/%(Filename).spv"
should fix it.

SEGFAULT on an AMD 6900XT when enabling `VK_EXT_mesh_shader`

Hi Arseny, thanks for the quick demo using VK_EXT_mesh_shader! Unfortunately for whatever I'm getting a SEGFAULT with I run it with RADV_PERFTEST=ext_ms. Have you encountered this? Is there another environment variable I should be using? Here's the full log from GDB (running via sh is the only way so far that I've figured out how to force RADV_PERFTEST=ext_ms):

gdb --args sh -c 'RADV_PERFTEST=ext_ms ./niagara ../data/kitten.obj'
GNU gdb (GDB) 12.1
Copyright (C) 2022 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from sh...
(No debugging symbols found in sh)
(gdb) r
Starting program: /usr/bin/sh -c RADV_PERFTEST=ext_ms\ ./niagara\ ../data/kitten.obj
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
process 67360 is executing new program: /home/ashley/projects/niagara/build/niagara
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
[New Thread 0x7fffec8e66c0 (LWP 67364)]
[New Thread 0x7fffe7ebe6c0 (LWP 67365)]
GPU0: Intel(R) Xe Graphics (TGL GT2)
GPU1: AMD Radeon RX 6900 XT (RADV NAVI21)
GPU2: llvmpipe (LLVM 14.0.6, 256 bits)
Selected GPU AMD Radeon RX 6900 XT (RADV NAVI21)

Thread 1 "niagara" received signal SIGSEGV, Segmentation fault.
0x00007fffecc7a974 in ?? () from /usr/lib/libvulkan_radeon.so
(gdb) bt
#0  0x00007fffecc7a974 in ?? () from /usr/lib/libvulkan_radeon.so
#1  0x00007fffecc5a9d2 in ?? () from /usr/lib/libvulkan_radeon.so
#2  0x00007fffecc5df7b in ?? () from /usr/lib/libvulkan_radeon.so
#3  0x00007fffecc61521 in ?? () from /usr/lib/libvulkan_radeon.so
#4  0x00007fffecc616e2 in ?? () from /usr/lib/libvulkan_radeon.so
#5  0x00005555555772f4 in createGraphicsPipeline(VkDevice_T*, VkPipelineCache_T*, VkPipelineRenderingCreateInfo const&, std::initializer_list<Shader const*>, VkPipelineLayout_T*, std::initializer_list<int>) ()
#6  0x0000555555565259 in main ()
(gdb)

Here's the device info from vulkaninfo --summary:

GPU1:
	apiVersion         = 4206818 (1.3.226)
	driverVersion      = 92282979 (0x5802063)
	vendorID           = 0x1002
	deviceID           = 0x73bf
	deviceType         = PHYSICAL_DEVICE_TYPE_DISCRETE_GPU
	deviceName         = AMD Radeon RX 6900 XT (RADV NAVI21)
	driverID           = DRIVER_ID_MESA_RADV
	driverName         = radv
	driverInfo         = Mesa 22.3.0-devel (git-c93b72d045)
	conformanceVersion = 1.3.0.0
	deviceUUID         = 00000000-8400-0000-0000-000000000000
	driverUUID         = 414d442d-4d45-5341-2d44-525600000000

VkPhysicalDeviceVulkan12Features shaderFloat16 not supported on Nvidia Maxwell

Awesome stream and git repo, thank you immensely.

One quick note, when running the solution on my Windows 10, NVIDIA 980 Classifieds, the application fails to create a device. While mesh shaders are not supported on this GPU, the non-mesh shader code does work fine, but shaderFloat16 must be disabled.

---
GPU0: GeForce GTX 980
GPU1: GeForce GTX 980
Selected GPU GeForce GTX 980
ERROR: terminator_CreateDevice: Failed in ICD C:\Windows\System32\DriverStore\FileRepository\nv_dispi.inf_amd64_cc8a742077501a9f\.\nvoglv64.dll vkCreateDevicecall
Assertion failed: !"Validation error encountered!", file C:\Programming\Examples\niagara\src\device.cpp, line 76
---

By setting features12.shaderFloat16 = false; the application runs without error. It's not clear to me how it impacts the float16 utilized on the texcoords behind the scene, but it seems to operate acceptably.

AMD RX 560: cannot create VkDevice due Failed in ICD

Hi @zeux
vkCreateDevice failed:
ERROR: terminator_CreateDevice: Failed in ICD C:\Windows\System32\DriverStore\FileRepository\u0364232.inf_amd64_ac01b1fb8d253d0b\B364161.\amdvlk64.dll vkCreateDevicecall

Later, I will try to fix problem and create a pull request

Possible negation error in frustum culling code.

I found implementing the frustum culling code in my engine required changing

visible = visible && center.z * cullData.frustum[3] - abs(center.y) * cullData.frustum[2] > -radius;

visible = visible && center.z * cullData.frustum[3] + abs(center.y) * cullData.frustum[2] > -radius;

to work as expected. (The difference turns into a sum). Otherwise objects going out of view off the top or bottom of the camera were not handled correctly with a look camera I implemented. I don't have a derivation that this is correct or that your code is wrong, so I apologize if this is a 'two wrongs make a right' in my code base. The projection matrix I use is reverse Z with infinite projection modified to support rendering objects at infinity correctly, but after some checking I could not see how these changes would alter the behavior of culling along the Y planes of the frustum.

If this is true, then it would not have been easily detectable as-is since most objects are within the view frustum anyway, and the cost of the (alleged) error is just more objects to process.

The project doesn't compile on Vulkan SDK 1.2.135.0

shaders.cpp is including #include <vulkan/spirv.h> which seems to have moved to spirv-headers/spirv.h, so the project doesn't compile with Vulkan SDK 1.2.135.0.

Incorrect frustum symmetry culling in bottom right quadrant

I noticed incorrect frustum culling (relying on frustum symmetry assumption) in the bottom right quadrant.

Culling Off:

Culling On:

Difference:

The difference view clearly shows the issue. The other differences in the frame are likely from the infinite projection, so it should be fine to ignore those ones (can be tuned).

I started with a canonical frustum symmetry test:

// fit the center to the positive quadrant to reduce view plane tests; only use positive planes
vec3 pos_center = vec3(abs(center.x), abs(center.y), center.z);

// unpack the view space plane equations
vec3 plane1 = vec3(cullData.frustum[0], 0.0f, cullData.frustum[1]);
vec3 plane2 = vec3(0.0f, cullData.frustum[2], cullData.frustum[3]);

visible = visible && !((dot(plane1.xyz, pos_center.xyz) < -radius) || (dot(plane2.xyz, pos_center.xyz) < -radius));

Then I expanded it, and then rearranged the equations to match the original code.

visible = visible && dot(plane1.xyz, pos_center.xyz) > -radius;
visible = visible && dot(plane2.xyz, pos_center.xyz) > -radius;

visible = visible && (plane1.x * pos_center.x + plane1.y * pos_center.y + plane1.z * pos_center.z) > -radius;
visible = visible && (plane2.x * pos_center.x + plane2.y * pos_center.y + plane2.z * pos_center.z) > -radius;

visible = visible && pos_center.z * plane1.z + plane1.x * pos_center.x > -radius;
visible = visible && pos_center.z * plane2.z + plane2.y * pos_center.y > -radius;

The issue ended up being the subtractions instead of additions from the expanded dot products. Fixing the operations produces the expected image, while maintaining expected culling rate.

Fixed in: #4

Image layouts are incorrect.

You don't transition the image layouts between acquiring, rendering and presenting.

You can fix that by setting the initialLayout and finalLayout in createRenderPass to UNDEFINED and PRESENT resp.

swapchain images start UNDEFINED then must be transitioned to COLOR_ATTACHMENT when you want to render to it and then to PRESENT when you present to the surface.

You can always transition the layout away from UNDEFINED if you don't care about the data inside it (which you don't at the start of the renderpass because you clear).

Device extension VK_EXT_mesh_shader not supported by selected physical device or enabled layers.

The GPU I use is 2080TI, as I saw in your video. But why mine doesn't support Mesh shading.
All of the extensions are as follows :

VK_KHR_16bit_storage
VK_KHR_8bit_storage
VK_KHR_acceleration_structure
VK_KHR_bind_memory2
VK_KHR_buffer_device_address
VK_KHR_copy_commands2
VK_KHR_create_renderpass2
VK_KHR_dedicated_allocation
VK_KHR_deferred_host_operations
VK_KHR_depth_stencil_resolve
VK_KHR_descriptor_update_template
VK_KHR_device_group
VK_KHR_draw_indirect_count
VK_KHR_driver_properties
VK_KHR_dynamic_rendering
VK_KHR_external_fence
VK_KHR_external_fence_win32
VK_KHR_external_memory
VK_KHR_external_memory_win32
VK_KHR_external_semaphore
VK_KHR_external_semaphore_win32
VK_KHR_format_feature_flags2
VK_KHR_fragment_shading_rate
VK_KHR_get_memory_requirements2
VK_KHR_image_format_list
VK_KHR_imageless_framebuffer
VK_KHR_maintenance1
VK_KHR_maintenance2
VK_KHR_maintenance3
VK_KHR_maintenance4
VK_KHR_multiview
VK_KHR_pipeline_executable_properties
VK_KHR_pipeline_library
VK_KHR_present_id
VK_KHR_present_wait
VK_KHR_push_descriptor
VK_KHR_ray_query
VK_KHR_ray_tracing_pipeline
VK_KHR_relaxed_block_layout
VK_KHR_sampler_mirror_clamp_to_edge
VK_KHR_sampler_ycbcr_conversion
VK_KHR_separate_depth_stencil_layouts
VK_KHR_shader_atomic_int64
VK_KHR_shader_clock
VK_KHR_shader_draw_parameters
VK_KHR_shader_float16_int8
VK_KHR_shader_float_controls
VK_KHR_shader_integer_dot_product
VK_KHR_shader_non_semantic_info
VK_KHR_shader_subgroup_extended_types
VK_KHR_shader_subgroup_uniform_control_flow
VK_KHR_shader_terminate_invocation
VK_KHR_spirv_1_4
VK_KHR_storage_buffer_storage_class
VK_KHR_swapchain
VK_KHR_swapchain_mutable_format
VK_KHR_synchronization2
VK_KHR_timeline_semaphore
VK_KHR_uniform_buffer_standard_layout
VK_KHR_variable_pointers
VK_KHR_vulkan_memory_model
VK_KHR_win32_keyed_mutex
VK_KHR_workgroup_memory_explicit_layout
VK_KHR_zero_initialize_workgroup_memory
VK_EXT_4444_formats
VK_EXT_blend_operation_advanced
VK_EXT_border_color_swizzle
VK_EXT_buffer_device_address
VK_EXT_calibrated_timestamps
VK_EXT_color_write_enable
VK_EXT_conditional_rendering
VK_EXT_conservative_rasterization
VK_EXT_custom_border_color
VK_EXT_depth_clip_control
VK_EXT_depth_clip_enable
VK_EXT_depth_range_unrestricted
VK_EXT_descriptor_indexing
VK_EXT_discard_rectangles
VK_EXT_extended_dynamic_state
VK_EXT_extended_dynamic_state2
VK_EXT_external_memory_host
VK_EXT_fragment_shader_interlock
VK_EXT_full_screen_exclusive
VK_EXT_hdr_metadata
VK_EXT_host_query_reset
VK_EXT_image_robustness
VK_EXT_image_view_min_lod
VK_EXT_index_type_uint8
VK_EXT_inline_uniform_block
VK_EXT_line_rasterization
VK_EXT_load_store_op_none
VK_EXT_memory_budget
VK_EXT_memory_priority
VK_EXT_multi_draw
VK_EXT_pageable_device_local_memory
VK_EXT_pci_bus_info
VK_EXT_pipeline_creation_cache_control
VK_EXT_pipeline_creation_feedback
VK_EXT_post_depth_coverage
VK_EXT_primitive_topology_list_restart
VK_EXT_private_data
VK_EXT_provoking_vertex
VK_EXT_queue_family_foreign
VK_EXT_robustness2
VK_EXT_sample_locations
VK_EXT_sampler_filter_minmax
VK_EXT_scalar_block_layout
VK_EXT_separate_stencil_usage
VK_EXT_shader_atomic_float
VK_EXT_shader_demote_to_helper_invocation
VK_EXT_shader_image_atomic_int64
VK_EXT_shader_subgroup_ballot
VK_EXT_shader_subgroup_vote
VK_EXT_shader_viewport_index_layer
VK_EXT_subgroup_size_control
VK_EXT_texel_buffer_alignment
VK_EXT_tooling_info
VK_EXT_transform_feedback
VK_EXT_vertex_attribute_divisor
VK_EXT_vertex_input_dynamic_state
VK_EXT_ycbcr_2plane_444_formats
VK_EXT_ycbcr_image_arrays
VK_NV_clip_space_w_scaling
VK_NV_compute_shader_derivatives
VK_NV_cooperative_matrix
VK_NV_corner_sampled_image
VK_NV_coverage_reduction_mode
VK_NV_cuda_kernel_launch
VK_NV_dedicated_allocation
VK_NV_dedicated_allocation_image_aliasing
VK_NV_device_diagnostic_checkpoints
VK_NV_device_diagnostics_config
VK_NV_device_generated_commands
VK_NV_external_memory
VK_NV_external_memory_win32
VK_NV_fill_rectangle
VK_NV_fragment_coverage_to_color
VK_NV_fragment_shader_barycentric
VK_NV_fragment_shading_rate_enums
VK_NV_framebuffer_mixed_samples
VK_NV_geometry_shader_passthrough
VK_NV_inherited_viewport_scissor
VK_NV_linear_color_attachment
VK_NV_mesh_shader
VK_NV_ray_tracing
VK_NV_representative_fragment_test
VK_NV_sample_mask_override_coverage
VK_NV_scissor_exclusive
VK_NV_shader_image_footprint
VK_NV_shader_sm_builtins
VK_NV_shader_subgroup_partitioned
VK_NV_shading_rate_image
VK_NV_viewport_array2
VK_NV_viewport_swizzle
VK_NV_win32_keyed_mutex
VK_NVX_binary_import
VK_NVX_image_view_handle
VK_NVX_multiview_per_view_attributes

Simple Flycam

Hi Zeux,

Thank you very much for this awesome example!
How much effort it would take to implement a simple flycam (like WASD + mouse support)? I would love it to see the scene from different angles.

Thank you!

	drawCommands[dci].taskCount = (lod.meshletCount + 31) / 32;
	drawCommands[dci].firstTask = lod.meshletOffset / 32;