CUDA Rasterizer


University of Pennsylvania, CIS 565: GPU Programming and Architecture, Project 4

Yuxin Hu
Tested on: Windows 10, i7-6700HQ @ 2.60GHz 8GB, GTX 960M 4096MB (Personal Laptop)


Code Change

rasterize.cu. Add a new parameter, float scale, to _vertexTransformAndAssembly. For objects that are too large to display properly on screen, a scale factor is passed in to resize them in model space.
rasterize.cu. Add a kernel function, _rasterizePrimitive, that sets the values of the fragment buffer. It has three modes: triangle, point, and line.
rasterize.cu. Add three parameters to the render() function. glm::vec3 lightDir and float lightIntensity: the light direction and light intensity used by the Lambert shading model. PrimitiveType mode: if it is point or line, no shading model is applied; if it is triangle, the Lambert shading model is applied.
rasterize.cu. Add a new function, getZByLerp, that gets the depth of a fragment on a line between two vertices.
rasterize.cu. Add a new function, rasterizeLine. This is a naive approach that loops through all pixels within the line's bounding box and checks whether each pixel falls on the line segment.
rasterize.cu. Add a new function, bresenhamLine. This is third-party code adapted from http://tech-algorithm.com/articles/drawing-line-using-bresenham-algorithm/. It uses the Bresenham line algorithm to shade the fragments that form the line between two vertices.
rasterize.cu. Add a new function, rasterizeWireFrame. It is the parent function that calls bresenhamLine.
rasterize.h. Add the performance timer class PerformanceTimer, adapted from WindyDarian (https://github.com/WindyDarian).
rasterizeTools.h. Add a new function, getAABBForLine, that gets the bounding box of a line segment.
rasterizeTools.h. Add a new function, getColorAtCoordinate, that gets the color of a fragment using barycentric interpolation, without perspective correction.
rasterizeTools.h. Add a new function, getEyeSpaceZAtCoordinate, that gets the eye-space z at a coordinate using barycentric interpolation.
rasterizeTools.h. Add a new function, getTextureAtCoord, that gets the perspective-corrected texture uv coordinate using barycentric interpolation.
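
The interpolation helpers listed above can be illustrated with a small host-side C++ sketch. This is not the project's code: the types Vec2 and Vec3 and the function names barycentric, interpolateColor, interpolateEyeZ, and interpolateUV are hypothetical stand-ins, and the perspective correction shown is one common formulation (interpolate attribute/z and 1/z linearly in screen space, then divide). It assumes all three eye-space z values are nonzero and share the same sign.

```cpp
#include <array>
#include <cassert>
#include <cmath>

struct Vec2 { float x, y; };
struct Vec3 { float x, y, z; };

// Twice the signed area of triangle (a, b, c) in screen space.
float signedArea2(Vec2 a, Vec2 b, Vec2 c) {
    return (b.x - a.x) * (c.y - a.y) - (c.x - a.x) * (b.y - a.y);
}

// Barycentric weights of p with respect to tri; the weights sum to 1.
Vec3 barycentric(const std::array<Vec2, 3>& tri, Vec2 p) {
    float area = signedArea2(tri[0], tri[1], tri[2]);
    assert(area != 0.0f);  // degenerate triangles are not handled in this sketch
    return { signedArea2(p, tri[1], tri[2]) / area,
             signedArea2(tri[0], p, tri[2]) / area,
             signedArea2(tri[0], tri[1], p) / area };
}

// Affine color interpolation (no perspective correction).
Vec3 interpolateColor(Vec3 w, const std::array<Vec3, 3>& c) {
    return { w.x * c[0].x + w.y * c[1].x + w.z * c[2].x,
             w.x * c[0].y + w.y * c[1].y + w.z * c[2].y,
             w.x * c[0].z + w.y * c[1].z + w.z * c[2].z };
}

// Eye-space depth: 1/z varies linearly in screen space, so interpolate 1/z.
float interpolateEyeZ(Vec3 w, const std::array<float, 3>& z) {
    return 1.0f / (w.x / z[0] + w.y / z[1] + w.z / z[2]);
}

// Perspective-corrected uv: interpolate uv/z linearly, then multiply by z.
Vec2 interpolateUV(Vec3 w, const std::array<float, 3>& z,
                   const std::array<Vec2, 3>& uv) {
    float zp = interpolateEyeZ(w, z);
    return { zp * (w.x * uv[0].x / z[0] + w.y * uv[1].x / z[1] + w.z * uv[2].x / z[2]),
             zp * (w.x * uv[0].y / z[0] + w.y * uv[1].y / z[1] + w.z * uv[2].y / z[2]) };
}
```

For a point halfway along an edge whose endpoints have eye-space depths 1 and 3, affine interpolation would give u = 0.5, while the perspective-corrected version gives u = 0.25, which is why textures look skewed without the correction.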

How to run different rasterize modes?

Render primitives with the Lambert shading model: in rasterize(), set the last argument of the following two kernel calls to Triangle: _rasterizePrimitive(......, Triangle) and render<<<blockCount2d, blockSize2d>>>(......, Triangle);
Render primitives as points: in rasterize(), set the last argument of the following two kernel calls to Point: _rasterizePrimitive(......, Point) and render<<<blockCount2d, blockSize2d>>>(......, Point);
Render primitives as lines: in rasterize(), set the last argument of the following two kernel calls to Line: _rasterizePrimitive(......, Line) and render<<<blockCount2d, blockSize2d>>>(......, Line);
Render primitives with a scale factor: in rasterize(), change the last argument of the following kernel call, e.g. set it to 0.01 to render the two-cylinder engine: _vertexTransformAndAssembly(......, 0.01)

Basic Rasterizer with Bounding Box and Depth Tested

Flower Colored with Normals

Cow with Lambert Shading

Double Cylinder Engine with Lambert Shading

Double Cylinder Engine Scaled by 0.01

Character Model with Lambert Shading
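
As a rough illustration of the Lambert shading used for the triangle mode, the following host-side C++ sketch scales a fragment's base color by the clamped cosine between the surface normal and the light direction. It is only a sketch: Vec3 and lambertShade are hypothetical names, and it assumes the light direction points from the surface toward the light, which may differ from the convention used for lightDir in render().

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

float dot(Vec3 a, Vec3 b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

Vec3 normalize(Vec3 v) {
    float len = std::sqrt(dot(v, v));
    assert(len > 0.0f);  // zero-length vectors are not handled in this sketch
    return { v.x / len, v.y / len, v.z / len };
}

// Lambert diffuse term: color * max(N . L, 0) * intensity.
// Assumption: toLight points from the surface toward the light.
Vec3 lambertShade(Vec3 baseColor, Vec3 normal, Vec3 toLight, float lightIntensity) {
    float nDotL = std::max(dot(normalize(normal), normalize(toLight)), 0.0f);
    float k = nDotL * lightIntensity;
    return { baseColor.x * k, baseColor.y * k, baseColor.z * k };
}
```

Surfaces facing away from the light get a factor of zero and render black, which matches the dark regions visible on the shaded models.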

Interpolate Fragment Colors Within Triangle

Color Interpolation Within Each Triangle

UV Texture Map

Checker Box with Black and White Grid Texture

Yellow Duck with Texture

Cesium Milk Truck with Texture

Point

Box rendered with points only

Cow rendered with points

Line

Cow rendered with lines

Performance Analysis

Rasterize Kernel Run Time Versus Depth of Object along Camera Z

Rasterize Kernel Run Time Versus Depth of Object

In general, the closer an object is to the camera, the longer the rasterize kernel takes to complete. The closer the object, the larger the area each triangle occupies in screen space, so the rasterize primitive kernel has to loop over more pixels. In these measurements the number of triangles is not the deciding factor: more triangles (the complex engine scaled by 0.01) do not necessarily take more time. The sudden increase in run time between z = -2 and z = -1 indicates that the bottleneck is how much of the screen the triangles cover. At very close distances only a few triangles are rendered, but each of them covers almost the entire screen, and every pixel within its bounding box has to be visited, which severely hurts performance.
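
To make the cost argument concrete, here is a small host-side C++ sketch that counts how many pixels a single thread would visit when it loops over a triangle's screen-clamped bounding box. The name boundingBoxPixels and the 800x800 screen size used in the example are assumptions for illustration, not values taken from the project.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>

struct Vec2 { float x, y; };

// Number of pixels in the triangle's bounding box after clamping to the screen.
long long boundingBoxPixels(const std::array<Vec2, 3>& tri, int width, int height) {
    assert(width > 0 && height > 0);
    float minX = std::min({tri[0].x, tri[1].x, tri[2].x});
    float maxX = std::max({tri[0].x, tri[1].x, tri[2].x});
    float minY = std::min({tri[0].y, tri[1].y, tri[2].y});
    float maxY = std::max({tri[0].y, tri[1].y, tri[2].y});

    // Bounding box entirely off screen: nothing to visit.
    if (maxX < 0.0f || maxY < 0.0f || minX > width - 1 || minY > height - 1) {
        return 0;
    }

    // Clamp in float first so the int conversion stays in range.
    int x0 = static_cast<int>(std::floor(std::max(minX, 0.0f)));
    int x1 = static_cast<int>(std::ceil(std::min(maxX, static_cast<float>(width - 1))));
    int y0 = static_cast<int>(std::floor(std::max(minY, 0.0f)));
    int y1 = static_cast<int>(std::ceil(std::min(maxY, static_cast<float>(height - 1))));

    return static_cast<long long>(x1 - x0 + 1) * (y1 - y0 + 1);
}
```

On an assumed 800x800 screen, a triangle spanning 10 pixels per side yields an 11x11 box (121 pixels), while a triangle that extends past every screen edge yields all 640,000 pixels for one thread, which is the imbalance described above.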

Rasterize Pipeline Breakdown

Rasterize Pipeline Runtime Breakdown

Except for the engine, which is scaled by 0.01, all run times are measured with the camera at z = -3. The boxes and triangles take a long time because of the slow rasterize stage, even though they contain far fewer primitives. Again, this shows that the bottleneck is the loop over triangles that are large in screen space. In contrast, although the engine and duck have more primitives, each primitive occupies only a small region on screen, so the loop in the rasterize stage is shorter. Parallel kernel threads do improve performance as the number of primitives increases; the bottleneck appears when some primitives occupy a large portion of the screen and the threads handling them take a long time to finish.

Rasterize Pipeline Runtime Percentage Breakdown

As the number of primitives increases, the percentage of run time taken by vertex transformation, primitive assembly, and render increases.

Rasterize Kernel Run Time Versus Texture Read

Rasterize Kernel Run Time of Checkerbox

Rendering the checkerbox with texture reads takes twice as long as rendering it without them.

Rasterize Line Methods Comparison

Rasterize Line Methods Comparison

I first used a naive approach to render lines: loop through all pixels within the line's bounding box and check whether each pixel falls on the line. I also tested the Bresenham line algorithm, as described in http://tech-algorithm.com/articles/drawing-line-using-bresenham-algorithm/. The idea is that for a line in the first octant, where the slope is between 0 and 1, we increment x at every step and render either (x+1, y) or (x+1, y+1), depending on which pixel is closer to the line. Lines in other octants are converted to the first octant and handled the same way. This method avoids looping through all pixels in the bounding box, most of which do not fall on the line. The performance analysis shows that the Bresenham line algorithm is almost 4 times faster than the naive approach.
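
The difference in work between the two methods can be sketched in host-side C++. This is not the project's bresenhamLine: it is a commonly used integer variant that handles all octants with a single error term instead of converting each line to the first octant, and the names Pixel, bresenhamLine, and naiveVisitedPixels are chosen here for illustration.

```cpp
#include <cstdlib>
#include <utility>
#include <vector>

using Pixel = std::pair<int, int>;

// Integer Bresenham line covering all octants; returns the pixels on the line,
// including both endpoints.
std::vector<Pixel> bresenhamLine(int x0, int y0, int x1, int y1) {
    std::vector<Pixel> pixels;
    int dx = std::abs(x1 - x0);
    int dy = std::abs(y1 - y0);
    int sx = (x0 < x1) ? 1 : -1;  // step direction in x
    int sy = (y0 < y1) ? 1 : -1;  // step direction in y
    int err = dx - dy;            // running error term

    while (true) {
        pixels.emplace_back(x0, y0);
        if (x0 == x1 && y0 == y1) break;
        int e2 = 2 * err;
        if (e2 > -dy) { err -= dy; x0 += sx; }  // step in x
        if (e2 < dx)  { err += dx; y0 += sy; }  // step in y
    }
    return pixels;
}

// Pixels the naive method has to test: every pixel in the line's bounding box.
long long naiveVisitedPixels(int x0, int y0, int x1, int y1) {
    long long w = std::abs(x1 - x0) + 1;
    long long h = std::abs(y1 - y0) + 1;
    return w * h;
}
```

For a diagonal from (0, 0) to (99, 99), the naive method tests 10,000 pixels while Bresenham touches only the 100 pixels on the line. The measured speedup is smaller than this ratio suggests, since real lines vary in slope and length and the kernel has other costs.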
