diff --git a/PerformanceAnalysis.xlsx b/PerformanceAnalysis.xlsx new file mode 100644 index 0000000..b1701f7 Binary files /dev/null and b/PerformanceAnalysis.xlsx differ diff --git a/README.md b/README.md index 41b91f0..bc531e8 100644 --- a/README.md +++ b/README.md @@ -1,20 +1,140 @@ CUDA Rasterizer =============== -[CLICK ME FOR INSTRUCTION OF THIS PROJECT](./INSTRUCTION.md) +[CLICK ME FOR INSTRUCTIONS FOR THIS PROJECT](./INSTRUCTION.md) **University of Pennsylvania, CIS 565: GPU Programming and Architecture, Project 4** -* (TODO) YOUR NAME HERE - * (TODO) [LinkedIn](), [personal website](), [twitter](), etc. -* Tested on: (TODO) Windows 22, i7-2222 @ 2.22GHz 22GB, GTX 222 222MB (Moore 2222 Lab) +* Angelina Risi + * [LinkedIn](www.linkedin.com/in/angelina-risi) + * [Twitter](https://twitter.com/Angelina_Risi) +* Tested on: Windows 10, i7-6700HQ @ 2.60GHz 8GB, GTX 960M 4096MB (Personal Laptop) + + +![Sample 1](renders/test2.gif) + +## Features + +### Primitive Rasterization + +The current code supports triangles, lines, and points. Additionally, you can toggle wireframe mode and point cloud mode for triangles using the respective defined tokens at the beginning of rasterize.cu. Triangles and lines use perspective-correct interpolation to determine attributes of the overlapping fragments (see "Perspective-Correct Texture Mapping"). All supported modes still support textures and "paint normals" mode (using the interpolated normal as the color instead). By default, the fragment color is set to white. + +*Point Cloud:* + +![Point Cloud w/ Normals](images/point_cloud.PNG)![Point Cloud w/ Texture](images/point_cloud2.PNG) + +*Wireframe:* + +![Wireframe](images/wireframe.PNG) + + +The performance impact from the extra primitive support would be negligible for the current single primitive mode scenes tested, but in a complicated scene there would be branching. Currently each thread handles a primitive (which has its own performance tradeoffs for primitive size in screenspace), but if, for example, the different primitive types were not sorted to be contiguous in memory and a scene contained many of each type, there would be divergent branching frequently in warps, in essence cutting performance in the rasterization step in half or worse depending on how many different types and the run time of each (with triangles being slowest and points fastest). However, the current tradeoff is being able to draw these shapes at all, and optimizations could be made, such as the sorting previously mentioned. + + +### Texturing + +![Sample 2](renders/test3.gif) + +Currently texture sampling for fragment colors is supported. + +#### Bilinear Texture Filtering + +Since vertices are given specific UV coordinates on the texture in floating point values normalized 0 to 1, there are bound to be issues with texture scaling and aliasing effects. Bilinear filtering blurs the target texel with its neighbors, smoothing the drawn texture, using linear interpolation. The coefficient used for interpolating in this case is the difference between the integer and float values of the texel index. The code design was actually based off the [Wikipedia page](https://en.wikipedia.org/wiki/Bilinear_filtering) and handles edge cases by wrapping the texture. +The performance impact is extra index computations and memory access. The current implementation has branching for wrapping, which may be improved using a modulo instead for wrapping and other code optimizations, and must access and weight four texels as opposed to just one. At minimum, one would expect about a factor of four performance hit in the texture sampling funtion. + +#### Perspective-Correct Texture Mapping + +If you only compute the fragment attributes using the original vertex attributes and barycentric weights, you get "alline" interpolation. The problem with this method is most clearly seen in textures when perspective is used, and results in the texturing of neighboring triangles appearing "disjoint" with one another. To fix this, a perspective adjustment was applied by calculating the perspective-correct depth "w" and multiplying each contribution by this factor divided by the vertex z position for that attribute. In code: + +```cpp +// "bary" is the barycentric weights, and "tri" stores the vertex positions +float w = 1.0f / ((bary.x / tri[0].z) + (bary.y / tri[1].z) + (bary.z / tri[2].z)); -### (TODO: Your README) +fragBuf[f_idx].texcoord0 = w * ( (p.v[0].texcoord0 * bary.x / tri[0].z) + + (p.v[1].texcoord0 * bary.y / tri[1].z) + + (p.v[2].texcoord0 * bary.z / tri[2].z)); +``` + +The result is reduced unwanted distortion on attributes due to distortion (the same formula can be used on normals, colors, etc). A comparison between affine (first image) and corrected (second image) textering is seen below for a checkerboard texture: + +![Affine](images/perspective_incorrect_interpolation2.PNG) ![Perspective Correct](images/bilinear_tex_filter.PNG) + +The second image also has bilinear filtering. While the pattern does not look perfect, probably due to rounding errors, it is singnificantly improved from the affine case. + +The performance is not expected to suffer much from this feature, but while this part is compute light, it still requires several global memory accesses. The current implementation also has to pass the texture information to each vertex, and then to each fragment, which compounds the problem of memory use and access. Perhaps if there was an implementation grouping fragments by material or texture, and only passing the constant data (non-uv attributes for the texture) once for the group of fragments, it could improve performance. One could also make the assumption those fragments would appear grouped near each other in screen space if sharing a texture, so there is less worry of random access slowdown from this method as well. + + +### Depth Buffer + +Both a integer depth buffer and an integer mutex buffer are stored with one entry per fragment. This simple implementation uses the mutex to force access to the depth buffer, and then to the fragment buffer, as a critical section. While this will affect the reasterizer speed, sometimes significantly, it avoids any race conditions caused by multiple primitives overlapping the same pixel coordinate. Currently it only has integer precision, so z-fighting for close objects may not be completely avoided. Part of the rasterizer code has been reproduced below. The relevant part is the "do while" loop that creates a critical section preventing access to the current fragment index by all other threads by using an atomic function to lock the address. + +```cpp +inside = isBarycentricCoordInBounds(bary); -*DO NOT* leave the README to the last minute! It is a crucial part of the -project, and we will not be able to grade you without a good README. +float z_float = getZAtCoordinate(bary, tri); +if (z_float < 0 || z_float > 1) continue; +int z = z_float * INT_MAX; +bool isSet = false; +if (inside) { + isSet = false; + do { + isSet = (atomicCAS(mutex + f_idx, 0, 1) == 0); + if (isSet) { + // Critical section + if (z < depth[f_idx]) { + depth[f_idx] = z; + fragBuf[f_idx] = f_true; + } + } + if (isSet) { + mutex[f_idx] = 0; + } + } while (!isSet); +} +``` + +However, by forcing serialization of memory access, we force a loss of performance. In a dense scene with many overlapping primitives, we can expect severe framerate drops since we lose some of the advantage of parallelism. This is necessary for accurately rendering a scene with depth and avoiding any race conditions from the overlapping objects. The only optimization that comes to mind is comparing with the z-buffer first to avoid locking and then not needing it, and then comparing again in the critical section to avoid race conditions. Perhaps even shared memory can be used to pre-load the entire z-buffer to speed up access for dense scenes. + + +## Performance Analysis + +### Rasterizer Block Size + +After taking a rough estimate of the framerate during runtime for three different gltf files under different rasterizer block size conditions, I determined that with the current implementation, the block size has little to no effect on the framerate in most cases. In the cases where there are many triangles and each take little screen space, there is some changes due to thoroughput, and higher overall framerate. + +![Block Size Perf](images/blockSize.png) + +### Coloring and Motion + +Since the block size had little effect, I was curious about the effect of other factors, such as rendering of the normals versus textures and whether moving the scene around had significant impact. The impact on time per function as a percent of overall CUDA run time is as follows: + +![FunctionTime](images/funcTime.png) + +The raw data is from the NSight Performance Analysis tool. + +*Duck w/ Paint Normals, Still:* + +![Duck, Normals, Still](images/duck_normals_analysis1_1.PNG) ![Duck, Normals, Still](images/duck_normals_analysis1_2.PNG) + +*Duck w/ Paint Normals, Moving:* + +![Duck, Normals, Moving](images/duck_normals_analysis2_1.PNG) ![Duck, Normals, Moving](images/duck_normals_analysis2_2.PNG) + + +*Duck w/ Texture, Still:* + +![Duck, Texture, Still](images/duck_texture_analysis1_1.PNG) ![Duck, Texture, Still](images/duck_texture_analysis1_2.PNG) + +*Duck w/ Paint Texture, Moving:* + +![Duck, Texture, Moving](images/duck_texture_analysis2_1.PNG) ![Duck,Texture, Moving](images/duck_texture_analysis2_2.PNG) + + + +While in motion, the program spent more time in the rasterizer. Partly this is likely caused by the object moving closer to the camera, increasing the number of overlapping pixels needed to be scanned. The program spends a surprising amount of time in the rendering stage. While there seems to be less to do in the code, I can only assume accessing every fragment in memory, including those not actually containing anything, and copying to the frame buffer causes this slowdown. Texturing did have a slight effect on the average render time, which is at the highest max when both texturing and motion occur. + ### Credits * [tinygltfloader](https://github.com/syoyo/tinygltfloader) by [@soyoyo](https://github.com/syoyo) diff --git a/images/attempt to texture1.PNG b/images/attempt to texture1.PNG new file mode 100644 index 0000000..25a875a Binary files /dev/null and b/images/attempt to texture1.PNG differ diff --git a/images/bilinear_tex_filter.PNG b/images/bilinear_tex_filter.PNG new file mode 100644 index 0000000..3dd268c Binary files /dev/null and b/images/bilinear_tex_filter.PNG differ diff --git a/images/blockSize.png b/images/blockSize.png new file mode 100644 index 0000000..38692f9 Binary files /dev/null and b/images/blockSize.png differ diff --git a/images/duck_normals_analysis1_1.PNG b/images/duck_normals_analysis1_1.PNG new file mode 100644 index 0000000..54db594 Binary files /dev/null and b/images/duck_normals_analysis1_1.PNG differ diff --git a/images/duck_normals_analysis1_2.PNG b/images/duck_normals_analysis1_2.PNG new file mode 100644 index 0000000..1a3e0bb Binary files /dev/null and b/images/duck_normals_analysis1_2.PNG differ diff --git a/images/duck_normals_analysis2_1.PNG b/images/duck_normals_analysis2_1.PNG new file mode 100644 index 0000000..786f54c Binary files /dev/null and b/images/duck_normals_analysis2_1.PNG differ diff --git a/images/duck_normals_analysis2_2.PNG b/images/duck_normals_analysis2_2.PNG new file mode 100644 index 0000000..a8d8826 Binary files /dev/null and b/images/duck_normals_analysis2_2.PNG differ diff --git a/images/duck_texture_analysis1_1.PNG b/images/duck_texture_analysis1_1.PNG new file mode 100644 index 0000000..3f636b0 Binary files /dev/null and b/images/duck_texture_analysis1_1.PNG differ diff --git a/images/duck_texture_analysis1_2.PNG b/images/duck_texture_analysis1_2.PNG new file mode 100644 index 0000000..ce6a80e Binary files /dev/null and b/images/duck_texture_analysis1_2.PNG differ diff --git a/images/duck_texture_analysis2_1.PNG b/images/duck_texture_analysis2_1.PNG new file mode 100644 index 0000000..0f08d69 Binary files /dev/null and b/images/duck_texture_analysis2_1.PNG differ diff --git a/images/duck_texture_analysis2_2.PNG b/images/duck_texture_analysis2_2.PNG new file mode 100644 index 0000000..2ccb673 Binary files /dev/null and b/images/duck_texture_analysis2_2.PNG differ diff --git a/images/funcTime.png b/images/funcTime.png new file mode 100644 index 0000000..73d7c9e Binary files /dev/null and b/images/funcTime.png differ diff --git a/images/normal_interpolation.PNG b/images/normal_interpolation.PNG new file mode 100644 index 0000000..aebff7c Binary files /dev/null and b/images/normal_interpolation.PNG differ diff --git a/images/perspective_correct.PNG b/images/perspective_correct.PNG new file mode 100644 index 0000000..602233a Binary files /dev/null and b/images/perspective_correct.PNG differ diff --git a/images/perspective_incorrect_interpolation.PNG b/images/perspective_incorrect_interpolation.PNG new file mode 100644 index 0000000..de9e141 Binary files /dev/null and b/images/perspective_incorrect_interpolation.PNG differ diff --git a/images/perspective_incorrect_interpolation2.PNG b/images/perspective_incorrect_interpolation2.PNG new file mode 100644 index 0000000..075fae8 Binary files /dev/null and b/images/perspective_incorrect_interpolation2.PNG differ diff --git a/images/point_cloud.PNG b/images/point_cloud.PNG new file mode 100644 index 0000000..b3297e0 Binary files /dev/null and b/images/point_cloud.PNG differ diff --git a/images/point_cloud2.PNG b/images/point_cloud2.PNG new file mode 100644 index 0000000..8a739a0 Binary files /dev/null and b/images/point_cloud2.PNG differ diff --git a/images/quack.PNG b/images/quack.PNG new file mode 100644 index 0000000..22d8ae5 Binary files /dev/null and b/images/quack.PNG differ diff --git a/images/quack_works.PNG b/images/quack_works.PNG new file mode 100644 index 0000000..c9844a1 Binary files /dev/null and b/images/quack_works.PNG differ diff --git a/images/quack_z_buf_works.PNG b/images/quack_z_buf_works.PNG new file mode 100644 index 0000000..5810be6 Binary files /dev/null and b/images/quack_z_buf_works.PNG differ diff --git a/images/tex_works.PNG b/images/tex_works.PNG new file mode 100644 index 0000000..7987a8f Binary files /dev/null and b/images/tex_works.PNG differ diff --git a/images/wireframe.PNG b/images/wireframe.PNG new file mode 100644 index 0000000..9927025 Binary files /dev/null and b/images/wireframe.PNG differ diff --git a/images/zbuf.PNG b/images/zbuf.PNG new file mode 100644 index 0000000..a7bed9d Binary files /dev/null and b/images/zbuf.PNG differ diff --git a/images/zbuf_issues.PNG b/images/zbuf_issues.PNG new file mode 100644 index 0000000..bfba32a Binary files /dev/null and b/images/zbuf_issues.PNG differ diff --git a/renders/test.gif b/renders/test.gif new file mode 100644 index 0000000..0ba4298 Binary files /dev/null and b/renders/test.gif differ diff --git a/renders/test2.gif b/renders/test2.gif new file mode 100644 index 0000000..c3a814b Binary files /dev/null and b/renders/test2.gif differ diff --git a/renders/test3.gif b/renders/test3.gif new file mode 100644 index 0000000..b62f437 Binary files /dev/null and b/renders/test3.gif differ diff --git a/src/rasterize.cu b/src/rasterize.cu index 1262a09..e9307b1 100644 --- a/src/rasterize.cu +++ b/src/rasterize.cu @@ -18,6 +18,14 @@ #include #include +#define PAINT_NORMALS 0 +#define NO_LIGHTING 1 +#define WIREFRAME_MODE 0 +#define POINT_CLOUD_MODE 0 + + +#define RASTERIZER_BLOCK 256 + namespace { typedef unsigned short VertexIndex; @@ -28,7 +36,7 @@ namespace { typedef unsigned char BufferByte; - enum PrimitiveType{ + enum PrimitiveType { Point = 1, Line = 2, Triangle = 3 @@ -41,12 +49,12 @@ namespace { // The attributes listed below might be useful, // but always feel free to modify on your own - glm::vec3 eyePos; // eye space position used for shading - glm::vec3 eyeNor; // eye space normal used for shading, cuz normal will go wrong after perspective transformation - // glm::vec3 col; - glm::vec2 texcoord0; - TextureData* dev_diffuseTex = NULL; - // int texWidth, texHeight; + glm::vec3 eyePos; // eye space position used for shading + glm::vec3 eyeNor; // eye space normal used for shading, cuz normal will go wrong after perspective transformation + // glm::vec3 col; + glm::vec2 texcoord0; + TextureData* dev_diffuseTex = NULL; + int texWidth, texHeight; // ... }; @@ -62,10 +70,11 @@ namespace { // The attributes listed below might be useful, // but always feel free to modify on your own - // glm::vec3 eyePos; // eye space position used for shading - // glm::vec3 eyeNor; - // VertexAttributeTexcoord texcoord0; - // TextureData* dev_diffuseTex; + //glm::vec3 eyePos; // eye space position used for shading + glm::vec3 eyeNor; + VertexAttributeTexcoord texcoord0; + TextureData* dev_diffuseTex; + int texWidth, texHeight; // ... }; @@ -110,62 +119,173 @@ static Fragment *dev_fragmentBuffer = NULL; static glm::vec3 *dev_framebuffer = NULL; static int * dev_depth = NULL; // you might need this buffer when doing depth test +static int * dev_mutex = NULL; /** * Kernel that writes the image to the OpenGL PBO directly. */ -__global__ +__global__ void sendImageToPBO(uchar4 *pbo, int w, int h, glm::vec3 *image) { - int x = (blockIdx.x * blockDim.x) + threadIdx.x; - int y = (blockIdx.y * blockDim.y) + threadIdx.y; - int index = x + (y * w); - - if (x < w && y < h) { - glm::vec3 color; - color.x = glm::clamp(image[index].x, 0.0f, 1.0f) * 255.0; - color.y = glm::clamp(image[index].y, 0.0f, 1.0f) * 255.0; - color.z = glm::clamp(image[index].z, 0.0f, 1.0f) * 255.0; - // Each thread writes one pixel location in the texture (textel) - pbo[index].w = 0; - pbo[index].x = color.x; - pbo[index].y = color.y; - pbo[index].z = color.z; - } + int x = (blockIdx.x * blockDim.x) + threadIdx.x; + int y = (blockIdx.y * blockDim.y) + threadIdx.y; + int index = x + (y * w); + + if (x < w && y < h) { + glm::vec3 color; + color.x = glm::clamp(image[index].x, 0.0f, 1.0f) * 255.0; + color.y = glm::clamp(image[index].y, 0.0f, 1.0f) * 255.0; + color.z = glm::clamp(image[index].z, 0.0f, 1.0f) * 255.0; + // Each thread writes one pixel location in the texture (textel) + pbo[index].w = 0; + pbo[index].x = color.x; + pbo[index].y = color.y; + pbo[index].z = color.z; + } +} + + +__device__ +glm::vec3 genTextureColor(Fragment f) { + float u = f.texcoord0.x * (f.texWidth - 1); + float v = f.texcoord0.y * (f.texHeight - 1); + + int tex_x = u; + int tex_y = v; + + float u_ratio = u - tex_x; + float v_ratio = v - tex_y; + + glm::vec3 tex_col; + glm::vec3 temp; + + int tex_idx = 3 * (tex_x + f.texWidth * tex_y); + + // if out of bounds give error color RED + if (tex_y > (f.texHeight - 1) || tex_y < 0 || tex_x >(f.texWidth - 1) || tex_x < 0) { + tex_col = glm::vec3(255.0f, 0.0f, 0.0f); + } + else { + bool wrap_x = false; bool wrap_y = false; + + if (tex_y + 1 == f.texHeight) wrap_y = true; + if (tex_x + 1 == f.texWidth) wrap_x = true; + + // (x, y) + tex_col.x = f.dev_diffuseTex[tex_idx]; + tex_col.y = f.dev_diffuseTex[tex_idx + 1]; + tex_col.z = f.dev_diffuseTex[tex_idx + 2]; + + tex_col *= (1 - u_ratio) * (1 - v_ratio); + + // (x + 1, y) + if (wrap_x) { + tex_idx = 3 * tex_y * f.texWidth; + } + else { + tex_idx = tex_idx + 3; + } + temp.x = f.dev_diffuseTex[tex_idx]; + temp.y = f.dev_diffuseTex[tex_idx + 1]; + temp.z = f.dev_diffuseTex[tex_idx + 2]; + + tex_col += (temp * u_ratio * (1 - v_ratio)); + + // (x, y + 1) + if (wrap_y) { + tex_idx = 3 * tex_x; + } + else { + tex_idx = 3 * (tex_x + f.texWidth * (tex_y + 1)); + } + + temp.x = f.dev_diffuseTex[tex_idx]; + temp.y = f.dev_diffuseTex[tex_idx + 1]; + temp.z = f.dev_diffuseTex[tex_idx + 2]; + + tex_col += (temp * v_ratio * (1 - u_ratio)); + + // (x + 1, y + 1) + if (wrap_x && wrap_y) { + tex_idx = 0; + } + else if (wrap_x) { + tex_idx = 3 * (f.texWidth * (tex_y + 1)); + } + else if (wrap_y) { + tex_idx = tex_idx = 3 * (tex_x + 1); + } + else { + tex_idx = 3 * ((tex_x + 1) + f.texWidth * (tex_y + 1)); + } + + temp.x = f.dev_diffuseTex[tex_idx]; + temp.y = f.dev_diffuseTex[tex_idx + 1]; + temp.z = f.dev_diffuseTex[tex_idx + 2]; + + tex_col += (temp * v_ratio * u_ratio); + + } + tex_col = tex_col / 255.0f; + return tex_col; } -/** +/** * Writes fragment colors to the framebuffer */ __global__ void render(int w, int h, Fragment *fragmentBuffer, glm::vec3 *framebuffer) { - int x = (blockIdx.x * blockDim.x) + threadIdx.x; - int y = (blockIdx.y * blockDim.y) + threadIdx.y; - int index = x + (y * w); + int x = (blockIdx.x * blockDim.x) + threadIdx.x; + int y = (blockIdx.y * blockDim.y) + threadIdx.y; + int index = x + (y * w); + + if (x < w && y < h) { + Fragment f = fragmentBuffer[index]; + + // lambertian lighting + float diffuseLight = glm::dot(glm::normalize(f.eyeNor), glm::vec3(0.0f, 1.0f, 0.0f)) + 0.2f; // 0.2f is ambient + if (diffuseLight > 1) diffuseLight = 1.0f; + if (diffuseLight < 0) diffuseLight = 0.0f; - if (x < w && y < h) { - framebuffer[index] = fragmentBuffer[index].color; + if (NO_LIGHTING) diffuseLight = 1.0f; + + if (PAINT_NORMALS) { + framebuffer[index] = fragmentBuffer[index].eyeNor * diffuseLight; + } + // apply texture color + else if (f.dev_diffuseTex != NULL) { + + framebuffer[index] = genTextureColor(f) * diffuseLight; + } + + else { + framebuffer[index] = fragmentBuffer[index].color * diffuseLight; + } // TODO: add your fragment shader code here - } + } } /** * Called once at the beginning of the program to allocate memory. */ void rasterizeInit(int w, int h) { - width = w; - height = h; + width = w; + height = h; cudaFree(dev_fragmentBuffer); cudaMalloc(&dev_fragmentBuffer, width * height * sizeof(Fragment)); cudaMemset(dev_fragmentBuffer, 0, width * height * sizeof(Fragment)); - cudaFree(dev_framebuffer); - cudaMalloc(&dev_framebuffer, width * height * sizeof(glm::vec3)); - cudaMemset(dev_framebuffer, 0, width * height * sizeof(glm::vec3)); - + cudaFree(dev_framebuffer); + cudaMalloc(&dev_framebuffer, width * height * sizeof(glm::vec3)); + cudaMemset(dev_framebuffer, 0, width * height * sizeof(glm::vec3)); + cudaFree(dev_depth); cudaMalloc(&dev_depth, width * height * sizeof(int)); + cudaFree(dev_mutex); + cudaMalloc(&dev_mutex, width * height * sizeof(int)); + cudaMemset(dev_mutex, 0, width * height * sizeof(int)); + checkCUDAError("rasterizeInit"); } @@ -187,9 +307,9 @@ void initDepth(int w, int h, int * depth) * kern function with support for stride to sometimes replace cudaMemcpy * One thread is responsible for copying one component */ -__global__ +__global__ void _deviceBufferCopy(int N, BufferByte* dev_dst, const BufferByte* dev_src, int n, int byteStride, int byteOffset, int componentTypeByteSize) { - + // Attribute (vec3 position) // component (3 * float) // byte (4 * byte) @@ -202,20 +322,20 @@ void _deviceBufferCopy(int N, BufferByte* dev_dst, const BufferByte* dev_src, in int offset = i - count * n; // which component of the attribute for (int j = 0; j < componentTypeByteSize; j++) { - - dev_dst[count * componentTypeByteSize * n - + offset * componentTypeByteSize + + dev_dst[count * componentTypeByteSize * n + + offset * componentTypeByteSize + j] - = + = - dev_src[byteOffset - + count * (byteStride == 0 ? componentTypeByteSize * n : byteStride) - + offset * componentTypeByteSize + dev_src[byteOffset + + count * (byteStride == 0 ? componentTypeByteSize * n : byteStride) + + offset * componentTypeByteSize + j]; } } - + } @@ -235,7 +355,7 @@ void _nodeMatrixTransform( } glm::mat4 getMatrixFromNodeMatrixVector(const tinygltf::Node & n) { - + glm::mat4 curMatrix(1.0); const std::vector &m = n.matrix; @@ -247,7 +367,8 @@ glm::mat4 getMatrixFromNodeMatrixVector(const tinygltf::Node & n) { curMatrix[i][j] = (float)m.at(4 * i + j); } } - } else { + } + else { // no matrix, use rotation, scale, translation if (n.translation.size() > 0) { @@ -275,12 +396,12 @@ glm::mat4 getMatrixFromNodeMatrixVector(const tinygltf::Node & n) { return curMatrix; } -void traverseNode ( +void traverseNode( std::map & n2m, const tinygltf::Scene & scene, const std::string & nodeString, const glm::mat4 & parentMatrix - ) +) { const tinygltf::Node & n = scene.nodes.at(nodeString); glm::mat4 M = parentMatrix * getMatrixFromNodeMatrixVector(n); @@ -537,7 +658,7 @@ void rasterizeSetBuffers(const tinygltf::Scene & scene) { size_t s = image.image.size() * sizeof(TextureData); cudaMalloc(&dev_diffuseTex, s); cudaMemcpy(dev_diffuseTex, &image.image.at(0), s, cudaMemcpyHostToDevice); - + diffuseTexWidth = image.width; diffuseTexHeight = image.height; @@ -554,7 +675,7 @@ void rasterizeSetBuffers(const tinygltf::Scene & scene) { // ---------Node hierarchy transform-------- cudaDeviceSynchronize(); - + dim3 numBlocksNodeTransform((numVertices + numThreadsPerBlock.x - 1) / numThreadsPerBlock.x); _nodeMatrixTransform << > > ( numVertices, @@ -595,21 +716,21 @@ void rasterizeSetBuffers(const tinygltf::Scene & scene) { } // for each node } - + // 3. Malloc for dev_primitives { cudaMalloc(&dev_primitives, totalNumPrimitives * sizeof(Primitive)); } - + // Finally, cudaFree raw dev_bufferViews { std::map::const_iterator it(bufferViewDevPointers.begin()); std::map::const_iterator itEnd(bufferViewDevPointers.end()); - - //bufferViewDevPointers + + //bufferViewDevPointers for (; it != itEnd; it++) { cudaFree(it->second); @@ -623,11 +744,11 @@ void rasterizeSetBuffers(const tinygltf::Scene & scene) { -__global__ +__global__ void _vertexTransformAndAssembly( - int numVertices, - PrimitiveDevBufPointers primitive, - glm::mat4 MVP, glm::mat4 MV, glm::mat3 MV_normal, + int numVertices, + PrimitiveDevBufPointers primitive, + glm::mat4 MVP, glm::mat4 MV, glm::mat3 MV_normal, int width, int height) { // vertex id @@ -639,9 +760,33 @@ void _vertexTransformAndAssembly( // Then divide the pos by its w element to transform into NDC space // Finally transform x and y to viewport space + VertexOut & vOut = primitive.dev_verticesOut[vid]; + + // screen-space pos + vOut.pos = MVP * glm::vec4(primitive.dev_position[vid], 1.0f); + vOut.pos /= vOut.pos.w; + vOut.pos.x = 0.5f * (float)width * (vOut.pos.x + 1.0f); + vOut.pos.y = 0.5f * (float)height * (1.0f - vOut.pos.y); + + // eye-space pos + vOut.eyePos = glm::vec3(MV * glm::vec4(primitive.dev_position[vid], 1.0f)); + + // eye-space normal + vOut.eyeNor = MV_normal * primitive.dev_normal[vid]; + vOut.eyeNor = glm::normalize(vOut.eyeNor); + + vOut.dev_diffuseTex = primitive.dev_diffuseTex; + if (vOut.dev_diffuseTex != NULL) { + vOut.texcoord0 = primitive.dev_texcoord0[vid]; + + + vOut.texHeight = primitive.diffuseTexHeight; + vOut.texWidth = primitive.diffuseTexWidth; + } + // TODO: Apply vertex assembly here - // Assemble all attribute arraies into the primitive array - + // Assemble all attribute arrays into the primitive array + } } @@ -649,7 +794,7 @@ void _vertexTransformAndAssembly( static int curPrimitiveBeginId = 0; -__global__ +__global__ void _primitiveAssembly(int numIndices, int curPrimitiveBeginId, Primitive* dev_primitives, PrimitiveDevBufPointers primitive) { // index id @@ -660,28 +805,267 @@ void _primitiveAssembly(int numIndices, int curPrimitiveBeginId, Primitive* dev_ // TODO: uncomment the following code for a start // This is primitive assembly for triangles - //int pid; // id for cur primitives vector - //if (primitive.primitiveMode == TINYGLTF_MODE_TRIANGLES) { - // pid = iid / (int)primitive.primitiveType; - // dev_primitives[pid + curPrimitiveBeginId].v[iid % (int)primitive.primitiveType] - // = primitive.dev_verticesOut[primitive.dev_indices[iid]]; - //} + int pid; // id for cur primitives vector + if (primitive.primitiveMode == TINYGLTF_MODE_TRIANGLES) { + pid = iid / (int)primitive.primitiveType; + dev_primitives[pid + curPrimitiveBeginId].v[iid % (int)primitive.primitiveType] + = primitive.dev_verticesOut[primitive.dev_indices[iid]]; + + dev_primitives[pid + curPrimitiveBeginId].primitiveType = Triangle; + } // TODO: other primitive types (point, line) + else if (primitive.primitiveMode == TINYGLTF_MODE_LINE) { + pid = iid / (int)primitive.primitiveType; + dev_primitives[pid + curPrimitiveBeginId].v[iid % (int)primitive.primitiveType] + = primitive.dev_verticesOut[primitive.dev_indices[iid]]; + + dev_primitives[pid + curPrimitiveBeginId].primitiveType = Line; + } + + else if (primitive.primitiveMode == TINYGLTF_MODE_POINTS) { + pid = iid; + dev_primitives[pid + curPrimitiveBeginId].v[0] = primitive.dev_verticesOut[primitive.dev_indices[iid]]; + + dev_primitives[pid + curPrimitiveBeginId].primitiveType = Point; + } + } + +} + +__device__ +void rasterizeTriangle(Primitive p, Fragment* fragBuf, int* depth, int* mutex, int width, int height) { + + Fragment f_true; + f_true.color = glm::vec3(1.0f); // default white + f_true.dev_diffuseTex = p.v[0].dev_diffuseTex; + + if (f_true.dev_diffuseTex != NULL) { + f_true.texWidth = p.v[0].texWidth; + f_true.texHeight = p.v[0].texHeight; + } + + // triangle vertex positions + glm::vec3 tri[3] = { glm::vec3(p.v[0].pos), glm::vec3(p.v[1].pos), glm::vec3(p.v[2].pos) }; + + // backface culling + glm::vec3 normal = glm::cross(tri[1] - tri[0], tri[2] - tri[0]); + if (glm::dot(normal, glm::vec3(0.0f, 0.0f, 1.0f)) > 0) return; + + AABB aabb = getAABBForTriangle(tri); + + // if completely off screen return + if (aabb.max.y >= height && aabb.min.y >= height) return; + if (aabb.max.y < 0 && aabb.min.y < 0) return; + + if (aabb.max.x >= width && aabb.min.x >= width) return; + if (aabb.max.x < 0 && aabb.min.x < 0) return; + + // clamp bounds + if (aabb.max.y >= height) aabb.max.y = height - 1; + if (aabb.min.y < 0) aabb.min.y = 0; + if (aabb.max.x >= width) aabb.max.x = width - 1; + if (aabb.min.x < 0) aabb.min.x = 0; + + int f_idx; + + glm::vec3 bary; + bool inside = false; + + int z; + float z_float; + float w; + + bool isSet = false; + + for (int y = aabb.min.y; y <= aabb.max.y; y++) { + for (int x = aabb.min.x; x <= aabb.max.x; x++) { + f_idx = y * width + x; + // identify if in bounds of triangle + bary = calculateBarycentricCoordinate(tri, glm::vec2(x, y)); + inside = isBarycentricCoordInBounds(bary); + + // depth buffer + z_float = getZAtCoordinate(bary, tri); + if (z_float < 0 || z_float > 1) continue; + z = z_float * INT_MAX; + + w = 1.0f / ((bary.x / tri[0].z) + (bary.y / tri[1].z) + (bary.z / tri[2].z)); + + if (inside) { + isSet = false; + do { + isSet = (atomicCAS(mutex + f_idx, 0, 1) == 0); + if (isSet) { + // Critical section + if (z < depth[f_idx]) { + + depth[f_idx] = z; + fragBuf[f_idx] = f_true; + + // generate interpolated attributes + if (p.v[0].dev_diffuseTex != NULL) { + fragBuf[f_idx].texcoord0 = w * ( (p.v[0].texcoord0 * bary.x / tri[0].z) + + (p.v[1].texcoord0 * bary.y / tri[1].z) + + (p.v[2].texcoord0 * bary.z / tri[2].z)); + } + fragBuf[f_idx].eyeNor = w * ((p.v[0].eyeNor * bary.x / tri[0].z) + (p.v[1].eyeNor * bary.y / tri[1].z) + (p.v[2].eyeNor * bary.z / tri[2].z)); + } + } + if (isSet) { + mutex[f_idx] = 0; + } + } while (!isSet); + } + } } - + } +__device__ +void rasterizePoint(VertexOut v, Fragment* fragBuf, int* depth, int* mutex, int width) { + Fragment f; + f.color = glm::vec3(1.0f); + f.dev_diffuseTex = v.dev_diffuseTex; + if (f.dev_diffuseTex != NULL) { + f.texcoord0 = v.texcoord0; + f.texHeight = v.texHeight; + f.texWidth = v.texWidth; + } + f.eyeNor = v.eyeNor; + int f_idx = (int)v.pos.y * width + (int)v.pos.x; + + bool isSet = false; + + do { + isSet = (atomicCAS(mutex + f_idx, 0, 1) == 0); + if (isSet) { + // Critical section + if (v.pos.z < depth[f_idx]) { + + depth[f_idx] = v.pos.z; + fragBuf[f_idx] = f; + } + } + if (isSet) { + mutex[f_idx] = 0; + } + } while (!isSet); + +} +__device__ +void rasterizeLine(VertexOut v1, VertexOut v2, Fragment* fragBuf, int* depth, int* mutex, int width, int height) { + AABB aabb; + aabb.min = glm::vec3(min(v1.pos.x, v2.pos.x), min(v1.pos.y, v2.pos.y), min(v1.pos.z, v2.pos.z)); + aabb.max = glm::vec3(max(v1.pos.x, v2.pos.x), max(v1.pos.y, v2.pos.y), max(v1.pos.z, v2.pos.z)); + + // if completely off screen return + if (aabb.max.y >= height && aabb.min.y >= height) return; + if (aabb.max.y < 0 && aabb.min.y < 0) return; + + if (aabb.max.x >= width && aabb.min.x >= width) return; + if (aabb.max.x < 0 && aabb.min.x < 0) return; + + // clamp bounds + if (aabb.max.y >= height) aabb.max.y = height - 1; + if (aabb.min.y < 0) aabb.min.y = 0; + if (aabb.max.x >= width) aabb.max.x = width - 1; + if (aabb.min.x < 0) aabb.min.x = 0; + + int x; + float z; + + float m = (v2.pos.y - v1.pos.y) / (v2.pos.x - v1.pos.x); + float a; + + Fragment f; + f.color = glm::vec3(1.0f); + + int f_idx; + + for (int y = aabb.min.y; y <= aabb.max.y; y++) { + //get point x on line + x = (int)((y - v1.pos.y) / m) + (int)v1.pos.x; + + // check bounds + if (x > aabb.max.x || x < aabb.min.x) continue; + + a = (x - v1.pos.x) / (v2.pos.x - v1.pos.x); + f_idx = y * width + x; + + z = 1.0f / ((a / v1.pos.z) + ((1-a) / v2.pos.z)); + + f.dev_diffuseTex = v1.dev_diffuseTex; + if (f.dev_diffuseTex != NULL) { + f.texcoord0 = z * ((a * v1.texcoord0 / v1.pos.z) + ((1 - a) * v2.texcoord0 / v2.pos.z)); + f.texHeight = v1.texHeight; + f.texWidth = v1.texWidth; + } + f.eyeNor = z * ((a * v1.eyeNor / v1.pos.z) + ((1 - a) * v2.eyeNor / v2.pos.z)); + + bool isSet = false; + + do { + isSet = (atomicCAS(mutex + f_idx, 0, 1) == 0); + if (isSet) { + // Critical section + if (z < depth[f_idx]) { + + depth[f_idx] = z; + fragBuf[f_idx] = f; + } + } + if (isSet) { + mutex[f_idx] = 0; + } + } while (!isSet); + } + +} + +__global__ +void _rasterizer(int num_prim, Primitive* primitives, Fragment* fragBuf, int* depth, int* mutex, int width, int height) { + // index id + int idx = (blockIdx.x * blockDim.x) + threadIdx.x; + if (idx > num_prim) return; + + Primitive p = primitives[idx]; + + if (POINT_CLOUD_MODE && p.primitiveType == Triangle) { + for (int i = 0; i < 3; i++) { + if (p.v[i].pos.x >= width || p.v[i].pos.y >= height) continue; + if (p.v[i].pos.x < 0 || p.v[i].pos.y < 0) continue; + rasterizePoint(p.v[i], fragBuf, depth, mutex, width); + } + + } + else if (WIREFRAME_MODE && p.primitiveType == Triangle) { + rasterizeLine(p.v[0], p.v[1], fragBuf, depth, mutex, width, height); + rasterizeLine(p.v[0], p.v[2], fragBuf, depth, mutex, width, height); + rasterizeLine(p.v[1], p.v[2], fragBuf, depth, mutex, width, height); + } + else if (p.primitiveType == Triangle) { + rasterizeTriangle(p, fragBuf, depth, mutex, width, height); + } + else if (p.primitiveType == Point) { + if (p.v[0].pos.x >= width || p.v[0].pos.y >= height) return; + if (p.v[0].pos.x < 0 || p.v[0].pos.y < 0) return; + rasterizePoint(p.v[0], fragBuf, depth, mutex, width); + } + else if (p.primitiveType == Line) { + rasterizeLine(p.v[0], p.v[1], fragBuf, depth, mutex, width, height); + } +} /** * Perform rasterization. */ void rasterize(uchar4 *pbo, const glm::mat4 & MVP, const glm::mat4 & MV, const glm::mat3 MV_normal) { - int sideLength2d = 8; - dim3 blockSize2d(sideLength2d, sideLength2d); - dim3 blockCount2d((width - 1) / blockSize2d.x + 1, + int sideLength2d = 8; + dim3 blockSize2d(sideLength2d, sideLength2d); + dim3 blockCount2d((width - 1) / blockSize2d.x + 1, (height - 1) / blockSize2d.y + 1); // Execute your rasterization pipeline here @@ -702,14 +1086,14 @@ void rasterize(uchar4 *pbo, const glm::mat4 & MVP, const glm::mat4 & MV, const g dim3 numBlocksForVertices((p->numVertices + numThreadsPerBlock.x - 1) / numThreadsPerBlock.x); dim3 numBlocksForIndices((p->numIndices + numThreadsPerBlock.x - 1) / numThreadsPerBlock.x); - _vertexTransformAndAssembly << < numBlocksForVertices, numThreadsPerBlock >> >(p->numVertices, *p, MVP, MV, MV_normal, width, height); + _vertexTransformAndAssembly << < numBlocksForVertices, numThreadsPerBlock >> > (p->numVertices, *p, MVP, MV, MV_normal, width, height); checkCUDAError("Vertex Processing"); cudaDeviceSynchronize(); _primitiveAssembly << < numBlocksForIndices, numThreadsPerBlock >> > - (p->numIndices, - curPrimitiveBeginId, - dev_primitives, - *p); + (p->numIndices, + curPrimitiveBeginId, + dev_primitives, + *p); checkCUDAError("Primitive Assembly"); curPrimitiveBeginId += p->numPrimitives; @@ -718,20 +1102,22 @@ void rasterize(uchar4 *pbo, const glm::mat4 & MVP, const glm::mat4 & MV, const g checkCUDAError("Vertex Processing and Primitive Assembly"); } - + + cudaMemset(dev_fragmentBuffer, 0, width * height * sizeof(Fragment)); - initDepth << > >(width, height, dev_depth); - - // TODO: rasterize + initDepth << > > (width, height, dev_depth); + // TODO: rasterize + dim3 num_blocks((totalNumPrimitives + RASTERIZER_BLOCK - 1) / RASTERIZER_BLOCK); + _rasterizer << > > (totalNumPrimitives, dev_primitives, dev_fragmentBuffer, dev_depth, dev_mutex, width, height); - // Copy depthbuffer colors into framebuffer - render << > >(width, height, dev_fragmentBuffer, dev_framebuffer); + // Copy depthbuffer colors into framebuffer + render << > > (width, height, dev_fragmentBuffer, dev_framebuffer); checkCUDAError("fragment shader"); - // Copy framebuffer into OpenGL buffer for OpenGL previewing - sendImageToPBO<<>>(pbo, width, height, dev_framebuffer); - checkCUDAError("copy render result to pbo"); + // Copy framebuffer into OpenGL buffer for OpenGL previewing + sendImageToPBO << > > (pbo, width, height, dev_framebuffer); + checkCUDAError("copy render result to pbo"); } /** @@ -739,7 +1125,7 @@ void rasterize(uchar4 *pbo, const glm::mat4 & MVP, const glm::mat4 & MV, const g */ void rasterizeFree() { - // deconstruct primitives attribute/indices device buffer + // deconstruct primitives attribute/indices device buffer auto it(mesh2PrimitivesMap.begin()); auto itEnd(mesh2PrimitivesMap.end()); @@ -753,24 +1139,27 @@ void rasterizeFree() { cudaFree(p->dev_verticesOut); - + //TODO: release other attributes and materials } } //////////// - cudaFree(dev_primitives); - dev_primitives = NULL; + cudaFree(dev_primitives); + dev_primitives = NULL; cudaFree(dev_fragmentBuffer); dev_fragmentBuffer = NULL; - cudaFree(dev_framebuffer); - dev_framebuffer = NULL; + cudaFree(dev_framebuffer); + dev_framebuffer = NULL; cudaFree(dev_depth); dev_depth = NULL; - checkCUDAError("rasterize Free"); + cudaFree(dev_mutex); + dev_mutex = NULL; + + checkCUDAError("rasterize Free"); } diff --git a/src/rasterizeTools.h b/src/rasterizeTools.h index 46c701e..aa6db10 100644 --- a/src/rasterizeTools.h +++ b/src/rasterizeTools.h @@ -95,7 +95,7 @@ bool isBarycentricCoordInBounds(const glm::vec3 barycentricCoord) { */ __host__ __device__ static float getZAtCoordinate(const glm::vec3 barycentricCoord, const glm::vec3 tri[3]) { - return -(barycentricCoord.x * tri[0].z + return (barycentricCoord.x * tri[0].z + barycentricCoord.y * tri[1].z + barycentricCoord.z * tri[2].z); } diff --git a/util/tiny_gltf_loader.h b/util/tiny_gltf_loader.h index 3746ac8..506921e 100644 --- a/util/tiny_gltf_loader.h +++ b/util/tiny_gltf_loader.h @@ -1488,11 +1488,11 @@ static bool ParsePrimitive(Primitive *primitive, std::string *err, ParseNumberProperty(&mode, err, o, "mode", false); int primMode = static_cast(mode); - if (primMode != TINYGLTF_MODE_TRIANGLES) { + if (primMode != TINYGLTF_MODE_TRIANGLES && primMode != TINYGLTF_MODE_POINTS && primMode != TINYGLTF_MODE_LINE) { if (err) { (*err) += "Currently TinyGLTFLoader doesn not support primitive mode other \n" - "than TRIANGLES.\n"; + "than TRIANGLES, POINTS, and LINES.\n"; } return false; }