University of Pennsylvania, CIS 565: GPU Programming and Architecture, Project 1 - Flocking
- Keyu Lu
- Tested on: Windows 10, Dell Oman, NVIDIA GeForce RTX 2060
| Flocking scene: `N_FOR_VIS` = 5,000, `scene_scale` = 100.0f, `DT` = 0.2f |
|---|

| Flocking scene: `N_FOR_VIS` = 500,000, `scene_scale` = 300.0f, `DT` = 2.0f |
|---|

| Flocking scene: `N_FOR_VIS` = 5,000,000, `scene_scale` = 500.0f, `DT` = 0.5f |
|---|
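- **Naive:** Performance drops sharply as the boid count grows, because every boid is compared against every other boid (O(N²) work per step).
- **Uniform grid:** Performance still falls as boids are added, but much less steeply, because each boid only checks the boids in nearby grid cells.
- **Coherent grid:** Scales like the uniform grid but runs faster, because position and velocity data are reordered by cell so that each cell's boids are contiguous in memory. This removes a level of index indirection and gives more coalesced memory reads.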
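The grid construction behind the last two approaches can be sketched on the CPU as follows. This is a minimal C++ sketch, not the project's CUDA code: it assumes a cubic grid of `res`³ cells starting at `gridMin`, and every function and field name here is hypothetical. On the GPU each loop would be a kernel and the sort a key-value sort on the device. The uniform grid stops after `buildGrid`; the coherent grid additionally calls `reorderCoherent` so the neighbor loop can read `pos[cellStart..cellEnd]` directly.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

struct Vec3 { float x, y, z; };

// Flatten a 3D cell coordinate into a 1D index (x varies fastest).
int gridIndex3Dto1D(int x, int y, int z, int res) {
  return x + y * res + z * res * res;
}

// Cell coordinate along one axis, clamped to [0, res - 1].
int cellCoord(float p, float gridMin, float cellWidth, int res) {
  int c = static_cast<int>(std::floor((p - gridMin) / cellWidth));
  return std::min(std::max(c, 0), res - 1);
}

struct UniformGrid {
  std::vector<int> particleArrayIndices;  // boid indices, sorted by cell
  std::vector<int> particleGridIndices;   // cell index of each sorted entry
  std::vector<int> cellStart, cellEnd;    // inclusive range into the sorted arrays, -1 if empty
};

UniformGrid buildGrid(const std::vector<Vec3>& pos, Vec3 gridMin, float cellWidth, int res) {
  assert(cellWidth > 0.0f && res > 0);
  const int n = static_cast<int>(pos.size());
  std::vector<int> cellOf(n);
  for (int i = 0; i < n; ++i) {
    cellOf[i] = gridIndex3Dto1D(cellCoord(pos[i].x, gridMin.x, cellWidth, res),
                                cellCoord(pos[i].y, gridMin.y, cellWidth, res),
                                cellCoord(pos[i].z, gridMin.z, cellWidth, res), res);
  }

  UniformGrid g;
  g.particleArrayIndices.resize(n);
  std::iota(g.particleArrayIndices.begin(), g.particleArrayIndices.end(), 0);
  // Sort boid indices by their cell key so boids in the same cell become adjacent.
  std::stable_sort(g.particleArrayIndices.begin(), g.particleArrayIndices.end(),
                   [&](int a, int b) { return cellOf[a] < cellOf[b]; });

  g.particleGridIndices.resize(n);
  for (int i = 0; i < n; ++i) g.particleGridIndices[i] = cellOf[g.particleArrayIndices[i]];

  // Record where each non-empty cell's run begins and ends in the sorted order.
  g.cellStart.assign(res * res * res, -1);
  g.cellEnd.assign(res * res * res, -1);
  for (int i = 0; i < n; ++i) {
    int c = g.particleGridIndices[i];
    if (i == 0 || c != g.particleGridIndices[i - 1]) g.cellStart[c] = i;
    if (i == n - 1 || c != g.particleGridIndices[i + 1]) g.cellEnd[c] = i;
  }
  return g;
}

// Coherent grid only: copy positions into cell-sorted order so each cell's boids are
// contiguous, removing the lookup through particleArrayIndices in the neighbor loop.
std::vector<Vec3> reorderCoherent(const std::vector<Vec3>& pos, const std::vector<int>& sortedIdx) {
  std::vector<Vec3> out(pos.size());
  for (std::size_t i = 0; i < sortedIdx.size(); ++i) out[i] = pos[sortedIdx[i]];
  return out;
}
```

The same reordering would be applied to the velocity buffer in a full implementation; it is omitted here to keep the sketch short.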
Increasing the block size generally improves performance up to a threshold, beyond which the returns diminish. This trend holds for all three implementations and is likely tied to GPU occupancy: once each streaming multiprocessor (SM) holds as many resident threads as it can, larger blocks add no further latency hiding.
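A back-of-the-envelope occupancy calculation illustrates why such a plateau can appear. This C++ sketch considers only the per-SM thread and block limits and ignores register and shared-memory pressure, which can lower occupancy further. The default limits (1024 threads and 16 blocks per SM) are assumed values for compute capability 7.5, the generation of the RTX 2060; they should be checked against the CUDA Programming Guide. The function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>

// Per-SM residency limits; defaults are assumed values for compute capability 7.5.
struct SmLimits {
  int maxThreadsPerSm = 1024;
  int maxBlocksPerSm = 16;
};

// Ceiling division used to size the grid, as in fullBlocksPerGrid.
int blocksPerGrid(int n, int blockSize) {
  assert(n > 0 && blockSize > 0);
  return (n + blockSize - 1) / blockSize;
}

// Fraction of an SM's thread slots that can be filled, considering only the thread
// and block limits. Assumes blockSize is a multiple of the 32-thread warp size.
double theoreticalOccupancy(int blockSize, const SmLimits& lim = SmLimits{}) {
  assert(blockSize > 0 && blockSize % 32 == 0 && blockSize <= lim.maxThreadsPerSm);
  int residentBlocks = std::min(lim.maxBlocksPerSm, lim.maxThreadsPerSm / blockSize);
  return static_cast<double>(residentBlocks * blockSize) / lim.maxThreadsPerSm;
}
```

Under these assumed limits, 32-thread blocks can fill only half an SM because the 16-block cap is hit first, while any block size from 64 up to 512 (in steps that divide 1024) can fill it completely, which is consistent with gains that flatten out past a threshold. Sizes that do not divide 1024, such as 768, leave slots unused.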
The 8-cell neighbor search outperforms the 27-cell search. It matches the localized interaction range of the boids more closely: each boid inspects fewer cells, so less work is spent on cells whose boids cannot fall within the rule distances and therefore do not affect its velocity update.
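How the number of visited cells follows from the cell width can be shown with a small C++ sketch. One common approach (an assumption here, not necessarily how this project selects cells) takes the axis-aligned bounding box of a boid's neighborhood sphere, where `radius` stands for the largest rule distance, and visits every cell it touches. With a cell width of twice the radius the box spans at most 2 cells per axis (8 in total); with a cell width equal to the radius it can span 3 per axis (27). All names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct CellRange { int lo[3]; int hi[3]; };  // inclusive cell bounds per axis

// Cells overlapped by the box [pos - radius, pos + radius], clamped to a res^3 grid.
CellRange neighborCellRange(const float pos[3], float radius, const float gridMin[3],
                            float cellWidth, int res) {
  assert(radius > 0.0f && cellWidth > 0.0f && res > 0);
  CellRange r;
  for (int a = 0; a < 3; ++a) {
    int lo = static_cast<int>(std::floor((pos[a] - radius - gridMin[a]) / cellWidth));
    int hi = static_cast<int>(std::floor((pos[a] + radius - gridMin[a]) / cellWidth));
    r.lo[a] = std::max(lo, 0);
    r.hi[a] = std::min(hi, res - 1);
  }
  return r;
}

// Number of cells a boid would scan for a given range.
int cellCount(const CellRange& r) {
  int count = 1;
  for (int a = 0; a < 3; ++a) count *= std::max(0, r.hi[a] - r.lo[a] + 1);
  return count;
}
```

For a boid at (5.2, 5.7, 5.5) with radius 1, a cell width of 2 yields 8 cells and a cell width of 1 yields 27.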
For extra credit, I implemented a shared-memory optimization of the neighbor search in the naive (brute-force) version of the simulation, using shared memory in the kernel that updates boid velocities.
The kernel is selected at compile time:
```cpp
#if USE_SHARED_MEM
  kernUpdateVelocityBruteForceShared<<<fullBlocksPerGrid, blockSize, sizeof(glm::vec3) * blockSize * 2>>>(N, dev_pos, dev_vel1, dev_vel2);
#else
  kernUpdateVelocityBruteForce<<<fullBlocksPerGrid, blockSize>>>(N, dev_pos, dev_vel1, dev_vel2);
#endif
```

The preprocessor flag `USE_SHARED_MEM` switches between the shared-memory kernel (`kernUpdateVelocityBruteForceShared`) and the plain brute-force kernel (`kernUpdateVelocityBruteForce`). The third launch parameter of the shared-memory launch requests `sizeof(glm::vec3) * blockSize * 2` bytes of dynamic shared memory per block, i.e. room for two `glm::vec3` values per thread.
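The body of `kernUpdateVelocityBruteForceShared` is not shown here, so the following is only a CPU emulation in C++ of the tiling pattern that shared-memory brute-force kernels commonly use, under stated assumptions: it tiles positions only (the launch above reserves space for two `vec3` arrays, presumably positions and velocities) and computes a single quantity, the mean position of each boid's neighbors. A `std::vector` stands in for the `__shared__` buffer, and the function names are hypothetical. The point is the read count: the naive pattern reads every position once per thread, while the tiled pattern reads it once per block.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Vec3 { float x, y, z; };

struct CenterResult {
  std::vector<Vec3> centers;  // mean position of each boid's neighbors (zero if none)
  long long globalReads = 0;  // neighbor-position reads from the "global" array
};

static bool withinRadius(const Vec3& a, const Vec3& b, float radius) {
  float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
  return dx * dx + dy * dy + dz * dz < radius * radius;
}

// Naive pattern: every thread i reads every position from global memory.
CenterResult centersNaive(const std::vector<Vec3>& pos, float radius) {
  const int n = static_cast<int>(pos.size());
  CenterResult r;
  r.centers.assign(n, Vec3{0.0f, 0.0f, 0.0f});
  for (int i = 0; i < n; ++i) {
    Vec3 sum{0.0f, 0.0f, 0.0f};
    int count = 0;
    for (int j = 0; j < n; ++j) {
      Vec3 pj = pos[j];
      ++r.globalReads;
      if (j != i && withinRadius(pos[i], pj, radius)) {
        sum.x += pj.x; sum.y += pj.y; sum.z += pj.z;
        ++count;
      }
    }
    if (count > 0) r.centers[i] = Vec3{sum.x / count, sum.y / count, sum.z / count};
  }
  return r;
}

// Tiled pattern: each block walks the array one tile of blockSize positions at a time.
// A tile is loaded once per block into `tile` (the __shared__ stand-in; a kernel would
// call __syncthreads() after the load), then every thread in the block reads from it.
CenterResult centersTiled(const std::vector<Vec3>& pos, float radius, int blockSize) {
  assert(blockSize > 0);
  const int n = static_cast<int>(pos.size());
  CenterResult r;
  r.centers.assign(n, Vec3{0.0f, 0.0f, 0.0f});
  std::vector<Vec3> tile(blockSize);
  for (int blockStart = 0; blockStart < n; blockStart += blockSize) {
    const int threads = std::min(blockSize, n - blockStart);
    std::vector<Vec3> sum(threads, Vec3{0.0f, 0.0f, 0.0f});
    std::vector<int> count(threads, 0);
    for (int tileStart = 0; tileStart < n; tileStart += blockSize) {
      const int tileLen = std::min(blockSize, n - tileStart);
      for (int k = 0; k < tileLen; ++k) {  // cooperative load: in a kernel, thread k loads element k
        tile[k] = pos[tileStart + k];
        ++r.globalReads;
      }
      for (int t = 0; t < threads; ++t) {  // each thread scans the whole tile
        const int i = blockStart + t;      // own position would sit in a register on the GPU
        for (int k = 0; k < tileLen; ++k) {
          if (tileStart + k != i && withinRadius(pos[i], tile[k], radius)) {
            sum[t].x += tile[k].x; sum[t].y += tile[k].y; sum[t].z += tile[k].z;
            ++count[t];
          }
        }
      }
    }
    for (int t = 0; t < threads; ++t) {
      if (count[t] > 0) {
        r.centers[blockStart + t] = Vec3{sum[t].x / count[t], sum[t].y / count[t], sum[t].z / count[t]};
      }
    }
  }
  return r;
}
```

Both functions visit neighbors in the same order, so they produce identical results; for 8 boids and a block size of 4, the naive pattern performs 64 neighbor reads and the tiled pattern 16.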
Under the naive setting, using shared memory has a significant impact on the frames per second (FPS) achieved by the simulation:





*Figure: FPS of the naive simulation with and without shared memory.*

