ParallelVec is a generic collection of contiguously stored heterogenous values with
an API similar to that of a Vec<(T1, T2, ...)> but stores the data laid out as a
separate slice per field, using a structures of arrays
layout. The advantage of this layout is that cache utilization may be signifgantly improved
when iterating over the data.
This approach is common to game engines, and Entity-Component-Systems in particular but is applicable anywhere that cache coherency and memory bandwidth are important for performance.
Unlike a struct of Vecs, only one length and capacity field is stored, and only one contiguous
allocation is made for the entire data structs. Upon reallocation, a struct of Vec may apply
additional allocation pressure. ParallelVec only allocates once per resize.
use parallel_vec::ParallelVec;
// #Some 'entity' data.
struct Position { x: f64, y: f64 }
struct Velocity { dx: f64, dy: f64 }
struct ColdData { /* Potentially many fields omitted here */ }
// Create a vec of entities
let mut entities: ParallelVec<(Position, Velocity, ColdData)> = ParallelVec::new();
entities.push((Position {x: 1.0, y: 2.0}, Velocity { dx: 0.0, dy: 0.5 }, ColdData {}));
entities.push((Position {x: 0.0, y: 2.0}, Velocity { dx: 0.5, dy: 0.5 }, ColdData {}));
// Update entities. This loop only loads position and velocity data, while skipping over
// the ColdData which is not necessary for the physics simulation.
for (position, velocity, _) in entities.iter_mut() {
*position = *position + *velocity;
}
// Remove an entity
entities.swap_remove(0);By default, this crate requires the standard library. Disabling the default features
enables this crate to compile in #![no_std] environments. There must be a set global
allocator and heap support for this crate to work.
ParallelVec can be serialized if it's parameters can be serialized. This is disabled by
default. Use the serde feature to enable support for serialization and deserialization.
To run benchmarks, use cargo bench. The benchmarks for this crate directly compares the
iteration and get performance of ParallelVec and it's Vec equivalent on small structs,
big structs, and a mix of both.
Generally, ParallelVec achieves similar performance to Vec when the entire buffer can
fit into cache. Once the backing store grows larger than cache, or if there are other
operations competing for cache space, ParallelVec achieves higher iteration speeds than
its Vec equivalent, particularly as the size of the elements increases. Conversely,
ParallelVec performance falls off relative to it's Vec equivalent as the size of the
overall buffer increases.