Allocation functions, memory transfers and context

We've been investigating changes to the PI interface for memory allocations and, to some extent, for memory transfers, which in turn also changes some of the meaning of the PI context. Much of the reasoning behind these changes is based on how the SYCL DPC++ runtime currently works, but it would be good to consider them for the Unified Runtime as well.

The changes are:
1. Add a `pi_device` argument to the buffer and image allocation entry points (`piMemBufferCreate`, `piMemImageCreate`). This doesn't necessarily mean that the allocation will only be usable on that device, but it helps backends that don't natively support context-style allocations. For the DPC++ SYCL runtime this makes a lot of sense: we already allocate lazily, so whenever we call these functions we already know the exact target device, not just the context (the SYCL `context_bound` property is not currently implemented in DPC++).
2. Add a new query, `piextGetMemoryConnection`, that takes two `(pi_device, pi_context)` pairs and returns information on how memory can or should be handled between them. It currently has three possible results:
    * `PI_MEMORY_CONNECTION_NONE`: memory in the first `(context, device)` pair cannot be used by, or migrated by the plugin into, the second `(context, device)` pair; copies through the host are necessary.
    * `PI_MEMORY_CONNECTION_MIGRATABLE`: memory in the first `(context, device)` pair cannot be used directly by the second `(context, device)` pair, but the plugin can migrate data between the two (with `piEnqueueMemBufferCopy`).
    * `PI_MEMORY_CONNECTION_UNIFIED`: memory in the first `(context, device)` pair is directly usable by the second pair.
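As a rough illustration of how a runtime might act on the query result, the sketch below maps each connection value to a transfer strategy. The enumerator names come from the proposal; their numeric values, the `transfer_action` type and the `choose_transfer` function are hypothetical and not part of PI.

```c
/* Result values of the proposed query. The names come from the proposal;
 * the numeric values are an assumption. */
typedef enum {
  PI_MEMORY_CONNECTION_NONE,
  PI_MEMORY_CONNECTION_MIGRATABLE,
  PI_MEMORY_CONNECTION_UNIFIED
} pi_memory_connection;

/* Hypothetical runtime-side action for memory that lives in one
 * (context, device) pair and is needed in another. */
typedef enum {
  ACTION_USE_DIRECTLY,  /* the same allocation can be used as-is */
  ACTION_PLUGIN_COPY,   /* piEnqueueMemBufferCopy between the two pairs */
  ACTION_COPY_VIA_HOST  /* read into host memory, then write to the target */
} transfer_action;

/* Map the reported connection onto the action the runtime should take. */
transfer_action choose_transfer(pi_memory_connection conn) {
  switch (conn) {
  case PI_MEMORY_CONNECTION_UNIFIED:
    return ACTION_USE_DIRECTLY;
  case PI_MEMORY_CONNECTION_MIGRATABLE:
    return ACTION_PLUGIN_COPY;
  case PI_MEMORY_CONNECTION_NONE:
  default:
    /* Unknown values fall back to the always-correct host round trip. */
    return ACTION_COPY_VIA_HOST;
  }
}
```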
    
With these two changes, a backend that doesn't natively support context-style allocations no longer has to emulate them: it can simply allocate for a specific device and report that the memory still needs to be migrated between devices in the same context. Conversely, a backend that does support context-style allocations can ignore the `pi_device` passed to the allocation functions and simply report `PI_MEMORY_CONNECTION_UNIFIED` when the contexts are identical and `PI_MEMORY_CONNECTION_NONE` when they are different. The query also lets plugins tell us when they can optimize memory copies between different contexts, by reporting `PI_MEMORY_CONNECTION_MIGRATABLE`: this means that `piEnqueueMemBufferCopy` is supported between the two contexts and may be more efficient than a copy through the host.
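The two reporting policies described above can be sketched as follows. Each function stands in for one kind of `piextGetMemoryConnection` implementation; plain integers replace the opaque `pi_device`/`pi_context` handles, the `(device, context, device, context)` argument order is an assumption, and the function names are hypothetical. Reporting different contexts as `PI_MEMORY_CONNECTION_NONE` in the per-device case is also an assumption; a plugin able to copy across contexts could report `PI_MEMORY_CONNECTION_MIGRATABLE` instead.

```c
typedef enum {
  PI_MEMORY_CONNECTION_NONE,
  PI_MEMORY_CONNECTION_MIGRATABLE,
  PI_MEMORY_CONNECTION_UNIFIED
} pi_memory_connection;

/* Stand-in handles: real PI handles are opaque pointers; integers keep the
 * sketch self-contained. */
typedef int device_id;
typedef int context_id;

/* Backend with native context-style allocations (OpenCL-like): one
 * allocation serves every device in the context, so only contexts matter. */
pi_memory_connection context_style_connection(device_id dev1, context_id ctx1,
                                              device_id dev2, context_id ctx2) {
  (void)dev1; /* the device argument is ignored by this kind of backend */
  (void)dev2;
  return ctx1 == ctx2 ? PI_MEMORY_CONNECTION_UNIFIED
                      : PI_MEMORY_CONNECTION_NONE;
}

/* Backend that allocates per device (CUDA-like): memory is only directly
 * usable on the device it was allocated for, and the plugin can migrate it
 * between devices of the same context. */
pi_memory_connection per_device_connection(device_id dev1, context_id ctx1,
                                           device_id dev2, context_id ctx2) {
  if (ctx1 != ctx2)
    return PI_MEMORY_CONNECTION_NONE; /* conservative choice, see above */
  return dev1 == dev2 ? PI_MEMORY_CONNECTION_UNIFIED
                      : PI_MEMORY_CONNECTION_MIGRATABLE;
}
```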

To circle back to the initial motivation: CUDA doesn't have context-style memory allocations like OpenCL or PI, so to support multiple CUDA devices in the same `pi_context` we would have to implement our own memory manager in the CUDA plugin (which I believe the Level Zero plugin also does). Since the SYCL runtime already has a memory manager, these PI plugin changes let the CUDA plugin simply defer the management of memory allocations within a context to the SYCL runtime.
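A minimal sketch of what deferring allocation management to the runtime could look like, assuming a per-device backend: for each buffer within a context, the runtime keeps at most one allocation per device, creates it lazily through a device-aware allocation call, and migrates the data when a different device needs it. `mock_buffer_create` is a hypothetical stand-in for `piMemBufferCreate` with the proposed `pi_device` argument, host `malloc`/`memcpy` stand in for device memory and `piEnqueueMemBufferCopy`, and every access is treated as a write for simplicity.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define MAX_DEVICES 4

/* Hypothetical stand-in for piMemBufferCreate with the proposed pi_device
 * argument: a per-device backend allocates for `device` only. Host memory
 * plays the role of device memory in this sketch. */
static void *mock_buffer_create(int device, size_t size) {
  (void)device; /* a real plugin would allocate on this device */
  return malloc(size);
}

/* Runtime-side record for one buffer within one context: at most one
 * allocation per device, created lazily, plus the device holding the
 * up-to-date data (-1 if none). */
typedef struct {
  size_t size;
  void *alloc[MAX_DEVICES];
  int latest;
} runtime_buffer;

void runtime_buffer_init(runtime_buffer *buf, size_t size) {
  buf->size = size;
  for (int i = 0; i < MAX_DEVICES; ++i)
    buf->alloc[i] = NULL;
  buf->latest = -1;
}

/* Return memory usable on `device`, allocating it on first use and migrating
 * the latest data if another device holds it. memcpy stands in for
 * piEnqueueMemBufferCopy, which a per-device plugin would report as available
 * (PI_MEMORY_CONNECTION_MIGRATABLE) between devices of the same context.
 * Every access is treated as a write, so `device` becomes the owner. */
void *runtime_buffer_acquire(runtime_buffer *buf, int device) {
  if (device < 0 || device >= MAX_DEVICES)
    return NULL;
  if (buf->alloc[device] == NULL) {
    buf->alloc[device] = mock_buffer_create(device, buf->size);
    if (buf->alloc[device] == NULL)
      return NULL;
  }
  if (buf->latest >= 0 && buf->latest != device)
    memcpy(buf->alloc[device], buf->alloc[buf->latest], buf->size);
  buf->latest = device;
  return buf->alloc[device];
}

/* Free every per-device allocation of the buffer. */
void runtime_buffer_release(runtime_buffer *buf) {
  for (int i = 0; i < MAX_DEVICES; ++i) {
    free(buf->alloc[i]);
    buf->alloc[i] = NULL;
  }
  buf->latest = -1;
}
```

A real memory manager would also distinguish read-only accesses, so that several devices can hold valid copies at the same time instead of migrating on every access.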

You can find more discussion and an initial implementation in the following PR:
* https://github.com/intel/llvm/pull/6446/
