Allocation functions, memory transfers and context #53

Open
@npmiller

We've been investigating changes to the PI interface for memory allocations and, to some extent, for memory transfers, which in turn changes some of the meaning of the PI context. Much of the reasoning behind these changes is based on how the SYCL DPC++ runtime currently works, but it would be good to consider them for the Unified Runtime.

The changes are:

  1. Add a pi_device argument to the buffer and image allocation entry points (piMemBufferCreate, piMemImageCreate). This doesn't necessarily mean that the allocation will only be usable on that device, but it helps backends that don't natively support context-style allocations. It fits the DPC++ SYCL runtime well: allocation is already lazy, so by the time these functions are called the runtime always knows the exact target device and not just the context (the SYCL context_bound property is not currently implemented in DPC++).
  2. Add a new query, piextGetMemoryConnection, that takes two (pi_device, pi_context) pairs and returns information on how memory can or should be handled between them. It currently has three possible results:
    • PI_MEMORY_CONNECTION_NONE: memory in the first (context, device) pair cannot be used by the second (context, device) pair, nor migrated into it by the plugin; copies through the host are necessary.
    • PI_MEMORY_CONNECTION_MIGRATABLE: memory in the first (context, device) pair cannot be used directly by the second (context, device) pair, but the plugin can migrate data between the two (via piEnqueueMemBufferCopy).
    • PI_MEMORY_CONNECTION_UNIFIED: memory in the first (context, device) pair is directly usable by the second pair.
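To make the three results concrete, here is a minimal sketch of how a runtime might act on the query result. It does not use the real PI headers: the enum values, the commented-out signature of piextGetMemoryConnection, and the `Strategy` and `chooseStrategy` names are all assumptions made for illustration, since the issue only states that the query takes two (pi_device, pi_context) pairs.

```cpp
namespace sketch_query {

// The three results named in the proposal. The numeric values are assumed.
enum pi_memory_connection {
  PI_MEMORY_CONNECTION_NONE,
  PI_MEMORY_CONNECTION_MIGRATABLE,
  PI_MEMORY_CONNECTION_UNIFIED,
};

// Assumed shape of the query (not taken from any header):
//   pi_result piextGetMemoryConnection(pi_device dev1, pi_context ctx1,
//                                      pi_device dev2, pi_context ctx2,
//                                      pi_memory_connection *result);

// Hypothetical runtime-side view of what to do with an allocation that
// lives in the first pair and is needed by the second pair.
enum class Strategy {
  UseInPlace,       // no transfer needed
  PluginCopy,       // let the plugin migrate, e.g. piEnqueueMemBufferCopy
  CopyThroughHost,  // read back to the host, then write to the target
};

inline Strategy chooseStrategy(pi_memory_connection connection) {
  switch (connection) {
  case PI_MEMORY_CONNECTION_UNIFIED:
    return Strategy::UseInPlace;
  case PI_MEMORY_CONNECTION_MIGRATABLE:
    return Strategy::PluginCopy;
  case PI_MEMORY_CONNECTION_NONE:
    return Strategy::CopyThroughHost;
  }
  // Unknown values fall back to the most conservative option.
  return Strategy::CopyThroughHost;
}

} // namespace sketch_query
```

Falling back to a host copy for unknown values is a design choice of this sketch: it is the only path that never relies on plugin support.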

With these two changes, a backend that doesn't natively support context-style allocations no longer has to emulate them: it can simply allocate for a specific device and report that the memory still needs to be migrated between devices in the same context. A backend that does support context-style allocations can ignore the pi_device passed to the allocation functions, report PI_MEMORY_CONNECTION_UNIFIED when the contexts are identical, and report PI_MEMORY_CONNECTION_NONE when they differ. In addition, plugins can tell us when they are able to optimize memory copies between different contexts by reporting PI_MEMORY_CONNECTION_MIGRATABLE, which would mean that piEnqueueMemBufferCopy is supported between the two contexts and may be more efficient than a copy through the host.
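The two backend behaviours described above could look roughly like the sketch below. Everything here is hypothetical: `Endpoint`, `MemoryConnection`, and both functions are stand-ins using integer ids rather than real PI handles. The context-style rules follow the paragraph above; for the device-bound backend, returning `Unified` for the same device and `None` across contexts are assumptions of this sketch, since the issue only specifies the same-context, different-device case.

```cpp
namespace sketch_backends {

// Hypothetical (device, context) pair, identified by plain integer ids.
struct Endpoint {
  int device;
  int context;
};

enum class MemoryConnection { None, Migratable, Unified };

// Backend with native context-style allocations (OpenCL-like):
// the device is ignored and only the contexts are compared.
inline MemoryConnection contextStyleBackend(Endpoint src, Endpoint dst) {
  return src.context == dst.context ? MemoryConnection::Unified
                                    : MemoryConnection::None;
}

// Backend that allocates per device (CUDA-like).
inline MemoryConnection deviceBoundBackend(Endpoint src, Endpoint dst) {
  if (src.context != dst.context)
    return MemoryConnection::None;       // assumption: no cross-context path
  if (src.device == dst.device)
    return MemoryConnection::Unified;    // assumption: same device, same memory
  return MemoryConnection::Migratable;   // same context, different devices
}

} // namespace sketch_backends
```

A plugin that can copy efficiently between contexts would return `Migratable` instead of `None` in the cross-context branch.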

To circle back to the initial motivation: CUDA doesn't have context-style memory allocations like OpenCL or PI, so to support multiple CUDA devices in the same pi_context we would have to roll our own memory manager in the CUDA plugin (which I believe the LevelZero plugin also does). Since the SYCL runtime already has a memory manager, these PI plugin changes let the CUDA plugin simply defer the management of memory allocations within the same context to the SYCL runtime.

You can see more discussions and initial implementations of this on the following PR:

Labels: memory (Memory allocations/transfers/operations), pi (DPC++ PI requirement), specification (Changes or additions to the specification)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions