@henrypinkard henrypinkard commented Feb 22, 2025

The V2 buffer provides thread-safe, generic data storage with improved performance and cleaner abstractions.

Before merging

  • Decide on default metadata handling and API for turning it on/off
  • Modify the pymmcore SWIG wrapper to expose the new pointer-based image handling
  • Bump Core and MMCoreJ versions appropriately

Design

Two core components:

DataBuffer: Thread-safe generic storage replacing CircularBuffer
BufferManager: Unified interface managing both legacy and new implementations

Key features of the new buffer system:

  • Thread-safe read/write access
  • Support for generic data types beyond just images
  • Support for various data types simultaneously (e.g. images of different sizes, pixel types, etc.)
  • Zero copy writing into the buffer by giving devices pointers into it
  • Zero copy management at the application layer through pointer manipulation

It can be enabled with:

core.enableV2Buffer(true)

Performance

As a drop-in replacement for the circular buffer (i.e. copying the data the same number of times, but allowing for arbitrary sizes and data types), the new buffer gives equal or better performance:

In sequence acquisitions:
image

In continuous sequence acquisitions (live mode):

image

It's also significantly faster to allocate:

image

Additionally, it has two key features that will enable much higher performance code:

  1. The application layer (e.g. Java, Python) can get access to data/metadata via pointers, avoiding direct copies. (In a quick test this seems to give a 2x speed improvement for reading out 2048x2048 images.)
  2. Device adapters can avoid an extra copy by requesting a slot to write data into
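
To illustrate the second point, here is a minimal sketch of the difference between the copying path and a write-slot path. ToyBuffer and its method names are hypothetical and are not the DataBuffer API from this PR; locking, slot release, and metadata are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical toy buffer used only to contrast the two write paths.
class ToyBuffer {
public:
    explicit ToyBuffer(std::size_t bytes) : storage_(bytes), used_(0) {}

    // Zero-copy path: hand the caller a pointer into the buffer's own storage.
    // Returns nullptr when there is not enough room left.
    std::uint8_t* AcquireWriteSlot(std::size_t n) {
        if (n > storage_.size() - used_) return nullptr;
        std::uint8_t* slot = storage_.data() + used_;
        used_ += n;
        return slot;
    }

    // Copying path: the caller already filled its own frame, so the buffer
    // has to copy it into a slot.
    bool InsertCopy(const std::uint8_t* src, std::size_t n) {
        assert(src != nullptr);
        std::uint8_t* dst = AcquireWriteSlot(n);
        if (dst == nullptr) return false;
        std::memcpy(dst, src, n);
        return true;
    }

    std::size_t BytesUsed() const { return used_; }
    const std::uint8_t* Data() const { return storage_.data(); }

private:
    std::vector<std::uint8_t> storage_;
    std::size_t used_;
};
```

A device adapter using the slot path would write its frame straight into the returned pointer, so no separate copy into the buffer is needed.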

Testing

I've written and validated the new buffer and circular buffer against many new tests here. (FYI these live in mmpycorex so they can easily test both MMCoreJ and pymmcore)

It also passes all the pycromanager acquisition tests, which test the various functionalities of the acquisition engine

Metadata

In conjunction with these changes, it made sense to standardize the metadata added to images. This was previously split among several places (the SWIG wrapper, the core, the core callback, and the device code), making it hard to keep track of and maintain. Some of it was generated at the time of image acquisition, and some at the time of image retrieval.

It has now all been consolidated into void CMMCore::addCameraMetadata, and the same metadata is added to all images whether snapped or passed through a buffer (with the small exception of some multi-camera device adapter-specific tags).

Testing reveals there's a substantial performance cost to adding so much metadata to all images:

image

Previously, much of this cost was incurred when reading images back out of the buffer. With the new changes, it is incurred at the time of insertion. However, I think it makes much more sense to add this metadata at insertion time, because that's when it's most likely to be in sync with the actual state of the hardware.

Since this consolidation takes place outside the BufferManager, it also affects the circular buffer and will change behavior even if the v2 buffer is disabled. We need to figure out what should be enabled here. It's unclear (to me) what higher level code depends on what tags, but including the union of all of them by default will substantially hurt performance. I also have just a temporary function in the core API for controlling which metadata to add, which should perhaps be replaced with something more permanent.

Multi-camera

While it is possible to use the v2 buffer with multi-camera devices, the v2 buffer's flexibility (e.g. support for different image sizes, types, etc.) makes it a more general solution than the multi-camera device adapter. In my opinion, that adapter should be deprecated and application code that relies on it updated to use the v2 buffer.

One addition here is getLastTaggedImageFromDevicePointer("cameraLabel"), which lets you get the last image from a specific camera, rather than having to search backwards through the most recent images and read their metadata.
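
A rough sketch of why a per-device lookup is cheaper than scanning backwards. FrameLog, Frame, and both Find methods are made up for illustration; how the PR implements the lookup internally is not shown in this description.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical record standing in for an image plus its camera tag.
struct Frame {
    std::string camera;
    int sequenceNumber;
};

class FrameLog {
public:
    void Insert(const Frame& f) {
        frames_.push_back(f);
        lastByCamera_[f.camera] = frames_.size() - 1;  // remember newest per camera
    }

    // Scan approach: walk backwards and inspect every frame's camera tag.
    const Frame* FindLastByScan(const std::string& camera) const {
        for (auto it = frames_.rbegin(); it != frames_.rend(); ++it) {
            if (it->camera == camera) return &*it;
        }
        return nullptr;
    }

    // Index approach: one hash lookup, no metadata inspection.
    const Frame* FindLastByIndex(const std::string& camera) const {
        auto it = lastByCamera_.find(camera);
        if (it == lastByCamera_.end()) return nullptr;
        assert(it->second < frames_.size());
        return &frames_[it->second];
    }

private:
    std::vector<Frame> frames_;
    std::unordered_map<std::string, std::size_t> lastByCamera_;
};
```

Both methods return the same frame; the second does not depend on how many images from other cameras arrived in between.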

A step towards a single route for all data

The pointer-based API gives a good opportunity to start moving towards a single route for all data, rather than separate routes for snaps and sequences. I don't think it's possible to fully do this without changing how cameras handle data for snaps, but in the meantime the GetImagePointer function now copies the snap buffer in camera adapters into the v2 buffer and returns a pointer to it. This should be faster than copying into the application memory space because it can be multithreaded, and it still allows pointer-based handling of the data from the application layer.

Pointer based image handling

You get these through methods like getLastImagePointer(), which return a TaggedImagePointer object. This object is a wrapper around the TaggedImage object, but it will not load the pixels until you call getPixels(). If you never want to use the pixels, you can call release(), or just use the metadata without them:

TaggedImagePointer tip = core.getLastImagePointer();
// This works just like a regular JSONObject, but it won't load
// the metadata until needed
tip.tags.get("Width");

…ure metadata is accurate for v2 buffer images
…c; Also centralized Metadata generation into the core from SWIG, core callback, device base
@henrypinkard
Contributor Author

Sounds good. As for System State Cache, there currently is a mechanism to switch that off if desired (MMCoreJ.i::includeSystemStateCache_). I am all for cleaning that up, but we will need a way to continue switching that off also when using V1 when needed (hoping to start V2 soon on the Java side, but expecting that will take some time).

I moved that method to the core and out of the SWIG layer, but will keep it indefinitely for backwards compatibility

@henrypinkard
Contributor Author

Added system state cache to summary metadata: micro-manager/AcqEngJ#127

@henrypinkard
Contributor Author

Okay I think I've addressed everything except for the two remaining unresolved comments above.

I think what to do about acquiring write slots can be addressed in a future PR, but it would be good to figure out what the eventual strategy will be.

I noticed that a few new functions (AcquireImageWriteSlot et al.) are added to CoreCallback but not to its base (interface) class MM::Core. @henrypinkard Would I be right in thinking that these are there to show how the transfer of images from the camera to MMCore can be made more efficient, but are kept hidden from device adapters for the time being?

This was indeed an accidental omission. I've added them to the interface, but this is commented out for now

I would agree with a cautious approach here because it would be bad if we have cameras that only work with V1 or V2. It would also be bad if every camera that supports WriteSlot had to add conditional branches to check if V2 is enabled. Probably the best way is to say that cameras must use either InsertImage or WriteSlot, but not both, and then make WriteSlot just work with V1 as well, just without the benefit of eliminating a copy. (As far as I'm concerned, it's fine if this PR doesn't yet expose WriteSlot to devices.)

NewDataBuffer (formerly V2) maintains backward compatibility with InsertImage, so existing camera adapters will continue to function. Users who prefer to avoid pointer management can still use InsertImage to copy data into the NewDataBuffer.
However, AcquireWriteSlot is incompatible with CircularBuffer for several reasons:

  • The CircularBuffer isn't truly circular - it fully empties when filled in continuous sequence mode, which would invalidate pointers held by device code.
  • CircularBuffer lacks slot locking mechanisms. Adding this functionality would essentially transform it into something similar to NewDataBuffer while risking new bugs.

I considered creating an intermediate compatibility buffer that would temporarily store data when a camera acquires a write slot with CircularBuffer enabled. However, this approach would be overly complex and effectively just place a NewDataBuffer instance between the CircularBuffer and device adapters.

I think the path forward is to enable write slot acquisition in a future PR after additional testing, with the requirement that camera device adapters using the acquire/release slot feature must use the new buffer. Since the performance testing shows NewDataBuffer is more performant than CircularBuffer while covering all the same use cases, it seems to me the application should migrate to it as the default option as soon as we are confident in its robustness.

@marktsuchida
Member

I'm finally getting around to looking at the details of the image retrieval API for the V2 buffer. Please correct me if I'm misunderstanding anything below.

Using the new mechanism, the app (let's look at Java for now) calls popNextDataPointer(), which, after the MMCoreJ wrapping into popNextTaggedImagePointer(), returns TaggedImagePointer. That Java class stores a (wrapped) BufferDataPointer -- so the app can indefinitely hold on to the pointer. When the app finally calls TaggedImagePointer.release(), this goes through BufferDataPointer::release(), which calls the current DataBuffer's ReleaseDataReadPointer() via the BufferManager. But in the meantime, the DataBuffer might have been deleted and replaced, due to the buffer size changing (or due to switching V2 off and on again). In such cases, the release() will crash or corrupt memory. (Also, accessing the image data after the buffer has been reallocated will crash or return incorrect data.)

On the other hand, if the app obtains a TaggedImagePointer and then lets go of it without calling release(), the slots pointed to remain allocated indefinitely and there is no way to reclaim the memory (other than resizing the buffer). This is perhaps less critical because there is no correct way to use this without explicitly calling release().

It's also generally hard to see what all the problems are that could arise from the lifetime management (or lack thereof) of buffer slots -- a problem in itself. I think the only safe way to deal with this is to explicitly share ownership of the buffer slots (including the memory backing them) between MMCore and the app. This could be done by managing the slots with std::shared_ptr (which performs automatic reference counting); the Java side could hold onto a copy of the shared_ptr until the app explicitly release()s the slot, so that the image data is available even if the MMCore buffer goes away.

After having written the above, I realized that you don't have separate buffers for each slot (like the V1 buffer) but rather one big, contiguous buffer. You can still have shared_ptrs that point to each block of that buffer while sharing ownership of the whole buffer (ask me how if not clear), though that would mean that the large buffer will be kept alive until every single pointer retained by the app is let go of.
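
For reference, the standard library feature that makes this possible is the aliasing constructor of std::shared_ptr. A minimal sketch, assuming the big buffer is held as a shared_ptr to a byte vector (Block and MakeSlotPointer are hypothetical names, not code from this PR):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical stand-in for the single large buffer.
using Block = std::vector<std::uint8_t>;

// Returns a pointer to byte `offset` inside `whole` that shares ownership of
// the entire block, so the block stays alive as long as this pointer does.
std::shared_ptr<std::uint8_t> MakeSlotPointer(const std::shared_ptr<Block>& whole,
                                              std::size_t offset) {
    assert(whole && offset < whole->size());
    // Aliasing constructor: shares the control block of `whole` but stores
    // a different raw pointer.
    return std::shared_ptr<std::uint8_t>(whole, whole->data() + offset);
}
```

Resetting the owner of the whole block does not free it while any slot pointer is alive, which is the keep-alive behavior described above.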

(It is not clear to me what the advantage of the contiguous buffer is. You end up using freeRegions_, which is complicated and also has the danger of becoming fragmented (this could be seen as reinventing a simplistic memory allocator). Also, the single contiguous buffer strategy will never work on 32-bit systems, although we maybe don't care about that.)

As for the Java API for things that need to be explicitly "released" by user code, it should conform to the AutoCloseable interface so that it can be used with the try-with-resources statement (rough equivalent to Python's with statement).

(Ideally we also automatically release the shared_ptr when the Java object gets garbage collected, which would require java.lang.ref.Cleaner, which would require us to update to Java 9+ first, but that we can do. (Please don't use finalize().) But this could be added later, I think. Having this means that people won't need to restart the program after running incorrect code that fails to release the buffer slots, but it should not be necessary for correct code.)

I'm afraid I cannot recommend merging this until these buffer lifetime issues are addressed. If possible, it might be productive to split this PR into two: one that cleans up the metadata handling without introducing the V2 buffer, and one that purely introduces the V2 buffer. That would speed up reviewing the changes to the metadata handling (which I still need to take another look at -- I'm mostly happy with it but it's easier to make 100% sure there are no unknown changes in behavior than to later troubleshoot the existing (sometimes hacky) application code that might depend on exact behavior).

@henrypinkard
Contributor Author

henrypinkard commented Mar 3, 2025

I'm finally getting around to looking at the details of the image retrieval API for the V2 buffer. Please correct me if I'm misunderstanding anything below.

Thanks for taking a look. Your understanding is mostly correct -- but in a couple places I think you've misunderstood, and in fact the behavior you're advocating for is already implemented.

Using the new mechanism, the app (let's look at Java for now) calls popNextDataPointer(), which, after the MMCoreJ wrapping into popNextTaggedImagePointer(), returns TaggedImagePointer. That Java class stores a (wrapped) BufferDataPointer -- so the app can indefinitely hold on to the pointer. When the app finally calls TaggedImagePointer.release(), this goes through BufferDataPointer::release()

Correct

which calls the current DataBuffer's ReleaseDataReadPointer() via the BufferManager. But in the meantime, the DataBuffer might have been deleted and replaced, due to the buffer size changing (or due to switching V2 off and on again). In such cases, the release() will crash or corrupt memory. (Also, accessing the image data after the buffer has been reallocated will crash or return incorrect data.)

The clearing/deletion of the v2 buffer is handled differently from the v1 circular buffer for exactly this reason. The v1 buffer is cleared/reallocated every time before starting a sequence acquisition. The v2 buffer is not. For v2, we've now split this into two separate operations: clearing and resetting. Trying to clear while application code holds outstanding slots will throw an error. Reset is the more dangerous operation that has the problems you mention, which is why it's not simply slotted in everywhere that the old circular buffer used to be cleared. For example, for the case of changing the buffer size, we have:

void BufferManager::ReallocateBuffer(unsigned int memorySizeMB) {
   if (useNewDataBuffer_.load()) {
      // Refuse to reallocate while application code still holds pointers
      // into the buffer; deleting it would leave them dangling.
      int numOutstanding = newDataBuffer_->NumOutstandingSlots();
      if (numOutstanding > 0) {
         throw CMMError("Cannot reallocate NewDataBuffer: " + std::to_string(numOutstanding) + " outstanding active slot(s) detected.");
      }
      delete newDataBuffer_;
      newDataBuffer_ = new DataBuffer(memorySizeMB);
   } else {
      // The legacy circular buffer path has no outstanding-slot check.
      delete circBuffer_;
      circBuffer_ = new CircularBuffer(memorySizeMB);
   }
}
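
The clear versus reset distinction can be sketched with a bare outstanding-slot counter. SlotTracker is a hypothetical, single-threaded stand-in (the real DataBuffer is thread-safe and manages actual memory); the generation counter is an assumption added only to make the effect of each call observable.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical bookkeeping-only model of a buffer with outstanding read slots.
class SlotTracker {
public:
    void AcquireReadPointer() { ++outstanding_; }

    void ReleaseReadPointer() {
        assert(outstanding_ > 0);  // releasing more than was acquired is a bug
        --outstanding_;
    }

    int NumOutstandingSlots() const { return outstanding_; }
    int Generation() const { return generation_; }

    // Safe operation: refuses to discard data while any slot is still held.
    void Clear() {
        if (outstanding_ > 0) {
            throw std::runtime_error("Cannot clear: " + std::to_string(outstanding_) +
                                     " outstanding slot(s)");
        }
        ++generation_;
    }

    // Dangerous operation: discards data regardless, so any pointer the
    // application still holds now refers to invalid contents.
    void ForceReset() {
        outstanding_ = 0;
        ++generation_;
    }

private:
    int outstanding_ = 0;
    int generation_ = 0;
};
```

With one pointer outstanding, Clear() throws and leaves the state untouched, while ForceReset() goes through and drops the count to zero.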

On the other hand, if the app obtains a TaggedImagePointer and then lets go of it without calling release(), the slots pointed to remain allocated indefinitely and there is no way to reclaim the memory (other than resizing the buffer). This is perhaps less critical because there is no correct way to use this without explicitly calling release().

You'd have to call reset(). Resizing the buffer would fail in this situation. But yes, there is no correct way to handle pointers without explicit calls to release.

It's also generally hard to see what all the problems are that could arise from the lifetime management (or lack thereof) of buffer slots -- a problem in itself. I think the only safe way to deal with this is to explicitly share ownership of the buffer slots (including the memory backing them) between MMCore and the app. This could be done by managing the slots with std::shared_ptr (which performs automatic reference counting); the Java side could hold onto a copy of the shared_ptr until the app explicitly release()s the slot, so that the image data is available even if the MMCore buffer goes away.

This is essentially what already happens (though not with shared_ptrs)

(It is not clear to me what the advantage of the contiguous buffer is. You end up using freeRegions_, which is complicated and also has the danger of becoming fragmented (this could be seen as reinventing a simplistic memory allocator).

Empirical testing indicated that the current mechanism of memory mapping a large buffer had the best performance.

True about the fragmentation, though I don't think this is so likely to happen in practice (I can provide more detail if needed). In any case, this is an internal implementation detail that can always be changed in a future PR without breaking backwards compatibility.

Also, the single contiguous buffer strategy will never work on 32-bit systems, although we maybe don't care about that.)

In my opinion, 32-bit support should be retired.

As for the Java API for things that need to be explicitly "released" by user code, it should conform to the AutoCloseable interface so that it can be used with the try-with-resources statement (rough equivalent to Python's with statement).

(Ideally we also automatically release the shared_ptr when the Java object gets garbage collected, which would require java.lang.ref.Cleaner, which would require us to update to Java 9+ first, but that we can do. (Please don't use finalize().) But this could be added later, I think. Having this means that people won't need to restart the program after running incorrect code that fails to release the buffer slots, but it should not be necessary for correct code.)

I went back and forth on how forgiving the design should be about forgetting to call release(), but when it comes down to it, that really is the only correct way to use this. I think a lot of higher level code may pass images around, so putting it in a single try block may not always be feasible. But I can certainly add support for AutoCloseable.

I would say it's better not to upgrade to Java 9 first, in case that brings other unforeseen issues.

If possible, it might be productive to split this PR into two: one that cleans up the metadata handling without introducing the V2 buffer, and one that purely introduces the V2 buffer. That would speed up reviewing the changes to the metadata handling (which I still need to take another look at -- I'm mostly happy with it but it's easier to make 100% sure there are no unknown changes in behavior than to later troubleshoot the existing (sometimes hacky) application code that might depend on exact behavior).

I understand the motivation for this, but unfortunately, I think this would be very challenging. The metadata generation was tangled up in many other functions. I tried to do this very carefully to avoid unexpected changes in higher level application code. It will at least be straightforward to implement fixes once identified, since it is all centralized now. We could consider including legacy metadata by default on the v1 buffer (even though it would give a performance hit) so that unexpected things don't break.

Update: I split out changes to the circularbuffer behavior into #588

@henrypinkard
Contributor Author

@marktsuchida I've made changes based on our discussion yesterday:

  • Clarified and added more checks for calling clear() on the new buffer (which throws if there are unreleased pointers) and forceReset(), which is the more dangerous option and is documented as such
  • Added the AutoCloseable interface to TaggedImagePointer

Also some further explanation on this:

(It is not clear to me what the advantage of the contiguous buffer is. You end up using freeRegions_, which is complicated and also has the danger of becoming fragmented (this could be seen as reinventing a simplistic memory allocator).

Note that the contiguous buffer is memory mapped, so it's not the same as a regular contiguous allocation (which I tried first and was very slow).

The combination of a contiguous memory mapping + slot management system (free regions, etc.) is efficient because you never have to create new heap objects. The circular buffer is much slower to initialize (see graph above), especially for small image sizes, because it pre-allocates many frameBuffers to hold its images. I'm not sure how the new buffer could stay flexible across different image sizes without suffering this pre-allocation penalty, other than with the current strategy of allocating one big block.
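
As a rough illustration of the slot management idea, here is a first-fit free-region list over a single pre-reserved block. It is a sketch under assumptions, not the DataBuffer implementation: FreeRegionList, kNoSpace, and the first-fit policy are all hypothetical, and it tracks offsets only, with no memory mapping and no locking. std::map is used for brevity even though it allocates nodes.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <map>

// Sentinel returned when no free region is large enough (hypothetical).
constexpr std::size_t kNoSpace = static_cast<std::size_t>(-1);

// Hypothetical first-fit allocator over one block of `capacity` bytes.
class FreeRegionList {
public:
    explicit FreeRegionList(std::size_t capacity) { free_[0] = capacity; }

    // Returns the offset of a region of `n` bytes, or kNoSpace if none fits.
    std::size_t Allocate(std::size_t n) {
        assert(n > 0);
        for (auto it = free_.begin(); it != free_.end(); ++it) {
            if (it->second < n) continue;
            const std::size_t offset = it->first;
            const std::size_t remaining = it->second - n;
            free_.erase(it);
            if (remaining > 0) free_[offset + n] = remaining;  // keep the tail free
            return offset;
        }
        return kNoSpace;
    }

    // Returns a region to the free list, merging with adjacent free neighbours.
    void Free(std::size_t offset, std::size_t n) {
        auto next = free_.lower_bound(offset);
        if (next != free_.begin()) {
            auto prev = std::prev(next);
            if (prev->first + prev->second == offset) {  // touches previous region
                offset = prev->first;
                n += prev->second;
                free_.erase(prev);
            }
        }
        if (next != free_.end() && offset + n == next->first) {  // touches next region
            n += next->second;
            free_.erase(next);
        }
        free_[offset] = n;
    }

    std::size_t NumFreeRegions() const { return free_.size(); }

private:
    std::map<std::size_t, std::size_t> free_;  // offset -> length
};
```

Allocating three 30-byte regions from a 100-byte block and then freeing the first and third leaves 70 bytes free but no region large enough for a 50-byte request, which is the fragmentation case mentioned earlier; freeing the middle region merges everything back into one.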

@yuechuanlin-cw

@henrypinkard This is an amazing alternative! Is there a compiled Micro-Manager version that implements this new image buffer, or does it have to be compiled from source? Thanks!

@henrypinkard
Contributor Author

No, you have to compile the core, the core wrap (either mmcorej or pymmcore) and the new device adapters from source.

Note that there's not yet support for the new features in AcqEngJ. So you may want to compile MMCoreJ and its Jar wrapper, then modify AcqEngJ to have it make use of the generic data handling capabilities. You can test AcqEngJ through the Acquisition class in pycro-manager, or from the MM desktop application (you have to enable it in tools-options)

@yuechuanlin-cw

No, you have to compile the core, the core wrap (either mmcorej or pymmcore) and the new device adapters from source.

Note that there's not yet support for the new features in AcqEngJ. So you may want to compile MMCoreJ and its Jar wrapper, then modify AcqEngJ to have it make use of the generic data handling capabilities. You can test AcqEngJ through the Acquisition class in pycro-manager, or from the MM desktop application (you have to enable it in tools-options)

I managed to compile the core wrap and device adapter. However, even without enabling the new image buffer, Micro-Manager doesn't work in Live or MDA mode; only Snap works. I assume that if the new buffer is not enabled, it should behave like the normal Micro-Manager, right?

@henrypinkard
Contributor Author

Yes. Maybe try downloading the nightly build from when this PR was opened and using that as a starting point?

@yuechuanlin-cw

Yes. Maybe try downloading the nightly build from when this PR was opened and using that as a starting point?

I tried downloading a nightly build from before Feb 17. Unfortunately, it didn't work. Snap always works fine, but when Live is on, it gets stuck. Also, the Sequence Buffer Monitor seems to fill up and then get stuck. I am not sure what happened. The core log also showed no output.

@henrypinkard
Contributor Author

Here's my full install with the Core and demo camera built from source, which works with the Demo camera:

https://drive.google.com/file/d/11wLbqtzeYJAIbQjsw2-Q9slIM5fXGi7w/view?usp=sharing

@yuechuanlin-cw

Here's my full install with the Core and demo camera built from source, which works with the Demo camera:

https://drive.google.com/file/d/11wLbqtzeYJAIbQjsw2-Q9slIM5fXGi7w/view?usp=sharing

Thank you so much, Henry! I will work on it and see how it goes.
