New image buffer #570

henrypinkard · 2025-02-22T02:16:42Z

The V2 buffer provides thread-safe, generic data storage with improved performance and cleaner abstractions.

Before merging

Decide on default metadata handling and API for turning it on/off
Modify the pymmcore SWIG wrapper to expose the new pointer-based image handling
Bump Core and MMCoreJ versions appropriately

Design

Two core components:

DataBuffer: Thread-safe generic storage replacing CircularBuffer
BufferManager: Unified interface managing both legacy and new implementations

Key features of the new buffer system:

Thread-safe read/write access
Support for generic data types beyond just images
Support for various data types simultaneously (e.g. images of different sizes, pixel types, etc.)
Zero copy writing into the buffer by giving devices pointers into it
Zero copy management at the application layer through pointer manipulation

It can be enabled with:

core.enableV2Buffer(true)

Performance

As a drop-in replacement for the circular buffer (i.e. copying the data the same number of times, but allowing for arbitrary sizes and data types), the new buffer gives equal or better performance:

In sequence acquisitions:

In continuous sequence acquisitions (live mode):

It's significantly faster to allocate

Additionally, it has two key features that will enable much higher performance code:

The application layer (e.g. Java, Python) can access data/metadata via pointers, avoiding direct copies. (In a quick test this seems to give a 2x speed improvement for reading out 2048x2048 images.)
Device adapters can avoid an extra copy by requesting a slot to write data into
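
To make the second point concrete, here is a minimal C++ sketch contrasting the two write paths. `SketchBuffer`, `insertCopy`, `acquireWriteSlot`, and `produceFrame` are hypothetical names chosen for illustration, not the PR's API, and the sketch leaves out the locking and slot release that a real buffer needs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a data buffer; NOT the PR's DataBuffer class.
class SketchBuffer {
public:
    explicit SketchBuffer(std::size_t capacity)
        : storage_(capacity), used_(0), copies_(0) {}

    // Copy path: the device fills its own memory, then the buffer copies it in.
    bool insertCopy(const unsigned char* src, std::size_t n) {
        unsigned char* dst = reserve(n);
        if (dst == nullptr) return false;
        std::memcpy(dst, src, n);
        ++copies_;
        return true;
    }

    // Slot path: hand the device a pointer into the buffer so it can
    // write the frame in place, with no intermediate copy.
    unsigned char* acquireWriteSlot(std::size_t n) { return reserve(n); }

    std::size_t bytesUsed() const { return used_; }
    std::size_t copiesMade() const { return copies_; }
    const unsigned char* data() const { return storage_.data(); }

private:
    // Bump-allocate n bytes, or return nullptr if they do not fit.
    unsigned char* reserve(std::size_t n) {
        if (n > storage_.size() - used_) return nullptr;
        unsigned char* p = storage_.data() + used_;
        used_ += n;
        return p;
    }

    std::vector<unsigned char> storage_;
    std::size_t used_;
    std::size_t copies_;
};

// Simulated device: writes a frame of n bytes straight into dst.
inline void produceFrame(unsigned char* dst, std::size_t n, unsigned char value) {
    std::memset(dst, value, n);
}
```

With the copy path the frame exists twice (once in device memory, once in the buffer); with the slot path the device's output lands in the buffer directly.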

Testing

I've written and validated the new buffer and circular buffer against many new tests here. (FYI these live in mmpycorex so they can easily test both MMCoreJ and pymmcore)

It also passes all the pycromanager acquisition tests, which test the various functionalities of the acquisition engine

Metadata

In conjunction with these changes, it made sense to standardize the metadata added to images. This was previously split amongst several places, including the SWIG wrapper, the core, the core callback, and the device code, making it hard to keep track of and maintain. Some of it was generated at the time of image acquisition, and some of it was generated at the time of image retrieval.

It has now all been consolidated into void CMMCore::addCameraMetadata, and the same metadata is added to all images whether snapped or passed through a buffer (with the small exception of some multi-camera device adapter-specific tags).

Testing reveals there's a substantial performance cost to adding so much metadata to all images:

Previously, much of this cost was incurred when reading images back out of the buffer. With the new changes, it is incurred at the time of insertion. However, I think it makes much more sense that this metadata is added at insertion time, because that's when it's most likely to be in sync with the actual state of the hardware.

Since this consolidation takes place outside the BufferManager, it also affects the circular buffer and will change behavior even if the v2 buffer is disabled. We need to figure out what should be enabled here. It's unclear (to me) what higher level code depends on what tags, but including the union of all of them by default will substantially hurt performance. I also have just a temporary function in the core API for controlling which metadata to add, which should perhaps be replaced with something more permanent.
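
A later commit in this PR is titled "change to metadata bitmask". As a rough illustration of how a bitmask could select which metadata categories are added at insertion time, here is a minimal sketch. The category names, their values, `FrameInfo`, `buildMetadata`, and all tag keys other than "Width" are assumptions made for this example and are not taken from the PR.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical category flags; the PR's actual names and values may differ.
enum MetadataCategory : std::uint32_t {
    kMetaNone       = 0,
    kMetaImageShape = 1u << 0,  // width and height
    kMetaTiming     = 1u << 1,  // elapsed time
    kMetaCamera     = 1u << 2   // camera label
};

// Hypothetical per-frame information available at insertion time.
struct FrameInfo {
    unsigned width;
    unsigned height;
    double elapsedMs;
    std::string cameraLabel;
};

// Add only the tags whose category bit is set in mask, so that
// disabled categories cost nothing at insertion time.
inline std::map<std::string, std::string>
buildMetadata(const FrameInfo& f, std::uint32_t mask) {
    std::map<std::string, std::string> tags;
    if (mask & kMetaImageShape) {
        tags["Width"] = std::to_string(f.width);
        tags["Height"] = std::to_string(f.height);
    }
    if (mask & kMetaTiming) {
        tags["ElapsedTime-ms"] = std::to_string(f.elapsedMs);
    }
    if (mask & kMetaCamera) {
        tags["Camera"] = f.cameraLabel;
    }
    return tags;
}
```

Passing `kMetaNone` skips all tag generation, while OR-ing flags together enables several categories at once.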

Multi-camera

While it is possible to use the v2 buffer with multi-camera devices, its flexibility makes it a more general solution (e.g. it supports different image sizes, types, etc.) than the multi-camera device adapter. In my opinion the multi-camera adapter should therefore be deprecated, and application code that relies on it updated to use the v2 buffer.

One addition here is the getLastTaggedImageFromDevicePointer("cameraLabel"), which enables you to get the last image from a specific camera, rather than having to search backwards through the most recent images and read their metadata.

A step towards a single route for all data

The pointer-based API gives a good opportunity to start moving towards a single route for all data, rather than a separate route for snap and sequences. I don't think it's possible to fully do this without changing how cameras handle data for snaps, but in the meantime the GetImagePointer function now copies the snap buffer in camera adapters into the v2 buffer, returning a pointer to it. This should be faster than copying into the application memory space because it can be multithreaded, and still allows the pointer-based handling of the data from the application layer.

Pointer based image handling

You get these through methods like getLastImagePointer(), which return a TaggedImagePointer object. This object is a wrapper around the TaggedImage object, but it will not load the pixels until you call getPixels(). If you never want to use the pixels, you can call release(), or just use the metadata without pixels, like:

TaggedImagePointer tip = core.getLastImagePointer();
// This works just like a regular JSONObject, but it won't load
// the metadata until needed
tip.tags.get("Width");

henrypinkard · 2025-02-28T18:21:27Z

Sounds good. As for System State Cache, there currently is a mechanism to switch that off if desired (MMCoreJ.i::includeSystemStateCache_). I am all for cleaning that up, but we will need a way to continue switching that off also when using V1 when needed (hoping to start V2 soon on the Java side, but expecting that will take some time).

I moved that method to the core and out of the SWIG layer, but will keep it indefinitely for backwards compatibility

henrypinkard · 2025-02-28T19:34:15Z

Added system state cache to summary metadata: micro-manager/AcqEngJ#127

henrypinkard · 2025-02-28T20:30:56Z

Okay I think I've addressed everything except for the two remaining unresolved comments above.

I think what to do about acquiring write slots can be addressed in a future PR, but it would be good to figure out what the eventual strategy will be.

I noticed that a few new functions (AcquireImageWriteSlot et al.) are added to CoreCallback but not to its base (interface) class MM::Core. @henrypinkard Would I be right in thinking that these are there to show how the transfer of images from the camera to MMCore can be made more efficient, but are kept hidden from device adapters for the time being?

This was indeed an accidental omission. I've added them to the interface, but they are commented out for now.

I would agree with a cautious approach here because it would be bad if we have cameras that only work with V1 or V2. It would also be bad if every camera that supports WriteSlot had to add conditional branches to check if V2 is enabled. Probably the best way is to say that cameras must use either InsertImage or WriteSlot, but not both, and then make WriteSlot just work with V1 as well, just without the benefit of eliminating a copy. (As far as I'm concerned, it's fine if this PR doesn't yet expose WriteSlot to devices.)

NewDataBuffer (formerly V2) maintains backward compatibility with InsertImage, so existing camera adapters will continue to function. Users who prefer to avoid pointer management can still use InsertImage to copy data into the NewDataBuffer.
However, AcquireWriteSlot is incompatible with CircularBuffer for several reasons:

The CircularBuffer isn't truly circular - it fully empties when filled in continuous sequence mode, which would invalidate pointers held by device code.
CircularBuffer lacks slot locking mechanisms. Adding this functionality would essentially transform it into something similar to NewDataBuffer while risking new bugs.

I considered creating an intermediate compatibility buffer that would temporarily store data when a camera acquires a write slot with CircularBuffer enabled. However, this approach would be overly complex and effectively just place a NewDataBuffer instance between the CircularBuffer and device adapters.

I think the path forward is to enable write slot acquisition in a future PR after additional testing, with the requirement that camera device adapters using the acquire/release slot feature must use the new buffer. Since the performance testing shows NewDataBuffer is more performant than CircularBuffer while covering all the same use cases, it seems to me the application should be migrated to using it as the default option as soon as we are confident in its robustness.

marktsuchida · 2025-03-03T18:36:45Z

I'm finally getting around to looking at the details of the image retrieval API for the V2 buffer. Please correct me if I'm misunderstanding anything below.

Using the new mechanism, the app (let's look at Java for now) calls popNextDataPointer(), which, after the MMCoreJ wrapping into popNextTaggedImagePointer(), returns TaggedImagePointer. That Java class stores a (wrapped) BufferDataPointer -- so the app can indefinitely hold on to the pointer. When the app finally calls TaggedImagePointer.release(), this goes through BufferDataPointer::release(), which calls the current DataBuffer's ReleaseDataReadPointer() via the BufferManager. But in the meantime, the DataBuffer might have been deleted and replaced, due to the buffer size changing (or due to switching V2 off and on again). In such cases, the release() will crash or corrupt memory. (Also, accessing the image data after the buffer has been reallocated will crash or return incorrect data.)

On the other hand, if the app obtains a TaggedImagePointer and then lets go of it without calling release(), the slots pointed to remain allocated indefinitely and there is no way to reclaim the memory (other than resizing the buffer). This is perhaps less critical because there is no correct way to use this without explicitly calling release().

It's also generally hard to see what all the problems are that could arise from the lifetime management (or lack thereof) of buffer slots -- a problem in itself. I think the only safe way to deal with this is to explicitly share ownership of the buffer slots (including the memory backing them) between MMCore and the app. This could be done by managing the slots with std::shared_ptr (which performs automatic reference counting); the Java side could hold onto a copy of the shared_ptr until the app explicitly release()s the slot, so that the image data is available even if the MMCore buffer goes away.

After having written the above, I realized that you don't have separate buffers for each slot (like the V1 buffer) but rather one big, contiguous buffer. You can still have shared_ptrs that point to each block of that buffer while sharing ownership of the whole buffer (ask me how if not clear), though that would mean that the large buffer will be kept alive until every single pointer retained by the app is let go of.
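
For readers unfamiliar with the technique referred to here, the following is a minimal sketch using the aliasing constructor of std::shared_ptr. `Block` and `makeSlotPointer` are hypothetical names, and a std::vector stands in for the memory-mapped region; a real implementation would more likely attach a custom deleter that unmaps the memory.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Stand-in for the one big contiguous buffer.
using Block = std::vector<unsigned char>;

// Return a pointer to a slot inside block that shares ownership of the
// whole block: the block stays alive as long as any slot pointer exists.
inline std::shared_ptr<unsigned char>
makeSlotPointer(const std::shared_ptr<Block>& block, std::size_t offset) {
    assert(offset < block->size());
    // Aliasing constructor: shares block's control block (and therefore its
    // lifetime), but get() returns the address of the slot.
    return std::shared_ptr<unsigned char>(block, block->data() + offset);
}
```

Dropping the owner's own `shared_ptr<Block>` while a slot pointer is still held leaves the memory valid; it is freed only once the last slot pointer is released, which is the "large buffer kept alive" trade-off described above.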

(It is not clear to me what the advantage of the contiguous buffer is. You end up using freeRegions_, which is complicated and also has the danger of becoming fragmented (this could be seen as reinventing a simplistic memory allocator). Also, the single contiguous buffer strategy will never work on 32-bit systems, although we maybe don't care about that.)

As for the Java API for things that need to be explicitly "released" by user code, it should conform to the AutoCloseable interface so that it can be used with the try-with-resources statement (rough equivalent to Python's with statement).

(Ideally we also automatically release the shared_ptr when the Java object gets garbage collected, which would require java.lang.ref.Cleaner, which would require us to update to Java 9+ first, but that we can do. (Please don't use finalize().) But this could be added later, I think. Having this means that people won't need to restart the program after running incorrect code that fails to release the buffer slots, but it should not be necessary for correct code.)

I'm afraid I cannot recommend merging this until these buffer lifetime issues are addressed. If possible, it might be productive to split this PR into two: one that cleans up the metadata handling without introducing the V2 buffer, and one that purely introduces the V2 buffer. That would speed up reviewing the changes to the metadata handling (which I still need to take another look at -- I'm mostly happy with it but it's easier to make 100% sure there are no unknown changes in behavior than to later troubleshoot the existing (sometimes hacky) application code that might depend on exact behavior).

MMCore/MMCore.h

henrypinkard · 2025-03-03T20:42:55Z

I'm finally getting around to looking at the details of the image retrieval API for the V2 buffer. Please correct me if I'm misunderstanding anything below.

Thanks for taking a look. Your understanding is mostly correct -- but in a couple places I think you've misunderstood, and in fact the behavior you're advocating for is already implemented.

Using the new mechanism, the app (let's look at Java for now) calls popNextDataPointer(), which, after the MMCoreJ wrapping into popNextTaggedImagePointer(), returns TaggedImagePointer. That Java class stores a (wrapped) BufferDataPointer -- so the app can indefinitely hold on to the pointer. When the app finally calls TaggedImagePointer.release(), this goes through BufferDataPointer::release()

Correct

which calls the current DataBuffer's ReleaseDataReadPointer() via the BufferManager. But in the meantime, the DataBuffer might have been deleted and replaced, due to the buffer size changing (or due to switching V2 off and on again). In such cases, the release() will crash or corrupt memory. (Also, accessing the image data after the buffer has been reallocated will crash or return incorrect data.)

The clearing/deletion of the v2 buffer is handled differently than the v1 circular buffer for exactly this reason. The v1 buffer is cleared/reallocated every time before starting a sequence acquisition. The v2 buffer is not. For v2, we've now split this into two separate operations: clearing and resetting. Trying to clear when application code holds outstanding slots will throw an error. Reset is the more dangerous operation that has the problems you mention, which is why it's not simply slotted in everywhere that the old circular buffer used to be cleared. For example, for the case of changing the buffer size, we have:

void BufferManager::ReallocateBuffer(unsigned int memorySizeMB) {
   if (useNewDataBuffer_.load()) {
      int numOutstanding = newDataBuffer_->NumOutstandingSlots();   
      if (numOutstanding > 0) {
         throw CMMError("Cannot reallocate NewDataBuffer: " + std::to_string(numOutstanding) + " outstanding active slot(s) detected.");
      }
      delete newDataBuffer_;
      newDataBuffer_ = new DataBuffer(memorySizeMB);
   } else {
      delete circBuffer_;
      circBuffer_ = new CircularBuffer(memorySizeMB);
   }
}

On the other hand, if the app obtains a TaggedImagePointer and then lets go of it without calling release(), the slots pointed to remain allocated indefinitely and there is no way to reclaim the memory (other than resizing the buffer). This is perhaps less critical because there is no correct way to use this without explicitly calling release().

You'd have to call reset(). Resizing the buffer would fail in this situation. But yes, there is no correct way to handle pointers without explicit calls to release.

It's also generally hard to see what all the problems are that could arise from the lifetime management (or lack thereof) of buffer slots -- a problem in itself. I think the only safe way to deal with this is to explicitly share ownership of the buffer slots (including the memory backing them) between MMCore and the app. This could be done by managing the slots with std::shared_ptr (which performs automatic reference counting); the Java side could hold onto a copy of the shared_ptr until the app explicitly release()s the slot, so that the image data is available even if the MMCore buffer goes away.

This is essentially what already happens (though not with shared_ptrs).

(It is not clear to me what the advantage of the contiguous buffer is. You end up using freeRegions_, which is complicated and also has the danger of becoming fragmented (this could be seen as reinventing a simplistic memory allocator).

Empirical testing indicated that the current mechanism of memory mapping a large buffer had the best performance.

True about the fragmentation, though I don't think this is so likely to happen in practice (I can provide more detail if needed). In any case, this is an internal implementation detail that can always be changed in a future PR without breaking backwards compatibility.

Also, the single contiguous buffer strategy will never work on 32-bit systems, although we maybe don't care about that.)

In my opinion, 32-bit support should be retired.

As for the Java API for things that need to be explicitly "released" by user code, it should conform to the AutoCloseable interface so that it can be used with the try-with-resources statement (rough equivalent to Python's with statement).

(Ideally we also automatically release the shared_ptr when the Java object gets garbage collected, which would require java.lang.ref.Cleaner, which would require us to update to Java 9+ first, but that we can do. (Please don't use finalize().) But this could be added later, I think. Having this means that people won't need to restart the program after running incorrect code that fails to release the buffer slots, but it should not be necessary for correct code.)

I went back and forth on how forgiving the design should be about forgetting to call release(), but when it comes down to it, that really is the only correct way to use this. I think a lot of higher level code may pass images around, so putting it in a single try block may not always be feasible. But I can certainly add support for AutoCloseable.

I would say it's better to not upgrade to Java 9 first in case that brings other unforeseen issues.

If possible, it might be productive to split this PR into two: one that cleans up the metadata handling without introducing the V2 buffer, and one that purely introduces the V2 buffer. That would speed up reviewing the changes to the metadata handling (which I still need to take another look at -- I'm mostly happy with it but it's easier to make 100% sure there are no unknown changes in behavior than to later troubleshoot the existing (sometimes hacky) application code that might depend on exact behavior).

I understand the motivation for this, but unfortunately, I think this would be very challenging. The metadata generation was tangled up in many other functions. I tried to do this very carefully to avoid unexpected changes in higher level application code. It will at least be straightforward to implement fixes once identified since it is all centralized now. We could consider including legacy metadata by default (even though it would give a performance hit) on the v1 buffer so that unexpected things don't break.

Update: I split out changes to the circularbuffer behavior into #588

henrypinkard · 2025-03-04T19:14:01Z

@marktsuchida I've made changes based on our discussion yesterday:

Clarified and added more checks for calling clear() on the new buffer (which throws if there are unreleased pointers) and forceReset(), which is the more dangerous option and is documented as such
Added the AutoCloseable interface to TaggedImagePointer
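
As an illustration of the clear() versus forceReset() distinction, here is a minimal single-threaded sketch. `SlotTracker` and its members are hypothetical names, not the PR's classes, and a real buffer would additionally need synchronization.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical bookkeeping for slots handed out to application code.
class SlotTracker {
public:
    // Application code obtained a pointer to a slot.
    void acquire() { ++outstanding_; }

    // Application code released a pointer it obtained earlier.
    void release() {
        if (outstanding_ == 0) {
            throw std::logic_error("release() without matching acquire()");
        }
        --outstanding_;
    }

    // Safe operation: refuses to proceed while any pointer is unreleased.
    void clear() {
        if (outstanding_ > 0) {
            throw std::runtime_error("cannot clear: outstanding slots");
        }
        ++generation_;
    }

    // Dangerous operation: proceeds regardless, so any pointer still held
    // by application code now refers to invalid data.
    void forceReset() {
        outstanding_ = 0;
        ++generation_;
    }

    std::size_t outstanding() const { return outstanding_; }
    unsigned generation() const { return generation_; }

private:
    std::size_t outstanding_ = 0;
    unsigned generation_ = 0;  // incremented each time the contents are discarded
};
```

The generation counter is only there to make the effect observable in a test; it stands in for actually discarding the buffer contents.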

Also some further explanation on this:

(It is not clear to me what the advantage of the contiguous buffer is. You end up using freeRegions_, which is complicated and also has the danger of becoming fragmented (this could be seen as reinventing a simplistic memory allocator).

Note that the contiguous buffer is memory mapped, so it's not the same as a regular contiguous allocation (which I tried first and was very slow).

The combination of a contiguous memory mapping + slot management system (free regions, etc.) is efficient because you never have to create new heap objects. The circular buffer is much slower to initialize (see graph above), especially for small image sizes, because it pre-allocates many frameBuffers to hold its images. I'm not sure how the new buffer could maintain flexibility to different image sizes yet not suffer this pre-allocation penalty without the current strategy of allocating a big block.
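
The PR's freeRegions_ code is not shown in this thread, so the following is only a generic sketch of the idea: a first-fit free-region list over one contiguous block, with merging of adjacent regions on release. `FreeRegionList` and `kNoSpace` are hypothetical names, and offsets stand in for pointers into the mapped memory.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <map>

// Returned by allocate() when no free region is large enough.
constexpr std::size_t kNoSpace = static_cast<std::size_t>(-1);

class FreeRegionList {
public:
    explicit FreeRegionList(std::size_t capacity) {
        if (capacity > 0) free_[0] = capacity;
    }

    // First-fit: carve n bytes from the lowest-offset region that fits.
    std::size_t allocate(std::size_t n) {
        if (n == 0) return kNoSpace;
        for (auto it = free_.begin(); it != free_.end(); ++it) {
            if (it->second >= n) {
                const std::size_t offset = it->first;
                const std::size_t remaining = it->second - n;
                free_.erase(it);
                if (remaining > 0) free_[offset + n] = remaining;
                return offset;
            }
        }
        return kNoSpace;
    }

    // Return a region and merge it with free neighbours on either side.
    void release(std::size_t offset, std::size_t n) {
        auto next = free_.lower_bound(offset);
        if (next != free_.begin()) {
            auto prev = std::prev(next);
            if (prev->first + prev->second == offset) {
                offset = prev->first;
                n += prev->second;
                free_.erase(prev);
            }
        }
        if (next != free_.end() && offset + n == next->first) {
            n += next->second;
            free_.erase(next);
        }
        free_[offset] = n;
    }

    std::size_t regionCount() const { return free_.size(); }

private:
    std::map<std::size_t, std::size_t> free_;  // start offset -> length
};
```

The same sketch also shows the fragmentation concern raised in review: if slots are released out of order, the total free space can exceed the largest single free region, so a large request can fail until the slot in between is released and the regions merge.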

yuechuanlin-cw · 2025-04-01T14:02:38Z

@henrypinkard This is an amazing alternative! Is there a compiled Micro-Manager version that implements this new image buffer, or does it have to be compiled from source? Thanks!

henrypinkard · 2025-04-04T16:47:10Z

No, you have to compile the core, the core wrap (either mmcorej or pymmcore) and the new device adapters from source.

Note that there's not yet support for the new features in AcqEngJ. So you may want to compile MMCoreJ and its Jar wrapper, then modify AcqEngJ to have it make use of the generic data handling capabilities. You can test AcqEngJ through the Acquisition class in pycro-manager, or from the MM desktop application (you have to enable it in tools-options)

yuechuanlin-cw · 2025-04-11T19:01:04Z

No, you have to compile the core, the core wrap (either mmcorej or pymmcore) and the new device adapters from source.

Note that there's not yet support for the new features in AcqEngJ. So you may want to compile MMCoreJ and its Jar wrapper, then modify AcqEngJ to have it make use of the generic data handling capabilities. You can test AcqEngJ through the Acquisition class in pycro-manager, or from the MM desktop application (you have to enable it in tools-options)

I managed to compile the core wrapper and device adapters. However, even without enabling the new image buffer, Micro-Manager doesn't work in Live or MDA mode; it only works in Snap mode. I assume that if the new buffer is not enabled, it should function like normal Micro-Manager, right?

henrypinkard · 2025-04-11T19:47:23Z

Yes. Maybe try downloading the nightly build from when this PR was opened and using that as a starting point?

yuechuanlin-cw · 2025-04-11T20:35:58Z

Yes. Maybe try downloading the nightly build from when this PR was opened and using that as a starting point?

I tried downloading a nightly build from before Feb 17. Unfortunately, it didn't work. Snap always works fine, but when Live is on, it gets stuck. Also, the Sequence Buffer Monitor seems to pile up and then get stuck. I am not sure what happened. The core log also gave no output.

henrypinkard · 2025-04-14T17:41:35Z

Here's my full install with the Core and demo camera built from source, which works with the Demo camera:

https://drive.google.com/file/d/11wLbqtzeYJAIbQjsw2-Q9slIM5fXGi7w/view?usp=sharing

yuechuanlin-cw · 2025-04-18T18:14:18Z

Here's my full install with the Core and demo camera built from source, which works with the Demo camera:

https://drive.google.com/file/d/11wLbqtzeYJAIbQjsw2-Q9slIM5fXGi7w/view?usp=sharing

Thank you so much, Henry! I will work on it and see how it goes.

henrypinkard added 30 commits January 31, 2025 14:06

initial data buffer commit

27e4c8c

Factored out circularBuffer behind new API

f886b12

finished concurrency for buffer slots

ebbc1bd

most functionality in v2 buffer implemented

94125ee

added metadata to buffer

8efc90d

fix compiler warnings

0fd0823

cleanup

06fe9ae

cleanup

e468f0f

remove comment

93860a9

Merge branch 'data_buffer_new' of https://github.com/henrypinkard/mmCoreAndDevices into data_buffer_new

b0aa28e

modify SWIG to read image size dynamically for each image

d87cc2e

fixed bugs, implementation seems to work just like circularbuffer

c7883ee

add parallel copying and memory mapping

c29e114

switch to simpler mutexs

b7a02a0

fix docs and clean up

093fb1d

clean up and remove headers from buffer

f37b964

restored delted functions, refactor, and recycle buffer slots

8564b15

lots of bug fixes and refactoring

9887ae9

small perf improvements and bug fixes

23e1cc4

expose direct getting of pointers from corecallback for writing into v2 buffer

37ab2cb

fix bug

8697a85

fix bug getting image without metadta and standardize internal pointer types

6aae6b6

fix bit depth fn, which is not stored in v2 buffer, and map image pointers to java longs

0f1b645

remove unused typemap

567cb2b

add ability to manipulate data pointers inside v2 buffer. Also make sure metadata is accurate for v2 buffer images

a5a69b7

fix bugs with snap. pointer-based taggedimages WIP

1408c8e

Refactor to Metadata to only maintain essential metadata in buffer

2847d1c

refactor and simplify to make safer pointer handling and less SWIG compelxity

893642f

Working bufferdatapointer class and snapimage

6b2fd6d

major refactor to make v2 buffer and buffer manager data type agnostic; Also centralized Metadata generation into the core from SWIG, core callback, device base

785e995

remove errant bracket

3814ea9

henrypinkard added 3 commits February 28, 2025 11:04

fine-grained handling metadata categories and correct default behavior

bc1a9f1

fix metadata keyword

6cc217e

fix bugs and rename from v2 to newdatabuffer

b15d252

henrypinkard added 3 commits February 28, 2025 11:51

allow retrieval of generic non image data

a9f20fa

comment out acquirewriteslot mechanism for now

8a798e2

add commented out functions to MMCore interface

bc3ea3b

marktsuchida reviewed Mar 3, 2025


henrypinkard added 4 commits March 4, 2025 08:57

change to metadata bitmask

f5dec94

clarified API for force reseting vs clearing and added safety checks

b2a3b69

add method for adding generic dat

484ca34

add autoclosable

8cd1e38

henrypinkard mentioned this pull request Mar 8, 2025

Allow camera device adapters to call InsertImage on snap #592

Open

clarify deprecations

e877874

henrypinkard mentioned this pull request Mar 12, 2025

New camera API #593

Open

6 tasks

marktsuchida mentioned this pull request Jul 11, 2025

Resizing the Sequence Buffer can take a long time micro-manager/micro-manager#2167

Open

marktsuchida mentioned this pull request Aug 14, 2025

Review/merge order for major MMDevice/MMCore PRs #713

Open

5 tasks
