Skip to content

Conversation

go2scope
Copy link

@go2scope go2scope commented Dec 11, 2024

Overview

Introducing the new MM::StorageDevice. There are two main changes to the micro-manger back end: a new device type (Storage) and the extension of the MMCoreAPI. Both changes are lateral because they extend capabilities without affecting existing functionality. There are no new dependencies for the code base. Any existing client code, including the Micro-manager front end, can safely ignore this extension.

The new feature enables MMCore to save datasets on the back end using a storage device adapter. Developers can add support for various formats by creating device adapters like they would add support for a new camera. There are two significant advantages in utilizing this architecture:

The most important one is that the interface between the microscopy application and the file storage is defined by the MMCore API and allows for designing generic, storage/format-independent acquisition engines and acquisition scripts. To change the file format, we only change the device configuration, but the acquisition code remains the same. Or mainly the same, except for some specific properties - pretty much as what happens when we change the camera. Adding support for new file formats or using better implementations does not involve changing the existing code in any significant way.

The second advantage is that by embedding the storage implementation in the MMCore, we can achieve a higher efficiency (data rate), as images do not have to cross the MMCore API boundary.

Status

We plan on adding more documentation and some examples over the coming days and weeks. This PR is work in progress and only a draft to get some early feedback. Implementation is currently functional, but incomplete. Also, the micro-manager build for the new version may not be completely compatible with the main branch. This will be corrected in the next update.

To go over the new API, you can see MMCore.h and MMDevice.h. There are currently two device adapters: BigTiff (with chunking) and Zarr (based on https://github.com/acquire-project/acquire-driver-zarr), both located in the "go2scope" device library. The bigTiff adapter is fully functional, while the Zarr currently supports only writing, not reading.

We are still working on integrating the now MMCore back-end storage into the Micro-manager application (Java). It works but is still not ready for the PR.

Credits

Nathan Clack and the Chan Zuckerberg Initiative team provided material, advisory, and moral support for this effort. Nico Stuurman and Talley Lambert provided early feedback and help. Milos Jovanovic built the fully functional and highly efficient BigTiff adapter based on the early specification of the interface.

@go2scope go2scope marked this pull request as draft December 11, 2024 03:03
@marktsuchida
Copy link
Member

Super excited about this coming to MMCore and sorry for the lack of reaction until now!

Would it be okay (before getting into actual code review) if I rebased all your commits to remove the changes that are not direclty part of StorageDevice (mostly the switch of Visual Studio version) and force-push? I'm pretty sure I can do this without losing the branching and merging in your commit history.

This is both (almost certainly) necessary for eventual merging, and should also make the diff a lot easier to view.

(Am right that the code doesn't actually need VS2022?)

(We do want to switch to Visual Studio 2022 very soon, but that really needs to be a separate pull request that is coordinated with build infrastructure updates. That can happen before or after this PR gets merged, depending on what's easiest/needed.)

@go2scope
Copy link
Author

go2scope commented Jan 30, 2025 via email

@tlambert03
Copy link
Contributor

was waiting to not be the first person :) ... but also wanted to express my enthusiasm and appreciation for the work being done here. this would be very exciting to have in some form 🚀

@nicost
Copy link
Member

nicost commented Feb 14, 2025

Just pushed a commit that reverts the Visual Studio platform toolset to v142 (VS 2019). Nenad, can you do a git pull before adding other commits? Also, make sure in Visual Studio to select "do not upgrade" if VS offers to upgrade the toolset. You may need to run the VS installer to install the v142 (VS2019) platform toolsets on your system.

Regretfully, there are still many project files flagged as changed, at first sight that seems to be due to differences in line ending, so nothing to be really worried about, but it still makes this PR unwieldy.

@nicost
Copy link
Member

nicost commented Feb 14, 2025

Just added Go2Scope to the solution and can build this. I copied acquire-zar.dll (b.t.w., can the acquire-zar code eventually be moved to the public 3rdparty? I assume this is all released under an open license?) to my Micro-Manager installation, but Go2Scope stays greyed out in the Hardware configuration wizard. I guess this is because it is a device type not recognized by my HCW? Happy to help with the Java code if that is needed to fully evaluate this PR. As expected, the UI works well with this Core cope.

@go2scope
Copy link
Author

go2scope commented Feb 14, 2025 via email

@marktsuchida
Copy link
Member

marktsuchida commented Feb 21, 2025

Before rebasing this branch to remove the VS2022 conversion and other issues with micromanager.sln, I have pushed the current state of the branch to mmstorage-draft-backup-20250220 in this repository. (This can be kept until this PR is safely merged.)

@marktsuchida
Copy link
Member

Now I have force-pushed a rebased branch to the PR branch (mmstorage-draft in the go2scope fork).

Briefly, the rebase

  • Removes the conversion of projects to VS2022 (and back to VS2019); the 2 new projects that were newly added are switched to VS2019 in an added commit
  • Removes the 2 merges from main, which only served to increase conflicts
  • Rebases the whole branch onto the current tip of main (which could be done without any conflicts in the code!)
  • Fixes a build error in MMCore.vcxproj
  • Removes micromanagerSD.slnf (I don't think this convenience file needs to be in the official repo, and it was causing conflicts)
Full details of the rebase

After studying (with gitk) the branches and merges within the PR, I found that

  • The changes to unrelated vcxproj files are done at the beginning and end of
    the branch, where the histories are linear, and can be simply left out during
    rebasing. (One additional commit, 'reverted to 142 runtime', in the middle,
    can also be left out.)

  • Changes to micromanager.sln are more complex, but all of the merge conflicts
    (and their incorrect resolutions) are due to merging main into the PR
    branch, which serves no useful purpose. So we can eliminate most problems by
    eliminating the 2 merge commits that merge from main. The end result will
    still have conflicts with main, but they should be simpler to resolve.

I created a new branch mmstorage-rebase equal to the PR branch
mmstorage-draft and ran:

git rebase -i --rebase-merges 4441b057 mmstorage-rebase

where 4441b05 is the commit on main from which the PR branch began.

The vast majority of the overwhelning amount of commits and branches are
actually branches that have already been merged into main (and pulled in
because of the merges from main into the PR branch).

Then I deleted the unnecessary lines:

  • pick 1b94904 upgrade to VS 2022, solution filter added for Storage Device development
  • merge -C 52a0c28 main-13 # Merge branch 'main' of github.com:go2scope/mmCoreAndDevicesSD
  • pick f448c83 reverted to 142 runtime
  • merge -C 0345b00 main-16 # Merge branch 'main' into mmstorage-draft
  • pick 70095d6 Core: revert 0VS platform toolset to v142.
  • pick addd0af Core: add Go2Scope to VS solution.

I then also deleted all the branches upstream of the deleted merge commits.
This was necessary because otherwise all those branches would also get rebased
onto 4441b05, causing conflicts in parts that we are not keeping.

The goal is to rewirte history such that the changes of VS versions and the
merges from main never happened.

I anticipated that there would still be conflicts during the rebase at 'zarr
dimension order corrected' (formerly 9f9e03e), because that commit also
included an attempted fix to micromanager.sln (and I could not remove the whole
commit because it also touches source code). This could be resolved by simply
removing the changes to micromanager.sln in this commit.

git checkout 2efcd502 -- micromanager.sln

There were also 2 conflicts involving changes to micromanagerSD.slnf, because
that file was created in the removed commit ('upgrade to VS2022, ...'). I take
it that this file is for your personal convenience only, so I resolved the
conflicts by removing the file so it no longer appears anywhere.

After the rebase, I made sure that no commits were missing (except for the
intentionally removed ones listed above), and also examined the diff between
mmstorage-draft and mmstorage-rebase to ensure that the only differences
are those unrelated to this PR (because I removed the merges from main).

After the first round of rebasing, which simplified the whole branch, I did a
second rebase only to edit the commit 'go2scope device introduced' to remove
the spurious (unrelated) changes to micromanager.sln.

In addition, I added a new commit that sets the PlatformToolset for the 2 new
projects introduced in this branch to v142 (VS2019 build tools), so that they
are consistent with all the other projects.

Then I tested the build and found that MMCore does not build, due to
(unintended?) changes to MMCore.vcxproj made in the commit 'static runtime
selected'. So I did a third rebase to remove those changes.

Finally, it (somewhat miraclulously) turned out that there were no code
conflicts with the current main (except in micromanager.sln due to projects
added in main), so I rebased the whole PR branch onto main:

git rebase --rebase-merges main

I reset mmstorage-draft to mmstorage-rebase and force-pushed it.

@marktsuchida
Copy link
Member

We might need to do another round of conflict resolution before merging, depending on the order of events (there's another pending PR that adds device types), but I think this is now in a state where we can review and test -- and I look forward to diving into the details over the next days!

BTW if you want to bring the rebased branch to your local clone, assuming that the remote origin points to the go2scope fork (check with git remote -v), you can use:

git fetch origin
git checkout mmstorage-draft
git reset --hard origin/mmstorage-draft

@marktsuchida
Copy link
Member

And don't worry about the failed checks -- I'll fix those before merging, but that's simplest to do just before merging.

Copy link
Member

@marktsuchida marktsuchida left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've so far looked at the MMDevice parts. I like the overall design!
Some suggestions and questions.

Comment on lines 1491 to 1492
virtual int Create(const char* path, const char* name, int numberOfDimensions, const int shape[],
MM::StorageDataType pixType, const char* meta, char* handle) = 0;
Copy link
Member

@marktsuchida marktsuchida Feb 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there an important reason to use a string handle? Wondering if it might be simpler to use a uint64_t handle if the value itself has no meaning.

EDIT: And (as we discussed today) letting MMCore generate the handle could make it easier to use multiple datasets, potentially from different storage devices.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, for the metadata parameter, I would suggest using a pointer and size (std::size_t) rather than a null-terminated string. This would not only eliminate a scan on the receiving side to determine the length, but would also allow metadata that contains internal NUL bytes (not that we'd recommend that).

This applies to all passing of metadata in and out, of course.

* \return Progress value (0-100), or -1 if idle or not implemented
* \note Assumes single operation execution at a time
*/
virtual int GetProgress(const char* handle) = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this intended to be called on a second thread while the first thread is making the call to perform an operation?

* \param handle Handle of the dataset to delete
* \return Status code indicating success or failure
*/
virtual int Delete(char* handle) = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this also effectively close the handle? Why does this take a handle rather than a path?

Comment on lines 1617 to 1618
virtual int AddImage(const char* handle, int sizeInBytes, unsigned char* pixels,
int coordinates[], int numCoordinates, const char* imageMeta) = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sizes should be std::size_t or std::uint64_t (for memory sizes and on-disk sizes, resp.).
I think coordinates should also be 64-bit, just to be safe. numCoordinates can be std::size_t.

Comment on lines 1661 to 1662
virtual const unsigned char* GetImage(const char* handle, int coordinates[],
int numCoordinates) = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this have an int return value so that it can indicate an error?

Also, how long is the returned pointer valid? If it is not until the handle is closed (which would be problematic), we probably need a "release" function for images (analogous to the one for metadata).

@marktsuchida marktsuchida mentioned this pull request Feb 22, 2025
3 tasks
Copy link
Member

@marktsuchida marktsuchida left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some more comments, primarily on the MMCore API.

Comment on lines +8177 to +8181
std::string CMMCore::getDeviceNameToOpenDataset(const char* path)
{
// TODO:
throw CMMError("Feature not supported", MMERR_GENERIC);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that more than one storage device could open the same file format, I don't know if this function can be well defined. Maybe better to either return a collection of storage devices, or provide instead bool isDatasetSupportedByStorage(storageLabel, path). The latter gives apps more freedom (they can find the first compatible or all, check in a preferred order, or just check for a particular storage).

FocusDirectionAwayFromSample,
};

enum StorageDataType
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be best to call this just PixelFormat, so we can (later) use it for other things, too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

("Format", not "type", because "type" may imply integer data type.)

StorageDataType_UNKNOWN = 0,
StorageDataType_GRAY8, // uint8
StorageDataType_GRAY16, // uint16
StorageDataType_RGB32 // RGBA, 8bits per color, A ignored defaults to 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The odd format that MMCore currently requires is BGRA32, not RGBA.

It's a good question whether storage devices should support that format, though, because usually we'd want to write RGB24 to disk. The enum could have constants for both.

(We should add support for cameras to simply send RGB24 images in the near future, after the current batch of PRs have been merged. That will help with direct streaming to disk from RGB cameras.)

Comment on lines +99 to +101
// Maximum length of serialized metadata strings
const int MaxMetadataLength = 200000;

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see this being used, and also don't think there should be an arbitrary limit to metadata length, so remove?

int StorageInstance::ConfigureDimension(const char* handle, int dimension, const char* name, const char* meaning)
{
RequireInitialized(__func__);

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The check to forbid configuring dimensions (1) on a read-only dataset and (2) on a dataset to which images have already been appended should probably be done here, so that that logic is not repeated in every storage adapter.

(Or are there cases when (2) is not disallowed?)

Same also for ConfigureCoordinates.

int StorageInstance::AppendImage(const char* handle, int sizeInBytes, unsigned char* pixels, const char* imageMeta)
{
RequireInitialized(__func__);
return GetImpl()->AppendImage(handle, sizeInBytes, pixels, imageMeta);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MMCore should throw an error if an attempt is made to add an image to a read-only dataset (again, so that each storage device doesn't have to repeat the logic and we get a consistent error message).

Same with AddImage (if keeping).

int StorageInstance::SetCustomMeta(const char* handle, const std::string& key, const std::string& meta)
{
RequireInitialized(__func__);
return GetImpl()->SetCustomMetadata(handle, key.c_str(), meta.c_str());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Read-only datasets do not allow setting custom metadata, correct?
If so, MMCore should check.

* \param handle - currently open dataset handle
* \param imageMeta - image metadata
*/
void CMMCore::appendNextToDataset(const char* handle, const std::vector<long>& coordinates, const char* imageMeta) throw (CMMError)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly name popNext[Image]AndAppendToDataset(), to be easily compared with the plain popNextImage() and appendImageToDataset()?


  • This won't work for Multi Camera. That can be addressed later (possibly with Henry's per-camera buffering).
  • Applications won't be able to store metadata values generated by cameras using this mechanism. This also can be addressed later if found necessary.

(Just noting the limitations so we are aware of them.)

* \param coordinates - array of coordinates, one for each dimension
* \return - image pixels
*/
STORAGEIMGOUT CMMCore::getImageFromDataset(const char* handle, const std::vector<long>& coordinates) throw (CMMError)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just pointing out that this API forces MMCoreJ and pymmcore to immediately make a copy of the image buffer, because the buffer can go away at any moment. (Strictly speaking, even that is not fully safe because the copy happens outside of the device module lock.)

It would be nice to add (in the future) a way to share ownership of the image buffer, so that it can be kept alive as (say) a NumPy array without making a copy. But this need not be part of the first iteration, because the focus is presumably on getting camera-to-storage streaming working. We might want to think about buffer lifetime in the device API, though (I already commented there).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants