Skip to content

Conversation

luissantosHCIT
Copy link

Context

The synchronous load method pulls too much metadata during the loading phase. This wastes resources such as bandwidth, server connections, and user patience. In my test case, I was downloading up to 29 MB of JSON metadata which could take up to 10 seconds. The vast majority of the metadata is not used at all when setting up the UI/Hanging Protocol.

We can have a snappy synchronous experience by requesting and making use of as much of the QIDO metadata as possible.
We can also skip downloading the full study metadata and simply download two slices worth of metadata per series. With a little IPP patching, the Viewer can still enable the 3D tools.

Breaking the process in smaller QIDO and WADO queries allows for parallelization of the request against the server thus minimizing time spent requesting data.

Fixes #5287

Note

I apologize for the number of changes but I saw opportunities to improve code quality while chasing a few defects prior to PR release. I took time to add more type definitions, reorganize some of the files, and testing both the lazy load and sync load features.

Also, note that this only takes care of the metadata portion itself but not the image load. I will tackle the image load optimization in a separate PR as an opt in setting. For now, the UI should update pretty fast although the thumbnail retrieval will prevent it from showing the images which can take a second in my environment.

Ignore commits from August 4th since those are being reviewed in #5293. They are only showing here because I had to base my new branch off the one associated with that PR so I could test against my live environment as well.

Finally, I will be adding a few more commits but those will be unit tests. I decided to open this PR to allow initial reviews. I should be adding the rest of the unit tests soon (tomorrow).

Changes & Results

  • Broke down metadata retrieval into study and series level QIDO queries.
  • Added logic for retrieving the first and last slice for each series discovered in study.
  • Added logic for preparing the metadata for viewer consumption, which includes updating the IPP and instance number of reconstructed missing items.
  • Refactored the organization of the DicomWebDataSource module/component.
  • Added several new TypeScript modules and upgraded a few source files to be in TypeScript so I could apply type annotations.
  • Added a few basic tests (I will add more tomorrow while awaiting for a reviewer).
  • Added function and module docstrings.
  • Removed no longer needed code.
  • Moved a few functions into their own TypeScript module.

Before

image

After

image

Testing

  • Run a webserver or dev node script.
  • Open a web browser and make sure to show the network tab from dev tools.
  • Navigate to page and observe the burst of queries and fetches each taking a few milliseconds.
  • Confirm that only a few hundred KB are downloaded for a given study in the worst case scenario.

Checklist

PR

  • Added new unit tests
  • Ran unit tests
  • Ran end to end tests
  • My Pull Request title is descriptive, accurate and follows the
    semantic-release format and guidelines.

Code

  • My code has been well-documented (function documentation, inline comments,
    etc.)

Public Documentation Updates

  • [] The documentation page has been updated as necessary for any public API
    additions or removals.

Tested Environment

  • [Ubuntu Linux] OS:
  • [20.18.0] Node version:
  • [Firefox 141, Chromium 138.0.7204.157] Browser:

… reflect expected return type.

Signed-off-by: Luis M. Santos <[email protected]>
…g metadata. WADO metadata request is more likely to return JSON and some VNAs do not like the extra options in the Accept header. Also, passing an empty array is not sufficient because somewhere we still include a comma that breaks the Accept header. It's better to omit it for the metadata request only.

Signed-off-by: Luis M. Santos <[email protected]>
…s typed checked. Of course, I upgraded the module to a TypeScript module.

Moved the request header interfaces into its own TypeScript module (RequestHeaders.ts) in the core types.

Signed-off-by: Luis M. Santos <[email protected]>
…s a result, upgraded the source file to TypeScript.

Removed the User interface from RequestHeaders module and moved them to the user module.

Signed-off-by: Luis M. Santos <[email protected]>
Signed-off-by: Luis M. Santos <[email protected]>
Signed-off-by: Luis M. Santos <[email protected]>
…urely from QIDO as possible.

Signed-off-by: Luis M. Santos <[email protected]>
…ccount so that slightly different requests are attempted. In particular, it fixes the instance metadata retrieval getting blocked because they share a study uid.

Signed-off-by: Luis M. Santos <[email protected]>
… used by the display protocol engine.

Signed-off-by: Luis M. Santos <[email protected]>
…rging of data from the qido search into copies of a single reference slice metadata. It saves on unnecessary data retrieval and thus makes the viewer loading very snappy.

Signed-off-by: Luis M. Santos <[email protected]>
…ive on what to retrieve.

Signed-off-by: Luis M. Santos <[email protected]>
… allow usage of type definition elsewhere for TS source files.

Signed-off-by: Luis M. Santos <[email protected]>
…hat I can retrieve multiple slice metadata blocks.

Signed-off-by: Luis M. Santos <[email protected]>
…ed to pass a list of promises as opposed to a single promise so we can reconstruct the spatial information.

Signed-off-by: Luis M. Santos <[email protected]>
…we now fetch 2 slice metadata blocks (first and last slices) instead of one.

Signed-off-by: Luis M. Santos <[email protected]>
…econstructed metadata array can be used by the viewer and correctly perform a 3D reconstruction.

Adjusted the type definitions to improve self documentation of the input data. Since dcmjs is a JS library, the naturalized dataset lacks proper type annotations so this is the beginning work to add such annotations.

Signed-off-by: Luis M. Santos <[email protected]>
…ction for the retrieval of metadata from the dicom source.

Made stylistic refactors and added basic documentation.

Signed-off-by: Luis M. Santos <[email protected]>
…h level flow of the loading logic so I can figure out a small defect.

Signed-off-by: Luis M. Santos <[email protected]>
…make the extraction logic cleaner and allow for sharing of types with the wado modules.

Signed-off-by: Luis M. Santos <[email protected]>
… level since it is a core definition to the rest of the local codebase.

Signed-off-by: Luis M. Santos <[email protected]>
…an cleaning the _retrieveMetaDataAsync logic to help in finding the root cause of a defect introduced in my sync retrieval changes.

Signed-off-by: Luis M. Santos <[email protected]>
const imageIds = getImageId({ instance, frame });
return imageIds;
},
getImageIdsForInstance: getImageIdsForInstance,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be just getImageIdsForInstance,
as your assignment variable name is identical to what you are assigning it to.

@wayfarer3130
Copy link
Contributor

It is indeed interesting to be able to query for studies with qido and display them, fetching first/last etc, but it definitely needs to be something that can be turned on/off with a flag because the performance of this can be either much better or much worse than the existing behaviour. The reason is that not all systems store all the metadata as individual pieces, but if they store it like static dicomweb or other stream optimized variants, then the fastest way to get the data is actually to fetch metadata for the entire series at once - that allows the best compression to be applied, and works equally well for both single and multiframe.
On the OHIF side, these changes break unknown items throughout the viewer. For example, have you tried this at all with whole slide microscopy viewing in the CS3D mode? Another example is CINE playback rates - the tags that end up being needed differ based on modalities. Custom hanging protocols can depend on any tag, and so can custom split/sort rules.
You will need really clear documentation on exactly what queries get made under what circumstances and how to decide whether that will work for various display conditions.

@wayfarer3130
Copy link
Contributor

To make this more configurable/manageable without introducing bugs, could you create a new data source with these changes?
Something like @ohif/extension.dataSourcesModule.dicomweb_minimal
and document exactly what it requires differently, and what sorts of things that will affect.
This may need to make changes to the existing data sources to make them more extensible, but that is something that has been intended all along.

I would set it up so that it is easy to fall back to the basic data source retrieval options on conditions like only have <= N sop instances - that way you can use N~=10 and just fallback to retrieving full metadata, and only use the qido options when they actually make a significant difference. It might also need to have some way to customize when that occurs, and almost certainly needs to have a way to go back and reload everything using the full metadata set on a per-series basis.

@luissantosHCIT
Copy link
Author

luissantosHCIT commented Aug 25, 2025

It is indeed interesting to be able to query for studies with qido and display them, fetching first/last etc, but it definitely needs to be something that can be turned on/off with a flag because the performance of this can be either much better or much worse than the existing behaviour. The reason is that not all systems store all the metadata as individual pieces, but if they store it like static dicomweb or other stream optimized variants, then the fastest way to get the data is actually to fetch metadata for the entire series at once - that allows the best compression to be applied, and works equally well for both single and multiframe. On the OHIF side, these changes break unknown items throughout the viewer. For example, have you tried this at all with whole slide microscopy viewing in the CS3D mode? Another example is CINE playback rates - the tags that end up being needed differ based on modalities. Custom hanging protocols can depend on any tag, and so can custom split/sort rules. You will need really clear documentation on exactly what queries get made under what circumstances and how to decide whether that will work for various display conditions.

Let me make it a flag for now. You bring excellent points that I need to consider.

In terms of the WADO/QIDO part of the standard, it's a REST API, so I think a standards compliant system has to comply with expected metadata.

WADO example,

  • GET {s}/studies/{study}/series/{series}/metadata => has to return the series metadata
  • GET {s}/studies/{study}/series/{series}/instances/{instance}/metadata => has to return the instance metadata

https://dicom.nema.org/medical/dicom/current/output/html/part18.html#sect_10.4.1.1.2

QIDO example,

  • GET {s}/studies/{study} => has to return the series list for the study
  • GET {s}/studies/{study}/series/{series} => has to return the instance list for the series

https://dicom.nema.org/medical/dicom/current/output/html/part18.html#sect_10.6.3.3.1
https://dicom.nema.org/medical/dicom/current/output/html/part18.html#sect_10.6.3.3.2
https://dicom.nema.org/medical/dicom/current/output/html/part18.html#sect_10.6.3.3.3

I do understand helping gzip or zstd achieve better compression, but I am not convinced that it is necessarily faster to retrieve MB of data in most cases. In my particular case, the response is already gzip compressed. This is something that would have to be extensively benchmarked across vendor implementations and it is well outside my scope and resources at the moment.

With that said, if we stick strictly to the standard, QIDO as deep into the tree as possible and then WADO at the leaf should include much fewer metadata than WADO at the higher levels (series) despite having multiple separate requests and potentially poor compression per request. Also, we get to benefit from request parallelization and keeping them close to less than 14KB to avoid the TCP slow start tax per connection. Meaning, it should be near instantaneous which is what I found in my crude testing.

This is not to discredit your observations. I am thinking that most studies should have a lot of metadata because that is the nature of DICOM (CT, MRI, PET, DWI, Perfusion, etc...) and DICOMweb lacks a GraphQL specification like FHIR has.

PS.....

I just read your follow up comment and I fully agree. That is a better way to approach this. I was avoiding adding to the project since I am new here, but I do like that suggestion the more I think about this. One thing we (more like I) might need to revisit the correct threshold for falling back to full retrieval. Starting at 10 sounds good to me for now.

I will check CINE and CS3D if possible. Expect changes starting tomorrow since I need to take care of something this afternoon at my institution.

Thank you. This is a good discussion!

Also, I already saw a few other emails about style and I will look at those as well for the other PRs, but I wanted to share that TypeScript is not my primary language so occasionally you might see more of a C or Rust or Pythony style when writing things. I am going to take this opportunity to learn from you!

@luissantosHCIT
Copy link
Author

luissantosHCIT commented Sep 2, 2025

Moved my changes to its own module as requested.
Do not review yet. I am missing the implementation of the default behavior if slices < 10.

@luissantosHCIT luissantosHCIT changed the title fix 5287 optimize series discovery on load time [WIP] fix 5287 optimize series discovery on load time Sep 3, 2025
…ces.

Reorganized the utility functionality so that different data sources can select the functionality they need while minimizing duplication and migrating code into TypeScript land.

Signed-off-by: Luis M. Santos <[email protected]>
…to the new utils toolbox.

Signed-off-by: Luis M. Santos <[email protected]>
…ring this portion of the API in the DataSources because I may have made a mistake previously. I think it is part of the API to have their own copy-paste version of getImageIdsForInstance. I will need to revisit refactoring this at a later time.

Signed-off-by: Luis M. Santos <[email protected]>
…ataSource API as far as I could see doing a codebase search. I am implementing the same elsewhere.

Signed-off-by: Luis M. Santos <[email protected]>
…ataSource API and I wanted to make all of these guys similar so I can more consistently refactor the general API in the future.

Signed-off-by: Luis M. Santos <[email protected]>
…it uses the same utilities as the DicomWebMinimalDataSource.

Signed-off-by: Luis M. Santos <[email protected]>
…se these functions across several data sources.

Signed-off-by: Luis M. Santos <[email protected]>
…the study contains fewer than 10 slices.

Signed-off-by: Luis M. Santos <[email protected]>
@luissantosHCIT
Copy link
Author

luissantosHCIT commented Sep 3, 2025

TODO:

  • Organizing the Data Source directories.
  • Refactoring imports and logic out of the individual DicomWeb DataSource items.
  • Created the new Minimal Data Source
  • Added default full retrieval to Minimal Data Source.
  • Added a few docstrings.
  • Documentation of new Minimal DicomWeb Data Source.
  • Review all tests.
  • Add new tests.

…ents were expected to also change per slice.

Signed-off-by: Luis M. Santos <[email protected]>
…com Data Sources directory.

Signed-off-by: Luis M. Santos <[email protected]>
Signed-off-by: Luis M. Santos <[email protected]>
@wayfarer3130
Copy link
Contributor

The different queries do in fact need to be implemented, but the performance of different types of queries and what is available to actually return differs between systems. As well, the actual parsing performance differs between systems.
During testing of different PACS systems, the performance of the back end PACS system was always the determining factor for how fast things could display, and not the transfer speed. Only when there was a metadata gzip cache in between the PACS and the front end did transfer speed become an issue, and that only occurs for very large single frame studies. Those ones get pretty decent performance when switching to multi-frame. I was able to test that on tens of thousands of frames and get time to first image of a couple of seconds from clicking on the image in the display area.

Given that, to figure out how to better deal with slower PACS systems or ones not supporting multiframes:

  1. Reduce the number of query/retrieves used
  2. Reduce the expense of the query/retrieves done

For the first one, try the hanging protocols which don't load the entire study, but only load the required series. That means that it has to be determinable which series are displayed first by things available in the series query, without needing to load all the metadata. That can be a problem for some study types.

For the second, you idea of using the instances query is probably the best option - however, you still need to be able to determine the split/sort rules that are applicable. There are ways to "guess" that, but the query data returned by the initial query isn't really good enough to guess that - something like:
a. Load the instance query for the series determined to be of interest
b. Sort the series by the appropriate criteria
c. Let the split/sort rules determine which instances need full metadata up front (this can use the first/last series, number of frames and other common attributes to guess if this instance needs full metadata) - this may determine it needs all metadata for this series.
d. Load instance metadata in batches as required, or if all metadata is needed, load it all up front

You can start seeing why a general solution to not load all the metadata up front becomes a problem because you need to deal with a lot of different use cases. If the hanging protocol used to determine initial load can also determine whether it is safe to do a query instance metadata for some of the instances, then it would help you to switch between full metadata modes and partial, even on a single series. Those hanging protocols probably need to be tuned to the exact back end response data you can get on the instance level query and series level queries.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Feature Request] Optimize WADO metadata retrieval.
2 participants