
StarDist h5py Error Loading '2D_versatile_fluo' #13

@AndrewLutsky

Description

Hello again,

Again thank you for your continued work on the pipeline!

I have been running the pipeline and found that the sd_cell_segmentation step fails with the following error:

ERROR ~ Error executing process > 'sd_segment_cells:sd_cell_segmentation (23)'

Caused by:
  Process `sd_segment_cells:sd_cell_segmentation (23)` terminated with an error exit status (1)


Command executed:

  python3.8 /opt/stardist_segment.py \
  	1189_003 \
  	1189_003-cp4-preprocessed_metadata.csv \
  	DNA1 \
  	2D_versatile_fluo \
  	default \
  	0.05 \
  	default \
  	1189_003-StarDist-Cells.csv \
  	1189_003-StarDist-Cell_Mask.tiff \
  	> stardist_segmentation_log.txt 2>&1

Command exit status:

  1

Log (./work/48/91c .... /stardist_segmentation_log.txt):

  1 2025-09-02 17:31:26.636220: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /.singularity.d/libs
  2 2025-09-02 17:31:26.636269: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
  3 2025-09-02 17:31:29.250180: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
  4 2025-09-02 17:31:29.252965: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /.singularity.d/libs
  5 2025-09-02 17:31:29.252980: W tensorflow/stream_executor/cuda/cuda_driver.cc:326] failed call to cuInit: UNKNOWN ERROR (303)
  6 2025-09-02 17:31:29.253007: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: node0424.palmetto.clemson.edu
  7 2025-09-02 17:31:29.253011: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: node0424.palmetto.clemson.edu
  8 2025-09-02 17:31:29.253071: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:200] libcuda reported version is: Not found: was unable to find libcuda.so DSO loaded into this program
  9 2025-09-02 17:31:29.254470: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:204] kernel reported version is: 550.163.1
 10 2025-09-02 17:31:29.254655: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F FMA
 11 To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
 12 2025-09-02 17:31:29.254818: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
 13 The selected sample name is: 1189_003
 14 The selected probability threshold is: 0.05
 15 Using the model's default value for NMS
 16 Parsing metadata file: 1189_003-cp4-preprocessed_metadata.csv
 17 The files to be loaded are: { ... }
 18 The parsed labels are: ['DNA1']
 19 Loading model: 2D_versatile_fluo default
 20 Found model '2D_versatile_fluo' for 'StarDist2D'.
 21 Loading network weights from 'weights_best.h5'.
 22 Traceback (most recent call last):
 23   File "/opt/stardist_segment.py", line 190, in <module>
 24     model = load_model(model_name, model_path)
 25   File "/opt/stardist_segment.py", line 94, in load_model
 26     return StarDist2D.from_pretrained(model_to_load)
 27   File "/usr/local/lib/python3.8/dist-packages/csbdeep/models/base_model.py", line 79, in from_pretrained
 28     return get_model_instance(cls, name_or_alias)
 29   File "/usr/local/lib/python3.8/dist-packages/csbdeep/models/pretrained.py", line 102, in get_model_instance
 30     model = cls(config=None, name=path.stem, basedir=path.parent)
 31   File "/usr/local/lib/python3.8/dist-packages/stardist/models/model2d.py", line 292, in __init__
 32     super().__init__(config, name=name, basedir=basedir)
 33   File "/usr/local/lib/python3.8/dist-packages/stardist/models/base.py", line 220, in __init__
 34     super().__init__(config=config, name=name, basedir=basedir)
 35   File "/usr/local/lib/python3.8/dist-packages/csbdeep/models/base_model.py", line 113, in __init__
 36     self._find_and_load_weights()
 37   File "/usr/local/lib/python3.8/dist-packages/csbdeep/models/base_model.py", line 32, in wrapper
 38     return f(*args, **kwargs)
 39   File "/usr/local/lib/python3.8/dist-packages/csbdeep/models/base_model.py", line 167, in _find_and_load_weights
 40     self.load_weights(weights_chosen.name)
 41   File "/usr/local/lib/python3.8/dist-packages/csbdeep/models/base_model.py", line 32, in wrapper
 42     return f(*args, **kwargs)
 43   File "/usr/local/lib/python3.8/dist-packages/csbdeep/models/base_model.py", line 184, in load_weights
 44     self.keras_model.load_weights(str(self.logdir/name))
 45   File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/keras/engine/training.py", line 2229, in load_weights
 46     hdf5_format.load_weights_from_hdf5_group(f, self.layers)
 47   File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/keras/saving/hdf5_format.py", line 696, in load_weights_from_hdf5_group
 48     weight_values = [np.asarray(g[weight_name]) for weight_name in weight_names]
 49   File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/keras/saving/hdf5_format.py", line 696, in <listcomp>
 50     weight_values = [np.asarray(g[weight_name]) for weight_name in weight_names]
 51   File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
 52   File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
 53   File "/usr/local/lib/python3.8/dist-packages/h5py/_hl/group.py", line 264, in __getitem__
 54     oid = h5o.open(self.id, self._e(name), lapl=self._lapl)
 55   File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
 56   File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
 57   File "h5py/h5o.pyx", line 190, in h5py.h5o.open
 58 KeyError: 'Unable to open object (bad local heap signature)'

When I run the container and load the 2D_versatile_fluo model independently, I do not encounter any errors.

I believe this is related to an ongoing issue with StarDist opening the weights files in parallel via h5py (https://github.com/stardist/stardist/issues/93). I have worked around it by setting executor.queueSize to 1 in my custom profile, which limits that step to one job at a time, but that feels like a poor solution and I'm not sure of the proper way to handle this. It most likely does not show up on the test data because only two samples run at once; I would be willing to try different queue sizes on my local cluster to see at what point it breaks. Another hacky option, discussed in the linked issue thread, is to add a random delay to each process so they do not all try to load the model weights at the same time; both workarounds are sketched below. Let me know your thoughts, and let me know if you need anything beyond the log provided.
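For reference, here is roughly what the two workarounds look like as Nextflow configuration. This is only a sketch: the profile name (custom), the withName selector, and the 30-second sleep bound are placeholders I chose for illustration, not anything taken from the pipeline.

  // Workaround 1: cap the number of tasks the executor handles at once for this profile.
  profiles {
      custom {
          executor {
              queueSize = 1
          }
      }
  }

  // Workaround 2 (the random-delay idea from the linked thread): stagger the
  // segmentation tasks with a short random sleep so they do not all try to
  // read weights_best.h5 at the same moment.
  process {
      withName: 'sd_cell_segmentation' {
          beforeScript = 'sleep $((RANDOM % 30))'
      }
  }

The queueSize change is what I am actually running right now; the beforeScript delay is just the timer idea from the linked thread written down, and I have not tested it.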

Again thanks for your time and effort!

Metadata

Assignees

No one assigned

Labels

bug (Something isn't working)
