Skip to content

Conversation

@klimaj
Copy link
Member

@klimaj klimaj commented Sep 19, 2025

Currently, PackedPose objects are serialized/deserialized using the pickle module (introduced in ~2019), and the Pose.cache dictionary (introduced in #430) supports caching arbitrary datatypes in the Pose object using the pickle module. Additionally, #462 enables saving compressed PackedPose objects to disk (i.e., as *.b64_pose and *.pkl_pose files) for sharing PyRosetta Pose objects with the scientific community. However, use of the pickle module is not secure (see warning here as outlined in #519).

Herein this PR, a secure pickle.loads method is developed and slotted into the PackedPose and Pose.cache infrastructure to permanently disallow certain risky packages, modules, and namespaces from being unpickled/loaded (e.g., exec, eval, os.system, subprocess.run, etc., and will be updated over time as needed), thus significantly improving the security of handling PackedPose and Pose objects in memory if received from a second party (i.e., over a socket, queue, interprocess communication, etc.) or when reading a file received from a second party (i.e., using pyrosetta.distributed.io.pose_from_file with a *.b64_pose and *.pkl_pose file). By default, only pyrosetta and numpy packages, and certain builtins modules (like dict, complex, tuple, etc.), are considered secure and permitted to be unpickled/loaded. Other packages that the user may want to serialize/deserialize may be assigned as secure per-process by the user in-code (see methods below). It is worth noting that PyTorch developers have implemented a similar strategy with the torch.serialization.add_safe_globals() method.

Another aim of this PR is to implement an optional Hash-based Message Authentication Code (HMAC) key in the Pose.cache dictionary for data integrity verification. While not a security feature, this new API allows the user to set a HMAC key to be prepended to every score value in the Pose.cache dictionary that effectively says "this was saved by PyRosetta", so that it intentionally raises an error when the HMAC key is missing or differs upon retrieval, indicating that the data appears to have been tampered with or modified. By default, the HMAC key is disabled (being set to None) in order to reduce memory overhead of the Pose.cache dictionary; e.g., if 32 bytes are prepended to each score value, with 1,000 score values that's 32,000 bytes or 32 KB of overhead, and with a million score values that's 32 MB of overhead.

The following are newly added functions:

  • pyrosetta.secure_unpickle.add_secure_package: Add a package to the unpickle allowed list
  • pyrosetta.secure_unpickle.remove_secure_package: Remove a package from the unpickle allowed list
  • pyrosetta.secure_unpickle.clear_secure_packages: Remove all packages from the unpickle allowed list
  • pyrosetta.secure_unpickle.get_disallowed_packages: Return all permanently disallowed packages/modules/prefixes
  • pyrosetta.secure_unpickle.get_secure_packages: Return all packages in the unpickle allowed list
  • pyrosetta.secure_unpickle.set_secure_packages: Set all packages in the unpickle allowed list
  • pyrosetta.secure_unpickle.set_unpickle_hmac_key: Set the HMAC key for the Pose.cache dictionary
  • pyrosetta.secure_unpickle.get_unpickle_hmac_key: Return the HMAC key for the Pose.cache dictionary

@klimaj klimaj added the ready_for_review This PR is ready to be reviewed and merged. label Sep 23, 2025
Copy link
Member

@lyskov lyskov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, - thank you for taking care of this @klimaj !

Tagging @rclune , - Rachel could you please check the comment on documentation below? Thanks,

Added a warning about the risks of unpickling data from untrusted sources in the secure_unpickle.py file.
Updated warning message regarding secure unpickling to clarify risks and emphasize the importance of trusted sources.
Comment on lines 70 to 79
from pyrosetta.secure_unpickle import (
add_secure_package,
remove_secure_package,
clear_secure_packages,
get_disallowed_packages,
get_secure_packages,
set_secure_packages,
UnpickleSecurityError,
UnpickleIntegrityError,
)
Copy link
Member

@lyskov lyskov Sep 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@klimaj could you please elaborate why do we need these at top level? Ie: in general I would like to keep default import list lean (and these will not be useful unless distributed framework is enabled).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lyskov Good point! The original idea was for convenience, but on second thought, I agree that we don't want to clutter the pyrosetta.* namespace. Also please note that these methods are still active outside of the pyrosetta.distributed framework, since these are for configuring the secure packages for the Pose.cache dictionary.

@klimaj
Copy link
Member Author

klimaj commented Sep 24, 2025

@lyskov thanks for the review! And @rclune, thank you for adding the warnings!

@lyskov lyskov merged commit 078f4d9 into RosettaCommons:main Sep 24, 2025
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

03 PyRosetta enhancement New feature or request industry ready_for_review This PR is ready to be reviewed and merged.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants