Supporting secure unpickling in PyRosetta #523
…g required attributes
Added a warning about the risks of unpickling data from untrusted sources in the secure_unpickle.py file.
Updated warning message regarding secure unpickling to clarify risks and emphasize the importance of trusted sources.
from pyrosetta.secure_unpickle import (
    add_secure_package,
    remove_secure_package,
    clear_secure_packages,
    get_disallowed_packages,
    get_secure_packages,
    set_secure_packages,
    UnpickleSecurityError,
    UnpickleIntegrityError,
)
@klimaj could you please elaborate on why we need these at the top level? In general, I would like to keep the default import list lean (and these will not be useful unless the distributed framework is enabled).
@lyskov Good point! The original idea was for convenience, but on second thought, I agree that we don't want to clutter the pyrosetta.* namespace. Also please note that these methods are still active outside of the pyrosetta.distributed framework, since these are for configuring the secure packages for the Pose.cache dictionary.
Currently, `PackedPose` objects are serialized/deserialized using the `pickle` module (introduced in ~2019), and the `Pose.cache` dictionary (introduced in #430) supports caching arbitrary datatypes in the `Pose` object using the `pickle` module. Additionally, #462 enables saving compressed `PackedPose` objects to disk (i.e., as `*.b64_pose` and `*.pkl_pose` files) for sharing PyRosetta `Pose` objects with the scientific community. However, use of the `pickle` module is not secure (see the linked warning, as outlined in #519).

In this PR, a secure `pickle.loads` method is developed and slotted into the `PackedPose` and `Pose.cache` infrastructure to permanently disallow certain risky packages, modules, and namespaces from being unpickled/loaded (e.g., `exec`, `eval`, `os.system`, `subprocess.run`, etc.; the list will be updated over time as needed). This significantly improves the security of handling `PackedPose` and `Pose` objects in memory when they are received from a second party (e.g., over a socket, queue, or interprocess communication) or when reading a file received from a second party (i.e., using `pyrosetta.distributed.io.pose_from_file` with a `*.b64_pose` or `*.pkl_pose` file). By default, only the `pyrosetta` and `numpy` packages, and certain members of the `builtins` module (like `dict`, `complex`, `tuple`, etc.), are considered secure and permitted to be unpickled/loaded. Other packages that the user may want to serialize/deserialize may be assigned as secure per-process by the user in-code (see the functions listed below). It is worth noting that PyTorch developers have implemented a similar strategy with the `torch.serialization.add_safe_globals()` method.

Another aim of this PR is to implement an optional Hash-based Message Authentication Code (HMAC) key in the `Pose.cache` dictionary for data integrity verification. While not a security feature, this new API allows the user to set an HMAC key to be prepended to every score value in the `Pose.cache` dictionary that effectively says "this was saved by PyRosetta", so that an error is intentionally raised when the HMAC key is missing or differs upon retrieval, indicating that the data appears to have been tampered with or modified. By default, the HMAC key is disabled (being set to `None`) in order to reduce the memory overhead of the `Pose.cache` dictionary; e.g., if 32 bytes are prepended to each score value, then 1,000 score values add 32,000 bytes (32 KB) of overhead, and a million score values add 32 MB of overhead.

The following are newly added functions:

- `pyrosetta.secure_unpickle.add_secure_package`: Add a package to the unpickle allowed list
- `pyrosetta.secure_unpickle.remove_secure_package`: Remove a package from the unpickle allowed list
- `pyrosetta.secure_unpickle.clear_secure_packages`: Remove all packages from the unpickle allowed list
- `pyrosetta.secure_unpickle.get_disallowed_packages`: Return all permanently disallowed packages/modules/prefixes
- `pyrosetta.secure_unpickle.get_secure_packages`: Return all packages in the unpickle allowed list
- `pyrosetta.secure_unpickle.set_secure_packages`: Set all packages in the unpickle allowed list
- `pyrosetta.secure_unpickle.set_unpickle_hmac_key`: Set the HMAC key for the `Pose.cache` dictionary
- `pyrosetta.secure_unpickle.get_unpickle_hmac_key`: Return the HMAC key for the `Pose.cache` dictionary
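
For readers unfamiliar with the technique, the allowed-list idea can be sketched with the standard-library `pickle.Unpickler.find_class` hook. This is a minimal illustration, not the code in this PR: the names `ALLOWED_PACKAGES`, `SAFE_BUILTINS`, `DisallowedGlobalError`, `AllowListUnpickler`, and `restricted_loads` are hypothetical, and the contents of the two sets are assumptions chosen to mirror the defaults described above.

```python
import builtins
import io
import pickle

# Hypothetical allowed list, mirroring the defaults described in the PR text.
ALLOWED_PACKAGES = {"pyrosetta", "numpy"}
# Hypothetical subset of `builtins` names treated as safe to load.
SAFE_BUILTINS = {"dict", "complex", "tuple", "list", "set", "frozenset"}


class DisallowedGlobalError(pickle.UnpicklingError):
    """Raised when a pickle stream references a global outside the allowed list."""


class AllowListUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called whenever the pickle stream asks to import a global object.
        top_level = module.split(".")[0]
        if module == "builtins":
            if name in SAFE_BUILTINS:
                return getattr(builtins, name)
        elif top_level in ALLOWED_PACKAGES:
            return super().find_class(module, name)
        # Everything else (e.g. builtins.eval, os.system) is rejected.
        raise DisallowedGlobalError(f"global '{module}.{name}' is not allowed")


def restricted_loads(data: bytes):
    """Drop-in analogue of pickle.loads() that enforces the allowed list."""
    return AllowListUnpickler(io.BytesIO(data)).load()


if __name__ == "__main__":
    print(restricted_loads(pickle.dumps({"total_score": (1, 2.5)})))
    try:
        restricted_loads(pickle.dumps(eval))
    except DisallowedGlobalError as error:
        print("rejected:", error)
```

Note that allowing a package by its top-level name trusts every importable callable inside that package, which is presumably why the PR also keeps a separate list of permanently disallowed names.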
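
The HMAC integrity check can be illustrated in the same spirit with the standard-library `hmac` and `hashlib` modules. This sketch assumes HMAC-SHA256, whose 32-byte tag matches the overhead figure quoted above; the digest actually used by the PR is not stated here, and the names `IntegrityError`, `sign_payload`, and `verify_payload` are hypothetical.

```python
import hashlib
import hmac
import pickle

# Assumption: SHA-256, which yields a 32-byte tag per stored value.
TAG_SIZE = hashlib.sha256().digest_size


class IntegrityError(Exception):
    """Raised when the tag is missing or does not match the payload."""


def sign_payload(payload: bytes, key: bytes) -> bytes:
    # Prepend the HMAC tag so it travels with the serialized value.
    tag = hmac.new(key, payload, hashlib.sha256).digest()
    return tag + payload


def verify_payload(blob: bytes, key: bytes) -> bytes:
    # Split off the tag, recompute it, and compare in constant time.
    tag, payload = blob[:TAG_SIZE], blob[TAG_SIZE:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise IntegrityError("HMAC tag missing or mismatched")
    return payload


if __name__ == "__main__":
    key = b"example-key"  # hypothetical key, for illustration only
    blob = sign_payload(pickle.dumps(-123.4), key)
    print(len(blob) - len(pickle.dumps(-123.4)))  # per-value overhead in bytes
    print(pickle.loads(verify_payload(blob, key)))
```

Verifying the tag before deserializing means a modified value is rejected without ever being unpickled, and the per-value overhead is fixed at the tag size regardless of how large the value is.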