Unified EDDIE-ML, or UEDDIE-ML (pronounced "oodie-ML"), is a transformer model, based on Dr. Kaycee Low's original EDDIE-ML, that predicts interaction energies from atomistic deformation density coefficients for both neutral and charged systems. Its main innovation is a charge scaling factor that scales the atomistic contributions in the atomistic approximation according to monomer charge. This yields test error metrics below 1 kcal/mol, reaching chemical accuracy in a statistical sense.
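As a rough illustration of what a sub-1 kcal/mol test metric means, the sketch below computes a mean absolute error in kcal/mol from energies given in Hartrees and compares it with the 1 kcal/mol threshold. The energy values and the helper name `mae_kcal_per_mol` are made up for illustration and are not results or code from this repository; 627.509 is the same Hartree-to-kcal/mol factor used in the inference snippet further down.

```python
HARTREE_TO_KCAL_PER_MOL = 627.509  # same factor as in the inference snippet
CHEMICAL_ACCURACY = 1.0  # kcal/mol threshold quoted above


def mae_kcal_per_mol(pred_hartree, ref_hartree):
    """Mean absolute error between two equal-length sequences, in kcal/mol."""
    if len(pred_hartree) != len(ref_hartree) or not pred_hartree:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(p - r) for p, r in zip(pred_hartree, ref_hartree))
    return total / len(pred_hartree) * HARTREE_TO_KCAL_PER_MOL


# Made-up interaction energies in Hartrees (illustrative only)
reference = [-0.0105, -0.0210, -0.1502]
predicted = [-0.0100, -0.0220, -0.1490]

mae = mae_kcal_per_mol(predicted, reference)
print(f"MAE = {mae:.3f} kcal/mol, below threshold: {mae < CHEMICAL_ACCURACY}")
```

Which metric the repository actually reports (MAE, RMSE, or another) is not stated here, so treat the choice of MAE as an assumption.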
In the repository directory, run pip install -r requirements.txt, then run python test.py.
To load the pretrained model for inference:
import numpy as np
import torch
# Because the model was trained using GPUs, we need to set map_location to cpu
model = torch.load('model.pt', weights_only=False, map_location='cpu')
# Make sure multi-GPU model parallelism is disabled
model.disable_multi_gpu('cpu')
# Flag the model for inference mode
model.eval()
# Load the dataset and scalers
from dataset import UEDDIEDataset
dataset = UEDDIEDataset()
scaler_x, scaler_y = dataset.load_and_apply_scalers()
# Load a sample
x, e, c, y, name = dataset.get(0, return_name=True)
# Reshape to (1, ...) for batch dimension of 1
x, e, c = x.unsqueeze(0), e.unsqueeze(0), c.unsqueeze(0)
# Evaluate the model
with torch.no_grad():
    y_pred = model(x, e, c)
# Invert y scaling and convert from Hartrees to kcal/mol
y_pred = np.array([y_pred.item()])
y_pred = y_pred.reshape(1, 1)
ie_model = scaler_y.inverse_transform(y_pred)[0, 0] * 627.509
# Print the final interaction energy
print(ie_model)

This repository implements the following scientific computing and machine learning pipeline:
- Deformation densities are calculated using (GPU4)PySCF and stored as .cube files
- Those .cube files, encoding uniform deformation density grids, are read and atomistic descriptors are calculated from them and stored as .coeff files
- The dataset is created and dimensionality reduction (PCA) is applied to form the features, creating output.hdf5
- Interaction energies are calculated (in Hartrees) using the CPU-based open-source quantum chemistry engine Psi4, creating energies.dat
- The atomistic machine learning model UEDDIE-ML is trained and tested using a roughly 80/20 split by default of the dataset and interaction energies
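The dimensionality reduction and train/test split steps above can be sketched on synthetic data as follows. This is a minimal sketch, not the repository's make_dataset.py or train.py: the array shapes, the number of components, and the random seed are assumptions, and PCA is done here with a plain NumPy SVD as a stand-in for whatever PCA implementation the scripts actually use.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for per-sample descriptor vectors (assumed shapes)
n_samples, n_features, n_components = 50, 12, 4
features = rng.normal(size=(n_samples, n_features))
energies = rng.normal(size=n_samples)  # placeholder targets, in Hartrees

# PCA via SVD of the mean-centred feature matrix
centred = features - features.mean(axis=0)
_, _, vt = np.linalg.svd(centred, full_matrices=False)
reduced = centred @ vt[:n_components].T  # shape (n_samples, n_components)

# Shuffle the indices, then take roughly 80% for training and the rest for testing
order = rng.permutation(n_samples)
n_train = int(0.8 * n_samples)
train_idx, test_idx = order[:n_train], order[n_train:]

x_train, y_train = reduced[train_idx], energies[train_idx]
x_test, y_test = reduced[test_idx], energies[test_idx]
print(x_train.shape, x_test.shape)
```

The sketch follows the order described in the list (reduce first, then split); fitting the projection on the training portion only is a common alternative when leakage from the test set is a concern.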
- get_deformation_densities.py requires numpy and pyscf, and optionally (but recommended) gpu4pyscf
- get_dens_coeffs.py requires numpy, ase, scipy, spherical_functions, sympy, and h5py
- make_dataset.py requires numpy, h5py, matplotlib, and scikit-learn
- get_energies.py requires Psi4 and the psi4 Python package
- train.py requires numpy, h5py, scikit-learn, joblib, torch, matplotlib, seaborn, and lion-pytorch
You may need a separate environment for each script, because Psi4, PySCF, and PyTorch often conflict, especially in an HPC module environment.
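When juggling several environments, it can help to check which scripts the active one can run before launching a job. The sketch below is a hypothetical helper, not part of this repository: it maps each script to the import names of its required packages as listed above (scikit-learn imports as sklearn, lion-pytorch as lion_pytorch) and reports which ones cannot be found. The optional gpu4pyscf is left out.

```python
from importlib.util import find_spec

# Import names per script, taken from the requirements listed above.
# scikit-learn imports as sklearn and lion-pytorch as lion_pytorch.
REQUIREMENTS = {
    "get_deformation_densities.py": ["numpy", "pyscf"],
    "get_dens_coeffs.py": ["numpy", "ase", "scipy", "spherical_functions", "sympy", "h5py"],
    "make_dataset.py": ["numpy", "h5py", "matplotlib", "sklearn"],
    "get_energies.py": ["psi4"],
    "train.py": ["numpy", "h5py", "sklearn", "joblib", "torch", "matplotlib", "seaborn", "lion_pytorch"],
}


def missing_modules(modules):
    """Return the names in `modules` that cannot be found in this environment."""
    return [name for name in modules if find_spec(name) is None]


def report(requirements=REQUIREMENTS):
    """Map each script to the list of its modules missing from this environment."""
    return {script: missing_modules(mods) for script, mods in requirements.items()}


for script, missing in report().items():
    status = "ok" if not missing else "missing: " + ", ".join(missing)
    print(f"{script}: {status}")
```

`find_spec` only looks a module up without importing it, so the check is quick and does not trigger GPU or Psi4 initialisation; it also cannot tell whether an installed package is a compatible version.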
