Title: Multimodal transformers with chemical priors improve the accuracy of phase classification from X-ray diffraction spectra
- Python 3.8.18 / 3.11.8
- PyTorch 1.13.1
- numpy 1.24.3
- xrayutilities 1.7.4 / 1.7.8
- pymatgen 2023.8.10
- transformers 4.44.2
- scikit-learn 1.3.2
- scipy 1.10.1
- Windows 11, version 23H2
- NVIDIA GeForce RTX 3080/4090
Install Anaconda (https://docs.anaconda.com/anaconda/install/), then create and activate an environment and install the dependencies:
conda create -n chem_xrd python=3.8.18
conda activate chem_xrd
pip install numpy==1.24.3
pip install xrayutilities==1.7.4
pip install pymatgen==2023.8.10
pip install transformers==4.44.2
pip install scikit-learn==1.3.2
Option 1: install the CPU version of PyTorch (Linux and Windows)
pip install torch==1.13.1+cpu torchvision==0.14.1+cpu torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cpu
Option 2: install the GPU version of PyTorch (requires CUDA 11.x and cuDNN >= 8.5.0)
pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu117
See also: https://pytorch.org/get-started/previous-versions/
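After installation, it can help to confirm that the environment matches the pinned versions. The following is a minimal sketch, not part of the repository: `PINNED` and `check_versions` are hypothetical names, and the pins are copied from the dependency list above, taking the first listed version where two are given. GPU wheels report versions such as `1.13.1+cu117`, so the local build tag is ignored in the comparison.

```python
from importlib import metadata

# Pins copied from the dependency list (assumption: the Python 3.8 environment).
PINNED = {
    "numpy": "1.24.3",
    "xrayutilities": "1.7.4",
    "pymatgen": "2023.8.10",
    "transformers": "4.44.2",
    "scikit-learn": "1.3.2",
    "scipy": "1.10.1",
    "torch": "1.13.1",
}


def check_versions(pinned=PINNED):
    """Return {package: (expected, installed or None, matches)}."""
    report = {}
    for name, expected in pinned.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        # Drop local build tags such as "+cu117" or "+cpu" before comparing.
        base = installed.split("+")[0] if installed else None
        report[name] = (expected, installed, base == expected)
    return report


if __name__ == "__main__":
    for name, (expected, installed, ok) in check_versions().items():
        status = "ok" if ok else "MISMATCH"
        found = installed or "not installed"
        print(f"{name:15s} expected {expected:12s} found {found:15s} {status}")
```

A mismatch is not necessarily an error (for example, the list above also names xrayutilities 1.7.8 for the Python 3.11 setup), so treat the report as a prompt to double-check rather than a hard failure.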
Before running the notebooks, download the XRD dataset and pretrained models from https://drive.google.com/file/d/1vBlm35L_PZZ6wtZlvAe_E2jjcrLlXTob, and place the contents in the "cif" and "pretrained_model" directories, respectively.
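A quick check that the downloaded files landed in the expected places might look like the sketch below. The directory names come from the instruction above. The helper name `missing_or_empty` is hypothetical, and the assumption that both directories sit directly under the repository root is not confirmed by the source.

```python
from pathlib import Path

# Directory names from the instructions; their location under `root` is assumed.
REQUIRED_DIRS = ("cif", "pretrained_model")


def missing_or_empty(root="."):
    """Return the required directories that are absent or contain no entries."""
    problems = []
    for name in REQUIRED_DIRS:
        path = Path(root) / name
        if not path.is_dir() or not any(path.iterdir()):
            problems.append(name)
    return problems


if __name__ == "__main__":
    problems = missing_or_empty(".")
    if problems:
        print("Missing or empty:", ", ".join(problems))
    else:
        print("Data directories look populated.")
```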
Run Preprocess.ipynb. The notebook shows the generation of new alloyed CIF files, the application of lattice strains, and the process of dataset generation. For details of the CIF selection process, see ICSD_down_selection.xlsx.
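To see why lattice strain matters for dataset generation, consider how it shifts diffraction peaks. The sketch below is not the notebook's code. It applies Bragg's law to a cubic cell under isotropic strain, assuming Cu K-alpha1 radiation (1.5406 Å). The function names are hypothetical, and the silicon lattice constant is used purely as an illustration.

```python
import math

CU_KALPHA = 1.5406  # X-ray wavelength in angstroms (assumed: Cu K-alpha1)


def two_theta_cubic(a, hkl, wavelength=CU_KALPHA):
    """Bragg angle 2-theta (degrees) for reflection hkl of a cubic cell with edge a."""
    h, k, l = hkl
    d = a / math.sqrt(h * h + k * k + l * l)  # interplanar spacing
    s = wavelength / (2.0 * d)                # sin(theta) from Bragg's law
    if s > 1.0:
        raise ValueError("reflection not reachable at this wavelength")
    return 2.0 * math.degrees(math.asin(s))


def strained_two_theta(a, hkl, strain, wavelength=CU_KALPHA):
    """Same reflection after an isotropic strain: a -> a * (1 + strain)."""
    return two_theta_cubic(a * (1.0 + strain), hkl, wavelength)


if __name__ == "__main__":
    a_si = 5.431  # silicon lattice constant in angstroms (illustrative example)
    for strain in (-0.01, 0.0, 0.01):
        angle = strained_two_theta(a_si, (1, 1, 1), strain)
        print(f"strain {strain:+.2f}: 2-theta(111) = {angle:.3f} deg")
```

Tensile strain enlarges the cell and moves peaks to lower angles, while compressive strain does the opposite. A classifier trained only on unstrained patterns would therefore see shifted peaks in strained samples.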
Run Train.ipynb. The notebook shows model training for both single-phase and multi-phase scenarios. In single-phase classification, the model predicts only the material ID. In multi-phase classification, the model predicts both the material ID(s) and the number of phases present.
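The source does not say how the two multi-phase outputs are combined at inference time. One plausible scheme, sketched below as an assumption, is to take the most likely phase count and keep that many top-scoring material IDs. The function `decode_phases` and its input layout are hypothetical and are not the Chem-XRD API.

```python
def decode_phases(id_scores, count_scores):
    """Pick material IDs for one diffraction pattern.

    id_scores:    {material_id: score} from a hypothetical material-ID output
    count_scores: list where index i scores "i + 1 phases present"
    """
    # Most likely number of phases (1-based).
    n_phases = max(range(len(count_scores)), key=count_scores.__getitem__) + 1
    # Keep the n_phases highest-scoring material IDs, best first.
    ranked = sorted(id_scores, key=id_scores.get, reverse=True)
    return ranked[:n_phases]


if __name__ == "__main__":
    ids = {"mat_A": 0.70, "mat_B": 0.60, "mat_C": 0.10}  # made-up scores
    print(decode_phases(ids, [0.2, 0.7, 0.1]))           # two-phase case
```

In the single-phase setting the count output is not needed, and the same function reduces to an argmax when the count scores favor one phase.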
Run Evaluation.ipynb. The notebook shows single-phase and multi-phase classifications with Chem-XRD models on simulated and experimental datasets, reproducing figures in the main text.
Run Interpretation.ipynb. The notebook includes t-SNE visualizations of the training datasets, as well as interpretations of the elemental and structural contributions in the Chem-XRD models, reproducing figures in the main text and the Supplementary Information (SI).