This repository contains the analysis scripts needed to reproduce the LOBSTER database publication.
The following version numbers are used for the workflows:

- `Workflow.ipynb` includes the script to start all LOBSTER computations with pymatgen, fireworks, and atomate.
- Use the `Data_generation/requirements.txt` to create a conda environment with the necessary packages.
- The `Lobster_lightweight_json_generation.ipynb` script generates lightweight LOBSTER JSONs consisting of LobsterPy summarized bonding information, the relevant strongest bonds, Madelung energies of the structures, and atomic charges (refer to Tables 1 and 2 of the manuscript for the description).
- The `Computational_data_generation.ipynb` script stores all the relevant LOBSTER computation files in the form of JSON using the pydantic schema as implemented for atomate2 (refer to Table 3 of the manuscript for the description).
- `Example_data/Lightweight_jsons/` -- path to sample LOBSTER lightweight JSON files
- `Example_data/Computational_data_jsons/` -- path to sample computational JSON files
- All 1520 LOBSTER lightweight JSONs / computational data JSONs can be downloaded here:
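The lightweight JSONs can be read with the standard library alone. A minimal sketch, assuming hypothetical field names (`mp_id`, `madelung_energies`, `charges`) that merely stand in for the actual schema described in Tables 1 and 2 of the manuscript:

```python
import json

# Hypothetical excerpt of a lightweight LOBSTER JSON record; the real
# field names follow the schema in Tables 1 and 2 of the manuscript
# and may differ from this sketch.
example = """
{
  "mp_id": "mp-149",
  "madelung_energies": {"loewdin": -1.02, "mulliken": -1.79},
  "charges": {"loewdin": [0.1, -0.1], "mulliken": [0.2, -0.2]}
}
"""

record = json.loads(example)  # for a real file, use json.load(open(path))
print(record["mp_id"])  # → mp-149
```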
- Use the `Read_data_records/requirements.txt` to create a conda environment with the necessary packages.
- `Read_lobsterpy_data.ipynb` -- this script reads the LobsterPy summarized bonding information JSON files as a Python dictionary (refer to Table 1 of the manuscript for the description).
- `Read_lobsterschema_data.ipynb` -- this script reads the LobsterSchema data as pymatgen objects and consists of all the relevant LOBSTER computation data in the form of a Python dictionary (refer to Table 2 of the manuscript for the description).
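Because the computational-data records are deeply nested, a small helper that enumerates key paths makes an unfamiliar record easier to explore. A sketch with a made-up record; `list_keys` is an illustrative helper, not part of the repository:

```python
import json

def list_keys(obj, prefix=""):
    """Recursively list the key paths in a nested JSON-derived dict."""
    paths = []
    if isinstance(obj, dict):
        for key, value in obj.items():
            path = f"{prefix}/{key}" if prefix else key
            paths.append(path)
            paths.extend(list_keys(value, path))
    return paths

# Made-up record standing in for a LobsterSchema-style JSON.
record = json.loads('{"structure": {"lattice": {}, "sites": []}, "charges": {}}')
print(list_keys(record))  # → ['structure', 'structure/lattice', 'structure/sites', 'charges']
```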
- atomate2 -- install it using `pip install git+https://github.com/materialsproject/atomate2.git@fa603e3cb4c3024b9b12b0d752793a9191d99f8a`
- Download all the computational data files from the following repository links:
- Part 1, Part 2, Part 3, Part 4, Part 5, Part 6, Part 7, Part 8
- Make a directory named `Results` in which to extract all the tar files.
- For example, extract the `Part1.tar` file using the following command: `tar -xf Part1.tar -C ./Results/`
- Repeat the command above to extract all 8 tar files into the `Results` directory.
- This should result in 1520 directories inside `Results`. Each subdirectory is named `mp-xxx`, denoting the Materials Project ID of the compound.
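After extraction, the `mp-xxx` subdirectories can be enumerated with `pathlib`; `list_mp_dirs` below is an illustrative helper, not a repository script:

```python
from pathlib import Path

def list_mp_dirs(results_root):
    """Return the Materials Project IDs found as subdirectories of Results."""
    return sorted(p.name for p in Path(results_root).glob("mp-*") if p.is_dir())

# Hypothetical usage; after extracting all 8 archives there should be
# 1520 entries:
# mp_ids = list_mp_dirs("./Results")
# assert len(mp_ids) == 1520
```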
- You can then use the scripts provided to reproduce our technical validation section results.
- `Charge_spilling_lobster.ipynb` produces the dataframe with charge spillings for the entire dataset and also creates the histograms (as in the manuscript).
- `Charge_spilling_data.pkl` consists of presaved data from the `Charge_spilling_lobster.ipynb` script run (load this to get the plots on the go).
- `Band_overlaps.ipynb` produces the dataframe with deviations from `bandOverlaps.lobster` for the entire dataset.
- `Band_overlaps_data.pkl` consists of presaved data from the `Band_overlaps.ipynb` script run (load this to get the results on the go).
- `Get_plots_band_features_tanimoto.ipynb` produces all the PDOS benchmarking data, saves the pandas dataframes as pickles, and also saves all the plots.
- `lsolobdos.pkl` and `lobdos.pkl` consist of all the data necessary to reproduce the plots (as shown in Figs 4, 5, 6, and 7).
- `Save_pdos_plot_and_data.ipynb` saves the PDOS comparison plots.
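The presaved `.pkl` files (`Charge_spilling_data.pkl`, `Band_overlaps_data.pkl`, `lsolobdos.pkl`, `lobdos.pkl`) are ordinary pickles and can be loaded as sketched below; note that unpickling the saved dataframes presumably requires a compatible pandas installation:

```python
import pickle

def load_presaved(path):
    """Load a presaved results pickle (e.g. Charge_spilling_data.pkl)."""
    with open(path, "rb") as f:
        return pickle.load(f)

# Hypothetical usage:
# charge_spilling = load_presaved("Charge_spilling_data.pkl")
```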
- Download the dash app and its data from 10.5281/zenodo.7795903
- Run the `Band_features.py` script to get a dash app to explore all the s, p, d band feature plots (check out the `-h` options).
- Run the `Check_fingerprints.py` script to get a dash app to visualize all the s, p, d fingerprint plots (check out the `-h` options).
- `BVA_Charge_comparisons.ipynb` produces the results of the charge comparison analysis and the corresponding plots (as shown in Figs 8 and 9).
- `Charge_comp_data.pkl` contains the saved charge comparison data.
- `Coordination_comparisons_BVA.ipynb` produces the results of the coordination environment comparisons.
- `Coordination_comp_data_bva.pkl` contains the saved coordination environment comparison data.
- `Data_topology.ipynb` -- this script extracts and stores the data necessary for Fig 10.
- `Lobster_dataoverview.pkl` contains presaved data ready to be used for generating Fig 10.
- Create a conda environment with Python 3.8 using `conda create -n ML_model python==3.8`
- Activate the newly created `ML_model` environment and install matbench v0.6 using `pip install matbench==0.6` (needed to resolve automatminer package dependency conflicts).
- Then use the `ML_model/requirements.txt` to install all the necessary packages.
- `mpids.csv` -- this file contains the list of Materials Project IDs and the corresponding compositions.
- `featurizer` -- this Python module is used to featurize the LOBSTER lightweight JSONs so that ICOHP data can be used as features for the ML model.
- `Featurize_lobsterpy_jsons.ipynb` -- this script generates LOBSTER features via the featurizer module and saves them as `lobsterpy_featurized_data.csv`.
- `ML_data_with_automatminer.ipynb` -- this script uses the automatminer featurizer to extract matminer features based on composition and structure, and creates the data ready to be used for ML model training (also adds the LOBSTER summary stats data as features) -- `dataforml_automatminer.pkl`.
- `ml_utilities.py` -- this module contains utility functions used for training and evaluating the random forest (RF) regressor models.
- `RF_model.ipynb` -- this script trains and evaluates 2 RF regressor models using a nested CV approach (including and excluding LOBSTER features).
- `Automatminer_rf_ml_model.ipynb` -- this script trains and evaluates RF regression models using the automatminer MatPipe (used to compare with the matbench RF model).
- `exc_icohp` -- this directory contains the model cross-validation evaluation result plot and the feature importance plot.
- `exc_icohp/summary_stats.csv` -- this file contains the summarized stats of the model trained and evaluated using the `RF_model.ipynb` script (excluding LOBSTER features).
- `inc_icohp` -- this directory contains the model cross-validation evaluation result plot and the feature importance plot.
- `inc_icohp/summary_stats.csv` -- this file contains the summarized stats of the model trained and evaluated using the `RF_model.ipynb` script (including LOBSTER features).
- `Plot_summary_results.ipynb` -- this script reads the `summary_stats.csv` of the RF model and visualizes the data from Table 7.
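As a rough illustration of the nested CV approach mentioned for `RF_model.ipynb`: an outer loop estimates generalization error while an inner loop tunes hyperparameters. The notebook itself trains RF regressors; this pure-Python sketch only shows the outer/inner index-splitting logic and is not repository code:

```python
def kfold_indices(n_samples, n_splits):
    """Yield (train, test) index lists for k-fold cross-validation."""
    fold_sizes = [n_samples // n_splits + (1 if i < n_samples % n_splits else 0)
                  for i in range(n_splits)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i not in test]
        yield train, test
        start += size

# Nested CV: for each outer split, run an inner k-fold on the outer-train set.
for outer_train, outer_test in kfold_indices(10, 5):
    for inner_train, inner_val in kfold_indices(len(outer_train), 3):
        pass  # tune hyperparameters here, then refit on outer_train
```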