lamalab-org/PolyBind


PolyBind

PolyBind Overview

PolyBind creates unified embeddings for different polymer representations (PSMILES, BigSMILES, and polymer names) using self-supervised contrastive learning, enabling integration of heterogeneous polymer datasets.

Installation

# Create virtual environment with Python 3.11
uv venv --python 3.11 polybind

# Activate environment
source polybind/bin/activate

# Install dependencies
uv pip install git+https://github.com/lamalab-org/UniPolymer.git

Usage

from polybind_api import PolyBind

# Load the pretrained model by its identifier
polybind = PolyBind("sreekanth22/PolyBind")

# Encode polymers given as PSMILES strings
psmiles_embeddings = polybind.encode_psmiles(["[*]CC[*]", "[*]COC[*]"])
print("PSMILES Embeddings:", psmiles_embeddings)

# Encode polymers given as BigSMILES strings
bigsmiles_embeddings = polybind.encode_bigsmiles(["{<N[Si](C)(C)>}", "{$CC=C(CCCC)C$}"])
print("BigSMILES Embeddings:", bigsmiles_embeddings)

# Encode polymers given by name
polymer_embeddings = polybind.encode_polymer_names(["polyethylene", "polystyrene"])
print("Polymer Name Embeddings:", polymer_embeddings)

Each call returns one embedding per input string. The script prints the embeddings for each representation; they can be used as input features for downstream tasks.
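As one possible downstream use, embeddings from two representations can be compared with cosine similarity. The sketch below is illustrative only: it assumes the returned embeddings can be converted to 2D NumPy arrays of shape (n_items, embedding_dim), which is not stated above, and it uses random stand-in arrays so it runs without the model. The helper cosine_similarity_matrix is a hypothetical name, not part of the PolyBind API.

```python
import numpy as np


def cosine_similarity_matrix(a, b):
    """Return pairwise cosine similarities between the rows of a and the rows of b."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Scale each row to unit length; the floor avoids division by zero.
    a_unit = a / np.maximum(np.linalg.norm(a, axis=1, keepdims=True), 1e-12)
    b_unit = b / np.maximum(np.linalg.norm(b, axis=1, keepdims=True), 1e-12)
    return a_unit @ b_unit.T


# Stand-in arrays (assumption): replace with, for example,
# psmiles_embeddings and polymer_embeddings from the snippet above.
rng = np.random.default_rng(0)
psmiles_demo = rng.normal(size=(2, 8))
names_demo = rng.normal(size=(3, 8))

similarity = cosine_similarity_matrix(psmiles_demo, names_demo)
print(similarity.shape)  # (2, 3): one row per PSMILES entry, one column per name
```

Entry (i, j) of the result compares the i-th row of the first input with the j-th row of the second, so the largest value in a row points to the closest match in the other representation.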

Batch Encoding Multiple Modalities

from polybind_api import PolyBind

polybind = PolyBind("sreekanth22/PolyBind")

# Encode all three modalities in one call
results = polybind.batch_encode(
    psmiles=["[*]CC[*]", "[*]COC[*]"],
    bigsmiles=["{<N[Si](C)(C)>}", "{$CC=C(CCCC)C$}"],
    polymer_names=["polyethylene", "polystyrene"]
)

# results maps each modality to its embeddings
print("Available embeddings:")
for key, embeddings in results.items():
    print(f"  {key}: shape {embeddings.shape}")

batch_encode returns the embeddings for all requested modalities in a single call, keyed by modality, and the loop prints the shape of each. This is more efficient than calling each encoding method separately.
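Because the embeddings are meant to be unified across representations, one way to combine datasets indexed differently is to stack the per-modality embeddings into a single feature matrix. The following is a minimal sketch under stated assumptions: every modality has the same embedding dimension, the values convert to 2D NumPy arrays, and the dictionary keys shown are placeholders (the actual keys returned by batch_encode are not documented above). The helper stack_modalities is a hypothetical name, not part of the PolyBind API.

```python
import numpy as np


def stack_modalities(results):
    """Stack per-modality embeddings row-wise and record each row's modality."""
    keys = sorted(results)  # fixed order so the output is reproducible
    features = np.concatenate(
        [np.asarray(results[key], dtype=float) for key in keys], axis=0
    )
    labels = [key for key in keys for _ in range(len(results[key]))]
    return features, labels


# Stand-in for the dictionary returned by batch_encode (assumption):
# the keys and the embedding dimension of 8 are placeholders.
rng = np.random.default_rng(0)
demo_results = {
    "psmiles": rng.normal(size=(2, 8)),
    "bigsmiles": rng.normal(size=(2, 8)),
    "polymer_names": rng.normal(size=(2, 8)),
}

features, labels = stack_modalities(demo_results)
print(features.shape)  # (6, 8)
print(labels)
```

The labels list keeps track of which representation each row came from, so the stacked matrix can be passed to a single downstream model while the origin of every row remains recoverable.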

Citation

If you use PolyBind in your research, please cite the following paper:

@inproceedings{
kunchapu2025polybind,
title={PolyBind: Effectively Combining Datasets Indexed in Different Representations of Polymers},
author={Sreekanth Kunchapu and Adrian Mirza and Kevin Maik Jablonka},
booktitle={AI for Accelerated Materials Design - NeurIPS 2025},
year={2025},
url={https://openreview.net/forum?id=WkRb273dgP}
}
