Commit 054f174

orionarcher and janosh committed

Finalize readme, docs, and coverage (#100 - #104)

* plot speedup
* change import line
* conf.py docs fix
* restyle TorchSim -> torch-sim
* small typo fix
* tweak err messages
* small trajectory fix and ignore unbatched in codecov
* remove seemingly unneeded Structure/Atoms/PhonopyAtoms stubs in torch_sim/io.py
* set coverage target
* fix toml
* multi line string
* tweak doc strings
* switch away from .png
* move speedup up in readme
* fix broken links and set threshold differently
* docs fix
* delete codevoc threshhold attempt

Co-authored-by: Janosh Riebesell <[email protected]>
1 parent 3242452 commit 054f174

File tree: 12 files changed (+121 −149 lines)
Lines changed: 2 additions & 3 deletions

```diff
@@ -1,7 +1,6 @@
 {
   "ignorePatterns": [
-    {
-      "pattern": "../tutorials/.+.ipynb$"
-    }
+    {"pattern": "../tutorials/.+.ipynb$"},
+    {"pattern": "*github.com/user-attachments*"}
   ]
 }
```

.pre-commit-config.yaml

Lines changed: 2 additions & 2 deletions

```diff
@@ -22,7 +22,7 @@ repos:
       - id: check-case-conflict
       - id: check-symlinks
       - id: check-yaml
-        exclude: ".*\/copilot\/.*"
+        exclude: .*\/copilot\/.*
       - id: destroyed-symlinks
       - id: end-of-file-fixer
       - id: forbid-new-submodules
@@ -34,7 +34,7 @@ repos:
     hooks:
       - id: codespell
         stages: [pre-commit, commit-msg]
-        args: ["--ignore-words-list", "statics,crate,annote,atomate,nd,te,titel,coo,slite,fro"]
+        args: [--ignore-words-list, "statics,crate,annote,atomate,nd,te,titel,coo,slite,fro"]

   - repo: https://github.com/igorshubovych/markdownlint-cli
     rev: v0.44.0
```

README.md

Lines changed: 24 additions & 9 deletions

````diff
@@ -3,18 +3,20 @@
 [![CI](https://github.com/radical-ai/torch-sim/actions/workflows/test.yml/badge.svg)](https://github.com/radical-ai/torch-sim/actions/workflows/test.yml)
 [![codecov](https://codecov.io/gh/radical-ai/torch-sim/branch/main/graph/badge.svg)](https://codecov.io/gh/radical-ai/torch-sim)
 [![This project supports Python 3.11+](https://img.shields.io/badge/Python-3.11+-blue.svg?logo=python&logoColor=white)](https://python.org/downloads)
-[![PyPI](https://img.shields.io/pypi/v/torch-sim?logo=pypi&logoColor=white)](https://pypi.org/project/torch-sim)
+[![PyPI](https://img.shields.io/pypi/v/torch_sim_atomistic?logo=pypi&logoColor=white)](https://pypi.org/project/torch_sim_atomistic)
 [![Zenodo](https://img.shields.io/badge/Zenodo-15127004-blue?logo=Zenodo&logoColor=white)][zenodo]

 [zenodo]: https://zenodo.org/records/15127004

 <!-- help docs find start of prose in readme, DO NOT REMOVE -->
-TorchSim is an next-generation open-source atomistic simulation engine for the MLIP era. By rewriting the core primitives of atomistic simulation in Pytorch, it allows orders of magnitude acceleration of popular machine learning potentials.
+torch-sim is a next-generation open-source atomistic simulation engine for the MLIP
+era. By rewriting the core primitives of atomistic simulation in Pytorch, it allows
+orders of magnitude acceleration of popular machine learning potentials.

 * Automatic batching and GPU memory management allowing significant simulation speedup
-* Support for MACE and Fairchem MLIP models
+* Support for MACE, Fairchem, and SevenNet MLIP models with more in progress
 * Support for classical lennard jones, morse, and soft-sphere potentials
-* Molecular dynamics integration schemes like NVE, NVT Langevin, and NPT langevin
+* Molecular dynamics integration schemes like NVE, NVT Langevin, and NPT Langevin
 * Relaxation of atomic positions and cell with gradient descent and FIRE
 * Swap monte carlo and hybrid swap monte carlo algorithm
 * An extensible binary trajectory writing format with support for arbitrary properties
@@ -24,7 +26,7 @@

 ## Quick Start

-Here is a quick demonstration of many of the core features of TorchSim:
+Here is a quick demonstration of many of the core features of torch-sim:
 native support for GPUs, MLIP models, ASE integration, simple API,
 autobatching, and trajectory reporting, all in under 40 lines of code.

@@ -85,12 +87,25 @@ relaxed_state = ts.optimize(
 print(relaxed_state.energy)
 ```

+## Speedup
+
+torch-sim achieves up to 100x speedup compared to ASE with popular MLIPs.
+
+![Speedup comparison](https://github.com/user-attachments/assets/2ad1d8b0-a7aa-467b-9260-acb76a1ed591)
+
+This figure compares the time per atom of ASE and torch_sim. Time per atom is defined
+as the total time / number of atoms. While ASE can only run a single system of n_atoms
+(on the x axis), torch_sim can run as many systems as will fit in memory. On an H100,
+the max atoms that could fit in memory was 8000 for gemnet, 10000 for MACE, and 2500
+for SevenNet. This metric describes model performance by capturing speed and memory
+usage simultaneously.
+
 ## Installation

 ### PyPI Installation

 ```sh
-pip install torch-sim
+pip install torch-sim-atomistic
 ```

 ### Installing from source
@@ -103,18 +118,18 @@ pip install .

 ## Examples

-To understand how `torch-sim` works, start with the [comprehensive tutorials](https://radical-ai.github.io/torch-sim/user/overview.html) in the documentation.
+To understand how torch-sim works, start with the [comprehensive tutorials](https://radical-ai.github.io/torch-sim/user/overview.html) in the documentation.

 Even more usage examples can be found in the [`examples/`](examples/readme.md) folder.

 ## Core Modules

-The `torch-sim` structured is summarized in the [API reference](https://radical-ai.github.io/torch-sim/reference/index.html) documentation.
+torch-sim's structure is summarized in the [API reference](https://radical-ai.github.io/torch-sim/reference/index.html) documentation.

 ## License

 `torch-sim` is released under an [MIT license](LICENSE).

 ## Citation

-A manuscript is in preparation. Meanwhile, if you use TorchSim in your research, please [cite the Zenodo archive][zenodo].
+A manuscript is in preparation. Meanwhile, if you use torch-sim in your research, please [cite the Zenodo archive][zenodo].
````
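The arithmetic behind the new Speedup section can be sketched in a few lines. This is a hypothetical illustration with invented timings, not a benchmark from the repository: time per atom is total wall time divided by total atoms simulated, so a batched engine that amortizes one forward pass over many systems scores far better than a serial one.

```python
# Hypothetical illustration of the "time per atom" metric from the new
# Speedup section: total wall time divided by the total number of atoms
# simulated. All timings below are invented for demonstration.

def time_per_atom(total_time_s: float, n_atoms_per_system: int, n_systems: int) -> float:
    """Wall-clock seconds spent per simulated atom."""
    return total_time_s / (n_atoms_per_system * n_systems)

# A serial engine (ASE-style) advances one system at a time; a batched
# engine advances many systems in one forward pass for similar wall time.
serial = time_per_atom(total_time_s=10.0, n_atoms_per_system=1000, n_systems=1)
batched = time_per_atom(total_time_s=12.0, n_atoms_per_system=1000, n_systems=50)
print(f"speedup: {serial / batched:.1f}x")  # -> speedup: 41.7x
```

The metric rewards memory efficiency as well as raw speed, since fitting more systems in memory directly lowers the time per atom.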

docs/conf.py

Lines changed: 3 additions & 3 deletions

```diff
@@ -17,14 +17,14 @@

 # -- Project information -----------------------------------------------------

-project = "torch-sim"
+project = "torch-sim-atomistic"
 copyright = "2025, Radical AI"  # noqa: A001
 author = "Abhijeet Gangan, Orion Cohen, Janosh Riebesell"

 # The short X.Y version
-version = importlib.metadata.version("torch-sim")
+version = importlib.metadata.version(project)
 # The full version, including alpha/beta/rc tags
-release = importlib.metadata.version("torch-sim")
+release = importlib.metadata.version(project)

 # -- General configuration ---------------------------------------------------

```
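The conf.py change reads the version from `importlib.metadata` using the `project` name defined once above it, so a future rename only touches one line. A minimal sketch of that pattern; the fallback and the `no-such-distribution` name are hypothetical additions for illustration, not part of the repo's conf.py:

```python
import importlib.metadata

def get_version(dist_name: str, default: str = "0.0.0+unknown") -> str:
    """Read an installed distribution's version, as docs/conf.py does.

    The default is a hypothetical fallback for uninstalled checkouts.
    """
    try:
        return importlib.metadata.version(dist_name)
    except importlib.metadata.PackageNotFoundError:
        return default

# Querying a deliberately nonexistent distribution exercises the fallback.
print(get_version("no-such-distribution"))  # -> 0.0.0+unknown
```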
docs/dev/dev_install.md

Lines changed: 1 addition & 0 deletions

````diff
@@ -81,6 +81,7 @@ python -m http.server -d docs_build
 To locally generate the tutorials, they must be copied to the docs folder,
 converted to `.ipynb` files, and executed. Then the .py files and any generated
 trajectory files must be cleaned up.
+
 ```bash
 cp -r examples/tutorials docs/ && \
 jupytext --set-formats "py:percent,ipynb" docs/tutorials/*.py && \
````

docs/user/overview.md

Lines changed: 1 addition & 1 deletion

```diff
@@ -29,4 +29,4 @@ Learn more in [Understanding Reporting](../tutorials/reporting_tutorial.ipynb)

 Under the hood, `torch-sim` takes a modular functional approach to atomistic simulation. Each integrator or optimizer function, such as `nvt_langevin,` takes in a model and parameters and returns `init` and `update` functions that act on a unique `State.` The state inherits from `SimState` and tracks the fixed and fluctuating parameters of the simulation, such as the `momenta` for NVT or the timestep for FIRE. The runner functions take this basic structure and wrap it in a convenient interface with autobatching and reporting.

-Learn more in [Fundamentals of `torch-sim`](../tutorials/low_level_tutorial.ipynb) and [Hybrid Swap Tutorial](../tutorials/hybrid_swap_tutorial.ipynb)
+Learn more in [Fundamentals of `torch-sim`](../tutorials/low_level_tutorial.ipynb) and [Implementing New Methods](../tutorials/hybrid_swap_tutorial.ipynb)
```

pyproject.toml

Lines changed: 7 additions & 2 deletions

```diff
@@ -1,5 +1,5 @@
 [project]
-name = "torch_sim"
+name = "torch_sim_atomistic"
 version = "0.1.0"
 description = "A pytorch toolkit for calculating material properties using MLIPs"
 authors = [
@@ -87,6 +87,7 @@ ignore = [
   "ERA001", # Found commented-out code
   "FIX002", # Line contains TODO, consider resolving the issue
   "G003", # logging-string-concat
+  "G004", # logging uses f-string
   "INP001", # implicit-namespace-package
   "ISC001", # avoid conflicts with the formatter
   "N803", # Variable name should be lowercase
@@ -118,7 +119,6 @@ pep8-naming.ignore-names = ["get_kT", "kT"]
 "examples/**/*" = ["B018"]
 "examples/tutorials/**/*" = ["ALL"]

-
 [tool.ruff.format]
 docstring-code-format = true

@@ -128,3 +128,8 @@ check-filenames = true
 [tool.pytest]
 addopts = ["--cov-report=term-missing", "--cov=torch_sim", "-v"]
 testpaths = ["tests"]
+
+[tool.coverage.run]
+omit = [
+  "torch_sim/unbatched/*",
+]
```
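The new `[tool.coverage.run]` table drops the unbatched modules from coverage reporting (the "ignore unbatched in codecov" item in the commit message). Coverage.py matches `omit` entries against file paths with fnmatch-style wildcards; the sketch below uses Python's `fnmatch` as a rough stand-in for that matching, with hypothetical file paths:

```python
from fnmatch import fnmatch

# Rough stand-in for coverage.py's omit matching: "*" in an fnmatch
# pattern matches any run of characters, so this pattern excludes
# everything under torch_sim/unbatched/ while keeping other modules.
OMIT_PATTERN = "torch_sim/unbatched/*"

omitted = fnmatch("torch_sim/unbatched/unbatched_integrators.py", OMIT_PATTERN)
kept = fnmatch("torch_sim/autobatching.py", OMIT_PATTERN)
print(omitted, kept)  # -> True False
```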

tests/test_trajectory.py

Lines changed: 2 additions & 2 deletions

```diff
@@ -354,12 +354,12 @@ def test_invalid_dtype_handling(test_file: Path) -> None:
     complex_data = {
         "complex": np.random.default_rng(seed=0).random((10, 3)).astype(np.float16)
     }
-    with pytest.raises(ValueError, match="Unsupported dtype"):
+    with pytest.raises(ValueError, match="Unsupported array.dtype="):
         traj.write_arrays(complex_data, steps=0)

     # Test string data
     string_data = {"strings": np.array([["a", "b", "c"]] * 10)}
-    with pytest.raises(ValueError, match="Unsupported dtype"):
+    with pytest.raises(ValueError, match="Unsupported array.dtype="):
         traj.write_arrays(string_data, steps=0)

     traj.close()
```
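`pytest.raises(..., match=...)` applies `re.search` to the string of the raised exception, so the updated pattern is a regular expression in which the `.` of `array.dtype` acts as a single-character wildcard. A small sketch of that matching; the error message here is hypothetical, standing in for the tweaked message the commit introduced:

```python
import re

# pytest.raises(..., match=pattern) succeeds when re.search(pattern,
# str(excinfo.value)) finds the pattern anywhere in the message.
pattern = "Unsupported array.dtype="
message = "Unsupported array.dtype=float16"  # hypothetical error text

assert re.search(pattern, message) is not None
# The old pattern no longer appears in such a message, which is why the
# test expectations were updated alongside the tweaked error strings.
assert re.search("Unsupported dtype", message) is None
print("match semantics confirmed")
```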

torch_sim/autobatching.py

Lines changed: 19 additions & 24 deletions

```diff
@@ -86,36 +86,36 @@ def _argmax_bins(lst: list[float]) -> int:
     def _revargsort_bins(lst: list[float]) -> list[int]:
         return sorted(range(len(lst)), key=lambda i: -lst[i])

-    isdict = isinstance(items, dict)
+    is_dict = isinstance(items, dict)

     if not hasattr(items, "__len__"):
         raise TypeError("d must be iterable")

-    if not isdict and hasattr(items[0], "__len__"):
+    if not is_dict and hasattr(items[0], "__len__"):
         if weight_pos is not None:
             key = lambda x: x[weight_pos]  # noqa: E731
         if key is None:
             raise ValueError("Must provide weight_pos or key for tuple list")

-        if not isdict and key:
+        if not is_dict and key:
             new_dict = dict(enumerate(items))
             items = {i: key(val) for i, val in enumerate(items)}
-            isdict = True
+            is_dict = True
             is_tuple_list = True
     else:
         is_tuple_list = False

-    if isdict:
+    if is_dict:
         # get keys and values (weights)
         keys_vals = items.items()
         keys = [k for k, v in keys_vals]
         vals = [v for k, v in keys_vals]

         # sort weights decreasingly
-        ndcs = _revargsort_bins(vals)
+        n_dcs = _revargsort_bins(vals)

-        weights = _get_bins(vals, ndcs)
-        keys = _get_bins(keys, ndcs)
+        weights = _get_bins(vals, n_dcs)
+        keys = _get_bins(keys, n_dcs)

         bins = [{}]
     else:
@@ -140,15 +140,15 @@ def _revargsort_bins(lst: list[float]) -> list[int]:

     weights = _get_bins(weights, valid_ndcs)

-    if isdict:
+    if is_dict:
         keys = _get_bins(keys, valid_ndcs)

     # prepare array containing the current weight of the bins
     weight_sum = [0.0]

     # iterate through the weight list, starting with heaviest
     for item, weight in enumerate(weights):
-        if isdict:
+        if is_dict:
             key = keys[item]

         # find candidate bins where the weight might fit
@@ -170,7 +170,7 @@ def _revargsort_bins(lst: list[float]) -> list[int]:
             # open a new bin
             b = len(weight_sum)
             weight_sum.append(0.0)
-            if isdict:
+            if is_dict:
                 bins.append({})
             else:
                 bins.append([])
@@ -180,7 +180,7 @@ def _revargsort_bins(lst: list[float]) -> list[int]:
             b = 0

         # put it in
-        if isdict:
+        if is_dict:
             bins[b][key] = weight
         else:
             bins[b].append(weight)
@@ -234,9 +234,7 @@ def measure_model_memory_forward(state: SimState, model: ModelInterface) -> floa

     logging.info(  # noqa: LOG015
         "Model Memory Estimation: Running forward pass on state with "
-        "%s atoms and %s batches.",
-        state.n_atoms,
-        state.n_batches,
+        f"{state.n_atoms} atoms and {state.n_batches} batches.",
     )
     # Clear GPU memory
     torch.cuda.synchronize()
@@ -324,7 +322,7 @@ def calculate_memory_scaler(
     Args:
         state (SimState): State to calculate metric for, with shape information
             specific to the SimState instance.
-        memory_scales_with (Literal["n_atoms_x_density", "n_atoms"]): Type of metric
+        memory_scales_with ("n_atoms_x_density" | "n_atoms"): Type of metric
             to use. "n_atoms" uses only atom count and is suitable for models that
             have a fixed number of neighbors. "n_atoms_x_density" uses atom count
             multiplied by number density and is better for models with radial cutoffs
@@ -404,12 +402,9 @@ def estimate_max_memory_scaler(

     logging.info(  # noqa: LOG015
         "Model Memory Estimation: Estimating memory from worst case of "
-        "largest and smallest system. Largest system has %s atoms and %s batches, "
-        "and smallest system has %s atoms and %s batches.",
-        max_state.n_atoms,
-        max_state.n_batches,
-        min_state.n_atoms,
-        min_state.n_batches,
+        f"largest and smallest system. Largest system has {max_state.n_atoms} atoms "
+        f"and {max_state.n_batches} batches, and smallest system has "
+        f"{min_state.n_atoms} atoms and {min_state.n_batches} batches.",
     )
     min_state_max_batches = determine_max_batch_size(min_state, model, **kwargs)
     max_state_max_batches = determine_max_batch_size(max_state, model, **kwargs)
@@ -474,7 +469,7 @@ def __init__(
         Args:
             model (ModelInterface): Model to batch for, used to estimate memory
                 requirements.
-            memory_scales_with (Literal["n_atoms", "n_atoms_x_density"]): Metric to use
+            memory_scales_with ("n_atoms" | "n_atoms_x_density"): Metric to use
                 for estimating memory requirements:
                 - "n_atoms": Uses only atom count
                 - "n_atoms_x_density": Uses atom count multiplied by number density
@@ -767,7 +762,7 @@ def __init__(
         Args:
             model (ModelInterface): Model to batch for, used to estimate memory
                 requirements.
-            memory_scales_with (Literal["n_atoms", "n_atoms_x_density"]): Metric to use
+            memory_scales_with ("n_atoms" | "n_atoms_x_density"): Metric to use
                 for estimating memory requirements:
                 - "n_atoms": Uses only atom count
                 - "n_atoms_x_density": Uses atom count multiplied by number density
```
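The renamed helpers (`_revargsort_bins`, `_get_bins`, `is_dict`) belong to a greedy bin-packing routine: sort weights in decreasing order, then drop each item into the first bin with enough remaining capacity, opening a new bin when none fits. A standalone sketch of that strategy follows; it mirrors the idea, not the torch_sim implementation itself, and the weights and capacity are made up:

```python
# Greedy first-fit-decreasing bin packing: the same idea the autobatching
# helpers support when grouping systems under a memory budget.

def pack_bins(weights: list[float], capacity: float) -> list[list[float]]:
    # Reverse argsort, as _revargsort_bins does: indices of weights,
    # heaviest first.
    order = sorted(range(len(weights)), key=lambda i: -weights[i])
    bins: list[list[float]] = []
    totals: list[float] = []
    for i in order:
        w = weights[i]
        for b, total in enumerate(totals):
            if total + w <= capacity:  # first candidate bin where it fits
                bins[b].append(w)
                totals[b] += w
                break
        else:  # no candidate bin: open a new one
            bins.append([w])
            totals.append(w)
    return bins

print(pack_bins([4.0, 8.0, 1.0, 4.0, 2.0, 1.0], capacity=10.0))
# -> [[8.0, 2.0], [4.0, 4.0, 1.0, 1.0]]
```

Sorting heaviest-first keeps the bin count low, which is why the batchers estimate a memory scaler per system before packing.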
