Commit cb9db92

committed
maj
1 parent 2d5d0f6 commit cb9db92

1 file changed, 9 additions and 9 deletions

docs/index.md

Lines changed: 9 additions & 9 deletions
@@ -6,30 +6,30 @@

 ## Introduction

-### PLAID (Physics-Learning AI Datamodel): The Missing Layer for Scientific ML
+### PLAID (Physics-Learning AI Datamodel): the missing layer for Scientific ML

 Keep your simulation data intact, query it intuitively, and transform it seamlessly for deep learning.

 PLAID is an open framework that makes it easy to represent and share datasets from complex physics simulations. It introduces a common standard for describing simulation data and comes with a library to create, explore, and manipulate complex datasets of physics simulations. PLAID was first developed at SafranTech, the research and innovation center of [Safran Group](https://www.safran-group.com/).


-### Why Another Data Model?
+### Why another data model?

-In machine learning, datasets are often treated as flat tables, sequences, or images. Standard frameworks Hugging Face, PyTorch, TensorFlow assume your data is already regular, homogeneous, and columnar. But in scientific and industrial applications, this assumption rarely holds:
+In machine learning, datasets are often treated as flat tables, sequences, or images. Standard frameworks (Hugging Face, PyTorch, TensorFlow) assume your data is already regular, homogeneous, and columnar. But in scientific and industrial applications, this assumption rarely holds:

 - Simulations produce hierarchical, multi-zone data.
 - Fields have heterogeneous shapes, types, and metadata.
-- Implicit conventions vary from one simulation to another.
+- Implicit conventions may vary from one simulation to another.

 Traditional ML datasets are not designed to handle this complexity efficiently. Flattening, padding, or converting these structures into a standard tabular format can be error-prone, memory-intensive, and slow, and it often destroys critical information about the underlying physical structure.

-PLAID fills this gap by sitting upstream in the ML pipeline, bridging raw scientific data and ML-ready formats:
+PLAID fills this gap by sitting *upstream* in the ML pipeline, bridging raw scientific data and ML-ready formats, including graph-based ones like PyTorch Geometric (PyG):

-1. Capture the full structure: PLAID preserves hierarchical, multi-field, multi-zone data, including metadata, defaults, and units.
-2. Simplify access: Intuitive APIs let you query fields, arrays, and derived quantities without flattening or rewriting your trees.
-3. Prepare for ML: When needed, PLAID can produce ML-ready datasets (PyTorch, PyG, or Hugging Face style), while keeping memory and computation efficient.
+1. Capture the full structure: PLAID preserves hierarchical, multi-field, multi-zone data, including metadata.
+2. Simplify access: intuitive APIs let you query fields, arrays, and derived quantities without flattening or rewriting your trees.
+3. Prepare for ML: PLAID can generate PyTorch datasets, Hugging Face datasets, or PyG graph objects, so batching and training pipelines work seamlessly, while keeping memory and computation efficient.

-In short: PLAID is not “just another dataset format.” It is a scientific data management layer, designed for the complex, heterogeneous, high-dimensional world of physics-based simulations, where preparing your data for ML is as important as the model itself.
+In short: PLAID is not “just another dataset format.” It is a scientific data management layer, designed for the complex, heterogeneous, high-dimensional world of physics-based simulations, where preparing your data for ML (whether columnar or graph-structured) is as important as the model itself.

 ## Open source
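
The changed text says that padding heterogeneous simulation data into a regular tabular layout can be memory-intensive. The sketch below illustrates that point only. It does not use the PLAID library, and the node counts and function names are hypothetical, chosen to show how much storage padding can waste when mesh sizes differ between samples.

```python
# Hypothetical node counts for three simulation samples on different meshes.
# These numbers are illustrative and do not come from any real dataset.
node_counts = [1_200, 45_000, 310]


def padded_size(counts):
    """Values stored when every sample is padded to the largest mesh."""
    return len(counts) * max(counts)


def ragged_size(counts):
    """Values stored when each sample keeps its own native size."""
    return sum(counts)


padded = padded_size(node_counts)
ragged = ragged_size(node_counts)

# Fraction of the padded array that holds padding rather than data.
wasted_fraction = 1 - ragged / padded

print(f"padded: {padded} values, ragged: {ragged} values")
print(f"fraction wasted on padding: {wasted_fraction:.1%}")
```

With these made-up sizes, roughly two thirds of the padded array is filler, which is the kind of overhead a structure-preserving layer is meant to avoid.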
