update

casenave · casenave · commit cb9db927b115 · 2025-08-24T08:59:39.000+02:00
diff --git a/docs/index.md b/docs/index.md
@@ -6,30 +6,30 @@
 
 ## Introduction
 
-### PLAID (Physics-Learning AI Datamodel): The Missing Layer for Scientific ML
+### PLAID (Physics-Learning AI Datamodel): the missing layer for Scientific ML
 
 Keep your simulation data intact, query it intuitively, and transform it seamlessly for deep learning.
 
 PLAID is an open framework that makes it easy to represent and share datasets from complex physics simulations. It introduces a common standard for describing simulation data and comes with a library to create, explore, and manipulate complex datasets of physics simulations. PLAID was first developed at SafranTech, the research and innovation center of [Safran Group](https://www.safran-group.com/).
 
 
-### Why Another Data Model?
+### Why another data model?
 
-In machine learning, datasets are often treated as flat tables, sequences, or images. Standard frameworks — Hugging Face, PyTorch, TensorFlow — assume your data is already regular, homogeneous, and columnar. But in scientific and industrial applications, this assumption rarely holds:
+In machine learning, datasets are often treated as flat tables, sequences, or images. Standard frameworks (Hugging Face, PyTorch, TensorFlow) assume your data is already regular, homogeneous, and columnar. But in scientific and industrial applications, this assumption rarely holds:
 
 - Simulations produce hierarchical, multi-zone data.
 - Fields have heterogeneous shapes, types, and metadata.
-- Implicit conventions vary from one simulation to another.
+- Implicit conventions may vary from one simulation to another.
 
 Traditional ML datasets are not designed to handle this complexity efficiently. Flattening, padding, or converting these structures into a standard tabular format can be error-prone, memory-intensive, and slow, and it often destroys critical information about the underlying physical structure.
 
-PLAID fills this gap by sitting upstream in the ML pipeline, bridging raw scientific data and ML-ready formats:
+PLAID fills this gap by sitting *upstream* in the ML pipeline, bridging raw scientific data and ML-ready formats, including graph-based ones like PyTorch Geometric (PyG):
 
-1. Capture the full structure: PLAID preserves hierarchical, multi-field, multi-zone data, including metadata, defaults, and units.
-2. Simplify access: Intuitive APIs let you query fields, arrays, and derived quantities without flattening or rewriting your trees.
-3. Prepare for ML: When needed, PLAID can produce ML-ready datasets (PyTorch, PyG, or Hugging Face style), while keeping memory and computation efficient.
+1. Capture the full structure: PLAID preserves hierarchical, multi-field, multi-zone data, including metadata.
+2. Simplify access: intuitive APIs let you query fields, arrays, and derived quantities without flattening or rewriting your trees.
+3. Prepare for ML: PLAID can generate PyTorch datasets, Hugging Face datasets, or PyG graph objects that plug into standard batching and training pipelines, while keeping memory and computation efficient.
 
-In short: PLAID is not “just another dataset format.” It is a scientific data management layer, designed for the complex, heterogeneous, high-dimensional world of physics-based simulations, where preparing your data for ML is as important as the model itself.
+In short: PLAID is not “just another dataset format.” It is a scientific data management layer, designed for the complex, heterogeneous, high-dimensional world of physics-based simulations, where preparing your data for ML (whether columnar or graph-structured) is as important as the model itself.
 
 ## Open source