diff --git a/.cursor/logging.mdc b/.cursor/logging.mdc
new file mode 100644
index 0000000..d743bfb
--- /dev/null
+++ b/.cursor/logging.mdc
@@ -0,0 +1,23 @@
+---
+description: Logging should be actionable, structured, and follow professional practices
+alwaysApply: true
+---
+
+- Never log and raise at the same time. Let logs occur where the exception is handled.
+- Use `.bind()` to attach contextual fields like `job_id`, `trainer`, `dataset`, etc.
+- Propagate the bound logger across modules instead of re-creating it.
+- Use semantic log levels:
+  - `debug`: Diagnostic/internal details, not printed in prod
+  - `info`: Expected events (training started, dataset loaded, etc.)
+  - `warning`: Unexpected but handled (e.g. missing columns)
+  - `error`: Real failures that may abort a process
+- Avoid logs like `"Starting function"` — instead log **what happened**, with context.
+- Avoid excessive logging in loops. Log summary or batch progress only.
+- Enable JSON structured logs only in production environments.
+- Keep logs relevant to external observability, not internal dev experimentation.
+
+**Checklist for evaluating a log:**
+- Does it convey a real event, not just function entry?
+- Is it at the correct level (not everything is warning)?
+- Is the logger properly bound with job/task context?
+- Can this log help diagnose or monitor the system?
diff --git a/.cursor/modular-design-principles.mdc b/.cursor/modular-design-principles.mdc
new file mode 100644
index 0000000..ef14c0d
--- /dev/null
+++ b/.cursor/modular-design-principles.mdc
@@ -0,0 +1,19 @@
+---
+description: Enforce modular design with clear responsibility boundaries
+alwaysApply: true
+---
+
+- Follow **Single Responsibility Principle (SRP)**: one reason to change per module/class/function.
+- Avoid **over-engineering**: modularize only when it improves clarity, reuse or testability.
+- Group related logic in packages: `data/`, `services/`, `utils/`, `models/`, etc.
+- Keep modules cohesive but not overly fragmented.
+- Don't split code into separate files unless they evolve independently or have distinct lifecycles.
+
+**Checklist for evaluating modularity:**
+- Can the module be reused independently?
+- Does it hide internal details behind a clean interface?
+- Would adding more features bloat its logic?
+
+Example:
+✅ `email_service.py` sends emails
+🚫 `email_sender.py`, `email_templates.py`, `email_config.py` if too small and tightly coupled
diff --git a/.cursor/structured-import-order.mdc b/.cursor/structured-import-order.mdc
new file mode 100644
index 0000000..edb2a40
--- /dev/null
+++ b/.cursor/structured-import-order.mdc
@@ -0,0 +1,56 @@
+---
+description: Enforce structured and absolute import order for Python files
+globs:
+  - "**/*.py"
+alwaysApply: true
+---
+
+Group, order, and format Python imports following these conventions:
+
+### ✅ Use **absolute imports only**:
+Always write imports using the full path from the project root.
+**Avoid** relative imports like `from .module import foo` or `from ..utils import bar`.
+
+- ✅ `from my_project.utils import foo`
+- ❌ `from .utils import foo`
+- ❌ `from ..submodule import bar`
+
+This ensures clarity, consistency across refactors, and better compatibility with tools like linters, IDEs, and packaging systems.
+
+---
+
+### 📚 Import grouping and ordering:
+
+Organize imports into **three groups**, separated by a blank line:
+
+1. **Standard Library Imports**
+   e.g., `import os`, `from datetime import datetime`
+
+2. **Third-Party Library Imports**
+   e.g., `import numpy as np`, `from sqlalchemy import Column`
+
+3. **Local Application/Library Imports**
+   e.g., `from my_project.utils import helper_function`
+
+---
+
+### 🔤 Within each group:
+
+- First: `import module` (in alphabetical order)
+- Then: `from module import name` (in alphabetical order)
+
+---
+
+### ✅ Example:
+
+```python
+import os
+import sys
+from datetime import datetime
+
+import numpy as np
+import pandas as pd
+from sqlalchemy import Column, Integer
+
+import my_project
+from my_project.utils import helper_function
diff --git a/.cursor/type-hinting.mdc b/.cursor/type-hinting.mdc
new file mode 100644
index 0000000..dff6fbb
--- /dev/null
+++ b/.cursor/type-hinting.mdc
@@ -0,0 +1,28 @@
+---
+description: Enforce strict type annotations using Python 3.12+ standards
+globs:
+  - "**/*.py"
+alwaysApply: true
+---
+
+- Always annotate:
+  - Function parameters and return types, even for `None`, `bool`, `str`, etc.
+  - Variable declarations (including attributes and loop vars) when static typing helps comprehension.
+  - Lambda expressions when assigned or passed as arguments.
+- **Avoid** `Any`, `Optional`, `Union`, `List`, `Dict`, etc. unless justified.
+- **Prefer** Python 3.12+ standard generics:
+  - Use `list[str]`, `dict[str, int]`, `tuple[str, int]`
+  - Avoid `List`, `Dict`, `Tuple` from `typing`
+- Use `|` (pipe) syntax for unions:
+  - Use `str | None`
+  - Avoid `Optional[str]`
+- Use `Self` and `classmethod`/`staticmethod` hints as per [PEP 673](https://peps.python.org/pep-0673/)
+
+**Examples:**
+
+```python
+def greet(name: str | None = None) -> str:
+    return f"Hello, {name or 'World'}"
+
+def parse(data: str) -> dict[str, int]:
+    ...
diff --git a/README.md b/README.md
index df8b20d..3c580e4 100755
--- a/README.md
+++ b/README.md
@@ -116,6 +116,11 @@ This project uses [structlog](https://www.structlog.org/en/stable/) for structur
 Structured logs make it easier to parse, search, and analyze logs in production systems, especially when using centralized logging tools like Loki, ELK, or Datadog.
+## IDE
+
+This project includes [Cursor](https://cursor.com/) rules under `.cursor` that help when writing code. Using them is optional.
+
+
 ### Examples
 1. Pass context information as key word arguments