
@rkritika1508 (Collaborator) commented Dec 3, 2025

Summary

Target issue is #PLEASE_TYPE_ISSUE_NUMBER
Explain the motivation for making this change. What existing problem does the pull request solve?

Checklist

Before submitting a pull request, please make sure you have completed and checked off these tasks.

  • Ran fastapi run --reload app/main.py or docker compose up in the repository root and tested the changes.
  • If you fixed a bug or added code, made sure it is tested and covered by test cases.

Notes

Please add any other information the reviewer needs here.

Summary by CodeRabbit

  • New Features

    • Added safety guardrails to validate and protect user inputs and model outputs
    • Implemented automatic detection and redaction of offensive language
    • Added automatic detection and anonymization of sensitive personal information
    • Introduced multi-language content validation support (English and Hindi)
    • Added configurable banned word list filtering
  • Tests

    • Added comprehensive test coverage for safety validation features
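The offensive-language redaction summarized above can be sketched with the standard library. This is a minimal sketch under stated assumptions, not the PR's LexicalSlur code: the PR strips emoji and reads its word list from a curated CSV, whereas this sketch approximates emoji removal by dropping Unicode "Symbol, other" characters, and SLURS, normalize and redact_slurs are hypothetical names.

```python
import re
import string
import unicodedata

# Hypothetical placeholder list; the PR loads its list from a curated CSV file.
SLURS = {"badword", "slurword"}
REDACTION = "[REDACTED_SLUR]"


def normalize(text: str) -> str:
    """Approximate the described normalization: drop emoji, strip punctuation, lowercase."""
    # Assumption: treat Unicode category "So" (Symbol, other) as emoji.
    no_emoji = "".join(ch for ch in text if unicodedata.category(ch) != "So")
    no_punct = no_emoji.translate(str.maketrans("", "", string.punctuation))
    # Collapse whitespace left behind by removed characters.
    return re.sub(r"\s+", " ", no_punct).strip().lower()


def redact_slurs(text: str, slurs: set[str] = SLURS) -> tuple[str, list[str]]:
    """Return the normalized text with listed words replaced, plus the words found."""
    tokens = normalize(text).split()
    found = [tok for tok in tokens if tok in slurs]
    redacted = [REDACTION if tok in slurs else tok for tok in tokens]
    return " ".join(redacted), found
```

Because matching happens on normalized tokens, this sketch also returns normalized text (lowercased, punctuation removed); the PR description does not say whether the real validator preserves the original casing.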



coderabbitai bot commented Dec 3, 2025

Caution

Review failed

The pull request is closed.

Walkthrough

Introduces a comprehensive guardrails safety system for content validation. Adds configuration models, a GuardrailsEngine orchestrator that builds and executes validators from config, multiple validator implementations (lexical slur detection, PII anonymization, ban lists), language detection utilities, and corresponding test coverage. Includes hub validator loader infrastructure and new project dependencies.
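The config-driven flow in the walkthrough (build validators from config, then run them) can be illustrated with a small stdlib sketch. REGISTRY, build_pipeline, Engine and the two toy validators are hypothetical; the "type" key stands in for the discriminator field of ValidatorConfigItem, and only the method names run_input_validators and run_output_validators come from this PR. The real engine composes Guardrails validators into a Guard rather than chaining plain functions.

```python
from dataclasses import dataclass, field
from typing import Callable

Validator = Callable[[str], str]


def ban_list(cfg: dict) -> Validator:
    """Toy validator: replace configured words (case-insensitive) with a marker."""
    banned = {w.lower() for w in cfg.get("banned_words", [])}

    def run(text: str) -> str:
        return " ".join("[BANNED]" if w.lower() in banned else w for w in text.split())

    return run


def max_length(cfg: dict) -> Validator:
    """Toy validator: truncate text to cfg["limit"] characters."""
    limit = int(cfg["limit"])
    return lambda text: text[:limit]


# The "type" value selects the builder, like the tag of a discriminated union.
REGISTRY: dict[str, Callable[[dict], Validator]] = {
    "ban_list": ban_list,
    "max_length": max_length,
}


def build_pipeline(items: list[dict]) -> list[Validator]:
    """Instantiate validators in config order; reject unknown tags early."""
    pipeline = []
    for item in items:
        kind = item.get("type")
        if kind not in REGISTRY:
            raise ValueError(f"unknown validator type: {kind!r}")
        pipeline.append(REGISTRY[kind](item))
    return pipeline


@dataclass
class Engine:
    config: dict
    input_pipeline: list = field(init=False)
    output_pipeline: list = field(init=False)

    def __post_init__(self) -> None:
        self.input_pipeline = build_pipeline(self.config.get("input", []))
        self.output_pipeline = build_pipeline(self.config.get("output", []))

    @staticmethod
    def _run(pipeline: list, text: str) -> str:
        # Each validator receives the (possibly fixed) output of the previous one.
        for validator in pipeline:
            text = validator(text)
        return text

    def run_input_validators(self, text: str) -> str:
        return self._run(self.input_pipeline, text)

    def run_output_validators(self, text: str) -> str:
        return self._run(self.output_pipeline, text)
```

Keying builders on the discriminator keeps the engine itself unchanged when a validator is added: a new config class (here, a new builder) is registered instead of adding another branch.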

Changes

Cohort / File(s) Summary
Configuration Types
backend/app/safety/guardrail_config.py
Defines ValidatorConfigItem as a discriminated union over three validator config classes; introduces GuardrailConfig and GuardrailConfigRoot models for input/output validation pipelines.
Engine & Orchestration
backend/app/safety/guardrails_engine.py
Implements GuardrailsEngine class that initializes from GuardrailConfigRoot, builds input/output guards via validator instantiation, and exposes run_input_validators and run_output_validators methods.
Validator Base & Constants
backend/app/safety/validators/base_validator_config.py, backend/app/safety/validators/constants.py
Establishes BaseValidatorConfig with an on_fail action and a class variable for validator references; defines module constants for slur filenames and language/label/score keys.
Hub Validator Loader
backend/app/safety/validators/hub_loader.py
Provides hub validator integration: HUB_VALIDATORS mapping, functions to check importability, install validators via Guardrails CLI, and dynamically load validator classes post-installation.
Lexical Slur Validator
backend/app/safety/validators/lexical_slur.py
Implements LexicalSlur validator with SlurSeverity enum and text normalization (emoji removal, punctuation stripping, lowercasing); detects toxic words and redacts with [REDACTED_SLUR]; includes LexicalSlurSafetyValidatorConfig for configuration.
PII Remover Validator
backend/app/safety/validators/pii_remover.py
Implements PIIRemover validator integrating Presidio for English PII anonymization with language detection branching (English/Hinglish paths); includes PIIRemoverSafetyValidatorConfig with entity types and threshold tuning.
Ban List Validator
backend/app/safety/validators/ban_list_safety_validator_config.py
Defines BanListSafetyValidatorConfig extending BaseValidatorConfig with banned_words list and post_init hook to lazily load hub validator class via ensure_hub_validator_installed and load_hub_validator_class.
Language Detector
backend/app/safety/utils/language_detector.py
Wraps XLM-RoBERTa language classification model via Hugging Face pipeline; provides cached predict method with normalization for Hindi labels and convenience methods is_hindi and is_english.
Guardrails Engine Tests
backend/app/tests/safety/test_guardrails_engine.py
Tests GuardrailsEngine initialization and validation runs with uli_slur_match, ban_list, and pii_remover validators; verifies slur redaction and PII anonymization in validated outputs.
Lexical Slur Validator Tests
backend/app/tests/safety/validators/test_lexical_slurs.py
Tests LexicalSlur validator with temporary slur CSV fixtures; covers severity filtering, text normalization, slur detection, and redaction across multiple language and severity configurations.
PII Remover Validator Tests
backend/app/tests/safety/validators/test_pii_remover.py
Tests PIIRemover initialization, validation paths (English and Hinglish), language detection integration, entity_types configuration, and Presidio mocking for anonymization outcomes.
Dependencies
backend/pyproject.toml
Adds runtime dependencies: guardrails-ai (≥0.7.0), emoji, ftfy, presidio_analyzer, presidio_anonymizer, transformers, and torch.
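The hub loader's check-then-install flow might look roughly like this. importlib.util.find_spec and subprocess.run are standard library; the HUB_VALIDATORS entry (the hub URI, the guardrails.hub module path and the BanList class name) is an assumption about how the PR maps validator types, and the runner parameter is a hypothetical hook so the install step can be exercised without touching the network.

```python
import importlib
import importlib.util
import subprocess

# Hypothetical mapping: validator type -> (hub URI, module path, class name).
HUB_VALIDATORS = {
    "ban_list": ("hub://guardrails/ban_list", "guardrails.hub", "BanList"),
}


def is_importable(module_path: str) -> bool:
    """True if the module can be located; find_spec imports parent packages first."""
    try:
        return importlib.util.find_spec(module_path) is not None
    except (ImportError, ValueError):
        # A missing parent package raises ModuleNotFoundError instead of returning None.
        return False


def ensure_hub_validator_installed(validator_type: str, runner=subprocess.run) -> None:
    """Install via the Guardrails CLI unless the validator class is already importable."""
    uri, module_path, class_name = HUB_VALIDATORS[validator_type]
    if is_importable(module_path) and hasattr(importlib.import_module(module_path), class_name):
        return
    runner(["guardrails", "hub", "install", uri], check=True)


def load_hub_validator_class(validator_type: str) -> type:
    """Import and return the validator class after installation."""
    _, module_path, class_name = HUB_VALIDATORS[validator_type]
    importlib.invalidate_caches()  # let the import system see newly installed packages
    return getattr(importlib.import_module(module_path), class_name)
```

One caveat this sketch does not handle: if the target module was already imported before installation, import_module returns the cached module, so a reload may be needed before the new class attribute appears.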

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant GuardrailsEngine
    participant HubLoader
    participant LanguageDetector
    participant Validators as Validators<br/>(Lexical Slur,<br/>Ban List, PII)
    participant ExternalServices as External<br/>(Guardrails Hub,<br/>Presidio, HF Model)

    User->>GuardrailsEngine: init(GuardrailConfigRoot)
    activate GuardrailsEngine
    GuardrailsEngine->>HubLoader: ensure_hub_validator_installed(type)
    activate HubLoader
    HubLoader->>HubLoader: is_importable(module_path)?
    alt Not Installed
        HubLoader->>ExternalServices: guardrails hub install
        ExternalServices-->>HubLoader: (installed)
    end
    HubLoader-->>GuardrailsEngine: (ready)
    deactivate HubLoader
    GuardrailsEngine->>HubLoader: load_hub_validator_class(type)
    HubLoader-->>GuardrailsEngine: validator_class
    GuardrailsEngine->>Validators: instantiate validators
    Validators-->>GuardrailsEngine: validator instances
    GuardrailsEngine-->>User: GuardrailsEngine ready
    deactivate GuardrailsEngine

    User->>GuardrailsEngine: run_input_validators(text)
    activate GuardrailsEngine
    GuardrailsEngine->>LanguageDetector: predict(text)
    activate LanguageDetector
    LanguageDetector->>ExternalServices: XLM-RoBERTa inference
    ExternalServices-->>LanguageDetector: lang_label
    LanguageDetector-->>GuardrailsEngine: {language, score}
    deactivate LanguageDetector
    
    rect rgb(200, 220, 255)
    note right of GuardrailsEngine: Run validators based on language
    end
    GuardrailsEngine->>Validators: run(text)
    alt Language == Hindi
        Validators->>Validators: Hinglish path (lexical slur)
    else Language == English
        Validators->>ExternalServices: Presidio anonymize (PII)
        ExternalServices-->>Validators: anonymized_text
    end
    Validators-->>GuardrailsEngine: validated_output
    GuardrailsEngine-->>User: validated_result
    deactivate GuardrailsEngine
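The language branch in the diagram can be sketched as a cached prediction followed by routing. To keep it runnable without model weights, classify is a hypothetical stand-in that flags Devanagari script instead of calling the XLM-RoBERTa pipeline; the label aliases in HINDI_LABELS, the Prediction tuple and the pass-through for other languages are assumptions, not the PR's behaviour.

```python
import re
from functools import lru_cache
from typing import NamedTuple

# Assumed aliases a classifier might emit for Hindi; all normalized to "hi".
HINDI_LABELS = {"hi", "hin", "hindi"}
DEVANAGARI = re.compile(r"[\u0900-\u097F]")


class Prediction(NamedTuple):
    language: str
    score: float


def classify(text: str) -> tuple[str, float]:
    """Hypothetical stand-in for the XLM-RoBERTa pipeline: returns (label, score)."""
    return ("Hindi", 0.99) if DEVANAGARI.search(text) else ("en", 0.90)


@lru_cache(maxsize=1024)
def predict(text: str) -> Prediction:
    """Cached prediction; an immutable result is safe to share between callers."""
    label, score = classify(text)
    label = label.strip().lower()
    return Prediction("hi" if label in HINDI_LABELS else label, score)


def route_by_language(text: str, anonymize, hinglish_path) -> str:
    """Mirror the diagram: Hindi -> Hinglish path, English -> PII anonymization."""
    language = predict(text).language
    if language == "hi":
        return hinglish_path(text)
    if language == "en":
        return anonymize(text)
    return text  # Assumption: other languages pass through unchanged.
```

Caching with lru_cache means repeated inputs skip inference entirely; returning a NamedTuple rather than a dict keeps the shared cached value immutable.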

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

  • Text processing logic: Extensive normalization in lexical_slur.py (emoji removal, punctuation handling, lowercasing) and string redaction patterns require careful validation
  • External service integration: Presidio AnalyzerEngine and AnonymizerEngine integration paths in pii_remover.py; language-branching logic (English vs. Hinglish) needs verification
  • Hub validator mechanics: Lazy loading and installation via Guardrails CLI in hub_loader.py and BanListSafetyValidatorConfig post_init; import resolution logic warrants attention
  • Language detection: ML model caching with lru_cache and label normalization in language_detector.py; ensure model initialization and scoring are correct
  • GuardrailsEngine initialization: Validator instantiation pattern via model_dump and get_validator flow; Guard composition with use_many requires scrutiny
  • Configuration and discriminated unions: ValidatorConfigItem with discriminator field usage across multiple config classes; serialization/deserialization paths
  • Test coverage comprehensiveness: Multiple test files with mocking strategies (Presidio mocks, temp CSV fixtures) should align with implementation behavior
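The two mocking strategies listed above (temporary CSV fixtures and mocked Presidio engines) can be sketched as plain pytest-style tests. The CSV columns (label, severity), the SlurSeverity members and load_slurs/redact_pii are hypothetical; the analyze(text=..., language=...) and anonymize(text=..., analyzer_results=...).text calls follow the shape of Presidio's AnalyzerEngine/AnonymizerEngine API, but both engines are replaced by MagicMock objects so no models are loaded.

```python
import csv
import tempfile
from enum import Enum
from pathlib import Path
from unittest import mock


class SlurSeverity(Enum):  # enum name from the PR; these members are assumed
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


ORDER = [SlurSeverity.LOW, SlurSeverity.MEDIUM, SlurSeverity.HIGH]


def load_slurs(path: Path, min_severity: SlurSeverity) -> set[str]:
    """Read a label,severity CSV and keep entries at or above min_severity."""
    threshold = ORDER.index(min_severity)
    with path.open(newline="", encoding="utf-8") as fh:
        return {
            row["label"].strip().lower()
            for row in csv.DictReader(fh)
            if ORDER.index(SlurSeverity(row["severity"].strip().lower())) >= threshold
        }


def redact_pii(text: str, analyzer, anonymizer) -> str:
    """English PII path: analyze, then anonymize (engines are injected)."""
    results = analyzer.analyze(text=text, language="en")
    return anonymizer.anonymize(text=text, analyzer_results=results).text


def test_severity_filter_with_temp_csv() -> None:
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "slurs.csv"
        path.write_text("label,severity\nmild,low\nharsh,high\n", encoding="utf-8")
        assert load_slurs(path, SlurSeverity.HIGH) == {"harsh"}
        assert load_slurs(path, SlurSeverity.LOW) == {"mild", "harsh"}


def test_pii_path_with_mocked_presidio() -> None:
    analyzer, anonymizer = mock.MagicMock(), mock.MagicMock()
    analyzer.analyze.return_value = ["fake-result"]
    anonymizer.anonymize.return_value.text = "Call <PHONE_NUMBER>"
    assert redact_pii("Call 98765 43210", analyzer, anonymizer) == "Call <PHONE_NUMBER>"
    # The analyzer output must be forwarded unchanged to the anonymizer.
    anonymizer.anonymize.assert_called_once_with(
        text="Call 98765 43210", analyzer_results=["fake-result"]
    )
```

Injecting the engines as parameters is what makes the mock-based test possible without patching module globals; the PR's tests may instead patch Presidio classes directly.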

Suggested labels

enhancement, ready-for-review

Suggested reviewers

  • avirajsingh7
  • AkhileshNegi
  • nishika26

Poem

🐰 A guardrail here, a slur blocked there,
Presidio cleanses with utmost care!
PII removed, toxins redacted fast,
Language-aware safety holds steadfast!
Config flows true through our engines' core,
Now users' content's protected more! 🛡️


📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 17f427e and 6639589.

⛔ Files ignored due to path filters (2)
  • backend/app/safety/validators/lexical_slur/curated_slurlist_hi_en.csv is excluded by !**/*.csv
  • backend/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (13)
  • backend/app/safety/guardrail_config.py (1 hunks)
  • backend/app/safety/guardrails_engine.py (1 hunks)
  • backend/app/safety/utils/language_detector.py (1 hunks)
  • backend/app/safety/validators/ban_list_safety_validator_config.py (1 hunks)
  • backend/app/safety/validators/base_validator_config.py (1 hunks)
  • backend/app/safety/validators/constants.py (1 hunks)
  • backend/app/safety/validators/hub_loader.py (1 hunks)
  • backend/app/safety/validators/lexical_slur.py (1 hunks)
  • backend/app/safety/validators/pii_remover.py (1 hunks)
  • backend/app/tests/safety/test_guardrails_engine.py (1 hunks)
  • backend/app/tests/safety/validators/test_lexical_slurs.py (1 hunks)
  • backend/app/tests/safety/validators/test_pii_remover.py (1 hunks)
  • backend/pyproject.toml (1 hunks)
