
Conversation

@pablohashescobar (Member) commented Sep 11, 2025

Description

  • Implement cleanup for old webhook logs
  • Adjust schedule times for existing cleanup tasks
  • Reduce batch size from 1000 to 500

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • Improvement (non-breaking change that improves existing functionality)

Test Scenarios

  • Validate that the cleanup task does not trigger a read-ops error
  • Validate the newly added webhook log cleanup task

References

WEB-4873

Summary by CodeRabbit

  • New Features

    • Automated daily cleanup of webhook logs to manage storage and keep activity history lean.
    • Automated daily cleanup of expired export links for improved data hygiene.
  • Chores

    • Adjusted schedules for daily cleanup jobs to run earlier for smoother maintenance windows.
    • Tuned cleanup batch size to improve reliability and reduce resource spikes during background processing.

- Implement cleanup for old webhook logs
- Add periodic Celery task for webhook log deletion
- Adjust schedule times for existing cleanup tasks
- Reduce batch size from 1000 to 500
Copilot AI review requested due to automatic review settings, September 11, 2025 10:00
coderabbitai bot (Contributor) commented Sep 11, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Adds webhook log cleanup to the background cleanup workflow, decreases batch size from 1000 to 500, and updates Celery beat to adjust existing schedules and add daily tasks for webhook log cleanup and exporter history cleanup.

Changes

Cleanup task enhancements — apps/api/plane/bgtasks/cleanup_task.py
  Added transform_webhook_log(record: Dict), get_webhook_logs_queryset(), and the Celery task delete_webhook_logs() to hard-delete old webhook logs; reduced BATCH_SIZE from 1000 to 500; integrated WebhookLog into the existing cleanup pipeline.

Celery schedule updates — apps/api/plane/celery.py
  Adjusted daily run times for three existing cleanup tasks; added two new scheduled entries: delete_webhook_logs at 03:30 UTC and delete_old_s3_link at 03:45 UTC.
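
For orientation, a minimal sketch of what the two new beat entries could look like in apps/api/plane/celery.py. The webhook entry key and the standalone app setup are assumptions for illustration; only the task paths and UTC times come from the walkthrough:

from celery import Celery
from celery.schedules import crontab

app = Celery("plane")  # illustrative; the project defines its own app instance

# Entry keys here are assumptions; task paths and times per the walkthrough.
app.conf.beat_schedule.update(
    {
        "check-every-day-to-delete-webhook-logs": {
            "task": "plane.bgtasks.cleanup_task.delete_webhook_logs",
            "schedule": crontab(hour=3, minute=30),  # UTC 03:30
        },
        "check-every-day-to-delete-exporter-history": {
            "task": "plane.bgtasks.exporter_expired_task.delete_old_s3_link",
            "schedule": crontab(hour=3, minute=45),  # UTC 03:45
        },
    }
)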

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor Cron as Celery Beat (UTC)
  participant Celery as Celery Worker
  participant Cleanup as delete_webhook_logs()
  participant DB as Database

  Cron->>Celery: Enqueue task plane.bgtasks.cleanup_task.delete_webhook_logs (daily 03:30)
  Celery->>Cleanup: Execute task
  Cleanup->>DB: Query webhook logs older than HARD_DELETE_AFTER_DAYS (batch size=500)
  DB-->>Cleanup: Iterator of records (values-only)
  loop For each batch
    Cleanup->>Cleanup: Transform records (stringify fields)
    Cleanup->>DB: Delete processed records
  end
  Cleanup-->>Celery: Task complete
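A minimal sketch of the loop the diagram describes, assuming the PR's helpers are importable. The archive_and_delete helper and the MongoConnection.get_collection call are hypothetical stand-ins for whatever the generic pipeline actually does:

BATCH_SIZE = 500  # per this PR

def run_webhook_log_cleanup():
    """Sketch of the diagrammed flow: batch, transform, archive, hard-delete."""
    batch = []
    for record in get_webhook_logs_queryset():  # yields values() dicts, chunk_size=500
        batch.append(transform_webhook_log(record))
        if len(batch) >= BATCH_SIZE:
            archive_and_delete(batch)
            batch = []
    if batch:
        archive_and_delete(batch)  # flush the final partial batch

def archive_and_delete(batch):
    # Hypothetical helper: the real pipeline lives in process_cleanup_task.
    collection = MongoConnection.get_collection("webhook_logs")  # assumed API
    collection.insert_many(batch)
    WebhookLog.all_objects.filter(id__in=[row["id"] for row in batch]).delete()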

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested labels

tests required

Suggested reviewers

  • dheeru0198
  • sriramveeraghanta

Pre-merge checks (3 passed)

  • Title Check — ✅ Passed: The title succinctly and accurately describes the primary changes (webhook log cleanup task plus Celery schedule update) and includes the issue key for traceability, so a reviewer scanning history will understand the main intent.
  • Description Check — ✅ Passed: The description follows the repository template, stating the main changes (webhook cleanup, schedule adjustments, batch-size reduction) with populated Type of Change, Test Scenarios, and References sections. The test scenarios are minimal but present.
  • Docstring Coverage — ✅ Passed: Docstring coverage is 100.00%, above the required 80.00% threshold.

Poem

A hare at three-thirty hops through logs so old,
Trims the webs of webhooks, tidy, crisp, and bold.
With batches half as hefty, the queue keeps clean,
Beat taps the drum—03:30 on the scene.
Hop, sweep, delete—my burrow’s evergreen! 🐇✨


makeplane bot commented Sep 11, 2025

Pull Request Linked with Plane Work Items

Comment Automatically Generated by Plane

Copilot AI (Contributor) left a comment

Pull Request Overview

This PR implements automated cleanup for old webhook logs by adding a new Celery task and updating the schedule for existing cleanup tasks. The changes help manage database storage by automatically removing old webhook logs after a configurable retention period.

  • Add new webhook log cleanup functionality with MongoDB archival
  • Update Celery task scheduling to better distribute cleanup operations
  • Reduce batch size from 1000 to 500 for improved memory usage

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

apps/api/plane/celery.py — Adds the webhook log cleanup task to the Celery schedule and adjusts timing for existing cleanup tasks.
apps/api/plane/bgtasks/cleanup_task.py — Implements the webhook log cleanup logic, including the queryset, transform function, and Celery task.


"webhook",
"event_type",
"request_method",
"response_headers",
Copilot AI, Sep 11, 2025

The field response_headers is listed twice in the values() call: once at line 414 and again at line 418. The first occurrence at line 414 should be request_headers; line 418 should remain response_headers.

Suggested change

-            "response_headers",
+            "request_headers",


@@ -374,7 +396,32 @@ def get_issue_description_versions_queryset():
    )


# Celery tasks - now much simpler!
def get_webhook_logs_queryset():
    """Get email logs older than cutoff days."""
Copilot AI, Sep 11, 2025

The docstring incorrectly states 'Get email logs older than cutoff days' but this function gets webhook logs. It should be updated to 'Get webhook logs older than cutoff days.'

Suggested change

-    """Get email logs older than cutoff days."""
+    """Get webhook logs older than cutoff days."""


coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
apps/api/plane/bgtasks/cleanup_task.py (1)

251-269: Avoid string-casting structured fields; align with other transforms.

Casting headers/bodies/status/retry_count to str hurts querying and is inconsistent with transform_api_log. Keep IDs as strings, but preserve original types for structured/numeric fields. Also tweak the docstring wording.

-def transform_webhook_log(record: Dict):
-    """Transfer webhook logs to a new destination."""
-    return {
-        "id": str(record["id"]),
-        "created_at": str(record["created_at"]) if record.get("created_at") else None,
-        "workspace_id": str(record["workspace_id"]),
-        "webhook": str(record["webhook"]),
-        # Request
-        "event_type": str(record["event_type"]),
-        "request_method": str(record["request_method"]),
-        "request_headers": str(record["request_headers"]),
-        "request_body": str(record["request_body"]),
-        # Response
-        "response_status": str(record["response_status"]),
-        "response_body": str(record["response_body"]),
-        "response_headers": str(record["response_headers"]),
-        # retry count
-        "retry_count": str(record["retry_count"]),
-    }
+def transform_webhook_log(record: Dict) -> Dict:
+    """Transform webhook log record."""
+    return {
+        "id": str(record["id"]),
+        "created_at": str(record["created_at"]) if record.get("created_at") else None,
+        "workspace_id": str(record["workspace_id"]),
+        "webhook": str(record["webhook"]),
+        # Request
+        "event_type": record.get("event_type"),
+        "request_method": record.get("request_method"),
+        "request_headers": record.get("request_headers"),
+        "request_body": record.get("request_body"),
+        # Response
+        "response_status": record.get("response_status"),
+        "response_body": record.get("response_body"),
+        "response_headers": record.get("response_headers"),
+        # Retry count
+        "retry_count": record.get("retry_count"),
+    }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8bf0595 and 23acf6c.

📒 Files selected for processing (2)
  • apps/api/plane/bgtasks/cleanup_task.py (4 hunks)
  • apps/api/plane/celery.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
apps/api/plane/bgtasks/cleanup_task.py (2)
apps/api/plane/settings/mongo.py (1)
  • MongoConnection (18-124)
apps/api/plane/utils/exception_logger.py (1)
  • log_exception (9-20)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Build and lint web apps
  • GitHub Check: Analyze (javascript)
🔇 Additional comments (8)
apps/api/plane/celery.py (5)

56-59: Reschedule email-notification logs to 02:45 UTC — OK.


60-63: Reschedule page-versions cleanup to 03:00 UTC — OK.


64-67: Reschedule issue-description versions to 03:15 UTC — OK.


68-71: Add daily webhook-log cleanup at 03:30 UTC — OK.


72-75: Exporter history cleanup now runs twice daily; confirm or dedupe.

There’s an existing schedule at Line 44-47 ("check-every-day-to-delete_exporter_history" at 01:30) and this new one at 03:45 with a different key. If two runs/day weren’t intended, drop one and keep a single entry.

Option A (keep 03:45; remove this new block here and update the original to 03:45):

-    "check-every-day-to-delete-exporter-history": {
-        "task": "plane.bgtasks.exporter_expired_task.delete_old_s3_link",
-        "schedule": crontab(hour=3, minute=45),  # UTC 03:45
-    },

Then update the original block (Lines 44-47) time from 01:30 → 03:45.

Option B (keep both if intentional): please confirm idempotency and load impact.

apps/api/plane/bgtasks/cleanup_task.py (3)

24-25: Add WebhookLog import — OK.


31-31: Reduce BATCH_SIZE to 500 — OK.
Better memory footprint; aligns with chunked iterators.


473-483: New Celery task delete_webhook_logs — OK.
Wired correctly into the generic cleanup pipeline and collection name.
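
A hedged sketch of that wrapper's plausible shape; the process_cleanup_task signature and parameter names below are assumptions, not the PR's actual code:

from celery import shared_task

@shared_task
def delete_webhook_logs():
    """Hard-delete old webhook logs via the generic cleanup pipeline (sketch)."""
    process_cleanup_task(
        queryset_fn=get_webhook_logs_queryset,  # parameter names assumed
        transform_fn=transform_webhook_log,
        collection_name="webhook_logs",
    )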

Comment on lines 399 to 423
def get_webhook_logs_queryset():
    """Get email logs older than cutoff days."""
    cutoff_days = int(os.environ.get("HARD_DELETE_AFTER_DAYS", 30))
    cutoff_time = timezone.now() - timedelta(days=cutoff_days)
    logger.info(f"Webhook logs cutoff time: {cutoff_time}")

    return (
        WebhookLog.all_objects.filter(created_at__lte=cutoff_time)
        .values(
            "id",
            "created_at",
            "workspace_id",
            "webhook",
            "event_type",
            "request_method",
            "response_headers",
            "request_body",
            "response_status",
            "response_body",
            "response_headers",
            "retry_count",
        )
        .iterator(chunk_size=BATCH_SIZE)
    )


⚠️ Potential issue

Fix queryset: wrong docstring + missing request_headers + duplicate response_headers (will cause KeyError).

transform_webhook_log reads request_headers, but the queryset doesn’t select it; also response_headers appears twice.

-def get_webhook_logs_queryset():
-    """Get email logs older than cutoff days."""
+def get_webhook_logs_queryset():
+    """Get webhook logs older than cutoff days."""
@@
-        .values(
+        .values(
             "id",
             "created_at",
             "workspace_id",
             "webhook",
             "event_type",
             "request_method",
-            "response_headers",
+            "request_headers",
             "request_body",
             "response_status",
             "response_body",
-            "response_headers",
+            "response_headers",
             "retry_count",
         )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

def get_webhook_logs_queryset():
    """Get webhook logs older than cutoff days."""
    cutoff_days = int(os.environ.get("HARD_DELETE_AFTER_DAYS", 30))
    cutoff_time = timezone.now() - timedelta(days=cutoff_days)
    logger.info(f"Webhook logs cutoff time: {cutoff_time}")
    return (
        WebhookLog.all_objects.filter(created_at__lte=cutoff_time)
        .values(
            "id",
            "created_at",
            "workspace_id",
            "webhook",
            "event_type",
            "request_method",
            "request_headers",
            "request_body",
            "response_status",
            "response_body",
            "response_headers",
            "retry_count",
        )
        .iterator(chunk_size=BATCH_SIZE)
    )
🤖 Prompt for AI Agents
In apps/api/plane/bgtasks/cleanup_task.py around lines 399 to 423, the function
docstring incorrectly says "Get email logs" and the queryset selects
response_headers twice while omitting request_headers that transform_webhook_log
expects; update the docstring to mention webhook logs, remove the duplicate
response_headers entry, and add "request_headers" to the .values(...) list so
the queryset returns the headers needed by transform_webhook_log.

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
apps/api/plane/bgtasks/cleanup_task.py (1)

399-401: Fix docstring: it references email logs.

-    """Get email logs older than cutoff days."""
+    """Get webhook logs older than cutoff days."""
🧹 Nitpick comments (2)
apps/api/plane/bgtasks/cleanup_task.py (2)

251-270: Preserve JSON types and align docstring wording; avoid str() on structured fields.

Storing headers/bodies as strings hurts queryability and masking; other transforms keep JSON as-is. Also align docstring with peers.

-def transform_webhook_log(record: Dict):
-    """Transfer webhook logs to a new destination."""
+def transform_webhook_log(record: Dict):
+    """Transform webhook log record."""
     return {
         "id": str(record["id"]),
         "created_at": str(record["created_at"]) if record.get("created_at") else None,
         "workspace_id": str(record["workspace_id"]),
         "webhook": str(record["webhook"]),
         # Request
         "event_type": str(record["event_type"]),
         "request_method": str(record["request_method"]),
-        "request_headers": str(record["request_headers"]),
-        "request_body": str(record["request_body"]),
+        "request_headers": record.get("request_headers"),
+        "request_body": record.get("request_body"),
         # Response
-        "response_status": str(record["response_status"]),
-        "response_body": str(record["response_body"]),
-        "response_headers": str(record["response_headers"]),
+        "response_status": record.get("response_status"),
+        "response_body": record.get("response_body"),
+        "response_headers": record.get("response_headers"),
         # retry count
-        "retry_count": str(record["retry_count"]),
+        "retry_count": record.get("retry_count"),
     }

If secrets can appear in headers/body (e.g., Authorization, cookies, tokens), ensure upstream masking or add masking here before archival.
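
If masking were done in this transform, a minimal sketch (the header names below are common examples, not a confirmed inventory for this codebase):

SENSITIVE_HEADERS = {"authorization", "proxy-authorization", "cookie", "set-cookie", "x-api-key"}

def mask_headers(headers):
    """Redact known-sensitive header values before archival (sketch)."""
    if not isinstance(headers, dict):
        return headers  # leave non-dict payloads untouched
    return {
        key: "***" if key.lower() in SENSITIVE_HEADERS else value
        for key, value in headers.items()
    }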


406-424: Confirm field names and indexing for performance.

  • Verify WebhookLog uses field "webhook" (not "webhook_id") in values(); else this will KeyError at transform time.
  • Ensure an index on created_at to keep this cutoff query efficient at scale (see the sketch below).
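
If the index turns out to be missing, a hedged sketch of declaring it on the model (field definition, Meta layout, and index name are illustrative; WebhookLog's real base classes may differ):

from django.db import models

class WebhookLog(models.Model):
    created_at = models.DateTimeField(auto_now_add=True)  # assumed definition
    # ... remaining fields per the real model ...

    class Meta:
        indexes = [
            # Supports the created_at__lte cutoff scan in get_webhook_logs_queryset().
            models.Index(fields=["created_at"], name="webhooklog_created_idx"),
        ]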
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8bf0595 and 69d08f7.

📒 Files selected for processing (2)
  • apps/api/plane/bgtasks/cleanup_task.py (4 hunks)
  • apps/api/plane/celery.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
apps/api/plane/bgtasks/cleanup_task.py (2)
apps/api/plane/settings/mongo.py (1)
  • MongoConnection (18-124)
apps/api/plane/utils/exception_logger.py (1)
  • log_exception (9-20)
🔇 Additional comments (5)
apps/api/plane/celery.py (2)

56-67: Daily schedule shifts look good.

Times tightened to 02:45/03:00/03:15 are reasonable spacing and keep load staggered.


68-71: New webhook log cleanup beat entry: LGTM.

Task path matches the new Celery task and the 03:30 UTC slot fits the cadence.

apps/api/plane/bgtasks/cleanup_task.py (3)

24-24: Importing WebhookLog: LGTM.

Required for queryset/delete path.


31-31: Batch size reduced to 500: LGTM.

Less memory pressure; matches iterator chunk_size usage.


475-484: New Celery task wrapper: LGTM.

Aligns with generic process_cleanup_task pattern and collection name.

Comment on lines +72 to 75
"check-every-day-to-delete-exporter-history": {
"task": "plane.bgtasks.exporter_expired_task.delete_old_s3_link",
"schedule": crontab(hour=3, minute=45), # UTC 03:45
},

💡 Verification agent

🧩 Analysis chain

Duplicate exporter-history beat entry will double-run the task.

You now have both:

  • "check-every-day-to-delete_exporter_history" at 01:30 (existing), and
  • "check-every-day-to-delete-exporter-history" at 03:45 (new).

Celery will schedule both since keys differ. Keep exactly one entry and one name. If you intend to move it to 03:45, delete this new block and update the original entry’s time; or delete the old one and keep this. Here’s the minimal fix to remove the duplicate new block:

-    "check-every-day-to-delete-exporter-history": {
-        "task": "plane.bgtasks.exporter_expired_task.delete_old_s3_link",
-        "schedule": crontab(hour=3, minute=45),  # UTC 03:45
-    },

🏁 Script executed:

#!/bin/bash
set -euo pipefail
echo "Searching repository for exporter schedule keys and task name..."
rg -n --hidden -S -C2 "check-every-day-to-delete-exporter-history|check-every-day-to-delete_exporter_history|delete_old_s3_link|exporter_expired_task" || true

echo
echo "Showing apps/api/plane/celery.py (if present)..."
if [ -f "apps/api/plane/celery.py" ]; then
  sed -n '1,240p' apps/api/plane/celery.py || true
else
  echo "apps/api/plane/celery.py not found"
fi

Length of output: 5595


Remove duplicate Celery beat entry for exporter history

Two scheduled entries call plane.bgtasks.exporter_expired_task.delete_old_s3_link; keep exactly one. Locations: apps/api/plane/celery.py lines 44–46 ("check-every-day-to-delete_exporter_history" — UTC 01:30) and lines 72–75 ("check-every-day-to-delete-exporter-history" — UTC 03:45). Remove the duplicate block or update the original entry’s schedule. Minimal fix (remove the new 03:45 block):

-    "check-every-day-to-delete-exporter-history": {
-        "task": "plane.bgtasks.exporter_expired_task.delete_old_s3_link",
-        "schedule": crontab(hour=3, minute=45),  # UTC 03:45
-    },
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

-    "check-every-day-to-delete-exporter-history": {
-        "task": "plane.bgtasks.exporter_expired_task.delete_old_s3_link",
-        "schedule": crontab(hour=3, minute=45),  # UTC 03:45
-    },
🤖 Prompt for AI Agents
apps/api/plane/celery.py around lines 72-75: there is a duplicate Celery beat
entry calling plane.bgtasks.exporter_expired_task.delete_old_s3_link (the
original already exists at lines 44-46 with UTC 01:30); remove the newer block
at lines 72-75 so only the original scheduled entry remains, ensuring no
duplicate tasks are scheduled.
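
After Option A, the single surviving entry would look like this (key retained from the original block at lines 44-46; only the time moves):

    "check-every-day-to-delete_exporter_history": {
        "task": "plane.bgtasks.exporter_expired_task.delete_old_s3_link",
        "schedule": crontab(hour=3, minute=45),  # UTC 03:45, moved from 01:30
    },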
