
Conversation

@pablohashescobar (Member) commented Sep 11, 2025

Description

  • Implement cleanup for old webhook logs
  • Adjust schedule times for existing cleanup tasks
  • Reduce batch size from 1000 to 500

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • Improvement (non-breaking change that improves existing functionality)

Test Scenarios

  • Validate that the cleanup task does not trigger a read-ops error
  • Validate the newly added webhook log cleanup task

References

WEB-4873

Summary by CodeRabbit

  • New Features

    • Automated daily cleanup of webhook logs to manage storage and keep activity history lean.
    • Automated daily cleanup of expired export links for improved data hygiene.
  • Chores

    • Adjusted schedules for daily cleanup jobs to run earlier for smoother maintenance windows.
    • Tuned cleanup batch size to improve reliability and reduce resource spikes during background processing.

- Implement cleanup for old webhook logs
- Add periodic Celery task for webhook log deletion
- Adjust schedule times for existing cleanup tasks
- Reduce batch size from 1000 to 500
Copilot AI review requested due to automatic review settings, September 11, 2025 10:00
coderabbitai bot (Contributor) commented Sep 11, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Adds webhook log cleanup to the background cleanup workflow, decreases batch size from 1000 to 500, and updates Celery beat to adjust existing schedules and add daily tasks for webhook log cleanup and exporter history cleanup.

Changes

Cleanup task enhancements — apps/api/plane/bgtasks/cleanup_task.py
  Added transform_webhook_log(record: Dict), get_webhook_logs_queryset(), and the Celery task delete_webhook_logs() to hard-delete old webhook logs; reduced BATCH_SIZE from 1000 to 500; integrated WebhookLog into the existing cleanup pipeline.

Celery schedule updates — apps/api/plane/celery.py
  Adjusted daily run times for three existing cleanup tasks; added two new scheduled entries: delete_webhook_logs at 03:30 UTC and delete_old_s3_link at 03:45 UTC.
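
For orientation, a minimal sketch of what the two new beat entries could look like in apps/api/plane/celery.py. The webhook entry key and the standalone app setup are assumptions for illustration; only the task paths and UTC times come from the walkthrough:

from celery import Celery
from celery.schedules import crontab

app = Celery("plane")  # illustrative; the project defines its own app instance

# Entry keys here are assumptions; task paths and times per the walkthrough.
app.conf.beat_schedule.update(
    {
        "check-every-day-to-delete-webhook-logs": {
            "task": "plane.bgtasks.cleanup_task.delete_webhook_logs",
            "schedule": crontab(hour=3, minute=30),  # UTC 03:30
        },
        "check-every-day-to-delete-exporter-history": {
            "task": "plane.bgtasks.exporter_expired_task.delete_old_s3_link",
            "schedule": crontab(hour=3, minute=45),  # UTC 03:45
        },
    }
)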

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor Cron as Celery Beat (UTC)
  participant Celery as Celery Worker
  participant Cleanup as delete_webhook_logs()
  participant DB as Database

  Cron->>Celery: Enqueue task plane.bgtasks.cleanup_task.delete_webhook_logs (daily 03:30)
  Celery->>Cleanup: Execute task
  Cleanup->>DB: Query webhook logs older than HARD_DELETE_AFTER_DAYS (batch size=500)
  DB-->>Cleanup: Iterator of records (values-only)
  loop For each batch
    Cleanup->>Cleanup: Transform records (stringify fields)
    Cleanup->>DB: Delete processed records
  end
  Cleanup-->>Celery: Task complete
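A minimal sketch of the loop the diagram describes, assuming the PR's helpers are importable. The archive_and_delete helper and the MongoConnection.get_collection call are hypothetical stand-ins for whatever the generic pipeline actually does:

BATCH_SIZE = 500  # per this PR

def run_webhook_log_cleanup():
    """Sketch of the diagrammed flow: batch, transform, archive, hard-delete."""
    batch = []
    for record in get_webhook_logs_queryset():  # yields values() dicts, chunk_size=500
        batch.append(transform_webhook_log(record))
        if len(batch) >= BATCH_SIZE:
            archive_and_delete(batch)
            batch = []
    if batch:
        archive_and_delete(batch)  # flush the final partial batch

def archive_and_delete(batch):
    # Hypothetical helper: the real pipeline lives in process_cleanup_task.
    collection = MongoConnection.get_collection("webhook_logs")  # assumed API
    collection.insert_many(batch)
    WebhookLog.all_objects.filter(id__in=[row["id"] for row in batch]).delete()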

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested labels

tests required

Suggested reviewers

  • dheeru0198
  • sriramveeraghanta

Pre-merge checks (3 passed)

  • Title Check — ✅ Passed: The title succinctly and accurately describes the primary changes (webhook log cleanup task plus Celery schedule update) and includes the issue key for traceability, so a reviewer scanning history will understand the main intent.
  • Description Check — ✅ Passed: The description follows the repository template, stating the main changes (webhook cleanup, schedule adjustments, batch-size reduction) with populated Type of Change, Test Scenarios, and References sections. The test scenarios are minimal but present.
  • Docstring Coverage — ✅ Passed: Docstring coverage is 100.00%, above the required 80.00% threshold.

Poem

A hare at three-thirty hops through logs so old,
Trims the webs of webhooks, tidy, crisp, and bold.
With batches half as hefty, the queue keeps clean,
Beat taps the drum—03:30 on the scene.
Hop, sweep, delete—my burrow’s evergreen! 🐇✨


makeplane bot commented Sep 11, 2025

Pull Request Linked with Plane Work Items

Comment Automatically Generated by Plane

Copilot AI (Contributor) left a comment

Pull Request Overview

This PR implements automated cleanup for old webhook logs by adding a new Celery task and updating the schedule for existing cleanup tasks. The changes help manage database storage by automatically removing old webhook logs after a configurable retention period.

  • Add new webhook log cleanup functionality with MongoDB archival
  • Update Celery task scheduling to better distribute cleanup operations
  • Reduce batch size from 1000 to 500 for improved memory usage

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

apps/api/plane/celery.py — Adds the webhook log cleanup task to the Celery schedule and adjusts timing for existing cleanup tasks.
apps/api/plane/bgtasks/cleanup_task.py — Implements the webhook log cleanup logic, including the queryset, transform function, and Celery task.


"webhook",
"event_type",
"request_method",
"response_headers",
Copilot AI, Sep 11, 2025

The field response_headers is listed twice in the values() call: once at line 414 and again at line 418. The first occurrence at line 414 should be request_headers; line 418 should remain response_headers.

Suggested change

-            "response_headers",
+            "request_headers",


@@ -374,7 +396,32 @@ def get_issue_description_versions_queryset():
    )


# Celery tasks - now much simpler!
def get_webhook_logs_queryset():
    """Get email logs older than cutoff days."""
Copilot AI, Sep 11, 2025

The docstring incorrectly states 'Get email logs older than cutoff days' but this function gets webhook logs. It should be updated to 'Get webhook logs older than cutoff days.'

Suggested change

-    """Get email logs older than cutoff days."""
+    """Get webhook logs older than cutoff days."""


coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
apps/api/plane/bgtasks/cleanup_task.py (1)

251-269: Avoid string-casting structured fields; align with other transforms.

Casting headers/bodies/status/retry_count to str hurts querying and is inconsistent with transform_api_log. Keep IDs as strings, but preserve original types for structured/numeric fields. Also tweak the docstring wording.

-def transform_webhook_log(record: Dict):
-    """Transfer webhook logs to a new destination."""
-    return {
-        "id": str(record["id"]),
-        "created_at": str(record["created_at"]) if record.get("created_at") else None,
-        "workspace_id": str(record["workspace_id"]),
-        "webhook": str(record["webhook"]),
-        # Request
-        "event_type": str(record["event_type"]),
-        "request_method": str(record["request_method"]),
-        "request_headers": str(record["request_headers"]),
-        "request_body": str(record["request_body"]),
-        # Response
-        "response_status": str(record["response_status"]),
-        "response_body": str(record["response_body"]),
-        "response_headers": str(record["response_headers"]),
-        # retry count
-        "retry_count": str(record["retry_count"]),
-    }
+def transform_webhook_log(record: Dict) -> Dict:
+    """Transform webhook log record."""
+    return {
+        "id": str(record["id"]),
+        "created_at": str(record["created_at"]) if record.get("created_at") else None,
+        "workspace_id": str(record["workspace_id"]),
+        "webhook": str(record["webhook"]),
+        # Request
+        "event_type": record.get("event_type"),
+        "request_method": record.get("request_method"),
+        "request_headers": record.get("request_headers"),
+        "request_body": record.get("request_body"),
+        # Response
+        "response_status": record.get("response_status"),
+        "response_body": record.get("response_body"),
+        "response_headers": record.get("response_headers"),
+        # Retry count
+        "retry_count": record.get("retry_count"),
+    }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8bf0595 and 23acf6c.

📒 Files selected for processing (2)
  • apps/api/plane/bgtasks/cleanup_task.py (4 hunks)
  • apps/api/plane/celery.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
apps/api/plane/bgtasks/cleanup_task.py (2)
apps/api/plane/settings/mongo.py (1)
  • MongoConnection (18-124)
apps/api/plane/utils/exception_logger.py (1)
  • log_exception (9-20)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Build and lint web apps
  • GitHub Check: Analyze (javascript)
🔇 Additional comments (8)
apps/api/plane/celery.py (5)

56-59: Reschedule email-notification logs to 02:45 UTC — OK.


60-63: Reschedule page-versions cleanup to 03:00 UTC — OK.


64-67: Reschedule issue-description versions to 03:15 UTC — OK.


68-71: Add daily webhook-log cleanup at 03:30 UTC — OK.


72-75: Exporter history cleanup now runs twice daily; confirm or dedupe.

There’s an existing schedule at Line 44-47 ("check-every-day-to-delete_exporter_history" at 01:30) and this new one at 03:45 with a different key. If two runs/day weren’t intended, drop one and keep a single entry.

Option A (keep 03:45; remove this new block here and update the original to 03:45):

-    "check-every-day-to-delete-exporter-history": {
-        "task": "plane.bgtasks.exporter_expired_task.delete_old_s3_link",
-        "schedule": crontab(hour=3, minute=45),  # UTC 03:45
-    },

Then update the original block (Lines 44-47) time from 01:30 → 03:45.

Option B (keep both if intentional): please confirm idempotency and load impact.

apps/api/plane/bgtasks/cleanup_task.py (3)

24-25: Add WebhookLog import — OK.


31-31: Reduce BATCH_SIZE to 500 — OK.
Better memory footprint; aligns with chunked iterators.


473-483: New Celery task delete_webhook_logs — OK.
Wired correctly into the generic cleanup pipeline and collection name.
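
A hedged sketch of that wrapper's plausible shape; the process_cleanup_task signature and parameter names below are assumptions, not the PR's actual code:

from celery import shared_task

@shared_task
def delete_webhook_logs():
    """Hard-delete old webhook logs via the generic cleanup pipeline (sketch)."""
    process_cleanup_task(
        queryset_fn=get_webhook_logs_queryset,  # parameter names assumed
        transform_fn=transform_webhook_log,
        collection_name="webhook_logs",
    )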

Comment on lines 399 to 423
def get_webhook_logs_queryset():
    """Get email logs older than cutoff days."""
    cutoff_days = int(os.environ.get("HARD_DELETE_AFTER_DAYS", 30))
    cutoff_time = timezone.now() - timedelta(days=cutoff_days)
    logger.info(f"Webhook logs cutoff time: {cutoff_time}")

    return (
        WebhookLog.all_objects.filter(created_at__lte=cutoff_time)
        .values(
            "id",
            "created_at",
            "workspace_id",
            "webhook",
            "event_type",
            "request_method",
            "response_headers",
            "request_body",
            "response_status",
            "response_body",
            "response_headers",
            "retry_count",
        )
        .iterator(chunk_size=BATCH_SIZE)
    )


⚠️ Potential issue

Fix queryset: wrong docstring + missing request_headers + duplicate response_headers (will cause KeyError).

transform_webhook_log reads request_headers, but the queryset doesn’t select it; also response_headers appears twice.

-def get_webhook_logs_queryset():
-    """Get email logs older than cutoff days."""
+def get_webhook_logs_queryset():
+    """Get webhook logs older than cutoff days."""
@@
-        .values(
+        .values(
             "id",
             "created_at",
             "workspace_id",
             "webhook",
             "event_type",
             "request_method",
-            "response_headers",
+            "request_headers",
             "request_body",
             "response_status",
             "response_body",
-            "response_headers",
+            "response_headers",
             "retry_count",
         )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

def get_webhook_logs_queryset():
    """Get webhook logs older than cutoff days."""
    cutoff_days = int(os.environ.get("HARD_DELETE_AFTER_DAYS", 30))
    cutoff_time = timezone.now() - timedelta(days=cutoff_days)
    logger.info(f"Webhook logs cutoff time: {cutoff_time}")
    return (
        WebhookLog.all_objects.filter(created_at__lte=cutoff_time)
        .values(
            "id",
            "created_at",
            "workspace_id",
            "webhook",
            "event_type",
            "request_method",
            "request_headers",
            "request_body",
            "response_status",
            "response_body",
            "response_headers",
            "retry_count",
        )
        .iterator(chunk_size=BATCH_SIZE)
    )
🤖 Prompt for AI Agents
In apps/api/plane/bgtasks/cleanup_task.py around lines 399 to 423, the function
docstring incorrectly says "Get email logs" and the queryset selects
response_headers twice while omitting request_headers that transform_webhook_log
expects; update the docstring to mention webhook logs, remove the duplicate
response_headers entry, and add "request_headers" to the .values(...) list so
the queryset returns the headers needed by transform_webhook_log.

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
apps/api/plane/bgtasks/cleanup_task.py (1)

399-401: Fix docstring: it references email logs.

-    """Get email logs older than cutoff days."""
+    """Get webhook logs older than cutoff days."""
🧹 Nitpick comments (2)
apps/api/plane/bgtasks/cleanup_task.py (2)

251-270: Preserve JSON types and align docstring wording; avoid str() on structured fields.

Storing headers/bodies as strings hurts queryability and masking; other transforms keep JSON as-is. Also align docstring with peers.

-def transform_webhook_log(record: Dict):
-    """Transfer webhook logs to a new destination."""
+def transform_webhook_log(record: Dict):
+    """Transform webhook log record."""
     return {
         "id": str(record["id"]),
         "created_at": str(record["created_at"]) if record.get("created_at") else None,
         "workspace_id": str(record["workspace_id"]),
         "webhook": str(record["webhook"]),
         # Request
         "event_type": str(record["event_type"]),
         "request_method": str(record["request_method"]),
-        "request_headers": str(record["request_headers"]),
-        "request_body": str(record["request_body"]),
+        "request_headers": record.get("request_headers"),
+        "request_body": record.get("request_body"),
         # Response
-        "response_status": str(record["response_status"]),
-        "response_body": str(record["response_body"]),
-        "response_headers": str(record["response_headers"]),
+        "response_status": record.get("response_status"),
+        "response_body": record.get("response_body"),
+        "response_headers": record.get("response_headers"),
         # retry count
-        "retry_count": str(record["retry_count"]),
+        "retry_count": record.get("retry_count"),
     }

If secrets can appear in headers/body (e.g., Authorization, cookies, tokens), ensure upstream masking or add masking here before archival.
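
If masking were done in this transform, a minimal sketch (the header names below are common examples, not a confirmed inventory for this codebase):

SENSITIVE_HEADERS = {"authorization", "proxy-authorization", "cookie", "set-cookie", "x-api-key"}

def mask_headers(headers):
    """Redact known-sensitive header values before archival (sketch)."""
    if not isinstance(headers, dict):
        return headers  # leave non-dict payloads untouched
    return {
        key: "***" if key.lower() in SENSITIVE_HEADERS else value
        for key, value in headers.items()
    }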


406-424: Confirm field names and indexing for performance.

  • Verify WebhookLog uses field "webhook" (not "webhook_id") in values(); else this will KeyError at transform time.
  • Ensure an index on created_at to keep this cutoff query efficient at scale (see the sketch below).
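
If the index turns out to be missing, a hedged sketch of declaring it on the model (field definition, Meta layout, and index name are illustrative; WebhookLog's real base classes may differ):

from django.db import models

class WebhookLog(models.Model):
    created_at = models.DateTimeField(auto_now_add=True)  # assumed definition
    # ... remaining fields per the real model ...

    class Meta:
        indexes = [
            # Supports the created_at__lte cutoff scan in get_webhook_logs_queryset().
            models.Index(fields=["created_at"], name="webhooklog_created_idx"),
        ]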
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8bf0595 and 69d08f7.

📒 Files selected for processing (2)
  • apps/api/plane/bgtasks/cleanup_task.py (4 hunks)
  • apps/api/plane/celery.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
apps/api/plane/bgtasks/cleanup_task.py (2)
apps/api/plane/settings/mongo.py (1)
  • MongoConnection (18-124)
apps/api/plane/utils/exception_logger.py (1)
  • log_exception (9-20)
🔇 Additional comments (5)
apps/api/plane/celery.py (2)

56-67: Daily schedule shifts look good.

Times tightened to 02:45/03:00/03:15 are reasonable spacing and keep load staggered.


68-71: New webhook log cleanup beat entry: LGTM.

Task path matches the new Celery task and the 03:30 UTC slot fits the cadence.

apps/api/plane/bgtasks/cleanup_task.py (3)

24-24: Importing WebhookLog: LGTM.

Required for queryset/delete path.


31-31: Batch size reduced to 500: LGTM.

Less memory pressure; matches iterator chunk_size usage.


475-484: New Celery task wrapper: LGTM.

Aligns with generic process_cleanup_task pattern and collection name.

Comment on lines +72 to 75
"check-every-day-to-delete-exporter-history": {
"task": "plane.bgtasks.exporter_expired_task.delete_old_s3_link",
"schedule": crontab(hour=3, minute=45), # UTC 03:45
},

💡 Verification agent

🧩 Analysis chain

Duplicate exporter-history beat entry will double-run the task.

You now have both:

  • "check-every-day-to-delete_exporter_history" at 01:30 (existing), and
  • "check-every-day-to-delete-exporter-history" at 03:45 (new).

Celery will schedule both since keys differ. Keep exactly one entry and one name. If you intend to move it to 03:45, delete this new block and update the original entry’s time; or delete the old one and keep this. Here’s the minimal fix to remove the duplicate new block:

-    "check-every-day-to-delete-exporter-history": {
-        "task": "plane.bgtasks.exporter_expired_task.delete_old_s3_link",
-        "schedule": crontab(hour=3, minute=45),  # UTC 03:45
-    },

🏁 Script executed:

#!/bin/bash
set -euo pipefail
echo "Searching repository for exporter schedule keys and task name..."
rg -n --hidden -S -C2 "check-every-day-to-delete-exporter-history|check-every-day-to-delete_exporter_history|delete_old_s3_link|exporter_expired_task" || true

echo
echo "Showing apps/api/plane/celery.py (if present)..."
if [ -f "apps/api/plane/celery.py" ]; then
  sed -n '1,240p' apps/api/plane/celery.py || true
else
  echo "apps/api/plane/celery.py not found"
fi

Length of output: 5595


Remove duplicate Celery beat entry for exporter history

Two scheduled entries call plane.bgtasks.exporter_expired_task.delete_old_s3_link; keep exactly one. Locations: apps/api/plane/celery.py lines 44–46 ("check-every-day-to-delete_exporter_history" — UTC 01:30) and lines 72–75 ("check-every-day-to-delete-exporter-history" — UTC 03:45). Remove the duplicate block or update the original entry’s schedule. Minimal fix (remove the new 03:45 block):

-    "check-every-day-to-delete-exporter-history": {
-        "task": "plane.bgtasks.exporter_expired_task.delete_old_s3_link",
-        "schedule": crontab(hour=3, minute=45),  # UTC 03:45
-    },
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

-    "check-every-day-to-delete-exporter-history": {
-        "task": "plane.bgtasks.exporter_expired_task.delete_old_s3_link",
-        "schedule": crontab(hour=3, minute=45),  # UTC 03:45
-    },
🤖 Prompt for AI Agents
apps/api/plane/celery.py around lines 72-75: there is a duplicate Celery beat
entry calling plane.bgtasks.exporter_expired_task.delete_old_s3_link (the
original already exists at lines 44-46 with UTC 01:30); remove the newer block
at lines 72-75 so only the original scheduled entry remains, ensuring no
duplicate tasks are scheduled.
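
After Option A, the single surviving entry would look like this (key retained from the original block at lines 44-46; only the time moves):

    "check-every-day-to-delete_exporter_history": {
        "task": "plane.bgtasks.exporter_expired_task.delete_old_s3_link",
        "schedule": crontab(hour=3, minute=45),  # UTC 03:45, moved from 01:30
    },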
