[WEB-4873]: Add webhook log cleanup task and update Celery schedule #7772
Conversation
- Implement cleanup for old webhook logs
- Add periodic Celery task for webhook log deletion
- Adjust schedule times for existing cleanup tasks
- Reduce batch size from 1000 to 500
Note: CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough: Adds webhook log cleanup to the background cleanup workflow, decreases batch size from 1000 to 500, and updates Celery beat to adjust existing schedules and add daily tasks for webhook log cleanup and exporter history cleanup.
Sequence diagram:

```mermaid
sequenceDiagram
    autonumber
    actor Cron as Celery Beat (UTC)
    participant Celery as Celery Worker
    participant Cleanup as delete_webhook_logs()
    participant DB as Database
    Cron->>Celery: Enqueue task plane.bgtasks.cleanup_task.delete_webhook_logs (daily 03:30)
    Celery->>Cleanup: Execute task
    Cleanup->>DB: Query webhook logs older than HARD_DELETE_AFTER_DAYS (batch size=500)
    DB-->>Cleanup: Iterator of records (values-only)
    loop For each batch
        Cleanup->>Cleanup: Transform records (stringify fields)
        Cleanup->>DB: Delete processed records
    end
    Cleanup-->>Celery: Task complete
```
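The flow in the diagram can be sketched in plain Python. This is an illustrative model, not the PR's implementation: a list stands in for both the database and the Mongo archive, and the "transform" step only stringifies the id. The names HARD_DELETE_AFTER_DAYS and BATCH_SIZE mirror the PR; everything else is hypothetical.

```python
from datetime import datetime, timedelta, timezone

HARD_DELETE_AFTER_DAYS = 30
BATCH_SIZE = 500

def cleanup(db, archive, now):
    # Select records older than the cutoff, then archive and delete in batches.
    cutoff = now - timedelta(days=HARD_DELETE_AFTER_DAYS)
    expired = [r for r in db if r["created_at"] <= cutoff]
    for i in range(0, len(expired), BATCH_SIZE):
        batch = expired[i : i + BATCH_SIZE]
        archive.extend({"id": str(r["id"])} for r in batch)  # transform step
        ids = {r["id"] for r in batch}
        db[:] = [r for r in db if r["id"] not in ids]        # delete step

now = datetime.now(timezone.utc)
db = [
    {"id": 1, "created_at": now - timedelta(days=40)},  # expired, archived
    {"id": 2, "created_at": now - timedelta(days=5)},   # recent, kept
]
archive = []
cleanup(db, archive, now)
```

Running this leaves only the recent record in `db` and the stringified expired record in `archive`.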
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Pre-merge checks: ✅ 3 passed
Pull Request Overview
This PR implements automated cleanup for old webhook logs by adding a new Celery task and updating the schedule for existing cleanup tasks. The changes help manage database storage by automatically removing old webhook logs after a configurable retention period.
- Add new webhook log cleanup functionality with MongoDB archival
- Update Celery task scheduling to better distribute cleanup operations
- Reduce batch size from 1000 to 500 for improved memory usage
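The rescheduling described above staggers the cleanup jobs 15 minutes apart. A rough model of the slots, using plain (hour, minute) tuples in place of the real `crontab(...)` entries; the keys here are illustrative, not the actual beat-schedule names:

```python
# (hour, minute) in UTC; in the real apps/api/plane/celery.py these are
# crontab(hour=..., minute=...) entries in app.conf.beat_schedule.
CLEANUP_SCHEDULE = {
    "delete-email-notification-logs": (2, 45),
    "delete-page-versions": (3, 0),
    "delete-issue-description-versions": (3, 15),
    "delete-webhook-logs": (3, 30),
}

def minutes_between(a, b):
    """Minutes from slot a to slot b on the same day."""
    return (b[0] * 60 + b[1]) - (a[0] * 60 + a[1])

slots = sorted(CLEANUP_SCHEDULE.values())
gaps = [minutes_between(a, b) for a, b in zip(slots, slots[1:])]
```

With the times above, every consecutive gap is 15 minutes, which keeps the cleanup tasks from competing for database I/O.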
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
File | Description
---|---
apps/api/plane/celery.py | Adds webhook log cleanup task to Celery schedule and adjusts timing for existing cleanup tasks
apps/api/plane/bgtasks/cleanup_task.py | Implements webhook log cleanup logic including queryset, transform function, and Celery task
"webhook", | ||
"event_type", | ||
"request_method", | ||
"response_headers", |
The field `response_headers` is listed twice in the `values()` call, once at line 414 and again at line 418. The first occurrence at line 414 should be `request_headers`, and line 418 should remain `response_headers`.

Suggested change:

```diff
-            "response_headers",
+            "request_headers",
```
```diff
@@ -374,7 +396,32 @@ def get_issue_description_versions_queryset():
     )


+# Celery tasks - now much simpler!
+def get_webhook_logs_queryset():
+    """Get email logs older than cutoff days."""
```
The docstring incorrectly states 'Get email logs older than cutoff days' but this function gets webhook logs. It should be updated to 'Get webhook logs older than cutoff days.'
"""Get email logs older than cutoff days.""" | |
"""Get webhook logs older than cutoff days.""" |
Actionable comments posted: 1
🧹 Nitpick comments (1)
apps/api/plane/bgtasks/cleanup_task.py (1)

Lines 251-269: Avoid string-casting structured fields; align with other transforms.

Casting headers/bodies/status/retry_count to str hurts querying and is inconsistent with `transform_api_log`. Keep IDs as strings, but preserve original types for structured/numeric fields. Also tweak the docstring wording.

```diff
-def transform_webhook_log(record: Dict):
-    """Transfer webhook logs to a new destination."""
-    return {
-        "id": str(record["id"]),
-        "created_at": str(record["created_at"]) if record.get("created_at") else None,
-        "workspace_id": str(record["workspace_id"]),
-        "webhook": str(record["webhook"]),
-        # Request
-        "event_type": str(record["event_type"]),
-        "request_method": str(record["request_method"]),
-        "request_headers": str(record["request_headers"]),
-        "request_body": str(record["request_body"]),
-        # Response
-        "response_status": str(record["response_status"]),
-        "response_body": str(record["response_body"]),
-        "response_headers": str(record["response_headers"]),
-        # retry count
-        "retry_count": str(record["retry_count"]),
-    }
+def transform_webhook_log(record: Dict) -> Dict:
+    """Transform webhook log record."""
+    return {
+        "id": str(record["id"]),
+        "created_at": str(record["created_at"]) if record.get("created_at") else None,
+        "workspace_id": str(record["workspace_id"]),
+        "webhook": str(record["webhook"]),
+        # Request
+        "event_type": record.get("event_type"),
+        "request_method": record.get("request_method"),
+        "request_headers": record.get("request_headers"),
+        "request_body": record.get("request_body"),
+        # Response
+        "response_status": record.get("response_status"),
+        "response_body": record.get("response_body"),
+        "response_headers": record.get("response_headers"),
+        # Retry count
+        "retry_count": record.get("retry_count"),
+    }
```
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📒 Files selected for processing (2)

- apps/api/plane/bgtasks/cleanup_task.py (4 hunks)
- apps/api/plane/celery.py (1 hunks)
🧰 Additional context used

🧬 Code graph analysis (1)

apps/api/plane/bgtasks/cleanup_task.py (2)

- apps/api/plane/settings/mongo.py (1): MongoConnection (18-124)
- apps/api/plane/utils/exception_logger.py (1): log_exception (9-20)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Build and lint web apps
- GitHub Check: Analyze (javascript)
🔇 Additional comments (8)

apps/api/plane/celery.py (5)

Lines 56-59: Reschedule email-notification logs to 02:45 UTC — OK.

Lines 60-63: Reschedule page-versions cleanup to 03:00 UTC — OK.

Lines 64-67: Reschedule issue-description versions to 03:15 UTC — OK.

Lines 68-71: Add daily webhook-log cleanup at 03:30 UTC — OK.

Lines 72-75: Exporter history cleanup now runs twice daily; confirm or dedupe.

There’s an existing schedule at Lines 44-47 ("check-every-day-to-delete_exporter_history" at 01:30) and this new one at 03:45 with a different key. If two runs per day weren’t intended, drop one and keep a single entry.

Option A (keep 03:45; remove this new block here and update the original to 03:45):

```diff
-    "check-every-day-to-delete-exporter-history": {
-        "task": "plane.bgtasks.exporter_expired_task.delete_old_s3_link",
-        "schedule": crontab(hour=3, minute=45),  # UTC 03:45
-    },
```

Then update the original block (Lines 44-47) time from 01:30 → 03:45.

Option B (keep both if intentional): please confirm idempotency and load impact.
apps/api/plane/bgtasks/cleanup_task.py (3)

Lines 24-25: Add WebhookLog import — OK.

Line 31: Reduce BATCH_SIZE to 500 — OK. Better memory footprint; aligns with chunked iterators.

Lines 473-483: New Celery task delete_webhook_logs — OK. Wired correctly into the generic cleanup pipeline and collection name.
```python
def get_webhook_logs_queryset():
    """Get email logs older than cutoff days."""
    cutoff_days = int(os.environ.get("HARD_DELETE_AFTER_DAYS", 30))
    cutoff_time = timezone.now() - timedelta(days=cutoff_days)
    logger.info(f"Webhook logs cutoff time: {cutoff_time}")

    return (
        WebhookLog.all_objects.filter(created_at__lte=cutoff_time)
        .values(
            "id",
            "created_at",
            "workspace_id",
            "webhook",
            "event_type",
            "request_method",
            "response_headers",
            "request_body",
            "response_status",
            "response_body",
            "response_headers",
            "retry_count",
        )
        .iterator(chunk_size=BATCH_SIZE)
    )
```
Fix queryset: wrong docstring + missing request_headers + duplicate response_headers (will cause KeyError).

`transform_webhook_log` reads `request_headers`, but the queryset doesn’t select it; also `response_headers` appears twice.

```diff
-def get_webhook_logs_queryset():
-    """Get email logs older than cutoff days."""
+def get_webhook_logs_queryset():
+    """Get webhook logs older than cutoff days."""
@@
         "id",
         "created_at",
         "workspace_id",
         "webhook",
         "event_type",
         "request_method",
-        "response_headers",
+        "request_headers",
         "request_body",
         "response_status",
         "response_body",
         "response_headers",
         "retry_count",
     )
```
📝 Committable suggestion

‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
def get_webhook_logs_queryset():
    """Get webhook logs older than cutoff days."""
    cutoff_days = int(os.environ.get("HARD_DELETE_AFTER_DAYS", 30))
    cutoff_time = timezone.now() - timedelta(days=cutoff_days)
    logger.info(f"Webhook logs cutoff time: {cutoff_time}")

    return (
        WebhookLog.all_objects.filter(created_at__lte=cutoff_time)
        .values(
            "id",
            "created_at",
            "workspace_id",
            "webhook",
            "event_type",
            "request_method",
            "request_headers",
            "request_body",
            "response_status",
            "response_body",
            "response_headers",
            "retry_count",
        )
        .iterator(chunk_size=BATCH_SIZE)
    )
```
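For reference, the cutoff computation in this function can be exercised outside Django. This sketch substitutes stdlib `datetime` for `django.utils.timezone` and assumes the default of 30 days when `HARD_DELETE_AFTER_DAYS` is unset, matching the snippet above:

```python
import os
from datetime import datetime, timedelta, timezone

# Ensure the variable exists for this sketch; real deployments set it
# in the environment.
os.environ.setdefault("HARD_DELETE_AFTER_DAYS", "30")

cutoff_days = int(os.environ.get("HARD_DELETE_AFTER_DAYS", 30))
cutoff_time = datetime.now(timezone.utc) - timedelta(days=cutoff_days)
```

Any record with `created_at <= cutoff_time` is eligible for archival and deletion; raising `HARD_DELETE_AFTER_DAYS` widens the retention window without code changes.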
🤖 Prompt for AI Agents
In apps/api/plane/bgtasks/cleanup_task.py around lines 399 to 423, the function
docstring incorrectly says "Get email logs" and the queryset selects
response_headers twice while omitting request_headers that transform_webhook_log
expects; update the docstring to mention webhook logs, remove the duplicate
response_headers entry, and add "request_headers" to the .values(...) list so
the queryset returns the headers needed by transform_webhook_log.
Actionable comments posted: 1

♻️ Duplicate comments (1)

apps/api/plane/bgtasks/cleanup_task.py (1)

Lines 399-401: Fix docstring: it references email logs.

```diff
-    """Get email logs older than cutoff days."""
+    """Get webhook logs older than cutoff days."""
```
🧹 Nitpick comments (2)

apps/api/plane/bgtasks/cleanup_task.py (2)

Lines 251-270: Preserve JSON types and align docstring wording; avoid str() on structured fields.

Storing headers/bodies as strings hurts queryability and masking; other transforms keep JSON as-is. Also align docstring with peers.

```diff
-def transform_webhook_log(record: Dict):
-    """Transfer webhook logs to a new destination."""
+def transform_webhook_log(record: Dict):
+    """Transform webhook log record."""
     return {
         "id": str(record["id"]),
         "created_at": str(record["created_at"]) if record.get("created_at") else None,
         "workspace_id": str(record["workspace_id"]),
         "webhook": str(record["webhook"]),
         # Request
         "event_type": str(record["event_type"]),
         "request_method": str(record["request_method"]),
-        "request_headers": str(record["request_headers"]),
-        "request_body": str(record["request_body"]),
+        "request_headers": record.get("request_headers"),
+        "request_body": record.get("request_body"),
         # Response
-        "response_status": str(record["response_status"]),
-        "response_body": str(record["response_body"]),
-        "response_headers": str(record["response_headers"]),
+        "response_status": record.get("response_status"),
+        "response_body": record.get("response_body"),
+        "response_headers": record.get("response_headers"),
         # retry count
-        "retry_count": str(record["retry_count"]),
+        "retry_count": record.get("retry_count"),
     }
```

If secrets can appear in headers/body (e.g., Authorization, cookies, tokens), ensure upstream masking or add masking here before archival.
Lines 406-424: Confirm field names and indexing for performance.

- Verify WebhookLog uses field "webhook" (not "webhook_id") in values(); else this will KeyError at transform time.
- Ensure an index on created_at to keep this cutoff query efficient at scale.
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📒 Files selected for processing (2)

- apps/api/plane/bgtasks/cleanup_task.py (4 hunks)
- apps/api/plane/celery.py (1 hunks)

🧰 Additional context used

🧬 Code graph analysis (1)

apps/api/plane/bgtasks/cleanup_task.py (2)

- apps/api/plane/settings/mongo.py (1): MongoConnection (18-124)
- apps/api/plane/utils/exception_logger.py (1): log_exception (9-20)
🔇 Additional comments (5)

apps/api/plane/celery.py (2)

Lines 56-67: Daily schedule shifts look good. Times tightened to 02:45/03:00/03:15 are reasonable spacing and keep load staggered.

Lines 68-71: New webhook log cleanup beat entry: LGTM. Task path matches the new Celery task and the 03:30 UTC slot fits the cadence.

apps/api/plane/bgtasks/cleanup_task.py (3)

Line 24: Importing WebhookLog: LGTM. Required for queryset/delete path.

Line 31: Batch size reduced to 500: LGTM. Less memory pressure; matches iterator chunk_size usage.

Lines 475-484: New Celery task wrapper: LGTM. Aligns with generic process_cleanup_task pattern and collection name.
"check-every-day-to-delete-exporter-history": { | ||
"task": "plane.bgtasks.exporter_expired_task.delete_old_s3_link", | ||
"schedule": crontab(hour=3, minute=45), # UTC 03:45 | ||
}, |
💡 Verification agent
🧩 Analysis chain
Duplicate exporter-history beat entry will double-run the task.
You now have both:
- "check-every-day-to-delete_exporter_history" at 01:30 (existing), and
- "check-every-day-to-delete-exporter-history" at 03:45 (new).
Celery will schedule both since keys differ. Keep exactly one entry and one name. If you intend to move it to 03:45, delete this new block and update the original entry’s time; or delete the old one and keep this. Here’s the minimal fix to remove the duplicate new block:
```diff
-    "check-every-day-to-delete-exporter-history": {
-        "task": "plane.bgtasks.exporter_expired_task.delete_old_s3_link",
-        "schedule": crontab(hour=3, minute=45),  # UTC 03:45
-    },
```
🏁 Script executed:

```shell
#!/bin/bash
set -euo pipefail
echo "Searching repository for exporter schedule keys and task name..."
rg -n --hidden -S -C2 "check-every-day-to-delete-exporter-history|check-every-day-to-delete_exporter_history|delete_old_s3_link|exporter_expired_task" || true
echo
echo "Showing apps/api/plane/celery.py (if present)..."
if [ -f "apps/api/plane/celery.py" ]; then
  sed -n '1,240p' apps/api/plane/celery.py || true
else
  echo "apps/api/plane/celery.py not found"
fi
```
Length of output: 5595
Remove duplicate Celery beat entry for exporter history
Two scheduled entries call plane.bgtasks.exporter_expired_task.delete_old_s3_link; keep exactly one. Locations: apps/api/plane/celery.py lines 44–46 ("check-every-day-to-delete_exporter_history" — UTC 01:30) and lines 72–75 ("check-every-day-to-delete-exporter-history" — UTC 03:45). Remove the duplicate block or update the original entry’s schedule. Minimal fix (remove the new 03:45 block):
```diff
-    "check-every-day-to-delete-exporter-history": {
-        "task": "plane.bgtasks.exporter_expired_task.delete_old_s3_link",
-        "schedule": crontab(hour=3, minute=45),  # UTC 03:45
-    },
```
📝 Committable suggestion
‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
"check-every-day-to-delete-exporter-history": { | |
"task": "plane.bgtasks.exporter_expired_task.delete_old_s3_link", | |
"schedule": crontab(hour=3, minute=45), # UTC 03:45 | |
}, |
🤖 Prompt for AI Agents
apps/api/plane/celery.py around lines 72-75: there is a duplicate Celery beat
entry calling plane.bgtasks.exporter_expired_task.delete_old_s3_link (the
original already exists at lines 44-46 with UTC 01:30); remove the newer block
at lines 72-75 so only the original scheduled entry remains, ensuring no
duplicate tasks are scheduled.
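A duplicate beat entry of this kind can also be caught mechanically. The sketch below models the two entries discussed above as plain dicts (tuples stand in for the crontab objects, which this check never inspects) and counts task paths; the real schedule lives in apps/api/plane/celery.py:

```python
from collections import Counter

beat_schedule = {
    "check-every-day-to-delete_exporter_history": {
        "task": "plane.bgtasks.exporter_expired_task.delete_old_s3_link",
        "schedule": (1, 30),   # stand-in for crontab(hour=1, minute=30)
    },
    "check-every-day-to-delete-exporter-history": {
        "task": "plane.bgtasks.exporter_expired_task.delete_old_s3_link",
        "schedule": (3, 45),   # stand-in for crontab(hour=3, minute=45)
    },
}

# A task path appearing under more than one key will run more than once a day.
task_counts = Counter(entry["task"] for entry in beat_schedule.values())
duplicates = [task for task, n in task_counts.items() if n > 1]
```

Wiring a check like this into CI (asserting `duplicates` is empty) would have flagged the near-identical keys before merge.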
Description
Type of Change
Test Scenarios
References
WEB-4873
Summary by CodeRabbit
New Features
Chores