[GDB-13346] Enhance CloudWatch monitoring: add new alarms and refactor existing ones #129
Description
This PR enhances the CloudWatch monitoring configuration and improves alarm coverage and consistency across all GraphDB nodes.
Key Changes
Refactored existing alarms:
• Added treat_missing_data = "notBreaching" so that missing data points, for example during a deployment or while a node is down, no longer trigger false alarms.
• Unified comparison_operator to GreaterThanOrEqualToThreshold across all alarms for consistency.
Introduced new per-node alarms:
• disk_used_percent: triggers when root disk usage exceeds the threshold.
• mem_used_percent: monitors memory utilization on each instance.
• cpu_utilization: tracks CPU usage individually for each node.
Updated the CloudWatch Agent configuration:
• Added InstanceId as an appended dimension so that metrics are published per node and each alarm maps to the correct instance.
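For reviewers, the combination of settings above might look roughly like the following in Terraform. This is a minimal sketch, not the code in this PR: it assumes the alarms use the AWS provider's `aws_cloudwatch_metric_alarm` resource, and the variable names (`instance_ids`, `mem_used_threshold`, `alarm_topic_arn`), alarm names, period, threshold and the way the agent config is rendered are all hypothetical.

```hcl
# Hypothetical sketch: variable names, alarm names and the threshold value are
# illustrative assumptions, not taken from this PR.

variable "instance_ids" {
  description = "EC2 instance IDs of the GraphDB nodes"
  type        = list(string)
}

variable "mem_used_threshold" {
  description = "Memory usage (percent) at which to alarm"
  type        = number
  default     = 90
}

variable "alarm_topic_arn" {
  description = "SNS topic that receives alarm notifications"
  type        = string
}

locals {
  # CloudWatch agent config rendered from Terraform (hypothetical delivery).
  # "$${...}" escapes Terraform interpolation, so the agent receives the
  # literal placeholder ${aws:InstanceId} and fills in each node's own ID.
  cloudwatch_agent_config = jsonencode({
    metrics = {
      append_dimensions = {
        InstanceId = "$${aws:InstanceId}"
      }
      metrics_collected = {
        mem = {
          measurement = ["mem_used_percent"]
        }
        disk = {
          measurement = ["used_percent"]
          resources   = ["/"]
        }
      }
    }
  })
}

# One memory alarm per node, keyed by instance ID.
resource "aws_cloudwatch_metric_alarm" "mem_used_percent" {
  for_each = toset(var.instance_ids)

  alarm_name          = "graphdb-mem-used-percent-${each.value}"
  namespace           = "CWAgent"
  metric_name         = "mem_used_percent"
  statistic           = "Average"
  period              = 300
  evaluation_periods  = 2
  comparison_operator = "GreaterThanOrEqualToThreshold"
  threshold           = var.mem_used_threshold
  # Missing data points (e.g. a node being replaced) are treated as OK.
  treat_missing_data = "notBreaching"
  alarm_actions      = [var.alarm_topic_arn]

  # Must match the dimensions the agent actually publishes for this metric.
  dimensions = {
    InstanceId = each.value
  }
}
```

A disk_used_percent alarm would follow the same pattern, but the agent also publishes path, device and fstype dimensions for disk metrics, so those would have to be included in the alarm's dimensions to match the published series. For CPU, the built-in CPUUtilization metric in the AWS/EC2 namespace already carries an InstanceId dimension and needs no agent changes.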
Related Issues
[GDB-13446]