An autonomous, self-aware infrastructure management platform designed to monitor system metrics, detect anomalies using statistical analysis, and provide real-time operational insights via a professional monitoring stack.
This project demonstrates a complete microservice ecosystem built using modern DevOps and ML practices, from data ingestion and analysis to visualization and automated lifecycle management.
A decoupled system consisting of:
- Backend Service (Spring Boot): Handles API logic and data storage.
- ML Worker Service (Python): Performs anomaly detection.
- Monitoring Stack (Prometheus + Grafana): Provides observability and analytics.
The Spring Boot backend exposes REST APIs for collecting arbitrary time-series metric data, which is stored in a PostgreSQL database.
A continuously running Python ML worker fetches recent metrics and applies the Modified Z-Score algorithm — a robust statistical method that identifies outliers without assuming a Gaussian distribution.
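The worker's core statistic can be sketched in a few lines of Pandas. This is an illustrative reconstruction, not the worker's actual code: the 0.6745 scaling constant and the conventional 3.5 cutoff come from the standard Modified Z-Score formulation (Iglewicz & Hoberg), and are assumptions about how the worker is tuned.

```python
import pandas as pd

def modified_z_scores(values: pd.Series) -> pd.Series:
    """Modified Z-Score: scaled deviation from the median, normalized by the
    median absolute deviation (MAD). Robust to outliers and non-Gaussian data."""
    median = values.median()
    mad = (values - median).abs().median()
    if mad == 0:
        # All readings (essentially) identical: nothing can be flagged.
        return pd.Series(0.0, index=values.index)
    return 0.6745 * (values - median) / mad

def detect_anomalies(values: pd.Series, threshold: float = 3.5) -> pd.Series:
    """Flag readings whose |Modified Z-Score| exceeds the cutoff."""
    return modified_z_scores(values).abs() > threshold

# Five ordinary CPU readings plus one obvious spike:
cpu = pd.Series([0.82, 0.85, 0.79, 0.88, 0.84, 25.0])
print(detect_anomalies(cpu).tolist())  # only the last reading is flagged
```

Because the method uses the median and MAD rather than the mean and standard deviation, a single extreme spike cannot inflate the baseline and mask itself.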
Integrates Prometheus for metrics scraping and Grafana for interactive dashboards that visualize:
- System health
- Metric trends
- Detected anomalies
All components — Backend, ML Worker, Database, Prometheus, Grafana — are containerized using Podman.
A single automation script (runAro.sh) manages:
- Environment cleanup
- Image rebuild
- Sequential startup
This ensures reproducibility, consistency, and zero manual setup.
| Layer | Technology/Tool | Purpose |
|---|---|---|
| Backend | Java 17+, Spring Boot, Spring Data JPA, Spring Actuator | API, persistence, and metric exposure |
| ML/AI | Python, Pandas | Data manipulation and statistical anomaly detection |
| Database | PostgreSQL | Storage for time-series metrics |
| DevOps | Podman, Dockerfile, Bash, SELinux | Containerization and lifecycle automation |
| Monitoring | Prometheus, Grafana | Metrics collection and dashboard visualization |
| OS | Rocky Linux | Base runtime environment |
- Linux VM (tested on Rocky Linux 9)
- git, podman (or docker)
- Internet access for container image pulls
```bash
git clone https://github.com/prodXCE/autonomous-resource-optimizer
cd autonomous-resource-optimizer
chmod +x runAro.sh
./runAro.sh
```
This script automatically:
- Stops existing containers
- Rebuilds images
- Launches services in order
```bash
sudo podman ps
```
You should see the following containers:
- aro-backend-app
- postgres-db
- aro-ml-worker
- prometheus
- grafana
All with status: Up ✅
| Service | URL | Default Credentials |
|---|---|---|
| Grafana | http://<your-vm-ip>:3000 | admin / admin |
| Prometheus | http://<your-vm-ip>:9090 | — |
```bash
./runAro.sh

# Normal Data
curl -X POST -H "Content-Type: application/json" \
  -d '{"source": "vm-prod-01", "metricType": "CPU_USAGE", "value": 0.85}' \
  http://<your-vm-ip>:8080/api/metrics

# Anomaly Data
curl -X POST -H "Content-Type: application/json" \
  -d '{"source": "vm-prod-01", "metricType": "CPU_USAGE", "value": 25.0}' \
  http://<your-vm-ip>:8080/api/metrics
```
Monitor the ML worker logs:
```bash
sudo podman run --name aro-ml-worker --replace --network private-net aro-ml-worker:latest
sudo podman logs -f aro-ml-worker
```
Within 60 seconds, you should see log entries highlighting the anomaly detection event.
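The same test traffic can also be scripted instead of typed as curl commands. A minimal sketch using only the Python standard library — the endpoint path and payload fields are taken from the curl examples above; the `localhost` URL and the helper names are illustrative, not part of the project:

```python
import json
import urllib.request

# Replace localhost with your VM's IP if running remotely.
API_URL = "http://localhost:8080/api/metrics"

def build_metric(source: str, metric_type: str, value: float) -> bytes:
    """Serialize one metric reading into the JSON body the backend accepts."""
    payload = {"source": source, "metricType": metric_type, "value": value}
    return json.dumps(payload).encode("utf-8")

def post_metric(body: bytes) -> int:
    """POST a serialized reading to the backend; returns the HTTP status code."""
    req = urllib.request.Request(
        API_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status

# Example usage (requires the stack to be running):
#   post_metric(build_metric("vm-prod-01", "CPU_USAGE", 0.85))  # normal reading
#   post_metric(build_metric("vm-prod-01", "CPU_USAGE", 25.0))  # anomalous spike
```

Looping `post_metric` over a batch of normal readings before injecting the spike gives the ML worker a realistic baseline to score against.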