Introduction
In Stage 4A, I built a CLI tool (swiftdeploy) that generates infrastructure from a single file (manifest.yaml).
In Stage 4B, I extended it to include:
- Observability (metrics)
- Policy enforcement (OPA)
- Auditing (history + reports)
The goal was simple but strict:
The system must refuse to deploy or promote if it is unsafe.
This meant moving from just “running containers” to building a system that can think and decide before acting.
⸻
Architectural Overview
manifest.yaml
↓
swiftdeploy CLI
↓
docker-compose + nginx
↓
Docker Network
↓
[ NGINX ] → [ APP (/metrics) ]
↓
metrics
↓
CLI
↓
OPA
At a high level:
- manifest.yaml is the single source of truth
- swiftdeploy CLI reads it and generates:
- docker-compose.yml
- nginx.conf
- Docker runs:
- API service
- Nginx (reverse proxy)
- OPA (policy engine)
Decision flow:
CLI → collect data → send to OPA → receive decision → deploy or block
The Design: A Tool That Writes Its Own Infrastructure
The core idea was:
I don’t manually write configs — I generate them.
Instead of editing multiple files, I only update:
manifest.yaml
then:
python swiftdeploy.py init
This generates:
- docker-compose.yml
- nginx.conf
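As a minimal sketch of what the generation step could look like, assuming the manifest has already been parsed into a dict (the function names, manifest fields, and templates here are illustrative, not the actual swiftdeploy code):

```python
# Hypothetical sketch of `swiftdeploy init`: render configs from a
# parsed manifest dict. Field names are assumptions for illustration.

def render_compose(manifest: dict) -> str:
    """Render a minimal docker-compose.yml from the manifest."""
    lines = ["services:"]
    for name, svc in manifest["services"].items():
        lines += [
            f"  {name}:",
            f"    image: {svc['image']}",
            "    ports:",
            f"      - \"{svc['port']}:{svc['port']}\"",
        ]
    return "\n".join(lines) + "\n"

def render_nginx(manifest: dict) -> str:
    """Render an nginx.conf proxy block pointing at the API service."""
    api = manifest["services"]["api"]
    return (
        "server {\n"
        "    listen 80;\n"
        "    location / {\n"
        f"        proxy_pass http://api:{api['port']};\n"
        "    }\n"
        "}\n"
    )

manifest = {"services": {"api": {"image": "myapp:latest", "port": 8000}}}
```

Because both files come from the same dict, they can never disagree about a port or image name, which is the whole point of a single source of truth.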
Why this matters
- Reduces manual errors
- Keeps configuration consistent
- Makes the system reproducible
If I delete my configs, I can regenerate everything from the manifest.
Observability: Adding the “Eyes” (/metrics)
I added a /metrics endpoint to the API in Prometheus format.
It tracks:
- Throughput & errors: http_requests_total{method, path, status_code}
- Latency: http_request_duration_seconds_bucket
- Application state:
  - app_uptime_seconds
  - app_mode (0 = stable, 1 = canary)
  - chaos_active
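To make the exposition format concrete, here is a rough hand-rolled sketch of how such an endpoint could render Prometheus text output using only the standard library (the real endpoint may well use a Prometheus client library instead; counter storage and metric names below are illustrative):

```python
import time

START = time.time()
REQUESTS = {}  # (method, path, status_code) -> count

def observe(method: str, path: str, status: int) -> None:
    """Record one handled request, labelled by method/path/status."""
    key = (method, path, str(status))
    REQUESTS[key] = REQUESTS.get(key, 0) + 1

def render_metrics(mode: int = 0, chaos: int = 0) -> str:
    """Render the current state in Prometheus text exposition format."""
    lines = ["# TYPE http_requests_total counter"]
    for (m, p, s), n in sorted(REQUESTS.items()):
        lines.append(
            f'http_requests_total{{method="{m}",path="{p}",status_code="{s}"}} {n}'
        )
    lines += [
        "# TYPE app_uptime_seconds gauge",
        f"app_uptime_seconds {time.time() - START:.0f}",
        f"app_mode {mode}",
        f"chaos_active {chaos}",
    ]
    return "\n".join(lines) + "\n"
```

The labelled counter is what later makes the error-rate calculation possible: the CLI can sum counts by status_code without the app having to pre-compute anything.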
The Guardrails: Policy Enforcement with OPA
Instead of writing logic inside the CLI, I used Open Policy Agent.
Key Rule:
The CLI must NOT decide anything — OPA decides everything.
🔹 Infrastructure Policy (Pre-Deploy)
Checks:
- Disk space
- CPU load
Example rule:
Deny if disk_free < 10GB
Deny if cpu_load > 2.0
If I artificially reduce disk space:
BLOCKED: Disk below threshold
👉 This satisfies the Hard Gate requirement
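A sketch of how the CLI side of this gate could work against OPA's standard REST Data API (the policy package path and input field names are assumptions; the `deny_reasons` function mirrors the Rego thresholds in Python purely for illustration — in the real flow OPA alone evaluates them):

```python
import json
import urllib.request

GB = 1024 ** 3
OPA_URL = "http://localhost:8181/v1/data/infra/deny"  # hypothetical package path

def gather_input(disk_free_bytes: int, cpu_load: float) -> dict:
    """Package host facts into an OPA input document."""
    return {"input": {"disk_free_gb": disk_free_bytes / GB, "cpu_load": cpu_load}}

def ask_opa(payload: dict) -> list:
    """POST the input document to OPA; return the list of deny reasons."""
    req = urllib.request.Request(
        OPA_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp).get("result", [])

# Python mirror of the Rego rules, shown only to make the thresholds
# concrete. The CLI never evaluates these itself.
def deny_reasons(payload: dict) -> list:
    reasons = []
    if payload["input"]["disk_free_gb"] < 10:
        reasons.append("Disk below threshold")
    if payload["input"]["cpu_load"] > 2.0:
        reasons.append("CPU load too high")
    return reasons
```

If `ask_opa` returns a non-empty list, the CLI prints the reasons and aborts; an empty list means deploy.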
⸻
🔹 Canary Safety Policy (Pre-Promote)
Before promoting, the CLI:
- Scrapes /metrics
- Calculates:
- Error rate
- P99 latency
- Sends to OPA
Policy:
Deny if error_rate > 1%
Deny if p99_latency > 500ms
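The promotion check boils down to two numbers derived from the scraped counters and histogram buckets. A sketch of that derivation, assuming status-code counts and cumulative histogram buckets have already been parsed out of /metrics (function names are illustrative):

```python
def error_rate(counts: dict) -> float:
    """counts maps status_code -> request count, e.g. {"200": 990, "500": 10}."""
    total = sum(counts.values())
    errors = sum(n for code, n in counts.items() if code.startswith("5"))
    return errors / total if total else 0.0

def p99_from_buckets(buckets: list) -> float:
    """buckets: sorted (upper_bound_seconds, cumulative_count) pairs from
    http_request_duration_seconds_bucket; returns the first bound that
    covers 99% of requests (a standard histogram-quantile estimate)."""
    total = buckets[-1][1]
    for bound, cumulative in buckets:
        if cumulative >= 0.99 * total:
            return bound
    return buckets[-1][0]

# Local mirror of the canary policy thresholds, for illustration only;
# OPA makes the actual decision from these two inputs.
def promotion_denied(counts: dict, buckets: list) -> list:
    reasons = []
    if error_rate(counts) > 0.01:
        reasons.append("Error rate above 1%")
    if p99_from_buckets(buckets) > 0.5:
        reasons.append("Latency too high")
    return reasons
```

Note that P99 from buckets is an estimate bounded by bucket resolution, which is exactly why the thresholds should leave some margin.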
⸻
Why Isolation Matters
OPA runs as a separate container and:
- Is reachable by the CLI
- Is NOT exposed through Nginx
👉 This ensures:
- No external access to policy engine
- Clear separation of responsibilities
This satisfies the “No Leakage” requirement
⸻
🧪 The Chaos: Testing Failure Scenarios
I implemented a /chaos endpoint:
Modes:
- slow → delays responses
- error → randomly returns 500
- recover → resets system
Example request body:
{ "mode": "slow", "duration": 2 }
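A rough sketch of how such an endpoint's internals could work: a global chaos state that each request handler consults before responding (state shape and handler names are assumptions, not the actual implementation):

```python
import random

CHAOS = {"mode": None, "duration": 0}

def set_chaos(payload: dict) -> dict:
    """Hypothetical /chaos handler body: update the global chaos state."""
    if payload.get("mode") == "recover":
        CHAOS.update(mode=None, duration=0)
    else:
        CHAOS.update(mode=payload["mode"], duration=payload.get("duration", 0))
    return dict(CHAOS)

def apply_chaos() -> tuple:
    """Consulted on every request: returns (delay_seconds, status_code)."""
    if CHAOS["mode"] == "slow":
        return CHAOS["duration"], 200   # sleep before responding
    if CHAOS["mode"] == "error" and random.random() < 0.5:
        return 0, 500                   # randomly fail half of requests
    return 0, 200
```

Because the injected delay and errors flow through the same request handlers, they show up in the real /metrics counters rather than in some separate test harness.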
What Happened
When I injected chaos:
- Latency increased
- Error rate increased
- Metrics reflected the change
When I tried to promote:
BLOCKED: Latency too high
👉 This confirmed:
The system reacts to real runtime conditions, not assumptions
⸻
The Eyes: swiftdeploy status
This command:
python swiftdeploy.py status
- Continuously scrapes /metrics
- Displays live system state
- Logs everything to:
history.jsonl
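The logging itself can be very small: JSONL (one JSON object per line) makes the file append-only and trivially replayable later. A sketch, assuming one snapshot dict per scrape (the record shape is illustrative):

```python
import json
import time

def append_history(path: str, snapshot: dict) -> None:
    """Append one timestamped JSON record per line (JSONL), so the
    history file can be replayed line by line for auditing."""
    record = {"ts": time.time(), **snapshot}
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")
```

Append-only lines mean a crash mid-write can at worst corrupt the final line, never the earlier history.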
The Memory: Audit System
From the logs, I generate:
python swiftdeploy.py audit
This creates:
audit_report.md
Contents:
- Timeline of events
- Policy violations
👉 The report renders cleanly in GitHub Markdown
(Satisfies submission requirement)
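Generating the report is then just a replay of history.jsonl into Markdown. A minimal sketch, assuming each record carries an event name plus optional blocked/reason fields (those field names are assumptions for illustration):

```python
import json

def build_audit_report(history_lines: list) -> str:
    """Hypothetical sketch of `swiftdeploy audit`: turn history.jsonl
    records into a GitHub-flavoured Markdown report."""
    out = ["# Audit Report", "", "## Timeline"]
    violations = []
    for line in history_lines:
        event = json.loads(line)
        out.append(f"- `{event['ts']}` {event['event']}")
        if event.get("blocked"):
            violations.append(f"- {event['event']}: {event['reason']}")
    out += ["", "## Policy Violations"] + (violations or ["- none"])
    return "\n".join(out) + "\n"
```

Because the report is derived from the same log the status command writes, the timeline and the violations section can never drift apart.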
⸻
Lessons Learned
This stage changed how I think about DevOps:
- Deployment is not just execution
It’s decision-making
⸻
- Policies should be external
Keeping logic in OPA:
- makes it reusable
- avoids tightly coupled code
⸻
- Metrics are not just for monitoring
They actively drive decisions
⸻
- Debugging is part of the process
I faced:
- YAML errors
- Docker rebuild issues
- Nginx misconfigurations
- OPA connection failures
Fixing them helped me understand the system deeply.
⸻
✅ Final Checklist (Submission Criteria)
✔ manifest.yaml is the only edited file
✔ Deployment blocked when disk is low
✔ OPA not exposed via Nginx
✔ Metrics fully implemented
✔ Audit report generated and readable
✔ Blog includes architecture diagram
⸻
Conclusion
This project helped me move from:
running commands → building systems that enforce rules
I now better understand how:
- observability
- policy
- infrastructure
work together in real-world systems.
⸻
If you’re learning DevOps, my biggest takeaway is:
Don’t just deploy — build systems that decide when deployment is safe.