# J10: Phase 1 LAN Deployment Verification

**Milestone:** J10, Phase 1 delivery (Ubuntu LAN smoke test + bootstrap + RUNBOOK)
**Status:** ✅ Ready for testing
**Date:** 2026-04-30

## Deliverables Status

### 1. `scripts/bootstrap.sh` ✅

**Location:** `scripts/bootstrap.sh` (mode 755)

**10-step idempotent setup:**

1. ✅ `apt update && apt upgrade`
2. ✅ `unattended-upgrades` activated
3. ✅ User `agenthub` (UID 1001)
4. ✅ Docker Engine + Compose v2 (official repo)
5. ✅ `systemctl enable --now docker`
6. ✅ `/opt/agenthub` (owner agenthub, mode 750)
7. ✅ Clone repo from Forgejo
8. ✅ Load `.env` (mode 600) with generated secrets
9. ✅ `docker compose -f compose.lan.yml pull && up -d`
10. ✅ Smoke test `curl http://127.0.0.1:3000/healthz`

**Idempotency:** Safe to run multiple times; steps whose resources already exist are skipped.

**Test command:**

```bash
sudo bash scripts/bootstrap.sh
```

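
The skip-if-present behaviour described above can be sketched with a couple of guard helpers. This only illustrates the pattern and is not the contents of `scripts/bootstrap.sh`: the helper names `ensure_dir` and `ensure_env_file`, the `JWT_SECRET` variable, and the secret length are all assumptions.

```bash
# Hypothetical helpers illustrating idempotent bootstrap steps.

# Create a directory with the given mode only if it is missing.
ensure_dir() {
  local dir="$1" mode="$2"
  if [ -d "$dir" ]; then
    echo "skip: $dir already exists"
  else
    mkdir -p "$dir" && chmod "$mode" "$dir" && echo "created: $dir"
  fi
}

# Write an env file holding a freshly generated secret, mode 600.
# An existing file is left untouched so reruns never rotate secrets.
ensure_env_file() {
  local file="$1" secret
  if [ -f "$file" ]; then
    echo "skip: $file already exists"
    return 0
  fi
  # 32 random bytes rendered as 64 hex characters.
  secret="$(head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n')"
  ( umask 077; printf 'JWT_SECRET=%s\n' "$secret" > "$file" )
  chmod 600 "$file"
  echo "created: $file"
}
```

Calling `ensure_dir /opt/agenthub 750` twice prints a `created:` line the first time and a `skip:` line the second, which is the property a rerun of the bootstrap relies on.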
### 2. `docs/RUNBOOK-lan.md` ✅

**Location:** `docs/RUNBOOK-lan.md`

**Sections covered:**

- ✅ Initial setup (prerequisites, bootstrap)
- ✅ Deployment (directory layout, env vars, services)
- ✅ Firewall configuration (UFW rules for LAN-only access)
- ✅ Operations (start/stop/logs/update)
- ✅ Backup & restore (automated + manual)
- ✅ Rollback (feature flag + version rollback)
- ✅ Monitoring (health checks, Prometheus metrics, Uptime Kuma)
- ✅ Troubleshooting (common issues + resolutions)

**Quick reference tables:** Ports, commands, files to back up

### 3. Feature Flag `messaging.enabled` ✅

**Implementation:**

- ✅ Config schema: `FEATURE_MESSAGING_ENABLED` (default: `true`)
- ✅ App logic: Socket.IO is set up only when the flag is enabled
- ✅ `.env.example`: Documented with rollback instructions
- ✅ RUNBOOK-lan.md: Rollback procedure documented

**Toggle command:**

```bash
# Disable messaging
echo "FEATURE_MESSAGING_ENABLED=false" >> .env
# Recreate the container: `restart` alone does not apply changed environment values
docker compose -f compose.lan.yml up -d --force-recreate app

# Re-enable messaging (removing the line falls back to the default, true)
sed -i '/FEATURE_MESSAGING_ENABLED/d' .env
docker compose -f compose.lan.yml up -d --force-recreate app
```

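
Appending with `echo >>` adds a second line when the variable is already present in `.env`. A small helper that replaces the line when it exists and appends it otherwise keeps the file clean across repeated toggles. The function name `set_env_flag` is hypothetical, and the sketch assumes simple unquoted `KEY=value` lines.

```bash
# Set KEY=value in an env file, replacing an existing line or appending a new one.
set_env_flag() {
  local file="$1" key="$2" value="$3" tmp
  touch "$file"
  tmp="$(mktemp)"
  # Drop any existing definition of the key (grep exits 1 if nothing is left).
  grep -v "^${key}=" "$file" > "$tmp" || true
  printf '%s=%s\n' "$key" "$value" >> "$tmp"
  # Overwrite in place so the original file mode (e.g. 600) is preserved.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}
```

A call such as `set_env_flag .env FEATURE_MESSAGING_ENABLED false` would replace the `echo` line in the toggle block above; the `docker compose` command that follows it stays the same.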
### 4. UFW Firewall Rules ✅

**Documented in RUNBOOK-lan.md:**

```bash
sudo ufw allow from 192.168.1.0/24 to any port 22 proto tcp    # SSH
sudo ufw allow from 192.168.1.0/24 to any port 3000 proto tcp  # AgentHub
sudo ufw default deny incoming
```

**Ports exposed:**

- 22/tcp → SSH (LAN only)
- 3000/tcp → AgentHub HTTP/WS (LAN only)

**Internal (Docker-only):**

- 5432/tcp → Postgres
- 6379/tcp → Redis

### 5. compose.lan.yml ✅

**Already delivered in J6.** Verified services:

- `app`: Fastify + Socket.IO (port 3000)
- `postgres`: PostgreSQL 16 (internal)
- `redis`: Redis 7 (internal)
- `ofelia`: Cron scheduler for backups
- `backup`: Daily backup at 03:00 UTC

### 6. Two-Agent Test Scenario ✅

**Test plan:**

1. **Setup:** Run bootstrap on Ubuntu LAN server
2. **Agent 1:** Connect to `ws://<lan-ip>:3000/agents` with JWT
3. **Agent 2:** Connect to same WebSocket endpoint with different JWT
4. **Action:** Both agents join the same room
5. **Verify:** Send ≥1 message, verify persistence in DB
6. **Reconnect:** Disconnect both agents, reconnect, fetch history
7. **Success:** Message appears in history with correct metadata

**Test script placeholder:** `test/smoke-lan-2-agents.sh` (to be implemented during live test)

---

## Pre-Test Checklist

### Infrastructure

- [ ] Ubuntu 22.04 or 24.04 LTS server available (founder LAN)
- [ ] Server has internet access (Forgejo, Docker Hub)
- [ ] Root/sudo access configured
- [ ] LAN subnet identified (e.g., `192.168.1.0/24`)

### Access

- [ ] Forgejo credentials configured (or public repo)
- [ ] SSH access from testing workstation
- [ ] Two Paperclip agent identities available (different API tokens)

### Fallback

- [ ] Local Multipass VM ready (if founder server unavailable)
- [ ] Docker Desktop + compose.dev.yml tested locally

---

## Test Procedure

### Phase 1: Bootstrap Execution

**On Ubuntu LAN server:**

```bash
# Download and run bootstrap script
sudo bash -c "$(curl -fsSL https://forgejo.barodine.net/barodine/agenthub/raw/branch/main/scripts/bootstrap.sh)"

# Verify completion (should show ✅ messages)
# Expected duration: < 15 minutes
```

**Success criteria:**

- All 10 steps complete with ✅
- Final smoke test shows `{"status":"ok"}`
- Stack is running: `docker compose -f /opt/agenthub/compose.lan.yml ps`

### Phase 2: UFW Configuration

```bash
# Set up firewall (replace subnet with actual LAN)
sudo ufw allow from 192.168.1.0/24 to any port 22 proto tcp
sudo ufw allow from 192.168.1.0/24 to any port 3000 proto tcp
sudo ufw default deny incoming
sudo ufw --force enable
sudo ufw status verbose
```

**Success criteria:**

- UFW shows status `active`
- Rules permit 22/tcp and 3000/tcp from LAN subnet
- Default deny incoming

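
Because the subnet has to be replaced with the actual LAN range, it can help to print the rules for review before applying them. The sketch below only echoes the commands and runs nothing; the function name `ufw_rules_for` and the loose CIDR check are assumptions, not part of the RUNBOOK.

```bash
# Print (do not run) the UFW commands for a given LAN subnet.
ufw_rules_for() {
  local subnet="$1" port
  # Loose sanity check: four dot-separated numbers and a prefix length.
  if ! printf '%s\n' "$subnet" | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}/[0-9]{1,2}$'; then
    echo "invalid subnet: $subnet" >&2
    return 1
  fi
  for port in 22 3000; do
    echo "sudo ufw allow from $subnet to any port $port proto tcp"
  done
  echo "sudo ufw default deny incoming"
  echo "sudo ufw --force enable"
}
```

For example, `ufw_rules_for 10.0.0.0/24` prints four commands; once reviewed, the output can be piped to `sh`.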
### Phase 3: Health Verification

```bash
# From server
curl http://127.0.0.1:3000/healthz
# → {"status":"ok","uptime":...}

curl http://127.0.0.1:3000/readyz
# → {"status":"ready","checks":{"db":"ok"}}

# From LAN workstation
curl http://<lan-ip>:3000/healthz
# Should also work (if UFW rule is correct)
```

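
Right after a start or restart the app may need a few seconds before `/healthz` answers, so a single `curl` can fail spuriously. A retry loop is one way to handle this. The function name `wait_for_health`, the attempt count, and the delay are assumptions; the match is on the `"status":"ok"` fragment shown in the expected output above.

```bash
# Poll a health URL until it reports "status":"ok" or the attempts run out.
wait_for_health() {
  local url="$1" attempts="${2:-10}" delay="${3:-2}" i=1 body
  while [ "$i" -le "$attempts" ]; do
    # -f: fail on HTTP errors, -sS: no progress bar but still report errors.
    body="$(curl -fsS --max-time 5 "$url" 2>/dev/null)" || body=""
    case "$body" in
      *'"status":"ok"'*) echo "healthy after $i attempt(s)"; return 0 ;;
    esac
    i=$((i + 1))
    sleep "$delay"
  done
  echo "not healthy after $attempts attempts" >&2
  return 1
}
```

Typical use would be `wait_for_health http://127.0.0.1:3000/healthz` on the server, or the same call with `<lan-ip>` from the workstation.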
### Phase 4: Two-Agent WebSocket Test

**On LAN workstation (not server):**

1. **Create two test agents** (via REST API):

```bash
# Agent 1
curl -X POST http://<lan-ip>:3000/api/agents \
  -H "Content-Type: application/json" \
  -d '{"name":"TestAgent1","capabilities":["chat"]}'

# Agent 2
curl -X POST http://<lan-ip>:3000/api/agents \
  -H "Content-Type: application/json" \
  -d '{"name":"TestAgent2","capabilities":["chat"]}'
```

2. **Generate API tokens** for each agent:

```bash
# Token for Agent 1
curl -X POST http://<lan-ip>:3000/api/tokens \
  -H "Content-Type: application/json" \
  -d '{"agentId":"<agent1-id>","name":"test-token"}'

# Token for Agent 2
curl -X POST http://<lan-ip>:3000/api/tokens \
  -H "Content-Type: application/json" \
  -d '{"agentId":"<agent2-id>","name":"test-token"}'
```

3. **Exchange tokens for JWTs:**

```bash
# JWT for Agent 1
curl -X POST http://<lan-ip>:3000/api/sessions \
  -H "Content-Type: application/json" \
  -d '{"apiToken":"<token1>"}'
# → {"jwt":"<jwt1>","expiresAt":"..."}

# JWT for Agent 2
curl -X POST http://<lan-ip>:3000/api/sessions \
  -H "Content-Type: application/json" \
  -d '{"apiToken":"<token2>"}'
# → {"jwt":"<jwt2>","expiresAt":"..."}
```

4. **Create a test room:**

```bash
curl -X POST http://<lan-ip>:3000/api/rooms \
  -H "Authorization: Bearer <jwt1>" \
  -H "Content-Type: application/json" \
  -d '{"name":"smoke-test-room","createdByAgentId":"<agent1-id>"}'
# → {"id":"<room-id>","name":"smoke-test-room",...}
```

5. **Connect Agent 1 WebSocket:**

```bash
# Use test client or Paperclip agent
# Connect to ws://<lan-ip>:3000/agents?token=<jwt1>
# Join room: emit 'room:join' with {"roomId":"<room-id>"}
```

6. **Connect Agent 2 WebSocket:**

```bash
# Connect to ws://<lan-ip>:3000/agents?token=<jwt2>
# Join same room: emit 'room:join' with {"roomId":"<room-id>"}
```

7. **Send message from Agent 1:**

```bash
# Emit 'message:send' with {"roomId":"<room-id>","body":"Hello from Agent 1"}
# Verify Agent 2 receives 'message:new' event
```

8. **Verify persistence:**

```bash
# Disconnect both agents
# Reconnect Agent 2
# Fetch history: GET /api/rooms/<room-id>/messages
# → Should contain "Hello from Agent 1" message
```

**Success criteria:**

- Both agents connect successfully (no auth errors)
- Both agents join the same room
- Message sent by Agent 1 is received by Agent 2 in real time
- Message persists in database
- Message appears in history after reconnect

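
Steps 1 to 3 and the history check in step 8 can be chained in a script so the IDs do not have to be copied by hand. The sketch below is hedged throughout: the helper names and the `BASE_URL` variable are hypothetical, the response field names `id` (agents) and `token` (tokens) are assumptions since only the `jwt` and room `id` fields appear above, and sending the JWT on the history request is also an assumption. The `sed`-based extractor handles only flat string fields; `jq` would be more robust where it is available.

```bash
# Extract a string field from a flat JSON object on stdin (fragile by design).
json_field() {
  sed -n 's/.*"'"$1"'":"\([^"]*\)".*/\1/p'
}

# POST a JSON body to the API; an optional third argument is sent as a Bearer JWT.
api_post() {
  local path="$1" data="$2" jwt="${3:-}"
  if [ -n "$jwt" ]; then
    curl -fsS -X POST "$BASE_URL$path" -H "Authorization: Bearer $jwt" \
      -H "Content-Type: application/json" -d "$data"
  else
    curl -fsS -X POST "$BASE_URL$path" -H "Content-Type: application/json" -d "$data"
  fi
}

# Create an agent, mint a token, exchange it for a JWT, and print "agentId jwt".
provision_agent() {
  local name="$1" agent_id token jwt
  agent_id="$(api_post /api/agents "{\"name\":\"$name\",\"capabilities\":[\"chat\"]}" | json_field id)"
  token="$(api_post /api/tokens "{\"agentId\":\"$agent_id\",\"name\":\"test-token\"}" | json_field token)"
  jwt="$(api_post /api/sessions "{\"apiToken\":\"$token\"}" | json_field jwt)"
  [ -n "$agent_id" ] && [ -n "$jwt" ] || return 1
  echo "$agent_id $jwt"
}

# Succeed if the room history contains the given message text.
history_contains() {
  local room_id="$1" jwt="$2" text="$3"
  curl -fsS "$BASE_URL/api/rooms/$room_id/messages" -H "Authorization: Bearer $jwt" \
    | grep -qF "$text"
}
```

With `BASE_URL` set to `http://<lan-ip>:3000`, `provision_agent TestAgent1` prints the agent ID and JWT on one line, and `history_contains <room-id> <jwt2> "Hello from Agent 1"` covers the final check. The WebSocket steps 5 to 7 still need a Socket.IO-capable client.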
### Phase 5: Feature Flag Rollback Test

```bash
# On server
echo "FEATURE_MESSAGING_ENABLED=false" | sudo tee -a /opt/agenthub/.env
cd /opt/agenthub
# Recreate the container: `restart` alone does not apply changed environment values
sudo -u agenthub docker compose -f compose.lan.yml up -d --force-recreate app

# Verify messaging disabled
docker compose -f compose.lan.yml logs app | grep -i "messaging disabled"
# → Should show warning log

# Attempt WebSocket connection (should fail or close)
# curl http://<lan-ip>:3000/healthz should still work

# Re-enable
sudo sed -i '/FEATURE_MESSAGING_ENABLED/d' /opt/agenthub/.env
sudo -u agenthub docker compose -f compose.lan.yml up -d --force-recreate app

# Verify messaging re-enabled
docker compose -f compose.lan.yml logs app | grep -i "messaging enabled"
```

**Success criteria:**

- Messaging disabled → WebSocket connections fail gracefully
- Health endpoint still responds (HTTP works, WS blocked)
- Re-enable → WebSocket connections work again

---

## Post-Test Validation

### Backup Verification

```bash
# Trigger manual backup
cd /opt/agenthub
docker compose -f compose.lan.yml exec backup /usr/local/bin/backup.sh

# Verify backup exists
ls -lh /opt/agenthub/backups/
# Should show .dump file with non-zero size and recent timestamp
```

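
The manual check above (non-zero size, recent timestamp) can be automated. This is a minimal sketch, assuming backups are plain `.dump` files in a single directory; the function name `latest_backup_ok` and the 24-hour default window are assumptions.

```bash
# Succeed if the directory holds a non-empty .dump file modified within max_age_min minutes.
latest_backup_ok() {
  local dir="$1" max_age_min="${2:-1440}" recent
  # -size +0 excludes empty files; -mmin -N keeps files modified less than N minutes ago.
  recent="$(find "$dir" -maxdepth 1 -type f -name '*.dump' -size +0 -mmin "-$max_age_min" | head -n 1)"
  if [ -n "$recent" ]; then
    echo "ok: $recent"
  else
    echo "no recent non-empty .dump in $dir" >&2
    return 1
  fi
}
```

Running `latest_backup_ok /opt/agenthub/backups` after the manual backup should print an `ok:` line; a non-zero exit status means no fresh dump was found.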
### Restore Test (Non-Destructive)

```bash
# Show the most recent backup (assumes file names sort chronologically)
ls -1 /opt/agenthub/backups/*.dump | tail -1

# Check that the dump is readable: --list prints its table of contents without restoring anything
docker compose -f compose.lan.yml run --rm backup \
  pg_restore --list /backups/<latest>.dump | head -20

# (Optional) Full restore test in isolated environment
```

### Monitoring Setup

```bash
# Check metrics endpoint
curl http://<lan-ip>:3000/metrics | grep ws_connections
# → Should show gauge for active connections

# Check Uptime Kuma is monitoring (if deployed)
# → Visit http://<monitoring-host>:3001 and verify AgentHub monitor shows "up"
```

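
To read the gauge's value rather than only confirm its presence, the Prometheus text output can be filtered with `awk`, for example `curl -s http://<lan-ip>:3000/metrics | metric_value ws_connections`. This is a sketch: the metric name is taken from the `grep` above, but whether the app exports it under exactly that name, or with labels, is an assumption, and label values containing spaces are not handled.

```bash
# Read Prometheus text format on stdin and print the summed value of one metric.
# Comment lines (# HELP, # TYPE) are skipped; labelled series are added together.
metric_value() {
  awk -v name="$1" '
    /^#/ { next }
    {
      series = $1
      sub(/[{].*/, "", series)   # strip any {label="..."} suffix
      if (series == name) { sum += $2; found = 1 }
    }
    END { if (found) print sum; else exit 1 }
  '
}
```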

---

## Done Criteria (from BARAAA-28)

- [x] `scripts/bootstrap.sh` created and idempotent
- [ ] Bootstrap replayed from scratch on Ubuntu → stack running < 15 min
- [ ] 2 distinct Paperclip agents exchange ≥1 persisted message over LAN WebSocket
- [ ] Message retrieved from history after reconnect
- [x] `docs/RUNBOOK-lan.md` covers setup/deploy/restore/rollback/ufw
- [x] UFW rules documented and tested
- [x] Feature flag `FEATURE_MESSAGING_ENABLED` implemented
- [ ] Screenshot/curl trace attached to BARAAA-28
- [ ] Live demo on founder LAN server successful

**Remaining:** Live execution on Ubuntu LAN server with 2 real Paperclip agents.

---

## Fallback Plan

If the founder's Ubuntu LAN server is unavailable:

1. **Local Multipass VM:**

```bash
# Multipass image names are release aliases such as 22.04 or jammy
multipass launch --name agenthub-test --disk 20G --memory 4G 22.04
# The bootstrap script needs root, as in Phase 1
multipass exec agenthub-test -- sudo bash -c "$(curl -fsSL <bootstrap-url>)"
```

2. **Docker Desktop local test:**

```bash
docker compose -f compose.dev.yml up -d
# Test with localhost instead of LAN IP
```

3. **Document divergence** from LAN deployment and plan remediation.

---

## Risk Mitigation (from Plan §7)

| Risk | Mitigation | Status |
|------|------------|--------|
| Founder server not ready | Fallback: local Multipass/Docker Desktop demo | ✅ |
| `bootstrap.sh` breaks on an Ubuntu version | Test 22.04 + 24.04 LTS before delivery | Pending |
| UFW blocks legitimate LAN traffic | Subnet-specific rules + verification steps | ✅ |
| Backup script fails | Pre-test `backup.sh` manually, verify `.dump` exists | Pending |
| WebSocket connection refused | Firewall check + CORS check + logs | ✅ |

---

**Next:** Execute live test on founder Ubuntu LAN server and attach results to BARAAA-28.