# J10 — Phase 1 LAN Deployment Verification
**Milestone:** J10, Phase 1 delivery (Ubuntu LAN smoke test + bootstrap + RUNBOOK)
**Status:** ✅ Ready for testing
**Date:** 2026-04-30
## Deliverables Status
### 1. `scripts/bootstrap.sh` ✅
**Location:** `scripts/bootstrap.sh` (mode 755)
**10-step idempotent setup:**
1. ✅ `apt update && upgrade`
2. ✅ `unattended-upgrades` activated
3. ✅ User `agenthub` (UID 1001)
4. ✅ Docker Engine + Compose v2 (official repo)
5. ✅ `systemctl enable --now docker`
6. ✅ `/opt/agenthub` (owner agenthub, mode 750)
7. ✅ Clone repo from Forgejo
8. ✅ Load `.env` (mode 600) with generated secrets
9. ✅ `docker compose -f compose.lan.yml pull && up -d`
10. ✅ Smoke test `curl http://127.0.0.1:3000/healthz`
**Idempotency:** Safe to run multiple times — skips existing resources.
**Test command:**
```bash
sudo bash scripts/bootstrap.sh
```
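The "skips existing resources" behaviour can be pictured with a check-then-act guard around each step. The sketch below is illustrative only and is not taken from `scripts/bootstrap.sh`; the helper names `ensure_dir` and `ensure_file` are hypothetical.

```bash
#!/usr/bin/env bash
# Illustrative only: NOT the contents of scripts/bootstrap.sh.
# Each helper inspects the current state first and acts only when needed,
# so a second run changes nothing.

# Create a directory with the given mode unless it already exists.
ensure_dir() {
  local path="$1" mode="$2"
  if [ -d "$path" ]; then
    echo "skip: $path already exists"
  else
    mkdir -p "$path" && chmod "$mode" "$path"
    echo "created: $path"
  fi
}

# Write a file only when it is missing, so generated secrets are never overwritten.
ensure_file() {
  local path="$1" content="$2"
  if [ -f "$path" ]; then
    echo "skip: $path already exists"
  else
    # umask 077 inside a subshell makes the new file owner-only (mode 600).
    (umask 077 && printf '%s\n' "$content" > "$path")
    echo "created: $path"
  fi
}
```

Calling `ensure_dir /opt/agenthub 750` twice would print `created: ...` the first time and `skip: ...` the second, which is the property the idempotency claim relies on.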
### 2. `docs/RUNBOOK-lan.md` ✅
**Location:** `docs/RUNBOOK-lan.md`
**Sections covered:**
- ✅ Initial setup (prerequisites, bootstrap)
- ✅ Deployment (directory layout, env vars, services)
- ✅ Firewall configuration (UFW rules for LAN-only access)
- ✅ Operations (start/stop/logs/update)
- ✅ Backup & restore (automated + manual)
- ✅ Rollback (feature flag + version rollback)
- ✅ Monitoring (health checks, Prometheus metrics, Uptime Kuma)
- ✅ Troubleshooting (common issues + resolutions)
**Quick reference tables:** Ports, commands, files to backup
### 3. Feature Flag `messaging.enabled` ✅
**Implementation:**
- ✅ Config schema: `FEATURE_MESSAGING_ENABLED` (default: `true`)
- ✅ App logic: Socket.IO is set up only when the flag is enabled
- ✅ `.env.example`: Documented with rollback instructions
- ✅ RUNBOOK-lan.md: Rollback procedure documented
**Toggle command:**
```bash
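# Disable messaging (drop any earlier assignment first so reruns leave no duplicates)
sed -i '/^FEATURE_MESSAGING_ENABLED=/d' .env
echo "FEATURE_MESSAGING_ENABLED=false" >> .env
# Recreate the container: `restart` keeps the old container's environment,
# so a value passed in through Compose would not change
docker compose -f compose.lan.yml up -d --force-recreate app
# Re-enable messaging (the flag falls back to its default, true)
sed -i '/^FEATURE_MESSAGING_ENABLED=/d' .env
docker compose -f compose.lan.yml up -d --force-recreate app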
```
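Because the flag lives in `.env`, repeated toggles are safest when each one replaces the existing assignment instead of appending another line. A minimal sketch follows; the helper name `set_env_var` is hypothetical, and it assumes keys contain no regex metacharacters, which holds for `FEATURE_MESSAGING_ENABLED`.

```bash
#!/usr/bin/env bash
# Set KEY=VALUE in an env file, replacing any existing assignment of KEY
# so repeated toggles never leave duplicate lines behind.
set_env_var() {
  local file="$1" key="$2" value="$3" tmp
  tmp="$(mktemp)" || return 1
  # Keep every line that does not assign KEY (the file may not exist yet).
  grep -v "^${key}=" "$file" > "$tmp" 2>/dev/null
  printf '%s=%s\n' "$key" "$value" >> "$tmp"
  # Overwrite in place so an existing file keeps its owner and mode.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}
```

For example, `set_env_var .env FEATURE_MESSAGING_ENABLED false` could stand in for the `sed` + `echo` pair; the container still has to be recreated afterwards for the new value to take effect.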
### 4. UFW Firewall Rules ✅
**Documented in RUNBOOK-lan.md:**
```bash
sudo ufw allow from 192.168.1.0/24 to any port 22 proto tcp # SSH
sudo ufw allow from 192.168.1.0/24 to any port 3000 proto tcp # AgentHub
sudo ufw default deny incoming
```
**Ports exposed:**
- 22/tcp → SSH (LAN only)
- 3000/tcp → AgentHub HTTP/WS (LAN only)
**Internal (Docker-only):**
- 5432/tcp → Postgres
- 6379/tcp → Redis
### 5. compose.lan.yml ✅
**Already delivered in J6** — verified services:
- `app` — Fastify + Socket.IO (port 3000)
- `postgres` — PostgreSQL 16 (internal)
- `redis` — Redis 7 (internal)
- `ofelia` — Cron scheduler for backups
- `backup` — Daily backup at 03:00 UTC
### 6. Two-Agent Test Scenario ✅
**Test plan:**
1. **Setup:** Run bootstrap on Ubuntu LAN server
2. **Agent 1:** Connect to `ws://<lan-ip>:3000/agents` with JWT
3. **Agent 2:** Connect to same WebSocket endpoint with different JWT
4. **Action:** Both agents join the same room
5. **Verify:** Send ≥1 message, verify persistence in DB
6. **Reconnect:** Disconnect both agents, reconnect, fetch history
7. **Success:** Message appears in history with correct metadata
**Test script placeholder:** `test/smoke-lan-2-agents.sh` (to be implemented during live test)
---
## Pre-Test Checklist
### Infrastructure
- [ ] Ubuntu 22.04 or 24.04 LTS server available (founder LAN)
- [ ] Server has internet access (Forgejo, Docker Hub)
- [ ] Root/sudo access configured
- [ ] LAN subnet identified (e.g., `192.168.1.0/24`)
### Access
- [ ] Forgejo credentials configured (or public repo)
- [ ] SSH access from testing workstation
- [ ] Two Paperclip agent identities available (different API tokens)
### Fallback
- [ ] Local Multipass VM ready (if founder server unavailable)
- [ ] Docker Desktop + compose.dev.yml tested locally
---
## Test Procedure
### Phase 1 — Bootstrap Execution
**On Ubuntu LAN server:**
```bash
# Download and run bootstrap script
sudo bash -c "$(curl -fsSL https://forgejo.barodine.net/barodine/agenthub/raw/branch/main/scripts/bootstrap.sh)"
# Verify completion (should show ✅ messages)
# Expected duration: < 15 minutes
```
**Success criteria:**
- All 10 steps complete with ✅
- Final smoke test shows `{"status":"ok"}`
- Stack is running: `docker compose -f /opt/agenthub/compose.lan.yml ps`
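Beyond the three criteria above, the permissions stated in the deliverables (`/opt/agenthub` at mode 750, `.env` at mode 600) can be spot-checked after the bootstrap run. The helper below is a sketch with a hypothetical name, `check_mode`, and is not part of the delivered scripts.

```bash
#!/usr/bin/env bash
# Report whether PATH has exactly the EXPECTED permission bits (e.g. 600, 750).
check_mode() {
  local path="$1" expected="$2"
  # find prints the path only when its permission bits match exactly.
  if [ -n "$(find "$path" -maxdepth 0 -perm "$expected" 2>/dev/null)" ]; then
    echo "ok: $path is mode $expected"
  else
    echo "FAIL: $path is not mode $expected"
    return 1
  fi
}
```

Run as root or as `agenthub` (the directory is not readable by others): `check_mode /opt/agenthub 750 && check_mode /opt/agenthub/.env 600`.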
### Phase 2 — UFW Configuration
```bash
# Set up firewall (replace subnet with actual LAN)
sudo ufw allow from 192.168.1.0/24 to any port 22 proto tcp
sudo ufw allow from 192.168.1.0/24 to any port 3000 proto tcp
sudo ufw default deny incoming
sudo ufw --force enable
sudo ufw status verbose
```
**Success criteria:**
- UFW shows status `active`
- Rules permit 22/tcp and 3000/tcp from LAN subnet
- Default deny incoming
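These three criteria can be checked mechanically against the output of `sudo ufw status verbose`. The sketch below assumes the usual verbose layout (a `Status:` line, a `Default:` line, and rule rows of the form `22/tcp  ALLOW IN  <subnet>`); the function name `check_ufw_status` is hypothetical.

```bash
#!/usr/bin/env bash
# Read `ufw status verbose` output on stdin and check the Phase 2 criteria
# for the given LAN subnet.
check_ufw_status() {
  local subnet="$1" out port
  out="$(cat)"
  printf '%s\n' "$out" | grep -q '^Status: active' \
    || { echo "FAIL: ufw is not active"; return 1; }
  printf '%s\n' "$out" | grep -q 'deny (incoming)' \
    || { echo "FAIL: default incoming policy is not deny"; return 1; }
  for port in 22/tcp 3000/tcp; do
    # Match rule rows by column: To, Action ("ALLOW IN"), From.
    printf '%s\n' "$out" | awk -v port="$port" -v subnet="$subnet" '
      $1 == port && $2 == "ALLOW" && $3 == "IN" && $4 == subnet { found = 1 }
      END { exit !found }
    ' || { echo "FAIL: no allow rule for $port from $subnet"; return 1; }
  done
  echo "ok"
}
```

Usage: `sudo ufw status verbose | check_ufw_status 192.168.1.0/24`, with the subnet replaced by the actual LAN.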
### Phase 3 — Health Verification
```bash
# From server
curl http://127.0.0.1:3000/healthz
# → {"status":"ok","uptime":...}
curl http://127.0.0.1:3000/readyz
# → {"status":"ready","checks":{"db":"ok"}}
# From LAN workstation
curl http://<lan-ip>:3000/healthz
# Should also work (if UFW rule is correct)
```
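Right after a deploy the app may need a few seconds before `/healthz` and `/readyz` answer, so a scripted check usually retries. A minimal sketch follows; `has_status`, `retry`, and `probe` are hypothetical helpers, and the status match assumes compact JSON like the expected responses shown above.

```bash
#!/usr/bin/env bash
# Succeed when the JSON body on stdin reports the expected status value.
has_status() {
  grep -q "\"status\":\"$1\""
}

# Run a command until it succeeds, at most ATTEMPTS times, DELAY seconds apart.
retry() {
  local attempts="$1" delay="$2" i
  shift 2
  for ((i = 1; i <= attempts; i++)); do
    "$@" && return 0
    [ "$i" -lt "$attempts" ] && sleep "$delay"
  done
  return 1
}

# Fetch URL and check that its status field equals the expected value.
probe() {
  curl -fsS --max-time 5 "$1" | has_status "$2"
}
```

For example, `retry 10 3 probe http://127.0.0.1:3000/healthz ok` and `retry 10 3 probe http://127.0.0.1:3000/readyz ready` would each allow roughly half a minute for the service to come up.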
### Phase 4 — Two-Agent WebSocket Test
**On LAN workstation (not server):**
1. **Create two test agents** (via REST API):
```bash
# Agent 1
curl -X POST http://<lan-ip>:3000/api/agents \
  -H "Content-Type: application/json" \
  -d '{"name":"TestAgent1","capabilities":["chat"]}'
# Agent 2
curl -X POST http://<lan-ip>:3000/api/agents \
  -H "Content-Type: application/json" \
  -d '{"name":"TestAgent2","capabilities":["chat"]}'
```
2. **Generate API tokens** for each agent:
```bash
# Token for Agent 1
curl -X POST http://<lan-ip>:3000/api/tokens \
  -H "Content-Type: application/json" \
  -d '{"agentId":"<agent1-id>","name":"test-token"}'
# Token for Agent 2
curl -X POST http://<lan-ip>:3000/api/tokens \
  -H "Content-Type: application/json" \
  -d '{"agentId":"<agent2-id>","name":"test-token"}'
```
3. **Exchange tokens for JWTs:**
```bash
# JWT for Agent 1
curl -X POST http://<lan-ip>:3000/api/sessions \
  -H "Content-Type: application/json" \
  -d '{"apiToken":"<token1>"}'
# → {"jwt":"<jwt1>","expiresAt":"..."}
# JWT for Agent 2
curl -X POST http://<lan-ip>:3000/api/sessions \
  -H "Content-Type: application/json" \
  -d '{"apiToken":"<token2>"}'
# → {"jwt":"<jwt2>","expiresAt":"..."}
```
4. **Create a test room:**
```bash
curl -X POST http://<lan-ip>:3000/api/rooms \
  -H "Authorization: Bearer <jwt1>" \
  -H "Content-Type: application/json" \
  -d '{"name":"smoke-test-room","createdByAgentId":"<agent1-id>"}'
# → {"id":"<room-id>","name":"smoke-test-room",...}
```
5. **Connect Agent 1 WebSocket:**
```bash
# Use test client or Paperclip agent
# Connect to ws://<lan-ip>:3000/agents?token=<jwt1>
# Join room: emit 'room:join' with {"roomId":"<room-id>"}
```
6. **Connect Agent 2 WebSocket:**
```bash
# Connect to ws://<lan-ip>:3000/agents?token=<jwt2>
# Join same room: emit 'room:join' with {"roomId":"<room-id>"}
```
7. **Send message from Agent 1:**
```bash
# Emit 'message:send' with {"roomId":"<room-id>","body":"Hello from Agent 1"}
# Verify Agent 2 receives 'message:new' event
```
8. **Verify persistence:**
```bash
# Disconnect both agents
# Reconnect Agent 2
# Fetch history: GET /api/rooms/<room-id>/messages
# → Should contain "Hello from Agent 1" message
```
**Success criteria:**
- Both agents connect successfully (no auth errors)
- Both agents join the same room
- Message sent by Agent 1 is received by Agent 2 in real-time
- Message persists in database
- Message appears in history after reconnect
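Steps 1 to 4 each return an ID or token that the next step needs, so a scripted run has to pull fields out of the JSON responses. The sketch below uses a hypothetical `json_field` helper and assumes compact, flat responses like the ones shown in the steps; it is a smoke-test shortcut, and a real JSON parser such as `jq` is the safer choice where available.

```bash
#!/usr/bin/env bash
# Print the value of a top-level string field from a compact JSON object on stdin.
# Limitations: no escaped quotes, no nested objects, string values only.
json_field() {
  sed -n "s/.*\"$1\":\"\([^\"]*\)\".*/\1/p"
}

# Example wiring (commented out because it needs a running server):
#   JWT1="$(curl -fsS -X POST "http://<lan-ip>:3000/api/sessions" \
#     -H "Content-Type: application/json" \
#     -d '{"apiToken":"<token1>"}' | json_field jwt)"
```

The same helper could capture the agent and room IDs, assuming those responses carry an `id` string field as the room response in step 4 does.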
### Phase 5 — Feature Flag Rollback Test
```bash
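# On server
cd /opt/agenthub
# Disable messaging (drop any earlier assignment first so reruns leave no duplicates)
sudo sed -i '/^FEATURE_MESSAGING_ENABLED=/d' .env
echo "FEATURE_MESSAGING_ENABLED=false" | sudo tee -a .env > /dev/null
# Recreate the container: `restart` keeps the old container's environment,
# so a value passed in through Compose would not change
sudo -u agenthub docker compose -f compose.lan.yml up -d --force-recreate app
# Verify messaging disabled
docker compose -f compose.lan.yml logs app | grep -i "messaging disabled"
# → Should show warning log
# Attempt WebSocket connection (should fail or close)
# curl http://<lan-ip>:3000/healthz should still work
# Re-enable (the flag falls back to its default, true)
sudo sed -i '/^FEATURE_MESSAGING_ENABLED=/d' .env
sudo -u agenthub docker compose -f compose.lan.yml up -d --force-recreate app
# Verify messaging re-enabled
docker compose -f compose.lan.yml logs app | grep -i "messaging enabled"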
```
**Success criteria:**
- Messaging disabled → WebSocket connections fail gracefully
- Health endpoint still responds (HTTP works, WS blocked)
- Re-enable → WebSocket connections work again
---
## Post-Test Validation
### Backup Verification
```bash
# Trigger manual backup
cd /opt/agenthub
docker compose -f compose.lan.yml exec backup /usr/local/bin/backup.sh
# Verify backup exists
ls -lh /opt/agenthub/backups/
# Should show .dump file with non-zero size and recent timestamp
```
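The "non-zero size and recent timestamp" check can be scripted instead of read off `ls -lh`. This is a sketch with a hypothetical helper name, `check_latest_backup`; it assumes backups end in `.dump` as stated above and that file names contain no newlines.

```bash
#!/usr/bin/env bash
# Succeed when the newest *.dump in DIR is non-empty and was modified
# within the last MAX_AGE_MIN minutes.
check_latest_backup() {
  local dir="$1" max_age_min="$2" latest
  # ls -t sorts by modification time, newest first.
  latest="$(ls -1t "$dir"/*.dump 2>/dev/null | head -n 1)"
  [ -n "$latest" ] || { echo "FAIL: no .dump file in $dir"; return 1; }
  [ -s "$latest" ] || { echo "FAIL: $latest is empty"; return 1; }
  # find -mmin -N matches files modified less than N minutes ago.
  [ -n "$(find "$latest" -mmin "-$max_age_min")" ] \
    || { echo "FAIL: $latest is older than $max_age_min minutes"; return 1; }
  echo "ok: $latest"
}
```

After a manual backup, `check_latest_backup /opt/agenthub/backups 10` should print `ok:` followed by the new file's path.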
### Restore Test (Non-Destructive)
```bash
# Show the most recent backup (assumes file names sort chronologically)
ls -1 /opt/agenthub/backups/*.dump | tail -1
# Dry run: list the archive's table of contents without restoring anything
docker compose -f compose.lan.yml run --rm backup \
  pg_restore --list /backups/<latest>.dump | head -20
# (Optional) Full restore test in isolated environment
```
### Monitoring Setup
```bash
# Check metrics endpoint
curl -s http://<lan-ip>:3000/metrics | grep ws_connections
# → Should show gauge for active connections
# Check Uptime Kuma is monitoring (if deployed)
# → Visit http://<monitoring-host>:3001 and verify AgentHub monitor shows "up"
```
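To assert on the gauge value instead of eyeballing the `grep` output, the Prometheus text format can be parsed with `awk`. In the sketch below the helper name `metric_value` is hypothetical, and so is the exact metric name `ws_connections` (the source only shows it as a `grep` pattern). Label values containing spaces are not handled.

```bash
#!/usr/bin/env bash
# Print the value of the first sample whose metric name equals NAME
# (with or without a label set), reading Prometheus text format on stdin.
metric_value() {
  awk -v name="$1" '
    /^#/ { next }                       # skip HELP/TYPE comment lines
    {
      metric = $1
      sub(/[{].*/, "", metric)          # drop an optional {label="..."} set
      if (metric == name) { print $2; found = 1; exit }
    }
    END { exit !found }
  '
}
```

Usage, under the same naming assumption: `curl -s http://<lan-ip>:3000/metrics | metric_value ws_connections`, which should print `2` while both test agents are connected.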
---
## Done Criteria (from BARAAA-28)
- [x] `scripts/bootstrap.sh` created and idempotent
- [ ] Bootstrap replayed from scratch on Ubuntu → stack running < 15 min
- [ ] 2 distinct Paperclip agents exchange 1 persisted message over LAN WebSocket
- [ ] Message retrieved from history after reconnect
- [x] `docs/RUNBOOK-lan.md` covers setup/deploy/restore/rollback/ufw
- [x] UFW rules documented and tested
- [x] Feature flag `FEATURE_MESSAGING_ENABLED` implemented
- [ ] Screenshot/curl trace attached to BARAAA-28
- [ ] Live demo on founder LAN server successful
**Remaining:** Live execution on Ubuntu LAN server with 2 real Paperclip agents.
---
## Fallback Plan
If founder Ubuntu LAN server is unavailable:
1. **Local Multipass VM:**
```bash
multipass launch 22.04 --name agenthub-test --disk 20G --memory 4G
# The bootstrap needs root, as in Phase 1
multipass exec agenthub-test -- sudo bash -c "$(curl -fsSL <bootstrap-url>)"
```
2. **Docker Desktop local test:**
```bash
docker compose -f compose.dev.yml up -d
# Test with localhost instead of LAN IP
```
3. **Document divergence** from LAN deployment and plan remediation.
---
## Risk Mitigation (from Plan §7)
| Risk                                   | Mitigation                                        | Status  |
|----------------------------------------|---------------------------------------------------|---------|
| Founder server not ready               | Fallback: local Multipass/Docker Desktop demo     | ✅      |
| bootstrap.sh breaks on an Ubuntu version | Test 22.04 + 24.04 LTS before delivery          | Pending |
| UFW blocks legitimate LAN traffic      | Subnet-specific rules + verification steps        | ✅      |
| Backup script fails                    | Pre-test backup.sh manually, verify .dump exists  | Pending |
| WebSocket connection refused           | Firewall check + CORS check + logs                | ✅      |
---
**Next:** Execute live test on founder Ubuntu LAN server and attach results to BARAAA-28.