AgentHub LAN Deployment Runbook
Phase 1 HTTP/WebSocket deployment for the Barodine LAN Ubuntu server.
Scope: Local network deployment (no TLS, no public DNS, ufw-protected).
Table of Contents
- Initial Setup
- Deployment
- Firewall Configuration
- Operations
- Backup & Restore
- Rollback
- Monitoring
- Troubleshooting
Initial Setup
Prerequisites
- Ubuntu Server 22.04 or 24.04 LTS (clean install)
- Root or sudo access
- Network access to Forgejo (forgejo.barodine.net) and Docker Hub
- Minimum hardware: 2 vCPU, 4GB RAM, 20GB disk
Bootstrap (First-Time Setup)
Run the idempotent bootstrap script as root:
sudo bash -c "$(curl -fsSL https://forgejo.barodine.net/barodine/agenthub/raw/branch/main/scripts/bootstrap.sh)"
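If you prefer to inspect the script before running it (sensible when piping from the network), download it first:
# Download, review, then execute
curl -fsSL -o /tmp/bootstrap.sh https://forgejo.barodine.net/barodine/agenthub/raw/branch/main/scripts/bootstrap.sh
less /tmp/bootstrap.sh
sudo bash /tmp/bootstrap.sh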
What it does (10 steps):
1. apt update && upgrade — system packages
2. Enable unattended-upgrades for automatic security patches
3. Create agenthub user (UID 1001)
4. Install Docker Engine + Compose v2 from the official repository
5. Enable and start the Docker service
6. Create /opt/agenthub directory (mode 750, owner agenthub)
7. Clone the agenthub repository from Forgejo
8. Generate .env with secure secrets (JWT, Postgres password)
9. Pull images and start the stack with compose.lan.yml
10. Smoke test http://127.0.0.1:3000/healthz
Expected duration: < 15 minutes on clean Ubuntu LTS.
Idempotency: Safe to run multiple times — skips existing resources.
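The idempotency comes from guarding each step with a check for its target before acting; a minimal sketch of that pattern (illustrative only, not the literal contents of bootstrap.sh):
# Guard pattern used conceptually by each bootstrap step (illustrative)
if ! id -u agenthub >/dev/null 2>&1; then
  useradd --uid 1001 --system --create-home agenthub
fi
if [ ! -d /opt/agenthub ]; then
  install -d -m 750 -o agenthub -g agenthub /opt/agenthub
fi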
Deployment
Directory Layout
/opt/agenthub/
├── .env # Secrets (mode 600, owner agenthub)
├── compose.lan.yml # LAN stack definition
├── scripts/
│ ├── backup.sh # Daily backup (03:00 UTC)
│ └── restore.sh # Restore from dump
├── docs/
│ ├── RUNBOOK.md # General operations runbook
│ └── RUNBOOK-lan.md # This file
└── backups/ # Local backup directory (14 day retention)
Environment Variables (.env)
Located at /opt/agenthub/.env (mode 600):
# Database
POSTGRES_PASSWORD=<generated-24-char-secret>
# JWT (32+ bytes base64)
JWT_SECRET=<generated-32-byte-secret>
# CORS (LAN subnet)
ALLOWED_ORIGINS=http://192.168.1.0/24
# Optional: Scaleway Object Storage for weekly encrypted backups
S3_ENDPOINT=https://s3.fr-par.scw.cloud
S3_BUCKET=agenthub-backups
AWS_ACCESS_KEY_ID=<scaleway-access-key>
AWS_SECRET_ACCESS_KEY=<scaleway-secret>
GPG_RECIPIENT_KEY=<gpg-public-key-id>
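If a secret ever needs regenerating by hand (for example after a suspected leak), standard tooling suffices; these commands are a sketch, not necessarily what bootstrap.sh runs:
# Generate candidate replacement secrets (illustrative)
openssl rand -base64 24   # POSTGRES_PASSWORD candidate
openssl rand -base64 32   # JWT_SECRET candidate (32 random bytes, base64-encoded)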
Security:
- Never commit .env to version control
- Never expose .env via HTTP or logs
- Rotate JWT_SECRET quarterly (see main RUNBOOK.md)
Stack Services
Defined in compose.lan.yml:
| Service | Port | Description |
|---|---|---|
| app | 3000 | Fastify + socket.io (HTTP/WS) |
| postgres | 5432 | PostgreSQL 16 (internal, not exposed to LAN) |
| redis | 6379 | Redis 7 (internal) |
| ofelia | - | Cron scheduler for backup job |
| backup | - | Backup container (runs daily at 03:00 UTC) |
Exposed to LAN: Only port 3000 (app). Database and Redis are Docker-internal only.
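That isolation follows from publishing a host port only for app. An illustrative excerpt of what compose.lan.yml likely declares (the actual file may differ):
# Illustrative compose.lan.yml excerpt; consult the real file
services:
  app:
    ports:
      - "3000:3000"   # the only port published to the host/LAN
  postgres:
    expose:
      - "5432"        # reachable from other compose services only
  redis:
    expose:
      - "6379"        # reachable from other compose services only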
Firewall Configuration
UFW Setup (Required)
Phase 1 uses HTTP/WS on port 3000 without TLS. Protect with UFW to allow LAN-only access.
# Set defaults first (deny in, allow out)
sudo ufw default deny incoming
sudo ufw default allow outgoing
# Allow SSH from LAN subnet (adjust subnet to match your network)
sudo ufw allow from 192.168.1.0/24 to any port 22 proto tcp comment 'SSH from LAN'
# Allow AgentHub HTTP/WS from LAN subnet
sudo ufw allow from 192.168.1.0/24 to any port 3000 proto tcp comment 'AgentHub HTTP/WS from LAN'
# Enable UFW last, so the SSH rule is already in place and remote sessions survive
sudo ufw --force enable
# Check status
sudo ufw status verbose
Expected output:
Status: active
Logging: on (low)
Default: deny (incoming), allow (outgoing), disabled (routed)
To Action From
-- ------ ----
22/tcp ALLOW IN 192.168.1.0/24 # SSH from LAN
3000/tcp ALLOW IN 192.168.1.0/24 # AgentHub HTTP/WS from LAN
Critical: Replace 192.168.1.0/24 with your actual LAN subnet.
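To confirm the subnet before writing rules:
# Show the server's IPv4 address and connected routes (e.g. 192.168.1.42/24 implies 192.168.1.0/24)
ip -o -4 addr show scope global
ip route | grep -v default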
Port Reference
| Port | Protocol | Exposed To | Purpose |
|---|---|---|---|
| 22 | TCP | LAN subnet | SSH administration |
| 3000 | TCP | LAN subnet | AgentHub HTTP + WS |
| 5432 | TCP | Docker-internal | Postgres (not exposed) |
| 6379 | TCP | Docker-internal | Redis (not exposed) |
Operations
Start Stack
cd /opt/agenthub
docker compose -f compose.lan.yml up -d
Stop Stack
cd /opt/agenthub
docker compose -f compose.lan.yml down
Note: This stops and removes containers but keeps the data volumes (pgdata, redisdata). Adding -v would delete them, so only use it when you intend to wipe data.
Restart Service
cd /opt/agenthub
docker compose -f compose.lan.yml restart app
View Logs
# Follow all services
docker compose -f compose.lan.yml logs -f
# Follow app only
docker compose -f compose.lan.yml logs -f app
# Last 50 lines from postgres
docker compose -f compose.lan.yml logs --tail=50 postgres
Check Service Status
# Docker services
docker compose -f compose.lan.yml ps
# Health check
curl http://127.0.0.1:3000/healthz
# Readiness check (includes DB connectivity)
curl http://127.0.0.1:3000/readyz
Update to Latest Version
# Pull latest code
cd /opt/agenthub
sudo -u agenthub git pull origin main
# Pull latest images
sudo -u agenthub docker compose -f compose.lan.yml pull
# Recreate containers
sudo -u agenthub docker compose -f compose.lan.yml up -d
# Verify
curl http://127.0.0.1:3000/healthz
Backup & Restore
Automated Backups
Schedule: Daily at 03:00 UTC via ofelia cron scheduler.
Retention:
- Local: 14 days (/opt/agenthub/backups/)
- Off-site: weekly encrypted upload to Scaleway Object Storage (if configured)
Location: /opt/agenthub/backups/agenthub_YYYYMMDD_HHMMSS.dump
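The job itself is essentially a pg_dump custom-format dump plus retention pruning; a minimal sketch under that assumption (see scripts/backup.sh for the real implementation):
# Sketch of the backup core; assumes PGPASSWORD is supplied by the container environment
STAMP=$(date -u +%Y%m%d_%H%M%S)
pg_dump -h postgres -U agenthub -d agenthub -Fc -f "/backups/agenthub_${STAMP}.dump"
find /backups -name 'agenthub_*.dump' -type f -mtime +14 -delete   # enforce 14-day retention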
Manual Backup
cd /opt/agenthub
docker compose -f compose.lan.yml exec backup /usr/local/bin/backup.sh
Verify backup:
ls -lh /opt/agenthub/backups/
# Should show .dump files with non-zero size
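Beyond file size, you can confirm a dump is structurally readable with pg_restore's listing mode (this assumes the backup image ships the Postgres client tools):
# Print the dump's table of contents; errors indicate a corrupt or truncated dump
docker compose -f compose.lan.yml run --rm backup pg_restore --list /backups/agenthub_YYYYMMDD_HHMMSS.dump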
Restore from Backup
Full procedure in docs/RUNBOOK-restore.md. Quick reference:
cd /opt/agenthub
# Stop the app (prevent writes during restore)
docker compose -f compose.lan.yml stop app
# Restore using the restore script
docker compose -f compose.lan.yml run --rm backup /usr/local/bin/restore.sh /backups/agenthub_YYYYMMDD_HHMMSS.dump
# Restart app
docker compose -f compose.lan.yml start app
# Verify
curl http://127.0.0.1:3000/healthz
Off-Site Backup (Scaleway)
Weekly encrypted backups to Scaleway Object Storage (Sundays only).
Requirements:
- Scaleway account with Object Storage bucket
- GPG public key for encryption
- Env vars set in .env: S3_ENDPOINT, S3_BUCKET, AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, GPG_RECIPIENT_KEY
Verification:
# List backups on Scaleway
aws s3 ls s3://agenthub-backups/ \
--endpoint-url=https://s3.fr-par.scw.cloud
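The weekly job amounts to encrypt-then-upload; a sketch of that flow using the env vars above (the script's actual flags may differ):
# Sketch: GPG-encrypt the newest dump and push it to Scaleway
LATEST=$(ls -t /opt/agenthub/backups/agenthub_*.dump | head -n1)
gpg --encrypt --recipient "$GPG_RECIPIENT_KEY" --output "${LATEST}.gpg" "$LATEST"
aws s3 cp "${LATEST}.gpg" "s3://${S3_BUCKET}/" --endpoint-url "$S3_ENDPOINT"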
Rollback
Feature Flag Rollback
AgentHub includes a messaging.enabled feature flag for quick rollback.
Disable messaging feature:
# Add to .env
echo "FEATURE_MESSAGING_ENABLED=false" >> /opt/agenthub/.env
# Recreate the app so the change is picked up (restart alone does not re-read .env)
cd /opt/agenthub
docker compose -f compose.lan.yml up -d app
Re-enable:
# Remove flag or set to true
sed -i '/FEATURE_MESSAGING_ENABLED/d' /opt/agenthub/.env
# Recreate the app to apply the change
docker compose -f compose.lan.yml up -d app
Version Rollback
Rollback to previous git commit:
cd /opt/agenthub
# Stop stack
docker compose -f compose.lan.yml down
# Checkout previous version
sudo -u agenthub git log --oneline -10 # Find commit hash
sudo -u agenthub git checkout <commit-hash>
# Pull corresponding image tag (if available)
# Or rebuild locally
sudo -u agenthub docker compose -f compose.lan.yml build app
# Start stack
sudo -u agenthub docker compose -f compose.lan.yml up -d
# Verify
curl http://127.0.0.1:3000/healthz
Rollback database schema:
If migration broke the database, restore from backup (see above).
Monitoring
Health Checks
| Endpoint | Purpose | Expected Response |
|---|---|---|
| /healthz | Liveness (process is running) | {"status":"ok"} |
| /readyz | Readiness (DB is reachable) | {"status":"ready"} |
| /metrics | Prometheus metrics (WS, messages) | Prometheus text format |
Key Metrics (Prometheus)
Available at http://<lan-ip>:3000/metrics:
- ws_connections — Active WebSocket connections (gauge)
- messages_sent_total — Total messages sent (counter)
- message_send_latency — Message processing latency histogram (p50, p90, p99)
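If a Prometheus server runs on the LAN, a minimal scrape job for these metrics could look like the following (job name and target are placeholders):
# Hypothetical prometheus.yml excerpt; /metrics is the default metrics_path
scrape_configs:
  - job_name: agenthub
    static_configs:
      - targets: ['<lan-ip>:3000']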
Uptime Kuma (Optional)
Set up Uptime Kuma on the same LAN to monitor AgentHub:
- HTTP(s) monitor:
  - URL: http://<lan-ip>:3000/readyz
  - Interval: 60 seconds
  - Expected status code: 200
- Keyword monitor:
  - URL: http://<lan-ip>:3000/healthz
  - Keyword: "status":"ok"
- Notifications:
  - Slack webhook (if configured)
  - Email (if SMTP configured)
Manual Health Check
# Liveness
curl http://127.0.0.1:3000/healthz
# → {"status":"ok","uptime":12345}
# Readiness (includes DB check)
curl http://127.0.0.1:3000/readyz
# → {"status":"ready"}
# Metrics
curl http://127.0.0.1:3000/metrics
# → Prometheus text format
Troubleshooting
Service Won't Start
Symptoms: docker compose up -d fails or app container exits immediately.
Investigation:
# Check container status
docker compose -f compose.lan.yml ps
# Check logs
docker compose -f compose.lan.yml logs app
# Check .env file
ls -l /opt/agenthub/.env
# Should be mode 600, owner agenthub
# Verify secrets are set
grep JWT_SECRET /opt/agenthub/.env
grep POSTGRES_PASSWORD /opt/agenthub/.env
Common causes:
- Missing or invalid .env file → re-run bootstrap or generate secrets manually
- Port 3000 already in use → check with sudo ss -tulpn | grep 3000
- Docker not running → check with sudo systemctl status docker
Database Connection Failed
Symptoms: /readyz returns 503, logs show ECONNREFUSED.
Investigation:
# Check postgres container
docker compose -f compose.lan.yml ps postgres
# Check postgres logs
docker compose -f compose.lan.yml logs postgres --tail=50
# Test DB connectivity
docker compose -f compose.lan.yml exec postgres psql -U agenthub -d agenthub -c "SELECT 1"
Resolution:
# Restart postgres
docker compose -f compose.lan.yml restart postgres
# If data corruption, restore from backup
# See "Restore from Backup" section
WebSocket Connection Refused
Symptoms: Paperclip agents cannot connect to ws://<lan-ip>:3000/agents.
Investigation:
# Check firewall
sudo ufw status verbose
# Should allow port 3000 from LAN subnet
# Test HTTP from client machine
curl http://<lan-ip>:3000/healthz
# Check app logs for connection attempts
docker compose -f compose.lan.yml logs -f app | grep socket
Resolution:
# If UFW blocks, add rule
sudo ufw allow from <client-ip> to any port 3000
# If app not listening on 0.0.0.0, check HOST in .env
grep HOST /opt/agenthub/.env
# Should be HOST=0.0.0.0 (not 127.0.0.1)
# Recreate the app so the HOST change takes effect (restart alone does not re-read .env)
docker compose -f compose.lan.yml up -d app
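To exercise the WebSocket upgrade itself, independent of a Paperclip agent, a raw handshake probe helps; the server may still reject non-agent clients, but any HTTP response proves the port and routing work:
# Manual upgrade probe; HTTP/1.1 101 Switching Protocols means the WS endpoint answered
curl -i -N \
  -H "Connection: Upgrade" \
  -H "Upgrade: websocket" \
  -H "Sec-WebSocket-Version: 13" \
  -H "Sec-WebSocket-Key: $(openssl rand -base64 16)" \
  http://<lan-ip>:3000/agents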
Disk Full
Symptoms: Backup fails, logs show "No space left on device".
Investigation:
# Check disk usage
df -h /opt/agenthub
# Check backup directory size
du -sh /opt/agenthub/backups/
# Check Docker volumes
docker system df
Resolution:
# Emergency cleanup: temporarily keep only the last 7 days (normal retention is 14)
find /opt/agenthub/backups/ -name "agenthub_*.dump" -type f -mtime +7 -delete
# Prune unused Docker images/containers (safest while the stack is running;
# on older Docker versions --volumes can remove unused named volumes such as pgdata)
docker system prune -a --volumes
# If still full, extend disk or move backups to external storage
High Memory Usage
Symptoms: App container restarts with exit code 137 (OOM killed).
Investigation:
# Check memory usage
docker stats agenthub-app-1 --no-stream
# Check active WebSocket connections
curl http://127.0.0.1:3000/metrics | grep ws_connections
Resolution:
# Increase container memory limit (edit compose.lan.yml)
services:
app:
mem_limit: 1g # Default was 512m
# Restart stack
docker compose -f compose.lan.yml up -d
# If problem persists, check for memory leaks in logs
docker compose -f compose.lan.yml logs app | grep -i memory
Phase 2 Migration Checklist
When moving from Phase 1 (LAN HTTP) to Phase 2 (public HTTPS):
- Acquire TLS certificate (Let's Encrypt via Coolify)
- Set up agenthub.barodine.net DNS A record
- Deploy to Coolify using compose.coolify.yml
- Enable HSTS: ENABLE_HSTS=true in .env
- Update ALLOWED_ORIGINS to the public domain
- Update firewall rules (443/tcp instead of 3000/tcp)
- Test with production Paperclip agents
- Decommission LAN server or keep as staging
Reference: ADR-0004 (Coolify deployment architecture).
Quick Reference
Essential Commands
# Start stack
docker compose -f compose.lan.yml up -d
# Stop stack
docker compose -f compose.lan.yml down
# Restart app
docker compose -f compose.lan.yml restart app
# View logs
docker compose -f compose.lan.yml logs -f app
# Health check
curl http://127.0.0.1:3000/healthz
# Manual backup
docker compose -f compose.lan.yml exec backup /usr/local/bin/backup.sh
# Restore from backup
docker compose -f compose.lan.yml run --rm backup /usr/local/bin/restore.sh /backups/<file>.dump
Files to Backup (Off-Server)
- /opt/agenthub/.env — Critical: secrets (keep secure, never commit)
- /opt/agenthub/backups/ — Database dumps (14-day retention)
Support
- Documentation: /opt/agenthub/docs/
- Logs: docker compose -f compose.lan.yml logs
- Monitoring: Uptime Kuma at http://<monitoring-host>:3001
- Issue tracker: Forgejo Barodine
Last updated: 2026-04-30 (J10 Phase 1 delivery)