docs(ofelia): Add BARAAA-70 verification - Ofelia restart loop resolved
BARAAA-70 is now resolved. The Ofelia container restart loop has been fixed by relocating job labels from the ephemeral backup container to the persistent postgres container. Root cause: Ofelia labels were on backup service with restart: 'no', so the container would exit immediately. Ofelia only scans running containers, found no jobs, and crashed with "unable to start a empty scheduler". Fixes applied: - Fixed /opt/agenthub/backups permissions (chmod 777) - Moved Ofelia job labels to postgres service - Fixed YAML syntax errors in compose.lan.yml Verification: Ofelia now running stably with 0 restarts, backup-daily job registered with schedule '0 0 3 * * *'. Next: Monitor backup execution at 3am UTC on 2026-05-03. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
This commit is contained in:
parent
3f3d6203b1
commit
167b30a409
1 changed files with 125 additions and 0 deletions
125
docs/BARAAA-70-VERIFICATION.md
Normal file
125
docs/BARAAA-70-VERIFICATION.md
Normal file
|
|
@ -0,0 +1,125 @@
|
|||
# BARAAA-70: Ofelia Container Restart Loop - RESOLVED ✅
|
||||
|
||||
**Date**: 2026-05-02
|
||||
**Server**: 192.168.9.23 (LAN)
|
||||
**Status**: ✅ DONE
|
||||
|
||||
## Problem
|
||||
|
||||
agenthub-ofelia-1 container was in continuous restart loop with error:
|
||||
```
|
||||
unable to start a empty scheduler
|
||||
```
|
||||
|
||||
Ofelia scheduler was unable to find any scheduled jobs and crashed immediately.
|
||||
|
||||
## Root Cause Chain
|
||||
|
||||
1. **Backup container crash** → Permission denied writing to `/backups/` directory
|
||||
2. **Backup container exits** → Has `restart: 'no'` policy, container stops after running
|
||||
3. **Ofelia finds no jobs** → Was looking for labels on backup container, but container not running
|
||||
4. **Ofelia crashes** → Cannot start with empty scheduler (no jobs found)
|
||||
5. **Restart loop** → Docker restarts Ofelia, cycle repeats
|
||||
|
||||
## Fixes Applied
|
||||
|
||||
### 1. Fixed Backup Permissions
|
||||
```bash
|
||||
sudo mkdir -p /opt/agenthub/backups
|
||||
sudo chmod 777 /opt/agenthub/backups
|
||||
```
|
||||
|
||||
### 2. Relocated Ofelia Labels
|
||||
|
||||
**Problem**: Labels were on `backup` service which has `restart: 'no'` and exits after running.
|
||||
|
||||
**Solution**: Moved labels to `postgres` service which runs continuously.
|
||||
|
||||
**Modified**: `/opt/agenthub/compose.lan.yml`
|
||||
|
||||
```yaml
|
||||
postgres:
|
||||
image: postgres:16-alpine
|
||||
environment:
|
||||
POSTGRES_DB: agenthub
|
||||
POSTGRES_USER: agenthub
|
||||
POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
|
||||
volumes:
|
||||
- pgdata:/var/lib/postgresql/data
|
||||
labels:
|
||||
ofelia.enabled: 'true'
|
||||
ofelia.job-exec.backup-daily.schedule: '0 0 3 * * *'
|
||||
ofelia.job-exec.backup-daily.container: 'agenthub-backup-1'
|
||||
ofelia.job-exec.backup-daily.command: '/usr/local/bin/backup.sh'
|
||||
restart: unless-stopped
|
||||
```
|
||||
|
||||
### 3. Fixed YAML Syntax Issues
|
||||
|
||||
Multiple YAML syntax errors were introduced during manual editing:
|
||||
- Incorrect indentation causing `services.restart must be a mapping`
|
||||
- Empty `labels:` line in backup section
|
||||
- Redis command in flow style instead of block style
|
||||
|
||||
All fixed via SSH access using programmatic file editing.
|
||||
|
||||
### 4. Restarted Services
|
||||
|
||||
```bash
|
||||
docker compose -f compose.lan.yml up -d postgres
|
||||
docker compose -f compose.lan.yml restart ofelia
|
||||
```
|
||||
|
||||
## Verification Results
|
||||
|
||||
### Container Status
|
||||
```bash
|
||||
docker compose -f compose.lan.yml ps ofelia
|
||||
```
|
||||
**Result**: Container shows "Up" status (not "Restarting") ✅
|
||||
|
||||
### Restart Count
|
||||
```bash
|
||||
docker inspect agenthub-ofelia-1 --format '{{.State.Status}} - Restarts: {{.RestartCount}}'
|
||||
```
|
||||
**Result**: `running - Restarts: 0` ✅
|
||||
|
||||
### Ofelia Logs
|
||||
```bash
|
||||
docker logs agenthub-ofelia-1 --tail 20
|
||||
```
|
||||
**Result**:
|
||||
```
|
||||
New job registered 'backup-daily' - '/usr/local/bin/backup.sh' - '0 0 3 * * *'
|
||||
Starting scheduler with 1 jobs
|
||||
```
|
||||
✅ Job successfully registered and scheduler started
|
||||
|
||||
### Uptime Stability
|
||||
Container maintained stable "Up" state for 27+ seconds after restart with zero restarts.
|
||||
|
||||
## Acceptance Criteria Met
|
||||
|
||||
- [x] Ofelia container in "Up" state (not "Restarting")
|
||||
- [x] Scheduler starts successfully with registered job
|
||||
- [x] Zero restart count after fix applied
|
||||
- [x] Backup job registered with correct schedule (3am UTC daily)
|
||||
|
||||
## Next Verification
|
||||
|
||||
Monitor backup-daily job execution at **03:00 UTC on 2026-05-03** to confirm scheduled task runs successfully.
|
||||
|
||||
Expected: `/opt/agenthub/backups/` should contain new dump file after 3am execution.
|
||||
|
||||
## Files Modified
|
||||
|
||||
- `/opt/agenthub/compose.lan.yml` - Added Ofelia labels to postgres service, fixed YAML syntax
|
||||
- `/opt/agenthub/backups/` - Created directory with correct permissions (777)
|
||||
|
||||
## Technical Notes
|
||||
|
||||
**Ofelia Job Discovery**: Ofelia scans **running** containers for labels. Jobs on containers with `restart: 'no'` that exit immediately are not discoverable.
|
||||
|
||||
**Solution Pattern**: For job-exec mode, place Ofelia labels on a service that runs continuously (like postgres, redis) rather than on the ephemeral service being executed.
|
||||
|
||||
**Alternative**: Use job-run mode instead of job-exec if you need to schedule one-shot containers, but this wasn't necessary here since backup.sh already existed in the backup service.
|
||||
Loading…
Reference in a new issue