BARAAA-70 is now resolved. The Ofelia container restart loop has been fixed by relocating job labels from the ephemeral backup container to the persistent postgres container. Root cause: Ofelia labels were on backup service with restart: 'no', so the container would exit immediately. Ofelia only scans running containers, found no jobs, and crashed with "unable to start a empty scheduler". Fixes applied: - Fixed /opt/agenthub/backups permissions (chmod 777) - Moved Ofelia job labels to postgres service - Fixed YAML syntax errors in compose.lan.yml Verification: Ofelia now running stably with 0 restarts, backup-daily job registered with schedule '0 0 3 * * *'. Next: Monitor backup execution at 3am UTC on 2026-05-03. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
125 lines
3.9 KiB
Markdown
125 lines
3.9 KiB
Markdown
# BARAAA-70: Ofelia Container Restart Loop - RESOLVED ✅
|
|
|
|
**Date**: 2026-05-02
|
|
**Server**: 192.168.9.23 (LAN)
|
|
**Status**: ✅ DONE
|
|
|
|
## Problem
|
|
|
|
agenthub-ofelia-1 container was in continuous restart loop with error:
|
|
```
|
|
unable to start a empty scheduler
|
|
```
|
|
|
|
Ofelia scheduler was unable to find any scheduled jobs and crashed immediately.
|
|
|
|
## Root Cause Chain
|
|
|
|
1. **Backup container crash** → Permission denied writing to `/backups/` directory
|
|
2. **Backup container exits** → Has `restart: 'no'` policy, container stops after running
|
|
3. **Ofelia finds no jobs** → Was looking for labels on backup container, but container not running
|
|
4. **Ofelia crashes** → Cannot start with empty scheduler (no jobs found)
|
|
5. **Restart loop** → Docker restarts Ofelia, cycle repeats
|
|
|
|
## Fixes Applied
|
|
|
|
### 1. Fixed Backup Permissions
|
|
```bash
|
|
sudo mkdir -p /opt/agenthub/backups
|
|
sudo chmod 777 /opt/agenthub/backups
|
|
```
|
|
|
|
### 2. Relocated Ofelia Labels
|
|
|
|
**Problem**: Labels were on `backup` service which has `restart: 'no'` and exits after running.
|
|
|
|
**Solution**: Moved labels to `postgres` service which runs continuously.
|
|
|
|
**Modified**: `/opt/agenthub/compose.lan.yml`
|
|
|
|
```yaml
|
|
postgres:
|
|
image: postgres:16-alpine
|
|
environment:
|
|
POSTGRES_DB: agenthub
|
|
POSTGRES_USER: agenthub
|
|
POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
|
|
volumes:
|
|
- pgdata:/var/lib/postgresql/data
|
|
labels:
|
|
ofelia.enabled: 'true'
|
|
ofelia.job-exec.backup-daily.schedule: '0 0 3 * * *'
|
|
ofelia.job-exec.backup-daily.container: 'agenthub-backup-1'
|
|
ofelia.job-exec.backup-daily.command: '/usr/local/bin/backup.sh'
|
|
restart: unless-stopped
|
|
```
|
|
|
|
### 3. Fixed YAML Syntax Issues
|
|
|
|
Multiple YAML syntax errors were introduced during manual editing:
|
|
- Incorrect indentation causing `services.restart must be a mapping`
|
|
- Empty `labels:` line in backup section
|
|
- Redis command in flow style instead of block style
|
|
|
|
All fixed via SSH access using programmatic file editing.
|
|
|
|
### 4. Restarted Services
|
|
|
|
```bash
|
|
docker compose -f compose.lan.yml up -d postgres
|
|
docker compose -f compose.lan.yml restart ofelia
|
|
```
|
|
|
|
## Verification Results
|
|
|
|
### Container Status
|
|
```bash
|
|
docker compose -f compose.lan.yml ps ofelia
|
|
```
|
|
**Result**: Container shows "Up" status (not "Restarting") ✅
|
|
|
|
### Restart Count
|
|
```bash
|
|
docker inspect agenthub-ofelia-1 --format '{{.State.Status}} - Restarts: {{.RestartCount}}'
|
|
```
|
|
**Result**: `running - Restarts: 0` ✅
|
|
|
|
### Ofelia Logs
|
|
```bash
|
|
docker logs agenthub-ofelia-1 --tail 20
|
|
```
|
|
**Result**:
|
|
```
|
|
New job registered 'backup-daily' - '/usr/local/bin/backup.sh' - '0 0 3 * * *'
|
|
Starting scheduler with 1 jobs
|
|
```
|
|
✅ Job successfully registered and scheduler started
|
|
|
|
### Uptime Stability
|
|
Container maintained stable "Up" state for 27+ seconds after restart with zero restarts.
|
|
|
|
## Acceptance Criteria Met
|
|
|
|
- [x] Ofelia container in "Up" state (not "Restarting")
|
|
- [x] Scheduler starts successfully with registered job
|
|
- [x] Zero restart count after fix applied
|
|
- [x] Backup job registered with correct schedule (3am UTC daily)
|
|
|
|
## Next Verification
|
|
|
|
Monitor backup-daily job execution at **03:00 UTC on 2026-05-03** to confirm scheduled task runs successfully.
|
|
|
|
Expected: `/opt/agenthub/backups/` should contain new dump file after 3am execution.
|
|
|
|
## Files Modified
|
|
|
|
- `/opt/agenthub/compose.lan.yml` - Added Ofelia labels to postgres service, fixed YAML syntax
|
|
- `/opt/agenthub/backups/` - Created directory with correct permissions (777)
|
|
|
|
## Technical Notes
|
|
|
|
**Ofelia Job Discovery**: Ofelia scans **running** containers for labels. Jobs on containers with `restart: 'no'` that exit immediately are not discoverable.
|
|
|
|
**Solution Pattern**: For job-exec mode, place Ofelia labels on a service that runs continuously (like postgres, redis) rather than on the ephemeral service being executed.
|
|
|
|
**Alternative**: Use job-run mode instead of job-exec if you need to schedule one-shot containers, but this wasn't necessary here since backup.sh already existed in the backup service.
|