Hermes cron jobs are tiny until they are gone.
A good job is not just a schedule. It is the prompt, loaded skills, enabled tools, script path, delivery target, state files, credentials path, and the last known output trail. Lose one piece and the job may still appear in a list while doing the wrong thing every hour. Lovely little trap.
The operator thesis is simple: treat Hermes cron jobs as recoverable runtime state, not as disposable automation. A restore that only brings back the repo is incomplete. You need to prove the scheduler can list the jobs, run the right prompt, reach the right scripts, read the right state, and deliver a sane no-op response when nothing changed.
Keepmyclaw currently targets OpenClaw and Hermes backup and restore. For Hermes operators, cron state is one of the first things worth protecting because scheduled jobs are where useful agent behavior quietly accumulates.
What has to survive
A Hermes cron restore needs more than a copy of one JSON file. At minimum, protect these pieces:
- The cron job definitions under the Hermes cron state directory.
- The full prompt for every job, not just the preview shown in a list command.
- Attached skills and any local persona or playbook files the prompt reads.
- Enabled toolsets, model, provider, repeat count, delivery target, and schedule.
- Script paths under the Hermes scripts directory.
- Durable state under the Hermes state directory.
- Recent output files under the cron output directory.
- Credential files referenced by scripts, after client-side encryption.
- Browser or CDP setup files if a job depends on browser automation.
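The inventory above can be turned into a quick presence check. This is a minimal sketch: the relative paths in `REQUIRED_PATHS` and the function name `missing_pieces` are assumptions for illustration, not the documented Hermes layout, so map them to wherever your install actually keeps each piece.

```python
from pathlib import Path

# Hypothetical layout. These relative paths are assumptions for illustration,
# not the documented Hermes directory structure. Adjust to your install.
REQUIRED_PATHS = {
    "cron definitions": "cron",
    "cron output history": "cron/output",
    "scripts": "scripts",
    "skills": "skills",
    "durable state": "state",
}


def missing_pieces(hermes_home: Path, required: dict[str, str] = REQUIRED_PATHS) -> list[str]:
    """Return the labels of required pieces that are absent or empty."""
    missing = []
    for label, rel in required.items():
        target = hermes_home / rel
        # An empty directory counts as missing: a restore that recreates the
        # folder but none of its contents has not restored anything.
        if not target.is_dir() or not any(target.iterdir()):
            missing.append(label)
    return missing
```

Run it against the restored profile before anything else. A non-empty result means the snapshot or the restore skipped something on the list.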
The output files matter more than people think. A restored job can look alive while silently changing behavior because an old prompt was rebuilt from memory. The latest run output often contains the full prompt, loaded skill text, script output, and final response. That is the forensic record when the scheduler lies to your face.
Baseline restore targets
Set measurable restore targets before you need them.
A sane baseline for Hermes cron jobs:
- RPO: 15 minutes for active cron definitions.
- RTO: 30 minutes for restoring all critical jobs on a clean machine.
- Retention: 30 daily snapshots and 12 monthly snapshots.
- Verification cadence: one automated snapshot check per day.
- Restore drill cadence: one clean-machine or temp-profile drill per month.
- Critical job inventory: every job with public posting, email, deployment, revenue, or data mutation access.
- Output retention: at least the last 20 runs per critical job.
- Script compile check: every restore drill.
- Credential check: presence, owner, and file permissions only. Never print secrets.
- Delivery check: final response format matches the job rules, especially plain text for Telegram.
The exact numbers can change. The point is not the spreadsheet. The point is that nobody should discover the restore target during the outage.
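Targets only help if something checks them. Here is a small sketch that encodes the RPO and RTO figures from the baseline and tests a snapshot timestamp and a drill duration against them. The class and function names are hypothetical, and where the timestamps come from is left to your own tooling.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class RestoreTargets:
    """Baseline figures from the list above. Change them to fit your setup."""
    rpo: timedelta = timedelta(minutes=15)
    rto: timedelta = timedelta(minutes=30)


def rpo_breached(last_snapshot: datetime, now: datetime,
                 targets: RestoreTargets = RestoreTargets()) -> bool:
    """True when the newest snapshot is older than the recovery point objective."""
    return now - last_snapshot > targets.rpo


def rto_met(drill_started: datetime, drill_finished: datetime,
            targets: RestoreTargets = RestoreTargets()) -> bool:
    """True when a restore drill finished within the recovery time objective."""
    return drill_finished - drill_started <= targets.rto
```

Feed `rpo_breached` from the daily snapshot check and `rto_met` from the monthly drill, and the targets stop being decoration.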
Failure scenarios worth testing
The job exists but the prompt is wrong
This happens when someone restores a preview instead of the full prompt. Cron lists often show only a short preview. That is not enough. The missing tail can contain public action caps, silence rules, credential boundaries, or product scope.
Mitigation: recover prompts from the latest session JSON or latest cron output, then compare against the restored job. Do not rebuild from memory. Memory is where duplicate Hermes backup articles come from. Ask me how I know.
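One way to make that comparison mechanical is to fingerprint both prompts and flag the case where the restored text is only the head of the reference. This is a sketch under assumptions: the function names are made up, and extracting the two prompt strings from the session JSON or cron output is up to you.

```python
import hashlib


def fingerprint(prompt: str) -> str:
    """Hash a prompt after trimming outer blank space and trailing spaces per line."""
    normalized = "\n".join(line.rstrip() for line in prompt.strip().splitlines())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def classify_prompt(restored: str, reference: str) -> str:
    """Return 'match', 'truncated', or 'different' for a restored prompt."""
    if fingerprint(restored) == fingerprint(reference):
        return "match"
    # A list preview usually ends in dots or an ellipsis. Strip those and
    # check whether what remains is just the beginning of the full prompt.
    head = restored.strip().rstrip(".…").rstrip()
    full = reference.strip()
    if head and full.startswith(head) and len(head) < len(full):
        return "truncated"
    return "different"
```

A `truncated` result is the preview-restore failure described above. A `different` result means someone rewrote the prompt, possibly from memory.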
The schedule restores but the script path breaks
Hermes cron scripts expect paths relative to the Hermes scripts directory. An absolute path that worked on the old machine can fail on a clean restore. The job may still run, but the script never executes.
Mitigation: after restore, compile every referenced script and run safe read-only smoke tests. For mutation scripts, verify imports and credentials without executing the side effect.
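A sketch of that check for Python scripts follows. It uses the built-in `compile()` so the script is parsed but never executed, and it rejects absolute references outright. The function name and the problem strings are illustrative assumptions. How you extract script references from job definitions depends on your Hermes version.

```python
from pathlib import Path


def check_script(scripts_dir: Path, script_ref: str) -> list[str]:
    """Return a list of problems for one script reference. Empty means OK."""
    ref = Path(script_ref)
    if ref.is_absolute():
        # Absolute paths from the old machine are the failure this section is about.
        return ["absolute path"]
    path = scripts_dir / ref
    if not path.is_file():
        return ["missing under scripts directory"]
    problems = []
    if path.suffix == ".py":
        try:
            # compile() only parses the source. Nothing in the script runs,
            # so mutation scripts are safe to check this way.
            compile(path.read_text(encoding="utf-8"), str(path), "exec")
        except (SyntaxError, ValueError) as exc:
            problems.append(f"syntax error: {exc}")
    return problems
```

This catches syntax errors and broken paths, not missing imports or bad credentials. Shell scripts can get the equivalent treatment with `bash -n`, which parses without running.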
The job runs but delivery is noisy
Telegram does not render markdown the way people expect. A no-op job that emits a cheerful summary every hour is not recovered. It is just spam with a calendar.
Mitigation: verify final responses from saved output files. A successful no-op should return exactly the job's silence marker, usually [SILENT]. Public reports should use plain text only.
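Both rules are easy to check against the final response text pulled from a saved output file. The sketch below is a heuristic, not a markdown parser: the patterns are assumptions about what usually leaks into plain-text reports, and `[SILENT]` is only the default because it is the usual marker mentioned above.

```python
import re

# Rough signals that a response contains markdown. These are heuristics
# chosen for illustration, not a complete markdown grammar.
MARKDOWN_PATTERNS = {
    "bold markers": re.compile(r"(\*\*|__)\S"),
    "heading": re.compile(r"^#{1,6} ", re.MULTILINE),
    "code span": re.compile(r"`"),
    "link": re.compile(r"\[[^\]]+\]\([^)]+\)"),
}


def is_silent_noop(response: str, marker: str = "[SILENT]") -> bool:
    """True only when the response is the silence marker and nothing else."""
    return response.strip() == marker


def markdown_findings(response: str) -> list[str]:
    """Return the labels of markdown patterns found in a response."""
    return [label for label, pattern in MARKDOWN_PATTERNS.items()
            if pattern.search(response)]
```

Note that a response like "Nothing changed. [SILENT]" fails `is_silent_noop` on purpose. A marker buried in a cheerful sentence still gets delivered.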
Credentials restore with the wrong permissions
A script may find the credential file and still fail because permissions changed, paths moved, or the file belongs to the wrong user. Worse, a lazy restore check can print the secret while debugging.
Mitigation: check file existence, owner, permission mode, and secret length only. Do not print token values. If the job uses Cloudflare, GitHub, Reddit, or browser auth, test a safe metadata endpoint before trusting the restore.
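Here is a sketch of a credential check that never reads the file. It relies on `stat` metadata only, so the file size stands in for secret length and no token value can end up in a log. It assumes a POSIX system, and the function name and report keys are hypothetical.

```python
import stat
from pathlib import Path


def credential_report(path: Path, expected_uid: int) -> dict:
    """Describe a credential file using metadata only. The content is never read."""
    if not path.is_file():
        return {"exists": False}
    info = path.stat()
    mode = stat.S_IMODE(info.st_mode)
    return {
        "exists": True,
        "owner_ok": info.st_uid == expected_uid,
        "mode": oct(mode),
        # Any group or other permission bit means the file is too open.
        "mode_ok": mode & 0o077 == 0,
        # Size in bytes is a proxy for secret length. Zero means an empty file.
        "length": info.st_size,
    }
```

Pass `os.getuid()` as `expected_uid` when the check runs as the same user as the scheduler. The report is safe to print or attach to a drill log because it contains no secret material.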
The scheduler index is stale
Directly editing cron state can leave the scheduler with an old in-memory view. The file looks correct. The running service disagrees. Computers remain committed to slapstick.
Mitigation: after restoring state, use the official Hermes cron commands where possible instead of editing files by hand. Then list jobs, confirm schedules, and run one bounded manual scheduler invocation if the job is safe.
The public-action ledger is missing
A restored social or publishing job without its dedupe ledger can repost old content. In theory, that is not a backup issue. In production, it is exactly a backup issue.
Mitigation: back up job state and ledgers with the same priority as job definitions. Verify used URLs, source IDs, posted article slugs, and action history before enabling public jobs.
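As a sketch of that verification, the code below loads a ledger and splits a batch of candidate items into fresh ones and repeats. The ledger format is an assumption: one JSON object per line with a `source_id` field. Real ledgers may look different, so treat the field name and both function names as placeholders.

```python
import json
from pathlib import Path


def load_ledger_ids(ledger_path: Path, key: str = "source_id") -> set[str]:
    """Read IDs from a JSON-lines ledger (assumed format, one object per line).

    A missing ledger raises FileNotFoundError on purpose. Treating it as an
    empty ledger is exactly how old content gets reposted.
    """
    ids = set()
    with ledger_path.open(encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                ids.add(json.loads(line)[key])
    return ids


def split_candidates(candidates: list[str], posted: set[str]) -> tuple[list[str], list[str]]:
    """Split candidate IDs into (fresh, already posted)."""
    fresh = [c for c in candidates if c not in posted]
    repeats = [c for c in candidates if c in posted]
    return fresh, repeats
```

Before enabling a public job, run its next batch through `split_candidates`. If items you know were posted show up as fresh, the ledger did not survive the restore.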
Clean-machine restore checklist
Start with a machine or profile that has no useful local Hermes state. Restore the encrypted Keepmyclaw snapshot, then verify in this order.
First, list cron jobs. Confirm the expected count, job names, schedules, repeats, model and provider fields, skills, toolsets, delivery target, and script name. Do not trust a pretty list. Open at least one full job record or recovered prompt source.
Second, inspect the latest output directory for each critical job. Confirm the restored files include recent timestamps and a final response section. If there is no output history, you lost the audit trail.
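The second step can be sketched as a per-job history check. It assumes each critical job has its own output directory with one file per run, and that a run file contains some recognizable final response heading. Both the layout and the `Final response` marker text are assumptions, so substitute whatever your output files actually contain.

```python
from pathlib import Path


def output_history(job_output_dir: Path, min_runs: int = 20,
                   marker: str = "Final response") -> dict:
    """Summarize restored run history for one job (assumed one file per run)."""
    empty = {"runs": 0, "enough_runs": False, "latest": None,
             "latest_has_response": False}
    if not job_output_dir.is_dir():
        return empty
    # Sort by modification time so the last entry is the most recent run.
    runs = sorted((p for p in job_output_dir.iterdir() if p.is_file()),
                  key=lambda p: p.stat().st_mtime)
    if not runs:
        return empty
    latest = runs[-1]
    text = latest.read_text(encoding="utf-8", errors="replace")
    return {
        "runs": len(runs),
        "enough_runs": len(runs) >= min_runs,
        "latest": latest.name,
        "latest_has_response": marker in text,
    }
```

The default of 20 matches the output retention target from the baseline. Sorting by modification time only works if the restore preserves timestamps, which is worth confirming in the same drill.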
Third, check scripts. Compile Python scripts. For shell scripts, run syntax checks where possible. Confirm paths are relative to the Hermes scripts directory. Fix absolute paths before enabling jobs.
Fourth, verify state files. Revenue jobs need their seen ledgers. Blog jobs need registries and slug history. Social jobs need action ledgers. Browser jobs need enough profile or CDP setup state to avoid logging into the wrong account.
Fifth, run safe no-op jobs manually. Do not live-test public posting, email sending, production deployment, pricing changes, or destructive cleanup unless that action is the approved drill. For unsafe jobs, verify prerequisites and keep the public action disabled until you deliberately re-enable it at a chosen scheduled run.
Sixth, inspect the saved output from the test run. The scheduler can report success while the response is empty, markdown-heavy, or missing the required silence marker. The file is the witness. Read it.
What Keepmyclaw should capture
For Hermes cron recovery, Keepmyclaw should capture the runtime context around the scheduler, not just the visible code you remembered to commit.
That means cron definitions, scripts, skills, memory, durable state, configs, output history, and encrypted credential files. It also means preserving enough context to restore behavior. A job that was allowed to post publicly needs its caps and ledgers. A job that was draft-only needs that boundary preserved. A job that reports to Telegram needs the formatting rules intact.
This is why generic file backup is easy to outgrow. It can copy the bytes and still miss the operating meaning. Hermes workspaces are mostly files, yes. The useful part is knowing which files make the agent itself recoverable.
The restore proof that counts
A Hermes cron restore is real when you can show these facts:
- The restored scheduler lists every critical job.
- Full prompts match the last known source, not list previews.
- Scripts compile or pass safe syntax checks.
- State ledgers and output history are present.
- Credentials are present with safe permissions and no secret leakage.
- One safe job runs and records the correct final response.
- Public-action jobs remain bounded until their ledgers are verified.
Anything less is a comforting folder copy.
Comforting folder copies are how agents wake up on a new machine with no memory, no schedule, and one mysterious cron job trying to publish yesterday's draft again. Back up the scheduler like it matters. It does.