Hermes Agent backup gets serious when the agent stops being a toy and starts carrying work you would hate to rebuild.
A fresh install is easy. A useful Hermes setup is not. The useful part lives in the rough edges: local memory, loaded skills, scheduled jobs, repo context, tool settings, prompts that took five attempts to get right, and credentials that should never leak in plain text.
Keepmyclaw currently supports OpenClaw and Hermes backup and restore. That scope matters. This is not a generic Claude Code migration guide. It is for operators running OpenClaw or Hermes-style local agent workspaces who want the machine, the context, and the schedule to survive a bad day.
The operator thesis is simple: if your Hermes agent has memory and scheduled work, it is already production infrastructure. Treat backup as a restore system, not a zip file you hope is useful later.
What actually needs to survive?
Start with the parts that make the agent yours.
Back up workspace files first. That means active repos, local notes, task ledgers, generated assets, scripts, and any project-specific state the agent reads before acting. If the workspace disappears, the agent may still launch, but it wakes up with no hands.
Back up memory next. Hermes setups often depend on local state files, durable notes, output ledgers, persona files, and decision logs. Losing those files does not look dramatic at first. The agent just starts making old mistakes again. Nasty little failure mode.
Skills need the same treatment. A skill is operating procedure. It tells the agent how to handle a product, a repo, a tone of voice, a research workflow, or a safety boundary. If skills vanish, the model may still answer, but the agent stops behaving like the system you trained.
Cron jobs are worse. They contain timing, delivery rules, silence rules, scripts, toolsets, and the prompt that turns a job into an operator. A missing cron job does not throw a loud error. It just stops doing the thing. You find out when no report arrives, no content ships, or no recovery alert fires.
Configs and credentials round out the list. They need different handling. Config files should be recoverable. Credentials should be encrypted client-side before they leave the machine. Never treat raw tokens as ordinary backup payload. That is how a safety system becomes a breach with nicer branding.
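One way to make that split explicit is to sort files into two buckets before anything is uploaded. This is a minimal Python sketch, not Keepmyclaw's implementation. The credential patterns are hypothetical and should be changed to match your own layout. The encryption step itself is left out because it depends on whichever vetted crypto library you use.

```python
from pathlib import PurePosixPath

# Hypothetical glob patterns for credential files; match them to your own layout.
CREDENTIAL_PATTERNS = ("*.token", "*.key", "credentials*", ".env", "secrets/*")


def is_credential(path: str) -> bool:
    """True if the path looks like secret material (patterns match from the right)."""
    p = PurePosixPath(path)
    return any(p.match(pattern) for pattern in CREDENTIAL_PATTERNS)


def plan_backup(paths):
    """Split paths into a plain bucket and an encrypt-before-upload bucket."""
    plan = {"plain": [], "encrypt_first": []}
    for path in paths:
        bucket = "encrypt_first" if is_credential(path) else "plain"
        plan[bucket].append(path)
    return plan
```

Everything in `encrypt_first` is encrypted locally before upload. If that step fails, the file should be skipped, not sent as plain text.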
Set backup parameters before you need them
Use numbers. Vibes are not a retention policy.
For a working Hermes setup, start with a recovery point objective (RPO), the most recent window of work you can afford to lose, of 4 hours for active workspaces. If losing half a day of agent state would hurt, daily backup is not enough. Use a 15-minute RPO for high-change windows like migrations, launches, or big content runs.
Set a recovery time objective (RTO) of 30 minutes for a clean-machine restore. That does not mean every disaster resolves in 30 minutes. It means the restore path should be boring enough that a new machine can fetch the backup, decrypt it, restore the workspace, and prove the agent can run inside that window.
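Both targets come down to simple time comparisons. The sketch below assumes you record when the newest snapshot was taken and when a restore started and was verified. The mode names are hypothetical.

```python
from datetime import datetime, timedelta

# Targets from the text: 4 h RPO normally, 15 min in high-change windows, 30 min RTO.
RPO = {"normal": timedelta(hours=4), "high_change": timedelta(minutes=15)}
RTO = timedelta(minutes=30)


def rpo_breached(last_snapshot: datetime, now: datetime, mode: str = "normal") -> bool:
    """True if the newest snapshot is older than the RPO for the current mode."""
    return now - last_snapshot > RPO[mode]


def rto_met(restore_started: datetime, agent_verified: datetime) -> bool:
    """True if fetch, decrypt, restore, and proof-of-run fit inside the RTO."""
    return agent_verified - restore_started <= RTO
```

A snapshot that is 3 hours old is fine on a normal day. During a launch it is already a breach.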
Keep hourly snapshots for 24 hours, daily snapshots for 30 days, and weekly snapshots for 12 weeks. If storage cost matters, compress snapshots older than 7 days more aggressively. Compression has to run before encryption, because encrypted data does not shrink. If auditability matters, keep a manifest hash for every snapshot.
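This retention schedule is a tiered, grandfather-father-son rotation. The minimal sketch below decides which snapshot timestamps survive. For each window, it keeps the newest snapshot in every hour, day, or ISO-week bucket. Compression, storage tiers, and manifest hashes are out of scope here.

```python
from datetime import datetime, timedelta


def snapshots_to_keep(snapshots: list[datetime], now: datetime) -> list[datetime]:
    """Return the snapshot timestamps that the hourly/daily/weekly policy keeps."""
    tiers = [
        (timedelta(hours=24), lambda t: (t.date(), t.hour)),  # hourly for 24 h
        (timedelta(days=30), lambda t: t.date()),             # daily for 30 days
        (timedelta(weeks=12), lambda t: t.isocalendar()[:2]),  # weekly for 12 weeks
    ]
    keep = set()
    for window, bucket_of in tiers:
        seen_buckets = set()
        for t in sorted(snapshots, reverse=True):  # newest first wins each bucket
            if now - t > window:
                continue
            bucket = bucket_of(t)
            if bucket not in seen_buckets:
                seen_buckets.add(bucket)
                keep.add(t)
    return sorted(keep)
```

Anything not returned is a pruning candidate. Before you delete it, log it next to its manifest hash.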
Run a restore drill every 14 days. Run one immediately after changing skills, cron prompts, credential layout, or project structure. Verify at least one cold restore every quarter on a machine that has never seen the workspace before.
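A simple check can track the drill schedule, so it does not rely on anyone's memory. In the sketch below, the change-category names are hypothetical, and "every quarter" is approximated as 91 days.

```python
from datetime import datetime, timedelta

DRILL_INTERVAL = timedelta(days=14)
COLD_RESTORE_INTERVAL = timedelta(days=91)  # "every quarter", approximated

# Hypothetical change categories that force an immediate drill.
DRILL_TRIGGERS = {"skills", "cron_prompts", "credential_layout", "project_structure"}


def drills_due(last_drill: datetime, last_cold_restore: datetime,
               changed_since_drill, now: datetime) -> list:
    """List which drills are due right now."""
    due = []
    if now - last_drill >= DRILL_INTERVAL or DRILL_TRIGGERS & set(changed_since_drill):
        due.append("restore_drill")
    if now - last_cold_restore >= COLD_RESTORE_INTERVAL:
        due.append("cold_restore")
    return due
```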
Set size bounds, a floor as well as a ceiling, so failures are visible. If a Hermes workspace usually produces a 120 MB encrypted snapshot and suddenly produces 4 KB, stop trusting it. If it jumps to 9 GB, check whether node_modules, cache folders, logs, or generated media slipped into the payload.
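A size check only needs a baseline and two ratios. The 0.5 and 3.0 ratios below are hypothetical starting points. Tune them against your own snapshot history.

```python
MB = 1024 ** 2


def snapshot_size_alarm(size_bytes: int, baseline_bytes: int,
                        floor_ratio: float = 0.5, ceiling_ratio: float = 3.0):
    """Return None if the snapshot size looks normal, else a reason string."""
    if size_bytes < baseline_bytes * floor_ratio:
        return "too small: snapshot may be empty or truncated"
    if size_bytes > baseline_bytes * ceiling_ratio:
        return "too large: check for node_modules, caches, logs, or generated media"
    return None
```

With a 120 MB baseline, a 4 KB snapshot trips the floor and a 9 GB snapshot trips the ceiling.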
Track at least these parameters: RPO, RTO, snapshot cadence, retention window, restore drill cadence, encryption mode, manifest hash, excluded paths, credential handling, maximum snapshot age, and last successful restore time.
Four failure scenarios worth rehearsing
The cron ledger disappears
A scheduled Hermes job runs every morning. It reads a local ledger of already-seen items so it does not repeat itself. Then the machine gets wiped and the job is recreated from memory without the ledger.
The job still runs. That is the trap. It starts rediscovering old items, repeating stale decisions, and sending duplicate reports. Nobody thinks "data loss" because the cron job exists.
Mitigation: include cron definitions, output history needed for dedupe, and state ledgers in the backup set. During restore, run one dry job and confirm it reads the restored ledger before it does public or paid work.
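The dry-run step might look like the sketch below. It assumes, hypothetically, that the ledger is a JSON list of item IDs. The point is that a missing ledger, or an empty one for a job that has been running for weeks, stops everything before any public work happens.

```python
import json
from pathlib import Path


def load_seen_ledger(path: Path) -> set:
    """Load the restored dedupe ledger and refuse to continue without it."""
    if not path.exists():
        raise RuntimeError(f"seen ledger missing at {path}; do not run the job")
    seen = set(json.loads(path.read_text()))
    if not seen:
        raise RuntimeError("seen ledger is empty; the restore may have pulled a fresh copy")
    return seen


def dry_run(candidates, ledger_path: Path) -> list:
    """Return the items the job would act on, without sending anything."""
    seen = load_seen_ledger(ledger_path)
    return [item for item in candidates if item not in seen]
```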
A skill update gets rolled back silently
You patch a Hermes skill to stop the agent from making a bad marketing claim. Two weeks later, a restore pulls an older copy from some random local archive. The agent is back to the bad claim, but it sounds confident, because of course it does.
Mitigation: snapshot skills with version metadata and a manifest hash. After restore, compare the restored skill hashes against the last known good manifest. If a skill controls public content, checkout behavior, auth, or deployment, make hash mismatch a hard stop.
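Here is a sketch of that hash comparison. It assumes each skill lives in a hypothetical `<name>.md` file and that the last known good manifest maps skill names to SHA-256 digests. The set of critical skills is also hypothetical. A critical mismatch raises an error. Any other mismatch is returned for review.

```python
import hashlib
from pathlib import Path

# Hypothetical: skills whose drift must block the restore.
CRITICAL_SKILLS = {"public-content", "checkout", "auth", "deploy"}


def sha256_file(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()


def check_skills(skill_dir: Path, manifest: dict) -> list:
    """Compare restored skills to the manifest (skill name -> expected sha256).

    Raises on critical drift; returns non-critical mismatches for review.
    """
    warnings = []
    for name, expected in manifest.items():
        path = skill_dir / f"{name}.md"
        actual = sha256_file(path) if path.exists() else None
        if actual == expected:
            continue
        if name in CRITICAL_SKILLS:
            raise RuntimeError(f"hard stop: skill {name!r} does not match manifest")
        warnings.append(name)
    return warnings
```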
Credentials restore without the matching config
The encrypted credential blob survives, but the config path does not. The agent can technically decrypt secrets, but the integration points at an old account, a dead API URL, or a staging project.
Mitigation: back up credentials and config as a pair, but keep the secret material encrypted before upload. On restore, run read-only auth checks first. Verify account identity, endpoint, scope, and expiry. Do not run login flows, rotate tokens, or mutate auth as part of a blind restore script.
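The read-only check can be a plain comparison between the restored config and non-secret facts about the credential, such as whatever a read-only identity call reports. The field names here are hypothetical. Nothing in the sketch logs in, rotates a token, or writes anything.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class CredentialMeta:
    """Non-secret facts about a credential, e.g. from a read-only identity call."""
    account: str
    endpoint: str
    scopes: frozenset
    expires_at: datetime


def pairing_problems(config: dict, meta: CredentialMeta, now: datetime) -> list:
    """Read-only comparison of restored config against credential metadata."""
    problems = []
    if config["account"] != meta.account:
        problems.append(f"account mismatch: config={config['account']!r}, credential={meta.account!r}")
    if config["endpoint"] != meta.endpoint:
        problems.append(f"endpoint mismatch: config={config['endpoint']!r}, credential={meta.endpoint!r}")
    missing = set(config["required_scopes"]) - meta.scopes
    if missing:
        problems.append(f"missing scopes: {sorted(missing)}")
    if meta.expires_at <= now:
        problems.append("credential expired")
    return problems
```

An empty list means the pair looks consistent. Any entry is a reason to stop before the integration touches a real account.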
The workspace restores but the agent cannot act
The files come back. The repo opens. Then the first real task fails because executable permissions, local scripts, ignored config files, or tool settings did not survive.
Mitigation: restore tests need more than file count. Run a smoke test that checks script permissions, tool availability, configured paths, current branch, cron schedule presence, and whether the agent can read its own memory. A restored workspace that cannot execute is a museum exhibit.
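The sketch below covers part of that smoke test using only the standard library: script permissions, tool availability, and memory readability. Branch and cron checks depend on your tooling and are left out. The script, tool, and memory file names are whatever your workspace actually uses.

```python
import os
import shutil
from pathlib import Path


def smoke_test(workspace: Path, scripts, tools, memory_files) -> dict:
    """Map check name -> pass/fail for a restored workspace. Nothing is executed."""
    results = {}
    for script in scripts:
        path = workspace / script
        results[f"exec:{script}"] = path.is_file() and os.access(path, os.X_OK)
    for tool in tools:
        results[f"tool:{tool}"] = shutil.which(tool) is not None
    for mem in memory_files:
        try:
            (workspace / mem).read_bytes()
            results[f"memory:{mem}"] = True
        except OSError:
            results[f"memory:{mem}"] = False
    return results
```

`[name for name, ok in results.items() if not ok]` gives the failure list for the restore report.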
What a clean-machine restore should prove
A real restore drill starts from a machine that does not already contain the workspace. Otherwise you are testing nostalgia.
The drill should fetch the latest encrypted snapshot, decrypt locally, restore files into a fresh workspace path, and verify the manifest. Then it should run the smallest safe Hermes operation that proves the system has context. Read memory. Load a skill. List cron jobs. Compile or dry-run one script. Confirm no public post, email, payment action, or deployment fires during the test.
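Manifest verification can be sketched as a per-file SHA-256 map plus one hash over the whole map. The sketch assumes two things. First, the expected manifest hash was recorded when the snapshot was taken. Second, the snapshot has already been decrypted into a fresh directory. Reading whole files into memory is fine for a sketch but not for multi-gigabyte payloads.

```python
import hashlib
import json
from pathlib import Path


def build_manifest(root: Path) -> dict:
    """Map each relative file path to its size and sha256."""
    files = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            data = path.read_bytes()
            files[path.relative_to(root).as_posix()] = {
                "bytes": len(data),
                "sha256": hashlib.sha256(data).hexdigest(),
            }
    return files


def manifest_hash(manifest: dict) -> str:
    """One hash over the whole manifest, independent of key order."""
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()


def verify_restore(restored_root: Path, expected_hash: str) -> dict:
    manifest = build_manifest(restored_root)
    return {
        "file_count": len(manifest),
        "byte_count": sum(entry["bytes"] for entry in manifest.values()),
        "manifest_ok": manifest_hash(manifest) == expected_hash,
    }
```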
The proof should be written down. Keep a restore report with snapshot ID, created_at, restored_at, duration, file count, byte count, manifest hash, excluded paths, cron count, skill count, and smoke-test result. If any of those are missing, the next operator has to guess. Guessing is where agents go to die in interesting ways.
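A report writer that refuses incomplete reports keeps that discipline cheap. The field names mirror the list above in snake_case. Two things are assumptions: that `duration_s` is in seconds, and that the report is stored as JSON.

```python
import json

REQUIRED_FIELDS = (
    "snapshot_id", "created_at", "restored_at", "duration_s", "file_count",
    "byte_count", "manifest_hash", "excluded_paths", "cron_count",
    "skill_count", "smoke_test",
)


def write_restore_report(report: dict, path) -> None:
    """Write the report as JSON, refusing if any required field is missing."""
    missing = [field for field in REQUIRED_FIELDS if report.get(field) is None]
    if missing:
        raise ValueError(f"restore report missing: {', '.join(missing)}")
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(report, fh, indent=2, default=str)  # datetimes become strings
```

An empty `excluded_paths` list still counts as present. Only a missing value or `None` blocks the write.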
For Keepmyclaw, the clean-machine question is the product question. Can an OpenClaw or Hermes operator recover the operating context, not just a directory tree? If yes, the backup is useful. If not, it is storage pretending to be insurance.
What not to back up blindly
Do not include dependency folders like node_modules by default. Build artifacts, browser caches, temp folders, and log spam belong on the same list: they make snapshots slow and noisy. Include lockfiles and install instructions instead.
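A default exclusion filter can be a short list of directory names and file patterns. Both lists are hypothetical defaults. Lockfiles pass through because nothing in the lists excludes them.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

# Hypothetical defaults; extend per workspace.
EXCLUDE_DIRS = {"node_modules", "__pycache__", ".cache", "tmp", "dist", "build"}
EXCLUDE_FILES = ("*.log", "*.tmp")


def included(path: str) -> bool:
    """True if a workspace-relative path belongs in the snapshot."""
    p = PurePosixPath(path)
    if any(part in EXCLUDE_DIRS for part in p.parts[:-1]):
        return False
    return not any(fnmatch(p.name, pattern) for pattern in EXCLUDE_FILES)
```

If a specific job really needs its logs, list that path explicitly rather than loosening the default.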
Do not back up raw credentials as plain files. Encrypt before upload, verify decrypt locally, and keep restore logs free of token values. Print presence, account identity, scope, and length only when needed.
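Two small helpers cover the logging side. The first reports presence and length but never the value. The second is a redaction pass for any line that might echo a secret. Both are sketches, and the variable names in the usage are illustrative.

```python
def describe_secret(name: str, value) -> str:
    """Log-safe summary: presence and length, never the value itself."""
    if not value:
        return f"{name}: missing"
    return f"{name}: present, {len(value)} chars"


def redact(line: str, secrets) -> str:
    """Replace known secret values in a log line, longest first."""
    for secret in sorted(filter(None, secrets), key=len, reverse=True):
        line = line.replace(secret, "[REDACTED]")
    return line
```

Replacing the longest values first means a secret that contains another one as a substring is still removed whole.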
Do not back up every generated output forever. Keep outputs that affect dedupe, audit, decisions, or user-visible behavior. Drop bulky throwaway files unless they are part of the deliverable.
Do not rely on git alone. Git protects source changes that were committed. Hermes state often lives outside commits: cron output, local memory, credential config, tool settings, and scratch ledgers. Manual commits work until the one time you forget. Then the machine teaches the lesson for free.
The minimum viable Hermes backup policy
If you run Hermes for real work, use this baseline.
Back up active workspaces every 4 hours, and every 15 minutes during high-change operations. Encrypt snapshots client-side. Keep hourly snapshots for 24 hours, daily snapshots for 30 days, and weekly snapshots for 12 weeks. Store a manifest hash with every snapshot. Exclude dependencies, caches, temp folders, and raw logs unless a specific job needs them.
Include memory, skills, cron definitions, cron state ledgers, project docs, scripts, configs, and encrypted credentials. After every restore, verify file count, byte count, manifest hash, skill count, cron count, script permissions, and one safe dry run.
Run a restore drill every 14 days. Run a clean-machine drill every quarter. Tighten the cadence when the agent starts touching public content, checkout paths, customer data, or deployments.
Hermes backup is not about keeping a copy of files because copies feel responsible. It is about preserving the operating context that lets the agent keep doing useful work after the machine dies, the config breaks, or a schedule quietly disappears.
That is the bar. Anything below that is just hoping with a folder name.