How Do You Back Up Hermes Agent Before a Clean-Machine Restore?

If Hermes Agent only lives on one laptop, you do not have an agent. You have a very elaborate local ritual.

Hermes gets useful when it accumulates context. Skills. Cron jobs. Memory. Repo paths. Prompt patterns. Credential references. Little operator habits that never make it into a README because everyone assumes the machine will keep existing. Then the machine gets wiped, the disk dies, or a migration turns into a scavenger hunt.

The operator thesis is simple: a Hermes Agent backup is not a zip of random files. It is a restore contract. If a clean machine cannot rebuild the agent's working state within a defined RTO, the backup is still unproven.

Keepmyclaw currently focuses on OpenClaw and Hermes backup and restore. Not generic Claude Code state loss. Not every developer agent under the sun. The practical target here is narrower: preserve a local Hermes/OpenClaw-style agent runtime well enough that you can prove a new machine can pick up the work.

What counts as Hermes Agent state?

Start with the data that changes behavior, not the data that feels tidy.

For most Hermes setups, the recovery set includes the workspace directory, repo checkouts the agent actively touches, local memory files, custom skills, cron job definitions, scheduled job output history, config files, model/provider defaults, tool access policy, script directories, and credential files after client-side encryption.

That last clause matters. Credential material should not be dumped into a backup as plaintext because the backup tool was in a hurry. If a restore requires credentials, the backup process needs an encrypted path for them, or a documented manual rehydration step with clear ownership. Anything else is just a breach with better folder names.

Treat these as baseline parameters:

RPO: maximum acceptable lost agent work, usually 15 minutes to 24 hours.
RTO: maximum acceptable restore time on a clean machine, usually 30 minutes to 4 hours.
Snapshot cadence: hourly for active operators, daily for light use.
Retention: 7 daily, 4 weekly, and 3 monthly snapshots for small teams.
Restore drill cadence: monthly, plus after major config or skill changes.
Credential rotation window: 24 hours after any suspected local compromise.
Backup verification timeout: fail the run if listing or checksum proof takes over 10 minutes.
Clean-machine test scope: at least one repo task, one skill load, one cron listing, and one memory lookup.

Those numbers are not sacred. They are useful because they force the question. How much work are you willing to lose, and how long can the agent be useless before it costs you money?
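The retention line is the easiest one to turn into code. What follows is a minimal sketch, not Keepmyclaw's implementation: it assumes snapshots are identified only by their timestamps, that "weekly" means ISO week, and that "monthly" means calendar month. The helper name select_retained is hypothetical.

```python
from datetime import datetime


def select_retained(timestamps, daily=7, weekly=4, monthly=3):
    """Return the snapshot timestamps to keep under a daily/weekly/monthly policy.

    For each period type, keep the newest snapshot in each of the most
    recent N periods that contain at least one snapshot.
    """
    newest_first = sorted(timestamps, reverse=True)
    keep = set()

    def keep_newest_per_bucket(bucket_key, limit):
        seen = []
        for ts in newest_first:
            key = bucket_key(ts)
            if key in seen:
                continue  # an older snapshot in a period we already covered
            if len(seen) == limit:
                break  # enough periods covered; everything else is older
            seen.append(key)
            keep.add(ts)

    keep_newest_per_bucket(lambda ts: ts.date(), daily)
    keep_newest_per_bucket(lambda ts: tuple(ts.isocalendar())[:2], weekly)
    keep_newest_per_bucket(lambda ts: (ts.year, ts.month), monthly)
    return sorted(keep)
```

Anything not returned is a candidate for pruning. Because one snapshot can satisfy the daily, weekly, and monthly rule at once, the kept set is usually smaller than 7 + 4 + 3.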

The clean-machine restore checklist

A clean-machine restore is harsher than restoring into the same folder after a bad edit. Good. That is the point.

The minimum checklist is:

Confirm the new machine has the expected OS, shell, package manager, and runtime versions.
Restore the Hermes workspace and any OpenClaw workspace paths into the intended location.
Restore custom skills and verify the agent can load them.
Restore memory and confirm a known fact or operating rule is available.
Restore cron job definitions and confirm schedules, prompts, scripts, toolsets, and delivery targets survived.
Restore scripts used by cron jobs and run compile checks where possible.
Restore config files without weakening permissions.
Rehydrate credentials through the approved encrypted or manual path.
Run one safe end-to-end task in a test repo.
Record the restore time and any manual steps.

The last two items are where people get lazy. They restore files, see a familiar directory tree, and declare victory. Cute. The agent still may not work.

A real restore proves behavior. Can Hermes read the right skill? Can it find its scheduled jobs? Can it run a script that depends on the restored paths? Can it avoid using a stale credential path from the old machine? Can it produce the same operating decision from memory that it would have made yesterday?

Four failure scenarios worth testing

The cron jobs restore, but the scripts do not

A scheduled job definition can survive while its script path points nowhere. This is common when scripts lived outside the workspace, or when absolute paths changed between machines.

Mitigation: include the script directory in the backup set. During restore, list every cron job, extract script paths, and compile or smoke-test scripts that are safe. For Keepmyclaw-style jobs, script paths should be relative to the Hermes scripts directory when the scheduler expects that. Absolute paths are how future you gets mugged by past you.
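The script check above can be sketched in a few lines. This assumes the cron job definitions have already been exported into a list of dicts with "name" and "script" keys, which is a hypothetical shape and not a documented Hermes format, and that the scripts worth compile-checking are Python files. The function name check_job_scripts is also an assumption.

```python
from pathlib import Path


def check_job_scripts(jobs, scripts_dir):
    """Report cron jobs whose script is absolute, missing, or fails to compile.

    `jobs` is an iterable of dicts with hypothetical "name" and "script" keys;
    jobs without a script are skipped.
    """
    scripts_dir = Path(scripts_dir)
    problems = []
    for job in jobs:
        script = job.get("script")
        if not script:
            continue
        path = Path(script)
        if path.is_absolute():
            problems.append((job["name"], "absolute path: " + script))
            continue
        resolved = scripts_dir / path
        if not resolved.is_file():
            problems.append((job["name"], "missing script: " + script))
            continue
        if resolved.suffix == ".py":
            try:
                # Syntax check only; the script is parsed, never executed.
                compile(resolved.read_text(encoding="utf-8"), str(resolved), "exec")
            except SyntaxError:
                problems.append((job["name"], "compile error: " + script))
    return problems
```

An empty list means every referenced script exists under the scripts directory and at least parses. It says nothing about whether the script does the right thing, which is what the smoke test is for.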

The skill files restore, but the agent loads old behavior

Skills are operational policy, not decoration. If a Hermes skill says the product supports generic agent backup when the current scope is only OpenClaw and Hermes, the restored agent can create bad content or take wrong actions immediately.

Mitigation: version skill files with the backup. After restore, run a skill load check for the skills that govern public work, repo work, and product positioning. Add a known guardrail to the test, such as "Keepmyclaw currently supports OpenClaw and Hermes only." If the restored agent cannot repeat it, the restore is not clean.
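One way to version skill files with the backup is a checksum manifest taken at snapshot time and compared after restore. The sketch below assumes skills are ordinary files under a single directory; build_manifest and diff_manifest are hypothetical helper names, not part of Hermes or Keepmyclaw.

```python
import hashlib
from pathlib import Path


def build_manifest(skills_dir):
    """Map each file's path (relative to skills_dir) to its SHA-256 hex digest."""
    root = Path(skills_dir)
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def diff_manifest(expected, actual):
    """Return skill files that are missing, changed, or unexpected after restore."""
    shared = expected.keys() & actual.keys()
    return {
        "missing": sorted(set(expected) - set(actual)),
        "changed": sorted(k for k in shared if expected[k] != actual[k]),
        "unexpected": sorted(set(actual) - set(expected)),
    }
```

A clean diff proves the restored files are the ones that were backed up. It does not prove the agent loads them, so the guardrail question still has to be asked of the running agent.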

The memory restores, but credential references are stale

Agents often remember where credentials live. That is useful until a new machine puts them somewhere else, or the old path restores without the actual secret. Then the agent either fails noisily or, worse, tries to fix auth by touching token files it should not mutate.

Mitigation: separate credential references from credential values. Back up encrypted credential files only when that is part of the approved security model. Otherwise restore a credential manifest with names, expected locations, required scopes, and a rehydration checklist. Test by verifying presence and permissions, not by printing secrets. Secret logs are not diagnostics. They are confetti for attackers.
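A presence-and-permissions check never needs to open the secret. The sketch below assumes a POSIX filesystem and a credential manifest shaped as a list of dicts with "name" and "path" keys; both the shape and the function name check_credentials are illustrative assumptions.

```python
import stat
from pathlib import Path


def check_credentials(manifest, home):
    """Check presence and permissions of credential files without reading them.

    `manifest` is a list of dicts with hypothetical "name" and "path" keys,
    where "path" is relative to `home`.
    """
    findings = []
    for entry in manifest:
        path = Path(home) / entry["path"]
        if not path.is_file():
            findings.append((entry["name"], "missing"))
            continue
        mode = stat.S_IMODE(path.stat().st_mode)
        # Any group or other permission bit is too open for a secret.
        if mode & 0o077:
            findings.append((entry["name"], "permissions too open: " + oct(mode)))
    return findings
```

The findings contain names and modes only, so they are safe to log. Whether a credential still has the required scopes is a separate question that has to be answered against the provider, not the filesystem.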

The workspace restores, but repo state is unsafe

A Hermes agent may depend on repos with local branches, stashes, generated files, and uncommitted work. A filesystem backup can bring all of that back, but not every restored repo is safe to push from.

Mitigation: after restore, run git status on every managed repo before public or production actions. Record branch, upstream, ahead/behind count, uncommitted files, and stashes. The clean-machine test should prove the agent can tell the difference between a clean main branch and a half-finished SEO branch. This is boring until it prevents a garbage deploy.
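The repo record can be collected from git's machine-readable status output. This sketch parses the output of git status --porcelain=v2 --branch --show-stash; the stash header line requires a reasonably recent git, and the function names and the summary dict layout are assumptions made for illustration.

```python
import subprocess


def parse_status(porcelain_v2):
    """Summarize `git status --porcelain=v2 --branch --show-stash` output."""
    summary = {"branch": None, "upstream": None, "ahead": 0, "behind": 0,
               "stashes": 0, "dirty": [], "untracked": [], "conflicts": []}
    for line in porcelain_v2.splitlines():
        if line.startswith("# branch.head "):
            summary["branch"] = line.split(" ", 2)[2]
        elif line.startswith("# branch.upstream "):
            summary["upstream"] = line.split(" ", 2)[2]
        elif line.startswith("# branch.ab "):
            # Format is "# branch.ab +<ahead> -<behind>".
            ahead, behind = line.split(" ")[2:4]
            summary["ahead"], summary["behind"] = int(ahead), -int(behind)
        elif line.startswith("# stash "):
            summary["stashes"] = int(line.split(" ")[2])
        elif line.startswith("? "):
            summary["untracked"].append(line[2:])
        elif line.startswith("u "):
            summary["conflicts"].append(line.split(" ", 10)[10])
        elif line.startswith("1 "):
            summary["dirty"].append(line.split(" ", 8)[8])
        elif line.startswith("2 "):
            # Renamed entries carry "<path>\t<origPath>"; keep the new path.
            summary["dirty"].append(line.split(" ", 9)[9].split("\t")[0])
    return summary


def repo_status(repo_path):
    """Run git in `repo_path` and return the parsed summary."""
    out = subprocess.run(
        ["git", "-C", str(repo_path), "status",
         "--porcelain=v2", "--branch", "--show-stash"],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_status(out)
```

A repo with empty dirty, untracked, and conflicts lists, zero stashes, and an ahead/behind of 0/0 is the clean main branch. Anything else is something the agent should report before it pushes.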

What Keepmyclaw should protect

For Hermes and OpenClaw operators, Keepmyclaw should protect the parts that are painful to reconstruct:

Agent workspace files.
Memory files and durable state.
Custom skills and persona/playbook files.
Cron job definitions and output history.
Scripts used by scheduled jobs.
Config files and model/tool defaults.
Credential files after client-side encryption.
Multi-agent setup state where concurrent operators share context.

The product does not need to pretend it backs up every AI tool to be useful. In fact, that would make the promise weaker. The valuable promise is narrower: when a Hermes or OpenClaw operator has real local state, Keepmyclaw gives them a recoverable path before a machine loss turns into archaeology.

A restore drill that proves something

Run this drill on a machine or VM that starts clean. Do not cheat by restoring into the original environment.

First, create a snapshot from the active machine. Record the snapshot ID, timestamp, source machine, workspace paths, and expected RPO. Then restore to the clean machine. Time the restore from first command to first successful agent task.

Your acceptance test should include five checks. Hermes can load one required skill. Hermes can read one known memory or operating rule. The scheduler can list cron jobs with prompts, scripts, toolsets, and delivery targets intact. At least one safe script compiles or runs in dry-run mode. A managed repo reports a sane branch and no surprise conflict markers.

Set a hard pass/fail line. Example: RTO under 60 minutes, RPO under 24 hours, zero missing required skills, zero plaintext secrets in logs, and no public action until repo state is verified.
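Writing the pass/fail line as code removes the temptation to round a 70-minute restore down to "about an hour." The sketch below encodes the example thresholds from the paragraph above; the DrillResult fields and the evaluate_drill name are hypothetical, and the numbers are defaults to override, not recommendations.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class DrillResult:
    restore_time: timedelta          # first command to first successful agent task
    snapshot_age: timedelta          # work lost: snapshot time to failure time
    missing_skills: int
    plaintext_secrets_in_logs: int
    repo_state_verified: bool


def evaluate_drill(result, max_rto=timedelta(minutes=60), max_rpo=timedelta(hours=24)):
    """Return the list of failed criteria; an empty list means the drill passed."""
    failures = []
    if not result.restore_time < max_rto:
        failures.append("RTO exceeded")
    if not result.snapshot_age < max_rpo:
        failures.append("RPO exceeded")
    if result.missing_skills != 0:
        failures.append("required skills missing")
    if result.plaintext_secrets_in_logs != 0:
        failures.append("plaintext secrets in logs")
    if not result.repo_state_verified:
        failures.append("repo state not verified")
    return failures
```

Returning the failed criteria instead of a single boolean makes the drill report useful: the list says which part of the backup set to fix first.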

If the drill fails, fix the backup set first. Do not patch the restore by hand and call the backup good. That just trains everyone to trust a process that only works when the person who knows the machine is awake.

The buying question

A good Hermes Agent backup answers one question: can this agent come back on a machine it has never seen before?

Not partially. Not after three hours of remembering where things were. Not with missing cron jobs and a vague plan to recreate skills later.

If the agent matters, the restore has to be boring. Snapshot, restore, verify, continue. Keepmyclaw exists for that narrow, unglamorous problem. Which is usually where the expensive failures hide.