Hermes memory loss rarely looks dramatic at first.
The agent still opens. The repo is still there. A few commands work. Then it asks a question it answered last week, forgets a product boundary, misses a credential note, or runs a cron job from an old assumption. That is when you learn the backup only saved files, not working context. Charming little betrayal.
The operator thesis is simple: a Hermes Agent memory restore is not complete until the agent can recover its decisions, habits, scripts, skills, cron jobs, and local notes well enough to keep operating safely. A folder copy is not proof. A clean restore with verified context is proof.
Keepmyclaw currently supports OpenClaw and Hermes backup and restore. For Hermes operators, memory matters because it is where the boring operational truth accumulates. Product rules. Prompt edits. Skill behavior. Cron ledgers. Local scripts. Things nobody remembers until the new machine gets them wrong.
What Hermes memory actually includes
Do not treat memory as one magic file. In a real Hermes setup, useful memory spreads across several places.
At minimum, protect these parts:
- Long-lived memory and workspace notes.
- Skills, including user-edited skills and local playbooks.
- Cron prompts, schedules, output history, and state ledgers.
- Scripts called by cron jobs or recurring operator workflows.
- Config files, model defaults, provider choices, and tool settings.
- Repo state notes, stashes, branch names, and deployment assumptions.
- Credential files after client-side encryption.
- Browser automation setup if a workflow depends on a live profile or CDP session.
- Local docs that explain product scope, pricing, checkout, or public-action boundaries.
The trap is that these files do not look equally important. A markdown note may carry a stronger recovery signal than a whole repo. A tiny JSON ledger may be the only thing preventing a duplicate public post. A skill file may contain the instruction that keeps the agent from marketing the wrong product.
This is why memory restore needs a manifest. Not vibes. A manifest.
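A manifest does not need to be fancy. Here is a minimal Python sketch. It assumes the critical paths are listed relative to one root directory. For every file it records size, modification time, and SHA-256. When a critical path is missing, it lists that path instead of silently skipping it. The function names are illustrative, not part of Hermes or Keepmyclaw.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path) -> str:
    """Hash a file in chunks so large ledgers or logs are never loaded whole."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path, critical_paths: list[str]) -> dict:
    """Record size, mtime, and SHA-256 for every file under each critical path.

    critical_paths are relative to root. A missing critical path is recorded
    under "missing" so the gap is visible instead of silently skipped.
    """
    entries = {}
    missing = []
    for rel in critical_paths:
        target = root / rel
        if not target.exists():
            missing.append(rel)
            continue
        if target.is_file():
            files = [target]
        else:
            files = sorted(p for p in target.rglob("*") if p.is_file())
        for f in files:
            st = f.stat()
            entries[f.relative_to(root).as_posix()] = {
                "size": st.st_size,
                "mtime": st.st_mtime,
                "sha256": sha256_file(f),
            }
    return {"files": entries, "missing": missing}
```

Call it with your own root and a hypothetical path list such as `["memories", "skills", "cron", "scripts"]`. Then store the result as JSON next to the snapshot. The explicit `missing` list is the point: a silent skip is how a ledger falls out of the backup.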
Baseline restore targets
Pick numbers before the failure.
A sane Hermes memory restore baseline looks like this:
- RPO: 15 minutes for active memory, cron state, and ledgers.
- RTO: 30 minutes for a usable clean-machine restore.
- Snapshot cadence: every 15 minutes while active, hourly while idle.
- Retention: 96 short-term snapshots, 30 daily snapshots, 12 monthly snapshots.
- Restore drill cadence: one temp-profile drill every 30 days.
- Clean-machine drill cadence: once per quarter.
- Manifest coverage: 100 percent of critical memory paths and scripts.
- Checksum coverage: every snapshot before it is marked verified.
- Secret handling: credentials encrypted before upload, never printed during restore checks.
- Public-action gate: social, email, deploy, and checkout jobs stay disabled until ledgers and prompts are verified.
- Acceptance test: one safe no-op job runs and produces the expected final response.
The exact values can move. The important part is that the restore has a target. If you only know whether the zip file exists, you know almost nothing.
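The retention numbers are easier to reason about as code. At the 15-minute active cadence, 96 short-term snapshots cover 24 hours (96 × 15 minutes = 1,440 minutes). At the hourly idle cadence, the same 96 snapshots stretch to four days.

Below is a minimal grandfather-father-son style sketch. It assumes snapshots are identified only by timestamp. "Daily" and "monthly" mean calendar days and calendar months.

```python
from datetime import datetime


def snapshots_to_keep(timestamps: list[datetime], short_term: int = 96,
                      daily: int = 30, monthly: int = 12) -> set[datetime]:
    """Return the snapshot timestamps a retention policy should keep.

    Keeps the newest `short_term` snapshots, plus the newest snapshot of each
    of the most recent `daily` calendar days and `monthly` calendar months.
    """
    ordered = sorted(timestamps, reverse=True)  # newest first
    keep = set(ordered[:short_term])
    days: set = set()
    months: set = set()
    for ts in ordered:
        # Because we walk newest first, the first hit per day/month is its newest snapshot.
        day = ts.date()
        if day not in days and len(days) < daily:
            days.add(day)
            keep.add(ts)
        month = (ts.year, ts.month)
        if month not in months and len(months) < monthly:
            months.add(month)
            keep.add(ts)
    return keep
```

Everything outside the returned set is a pruning candidate. Prune only after the newer snapshots have been marked verified.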
Failure scenarios worth testing
The restore brings back old memory
This is the quiet failure. The agent remembers something, but not the latest thing. It follows an old product scope, posts from an old content backlog, or resurrects a rejected idea because the current notes were outside the backup path.
Mitigation: record timestamps and checksums for every memory source. After restore, compare the newest memory file, latest cron output, latest skill edit, and latest decision note against the source manifest. If the newest restored memory lags the source by more than the RPO, the restore failed.
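The freshness rule can be checked mechanically. This sketch makes two assumptions:

- Both manifests are dicts shaped like `{"files": {path: {"mtime": seconds}}}`.
- The restored manifest still carries the original modification times.

Restore tools do not always preserve mtimes, so record them in the manifest rather than trusting the restored filesystem. The path prefixes in the usage line are hypothetical.

```python
from datetime import timedelta
from typing import Iterable, Optional


def newest_mtime(manifest: dict, prefix: str) -> Optional[float]:
    """Newest recorded mtime among manifest entries under a path prefix."""
    times = [entry["mtime"] for path, entry in manifest["files"].items()
             if path.startswith(prefix)]
    return max(times) if times else None


def freshness_failures(source: dict, restored: dict, prefixes: Iterable[str],
                       rpo: timedelta = timedelta(minutes=15)) -> list[str]:
    """List prefixes whose newest restored file lags the source by more than the RPO."""
    failures = []
    for prefix in prefixes:
        src = newest_mtime(source, prefix)
        got = newest_mtime(restored, prefix)
        if src is None:
            continue  # the source had nothing here, so there is nothing to compare
        if got is None:
            failures.append(f"{prefix}: nothing restored")
        elif src - got > rpo.total_seconds():
            lag_min = (src - got) / 60
            failures.append(f"{prefix}: restored copy lags source by {lag_min:.0f} min")
    return failures
```

For example, call `freshness_failures(source_manifest, restored_manifest, ["memory/", "cron/", "skills/", "decisions/"])`. Any non-empty result means the restore failed the RPO.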
The skill files restore but behavior changes
Hermes skills can depend on local references, persona files, scripts, or state. Restoring the skill directory alone may not restore the behavior. The agent may load the skill and still miss the instruction that lived in a referenced file.
Mitigation: treat skills as dependency trees. The manifest should include every local file a critical skill reads. After restore, run a read-only skill smoke test and confirm the assembled behavior still contains the current product scope and safety rules.
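One way to treat a skill as a dependency tree is to walk its local references and report anything that fails to resolve. The link convention here is an assumption. The sketch treats markdown links like `[persona](../personas/ops.md)` as dependencies and recurses into linked markdown files. If your skills reference files some other way, swap the pattern.

```python
import re
from pathlib import Path

# Hypothetical convention: a skill points at local files with markdown links.
LINK = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def skill_dependencies(entry: Path) -> tuple[set[Path], set[Path]]:
    """Walk local links starting from a skill file; return (found, missing)."""
    found: set[Path] = set()
    missing: set[Path] = set()
    stack = [entry.resolve()]
    while stack:
        current = stack.pop()
        if current in found or current in missing:
            continue  # already visited; also guards against link cycles
        if not current.is_file():
            missing.add(current)
            continue
        found.add(current)
        if current.suffix != ".md":
            continue  # only markdown files are scanned for further links
        for target in LINK.findall(current.read_text(encoding="utf-8")):
            if "://" in target or target.startswith(("#", "mailto:")):
                continue  # skip URLs and in-page anchors
            target = target.split("#", 1)[0]
            stack.append((current.parent / target).resolve())
    return found, missing
```

Run this against every critical skill during the restore drill. Treat any non-empty `missing` set as a blocker, not a warning.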
The cron ledger is missing
A memory restore that loses ledgers is dangerous. A blog job can republish the same angle. A social job can repost a thread. A revenue job can chase the same rejected lead. The agent thinks it is being productive. It is actually stepping on the rake again.
Mitigation: back up state ledgers with the same priority as prompts. Before enabling public jobs, verify used URLs, source IDs, PR numbers, posted slugs, rejected candidates, and follow-up history. If the ledger is missing, keep public-action jobs paused until you rebuild or intentionally reset it.
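A ledger gate can be a small, boring function. The job names, ledger paths, and required keys below are hypothetical. What matters is the default: every public-action job stays paused unless three things hold.

- Its ledger exists.
- The ledger parses as a JSON object.
- The ledger contains the fields the job relies on.

```python
import json
from pathlib import Path

# Hypothetical mapping: public-action job -> (ledger path, keys that job relies on).
REQUIRED = {
    "blog_publish": ("ledgers/blog.json", ["posted_slugs", "used_urls"]),
    "social_thread": ("ledgers/social.json", ["posted_ids", "rejected_candidates"]),
}


def jobs_safe_to_enable(root: Path, required: dict = REQUIRED) -> dict[str, str]:
    """Return job -> "ok" or the reason the job must stay paused."""
    verdicts = {}
    for job, (rel, keys) in required.items():
        path = root / rel
        if not path.is_file():
            verdicts[job] = "paused: ledger missing"
            continue
        try:
            data = json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            verdicts[job] = "paused: ledger unreadable"
            continue
        if not isinstance(data, dict):
            verdicts[job] = "paused: unexpected ledger shape"
            continue
        absent = [k for k in keys if k not in data]
        verdicts[job] = f"paused: missing keys {absent}" if absent else "ok"
    return verdicts
```

An intentionally reset ledger still passes, as long as it has the expected empty fields. That keeps "we chose to start over" distinct from "the file vanished."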
The restored credentials point to the wrong account
Credential files can exist and still be wrong. Permissions can change. The account can differ. The token can expire. The dangerous version is a restore that silently authenticates to the wrong social, GitHub, Cloudflare, or payment account.
Mitigation: verify credentials through safe metadata endpoints only. Check account identity, scopes, file owner, permission mode, and token length. Do not print secret values. If identity cannot be verified safely, keep mutation jobs disabled.
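The local half of credential verification fits in a few lines. This sketch assumes a POSIX system, because it uses `st_uid` and `os.getuid`, and one secret per file. It reports presence, ownership, permission mode, and content length, and never returns the secret itself. Account identity and scopes still need each provider's own metadata endpoint, which is service-specific and not shown here.

```python
import os
import stat
from pathlib import Path


def credential_file_report(path: Path) -> dict:
    """Check a restored credential file without ever returning its contents.

    The report holds only metadata: presence, ownership by the current user,
    permission mode, whether group/other bits are set, and stripped length.
    """
    if not path.is_file():
        return {"path": str(path), "present": False}
    st = path.stat()
    mode = stat.S_IMODE(st.st_mode)
    length = len(path.read_bytes().strip())  # only the length is kept
    return {
        "path": str(path),
        "present": True,
        "owned_by_me": st.st_uid == os.getuid(),
        "mode": oct(mode),
        "too_open": bool(mode & (stat.S_IRWXG | stat.S_IRWXO)),
        "length": length,
    }
```

Log the report, not the file. A zero length or `too_open: True` is enough reason to keep mutation jobs disabled.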
The repo comes back clean but context is gone
A repo can restore perfectly while the agent forgets why the branch existed, which PR was safe to merge, or which local changes were unrelated to SEO work. Git cannot remember operator judgment for you. Rude, but fair.
Mitigation: store repo state notes outside the repo too. Capture the branch, upstream, ahead and behind counts, open PR URLs, stashes, dirty files, and the reason for any active branch. After restore, run git status before any deploy or public action.
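Repo state notes can be captured from git itself. This sketch parses `git status --porcelain=v2 --branch`, whose `# branch.head`, `# branch.upstream`, and `# branch.ab` header lines carry the branch, upstream, and ahead/behind counts. It then adds `git stash list`. Open PR URLs and the reason for the branch come from the operator, not git, so they are plain arguments here. The function names are hypothetical.

```python
import subprocess
from datetime import datetime, timezone

# Porcelain v2 entry type -> number of space-separated fields before the path.
PATH_FIELD = {"1": 8, "2": 9, "u": 10, "?": 1}


def parse_porcelain_v2(text: str) -> dict:
    """Summarize `git status --porcelain=v2 --branch` output."""
    info = {"branch": None, "upstream": None, "ahead": 0, "behind": 0, "dirty": []}
    for line in text.splitlines():
        if line.startswith("# branch.head "):
            info["branch"] = line.split(" ", 2)[2]
        elif line.startswith("# branch.upstream "):
            info["upstream"] = line.split(" ", 2)[2]
        elif line.startswith("# branch.ab "):
            ahead, behind = line.split()[2:4]  # e.g. "+2", "-1"
            info["ahead"], info["behind"] = int(ahead), -int(behind)
        elif line[:1] in PATH_FIELD:
            n = PATH_FIELD[line[0]]
            path = line.split(" ", n)[n]
            info["dirty"].append(path.split("\t")[0])  # renames are "new<TAB>old"
    return info


def capture_repo_state(repo: str, reason: str, open_prs: list[str]) -> dict:
    """Collect branch, upstream, counts, dirty files, stashes, and operator notes."""
    def git(*args: str) -> str:
        return subprocess.run(["git", "-C", repo, *args], capture_output=True,
                              text=True, check=True).stdout

    state = parse_porcelain_v2(git("status", "--porcelain=v2", "--branch"))
    state.update({
        "stashes": git("stash", "list").splitlines(),
        "open_prs": open_prs,
        "reason": reason,
        "captured_at": datetime.now(timezone.utc).isoformat(),
    })
    return state
```

Write the result to a notes file outside the repo on every snapshot. The upstream and ahead/behind fields are simply absent from git's output when a branch has no upstream, which the defaults cover.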
The restore checklist
Start from a clean profile or clean machine. Do not test by restoring into the old working directory. That only proves the old machine can hide your mistakes.
First, restore the encrypted Keepmyclaw snapshot. Then verify the manifest. Confirm every critical path exists, every checksum matches, and the newest snapshot is within the RPO.
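The manifest check is worth automating so nobody eyeballs checksums at 2 a.m. This is a minimal sketch with two assumptions:

- The manifest maps relative paths to `{"sha256": hexdigest}`.
- You know both when the snapshot was taken and when the source machine was lost.

```python
import hashlib
from datetime import datetime, timedelta
from pathlib import Path


def verify_restore(root: Path, manifest: dict, snapshot_time: datetime,
                   incident_time: datetime,
                   rpo: timedelta = timedelta(minutes=15)) -> list[str]:
    """Return a list of problems; an empty list means the manifest check passed."""
    problems = []
    age = incident_time - snapshot_time
    if age > rpo:
        problems.append(f"snapshot predates the incident by {age}; RPO is {rpo}")
    for rel, entry in manifest["files"].items():
        path = root / rel
        if not path.is_file():
            problems.append(f"missing: {rel}")
            continue
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        if digest != entry["sha256"]:
            problems.append(f"checksum mismatch: {rel}")
    return problems
```

Only mark the snapshot verified when the list comes back empty.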
Second, inspect memory freshness. Open the latest operator notes, product scope docs, decision logs, skill files, cron outputs, and state ledgers. Confirm the restored agent knows the current boundary: Keepmyclaw supports OpenClaw and Hermes backup and restore now. Broader tools are future expansion, not current product support.
Third, test skill loading. List skills, inspect critical skills, and confirm referenced local files exist. If a skill depends on a persona, playbook, script, or reference file, verify that file too. Missing references are how restored agents drift into weird old behavior.
Fourth, test cron state without causing side effects. List jobs. Check schedules, prompts, enabled toolsets, model fields, scripts, delivery targets, and output history. Compile scripts. Run only safe no-op jobs. Do not live-test public posting, email, checkout changes, or deployment from a restore drill unless that action is explicitly part of the drill.
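Compiling scripts without running them is the safe version of a cron test. This sketch covers two cases:

- Python, via the built-in `compile()`, which parses source but does not execute it.
- Shell, via `bash -n`, which reads commands without executing them.

Other file types are reported rather than silently passed. The assumption that your cron scripts are Python or shell is yours to adjust.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Optional


def syntax_check(script: Path) -> Optional[str]:
    """Return None if the script parses, else a short error. Never executes it."""
    if script.suffix == ".py":
        try:
            compile(script.read_text(encoding="utf-8"), str(script), "exec")
        except SyntaxError as exc:
            return f"{script.name}: line {exc.lineno}: {exc.msg}"
        return None
    if script.suffix == ".sh":
        bash = shutil.which("bash")
        if bash is None:
            return f"{script.name}: bash not available to check"
        result = subprocess.run([bash, "-n", str(script)], capture_output=True, text=True)
        return None if result.returncode == 0 else f"{script.name}: {result.stderr.strip()}"
    return f"{script.name}: no checker for this file type"
```

Run it over every script the cron definitions reference. Treat "no checker" as a prompt to add one, not as a pass.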
Fifth, verify credentials safely. Confirm files exist with restrictive permissions. Use metadata calls to verify account identity where possible. Record presence and scope. Never print token values. This is basic hygiene, which means it gets skipped exactly once before becoming a postmortem.
Sixth, run one controlled operator task. It should read restored memory, follow current scope, use a restored script or skill, and write a harmless output. The point is not to do work. The point is to prove the agent can use its restored context without improvising from stale memory.
What Keepmyclaw should capture
For Hermes memory restore, Keepmyclaw should capture the operating context around the agent:
- Hermes memory and workspace files.
- Skills and every local dependency they read.
- Cron definitions, outputs, prompts, scripts, and state ledgers.
- Config files and provider settings.
- Encrypted credential files.
- Repo state notes and restore manifests.
- Verification reports from previous restore drills.
That last item matters. A previous restore report tells the next restore what good looked like. Without it, every outage becomes original research. Nobody needs original research while a production agent is down.
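A drill report only helps the next restore if it is written in a consistent shape. Here is a minimal sketch with hypothetical check names. It refuses to mark an empty report as verified, since a drill that checked nothing proved nothing.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from pathlib import Path


@dataclass
class Check:
    name: str
    passed: bool
    detail: str = ""


def write_report(checks: list[Check], out: Path) -> bool:
    """Write a restore drill report as JSON; return True only if every check passed."""
    report = {
        "generated_at": datetime.now(timezone.utc).isoformat(),
        "verified": bool(checks) and all(c.passed for c in checks),
        "checks": [asdict(c) for c in checks],
    }
    out.write_text(json.dumps(report, indent=2), encoding="utf-8")
    return report["verified"]
```

Keep these reports with the snapshots. The next drill can diff its checks against the last verified report instead of rediscovering what good looked like.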
The proof that counts
A Hermes Agent memory restore is real when these checks pass:
- The newest restored memory is within the RPO.
- Critical skills load with their referenced files.
- Cron jobs list with full prompts, schedules, scripts, and ledgers.
- Scripts compile or pass safe syntax checks.
- Credentials are present, encrypted at rest, and verified without secret leakage.
- Product scope and public-action boundaries match the latest source of truth.
- A safe no-op task uses restored context and writes the expected output.
Anything less is a comforting copy of a directory.
That may be enough for a toy agent. It is not enough for a Hermes operator who depends on memory, cron, skills, and local decisions to stay intact. Restore the context. Then trust the machine.