ClauLock


Claude Secrets Manager: how to stop leaking API keys to your AI agent

Why API keys end up in transcripts when you pair Claude Code with the usual secret-handling patterns, and what it takes to make the leak impossible instead of unlikely.

By Jesús E. Viera · 10 min read

The first time I noticed it, I had been pair-programming with Claude Code for about twenty minutes. I'd asked it to poke at a failing CI job, and somewhere in the middle of its Bash calls it ran env | grep GITHUB, decided that was useful context, and quoted my ghp_live_ token back in a three-line summary. The token was now in the transcript, the transcript was in the tool's logs, and the tool's logs were in the vendor's retention policy. Rotating it took ten minutes. Accepting that I'd just leaked it took longer.

This post is about why that happens, why the usual fixes don't work when the caller is an AI agent, and what a secrets manager for Claude agents actually has to look like if you want the leak to be impossible rather than unlikely.

The threat model changed and nobody updated the tools

Every mainstream secrets manager — Vault, Doppler, 1Password CLI, AWS Secrets Manager, the .env file you definitely still have in a repo somewhere — assumes the same threat model:

The process that uses the secret is trusted with the plaintext.

That model is correct for a web server. The web server needs the database password to open the connection. It holds the password in memory, uses it, and discards it. The password doesn't end up in an HTTP response because the server is a well-behaved program that doesn't print its environment to clients.

When the "process that uses the secret" is an AI model running in a loop, every assumption in that threat model falls over:

  • The model's entire input and output is logged. Transcripts are the product. Anything the model sees, humans see, and possibly a vendor's fine-tuning pipeline sees.
  • The model is easy to trick into revealing what it was given. "What's the exact command you're about to run?" "Print the final argv for debugging." The model answers, because answering is what it does.
  • Tools the model calls echo their inputs: auth-failure messages, debug output, set -x traces, rate-limit responses that quote the header back. Every one of those paths ends with a plaintext secret on the model's stdout.
  • The model's context window is retained. Even if it never speaks the secret out loud, having it in the context means it's in the request body of every subsequent API call for the remainder of the session.

The question isn't whether existing patterns can be made safer. The question is whether you can achieve a property that existing patterns cannot achieve at all: the model uses the secret but never receives the plaintext.

Why every obvious fix still fails

Environment variables

The Claude Code user has GITHUB_TOKEN exported. Claude writes curl -H "Authorization: Bearer $GITHUB_TOKEN" and the shell expands it. Problem: the expansion happens before exec, so curl receives the plaintext in its argv, and anything that traces its arguments (set -x, curl -v, an error message that quotes the request) prints that value into the output Claude's Bash tool reads back. Even if nothing traces, the model can just run echo $GITHUB_TOKEN and read the answer. Env vars assume a cooperative reader. The model isn't that.
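To see both failure paths without a real token, the sketch below runs commands through sh -c, roughly the way an agent's Bash tool would, with a fake GITHUB_TOKEN in the environment. The helper name run_like_agent is hypothetical, and the shell builtin true stands in for curl so nothing touches the network.

```python
import os
import subprocess


def run_like_agent(command: str, extra_env: dict[str, str]) -> tuple[str, str]:
    """Run `command` via sh -c with extra environment variables and return
    (stdout, stderr): everything an agent's Bash tool would read back."""
    env = {**os.environ, **extra_env}
    proc = subprocess.run(["sh", "-c", command], env=env,
                          capture_output=True, text=True)
    return proc.stdout, proc.stderr


FAKE = {"GITHUB_TOKEN": "ghp_demo_not_a_real_token"}  # fake value for the demo

# Path 1: a trace of the expanded arguments. The command text still says
# $GITHUB_TOKEN; the xtrace line on stderr contains the value.
traced_cmd = 'set -x; true "Authorization: Bearer $GITHUB_TOKEN"'
_, trace = run_like_agent(traced_cmd, FAKE)

# Path 2: the model simply asks for the value.
echoed, _ = run_like_agent('echo "$GITHUB_TOKEN"', FAKE)
```

In both cases the plaintext lands in text the model reads, even though the command it wrote never contained the value.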

.env files

Worse than env vars. Now the secret is persisted in the working directory. Any cat .env, rg -U, or accidentally staged git diff hands the full set to the model. The model will happily cat your .env because that is exactly the kind of thing you asked it to do when you wanted it to "figure out why startup is failing."

Copy-pasting into chat

The secret is now a verbatim string in the conversation history. It will never not be there. You can delete the message, but the token has already been sent to the model provider and written to whatever logs sit along the way. The only remediation is rotation.

"Just use Vault"

Vault solves server-side secret distribution beautifully and is the right tool for production fleets. It does not solve the local developer-machine problem where the requesting process (Claude) is the one you don't trust with the plaintext. When Claude runs vault kv get secret/github, Vault prints the value to stdout, which is exactly what Claude reads. Same failure mode as env vars, with extra TLS.

"Use Composio / Arcade / Nango"

These are excellent at a different problem: orchestrating OAuth flows for SaaS APIs on behalf of end-users. Their threat model assumes the AI agent is trusted with bearer tokens — the entire product is about giving it tokens more conveniently. They are answering "how does the agent get tokens?", not "how does the agent use tokens without seeing them?"

The property you actually want

I'll call it never-reveal. Stated precisely:

For every secret stored by the manager, no plaintext byte of that secret appears in any input or output channel of the AI model, at any point during or after the model's use of it.

The reason this isn't just a policy — the reason it has to be an architectural invariant — is that "no plaintext in model channels" has to hold even when:

  • The model is prompt-injected by a malicious webpage it fetched.
  • The tool the model is using prints its arguments on error.
  • The user asks "what did you just do?" and the model answers honestly.
  • The model decides, for perfectly good reasons, that it would like to grep for secret-looking strings.

A single code path that lets the model read a plaintext secret under any of these conditions is enough to invalidate the property. The architecture has to make reading impossible, not discouraged.

What the architecture actually requires

Four components, ordered from most to least load-bearing:

1. An MCP server whose API shape forbids revealing

Claude Code lets you register MCP servers that expose tools. The tools are what Claude can see and call. If one of those tools is secret_reveal(name: string) -> string, the invariant is already broken: that tool's whole purpose is to hand out plaintext. The correct surface is a set of tools that never return the raw value. You can list names, describe metadata, lock, unlock, rotate, audit, expiry-check, classify, and request from the user (which pops an elicitation prompt in the Claude client so the human types the value, not the model). What you cannot have is a get or reveal tool that returns bytes. The shape of the API is the invariant.
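Here is a minimal Python sketch of that surface, assuming the MCP plumbing (registration, JSON-RPC, the elicitation dialog) lives elsewhere. The class, the tool names, and the _store_from_elicitation entry point are hypothetical, not ClauLock's published API; the point is that nothing reachable through tools() can return the stored bytes.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class _Entry:
    # repr=False keeps the plaintext out of debug prints and log lines
    # that format this object.
    value: bytes = field(repr=False)
    expires: Optional[date] = None
    locked: bool = False


class SecretTools:
    """Hypothetical MCP tool surface: names and metadata in, names and
    metadata out. No method listed in tools() returns stored bytes."""

    def __init__(self) -> None:
        self._entries: dict[str, _Entry] = {}

    def _store_from_elicitation(self, name: str, value: bytes,
                                expires: Optional[date] = None) -> None:
        """Called by the human-facing elicitation handler; never a tool."""
        self._entries[name] = _Entry(value, expires)

    def secret_list(self) -> list[str]:
        return sorted(self._entries)

    def secret_describe(self, name: str) -> dict:
        entry = self._entries[name]
        return {
            "name": name,
            "locked": entry.locked,
            "expires": entry.expires.isoformat() if entry.expires else None,
        }

    def secret_lock(self, name: str) -> dict:
        self._entries[name].locked = True
        return self.secret_describe(name)

    def secret_unlock(self, name: str) -> dict:
        self._entries[name].locked = False
        return self.secret_describe(name)

    def secret_expiry_check(self, today: date) -> list[str]:
        """Names of secrets that expire on or before `today`."""
        return sorted(name for name, e in self._entries.items()
                      if e.expires is not None and e.expires <= today)

    def tools(self) -> dict:
        """Everything the MCP server registers. Adding a 'reveal' entry here
        is exactly the change that would break the invariant."""
        return {
            "secret_list": self.secret_list,
            "secret_describe": self.secret_describe,
            "secret_lock": self.secret_lock,
            "secret_unlock": self.secret_unlock,
            "secret_expiry_check": self.secret_expiry_check,
        }
```

Marking the value field repr=False closes a quieter path: a debug print of an entry won't spill the plaintext either.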

2. A hook that rewrites Bash commands before execution

Claude writes curl ... {{GITHUB_TOKEN}} in its Bash tool call. Claude Code's PreToolUse hook fires and, before the command runs, rewrites it to go through a resolver binary (clsec-exec -- curl ...). The model's transcript records the original command with the placeholder. The model never sees the substituted form.
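The rewrite itself is small. This sketch assumes the hook receives the Bash tool input as a dict with a command key and that placeholders are upper-case identifiers in double braces; the function names are illustrative, and how the rewritten input is handed back to Claude Code depends on your version's hook output format, which is not shown here.

```python
import re

# Assumed placeholder syntax: {{NAME}} with an upper-case identifier.
PLACEHOLDER = re.compile(r"\{\{[A-Z_][A-Z0-9_]*\}\}")
RESOLVER = "clsec-exec"


def rewrite_command(command: str) -> str:
    """Prefix commands that contain a {{NAME}} placeholder with the resolver.

    The shell still parses the command first, so the placeholder reaches the
    resolver as a literal argv element and the secret is never re-parsed by
    sh. Limitation of this sketch: only the first simple command of a
    pipeline or && chain is wrapped; later ones see the literal placeholder.
    """
    if not PLACEHOLDER.search(command):
        return command
    if command.lstrip().startswith(RESOLVER + " "):
        return command  # already wrapped; don't nest resolvers
    return f"{RESOLVER} -- {command}"


def rewrite_tool_input(tool_input: dict) -> dict:
    """Return a copy of a Bash tool input with its command rewritten."""
    return {**tool_input, "command": rewrite_command(tool_input["command"])}
```

Wrapping with sh -c '...' instead would cover compound commands, but the substituted secret would then be parsed by the shell a second time, where a $ or quote inside the value could be interpreted. Prefixing keeps substitution strictly after parsing.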

3. A resolver that substitutes only inside a child's argv

The resolver (clsec-exec in our case) talks to a local daemon over a Unix socket, gets the plaintext back, performs the string substitution in its own memory, forks, and calls execve with the substituted argv in the child. The plaintext exists in:

  • The daemon's mlocked memory (locked into RAM, so it is never swapped to disk).
  • The resolver's own memory, while it builds the child's argv.
  • The kernel's argv buffer during execve.
  • The child process's argv, for the lifetime of that one process.

Not in Claude's context. Not in the transcript. Not on disk. Not in shell history. Not in the audit log. Not anywhere the model can reach. The "never-reveal" invariant lives or dies in this binary: if it leaks a byte to stdout, the property is broken.
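The substitution step can be sketched as a pure function over argv. Here lookup stands in for the round trip to the daemon over the Unix socket, whose protocol the article does not describe; resolve_argv is a hypothetical name, and placeholders are assumed to look like {{NAME}} with an upper-case name.

```python
import re
from typing import Callable

# Assumed placeholder syntax, with the name captured as group 1.
PLACEHOLDER = re.compile(r"\{\{([A-Z_][A-Z0-9_]*)\}\}")


def resolve_argv(argv: list[str], lookup: Callable[[str], str]) -> list[str]:
    """Return a copy of argv with every {{NAME}} replaced by lookup(NAME).

    The result is meant to go straight to the child's execve and is never
    printed or logged. If lookup raises for an unknown name, resolution
    fails loudly instead of passing the literal placeholder through.
    """
    cache: dict[str, str] = {}

    def value_for(match: re.Match) -> str:
        name = match.group(1)
        if name not in cache:
            cache[name] = lookup(name)  # one daemon request per distinct name
        return cache[name]

    # A replacement function's return value is inserted literally, so
    # backslashes or group references inside a secret are not interpreted.
    return [PLACEHOLDER.sub(value_for, arg) for arg in argv]
```

In a real resolver this list would go to the child via fork and execve (or subprocess.Popen with stdout and stderr piped), with the parent staying alive to scrub the child's output.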

4. A scrubber on the child's output

Even after all of the above, a badly-behaved tool can still print the secret back. curl -v writes every outgoing request header, Authorization included, to stderr. Some SDKs log request bodies on exception. The resolver pipes the child's stdout and stderr through a scrubber that watches for any substring matching a known secret value and replaces it with the original placeholder before Claude sees the bytes. This is defense-in-depth: the architecture should never leak, and if a tool tries to make it leak, the scrubber catches it.
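A scrubber has to work on a stream, and a secret can be split across two reads from the pipe. One way to handle that, sketched below in Python, is to hold back the last (longest secret length - 1) bytes of each chunk until more data arrives. StreamScrubber is a hypothetical name, and it only catches verbatim byte matches: a base64- or URL-encoded copy of a secret would pass through.

```python
import re


class StreamScrubber:
    """Replace known secret values with their {{NAME}} placeholders in a
    byte stream that arrives in arbitrary chunks (hypothetical helper)."""

    def __init__(self, secrets: dict[str, bytes]) -> None:
        names_by_value = {value: name for name, value in secrets.items() if value}
        # Longest first: at any position the alternation prefers the longest
        # secret, so a secret that contains another is replaced whole.
        ordered = sorted(names_by_value, key=len, reverse=True)
        self._placeholder = {
            v: b"{{" + names_by_value[v].encode() + b"}}" for v in ordered
        }
        self._pattern = (re.compile(b"|".join(re.escape(v) for v in ordered))
                         if ordered else None)
        # A secret still arriving can only start in the last (longest - 1) bytes.
        self._hold = max((len(v) for v in ordered), default=1) - 1
        self._pending = b""

    def _scrub(self, data: bytes, cut: int) -> tuple[bytes, bytes]:
        """Replace matches that start before `cut`; return (output, held tail)."""
        out, pos = [], 0
        if self._pattern is not None:
            for m in self._pattern.finditer(data):
                if m.start() >= cut:
                    break  # may be the start of a secret that is not complete yet
                out.append(data[pos:m.start()])
                out.append(self._placeholder[m.group()])
                pos = m.end()
        boundary = max(pos, cut)
        out.append(data[pos:boundary])
        return b"".join(out), data[boundary:]

    def feed(self, chunk: bytes) -> bytes:
        """Scrub one chunk; return the bytes that are now safe to forward."""
        data = self._pending + chunk
        safe, self._pending = self._scrub(data, max(len(data) - self._hold, 0))
        return safe

    def flush(self) -> bytes:
        """Scrub and release everything still held back (call at EOF)."""
        safe, _ = self._scrub(self._pending, len(self._pending))
        self._pending = b""
        return safe
```

The resolver would call feed() on each read from the child's pipes and flush() when the child exits. Output can lag by up to one secret's length, which is invisible for line-oriented tools.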

A minimal working example

With ClauLock installed, adding and using a secret looks like this. The adding step uses the MCP server's secret_request_from_user elicitation, so the value goes from your clipboard directly into the daemon without passing through the model:

```shell
# In a Claude Code session, you ask:
#   "I need to call the GitHub API. Please set up GITHUB_TOKEN."
# Claude calls mcp__claulock__secret_request_from_user, which pops
# an elicitation dialog in your Claude client. You paste the value
# there. Claude never sees it.

# Then:
curl -H "Authorization: Bearer {{GITHUB_TOKEN}}" https://api.github.com/user
# Transcript:  curl -H "Authorization: Bearer {{GITHUB_TOKEN}}" …
# Actual call: curl -H "Authorization: Bearer ghp_live_…"       …
# Returned:    {"login":"octocat",…}  (scrubbed if the value echoes)
```

Verification you can run yourself

"Never-reveal" is only meaningful if you can check it. ClauLock ships a tests/leak_test.sh that exercises the whole pipeline and greps every output channel — stdout, stderr, the audit log, the MCP server's responses, the hook's log — for the actual plaintext. The test runs on every push. If a change ever breaks the invariant, CI fails.

You should be suspicious of any tool that makes the claim without giving you a scriptable way to verify it. Invariants that aren't tested aren't invariants. They're aspirations.
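The core of such a test is a search for planted plaintext in every captured channel. The sketch below is a Python approximation of the idea, not the contents of tests/leak_test.sh: it plants a random canary value, so any match can only mean a leak, and reports which channel leaked which secret. The function names and the channel names used with them are hypothetical.

```python
import secrets
from pathlib import Path


def make_canary() -> bytes:
    """A random, unmistakable secret value to plant for one test run."""
    return ("canary_" + secrets.token_hex(16)).encode()


def read_channels(paths: list[Path]) -> dict[str, bytes]:
    """Load the captured output files (stdout/stderr captures, logs) that exist."""
    return {str(p): p.read_bytes() for p in paths if p.exists()}


def find_leaks(planted: dict[str, bytes],
               channels: dict[str, bytes]) -> list[tuple[str, str]]:
    """Return (channel, secret name) for every channel containing a planted value."""
    return [(channel, name)
            for channel, data in channels.items()
            for name, value in planted.items()
            if value and value in data]
```

In CI, a non-empty result from find_leaks would fail the job, for example by exiting with a non-zero status.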

What this does and does not protect against

Honestly, because this matters:

  • Protects: the model receiving the plaintext via transcript, tool output, or its own commands.
  • Protects: accidental echoes from tools that print their arguments (scrubbed).
  • Protects: disk leaks — the vault is encrypted at rest with XChaCha20-Poly1305 and an Argon2id-derived key.
  • Does not protect: the child process itself misbehaving. If you tell Claude to run a tool whose job is to POST the token to example.com, the token reaches example.com. That's not a secrets-manager problem; that's an authorization problem, and it's handled by approving commands before they run.
  • Does not protect: root on your machine. If an attacker has root, they can read the daemon's memory. This is the threat model of every local secrets manager, including Keychain and ssh-agent.
  • Does not protect: secrets the model generates and hands to you — if you paste a password Claude invented into Slack, that's on you.

Where this goes

The AI-agent category is three years old. The threat models are still being drawn. Treating your OpenAI key the way you treat your database password worked in 2023, when the "agent" was a Python script you babysat. It does not work in 2026, when the agent is Cursor-style tooling, Claude Code, or Devin-like autonomy, emitting thousands of bytes of output per second.

If you're running AI agents against production resources (and at this point, many teams are), you need a secret-handling story that is architecturally incompatible with leakage. Not "we try hard." Not "we have a policy." Architecturally incompatible.

ClauLock is one such story. Install it in two minutes (curl -fsSL https://claulock.com/install.sh | sh), run bash tests/leak_test.sh, and decide for yourself whether the property holds. If you build your own, the architecture above is what it has to look like — the name on the binary is the part that's negotiable.

Questions or corrections are welcome by email. The invariant test is in the public repo; if you can break it, there's a bug bounty.