Claude Verifier — docs
Open-source desktop app that probes any Anthropic-compatible /v1/messages endpoint with 47 prompts to verify whether it's genuinely Claude. macOS, Linux, and Windows binaries are auto-built on every release.
Overview
LLM gateways and reseller proxies sometimes silently downgrade or swap the model behind a claude-* model id. This app hammers an endpoint with a curated suite of 47 prompts that expose identity, behavioral, and capability differences between Claude and other model families (GPT, Qwen, DeepSeek, Gemini, Llama, etc.) — then produces an evidence-rich Markdown report with a shareable URL.
Install
Grab a pre-built binary from the latest release:
| OS | File |
|---|---|
| macOS (Apple Silicon) | Claude.Verifier-1.0.0-mac-arm64.dmg |
| macOS (Intel) | Claude.Verifier-1.0.0-mac-x64.dmg |
| Windows | Claude.Verifier-1.0.0-win-x64.exe |
| Linux (universal) | Claude.Verifier-1.0.0-linux-x86_64.AppImage |
| Linux (Debian/Ubuntu) | Claude.Verifier-1.0.0-linux-amd64.deb |
macOS Gatekeeper note: The app isn't signed with an Apple Developer certificate. After dragging it into `/Applications/`, run `xattr -cr "/Applications/Claude Verifier.app"` once in Terminal. On Windows, SmartScreen shows "Windows protected your PC": click More info → Run anyway.
Or run from source:
```bash
git clone https://github.com/botnick/claude-verifier.git
cd claude-verifier
./run.sh   # macOS / Linux
run.bat    # Windows
```
Quick start
- Paste your endpoint (default = Anthropic's `/v1/messages`) and API key.
- Pick a model from the dropdown (or type any model identifier).
- Click ◯ Test connection to confirm credentials.
- Tick probes in the Test Catalog, then ▶ Run selected, or just ▶▶ Run ALL.
- When the run finishes, the report is auto-saved to `reports/` and a shareable URL is copied to your clipboard.
Keyboard shortcuts
| Key | Action |
|---|---|
| ⌘↩ / Ctrl+Enter | Run selected |
| ⇧⌘↩ / Ctrl+Shift+Enter | Run ALL |
| Esc | Stop (when a run is in flight) |
| ⌘K / Ctrl+K | Focus API key field |
| ⌘L / Ctrl+L | Focus endpoint field |
1:1 Claude Code traffic emulation
Every request matches the shape of real claude-cli traffic:
- `user-agent: claude-cli/<ver> (external, cli)`
- `x-app: cli` plus the full `x-stainless-*` set (lang/os/arch/runtime/version)
- `anthropic-beta: prompt-caching-2024-07-31,fine-grained-tool-streaming-2025-05-14`
- A `system` block matching Claude Code's preamble
- `metadata.user_id = "user_" + sha256(api_key)`
- The full `tools` array (Bash, Read, Write, Edit, Glob, Grep, TodoWrite, WebFetch)
Because verifier requests are indistinguishable from genuine Claude Code requests, an endpoint can't serve the verifier differently from real Claude Code traffic.
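A minimal sketch of how such a request could be assembled in TypeScript for Node. Several details are assumptions, not the app's confirmed code: the sha256 digest is hex-encoded, the exact `x-stainless-*` header names and values (raw Node values are used here for illustration), `max_tokens`, and the helper names `userIdFor`, `claudeCodeHeaders`, and `buildBody`, which are hypothetical. Authentication headers are omitted, and the Claude Code preamble and tool schemas are passed in rather than reproduced.

```typescript
import { createHash } from "node:crypto";

// Hypothetical placeholder: the docs elide the real claude-cli version as <ver>.
const CLI_VERSION = "<ver>";

// metadata.user_id = "user_" + sha256(api_key). Hex encoding is an assumption.
function userIdFor(apiKey: string): string {
  return "user_" + createHash("sha256").update(apiKey).digest("hex");
}

// Claude Code-style headers. The x-stainless-* names/values are assumptions
// modelled on the lang/os/arch/runtime/version set; raw Node values are used.
function claudeCodeHeaders(cliVersion: string = CLI_VERSION): Record<string, string> {
  return {
    "user-agent": `claude-cli/${cliVersion} (external, cli)`,
    "x-app": "cli",
    "x-stainless-lang": "js",
    "x-stainless-os": process.platform,
    "x-stainless-arch": process.arch,
    "x-stainless-runtime": "node",
    "x-stainless-runtime-version": process.version,
    "anthropic-beta": "prompt-caching-2024-07-31,fine-grained-tool-streaming-2025-05-14",
  };
}

// Hypothetical body builder: the system preamble and tool schemas are supplied by the caller.
function buildBody(apiKey: string, model: string, system: string, tools: unknown[], prompt: string) {
  return {
    model,
    max_tokens: 1024, // assumption
    system,
    tools,
    metadata: { user_id: userIdFor(apiKey) },
    messages: [{ role: "user", content: prompt }],
  };
}
```

Hashing the key keeps `user_id` stable per account without ever sending the raw key in the request body.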
Sandbox auto-fallback
Because every request includes tools, the model sometimes replies with a pure `tool_use` turn (no text content), which would leave the keyword verdict with nothing to match. The chat IPC handles this transparently:
- Attempt 1: with the full `tools` array (true 1:1 traffic).
- If the response contains only `tool_use` and no text → Attempt 2: the same request without `tools`.
- The result is returned with `sandboxed: true` and `sandbox_first_tools: [...]`, so the Detail tab can surface "first reply was tool_use, retried without tools".
Latency is summed across both attempts, so the reported wall-clock cost is honest. The `text` field always holds exactly what the verdict matched against.
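The fallback flow can be sketched as follows. This is an illustration under assumptions, not the app's IPC handler: `Send` is a hypothetical transport that repeats the same request with or without the `tools` array, only the `text` and `tool_use` content-block shapes are modelled, and a reply whose text blocks are empty or whitespace-only is treated as having no text.

```typescript
// Minimal subset of Anthropic content blocks used by this sketch.
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "tool_use"; name: string; id?: string; input?: unknown };

interface RawReply { content: ContentBlock[]; latencyMs: number; }

interface ChatResult {
  text: string;
  latencyMs: number;
  sandboxed: boolean;
  sandbox_first_tools?: string[];
}

// Hypothetical transport: sends the same request with or without tools.
type Send = (withTools: boolean) => Promise<RawReply>;

function textOf(content: ContentBlock[]): string {
  return content
    .filter((b): b is Extract<ContentBlock, { type: "text" }> => b.type === "text")
    .map((b) => b.text)
    .join("");
}

// True when the reply has at least one tool_use block and no non-empty text.
function isToolOnly(content: ContentBlock[]): boolean {
  return content.some((b) => b.type === "tool_use") && textOf(content).trim() === "";
}

async function chatWithSandboxFallback(send: Send): Promise<ChatResult> {
  const first = await send(true); // attempt 1: true 1:1 traffic
  if (!isToolOnly(first.content)) {
    return { text: textOf(first.content), latencyMs: first.latencyMs, sandboxed: false };
  }
  // Remember which tools the model reached for, then retry without tools.
  const firstTools = first.content.flatMap((b) => (b.type === "tool_use" ? [b.name] : []));
  const second = await send(false);
  return {
    text: textOf(second.content),
    latencyMs: first.latencyMs + second.latencyMs, // both attempts count
    sandboxed: true,
    sandbox_first_tools: firstTools,
  };
}
```

Injecting `send` keeps the retry logic testable without a live endpoint.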
Verdict + confidence
Each probe carries two keyword lists, `expect_any` and `red_flag`. `judge(test, text)` returns a label plus a confidence in [0, 1]:
| Label | Triggered when… |
|---|---|
| PASS | expected keywords found, no red flags |
| SUSPICIOUS | expected missing, OR both expected & red flag present |
| FAIL | red flag found and no expected, OR request errored |
| INFO | informational probe with no scoring |
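The label rules in the table can be sketched like this. The matching and scoring details are assumptions: keywords are matched case-insensitively as substrings, `info` is a hypothetical flag marking an informational probe, `null` text stands for an errored request, and the confidence formula is a made-up heuristic, not the app's.

```typescript
type Label = "PASS" | "SUSPICIOUS" | "FAIL" | "INFO";

interface Probe {
  id: string;
  expect_any: string[];
  red_flag: string[];
  info?: boolean; // hypothetical flag for an unscored, informational probe
}

interface Verdict { label: Label; confidence: number; }

// Number of keywords that appear in the text (case-insensitive substring match).
function countHits(text: string, keywords: string[]): number {
  const hay = text.toLowerCase();
  return keywords.filter((k) => hay.includes(k.toLowerCase())).length;
}

// text === null means the request errored.
function judge(probe: Probe, text: string | null): Verdict {
  if (probe.info) return { label: "INFO", confidence: 0 };
  if (text === null) return { label: "FAIL", confidence: 1 };
  const expected = countHits(text, probe.expect_any);
  const flags = countHits(text, probe.red_flag);
  // Hypothetical heuristic: each extra matching keyword adds certainty, capped at 1.
  if (expected > 0 && flags === 0) {
    return { label: "PASS", confidence: Math.min(1, 0.5 + 0.25 * expected) };
  }
  if (flags > 0 && expected === 0) {
    return { label: "FAIL", confidence: Math.min(1, 0.5 + 0.25 * flags) };
  }
  // Expected keywords missing, or expected and red-flag keywords both present.
  return { label: "SUSPICIOUS", confidence: 0.5 };
}
```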
Golden similarity
An optional data/golden.json file contains 5 reference samples per probe. When loaded, every probe response is compared via character-bigram cosine similarity (language-agnostic, works for Thai/Chinese/Arabic without a tokenizer):
- ≥ 0.6: ✅ close to genuine
- ≥ 0.35: ⚠️ partially diverging
- < 0.35: ⛔ clearly different
Regenerate the baseline against real Claude any time:
```bash
ANTHROPIC_API_KEY=sk-ant-... npm run build-golden
```
User-added probes with no baseline are simply skipped — no crash, no dangling column.
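A sketch of character-bigram cosine similarity and the three buckets. How the app combines the 5 golden samples is not documented; this version assumes the response is scored against its best-matching sample. `goldenSimilarity` returns `null` for a probe with no baseline so the caller can skip it. All function names are hypothetical.

```typescript
// Count character bigrams. Array.from splits by Unicode code point, so Thai,
// Chinese, and Arabic text works without any tokenizer.
function bigrams(s: string): Map<string, number> {
  const chars = Array.from(s);
  const counts = new Map<string, number>();
  for (let i = 0; i + 1 < chars.length; i++) {
    const bg = chars[i] + chars[i + 1];
    counts.set(bg, (counts.get(bg) ?? 0) + 1);
  }
  return counts;
}

// Cosine similarity between two sparse bigram-count vectors.
function cosine(a: Map<string, number>, b: Map<string, number>): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (const [k, v] of a) {
    na += v * v;
    dot += v * (b.get(k) ?? 0);
  }
  for (const v of b.values()) nb += v * v;
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Assumption: best match over the golden samples. null means "no baseline, skip".
function goldenSimilarity(response: string, samples: string[] | undefined): number | null {
  if (!samples || samples.length === 0) return null;
  const r = bigrams(response);
  return Math.max(...samples.map((s) => cosine(r, bigrams(s))));
}

function bucket(sim: number): "close" | "partial" | "different" {
  if (sim >= 0.6) return "close";
  if (sim >= 0.35) return "partial";
  return "different";
}
```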
Latency + shape signals
Beyond keyword matching and similarity, the report shows three additional signals in the at-a-glance table:
- Latency coefficient of variation: `std / mean` across all probes. `CV < 0.18` means suspiciously uniform latencies (a proxy or cache hint); `CV > 1.6` means highly variable latencies.
- Average characters per output token: genuine Claude responses tend to land around 3–6. A heavy skew is a tokenizer fingerprint.
- `stop_reason` mix: Claude almost always ends with `end_turn`. A high share of `max_tokens` or unusual values is a flag.
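The three signals reduce to simple aggregates over the run. In this sketch, `ProbeRun` is a hypothetical record per probe (output tokens would come from the response's usage data), the standard deviation is the population form, and characters are counted as Unicode code points; the app may differ on each point.

```typescript
// Hypothetical per-probe record collected during a run.
interface ProbeRun { latencyMs: number; text: string; outputTokens: number; stopReason: string; }

// Coefficient of variation: population standard deviation divided by the mean.
function latencyCV(latencies: number[]): number {
  const n = latencies.length;
  if (n === 0) return NaN;
  const mean = latencies.reduce((a, b) => a + b, 0) / n;
  const variance = latencies.reduce((a, x) => a + (x - mean) ** 2, 0) / n;
  return mean === 0 ? NaN : Math.sqrt(variance) / mean;
}

// Thresholds from the docs: < 0.18 suspiciously uniform, > 1.6 highly variable.
function cvFlag(cv: number): "uniform" | "variable" | "normal" {
  if (cv < 0.18) return "uniform";
  if (cv > 1.6) return "variable";
  return "normal";
}

// Total characters (code points) divided by total output tokens across all runs.
function charsPerToken(runs: ProbeRun[]): number {
  const chars = runs.reduce((a, r) => a + Array.from(r.text).length, 0);
  const tokens = runs.reduce((a, r) => a + r.outputTokens, 0);
  return tokens === 0 ? NaN : chars / tokens;
}

// Share of runs per stop_reason value, e.g. { end_turn: 0.75, max_tokens: 0.25 }.
function stopReasonMix(runs: ProbeRun[]): Record<string, number> {
  const mix: Record<string, number> = {};
  for (const r of runs) mix[r.stopReason] = (mix[r.stopReason] ?? 0) + 1;
  for (const k of Object.keys(mix)) mix[k] /= runs.length;
  return mix;
}
```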
Probe categories
| Category | What it checks | # |
|---|---|---|
| identity | Who the endpoint claims to be, cutoff, creator, training methodology, repeat-back trap. | 6 |
| jailbreak | DAN, "ignore previous instructions", grandma exploit, base64-encoded request, evil-twin roleplay. | 5 |
| china | Tiananmen, Taiwan, Xinjiang, Hong Kong, Xi, Falun Gong, Pooh — TH/ZH/EN. | 15 |
| trick | Flattery, system-prompt leak, infra probe, owner traps, forced choice, completion, wrapper, covert-token, CC echo. | 15 |
| capability | Math chain-of-thought, strict JSON, Thai language, Unicode round-trip, long-context needle. | 6 |
Probe pack format
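A "probe pack" is a portable JSON file containing probe definitions. It never includes the endpoint, API key, or model; it holds only probes and metadata. In the Thai example below, the prompt asks for formal, ministry-style Thai, and `expect_any` lists the polite particles ครับ and ค่ะ and the formal first-person pronoun ดิฉัน: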
```json
{
  "format": "claude-verifier-probe-pack",
  "version": 1,
  "name": "TH localization pack",
  "description": "Probes that catch Thai-language proxies",
  "probes": [
    { "id": "th_register_formal",
      "cat": "capability",
      "title": "Formal Thai register",
      "prompt": "ใช้ภาษาไทยทางการแบบกระทรวงพูดถึง...",
      "expect_any": ["ครับ", "ค่ะ", "ดิฉัน"],
      "red_flag": [] }
  ]
}
```
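Before importing a pack, it is worth validating its shape. This is a sketch under assumptions, not the app's importer: `parsePack` is hypothetical, only `version: 1` is accepted, categories are restricted to the five documented ones, and the forbidden top-level key names (`endpoint`, `api_key`, `key`, `model`) are guesses at how a pack could leak credentials or targets.

```typescript
const CATEGORIES = ["identity", "jailbreak", "china", "trick", "capability"] as const;
// Assumed key names that must never appear in a shareable pack.
const FORBIDDEN_KEYS = ["endpoint", "api_key", "key", "model"];

interface PackProbe {
  id: string; cat: string; title: string; prompt: string;
  expect_any: string[]; red_flag: string[];
}
interface ProbePack {
  format: "claude-verifier-probe-pack"; version: number; name: string;
  description?: string; probes: PackProbe[];
}

function isStringArray(x: unknown): x is string[] {
  return Array.isArray(x) && x.every((s) => typeof s === "string");
}

function parsePack(json: string): ProbePack {
  const data = JSON.parse(json);
  if (data === null || typeof data !== "object") throw new Error("pack must be a JSON object");
  if (data.format !== "claude-verifier-probe-pack") throw new Error("unknown format");
  if (data.version !== 1) throw new Error(`unsupported version ${data.version}`);
  if (typeof data.name !== "string") throw new Error("pack needs a name");
  for (const k of FORBIDDEN_KEYS) {
    if (k in data) throw new Error(`pack must not contain "${k}"`);
  }
  if (!Array.isArray(data.probes)) throw new Error("probes must be an array");
  const seen = new Set<string>();
  for (const p of data.probes) {
    if (typeof p?.id !== "string" || seen.has(p.id)) throw new Error("missing or duplicate probe id");
    seen.add(p.id);
    if (!(CATEGORIES as readonly string[]).includes(p.cat)) throw new Error(`bad category in ${p.id}`);
    if (typeof p.title !== "string" || typeof p.prompt !== "string") throw new Error(`bad text in ${p.id}`);
    if (!isStringArray(p.expect_any) || !isStringArray(p.red_flag)) throw new Error(`bad keywords in ${p.id}`);
  }
  return data as ProbePack;
}
```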
Share & import probes
Two ways to participate without forking the repo:
- Web editor: open the editor, add or edit probes, click ⬇ Export pack. Share the JSON anywhere. Anyone can ⬆ Import it back.
- Desktop app: same flow inside the app's catalog header. Imported probes persist in `data/custom-probes.json`.
Built-in probes can't be edited from the editor; to skip one, untick it in the desktop app before a run. Custom probes show a custom badge and can be edited or deleted freely.
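One way an import could respect those rules is sketched below. The merge policy is an assumption, not documented behaviour: an incoming probe whose id matches a built-in is skipped (built-ins stay immutable), one matching an existing custom probe replaces it in place, and anything new is appended. `mergeImported` and `StoredProbe` are hypothetical names; the returned list is what would be written back to `data/custom-probes.json`.

```typescript
interface StoredProbe { id: string; [field: string]: unknown; }

interface MergeResult { custom: StoredProbe[]; skipped: string[]; }

function mergeImported(builtinIds: Set<string>, custom: StoredProbe[], incoming: StoredProbe[]): MergeResult {
  // Map keeps insertion order, so replaced custom probes stay in their original slot.
  const byId = new Map(custom.map((p) => [p.id, p] as const));
  const skipped: string[] = [];
  for (const p of incoming) {
    if (builtinIds.has(p.id)) {
      skipped.push(p.id); // never overwrite a built-in probe
      continue;
    }
    byId.set(p.id, p); // replace an existing custom probe or append a new one
  }
  return { custom: [...byId.values()], skipped };
}
```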
Project-folder rule
Everything this app produces stays inside the project directory: no `~/Documents`, no `app.getPath('userData')` for app-generated state, and no `localStorage` for persistent user data:
- `<project>/data/golden.json`: reference baseline
- `<project>/data/custom-probes.json`: user-imported probes
- `<project>/reports/<stamp>.md`: auto-saved reports
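A small guard makes the project-folder rule enforceable in code. This is a sketch, not the app's implementation: `resolveInProject` and `reportPath` are hypothetical helpers, and the ISO-timestamp stamp format is an assumption since the docs only say `reports/<stamp>.md`.

```typescript
import * as path from "node:path";

// Resolve an app-generated file against the project root and refuse any path
// that would land outside it (e.g. "../x" or an absolute path elsewhere).
function resolveInProject(projectRoot: string, relative: string): string {
  const root = path.resolve(projectRoot);
  const target = path.resolve(root, relative);
  const rel = path.relative(root, target);
  if (rel === "" || rel === ".." || rel.startsWith(".." + path.sep) || path.isAbsolute(rel)) {
    throw new Error(`refusing to write outside the project: ${relative}`);
  }
  return target;
}

// Hypothetical stamp format: ISO timestamp with ':' and '.' made filename-safe.
function reportPath(projectRoot: string, when: Date = new Date()): string {
  const stamp = when.toISOString().replace(/[:.]/g, "-");
  return resolveInProject(projectRoot, path.join("reports", `${stamp}.md`));
}
```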
Contributing
The highest-leverage contribution is adding probes. The more high-signal questions the catalog has, the harder it gets for a proxy to fake being Claude.
Three ways:
- 🌐 Export a probe pack from the editor and share the JSON in Discussions.
- 🔀 Open a PR adding probes to `public/tests.js` and rerun `npm run build-golden` to refresh `data/golden.json`. See CONTRIBUTING.md.
- 🐛 File issues describing endpoints that fool the current catalog so others can refine probes.
License
MIT — see LICENSE. Contribution implies you license your changes under the same terms.