CC-Canary: detecting Claude Code regressions, fully offline

Introduction

CC-Canary is drift detection for Claude Code, packaged as two installable Agent Skills. It reads the session JSONL files that Claude Code already writes to ~/.claude/projects/, detects whether the model has drifted on your own work, and produces a forensic report you can share.

No network, no account, no telemetry, no background daemon. It runs on data that is already on disk.

Current status: 0.x / pre-alpha. The output format and the metric set may change.

Key features

Two skills

Skill	Invocation	Output
`cc-canary`	`/cc-canary [window]`	Markdown writeup (`./cc-canary-<date>.md`), ready to paste into a GitHub issue or gist
`cc-canary-html`	`/cc-canary-html [window]`	Dark-theme HTML dashboard (`./cc-canary-<date>.html`), opened automatically in the browser

The default window is 60d. Accepted values: 7d / 14d / 30d / 60d / 90d / 180d.
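A minimal sketch of how a window argument could be validated and turned into a cutoff time, assuming the accepted set above is exhaustive and that the window counts back from the current UTC time. The names `parse_window` and `window_start` are hypothetical, not CC-Canary's actual code.

```python
import re
from datetime import datetime, timedelta, timezone

# Accepted values come from the text above; the helper names are hypothetical.
ACCEPTED_DAYS = (7, 14, 30, 60, 90, 180)
DEFAULT_WINDOW = "60d"


def parse_window(arg=None):
    """Convert a window argument such as '30d' into a timedelta."""
    text = (arg or DEFAULT_WINDOW).strip().lower()
    match = re.fullmatch(r"(\d+)d", text)
    if match is None or int(match.group(1)) not in ACCEPTED_DAYS:
        allowed = ", ".join(f"{d}d" for d in ACCEPTED_DAYS)
        raise ValueError(f"unsupported window {arg!r}; expected one of {allowed}")
    return timedelta(days=int(match.group(1)))


def window_start(arg=None, now=None):
    """Earliest UTC timestamp that still falls inside the window."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - parse_window(arg)
```

Calling `window_start("30d")` gives a timezone-aware cutoff; anything older than it would be skipped during the scan. Rejecting unknown values such as `45d` up front keeps the report periods comparable.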

What each report contains

Verdict: HOLDING / SUSPECTED REGRESSION / CONFIRMED REGRESSION / INCONCLUSIVE
Headline metrics table: pre vs. post, with band verdicts
Weekly trend bars: cost (USD, verified against ccusage to the cent), read:edit ratio, reasoning loops, tokens/turn
Cross-version comparison: same user, different model versions, controlling for task mix
Auto-detected inflection date: a break in the composite health score
Findings classified as model-side / user-side / ambiguous
Appendices: hour-of-day thinking depth, word-frequency shift, per-turn behavior rates, etc.
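The weekly trend bars can be pictured with a small text-only sketch: daily values (for example cost in USD) are summed per ISO week, and each week gets a bar scaled to the largest week. ISO-week bucketing, the bar width, and the helper names `weekly_totals` and `render_bars` are assumptions; the real report renders its own charts in markdown or HTML.

```python
from collections import defaultdict
from datetime import date


def weekly_totals(daily):
    """Sum a {date: value} mapping into {(iso_year, iso_week): total}, sorted by week."""
    totals = defaultdict(float)
    for day, value in daily.items():
        iso_year, iso_week, _ = day.isocalendar()
        totals[(iso_year, iso_week)] += value
    return dict(sorted(totals.items()))


def render_bars(weekly, width=30, char="█"):
    """Return one text line per week with a bar proportional to that week's total."""
    peak = max(weekly.values(), default=0.0) or 1.0  # avoid dividing by zero
    lines = []
    for (iso_year, iso_week), value in weekly.items():
        bar = char * round(value / peak * width)
        lines.append(f"{iso_year}-W{iso_week:02d} {bar} {value:.2f}")
    return lines


if __name__ == "__main__":
    daily_cost = {date(2025, 1, 6): 1.0, date(2025, 1, 7): 2.0, date(2025, 1, 13): 6.0}
    print("\n".join(render_bars(weekly_totals(daily_cost), width=10)))
```

Scaling to the peak week keeps bars comparable within one report, at the cost of not being comparable across reports.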

Notable metrics

Read:Edit ratio: file reads per edit. A proxy for how much the model investigates before mutating files.
Write share of mutations: Write / (Edit + Write). A high share means the model rewrites whole files instead of making surgical edits.
Reasoning loops / 1K tool calls: phrases such as “let me try again”, “oh wait”, “actually”.
Frustration rate: the rate of frustration words in your own prompts.
Thinking redaction rate: the fraction of thinking blocks that are redacted vs. visible.
Mean thinking length: a reasoning-depth proxy (measured via cryptographic signature length, r=0.97 with content length).
API turns per user turn: the number of API calls the model makes per user message.
Tokens per user turn: total token volume per user message.
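Three of these metrics reduce to simple ratios over tool-call names and assistant text. The sketch below assumes tool calls are identified by Claude Code tool names such as `Read`, `Edit` and `Write`, and uses only the three loop phrases quoted above; the real phrase list and counting rules are likely broader. `behavior_metrics` is a hypothetical helper.

```python
import re
from collections import Counter

# Only the three phrases quoted above; the real list is likely longer.
LOOP_PHRASES = ("let me try again", "oh wait", "actually")
LOOP_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(p) for p in LOOP_PHRASES) + r")\b", re.IGNORECASE
)


def ratio(numerator, denominator):
    """Return numerator / denominator, or None when the denominator is zero."""
    return numerator / denominator if denominator else None


def behavior_metrics(tool_names, assistant_texts):
    """Compute read:edit ratio, write share and loops per 1K tool calls.

    Assumes tool calls are named like Claude Code tools ("Read", "Edit", "Write").
    """
    tools = Counter(tool_names)
    reads, edits, writes = tools["Read"], tools["Edit"], tools["Write"]
    total_calls = sum(tools.values())
    # Word boundaries stop "actually" from matching inside "factually".
    loop_hits = sum(len(LOOP_PATTERN.findall(text)) for text in assistant_texts)
    return {
        "read_edit_ratio": ratio(reads, edits),
        "write_share": ratio(writes, edits + writes),
        "loops_per_1k_tool_calls": ratio(1000 * loop_hits, total_calls),
    }
```

Returning None instead of infinity or zero when a denominator is empty keeps "no edits at all" distinguishable from a real ratio in the report.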

Usage

Install

npx skills add delta-hq/cc-canary

To install a single skill:

npx skills add delta-hq/cc-canary --skill cc-canary
npx skills add delta-hq/cc-canary --skill cc-canary-html

Run

In any Claude Code session:

/cc-canary 60d
/cc-canary-html 30d

Requirements

python3 ≥ 3.8 on PATH
macOS / Linux / WSL for the auto-open step (if open / xdg-open / start fails, it falls back to printing the file path)
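The auto-open step with its print-the-path fallback might look like the sketch below. Mapping macOS to `open`, Windows to the `cmd /c start` builtin and every other system (including WSL) to `xdg-open` is an assumption based on the commands named above; `opener_command` and `open_report` are hypothetical names. The `which` and `run` parameters exist only so the logic can be exercised without launching a browser.

```python
import platform
import shutil
import subprocess


def opener_command(system=None, which=shutil.which):
    """Return the command prefix that opens a file in its default app, or None."""
    system = system or platform.system()
    if system == "Darwin":
        command = ["open"]
    elif system == "Windows":
        # `start` is a cmd.exe builtin; the empty string is the window title.
        command = ["cmd", "/c", "start", ""]
    else:  # Linux, including WSL
        command = ["xdg-open"]
    return command if which(command[0]) else None


def open_report(path, system=None, which=shutil.which, run=subprocess.run):
    """Try to open the report; fall back to printing its path."""
    command = opener_command(system, which)
    if command is not None:
        try:
            run(command + [str(path)], check=True)
            return True
        except (OSError, subprocess.CalledProcessError):
            pass  # opener missing or failed: fall through to printing the path
    print(f"Report written to {path}")
    return False
```

Printing the path on any failure means a headless or minimal WSL setup still ends with a usable report location.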

How it works

Scan: a Python script (stdlib only, no pip, no Node) walks ~/.claude/projects/**/*.jsonl, filters by window, and excludes subagent sessions by default.
Dedupe: assistant messages are deduplicated on (message.id, requestId), the same scheme ccusage uses, because Claude Code writes the same message into several JSONL files when a session is resumed or branched.
Aggregate: per-session metrics: tool mix, read:edit ratio, reasoning-loop phrases, self-admitted errors, premature stops, interrupts, token usage, cost (at current Claude 4.x rates), and hour-of-day thinking depth.
Detect inflection: a composite health score is computed per day, and the break date is the argmax of |before − after| over candidate dates, subject to a 0.75σ floor. If no break clears that floor, it falls back to splitting at the median timestamp.
Pre-render report: the script writes a skeleton markdown/HTML file with every table and bar chart already filled in. Only ~20 short narrative slots (each marked with a placeholder) are left for Claude to fill: the verdict line, the summary, per-finding reasoning, and the root cause.
Fill & save: Claude reads the skeleton, writes the narrative, and saves the final file.
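The scan and dedupe steps can be sketched with the standard library alone. Only the (message.id, requestId) key comes from the description above; the entry fields `type`, `timestamp` and `isSidechain` (used here to spot subagent traffic), and the helper names, are assumptions about the JSONL layout rather than CC-Canary's actual code.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def iter_entries(root="~/.claude/projects"):
    """Yield every JSON object from every *.jsonl file under root, recursively."""
    for path in sorted(Path(root).expanduser().glob("**/*.jsonl")):
        with path.open(encoding="utf-8") as handle:
            for line in handle:
                line = line.strip()
                if not line:
                    continue
                try:
                    yield json.loads(line)
                except json.JSONDecodeError:
                    pass  # skip a truncated or partially written line


def parse_timestamp(value):
    """Parse ISO-8601 (Python < 3.11 rejects a trailing 'Z'); treat naive times as UTC."""
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    return parsed if parsed.tzinfo else parsed.replace(tzinfo=timezone.utc)


def select_assistant_messages(entries, since, include_sidechains=False):
    """Keep assistant entries newer than `since`, deduplicated on (message.id, requestId)."""
    seen = set()
    selected = []
    for entry in entries:
        if entry.get("type") != "assistant":
            continue
        # ASSUMPTION: subagent traffic is flagged with "isSidechain": true.
        if entry.get("isSidechain") and not include_sidechains:
            continue
        stamp = entry.get("timestamp")
        if not stamp or parse_timestamp(stamp) < since:
            continue
        key = ((entry.get("message") or {}).get("id"), entry.get("requestId"))
        if None not in key:  # only dedupe when both halves of the key exist
            if key in seen:
                continue
            seen.add(key)
        selected.append(entry)
    return selected
```

Typical use would be `select_assistant_messages(iter_entries(), cutoff)` with a timezone-aware `cutoff`. Keying on both IDs means a message copied into a resumed or branched session file is counted once, so token and cost totals are not inflated.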

Total runtime: ~2.5s for the script plus 10–20s for Claude to fill in the narrative.
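The inflection-detection step described above can be sketched as a mean-shift search over daily scores. How the composite health score is built is not specified, so the sketch takes per-day scores as given. Reading the 0.75σ floor as "the best gap must be at least 0.75 times the standard deviation of the daily scores", requiring three days on each side, and falling back to the median day (rather than the median timestamp) are all assumptions; `detect_inflection` is a hypothetical name.

```python
from datetime import date, timedelta
from statistics import mean, pstdev


def detect_inflection(daily_scores, floor_sigmas=0.75, min_side=3):
    """Find the date where the daily health score shifts the most.

    daily_scores: list of (date, score) pairs sorted by date.
    Returns (first_date_after_break, method).
    """
    if not daily_scores:
        raise ValueError("no daily scores")
    scores = [score for _, score in daily_scores]
    n = len(scores)
    sigma = pstdev(scores)  # spread of the whole series
    best_index, best_gap = None, 0.0
    # Candidate split points leave at least `min_side` days on each side.
    for i in range(min_side, n - min_side + 1):
        gap = abs(mean(scores[:i]) - mean(scores[i:]))
        if gap > best_gap:
            best_index, best_gap = i, gap
    if best_index is not None and sigma > 0 and best_gap >= floor_sigmas * sigma:
        return daily_scores[best_index][0], "health-score break"
    # No break cleared the floor: split at the median day instead.
    return daily_scores[n // 2][0], "median split"


if __name__ == "__main__":
    start = date(2025, 6, 1)
    series = [(start + timedelta(days=i), 0.9 if i < 20 else 0.6) for i in range(40)]
    print(detect_inflection(series))
```

The floor keeps ordinary day-to-day noise from being reported as a regression date; on a flat series the function reports a plain median split instead.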

Privacy

Fully local. Zero network calls.
The script only reads the session JSONL files under ~/.claude/projects/. Nothing else.
User-prompt content is truncated to ≤180 characters before it goes into the skeleton, and /Users/… paths, email addresses, and hex-like tokens are redacted.
Output files are written to the directory where you invoke the skill. Nothing is uploaded anywhere.
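A sketch of the prompt scrubbing described above. The three regular expressions (in particular the 16-character minimum for a "hex-like" token) and the name `sanitize_prompt` are illustrative assumptions, not the tool's actual rules. Redaction runs before truncation so that a token cut off at the 180-character boundary cannot slip past the patterns.

```python
import re

MAX_PROMPT_CHARS = 180

# Illustrative patterns; the tool's actual rules may differ.
PATH_RE = re.compile(r"/Users/\S+")
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
HEX_RE = re.compile(r"\b[0-9a-fA-F]{16,}\b")  # 16+ hex chars, e.g. keys and hashes


def sanitize_prompt(text, limit=MAX_PROMPT_CHARS):
    """Redact paths, emails and hex-like tokens, then cap the length."""
    text = PATH_RE.sub("<path>", text)
    text = EMAIL_RE.sub("<email>", text)
    text = HEX_RE.sub("<hex>", text)
    text = " ".join(text.split())  # collapse to one line so it fits a table cell
    if len(text) > limit:
        text = text[: limit - 1] + "…"  # keep the result at exactly `limit` chars
    return text
```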

Why developers should care

The feeling that “Claude Code is getting worse” is usually subjective and hard to defend. This tool gives you concrete numbers from your own data, rigorous enough to put in a bug report or forum post. It suits anyone who uses Claude Code daily and wants to know whether the drift is real or just an off day.
