@hanfour.huang/caliber 0.1.2

Files changed (55)
  1. package/README.md +667 -0
  2. package/dist/analyzers/section.d.ts +9 -0
  3. package/dist/analyzers/section.d.ts.map +1 -0
  4. package/dist/analyzers/section.js +503 -0
  5. package/dist/analyzers/section.js.map +1 -0
  6. package/dist/analyzers/usage.d.ts +3 -0
  7. package/dist/analyzers/usage.d.ts.map +1 -0
  8. package/dist/analyzers/usage.js +141 -0
  9. package/dist/analyzers/usage.js.map +1 -0
  10. package/dist/cli.d.ts +3 -0
  11. package/dist/cli.d.ts.map +1 -0
  12. package/dist/cli.js +295 -0
  13. package/dist/cli.js.map +1 -0
  14. package/dist/config.d.ts +19 -0
  15. package/dist/config.d.ts.map +1 -0
  16. package/dist/config.js +156 -0
  17. package/dist/config.js.map +1 -0
  18. package/dist/data-quality.d.ts +3 -0
  19. package/dist/data-quality.d.ts.map +1 -0
  20. package/dist/data-quality.js +54 -0
  21. package/dist/data-quality.js.map +1 -0
  22. package/dist/extractors/claude-code.d.ts +6 -0
  23. package/dist/extractors/claude-code.d.ts.map +1 -0
  24. package/dist/extractors/claude-code.js +216 -0
  25. package/dist/extractors/claude-code.js.map +1 -0
  26. package/dist/extractors/codex.d.ts +5 -0
  27. package/dist/extractors/codex.d.ts.map +1 -0
  28. package/dist/extractors/codex.js +184 -0
  29. package/dist/extractors/codex.js.map +1 -0
  30. package/dist/i18n.d.ts +53 -0
  31. package/dist/i18n.d.ts.map +1 -0
  32. package/dist/i18n.js +163 -0
  33. package/dist/i18n.js.map +1 -0
  34. package/dist/period.d.ts +5 -0
  35. package/dist/period.d.ts.map +1 -0
  36. package/dist/period.js +25 -0
  37. package/dist/period.js.map +1 -0
  38. package/dist/reporters/report.d.ts +7 -0
  39. package/dist/reporters/report.d.ts.map +1 -0
  40. package/dist/reporters/report.js +440 -0
  41. package/dist/reporters/report.js.map +1 -0
  42. package/dist/standard.d.ts +5 -0
  43. package/dist/standard.d.ts.map +1 -0
  44. package/dist/standard.js +98 -0
  45. package/dist/standard.js.map +1 -0
  46. package/dist/types.d.ts +216 -0
  47. package/dist/types.d.ts.map +1 -0
  48. package/dist/types.js +3 -0
  49. package/dist/types.js.map +1 -0
  50. package/dist/utils.d.ts +3 -0
  51. package/dist/utils.d.ts.map +1 -0
  52. package/dist/utils.js +17 -0
  53. package/dist/utils.js.map +1 -0
  54. package/package.json +59 -0
  55. package/templates/eval-standard.json +174 -0
package/README.md ADDED
@@ -0,0 +1,667 @@
1
+ # Caliber
2
+
3
+ **Measure the caliber of your AI-assisted engineering.** A self-hostable gateway, audit log, and evaluator for teams that want to know exactly what their AI coding assistants are doing — and how well.
4
+
5
+ **精準衡量你的 AI 工程力。** 自架的 gateway / 稽核 / 評核平台,讓團隊清楚知道 AI 助理到底在做什麼、做得多好。
6
+
7
+ ---
8
+
9
+ ## Why / 為什麼需要這個工具
10
+
11
+ Engineering managers need evidence-based data to evaluate how effectively their team uses AI coding assistants. Manual review of hundreds of AI sessions is impractical. This tool automates the process by:
12
+
13
+ 研發經理需要基於證據的資料來評估團隊使用 AI 程式助手的成效。手動審查數百個 AI 工作階段不切實際。本工具透過以下方式自動化此流程:
14
+
15
+ 1. **Extracting** usage data from local Claude Code (`~/.claude/`) and Codex (`~/.codex/`) storage
16
+ 2. **Analyzing** session patterns for decision-making quality and risk identification
17
+ 3. **Scoring** against a configurable evaluation standard (default: OneAD R&D standard)
18
+ 4. **Generating** structured reports with evidence and score recommendations
19
+
20
+ ---
21
+
22
+ ## Features / 功能特色
23
+
24
+ - Reads Claude Code session metadata, facets, SQLite cost data, and JSONL conversations
25
+ - Reads Codex SQLite thread data (tokens, models, sessions)
26
+ - Detects decision-making patterns (iterative refinement, multi-task coordination, active corrections)
27
+ - Detects risk identification signals (security awareness, performance discussions, bug catching)
28
+ - Configurable evaluation standard — bring your own criteria, keywords, and thresholds
29
+ - Multiple output formats: terminal (colored), JSON, Markdown, HTML
30
+ - JSON output is machine-parseable (`--format json` emits clean JSON to stdout, progress logs go to stderr)
31
+ - Noise filtering to exclude system messages and code review templates from analysis
32
+ - `init-standard` command to export the default standard as a customization template
33
+ - Data quality warnings when data sources are missing or incomplete
34
+
35
+ ---
36
+
37
+ ## Platform mode / 平台模式
38
+
39
+ Starting with **v0.2.0**, Caliber also ships as a self-hostable web platform with
40
+ organization-scoped RBAC, invites, and an audit log. Use this mode if you want
41
+ a shared workspace for a team rather than a per-engineer CLI report.
42
+
43
+ **v0.3.0** adds an opt-in **gateway** that proxies Anthropic-native
44
+ (`/v1/messages`) and OpenAI-compatible (`/v1/chat/completions`) traffic
45
+ through a shared pool of upstream accounts:
46
+
47
+ - Admins donate `sk-ant-...` API keys or OAuth bundles extracted from Claude
48
+ Code; the gateway's scheduler picks one per request based on priority,
49
+ concurrency, and rate-limit state.
50
+ - Each user self-issues or receives an admin-issued platform API key
51
+ (`ak_...`) that authenticates against the gateway.
52
+ - Usage and cost (per Anthropic/OpenAI token pricing) land in a `usage_logs`
53
+ table, surfaced via per-user and per-org dashboards.
54
+
55
+ **v0.4.0** adds an opt-in **evaluator** subsystem for performance evaluation
56
+ (gated behind the `ENABLE_EVALUATOR` feature flag):
57
+
58
+ - **Content capture opt-in** — organization-level toggle; members see their
59
+ captured usage on `/dashboard/profile/evaluation`. 90-day default retention
60
+ (per-org override: 30/60/90). AES-256-GCM encryption with domain-separated
61
+ HKDF keys.
62
+ - **Dual-layer evaluation** — rule-based scoring (always-on) + optional LLM
63
+ Deep Analysis (per-org opt-in). Costs dogfooded via self-gateway loopback
64
+ and tracked in `usage_logs`.
65
+ - **Admin-customizable scoring rubrics** — platform defaults seeded for
66
+ English, Traditional Chinese, and Japanese; organizations can define custom
67
+ rubrics with dry-run preview. Zod-validated signal discriminated union
68
+ (keywords, thresholds, refusal rates, client mix, model diversity, cache
69
+ patterns, extended thinking, tool variety, iteration counts) — **extended
70
+ in v0.5.0** with six facet-based signal types.
71
+ - **GDPR member-initiated delete request workflow** — members request deletion,
72
+ org admins approve (or auto-reject after 30 days). Retention purge and GDPR
73
+ execution run on separate cron workers.
74
+ - **Labor-law-friendly transparency** — members always see their own full
75
+ evaluation report; team managers see redacted team views (LLM analysis
76
+ fields nulled unless they are also org admins). Leaderboard visibility is
77
+ opt-in per organization (privacy default).
78
+
79
+ **v0.5.0** extends the v0.4.0 evaluator with **per-org LLM cost budgeting**
80
+ and **opt-in LLM facet extraction** (gated behind `ENABLE_FACET_EXTRACTION`
81
+ env + per-org `llm_facet_enabled`). All v0.4.0 behaviour is preserved when both
82
+ flags are off.
83
+
84
+ - **Cost budget infrastructure** — every org gets `llm_monthly_budget_usd`
85
+ + `llm_budget_overage_behavior` (`degrade` skips over-budget calls,
86
+ `halt` stops all LLM calls until the next UTC month). Spend is tracked per call in a
87
+ new `llm_usage_events` ledger, summed per UTC month, enforced before each
88
+ LLM call. Cost dashboard at `/dashboard/organizations/<id>/evaluator/costs`
89
+ with breakdowns by task / model / 6-month history; compact widget on the
90
+ evaluator status page.
91
+ - **LLM facet extraction** — opt-in second LLM pass per session that
92
+ classifies each evaluation window's sessions into structured JSON
93
+ (`{sessionType, outcome, claudeHelpfulness, frictionCount, bugsCaughtCount,
94
+ codexErrorsCount}`). Extracted rows are persisted to the `request_body_facets`
95
+ table with a `prompt_version` cache so the same LLM call doesn't fire twice.
96
+ Deterministic failures (parse / validation / timeout) write an error row
97
+ so they don't retry; transient failures (5xx, budget hit) skip silently.
98
+ - **Six new rubric signal types** consume the facet aggregate:
99
+ `facet_claude_helpfulness`, `facet_friction_per_session`,
100
+ `facet_bugs_caught`, `facet_codex_errors`, `facet_outcome_success_rate`,
101
+ `facet_session_type_ratio`. Custom rubrics can opt in today; the rubric
102
+ editor ships an in-form Signal types reference.
103
+ - **Platform default rubrics bumped to v1.1.0** — strictly additive: each
104
+ section gains one facet support signal (`facet_outcome_success_rate` to `interaction`,
105
+ `facet_bugs_caught` to `riskControl`). Orgs without facet extraction see
106
+ zero scoring change.
107
+ - **Report-page facet drill-down** — when facet rows exist for the period,
108
+ the user's evaluation report shows session-type distribution, success
109
+ rate, avg helpfulness, and bug/friction/codex counters. Hidden silently
110
+ when no rows exist.
111
+ - **Observability artifacts** shipped under `ops/`: 3 Grafana dashboards
112
+ (evaluator / body-capture / GDPR), 11 Prometheus alert rules, 9 runbooks
113
+ in `docs/runbooks/`, and a post-release smoke workflow that auto-creates a
114
+ `release-blocker` issue when the canary fails.
115
+
116
+ See [`docs/UPGRADE-v0.5.0.md`](docs/UPGRADE-v0.5.0.md) for the upgrade
117
+ playbook (migrations 0004-0007, env flags, three-tier rollback).
118
+
119
+ Quick start:
120
+ ```sh
121
+ cd docker
122
+ cp .env.example .env # fill in OAuth + bootstrap email (+ gateway secrets if enabling)
123
+ docker compose up -d # api + web + postgres + redis
124
+ docker compose --profile gateway up -d # opt-in: add gateway service
125
+ ```
126
+
127
+ Images are published on every `v*` tag to:
128
+
129
+ | Image | amd64 | arm64 |
130
+ |-------|-------|-------|
131
+ | `ghcr.io/hanfour/caliber-api` | ✅ | ✅ |
132
+ | `ghcr.io/hanfour/caliber-gateway` (new in v0.3.0) | ✅ | ✅ |
133
+ | `ghcr.io/hanfour/caliber-web` | ✅ | ❌ (dropped in v0.5.0; QEMU cross-build was unstable) |
134
+
135
+ Operator guides:
136
+
137
+ - **Try locally first**: [`docs/LOCAL_DEPLOY.md`](docs/LOCAL_DEPLOY.md) — 5-min path on your laptop, escalates to on-prem production
138
+ - Self-hosting bring-up (api + web + gateway): [`docs/SELF_HOSTING.md`](docs/SELF_HOSTING.md)
139
+ - Gateway operator + user reference: [`docs/GATEWAY.md`](docs/GATEWAY.md)
140
+
141
+ **Cloud deploy templates** (alternatives to docker-compose self-hosting):
142
+
143
+ - [Render Blueprint](deploy/render/README.md) — closest thing to one-click; provisions Postgres + 3 services; needs Upstash Redis externally
144
+ - [Fly.io](deploy/fly/README.md) — three apps + Fly Postgres + Upstash; geographically distributed if you want
145
+ - [Railway](deploy/railway/README.md) — native Postgres + Redis plugins; manual service creation per the README
146
+
147
+ ⚠️ Vercel is **not supported** — the gateway is a long-running Fastify
148
+ server with BullMQ workers, which doesn't fit Vercel's serverless model. See
149
+ the deploy/ READMEs for what does work.
150
+
151
+ CLI mode and platform mode share no runtime state; pick whichever fits.
152
+
153
+ > **First time trying platform mode?** Start with
154
+ > [`docs/GETTING_STARTED.md`](docs/GETTING_STARTED.md) — a 30-minute
155
+ > end-to-end walkthrough that takes a fresh checkout to a working
156
+ > personal AI gateway sharing your Claude.ai Pro/Max subscription
157
+ > across all your devices.
158
+
159
+ ---
160
+
161
+ ## Data Sources / 資料來源
162
+
163
+ | Source | Path | Data |
164
+ |--------|------|------|
165
+ | Claude Code Session Meta | `~/.claude/usage-data/session-meta/*.json` | Tokens, duration, tools, languages, git commits, first prompt |
166
+ | Claude Code Facets | `~/.claude/usage-data/facets/*.json` | AI-generated session analysis: goals, outcomes, friction, helpfulness |
167
+ | Claude Code SQLite | `~/.claude/__store.db` | Per-message cost (USD), model, duration |
168
+ | Claude Code JSONL | `~/.claude/projects/*/*.jsonl` | Full conversation content for keyword signal scanning |
169
+ | Codex SQLite | `~/.codex/state_5.sqlite` | Threads: tokens_used, model, title, git info |
170
+ | Codex History | `~/.codex/history.jsonl` | Full user prompts by thread/session |
171
+ | Codex Logs | `~/.codex/logs_2.sqlite` | Thread-level tool calls and error events |
172
+
173
+ All data is read **locally and read-only**. No data is sent to any external service.
174
+
175
+ 所有資料皆為**本地端唯讀存取**,不會傳送至任何外部服務。
176
+
177
+ ---
178
+
179
+ ## Prerequisites / 系統需求
180
+
181
+ - **Node.js** >= 18
182
+ - **npm** (included with Node.js)
183
+ - `~/.claude/` directory (from Claude Code usage)
184
+ - `~/.codex/` directory (from Codex CLI usage, optional)
185
+
186
+ ---
187
+
188
+ ## Installation / 安裝
189
+
190
+ ### Recommended: Install from npm / 建議:從 npm 安裝
191
+
192
+ ```bash
193
+ npm install -g @hanfour.huang/caliber
194
+
195
+ # Verify installation
196
+ caliber --version
197
+ ```
198
+
199
+ ### Update / 更新
200
+
201
+ ```bash
202
+ npm install -g @hanfour.huang/caliber@latest
203
+ ```
204
+
205
+ Caliber uses `~/.caliber.json` for CLI settings. On first run after upgrading
206
+ from the older `aide` CLI, an existing `~/.aide.json` file is read, migrated to
207
+ `~/.caliber.json`, and reported with a one-time deprecation notice.
208
+
209
+ ### Existing local-clone users / 已使用 clone 安裝的使用者
210
+
211
+ If you previously installed from a cloned repo or `npm link`, migrate to the npm package:
212
+
213
+ ```bash
214
+ npm unlink -g aide 2>/dev/null || npm uninstall -g @hanfour.huang/aide
215
+ npm install -g @hanfour.huang/caliber@latest
216
+ ```
217
+
218
+ ### Development mode / 開發模式
219
+
220
+ ```bash
221
+ git clone https://github.com/hanfour/caliber.git ~/caliber
222
+ cd ~/caliber
223
+ npm install
224
+ npx tsx src/cli.ts --help
225
+ ```
226
+
227
+ ---
228
+
229
+ ## Quick Start / 快速開始
230
+
231
+ ```bash
232
+ # Quick usage summary (last 7 days)
233
+ caliber summary
234
+
235
+ # Full evaluation report (last 30 days, terminal output)
236
+ caliber report
237
+
238
+ # Save report as Markdown
239
+ caliber report --format markdown --output report.md
240
+
241
+ # Save report as HTML
242
+ caliber report --format html --output report.html
243
+
244
+ # Monthly KPI report
245
+ caliber monthly
246
+ ```
247
+
248
+ ---
249
+
250
+ ## Usage / 使用方式
251
+
252
+ ### Quick Summary / 快速摘要
253
+
254
+ ```bash
255
+ # Last 7 days (default)
256
+ caliber summary
257
+
258
+ # Custom date range
259
+ caliber summary --since 2026-03-01 --until 2026-03-31
260
+ ```
261
+
262
+ Output:
263
+
264
+ ```
265
+ AI Dev Usage Summary
266
+ Period: 2026-03-01 ~ 2026-03-31
267
+
268
+ Claude Code
269
+ Sessions: 57
270
+ Tokens: 259,336
271
+ Duration: 15676 min
272
+ Active Days: 9
273
+
274
+ Codex
275
+ Sessions: 1
276
+ Tokens: 368,930
277
+ Active Days: 1
278
+ ```
279
+
280
+ ### Full Evaluation Report / 完整評核報告
281
+
282
+ ```bash
283
+ # Default: last 30 days, text format, built-in OneAD standard
284
+ caliber report
285
+
286
+ # Current calendar month
287
+ caliber monthly
288
+
289
+ # Previous full calendar month
290
+ caliber monthly --previous
291
+
292
+ # Current calendar quarter
293
+ caliber quarterly
294
+
295
+ # Previous full calendar quarter
296
+ caliber quarterly --previous
297
+
298
+ # Custom date range
299
+ caliber report --since 2026-03-01 --until 2026-04-14
300
+
301
+ # Output as Markdown file
302
+ caliber report --format markdown --output report.md
303
+
304
+ # Output as HTML file
305
+ caliber report --format html --output report.html
306
+
307
+ # Output as JSON (machine-parseable, clean stdout)
308
+ caliber report --format json --output report.json
309
+
310
+ # Pipe JSON for programmatic consumption
311
+ caliber report --format json 2>/dev/null | jq '.sections[].score'
312
+
313
+ # Use a custom evaluation standard
314
+ caliber report --standard my-standard.json
315
+
316
+ # Include engineer/department metadata in report
317
+ caliber report --engineer "Jane Doe" --department "R&D"
318
+ ```
319
+
320
+ > **Note:** When using `--format json`, progress and status messages are written to stderr.
321
+ > stdout contains only the JSON report, making it safe to pipe to `jq` or other tools.
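The clean-stdout guarantee makes the report easy to consume from code as well as from `jq`. The following is a minimal TypeScript sketch, assuming the report has a top-level `sections` array whose entries carry a numeric `score` (the shape implied by the `jq '.sections[].score'` example). The `id` field and the `parseSectionScores` helper are hypothetical and not part of Caliber's API.

```typescript
// Assumed minimal shape of the JSON report. Only `sections[].score` is
// implied by the README; `id` is an illustrative, optional field.
interface ReportSection {
  id?: string;
  score: number;
}

interface Report {
  sections: ReportSection[];
}

// Parse the JSON text captured from stdout and return each section's score.
// Throws if the text is not valid JSON or lacks a `sections` array.
function parseSectionScores(jsonText: string): number[] {
  const report = JSON.parse(jsonText) as Partial<Report>;
  if (!Array.isArray(report.sections)) {
    throw new Error("unexpected report shape: missing `sections` array");
  }
  return report.sections.map((section) => section.score);
}
```

The input would be whatever `caliber report --format json` writes to stdout; stderr should be captured or discarded separately so progress logs never reach the parser.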
322
+
323
+ ### Using the compiled CLI / 使用編譯後的 CLI
324
+
325
+ If you are developing locally and have run `npm run build`, you can use `node dist/cli.js`:
326
+
327
+ ```bash
328
+ node dist/cli.js report --since 2026-03-01 --until 2026-03-31
329
+ node dist/cli.js summary
330
+ node dist/cli.js monthly --previous --format markdown --output march.md
331
+ node dist/cli.js report --format html --output report.html
332
+ ```
333
+
334
+ ---
335
+
336
+ ## CLI Reference / 命令參考
337
+
338
+ ### `caliber report`
339
+
340
+ Generate a full evaluation report.
341
+
342
+ ```
343
+ Options:
344
+ -s, --since <date> Start date, YYYY-MM-DD (default: 30 days ago)
345
+ -u, --until <date> End date, YYYY-MM-DD (default: today)
346
+ -f, --format <format> Output: text | json | markdown | html (default: text)
347
+ -o, --output <file> Write report to file instead of stdout
348
+ --standard <path> Path to custom evaluation standard JSON
349
+ --engineer <name> Engineer name for report identification
350
+ --department <name> Department name for report identification
351
+ ```
352
+
353
+ ### `caliber summary`
354
+
355
+ Quick usage summary for a date range.
356
+
357
+ ```
358
+ Options:
359
+ -s, --since <date> Start date, YYYY-MM-DD (default: 7 days ago)
360
+ -u, --until <date> End date, YYYY-MM-DD (default: today)
361
+ ```
362
+
363
+ ### `caliber monthly`
364
+
365
+ Generate a monthly KPI report.
366
+
367
+ ```
368
+ Options:
369
+ -f, --format <format> Output: text | json | markdown | html (default: text)
370
+ -o, --output <file> Write report to file instead of stdout
371
+ --standard <path> Path to custom evaluation standard JSON
372
+ --previous Use the previous full calendar month
373
+ ```
374
+
375
+ ### `caliber quarterly`
376
+
377
+ Generate a quarterly KPI report.
378
+
379
+ ```
380
+ Options:
381
+ -f, --format <format> Output: text | json | markdown | html (default: text)
382
+ -o, --output <file> Write report to file instead of stdout
383
+ --standard <path> Path to custom evaluation standard JSON
384
+ --previous Use the previous full calendar quarter
385
+ ```
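The calendar periods used by `caliber monthly` and `caliber quarterly`, with or without `--previous`, can be sketched as a small date calculation. This is an illustration only: `resolvePeriod` is a hypothetical helper, it works in UTC, and it always returns full calendar bounds, whereas the real CLI may use local time or end the current period at today's date.

```typescript
// Hypothetical helper: resolve a calendar month or quarter to an inclusive
// [since, until] range in YYYY-MM-DD form. Uses UTC for simplicity.
type PeriodKind = "monthly" | "quarterly";

interface DateRange {
  since: string;
  until: string;
}

// Format a Date as YYYY-MM-DD using its UTC components.
function formatDate(d: Date): string {
  return d.toISOString().slice(0, 10);
}

function resolvePeriod(kind: PeriodKind, previous: boolean, now: Date): DateRange {
  const monthsPerPeriod = kind === "monthly" ? 1 : 3;
  const year = now.getUTCFullYear();
  // First month (0-based) of the period that contains `now`.
  let startMonth = Math.floor(now.getUTCMonth() / monthsPerPeriod) * monthsPerPeriod;
  if (previous) startMonth -= monthsPerPeriod;
  // Date.UTC normalizes negative months into the previous year.
  const start = new Date(Date.UTC(year, startMonth, 1));
  // Day 0 of the following period is the last day of this one.
  const end = new Date(Date.UTC(year, startMonth + monthsPerPeriod, 0));
  return { since: formatDate(start), until: formatDate(end) };
}
```

For example, `resolvePeriod("monthly", true, new Date(Date.UTC(2026, 3, 14)))` yields `{ since: "2026-03-01", until: "2026-03-31" }`, the kind of range one could pass to `--since` and `--until`.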
386
+
387
+ ### `caliber init-standard`
388
+
389
+ Export the default evaluation standard as a JSON template for customization.
390
+
391
+ ```
392
+ Options:
393
+ -o, --output <file> Output file path (default: eval-standard.json)
394
+ ```
395
+
396
+ ---
397
+
398
+ ## Report Structure / 報告結構
399
+
400
+ The generated report contains the following sections:
401
+
402
+ ### 1. Management Summary / 管理摘要
403
+
404
+ Management-facing overview for monthly/quarterly KPI review:
405
+
406
+ - Overall headline
407
+ - Period assessment
408
+ - Key observations
409
+ - Recommended follow-up actions
410
+
411
+ ### 2. Usage Overview / 使用概覽
412
+
413
+ Quantitative metrics for both Claude Code and Codex:
414
+
415
+ - Total sessions, tokens (input/output), estimated cost
416
+ - Active days, duration
417
+ - Top projects by token usage
418
+ - Top tools used (Bash, Read, Edit, etc.)
419
+ - Model breakdown
420
+
421
+ ### 3-N. Evaluation Sections / 評核區段
422
+
423
+ Each section defined in the evaluation standard generates:
424
+
425
+ - **Summary** — aggregate statistics
426
+ - **Usage evidence** — workload/depth indicators such as sessions, tool usage, follow-up prompts
427
+ - **Score evidence** — threshold-relevant evidence used for 100% / 120% scoring
428
+ - **Evidence signals** — grouped by type (iterative refinement, bugs caught, security awareness, etc.)
429
+ - **Metrics** — numeric indicators used for scoring
430
+
431
+ ### Final. Score Recommendation / 分值建議
432
+
433
+ For each evaluation section:
434
+
435
+ - **Score**: Standard (100%) or Superior (120%)
436
+ - **Label**: Human-readable grade
437
+ - **Reason**: Evidence-backed explanation referencing the criteria
438
+
439
+ ### Data Quality Warnings / 資料品質警告
440
+
441
+ The report includes data quality warnings when:
442
+
443
+ - Required data sources (`~/.claude/usage-data/session-meta`) are missing
444
+ - Sessions exist but no facets are found (qualitative analysis limited)
445
+ - No keyword signals detected (JSONL files may be missing)
446
+ - No sessions found at all in the evaluation period
447
+
448
+ ---
449
+
450
+ ## Custom Evaluation Standards / 自訂評核標準
451
+
452
+ The built-in default is the OneAD R&D AI-Application Evaluation Standard. To create your own:
453
+
454
+ ### Step 1: Export the default template / 匯出預設範本
455
+
456
+ ```bash
457
+ caliber init-standard --output my-standard.json
458
+ ```
459
+
460
+ ### Step 2: Edit the JSON file / 編輯 JSON 檔案
461
+
462
+ Key fields you can customize:
463
+
464
+ | Field | Purpose |
465
+ |-------|---------|
466
+ | `name` | Standard name shown in report header |
467
+ | `sections[]` | Array of evaluation sections (add/remove/reorder) |
468
+ | `sections[].id` | Unique section identifier |
469
+ | `sections[].name` | Section display name |
470
+ | `sections[].weight` | KPI weight (display only) |
471
+ | `sections[].keywords` | Conversation scanning keywords |
472
+ | `sections[].thresholds` | Numeric thresholds for Superior score |
473
+ | `sections[].superiorRules` | Optional rule for combining thresholds |
474
+ | `sections[].standard` | 100% score criteria text |
475
+ | `sections[].superior` | 120% score criteria text |
476
+ | `noiseFilters` | Rules to exclude system/template messages |
477
+
478
+ ### Step 3: Use it / 使用自訂標準
479
+
480
+ ```bash
481
+ caliber report --standard my-standard.json
482
+ ```
483
+
484
+ ### Example: Adding a new section / 新增評核區段範例
485
+
486
+ ```json
487
+ {
488
+ "id": "collaboration",
489
+ "name": "AI-Human Collaboration Quality",
490
+ "weight": "30%",
491
+ "standard": {
492
+ "score": 100,
493
+ "label": "Standard",
494
+ "criteria": ["Uses AI for routine tasks", "Follows AI suggestions without modification"]
495
+ },
496
+ "superior": {
497
+ "score": 120,
498
+ "label": "Superior",
499
+ "criteria": ["Actively debates with AI on design decisions", "Synthesizes multiple AI suggestions into novel solutions"]
500
+ },
501
+ "keywords": ["design", "architecture", "trade-off", "pattern", "alternative"],
502
+ "thresholds": {
503
+ "iterativeRatio": 0.4,
504
+ "keywordHits": 15
505
+ },
506
+ "superiorRules": {
507
+ "mode": "grouped",
508
+ "strongThresholds": ["iterativeRatio", "keywordHits"],
509
+ "supportThresholds": ["avgToolUses"],
510
+ "minStrongMatched": 1,
511
+ "minSupportMatched": 0
512
+ }
513
+ }
514
+ ```
515
+
516
+ ### Superior Rules / 升等規則
517
+
518
+ `superiorRules.mode = "any"` — any matched threshold is enough for 120%.
519
+
520
+ `superiorRules.mode = "grouped"` — separate strong evidence from support evidence. Strong evidence must meet a minimum count; support evidence alone is not sufficient.
521
+
522
+ Keys referenced by `strongThresholds` and `supportThresholds` must also exist in `thresholds`.
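The two modes can be sketched in TypeScript as follows. This is not the package's implementation: the type names, the `qualifiesForSuperior` helper, the "metric is greater than or equal to threshold" comparison, and the defaults for `minStrongMatched` (1) and `minSupportMatched` (0) are all assumptions made for illustration.

```typescript
// Hypothetical types mirroring the `thresholds` and `superiorRules` fields.
type Metrics = Record<string, number>;

interface SuperiorRules {
  mode: "any" | "grouped";
  strongThresholds?: string[];
  supportThresholds?: string[];
  minStrongMatched?: number;
  minSupportMatched?: number;
}

// Count how many of `keys` are matched.
// Assumption: a key matches when its metric is >= its threshold.
function countMatched(keys: string[], metrics: Metrics, thresholds: Metrics): number {
  return keys.filter((key) => {
    const limit = thresholds[key];
    const value = metrics[key];
    return limit !== undefined && value !== undefined && value >= limit;
  }).length;
}

function qualifiesForSuperior(
  metrics: Metrics,
  thresholds: Metrics,
  rules: SuperiorRules,
): boolean {
  if (rules.mode === "any") {
    // Any single matched threshold is enough for 120%.
    return countMatched(Object.keys(thresholds), metrics, thresholds) >= 1;
  }
  // Grouped: strong and support evidence are counted separately, and
  // strong evidence must reach its own minimum.
  const strong = countMatched(rules.strongThresholds ?? [], metrics, thresholds);
  const support = countMatched(rules.supportThresholds ?? [], metrics, thresholds);
  return (
    strong >= (rules.minStrongMatched ?? 1) &&
    support >= (rules.minSupportMatched ?? 0)
  );
}
```

Because keys missing from `thresholds` never match in this sketch, listing such a key under `strongThresholds` or `supportThresholds` would silently contribute nothing, which is one reason a loader might reject it up front.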
523
+
524
+ ### Available threshold keys / 可用門檻鍵值
525
+
526
+ | Key | Description |
527
+ |-----|-------------|
528
+ | `iterativeRatio` | Ratio of iterative/multi-task sessions to total |
529
+ | `correctionCount` | Number of user corrections/interruptions |
530
+ | `keywordHits` | Number of keyword signal matches |
531
+ | `avgToolUses` | Average tool uses per session |
532
+ | `securityCount` | Security-related keyword matches |
533
+ | `performanceCount` | Performance-related keyword matches |
534
+ | `bugsCaught` | AI-generated bugs caught (from facets) |
535
+ | `frictionSessions` | Sessions with friction events |
536
+ | `codexIterativeSessions` | Codex threads with strong iterative evidence |
537
+ | `codexMultiTurnSessions` | Codex multi-turn threads |
538
+ | `codexFollowUpCount` | Codex follow-up user prompts |
539
+ | `codexDeepSessions` | Codex high-depth threads |
540
+ | `codexErrorSessions` | Codex threads with logged errors |
541
+
542
+ ---
543
+
544
+ ## Default Evaluation Standard / 預設評核標準
545
+
546
+ The built-in OneAD standard evaluates two dimensions:
547
+
548
+ ### AI Interaction & Decision (20% KPI weight) / AI 交互與決策
549
+
550
+ | Grade | Criteria |
551
+ |-------|----------|
552
+ | **Standard (100%)** | Actively use AI for coding; clear decision notes |
553
+ | **Superior (120%)** | Multi-iteration guidance (A->B->C); system-constraint-aware optimization |
554
+
555
+ ### AI Identification & Risk Control (50% KPI weight) / AI 識別與風險控管
556
+
557
+ | Grade | Criteria |
558
+ |-------|----------|
559
+ | **Standard (100%)** | Catch common AI errors/hallucinations; stable code |
560
+ | **Superior (120%)** | Identify critical risks (security, performance, memory); produce SOP/Wiki for team sharing |
561
+
562
+ ---
563
+
564
+ ## Architecture / 架構
565
+
566
+ ```
567
+ src/
568
+ ├── cli.ts # CLI entry point (commander)
569
+ ├── types.ts # TypeScript type definitions
570
+ ├── standard.ts # Load & validate evaluation standards
571
+ ├── period.ts # Date period resolution (monthly/quarterly)
572
+ ├── data-quality.ts # Data source completeness checks
573
+ ├── utils.ts # Shared utilities (noise filter)
574
+ ├── extractors/
575
+ │ ├── claude-code.ts # Read ~/.claude/ data (JSONL, SQLite, JSON)
576
+ │ └── codex.ts # Read ~/.codex/ data (SQLite, JSONL)
577
+ ├── analyzers/
578
+ │ ├── usage.ts # Aggregate quantitative usage metrics
579
+ │ └── section.ts # Generic section analyzer (facets + keywords + thresholds)
580
+ └── reporters/
581
+ └── report.ts # Render reports (text, JSON, Markdown, HTML)
582
+
583
+ templates/
584
+ └── eval-standard.json # Default OneAD evaluation standard (source of truth)
585
+
586
+ tests/
587
+ ├── cli.test.ts # CLI regression tests (subprocess)
588
+ ├── section.test.ts # Section analyzer unit tests
589
+ ├── standard.test.ts # Standard loader/validator tests
590
+ ├── data-quality.test.ts # Data quality checker tests
591
+ └── fixtures/ # Test fixture files
592
+ ```
593
+
594
+ ### Pipeline / 處理流程
595
+
596
+ ```
597
+ Extract --> Analyze --> Score --> Report
598
+
599
+ 1. Extract: Read session-meta, facets, SQLite, JSONL from local stores
600
+ 2. Analyze: Aggregate usage + run each section through generic analyzer
601
+ 3. Score: Compare metrics against section thresholds
602
+ 4. Report: Render in chosen format with evidence and recommendations
603
+ ```
604
+
605
+ ---
606
+
607
+ ## Development / 開發
608
+
609
+ ### Scripts / 腳本
610
+
611
+ ```bash
612
+ npm run build # Compile TypeScript to dist/
613
+ npm run dev # Run CLI directly via tsx (no build needed)
614
+ npm run test # Run test suite (vitest)
615
+ npm run test:watch # Run tests in watch mode
616
+ ```
617
+
618
+ ### Running tests / 執行測試
619
+
620
+ ```bash
621
+ # Run all tests
622
+ npm test
623
+
624
+ # Run a specific test file
625
+ npx vitest run tests/section.test.ts
626
+
627
+ # Watch mode
628
+ npm run test:watch
629
+ ```
630
+
631
+ ### Project conventions / 專案慣例
632
+
633
+ - All progress/status messages are written to **stderr**; report output goes to **stdout**
634
+ - JSON output (`--format json`) is guaranteed clean on stdout for piping
635
+ - SQLite connections are wrapped in `try/finally` to prevent resource leaks
636
+ - The evaluation standard template (`templates/eval-standard.json`) is the single source of truth
637
+ - Custom standards inherit default `noiseFilters` if not specified
638
+
639
+ ---
640
+
641
+ ## Troubleshooting / 問題排除
642
+
643
+ ### No sessions found
644
+
645
+ - Verify `~/.claude/usage-data/session-meta/` contains JSON files
646
+ - Check the date range matches when the AI tools were used
647
+ - For Codex, verify `~/.codex/state_5.sqlite` exists
648
+
649
+ ### Empty facets
650
+
651
+ - Facets are generated asynchronously by Claude Code after sessions end
652
+ - Recent sessions may not have facets yet
653
+ - The tool will show a data quality warning in this case
654
+
655
+ ### JSON output contains extra text
656
+
657
+ This was fixed in v0.1.0. All progress messages now go to stderr. If you encounter this, ensure you are using the latest version. Use `2>/dev/null` to suppress stderr when piping:
658
+
659
+ ```bash
660
+ caliber report --format json 2>/dev/null | jq .
661
+ ```
662
+
663
+ ---
664
+
665
+ ## License
666
+
667
+ MIT
package/dist/analyzers/section.d.ts ADDED
@@ -0,0 +1,9 @@
1
+ import type { ClaudeCodeSession, ClaudeCodeFacet, ClaudeCodeConversationSignal, CodexSession, CodexConversationSignal, CodexSessionInsight, EvalSectionDef, EvalSectionResult } from "../types.js";
2
+ /**
3
+ * Generic section analyzer.
4
+ *
5
+ * Collects evidence from facets, sessions, and conversation signals,
6
+ * then determines standard vs superior score based on section thresholds.
7
+ */
8
+ export declare function analyzeSection(section: EvalSectionDef, claudeSessions: ClaudeCodeSession[], facets: Map<string, ClaudeCodeFacet>, claudeSignals: ClaudeCodeConversationSignal[], codexSessions: CodexSession[], codexInsights: Map<string, CodexSessionInsight>, codexSignals: CodexConversationSignal[]): EvalSectionResult;
9
+ //# sourceMappingURL=section.d.ts.map
package/dist/analyzers/section.d.ts.map ADDED
@@ -0,0 +1 @@
1
+ {"version":3,"file":"section.d.ts","sourceRoot":"","sources":["../../src/analyzers/section.ts"],"names":[],"mappings":"AAAA,OAAO,KAAK,EACV,iBAAiB,EACjB,eAAe,EACf,4BAA4B,EAC5B,YAAY,EACZ,uBAAuB,EACvB,mBAAmB,EACnB,cAAc,EACd,iBAAiB,EAClB,MAAM,aAAa,CAAC;AAsiBrB;;;;;GAKG;AACH,wBAAgB,cAAc,CAC5B,OAAO,EAAE,cAAc,EACvB,cAAc,EAAE,iBAAiB,EAAE,EACnC,MAAM,EAAE,GAAG,CAAC,MAAM,EAAE,eAAe,CAAC,EACpC,aAAa,EAAE,4BAA4B,EAAE,EAC7C,aAAa,EAAE,YAAY,EAAE,EAC7B,aAAa,EAAE,GAAG,CAAC,MAAM,EAAE,mBAAmB,CAAC,EAC/C,YAAY,EAAE,uBAAuB,EAAE,GACtC,iBAAiB,CAwKnB"}