@kansei-link/bantou 0.2.0 → 0.2.1

@@ -1,91 +1,91 @@
1
- # Cockpit MCP — Keyword Dictionary
2
-
3
- ## What this is
4
-
5
- Cockpit MCP's **Stage 1 classifier** uses keyword dictionaries to map raw transaction 摘要 (= memo strings) to 勘定科目 (= account categories) + 税区分 (= tax codes).
6
-
7
- This is the **practitioner-style 14 categories × 100 keywords** approach, packaged as portable JSON for free distribution. Each tax jurisdiction has its own dictionary file:
8
-
9
- - `jp-tax-baseline-v1.json` — Japanese tax accounting baseline (14 categories)
10
- - (future) `sg-tax-baseline-v1.json` — Singapore GST + ACRA-aligned categories
11
- - (future) `us-tax-baseline-v1.json` — US sales tax + GAAP categories
12
-
13
- ## How matching works
14
-
15
- 1. **Normalize** the 摘要 string: full-width → half-width, uppercase → lowercase, trim whitespace
16
- 2. **Iterate categories** (top to bottom) and for each, check if any keyword is a substring of the normalized 摘要
17
- 3. **First match wins**: if "Suica" matches in `travel`, classifier returns `travel`. No further categories checked.
18
- 4. **Apply amount thresholds** (if defined): e.g., 会議費 only matches ≤¥10K; if amount > ¥10K, auto-redirect to 交際費 via `amount_overflow_category`
19
- 5. **Apply special patterns** (if defined): e.g., `transfer_professional` requires the keyword AND a 振込 verb in the 摘要
20
- 6. If no category matches → fallback to **Stage 2** (Claude API classifier with confidence high/medium/low)
21
-
22
- ## Versioning
23
-
24
- - `v1.0.0` = initial baseline (= 14 categories × ~50 keywords average)
25
- - `v1.x` = expand to 100 keywords/category via dogfood iterations
26
- - `v2.x` = community-tuned (= keywords improved through the collective knowledge of Cockpit users)
27
- - `v3.x` = AI-discovered keywords (= AI finds new keywords in past anonymized data)
28
-
29
- ## How to extend
30
-
31
- ### Add new keyword to existing category
32
-
33
- 1. Open the relevant `*-tax-baseline-v*.json`
34
- 2. Find the category by `id`
35
- 3. Append your keyword to the `keywords` array
36
- 4. Test: run smoke test against your test fixtures
37
- 5. Open PR (= reviewer must verify keyword is unambiguous + likely to match real transactions)
38
-
39
- ### Add new category
40
-
41
- 1. Define new entry in `categories` array with all required fields (= `id`, `name_ja`, `freee_account_code`, `default_tax_code`, `keywords`)
42
- 2. Choose `freee_account_code` from freee's standard chart of accounts
43
- 3. Determine `default_tax_code` based on Japanese 消費税 rules
44
- 4. Decide priority (= where in the categories array it goes — earlier = higher priority)
45
- 5. Add at least 10 initial keywords to bootstrap
46
- 6. Run smoke test
47
- 7. Open PR (= reviewer verifies category is genuinely needed and not a duplicate)
48
-
49
- ### Add jurisdiction-specific dictionary
50
-
51
- 1. Copy `jp-tax-baseline-v1.json` → `<country-code>-tax-baseline-v1.json`
52
- 2. Update `locale` and `tax_jurisdiction` fields
53
- 3. Replace categories with the target country's chart of accounts
54
- 4. Provide a community maintainer (= regional accountant) to validate
55
- 5. Open PR
56
-
57
- ## Partnership note
58
-
59
- This dictionary is designed to be **co-authored with tax practitioners** starting Year 1 closed beta:
60
-
61
- - Practitioner's 6-month-built 14 categories × 100 keywords → contributed as anonymous PR
62
- - Co-author credit in `maintainer` field
63
- - Quarterly merge of practitioner improvements + community improvements
64
- - See `synapse-arrows-playbook/04-tooling-gaps/10-kansei-link-cockpit-strategy.md` §11 for partnership terms
65
-
66
- ## Validation
67
-
68
- ```bash
69
- # Run schema validation (TODO: add to CI)
70
- node scripts/validate-keyword-dict.js data/keyword-dict/*.json
71
- ```
72
-
73
- ## Smoke test scenarios
74
-
75
- `packages/cockpit-mcp/tests/keyword-dict.test.ts` covers:
76
-
77
- 1. Basic substring match (= "Suica" → travel)
78
- 2. Multi-keyword match (= the category listed earlier takes priority)
79
- 3. Amount threshold (= "スターバックス @ ¥15,000" → entertainment, NOT meeting_meal)
80
- 4. Special pattern: salary employee detection (= 給与 + 従業員名 list match)
81
- 5. Special pattern: 振込 (bank transfer) + professional's name (士業名) → professional_fee + counterparty extraction
82
- 6. Special pattern: 振込 (bank transfer) + katakana personal name → outsourcing + accrual date adjusted to the end of the previous month
83
- 7. No-match → Stage 2 fallback signal
84
- 8. Normalize: 摘要 containing a mix of full-width, katakana, and half-width characters
85
-
86
- ## See also
87
-
88
- - Schema: [`../keyword-dict-schema.json`](../keyword-dict-schema.json)
89
- - Architecture: [`../../docs/architecture.md`](../../docs/architecture.md)
90
- - 7 exclusion rules: [`../exclusion-rules/README.md`](../exclusion-rules/README.md) (= Stage 0 = before classifier runs)
91
- - Strategy: [synapse-arrows-playbook Doc 10](https://github.com/michielinksee/synapse-arrows-playbook/blob/main/04-tooling-gaps/10-kansei-link-cockpit-strategy.md)
1
+ # Cockpit MCP — Keyword Dictionary
2
+
3
+ ## What this is
4
+
5
+ Cockpit MCP's **Stage 1 classifier** uses keyword dictionaries to map raw transaction 摘要 (= memo strings) to 勘定科目 (= account categories) + 税区分 (= tax codes).
6
+
7
+ This is the **practitioner-style 14 categories × 100 keywords** approach, packaged as portable JSON for free distribution. Each tax jurisdiction has its own dictionary file:
8
+
9
+ - `jp-tax-baseline-v1.json` — Japanese tax accounting baseline (14 categories)
10
+ - (future) `sg-tax-baseline-v1.json` — Singapore GST + ACRA-aligned categories
11
+ - (future) `us-tax-baseline-v1.json` — US sales tax + GAAP categories
12
+
13
+ ## How matching works
14
+
15
+ 1. **Normalize** the 摘要 string: full-width → half-width, uppercase → lowercase, trim whitespace
16
+ 2. **Iterate categories** (top to bottom) and for each, check if any keyword is a substring of the normalized 摘要
17
+ 3. **First match wins**: if "Suica" matches in `travel`, classifier returns `travel`. No further categories checked.
18
+ 4. **Apply amount thresholds** (if defined): e.g., 会議費 only matches ≤¥10K; if amount > ¥10K, auto-redirect to 交際費 via `amount_overflow_category`
19
+ 5. **Apply special patterns** (if defined): e.g., `transfer_professional` requires the keyword AND a 振込 verb in the 摘要
20
+ 6. If no category matches → fallback to **Stage 2** (Claude API classifier with confidence high/medium/low)
21
+
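The matching steps above can be sketched roughly as follows. This is a minimal illustration, not the package's actual implementation: the `Category` shape, the `max_amount` field name, the `classifyStage1` function, and the sample data are all assumptions made for the example. Only `amount_overflow_category` and the category ids come from the README. It also assumes Unicode NFKC normalization is an acceptable stand-in for the full-width to half-width step.

```typescript
// Hypothetical shape: only `id`, `keywords`, and `amount_overflow_category`
// are named in the README; `max_amount` is an assumed field name.
interface Category {
  id: string;
  keywords: string[];
  max_amount?: number;
  amount_overflow_category?: string;
}

// Step 1: NFKC folds full-width ASCII and the ideographic space to their
// half-width forms (an assumption about how the real normalizer behaves).
function normalize(memo: string): string {
  return memo.normalize("NFKC").toLowerCase().trim();
}

// Returns a category id, or null to signal "fall back to Stage 2".
function classifyStage1(
  memo: string,
  amount: number,
  categories: Category[],
): string | null {
  const text = normalize(memo);
  // Steps 2-3: iterate top to bottom; the first category with a hit wins.
  for (const category of categories) {
    const hit = category.keywords.some((kw) => text.includes(normalize(kw)));
    if (!hit) continue;
    // Step 4: redirect when the amount exceeds the category's threshold.
    if (
      category.max_amount !== undefined &&
      amount > category.max_amount &&
      category.amount_overflow_category !== undefined
    ) {
      return category.amount_overflow_category;
    }
    return category.id;
  }
  // Step 6: nothing matched.
  return null;
}

// Illustrative data only; not taken from jp-tax-baseline-v1.json.
const sampleCategories: Category[] = [
  { id: "travel", keywords: ["suica"] },
  {
    id: "meeting_meal",
    keywords: ["スターバックス"],
    max_amount: 10000,
    amount_overflow_category: "entertainment",
  },
];
```

Under these assumptions, `classifyStage1("Ｓｕｉｃａ　チャージ", 1000, sampleCategories)` resolves to `travel`, and a スターバックス memo at ¥15,000 is redirected to `entertainment`. Special patterns (step 5) are left out of this sketch.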
22
+ ## Versioning
23
+
24
+ - `v1.0.0` = initial baseline (= 14 categories × ~50 keywords average)
25
+ - `v1.x` = expand to 100 keywords/category via dogfood iterations
26
+ - `v2.x` = community-tuned (= keywords improved through the collective knowledge of Cockpit users)
27
+ - `v3.x` = AI-discovered keywords (= AI finds new keywords in past anonymized data)
28
+
29
+ ## How to extend
30
+
31
+ ### Add new keyword to existing category
32
+
33
+ 1. Open the relevant `*-tax-baseline-v*.json`
34
+ 2. Find the category by `id`
35
+ 3. Append your keyword to the `keywords` array
36
+ 4. Test: run smoke test against your test fixtures
37
+ 5. Open PR (= reviewer must verify keyword is unambiguous + likely to match real transactions)
38
+
39
+ ### Add new category
40
+
41
+ 1. Define new entry in `categories` array with all required fields (= `id`, `name_ja`, `freee_account_code`, `default_tax_code`, `keywords`)
42
+ 2. Choose `freee_account_code` from freee's standard chart of accounts
43
+ 3. Determine `default_tax_code` based on Japanese 消費税 rules
44
+ 4. Decide priority (= where in the categories array it goes — earlier = higher priority)
45
+ 5. Add at least 10 initial keywords to bootstrap
46
+ 6. Run smoke test
47
+ 7. Open PR (= reviewer verifies category is genuinely needed and not a duplicate)
48
+
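Before opening a PR for a new category, a quick local check of the required fields and the 10-keyword minimum might look like the sketch below. The helper names (`missingRequiredFields`, `hasEnoughKeywords`) and the draft entry are hypothetical; the field list is the one given in step 1, and the real schema in `keyword-dict-schema.json` may impose further constraints.

```typescript
// Required fields as listed in step 1 of "Add new category".
const REQUIRED_FIELDS = [
  "id",
  "name_ja",
  "freee_account_code",
  "default_tax_code",
  "keywords",
] as const;

// Returns the names of required fields that the entry does not define.
function missingRequiredFields(entry: Record<string, unknown>): string[] {
  return REQUIRED_FIELDS.filter((field) => !(field in entry));
}

// Step 5 asks for at least 10 initial keywords.
function hasEnoughKeywords(entry: Record<string, unknown>, minimum = 10): boolean {
  const keywords = entry["keywords"];
  return Array.isArray(keywords) && keywords.length >= minimum;
}

// Placeholder draft, deliberately incomplete; values are illustrative only.
const draft: Record<string, unknown> = {
  id: "example_category",
  name_ja: "サンプル",
  keywords: ["placeholder"],
};

console.log(missingRequiredFields(draft)); // lists the two fields still missing
console.log(hasEnoughKeywords(draft));     // false: only one keyword so far
```

This does not replace the schema validation script; it only catches the most common omissions early.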
49
+ ### Add jurisdiction-specific dictionary
50
+
51
+ 1. Copy `jp-tax-baseline-v1.json` → `<country-code>-tax-baseline-v1.json`
52
+ 2. Update `locale` and `tax_jurisdiction` fields
53
+ 3. Replace categories with the target country's chart of accounts
54
+ 4. Provide a community maintainer (= regional accountant) to validate
55
+ 5. Open PR
56
+
57
+ ## Partnership note
58
+
59
+ This dictionary is designed to be **co-authored with tax practitioners** starting Year 1 closed beta:
60
+
61
+ - Practitioner's 6-month-built 14 categories × 100 keywords → contributed as anonymous PR
62
+ - Co-author credit in `maintainer` field
63
+ - Quarterly merge of practitioner improvements + community improvements
64
+ - See `synapse-arrows-playbook/04-tooling-gaps/10-kansei-link-cockpit-strategy.md` §11 for partnership terms
65
+
66
+ ## Validation
67
+
68
+ ```bash
69
+ # Run schema validation (TODO: add to CI)
70
+ node scripts/validate-keyword-dict.js data/keyword-dict/*.json
71
+ ```
72
+
73
+ ## Smoke test scenarios
74
+
75
+ `packages/cockpit-mcp/tests/keyword-dict.test.ts` covers:
76
+
77
+ 1. Basic substring match (= "Suica" → travel)
78
+ 2. Multi-keyword match (= the category listed earlier takes priority)
79
+ 3. Amount threshold (= "スターバックス @ ¥15,000" → entertainment, NOT meeting_meal)
80
+ 4. Special pattern: salary employee detection (= 給与 + 従業員名 list match)
81
+ 5. Special pattern: 振込 (bank transfer) + professional's name (士業名) → professional_fee + counterparty extraction
82
+ 6. Special pattern: 振込 (bank transfer) + katakana personal name → outsourcing + accrual date adjusted to the end of the previous month
83
+ 7. No-match → Stage 2 fallback signal
84
+ 8. Normalize: 摘要 containing a mix of full-width, katakana, and half-width characters
85
+
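Scenario 5 relies on a two-part condition: a professional keyword and a 振込 verb must both appear in the 摘要. One way this check could be expressed is sketched below. The word lists and the function name `matchesTransferProfessional` are assumptions for illustration; the actual `transfer_professional` pattern in the dictionary may be defined differently.

```typescript
// Hypothetical word lists; the real dictionary entries may differ.
const TRANSFER_VERBS = ["振込", "フリコミ"];
const PROFESSIONAL_TITLES = ["税理士", "弁護士", "司法書士"];

// True only when BOTH a transfer verb and a professional title are present.
function matchesTransferProfessional(memo: string): boolean {
  // NFKC converts half-width katakana (e.g. "ﾌﾘｺﾐ") to full-width "フリコミ".
  const text = memo.normalize("NFKC");
  const hasVerb = TRANSFER_VERBS.some((verb) => text.includes(verb));
  const hasTitle = PROFESSIONAL_TITLES.some((title) => text.includes(title));
  return hasVerb && hasTitle;
}
```

With these lists, a memo such as `振込 ヤマダ税理士事務所` satisfies the pattern, while the same payee without a transfer verb does not. Counterparty extraction is outside the scope of this sketch.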
86
+ ## See also
87
+
88
+ - Schema: [`../keyword-dict-schema.json`](../keyword-dict-schema.json)
89
+ - Architecture: [`../../docs/architecture.md`](../../docs/architecture.md)
90
+ - 7 exclusion rules: [`../exclusion-rules/README.md`](../exclusion-rules/README.md) (= Stage 0 = before classifier runs)
91
+ - Strategy: [synapse-arrows-playbook Doc 10](https://github.com/michielinksee/synapse-arrows-playbook/blob/main/04-tooling-gaps/10-kansei-link-cockpit-strategy.md)