@kansei-link/bantou 0.2.0 → 0.2.1
- package/LICENSE +21 -21
- package/README.md +110 -110
- package/data/exclusion-rules/README.md +104 -104
- package/data/exclusion-rules/jp-tax-baseline-v1.json +185 -185
- package/data/exclusion-rules-schema.json +109 -109
- package/data/keyword-dict/README.md +91 -91
- package/data/keyword-dict/jp-tax-baseline-v1.json +398 -398
- package/data/keyword-dict-schema.json +117 -117
- package/data/tax-rules/jp-tax-rules-v1.json +170 -170
- package/dist/classifier/claude-classifier.js +35 -35
- package/dist/classifier/keyword-classifier.js +5 -2
- package/dist/exclusion/exclusion-checker.js +5 -0
- package/dist/tax-rules/tax-rule-engine.js +5 -0
- package/package.json +74 -74
@@ -1,91 +1,91 @@
# Cockpit MCP: Keyword Dictionary

## What this is

Cockpit MCP's **Stage 1 classifier** uses keyword dictionaries to map raw transaction 摘要 (= memo strings) to 勘定科目 (= account categories) and 税区分 (= tax codes).

This is the **practitioner-style 14 categories × 100 keywords** approach, packaged as portable JSON for free distribution. Each tax jurisdiction has its own dictionary file:

- `jp-tax-baseline-v1.json`: Japanese tax accounting baseline (14 categories)
- (future) `sg-tax-baseline-v1.json`: Singapore GST + ACRA-aligned categories
- (future) `us-tax-baseline-v1.json`: US sales tax + GAAP categories

## How matching works

1. **Normalize** the 摘要 string: full-width → half-width, uppercase → lowercase, and trim whitespace.
2. **Iterate over categories** (top to bottom) and, for each one, check whether any keyword is a substring of the normalized 摘要.
3. **First match wins**: if "Suica" matches in `travel`, the classifier returns `travel`. No further categories are checked.
4. **Apply amount thresholds** (if defined): e.g., 会議費 (meeting expenses) only matches amounts ≤ ¥10K; if the amount exceeds ¥10K, the transaction is automatically redirected to 交際費 (entertainment expenses) via `amount_overflow_category`.
5. **Apply special patterns** (if defined): e.g., `transfer_professional` requires both the keyword AND a 振込 (bank transfer) verb in the 摘要.
6. If no category matches, fall back to **Stage 2** (the Claude API classifier, which reports confidence as high/medium/low).
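
The six steps can be sketched in TypeScript (the language of the package's tests). This is a minimal illustration under stated assumptions, not the package's `keyword-classifier.js`: `classify`, `normalize`, and the `max_amount` / `requires_all` fields are hypothetical names, and `String.prototype.normalize("NFKC")` stands in for the full-width → half-width step. NFKC folds full-width ASCII such as `ＳＵＩＣＡ` to `SUICA`, but it also turns half-width katakana into full-width, which this README does not specify.

```typescript
// Hypothetical category shape. Only `id`, `keywords` and
// `amount_overflow_category` are named in this README; `max_amount` and
// `requires_all` are assumed names for the threshold and special-pattern data.
interface Category {
  id: string;
  keywords: string[];
  max_amount?: number;               // inclusive upper bound, e.g. 10000 for meeting_meal
  amount_overflow_category?: string; // where to send amounts above max_amount
  requires_all?: string[];           // extra tokens that must also appear, e.g. "振込"
}

type Result =
  | { stage: 1; category: string; matched: string }
  | { stage: 2 }; // no match: hand off to the Stage 2 (Claude API) classifier

// Step 1: NFKC folds full-width ASCII to half-width (and half-width katakana
// to full-width), then lowercase and trim.
function normalize(memo: string): string {
  return memo.normalize("NFKC").toLowerCase().trim();
}

function classify(categories: Category[], memo: string, amount: number): Result {
  const text = normalize(memo);
  // Steps 2-3: scan categories in array order; the first qualifying one wins.
  for (const cat of categories) {
    const hit = cat.keywords.find((kw) => text.includes(normalize(kw)));
    if (hit === undefined) continue;
    // Step 5: a special pattern also requires every extra token.
    // (Assumption: a failed pattern falls through to later categories.)
    if (cat.requires_all && !cat.requires_all.every((t) => text.includes(normalize(t)))) {
      continue;
    }
    // Step 4: amount threshold, redirecting to the overflow category.
    if (cat.max_amount !== undefined && amount > cat.max_amount) {
      if (cat.amount_overflow_category === undefined) continue; // over the limit: not this category
      return { stage: 1, category: cat.amount_overflow_category, matched: hit };
    }
    return { stage: 1, category: cat.id, matched: hit };
  }
  return { stage: 2 }; // Step 6
}

// Illustrative categories only; these keywords are NOT taken from
// jp-tax-baseline-v1.json.
const sample: Category[] = [
  { id: "travel", keywords: ["suica", "タクシー"] },
  { id: "meeting_meal", keywords: ["スターバックス"], max_amount: 10000, amount_overflow_category: "entertainment" },
  { id: "transfer_professional", keywords: ["税理士"], requires_all: ["振込"] },
];

console.log(classify(sample, "ＳＵＩＣＡ チャージ", 1000)); // full-width input still lands in travel
console.log(classify(sample, "スターバックス 渋谷", 15000)); // over ¥10K, redirected to entertainment
```

Scanning the array in order is what makes a category's position act as its priority; moving `transfer_professional` above `travel` changes results only for memos that hit keywords in both.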

## Versioning

- `v1.0.0` = initial baseline (14 categories × ~50 keywords each on average)
- `v1.x` = expand to 100 keywords per category through dogfooding iterations
- `v2.x` = community-tuned (keywords improved using the collective knowledge of Cockpit users)
- `v3.x` = AI-discovered keywords (AI finds new keywords in past anonymous data)

## How to extend

### Add a new keyword to an existing category

1. Open the relevant `*-tax-baseline-v*.json`.
2. Find the category by `id`.
3. Append your keyword to the `keywords` array.
4. Test: run the smoke test against your test fixtures.
5. Open a PR (the reviewer must verify that the keyword is unambiguous and likely to match real transactions).
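
Because matching is first-match-wins substring search, a new keyword is ambiguous whenever it overlaps, by substring, a keyword in another category: whichever category sits higher then decides the result. A minimal pre-PR check could look like the sketch below. It assumes a simplified `{ id, keywords }` category shape and the same NFKC/lowercase/trim normalization as the matcher; `findOverlaps` is a hypothetical helper, not part of the package.

```typescript
interface Category {
  id: string;
  keywords: string[];
}

interface Overlap {
  categoryId: string;
  existing: string;
  // "identical": same string after normalization
  // "existing-contains-candidate": memos hitting `existing` also hit the candidate
  // "candidate-contains-existing": memos hitting the candidate also hit `existing`
  relation: "identical" | "existing-contains-candidate" | "candidate-contains-existing";
}

// Assumed to mirror the matcher's normalization (NFKC, lowercase, trim).
const norm = (s: string): string => s.normalize("NFKC").toLowerCase().trim();

// List keywords in *other* categories that overlap the candidate by substring.
function findOverlaps(categories: Category[], targetId: string, candidate: string): Overlap[] {
  const c = norm(candidate);
  const overlaps: Overlap[] = [];
  for (const cat of categories) {
    if (cat.id === targetId) continue;
    for (const kw of cat.keywords) {
      const k = norm(kw);
      if (k === c) {
        overlaps.push({ categoryId: cat.id, existing: kw, relation: "identical" });
      } else if (k.includes(c)) {
        overlaps.push({ categoryId: cat.id, existing: kw, relation: "existing-contains-candidate" });
      } else if (c.includes(k)) {
        overlaps.push({ categoryId: cat.id, existing: kw, relation: "candidate-contains-existing" });
      }
    }
  }
  return overlaps;
}

// Illustrative data only: adding "モバイルSuica" to a hypothetical
// "communication" category collides with travel's "suica".
const categories: Category[] = [
  { id: "travel", keywords: ["suica", "タクシー"] },
  { id: "meeting_meal", keywords: ["カフェ"] },
];
console.log(findOverlaps(categories, "communication", "モバイルSuica"));
```

A `candidate-contains-existing` hit in a category that appears earlier in the array means the new keyword can never win, which is exactly what the reviewer is asked to rule out.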

### Add a new category

1. Define a new entry in the `categories` array with all required fields (`id`, `name_ja`, `freee_account_code`, `default_tax_code`, `keywords`).
2. Choose `freee_account_code` from freee's standard chart of accounts.
3. Determine `default_tax_code` based on Japanese 消費税 (consumption tax) rules.
4. Decide its priority, i.e. where it goes in the `categories` array (earlier = higher priority, since the first match wins).
5. Add at least 10 initial keywords to bootstrap it.
6. Run the smoke test.
7. Open a PR (the reviewer verifies that the category is genuinely needed and not a duplicate).
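
Steps 1 and 5, plus the "not a duplicate" review point, are easy to check mechanically before opening a PR. A minimal sketch, assuming the field names from step 1 and treating their value types as guesses (string for most fields, string or number for `freee_account_code`); `validateCategory` is a hypothetical helper, and the real `keyword-dict-schema.json` may enforce more or different constraints.

```typescript
// Required fields named in step 1; their value types are assumptions.
const REQUIRED_FIELDS = ["id", "name_ja", "freee_account_code", "default_tax_code"] as const;
const MIN_BOOTSTRAP_KEYWORDS = 10; // step 5

const isPresent = (v: unknown): boolean =>
  (typeof v === "string" && v.trim() !== "") || typeof v === "number";

// Hypothetical pre-PR check; returns human-readable problems (empty = OK).
function validateCategory(entry: Record<string, unknown>, existingIds: Set<string>): string[] {
  const problems: string[] = [];
  for (const field of REQUIRED_FIELDS) {
    if (!isPresent(entry[field])) problems.push(`missing or empty field: ${field}`);
  }
  const id = entry["id"];
  if (typeof id === "string" && existingIds.has(id)) {
    problems.push(`duplicate category id: ${id}`);
  }
  const keywords = entry["keywords"];
  if (!Array.isArray(keywords) || !keywords.every((k) => typeof k === "string")) {
    problems.push("keywords must be an array of strings");
  } else {
    if (keywords.length < MIN_BOOTSTRAP_KEYWORDS) {
      problems.push(`needs at least ${MIN_BOOTSTRAP_KEYWORDS} keywords, has ${keywords.length}`);
    }
    // A keyword that equals another after normalization is redundant.
    const unique = new Set(keywords.map((k: string) => k.normalize("NFKC").toLowerCase().trim()));
    if (unique.size !== keywords.length) problems.push("keywords contain duplicates after normalization");
  }
  return problems;
}
```

Returning a list rather than throwing on the first problem lets a contributor fix everything in one pass.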

### Add a jurisdiction-specific dictionary

1. Copy `jp-tax-baseline-v1.json` → `<country-code>-tax-baseline-v1.json`.
2. Update the `locale` and `tax_jurisdiction` fields.
3. Replace the categories with the target country's chart of accounts.
4. Provide a community maintainer (a regional accountant) to validate the dictionary.
5. Open a PR.
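
Steps 1 to 3 amount to cloning the Japanese file's envelope and swapping in new jurisdiction metadata. The sketch below is a minimal illustration under assumptions: the top-level shape is inferred from the field names this README mentions (`locale`, `tax_jurisdiction`, `maintainer`), whether `maintainer` is top-level is a guess, the locale values in the test are placeholders, and `deriveDictionary` / `fileNameFor` are hypothetical helpers rather than scripts shipped with the package.

```typescript
// Assumed top-level shape of a dictionary file.
interface KeywordDict {
  locale: string;
  tax_jurisdiction: string;
  maintainer?: string;
  categories: unknown[];
  [extra: string]: unknown; // any other template fields are carried over
}

// Hypothetical helper for steps 1-3 (and recording the step 4 maintainer).
function deriveDictionary(
  template: KeywordDict,
  locale: string,
  taxJurisdiction: string,
  maintainer: string,
): KeywordDict {
  // The dictionary is plain JSON, so a JSON round-trip is a faithful deep copy
  // and the template is left untouched.
  const copy = JSON.parse(JSON.stringify(template)) as KeywordDict;
  copy.locale = locale;
  copy.tax_jurisdiction = taxJurisdiction;
  copy.maintainer = maintainer;
  copy.categories = []; // rebuilt from the target country's chart of accounts
  return copy;
}

// Step 1 naming convention: <country-code>-tax-baseline-v1.json
const fileNameFor = (countryCode: string): string =>
  `${countryCode.toLowerCase()}-tax-baseline-v1.json`;
```

Starting from an empty `categories` array, rather than editing the Japanese categories in place, avoids carrying over freee account codes and 消費税 tax codes that have no meaning in another jurisdiction.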

## Partnership note

This dictionary is designed to be **co-authored with tax practitioners**, starting with the Year 1 closed beta:

- A practitioner's 14 categories × 100 keywords, built over six months, are contributed as an anonymous PR
- Co-author credit goes in the `maintainer` field
- Practitioner and community improvements are merged quarterly
- See `synapse-arrows-playbook/04-tooling-gaps/10-kansei-link-cockpit-strategy.md` §11 for partnership terms

## Validation

```bash
# Run schema validation (TODO: add to CI)
node scripts/validate-keyword-dict.js data/keyword-dict/*.json
```

## Smoke test scenarios

`packages/cockpit-mcp/tests/keyword-dict.test.ts` covers:

1. Basic substring match ("Suica" → travel)
2. Multi-keyword match (the higher category takes priority)
3. Amount threshold ("スターバックス @ ¥15,000", i.e. Starbucks at ¥15,000 → entertainment, NOT meeting_meal)
4. Special pattern: salary employee detection (給与 (salary) + match against the employee-name list)
5. Special pattern: 振込 (bank transfer) + 士業 title (licensed professional, e.g. tax accountant) → professional_fee, with counterparty extraction
6. Special pattern: 振込 + katakana personal name → outsourcing, with the accrual date adjusted to the end of the previous month
7. No match → Stage 2 fallback signal
8. Normalization: 摘要 strings mixing full-width, katakana, and half-width characters
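
Scenario 6 implies a small piece of date arithmetic. Reading 発生日前月末調整 as "book the accrual on the last day of the month before the transfer" (an interpretation; the README gives no further detail), a minimal sketch with a hypothetical `previousMonthEnd` helper follows. It works on UTC calendar fields, so the transfer date should be built as a UTC-midnight calendar date; date-only ISO strings such as `"2025-03-05"` are parsed that way by JavaScript.

```typescript
// Returns the last calendar day of the month before `transferDate` (UTC).
// Date.UTC(y, m, 0) is "day 0" of month m, i.e. the last day of month m - 1.
// This also rolls January back to December of the previous year and
// respects leap years.
function previousMonthEnd(transferDate: Date): Date {
  return new Date(Date.UTC(transferDate.getUTCFullYear(), transferDate.getUTCMonth(), 0));
}

// Format as YYYY-MM-DD for comparisons and logs.
const isoDay = (d: Date): string => d.toISOString().slice(0, 10);

console.log(isoDay(previousMonthEnd(new Date("2025-03-05")))); // 2025-02-28
```

Relying on the `Date.UTC` day-0 rollover avoids hand-written month-length tables.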

## See also

- Schema: [`../keyword-dict-schema.json`](../keyword-dict-schema.json)
- Architecture: [`../../docs/architecture.md`](../../docs/architecture.md)
- 7 exclusion rules: [`../exclusion-rules/README.md`](../exclusion-rules/README.md) (Stage 0: applied before the classifier runs)
- Strategy: [synapse-arrows-playbook Doc 10](https://github.com/michielinksee/synapse-arrows-playbook/blob/main/04-tooling-gaps/10-kansei-link-cockpit-strategy.md)