@dizzlkheinz/ynab-mcpb 0.16.1 → 0.17.0
- package/.code/agents/0098661e-0fa3-4990-beb9-c0cbf3f123aa/status.txt +1 -0
- package/.code/agents/1324/exec-call_tIpx9uV1TpARbAMZonRQm8AO.txt +757 -0
- package/.code/agents/1572/exec-call_GjVFBFOWcY7lE0idc5nWlLNh.txt +781 -0
- package/.code/agents/1846/exec-call_1YNAVD18RjrMN7JnfkkQhUP3.txt +766 -0
- package/.code/agents/1846/exec-call_lh3lDzE4WJAh1lFiomiiZ73D.txt +766 -0
- package/.code/agents/2038/exec-call_DYwOukaYsL8VCONWmV2rUW5u.txt +766 -0
- package/.code/agents/2038/exec-call_c7fOQ7UrpVcTtvdfGBRM146V.txt +652 -0
- package/.code/agents/2038/exec-call_ySNyq9Mm55jWE480s54r5QcA.txt +766 -0
- package/.code/agents/2256/exec-call_AtPcRWPmFPMcmX6qOFm1fCEY.txt +766 -0
- package/.code/agents/2454/exec-call_aFJpupwjfZeOBm7ixI5Vc8z2.txt +766 -0
- package/.code/agents/2454/exec-call_wogZ4HfXTodTEXvdgXlVUBpv.txt +766 -0
- package/.code/agents/2e905864-aa07-4314-bcf9-c5b32277e4ac/result.txt +36 -0
- package/.code/agents/3073/exec-call_Peeagc9DxGYLgE6pNdMZhqIE.txt +766 -0
- package/.code/agents/3073/exec-call_d2YSE3hXF08KRSoUM3qd8Z3x.txt +766 -0
- package/.code/agents/335aa031-466d-4fb7-925f-3cd864e264d0/result.txt +191 -0
- package/.code/agents/3364/exec-call_NbhIrsM5HhyDZDmJZG5CuCYL.txt +766 -0
- package/.code/agents/3364/exec-call_cKtJg0NrXiwXEFwlsE3uPZRA.txt +766 -0
- package/.code/agents/36d98414-5cde-4d9d-9a67-a240a18c1f07/result.txt +189 -0
- package/.code/agents/4604e866-b7b8-44f5-992f-2f683b0a523b/status.txt +1 -0
- package/.code/agents/5f8dc01c-47b3-4163-b0b3-aa31be89fcdc/status.txt +1 -0
- package/.code/agents/7/exec-call_HltHpkDox0Zm1vGEjdksUgpE.txt +1120 -0
- package/.code/agents/7/exec-call_LCATrOPPAgbxW9Q1z0XaVi2E.txt +2646 -0
- package/.code/agents/7/exec-call_W8DeRfNG9hvbgVFvf0clBf6R.txt +2646 -0
- package/.code/agents/94a0ddf3-a304-4ec3-913e-3cceef509948/error.txt +1 -0
- package/.code/agents/e2c752b7-711d-423a-af57-f53c809deb84/result.txt +160 -0
- package/.code/agents/e6601719-c31f-4a0e-8c71-d70787d0ab71/status.txt +1 -0
- package/.code/agents/f250b7ed-5bd5-4036-aa8c-ce63caee7d61/result.txt +20 -0
- package/AGENTS.md +1 -36
- package/CLAUDE.md +28 -43
- package/NUL +0 -1
- package/README.md +8 -10
- package/dist/bundle/index.cjs +41 -41
- package/dist/server/YNABMCPServer.js +28 -381
- package/dist/server/config.d.ts +2 -0
- package/dist/server/config.js +1 -0
- package/dist/tools/accountTools.d.ts +2 -0
- package/dist/tools/accountTools.js +45 -0
- package/dist/tools/adapters.d.ts +12 -0
- package/dist/tools/adapters.js +25 -0
- package/dist/tools/budgetTools.d.ts +2 -0
- package/dist/tools/budgetTools.js +30 -0
- package/dist/tools/categoryTools.d.ts +2 -0
- package/dist/tools/categoryTools.js +45 -0
- package/dist/tools/monthTools.d.ts +2 -0
- package/dist/tools/monthTools.js +32 -0
- package/dist/tools/payeeTools.d.ts +2 -0
- package/dist/tools/payeeTools.js +32 -0
- package/dist/tools/reconciliation/index.d.ts +2 -0
- package/dist/tools/reconciliation/index.js +33 -0
- package/dist/tools/schemas/common.d.ts +3 -0
- package/dist/tools/schemas/common.js +3 -0
- package/dist/tools/schemas/outputs/comparisonOutputs.d.ts +1 -1
- package/dist/tools/transactionTools.d.ts +2 -0
- package/dist/tools/transactionTools.js +124 -0
- package/dist/tools/utilityTools.d.ts +3 -1
- package/dist/tools/utilityTools.js +32 -2
- package/dist/types/index.d.ts +1 -0
- package/dist/types/toolRegistration.d.ts +27 -0
- package/dist/types/toolRegistration.js +1 -0
- package/package.json +2 -2
- package/scripts/run-domain-integration-tests.js +4 -1
- package/src/__tests__/workflows.e2e.test.ts +1 -7
- package/src/server/YNABMCPServer.ts +33 -519
- package/src/server/__tests__/toolRegistration.test.ts +236 -0
- package/src/server/config.ts +1 -0
- package/src/tools/__tests__/adapters.test.ts +113 -0
- package/src/tools/__tests__/utilityTools.test.ts +7 -7
- package/src/tools/accountTools.ts +53 -0
- package/src/tools/adapters.ts +74 -0
- package/src/tools/budgetTools.ts +37 -0
- package/src/tools/categoryTools.ts +53 -0
- package/src/tools/monthTools.ts +39 -0
- package/src/tools/payeeTools.ts +39 -0
- package/src/tools/reconciliation/index.ts +45 -0
- package/src/tools/schemas/common.ts +18 -0
- package/src/tools/transactionTools.ts +140 -0
- package/src/tools/utilityTools.ts +42 -2
- package/src/types/index.ts +3 -0
- package/src/types/toolRegistration.ts +88 -0
- package/.github/workflows/pr-description-check.yml +0 -88
- package/docs/README.md +0 -72
- package/docs/getting-started/CONFIGURATION.md +0 -175
- package/docs/getting-started/INSTALLATION.md +0 -333
- package/docs/getting-started/QUICKSTART.md +0 -282
- package/docs/guides/ARCHITECTURE.md +0 -533
- package/docs/guides/DEPLOYMENT.md +0 -189
- package/docs/guides/INTEGRATION_TESTING.md +0 -730
- package/docs/guides/TESTING.md +0 -591
- package/docs/reconciliation-flow.md +0 -83
- package/docs/reference/EXAMPLES.md +0 -946
- package/docs/reference/TOOLS.md +0 -348
- package/docs/reference/TROUBLESHOOTING.md +0 -481
package/docs/guides/TESTING.md
DELETED
@@ -1,591 +0,0 @@
-# YNAB MCP Server - Testing Guide
-
-Comprehensive testing guide covering automated tests, manual test scenarios, and quality assurance processes.
-
-## Table of Contents
-
-- [Overview](#overview)
-- [Test Structure](#test-structure)
-- [Running Tests](#running-tests)
-- [Environment Setup](#environment-setup)
-- [Test Types](#test-types)
-- [Coverage Requirements](#coverage-requirements)
-- [Manual Test Scenarios](#manual-test-scenarios)
-- [Test Data Management](#test-data-management)
-- [Common Issues](#common-issues)
-
-## Overview
-
-The YNAB MCP Server includes both automated and manual testing capabilities:
-
-**Automated Tests**:
-1. **Unit Tests** - Test individual components in isolation
-2. **Integration Tests** - Test component interactions with mocked dependencies
-3. **End-to-End Tests** - Test complete workflows with real YNAB API (optional)
-4. **Performance Tests** - Test response times, memory usage, and load handling
-
-**Manual Testing**:
-- Comprehensive test scenarios for Claude Desktop integration
-- Feature verification workflows
-- Performance and reliability validation
-
-## Test Structure
-
-```
-src/
-├── __tests__/ # Global test utilities and E2E tests
-│ ├── setup.ts # Test environment setup
-│ ├── testUtils.ts # Shared test utilities
-│ ├── testRunner.ts # Comprehensive test runner
-│ ├── workflows.e2e.test.ts # End-to-end workflow tests
-│ ├── comprehensive.integration.test.ts # Integration tests
-│ └── performance.test.ts # Performance and load tests
-├── server/__tests__/ # Server component tests
-├── tools/__tests__/ # Tool-specific tests
-└── types/__tests__/ # Type definition tests
-```
-
-## Running Tests
-
-### Quick Test Commands
-
-```bash
-# Run all tests with coverage
-npm test
-
-# Run specific test types
-npm run test:unit # Unit tests only (fast, mocked)
-npm run test:integration # Integration tests (mocked API)
-npm run test:e2e # End-to-end tests (real API)
-npm run test:performance # Performance tests
-
-# Generate coverage report
-npm run test:coverage
-
-# Run comprehensive test suite with detailed reporting
-npm run test:comprehensive
-
-# Watch mode for test development
-npm run test:watch
-```
-
-## Environment Setup
-
-### For Unit and Integration Tests (using mocks):
-```bash
-# Optional - will use mock token if not provided
-YNAB_ACCESS_TOKEN=your_test_token
-```
-
-### For End-to-End Tests (using real YNAB API):
-```bash
-# Required for E2E tests
-YNAB_ACCESS_TOKEN=your_real_ynab_personal_access_token
-
-# Optional - specify test budget/account IDs
-TEST_BUDGET_ID=your_test_budget_id
-TEST_ACCOUNT_ID=your_test_account_id
-
-# Optional - skip E2E tests
-SKIP_E2E_TESTS=true
-```
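Reviewer note: the environment variables above imply a simple gate for the E2E suite. The following is a minimal sketch, assuming E2E tests are skipped when `YNAB_ACCESS_TOKEN` is missing or `SKIP_E2E_TESTS` is `true` (the deleted guide's "E2E tests skipped" message suggests both conditions). The `Env` type and `shouldSkipE2E` helper are hypothetical names and are not taken from the package source.

```typescript
// Hypothetical helper, not part of the package: decides whether the E2E suite should be skipped.
type Env = Record<string, string | undefined>;

function shouldSkipE2E(env: Env): boolean {
  // A missing or blank token means there is no real API to talk to.
  const token = (env["YNAB_ACCESS_TOKEN"] ?? "").trim();
  // The documented opt-out flag; the case-insensitive comparison is an assumption.
  const skipFlag = (env["SKIP_E2E_TESTS"] ?? "").trim().toLowerCase();
  return token.length === 0 || skipFlag === "true";
}
```

With Vitest, a predicate like this could gate a suite through `describe.skipIf(shouldSkipE2E(process.env))`, so the decision lives in one place instead of being repeated per test file.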
-
-## Test Types
-
-### Unit Tests
-- Test individual functions and classes in isolation
-- Use mocked dependencies
-- Fast execution (< 10 seconds)
-- No external API calls
-
-### Integration Tests
-- Test component interactions
-- Use mocked YNAB API responses
-- Validate complete tool workflows
-- Medium execution time (10-30 seconds)
-
-### End-to-End Tests
-- Test against real YNAB API
-- Validate complete user workflows
-- Slower execution (30-60 seconds)
-- **Warning**: Creates real data in your test budget
-
-### Performance Tests
-- Test response times and memory usage
-- Validate performance under load
-- Test error handling performance
-- Medium execution time (15-30 seconds)
-
-## Coverage Requirements
-
-The test suite enforces minimum coverage thresholds:
-
-- **Lines**: 80%
-- **Functions**: 80%
-- **Branches**: 80%
-- **Statements**: 80%
-
----
-
-# Manual Test Scenarios
-
-Comprehensive test scenarios for manually validating the YNAB MCP server with Claude Desktop.
-
-## 1. Setup Verification Tests
-
-### 1.1 Server Startup and Connection
-
-**Objective**: Verify the server starts successfully and Claude Desktop connects.
-
-**Steps**:
-1. Build the project: `npm run build`
-2. Configure Claude Desktop with MCP server settings
-3. Restart Claude Desktop
-4. Check MCP servers list in Claude Desktop
-
-**Expected Results**:
-- Build completes without errors
-- Claude Desktop shows "ynab-mcp-server" in connected servers
-- No connection errors in Claude Desktop logs
-
-**Success Criteria**: Server appears as connected in Claude Desktop interface
-
-### 1.2 YNAB Token Authentication
-
-**Objective**: Verify YNAB Personal Access Token is valid and working.
-
-**Steps**:
-1. Ask Claude: "Can you run the diagnostic_info tool?"
-2. Check the returned authentication status
-3. Verify user information is retrieved
-
-**Expected Results**:
-- Diagnostic info returns successfully
-- Authentication status shows "authenticated: true"
-- User information includes YNAB user details
-
-**Success Criteria**: No authentication errors, user data present
-
-### 1.3 System Status Verification
-
-**Objective**: Verify all server components are initialized properly.
-
-**Steps**:
-1. Run diagnostic_info tool
-2. Review system configuration
-3. Check cache initialization
-4. Verify environment variables
-
-**Expected Results**:
-- All services report healthy status
-- Cache is initialized with correct settings
-- Environment variables loaded properly
-- Tool registry shows all tools
-
-**Success Criteria**: All system components report healthy status
-
-## 2. Basic Functionality Tests
-
-### 2.1 Budget Management
-
-**Objective**: Test basic budget listing and selection functionality.
-
-**Steps**:
-1. Ask Claude: "List my YNAB budgets"
-2. Note the budget names returned
-3. Ask Claude: "Set my default budget to [budget_name]"
-4. Ask Claude: "What is my current default budget?"
-
-**Expected Results**:
-- Budget list returns user's budgets with names and IDs
-- Default budget is set successfully
-- Cache warming is triggered automatically
-- Default budget query returns the selected budget
-
-**Success Criteria**: Budget operations work without errors, cache warming occurs
-
-### 2.2 Account Listing
-
-**Objective**: Test account retrieval and caching behavior.
-
-**Steps**:
-1. Ask Claude: "List my accounts" (first time)
-2. Note response time
-3. Ask Claude: "List my accounts" (second time)
-4. Compare response times
-5. Check diagnostic_info for cache hits
-
-**Expected Results**:
-- First request fetches from YNAB API
-- Second request is faster (cache hit)
-- Both requests return identical account data
-- Cache metrics show hit count increase
-
-**Success Criteria**: Caching improves response time, data consistency maintained
-
-### 2.3 Transaction Retrieval
-
-**Objective**: Test transaction listing with various filters.
-
-**Steps**:
-1. Ask Claude: "Show me recent transactions"
-2. Ask Claude: "Show me transactions from a specific account"
-3. Ask Claude: "Show me transactions from the last 30 days"
-4. Ask Claude: "Show me uncategorized transactions"
-
-**Expected Results**:
-- All transaction queries return appropriate data
-- Filters work correctly (account, date range, categorization status)
-- Response times are reasonable
-- Data format is consistent
-
-**Success Criteria**: All transaction filters work correctly, consistent formatting
-
-## 3. Enhanced Caching Tests
-
-### 3.1 Cache Warming Verification
-
-**Objective**: Verify cache warming works after setting default budget.
-
-**Steps**:
-1. Clear cache (restart server or use diagnostic tools)
-2. Set default budget
-3. Check cache metrics immediately after
-4. Verify accounts, categories, and payees are cached
-
-**Expected Results**:
-- Cache warming triggers automatically
-- Accounts, categories, and payees are pre-loaded
-- Subsequent requests for these data types are fast
-- Cache hit rate improves dramatically
-
-**Success Criteria**: Cache warming pre-loads commonly used data
-
-### 3.2 LRU Eviction Testing
-
-**Objective**: Test cache eviction when limits are reached.
-
-**Steps**:
-1. Set cache limit to low value (via environment variables)
-2. Request data for multiple different filters
-3. Check cache metrics for evictions
-4. Verify least recently used items are evicted first
-
-**Expected Results**:
-- Cache respects maximum entry limits
-- Older entries are evicted as new ones are added
-- Most frequently accessed data remains cached
-- No memory leaks occur
-
-**Success Criteria**: LRU eviction maintains cache within limits
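Reviewer note: the eviction behaviour that scenario 3.2 checks can be illustrated with a small least-recently-used cache. This is a sketch under stated assumptions, not the server's cache: the diff does not show the real implementation, and the `LruCache` class, its `maxEntries` limit, and its `evictions` counter are hypothetical names. It relies on the fact that a JavaScript `Map` iterates keys in insertion order.

```typescript
// Illustrative LRU cache; the package's actual cache manager is not shown in this diff.
class LruCache<K, V> {
  private readonly entries = new Map<K, V>();
  private readonly maxEntries: number;
  evictions = 0;

  constructor(maxEntries: number) {
    if (maxEntries < 1) throw new RangeError("maxEntries must be at least 1");
    this.maxEntries = maxEntries;
  }

  get(key: K): V | undefined {
    if (!this.entries.has(key)) return undefined;
    const value = this.entries.get(key) as V;
    // Re-insert so the key moves to the "most recently used" end of the Map.
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  set(key: K, value: V): void {
    if (this.entries.has(key)) {
      this.entries.delete(key);
    } else if (this.entries.size >= this.maxEntries) {
      // The first key in iteration order is the least recently used one.
      const oldest = this.entries.keys().next().value as K;
      this.entries.delete(oldest);
      this.evictions += 1;
    }
    this.entries.set(key, value);
  }

  has(key: K): boolean {
    return this.entries.has(key);
  }

  get size(): number {
    return this.entries.size;
  }
}
```

Note that LRU ranks entries by recency of access, not by frequency, so a manual check should read an entry again just before filling the cache and then confirm that entry survives.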
-
-### 3.3 Stale-While-Revalidate Testing
-
-**Objective**: Test stale data serving while refreshing in background.
-
-**Steps**:
-1. Cache some data and wait for it to become stale
-2. Request the stale data
-3. Verify immediate response with stale data
-4. Confirm background refresh occurs
-
-**Expected Results**:
-- Stale data is served immediately for fast response
-- Background refresh updates the cache
-- User gets immediate response, cache stays fresh
-- No blocking on refresh operations
-
-**Success Criteria**: Stale-while-revalidate provides fast responses while maintaining freshness
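Reviewer note: scenario 3.3 describes serving a stale value immediately while a refresh runs in the background. The sketch below shows that pattern in isolation. It is an assumption-laden illustration, not the server's code: `SwrCache`, `ttlMs`, and the injectable `now` clock are hypothetical, and a real cache would likely also bound how stale a served entry may be.

```typescript
// Illustrative stale-while-revalidate cache; names and structure are assumptions.
interface SwrEntry<V> {
  value: V;
  storedAt: number;
}

class SwrCache<V> {
  private readonly entries = new Map<string, SwrEntry<V>>();
  private readonly inflight = new Map<string, Promise<V>>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  async get(key: string, fetcher: () => Promise<V>): Promise<V> {
    const entry = this.entries.get(key);
    if (entry === undefined) {
      // Cold miss: nothing to serve, so the caller has to wait for the fetch.
      return this.refresh(key, fetcher);
    }
    if (this.now() - entry.storedAt > this.ttlMs) {
      // Stale: start a refresh but do not await it; failures keep the old value.
      void this.refresh(key, fetcher).catch(() => undefined);
    }
    return entry.value;
  }

  private refresh(key: string, fetcher: () => Promise<V>): Promise<V> {
    // Share one in-flight request per key so concurrent callers do not duplicate work.
    const pending = this.inflight.get(key);
    if (pending !== undefined) return pending;
    const request = fetcher()
      .then((value) => {
        this.entries.set(key, { value, storedAt: this.now() });
        return value;
      })
      .finally(() => {
        this.inflight.delete(key);
      });
    this.inflight.set(key, request);
    return request;
  }
}
```

Injecting the clock makes the "wait for it to become stale" step testable without real delays, which may be useful if this scenario is ever automated.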
-
-## 4. Tool Registry Tests
-
-### 4.1 All Tools Accessibility
-
-**Objective**: Verify all tools are accessible through Claude Desktop.
-
-**Steps**:
-1. Ask Claude to list available YNAB tools
-2. Test a selection of tools from different categories:
-- Budget management (list_budgets, set_default_budget)
-- Account management (list_accounts, get_account)
-- Transaction management (list_transactions, create_transaction)
-- Monthly data analysis (get_month, list_months)
-- Utility tools (diagnostic_info, convert_amount)
-
-**Expected Results**:
-- All tools are accessible and respond correctly
-- Parameter validation works consistently
-- Error handling is uniform across tools
-- Tool descriptions are helpful and accurate
-
-**Success Criteria**: All tools accessible with consistent behavior
-
-### 4.2 Parameter Validation
-
-**Objective**: Test parameter validation across different tools.
-
-**Steps**:
-1. Try tools with missing required parameters
-2. Try tools with invalid parameter values
-3. Try tools with correct parameters
-4. Test optional parameter handling
-
-**Expected Results**:
-- Missing required parameters result in clear error messages
-- Invalid parameters are rejected with helpful guidance
-- Valid parameters are processed correctly
-- Optional parameters work when provided or omitted
-
-**Success Criteria**: Consistent parameter validation with helpful error messages
-
-## 5. Transaction Management Tests
-
-### 5.1 Transaction Creation
-
-**Objective**: Test creating new transactions.
-
-**Steps**:
-1. Ask Claude: "Create a test transaction for $10.00 groceries"
-2. Verify transaction appears in YNAB
-3. Check transaction details
-4. Clean up test transaction
-
-**Expected Results**:
-- Transaction is created successfully
-- All details are recorded correctly
-- Transaction appears in YNAB interface
-- Appropriate account and category are used
-
-**Success Criteria**: Transaction creation works with accurate data
-
-### 5.2 Transaction Export
-
-**Objective**: Test transaction export functionality.
-
-**Steps**:
-1. Ask Claude: "Export my transactions to a file"
-2. Check export directory for file
-3. Review exported data format
-4. Verify data completeness
-
-**Expected Results**:
-- Export file is created in correct directory
-- File contains accurate transaction data
-- Format is readable and well-structured
-- All requested transactions are included
-
-**Success Criteria**: Export creates complete, accurate files
-
-### 5.3 CSV Comparison
-
-**Objective**: Test CSV comparison functionality.
-
-**Steps**:
-1. Use a sample CSV file
-2. Ask Claude: "Compare this CSV with my YNAB transactions"
-3. Review matching results
-4. Check unmatched transaction identification
-
-**Expected Results**:
-- CSV parsing works correctly
-- Transaction matching algorithms function properly
-- Unmatched transactions are identified
-- Clear reporting of comparison results
-
-**Success Criteria**: CSV comparison accurately identifies matches and discrepancies
-
-### 5.4 Account Reconciliation
-
-**Objective**: Test comprehensive account reconciliation.
-
-**Steps**:
-1. Prepare CSV export from your bank
-2. Ask Claude: "Reconcile my checking account with this CSV"
-3. Review matching analysis
-4. Check balance verification
-5. Review recommendations
-
-**Expected Results**:
-- Smart duplicate matching works correctly
-- Automatic date adjustment handles timezone issues
-- Balance matching provides exact reconciliation
-- Comprehensive reporting shows all details
-
-**Success Criteria**: Reconciliation accurately matches transactions and balances
-
-## 6. Error Handling Tests
-
-### 6.1 Missing Budget ID Scenarios
-
-**Objective**: Test behavior when no default budget is set.
-
-**Steps**:
-1. Clear default budget setting
-2. Try tools that require budget context
-3. Check error messages
-4. Verify recovery guidance
-
-**Expected Results**:
-- Clear error messages about missing budget
-- Helpful guidance on setting default budget
-- No system crashes or unclear errors
-- Easy recovery path provided
-
-**Success Criteria**: Clear error messages with actionable recovery steps
-
-### 6.2 Invalid Parameter Testing
-
-**Objective**: Test handling of invalid parameters.
-
-**Steps**:
-1. Try tools with malformed parameters
-2. Test with out-of-range values
-3. Try with incorrect data types
-4. Test with missing required fields
-
-**Expected Results**:
-- Validation catches all invalid parameters
-- Error messages clearly identify the problem
-- Suggestions for correct parameter format
-- No system instability from bad inputs
-
-**Success Criteria**: Robust parameter validation with helpful error messages
-
-### 6.3 YNAB API Error Scenarios
-
-**Objective**: Test handling of YNAB API errors.
-
-**Steps**:
-1. Test with expired token (if possible)
-2. Test during YNAB API maintenance
-3. Test with network connectivity issues
-4. Test with rate limiting scenarios
-
-**Expected Results**:
-- Graceful handling of API errors
-- Clear error messages about external issues
-- No server crashes or hangs
-- Appropriate retry mechanisms
-
-**Success Criteria**: Robust error handling for external API issues
-
-## 7. Performance Tests
-
-### 7.1 Response Time Verification
-
-**Objective**: Verify acceptable response times with and without caching.
-
-**Steps**:
-1. Measure response times for fresh requests
-2. Measure response times for cached requests
-3. Compare performance improvements
-4. Test with large data sets
-
-**Expected Results**:
-- Fresh requests complete within reasonable time (< 5 seconds)
-- Cached requests are significantly faster (< 1 second)
-- Large data sets are handled efficiently
-- No performance degradation over time
-
-**Success Criteria**: Response times meet performance expectations
-
-### 7.2 Concurrent Request Handling
-
-**Objective**: Test server behavior under concurrent load.
-
-**Steps**:
-1. Make multiple simultaneous requests
-2. Check for race conditions
-3. Verify data consistency
-4. Monitor resource usage
-
-**Expected Results**:
-- Concurrent requests handled properly
-- No race conditions in cache or data
-- Consistent results across all requests
-- Reasonable resource usage
-
-**Success Criteria**: Stable performance under concurrent load
-
-### 7.3 Memory Usage Monitoring
-
-**Objective**: Verify memory usage remains stable during extended use.
-
-**Steps**:
-1. Monitor baseline memory usage
-2. Perform extended testing session
-3. Check for memory leaks
-4. Verify cache size limits are respected
-
-**Expected Results**:
-- Memory usage remains stable over time
-- No significant memory leaks detected
-- Cache eviction prevents unbounded growth
-- Resource usage stays within acceptable limits
-
-**Success Criteria**: Stable memory usage without leaks
-
-## Test Data Management
-
-### Mock Data
-- Unit and integration tests use mock data
-- Mock responses are defined in test files
-- No real API calls or data modification
-
-### E2E Test Data
-- E2E tests create real data in your YNAB budget
-- Test transactions are automatically cleaned up
-- Test accounts cannot be deleted via API (manual cleanup required)
-- Use a dedicated test budget to avoid affecting real data
-
-## Common Issues and Solutions
-
-### E2E Tests Skipped
-```
-⏭️ E2E tests skipped (no API key or SKIP_E2E_TESTS=true)
-```
-**Solution**: Set `YNAB_ACCESS_TOKEN` environment variable
-
-### Coverage Below Threshold
-```
-⚠️ Coverage below target (<80%)
-```
-**Solution**: Add more tests or remove untestable code from coverage
-
-### Performance Tests Failing
-```
-❌ Performance assertion failed: 1500ms > 1000ms
-```
-**Solution**: Optimize code or adjust performance thresholds
-
-### Connection Errors in Claude Desktop
-**Solution**:
-1. Verify Node.js version (18+)
-2. Check build completed successfully
-3. Verify MCP server configuration
-4. Restart Claude Desktop completely
-
-## Test Execution Guidelines
-
-1. **Prerequisites**: Ensure .env file is configured with valid YNAB token
-2. **Environment**: Use development configuration for detailed logging
-3. **Documentation**: Record results for each test scenario
-4. **Issues**: Log any problems with steps to reproduce
-5. **Performance**: Record timing measurements for performance tests
-6. **Cleanup**: Clean up test data after testing (transactions, exports)
-
-## Success Criteria Summary
-
-- ✅ All basic functionality works correctly
-- ✅ Enhanced caching provides performance improvements
-- ✅ Error handling is robust and helpful
-- ✅ All tools are accessible and functional
-- ✅ Transaction management works reliably
-- ✅ Performance meets or exceeds expectations
-- ✅ Integration with Claude Desktop is seamless
-- ✅ Security and reliability standards are met
-
----
-
-For automated testing implementation details, see the source code in `src/__tests__/`.
-For additional testing checklists, see `../development/TESTING_CHECKLIST.md`.
@@ -1,83 +0,0 @@
|
|
|
1
|
-
---
|
|
2
|
-
title: 'Automated Reconciliation Flow'
|
|
3
|
-
status: 'active'
|
|
4
|
-
last_updated: '2025-11-12'
|
|
5
|
-
owners:
|
|
6
|
-
- '@ynab-mcpb/tooling'
|
|
7
|
-
related_docs:
|
|
8
|
-
- reference/API.md#reconcile_account
|
|
9
|
-
- reference/TOOLS.md#reconcile_account
|
|
10
|
-
- guides/TESTING.md#comprehensive-account-reconciliation
|
|
11
|
-
- reference/TROUBLESHOOTING.md#reconciliation
|
|
12
|
-
---
|
|
13
|
-
|
|
14
|
-
# Automated Reconciliation Flow
|
|
15
|
-
|
|
16
|
-
Deterministic playbook for reconciling a YNAB account with a bank statement inside the MCP host. The flow runs newest → oldest, stops the moment balances align, and emits both a narrative and machine-readable payload for assistants and downstream automation.
|
|
17
|
-
|
|
18
|
-
## Prerequisites & Environment
|
|
19
|
-
- Provide a valid `.env` cloned from `.env.example`, including `YNAB_ACCESS_TOKEN`, cache knobs, and any per-budget rate limits. Run `npm run validate-env` whenever secrets change.
|
|
20
|
-
- Install dependencies with `npm install`, keep `node` ≥ 20, and prefer `npm run dev` while editing to recompile TypeScript incrementally.
|
|
21
|
-
- Reconciliation tools assume access to CSV statements on disk (`csv_file_path`) or piped data (`csv_data`). Files should stay inside the workspace to avoid sandbox denials.
|
|
22
|
-
- Vitest snapshot directories (`test-results/`) must be writable; dry-run audits reference these to highlight regression diffs.
|
|
23
|
-
|
|
24
|
-
## Schema & Data Contracts
|
|
25
|
-
- **Input contract** – `ReconcileAccountSchema` (`src/tools/reconciliation/index.ts`) enforces budget/account ids, CSV source, statement balance, and guard rails like date/amount tolerances, automation toggles, and confidence thresholds. Every entry path calls `ReconcileAccountSchema.parse(...)` before touching the YNAB API.
|
|
26
|
-
- **CSV normalization** – `autoDetectCSVFormat` plus `extractDateRangeFromCSV` deduce header presence, delimiter, debit/credit pairs, and generate the reconciliation window (min/max ± 5 days buffer).
|
|
27
|
-
- **Structured output** – `buildReconciliationPayload` + `responseFormatter` return `version: '2.0'` JSON (see `docs/schemas/reconciliation-v2.json`) alongside the human narrative. This payload captures matches, actions, balance deltas, and flags like `audit_trail_complete`.
|
|
28
|
-
- **Audit snapshot** – `buildBalanceReconciliation` records `precision_calculations`, `discrepancy_analysis`, and `final_verification` booleans, ensuring downstream tooling can prove reconciliation outcomes without re-querying YNAB.
|
|
29
|
-
|
|
30
|
-
## Configuration Knobs (Schema Excerpts)
|
|
31
|
-
- Matching tolerances: `date_tolerance_days` (0-7, default 5) and `amount_tolerance_cents` (default 1¢) gate candidate searches; `confidence_threshold` (0.8) controls risk when auto-clearing.
|
|
32
|
-
- Automation toggles: `auto_create_transactions`, `auto_update_cleared_status`, `auto_adjust_dates`, `auto_unclear_missing`, `dry_run`, and `balance_verification_mode` (`ANALYSIS_ONLY`, `GUIDED_RESOLUTION`, `AUTO_RESOLVE`).
|
|
33
|
-
- CSV format overrides: `csv_format.{date_column, amount_column, debit_column, credit_column, date_format, has_header, delimiter}` keep unusual exports usable without retooling the parser.
|
|
34
|
-
- Safety rails: `require_exact_match` and `max_resolution_attempts` prevent runaway loops; `include_structured_data` controls whether assistants receive the payload blob.
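
As a rough illustration of how these knobs might be defaulted and range-checked, the sketch below mirrors the documented defaults in plain TypeScript. The `ReconcileKnobs` type and `withDefaults` helper are hypothetical and stand in for, rather than reproduce, `ReconcileAccountSchema`. Only the field names, the 0-7 range, and the 5 / 1¢ / 0.8 defaults come from the bullets above; the `dry_run` and `balance_verification_mode` defaults and the 0-1 confidence range are assumptions.

```typescript
// Hypothetical sketch: not the project's ReconcileAccountSchema.
type BalanceVerificationMode = "ANALYSIS_ONLY" | "GUIDED_RESOLUTION" | "AUTO_RESOLVE";

interface ReconcileKnobs {
  date_tolerance_days: number; // documented: 0-7, default 5
  amount_tolerance_cents: number; // documented: default 1
  confidence_threshold: number; // documented: 0.8
  dry_run: boolean;
  balance_verification_mode: BalanceVerificationMode;
}

const KNOB_DEFAULTS: ReconcileKnobs = {
  date_tolerance_days: 5,
  amount_tolerance_cents: 1,
  confidence_threshold: 0.8,
  dry_run: true, // assumption: default not stated in this document
  balance_verification_mode: "ANALYSIS_ONLY", // assumption: default not stated
};

// Merge caller overrides onto the defaults, then reject out-of-range values
// before any API call would be made.
function withDefaults(input: Partial<ReconcileKnobs>): ReconcileKnobs {
  const knobs: ReconcileKnobs = { ...KNOB_DEFAULTS, ...input };
  const days = knobs.date_tolerance_days;
  if (!Number.isInteger(days) || days < 0 || days > 7) {
    throw new RangeError("date_tolerance_days must be an integer between 0 and 7");
  }
  if (knobs.amount_tolerance_cents < 0) {
    throw new RangeError("amount_tolerance_cents must not be negative");
  }
  // Assumption: confidence is expressed as a fraction in [0, 1].
  if (knobs.confidence_threshold < 0 || knobs.confidence_threshold > 1) {
    throw new RangeError("confidence_threshold must be between 0 and 1");
  }
  return knobs;
}
```

Calling `withDefaults({ date_tolerance_days: 3 })` keeps the other documented defaults, while `withDefaults({ date_tolerance_days: 9 })` throws a `RangeError` instead of letting an over-wide search window through.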
|
|
35
|
-
|
|
36
|
-
## Logging & Auditability
|
|
37
|
-
- Every mutation funnels through `responseFormatter` with `execution.summary` stats plus `matches_found`, `transactions_created`, and `transactions_updated` counts for dashboards.
|
|
38
|
-
- `buildBalanceReconciliation` emits `audit_trail_complete`, balance math, and likely-cause hints whenever a discrepancy persists.
|
|
39
|
-
- `executor.ts` annotates each action with reasons (e.g., "marked as cleared, date adjusted"), giving a linear log for SOC review.
|
|
40
|
-
- Tests under `src/tools/reconciliation/__tests__/` assert both narrative text and structured payloads; failures drop sanitized artifacts into `test-results/` for diffing.
|
|
41
|
-
|
|
42
|
-
## Numbered Steps, Rationale, and Validation Hooks
|
|
43
|
-
### Step 1 — Input validation & window detection
|
|
44
|
-
- **Rationale**: Prevents wasting API calls on malformed CSVs and ensures the comparison window brackets all relevant transactions.
|
|
45
|
-
- **What happens**: Parse CSV metadata, normalize amounts/dates, derive min/max window ±5 days, and hydrate default tolerances through `ReconcileAccountSchema.parse(...)`. Liability accounts invert statement balance sign for consistent delta math.
|
|
46
|
-
- **Validation**: Unit coverage in `src/tools/reconciliation/__tests__/parser.*` plus `npm run validate-env` to guarantee credentials before file I/O occurs.
|
|
47
|
-
- **Open questions**: Should we persist detected CSV format to `test-exports/` for reuse, or is in-memory derivation sufficient for multi-pass sessions?
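
The window derivation and the liability sign flip from Step 1 can be sketched as follows. This is a minimal illustration, assuming ISO `YYYY-MM-DD` date strings and integer cents; `deriveWindow`, `shiftIsoDate`, and `normalizeStatementBalance` are hypothetical names and are not the project's `extractDateRangeFromCSV`.

```typescript
// Hypothetical helpers illustrating Step 1; names and units are assumptions.
const DAY_MS = 24 * 60 * 60 * 1000;

// Shift an ISO date by a whole number of days, working in UTC to avoid
// local-timezone drift.
function shiftIsoDate(iso: string, days: number): string {
  const shifted = Date.parse(`${iso}T00:00:00Z`) + days * DAY_MS;
  return new Date(shifted).toISOString().slice(0, 10);
}

// Comparison window: earliest and latest CSV dates, padded by bufferDays.
function deriveWindow(csvDates: string[], bufferDays = 5): { start: string; end: string } {
  let min: string | undefined;
  let max: string | undefined;
  for (const d of csvDates) {
    // ISO dates compare correctly as plain strings.
    if (min === undefined || d < min) min = d;
    if (max === undefined || d > max) max = d;
  }
  if (min === undefined || max === undefined) {
    throw new Error("CSV contains no dated rows");
  }
  return { start: shiftIsoDate(min, -bufferDays), end: shiftIsoDate(max, bufferDays) };
}

// Liability accounts (e.g. credit cards) invert the statement balance sign
// so the later delta math uses one convention.
function normalizeStatementBalance(balanceCents: number, isLiability: boolean): number {
  return isLiability ? -balanceCents : balanceCents;
}
```

For CSV rows dated 2024-03-01 through 2024-03-28, `deriveWindow` yields a window of 2024-02-25 to 2024-04-02, and a credit-card statement balance of 125000 cents becomes -125000 before any delta is computed.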
|
|
48
|
-
|
|
49
|
-
### Step 2 — Phase 1 statement pass (newest → oldest)
|
|
50
|
-
- **Rationale**: Mirrors how experienced YNAB users reconcile: the pass short-circuits as soon as balances match, so older rows are never mutated needlessly.
|
|
51
|
-
- **What happens**: Sort bank rows descending by date, compute `cleared_delta = ynab.cleared - statement_balance`, and for each row find the best YNAB candidate using the date/amount tolerances plus payee similarity. If confidence ≥ `auto_match_threshold` and the automation toggles allow it, clear, update, or auto-create transactions. Recalculate `cleared_delta` after every action; halt once |delta| ≤ tolerance.
|
|
52
|
-
- **Validation**: `findBestMatch` integration tests ensure deterministic candidate ordering; we also assert log completeness (`audit_trail_complete`) in executor tests.
|
|
53
|
-
- **Open questions**: Do we need adaptive confidence thresholds for larger ledgers (>1k rows) to limit runtime, or is the static percentage enough?
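
The Step 2 loop can be sketched as below. This is a simplified illustration, not the project's `findBestMatch` or executor. It takes the first uncleared candidate inside the tolerances rather than scoring payee similarity or confidence, only clears (no update or auto-create), and assumes the halt tolerance equals the amount tolerance. All type and function names are hypothetical; amounts are integer cents.

```typescript
// Hypothetical types: amounts in integer cents, dates as ISO "YYYY-MM-DD".
interface BankRow { date: string; amountCents: number; }
interface YnabTxn {
  id: string;
  date: string;
  amountCents: number;
  cleared: "cleared" | "uncleared" | "reconciled";
}

function daysApart(a: string, b: string): number {
  return Math.abs(Date.parse(`${a}T00:00:00Z`) - Date.parse(`${b}T00:00:00Z`)) / 86400000;
}

// Note: this sketch mutates the `ynab` array entries it clears.
function statementPass(
  bankRows: BankRow[],
  ynab: YnabTxn[],
  statementBalanceCents: number,
  opts = { dateToleranceDays: 5, amountToleranceCents: 1 },
): { clearedIds: string[]; clearedDelta: number } {
  // Cleared balance counts both cleared and reconciled transactions.
  const clearedBalance = (): number =>
    ynab.filter((t) => t.cleared !== "uncleared").reduce((sum, t) => sum + t.amountCents, 0);

  const clearedIds: string[] = [];
  let clearedDelta = clearedBalance() - statementBalanceCents;

  // Newest first, so recent activity is resolved before older rows are touched.
  const newestFirst = [...bankRows].sort((a, b) => b.date.localeCompare(a.date));
  for (const row of newestFirst) {
    if (Math.abs(clearedDelta) <= opts.amountToleranceCents) break; // balances align
    const candidate = ynab.find(
      (t) =>
        t.cleared === "uncleared" &&
        Math.abs(t.amountCents - row.amountCents) <= opts.amountToleranceCents &&
        daysApart(t.date, row.date) <= opts.dateToleranceDays,
    );
    if (!candidate) continue; // bank-only row, left for operator review
    candidate.cleared = "cleared";
    clearedIds.push(candidate.id);
    clearedDelta = clearedBalance() - statementBalanceCents; // recompute after each action
  }
  return { clearedIds, clearedDelta };
}
```

Because the delta is rechecked at the top of every iteration, a statement that balances after the newest row is cleared leaves every older uncleared transaction untouched.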
|
|
54
|
-
|
|
55
|
-
### Step 3 — Phase 2 cleared-YNAB sanity pass
|
|
56
|
-
- **Rationale**: Detects stale cleared transactions that never appeared on the bank statement, a common source of lingering deltas.
|
|
57
|
-
- **What happens**: Iterate YNAB transactions whose `cleared` status is `cleared` (that is, not yet `reconciled`) and whose dates fall inside the CSV window ±5 days. Attempt to re-match them to leftover bank rows; otherwise flip them to `uncleared` when `auto_unclear_missing` is true, then recompute `cleared_delta`.
|
|
58
|
-
- **Validation**: Executor tests (`executor.sanity-pass.test.ts`) verify we never un-clear reconciled items, and dry-run mode logs intended actions without mutating YNAB.
|
|
59
|
-
- **Open questions**: Should we surface a preview of would-be un-cleared transactions in dry-run mode to the structured payload for UI display?
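
A minimal sketch of the Step 3 sanity pass follows, under stated assumptions: re-matching against leftover bank rows has already happened and is represented by a `matchedIds` set, dates are ISO strings, and all names (`sanityPass`, `LedgerTxn`, `PlannedAction`) are hypothetical and do not describe the contents of `executor.ts`.

```typescript
// Hypothetical shapes; the real executor and YNAB client are not shown.
type ClearedStatus = "cleared" | "uncleared" | "reconciled";
interface LedgerTxn { id: string; date: string; amountCents: number; cleared: ClearedStatus; }
interface PlannedAction { id: string; action: "unclear"; applied: boolean; reason: string; }

function sanityPass(
  ledger: LedgerTxn[],
  matchedIds: Set<string>, // transactions already paired with a bank row
  range: { start: string; end: string }, // CSV window including the buffer
  opts: { autoUnclearMissing: boolean; dryRun: boolean },
): PlannedAction[] {
  const actions: PlannedAction[] = [];
  for (const txn of ledger) {
    const inWindow = txn.date >= range.start && txn.date <= range.end;
    // Only "cleared" rows qualify; "reconciled" rows are never touched.
    if (txn.cleared !== "cleared" || !inWindow || matchedIds.has(txn.id)) continue;

    // Dry-run (or a disabled toggle) records the intent without mutating.
    const apply = opts.autoUnclearMissing && !opts.dryRun;
    if (apply) txn.cleared = "uncleared";
    actions.push({
      id: txn.id,
      action: "unclear",
      applied: apply,
      reason: "cleared in YNAB but absent from statement",
    });
  }
  return actions;
}
```

Returning the planned actions even when nothing is applied keeps dry-run output and live output in the same shape, which makes the two easy to diff.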
|
|
60
|
-
|
|
61
|
-
### Step 4 — Finalize reconciliation
|
|
62
|
-
- **Rationale**: Once balances align, we need a trusted checkpoint recording statement date/balance plus an auditable list of touched transactions.
|
|
63
|
-
- **What happens**: Prompt the assistant/user to finish reconciliation, set involved transactions to `reconciled`, and call `buildBalanceReconciliation` to persist precision math and `final_verification` booleans.
|
|
64
|
-
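- **Validation**: Snapshot tests assert the `execution.account_balance.before/after` objects stay monotonic; rerunning manually with `npm test -- --no-file-parallelism` (the serial-run switch in recent Vitest releases; `--runInBand` is a Jest flag that Vitest does not accept) rules out races between parallel Vitest workers.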
|
|
65
|
-
- **Open questions**: Should we enforce that `statement_date` is required at this stage, or keep the current fallback to `statement_end_date` if missing?
|
|
66
|
-
|
|
67
|
-
### Step 5 — Leftover escalation & operator handoff
|
|
68
|
-
- **Rationale**: Keeping humans in the loop for medium/low-confidence matches prevents silent drift when automation can’t safely conclude.
|
|
69
|
-
- **What happens**: Surface structured `recommendations` containing low-confidence suggestions, unmatched bank-only rows, and the list of transactions auto un-cleared during Step 3. The narrative outlines manual review order, while the JSON payload allows clients to build UI cards (see `reference/TROUBLESHOOTING.md#reconciliation`).
|
|
70
|
-
- **Validation**: Adapter tests verify `buildReconciliationPayload` includes each unresolved set with counts, and E2E scripts (`test-reconcile-autodetect.js`) confirm the CLI prints the same inventory of leftovers.
|
|
71
|
-
- **Open questions**: Do we need an SLA timer/escalation hook (e.g., Slack webhook) when leftovers include more than N transactions, or is assistant messaging enough?
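
One way to picture the Step 5 handoff is a payload that groups each unresolved set with its count. The shape below is a hypothetical sketch: only the three leftover categories come from the text above, while the key names, the `needs_operator` flag, and the ordering rule are assumptions and are not the output of `buildReconciliationPayload`.

```typescript
// Hypothetical leftover and payload shapes for illustration only.
interface Suggestion { bankRow: string; ynabId: string; confidence: number; }
interface Leftovers {
  lowConfidenceSuggestions: Suggestion[];
  unmatchedBankRows: string[];
  autoUnclearedIds: string[];
}

function buildRecommendations(leftovers: Leftovers) {
  // Assumed review order: highest-confidence suggestions first.
  const suggestions = [...leftovers.lowConfidenceSuggestions].sort(
    (a, b) => b.confidence - a.confidence,
  );
  const total =
    suggestions.length + leftovers.unmatchedBankRows.length + leftovers.autoUnclearedIds.length;
  return {
    recommendations: {
      low_confidence: { count: suggestions.length, items: suggestions },
      bank_only: { count: leftovers.unmatchedBankRows.length, items: leftovers.unmatchedBankRows },
      auto_uncleared: { count: leftovers.autoUnclearedIds.length, items: leftovers.autoUnclearedIds },
    },
    needs_operator: total > 0, // anything left over requires a human decision
  };
}
```

Carrying explicit counts next to each list lets a client render summary cards without walking the arrays, and gives tests a cheap invariant to assert.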
|
|
72
|
-
|
|
73
|
-
### Step 6 — Retriable automation & telemetry feedback
|
|
74
|
-
- **Rationale**: Audit logs inform future tuning (e.g., tolerance adjustments) and enable replays without re-parsing inputs.
|
|
75
|
-
- **What happens**: Persist log streams, emit `execution.summary` stats, and optionally rerun the flow with updated knobs (e.g., `balance_verification_mode = 'GUIDED_RESOLUTION'`) using the same CSV payload. Telemetry consumers watch `audit_trail_complete` and `discrepancy_analysis` to decide whether another automated attempt is viable.
|
|
76
|
-
- **Validation**: `docs/guides/TESTING.md#comprehensive-account-reconciliation` details the manual harness; CI pipelines run `npm run test:comprehensive` to ensure telemetry fields stay backwards compatible.
|
|
77
|
-
- **Open questions**: Should we snapshot anonymized telemetry for regression dashboards, or does that introduce privacy concerns with customer CSVs?
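
The retry decision in Step 6 could look something like the sketch below. The policy is an assumption made for illustration: `remaining_delta_cents` is a hypothetical stand-in for whatever `discrepancy_analysis` actually reports, and `shouldRetry` is not a function from the codebase. Only `audit_trail_complete` and `max_resolution_attempts` are names taken from this document.

```typescript
// Hypothetical telemetry shape and retry policy; semantics are assumed.
interface AttemptTelemetry {
  audit_trail_complete: boolean;
  remaining_delta_cents: number; // assumed summary of discrepancy_analysis
}

function shouldRetry(
  telemetry: AttemptTelemetry,
  attempt: number, // 1-based count of attempts already made
  maxResolutionAttempts: number,
  toleranceCents = 1,
): boolean {
  // Already balanced: nothing left to resolve.
  if (Math.abs(telemetry.remaining_delta_cents) <= toleranceCents) return false;
  // An incomplete audit trail means the last run cannot be trusted as a base.
  if (!telemetry.audit_trail_complete) return false;
  // Respect the safety rail that prevents runaway loops.
  return attempt < maxResolutionAttempts;
}
```

With this policy, a run that leaves a 250-cent delta after attempt 1 of 3 is eligible for another pass with adjusted knobs, while the same delta after attempt 3 is handed to the operator instead.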
|
|
78
|
-
|
|
79
|
-
## Testing Hooks & Cross-links
|
|
80
|
-
- Follow the **Comprehensive Account Reconciliation** playbook in [`docs/guides/TESTING.md`](guides/TESTING.md#comprehensive-account-reconciliation) to exercise CSV parsing, matching, and execution paths end-to-end.
|
|
81
|
-
- Tool contract reference lives in [`docs/reference/API.md#reconcile_account`](reference/API.md#reconcile_account) and [`docs/reference/TOOLS.md#reconcile_account`](reference/TOOLS.md#reconcile_account); keep this doc updated when schemas there change.
|
|
82
|
-
- Troubleshooting steps for stubborn discrepancies are cataloged in [`docs/reference/TROUBLESHOOTING.md#reconciliation`](reference/TROUBLESHOOTING.md#reconciliation); link to this when raising escalation tickets.
|
|
83
|
-
- Local scripts (`test-reconcile-tool.js`, `test-reconcile-autodetect.js`) double as reproducible demonstrations—capture their JSON output and attach to `.pr-description.md` when documenting reconciliation changes.
|