vscode-ark 0.1.0__tar.gz

@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2026 Your Name
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
@@ -0,0 +1,12 @@
1
+ include README.md
2
+ include LICENSE
3
+ include requirements.txt
4
+ recursive-include tests *.py
5
+ global-exclude *.pyc
6
+ global-exclude __pycache__
7
+ global-exclude *.db
8
+ global-exclude *.db-*
9
+ global-exclude *.log
10
+ global-exclude watcher-queue/*
11
+ global-exclude policy.txt
12
+ global-exclude .DS_Store
@@ -0,0 +1,422 @@
1
+ Metadata-Version: 2.4
2
+ Name: vscode-ark
3
+ Version: 0.1.0
4
+ Summary: Comprehensive analysis system for VS Code/Copilot Chat sessions with behavioral signal extraction and heat scoring
5
+ Home-page: https://github.com/goCosmix/vscode-ark
6
+ Author: Ernie Butcher
7
+ Author-email: Ernie Butcher <ernie@fiosii.com>
8
+ Maintainer-email: Ernie Butcher <ernie@fiosii.com>
9
+ License: MIT
10
+ Project-URL: Homepage, https://github.com/goCosmix/vscode-ark
11
+ Project-URL: Repository, https://github.com/goCosmix/vscode-ark.git
12
+ Project-URL: Issues, https://github.com/goCosmix/vscode-ark/issues
13
+ Project-URL: Documentation, https://github.com/goCosmix/vscode-ark#readme
14
+ Project-URL: Changelog, https://github.com/goCosmix/vscode-ark/blob/main/CHANGELOG.md
15
+ Keywords: vscode,copilot,chat,analysis,behavioral,signals,heat-score,ai,conversation
16
+ Classifier: Development Status :: 3 - Alpha
17
+ Classifier: Intended Audience :: Developers
18
+ Classifier: Intended Audience :: Science/Research
19
+ Classifier: License :: OSI Approved :: MIT License
20
+ Classifier: Operating System :: OS Independent
21
+ Classifier: Programming Language :: Python :: 3
22
+ Classifier: Programming Language :: Python :: 3.8
23
+ Classifier: Programming Language :: Python :: 3.9
24
+ Classifier: Programming Language :: Python :: 3.10
25
+ Classifier: Programming Language :: Python :: 3.11
26
+ Classifier: Programming Language :: Python :: 3.12
27
+ Classifier: Topic :: Scientific/Engineering :: Information Analysis
28
+ Classifier: Topic :: Software Development :: Libraries :: Python Modules
29
+ Classifier: Topic :: System :: Logging
30
+ Requires-Python: >=3.8
31
+ Description-Content-Type: text/markdown
32
+ License-File: LICENSE
33
+ Requires-Dist: watchfiles>=0.20
34
+ Requires-Dist: click>=8.0
35
+ Requires-Dist: sentence-transformers>=2.2.2
36
+ Requires-Dist: numpy>=1.26
37
+ Provides-Extra: dev
38
+ Requires-Dist: pytest>=7.0; extra == "dev"
39
+ Requires-Dist: pytest-cov; extra == "dev"
40
+ Requires-Dist: black; extra == "dev"
41
+ Requires-Dist: isort; extra == "dev"
42
+ Requires-Dist: flake8; extra == "dev"
43
+ Requires-Dist: mypy; extra == "dev"
44
+ Provides-Extra: test
45
+ Requires-Dist: pytest>=7.0; extra == "test"
46
+ Requires-Dist: pytest-cov; extra == "test"
47
+ Dynamic: author
48
+ Dynamic: home-page
49
+ Dynamic: license-file
50
+ Dynamic: requires-python
51
+
52
+ # VS Code Ark
53
+
54
+ [![Python Version](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)
55
+ [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
56
+ [![CI](https://github.com/goCosmix/vscode-ark/workflows/CI/badge.svg)](https://github.com/goCosmix/vscode-ark/actions)
57
+
58
+ A comprehensive data pipeline and analysis system for VS Code/Copilot Chat sessions. Extract behavioral signals, compute heat scores, and gain deep insights into human-AI interaction patterns.
59
+
60
+ ## ✨ Features
61
+
62
+ - **Behavioral Signal Analysis** - Match 200+ keyword patterns across 6 signal types (corrections, frustrations, affirmations, etc.)
63
+ - **Heat Score Computation** - Quantify user frustration and agent performance (0-100 scale)
64
+ - **Real-time Monitoring** - Live sync daemon with crash-resistant queue system
65
+ - **Full-text Search** - FTS5-powered search across all conversations
66
+ - **Semantic Intelligence** - MiniLM embeddings, session summaries, related sessions, anomaly alerts, and recommendations
67
+ - **Code Symbol Indexing** - AST-backed symbol extraction for Python/JS/TS and content search across VFS blobs
68
+ - **Incremental Sync** - Watcher-driven session refreshes keep embeddings and session insight current as chat and tool outputs change
69
+ - **Package-centric Layout** - All runtime code lives under `vscode_ark/` for a clean root
70
+ - **Policy-based Access Control** - Allow/deny patterns for data filtering
71
+ - **Rich Analytics** - Token usage, context compaction, session recovery analysis
72
+ - **Export Capabilities** - JSON, JSONL, and text export formats
73
+ - **Professional CLI** - Comprehensive command-line interface with 25+ commands
74
+
75
+ ## 📋 Table of Contents
76
+
77
+ - [Installation](#installation)
78
+ - [Quick Start](#quick-start)
79
+ - [Architecture](#architecture)
80
+ - [CLI Reference](#cli-reference)
81
+ - [Data Analysis](#data-analysis)
82
+ - [Configuration](#configuration)
83
+ - [Development](#development)
84
+ - [Contributing](#contributing)
85
+ - [License](#license)
86
+
87
+ ## 🚀 Installation
88
+
89
+ ### Prerequisites
90
+
91
+ - Python 3.8+
92
+ - VS Code with Copilot Chat extension
93
+
94
+ ### From Source
95
+
96
+ ```bash
97
+ git clone https://github.com/goCosmix/vscode-ark.git
98
+ cd vscode-ark
99
+ pip install -e .
100
+ ```
101
+
102
+ ### With Development Dependencies
103
+
104
+ ```bash
105
+ pip install -e ".[dev]"
106
+ ```
107
+
108
+ ### From PyPI (Future)
109
+
110
+ ```bash
111
+ pip install vscode-ark
112
+ ```
113
+
114
+ ## ⚡ Quick Start
115
+
116
+ 1. **Initialize the database:**
117
+ ```bash
118
+ cda sync
119
+ ```
120
+
121
+ 2. **Start live monitoring:**
122
+ ```bash
123
+ cda watch start
124
+ ```
125
+ The watcher keeps VS Code updates, code symbols, and embeddings in sync.
126
+
127
+ 3. **Build semantic intelligence:**
128
+ ```bash
129
+ cda embed build
130
+ ```
131
+
132
+ 4. **Explore your data:**
133
+ ```bash
134
+ cda stats # System overview
135
+ cda sessions # Recent sessions
136
+ cda search "error" # Search conversations
137
+ cda code-search "todo" --regex # Search code content
138
+ cda code-search "def process" --symbol # Search code symbols
139
+ cda semantic-search "confused" # Semantic search
140
+ cda related <session> # Find related sessions
141
+ cda summarize <session> # Session summary and recommendations
142
+ cda heat # Frustration analysis
143
+ ```
144
+
145
+ ## 🧠 SQLite limits and mitigation
146
+
147
+ - **Single writer in WAL mode**: the system uses one writer process for ingest/reconstruct/extract/embed and allows many concurrent readers via SQLite WAL.
148
+ - **Large VFS blob handling**: for very large raw artifacts, the clean approach is chunked storage or external file references instead of a single enormous BLOB.
149
+ - **Default page size / cache**: SQLite's default page size is 4 KB (4096 bytes). This code sets `PRAGMA cache_size=-2000` (about 2 MB of page cache), `PRAGMA mmap_size=268435456` (256 MiB), and `PRAGMA temp_store=MEMORY` to improve read/cache performance on larger databases.
150
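+ - **Further tuning**: if you need more efficient storage for very large session history, rebuild the DB with a larger page size: set `PRAGMA page_size=32768` and then run `VACUUM`. SQLite cannot change the page size while the database is in WAL mode, so switch the journal mode before rebuilding.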
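The pragmas listed above could be applied at connection time roughly as follows. This is a minimal sketch using Python's standard `sqlite3` module; the helper name `open_tuned` and the demo file name are assumptions for illustration, not names taken from the package.

```python
import os
import sqlite3
import tempfile


def open_tuned(db_path: str) -> sqlite3.Connection:
    """Open a SQLite database with the read/cache settings listed above.

    `open_tuned` is a hypothetical helper, not part of vscode-ark.
    """
    conn = sqlite3.connect(db_path)
    # WAL allows one writer alongside many concurrent readers.
    conn.execute("PRAGMA journal_mode=WAL")
    # A negative cache_size is in KiB: -2000 is roughly 2 MB of page cache.
    conn.execute("PRAGMA cache_size=-2000")
    # Request up to 256 MiB of memory-mapped I/O (the build may cap this).
    conn.execute("PRAGMA mmap_size=268435456")
    # Keep temporary tables and indices in memory.
    conn.execute("PRAGMA temp_store=MEMORY")
    return conn


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        conn = open_tuned(os.path.join(tmp, "demo.db"))
        print(conn.execute("PRAGMA journal_mode").fetchone()[0])
        conn.close()
```

Apart from the journal mode, these settings are per-connection, so every process that opens the database would need to apply them again.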
151
+
152
+ ## 🔧 Configuration
153
+
154
+ - **VS Code Data Directory**: By default, assumes macOS paths (`~/Library/Application Support/Code/User`). Override with `export VSCODE_DATA_DIR=/path/to/vscode/data` (e.g., on Linux: `~/.config/Code/User`).
155
+ - **No other config needed**: Everything is CLI-driven with local SQLite.
156
+
157
+ ## 🏗️ Architecture
158
+
159
+ ```
160
+ VS Code Storage → ingest.py → vfs + sessions + transcripts
161
+
162
+ reconstruct.py → exchanges (structured conversations)
163
+
164
+ extract.py → signals + tokens + heat scores + analysis
165
+
166
+ embed.py → semantic embeddings + summaries + alerts
167
+
168
+ watcher.py → live sync + FTS indexing + queue resilience
169
+
170
+ cda → query interface + policy enforcement
171
+ ```
172
+
173
+ ### Core Components
174
+
175
+ | Component | Purpose | Key Features |
176
+ |-----------|---------|--------------|
177
+ | **ingest.py** | Data ingestion | VFS storage, gzip compression, session metadata |
178
+ | **reconstruct.py** | Conversation processing | Exchange threading, tool call linking, FTS indexing |
179
+ | **extract.py** | Signal analysis | Behavioral pattern recognition, heat scoring, token accounting |
180
+ | **watcher.py** | Live monitoring | File watching, incremental updates, crash recovery |
181
+ | **cda** | Query interface | 25+ commands, policy filtering, rich formatting |
182
+
183
+ ### Database Schema
184
+
185
+ - **workspaces** - VS Code workspace metadata
186
+ - **sessions** - Chat session information and metadata
187
+ - **vfs** - Gzip-compressed file storage with SHA256 hashes
188
+ - **exchanges** - Structured conversation turns with tool calls
189
+ - **exchange_signals** - Behavioral signal annotations
190
+ - **symbols** - Code symbol index (functions, classes, etc.)
191
+ - **token_usage** - Per-request token consumption tracking
192
+ - **compactions** - Context window summarization events
193
+ - **session_analysis** - Aggregated session metrics and heat scores
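As an illustration of the `vfs` idea (gzip-compressed content stored alongside a SHA-256 hash), here is a minimal sketch. The table name `vfs_demo`, its columns, and both helper functions are hypothetical; the actual schema is not shown in this README.

```python
import gzip
import hashlib
import sqlite3


def store_blob(conn: sqlite3.Connection, path: str, data: bytes) -> str:
    """Gzip-compress data, store it keyed by path, return its SHA-256 hex digest."""
    digest = hashlib.sha256(data).hexdigest()  # hash of the uncompressed bytes
    conn.execute(
        "INSERT OR REPLACE INTO vfs_demo (path, sha256, content) VALUES (?, ?, ?)",
        (path, digest, gzip.compress(data)),
    )
    return digest


def load_blob(conn: sqlite3.Connection, path: str) -> bytes:
    """Fetch and decompress a blob, verifying it against the stored hash."""
    row = conn.execute(
        "SELECT sha256, content FROM vfs_demo WHERE path = ?", (path,)
    ).fetchone()
    if row is None:
        raise KeyError(path)
    data = gzip.decompress(row[1])
    if hashlib.sha256(data).hexdigest() != row[0]:
        raise ValueError(f"hash mismatch for {path}")
    return data


# Hypothetical demo table; column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE vfs_demo ("
    "path TEXT PRIMARY KEY, sha256 TEXT NOT NULL, content BLOB NOT NULL)"
)
digest = store_blob(conn, "chat/session.json", b'{"turns": []}')
```

Hashing the uncompressed bytes lets unchanged files be detected without depending on gzip producing identical output across runs.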
194
+
195
+ ## 🖥️ CLI Reference
196
+
197
+ ### Core Commands
198
+
199
+ ```bash
200
+ # System Management
201
+ cda status # Show daemon status and queue information
202
+ cda stats # System-wide statistics and coverage
203
+ cda sync # Full data ingestion and rebuild
204
+ cda reconstruct # Rebuild conversations and search index
205
+
206
+ # Session Analysis
207
+ cda sessions # List all sessions (newest first)
208
+ cda session <id> # Show detailed session information
209
+ cda workspace <id> # Show sessions for a workspace
210
+ cda workspaces # List all workspaces
211
+
212
+ # Search & Query
213
+ cda search <query> # Full-text search across conversations
214
+ cda code-search <pattern> [--symbol] [--regex] # Search code symbols or code content
215
+ cda semantic-search <query> # Semantic search using embeddings
216
+ cda similar <session> # Find sessions similar to a session
217
+ cda related <session> # Alias for semantic related sessions
218
+ cda summarize <session> # Show session summary, topics, and recommendations
219
+ cda topics # List semantic topic tags
220
+ cda alerts <session> # Show semantic anomaly alerts
221
+ cda recommend <session> # Show session recommendations
222
+ cda tools <query> # Search tool call arguments
223
+ cda memory # Show memory files and global state
224
+
225
+ # Behavioral Analysis
226
+ cda signals [session] # Show behavioral signals
227
+ cda heat [session] # Frustration and heat analysis
228
+ cda behavior # Aggregate behavioral intelligence
229
+ cda saved # Sessions that recovered from high heat
230
+
231
+ # Data Export
232
+ cda export <session> # Export session as JSON/JSONL/text
233
+ cda replay <session> # Print conversation as readable text
234
+
235
+ # Advanced
236
+ cda query <sql> # Execute raw SQL queries
237
+ cda tokens [session] # Token usage analysis
238
+ cda compactions [session] # Context compaction events
239
+ cda edits # Edit session analytics
240
+
241
+ # Policy Management
242
+ cda policy allow <pattern> # Add allow pattern
243
+ cda policy deny <pattern> # Add deny pattern
244
+ cda policy list # Show current policies
245
+
246
+ # Live Monitoring
247
+ cda watch start # Start watcher daemon
248
+ cda watch stop # Stop watcher daemon
249
+ cda watch restart # Restart watcher daemon
250
+ ```
251
+
252
+ ### Command Examples
253
+
254
+ ```bash
255
+ # Search for error handling discussions
256
+ cda search "error handling" --limit 20
257
+
258
+ # Find sessions with high frustration
259
+ cda heat --limit 10
260
+
261
+ # Search for specific functions in code
262
+ cda code-search "def process_data" --symbol
263
+
264
+ # Search code content with regex or plain text
265
+ cda code-search "timeout" --regex
266
+
267
+ # Find semantically related sessions
268
+ cda related abc123
269
+
270
+ # Summarize a session with semantic topics and recommendations
271
+ cda summarize abc123
272
+
273
+ # Export a session for external analysis
274
+ cda export abc123 --format jsonl --output session.jsonl
275
+
276
+ # Monitor live sessions
277
+ cda watch start
278
+ cda status # Check queue status
279
+ ```
280
+
281
+ ## 📊 Data Analysis
282
+
283
+ ### Behavioral Signals
284
+
285
+ The system recognizes 6 signal types with 200+ keyword patterns:
286
+
287
+ | Signal Type | Weight | Description | Example Keywords |
288
+ |-------------|--------|-------------|------------------|
289
+ | **correction** | 3 | User correcting agent behavior | "stop", "wrong", "nope", "wait" |
290
+ | **pre_correction** | 2 | Early frustration signs | "actually", "hold on", "slow down" |
291
+ | **redirect** | 1 | User changing direction | "pivot", "change direction", "instead" |
292
+ | **affirmation** | 0 | Positive feedback | "good", "right", "perfect", "thanks" |
293
+ | **approval** | 0 | Task completion approval | "that works", "looks good", "approved" |
294
+ | **frustration** | 5 | Strong negative signals | "this is broken", "not working", "terrible" |
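A keyword matcher over the table above might look like the sketch below. It uses only the example keywords shown in the table (the real lists are said to hold 200+ patterns), and the function name `detect_signals` plus the whole-word matching rule are assumptions rather than the package's confirmed behavior.

```python
import re
from typing import Dict, List, Tuple

# Example keywords copied from the table; the real lists are much longer.
SIGNAL_KEYWORDS: Dict[str, Tuple[int, List[str]]] = {
    "correction": (3, ["stop", "wrong", "nope", "wait"]),
    "pre_correction": (2, ["actually", "hold on", "slow down"]),
    "redirect": (1, ["pivot", "change direction", "instead"]),
    "affirmation": (0, ["good", "right", "perfect", "thanks"]),
    "approval": (0, ["that works", "looks good", "approved"]),
    "frustration": (5, ["this is broken", "not working", "terrible"]),
}


def detect_signals(message: str) -> List[Tuple[str, int]]:
    """Return (signal_type, weight) for each signal type with a keyword hit."""
    text = message.lower()
    hits = []
    for signal_type, (weight, keywords) in SIGNAL_KEYWORDS.items():
        # Word boundaries stop "right" from matching inside "bright".
        if any(re.search(rf"\b{re.escape(kw)}\b", text) for kw in keywords):
            hits.append((signal_type, weight))
    return hits
```

With these lists, a message such as "Wait, this is broken" would be tagged as both `correction` and `frustration`.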
295
+
296
+ ### Heat Score Algorithm
297
+
298
+ ```
299
+ Heat Score = min(100, Σ(signal_weights))
300
+ ```
301
+
302
+ - **Peak Heat**: Maximum heat reached in session
303
+ - **Final Heat**: Heat at session end
304
+ - **Recovery**: Sessions that return to low heat after high peaks
305
+ - **Saved Sessions**: High-heat sessions that recover with affirmations
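The clamped sum in the formula can be sketched as below, using the weights from the signals table. The function name `heat_score` and the choice to ignore unknown signal types are assumptions. How heat falls again for the Recovery and Saved Sessions cases is not described here, so the sketch covers only the formula as written.

```python
from typing import Dict, Iterable

# Weights taken from the Behavioral Signals table.
SIGNAL_WEIGHTS: Dict[str, int] = {
    "correction": 3,
    "pre_correction": 2,
    "redirect": 1,
    "affirmation": 0,
    "approval": 0,
    "frustration": 5,
}


def heat_score(signal_types: Iterable[str]) -> int:
    """Sum the weights of the observed signals and clamp the result at 100."""
    # Assumption: signal types missing from the table contribute nothing.
    total = sum(SIGNAL_WEIGHTS.get(name, 0) for name in signal_types)
    return min(100, total)


example = heat_score(["correction", "frustration", "affirmation"])  # 3 + 5 + 0
```

Because affirmations and approvals carry weight 0, they never lower the sum in this form.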
306
+
307
+ ### Token Usage Tracking
308
+
309
+ - Per-request token consumption (prompt + completion)
310
+ - Model identification and version tracking
311
+ - Context compaction event logging
312
+ - Cost estimation capabilities
313
+
314
+ ## ⚙️ Configuration
315
+
316
+ ### Automatic Detection
317
+
318
+ VS Code Ark automatically detects paths using standard locations:
319
+
320
+ - **macOS**: `~/Library/Application Support/Code/User/`
321
+ - **Windows**: `%APPDATA%\Code\User\`
322
+ - **Linux**: `~/.config/Code/User/`
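Path detection along these lines could be sketched as follows, honoring the `VSCODE_DATA_DIR` override mentioned earlier in this README. The function name `vscode_user_dir` and the Windows fallback when `APPDATA` is unset are assumptions, not the package's confirmed logic.

```python
import os
import sys
from pathlib import Path
from typing import Mapping, Optional


def vscode_user_dir(
    platform: str = sys.platform, env: Optional[Mapping[str, str]] = None
) -> Path:
    """Return the VS Code user data directory for the given platform."""
    env = os.environ if env is None else env
    # An explicit override takes precedence over platform detection.
    override = env.get("VSCODE_DATA_DIR")
    if override:
        return Path(override).expanduser()
    home = Path(env["HOME"]) if "HOME" in env else Path.home()
    if platform == "darwin":
        return home / "Library" / "Application Support" / "Code" / "User"
    if platform.startswith("win"):
        appdata = env.get("APPDATA")
        # Assumed fallback when APPDATA is not set.
        base = Path(appdata) if appdata else home / "AppData" / "Roaming"
        return base / "Code" / "User"
    # Treat everything else as Linux-style.
    return home / ".config" / "Code" / "User"
```

Passing `platform` and `env` explicitly keeps the function easy to test without touching the real environment.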
323
+
324
+ ### Environment Variables
325
+
326
+ ```bash
327
+ export VSCODE_ARK_DB=/path/to/custom.db # Custom database location
328
+ export VSCODE_ARK_CONFIG=/path/to/config # Custom config directory
329
+ ```
330
+
331
+ ### Policy Configuration
332
+
333
+ Data access policies are stored in `policy.txt`:
334
+
335
+ ```
336
+ ALLOW important-project
337
+ DENY sensitive-data
338
+ ALLOW *.py
339
+ ```
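A file in this format could be parsed and evaluated along the following lines. The evaluation order is not documented here, so the sketch assumes that any matching `DENY` wins, that a name must match an `ALLOW` rule when at least one exists, and that patterns are case-sensitive globs over the whole name. The function names `parse_policy` and `is_allowed` are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import List, Tuple

Rule = Tuple[str, str]  # (action, pattern)


def parse_policy(text: str) -> List[Rule]:
    """Parse 'ALLOW pattern' / 'DENY pattern' lines, skipping blank lines."""
    rules: List[Rule] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        action, _, pattern = line.partition(" ")
        action, pattern = action.upper(), pattern.strip()
        if action not in ("ALLOW", "DENY") or not pattern:
            raise ValueError(f"bad policy line: {raw!r}")
        rules.append((action, pattern))
    return rules


def is_allowed(name: str, rules: List[Rule]) -> bool:
    """Assumed semantics: DENY beats ALLOW; ALLOW rules act as a whitelist."""
    if any(a == "DENY" and fnmatchcase(name, p) for a, p in rules):
        return False
    allow = [p for a, p in rules if a == "ALLOW"]
    # With no ALLOW rules at all, everything not denied passes.
    return not allow or any(fnmatchcase(name, p) for p in allow)


rules = parse_policy("ALLOW important-project\nDENY sensitive-data\nALLOW *.py\n")
```

Under these assumptions, `main.py` and `important-project` pass, while `sensitive-data` and `notes.txt` are filtered out.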
340
+
341
+ ## 🔧 Development
342
+
343
+ ### Setup Development Environment
344
+
345
+ ```bash
346
+ make install-dev
347
+ ```
348
+
349
+ ### Running Tests
350
+
351
+ ```bash
352
+ make test # Run test suite
353
+ make test-cov # Run with coverage report
354
+ ```
355
+
356
+ ### Code Quality
357
+
358
+ ```bash
359
+ make lint # Run flake8 and mypy
360
+ make format # Format with black and isort
361
+ ```
362
+
363
+ ### Building
364
+
365
+ ```bash
366
+ make build # Build distribution packages
367
+ make publish # Publish to PyPI (requires credentials)
368
+ ```
369
+
370
+ ### Project Structure
371
+
372
+ ```
373
+ vscode-ark/
374
+ ├── vscode_ark/ # Main package
375
+ │ ├── __init__.py
376
+ │ └── cli.py # Command-line interface
377
+ ├── scripts/ # Utility scripts
378
+ │ ├── ingest.py # Data ingestion
379
+ │ ├── reconstruct.py # Conversation processing
380
+ │ ├── extract.py # Signal analysis
381
+ │ └── watcher.py # Live monitoring
382
+ ├── tests/ # Test suite
383
+ ├── docs/ # Documentation
384
+ ├── pyproject.toml # Package configuration
385
+ ├── setup.py # Legacy setup
386
+ ├── Makefile # Development tasks
387
+ └── README.md # This file
388
+ ```
389
+
390
+ ## 🤝 Contributing
391
+
392
+ 1. Fork the repository
393
+ 2. Create a feature branch: `git checkout -b feature/amazing-feature`
394
+ 3. Make your changes and add tests
395
+ 4. Run the test suite: `make test`
396
+ 5. Format code: `make format`
397
+ 6. Commit your changes: `git commit -m 'Add amazing feature'`
398
+ 7. Push to the branch: `git push origin feature/amazing-feature`
399
+ 8. Open a Pull Request
400
+
401
+ ### Development Guidelines
402
+
403
+ - **Type Hints**: All functions should have type annotations
404
+ - **Docstrings**: Comprehensive docstrings for public APIs
405
+ - **Tests**: Unit tests for all new functionality
406
+ - **Linting**: Code must pass flake8 and mypy checks
407
+ - **Formatting**: Code must be formatted with black and isort
408
+
409
+ ## 📝 License
410
+
411
+ This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
412
+
413
+ ## 🙏 Acknowledgments
414
+
415
+ - Built for analyzing VS Code/Copilot Chat interaction patterns
416
+ - Inspired by the need for better human-AI interaction insights
417
+ - Uses SQLite FTS5 for high-performance full-text search
418
+ - Implements behavioral signal processing for conversation analysis
419
+
420
+ ---
421
+
422
+ **VS Code Ark** - Understanding the human side of AI conversations.