vector-task-mcp 1.3.2__tar.gz → 1.4.0__tar.gz

Files changed (27)
  1. {vector_task_mcp-1.3.2 → vector_task_mcp-1.4.0}/PKG-INFO +233 -6
  2. {vector_task_mcp-1.3.2 → vector_task_mcp-1.4.0}/README.md +300 -73
  3. {vector_task_mcp-1.3.2 → vector_task_mcp-1.4.0}/main.py +572 -1
  4. {vector_task_mcp-1.3.2 → vector_task_mcp-1.4.0}/pyproject.toml +1 -1
  5. {vector_task_mcp-1.3.2 → vector_task_mcp-1.4.0}/src/embeddings.py +24 -0
  6. {vector_task_mcp-1.3.2 → vector_task_mcp-1.4.0}/src/models.py +6 -1
  7. vector_task_mcp-1.4.0/src/normalization.py +666 -0
  8. {vector_task_mcp-1.3.2 → vector_task_mcp-1.4.0}/src/security.py +5 -5
  9. {vector_task_mcp-1.3.2 → vector_task_mcp-1.4.0}/src/task_store.py +458 -16
  10. vector_task_mcp-1.4.0/tests/test_normalization.py +627 -0
  11. {vector_task_mcp-1.3.2 → vector_task_mcp-1.4.0}/tests/test_task_store.py +346 -4
  12. {vector_task_mcp-1.3.2 → vector_task_mcp-1.4.0}/vector_task_mcp.egg-info/PKG-INFO +233 -6
  13. {vector_task_mcp-1.3.2 → vector_task_mcp-1.4.0}/vector_task_mcp.egg-info/SOURCES.txt +2 -0
  14. {vector_task_mcp-1.3.2 → vector_task_mcp-1.4.0}/.mcp.json +0 -0
  15. {vector_task_mcp-1.3.2 → vector_task_mcp-1.4.0}/.python-version +0 -0
  16. {vector_task_mcp-1.3.2 → vector_task_mcp-1.4.0}/CLAUDE.md +0 -0
  17. {vector_task_mcp-1.3.2 → vector_task_mcp-1.4.0}/LICENSE +0 -0
  18. {vector_task_mcp-1.3.2 → vector_task_mcp-1.4.0}/MANIFEST.in +0 -0
  19. {vector_task_mcp-1.3.2 → vector_task_mcp-1.4.0}/claude-desktop-config.example.json +0 -0
  20. {vector_task_mcp-1.3.2 → vector_task_mcp-1.4.0}/requirements.txt +0 -0
  21. {vector_task_mcp-1.3.2 → vector_task_mcp-1.4.0}/run-arm64.sh +0 -0
  22. {vector_task_mcp-1.3.2 → vector_task_mcp-1.4.0}/setup.cfg +0 -0
  23. {vector_task_mcp-1.3.2 → vector_task_mcp-1.4.0}/src/__init__.py +0 -0
  24. {vector_task_mcp-1.3.2 → vector_task_mcp-1.4.0}/vector_task_mcp.egg-info/dependency_links.txt +0 -0
  25. {vector_task_mcp-1.3.2 → vector_task_mcp-1.4.0}/vector_task_mcp.egg-info/entry_points.txt +0 -0
  26. {vector_task_mcp-1.3.2 → vector_task_mcp-1.4.0}/vector_task_mcp.egg-info/requires.txt +0 -0
  27. {vector_task_mcp-1.3.2 → vector_task_mcp-1.4.0}/vector_task_mcp.egg-info/top_level.txt +0 -0
@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: vector-task-mcp
- Version: 1.3.2
+ Version: 1.4.0
  Summary: A secure, vector-based task management server for Claude Desktop using sqlite-vec and sentence-transformers
  Author-email: Xsaven <xsaven@gmail.com>
  License-Expression: MIT
@@ -34,11 +34,15 @@ A **secure, vector-based task management server** for Claude Desktop using `sqli
  - **💾 Persistent Storage**: SQLite database with vector indexing via `sqlite-vec`
  - **🏷️ Smart Organization**: Priorities, tags, and subtasks for better task management
  - **📋 Task Lifecycle**: Track tasks from pending → in_progress → completed → tested → validated (or stopped)
+ - **🔐 Tag Normalization**: Automatic tag deduplication with semantic similarity
+ - **📊 IDF Weights**: Rare tags boost search relevance more than common tags
+ - **🎯 Tag Classification**: Filter tags vs boost tags for smart ranking
+ - **🔄 Alias Scent**: Original tag variants preserved for search context
  - **🔒 Security First**: Input validation, path sanitization, and resource limits
  - **⚡ High Performance**: Fast embedding generation with `sentence-transformers`
- - **📊 Rich Statistics**: Comprehensive task analytics and progress tracking
+ - **📈 Rich Statistics**: Comprehensive task analytics and progress tracking
  - **🔄 Hierarchical Tasks**: Support for parent-child task relationships
- - **📈 Priority Management**: Organize tasks by priority (low, medium, high, critical)
+ - **📊 Priority Management**: Organize tasks by priority (low, medium, high, critical)
  - **💬 Task Comments**: Add notes and updates to tasks without changing content
 
  ## 🛠️ Technical Stack
@@ -48,6 +52,7 @@ A **secure, vector-based task management server** for Claude Desktop using `sqli
  | **Vector DB** | sqlite-vec | Vector storage and similarity search |
  | **Embeddings** | sentence-transformers/all-MiniLM-L6-v2 | 384D text embeddings |
  | **MCP Framework** | FastMCP | High-level tools-only server |
+ | **Tag Normalization** | Custom (src/normalization.py) | Semantic tag deduplication |
  | **Dependencies** | uv script headers | Self-contained deployment |
  | **Security** | Custom validation | Path/input sanitization |
  | **Testing** | pytest + coverage | Comprehensive test suite |
@@ -67,7 +72,13 @@ vector-task-mcp/
  │ ├── __init__.py # Package initialization
  │ ├── models.py # Data models & configuration
  │ ├── security.py # Security validation & sanitization
- └── task_store.py # SQLite-vec task operations
+ ├── task_store.py # SQLite-vec task operations
+ │ ├── embeddings.py # Embedding model wrapper
+ │ └── normalization.py # Tag normalization & classification
+
+ ├── tests/ # Test suite
+ │ ├── test_task_store.py # Task store tests
+ │ └── test_normalization.py # Normalization tests
 
  └── .gitignore # Git exclusions
  ```
@@ -345,6 +356,89 @@ Returns:
  }
  ```
 
+ ### Tag Normalization Tools
+
+ **19. `tag_normalize_preview` - Preview Tag Merges**
+ ```
+ Preview which tags can be merged:
+ - threshold: 0.90 (strict) or 0.85 (aggressive)
+ ```
+
+ Shows similar tags that can be merged into canonical forms.
+
+ **20. `tag_normalize_apply` - Apply Tag Normalization**
+ ```
+ Apply tag normalization with optional dry_run
+ ```
+
+ Merges variant tags into canonical forms and stores original variants in `tag_variants`.
+
+ **21. `tag_similarity` - Compare Two Tags**
+ ```
+ Compare similarity between "auth" and "authentication"
+ ```
+
+ Returns cosine similarity score (0.0-1.0).
+
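The following sketch is not part of the released package. It illustrates what a cosine similarity score between two tag embeddings means, using plain Python lists in place of the 384-dimensional vectors the README mentions. The function names `cosine_similarity` and `clamp_unit` are hypothetical, and clamping to the 0.0-1.0 range quoted above is an assumption, since raw cosine similarity can be negative.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: a zero vector counts as "no similarity"
    return dot / (norm_a * norm_b)


def clamp_unit(score):
    """Clamp a score into [0.0, 1.0] (assumed, to match the documented range)."""
    return max(0.0, min(1.0, score))


# Toy vectors standing in for real tag embeddings.
same = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])   # parallel vectors
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])       # unrelated vectors
opposite = clamp_unit(cosine_similarity([1.0, 0.0], [-1.0, 0.0]))
```

Parallel vectors score close to 1.0 and orthogonal vectors score 0.0, which is the scale the `threshold` values of 0.90 and 0.85 would be compared against.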
383
+ **22. `canonical_tag_add` - Add Canonical Mapping**
384
+ ```
385
+ Add mapping: "authentication" → "auth"
386
+ ```
387
+
388
+ **23. `canonical_tag_remove` - Remove Mapping**
389
+ ```
390
+ Remove mapping for "authentication"
391
+ ```
392
+
393
+ **24. `canonical_tag_list` - List All Mappings**
394
+ ```
395
+ List all canonical tag mappings
396
+ ```
397
+
398
+ **25. `get_canonical_tags` - List Canonical Tags**
399
+ ```
400
+ List all canonical tags only
401
+ ```
402
+
403
+ ### Tag Intelligence Tools
404
+
405
+ **26. `tag_frequencies` - Get Tag Frequencies & IDF Weights**
406
+ ```
407
+ Get tag frequencies with IDF weights
408
+ ```
409
+
410
+ Returns frequency statistics and IDF weights for search ranking:
411
+ ```json
412
+ {
413
+ "api": {"count": 10, "frequency": 0.4, "idf_weight": 0.621},
414
+ "vendor:stripe": {"count": 1, "frequency": 0.04, "idf_weight": 1.443}
415
+ }
416
+ ```
417
+
418
+ **27. `tag_weights` - Get Simplified IDF Weights**
419
+ ```
420
+ Get IDF weights for all tags (for search ranking)
421
+ ```
422
+
423
+ **28. `tag_classify` - Classify Single Tag**
424
+ ```
425
+ Classify tag "vendor:stripe"
426
+ ```
427
+
428
+ Returns boost level (high/medium/low/filter_only) for ranking.
429
+
430
+ **29. `tags_classify_batch` - Classify Multiple Tags**
431
+ ```
432
+ Classify tags: ["vendor:stripe", "api", "status:pending"]
433
+ ```
434
+
435
+ **30. `search_explain` - Search with Ranking Explanation**
436
+ ```
437
+ Search for "authentication" with ranking explanation
438
+ ```
439
+
440
+ Shows how IDF weights, classification, and variants affect ranking.
441
+
348
442
  ### Task Priorities
349
443
 
350
444
  | Priority | Use Cases |
@@ -392,11 +486,23 @@ uv run main.py --working-dir ~/projects/my-project
 
  ```
  your-project/
- ├── tasks.db # SQLite database with task vectors
- ├── src/ # Your project files
+ ├── memory/
+ │ └── tasks.db # SQLite database with task vectors
+ ├── src/ # Your project files
  └── other-files...
  ```
 
+ ### Database Schema
+
+ **tasks table**:
+ - Core task data + `tags` (canonical) + `tag_variants` (original variants)
+
+ **canonical_tags table**:
+ - Predefined tag mappings (variant → canonical)
+
+ **task_vectors table**:
+ - 384-dimensional embeddings for semantic search
+
  ### Security Limits
 
  - **Max task content**: 10,000 characters
@@ -449,6 +555,100 @@ your-project/
  "Refactor legacy authentication module to use new security library"
  ```
 
+ ## 🏷️ Tag Normalization
+
+ ### Overview
+
+ Tag normalization reduces tag fragmentation by merging semantically similar tags:
+
+ | Before | After |
+ |--------|-------|
+ | auth, authentication, auth-api, login | → **auth** |
+ | db, database, database-setup | → **database** |
+ | api, rest api, API | → **api** |
+
+ ### Hard Guards (Prevent Wrong Merges)
+
+ | Guard | Rule | Example |
+ |-------|------|---------|
+ | **Version** | Different versions → NO | `php8` ≠ `php7` |
+ | **Numeric** | Different numbers → NO | `api1` ≠ `api2` |
+ | **Facet** | Different prefixes → NO | `type:*` ≠ `domain:*` |
+ | **Prefix** | Structured ≠ Plain | `type:refactor` ≠ `refactor` |
+
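The guard table above can be read as a set of veto checks that run before any similarity score is considered. The sketch below is not taken from `src/normalization.py`; the names `split_facet`, `numbers_in`, and `can_merge` are hypothetical, and treating the version and numeric guards as one comparison of embedded digit runs is an assumption.

```python
import re


def split_facet(tag):
    """Return (prefix, value); prefix is None for a plain tag."""
    if ":" in tag:
        prefix, value = tag.split(":", 1)
        return prefix, value
    return None, tag


def numbers_in(text):
    """All digit runs in the text, in order of appearance."""
    return re.findall(r"\d+", text)


def can_merge(tag_a, tag_b):
    """True if no hard guard vetoes merging the two tags."""
    prefix_a, value_a = split_facet(tag_a.lower())
    prefix_b, value_b = split_facet(tag_b.lower())
    # Prefix guard: a structured tag never merges with a plain one.
    if (prefix_a is None) != (prefix_b is None):
        return False
    # Facet guard: different prefixes never merge.
    if prefix_a != prefix_b:
        return False
    # Version / numeric guard: embedded numbers must match exactly.
    if numbers_in(value_a) != numbers_in(value_b):
        return False
    return True


blocked = [
    can_merge("php8", "php7"),
    can_merge("api1", "api2"),
    can_merge("type:refactor", "domain:refactor"),
    can_merge("type:refactor", "refactor"),
]
allowed = can_merge("type:refactor", "type:refactoring")
```

Passing the guards only means a merge is permitted; in this reading, the similarity threshold would still decide whether the merge actually happens.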
+ ### Substring Boost
+
+ Tags that are substrings get a small boost if:
+ - Shorter word ≥ 4 characters
+ - Not in stop-words (api, ui, db, etc.)
+
+ Example: `"laravel"` ⊂ `"laravel framework"` → boost to 0.95
+
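A minimal sketch of the substring rule described above, not the package's own code. The function name `substring_boost` is hypothetical, the stop-word set holds only the three words the README lists (the "etc." is left unfilled), and reading "boost to 0.95" as "raise the similarity to at least 0.95" is an assumption.

```python
STOP_WORDS = {"api", "ui", "db"}  # only the words named in the README
MIN_LENGTH = 4                    # shorter tag must have at least 4 characters
BOOSTED_SCORE = 0.95              # value quoted in the README example


def substring_boost(tag_a, tag_b, similarity):
    """Raise similarity to BOOSTED_SCORE when one tag contains the other."""
    short, long_ = sorted((tag_a.lower(), tag_b.lower()), key=len)
    eligible = (
        short != long_
        and len(short) >= MIN_LENGTH
        and short not in STOP_WORDS
        and short in long_
    )
    if eligible:
        return max(similarity, BOOSTED_SCORE)
    return similarity


boosted = substring_boost("laravel", "laravel framework", 0.80)
too_short = substring_boost("api", "rest api", 0.80)
stop_word = substring_boost("db", "database", 0.80)
```

With these inputs the `laravel` pair is lifted to 0.95, while `api` and `db` keep their original score because they are too short and also stop-words.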
+ ### Facet Model (Colon Tags)
+
+ Tags with colons (`prefix:value`) are treated as structured facets:
+
+ ```
+ type:refactor ← facet: "type", value: "refactor"
+ vendor:stripe ← facet: "vendor", value: "stripe"
+ module:terminal ← facet: "module", value: "terminal"
+ ```
+
+ Rules:
+ - Same prefix can merge if similar: `type:refactor` ↔ `type:refactoring` ✅
+ - Different prefixes never merge: `type:*` ↔ `domain:*` ❌
+ - Structured never merges with plain: `type:*` ↔ `refactor` ❌
+
+ ### Tag Variants (Alias Scent)
+
+ When tags are migrated, original variants are preserved:
+
+ ```json
+ {
+ "tags": ["auth"],
+ "tag_variants": ["authentication", "auth-api", "login"]
+ }
+ ```
+
+ Variants provide:
+ - Context for search ranking
+ - Explanation in UI ("Why auth? Because it was login/authentication")
+ - Rerank signal for queries
+
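The variant-preserving merge described above can be sketched as follows. This is an illustration, not the package's implementation: `normalize_tags` and the `mapping` dictionary are hypothetical names, and only the output shape (`tags` plus `tag_variants`) comes from the README.

```python
def normalize_tags(tags, mapping):
    """Map each tag to its canonical form and keep the originals as variants."""
    canonical = []
    variants = []
    for tag in tags:
        target = mapping.get(tag, tag)  # unmapped tags are already canonical
        if target not in canonical:
            canonical.append(target)
        if target != tag and tag not in variants:
            variants.append(tag)
    return {"tags": canonical, "tag_variants": variants}


# Hypothetical variant -> canonical mapping, mirroring the README example.
mapping = {"authentication": "auth", "auth-api": "auth", "login": "auth"}
result = normalize_tags(["authentication", "auth", "auth-api", "login"], mapping)
```

Here `result` has the same shape as the JSON example: a single canonical `auth` tag, with the three original spellings kept in `tag_variants`.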
+ ## 📊 IDF Weights & Tag Classification
+
+ ### IDF (Inverse Document Frequency)
+
+ Rare tags boost relevance more than common tags:
+
+ ```
+ idf_weight = 1 / log(1 + frequency)
+ ```
+
+ | Tag | Frequency | IDF Weight | Effect |
+ |-----|-------|------------|--------|
+ | `api` | 70% of tasks | 0.38 | Low signal |
+ | `vendor:stripe` | 3% of tasks | 1.44 | Strong signal |
+
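The formula above can be evaluated directly. The sketch below is not the package's code; `idf_weight` is a hypothetical helper, and the natural logarithm is an assumption. Which quantity the package passes in as `frequency` (a raw count or a fraction of tasks) is not visible in this diff, so the argument is simply called `x` here.

```python
import math


def idf_weight(x):
    """Evaluate 1 / log(1 + x) with the natural logarithm (assumed base)."""
    if x <= 0:
        raise ValueError("x must be positive, otherwise log(1 + x) <= 0")
    return 1.0 / math.log(1.0 + x)


rare = round(idf_weight(1), 3)    # 1 / ln(2)
common = round(idf_weight(4), 3)  # 1 / ln(5)
```

With the natural logarithm, `x = 1` gives 1.443 and `x = 4` gives 0.621, the two values that appear in the `tag_frequencies` example output. The weight falls as `x` grows, which is the "rare tags boost more" behaviour the README describes.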
+ ### Tag Classification
+
+ Tags are classified by boost level:
+
+ | Level | Boost | Examples |
+ |-------|-------|----------|
+ | `high` | 1.5 | `vendor:*`, `module:*`, `service:*` |
+ | `medium` | 1.0 | Facet tags (`domain:*`, `type:*`), specific tags |
+ | `low` | 0.5 | General tags (`api`, `backend`, `test`) |
+ | `filter_only` | 0.1 | `status:*`, `priority:*` |
+
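One way to read the classification table is as a prefix lookup. The sketch below is an illustration only: `classify` is a hypothetical name, the prefix and tag sets contain just the examples from the table, and falling back to `medium` for any other tag is an assumption about how "specific tags" are treated.

```python
BOOST = {"high": 1.5, "medium": 1.0, "low": 0.5, "filter_only": 0.1}

HIGH_PREFIXES = {"vendor", "module", "service"}
FILTER_PREFIXES = {"status", "priority"}
GENERAL_TAGS = {"api", "backend", "test"}  # only the examples from the table


def classify(tag):
    """Return the boost level name for a tag."""
    tag = tag.lower()
    if ":" in tag:
        prefix = tag.split(":", 1)[0]
        if prefix in FILTER_PREFIXES:
            return "filter_only"
        if prefix in HIGH_PREFIXES:
            return "high"
        return "medium"  # other facet tags such as domain:* or type:*
    return "low" if tag in GENERAL_TAGS else "medium"


levels = {tag: classify(tag) for tag in ["vendor:stripe", "api", "status:pending"]}
stripe_boost = BOOST[classify("vendor:stripe")]
```

The three tags used here are the ones from the `tags_classify_batch` example; under these assumptions they land in `high`, `low`, and `filter_only` respectively.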
+ ### Search Ranking
+
+ Final search score combines:
+ 1. **Vector similarity** (cosine distance)
+ 2. **IDF weight** (rare tags boost more)
+ 3. **Tag classification** (high > medium > low > filter)
+ 4. **Variant bonus** (tasks with tag_variants get small boost)
+
  ## 🔍 How Semantic Search Works
 
  The server uses **sentence-transformers** to convert tasks into 384-dimensional vectors that capture semantic meaning:
@@ -629,6 +829,33 @@ Based on testing with various dataset sizes:
 
  *Tested on MacBook Air M1 with sentence-transformers/all-MiniLM-L6-v2*
 
+ ## 🧪 Test Coverage
+
+ ```
+ tests/
+ ├── test_task_store.py # 60 tests - Task store operations
+ └── test_normalization.py # 45 tests - Tag normalization
+
+ Total: 105 tests
+ ```
+
+ Run tests:
+ ```bash
+ uv run pytest tests/ -v
+ ```
+
+ ## 🔄 Backward Compatibility
+
+ New features are backward compatible:
+
+ | Feature | Migration |
+ |---------|-----------|
+ | `tag_variants` column | Auto-added via ALTER TABLE |
+ | `canonical_tags` table | Auto-created via CREATE TABLE IF NOT EXISTS |
+ | IDF reranking | Opt-in via `use_idf_rerank=True` |
+
+ Existing databases work without changes. New columns/tables added automatically on first run.
+
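The migration pattern in the table above can be sketched with the standard `sqlite3` module. This is not the code from `src/task_store.py`: the `migrate` function, the column types, and the `variant`/`canonical` column names are assumptions made for illustration, and the `tasks` table created here is a stand-in for an existing 1.3.2 database.

```python
import sqlite3


def migrate(conn):
    """Add the new column and table only if they are missing (idempotent)."""
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    columns = {row[1] for row in conn.execute("PRAGMA table_info(tasks)")}
    if "tag_variants" not in columns:
        conn.execute("ALTER TABLE tasks ADD COLUMN tag_variants TEXT")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS canonical_tags ("
        "variant TEXT PRIMARY KEY, canonical TEXT NOT NULL)"
    )
    conn.commit()


# Stand-in for a pre-existing database without the new schema pieces.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tasks (id INTEGER PRIMARY KEY, content TEXT, tags TEXT)")

migrate(conn)
migrate(conn)  # running again is a no-op rather than an error

task_columns = [row[1] for row in conn.execute("PRAGMA table_info(tasks)")]
```

Checking `PRAGMA table_info` before `ALTER TABLE` matters because SQLite raises an error when a column with the same name is added twice, whereas `CREATE TABLE IF NOT EXISTS` is already safe to repeat.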
  ## 🤝 Contributing
 
  This is a standalone MCP server designed for personal/team use. For improvements: