@mytechtoday/augment-extensions 0.2.0 → 0.4.0
- package/README.md +614 -39
- package/augment-extensions/coding-standards/bash/README.md +196 -0
- package/augment-extensions/coding-standards/bash/module.json +163 -0
- package/augment-extensions/coding-standards/bash/rules/naming-conventions.md +336 -0
- package/augment-extensions/coding-standards/bash/rules/universal-standards.md +289 -0
- package/augment-extensions/coding-standards/css/README.md +40 -0
- package/augment-extensions/coding-standards/css/examples/css-examples.css +550 -0
- package/augment-extensions/coding-standards/css/module.json +44 -0
- package/augment-extensions/coding-standards/css/rules/css-modern-features.md +448 -0
- package/augment-extensions/coding-standards/css/rules/css-standards.md +492 -0
- package/augment-extensions/coding-standards/html/README.md +40 -0
- package/augment-extensions/coding-standards/html/examples/html-examples.html +267 -0
- package/augment-extensions/coding-standards/html/examples/responsive-layout.html +505 -0
- package/augment-extensions/coding-standards/html/module.json +44 -0
- package/augment-extensions/coding-standards/html/rules/html-standards.md +349 -0
- package/augment-extensions/coding-standards/html-css-js/README.md +194 -0
- package/augment-extensions/coding-standards/html-css-js/examples/async-examples.js +487 -0
- package/augment-extensions/coding-standards/html-css-js/examples/css-examples.css +550 -0
- package/augment-extensions/coding-standards/html-css-js/examples/dom-examples.js +667 -0
- package/augment-extensions/coding-standards/html-css-js/examples/html-examples.html +267 -0
- package/augment-extensions/coding-standards/html-css-js/examples/javascript-examples.js +612 -0
- package/augment-extensions/coding-standards/html-css-js/examples/responsive-layout.html +505 -0
- package/augment-extensions/coding-standards/html-css-js/module.json +48 -0
- package/augment-extensions/coding-standards/html-css-js/rules/async-patterns.md +515 -0
- package/augment-extensions/coding-standards/html-css-js/rules/css-modern-features.md +448 -0
- package/augment-extensions/coding-standards/html-css-js/rules/css-standards.md +492 -0
- package/augment-extensions/coding-standards/html-css-js/rules/dom-manipulation.md +439 -0
- package/augment-extensions/coding-standards/html-css-js/rules/html-standards.md +349 -0
- package/augment-extensions/coding-standards/html-css-js/rules/javascript-standards.md +486 -0
- package/augment-extensions/coding-standards/html-css-js/rules/performance.md +463 -0
- package/augment-extensions/coding-standards/html-css-js/rules/tooling.md +543 -0
- package/augment-extensions/coding-standards/js/README.md +46 -0
- package/augment-extensions/coding-standards/js/examples/async-examples.js +487 -0
- package/augment-extensions/coding-standards/js/examples/dom-examples.js +667 -0
- package/augment-extensions/coding-standards/js/examples/javascript-examples.js +612 -0
- package/augment-extensions/coding-standards/js/module.json +49 -0
- package/augment-extensions/coding-standards/js/rules/async-patterns.md +515 -0
- package/augment-extensions/coding-standards/js/rules/dom-manipulation.md +439 -0
- package/augment-extensions/coding-standards/js/rules/javascript-standards.md +486 -0
- package/augment-extensions/coding-standards/js/rules/performance.md +463 -0
- package/augment-extensions/coding-standards/js/rules/tooling.md +543 -0
- package/augment-extensions/coding-standards/php/README.md +248 -0
- package/augment-extensions/coding-standards/php/examples/api-endpoint-example.php +204 -0
- package/augment-extensions/coding-standards/php/examples/cli-command-example.php +206 -0
- package/augment-extensions/coding-standards/php/examples/legacy-refactoring-example.php +234 -0
- package/augment-extensions/coding-standards/php/examples/web-application-example.php +211 -0
- package/augment-extensions/coding-standards/php/examples/woocommerce-extension-example.php +215 -0
- package/augment-extensions/coding-standards/php/examples/wordpress-plugin-example.php +189 -0
- package/augment-extensions/coding-standards/php/module.json +166 -0
- package/augment-extensions/coding-standards/php/rules/api-development.md +480 -0
- package/augment-extensions/coding-standards/php/rules/category-configuration.md +332 -0
- package/augment-extensions/coding-standards/php/rules/cli-tools.md +472 -0
- package/augment-extensions/coding-standards/php/rules/cms-integration.md +561 -0
- package/augment-extensions/coding-standards/php/rules/code-quality.md +402 -0
- package/augment-extensions/coding-standards/php/rules/documentation.md +425 -0
- package/augment-extensions/coding-standards/php/rules/ecommerce.md +627 -0
- package/augment-extensions/coding-standards/php/rules/error-handling.md +336 -0
- package/augment-extensions/coding-standards/php/rules/legacy-migration.md +677 -0
- package/augment-extensions/coding-standards/php/rules/naming-conventions.md +279 -0
- package/augment-extensions/coding-standards/php/rules/performance.md +392 -0
- package/augment-extensions/coding-standards/php/rules/psr-standards.md +186 -0
- package/augment-extensions/coding-standards/php/rules/security.md +358 -0
- package/augment-extensions/coding-standards/php/rules/testing.md +403 -0
- package/augment-extensions/coding-standards/php/rules/type-declarations.md +331 -0
- package/augment-extensions/coding-standards/php/rules/web-applications.md +426 -0
- package/augment-extensions/coding-standards/powershell/README.md +154 -0
- package/augment-extensions/coding-standards/powershell/examples/admin-example.ps1 +272 -0
- package/augment-extensions/coding-standards/powershell/examples/automation-example.ps1 +173 -0
- package/augment-extensions/coding-standards/powershell/examples/cloud-example.ps1 +243 -0
- package/augment-extensions/coding-standards/powershell/examples/cross-platform-example.ps1 +297 -0
- package/augment-extensions/coding-standards/powershell/examples/dsc-example.ps1 +224 -0
- package/augment-extensions/coding-standards/powershell/examples/legacy-migration-example.ps1 +340 -0
- package/augment-extensions/coding-standards/powershell/examples/module-example.psm1 +255 -0
- package/augment-extensions/coding-standards/powershell/module.json +165 -0
- package/augment-extensions/coding-standards/powershell/rules/administrative-tools.md +439 -0
- package/augment-extensions/coding-standards/powershell/rules/automation-scripts.md +240 -0
- package/augment-extensions/coding-standards/powershell/rules/cloud-orchestration.md +384 -0
- package/augment-extensions/coding-standards/powershell/rules/configuration-schema.md +383 -0
- package/augment-extensions/coding-standards/powershell/rules/cross-platform-scripts.md +482 -0
- package/augment-extensions/coding-standards/powershell/rules/dsc-configurations.md +296 -0
- package/augment-extensions/coding-standards/powershell/rules/error-handling.md +314 -0
- package/augment-extensions/coding-standards/powershell/rules/legacy-migrations.md +466 -0
- package/augment-extensions/coding-standards/powershell/rules/modules-functions.md +244 -0
- package/augment-extensions/coding-standards/powershell/rules/naming-conventions.md +266 -0
- package/augment-extensions/coding-standards/powershell/rules/performance-optimization.md +209 -0
- package/augment-extensions/coding-standards/powershell/rules/security-practices.md +314 -0
- package/augment-extensions/coding-standards/powershell/rules/testing-guidelines.md +268 -0
- package/augment-extensions/coding-standards/powershell/rules/universal-standards.md +197 -0
- package/augment-extensions/coding-standards/python/README.md +12 -8
- package/augment-extensions/coding-standards/python/examples/best-practices.py +373 -0
- package/augment-extensions/coding-standards/python/module.json +8 -4
- package/augment-extensions/coding-standards/python/rules/async-patterns.md +884 -0
- package/augment-extensions/coding-standards/python/rules/documentation.md +831 -0
- package/augment-extensions/coding-standards/python/rules/error-handling.md +855 -68
- package/augment-extensions/coding-standards/python/rules/testing.md +409 -0
- package/augment-extensions/coding-standards/python/rules/tooling.md +446 -0
- package/augment-extensions/coding-standards/python/rules/type-hints.md +115 -50
- package/augment-extensions/collections/html-css-js/README.md +82 -0
- package/augment-extensions/collections/html-css-js/collection.json +41 -0
- package/augment-extensions/domain-rules/database/README.md +161 -0
- package/augment-extensions/domain-rules/database/examples/flat-database-example.md +793 -0
- package/augment-extensions/domain-rules/database/examples/hybrid-database-example.md +1132 -0
- package/augment-extensions/domain-rules/database/examples/nosql-document-example.md +868 -0
- package/augment-extensions/domain-rules/database/examples/nosql-graph-example.md +805 -0
- package/augment-extensions/domain-rules/database/examples/relational-schema-example.md +621 -0
- package/augment-extensions/domain-rules/database/examples/vector-database-example.md +965 -0
- package/augment-extensions/domain-rules/database/module.json +28 -0
- package/augment-extensions/domain-rules/database/rules/flat-databases.md +624 -0
- package/augment-extensions/domain-rules/database/rules/nosql-databases.md +588 -0
- package/augment-extensions/domain-rules/database/rules/nosql-document-stores.md +856 -0
- package/augment-extensions/domain-rules/database/rules/nosql-graph-databases.md +778 -0
- package/augment-extensions/domain-rules/database/rules/nosql-key-value-stores.md +963 -0
- package/augment-extensions/domain-rules/database/rules/performance-optimization.md +1076 -0
- package/augment-extensions/domain-rules/database/rules/relational-databases.md +697 -0
- package/augment-extensions/domain-rules/database/rules/relational-indexing.md +671 -0
- package/augment-extensions/domain-rules/database/rules/relational-query-optimization.md +607 -0
- package/augment-extensions/domain-rules/database/rules/relational-schema-design.md +907 -0
- package/augment-extensions/domain-rules/database/rules/relational-transactions.md +783 -0
- package/augment-extensions/domain-rules/database/rules/security-standards.md +980 -0
- package/augment-extensions/domain-rules/database/rules/universal-best-practices.md +485 -0
- package/augment-extensions/domain-rules/database/rules/vector-databases.md +521 -0
- package/augment-extensions/domain-rules/database/rules/vector-embeddings.md +858 -0
- package/augment-extensions/domain-rules/database/rules/vector-indexing.md +934 -0
- package/augment-extensions/domain-rules/mcp/README.md +150 -0
- package/augment-extensions/domain-rules/mcp/examples/compressed-example.md +522 -0
- package/augment-extensions/domain-rules/mcp/examples/graph-augmented-example.md +520 -0
- package/augment-extensions/domain-rules/mcp/examples/hybrid-example.md +570 -0
- package/augment-extensions/domain-rules/mcp/examples/state-based-example.md +427 -0
- package/augment-extensions/domain-rules/mcp/examples/token-based-example.md +435 -0
- package/augment-extensions/domain-rules/mcp/examples/vector-based-example.md +502 -0
- package/augment-extensions/domain-rules/mcp/module.json +49 -0
- package/augment-extensions/domain-rules/mcp/rules/compressed-mcp.md +595 -0
- package/augment-extensions/domain-rules/mcp/rules/configuration.md +345 -0
- package/augment-extensions/domain-rules/mcp/rules/graph-augmented-mcp.md +687 -0
- package/augment-extensions/domain-rules/mcp/rules/hybrid-mcp.md +636 -0
- package/augment-extensions/domain-rules/mcp/rules/state-based-mcp.md +484 -0
- package/augment-extensions/domain-rules/mcp/rules/testing-validation.md +360 -0
- package/augment-extensions/domain-rules/mcp/rules/token-based-mcp.md +393 -0
- package/augment-extensions/domain-rules/mcp/rules/universal-rules.md +194 -0
- package/augment-extensions/domain-rules/mcp/rules/vector-based-mcp.md +625 -0
- package/augment-extensions/workflows/beads/module.json +4 -3
- package/augment-extensions/workflows/database/README.md +195 -0
- package/augment-extensions/workflows/database/ai-prompt-testing.md +295 -0
- package/augment-extensions/workflows/database/examples/migration-example.md +498 -0
- package/augment-extensions/workflows/database/examples/optimization-example.md +496 -0
- package/augment-extensions/workflows/database/examples/schema-design-example.md +444 -0
- package/augment-extensions/workflows/database/module.json +42 -0
- package/augment-extensions/workflows/database/rules/data-migration.md +249 -0
- package/augment-extensions/workflows/database/rules/documentation-standards.md +339 -0
- package/augment-extensions/workflows/database/rules/migration-workflow.md +352 -0
- package/augment-extensions/workflows/database/rules/optimization-workflow.md +435 -0
- package/augment-extensions/workflows/database/rules/schema-design-workflow.md +535 -0
- package/augment-extensions/workflows/database/rules/testing-patterns.md +305 -0
- package/augment-extensions/workflows/database/rules/workflow.md +458 -0
- package/augment-extensions/workflows/openspec/module.json +4 -3
- package/augment-extensions/writing-standards/screenplay/README.md +171 -0
- package/augment-extensions/writing-standards/screenplay/examples/aaa-hollywood-scene.fountain +164 -0
- package/augment-extensions/writing-standards/screenplay/module.json +124 -0
- package/augment-extensions/writing-standards/screenplay/rules/universal-formatting.md +339 -0
- package/cli/MODULES.md +302 -0
- package/cli/dist/cli.js +142 -9
- package/cli/dist/cli.js.map +1 -1
- package/cli/dist/commands/catalog.d.ts +13 -0
- package/cli/dist/commands/catalog.d.ts.map +1 -0
- package/cli/dist/commands/catalog.js +104 -0
- package/cli/dist/commands/catalog.js.map +1 -0
- package/cli/dist/commands/gui.d.ts +6 -0
- package/cli/dist/commands/gui.d.ts.map +1 -0
- package/cli/dist/commands/gui.js +211 -0
- package/cli/dist/commands/gui.js.map +1 -0
- package/cli/dist/commands/init.d.ts.map +1 -1
- package/cli/dist/commands/init.js +12 -0
- package/cli/dist/commands/init.js.map +1 -1
- package/cli/dist/commands/install-rules.d.ts +14 -0
- package/cli/dist/commands/install-rules.d.ts.map +1 -0
- package/cli/dist/commands/install-rules.js +127 -0
- package/cli/dist/commands/install-rules.js.map +1 -0
- package/cli/dist/commands/link.d.ts.map +1 -1
- package/cli/dist/commands/link.js +9 -11
- package/cli/dist/commands/link.js.map +1 -1
- package/cli/dist/commands/list.d.ts.map +1 -1
- package/cli/dist/commands/list.js +11 -28
- package/cli/dist/commands/list.js.map +1 -1
- package/cli/dist/commands/mcp.d.ts +48 -0
- package/cli/dist/commands/mcp.d.ts.map +1 -0
- package/cli/dist/commands/mcp.js +229 -0
- package/cli/dist/commands/mcp.js.map +1 -0
- package/cli/dist/commands/self-remove.d.ts +7 -0
- package/cli/dist/commands/self-remove.d.ts.map +1 -0
- package/cli/dist/commands/self-remove.js +179 -0
- package/cli/dist/commands/self-remove.js.map +1 -0
- package/cli/dist/commands/show.d.ts.map +1 -1
- package/cli/dist/commands/show.js +29 -99
- package/cli/dist/commands/show.js.map +1 -1
- package/cli/dist/commands/skill.d.ts +67 -0
- package/cli/dist/commands/skill.d.ts.map +1 -0
- package/cli/dist/commands/skill.js +513 -0
- package/cli/dist/commands/skill.js.map +1 -0
- package/cli/dist/commands/unlink.d.ts +6 -0
- package/cli/dist/commands/unlink.d.ts.map +1 -0
- package/cli/dist/commands/unlink.js +115 -0
- package/cli/dist/commands/unlink.js.map +1 -0
- package/cli/dist/commands/validate.d.ts +6 -0
- package/cli/dist/commands/validate.d.ts.map +1 -0
- package/cli/dist/commands/validate.js +159 -0
- package/cli/dist/commands/validate.js.map +1 -0
- package/cli/dist/utils/catalog-sync.d.ts +22 -0
- package/cli/dist/utils/catalog-sync.d.ts.map +1 -0
- package/cli/dist/utils/catalog-sync.js +157 -0
- package/cli/dist/utils/catalog-sync.js.map +1 -0
- package/cli/dist/utils/character-count.d.ts +56 -0
- package/cli/dist/utils/character-count.d.ts.map +1 -0
- package/cli/dist/utils/character-count.js +190 -0
- package/cli/dist/utils/character-count.js.map +1 -0
- package/cli/dist/utils/documentation-validator.d.ts +18 -0
- package/cli/dist/utils/documentation-validator.d.ts.map +1 -0
- package/cli/dist/utils/documentation-validator.js +233 -0
- package/cli/dist/utils/documentation-validator.js.map +1 -0
- package/cli/dist/utils/install-rules.d.ts +32 -0
- package/cli/dist/utils/install-rules.d.ts.map +1 -0
- package/cli/dist/utils/install-rules.js +375 -0
- package/cli/dist/utils/install-rules.js.map +1 -0
- package/cli/dist/utils/mcp-integration.d.ts +70 -0
- package/cli/dist/utils/mcp-integration.d.ts.map +1 -0
- package/cli/dist/utils/mcp-integration.js +292 -0
- package/cli/dist/utils/mcp-integration.js.map +1 -0
- package/cli/dist/utils/module-system.d.ts +153 -0
- package/cli/dist/utils/module-system.d.ts.map +1 -0
- package/cli/dist/utils/module-system.js +528 -0
- package/cli/dist/utils/module-system.js.map +1 -0
- package/cli/dist/utils/modules-catalog.d.ts +33 -0
- package/cli/dist/utils/modules-catalog.d.ts.map +1 -0
- package/cli/dist/utils/modules-catalog.js +163 -0
- package/cli/dist/utils/modules-catalog.js.map +1 -0
- package/cli/dist/utils/rule-install-hooks.d.ts +19 -0
- package/cli/dist/utils/rule-install-hooks.d.ts.map +1 -0
- package/cli/dist/utils/rule-install-hooks.js +224 -0
- package/cli/dist/utils/rule-install-hooks.js.map +1 -0
- package/cli/dist/utils/skill-system.d.ts +95 -0
- package/cli/dist/utils/skill-system.d.ts.map +1 -0
- package/cli/dist/utils/skill-system.js +313 -0
- package/cli/dist/utils/skill-system.js.map +1 -0
- package/modules.md +518 -106
- package/package.json +12 -3
@@ -0,0 +1,435 @@

# Token-Based MCP Example: Legal Contract Analysis

## Use Case

A legal AI assistant that analyzes long contracts (50-200 pages) and answers questions about specific clauses, obligations, and risks.

**Challenges**:
- Contracts can approach or exceed the model's context window (200k tokens)
- Need to maintain context across multiple queries
- Must preserve exact wording for legal accuracy

---

## Configuration

```json
{
  "mcp": {
    "type": "token",
    "contextWindow": {
      "modelMaxTokens": 200000,
      "outputBuffer": 4096,
      "systemPromptTokens": 500,
      "effectiveWindow": 195404
    },
    "chunking": {
      "strategy": "sliding_window",
      "chunkSize": 4096,
      "overlap": 512
    },
    "summarization": {
      "enabled": true,
      "hierarchical": true,
      "levels": 3
    },
    "entitySpotlighting": {
      "enabled": true,
      "entityTypes": ["PARTY", "OBLIGATION", "DATE", "AMOUNT"]
    }
  }
}
```
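The `effectiveWindow` value follows from the other three `contextWindow` fields. A minimal sketch of that arithmetic, assuming the configuration is loaded as a plain dict shaped like the JSON above; the helper name `effective_window` is illustrative and not part of the package:

```python
def effective_window(context_window: dict) -> int:
    """Tokens left for input context after reserving output and system prompt."""
    remaining = (
        context_window["modelMaxTokens"]
        - context_window["outputBuffer"]
        - context_window["systemPromptTokens"]
    )
    if remaining <= 0:
        raise ValueError("outputBuffer and systemPromptTokens exceed modelMaxTokens")
    return remaining


# Values copied from the configuration in this example
context_window = {
    "modelMaxTokens": 200000,
    "outputBuffer": 4096,
    "systemPromptTokens": 500,
}
print(effective_window(context_window))  # 195404
```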

---

## Architecture

```
┌─────────────────────────────────────────────────────────────┐
│                Legal Contract (150k tokens)                 │
└─────────────────────────────────────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│                 Hierarchical Summarization                  │
│  ┌───────────────────────────────────────────────────────┐  │
│  │ Level 1: Detailed Summary (75k tokens, 50%)           │  │
│  └───────────────────────────────────────────────────────┘  │
│  ┌───────────────────────────────────────────────────────┐  │
│  │ Level 2: Medium Summary (30k tokens, 20%)             │  │
│  └───────────────────────────────────────────────────────┘  │
│  ┌───────────────────────────────────────────────────────┐  │
│  │ Level 3: Gist (7.5k tokens, 5%)                       │  │
│  └───────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│                     Entity Spotlighting                     │
│  Parties: Acme Corp, Beta LLC                               │
│  Key Dates: 2024-01-15 (effective), 2026-01-15 (expiry)     │
│  Obligations: Payment terms, Delivery schedule              │
│  Amounts: $500,000 (total), $50,000 (monthly)               │
└─────────────────────────────────────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│          Sliding Window Chunks (4096 tokens each)           │
│  Chunk 1: Sections 1-3 (overlap 512)                        │
│  Chunk 2: Sections 3-5 (overlap 512)                        │
│  Chunk 3: Sections 5-7 (overlap 512)                        │
│  ...                                                        │
└─────────────────────────────────────────────────────────────┘
```
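The per-level sizes in the diagram are fixed fractions of the original token count. A minimal sketch of that arithmetic, assuming the 50% / 20% / 5% ratios shown above. The helper name `summary_targets` is hypothetical, and integer percentages are used here only to sidestep floating-point rounding:

```python
def summary_targets(original_tokens: int, percents=(50, 20, 5)) -> dict:
    """Map each summary level to its target size in tokens."""
    names = ("level_1", "level_2", "gist")
    return {name: original_tokens * pct // 100 for name, pct in zip(names, percents)}


print(summary_targets(150_000))
# {'level_1': 75000, 'level_2': 30000, 'gist': 7500}
```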

---

## Implementation

### Step 1: Document Ingestion

```python
import tiktoken

class LegalContractAnalyzer:
    def __init__(self, config):
        self.config = config
        self.tokenizer = tiktoken.encoding_for_model("gpt-4o")
        self.summaries = {}
        self.entities = {}
        self.chunks = []

    def ingest_contract(self, contract_text: str):
        """Ingest and process contract"""
        # Step 1: Create hierarchical summaries
        self.summaries = self.create_hierarchical_summaries(contract_text)

        # Step 2: Extract entities
        self.entities = self.extract_entities(contract_text)

        # Step 3: Create sliding window chunks
        self.chunks = self.create_sliding_window_chunks(contract_text)

        print("Ingested contract:")
        print(f"  Original: {self.count_tokens(contract_text)} tokens")
        print(f"  Level 1 Summary: {self.count_tokens(self.summaries['level_1'])} tokens")
        print(f"  Level 2 Summary: {self.count_tokens(self.summaries['level_2'])} tokens")
        print(f"  Level 3 Gist: {self.count_tokens(self.summaries['gist'])} tokens")
        print(f"  Chunks: {len(self.chunks)}")
        # self.entities maps entity type -> list, so count the items, not the keys
        print(f"  Entities: {sum(len(items) for items in self.entities.values())}")

    def count_tokens(self, text: str) -> int:
        """Count tokens in text"""
        return len(self.tokenizer.encode(text))
```

### Step 2: Hierarchical Summarization

```python
# Methods of LegalContractAnalyzer (continued)
def create_hierarchical_summaries(self, text: str):
    """Create multi-level summaries"""
    summaries = {}
    original_tokens = self.count_tokens(text)

    # Level 1: compress the full text to 50% of the original length
    summaries['level_1'] = self.summarize(
        text,
        max_tokens=int(original_tokens * 0.5)
    )

    # Level 2: compress Level 1 to 20% of the original length
    summaries['level_2'] = self.summarize(
        summaries['level_1'],
        max_tokens=int(original_tokens * 0.2)
    )

    # Level 3 (gist): compress Level 2 to 5% of the original length
    summaries['gist'] = self.summarize(
        summaries['level_2'],
        max_tokens=int(original_tokens * 0.05)
    )

    return summaries

def summarize(self, text: str, max_tokens: int):
    """Summarize text to target token count"""
    prompt = f"""
    Summarize the following legal contract to approximately {max_tokens} tokens.
    Preserve key parties, obligations, dates, and amounts.
    Maintain legal precision.

    Contract: {text}

    Summary:
    """

    # llm_call is not defined in this example; it stands for whatever
    # client function sends a prompt to the model and returns its text.
    return llm_call(prompt, max_tokens=max_tokens)
```
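The snippets in this walkthrough call `llm_call` without defining it. To exercise the pipeline offline, a stand-in along the following lines may be enough. This is purely a hypothetical placeholder, not a real client API: it assumes `llm_call` takes a prompt plus an optional `max_tokens` and returns a string, and it approximates tokens with whitespace-separated words.

```python
from typing import Optional


def llm_call(prompt: str, max_tokens: Optional[int] = None) -> str:
    """Hypothetical stand-in for a model client (an assumption, not a real API).

    Echoes the prompt's words, truncated to ``max_tokens`` whitespace-separated
    words, so the surrounding code can run without network access.
    """
    words = prompt.split()
    if max_tokens is not None:
        words = words[:max_tokens]
    return " ".join(words)


print(llm_call("Summarize the following legal contract", max_tokens=3))
# Summarize the following
```

Swapping this placeholder for a real client should only require keeping the same two parameters and the string return value.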

### Step 3: Entity Extraction

```python
import json

def extract_entities(self, text: str):
    """Extract key entities"""
    prompt = f"""
    Extract key entities from this legal contract:
    - Parties (companies, individuals)
    - Obligations (what each party must do)
    - Dates (effective date, expiry, milestones)
    - Amounts (payments, penalties, limits)

    Contract: {text}

    Return as JSON:
    {{
        "parties": [...],
        "obligations": [...],
        "dates": [...],
        "amounts": [...]
    }}
    """

    # llm_call is assumed to be defined elsewhere and to return raw JSON text
    response = llm_call(prompt)
    return json.loads(response)
```
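Model output is not guaranteed to be bare JSON, so `json.loads` on the raw response can fail. A defensive parsing sketch, assuming the four keys requested in the prompt above; the helper `parse_entities` and its fallback-to-empty behavior are illustrative choices, not part of the original example:

```python
import json

EXPECTED_KEYS = ("parties", "obligations", "dates", "amounts")
FENCE = "`" * 3  # Markdown code fence that models often wrap JSON in


def parse_entities(response: str) -> dict:
    """Parse a model response into the four entity lists, tolerating code fences."""
    cleaned = response.strip()
    if cleaned.startswith(FENCE):
        # Drop the opening fence line (e.g. a fence tagged "json") and the closing fence
        cleaned = cleaned.split("\n", 1)[1] if "\n" in cleaned else ""
        cleaned = cleaned.rsplit(FENCE, 1)[0]
    try:
        data = json.loads(cleaned)
    except json.JSONDecodeError:
        data = {}
    if not isinstance(data, dict):
        data = {}
    result = {}
    for key in EXPECTED_KEYS:
        value = data.get(key)
        # Missing or malformed entries fall back to an empty list
        result[key] = [str(item) for item in value] if isinstance(value, list) else []
    return result


wrapped = FENCE + 'json\n{"parties": ["Acme Corp", "Beta LLC"]}\n' + FENCE
print(parse_entities(wrapped)["parties"])  # ['Acme Corp', 'Beta LLC']
```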

---

### Step 4: Sliding Window Chunks

```python
def create_sliding_window_chunks(self, text: str):
    """Create overlapping chunks"""
    chunk_size = self.config['chunking']['chunkSize']
    overlap = self.config['chunking']['overlap']

    tokens = self.tokenizer.encode(text)
    chunks = []
    start = 0

    while start < len(tokens):
        # Clamp the final window so end_token never points past the document
        end = min(start + chunk_size, len(tokens))
        chunk_tokens = tokens[start:end]
        chunk_text = self.tokenizer.decode(chunk_tokens)

        chunks.append({
            'text': chunk_text,
            'start_token': start,
            'end_token': end,
            'token_count': len(chunk_tokens)
        })

        if end == len(tokens):
            # Stop here; stepping back by `overlap` would emit a redundant tail chunk
            break
        start = end - overlap

    return chunks
```
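The chunk boundaries depend only on the token count, chunk size, and overlap, so they can be checked without a tokenizer. A minimal sketch under that assumption (the function name `sliding_windows` is illustrative); it clamps the last window to the end of the document and stops there:

```python
from typing import List, Tuple


def sliding_windows(n_tokens: int, chunk_size: int, overlap: int) -> List[Tuple[int, int]]:
    """Return (start, end) token spans covering n_tokens with the given overlap."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    spans = []
    start = 0
    while start < n_tokens:
        end = min(start + chunk_size, n_tokens)
        spans.append((start, end))
        if end == n_tokens:
            break  # the final window already reaches the end of the document
        start = end - overlap  # step forward by chunk_size - overlap
    return spans


spans = sliding_windows(150_000, chunk_size=4096, overlap=512)
print(len(spans))  # 42
print(spans[1])    # (3584, 7680)
print(spans[-1])   # (146944, 150000)
```

With the configuration used in this example, each window advances 3,584 tokens (4,096 minus the 512-token overlap), which is why a 150,000-token contract needs 42 windows rather than the 37 that non-overlapping chunks would give.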

### Step 5: Query Processing

```python
def answer_question(self, question: str):
    """Answer question about contract"""
    # Allocate token budget. effectiveWindow already excludes the output
    # buffer and system prompt, so the buffer is not subtracted a second time.
    available_budget = self.config['contextWindow']['effectiveWindow']
    output_buffer = self.config['contextWindow']['outputBuffer']

    # Budget allocation
    gist_budget = 500
    entities_budget = 1000
    relevant_chunks_budget = available_budget - gist_budget - entities_budget

    # Build context
    context_parts = []

    # 1. Add gist (always included)
    context_parts.append(f"Contract Overview:\n{self.summaries['gist']}")

    # 2. Add relevant entities
    relevant_entities = self.get_relevant_entities(question)
    context_parts.append(f"\nKey Entities:\n{self.format_entities(relevant_entities)}")

    # 3. Find and add relevant chunks
    relevant_chunks = self.find_relevant_chunks(question, relevant_chunks_budget)
    context_parts.append(f"\nRelevant Sections:\n{self.format_chunks(relevant_chunks)}")

    # Build final prompt
    context = '\n'.join(context_parts)
    prompt = f"""
    Based on the following contract information, answer the question.

    {context}

    Question: {question}

    Answer:
    """

    # llm_call is assumed to be defined elsewhere; this example does not define it
    return llm_call(prompt, max_tokens=output_buffer)

def get_relevant_entities(self, question: str):
    """Get entities relevant to question"""
    # Simple keyword matching (could use embeddings).
    # Assumes each entity is a plain string.
    relevant = {}
    question_lower = question.lower()

    for entity_type, entities in self.entities.items():
        relevant[entity_type] = [
            e for e in entities
            if any(word in question_lower for word in e.lower().split())
        ]

    return relevant

def find_relevant_chunks(self, question: str, token_budget: int):
    """Find most relevant chunks within budget"""
    # Score chunks by keyword overlap (could use embeddings)
    question_words = set(question.lower().split())
    scored_chunks = []

    for chunk in self.chunks:
        chunk_words = set(chunk['text'].lower().split())
        overlap = len(question_words & chunk_words)
        scored_chunks.append((chunk, overlap))

    # Sort by score, highest first
    scored_chunks.sort(key=lambda x: x[1], reverse=True)

    # Select chunks within budget
    selected = []
    total_tokens = 0

    for chunk, score in scored_chunks:
        if total_tokens + chunk['token_count'] <= token_budget:
            selected.append(chunk)
            total_tokens += chunk['token_count']
        else:
            break

    return selected

def format_entities(self, entities: dict):
    """Format entities for context"""
    lines = []
    for entity_type, entity_list in entities.items():
        if entity_list:
            lines.append(f"{entity_type}: {', '.join(entity_list)}")
    return '\n'.join(lines)

def format_chunks(self, chunks: list):
    """Format chunks for context"""
    return '\n\n---\n\n'.join([c['text'] for c in chunks])
```
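The greedy selection above keeps adding chunks until the budget is hit, even when a chunk shares no words with the question. One possible variation is sketched below as a standalone function. Skipping zero-score chunks and continuing past an oversized chunk are assumptions that change the original behavior, and the name `select_within_budget` is hypothetical:

```python
def select_within_budget(scored_chunks, token_budget):
    """Greedily keep the highest-scoring chunks whose token counts fit the budget."""
    selected, used = [], 0
    ranked = sorted(scored_chunks, key=lambda pair: pair[1], reverse=True)
    for chunk, score in ranked:
        if score == 0:
            break  # the remaining chunks share no keywords with the question
        if used + chunk["token_count"] > token_budget:
            continue  # too large for what is left; a smaller chunk may still fit
        selected.append(chunk)
        used += chunk["token_count"]
    return selected


# (chunk, keyword-overlap score) pairs with made-up sizes for illustration
scored = [
    ({"text": "payment terms", "token_count": 300}, 2),
    ({"text": "delivery schedule", "token_count": 900}, 1),
    ({"text": "signature page", "token_count": 100}, 0),
    ({"text": "late fees", "token_count": 200}, 1),
]
picked = select_within_budget(scored, token_budget=600)
print([c["text"] for c in picked])  # ['payment terms', 'late fees']
```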

---

## Example Usage

```python
# Initialize analyzer
config = {
    "contextWindow": {
        "modelMaxTokens": 200000,
        "outputBuffer": 4096,
        "systemPromptTokens": 500,
        "effectiveWindow": 195404
    },
    "chunking": {
        "chunkSize": 4096,
        "overlap": 512
    }
}

analyzer = LegalContractAnalyzer(config)

# Ingest contract
with open('contract.txt', 'r') as f:
    contract_text = f.read()

analyzer.ingest_contract(contract_text)

# Ask questions
questions = [
    "What are the payment terms?",
    "When does this contract expire?",
    "What are Acme Corp's obligations?",
    "What are the termination conditions?"
]

for question in questions:
    print(f"\nQ: {question}")
    answer = analyzer.answer_question(question)
    print(f"A: {answer}")
```

---

## Key Rules Applied

### ✅ Context Window Management
- **Rule**: Calculate effective window (model max - output buffer - system prompt)
- **Implementation**: 200,000 - 4,096 - 500 = 195,404 tokens available

### ✅ Hierarchical Summarization
- **Rule**: Create multi-level summaries for long documents
- **Implementation**: 3 levels (50%, 20%, and 5% of the original length)

### ✅ Entity Spotlighting
- **Rule**: Maintain entity reference tables
- **Implementation**: Extract parties, obligations, dates, amounts

### ✅ Sliding Windows
- **Rule**: Use overlapping chunks for context continuity
- **Implementation**: 4096 token chunks with 512 token overlap

### ✅ Token Budgeting
- **Rule**: Allocate budget across context components
- **Implementation**: Gist (500) + Entities (1000) + Chunks (remaining)

---

## Performance Metrics

| Metric | Value |
|--------|-------|
| Original Contract | 150,000 tokens |
| Level 1 Summary | 75,000 tokens (50%) |
| Level 2 Summary | 30,000 tokens (20%) |
| Gist | 7,500 tokens (5%) |
| Number of Chunks | 42 chunks (4,096 tokens each, 512-token overlap) |
| Average Query Context | 8,000 tokens |
| Query Latency | 2-3 seconds |
| Accuracy | 95%+ (preserves exact wording) |

---

## Benefits

✅ **Handles long contracts**: Up to 200k tokens
✅ **Maintains accuracy**: Preserves exact legal wording
✅ **Fast queries**: 2-3 second response time
✅ **Flexible context**: Adapts to question type
✅ **Cost-effective**: Minimizes token usage

---

## Limitations

❌ **No cross-document analysis**: Single contract only
❌ **No relationship modeling**: Entities not linked
❌ **No long-term memory**: Each query is independent
❌ **Limited semantic search**: Keyword-based chunk selection

---

## Extensions

To address limitations, consider:
- **Vector-based MCP**: For semantic chunk retrieval
- **Graph-augmented MCP**: For entity relationships
- **State-based MCP**: For multi-turn conversations
- **Hybrid MCP**: Combine all approaches

---