sentex 0.2.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (41)
  1. sentex-0.2.0/PKG-INFO +372 -0
  2. sentex-0.2.0/README.md +332 -0
  3. sentex-0.2.0/pyproject.toml +50 -0
  4. sentex-0.2.0/sentex/__init__.py +66 -0
  5. sentex-0.2.0/sentex/context.py +114 -0
  6. sentex-0.2.0/sentex/embedder.py +61 -0
  7. sentex-0.2.0/sentex/eval.py +287 -0
  8. sentex-0.2.0/sentex/fs.py +217 -0
  9. sentex-0.2.0/sentex/graph.py +731 -0
  10. sentex-0.2.0/sentex/knn.py +85 -0
  11. sentex-0.2.0/sentex/llm.py +51 -0
  12. sentex-0.2.0/sentex/manifest.py +38 -0
  13. sentex-0.2.0/sentex/persistence.py +181 -0
  14. sentex-0.2.0/sentex/pipeline.py +399 -0
  15. sentex-0.2.0/sentex/relations.py +185 -0
  16. sentex-0.2.0/sentex/retrieval.py +119 -0
  17. sentex-0.2.0/sentex/scoring.py +62 -0
  18. sentex-0.2.0/sentex/server.py +243 -0
  19. sentex-0.2.0/sentex/session.py +57 -0
  20. sentex-0.2.0/sentex/splitter.py +82 -0
  21. sentex-0.2.0/sentex/store.py +223 -0
  22. sentex-0.2.0/sentex/telemetry.py +197 -0
  23. sentex-0.2.0/sentex/tokens.py +27 -0
  24. sentex-0.2.0/sentex/types.py +84 -0
  25. sentex-0.2.0/sentex.egg-info/PKG-INFO +372 -0
  26. sentex-0.2.0/sentex.egg-info/SOURCES.txt +39 -0
  27. sentex-0.2.0/sentex.egg-info/dependency_links.txt +1 -0
  28. sentex-0.2.0/sentex.egg-info/requires.txt +26 -0
  29. sentex-0.2.0/sentex.egg-info/top_level.txt +1 -0
  30. sentex-0.2.0/setup.cfg +4 -0
  31. sentex-0.2.0/tests/test_byoa.py +233 -0
  32. sentex-0.2.0/tests/test_graph.py +181 -0
  33. sentex-0.2.0/tests/test_knn.py +52 -0
  34. sentex-0.2.0/tests/test_persistence.py +51 -0
  35. sentex-0.2.0/tests/test_pipeline.py +223 -0
  36. sentex-0.2.0/tests/test_relations_and_fs.py +291 -0
  37. sentex-0.2.0/tests/test_retrieval_levels.py +296 -0
  38. sentex-0.2.0/tests/test_scoring.py +93 -0
  39. sentex-0.2.0/tests/test_splitter.py +44 -0
  40. sentex-0.2.0/tests/test_store_and_session.py +227 -0
  41. sentex-0.2.0/tests/test_telemetry_and_eval.py +212 -0
sentex-0.2.0/PKG-INFO ADDED
@@ -0,0 +1,372 @@
Metadata-Version: 2.4
Name: sentex
Version: 0.2.0
Summary: Sentence-graph context management for multi-agent AI pipelines
License: MIT
Project-URL: Homepage, https://github.com/laxmanclo/sentex
Project-URL: Repository, https://github.com/laxmanclo/sentex
Project-URL: Issues, https://github.com/laxmanclo/sentex/issues
Keywords: ai,agents,context,llm,multi-agent,memory,rag
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: numpy>=1.26
Requires-Dist: sentence-transformers>=3.0
Requires-Dist: litellm>=1.40
Requires-Dist: fastapi>=0.111
Requires-Dist: uvicorn>=0.30
Requires-Dist: tiktoken>=0.7
Requires-Dist: nltk>=3.8
Provides-Extra: prometheus
Requires-Dist: prometheus-client>=0.20; extra == "prometheus"
Provides-Extra: eval
Requires-Dist: scikit-learn>=1.4; extra == "eval"
Provides-Extra: watch
Requires-Dist: httpx>=0.27; extra == "watch"
Provides-Extra: full
Requires-Dist: prometheus-client>=0.20; extra == "full"
Requires-Dist: scikit-learn>=1.4; extra == "full"
Requires-Dist: httpx>=0.27; extra == "full"
Provides-Extra: dev
Requires-Dist: pytest>=8; extra == "dev"
Requires-Dist: pytest-asyncio>=0.23; extra == "dev"
Requires-Dist: httpx>=0.27; extra == "dev"

# sentex

**Context management middleware for multi-agent AI pipelines.**

```
pip install sentex
```

You have agents. They produce outputs. The next agent needs to read those outputs — but not all of them. Just the parts that are relevant to *its specific task*, within a token budget.

Sentex is the layer between your agents that makes this work.

---

## The problem

In any multi-agent pipeline, Agent 3 ends up carrying the full output of Agents 1 and 2 — whether it's relevant or not. At scale this blows token budgets and degrades output quality. Every framework either dumps everything in or makes you manage it manually.

## How Sentex works

Every agent output goes into a shared sentence graph. When the next agent runs, Sentex retrieves exactly the sentences relevant to its task — traversing semantic KNN edges across agent boundaries — and delivers them within a token budget you set.

```
Agent 1 output → split into sentences → embedded → KNN graph

Agent 3 query → find entry point → BFS traversal → 8 sentences (not 60)
```

---

## Quickstart — bring your own agents

```python
from sentex import ContextGraph

graph = ContextGraph()

# === Your Agent 1 runs (any framework — LangChain, CrewAI, raw API) ===
output = my_agent_1.run("research the immune system")
graph.put("search-results", output, agent_id="researcher")

# === Build context for Agent 2 ===
context = graph.get("search-results", query="write a 60-second script", budget=2000)
# context = [most relevant sentences from Agent 1's output]

prompt = f"Use this research:\n{chr(10).join(context)}\n\nWrite the script."
script = my_agent_2.run(prompt)

graph.put("script", script, agent_id="writer")
graph.used("search-results")  # boosts retrieval for future runs
```

Three calls: `put()`, `get()`, `used()`. That's the integration.

---

## The four context levels

Every node in the graph has four representations:

| Level | What | When to use |
|-------|------|-------------|
| **L0** | ~50-token identity sentence | Scanning many nodes to decide which to load |
| **L1** | Sentence-graph retrieval | Default — get the relevant sentences, nothing else |
| **L2** | ~300-token extractive summary | When you need a coherent overview, not fragments |
| **L3** | Full raw content | When the agent needs everything |

```python
# L1 — sentence graph (default)
context = graph.get("search-results", query="immune cells", budget=2000)
# → ["T-cells recognize antigens on pathogen surfaces.",
#    "Antigen recognition triggers the adaptive immune cascade.", ...]

# L2 — extractive summary (first ~300 tokens of content)
summary = graph.get("search-results", query="", budget=2000, layer="l2")
# → "The immune system has two branches: innate and adaptive. T-cells..."

# L3 — full content
full = graph.get("search-results", query="", budget=99999, layer="l3")
# → the entire raw output from Agent 1

# L0 — identity (first sentence, used by scan_nodes)
identity = graph.get("search-results", query="", budget=100, layer="l0")
# → "The immune system has two branches: innate and adaptive."
```

---

## Multi-agent example

```python
from sentex import ContextGraph

graph = ContextGraph()

# Agent 1: researcher
research = researcher_agent.run("immune system")
graph.put("resources/research", research, agent_id="researcher")

# Agent 2: analyst — only sees the research sentences relevant to its task
context = graph.get("resources/research", query="key mechanisms and findings", budget=1500)
analysis = analyst_agent.run(f"Analyse:\n{chr(10).join(context)}")
graph.put("working/analysis", analysis, agent_id="analyst")
graph.used("resources/research")

# Agent 3: writer — graph edges cross agent boundaries automatically.
# Sentences from research and analysis that are semantically close get KNN edges.
# BFS traversal from each node surfaces the most relevant sentences from both.
research_ctx = graph.get("resources/research", query="write a script", budget=1000)
analysis_ctx = graph.get("working/analysis", query="write a script", budget=800)

script = writer_agent.run(
    f"Research:\n{chr(10).join(research_ctx)}\n\n"
    f"Analysis:\n{chr(10).join(analysis_ctx)}\n\n"
    f"Write a 60-second video script."
)
graph.put("working/script", script, agent_id="writer")
```

---

## Structured assembly (optional)

If you want Sentex to handle budget enforcement across multiple reads automatically:

```python
from sentex import ContextGraph, defineAgent, Read

graph = ContextGraph()

# ... agents 1 and 2 have run, graph has content ...

writer_manifest = defineAgent(
    id="writer",
    reads=[
        Read("resources/research", layer="l1", budget=1500),
        Read("working/analysis", layer="l2", budget=500),
    ],
    writes=["working/script"],
    token_budget=4000,  # total cap across all reads
    fallback="l2",      # if L1 confidence < 0.5, serve L2 instead
)

assembled = graph.assemble_for(writer_manifest, query="write a video script")

# assembled.context     → {"resources/research": [sentences], "working/analysis": "summary"}
# assembled.token_count → 1923 (enforced before the LLM call — never a surprise)
# assembled.layers_used → {"resources/research": "l1", "working/analysis": "l2"}
# assembled.confidence  → {"resources/research": 0.84, "working/analysis": 1.0}
# assembled.compressed  → [] (nothing fell back due to budget)
# assembled.missing     → [] (all declared reads were available)

# Build a prompt from the assembled context:
sections = []
for node_id, content in assembled.context.items():
    text = "\n".join(content) if isinstance(content, list) else content
    sections.append(f"[{node_id}]\n{text}")
prompt = "\n\n".join(sections) + "\n\nWrite a 60-second video script."

script = writer_agent.run(prompt)
graph.put("working/script", script, agent_id="writer")
graph.mark_used(assembled, used_ids=["resources/research"])
```

`assemble_for()` handles:
- Token budget across all reads (total cap, not per-read)
- Automatic fallback: L1 → L2 → L0 if over budget
- Confidence-based fallback: low similarity → serve L2 instead of bad sentences
- Full diagnostics on what was actually served
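The fallback order can be sketched in a few lines. This is a simplified illustration, not sentex's actual implementation: the `Node` container, the stored confidence value, and the whitespace token counter are stand-ins (sentex counts tokens with tiktoken).

```python
from dataclasses import dataclass, field

def tokens(content) -> int:
    # Crude whitespace token count; a stand-in for a real tokenizer.
    text = " ".join(content) if isinstance(content, list) else content
    return len(text.split())

@dataclass
class Node:
    l0: str                      # identity sentence
    l2: str                      # extractive summary
    l1_sentences: list = field(default_factory=list)
    l1_confidence: float = 0.0   # similarity of the best retrieved sentence

def serve_layer(node: Node, budget: int, min_confidence: float = 0.5):
    """L1 → L2 → L0: the confidence gate applies first, then the budget gate."""
    if node.l1_confidence >= min_confidence and tokens(node.l1_sentences) <= budget:
        return "l1", node.l1_sentences
    if tokens(node.l2) <= budget:
        return "l2", node.l2
    return "l0", node.l0
```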

---

## Dynamic node discovery (AutoRead)

When you don't know the node IDs at definition time — scan all nodes at L0, retrieve from the top-k:

```python
from sentex import ContextGraph, AutoRead, defineAgent

graph = ContextGraph()

# ... many nodes ingested under resources/* ...

manifest = defineAgent(
    id="synthesiser",
    reads=[AutoRead(top_k=3, layer="l1", budget_per_node=1000, scope="resources")],
    writes=["working/synthesis"],
    token_budget=5000,
)
assembled = graph.assemble_for(manifest, query="immune system mechanisms")
# → scans all resources/* nodes at L0
# → retrieves L1 from the 3 most relevant
# → keys in assembled.context: "auto:resources/research", "auto:resources/docs", ...
```

---

## Graph inspection

```python
graph.stats()
# → {"nodes": 3, "sentences": 47, "edges": 235, "edge_boosts": 12, "node_ids": [...]}

graph.get_node("resources/research")
# → ContextNode(id=..., produced_by=..., sentence_ids=..., l0=..., l2=..., ...)

# Scan nodes by relevance (L0 level — fast, no sentence retrieval)
graph.scan_nodes("immune cell coordination", top_k=3)
# → [("resources/research", 0.87), ("working/analysis", 0.74), ...]
```

---

## HTTP server (for TypeScript / non-Python pipelines)

```bash
uvicorn sentex.server:app --port 8765
```

```
POST /put         body: {node_id, content, agent_id}
POST /get         body: {node_id, query, budget, layer}
POST /used        body: {node_ids: [...]}
POST /scan        body: {query, top_k, scope}
POST /assemble    body: {agent_id, reads, token_budget, query}
GET  /nodes       list all nodes with token counts
GET  /nodes/{id}  inspect a single node
GET  /health      node count, sentence count
```

OpenAPI docs at `http://localhost:8765/docs`.
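From a non-Python client these are ordinary JSON POSTs. A minimal stdlib sketch (field names follow the table above; the request builder is split out as a hypothetical helper so it can be inspected without a running server):

```python
import json
import urllib.request

BASE = "http://localhost:8765"

def build_request(path: str, payload: dict) -> urllib.request.Request:
    # Build the JSON POST without sending it.
    return urllib.request.Request(
        BASE + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def call(path: str, payload: dict) -> dict:
    # Send the request and decode the JSON response.
    with urllib.request.urlopen(build_request(path, payload)) as resp:
        return json.load(resp)

# With the server running:
#   call("/put", {"node_id": "search-results", "content": text, "agent_id": "researcher"})
#   call("/get", {"node_id": "search-results", "query": "immune cells",
#                 "budget": 2000, "layer": "l1"})
```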

---

## Pipeline decorator (optional orchestration)

If you want Sentex to own the run loop rather than just managing context:

```python
from sentex import Pipeline, Read

pipeline = Pipeline()

@pipeline.agent(id="researcher", writes=["resources/research"])
async def researcher(ctx):
    return await ctx.llm("Research the immune system in detail.")

@pipeline.agent(
    id="writer",
    reads=[Read("resources/research", layer="l1", budget=2000)],
    writes=["working/script"],
    token_budget=4000,
)
async def writer(ctx):
    return await ctx.llm(ctx.render() + "\n\nWrite a 60-second video script.")

result = await pipeline.run(
    query="explain how the immune system works",
    llm="gpt-4o",  # any LiteLLM model string
)

print(result.outputs["working/script"])
print(result.summary())
```

---

## Cross-run memory (SQLite-backed)

Edge weights and node summaries persist between runs so retrieval improves over time:

```python
from sentex import Pipeline, MemoryStore

store = MemoryStore("./sentex.db")
pipeline = Pipeline(persist="./sentex.db")

# Run 1: baseline retrieval
# Run 2: L2 summaries cached, no regeneration needed
# Run 3+: session history available

store.all_sessions()
# → [{"session_id": ..., "query": ..., "started_at": ..., "committed_at": ...}, ...]
```

---

## How the sentence graph works

When content is ingested:
1. Split into sentences (NLTK Punkt; code blocks and lists are atomic)
2. Each sentence embedded (`all-MiniLM-L6-v2`, 384 dims, runs locally, no API key needed)
3. K nearest neighbours computed across **all sentences in the graph** — not just within the current node. Sentences from different agents get edges automatically.
4. Extractive L0 (first sentence) and L2 (first ~300 tokens) built immediately; LLM-generated summaries are optional and async.
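Step 3 is plain cosine-similarity KNN over one pooled embedding matrix, which is why no vector database is needed. A rough numpy sketch of the idea (the value of k and the edge representation are illustrative, not sentex's internals):

```python
import numpy as np

def knn_edges(embeddings: np.ndarray, k: int = 5) -> list[tuple[int, int, float]]:
    """embeddings: (n_sentences, dim), one row per sentence across ALL nodes,
    so neighbours (and therefore edges) can cross agent boundaries.
    Returns (i, j, cosine_similarity) for each sentence's k nearest neighbours."""
    # Normalise rows so the dot product equals cosine similarity.
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = normed @ normed.T
    np.fill_diagonal(sims, -np.inf)  # exclude self-edges
    edges = []
    for i, row in enumerate(sims):
        for j in np.argsort(row)[::-1][:k]:
            edges.append((i, int(j), float(row[j])))
    return edges
```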
335
+
336
+ When an agent retrieves at L1:
337
+ ```
338
+ embed(query) → highest-sim sentence in this node (entry point)
339
+ → BFS via KNN edges (crosses agent boundaries)
340
+ → collect until token budget hit
341
+ → return sorted by relevance
342
+ ```
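The traversal above amounts to a budget-bounded BFS. A simplified pure-Python sketch (the dict-based adjacency mirrors the "no graph database" design; function and argument names are illustrative, not sentex's actual API):

```python
from collections import deque

def retrieve_l1(entry_id, adjacency, sentences, similarity, budget):
    """BFS outward from the entry sentence, collecting until the token budget
    is hit, then return sentence ids sorted by similarity to the query.

    adjacency:  {sentence_id: [neighbour ids]}   (the KNN edges)
    sentences:  {sentence_id: text}
    similarity: {sentence_id: cosine similarity to the query}
    """
    tokens = lambda s: len(s.split())  # crude stand-in for tiktoken
    seen, picked, used = {entry_id}, [], 0
    queue = deque([entry_id])
    while queue:
        sid = queue.popleft()
        cost = tokens(sentences[sid])
        if used + cost > budget:
            break                      # budget hit: stop collecting
        picked.append(sid)
        used += cost
        for nb in adjacency.get(sid, ()):
            if nb not in seen:
                seen.add(nb)
                queue.append(nb)
    return sorted(picked, key=lambda s: similarity[s], reverse=True)
```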

---

## What Sentex does not do

- No agent orchestration — bring your own (LangChain, CrewAI, AutoGen, anything)
- No vector database required (numpy handles the KNN)
- No graph database required (Python dict handles adjacency)
- No cloud infrastructure (runs in-process; SQLite for optional persistence)
- No opinion on which LLM you use (any LiteLLM-compatible model for L0/L2 generation)

---

## Comparison

| | LangChain | LlamaIndex | **Sentex** |
|--|-----------|------------|------------|
| Sentence-level retrieval | No | No | **Yes** |
| Cross-agent graph edges | No | No | **Yes** |
| Works with any agent framework | Yes | Yes | **Yes** |
| Budget enforcement pre-call | No | No | **Yes** |
| Requires vector DB | No | Yes | **No** |
| Runs fully in-memory | Yes | No | **Yes** |
| HTTP server for polyglot pipelines | No | No | **Yes** |

---

## License

MIT