dialectica 0.3.0__tar.gz

@@ -0,0 +1,7 @@
1
+ scopes:
2
+ - name: mta
3
+ description: core agent logic, specialist definitions, and tool integrations in multi_tool_agent/; does not cover configuration files or project metadata.
4
+ - name: dialectica
5
+ description: core dialectica engine and pluggable stages in dialectica/; does not cover agent-specific logic or multi_tool_agent/.
6
+ - name: tests
7
+ description: automated test suites for all project components in tests/; does not cover source code or build configuration.
@@ -0,0 +1,13 @@
1
+ # Python-generated files
2
+ __pycache__/
3
+ *.py[oc]
4
+ build/
5
+ dist/
6
+ wheels/
7
+ *.egg-info
8
+
9
+ # Virtual environments
10
+ .venv
11
+
12
+ # Environment variables
13
+ .env
@@ -0,0 +1 @@
1
+ 3.11.13
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2026 Frad LEE
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
@@ -0,0 +1,475 @@
1
+ Metadata-Version: 2.4
2
+ Name: dialectica
3
+ Version: 0.3.0
4
+ Summary: An adversarial reasoning engine: a pluggable tree-search workflow where thoughts are generated, adversarially evaluated, and synthesized (thesis -> antithesis -> synthesis).
5
+ Author-email: Frad LEE <fradser@gmail.com>
6
+ License-Expression: MIT
7
+ License-File: LICENSE
8
+ Requires-Python: >=3.11
9
+ Requires-Dist: google-adk>=2.0.0
10
+ Requires-Dist: litellm>=1.66.0
11
+ Requires-Dist: pydantic>=2.11.3
12
+ Description-Content-Type: text/markdown
13
+
14
+ # Dialectica ![](https://img.shields.io/badge/A%20FRAD%20PRODUCT-WIP-yellow)
15
+
16
+ [![Twitter Follow](https://img.shields.io/twitter/follow/FradSer?style=social)](https://twitter.com/FradSer) [![Python Version](https://img.shields.io/badge/python-3.11+-blue.svg)](https://www.python.org/downloads/) [![Framework](https://img.shields.io/badge/Framework-ADK%202.1-orange.svg)](https://google.github.io/adk-docs/) [![Evaluation](https://img.shields.io/badge/Evaluation-GAN%20Adversarial-purple.svg)]()
17
+
18
+ English | [简体中文](README.zh-CN.md)
19
+
20
+ **Dialectica** is a pluggable adversarial reasoning engine. It searches a tree of "thoughts" where each thought is generated, adversarially evaluated and iteratively refined, then synthesized into an answer — *thesis → antithesis → synthesis* (Generator → Discriminator → Synthesizer). Inspired by [karpathy/autoresearch](https://github.com/karpathy/autoresearch)'s propose→evaluate→keep-best loop and Claude Code's composable workflows, every stage is a swappable component; the default wiring is Tree-of-Thoughts + a GAN-style evaluation loop on Google ADK 2.1.
21
+
22
+ ## Install
23
+
24
+ Use it as a library in your own project:
25
+
26
+ ```bash
27
+ uv add git+https://github.com/FradSer/dialectica
28
+ # or: pip install git+https://github.com/FradSer/dialectica
29
+ ```
30
+
31
+ ```python
32
+ import os, asyncio
33
+ from dialectica import create_engine
34
+
35
+ os.environ["GOOGLE_API_KEY"] = "..." # the app owns env setup
36
+
37
+ async def main():
38
+     result = await create_engine("Your problem here").run()
39
+     print(result["final_answer"])
40
+
41
+ asyncio.run(main())
42
+ ```
43
+
44
+ The library reads configuration from `os.environ` and does **not** load `.env`
45
+ itself. To work on Dialectica instead, see [Setup and Usage](#setup-and-usage).
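Because the library leaves environment setup to the application, an app that keeps its keys in a `.env` file has to populate `os.environ` itself before calling `create_engine`. The following is a minimal stdlib-only sketch of that step. `parse_env_lines` and `load_env_file` are hypothetical helper names, not part of Dialectica, and the parser only handles plain `KEY=VALUE` lines, `#` comment lines, and one pair of surrounding quotes.

```python
import os
from pathlib import Path


def parse_env_lines(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines; skip blank lines and '#' comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first '=' only, so values may contain '=' or ':'.
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def load_env_file(path: str = ".env") -> None:
    """Copy values into os.environ without overriding variables already set."""
    env_path = Path(path)
    if not env_path.exists():
        return
    for key, value in parse_env_lines(env_path.read_text()).items():
        os.environ.setdefault(key, value)
```

Calling `load_env_file()` once at application start, before `create_engine(...)`, keeps the library itself free of any `.env` handling.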
46
+
47
+ ## Key Features
48
+
49
+ ### 🧩 Pluggable engine (thesis → antithesis → synthesis)
50
+ The `Engine` owns only the search *control flow*; every decision is delegated to
51
+ an injected component, so any stage can be swapped without touching the engine:
52
+
53
+ | Stage | Role | Default |
54
+ |-------|------|---------|
55
+ | `Generator` | propose thoughts (**thesis**) | `LlmGenerator` |
56
+ | `Evaluator` | critique & refine (**antithesis**) | `AdversarialEvaluator` |
57
+ | `Selector` | choose the frontier | `BeamSearch` |
58
+ | `Synthesizer` | combine into an answer (**synthesis**) | `LlmSynthesizer` |
59
+
60
+ Retarget it at code review, research, or decision-making just by changing the
61
+ generator's prompts or swapping a stage — see [Pluggable Architecture](#pluggable-architecture).
62
+
63
+ ### 🔄 GAN-style adversarial evaluation (keep-best)
64
+ Each thought undergoes **iterative adversarial refinement** rather than a single pass:
65
+ 1. **Discriminator** scores it with a structured verdict (score, flaws, suggestions)
66
+ 2. **Generator** refines it from that critique
67
+ 3. **Discriminator** re-scores
68
+ 4. Loop until the quality threshold, a terminate signal, or `max_gan_rounds`
69
+
70
+ Refinement is **not assumed monotonic** — the loop keeps the *best-scoring* round
71
+ (à la autoresearch's "keep only what beats the current best"), and the node stores
72
+ that refined text so synthesis works on the improved version, not the original.
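The keep-best behaviour described above can be sketched independently of any LLM. This is an illustration under stated assumptions, not the actual `AdversarialEvaluator` code: `Verdict`, `refine_keep_best`, and the two callables standing in for the Discriminator and Generator are hypothetical, and what counts as one "round" is a guess (here, one scoring pass).

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Verdict:
    """Hypothetical, trimmed-down stand-in for a discriminator verdict."""
    score: float
    should_terminate: bool = False


def refine_keep_best(
    thought: str,
    score: Callable[[str], Verdict],        # stand-in for the Discriminator
    refine: Callable[[str, Verdict], str],  # stand-in for the Generator
    max_gan_rounds: int = 3,
    score_threshold: float = 7.0,
) -> tuple[str, float, int]:
    """Return (best_text, best_score, rounds_run)."""
    current = thought
    best_text, best_score = thought, float("-inf")
    rounds = 0
    for rounds in range(1, max_gan_rounds + 1):
        verdict = score(current)
        # Refinement is not assumed monotonic: remember only the best round.
        if verdict.score > best_score:
            best_text, best_score = current, verdict.score
        if verdict.score >= score_threshold or verdict.should_terminate:
            break
        if rounds < max_gan_rounds:
            current = refine(current, verdict)
    return best_text, best_score, rounds
```

If the third round scores lower than the second, the function still returns the second round's text, which is the property the paragraph above relies on.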
73
+
74
+ ### 🌳 Tree search with merit-based beam
75
+ - **Strategies are scored before the beam** — the frontier reflects merit, not generation order
76
+ - **Beam search** keeps the top-k most promising paths (`BeamSearch`, or `GreedySearch`)
77
+ - **Pruning**: paths below threshold are dropped; exploration stops when the beam empties
78
+ - **Multi-node synthesis**: the final answer integrates the top scoring thoughts across branches
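As a rough illustration of the pruning and top-k bullets above, the sketch below filters scored nodes by threshold and then keeps the `width` best. The function names and the plain `dict` of node id to score are assumptions made for the example; the real `BeamSearch` and `GreedySearch` classes may be structured differently.

```python
def select_top_k(scored: dict[str, float], width: int) -> list[str]:
    """Return up to `width` node ids, highest score first."""
    ranked = sorted(scored, key=lambda node_id: scored[node_id], reverse=True)
    return ranked[:width]


def next_beam(scored: dict[str, float], width: int, score_threshold: float) -> list[str]:
    """Drop nodes below the threshold, then keep the top `width` survivors."""
    survivors = {n: s for n, s in scored.items() if s >= score_threshold}
    return select_top_k(survivors, width)


scored = {"a": 6.0, "b": 8.5, "c": 7.2, "d": 9.1}
beam = next_beam(scored, width=2, score_threshold=7.0)    # merit order, not insertion order
greedy = next_beam(scored, width=1, score_threshold=7.0)  # width 1 behaves like a greedy search
empty = next_beam(scored, width=2, score_threshold=9.5)   # an empty beam ends exploration
```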
79
+
80
+ ### 📊 Structured evaluation results
81
+ The `Discriminator` returns a `DiscriminatorVerdict` via ADK `output_schema` (no
82
+ fragile text parsing). The engine wraps it as an `EvaluationResult`:
83
+ `score`, `flaws`, `suggestions`, `should_terminate`, `reasoning`,
84
+ `adversarial_rounds`, `refined_thought`, and the full per-round `history`.
85
+
86
+ ## Architecture
87
+
88
+ ```
89
+ User Problem
90
+
91
+ Engine — Phase 1: Initialize
92
+
93
+ Generator expands root → initial strategies
94
+ ↓ (each strategy scored by the Evaluator before it can enter the beam)
95
+ Engine — Phase 2: Explore (beam search)
96
+
97
+ For each node in the Selector's frontier:
98
+ ├── Generator expands it into children
99
+ └── for each child, Evaluator runs the GAN loop:
100
+ ├── Discriminator scores (structured verdict)
101
+ ├── Generator refines from the critique
102
+ ├── re-score, keep the best round
103
+ └── persist the refined thought + score on the node
104
+ → children ≥ threshold form the next beam
105
+
106
+ Engine — Phase 3: Synthesize
107
+
108
+ Synthesizer integrates the top thoughts
109
+
110
+ Final Answer (+ thought_tree, best_path, stats)
111
+ ```
112
+
113
+ ## Workflow Phases
114
+
115
+ ### Phase 1: Initialization
116
+ - Creates the root node from the user problem
117
+ - `Generator.expand(root)` produces the initial strategies (validated via `ThoughtData`)
118
+ - **Each strategy is adversarially scored**, then the ones clearing the threshold seed the beam (falling back to the Selector's top-k if none clear it)
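The seeding rule in the last bullet, including its fallback, might look like the sketch below. `seed_beam` is a hypothetical name, and whether the passing strategies are also capped at the beam width is not stated in the README, so this version leaves them uncapped.

```python
def seed_beam(scored: dict[str, float], width: int, score_threshold: float) -> list[str]:
    """Seed the beam with strategies that clear the threshold.

    If none clear it, fall back to the top `width` strategies so the
    search still has a frontier to explore.
    """
    ranked = sorted(scored, key=lambda node_id: scored[node_id], reverse=True)
    passing = [n for n in ranked if scored[n] >= score_threshold]
    return passing if passing else ranked[:width]


weak = {"s1": 5.0, "s2": 6.5, "s3": 4.0}
fallback = seed_beam(weak, width=2, score_threshold=7.0)                       # nothing clears 7.0
normal = seed_beam({"s1": 8.0, "s2": 6.5}, width=2, score_threshold=7.0)       # only s1 clears 7.0
```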
119
+
120
+ ### Phase 2: Exploration (beam search)
121
+ Iterates up to `max_depth` times:
122
+ 1. **Select**: `Selector.select(...)` picks the frontier from the active beam
123
+ 2. **Generate**: `Generator.expand(parent)` creates child thoughts
124
+ 3. **Evaluate**: `Evaluator.evaluate(...)` runs the GAN loop, keeping the best round and persisting the refined thought
125
+ 4. **Filter**: children scoring ≥ `score_threshold` form the next beam
126
+
127
+ Exploration stops when the beam empties or `max_depth` is reached.
128
+
129
+ ### Phase 3: Synthesis
130
+ - `Synthesizer.synthesize(...)` takes the top-scoring evaluated thoughts
131
+ - Produces a coherent, comprehensive final answer
132
+
133
+ ## Setup and Usage
134
+
135
+ 1. **Clone the repository:**
136
+ ```bash
137
+ git clone https://github.com/FradSer/dialectica
138
+ cd dialectica
139
+ ```
140
+
141
+ 2. **Set up environment variables:**
142
+ ```bash
143
+ cd dialectica
144
+ cp .env.example .env
145
+ # Edit .env with your API keys and model preferences
146
+ ```
147
+
148
+ 3. **Install dependencies:**
149
+ ```bash
150
+ uv sync
151
+ ```
152
+
153
+ 4. **Run a problem:**
154
+ ```python
155
+ import asyncio
156
+ from dialectica import create_engine
157
+
158
+ async def main():
159
+     engine = create_engine("Design a sustainable urban transport system")
160
+     result = await engine.run()
161
+     print(result["final_answer"])
162
+
163
+ asyncio.run(main())
164
+ ```
165
+
166
+ ## Configuration
167
+
168
+ ### Environment Variables
169
+
170
+ **Model Configuration:**
171
+ ```bash
172
+ # Default model for all agents
173
+ DEFAULT_MODEL_CONFIG=google:gemini-3.5-flash
174
+
175
+ # Role-specific overrides (optional)
176
+ GENERATOR_MODEL_CONFIG=google:gemini-3.1-pro
177
+ DISCRIMINATOR_MODEL_CONFIG=google:gemini-3.1-pro
178
+ SYNTHESIZER_MODEL_CONFIG=google:gemini-3.1-pro
179
+ ```
180
+
181
+ **Supported Providers:**
182
+ - `google:gemini-3.5-flash` (Google AI Studio)
183
+ - `openrouter:anthropic/claude-3.5-sonnet` (OpenRouter)
184
+ - `openai:gpt-4o` (OpenAI)
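The `provider:model` strings above split on the first colon, which matters for OpenRouter ids that contain a slash. The sketch below shows one way to parse them and to resolve a role-specific override against the default. `split_model_config` and `resolve_model_config` are hypothetical names, and the override-then-default precedence is an assumption based on the comments in the configuration block; the package's `llm_config.py` may behave differently.

```python
import os
from typing import Mapping


def split_model_config(spec: str) -> tuple[str, str]:
    """Split 'provider:model' on the first colon only."""
    provider, sep, model = spec.partition(":")
    if not sep or not provider or not model:
        raise ValueError(f"expected 'provider:model', got {spec!r}")
    return provider, model


def resolve_model_config(role: str, env: Mapping[str, str] = os.environ) -> tuple[str, str]:
    """Prefer <ROLE>_MODEL_CONFIG, else fall back to DEFAULT_MODEL_CONFIG."""
    spec = env.get(f"{role.upper()}_MODEL_CONFIG") or env["DEFAULT_MODEL_CONFIG"]
    return split_model_config(spec)
```

With `GENERATOR_MODEL_CONFIG` set and `SYNTHESIZER_MODEL_CONFIG` unset, the generator resolves to its override while the synthesizer falls back to the default.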
185
+
186
+ **API Credentials:**
187
+ ```bash
188
+ # Google AI Studio
189
+ GOOGLE_API_KEY=your-key-here
190
+
191
+ # Or Vertex AI
192
+ GOOGLE_GENAI_USE_VERTEXAI=true
193
+ GOOGLE_CLOUD_PROJECT=your-project
194
+ GOOGLE_CLOUD_LOCATION=us-central1
195
+
196
+ # OpenRouter
197
+ OPENROUTER_API_KEY=sk-or-...
198
+
199
+ # OpenAI
200
+ OPENAI_API_KEY=sk-...
201
+ OPENAI_API_BASE=https://api.openai.com/v1
202
+ ```
203
+
204
+ ### Engine Parameters
205
+
206
+ ```python
207
+ engine = create_engine(
208
+ problem="Your problem statement",
209
+ max_depth=4, # Max tree depth
210
+ beam_width=3, # Active paths per iteration
211
+ max_gan_rounds=3, # Max adversarial refinement rounds
212
+ score_threshold=7.0, # Min score to continue
213
+ synthesizer_model=None, # Optional model override
214
+ )
215
+ ```
216
+
217
+ ## Usage Examples
218
+
219
+ ### Basic Usage
220
+
221
+ ```python
222
+ from dialectica import create_engine
223
+
224
+ # Create the engine
225
+ engine = create_engine(
226
+ "Design a sustainable urban transport system"
227
+ )
228
+
229
+ # Run workflow (call this from inside an async function, e.g. via asyncio.run)
230
+ result = await engine.run()
231
+
232
+ # Access results
233
+ print(result["final_answer"])
234
+ print(f"Generated {len(result['thought_tree'])} thoughts")
235
+ print(f"Best path: {result['best_path']}")
236
+ ```
237
+
238
+ ### Inspecting the result
239
+
240
+ `run()` returns the answer plus the full search trace:
241
+
242
+ ```python
243
+ result = await engine.run()
244
+ result["final_answer"] # synthesized answer
245
+ result["best_path"] # node ids from root to the highest-scoring thought
246
+ result["thought_tree"] # every node, with scores and per-round GAN history
247
+ result["stats"] # total_thoughts, max_depth_reached, duration_seconds
248
+ ```
249
+
250
+ ### Custom Configuration
251
+
252
+ ```python
253
+ engine = create_engine(
254
+ problem="Optimize supply chain logistics",
255
+ max_depth=5,
256
+ beam_width=5,
257
+ max_gan_rounds=4,
258
+ score_threshold=8.0,
259
+ synthesizer_model="google:gemini-3.1-pro",
260
+ )
261
+ ```
262
+
263
+ ## Project Structure
264
+
265
+ ```
266
+ dialectica/
267
+ ├── __init__.py # Public API exports
268
+ ├── agent.py # Composition root: create_engine() wires defaults
269
+ ├── coordinator.py # Search engine — orchestrates the pluggable stages
270
+ ├── protocols.py # Stage interfaces: Generator/Evaluator/Selector/Synthesizer
271
+ ├── generation.py # LlmGenerator (default Generator) + list parsing
272
+ ├── gan_evaluator.py # AdversarialEvaluator / SinglePassEvaluator (Evaluator)
273
+ ├── selection.py # BeamSearch / GreedySearch (Selector)
274
+ ├── synthesis.py # LlmSynthesizer (default Synthesizer)
275
+ ├── agent_runtime.py # Single LLM-call seam (run_agent)
276
+ ├── agent_factory.py # Dynamic agent creation (role templates)
277
+ ├── models.py # ThoughtData, DiscriminatorVerdict, EvaluationResult
278
+ ├── llm_config.py # Model configuration factory
279
+ └── validation.py # Thought validation utilities
280
+ tests/
281
+ ├── conftest.py # Loads .env for the e2e skip guard
282
+ ├── helpers.py # Deterministic mock LLM stand-ins
283
+ ├── test_models.py # Schema / verdict unit tests
284
+ ├── test_generation.py # List parsing + generator prompt routing
285
+ ├── test_gan_evaluator.py # GAN loop + single-pass evaluator (mocked LLM)
286
+ ├── test_coordinator.py # Engine control flow (injected fake stages)
287
+ ├── test_default_pipeline.py # Default composition integration (mocked LLM)
288
+ └── test_e2e_live.py # Real Gemini E2E (marked `e2e`)
289
+ ```
290
+
291
+ ## Testing
292
+
293
+ The suite has two tiers:
294
+
295
+ - **Mocked tests** (default) — fast, deterministic, no API key. They replace
296
+ the LLM call seam with stand-ins and exercise the real orchestration: beam
297
+ search, the GAN refinement loop, pruning, and synthesis.
298
+ - **Live E2E** (`@pytest.mark.e2e`) — drives the full workflow against the real
299
+ Gemini API. Deselected by default and auto-skipped when `GOOGLE_API_KEY` is
300
+ unset (loaded from `dialectica/.env`).
301
+
302
+ ```bash
303
+ uv run pytest # mocked tests only (seconds, no key)
304
+ uv run pytest -m e2e # live API E2E (slower, requires GOOGLE_API_KEY)
305
+ ```
306
+
307
+ ## Pluggable Architecture
308
+
309
+ The `Engine` (still exported under its old name, `Coordinator`) owns only the search *control flow*. Every decision is
310
+ delegated to an injected component, so any stage can be swapped without
311
+ touching the engine — the engine is a general-purpose reasoning workflow, and
312
+ ToT + GAN is just the default wiring.
313
+
314
+ | Protocol | Responsibility | Default | Alternatives |
315
+ |----------|----------------|---------|--------------|
316
+ | `Generator` | expand a node into candidate thoughts | `LlmGenerator` | custom prompts/agent |
317
+ | `Evaluator` | score (and optionally refine) a thought | `AdversarialEvaluator` (GAN loop) | `SinglePassEvaluator` (cheap) |
318
+ | `Selector` | choose the next search frontier | `BeamSearch(width)` | `GreedySearch` |
319
+ | `Synthesizer` | combine thoughts into the answer | `LlmSynthesizer` | custom |
320
+
321
+ `create_engine(...)` wires the defaults. To customize, build the
322
+ components yourself and construct `Coordinator` directly:
323
+
324
+ ```python
325
+ from dialectica import (
326
+     Coordinator, BeamSearch, SinglePassEvaluator, LlmSynthesizer,
327
+ )
328
+ from dialectica.agent import build_default_components
329
+ from dialectica.agent_factory import create_agent
330
+ from dialectica.models import DiscriminatorVerdict
331
+
332
+ # Start from the defaults, then swap a stage:
333
+ generator, _evaluator, _selector, synthesizer = build_default_components()
334
+
335
+ discriminator = create_agent(
336
+ role="Discriminator", role_name="Discriminator", output_schema=DiscriminatorVerdict
337
+ )
338
+
339
+ engine = Coordinator(
340
+ problem="...",
341
+ generator=generator,
342
+ evaluator=SinglePassEvaluator(discriminator), # cheaper: no refinement loop
343
+ selector=BeamSearch(width=5), # wider frontier
344
+ synthesizer=synthesizer,
345
+ max_depth=3,
346
+ score_threshold=7.0,
347
+ )
348
+ result = await engine.run()  # await needs an enclosing async function
349
+ ```
350
+
351
+ Any object implementing a protocol's method works (they are
352
+ `typing.Protocol`, so no subclassing needed) — e.g. a non-LLM heuristic
353
+ `Evaluator`, or a `Selector` that keeps a diverse frontier instead of pure
354
+ top-k.
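Structural typing is what makes the "no subclassing" claim work. The sketch below defines a protocol and a non-LLM heuristic that satisfies it without inheriting from anything. The `evaluate(thought: str) -> float` signature and both class names are assumptions for illustration only; the real `Evaluator` protocol in `protocols.py` is likely asynchronous and returns a richer result.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class EvaluatorLike(Protocol):
    """Hypothetical, simplified stage interface (not the package's protocol)."""

    def evaluate(self, thought: str) -> float: ...


class LengthHeuristicEvaluator:
    """Non-LLM heuristic: longer thoughts score higher, capped at 10."""

    def evaluate(self, thought: str) -> float:
        words = len(thought.split())
        return min(10.0, words / 5)


evaluator = LengthHeuristicEvaluator()
# isinstance() on a runtime_checkable Protocol only checks that the method
# exists, not its signature; a static type checker verifies the signature.
conforms = isinstance(evaluator, EvaluatorLike)
```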
355
+
356
+ ## Key Components
357
+
358
+ ### Coordinator
359
+ Orchestrates the three-phase workflow against the stage protocols:
360
+ - Initialize → Explore → Synthesize
361
+ - Manages the thought tree and active beam
362
+ - Delegates generation, scoring, selection, and synthesis to injected components
363
+
364
+ ### AgentFactory
365
+ Creates agents from role templates:
366
+ - Standardized system prompts
367
+ - Tool configuration per role
368
+ - Model configuration per role
369
+ - Runtime agent instantiation
370
+
371
+ ### AdversarialEvaluator
372
+ Implements GAN-style evaluation:
373
+ - Generator proposes/refines thoughts
374
+ - Discriminator critiques with feedback
375
+ - Iterative refinement loop
376
+ - Structured evaluation results
377
+
378
+ ### ThoughtData Model
379
+ Validates thought structure:
380
+ - Required fields (id, parent_id, depth, content)
381
+ - Optional evaluation data
382
+ - GAN round tracking
383
+ - Evaluation history
384
+
385
+ ## Migration to v0.3
386
+
387
+ v0.3 renames the project to **Dialectica** and turns the monolithic coordinator
388
+ into a pluggable engine. The old public names still work as aliases.
389
+
390
+ | Was | Now |
391
+ |-----|-----|
392
+ | package `multi_tool_agent` | package `dialectica` |
393
+ | `create_engine(...)` | `create_engine(...)` *(old name aliased)* |
394
+ | `Coordinator` | `Engine` *(old name aliased)* |
395
+ | `coordinator.run(invocation_context)` | `engine.run()` *(no argument)* |
396
+ | `adk web` | run programmatically: `await create_engine(...).run()` |
397
+
398
+ ```python
399
+ # Old
400
+ from multi_tool_agent import create_engine
401
+ result = await create_engine("...").run(ctx)
402
+
403
+ # New
404
+ from dialectica import create_engine
405
+ result = await create_engine("...").run()
406
+ ```
407
+
408
+ Customization is now first-class — build the stages and inject them (see
409
+ [Pluggable Architecture](#pluggable-architecture)). Update any import path
410
+ `multi_tool_agent` → `dialectica` and call `run()` without an argument; those are
411
+ the only breaking changes for callers using the default pipeline.
412
+
413
+ ## Performance Considerations
414
+
415
+ **Token Consumption:**
416
+ - GAN evaluation: 2-6 LLM calls per thought (depending on rounds)
417
+ - Beam search: beam_width × max_depth iterations
418
+ - Typical problem: 50-200 thoughts, 200-800 LLM calls
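A back-of-envelope estimator helps when budgeting a run. The sketch below multiplies the number of thoughts by the evaluation calls spent on each; it ignores generation and synthesis calls, which is a simplifying assumption, and `estimate_llm_calls` is a hypothetical helper. The typical range quoted above (50-200 thoughts, 200-800 calls) works out to about 4 calls per thought.

```python
def estimate_llm_calls(thoughts: int, calls_per_thought: float) -> int:
    """Rough evaluation-call budget: thoughts x calls spent on each thought."""
    if thoughts < 0 or calls_per_thought < 0:
        raise ValueError("inputs must be non-negative")
    return round(thoughts * calls_per_thought)


low = estimate_llm_calls(50, 4)          # small run at ~4 calls per thought
high = estimate_llm_calls(200, 4)        # large run at ~4 calls per thought
worst_case = estimate_llm_calls(200, 6)  # every thought uses the 6-call maximum
```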
419
+
420
+ **Optimization Strategies:**
421
+ - Reduce `max_gan_rounds` to 1-2 for faster execution
422
+ - Raise `score_threshold` to prune harder; lower it to explore more paths
423
+ - Narrow the beam (`beam_width`) or use `GreedySearch` to cut fan-out
424
+ - Use a lighter model for the Generator and a stronger one for the Discriminator
425
+ - Swap in `SinglePassEvaluator` to skip the refinement loop entirely
426
+
427
+ ## Troubleshooting
428
+
429
+ ### Import Errors
430
+ ```bash
431
+ # Ensure Python 3.11+
432
+ python --version
433
+
434
+ # Reinstall dependencies
435
+ rm -rf .venv
436
+ uv sync
437
+ ```
438
+
439
+ ### ADK Version Mismatch
440
+ ```bash
441
+ # Check installed version
442
+ uv pip show google-adk
443
+
444
+ # Should show 2.1.0 or higher
445
+ ```
446
+
447
+ ### API Key Issues
448
+ ```bash
449
+ # Test Google AI Studio
450
+ export GOOGLE_API_KEY=your-key
451
+ uv run python -c "from dialectica import create_engine; print('OK')"
452
+ ```
453
+
454
+ ## Contributing
455
+
456
+ Contributions welcome! Areas of interest:
457
+ - New stage implementations (`Generator` / `Evaluator` / `Selector` / `Synthesizer`)
458
+ - Alternative search/selection policies (e.g. diversity-preserving frontiers)
459
+ - Performance optimizations
460
+ - Documentation improvements
461
+ - Test coverage
462
+
463
+ ## License
464
+
465
+ [MIT](LICENSE)
466
+
467
+ ## References
468
+
469
+ - [karpathy/autoresearch](https://github.com/karpathy/autoresearch) — propose → evaluate → keep-best loop
470
+ - [Google ADK Documentation](https://google.github.io/adk-docs/)
471
+ - [Tree of Thoughts Paper](https://arxiv.org/abs/2305.10601)
472
+
473
+ ## Acknowledgments
474
+
475
+ Built with [Google ADK](https://github.com/google/adk-python), inspired by Tree of Thoughts research, [karpathy/autoresearch](https://github.com/karpathy/autoresearch)'s autonomous keep-best loop, and Claude Code's composable workflows.