unique_toolkit 1.42.8__py3-none-any.whl → 1.43.0__py3-none-any.whl
- unique_toolkit/_common/experimental/write_up_agent/README.md +848 -0
- unique_toolkit/_common/experimental/write_up_agent/__init__.py +22 -0
- unique_toolkit/_common/experimental/write_up_agent/agent.py +170 -0
- unique_toolkit/_common/experimental/write_up_agent/config.py +42 -0
- unique_toolkit/_common/experimental/write_up_agent/examples/data.csv +13 -0
- unique_toolkit/_common/experimental/write_up_agent/examples/example_usage.py +78 -0
- unique_toolkit/_common/experimental/write_up_agent/examples/report.md +154 -0
- unique_toolkit/_common/experimental/write_up_agent/schemas.py +36 -0
- unique_toolkit/_common/experimental/write_up_agent/services/__init__.py +13 -0
- unique_toolkit/_common/experimental/write_up_agent/services/dataframe_handler/__init__.py +19 -0
- unique_toolkit/_common/experimental/write_up_agent/services/dataframe_handler/exceptions.py +29 -0
- unique_toolkit/_common/experimental/write_up_agent/services/dataframe_handler/service.py +150 -0
- unique_toolkit/_common/experimental/write_up_agent/services/dataframe_handler/utils.py +130 -0
- unique_toolkit/_common/experimental/write_up_agent/services/generation_handler/__init__.py +27 -0
- unique_toolkit/_common/experimental/write_up_agent/services/generation_handler/config.py +56 -0
- unique_toolkit/_common/experimental/write_up_agent/services/generation_handler/exceptions.py +79 -0
- unique_toolkit/_common/experimental/write_up_agent/services/generation_handler/prompts/config.py +34 -0
- unique_toolkit/_common/experimental/write_up_agent/services/generation_handler/prompts/system_prompt.j2 +15 -0
- unique_toolkit/_common/experimental/write_up_agent/services/generation_handler/prompts/user_prompt.j2 +21 -0
- unique_toolkit/_common/experimental/write_up_agent/services/generation_handler/service.py +369 -0
- unique_toolkit/_common/experimental/write_up_agent/services/template_handler/__init__.py +29 -0
- unique_toolkit/_common/experimental/write_up_agent/services/template_handler/default_template.j2 +37 -0
- unique_toolkit/_common/experimental/write_up_agent/services/template_handler/exceptions.py +39 -0
- unique_toolkit/_common/experimental/write_up_agent/services/template_handler/service.py +191 -0
- unique_toolkit/_common/experimental/write_up_agent/services/template_handler/utils.py +182 -0
- unique_toolkit/_common/experimental/write_up_agent/utils.py +24 -0
- unique_toolkit/agentic/feature_flags/__init__.py +6 -0
- unique_toolkit/agentic/feature_flags/feature_flags.py +32 -0
- unique_toolkit/agentic/message_log_manager/service.py +88 -12
- {unique_toolkit-1.42.8.dist-info → unique_toolkit-1.43.0.dist-info}/METADATA +7 -1
- {unique_toolkit-1.42.8.dist-info → unique_toolkit-1.43.0.dist-info}/RECORD +33 -5
- {unique_toolkit-1.42.8.dist-info → unique_toolkit-1.43.0.dist-info}/LICENSE +0 -0
- {unique_toolkit-1.42.8.dist-info → unique_toolkit-1.43.0.dist-info}/WHEEL +0 -0
# Write-Up Agent

The Write-Up Agent generates structured markdown reports from tabular data in a pandas DataFrame using Large Language Models (LLMs). It turns rows of data into coherent, well-organized narratives suitable for documentation, analysis reports, and technical write-ups.

## Table of Contents

- [Overview](#overview)
- [Key Features](#key-features)
- [Agent Workflow](#agent-workflow)
- [DataFrame and Template Relationship](#dataframe-and-template-relationship)
- [How It Works](#how-it-works)
- [Getting Started](#getting-started)
- [Template System](#template-system)
- [Configuration](#configuration)
- [Architecture](#architecture)
- [Examples](#examples)

---

## Overview

### What is the Write-Up Agent?

The Write-Up Agent bridges the gap between structured data and narrative documentation. Given a pandas DataFrame, it:

1. **Organizes** data by grouping rows into logical sections
2. **Summarizes** each section using LLM-powered text generation
3. **Generates** a cohesive markdown report with consistent formatting

### Use Cases

- **Data Analysis Reports**: Convert analysis results into readable summaries
- **FAQ Documentation**: Generate organized FAQ pages from Q&A pairs
- **Survey Summaries**: Transform survey responses into structured reports
- **Knowledge Base Articles**: Create documentation from structured knowledge entries
- **Executive Summaries**: Distill large datasets into key insights

---

## Key Features

### 🎯 Template-Driven Architecture

- **Single Source of Truth**: Jinja2 templates define both the data structure and the output format
- **Automatic Column Detection**: The template determines which columns are used
- **Flexible Grouping**: Organize data by any column (e.g., section, category, region)

### 🔄 Intelligent Processing

- **Adaptive Batching**: Automatically splits large groups to fit within row and token limits
- **Iterative Summarization**: Maintains context across batches for coherent outputs
- **Order Preservation**: Maintains the logical flow from your DataFrame

### 🛡️ Type-Safe & Robust

- **Pydantic Schemas**: Type-safe data structures (`GroupData`, `ProcessedGroup`)
- **Custom Exceptions**: Clear error messages for debugging
- **Automatic Normalization**: Column names converted to snake_case for template compatibility

### 🎨 Customizable

- **Custom Templates**: Define your own structure and formatting
- **Group-Specific Instructions**: Tailor LLM behavior per section
- **Configurable Batching**: Control row and token limits

---

## Agent Workflow

### What Does the Agent Do?

The Write-Up Agent follows a 6-step workflow to turn your DataFrame into a polished markdown report:

```
┌─────────────────────────────────────────────────────────────────┐
│  1. TEMPLATE PARSING                                            │
│     Parse Jinja template → Extract grouping & selected columns  │
└─────────────────────────────────────────────────────────────────┘
                                 ↓
┌─────────────────────────────────────────────────────────────────┐
│  2. COLUMN NORMALIZATION                                        │
│     DataFrame columns → Convert to snake_case                   │
└─────────────────────────────────────────────────────────────────┘
                                 ↓
┌─────────────────────────────────────────────────────────────────┐
│  3. VALIDATION                                                  │
│     Check: Required columns exist in normalized DataFrame       │
└─────────────────────────────────────────────────────────────────┘
                                 ↓
┌─────────────────────────────────────────────────────────────────┐
│  4. GROUPING                                                    │
│     Group DataFrame by detected column → Preserve order         │
└─────────────────────────────────────────────────────────────────┘
                                 ↓
┌─────────────────────────────────────────────────────────────────┐
│  5. LLM GENERATION (Per Group)                                  │
│     a. Batch rows if needed (token/row limits)                  │
│     b. Render batch content from template                       │
│     c. Build prompts (system + user with section context)       │
│     d. Call LLM with prompts                                    │
│     e. Aggregate batch summaries → Final summary                │
└─────────────────────────────────────────────────────────────────┘
                                 ↓
┌─────────────────────────────────────────────────────────────────┐
│  6. REPORT ASSEMBLY                                             │
│     Render final template with all LLM summaries → Markdown     │
└─────────────────────────────────────────────────────────────────┘
```

### Detailed Workflow Explanation

#### Step 1: Template Parsing

The agent parses your Jinja template to detect automatically:

- **Grouping column**: Found from `{{ g.column_name }}` patterns
- **Selected columns**: Found from `{{ row.column_name }}` patterns

Example:

```jinja
# {{ g.section }}       → Grouping column: "section"
{{ row.question }}      → Selected column: "question"
{{ row.answer }}        → Selected column: "answer"
```

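The detection step can be approximated with two regular expressions. This is a minimal sketch and not the toolkit's parser. The `extract_columns` name, the `RESERVED` set (taken from the reserved keywords listed under Template System), and the "exactly one grouping column" rule are assumptions made for illustration. A production parser would more likely walk the Jinja AST than scan raw text.

```python
import re

# Assumption: attributes used for template logic, never treated as data columns
RESERVED = {"rows", "llm_response", "instructions"}

# `g.<name>` marks the grouping column; `row.<name>` marks a selected column
GROUP_REF = re.compile(r"\bg\.([A-Za-z_]\w*)")
ROW_REF = re.compile(r"\brow\.([A-Za-z_]\w*)")


def extract_columns(template: str) -> tuple[str, list[str]]:
    """Return (grouping_column, selected_columns) referenced by a template."""
    # dict.fromkeys de-duplicates while keeping first-appearance order
    group_cols = [c for c in dict.fromkeys(GROUP_REF.findall(template)) if c not in RESERVED]
    selected = [c for c in dict.fromkeys(ROW_REF.findall(template)) if c not in RESERVED]
    if len(group_cols) != 1:
        raise ValueError(f"expected exactly one grouping column, found {group_cols}")
    return group_cols[0], selected


template = """
{% for g in groups %}
# {{ g.section }}
{% if g.llm_response %}{{ g.llm_response }}{% else %}
{% for row in g.rows %}**Q: {{ row.question }}**
A: {{ row.answer }}
{% endfor %}{% endif %}
{% endfor %}
"""
print(extract_columns(template))  # ('section', ['question', 'answer'])
```

`g.rows` and `g.llm_response` are matched by the `g.` pattern but filtered out as reserved. This is why only `section` remains as the grouping column.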
#### Step 2: Column Normalization

All DataFrame column names are converted to **snake_case**:

- `"My Section"` → `"my_section"`
- `"UserQuestion"` → `"user_question"`
- `"column-name"` → `"column_name"`

This turns every column name into a valid identifier that the template can reference with attribute syntax, such as `row.user_question`.

#### Step 3: Validation

The agent verifies that every column referenced in the template exists in the normalized DataFrame. If any are missing, it raises an error that lists both the missing and the available columns.

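Steps 2 and 3 together amount to "normalize, then compare column sets." The sketch below is an illustration under stated assumptions. `to_snake_case`, `normalize_and_validate`, and `MissingColumnsError` are hypothetical names; the toolkit's own error is the `DataFrameValidationError` shown under Validation and Error Messages. The regex rules only approximate the toolkit's normalization, although they reproduce the mappings listed above.

```python
import re

import pandas as pd


class MissingColumnsError(ValueError):
    """Hypothetical stand-in for the toolkit's validation error."""


def to_snake_case(name: str) -> str:
    s = re.sub(r"[^0-9A-Za-z]+", "_", name.strip())   # spaces, dashes -> "_"
    s = re.sub(r"([A-Z]+)([A-Z][a-z])", r"\1_\2", s)  # "APIDesign" -> "API_Design"
    s = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", s)     # "UserQuestion" -> "User_Question"
    return re.sub(r"_+", "_", s).strip("_").lower()


def normalize_and_validate(df: pd.DataFrame, required: list[str]) -> pd.DataFrame:
    """Rename columns to snake_case and check that every required column exists."""
    normalized = df.rename(columns=lambda c: to_snake_case(str(c)))
    missing = [c for c in required if c not in normalized.columns]
    if missing:
        raise MissingColumnsError(
            f"DataFrame missing required columns after snake_case normalization: {missing}. "
            f"Available columns: {list(normalized.columns)}"
        )
    return normalized


df = pd.DataFrame({"My Section": ["Intro"], "UserQuestion": ["What?"], "user-answer": ["This."]})
validated = normalize_and_validate(df, ["my_section", "user_question", "user_answer"])
print(list(validated.columns))  # ['my_section', 'user_question', 'user_answer']
```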
#### Step 4: Grouping

Data is grouped by the detected grouping column:

- Groups appear in **order of first appearance** (not alphabetically)
- Each group contains all rows with the same grouping value
- Selected columns are filtered for each group

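In pandas, "order of first appearance" corresponds to `groupby(..., sort=False)`; by default, `groupby` sorts the group keys. The sketch below assumes the column names are already normalized. `group_rows` is a hypothetical helper, and it returns plain dicts as a simplification of the toolkit's `GroupData` schema.

```python
import pandas as pd


def group_rows(df: pd.DataFrame, group_col: str, selected: list[str]) -> list[dict]:
    """Group rows by `group_col`, keeping groups in order of first appearance."""
    groups = []
    # sort=False keeps keys in the order they first occur instead of sorting them
    for key, sub in df.groupby(group_col, sort=False):
        groups.append({
            "group_key": key,
            # keep only the columns the template references, in original row order
            "rows": sub[selected].to_dict(orient="records"),
        })
    return groups


df = pd.DataFrame({
    "section": ["Methods", "Intro", "Methods"],
    "question": ["How?", "What?", "Why this way?"],
    "answer": ["Like this.", "A tool.", "Because."],
})
for g in group_rows(df, "section", ["question", "answer"]):
    print(g["group_key"], len(g["rows"]))
# Methods 2
# Intro 1
```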
#### Step 5: LLM Generation

For each group:

1. **Batching** (if needed): Large groups are split into manageable batches
2. **Content Rendering**: The template renders the batch data as LLM input
3. **Prompt Building**: System and user prompts are constructed with:
   - Section name (group key)
   - Group-specific instructions (if configured)
   - Previous batch summary (for context)
4. **LLM Call**: Generate a summary for the batch
5. **Aggregation**: If there are multiple batches, their summaries are combined iteratively

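The batching and iterative-summarization loop can be sketched as follows. Everything here is an assumption for illustration. `make_batches` and `summarize_group` are hypothetical names, and token counts come from a caller-supplied function (a rough characters/4 estimate inside `summarize_group`). The LLM is any callable that maps a prompt to text, so a stub can stand in for `LanguageModelService`. The toolkit's actual prompts and aggregation strategy may differ.

```python
from typing import Callable

Row = dict[str, str]


def make_batches(rows: list[Row], max_rows: int, max_tokens: int,
                 count_tokens: Callable[[Row], int]) -> list[list[Row]]:
    """Split rows into batches that respect both a row limit and a token budget."""
    batches: list[list[Row]] = []
    current: list[Row] = []
    used = 0
    for row in rows:
        cost = count_tokens(row)
        # Close the current batch if adding this row would break either limit
        if current and (len(current) >= max_rows or used + cost > max_tokens):
            batches.append(current)
            current, used = [], 0
        current.append(row)
        used += cost
    if current:
        batches.append(current)
    return batches


def summarize_group(section: str, rows: list[Row], llm: Callable[[str], str],
                    max_rows: int = 20, max_tokens: int = 4000) -> str:
    """Summarize a group batch by batch, feeding each summary into the next prompt."""
    def estimate(row: Row) -> int:  # rough heuristic: ~4 characters per token
        return max(1, sum(len(v) for v in row.values()) // 4)

    summary = ""
    for batch in make_batches(rows, max_rows, max_tokens, estimate):
        content = "\n".join(f"Q: {r['question']}\nA: {r['answer']}" for r in batch)
        prompt = f"Section: {section}\n"
        if summary:
            prompt += f"Previous summary: {summary}\n"
        prompt += f"Content to summarize:\n{content}"
        summary = llm(prompt)  # each call refines the running summary
    return summary


calls: list[str] = []


def fake_llm(prompt: str) -> str:
    """Stub LLM: records the prompt and returns a numbered placeholder."""
    calls.append(prompt)
    return f"summary #{len(calls)}"


rows = [{"question": f"Q{i}?", "answer": "x" * 40} for i in range(5)]
print(summarize_group("Methods", rows, fake_llm, max_rows=2))  # summary #3
```

With `max_rows=2`, the five rows form batches of 2, 2, and 1, so the stub is called three times. Each call after the first sees the previous summary.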
#### Step 6: Report Assembly

All group summaries are combined using the template to produce the final markdown report.

---

## DataFrame and Template Relationship

### The Critical Connection

The DataFrame and the template are **tightly coupled** through column names. Understanding this relationship is essential for successful report generation.

### 🔑 **CRITICAL: snake_case Requirement**

**All template variable references MUST use snake_case notation.**

Your DataFrame columns can use ANY naming convention:
```python
import pandas as pd

df = pd.DataFrame({
    'My Section': [...],    # Space-separated
    'UserQuestion': [...],  # PascalCase
    'user-answer': [...]    # kebab-case
})
```

But your template MUST reference them in snake_case:
```jinja
# {{ g.my_section }}        ✓ CORRECT
# {{ row.user_question }}   ✓ CORRECT
# {{ row.user_answer }}     ✓ CORRECT

# {{ g.My Section }}        ✗ WRONG - invalid Jinja syntax
# {{ row.UserQuestion }}    ✗ WRONG - no such column after normalization
```

### How It Works: Normalization Bridge

```
DataFrame Columns     snake_case         Template References
─────────────────     ───────────────    ───────────────────
"My Section"      →   "my_section"    ←  {{ g.my_section }}
"UserQuestion"    →   "user_question" ←  {{ row.user_question }}
"column-name"     →   "column_name"   ←  {{ row.column_name }}
"Product_ID"      →   "product_id"    ←  {{ row.product_id }}
```

### Template-DataFrame Mapping Example

**DataFrame:**
```python
import pandas as pd

df = pd.DataFrame({
    'Report Section': ['Executive Summary', 'Financial Analysis'],
    'Key Finding': ['Revenue up 20%', 'Costs reduced 15%'],
    'Data Source': ['Q4 Report', 'Annual Budget']
})
```

**Template (MUST use snake_case):**
```jinja
{% for g in groups %}
# {{ g.report_section }}  {# Normalized from "Report Section" #}

{% if g.llm_response %}
{{ g.llm_response }}
{% else %}
{% for row in g.rows %}
**Finding**: {{ row.key_finding }}  {# Normalized from "Key Finding" #}
**Source**: {{ row.data_source }}  {# Normalized from "Data Source" #}
{% endfor %}
{% endif %}
{% endfor %}
```

### Group-Specific Instructions: Key Format

When providing `group_specific_instructions`, use the following key format so that each instruction is matched to the correct group:

⚠️ **Key Format**: `"{snake_case_column}:{snake_case_value}"`

**Why both in snake_case?**

- DataFrame column names are automatically normalized to snake_case (e.g., `"Report Section"` → `"report_section"`)
- Group values are also normalized to snake_case (e.g., `"Executive Summary"` → `"executive_summary"`)
- Your instruction keys must match this normalized format

**Example:**

If your DataFrame has:

- Column: `"Section"`
- Values: `"Executive Summary"`, `"Detailed Analysis"`, `"Recommendations"`

Your keys must be:

```python
from unique_toolkit._common.experimental.write_up_agent import WriteUpAgentConfig
from unique_toolkit._common.experimental.write_up_agent.services.generation_handler import (
    GenerationHandlerConfig,
)

config = WriteUpAgentConfig(
    generation_handler_config=GenerationHandlerConfig(
        group_specific_instructions={
            # Format: "{snake_case_column}:{snake_case_value}"
            # Both parts must be in snake_case
            "section:executive_summary": "Be concise, highlight key metrics",
            "section:detailed_analysis": "Be thorough, include all data points",
            "section:recommendations": "Be actionable, prioritize by impact",
        }
    )
)
```

**Key Format**: `"{snake_case_column}:{snake_case_value}"`

- **Column name part**: MUST be snake_case (normalized column name)
- **Value part**: MUST be snake_case (normalized group value)

**Transformation Table:**

| DataFrame Column | DataFrame Value | Normalized Column | Normalized Value | Required Key |
| :--------------- | :-------------- | :---------------- | :--------------- | :----------- |
| `Section` | `Executive Summary` | `section` | `executive_summary` | `section:executive_summary` |
| `Report Section` | `User Feedback` | `report_section` | `user_feedback` | `report_section:user_feedback` |
| `topic-name` | `API-Design` | `topic_name` | `api_design` | `topic_name:api_design` |

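Building a lookup key means normalizing both halves and joining them with a colon. This is a minimal sketch. It assumes a regex-based normalizer that reproduces the rows of the transformation table above, and `snake` and `instruction_for` are hypothetical helpers rather than toolkit functions.

```python
import re
from typing import Optional


def snake(text: str) -> str:
    """Approximate snake_case normalization (assumption, not the toolkit's code)."""
    s = re.sub(r"[^0-9A-Za-z]+", "_", text.strip())
    s = re.sub(r"([A-Z]+)([A-Z][a-z])", r"\1_\2", s)
    s = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", s)
    return re.sub(r"_+", "_", s).strip("_").lower()


def instruction_for(column: str, value: str, instructions: dict[str, str]) -> Optional[str]:
    """Look up the instruction for a raw column name and group value, if any."""
    return instructions.get(f"{snake(column)}:{snake(value)}")


instructions = {"section:executive_summary": "Be concise, highlight key metrics"}
print(instruction_for("Section", "Executive Summary", instructions))
print(instruction_for("Section", "Recommendations", instructions))  # None: no key configured
```

A key written with raw values, such as `"Section:Executive Summary"`, would never be found by this kind of lookup. That is why the table insists on normalizing both parts.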
### Validation and Error Messages

If your template references columns incorrectly, you'll see an error like this:

```
DataFrameValidationError: DataFrame missing required columns after
snake_case normalization: ['My Section', 'UserQuestion']
Available columns: ['my_section', 'user_question', 'user_answer']
```

This tells you:

- What the template is looking for (incorrect format)
- What columns are actually available (snake_case format)

### Quick Reference: Naming Rules

| Component | Format | Example | Notes |
|-----------|--------|---------|-------|
| **DataFrame Columns** | Any format | `"My Column"`, `"UserName"` | Will be normalized to snake_case |
| **DataFrame Values** | Any format | `"Executive Summary"` | Will be normalized to snake_case |
| **Template Variables** | **snake_case** | `{{ g.my_column }}` | MUST use snake_case |
| **Group Instruction Keys** | **snake_case:snake_case** | `"my_column:executive_summary"` | Both parts in snake_case |

---

## How It Works

### Input: DataFrame

The agent requires a pandas DataFrame with your data. Column names can be in any format (spaces, PascalCase, etc.); they are automatically normalized to snake_case.

```python
import pandas as pd

df = pd.DataFrame({
    'Section': ['Introduction', 'Methods', 'Results'],
    'Question': ['What is X?', 'How does Y work?', 'What are Z?'],
    'Answer': ['X is...', 'Y works by...', 'Z are...']
})
```

### Configuration

Two main components:

1. **WriteUpAgentConfig**: Defines the template and generation settings
2. **LanguageModelService**: Provides LLM access for summarization (passed at process time)

```python
from unique_toolkit._common.experimental.write_up_agent import (
    WriteUpAgent,
    WriteUpAgentConfig,
)
from unique_toolkit.app.unique_settings import UniqueSettings
from unique_toolkit.language_model.service import LanguageModelService

config = WriteUpAgentConfig()  # Uses the default template
agent = WriteUpAgent(config=config)

# The LLM service is created separately and passed to process()
settings = UniqueSettings.from_env()
settings.init_sdk()
llm_service = LanguageModelService.from_settings(settings)
```

### Processing

The agent orchestrates a multi-step pipeline:

```python
report = agent.process(df, llm_service=llm_service)  # Returns a markdown string
```

**Internal Pipeline:**

1. **Template Parsing**: Extract the grouping column and selected columns from the template
2. **DataFrame Validation**: Verify that required columns exist (after snake_case normalization)
3. **Grouping**: Create groups based on the grouping column, preserving DataFrame order
4. **Batching**: Split large groups into manageable batches (token/row limits)
5. **LLM Generation**: Generate summaries for each batch with context
6. **Report Assembly**: Combine all summaries into the final markdown report

### Output: Markdown Report

```markdown
# Introduction

This section introduces the concept of X, explaining its fundamental
principles and applications...

---

# Methods

The methodology involves Y, which operates through a series of steps...

---
```

---

## Template System

### Templates as Configuration

The Jinja2 template serves as the **single source of truth**, defining:

- **Grouping column**: Which column to group by (e.g., `section`)
- **Selected columns**: Which columns to include in each group (e.g., `question`, `answer`)
- **Output structure**: How the final report should be formatted

### Default Template

The default template expects three columns: `section`, `question`, and `answer`.

```jinja
{% for g in groups %}
# {{ g.section }}

{% if g.llm_response %}
{{ g.llm_response }}
{% else %}
{% for row in g.rows %}
**Q: {{ row.question }}**

A: {{ row.answer }}

{% endfor %}
{% endif %}

---
{% endfor %}
```

### How Template Parsing Works

The agent automatically detects:

```jinja
# {{ g.section }}       → Grouping column: "section"
{{ row.question }}      → Selected column: "question"
{{ row.answer }}        → Selected column: "answer"
```

### Reserved Keywords

These keywords are reserved for template logic and are not treated as data columns:

- `g.rows`: List of row dictionaries
- `g.llm_response`: LLM-generated summary
- `g.instructions`: Group-specific instructions (future use)

### Two-Phase Rendering

The same template is rendered twice, and the `{% if g.llm_response %}` branch decides what each render produces.

**Phase 1 - LLM Input** (`g.llm_response` is None): the raw rows are rendered and sent to the LLM.
```markdown
**Q: What is the Write-Up Agent?**
A: A tool for generating reports...
```

**Phase 2 - Final Report** (`g.llm_response` is provided): the summary replaces the raw rows.
```markdown
The Write-Up Agent is an automated tool that transforms structured
DataFrame data into coherent summaries...
```

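The two phases can be reproduced with plain `jinja2` and the default template. This sketch assumes that groups are passed as dicts under the name `groups`. That matches the template's `{% for g in groups %}` loop but is not necessarily how the toolkit builds its render context. In Jinja, `g.llm_response` falls back to a dict-key lookup, and a missing key evaluates as falsy, which selects the row branch.

```python
from jinja2 import Template

DEFAULT_TEMPLATE = """{% for g in groups %}
# {{ g.section }}

{% if g.llm_response %}
{{ g.llm_response }}
{% else %}
{% for row in g.rows %}
**Q: {{ row.question }}**

A: {{ row.answer }}

{% endfor %}
{% endif %}

---
{% endfor %}"""

template = Template(DEFAULT_TEMPLATE)
rows = [{"question": "What is the Write-Up Agent?", "answer": "A tool for generating reports..."}]

# Phase 1: no llm_response yet, so the raw rows are rendered as LLM input
phase1 = template.render(groups=[{"section": "Introduction", "rows": rows}])

# Phase 2: the summary replaces the raw rows in the final report
phase2 = template.render(groups=[{
    "section": "Introduction",
    "rows": rows,
    "llm_response": "The Write-Up Agent is an automated tool that ...",
}])
print(phase1)
print(phase2)
```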
### Custom Templates

Create your own template for different data structures:

```jinja
{% for g in groups %}
## {{ g.category }}

{% if g.llm_response %}
{{ g.llm_response }}
{% else %}
{% for row in g.rows %}
- **{{ row.product }}**: ${{ row.price }} - {{ row.description }}
{% endfor %}
{% endif %}
{% endfor %}
```

This template expects the columns `category`, `product`, `price`, and `description`.

---

## Configuration

### WriteUpAgentConfig

```python
from unique_toolkit._common.experimental.write_up_agent import WriteUpAgentConfig
from unique_toolkit._common.experimental.write_up_agent.services.generation_handler import (
    GenerationHandlerConfig,
)

# `language_model_info` is assumed to be an LMI instance defined elsewhere
config = WriteUpAgentConfig(
    # Template (default: Q&A template for section/question/answer)
    template="{% for g in groups %}...",

    # Generation settings
    generation_handler_config=GenerationHandlerConfig(
        language_model=language_model_info,
        common_instruction="You are a technical writer...",
        max_rows_per_batch=20,
        max_tokens_per_batch=4000,
        group_specific_instructions={
            # Keys use the normalized "{snake_case_column}:{snake_case_value}" format
            "section:introduction": "Be welcoming and engaging",
            "section:methods": "Be precise and technical",
        },
    ),
)
```

### Configuration Options

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `template` | `str` | Default Q&A template | Jinja2 template defining structure |
| `generation_handler_config` | `GenerationHandlerConfig` | Default config | LLM generation settings |

### GenerationHandlerConfig

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `language_model` | `LMI` | Required | Language model to use |
| `common_instruction` | `str` | Default system prompt | Base instruction for all groups |
| `max_rows_per_batch` | `int` | 20 | Max rows per LLM call |
| `max_tokens_per_batch` | `int` | 4000 | Max tokens per LLM call |
| `group_specific_instructions` | `dict[str, str]` | `{}` | Custom instructions per group (format: `"column:value"`) |
| `prompts_config` | `GenerationHandlerPromptsConfig` | Default prompts | Configurable system and user prompt templates |

### GenerationHandlerPromptsConfig

Allows customization of the prompt templates used for LLM generation:

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `system_prompt_template` | `str` | Default system template | Jinja2 template for system prompt |
| `user_prompt_template` | `str` | Default user template | Jinja2 template for user prompt |

**Example: Custom Prompts**
```python
from unique_toolkit._common.experimental.write_up_agent.services.generation_handler import (
    GenerationHandlerConfig,
)
from unique_toolkit._common.experimental.write_up_agent.services.generation_handler.prompts.config import (
    GenerationHandlerPromptsConfig,
)

custom_prompts = GenerationHandlerPromptsConfig(
    system_prompt_template="""
You are a professional technical writer.
{{ common_instruction }}
""",
    user_prompt_template="""
Section: {{ section_name }}

{% if group_instruction %}
Special instructions: {{ group_instruction }}
{% endif %}

{% if previous_summary %}
Previous context: {{ previous_summary }}
{% endif %}

Content to summarize:
{{ content }}
""",
)

# `language_model_info` is assumed to be an LMI instance defined elsewhere
gen_config = GenerationHandlerConfig(
    language_model=language_model_info,
    prompts_config=custom_prompts,
)
```

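To see what the model would receive, the user prompt template above can be rendered directly with `jinja2`. The variable names (`section_name`, `group_instruction`, `previous_summary`, `content`) are the ones the example template uses. The values are illustrative, and this sketch does not claim to reproduce exactly how the toolkit fills them.

```python
from jinja2 import Template

USER_PROMPT_TEMPLATE = """
Section: {{ section_name }}

{% if group_instruction %}
Special instructions: {{ group_instruction }}
{% endif %}

{% if previous_summary %}
Previous context: {{ previous_summary }}
{% endif %}

Content to summarize:
{{ content }}
"""

prompt = Template(USER_PROMPT_TEMPLATE).render(
    section_name="Executive Summary",
    group_instruction="Be concise, highlight key metrics",
    previous_summary=None,  # first batch: no earlier summary, so that block is skipped
    content="**Q: What changed?**\n\nA: Revenue up 20%",
)
print(prompt)
```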
### Utility Functions

**template_loader(parent_dir, template_name)**

Helper function to load Jinja2 templates from the filesystem:

```python
from pathlib import Path
from unique_toolkit._common.experimental.write_up_agent.utils import template_loader

# Load a custom template file stored next to this script
template_path = Path(__file__).parent
template_content = template_loader(template_path, "my_template.j2")
```

| Parameter | Type | Description |
|-----------|------|-------------|
| `parent_dir` | `Path` | Directory containing the template |
| `template_name` | `str` | Name of the template file (e.g., "template.j2") |
| **Returns** | `str` | Template content as a string |

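A loader with this signature can be a thin wrapper over `pathlib`. The sketch below is a hypothetical equivalent: `load_template` is not the toolkit's function, and its encoding and error behavior are assumptions. It writes a temporary file so that the example runs anywhere.

```python
import tempfile
from pathlib import Path


def load_template(parent_dir: Path, template_name: str) -> str:
    """Read `template_name` from `parent_dir` and return its contents as a string."""
    path = parent_dir / template_name
    if not path.is_file():
        raise FileNotFoundError(f"Template not found: {path}")
    return path.read_text(encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    tmp_dir = Path(tmp)
    (tmp_dir / "my_template.j2").write_text(
        "{% for g in groups %}# {{ g.section }}\n{% endfor %}", encoding="utf-8"
    )
    content = load_template(tmp_dir, "my_template.j2")

print(content.startswith("{% for g in groups %}"))  # True
```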
---

## Architecture

### Dependency Injection Pattern

The agent uses **dependency injection** for the `LanguageModelService`:

- **At initialization**: The agent is configured with templates and settings, but no LLM service
- **At process time**: The LLM service is passed as a parameter to `process()`

**Benefits:**
- **Flexibility**: Use different LLM services for different processing runs
- **Reusability**: One agent instance can be used with multiple LLM services
- **Testability**: LLM services are easy to mock in tests
- **Separation**: Agent configuration is independent of the LLM service lifecycle

**Example:**
```python
# Initialize once with configuration
agent = WriteUpAgent(config=config)

# Use with different LLM services or configurations
# (gpt4_settings and claude_settings are placeholder settings objects)
llm_service_gpt4 = LanguageModelService.from_settings(gpt4_settings)
report1 = agent.process(df1, llm_service=llm_service_gpt4)

llm_service_claude = LanguageModelService.from_settings(claude_settings)
report2 = agent.process(df2, llm_service=llm_service_claude)
```

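Because the LLM service is passed to `process()` as an argument, a test can supply a stub instead of a real `LanguageModelService`. The toy below illustrates the pattern only. `SupportsComplete`, `ToyWriteUpAgent`, `StubLLM`, and the `complete` method are hypothetical and do not mirror the toolkit's classes or the `LanguageModelService` API.

```python
from typing import Protocol


class SupportsComplete(Protocol):
    """Hypothetical minimal interface an injected LLM service must satisfy."""

    def complete(self, prompt: str) -> str: ...


class ToyWriteUpAgent:
    """Configured once; the LLM service is injected per call and never stored."""

    def __init__(self, heading_level: int = 1) -> None:
        self.heading_level = heading_level

    def process(self, groups: dict[str, list[str]], llm_service: SupportsComplete) -> str:
        parts = []
        for section, rows in groups.items():
            summary = llm_service.complete(f"Summarize {section}: " + "; ".join(rows))
            parts.append(f"{'#' * self.heading_level} {section}\n\n{summary}\n")
        return "\n".join(parts)


class StubLLM:
    """Test double: records prompts and returns a canned answer."""

    def __init__(self) -> None:
        self.prompts: list[str] = []

    def complete(self, prompt: str) -> str:
        self.prompts.append(prompt)
        return "stub summary"


agent = ToyWriteUpAgent()
stub = StubLLM()
report = agent.process({"Intro": ["What?", "Why?"]}, llm_service=stub)
print(report)
```

The agent holds only configuration, so swapping the stub for a real service changes no agent code. This is the testability benefit listed above.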
### Separation of Concerns

The agent follows a clean architecture with three main handlers:

```
WriteUpAgent (Orchestrator)
├── TemplateHandler (Template Operations)
│   ├── Parse template structure
│   ├── Extract columns
│   └── Render groups
├── DataFrameHandler (Data Operations)
│   ├── Normalize column names
│   ├── Validate columns
│   └── Create groups
└── GenerationHandler (LLM Operations)
    ├── Create batches
    ├── Build prompts
    ├── Call LLM (with injected LanguageModelService)
    └── Aggregate summaries
```

### Data Flow

```
DataFrame → Normalize → Validate → Group → Batch → LLM → Aggregate → Report
```

### Type-Safe Schemas

```python
from unique_toolkit._common.experimental.write_up_agent import (
    GroupData,
    ProcessedGroup,
)

# GroupData: after DataFrame grouping
GroupData(
    group_key="Introduction",
    rows=[{"question": "...", "answer": "..."}],
)

# ProcessedGroup: after LLM generation
ProcessedGroup(
    group_key="Introduction",
    rows=[{"question": "...", "answer": "..."}],
    llm_response="The introduction section...",
)
```

---

## Examples

### Basic Usage

```python
import pandas as pd
from unique_toolkit._common.experimental.write_up_agent import (
    WriteUpAgent,
    WriteUpAgentConfig,
)
from unique_toolkit.app.unique_settings import UniqueSettings
from unique_toolkit.language_model.service import LanguageModelService

# Setup
settings = UniqueSettings.from_env()
settings.init_sdk()

# Create DataFrame
df = pd.DataFrame({
    'section': ['Intro', 'Methods', 'Results'],
    'question': ['What?', 'How?', 'What result?'],
    'answer': ['Answer 1', 'Answer 2', 'Answer 3']
})

# Initialize agent
config = WriteUpAgentConfig()
agent = WriteUpAgent(config=config)

# Create LLM service
llm_service = LanguageModelService.from_settings(settings)

# Generate report
report = agent.process(df, llm_service=llm_service)
print(report)
```

### Custom Template Example
|
|
678
|
+
|
|
679
|
+
```python
|
|
680
|
+
custom_template = """
|
|
681
|
+
{% for g in groups %}
|
|
682
|
+
# {{ g.region }} Market Analysis
|
|
683
|
+
|
|
684
|
+
{% if g.llm_response %}
|
|
685
|
+
{{ g.llm_response }}
|
|
686
|
+
{% else %}
|
|
687
|
+
{% for row in g.rows %}
|
|
688
|
+
- **{{ row.product }}**: {{ row.units }} units sold
|
|
689
|
+
{% endfor %}
|
|
690
|
+
{% endif %}
|
|
691
|
+
|
|
692
|
+
---
|
|
693
|
+
{% endfor %}
|
|
694
|
+
"""
|
|
695
|
+
|
|
696
|
+
config = WriteUpAgentConfig(template=custom_template)
|
|
697
|
+
|
|
698
|
+
df = pd.DataFrame({
|
|
699
|
+
'Region': ['North', 'South', 'East'],
|
|
700
|
+
'Product': ['Widget', 'Gadget', 'Tool'],
|
|
701
|
+
'Units': [100, 200, 150]
|
|
702
|
+
})
|
|
703
|
+
|
|
704
|
+
agent = WriteUpAgent(config=config)
|
|
705
|
+
llm_service = LanguageModelService.from_settings(settings)
|
|
706
|
+
report = agent.process(df, llm_service=llm_service)
|
|
707
|
+
```

### With Group-Specific Instructions

```python
from unique_toolkit._common.experimental.write_up_agent.services.generation_handler import (
    GenerationHandlerConfig,
)

# DataFrame column: "Section"
# DataFrame values: "Executive Summary", "Detailed Analysis", "Recommendations"

gen_config = GenerationHandlerConfig(
    # `language_model_info` must be defined beforehand: the language model info for the model you want to use
    language_model=language_model_info,
    common_instruction="You are an expert data analyst.",
    group_specific_instructions={
        # Format: "snake_case_column:snake_case_value"
        # Both column name AND value must be in snake_case
        "section:executive_summary": "Be concise, highlight key metrics",
        "section:detailed_analysis": "Be thorough, include all data points",
        "section:recommendations": "Be actionable, prioritize by impact",
    },
)

config = WriteUpAgentConfig(generation_handler_config=gen_config)
agent = WriteUpAgent(config=config)

# Process with LLM service
llm_service = LanguageModelService.from_settings(settings)
report = agent.process(df, llm_service=llm_service)
```

**Important**: Both the column name (`section`) AND the values (`executive_summary`, etc.) must be in snake_case to match the automatic normalization applied to your DataFrame.

---

## Advanced Features

### Automatic Column Normalization

All column names are automatically converted to snake_case for template compatibility:

| Original | Normalized |
|----------|------------|
| `My Column` | `my_column` |
| `UserName` | `user_name` |
| `section-name` | `section_name` |

Your DataFrame can use any naming convention; the agent normalizes column names automatically.
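A small regex-based function can reproduce the conversions in the table above. This is a minimal sketch, not the toolkit's actual implementation. `to_snake_case` is a hypothetical helper, and the real normalizer may handle edge cases such as acronyms, digits or non-ASCII names differently. The same conversion explains why `group_specific_instructions` keys take the form `section:executive_summary`.

```python
import re


def to_snake_case(name: str) -> str:
    """Hypothetical normalizer: 'My Column' -> 'my_column', 'UserName' -> 'user_name'."""
    # Insert '_' at lowercase/digit -> uppercase boundaries (camelCase, PascalCase)
    name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name.strip())
    # Collapse runs of spaces, hyphens and other non-alphanumerics into a single '_'
    name = re.sub(r"[^0-9A-Za-z]+", "_", name)
    return name.strip("_").lower()


for original in ["My Column", "UserName", "section-name"]:
    print(f"{original!r} -> {to_snake_case(original)!r}")

# Building a group-specific instruction key from raw DataFrame labels
key = f"{to_snake_case('Section')}:{to_snake_case('Executive Summary')}"
print(key)  # section:executive_summary
```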

### Order Preservation

Groups appear in the order in which they first appear in your DataFrame, not alphabetically:

```python
df = pd.DataFrame({
    'section': ['Summary', 'Details', 'Appendix', 'Details']  # 'Details' appears twice
})
# Report order: Summary → Details → Appendix (first appearance),
# not Appendix → Details → Summary (alphabetical). Both 'Details' rows form one group.
```
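To check what section order a report will have, you can pre-group the data with pandas. Passing `groupby(..., sort=False)` gives the same first-appearance ordering, whereas the default `sort=True` orders keys alphabetically. It is an assumption that the agent uses this exact call internally. The sketch only illustrates the ordering behavior described above.

```python
import pandas as pd

df = pd.DataFrame({
    "section": ["Summary", "Details", "Appendix", "Details"],
    "question": ["Q1", "Q2", "Q3", "Q4"],
})

# sort=False keeps group keys in order of first appearance;
# rows keep their original order within each group
groups = [(key, sub.to_dict("records")) for key, sub in df.groupby("section", sort=False)]
print([key for key, _ in groups])  # ['Summary', 'Details', 'Appendix']

# The default sort=True orders keys alphabetically instead
print([key for key, _ in df.groupby("section")])  # ['Appendix', 'Details', 'Summary']
```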

### Adaptive Batching

For groups with many rows, the agent automatically:

1. Splits the group into batches based on token/row limits
2. Processes each batch with the LLM
3. Maintains context by passing the previous batch's summary to the next batch
4. Aggregates all batch summaries into a final section summary
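The four steps above amount to a loop over fixed-size batches that carries a running summary from one batch to the next. The sketch below is a simplified illustration, not the agent's implementation. It splits on a row limit only, while the agent also considers token limits. `summarize` and `aggregate` are hypothetical callbacks standing in for the LLM calls.

```python
from typing import Callable, Optional


def summarize_in_batches(
    rows: list[dict],
    max_rows_per_batch: int,
    summarize: Callable[[list[dict], Optional[str]], str],
    aggregate: Callable[[list[str]], str],
) -> str:
    """Summarize `rows` batch by batch, feeding each batch the previous batch's summary."""
    if max_rows_per_batch < 1:
        raise ValueError("max_rows_per_batch must be at least 1")

    batch_summaries: list[str] = []
    previous: Optional[str] = None
    # 1. Split into batches of at most `max_rows_per_batch` rows
    for start in range(0, len(rows), max_rows_per_batch):
        batch = rows[start : start + max_rows_per_batch]
        # 2 + 3. Process the batch, passing along the previous summary for context
        previous = summarize(batch, previous)
        batch_summaries.append(previous)
    # 4. Combine the per-batch summaries into one section summary
    return aggregate(batch_summaries)


# Usage with fake callbacks in place of LLM calls
seen_context: list[Optional[str]] = []


def fake_summarize(batch: list[dict], previous: Optional[str]) -> str:
    seen_context.append(previous)
    return f"{len(batch)} rows"


rows = [{"question": f"Q{i}"} for i in range(5)]
print(summarize_in_batches(rows, 2, fake_summarize, " + ".join))  # 2 rows + 2 rows + 1 rows
print(seen_context)  # [None, '2 rows', '2 rows']
```

Passing only the previous summary, rather than all earlier rows, keeps each prompt bounded in size. The trade-off is that later batches see earlier data only in summarized form.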

### Error Handling

Custom exceptions provide clear error messages:

```python
from unique_toolkit._common.experimental.write_up_agent.services.dataframe_handler import (
    DataFrameValidationError,
    DataFrameGroupingError,
)
from unique_toolkit._common.experimental.write_up_agent.services.template_handler import (
    TemplateParsingError,
    ColumnExtractionError,
)

try:
    report = agent.process(df, llm_service=llm_service)
except DataFrameValidationError as e:
    print(f"Missing columns: {e.missing_columns}")
except TemplateParsingError as e:
    print(f"Template error: {e}")
except (DataFrameGroupingError, ColumnExtractionError) as e:
    print(f"Processing error: {e}")
```

---

## Best Practices

1. **Column Names**: Use descriptive names; they are normalized to snake_case automatically
2. **Data Organization**: Arrange DataFrame rows in a logical order; group order is preserved
3. **Template Design**: Start with the default template and customize it as needed
4. **Batch Sizes**: Adjust `max_rows_per_batch` based on data density
5. **Instructions**: Use `group_specific_instructions` to give sections different styles
6. **Testing**: Test with small datasets first to verify template parsing
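A cheap way to carry out the testing step is to render a template with Jinja2 against a couple of hand-built groups, without the agent or an LLM. This sketch assumes each group reaches the template as a mapping with the normalized grouping column and a `rows` list. The agent's actual group objects may differ. `StrictUndefined` makes a misspelled column reference raise an error instead of silently rendering as an empty string. It also raises on `{% if g.llm_response %}` when a sample group has no `llm_response` key, so add that key when previewing templates that test it.

```python
from jinja2 import Environment, StrictUndefined

# A small template following the required `{% for g in groups %}` pattern
template_src = (
    "{% for g in groups %}## {{ g.region }}\n"
    "{% for row in g.rows %}- {{ row.product }}: {{ row.units }}\n"
    "{% endfor %}{% endfor %}"
)

# Hand-built sample groups (hypothetical shape: grouping column + rows)
groups = [
    {"region": "North", "rows": [{"product": "Widget", "units": 100}]},
    {"region": "South", "rows": [{"product": "Gadget", "units": 200}]},
]

# StrictUndefined turns unknown attributes (e.g. a typo like g.regoin) into errors
env = Environment(undefined=StrictUndefined)
rendered = env.from_string(template_src).render(groups=groups)
print(rendered)
```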

---

## Troubleshooting

### "DataFrame missing required columns"

The template references columns that don't exist in your DataFrame (after snake_case normalization).

**Solution**: Check that the template's column references match your DataFrame columns in their snake_case form.

### "Template must use grouping pattern"

Your template doesn't include `{% for g in groups %}`.

**Solution**: Make sure the template follows the grouping pattern shown in the examples.

### "Single grouping column required"

Your template references multiple grouping columns (e.g., `{{ g.col1 }}`, `{{ g.col2 }}`).

**Solution**: Currently only single-column grouping is supported. Use one grouping column.
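A rough pre-flight check can catch the three failure modes above before you call the agent. This is a hypothetical, regex-based approximation, not the toolkit's `template_handler`. It treats `g.<name>` references as grouping columns and `row.<name>` references as row columns. It assumes that `rows`, `llm_response` and `group_key` are built-in group attributes rather than columns, and it expects `columns` to be snake_case already.

```python
import re

# Assumed group attributes that are not DataFrame columns
BUILTIN_GROUP_ATTRS = {"rows", "llm_response", "group_key"}
GROUP_LOOP = re.compile(r"\{%-?\s*for\s+g\s+in\s+groups\s*-?%\}")


def preflight_check(template: str, columns: list[str]) -> list[str]:
    """Return template column references missing from `columns`; raise on structural problems."""
    if not GROUP_LOOP.search(template):
        raise ValueError("Template must use the `{% for g in groups %}` grouping pattern")

    group_cols = set(re.findall(r"\bg\.([A-Za-z_]\w*)", template)) - BUILTIN_GROUP_ATTRS
    row_cols = set(re.findall(r"\brow\.([A-Za-z_]\w*)", template))

    if len(group_cols) != 1:
        raise ValueError(f"Exactly one grouping column required, found: {sorted(group_cols)}")

    return sorted((group_cols | row_cols) - set(columns))


template = """
{% for g in groups %}
# {{ g.region }}
{% for row in g.rows %}- {{ row.product }}: {{ row.units }}
{% endfor %}{% endfor %}
"""
print(preflight_check(template, ["region", "product", "units"]))  # []
print(preflight_check(template, ["region", "product"]))  # ['units']
```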

---

## Future Enhancements

- [ ] **Multi-column grouping support**: Group by multiple columns simultaneously (e.g., `region` and `category`)
- [ ] **Reference handling**: Support passing a reference map to automatically resolve and include references in the generated content

---

## Contributing

This is an experimental feature. Feedback and contributions are welcome!

---

## License

Part of the Unique Toolkit; see the LICENSE file in the main repository.