unique_toolkit 1.42.8__py3-none-any.whl → 1.43.0__py3-none-any.whl

Files changed (33)
  1. unique_toolkit/_common/experimental/write_up_agent/README.md +848 -0
  2. unique_toolkit/_common/experimental/write_up_agent/__init__.py +22 -0
  3. unique_toolkit/_common/experimental/write_up_agent/agent.py +170 -0
  4. unique_toolkit/_common/experimental/write_up_agent/config.py +42 -0
  5. unique_toolkit/_common/experimental/write_up_agent/examples/data.csv +13 -0
  6. unique_toolkit/_common/experimental/write_up_agent/examples/example_usage.py +78 -0
  7. unique_toolkit/_common/experimental/write_up_agent/examples/report.md +154 -0
  8. unique_toolkit/_common/experimental/write_up_agent/schemas.py +36 -0
  9. unique_toolkit/_common/experimental/write_up_agent/services/__init__.py +13 -0
  10. unique_toolkit/_common/experimental/write_up_agent/services/dataframe_handler/__init__.py +19 -0
  11. unique_toolkit/_common/experimental/write_up_agent/services/dataframe_handler/exceptions.py +29 -0
  12. unique_toolkit/_common/experimental/write_up_agent/services/dataframe_handler/service.py +150 -0
  13. unique_toolkit/_common/experimental/write_up_agent/services/dataframe_handler/utils.py +130 -0
  14. unique_toolkit/_common/experimental/write_up_agent/services/generation_handler/__init__.py +27 -0
  15. unique_toolkit/_common/experimental/write_up_agent/services/generation_handler/config.py +56 -0
  16. unique_toolkit/_common/experimental/write_up_agent/services/generation_handler/exceptions.py +79 -0
  17. unique_toolkit/_common/experimental/write_up_agent/services/generation_handler/prompts/config.py +34 -0
  18. unique_toolkit/_common/experimental/write_up_agent/services/generation_handler/prompts/system_prompt.j2 +15 -0
  19. unique_toolkit/_common/experimental/write_up_agent/services/generation_handler/prompts/user_prompt.j2 +21 -0
  20. unique_toolkit/_common/experimental/write_up_agent/services/generation_handler/service.py +369 -0
  21. unique_toolkit/_common/experimental/write_up_agent/services/template_handler/__init__.py +29 -0
  22. unique_toolkit/_common/experimental/write_up_agent/services/template_handler/default_template.j2 +37 -0
  23. unique_toolkit/_common/experimental/write_up_agent/services/template_handler/exceptions.py +39 -0
  24. unique_toolkit/_common/experimental/write_up_agent/services/template_handler/service.py +191 -0
  25. unique_toolkit/_common/experimental/write_up_agent/services/template_handler/utils.py +182 -0
  26. unique_toolkit/_common/experimental/write_up_agent/utils.py +24 -0
  27. unique_toolkit/agentic/feature_flags/__init__.py +6 -0
  28. unique_toolkit/agentic/feature_flags/feature_flags.py +32 -0
  29. unique_toolkit/agentic/message_log_manager/service.py +88 -12
  30. {unique_toolkit-1.42.8.dist-info → unique_toolkit-1.43.0.dist-info}/METADATA +7 -1
  31. {unique_toolkit-1.42.8.dist-info → unique_toolkit-1.43.0.dist-info}/RECORD +33 -5
  32. {unique_toolkit-1.42.8.dist-info → unique_toolkit-1.43.0.dist-info}/LICENSE +0 -0
  33. {unique_toolkit-1.42.8.dist-info → unique_toolkit-1.43.0.dist-info}/WHEEL +0 -0
@@ -0,0 +1,848 @@
1
+ # Write-Up Agent
2
+
3
+ The Write-Up Agent generates structured markdown reports from pandas DataFrames using Large Language Models (LLMs). It turns tabular data into organized narratives suitable for documentation, analysis reports, and technical write-ups.
4
+
5
+ ## Table of Contents
6
+
7
+ - [Overview](#overview)
8
+ - [Key Features](#key-features)
9
+ - [Agent Workflow](#agent-workflow)
10
+ - [DataFrame and Template Relationship](#dataframe-and-template-relationship)
11
+ - [How It Works](#how-it-works)
12
+ - [Getting Started](#getting-started)
13
+ - [Template System](#template-system)
14
+ - [Configuration](#configuration)
15
+ - [Architecture](#architecture)
16
+ - [Examples](#examples)
17
+
18
+ ---
19
+
20
+ ## Overview
21
+
22
+ ### What is the Write-Up Agent?
23
+
24
+ The Write-Up Agent bridges the gap between structured data and narrative documentation. Given a pandas DataFrame, it:
25
+
26
+ 1. **Organizes** data by grouping rows into logical sections
27
+ 2. **Summarizes** each section using LLM-powered text generation
28
+ 3. **Generates** a cohesive markdown report with consistent formatting
29
+
30
+ ### Use Cases
31
+
32
+ - **Data Analysis Reports**: Convert analysis results into readable summaries
33
+ - **FAQ Documentation**: Generate organized FAQ pages from Q&A pairs
34
+ - **Survey Summaries**: Transform survey responses into structured reports
35
+ - **Knowledge Base Articles**: Create documentation from structured knowledge entries
36
+ - **Executive Summaries**: Distill large datasets into key insights
37
+
38
+ ---
39
+
40
+ ## Key Features
41
+
42
+ ### 🎯 Template-Driven Architecture
43
+
44
+ - **Single Source of Truth**: Jinja2 templates define both data structure and output format
45
+ - **Automatic Column Detection**: Templates automatically determine which columns to use
46
+ - **Flexible Grouping**: Organize data by any column (e.g., section, category, region)
47
+
48
+ ### 🔄 Intelligent Processing
49
+
50
+ - **Adaptive Batching**: Automatically splits large groups to fit within token limits
51
+ - **Iterative Summarization**: Maintains context across batches for coherent outputs
52
+ - **Order Preservation**: Maintains the logical flow from your DataFrame
53
+
54
+ ### 🛡️ Type-Safe & Robust
55
+
56
+ - **Pydantic Schemas**: Type-safe data structures (`GroupData`, `ProcessedGroup`)
57
+ - **Custom Exceptions**: Clear error messages for debugging
58
+ - **Automatic Normalization**: Column names converted to snake_case for template compatibility
59
+
60
+ ### 🎨 Customizable
61
+
62
+ - **Custom Templates**: Define your own structure and formatting
63
+ - **Group-Specific Instructions**: Tailor LLM behavior per section
64
+ - **Configurable Batching**: Control row and token limits
65
+
66
+ ---
67
+
68
+ ## Agent Workflow
69
+
70
+ ### What Does the Agent Do?
71
+
72
+ The Write-Up Agent follows a six-step workflow to turn your DataFrame into a markdown report:
73
+
74
+ ```
75
+ ┌─────────────────────────────────────────────────────────────────┐
76
+ │ 1. TEMPLATE PARSING │
77
+ │ Parse Jinja template → Extract grouping & selected columns │
78
+ └─────────────────────────────────────────────────────────────────┘
79
+
80
+ ┌─────────────────────────────────────────────────────────────────┐
81
+ │ 2. COLUMN NORMALIZATION │
82
+ │ DataFrame columns → Convert to snake_case │
83
+ └─────────────────────────────────────────────────────────────────┘
84
+
85
+ ┌─────────────────────────────────────────────────────────────────┐
86
+ │ 3. VALIDATION │
87
+ │ Check: Required columns exist in normalized DataFrame │
88
+ └─────────────────────────────────────────────────────────────────┘
89
+
90
+ ┌─────────────────────────────────────────────────────────────────┐
91
+ │ 4. GROUPING │
92
+ │ Group DataFrame by detected column → Preserve order │
93
+ └─────────────────────────────────────────────────────────────────┘
94
+
95
+ ┌─────────────────────────────────────────────────────────────────┐
96
+ │ 5. LLM GENERATION (Per Group) │
97
+ │ a. Batch rows if needed (token/row limits) │
98
+ │ b. Render batch content from template │
99
+ │ c. Build prompts (system + user with section context) │
100
+ │ d. Call LLM with prompts │
101
+ │ e. Aggregate batch summaries → Final summary │
102
+ └─────────────────────────────────────────────────────────────────┘
103
+
104
+ ┌─────────────────────────────────────────────────────────────────┐
105
+ │ 6. REPORT ASSEMBLY │
106
+ │ Render final template with all LLM summaries → Markdown │
107
+ └─────────────────────────────────────────────────────────────────┘
108
+ ```
109
+
110
+ ### Detailed Workflow Explanation
111
+
112
+ #### Step 1: Template Parsing
113
+ The agent parses your Jinja template to automatically detect:
114
+ - **Grouping column**: Found from `{{ g.column_name }}` patterns
115
+ - **Selected columns**: Found from `{{ row.column_name }}` patterns
116
+
117
+ Example:
118
+ ```jinja
119
+ # {{ g.section }} → Grouping column: "section"
120
+ {{ row.question }} → Selected column: "question"
121
+ {{ row.answer }} → Selected column: "answer"
122
+ ```
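The detection shown above can be approximated with a regular expression. This is a minimal sketch, not the package's parser (which may well walk Jinja2's own AST instead); `extract_columns` and `RESERVED` are hypothetical names, and the reserved set simply mirrors the keywords this README lists (`rows`, `llm_response`, `instructions`).

```python
import re

# Attributes of `g` that are template logic rather than data columns (assumed set).
RESERVED = {"rows", "llm_response", "instructions"}

# Matches `g.<name>` or `row.<name>` anywhere in the template source.
_ATTR = re.compile(r"\b(g|row)\.([A-Za-z_][A-Za-z0-9_]*)")


def extract_columns(template: str) -> tuple[list[str], list[str]]:
    """Return (grouping_columns, selected_columns) in order of first appearance."""
    grouping: list[str] = []
    selected: list[str] = []
    for var, attr in _ATTR.findall(template):
        if var == "g" and attr not in RESERVED and attr not in grouping:
            grouping.append(attr)
        elif var == "row" and attr not in selected:
            selected.append(attr)
    return grouping, selected


template = """
{% for g in groups %}
# {{ g.section }}
{% if g.llm_response %}{{ g.llm_response }}{% else %}
{% for row in g.rows %}
**Q: {{ row.question }}**
A: {{ row.answer }}
{% endfor %}{% endif %}
{% endfor %}
"""

grouping, selected = extract_columns(template)
print(grouping, selected)
```

Returning a list of grouping columns (rather than a single name) lets a caller reject templates that reference more than one, which matches the single-grouping-column restriction described under Troubleshooting.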
123
+
124
+ #### Step 2: Column Normalization
125
+ All DataFrame column names are converted to **snake_case**:
126
+ - `"My Section"` → `"my_section"`
127
+ - `"UserQuestion"` → `"user_question"`
128
+ - `"column-name"` → `"column_name"`
129
+
130
+ This ensures compatibility with Jinja template syntax.
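One way to implement this normalization is with two regular-expression passes. This is a sketch under assumptions: `to_snake_case` is a hypothetical helper, and the package's own rules (for example, how acronyms or digits are split) may differ.

```python
import re


def to_snake_case(name: str) -> str:
    """Convert an arbitrary column name to snake_case (illustrative rules only)."""
    # Pass 1: insert "_" at lower/digit-to-upper boundaries, e.g. "UserQuestion" -> "User_Question".
    # An all-caps run such as "HTTPServer" is left unsplit by this rule.
    s = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name.strip())
    # Pass 2: collapse every run of non-alphanumeric characters into a single "_".
    s = re.sub(r"[^0-9A-Za-z]+", "_", s)
    return s.strip("_").lower()


for original in ["My Section", "UserQuestion", "column-name", "Product_ID"]:
    print(f"{original!r} -> {to_snake_case(original)!r}")
```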
131
+
132
+ #### Step 3: Validation
133
+ The agent verifies that all columns referenced in the template exist in the normalized DataFrame. If any are missing, a clear error message is raised.
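The check itself is a set-membership test over normalized names. The sketch below uses plain lists instead of a DataFrame; `MissingColumnsError` and `validate_columns` are hypothetical stand-ins, with only the `missing_columns` attribute borrowed from the error-handling example in this README.

```python
class MissingColumnsError(ValueError):
    """Hypothetical stand-in for the package's DataFrameValidationError."""

    def __init__(self, missing: list[str], available: list[str]) -> None:
        self.missing_columns = missing
        self.available_columns = available
        super().__init__(
            "DataFrame missing required columns after snake_case "
            f"normalization: {missing}. Available columns: {available}"
        )


def validate_columns(available: list[str], required: list[str]) -> None:
    """Raise if any template-referenced column is absent from the normalized columns."""
    missing = [col for col in required if col not in available]
    if missing:
        raise MissingColumnsError(missing, available)


available = ["my_section", "user_question", "user_answer"]
validate_columns(available, ["my_section", "user_question"])  # passes silently

try:
    # A template that used the un-normalized name would fail here.
    validate_columns(available, ["my_section", "UserQuestion"])
except MissingColumnsError as err:
    print(err.missing_columns)
```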
134
+
135
+ #### Step 4: Grouping
136
+ Data is grouped by the detected grouping column:
137
+ - Groups appear in **order of first appearance** (not alphabetically)
138
+ - Each group contains all rows with the same grouping value
139
+ - Selected columns are filtered for each group
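Order-of-first-appearance grouping can be sketched with a plain dictionary, since Python dictionaries preserve insertion order. With pandas, `df.groupby(column, sort=False)` gives the same ordering; whether the package uses that call is an assumption. `group_rows` is a hypothetical name, and the rows are plain dictionaries rather than DataFrame rows.

```python
def group_rows(
    rows: list[dict], group_col: str, selected_cols: list[str]
) -> dict[str, list[dict]]:
    """Group rows by `group_col`, keeping groups in order of first appearance."""
    groups: dict[str, list[dict]] = {}
    for row in rows:
        # Keep only the columns the template actually references.
        filtered = {col: row[col] for col in selected_cols}
        groups.setdefault(row[group_col], []).append(filtered)
    return groups


rows = [
    {"section": "Intro", "question": "What?", "answer": "A1", "extra": 1},
    {"section": "Methods", "question": "How?", "answer": "A2", "extra": 2},
    {"section": "Results", "question": "What result?", "answer": "A3", "extra": 3},
    {"section": "Intro", "question": "Why?", "answer": "A4", "extra": 4},
]

groups = group_rows(rows, "section", ["question", "answer"])
print(list(groups))
```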
140
+
141
+ #### Step 5: LLM Generation
142
+ For each group:
143
+ 1. **Batching** (if needed): Large groups are split into manageable batches
144
+ 2. **Content Rendering**: Template renders batch data for LLM input
145
+ 3. **Prompt Building**: System and user prompts are constructed with:
146
+ - Section name (group key)
147
+ - Group-specific instructions (if configured)
148
+ - Previous batch summary (for context)
149
+ 4. **LLM Call**: Generate summary for the batch
150
+ 5. **Aggregation**: If multiple batches, summaries are iteratively combined
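The batching step can be sketched as a greedy split that closes a batch as soon as either limit would be exceeded. The token estimate here (about four characters per token) is a crude assumption for illustration; the package presumably counts tokens with the configured model's tokenizer. `estimate_tokens` and `make_batches` are hypothetical names, and the defaults mirror the documented `max_rows_per_batch=20` and `max_tokens_per_batch=4000`.

```python
def estimate_tokens(text: str) -> int:
    """Very rough token estimate: ~4 characters per token (assumption)."""
    return max(1, len(text) // 4)


def make_batches(
    rows: list[str], max_rows: int = 20, max_tokens: int = 4000
) -> list[list[str]]:
    """Greedily split rendered rows into batches that respect both limits."""
    batches: list[list[str]] = []
    current: list[str] = []
    current_tokens = 0
    for row in rows:
        row_tokens = estimate_tokens(row)
        full = len(current) >= max_rows or current_tokens + row_tokens > max_tokens
        if current and full:
            batches.append(current)
            current, current_tokens = [], 0
        # A single oversized row still gets a batch of its own.
        current.append(row)
        current_tokens += row_tokens
    if current:
        batches.append(current)
    return batches


rendered_rows = [f"Q{i}: question text / A{i}: answer text" for i in range(45)]
sizes = [len(batch) for batch in make_batches(rendered_rows)]
print(sizes)
```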
151
+
152
+ #### Step 6: Report Assembly
153
+ All group summaries are combined using the template to produce the final markdown report.
154
+
155
+ ---
156
+
157
+ ## DataFrame and Template Relationship
158
+
159
+ ### The Critical Connection
160
+
161
+ The DataFrame and template are **tightly coupled** through column names. Understanding this relationship is essential for successful report generation.
162
+
163
+ ### 🔑 **CRITICAL: snake_case Requirement**
164
+
165
+ **All template variable references MUST use snake_case notation.**
166
+
167
+ Your DataFrame columns can use ANY naming convention:
168
+ ```python
169
+ df = pd.DataFrame({
170
+ 'My Section': [...], # Space-separated
171
+ 'UserQuestion': [...], # PascalCase
172
+ 'user-answer': [...] # kebab-case
173
+ })
174
+ ```
175
+
176
+ But your template MUST reference them in snake_case:
177
+ ```jinja
178
+ # {{ g.my_section }} ✓ CORRECT
179
+ # {{ row.user_question }} ✓ CORRECT
180
+ # {{ row.user_answer }} ✓ CORRECT
181
+
182
+ # {{ g.My Section }} ✗ WRONG - will fail
183
+ # {{ row.UserQuestion }} ✗ WRONG - will fail
184
+ ```
185
+
186
+ ### How It Works: Normalization Bridge
187
+
188
+ ```
189
+ DataFrame Columns snake_case Template References
190
+ ───────────────── ───────────── ──────────────────
191
+ "My Section" → "my_section" ← {{ g.my_section }}
192
+ "UserQuestion" → "user_question" ← {{ row.user_question }}
193
+ "column-name" → "column_name" ← {{ row.column_name }}
194
+ "Product_ID" → "product_id" ← {{ row.product_id }}
195
+ ```
196
+
197
+ ### Template-DataFrame Mapping Example
198
+
199
+ **DataFrame:**
200
+ ```python
201
+ df = pd.DataFrame({
202
+ 'Report Section': ['Executive Summary', 'Financial Analysis'],
203
+ 'Key Finding': ['Revenue up 20%', 'Costs reduced 15%'],
204
+ 'Data Source': ['Q4 Report', 'Annual Budget']
205
+ })
206
+ ```
207
+
208
+ **Template (MUST use snake_case):**
209
+ ```jinja
210
+ {% for g in groups %}
211
+ # {{ g.report_section }} {# Normalized from "Report Section" #}
212
+
213
+ {% if g.llm_response %}
214
+ {{ g.llm_response }}
215
+ {% else %}
216
+ {% for row in g.rows %}
217
+ **Finding**: {{ row.key_finding }} {# Normalized from "Key Finding" #}
218
+ **Source**: {{ row.data_source }} {# Normalized from "Data Source" #}
219
+ {% endfor %}
220
+ {% endif %}
221
+ {% endfor %}
222
+ ```
223
+
224
+ ### Group-Specific Instructions: Key Format
225
+
226
+ When providing `group_specific_instructions`, you must use a specific key format to ensure the instructions are correctly matched to groups:
227
+
228
+ ⚠️ **Key Format**: `"{snake_case_column}:{snake_case_value}"`
229
+
230
+ **Why both in snake_case?**
231
+ - DataFrame column names are automatically normalized to snake_case (e.g., `"Report Section"` → `"report_section"`)
232
+ - Group values are also normalized to snake_case (e.g., `"Executive Summary"` → `"executive_summary"`)
233
+ - Your instruction keys must match this normalized format
234
+
235
+ **Example:**
236
+
237
+ If your DataFrame has:
238
+ - Column: `"Section"`
239
+ - Values: `"Executive Summary"`, `"Detailed Analysis"`, `"Recommendations"`
240
+
241
+ Your keys must be:
242
+
243
+ ```python
244
+ config = WriteUpAgentConfig(
245
+ generation_handler_config=GenerationHandlerConfig(
246
+ group_specific_instructions={
247
+ # Format: "{snake_case_column}:{snake_case_value}"
248
+ # Both parts must be in snake_case
249
+ "section:executive_summary": "Be concise, highlight key metrics",
250
+ "section:detailed_analysis": "Be thorough, include all data points",
251
+ "section:recommendations": "Be actionable, prioritize by impact"
252
+ }
253
+ )
254
+ )
255
+ ```
256
+
257
+ **Key Format**: `"{snake_case_column}:{snake_case_value}"`
258
+ - **Column name part**: MUST be snake_case (normalized column name)
259
+ - **Value part**: MUST be snake_case (normalized group value)
260
+
261
+ **Transformation Table:**
262
+
263
+ | DataFrame Column | DataFrame Value | Normalized Column | Normalized Value | Required Key |
264
+ | :--------------- | :-------------- | :---------------- | :--------------- | :----------- |
265
+ | `Section` | `Executive Summary` | `section` | `executive_summary` | `section:executive_summary` |
266
+ | `Report Section` | `User Feedback` | `report_section` | `user_feedback` | `report_section:user_feedback` |
267
+ | `topic-name` | `API-Design` | `topic_name` | `api_design` | `topic_name:api_design` |
268
+
269
+ ### Validation and Error Messages
270
+
271
+ If your template references columns incorrectly, you'll see:
272
+
273
+ ```
274
+ DataFrameValidationError: DataFrame missing required columns after
275
+ snake_case normalization: ['My Section', 'UserQuestion']
276
+ Available columns: ['my_section', 'user_question', 'user_answer']
277
+ ```
278
+
279
+ This tells you:
280
+ - What the template is looking for (incorrect format)
281
+ - What columns are actually available (snake_case format)
282
+
283
+ ### Quick Reference: Naming Rules
284
+
285
+ | Component | Format | Example | Notes |
286
+ |-----------|--------|---------|-------|
287
+ | **DataFrame Columns** | Any format | `"My Column"`, `"UserName"` | Will be normalized to snake_case |
288
+ | **DataFrame Values** | Any format | `"Executive Summary"` | Will be normalized to snake_case |
289
+ | **Template Variables** | **snake_case** | `{{ g.my_column }}` | MUST use snake_case |
290
+ | **Group Instruction Keys** | **snake_case:snake_case** | `"my_column:executive_summary"` | Both parts in snake_case |
291
+
292
+ ---
293
+
294
+ ## How It Works
295
+
296
+ ### Input: DataFrame
297
+
298
+ The agent requires a pandas DataFrame with your data. Column names can be in any format (spaces, PascalCase, etc.) - they'll be automatically normalized to snake_case.
299
+
300
+ ```python
301
+ import pandas as pd
302
+
303
+ df = pd.DataFrame({
304
+ 'Section': ['Introduction', 'Methods', 'Results'],
305
+ 'Question': ['What is X?', 'How does Y?', 'What are Z?'],
306
+ 'Answer': ['X is...', 'Y works by...', 'Z are...']
307
+ })
308
+ ```
309
+
310
+ ### Configuration
311
+
312
+ Two main components:
313
+
314
+ 1. **WriteUpAgentConfig**: Defines the template and generation settings
315
+ 2. **LanguageModelService**: Provides LLM access for summarization (passed at process time)
316
+
317
+ ```python
318
+ from unique_toolkit._common.experimental.write_up_agent import (
319
+ WriteUpAgent,
320
+ WriteUpAgentConfig,
321
+ )
322
+ from unique_toolkit.language_model.service import LanguageModelService
323
+
324
+ config = WriteUpAgentConfig() # Uses default template
325
+ agent = WriteUpAgent(config=config)
326
+
327
+ # LLM service is created separately and passed to process()
328
+ llm_service = LanguageModelService.from_settings(settings)
329
+ ```
330
+
331
+ ### Processing
332
+
333
+ The agent orchestrates a multi-step pipeline:
334
+
335
+ ```python
336
+ report = agent.process(df, llm_service=llm_service) # Returns markdown string
337
+ ```
338
+
339
+ **Internal Pipeline:**
340
+
341
+ 1. **Template Parsing**: Extract grouping column and selected columns from template
342
+ 2. **DataFrame Validation**: Verify required columns exist (after snake_case normalization)
343
+ 3. **Grouping**: Create groups based on grouping column, preserving DataFrame order
344
+ 4. **Batching**: Split large groups into manageable batches (token/row limits)
345
+ 5. **LLM Generation**: Generate summaries for each batch with context
346
+ 6. **Report Assembly**: Combine all summaries into final markdown report
347
+
348
+ ### Output: Markdown Report
349
+
350
+ ```markdown
351
+ # Introduction
352
+
353
+ This section introduces the concept of X, explaining its fundamental
354
+ principles and applications...
355
+
356
+ ---
357
+
358
+ # Methods
359
+
360
+ The methodology involves Y, which operates through a series of steps...
361
+
362
+ ---
363
+ ```
364
+
365
+ ---
366
+
367
+ ## Template System
368
+
369
+ ### Templates as Configuration
370
+
371
+ The Jinja2 template serves as the **single source of truth**, defining:
372
+
373
+ - **Grouping column**: Which column to group by (e.g., `section`)
374
+ - **Selected columns**: Which columns to include in each group (e.g., `question`, `answer`)
375
+ - **Output structure**: How the final report should be formatted
376
+
377
+ ### Default Template
378
+
379
+ The default template expects three columns: `section`, `question`, `answer`
380
+
381
+ ```jinja
382
+ {% for g in groups %}
383
+ # {{ g.section }}
384
+
385
+ {% if g.llm_response %}
386
+ {{ g.llm_response }}
387
+ {% else %}
388
+ {% for row in g.rows %}
389
+ **Q: {{ row.question }}**
390
+
391
+ A: {{ row.answer }}
392
+
393
+ {% endfor %}
394
+ {% endif %}
395
+
396
+ ---
397
+ {% endfor %}
398
+ ```
399
+
400
+ ### How Template Parsing Works
401
+
402
+ The agent automatically detects:
403
+
404
+ ```jinja
405
+ # {{ g.section }} → Grouping column: "section"
406
+ {{ row.question }} → Selected column: "question"
407
+ {{ row.answer }} → Selected column: "answer"
408
+ ```
409
+
410
+ ### Reserved Keywords
411
+
412
+ These keywords are reserved for template logic (not treated as data columns):
413
+
414
+ - `g.rows`: List of row dictionaries
415
+ - `g.llm_response`: LLM-generated summary
416
+ - `g.instructions`: Group-specific instructions (future use)
417
+
418
+ ### Two-Phase Rendering
419
+
420
+ **Phase 1 - LLM Input** (`g.llm_response` is None):
421
+ ```markdown
422
+ **Q: What is the Write-Up Agent?**
423
+ A: A tool for generating reports...
424
+ ```
425
+
426
+ **Phase 2 - Final Report** (`g.llm_response` is provided):
427
+ ```markdown
428
+ The Write-Up Agent is an automated tool that transforms structured
429
+ DataFrame data into coherent summaries...
430
+ ```
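One template serves both phases because of the `{% if g.llm_response %}` branch. Stripped of Jinja, the logic is roughly the following; `render_group` is a hypothetical function used only to illustrate the branch, and the Q/A formatting is a simplification of the default template.

```python
from typing import Optional


def render_group(rows: list[dict], llm_response: Optional[str] = None) -> str:
    """Phase 2 returns the summary; phase 1 falls back to the raw rows."""
    if llm_response:
        return llm_response
    return "\n\n".join(f"**Q: {r['question']}**\nA: {r['answer']}" for r in rows)


rows = [
    {
        "question": "What is the Write-Up Agent?",
        "answer": "A tool for generating reports...",
    }
]

phase_one = render_group(rows)  # used as LLM input
phase_two = render_group(rows, llm_response="The Write-Up Agent is an automated tool...")
print(phase_one)
print(phase_two)
```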
431
+
432
+ ### Custom Templates
433
+
434
+ Create your own template for different data structures:
435
+
436
+ ```jinja
437
+ {% for g in groups %}
438
+ ## {{ g.category }}
439
+
440
+ {% if g.llm_response %}
441
+ {{ g.llm_response }}
442
+ {% else %}
443
+ {% for row in g.rows %}
444
+ - **{{ row.product }}**: ${{ row.price }} - {{ row.description }}
445
+ {% endfor %}
446
+ {% endif %}
447
+ {% endfor %}
448
+ ```
449
+
450
+ This template expects columns: `category`, `product`, `price`, `description`
451
+
452
+ ---
453
+
454
+ ## Configuration
455
+
456
+ ### WriteUpAgentConfig
457
+
458
+ ```python
459
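+ from unique_toolkit._common.experimental.write_up_agent import WriteUpAgentConfig
+ from unique_toolkit._common.experimental.write_up_agent.services.generation_handler import GenerationHandlerConfig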
460
+
461
+ config = WriteUpAgentConfig(
462
+ # Template (default: Q&A template for section/question/answer)
463
+ template="{% for g in groups %}...",
464
+
465
+ # Generation settings
466
+ generation_handler_config=GenerationHandlerConfig(
467
+ language_model=language_model_info,
468
+ common_instruction="You are a technical writer...",
469
+ max_rows_per_batch=20,
470
+ max_tokens_per_batch=4000,
471
+ group_specific_instructions={
472
+ "section:introduction": "Be welcoming and engaging",
473
+ "section:methods": "Be precise and technical"
474
+ }
475
+ )
476
+ )
477
+ ```
478
+
479
+ ### Configuration Options
480
+
481
+ | Parameter | Type | Default | Description |
482
+ |-----------|------|---------|-------------|
483
+ | `template` | `str` | Default Q&A template | Jinja2 template defining structure |
484
+ | `generation_handler_config` | `GenerationHandlerConfig` | Default config | LLM generation settings |
485
+
486
+ ### GenerationHandlerConfig
487
+
488
+ | Parameter | Type | Default | Description |
489
+ |-----------|------|---------|-------------|
490
+ | `language_model` | `LMI` | Required | Language model to use |
491
+ | `common_instruction` | `str` | Default system prompt | Base instruction for all groups |
492
+ | `max_rows_per_batch` | `int` | 20 | Max rows per LLM call |
493
+ | `max_tokens_per_batch` | `int` | 4000 | Max tokens per LLM call |
494
+ | `group_specific_instructions` | `dict[str, str]` | `{}` | Custom instructions per group (key format: `"snake_case_column:snake_case_value"`) |
495
+ | `prompts_config` | `GenerationHandlerPromptsConfig` | Default prompts | Configurable system and user prompt templates |
496
+
497
+ ### GenerationHandlerPromptsConfig
498
+
499
+ Allows customization of the prompt templates used for LLM generation:
500
+
501
+ | Parameter | Type | Default | Description |
502
+ |-----------|------|---------|-------------|
503
+ | `system_prompt_template` | `str` | Default system template | Jinja2 template for system prompt |
504
+ | `user_prompt_template` | `str` | Default user template | Jinja2 template for user prompt |
505
+
506
+ **Example: Custom Prompts**
507
+ ```python
508
+ from unique_toolkit._common.experimental.write_up_agent.services.generation_handler.prompts.config import (
509
+ GenerationHandlerPromptsConfig
510
+ )
511
+
512
+ custom_prompts = GenerationHandlerPromptsConfig(
513
+ system_prompt_template="""
514
+ You are a professional technical writer.
515
+ {{ common_instruction }}
516
+ """,
517
+ user_prompt_template="""
518
+ Section: {{ section_name }}
519
+
520
+ {% if group_instruction %}
521
+ Special instructions: {{ group_instruction }}
522
+ {% endif %}
523
+
524
+ {% if previous_summary %}
525
+ Previous context: {{ previous_summary }}
526
+ {% endif %}
527
+
528
+ Content to summarize:
529
+ {{ content }}
530
+ """
531
+ )
532
+
533
+ gen_config = GenerationHandlerConfig(
534
+ language_model=language_model_info,
535
+ prompts_config=custom_prompts
536
+ )
537
+ ```
538
+
539
+ ### Utility Functions
540
+
541
+ **template_loader(parent_dir, template_name)**
542
+
543
+ Helper function to load Jinja2 templates from the filesystem:
544
+
545
+ ```python
546
+ from pathlib import Path
547
+ from unique_toolkit._common.experimental.write_up_agent.utils import template_loader
548
+
549
+ # Load a custom template file
550
+ template_path = Path(__file__).parent
551
+ template_content = template_loader(template_path, "my_template.j2")
552
+ ```
553
+
554
+ | Parameter | Type | Description |
555
+ |-----------|------|-------------|
556
+ | `parent_dir` | `Path` | Directory containing the template |
557
+ | `template_name` | `str` | Name of the template file (e.g., "template.j2") |
558
+ | **Returns** | `str` | Template content as a string |
559
+
560
+ ---
561
+
562
+ ## Architecture
563
+
564
+ ### Dependency Injection Pattern
565
+
566
+ The agent uses **dependency injection** for the `LanguageModelService`:
567
+
568
+ - **At initialization**: The agent is configured with templates and settings, but no LLM service
569
+ - **At process time**: The LLM service is passed as a parameter to `process()`
570
+
571
+ **Benefits:**
572
+ - **Flexibility**: Use different LLM services for different processing runs
573
+ - **Reusability**: One agent instance can be used with multiple LLM services
574
+ - **Testability**: Easy to mock LLM services for testing
575
+ - **Separation**: Agent configuration is independent of LLM service lifecycle
576
+
577
+ **Example:**
578
+ ```python
579
+ # Initialize once with configuration
580
+ agent = WriteUpAgent(config=config)
581
+
582
+ # Use with different LLM services or configurations
583
+ llm_service_gpt4 = LanguageModelService.from_settings(gpt4_settings)
584
+ report1 = agent.process(df1, llm_service=llm_service_gpt4)
585
+
586
+ llm_service_claude = LanguageModelService.from_settings(claude_settings)
587
+ report2 = agent.process(df2, llm_service=llm_service_claude)
588
+ ```
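The testability benefit can be illustrated with a stub service passed at call time. Everything in this sketch is hypothetical: `SummaryService`, `FakeSummaryService`, and `TinyAgent` are illustrative names, and the `summarize` method is not the real `LanguageModelService` API.

```python
from typing import Protocol


class SummaryService(Protocol):
    """Hypothetical minimal interface standing in for an LLM service."""

    def summarize(self, system_prompt: str, user_prompt: str) -> str: ...


class FakeSummaryService:
    """Test double that records calls instead of contacting a model."""

    def __init__(self) -> None:
        self.calls: list[tuple[str, str]] = []

    def summarize(self, system_prompt: str, user_prompt: str) -> str:
        self.calls.append((system_prompt, user_prompt))
        return f"summary #{len(self.calls)}"


class TinyAgent:
    """Holds configuration only; the service is injected into process()."""

    def __init__(self, instruction: str) -> None:
        self.instruction = instruction

    def process(self, sections: dict[str, str], llm_service: SummaryService) -> str:
        parts = []
        for name, content in sections.items():
            summary = llm_service.summarize(self.instruction, f"Section: {name}\n{content}")
            parts.append(f"# {name}\n\n{summary}")
        return "\n\n".join(parts)


agent = TinyAgent("You are a technical writer.")
fake = FakeSummaryService()
report = agent.process({"Intro": "Q: What?", "Methods": "Q: How?"}, llm_service=fake)
print(report)
```

Because the agent never stores the service, the same `agent` instance can be reused with a different service on the next call.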
589
+
590
+ ### Separation of Concerns
591
+
592
+ The agent divides its work among three handlers, coordinated by `WriteUpAgent`:
593
+
594
+ ```
595
+ WriteUpAgent (Orchestrator)
596
+ ├── TemplateHandler (Template Operations)
597
+ │ ├── Parse template structure
598
+ │ ├── Extract columns
599
+ │ └── Render groups
600
+ ├── DataFrameHandler (Data Operations)
601
+ │ ├── Normalize column names
602
+ │ ├── Validate columns
603
+ │ └── Create groups
604
+ └── GenerationHandler (LLM Operations)
605
+     ├── Create batches
606
+     ├── Build prompts
607
+     ├── Call LLM (with injected LanguageModelService)
608
+     └── Aggregate summaries
609
+ ```
610
+
611
+ ### Data Flow
612
+
613
+ ```
614
+ DataFrame → Normalize → Validate → Group → Batch → LLM → Aggregate → Report
615
+ ```
616
+
617
+ ### Type-Safe Schemas
618
+
619
+ ```python
620
+ from unique_toolkit._common.experimental.write_up_agent import (
621
+ GroupData,
622
+ ProcessedGroup
623
+ )
624
+
625
+ # GroupData: After DataFrame grouping
626
+ GroupData(
627
+ group_key="Introduction",
628
+ rows=[{"question": "...", "answer": "..."}]
629
+ )
630
+
631
+ # ProcessedGroup: After LLM generation
632
+ ProcessedGroup(
633
+ group_key="Introduction",
634
+ rows=[{"question": "...", "answer": "..."}],
635
+ llm_response="The introduction section..."
636
+ )
637
+ ```
638
+
639
+ ---
640
+
641
+ ## Examples
642
+
643
+ ### Basic Usage
644
+
645
+ ```python
646
+ import pandas as pd
647
+ from unique_toolkit._common.experimental.write_up_agent import (
648
+ WriteUpAgent,
649
+ WriteUpAgentConfig,
650
+ )
651
+ from unique_toolkit.app.unique_settings import UniqueSettings
652
+ from unique_toolkit.language_model.service import LanguageModelService
653
+
654
+ # Setup
655
+ settings = UniqueSettings.from_env()
656
+ settings.init_sdk()
657
+
658
+ # Create DataFrame
659
+ df = pd.DataFrame({
660
+ 'section': ['Intro', 'Methods', 'Results'],
661
+ 'question': ['What?', 'How?', 'What result?'],
662
+ 'answer': ['Answer 1', 'Answer 2', 'Answer 3']
663
+ })
664
+
665
+ # Initialize agent
666
+ config = WriteUpAgentConfig()
667
+ agent = WriteUpAgent(config=config)
668
+
669
+ # Create LLM service
670
+ llm_service = LanguageModelService.from_settings(settings)
671
+
672
+ # Generate report
673
+ report = agent.process(df, llm_service=llm_service)
674
+ print(report)
675
+ ```
676
+
677
+ ### Custom Template Example
678
+
679
+ ```python
680
+ custom_template = """
681
+ {% for g in groups %}
682
+ # {{ g.region }} Market Analysis
683
+
684
+ {% if g.llm_response %}
685
+ {{ g.llm_response }}
686
+ {% else %}
687
+ {% for row in g.rows %}
688
+ - **{{ row.product }}**: {{ row.units }} units sold
689
+ {% endfor %}
690
+ {% endif %}
691
+
692
+ ---
693
+ {% endfor %}
694
+ """
695
+
696
+ config = WriteUpAgentConfig(template=custom_template)
697
+
698
+ df = pd.DataFrame({
699
+ 'Region': ['North', 'South', 'East'],
700
+ 'Product': ['Widget', 'Gadget', 'Tool'],
701
+ 'Units': [100, 200, 150]
702
+ })
703
+
704
+ agent = WriteUpAgent(config=config)
705
+ llm_service = LanguageModelService.from_settings(settings)
706
+ report = agent.process(df, llm_service=llm_service)
707
+ ```
708
+
709
+ ### With Group-Specific Instructions
710
+
711
+ ```python
712
+ from unique_toolkit._common.experimental.write_up_agent.services.generation_handler import (
713
+ GenerationHandlerConfig
714
+ )
715
+
716
+ # DataFrame column: "Section"
717
+ # DataFrame values: "Executive Summary", "Detailed Analysis", "Recommendations"
718
+
719
+ gen_config = GenerationHandlerConfig(
720
+ language_model=language_model_info,
721
+ common_instruction="You are an expert data analyst.",
722
+ group_specific_instructions={
723
+ # Format: "snake_case_column:snake_case_value"
724
+ # Both column name AND value must be in snake_case
725
+ "section:executive_summary": "Be concise, highlight key metrics",
726
+ "section:detailed_analysis": "Be thorough, include all data points",
727
+ "section:recommendations": "Be actionable, prioritize by impact"
728
+ }
729
+ )
730
+
731
+ config = WriteUpAgentConfig(generation_handler_config=gen_config)
732
+ agent = WriteUpAgent(config=config)
733
+
734
+ # Process with LLM service
735
+ llm_service = LanguageModelService.from_settings(settings)
736
+ report = agent.process(df, llm_service=llm_service)
737
+ ```
738
+
739
+ **Important**: Both the column name (`section`) AND the values (`executive_summary`, etc.) must be in snake_case to match the automatic normalization applied to your DataFrame.
+
740
+ ---
741
+
742
+ ## Advanced Features
743
+
744
+ ### Automatic Column Normalization
745
+
746
+ All column names are automatically converted to snake_case for template compatibility:
747
+
748
+ | Original | Normalized |
749
+ |----------|------------|
750
+ | `My Column` | `my_column` |
751
+ | `UserName` | `user_name` |
752
+ | `section-name` | `section_name` |
753
+
754
+ Your DataFrame can use any naming convention - the agent handles normalization automatically.
755
+
756
+ ### Order Preservation
757
+
758
+ Groups appear in the order they first appear in your DataFrame, not alphabetically:
759
+
760
+ ```python
761
+ df = pd.DataFrame({
762
+ 'section': ['Intro', 'Methods', 'Results', 'Intro'] # Intro appears twice
763
+ })
764
+ # Report will show: Intro → Methods → Results (not: Intro → Methods → Results → Intro)
765
+ ```
766
+
767
+ ### Adaptive Batching
768
+
769
+ For groups with many rows, the agent automatically:
770
+ 1. Splits into batches based on token/row limits
771
+ 2. Processes each batch with LLM
772
+ 3. Maintains context by passing previous summary to next batch
773
+ 4. Aggregates all batch summaries into final section summary
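The context-passing loop in steps 2 to 4 can be sketched as a fold over the batches, where each call receives the summary produced so far. This assumes the summary returned for the last batch serves as the section summary; the package may aggregate differently. `summarize_group` and `fake_summarize` are hypothetical names, and `fake_summarize` stands in for an actual LLM call.

```python
from typing import Callable, Optional

Summarizer = Callable[[str, Optional[str]], str]


def summarize_group(batches: list[str], summarize: Summarizer) -> str:
    """Summarize batches in order, feeding each call the previous summary."""
    previous: Optional[str] = None
    for content in batches:
        previous = summarize(content, previous)
    return previous or ""


def fake_summarize(content: str, previous: Optional[str]) -> str:
    """Stand-in for an LLM call: appends the new batch to the running summary."""
    prefix = f"{previous} + " if previous else ""
    return f"{prefix}[{content}]"


final = summarize_group(["batch 1", "batch 2", "batch 3"], fake_summarize)
print(final)
```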
774
+
775
+ ### Error Handling
776
+
777
+ Custom exceptions provide clear error messages:
778
+
779
+ ```python
780
+ from unique_toolkit._common.experimental.write_up_agent.services.dataframe_handler import (
781
+ DataFrameValidationError,
782
+ DataFrameGroupingError,
783
+ )
784
+ from unique_toolkit._common.experimental.write_up_agent.services.template_handler import (
785
+ TemplateParsingError,
786
+ ColumnExtractionError,
787
+ )
788
+
789
+ try:
790
+     report = agent.process(df, llm_service=llm_service)
791
+ except DataFrameValidationError as e:
792
+     print(f"Missing columns: {e.missing_columns}")
793
+ except TemplateParsingError as e:
794
+     print(f"Template error: {e}")
795
+ ```
796
+
797
+ ---
798
+
799
+ ## Best Practices
800
+
801
+ 1. **Column Names**: Use descriptive names - they'll be normalized automatically
802
+ 2. **Data Organization**: Arrange DataFrame in logical order (will be preserved)
803
+ 3. **Template Design**: Start with default template, customize as needed
804
+ 4. **Batch Sizes**: Adjust `max_rows_per_batch` based on data density
805
+ 5. **Instructions**: Use `group_specific_instructions` for varied section styles
806
+ 6. **Testing**: Test with small datasets first to verify template parsing
807
+
808
+ ---
809
+
810
+ ## Troubleshooting
811
+
812
+ ### "DataFrame missing required columns"
813
+
814
+ The template references columns that don't exist in your DataFrame (after snake_case normalization).
815
+
816
+ **Solution**: Check template column references match your DataFrame columns (in snake_case).
817
+
818
+ ### "Template must use grouping pattern"
819
+
820
+ Your template doesn't include `{% for g in groups %}`.
821
+
822
+ **Solution**: Ensure template follows the grouping pattern shown in examples.
823
+
824
+ ### "Single grouping column required"
825
+
826
+ Your template references multiple grouping columns (e.g., `{{ g.col1 }}`, `{{ g.col2 }}`).
827
+
828
+ **Solution**: Currently only single-column grouping is supported. Use one grouping column.
829
+
830
+ ---
831
+
832
+ ## Future Enhancements
833
+
834
+ - [ ] **Multi-column grouping support**: Group by multiple columns simultaneously (e.g., `region` and `category`)
835
+ - [ ] **Reference handling**: Support passing a reference map to automatically resolve and include references in the generated content
836
+
837
+ ---
838
+
839
+ ## Contributing
840
+
841
+ This is an experimental feature. Feedback and contributions are welcome!
842
+
843
+ ---
844
+
845
+ ## License
846
+
847
+ Part of the Unique Toolkit - see main repository LICENSE.
848
+