agentfluent 0.2.0__tar.gz → 0.3.0__tar.gz
- {agentfluent-0.2.0 → agentfluent-0.3.0}/PKG-INFO +108 -57
- {agentfluent-0.2.0 → agentfluent-0.3.0}/README.md +105 -56
- {agentfluent-0.2.0 → agentfluent-0.3.0}/pyproject.toml +13 -1
- {agentfluent-0.2.0 → agentfluent-0.3.0}/src/agentfluent/agents/extractor.py +16 -12
- {agentfluent-0.2.0 → agentfluent-0.3.0}/src/agentfluent/agents/models.py +39 -12
- {agentfluent-0.2.0 → agentfluent-0.3.0}/src/agentfluent/analytics/pipeline.py +63 -0
- {agentfluent-0.2.0 → agentfluent-0.3.0}/src/agentfluent/analytics/pricing.py +4 -0
- {agentfluent-0.2.0 → agentfluent-0.3.0}/src/agentfluent/cli/commands/analyze.py +63 -3
- {agentfluent-0.2.0 → agentfluent-0.3.0}/src/agentfluent/cli/commands/config_check.py +8 -1
- {agentfluent-0.2.0 → agentfluent-0.3.0}/src/agentfluent/cli/commands/list_cmd.py +30 -16
- {agentfluent-0.2.0 → agentfluent-0.3.0}/src/agentfluent/cli/formatters/helpers.py +17 -0
- {agentfluent-0.2.0 → agentfluent-0.3.0}/src/agentfluent/cli/formatters/table.py +165 -38
- {agentfluent-0.2.0 → agentfluent-0.3.0}/src/agentfluent/cli/main.py +36 -0
- {agentfluent-0.2.0 → agentfluent-0.3.0}/src/agentfluent/config/__init__.py +6 -1
- agentfluent-0.3.0/src/agentfluent/config/mcp_discovery.py +334 -0
- {agentfluent-0.2.0 → agentfluent-0.3.0}/src/agentfluent/config/models.py +43 -1
- {agentfluent-0.2.0 → agentfluent-0.3.0}/src/agentfluent/config/scanner.py +2 -1
- {agentfluent-0.2.0 → agentfluent-0.3.0}/src/agentfluent/core/discovery.py +8 -2
- agentfluent-0.3.0/src/agentfluent/core/parser.py +353 -0
- agentfluent-0.3.0/src/agentfluent/core/paths.py +90 -0
- {agentfluent-0.2.0 → agentfluent-0.3.0}/src/agentfluent/core/session.py +29 -0
- agentfluent-0.3.0/src/agentfluent/diagnostics/__init__.py +7 -0
- agentfluent-0.3.0/src/agentfluent/diagnostics/aggregation.py +161 -0
- agentfluent-0.3.0/src/agentfluent/diagnostics/builtin_actions.py +76 -0
- agentfluent-0.3.0/src/agentfluent/diagnostics/correlator.py +636 -0
- agentfluent-0.3.0/src/agentfluent/diagnostics/delegation.py +528 -0
- agentfluent-0.3.0/src/agentfluent/diagnostics/mcp_assessment.py +346 -0
- agentfluent-0.3.0/src/agentfluent/diagnostics/model_routing.py +337 -0
- agentfluent-0.3.0/src/agentfluent/diagnostics/models.py +268 -0
- agentfluent-0.3.0/src/agentfluent/diagnostics/pipeline.py +265 -0
- {agentfluent-0.2.0 → agentfluent-0.3.0}/src/agentfluent/diagnostics/signals.py +5 -3
- agentfluent-0.3.0/src/agentfluent/diagnostics/trace_signals.py +320 -0
- agentfluent-0.3.0/src/agentfluent/traces/__init__.py +0 -0
- agentfluent-0.3.0/src/agentfluent/traces/discovery.py +75 -0
- agentfluent-0.3.0/src/agentfluent/traces/linker.py +50 -0
- agentfluent-0.3.0/src/agentfluent/traces/models.py +119 -0
- agentfluent-0.3.0/src/agentfluent/traces/parser.py +207 -0
- agentfluent-0.3.0/src/agentfluent/traces/retry.py +91 -0
- agentfluent-0.2.0/src/agentfluent/core/parser.py +0 -255
- agentfluent-0.2.0/src/agentfluent/diagnostics/__init__.py +0 -51
- agentfluent-0.2.0/src/agentfluent/diagnostics/correlator.py +0 -248
- agentfluent-0.2.0/src/agentfluent/diagnostics/models.py +0 -75
- {agentfluent-0.2.0 → agentfluent-0.3.0}/LICENSE +0 -0
- {agentfluent-0.2.0 → agentfluent-0.3.0}/src/agentfluent/__init__.py +0 -0
- {agentfluent-0.2.0 → agentfluent-0.3.0}/src/agentfluent/agents/__init__.py +0 -0
- {agentfluent-0.2.0 → agentfluent-0.3.0}/src/agentfluent/analytics/__init__.py +0 -0
- {agentfluent-0.2.0 → agentfluent-0.3.0}/src/agentfluent/analytics/agent_metrics.py +0 -0
- {agentfluent-0.2.0 → agentfluent-0.3.0}/src/agentfluent/analytics/tokens.py +0 -0
- {agentfluent-0.2.0 → agentfluent-0.3.0}/src/agentfluent/analytics/tools.py +0 -0
- {agentfluent-0.2.0 → agentfluent-0.3.0}/src/agentfluent/cli/__init__.py +0 -0
- {agentfluent-0.2.0 → agentfluent-0.3.0}/src/agentfluent/cli/commands/__init__.py +0 -0
- {agentfluent-0.2.0 → agentfluent-0.3.0}/src/agentfluent/cli/exit_codes.py +0 -0
- {agentfluent-0.2.0 → agentfluent-0.3.0}/src/agentfluent/cli/formatters/__init__.py +0 -0
- {agentfluent-0.2.0 → agentfluent-0.3.0}/src/agentfluent/cli/formatters/json_output.py +0 -0
- {agentfluent-0.2.0 → agentfluent-0.3.0}/src/agentfluent/config/scoring.py +0 -0
- {agentfluent-0.2.0 → agentfluent-0.3.0}/src/agentfluent/core/__init__.py +0 -0
- {agentfluent-0.2.0 → agentfluent-0.3.0}/src/agentfluent/py.typed +0 -0
@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: agentfluent
-Version: 0.
+Version: 0.3.0
 Summary: Local-first agent analytics with prompt diagnostics
 Keywords: claude,agent,analytics,cli,llm,diagnostics
 Author: Fred Pearce
@@ -20,11 +20,13 @@ Requires-Dist: typer>=0.15
 Requires-Dist: rich>=14.0
 Requires-Dist: pydantic>=2.0
 Requires-Dist: pyyaml>=6.0
+Requires-Dist: scikit-learn>=1.4 ; extra == 'clustering'
 Requires-Python: >=3.12
 Project-URL: Homepage, https://github.com/frederick-douglas-pearce/agentfluent
 Project-URL: Repository, https://github.com/frederick-douglas-pearce/agentfluent
 Project-URL: Issues, https://github.com/frederick-douglas-pearce/agentfluent/issues
 Project-URL: Changelog, https://github.com/frederick-douglas-pearce/agentfluent/blob/main/CHANGELOG.md
+Provides-Extra: clustering
 Description-Content-Type: text/markdown
 
 # AgentFluent
@@ -60,7 +62,7 @@ The agent observability space is crowded — several tools capture what agents d
 
 - **Research-grounded.** Every diagnostic maps to a specific gap in the agent's prompt, tool list, or model selection — not vibes. See the [research doc](docs/AGENT_ANALYTICS_RESEARCH.md) for the feasibility and positioning analysis.
 - **Behavior-to-improvement, not just traces.** When the agent retries Bash 40% of the time, AgentFluent tells you *which prompt clause is missing* — not just that the retry happened.
-- **The config is the agent.** In interactive sessions, the human course-corrects. In programmatic agents, the prompt and tool setup *are* the agent — a flaw compounds at scale. AgentFluent scores
+- **The config is the agent.** In interactive sessions, the human course-corrects. In programmatic agents, the prompt and tool setup *are* the agent — a flaw compounds at scale. AgentFluent scores description, tools (`allowed_tools` / `disallowedTools`), model, and prompt on every agent definition, and audits MCP server configuration (configured-but-unused, observed-but-missing) against real tool usage. Hook coverage and cross-agent pattern detection are on the roadmap.
 - **Local-first and private.** All analysis runs on your machine. Zero outbound network calls. No API key required.
 - **CLI-native.** `agentfluent analyze --format json | jq ...` — fits agent developer workflows (terminal, CI/CD, PR checks) without a web dashboard dependency.
 - **JSON output envelope is a contract.** A stable `{version, command, data}` schema lets you build PR gates, trend dashboards, and regression detectors on top without tracking AgentFluent's internal refactors.
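As an illustration of the "JSON output envelope is a contract" bullet above, here is a minimal sketch of a PR gate that reads the envelope and checks a cost budget. The top-level keys `version`, `command`, and `data` and the path `data.token_metrics.total_cost` appear in the README; the function name `check_cost_budget`, the budget values, and the sample payload (including its `version` value) are assumptions made up for this example.

```python
import json


def check_cost_budget(envelope_text: str, max_cost: float) -> bool:
    """Return True when the envelope's reported total cost is within budget."""
    envelope = json.loads(envelope_text)
    # The README documents a stable {version, command, data} envelope;
    # fail loudly if any of those top-level keys is missing.
    for key in ("version", "command", "data"):
        if key not in envelope:
            raise ValueError(f"missing envelope key: {key}")
    # Path taken from the README's jq example: .data.token_metrics.total_cost
    total_cost = envelope["data"]["token_metrics"]["total_cost"]
    return float(total_cost) <= max_cost


# Hypothetical payload for illustration only; real output has more fields.
sample = json.dumps(
    {
        "version": "1",
        "command": "analyze",
        "data": {"token_metrics": {"total_cost": 1.25}},
    }
)
```

In a CI job, the text passed in would presumably be the captured output of `agentfluent analyze --format json`, with the gate failing the build when the function returns `False`.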
@@ -83,21 +85,23 @@ If you write your own prompts each session, use CodeFluent. If your prompts live
 
 ## Screenshots
 
-<
-
-
-
-
-
-
-
-<
-
-
-
-
-
-
+**Execution Analytics** — `agentfluent analyze --project <name>`
+
+
+
+**Behavior Diagnostics** — `agentfluent analyze --project <name> --diagnostics`
+
+
+
+**Suggested Subagents with copy-paste-ready YAML draft** — `agentfluent analyze --project <name> --diagnostics --verbose`
+
+
+
+**Config Assessment** — `agentfluent config-check`
+
+
+
+<sub>Screenshots are regenerated from real session data via <code>scripts/generate_readme_screenshots.py</code>.</sub>
 
 ## Getting Started
 
@@ -152,10 +156,22 @@ agentfluent analyze --project codefluent # Full project analy
 agentfluent analyze --project codefluent --agent pm # Filter to one subagent
 agentfluent analyze --project codefluent --latest 5 # Last 5 sessions only
 agentfluent analyze --project codefluent --diagnostics # Show behavior diagnostics
+agentfluent analyze --project codefluent --diagnostics -v # + YAML subagent drafts
 agentfluent analyze --project codefluent --format json | jq '.data.token_metrics.total_cost'
+
+# Save the top-confidence cluster as a real subagent definition:
+agentfluent analyze --project codefluent --diagnostics --format json \
+| jq -r '.data.diagnostics.delegation_suggestions[0].yaml_draft' \
+> ~/.claude/agents/new-agent.md
 ```
 
-Produces a token-usage table, per-model cost breakdown (labeled as API rate — subscription plans differ), tool usage concentration, and an Agent Invocations table summarizing each subagent's token, duration, and tool-use count. `--diagnostics` surfaces
+Produces a token-usage table, per-model cost breakdown (labeled as API rate — subscription plans differ), tool usage concentration, and an Agent Invocations table summarizing each subagent's token, duration, and tool-use count. `--diagnostics` surfaces the full v0.3 signal surface:
+
+- **Metadata-level** (from invocation summaries): tool-error keywords, token-per-tool-use outliers, duration outliers.
+- **Trace-level** (from `~/.claude/projects/<session>/subagents/`): retry loops, stuck patterns, permission failures, consecutive tool-error sequences — each with per-tool-call evidence.
+- **Aggregate**: model mismatch (complexity class wrong for declared/observed model), delegation clustering (recurring `general-purpose` patterns → proposed specialized subagents), MCP server audit (configured-but-unused, observed-but-missing).
+
+Near-duplicate recommendations are aggregated per `(agent, target, signal)` shape into one row with an occurrence `Count` and metric range (e.g. *"4 invocations (4.9x–8.0x above 5,064 mean). Consider adding more specific instructions..."*). Each recommendation carries a specific config surface to change (prompt, tools, model, mcp) and a pointer to the file to edit. Recommendations for built-in agents (Explore, general-purpose, Plan, etc.) use concern-specific action text — wrapper subagent for scope issues, retry bounds on the delegating agent for recovery issues, reroute for tools/model — since built-in agents have no user-editable prompt or tool config.
 
 Cost numbers reflect current per-token pricing; historical sessions are priced at today's rates until [#80](https://github.com/frederick-douglas-pearce/agentfluent/issues/80) (time-series pricing) lands.
 
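The aggregation described in this hunk (near-duplicate recommendations collapsed per `(agent, target, signal)` shape with an occurrence count and metric range) can be sketched roughly as follows. This is not the package's implementation: the `Recommendation` dataclass, the `aggregate` function, the field names, and the `token_outlier` signal label are hypothetical, and the sketch sorts by count only, leaving out the severity ordering the README mentions elsewhere.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Recommendation:
    """Hypothetical flat recommendation record (not the package's model)."""

    agent: str
    target: str  # one of: prompt, tools, model, mcp
    signal: str
    metric: float  # e.g. ratio above the mean for an outlier signal


def aggregate(recs: list[Recommendation]) -> list[dict]:
    """Collapse recommendations sharing (agent, target, signal) into one row."""
    groups: dict[tuple[str, str, str], list[float]] = defaultdict(list)
    for rec in recs:
        groups[(rec.agent, rec.target, rec.signal)].append(rec.metric)
    rows = [
        {
            "agent": agent,
            "target": target,
            "signal": signal,
            "count": len(metrics),
            "min": min(metrics),
            "max": max(metrics),
        }
        for (agent, target, signal), metrics in groups.items()
    ]
    # Most frequent shapes first; severity ordering is omitted in this sketch.
    rows.sort(key=lambda row: row["count"], reverse=True)
    return rows


recs = [
    Recommendation("pm", "prompt", "token_outlier", 4.9),
    Recommendation("pm", "prompt", "token_outlier", 8.0),
    Recommendation("pm", "prompt", "token_outlier", 6.1),
    Recommendation("pm", "tools", "retry_loop", 3.0),
]
rows = aggregate(recs)
```

The first row then carries a count of 3 and a 4.9 to 8.0 range, which is the kind of data behind a message such as "4 invocations (4.9x–8.0x above 5,064 mean)".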
@@ -180,9 +196,13 @@ AgentFluent's "configuration" is CLI flags — no config file, no environment va
 | `--scope` | `all` | `config-check` scope: `user`, `project`, or `all` |
 | `--agent` | (none) | Filter `analyze` or `config-check` to one subagent type |
 | `--latest N` | (all sessions) | `analyze` only the N most recent sessions |
+| `--session` | (all) | `analyze` a specific session filename within the project |
 | `--diagnostics` | off | `analyze`: show behavior-correlation signals |
+| `--min-cluster-size` | 5 | Delegation clustering: minimum invocations per cluster (requires `agentfluent[clustering]`) |
+| `--min-similarity` | 0.7 | Delegation dedup: cosine-similarity threshold against existing agents |
+| `--claude-config-dir` | `~/.claude/` | Override the Claude config root (also honors `$CLAUDE_CONFIG_DIR`) |
 | `--format` | `table` | Output format: `table` (Rich) or `json` (envelope) |
-| `--verbose` | off | Extra detail
+| `--verbose` | off | Extra detail: per-session breakdown, per-invocation detail, raw (un-aggregated) recommendations, and YAML subagent drafts for suggested clusters |
 | `--quiet` | off | Suppress non-essential output (useful in CI) |
 
 ## Output formats
@@ -212,22 +232,31 @@ No ANSI escapes in JSON output, guaranteed. The key `total_cost` is the pay-per-
 flowchart LR
 subgraph Local["Local filesystem — nothing leaves this boundary"]
 S["Session JSONL<br/>~/.claude/projects/"]
-
+ST["Subagent traces<br/><session>/subagents/"]
+A["Agent definitions<br/>~/.claude/agents/"]
+M["MCP config<br/>~/.claude.json<br/>.mcp.json"]
 end
 
 S --> P[Parser]
+ST --> TP[Trace Parser<br/>+ Linker]
 P --> X[Agent Extractor]
 P --> TM[Token & Cost<br/>Metrics]
 P --> TU[Tool Usage<br/>Patterns]
+TP --> X
 A --> CS[Config Scanner]
 CS --> SC[Config Scorer]
-
-
-
-
-
-
-
+M --> MD[MCP Discovery]
+
+X --> DX[Delegation<br/>Clustering]
+X --> MR[Model-Routing<br/>Analysis]
+X --> SIG[Signal Extraction<br/>metadata + trace]
+SIG --> COR[Correlator]
+MR --> COR
+DX --> COR
+MD --> COR
+SC --> COR
+
+COR --> OUT["Rich tables<br/>or JSON envelope"]
 TM --> OUT
 TU --> OUT
 SC --> OUT
@@ -236,12 +265,15 @@ flowchart LR
 Step by step:
 
 1. **Parse JSONL** — `core/parser.py` reads each session file into typed `SessionMessage` objects. Handles streaming snapshot deduplication, plain-string vs. array content shapes, and Claude Code's real `toolUseResult` format (see [`CLAUDE.md`](CLAUDE.md) for the format spec).
-2. **
-3. **
-4. **
-5. **
-6. **
-7. **
+2. **Parse subagent traces** — `traces/parser.py` reads per-session subagent files under `<session>/subagents/agent-<agentId>.jsonl` and reconstructs the internal tool-call sequence with `is_error` flags. `traces/linker.py` attaches each trace back to its parent invocation via `agentId`. `traces/retry.py` detects retry sequences within a trace.
+3. **Discover projects and sessions** — `core/discovery.py` enumerates `~/.claude/projects/` and surfaces friendly display names.
+4. **Extract agent invocations** — `agents/extractor.py` walks messages, pairs Agent `tool_use` blocks with their `tool_result` content blocks, and pulls per-invocation metadata (tokens, duration, tool-use count) from the containing user message's `toolUseResult` sibling.
+5. **Compute token and cost metrics** — `analytics/tokens.py` aggregates usage per model with `<synthetic>` sentinel filtering; `analytics/pricing.py` applies per-token rates labeled as API rate.
+6. **Score agent configurations** — `config/scanner.py` parses YAML frontmatter from each `.md` in `.claude/agents/` and `~/.claude/agents/`; `config/scoring.py` scores description, tools, model, and prompt on a 4-dimension rubric.
+7. **Discover MCP servers** — `config/mcp_discovery.py` reads `mcpServers` from `~/.claude.json` (user + project-local scopes) and `.mcp.json` (project-shared), honoring the `enabledMcpjsonServers` / `disabledMcpjsonServers` gating arrays. Used by the audit phase to compare against observed `mcp__*` tool usage.
+8. **Diagnose behavior** — `diagnostics/` extracts metadata signals (`signals.py`), trace-level signals (`trace_signals.py` — retry loops, stuck patterns, permission failures, error sequences), model-routing mismatches (`model_routing.py`), and MCP audit signals (`mcp_assessment.py`). `correlator.py` routes each signal to a config target (prompt/tools/model/mcp) and emits an actionable recommendation.
+9. **Propose new subagents** — `diagnostics/delegation.py` clusters recurring `general-purpose` invocations via TF-IDF + KMeans and drafts candidate subagent definitions with name, model, tool list, and prompt scaffold. Under `--verbose`, each draft is emitted as a copy-paste-ready YAML frontmatter block. Deduped against existing agents by cosine similarity.
+10. **Render** — `cli/formatters/table.py` emits Rich tables; `cli/formatters/json.py` emits the stable JSON envelope. Format is selected by `--format`.
 
 Everything runs locally. No outbound network calls, ever. No API key needed.
 
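Step 2 above says retry sequences are detected within a trace that carries per-call `is_error` flags. One plausible way to do that is sketched below, under a stated assumption: a "retry run" is taken to be a failed call followed immediately by further calls to the same tool. The actual rule in `traces/retry.py` is not shown in this diff, and the `ToolCall` dataclass and `find_retry_runs` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolCall:
    """Hypothetical, simplified view of one call in a subagent trace."""

    tool: str
    is_error: bool


def find_retry_runs(
    calls: list[ToolCall], min_attempts: int = 2
) -> list[tuple[str, int, int]]:
    """Return (tool, start_index, attempts) for each same-tool run after an error."""
    runs = []
    i = 0
    while i < len(calls):
        j = i
        # Extend the run while the current call failed and the next call
        # targets the same tool (assumed definition of a retry).
        while (
            j + 1 < len(calls)
            and calls[j].is_error
            and calls[j + 1].tool == calls[i].tool
        ):
            j += 1
        attempts = j - i + 1
        if attempts >= min_attempts:
            runs.append((calls[i].tool, i, attempts))
        i = j + 1
    return runs


trace = [
    ToolCall("Read", False),
    ToolCall("Bash", True),
    ToolCall("Bash", True),
    ToolCall("Bash", False),
    ToolCall("Edit", True),
    ToolCall("Read", False),
]
runs = find_retry_runs(trace)
```

Here the three consecutive `Bash` calls form one run starting at index 1, while the single failed `Edit` is not counted because nothing retried it. A real detector would likely also compare call inputs, which this sketch ignores.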
@@ -250,7 +282,11 @@ Everything runs locally. No outbound network calls, ever. No API key needed.
 - **Project and Session Discovery** — Enumerates `~/.claude/projects/`, groups sessions by project, shows per-project session count, total size, and last-modified timestamp. Handles Claude Code subagent sidechain files and Agent SDK sessions uniformly.
 - **Execution Analytics** — Token usage, API-rate cost, cache efficiency, per-model breakdown, tool-call concentration, and per-agent invocation metrics (tokens, duration, tool-use count). Cache creation and cache read tokens are tracked separately so you can see where your prompt caching is working.
 - **Agent Config Assessment** — 4-dimension rubric (description, tools, model, prompt) applied to every `.md` file in `~/.claude/agents/` and `./.claude/agents/`. Produces a 0–100 score plus ranked, specific recommendations ("Prompt body doesn't mention error handling"). Catches agents that are technically valid but miss well-known best practices.
-- **
+- **Subagent Trace Parsing** — Parses the internal tool-call sequences Claude Code emits under `~/.claude/projects/<session>/subagents/agent-<agentId>.jsonl`, links them back to the delegating invocation, and detects retry sequences. Gives diagnostics per-call evidence (which tool, which attempt, which error) instead of just an invocation-level summary.
+- **Behavior Diagnostics** — `--diagnostics` emits signals across three layers. *Metadata*: tool-error keywords, token-per-tool-use outliers, duration outliers. *Trace-level*: retry loops, stuck patterns (same call repeated with no progress), permission failures, consecutive tool-error sequences. *Aggregate*: model mismatch (declared/observed model wrong for the workload's complexity), MCP server audit (configured-but-unused, observed-but-missing). Near-duplicate recommendations collapse into one row per `(agent, target, signal)` shape with an occurrence `Count` and metric range, sorted severity-desc then count-desc so the highest-impact findings surface first. Recommendations for built-in agents (Explore, general-purpose, Plan, code-reviewer, etc.) use concern-specific action text since built-ins have no user-editable config. Each signal routes to a `target` config surface — prompt, tools, model, or mcp — and the recommendation names the file to edit and the specific change to make.
+- **Delegation Clustering** — TF-IDF + KMeans on recurring `general-purpose` invocations surfaces patterns that would benefit from their own specialized subagent. Proposes a complete draft: name, description, recommended model (with cost reasoning), tool list derived from the cluster's trace data, and a prompt-body scaffold. Under `--verbose`, each cluster emits a copy-paste-ready **YAML subagent definition block** (frontmatter + prompt body) that can be saved directly as `~/.claude/agents/<name>.md`. Low-confidence clusters are kept but prefixed with a `REVIEW BEFORE USE` comment so loose groupings don't land in production blindly. Confidence tiers (high/medium/low) are calibrated against real-world cohesion distributions from multi-contributor datasets. Suppresses drafts that overlap existing agents and annotates the overlap. Requires the optional `agentfluent[clustering]` extra.
+- **Model-Routing Diagnostics** — Per-agent-type classification of observed complexity (tool-call counts, token footprint, error rate, write-tool presence) compared against the agent's declared model tier. Flags overspec (complex model on simple workload — cost savings estimate included) and underspec (simple model struggling). Consumes trace-based model inference when frontmatter is absent.
+- **MCP Server Assessment** — Reads configured MCP servers from `~/.claude.json` (user + project-local) and `.mcp.json` (project-shared), honoring per-user enable/disable gating. Compares against observed `mcp__<server>__*` tool usage from both parent sessions and subagent traces. Emits `MCP_UNUSED_SERVER` (INFO, configured but zero calls) and `MCP_MISSING_SERVER` (WARNING, failing calls to an unconfigured server) signals with actionable recommendations.
 - **JSON Output Envelope** — Stable `{version, command, data}` schema. No ANSI escapes. Intended as a programmatic contract for CI integration, PR gates, and regression tracking.
 - **Quiet and Verbose Modes** — `--quiet` for CI-friendly one-line summaries; `--verbose` for per-session breakdown and per-invocation detail tables. Defaults target interactive humans.
 
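The MCP Server Assessment bullet above compares configured servers against observed `mcp__<server>__*` tool names. A minimal sketch of that set comparison follows. The `audit_mcp` function and the server names are hypothetical, and the sketch is deliberately simpler than the README's description: it flags any observed-but-unconfigured server, whereas the README says `MCP_MISSING_SERVER` fires on failing calls, and it ignores the enable/disable gating.

```python
def audit_mcp(configured: list[str], observed_tools: list[str]) -> dict[str, list[str]]:
    """Compare configured MCP server names against servers seen in tool names."""
    observed_servers = set()
    for name in observed_tools:
        # Tool names are assumed to follow the mcp__<server>__<tool> shape
        # quoted in the README; anything else is not an MCP call.
        parts = name.split("__")
        if len(parts) >= 3 and parts[0] == "mcp" and parts[1]:
            observed_servers.add(parts[1])
    configured_servers = set(configured)
    return {
        # Configured but never called (the README's MCP_UNUSED_SERVER case).
        "unused": sorted(configured_servers - observed_servers),
        # Called but not configured (a superset of MCP_MISSING_SERVER here,
        # since call failure is not checked in this sketch).
        "missing": sorted(observed_servers - configured_servers),
    }


# Hypothetical server and tool names for illustration only.
result = audit_mcp(
    configured=["github", "postgres"],
    observed_tools=["mcp__github__create_issue", "Bash", "mcp__slack__post_message"],
)
```

With these inputs, `postgres` lands in `unused` and `slack` in `missing`, while the plain `Bash` call is ignored.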
@@ -265,7 +301,7 @@ AgentFluent is designed so data stays on your machine. The attack surface is sma
 | Input validation | Pydantic models with strict type constraints | Malformed JSONL crashing the parser |
 | Safe YAML loading | `yaml.safe_load` only | Arbitrary code execution via frontmatter |
 | CI security review | Claude-powered review on every PR | New vulnerabilities |
-| Automated testing |
+| Automated testing | 730+ unit tests incl. security-focused cases | Regressions |
 
 ### Secrets handling
 
@@ -286,7 +322,7 @@ See [`docs/SECURITY.md`](docs/SECURITY.md) for the full policy: leak vector, def
 - **[Typer](https://typer.tiangolo.com) + [Rich](https://rich.readthedocs.io)** — CLI framework and terminal formatting
 - **[Pydantic v2](https://docs.pydantic.dev)** — data models across module boundaries
 - **[PyYAML](https://pyyaml.org)** — agent definition frontmatter parsing (`safe_load` only)
-- **[pytest](https://pytest.org) + pytest-cov** —
+- **[pytest](https://pytest.org) + pytest-cov** — 730+ tests
 - **[mypy](https://mypy.readthedocs.io) strict mode** — full type coverage
 - **[ruff](https://docs.astral.sh/ruff/)** — linting and formatting
 - **[uv](https://docs.astral.sh/uv/)** — package and dependency management
@@ -299,8 +335,10 @@ src/agentfluent/
 ├── core/ # JSONL parser, session models, project/session discovery
 ├── agents/ # Agent invocation extraction and AgentInvocation model
 ├── analytics/ # Token/cost metrics, tool patterns, model pricing
-├── config/ # Agent definition scanner
-
+├── config/ # Agent definition scanner + scoring + MCP server discovery
+├── traces/ # Subagent trace parsing, linking, and retry detection
+└── diagnostics/ # Behavior signals (metadata + trace), correlation,
+# model routing, delegation clustering, MCP audit
 ```
 
 Full architecture and conventions are documented in [`CLAUDE.md`](CLAUDE.md).
@@ -317,7 +355,7 @@ uv run agentfluent --help
 ### Testing
 
 ```bash
-uv run pytest -m "not integration" #
+uv run pytest -m "not integration" # 730+ unit tests (CI default)
 uv run pytest # Full suite incl. integration tests against your real ~/.claude/projects/
 uv run pytest --cov=agentfluent # With coverage
 ```
@@ -345,27 +383,40 @@ Five GitHub Actions workflows run automatically:
 
 ## Roadmap
 
-**v0.2 (
-- Parser fix for real Claude Code `toolUseResult` shape ([#84](https://github.com/frederick-douglas-pearce/agentfluent/issues/84)
-- Cost label clarity for subscription-plan users ([#76](https://github.com/frederick-douglas-pearce/agentfluent/issues/76)
-- Pricing data correction + opus-4-7 + synthetic filter ([#75](https://github.com/frederick-douglas-pearce/agentfluent/issues/75)
-
-**v0.3
--
--
--
--
--
--
--
--
--
--
+**v0.2 (shipped):**
+- Parser fix for real Claude Code `toolUseResult` shape ([#84](https://github.com/frederick-douglas-pearce/agentfluent/issues/84))
+- Cost label clarity for subscription-plan users ([#76](https://github.com/frederick-douglas-pearce/agentfluent/issues/76))
+- Pricing data correction + opus-4-7 + synthetic filter ([#75](https://github.com/frederick-douglas-pearce/agentfluent/issues/75))
+
+**v0.3 (shipped):**
+- Subagent trace parser ([E2](https://github.com/frederick-douglas-pearce/agentfluent/issues/98)) — reconstructs the full internal tool-call sequence per subagent with `is_error` flags and retry detection, linked back to the delegating invocation.
+- Deep diagnostics engine ([E3](https://github.com/frederick-douglas-pearce/agentfluent/issues/99)) — trace-level signals: retry loops, stuck patterns, permission failures, consecutive tool-error sequences, each carrying per-tool-call evidence.
+- Delegation clustering ([#92](https://github.com/frederick-douglas-pearce/agentfluent/issues/92)) — TF-IDF + KMeans over recurring `general-purpose` invocations; proposes complete draft subagent definitions deduped against existing agents.
+- Model-routing diagnostics ([#95](https://github.com/frederick-douglas-pearce/agentfluent/issues/95)) — per-agent-type complexity classification vs. declared model; overspec/underspec flags with cost-savings estimates. Trace-based model inference when frontmatter is absent.
+- MCP server assessment ([#100](https://github.com/frederick-douglas-pearce/agentfluent/issues/100)) — configured-vs-observed audit with `MCP_UNUSED_SERVER` and `MCP_MISSING_SERVER` signals.
+- Recommendation aggregation ([#165](https://github.com/frederick-douglas-pearce/agentfluent/issues/165)) — near-duplicate rows collapse per `(agent, target, signal)` shape with occurrence count and metric range; raw list preserved for `--verbose` and JSON drill-down.
+- Built-in vs custom agent differentiation ([#166](https://github.com/frederick-douglas-pearce/agentfluent/issues/166)) — concern-specific action text (scope / recovery / tools / model) for built-in agents that have no user-editable config; nine of ten correlation rules updated.
+- YAML subagent draft in `--verbose` ([#168](https://github.com/frederick-douglas-pearce/agentfluent/issues/168)) — copy-paste-ready `~/.claude/agents/<name>.md` block for each cluster; exposed as `yaml_draft` field in `--format json` for jq-pipe workflows.
+- Cluster confidence re-calibration ([#167](https://github.com/frederick-douglas-pearce/agentfluent/issues/167)) — thresholds validated against two real datasets; MEDIUM now surfaces actionable candidates instead of everything landing in LOW.
+- Aggregated row signal-type clarity ([#181](https://github.com/frederick-douglas-pearce/agentfluent/issues/181)) — same-(agent, target) rows that fire on different signals now name the trigger in the prefix (e.g., `tool_error_sequence:` vs `retry_loop:`) instead of looking interchangeable.
+- Unknown-agent attribution fix ([#169](https://github.com/frederick-douglas-pearce/agentfluent/issues/169)) — invocations missing `subagent_type` (older skills, certain Claude Code versions) now correctly default to `general-purpose` instead of falling out of clustering as "unknown".
+- `--claude-config-dir` flag and `$CLAUDE_CONFIG_DIR` env var for non-default session paths ([#90](https://github.com/frederick-douglas-pearce/agentfluent/issues/90)).
+- Empirical threshold calibration via a committed Jupyter notebook ([#140](https://github.com/frederick-douglas-pearce/agentfluent/issues/140)).
+
+**v0.4+:**
+- Parent-thread offload analysis ([#189](https://github.com/frederick-douglas-pearce/agentfluent/issues/189)) — detect repeating tool-use patterns in the parent Claude Code thread and recommend subagent / skill candidates that move that work onto cheaper-tier models. The dominant cost lever for users who deploy agents at scale.
+- In-product glossary ([#190](https://github.com/frederick-douglas-pearce/agentfluent/issues/190), [#191](https://github.com/frederick-douglas-pearce/agentfluent/issues/191)) — definitions for token types, tool names, agent types, signal types, severity / confidence levels surfaced via static markdown (Phase 1) and `agentfluent explain <term>` CLI subcommand (Phase 2).
+- Outlier-detection distribution recalibration ([#186](https://github.com/frederick-douglas-pearce/agentfluent/issues/186)) — replace ratio-to-mean outlier signals with distribution-aware detection (z-score / IQR) backed by per-agent distribution analysis.
+- Time-series pricing data structure ([#80](https://github.com/frederick-douglas-pearce/agentfluent/issues/80)) + session-timestamp-aware cost calculation ([#81](https://github.com/frederick-douglas-pearce/agentfluent/issues/81)) + automated pricing updates ([#82](https://github.com/frederick-douglas-pearce/agentfluent/issues/82)).
+- Agent SDK main-session MCP + tool extraction ([#112](https://github.com/frederick-douglas-pearce/agentfluent/issues/112)).
+- Per-invocation token input/output split for more accurate cost estimates ([#143](https://github.com/frederick-douglas-pearce/agentfluent/issues/143)).
+- Hosted documentation site ([#97](https://github.com/frederick-douglas-pearce/agentfluent/issues/97)).
+- Prompt regression detection (`agentfluent diff`) across agent config versions.
+- Hook coverage in the config rubric.
 
 **Future:**
 - Webapp dashboard for trend visualization
 - `agentfluent diff` — side-by-side comparison of behavior before/after a prompt change
-- MCP server configuration assessment
 - Closed-loop self-improvement — use AgentFluent's diagnostic output as a feedback signal the agent itself consumes to propose config edits against its own past sessions
 - Agent ROI reporting — roll up cost, usage, and task-completion signals over time so a business can evaluate whether an optimized agent is worth continuing to run
 
@@ -379,7 +430,7 @@ Browse [open issues](https://github.com/frederick-douglas-pearce/agentfluent/iss
 | **No agent invocations** | Agent invocation rows require the session to actually call a subagent (`Agent` tool_use with a `subagent_type`). A session that never delegated has no agent data to analyze — this is not an error. |
 | **Zero tokens / dashes in Agent Invocations** | If you're on AgentFluent ≤ 0.1.0, this is the [#84 parser bug](https://github.com/frederick-douglas-pearce/agentfluent/issues/84) — upgrade with `uv tool upgrade agentfluent`. |
 | **Python version error** | AgentFluent requires Python 3.12+. Check with `python --version` and upgrade if needed. |
-| **Non-default session path** |
+| **Non-default session path** | Pass `--claude-config-dir /path/to/.claude` or set `$CLAUDE_CONFIG_DIR` before invoking any command. The override applies to project discovery, agent configs, and MCP server discovery together. |
 | **`Malformed JSON at <file>:<line>` warning** | A session file has a corrupted line — usually null bytes left behind when Claude Code was killed mid-write. The parser skips the line and continues; analytics are unaffected. Safe to ignore, or delete the line with `sed -i '<line>d' <file>` to silence the warning. |
 | **Stale tool install after local build** | If `uv tool install --from <path> agentfluent` seems to reuse cached code, run `uv tool uninstall agentfluent && uv cache clean agentfluent` before reinstalling. |
 
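The "Malformed JSON" row above says the parser skips a corrupted line and continues. A minimal sketch of that skip-and-continue behavior is shown here, assuming one JSON object per line. The function name `parse_jsonl_lines` and the sample records are hypothetical, and the real parser in `core/parser.py` builds typed models rather than plain dicts.

```python
import json


def parse_jsonl_lines(lines: list[str]) -> tuple[list[dict], list[int]]:
    """Parse JSONL text lines, skipping malformed ones instead of failing."""
    records = []
    skipped = []  # 1-based line numbers that could not be decoded
    for lineno, line in enumerate(lines, start=1):
        text = line.strip()
        if not text:
            # Blank lines carry no record and are not reported as malformed.
            continue
        try:
            records.append(json.loads(text))
        except json.JSONDecodeError:
            # e.g. a run of null bytes left by an interrupted write
            skipped.append(lineno)
    return records, skipped


# Hypothetical session content: line 2 is null-byte debris.
sample_lines = ['{"type": "user"}', "\x00\x00\x00", "", '{"type": "assistant"}']
records, skipped = parse_jsonl_lines(sample_lines)
```

The two valid records survive and line 2 is reported, which is the information a `Malformed JSON at <file>:<line>` warning would need.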