PyPI - agentx-kit - Versions diffs - 0.2.0__tar.gz → 0.3.0__tar.gz - Mend

agentx-kit 0.2.0.tar.gz → 0.3.0.tar.gz


Files changed (73)

{agentx_kit-0.2.0 → agentx_kit-0.3.0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: agentx-kit
-Version: 0.2.0
+Version: 0.3.0
 Summary: An open-source, provider-agnostic agentic framework + interactive project scaffolder for LangChain and CrewAI. Pick your LLM provider, agents, RAG, memory, MCP tools and skills — generate a ready-to-run uv project.
 Project-URL: Homepage, https://github.com/muhammadyahiya/agentx-kit
 Project-URL: Repository, https://github.com/muhammadyahiya/agentx-kit
@@ -48,7 +48,10 @@ Requires-Dist: mcp>=1.2.0; extra == 'all'
 Requires-Dist: openinference-instrumentation-langchain>=0.1.0; extra == 'all'
 Requires-Dist: opentelemetry-exporter-otlp-proto-http>=1.20.0; extra == 'all'
 Requires-Dist: opentelemetry-sdk>=1.20.0; extra == 'all'
+Requires-Dist: pandas>=2.0.0; extra == 'all'
 Requires-Dist: sse-starlette>=2.0.0; extra == 'all'
+Requires-Dist: streamlit>=1.40.0; extra == 'all'
+Requires-Dist: tiktoken>=0.7.0; extra == 'all'
 Requires-Dist: uvicorn[standard]>=0.29.0; extra == 'all'
 Provides-Extra: anthropic
 Requires-Dist: langchain-anthropic>=0.2.0; extra == 'anthropic'
@@ -58,6 +61,10 @@ Provides-Extra: bedrock
 Requires-Dist: langchain-aws>=0.2.0; extra == 'bedrock'
 Provides-Extra: crewai
 Requires-Dist: crewai>=0.70.0; extra == 'crewai'
+Provides-Extra: dashboard
+Requires-Dist: pandas>=2.0.0; extra == 'dashboard'
+Requires-Dist: streamlit>=1.40.0; extra == 'dashboard'
+Requires-Dist: tiktoken>=0.7.0; extra == 'dashboard'
 Provides-Extra: dev
 Requires-Dist: pytest-cov>=5.0.0; extra == 'dev'
 Requires-Dist: pytest>=8.0.0; extra == 'dev'
@@ -233,6 +240,26 @@ agentx prompt remove reviewer
 A blank `system_prompt` is auto-derived from the agent's role + goal. You can also
 just open `prompts.json` in an editor — the CLI is a convenience, not a gate.
+## 📊 Prompt dashboard (observability + optimization)
+A Streamlit workbench to **understand and refine how your prompts talk to the LLM** —
+launch it any time:
+```bash
+pip install "agentx-kit[dashboard]"
+agentx dashboard                 # opens http://localhost:8501
+agentx prompt set assistant -d   # edit a prompt AND open the dashboard
+```
+It gives you, live as you edit:
+- **Token count, context-window utilization gauge, and cost estimate** (tiktoken-accurate).
+- **Quality score (0–100)** with a checklist (role / goal / output-format / examples / constraints / specificity) and **concrete suggestions + limit warnings**.
+- **✨ One-click LLM optimization** — refines the prompt while preserving intent, shows a **diff + rationale + token delta**, and can **apply the result straight back to `prompts.json`**.
+- **▶️ Test run** — send the prompt to the model and see the response with **tokens in/out, latency, and cost**.
+- **📈 Usage trends** — tokens, cost, and latency over time, logged locally to `.agentx/insights.jsonl`.
+Run it inside a generated AgentX project and it reads/writes that project's
+`prompts.json`; run it anywhere else for a free-form prompt scratchpad.
 ## 🏢 Enterprise pack
 Generate a production-shaped project with one flag — informed by a survey of
 CrewAI/LangGraph/create-llama/AgentStack/agno/pydantic-ai (see [RESEARCH.md](RESEARCH.md)):
@@ -281,6 +308,7 @@ llm = build_resilient_chat("openai", "gpt-4o-mini", fallbacks=[("anthropic", "cl
 | `mcp` | `langchain-mcp-adapters` | MCP tools |
 | `observability` | `opentelemetry-*`, `openinference-*` | tracing |
 | `server` | `fastapi`, `uvicorn` | serving |
+| `dashboard` | `streamlit`, `tiktoken`, `pandas` | prompt observability dashboard |
 | `all` | everything above | kitchen sink |
 See [DESIGN.md](DESIGN.md) for the architecture and [RESEARCH.md](RESEARCH.md) for the competitive analysis behind these features.

{agentx_kit-0.2.0 → agentx_kit-0.3.0}/README.md RENAMED

@@ -138,6 +138,26 @@ agentx prompt remove reviewer
 A blank `system_prompt` is auto-derived from the agent's role + goal. You can also
 just open `prompts.json` in an editor — the CLI is a convenience, not a gate.
+## 📊 Prompt dashboard (observability + optimization)
+A Streamlit workbench to **understand and refine how your prompts talk to the LLM** —
+launch it any time:
+```bash
+pip install "agentx-kit[dashboard]"
+agentx dashboard                 # opens http://localhost:8501
+agentx prompt set assistant -d   # edit a prompt AND open the dashboard
+```
+It gives you, live as you edit:
+- **Token count, context-window utilization gauge, and cost estimate** (tiktoken-accurate).
+- **Quality score (0–100)** with a checklist (role / goal / output-format / examples / constraints / specificity) and **concrete suggestions + limit warnings**.
+- **✨ One-click LLM optimization** — refines the prompt while preserving intent, shows a **diff + rationale + token delta**, and can **apply the result straight back to `prompts.json`**.
+- **▶️ Test run** — send the prompt to the model and see the response with **tokens in/out, latency, and cost**.
+- **📈 Usage trends** — tokens, cost, and latency over time, logged locally to `.agentx/insights.jsonl`.
+Run it inside a generated AgentX project and it reads/writes that project's
+`prompts.json`; run it anywhere else for a free-form prompt scratchpad.
 ## 🏢 Enterprise pack
 Generate a production-shaped project with one flag — informed by a survey of
 CrewAI/LangGraph/create-llama/AgentStack/agno/pydantic-ai (see [RESEARCH.md](RESEARCH.md)):
@@ -186,6 +206,7 @@ llm = build_resilient_chat("openai", "gpt-4o-mini", fallbacks=[("anthropic", "cl
 | `mcp` | `langchain-mcp-adapters` | MCP tools |
 | `observability` | `opentelemetry-*`, `openinference-*` | tracing |
 | `server` | `fastapi`, `uvicorn` | serving |
+| `dashboard` | `streamlit`, `tiktoken`, `pandas` | prompt observability dashboard |
 | `all` | everything above | kitchen sink |
 See [DESIGN.md](DESIGN.md) for the architecture and [RESEARCH.md](RESEARCH.md) for the competitive analysis behind these features.

{agentx_kit-0.2.0 → agentx_kit-0.3.0}/pyproject.toml RENAMED

@@ -1,7 +1,7 @@
 [project]
 # PyPI distribution name (import name + CLI stay `agentx`; `agentx` was taken).
 name = "agentx-kit"
-version = "0.2.0"
+version = "0.3.0"
 description = "An open-source, provider-agnostic agentic framework + interactive project scaffolder for LangChain and CrewAI. Pick your LLM provider, agents, RAG, memory, MCP tools and skills — generate a ready-to-run uv project."
 readme = "README.md"
 requires-python = ">=3.10,<3.14"
@@ -59,6 +59,7 @@ observability = [
     "openinference-instrumentation-langchain>=0.1.0",
 ]
 server = ["fastapi>=0.110.0", "uvicorn[standard]>=0.29.0", "sse-starlette>=2.0.0"]
+dashboard = ["streamlit>=1.40.0", "tiktoken>=0.7.0", "pandas>=2.0.0"]
 # ---- Bundles ----
 all = [
@@ -71,6 +72,7 @@ all = [
     "opentelemetry-sdk>=1.20.0", "opentelemetry-exporter-otlp-proto-http>=1.20.0",
     "openinference-instrumentation-langchain>=0.1.0",
     "fastapi>=0.110.0", "uvicorn[standard]>=0.29.0", "sse-starlette>=2.0.0",
+    "streamlit>=1.40.0", "tiktoken>=0.7.0", "pandas>=2.0.0",
 ]
 dev = ["pytest>=8.0.0", "pytest-cov>=5.0.0"]

{agentx_kit-0.2.0 → agentx_kit-0.3.0}/src/agentx/__init__.py RENAMED

@@ -16,7 +16,7 @@ is enough to get started.
 """
 from __future__ import annotations
-__version__ = "0.2.0"
+__version__ = "0.3.0"
 from .providers import (  # noqa: E402
     ProviderSpec,
@@ -33,6 +33,12 @@ from .reliability import (  # noqa: E402
     build_resilient_chat,
 )
 from .structured import structured_model  # noqa: E402
+from .insights import (  # noqa: E402
+    analyze_prompt,
+    count_tokens,
+    estimate_cost,
+    optimize_prompt,
+)
 __all__ = [
     "__version__",
@@ -52,4 +58,9 @@ __all__ = [
     "apply_guards",
     "GuardrailError",
     "structured_model",
+    # prompt insights
+    "analyze_prompt",
+    "optimize_prompt",
+    "count_tokens",
+    "estimate_cost",
 ]

{agentx_kit-0.2.0 → agentx_kit-0.3.0}/src/agentx/cli.py RENAMED

@@ -46,6 +46,28 @@ def providers() -> None:
     console.print(table)
+@app.command()
+def dashboard(
+    port: int = typer.Option(8501, "--port", help="Port for the dashboard server."),
+    provider: str = typer.Option(None, "--provider", help="Default provider to preselect."),
+    model: str = typer.Option(None, "--model", help="Default model to preselect."),
+    project: Path = typer.Option(None, "--project", help="Project dir (default: cwd; auto-detects prompts.json)."),
+) -> None:
+    """Launch the prompt observability & optimization dashboard (Streamlit).
+    A workbench to edit a prompt and see token usage, context-window utilization,
+    cost, quality suggestions, one-click LLM optimization, and test runs — live.
+    """
+    from .dashboard import launch
+    console.print(f"[cyan]Launching AgentX dashboard on http://localhost:{port} …[/] (Ctrl+C to stop)")
+    try:
+        launch(port=port, provider=provider, model=model, project=str(project) if project else None)
+    except RuntimeError as exc:
+        console.print(f"[red]{exc}[/]")
+        raise typer.Exit(1) from exc
 def _result_panel(result, spec: ProjectSpec) -> None:
     lines = [f"[bold green]✓[/] Project '{spec.slug}' created at:", f"  {result.target_dir}", ""]
     lines += [f"  • {m}" for m in result.messages]
@@ -156,6 +178,20 @@ def _read_text_arg(text: str | None, from_file: Path | None) -> str:
     return (text or "").strip()
+def _maybe_launch_dashboard(launch_flag: bool, project_dir: Path) -> None:
+    """Open the prompt dashboard after an edit if requested."""
+    if not launch_flag:
+        console.print("  [dim]Tip: run `agentx dashboard` to tune this prompt live.[/]")
+        return
+    from .dashboard import launch
+    console.print("[cyan]Opening dashboard…[/]")
+    try:
+        launch(project=str(project_dir))
+    except RuntimeError as exc:
+        console.print(f"[yellow]{exc}[/]")
 @prompt_app.command("list")
 def prompt_list(project: Path = typer.Option(None, "--project", help="Project dir (default: search from cwd).")) -> None:
     """List agents and their (resolved) prompts."""
@@ -177,6 +213,7 @@ def prompt_set(
     text: str = typer.Option("", "--text", "-t", help="New system prompt text."),
     from_file: Path = typer.Option(None, "--file", "-f", help="Read prompt text from a file."),
     project: Path = typer.Option(None, "--project"),
+    dash: bool = typer.Option(False, "--dashboard", "-d", help="Open the dashboard after saving."),
 ) -> None:
     """Set/replace an existing agent's system prompt."""
     path = _resolve_prompts_file(project)
@@ -190,6 +227,7 @@ def prompt_set(
         console.print(f"[red]{exc}[/]")
         raise typer.Exit(1) from exc
     console.print(f"[green]✓[/] Updated prompt for '{agent}'.")
+    _maybe_launch_dashboard(dash, path.parent)
 @prompt_app.command("add")
@@ -200,6 +238,7 @@ def prompt_add(
     text: str = typer.Option("", "--text", "-t", help="System prompt (blank = auto from role/goal)."),
     from_file: Path = typer.Option(None, "--file", "-f"),
     project: Path = typer.Option(None, "--project"),
+    dash: bool = typer.Option(False, "--dashboard", "-d", help="Open the dashboard after saving."),
 ) -> None:
     """Add a new agent; the project picks it up automatically on next run."""
     path = _resolve_prompts_file(project)
@@ -209,6 +248,7 @@ def prompt_add(
         console.print(f"[red]{exc}[/]")
         raise typer.Exit(1) from exc
     console.print(f"[green]✓[/] Added agent '{agent}'. It will run on next start — no code changes needed.")
+    _maybe_launch_dashboard(dash, path.parent)
 @prompt_app.command("remove")

agentx_kit-0.3.0/src/agentx/dashboard/__init__.py ADDED

@@ -0,0 +1,40 @@
+"""The AgentX prompt-observability dashboard (Streamlit)."""
+from __future__ import annotations
+import os
+import subprocess
+import sys
+from pathlib import Path
+APP = Path(__file__).parent / "app.py"
+def launch(port: int = 8501, provider: str | None = None, model: str | None = None,
+           project: str | None = None, headless: bool = False) -> int:
+    """Launch the Streamlit dashboard. Raises a helpful error if Streamlit is absent."""
+    try:
+        import streamlit  # noqa: F401
+    except ImportError as exc:
+        raise RuntimeError(
+            "The dashboard needs Streamlit. Install it with:\n"
+            "    pip install 'agentx-kit[dashboard]'"
+        ) from exc
+    env = os.environ.copy()
+    if provider:
+        env["AGENTX_DASH_PROVIDER"] = provider
+    if model:
+        env["AGENTX_DASH_MODEL"] = model
+    env["AGENTX_DASH_PROJECT"] = str(project or Path.cwd())
+    cmd = [
+        sys.executable, "-m", "streamlit", "run", str(APP),
+        "--server.port", str(port),
+        "--browser.gatherUsageStats", "false",
+    ]
+    if headless:
+        cmd += ["--server.headless", "true"]
+    return subprocess.run(cmd, env=env).returncode
+__all__ = ["launch", "APP"]
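
The `launch` helper above passes its settings to the Streamlit process in two ways: command-line flags for the server, and `AGENTX_DASH_*` environment variables for the app. A minimal sketch of that assembly follows, without actually spawning a process. The function names `build_streamlit_command` and `build_dashboard_env` are hypothetical and are not part of the package; only the flags and variable names are taken from the diff.

```python
from __future__ import annotations

import sys
from pathlib import Path


def build_streamlit_command(app_path: Path, port: int = 8501, headless: bool = False) -> list[str]:
    """Assemble the argv for `python -m streamlit run <app>` (hypothetical helper)."""
    cmd = [
        sys.executable, "-m", "streamlit", "run", str(app_path),
        "--server.port", str(port),
        "--browser.gatherUsageStats", "false",
    ]
    if headless:
        # Headless mode keeps Streamlit from opening a browser tab.
        cmd += ["--server.headless", "true"]
    return cmd


def build_dashboard_env(
    base_env: dict[str, str],
    project: str,
    provider: str | None = None,
    model: str | None = None,
) -> dict[str, str]:
    """Copy the environment and add the AGENTX_DASH_* variables the app reads."""
    env = dict(base_env)  # never mutate the caller's mapping
    if provider:
        env["AGENTX_DASH_PROVIDER"] = provider
    if model:
        env["AGENTX_DASH_MODEL"] = model
    env["AGENTX_DASH_PROJECT"] = project
    return env
```

Keeping the command and environment construction separate from `subprocess.run` makes both easy to check in a unit test without Streamlit installed.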

agentx_kit-0.3.0/src/agentx/dashboard/app.py ADDED

@@ -0,0 +1,270 @@
+"""AgentX — Prompt Observability & Optimization Dashboard (Streamlit).
+Launched via ``agentx dashboard``. An observability workbench for understanding
+and refining how your prompts interact with the LLM:
+  • live token count, context-window utilization, and cost estimate
+  • a heuristic quality score with concrete suggestions + limit warnings
+  • one-click LLM optimization (refine while preserving intent) with a diff
+  • a test run showing response, tokens in/out, latency, and cost
+  • usage trends over time, logged locally to .agentx/insights.jsonl
+If run inside a generated AgentX project, it reads/writes that project's
+``prompts.json`` so optimizations can be applied in place.
+"""
+from __future__ import annotations
+import difflib
+import os
+import time
+from pathlib import Path
+import streamlit as st
+from agentx.insights import (
+    analyze_prompt,
+    count_tokens,
+    estimate_cost,
+    get_log,
+    optimize_prompt,
+    prompt_hash,
+)
+from agentx.insights.tokens import context_window
+from agentx.providers import all_specs, get_spec
+st.set_page_config(page_title="AgentX Prompt Dashboard", page_icon="🧬", layout="wide")
+_PROJECT = Path(os.getenv("AGENTX_DASH_PROJECT", "."))
+_DEFAULT_PROVIDER = os.getenv("AGENTX_DASH_PROVIDER", "openai")
+_DEFAULT_MODEL = os.getenv("AGENTX_DASH_MODEL", "")
+# --------------------------------------------------------------------------- #
+# prompts.json integration (optional)
+# --------------------------------------------------------------------------- #
+def _load_prompts_store():
+    try:
+        from agentx.scaffold import prompts_store
+        path = prompts_store.find_prompts_file(_PROJECT)
+        if path:
+            return prompts_store, path, prompts_store.load(path)
+    except Exception:  # noqa: BLE001
+        pass
+    return None, None, None
+def _log():
+    return get_log(_PROJECT / ".agentx" / "insights.jsonl")
+# --------------------------------------------------------------------------- #
+# Sidebar — provider/model + prompt source
+# --------------------------------------------------------------------------- #
+def _sidebar():
+    st.sidebar.header("🧬 AgentX Dashboard")
+    specs = all_specs()
+    ids = [s.id for s in specs]
+    provider = st.sidebar.selectbox(
+        "Provider", ids, index=ids.index(_DEFAULT_PROVIDER) if _DEFAULT_PROVIDER in ids else 0,
+    )
+    default_model = _DEFAULT_MODEL or get_spec(provider).default_model
+    model = st.sidebar.text_input("Model", value=default_model)
+    store, path, data = _load_prompts_store()
+    source = "Free-form"
+    agent_name = None
+    initial_text = st.session_state.get("prompt_text", "")
+    if store and data and data.get("agents"):
+        st.sidebar.success(f"Project prompts: {path.parent.name}/prompts.json")
+        choices = ["Free-form"] + list(data["agents"])
+        source = st.sidebar.selectbox("Prompt source", choices)
+        if source != "Free-form":
+            agent_name = source
+            meta = data["agents"][agent_name]
+            loaded = meta.get("system_prompt") or ""
+            if st.session_state.get("_loaded_agent") != agent_name:
+                st.session_state["prompt_text"] = loaded
+                st.session_state["_loaded_agent"] = agent_name
+                initial_text = loaded
+    else:
+        st.sidebar.info("No prompts.json found — running in free-form mode. "
+                        "Run inside an AgentX project to edit its prompts.")
+    return provider, model, store, path, agent_name
+# --------------------------------------------------------------------------- #
+# Panels
+# --------------------------------------------------------------------------- #
+def _metrics_row(text: str, model: str):
+    tokens = count_tokens(text, model)
+    win = context_window(model)
+    util = tokens / win if win else 0.0
+    cost = estimate_cost(tokens, 0, model)
+    c1, c2, c3, c4 = st.columns(4)
+    c1.metric("Tokens", f"{tokens:,}")
+    c2.metric("Context window", f"{win:,}")
+    c3.metric("Utilization", f"{util:.1%}")
+    c4.metric("Est. input cost", f"${cost:.5f}")
+    st.progress(min(1.0, util), text=f"Context window usage: {util:.1%}")
+    return tokens
+def _analysis_panel(text: str, model: str):
+    a = analyze_prompt(text, model)
+    score = a.quality_score
+    color = "🟢" if score >= 75 else "🟡" if score >= 50 else "🔴"
+    st.subheader(f"{color} Prompt quality: {score}/100")
+    cols = st.columns(2)
+    labels = {
+        "has_role": "Role defined", "has_goal": "Goal stated",
+        "has_output_format": "Output format", "has_examples": "Examples",
+        "has_constraints": "Constraints", "not_vague": "Specific (not vague)",
+        "reasonable_length": "Reasonable length",
+    }
+    items = list(a.checks.items())
+    for i, (key, ok) in enumerate(items):
+        cols[i % 2].markdown(f"{'✅' if ok else '⬜'} {labels.get(key, key)}")
+    if a.suggestions:
+        st.markdown("##### 💡 Suggestions")
+        for s in a.suggestions:
+            st.markdown(f"- {s}")
+    for w in a.warnings:
+        st.warning(w)
+def _optimize_panel(text: str, provider: str, model: str, store, path, agent_name):
+    st.subheader("✨ Optimize prompt")
+    feedback = st.text_input("Optional feedback (tone, format, length, focus…)", key="opt_feedback")
+    if st.button("Optimize with LLM", type="primary"):
+        with st.spinner("Refining prompt (preserving intent)…"):
+            result = optimize_prompt(text, provider, model, feedback=feedback)
+        if not result.ok:
+            st.error(f"Optimization failed: {result.error}")
+        else:
+            st.session_state["opt_result"] = {"improved": result.improved, "rationale": result.rationale}
+            _log().record(kind="optimize", model=model, prompt_hash=prompt_hash(text),
+                          tokens_in=count_tokens(text, model), tokens_out=count_tokens(result.improved, model),
+                          note="prompt optimization")
+    res = st.session_state.get("opt_result")
+    if res:
+        before = count_tokens(text, model)
+        after = count_tokens(res["improved"], model)
+        delta = after - before
+        st.caption(f"Tokens: {before} → {after}  ({'+' if delta >= 0 else ''}{delta})")
+        st.markdown("**Improved prompt**")
+        st.code(res["improved"])
+        if res["rationale"]:
+            with st.expander("Why these changes?"):
+                st.markdown(res["rationale"])
+        with st.expander("Diff (original → improved)"):
+            diff = difflib.unified_diff(
+                text.splitlines(), res["improved"].splitlines(),
+                fromfile="original", tofile="improved", lineterm="",
+            )
+            st.code("\n".join(diff) or "(no line-level changes)", language="diff")
+        cols = st.columns(2)
+        if cols[0].button("Use as current prompt"):
+            st.session_state["prompt_text"] = res["improved"]
+            st.session_state.pop("opt_result", None)
+            st.rerun()
+        if agent_name and store and path:
+            if cols[1].button(f"💾 Apply to '{agent_name}' in prompts.json"):
+                store.set_prompt(path, agent_name, res["improved"])
+                st.session_state["prompt_text"] = res["improved"]
+                st.session_state.pop("opt_result", None)
+                st.success(f"Saved to prompts.json → {agent_name}. Your project picks it up on next run.")
+                st.rerun()
+def _run_panel(text: str, provider: str, model: str):
+    st.subheader("▶️ Test run")
+    user_msg = st.text_area("User message", value="Hello! Introduce yourself.", height=80, key="run_user")
+    if st.button("Run against the model"):
+        try:
+            from agentx import get_chat_model
+            from langchain_core.messages import HumanMessage, SystemMessage
+            llm = get_chat_model(provider, model)
+            messages = [SystemMessage(text), HumanMessage(user_msg)] if text.strip() else [HumanMessage(user_msg)]
+            t0 = time.time()
+            with st.spinner("Calling the model…"):
+                resp = llm.invoke(messages)
+            latency = int((time.time() - t0) * 1000)
+            reply = getattr(resp, "content", str(resp))
+            tin = count_tokens(text + user_msg, model)
+            tout = count_tokens(reply, model)
+            cost = estimate_cost(tin, tout, model)
+            _log().record(kind="run", model=model, prompt_hash=prompt_hash(text),
+                          tokens_in=tin, tokens_out=tout, cost_usd=cost, latency_ms=latency)
+            m1, m2, m3, m4 = st.columns(4)
+            m1.metric("Tokens in", f"{tin:,}")
+            m2.metric("Tokens out", f"{tout:,}")
+            m3.metric("Latency", f"{latency} ms")
+            m4.metric("Est. cost", f"${cost:.5f}")
+            st.markdown("**Response**")
+            st.markdown(reply)
+        except Exception as exc:  # noqa: BLE001
+            st.error(f"Run failed: {exc}\n\nCheck your provider extra is installed and credentials are set.")
+def _trends_panel():
+    st.subheader("📊 Usage & trends")
+    log = _log()
+    agg = log.aggregate()
+    c1, c2, c3, c4 = st.columns(4)
+    c1.metric("Runs", agg["runs"])
+    c2.metric("Total tokens", f"{agg['total_tokens']:,}")
+    c3.metric("Total cost", f"${agg['total_cost_usd']:.4f}")
+    c4.metric("Avg latency", f"{agg['avg_latency_ms']} ms")
+    rows = [r for r in log.events() if r.get("kind") == "run"]
+    if not rows:
+        st.info("No runs logged yet — use **Test run** to populate trends.")
+        return
+    try:
+        import pandas as pd
+        df = pd.DataFrame(rows)
+        df["ts"] = pd.to_datetime(df["ts"])
+        df = df.set_index("ts")
+        st.markdown("###### Tokens per run")
+        st.line_chart(df[["tokens_in", "tokens_out"]], height=200)
+        st.markdown("###### Cost (USD) per run")
+        st.line_chart(df[["cost_usd"]], height=160)
+        st.markdown("###### Latency (ms) per run")
+        st.line_chart(df[["latency_ms"]], height=160)
+    except Exception:  # noqa: BLE001 - pandas optional
+        st.write(rows[-20:])
+# --------------------------------------------------------------------------- #
+def main():
+    provider, model, store, path, agent_name = _sidebar()
+    st.title("🧬 Prompt Observability & Optimization")
+    st.caption("Edit a prompt, see token/cost/limits live, get suggestions, optimize, and test — all in one place.")
+    text = st.text_area(
+        "System prompt", value=st.session_state.get("prompt_text", ""),
+        height=240, key="prompt_text",
+        placeholder="You are a helpful assistant. Your goal is to…",
+    )
+    _metrics_row(text, model)
+    st.divider()
+    tab_analyze, tab_optimize, tab_run, tab_trends = st.tabs(
+        ["🔎 Analysis", "✨ Optimize", "▶️ Test run", "📊 Trends"]
+    )
+    with tab_analyze:
+        _analysis_panel(text, model)
+    with tab_optimize:
+        _optimize_panel(text, provider, model, store, path, agent_name)
+    with tab_run:
+        _run_panel(text, provider, model)
+    with tab_trends:
+        _trends_panel()
+main()
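
The optimize panel above reports a token delta and a line-level diff between the original and improved prompt. The sketch below isolates those two pieces using only the standard library. The function names `prompt_diff` and `format_delta` are hypothetical, and the token counts are passed in as plain integers because the package's `count_tokens` implementation is not shown in this diff.

```python
import difflib


def prompt_diff(original: str, improved: str) -> str:
    """Return a unified diff of two prompts, or a placeholder if nothing changed."""
    diff = difflib.unified_diff(
        original.splitlines(), improved.splitlines(),
        fromfile="original", tofile="improved", lineterm="",
    )
    # unified_diff yields nothing for identical inputs, so the join is empty.
    return "\n".join(diff) or "(no line-level changes)"


def format_delta(before: int, after: int) -> str:
    """Format a token change the way the panel caption does, with an explicit sign."""
    delta = after - before
    sign = "+" if delta >= 0 else ""  # negative numbers already carry their sign
    return f"Tokens: {before} → {after}  ({sign}{delta})"
```

For example, `format_delta(120, 95)` gives `Tokens: 120 → 95  (-25)`, and `prompt_diff` on two identical strings gives the placeholder text.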

agentx_kit-0.3.0/src/agentx/insights/__init__.py ADDED

@@ -0,0 +1,18 @@
+"""Prompt insights: token/cost analysis, quality heuristics, LLM optimization, logging."""
+from .analyze import PromptAnalysis, analyze_prompt
+from .log import InsightEvent, InsightLog, get_log, prompt_hash
+from .optimize import OptimizationResult, optimize_prompt
+from .tokens import (
+    TokenReport,
+    context_window,
+    count_tokens,
+    estimate_cost,
+    utilization,
+)
+__all__ = [
+    "analyze_prompt", "PromptAnalysis",
+    "optimize_prompt", "OptimizationResult",
+    "count_tokens", "estimate_cost", "context_window", "utilization", "TokenReport",
+    "InsightLog", "InsightEvent", "get_log", "prompt_hash",
+]

agentx_kit-0.3.0/src/agentx/insights/analyze.py ADDED

@@ -0,0 +1,85 @@
+"""Heuristic prompt analysis — quality score, suggestions, and limit warnings.
+Offline and fast (no LLM). Encodes widely-recommended prompt-engineering levers
+(2025–2026): a clear role + goal, explicit output format, examples, constraints,
+lean/keyword-led wording, and context-window awareness. Use this for instant
+feedback while editing; use :mod:`agentx.insights.optimize` for an LLM rewrite.
+"""
+from __future__ import annotations
+import re
+from dataclasses import dataclass, field
+from .tokens import context_window, count_tokens, utilization
+_VAGUE = ("something", "stuff", "etc", "and so on", "good", "nice", "appropriate", "as needed")
+_FORMAT_HINTS = ("json", "markdown", "bullet", "list", "table", "format", "schema", "output:", "respond with")
+_EXAMPLE_HINTS = ("example", "e.g.", "for instance", "input:", "output:")
+_CONSTRAINT_HINTS = ("must", "do not", "don't", "never", "only", "limit", "at most", "no more than", "avoid")
+_ROLE_HINTS = ("you are", "act as", "your role", "as a ")
+_GOAL_HINTS = ("your goal", "your task", "objective", "you should", "help the user", "your job")
+@dataclass
+class PromptAnalysis:
+    tokens: int
+    chars: int
+    quality_score: int                     # 0-100
+    checks: dict[str, bool] = field(default_factory=dict)
+    suggestions: list[str] = field(default_factory=list)
+    warnings: list[str] = field(default_factory=list)
+def analyze_prompt(text: str, model: str = "gpt-4o-mini") -> PromptAnalysis:
+    """Score a prompt and return actionable suggestions + limit warnings."""
+    text = text or ""
+    low = text.lower()
+    tokens = count_tokens(text, model)
+    checks = {
+        "has_role": any(h in low for h in _ROLE_HINTS),
+        "has_goal": any(h in low for h in _GOAL_HINTS),
+        "has_output_format": any(h in low for h in _FORMAT_HINTS),
+        "has_examples": any(h in low for h in _EXAMPLE_HINTS),
+        "has_constraints": any(h in low for h in _CONSTRAINT_HINTS),
+        "not_vague": not any(re.search(rf"\b{re.escape(w)}\b", low) for w in _VAGUE),
+        "reasonable_length": 5 <= tokens <= 1500,
+    }
+    # Weighted score (role + goal + format matter most).
+    weights = {
+        "has_role": 18, "has_goal": 18, "has_output_format": 18,
+        "has_examples": 14, "has_constraints": 14, "not_vague": 10, "reasonable_length": 8,
+    }
+    score = sum(w for k, w in weights.items() if checks[k])
+    suggestions: list[str] = []
+    if not checks["has_role"]:
+        suggestions.append("Open with an explicit role, e.g. “You are a senior support agent…”.")
+    if not checks["has_goal"]:
+        suggestions.append("State the goal/task clearly so the model knows what success looks like.")
+    if not checks["has_output_format"]:
+        suggestions.append("Specify the output format (JSON/markdown/bullets) — reduces retries and tokens.")
+    if not checks["has_examples"]:
+        suggestions.append("Add 1–2 short input→output examples (few-shot) for tricky tasks.")
+    if not checks["has_constraints"]:
+        suggestions.append("Add constraints/guardrails (length caps, “do not …”) to keep output on-task.")
+    if not checks["not_vague"]:
+        suggestions.append("Replace vague words (e.g. “good”, “appropriate”, “stuff”) with concrete criteria.")
+    warnings: list[str] = []
+    win = context_window(model)
+    util = utilization(tokens, model)
+    if tokens > 1500:
+        warnings.append(
+            f"Prompt is long ({tokens} tokens). Lead with keywords, trim boilerplate, and move "
+            "stable context into a cached prefix to cut cost."
+        )
+    if util >= 0.5:
+        warnings.append(f"Prompt already uses {util:.0%} of {model}'s {win:,}-token context window.")
+    if tokens < 5:
+        warnings.append("Prompt is very short — likely under-specified.")
+    return PromptAnalysis(
+        tokens=tokens, chars=len(text), quality_score=score,
+        checks=checks, suggestions=suggestions, warnings=warnings,
+    )
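
The quality score in `analyze_prompt` is a plain weighted sum: each passing check contributes its weight, and the seven weights add up to 100. The sketch below reproduces that arithmetic on its own, together with the colour thresholds the dashboard applies (75 and 50). The names `quality_score` and `score_band` are hypothetical; the weights and thresholds are copied from the diff.

```python
from __future__ import annotations

# Weights as listed in analyze.py; they sum to 100.
WEIGHTS = {
    "has_role": 18, "has_goal": 18, "has_output_format": 18,
    "has_examples": 14, "has_constraints": 14, "not_vague": 10, "reasonable_length": 8,
}


def quality_score(checks: dict[str, bool]) -> int:
    """Sum the weights of every check that passed; missing keys count as failed."""
    return sum(weight for key, weight in WEIGHTS.items() if checks.get(key, False))


def score_band(score: int) -> str:
    """Map a score to the dashboard's traffic-light bands."""
    if score >= 75:
        return "green"
    if score >= 50:
        return "yellow"
    return "red"
```

A prompt that states a role, a goal, and an output format but nothing else scores 18 + 18 + 18 = 54, which lands in the yellow band.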

agentx_kit-0.3.0/src/agentx/insights/log.py ADDED

@@ -0,0 +1,88 @@
+"""Interaction log — append prompt edits/runs/optimizations for the dashboard.
+A local JSONL at ``.agentx/insights.jsonl`` (project-local). Powers the usage
+and trend charts: tokens in/out, cost, latency, model, per event.
+"""
+from __future__ import annotations
+import hashlib
+import json
+import threading
+from dataclasses import asdict, dataclass, field
+from datetime import datetime, timezone
+from pathlib import Path
+_lock = threading.Lock()
+def prompt_hash(text: str) -> str:
+    return hashlib.sha256((text or "").encode("utf-8")).hexdigest()[:10]
+def _now() -> str:
+    return datetime.now(timezone.utc).isoformat()
+@dataclass
+class InsightEvent:
+    ts: str = field(default_factory=_now)
+    kind: str = "run"                  # run | edit | optimize
+    model: str = ""
+    prompt_hash: str = ""
+    tokens_in: int = 0
+    tokens_out: int = 0
+    cost_usd: float = 0.0
+    latency_ms: int = 0
+    note: str = ""
+class InsightLog:
+    def __init__(self, path: str | Path = ".agentx/insights.jsonl"):
+        self.path = Path(path)
+        self.path.parent.mkdir(parents=True, exist_ok=True)
+    def add(self, event: InsightEvent) -> InsightEvent:
+        with _lock:
+            with self.path.open("a", encoding="utf-8") as fh:
+                fh.write(json.dumps(asdict(event)) + "\n")
+        return event
+    def record(self, **kwargs) -> InsightEvent:
+        return self.add(InsightEvent(**kwargs))
+    def events(self, limit: int | None = None) -> list[dict]:
+        if not self.path.exists():
+            return []
+        rows = []
+        for line in self.path.read_text(encoding="utf-8").splitlines():
+            line = line.strip()
+            if not line:
+                continue
+            try:
+                rows.append(json.loads(line))
+            except json.JSONDecodeError:
+                continue
+        return rows[-limit:] if limit else rows
+    def aggregate(self) -> dict:
+        rows = self.events()
+        runs = [r for r in rows if r.get("kind") == "run"]
+        total_tokens = sum(r.get("tokens_in", 0) + r.get("tokens_out", 0) for r in runs)
+        total_cost = round(sum(r.get("cost_usd", 0.0) for r in runs), 6)
+        lat = [r.get("latency_ms", 0) for r in runs if r.get("latency_ms")]
+        return {
+            "events": len(rows),
+            "runs": len(runs),
+            "total_tokens": total_tokens,
+            "total_cost_usd": total_cost,
+            "avg_latency_ms": round(sum(lat) / len(lat)) if lat else 0,
+            "optimizations": sum(1 for r in rows if r.get("kind") == "optimize"),
+        }
+    def clear(self) -> None:
+        if self.path.exists():
+            self.path.unlink()
+def get_log(path: str | Path = ".agentx/insights.jsonl") -> InsightLog:
+    return InsightLog(path)
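The log above is an append-only JSONL file that is re-read and aggregated on demand. A minimal standalone sketch of that pattern follows. It does not import the package, and the helper names (`append_event`, `read_events`, `summarize`) are hypothetical, chosen only for illustration.

```python
import json
import tempfile
from pathlib import Path


def append_event(path: Path, event: dict) -> None:
    """Append one event as a single JSON line (hypothetical helper)."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(event) + "\n")


def read_events(path: Path) -> list[dict]:
    """Read events back, skipping blank or malformed lines."""
    if not path.exists():
        return []
    rows = []
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            rows.append(json.loads(line))
        except json.JSONDecodeError:
            continue  # tolerate a partially written or corrupted line
    return rows


def summarize(rows: list[dict]) -> dict:
    """Aggregate only the events whose kind is 'run'."""
    runs = [r for r in rows if r.get("kind") == "run"]
    lat = [r["latency_ms"] for r in runs if r.get("latency_ms")]
    return {
        "runs": len(runs),
        "total_tokens": sum(r.get("tokens_in", 0) + r.get("tokens_out", 0) for r in runs),
        "avg_latency_ms": round(sum(lat) / len(lat)) if lat else 0,
    }


with tempfile.TemporaryDirectory() as tmp:
    log_path = Path(tmp) / ".agentx" / "insights.jsonl"
    append_event(log_path, {"kind": "run", "tokens_in": 100, "tokens_out": 50, "latency_ms": 420})
    append_event(log_path, {"kind": "run", "tokens_in": 200, "tokens_out": 80, "latency_ms": 380})
    append_event(log_path, {"kind": "optimize", "tokens_in": 100, "tokens_out": 90})
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write("{not valid json\n")  # simulate a corrupted line
    summary = summarize(read_events(log_path))
    print(summary)
```

Skipping malformed lines on read means one bad write does not make the rest of the history unreadable, at the cost of silently dropping that event.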

agentx_kit-0.3.0/src/agentx/insights/optimize.py ADDED

@@ -0,0 +1,80 @@
+"""LLM-backed prompt refinement — rewrite a prompt while preserving intent.
+Implements the "iterative refinement" pattern: improve an existing prompt by
+applying best practices (clear role/goal, explicit output format, constraints,
+lean wording) and any user feedback, *without* changing the original intent.
+"""
+from __future__ import annotations
+import logging
+from dataclasses import dataclass
+from ..providers import get_chat_model
+logger = logging.getLogger(__name__)
+_OPTIMIZER_SYSTEM = (
+    "You are a senior prompt engineer. You rewrite system prompts to be clearer, "
+    "more reliable, and more token-efficient WITHOUT changing their intent.\n"
+    "Apply: an explicit role + goal, a specified output format, concrete constraints, "
+    "lean keyword-led wording, and few-shot examples only if they add value. "
+    "Remove redundancy and vagueness. Keep it as short as possible while complete."
+)
+_OPTIMIZER_HUMAN = (
+    "Rewrite the PROMPT below.\n\n"
+    "PROMPT:\n{prompt}\n\n"
+    "{feedback_block}"
+    "Respond in EXACTLY this format:\n"
+    "===IMPROVED===\n<the improved prompt only>\n===RATIONALE===\n"
+    "<3-5 bullet points explaining the key changes>"
+)
+@dataclass
+class OptimizationResult:
+    original: str
+    improved: str
+    rationale: str
+    ok: bool = True
+    error: str = ""
+def optimize_prompt(
+    prompt: str,
+    provider: str | None = None,
+    model: str | None = None,
+    feedback: str = "",
+    temperature: float = 0.3,
+) -> OptimizationResult:
+    """Return an LLM-refined version of ``prompt`` + a rationale. Never raises."""
+    if not (prompt or "").strip():
+        return OptimizationResult(prompt, prompt, "", ok=False, error="Empty prompt.")
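+    # Guard against ``feedback=None`` so the "never raises" contract holds.
+    feedback = (feedback or "").strip()
+    feedback_block = f"Apply this feedback: {feedback}\n\n" if feedback else ""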
+    try:
+        from langchain_core.prompts import ChatPromptTemplate
+        chain = ChatPromptTemplate.from_messages(
+            [("system", _OPTIMIZER_SYSTEM), ("human", _OPTIMIZER_HUMAN)]
+        ) | get_chat_model(provider, model, temperature=temperature)
+        raw = chain.invoke({"prompt": prompt, "feedback_block": feedback_block}).content
+        improved, rationale = _parse(raw, fallback=prompt)
+        return OptimizationResult(prompt, improved, rationale)
+    except Exception as exc:  # noqa: BLE001
+        logger.warning("Prompt optimization failed: %s", exc)
+        return OptimizationResult(prompt, prompt, "", ok=False, error=str(exc))
+def _parse(raw: str, fallback: str) -> tuple[str, str]:
+    text = raw or ""
+    improved, rationale = fallback, ""
+    if "===IMPROVED===" in text:
+        rest = text.split("===IMPROVED===", 1)[1]
+        if "===RATIONALE===" in rest:
+            imp, rat = rest.split("===RATIONALE===", 1)
+            improved, rationale = imp.strip(), rat.strip()
+        else:
+            improved = rest.strip()
+    else:
+        improved = text.strip() or fallback
+    return improved, rationale
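The optimizer asks the model to answer in a two-marker format and then splits the reply on those markers. The sketch below mirrors that parsing step in isolation so the three outcomes (well-formed reply, reply without markers, empty reply) are easy to see. The function name `split_response` is hypothetical, and no LLM is called.

```python
def split_response(raw: str, fallback: str) -> tuple[str, str]:
    """Split a reply into (improved, rationale) using the two markers."""
    text = raw or ""
    if "===IMPROVED===" not in text:
        # No markers at all: treat the whole reply as the improved prompt.
        return (text.strip() or fallback), ""
    rest = text.split("===IMPROVED===", 1)[1]
    if "===RATIONALE===" in rest:
        improved, rationale = rest.split("===RATIONALE===", 1)
        return improved.strip(), rationale.strip()
    return rest.strip(), ""


well_formed = split_response(
    "===IMPROVED===\nYou are a billing agent.\n===RATIONALE===\n- Added a role",
    fallback="original",
)
no_markers = split_response("Just a rewritten prompt", fallback="original")
empty_reply = split_response("", fallback="original")

print(well_formed)
print(no_markers)
print(empty_reply)
```

Falling back to the original prompt on an empty reply keeps the caller's text intact instead of replacing it with an empty string.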

agentx_kit-0.3.0/src/agentx/insights/tokens.py ADDED

@@ -0,0 +1,97 @@
+"""Token counting, cost estimation, and context-window utilization.
+Uses ``tiktoken`` when available for accurate counts; otherwise falls back to a
+~4-chars/token heuristic. Pricing and context windows are approximate, editable
+defaults — override per your contract. Cost is *derived* from tokens (the
+industry convention; OTel GenAI standardises tokens, not cost).
+"""
+from __future__ import annotations
+from dataclasses import dataclass
+# Approximate context windows (tokens). Matched by substring on the model id.
+_CONTEXT_WINDOWS: dict[str, int] = {
+    "gpt-4o": 128_000, "gpt-4.1": 1_000_000, "gpt-4": 128_000, "o1": 200_000, "o3": 200_000,
+    "gpt-3.5": 16_385,
+    "claude-3-5": 200_000, "claude-3": 200_000, "claude": 200_000,
+    "gemini-1.5": 1_000_000, "gemini-2": 1_000_000, "gemini": 1_000_000,
+    "llama-3.3": 128_000, "llama-3.1": 128_000, "llama3": 8_192, "llama": 8_192,
+    "mixtral": 32_768, "mistral": 32_768, "qwen": 32_768,
+}
+# Approximate USD per 1K tokens, as (input, output), matched by substring on the model id.
+# Models with no match fall back to _DEFAULT_PRICE, i.e. an estimated cost of 0.
+_PRICING: dict[str, tuple[float, float]] = {
+    "gpt-4o-mini": (0.00015, 0.0006), "gpt-4o": (0.0025, 0.01), "gpt-4.1": (0.002, 0.008),
+    "gpt-4": (0.03, 0.06), "gpt-3.5": (0.0005, 0.0015),
+    "claude-3-5-sonnet": (0.003, 0.015), "claude-3-5-haiku": (0.0008, 0.004),
+    "claude-3-opus": (0.015, 0.075), "claude": (0.003, 0.015),
+    "gemini-1.5-flash": (0.000075, 0.0003), "gemini-1.5-pro": (0.00125, 0.005), "gemini": (0.000075, 0.0003),
+    "llama": (0.0, 0.0), "qwen": (0.0, 0.0), "mistral": (0.0, 0.0),
+}
+_DEFAULT_WINDOW = 8_192
+_DEFAULT_PRICE = (0.0, 0.0)
+def _match(model: str, table: dict):
+    """Return the value for the longest key that is a substring of ``model``, else None."""
+    model = (model or "").lower()
+    # Longest key match wins (so "gpt-4o-mini" beats "gpt-4").
+    for key in sorted(table, key=len, reverse=True):
+        if key in model:
+            return table[key]
+    return None
+def count_tokens(text: str, model: str = "gpt-4o-mini") -> int:
+    """Count tokens in ``text`` for ``model``. Accurate via tiktoken if installed."""
+    if not text:
+        return 0
+    try:
+        import tiktoken
+        try:
+            enc = tiktoken.encoding_for_model(model)
+        except KeyError:
+            enc = tiktoken.get_encoding("o200k_base" if "gpt-4o" in model or "gpt-4.1" in model else "cl100k_base")
+        return len(enc.encode(text))
+    except Exception:  # noqa: BLE001 - tiktoken absent or its encoding unavailable
+        # Fall back to a ~4 chars/token heuristic, with a floor of 1 token.
+        return max(1, round(len(text) / 4))
+def context_window(model: str) -> int:
+    return _match(model, _CONTEXT_WINDOWS) or _DEFAULT_WINDOW
+def utilization(tokens: int, model: str) -> float:
+    """Fraction of the model's context window used by ``tokens`` (above 1.0 if it overflows)."""
+    win = context_window(model)
+    return round(tokens / win, 4) if win else 0.0
+def estimate_cost(input_tokens: int, output_tokens: int = 0, model: str = "gpt-4o-mini") -> float:
+    """Estimate USD cost from token counts (approximate; pricing is editable)."""
+    price_in, price_out = _match(model, _PRICING) or _DEFAULT_PRICE
+    return round(input_tokens / 1000 * price_in + output_tokens / 1000 * price_out, 6)
+@dataclass
+class TokenReport:
+    model: str
+    tokens: int
+    context_window: int
+    utilization: float
+    est_input_cost: float
+    @classmethod
+    def for_text(cls, text: str, model: str) -> "TokenReport":
+        n = count_tokens(text, model)
+        return cls(
+            model=model,
+            tokens=n,
+            context_window=context_window(model),
+            utilization=utilization(n, model),
+            est_input_cost=estimate_cost(n, 0, model),
+        )
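Both lookup tables above are matched by substring with the longest key winning, and cost is derived as tokens / 1000 × price. The standalone sketch below works through that arithmetic with a small illustrative subset of the tables. The names `longest_match`, `cost_usd` and `window_fraction` are hypothetical, and the prices are copied from the diff as approximate defaults, not authoritative rates.

```python
# Illustrative subset of the tables: USD per 1K tokens as (input, output).
PRICING = {
    "gpt-4o-mini": (0.00015, 0.0006),
    "gpt-4o": (0.0025, 0.01),
    "gpt-4": (0.03, 0.06),
}
WINDOWS = {"gpt-4o": 128_000, "gpt-4": 128_000}
DEFAULT_WINDOW = 8_192


def longest_match(model: str, table: dict):
    """Return the value whose key is the longest substring of the model id."""
    model = (model or "").lower()
    for key in sorted(table, key=len, reverse=True):
        if key in model:
            return table[key]
    return None


def cost_usd(tokens_in: int, tokens_out: int, model: str) -> float:
    """Derive cost from token counts; unmatched models cost 0."""
    price_in, price_out = longest_match(model, PRICING) or (0.0, 0.0)
    return round(tokens_in / 1000 * price_in + tokens_out / 1000 * price_out, 6)


def window_fraction(tokens: int, model: str) -> float:
    """Share of the context window consumed by the given token count."""
    window = longest_match(model, WINDOWS) or DEFAULT_WINDOW
    return round(tokens / window, 4)


# A dated, prefixed model id still resolves to the most specific key.
mini_price = longest_match("openai/gpt-4o-mini-2024-07-18", PRICING)
# 1000 input + 500 output tokens: 1.0 * 0.00015 + 0.5 * 0.0006
run_cost = cost_usd(1000, 500, "gpt-4o-mini")
unknown_cost = cost_usd(1000, 500, "some-local-model")
used = window_fraction(1000, "gpt-4o-mini")

print(mini_price, run_cost, unknown_cost, used)
```

Sorting keys by length before matching is what keeps "gpt-4o-mini" from being priced as "gpt-4o" or "gpt-4", since all three are substrings of the same model id.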

agentx_kit-0.3.0/tests/test_insights.py ADDED

@@ -0,0 +1,94 @@
+"""Tests for the prompt-insights core (token/cost, analysis, log). No live LLM."""
+import pytest
+from agentx.insights import (
+    analyze_prompt,
+    context_window,
+    count_tokens,
+    estimate_cost,
+    get_log,
+    prompt_hash,
+    utilization,
+)
+# ----- tokens -----
+def test_count_tokens_scales_with_length():
+    assert count_tokens("", "gpt-4o-mini") == 0
+    short = count_tokens("hello world", "gpt-4o-mini")
+    long = count_tokens("hello world " * 100, "gpt-4o-mini")
+    assert 0 < short < long
+def test_context_window_and_utilization():
+    assert context_window("gpt-4o-mini") >= 100_000
+    assert context_window("totally-unknown-model") == 8192  # default
+    assert 0.0 <= utilization(1000, "gpt-4o-mini") < 0.1
+def test_estimate_cost_monotonic():
+    cheap = estimate_cost(1000, 0, "gpt-4o-mini")
+    pricey = estimate_cost(1000, 0, "gpt-4o")
+    assert pricey > cheap >= 0
+    assert estimate_cost(0, 0, "gpt-4o") == 0.0
+# ----- analysis -----
+def test_analyze_good_prompt_scores_high():
+    good = (
+        "You are a senior support agent. Your goal is to resolve billing issues. "
+        "Respond in JSON with fields {reason, action}. Do not invent policy. "
+        "Example: input: 'refund?' output: {\"reason\": \"...\", \"action\": \"...\"}."
+    )
+    a = analyze_prompt(good, "gpt-4o-mini")
+    assert a.quality_score >= 70
+    assert a.checks["has_role"] and a.checks["has_goal"] and a.checks["has_output_format"]
+def test_analyze_poor_prompt_has_suggestions():
+    a = analyze_prompt("do good stuff", "gpt-4o-mini")
+    assert a.quality_score < 50
+    assert a.suggestions
+    assert a.checks["not_vague"] is False  # 'good'/'stuff' are vague
+def test_analyze_long_prompt_warns():
+    a = analyze_prompt("word " * 4000, "gpt-4o-mini")
+    assert any("long" in w.lower() for w in a.warnings)
+# ----- log -----
+def test_insight_log_roundtrip(tmp_path):
+    log = get_log(tmp_path / ".agentx" / "insights.jsonl")
+    log.record(kind="run", model="gpt-4o-mini", tokens_in=100, tokens_out=50, cost_usd=0.001, latency_ms=420)
+    log.record(kind="run", model="gpt-4o-mini", tokens_in=200, tokens_out=80, cost_usd=0.002, latency_ms=380)
+    log.record(kind="optimize", model="gpt-4o-mini", tokens_in=100, tokens_out=90)
+    agg = log.aggregate()
+    assert agg["runs"] == 2
+    assert agg["total_tokens"] == 100 + 50 + 200 + 80
+    assert agg["optimizations"] == 1
+    assert agg["avg_latency_ms"] == 400
+def test_prompt_hash_stable():
+    assert prompt_hash("abc") == prompt_hash("abc")
+    assert prompt_hash("abc") != prompt_hash("abd")
+# ----- dashboard launcher import is lazy/graceful -----
+def test_dashboard_launch_requires_streamlit(monkeypatch):
+    import builtins
+    from agentx import dashboard
+    real_import = builtins.__import__
+    def fake_import(name, *a, **k):
+        if name == "streamlit":
+            raise ImportError("no streamlit")
+        return real_import(name, *a, **k)
+    monkeypatch.setattr(builtins, "__import__", fake_import)
+    with pytest.raises(RuntimeError) as exc:
+        dashboard.launch()
+    assert "agentx-kit[dashboard]" in str(exc.value)
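The last test expects the dashboard launcher to turn a missing optional dependency into a `RuntimeError` that names the extra to install. The launcher itself is not part of this section, so the following is only a sketch of that lazy-import pattern under assumed names: `require_optional` and the exact wording of the message are hypothetical, not the package's code.

```python
import importlib


def require_optional(module_name: str, extra: str):
    """Import an optional dependency, or raise with an install hint."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise RuntimeError(
            f"'{module_name}' is required; install it with: "
            f"pip install 'agentx-kit[{extra}]'"
        ) from exc


# A stdlib module stands in for a dependency that is present.
json_mod = require_optional("json", "dashboard")

# A deliberately nonexistent module stands in for a missing dependency.
try:
    require_optional("module_that_does_not_exist_xyz", "dashboard")
except RuntimeError as err:
    message = str(err)

print(message)
```

Deferring the import to call time lets the core package be imported without the dashboard extra installed, and the error only appears when the feature is actually used.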