PyPI - agent-failure-doctor - Versions diffs - 3.2.0__tar.gz - Mend

agent-failure-doctor 3.2.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (192) hide show

agent_failure_doctor-3.2.0/LICENSE ADDED Viewed

@@ -0,0 +1,5 @@
+MIT License
+Copyright (c) 2026 WebAgentRuntimeBench
+Permission is hereby granted... (MIT text)

agent_failure_doctor-3.2.0/PKG-INFO ADDED Viewed

@@ -0,0 +1,595 @@
+Metadata-Version: 2.4
+Name: agent-failure-doctor
+Version: 3.2.0
+Summary: Local-first failure diagnosis for AI browser automation, Playwright, crawler, and RPA runs.
+Author: sida lin
+License-Expression: MIT
+Project-URL: Homepage, https://github.com/tobybgy-lsd/web-agent-runtime-bench
+Project-URL: Repository, https://github.com/tobybgy-lsd/web-agent-runtime-bench
+Project-URL: Issues, https://github.com/tobybgy-lsd/web-agent-runtime-bench/issues
+Keywords: playwright,browser automation,ai agent,crawler,rpa,debugging,failure diagnosis
+Classifier: Development Status :: 4 - Beta
+Classifier: Environment :: Console
+Classifier: Intended Audience :: Developers
+Classifier: Programming Language :: Python :: 3
+Classifier: Programming Language :: Python :: 3.10
+Classifier: Programming Language :: Python :: 3.11
+Classifier: Programming Language :: Python :: 3.12
+Classifier: Topic :: Software Development :: Testing
+Requires-Python: >=3.10
+Description-Content-Type: text/markdown
+License-File: LICENSE
+Provides-Extra: trace-gen
+Requires-Dist: playwright>=1.45; extra == "trace-gen"
+Dynamic: license-file
+# Agent Failure Doctor
+[中文文档](README.zh-CN.md)
+![CI](https://github.com/tobybgy-lsd/web-agent-runtime-bench/actions/workflows/benchmark.yml/badge.svg)
+![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)
+![Python 3.10+](https://img.shields.io/badge/python-3.10%2B-blue.svg)
+Local-first failure diagnosis lifecycle tool for AI browser automation,
+Playwright, crawler, RPA, and business automation failures.
+- Current milestone: Agent Failure Doctor v3.2 Auto Collector P98 Gate
+- Previous stable line: Agent Failure Doctor v3.1.0 P98 Master Gate.
+- Previous P95 stable line: Agent Failure Doctor v2.4.1 P95 Alignment & Missing Tracks Pack.
+**Input:** trace.zip / error.log / console.txt / network.json /
+screenshot metadata / user_description.txt
+**Output:** diagnosis, evidence, next action, repair suggestions,
+GitHub issue draft, Codex fix prompt.
+## Quickstart
+```powershell
+git clone https://github.com/tobybgy-lsd/web-agent-runtime-bench.git
+cd web-agent-runtime-bench
+python -m pip install -e .
+failure-doctor diagnose .\examples\failed_runs\proxy_network_error --out .\report
+failure-doctor plan .\report --out .\fix_plan
+failure-doctor collect --project . --preset auto --out .\failure_doctor_auto_report `
+  --auto-diagnose --auto-handoff --auto-sanitize
+failure-doctor agent-bootstrap --target all --project .
+```
+See [validation/dashboard.md](validation/dashboard.md),
+[docs/P98_LIMITS.md](docs/P98_LIMITS.md),
+[docs/AGENT_FRONTEND_INVOCATION.md](docs/AGENT_FRONTEND_INVOCATION.md),
+and [docs/safety_boundary.md](docs/safety_boundary.md).
+P98 master gate passed with the auto collector pillar included.
+Advanced commands include `failure-doctor handoff`,
+`failure-doctor agent-bootstrap`, `failure-doctor propose-patch`, and
+`failure-doctor batch`.
+**Core commands:** `collect` / `diagnose` / `plan` / `verify` / `run` /
+`watch` / `sanitize` / `adapt` / `handoff` / `agent-bootstrap` /
+`propose-patch` / `batch`
+**Classic lifecycle:** `diagnose` / `plan` / `verify` / `run` /
+`sanitize` / `adapt` -> `diagnose -> plan -> AI handoff / patch proposal
+-> verify -> sanitize/share`
+**P98 gate:** `knowledge base -> coverage matrix ->
+trace/cross-framework/training/composite/handoff/batch/sanitize/auto-collector
+-> master gate`
+## Distribution & Feedback
+v3.2.0 is the current stable technical baseline. The next phase is distribution
+and real user feedback, not more synthetic feature expansion.
+- PyPI release runbook: [docs/PYPI_RELEASE.md](docs/PYPI_RELEASE.md)
+- 2-minute demo script: [docs/DEMO_VIDEO_SCRIPT.md](docs/DEMO_VIDEO_SCRIPT.md)
+- Technical article draft: [docs/TECH_ARTICLE_DRAFT.md](docs/TECH_ARTICLE_DRAFT.md)
+- Real user feedback loop: [docs/REAL_USER_FEEDBACK_LOOP.md](docs/REAL_USER_FEEDBACK_LOOP.md)
+After PyPI publication, the target install command is:
+```powershell
+pip install agent-failure-doctor
+```
+For non-technical Windows users, double-click
+`scripts/windows/Start-FailureDoctor-Diagnosis.bat` or drag a failed project
+folder onto it.
+Advanced v3.2 commands include `failure-doctor collect` and `failure-doctor watch`.
+Agent frontend invocation:
+```powershell
+failure-doctor agent-bootstrap --target all --project .
+```
+This writes `.failure-doctor/AGENT_ENTRYPOINT.md` plus Codex, Cursor,
+Claude Code, VS Code/Copilot, Antigravity, OpenCode, Qoder, Trae, WorkBuddy,
+OpenClaw, Hermes, and generic agent workflow instructions.
+Agent Failure Doctor uses a deterministic evidence-based diagnostic engine.
+It does not claim to solve arbitrary failures, but it provides explainable
+classification, evidence, fix plans, and before/after verification for known
+automation failure patterns.
+Applied scenario demos are local-only mock workflows for commerce automation,
+live monitoring, content publishing, GUI data bridge, and ERP sync failure
+diagnosis.
+Spiderbuf-inspired challenge demos are local-only mock failure packs inspired
+by public crawler-training challenge categories; they validate diagnosis and
+safe next actions without accessing spiderbuf.cn or publishing private solution
+logic.
+**Integration commands:** `failure-doctor collect-playwright` / `failure-doctor pack-logs` / `failure-doctor adapt`
+## What You Get
+```text
+report/
+|-- diagnosis.json
+|-- diagnosis.md
+|-- evidence.json
+|-- input_summary.json
+|-- issue_draft.md
+|-- repair_suggestions.md
+|-- codex_fix_prompt.md
+`-- failure_doctor_report.zip
+```
+Agent Failure Doctor turns sanitized automation failure materials into a report
+that explains what likely failed, what evidence supports the diagnosis, what
+evidence is missing, and what to ask Codex or another coding assistant to
+change next.
+## One-Minute Start
+Auto Capture:
+```powershell
+failure-doctor run -- python crawler.py
+failure-doctor run -- pytest tests/test_listing.py
+failure-doctor run -- playwright test
+```
+This writes a local run folder under `.failure-doctor/runs/<run_id>/`:
+```text
+.failure-doctor/runs/<run_id>/
+|-- command.txt
+|-- exit_code.txt
+|-- stdout.log
+|-- stderr.log
+|-- environment.json
+|-- detected_artifacts.json
+|-- input_summary.json
+|-- diagnosis/
+|-- fix_plan/
+|-- verification_hint.md
+`-- shareable_failure_pack.zip
+```
+The generated `safe_to_share.json` defaults to `safe_to_share=false`; review and sanitize before sending a pack to anyone else.
+Sanitize & Share Pack:
+Sanitize a failed run before sharing it:
+```powershell
+failure-doctor sanitize .\.failure-doctor\runs\<run_id> --out .\shareable_failure_pack
+```
+This writes redacted logs, redacted network summaries, trace metadata only, a
+redaction report, a review gate, and `shareable_failure_pack.zip`.
+Raw `trace.zip` archives are not copied into the sanitized pack.
+Put a failed run in a folder:
+```text
+my_failed_run/
+|-- error.log
+|-- console.txt
+|-- network.json
+|-- README.txt
+`-- screenshot.png
+```
+Then run:
+```powershell
+failure-doctor diagnose .\my_failed_run --out .\report
+```
+The tool inventories inputs and uses this evidence priority:
+```text
+trace.zip > log > network.json > user description > screenshot metadata
+```
+When evidence is too thin, it should downgrade to `insufficient_evidence` instead of guessing.
+## Minimal Demos
+Proxy/network failure:
+```powershell
+failure-doctor diagnose .\examples\failed_runs\proxy_failed --out .\report_proxy
+```
+Strict mode locator conflict:
+```powershell
+failure-doctor diagnose .\examples\failed_runs\strict_mode_locator --out .\report_locator
+```
+Low-evidence screenshot-only run:
+```powershell
+failure-doctor diagnose .\examples\failed_runs\low_evidence_screenshot_only --out .\report_low_evidence
+```
+Native Playwright trace fixture:
+```powershell
+trace-doctor diagnose .\examples\realistic_playwright_traces\02_login_redirect_302\trace.zip --out .\report_login_trace
+```
+## Before / After Report
+Report structure: conclusion / evidence / why / next action / Codex fix prompt
+Before:
+```text
+page.goto: net::ERR_PROXY_CONNECTION_FAILED while opening https://example.test
+```
+After:
+```text
+Conclusion: network/proxy setup failed before the page loaded.
+Evidence: Playwright reported net::ERR_PROXY_CONNECTION_FAILED.
+Next action: check proxy settings, DNS, VPN, and CI network configuration.
+Codex fix prompt: add trace/log capture and make proxy configuration explicit.
+```
+## Verify a Fix
+```powershell
+failure-doctor diagnose .\failed_run --out .\report
+failure-doctor plan .\report --out .\fix_plan
+failure-doctor verify --before .\failed_run --after .\rerun_after_fix --out .\verification_report
+```
+`verify` compares before/after evidence and reports whether the original failure
+is resolved, unchanged, changed into another failure, or insufficiently
+evidenced.
+## AI Handoff & Patch Proposal
+Turn a report into task packs that Codex, Claude Code, or Cursor can execute:
+```powershell
+failure-doctor handoff .\report --target codex --out .\ai_handoff
+failure-doctor handoff .\report --target claude_code --out .\ai_handoff
+failure-doctor handoff .\report --target cursor --out .\ai_handoff
+```
+This writes:
+```text
+ai_handoff/
+|-- ai_handoff.json
+|-- ai_handoff.md
+|-- codex_task.md
+|-- claude_code_task.md
+|-- cursor_task.md
+|-- affected_files.json
+|-- validation_commands.md
+|-- forbidden_actions.md
+|-- token_budget_report.json
+`-- ai_handoff_pack.zip
+```
+Generate a dry-run patch proposal without modifying source code:
+```powershell
+failure-doctor propose-patch --repo . --report .\report --out .\patch_plan
+```
+This writes:
+```text
+patch_plan/
+|-- patch_proposal.md
+|-- proposed_changes.json
+|-- affected_files.json
+|-- validation_commands.md
+`-- patch_risk_assessment.json
+```
+`propose-patch` is intentionally proposal-only. It does not edit files, apply patches, run tests, or open pull requests.
+v2.5 validation writes `validation/ai_handoff_validation.json`:
+```text
+20/20 Codex task files generated
+20/20 Claude Code task files generated
+20/20 Cursor task files generated
+18/20 patch proposals generated
+20/20 required sections present
+20/20 concise token budget checks pass
+0 forbidden outputs
+```
+## Batch Diagnosis / Fleet Mode
+Diagnose many failed runs and get a fleet-level summary:
+```powershell
+failure-doctor batch .\runs --out .\batch_report
+```
+Input:
+```text
+runs/
+|-- run_001/
+|-- run_002/
+|-- run_003/
+`-- ...
+```
+Output:
+```text
+batch_report/
+|-- summary.json
+|-- summary.md
+|-- failures_by_type.csv
+|-- top_root_causes.md
+|-- repeated_failures.md
+|-- suggested_regression_cases.md
+|-- repair_priority.md
+`-- reports/
+```
+Fleet mode answers which failures repeat, which root causes dominate, which runs
+should become regression cases, and which fixes deserve priority.
+## P98 Controlled Maturity
+v3.0 starts the P98 controlled maturity track. This is not an ecosystem score;
+it does not count stars, external PRs, external issues, PyPI downloads, or
+long-term community adoption.
+Current P98 assets:
+- [docs/P98_CONTROLLED_MATURITY_SCORECARD.md](docs/P98_CONTROLLED_MATURITY_SCORECARD.md)
+- [knowledge_base/](knowledge_base/)
+- [docs/CRAWLER_FAILURE_COVERAGE_MATRIX.md](docs/CRAWLER_FAILURE_COVERAGE_MATRIX.md)
+- [validation/crawler_failure_coverage_matrix.json](validation/crawler_failure_coverage_matrix.json)
+Knowledge-base commands:
+```powershell
+python -m tools.knowledge_base.validate_patterns
+python -m tools.knowledge_base.search_patterns --query selector_drift
+python -m tools.validation.run_crawler_failure_coverage_matrix
+```
+## Applied Scenario Demos
+Local-only mock demos show how Agent Failure Doctor can diagnose failures in:
+- hot product collection
+- live commerce monitoring
+- ecommerce listing automation
+- authorized content publishing workflow
+- GUI / RPA data bridge
+- ERP-to-ecommerce sync
+Run:
+```powershell
+python -m tools.validation.run_applied_scenario_validation
+```
+## Spiderbuf-Inspired Challenge Demos
+`examples/spiderbuf_inspired_challenges/` contains local-only mock failure packs inspired by public crawler-training challenge categories:
+- cookie/session required
+- iframe extraction
+- Ajax dynamic loading
+- random CSS selector drift
+- infinite scroll missing items
+- rate limit 429
+- API signature required
+- browser fingerprint risk
+- Selenium detection risk
+- challenge page detected
+These cases are diagnosis-only. They do not access spiderbuf.cn, do not include
+private solutions, and do not include access-control defeat steps.
+```powershell
+python -m tools.validation.run_spiderbuf_inspired_validation
+```
+## Integrations
+Collect Playwright test-results into a failure pack:
+```powershell
+failure-doctor collect-playwright .\examples\mock_playwright_test_results --out .\tmp_failure_pack
+failure-doctor diagnose .\tmp_failure_pack --out .\tmp_collected_report
+```
+Normalize a loose log folder:
+```powershell
+failure-doctor pack-logs .\examples\mock_raw_logs --out .\tmp_log_pack
+failure-doctor diagnose .\tmp_log_pack --out .\tmp_log_report
+```
+Normalize a Selenium, Puppeteer, Cypress, Scrapy, requests, or httpx failure log:
+```powershell
+failure-doctor adapt .\examples\cross_framework_fixtures\selenium\no_such_element\raw --framework selenium --out .\tmp_selenium_pack
+failure-doctor diagnose .\tmp_selenium_pack --out .\tmp_selenium_report
+failure-doctor plan .\tmp_selenium_report --out .\tmp_selenium_fix_plan
+```
+Supported adapter frameworks:
+```text
+selenium | puppeteer | cypress | scrapy | requests | httpx | auto
+```
+Playwright remains the deepest native trace backend. Cross-framework adapters
+normalize local logs and metadata into the same failure lifecycle; they do not
+run those frameworks or connect to external platforms.
+See [docs/INTEGRATIONS.md](docs/INTEGRATIONS.md) and [docs/GITHUB_ACTION_USAGE.md](docs/GITHUB_ACTION_USAGE.md).
+## Validation Status
+Current milestone: Agent Failure Doctor v3.2 Auto Collector P98 Gate.
+Previous stable line: Agent Failure Doctor v2.4.1 P95 Alignment & Missing Tracks Pack.
+- 131 source-ledger records with separated `real_public_issue`, `official_doc_pattern`, and `public_inspired_sanitized` labels
+- 50 traceable real public issue records
+- 100 Playwright Trace Doctor P95 fixtures
+- 100/100 Playwright trace reasonable classifications
+- 100/100 Playwright trace exact subtype matches
+- 62 external public reference seeds
+- 20 external public reference held-out records
+- 20/20 external public reference reasonable classifications
+- 20/20 external public reference actionable next actions
+- 12 resolution validation cases
+- 12/12 resolution statuses correct
+- 18 applied scenario validation cases
+- 18/18 applied scenario reasonable classifications
+- 18/18 applied scenario valid fix plans
+- 18/18 applied scenario verification statuses correct
+- Playwright collector, generic log packer, browser-use adapter, and GitHub Actions usage docs
+- v2.0 Auto Capture command wrapper: `failure-doctor run -- <command>`
+- Sanitize & Share command: `failure-doctor sanitize <failed_run> --out <shareable_failure_pack>`
+- Cross-framework adapter command: `failure-doctor adapt <input> --framework <framework> --out <failure_pack>`
+- 100 cross-framework P95 fixtures across Selenium, Puppeteer, Cypress, Scrapy, requests, httpx, browser-use, and generic RPA
+- 100/100 cross-framework P95 reasonable classifications
+- 100/100 cross-framework P95 valid fix plans
+- 0 forbidden outputs in cross-framework P95 validation
+- 40 training challenge P95 local-only validation cases
+- 40/40 training challenge reasonable classifications
+- 40/40 training challenge valid fix plans
+- 40/40 training challenge verification statuses correct
+- 0 forbidden outputs and 0 private solution leaks in training challenge validation
+- 160 composite P95 strict local-only validation cases
+- 160/160 composite primary classifications correct
+- 160/160 composite repair-order checks correct
+- 160/160 composite evidence graphs generated
+- 0 forbidden outputs in composite P95 strict validation
+- P95 Core Triad Gate: pass
+- 3 composite showcase reports under `sample_reports/composite_showcase/`
+- 10 external held-out public-source records
+- 9/10 external held-out reasonable classifications
+- 10/10 external held-out actionable next actions
+- 0 forbidden outputs in generated reports/prompts
+- GitHub Actions green across Ubuntu, macOS, Windows, plus Windows benchmark/smoke/safety
+See [docs/VALIDATION_REPORT.md](docs/VALIDATION_REPORT.md),
+[docs/EXTERNAL_DATA_SOURCES.md](docs/EXTERNAL_DATA_SOURCES.md), and
+[validation/dashboard.md](validation/dashboard.md) for validation metrics,
+limits, and boundaries.
+## Reproduce Validation
+```powershell
+python -m tools.real_trace_generation.generate_real_trace_fixtures `
+  --out .\examples\realistic_playwright_traces `
+  --count 30 `
+  --clean
+python -m tools.validation.run_real_trace_validation
+python -m tools.validation.run_playwright_trace_p95_validation
+python -m tools.validation.run_external_public_reference_validation
+python -m tools.validation.run_resolution_validation
+python -m tools.validation.run_spiderbuf_inspired_validation
+python -m tools.validation.run_training_challenge_validation
+python -m tools.validation.run_cross_framework_p95_validation
+python -m tools.validation.run_composite_diagnosis_p95_strict_validation
+python -m tools.validation.run_p95_core_triage_gate
+python scripts\validate_external_heldout.py
+```
+## Safety Boundary
+This project is for local, sanitized failure diagnosis.
+It is not:
+- a challenge-solving tool
+- an access-control circumvention tool
+- a credential extractor
+- a real-platform scraper
+- a tool for unauthorized collection
+For suspected platform risk cases, the intended output is identification,
+routing, and compliance-oriented next steps such as reducing request volume,
+using an official API, confirming authorization, contacting the platform, or
+stopping unauthorized collection.
+## Contributing Failure Cases
+You do not need to write code. The most useful contribution is a sanitized
+failure case: log snippets, trace metadata, network summaries, screenshot
+metadata, and a short description of what happened.
+Open an [External failure case issue](.github/ISSUE_TEMPLATE/external_failure_case.yml) and remove secrets before posting:
+- passwords
+- API keys
+- cookies
+- tokens
+- authorization headers
+- private screenshots
+- private data
+- personal data
+Accepted input types include sanitized `error.log`, `trace.zip`, `console.txt`,
+`network.json`, screenshot metadata, and `user_description.txt`.
+If you allow it, a sanitized case may be assigned an `EXT-YYYY-NNNN` id, run
+once with the current released version before rule changes, and added to the
+external validation dashboard.
+Templates and author-generated examples are not counted as external cases.
+See [CONTRIBUTING.md](CONTRIBUTING.md),
+[docs/external_validation_protocol.md](docs/external_validation_protocol.md),
+[docs/REAL_TRACE_CONTRIBUTION_GUIDE.md](docs/REAL_TRACE_CONTRIBUTION_GUIDE.md),
+and [docs/REAL_DATA_SOURCES.md](docs/REAL_DATA_SOURCES.md).
+## Commands
+Run all tests:
+```powershell
+python -m unittest discover -s tests -p "test_*.py"
+```
+Run smoke and safety checks:
+```powershell
+scripts\smoke_test.ps1
+scripts\local_safety_scan.ps1
+```