@rubix0270/arboris 1.0.2 → 1.0.3
- package/package.json +8 -20
- package/run.mjs +10 -0
- package/dist/cli.mjs +0 -383
- package/manifest.json +0 -323
- package/prisma/skills/accessibility/SKILL.md +0 -147
- package/prisma/skills/agent-architecture-audit/SKILL.md +0 -257
- package/prisma/skills/agent-eval/SKILL.md +0 -146
- package/prisma/skills/agent-harness-construction/SKILL.md +0 -74
- package/prisma/skills/agent-introspection-debugging/SKILL.md +0 -154
- package/prisma/skills/agent-payment-x402/SKILL.md +0 -225
- package/prisma/skills/agent-self-evaluation/SKILL.md +0 -182
- package/prisma/skills/agent-self-evaluation/examples/high-score-example.md +0 -87
- package/prisma/skills/agent-self-evaluation/examples/low-score-example.md +0 -86
- package/prisma/skills/agent-self-evaluation/references/evaluation-criteria.md +0 -71
- package/prisma/skills/agent-self-evaluation/references/hook-integration.md +0 -64
- package/prisma/skills/agent-self-evaluation/scripts/evaluate.py +0 -408
- package/prisma/skills/agent-self-evaluation/templates/evaluation-report.md +0 -86
- package/prisma/skills/agent-sort/SKILL.md +0 -216
- package/prisma/skills/agentic-engineering/SKILL.md +0 -64
- package/prisma/skills/agentic-os/SKILL.md +0 -388
- package/prisma/skills/ai-first-engineering/SKILL.md +0 -52
- package/prisma/skills/ai-regression-testing/SKILL.md +0 -386
- package/prisma/skills/android-clean-architecture/SKILL.md +0 -340
- package/prisma/skills/angular-developer/SKILL.md +0 -155
- package/prisma/skills/angular-developer/references/angular-animations.md +0 -160
- package/prisma/skills/angular-developer/references/angular-aria.md +0 -410
- package/prisma/skills/angular-developer/references/cli.md +0 -86
- package/prisma/skills/angular-developer/references/component-harnesses.md +0 -59
- package/prisma/skills/angular-developer/references/component-styling.md +0 -91
- package/prisma/skills/angular-developer/references/components.md +0 -117
- package/prisma/skills/angular-developer/references/creating-services.md +0 -97
- package/prisma/skills/angular-developer/references/data-resolvers.md +0 -69
- package/prisma/skills/angular-developer/references/define-routes.md +0 -67
- package/prisma/skills/angular-developer/references/defining-providers.md +0 -72
- package/prisma/skills/angular-developer/references/di-fundamentals.md +0 -120
- package/prisma/skills/angular-developer/references/e2e-testing.md +0 -56
- package/prisma/skills/angular-developer/references/effects.md +0 -83
- package/prisma/skills/angular-developer/references/hierarchical-injectors.md +0 -43
- package/prisma/skills/angular-developer/references/host-elements.md +0 -80
- package/prisma/skills/angular-developer/references/injection-context.md +0 -63
- package/prisma/skills/angular-developer/references/inputs.md +0 -101
- package/prisma/skills/angular-developer/references/linked-signal.md +0 -59
- package/prisma/skills/angular-developer/references/loading-strategies.md +0 -61
- package/prisma/skills/angular-developer/references/mcp.md +0 -108
- package/prisma/skills/angular-developer/references/navigate-to-routes.md +0 -69
- package/prisma/skills/angular-developer/references/outputs.md +0 -86
- package/prisma/skills/angular-developer/references/reactive-forms.md +0 -122
- package/prisma/skills/angular-developer/references/rendering-strategies.md +0 -44
- package/prisma/skills/angular-developer/references/resource.md +0 -77
- package/prisma/skills/angular-developer/references/route-animations.md +0 -56
- package/prisma/skills/angular-developer/references/route-guards.md +0 -52
- package/prisma/skills/angular-developer/references/router-lifecycle.md +0 -45
- package/prisma/skills/angular-developer/references/router-testing.md +0 -87
- package/prisma/skills/angular-developer/references/show-routes-with-outlets.md +0 -68
- package/prisma/skills/angular-developer/references/signal-forms.md +0 -795
- package/prisma/skills/angular-developer/references/signals-overview.md +0 -94
- package/prisma/skills/angular-developer/references/tailwind-css.md +0 -69
- package/prisma/skills/angular-developer/references/template-driven-forms.md +0 -114
- package/prisma/skills/angular-developer/references/testing-fundamentals.md +0 -65
- package/prisma/skills/api-connector-builder/SKILL.md +0 -121
- package/prisma/skills/api-design/SKILL.md +0 -524
- package/prisma/skills/architecture-decision-records/SKILL.md +0 -180
- package/prisma/skills/article-writing/SKILL.md +0 -80
- package/prisma/skills/automation-audit-ops/SKILL.md +0 -143
- package/prisma/skills/autonomous-agent-harness/SKILL.md +0 -274
- package/prisma/skills/autonomous-loops/SKILL.md +0 -611
- package/prisma/skills/backend-patterns/SKILL.md +0 -562
- package/prisma/skills/benchmark/SKILL.md +0 -94
- package/prisma/skills/benchmark-methodology/SKILL.md +0 -190
- package/prisma/skills/benchmark-optimization-loop/SKILL.md +0 -70
- package/prisma/skills/blender-motion-state-inspection/SKILL.md +0 -165
- package/prisma/skills/blueprint/SKILL.md +0 -106
- package/prisma/skills/brand-discovery/SKILL.md +0 -145
- package/prisma/skills/brand-discovery/references/10_purpose-why.md +0 -40
- package/prisma/skills/brand-discovery/references/20_positioning.md +0 -44
- package/prisma/skills/brand-discovery/references/30_audience-niche.md +0 -52
- package/prisma/skills/brand-discovery/references/40_personality-archetype.md +0 -57
- package/prisma/skills/brand-discovery/references/50_voice-tone.md +0 -59
- package/prisma/skills/brand-discovery/references/60_narrative-story.md +0 -50
- package/prisma/skills/brand-discovery/references/70_founder-tension.md +0 -49
- package/prisma/skills/brand-discovery/references/90_SYNTHESIS.md +0 -133
- package/prisma/skills/brand-voice/SKILL.md +0 -98
- package/prisma/skills/brand-voice/references/voice-profile-schema.md +0 -55
- package/prisma/skills/browser-qa/SKILL.md +0 -105
- package/prisma/skills/bun-runtime/SKILL.md +0 -85
- package/prisma/skills/canary-watch/SKILL.md +0 -108
- package/prisma/skills/carrier-relationship-management/SKILL.md +0 -212
- package/prisma/skills/cisco-ios-patterns/SKILL.md +0 -164
- package/prisma/skills/ck/SKILL.md +0 -148
- package/prisma/skills/ck/commands/forget.mjs +0 -44
- package/prisma/skills/ck/commands/info.mjs +0 -24
- package/prisma/skills/ck/commands/init.mjs +0 -143
- package/prisma/skills/ck/commands/list.mjs +0 -40
- package/prisma/skills/ck/commands/migrate.mjs +0 -202
- package/prisma/skills/ck/commands/resume.mjs +0 -36
- package/prisma/skills/ck/commands/save.mjs +0 -210
- package/prisma/skills/ck/commands/shared.mjs +0 -387
- package/prisma/skills/ck/hooks/session-start.mjs +0 -224
- package/prisma/skills/claude-devfleet/SKILL.md +0 -112
- package/prisma/skills/click-path-audit/SKILL.md +0 -245
- package/prisma/skills/clickhouse-io/SKILL.md +0 -440
- package/prisma/skills/code-tour/SKILL.md +0 -254
- package/prisma/skills/codebase-onboarding/SKILL.md +0 -234
- package/prisma/skills/codehealth-mcp/SKILL.md +0 -167
- package/prisma/skills/coding-standards/SKILL.md +0 -551
- package/prisma/skills/competitive-platform-analysis/SKILL.md +0 -214
- package/prisma/skills/competitive-report-structure/SKILL.md +0 -162
- package/prisma/skills/compose-multiplatform-patterns/SKILL.md +0 -300
- package/prisma/skills/config-gc/SKILL.md +0 -120
- package/prisma/skills/configure-ecc/SKILL.md +0 -385
- package/prisma/skills/connections-optimizer/SKILL.md +0 -190
- package/prisma/skills/content-engine/SKILL.md +0 -132
- package/prisma/skills/content-hash-cache-pattern/SKILL.md +0 -162
- package/prisma/skills/context-budget/SKILL.md +0 -136
- package/prisma/skills/continuous-agent-loop/SKILL.md +0 -46
- package/prisma/skills/continuous-learning/SKILL.md +0 -132
- package/prisma/skills/continuous-learning/config.json +0 -18
- package/prisma/skills/continuous-learning/evaluate-session.sh +0 -69
- package/prisma/skills/continuous-learning-v2/SKILL.md +0 -361
- package/prisma/skills/continuous-learning-v2/agents/observer-loop.sh +0 -359
- package/prisma/skills/continuous-learning-v2/agents/observer.md +0 -189
- package/prisma/skills/continuous-learning-v2/agents/session-guardian.sh +0 -150
- package/prisma/skills/continuous-learning-v2/agents/start-observer.sh +0 -248
- package/prisma/skills/continuous-learning-v2/config.json +0 -8
- package/prisma/skills/continuous-learning-v2/hooks/observe.sh +0 -585
- package/prisma/skills/continuous-learning-v2/scripts/detect-project.sh +0 -322
- package/prisma/skills/continuous-learning-v2/scripts/instinct-cli.py +0 -1956
- package/prisma/skills/continuous-learning-v2/scripts/lib/homunculus-dir.sh +0 -31
- package/prisma/skills/continuous-learning-v2/scripts/migrate-homunculus.sh +0 -68
- package/prisma/skills/continuous-learning-v2/scripts/test_parse_instinct.py +0 -1421
- package/prisma/skills/cost-aware-llm-pipeline/SKILL.md +0 -184
- package/prisma/skills/cost-tracking/SKILL.md +0 -97
- package/prisma/skills/council/SKILL.md +0 -204
- package/prisma/skills/cpp-coding-standards/SKILL.md +0 -724
- package/prisma/skills/cpp-testing/SKILL.md +0 -325
- package/prisma/skills/crosspost/SKILL.md +0 -112
- package/prisma/skills/csharp-testing/SKILL.md +0 -322
- package/prisma/skills/customer-billing-ops/SKILL.md +0 -141
- package/prisma/skills/customs-trade-compliance/SKILL.md +0 -263
- package/prisma/skills/dart-flutter-patterns/SKILL.md +0 -564
- package/prisma/skills/dashboard-builder/SKILL.md +0 -109
- package/prisma/skills/data-scraper-agent/SKILL.md +0 -765
- package/prisma/skills/data-throughput-accelerator/SKILL.md +0 -73
- package/prisma/skills/database-migrations/SKILL.md +0 -430
- package/prisma/skills/deep-research/SKILL.md +0 -160
- package/prisma/skills/defi-amm-security/SKILL.md +0 -167
- package/prisma/skills/delivery-gate/SKILL.md +0 -126
- package/prisma/skills/delivery-gate/hooks/quality-gate.py +0 -220
- package/prisma/skills/deployment-patterns/SKILL.md +0 -428
- package/prisma/skills/design-system/SKILL.md +0 -83
- package/prisma/skills/django-celery/SKILL.md +0 -458
- package/prisma/skills/django-patterns/SKILL.md +0 -735
- package/prisma/skills/django-security/SKILL.md +0 -644
- package/prisma/skills/django-tdd/SKILL.md +0 -730
- package/prisma/skills/django-verification/SKILL.md +0 -470
- package/prisma/skills/dmux-workflows/SKILL.md +0 -192
- package/prisma/skills/docker-patterns/SKILL.md +0 -365
- package/prisma/skills/documentation-lookup/SKILL.md +0 -91
- package/prisma/skills/dotnet-patterns/SKILL.md +0 -322
- package/prisma/skills/dynamic-workflow-mode/SKILL.md +0 -124
- package/prisma/skills/e2e-testing/SKILL.md +0 -327
- package/prisma/skills/ecc-guide/SKILL.md +0 -190
- package/prisma/skills/ecc-recipes/SKILL.md +0 -149
- package/prisma/skills/ecc-tools-cost-audit/SKILL.md +0 -161
- package/prisma/skills/email-ops/SKILL.md +0 -122
- package/prisma/skills/energy-procurement/SKILL.md +0 -228
- package/prisma/skills/enterprise-agent-ops/SKILL.md +0 -51
- package/prisma/skills/error-handling/SKILL.md +0 -377
- package/prisma/skills/eval-harness/SKILL.md +0 -271
- package/prisma/skills/evm-token-decimals/SKILL.md +0 -131
- package/prisma/skills/exa-search/SKILL.md +0 -108
- package/prisma/skills/fal-ai-media/SKILL.md +0 -289
- package/prisma/skills/fastapi-patterns/SKILL.md +0 -514
- package/prisma/skills/finance-billing-ops/SKILL.md +0 -128
- package/prisma/skills/flox-environments/SKILL.md +0 -497
- package/prisma/skills/flutter-dart-code-review/SKILL.md +0 -436
- package/prisma/skills/foundation-models-on-device/SKILL.md +0 -243
- package/prisma/skills/frontend-a11y/SKILL.md +0 -446
- package/prisma/skills/frontend-design-direction/SKILL.md +0 -93
- package/prisma/skills/frontend-patterns/SKILL.md +0 -657
- package/prisma/skills/frontend-slides/SKILL.md +0 -185
- package/prisma/skills/frontend-slides/STYLE_PRESETS.md +0 -330
- package/prisma/skills/frontend-slides/animation-patterns.md +0 -122
- package/prisma/skills/frontend-slides/html-template.md +0 -419
- package/prisma/skills/frontend-slides/scripts/export-pdf.sh +0 -418
- package/prisma/skills/frontend-slides/scripts/extract-pptx.py +0 -96
- package/prisma/skills/frontend-slides/viewport-base.css +0 -153
- package/prisma/skills/fsharp-testing/SKILL.md +0 -281
- package/prisma/skills/gan-style-harness/SKILL.md +0 -279
- package/prisma/skills/gateguard/SKILL.md +0 -133
- package/prisma/skills/generating-python-installer/SKILL.md +0 -820
- package/prisma/skills/git-workflow/SKILL.md +0 -716
- package/prisma/skills/github-ops/SKILL.md +0 -145
- package/prisma/skills/golang-patterns/SKILL.md +0 -675
- package/prisma/skills/golang-testing/SKILL.md +0 -721
- package/prisma/skills/google-workspace-ops/SKILL.md +0 -96
- package/prisma/skills/growth-log/SKILL.md +0 -128
- package/prisma/skills/healthcare-cdss-patterns/SKILL.md +0 -246
- package/prisma/skills/healthcare-emr-patterns/SKILL.md +0 -160
- package/prisma/skills/healthcare-eval-harness/SKILL.md +0 -208
- package/prisma/skills/healthcare-phi-compliance/SKILL.md +0 -146
- package/prisma/skills/hermes-imports/SKILL.md +0 -89
- package/prisma/skills/hexagonal-architecture/SKILL.md +0 -277
- package/prisma/skills/hipaa-compliance/SKILL.md +0 -79
- package/prisma/skills/homelab-network-readiness/SKILL.md +0 -170
- package/prisma/skills/homelab-network-setup/SKILL.md +0 -130
- package/prisma/skills/homelab-pihole-dns/SKILL.md +0 -275
- package/prisma/skills/homelab-vlan-segmentation/SKILL.md +0 -312
- package/prisma/skills/homelab-wireguard-vpn/SKILL.md +0 -306
- package/prisma/skills/hookify-rules/SKILL.md +0 -128
- package/prisma/skills/inherit-legacy-style/SKILL.md +0 -157
- package/prisma/skills/intent-driven-development/SKILL.md +0 -360
- package/prisma/skills/inventory-demand-planning/SKILL.md +0 -247
- package/prisma/skills/investor-materials/SKILL.md +0 -97
- package/prisma/skills/investor-outreach/SKILL.md +0 -92
- package/prisma/skills/ios-icon-gen/SKILL.md +0 -158
- package/prisma/skills/ios-icon-gen/scripts/generate_icons.swift +0 -258
- package/prisma/skills/ios-icon-gen/scripts/iconify_gen.sh +0 -235
- package/prisma/skills/iterative-retrieval/SKILL.md +0 -212
- package/prisma/skills/ito-basket-compare/SKILL.md +0 -64
- package/prisma/skills/ito-data-atlas-agent/SKILL.md +0 -64
- package/prisma/skills/ito-market-intelligence/SKILL.md +0 -61
- package/prisma/skills/ito-trade-planner/SKILL.md +0 -68
- package/prisma/skills/java-coding-standards/SKILL.md +0 -384
- package/prisma/skills/jira-integration/SKILL.md +0 -303
- package/prisma/skills/jpa-patterns/SKILL.md +0 -152
- package/prisma/skills/knowledge-ops/SKILL.md +0 -155
- package/prisma/skills/kotlin-coroutines-flows/SKILL.md +0 -285
- package/prisma/skills/kotlin-exposed-patterns/SKILL.md +0 -720
- package/prisma/skills/kotlin-ktor-patterns/SKILL.md +0 -690
- package/prisma/skills/kotlin-patterns/SKILL.md +0 -712
- package/prisma/skills/kotlin-testing/SKILL.md +0 -825
- package/prisma/skills/kubernetes-patterns/SKILL.md +0 -756
- package/prisma/skills/laravel-patterns/SKILL.md +0 -416
- package/prisma/skills/laravel-plugin-discovery/SKILL.md +0 -230
- package/prisma/skills/laravel-security/SKILL.md +0 -948
- package/prisma/skills/laravel-tdd/SKILL.md +0 -675
- package/prisma/skills/laravel-verification/SKILL.md +0 -180
- package/prisma/skills/latency-critical-systems/SKILL.md +0 -74
- package/prisma/skills/lead-intelligence/SKILL.md +0 -322
- package/prisma/skills/lead-intelligence/agents/enrichment-agent.md +0 -85
- package/prisma/skills/lead-intelligence/agents/mutual-mapper.md +0 -75
- package/prisma/skills/lead-intelligence/agents/outreach-drafter.md +0 -98
- package/prisma/skills/lead-intelligence/agents/signal-scorer.md +0 -60
- package/prisma/skills/liquid-glass-design/SKILL.md +0 -279
- package/prisma/skills/llm-trading-agent-security/SKILL.md +0 -147
- package/prisma/skills/logistics-exception-management/SKILL.md +0 -222
- package/prisma/skills/loop-design-check/SKILL.md +0 -143
- package/prisma/skills/mailtrap-email-integration/SKILL.md +0 -77
- package/prisma/skills/make-interfaces-feel-better/SKILL.md +0 -152
- package/prisma/skills/manim-video/SKILL.md +0 -90
- package/prisma/skills/manim-video/assets/network_graph_scene.py +0 -52
- package/prisma/skills/market-research/SKILL.md +0 -76
- package/prisma/skills/marketing-campaign/SKILL.md +0 -114
- package/prisma/skills/mcp-server-patterns/SKILL.md +0 -70
- package/prisma/skills/messages-ops/SKILL.md +0 -105
- package/prisma/skills/ml-adoption-playbook/SKILL.md +0 -57
- package/prisma/skills/mle-workflow/SKILL.md +0 -347
- package/prisma/skills/motion-advanced/SKILL.md +0 -596
- package/prisma/skills/motion-foundations/SKILL.md +0 -299
- package/prisma/skills/motion-patterns/SKILL.md +0 -434
- package/prisma/skills/motion-ui/SKILL.md +0 -576
- package/prisma/skills/mysql-patterns/SKILL.md +0 -413
- package/prisma/skills/nanoclaw-repl/SKILL.md +0 -34
- package/prisma/skills/nestjs-patterns/SKILL.md +0 -231
- package/prisma/skills/netmiko-ssh-automation/SKILL.md +0 -174
- package/prisma/skills/network-bgp-diagnostics/SKILL.md +0 -168
- package/prisma/skills/network-config-validation/SKILL.md +0 -211
- package/prisma/skills/network-interface-health/SKILL.md +0 -153
- package/prisma/skills/nextjs-turbopack/SKILL.md +0 -58
- package/prisma/skills/nodejs-keccak256/SKILL.md +0 -103
- package/prisma/skills/nutrient-document-processing/SKILL.md +0 -168
- package/prisma/skills/nuxt4-patterns/SKILL.md +0 -101
- package/prisma/skills/openclaw-persona-forge/SKILL.md +0 -289
- package/prisma/skills/openclaw-persona-forge/gacha.py +0 -224
- package/prisma/skills/openclaw-persona-forge/gacha.sh +0 -5
- package/prisma/skills/openclaw-persona-forge/references/avatar-style.md +0 -124
- package/prisma/skills/openclaw-persona-forge/references/boundary-rules.md +0 -53
- package/prisma/skills/openclaw-persona-forge/references/error-handling.md +0 -53
- package/prisma/skills/openclaw-persona-forge/references/identity-tension.md +0 -48
- package/prisma/skills/openclaw-persona-forge/references/naming-system.md +0 -39
- package/prisma/skills/openclaw-persona-forge/references/output-template.md +0 -166
- package/prisma/skills/opensource-pipeline/SKILL.md +0 -256
- package/prisma/skills/orch-add-feature/SKILL.md +0 -45
- package/prisma/skills/orch-build-mvp/SKILL.md +0 -49
- package/prisma/skills/orch-change-feature/SKILL.md +0 -43
- package/prisma/skills/orch-fix-defect/SKILL.md +0 -43
- package/prisma/skills/orch-pipeline/SKILL.md +0 -121
- package/prisma/skills/orch-refine-code/SKILL.md +0 -44
- package/prisma/skills/parallel-execution-optimizer/SKILL.md +0 -73
- package/prisma/skills/perl-patterns/SKILL.md +0 -505
- package/prisma/skills/perl-security/SKILL.md +0 -504
- package/prisma/skills/perl-testing/SKILL.md +0 -476
- package/prisma/skills/plan-orchestrate/SKILL.md +0 -263
- package/prisma/skills/plankton-code-quality/SKILL.md +0 -237
- package/prisma/skills/postgres-patterns/SKILL.md +0 -148
- package/prisma/skills/prediction-market-oracle-research/SKILL.md +0 -64
- package/prisma/skills/prediction-market-risk-review/SKILL.md +0 -61
- package/prisma/skills/prisma-patterns/SKILL.md +0 -401
- package/prisma/skills/product-capability/SKILL.md +0 -142
- package/prisma/skills/product-lens/SKILL.md +0 -93
- package/prisma/skills/production-audit/SKILL.md +0 -207
- package/prisma/skills/production-scheduling/SKILL.md +0 -238
- package/prisma/skills/project-flow-ops/SKILL.md +0 -112
- package/prisma/skills/prompt-optimizer/SKILL.md +0 -398
- package/prisma/skills/python-patterns/SKILL.md +0 -751
- package/prisma/skills/python-testing/SKILL.md +0 -817
- package/prisma/skills/pytorch-patterns/SKILL.md +0 -397
- package/prisma/skills/quality-nonconformance/SKILL.md +0 -260
- package/prisma/skills/quarkus-patterns/SKILL.md +0 -723
- package/prisma/skills/quarkus-security/SKILL.md +0 -468
- package/prisma/skills/quarkus-tdd/SKILL.md +0 -812
- package/prisma/skills/quarkus-verification/SKILL.md +0 -480
- package/prisma/skills/ralphinho-rfc-pipeline/SKILL.md +0 -68
- package/prisma/skills/react-native-patterns/SKILL.md +0 -326
- package/prisma/skills/react-patterns/SKILL.md +0 -342
- package/prisma/skills/react-performance/SKILL.md +0 -575
- package/prisma/skills/react-testing/SKILL.md +0 -424
- package/prisma/skills/recsys-pipeline-architect/SKILL.md +0 -115
- package/prisma/skills/recursive-decision-ledger/SKILL.md +0 -80
- package/prisma/skills/redis-patterns/SKILL.md +0 -404
- package/prisma/skills/regex-vs-llm-structured-text/SKILL.md +0 -221
- package/prisma/skills/remotion-video-creation/SKILL.md +0 -43
- package/prisma/skills/remotion-video-creation/rules/3d.md +0 -86
- package/prisma/skills/remotion-video-creation/rules/animations.md +0 -29
- package/prisma/skills/remotion-video-creation/rules/assets/charts-bar-chart.tsx +0 -173
- package/prisma/skills/remotion-video-creation/rules/assets/text-animations-typewriter.tsx +0 -100
- package/prisma/skills/remotion-video-creation/rules/assets/text-animations-word-highlight.tsx +0 -108
- package/prisma/skills/remotion-video-creation/rules/assets.md +0 -78
- package/prisma/skills/remotion-video-creation/rules/audio.md +0 -172
- package/prisma/skills/remotion-video-creation/rules/calculate-metadata.md +0 -104
- package/prisma/skills/remotion-video-creation/rules/can-decode.md +0 -75
- package/prisma/skills/remotion-video-creation/rules/charts.md +0 -58
- package/prisma/skills/remotion-video-creation/rules/compositions.md +0 -146
- package/prisma/skills/remotion-video-creation/rules/display-captions.md +0 -126
- package/prisma/skills/remotion-video-creation/rules/extract-frames.md +0 -229
- package/prisma/skills/remotion-video-creation/rules/fonts.md +0 -152
- package/prisma/skills/remotion-video-creation/rules/get-audio-duration.md +0 -58
- package/prisma/skills/remotion-video-creation/rules/get-video-dimensions.md +0 -68
- package/prisma/skills/remotion-video-creation/rules/get-video-duration.md +0 -58
- package/prisma/skills/remotion-video-creation/rules/gifs.md +0 -138
- package/prisma/skills/remotion-video-creation/rules/images.md +0 -130
- package/prisma/skills/remotion-video-creation/rules/import-srt-captions.md +0 -67
- package/prisma/skills/remotion-video-creation/rules/lottie.md +0 -67
- package/prisma/skills/remotion-video-creation/rules/measuring-dom-nodes.md +0 -34
- package/prisma/skills/remotion-video-creation/rules/measuring-text.md +0 -143
- package/prisma/skills/remotion-video-creation/rules/sequencing.md +0 -106
- package/prisma/skills/remotion-video-creation/rules/tailwind.md +0 -11
- package/prisma/skills/remotion-video-creation/rules/text-animations.md +0 -20
- package/prisma/skills/remotion-video-creation/rules/timing.md +0 -179
- package/prisma/skills/remotion-video-creation/rules/transcribe-captions.md +0 -19
- package/prisma/skills/remotion-video-creation/rules/transitions.md +0 -122
- package/prisma/skills/remotion-video-creation/rules/trimming.md +0 -52
- package/prisma/skills/remotion-video-creation/rules/videos.md +0 -171
- package/prisma/skills/repo-scan/SKILL.md +0 -79
- package/prisma/skills/research-ops/SKILL.md +0 -113
- package/prisma/skills/returns-reverse-logistics/SKILL.md +0 -240
- package/prisma/skills/rules-distill/SKILL.md +0 -265
- package/prisma/skills/rules-distill/scripts/scan-rules.sh +0 -58
- package/prisma/skills/rules-distill/scripts/scan-skills.sh +0 -129
- package/prisma/skills/rust-patterns/SKILL.md +0 -500
- package/prisma/skills/rust-testing/SKILL.md +0 -501
- package/prisma/skills/safety-guard/SKILL.md +0 -76
- package/prisma/skills/santa-method/SKILL.md +0 -307
- package/prisma/skills/scientific-db-pubmed-database/SKILL.md +0 -176
- package/prisma/skills/scientific-db-uspto-database/SKILL.md +0 -178
- package/prisma/skills/scientific-pkg-gget/SKILL.md +0 -167
- package/prisma/skills/scientific-thinking-literature-review/SKILL.md +0 -193
- package/prisma/skills/scientific-thinking-scholar-evaluation/SKILL.md +0 -161
- package/prisma/skills/search-first/SKILL.md +0 -183
- package/prisma/skills/security-bounty-hunter/SKILL.md +0 -100
- package/prisma/skills/security-review/SKILL.md +0 -504
- package/prisma/skills/security-review/cloud-infrastructure-security.md +0 -361
- package/prisma/skills/security-scan/SKILL.md +0 -166
- package/prisma/skills/seo/SKILL.md +0 -155
- package/prisma/skills/skill-comply/SKILL.md +0 -59
- package/prisma/skills/skill-comply/fixtures/compliant_trace.jsonl +0 -5
- package/prisma/skills/skill-comply/fixtures/noncompliant_trace.jsonl +0 -3
- package/prisma/skills/skill-comply/fixtures/tdd_spec.yaml +0 -44
- package/prisma/skills/skill-comply/prompts/classifier.md +0 -24
- package/prisma/skills/skill-comply/prompts/scenario_generator.md +0 -62
- package/prisma/skills/skill-comply/prompts/spec_generator.md +0 -42
- package/prisma/skills/skill-comply/pyproject.toml +0 -15
- package/prisma/skills/skill-comply/scripts/__init__.py +0 -0
- package/prisma/skills/skill-comply/scripts/classifier.py +0 -85
- package/prisma/skills/skill-comply/scripts/grader.py +0 -124
- package/prisma/skills/skill-comply/scripts/parser.py +0 -107
- package/prisma/skills/skill-comply/scripts/report.py +0 -170
- package/prisma/skills/skill-comply/scripts/run.py +0 -127
- package/prisma/skills/skill-comply/scripts/runner.py +0 -194
- package/prisma/skills/skill-comply/scripts/scenario_generator.py +0 -70
- package/prisma/skills/skill-comply/scripts/spec_generator.py +0 -72
- package/prisma/skills/skill-comply/scripts/utils.py +0 -13
- package/prisma/skills/skill-comply/tests/test_grader.py +0 -197
- package/prisma/skills/skill-comply/tests/test_parser.py +0 -90
- package/prisma/skills/skill-comply/tests/test_runner.py +0 -172
- package/prisma/skills/skill-scout/SKILL.md +0 -141
- package/prisma/skills/skill-stocktake/SKILL.md +0 -195
- package/prisma/skills/skill-stocktake/scripts/quick-diff.sh +0 -87
- package/prisma/skills/skill-stocktake/scripts/save-results.sh +0 -56
- package/prisma/skills/skill-stocktake/scripts/scan.sh +0 -170
- package/prisma/skills/social-graph-ranker/SKILL.md +0 -155
- package/prisma/skills/social-publisher/SKILL.md +0 -130
- package/prisma/skills/springboot-patterns/SKILL.md +0 -315
- package/prisma/skills/springboot-security/SKILL.md +0 -273
- package/prisma/skills/springboot-tdd/SKILL.md +0 -159
- package/prisma/skills/springboot-verification/SKILL.md +0 -232
- package/prisma/skills/strategic-compact/SKILL.md +0 -136
- package/prisma/skills/swift-actor-persistence/SKILL.md +0 -144
- package/prisma/skills/swift-concurrency-6-2/SKILL.md +0 -216
- package/prisma/skills/swift-protocol-di-testing/SKILL.md +0 -191
- package/prisma/skills/swiftui-patterns/SKILL.md +0 -259
- package/prisma/skills/taste/SKILL.md +0 -264
- package/prisma/skills/taste/references/genre-taxonomy.md +0 -87
- package/prisma/skills/tdd-workflow/SKILL.md +0 -583
- package/prisma/skills/team-agent-orchestration/SKILL.md +0 -111
- package/prisma/skills/team-builder/SKILL.md +0 -169
- package/prisma/skills/terminal-ops/SKILL.md +0 -110
- package/prisma/skills/tinystruct-patterns/SKILL.md +0 -279
- package/prisma/skills/tinystruct-patterns/references/architecture.md +0 -90
- package/prisma/skills/tinystruct-patterns/references/data-handling.md +0 -60
- package/prisma/skills/tinystruct-patterns/references/database.md +0 -99
- package/prisma/skills/tinystruct-patterns/references/routing.md +0 -64
- package/prisma/skills/tinystruct-patterns/references/system-usage.md +0 -97
- package/prisma/skills/tinystruct-patterns/references/testing.md +0 -72
- package/prisma/skills/token-budget-advisor/SKILL.md +0 -134
- package/prisma/skills/ui-demo/SKILL.md +0 -466
- package/prisma/skills/ui-to-vue/SKILL.md +0 -135
- package/prisma/skills/uncloud/SKILL.md +0 -344
- package/prisma/skills/unified-notifications-ops/SKILL.md +0 -188
- package/prisma/skills/verification-loop/SKILL.md +0 -127
- package/prisma/skills/video-editing/SKILL.md +0 -311
- package/prisma/skills/videodb/SKILL.md +0 -375
- package/prisma/skills/videodb/reference/api-reference.md +0 -550
- package/prisma/skills/videodb/reference/capture-reference.md +0 -407
- package/prisma/skills/videodb/reference/capture.md +0 -101
- package/prisma/skills/videodb/reference/editor.md +0 -443
- package/prisma/skills/videodb/reference/generative.md +0 -331
- package/prisma/skills/videodb/reference/rtstream-reference.md +0 -564
- package/prisma/skills/videodb/reference/rtstream.md +0 -65
- package/prisma/skills/videodb/reference/search.md +0 -230
- package/prisma/skills/videodb/reference/streaming.md +0 -406
- package/prisma/skills/videodb/reference/use-cases.md +0 -118
- package/prisma/skills/videodb/scripts/ws_listener.py +0 -282
- package/prisma/skills/visa-doc-translate/README.md +0 -86
- package/prisma/skills/visa-doc-translate/SKILL.md +0 -117
- package/prisma/skills/vite-patterns/SKILL.md +0 -450
- package/prisma/skills/vue-patterns/SKILL.md +0 -471
- package/prisma/skills/windows-desktop-e2e/SKILL.md +0 -888
- package/prisma/skills/workspace-surface-audit/SKILL.md +0 -126
- package/prisma/skills/x-api/SKILL.md +0 -235
|
@@ -1,64 +0,0 @@
|
|
|
1
|
-
# Hook Integration for Session-Stop Self-Evaluation
|
|
2
|
-
|
|
3
|
-
Add this hook to `hooks/hooks.json` to remind the agent to self-evaluate at the end of every session (the hook echoes a reminder; it does not run the evaluator automatically):
|
|
4
|
-
|
|
5
|
-
```json
|
|
6
|
-
{
|
|
7
|
-
"hooks": {
|
|
8
|
-
"Stop": [
|
|
9
|
-
{
|
|
10
|
-
"hooks": [
|
|
11
|
-
{
|
|
12
|
-
"type": "command",
|
|
13
|
-
"command": "echo '[Self-Eval] Session complete. Consider running agent-self-evaluation to rate your output.'"
|
|
14
|
-
}
|
|
15
|
-
],
|
|
16
|
-
"description": "Remind agent to self-evaluate at session end"
-      }
-    ]
-  }
-}
-```
-
-`Stop` events do not require a `matcher` field (it is optional for `Stop`, `Notification`, `UserPromptSubmit`, and `SubagentStop` per `scripts/ci/validate-hooks.js`). If omitted, the hook object only needs `hooks` and metadata such as `description`.
-
-## Integration with the Python Evaluator
-
-The `scripts/evaluate.py` script can be used as a standalone tool:
-
-```bash
-# Pipe agent output directly
-echo "Your agent response here" | python3 skills/agent-self-evaluation/scripts/evaluate.py
-
-# From files
-python3 skills/agent-self-evaluation/scripts/evaluate.py --task task.txt --output response.txt
-```
-
-To integrate it into hooks, capture the last agent output to a file first, then run the evaluator. For lightweight reminders after shell-based verification, use a simple supported matcher string:
-
-```json
-{
-  "hooks": {
-    "PostToolUse": [
-      {
-        "matcher": "Bash",
-        "hooks": [
-          {
-            "type": "command",
-            "command": "echo '[Self-Eval] If this command completed verification for a non-trivial task, consider running agent-self-evaluation.'"
-          }
-        ],
-        "description": "Remind agent to self-evaluate after shell verification"
-      }
-    ]
-  }
-}
-```
-
-This avoids documenting unsupported command-expression matcher syntax. If your harness supports command-level matcher expressions, prefer a word-boundary regex such as `\b(pytest|npm test|go test)\b` rather than a broad `test` substring.
-
-These hooks are opt-in. Add them to your local `hooks/hooks.json` if you want automated evaluation prompts.
-
-## Manual Usage (Recommended)
-
-The most reliable approach is manual invocation — the agent runs self-evaluation as part of its workflow when the `agent-self-evaluation` skill is active, without requiring hook configuration. The skill's "When to Activate" section already covers trigger conditions (multi-file changes, debugging sessions, design documents).
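The removed hook notes above recommend a word-boundary regex over a broad `test` substring. The following is a minimal sketch of that difference, assuming Python's `re` semantics; a given hook harness may evaluate matchers differently. The helper names `is_test_command` and `naive_match` are hypothetical and do not appear in the package.

```python
import re

# Pattern quoted in the removed hook notes; only the three listed runners match.
TEST_COMMAND = re.compile(r"\b(pytest|npm test|go test)\b")


def is_test_command(command: str) -> bool:
    """Return True when the command invokes one of the listed test runners."""
    return TEST_COMMAND.search(command) is not None


def naive_match(command: str) -> bool:
    """Broad substring check, shown only for contrast."""
    return "test" in command


if __name__ == "__main__":
    for cmd in ["pytest -q", "go test ./...", "cat latest.log", "echo contest"]:
        print(f"{cmd!r}: regex={is_test_command(cmd)} substring={naive_match(cmd)}")
```

The substring check also fires on commands such as `cat latest.log`, which is the kind of false positive the notes warn about.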
@@ -1,408 +0,0 @@
-#!/usr/bin/env python3
-"""Standalone agent output evaluator using the 5-axis rubric.
-
-Reads a task description and agent output from stdin or files,
-scores each axis, and prints a structured evaluation report.
-
-Usage:
-    # Pipe output directly
-    echo "Task: Add retry logic" | evaluate.py --output response.txt
-
-    # From files
-    evaluate.py --task task.txt --output response.txt
-
-    # Interactive (reads task from prompt, output from stdin)
-    evaluate.py --interactive
-
-The evaluator uses keyword heuristics + structural checks as a first pass.
-For production use, pair with an LLM judge for semantic understanding.
-"""
-
-import argparse
-import re
-import sys
-from dataclasses import dataclass, field
-from typing import Optional
-
-# Tunable thresholds for evaluation heuristics
-WALL_OF_TEXT_WORDS = 200
-SUMMARY_CHECK_WORDS = 300
-SUMMARY_CHECK_FIRST_N = 100
-TASK_OUTPUT_RATIO_HIGH = 15
-TASK_OUTPUT_RATIO_MEDIUM = 8
-
-
-@dataclass
-class AxisScore:
-    name: str
-    score: int
-    evidence: list[str] = field(default_factory=list)
-    improvement: Optional[str] = None
-
-
-def count_words(text: str) -> int:
-    return len(text.split())
-
-
-def check_accuracy(text: str) -> AxisScore:
-    """Check for verifiable claims, tool output references, error signs."""
-    evidence = []
-    deductions = 0
-    score = 5
-
-    # Positive signals: verified claims
-    verified_patterns = [
-        (r"(?i)(tests?\s+pass|all\s+tests?\s+passing|\d+\s+passed)", "Tests passing"),
-        (r"(?i)(exit\s+code\s*[:=]?\s*0|exited\s+with\s+0)", "Clean exit code"),
-        (r"(?i)(lint.*clean|no\s+lint\s+errors|0\s+errors)", "Lint clean"),
-        (r"(?i)(verified|confirmed|validated)\s+(with|against|using|by)", "Explicit verification"),
-        (r"(?i)(grep|rg)\s+.*\b(found|matched|returned)", "Grep confirmed"),
-    ]
-    for pattern, label in verified_patterns:
-        if re.search(pattern, text):
-            evidence.append(f"+ {label}")
-
-    # Negative signals: unverified claims
-    danger_patterns = [
-        (r"(?i)(should\s+work|probably\s+fine|should\s+be\s+ok)", "Hedged claim without verification"),
-        (r"(?i)(I\s+think|I\s+believe|I\s+assume|might\s+be)", "Speculation without evidence"),
-        (r"(?i)(untested|not\s+tested|haven'?t\s+tested)", "Explicitly untested"),
-        (r"(?i)(TODO|FIXME|HACK|WORKAROUND)", "Unresolved TODO/FIXME"),
-    ]
-    for pattern, label in danger_patterns:
-        if re.search(pattern, text):
-            deductions += 1
-            evidence.append(f"- {label}")
-
-    if deductions >= 3:
-        score = 2
-    elif deductions == 2:
-        score = 3
-    elif deductions == 1:
-        score = 4
-
-    if not evidence:
-        evidence.append("No verification signals detected — score assumes correctness")
-
-    result = AxisScore(name="Accuracy", score=score, evidence=evidence)
-    if score < 5:
-        result.improvement = "Cite specific tool outputs (test results, exit codes, grep findings) to back claims"
-    return result
-
-
-def check_completeness(text: str) -> AxisScore:
-    """Check for requirement coverage, edge cases, error handling."""
-    evidence = []
-    score = 5
-
-    # Positive signals
-    completeness_signals = [
-        (r"(?i)(edge\s*cases?|corner\s*cases?)", "Edge cases addressed"),
-        (r"(?i)(error\s*handling|exception\s*handling|try/except|try\s*{)", "Error handling present"),
-        (r"(?i)(all\s+\w+\s+(methods|endpoints|routes))", "Full coverage claimed"),
-        (r"(?i)(verification|verified\s+that|confirmed\s+that)", "Verification step present"),
-    ]
-    for pattern, label in completeness_signals:
-        if re.search(pattern, text):
-            evidence.append(f"+ {label}")
-
-    # Gaps
-    gap_signals = [
-        (r"(?i)(not\s+covered|not\s+handled|out\s+of\s+scope)", "Explicit gap acknowledged"),
-        (r"(?i)(only\s+(works|handles|supports)\s+\w+)", "Limited scope noted"),
-        (r"(?i)(assume[sd]?\s+that|assuming\s+the)", "Assumption without verification"),
-    ]
-    deductions = 0
-    for pattern, label in gap_signals:
-        if re.search(pattern, text):
-            deductions += 1
-            evidence.append(f"- {label}")
-
-    if deductions >= 2:
-        score = 3
-    elif deductions == 1:
-        score = 4
-
-    if not evidence:
-        evidence.append("No completeness signals — unable to assess coverage")
-
-    result = AxisScore(name="Completeness", score=score, evidence=evidence)
-    if score < 5:
-        result.improvement = "List what was covered AND what was intentionally excluded, with reasoning"
-    return result
-
-
-def _check_jargon(text: str) -> tuple[int, list[str]]:
-    """Return clarity deductions for unexplained domain jargon."""
-    jargon = [
-        (r"\b(idempotent|race condition|deadlock|thundering herd)\b", "concurrency"),
-        (r"\b(exponential backoff|circuit breaker|bulkhead)\b", "resilience"),
-        (r"\b(ACID|CAP|eventual consistency|linearizability)\b", "database theory"),
-    ]
-    explanation_pattern = r"(?i)({domain}|means|refers to|i\.e\.|in other words)"
-    for pattern, domain in jargon:
-        has_term = re.search(pattern, text, re.IGNORECASE)
-        explains_term = re.search(explanation_pattern.format(domain=domain), text)
-        if has_term and not explains_term:
-            return 1, [f"- Domain term used without explanation ({domain})"]
-    return 0, []
-
-
-def _check_summary(text: str) -> tuple[int, list[str]]:
-    """Return clarity deduction when long output lacks an early summary."""
-    summary_terms = ["summary", "tldr", "overview", "in short"]
-    has_early_summary = any(term in ' '.join(text.split()[:SUMMARY_CHECK_FIRST_N]).lower() for term in summary_terms)
-    if not has_early_summary and count_words(text) > SUMMARY_CHECK_WORDS:
-        return 1, ["- No summary/TLDR in first 100 words (text is 300+ words)"]
-    return 0, []
-
-
-def check_clarity(text: str) -> AxisScore:
-    """Check for structure, readability, jargon handling."""
-    evidence = []
-    deductions = 0
-
-    if re.search(r"^#{1,3}\s+", text, re.MULTILINE):
-        evidence.append("+ Uses headings for structure")
-    if re.search(r"```", text):
-        evidence.append("+ Uses code blocks")
-    if re.search(r"^\s*[-*]\s+", text, re.MULTILINE):
-        evidence.append("+ Uses bullet points")
-
-    for paragraph in [p for p in text.split("\n\n") if p.strip()]:
-        if count_words(paragraph) > WALL_OF_TEXT_WORDS:
-            deductions += 1
-            evidence.append("- Wall-of-text paragraph (>200 words without break)")
-            break
-
-    jargon_deductions, jargon_evidence = _check_jargon(text)
-    summary_deductions, summary_evidence = _check_summary(text)
-    deductions += jargon_deductions + summary_deductions
-    evidence.extend(jargon_evidence + summary_evidence)
-
-    if deductions >= 3:
-        score = 2
-    elif deductions == 2:
-        score = 3
-    elif deductions == 1:
-        score = 4
-    else:
-        score = 5
-
-    if not evidence:
-        evidence.append("+ Well-structured with no clarity issues detected")
-
-    result = AxisScore(name="Clarity", score=score, evidence=evidence)
-    if score < 5:
-        result.improvement = "Add headings, break long paragraphs, define domain terms on first use"
-    return result
-
-
-def check_actionability(text: str) -> AxisScore:
-    """Check if the user can act on the output immediately."""
-    evidence = []
-    score = 5
-    deductions = 0
-
-    # Positive signals
-    actionable_signals = [
-        (r"(?i)(merge|PR|pull request).*?(created|ready|open)", "PR created"),
-        (r"(?i)(run|execute)\s+[`\"']?[\w./-]+", "Specific run command given"),
-        (r"(?i)(next\s+steps?|follow[- ]up|what\s+to\s+do)", "Next steps provided"),
-        (r"(?i)(file\s+(created|written|modified|updated)\s+at)", "File path specified"),
-    ]
-    for pattern, label in actionable_signals:
-        if re.search(pattern, text):
-            evidence.append(f"+ {label}")
-
-    # Negative signals
-    vague_signals = [
-        (r"(?i)(you\s+(should|could|might\s+want\s+to))\s+\w+", "Vague suggestion without specifics"),
-        (r"(?i)(consider|maybe|perhaps)\s+\w+ing", "Non-committal suggestion"),
-        (r"(?i)(figure\s+out|look\s+into|investigate)\s", "Defers work to user"),
-    ]
-    for pattern, label in vague_signals:
-        if re.search(pattern, text):
-            deductions += 1
-            evidence.append(f"- {label}")
-
-    if deductions >= 3:
-        score = 2
-    elif deductions == 2:
-        score = 3
-    elif deductions == 1:
-        score = 4
-
-    if not evidence:
-        evidence.append("No actionability signals — user may need to ask 'what now?'")
-
-    result = AxisScore(name="Actionability", score=score, evidence=evidence)
-    if score < 5:
-        result.improvement = "End with a single clear action: 'Merge this PR', 'Run ./deploy.sh', or 'Review the 3 changed files'"
-    return result
-
-
-def check_conciseness(text: str, task: Optional[str] = None) -> AxisScore:
-    """Check for redundancy, filler, information density."""
-    evidence = []
-    score = 5
-    wc = count_words(text)
-
-    # Heuristic: task-to-output ratio
-    if task:
-        task_wc = count_words(task)
-        ratio = wc / max(task_wc, 1)
-        if ratio > TASK_OUTPUT_RATIO_HIGH:
-            evidence.append(f"- Output is {ratio:.0f}x longer than task description (high ratio)")
-            score = min(score, 3)
-        elif ratio > TASK_OUTPUT_RATIO_MEDIUM:
-            evidence.append(f"- Output is {ratio:.0f}x longer than task description")
-            score = min(score, 4)
-
-    # Redundancy signals
-    redundancy_checks = [
-        (r"(?i)(as\s+(I|we)\s+(mentioned|said|noted|discussed)\s+(earlier|above|before))",
-         "Refers back to earlier statement (possible repetition)"),
-        (r"(?i)(to\s+summarize|in\s+summary|in\s+conclusion|to\s+conclude)",
-         "Has explicit summary (good if needed, flag if redundant)"),
-        (r"(?i)(let\s+me\s+(explain|break\s+this\s+down|walk\s+you\s+through))",
-         "Meta-commentary adds words without information"),
-    ]
-    redundant_count = 0
-    for pattern, label in redundancy_checks:
-        matches = re.findall(pattern, text)
-        if len(matches) > 2:
-            redundant_count += 1
-            evidence.append(f"- '{label}' appears {len(matches)} times")
-
-    if redundant_count >= 2:
-        score = min(score, 3)
-    elif redundant_count == 1:
-        score = min(score, 4)
-
-    if not evidence and score == 5:
-        evidence.append("+ No redundancy detected. Information density appears good.")
-
-    result = AxisScore(name="Conciseness", score=score, evidence=evidence)
-    if score < 5:
-        result.improvement = "Cut meta-commentary, remove repeated points, trim examples to one representative case"
-    return result
-
-
-def evaluate(task: Optional[str], output: str) -> list[AxisScore]:
-    """Run all 5 axis checks and return scored results."""
-    return [
-        check_accuracy(output),
-        check_completeness(output),
-        check_clarity(output),
-        check_actionability(output),
-        check_conciseness(output, task),
-    ]
-
-
-def format_report(scores: list[AxisScore]) -> str:
-    """Format scores into a readable evaluation report."""
-    avg = sum(s.score for s in scores) / len(scores)
-    lines = []
-    lines.append("=" * 60)
-    lines.append("AGENT SELF-EVALUATION REPORT")
-    lines.append("=" * 60)
-    lines.append(f"Summary: Overall score {avg:.1f}/5 across 5 quality axes.")
-    lines.append("")
-
-    for s in scores:
-        bar = "█" * s.score + "░" * (5 - s.score)
-        lines.append(f" {s.name:<15} {bar} {s.score}/5")
-        lines.extend(f" {e}" for e in s.evidence)
-        if s.improvement:
-            lines.append(f" → {s.improvement}")
-        lines.append("")
-
-    lines.append(f" {'OVERALL':<15} {avg:.1f}/5")
-    lines.append("")
-
-    # Critical issues (axes ≤ 2)
-    critical = [(s, s.improvement or "No improvement suggested") for s in scores if s.score <= 2]
-    lines.append("CRITICAL ISSUES (axes ≤ 2):")
-    if critical:
-        for s, imp in critical:
-            lines.append(f" [{s.name}] Score {s.score}/5 — {imp}")
-    else:
-        lines.append(" None")
-
-    lines.append("")
-    lines.append("Self-check: Would the user agree with this assessment? [Yes/No + brief justification]")
-    lines.append("")
-
-    # Top improvements (axes scoring < 4, ranked by impact)
-    improvements = [(s, s.improvement) for s in scores if s.improvement and s.score < 4]
-    lines.append("TOP IMPROVEMENTS:")
-    if improvements:
-        for i, (s, imp) in enumerate(sorted(improvements, key=lambda x: x[0].score), 1):
-            lines.append(f" {i}. [{s.name}] {imp}")
-    else:
-        lines.append(" No axes below 4. Strong output across all dimensions.")
-
-    lines.append("")
-
-    # Verdict
-    min_score = min(s.score for s in scores)
-    if min_score <= 2:
-        verdict = f"Redo with specific fixes. Weakest axis: {min(scores, key=lambda s: s.score).name} ({min_score}/5)."
-    elif any(s.score <= 3 for s in scores):
-        weak = [s.name for s in scores if s.score <= 3]
-        verdict = f"Fix {'/'.join(weak)} issues, then deliver."
-    elif avg >= 4.5:
-        verdict = "Deliver as-is. No changes needed."
-    else:
-        verdict = "Deliver as-is. Minor improvements noted above."
-    lines.append(f"VERDICT: {verdict}")
-
-    return "\n".join(lines)
-
-
-def _read_file_or_text(path: Optional[str], *, required: bool = False) -> Optional[str]:
-    """Read a file path or return inline text when allowed."""
-    if path is None:
-        return None
-    try:
-        with open(path) as f:
-            return f.read()
-    except FileNotFoundError:
-        if required:
-            print(f"Error: output file '{path}' not found", file=sys.stderr)
-            sys.exit(1)
-        return path
-
-
-def _read_input(args: argparse.Namespace) -> tuple[Optional[str], str]:
-    """Read task and output for interactive, file, or pipe mode."""
-    if args.interactive:
-        task = input("Task description: ").strip()
-        print("Paste agent output (Ctrl+D to finish):")
-        return task, sys.stdin.read()
-    if args.output:
-        return _read_file_or_text(args.task), _read_file_or_text(args.output, required=True) or ""
-    return _read_file_or_text(args.task), sys.stdin.read()
-
-
-def main() -> None:
-    parser = argparse.ArgumentParser(
-        description="Evaluate agent output against the 5-axis rubric"
-    )
-    parser.add_argument("--task", help="Task description (file path or inline text)")
-    parser.add_argument("--output", help="Agent output to evaluate (file path)")
-    parser.add_argument("--interactive", action="store_true", help="Prompt for task and read output from stdin")
-    args = parser.parse_args()
-
-    task, output = _read_input(args)
-    if not output:
-        print("Error: no output to evaluate", file=sys.stderr)
-        sys.exit(1)
-
-    scores = evaluate(task, output)
-    print(format_report(scores))
-
-
-if __name__ == "__main__":
-    main()
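In the removed `evaluate.py`, the accuracy, clarity, and actionability checks all map deduction counts to scores the same way (0 gives 5, 1 gives 4, 2 gives 3, 3 or more gives 2), and the completeness check stops at 3. A minimal sketch of that ladder as one helper follows. The name `score_from_deductions` and its `floor` parameter are hypothetical: the removed script inlines this logic in each check rather than sharing a function.

```python
def score_from_deductions(deductions: int, floor: int = 2) -> int:
    """Map a count of negative signals to a 1-5 axis score.

    Each deduction costs one point from a starting score of 5, and the
    result never drops below `floor` (2 for most axes, 3 for completeness).
    """
    if deductions < 0:
        raise ValueError("deductions must be non-negative")
    return max(floor, 5 - deductions)


if __name__ == "__main__":
    for n in range(5):
        print(n, score_from_deductions(n), score_from_deductions(n, floor=3))
```

Centralising the ladder in this way would keep the per-axis floors explicit, though it is only one possible refactoring and not something the package is known to have done.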
@@ -1,86 +0,0 @@
-# Agent Self-Evaluation Report Template
-
-Copy this template and fill in after completing a task. The format matches `scripts/evaluate.py` output.
-
-```
-============================================================
-AGENT SELF-EVALUATION REPORT
-============================================================
-Summary: Overall score X.X/5 across 5 quality axes.
-
-Accuracy █████ 5/5 or ███░░ 3/5
-+ [Evidence: passing tests, verified claims]
-- [Gaps: unverified claims, hedging language]
-→ [Improvement if score < 5]
-
-Completeness █████ 5/5
-+ [What's covered: all requirements + edge cases]
-- [What's missing: explicitly acknowledge gaps]
-→ [Improvement if score < 5]
-
-Clarity █████ 5/5
-+ [Structure: headings, code blocks, bullet points]
-- [Issues: undefined terms, wall of text, no summary]
-→ [Improvement if score < 5]
-
-Actionability █████ 5/5
-+ [User can: merge PR, run command, review file]
-- [Blockers: missing steps, vague suggestions]
-→ [Improvement if score < 5]
-
-Conciseness █████ 5/5
-+ [Tight: no repetition, high information density]
-- [Bloat: filler, meta-commentary, repeated points]
-→ [Improvement if score < 5]
-
-OVERALL X.X/5
-
-CRITICAL ISSUES (axes ≤ 2):
-[Axis] Score N/5 — specific fix needed
-(or "None" if no axis ≤ 2)
-
-Self-check: Would the user agree with this assessment? [Yes/No + brief justification]
-
-TOP IMPROVEMENTS:
-1. [Highest impact fix]
-2. [Second highest]
-(Only list axes scoring < 4, ranked by user impact)
-
-VERDICT: [Deliver as-is / Fix N issues then deliver / Redo from scratch]
-```
-
-## Quick Reference: Scoring Triggers
-
-| If you see this... | Accuracy | Completeness | Clarity | Actionability | Conciseness |
-|---|---|---|---|---|---|
-| "should work" / "probably fine" | ≤4 | — | — | — | — |
-| "I think" / "I believe" | ≤4 | — | — | — | — |
-| No test output cited | ≤4 | — | — | — | — |
-| "TODO" / "FIXME" left behind | ≤3 | ≤3 | — | ≤3 | — |
-| Missing error handling | — | ≤3 | — | — | — |
-| Only happy path covered | — | ≤3 | — | — | — |
-| Wall-of-text paragraph (>200 words) | — | — | ≤3 | — | — |
-| No headings or structure | — | — | ≤3 | — | — |
-| "You should..." without specifics | — | — | — | ≤3 | — |
-| No PR or file created | — | — | — | ≤3 | — |
-| User needs to figure out next step | — | — | — | ≤2 | — |
-| Repeated points (3+ times) | — | — | — | — | ≤3 |
-| "Let me explain..." / "To summarize..." x3+ | — | — | — | — | ≤3 |
-| Output >15x longer than task | — | — | — | — | ≤3 |
-
-## When to Skip
-
-Skip the evaluation if:
-- Task was a single tool call (e.g., "read this file" — nothing to evaluate)
-- User explicitly says "don't evaluate" or "just do it"
-- Task is purely conversational (greeting, small talk)
-- You're mid-workflow and the user will judge the final output, not intermediate steps
-
-## Post-Evaluation Actions
-
-| Overall Score | What to do |
-|---|---|
-| ≥4.5 | Deliver as-is. No changes needed. |
-| 3.5–4.4 | Flag top improvement but deliver. Fix if <30 seconds. |
-| 2.5–3.4 | State what you'd change. Ask user: "Should I redo [axis] or deliver as-is?" |
-| <2.5 | Don't deliver. Say: "This scored [score] because [evidence]. Let me redo this with [specific fix]." Then redo. |
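The Post-Evaluation Actions table in the removed template can be read as a simple threshold lookup. The sketch below is one possible encoding, not code from the package: the function name `post_evaluation_action` and the short action labels are hypothetical, and scores that fall between the table's printed bands (for example 4.45) are assumed to belong to the lower band.

```python
def post_evaluation_action(overall: float) -> str:
    """Map an overall 0-5 score to the action band from the template table."""
    if not 0 <= overall <= 5:
        raise ValueError("overall score must be between 0 and 5")
    if overall >= 4.5:
        return "deliver"            # Deliver as-is. No changes needed.
    if overall >= 3.5:
        return "deliver_with_note"  # Flag top improvement but deliver.
    if overall >= 2.5:
        return "ask_user"           # State the change and ask before redoing.
    return "redo"                   # Do not deliver; redo with a specific fix.


if __name__ == "__main__":
    for score in (4.8, 4.0, 3.0, 2.0):
        print(score, post_evaluation_action(score))
```

Note that the removed script's own `VERDICT` line is driven mainly by the weakest axis rather than by the average alone, so this lookup mirrors the template table and not the script's verdict logic.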