@bastani/atomic 0.5.11-0 → 0.5.12-0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/.agents/skills/adapt/SKILL.md +199 -0
- package/.agents/skills/advanced-evaluation/SKILL.md +402 -0
- package/.agents/skills/advanced-evaluation/references/bias-mitigation.md +288 -0
- package/.agents/skills/advanced-evaluation/references/evaluation-pipeline.md +43 -0
- package/.agents/skills/advanced-evaluation/references/implementation-patterns.md +315 -0
- package/.agents/skills/advanced-evaluation/references/metrics-guide.md +331 -0
- package/.agents/skills/advanced-evaluation/scripts/evaluation_example.py +392 -0
- package/.agents/skills/animate/SKILL.md +175 -0
- package/.agents/skills/arrange/SKILL.md +124 -0
- package/.agents/skills/audit/SKILL.md +148 -0
- package/.agents/skills/bdi-mental-states/SKILL.md +311 -0
- package/.agents/skills/bdi-mental-states/references/bdi-ontology-core.md +207 -0
- package/.agents/skills/bdi-mental-states/references/framework-integration.md +582 -0
- package/.agents/skills/bdi-mental-states/references/rdf-examples.md +315 -0
- package/.agents/skills/bdi-mental-states/references/sparql-competency.md +420 -0
- package/.agents/skills/bolder/SKILL.md +117 -0
- package/.agents/skills/bun/SKILL.md +199 -0
- package/.agents/skills/clarify/SKILL.md +183 -0
- package/.agents/skills/colorize/SKILL.md +143 -0
- package/.agents/skills/context-compression/SKILL.md +272 -0
- package/.agents/skills/context-compression/references/evaluation-framework.md +213 -0
- package/.agents/skills/context-compression/scripts/compression_evaluator.py +862 -0
- package/.agents/skills/context-compression/tests/test_compression_evaluator.py +56 -0
- package/.agents/skills/context-degradation/SKILL.md +206 -0
- package/.agents/skills/context-degradation/references/patterns.md +314 -0
- package/.agents/skills/context-degradation/scripts/degradation_detector.py +614 -0
- package/.agents/skills/context-fundamentals/SKILL.md +201 -0
- package/.agents/skills/context-fundamentals/references/context-components.md +283 -0
- package/.agents/skills/context-fundamentals/scripts/context_manager.py +533 -0
- package/.agents/skills/context-optimization/SKILL.md +195 -0
- package/.agents/skills/context-optimization/references/optimization_techniques.md +272 -0
- package/.agents/skills/context-optimization/scripts/compaction.py +562 -0
- package/.agents/skills/create-spec/SKILL.md +244 -0
- package/.agents/skills/critique/SKILL.md +225 -0
- package/.agents/skills/critique/reference/cognitive-load.md +106 -0
- package/.agents/skills/critique/reference/heuristics-scoring.md +234 -0
- package/.agents/skills/critique/reference/personas.md +178 -0
- package/.agents/skills/delight/SKILL.md +304 -0
- package/.agents/skills/distill/SKILL.md +122 -0
- package/.agents/skills/docx/LICENSE.txt +30 -0
- package/.agents/skills/docx/SKILL.md +590 -0
- package/.agents/skills/docx/scripts/__init__.py +1 -0
- package/.agents/skills/docx/scripts/accept_changes.py +135 -0
- package/.agents/skills/docx/scripts/comment.py +318 -0
- package/.agents/skills/docx/scripts/office/helpers/__init__.py +0 -0
- package/.agents/skills/docx/scripts/office/helpers/merge_runs.py +199 -0
- package/.agents/skills/docx/scripts/office/helpers/simplify_redlines.py +197 -0
- package/.agents/skills/docx/scripts/office/pack.py +159 -0
- package/.agents/skills/docx/scripts/office/schemas/ISO-IEC29500-4_2016/dml-chart.xsd +1499 -0
- package/.agents/skills/docx/scripts/office/schemas/ISO-IEC29500-4_2016/dml-chartDrawing.xsd +146 -0
- package/.agents/skills/docx/scripts/office/schemas/ISO-IEC29500-4_2016/dml-diagram.xsd +1085 -0
- package/.agents/skills/docx/scripts/office/schemas/ISO-IEC29500-4_2016/dml-lockedCanvas.xsd +11 -0
- package/.agents/skills/docx/scripts/office/schemas/ISO-IEC29500-4_2016/dml-main.xsd +3081 -0
- package/.agents/skills/docx/scripts/office/schemas/ISO-IEC29500-4_2016/dml-picture.xsd +23 -0
- package/.agents/skills/docx/scripts/office/schemas/ISO-IEC29500-4_2016/dml-spreadsheetDrawing.xsd +185 -0
- package/.agents/skills/docx/scripts/office/schemas/ISO-IEC29500-4_2016/dml-wordprocessingDrawing.xsd +287 -0
- package/.agents/skills/docx/scripts/office/schemas/ISO-IEC29500-4_2016/pml.xsd +1676 -0
- package/.agents/skills/docx/scripts/office/schemas/ISO-IEC29500-4_2016/shared-additionalCharacteristics.xsd +28 -0
- package/.agents/skills/docx/scripts/office/schemas/ISO-IEC29500-4_2016/shared-bibliography.xsd +144 -0
- package/.agents/skills/docx/scripts/office/schemas/ISO-IEC29500-4_2016/shared-commonSimpleTypes.xsd +174 -0
- package/.agents/skills/docx/scripts/office/schemas/ISO-IEC29500-4_2016/shared-customXmlDataProperties.xsd +25 -0
- package/.agents/skills/docx/scripts/office/schemas/ISO-IEC29500-4_2016/shared-customXmlSchemaProperties.xsd +18 -0
- package/.agents/skills/docx/scripts/office/schemas/ISO-IEC29500-4_2016/shared-documentPropertiesCustom.xsd +59 -0
- package/.agents/skills/docx/scripts/office/schemas/ISO-IEC29500-4_2016/shared-documentPropertiesExtended.xsd +56 -0
- package/.agents/skills/docx/scripts/office/schemas/ISO-IEC29500-4_2016/shared-documentPropertiesVariantTypes.xsd +195 -0
- package/.agents/skills/docx/scripts/office/schemas/ISO-IEC29500-4_2016/shared-math.xsd +582 -0
- package/.agents/skills/docx/scripts/office/schemas/ISO-IEC29500-4_2016/shared-relationshipReference.xsd +25 -0
- package/.agents/skills/docx/scripts/office/schemas/ISO-IEC29500-4_2016/sml.xsd +4439 -0
- package/.agents/skills/docx/scripts/office/schemas/ISO-IEC29500-4_2016/vml-main.xsd +570 -0
- package/.agents/skills/docx/scripts/office/schemas/ISO-IEC29500-4_2016/vml-officeDrawing.xsd +509 -0
- package/.agents/skills/docx/scripts/office/schemas/ISO-IEC29500-4_2016/vml-presentationDrawing.xsd +12 -0
- package/.agents/skills/docx/scripts/office/schemas/ISO-IEC29500-4_2016/vml-spreadsheetDrawing.xsd +108 -0
- package/.agents/skills/docx/scripts/office/schemas/ISO-IEC29500-4_2016/vml-wordprocessingDrawing.xsd +96 -0
- package/.agents/skills/docx/scripts/office/schemas/ISO-IEC29500-4_2016/wml.xsd +3646 -0
- package/.agents/skills/docx/scripts/office/schemas/ISO-IEC29500-4_2016/xml.xsd +116 -0
- package/.agents/skills/docx/scripts/office/schemas/ecma/fouth-edition/opc-contentTypes.xsd +42 -0
- package/.agents/skills/docx/scripts/office/schemas/ecma/fouth-edition/opc-coreProperties.xsd +50 -0
- package/.agents/skills/docx/scripts/office/schemas/ecma/fouth-edition/opc-digSig.xsd +49 -0
- package/.agents/skills/docx/scripts/office/schemas/ecma/fouth-edition/opc-relationships.xsd +33 -0
- package/.agents/skills/docx/scripts/office/schemas/mce/mc.xsd +75 -0
- package/.agents/skills/docx/scripts/office/schemas/microsoft/wml-2010.xsd +560 -0
- package/.agents/skills/docx/scripts/office/schemas/microsoft/wml-2012.xsd +67 -0
- package/.agents/skills/docx/scripts/office/schemas/microsoft/wml-2018.xsd +14 -0
- package/.agents/skills/docx/scripts/office/schemas/microsoft/wml-cex-2018.xsd +20 -0
- package/.agents/skills/docx/scripts/office/schemas/microsoft/wml-cid-2016.xsd +13 -0
- package/.agents/skills/docx/scripts/office/schemas/microsoft/wml-sdtdatahash-2020.xsd +4 -0
- package/.agents/skills/docx/scripts/office/schemas/microsoft/wml-symex-2015.xsd +8 -0
- package/.agents/skills/docx/scripts/office/soffice.py +183 -0
- package/.agents/skills/docx/scripts/office/unpack.py +132 -0
- package/.agents/skills/docx/scripts/office/validate.py +111 -0
- package/.agents/skills/docx/scripts/office/validators/__init__.py +15 -0
- package/.agents/skills/docx/scripts/office/validators/base.py +847 -0
- package/.agents/skills/docx/scripts/office/validators/docx.py +446 -0
- package/.agents/skills/docx/scripts/office/validators/pptx.py +275 -0
- package/.agents/skills/docx/scripts/office/validators/redlining.py +247 -0
- package/.agents/skills/docx/scripts/templates/comments.xml +3 -0
- package/.agents/skills/docx/scripts/templates/commentsExtended.xml +3 -0
- package/.agents/skills/docx/scripts/templates/commentsExtensible.xml +3 -0
- package/.agents/skills/docx/scripts/templates/commentsIds.xml +3 -0
- package/.agents/skills/docx/scripts/templates/people.xml +3 -0
- package/.agents/skills/evaluation/SKILL.md +251 -0
- package/.agents/skills/evaluation/references/metrics.md +339 -0
- package/.agents/skills/evaluation/scripts/evaluator.py +627 -0
- package/.agents/skills/explain-code/SKILL.md +230 -0
- package/.agents/skills/extract/SKILL.md +91 -0
- package/.agents/skills/filesystem-context/SKILL.md +287 -0
- package/.agents/skills/filesystem-context/references/implementation-patterns.md +549 -0
- package/.agents/skills/filesystem-context/scripts/filesystem_context.py +425 -0
- package/.agents/skills/find-skills/SKILL.md +142 -0
- package/.agents/skills/frontend-design/SKILL.md +147 -0
- package/.agents/skills/frontend-design/reference/color-and-contrast.md +132 -0
- package/.agents/skills/frontend-design/reference/interaction-design.md +195 -0
- package/.agents/skills/frontend-design/reference/motion-design.md +99 -0
- package/.agents/skills/frontend-design/reference/responsive-design.md +114 -0
- package/.agents/skills/frontend-design/reference/spatial-design.md +100 -0
- package/.agents/skills/frontend-design/reference/typography.md +133 -0
- package/.agents/skills/frontend-design/reference/ux-writing.md +107 -0
- package/.agents/skills/gh-commit/SKILL.md +243 -0
- package/.agents/skills/gh-create-pr/SKILL.md +93 -0
- package/.agents/skills/harden/SKILL.md +354 -0
- package/.agents/skills/hosted-agents/SKILL.md +260 -0
- package/.agents/skills/hosted-agents/references/infrastructure-patterns.md +700 -0
- package/.agents/skills/hosted-agents/scripts/sandbox_manager.py +590 -0
- package/.agents/skills/impeccable/SKILL.md +365 -0
- package/.agents/skills/impeccable/reference/color-and-contrast.md +105 -0
- package/.agents/skills/impeccable/reference/craft.md +70 -0
- package/.agents/skills/impeccable/reference/extract.md +70 -0
- package/.agents/skills/impeccable/reference/interaction-design.md +195 -0
- package/.agents/skills/impeccable/reference/motion-design.md +99 -0
- package/.agents/skills/impeccable/reference/responsive-design.md +114 -0
- package/.agents/skills/impeccable/reference/spatial-design.md +100 -0
- package/.agents/skills/impeccable/reference/typography.md +142 -0
- package/.agents/skills/impeccable/reference/ux-writing.md +107 -0
- package/.agents/skills/impeccable/scripts/cleanup-deprecated.mjs +214 -0
- package/.agents/skills/init/SKILL.md +138 -0
- package/.agents/skills/layout/SKILL.md +125 -0
- package/.agents/skills/liteparse/SKILL.md +222 -0
- package/.agents/skills/memory-systems/SKILL.md +219 -0
- package/.agents/skills/memory-systems/references/implementation.md +551 -0
- package/.agents/skills/memory-systems/scripts/memory_store.py +616 -0
- package/.agents/skills/multi-agent-patterns/SKILL.md +257 -0
- package/.agents/skills/multi-agent-patterns/references/frameworks.md +433 -0
- package/.agents/skills/multi-agent-patterns/scripts/coordination.py +613 -0
- package/.agents/skills/normalize/SKILL.md +70 -0
- package/.agents/skills/onboard/SKILL.md +245 -0
- package/.agents/skills/opentui/SKILL.md +201 -0
- package/.agents/skills/opentui/references/animation/REFERENCE.md +431 -0
- package/.agents/skills/opentui/references/components/REFERENCE.md +144 -0
- package/.agents/skills/opentui/references/components/code-diff.md +672 -0
- package/.agents/skills/opentui/references/components/containers.md +417 -0
- package/.agents/skills/opentui/references/components/inputs.md +531 -0
- package/.agents/skills/opentui/references/components/text-display.md +386 -0
- package/.agents/skills/opentui/references/core/REFERENCE.md +145 -0
- package/.agents/skills/opentui/references/core/api.md +543 -0
- package/.agents/skills/opentui/references/core/configuration.md +168 -0
- package/.agents/skills/opentui/references/core/gotchas.md +393 -0
- package/.agents/skills/opentui/references/core/patterns.md +449 -0
- package/.agents/skills/opentui/references/keyboard/REFERENCE.md +617 -0
- package/.agents/skills/opentui/references/layout/REFERENCE.md +337 -0
- package/.agents/skills/opentui/references/layout/patterns.md +444 -0
- package/.agents/skills/opentui/references/react/REFERENCE.md +174 -0
- package/.agents/skills/opentui/references/react/api.md +436 -0
- package/.agents/skills/opentui/references/react/configuration.md +302 -0
- package/.agents/skills/opentui/references/react/gotchas.md +443 -0
- package/.agents/skills/opentui/references/react/patterns.md +501 -0
- package/.agents/skills/opentui/references/solid/REFERENCE.md +201 -0
- package/.agents/skills/opentui/references/solid/api.md +564 -0
- package/.agents/skills/opentui/references/solid/configuration.md +316 -0
- package/.agents/skills/opentui/references/solid/gotchas.md +427 -0
- package/.agents/skills/opentui/references/solid/patterns.md +560 -0
- package/.agents/skills/opentui/references/testing/REFERENCE.md +614 -0
- package/.agents/skills/optimize/SKILL.md +266 -0
- package/.agents/skills/overdrive/SKILL.md +142 -0
- package/.agents/skills/pdf/LICENSE.txt +30 -0
- package/.agents/skills/pdf/SKILL.md +314 -0
- package/.agents/skills/pdf/forms.md +294 -0
- package/.agents/skills/pdf/reference.md +612 -0
- package/.agents/skills/pdf/scripts/check_bounding_boxes.py +65 -0
- package/.agents/skills/pdf/scripts/check_fillable_fields.py +11 -0
- package/.agents/skills/pdf/scripts/convert_pdf_to_images.py +33 -0
- package/.agents/skills/pdf/scripts/create_validation_image.py +37 -0
- package/.agents/skills/pdf/scripts/extract_form_field_info.py +122 -0
- package/.agents/skills/pdf/scripts/extract_form_structure.py +115 -0
- package/.agents/skills/pdf/scripts/fill_fillable_fields.py +98 -0
- package/.agents/skills/pdf/scripts/fill_pdf_form_with_annotations.py +107 -0
- package/.agents/skills/playwright-cli/SKILL.md +344 -0
- package/.agents/skills/playwright-cli/references/element-attributes.md +23 -0
- package/.agents/skills/playwright-cli/references/playwright-tests.md +39 -0
- package/.agents/skills/playwright-cli/references/request-mocking.md +87 -0
- package/.agents/skills/playwright-cli/references/running-code.md +231 -0
- package/.agents/skills/playwright-cli/references/session-management.md +169 -0
- package/.agents/skills/playwright-cli/references/storage-state.md +275 -0
- package/.agents/skills/playwright-cli/references/test-generation.md +88 -0
- package/.agents/skills/playwright-cli/references/tracing.md +139 -0
- package/.agents/skills/playwright-cli/references/video-recording.md +143 -0
- package/.agents/skills/polish/SKILL.md +224 -0
- package/.agents/skills/pptx/LICENSE.txt +30 -0
- package/.agents/skills/pptx/SKILL.md +232 -0
- package/.agents/skills/pptx/editing.md +205 -0
- package/.agents/skills/pptx/pptxgenjs.md +420 -0
- package/.agents/skills/pptx/scripts/__init__.py +0 -0
- package/.agents/skills/pptx/scripts/add_slide.py +195 -0
- package/.agents/skills/pptx/scripts/clean.py +286 -0
- package/.agents/skills/pptx/scripts/office/helpers/__init__.py +0 -0
- package/.agents/skills/pptx/scripts/office/helpers/merge_runs.py +199 -0
- package/.agents/skills/pptx/scripts/office/helpers/simplify_redlines.py +197 -0
- package/.agents/skills/pptx/scripts/office/pack.py +159 -0
- package/.agents/skills/pptx/scripts/office/schemas/ISO-IEC29500-4_2016/dml-chart.xsd +1499 -0
- package/.agents/skills/pptx/scripts/office/schemas/ISO-IEC29500-4_2016/dml-chartDrawing.xsd +146 -0
- package/.agents/skills/pptx/scripts/office/schemas/ISO-IEC29500-4_2016/dml-diagram.xsd +1085 -0
- package/.agents/skills/pptx/scripts/office/schemas/ISO-IEC29500-4_2016/dml-lockedCanvas.xsd +11 -0
- package/.agents/skills/pptx/scripts/office/schemas/ISO-IEC29500-4_2016/dml-main.xsd +3081 -0
- package/.agents/skills/pptx/scripts/office/schemas/ISO-IEC29500-4_2016/dml-picture.xsd +23 -0
- package/.agents/skills/pptx/scripts/office/schemas/ISO-IEC29500-4_2016/dml-spreadsheetDrawing.xsd +185 -0
- package/.agents/skills/pptx/scripts/office/schemas/ISO-IEC29500-4_2016/dml-wordprocessingDrawing.xsd +287 -0
- package/.agents/skills/pptx/scripts/office/schemas/ISO-IEC29500-4_2016/pml.xsd +1676 -0
- package/.agents/skills/pptx/scripts/office/schemas/ISO-IEC29500-4_2016/shared-additionalCharacteristics.xsd +28 -0
- package/.agents/skills/pptx/scripts/office/schemas/ISO-IEC29500-4_2016/shared-bibliography.xsd +144 -0
- package/.agents/skills/pptx/scripts/office/schemas/ISO-IEC29500-4_2016/shared-commonSimpleTypes.xsd +174 -0
- package/.agents/skills/pptx/scripts/office/schemas/ISO-IEC29500-4_2016/shared-customXmlDataProperties.xsd +25 -0
- package/.agents/skills/pptx/scripts/office/schemas/ISO-IEC29500-4_2016/shared-customXmlSchemaProperties.xsd +18 -0
- package/.agents/skills/pptx/scripts/office/schemas/ISO-IEC29500-4_2016/shared-documentPropertiesCustom.xsd +59 -0
- package/.agents/skills/pptx/scripts/office/schemas/ISO-IEC29500-4_2016/shared-documentPropertiesExtended.xsd +56 -0
- package/.agents/skills/pptx/scripts/office/schemas/ISO-IEC29500-4_2016/shared-documentPropertiesVariantTypes.xsd +195 -0
- package/.agents/skills/pptx/scripts/office/schemas/ISO-IEC29500-4_2016/shared-math.xsd +582 -0
- package/.agents/skills/pptx/scripts/office/schemas/ISO-IEC29500-4_2016/shared-relationshipReference.xsd +25 -0
- package/.agents/skills/pptx/scripts/office/schemas/ISO-IEC29500-4_2016/sml.xsd +4439 -0
- package/.agents/skills/pptx/scripts/office/schemas/ISO-IEC29500-4_2016/vml-main.xsd +570 -0
- package/.agents/skills/pptx/scripts/office/schemas/ISO-IEC29500-4_2016/vml-officeDrawing.xsd +509 -0
- package/.agents/skills/pptx/scripts/office/schemas/ISO-IEC29500-4_2016/vml-presentationDrawing.xsd +12 -0
- package/.agents/skills/pptx/scripts/office/schemas/ISO-IEC29500-4_2016/vml-spreadsheetDrawing.xsd +108 -0
- package/.agents/skills/pptx/scripts/office/schemas/ISO-IEC29500-4_2016/vml-wordprocessingDrawing.xsd +96 -0
- package/.agents/skills/pptx/scripts/office/schemas/ISO-IEC29500-4_2016/wml.xsd +3646 -0
- package/.agents/skills/pptx/scripts/office/schemas/ISO-IEC29500-4_2016/xml.xsd +116 -0
- package/.agents/skills/pptx/scripts/office/schemas/ecma/fouth-edition/opc-contentTypes.xsd +42 -0
- package/.agents/skills/pptx/scripts/office/schemas/ecma/fouth-edition/opc-coreProperties.xsd +50 -0
- package/.agents/skills/pptx/scripts/office/schemas/ecma/fouth-edition/opc-digSig.xsd +49 -0
- package/.agents/skills/pptx/scripts/office/schemas/ecma/fouth-edition/opc-relationships.xsd +33 -0
- package/.agents/skills/pptx/scripts/office/schemas/mce/mc.xsd +75 -0
- package/.agents/skills/pptx/scripts/office/schemas/microsoft/wml-2010.xsd +560 -0
- package/.agents/skills/pptx/scripts/office/schemas/microsoft/wml-2012.xsd +67 -0
- package/.agents/skills/pptx/scripts/office/schemas/microsoft/wml-2018.xsd +14 -0
- package/.agents/skills/pptx/scripts/office/schemas/microsoft/wml-cex-2018.xsd +20 -0
- package/.agents/skills/pptx/scripts/office/schemas/microsoft/wml-cid-2016.xsd +13 -0
- package/.agents/skills/pptx/scripts/office/schemas/microsoft/wml-sdtdatahash-2020.xsd +4 -0
- package/.agents/skills/pptx/scripts/office/schemas/microsoft/wml-symex-2015.xsd +8 -0
- package/.agents/skills/pptx/scripts/office/soffice.py +183 -0
- package/.agents/skills/pptx/scripts/office/unpack.py +132 -0
- package/.agents/skills/pptx/scripts/office/validate.py +111 -0
- package/.agents/skills/pptx/scripts/office/validators/__init__.py +15 -0
- package/.agents/skills/pptx/scripts/office/validators/base.py +847 -0
- package/.agents/skills/pptx/scripts/office/validators/docx.py +446 -0
- package/.agents/skills/pptx/scripts/office/validators/pptx.py +275 -0
- package/.agents/skills/pptx/scripts/office/validators/redlining.py +247 -0
- package/.agents/skills/pptx/scripts/thumbnail.py +289 -0
- package/.agents/skills/project-development/SKILL.md +291 -0
- package/.agents/skills/project-development/references/case-studies.md +388 -0
- package/.agents/skills/project-development/references/pipeline-patterns.md +610 -0
- package/.agents/skills/project-development/scripts/pipeline_template.py +796 -0
- package/.agents/skills/prompt-engineer/SKILL.md +263 -0
- package/.agents/skills/prompt-engineer/references/advanced_patterns.md +271 -0
- package/.agents/skills/prompt-engineer/references/core_prompting.md +137 -0
- package/.agents/skills/prompt-engineer/references/quality_improvement.md +193 -0
- package/.agents/skills/quieter/SKILL.md +103 -0
- package/.agents/skills/research-codebase/SKILL.md +227 -0
- package/.agents/skills/shape/SKILL.md +96 -0
- package/.agents/skills/skill-creator/LICENSE.txt +202 -0
- package/.agents/skills/skill-creator/SKILL.md +485 -0
- package/.agents/skills/skill-creator/agents/analyzer.md +274 -0
- package/.agents/skills/skill-creator/agents/comparator.md +202 -0
- package/.agents/skills/skill-creator/agents/grader.md +223 -0
- package/.agents/skills/skill-creator/assets/eval_review.html +146 -0
- package/.agents/skills/skill-creator/eval-viewer/generate_review.py +471 -0
- package/.agents/skills/skill-creator/eval-viewer/viewer.html +1325 -0
- package/.agents/skills/skill-creator/references/schemas.md +430 -0
- package/.agents/skills/skill-creator/scripts/__init__.py +0 -0
- package/.agents/skills/skill-creator/scripts/aggregate_benchmark.py +401 -0
- package/.agents/skills/skill-creator/scripts/generate_report.py +326 -0
- package/.agents/skills/skill-creator/scripts/improve_description.py +247 -0
- package/.agents/skills/skill-creator/scripts/package_skill.py +136 -0
- package/.agents/skills/skill-creator/scripts/quick_validate.py +103 -0
- package/.agents/skills/skill-creator/scripts/run_eval.py +310 -0
- package/.agents/skills/skill-creator/scripts/run_loop.py +328 -0
- package/.agents/skills/skill-creator/scripts/utils.py +47 -0
- package/.agents/skills/sl-commit/SKILL.md +51 -0
- package/.agents/skills/sl-submit-diff/SKILL.md +55 -0
- package/.agents/skills/teach-impeccable/SKILL.md +71 -0
- package/.agents/skills/test-driven-development/SKILL.md +371 -0
- package/.agents/skills/test-driven-development/testing-anti-patterns.md +299 -0
- package/.agents/skills/tool-design/SKILL.md +271 -0
- package/.agents/skills/tool-design/references/architectural_reduction.md +210 -0
- package/.agents/skills/tool-design/references/best_practices.md +176 -0
- package/.agents/skills/tool-design/scripts/description_generator.py +528 -0
- package/.agents/skills/typescript-advanced-types/SKILL.md +719 -0
- package/.agents/skills/typescript-expert/SKILL.md +428 -0
- package/.agents/skills/typescript-expert/references/tsconfig-strict.json +92 -0
- package/.agents/skills/typescript-expert/references/typescript-cheatsheet.md +383 -0
- package/.agents/skills/typescript-expert/references/utility-types.ts +335 -0
- package/.agents/skills/typescript-expert/scripts/ts_diagnostic.py +203 -0
- package/.agents/skills/typescript-react-reviewer/SKILL.md +200 -0
- package/.agents/skills/typescript-react-reviewer/references/antipatterns.md +510 -0
- package/.agents/skills/typescript-react-reviewer/references/checklist.md +267 -0
- package/.agents/skills/typescript-react-reviewer/references/react19-patterns.md +305 -0
- package/.agents/skills/typeset/SKILL.md +116 -0
- package/.agents/skills/workflow-creator/SKILL.md +337 -0
- package/.agents/skills/workflow-creator/references/agent-sessions.md +789 -0
- package/.agents/skills/workflow-creator/references/computation-and-validation.md +224 -0
- package/.agents/skills/workflow-creator/references/control-flow.md +450 -0
- package/.agents/skills/workflow-creator/references/discovery-and-verification.md +156 -0
- package/.agents/skills/workflow-creator/references/failure-modes.md +732 -0
- package/.agents/skills/workflow-creator/references/getting-started.md +289 -0
- package/.agents/skills/workflow-creator/references/session-config.md +355 -0
- package/.agents/skills/workflow-creator/references/state-and-data-flow.md +374 -0
- package/.agents/skills/workflow-creator/references/user-input.md +206 -0
- package/.agents/skills/workflow-creator/references/workflow-inputs.md +274 -0
- package/.agents/skills/xlsx/LICENSE.txt +30 -0
- package/.agents/skills/xlsx/SKILL.md +292 -0
- package/.agents/skills/xlsx/scripts/office/helpers/__init__.py +0 -0
- package/.agents/skills/xlsx/scripts/office/helpers/merge_runs.py +199 -0
- package/.agents/skills/xlsx/scripts/office/helpers/simplify_redlines.py +197 -0
- package/.agents/skills/xlsx/scripts/office/pack.py +159 -0
- package/.agents/skills/xlsx/scripts/office/schemas/ISO-IEC29500-4_2016/dml-chart.xsd +1499 -0
- package/.agents/skills/xlsx/scripts/office/schemas/ISO-IEC29500-4_2016/dml-chartDrawing.xsd +146 -0
- package/.agents/skills/xlsx/scripts/office/schemas/ISO-IEC29500-4_2016/dml-diagram.xsd +1085 -0
- package/.agents/skills/xlsx/scripts/office/schemas/ISO-IEC29500-4_2016/dml-lockedCanvas.xsd +11 -0
- package/.agents/skills/xlsx/scripts/office/schemas/ISO-IEC29500-4_2016/dml-main.xsd +3081 -0
- package/.agents/skills/xlsx/scripts/office/schemas/ISO-IEC29500-4_2016/dml-picture.xsd +23 -0
- package/.agents/skills/xlsx/scripts/office/schemas/ISO-IEC29500-4_2016/dml-spreadsheetDrawing.xsd +185 -0
- package/.agents/skills/xlsx/scripts/office/schemas/ISO-IEC29500-4_2016/dml-wordprocessingDrawing.xsd +287 -0
- package/.agents/skills/xlsx/scripts/office/schemas/ISO-IEC29500-4_2016/pml.xsd +1676 -0
- package/.agents/skills/xlsx/scripts/office/schemas/ISO-IEC29500-4_2016/shared-additionalCharacteristics.xsd +28 -0
- package/.agents/skills/xlsx/scripts/office/schemas/ISO-IEC29500-4_2016/shared-bibliography.xsd +144 -0
- package/.agents/skills/xlsx/scripts/office/schemas/ISO-IEC29500-4_2016/shared-commonSimpleTypes.xsd +174 -0
- package/.agents/skills/xlsx/scripts/office/schemas/ISO-IEC29500-4_2016/shared-customXmlDataProperties.xsd +25 -0
- package/.agents/skills/xlsx/scripts/office/schemas/ISO-IEC29500-4_2016/shared-customXmlSchemaProperties.xsd +18 -0
- package/.agents/skills/xlsx/scripts/office/schemas/ISO-IEC29500-4_2016/shared-documentPropertiesCustom.xsd +59 -0
- package/.agents/skills/xlsx/scripts/office/schemas/ISO-IEC29500-4_2016/shared-documentPropertiesExtended.xsd +56 -0
- package/.agents/skills/xlsx/scripts/office/schemas/ISO-IEC29500-4_2016/shared-documentPropertiesVariantTypes.xsd +195 -0
- package/.agents/skills/xlsx/scripts/office/schemas/ISO-IEC29500-4_2016/shared-math.xsd +582 -0
- package/.agents/skills/xlsx/scripts/office/schemas/ISO-IEC29500-4_2016/shared-relationshipReference.xsd +25 -0
- package/.agents/skills/xlsx/scripts/office/schemas/ISO-IEC29500-4_2016/sml.xsd +4439 -0
- package/.agents/skills/xlsx/scripts/office/schemas/ISO-IEC29500-4_2016/vml-main.xsd +570 -0
- package/.agents/skills/xlsx/scripts/office/schemas/ISO-IEC29500-4_2016/vml-officeDrawing.xsd +509 -0
- package/.agents/skills/xlsx/scripts/office/schemas/ISO-IEC29500-4_2016/vml-presentationDrawing.xsd +12 -0
- package/.agents/skills/xlsx/scripts/office/schemas/ISO-IEC29500-4_2016/vml-spreadsheetDrawing.xsd +108 -0
- package/.agents/skills/xlsx/scripts/office/schemas/ISO-IEC29500-4_2016/vml-wordprocessingDrawing.xsd +96 -0
- package/.agents/skills/xlsx/scripts/office/schemas/ISO-IEC29500-4_2016/wml.xsd +3646 -0
- package/.agents/skills/xlsx/scripts/office/schemas/ISO-IEC29500-4_2016/xml.xsd +116 -0
- package/.agents/skills/xlsx/scripts/office/schemas/ecma/fouth-edition/opc-contentTypes.xsd +42 -0
- package/.agents/skills/xlsx/scripts/office/schemas/ecma/fouth-edition/opc-coreProperties.xsd +50 -0
- package/.agents/skills/xlsx/scripts/office/schemas/ecma/fouth-edition/opc-digSig.xsd +49 -0
- package/.agents/skills/xlsx/scripts/office/schemas/ecma/fouth-edition/opc-relationships.xsd +33 -0
- package/.agents/skills/xlsx/scripts/office/schemas/mce/mc.xsd +75 -0
- package/.agents/skills/xlsx/scripts/office/schemas/microsoft/wml-2010.xsd +560 -0
- package/.agents/skills/xlsx/scripts/office/schemas/microsoft/wml-2012.xsd +67 -0
- package/.agents/skills/xlsx/scripts/office/schemas/microsoft/wml-2018.xsd +14 -0
- package/.agents/skills/xlsx/scripts/office/schemas/microsoft/wml-cex-2018.xsd +20 -0
- package/.agents/skills/xlsx/scripts/office/schemas/microsoft/wml-cid-2016.xsd +13 -0
- package/.agents/skills/xlsx/scripts/office/schemas/microsoft/wml-sdtdatahash-2020.xsd +4 -0
- package/.agents/skills/xlsx/scripts/office/schemas/microsoft/wml-symex-2015.xsd +8 -0
- package/.agents/skills/xlsx/scripts/office/soffice.py +183 -0
- package/.agents/skills/xlsx/scripts/office/unpack.py +132 -0
- package/.agents/skills/xlsx/scripts/office/validate.py +111 -0
- package/.agents/skills/xlsx/scripts/office/validators/__init__.py +15 -0
- package/.agents/skills/xlsx/scripts/office/validators/base.py +847 -0
- package/.agents/skills/xlsx/scripts/office/validators/docx.py +446 -0
- package/.agents/skills/xlsx/scripts/office/validators/pptx.py +275 -0
- package/.agents/skills/xlsx/scripts/office/validators/redlining.py +247 -0
- package/.agents/skills/xlsx/scripts/recalc.py +184 -0
- package/.claude/agents/reviewer.md +1 -0
- package/.github/agents/reviewer.md +1 -0
- package/.opencode/agents/reviewer.md +1 -0
- package/README.md +274 -169
- package/package.json +6 -7
- package/src/commands/cli/init/index.ts +2 -2
- package/src/commands/cli/init/scm.ts +7 -8
- package/src/commands/cli/workflow-command.test.ts +74 -0
- package/src/commands/cli/workflow.ts +7 -2
- package/src/scripts/bundle-configs.ts +128 -0
- package/src/sdk/components/compact-switcher.tsx +1 -1
- package/src/sdk/components/orchestrator-panel-store.ts +13 -0
- package/src/sdk/components/orchestrator-panel.tsx +10 -0
- package/src/sdk/components/statusline.tsx +13 -1
- package/src/sdk/providers/claude.ts +42 -0
- package/src/sdk/runtime/executor.ts +111 -32
- package/src/sdk/types.ts +7 -0
- package/src/sdk/workflows/builtin/ralph/claude/index.ts +132 -76
- package/src/sdk/workflows/builtin/ralph/copilot/index.ts +129 -71
- package/src/sdk/workflows/builtin/ralph/helpers/git.ts +184 -17
- package/src/sdk/workflows/builtin/ralph/helpers/prompts.ts +463 -79
- package/src/sdk/workflows/builtin/ralph/opencode/index.ts +124 -80
- package/src/services/system/auto-sync.ts +31 -51
- package/src/services/system/skills.ts +56 -60
- package/dist/lib/path-root-guard.d.ts +0 -4
- package/dist/lib/path-root-guard.d.ts.map +0 -1
- package/dist/sdk/components/color-utils.d.ts +0 -4
- package/dist/sdk/components/color-utils.d.ts.map +0 -1
- package/dist/sdk/components/compact-switcher.d.ts +0 -10
- package/dist/sdk/components/compact-switcher.d.ts.map +0 -1
- package/dist/sdk/components/connectors.d.ts +0 -15
- package/dist/sdk/components/connectors.d.ts.map +0 -1
- package/dist/sdk/components/connectors.test.d.ts +0 -2
- package/dist/sdk/components/connectors.test.d.ts.map +0 -1
- package/dist/sdk/components/edge.d.ts +0 -4
- package/dist/sdk/components/edge.d.ts.map +0 -1
- package/dist/sdk/components/error-boundary.d.ts +0 -23
- package/dist/sdk/components/error-boundary.d.ts.map +0 -1
- package/dist/sdk/components/graph-theme.d.ts +0 -17
- package/dist/sdk/components/graph-theme.d.ts.map +0 -1
- package/dist/sdk/components/header.d.ts +0 -3
- package/dist/sdk/components/header.d.ts.map +0 -1
- package/dist/sdk/components/hooks.d.ts +0 -15
- package/dist/sdk/components/hooks.d.ts.map +0 -1
- package/dist/sdk/components/layout.d.ts +0 -27
- package/dist/sdk/components/layout.d.ts.map +0 -1
- package/dist/sdk/components/layout.test.d.ts +0 -2
- package/dist/sdk/components/layout.test.d.ts.map +0 -1
- package/dist/sdk/components/node-card.d.ts +0 -10
- package/dist/sdk/components/node-card.d.ts.map +0 -1
- package/dist/sdk/components/orchestrator-panel-contexts.d.ts +0 -16
- package/dist/sdk/components/orchestrator-panel-contexts.d.ts.map +0 -1
- package/dist/sdk/components/orchestrator-panel-store.d.ts +0 -46
- package/dist/sdk/components/orchestrator-panel-store.d.ts.map +0 -1
- package/dist/sdk/components/orchestrator-panel-store.test.d.ts +0 -2
- package/dist/sdk/components/orchestrator-panel-store.test.d.ts.map +0 -1
- package/dist/sdk/components/orchestrator-panel-types.d.ts +0 -18
- package/dist/sdk/components/orchestrator-panel-types.d.ts.map +0 -1
- package/dist/sdk/components/orchestrator-panel.d.ts +0 -52
- package/dist/sdk/components/orchestrator-panel.d.ts.map +0 -1
- package/dist/sdk/components/session-graph-panel.d.ts +0 -7
- package/dist/sdk/components/session-graph-panel.d.ts.map +0 -1
- package/dist/sdk/components/status-helpers.d.ts +0 -6
- package/dist/sdk/components/status-helpers.d.ts.map +0 -1
- package/dist/sdk/components/statusline.d.ts +0 -7
- package/dist/sdk/components/statusline.d.ts.map +0 -1
- package/dist/sdk/components/workflow-picker-panel.d.ts +0 -123
- package/dist/sdk/components/workflow-picker-panel.d.ts.map +0 -1
- package/dist/sdk/define-workflow.d.ts +0 -78
- package/dist/sdk/define-workflow.d.ts.map +0 -1
- package/dist/sdk/define-workflow.test.d.ts +0 -2
- package/dist/sdk/define-workflow.test.d.ts.map +0 -1
- package/dist/sdk/errors.d.ts +0 -24
- package/dist/sdk/errors.d.ts.map +0 -1
- package/dist/sdk/errors.test.d.ts +0 -2
- package/dist/sdk/errors.test.d.ts.map +0 -1
- package/dist/sdk/index.d.ts +0 -13
- package/dist/sdk/index.d.ts.map +0 -1
- package/dist/sdk/providers/claude.d.ts +0 -170
- package/dist/sdk/providers/claude.d.ts.map +0 -1
- package/dist/sdk/providers/copilot.d.ts +0 -11
- package/dist/sdk/providers/copilot.d.ts.map +0 -1
- package/dist/sdk/providers/opencode.d.ts +0 -11
- package/dist/sdk/providers/opencode.d.ts.map +0 -1
- package/dist/sdk/runtime/discovery.d.ts +0 -86
- package/dist/sdk/runtime/discovery.d.ts.map +0 -1
- package/dist/sdk/runtime/executor-entry.d.ts +0 -11
- package/dist/sdk/runtime/executor-entry.d.ts.map +0 -1
- package/dist/sdk/runtime/executor.d.ts +0 -72
- package/dist/sdk/runtime/executor.d.ts.map +0 -1
- package/dist/sdk/runtime/executor.test.d.ts +0 -2
- package/dist/sdk/runtime/executor.test.d.ts.map +0 -1
- package/dist/sdk/runtime/graph-inference.d.ts +0 -35
- package/dist/sdk/runtime/graph-inference.d.ts.map +0 -1
- package/dist/sdk/runtime/loader.d.ts +0 -70
- package/dist/sdk/runtime/loader.d.ts.map +0 -1
- package/dist/sdk/runtime/panel.d.ts +0 -9
- package/dist/sdk/runtime/panel.d.ts.map +0 -1
- package/dist/sdk/runtime/theme.d.ts +0 -28
- package/dist/sdk/runtime/theme.d.ts.map +0 -1
- package/dist/sdk/runtime/tmux.d.ts +0 -297
- package/dist/sdk/runtime/tmux.d.ts.map +0 -1
- package/dist/sdk/types.d.ts +0 -295
- package/dist/sdk/types.d.ts.map +0 -1
- package/dist/sdk/workflows/builtin/deep-research-codebase/claude/index.d.ts +0 -62
- package/dist/sdk/workflows/builtin/deep-research-codebase/claude/index.d.ts.map +0 -1
- package/dist/sdk/workflows/builtin/deep-research-codebase/copilot/index.d.ts +0 -46
- package/dist/sdk/workflows/builtin/deep-research-codebase/copilot/index.d.ts.map +0 -1
- package/dist/sdk/workflows/builtin/deep-research-codebase/helpers/heuristic.d.ts +0 -26
- package/dist/sdk/workflows/builtin/deep-research-codebase/helpers/heuristic.d.ts.map +0 -1
- package/dist/sdk/workflows/builtin/deep-research-codebase/helpers/prompts.d.ts +0 -92
- package/dist/sdk/workflows/builtin/deep-research-codebase/helpers/prompts.d.ts.map +0 -1
- package/dist/sdk/workflows/builtin/deep-research-codebase/helpers/scout.d.ts +0 -57
- package/dist/sdk/workflows/builtin/deep-research-codebase/helpers/scout.d.ts.map +0 -1
- package/dist/sdk/workflows/builtin/deep-research-codebase/opencode/index.d.ts +0 -49
- package/dist/sdk/workflows/builtin/deep-research-codebase/opencode/index.d.ts.map +0 -1
- package/dist/sdk/workflows/builtin/ralph/claude/index.d.ts +0 -14
- package/dist/sdk/workflows/builtin/ralph/claude/index.d.ts.map +0 -1
- package/dist/sdk/workflows/builtin/ralph/copilot/index.d.ts +0 -14
- package/dist/sdk/workflows/builtin/ralph/copilot/index.d.ts.map +0 -1
- package/dist/sdk/workflows/builtin/ralph/helpers/git.d.ts +0 -17
- package/dist/sdk/workflows/builtin/ralph/helpers/git.d.ts.map +0 -1
- package/dist/sdk/workflows/builtin/ralph/helpers/prompts.d.ts +0 -119
- package/dist/sdk/workflows/builtin/ralph/helpers/prompts.d.ts.map +0 -1
- package/dist/sdk/workflows/builtin/ralph/helpers/review.d.ts +0 -20
- package/dist/sdk/workflows/builtin/ralph/helpers/review.d.ts.map +0 -1
- package/dist/sdk/workflows/builtin/ralph/opencode/index.d.ts +0 -14
- package/dist/sdk/workflows/builtin/ralph/opencode/index.d.ts.map +0 -1
- package/dist/sdk/workflows/index.d.ts +0 -24
- package/dist/sdk/workflows/index.d.ts.map +0 -1
- package/dist/services/config/definitions.d.ts +0 -85
- package/dist/services/config/definitions.d.ts.map +0 -1
- package/dist/services/system/copy.d.ts +0 -77
- package/dist/services/system/copy.d.ts.map +0 -1
- package/dist/services/system/detect.d.ts +0 -75
- package/dist/services/system/detect.d.ts.map +0 -1
- package/tsconfig.json +0 -33
|
@@ -0,0 +1,331 @@
|
|
|
1
|
+
# Metric Selection Guide for LLM Evaluation
|
|
2
|
+
|
|
3
|
+
This reference provides guidance on selecting appropriate metrics for different evaluation scenarios.
|
|
4
|
+
|
|
5
|
+
## Metric Categories
|
|
6
|
+
|
|
7
|
+
### Classification Metrics
|
|
8
|
+
|
|
9
|
+
Use for binary or multi-class evaluation tasks (pass/fail, correct/incorrect).
|
|
10
|
+
|
|
11
|
+
#### Precision
|
|
12
|
+
|
|
13
|
+
```
|
|
14
|
+
Precision = True Positives / (True Positives + False Positives)
|
|
15
|
+
```
|
|
16
|
+
|
|
17
|
+
**Interpretation**: Of all responses the judge said were good, what fraction were actually good?
|
|
18
|
+
|
|
19
|
+
**Use when**: False positives are costly (e.g., approving unsafe content)
|
|
20
|
+
|
|
21
|
+
```python
|
|
22
|
+
def precision(predictions, ground_truth):
|
|
23
|
+
true_positives = sum(1 for p, g in zip(predictions, ground_truth) if p == 1 and g == 1)
|
|
24
|
+
predicted_positives = sum(predictions)
|
|
25
|
+
return true_positives / predicted_positives if predicted_positives > 0 else 0
|
|
26
|
+
```
|
|
27
|
+
|
|
28
|
+
#### Recall
|
|
29
|
+
|
|
30
|
+
```
|
|
31
|
+
Recall = True Positives / (True Positives + False Negatives)
|
|
32
|
+
```
|
|
33
|
+
|
|
34
|
+
**Interpretation**: Of all actually good responses, what fraction did the judge identify?
|
|
35
|
+
|
|
36
|
+
**Use when**: False negatives are costly (e.g., missing good content in filtering)
|
|
37
|
+
|
|
38
|
+
```python
|
|
39
|
+
def recall(predictions, ground_truth):
|
|
40
|
+
true_positives = sum(1 for p, g in zip(predictions, ground_truth) if p == 1 and g == 1)
|
|
41
|
+
actual_positives = sum(ground_truth)
|
|
42
|
+
return true_positives / actual_positives if actual_positives > 0 else 0
|
|
43
|
+
```
|
|
44
|
+
|
|
45
|
+
#### F1 Score
|
|
46
|
+
|
|
47
|
+
```
|
|
48
|
+
F1 = 2 * (Precision * Recall) / (Precision + Recall)
|
|
49
|
+
```
|
|
50
|
+
|
|
51
|
+
**Interpretation**: Harmonic mean of precision and recall
|
|
52
|
+
|
|
53
|
+
**Use when**: You need a single number balancing both concerns
|
|
54
|
+
|
|
55
|
+
```python
|
|
56
|
+
def f1_score(predictions, ground_truth):
|
|
57
|
+
p = precision(predictions, ground_truth)
|
|
58
|
+
r = recall(predictions, ground_truth)
|
|
59
|
+
return 2 * p * r / (p + r) if (p + r) > 0 else 0
|
|
60
|
+
```
|
|
61
|
+
|
|
62
|
+
### Agreement Metrics
|
|
63
|
+
|
|
64
|
+
Use for comparing automated evaluation with human judgment.
|
|
65
|
+
|
|
66
|
+
#### Cohen's Kappa (κ)
|
|
67
|
+
|
|
68
|
+
```
|
|
69
|
+
κ = (Observed Agreement - Expected Agreement) / (1 - Expected Agreement)
|
|
70
|
+
```
|
|
71
|
+
|
|
72
|
+
**Interpretation**: Agreement adjusted for chance
|
|
73
|
+
- κ > 0.8: Almost perfect agreement
|
|
74
|
+
- κ 0.6-0.8: Substantial agreement
|
|
75
|
+
- κ 0.4-0.6: Moderate agreement
|
|
76
|
+
- κ < 0.4: Fair to poor agreement
|
|
77
|
+
|
|
78
|
+
**Use for**: Binary or categorical judgments
|
|
79
|
+
|
|
80
|
+
```python
|
|
81
|
+
def cohens_kappa(judge1, judge2):
|
|
82
|
+
from sklearn.metrics import cohen_kappa_score
|
|
83
|
+
return cohen_kappa_score(judge1, judge2)
|
|
84
|
+
```
|
|
85
|
+
|
|
86
|
+
#### Weighted Kappa
|
|
87
|
+
|
|
88
|
+
For ordinal scales where disagreement severity matters:
|
|
89
|
+
|
|
90
|
+
```python
|
|
91
|
+
def weighted_kappa(judge1, judge2):
|
|
92
|
+
from sklearn.metrics import cohen_kappa_score
|
|
93
|
+
return cohen_kappa_score(judge1, judge2, weights='quadratic')
|
|
94
|
+
```
|
|
95
|
+
|
|
96
|
+
**Interpretation**: Penalizes large disagreements more than small ones
|
|
97
|
+
|
|
98
|
+
### Correlation Metrics
|
|
99
|
+
|
|
100
|
+
Use for ordinal/continuous scores.
|
|
101
|
+
|
|
102
|
+
#### Spearman's Rank Correlation (ρ)
|
|
103
|
+
|
|
104
|
+
**Interpretation**: Correlation between rankings, not absolute values
|
|
105
|
+
- ρ > 0.9: Very strong correlation
|
|
106
|
+
- ρ 0.7-0.9: Strong correlation
|
|
107
|
+
- ρ 0.5-0.7: Moderate correlation
|
|
108
|
+
- ρ < 0.5: Weak correlation
|
|
109
|
+
|
|
110
|
+
**Use when**: Order matters more than exact values
|
|
111
|
+
|
|
112
|
+
```python
|
|
113
|
+
def spearmans_rho(scores1, scores2):
|
|
114
|
+
from scipy.stats import spearmanr
|
|
115
|
+
rho, p_value = spearmanr(scores1, scores2)
|
|
116
|
+
return {'rho': rho, 'p_value': p_value}
|
|
117
|
+
```
|
|
118
|
+
|
|
119
|
+
#### Kendall's Tau (τ)
|
|
120
|
+
|
|
121
|
+
**Interpretation**: Similar to Spearman but based on pairwise concordance
|
|
122
|
+
|
|
123
|
+
**Use when**: You have many tied values
|
|
124
|
+
|
|
125
|
+
```python
|
|
126
|
+
def kendalls_tau(scores1, scores2):
|
|
127
|
+
from scipy.stats import kendalltau
|
|
128
|
+
tau, p_value = kendalltau(scores1, scores2)
|
|
129
|
+
return {'tau': tau, 'p_value': p_value}
|
|
130
|
+
```
|
|
131
|
+
|
|
132
|
+
#### Pearson Correlation (r)
|
|
133
|
+
|
|
134
|
+
**Interpretation**: Linear correlation between scores
|
|
135
|
+
|
|
136
|
+
**Use when**: Exact score values matter, not just order
|
|
137
|
+
|
|
138
|
+
```python
|
|
139
|
+
def pearsons_r(scores1, scores2):
|
|
140
|
+
from scipy.stats import pearsonr
|
|
141
|
+
r, p_value = pearsonr(scores1, scores2)
|
|
142
|
+
return {'r': r, 'p_value': p_value}
|
|
143
|
+
```
|
|
144
|
+
|
|
145
|
+
### Pairwise Comparison Metrics
|
|
146
|
+
|
|
147
|
+
#### Agreement Rate
|
|
148
|
+
|
|
149
|
+
```
|
|
150
|
+
Agreement = (Matching Decisions) / (Total Comparisons)
|
|
151
|
+
```
|
|
152
|
+
|
|
153
|
+
**Interpretation**: Simple percentage of agreement
|
|
154
|
+
|
|
155
|
+
```python
|
|
156
|
+
def pairwise_agreement(decisions1, decisions2):
|
|
157
|
+
matches = sum(1 for d1, d2 in zip(decisions1, decisions2) if d1 == d2)
|
|
158
|
+
return matches / len(decisions1)
|
|
159
|
+
```
|
|
160
|
+
|
|
161
|
+
#### Position Consistency
|
|
162
|
+
|
|
163
|
+
```
|
|
164
|
+
Consistency = (Consistent across position swaps) / (Total comparisons)
|
|
165
|
+
```
|
|
166
|
+
|
|
167
|
+
**Interpretation**: How often does swapping position change the decision?
|
|
168
|
+
|
|
169
|
+
```python
|
|
170
|
+
def position_consistency(results):
|
|
171
|
+
consistent = sum(1 for r in results if r['position_consistent'])
|
|
172
|
+
return consistent / len(results)
|
|
173
|
+
```
|
|
174
|
+
|
|
175
|
+
## Selection Decision Tree
|
|
176
|
+
|
|
177
|
+
```
|
|
178
|
+
What type of evaluation task?
|
|
179
|
+
│
|
|
180
|
+
├── Binary classification (pass/fail)
|
|
181
|
+
│ └── Use: Precision, Recall, F1, Cohen's κ
|
|
182
|
+
│
|
|
183
|
+
├── Ordinal scale (1-5 rating)
|
|
184
|
+
│ ├── Comparing to human judgments?
|
|
185
|
+
│ │ └── Use: Spearman's ρ, Weighted κ
|
|
186
|
+
│ └── Comparing two automated judges?
|
|
187
|
+
│ └── Use: Kendall's τ, Spearman's ρ
|
|
188
|
+
│
|
|
189
|
+
├── Pairwise preference
|
|
190
|
+
│ └── Use: Agreement rate, Position consistency
|
|
191
|
+
│
|
|
192
|
+
└── Multi-label classification
|
|
193
|
+
└── Use: Macro-F1, Micro-F1, Per-label metrics
|
|
194
|
+
```
|
|
195
|
+
|
|
196
|
+
## Metric Selection by Use Case
|
|
197
|
+
|
|
198
|
+
### Use Case 1: Validating Automated Evaluation
|
|
199
|
+
|
|
200
|
+
**Goal**: Ensure automated evaluation correlates with human judgment
|
|
201
|
+
|
|
202
|
+
**Recommended Metrics**:
|
|
203
|
+
1. Primary: Spearman's ρ (for ordinal scales) or Cohen's κ (for categorical)
|
|
204
|
+
2. Secondary: Per-criterion agreement
|
|
205
|
+
3. Diagnostic: Confusion matrix for systematic errors
|
|
206
|
+
|
|
207
|
+
```python
|
|
208
|
+
def validate_automated_eval(automated_scores, human_scores, criteria):
|
|
209
|
+
results = {}
|
|
210
|
+
|
|
211
|
+
# Overall correlation
|
|
212
|
+
results['overall_spearman'] = spearmans_rho(automated_scores, human_scores)
|
|
213
|
+
|
|
214
|
+
# Per-criterion agreement
|
|
215
|
+
for criterion in criteria:
|
|
216
|
+
auto_crit = [s[criterion] for s in automated_scores]
|
|
217
|
+
human_crit = [s[criterion] for s in human_scores]
|
|
218
|
+
results[f'{criterion}_spearman'] = spearmans_rho(auto_crit, human_crit)
|
|
219
|
+
|
|
220
|
+
return results
|
|
221
|
+
```
|
|
222
|
+
|
|
223
|
+
### Use Case 2: Comparing Two Models
|
|
224
|
+
|
|
225
|
+
**Goal**: Determine which model produces better outputs
|
|
226
|
+
|
|
227
|
+
**Recommended Metrics**:
|
|
228
|
+
1. Primary: Win rate (from pairwise comparison)
|
|
229
|
+
2. Secondary: Position consistency (bias check)
|
|
230
|
+
3. Diagnostic: Per-criterion breakdown
|
|
231
|
+
|
|
232
|
+
```python
|
|
233
|
+
def compare_models(model_a_outputs, model_b_outputs, prompts):
|
|
234
|
+
results = []
|
|
235
|
+
for a, b, p in zip(model_a_outputs, model_b_outputs, prompts):
|
|
236
|
+
comparison = await compare_with_position_swap(a, b, p)
|
|
237
|
+
results.append(comparison)
|
|
238
|
+
|
|
239
|
+
return {
|
|
240
|
+
'a_wins': sum(1 for r in results if r['winner'] == 'A'),
|
|
241
|
+
'b_wins': sum(1 for r in results if r['winner'] == 'B'),
|
|
242
|
+
'ties': sum(1 for r in results if r['winner'] == 'TIE'),
|
|
243
|
+
'position_consistency': position_consistency(results)
|
|
244
|
+
}
|
|
245
|
+
```
|
|
246
|
+
|
|
247
|
+
### Use Case 3: Quality Monitoring
|
|
248
|
+
|
|
249
|
+
**Goal**: Track evaluation quality over time
|
|
250
|
+
|
|
251
|
+
**Recommended Metrics**:
|
|
252
|
+
1. Primary: Rolling agreement with human spot-checks
|
|
253
|
+
2. Secondary: Score distribution stability
|
|
254
|
+
3. Diagnostic: Bias indicators (position, length)
|
|
255
|
+
|
|
256
|
+
```python
|
|
257
|
+
class QualityMonitor:
|
|
258
|
+
def __init__(self, window_size=100):
|
|
259
|
+
self.window = deque(maxlen=window_size)
|
|
260
|
+
|
|
261
|
+
def add_evaluation(self, automated, human_spot_check=None):
|
|
262
|
+
self.window.append({
|
|
263
|
+
'automated': automated,
|
|
264
|
+
'human': human_spot_check,
|
|
265
|
+
'length': len(automated['response'])
|
|
266
|
+
})
|
|
267
|
+
|
|
268
|
+
def get_metrics(self):
|
|
269
|
+
# Filter to evaluations with human spot-checks
|
|
270
|
+
with_human = [e for e in self.window if e['human'] is not None]
|
|
271
|
+
|
|
272
|
+
if len(with_human) < 10:
|
|
273
|
+
return {'insufficient_data': True}
|
|
274
|
+
|
|
275
|
+
auto_scores = [e['automated']['score'] for e in with_human]
|
|
276
|
+
human_scores = [e['human']['score'] for e in with_human]
|
|
277
|
+
|
|
278
|
+
return {
|
|
279
|
+
'correlation': spearmans_rho(auto_scores, human_scores),
|
|
280
|
+
'mean_difference': np.mean([a - h for a, h in zip(auto_scores, human_scores)]),
|
|
281
|
+
'length_correlation': spearmans_rho(
|
|
282
|
+
[e['length'] for e in self.window],
|
|
283
|
+
[e['automated']['score'] for e in self.window]
|
|
284
|
+
)
|
|
285
|
+
}
|
|
286
|
+
```
|
|
287
|
+
|
|
288
|
+
## Interpreting Metric Results
|
|
289
|
+
|
|
290
|
+
### Good Evaluation System Indicators
|
|
291
|
+
|
|
292
|
+
| Metric | Good | Acceptable | Concerning |
|
|
293
|
+
|--------|------|------------|------------|
|
|
294
|
+
| Spearman's ρ | > 0.8 | 0.6-0.8 | < 0.6 |
|
|
295
|
+
| Cohen's κ | > 0.7 | 0.5-0.7 | < 0.5 |
|
|
296
|
+
| Position consistency | > 0.9 | 0.8-0.9 | < 0.8 |
|
|
297
|
+
| Length correlation | < 0.2 | 0.2-0.4 | > 0.4 |
|
|
298
|
+
|
|
299
|
+
### Warning Signs
|
|
300
|
+
|
|
301
|
+
1. **High agreement but low correlation**: May indicate calibration issues
|
|
302
|
+
2. **Low position consistency**: Position bias affecting results
|
|
303
|
+
3. **High length correlation**: Length bias inflating scores
|
|
304
|
+
4. **Per-criterion variance**: Some criteria may be poorly defined
|
|
305
|
+
|
|
306
|
+
## Reporting Template
|
|
307
|
+
|
|
308
|
+
```markdown
|
|
309
|
+
## Evaluation System Metrics Report
|
|
310
|
+
|
|
311
|
+
### Human Agreement
|
|
312
|
+
- Spearman's ρ: 0.82 (p < 0.001)
|
|
313
|
+
- Cohen's κ: 0.74
|
|
314
|
+
- Sample size: 500 evaluations
|
|
315
|
+
|
|
316
|
+
### Bias Indicators
|
|
317
|
+
- Position consistency: 91%
|
|
318
|
+
- Length-score correlation: 0.12
|
|
319
|
+
|
|
320
|
+
### Per-Criterion Performance
|
|
321
|
+
| Criterion | Spearman's ρ | κ |
|
|
322
|
+
|-----------|--------------|---|
|
|
323
|
+
| Accuracy | 0.88 | 0.79 |
|
|
324
|
+
| Clarity | 0.76 | 0.68 |
|
|
325
|
+
| Completeness | 0.81 | 0.72 |
|
|
326
|
+
|
|
327
|
+
### Recommendations
|
|
328
|
+
- All metrics within acceptable ranges
|
|
329
|
+
- Monitor "Clarity" criterion - lower agreement may indicate need for rubric refinement
|
|
330
|
+
```
|
|
331
|
+
|