PyPI - schliff - Versions diffs - 6.0.1__tar.gz - Mend

schliff 6.0.1__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (73)

schliff-6.0.1/LICENSE +21 -0
schliff-6.0.1/PKG-INFO +367 -0
schliff-6.0.1/README.md +341 -0
schliff-6.0.1/pyproject.toml +55 -0
schliff-6.0.1/schliff.egg-info/PKG-INFO +367 -0
schliff-6.0.1/schliff.egg-info/SOURCES.txt +71 -0
schliff-6.0.1/schliff.egg-info/dependency_links.txt +1 -0
schliff-6.0.1/schliff.egg-info/entry_points.txt +2 -0
schliff-6.0.1/schliff.egg-info/top_level.txt +1 -0
schliff-6.0.1/setup.cfg +4 -0
schliff-6.0.1/skills/__init__.py +0 -0
schliff-6.0.1/skills/schliff/SKILL.md +297 -0
schliff-6.0.1/skills/schliff/__init__.py +0 -0
schliff-6.0.1/skills/schliff/eval-suite.json +832 -0
schliff-6.0.1/skills/schliff/hooks/hooks.json +15 -0
schliff-6.0.1/skills/schliff/hooks/session-injector.js +204 -0
schliff-6.0.1/skills/schliff/references/improvement-protocol.md +776 -0
schliff-6.0.1/skills/schliff/references/metrics-catalog.md +185 -0
schliff-6.0.1/skills/schliff/references/skill-patterns.md +110 -0
schliff-6.0.1/skills/schliff/references/state-management.md +54 -0
schliff-6.0.1/skills/schliff/scripts/__init__.py +0 -0
schliff-6.0.1/skills/schliff/scripts/achievements.py +286 -0
schliff-6.0.1/skills/schliff/scripts/analyze-skill.sh +287 -0
schliff-6.0.1/skills/schliff/scripts/auto-improve.py +648 -0
schliff-6.0.1/skills/schliff/scripts/cli.py +143 -0
schliff-6.0.1/skills/schliff/scripts/dashboard.py +313 -0
schliff-6.0.1/skills/schliff/scripts/doctor.py +275 -0
schliff-6.0.1/skills/schliff/scripts/episodic-store.py +5 -0
schliff-6.0.1/skills/schliff/scripts/episodic_store.py +536 -0
schliff-6.0.1/skills/schliff/scripts/generate-report.py +538 -0
schliff-6.0.1/skills/schliff/scripts/init-skill.py +799 -0
schliff-6.0.1/skills/schliff/scripts/meta-report.py +5 -0
schliff-6.0.1/skills/schliff/scripts/meta_report.py +480 -0
schliff-6.0.1/skills/schliff/scripts/nlp.py +91 -0
schliff-6.0.1/skills/schliff/scripts/parallel-runner.py +5 -0
schliff-6.0.1/skills/schliff/scripts/parallel_runner.py +456 -0
schliff-6.0.1/skills/schliff/scripts/progress.py +827 -0
schliff-6.0.1/skills/schliff/scripts/run-eval.sh +521 -0
schliff-6.0.1/skills/schliff/scripts/runtime-evaluator.py +256 -0
schliff-6.0.1/skills/schliff/scripts/score-skill.py +180 -0
schliff-6.0.1/skills/schliff/scripts/score_skill.py +11 -0
schliff-6.0.1/skills/schliff/scripts/scoring/__init__.py +23 -0
schliff-6.0.1/skills/schliff/scripts/scoring/clarity.py +179 -0
schliff-6.0.1/skills/schliff/scripts/scoring/coherence.py +76 -0
schliff-6.0.1/skills/schliff/scripts/scoring/composability.py +142 -0
schliff-6.0.1/skills/schliff/scripts/scoring/composite.py +178 -0
schliff-6.0.1/skills/schliff/scripts/scoring/diff.py +97 -0
schliff-6.0.1/skills/schliff/scripts/scoring/edges.py +117 -0
schliff-6.0.1/skills/schliff/scripts/scoring/efficiency.py +149 -0
schliff-6.0.1/skills/schliff/scripts/scoring/patterns.py +195 -0
schliff-6.0.1/skills/schliff/scripts/scoring/quality.py +137 -0
schliff-6.0.1/skills/schliff/scripts/scoring/runtime.py +124 -0
schliff-6.0.1/skills/schliff/scripts/scoring/structure.py +145 -0
schliff-6.0.1/skills/schliff/scripts/scoring/triggers.py +153 -0
schliff-6.0.1/skills/schliff/scripts/shared.py +286 -0
schliff-6.0.1/skills/schliff/scripts/skill-mesh.py +5 -0
schliff-6.0.1/skills/schliff/scripts/skill_mesh.py +858 -0
schliff-6.0.1/skills/schliff/scripts/terminal_art.py +214 -0
schliff-6.0.1/skills/schliff/scripts/test-integration.sh +1765 -0
schliff-6.0.1/skills/schliff/scripts/test-self.sh +168 -0
schliff-6.0.1/skills/schliff/scripts/text-gradient.py +5 -0
schliff-6.0.1/skills/schliff/scripts/text_gradient.py +1043 -0
schliff-6.0.1/skills/schliff/templates/eval-suite-template.json +109 -0
schliff-6.0.1/skills/schliff/tests/proof/bad-eval-suite.json +60 -0
schliff-6.0.1/skills/schliff/tests/proof/bad-skill.md +10 -0
schliff-6.0.1/skills/schliff/tests/proof/test-proof.sh +222 -0
schliff-6.0.1/skills/schliff/tests/unit/conftest.py +17 -0
schliff-6.0.1/skills/schliff/tests/unit/test_edge_cases.py +357 -0
schliff-6.0.1/skills/schliff/tests/unit/test_golden.py +274 -0
schliff-6.0.1/skills/schliff/tests/unit/test_patterns.py +513 -0
schliff-6.0.1/skills/schliff/tests/unit/test_scoring.py +1051 -0
schliff-6.0.1/skills/schliff/tests/unit/test_shared.py +279 -0
schliff-6.0.1/skills/schliff/tests/unit/test_stress.py +816 -0

schliff-6.0.1/LICENSE ADDED

@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2026 Zandereins (Franz Paul)
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

schliff-6.0.1/PKG-INFO ADDED

@@ -0,0 +1,367 @@
+Metadata-Version: 2.4
+Name: schliff
+Version: 6.0.1
+Summary: Autonomous skill improvement and measurement framework for Claude Code
+Author: Franz Paul
+License: MIT
+Project-URL: Repository, https://github.com/Zandereins/schliff
+Project-URL: Issues, https://github.com/Zandereins/schliff/issues
+Project-URL: Changelog, https://github.com/Zandereins/schliff/blob/main/CHANGELOG.md
+Keywords: claude-code,skills,scoring,evaluation,autonomous-improvement
+Classifier: Development Status :: 5 - Production/Stable
+Classifier: Intended Audience :: Developers
+Classifier: License :: OSI Approved :: MIT License
+Classifier: Programming Language :: Python :: 3
+Classifier: Programming Language :: Python :: 3.9
+Classifier: Programming Language :: Python :: 3.10
+Classifier: Programming Language :: Python :: 3.11
+Classifier: Programming Language :: Python :: 3.12
+Classifier: Programming Language :: Python :: 3.13
+Classifier: Topic :: Software Development :: Quality Assurance
+Classifier: Topic :: Software Development :: Testing
+Requires-Python: >=3.9
+Description-Content-Type: text/markdown
+License-File: LICENSE
+Dynamic: license-file
+# Schliff
+The finishing cut for Claude Code skills.
+<p align="center">
+  <img src="demo/schliff-demo.gif?v=2" alt="Schliff improving a skill from 56.9 to 99.9" width="720">
+</p>
+```
+Baseline:  █████░░░░░░░░░░░░░░░  54.0/100  [D]
+After 18x: ████████████████████  98.3/100  [S]
+What changed:
+  Structure         70 → 100     Added description, examples, concrete commands
+  Efficiency        35 → 93      Removed hedging language, improved density
+  Composability     30 → 90      Added scope, error behavior, dependencies
+  Clarity           90 → 100     Resolved vague references
+```
+> You wrote a skill. It worked. Three weeks later, triggers misfire, edge cases slip through, instructions contradict themselves. Schliff measures the damage (deterministic scoring, no LLM needed) and fixes it autonomously (Claude Code applies patches, measures delta, reverts regressions).
+[![GitHub stars](https://img.shields.io/github/stars/Zandereins/schliff?style=flat-square)](https://github.com/Zandereins/schliff)
+[![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](LICENSE)
+[![Tests](https://img.shields.io/endpoint?url=https://gist.githubusercontent.com/Zandereins/130bb61237b5b9b1536718e6a2296d4a/raw/schliff-tests.json)](.github/workflows/test.yml)
+[![Structural Score](https://img.shields.io/endpoint?url=https://gist.githubusercontent.com/Zandereins/130bb61237b5b9b1536718e6a2296d4a/raw/schliff-score.json)](skills/schliff/scripts/score-skill.py)
+[![v6.0.0](https://img.shields.io/badge/Version-6.0.0-F59E0B)](CHANGELOG.md)
+[![Claude Code Skill](https://img.shields.io/badge/Claude_Code-Skill-8A2BE2)](https://docs.anthropic.com/en/docs/claude-code/skills)
+---
+## Try It — Demo in 3 minutes
+> **Note:** Schliff commands (`/schliff:*`) run inside [Claude Code](https://docs.anthropic.com/en/docs/claude-code), not in a regular terminal. Claude's intelligence decides which patches to apply — the scorer is deterministic, the improvement loop uses the LLM.
+```bash
+# 1. Install once (terminal, ~1 min)
+git clone https://github.com/Zandereins/schliff.git && bash schliff/install.sh
+# 2. Score the included demo skill (Claude Code, ~10 sec)
+/schliff:init demo/bad-skill/SKILL.md
+# 3. Watch it improve the demo skill (Claude Code, ~2 min)
+/schliff:auto
+```
+**What you'll see on the demo skill:** 18 autonomous iterations. Each one: patch → measure → keep or revert. Score climbs from 54 [D] to 98 [S]. Stops when ROI plateaus. Real-world skills take longer and may not reach [S] — complex skills plateau around [A] to [S] depending on their eval suite coverage.
+**Prerequisites:** Python 3.9+, Bash, Git, jq
+Already have skills? Run `/schliff:doctor` to scan all installed skills and show health grades + token costs.
+---
+## What Schliff Fixes
+Real improvements from the included demo skill:
+| Problem | What Schliff does | Result |
+|---------|-------------------|--------|
+| Triggers misfire | Keyword matching + negative boundaries | **0% → 89%** accuracy |
+| Missing structure | Added examples, edge cases, frontmatter | **75 → 100**/100 |
+| Vague instructions | Replaced hedging with concrete commands | **35 → 93**/100 |
+| No scope boundaries | Added handoff declarations + "do NOT use" | **40 → 100**/100 |
+Automated. No human intervention. Stops when ROI plateaus.
+---
+## This Is For You If
+- **Skill Creator** — Run `/schliff:init` on your v1 skill to get a baseline + eval suite
+- **Skill Maintainer** — Run `/schliff:auto` to grind any skill from [C] to [S] overnight
+- **Fleet Manager (10+ skills)** — Run `/schliff:doctor` to scan everything, detect conflicts + token costs
+- **Quality Gate** — Run `/schliff:eval` before shipping, or use the [GitHub Action](#github-action) in CI
+---
+## Why It Works
+**Autonomous** — Runs unattended. Applies patches, measures delta, reverts regressions, stops when ROI drops. No prompts, no babysitting.
+**Deterministic scoring** — The 7-dimension scorer is pure Python, no LLM. Same input, same output. The improvement loop (`/schliff:auto`) runs inside Claude Code — Claude decides which patches to apply, but 60-70% of fixes follow deterministic rules (frontmatter, noise removal, TODO cleanup).
+**Empirical** — 7 scoring dimensions (structure, triggers, quality, edges, efficiency, composability, clarity) + optional runtime validation against actual Claude behavior.
+**Learns** — Episodic memory remembers which strategies worked across sessions. Predicts success before trying. Your 50th skill improves faster than your 1st.
+**Scales** — MinHash + LSH mesh analysis detects trigger conflicts across 50+ skills in O(n). Doctor command shows health grades for your entire skill collection.
+---
+## Autoresearch for Claude Code
+Inspired by [Karpathy's autoresearch](https://github.com/karpathy/autoresearch) (50K+ stars) — Schliff applies the same autonomous improvement loop to Claude Code skills:
+| | Karpathy's autoresearch | Schliff |
+|---|---|---|
+| **Target** | ML training scripts | Claude Code SKILL.md files |
+| **Metric** | 1 (val_bpb) | 7 dimensions |
+| **Patches** | 100% LLM | 60-70% deterministic |
+| **Memory** | None | Cross-session episodic store |
+| **Fleet** | 1 file | 50+ skills (Doctor + Mesh) |
+Both run overnight. Both stop when ROI plateaus. Both improve unattended.
+---
+## Commands
+### Core
+| Command | What It Does |
+|---------|--------------|
+| `/schliff` | Full autonomous loop with GOAL + METRIC |
+| `/schliff:doctor` | Scan ALL installed skills, show health summary |
+| `/schliff:auto` | Self-driving auto-improve (deterministic patches, no prompts) |
+| `/schliff:init` | Bootstrap eval suite + baseline from any SKILL.md |
+| `/schliff:report` | Generate shareable markdown report with badge |
+### Analyze & Debug
+| Command | What It Does |
+|---------|--------------|
+| `/schliff:analyze` | One-shot gap analysis with ranked recommendations |
+| `/schliff:bench` | Establish quality baseline for a skill |
+| `/schliff:eval` | Run eval suite assertions |
+| `/schliff:mesh` | Detect trigger conflicts across all installed skills |
+| `/schliff:triage` | Cluster failures, auto-generate fixes |
+| `/schliff:log-failure` | Log a skill failure for later triage |
+| `/schliff:update` | Update Schliff to latest version |
+---
+<details>
+<summary><b>How It Scores</b> — 7 dimensions + optional runtime</summary>
+Two modes, one decision:
+**Structural Score** (default) — Instant, zero LLM cost. Pure Python analysis of file organization, trigger keywords, eval coverage, edge cases, efficiency, composability. No API calls needed. Use `schliff score SKILL.md` from any terminal or `/schliff:bench` in Claude Code.
+**Runtime Score** (`--runtime`) — Invokes Claude with test prompts, validates actual behavior against assertions. Requires Claude CLI. Use before shipping to production.
+**Improvement Loop** (`/schliff:auto`) — Runs **inside Claude Code**. Claude reads the scorer output, picks the highest-impact fix, patches the SKILL.md, re-scores, keeps or reverts. This is where the LLM intelligence lives. The scorer is the ruler; Claude is the craftsman.
+| Dimension | Weight | What It Measures |
+|-----------|--------|-----------------|
+| Structure | 15% | Frontmatter, headers, examples, progressive disclosure |
+| Trigger Accuracy | 20% | TF-IDF keyword overlap against eval suite prompts |
+| Eval Coverage | 20% | Assertion breadth and eval suite coverage |
+| Edge Coverage | 15% | Edge case definitions in eval suite |
+| Token Efficiency | 10% | Information density, signal-to-noise ratio |
+| Composability | 10% | Scope boundaries, handoff declarations |
+| Clarity | 5% | Contradiction detection, vague references, ambiguity |
+| Runtime *(opt-in)* | 10% | Actual Claude behavior against assertions |
+Grades: **S** (>=95), **A** (>=85), **B** (>=75), **C** (>=65), **D** (>=50), **E** (>=35), **F** (<35).
+Full scoring methodology: [docs/SCORING.md](docs/SCORING.md)
+</details>
+<details>
+<summary><b>Dashboard</b> — Health overview for any skill</summary>
+```
+======================================================================
+  Schliff Health Dashboard: schliff
+======================================================================
+  Structural Score: ███████████████████░  95.4/100  [S]
+    [7/8 dimensions, 90% coverage]
+  Dimensions:
+    structure       ██████████  100/100
+    triggers        █████████░   95/100
+    quality         █████████░   91/100
+    edges           ██████████  100/100
+    efficiency      ████████░░   84/100
+    composability   ██████████  100/100
+    clarity         ██████████  100/100
+======================================================================
+```
+</details>
+<details>
+<summary><b>Auto-Improve</b> — Autonomous grinding with EMA-based stopping</summary>
+```
+Scoring baseline...
+Baseline: 95.4/100 (7 dims)
+--- Iteration 1 ---
+Stopping: composite >= 98 (95.4)
+  Schliff Auto-Improve Complete
+  ──────────────────────────────────────────────────
+  Score:  95 → 95.4/100  ███████████████████░  (+0.0)  [S]
+  Iters:  0  |  Kept: 0  |  Time: 1s
+  Stop:   composite >= 98 (95.4)
+  (Already near-optimal — consider runtime eval for further gains)
+```
+</details>
+<details>
+<summary><b>Doctor</b> — Scan all installed skills at once</summary>
+```
+======================================================================
+  Schliff Doctor — Skill Health Check
+======================================================================
+  1 skills scanned | 1 healthy | 4 mesh issues
+  Skill                      Score  Grade   Dims  Issues  Action
+  --------------------------------------------------------------------
+  schliff                   90    [A]    7/8       0  Healthy
+  Mesh Health: 68/100 (4 cross-skill issues)
+  Run /schliff:mesh for details.
+  NOTE: Scores are STRUCTURAL — they measure file organization,
+  not runtime effectiveness. Use --runtime for validated scoring.
+======================================================================
+```
+</details>
+<details>
+<summary><b>What's New in v6.0</b></summary>
+| Feature | Description |
+|---------|-------------|
+| Rebrand to Schliff | "The finishing cut" — German for polish/grind |
+| Clarity as Default | 7th dimension always active (contradictions, vague refs, ambiguity) |
+| Token Cost Estimation | Doctor shows per-skill token cost + fleet total |
+| GitHub Action | `Zandereins/schliff@v6` — CI quality gate with PR comments |
+| pip CLI | `schliff score SKILL.md` — works without Claude Code |
+| Actionable Doctor | Copy-paste commands with full skill paths |
+| Trigger Confidence | Small eval suites (<8 triggers) capped at score 60 |
+| Context-aware Contradictions | "run tests" vs "run tests in production" distinguished |
+| Anti-gaming | Empty headers, repetitive markers, binary composability fixed |
+| 443 Tests (unit + integration + proof) | +70 stress tests, +28 edge cases, +76 patterns, +20 golden files |
+| 40 Security Fixes | Shell injection, prompt injection, ReDoS, supply chain |
+</details>
+---
+## Quality & Security
+Schliff scores itself — 7 dimensions, same engine, no exceptions.
+| Metric | Value | What This Means |
+|--------|-------|-----------------|
+| Structural Score | **95.4 / 100** [S] | Production-ready. 10 composability sub-checks, all passing. |
+| Tests | **443 passing** | 318 unit + 99 integration + 20 self + 6 proof. Every scorer rule tested. |
+| Security | **40 fixes** | Shell injection, prompt injection, ReDoS, supply chain. |
+| Dimensions | **7 + runtime** | Transparent, rule-based, explainable scoring. |
+| Journey | v1.0 (62.5) → v6.0 (95.4) | 7 major versions. Continuous improvement, no regressions. |
+[Scoring methodology](docs/SCORING.md) | [Security details](CHANGELOG.md)
+---
+## GitHub Action
+Score skills in CI. Block PRs that regress. The Codecov for SKILL.md files.
+```yaml
+- uses: Zandereins/schliff@v6
+  with:
+    skill-path: '.claude/skills/my-skill/SKILL.md'
+    minimum-score: '75'      # blocks PR if below
+    comment-on-pr: 'true'    # posts score table on PR
+```
+---
+## CLI
+Score any skill without Claude Code:
+```bash
+pip install schliff
+schliff score path/to/SKILL.md          # score a skill
+schliff score path/to/SKILL.md --json   # JSON output
+schliff doctor                           # scan all installed skills
+```
+---
+## Ecosystem
+`skill-creator` builds a v1 skill. Schliff grinds it to production quality.
+```
+skill-creator → v1 SKILL.md → /schliff:auto → autonomous grinding → ship
+```
+- **[skill-creator](https://github.com/anthropics/courses/tree/master/claude-code/09-skill-creator)** — generate the first draft
+- **[autoresearch](https://github.com/uditgoenka/autoresearch)** — generalized autonomous research for Claude Code
+---
+## Badge
+Score your skill and add this to your README:
+```markdown
+[![Schliff: 95 [S]](https://img.shields.io/badge/Schliff-95%2F100_%5BS%5D-brightgreen)](https://github.com/Zandereins/schliff)
+```
+[![Schliff: 95 [S]](https://img.shields.io/badge/Schliff-95%2F100_%5BS%5D-brightgreen)](https://github.com/Zandereins/schliff)
+---
+## Contributing
+Found a bug in the scorer? Add a test case to `eval-suite.json` and open an issue.
+Want to improve scoring logic? Edit `score-skill.py`, run `bash test-integration.sh`, and PR the diff.
+---
+## Next Steps
+1. [Try the 3-minute demo](#try-it--demo-in-3-minutes) — see a skill go from [D] to [S]
+2. Run `/schliff:doctor` on your own skills — instant health check
+3. Add the [GitHub Action](#github-action) to your CI — quality gate for every PR
+4. [Read the scoring methodology](docs/SCORING.md) — understand what each dimension measures
+Questions? [Open an issue](https://github.com/Zandereins/schliff/issues) — we respond fast.
+---
+## License
+MIT — do whatever you want.
+---
+*Built by [Franz Paul](https://github.com/Zandereins) with Claude Code.*
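
The dimension weights and grade thresholds listed under "How It Scores" in the PKG-INFO description can be sketched as a small calculation. This is an illustration only, not schliff's scorer: the names `WEIGHTS`, `GRADES`, `composite`, and `grade` are hypothetical, the mapping of the table's "Eval Coverage" row to the dashboard's `quality` row is an assumption, and so is the normalization (dividing by the total weight of the dimensions actually measured). Under those assumptions the sketch reproduces the 95.4 [S] shown in the sample dashboard for 7 of 8 dimensions.

```python
# Illustrative sketch; names and normalization are assumptions, not schliff's API.

# Weights in percent, taken from the "How It Scores" table.
WEIGHTS = {
    "structure": 15,
    "triggers": 20,       # "Trigger Accuracy"
    "quality": 20,        # assumed to correspond to "Eval Coverage"
    "edges": 15,          # "Edge Coverage"
    "efficiency": 10,     # "Token Efficiency"
    "composability": 10,
    "clarity": 5,
    "runtime": 10,        # opt-in
}

# Grade thresholds from the README, highest first; anything below 35 is F.
GRADES = [(95, "S"), (85, "A"), (75, "B"), (65, "C"), (50, "D"), (35, "E")]


def composite(scores):
    """Weighted mean over the dimensions present in `scores`.

    Dividing by the weight of the measured dimensions is an assumption;
    it lets the result stay on a 0-100 scale when runtime is skipped.
    """
    total_weight = sum(WEIGHTS[dim] for dim in scores)
    if total_weight == 0:
        raise ValueError("no dimensions measured")
    return sum(WEIGHTS[dim] * value for dim, value in scores.items()) / total_weight


def grade(score):
    """Map a 0-100 score to the letter grades listed in the README."""
    for threshold, letter in GRADES:
        if score >= threshold:
            return letter
    return "F"


# Per-dimension values copied from the sample dashboard output.
dashboard = {
    "structure": 100,
    "triggers": 95,
    "quality": 91,
    "edges": 100,
    "efficiency": 84,
    "composability": 100,
    "clarity": 100,
}

total = composite(dashboard)
print(round(total, 1), grade(total))  # prints: 95.4 S
```

Keeping the thresholds in a descending list means a new grade band only needs one extra tuple, and omitting `runtime` from the input is enough to score the structural dimensions alone.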