npm - kc-beta - Versions diffs - 0.7.3 → 0.8.1 - Mend

kc-beta 0.7.3 → 0.8.1

Files changed (109)

package/template/skills/zh/meta/tree-processing/SKILL.md DELETED

@@ -1,121 +0,0 @@
----
-name: tree-processing
-description: >
-  Design production-grade document chunking mechanisms for verification workflows. Use when
-  building the chunking step of a workflow that will run repeatedly on many documents.
-  The approach: observe sample documents, find structural patterns, write a chunking script
-  in code, that script runs in production. Also use for navigating large documents via
-  hierarchical structure when a rule targets a specific section.
-  For quick, cheap batch chunking during exploration, use document-chunking instead.
----
-# Tree Processing
-Most verification rules do not need the entire document. They need a specific section, a specific table, a specific disclosure. The tree is your map for navigating large documents efficiently.
-## Production Chunking Methodology
-For verification workflows that process many documents, the chunking mechanism must be precise, consistent, and fast. The approach:
-1. **Observe**: Read 3-5 sample documents. Note their structure — headers, numbering, section patterns.
-2. **Find patterns**: Identify what's consistent (header format, numbering convention, TOC structure).
-3. **Write code**: Design a chunking script (regex-based splitter, header detector, TOC parser) that captures the pattern.
-4. **Test**: Run the script on samples. Verify it produces correct, consistent chunks.
-5. **Deploy**: The script runs in production workflows. It's deterministic, free, and fast.
-This is different from `document-chunking` (quick, cheap splits for exploration). Production chunking is a one-time design effort that pays off across all documents of the same type.
-## Why Trees
-Two reasons:
-1. **Rules have scope.** "The risk disclosure in Chapter 5 must contain..." — you need to find Chapter 5, not read 1000 pages.
-2. **Worker LLMs have limits.** A 16K-32K context window cannot hold a 1000-page document. You must narrow to the relevant section.
-The tree structure solves both: it tells you WHERE things are, and lets you extract JUST what you need.
-## Building the Tree
-### Step 1: Discover the Structure
-Before building a tree parser, explore several sample documents to find structural patterns. Look for:
-- **Header conventions**: Do chapters start with "Chapter X"? "第X章"? "Part X"? A Roman numeral?
-- **Numbering systems**: "1.1.2", "Article 3", "(a)(i)", hierarchical numbering?
-- **Visual markers**: Bold text, larger font, horizontal rules, page breaks before chapters?
-- **Table of contents**: Most formal documents have one. It is the document's own tree.
-Spend time here. The patterns you find determine whether the tree builder is a simple regex or a complex parser.
-### Step 2: Choose the Parser
-**If patterns are consistent** (they usually are in regulated documents):
-- Write a regex-based splitter. For example:
-  - `^第[一二三四五六七八九十百千]+章` for Chinese chapter headers
-  - `^Chapter \d+` for English
-  - `^\d+\.\d+(\.\d+)*\s` for numbered sections
-- This is fast, deterministic, and reliable. Prefer this when it works.
-**If patterns are inconsistent or absent**:
-- Use the LLM-guided wedge-driving approach (see `rule-extraction/references/chunking-strategies.md` for the full algorithm: rolling context window, K-token quoting, Levenshtein fuzzy matching).
-- This is slower and costs LLM calls, but handles unstructured documents. The rolling window means even very large unstructured leaf nodes can be chunked incrementally.
-**If the document has a table of contents**:
-- Parse the TOC first. It gives you the tree structure and page numbers for free.
-- Then use the TOC-derived structure to split the document body.
-### Step 3: Build the Tree
-The tree is a simple nested structure:
-```
-Document
-├── Part I: General Provisions
-│   ├── Chapter 1: Definitions (pages 1-15)
-│   └── Chapter 2: Scope (pages 16-22)
-├── Part II: Capital Requirements
-│   ├── Chapter 3: Minimum Capital (pages 23-45)
-│   │   ├── Section 3.1: Tier 1 Capital
-│   │   └── Section 3.2: Tier 2 Capital
-│   └── Chapter 4: Risk Weighting (pages 46-78)
-└── Part III: Disclosure
-    └── Chapter 5: Risk Disclosure (pages 79-120)
-```
-Each node stores: the header text, the level, the start/end positions in the document, and the content size (in tokens or characters).
-### Step 4: Use the Tree
-Given a rule that says "check the risk disclosure section":
-1. **Search the tree** for the relevant node. Match the rule's scope description against node headers.
-   - Exact match: "Chapter 5" → find node with "Chapter 5" header.
-   - Semantic match: "risk disclosure section" → find node whose header or content relates to risk disclosure. May need fuzzy matching or LLM classification.
-2. **Extract the content** of that node (and optionally its children).
-3. **Check the size.** If the content fits in the worker LLM's context window, use it directly. If not, descend to child nodes and find the specific subsection needed.
-## The Full Context → Chapter → Entity Pipeline
-This is the standard narrowing funnel for extracting entities for verification:
-1. **Full context**: Use the tree to understand the document structure. Know where everything is.
-2. **Chapter**: Navigate to the specific section that the rule targets. Extract its content.
-3. **Entity**: Within the chapter content, extract the specific entity (number, text, clause) using the techniques from `entity-extraction`.
-For worker LLMs with 16K-32K context:
-- The chapter content + the extraction prompt must fit in the context window.
-- If a chapter is too large, descend further in the tree.
-- Always include the parent header chain for context: "Part II > Chapter 3 > Section 3.1" so the LLM knows where this content sits in the document.
-## Caching and Reuse
-Build the tree once per document, reuse across all rules:
-- Save the tree structure as JSON alongside the parsed document.
-- Multiple rules may need different sections of the same document. The tree lets each rule navigate directly to its section without re-parsing.
-## Edge Cases
-- **Flat documents**: Some documents have no structural hierarchy. Treat the entire document as one node. Use LLM-guided chunking if it exceeds the context window.
-- **Deeply nested structures**: Some legal documents have 6+ nesting levels. Build all levels but typically only navigate 2-3 levels deep for any given rule.
-- **Cross-section references**: A section might reference "as defined in Section 1.2." When extracting, you may need content from multiple tree nodes. Collect them into a single context for the LLM.
-- **Appendices and annexes**: Often contain critical tables and data. Include them as top-level nodes in the tree.
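
Note on the deleted file above: the regex-based splitter and tree described in Steps 2 to 4 can be sketched roughly as follows. This is an illustrative sketch only, not code from the package. The header patterns are the ones quoted in the skill text; the `Node` class, the level numbers assigned to each pattern, and the helper names `build_tree` and `find_node` are assumptions made for the example.

```python
import re
from dataclasses import dataclass, field

# Patterns quoted from the skill text; the level assigned to each is an assumption.
HEADER_PATTERNS = [
    (1, re.compile(r"^第[一二三四五六七八九十百千]+章")),   # Chinese chapter headers
    (1, re.compile(r"^Chapter \d+")),                      # English chapter headers
    (2, re.compile(r"^\d+\.\d+(\.\d+)*\s")),               # numbered sections
]


@dataclass
class Node:
    """Hypothetical tree node: header text, level, and line span [start, end)."""
    header: str
    level: int
    start: int
    end: int
    children: list = field(default_factory=list)


def header_level(line):
    """Return the header level of a line, or None if it is body text."""
    for level, pattern in HEADER_PATTERNS:
        if pattern.match(line):
            return level
    return None


def build_tree(text):
    """Split text on header lines and nest the sections by level."""
    lines = text.splitlines()
    root = Node("Document", 0, 0, len(lines))
    stack = [root]
    for i, line in enumerate(lines):
        level = header_level(line)
        if level is None:
            continue
        # A new header closes every open node at the same or a deeper level.
        while stack[-1].level >= level:
            stack.pop().end = i
        node = Node(line.strip(), level, i, len(lines))
        stack[-1].children.append(node)
        stack.append(node)
    return root, lines


def find_node(node, keyword):
    """Depth-first exact-substring search on node headers."""
    if keyword in node.header:
        return node
    for child in node.children:
        hit = find_node(child, keyword)
        if hit is not None:
            return hit
    return None
```

A caller would extract a section with `lines[node.start:node.end]` and check its size before sending it to a worker LLM. Semantic matching ("risk disclosure section") is not covered here; the sketch only handles the exact-match case.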

package/template/skills/zh/meta-meta/skill-to-workflow/SKILL.md DELETED

@@ -1,188 +0,0 @@
----
-name: skill-to-workflow
-description: Distill a proven verification skill into a Python workflow with worker LLM prompts. Use when a rule skill has been tested and reaches the SKILL_ACCURACY threshold defined in .env. Covers the decision of what to implement as code vs LLM calls, prompt engineering for small context windows, model tier selection and progressive downgrade, and testing workflows against the coding agent's own results as ground truth. Also use when optimizing existing workflows for cost or speed.
----
-# Skill to Workflow
-The skill is the ground truth. The workflow is a cheaper, faster approximation. Your job is to make the approximation as good as the original while being as cheap as possible.
-## Engineering Goal
-Optimize the full chain: **shortest workflow** (fewest nodes) → **smallest model per node** (cheapest tier that meets accuracy) → **shortest prompt per model** (minimum tokens). This is the engineering objective — not prompt template sophistication or framework compliance.
-## When to Start
-A skill is ready for workflow distillation when:
-- It has been tested on all documents in Samples/.
-- Its accuracy meets or exceeds the SKILL_ACCURACY threshold in `.env`.
-- Edge cases are documented in the skill's `assets/corner_cases.json`.
-- You understand the rule well enough to explain exactly how you verify it.
-If any of these are not true, go back and iterate on the skill first.
-## The Distillation Decision
-For each step in your skill-based verification process, ask:
-### Can this be done with regex or Python? (Cost: zero)
-- Date extraction with known formats → regex
-- Numeric comparison against threshold → Python arithmetic
-- Chinese numeral conversion → Python lookup table
-- Format validation (ID numbers, codes) → regex
-- Table cell extraction from structured markdown → string manipulation
-If yes, write it as code. These are free, fast, and deterministic.
-### Does this require language understanding? (Cost: worker LLM call)
-- Finding the relevant section in a document → LLM
-- Extracting an entity described in natural language → LLM
-- Judging semantic adequacy ("adequate risk disclosure") → LLM
-- Resolving ambiguous references → LLM
-If yes, design a worker LLM prompt. Use the smallest model tier that maintains accuracy.
-### The hybrid approach (most common)
-Most rules are a mix: regex extracts the number, Python compares it to the threshold, LLM handles the exceptional cases. Design the workflow as a pipeline where cheap steps run first and expensive steps run only when needed.
-### When regex alone isn't enough — decision rubric
-Before declaring distillation complete, audit each rule's `verification_type` / `metric` / `evidence_type` (or equivalent fields in your catalog). For rules where the required verification is one of:
-- **Semantic** ("is this a positive guarantee or a disclaimer?")
-- **Contextual** ("interpret this in light of the document's product type")
-- **Counterfactual** ("what should this value be, given the other fields?")
-- **Cross-field arithmetic** ("does 期初 + 收益 - 分配 = 期末?")
-regex alone rarely suffices. Three acceptable forms:
-1. **Pure regex with documented limits** — write the regex check, include a comment explaining the fragility (e.g., "matches syntactic pattern only; cannot detect semantic guarantees")
-2. **Hybrid regex + LLM** — regex baseline catches obvious cases, `worker_llm_call` (tier1-2) handles ambiguous ones. The hybrid workflow declares which rule_ids escalate.
-3. **Pure LLM via `worker_llm_call`** — for fully semantic rules where no regex baseline is meaningful.
-Don't ship pure regex for a rule whose `verification_type` is `judgment` / `semantic` without the documented-limits note. Future-you or a colleague will assume the regex is sufficient and that bug will hide for months.
-### Worker LLM cost-aware tier choice
-If you do escalate to LLM:
-- **tier1** (most capable, ~¥0.001-0.002/doc): cross-field reasoning, ambiguity resolution, rules that benefit from chain-of-thought
-- **tier2-3**: bulk extraction with simple semantic checks
-- **tier4** (cheapest): high-volume keyword-spotting that regex can't handle. Note: tier4 models on SiliconFlow are Qwen3.5 thinking-mode — `content` can return empty if `reasoning_content` consumes max_tokens. Test with realistic prompts before relying. If you see empty responses, either bump max_tokens to ≥8192, shorten your prompt, or fall back to tier1-2.
-Both v0.7.1 audit conductors (DS and GLM) defaulted to all-regex distillation and only added LLM escalation when the human user explicitly asked for "V2 with worker LLM". If your rule catalog has any rules where the verification is genuinely semantic, you should reach for `worker_llm_call` yourself — don't wait to be asked.
-## Workflow Structure
-A workflow is a Python file (or small set of files) in `workflows/`:
-```
-workflows/
-  rule_001_capital_adequacy/
-    workflow_v1.py        # The main workflow script
-    prompts/
-      extract.txt         # Worker LLM prompt for extraction
-      judge.txt           # Worker LLM prompt for judgment (if needed)
-    config.json           # Model assignments, thresholds
-```
-The workflow file should have a clear entry point:
-```python
-def verify(document_text: str, config: dict) -> dict:
-    """
-    Returns:
-        {
-            "rule_id": "R001",
-            "result": "pass" | "fail" | "missing" | "error",
-            "extracted_value": ...,
-            "confidence": 0.0-1.0,
-            "comment": "..." (only when fail),
-            "model_used": "...",
-            "llm_calls": int,
-            "llm_tokens": int
-        }
-    """
-```
-This is a reference, not a rigid contract. Adapt the structure to the specific rule. The important thing is that every workflow produces a result that can be compared against the skill-based ground truth.
-## Prompt Engineering for Worker LLMs
-Worker LLMs have smaller context windows (typically 16K-32K tokens). Design prompts that:
-1. **Are self-contained.** Include everything the model needs in the prompt. Do not assume the model has context from previous calls.
-2. **Specify the output format.** "Return a JSON object with fields: value, confidence, reasoning." Structured output reduces parsing errors.
-3. **Include the narrowed context.** Do not send the entire document. Use the tree-processing pipeline (full document → relevant chapter → relevant section) to narrow the context before calling the worker LLM.
-4. **Are written in the document's language.** Chinese documents get Chinese prompts. English documents get English prompts. Do not mix languages in a single prompt.
-5. **Provide examples sparingly.** One or two examples help. Ten examples waste context window and risk overfitting.
-## Model Tier Selection
-Start with the highest tier (TIER1) for each step. Measure accuracy. Then try lower tiers:
-1. Run the workflow with TIER1 on all Samples/. Record accuracy per step.
-2. For each step, try TIER2. If accuracy stays above WORKFLOW_ACCURACY, keep TIER2.
-3. Continue downgrading per step until accuracy drops below threshold.
-4. Record the optimal tier per step in `config.json`.
-Different steps within the same workflow can use different model tiers. Extraction might need TIER2 while judgment might work fine with TIER3.
-### Formal Downgrade Protocol
-The basic approach above works, but a more rigorous protocol prevents premature tier commitments:
-**Direction**: Start top-down (TIER1 → TIER4) to establish the accuracy ceiling first. You need to know the best possible accuracy before trading it for cost savings.
-**Minimum test runs**: Run at least a meaningful number of documents (e.g., min(10, total_samples)) at each candidate tier before making a tier decision. Small samples are unreliable — a 3-document test could be misleading.
-**Accuracy delta trigger**: If a lower tier's accuracy is significantly below the higher tier (e.g., >5 percentage points), stay at the higher tier for that step. If the delta is within tolerance, use the cheaper tier.
-**Per-step independence**: Each workflow step is assessed separately. Record the optimal tier per step in `config.json`. Do not assume the whole workflow must use one tier.
-**Re-assessment trigger**: If production quality control shows a step's accuracy degrading (e.g., due to new document formats), re-run the tier assessment for that step.
-**Model-task recommendation list**: Maintain a per-project mapping of (task_type → recommended_tier) based on your testing experience. Over time, these lists can be collected across projects to build generalized tier recommendations.
-All numbers here (10 documents, 5 percentage points, etc.) are recommended starting points. The coding agent and developer user should calibrate these — or replace them entirely with a different assessment approach — based on their specific volume, accuracy requirements, and cost constraints. The pattern matters: **test at each tier → compare accuracy → commit when within tolerance → re-assess on degradation**.
-This follows the same tier-transition framework as parser escalation in `document-parsing`: a quality/accuracy score drives the decision to stay, escalate, or skip.
-## Testing Against Ground Truth
-The coding agent's skill-based results are the ground truth. For each document in Samples/:
-1. Run the workflow.
-2. Compare the workflow's result against the skill-based result.
-3. Log discrepancies: which step failed, what was expected vs actual.
-4. Compute accuracy: `(matching results) / (total documents)`.
-5. If accuracy < WORKFLOW_ACCURACY, diagnose and fix. Use `evolution-loop` methodology.
-## Versioning
-Each iteration of a workflow is a new version file: `workflow_v1.py`, `workflow_v2.py`, etc. Track which version is active in `config.json`. See `version-control` skill for the full methodology.
-## Releasing Workflows
-Once workflows hit accuracy threshold, they can be packaged for end users via the `release` tool. Each release is a self-contained directory under `output/releases/<slug>/` with the pinned workflows, a Python runner, a confidence scorer, an HTML dashboard generator, and a `serve.sh` helper. The bundle has no kc-beta dependency — anyone with Python and a worker LLM API key can run `python run.py <doc>` and produce verification results.
-What to include is your call: all rules in catalog, or a curated subset via the `include` parameter; bundling 1-3 representative samples as `fixtures/` if you want the recipient to be able to dry-run without their own data.
-The `release` tool snapshots the workspace first (git tag `snap/release-<slug>`), so the bundle is regenerable from git even if `output/releases/` is later cleaned. Decide when to release — there's no automation, no forced cadence. Typical triggers: workflows reach SKILL/WORKFLOW_ACCURACY thresholds, a stakeholder needs a hand-off, a production cron should run pinned versions instead of latest. Discuss with the developer user.
-## Cost Tracking
-Track the cost of each workflow run:
-- Number of LLM calls per document.
-- Total tokens consumed per document.
-- Model tier used per call.
-This data helps the developer user understand the production cost and informs further optimization.
-## Worker LLM API
-Worker LLMs are accessed via SiliconFlow API. Connection details are in `.env`:
-- `SILICONFLOW_API_KEY` for authentication
-- `SILICONFLOW_BASE_URL` for the API endpoint
-- Model names in `TIER1` through `TIER4`
-See `references/worker-llm-catalog.md` for current model capabilities and context window sizes.
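
Note on the deleted file above: the accuracy formula and the per-step downgrade protocol it describes can be sketched roughly as follows. This is an illustrative sketch only, not code from the package. The function names are hypothetical, the defaults (10 documents, 5 percentage points) are the starting points the skill text itself suggests, and comparing each lower tier against the TIER1 ceiling rather than against the adjacent tier is an assumption.

```python
TIERS = ["TIER1", "TIER2", "TIER3", "TIER4"]  # most capable first


def accuracy(workflow_results, skill_results):
    """(matching results) / (total documents), with skill results as ground truth."""
    if not skill_results or len(workflow_results) != len(skill_results):
        raise ValueError("need one workflow result per ground-truth result")
    matches = sum(1 for w, s in zip(workflow_results, skill_results) if w == s)
    return matches / len(skill_results)


def enough_runs(n_tested, total_samples, floor=10):
    """Minimum test runs: at least min(floor, total_samples) documents per tier."""
    return n_tested >= min(floor, total_samples)


def choose_tier(accuracy_by_tier, threshold, max_delta=0.05):
    """Walk top-down and keep the cheapest tier that stays within tolerance.

    accuracy_by_tier maps tier name -> measured accuracy for ONE workflow step.
    A tier is accepted only if it meets the threshold and is no more than
    max_delta below the TIER1 ceiling (assumption: the ceiling is the reference).
    """
    ceiling = accuracy_by_tier[TIERS[0]]
    chosen = TIERS[0]
    for tier in TIERS[1:]:
        acc = accuracy_by_tier.get(tier)
        if acc is None or acc < threshold or ceiling - acc > max_delta:
            break  # stop at the first tier that is untested or out of tolerance
        chosen = tier
    return chosen
```

Because each step is assessed independently, a caller would run `choose_tier` once per workflow step and record the result for that step, for example in `config.json`.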