PyPI - archetype-md - Versions diffs - 0.1.0__py3-none-any.whl - Mend

archetype-md 0.1.0__py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (23)

archetype/README.md +188 -0
archetype/__init__.py +20 -0
archetype/markdown/__init__.py +67 -0
archetype/markdown/_ast_normalizer.py +175 -0
archetype/markdown/_projector.py +288 -0
archetype/markdown/_shared.py +58 -0
archetype/markdown/annotations.py +63 -0
archetype/markdown/elements.py +125 -0
archetype/markdown/errors.py +27 -0
archetype/markdown/extractor.py +156 -0
archetype/markdown/introspection.py +110 -0
archetype/markdown/meta_validation.py +264 -0
archetype/markdown/parser.py +19 -0
archetype/markdown/renderer.py +297 -0
archetype/markdown/template_model.py +47 -0
archetype/templating/__init__.py +52 -0
archetype/templating/environment.py +50 -0
archetype/templating/resolve.py +45 -0
archetype_md-0.1.0.dist-info/METADATA +12 -0
archetype_md-0.1.0.dist-info/RECORD +23 -0
archetype_md-0.1.0.dist-info/WHEEL +4 -0
archetype_md-0.1.0.dist-info/entry_points.txt +4 -0
archetype_md-0.1.0.dist-info/licenses/LICENSE +21 -0

archetype/README.md ADDED

@@ -0,0 +1,188 @@
+# Archetype
+Pydantic as the single source of truth for agentic systems.
+## 1. Purpose
+Agentic systems constantly move structured data across the fuzzy boundary
+between typed, deterministic code and generative text: a prompt describes the
+expected output shape, the LLM emits markdown, code parses that markdown back
+into typed objects, and a downstream agent re-renders it for the next step.
+When the prompt, the parser, and the type definition live in separate places,
+they drift apart silently, and that drift is one of the most common sources of
+bugs in LLM pipelines.
+Archetype eliminates the drift by making one annotated Pydantic class the
+authoritative declaration. From that single class, the library derives:
+- the **markdown template** the agent is instructed to fill in,
+- the **renderer** that turns instances back into markdown,
+- the **parser/validator** that turns LLM output back into instances,
+- the **JSON schema** for tool/structured-output integration,
+- the **field introspection** that prompts use to describe their own
+  expected sections (`template_fields(Model)`),
+- the **Jinja resolution context** for one-pass instruction templates.
+A schema change in one place propagates everywhere automatically. Renaming
+a field, adding a section, or changing a heading's structure cannot
+desynchronize the prompt from the parser, because both are projections of
+the same class.
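The projection idea can be shown without the library. The sketch below is a minimal stand-in, not archetype's implementation: it assumes a hypothetical `Heading` marker class and a hypothetical `skeleton()` helper, and uses a plain dataclass instead of a Pydantic model. It reads the `Annotated[...]` metadata with `typing.get_type_hints(..., include_extras=True)` and emits one section per marked field.

```python
from dataclasses import dataclass
from typing import Annotated, get_args, get_origin, get_type_hints


class Heading:
    """Hypothetical marker: render this field as its own '## ' section."""


@dataclass
class Review:
    title: str
    summary: Annotated[str, Heading()]
    open_questions: Annotated[str, Heading()]


def skeleton(model: type) -> str:
    """Derive a markdown skeleton from the class's own field annotations."""
    lines = ["# <title>"]
    # include_extras=True keeps the Annotated[...] metadata instead of stripping it.
    for name, hint in get_type_hints(model, include_extras=True).items():
        markers = get_args(hint)[1:] if get_origin(hint) is Annotated else ()
        if any(isinstance(m, Heading) for m in markers):
            heading = name.replace("_", " ").capitalize()
            lines += ["", f"## {heading}", f"<{name}>"]
    return "\n".join(lines)


print(skeleton(Review))
```

Renaming `open_questions` to `risks` changes the emitted `## ` heading with no other edit, which is the drift argument above in miniature.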
+## 2. Usage
+Archetype has two submodules:
+- `archetype.markdown` — typed markdown documents driven by annotated
+  Pydantic models (template generation, rendering, parsing, validation,
+  subtree extraction, heading-field introspection).
+- `archetype.templating` — a preconfigured Jinja environment with
+  markdown-aware globals (`template_fields`, `render_template`) and a
+  `resolve()` helper for one-pass instruction templating.
+### Declaring a document
+```python
+from typing import Annotated
+from archetype.markdown import (
+    MarkdownDocument, MarkdownHeader,
+    AsHeading, AsBulletList, TextTemplate,
+)
+class Finding(MarkdownHeader):
+    title: Annotated[str, TextTemplate("Finding {ordinal} - {value}")]
+    description: Annotated[str, AsHeading()]
+    evidence: Annotated[list[str], AsBulletList()]
+class Review(MarkdownDocument):
+    title: Annotated[str, TextTemplate("{value}")]
+    summary: Annotated[str, AsHeading()]
+    findings: list[Finding]
+```
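`TextTemplate("Finding {ordinal} - {value}")` has to work in both directions: it formats a heading on render and has to be matched again on parse. A minimal sketch of how one template string could drive both, assuming `{ordinal}` is a 1-based decimal position and `{value}` is free text (both assumptions; the library's placeholder semantics are not shown in this diff). `render_heading` and `template_regex` are hypothetical helpers.

```python
import re
import string

# Hypothetical capture groups; assumes {ordinal} is a decimal integer.
_GROUPS = {"ordinal": r"(?P<ordinal>\d+)", "value": r"(?P<value>.+)"}


def render_heading(template: str, ordinal: int, value: str) -> str:
    # Forward direction: plain str.format substitution.
    return template.format(ordinal=ordinal, value=value)


def template_regex(template: str) -> re.Pattern:
    # Reverse direction: escape the literal text and turn each placeholder
    # into a named capture group, anchored to the whole heading line.
    parts = []
    for literal, field, _spec, _conv in string.Formatter().parse(template):
        parts.append(re.escape(literal))
        if field is not None:
            parts.append(_GROUPS[field])
    return re.compile("^" + "".join(parts) + "$")


heading = render_heading("Finding {ordinal} - {value}", 2, "Stale cache")
match = template_regex("Finding {ordinal} - {value}").match(heading)
print(heading, match["ordinal"], match["value"])
```

Deriving the regex from the same string that formats the heading is what keeps rendering and parsing from disagreeing about the heading text.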
+### Rendering, parsing, introspecting
+```python
+from archetype.markdown import (
+    render_template, render_instance, validate_markdown, template_fields,
+)
+# Skeleton markdown to embed in an agent's prompt
+template_md = render_template(Review)
+# Turn an LLM's markdown reply (`llm_output: str`) back into a typed instance;
+# raises MarkdownValidationError if the reply does not match the model
+review: Review = validate_markdown(llm_output, Review)
+# Re-render an instance to markdown (e.g. as input to a downstream agent)
+markdown = render_instance(review)
+# Iterate heading metadata for prompt construction
+for field in template_fields(Review):
+    print(field.heading, field.description)
+```
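The render/parse pair only prevents drift if parsing a rendered instance gives the same values back. A stdlib-only sketch of that round-trip property, assuming a flat mapping of `## Heading` sections to text; the real `render_instance` and `validate_markdown` also handle nesting, lists, tables, and type validation, which this ignores. `render_sections` and `parse_sections` are hypothetical names.

```python
def render_sections(values: dict[str, str]) -> str:
    # One level-2 heading per field, with the body text underneath.
    blocks = [f"## {name}\n{text}" for name, text in values.items()]
    return "\n\n".join(blocks) + "\n"


def parse_sections(markdown: str) -> dict[str, str]:
    # Collect body lines under the most recent "## " heading; text before the
    # first heading is ignored.
    out: dict[str, list[str]] = {}
    current = None
    for line in markdown.splitlines():
        if line.startswith("## "):
            current = line[3:].strip()
            out[current] = []
        elif current is not None:
            out[current].append(line)
    return {name: "\n".join(body).strip() for name, body in out.items()}


review = {"Summary": "Two issues found.", "Next steps": "Patch the parser."}
assert parse_sections(render_sections(review)) == review
```

Body text that itself contains a line starting with `## ` would break this line-based parser, which is one reason to parse a markdown token stream rather than raw lines.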
+### Instruction templates with Jinja
+```python
+from archetype.templating import resolve
+# DesignerInput and _load_template() are application code, not archetype APIs.
+def designer_instructions_provider(state: DesignerInput) -> str:
+    return resolve(
+        _load_template(),
+        feature=state.feature_definition,
+    )
+```
+Inside the template:
+```jinja
+The feature definition has these sections:
+{% for field in template_fields(FeatureDefinition) %}
+- **{{ field.heading }}** — {{ field.description }}
+{% endfor %}
+Your output must match this structure:
+{{ render_template(DesignDocument) }}
+```
+Templates are expected to use only `{{ path }}`, `{% for x in path %}…{% endfor %}`,
+and the two registered globals: no filters, conditionals, macros, includes, or
+inheritance. This restriction is a convention and is not enforced at runtime.
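Because the restriction is only a convention, a lint pass could check it before templates ship. A rough regex-based sketch, assuming a hypothetical `find_disallowed` helper that is not part of archetype: it flags any `{% ... %}` tag other than `for`/`endfor` and any `|` filter inside `{{ ... }}`. It is not a Jinja parser; for example, a `|` inside a string literal would be a false positive.

```python
import re

# {% tag ... %} (optionally whitespace-trimmed with "{%-") and {{ expression }}
_TAG = re.compile(r"\{%-?\s*(\w+)")
_EXPR = re.compile(r"\{\{(.*?)\}\}", re.DOTALL)
_ALLOWED_TAGS = {"for", "endfor"}


def find_disallowed(source: str) -> list[str]:
    """List constructs outside the subset; an empty list means the template conforms."""
    problems: list[str] = []
    for match in _TAG.finditer(source):
        if match.group(1) not in _ALLOWED_TAGS:
            problems.append(f"tag {match.group(1)!r} at offset {match.start()}")
    for match in _EXPR.finditer(source):
        # A "|" inside {{ ... }} is treated as a filter (naive: ignores string literals).
        if "|" in match.group(1):
            problems.append(f"filter in expression {match.group(1).strip()!r}")
    return problems


template = "{% for field in template_fields(Model) %}\n- {{ field.heading }}\n{% endfor %}"
print(find_disallowed(template))
```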
+## 3. What the Pydantic model drives
+The annotated model is the hub; every artifact downstream is a derivation
+of it. There is no parallel source of truth for any of these arrows.
+```
+                            ┌──────────────────────────┐
+                            │   Annotated Pydantic     │
+                            │   model (your class)     │
+                            │                          │
+                            │  • field names + types   │
+                            │  • Annotated[…] markers: │
+                            │      AsHeading           │
+                            │      AsCodeBlock         │
+                            │      AsTable             │
+                            │      AsBulletList        │
+                            │      AsNumberedList      │
+                            │      TextTemplate        │
+                            │  • nested MarkdownHeader │
+                            │    subclasses            │
+                            └─────────────┬────────────┘
+                                          │
+       ┌──────────────────┬───────────────┼───────────────┬──────────────────┐
+       │                  │               │               │                  │
+       ▼ drives           ▼ drives        ▼ controls      ▼ validates        ▼ exposes
+┌─────────────┐  ┌─────────────────┐ ┌──────────────┐ ┌─────────────┐ ┌────────────────┐
+│ render_     │  │ render_instance │ │ validate_    │ │ Pydantic    │ │ template_      │
+│ template()  │  │ ()              │ │ markdown()   │ │ field +     │ │ fields() →     │
+│             │  │                 │ │              │ │ structural  │ │ FieldInfo for  │
+│ skeleton    │  │ instance →      │ │ markdown →   │ │ meta-       │ │ each heading   │
+│ markdown    │  │ markdown        │ │ instance     │ │ validation  │ │ (.heading,     │
+│ for prompts │  │                 │ │              │ │ at class    │ │  .description) │
+│             │  │                 │ │              │ │ definition  │ │                │
+└─────────────┘  └─────────────────┘ └──────────────┘ └─────────────┘ └────────────────┘
+       │                                                                     │
+       │                                                                     │
+       └──────────┬──────────────────────────────────────────────────────────┘
+                  │
+                  ▼ both reachable inside Jinja via
+        ┌──────────────────────────────────────┐
+        │  archetype.templating.resolve(...)   │
+        │                                      │
+        │  globals: template_fields,           │
+        │           render_template            │
+        │                                      │
+        │  one-pass agent-instruction rendering│
+        └──────────────────────────────────────┘
+                  │
+                  ▼ the model also drives
+        ┌──────────────────────────────────────┐
+        │  Model.model_json_schema() - JSON    │
+        │  schema for structured-output / tool │
+        │  integrations (free from Pydantic)   │
+        └──────────────────────────────────────┘
+                  │
+                  ▼ the model also supports
+        ┌──────────────────────────────────────┐
+        │  extract_subtree() — slice a typed   │
+        │  subtree out of a larger document    │
+        └──────────────────────────────────────┘
+```
+### Per-arrow summary
+| Arrow                  | Reads from the model                                  | Produces                                |
+| ---------------------- | ----------------------------------------------------- | --------------------------------------- |
+| `render_template`      | field names, annotations, nested types                | skeleton markdown for prompts           |
+| `render_instance`      | instance values + annotations                         | markdown serialization                  |
+| `validate_markdown`    | field types, annotations, structural rules            | typed instance (or `MarkdownValidationError`) |
+| Meta-validation hook   | class structure at definition time                    | early `MarkdownError` on malformed templates |
+| `template_fields`      | heading-introducing fields and their docstrings       | `FieldInfo(heading, description)` stream |
+| `extract_subtree`      | nested `MarkdownHeader` types                         | typed slice of a larger document        |
+| `Model.model_json_schema()` | field types (Pydantic-native)                    | JSON Schema for structured-output APIs  |
+| `resolve()` (Jinja)    | the model, via `template_fields` / `render_template`  | fully-resolved instruction string       |
+The takeaway: edit the annotated Pydantic class, and every artifact above
+follows. No other file needs to change for the prompt, the parser, the
+schema, and the renderer to stay in agreement.

archetype/__init__.py ADDED

@@ -0,0 +1,20 @@
+"""Archetype — Pydantic as source of truth for agentic systems.
+Core idea: declare a Pydantic data model once, and have one change to that
+model propagate, without any other code edits, to every derived artifact
+the model participates in — markdown templates, renderers, parsers,
+validators, JSON schemas, instruction placeholders, and more.
+Modules:
+- ``archetype.markdown`` — typed markdown documents via Pydantic.
+  Annotation-driven domain models, rendering, parsing, validation,
+  subtree extraction, and heading-field introspection.
+- ``archetype.templating`` — Jinja-based template resolution. Provides
+  a preconfigured Jinja environment with markdown-aware globals
+  (``template_fields``, ``render_template``) and a ``resolve()`` helper
+  that renders a template string against a context object.
+See individual submodule docstrings for details.
+"""

archetype/markdown/__init__.py ADDED

@@ -0,0 +1,67 @@
+"""Declarative markdown-document machinery for archetype.
+See the architecture ADR and the markdown-machinery-design document for
+context. Quick example:
+    from typing import Annotated
+    from archetype.markdown import (
+        MarkdownDocument, MarkdownHeader,
+        AsHeading, TextTemplate,
+        render_template, validate_markdown,
+        template_fields,
+    )
+    class Finding(MarkdownHeader):
+        title: Annotated[str, TextTemplate("Finding {ordinal} - {value}")]
+        description: Annotated[str, AsHeading()]
+    class Review(MarkdownDocument):
+        title: Annotated[str, TextTemplate("{value}")]
+        summary: Annotated[str, AsHeading()]
+        findings: list[Finding]
+    template = render_template(Review)
+    review = validate_markdown(produced_md, Review)
+    fields = template_fields(Review)
+"""
+from archetype.markdown.annotations import (
+    AsBulletList,
+    AsCodeBlock,
+    AsHeading,
+    AsNumberedList,
+    AsTable,
+    TextTemplate,
+)
+from archetype.markdown.errors import (
+    MarkdownError,
+    MarkdownExtractionError,
+    MarkdownTemplateError,
+    MarkdownValidationError,
+)
+from archetype.markdown.extractor import extract_subtree
+from archetype.markdown.introspection import FieldInfo, template_fields
+from archetype.markdown.parser import validate_markdown
+from archetype.markdown.renderer import render_instance, render_template
+from archetype.markdown.template_model import MarkdownDocument, MarkdownHeader
+__all__ = [
+    "AsBulletList",
+    "AsCodeBlock",
+    "AsHeading",
+    "AsNumberedList",
+    "AsTable",
+    "FieldInfo",
+    "MarkdownDocument",
+    "MarkdownError",
+    "MarkdownExtractionError",
+    "MarkdownHeader",
+    "MarkdownTemplateError",
+    "MarkdownValidationError",
+    "TextTemplate",
+    "extract_subtree",
+    "render_instance",
+    "render_template",
+    "template_fields",
+    "validate_markdown",
+]

archetype/markdown/_ast_normalizer.py ADDED

@@ -0,0 +1,175 @@
+"""Normalize the markdown-it-py token stream into a tree of typed BlockElement
+instances, plus an optional MarkdownFrontmatter parsed from the top of the document.
+Why a separate module: keeps the AST-token → typed-tree concern decoupled from
+the projector (element-tree → domain instance). Tests can drive each layer
+independently.
+"""
+from __future__ import annotations
+from dataclasses import dataclass
+import yaml
+from markdown_it import MarkdownIt
+from markdown_it.token import Token
+from archetype.markdown.elements import (
+    BlockElement,
+    MarkdownBulletList,
+    MarkdownCodeBlock,
+    MarkdownFrontmatter,
+    MarkdownHeading,
+    MarkdownNumberedList,
+    MarkdownParagraph,
+    MarkdownTable,
+    MarkdownTableRow,
+)
+from archetype.markdown.errors import MarkdownValidationError
+@dataclass
+class NormalizedDocument:
+    """The output of `normalize()` — frontmatter (or None) plus the top-level
+    block sequence. Each top-level heading carries its scoped body recursively."""
+    frontmatter: MarkdownFrontmatter | None
+    blocks: list[BlockElement]
+def normalize(markdown: str) -> NormalizedDocument:
+    """Parse markdown text and produce a normalized element tree."""
+    fm, body = _split_frontmatter(markdown)
+    # Use commonmark + the table plugin. NOT MarkdownIt("gfm-like"): that
+    # preset enables the linkify rule, which requires the linkify-it-py
+    # package (not in our deps) and crashes at parse time without it.
+    md = MarkdownIt("commonmark").enable("table")
+    tokens = md.parse(body)
+    flat = _tokens_to_blocks(tokens)
+    blocks_with_scope = _nest_headings_by_level(flat)
+    return NormalizedDocument(frontmatter=fm, blocks=blocks_with_scope)
+def _split_frontmatter(markdown: str) -> tuple[MarkdownFrontmatter | None, str]:
+    if not markdown.startswith("---\n"):
+        return None, markdown
+    end = markdown.find("\n---\n", 4)
+    if end == -1:
+        return None, markdown
+    raw_yaml = markdown[4 : end + 1]
+    rest = markdown[end + len("\n---\n") :]
+    try:
+        parsed = yaml.safe_load(raw_yaml) or {}
+    except yaml.YAMLError as exc:
+        raise MarkdownValidationError(f"Frontmatter YAML is malformed: {exc}") from exc
+    return MarkdownFrontmatter(raw_yaml=raw_yaml, parsed=parsed), rest
+@dataclass
+class _FlatBlock:
+    """Pass-1 wrapper that carries the AST heading level alongside the typed
+    element. Used only inside the normalizer; never escapes the module."""
+    element: BlockElement
+    level: int | None  # set only for MarkdownHeading
+def _tokens_to_blocks(tokens: list[Token]) -> list[_FlatBlock]:
+    """First pass: convert flat token stream into a flat list of `_FlatBlock`
+    wrappers."""
+    out: list[_FlatBlock] = []
+    i = 0
+    while i < len(tokens):
+        t = tokens[i]
+        if t.type == "heading_open":
+            level = int(t.tag[1])  # 'h2' -> 2
+            text = tokens[i + 1].content
+            out.append(_FlatBlock(element=MarkdownHeading(text=text, body=[]), level=level))
+            i += 3
+        elif t.type == "paragraph_open":
+            content = tokens[i + 1].content
+            out.append(_FlatBlock(element=MarkdownParagraph(content=content), level=None))
+            i += 3
+        elif t.type == "fence":
+            lang = t.info.strip() or None
+            out.append(
+                _FlatBlock(element=MarkdownCodeBlock(language=lang, content=t.content), level=None)
+            )
+            i += 1
+        elif t.type == "bullet_list_open":
+            items, advance = _collect_list_items(tokens, i, "bullet_list_close")
+            out.append(_FlatBlock(element=MarkdownBulletList(items=items), level=None))
+            i += advance
+        elif t.type == "ordered_list_open":
+            items, advance = _collect_list_items(tokens, i, "ordered_list_close")
+            out.append(_FlatBlock(element=MarkdownNumberedList(items=items), level=None))
+            i += advance
+        elif t.type == "table_open":
+            table, advance = _collect_table(tokens, i)
+            out.append(_FlatBlock(element=table, level=None))
+            i += advance
+        else:
+            i += 1
+    return out
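+def _collect_list_items(tokens: list[Token], start: int, close_type: str) -> tuple[list[str], int]:
+    """Collect the text of each direct item of the list opened at `start`.
+    The scan ends at the close token on the list's own nesting level, so a nested
+    list of the same kind cannot end it early; nested items are not flattened in."""
+    items: list[str] = []
+    list_level = tokens[start].level
+    i = start + 1
+    while not (tokens[i].type == close_type and tokens[i].level == list_level):
+        # Direct items sit one level below the list; their first children are
+        # paragraph_open and then the inline token that holds the item text.
+        if tokens[i].type == "list_item_open" and tokens[i].level == list_level + 1:
+            items.append(tokens[i + 2].content)
+        i += 1
+    return items, (i - start) + 1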
+def _collect_table(tokens: list[Token], start: int) -> tuple[MarkdownTable, int]:
+    columns: list[str] = []
+    rows: list[MarkdownTableRow] = []
+    i = start + 1
+    in_header = False
+    in_body = False
+    cur_row: list[str] = []
+    while tokens[i].type != "table_close":
+        t = tokens[i]
+        if t.type == "thead_open":
+            in_header = True
+        elif t.type == "thead_close":
+            in_header = False
+        elif t.type == "tbody_open":
+            in_body = True
+        elif t.type == "tbody_close":
+            in_body = False
+        elif t.type == "tr_open":
+            cur_row = []
+        elif t.type == "tr_close":
+            if in_header:
+                columns = cur_row
+            elif in_body:
+                rows.append(MarkdownTableRow(cells=cur_row))
+        elif t.type in ("th_open", "td_open"):
+            cur_row.append(tokens[i + 1].content)
+        i += 1
+    return MarkdownTable(columns=columns, rows=rows), (i - start) + 1
+def _nest_headings_by_level(flat: list[_FlatBlock]) -> list[BlockElement]:
+    """Second pass: turn flat block list into a tree by nesting blocks under
+    the most recent open heading scope."""
+    root: list[BlockElement] = []
+    stack: list[tuple[int, MarkdownHeading]] = []
+    for fb in flat:
+        block = fb.element
+        if isinstance(block, MarkdownHeading):
+            assert fb.level is not None
+            level = fb.level
+            while stack and stack[-1][0] >= level:
+                stack.pop()
+            target = stack[-1][1].body if stack else root
+            target.append(block)
+            stack.append((level, block))
+        else:
+            target = stack[-1][1].body if stack else root
+            target.append(block)
+    return root
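The second pass in `_nest_headings_by_level` is a standard stack-based nesting: pop every open heading at the same or a deeper level, attach the new block to whatever remains on top (or to the root), then push the heading. A self-contained sketch of the same algorithm, using a hypothetical `Heading` dataclass and plain strings for non-heading blocks in place of archetype's element classes.

```python
from dataclasses import dataclass, field


@dataclass
class Heading:
    level: int
    text: str
    body: list = field(default_factory=list)


def nest(flat: list) -> list:
    root: list = []
    stack: list[Heading] = []
    for block in flat:
        if isinstance(block, Heading):
            # Close every open scope at the same or a deeper level.
            while stack and stack[-1].level >= block.level:
                stack.pop()
            (stack[-1].body if stack else root).append(block)
            stack.append(block)
        else:
            # Paragraphs, lists, etc. belong to the innermost open heading.
            (stack[-1].body if stack else root).append(block)
    return root


tree = nest([Heading(1, "Review"), "intro", Heading(2, "Summary"), "text",
             Heading(2, "Findings"), Heading(3, "Finding 1"), "details"])
```

Content before the first heading lands in `root`, and a skipped level (an `h2` followed directly by an `h4`) simply nests the `h4` under the `h2`, matching the `>=` comparison in the original.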