@kudusov.takhir/ba-toolkit 3.2.0 → 3.3.0

package/CHANGELOG.md CHANGED
@@ -11,6 +11,14 @@ Versions follow [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
  ---
 
+ ## [3.3.0] — 2026-04-09
+
+ ### Added
+
+ - **Three new domain references — EdTech, GovTech, AI / ML.** The shipped domain catalog grows from 9 to 12 first-class industries, in addition to the `custom` fallback. Each new file (`skills/references/domains/edtech.md`, `govtech.md`, `ai-ml.md`) follows the established 9-section structure (one section per pipeline interview-phase skill: `/brief`, `/srs`, `/stories`, `/usecases`, `/ac`, `/nfr`, `/datadict`, `/apicontract`, `/wireframes`) plus a domain-specific glossary, and matches the depth of the existing references (~250 lines each). New entries appear in the `DOMAINS` array in `bin/ba-toolkit.js`, the `currently:` enumeration in `skills/brief/SKILL.md` and `skills/srs/SKILL.md`, the brief artifact-template `Domain:` line, the README intro / domain-table / badge, and the canonical domain-order rule in `CLAUDE.md` §5. **EdTech** covers K-12 platforms, higher-ed tools, MOOC marketplaces, corporate L&D, language learning, exam prep, and micro-credential platforms — with FERPA / COPPA / GDPR-K / Section 508 / WCAG, LTI / SCORM / xAPI / OneRoster / Clever rostering, and cohort-management mechanics baked in. **GovTech** covers citizen-facing e-services, permits and licensing, tax filing, benefits, public records / FOIA, court e-filing, and 311 — with national-digital-ID brokering (Login.gov / BankID / ItsMe / eIDAS), FedRAMP / StateRAMP / FISMA / CJIS / IRS Pub 1075 / Section 508 / EN 301 549 / plain-language and records-retention obligations. **AI / ML** covers LLM-powered apps, RAG pipelines, agent frameworks, model-serving and inference, fine-tuning, evals, and embedded AI features — with prompt-injection defence, hallucination metrics, eval regressions, model fallback, EU AI Act / NIST AI RMF / ISO 42001 risk classification, cost / token quotas, and RAG / vector-store data modelling. Skills are auto-discovered, so no CLI registration changes are needed beyond the `DOMAINS` array entry.
+
+ ---
+
  ## [3.2.0] — 2026-04-09
 
  ### Highlights
@@ -475,7 +483,8 @@ CI scripts that relied on the old behaviour (`init` creates files only, `install
 
  ---
 
- [Unreleased]: https://github.com/TakhirKudusov/ba-toolkit/compare/v3.2.0...HEAD
+ [Unreleased]: https://github.com/TakhirKudusov/ba-toolkit/compare/v3.3.0...HEAD
+ [3.3.0]: https://github.com/TakhirKudusov/ba-toolkit/compare/v3.2.0...v3.3.0
  [3.2.0]: https://github.com/TakhirKudusov/ba-toolkit/compare/v3.1.1...v3.2.0
  [3.1.1]: https://github.com/TakhirKudusov/ba-toolkit/compare/v3.1.0...v3.1.1
  [3.1.0]: https://github.com/TakhirKudusov/ba-toolkit/compare/v3.0.0...v3.1.0
package/README.md CHANGED
@@ -2,10 +2,10 @@
 
  # 📋 BA Toolkit
 
- Structured BA pipeline for AI coding agents — concept to handoff, 23 skills, 9 domains, one-command Notion + Confluence publish.
+ Structured BA pipeline for AI coding agents — concept to handoff, 23 skills, 12 domains, one-command Notion + Confluence publish.
 
  <img src="https://img.shields.io/badge/skills-23-blue" alt="Skills">
- <img src="https://img.shields.io/badge/domains-9-green" alt="Domains">
+ <img src="https://img.shields.io/badge/domains-12-green" alt="Domains">
  <img src="https://img.shields.io/badge/format-Markdown-orange" alt="Format">
  <img src="https://img.shields.io/badge/language-auto--detect-purple" alt="Language">
  <img src="https://img.shields.io/badge/license-MIT-lightgrey" alt="License">
@@ -24,7 +24,7 @@ Structured BA pipeline for AI coding agents — concept to handoff, 23 skills, 9
 
  BA Toolkit is a set of 23 interconnected skills that run a full business-analysis pipeline inside your AI coding agent. You can start as early as `/discovery` (a brainstorm step for users who don't yet know what to build) or jump straight to `/brief` if you already have a project in mind, then work all the way through to a development handoff package. Each skill reads the output of the previous ones — maintaining cross-references between artifacts along the chain `FR → US → UC → AC → NFR → Entity → ADR → API → WF → Scenario`. When you're ready to share with non-developer stakeholders, `/publish` (or `ba-toolkit publish`) bundles every artifact into import-ready folders for Notion and Confluence — drag-and-drop, no API tokens.
 
- Unlike one-shot prompting, every artifact is written to disk as Markdown, every ID links back to its source, and `/trace` verifies coverage across the whole pipeline. `/clarify` and `/analyze` catch ambiguities and quality gaps with CRITICAL/HIGH severity ratings. Domain references for 9 industries (SaaS, Fintech, E-commerce, Healthcare, Logistics, On-demand, Social/Media, Real Estate, iGaming) plug in automatically at `/brief`.
+ Unlike one-shot prompting, every artifact is written to disk as Markdown, every ID links back to its source, and `/trace` verifies coverage across the whole pipeline. `/clarify` and `/analyze` catch ambiguities and quality gaps with CRITICAL/HIGH severity ratings. Domain references for 12 industries (SaaS, Fintech, E-commerce, Healthcare, Logistics, On-demand, Social/Media, Real Estate, iGaming, EdTech, GovTech, AI/ML) plug in automatically at `/brief`.
 
  Artifacts are generated in whatever language you write in — ask in English, get English docs; ask in any other language, the output follows.
 
@@ -243,6 +243,9 @@ The pipeline is domain-agnostic by default. At `ba-toolkit init` you pick a doma
  | **Social / Media** | Social networks, creator platforms, community forums, newsletters, short-video |
  | **Real Estate** | Property portals, agency CRM, rental management, property management, mortgage tools |
  | **iGaming** | Online slots, sports betting, casino lobbies, Telegram Mini Apps, promo mechanics |
+ | **EdTech** | LMS, K-12, higher ed, MOOC, corporate L&D, language learning, exam prep |
+ | **GovTech** | Citizen e-services, permits, tax filing, benefits, public records, court e-filing |
+ | **AI / ML** | LLM apps, RAG pipelines, agents, model serving, fine-tuning, MLOps platforms |
  | **Custom** | Any other domain — works with general interview questions |
 
  Adding a new domain = creating one Markdown file in `skills/references/domains/`. See [docs/DOMAINS.md](docs/DOMAINS.md).
package/bin/ba-toolkit.js CHANGED
@@ -80,6 +80,9 @@ const DOMAINS = [
  { id: 'social-media', name: 'Social/Media', desc: 'Social networks, creator platforms, community forums' },
  { id: 'real-estate', name: 'Real Estate', desc: 'Property portals, agency CRM, rental management' },
  { id: 'igaming', name: 'iGaming', desc: 'Slots, betting, casino, Telegram Mini Apps' },
+ { id: 'edtech', name: 'EdTech', desc: 'LMS, K-12, higher ed, MOOC, corporate L&D, language learning' },
+ { id: 'govtech', name: 'GovTech', desc: 'Citizen e-services, permits, tax, benefits, public records' },
+ { id: 'ai-ml', name: 'AI / ML', desc: 'LLM apps, RAG, agents, model serving, fine-tuning, MLOps' },
  { id: 'custom', name: 'Custom', desc: 'Any other domain — general interview questions' },
  ];
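The changelog notes that skills are auto-discovered, so a `DOMAINS` entry is the only CLI-side registration. For orientation, a minimal sketch (not the package's actual code) of how an entry's `id` could resolve to its reference file, assuming only the `skills/references/domains/{id}.md` layout the changelog and SKILL.md describe:

```js
const fs = require('fs');
const path = require('path');

// Illustrative only: map a DOMAINS id to its reference file under
// skills/references/domains/, per the layout named in the changelog.
// Ids without a reference file (e.g. custom domains) resolve to null.
function resolveDomainReference(domainId, toolkitDir) {
  const file = path.join(toolkitDir, 'skills', 'references', 'domains', `${domainId}.md`);
  return fs.existsSync(file) ? file : null;
}

// resolveDomainReference('edtech', '.')     → 'skills/references/domains/edtech.md'
// resolveDomainReference('custom:crm', '.') → null, so general questions apply
```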
 
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "@kudusov.takhir/ba-toolkit",
- "version": "3.2.0",
+ "version": "3.3.0",
  "description": "AI-powered Business Analyst pipeline — 23 skills from concept discovery to development handoff, with one-command Notion + Confluence publish. Works with Claude Code, Codex CLI, Gemini CLI, Cursor, and Windsurf.",
  "keywords": [
  "business-analyst",
@@ -10,7 +10,7 @@ Starting point of the BA Toolkit pipeline. Generates a structured Project Brief
 
  ## Loading domain reference
 
- Domain references are located in `references/domains/` relative to the `ba-toolkit` directory. Supported domains: `saas`, `fintech`, `ecommerce`, `healthcare`, `logistics`, `on-demand`, `social-media`, `real-estate`, `igaming`. For other domains, work without a reference file.
+ Domain references are located in `references/domains/` relative to the `ba-toolkit` directory. Supported domains: `saas`, `fintech`, `ecommerce`, `healthcare`, `logistics`, `on-demand`, `social-media`, `real-estate`, `igaming`, `edtech`, `govtech`, `ai-ml`. For other domains, work without a reference file.
 
  ## Workflow
 
@@ -28,7 +28,7 @@ If `00_principles_*.md` exists in the output directory, load it and apply its co
 
  ### 3. Domain selection
 
- Ask the user about the project domain. If a matching `references/domains/{domain}.md` file exists (currently: `saas`, `fintech`, `ecommerce`, `healthcare`, `logistics`, `on-demand`, `social-media`, `real-estate`, `igaming`), load it and use its domain-specific interview questions (section `1. /brief`), typical business goals, risks, and glossary.
+ Ask the user about the project domain. If a matching `references/domains/{domain}.md` file exists (currently: `saas`, `fintech`, `ecommerce`, `healthcare`, `logistics`, `on-demand`, `social-media`, `real-estate`, `igaming`, `edtech`, `govtech`, `ai-ml`), load it and use its domain-specific interview questions (section `1. /brief`), typical business goals, risks, and glossary.
 
  If the domain does not match any supported one, record it as `custom:{name}` and use general questions only.
 
@@ -62,7 +62,7 @@ If a domain reference is loaded, supplement general questions with domain-specif
  ```markdown
  # Project Brief: {Project Name}
 
- **Domain:** {saas | fintech | ecommerce | healthcare | logistics | on-demand | social-media | real-estate | igaming | custom:{name}}
+ **Domain:** {saas | fintech | ecommerce | healthcare | logistics | on-demand | social-media | real-estate | igaming | edtech | govtech | ai-ml | custom:{name}}
  **Date:** {date}
 
  ## 1. Project Summary
package/skills/references/domains/ai-ml.md ADDED
@@ -0,0 +1,262 @@
+ # Domain Reference: AI / ML
+
+ Domain-specific knowledge for AI / ML products: LLM-powered applications, conversational agents, RAG (retrieval-augmented generation) pipelines, agent frameworks, model-serving and inference platforms, fine-tuning and training pipelines, ML feature stores, model marketplaces, AI dev tools, vector databases, and embedded AI features inside non-AI products.
+
+ ---
+
+ ## 1. /brief — Project Brief
+
+ ### Domain-specific interview questions
+ - Product type: end-user LLM app (chat, copilot), RAG over a private corpus, autonomous agent / multi-agent system, model-serving platform, fine-tuning service, AI-feature embedded in an existing SaaS, AI dev tool, vector database, evals platform?
+ - Build vs. buy on the model: hosted API (OpenAI, Anthropic, Google), open-weights self-hosted (Llama, Mistral, Qwen), fine-tuned open-weights, in-house pre-trained, hybrid?
+ - User segment: consumer, prosumer, B2B knowledge worker, developer, internal (employee-facing), regulated enterprise?
+ - Modality: text only, multimodal (image, audio, video, code), voice-first, structured-output / function-calling?
+ - Monetization: per-message, per-token, subscription, per-seat, per-API-call, freemium with usage caps, marketplace take-rate?
+ - Latency budget: real-time chat (<2s), interactive completion (<200ms first token), batch processing (minutes), or async background?
+ - Determinism / reproducibility requirement: high (same input → same output, e.g. legal, healthcare, finance) vs. low (creative)?
+
+ ### Typical business goals
+ - Reach quality threshold for production launch (often the gating factor: an eval score, a human-preference win-rate vs. baseline, or an internal acceptance bar).
+ - Reduce cost-per-query (token costs, retrieval costs, fine-tuning amortisation).
+ - Reduce hallucination / unsafe-output rate to a specific target (e.g. <1% on the eval set).
+ - Improve task success rate (the AI version of feature success — does the user actually finish the task with help?).
+ - Increase active usage (DAU, messages per session, tasks completed).
+ - Build a moat through proprietary data, fine-tuned models, or workflow integration.
+ - Hit safety, bias, and compliance thresholds (especially in regulated sectors).
+
+ ### Typical risks
+ - Hallucinations and confident-sounding incorrect output.
+ - Prompt injection and jailbreaking (data exfiltration, role override, tool misuse).
+ - Cost runaway (one heavy user can burn months of margin).
+ - Latency unpredictability (P50 vs. P99 differs by an order of magnitude).
+ - Vendor / model deprecation (the underlying model API changes or sunsets).
+ - Data privacy: user inputs may contain PII, secrets, or proprietary code that the model provider then stores or trains on.
+ - Regulatory exposure: EU AI Act, sector-specific (FDA SaMD for medical, NYC bias audit for HR, NIST AI RMF, ISO 42001).
+ - Eval / quality regression: a model upgrade silently breaks a downstream task.
+ - Tool-use agents going off-policy (calling the wrong tool, making irreversible writes, looping).
+
+ ---
+
+ ## 2. /srs — Requirements Specification
+
+ ### Domain-specific interview questions
+ - Roles: end user, prompt / agent author, eval reviewer, ML engineer, MLOps / inference operator, safety / policy reviewer, billing admin?
+ - Multi-tenancy: shared model with per-tenant retrieval, per-tenant fine-tunes, per-tenant inference cluster?
+ - Model layer: which models (provider × name × version) are wired? Fallback / failover strategy when the primary model is down or rate-limited?
+ - Prompting / orchestration framework: in-house, LangChain, LlamaIndex, custom agent loop, OpenAI Agents SDK?
+ - RAG: which corpus? Which embedding model? Which vector database? Re-indexing schedule?
+ - Tool use / function calling: which tools are exposed to the model? Which side effects are allowed (read-only, idempotent, irreversible)?
+ - Evals: which datasets? Which scoring methods (LLM-as-judge, rubric, A/B human pref, reference-based metrics)?
+ - Safety: input filter, output filter, refusal policy, red-team coverage, prompt-injection defence (delimited / sandwiched / structured prompts, system-prompt leakage prevention)?
+ - Data handling: are user inputs logged? Used for fine-tuning? Sent to a third-party model provider (and under which DPA terms)?
+ - Compliance: EU AI Act risk classification, NIST AI RMF profile, ISO 42001 certification target, sector rules (FDA SaMD, HIPAA, GDPR Art. 22 automated-decision rights)?
+
+ ### Typical functional areas
+ - Conversation / completion API and UI.
+ - Prompt / template management with versioning.
+ - Model routing and fallback (provider × model × version).
+ - Tool / function-calling registry.
+ - RAG pipeline (ingest, chunk, embed, store, retrieve, rerank).
+ - Vector database / index management.
+ - Fine-tuning / dataset management.
+ - Inference serving (autoscaling, queueing, batching, GPU / accelerator management).
+ - Eval pipeline (datasets, scorers, regression detection, dashboards).
+ - Safety pipeline (input / output filters, red-team queue, refusal policies).
+ - Observability (token usage, latency, cost, quality, drift).
+ - Cost / usage metering and quotas.
+ - Feedback collection (thumbs up/down, structured rating, free-text).
+ - Admin console (model registry, prompt registry, tool registry, eval runs, safety dashboard).
+
+ ---
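Model routing and fallback comes up twice in this reference (a functional area above, an exceptional flow in section 4). A minimal sketch of the pattern under assumed names; `model.complete` stands in for whatever provider SDK call is actually wired:

```js
// Sketch only: walk the fallback chain (provider × model × version) in
// order, failing over on timeouts, rate limits, and provider 5xx errors.
// Names are hypothetical, not part of this package.
async function completeWithFallback(chain, request) {
  let lastError;
  for (const model of chain) {
    try {
      return await model.complete(request); // assumed provider-SDK call
    } catch (err) {
      lastError = err;
      const retriable = [408, 429, 500, 502, 503, 504].includes(err.status);
      if (!retriable) throw err; // e.g. a 400 is the caller's bug, not an outage
    }
  }
  throw lastError; // every model in the chain failed: degraded mode applies
}
```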
+
+ ## 3. /stories — User Stories
+
+ ### Domain-specific interview questions
+ - Critical end-user flows: ask a question, get a grounded answer with citations, refine the answer, take an action via a tool, share or export the result?
+ - Critical author / operator flows: write a new prompt, run it against an eval set, compare against a baseline, ship to production, monitor regressions?
+ - Critical safety flows: red-team a new prompt, review a flagged refusal, export an audit log for a regulator?
+ - Edge cases: model outage and fallback, rate-limit hit, user input contains PII, tool call fails halfway through a multi-step plan, hallucinated citation, prompt injection in retrieved context?
+ - Personas: end user, prompt / agent author, eval reviewer, ML engineer, MLOps operator, safety reviewer, regulator / auditor.
+
+ ### Typical epics
+ - Conversation and completion.
+ - Retrieval and citation.
+ - Tool / function calling.
+ - Prompt and template authoring.
+ - Eval running and regression detection.
+ - Safety review and red-teaming.
+ - Model routing and fallback.
+ - Cost metering and quotas.
+ - Feedback and improvement loop.
+ - Fine-tuning workflow.
+ - Observability and incident response.
+ - Admin (model registry, prompt registry, audit log).
+
+ ---
+
+ ## 4. /usecases — Use Cases
+
+ ### Domain-specific interview questions
+ - Critical alternative flows: model returns an unsafe output → safety filter rewrites or refuses; tool call fails → agent retries, escalates, or rolls back; retrieved context is empty → fall back to general knowledge with a clear disclaimer; quota hit → block with a soft message and upgrade prompt?
+ - System actors: model provider API, embedding API, vector database, tool / plugin APIs (search, code execution, browser, internal CRMs), safety classifier, eval scorer, observability backend, billing / metering system?
+
+ ### Typical exceptional flows
+ - Model API timeout / 5xx → fall back to a secondary model or queue for retry.
+ - Rate-limit hit (provider-side or per-tenant) → backoff, queue, or reject with a clear message.
+ - Output filter blocks an answer → safe completion or refusal with a reason code.
+ - Input filter detects prompt injection → strip / quote the suspicious content and warn.
+ - Retrieved context is empty → answer with a disclaimer or ask a clarifying question instead of hallucinating.
+ - Tool call fails after a partial side effect → compensate / roll back and report cleanly to the user.
+ - Eval regression detected after a prompt or model upgrade → auto-rollback and page on-call.
+ - User submits PII or a secret → mask in logs, do not forward to third-party providers without DPA coverage.
+ - Citation cannot be resolved to a real source → suppress the citation and downgrade confidence.
+
+ ---
+
+ ## 5. /ac — Acceptance Criteria
+
+ ### Domain-specific interview questions
+ - Quality bars: minimum eval score, minimum win-rate vs. baseline, max hallucination rate on the eval set, max unsafe-output rate?
+ - Latency bars: time to first token (p50, p95), full completion time, agent-step total budget?
+ - Cost bars: max average tokens per query, max average cost per query, hard per-tenant daily / monthly quota?
+ - Determinism / reproducibility: is `temperature = 0` required, with the same model version pinned? Is bit-for-bit reproducibility required (often impossible — replace with semantic-equivalence checks)?
+ - Boundary values: max prompt length, max context window utilisation, max conversation history retained, max retrieved documents per query, max tool-call depth, max agent steps per task?
+ - Safety AC: explicit list of disallowed outputs (PII leakage, prompt-injection echo, instructions for weapons / self-harm / etc.) with at least one positive and one negative test case each.
+
+ ---
+
+ ## 6. /nfr — Non-functional Requirements
+
+ ### Domain-specific interview questions
+ - Latency: TTFT (time to first token), full completion p50 / p95, end-to-end agent-task budget?
+ - Throughput: peak QPS, concurrent active sessions, peak fine-tune jobs, peak batch inference?
+ - Cost target: average and p95 cost per request, monthly budget per tenant, alerting threshold?
+ - Quality: target eval score, target win-rate vs. last release, hallucination rate ceiling, refusal rate ceiling?
+ - Reproducibility: pinned model versions, pinned prompt versions, deterministic decoding settings?
+ - Safety: red-team coverage frequency, mandatory categories (CSAM, weapons, self-harm, hate, prompt injection, PII leakage)?
+ - Observability: traces per request (prompt, retrieval, tool calls, scores, cost, latency)?
+ - Data residency: where do user inputs and embeddings live? Which cloud regions are allowed? Is the model API itself in a permitted region?
+
+ ### Mandatory NFR categories for AI / ML
+ - **Performance:** TTFT < 1s (p50) for chat; full completion < 5s (p95) for short-form; batch jobs scheduled to off-peak windows.
+ - **Scalability:** elastic inference (autoscaling on tokens-per-second, not just requests-per-second); rate limiting per tenant and global; queueing with backpressure.
+ - **Cost control:** per-request token cap, per-tenant daily and monthly budget, alerting at 50/80/100% thresholds; cost dashboard mandatory.
+ - **Quality / evals:** eval set on every prompt or model change; regression detection on every release; rollback playbook documented.
+ - **Safety:** input + output filters; red-team coverage of mandatory categories; refusal-rate dashboard; incident playbook for jailbreaks; clear escalation path.
+ - **Security:** secrets never echoed; user inputs scrubbed before logging; mTLS or signed requests to the model provider; tool-call allow-list; sandboxed code execution if applicable; tenant isolation in the vector store.
+ - **Privacy:** explicit notice on what is logged and what is sent to third-party model providers; DPA in place with each provider; opt-out for training-data use; PII redaction before storage.
+ - **Compliance:** EU AI Act risk-class assessment on file; NIST AI RMF profile mapped; ISO 42001 controls if pursued; sector controls (HIPAA, FDA SaMD, NYC bias audit) where applicable.
+ - **Observability:** end-to-end trace per request (prompt rendered, context retrieved, tools called, model used, tokens, latency, cost, evals, safety verdict, user feedback).
+ - **Reproducibility:** every released prompt and model version is pinned and re-runnable on the eval set.
+ - **Resilience:** primary-model outage → automatic failover to a secondary; documented degraded-mode behaviour.
+
+ ---
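The cost-control category above implies a concrete metering rule. A sketch with assumed field names, checking a per-tenant token budget and emitting the 50/80/100% alerts; a real meter would persist usage and route alerts instead of logging:

```js
// Sketch only: enforce a per-tenant token budget and emit alerts when
// cumulative usage crosses 50/80/100% of it. Field names are assumptions.
const ALERT_THRESHOLDS = [0.5, 0.8, 1.0];

function recordUsage(quota, tokensUsedNow) {
  const before = quota.used / quota.budget;
  quota.used += tokensUsedNow;
  const after = quota.used / quota.budget;
  for (const t of ALERT_THRESHOLDS) {
    if (before < t && after >= t) {
      console.warn(`tenant ${quota.tenantId}: ${t * 100}% of token budget used`);
    }
  }
  return quota.used <= quota.budget; // false → block or queue further requests
}
```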
+
+ ## 7. /datadict — Data Dictionary
+
+ ### Domain-specific interview questions
+ - Multi-tenancy: tenant_id on every entity, including prompts, evals, conversations, vector index partitions?
+ - Retention: how long are user conversations, prompts, evals, and traces retained? What is purged on opt-out?
+ - PII categorisation: which fields might contain user PII or secrets and how are they protected (redaction, encryption, masked in UI)?
+
+ ### Mandatory entities for AI / ML
+ - **Prompt** — versioned prompt template: id, version, body, variables, model binding, status (draft/published).
+ - **Model** — model registry entry: provider, name, version, endpoint, default params, status, deprecation date.
+ - **Conversation** — chat session: tenant, user, started_at, status, metadata.
+ - **Message** — single turn: conversation, role (user/assistant/system/tool), content, tokens, cost, latency, model used.
+ - **Trace** — full execution record per request: prompt id+version, retrieval results, tool calls, model id+version, final output, token counts, cost, latency, eval scores, safety verdict.
+ - **EvalSet** — labelled dataset: name, items, scoring method, owner.
+ - **EvalRun** — single run of a prompt or model against an eval set: scores, regression vs. baseline, run timestamp.
+ - **Tool** — tool / function registry entry: name, schema, side-effect class (read-only / idempotent / irreversible), allowed roles.
+ - **ToolCall** — log of an executed tool call: trace id, tool, arguments, result, status, latency.
+ - **Document** — RAG corpus item: source, ingest date, hash, chunk count, embedding model, retention class.
+ - **Chunk** — embedded chunk: document, position, text, embedding vector, metadata.
+ - **VectorIndex** — index partition: tenant, embedding model, dimensionality, count, last reindex.
+ - **Feedback** — user rating: trace id, rating, free-text reason, reviewer id.
+ - **SafetyEvent** — flagged input or output: trace id, category, action taken, reviewer status.
+ - **Quota** — per-tenant usage allowance: window, token cap, request cap, current usage.
+ - **AuditLog** — admin actions on prompts, models, tools, evals, safety policy.
+
+ ---
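For concreteness, one plausible shape for a **Trace** record covering the fields the entity list names; every name and value here is invented for illustration, not a schema the package prescribes:

```js
// Illustrative Trace record: one per request, mirroring the fields above
// (prompt, retrieval, tools, model, tokens, cost, latency, evals, safety).
const exampleTrace = {
  promptId: 'answer-with-citations', promptVersion: 12,
  retrieval: [{ chunkId: 'doc-42#3', score: 0.81 }],
  toolCalls: [{ tool: 'vector_search', status: 'ok', latencyMs: 240 }],
  model: { provider: 'acme', name: 'acme-large', version: '2026-01' },
  tokens: { input: 1840, output: 312 },
  costUsd: 0.0041,
  latencyMs: 1920,
  evalScores: { groundedness: 0.93 },
  safetyVerdict: 'pass',
  feedback: null, // filled in if the user later rates the answer
};
```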
+
+ ## 8. /apicontract — API Contract
+
+ ### Domain-specific interview questions
+ - Streaming responses (server-sent events / chunked) vs. unary completions?
+ - Function-calling / structured-output schemas — JSON Schema published per tool?
+ - Public API for third-party integrators vs. internal-only?
+ - Webhooks for "eval regression detected", "safety event flagged", "fine-tune job complete"?
+ - Idempotency keys on completions (so a retried request does not double-charge)?
+
+ ### Typical endpoint groups
+ - Chat / completion (streaming and unary).
+ - Prompts (CRUD, versions, publish, rollback).
+ - Models (registry, route, fallback, deprecation).
+ - Tools (registry, schemas, allow-list per tenant).
+ - Conversations (list, read, delete, export).
+ - RAG ingest (upload, status, reindex, delete).
+ - Vector search (query, similarity, filter).
+ - Evals (sets, runs, compare, dashboards).
+ - Safety (flag, review, override, audit).
+ - Feedback (submit, list, aggregate).
+ - Quotas and billing (current usage, limits, top-ups).
+ - Fine-tuning (upload dataset, start job, status, deploy).
+ - Webhooks (regression, safety, job-complete events).
+ - Admin (audit log, model registry, tool registry).
+
+ ---
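The idempotency-key question above is worth pinning down in the contract. A sketch of the intended behaviour, with an in-memory map standing in for whatever persistent, TTL-bounded store a real service would use:

```js
// Sketch only: replay a retried completion by idempotency key so a
// network retry never bills tokens twice. Names are hypothetical.
const completed = new Map(); // idempotencyKey → stored response

async function createCompletion(req, runModel) {
  const key = req.idempotencyKey;
  if (key && completed.has(key)) {
    return completed.get(key); // replay: no second model call, no second charge
  }
  const res = await runModel(req); // tokens are billed exactly once, here
  if (key) completed.set(key, res);
  return res;
}
```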
+
+ ## 9. /wireframes — Wireframe Descriptions
+
+ ### Domain-specific interview questions
+ - Key screens: chat / conversation surface, prompt editor with side-by-side diff and eval panel, eval dashboard, safety review queue, model registry, tool registry, cost / usage dashboard, RAG corpus management, fine-tune job page?
+ - Specific states: streaming response in progress, model fallback in effect, safety filter blocked an output, quota exhausted, rate-limited, regenerating, citation hover, tool call running, partial agent failure, eval regression alert, expensive query warning?
+
+ ### Typical screens
+ - Chat surface (streaming response, citations, "regenerate", "stop", thumbs up/down, copy).
+ - Prompt editor (template body, variables, model binding, side-by-side run, eval panel).
+ - Eval dashboard (runs over time, regression alerts, drill-down per item).
+ - Eval set editor (labelled examples, scorers, owner).
+ - Safety review queue (flagged inputs / outputs, decision form, audit trail).
+ - Model registry (providers, versions, deprecation calendar, fallback chains).
+ - Tool registry (schemas, side-effect class, allow-list per tenant).
+ - Cost / usage dashboard (per tenant, per prompt, per model; alerts at thresholds).
+ - RAG corpus management (sources, ingest status, reindex button, retention class).
+ - Fine-tune job page (dataset, base model, hyperparameters, status, evals, deploy).
+ - Trace viewer (per request: prompt → retrieval → tool calls → model → output → evals → safety → feedback).
+ - Admin console (audit log, quotas, policy editor, opt-out registry).
+
+ ---
+
+ ## Domain Glossary
+
+ | Term | Definition |
+ |------|-----------|
+ | LLM | Large Language Model |
+ | RAG | Retrieval-Augmented Generation — retrieve from a corpus, then generate grounded on it |
+ | Agent | LLM that can call tools and act over multiple steps |
+ | Tool / function calling | LLM produces a structured request to invoke an external function |
+ | Embedding | Dense vector representation of text (or other modality) for similarity search |
+ | Vector database | Storage and ANN search over embeddings |
+ | Chunk | A unit of text from a source document used for embedding and retrieval |
+ | Reranker | Second-stage model that reorders retrieved candidates by relevance |
+ | Hallucination | Confident, plausible-sounding output that is factually wrong |
+ | Prompt injection | Adversarial input that tries to override the system prompt or extract secrets |
+ | Jailbreak | Adversarial input that bypasses the safety filter |
+ | Eval | A measurement of model or prompt quality on a labelled dataset |
+ | LLM-as-judge | Using one LLM to score the outputs of another |
+ | TTFT | Time to first token — latency to the start of the streamed response |
+ | Token | The unit of text the model bills and processes (sub-word) |
+ | Context window | Maximum tokens the model can attend to per request |
+ | Fine-tuning | Continued training of a base model on a custom dataset |
+ | LoRA | Low-Rank Adaptation — a parameter-efficient fine-tuning technique |
+ | Inference | Running a trained model to produce output (vs. training) |
+ | MLOps | Operational discipline of running ML models in production |
+ | Drift | Production data or behaviour drifting away from the eval distribution |
+ | Refusal rate | % of requests the safety filter declines |
+ | Red-teaming | Systematic adversarial testing of a model or product for unsafe behaviour |
+ | Guardrails | Programmatic input / output filters around an LLM |
+ | Multi-modal | Model that handles more than one modality (text, image, audio, video) |
+ | EU AI Act | EU regulation classifying AI systems by risk and imposing obligations |
+ | NIST AI RMF | US NIST AI Risk Management Framework |
+ | ISO 42001 | International standard for AI management systems |
package/skills/references/domains/edtech.md ADDED
@@ -0,0 +1,223 @@
+ # Domain Reference: EdTech
+
+ Domain-specific knowledge for EdTech projects: K-12 platforms, higher-education tools, corporate learning (LXP/LMS), MOOC marketplaces, language learning, tutoring marketplaces, exam-prep apps, micro-credential and certification platforms.
+
+ ---
+
+ ## 1. /brief — Project Brief
+
+ ### Domain-specific interview questions
+ - Learner segment: K-12 students, university students, adult professionals, corporate employees, parents-of-learners?
+ - Product type: full LMS, lightweight LXP, MOOC marketplace, 1:1 tutoring marketplace, self-paced micro-courses, exam prep, language learning, simulation/lab environment?
+ - Buyer vs. user split: who pays (parent, school district, university, HR/L&D, employer, learner) vs. who uses?
+ - Monetization: per-seat institutional licensing, B2C subscription, freemium, certificate fees, marketplace take-rate, sponsored/free?
+ - Accreditation and certification: are completions recognised by an external body (universities, certifying organisations, ministries of education)?
+ - Geography and language coverage: single market or multi-region; what languages and reading directions are required at launch?
+
+ ### Typical business goals
+ - Improve learning outcomes (course completion, mastery scores, time-to-proficiency).
+ - Increase course completion rate (the perennial EdTech problem; benchmark <15% for free MOOCs).
+ - Grow institutional contracts (school districts, universities, corporate L&D).
+ - Reduce learner-acquisition cost; raise repeat enrolment.
+ - Build a credentialing economy (certificates that employers actually trust).
+ - Move learners up a value ladder: free → paid → certificate → cohort → degree.
+
+ ### Typical risks
+ - Low completion and engagement — the dominant EdTech failure mode.
+ - Child-safety / minor-data compliance: COPPA (US, under-13), GDPR-K (EU), age verification, parental consent.
+ - Procurement cycles for K-12 and higher-ed are long (6–18 months) and seasonal.
+ - Accessibility lawsuits (Section 508 / WCAG 2.1 AA / ADA) — high prevalence in US K-12.
+ - Cheating, plagiarism, and AI-generated submissions undermine assessment integrity.
+ - Content licensing — third-party textbook, video, and assessment item rights.
+
+ ---
+
+ ## 2. /srs — Requirements Specification
+
+ ### Domain-specific interview questions
+ - Roles: learner, instructor / teacher, teaching assistant, course author, parent / guardian, school admin, district admin, content reviewer, support agent?
+ - Multi-tenancy: per-district, per-school, per-cohort isolation?
+ - LTI / SCORM / xAPI integration: must the platform plug into existing LMS (Canvas, Moodle, Blackboard, D2L)?
+ - SIS (Student Information System) integration: rostering via OneRoster / Clever / ClassLink?
+ - Assessment types: multiple-choice, free-text, code submissions, file upload, peer review, oral exam, proctored exam?
+ - Live vs. self-paced: real-time classroom (video, whiteboard) or fully asynchronous?
+ - Compliance: FERPA (US), COPPA, GDPR-K, state student-data privacy laws (CSPC), accessibility (WCAG 2.1 AA, Section 508).
+
+ ### Typical functional areas
+ - Course catalog and enrolment.
+ - Course authoring (lesson editor, video upload, quiz builder, branching scenarios).
+ - Lesson player (video, reading, interactive widgets, transcript, captions).
+ - Assessment engine (item bank, randomisation, time limits, retakes, grading rubric).
+ - Gradebook and progress tracking.
+ - Discussion forums, peer review, group projects.
+ - Live classroom (video, whiteboard, breakout rooms, polls) — if synchronous.
+ - Certificates, badges, transcripts, micro-credentials.
+ - Reporting for instructors and admins (cohort progress, at-risk learners).
+ - Parent/guardian portal (K-12).
+ - Admin console (rostering, license management, content moderation).
+
+ ---
+
+ ## 3. /stories — User Stories
+
+ ### Domain-specific interview questions
+ - Critical learner flows: discovery, enrolment, first lesson, returning to where I left off, taking an assessment, getting feedback, earning a certificate?
+ - Critical instructor flows: creating a course, importing existing content, monitoring cohort progress, grading, intervening with at-risk learners?
+ - Edge cases: assessment timeout, network drop mid-exam, plagiarism detection, content review queue, refund / unenrol?
+ - Personas: motivated self-learner, struggling K-12 student, corporate compliance learner, university lecturer, district curriculum lead.
+
+ ### Typical epics
+ - Learner onboarding and FTUE.
+ - Course discovery and enrolment.
+ - Lesson playback and progress tracking.
+ - Assessment and grading.
+ - Instructor authoring and analytics.
+ - Cohort / classroom management.
+ - Certificates and credentials.
+ - Parent / guardian visibility (K-12).
+ - Admin console and rostering.
+ - Live classroom (if synchronous).
+
+ ---
+
+ ## 4. /usecases — Use Cases
+
+ ### Domain-specific interview questions
+ - Critical alternative flows: assessment timeout, lost connection during a proctored exam, peer reviewer drops out, plagiarism flag, accessibility-driven extended time?
+ - System actors: SIS provider, LTI consumer LMS, video CDN, proctoring service, payment provider, certificate-issuance service, content-moderation pipeline?
+
+ ### Typical exceptional flows
+ - Assessment timer expires while learner is still answering.
+ - Network drop during a high-stakes exam (recovery, replay, instructor escalation).
+ - Plagiarism / AI-generated content detected after submission.
+ - Learner under 13 with no parental consent on file.
+ - Instructor revokes a published course; enrolled learners affected.
+ - SIS roster sync fails mid-term; new students cannot log in.
+ - LTI launch from external LMS arrives with invalid signature.
+
+ ---
+
+ ## 5. /ac — Acceptance Criteria
+
+ ### Domain-specific interview questions
+ - Business rules: pass mark, max retakes, minimum time on task, prerequisite courses, certificate validity period, late-submission policy?
+ - Boundary values: max video length, max assessment duration, max file upload size (essays, code), max concurrent live-classroom participants, gradebook precision (0.01 vs. integer)?
+ - Accessibility AC: WCAG 2.1 AA conformance per screen, captions on every video, transcript on every audio, keyboard-only navigation, screen-reader announcements for timer changes?
+
+ ---
+
109
+ ## 6. /nfr — Non-functional Requirements
110
+
111
+ ### Domain-specific interview questions
112
+ - Concurrency: peak concurrent learners (start of term, exam day, MOOC launch); peak concurrent live-classroom participants?
113
+ - Video delivery: target start time (<2s), buffering ratio, supported bitrates, offline download for low-bandwidth?
114
+ - Data residency: per-country / per-district storage requirements (e.g. EU student data must stay in EU; some US states require in-state hosting)?
115
+ - Accessibility target: WCAG 2.1 AA mandatory; AAA aspirational?
116
+
117
+ ### Mandatory NFR categories for EdTech
118
+ - **Performance:** lesson load < 2s, video start < 2s, gradebook query < 500ms (p95), assessment submission ack < 1s.
119
+ - **Scalability:** handle the start-of-term spike (10–50× steady-state for 24–72h); support cohort sizes up to N learners simultaneously in live classroom.
120
+ - **Availability:** 99.9% during academic hours; planned maintenance windows must avoid exam periods. Status page mandatory for institutional buyers.
121
+ - **Security:** SSO via SAML / OIDC with rostering provider; RBAC by role and cohort; FERPA / COPPA / GDPR-K compliance; encryption at rest and in transit; audit log of grade changes.
122
+ - **Privacy:** explicit consent flows for under-13 learners; parental access portal; right to delete student data; minimal data collection from minors.
123
+ - **Accessibility:** WCAG 2.1 AA on all learner-facing surfaces; Section 508 conformance for US public-sector contracts; captions and transcripts on every multimedia asset.
124
+ - **Content delivery:** global CDN for video; adaptive bitrate; offline download for mobile.
125
+ - **Backup:** RPO < 1 hour, RTO < 4 hours; gradebook is critical — point-in-time recovery required.
126
+
127
+ ---
128
+
129
+ ## 7. /datadict — Data Dictionary
130
+
131
+ ### Domain-specific interview questions
132
+ - Multi-tenancy: tenant_id (school, district, organisation) on every entity?
133
+ - Soft delete: which entities (Course, Assessment, Submission) keep history for audit and grade-dispute resolution?
134
+ - PII categorisation: what fields are FERPA / COPPA / GDPR-K protected?
135
+
136
+ ### Mandatory entities for EdTech
137
+ - **Learner** — student account: profile, age, guardian link, accessibility prefs, locale.
138
+ - **Instructor** — teacher / course author: profile, qualifications, hosted courses.
139
+ - **Course** — course / module: title, description, prerequisites, learning objectives, status (draft/published).
140
+ - **Lesson** — lesson / unit: order, type (video, reading, quiz), content payload.
141
+ - **Enrolment** — learner ↔ course link: status (active, completed, dropped), start, end, progress %.
142
+ - **Assessment** — quiz / exam: item bank reference, time limit, attempts allowed, passing score.
143
+ - **Submission** — learner answer: content, score, attempt number, submitted_at, graded_by.
144
+ - **GradebookEntry** — gradebook row: learner, course, assessment, score, weight, late?
145
+ - **Certificate** — issued credential: learner, course, issue date, verification URL.
146
+ - **Cohort** — class / group: school, term, instructor, learners.
147
+ - **Guardian** — parent / guardian (K-12): link to learner, contact, consent status.
148
+ - **AuditLog** — grade change, content publish, roster sync events.
149
+
150
+ ---
151
+
152
+ ## 8. /apicontract — API Contract
153
+
154
+ ### Domain-specific interview questions
155
+ - LTI 1.3 / Deep Linking / Names and Roles support required?
156
+ - OneRoster / Clever / ClassLink rostering API as a sync source?
157
+ - xAPI / Caliper Analytics learning-record output for institutional analytics warehouses?
158
+ - Public API for third-party integrations (textbook publishers, exam proctoring, plagiarism detection)?
159
+ - Webhooks for "enrolment created", "assessment submitted", "certificate issued"?
160
+
161
+ ### Typical endpoint groups
162
+ - Auth (SSO via SAML/OIDC, LTI launch, API keys).
163
+ - Courses (CRUD, publish, version, prerequisites).
164
+ - Lessons (CRUD, ordering, media upload).
165
+ - Enrolments (enrol, unenrol, list, progress).
166
+ - Assessments (item bank, start attempt, submit, grade).
167
+ - Gradebook (read, override, audit).
168
+ - Rostering (sync, learner / cohort import).
169
+ - Certificates (issue, verify, list).
170
+ - Analytics (cohort progress, at-risk learners).
171
+ - Admin (tenants, license management, content moderation).
172
+ - Webhooks (outgoing learning events).
173
+
174
+ ---
175
+
176
+ ## 9. /wireframes — Wireframe Descriptions
177
+
178
+ ### Domain-specific interview questions
179
+ - Key screens: course catalog, course landing page, lesson player, assessment player, learner dashboard, instructor dashboard, gradebook, certificate page?
180
+ - Specific states: assessment in progress with countdown timer, network-drop recovery, content locked behind prerequisite, completion confetti, accessibility-large-text mode?
181
+
182
+ ### Typical screens
183
+ - Learner dashboard (continue learning, due assignments, recent grades).
184
+ - Course catalog and search.
185
+ - Course landing page (overview, syllabus, instructor, reviews, enrol button).
186
+ - Lesson player (video / reading / interactive, transcript, notes, navigation).
187
+ - Assessment player (timer, navigation, save-and-resume, submit confirmation).
188
+ - Gradebook (learner view: my grades; instructor view: cohort grades).
189
+ - Instructor course authoring (lesson editor, quiz builder, preview).
190
+ - Instructor cohort dashboard (progress, at-risk, intervention queue).
191
+ - Certificate page (verifiable URL, share to LinkedIn).
192
+ - Parent / guardian portal (K-12): child progress, upcoming work, communications.
193
+ - Admin console (rostering, licenses, content moderation).
194
+ - Live classroom (video grid, whiteboard, chat, polls, breakout rooms) — if synchronous.
195
+
196
+ ---
197
+
198
+ ## Domain Glossary
199
+
200
+ | Term | Definition |
201
+ |------|-----------|
202
+ | LMS | Learning Management System — institutional platform for course delivery and gradebook |
203
+ | LXP | Learning Experience Platform — learner-centric, recommendation-driven evolution of the LMS |
204
+ | MOOC | Massive Open Online Course |
205
+ | LTI | Learning Tools Interoperability — IMS Global standard for embedding tools in an LMS |
206
+ | SCORM | Sharable Content Object Reference Model — legacy standard for packaged learning content |
207
+ | xAPI | Experience API (Tin Can) — modern learning-record format successor to SCORM |
208
+ | Caliper Analytics | IMS Global learning-analytics standard |
209
+ | SIS | Student Information System — system of record for enrolment, schedule, and grades |
210
+ | OneRoster | IMS Global standard for SIS-to-application rostering |
211
+ | Clever / ClassLink | US K-12 rostering and SSO providers |
212
+ | FERPA | Family Educational Rights and Privacy Act (US) — student-record privacy law |
213
+ | COPPA | Children's Online Privacy Protection Act (US) — parental consent for under-13 |
214
+ | GDPR-K | GDPR provisions specific to children's data (EU) |
215
+ | Section 508 | US federal accessibility law for ICT |
216
+ | WCAG 2.1 AA | Web Content Accessibility Guidelines, level AA |
217
+ | Item bank | Pool of assessment questions, randomised per attempt |
218
+ | Cohort | A group of learners progressing through a course together |
219
+ | Proctoring | Identity verification and behaviour monitoring during high-stakes exams |
220
+ | Adaptive learning | Personalisation engine that adjusts difficulty / pace per learner |
221
+ | Micro-credential | Small, verifiable digital certificate (vs. full degree) |
222
+ | Completion rate | % of enrolled learners who finish a course — the canonical EdTech metric |
223
+ | Time on task | Total active time a learner spends on a unit of content |
package/skills/references/domains/govtech.md ADDED
@@ -0,0 +1,244 @@
+ # Domain Reference: GovTech
+
+ Domain-specific knowledge for GovTech projects: citizen-facing e-services, permits and licensing, public records and FOIA, tax and benefits, identity and digital wallet, public-procurement (eProcurement), municipal 311 and case management, court and justice systems, voter and elections services.
+
+ ---
+
+ ## 1. /brief — Project Brief
+
+ ### Domain-specific interview questions
+ - Service type: citizen-facing portal, permit / licence application, tax filing, benefits eligibility, public records request, 311 / non-emergency service request, court e-filing, vendor / procurement portal, internal case-management for caseworkers?
+ - Level of government: national / federal, state / regional, municipal / city, agency-internal?
+ - Buyer vs. user: who funds the project (agency, ministry, council) vs. who uses it (citizen, caseworker, vendor)?
+ - Identity scheme: national digital ID (e.g. Login.gov, BankID, ItsMe, Aadhaar, eIDAS-recognised national eID), agency-issued credential, or self-asserted account?
+ - Languages: which official languages must be supported at launch (often legally mandated)?
+ - Procurement vehicle: internal build, prime contractor, framework agreement, GovTech sandbox?
+
+ ### Typical business goals
+ - Reduce time-to-service for citizens (cut processing time from weeks to days/minutes).
+ - Reduce cost-per-transaction vs. paper / counter / phone channels.
+ - Increase digital channel adoption (move citizens off paper and counters).
+ - Reduce caseworker manual effort and processing backlog.
+ - Improve service satisfaction (customer-effort score, citizen NPS).
+ - Meet legislated digital-service targets (e.g. EU SDG single digital gateway, US 21st Century IDEA Act, UK GDS service standard).
+
+ ### Typical risks
+ - Legacy mainframe / system-of-record integration (COBOL, AS/400, custom XML over SOAP) — slow, expensive, fragile.
+ - Procurement and approval cycles measured in years, not months.
+ - Heightened press / political scrutiny on launch failures.
+ - Accessibility lawsuits (Section 508, WCAG 2.1 AA, EN 301 549) — strict and frequent in the public sector.
+ - Data residency and sovereignty: many jurisdictions ban sending citizen data abroad.
+ - Equity of access — must work on old phones, low bandwidth, with limited literacy, and on assistive tech.
+ - Audit, FOIA, and records-retention obligations apply to every transaction.
+
+ ---
+
+ ## 2. /srs — Requirements Specification
+
+ ### Domain-specific interview questions
+ - Roles: citizen, authenticated resident, business representative, caseworker, supervisor, agency administrator, auditor, FOIA officer, third-party API consumer?
+ - Identity proofing level: NIST IAL1 (self-asserted), IAL2 (remote identity-proofed), IAL3 (in-person verified)? Authentication assurance level (AAL1/2/3)?
+ - Multi-tenancy across agencies: shared platform, agency-isolated, cross-agency single sign-on?
+ - Legacy integration: which systems of record (tax, benefits, vital records, land registry, criminal-justice information system) must be read or written, and via which protocols (SOAP, REST, fixed-width batch, EDI)?
+ - Compliance: WCAG 2.1 AA / EN 301 549, FedRAMP, FISMA, StateRAMP, ISO 27001, SOC 2, NIST 800-53, sector-specific (CJIS for justice, IRS Pub 1075 for tax, HIPAA for health-related benefits)?
+ - Records retention: how long must form submissions, attachments, and audit trails be retained (often 7+ years, sometimes permanently)?
+
+ ### Typical functional areas
+ - Identity proofing and authentication (national digital ID broker).
+ - Citizen account / dashboard ("my services", "my correspondence", "my documents").
+ - Service catalog and eligibility check.
+ - Form intake (multi-step, save-and-resume, prefill from system of record).
+ - Document upload and verification.
+ - Payment of fees (card, ACH/SEPA, cash voucher at counter or post office).
+ - Caseworker queue and case management.
+ - Decision letters and outbound correspondence (digital + postal).
+ - Public records / FOIA request handling.
+ - Audit log and records retention.
+ - Reporting for agency leadership and oversight bodies.
+ - Notifications (email, SMS, postal letter for unbanked / unconnected citizens).
+
+ ---
+
+ ## 3. /stories — User Stories
+
+ ### Domain-specific interview questions
+ - Critical citizen flows: discover service, check eligibility, sign in, fill multi-step form, upload documents, pay fee, receive decision, appeal a decision?
+ - Critical caseworker flows: pick up case from queue, request more information, approve / deny, escalate to supervisor, archive?
+ - Edge cases: incomplete application after 30 days, expired identity proof, payment refund, lost decision letter, FOIA disclosure with redaction, citizen requests records about themselves?
+ - Personas: tech-confident citizen, citizen with low digital literacy, citizen using assistive tech, business representative filing on behalf of a company, caseworker, FOIA officer, auditor.
+
+ ### Typical epics
+ - Sign-in via national digital ID.
+ - Citizen dashboard ("my services").
+ - Service discovery and eligibility check.
+ - Application intake (multi-step form with save-and-resume).
+ - Document verification.
+ - Fee payment.
+ - Caseworker queue and decision-making.
+ - Outbound correspondence (digital + postal hybrid).
+ - Public records / FOIA.
+ - Appeals and grievances.
+ - Audit log and records retention.
+ - Reporting and oversight.
+
+ ---
+
+ ## 4. /usecases — Use Cases
+
+ ### Domain-specific interview questions
+ - Critical alternative flows: identity proof fails, payment declines, citizen abandons mid-form, caseworker requests extra documents, supervisor overrides denial, system-of-record write fails after fee was charged?
+ - System actors: national digital ID broker, payment processor, system of record (tax, benefits, vital records), postal hybrid mail provider, geocoder / address validator, document verification service, FOIA workflow system?
+
+ ### Typical exceptional flows
+ - Identity proofing fails (KBA / liveness / document check) — fallback to in-person counter.
+ - Citizen has no digital identity — alternative paper or phone channel must remain available.
+ - Payment declined — resume application in "fee due" status, do not lose form data.
+ - Save-and-resume token expires — citizen must re-authenticate but data is preserved.
+ - System of record returns an error after the fee is charged — automatic refund and incident ticket.
+ - Caseworker decision is appealed — case re-opens with new SLA clock.
+ - Records-retention period elapses — automated archival or destruction per schedule.
+ - FOIA disclosure with PII redaction — redaction queue + reviewer sign-off.
+
+ ---
+
+ ## 5. /ac — Acceptance Criteria
+
+ ### Domain-specific interview questions
+ - Business rules: eligibility criteria per service, document validity periods, fee schedules, statutory processing time limits, appeal windows?
+ - Boundary values: max attachment size, max number of attachments, session timeout (often regulated, e.g. 15-min idle for IRS Pub 1075), form completion deadline?
+ - Accessibility AC: WCAG 2.1 AA conformance per page, EN 301 549 conformance for EU public sector, screen-reader announcements for form errors, keyboard-only navigation, plain-language reading level (often grade 6–8 mandated)?
+
+ ---
+
+ ## 6. /nfr — Non-functional Requirements
+
+ ### Domain-specific interview questions
+ - Concurrency: peak load on tax-filing day, benefits-application opening day, election registration deadline (10–100× steady state)?
+ - Channel parity: any service offered online must remain available via paper / phone — what is the legal requirement?
+ - Data residency / sovereignty: must all citizen data stay within national borders? Cloud region restrictions?
+ - Disaster recovery: RTO/RPO targets dictated by statute or oversight body?
+ - Plain language: maximum reading-grade level (often 6–8) enforced by guideline?
+
+ ### Mandatory NFR categories for GovTech
+ - **Performance:** form pages < 2s on a 3G connection; system-of-record write < 5s (p95); search < 1s.
+ - **Scalability:** handle the deadline-day spike (e.g. tax day, FAFSA opening, election registration cutoff) without queueing citizens out.
+ - **Availability:** 99.9% during published service hours; planned maintenance restricted to off-hours; status page mandatory for citizen services; outage alerts via the agency comms channel.
+ - **Security:** FedRAMP Moderate (US federal) / StateRAMP / ISO 27001 / NIST 800-53 controls; encryption at rest and in transit; audit log of every form submission, decision, and access to PII; multi-factor authentication for caseworkers; sector controls (CJIS, IRS Pub 1075, HIPAA) where applicable.
+ - **Privacy:** Privacy Impact Assessment on file; minimal data collection; right of access and right of correction; explicit consent for any secondary use; PII is encrypted and logged on access.
+ - **Accessibility:** WCAG 2.1 AA mandatory; EN 301 549 for EU public sector; Section 508 conformance for US federal; tested with assistive tech (NVDA, JAWS, VoiceOver, Dragon).
+ - **Plain language:** target reading grade 6–8; multi-language coverage of all official languages; avoidance of jargon and acronyms.
+ - **Records retention:** retention schedule per record type (often 7+ years; permanent for some categories); legal hold flag freezes deletion; FOIA-ready export.
+ - **Data residency:** all citizen data stored in the relevant jurisdiction; cloud region locked.
+ - **Backup:** RPO < 15 min for transactional state, RTO < 4 hours; offsite backup; tested DR runbook.
+
+ ---
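The records-retention category above pairs with the **RecordsHold** entity in section 7. A sketch of the disposal rule, with assumed field names: destruction is allowed only past the scheduled date and never under an active legal hold:

```js
// Sketch only: a record may be destroyed per its retention schedule
// unless any legal hold on it is still active (a hold freezes deletion).
function isDisposable(record, now = new Date()) {
  const pastSchedule =
    record.scheduledDisposalAt != null && now >= record.scheduledDisposalAt;
  const activeHold = record.holds.some((h) => h.liftedAt == null);
  return pastSchedule && !activeHold;
}

// isDisposable({ scheduledDisposalAt: new Date('2033-01-01'), holds: [] })
//   → false until 2033; true after, provided no hold is placed meanwhile.
```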
135
+
136
+ ## 7. /datadict — Data Dictionary
137
+
138
+ ### Domain-specific interview questions
139
+ - Multi-tenancy by agency: agency_id on every entity?
140
+ - PII categorisation: which fields are SSN / national ID / DOB / address — and how are they protected (encryption, tokenisation, masked in UI)?
141
+ - Records-retention metadata: every record has a retention class, retention start, scheduled disposal date?
142
+ - Audit trail: every read of PII is logged with actor, reason, timestamp?
143
+
144
+ ### Mandatory entities for GovTech
145
+ - **Citizen** — identity-proofed natural person: name, date of birth, national ID (encrypted), verified address.
146
+ - **Account** — login account: link to Citizen, identity assurance level, MFA enrolment, last sign-in.
147
+ - **Service** — service catalog entry: title, agency, eligibility criteria, fee schedule.
148
+ - **Application** — submitted form instance: service, citizen, status, submission timestamp, fee paid.
149
+ - **FormPage** — multi-step form state: application, page, answers (versioned for save-and-resume).
150
+ - **Document** — uploaded attachment: application, type, file hash, virus-scan status, retention class.
151
+ - **Payment** — fee transaction: application, amount, channel (card, ACH/SEPA, voucher), provider reference.
152
+ - **Case** — caseworker workflow item: application, assigned caseworker, status, SLA clock.
153
+ - **Decision** — outcome: case, decision (approve / deny / request more info), reason, decision date, appealable until.
154
+ - **Correspondence** — outbound message: case, channel (digital, postal, both), template, sent_at, delivery status.
155
+ - **Caseworker** — internal staff: agency, role, MFA status, security clearance level if applicable.
156
+ - **AuditLog** — append-only record: actor, action, target, timestamp, IP, reason (mandatory for PII reads).
157
+ - **RecordsHold** — legal hold marker: entity, hold reason, hold date, lifted_at.
158
+ - **FoiaRequest** — public records request: requester, scope, status, redacted disclosure document, decision date.
159
+
160
+ ---
161
+
162
+ ## 8. /apicontract — API Contract
163
+
164
+ ### Domain-specific interview questions
165
+ - Identity broker integration: SAML 2.0 / OIDC against the national digital ID provider — which assurance levels and claims (sketched after this list)?
166
+ - System-of-record APIs: SOAP / REST / batch — what authentication (mTLS, signed JWT, IP allow-list)?
167
+ - Public APIs for civic-tech developers: open by default vs. registered keys?
168
+ - Webhook outputs for status changes ("application submitted", "decision issued") to other agency systems?
169
+ - Open-data exports (CSV / JSON / GeoJSON) for transparency and oversight?
170
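+
+ To make the assurance-levels-and-claims question concrete, here is a hypothetical claim set an OIDC national-ID broker might return; every real broker publishes its own contract, so the fields below are assumptions:
+
+ ```ts
+ // Hypothetical OIDC claims from a national digital ID broker.
+ // Field names are illustrative; each broker defines its own contract.
+ interface IdBrokerClaims {
+   sub: string;               // broker-scoped subject identifier
+   ial: 1 | 2 | 3;            // NIST identity assurance level
+   aal: 1 | 2 | 3;            // NIST authentication assurance level
+   given_name: string;
+   family_name: string;
+   birthdate: string;         // ISO 8601 date
+   verified_address?: string; // present only at higher assurance levels
+ }
+ ```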
+
171
+ ### Typical endpoint groups
172
+ - Auth (national digital ID broker callback, MFA enrolment, session refresh).
173
+ - Citizen profile (read, update verified attributes, consent management).
174
+ - Services (catalog, eligibility check).
175
+ - Applications (start, save, submit, list mine with an in-progress filter).
176
+ - Documents (upload, virus-scan status, redact, download).
177
+ - Payments (initiate, callback, refund, receipt).
178
+ - Cases (caseworker queue, assign, decide, escalate).
179
+ - Correspondence (templates, send, status).
180
+ - Audit (read by auditor / oversight role only).
181
+ - FOIA (intake, status, disclosure download).
182
+ - Open data (anonymised statistics, machine-readable).
183
+ - Webhooks (outgoing events to other agency systems).
184
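+
+ A minimal sketch of the outgoing webhook events named in the interview questions above; event names and payload fields are assumptions for illustration:
+
+ ```ts
+ // Illustrative event shapes for the webhook group; names are assumptions.
+ type AgencyWebhookEvent =
+   | {
+       type: 'application.submitted';
+       applicationId: string;
+       serviceId: string;
+       submittedAt: string;      // ISO 8601 timestamp
+     }
+   | {
+       type: 'decision.issued';
+       caseId: string;
+       decision: 'approve' | 'deny' | 'request_more_info';
+       appealableUntil?: string; // absent when the decision cannot be appealed
+     };
+ ```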
+
185
+ ---
186
+
187
+ ## 9. /wireframes — Wireframe Descriptions
188
+
189
+ ### Domain-specific interview questions
190
+ - Key screens: service discovery, eligibility check, sign-in via national ID, multi-step form with save-and-resume, document upload, fee payment, application status, decision letter view, appeal form, caseworker queue, caseworker decision form?
191
+ - Specific states: identity proofing failed, save-and-resume token expired, payment failed, application incomplete after N days, decision pending appeal, redacted FOIA disclosure?
192
+ - Plain-language and accessibility states: large-text mode, high-contrast mode, screen-reader announcements for form errors, language switcher prominently placed?
193
+
194
+ ### Typical screens
195
+ - Service catalog (browse and search by life event: "having a baby", "moving house", "starting a business").
196
+ - Eligibility check (short pre-form to filter ineligible applicants out before identity proofing).
197
+ - Sign-in / identity proofing flow (national ID broker handoff, fallback to in-person).
198
+ - Citizen dashboard (in-progress applications, recent decisions, correspondence inbox).
199
+ - Multi-step form (progress indicator, save-and-resume, plain-language help, inline error messages).
200
+ - Document upload (drag-and-drop, list of accepted types, virus-scan progress).
201
+ - Fee payment (channel choice, receipt).
202
+ - Application status (timeline with statutory SLA clock).
203
+ - Decision letter (digital copy, postal copy mailed, appeal button if applicable).
204
+ - Appeal form (separate flow with its own SLA).
205
+ - Caseworker queue (filter by status, SLA, priority; bulk-assign).
206
+ - Caseworker decision form (read-only application, decision options, reason text, supervisor escalation).
207
+ - Audit screen (read-only, restricted to auditor role).
208
+ - FOIA intake (public form, no sign-in required).
209
+ - Public service status page.
210
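+
+ The application-status screen above implies an ordered status set driving the timeline; a hypothetical sketch follows (the real statuses would come from the project's /stories and /usecases artifacts):
+
+ ```ts
+ // Hypothetical status progression behind the application-status timeline.
+ // Pausing the SLA clock on info_requested is an assumption for illustration.
+ type ApplicationStatus =
+   | 'draft'           // save-and-resume in progress
+   | 'submitted'       // statutory SLA clock starts
+   | 'in_review'
+   | 'info_requested'  // SLA clock paused while waiting on the citizen
+   | 'decided'
+   | 'under_appeal'
+   | 'closed';
+ ```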
+
211
+ ---
212
+
213
+ ## Domain Glossary
214
+
215
+ | Term | Definition |
216
+ |------|-----------|
217
+ | FedRAMP | US federal cloud-security authorisation programme |
218
+ | StateRAMP | US state-level analogue of FedRAMP |
219
+ | FISMA | Federal Information Security Modernization Act (US) |
220
+ | NIST 800-53 | US federal security and privacy controls catalog |
221
+ | CJIS | Criminal Justice Information Services security policy (FBI) |
222
+ | IRS Pub 1075 | US IRS rules for handling federal tax information |
223
+ | eIDAS | EU regulation on electronic identification and trust services |
224
+ | SDG | Single Digital Gateway (EU) — single point of access for cross-border services |
225
+ | GDS | UK Government Digital Service — sets the UK service standard |
226
+ | 21st Century IDEA | US Integrated Digital Experience Act, mandating digital service modernisation |
227
+ | EN 301 549 | EU public-sector accessibility standard |
228
+ | WCAG 2.1 AA | Web Content Accessibility Guidelines, level AA |
229
+ | Section 508 | US federal accessibility law for ICT |
230
+ | Login.gov | US federal shared sign-in service |
231
+ | BankID | Nordic national digital ID scheme |
232
+ | ItsMe | Belgian national digital ID |
233
+ | Aadhaar | Indian national digital ID |
234
+ | IAL / AAL | NIST identity-assurance and authentication-assurance levels |
235
+ | KBA | Knowledge-Based Authentication (identity proofing via personal-history questions) |
236
+ | FOIA | Freedom of Information Act — public records request regime |
237
+ | Privacy Impact Assessment (PIA) | Mandatory privacy review of any system that processes PII |
238
+ | Records retention schedule | Legally mandated table of how long each record class must be kept |
239
+ | Legal hold | Marker that freezes deletion of records subject to litigation or audit |
240
+ | Plain language | Mandated writing standard targeting reading grade 6–8 |
241
+ | Hybrid mail | Postal letter generated digitally and printed by a contracted provider |
242
+ | 311 | Non-emergency municipal service request channel (US) |
243
+ | eFiling | Electronic submission of court documents |
244
+ | eProcurement | Public-sector vendor and bid management |
package/skills/srs/SKILL.md CHANGED
@@ -15,7 +15,7 @@ Second step of the BA Toolkit pipeline. Generates an SRS adapted from IEEE 830.
15
15
  0. If `00_principles_*.md` exists in the output directory, load it and apply its conventions (artifact language, ID format, traceability requirements, Definition of Ready, quality gate threshold).
16
16
  1. Read `01_brief_*.md` from the output directory. If missing, warn and suggest running `/brief`.
17
17
  2. Extract: slug, domain, business goals, functionality, stakeholders, constraints, glossary.
18
- 3. If a matching `references/domains/{domain}.md` file exists (currently: `saas`, `fintech`, `ecommerce`, `healthcare`, `logistics`, `on-demand`, `social-media`, `real-estate`, `igaming`), load it and apply its section `2. /srs`.
18
+ 3. If a matching `references/domains/{domain}.md` file exists (currently: `saas`, `fintech`, `ecommerce`, `healthcare`, `logistics`, `on-demand`, `social-media`, `real-estate`, `igaming`, `edtech`, `govtech`, `ai-ml`), load it and apply its section `2. /srs`.
19
19
 
20
20
  ## Environment
21
21