npm - nubos-pilot - Versions diffs - 1.2.2 → 1.2.4 - Mend

nubos-pilot 1.2.2 → 1.2.4

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (70) hide show

package/CHANGELOG.md +18 -0
package/README.md +16 -0
package/agents/np-architect.md +2 -0
package/agents/np-executor.md +1 -1
package/agents/np-learnings-extractor.md +54 -0
package/agents/np-planner.md +1 -1
package/agents/np-security-reviewer.md +9 -0
package/bin/np-tools/_commands.cjs +4 -0
package/bin/np-tools/derive-tier.cjs +86 -0
package/bin/np-tools/derive-tier.test.cjs +83 -0
package/bin/np-tools/learnings.cjs +109 -0
package/bin/np-tools/learnings.test.cjs +66 -0
package/bin/np-tools/loop-run-round.cjs +7 -1
package/bin/np-tools/security.cjs +3 -0
package/bin/np-tools/skill-audit.cjs +79 -0
package/bin/np-tools/skill-audit.test.cjs +86 -0
package/bin/np-tools/spawn-headless.cjs +35 -1
package/bin/np-tools/spawn-headless.test.cjs +135 -0
package/bin/np-tools/verify-reliability.cjs +65 -0
package/bin/np-tools/verify-reliability.test.cjs +69 -0
package/lib/agents.test.cjs +1 -0
package/lib/config-defaults.cjs +13 -0
package/lib/config-schema.cjs +11 -0
package/lib/eval-reliability.cjs +63 -0
package/lib/eval-reliability.test.cjs +56 -0
package/lib/headless-guard.cjs +127 -0
package/lib/headless-guard.test.cjs +119 -0
package/lib/install/claude-hooks-learnings.test.cjs +82 -0
package/lib/install/claude-hooks.cjs +65 -4
package/lib/install/claude-hooks.test.cjs +5 -2
package/lib/learnings/capture-ledger.cjs +80 -0
package/lib/learnings/capture-ledger.test.cjs +54 -0
package/lib/learnings/extract.cjs +191 -0
package/lib/learnings/extract.test.cjs +115 -0
package/lib/nubosloop-audit.cjs +104 -0
package/lib/nubosloop-skill-audit.test.cjs +98 -0
package/lib/nubosloop.cjs +9 -0
package/lib/tier-classify.cjs +67 -0
package/lib/tier-classify.test.cjs +67 -0
package/np-tools.cjs +4 -0
package/package.json +1 -1
package/skills/np-access-control/SKILL.md +42 -0
package/skills/np-accessibility-audit/SKILL.md +41 -0
package/skills/np-adr/SKILL.md +37 -0
package/skills/np-api-design/SKILL.md +34 -0
package/skills/np-caching-strategy/SKILL.md +38 -0
package/skills/np-data-modeling/SKILL.md +37 -0
package/skills/np-data-privacy/SKILL.md +39 -0
package/skills/np-dependency-audit/SKILL.md +47 -0
package/skills/np-encryption/SKILL.md +47 -0
package/skills/np-error-handling/SKILL.md +37 -0
package/skills/np-incident-response/SKILL.md +38 -0
package/skills/np-llm-app-architecture/SKILL.md +50 -0
package/skills/np-observability/SKILL.md +39 -0
package/skills/np-performance/SKILL.md +38 -0
package/skills/np-queue-design/SKILL.md +32 -0
package/skills/np-rag-design/SKILL.md +43 -0
package/skills/np-refactoring/SKILL.md +35 -0
package/skills/np-resilience-patterns/SKILL.md +39 -0
package/skills/np-secure-code-review/SKILL.md +46 -0
package/skills/np-secure-design/SKILL.md +44 -0
package/skills/np-service-boundary/SKILL.md +35 -0
package/skills/np-system-design/SKILL.md +40 -0
package/skills/np-test-strategy/SKILL.md +46 -0
package/skills/np-threat-model/SKILL.md +42 -0
package/templates/claude/payload/hooks/np-learnings-hook.cjs +56 -0
package/templates/claude/payload/hooks/np-security-hook.cjs +1 -0
package/workflows/architect-phase.md +21 -1
package/workflows/execute-phase.md +66 -4
package/workflows/verify-work.md +17 -4

package/skills/np-observability/SKILL.md ADDED Viewed

@@ -0,0 +1,39 @@
+---
+name: np-observability
+description: "Quality bar for changes that add or modify a code path that should be observable — services, request handlers, background jobs, integrations with external systems, and any new failure path. Triggered for executor work on handlers, workers, clients, or error branches. Encodes the logging/metrics/tracing rules the change MUST satisfy before commit so the path is diagnosable from telemetry alone, not a runbook or dashboard to author. Language- and framework-agnostic."
+user-invocable: false
+---
+# Observability
+If something breaks at 3am, can the on-call person see what happened from logs and metrics alone — without you adding logging during the incident? Apply this bar to the code path you are about to commit.
+## Before editing
+- Read the project's existing telemetry conventions first: `node .nubos-pilot/bin/np-tools.cjs knowledge-search "logging metrics conventions" --task $TASK_ID`. Match the established log idiom, metric naming, and trace-context propagation — consistency beats personal preference.
+## Logging
+- **Structured, not string-concatenated.** Emit key-value fields (`order_id`, `duration_ms`), never bake variables into a prose sentence. Machines query fields; they can't grep your prose.
+- **Levels mean something.** `error` = actionable, someone must look; `warn` = degraded but handled; `info` = significant state change at a boundary; `debug` = development detail. Don't log expected control flow as `error` — error noise hides real errors.
+- **Carry the correlation/request/trace id** on every log line so a single request can be reconstructed across the path.
+- **Never log secrets, tokens, credentials, or PII.** Redact at the logging boundary, not by hoping callers don't pass them.
+- **Log once at the boundary**, not at every layer the value passes through. Don't log inside hot loops or per-iteration — aggregate and emit once.
+## Metrics
+- **Emit a metric for what you'll need to answer "is it healthy / how often / how slow."** Counter for occurrences, histogram/timer for latency, and a way to compute error rate (success vs failure count) on any path that can fail.
+- **Follow the existing metric naming** and label/tag conventions. Don't invent a parallel scheme. Keep label cardinality bounded — no user ids or raw inputs as labels.
+## Tracing
+- **Propagate trace context across every service and async boundary** — outgoing requests, queue publishes, job pickups. A dropped context turns one trace into orphans.
+## Verification bar (must hold before commit)
+- Every new failure path is diagnosable from logs + metrics alone: a structured log at `warn`/`error` with the correlation id, and a counter that increments on failure.
+- Logs are structured key-value with correct levels; no secrets/tokens/PII reach any sink — pair with [np-secure-code-review].
+- Error branches carry actionable context (what failed, identifying ids) so the cause is clear without re-reading source — align with [np-error-handling].
+- New metrics follow existing naming/labels; latency and error rate are observable for any path that can be slow or fail.
+- Trace context is propagated across every service/async hop the change introduces.
+- No logging in hot loops; the path logs once at its boundary, not at every layer.

package/skills/np-performance/SKILL.md ADDED Viewed

@@ -0,0 +1,38 @@
+---
+name: np-performance
+description: "Quality bar for changes that touch a hot path, data access, or anything that scales with input size — triggered for executor work on loops over collections, query layers, request/render hot paths, batch jobs, or any code whose cost grows with N. Encodes the performance checklist the change MUST satisfy before commit: known complexity, no N+1, bounded result sets, sound caching, lean hot loops. Not a profiling report to author. Language- and framework-agnostic."
+user-invocable: false
+---
+# Performance
+A change that scales with input must not degrade as input grows. Optimize the dominant cost; leave micro-noise alone. Never trade correctness or readability for unmeasured speed.
+## Before editing
+- Read existing conventions / find the hot path: `node .nubos-pilot/bin/np-tools.cjs knowledge-search "<query>" --task $TASK_ID`.
+- Establish the baseline before changing anything: measure the current cost (timing, query count, allocations) so you can prove the change helps. Don't guess where the time goes.
+## Complexity
+- Know the algorithmic complexity of what you wrote and whether it grows with N. State it to yourself before committing.
+- Replace nested scans over collections with a set/map lookup (O(N²) → O(N)).
+- Do work once: hoist invariant computation out of loops; memoize repeated pure calls.
+## Data access
+- Kill N+1: no queries inside a loop. Batch, eager-load, or join so the count is constant in the number of rows.
+- Paginate or stream unbounded result sets. Never load-all-into-memory when the size is driven by user data or table growth.
+- Push filtering, aggregation, and limits down to the data layer instead of fetching wide and discarding in code.
+## Hot loops and caching
+- Avoid needless allocation and copying inside hot loops; reuse buffers, slice instead of clone where safe.
+- Cache only when correctness allows it, and only with a clear invalidation story — no stale-forever. Define the key and the expiry/bust trigger.
+- Defer expensive, non-critical work (move it async / to a background job) so it stays off the request or render path.
+## Verification bar (must hold before commit)
+- The change's complexity is known and does not grow worse than linearly with input unless unavoidable and justified.
+- No query runs inside a loop; collection access over the data layer is batched or eager-loaded.
+- Every result set whose size scales with input is paginated, streamed, or bounded — nothing loads everything into memory.
+- Any cache added has an explicit invalidation trigger; no value can go stale-forever.
+- Hot loops do no redundant allocation, copying, or repeated pure work.
+- A before/after measurement (or a clear complexity argument) shows the change helps and harms nothing; the dominant cost was the target, not micro-noise.
+- Correctness and readability were not sacrificed for unmeasured speed.
+- See [np-data-modeling] for indexing that backs these queries and [np-observability] for latency/throughput metrics that confirm the win in production.

package/skills/np-queue-design/SKILL.md ADDED Viewed

@@ -0,0 +1,32 @@
+---
+name: np-queue-design
+description: "Quality bar for executor work that adds or changes asynchronous job, message-queue, or worker code — producers, consumers, background jobs, event handlers, task processors, and the messages they exchange. Triggered whenever a change enqueues, dequeues, or processes work off the request path, encoding the async-correctness rules the change MUST satisfy before commit (idempotent consumers under at-least-once delivery, no assumed ordering, a real failure path with backoff and dead-letter, ack timeouts above worst-case processing, bounded backpressure, small versioned payloads, observable jobs). This is a bar the change must meet, not a queue spec to author. Language- and broker-agnostic."
+user-invocable: false
+---
+# Queue Design
+A queue decouples producer from consumer in time — and in doing so trades synchronous certainty for delivery guarantees that are weaker than they look. Design the consumer for the guarantees the transport actually gives, not the ones you wish it gave.
+## Before editing
+- Read existing conventions: `node .nubos-pilot/bin/np-tools.cjs knowledge-search "<query>" --task $TASK_ID`. Match the established queue/job idiom (naming, serializer, retry policy, dead-letter convention) rather than inventing a parallel one.
+## Assume at-least-once; make consumers idempotent
+- Treat every message as possibly delivered more than once. Processing the same message twice MUST produce the same result — dedupe on a message/business key, use a conditional or upsert write, or record processed IDs. Exactly-once is largely a myth; idempotency is the real defense.
+- Do not assume ordering. Only rely on order when the transport guarantees it AND the work genuinely needs it; otherwise make handlers order-independent.
+## Every consumer needs a failure path
+- A handler that can fail needs explicit retry with backoff, a max-attempts cap, and a dead-letter queue for what exhausts it. A poison message that always fails MUST NOT block the queue or retry forever — route it to the DLQ and move on.
+- Set the visibility/ack timeout longer than the worst-case processing time. Too short and the broker redelivers in-flight work, double-processing it.
+## Bound the queue and the payload
+- A queue that grows unbounded is an outage waiting to happen. Define what happens under backpressure — bound depth, shed, or scale consumers — and make lag observable; do not let it silently accumulate.
+- Keep the payload small: carry IDs and references, not large blobs. Version the message schema so producer and consumer can evolve independently; a consumer must tolerate an unknown newer field, not crash on it.
+## Verification bar (must hold before commit)
+- The consumer is idempotent: redelivery of the same message produces the same result, proven by a dedupe key, conditional write, or processed-ID record — not by hoping delivery is exactly-once.
+- No unjustified reliance on ordering; where order is required, the transport actually guarantees it.
+- A failure path exists: bounded retry with backoff, a max-attempts cap, and a dead-letter destination — a poison message cannot stall or loop the queue. Retry/backoff mechanics align with [np-error-handling].
+- Visibility/ack timeout exceeds worst-case processing time; no window where in-flight work is redelivered and run twice.
+- Backpressure is bounded and queue depth/lag is observable per [np-observability]; failures, retries, and DLQ arrivals are logged with enough context to diagnose what failed and how often.
+- The payload is small and the schema versioned; consumers tolerate additive schema change. Cross-boundary failure handling for the enqueue/dequeue call follows [np-resilience-patterns].

package/skills/np-rag-design/SKILL.md ADDED Viewed

@@ -0,0 +1,43 @@
+---
+name: np-rag-design
+description: "Quality bar for researcher/architect/executor work that adds or changes a retrieval-augmented (RAG) or semantic-search pipeline — chunking, embeddings, vector or hybrid indexes, retrieval, ranking, and grounding. Triggered when a task touches how documents are split, embedded, indexed, retrieved, filtered, or fed to a model as context. Encodes a checklist the design MUST satisfy before commit, not a spec document to author. Provider- and store-agnostic."
+user-invocable: false
+---
+# RAG / Retrieval Design
+Retrieval decides what the model sees. Bad chunks, a stale index, or unfiltered hits silently degrade every downstream answer. Make each retrieval choice deliberate and measurable.
+## Before editing
+- Read existing conventions: `node .nubos-pilot/bin/np-tools.cjs knowledge-search "retrieval index conventions" --task $TASK_ID`.
+- Find the existing chunker, embedding call, and index/query path before adding a parallel one.
+## Chunking
+- Size and overlap are deliberate choices justified against content shape and query length — not copied defaults.
+- Split on semantic boundaries (sections, paragraphs, code blocks); never sever a unit mid-sentence or mid-record.
+- Carry source metadata on every chunk: document id, title, location/anchor, timestamp, and access scope.
+## Embedding & index
+- One embedding model for both index build and query — mixing models or versions makes similarity meaningless.
+- Index type (vector / keyword / hybrid) fits scale and query latency; record the dimensionality and distance metric.
+- Re-embedding after a model change means a full re-index — plan it, don't half-migrate.
+## Retrieval quality
+- Have a retrieval eval: a known query→expected-doc set with recall/precision measured, not "it returned something."
+- Rank, then filter by a tuned score threshold so weak hits never reach the prompt; cap result count.
+- Surface why a chunk was retrieved (score, source) so failures are debuggable.
+## Grounding & freshness
+- Answers cite the retrieved sources; nothing ungrounded is presented as fact.
+- Budget retrieved tokens against the prompt window — truncate or rerank, don't overflow.
+- Define the re-index / invalidation trigger (on write, on schedule, on version bump); a stale index serves wrong answers.
+## Verification bar (must hold before commit)
+- Chunking preserves semantic units and attaches source metadata; size/overlap are justified, not defaulted.
+- Index and query use the identical embedding model and version; metric and dimensions are documented.
+- A retrieval eval exists and recall on the known set is measured and acceptable — regressions are visible.
+- Low-score / irrelevant chunks are filtered by threshold and count before reaching the context window.
+- Generated answers are grounded in and cite retrieved sources; see [np-llm-app-architecture] for the generation half.
+- Retrieval enforces caller access scope — it never returns documents the caller cannot see; see [np-secure-code-review].
+- A concrete re-index / invalidation trigger is defined for content and model changes.
+- Index and query latency fit the budget at target scale; see [np-performance].

package/skills/np-refactoring/SKILL.md ADDED Viewed

@@ -0,0 +1,35 @@
+---
+name: np-refactoring
+description: "Quality bar for behavior-preserving change — refactor, cleanup, restructure, rename, extract, deduplicate, decouple. Triggered for executor work on tasks that improve internal structure while keeping observable behavior identical. Encodes the checklist the change MUST satisfy before commit, not a method to teach. Structure changes; behavior does not — if behavior shifts, it is a feature or fix, not a refactor, and must be flagged. Language- and framework-agnostic."
+user-invocable: false
+---
+# Refactoring
+A refactor changes how the code is shaped, never what it does. Same inputs, same outputs, same side effects — only the structure improves. If you cannot say that with a straight face, you are not refactoring.
+## Before editing
+- Read existing conventions / pin behavior: `node .nubos-pilot/bin/np-tools.cjs knowledge-search "<query>" --task $TASK_ID`.
+- Find the tests that currently exercise the code. If none pin the behavior you are about to move, add characterization tests FIRST — capture what it does today, not what it should do.
+## Behavior is frozen
+- Do not change observable behavior: return values, side effects, error cases, ordering, timing-sensitive contracts.
+- Preserve public contracts — signatures, names, types, routes, serialized shapes — unless the task is explicitly to change them.
+- If you discover a bug mid-refactor, do not fix it silently. Flag it and keep the behavior (including the bug) until a separate task addresses it.
+- A behavior change disguised as a refactor is the failure mode to avoid. When in doubt, it is NOT a refactor.
+## Small green steps
+- Move in small, reversible increments. Run the tests between steps; stay green the whole way.
+- Separate mechanical changes (bulk rename, file move, extract) from logic moves — different commits, so each diff is trivially reviewable.
+- Never mix a refactor and a behavior change in one commit.
+## Stay in scope
+- Refactor only what the task names. Resist "while I'm here" cleanups, drive-by reformatting, and adjacent rewrites — they bloat the diff and hide the real change.
+- Improving the target means: better names, higher cohesion, less duplication, looser coupling, lower complexity. Not: new abstractions nobody asked for.
+## Verification bar (must hold before commit)
+- Behavior is provably unchanged: the pre-existing tests pass without modification, or characterization tests added first still pass (see [np-test-strategy]).
+- No public contract was altered unless the task required it (see [np-api-design] for contract preservation).
+- The diff is reviewable: mechanical renames/moves are isolated from logic changes; no unrelated files touched.
+- No behavior change is bundled in; any discovered bug or behavior gap is flagged, not fixed in place.
+- The change measurably reduces duplication, coupling, or complexity — or it should not have been made.

package/skills/np-resilience-patterns/SKILL.md ADDED Viewed

@@ -0,0 +1,39 @@
+---
+name: np-resilience-patterns
+description: "Quality bar for executor work where code calls across a process or network boundary — an external API, another internal service, a queue, or a database under load. Triggered whenever a change introduces or modifies a remote call, encoding the resilience rules the change MUST satisfy before commit (bounded timeouts, fail-fast on persistent failure, isolation to stop cascades, degradation over hard error, load shedding under overload, retry-safe idempotency). This is a bar the change must meet, not a resilience spec to author. Language- and framework-agnostic."
+user-invocable: false
+---
+# Resilience Patterns
+A call across a process or network boundary WILL fail — slowly, partially, or completely. Design for that as the normal case, not the exception. Match each pattern to the actual failure mode you face; do not bolt on machinery the call does not need.
+## Before editing
+- Read existing conventions: `node .nubos-pilot/bin/np-tools.cjs knowledge-search "<query>" --task $TASK_ID`. Match the established resilience idiom (timeout helper, circuit-breaker wrapper, fallback shape) rather than inventing a parallel one.
+## Bound every remote call
+- Every call leaving the process gets an explicit timeout — connect AND read. No client default, no unbounded wait. A hung dependency must surface as a fast, typed failure, never a stuck caller.
+- Set the budget from the caller's deadline, not the dependency's optimism. A downstream timeout longer than the request budget is a bug.
+## Fail fast on a dependency that stays down
+- For a dependency that can be down for seconds-to-minutes, a circuit breaker beats hammering it: trip after a failure threshold, reject immediately while open, probe before closing.
+- A breaker is for sustained failure. For a single transient blip, a bounded timeout (or one retry — see [np-error-handling]) is enough; do not reach for a breaker where a timeout suffices.
+## Isolate so one failure does not sink the ship
+- Cap the resources any single dependency can consume — connection-pool slice, concurrency limit, dedicated worker set. One slow dependency must not drain the shared pool and freeze unrelated work (bulkhead).
+- Make the blast radius explicit: when dependency X is exhausted, exactly which calls degrade — and which stay healthy?
+## Degrade instead of erroring, shed instead of collapsing
+- Where a stale, cached, or partial answer beats a hard error, define the fallback and return it on failure. State plainly when a response is degraded.
+- Under overload, shed or queue with a bound rather than accept everything and fall over. Reject early with a clear signal; protect the work already in flight.
+## Make retries safe
+- Any call that may be retried (by you, a client, or infrastructure) must be idempotent — idempotency key, conditional write, or natural dedupe. A retried non-idempotent write is data corruption waiting to happen.
+## Verification bar (must hold before commit)
+- Every new or modified remote call has an explicit, deadline-derived timeout; no unbounded waits remain.
+- The pattern matches the failure mode: breaker only where failure is sustained, bulkhead only where a shared resource can be exhausted, fallback only where a degraded answer is acceptable. No unjustified machinery.
+- A failing or slow dependency cannot cascade: its resource consumption is capped and the degraded path is exercised, not just declared.
+- Overload is shed or bounded, not absorbed until collapse.
+- Anything retryable is idempotent; retry/backoff itself lives in [np-error-handling].
+- Trips, rejections, fallbacks, and shed load are observable per [np-observability]; added latency budgets respect [np-performance].

package/skills/np-secure-code-review/SKILL.md ADDED Viewed

@@ -0,0 +1,46 @@
+---
+name: np-secure-code-review
+description: "Quality bar for any change that touches authentication, authorization, session handling, secrets, cryptography, file uploads, deserialization, SQL or shell construction, SSRF-prone outbound requests, or user-controlled input that reaches a sink. Triggered for executor and security-reviewer work on auth/crypto/SQL/input-handling code. Encodes an OWASP-aligned review checklist the change MUST satisfy before commit — not a document to produce. Language-agnostic."
+user-invocable: false
+---
+# Secure Code Review
+A change that touches a security-sensitive surface is not done when the test passes — it is done when it survives this checklist. Apply every relevant section to the diff you are about to commit. A single unaddressed item is a blocking finding, not a nit.
+## Before editing
+- Read the existing pattern first: `node .nubos-pilot/bin/np-tools.cjs knowledge-search "auth <surface>" --task $TASK_ID`. Match the project's established auth/validation idiom; do not introduce a second one.
+- Identify the trust boundary the change crosses. Everything arriving from across it is hostile until validated.
+## Input handling
+- Validate at the boundary: type, length, range, format, allow-list over deny-list. Reject, don't sanitize-and-hope.
+- Parameterize every query. No string-concatenated SQL/NoSQL/LDAP, ever. No shelling out with interpolated user input — pass argv arrays.
+- Encode on output for the destination context (HTML, attribute, JS, URL, SQL). Escaping is contextual, not global.
+- Treat file paths, redirect targets, and outbound URLs from user input as SSRF/path-traversal vectors: canonicalize and allow-list.
+## AuthN / AuthZ
+- Authorization is checked server-side on every protected action, against the *acting* identity, on the *specific* resource (no IDOR). Never trust a client-supplied role, id, or `isAdmin`.
+- New endpoints/handlers default to deny. Adding a route never silently widens access.
+- Session/token: rotate on privilege change, expire, and invalidate on logout. No auth material in URLs or logs.
+## Secrets & crypto
+- No hardcoded secrets, keys, or tokens in source or fixtures. Read from config/env/secret store.
+- Use the platform's vetted crypto primitives. No home-rolled crypto, no MD5/SHA1 for passwords (use the project's password hasher), no ECB, no static IVs.
+- Compare secrets in constant time. Generate tokens from a CSPRNG.
+## Errors, logging, dependencies
+- Error responses leak nothing exploitable (no stack traces, SQL, or internal paths to clients). Log the detail server-side instead.
+- Never log secrets, tokens, full PANs, or full PII.
+- New dependency? Justify it, pin it, and confirm it is maintained — a new transitive supply-chain surface is a review item.
+## Verification bar (must hold before commit)
+- Every user-input path in the diff is validated and its sinks are parameterized/encoded.
+- Every protected action has a server-side authz check on the acting identity and resource.
+- No secret, no home-rolled crypto, no leaking error path introduced.
+- If any item cannot be satisfied within task scope, stop and surface it as a security finding — do not commit around it. Pair with [np-threat-model] when the change introduces a new trust boundary.

package/skills/np-secure-design/SKILL.md ADDED Viewed

@@ -0,0 +1,44 @@
+---
+name: np-secure-design
+description: "Quality bar for designing a new feature, system, or integration with security implications — a new external surface, a new trust boundary, a new privilege or token model, or the handling of sensitive assets (credentials, PII, money, secrets). Triggered for architect and executor work that shapes how a thing will be built, not how it is coded line-by-line. Encodes design-time security rules the design MUST satisfy before commit — the design-time twin of [np-secure-code-review]. Language- and framework-agnostic."
+user-invocable: false
+---
+# Secure Design
+Security is decided at design time. A control you forgot to design in is a vulnerability you will ship; the code review can only catch what the architecture left room for. Apply this bar to the design you are about to commit, before any line of implementation depends on it.
+## Before editing
+- Read existing security decisions/conventions: `node .nubos-pilot/bin/np-tools.cjs knowledge-search "security design <surface>" --task $TASK_ID`. Locked decisions (RULES/CONTEXT) — established trust boundaries, secret stores, privilege models — override generic defaults.
+- Name the trust boundaries this design crosses and the assets it protects. Everything else follows from those two facts.
+## Secure by default
+- The safe configuration is the default. Security is opt-out, never opt-in: encryption on, public access off, the strict policy selected unless deliberately relaxed.
+- Fail securely. An error, timeout, or unparseable input denies the action — it never grants access or falls through to a permissive path.
+- Build the security gate into the design now, not as a later hardening pass. A control retrofitted onto a shipped surface is a migration, not a tweak.
+## Least privilege & blast radius
+- Every component, credential, and token gets the minimum access it needs and nothing more. No shared god-credentials, no wildcard scopes, no standing admin.
+- Scope and time-box tokens; isolate secrets in a store, never in code, config, or the same blast radius as the data they unlock.
+- Segment so a single compromise is contained. Ask: if this component is fully owned, what else falls? Design to make that answer small.
+## Zero trust & defense in depth
+- Authenticate and authorize every request at the boundary. Never trust the network, the caller's claimed identity, or a client-supplied role/flag — verify against the acting identity on the specific resource.
+- Never rely on a single control. A WAF is not a substitute for input validation; network position is not a substitute for authz. Each layer assumes the one in front of it failed.
+- Validate across boundaries even between your own services — internal does not mean trusted.
+## Threat-informed
+- Design against how this will actually be attacked, not a generic checklist. Enumerate the abuse cases for this specific surface and design the control that defeats each — pair with [np-threat-model] for any new trust boundary or external surface.
+## Verification bar (must hold before commit)
+- The default state of the design is the secure state; relaxing it is an explicit, visible choice.
+- Every component/credential/token in the design is least-privilege and scoped; secrets are isolated and a single compromise is contained.
+- Every entry point authenticates and authorizes against the acting identity — no implicit trust in network, caller claims, or a single control.
+- The design fails closed on every error path, and the security gate is part of the design, not a deferred follow-up.
+- Abuse cases for this surface are named and answered — pair with [np-threat-model] to enumerate them and [np-access-control] for the privilege model. The implementation that follows is bound by [np-secure-code-review].

package/skills/np-service-boundary/SKILL.md ADDED Viewed

@@ -0,0 +1,35 @@
+---
+name: np-service-boundary
+description: "Quality bar for changes that introduce or move a module/service boundary — splitting a service, extracting a module, adding a cross-module dependency, wiring an event-driven or microservice seam. Triggered for architect/executor work that changes how parts of the system are divided and coupled. Encodes the coupling rules the change MUST satisfy before commit, not a design document to author. Language- and framework-agnostic."
+user-invocable: false
+---
+# Service & Module Boundaries
+A boundary is a commitment. It decides what can change cheaply and what can't. Draw it for a reason, expose the least you can, and make the dependencies point on purpose.
+## Before editing
+- Read the project's existing module structure first: `node .nubos-pilot/bin/np-tools.cjs knowledge-search "<query>" --task $TASK_ID`. Match the established boundaries, names, and seams instead of inventing new ones.
+## Boundaries follow cohesion
+- Group what changes together; separate what changes independently. A boundary that forces two unrelated things to deploy in lockstep, or splits one thing across two modules, is in the wrong place.
+- Split into a separate service or module only for a real, present reason: independent scaling, independent deploy, or a distinct owner. Never split by reflex, by layer, or "for cleanliness."
+- A wrong boundary is expensive to undo. When unsure, keep it inside one module — extracting later is cheap; collapsing a premature split is not.
+## Dependency direction is deliberate
+- Dependencies are acyclic. No A→B→A. If you reach for a circular dependency, the boundary is wrong.
+- Depend on abstractions and stable things, not on volatile internals. Point arrows toward what changes least.
+- A boundary exposes an explicit, minimal contract. Internals — schemas, helpers, storage shape — stay private. Callers reaching past the contract into internals is the coupling you are here to prevent.
+## Sync vs async is a choice
+- A network or process hop is a failure boundary, not a free function call. The other side can be slow, absent, or duplicate the call — handle it (see [np-system-design] for timeouts and retries).
+- Choose synchronous only when the caller genuinely needs the result now and can wait. Otherwise prefer async/event-driven so the caller is not coupled to the callee's availability.
+- Event-driven decoupling fits where producers and consumers should not know each other. The producer emits a fact; it does not call consumers or assume who listens. Adding a consumer must not require touching the producer.
+## Verification bar (must hold before commit)
+- The boundary maps to a cohesion or ownership reason that is stated, not assumed — and a new service/module split has a real independent-scaling, deploy, or ownership justification.
+- The dependency graph stays acyclic; new dependencies point toward more stable abstractions, not volatile details.
+- Each boundary exposes a minimal explicit contract; callers do not reach into internals, and internals are not widened to satisfy one caller.
+- Sync vs async is chosen deliberately; every process/network hop is treated as a failure boundary, not assumed reliable (cross-check [np-system-design]).
+- Cross-boundary calls match the surrounding contract and error conventions (see [np-api-design]); async hops use the established messaging seam (see [np-queue-design]).
+- Event producers stay ignorant of consumers; adding or removing a consumer touches no producer code.

package/skills/np-system-design/SKILL.md ADDED Viewed

@@ -0,0 +1,40 @@
+---
+name: np-system-design
+description: "Quality bar for executor or architect work that designs a new system, module, or significant feature — introducing components, defining how they interact, or restructuring how responsibilities and state are split. Triggered when a change establishes structure others will build on, not isolated logic inside one existing unit. Encodes design rules the change MUST satisfy before commit, not a design document to author. Intent-level and language-agnostic: shapes responsibilities, data flow, and failure behavior — never file names or schema DDL."
+user-invocable: false
+---
+# System Design
+Good design is the cheapest design that meets the requirement and degrades predictably. Shape the components and their seams; resist solving problems the requirement does not pose yet.
+## Before editing
+- Read the project's existing architecture/conventions first: `node .nubos-pilot/bin/np-tools.cjs knowledge-search "<query>" --task $TASK_ID`. Match the established idiom — a new component should look like it belongs.
+## Responsibilities and boundaries
+- Give each component a single, statable responsibility. If you cannot name it in one sentence without "and", split it or narrow it.
+- Keep cohesion high inside a component and coupling low across the seam. The boundary, not the internals, is the load-bearing decision.
+- Depend on the narrowest contract that does the job; do not reach across a boundary into another component's internals.
+## Data flow and ownership
+- Make ownership explicit: one component owns each piece of state; others read or request, they do not mutate it behind its back.
+- Make the call graph explicit — who calls whom, in which direction. Avoid cycles between components.
+- State sync vs async per interaction and justify async only where it earns its cost (latency, decoupling, backpressure); do not default to it.
+## Simplicity and fit
+- Choose the simplest structure that meets the stated requirement. No speculative abstraction, no extension point for a use case nobody asked for (YAGNI).
+- Match design complexity to problem size: a small problem gets a small design. Adding layers, indirection, or a queue must be justified by a real, present constraint — not a hypothetical one.
+- Prefer fewer moving parts. Every new component, hop, or shared mutable state is a cost that must pay for itself.
+## Failure modes and testability
+- Name how the design degrades: what happens when a dependency is slow, absent, or returns garbage. The answer must be a deliberate choice, not an accident.
+- Design components to be exercisable in isolation — dependencies enter through the seam, so a component can be tested without standing up the whole system.
+## Verification bar (must hold before commit)
+- Each component has a single responsibility statable in one sentence; boundaries follow cohesion, not convenience (pair with [np-service-boundary] when a seam is non-trivial).
+- State ownership is unambiguous and the call direction is acyclic; sync vs async is a stated decision, not a default.
+- The design is the simplest that meets the requirement — no abstraction, layer, or generality without a present constraint that demands it.
+- Failure behavior for each external dependency is named and intentional; the system degrades predictably rather than silently.
+- Components are testable in isolation through their seams.
+- The load-bearing decision and its rejected alternatives are stated inline; when it is architecturally significant, capture it via [np-adr].
+- The design stays intent-level — responsibilities, flows, and contracts — and prescribes no file names or schema DDL.

package/skills/np-test-strategy/SKILL.md ADDED Viewed

@@ -0,0 +1,46 @@
+---
+name: np-test-strategy
+description: "Quality bar for changes that add or modify behavior and therefore need tests. Triggered for executor and verifier work on any feature, fix, or refactor that changes observable behavior. Encodes the testing checklist the change MUST satisfy before commit — not a test plan to author. Language- and framework-agnostic."
+user-invocable: false
+---
+# Test Strategy
+A change that alters behavior ships with tests that lock that behavior in. Tests exist to catch the regression a future edit would introduce — not to inflate coverage. Aim for the cheapest test that would fail if the behavior broke.
+## Before editing
+- Read the project's existing test conventions first: `node .nubos-pilot/bin/np-tools.cjs knowledge-search "test conventions <area>" --task $TASK_ID`. Match the established idiom (test framework, naming, directory layout, fixture/factory style, assertion helpers). Do not introduce a new test tool or pattern.
+## Test the behavior, not the implementation
+- Assert on observable outcomes: return values, emitted events, persisted state, responses, side effects a caller can see. Never assert on private fields, call order of internals, or how the work is done.
+- Name tests by the behavior under test and its condition, not by the method name.
+- A test that has to change every time you refactor internals is testing the wrong thing — delete or rewrite it.
+## Pick the right level
+- Unit-test pure logic and branching in isolation; this is where edge cases are cheapest to cover.
+- Integration-test the seams: real DB, real serialization, real wiring between collaborators that unit tests stub away.
+- Reserve e2e/end-to-end for a thin layer of critical user journeys — slow and flaky if overused.
+- Push each assertion to the lowest level that can still observe the behavior.
+## Cover the full surface
+- Happy path, boundary/edge inputs (empty, max, zero, null, unicode, duplicate), and failure/error paths all get a test.
+- Error paths assert the right error surfaces and nothing is half-committed — see [np-error-handling].
+- A bug fix gets a regression test that fails before the fix and passes after; verify it actually fails on the unpatched code first.
+- A refactor adds characterization tests for any untested behavior it touches before the change — see [np-refactoring].
+## Mock only true externals
+- Mock network, clock, randomness, filesystem, and third-party services — nothing else. Mocking your own collaborators couples tests to structure and hides integration bugs.
+- Prefer real in-memory fakes (e.g. in-memory store) over mocks that merely re-assert the call you wrote.
+## Determinism and speed
+- No real sleeps, wall-clock reads, network calls, or reliance on test execution order. Inject the clock, freeze time, seed randomness.
+- Each test sets up and tears down its own state; tests pass in isolation and in any order.
+- Fast by default — a slow suite stops being run.
+## Verification bar (must hold before commit)
+- New or changed behavior has a test asserting its observable outcome; the test fails if the behavior is reverted.
+- Happy path, at least one boundary case, and the failure/error path are each covered.
+- Bug fixes include a regression test confirmed to fail before the fix.
+- Tests are deterministic (no sleeps/real clock/network/order-dependence) and run at the lowest sufficient level.
+- Mocks cover only true externals; no assertions on private internals.
+- Suite is green and matches the project's existing test idiom; error-path coverage aligns with [np-error-handling].

package/skills/np-threat-model/SKILL.md ADDED Viewed

@@ -0,0 +1,42 @@
+---
+name: np-threat-model
+description: "Quality bar for any change that introduces or alters a trust boundary, opens a new attack surface, adds an external integration, or handles sensitive data/assets — new ingress, a webhook/callback, a queue consumer, a third-party API call, a privilege transition, a new datastore for credentials/PII. Triggered for executor and security-reviewer work on such changes. Encodes a lightweight STRIDE reasoning checklist the change MUST satisfy before commit — reasoning over the diff, not a formal report to produce. Language- and framework-agnostic."
+user-invocable: false
+---
+# Threat Model
+A new trust boundary is a new promise an attacker will test. Threat modeling here is not a document — it is the act of reasoning over the diff before you commit it: what asset moved, who can now reach it, and what stops them from abusing it. Apply this bar whenever the change touches a boundary, surface, integration, or sensitive asset.
+## Before editing
+- Read the project's existing boundaries and assumptions first: `node .nubos-pilot/bin/np-tools.cjs knowledge-search "threat model <surface>" --task $TASK_ID`. Reuse the established trust model; do not invent a parallel one.
+## Frame the change
+- **Name the assets the change touches** — credentials, PII, money, tokens, audit integrity, availability of a path. If it touches none, the surface is the asset.
+- **Draw the trust boundary** the diff crosses: where does untrusted input enter, where does privilege change, what is now reachable that wasn't. Everything arriving from across it is hostile until proven otherwise.
+- **List the new actors** — anonymous caller, authenticated-but-not-authorized user, a compromised dependency, the integration partner itself.
+## Enumerate threats (STRIDE lens)
+- **Spoofing** — can an actor claim an identity they don't hold? Is the caller, webhook, or integration authenticated and its origin verified?
+- **Tampering** — can request, payload, stored asset, or in-transit data be altered? Integrity checks, signatures, parameterized sinks.
+- **Repudiation** — is a security-relevant action attributable? Is there a tamper-evident audit trail for the new path?
+- **Information disclosure** — can the asset leak via responses, errors, logs, timing, or an over-broad scope? Least exposure by default.
+- **Denial of service** — can the new surface be exhausted (unbounded work, no rate limit, amplification via the integration)?
+- **Elevation of privilege** — can the change be used to gain rights? Does any new path default-allow or cross a boundary without an authz check?
+## Rank and mitigate
+- Rank each credible threat by **likelihood × impact**; spend the diff's effort on the high cells, not the theoretical tail.
+- Every credible threat is either **mitigated inside this change** or recorded as an **explicit accepted-risk finding** — never silently left open.
+- Mitigations live in the diff, not in a future ticket. Implement the secure default; pair concrete sink/auth hardening with [np-secure-code-review].
+## Verification bar (must hold before commit)
+- The trust boundary and the assets the change touches are named, and untrusted input across it is treated as hostile.
+- Each STRIDE category was applied to the new surface; every credible threat has a mitigation in the diff or an explicit accepted-risk finding.
+- The new actor with the least privilege cannot reach an asset they shouldn't — spoofing and elevation paths are closed by default-deny.
+- The new path is attributable (audit) and bounded (rate/quota); errors and logs disclose nothing exploitable.
+- If a high-likelihood × high-impact threat cannot be mitigated within task scope, stop and surface it — do not commit around it.

package/templates/claude/payload/hooks/np-learnings-hook.cjs ADDED Viewed

@@ -0,0 +1,56 @@
+#!/usr/bin/env node
+'use strict';
+const fs = require('node:fs');
+const path = require('node:path');
+const cp = require('node:child_process');
+// ADR-0010 / ECC continuous-learning: thin Stop-hook shim. On session Stop it
+// asks np-tools to (rate-limited) auto-capture reusable learnings from the
+// turn's diff; on UserPromptSubmit it resets the consecutive-stop streak. All
+// heavy logic lives in lib/learnings/. A learning hook must NEVER break the
+// session — every failure path exits 0 silently.
+const ALLOWED_VERBS = new Set(['capture', 'reset']);
+function resolveNpTools() {
+  const candidates = [
+    path.join(process.cwd(), '.nubos-pilot', 'bin', 'np-tools.cjs'),
+    path.join(__dirname, '..', '..', '..', '.nubos-pilot', 'bin', 'np-tools.cjs'),
+  ];
+  for (const c of candidates) {
+    try { if (fs.statSync(c).isFile()) return c; } catch {}
+  }
+  return null;
+}
+function readStdin() {
+  return new Promise((resolve) => {
+    if (process.stdin.isTTY) return resolve('');
+    let buf = '';
+    process.stdin.setEncoding('utf-8');
+    const timer = setTimeout(() => { try { process.stdin.removeAllListeners(); } catch {} resolve(buf); }, 800);
+    process.stdin.on('data', (c) => { buf += c; });
+    process.stdin.on('end', () => { clearTimeout(timer); resolve(buf); });
+    process.stdin.on('error', () => { clearTimeout(timer); resolve(buf); });
+  });
+}
+(async () => {
+  if (process.env.NUBOS_PILOT_HEADLESS === '1') { process.exit(0); return; }
+  const verb = process.argv[2];
+  if (!ALLOWED_VERBS.has(verb)) { process.exit(0); return; }
+  const npTools = resolveNpTools();
+  if (!npTools) { process.exit(0); return; }
+  const input = await readStdin();
+  try {
+    cp.spawnSync(process.execPath, [npTools, 'learnings', verb, '--stdin'], {
+      input,
+      encoding: 'utf-8',
+      timeout: 15000,
+      maxBuffer: 4 * 1024 * 1024,
+      cwd: process.cwd(),
+    });
+  } catch { /* never let a learning hook break the session */ }
+  process.exit(0);
+})().catch(() => { process.exit(0); });

package/templates/claude/payload/hooks/np-security-hook.cjs CHANGED Viewed

@@ -31,6 +31,7 @@ function readStdin() {
 }
 (async () => {
+  if (process.env.NUBOS_PILOT_HEADLESS === '1') { process.exit(0); return; }
   const verb = process.argv[2];
   if (!ALLOWED_VERBS.has(verb)) { process.exit(0); return; }
   const npTools = resolveNpTools();

package/workflows/architect-phase.md CHANGED Viewed

@@ -74,9 +74,24 @@ The architect then consumes the consensus-merged `RESEARCH.md` instead of a sing
 After the architect emits `M<NNN>-ARCHITECTURE.md`, the orchestrator spawns ONE `np-critic` instance with the architecture file + `M<NNN>-CONTEXT.md` as inputs. The critic verifies that every locked decision in CONTEXT has a corresponding architecture entry and that no `Deferred` items leaked into the architecture. Findings of category `unmet-criterion`, `locked-decision-violation`, or `information-missing` route per `lib/nubosloop.cjs::routeFindings`. A single Build-Fixer-style round on the architect closes the loop. Beyond one round the workflow exits with `stuck` and the user resolves manually — architecture decisions don't merit unbounded looping.
+## Skills (Nubos library)
+Nubos ships a design-time skill library under `.claude/skills/np-*/` (present only on Claude Code). These are the **quality bar for the architecture decisions you are about to commit** — each skill's "Verification bar" is the standard each ADR-style decision is held to. Before spawning `np-architect`, classify the milestone (read `M<NNN>-CONTEXT.md` + `M<NNN>-RESEARCH.md`) and inject the matching skill triggers into the architect's spawn prompt. Skills **stack** — include every row the milestone matches (cap at the most relevant ~4 if more match; always keep the security row when it applies).
+| Milestone signal | Skills to trigger |
+|---|---|
+| Designs a new system, module, or significant feature | `np-system-design` (with `np-adr` for an architecturally significant, hard-to-reverse choice) |
+| Introduces or moves a module/service boundary, splits a service, or chooses sync vs async | `np-service-boundary` |
+| Any structural decision that is costly to reverse — datastore, sync/async, new dependency, auth model, public contract | `np-adr` |
+| New external surface, new trust boundary, new privilege model, or handling of sensitive assets | `np-secure-design` (with `np-threat-model`) |
+| Authorization model — roles, permissions, policies, resource ownership | `np-access-control` |
+| Design depends on an unreliable dependency, async/queue processing, or a cache | `np-resilience-patterns`, `np-queue-design`, `np-caching-strategy` (each as it applies) |
+| Persisted data shape, or personal/sensitive data | `np-data-modeling`, `np-data-privacy` (as they apply) |
+| Purely additive milestone with no structural/security decision | None — skip the skill block (and reconsider whether the architect pass is needed at all) |
 ## Spawn np-architect
-Spawn `agents/np-architect.md` mit dem folgenden Files-to-Read-Block:
+Spawn `agents/np-architect.md` mit dem folgenden Files-to-Read-Block und (sofern Skills matchen) der angehängten Skill-Direktive:
 ```
 <files_to_read>
@@ -88,8 +103,13 @@ Spawn `agents/np-architect.md` mit dem folgenden Files-to-Read-Block:
 Milestone: M<NNN>
 Task: Emit M<NNN>-ARCHITECTURE.md per the agent's Output Contract.
+Use the following Nubos skills as the quality bar for your decisions: <skill-1>, <skill-2>, ...
+Each is installed at .claude/skills/<skill>/SKILL.md; every architecture decision must satisfy the matching skill's "Verification bar".
 ```
+If zero skills match, omit the skill-directive line — do not invent skills.
 Der Agent ist read-only auf Source — er schreibt EINE Datei:
 `.nubos-pilot/milestones/M<NNN>/M<NNN>-ARCHITECTURE.md`.