npm - nubos-pilot - Versions diffs - 1.2.1 → 1.2.3 - Mend

nubos-pilot 1.2.1 → 1.2.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (86) hide show

package/CHANGELOG.md +43 -1
package/agents/np-architect.md +2 -0
package/agents/np-executor.md +1 -1
package/agents/np-learnings-extractor.md +54 -0
package/agents/np-planner.md +1 -1
package/agents/np-security-reviewer.md +9 -0
package/bin/np-tools/_commands.cjs +5 -0
package/bin/np-tools/derive-tier.cjs +86 -0
package/bin/np-tools/derive-tier.test.cjs +83 -0
package/bin/np-tools/doctor.cjs +15 -2
package/bin/np-tools/graph-impact.cjs +111 -0
package/bin/np-tools/graph-impact.test.cjs +119 -0
package/bin/np-tools/learnings.cjs +105 -0
package/bin/np-tools/learnings.test.cjs +66 -0
package/bin/np-tools/loop-run-round.cjs +7 -1
package/bin/np-tools/scan-codebase.cjs +21 -1
package/bin/np-tools/skill-audit.cjs +79 -0
package/bin/np-tools/skill-audit.test.cjs +86 -0
package/bin/np-tools/verify-reliability.cjs +65 -0
package/bin/np-tools/verify-reliability.test.cjs +69 -0
package/lib/agents.test.cjs +1 -0
package/lib/checkpoint.cjs +3 -0
package/lib/codebase-graph.cjs +0 -0
package/lib/codebase-graph.test.cjs +174 -0
package/lib/codebase-manifest.cjs +3 -0
package/lib/config-defaults.cjs +13 -0
package/lib/config-schema.cjs +11 -0
package/lib/eval-reliability.cjs +63 -0
package/lib/eval-reliability.test.cjs +56 -0
package/lib/install/claude-hooks-learnings.test.cjs +82 -0
package/lib/install/claude-hooks.cjs +65 -4
package/lib/install/claude-hooks.test.cjs +5 -2
package/lib/learnings/capture-ledger.cjs +80 -0
package/lib/learnings/capture-ledger.test.cjs +54 -0
package/lib/learnings/extract.cjs +191 -0
package/lib/learnings/extract.test.cjs +115 -0
package/lib/learnings.cjs +19 -95
package/lib/memory.cjs +38 -33
package/lib/messaging.cjs +12 -6
package/lib/metrics-aggregate.cjs +14 -2
package/lib/migrate.cjs +29 -0
package/lib/migrate.test.cjs +91 -0
package/lib/nubosloop-audit.cjs +104 -0
package/lib/nubosloop-skill-audit.test.cjs +98 -0
package/lib/nubosloop.cjs +9 -0
package/lib/schemas/data/checkpoint.v1.json +13 -0
package/lib/schemas/data/codebase-manifest.v1.json +22 -0
package/lib/schemas/data/learnings.v1.json +28 -0
package/lib/schemas/data/memory-manifest.v1.json +14 -0
package/lib/schemas/data/memory-record.v1.json +16 -0
package/lib/schemas/data/message.v1.json +19 -0
package/lib/schemas/data/metrics-record.v1.json +11 -0
package/lib/tier-classify.cjs +67 -0
package/lib/tier-classify.test.cjs +67 -0
package/lib/validate.cjs +301 -0
package/lib/validate.test.cjs +242 -0
package/np-tools.cjs +5 -0
package/package.json +3 -1
package/skills/np-access-control/SKILL.md +42 -0
package/skills/np-accessibility-audit/SKILL.md +41 -0
package/skills/np-adr/SKILL.md +37 -0
package/skills/np-api-design/SKILL.md +34 -0
package/skills/np-caching-strategy/SKILL.md +38 -0
package/skills/np-data-modeling/SKILL.md +37 -0
package/skills/np-data-privacy/SKILL.md +39 -0
package/skills/np-dependency-audit/SKILL.md +47 -0
package/skills/np-encryption/SKILL.md +47 -0
package/skills/np-error-handling/SKILL.md +37 -0
package/skills/np-incident-response/SKILL.md +38 -0
package/skills/np-llm-app-architecture/SKILL.md +50 -0
package/skills/np-observability/SKILL.md +39 -0
package/skills/np-performance/SKILL.md +38 -0
package/skills/np-queue-design/SKILL.md +32 -0
package/skills/np-rag-design/SKILL.md +43 -0
package/skills/np-refactoring/SKILL.md +35 -0
package/skills/np-resilience-patterns/SKILL.md +39 -0
package/skills/np-secure-code-review/SKILL.md +46 -0
package/skills/np-secure-design/SKILL.md +44 -0
package/skills/np-service-boundary/SKILL.md +35 -0
package/skills/np-system-design/SKILL.md +40 -0
package/skills/np-test-strategy/SKILL.md +46 -0
package/skills/np-threat-model/SKILL.md +42 -0
package/templates/claude/payload/hooks/np-learnings-hook.cjs +55 -0
package/workflows/architect-phase.md +21 -1
package/workflows/execute-phase.md +66 -4
package/workflows/verify-work.md +17 -4

package/skills/np-encryption/SKILL.md ADDED Viewed

@@ -0,0 +1,47 @@
+---
+name: np-encryption
+description: "Quality bar for any change that encrypts, decrypts, hashes, signs, or verifies data; stores or checks passwords; sets up TLS or certificate handling; generates tokens, nonces, IVs, or salts; or reads, writes, or rotates keys and secrets. Triggered for executor work touching cryptography, password storage, transport security, signing/HMAC, or key/secret management. Encodes crypto rules the change MUST satisfy before commit — not a spec to author. Language- and framework-agnostic."
+user-invocable: false
+---
+# Encryption & Key Management
+Crypto code is not done when it round-trips in a test — it is done when it would survive a hostile reviewer. The failure mode is silent: a broken cipher mode, a reused nonce, or a leaked key produces output that looks correct. Apply every relevant section to the diff. A single unaddressed item is a blocking finding, not a nit.
+## Before editing
+- Read existing crypto conventions / locked decisions: `node .nubos-pilot/bin/np-tools.cjs knowledge-search "<query>" --task $TASK_ID`. Reuse the project's chosen library and key store; do not introduce a second.
+- Locked decisions in RULES/CONTEXT (cipher, hasher, key source, rotation policy) override every generic default below.
+## Never roll your own
+- Use the platform's vetted, current high-level crypto library — not raw block-cipher calls, not a hand-built construction. If you are choosing modes, padding, or combining primitives by hand, stop.
+- Pick the right tool. Hashing is one-way (integrity, dedup, fingerprints). Encryption is reversible (confidentiality). Signing/HMAC proves authenticity. Do not substitute one for another.
+## Hashing & passwords
+- Passwords go through a slow, salted password hasher (argon2/bcrypt/scrypt or the project's chosen one) with a per-secret salt. Never MD5/SHA1/plain-SHA256 for passwords.
+- General-purpose fast hashes are for integrity/identity only, never for secrets that must resist guessing.
+## Encryption
+- Use authenticated encryption (AEAD, e.g. AES-GCM / ChaCha20-Poly1305). Never ECB. Never unauthenticated CBC where tampering matters.
+- A fresh IV/nonce per message from a CSPRNG. Never a static, zero, or reused IV/nonce — reuse breaks the cipher.
+- Encrypt sensitive data in transit: TLS everywhere, verify certificates, no protocol/cipher downgrade, no disabled verification. Encrypt at rest where the threat model requires it.
+## Keys, secrets & randomness
+- Keys and secrets live in a secret store / KMS / env — NEVER in source, config-in-repo, fixtures, or logs. No key material in error messages or URLs.
+- Scope keys to purpose and plan rotation: rotation must not orphan data encrypted under the old key.
+- Use a CSPRNG for anything security-relevant — tokens, IVs, salts, session ids. Never a normal/seedable RNG.
+- Compare secrets, tokens, MACs, and signatures in constant time. Never `==` on a secret.
+## Verification bar (must hold before commit)
+- No home-rolled crypto; a vetted current primitive/library is used for the right job (hash vs encrypt vs sign).
+- Passwords use a slow salted hasher; no MD5/SHA1 for any secret.
+- Encryption is AEAD with a CSPRNG-fresh IV/nonce; no ECB, no static/reused nonce.
+- Data is encrypted in transit with verified TLS; at rest where the threat model demands it.
+- No key or secret introduced into source, config-in-repo, or logs; rotation and scoping are accounted for.
+- All security-relevant randomness is CSPRNG; secret comparisons are constant-time.
+- If any item cannot be satisfied within task scope, stop and surface it as a finding — do not commit around it. Pair with [np-secure-code-review] for the sink-level review, [np-secure-design] when the change adds a new key or trust boundary, and [np-data-privacy] when the data is personal/regulated.

package/skills/np-error-handling/SKILL.md ADDED Viewed

@@ -0,0 +1,37 @@
+---
+name: np-error-handling
+description: "Quality bar for changes that touch backend, service, integration, or IO code that can fail — network calls, database writes, external APIs, file/queue/process work, batch loops. Triggered for executor work on any failure-prone path; encodes the resilience checklist the change MUST satisfy before commit (fail loud, preserve cause, timeouts, bounded retries, idempotency, resource cleanup, actionable errors), not a doc to author. Language- and framework-agnostic."
+user-invocable: false
+---
+# Error Handling & Resilience
+Code that can fail must fail predictably. The bar below is about what the change does on the unhappy path, not the happy one.
+## Before editing
+- Read existing conventions: `node .nubos-pilot/bin/np-tools.cjs knowledge-search "error handling retry conventions" --task $TASK_ID`. Match the established error/retry idiom rather than inventing a new one.
+## Fail loud, preserve context
+- No empty catch and no catch-and-continue that hides a failure. If you catch, you handle, rethrow, or log-and-escalate.
+- Distinguish recoverable errors (retry, fallback, degrade) from programmer errors (bug — let it crash, don't paper over).
+- When wrapping an error, preserve the original cause/stack/chain. Never discard the inner error to throw a vague new one.
+- Surface actionable errors to callers: enough to act on, no internals (no stack traces, secrets, SQL, or host details leaking across a trust boundary).
+## Outbound calls & retries
+- Every outbound or blocking IO call (network, DB, queue, subprocess, lock) has an explicit timeout. No unbounded waits.
+- Retry only idempotent operations. Use backoff with a hard attempt/time cap — no tight retry loops, no retry storms against a struggling dependency.
+- Write paths that may be retried are idempotent (idempotency key, upsert, or dedup) so a retry can't double-apply.
+## Cleanup & partial failure
+- The failure path releases what the success path acquired: connections, file handles, locks, temp files, transactions. Prefer finally/defer/with-style guarantees over manual unwind.
+- Validate inputs before mutating state where cheap; otherwise make partial mutation recoverable. Don't leave half-written state on error.
+- Batch/loop work decides explicitly: fail-fast or collect-and-report. Don't let one bad item silently drop the rest or mask which items failed.
+## Verification bar (must hold before commit)
+- No silent swallow: every catch handles, rethrows with cause, or escalates — verified, not assumed.
+- Every new outbound/IO call has a timeout; every retry has backoff and a cap; retried writes are idempotent.
+- Failure paths free all acquired resources; no leaked handles, locks, or open transactions.
+- Caller-facing errors are actionable and leak no internals; batch work reports partial failures.
+- Error and retry paths are covered, not just the happy path — see [np-test-strategy].
+- Failures and retries are observable (logged/metered with cause and context) — see [np-observability].
+- Error shapes and status codes returned across an API boundary are consistent — see [np-api-design].

package/skills/np-incident-response/SKILL.md ADDED Viewed

@@ -0,0 +1,38 @@
+---
+name: np-incident-response
+description: "Quality bar for changes that are risky or hard to reverse — feature flags, behavior gated on data state, data migrations coupled to code, or changes to external integrations. Triggered for executor work on high-blast-radius changes; encodes the reversibility and rollback-readiness checklist the change MUST satisfy before commit, not a runbook to author. Language- and framework-agnostic."
+user-invocable: false
+---
+# Change Reversibility & Rollback Readiness
+Reversibility is a property you design into a risky change, not something you bolt on after it breaks. Before you commit, you must already know how to undo it.
+## Before editing
+- Read existing conventions: `node .nubos-pilot/bin/np-tools.cjs knowledge-search "<query>" --task $TASK_ID`.
+## Know the reversal path
+- Every risky change has one undo story: clean `revert`, flip a flag off, or a documented manual rollback. Pick one before writing code.
+- If the only way back is "restore from backup," the change is not ready — make it cheaper to reverse.
+## Gate new behavior
+- Put new or risky behavior behind a flag/toggle so it can be disabled without a code change, redeploy, or hotfix.
+- Default the flag to the old behavior; turning it ON is the deliberate act.
+- Add a kill switch for anything that can misbehave under load or when an external system is slow or down.
+## Keep data reversible
+- Do not couple an irreversible data migration to the code that depends on it.
+- Use expand/contract: add the new shape, backfill, switch reads, drop the old shape later — each step independently reversible.
+- A rollback of the code must not leave data in a shape the old code cannot read.
+## Blast radius
+- Know what fails if this change fails: does it degrade gracefully, or take the system down with it?
+- Isolate failure to the smallest surface — one feature, one path, one tenant — not the whole request.
+- Leave a short runbook note where the project keeps them: what this changes, how to tell it's wrong, how to turn it off.
+## Verification bar (must hold before commit)
+- The change has a named reversal path: revert, flag-off, or a written manual rollback.
+- New/risky behavior is behind a flag defaulting to old behavior; a kill switch exists for load- or integration-sensitive paths.
+- Code is safe to roll back at any point: no data migration that the prior code can't tolerate (cross-link [np-data-modeling] for reversible expand/contract migrations).
+- Failure degrades gracefully and is contained to a known blast radius, not the whole system.
+- A short runbook note records what changed, how to detect it's wrong, and how to turn it off (cross-link [np-observability] for detection signals).

package/skills/np-llm-app-architecture/SKILL.md ADDED Viewed

@@ -0,0 +1,50 @@
+---
+name: np-llm-app-architecture
+description: "Quality bar for any change that designs or modifies an LLM-backed feature — a prompt, an agent loop, tool/function calling, structured-output extraction, an LLM-as-judge, or any path where model output flows into the system. Triggered for researcher, architect, and executor work on LLM/agent/prompt/tool-use/AI features. Encodes the design rules the change MUST satisfy before commit, not a design document to author. Provider- and framework-agnostic."
+user-invocable: false
+---
+# LLM Application Architecture
+A language model is a non-deterministic, fallible component you do not control. Designing around one is not prompt-tweaking — it is building a system that stays correct, safe, and affordable when the model is wrong, slow, or adversarially steered. Apply this bar to the LLM feature you are about to commit.
+## Before editing
+- Read the project's existing LLM conventions first: `node .nubos-pilot/bin/np-tools.cjs knowledge-search "llm feature conventions" --task $TASK_ID`. Match the established prompt structure, output-validation idiom, and provider client. Do not introduce a second one.
+- For the current model lineup, context windows, and API specifics, consult the project's own up-to-date reference — never hardcode model names or limits from memory.
+## Eval before build
+- Define what "correct" means before writing the prompt. A capability check (does it do the task on representative cases) plus a regression set (does a prompt/model change break what worked) — not vibes, not a single happy-path demo.
+- Wire the eval into the project's verify step so a prompt or model change that regresses output fails the gate, the same as a code change.
+## Treat output as untrusted
+- The model's output is hostile input until validated. Never pipe raw output into a sink — a query, a shell, a file path, a render, a downstream call — without parsing and checking it.
+- Constrain the shape: schema / structured-output / function-call form, not free-text you regex after the fact. Validate against the schema; reject and retry on a miss, never best-effort-parse.
+## Context is a budget, not a dump
+- Decide deliberately what enters the prompt. Token budget is finite and cost/latency scale with it — relevance beats volume.
+- Have an explicit truncation/selection strategy for oversized context (rank, summarize, window) — never silently drop the tail and hope.
+## Tools & injection are trust boundaries
+- A tool/function call crosses the same trust boundary as any external input: authorize the call and validate every argument before executing. The model proposing a tool call is not authorization.
+- Prompt injection is real: untrusted content in the prompt must not be able to escalate. Separate instructions from data, and grant tools least privilege so a hijacked prompt cannot reach what it should not.
+## Failure & cost are design inputs
+- Plan for timeouts, rate limits, refusals, and malformed output as expected states, not exceptions. Define retries (with backoff), fallbacks, and a degraded path.
+- Route by task complexity and cache where safe — cost and latency are design constraints, not afterthoughts.
+- Never put secrets or PII into prompts or logs. Prompts and completions are frequently logged; treat them as such.
+## Verification bar (must hold before commit)
+- An eval (capability + regression) exists and is wired into the project's verify step — pair with the project's verify gate.
+- No model output reaches a sink unvalidated; output is schema-constrained and parse failures are handled.
+- Context entering the prompt is bounded with a deliberate truncation strategy; no unbounded dump.
+- Every tool call is authorized and its arguments validated; tools run at least privilege — pair with [np-secure-code-review] for prompt-injection and tool-trust handling.
+- Timeouts, rate limits, refusals, and malformed output have defined retry/fallback behavior — pair with [np-error-handling] for failure modes.
+- No secret or PII enters any prompt or log.
+- If the feature retrieves context from a corpus, the retrieval design itself meets [np-rag-design].

package/skills/np-observability/SKILL.md ADDED Viewed

@@ -0,0 +1,39 @@
+---
+name: np-observability
+description: "Quality bar for changes that add or modify a code path that should be observable — services, request handlers, background jobs, integrations with external systems, and any new failure path. Triggered for executor work on handlers, workers, clients, or error branches. Encodes the logging/metrics/tracing rules the change MUST satisfy before commit so the path is diagnosable from telemetry alone, not a runbook or dashboard to author. Language- and framework-agnostic."
+user-invocable: false
+---
+# Observability
+If something breaks at 3am, can the on-call person see what happened from logs and metrics alone — without you adding logging during the incident? Apply this bar to the code path you are about to commit.
+## Before editing
+- Read the project's existing telemetry conventions first: `node .nubos-pilot/bin/np-tools.cjs knowledge-search "logging metrics conventions" --task $TASK_ID`. Match the established log idiom, metric naming, and trace-context propagation — consistency beats personal preference.
+## Logging
+- **Structured, not string-concatenated.** Emit key-value fields (`order_id`, `duration_ms`), never bake variables into a prose sentence. Machines query fields; they can't grep your prose.
+- **Levels mean something.** `error` = actionable, someone must look; `warn` = degraded but handled; `info` = significant state change at a boundary; `debug` = development detail. Don't log expected control flow as `error` — error noise hides real errors.
+- **Carry the correlation/request/trace id** on every log line so a single request can be reconstructed across the path.
+- **Never log secrets, tokens, credentials, or PII.** Redact at the logging boundary, not by hoping callers don't pass them.
+- **Log once at the boundary**, not at every layer the value passes through. Don't log inside hot loops or per-iteration — aggregate and emit once.
+## Metrics
+- **Emit a metric for what you'll need to answer "is it healthy / how often / how slow."** Counter for occurrences, histogram/timer for latency, and a way to compute error rate (success vs failure count) on any path that can fail.
+- **Follow the existing metric naming** and label/tag conventions. Don't invent a parallel scheme. Keep label cardinality bounded — no user ids or raw inputs as labels.
+## Tracing
+- **Propagate trace context across every service and async boundary** — outgoing requests, queue publishes, job pickups. A dropped context turns one trace into orphans.
+## Verification bar (must hold before commit)
+- Every new failure path is diagnosable from logs + metrics alone: a structured log at `warn`/`error` with the correlation id, and a counter that increments on failure.
+- Logs are structured key-value with correct levels; no secrets/tokens/PII reach any sink — pair with [np-secure-code-review].
+- Error branches carry actionable context (what failed, identifying ids) so the cause is clear without re-reading source — align with [np-error-handling].
+- New metrics follow existing naming/labels; latency and error rate are observable for any path that can be slow or fail.
+- Trace context is propagated across every service/async hop the change introduces.
+- No logging in hot loops; the path logs once at its boundary, not at every layer.

package/skills/np-performance/SKILL.md ADDED Viewed

@@ -0,0 +1,38 @@
+---
+name: np-performance
+description: "Quality bar for changes that touch a hot path, data access, or anything that scales with input size — triggered for executor work on loops over collections, query layers, request/render hot paths, batch jobs, or any code whose cost grows with N. Encodes the performance checklist the change MUST satisfy before commit: known complexity, no N+1, bounded result sets, sound caching, lean hot loops. Not a profiling report to author. Language- and framework-agnostic."
+user-invocable: false
+---
+# Performance
+A change that scales with input must not degrade as input grows. Optimize the dominant cost; leave micro-noise alone. Never trade correctness or readability for unmeasured speed.
+## Before editing
+- Read existing conventions / find the hot path: `node .nubos-pilot/bin/np-tools.cjs knowledge-search "<query>" --task $TASK_ID`.
+- Establish the baseline before changing anything: measure the current cost (timing, query count, allocations) so you can prove the change helps. Don't guess where the time goes.
+## Complexity
+- Know the algorithmic complexity of what you wrote and whether it grows with N. State it to yourself before committing.
+- Replace nested scans over collections with a set/map lookup (O(N²) → O(N)).
+- Do work once: hoist invariant computation out of loops; memoize repeated pure calls.
+## Data access
+- Kill N+1: no queries inside a loop. Batch, eager-load, or join so the count is constant in the number of rows.
+- Paginate or stream unbounded result sets. Never load-all-into-memory when the size is driven by user data or table growth.
+- Push filtering, aggregation, and limits down to the data layer instead of fetching wide and discarding in code.
+## Hot loops and caching
+- Avoid needless allocation and copying inside hot loops; reuse buffers, slice instead of clone where safe.
+- Cache only when correctness allows it, and only with a clear invalidation story — no stale-forever. Define the key and the expiry/bust trigger.
+- Defer expensive, non-critical work (move it async / to a background job) so it stays off the request or render path.
+## Verification bar (must hold before commit)
+- The change's complexity is known and does not grow worse than linearly with input unless unavoidable and justified.
+- No query runs inside a loop; collection access over the data layer is batched or eager-loaded.
+- Every result set whose size scales with input is paginated, streamed, or bounded — nothing loads everything into memory.
+- Any cache added has an explicit invalidation trigger; no value can go stale-forever.
+- Hot loops do no redundant allocation, copying, or repeated pure work.
+- A before/after measurement (or a clear complexity argument) shows the change helps and harms nothing; the dominant cost was the target, not micro-noise.
+- Correctness and readability were not sacrificed for unmeasured speed.
+- See [np-data-modeling] for indexing that backs these queries and [np-observability] for latency/throughput metrics that confirm the win in production.

package/skills/np-queue-design/SKILL.md ADDED Viewed

@@ -0,0 +1,32 @@
+---
+name: np-queue-design
+description: "Quality bar for executor work that adds or changes asynchronous job, message-queue, or worker code — producers, consumers, background jobs, event handlers, task processors, and the messages they exchange. Triggered whenever a change enqueues, dequeues, or processes work off the request path, encoding the async-correctness rules the change MUST satisfy before commit (idempotent consumers under at-least-once delivery, no assumed ordering, a real failure path with backoff and dead-letter, ack timeouts above worst-case processing, bounded backpressure, small versioned payloads, observable jobs). This is a bar the change must meet, not a queue spec to author. Language- and broker-agnostic."
+user-invocable: false
+---
+# Queue Design
+A queue decouples producer from consumer in time — and in doing so trades synchronous certainty for delivery guarantees that are weaker than they look. Design the consumer for the guarantees the transport actually gives, not the ones you wish it gave.
+## Before editing
+- Read existing conventions: `node .nubos-pilot/bin/np-tools.cjs knowledge-search "<query>" --task $TASK_ID`. Match the established queue/job idiom (naming, serializer, retry policy, dead-letter convention) rather than inventing a parallel one.
+## Assume at-least-once; make consumers idempotent
+- Treat every message as possibly delivered more than once. Processing the same message twice MUST produce the same result — dedupe on a message/business key, use a conditional or upsert write, or record processed IDs. Exactly-once is largely a myth; idempotency is the real defense.
+- Do not assume ordering. Only rely on order when the transport guarantees it AND the work genuinely needs it; otherwise make handlers order-independent.
+## Every consumer needs a failure path
+- A handler that can fail needs explicit retry with backoff, a max-attempts cap, and a dead-letter queue for what exhausts it. A poison message that always fails MUST NOT block the queue or retry forever — route it to the DLQ and move on.
+- Set the visibility/ack timeout longer than the worst-case processing time. Too short and the broker redelivers in-flight work, double-processing it.
+## Bound the queue and the payload
+- A queue that grows unbounded is an outage waiting to happen. Define what happens under backpressure — bound depth, shed, or scale consumers — and make lag observable; do not let it silently accumulate.
+- Keep the payload small: carry IDs and references, not large blobs. Version the message schema so producer and consumer can evolve independently; a consumer must tolerate an unknown newer field, not crash on it.
+## Verification bar (must hold before commit)
+- The consumer is idempotent: redelivery of the same message produces the same result, proven by a dedupe key, conditional write, or processed-ID record — not by hoping delivery is exactly-once.
+- No unjustified reliance on ordering; where order is required, the transport actually guarantees it.
+- A failure path exists: bounded retry with backoff, a max-attempts cap, and a dead-letter destination — a poison message cannot stall or loop the queue. Retry/backoff mechanics align with [np-error-handling].
+- Visibility/ack timeout exceeds worst-case processing time; no window where in-flight work is redelivered and run twice.
+- Backpressure is bounded and queue depth/lag is observable per [np-observability]; failures, retries, and DLQ arrivals are logged with enough context to diagnose what failed and how often.
+- The payload is small and the schema versioned; consumers tolerate additive schema change. Cross-boundary failure handling for the enqueue/dequeue call follows [np-resilience-patterns].

package/skills/np-rag-design/SKILL.md ADDED Viewed

@@ -0,0 +1,43 @@
+---
+name: np-rag-design
+description: "Quality bar for researcher/architect/executor work that adds or changes a retrieval-augmented (RAG) or semantic-search pipeline — chunking, embeddings, vector or hybrid indexes, retrieval, ranking, and grounding. Triggered when a task touches how documents are split, embedded, indexed, retrieved, filtered, or fed to a model as context. Encodes a checklist the design MUST satisfy before commit, not a spec document to author. Provider- and store-agnostic."
+user-invocable: false
+---
+# RAG / Retrieval Design
+Retrieval decides what the model sees. Bad chunks, a stale index, or unfiltered hits silently degrade every downstream answer. Make each retrieval choice deliberate and measurable.
+## Before editing
+- Read existing conventions: `node .nubos-pilot/bin/np-tools.cjs knowledge-search "retrieval index conventions" --task $TASK_ID`.
+- Find the existing chunker, embedding call, and index/query path before adding a parallel one.
+## Chunking
+- Size and overlap are deliberate choices justified against content shape and query length — not copied defaults.
+- Split on semantic boundaries (sections, paragraphs, code blocks); never sever a unit mid-sentence or mid-record.
+- Carry source metadata on every chunk: document id, title, location/anchor, timestamp, and access scope.
+## Embedding & index
+- One embedding model for both index build and query — mixing models or versions makes similarity meaningless.
+- Index type (vector / keyword / hybrid) fits scale and query latency; record the dimensionality and distance metric.
+- Re-embedding after a model change means a full re-index — plan it, don't half-migrate.
+## Retrieval quality
+- Have a retrieval eval: a known query→expected-doc set with recall/precision measured, not "it returned something."
+- Rank, then filter by a tuned score threshold so weak hits never reach the prompt; cap result count.
+- Surface why a chunk was retrieved (score, source) so failures are debuggable.
+## Grounding & freshness
+- Answers cite the retrieved sources; nothing ungrounded is presented as fact.
+- Budget retrieved tokens against the prompt window — truncate or rerank, don't overflow.
+- Define the re-index / invalidation trigger (on write, on schedule, on version bump); a stale index serves wrong answers.
+## Verification bar (must hold before commit)
+- Chunking preserves semantic units and attaches source metadata; size/overlap are justified, not defaulted.
+- Index and query use the identical embedding model and version; metric and dimensions are documented.
+- A retrieval eval exists and recall on the known set is measured and acceptable — regressions are visible.
+- Low-score / irrelevant chunks are filtered by threshold and count before reaching the context window.
+- Generated answers are grounded in and cite retrieved sources; see [np-llm-app-architecture] for the generation half.
+- Retrieval enforces caller access scope — it never returns documents the caller cannot see; see [np-secure-code-review].
+- A concrete re-index / invalidation trigger is defined for content and model changes.
+- Index and query latency fit the budget at target scale; see [np-performance].

package/skills/np-refactoring/SKILL.md ADDED Viewed

@@ -0,0 +1,35 @@
+---
+name: np-refactoring
+description: "Quality bar for behavior-preserving change — refactor, cleanup, restructure, rename, extract, deduplicate, decouple. Triggered for executor work on tasks that improve internal structure while keeping observable behavior identical. Encodes the checklist the change MUST satisfy before commit, not a method to teach. Structure changes; behavior does not — if behavior shifts, it is a feature or fix, not a refactor, and must be flagged. Language- and framework-agnostic."
+user-invocable: false
+---
+# Refactoring
+A refactor changes how the code is shaped, never what it does. Same inputs, same outputs, same side effects — only the structure improves. If you cannot say that with a straight face, you are not refactoring.
+## Before editing
+- Read existing conventions / pin behavior: `node .nubos-pilot/bin/np-tools.cjs knowledge-search "<query>" --task $TASK_ID`.
+- Find the tests that currently exercise the code. If none pin the behavior you are about to move, add characterization tests FIRST — capture what it does today, not what it should do.
+## Behavior is frozen
+- Do not change observable behavior: return values, side effects, error cases, ordering, timing-sensitive contracts.
+- Preserve public contracts — signatures, names, types, routes, serialized shapes — unless the task is explicitly to change them.
+- If you discover a bug mid-refactor, do not fix it silently. Flag it and keep the behavior (including the bug) until a separate task addresses it.
+- A behavior change disguised as a refactor is the failure mode to avoid. When in doubt, it is NOT a refactor.
+## Small green steps
+- Move in small, reversible increments. Run the tests between steps; stay green the whole way.
+- Separate mechanical changes (bulk rename, file move, extract) from logic moves — different commits, so each diff is trivially reviewable.
+- Never mix a refactor and a behavior change in one commit.
+## Stay in scope
+- Refactor only what the task names. Resist "while I'm here" cleanups, drive-by reformatting, and adjacent rewrites — they bloat the diff and hide the real change.
+- Improving the target means: better names, higher cohesion, less duplication, looser coupling, lower complexity. Not: new abstractions nobody asked for.
+## Verification bar (must hold before commit)
+- Behavior is provably unchanged: the pre-existing tests pass without modification, or characterization tests added first still pass (see [np-test-strategy]).
+- No public contract was altered unless the task required it (see [np-api-design] for contract preservation).
+- The diff is reviewable: mechanical renames/moves are isolated from logic changes; no unrelated files touched.
+- No behavior change is bundled in; any discovered bug or behavior gap is flagged, not fixed in place.
+- The change measurably reduces duplication, coupling, or complexity — or it should not have been made.

package/skills/np-resilience-patterns/SKILL.md ADDED Viewed

@@ -0,0 +1,39 @@
+---
+name: np-resilience-patterns
+description: "Quality bar for executor work where code calls across a process or network boundary — an external API, another internal service, a queue, or a database under load. Triggered whenever a change introduces or modifies a remote call, encoding the resilience rules the change MUST satisfy before commit (bounded timeouts, fail-fast on persistent failure, isolation to stop cascades, degradation over hard error, load shedding under overload, retry-safe idempotency). This is a bar the change must meet, not a resilience spec to author. Language- and framework-agnostic."
+user-invocable: false
+---
+# Resilience Patterns
+A call across a process or network boundary WILL fail — slowly, partially, or completely. Design for that as the normal case, not the exception. Match each pattern to the actual failure mode you face; do not bolt on machinery the call does not need.
+## Before editing
+- Read existing conventions: `node .nubos-pilot/bin/np-tools.cjs knowledge-search "<query>" --task $TASK_ID`. Match the established resilience idiom (timeout helper, circuit-breaker wrapper, fallback shape) rather than inventing a parallel one.
+## Bound every remote call
+- Every call leaving the process gets an explicit timeout — connect AND read. No client default, no unbounded wait. A hung dependency must surface as a fast, typed failure, never a stuck caller.
+- Set the budget from the caller's deadline, not the dependency's optimism. A downstream timeout longer than the request budget is a bug.
+## Fail fast on a dependency that stays down
+- For a dependency that can be down for seconds-to-minutes, a circuit breaker beats hammering it: trip after a failure threshold, reject immediately while open, probe before closing.
+- A breaker is for sustained failure. For a single transient blip, a bounded timeout (or one retry — see [np-error-handling]) is enough; do not reach for a breaker where a timeout suffices.
+## Isolate so one failure does not sink the ship
+- Cap the resources any single dependency can consume — connection-pool slice, concurrency limit, dedicated worker set. One slow dependency must not drain the shared pool and freeze unrelated work (bulkhead).
+- Make the blast radius explicit: when dependency X is exhausted, exactly which calls degrade — and which stay healthy?
+## Degrade instead of erroring, shed instead of collapsing
+- Where a stale, cached, or partial answer beats a hard error, define the fallback and return it on failure. State plainly when a response is degraded.
+- Under overload, shed or queue with a bound rather than accept everything and fall over. Reject early with a clear signal; protect the work already in flight.
+## Make retries safe
+- Any call that may be retried (by you, a client, or infrastructure) must be idempotent — idempotency key, conditional write, or natural dedupe. A retried non-idempotent write is data corruption waiting to happen.
+## Verification bar (must hold before commit)
+- Every new or modified remote call has an explicit, deadline-derived timeout; no unbounded waits remain.
+- The pattern matches the failure mode: breaker only where failure is sustained, bulkhead only where a shared resource can be exhausted, fallback only where a degraded answer is acceptable. No unjustified machinery.
+- A failing or slow dependency cannot cascade: its resource consumption is capped and the degraded path is exercised, not just declared.
+- Overload is shed or bounded, not absorbed until collapse.
+- Anything retryable is idempotent; retry/backoff itself lives in [np-error-handling].
+- Trips, rejections, fallbacks, and shed load are observable per [np-observability]; added latency budgets respect [np-performance].

package/skills/np-secure-code-review/SKILL.md ADDED Viewed

@@ -0,0 +1,46 @@
+---
+name: np-secure-code-review
+description: "Quality bar for any change that touches authentication, authorization, session handling, secrets, cryptography, file uploads, deserialization, SQL or shell construction, SSRF-prone outbound requests, or user-controlled input that reaches a sink. Triggered for executor and security-reviewer work on auth/crypto/SQL/input-handling code. Encodes an OWASP-aligned review checklist the change MUST satisfy before commit — not a document to produce. Language-agnostic."
+user-invocable: false
+---
+# Secure Code Review
+A change that touches a security-sensitive surface is not done when the test passes — it is done when it survives this checklist. Apply every relevant section to the diff you are about to commit. A single unaddressed item is a blocking finding, not a nit.
+## Before editing
+- Read the existing pattern first: `node .nubos-pilot/bin/np-tools.cjs knowledge-search "auth <surface>" --task $TASK_ID`. Match the project's established auth/validation idiom; do not introduce a second one.
+- Identify the trust boundary the change crosses. Everything arriving from across it is hostile until validated.
+## Input handling
+- Validate at the boundary: type, length, range, format, allow-list over deny-list. Reject, don't sanitize-and-hope.
+- Parameterize every query. No string-concatenated SQL/NoSQL/LDAP, ever. No shelling out with interpolated user input — pass argv arrays.
+- Encode on output for the destination context (HTML, attribute, JS, URL, SQL). Escaping is contextual, not global.
+- Treat file paths, redirect targets, and outbound URLs from user input as SSRF/path-traversal vectors: canonicalize and allow-list.
+## AuthN / AuthZ
+- Authorization is checked server-side on every protected action, against the *acting* identity, on the *specific* resource (no IDOR). Never trust a client-supplied role, id, or `isAdmin`.
+- New endpoints/handlers default to deny. Adding a route never silently widens access.
+- Session/token: rotate on privilege change, expire, and invalidate on logout. No auth material in URLs or logs.
+## Secrets & crypto
+- No hardcoded secrets, keys, or tokens in source or fixtures. Read from config/env/secret store.
+- Use the platform's vetted crypto primitives. No home-rolled crypto, no MD5/SHA1 for passwords (use the project's password hasher), no ECB, no static IVs.
+- Compare secrets in constant time. Generate tokens from a CSPRNG.
+## Errors, logging, dependencies
+- Error responses leak nothing exploitable (no stack traces, SQL, or internal paths to clients). Log the detail server-side instead.
+- Never log secrets, tokens, full PANs, or full PII.
+- New dependency? Justify it, pin it, and confirm it is maintained — a new transitive supply-chain surface is a review item.
+## Verification bar (must hold before commit)
+- Every user-input path in the diff is validated and its sinks are parameterized/encoded.
+- Every protected action has a server-side authz check on the acting identity and resource.
+- No secret, no home-rolled crypto, no leaking error path introduced.
+- If any item cannot be satisfied within task scope, stop and surface it as a security finding — do not commit around it. Pair with [np-threat-model] when the change introduces a new trust boundary.

package/skills/np-secure-design/SKILL.md ADDED Viewed

@@ -0,0 +1,44 @@
+---
+name: np-secure-design
+description: "Quality bar for designing a new feature, system, or integration with security implications — a new external surface, a new trust boundary, a new privilege or token model, or the handling of sensitive assets (credentials, PII, money, secrets). Triggered for architect and executor work that shapes how a thing will be built, not how it is coded line-by-line. Encodes design-time security rules the design MUST satisfy before commit — the design-time twin of [np-secure-code-review]. Language- and framework-agnostic."
+user-invocable: false
+---
+# Secure Design
+Security is decided at design time. A control you forgot to design in is a vulnerability you will ship; the code review can only catch what the architecture left room for. Apply this bar to the design you are about to commit, before any line of implementation depends on it.
+## Before editing
+- Read existing security decisions/conventions: `node .nubos-pilot/bin/np-tools.cjs knowledge-search "security design <surface>" --task $TASK_ID`. Locked decisions (RULES/CONTEXT) — established trust boundaries, secret stores, privilege models — override generic defaults.
+- Name the trust boundaries this design crosses and the assets it protects. Everything else follows from those two facts.
+## Secure by default
+- The safe configuration is the default. Security is opt-out, never opt-in: encryption on, public access off, the strict policy selected unless deliberately relaxed.
+- Fail securely. An error, timeout, or unparseable input denies the action — it never grants access or falls through to a permissive path.
+- Build the security gate into the design now, not as a later hardening pass. A control retrofitted onto a shipped surface is a migration, not a tweak.
+## Least privilege & blast radius
+- Every component, credential, and token gets the minimum access it needs and nothing more. No shared god-credentials, no wildcard scopes, no standing admin.
+- Scope and time-box tokens; isolate secrets in a store, never in code, config, or the same blast radius as the data they unlock.
+- Segment so a single compromise is contained. Ask: if this component is fully owned, what else falls? Design to make that answer small.
+## Zero trust & defense in depth
+- Authenticate and authorize every request at the boundary. Never trust the network, the caller's claimed identity, or a client-supplied role/flag — verify against the acting identity on the specific resource.
+- Never rely on a single control. A WAF is not a substitute for input validation; network position is not a substitute for authz. Each layer assumes the one in front of it failed.
+- Validate across boundaries even between your own services — internal does not mean trusted.
+## Threat-informed
+- Design against how this will actually be attacked, not a generic checklist. Enumerate the abuse cases for this specific surface and design the control that defeats each — pair with [np-threat-model] for any new trust boundary or external surface.
+## Verification bar (must hold before commit)
+- The default state of the design is the secure state; relaxing it is an explicit, visible choice.
+- Every component/credential/token in the design is least-privilege and scoped; secrets are isolated and a single compromise is contained.
+- Every entry point authenticates and authorizes against the acting identity — no implicit trust in network, caller claims, or a single control.
+- The design fails closed on every error path, and the security gate is part of the design, not a deferred follow-up.
+- Abuse cases for this surface are named and answered — pair with [np-threat-model] to enumerate them and [np-access-control] for the privilege model. The implementation that follows is bound by [np-secure-code-review].

package/skills/np-service-boundary/SKILL.md ADDED Viewed

@@ -0,0 +1,35 @@
+---
+name: np-service-boundary
+description: "Quality bar for changes that introduce or move a module/service boundary — splitting a service, extracting a module, adding a cross-module dependency, wiring an event-driven or microservice seam. Triggered for architect/executor work that changes how parts of the system are divided and coupled. Encodes the coupling rules the change MUST satisfy before commit, not a design document to author. Language- and framework-agnostic."
+user-invocable: false
+---
+# Service & Module Boundaries
+A boundary is a commitment. It decides what can change cheaply and what can't. Draw it for a reason, expose the least you can, and make the dependencies point on purpose.
+## Before editing
+- Read the project's existing module structure first: `node .nubos-pilot/bin/np-tools.cjs knowledge-search "<query>" --task $TASK_ID`. Match the established boundaries, names, and seams instead of inventing new ones.
+## Boundaries follow cohesion
+- Group what changes together; separate what changes independently. A boundary that forces two unrelated things to deploy in lockstep, or splits one thing across two modules, is in the wrong place.
+- Split into a separate service or module only for a real, present reason: independent scaling, independent deploy, or a distinct owner. Never split by reflex, by layer, or "for cleanliness."
+- A wrong boundary is expensive to undo. When unsure, keep it inside one module — extracting later is cheap; collapsing a premature split is not.
+## Dependency direction is deliberate
+- Dependencies are acyclic. No A→B→A. If you reach for a circular dependency, the boundary is wrong.
+- Depend on abstractions and stable things, not on volatile internals. Point arrows toward what changes least.
+- A boundary exposes an explicit, minimal contract. Internals — schemas, helpers, storage shape — stay private. Callers reaching past the contract into internals is the coupling you are here to prevent.
+## Sync vs async is a choice
+- A network or process hop is a failure boundary, not a free function call. The other side can be slow, absent, or duplicate the call — handle it (see [np-system-design] for timeouts and retries).
+- Choose synchronous only when the caller genuinely needs the result now and can wait. Otherwise prefer async/event-driven so the caller is not coupled to the callee's availability.
+- Event-driven decoupling fits where producers and consumers should not know each other. The producer emits a fact; it does not call consumers or assume who listens. Adding a consumer must not require touching the producer.
+## Verification bar (must hold before commit)
+- The boundary maps to a cohesion or ownership reason that is stated, not assumed — and a new service/module split has a real independent-scaling, deploy, or ownership justification.
+- The dependency graph stays acyclic; new dependencies point toward more stable abstractions, not volatile details.
+- Each boundary exposes a minimal explicit contract; callers do not reach into internals, and internals are not widened to satisfy one caller.
+- Sync vs async is chosen deliberately; every process/network hop is treated as a failure boundary, not assumed reliable (cross-check [np-system-design]).
+- Cross-boundary calls match the surrounding contract and error conventions (see [np-api-design]); async hops use the established messaging seam (see [np-queue-design]).
+- Event producers stay ignorant of consumers; adding or removing a consumer touches no producer code.

package/skills/np-system-design/SKILL.md ADDED Viewed

@@ -0,0 +1,40 @@
+---
+name: np-system-design
+description: "Quality bar for executor or architect work that designs a new system, module, or significant feature — introducing components, defining how they interact, or restructuring how responsibilities and state are split. Triggered when a change establishes structure others will build on, not isolated logic inside one existing unit. Encodes design rules the change MUST satisfy before commit, not a design document to author. Intent-level and language-agnostic: shapes responsibilities, data flow, and failure behavior — never file names or schema DDL."
+user-invocable: false
+---
+# System Design
+Good design is the cheapest design that meets the requirement and degrades predictably. Shape the components and their seams; resist solving problems the requirement does not pose yet.
+## Before editing
+- Read the project's existing architecture/conventions first: `node .nubos-pilot/bin/np-tools.cjs knowledge-search "<query>" --task $TASK_ID`. Match the established idiom — a new component should look like it belongs.
+## Responsibilities and boundaries
+- Give each component a single, statable responsibility. If you cannot name it in one sentence without "and", split it or narrow it.
+- Keep cohesion high inside a component and coupling low across the seam. The boundary, not the internals, is the load-bearing decision.
+- Depend on the narrowest contract that does the job; do not reach across a boundary into another component's internals.
+## Data flow and ownership
+- Make ownership explicit: one component owns each piece of state; others read or request, they do not mutate it behind its back.
+- Make the call graph explicit — who calls whom, in which direction. Avoid cycles between components.
+- State sync vs async per interaction and justify async only where it earns its cost (latency, decoupling, backpressure); do not default to it.
+## Simplicity and fit
+- Choose the simplest structure that meets the stated requirement. No speculative abstraction, no extension point for a use case nobody asked for (YAGNI).
+- Match design complexity to problem size: a small problem gets a small design. Adding layers, indirection, or a queue must be justified by a real, present constraint — not a hypothetical one.
+- Prefer fewer moving parts. Every new component, hop, or shared mutable state is a cost that must pay for itself.
+## Failure modes and testability
+- Name how the design degrades: what happens when a dependency is slow, absent, or returns garbage. The answer must be a deliberate choice, not an accident.
+- Design components to be exercisable in isolation — dependencies enter through the seam, so a component can be tested without standing up the whole system.
+## Verification bar (must hold before commit)
+- Each component has a single responsibility statable in one sentence; boundaries follow cohesion, not convenience (pair with [np-service-boundary] when a seam is non-trivial).
+- State ownership is unambiguous and the call direction is acyclic; sync vs async is a stated decision, not a default.
+- The design is the simplest that meets the requirement — no abstraction, layer, or generality without a present constraint that demands it.
+- Failure behavior for each external dependency is named and intentional; the system degrades predictably rather than silently.
+- Components are testable in isolation through their seams.
+- The load-bearing decision and its rejected alternatives are stated inline; when it is architecturally significant, capture it via [np-adr].
+- The design stays intent-level — responsibilities, flows, and contracts — and prescribes no file names or schema DDL.