npm - @harness-engineering/cli - Versions diffs - 1.23.0 → 1.23.2 - Mend

@harness-engineering/cli 1.23.0 → 1.23.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (423) hide show

package/dist/agents/skills/cursor/check-mechanical-constraints/SKILL.md CHANGED Viewed

@@ -116,6 +116,15 @@ For each violation that was not auto-fixed, report:
 - Auto-fixed changes are verified by running the test suite
 - No violations are suppressed without explicit team approval and a documented reason
+## Rationalizations to Reject
+| Rationalization                                                                       | Why It Is Wrong                                                                                                                         |
+| ------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------- |
+| "This forbidden import is just for one utility function -- I will suppress it inline" | The gate says no suppressing rules without documentation. Undocumented suppressions accumulate and erode the constraint system.         |
+| "The auto-fix looks right, so I do not need to re-run tests"                          | The gate says no auto-fix without test verification. Even import reordering can break code that depends on module initialization order. |
+| "This is just a Tier 2 warning -- it can wait until after merge"                      | Tier 2 violations must be resolved before merge to main. Warnings that accumulate on main become the new baseline.                      |
+| "The linter rule does not make sense for this project, so I will just disable it"     | Propose a config change with justification, do not disable the rule inline. Fix it at the configuration level.                          |
 ## Examples
 ### Example: Forbidden import detected

package/dist/agents/skills/cursor/cleanup-dead-code/SKILL.md CHANGED Viewed

@@ -191,6 +191,17 @@ Code behind feature flags or environment checks may appear dead in the default c
 - No dynamically-imported, type-only, or side-effect code was accidentally deleted
 - Each cleanup commit is atomic and has a descriptive message explaining what was removed and why
+## Rationalizations to Reject
+These are common rationalizations that sound reasonable but lead to incorrect results. When you catch yourself thinking any of these, stop and follow the documented process instead.
+| Rationalization                                                                                        | Why It Is Wrong                                                                                                                                                 |
+| ------------------------------------------------------------------------------------------------------ | --------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| "This export has zero static imports, so it is definitely dead and safe to remove"                     | Zero static imports does not mean zero consumers. Dynamic imports, type-only imports, side-effect imports, and package entry points all create false positives. |
+| "I removed the dead code and the tests pass, so I do not need to run harness validate and check-deps"  | Both harness validate and harness check-deps must pass after every cleanup. Dead code removal can introduce dependency violations.                              |
+| "The convergence loop found new dead code after my fixes, but it is probably just noise from the tool" | Removing dead code creates more dead code. The convergence loop exists to catch these cascades. If the issue count decreased, loop back.                        |
+| "The entropy report has 60 items but I can clean them all up in one pass to be thorough"               | When the report is very large (>50 items), pick the highest-confidence dead code first. Attempting everything at once risks compound errors.                    |
 ## Examples
 ### Example: Removing unused utility functions

package/dist/agents/skills/cursor/detect-doc-drift/SKILL.md CHANGED Viewed

@@ -127,6 +127,15 @@ Group findings by documentation file so that fixes can be applied file-by-file.
 - No documentation references deleted files, functions, or features
 - Drift findings are prioritized and assigned to the appropriate fix cycle
+## Rationalizations to Reject
+| Rationalization                                                           | Why It Is Wrong                                                                                                                      |
+| ------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------ |
+| "The docs are close enough -- a renamed function is obvious from context" | Renamed references in AGENTS.md cause AI agents to hallucinate about non-existent code. Precision matters.                           |
+| "We only changed internal code, so the docs do not need checking"         | Internal API docs with wrong signatures waste developer debugging time. Changed-behavior-not-reflected drift is High priority.       |
+| "There are too many findings to deal with right now, so skip the scan"    | The escalation protocol exists for this case: focus on Critical and High items, create a tracking issue for the rest.                |
+| "We can rely on code review to catch stale docs"                          | Code reviewers focus on code correctness, not documentation cross-references. harness check-docs catches what humans routinely miss. |
 ## Examples
 ### Example: Renamed function detected

package/dist/agents/skills/cursor/enforce-architecture/SKILL.md CHANGED Viewed

@@ -272,21 +272,11 @@ These apply to ALL skills. If you catch yourself doing any of these, STOP.
 ## Rationalizations to Reject
-### Universal
-These reasoning patterns sound plausible but lead to bad outcomes. Reject them.
-- **"It's probably fine"** — "Probably" is not evidence. Verify before asserting.
-- **"This is best practice"** — Best practice in what context? Cite the source and
-  confirm it applies to this codebase.
-- **"We can fix it later"** — If it is worth flagging, it is worth documenting now
-  with a concrete follow-up plan.
-### Domain-Specific
-- **"The violation is minor — just one import"** — One violation sets a precedent. Enforce the constraint or document an explicit exception with rationale.
-- **"It works, so the architecture must be fine"** — Working code with bad architecture is technical debt with compound interest. Correct function does not excuse structural violations.
-- **"This is a legacy module, different rules apply"** — Legacy does not mean exempt. Either the constraint applies or it needs an explicit documented exception.
+| Rationalization                                  | Reality                                                                                                                              |
+| ------------------------------------------------ | ------------------------------------------------------------------------------------------------------------------------------------ |
+| "The violation is minor — just one import"       | One violation sets a precedent. Enforce the constraint or document an explicit exception with rationale.                             |
+| "It works, so the architecture must be fine"     | Working code with bad architecture is technical debt with compound interest. Correct function does not excuse structural violations. |
+| "This is a legacy module, different rules apply" | Legacy does not mean exempt. Either the constraint applies or it needs an explicit documented exception.                             |
 ## Escalation

package/dist/agents/skills/cursor/harness-accessibility/SKILL.md CHANGED Viewed

@@ -261,6 +261,16 @@ A11Y-030 [info] Hardcoded color value not from design token set
 - `A11Y-031`: Contrast failure -- fix requires choosing a darker color. Escalate to design tokens or get human input on replacement color.
 - `A11Y-001`: The `alt=""` fix assumes decorative. If the icon conveys meaning, human must write descriptive alt text.
+## Rationalizations to Reject
+| Rationalization                                                                                                                         | Reality                                                                                                                                                                                                                                                                        |
+| --------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
+| "The contrast ratio is 4.4:1 — that's essentially 4.5:1 and the difference is imperceptible. I'll mark it passing."                     | WCAG AA requires exactly 4.5:1 for normal text. 4.4:1 fails. There is no rounding or visual perception exception in the standard. Flag the failure with the actual ratio and let the user decide how to remediate.                                                             |
+| "This `<div onClick>` already has good visual styling — adding `role='button'` and a keyboard handler is unnecessary clutter."          | A clickable `<div>` without `role="button"` and `onKeyDown` is inaccessible to keyboard-only users and screen reader users. Visual styling has no bearing on ARIA semantics or keyboard reachability. This is A11Y-012, always flagged.                                        |
+| "The automated fix for this `<img>` alt attribute is obvious — I'll apply it without showing the diff since it's just adding `alt=''`." | Every automated fix must be presented as a before/after diff before being written to disk. This is a hard gate. The correct alt value for non-decorative images requires human judgment, and even `alt=""` makes a semantic claim about decorativeness that must be confirmed. |
+| "I18n is enabled, so I'll skip the `lang` and `dir` attribute checks entirely — harness-i18n will catch them."                          | Deferral to harness-i18n is conditional on `i18n.enabled: true` in config. If i18n is not configured, these checks remain part of this skill's scan. Always read the config before skipping any check category.                                                                |
+| "There are 15 findings in this component — I'll fix the easy ones automatically and leave the rest without reporting them explicitly."  | All findings must be reported, regardless of whether they are auto-fixable. The report is the primary deliverable of the REPORT phase. Selectively reporting only fixable violations hides the full accessibility debt from the team.                                          |
 ## Gates
 These are hard stops. Violating any gate means the process has broken down.

package/dist/agents/skills/cursor/harness-api-design/SKILL.md CHANGED Viewed

@@ -332,21 +332,11 @@ These apply to ALL skills. If you catch yourself doing any of these, STOP.
 ## Rationalizations to Reject
-### Universal
-These reasoning patterns sound plausible but lead to bad outcomes. Reject them.
-- **"It's probably fine"** — "Probably" is not evidence. Verify before asserting.
-- **"This is best practice"** — Best practice in what context? Cite the source and
-  confirm it applies to this codebase.
-- **"We can fix it later"** — If it is worth flagging, it is worth documenting now
-  with a concrete follow-up plan.
-### Domain-Specific
-- **"It's an internal API, breaking changes are fine"** — Internal consumers break too. Version the change or coordinate the migration explicitly.
-- **"The field name is obvious enough"** — API field names are a public contract. Follow existing naming conventions and document the semantics.
-- **"Nobody uses that endpoint anyway"** — Verify with access logs or usage data. Assumptions about usage without evidence lead to silent breakages.
+| Rationalization                                   | Reality                                                                                                   |
+| ------------------------------------------------- | --------------------------------------------------------------------------------------------------------- |
+| "It's an internal API, breaking changes are fine" | Internal consumers break too. Version the change or coordinate the migration explicitly.                  |
+| "The field name is obvious enough"                | API field names are a public contract. Follow existing naming conventions and document the semantics.     |
+| "Nobody uses that endpoint anyway"                | Verify with access logs or usage data. Assumptions about usage without evidence lead to silent breakages. |
 ## Escalation

package/dist/agents/skills/cursor/harness-architecture-advisor/SKILL.md CHANGED Viewed

@@ -309,21 +309,11 @@ These apply to ALL skills. If you catch yourself doing any of these, STOP.
 ## Rationalizations to Reject
-### Universal
-These reasoning patterns sound plausible but lead to bad outcomes. Reject them.
-- **"It's probably fine"** — "Probably" is not evidence. Verify before asserting.
-- **"This is best practice"** — Best practice in what context? Cite the source and
-  confirm it applies to this codebase.
-- **"We can fix it later"** — If it is worth flagging, it is worth documenting now
-  with a concrete follow-up plan.
-### Domain-Specific
-- **"This will be easier to maintain"** — Easier for whom, and compared to what? Cite the maintenance burden with evidence from the codebase.
-- **"It's the modern approach"** — Modernity is not a design criterion. Fitness for purpose is. State the specific benefit.
-- **"Other teams do it this way"** — Other teams have different constraints. Evaluate the option on this codebase's specific merits.
+| Rationalization                   | Reality                                                                                             |
+| --------------------------------- | --------------------------------------------------------------------------------------------------- |
+| "This will be easier to maintain" | Easier for whom, and compared to what? Cite the maintenance burden with evidence from the codebase. |
+| "It's the modern approach"        | Modernity is not a design criterion. Fitness for purpose is. State the specific benefit.            |
+| "Other teams do it this way"      | Other teams have different constraints. Evaluate the option on this codebase's specific merits.     |
 ## Escalation

package/dist/agents/skills/cursor/harness-auth/SKILL.md CHANGED Viewed

@@ -307,21 +307,11 @@ These apply to ALL skills. If you catch yourself doing any of these, STOP.
 ## Rationalizations to Reject
-### Universal
-These reasoning patterns sound plausible but lead to bad outcomes. Reject them.
-- **"It's probably fine"** — "Probably" is not evidence. Verify before asserting.
-- **"This is best practice"** — Best practice in what context? Cite the source and
-  confirm it applies to this codebase.
-- **"We can fix it later"** — If it is worth flagging, it is worth documenting now
-  with a concrete follow-up plan.
-### Domain-Specific
-- **"No one would guess this token format"** — Security by obscurity. Tokens must be cryptographically secure regardless of format predictability.
-- **"This is an internal service, auth is less critical"** — Internal services are lateral movement targets. Authenticate all service boundaries.
-- **"The frontend validates permissions, so the backend doesn't need to"** — Client-side checks are bypassable. Server-side authorization is the only real enforcement.
+| Rationalization                                                      | Reality                                                                                             |
+| -------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------- |
+| "No one would guess this token format"                               | Security by obscurity. Tokens must be cryptographically secure regardless of format predictability. |
+| "This is an internal service, auth is less critical"                 | Internal services are lateral movement targets. Authenticate all service boundaries.                |
+| "The frontend validates permissions, so the backend doesn't need to" | Client-side checks are bypassable. Server-side authorization is the only real enforcement.          |
 ## Escalation

package/dist/agents/skills/cursor/harness-autopilot/SKILL.md CHANGED Viewed

@@ -720,6 +720,16 @@ INIT → ASSESS → PLAN → APPROVE_PLAN → EXECUTE → VERIFY → REVIEW →
 - Checkpoint commits fire after every passing checkpoint; recovery commits fire on retry budget exhaustion
 - Rigor level persists across session resume — set once during INIT, never changed mid-session
+## Rationalizations to Reject
+| Rationalization                                                                                     | Reality                                                                                                                                                                                                                 |
+| --------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| "This phase is low complexity, so I can skip the APPROVE_PLAN gate entirely"                        | Low complexity only means auto-approval when no concern signals fire. If the planner flagged concerns, produced a complexity override, or the task count exceeds 15, the gate pauses regardless of the spec annotation. |
+| "I can write the planning logic inline instead of dispatching to the harness-planner persona agent" | The Iron Law is explicit: autopilot delegates, never reimplements. Using a general-purpose agent or inlining planning logic bypasses the harness methodology.                                                           |
+| "The retry budget is exhausted but I can try one more approach before stopping"                     | The 3-attempt retry budget exists because each failed attempt degrades context and compounds risk. Exceeding the budget without human input turns a recoverable failure into an unrecoverable one.                      |
+| "I will skip the scratchpad since keeping research in conversation is faster"                       | Scratchpad is gated by rigor level. At standard or thorough, bulky research (>500 words) must go to scratchpad to keep agent conversation focused on decisions.                                                         |
+| "The plan auto-approved, so I can skip recording the decision in the decisions array"               | Every plan approval -- auto or manual -- must be recorded with its signal evaluation. The decisions array is the audit trail that explains why a plan was approved.                                                     |
 ## Examples
 ### Example: 3-Phase Security Scanner

package/dist/agents/skills/cursor/harness-brainstorming/SKILL.md CHANGED Viewed

@@ -326,6 +326,15 @@ These patterns make requirements testable and unambiguous. Apply them when the o
 - `harness validate` passes after the spec is written
 - If scope was too large, it was decomposed into sub-projects with the human's approval
+## Rationalizations to Reject
+| Rationalization                                                                    | Reality                                                                                                                                                                                     |
+| ---------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| "I already understand the problem well enough to skip the question phase"          | The Gates section is explicit: you must ask at least one clarifying question before proposing approaches. Skipping questions means making untested assumptions.                             |
+| "There is only one viable approach, so presenting alternatives would be contrived" | The gate requires at least 2 approaches with tradeoffs. A single approach is a recommendation disguised as a decision -- the human has no real choice.                                      |
+| "I will draft the full spec and present it for review all at once to save time"    | Section-dump specs are explicitly forbidden. Presenting section by section with feedback between each catches misunderstandings early.                                                      |
+| "This future capability is low-cost to include now, so we should build it in"      | YAGNI is a gate, not a suggestion. Every capability must trace to a stated requirement. "We might need this later" is the exact rationalization that turns focused specs into bloated ones. |
 ## Examples
 ### Example: Designing a Notification System

package/dist/agents/skills/cursor/harness-caching/SKILL.md CHANGED Viewed

@@ -295,6 +295,16 @@ async function getProducts(): Promise<Product[]> {
 }
 ```
+## Rationalizations to Reject
+| Rationalization                                                                         | Reality                                                                                                                                                                                                                                                                                                                               |
+| --------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| "Cache invalidation is complex — a short TTL means we're always fresh enough"           | Short TTLs without event-based invalidation guarantee staleness windows proportional to the TTL. A 60-second TTL on pricing data means a price change takes up to 60 seconds to propagate. For correctness-sensitive data, TTL alone is not a strategy — it is a staleness budget.                                                    |
+| "Redis is fast — we don't need to worry about the cache stampede, it'll resolve itself" | Cache stampedes do not resolve themselves; they resolve after they have already caused a database overload. With 200 requests per second and a 450ms rebuild time, a stampede generates ~90 concurrent database queries at once. The database can fail before a single cache entry is rebuilt.                                        |
+| "We cache the whole user object so we have everything in one key"                       | Caching composite objects creates broad invalidation requirements. A user profile update invalidates a cache entry that includes unrelated fields like payment methods or notification preferences. Cache keys scoped to specific sub-resources reduce invalidation blast radius.                                                     |
+| "The cache failure is caught and logged — users will just get a slower response"        | Catching the error is not the same as handling it. If cache failure causes a 30-second database query on every request, "slower response" becomes "timeout" becomes "circuit breaker opens" becomes "service unavailable." Failure handling must degrade to database reads within the acceptable latency budget, not unconditionally. |
+| "We namespace by feature name — there's no collision risk"                              | Feature names are not unique across services. Two services that share a Redis instance and both cache `user:123` without a service prefix will corrupt each other's data silently. Namespace prefixes must include the service and the data schema version, not just the feature name.                                                |
 ## Gates
 - **No unbounded caches.** Every cache (in-memory, Redis, Memcached) must have either a `maxSize`/`maxmemory` limit or a TTL on every key. An unbounded cache will grow until it causes memory exhaustion. WHERE a cache has no eviction policy configured, THEN the skill must halt and require one before proceeding.

package/dist/agents/skills/cursor/harness-chaos/SKILL.md CHANGED Viewed

@@ -280,6 +280,16 @@ Result: PASSED - System maintained availability throughout.
         Zero 5xx errors observed. No data loss.
 ```
+## Rationalizations to Reject
+| Rationalization                                                                           | Reality                                                                                                                                                                                                                                                                        |
+| ----------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
+| "Our circuit breakers are already tested in unit tests — we don't need chaos experiments" | Unit tests verify that circuit breaker code executes. Chaos experiments verify that the circuit breaker actually opens under real load, that the fallback produces an acceptable user-facing response, and that monitoring detects the transition. These are different things. |
+| "We can't run chaos experiments in production — it's too risky"                           | Avoiding chaos experiments does not reduce risk — it defers the discovery of failure modes to real incidents. Chaos experiments in staging with defined abort criteria and short durations are lower risk than discovering failure modes at 2am during a real outage.          |
+| "The experiment passed in staging so we know it'll work in production"                    | Staging differences in traffic volume, data distribution, and infrastructure scale can mask failure modes. Staging experiments validate the mechanism; production experiments (with tightly scoped blast radius) validate the system under real conditions.                    |
+| "We injected the fault and the system recovered — the experiment is done"                 | Recovery alone does not validate resilience. The experiment must also confirm: detection time (did monitoring catch it?), recovery time (did it meet the SLA?), and no data loss or corruption. A system that recovers after 10 minutes of data inconsistency has not passed.  |
+| "We have runbooks for these failure modes, so game days aren't necessary"                 | A runbook that has never been executed under pressure is a hypothesis, not a procedure. Game days reveal whether runbooks are complete, whether on-call engineers can execute them accurately under stress, and whether the estimated recovery times are realistic.            |
 ## Gates
 - **No chaos experiments without abort criteria.** Every experiment must define conditions under which it is immediately terminated. Running an experiment that you cannot stop is reckless, not engineering.

package/dist/agents/skills/cursor/harness-code-review/SKILL.md CHANGED Viewed

@@ -831,21 +831,11 @@ These apply to ALL skills. If you catch yourself doing any of these, STOP.
 ## Rationalizations to Reject
-### Universal
-These reasoning patterns sound plausible but lead to bad outcomes. Reject them.
-- **"It's probably fine"** — "Probably" is not evidence. Verify before asserting.
-- **"This is best practice"** — Best practice in what context? Cite the source and
-  confirm it applies to this codebase.
-- **"We can fix it later"** — If it is worth flagging, it is worth documenting now
-  with a concrete follow-up plan.
-### Domain-Specific
-- **"The tests pass, so the logic must be correct"** — Tests can be incomplete. Review the logic independently of test results.
-- **"This is how it was done elsewhere in the codebase"** — Existing patterns can be wrong. Evaluate the pattern on its merits, not just its precedent.
-- **"It's just a refactor, low risk"** — Refactors change behavior surfaces. Review them with the same rigor as feature changes.
+| Rationalization                                     | Reality                                                                                     |
+| --------------------------------------------------- | ------------------------------------------------------------------------------------------- |
+| "The tests pass, so the logic must be correct"      | Tests can be incomplete. Review the logic independently of test results.                    |
+| "This is how it was done elsewhere in the codebase" | Existing patterns can be wrong. Evaluate the pattern on its merits, not just its precedent. |
+| "It's just a refactor, low risk"                    | Refactors change behavior surfaces. Review them with the same rigor as feature changes.     |
 ## Escalation

package/dist/agents/skills/cursor/harness-codebase-cleanup/SKILL.md CHANGED Viewed

@@ -187,6 +187,17 @@ Remaining findings: 3 (require human action)
      Suggested: Deprecate with @deprecated JSDoc tag, remove in next major version
 ```
+## Rationalizations to Reject
+These are common rationalizations that sound reasonable but lead to incorrect results. When you catch yourself thinking any of these, stop and follow the documented process instead.
+| Rationalization                                                                                               | Why It Is Wrong                                                                                                                                  |
+| ------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------ |
+| "This dead export is in a high-churn file but the removal is clearly safe"                                    | High-churn files have more hidden consumers. Safe findings in the top 10% by churn are downgraded to probably-safe, requiring explicit approval. |
+| "The convergence loop is not reducing findings quickly enough, so I will apply unsafe fixes to make progress" | Unsafe findings are never auto-fixed, regardless of convergence pressure. Each requires human judgment.                                          |
+| "The verification gate failed on a probably-safe fix, but I am confident the fix is correct"                  | When verification fails after a fix batch, the entire batch must be reverted and all findings reclassified as unsafe.                            |
+| "I will skip the hotspot context phase since it adds time and the churn data is just supplementary"           | The hotspot map drives safety classification accuracy. Without it, safe fixes in high-churn areas are not downgraded.                            |
 ## Examples
 ### Example: Post-Refactoring Cleanup

package/dist/agents/skills/cursor/harness-compliance/SKILL.md CHANGED Viewed

@@ -288,6 +288,16 @@ Phase 4: REPORT
       7. Add automated HIPAA compliance regression tests to CI pipeline
 ```
+## Rationalizations to Reject
+| Rationalization                                                                 | Reality                                                                                                                                                                                                                                       |
+| ------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| "We're not in the EU so GDPR doesn't apply to us"                               | GDPR applies to any organization that processes data of EU residents, regardless of where the organization is based. If a single EU user can sign up, GDPR scope must be assessed.                                                            |
+| "Our lawyers will handle the compliance questions — just document what we have" | Legal review and technical implementation are distinct. Lawyers cannot attest that Article 17 deletion cascades to S3 and Segment. The technical implementation must be audited separately.                                                   |
+| "We already did a SOC2 audit last year — this codebase is the same"             | SOC2 Type II assesses controls over time. Adding a new data store, third-party processor, or API endpoint can invalidate previous control attestations. Audits are point-in-time snapshots, not permanent certificates.                       |
+| "The audit isn't for three months — we can fix the gaps before then"            | Gaps found now require implementation, testing, and evidence collection time. Auditors expect evidence of sustained control operation, not freshly deployed fixes. A gap fixed the week before an audit is still a finding.                   |
+| "That field is technically a username, not PII"                                 | Data classification cannot be done by naming convention. A username combined with any other identifying field (email, IP, phone) is PII under GDPR. Classification must be based on the realistic re-identification risk, not the field name. |
 ## Gates
 - **No compliance report without data classification.** A compliance audit that does not inventory and classify data fields is incomplete. The classification matrix must be produced before controls can be meaningfully assessed. Without knowing what data exists and where, control checks are theoretical.

package/dist/agents/skills/cursor/harness-containerization/SKILL.md CHANGED Viewed

@@ -269,6 +269,16 @@ Phase 4: VALIDATE
   Result: PASS -- well-configured container setup
 ```
+## Rationalizations to Reject
+| Rationalization                                                                       | Reality                                                                                                                                                                                                                                                                                                           |
+| ------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| "We use `latest` in production because we always want the most recent image"          | `latest` is a mutable pointer. A deployment at 9am and a rollback at 9pm may pull different image contents with the same tag. Immutable tags (semver or digest) are required for reproducible deployments and reliable rollbacks.                                                                                 |
+| "The container runs as root because it needs to bind to port 80"                      | Binding to a privileged port requires elevated capability, not full root access. The correct solution is to run the container as a non-root user and use `--cap-add NET_BIND_SERVICE`, or to expose a high port and use a load balancer or ingress for port translation.                                          |
+| "We don't set resource limits because we want the container to use whatever it needs" | Containers without memory limits can exhaust node memory, triggering OOM kills of other containers on the same node. Kubernetes uses resource requests for scheduling and limits for safety; omitting limits transfers the risk of one container onto the entire node.                                            |
+| "Our image is 2GB but it only takes a few seconds to pull in our CI"                  | Image size multiplies across every developer pull, every CI run, and every Kubernetes pod startup. A 2GB image that takes 3 seconds to pull in CI with a warm cache takes 90 seconds on a cold node during an autoscaling event at peak traffic.                                                                  |
+| "We don't need liveness and readiness probes — the container exits if it crashes"     | Process exit is a coarse health signal. A process that is running but deadlocked, stuck in an infinite retry loop, or unable to connect to its database will never exit. Kubernetes will continue routing traffic to it. Readiness probes prevent traffic routing to unhealthy containers that are still running. |
 ## Gates
 - **No `latest` tag in production manifests.** Production Kubernetes manifests or compose files using `latest` image tags are blocking findings. Immutable tags or digests are required.

package/dist/agents/skills/cursor/harness-data-pipeline/SKILL.md CHANGED Viewed

@@ -259,6 +259,16 @@ Phase 4: DOCUMENT
   Quality Report: FAIL (2 errors requiring immediate attention)
 ```
+## Rationalizations to Reject
+| Rationalization                                                                                                                   | Reality                                                                                                                                                                                                                                                                                            |
+| --------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| "The pipeline failed halfway through — we'll just re-run it and it'll pick up where it left off."                                 | A non-idempotent pipeline that is re-run from the middle writes duplicate records for the portion that succeeded before failure. The correct fix is to make the pipeline idempotent (MERGE, upsert, or delete-then-insert) so re-runs are always safe, not to assume partial re-runs are harmless. |
+| "The model has no dbt tests yet, but it's only used in one dashboard — low risk."                                                 | Every untested model is a silent data quality failure waiting to reach a stakeholder. Revenue and user-facing models require test coverage regardless of how few consumers they have today. The number of consumers grows; the coverage does not add itself retroactively.                         |
+| "We're still figuring out the schema — we'll add data contracts once the model stabilizes."                                       | Contracts are most valuable during schema evolution, not after it. An unstable schema without a contract lets breaking changes propagate undetected to downstream consumers. Add the contract as the model is defined; update it explicitly as the schema changes. That explicitness is the value. |
+| "Circular dependency detection is handled by the orchestrator — I don't need to check for it during design."                      | Orchestrators detect circular dependencies at runtime, after the DAG has been deployed. Static analysis during design catches them before deployment, before the pipeline fails at 3am, and before engineers have to diagnose a graph cycle under pressure. Detect them early.                     |
+| "The freshness check is too strict — it keeps alerting because the upstream source is occasionally delayed. I'll just remove it." | A freshness check that fires too often has the wrong threshold. Removing it means stale data reaches analysts silently. Adjust the `warn_after` and `error_after` thresholds to match the source's actual SLA, and escalate if the source cannot meet its own SLA.                                 |
 ## Gates
 - **No approving non-idempotent production pipelines.** If a pipeline writes data without MERGE, upsert, or delete-then-insert patterns, it is flagged as an error. Non-idempotent pipelines cause data duplication on re-runs.

package/dist/agents/skills/cursor/harness-data-validation/SKILL.md CHANGED Viewed

@@ -328,6 +328,16 @@ await producer.send({ topic: 'order-events', messages: [{ value: JSON.stringify(
 const event = orderPlacedSchema.parse(JSON.parse(message.value));
 ```
+## Rationalizations to Reject
+| Rationalization                                                                                                                 | Reality                                                                                                                                                                                                                                                                                                                                                                           |
+| ------------------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| "TypeScript already types the request body — we don't need runtime Zod validation on top of that."                              | TypeScript types are erased at runtime. `req.body as CreateUserInput` compiles fine and accepts any payload at runtime. A missing required field, a string where a number is expected, or an injected extra field bypasses TypeScript entirely. Runtime validation is not redundant with types — it is the only enforcement that exists when the application is actually running. |
+| "We trust this internal service — we don't need to validate its message payloads."                                              | Trust boundaries are not about intent; they are about reliability. Internal services change their schemas, deploy independently, and have bugs. A consumer that accepts payloads without validation silently processes malformed data and produces corrupted downstream records. Validate every message that crosses a process boundary, regardless of who sent it.               |
+| "The validation error message just says 'invalid input' — the developer can look at the schema to understand what failed."      | Developers are not the only consumers of validation errors. Frontend applications display them, monitoring systems alert on them, and support teams diagnose them. A message that says `{"field":"email","expected":"string email","received":"null"}` is resolved in seconds. "Invalid input" creates a support ticket.                                                          |
+| "The two services define their own schemas independently but they've been in sync so far — shared contracts are overkill."      | "In sync so far" describes luck, not process. Independent schema definitions diverge at the next feature sprint when one team changes a field name. Shared contracts in a common package make schema drift a compile-time error instead of a runtime mystery. The divergence between `userId` and `customerId` in the same event is exactly what independent definitions produce. |
+| "Environment variable validation at startup is unnecessary — if a variable is missing, the app will fail when it's first used." | Failing at the first usage of a missing variable produces a cryptic error deep in the call stack, often after the app has been running for minutes and has processed real requests. Failing at startup produces a clear error with the variable name, before any requests are served. Fast failure is always better than deferred failure.                                        |
 ## Gates
 - **No type assertions on external data.** WHERE `as` is used to cast data from an API response, message payload, request body, or `JSON.parse` result, THEN the skill must flag it as a trust boundary violation. Type assertions bypass runtime validation entirely. The only acceptable pattern is runtime validation followed by type inference.

package/dist/agents/skills/cursor/harness-database/SKILL.md CHANGED Viewed

@@ -286,21 +286,11 @@ These apply to ALL skills. If you catch yourself doing any of these, STOP.
 ## Rationalizations to Reject
-### Universal
-These reasoning patterns sound plausible but lead to bad outcomes. Reject them.
-- **"It's probably fine"** — "Probably" is not evidence. Verify before asserting.
-- **"This is best practice"** — Best practice in what context? Cite the source and
-  confirm it applies to this codebase.
-- **"We can fix it later"** — If it is worth flagging, it is worth documenting now
-  with a concrete follow-up plan.
-### Domain-Specific
-- **"The table is small, we don't need an index"** — Tables grow. Plan for the steady state, not the current row count.
-- **"The ORM handles this for us"** — ORMs generate SQL that may not match your performance expectations. Review the generated queries for correctness and efficiency.
-- **"We can always add a migration later"** — Schema changes in production have operational cost. Design the schema thoughtfully now rather than migrating repeatedly.
+| Rationalization                              | Reality                                                                                                                          |
+| -------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------- |
+| "The table is small, we don't need an index" | Tables grow. Plan for the steady state, not the current row count.                                                               |
+| "The ORM handles this for us"                | ORMs generate SQL that may not match your performance expectations. Review the generated queries for correctness and efficiency. |
+| "We can always add a migration later"        | Schema changes in production have operational cost. Design the schema thoughtfully now rather than migrating repeatedly.         |
 ## Escalation

package/dist/agents/skills/cursor/harness-debugging/SKILL.md CHANGED Viewed

@@ -298,6 +298,15 @@ Update the session status to `resolved`.
 - Debug session file is complete with investigation log, hypotheses, and resolution
 - Learnings were captured for future reference
+## Rationalizations to Reject
+| Rationalization                                                                   | Reality                                                                                                                                     |
+| --------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------- |
+| "I have a strong hunch about what is wrong, so I will jump straight to fixing it" | Phase 1 INVESTIGATE must be completed before ANY fix code is written. You are guessing, not debugging.                                      |
+| "I changed two things and the bug is gone, so the fix must be correct"            | One variable at a time is a gate. Changing multiple things simultaneously means you do not know which change fixed it.                      |
+| "This is my third attempt but I feel close, so one more try before escalating"    | After 3 failed fix attempts, the gate requires you to question the architecture. The problem is likely not where you think it is.           |
+| "A try-catch that swallows the error prevents the crash, so the bug is fixed"     | Symptom suppression is explicitly listed as a bad fix. Wrapping the failure in a try-catch addresses what the bug did, not why it happened. |
 ## Examples
 ### Example: API Endpoint Returns 500 Instead of 400

package/dist/agents/skills/cursor/harness-dependency-health/SKILL.md CHANGED Viewed

@@ -145,6 +145,16 @@ For each problem found, generate a specific, actionable recommendation:
 - Report follows the structured output format
 - All findings are backed by graph query evidence (with graph) or systematic static analysis (without graph)
+## Rationalizations to Reject
+These are common rationalizations that sound reasonable but lead to incorrect results. When you catch yourself thinking any of these, stop and follow the documented process instead.
+| Rationalization                                                                                                  | Why It Is Wrong                                                                                                                               |
+| ---------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------- |
+| "There are a few orphan files but they are probably test fixtures or configs, so I will skip investigating them" | Orphan detection explicitly excludes entry points. Files with zero inbound imports that are not entry points must be investigated.            |
+| "The cycle is between two closely related files, so it is not really a problem"                                  | Cycles create fragile coupling where any change in the cycle affects all members. Even "related" files should not have circular dependencies. |
+| "The health score is a B, which is good enough -- no need to act on the recommendations"                         | A hub with 14 importers is a single point of failure. "Good enough" scores mask specific structural risks that compound over time.            |
 ## Examples
 ### Example: Weekly Health Check on Monorepo

package/dist/agents/skills/cursor/harness-deployment/SKILL.md CHANGED Viewed

@@ -283,21 +283,11 @@ These apply to ALL skills. If you catch yourself doing any of these, STOP.
 ## Rationalizations to Reject
-### Universal
-These reasoning patterns sound plausible but lead to bad outcomes. Reject them.
-- **"It's probably fine"** — "Probably" is not evidence. Verify before asserting.
-- **"This is best practice"** — Best practice in what context? Cite the source and
-  confirm it applies to this codebase.
-- **"We can fix it later"** — If it is worth flagging, it is worth documenting now
-  with a concrete follow-up plan.
-### Domain-Specific
-- **"It's just a config change, not a code change"** — Config changes cause outages at the same rate as code changes. Deploy them with the same rigor and rollback strategy.
-- **"We tested this in staging"** — Staging is not production. Traffic patterns, data volume, and edge cases differ. Staging success does not guarantee production safety.
-- **"Downtime will be brief"** — Brief is not zero. Quantify the expected impact and communicate it to stakeholders before deploying.
+| Rationalization                                | Reality                                                                                                                                |
+| ---------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------- |
+| "It's just a config change, not a code change" | Config changes cause outages at the same rate as code changes. Deploy them with the same rigor and rollback strategy.                  |
+| "We tested this in staging"                    | Staging is not production. Traffic patterns, data volume, and edge cases differ. Staging success does not guarantee production safety. |
+| "Downtime will be brief"                       | Brief is not zero. Quantify the expected impact and communicate it to stakeholders before deploying.                                   |
 ## Escalation

package/dist/agents/skills/cursor/harness-design/SKILL.md CHANGED Viewed

@@ -246,6 +246,16 @@ DESIGN-003 [info] Three font weights in one component
   Fix:        Consolidate font-weight values to 400 (body) and 600 (heading) only
 ```
+## Rationalizations to Reject
+| Rationalization                                                                                                             | Reality                                                                                                                                                                                                                            |
+| --------------------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| "The tokens are already defined, so the aesthetic intent is obvious — I can infer it and skip Phase 1."                     | Tokens define values, not intent. The style, tone, and differentiator exist in the designer's head, not in a color ramp. DESIGN.md cannot be generated without explicit confirmation.                                              |
+| "There are only 3 violations and they're minor — I'll skip recording them in the graph to save time."                       | Unrecorded violations are invisible to every downstream skill. harness-impact-analysis and harness-accessibility rely on `VIOLATES_DESIGN` edges existing. Skip graph writes and the enforcement record is permanently incomplete. |
+| "The strictness level isn't set in config, so I'll just use strict to be safe."                                             | Defaulting to strict without reading config imposes blocking CI failures the team never agreed to. Always read `design.strictness` and default to `standard` when absent — not to the most aggressive level.                       |
+| "This anti-pattern is declared, but there are 40+ instances — it would take forever to report them all, so I'll summarize." | The REVIEW phase must report every finding with file path, line number, and severity. Summarizing hides the scope from the team and makes automated tooling miss violations.                                                       |
+| "DESIGN.md already exists from a previous run, so I can skip Phase 2 and go straight to REVIEW."                            | An existing DESIGN.md may be outdated or missing sections. The DIRECTION phase must verify all required sections are present and current before the REVIEW phase can rely on them.                                                 |
 ## Gates
 These are hard stops. Violating any gate means the process has broken down.

package/dist/agents/skills/cursor/harness-design-mobile/SKILL.md CHANGED Viewed

@@ -317,6 +317,16 @@ struct WorkoutRow: View {
 }
 ```
+## Rationalizations to Reject
+| Rationalization                                                                                                                                                 | Reality                                                                                                                                                                                                                                                                 |
+| --------------------------------------------------------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| "The touch target is 40pt on iOS — that's close to 44pt and the designer approved the comp, so I'll leave it."                                                  | The 44pt iOS minimum and 48dp Android minimum are non-negotiable gates, not guidelines. Touch target violations are `error` severity regardless of strictness level. Designer comp approval does not override platform accessibility requirements.                      |
+| "This is a cross-platform React Native component, so I only need to read the generic token mapping — platform-specific rules for iOS and Android are optional." | React Native components require both `ios.yaml` and `android.yaml` rules. Platform-specific rules govern safe areas, elevation, navigation patterns, and touch targets that differ between platforms. Missing either set produces non-compliant native behavior.        |
+| "The component uses a hardcoded shadow for iOS — `shadowColor`, `shadowOffset`, etc. Those aren't design tokens, they're platform APIs."                        | Shadow colors must still reference token values. `shadowColor: tokens.color.neutral[900]` is the correct form. Hardcoded shadow values like `#000` or `rgba(0,0,0,0.2)` are token binding violations the VERIFY phase will flag.                                        |
+| "There's no `design-system/DESIGN.md` yet, but I know the aesthetic intent from our planning discussion — I'll proceed with tokens only."                       | Proceeding without `DESIGN.md` means anti-pattern enforcement is disabled for the entire VERIFY phase. The anti-pattern check is what catches design intent violations beyond token correctness. Warn the user and recommend running harness-design first.              |
+| "The scaffold plan is straightforward — a simple card component. I'll skip presenting it to the user and just generate."                                        | The scaffold plan confirmation is when the user can catch incorrect platform assumptions (wrong StyleSheet structure, wrong platform APIs) before any code is written. Mobile components are harder to refactor than web components due to platform-specific branching. |
 ## Gates
 - **No component generation without reading tokens from harness-design-system.** The SCAFFOLD phase requires `design-system/tokens.json`. Do not generate components with hardcoded values as a fallback.

package/dist/agents/skills/cursor/harness-design-system/SKILL.md CHANGED Viewed

@@ -263,6 +263,16 @@ Spacing:         PASS (monotonically increasing, no gaps)
 Harness validate: PASS
 ```
+## Rationalizations to Reject
+| Rationalization                                                                                                            | Reality                                                                                                                                                                                                                                                 |
+| -------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| "The project already has a tailwind.config with colors — I can derive the token set from that and skip the DEFINE phase."  | Existing Tailwind config represents design debt, not design intent. The DEFINE phase exists to make deliberate choices about palette and typography. Deriving tokens from scattered config perpetuates the inconsistency the skill is meant to resolve. |
+| "One of the contrast pairs is 4.3:1 — close enough to 4.5:1 to pass. I'll mark it as passing."                             | 4.3:1 fails WCAG AA for normal text. There is no "close enough." Flag the failure and ask the user to choose an alternative. Silently accepting sub-threshold contrast is a compliance defect.                                                          |
+| "The user confirmed the palette in our conversation, so I can skip the formal confirmation gate and generate immediately." | The confirmation gate exists as a structural checkpoint, not a courtesy. Generate only after presenting the full palette + typography + spacing summary and receiving explicit approval. Conversation context can drift.                                |
+| "There are no existing design files, so I can skip the DISCOVER phase and go straight to defining."                        | The DISCOVER phase also detects the CSS framework and existing color/font usage. Skipping it means the generated tokens may not map to the actual CSS strategy and the design debt assessment is lost.                                                  |
+| "Fonts without fallback stacks are probably fine — modern browsers handle missing fonts gracefully."                       | A missing fallback stack is a token validation failure regardless of browser behavior. Every `fontFamily` token must include at least one generic fallback. This is a VALIDATE phase gate, not a style preference.                                      |
 ## Gates
 These are hard stops. Violating any gate means the process has broken down.