@pieerry/harness-kit 3.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (77) hide show
  1. package/.claude/agents/product-manager.md +20 -0
  2. package/.claude/agents/staff-software-engineer.md +25 -0
  3. package/.claude/commands/product-manager/prd.md +31 -0
  4. package/.claude/commands/product-manager/prp.md +35 -0
  5. package/.claude/commands/product-manager/run.md +31 -0
  6. package/.claude/commands/sse/dev.md +47 -0
  7. package/.claude/commands/sse/plan.md +33 -0
  8. package/.claude/commands/sse/pr.md +43 -0
  9. package/.claude/commands/sse/run.md +39 -0
  10. package/.claude/commands/sse/test.md +38 -0
  11. package/.claude/hooks/status-line.sh +103 -0
  12. package/.claude/plugins/product-manager/README.md +120 -0
  13. package/.claude/plugins/product-manager/evals/prd-quality.md +88 -0
  14. package/.claude/plugins/product-manager/evals/prd-readiness.md +66 -0
  15. package/.claude/plugins/product-manager/evals/prp-context-readiness.md +51 -0
  16. package/.claude/plugins/product-manager/evals/prp-quality.md +88 -0
  17. package/.claude/plugins/product-manager/guides/examples/good-prd-example.md +121 -0
  18. package/.claude/plugins/product-manager/guides/examples/good-prp-example.md +128 -0
  19. package/.claude/plugins/product-manager/guides/pipeline.md +84 -0
  20. package/.claude/plugins/product-manager/guides/prd-guidelines.md +27 -0
  21. package/.claude/plugins/product-manager/guides/product-guidelines.md +75 -0
  22. package/.claude/plugins/product-manager/guides/prp-guidelines.md +64 -0
  23. package/.claude/plugins/product-manager/guides/templates/prd.md +89 -0
  24. package/.claude/plugins/product-manager/guides/templates/prp.md +98 -0
  25. package/.claude/plugins/product-manager/guides/writing-style.md +71 -0
  26. package/.claude/plugins/product-manager/hooks/post-eval-prd.sh +77 -0
  27. package/.claude/plugins/product-manager/hooks/post-eval-prp.sh +70 -0
  28. package/.claude/plugins/product-manager/hooks/post-write-prd.sh +56 -0
  29. package/.claude/plugins/product-manager/hooks/post-write-prp.sh +61 -0
  30. package/.claude/plugins/product-manager/hooks/pre-prp-check.sh +48 -0
  31. package/.claude/plugins/product-manager/outputs/.markers/.gitkeep +0 -0
  32. package/.claude/plugins/product-manager/scripts/confluence-publish.py +205 -0
  33. package/.claude/plugins/product-manager/scripts/link-validator.py +87 -0
  34. package/.claude/plugins/product-manager/scripts/sensor-runner.py +140 -0
  35. package/.claude/plugins/product-manager/scripts/token-phase.py +208 -0
  36. package/.claude/plugins/product-manager/sensors/prd-acceptance-criteria.md +39 -0
  37. package/.claude/plugins/product-manager/sensors/prd-structure.md +39 -0
  38. package/.claude/plugins/product-manager/sensors/prp-context-quality.md +42 -0
  39. package/.claude/plugins/product-manager/sensors/prp-links.md +24 -0
  40. package/.claude/plugins/product-manager/sensors/prp-structure.md +52 -0
  41. package/.claude/plugins/product-manager/skills/prd/SKILL.md +33 -0
  42. package/.claude/plugins/product-manager/skills/prp/SKILL.md +37 -0
  43. package/.claude/plugins/staff-software-engineer/README.md +90 -0
  44. package/.claude/plugins/staff-software-engineer/evals/plan-quality.md +48 -0
  45. package/.claude/plugins/staff-software-engineer/guides/coding-style.md +51 -0
  46. package/.claude/plugins/staff-software-engineer/guides/commit-style.md +44 -0
  47. package/.claude/plugins/staff-software-engineer/guides/conventions-override.md +79 -0
  48. package/.claude/plugins/staff-software-engineer/guides/pipeline.md +69 -0
  49. package/.claude/plugins/staff-software-engineer/hooks/post-eval-plan.sh +43 -0
  50. package/.claude/plugins/staff-software-engineer/hooks/post-write-plan.sh +49 -0
  51. package/.claude/plugins/staff-software-engineer/outputs/.markers/.gitkeep +0 -0
  52. package/.claude/plugins/staff-software-engineer/sensors/code-conventions.md +37 -0
  53. package/.claude/plugins/staff-software-engineer/sensors/plan-structure.md +37 -0
  54. package/.claude/plugins/staff-software-engineer/sensors/test-coverage.md +28 -0
  55. package/.claude/plugins/staff-software-engineer/skills/backend/SKILL.md +80 -0
  56. package/.claude/plugins/staff-software-engineer/skills/devops/SKILL.md +58 -0
  57. package/.claude/plugins/staff-software-engineer/skills/mobile/SKILL.md +52 -0
  58. package/.claude/plugins/staff-software-engineer/skills/web/SKILL.md +64 -0
  59. package/.claude/settings.local.json +61 -0
  60. package/CLAUDE.md +90 -0
  61. package/LICENSE +21 -0
  62. package/README.md +192 -0
  63. package/VERSION +1 -0
  64. package/bin/hk.js +141 -0
  65. package/context-library/README.md +38 -0
  66. package/context-library/business-info-template.md +39 -0
  67. package/context-library/decisions/README.md +3 -0
  68. package/context-library/example-prds/README.md +3 -0
  69. package/context-library/meetings/.gitkeep +0 -0
  70. package/context-library/metrics/.gitkeep +0 -0
  71. package/context-library/personal-context-template.md +31 -0
  72. package/context-library/research/.gitkeep +0 -0
  73. package/context-library/squads/README.md +32 -0
  74. package/context-library/strategy/README.md +3 -0
  75. package/package.json +43 -0
  76. package/setup/install.sh +154 -0
  77. package/setup/update.sh +17 -0
@@ -0,0 +1,66 @@
1
+ # Eval: PRD Readiness
2
+
3
+ Type: rule-based stage gate
4
+ Mode: advisory
5
+
6
+ Decides whether a PRD has enough content for its declared stage to advance. Run after prd-quality passes.
7
+
8
+ ## Stage gates
9
+
10
+ ### Team Kickoff
11
+
12
+ Required:
13
+ - problem in 1-2 sentences
14
+ - hypothesis with numeric direction
15
+ - customer list with at least 1 named segment
16
+ - primary success metric named
17
+
18
+ Word count: 250-600
19
+ Next stage: Planning Review
20
+
21
+ ### Planning Review
22
+
23
+ Inherits Team Kickoff, plus:
24
+ - baseline, target, horizon for every metric
25
+ - guardrail metrics named
26
+ - strategy fit explained
27
+ - impact sizing with at least one quantified figure
28
+
29
+ Word count: 400-900
30
+ Next stage: Solution Review
31
+
32
+ ### Solution Review
33
+
34
+ Inherits Planning Review, plus:
35
+ - solution overview with user flow (Mermaid or numbered)
36
+ - at least 1 user story (As / I want / so that)
37
+ - edge cases documented or explicit none-apply statement
38
+ - non-goals listed with reasons
39
+ - risk table with at least 2 risks and mitigations
40
+ - UX reference or "no UI change"
41
+
42
+ Word count: 700-1400
43
+ Next stage: Launch Readiness
44
+
45
+ ### Launch Readiness
46
+
47
+ Inherits Solution Review, plus:
48
+ - phased rollout table with audience, duration, pass criteria per phase
49
+ - kill criteria with numeric thresholds
50
+ - owners and reviewers
51
+ - open questions resolved or assigned to owner
52
+ - feature flag or rollback plan
53
+
54
+ Word count: 1200-2500
55
+ Next stage: Shipped
56
+
57
+ ## Output format
58
+
59
+ ```json
60
+ {
61
+ "current_stage": "Team Kickoff | Planning Review | Solution Review | Launch Readiness",
62
+ "ready_for_next": true,
63
+ "missing_for_next": ["item", "item"],
64
+ "warnings": ["soft issue"]
65
+ }
66
+ ```
@@ -0,0 +1,51 @@
1
+ # Eval: PRP Context Readiness
2
+
3
+ Type: composite (executor simulation plus sensor pass-through)
4
+ Mode: handoff gate
5
+
6
+ Pass criteria:
7
+ - all structural sensors passed
8
+ - shippable in {yes, partial}
9
+ - blocking_questions count <= 2
10
+ - one_shot_likelihood >= 0.7
11
+
12
+ Run last, after prp-quality passes.
13
+
14
+ ## Phase 1: structural
15
+
16
+ Pass-through. Required green:
17
+ - sensors/prp-structure.md
18
+ - sensors/prp-context-quality.md
19
+ - sensors/prp-links.md
20
+
21
+ If any fail, abort and return sensor feedback.
22
+
23
+ ## Phase 2: executor simulation
24
+
25
+ You are a coding agent that has never seen this codebase. You are handed only this PRP. You may not ask the PM follow-up questions. Read it end-to-end and answer:
26
+
27
+ 1. Could you ship with only what is in the PRP plus linked files? (yes / partial / no)
28
+ 2. List every place where you would need to stop and ask a human.
29
+ 3. List every claim that looks plausible but you cannot verify from the PRP alone.
30
+ 4. Rate one-shot-success likelihood from 0 to 1.
31
+
32
+ ## Output format
33
+
34
+ ```json
35
+ {
36
+ "shippable": "yes | partial | no",
37
+ "blocking_questions": ["question"],
38
+ "unverifiable_claims": ["claim"],
39
+ "one_shot_likelihood": 0.0
40
+ }
41
+ ```
42
+
43
+ ## Phase 3: PM signoff (manual)
44
+
45
+ PM confirms or requests another iteration.
46
+
47
+ ## On failure
48
+
49
+ Phase 1 fails: re-run failing sensor, regenerate failed sections.
50
+ Phase 2 fails criteria: regenerate weakest section based on blocking_questions.
51
+ Phase 3 rejects: collect PM feedback, retry the loop.
@@ -0,0 +1,88 @@
1
+ # Eval: PRP Quality
2
+
3
+ Type: LLM-judge
4
+ Mode: quality gate
5
+ Threshold: weighted total >= 8.0
6
+
7
+ Read as if you are the executor: would you hit a dead end? Score 0-10 per dimension. Cite line numbers when below 7. Weighted total = sum(score x weight%) / 100.
8
+
9
+ ## Rubric
10
+
11
+ ### Goal clarity (weight 15%)
12
+ One paragraph, concrete, tied to source PRD?
13
+
14
+ - 10: unambiguous
15
+ - 5: clear but loose scope
16
+ - 0: "improve X" with no scope
17
+
18
+ ### Context completeness (weight 20%)
19
+ Context section gives file paths, patterns with examples, external docs, gotchas? Executor never needs to leave?
20
+
21
+ - 10: self-contained
22
+ - 5: most context present, some gaps
23
+ - 0: "go look at the repo"
24
+
25
+ ### Implementation concreteness (weight 20%)
26
+ Blueprint specific enough that executor never decides naming, file location, or API shape?
27
+
28
+ - 10: numbered steps with concrete identifiers
29
+ - 5: outline present, some decisions open
30
+ - 0: prose handwaving
31
+
32
+ ### Validation executability (weight 15%)
33
+ Validation gates are real commands the executor copy-pastes? Manual checks specific?
34
+
35
+ - 10: copy-paste ready
36
+ - 5: commands with placeholders
37
+ - 0: "make sure it works"
38
+
39
+ ### Pattern referencing (weight 10%)
40
+ Codebase patterns named with file:line references?
41
+
42
+ - 10: at least 2 concrete patterns
43
+ - 5: 1 pattern
44
+ - 0: no analogies
45
+
46
+ ### Scope discipline (weight 10%)
47
+ "Out of scope" explicit? One feature, no bundled unrelated work?
48
+
49
+ - 10: tight
50
+ - 5: 1 unrelated item slipped in
51
+ - 0: scope creep
52
+
53
+ ### Gap honesty (weight 5%)
54
+ Unknowns marked with NEEDS REVIEW or TBD with owners? Marker count <= 5?
55
+
56
+ - 10: honest, low marker count
57
+ - 5: markers without owners
58
+ - 0: too many gaps or pretends to know
59
+
60
+ ### Voice (weight 5%)
61
+ No banned words. No em-dashes. Mermaid not ASCII.
62
+
63
+ - 10: clean
64
+ - 5: 1-2 violations
65
+ - 0: 3+ violations
66
+
67
+ ## On failure (total below 8.0)
68
+
69
+ Retry. Identify lowest-scoring dimensions, regenerate those sections only. Max 3 attempts.
70
+
71
+ ## Output format
72
+
73
+ ```json
74
+ {
75
+ "scores": {
76
+ "goal_clarity": 0,
77
+ "context_completeness": 0,
78
+ "implementation_concreteness": 0,
79
+ "validation_executability": 0,
80
+ "pattern_referencing": 0,
81
+ "scope_discipline": 0,
82
+ "gap_honesty": 0,
83
+ "voice": 0
84
+ },
85
+ "weighted_total": 0.0,
86
+ "feedback": ["dimension: specific issue with line ref"]
87
+ }
88
+ ```
@@ -0,0 +1,121 @@
1
+ # PRD: Invoice Deadline Timezone Fix
2
+
3
+ **Squad:** Billing | **DRI (PM):** @pm-lead | **Date:** 2026-05-08
4
+ **Stage:** Solution Review
5
+ **Bet:** N/A (bug-driven refinement)
6
+ **Status:** In Review
7
+
8
+ ## 1) Problem and Hypothesis
9
+
10
+ **Problem:** Customer success reports that 3% of invoices are flagged as overdue even when the customer is still inside their local payment window. The deadline calculation runs in UTC, but US-EAST customers operate in UTC-5 and EU-WEST customers in UTC+1.
11
+
12
+ **Hypothesis:** If we apply each customer's registered timezone offset when computing the deadline, then false-overdue invoices will drop to zero within 30 days, because the only cause we have seen across 12 weekly support tickets is the UTC mismatch.
13
+
14
+ **Strategy fit:** Supports the Billing reliability pillar for Q2 2026. False-overdue flags trigger automated dunning emails and finance write-offs; cleaning this up is table stakes before we offer SLA-backed payment-window guarantees in H2.
15
+
16
+ **Evidence:**
17
+ - 12 weekly support tickets in the last 6 weeks, all from customers in US-EAST and EU-WEST
18
+ - Finance: $14k in disputed late fees traced to false-overdue invoices (Q1 2026)
19
+ - Customer success confirmed the issue affects only customers in non-UTC timezones. UTC-aligned customers (UK) have zero false-overdue reports.
20
+ - Quote, CUST-001 ops manager: "Our payment deadline is 17:00 local. The dunning email arrived at 14:00 saying we missed it. That's just wrong."
21
+
22
+ ## 2) Customers
23
+
24
+ | Customer | Why it matters |
25
+ |----------|----------------|
26
+ | Customer ops (US, EU, others in non-UTC) | Daily friction; one weekly escalation each |
27
+ | Customer success | Triages the false-overdue tickets, ~3h/week of avoidable work |
28
+ | Finance | Reconciles disputed late fees manually |
29
+ | End customers | Receive incorrect dunning notifications, hurts trust |
30
+
31
+ ## 3) Scope and Non-Goals
32
+
33
+ **In scope:**
34
+ - Read `customers.timezone` when computing the deadline for any invoice.
35
+ - Apply offset in both the daily report and the real-time invoice screen.
36
+ - Backfill the last 90 days of flagged invoices to clear false positives.
37
+
38
+ **Non-goals (max 3):**
39
+ - Schema changes to `customers`. The timezone column already exists and is populated.
40
+ - Per-segment deadline rules. Separate refinement.
41
+ - Surfacing deadline time in customer-facing API responses. Out of scope, request from customers but not blocking.
42
+
43
+ **Trade-offs accepted:**
44
+ - The fix runs in application code, not the database. About +5ms per invoice lookup; acceptable given current latency budget.
45
+ - Backfill is a one-off script, not a recurring job. Anything older than 90 days stays as-is.
46
+
47
+ ## 4) Solution Overview
48
+
49
+ **Summary:** When the billing service evaluates whether an invoice is overdue, it currently compares the invoice's deadline timestamp against a UTC clock. We change the comparison to convert both timestamps into the customer's local timezone before evaluating. The customer timezone is already stored in `customers.timezone` and populated for every active customer.
50
+
51
+ We add an integration test per timezone tier (UTC, UTC-5, UTC+1) using fixed clock fixtures. A one-time backfill script re-evaluates invoices flagged as overdue in the last 90 days and clears false positives.
52
+
53
+ **User flow:**
54
+
55
+ ```mermaid
56
+ flowchart LR
57
+ INV[Invoice evaluated] --> Lookup[Lookup customer.timezone]
58
+ Lookup --> Convert[Convert deadline + now to customer local time]
59
+ Convert --> Compare[Compare in local time]
60
+ Compare -->|on time| Green[Green status]
61
+ Compare -->|overdue| Red[Red status + dunning]
62
+ ```
63
+
64
+ **User stories:**
65
+ - As a customer ops lead, I want the invoice report to use my local payment deadline, so that on-time invoices are not flagged as overdue.
66
+ - As a customer success analyst, I want false-overdue invoices from the last 90 days cleared, so that I stop triaging the same complaints every week.
67
+ - As a finance analyst, I want disputed late fees traceable to the bug to reconcile automatically, so that I do not process them by hand.
68
+
69
+ **UX reference:** No UI changes. The invoice screen and the daily report read from the corrected backend.
70
+
71
+ ## 5) Success Metrics
72
+
73
+ | Metric | Baseline | Target | Horizon |
74
+ |--------|----------|--------|---------|
75
+ | False-overdue rate (non-UTC customers) | 3.1% | 0% | 30 days |
76
+ | Weekly support tickets about deadlines | 12 | 0 | 30 days |
77
+ | Disputed late fees ($) | 14k/quarter | 0 | 90 days |
78
+
79
+ **Guardrails:**
80
+ - Invoice evaluation latency P95: must stay below 200ms (current 145ms; +5ms acceptable).
81
+ - No regression in UTC-aligned customers (UK reference cohort): false-overdue rate must stay 0%.
82
+
83
+ **Kill criteria:** If P95 latency rises above 220ms or if any UTC-aligned customer starts flagging false-overdue after rollout, disable the feature flag and roll back the backfill.
84
+
85
+ ## 6) Rollout
86
+
87
+ | Phase | Audience | Duration | Pass criteria |
88
+ |-------|----------|----------|---------------|
89
+ | 1 | Internal staging + CUST-001 only | 7 days | Zero false-overdue in CUST-001, latency unchanged |
90
+ | 2 | All non-UTC customers | 14 days | False-overdue rate drops to 0%, no UTC regressions |
91
+ | 3 | Full rollout + 90-day backfill | 1 day cutover, 7-day monitoring | Ticket count drops to 0 within 14 days |
92
+
93
+ Feature flag: `deadline_local_tz_v1`. Default off, enabled per phase.
94
+
95
+ ## 7) Risks
96
+
97
+ | Risk | Likelihood | Impact | Mitigation |
98
+ |------|-----------|--------|------------|
99
+ | Customer timezone misconfigured | Low | High | Pre-rollout audit: query `customers` for nulls or invalid TZ strings, fix before phase 2 |
100
+ | Backfill clears legitimately overdue invoices | Low | High | Backfill script writes to an audit table; ops reviews before flipping statuses |
101
+ | New customers added without timezone | Medium | Medium | Add NOT NULL constraint after backfill complete |
102
+
103
+ ## 8) Owners and Open Questions
104
+
105
+ **DRI:** @pm-lead
106
+ **Reviewers:** Engineering (Billing tech lead), Customer success lead, Finance ops, Sales ops
107
+
108
+ **Open questions:**
109
+ - [ ] Should we notify customers about the backfill correction, or just clear the records silently? Owner: @customer-success
110
+ - [ ] Does finance want a per-customer dispute reconciliation report, or aggregated? Owner: @finance-ops
111
+
112
+ ## 9) Appendix
113
+
114
+ **Alternatives considered:**
115
+ - Store deadlines in UTC and force customers to provide their deadlines in UTC. Rejected because it pushes the conversion burden onto every customer and is error-prone.
116
+ - Database-side timezone conversion via view. Rejected because the deadline logic lives in the billing service and splitting it across layers makes future changes harder.
117
+
118
+ **Research quotes:**
119
+ > "Our payment deadline is 17:00 local. The dunning email arrived at 14:00 saying we missed it. That's just wrong." CUST-001 ops manager, ticket #4421
120
+
121
+ > "Every Monday I open the report and there are five fake overdue ones from US-EAST. I just close them. Waste of time." Customer success analyst, weekly retro 2026-04-29
@@ -0,0 +1,128 @@
1
+ # PRP: Invoice Deadline Timezone Fix
2
+
3
+ **Source PRD:** `outputs/prd/2026-05-08-billing-invoice-deadline-tz-fix.md`
4
+ **Target executor:** coding-agent
5
+ **Squad:** Billing | **Tech lead:** @tech-lead | **Date:** 2026-05-10
6
+
7
+ ## 1) Goal
8
+
9
+ Make the billing service evaluate invoice payment deadlines in the customer's local timezone instead of UTC. Add an integration-test suite proving correctness across UTC, UTC-3, and UTC-5 customers. Ship a one-time backfill script that re-evaluates invoices flagged as overdue in the last 90 days and clears false positives via an audit-table review step.
10
+
11
+ ## 2) Why
12
+
13
+ - 3% of invoices from non-UTC customers are flagged overdue incorrectly (PRD §1).
14
+ - 12 weekly support tickets, $14k in disputed late fees per quarter (PRD §1).
15
+ - Prerequisite for SLA-backed payment-window guarantees planned for H2 2026 (PRD §1, strategy fit).
16
+
17
+ ## 3) What
18
+
19
+ **User-visible behavior:**
20
+ - Invoice daily report shows overdue status using customer-local deadline comparison.
21
+ - Real-time invoice screen (`/invoices` view) shows the same.
22
+ - No UI changes; backend correction only.
23
+ - Backfill output: invoices reclassified from overdue to on-time within the 90-day window, after ops sign-off on the audit table.
24
+
25
+ **Out of scope:**
26
+ - Adding new columns to `customers`. `customers.timezone` already exists.
27
+ - Schema migration for `invoice_evaluations`. Reuse existing table.
28
+ - Refactoring the broader deadline-rule engine.
29
+
30
+ **Success criteria (verifiable):**
31
+ - [ ] Given an invoice with deadline `2026-05-10T20:00:00Z` for customer `CUST-001` (UTC-3, deadline 17:00 local), When the billing service evaluates, Then status = on-time.
32
+ - [ ] Given an invoice with deadline `2026-05-10T22:00:00Z` for customer `CUST-001`, When the billing service evaluates, Then status = overdue.
33
+ - [ ] Backfill script outputs a CSV of (invoice_id, old_status, new_status, customer_id, tz) and writes to `audit_invoice_tz_backfill` table. No rows are flipped without a manual approval step.
34
+ - [ ] Latency P95 stays below 220ms in load test.
35
+
36
+ ## 4) Context
37
+
38
+ ### Repos and files touched
39
+
40
+ | Repo | File | Change type | Reference |
41
+ |------|------|-------------|-----------|
42
+ | billing-service | `src/main/java/com/example/billing/deadline/DeadlineEvaluator.java:42` | modify | core comparison logic |
43
+ | billing-service | `src/main/java/com/example/billing/deadline/DeadlineEvaluator.java:88` | new | `convertToCustomerLocal(Instant, ZoneId)` |
44
+ | billing-service | `src/main/resources/db/migration/V20260510__audit_invoice_tz_backfill.sql` | new | audit table migration |
45
+ | billing-service | `src/test/java/com/example/billing/deadline/DeadlineEvaluatorTest.java` | modify | add UTC-3 / UTC-5 cases |
46
+ | billing-service | `src/main/resources/scripts/backfill/invoice_deadline_tz_backfill.sql` | new | one-time backfill script |
47
+
48
+ ### Patterns to follow
49
+
50
+ - **Pattern:** Customer lookup with timezone
51
+ **Example in codebase:** `src/main/java/com/example/billing/customer/CustomerRepository.java:33`
52
+ **Why follow it:** Existing helper returns customer with TZ already deserialized as `ZoneId`. Reusing it avoids a second query.
53
+
54
+ - **Pattern:** Feature flag wrapping
55
+ **Example in codebase:** `src/main/java/com/example/billing/dunning/DunningService.java:67`
56
+ **Why follow it:** Same per-customer gating model. Use flag name `deadline_local_tz_v1`.
57
+
58
+ - **Pattern:** Integration test with fixed clock
59
+ **Example in codebase:** `src/test/java/com/example/billing/dunning/DunningServiceTest.java:120`
60
+ **Why follow it:** Avoids flaky tests around DST or runtime TZ.
61
+
62
+ ### External documentation
63
+
64
+ - Java `ZonedDateTime` semantics: https://docs.oracle.com/javase/8/docs/api/java/time/ZonedDateTime.html. Read the section on `withZoneSameInstant` vs `withZoneSameLocal`. We want `withZoneSameInstant`.
65
+ - Feature flag service ADR: `docs/adr/0014-feature-flags.md`. For `featureFlags.isEnabled` per-entity semantics.
66
+
67
+ ### Known gotchas
68
+
69
+ - **DST transitions:** Some customer timezones (US-EAST, EU-WEST) observe DST. Test suite must pin to known fixed dates either side of DST boundaries to remain stable.
70
+ - **Null timezone:** If `customers.timezone IS NULL` (some legacy customers), fall back to UTC and log a warning at WARN level with `customer_id`. Do not throw: that would block invoice evaluation entirely.
71
+ - **Backfill idempotency:** Run-twice safety. The audit-table insert must be `INSERT … ON CONFLICT (invoice_id) DO NOTHING`.
72
+
73
+ ## 5) Implementation blueprint
74
+
75
+ ```
76
+ 1. Add featureFlags injection to DeadlineEvaluator (see DunningService.java:67 pattern).
77
+ 2. In DeadlineEvaluator.evaluate(Invoice invoice):
78
+ a. Lookup customer via CustomerRepository.findByIdWithTimezone(invoice.customerId).
79
+ b. If customer.timezone == null, log WARN, fall back to ZoneId.of("UTC").
80
+ c. Convert invoice.deadline into customer.timezone using withZoneSameInstant.
81
+ d. Compare in local zone, return InvoiceStatus.OVERDUE or ON_TIME.
82
+ 3. Guard the new path with featureFlags.isEnabled("deadline_local_tz_v1", customerId).
83
+ If disabled, run legacy UTC comparison.
84
+ 4. Update DeadlineEvaluatorTest with 6 cases (CUST-001 on-time/late, CUST-002 on-time/late, CUST-003 regression).
85
+ 5. Write invoice_deadline_tz_backfill.sql with ON CONFLICT DO NOTHING and audit-only writes.
86
+ ```
87
+
88
+ **Data:**
89
+ - No schema changes to operational tables.
90
+ - New table `audit_invoice_tz_backfill` via migration `V20260510__audit_invoice_tz_backfill.sql`. Columns: `invoice_id UUID PK`, `old_status VARCHAR(16)`, `new_status VARCHAR(16)`, `customer_id UUID`, `tz VARCHAR(64)`, `created_at TIMESTAMPTZ DEFAULT now()`.
91
+ - Volume: about 3% of 90 days × 50k invoices/day = about 135k rows.
92
+
93
+ **Observability:**
94
+ - Logs: `INFO` on flag check ("deadline_local_tz_v1 enabled for customer={customer_id}"), `WARN` on null timezone fallback.
95
+ - Metrics: `billing.deadline.evaluation.duration` histogram, tagged with `tz`. Dashboard: `billing/deadline-health` (existing: add a TZ panel).
96
+ - Alerts: P95 latency above 220ms for 5 min triggers pager. Existing alert; raise threshold from 200 to 220 for the flag-on cohort.
97
+
98
+ ## 6) Validation gates
99
+
100
+ ```bash
101
+ ./gradlew :billing-service:build
102
+ ./gradlew :billing-service:test --tests "com.example.billing.deadline.DeadlineEvaluatorTest*"
103
+ ./gradlew :billing-service:detekt
104
+ psql $STAGING_DB_URL -f src/main/resources/scripts/backfill/invoice_deadline_tz_backfill.sql --single-transaction --dry-run
105
+ ```
106
+
107
+ **Manual verification:**
108
+ - [ ] In staging, set feature flag for `CUST-001`. Create one invoice with deadline `20:00 UTC` (17:00 customer-local, on the deadline). Confirm status = on-time.
109
+ - [ ] Run backfill script in staging. Inspect `audit_invoice_tz_backfill` for CUST-001: confirm 90-day window populated, no UTC customers included.
110
+ - [ ] In Grafana `billing/deadline-health`, confirm new TZ panel renders.
111
+
112
+ ## 7) Rollout
113
+
114
+ - [ ] Feature flag: `deadline_local_tz_v1`, per-customer gating.
115
+ - [ ] Migration required: yes. `V20260510__audit_invoice_tz_backfill.sql`. Forward-only, safe; no data movement.
116
+ - [ ] Rollback plan: disable flag, drop `audit_invoice_tz_backfill` only if backfill was applied incorrectly (data is read-only, so this is reversible).
117
+
118
+ ## 8) Open items
119
+
120
+ - [ ] Confirm web frontend reads invoice status from API and not from its own clock: @web-tech-lead
121
+ - [ ] Decide who triggers the backfill apply step in production: @ops-lead
122
+
123
+ ## 9) References
124
+
125
+ - PRD: `outputs/prd/2026-05-08-billing-invoice-deadline-tz-fix.md`
126
+ - ADR feature flags: `docs/adr/0014-feature-flags.md`
127
+ - Deadline dashboard: https://grafana.example.com/d/billing-deadline-health
128
+ - Pattern reference: `billing-service/src/main/java/com/example/billing/dunning/DunningService.java:67`
@@ -0,0 +1,84 @@
1
+ # Pipeline
2
+
3
+ Shared rules for PRD and PRP generation. Edit retry, approval, publish, and token accounting here.
4
+
5
+ ## Stages per artifact
6
+
7
+ 1. Generate the artifact using the template in skills/{prd|prp}/SKILL.md.
8
+ 2. Apply sensors listed in skills/{prd|prp}/SKILL.md.
9
+ 3. Apply evals listed in skills/{prd|prp}/SKILL.md.
10
+ 4. Append the approval marker. Publish hook fires.
11
+
12
+ ## Retry policy
13
+
14
+ Max attempts: 3 per artifact.
15
+
16
+ Trigger on:
17
+ - any sensor returns non-zero
18
+ - eval weighted total below threshold (default 8.0)
19
+
20
+ Strategy:
21
+ 1. Read the feedback.
22
+ 2. Regenerate only failed or low-scoring sections.
23
+ 3. Re-run sensors and eval.
24
+
25
+ Hard stop after 3 attempts: return blocker, do not proceed.
26
+
27
+ ## Approval marker
28
+
29
+ PRD:
30
+ ```
31
+ <!-- approved: {YYYY-MM-DD} score={weighted-total} -->
32
+ ```
33
+
34
+ PRP:
35
+ ```
36
+ <!-- approved: {YYYY-MM-DD} score={weighted-total} ready-for-handoff: true -->
37
+ ```
38
+
39
+ Triggers hooks/post-eval-{prd|prp}.sh.
40
+
41
+ ## Publish
42
+
43
+ Local: file stays in outputs/{prd|prp}/.
44
+
45
+ Confluence: fires if JIRA_USERNAME and JIRA_API_TOKEN are set. Calls scripts/confluence-publish.py. Missing creds skips silently.
46
+
47
+ After publish, hook appends:
48
+ ```
49
+ <!-- published: {ISO-timestamp} -->
50
+ ```
51
+
52
+ ## Token accounting
53
+
54
+ Each phase brackets a measurable token window. Phases for this plugin:
55
+ - prd-generate: from skill invocation to first save in outputs/prd/
56
+ - prd-validate: from first save to approval marker (covers sensors and evals)
57
+ - prp-generate: from skill invocation to first save in outputs/prp/
58
+ - prp-validate: from first save to approval marker
59
+
60
+ Markers live in outputs/.markers/{feature_id}.{phase}.{start|end}, each with `{"timestamp": ISO, "session_id": ""}`. Skill writes the .start marker; hooks write the .end markers.
61
+
62
+ After each artifact's eval passes, the publish hook runs scripts/token-phase.py for both of its phases. The script reads the Claude transcript JSONL, sums usage tokens (input, output, cache_read, cache_creation) within each window, and appends a phase entry to outputs/tokens/{feature_id}.json. Markers are deleted after consumption.
63
+
64
+ The post-eval hook also appends an inline summary comment to the artifact:
65
+ ```
66
+ <!-- tokens: outputs/tokens/{feature_id}.json in={N} out={N} cache_r={N} -->
67
+ ```
68
+
69
+ Future phases (dev, code review, launch) append new entries to the same outputs/tokens/{feature_id}.json by reusing the feature_id slug.
70
+
71
+ If the transcript is not readable, the script logs a warning and exits 0. Token accounting never blocks publish.
72
+
73
+ ## Orchestrator order
74
+
75
+ 1. PRD: stages 1-4.
76
+ 2. hooks/pre-prp-check.sh validates the PRD approved marker.
77
+ 3. PRP: stages 1-4.
78
+ 4. Return summary.
79
+
80
+ ## Stop conditions
81
+
82
+ - 3 failed attempts on the same artifact
83
+ - missing required input after one clarification
84
+ - pre-prp-check.sh denies PRP start
@@ -0,0 +1,27 @@
1
+ # PRD Guidelines
2
+
3
+ Rules specific to PRDs. General PM rules in product-guidelines.md.
4
+
5
+ Business only. Zero technical detail. File paths, APIs, schemas belong in the PRP.
6
+
7
+ User stories: As a {persona}, I want {action}, so that {benefit}.
8
+
9
+ Metrics need baseline, target, horizon in every row. No empty cells.
10
+
11
+ Mermaid only, never ASCII.
12
+
13
+ Gaps: `NOT FOUND - NEEDS REVIEW: {detail}`. Never invent.
14
+
15
+ Word count by stage:
16
+ - Team Kickoff: 300-500
17
+ - Planning Review: 500-800
18
+ - Solution Review: 800-1200
19
+ - Launch Readiness: 1500-2000
20
+
21
+ Required sections at each stage:
22
+ - Team Kickoff: 1, 2, 8
23
+ - Planning Review: + 5
24
+ - Solution Review: + 4, 7
25
+ - Launch Readiness: all
26
+
27
+ Stage progression: do not skip stages. Each gate adds requirements, never removes them.
@@ -0,0 +1,75 @@
1
+ # Product Guidelines
2
+
3
+ How PMs think about product. Read before drafting a PRD.
4
+
5
+ ## We ship outcomes, not features
6
+
7
+ Every PRD touches a real user or customer cohort. Always answer:
8
+ - Who pays for the pain today?
9
+ - Whose workflow changes?
10
+ - What metric in the customer's business or experience moves?
11
+
12
+ If you cannot answer all three, the PRD is not ready.
13
+
14
+ ## Squads
15
+
16
+ Defined per org in `context-library/business-info.md` and `context-library/squads/`. Multi-squad PRDs: name a primary squad, call out cross-squad interfaces.
17
+
18
+ ## Customer hierarchy
19
+
20
+ When prioritizing, make trade-offs between cohorts explicit, not invisible. A common ranking template (override per org):
21
+
22
+ 1. Paying customers with high operational impact
23
+ 2. Reliability partners (SLAs, contracts)
24
+ 3. End consumers (NPS, support volume)
25
+ 4. Internal ops (cost-to-serve)
26
+
27
+ ## Metrics
28
+
29
+ Prefer metrics already on existing dashboards. New metrics need an owner to instrument.
30
+
31
+ Strong examples (the shape of a strong metric — specific, owned, instrumented):
32
+ - error rate of X by segment
33
+ - first-attempt success rate
34
+ - latency P95 of Y action
35
+ - generation time for Z artifact
36
+ - compliance rate per cohort
37
+
38
+ Weak:
39
+ - "User satisfaction" with no source
40
+ - "Improved efficiency" with no number
41
+ - "Reduced friction" with no baseline
42
+
43
+ ## Non-goals are mandatory
44
+
45
+ Every PRD lists at least one non-goal. If you cannot think of one, you have not scoped the feature.
46
+
47
+ ## Rollout is not "ship it"
48
+
49
+ Every PRD ends with phases, audience per phase, pass criteria per phase, rollback. Feature flags by default.
50
+
51
+ ## Kill criteria
52
+
53
+ Decide before launch what makes you turn the feature off. Examples:
54
+ - error rate above X%
55
+ - latency P95 above Yms
56
+ - adoption below Z% after N days
57
+
58
+ ## PRD vs PRP boundary
59
+
60
+ In PRD:
61
+ - why we are building
62
+ - who feels the pain
63
+ - success metric and target
64
+ - open questions about scope
65
+
66
+ In PRP:
67
+ - file paths, class names, schemas
68
+ - API endpoints, validation commands
69
+ - open questions about implementation
70
+
71
+ If the question is "should we build this", PRD. If it is "how do we build this", PRP.
72
+
73
+ ## Diagrams
74
+
75
+ Mermaid always. Never ASCII.