@pieerry/harness-kit 3.0.0
- package/.claude/agents/product-manager.md +20 -0
- package/.claude/agents/staff-software-engineer.md +25 -0
- package/.claude/commands/product-manager/prd.md +31 -0
- package/.claude/commands/product-manager/prp.md +35 -0
- package/.claude/commands/product-manager/run.md +31 -0
- package/.claude/commands/sse/dev.md +47 -0
- package/.claude/commands/sse/plan.md +33 -0
- package/.claude/commands/sse/pr.md +43 -0
- package/.claude/commands/sse/run.md +39 -0
- package/.claude/commands/sse/test.md +38 -0
- package/.claude/hooks/status-line.sh +103 -0
- package/.claude/plugins/product-manager/README.md +120 -0
- package/.claude/plugins/product-manager/evals/prd-quality.md +88 -0
- package/.claude/plugins/product-manager/evals/prd-readiness.md +66 -0
- package/.claude/plugins/product-manager/evals/prp-context-readiness.md +51 -0
- package/.claude/plugins/product-manager/evals/prp-quality.md +88 -0
- package/.claude/plugins/product-manager/guides/examples/good-prd-example.md +121 -0
- package/.claude/plugins/product-manager/guides/examples/good-prp-example.md +128 -0
- package/.claude/plugins/product-manager/guides/pipeline.md +84 -0
- package/.claude/plugins/product-manager/guides/prd-guidelines.md +27 -0
- package/.claude/plugins/product-manager/guides/product-guidelines.md +75 -0
- package/.claude/plugins/product-manager/guides/prp-guidelines.md +64 -0
- package/.claude/plugins/product-manager/guides/templates/prd.md +89 -0
- package/.claude/plugins/product-manager/guides/templates/prp.md +98 -0
- package/.claude/plugins/product-manager/guides/writing-style.md +71 -0
- package/.claude/plugins/product-manager/hooks/post-eval-prd.sh +77 -0
- package/.claude/plugins/product-manager/hooks/post-eval-prp.sh +70 -0
- package/.claude/plugins/product-manager/hooks/post-write-prd.sh +56 -0
- package/.claude/plugins/product-manager/hooks/post-write-prp.sh +61 -0
- package/.claude/plugins/product-manager/hooks/pre-prp-check.sh +48 -0
- package/.claude/plugins/product-manager/outputs/.markers/.gitkeep +0 -0
- package/.claude/plugins/product-manager/scripts/confluence-publish.py +205 -0
- package/.claude/plugins/product-manager/scripts/link-validator.py +87 -0
- package/.claude/plugins/product-manager/scripts/sensor-runner.py +140 -0
- package/.claude/plugins/product-manager/scripts/token-phase.py +208 -0
- package/.claude/plugins/product-manager/sensors/prd-acceptance-criteria.md +39 -0
- package/.claude/plugins/product-manager/sensors/prd-structure.md +39 -0
- package/.claude/plugins/product-manager/sensors/prp-context-quality.md +42 -0
- package/.claude/plugins/product-manager/sensors/prp-links.md +24 -0
- package/.claude/plugins/product-manager/sensors/prp-structure.md +52 -0
- package/.claude/plugins/product-manager/skills/prd/SKILL.md +33 -0
- package/.claude/plugins/product-manager/skills/prp/SKILL.md +37 -0
- package/.claude/plugins/staff-software-engineer/README.md +90 -0
- package/.claude/plugins/staff-software-engineer/evals/plan-quality.md +48 -0
- package/.claude/plugins/staff-software-engineer/guides/coding-style.md +51 -0
- package/.claude/plugins/staff-software-engineer/guides/commit-style.md +44 -0
- package/.claude/plugins/staff-software-engineer/guides/conventions-override.md +79 -0
- package/.claude/plugins/staff-software-engineer/guides/pipeline.md +69 -0
- package/.claude/plugins/staff-software-engineer/hooks/post-eval-plan.sh +43 -0
- package/.claude/plugins/staff-software-engineer/hooks/post-write-plan.sh +49 -0
- package/.claude/plugins/staff-software-engineer/outputs/.markers/.gitkeep +0 -0
- package/.claude/plugins/staff-software-engineer/sensors/code-conventions.md +37 -0
- package/.claude/plugins/staff-software-engineer/sensors/plan-structure.md +37 -0
- package/.claude/plugins/staff-software-engineer/sensors/test-coverage.md +28 -0
- package/.claude/plugins/staff-software-engineer/skills/backend/SKILL.md +80 -0
- package/.claude/plugins/staff-software-engineer/skills/devops/SKILL.md +58 -0
- package/.claude/plugins/staff-software-engineer/skills/mobile/SKILL.md +52 -0
- package/.claude/plugins/staff-software-engineer/skills/web/SKILL.md +64 -0
- package/.claude/settings.local.json +61 -0
- package/CLAUDE.md +90 -0
- package/LICENSE +21 -0
- package/README.md +192 -0
- package/VERSION +1 -0
- package/bin/hk.js +141 -0
- package/context-library/README.md +38 -0
- package/context-library/business-info-template.md +39 -0
- package/context-library/decisions/README.md +3 -0
- package/context-library/example-prds/README.md +3 -0
- package/context-library/meetings/.gitkeep +0 -0
- package/context-library/metrics/.gitkeep +0 -0
- package/context-library/personal-context-template.md +31 -0
- package/context-library/research/.gitkeep +0 -0
- package/context-library/squads/README.md +32 -0
- package/context-library/strategy/README.md +3 -0
- package/package.json +43 -0
- package/setup/install.sh +154 -0
- package/setup/update.sh +17 -0
|
@@ -0,0 +1,66 @@
|
|
|
1
|
+
# Eval: PRD Readiness
|
|
2
|
+
|
|
3
|
+
Type: rule-based stage gate
|
|
4
|
+
Mode: advisory
|
|
5
|
+
|
|
6
|
+
Decides whether a PRD has enough content for its declared stage to advance. Run after prd-quality passes.
|
|
7
|
+
|
|
8
|
+
## Stage gates
|
|
9
|
+
|
|
10
|
+
### Team Kickoff
|
|
11
|
+
|
|
12
|
+
Required:
|
|
13
|
+
- problem in 1-2 sentences
|
|
14
|
+
- hypothesis with numeric direction
|
|
15
|
+
- customer list with at least 1 named segment
|
|
16
|
+
- primary success metric named
|
|
17
|
+
|
|
18
|
+
Word count: 250-600
|
|
19
|
+
Next stage: Planning Review
|
|
20
|
+
|
|
21
|
+
### Planning Review
|
|
22
|
+
|
|
23
|
+
Inherits Team Kickoff, plus:
|
|
24
|
+
- baseline, target, horizon for every metric
|
|
25
|
+
- guardrail metrics named
|
|
26
|
+
- strategy fit explained
|
|
27
|
+
- impact sizing with at least one quantified figure
|
|
28
|
+
|
|
29
|
+
Word count: 400-900
|
|
30
|
+
Next stage: Solution Review
|
|
31
|
+
|
|
32
|
+
### Solution Review
|
|
33
|
+
|
|
34
|
+
Inherits Planning Review, plus:
|
|
35
|
+
- solution overview with user flow (Mermaid or numbered)
|
|
36
|
+
- at least 1 user story (As / I want / so that)
|
|
37
|
+
- edge cases documented or explicit none-apply statement
|
|
38
|
+
- non-goals listed with reasons
|
|
39
|
+
- risk table with at least 2 risks and mitigations
|
|
40
|
+
- UX reference or "no UI change"
|
|
41
|
+
|
|
42
|
+
Word count: 700-1400
|
|
43
|
+
Next stage: Launch Readiness
|
|
44
|
+
|
|
45
|
+
### Launch Readiness
|
|
46
|
+
|
|
47
|
+
Inherits Solution Review, plus:
|
|
48
|
+
- phased rollout table with audience, duration, pass criteria per phase
|
|
49
|
+
- kill criteria with numeric thresholds
|
|
50
|
+
- owners and reviewers
|
|
51
|
+
- open questions resolved or assigned to owner
|
|
52
|
+
- feature flag or rollback plan
|
|
53
|
+
|
|
54
|
+
Word count: 1200-2500
|
|
55
|
+
Next stage: Shipped
|
|
56
|
+
|
|
57
|
+
## Output format
|
|
58
|
+
|
|
59
|
+
```json
|
|
60
|
+
{
|
|
61
|
+
"current_stage": "Team Kickoff | Planning Review | Solution Review | Launch Readiness",
|
|
62
|
+
"ready_for_next": true,
|
|
63
|
+
"missing_for_next": ["item", "item"],
|
|
64
|
+
"warnings": ["soft issue"]
|
|
65
|
+
}
|
|
66
|
+
```
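A minimal sketch of how the stage gates above could be evaluated, assuming the required items have already been detected and reduced to short keys. The key names, `GATES`, `required_items`, and `evaluate_readiness` are hypothetical, not part of the package. Treating word count as a warning rather than a blocker is also an assumption, based only on the advisory mode.

```python
# Hypothetical encoding of the four stage gates; item keys are illustrative names.
GATES = {
    "Team Kickoff": {
        "inherits": None,
        "required": ["problem", "hypothesis", "customer_segment", "primary_metric"],
        "words": (250, 600),
        "next": "Planning Review",
    },
    "Planning Review": {
        "inherits": "Team Kickoff",
        "required": ["metric_baseline_target_horizon", "guardrails",
                     "strategy_fit", "impact_sizing"],
        "words": (400, 900),
        "next": "Solution Review",
    },
    "Solution Review": {
        "inherits": "Planning Review",
        "required": ["user_flow", "user_story", "edge_cases",
                     "non_goals", "risk_table", "ux_reference"],
        "words": (700, 1400),
        "next": "Launch Readiness",
    },
    "Launch Readiness": {
        "inherits": "Solution Review",
        "required": ["rollout_table", "kill_criteria", "owners",
                     "open_questions", "rollback_plan"],
        "words": (1200, 2500),
        "next": "Shipped",
    },
}


def required_items(stage):
    """Return the stage's own items plus everything it inherits, earliest stage first."""
    items = []
    while stage is not None:
        gate = GATES[stage]
        items = gate["required"] + items
        stage = gate["inherits"]
    return items


def evaluate_readiness(stage, present, word_count):
    """Build the output object for a PRD at its declared stage."""
    missing = [item for item in required_items(stage) if item not in present]
    low, high = GATES[stage]["words"]
    warnings = []
    if not low <= word_count <= high:
        # Assumption: word count is a soft issue, not a hard failure.
        warnings.append(f"word count {word_count} outside {low}-{high}")
    return {
        "current_stage": stage,
        "ready_for_next": not missing,
        "missing_for_next": missing,
        "warnings": warnings,
    }
```

Walking the `inherits` chain keeps each gate's list short while still enforcing everything from earlier stages.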
|
|
@@ -0,0 +1,51 @@
|
|
|
1
|
+
# Eval: PRP Context Readiness
|
|
2
|
+
|
|
3
|
+
Type: composite (executor simulation plus sensor pass-through)
|
|
4
|
+
Mode: handoff gate
|
|
5
|
+
|
|
6
|
+
Pass criteria:
|
|
7
|
+
- all structural sensors passed
|
|
8
|
+
- shippable in {yes, partial}
|
|
9
|
+
- blocking_questions count <= 2
|
|
10
|
+
- one_shot_likelihood >= 0.7
|
|
11
|
+
|
|
12
|
+
Run last, after prp-quality passes.
|
|
13
|
+
|
|
14
|
+
## Phase 1: structural
|
|
15
|
+
|
|
16
|
+
Pass-through. Required green:
|
|
17
|
+
- sensors/prp-structure.md
|
|
18
|
+
- sensors/prp-context-quality.md
|
|
19
|
+
- sensors/prp-links.md
|
|
20
|
+
|
|
21
|
+
If any fail, abort and return sensor feedback.
|
|
22
|
+
|
|
23
|
+
## Phase 2: executor simulation
|
|
24
|
+
|
|
25
|
+
You are a coding agent that has never seen this codebase. You are handed only this PRP. You may not ask the PM follow-up questions. Read it end-to-end and answer:
|
|
26
|
+
|
|
27
|
+
1. Could you ship with only what is in the PRP plus linked files? (yes / partial / no)
|
|
28
|
+
2. List every place where you would need to stop and ask a human.
|
|
29
|
+
3. List every claim that looks plausible but you cannot verify from the PRP alone.
|
|
30
|
+
4. Rate one-shot-success likelihood from 0 to 1.
|
|
31
|
+
|
|
32
|
+
## Output format
|
|
33
|
+
|
|
34
|
+
```json
|
|
35
|
+
{
|
|
36
|
+
"shippable": "yes | partial | no",
|
|
37
|
+
"blocking_questions": ["question"],
|
|
38
|
+
"unverifiable_claims": ["claim"],
|
|
39
|
+
"one_shot_likelihood": 0.0
|
|
40
|
+
}
|
|
41
|
+
```
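The pass criteria at the top of this eval can be checked mechanically against the Phase 2 output. This is a rough sketch only; `passes_handoff_gate` and its `sensors_passed` argument are hypothetical names, and the Phase 3 PM signoff stays manual.

```python
def passes_handoff_gate(sensors_passed, result):
    """Apply the four pass criteria to a Phase 2 result dict.

    sensors_passed: True only if every Phase 1 structural sensor was green.
    result: dict shaped like the output format above.
    """
    return (
        sensors_passed
        and result["shippable"] in {"yes", "partial"}
        and len(result["blocking_questions"]) <= 2
        and result["one_shot_likelihood"] >= 0.7
    )
```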
|
|
42
|
+
|
|
43
|
+
## Phase 3: PM signoff (manual)
|
|
44
|
+
|
|
45
|
+
PM confirms or requests another iteration.
|
|
46
|
+
|
|
47
|
+
## On failure
|
|
48
|
+
|
|
49
|
+
Phase 1 fails: re-run failing sensor, regenerate failed sections.
|
|
50
|
+
Phase 2 fails criteria: regenerate weakest section based on blocking_questions.
|
|
51
|
+
Phase 3 rejects: collect PM feedback, retry the loop.
|
|
@@ -0,0 +1,88 @@
|
|
|
1
|
+
# Eval: PRP Quality
|
|
2
|
+
|
|
3
|
+
Type: LLM-judge
|
|
4
|
+
Mode: quality gate
|
|
5
|
+
Threshold: weighted total >= 8.0
|
|
6
|
+
|
|
7
|
+
Read as if you are the executor: would you hit a dead end? Score 0-10 per dimension. Cite line numbers when below 7. Weighted total = sum(score x weight%) / 100.
|
|
8
|
+
|
|
9
|
+
## Rubric
|
|
10
|
+
|
|
11
|
+
### Goal clarity (weight 15%)
|
|
12
|
+
One paragraph, concrete, tied to source PRD?
|
|
13
|
+
|
|
14
|
+
- 10: unambiguous
|
|
15
|
+
- 5: clear but loose scope
|
|
16
|
+
- 0: "improve X" with no scope
|
|
17
|
+
|
|
18
|
+
### Context completeness (weight 20%)
|
|
19
|
+
Context section gives file paths, patterns with examples, external docs, gotchas? Executor never needs to leave?
|
|
20
|
+
|
|
21
|
+
- 10: self-contained
|
|
22
|
+
- 5: most context present, some gaps
|
|
23
|
+
- 0: "go look at the repo"
|
|
24
|
+
|
|
25
|
+
### Implementation concreteness (weight 20%)
|
|
26
|
+
Blueprint specific enough that executor never decides naming, file location, or API shape?
|
|
27
|
+
|
|
28
|
+
- 10: numbered steps with concrete identifiers
|
|
29
|
+
- 5: outline present, some decisions open
|
|
30
|
+
- 0: prose handwaving
|
|
31
|
+
|
|
32
|
+
### Validation executability (weight 15%)
|
|
33
|
+
Validation gates are real commands the executor copy-pastes? Manual checks specific?
|
|
34
|
+
|
|
35
|
+
- 10: copy-paste ready
|
|
36
|
+
- 5: commands with placeholders
|
|
37
|
+
- 0: "make sure it works"
|
|
38
|
+
|
|
39
|
+
### Pattern referencing (weight 10%)
|
|
40
|
+
Codebase patterns named with file:line references?
|
|
41
|
+
|
|
42
|
+
- 10: at least 2 concrete patterns
|
|
43
|
+
- 5: 1 pattern
|
|
44
|
+
- 0: no analogies
|
|
45
|
+
|
|
46
|
+
### Scope discipline (weight 10%)
|
|
47
|
+
"Out of scope" explicit? One feature, no bundled unrelated work?
|
|
48
|
+
|
|
49
|
+
- 10: tight
|
|
50
|
+
- 5: 1 unrelated item slipped in
|
|
51
|
+
- 0: scope creep
|
|
52
|
+
|
|
53
|
+
### Gap honesty (weight 5%)
|
|
54
|
+
Unknowns marked with NEEDS REVIEW or TBD with owners? Marker count <= 5?
|
|
55
|
+
|
|
56
|
+
- 10: honest, low marker count
|
|
57
|
+
- 5: markers without owners
|
|
58
|
+
- 0: too many gaps or pretends to know
|
|
59
|
+
|
|
60
|
+
### Voice (weight 5%)
|
|
61
|
+
No banned words. No em-dashes. Mermaid not ASCII.
|
|
62
|
+
|
|
63
|
+
- 10: clean
|
|
64
|
+
- 5: 1-2 violations
|
|
65
|
+
- 0: 3+ violations
|
|
66
|
+
|
|
67
|
+
## On failure (total below 8.0)
|
|
68
|
+
|
|
69
|
+
Retry. Identify lowest-scoring dimensions, regenerate those sections only. Max 3 attempts.
|
|
70
|
+
|
|
71
|
+
## Output format
|
|
72
|
+
|
|
73
|
+
```json
|
|
74
|
+
{
|
|
75
|
+
"scores": {
|
|
76
|
+
"goal_clarity": 0,
|
|
77
|
+
"context_completeness": 0,
|
|
78
|
+
"implementation_concreteness": 0,
|
|
79
|
+
"validation_executability": 0,
|
|
80
|
+
"pattern_referencing": 0,
|
|
81
|
+
"scope_discipline": 0,
|
|
82
|
+
"gap_honesty": 0,
|
|
83
|
+
"voice": 0
|
|
84
|
+
},
|
|
85
|
+
"weighted_total": 0.0,
|
|
86
|
+
"feedback": ["dimension: specific issue with line ref"]
|
|
87
|
+
}
|
|
88
|
+
```
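For reference, the weighted total can be computed from the `scores` object as follows. The weights are copied from the rubric headings and sum to 100; the function names and `THRESHOLD` constant are illustrative, not taken from the package scripts.

```python
# Weights in percent, one per rubric dimension (they sum to 100).
WEIGHTS = {
    "goal_clarity": 15,
    "context_completeness": 20,
    "implementation_concreteness": 20,
    "validation_executability": 15,
    "pattern_referencing": 10,
    "scope_discipline": 10,
    "gap_honesty": 5,
    "voice": 5,
}
THRESHOLD = 8.0


def weighted_total(scores):
    """sum(score x weight%) / 100, on the same 0-10 scale as the scores."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing scores: {sorted(missing)}")
    return sum(scores[dim] * weight for dim, weight in WEIGHTS.items()) / 100


def lowest_dimensions(scores, n=2):
    """Dimensions to regenerate first on a retry, lowest score first."""
    return sorted(WEIGHTS, key=lambda dim: scores[dim])[:n]
```

A PRP passes the quality gate when `weighted_total(scores) >= THRESHOLD`.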
|
|
@@ -0,0 +1,121 @@
|
|
|
1
|
+
# PRD: Invoice Deadline Timezone Fix
|
|
2
|
+
|
|
3
|
+
**Squad:** Billing | **DRI (PM):** @pm-lead | **Date:** 2026-05-08
|
|
4
|
+
**Stage:** Solution Review
|
|
5
|
+
**Bet:** N/A (bug-driven refinement)
|
|
6
|
+
**Status:** In Review
|
|
7
|
+
|
|
8
|
+
## 1) Problem and Hypothesis
|
|
9
|
+
|
|
10
|
+
**Problem:** Customer success reports that 3% of invoices are flagged as overdue even when the customer is still inside their local payment window. The deadline calculation runs in UTC, but US-EAST customers operate in UTC-5 and EU-WEST customers in UTC+1.
|
|
11
|
+
|
|
12
|
+
**Hypothesis:** If we apply each customer's registered timezone offset when computing the deadline, then false-overdue invoices will drop to zero within 30 days, because the only cause we have seen across 12 weekly support tickets is the UTC mismatch.
|
|
13
|
+
|
|
14
|
+
**Strategy fit:** Supports the Billing reliability pillar for Q2 2026. False-overdue flags trigger automated dunning emails and finance write-offs; cleaning this up is table stakes before we offer SLA-backed payment-window guarantees in H2.
|
|
15
|
+
|
|
16
|
+
**Evidence:**
|
|
17
|
+
- 12 support tickets per week over the last 6 weeks, all from customers in US-EAST and EU-WEST
|
|
18
|
+
- Finance: $14k in disputed late fees traced to false-overdue invoices (Q1 2026)
|
|
19
|
+
- Customer success confirmed the issue affects only customers in non-UTC timezones. UTC-aligned customers (UK) have zero false-overdue reports.
|
|
20
|
+
- Quote, CUST-001 ops manager: "Our payment deadline is 17:00 local. The dunning email arrived at 14:00 saying we missed it. That's just wrong."
|
|
21
|
+
|
|
22
|
+
## 2) Customers
|
|
23
|
+
|
|
24
|
+
| Customer | Why it matters |
|
|
25
|
+
|----------|----------------|
|
|
26
|
+
| Customer ops (US, EU, others in non-UTC) | Daily friction; one weekly escalation each |
|
|
27
|
+
| Customer success | Triages the false-overdue tickets, ~3h/week of avoidable work |
|
|
28
|
+
| Finance | Reconciles disputed late fees manually |
|
|
29
|
+
| End customers | Receive incorrect dunning notifications, hurts trust |
|
|
30
|
+
|
|
31
|
+
## 3) Scope and Non-Goals
|
|
32
|
+
|
|
33
|
+
**In scope:**
|
|
34
|
+
- Read `customers.timezone` when computing the deadline for any invoice.
|
|
35
|
+
- Apply offset in both the daily report and the real-time invoice screen.
|
|
36
|
+
- Backfill the last 90 days of flagged invoices to clear false positives.
|
|
37
|
+
|
|
38
|
+
**Non-goals (max 3):**
|
|
39
|
+
- Schema changes to `customers`. The timezone column already exists and is populated.
|
|
40
|
+
- Per-segment deadline rules. Separate refinement.
|
|
41
|
+
- Surfacing deadline time in customer-facing API responses. Requested by customers, but not blocking.
|
|
42
|
+
|
|
43
|
+
**Trade-offs accepted:**
|
|
44
|
+
- The fix runs in application code, not the database. About +5ms per invoice lookup; acceptable given current latency budget.
|
|
45
|
+
- Backfill is a one-off script, not a recurring job. Anything older than 90 days stays as-is.
|
|
46
|
+
|
|
47
|
+
## 4) Solution Overview
|
|
48
|
+
|
|
49
|
+
**Summary:** When the billing service evaluates whether an invoice is overdue, it currently compares the invoice's deadline timestamp against a UTC clock. We change the comparison to convert both timestamps into the customer's local timezone before evaluating. The customer timezone is already stored in `customers.timezone` and populated for every active customer.
|
|
50
|
+
|
|
51
|
+
We add an integration test per timezone tier (UTC, UTC-5, UTC+1) using fixed clock fixtures. A one-time backfill script re-evaluates invoices flagged as overdue in the last 90 days and clears false positives.
|
|
52
|
+
|
|
53
|
+
**User flow:**
|
|
54
|
+
|
|
55
|
+
```mermaid
|
|
56
|
+
flowchart LR
|
|
57
|
+
INV[Invoice evaluated] --> Lookup[Lookup customer.timezone]
|
|
58
|
+
Lookup --> Convert[Convert deadline + now to customer local time]
|
|
59
|
+
Convert --> Compare[Compare in local time]
|
|
60
|
+
Compare -->|on time| Green[Green status]
|
|
61
|
+
Compare -->|overdue| Red[Red status + dunning]
|
|
62
|
+
```
|
|
63
|
+
|
|
64
|
+
**User stories:**
|
|
65
|
+
- As a customer ops lead, I want the invoice report to use my local payment deadline, so that on-time invoices are not flagged as overdue.
|
|
66
|
+
- As a customer success analyst, I want false-overdue invoices from the last 90 days cleared, so that I stop triaging the same complaints every week.
|
|
67
|
+
- As a finance analyst, I want disputed late fees traceable to the bug to reconcile automatically, so that I do not process them by hand.
|
|
68
|
+
|
|
69
|
+
**UX reference:** No UI changes. The invoice screen and the daily report read from the corrected backend.
|
|
70
|
+
|
|
71
|
+
## 5) Success Metrics
|
|
72
|
+
|
|
73
|
+
| Metric | Baseline | Target | Horizon |
|
|
74
|
+
|--------|----------|--------|---------|
|
|
75
|
+
| False-overdue rate (non-UTC customers) | 3.1% | 0% | 30 days |
|
|
76
|
+
| Weekly support tickets about deadlines | 12 | 0 | 30 days |
|
|
77
|
+
| Disputed late fees ($) | 14k/quarter | 0 | 90 days |
|
|
78
|
+
|
|
79
|
+
**Guardrails:**
|
|
80
|
+
- Invoice evaluation latency P95: must stay below 200ms (current 145ms; +5ms acceptable).
|
|
81
|
+
- No regression in UTC-aligned customers (UK reference cohort): false-overdue rate must stay 0%.
|
|
82
|
+
|
|
83
|
+
**Kill criteria:** If P95 latency rises above 220ms, or if any UTC-aligned customer starts receiving false-overdue flags after rollout, disable the feature flag and roll back the backfill.
|
|
84
|
+
|
|
85
|
+
## 6) Rollout
|
|
86
|
+
|
|
87
|
+
| Phase | Audience | Duration | Pass criteria |
|
|
88
|
+
|-------|----------|----------|---------------|
|
|
89
|
+
| 1 | Internal staging + CUST-001 only | 7 days | Zero false-overdue in CUST-001, latency unchanged |
|
|
90
|
+
| 2 | All non-UTC customers | 14 days | False-overdue rate drops to 0%, no UTC regressions |
|
|
91
|
+
| 3 | Full rollout + 90-day backfill | 1 day cutover, 7-day monitoring | Ticket count drops to 0 within 14 days |
|
|
92
|
+
|
|
93
|
+
Feature flag: `deadline_local_tz_v1`. Default off, enabled per phase.
|
|
94
|
+
|
|
95
|
+
## 7) Risks
|
|
96
|
+
|
|
97
|
+
| Risk | Likelihood | Impact | Mitigation |
|
|
98
|
+
|------|-----------|--------|------------|
|
|
99
|
+
| Customer timezone misconfigured | Low | High | Pre-rollout audit: query `customers` for nulls or invalid TZ strings, fix before phase 2 |
|
|
100
|
+
| Backfill clears legitimately overdue invoices | Low | High | Backfill script writes to an audit table; ops reviews before flipping statuses |
|
|
101
|
+
| New customers added without timezone | Medium | Medium | Add NOT NULL constraint after backfill complete |
|
|
102
|
+
|
|
103
|
+
## 8) Owners and Open Questions
|
|
104
|
+
|
|
105
|
+
**DRI:** @pm-lead
|
|
106
|
+
**Reviewers:** Engineering (Billing tech lead), Customer success lead, Finance ops, Sales ops
|
|
107
|
+
|
|
108
|
+
**Open questions:**
|
|
109
|
+
- [ ] Should we notify customers about the backfill correction, or just clear the records silently? Owner: @customer-success
|
|
110
|
+
- [ ] Does finance want a per-customer dispute reconciliation report, or aggregated? Owner: @finance-ops
|
|
111
|
+
|
|
112
|
+
## 9) Appendix
|
|
113
|
+
|
|
114
|
+
**Alternatives considered:**
|
|
115
|
+
- Store deadlines in UTC and force customers to provide their deadlines in UTC. Rejected because it pushes the conversion burden onto every customer and is error-prone.
|
|
116
|
+
- Database-side timezone conversion via view. Rejected because the deadline logic lives in the billing service and splitting it across layers makes future changes harder.
|
|
117
|
+
|
|
118
|
+
**Research quotes:**
|
|
119
|
+
> "Our payment deadline is 17:00 local. The dunning email arrived at 14:00 saying we missed it. That's just wrong." CUST-001 ops manager, ticket #4421
|
|
120
|
+
|
|
121
|
+
> "Every Monday I open the report and there are five fake overdue ones from US-EAST. I just close them. Waste of time." Customer success analyst, weekly retro 2026-04-29
|
|
@@ -0,0 +1,128 @@
|
|
|
1
|
+
# PRP: Invoice Deadline Timezone Fix
|
|
2
|
+
|
|
3
|
+
**Source PRD:** `outputs/prd/2026-05-08-billing-invoice-deadline-tz-fix.md`
|
|
4
|
+
**Target executor:** coding-agent
|
|
5
|
+
**Squad:** Billing | **Tech lead:** @tech-lead | **Date:** 2026-05-10
|
|
6
|
+
|
|
7
|
+
## 1) Goal
|
|
8
|
+
|
|
9
|
+
Make the billing service evaluate invoice payment deadlines in the customer's local timezone instead of UTC. Add an integration-test suite proving correctness across UTC, UTC-3, and UTC-5 customers. Ship a one-time backfill script that re-evaluates invoices flagged as overdue in the last 90 days and clears false positives via an audit-table review step.
|
|
10
|
+
|
|
11
|
+
## 2) Why
|
|
12
|
+
|
|
13
|
+
- 3% of invoices from non-UTC customers are flagged overdue incorrectly (PRD §1).
|
|
14
|
+
- 12 weekly support tickets, $14k in disputed late fees per quarter (PRD §1).
|
|
15
|
+
- Prerequisite for SLA-backed payment-window guarantees planned for H2 2026 (PRD §1, strategy fit).
|
|
16
|
+
|
|
17
|
+
## 3) What
|
|
18
|
+
|
|
19
|
+
**User-visible behavior:**
|
|
20
|
+
- Invoice daily report shows overdue status using customer-local deadline comparison.
|
|
21
|
+
- Real-time invoice screen (`/invoices` view) shows the same.
|
|
22
|
+
- No UI changes; backend correction only.
|
|
23
|
+
- Backfill output: invoices reclassified from overdue to on-time within the 90-day window, after ops sign-off on the audit table.
|
|
24
|
+
|
|
25
|
+
**Out of scope:**
|
|
26
|
+
- Adding new columns to `customers`. `customers.timezone` already exists.
|
|
27
|
+
- Schema migration for `invoice_evaluations`. Reuse existing table.
|
|
28
|
+
- Refactoring the broader deadline-rule engine.
|
|
29
|
+
|
|
30
|
+
**Success criteria (verifiable):**
|
|
31
|
+
- [ ] Given an invoice with deadline `2026-05-10T20:00:00Z` for customer `CUST-001` (UTC-3, deadline 17:00 local), When the billing service evaluates, Then status = on-time.
|
|
32
|
+
- [ ] Given an invoice with deadline `2026-05-10T22:00:00Z` for customer `CUST-001`, When the billing service evaluates, Then status = overdue.
|
|
33
|
+
- [ ] Backfill script outputs a CSV of (invoice_id, old_status, new_status, customer_id, tz) and writes to `audit_invoice_tz_backfill` table. No rows are flipped without a manual approval step.
|
|
34
|
+
- [ ] Latency P95 stays below 220ms in load test.
|
|
35
|
+
|
|
36
|
+
## 4) Context
|
|
37
|
+
|
|
38
|
+
### Repos and files touched
|
|
39
|
+
|
|
40
|
+
| Repo | File | Change type | Reference |
|
|
41
|
+
|------|------|-------------|-----------|
|
|
42
|
+
| billing-service | `src/main/java/com/example/billing/deadline/DeadlineEvaluator.java:42` | modify | core comparison logic |
|
|
43
|
+
| billing-service | `src/main/java/com/example/billing/deadline/DeadlineEvaluator.java:88` | new | `convertToCustomerLocal(Instant, ZoneId)` |
|
|
44
|
+
| billing-service | `src/main/resources/db/migration/V20260510__audit_invoice_tz_backfill.sql` | new | audit table migration |
|
|
45
|
+
| billing-service | `src/test/java/com/example/billing/deadline/DeadlineEvaluatorTest.java` | modify | add UTC-3 / UTC-5 cases |
|
|
46
|
+
| billing-service | `src/main/resources/scripts/backfill/invoice_deadline_tz_backfill.sql` | new | one-time backfill script |
|
|
47
|
+
|
|
48
|
+
### Patterns to follow
|
|
49
|
+
|
|
50
|
+
- **Pattern:** Customer lookup with timezone
|
|
51
|
+
**Example in codebase:** `src/main/java/com/example/billing/customer/CustomerRepository.java:33`
|
|
52
|
+
**Why follow it:** Existing helper returns customer with TZ already deserialized as `ZoneId`. Reusing it avoids a second query.
|
|
53
|
+
|
|
54
|
+
- **Pattern:** Feature flag wrapping
|
|
55
|
+
**Example in codebase:** `src/main/java/com/example/billing/dunning/DunningService.java:67`
|
|
56
|
+
**Why follow it:** Same per-customer gating model. Use flag name `deadline_local_tz_v1`.
|
|
57
|
+
|
|
58
|
+
- **Pattern:** Integration test with fixed clock
|
|
59
|
+
**Example in codebase:** `src/test/java/com/example/billing/dunning/DunningServiceTest.java:120`
|
|
60
|
+
**Why follow it:** Avoids flaky tests around DST or runtime TZ.
|
|
61
|
+
|
|
62
|
+
### External documentation
|
|
63
|
+
|
|
64
|
+
- Java `ZonedDateTime` semantics: https://docs.oracle.com/javase/8/docs/api/java/time/ZonedDateTime.html. Read the section on `withZoneSameInstant` vs `withZoneSameLocal`. We want `withZoneSameInstant`.
|
|
65
|
+
- Feature flag service ADR: `docs/adr/0014-feature-flags.md`. For `featureFlags.isEnabled` per-entity semantics.
|
|
66
|
+
|
|
67
|
+
### Known gotchas
|
|
68
|
+
|
|
69
|
+
- **DST transitions:** Some customer timezones (US-EAST, EU-WEST) observe DST. Test suite must pin to known fixed dates either side of DST boundaries to remain stable.
|
|
70
|
+
- **Null timezone:** If `customers.timezone IS NULL` (some legacy customers), fall back to UTC and log a warning at WARN level with `customer_id`. Do not throw: that would block invoice evaluation entirely.
|
|
71
|
+
- **Backfill idempotency:** Run-twice safety. The audit-table insert must be `INSERT … ON CONFLICT (invoice_id) DO NOTHING`.
|
|
72
|
+
|
|
73
|
+
## 5) Implementation blueprint
|
|
74
|
+
|
|
75
|
+
```
|
|
76
|
+
1. Add featureFlags injection to DeadlineEvaluator (see DunningService.java:67 pattern).
|
|
77
|
+
2. In DeadlineEvaluator.evaluate(Invoice invoice):
|
|
78
|
+
a. Lookup customer via CustomerRepository.findByIdWithTimezone(invoice.customerId).
|
|
79
|
+
b. If customer.timezone == null, log WARN, fall back to ZoneId.of("UTC").
|
|
80
|
+
c. Convert invoice.deadline into customer.timezone using withZoneSameInstant.
|
|
81
|
+
d. Compare in local zone, return InvoiceStatus.OVERDUE or ON_TIME.
|
|
82
|
+
3. Guard the new path with featureFlags.isEnabled("deadline_local_tz_v1", customerId).
|
|
83
|
+
If disabled, run legacy UTC comparison.
|
|
84
|
+
4. Update DeadlineEvaluatorTest with 5 cases (CUST-001 on-time/late, CUST-002 on-time/late, CUST-003 regression).
|
|
85
|
+
5. Write invoice_deadline_tz_backfill.sql with ON CONFLICT DO NOTHING and audit-only writes.
|
|
86
|
+
```
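Steps 2b and 2c can be illustrated outside Java. The sketch below is a Python analogue, not the billing-service code: `astimezone` plays the role of `withZoneSameInstant` (same instant, different wall clock), while `replace(tzinfo=...)` behaves like `withZoneSameLocal` and silently shifts the instant. It uses a fixed UTC-3 offset to stay self-contained; real code would use named zones so that DST is handled. `to_customer_local` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

UTC = timezone.utc
UTC_MINUS_3 = timezone(timedelta(hours=-3))  # fixed offset, illustration only


def to_customer_local(instant: datetime, customer_tz: Optional[timezone]) -> datetime:
    """Return the same instant on the customer's wall clock.

    Falls back to UTC when the customer has no timezone (step 2b); the PRP
    also asks for a WARN log with customer_id at that point.
    """
    if customer_tz is None:
        customer_tz = UTC
    return instant.astimezone(customer_tz)


deadline = datetime(2026, 5, 10, 20, 0, tzinfo=UTC)
local = to_customer_local(deadline, UTC_MINUS_3)

# The wrong operation: keeps 20:00 on the clock but moves the instant by 3 hours.
same_local = deadline.replace(tzinfo=UTC_MINUS_3)
```

Here `local` reads 17:00 and still equals `deadline` as an instant, whereas `same_local` is a different point in time.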
|
|
87
|
+
|
|
88
|
+
**Data:**
|
|
89
|
+
- No schema changes to operational tables.
|
|
90
|
+
- New table `audit_invoice_tz_backfill` via migration `V20260510__audit_invoice_tz_backfill.sql`. Columns: `invoice_id UUID PK`, `old_status VARCHAR(16)`, `new_status VARCHAR(16)`, `customer_id UUID`, `tz VARCHAR(64)`, `created_at TIMESTAMPTZ DEFAULT now()`.
|
|
91
|
+
- Volume: about 3% of 90 days × 50k invoices/day = about 135k rows.
|
|
92
|
+
|
|
93
|
+
**Observability:**
|
|
94
|
+
- Logs: `INFO` on flag check ("deadline_local_tz_v1 enabled for customer={customer_id}"), `WARN` on null timezone fallback.
|
|
95
|
+
- Metrics: `billing.deadline.evaluation.duration` histogram, tagged with `tz`. Dashboard: `billing/deadline-health` (existing: add a TZ panel).
|
|
96
|
+
- Alerts: P95 latency above 220ms for 5 min triggers pager. Existing alert; raise threshold from 200 to 220 for the flag-on cohort.
|
|
97
|
+
|
|
98
|
+
## 6) Validation gates
|
|
99
|
+
|
|
100
|
+
```bash
|
|
101
|
+
./gradlew :billing-service:build
|
|
102
|
+
./gradlew :billing-service:test --tests "com.example.billing.deadline.DeadlineEvaluatorTest*"
|
|
103
|
+
./gradlew :billing-service:detekt
|
|
104
|
+
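# psql has no --dry-run flag; run the script inside a transaction and roll it back instead
psql "$STAGING_DB_URL" -v ON_ERROR_STOP=1 -c 'BEGIN' -f src/main/resources/scripts/backfill/invoice_deadline_tz_backfill.sql -c 'ROLLBACK'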
|
|
105
|
+
```
|
|
106
|
+
|
|
107
|
+
**Manual verification:**
|
|
108
|
+
- [ ] In staging, set feature flag for `CUST-001`. Create one invoice with deadline `20:00 UTC` (17:00 customer-local, on the deadline). Confirm status = on-time.
|
|
109
|
+
- [ ] Run backfill script in staging. Inspect `audit_invoice_tz_backfill` for CUST-001: confirm 90-day window populated, no UTC customers included.
|
|
110
|
+
- [ ] In Grafana `billing/deadline-health`, confirm new TZ panel renders.
|
|
111
|
+
|
|
112
|
+
## 7) Rollout
|
|
113
|
+
|
|
114
|
+
- [ ] Feature flag: `deadline_local_tz_v1`, per-customer gating.
|
|
115
|
+
- [ ] Migration required: yes. `V20260510__audit_invoice_tz_backfill.sql`. Forward-only, safe; no data movement.
|
|
116
|
+
- [ ] Rollback plan: disable the flag. Drop `audit_invoice_tz_backfill` only if the backfill was applied incorrectly (the audit data is read-only, so this is reversible).
|
|
117
|
+
|
|
118
|
+
## 8) Open items
|
|
119
|
+
|
|
120
|
+
- [ ] Confirm web frontend reads invoice status from API and not from its own clock: @web-tech-lead
|
|
121
|
+
- [ ] Decide who triggers the backfill apply step in production: @ops-lead
|
|
122
|
+
|
|
123
|
+
## 9) References
|
|
124
|
+
|
|
125
|
+
- PRD: `outputs/prd/2026-05-08-billing-invoice-deadline-tz-fix.md`
|
|
126
|
+
- ADR feature flags: `docs/adr/0014-feature-flags.md`
|
|
127
|
+
- Deadline dashboard: https://grafana.example.com/d/billing-deadline-health
|
|
128
|
+
- Pattern reference: `billing-service/src/main/java/com/example/billing/dunning/DunningService.java:67`
|
|
@@ -0,0 +1,84 @@
|
|
|
1
|
+
# Pipeline
|
|
2
|
+
|
|
3
|
+
Shared rules for PRD and PRP generation. Edit retry, approval, publish, and token accounting here.
|
|
4
|
+
|
|
5
|
+
## Stages per artifact
|
|
6
|
+
|
|
7
|
+
1. Generate the artifact using the template in skills/{prd|prp}/SKILL.md.
|
|
8
|
+
2. Apply sensors listed in skills/{prd|prp}/SKILL.md.
|
|
9
|
+
3. Apply evals listed in skills/{prd|prp}/SKILL.md.
|
|
10
|
+
4. Append the approval marker. Publish hook fires.
|
|
11
|
+
|
|
12
|
+
## Retry policy
|
|
13
|
+
|
|
14
|
+
Max attempts: 3 per artifact.
|
|
15
|
+
|
|
16
|
+
Trigger on:
|
|
17
|
+
- any sensor returns non-zero
|
|
18
|
+
- eval weighted total below threshold (default 8.0)
|
|
19
|
+
|
|
20
|
+
Strategy:
|
|
21
|
+
1. Read the feedback.
|
|
22
|
+
2. Regenerate only failed or low-scoring sections.
|
|
23
|
+
3. Re-run sensors and eval.
|
|
24
|
+
|
|
25
|
+
Hard stop after 3 attempts: return blocker, do not proceed.
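
The retry policy amounts to a bounded loop that feeds feedback back into generation. A minimal sketch, assuming `generate` and `validate` are callables supplied by the caller (both hypothetical; the plugin drives this through skills and hooks rather than a single function):

```python
def run_with_retries(generate, validate, max_attempts=3):
    """Generate, validate, and retry with feedback until approved or blocked.

    generate(feedback) -> artifact; feedback is None on the first attempt.
    validate(artifact) -> (ok, feedback), covering both sensors and the eval.
    """
    feedback = None
    for attempt in range(1, max_attempts + 1):
        artifact = generate(feedback)
        ok, feedback = validate(artifact)
        if ok:
            return {"status": "approved", "attempts": attempt, "artifact": artifact}
    # Hard stop: return the blocker instead of proceeding.
    return {"status": "blocked", "attempts": max_attempts, "feedback": feedback}
```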
|
|
26
|
+
|
|
27
|
+
## Approval marker
|
|
28
|
+
|
|
29
|
+
PRD:
|
|
30
|
+
```
|
|
31
|
+
<!-- approved: {YYYY-MM-DD} score={weighted-total} -->
|
|
32
|
+
```
|
|
33
|
+
|
|
34
|
+
PRP:
|
|
35
|
+
```
|
|
36
|
+
<!-- approved: {YYYY-MM-DD} score={weighted-total} ready-for-handoff: true -->
|
|
37
|
+
```
|
|
38
|
+
|
|
39
|
+
Triggers hooks/post-eval-{prd|prp}.sh.
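
A check such as the one pre-prp-check.sh performs could recognize the marker with a regular expression. This is a sketch under the assumption that the marker appears exactly as shown above; `APPROVAL_RE` and `parse_approval` are illustrative names, and the real hook is a shell script whose logic may differ.

```python
import re

# Matches both the PRD form and the PRP form with the handoff suffix.
APPROVAL_RE = re.compile(
    r"<!-- approved: (\d{4}-\d{2}-\d{2}) score=(\d+(?:\.\d+)?)"
    r"( ready-for-handoff: true)? -->"
)


def parse_approval(text):
    """Return the approval marker's fields, or None if the artifact has no marker."""
    match = APPROVAL_RE.search(text)
    if match is None:
        return None
    return {
        "date": match.group(1),
        "score": float(match.group(2)),
        "ready_for_handoff": match.group(3) is not None,
    }
```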
|
|
40
|
+
|
|
41
|
+
## Publish
|
|
42
|
+
|
|
43
|
+
Local: file stays in outputs/{prd|prp}/.
|
|
44
|
+
|
|
45
|
+
Confluence: fires only if both JIRA_USERNAME and JIRA_API_TOKEN are set. Calls scripts/confluence-publish.py. If either credential is missing, the publish step is skipped silently.
|
|
46
|
+
|
|
47
|
+
After publish, hook appends:
|
|
48
|
+
```
|
|
49
|
+
<!-- published: {ISO-timestamp} -->
|
|
50
|
+
```
|
|
51
|
+
|
|
52
|
+
## Token accounting
|
|
53
|
+
|
|
54
|
+
Each phase brackets a measurable token window. Phases for this plugin:
|
|
55
|
+
- prd-generate: from skill invocation to first save in outputs/prd/
|
|
56
|
+
- prd-validate: from first save to approval marker (covers sensors and evals)
|
|
57
|
+
- prp-generate: from skill invocation to first save in outputs/prp/
|
|
58
|
+
- prp-validate: from first save to approval marker
|
|
59
|
+
|
|
60
|
+
Markers live in outputs/.markers/{feature_id}.{phase}.{start|end}, each with `{"timestamp": ISO, "session_id": ""}`. The skill writes the .start markers; hooks write the .end markers.
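
A minimal sketch of writing one marker file in the layout described above. `write_marker` and its `root` parameter are hypothetical; the use of a UTC timestamp is also an assumption, since the source only says ISO.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def write_marker(root: Path, feature_id: str, phase: str, edge: str, session_id: str = "") -> Path:
    """Write outputs/.markers/{feature_id}.{phase}.{start|end} under root."""
    if edge not in ("start", "end"):
        raise ValueError(f"edge must be 'start' or 'end', got {edge!r}")
    marker_dir = root / "outputs" / ".markers"
    marker_dir.mkdir(parents=True, exist_ok=True)
    path = marker_dir / f"{feature_id}.{phase}.{edge}"
    payload = {
        "timestamp": datetime.now(timezone.utc).isoformat(),  # UTC is an assumption
        "session_id": session_id,
    }
    path.write_text(json.dumps(payload), encoding="utf-8")
    return path
```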
|
|
61
|
+
|
|
62
|
+
After each artifact's eval passes, the publish hook runs scripts/token-phase.py for both of its phases. The script reads the Claude transcript JSONL, sums usage tokens (input, output, cache_read, cache_creation) within each window, and appends a phase entry to outputs/tokens/{feature_id}.json. Markers are deleted after consumption.
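
The windowed sum could be sketched as follows. This is not scripts/token-phase.py; `sum_tokens` is a hypothetical name, and the transcript shape (a top-level `timestamp` plus `message.usage` with `input_tokens`, `output_tokens`, `cache_read_input_tokens`, `cache_creation_input_tokens`) is an assumption about the JSONL format.

```python
import json
from datetime import datetime
from typing import Dict, Iterable

# Output key -> assumed usage field name in the transcript.
FIELDS = {
    "input": "input_tokens",
    "output": "output_tokens",
    "cache_read": "cache_read_input_tokens",
    "cache_creation": "cache_creation_input_tokens",
}


def _parse(stamp: str) -> datetime:
    # Accept a trailing "Z", which older fromisoformat versions reject.
    return datetime.fromisoformat(stamp.replace("Z", "+00:00"))


def sum_tokens(lines: Iterable[str], start: str, end: str) -> Dict[str, int]:
    """Sum usage tokens for entries whose timestamp falls within [start, end]."""
    lo, hi = _parse(start), _parse(end)
    totals = {key: 0 for key in FIELDS}
    for line in lines:
        try:
            entry = json.loads(line)
            stamp = _parse(entry["timestamp"])
        except (ValueError, KeyError, TypeError, AttributeError):
            continue  # skip malformed or non-timestamped lines
        if not lo <= stamp <= hi:
            continue
        message = entry.get("message")
        usage = (message.get("usage") or {}) if isinstance(message, dict) else {}
        for key, field in FIELDS.items():
            totals[key] += int(usage.get(field, 0) or 0)
    return totals
```

Skipping unreadable lines instead of raising matches the rule that token accounting never blocks publish.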
|
|
63
|
+
|
|
64
|
+
The post-eval hook also appends an inline summary comment to the artifact:
|
|
65
|
+
```
|
|
66
|
+
<!-- tokens: outputs/tokens/{feature_id}.json in={N} out={N} cache_r={N} -->
|
|
67
|
+
```
|
|
68
|
+
|
|
69
|
+
Future phases (dev, code review, launch) append new entries to the same outputs/tokens/{feature_id}.json by reusing the feature_id slug.
|
|
70
|
+
|
|
71
|
+
If the transcript is not readable, the script logs a warning and exits 0. Token accounting never blocks publish.
|
|
72
|
+
|
|
73
|
+
## Orchestrator order
|
|
74
|
+
|
|
75
|
+
1. PRD: stages 1-4.
|
|
76
|
+
2. hooks/pre-prp-check.sh validates the PRD approved marker.
|
|
77
|
+
3. PRP: stages 1-4.
|
|
78
|
+
4. Return summary.
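
The order above, with its gate between PRD and PRP, can be sketched like this. All names are hypothetical: `run_prd` and `run_prp` stand in for stages 1-4 of each artifact, and `prd_gate_ok` stands in for hooks/pre-prp-check.sh.

```python
from typing import Callable, Dict


def orchestrate(
    run_prd: Callable[[], str],
    prd_gate_ok: Callable[[], bool],
    run_prp: Callable[[], str],
) -> Dict[str, str]:
    """Run PRD, check the gate, run PRP, and return a summary."""
    summary = {"prd": run_prd(), "prp": "not started"}
    if summary["prd"] != "approved":
        summary["stopped"] = "prd blocker"
        return summary
    if not prd_gate_ok():
        # The PRP never starts without a valid PRD approved marker.
        summary["stopped"] = "pre-prp-check denied"
        return summary
    summary["prp"] = run_prp()
    if summary["prp"] != "approved":
        summary["stopped"] = "prp blocker"
    return summary
```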
|
|
79
|
+
|
|
80
|
+
## Stop conditions
|
|
81
|
+
|
|
82
|
+
- 3 failed attempts on the same artifact
|
|
83
|
+
- missing required input after one clarification
|
|
84
|
+
- pre-prp-check.sh denies PRP start
|
|
@@ -0,0 +1,27 @@
|
|
|
1
|
+
# PRD Guidelines
|
|
2
|
+
|
|
3
|
+
Rules specific to PRDs. General PM rules in product-guidelines.md.
|
|
4
|
+
|
|
5
|
+
Business only. Zero technical detail. File paths, APIs, schemas belong in the PRP.
|
|
6
|
+
|
|
7
|
+
User stories: As a {persona}, I want {action}, so that {benefit}.
|
|
8
|
+
|
|
9
|
+
Metrics need baseline, target, horizon in every row. No empty cells.
|
|
10
|
+
|
|
11
|
+
Mermaid only, never ASCII.
|
|
12
|
+
|
|
13
|
+
Gaps: `NOT FOUND - NEEDS REVIEW: {detail}`. Never invent.
|
|
14
|
+
|
|
15
|
+
Word count by stage:
|
|
16
|
+
- Team Kickoff: 300-500
|
|
17
|
+
- Planning Review: 500-800
|
|
18
|
+
- Solution Review: 800-1200
|
|
19
|
+
- Launch Readiness: 1500-2000
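
A sensor for these ranges might look like the sketch below. `word_count_ok` is a hypothetical name, and counting whitespace-separated tokens is an assumption about how words are measured.

```python
from typing import Dict, Tuple

# Inclusive word-count ranges per stage, copied from the list above.
WORD_LIMITS: Dict[str, Tuple[int, int]] = {
    "Team Kickoff": (300, 500),
    "Planning Review": (500, 800),
    "Solution Review": (800, 1200),
    "Launch Readiness": (1500, 2000),
}


def word_count_ok(stage: str, text: str) -> bool:
    """True when the whitespace-separated word count is inside the stage range."""
    low, high = WORD_LIMITS[stage]
    return low <= len(text.split()) <= high
```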
|
|
20
|
+
|
|
21
|
+
Required sections at each stage:
|
|
22
|
+
- Team Kickoff: 1, 2, 8
|
|
23
|
+
- Planning Review: + 5
|
|
24
|
+
- Solution Review: + 4, 7
|
|
25
|
+
- Launch Readiness: all
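
Because each stage adds to the previous one, the required set is cumulative. A sketch with hypothetical names; `all_sections` is passed in because the total number of template sections is not stated here.

```python
from typing import Dict, Iterable, List, Set

STAGE_ORDER: List[str] = ["Team Kickoff", "Planning Review", "Solution Review", "Launch Readiness"]

# Sections each stage adds on top of the earlier stages.
ADDED_SECTIONS: Dict[str, Set[int]] = {
    "Team Kickoff": {1, 2, 8},
    "Planning Review": {5},
    "Solution Review": {4, 7},
}


def required_sections(stage: str, all_sections: Iterable[int]) -> Set[int]:
    """Return the cumulative set of section numbers required at a stage."""
    if stage not in STAGE_ORDER:
        raise ValueError(f"unknown stage: {stage!r}")
    if stage == "Launch Readiness":
        return set(all_sections)
    required: Set[int] = set()
    for name in STAGE_ORDER[: STAGE_ORDER.index(stage) + 1]:
        required |= ADDED_SECTIONS[name]
    return required
```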
|
|
26
|
+
|
|
27
|
+
Stage progression: do not skip stages. Each gate adds requirements, never removes them.
|
|
@@ -0,0 +1,75 @@
|
|
|
1
|
+
# Product Guidelines
|
|
2
|
+
|
|
3
|
+
How PMs think about product. Read before drafting a PRD.
|
|
4
|
+
|
|
5
|
+
## We ship outcomes, not features
|
|
6
|
+
|
|
7
|
+
Every PRD touches a real user or customer cohort. Always answer:
|
|
8
|
+
- Who pays for the pain today?
|
|
9
|
+
- Whose workflow changes?
|
|
10
|
+
- What metric in the customer's business or experience moves?
|
|
11
|
+
|
|
12
|
+
If you cannot answer all three, the PRD is not ready.
|
|
13
|
+
|
|
14
|
+
## Squads
|
|
15
|
+
|
|
16
|
+
Defined per org in `context-library/business-info.md` and `context-library/squads/`. Multi-squad PRDs: name a primary squad, call out cross-squad interfaces.
|
|
17
|
+
|
|
18
|
+
## Customer hierarchy
|
|
19
|
+
|
|
20
|
+
When prioritizing, make trade-offs between cohorts explicit, not invisible. A common ranking template (override per org):
|
|
21
|
+
|
|
22
|
+
1. Paying customers with high operational impact
|
|
23
|
+
2. Reliability partners (SLAs, contracts)
|
|
24
|
+
3. End consumers (NPS, support volume)
|
|
25
|
+
4. Internal ops (cost-to-serve)
|
|
26
|
+
|
|
27
|
+
## Metrics
|
|
28
|
+
|
|
29
|
+
Prefer metrics already on existing dashboards. New metrics need an owner to instrument.
|
|
30
|
+
|
|
31
|
+
Strong examples (the shape of a strong metric — specific, owned, instrumented):
|
|
32
|
+
- error rate of X by segment
|
|
33
|
+
- first-attempt success rate
|
|
34
|
+
- latency P95 of Y action
|
|
35
|
+
- generation time for Z artifact
|
|
36
|
+
- compliance rate per cohort
|
|
37
|
+
|
|
38
|
+
Weak:
|
|
39
|
+
- "User satisfaction" with no source
|
|
40
|
+
- "Improved efficiency" with no number
|
|
41
|
+
- "Reduced friction" with no baseline
|
|
42
|
+
|
|
43
|
+
## Non-goals are mandatory
|
|
44
|
+
|
|
45
|
+
Every PRD lists at least one non-goal. If you cannot think of one, you have not scoped the feature.
|
|
46
|
+
|
|
47
|
+
## Rollout is not "ship it"
|
|
48
|
+
|
|
49
|
+
Every PRD ends with phases, audience per phase, pass criteria per phase, rollback. Feature flags by default.
|
|
50
|
+
|
|
51
|
+
## Kill criteria
|
|
52
|
+
|
|
53
|
+
Decide before launch what makes you turn the feature off. Examples:
|
|
54
|
+
- error rate above X%
|
|
55
|
+
- latency P95 above Yms
|
|
56
|
+
- adoption below Z% after N days
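
Writing the criteria down as data makes the decision mechanical. A sketch only: the class, field names, and sample thresholds are hypothetical, standing in for the X, Y, Z, and N placeholders above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class KillCriteria:
    max_error_rate_pct: float    # X
    max_latency_p95_ms: float    # Y
    min_adoption_pct: float      # Z
    adoption_window_days: int    # N


def kill_reasons(
    criteria: KillCriteria,
    error_rate_pct: float,
    latency_p95_ms: float,
    adoption_pct: float,
    days_live: int,
) -> List[str]:
    """Return every criterion that is currently breached; empty means keep running."""
    reasons: List[str] = []
    if error_rate_pct > criteria.max_error_rate_pct:
        reasons.append("error rate above threshold")
    if latency_p95_ms > criteria.max_latency_p95_ms:
        reasons.append("latency P95 above threshold")
    # Adoption is only judged once the agreed window has elapsed.
    if days_live >= criteria.adoption_window_days and adoption_pct < criteria.min_adoption_pct:
        reasons.append("adoption below threshold")
    return reasons
```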
|
|
57
|
+
|
|
58
|
+
## PRD vs PRP boundary
|
|
59
|
+
|
|
60
|
+
In PRD:
|
|
61
|
+
- why we are building
|
|
62
|
+
- who feels the pain
|
|
63
|
+
- success metric and target
|
|
64
|
+
- open questions about scope
|
|
65
|
+
|
|
66
|
+
In PRP:
|
|
67
|
+
- file paths, class names, schemas
|
|
68
|
+
- API endpoints, validation commands
|
|
69
|
+
- open questions about implementation
|
|
70
|
+
|
|
71
|
+
If the question is "should we build this", PRD. If it is "how do we build this", PRP.
|
|
72
|
+
|
|
73
|
+
## Diagrams
|
|
74
|
+
|
|
75
|
+
Mermaid always. Never ASCII.
|