universal-dev-standards 5.12.1 → 5.13.2
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/bundled/ai/standards/deployment-standards.ai.yaml +50 -2
- package/bundled/ai/standards/logging.ai.yaml +40 -3
- package/bundled/ai/standards/packaging-standards.ai.yaml +25 -2
- package/bundled/ai/standards/self-review-protocol.ai.yaml +144 -0
- package/bundled/core/deployment-standards.md +100 -2
- package/bundled/core/logging-standards.md +122 -2
- package/bundled/core/packaging-standards.md +72 -2
- package/bundled/core/self-review-protocol.md +160 -0
- package/bundled/locales/zh-CN/CHANGELOG.md +39 -3
- package/bundled/locales/zh-CN/README.md +2 -2
- package/bundled/locales/zh-CN/SECURITY.md +1 -1
- package/bundled/locales/zh-TW/CHANGELOG.md +32 -3
- package/bundled/locales/zh-TW/README.md +2 -2
- package/bundled/locales/zh-TW/SECURITY.md +1 -1
- package/bundled/locales/zh-TW/core/self-review-protocol.md +158 -0
- package/bundled/skills/contract-test-assistant/SKILL.md +7 -0
- package/bundled/skills/deploy-assistant/SKILL.md +2 -0
- package/bundled/skills/logging-guide/SKILL.md +25 -2
- package/bundled/skills/migration-assistant/SKILL.md +104 -0
- package/bundled/skills/runbook-assistant/SKILL.md +8 -0
- package/package.json +2 -2
- package/standards-registry.json +16 -4
@@ -12,11 +12,14 @@ standard:
 - "Automate build, test, deploy, and rollback"
 
 meta:
-version: "1.
-updated: "2026-05-
+version: "1.2.0"
+updated: "2026-05-26"
 source: core/deployment-standards.md
 description: >
 Safe deployment strategies, feature flags, rollback, environment parity, and DORA metrics.
+v1.2.0: Added Defensive Deployment Ordering — required extract-verify-then-delete sequence,
+PowerShell + bash verify snippets, failure mode mapping, cross-link to packaging-standards
+Archive Format Integrity (XSPEC-231 / closes issue #110).
 v1.1.0: Added environment stratification responsibility matrix and stub server CI/CD
 lifecycle rules (XSPEC-204).
 
@@ -230,6 +233,51 @@ stub_server_cicd_rules:
 (a) fully testable in UAT via real service or Level 2 stub server, OR
 (b) explicitly documented as PRD-smoke-only with a smoke test plan.
 
+defensive_ordering:
+scope: "destructive-update deploy patterns (stop → swap → start) common to Windows IIS, SystemD-managed services, and similar in-place replacement workflows."
+forbidden_ordering: |
+1. Stop service
+2. Extract new package ← may silently no-op on format mismatch
+3. Delete old install ← runs unconditionally — destroys live install
+4. Copy new install ← throws (source doesn't exist)
+5. Start service ← cannot start (binaries gone)
+required_ordering: |
+1. Stop service
+2. Extract new package → staging area (NOT directly over live install)
+3. ✅ VERIFY staging area contains expected artifacts
+↑ if verification fails: abort, do NOT touch the live install
+4. Backup live install (or earlier — both is fine)
+5. Delete old install (preserving logs / runtime data)
+6. Copy new install from staging
+7. Restore preserved configs
+8. Start service
+9. Sanity check (HTTP probe / health endpoint)
+verify_step_3:
+required: true
+skip_allowed: false
+minimum: "Test-Path / [ -f ... ] against at least one well-known file from the new package."
+preferred: "Hash-check a manifest of expected files."
+rules:
+- id: extract-to-staging-not-live
+requirement: "Step 2 MUST extract to a staging area separate from the live install path."
+priority: required
+- id: verify-before-destructive
+requirement: "Step 3 MUST verify staging contents before any destructive step touches the live install. Failure MUST abort with live install untouched."
+priority: required
+- id: backup-required
+requirement: "A backup of the live install MUST exist before step 5 (delete). Can be taken earlier."
+priority: required
+- id: preserve-runtime-data
+requirement: "Step 5 MUST exclude logs and runtime data from deletion."
+priority: required
+failure_modes_addressed:
+- wrong_format_archive: "Step 3 verify fails — live install untouched"
+- partial_extract: "Step 3 verify fails — live install untouched"
+- archive_root_changed: "Step 3 verify fails — live install untouched"
+- permission_denied_extract: "Step 3 verify fails — live install untouched"
+upstream_prevention: "Producer-side prevention via packaging-standards Archive Format Integrity. Both layers form defense-in-depth — neither alone is sufficient."
+failure_mode_reference: "PROD incident 2026-05-24: Expand-Archive silent no-op on tar-renamed-to-.zip + unconditional Remove-Item destroyed live install. ~3 minutes downtime. Step 3 verify would have aborted before destruction."
+
 physical_spec:
 type: custom_script
 validator:
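The `required_ordering` above can be sketched in Python for the extract, verify, backup, delete and copy steps (2 to 6). This is a minimal illustration and is not part of the package. The function name `defensive_deploy`, the `sentinel` parameter and the choice to preserve a top-level `logs` directory are all assumptions. Stopping and starting the service, restoring configs and the health probe (steps 1 and 7 to 9) are left out. Python's `zipfile` raises `BadZipFile` on a tar renamed to `.zip` rather than silently doing nothing, so in this sketch the verify step mainly guards against partial extracts and changed archive roots.

```python
import shutil
import zipfile
from pathlib import Path


class StagingVerificationError(RuntimeError):
    """Step 3 failed: staging does not contain the expected artifacts."""


def defensive_deploy(archive: Path, live_dir: Path, staging_dir: Path,
                     backup_dir: Path, sentinel: str,
                     preserve: frozenset[str] = frozenset({"logs"})) -> None:
    # Step 2: extract into a fresh staging directory, never over the live install.
    staging_dir.mkdir(parents=True, exist_ok=False)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(staging_dir)

    # Step 3: verify before anything destructive; abort with live_dir untouched.
    if not (staging_dir / sentinel).is_file():
        raise StagingVerificationError(
            f"{sentinel} not found in {staging_dir}; live install untouched")

    # Step 4: back up the live install (backup_dir must not exist yet).
    shutil.copytree(live_dir, backup_dir)

    # Step 5: delete the old install, keeping logs / runtime data.
    for entry in list(live_dir.iterdir()):
        if entry.name in preserve:
            continue
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)
        else:
            entry.unlink()

    # Step 6: copy the new install from staging into the emptied live directory.
    shutil.copytree(staging_dir, live_dir, dirs_exist_ok=True)
```

Because the sentinel check runs before `copytree` and the delete loop, a failed verification leaves both the live directory and the (not yet created) backup location untouched.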
@@ -3,10 +3,10 @@
 
 id: logging
 meta:
-version: "1.
-updated: "2026-
+version: "1.3.0"
+updated: "2026-05-26"
 source: core/logging-standards.md
-description: Logging levels, structured logging, and best practices
+description: Logging levels, structured logging, file rotation policy, and best practices
 
 log_levels:
 ordered:
@@ -119,6 +119,43 @@ rules:
 Correlate all three via trace_id/span_id
 priority: recommended
 
+- id: rotation-dual-trigger
+trigger: configuring file-based log sink
+instruction: |
+File-based log sinks MUST set both rotation triggers:
+1. Time-based: rollingInterval=Day (or equivalent)
+2. Size-based: fileSizeLimitBytes explicit AND rollOnFileSizeLimit=true
+Default size caps are hostile in production — Serilog silently drops at 1 GB
+if rollOnFileSizeLimit is left at default false; log4j / Winston / Python
+RotatingFileHandler likewise drop or grow unbounded without explicit config.
+Recommended starting value: fileSizeLimitBytes=104857600 (100 MB),
+retainedFileCountLimit >= N*7 where N = max expected rolls per day.
+priority: required
+
+- id: rotation-ops-sop
+trigger: log file size approaching cap
+instruction: |
+If a log file size reaches >= 90% of fileSizeLimitBytes at expected end-of-day,
+INVESTIGATE the cause (noisy retry loop, unbounded debug logging, stack-trace
+flood) BEFORE raising the cap. Raising the cap masks the noise problem.
+priority: required
+
+rotation_policy:
+must_set_both:
+time_based: rollingInterval=Day (or equivalent)
+size_based:
+fileSizeLimitBytes: explicit (100 MB recommended)
+rollOnFileSizeLimit: true
+hostile_defaults:
+serilog: silently stops at 1 GB if rollOnFileSizeLimit=false
+log4j: drops if no SizeBasedTriggeringPolicy
+python_rotatingfilehandler: grows unbounded if maxBytes unset
+winston: drops if maxSize unset
+recommended:
+fileSizeLimitBytes: 104857600
+retainedFileCountLimit_formula: "N * 7 (N = max rolls/day)"
+ops_sop: investigate noise root cause at >= 90% size before raising cap
+
 quick_reference:
 level_selection:
 columns: [Question, Level]
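A lint-style check for the `rotation-dual-trigger` rule might look like the following. It is a sketch, not a shipped validator. It assumes Serilog File sink argument names (as in an `appsettings.json` `Args` object), that an omitted `rollingInterval` means Serilog's `Infinite` default, and that an omitted `retainedFileCountLimit` means Serilog's default of 31 files. The function name and messages are hypothetical.

```python
from typing import Any

# Assumption: Serilog.Sinks.File keeps 31 files when retainedFileCountLimit is omitted.
SERILOG_DEFAULT_RETAINED = 31


def rotation_violations(args: dict[str, Any], max_rolls_per_day: int = 1) -> list[str]:
    """Check Serilog-style File sink Args against the dual-trigger rotation rule."""
    problems: list[str] = []
    # Time trigger: Serilog's RollingInterval defaults to Infinite (never rolls by time).
    if args.get("rollingInterval", "Infinite") == "Infinite":
        problems.append("time trigger missing: set rollingInterval (e.g. Day)")
    # Size trigger, part 1: the cap must be explicit, not the library default.
    if "fileSizeLimitBytes" not in args:
        problems.append("fileSizeLimitBytes not set explicitly")
    # Size trigger, part 2: without this, writes are dropped once the cap is hit.
    if args.get("rollOnFileSizeLimit") is not True:
        problems.append("rollOnFileSizeLimit must be true")
    # Retention: keep at least N * 7 files, N = max expected rolls per day.
    retained = args.get("retainedFileCountLimit", SERILOG_DEFAULT_RETAINED)
    required = max_rolls_per_day * 7
    if retained is not None and retained < required:
        problems.append(f"retainedFileCountLimit {retained} < N*7 = {required}")
    return problems
```

For example, `retainedFileCountLimit: 90` satisfies the N×7 formula for up to 12 rolls a day (12 × 7 = 84) but would be flagged at 13 rolls a day (13 × 7 = 91).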
@@ -12,8 +12,8 @@ standard:
 - "Pipeline-integrated: packaging runs between Review and Deploy in the adoption-layer pipeline"
 
 meta:
-version: "1.
-updated: "2026-
+version: "1.1.0"
+updated: "2026-05-26"
 source: core/packaging-standards.md
 
 principles:
@@ -135,6 +135,29 @@ recipe_selection_guide:
 yes: windows-installer
 no: custom-recipe-required
 
+archive_format_integrity:
+rules:
+- id: real_format_matches_extension
+requirement: "A .zip file MUST be a real ZIP archive (PKZip magic PK\\x03\\x04). A renamed POSIX tar with .zip extension is forbidden."
+priority: required
+- id: verify_before_publish
+requirement: "Packaging step MUST verify the produced archive's real format before declaring success."
+priority: required
+verification_examples:
+zip_python: "python -c \"import zipfile; zipfile.ZipFile('out.zip').namelist()\""
+zip_unix: "file out.zip # expect 'Zip archive data', NOT 'POSIX tar archive'"
+targz_unix: "tar -tzf out.tar.gz >/dev/null"
+- id: windows_recipe_compliance
+requirement: "On Windows, use PowerShell Compress-Archive or .NET ZipFile::CreateFromDirectory. Do NOT use git-bash 'tar -a -cf x.zip' — the auto-extension flag produces a POSIX tar archive."
+priority: required
+do_use:
+- "Compress-Archive -Path 'publish\\*' -DestinationPath 'dist\\patch.zip' -Force"
+- "[System.IO.Compression.ZipFile]::CreateFromDirectory(...)"
+do_not_use:
+- "tar -a -cf x.zip ... # produces POSIX tar with .zip extension on Windows tar ports"
+consumer_side_defense: "Producers cannot guarantee verification downstream. Deploy scripts MUST verify archive integrity before any destructive action — see deployment-standards Defensive Deployment Ordering."
+failure_mode_reference: "PROD incident 2026-05-24: tar-renamed-to-.zip + Expand-Archive silent no-op + unconditional Remove-Item destroyed live install. ~3 minutes downtime."
+
 physical_spec:
 type: custom_script
 validator:
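The `real_format_matches_extension` rule can be checked by reading an archive's leading bytes instead of trusting its extension, which is similar to what `file out.zip` does. The sketch below is minimal and assumes only three formats matter: a ZIP local-file header (`PK\x03\x04`), a gzip header (`\x1f\x8b`), and a ustar/PAX tar archive, which carries `ustar` at byte offset 257. The helper names are hypothetical. An empty ZIP begins with `PK\x05\x06`, so this sketch would report it as `unknown`.

```python
from pathlib import Path

ZIP_LOCAL_HEADER = b"PK\x03\x04"   # PKZip magic required by the rule
GZIP_MAGIC = b"\x1f\x8b"           # .tar.gz / .tgz start with a gzip header
USTAR_OFFSET = 257                 # POSIX ustar/PAX tar headers carry "ustar" here


def sniff_archive_format(path: Path) -> str:
    """Classify an archive by its leading bytes: 'zip', 'gzip', 'tar' or 'unknown'."""
    with open(path, "rb") as fh:
        header = fh.read(USTAR_OFFSET + 5)
    if header.startswith(ZIP_LOCAL_HEADER):
        return "zip"
    if header.startswith(GZIP_MAGIC):
        return "gzip"
    if header[USTAR_OFFSET:USTAR_OFFSET + 5] == b"ustar":
        return "tar"
    return "unknown"


def require_real_zip(path: Path) -> None:
    """Fail the packaging step if a .zip file is not really a ZIP archive."""
    actual = sniff_archive_format(path)
    if actual != "zip":
        raise ValueError(f"{path.name}: extension says zip, content says {actual!r}")
```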
@@ -0,0 +1,144 @@
+# Self-Review Protocol - AI Optimized
+# Source: core/self-review-protocol.md
+
+id: self-review-protocol
+meta:
+version: "1.0.0"
+updated: "2026-05-26"
+source: core/self-review-protocol.md
+description: Mandatory self-review pass on large markdown edits before commit; catches 6 categories of internal cross-reference inconsistencies that internal reasoning routinely misses
+
+trigger_conditions:
+mandatory:
+condition: commit modifies > 50 lines of markdown
+artefact_types:
+- ADR (architecture decision records, DEC-NNN)
+- XSPEC (cross-project specs)
+- XSPEC SDD Deltas
+- SKILL.md (Claude Code custom skills)
+- ARCHITECTURE.md
+- API.md
+- DEPLOYMENT.md
+- MIGRATION.md
+- runbooks
+- playbooks
+- README.md (when modifying major sections)
+optional:
+condition: commit modifies <= 50 lines of markdown
+rationale: small edits rarely have cross-reference risk
+not_applicable:
+condition: code or config only changes
+rationale: covered by lint, test, and code review
+
+inconsistency_categories:
+- id: 1
+name: diagram_step_mismatch
+description: Diagram or flow chart out of sync with step list
+example: workflow diagram has 7 boxes but document defines 8 steps
+check: count diagram nodes vs `## Step N:` / `## N.` headers
+- id: 2
+name: changelog_reference_error
+description: Changelog entry references wrong anchor
+example: changelog says "Step 1 added X" but X actually at Step 0
+check: for each changelog line, grep the anchor it references
+- id: 3
+name: count_drift
+description: Explicit number in text out of sync with actual count
+example: "self-audit has 4 questions" but list has 7
+check: grep for "N questions", "N rows", "N items" and verify
+- id: 4
+name: stale_template
+description: Hardcoded model names, tool versions, dates not updated
+example: commit template hardcodes Claude Sonnet 4.6 when model varies
+check: find hardcoded specifics; replace with placeholders or update
+- id: 5
+name: wrong_tool_reference
+description: Recommended CLI command does not do what described
+example: recommends `claude --version` for model name (shows CLI version)
+check: for each CLI command mentioned, mental check or `--help` verify
+- id: 6
+name: placeholder_rule_misalignment
+description: Example contradicts current rule or latest case experience
+example: example shows D1/D2/D3 but rule says D3 not mandatory
+check: every concrete value in examples consistent with current rules
+
+procedure:
+step_1:
+name: re_read_full_file
+description: Use file-reading tool to read entire file (not just diff)
+when: after editing, before committing
+step_2:
+name: walk_categories
+description: Apply 6 categories above against the file
+step_3:
+name: fix_in_same_commit
+description: If issues found, edit in place and include fixes in same commit
+rationale: ship-and-patch creates follow-up commits; in-place fix is cleaner
+step_4:
+name: patch_if_already_committed
+description: If issues found after commit, create patch commit (e.g., v1.2.1 fixes v1.2.0)
+
+recording_formats:
+skill_md:
+location: changelog line in frontmatter or near top
+format: '> **v{X.Y.(Z+1)} Self-review pass {YYYY-MM-DD}**: {N} issues found, {M} fixed in same commit'
+adr_dec:
+location: '## Follow-up Tracking table'
+format: '| Self-review pass | This DEC | ✅ YYYY-MM-DD (6 categories, no issues) |'
+xspec_sdd_delta:
+location: after non-modification list section (e.g. §N.6)
+format: '> Self-review pass: YYYY-MM-DD (6 categories, no issues)'
+commit_message_body:
+location: last line of commit body
+format: 'Self-review (protocol v1.0.0): N issues found, M applied in same commit / 0 found.'
+
+distinction_from_other_practices:
+code_review:
+covers: code correctness, design, security
+trigger: before merging code PR
+source_standard: code-review.md
+content_self_audit:
+covers: content completeness (all required sections present)
+trigger: each artefact creation
+example: eval-source skill 7-question audit
+self_review_protocol:
+covers: internal cross-reference consistency (form, not content)
+trigger: after large markdown edit, before commit
+source_standard: self-review-protocol.md (this standard)
+peer_review:
+covers: independent perspective, blast radius assessment
+trigger: significant changes
+
+anti_patterns:
+- skipping_because_diff_small:
+problem: small diffs in large files often introduce cross-ref errors elsewhere
+mitigation: trigger is whole-file size, not diff size
+- reviewing_diff_only:
+problem: cross-ref errors may live in unchanged sections referencing changed content
+mitigation: re-read whole file, not just diff
+- document_without_practice:
+problem: discipline is in the practice, not the documentation
+mitigation: enforce in commit checklist
+- substitute_for_peer_review:
+problem: self-review catches inconsistencies, not design flaws
+mitigation: keep peer review for significant changes
+
+verification:
+metrics:
+- patch_commit_ratio:
+description: ratio of v1.X.0 -> v1.X.1 follow-up patches for same artefact
+target: significant drop after adopting; eval-source went from 100% to 0% after v1.3.0
+- issue_surface_time:
+description: issues caught by self-review (pre-commit) vs by next reader (post-commit)
+target: pre-commit grows, post-commit shrinks
+
+examples_in_the_wild:
+- artefact: dev-platform/.claude/skills/eval-source/SKILL.md
+version: v1.3.0
+commit: 6b45c5d
+note: first SKILL.md edit to pass self-review pre-commit; preceded by 2 patch cycles (v1.1.0->v1.1.1 with 3 issues, v1.2.0->v1.2.1 with 6 issues) that motivated this standard
+
+self_review:
+date: "2026-05-26"
+issues_found: 0
+notes: First draft self-review pass; no internal inconsistencies detected
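The mandatory, optional and not-applicable split in `trigger_conditions` could be evaluated from the output of `git diff --cached --numstat` before committing. This sketch rests on two assumptions. First, it reads "modifies N lines" as added plus deleted lines in `.md` files. Second, it treats a commit with no markdown changes as not applicable. It implements the commit-level threshold only, not the whole-file mitigation listed under `anti_patterns`. The function names are hypothetical.

```python
def markdown_lines_modified(numstat: str) -> int:
    """Sum added + deleted lines for .md files in `git diff --numstat` output."""
    total = 0
    for line in numstat.splitlines():
        parts = line.split("\t")
        if len(parts) != 3:
            continue
        added, deleted, path = parts
        # Binary files are reported as "-\t-\tpath"; skip them and non-markdown paths.
        if added == "-" or not path.lower().endswith(".md"):
            continue
        total += int(added) + int(deleted)
    return total


def self_review_requirement(numstat: str, threshold: int = 50) -> str:
    """Map a commit's markdown churn onto the trigger_conditions levels."""
    changed = markdown_lines_modified(numstat)
    if changed == 0:
        return "not_applicable"   # code or config only changes
    if changed > threshold:
        return "mandatory"
    return "optional"
```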
@@ -2,8 +2,8 @@
 
 > **Language**: English | [繁體中文](../locales/zh-TW/core/deployment-standards.md)
 
-**Version**: 1.
-**Last Updated**: 2026-
+**Version**: 1.1.0
+**Last Updated**: 2026-05-26
 **Applicability**: All software projects with deployment pipelines
 **Scope**: universal
 **Industry Standards**: Twelve-Factor App, Google SRE — Release Engineering, DORA State of DevOps
@@ -219,6 +219,103 @@ Strategies are not mutually exclusive. Common combinations:
 
 ---
 
+## Defensive Deployment Ordering
+
+When a deploy script replaces a running install (the destructive-update pattern common to Windows IIS, SystemD-managed services, or any "stop → swap → start" workflow), the ordering of destructive steps relative to verification is non-negotiable.
+
+### The forbidden ordering
+
+```
+1. Stop service
+2. Extract new package ← may silently no-op on format mismatch
+3. Delete old install ← runs unconditionally — destroys the running install
+4. Copy new install ← throws (source doesn't exist)
+5. Start service ← cannot start (binaries gone)
+```
+
+If step 2 silently fails (corrupt archive, wrong format, disk full, permissions), step 3 still runs and **destroys the running install**, leaving nothing to recover from except backup. Backup helps for full rollback but does NOT prevent the outage window — the service is already down.
+
+### The required ordering — extract, verify, then delete
+
+The destructive deploy ordering **MUST** be:
+
+```
+1. Stop service
+2. Extract new package → staging area (NOT directly over live install)
+3. ✅ VERIFY staging area contains expected artifacts
+↑ if verification fails: abort, do NOT touch the live install
+4. Backup live install (or done earlier — both is fine)
+5. Delete old install (preserving logs / runtime data)
+6. Copy new install from staging
+7. Restore preserved configs
+8. Start service
+9. Sanity check (HTTP probe / health endpoint)
+```
+
+**Step 3 verification is non-negotiable.** Minimum verification is checking that at least one well-known file from the new package exists in the staging area. Hash-checking a manifest of expected files is preferred when available.
+
+### Verification snippets
+
+**PowerShell** (Windows IIS deploy):
+
+```powershell
+$staging = "C:\deploy\staging-$(Get-Date -Format yyyyMMddHHmmss)"
+Expand-Archive -Path $zipPath -DestinationPath $staging -Force
+
+# Non-negotiable: verify staging before touching live install
+if (-not (Test-Path "$staging\api\MyApp.dll")) {
+throw "Expected $staging\api\MyApp.dll not found — archive may be corrupt or wrong format. Aborting deploy. Live install untouched."
+}
+
+# Only NOW touch live install
+Copy-Item "$apiDir" "$backupDir" -Recurse -Force
+Get-ChildItem $apiDir -Exclude logs | Remove-Item -Recurse -Force
+Copy-Item "$staging\api\*" $apiDir -Recurse
+```
+
+**bash** (Linux SystemD-managed service):
+
+```bash
+set -euo pipefail
+
+STAGING="/srv/deploy/staging-$(date +%Y%m%d%H%M%S)"
+mkdir -p "$STAGING"
+tar -xzf "$ARCHIVE" -C "$STAGING"
+
+# Non-negotiable: verify staging before touching live install
+if [ ! -f "$STAGING/bin/myapp" ]; then
+echo "ERROR: Expected $STAGING/bin/myapp not found. Aborting deploy. Live install untouched." >&2
+exit 1
+fi
+
+# Only NOW touch live install
+systemctl stop myapp
+cp -a "$LIVE_DIR" "$BACKUP_DIR"
+find "$LIVE_DIR" -mindepth 1 -not -path "$LIVE_DIR/logs*" -delete
+cp -a "$STAGING"/* "$LIVE_DIR/"
+systemctl start myapp
+```
+
+### Failure modes addressed
+
+| Failure mode | What protects against it |
+|---|---|
+| Archive is wrong format (e.g., tar renamed to `.zip`) | Step 3 verify fails — live install untouched |
+| Partial extract (disk full mid-extract) | Step 3 verify fails — live install untouched |
+| Archive root structure changed (extra wrapper folder, missing key file) | Step 3 verify fails — live install untouched |
+| Permissions issue (extract step had read but not write) | Step 3 verify fails — live install untouched |
+| Backup script itself fails | Optional secondary check after step 4 |
+
+### Upstream prevention
+
+Verifying at the consumer side is the last line of defense. The **upstream** prevention — refusing to produce a misformatted archive in the first place — is covered by [Packaging Standards — Archive Format Integrity](packaging-standards.md#archive-format-integrity). Both layers together form a defense-in-depth pair; neither alone is sufficient.
+
+### Failure mode reference (real incident)
+
+A Windows IIS production deploy script (2026-05-24) ran `Expand-Archive` against a tar-renamed-to-`.zip` archive (silent no-op), then `Remove-Item -Recurse` against the live `apiDir`, then `Copy-Item` from a source that did not exist (because nothing had been extracted). The live install was wiped, AppPool stopped, production was down for ~3 minutes until backup-based rollback completed. Adding step 3 verify (`Test-Path "$staging/api/MyApp.dll"`) would have aborted the deploy at the staging stage with the live install untouched.
+
+---
+
 ## Post-Deployment Checklist
 
 ### Immediate (< 5 minutes)
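The preferred form of step 3, hash-checking a manifest of expected files, could be implemented roughly as follows. The manifest format is an assumption made for illustration: a mapping from relative path to SHA-256 hex digest, produced at packaging time. The function names are also hypothetical. An empty result means the staging area matches, and only then should the destructive steps proceed.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Stream a file through SHA-256 so large binaries are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_staging_manifest(staging: Path, manifest: dict[str, str]) -> list[str]:
    """Compare staged files against {relative_path: sha256_hex}; return the problems found."""
    problems = []
    for rel_path, expected in sorted(manifest.items()):
        candidate = staging / rel_path
        if not candidate.is_file():
            problems.append(f"missing: {rel_path}")
        elif sha256_of(candidate) != expected.lower():
            problems.append(f"hash mismatch: {rel_path}")
    return problems
```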
@@ -334,6 +431,7 @@ Smoke test failure MUST block the deployment from proceeding and trigger a rollb
 
 | Version | Date | Changes |
 |---------|------|---------|
+| 1.1.0 | 2026-05-26 | Added: Defensive Deployment Ordering section — required extract-verify-then-delete sequence, PowerShell + bash verify snippets, failure mode mapping, cross-link to packaging-standards Archive Format Integrity (XSPEC-231 / closes issue #110) |
 | 1.0.0 | 2026-02-09 | Initial release |
 
 ---
@@ -2,8 +2,8 @@
 
 > **Language**: English | [繁體中文](../locales/zh-TW/core/logging-standards.md)
 
-**Version**: 1.
-**Last Updated**: 2026-
+**Version**: 1.3.0
+**Last Updated**: 2026-05-26
 **Applicability**: All software projects
 **Scope**: universal
 **Industry Standards**: RFC 5424, OpenTelemetry, W3C Trace Context
@@ -595,6 +595,117 @@ For endpoints called thousands of times per second:
 | WARN | 90 days |
 | ERROR/FATAL | 1 year |
 
+---
+
+## Log File Rotation Policy
+
+### Rotation policy — MUST set both
+
+A file-based log sink configuration **MUST** include **both** triggers:
+
+1. **Time-based rotation** (`rollingInterval: Day` or equivalent) — for chronological partitioning
+2. **Size-based rotation** with `rollOnFileSizeLimit: true` (or equivalent) — to handle volume spikes
+
+> **Why mandatory:** Most logging libraries ship with a silent default size cap. When the file hits the cap, subsequent log writes are **dropped silently** — no warning, no error. The application keeps running while half a day of logs vanish. Setting both triggers explicitly defeats this trap.
+
+### Default cap is hostile in production
+
+| Library | Default size cap | Behavior when cap hit |
+|---|---|---|
+| Serilog File sink (.NET) | 1 GB | **Silently stops writing** (`RollOnFileSizeLimit = false` by default) |
+| log4j RollingFileAppender | none unless set | Same — no roll = drops |
+| Python `RotatingFileHandler` | infinite unless `maxBytes` set | Grows unbounded |
+| Winston `winston-daily-rotate-file` | none unless `maxSize` set | Same — no roll = drops |
+
+If you do not explicitly configure size-based rotation, you are accepting one of the failure modes above.
+
+### Recommended starting values
+
+| Parameter | Value | Rationale |
+|---|---|---|
+| `fileSizeLimitBytes` | 100 MB | Balance: small enough to open in an editor, large enough to avoid excessive rolls |
+| `rollOnFileSizeLimit` | `true` | When cap hit, create `*-001.txt`, `*-002.txt`; do **NOT** drop |
+| `retainedFileCountLimit` | ≥ N×7 where N = max expected rolls/day | Avoid premature deletion of in-window logs |
+
+### Recipes per language
+
+**.NET / Serilog** (`appsettings.json`):
+
+```json
+{
+"Serilog": {
+"WriteTo": [{
+"Name": "File",
+"Args": {
+"path": "logs/app-.txt",
+"rollingInterval": "Day",
+"fileSizeLimitBytes": 104857600,
+"rollOnFileSizeLimit": true,
+"retainedFileCountLimit": 90
+}
+}]
+}
+}
+```
+
+**Python** (`logging.handlers`):
+
+```python
+from logging.handlers import RotatingFileHandler
+
+handler = RotatingFileHandler(
+filename="logs/app.log",
+maxBytes=104857600, # 100 MB
+backupCount=90 # ~3 months of rolls assuming low cardinality
+)
+# For combined time+size rotation, compose TimedRotatingFileHandler with size check
+# or use a third-party library such as concurrent-log-handler.
+```
+
+**Java / log4j2** (`log4j2.xml`):
+
+```xml
+<RollingFile name="App" fileName="logs/app.log"
+filePattern="logs/app-%d{yyyy-MM-dd}-%i.log.gz">
+<PatternLayout pattern="%d %-5p %c{1.} - %m%n"/>
+<Policies>
+<TimeBasedTriggeringPolicy interval="1"/>
+<SizeBasedTriggeringPolicy size="100 MB"/>
+</Policies>
+<DefaultRolloverStrategy max="90"/>
+</RollingFile>
+```
+
+**Node / Winston** (`winston-daily-rotate-file`):
+
+```javascript
+import DailyRotateFile from "winston-daily-rotate-file";
+
+new DailyRotateFile({
+filename: "logs/app-%DATE%.log",
+datePattern: "YYYY-MM-DD",
+maxSize: "100m",
+maxFiles: "90d"
+});
+```
+
+### Operational SOP — investigate, don't just raise the cap
+
+If a log file size reaches ≥ 90% of `fileSizeLimitBytes` at expected end-of-day, **investigate the cause before raising the cap**. Typical root causes:
+
+- Noisy retry loop logging every attempt at INFO instead of WARN summary
+- Unbounded debug logging accidentally enabled in production
+- Stack-trace flood from one upstream failure
+- Health probe / sidecar polluting the business log
+
+Raising the cap masks the underlying noise problem and pushes the next outage further out.
+
+### Failure-mode reference (real incident)
+
+A production .NET Worker using only `rollingInterval: Day` (no size limit set, Serilog default 1 GB cap) hit the cap at 07:31 and silently dropped every log entry until 13:00+ when the operator noticed the tail was stale. Five consecutive daily files showed `~1,073,741,8XX bytes` (= 1 GiB exactly, Serilog default). Half a day of production diagnostics were lost. Setting `fileSizeLimitBytes` + `rollOnFileSizeLimit: true` would have rolled to `worker-YYYYMMDD_001.txt` and preserved the events.
+
+---
+
 ## Quick Reference Card
 
 ### Log Level Selection
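The Python recipe in this hunk notes that combining time and size triggers takes extra work. One minimal approach, offered as an assumption rather than a recommended library, is to subclass the standard `RotatingFileHandler` (size trigger, numbered `.1`, `.2` backups) and also roll when the calendar day changes. The class name and the injectable `today` clock are hypothetical. Rotated files are numbered, not date-stamped.

```python
import datetime
from logging.handlers import RotatingFileHandler
from typing import Callable, Optional


class DailyAndSizeRotatingFileHandler(RotatingFileHandler):
    """Roll over when maxBytes would be exceeded OR when the calendar day changes."""

    def __init__(self, filename: str, maxBytes: int, backupCount: int,
                 encoding: Optional[str] = "utf-8",
                 today: Callable[[], datetime.date] = datetime.date.today) -> None:
        # maxBytes > 0 makes the parent class enforce the size trigger.
        super().__init__(filename, maxBytes=maxBytes, backupCount=backupCount,
                         encoding=encoding)
        self._today = today
        self._current_day = today()

    def shouldRollover(self, record) -> bool:
        # Time trigger: the first record written on a new day starts a new file.
        if self._today() != self._current_day:
            return True
        # Size trigger: defer to RotatingFileHandler's byte check.
        return bool(super().shouldRollover(record))

    def doRollover(self) -> None:
        super().doRollover()  # renames app.log -> app.log.1, shifting older backups
        self._current_day = self._today()
```

The `today` parameter exists so the day boundary can be simulated in tests. In production, the default `datetime.date.today` uses local time.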
@@ -623,6 +734,14 @@ App cannot continue? → FATAL
 - [ ] Credit cards never logged
 - [ ] Retention policies configured
 
+### Rotation Checklist
+
+- [ ] Time-based rotation set (`rollingInterval: Day` or equivalent)
+- [ ] Size-based rotation set with `rollOnFileSizeLimit: true` (or equivalent)
+- [ ] `fileSizeLimitBytes` explicitly configured (default cap is hostile)
+- [ ] `retainedFileCountLimit` ≥ N×7 to cover within-window rolls
+- [ ] 90% size SOP defined: investigate noise root cause, do not just raise cap
+
 ---
 
 **Related Standards:**
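The 90% size SOP in the checklist can be backed by a small scheduled scan that runs near the expected end of day. This sketch makes two assumptions: the log files are Serilog-style `*.txt` files in a single directory, and the byte limit passed in equals the configured `fileSizeLimitBytes`. The function name is hypothetical. The scan only flags files for investigation and deliberately does not change the cap.

```python
from pathlib import Path


def files_near_cap(log_dir: Path, file_size_limit_bytes: int, threshold: float = 0.9,
                   pattern: str = "*.txt") -> list[tuple[str, float]]:
    """List log files whose size is >= threshold * cap, with their fill ratio."""
    flagged = []
    for path in sorted(log_dir.glob(pattern)):
        if not path.is_file():
            continue
        ratio = path.stat().st_size / file_size_limit_bytes
        if ratio >= threshold:
            # Candidate for root-cause investigation (retry loops, debug logging, ...).
            flagged.append((path.name, round(ratio, 3)))
    return flagged
```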
@@ -635,6 +754,7 @@ App cannot continue? → FATAL
 
 | Version | Date | Changes |
 |---------|------|---------|
+| 1.3.0 | 2026-05-26 | Added: Log File Rotation Policy — mandatory dual-trigger (time + size) rotation with hostile-default warning, recipes for .NET/Python/Java/Node, ops SOP (XSPEC-232 / closes issue #111) |
 | 1.2.0 | 2026-01-24 | Added: OpenTelemetry Semantic Conventions, Observability Three Pillars Integration, Log-based Alerting, Advanced Correlation Patterns |
 | 1.1.0 | 2026-01-05 | Added: References section with OWASP, RFC 5424, OpenTelemetry, and 12 Factor App |
 | 1.0.0 | 2025-12-30 | Initial logging standards |
@@ -2,8 +2,8 @@
|
|
|
2
2
|
|
|
3
3
|
> **Language**: English | [繁體中文](../locales/zh-TW/core/packaging-standards.md)
|
|
4
4
|
|
|
5
|
-
**Version**: 1.
|
|
6
|
-
**Last Updated**: 2026-
|
|
5
|
+
**Version**: 1.1.0
|
|
6
|
+
**Last Updated**: 2026-05-26
|
|
7
7
|
**Applicability**: Projects using a UDS-aware toolchain
|
|
8
8
|
**Scope**: universal
|
|
9
9
|
|
|
@@ -194,6 +194,75 @@ A packaging run is considered **successful** when ALL of the following condition
|
|
|
194
194
|
|
|
195
195
|
---
|
|
196
196
|
|
|
197
|
+
## Archive Format Integrity
|
|
198
|
+
|
|
199
|
+
When a packaging step produces an archive (`.zip`, `.tar.gz`, `.tar.bz2`, etc.) that will be consumed by a deploy script, the **real binary format MUST match the file extension**. A file named `.zip` MUST be a real ZIP archive (PKZip magic `PK\x03\x04`), not a renamed tar archive.
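A packaging step can check the rule above without external tools by comparing the file's leading bytes with what the extension promises. The following is a minimal Python sketch that covers only the formats this section names. The function names are hypothetical. The signatures checked are `PK\x03\x04` (the ZIP local file header mentioned above), `PK\x05\x06` (an empty ZIP), `\x1f\x8b` (gzip), `BZh` (bzip2), and `ustar` at byte offset 257 (an uncompressed POSIX tar header).

```python
from pathlib import Path

# Leading signatures ("magic bytes") for the formats this section covers.
SIGNATURES = [
    (b"PK\x03\x04", "zip"),   # ZIP local file header (non-empty archive)
    (b"PK\x05\x06", "zip"),   # ZIP end-of-central-directory only (empty archive)
    (b"\x1f\x8b", "gzip"),    # .tar.gz / .tgz
    (b"BZh", "bzip2"),        # .tar.bz2
]

# Format each extension promises (last suffix only, so ".tar.gz" -> ".gz").
EXPECTED_BY_SUFFIX = {".zip": "zip", ".gz": "gzip", ".tgz": "gzip", ".bz2": "bzip2"}


def sniff_format(path):
    """Return the real container format from the file header, or None if unknown."""
    with open(path, "rb") as f:
        head = f.read(512)
    for magic, fmt in SIGNATURES:
        if head.startswith(magic):
            return fmt
    # Uncompressed POSIX/GNU tar stores "ustar" at offset 257 of the first header.
    if head[257:262] == b"ustar":
        return "tar"
    return None


def format_matches_extension(path):
    """True only when the real format is the one the extension promises."""
    expected = EXPECTED_BY_SUFFIX.get(Path(path).suffix.lower())
    return expected is not None and sniff_format(path) == expected
```

A tar archive renamed to `.zip` is reported as `"tar"`, so `format_matches_extension` returns `False` and the pipeline can fail before publishing.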
> **Why mandatory:** mismatched archive formats trigger silent failures downstream. PowerShell's `Expand-Archive` and `[System.IO.Compression.ZipFile]::ExtractToDirectory()` accept tar-renamed-to-`.zip` **without raising an error** — the file is read, nothing is extracted, no exception. If the next step of the deploy script is destructive (e.g., "delete current install directory"), the live install is destroyed with nothing to replace it.

### Verification before publish
Every packaging step that produces an archive **MUST** include format verification before declaring success. Minimum verification:

| Format | Verification one-liner |
|---|---|
| `.zip` | `python -c "import zipfile; zipfile.ZipFile('out.zip').namelist()"` must succeed |
| `.zip` (Unix) | `file out.zip` must report `Zip archive data`, **NOT** `POSIX tar archive` |
| `.tar.gz` | `tar -tzf out.tar.gz >/dev/null` must succeed |
| any | optional: hash a manifest of expected files and compare |

Verification failure MUST abort the packaging pipeline before publish.
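The table's format checks, its optional manifest comparison, and the abort rule can be combined into a single gate. The following sketch is illustrative only, not a prescribed tool. `verify_or_abort` and its helpers are hypothetical names, and the "manifest" here is simply the set of files under the publish directory. Only `.zip` and `.tar.gz`/`.tgz` are handled. `sys.exit` with a message exits with status 1, which is what stops a typical CI job.

```python
import sys
import tarfile
import zipfile
from pathlib import Path


def _norm(name):
    """Normalize an entry name: forward slashes, no leading './'."""
    name = name.replace("\\", "/")
    while name.startswith("./"):
        name = name[2:]
    return name


def archive_files(archive):
    """Return normalized regular-file entry names; raise if the format is wrong."""
    lower = archive.name.lower()
    if lower.endswith(".zip"):
        # Raises zipfile.BadZipFile for a tar (or anything else) named .zip.
        with zipfile.ZipFile(archive) as zf:
            bad = zf.testzip()  # CRC-checks every member, None if all good
            if bad is not None:
                raise zipfile.BadZipFile(f"corrupt member: {bad}")
            names = {_norm(n) for n in zf.namelist()}
        return {n for n in names if not n.endswith("/")}
    if lower.endswith((".tar.gz", ".tgz")):
        # Mode "r:gz" raises tarfile.ReadError if the data is not gzip-compressed.
        with tarfile.open(archive, "r:gz") as tf:
            return {_norm(m.name) for m in tf.getmembers() if m.isfile()}
    raise ValueError(f"unsupported archive type: {archive.name}")


def expected_files(publish_dir):
    """The 'manifest': every regular file under publish_dir, relative to it."""
    root = Path(publish_dir)
    return {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()}


def verify_or_abort(archive, publish_dir):
    """Exit non-zero (aborting the pipeline) unless the archive is valid and complete."""
    try:
        actual = archive_files(Path(archive))
    except (zipfile.BadZipFile, tarfile.TarError, ValueError, OSError, EOFError) as exc:
        sys.exit(f"archive verification failed: {exc}")
    missing = sorted(expected_files(publish_dir) - actual)
    if missing:
        sys.exit(f"archive is missing {len(missing)} file(s), first: {missing[0]}")
```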
### Platform-specific recipes

**Windows — DO use:**
```powershell
# Option A: PowerShell built-in (produces real ZIP)
Compress-Archive -Path "publish\*" -DestinationPath "dist\patch.zip" -Force

# Option B: .NET API (produces real ZIP)
# Unlike -Force above, CreateFromDirectory throws if dist\patch.zip already exists.
# .NET resolves relative paths against the process working directory, not PowerShell's current location.
Add-Type -Assembly System.IO.Compression.FileSystem
[System.IO.Compression.ZipFile]::CreateFromDirectory(
    "publish", "dist\patch.zip", "Optimal", $false
)
```

**Windows — DO NOT use:**
```bash
# ❌ git-bash / busybox tar -a -cf is UNRELIABLE on Windows
# GNU tar (as shipped with git-bash) uses -a only to choose a compression program
# from the suffix; ".zip" is not one it recognizes, so it writes an uncompressed
# POSIX tar archive that merely carries a .zip extension.
# `file patch.zip` → "POSIX tar archive (GNU)" (not "Zip archive data")
cd publish && tar -a -cf "../dist/patch.zip" api/
```

**Unix-like — DO use:**
```bash
# Use 'zip' for ZIP archives (BSD/Linux)
zip -r dist/patch.zip publish/

# Use 'tar -czf' (without -a) for tar.gz archives — explicit, deterministic
tar -czf dist/patch.tar.gz publish/

# Verify before publishing
file dist/patch.zip   # expect "Zip archive data"
python -c "import zipfile; zipfile.ZipFile('dist/patch.zip').namelist()"
```
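The verification one-liners above already assume that Python is available on the build agent. In that case, one portable alternative to the shell recipes is `shutil.make_archive`. It writes ZIP files through the `zipfile` module on every platform, so the output is a real ZIP no matter which `tar` happens to be installed. This is a minimal sketch, and `build_zip` is a hypothetical helper. Entries are stored relative to `publish_dir` (no `publish/` prefix), as in the Windows recipes.

```python
import shutil
import zipfile
from pathlib import Path


def build_zip(publish_dir, dist_dir, name="patch"):
    """Create dist_dir/<name>.zip from publish_dir and verify it before returning."""
    dist = Path(dist_dir)
    dist.mkdir(parents=True, exist_ok=True)
    # make_archive appends ".zip" itself; root_dir makes entry paths relative
    # to publish_dir, so "publish/" does not appear inside the archive.
    archive = Path(shutil.make_archive(str(dist / name), "zip", root_dir=str(publish_dir)))
    # Verify before declaring success: ZipFile raises BadZipFile on a non-ZIP,
    # and testzip() returns the first member with a bad CRC (None if all good).
    with zipfile.ZipFile(archive) as zf:
        bad = zf.testzip()
    if bad is not None:
        raise zipfile.BadZipFile(f"corrupt member in {archive}: {bad}")
    return archive
```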
### Consumer-side defense

Producers cannot guarantee that consumers verify. Consumers (deploy scripts) **MUST** verify archive integrity before any destructive action. See [Deployment Standards — Defensive Deployment Ordering](deployment-standards.md#defensive-deployment-ordering) for the consumer-side requirement.
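The following Python sketch illustrates the extract-verify-then-delete ordering that the consumer-side rule requires. It is a hypothetical stand-in for a PowerShell deploy script. `safe_deploy`, the `required` file list, the `deploy-` staging prefix, and the `.bak` backup convention are all assumptions. What matters is the order of operations: the live directory is touched only after extraction has succeeded and the expected files have been found in staging.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def safe_deploy(archive, install_dir, required):
    """Extract -> verify -> only then replace the live install (keeping a backup)."""
    install = Path(install_dir)
    # Stage next to the install so the final rename stays on one filesystem.
    # Note: mkdtemp creates the directory with owner-only permissions.
    staging = Path(tempfile.mkdtemp(prefix="deploy-", dir=install.parent))
    try:
        # 1. Extract. ZipFile raises BadZipFile for a tar renamed to .zip,
        #    instead of "succeeding" with nothing extracted.
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(staging)
        # 2. Verify the extract output before any destructive step.
        missing = [p for p in required if not (staging / p).is_file()]
        if missing:
            raise RuntimeError(f"extract incomplete, live install untouched: {missing}")
    except Exception:
        shutil.rmtree(staging, ignore_errors=True)
        raise
    # 3. Destructive steps run only after steps 1 and 2 succeeded.
    backup = install.with_name(install.name + ".bak")
    if backup.exists():
        shutil.rmtree(backup)
    if install.exists():
        install.rename(backup)
    staging.rename(install)
    return backup
```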
### Failure mode reference (real incident)

A Windows IIS production deployment (2026-05-24) used `tar -a -cf patch.zip api/` in git-bash to produce its release archive. The consumer-side PowerShell deploy script then ran `Expand-Archive` (silent no-op on the tar-renamed file), proceeded to `Remove-Item -Recurse` the live `apiDir`, then `Copy-Item` from a source that did not exist (because nothing had been extracted). The live install was wiped, AppPool stopped, and production was down for ~3 minutes until backup-based rollback completed.

Together, (a) the producer relying on `tar -a` to choose the format from the extension and (b) the consumer not verifying the extracted output destroyed the running install without an error being raised at any step.

---

## Related Standards

- [Deployment Standards](deployment-standards.md) — Deploy stage that follows packaging

@@ -207,6 +276,7 @@ A packaging run is considered **successful** when ALL of the following condition

| Version | Date | Changes |
|---------|------|---------|
| 1.1.0 | 2026-05-26 | Added: Archive Format Integrity section — real-format-must-match-extension rule, verification one-liners, Windows recipe DO/DON'T list, real incident reference (XSPEC-231 / closes issue #113) |
| 1.0.0 | 2026-04-15 | Initial release — XSPEC-034 Phase 1 |

---