universal-dev-standards 5.12.1 → 5.13.2

@@ -12,11 +12,14 @@ standard:
12
12
  - "Automate build, test, deploy, and rollback"
13
13
 
14
14
  meta:
15
- version: "1.1.0"
16
- updated: "2026-05-13"
15
+ version: "1.2.0"
16
+ updated: "2026-05-26"
17
17
  source: core/deployment-standards.md
18
18
  description: >
19
19
  Safe deployment strategies, feature flags, rollback, environment parity, and DORA metrics.
20
+ v1.2.0: Added Defensive Deployment Ordering — required extract-verify-then-delete sequence,
21
+ PowerShell + bash verify snippets, failure mode mapping, cross-link to packaging-standards
22
+ Archive Format Integrity (XSPEC-231 / closes issue #110).
20
23
  v1.1.0: Added environment stratification responsibility matrix and stub server CI/CD
21
24
  lifecycle rules (XSPEC-204).
22
25
 
@@ -230,6 +233,51 @@ stub_server_cicd_rules:
230
233
  (a) fully testable in UAT via real service or Level 2 stub server, OR
231
234
  (b) explicitly documented as PRD-smoke-only with a smoke test plan.
232
235
 
236
+ defensive_ordering:
237
+ scope: "destructive-update deploy patterns (stop → swap → start) common to Windows IIS, SystemD-managed services, and similar in-place replacement workflows."
238
+ forbidden_ordering: |
239
+ 1. Stop service
240
+ 2. Extract new package ← may silently no-op on format mismatch
241
+ 3. Delete old install ← runs unconditionally — destroys live install
242
+ 4. Copy new install ← throws (source doesn't exist)
243
+ 5. Start service ← cannot start (binaries gone)
244
+ required_ordering: |
245
+ 1. Stop service
246
+ 2. Extract new package → staging area (NOT directly over live install)
247
+ 3. ✅ VERIFY staging area contains expected artifacts
248
+ ↑ if verification fails: abort, do NOT touch the live install
249
+ 4. Back up live install (or earlier; either is fine)
250
+ 5. Delete old install (preserving logs / runtime data)
251
+ 6. Copy new install from staging
252
+ 7. Restore preserved configs
253
+ 8. Start service
254
+ 9. Sanity check (HTTP probe / health endpoint)
255
+ verify_step_3:
256
+ required: true
257
+ skip_allowed: false
258
+ minimum: "Test-Path / [ -f ... ] against at least one well-known file from the new package."
259
+ preferred: "Hash-check a manifest of expected files."
260
+ rules:
261
+ - id: extract-to-staging-not-live
262
+ requirement: "Step 2 MUST extract to a staging area separate from the live install path."
263
+ priority: required
264
+ - id: verify-before-destructive
265
+ requirement: "Step 3 MUST verify staging contents before any destructive step touches the live install. Failure MUST abort with live install untouched."
266
+ priority: required
267
+ - id: backup-required
268
+ requirement: "A backup of the live install MUST exist before step 5 (delete). Can be taken earlier."
269
+ priority: required
270
+ - id: preserve-runtime-data
271
+ requirement: "Step 5 MUST exclude logs and runtime data from deletion."
272
+ priority: required
273
+ failure_modes_addressed:
274
+ - wrong_format_archive: "Step 3 verify fails — live install untouched"
275
+ - partial_extract: "Step 3 verify fails — live install untouched"
276
+ - archive_root_changed: "Step 3 verify fails — live install untouched"
277
+ - permission_denied_extract: "Step 3 verify fails — live install untouched"
278
+ upstream_prevention: "Producer-side prevention via packaging-standards Archive Format Integrity. Both layers form defense-in-depth — neither alone is sufficient."
279
+ failure_mode_reference: "PROD incident 2026-05-24: Expand-Archive silent no-op on tar-renamed-to-.zip + unconditional Remove-Item destroyed live install. ~3 minutes downtime. Step 3 verify would have aborted before destruction."
280
+
233
281
  physical_spec:
234
282
  type: custom_script
235
283
  validator:
@@ -3,10 +3,10 @@
3
3
 
4
4
  id: logging
5
5
  meta:
6
- version: "1.2.0"
7
- updated: "2026-03-18"
6
+ version: "1.3.0"
7
+ updated: "2026-05-26"
8
8
  source: core/logging-standards.md
9
- description: Logging levels, structured logging, and best practices
9
+ description: Logging levels, structured logging, file rotation policy, and best practices
10
10
 
11
11
  log_levels:
12
12
  ordered:
@@ -119,6 +119,43 @@ rules:
119
119
  Correlate all three via trace_id/span_id
120
120
  priority: recommended
121
121
 
122
+ - id: rotation-dual-trigger
123
+ trigger: configuring file-based log sink
124
+ instruction: |
125
+ File-based log sinks MUST set both rotation triggers:
126
+ 1. Time-based: rollingInterval=Day (or equivalent)
127
+ 2. Size-based: fileSizeLimitBytes explicit AND rollOnFileSizeLimit=true
128
+ Default size caps are hostile in production — Serilog silently drops at 1 GB
129
+ if rollOnFileSizeLimit is left at default false; log4j / Winston / Python
130
+ RotatingFileHandler likewise drop or grow unbounded without explicit config.
131
+ Recommended starting value: fileSizeLimitBytes=104857600 (100 MB),
132
+ retainedFileCountLimit >= N*7 where N = max expected rolls per day.
133
+ priority: required
134
+
135
+ - id: rotation-ops-sop
136
+ trigger: log file size approaching cap
137
+ instruction: |
138
+ If a log file size reaches >= 90% of fileSizeLimitBytes at expected end-of-day,
139
+ INVESTIGATE the cause (noisy retry loop, unbounded debug logging, stack-trace
140
+ flood) BEFORE raising the cap. Raising the cap masks the noise problem.
141
+ priority: required
142
+
143
+ rotation_policy:
144
+ must_set_both:
145
+ time_based: rollingInterval=Day (or equivalent)
146
+ size_based:
147
+ fileSizeLimitBytes: explicit (100 MB recommended)
148
+ rollOnFileSizeLimit: true
149
+ hostile_defaults:
150
+ serilog: silently stops at 1 GB if rollOnFileSizeLimit=false
151
+ log4j: grows unbounded between time-based rolls if no SizeBasedTriggeringPolicy
152
+ python_rotatingfilehandler: grows unbounded if maxBytes unset
153
+ winston: grows unbounded if maxSize unset
154
+ recommended:
155
+ fileSizeLimitBytes: 104857600
156
+ retainedFileCountLimit_formula: "N * 7 (N = max rolls/day)"
157
+ ops_sop: investigate noise root cause at >= 90% size before raising cap
158
+
122
159
  quick_reference:
123
160
  level_selection:
124
161
  columns: [Question, Level]
@@ -12,8 +12,8 @@ standard:
12
12
  - "Pipeline-integrated: packaging runs between Review and Deploy in the adoption-layer pipeline"
13
13
 
14
14
  meta:
15
- version: "1.0.0"
16
- updated: "2026-04-15"
15
+ version: "1.1.0"
16
+ updated: "2026-05-26"
17
17
  source: core/packaging-standards.md
18
18
 
19
19
  principles:
@@ -135,6 +135,29 @@ recipe_selection_guide:
135
135
  yes: windows-installer
136
136
  no: custom-recipe-required
137
137
 
138
+ archive_format_integrity:
139
+ rules:
140
+ - id: real_format_matches_extension
141
+ requirement: "A .zip file MUST be a real ZIP archive (PKZip magic PK\\x03\\x04). A renamed POSIX tar with .zip extension is forbidden."
142
+ priority: required
143
+ - id: verify_before_publish
144
+ requirement: "Packaging step MUST verify the produced archive's real format before declaring success."
145
+ priority: required
146
+ verification_examples:
147
+ zip_python: "python -c \"import zipfile; zipfile.ZipFile('out.zip').namelist()\""
148
+ zip_unix: "file out.zip # expect 'Zip archive data', NOT 'POSIX tar archive'"
149
+ targz_unix: "tar -tzf out.tar.gz >/dev/null"
150
+ - id: windows_recipe_compliance
151
+ requirement: "On Windows, use PowerShell Compress-Archive or .NET ZipFile::CreateFromDirectory. Do NOT use git-bash 'tar -a -cf x.zip' — the auto-extension flag produces a POSIX tar archive."
152
+ priority: required
153
+ do_use:
154
+ - "Compress-Archive -Path 'publish\\*' -DestinationPath 'dist\\patch.zip' -Force"
155
+ - "[System.IO.Compression.ZipFile]::CreateFromDirectory(...)"
156
+ do_not_use:
157
+ - "tar -a -cf x.zip ... # produces POSIX tar with .zip extension on Windows tar ports"
158
+ consumer_side_defense: "Producers cannot guarantee verification downstream. Deploy scripts MUST verify archive integrity before any destructive action — see deployment-standards Defensive Deployment Ordering."
159
+ failure_mode_reference: "PROD incident 2026-05-24: tar-renamed-to-.zip + Expand-Archive silent no-op + unconditional Remove-Item destroyed live install. ~3 minutes downtime."
160
+
138
161
  physical_spec:
139
162
  type: custom_script
140
163
  validator:
@@ -0,0 +1,144 @@
1
+ # Self-Review Protocol - AI Optimized
2
+ # Source: core/self-review-protocol.md
3
+
4
+ id: self-review-protocol
5
+ meta:
6
+ version: "1.0.0"
7
+ updated: "2026-05-26"
8
+ source: core/self-review-protocol.md
9
+ description: Mandatory self-review pass on large markdown edits before commit; catches 6 categories of internal cross-reference inconsistencies that internal reasoning routinely misses
10
+
11
+ trigger_conditions:
12
+ mandatory:
13
+ condition: commit modifies > 50 lines of markdown
14
+ artefact_types:
15
+ - ADR (architecture decision records, DEC-NNN)
16
+ - XSPEC (cross-project specs)
17
+ - XSPEC SDD Deltas
18
+ - SKILL.md (Claude Code custom skills)
19
+ - ARCHITECTURE.md
20
+ - API.md
21
+ - DEPLOYMENT.md
22
+ - MIGRATION.md
23
+ - runbooks
24
+ - playbooks
25
+ - README.md (when modifying major sections)
26
+ optional:
27
+ condition: commit modifies <= 50 lines of markdown
28
+ rationale: small edits rarely have cross-reference risk
29
+ not_applicable:
30
+ condition: code or config only changes
31
+ rationale: covered by lint, test, and code review
32
+
33
+ inconsistency_categories:
34
+ - id: 1
35
+ name: diagram_step_mismatch
36
+ description: Diagram or flow chart out of sync with step list
37
+ example: workflow diagram has 7 boxes but document defines 8 steps
38
+ check: count diagram nodes vs `## Step N:` / `## N.` headers
39
+ - id: 2
40
+ name: changelog_reference_error
41
+ description: Changelog entry references wrong anchor
42
+ example: changelog says "Step 1 added X" but X actually at Step 0
43
+ check: for each changelog line, grep the anchor it references
44
+ - id: 3
45
+ name: count_drift
46
+ description: Explicit number in text out of sync with actual count
47
+ example: "self-audit has 4 questions" but list has 7
48
+ check: grep for "N questions", "N rows", "N items" and verify
49
+ - id: 4
50
+ name: stale_template
51
+ description: Hardcoded model names, tool versions, dates not updated
52
+ example: commit template hardcodes Claude Sonnet 4.6 when model varies
53
+ check: find hardcoded specifics; replace with placeholders or update
54
+ - id: 5
55
+ name: wrong_tool_reference
56
+ description: Recommended CLI command does not do what described
57
+ example: recommends `claude --version` for model name (shows CLI version)
58
+ check: for each CLI command mentioned, confirm its behavior mentally or with `--help`
59
+ - id: 6
60
+ name: placeholder_rule_misalignment
61
+ description: Example contradicts current rule or latest case experience
62
+ example: example shows D1/D2/D3 but rule says D3 not mandatory
63
+ check: every concrete value in examples consistent with current rules
64
+
65
+ procedure:
66
+ step_1:
67
+ name: re_read_full_file
68
+ description: Use file-reading tool to read entire file (not just diff)
69
+ when: after editing, before committing
70
+ step_2:
71
+ name: walk_categories
72
+ description: Apply 6 categories above against the file
73
+ step_3:
74
+ name: fix_in_same_commit
75
+ description: If issues found, edit in place and include fixes in same commit
76
+ rationale: ship-and-patch creates follow-up commits; in-place fix is cleaner
77
+ step_4:
78
+ name: patch_if_already_committed
79
+ description: If issues found after commit, create patch commit (e.g., v1.2.1 fixes v1.2.0)
80
+
81
+ recording_formats:
82
+ skill_md:
83
+ location: changelog line in frontmatter or near top
84
+ format: '> **v{X.Y.(Z+1)} Self-review pass {YYYY-MM-DD}**: {N} issues found, {M} fixed in same commit'
85
+ adr_dec:
86
+ location: '## Follow-up Tracking table'
87
+ format: '| Self-review pass | This DEC | ✅ YYYY-MM-DD (6 categories, no issues) |'
88
+ xspec_sdd_delta:
89
+ location: after non-modification list section (e.g. §N.6)
90
+ format: '> Self-review pass: YYYY-MM-DD (6 categories, no issues)'
91
+ commit_message_body:
92
+ location: last line of commit body
93
+ format: 'Self-review (protocol v1.0.0): N issues found, M applied in same commit / 0 found.'
94
+
95
+ distinction_from_other_practices:
96
+ code_review:
97
+ covers: code correctness, design, security
98
+ trigger: before merging code PR
99
+ source_standard: code-review.md
100
+ content_self_audit:
101
+ covers: content completeness (all required sections present)
102
+ trigger: each artefact creation
103
+ example: eval-source skill 7-question audit
104
+ self_review_protocol:
105
+ covers: internal cross-reference consistency (form, not content)
106
+ trigger: after large markdown edit, before commit
107
+ source_standard: self-review-protocol.md (this standard)
108
+ peer_review:
109
+ covers: independent perspective, blast radius assessment
110
+ trigger: significant changes
111
+
112
+ anti_patterns:
113
+ - skipping_because_diff_small:
114
+ problem: small diffs in large files often introduce cross-ref errors elsewhere
115
+ mitigation: trigger is whole-file size, not diff size
116
+ - reviewing_diff_only:
117
+ problem: cross-ref errors may live in unchanged sections referencing changed content
118
+ mitigation: re-read whole file, not just diff
119
+ - document_without_practice:
120
+ problem: discipline is in the practice, not the documentation
121
+ mitigation: enforce in commit checklist
122
+ - substitute_for_peer_review:
123
+ problem: self-review catches inconsistencies, not design flaws
124
+ mitigation: keep peer review for significant changes
125
+
126
+ verification:
127
+ metrics:
128
+ - patch_commit_ratio:
129
+ description: ratio of v1.X.0 -> v1.X.1 follow-up patches for same artefact
130
+ target: significant drop after adopting; eval-source went from 100% to 0% after v1.3.0
131
+ - issue_surface_time:
132
+ description: issues caught by self-review (pre-commit) vs by next reader (post-commit)
133
+ target: pre-commit grows, post-commit shrinks
134
+
135
+ examples_in_the_wild:
136
+ - artefact: dev-platform/.claude/skills/eval-source/SKILL.md
137
+ version: v1.3.0
138
+ commit: 6b45c5d
139
+ note: first SKILL.md edit to pass self-review pre-commit; preceded by 2 patch cycles (v1.1.0->v1.1.1 with 3 issues, v1.2.0->v1.2.1 with 6 issues) that motivated this standard
140
+
141
+ self_review:
142
+ date: "2026-05-26"
143
+ issues_found: 0
144
+ notes: First draft self-review pass; no internal inconsistencies detected
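As an illustration of the `count_drift` check (category 3) in the self-review protocol above, the following is a minimal sketch, not part of the package. It assumes the document states its step count as "N steps" and marks steps with `## Step N` headers; the function name `step_count_drift` and both regular expressions are hypothetical.

```python
import re

# Matches explicit claims such as "7 steps" in running text.
STEP_CLAIM = re.compile(r"\b(\d+)\s+steps\b", re.IGNORECASE)
# Matches level-2 headers of the form "## Step 3: ..." at the start of a line.
STEP_HEADER = re.compile(r"^##\s+Step\s+\d+\b", re.MULTILINE)


def step_count_drift(markdown: str) -> list[str]:
    """Return one finding per "N steps" claim that disagrees with the header count."""
    actual = len(STEP_HEADER.findall(markdown))
    findings = []
    for match in STEP_CLAIM.finditer(markdown):
        claimed = int(match.group(1))
        if claimed != actual:
            findings.append(
                f"text claims {claimed} steps but {actual} '## Step N' headers exist"
            )
    return findings
```

A check like this only covers one phrasing of one category; the protocol still requires re-reading the whole file against all six categories.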
@@ -2,8 +2,8 @@
2
2
 
3
3
  > **Language**: English | [繁體中文](../locales/zh-TW/core/deployment-standards.md)
4
4
 
5
- **Version**: 1.0.0
6
- **Last Updated**: 2026-02-09
5
+ **Version**: 1.1.0
6
+ **Last Updated**: 2026-05-26
7
7
  **Applicability**: All software projects with deployment pipelines
8
8
  **Scope**: universal
9
9
  **Industry Standards**: Twelve-Factor App, Google SRE — Release Engineering, DORA State of DevOps
@@ -219,6 +219,103 @@ Strategies are not mutually exclusive. Common combinations:
219
219
 
220
220
  ---
221
221
 
222
+ ## Defensive Deployment Ordering
223
+
224
+ When a deploy script replaces a running install (the destructive-update pattern common to Windows IIS, SystemD-managed services, or any "stop → swap → start" workflow), the ordering of destructive steps relative to verification is non-negotiable.
225
+
226
+ ### The forbidden ordering
227
+
228
+ ```
229
+ 1. Stop service
230
+ 2. Extract new package ← may silently no-op on format mismatch
231
+ 3. Delete old install ← runs unconditionally — destroys the running install
232
+ 4. Copy new install ← throws (source doesn't exist)
233
+ 5. Start service ← cannot start (binaries gone)
234
+ ```
235
+
236
+ If step 2 silently fails (corrupt archive, wrong format, disk full, permissions), step 3 still runs and **destroys the running install**, leaving nothing to recover from except the backup. A backup enables full rollback but does NOT prevent the outage window: the service is already down.
237
+
238
+ ### The required ordering — extract, verify, then delete
239
+
240
+ The destructive deploy ordering **MUST** be:
241
+
242
+ ```
243
+ 1. Stop service
244
+ 2. Extract new package → staging area (NOT directly over live install)
245
+ 3. ✅ VERIFY staging area contains expected artifacts
246
+ ↑ if verification fails: abort, do NOT touch the live install
247
+ 4. Back up live install (or earlier; either is fine)
248
+ 5. Delete old install (preserving logs / runtime data)
249
+ 6. Copy new install from staging
250
+ 7. Restore preserved configs
251
+ 8. Start service
252
+ 9. Sanity check (HTTP probe / health endpoint)
253
+ ```
254
+
255
+ **Step 3 verification is non-negotiable.** Minimum verification is checking that at least one well-known file from the new package exists in the staging area. Hash-checking a manifest of expected files is preferred when available.
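The preferred manifest hash-check mentioned above could look roughly like the following sketch. The manifest format (a mapping from relative path to SHA-256 hex digest) and the function name `verify_staging` are assumptions made for illustration; the standard does not define either.

```python
import hashlib
from pathlib import Path


def verify_staging(staging_dir: str, manifest: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means staging passed verification.

    `manifest` maps a path relative to the staging directory to its expected
    SHA-256 hex digest (hypothetical format, not defined by the standard).
    """
    problems = []
    root = Path(staging_dir)
    for relative_path, expected_digest in manifest.items():
        candidate = root / relative_path
        if not candidate.is_file():
            # Covers wrong-format archives, partial extracts, and changed archive roots.
            problems.append(f"missing: {relative_path}")
            continue
        actual_digest = hashlib.sha256(candidate.read_bytes()).hexdigest()
        if actual_digest != expected_digest:
            problems.append(f"hash mismatch: {relative_path}")
    return problems
```

A deploy script would abort before any destructive step whenever the returned list is non-empty, leaving the live install untouched.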
256
+
257
+ ### Verification snippets
258
+
259
+ **PowerShell** (Windows IIS deploy):
260
+
261
+ ```powershell
262
+ $staging = "C:\deploy\staging-$(Get-Date -Format yyyyMMddHHmmss)"
263
+ Expand-Archive -Path $zipPath -DestinationPath $staging -Force
264
+
265
+ # Non-negotiable: verify staging before touching live install
266
+ if (-not (Test-Path "$staging\api\MyApp.dll")) {
267
+ throw "Expected $staging\api\MyApp.dll not found — archive may be corrupt or wrong format. Aborting deploy. Live install untouched."
268
+ }
269
+
270
+ # Only NOW touch live install
271
+ Copy-Item "$apiDir" "$backupDir" -Recurse -Force
272
+ Get-ChildItem $apiDir -Exclude logs | Remove-Item -Recurse -Force
273
+ Copy-Item "$staging\api\*" $apiDir -Recurse
274
+ ```
275
+
276
+ **bash** (Linux SystemD-managed service):
277
+
278
+ ```bash
279
+ set -euo pipefail
280
+
281
+ STAGING="/srv/deploy/staging-$(date +%Y%m%d%H%M%S)"
282
+ mkdir -p "$STAGING"
283
+ tar -xzf "$ARCHIVE" -C "$STAGING"
284
+
285
+ # Non-negotiable: verify staging before touching live install
286
+ if [ ! -f "$STAGING/bin/myapp" ]; then
287
+ echo "ERROR: Expected $STAGING/bin/myapp not found. Aborting deploy. Live install untouched." >&2
288
+ exit 1
289
+ fi
290
+
291
+ # Only NOW touch live install
292
+ systemctl stop myapp
293
+ cp -a "$LIVE_DIR" "$BACKUP_DIR"
294
+ find "$LIVE_DIR" -mindepth 1 -not -path "$LIVE_DIR/logs" -not -path "$LIVE_DIR/logs/*" -delete
295
+ cp -a "$STAGING"/* "$LIVE_DIR/"
296
+ systemctl start myapp
297
+ ```
298
+
299
+ ### Failure modes addressed
300
+
301
+ | Failure mode | What protects against it |
302
+ |---|---|
303
+ | Archive is wrong format (e.g., tar renamed to `.zip`) | Step 3 verify fails — live install untouched |
304
+ | Partial extract (disk full mid-extract) | Step 3 verify fails — live install untouched |
305
+ | Archive root structure changed (extra wrapper folder, missing key file) | Step 3 verify fails — live install untouched |
306
+ | Permissions issue (extract step had read but not write access) | Step 3 verify fails — live install untouched |
307
+ | Backup script itself fails | Optional secondary check after step 4 |
308
+
309
+ ### Upstream prevention
310
+
311
+ Verifying at the consumer side is the last line of defense. The **upstream** prevention — refusing to produce a misformatted archive in the first place — is covered by [Packaging Standards — Archive Format Integrity](packaging-standards.md#archive-format-integrity). Both layers together form a defense-in-depth pair; neither alone is sufficient.
312
+
313
+ ### Failure mode reference (real incident)
314
+
315
+ A Windows IIS production deploy script (2026-05-24) ran `Expand-Archive` against a tar-renamed-to-`.zip` archive (silent no-op), then `Remove-Item -Recurse` against the live `apiDir`, then `Copy-Item` from a source that did not exist (because nothing had been extracted). The live install was wiped, AppPool stopped, and production was down for ~3 minutes until backup-based rollback completed. Adding the step 3 verify (`Test-Path "$staging\api\MyApp.dll"`) would have aborted the deploy at the staging stage with the live install untouched.
316
+
317
+ ---
318
+
222
319
  ## Post-Deployment Checklist
223
320
 
224
321
  ### Immediate (< 5 minutes)
@@ -334,6 +431,7 @@ Smoke test failure MUST block the deployment from proceeding and trigger a rollb
334
431
 
335
432
  | Version | Date | Changes |
336
433
  |---------|------|---------|
434
+ | 1.1.0 | 2026-05-26 | Added: Defensive Deployment Ordering section — required extract-verify-then-delete sequence, PowerShell + bash verify snippets, failure mode mapping, cross-link to packaging-standards Archive Format Integrity (XSPEC-231 / closes issue #110) |
337
435
  | 1.0.0 | 2026-02-09 | Initial release |
338
436
 
339
437
  ---
@@ -2,8 +2,8 @@
2
2
 
3
3
  > **Language**: English | [繁體中文](../locales/zh-TW/core/logging-standards.md)
4
4
 
5
- **Version**: 1.2.0
6
- **Last Updated**: 2026-01-24
5
+ **Version**: 1.3.0
6
+ **Last Updated**: 2026-05-26
7
7
  **Applicability**: All software projects
8
8
  **Scope**: universal
9
9
  **Industry Standards**: RFC 5424, OpenTelemetry, W3C Trace Context
@@ -595,6 +595,117 @@ For endpoints called thousands of times per second:
595
595
  | WARN | 90 days |
596
596
  | ERROR/FATAL | 1 year |
597
597
 
598
+ ---
599
+
600
+ ## Log File Rotation Policy
601
+
602
+ ### Rotation policy — MUST set both
603
+
604
+ A file-based log sink configuration **MUST** include **both** triggers:
605
+
606
+ 1. **Time-based rotation** (`rollingInterval: Day` or equivalent) — for chronological partitioning
607
+ 2. **Size-based rotation** with `rollOnFileSizeLimit: true` (or equivalent) — to handle volume spikes
608
+
609
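+ > **Why mandatory:** Logging libraries differ in what happens when size-based rotation is left unconfigured. Some (notably the Serilog File sink) ship with a default size cap, and once the file hits the cap, subsequent log writes are **dropped silently** with no warning and no error. Others have no cap and let the file grow without bound. In the first case the application keeps running while half a day of logs vanish. Setting both triggers explicitly defeats this trap.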
610
+
611
+ ### Default cap is hostile in production
612
+
613
+ | Library | Default size cap | Behavior when cap hit |
614
+ |---|---|---|
615
+ | Serilog File sink (.NET) | 1 GB | **Silently stops writing** (`RollOnFileSizeLimit = false` by default) |
616
+ | log4j RollingFileAppender | none unless set | Grows unbounded until the next time-based roll |
617
+ | Python `RotatingFileHandler` | infinite unless `maxBytes` set | Grows unbounded |
618
+ | Winston `winston-daily-rotate-file` | none unless `maxSize` set | Grows unbounded until the next date-based roll |
619
+
620
+ If you do not explicitly configure size-based rotation, you are accepting one of the failure modes above.
621
+
622
+ ### Recommended starting values
623
+
624
+ | Parameter | Value | Rationale |
625
+ |---|---|---|
626
+ | `fileSizeLimitBytes` | 100 MB | Balance: small enough to open in an editor, large enough to avoid excessive rolls |
627
+ | `rollOnFileSizeLimit` | `true` | When cap hit, create `*_001.txt`, `*_002.txt`; do **NOT** drop |
628
+ | `retainedFileCountLimit` | ≥ N×7 where N = max expected rolls/day | Avoid premature deletion of in-window logs |
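The `retainedFileCountLimit` formula in the table above can be worked through with a small calculation. This is a sketch under one stated assumption: N is estimated as the number of files written on the busiest expected day, that is, the peak daily log volume divided by `fileSizeLimitBytes`, rounded up. The function name and the `peak_bytes_per_day` parameter are hypothetical.

```python
import math


def retained_file_count_limit(
    peak_bytes_per_day: int,
    file_size_limit_bytes: int = 104_857_600,  # 100 MB, the recommended starting value
    retention_days: int = 7,
) -> int:
    """Return N * retention_days, where N is the estimated number of files per day."""
    # At least one file per day comes from the time-based trigger alone;
    # size-based rolls add more files on high-volume days.
    files_per_day = max(1, math.ceil(peak_bytes_per_day / file_size_limit_bytes))
    return files_per_day * retention_days
```

For example, a service that writes about 250 MiB on its busiest day would produce three files that day, so the limit should be at least 21 to keep a full week.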
629
+
630
+ ### Recipes per language
631
+
632
+ **.NET / Serilog** (`appsettings.json`):
633
+
634
+ ```json
635
+ {
636
+ "Serilog": {
637
+ "WriteTo": [{
638
+ "Name": "File",
639
+ "Args": {
640
+ "path": "logs/app-.txt",
641
+ "rollingInterval": "Day",
642
+ "fileSizeLimitBytes": 104857600,
643
+ "rollOnFileSizeLimit": true,
644
+ "retainedFileCountLimit": 90
645
+ }
646
+ }]
647
+ }
648
+ }
649
+ ```
650
+
651
+ **Python** (`logging.handlers`):
652
+
653
+ ```python
654
+ from logging.handlers import RotatingFileHandler
655
+
656
+ handler = RotatingFileHandler(
657
+ filename="logs/app.log",
658
+ maxBytes=104857600, # 100 MB
659
+ backupCount=90 # keep the 90 most recent size-based rolls
660
+ )
661
+ # For combined time+size rotation, compose TimedRotatingFileHandler with size check
662
+ # or use a third-party library such as concurrent-log-handler.
663
+ ```
664
+
665
+ **Java / log4j2** (`log4j2.xml`):
666
+
667
+ ```xml
668
+ <RollingFile name="App" fileName="logs/app.log"
669
+ filePattern="logs/app-%d{yyyy-MM-dd}-%i.log.gz">
670
+ <PatternLayout pattern="%d %-5p %c{1.} - %m%n"/>
671
+ <Policies>
672
+ <TimeBasedTriggeringPolicy interval="1"/>
673
+ <SizeBasedTriggeringPolicy size="100 MB"/>
674
+ </Policies>
675
+ <DefaultRolloverStrategy max="90"/>
676
+ </RollingFile>
677
+ ```
678
+
679
+ **Node / Winston** (`winston-daily-rotate-file`):
680
+
681
+ ```javascript
682
+ import DailyRotateFile from "winston-daily-rotate-file";
683
+
684
+ new DailyRotateFile({
685
+ filename: "logs/app-%DATE%.log",
686
+ datePattern: "YYYY-MM-DD",
687
+ maxSize: "100m",
688
+ maxFiles: "90d"
689
+ });
690
+ ```
691
+
692
+ ### Operational SOP — investigate, don't just raise the cap
693
+
694
+ If a log file size reaches ≥ 90% of `fileSizeLimitBytes` at expected end-of-day, **investigate the cause before raising the cap**. Typical root causes:
695
+
696
+ - Noisy retry loop logging every attempt at INFO instead of WARN summary
697
+ - Unbounded debug logging accidentally enabled in production
698
+ - Stack-trace flood from one upstream failure
699
+ - Health probe / sidecar polluting the business log
700
+
701
+ Raising the cap masks the underlying noise problem and pushes the next outage further out.
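The 90% threshold in this SOP can be checked mechanically, for instance from a scheduled job near end-of-day. The following is a minimal sketch; the function name `files_near_cap` and the idea of scanning a single log directory are assumptions, not something the standard prescribes.

```python
from pathlib import Path


def files_near_cap(
    log_dir: str,
    file_size_limit_bytes: int = 104_857_600,  # same value as fileSizeLimitBytes
    threshold: float = 0.90,
) -> list[str]:
    """Return names of files in log_dir whose size is at or above threshold * cap."""
    cutoff = threshold * file_size_limit_bytes
    return sorted(
        entry.name
        for entry in Path(log_dir).iterdir()
        if entry.is_file() and entry.stat().st_size >= cutoff
    )
```

A non-empty result is a prompt to look for the root causes listed above, not a prompt to raise the cap.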
702
+
703
+ ### Failure-mode reference (real incident)
704
+
705
+ A production .NET Worker using only `rollingInterval: Day` (no size limit set, so the Serilog default 1 GB cap applied) hit the cap at 07:31 and silently dropped every log entry until after 13:00, when the operator noticed the tail was stale. Five consecutive daily files each measured `~1,073,741,8XX bytes` (1 GiB = 1,073,741,824 bytes, the Serilog default). Half a day of production diagnostics was lost. Setting `fileSizeLimitBytes` + `rollOnFileSizeLimit: true` would have rolled to `worker-YYYYMMDD_001.txt` and preserved the events.
706
+
707
+ ---
708
+
598
709
  ## Quick Reference Card
599
710
 
600
711
  ### Log Level Selection
@@ -623,6 +734,14 @@ App cannot continue? → FATAL
623
734
  - [ ] Credit cards never logged
624
735
  - [ ] Retention policies configured
625
736
 
737
+ ### Rotation Checklist
738
+
739
+ - [ ] Time-based rotation set (`rollingInterval: Day` or equivalent)
740
+ - [ ] Size-based rotation set with `rollOnFileSizeLimit: true` (or equivalent)
741
+ - [ ] `fileSizeLimitBytes` explicitly configured (default cap is hostile)
742
+ - [ ] `retainedFileCountLimit` ≥ N×7 to cover within-window rolls
743
+ - [ ] 90% size SOP defined: investigate noise root cause, do not just raise cap
744
+
626
745
  ---
627
746
 
628
747
  **Related Standards:**
@@ -635,6 +754,7 @@ App cannot continue? → FATAL
635
754
 
636
755
  | Version | Date | Changes |
637
756
  |---------|------|---------|
757
+ | 1.3.0 | 2026-05-26 | Added: Log File Rotation Policy — mandatory dual-trigger (time + size) rotation with hostile-default warning, recipes for .NET/Python/Java/Node, ops SOP (XSPEC-232 / closes issue #111) |
638
758
  | 1.2.0 | 2026-01-24 | Added: OpenTelemetry Semantic Conventions, Observability Three Pillars Integration, Log-based Alerting, Advanced Correlation Patterns |
639
759
  | 1.1.0 | 2026-01-05 | Added: References section with OWASP, RFC 5424, OpenTelemetry, and 12 Factor App |
640
760
  | 1.0.0 | 2025-12-30 | Initial logging standards |
@@ -2,8 +2,8 @@
2
2
 
3
3
  > **Language**: English | [繁體中文](../locales/zh-TW/core/packaging-standards.md)
4
4
 
5
- **Version**: 1.0.0
6
- **Last Updated**: 2026-04-15
5
+ **Version**: 1.1.0
6
+ **Last Updated**: 2026-05-26
7
7
  **Applicability**: Projects using a UDS-aware toolchain
8
8
  **Scope**: universal
9
9
 
@@ -194,6 +194,75 @@ A packaging run is considered **successful** when ALL of the following condition
194
194
 
195
195
  ---
196
196
 
197
+ ## Archive Format Integrity
198
+
199
+ When a packaging step produces an archive (`.zip`, `.tar.gz`, `.tar.bz2`, etc.) that will be consumed by a deploy script, the **real binary format MUST match the file extension**. A file named `.zip` MUST be a real ZIP archive (PKZip magic `PK\x03\x04`), not a renamed tar archive.
200
+
201
+ > **Why mandatory:** mismatched archive formats trigger silent failures downstream. PowerShell's `Expand-Archive` and `[System.IO.Compression.ZipFile]::ExtractToDirectory()` accept tar-renamed-to-`.zip` **without raising an error** — the file is read, nothing is extracted, no exception. If the next step of the deploy script is destructive (e.g., "delete current install directory"), the live install is destroyed with nothing to replace it.
202
+
203
+ ### Verification before publish
204
+
205
+ Every packaging step that produces an archive **MUST** include format verification before declaring success. Minimum verification:
206
+
207
+ | Format | Verification one-liner |
208
+ |---|---|
209
+ | `.zip` | `python -c "import zipfile; zipfile.ZipFile('out.zip').namelist()"` must succeed |
210
+ | `.zip` (Unix) | `file out.zip` must report `Zip archive data`, **NOT** `POSIX tar archive` |
211
+ | `.tar.gz` | `tar -tzf out.tar.gz >/dev/null` must succeed |
212
+ | any | optional: hash a manifest of expected files and compare |
213
+
214
+ Verification failure MUST abort the packaging pipeline before publish.
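The one-liners in the table above can be combined into a single packaging-step check. The sketch below uses only the Python standard library (`zipfile.is_zipfile` and `tarfile.is_tarfile`); the function names and the set of recognized extensions are assumptions for illustration. The tar check runs first because a tar file that happens to contain a ZIP near its end could otherwise be misread as a ZIP.

```python
import tarfile
import zipfile


def real_archive_format(path: str) -> str:
    """Return 'tar', 'zip', or 'unknown' based on file content, ignoring the extension."""
    # is_tarfile also recognizes gzip-, bzip2-, and xz-compressed tar archives.
    if tarfile.is_tarfile(path):
        return "tar"
    if zipfile.is_zipfile(path):
        return "zip"
    return "unknown"


def extension_matches_format(path: str) -> bool:
    """Return True only when the file extension agrees with the real archive format."""
    lowered = path.lower()
    actual = real_archive_format(path)
    if lowered.endswith(".zip"):
        return actual == "zip"
    if lowered.endswith((".tar", ".tar.gz", ".tgz", ".tar.bz2")):
        return actual == "tar"
    # Unrecognized extensions are treated as failures rather than silently passed.
    return False
```

A packaging script would exit non-zero when `extension_matches_format` returns `False`, so a tar-renamed-to-`.zip` never reaches the publish step.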
215
+
216
+ ### Platform-specific recipes
217
+
218
+ **Windows — DO use:**
219
+
220
+ ```powershell
221
+ # Option A: PowerShell built-in (produces real ZIP)
222
+ Compress-Archive -Path "publish\*" -DestinationPath "dist\patch.zip" -Force
223
+
224
+ # Option B: .NET API (produces real ZIP)
225
+ Add-Type -Assembly System.IO.Compression.FileSystem
226
+ [System.IO.Compression.ZipFile]::CreateFromDirectory(
227
+ "publish", "dist\patch.zip", "Optimal", $false
228
+ )
229
+ ```
230
+
231
+ **Windows — DO NOT use:**
232
+
233
+ ```bash
234
+ # ❌ git-bash / busybox tar -a -cf is UNRELIABLE on Windows
235
+ # The -a "auto by extension" flag produces a POSIX tar archive with .zip extension.
236
+ # `file patch.zip` → "POSIX tar archive (GNU)" (not "Zip archive data")
237
+ cd publish && tar -a -cf "../dist/patch.zip" api/
238
+ ```
239
+
240
+ **Unix-like — DO use:**
241
+
242
+ ```bash
243
+ # Use 'zip' for ZIP archives (BSD/Linux)
244
+ zip -r dist/patch.zip publish/
245
+
246
+ # Use 'tar -czf' (without -a) for tar.gz archives — explicit, deterministic
247
+ tar -czf dist/patch.tar.gz publish/
248
+
249
+ # Verify before publishing
250
+ file dist/patch.zip # expect "Zip archive data"
251
+ python -c "import zipfile; zipfile.ZipFile('dist/patch.zip').namelist()"
252
+ ```
253
+
254
+ ### Consumer-side defense
255
+
256
+ Producers cannot guarantee that consumers verify. Consumers (deploy scripts) **MUST** verify archive integrity before any destructive action. See [Deployment Standards — Defensive Deployment Ordering](deployment-standards.md#defensive-deployment-ordering) for the consumer-side requirement.
257
+
258
+ ### Failure mode reference (real incident)
259
+
260
+ A Windows IIS production deploy script (2026-05-24) used `tar -a -cf patch.zip api/` in git-bash to produce its release archive. The consumer-side PowerShell deploy script then ran `Expand-Archive` (silent no-op on the tar-renamed file), proceeded to `Remove-Item -Recurse` the live `apiDir`, then `Copy-Item` from a source that did not exist (because nothing had been extracted). The live install was wiped, AppPool stopped, and production was down for ~3 minutes until backup-based rollback completed.
261
+
262
+ The combination of (a) producer using auto-extension tar and (b) consumer not verifying extract output destroyed the running install with no error raised at any step.
263
+
264
+ ---
265
+
197
266
  ## Related Standards
198
267
 
199
268
  - [Deployment Standards](deployment-standards.md) — Deploy stage that follows packaging
@@ -207,6 +276,7 @@ A packaging run is considered **successful** when ALL of the following condition
207
276
 
208
277
  | Version | Date | Changes |
209
278
  |---------|------|---------|
279
+ | 1.1.0 | 2026-05-26 | Added: Archive Format Integrity section — real-format-must-match-extension rule, verification one-liners, Windows recipe DO/DON'T list, real incident reference (XSPEC-231 / closes issue #113) |
210
280
  | 1.0.0 | 2026-04-15 | Initial release — XSPEC-034 Phase 1 |
211
281
 
212
282
  ---