@ngocsangairvds/vsaf 4.1.16 → 4.1.17

package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "@ngocsangairvds/vsaf",
- "version": "4.1.16",
+ "version": "4.1.17",
  "description": "logging step",
  "main": "packages/core/dist/index.js",
  "types": "packages/core/dist/index.d.ts",
@@ -112,6 +112,20 @@ uv run --directory ~/.claude/vds-scripts --package spec_orchestrator vds-spec --
  - Research-backed investigation → `research-skill`
  - Structural graph review → `code-review-graph-skill`
 
+ ## Mandatory Rules
+
+ ### Confluence Content Export
+
+ When fetching or exporting Confluence page content (especially BRD, KBNV, SRS pages with tables and images), you **MUST** read and follow `references/confluence-content-export.md` before proceeding. Key rules:
+
+ 1. **Use `body.storage`** (not `body.view`) as source of truth for image references
+ 2. **Only download attachments referenced in `body.storage`** — never bulk-download all attachments
+ 3. **Preserve image position** in output — replace `ac:image` with `![](images/...)` in-place, never collect at end
+ 4. **Keep HTML tables** in markdown output — do not flatten to pipe tables
+ 5. **Verify image count** matches between `body.storage` and output before reporting done
+
+ Failure to follow these rules causes silent data loss (wrong images downloaded, table structure destroyed, image↔text mapping broken).
+
  ## Progressive Disclosure
 
  Use these bundled references for depth:
@@ -121,6 +135,7 @@ Use these bundled references for depth:
  - `references/validation-commands.md`
  - `references/development-commands.md`
  - `references/specialist-routing.md`
+ - `references/confluence-content-export.md`
 
  ## Notes
 
@@ -0,0 +1,127 @@
+ # Confluence Content Export — Mandatory Rules
+
+ When extracting content from a Confluence page (BRD, KBNV, SRS, or any page with tables/images), follow these rules strictly. Violations cause data loss that is invisible to the user.
+
+ ## Rule 1 — Use `body.storage` as the Single Source of Truth for Image Refs
+
+ - **Never** rely on `body.view` for image references — it is post-render HTML that may add or drop images via macros.
+ - `body.storage` is raw XHTML containing `<ac:image><ri:attachment ri:filename="..."/></ac:image>` — this is the definitive set of inline images.
+ - **Only download attachments whose filename appears in `body.storage`**. Do not bulk-download from `attachments --list`.
+
+ ```bash
+ # Step 1: fetch storage body
+ uv run --directory ~/.claude/vds-scripts --package vds-cli \
+ vds-cli confluence content page <PAGE_ID> --expand body.storage
+ ```
+
+ ```python
+ # Step 2: extract referenced filenames from storage XHTML
+ import re
+ storage_xml: str = ... # body.storage content
+ referenced = set(re.findall(r'ri:filename="([^"]+)"', storage_xml))
+ # referenced = {'screen-login.png', 'flow-diagram.png', 'table-header.png'}
+ ```
+
+ ## Rule 2 — Preserve Image-to-Text Positional Mapping
+
+ When converting to markdown/local file:
+
+ - Replace each `<ac:image>...<ri:attachment ri:filename="X"/>...</ac:image>` with `![X](images/X)` **at its original position** in the document. Never collect images into a list at the end.
+ - If the content contains `<table>`, **keep HTML table markup** in the markdown output. Markdown viewers (GitHub, Obsidian, VS Code) all render HTML tables. Do not flatten tables with nested bullets or merged cells into pure-markdown `|` tables — this destroys row/column relationships.
+ - Place the `images/` folder at the same level as `content.md`.
+
+ ```python
+ # Step 3: convert storage XHTML → markdown with inline image refs
+ import re
+
+ def storage_to_markdown(storage_xml: str) -> str:
+     """Convert Confluence storage XHTML to markdown, preserving image positions and tables."""
+     result = storage_xml
+
+     # Replace ac:image with markdown image refs (in-place)
+     def replace_ac_image(match: re.Match) -> str:
+         filename = re.search(r'ri:filename="([^"]+)"', match.group(0))
+         if filename:
+             name = filename.group(1)
+             return f'![{name}](images/{name})'
+         return match.group(0)
+
+     result = re.sub(
+         r'<ac:image[^>]*>.*?</ac:image>',
+         replace_ac_image,
+         result,
+         flags=re.DOTALL,
+     )
+
+     # Keep <table>...</table> blocks as-is (HTML in markdown)
+     # Convert other XHTML elements to markdown as usual:
+     # <p> → newline, <h1> → #, <li> → -, <strong> → **, etc.
+     # ...
+
+     return result
+ ```
+
+ ## Rule 3 — Classify Downloaded Attachments
+
+ If the user requests "all attachments" (audit/backup scenario):
+
+ - Split into two directories:
+   - `images/` — files referenced inline in `body.storage`
+   - `images_unused/` — attached to the page but **not** referenced inline
+ - Report summary: `Used: N inline / Total: M attached`
+
+ ```bash
+ # Download only referenced attachments
+ for filename in $REFERENCED_FILES; do
+ att_id=$(vds-cli confluence content list-attachments <PAGE_ID> --json-only \
+ | jq -r ".[] | select(.title == \"$filename\") | .id")
+ uv run --directory ~/.claude/vds-scripts --package vds-cli \
+ vds-cli confluence content download-attachment "$att_id" --out "images/$filename"
+ done
+ ```
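The `images/` versus `images_unused/` split and the summary line from Rule 3 can be sketched in a few lines of Python. This is only an illustration and not part of the package: `classify_attachments` and `summary` are hypothetical helper names, `referenced` is assumed to be the filename set produced by the Rule 1 snippet, and `attached` is assumed to be the list of attachment titles returned by the attachment listing.

```python
from __future__ import annotations


def classify_attachments(referenced: set[str], attached: list[str]) -> dict[str, list[str]]:
    """Split attached filenames by whether body.storage references them inline.

    Both function names are hypothetical; they do not exist in the package.
    """
    used = sorted(name for name in attached if name in referenced)
    unused = sorted(name for name in attached if name not in referenced)
    # Keys mirror the two target directories named in Rule 3.
    return {"images": used, "images_unused": unused}


def summary(groups: dict[str, list[str]]) -> str:
    """Format the 'Used: N inline / Total: M attached' report line."""
    used_count = len(groups["images"])
    total = used_count + len(groups["images_unused"])
    return f"Used: {used_count} inline / Total: {total} attached"
```

Keying the result by directory name keeps the download step simple: each filename is written under the key it was classified into.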
+
+ ## Rule 4 — Verify Before Reporting "Done"
+
+ After generating `content.md`:
+
+ 1. Count inline image refs in output: `grep -c '!\[.*\](images/.*)' content.md`
+ 2. Count `ri:filename` occurrences in `body.storage`
+ 3. **Both counts must match.** If they differ, log a warning with the diff:
+
+ ```
+ WARNING: Image ref mismatch — storage has 3 ri:filename refs, content.md has 2 inline images.
+ Missing: screen-step3.png
+ ```
+
+ Do not report success if counts diverge.
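The Rule 4 check can also be sketched in Python. This is only an illustration and not part of the package: `verify_image_refs` is a hypothetical name, and the warning text is modeled on the example in the rule rather than copied from any real tool. The sketch counts occurrences rather than matching lines, because `grep -c` reports the number of matching lines, so two images on one line would be counted once. It is also stricter than a plain count comparison: it compares per-filename counts, so a swapped filename is reported even when the totals agree.

```python
import re
from collections import Counter
from typing import Optional

# Same filename pattern as the Rule 1 snippet.
RI_FILENAME = re.compile(r'ri:filename="([^"]+)"')
# Markdown image refs of the form ![alt](images/<name>), as emitted in Rule 2.
MD_IMAGE = re.compile(r'!\[[^\]]*\]\(images/([^)]+)\)')


def verify_image_refs(storage_xml: str, markdown: str) -> Optional[str]:
    """Return a warning string when image refs diverge, or None when they match."""
    expected = Counter(RI_FILENAME.findall(storage_xml))
    actual = Counter(MD_IMAGE.findall(markdown))
    if expected == actual:
        return None
    # Filenames present in storage but absent (or under-counted) in the output.
    missing = sorted((expected - actual).elements())
    return (
        f"WARNING: Image ref mismatch: storage has {sum(expected.values())} "
        f"ri:filename refs, content.md has {sum(actual.values())} inline images.\n"
        f"Missing: {', '.join(missing) or 'none'}"
    )
```

A caller would treat any non-`None` result as a failed export and withhold the success report.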
+
+ ## Rule 5 — Correct Invocation Sequence (Step by Step)
+
+ ```bash
+ # B1: Fetch storage body (source of truth for image refs)
+ uv run --directory ~/.claude/vds-scripts --package vds-cli \
+ vds-cli confluence content page <PAGE_ID> --expand body.storage
+
+ # B2: Extract ri:filename values from storage XML (see Rule 1 Python snippet)
+
+ # B3: Download only referenced attachments
+ uv run --directory ~/.claude/vds-scripts --package vds-cli \
+ vds-cli confluence content download-attachment <ATT_ID> --out images/<filename>
+
+ # B4: Convert storage XHTML → markdown (see Rule 2 Python snippet)
+ # - ac:image → ![](images/...) in-place
+ # - tables stay as HTML
+ # - output: content.md + images/
+
+ # B5: Verify (see Rule 4)
+ ```
+
+ ## Anti-Patterns (What NOT to Do)
+
+ | Anti-pattern | Why it fails |
+ |---|---|
+ | Fetch `body.view` then strip HTML | Loses image positions, macro-injected images leak in |
+ | `attachments --list` then download all | Downloads orphaned/versioned files (59 vs 3 actually used) |
+ | Flatten `<table>` to markdown `\|` pipe tables | Destroys merged cells, nested bullets, colspan/rowspan |
+ | Collect images at end of document | Breaks image↔context mapping (which screen? which step?) |
+ | Skip verification | Silent data loss — user trusts output without knowing images are missing |