@ngocsangairvds/vsaf 4.1.16 → 4.1.18

package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "@ngocsangairvds/vsaf",
-   "version": "4.1.16",
+   "version": "4.1.18",
    "description": "logging step",
    "main": "packages/core/dist/index.js",
    "types": "packages/core/dist/index.d.ts",
@@ -112,6 +112,21 @@ uv run --directory ~/.claude/vds-scripts --package spec_orchestrator vds-spec --
  - Research-backed investigation → `research-skill`
  - Structural graph review → `code-review-graph-skill`
 
+ ## Mandatory Rules
+
+ ### Confluence Content Export
+
+ When fetching or exporting Confluence page content (especially BRD, KBNV, SRS pages with tables and images), you **MUST** read and follow `references/confluence-content-export.md` before proceeding. Key rules:
+
+ 0. **Output to `.vsaf/docs/confluence/<page-title>/`** — never to `docs/` at project root
+ 1. **Use `body.storage`** (not `body.view`) as source of truth for image references
+ 2. **Only download attachments referenced in `body.storage`** — never bulk-download all attachments
+ 3. **Preserve image position** in output — replace `ac:image` with `![](images/...)` in-place, never collect at end
+ 4. **Keep HTML tables** in markdown output — do not flatten to pipe tables
+ 5. **Verify image count** matches between `body.storage` and output before reporting done
+
+ Failure to follow these rules causes silent data loss (wrong images downloaded, table structure destroyed, image↔text mapping broken).
+
  ## Progressive Disclosure
 
  Use these bundled references for depth:
@@ -121,6 +136,7 @@ Use these bundled references for depth:
  - `references/validation-commands.md`
  - `references/development-commands.md`
  - `references/specialist-routing.md`
+ - `references/confluence-content-export.md`
 
  ## Notes
 
@@ -0,0 +1,158 @@
+ # Confluence Content Export — Mandatory Rules
+
+ When extracting content from a Confluence page (BRD, KBNV, SRS, or any page with tables/images), follow these rules strictly. Violations cause data loss that is invisible to the user.
+
+ ## Rule 0 — Output Directory
+
+ All Confluence export artifacts **MUST** be placed under `.vsaf/docs/` in the project root, **not** in `docs/` at root.
+
+ ```
+ <project-root>/
+ ├── .vsaf/
+ │   └── docs/
+ │       └── confluence/
+ │           ├── <page-title>/
+ │           │   ├── content.md
+ │           │   └── images/
+ │           │       ├── screen-login.png
+ │           │       └── flow-diagram.png
+ │           └── <another-page>/
+ │               ├── content.md
+ │               └── images/
+ └── docs/          ← project docs, NOT for Confluence exports
+ ```
+
+ - Use `.vsaf/docs/confluence/<page-title>/` as the output directory for each page.
+ - This keeps Confluence artifacts separate from project documentation and co-located with other VSAF working files.
+ - Pass `--out .vsaf/docs/confluence/<page-title>/` when using CLI commands that accept an output path.
+
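The layout in Rule 0 could be built with a helper along the following lines. This is a minimal sketch: the names `export_dir` and `prepare_export_dir` are hypothetical, and the way a page title is sanitized into a folder name is an assumption, since the rules do not say how `<page-title>` is derived.

```python
import re
from pathlib import Path


def export_dir(project_root: Path, page_title: str) -> Path:
    """Return <project-root>/.vsaf/docs/confluence/<page-title> for one page.

    Assumption: characters that are unsafe in folder names are replaced
    with '-'. The rules do not prescribe a sanitization scheme.
    """
    safe_title = re.sub(r'[\\/:*?"<>|]+', "-", page_title).strip(" .-")
    if not safe_title:
        raise ValueError("page title is empty after sanitization")
    return project_root / ".vsaf" / "docs" / "confluence" / safe_title


def prepare_export_dir(project_root: Path, page_title: str) -> Path:
    """Create the page folder and its images/ subfolder; return the page folder."""
    out_dir = export_dir(project_root, page_title)
    (out_dir / "images").mkdir(parents=True, exist_ok=True)
    return out_dir
```

Calling `prepare_export_dir(Path.cwd(), "<page-title>")` from the project root would create the `content.md` location and the sibling `images/` folder in one step.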
+ ## Rule 1 — Use `body.storage` as the Single Source of Truth for Image Refs
+
+ - **Never** rely on `body.view` for image references — it is post-render HTML that may add or drop images via macros.
+ - `body.storage` is raw XHTML containing `<ac:image><ri:attachment ri:filename="..."/></ac:image>` — this is the definitive set of inline images.
+ - **Only download attachments whose filename appears in `body.storage`**. Do not bulk-download from `attachments --list`.
+
+ ```bash
+ # Step 1: fetch storage body
+ uv run --directory ~/.claude/vds-scripts --package vds-cli \
+   vds-cli confluence content page <PAGE_ID> --expand body.storage
+ ```
+
+ ```python
+ # Step 2: extract referenced filenames from storage XHTML
+ import re
+
+ def referenced_filenames(storage_xml: str) -> set[str]:
+     """Return every attachment filename referenced in the body.storage XHTML."""
+     return set(re.findall(r'ri:filename="([^"]+)"', storage_xml))
+
+ # e.g. {'screen-login.png', 'flow-diagram.png', 'table-header.png'}
+ ```
+
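The regex in the snippet above matches every `ri:filename`, including `<ri:attachment>` elements that sit inside attachment links (`<ac:link>`) rather than images. A possible refinement, sketched below, restricts the search to `<ac:image>` elements, keeps document order, and unescapes XML entities in filenames. The function name `inline_image_filenames` is hypothetical, and the sketch assumes `<ac:image>` elements are never nested.

```python
import html
import re

# One <ac:image>...</ac:image> element, possibly spanning several lines.
AC_IMAGE = re.compile(r"<ac:image\b[^>]*>.*?</ac:image>", re.DOTALL)
# The filename attribute of an <ri:attachment> element.
RI_FILENAME = re.compile(r'<ri:attachment\b[^>]*\bri:filename="([^"]+)"')


def inline_image_filenames(storage_xml: str) -> list[str]:
    """Return attachment filenames used by <ac:image> elements, in document order.

    One entry per occurrence, so a file embedded twice appears twice.
    Images that point at external URLs (<ri:url>) are skipped.
    """
    names = []
    for image in AC_IMAGE.finditer(storage_xml):
        match = RI_FILENAME.search(image.group(0))
        if match:
            # Storage XHTML escapes characters such as '&' in attribute values.
            names.append(html.unescape(match.group(1)))
    return names
```

Wrapping the result in `set(...)` gives the download list, while the ordered list is useful later for checking positions and counts.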
+ ## Rule 2 — Preserve Image-to-Text Positional Mapping
+
+ When converting to markdown/local file:
+
+ - Replace each `<ac:image>...<ri:attachment ri:filename="X"/>...</ac:image>` with `![X](images/X)` **at its original position** in the document. Never collect images into a list at the end.
+ - If the content contains `<table>`, **keep HTML table markup** in the markdown output. Markdown viewers (GitHub, Obsidian, VS Code) all render HTML tables. Do not flatten tables with nested bullets or merged cells into pure-markdown `|` tables — this destroys row/column relationships.
+ - Place the `images/` folder at the same level as `content.md`.
+
+ ```python
+ # Step 3: convert storage XHTML → markdown with inline image refs
+ import re
+
+ def storage_to_markdown(storage_xml: str) -> str:
+     """Convert Confluence storage XHTML to markdown, preserving image positions and tables."""
+     result = storage_xml
+
+     # Replace ac:image with markdown image refs (in-place)
+     def replace_ac_image(match: re.Match) -> str:
+         filename = re.search(r'ri:filename="([^"]+)"', match.group(0))
+         if filename:
+             name = filename.group(1)
+             return f'![{name}](images/{name})'
+         return match.group(0)
+
+     result = re.sub(
+         r'<ac:image[^>]*>.*?</ac:image>',
+         replace_ac_image,
+         result,
+         flags=re.DOTALL,
+     )
+
+     # Keep <table>...</table> blocks as-is (HTML in markdown)
+     # Convert other XHTML elements to markdown as usual:
+     # <p> → newline, <h1> → #, <li> → -, <strong> → **, etc.
+     # ...
+
+     return result
+ ```
+
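The "keep tables as-is, convert everything else" step is left as a comment in the snippet above. One way it might be done is to split the document on `<table>` blocks and run the converter only on the segments in between. This is a sketch under assumptions: `convert_outside_tables` and `simple_inline_markup` are hypothetical names, the converter handles only two tags for illustration, and the non-greedy pattern does not handle tables nested inside tables.

```python
import re
from collections.abc import Callable

# Capturing group so that re.split keeps the table blocks in its result.
TABLE_BLOCK = re.compile(r"(<table\b.*?</table>)", re.DOTALL | re.IGNORECASE)


def convert_outside_tables(xhtml: str, convert: Callable[[str], str]) -> str:
    """Apply `convert` to every segment outside <table>...</table>; keep tables verbatim."""
    parts = TABLE_BLOCK.split(xhtml)
    # With one capturing group, the table blocks land at the odd indexes.
    return "".join(
        part if index % 2 == 1 else convert(part)
        for index, part in enumerate(parts)
    )


def simple_inline_markup(segment: str) -> str:
    """Tiny illustrative converter: <strong> becomes **, <p> becomes a paragraph break."""
    segment = re.sub(r"</?strong>", "**", segment)
    segment = re.sub(r"<p>(.*?)</p>", r"\1\n\n", segment, flags=re.DOTALL)
    return segment
```

Because table blocks are passed through untouched, colspan/rowspan attributes and nested lists inside cells survive the conversion.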
+ ## Rule 3 — Classify Downloaded Attachments
+
+ If the user requests "all attachments" (audit/backup scenario):
+
+ - Split into two directories:
+   - `images/` — files referenced inline in `body.storage`
+   - `images_unused/` — attached to the page but **not** referenced inline
+ - Report summary: `Used: N inline / Total: M attached`
+
+ Otherwise, download only the referenced attachments:
+
+ ```bash
+ # Download only referenced attachments into .vsaf/docs/confluence/<page>/images/
+ # Absolute path: `uv run --directory` runs the command from that directory,
+ # so a relative --out path may resolve there instead of the project root.
+ OUT_DIR="$PWD/.vsaf/docs/confluence/<page-title>"
+ mkdir -p "$OUT_DIR/images"
+ # List the page's attachments once rather than once per file
+ attachments=$(uv run --directory ~/.claude/vds-scripts --package vds-cli \
+   vds-cli confluence content list-attachments <PAGE_ID> --json-only)
+ # Assumes REFERENCED_FILES holds one filename per line, so names with spaces survive
+ while IFS= read -r filename; do
+   att_id=$(jq -r --arg name "$filename" '.[] | select(.title == $name) | .id' <<<"$attachments")
+   uv run --directory ~/.claude/vds-scripts --package vds-cli \
+     vds-cli confluence content download-attachment "$att_id" --out "$OUT_DIR/images/$filename"
+ done <<<"$REFERENCED_FILES"
+ ```
+
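For the audit/backup case, the split into `images/` and `images_unused/` and the summary line can be expressed as a small pure function. The sketch below uses hypothetical names (`AttachmentSplit`, `classify_attachments`) and assumes the attachment titles and the set of referenced filenames have already been collected.

```python
from dataclasses import dataclass


@dataclass
class AttachmentSplit:
    used: list[str]    # destined for images/
    unused: list[str]  # destined for images_unused/

    def summary(self) -> str:
        """Format the report line described in Rule 3."""
        total = len(self.used) + len(self.unused)
        return f"Used: {len(self.used)} inline / Total: {total} attached"


def classify_attachments(attached: list[str], referenced: set[str]) -> AttachmentSplit:
    """Split attached filenames by whether body.storage references them inline."""
    used = [name for name in attached if name in referenced]
    unused = [name for name in attached if name not in referenced]
    return AttachmentSplit(used=used, unused=unused)
```

Keeping the classification separate from the download loop makes it possible to print the summary before any file is fetched.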
+ ## Rule 4 — Verify Before Reporting "Done"
+
+ After generating `content.md`:
+
+ 1. Count inline image refs in output: `grep -o '!\[[^]]*](images/[^)]*)' content.md | wc -l` (`grep -c` counts matching lines, so it undercounts when one line holds several images)
+ 2. Count `ri:filename` occurrences inside `<ac:image>` elements in `body.storage` (attachment links also carry `ri:filename` but are not images)
+ 3. **Both counts must match.** If they differ, log a warning with the diff:
+
+ ```
+ WARNING: Image ref mismatch — storage has 3 ri:filename refs, content.md has 2 inline images.
+ Missing: screen-step3.png
+ ```
+
+ Do not report success if counts diverge.
+
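The check in Rule 4 can also be done in Python, which makes it easy to name the missing files rather than only compare totals. The following is a sketch, not the package's implementation: `verify_image_refs` is a hypothetical name, the warning wording is modelled on the example above, and comparing per-filename counts is a slightly stricter check than the rule requires.

```python
import re
from collections import Counter

AC_IMAGE = re.compile(r"<ac:image\b[^>]*>.*?</ac:image>", re.DOTALL)
RI_FILENAME = re.compile(r'ri:filename="([^"]+)"')
# Markdown image refs that point into the images/ folder.
MD_IMAGE = re.compile(r"!\[[^\]]*\]\(images/([^)]+)\)")


def verify_image_refs(storage_xml: str, content_md: str) -> list[str]:
    """Return warning lines; an empty list means storage and output agree."""
    expected: Counter[str] = Counter()
    for image in AC_IMAGE.finditer(storage_xml):
        found = RI_FILENAME.search(image.group(0))
        if found:
            expected[found.group(1)] += 1
    actual = Counter(MD_IMAGE.findall(content_md))
    if expected == actual:
        return []
    # Counter subtraction keeps only the positive differences.
    missing = sorted((expected - actual).elements())
    extra = sorted((actual - expected).elements())
    warnings = [
        f"WARNING: Image ref mismatch: storage has {sum(expected.values())} "
        f"ri:filename refs, content.md has {sum(actual.values())} inline images."
    ]
    if missing:
        warnings.append("Missing: " + ", ".join(missing))
    if extra:
        warnings.append("Unexpected: " + ", ".join(extra))
    return warnings
```

A caller would report success only when the returned list is empty.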
+ ## Rule 5 — Correct Invocation Sequence (Step by Step)
+
+ ```bash
+ # Step 0: Set output directory (absolute, because `uv run --directory`
+ # runs the command from that directory and may resolve a relative --out there)
+ OUT_DIR="$PWD/.vsaf/docs/confluence/<page-title>"
+ mkdir -p "$OUT_DIR/images"
+
+ # Step 1: Fetch storage body (source of truth for image refs)
+ uv run --directory ~/.claude/vds-scripts --package vds-cli \
+   vds-cli confluence content page <PAGE_ID> --expand body.storage
+
+ # Step 2: Extract ri:filename values from storage XML (see Rule 1 Python snippet)
+
+ # Step 3: Download only referenced attachments
+ uv run --directory ~/.claude/vds-scripts --package vds-cli \
+   vds-cli confluence content download-attachment <ATT_ID> --out "$OUT_DIR/images/<filename>"
+
+ # Step 4: Convert storage XHTML → markdown (see Rule 2 Python snippet)
+ # - ac:image → ![](images/...) in-place
+ # - tables stay as HTML
+ # - output: $OUT_DIR/content.md + $OUT_DIR/images/
+
+ # Step 5: Verify (see Rule 4)
+ ```
+
+ ## Anti-Patterns (What NOT to Do)
+
+ | Anti-pattern | Why it fails |
+ |---|---|
+ | Fetch `body.view` then strip HTML | Loses image positions, macro-injected images leak in |
+ | `attachments --list` then download all | Downloads orphaned/versioned files (59 vs 3 actually used) |
+ | Flatten `<table>` to markdown `\|` pipe tables | Destroys merged cells, nested bullets, colspan/rowspan |
+ | Collect images at end of document | Breaks image↔context mapping (which screen? which step?) |
+ | Skip verification | Silent data loss — user trusts output without knowing images are missing |
+ | Export to `docs/` at project root | Pollutes project docs; use `.vsaf/docs/confluence/` instead |