@ngocsangairvds/vsaf 4.1.16 → 4.1.17
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/package.json
CHANGED
```diff
@@ -112,6 +112,20 @@ uv run --directory ~/.claude/vds-scripts --package spec_orchestrator vds-spec --
 - Research-backed investigation → `research-skill`
 - Structural graph review → `code-review-graph-skill`
 
+## Mandatory Rules
+
+### Confluence Content Export
+
+When fetching or exporting Confluence page content (especially BRD, KBNV, SRS pages with tables and images), you **MUST** read and follow `references/confluence-content-export.md` before proceeding. Key rules:
+
+1. **Use `body.storage`** (not `body.view`) as source of truth for image references
+2. **Only download attachments referenced in `body.storage`** — never bulk-download all attachments
+3. **Preserve image position** in output — replace `ac:image` with `![filename](images/filename)` in-place, never collect at end
+4. **Keep HTML tables** in markdown output — do not flatten to pipe tables
+5. **Verify image count** matches between `body.storage` and output before reporting done
+
+Failure to follow these rules causes silent data loss (wrong images downloaded, table structure destroyed, image↔text mapping broken).
+
 ## Progressive Disclosure
 
 Use these bundled references for depth:
@@ -121,6 +135,7 @@ Use these bundled references for depth:
 - `references/validation-commands.md`
 - `references/development-commands.md`
 - `references/specialist-routing.md`
+- `references/confluence-content-export.md`
 
 ## Notes
 
```
@@ -0,0 +1,127 @@

# Confluence Content Export — Mandatory Rules

When extracting content from a Confluence page (BRD, KBNV, SRS, or any page with tables/images), follow these rules strictly. Violations cause data loss that is invisible to the user.

## Rule 1 — Use `body.storage` as the Single Source of Truth for Image Refs

- **Never** rely on `body.view` for image references — it is post-render HTML that may add or drop images via macros.
- `body.storage` is raw XHTML containing `<ac:image><ri:attachment ri:filename="..."/></ac:image>` — this is the definitive set of inline images.
- **Only download attachments whose filename appears in `body.storage`**. Do not bulk-download from `attachments --list`.

```bash
# Step 1: fetch storage body
uv run --directory ~/.claude/vds-scripts --package vds-cli \
  vds-cli confluence content page <PAGE_ID> --expand body.storage
```
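
The Step 1 command returns the page, not bare XHTML. Assuming `vds-cli` prints the Confluence REST content object as JSON (not confirmed here; check the real output first), the storage markup sits at `body.storage.value`, which is where the Confluence REST API places it when `body.storage` is expanded. A minimal sketch with a hypothetical helper name:

```python
import json


def storage_from_page_json(raw: str) -> str:
    """Return the storage-format XHTML from a Confluence content JSON document.

    Assumes the Confluence REST shape {"body": {"storage": {"value": "..."}}};
    vds-cli's actual output format may differ.
    """
    page = json.loads(raw)
    try:
        return page["body"]["storage"]["value"]
    except (KeyError, TypeError) as exc:
        # body.storage is absent when the page was fetched without the expand;
        # fail loudly instead of falling back to body.view (see Rule 1).
        raise ValueError("page JSON has no body.storage.value") from exc
```

Saving the Step 1 output and passing its text to `storage_from_page_json` would yield the `storage_xml` string used in Step 2.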
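
```python
# Step 2: extract inline-image filenames from storage XHTML
import html
import re

storage_xml: str = ...  # body.storage content from Step 1
# Only scan <ac:image> blocks, so attachment links (<ac:link>) are not counted
blocks = re.findall(r"<ac:image\b.*?</ac:image>", storage_xml, flags=re.DOTALL)
# html.unescape turns entity-escaped names such as a&amp;b.png back into a&b.png
referenced = {html.unescape(n) for b in blocks for n in re.findall(r'ri:filename="([^"]+)"', b)}
# referenced = {'screen-login.png', 'flow-diagram.png', 'table-header.png'}
```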
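
The regex in Step 2 assumes double-quoted attributes. If that feels brittle, Python's standard-library `html.parser` can walk the storage XHTML instead: it treats `ac:image` and `ri:attachment` as ordinary tag names and unescapes attribute values, so `a&amp;b.png` becomes `a&b.png`. The sketch below (hypothetical names `InlineImageCollector` and `inline_image_refs`) keeps only attachments nested inside `<ac:image>` and returns them in document order with duplicates kept, which is the list a Rule 4 count needs. It is not a full storage-format parser.

```python
from html.parser import HTMLParser


class InlineImageCollector(HTMLParser):
    """Collect ri:filename values of attachments nested inside <ac:image> elements."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.depth = 0  # > 0 while inside an <ac:image> element
        self.filenames: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "ac:image":
            self.depth += 1
        elif tag == "ri:attachment" and self.depth > 0:
            name = dict(attrs).get("ri:filename")
            if name:
                # HTMLParser has already unescaped entities in attribute values
                self.filenames.append(name)

    def handle_endtag(self, tag):
        if tag == "ac:image" and self.depth > 0:
            self.depth -= 1


def inline_image_refs(storage_xml: str) -> list[str]:
    """Return inline image filenames in document order (duplicates kept)."""
    parser = InlineImageCollector()
    parser.feed(storage_xml)
    parser.close()
    return parser.filenames
```

For example, a page whose XHTML also contains `<ac:link><ri:attachment ri:filename="spec.pdf"/></ac:link>` would not yield `spec.pdf`, because that attachment is linked rather than shown inline.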

## Rule 2 — Preserve Image-to-Text Positional Mapping

When converting to markdown/local file:

- Replace each `<ac:image>...<ri:attachment ri:filename="X"/>...</ac:image>` with `![X](images/X)` **at its original position** in the document. Never collect images into a list at the end.
- If the content contains `<table>`, **keep HTML table markup** in the markdown output. Markdown viewers (GitHub, Obsidian, VS Code) all render HTML tables. Do not flatten tables with nested bullets or merged cells into pure-markdown `|` tables — this destroys row/column relationships.
- Place the `images/` folder at the same level as `content.md`.

```python
# Step 3: convert storage XHTML → markdown with inline image refs
import html
import re


def storage_to_markdown(storage_xml: str) -> str:
    """Convert Confluence storage XHTML to markdown, preserving image positions and tables."""
    result = storage_xml

    # Replace each ac:image with a markdown image ref at the same position
    def replace_ac_image(match: re.Match) -> str:
        filename = re.search(r'ri:filename="([^"]+)"', match.group(0))
        if filename:
            name = html.unescape(filename.group(1))
            # Spaces are not allowed in a bare markdown link target, so escape them
            return f"![{name}](images/{name.replace(' ', '%20')})"
        return match.group(0)  # no attachment (e.g. an external <ri:url> image): keep as-is

    result = re.sub(
        r"<ac:image\b[^>]*>.*?</ac:image>",
        replace_ac_image,
        result,
        flags=re.DOTALL,
    )

    # Keep <table>...</table> blocks as-is (HTML in markdown)
    # Convert other XHTML elements to markdown as usual:
    # <p> → newline, <h1> → #, <li> → -, <strong> → **, etc.
    # ...

    return result
```
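
The `# ...` placeholder in Step 3 leaves the non-table conversion open. One way to honour "keep `<table>` blocks as-is" is to split the document on table blocks and convert only the text between them. The sketch below does that with a few regex rules; `convert_outside_tables`, `TABLE_RE` and `RULES` are hypothetical names, the rules cover only headings, paragraphs, bold and list items, and nested tables are not supported because the non-greedy match stops at the first `</table>`. A production converter would more likely use a real HTML parser.

```python
import re

# Non-greedy: stops at the first </table>, so nested tables are NOT supported.
TABLE_RE = re.compile(r"(<table\b.*?</table>)", re.DOTALL | re.IGNORECASE)

# A few illustrative element rules, applied in order; extend as needed.
RULES = [
    (re.compile(r"<h([1-6])[^>]*>(.*?)</h\1>", re.DOTALL),
     lambda m: "\n" + "#" * int(m.group(1)) + " " + m.group(2).strip() + "\n\n"),
    (re.compile(r"<strong>(.*?)</strong>", re.DOTALL), lambda m: f"**{m.group(1)}**"),
    (re.compile(r"<li\b[^>]*>(.*?)</li>", re.DOTALL), lambda m: f"- {m.group(1).strip()}\n"),
    (re.compile(r"</?(?:ul|ol)\b[^>]*>"), lambda m: ""),
    (re.compile(r"<p\b[^>]*>(.*?)</p>", re.DOTALL), lambda m: m.group(1).strip() + "\n\n"),
]


def convert_outside_tables(xhtml: str) -> str:
    """Convert simple XHTML to markdown, copying every <table> block verbatim."""
    out = []
    # With a capturing group, re.split keeps the table blocks in the result list.
    for part in TABLE_RE.split(xhtml):
        if TABLE_RE.fullmatch(part):
            out.append("\n" + part + "\n\n")  # HTML table kept as-is
            continue
        for pattern, repl in RULES:
            part = pattern.sub(repl, part)
        out.append(part)
    return "".join(out)
```

It would run after the `ac:image` replacement, so image refs are already in place. One CommonMark behaviour to check in your viewer: text inside an HTML block such as `<table>` is not parsed as markdown, so an `![X](images/X)` ref inside a kept table cell may show up as literal text on GitHub unless it is surrounded by blank lines.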

## Rule 3 — Classify Downloaded Attachments

If the user requests "all attachments" (audit/backup scenario):

- Split into two directories:
  - `images/` — files referenced inline in `body.storage`
  - `images_unused/` — attached to the page but **not** referenced inline
- Report summary: `Used: N inline / Total: M attached`

```bash
# Download only referenced attachments.
# Assumes $REFERENCED_FILES holds one filename per line (names may contain spaces).
# vds: shorthand for the uv-run invocation used in the other steps.
vds() { uv run --directory ~/.claude/vds-scripts --package vds-cli vds-cli "$@"; }
mkdir -p images
attachments_json=$(vds confluence content list-attachments <PAGE_ID> --json-only)
while IFS= read -r filename; do
  [ -n "$filename" ] || continue
  att_id=$(jq -r --arg f "$filename" 'first(.[] | select(.title == $f) | .id) // empty' <<<"$attachments_json")
  [ -n "$att_id" ] || { echo "WARNING: no attachment titled $filename" >&2; continue; }
  vds confluence content download-attachment "$att_id" --out "images/$filename"
done <<<"$REFERENCED_FILES"
```
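
The split and summary line described at the top of this rule can be planned before anything is downloaded or moved. A small sketch, assuming `attached` holds the attachment titles (for example the `.title` values from `list-attachments --json-only`) and `referenced` holds the inline filenames from `body.storage`; `plan_download_dirs` is a hypothetical name:

```python
def plan_download_dirs(
    attached: list[str], referenced: set[str]
) -> tuple[dict[str, str], list[str], str]:
    """Map each attachment title to its folder; also list refs with no attachment."""
    plan = {name: ("images" if name in referenced else "images_unused") for name in attached}
    used = sum(1 for folder in plan.values() if folder == "images")
    # Inline refs that are not attached to this page would never be downloaded
    missing = sorted(referenced.difference(plan))
    summary = f"Used: {used} inline / Total: {len(plan)} attached"
    return plan, missing, summary
```

Reporting `missing` goes slightly beyond the rule: a referenced image that is not attached to this page (for instance, one attached to another page) would otherwise only surface at the Rule 4 check.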

## Rule 4 — Verify Before Reporting "Done"

After generating `content.md`:

1. Count inline image refs in output: `grep -o '!\[[^]]*\](images/[^)]*)' content.md | wc -l` (`grep -c` counts matching *lines*, so two images on one line would be counted once)
2. Count `ri:filename` occurrences inside `<ac:image>` elements in `body.storage`
3. **Both counts must match.** If they differ, log a warning with the diff:

```
WARNING: Image ref mismatch — storage has 3 ri:filename refs, content.md has 2 inline images.
Missing: screen-step3.png
```

Do not report success if counts diverge.
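
The three checks above can be sketched in Python. `verify_image_refs` is a hypothetical helper; it assumes `storage_refs` is the list of inline-image filenames from `body.storage` with duplicates kept (a set would undercount an image used twice) and that refs in `content.md` use the `![alt](images/<name>)` form the grep in step 1 looks for. It compares names as well as counts, so a swapped image is caught even when the totals agree.

```python
import re
from collections import Counter
from urllib.parse import unquote

# Matches inline refs of the form ![alt](images/<name>) and captures <name>
MD_IMAGE_RE = re.compile(r"!\[[^\]]*\]\(images/([^)]*)\)")


def verify_image_refs(storage_refs: list[str], content_md: str) -> list[str]:
    """Compare storage refs with content.md refs; return warning lines ([] means OK)."""
    # Decode percent-escapes (e.g. %20) in case link targets were URL-encoded
    output_refs = [unquote(name) for name in MD_IMAGE_RE.findall(content_md)]
    expected, actual = Counter(storage_refs), Counter(output_refs)
    if expected == actual:  # same names with the same multiplicity
        return []
    lines = [
        f"WARNING: Image ref mismatch - storage has {len(storage_refs)} ri:filename refs, "
        f"content.md has {len(output_refs)} inline images."
    ]
    missing = sorted((expected - actual).elements())
    unexpected = sorted((actual - expected).elements())
    if missing:
        lines.append("Missing: " + ", ".join(missing))
    if unexpected:
        lines.append("Unexpected: " + ", ".join(unexpected))
    return lines
```

An empty return value means it is safe to report done; otherwise print the lines and stop.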

## Rule 5 — Correct Invocation Sequence (Step by Step)

```bash
# B1: Fetch storage body (source of truth for image refs)
uv run --directory ~/.claude/vds-scripts --package vds-cli \
  vds-cli confluence content page <PAGE_ID> --expand body.storage

# B2: Extract ri:filename values from storage XML (see Rule 1 Python snippet)

# B3: Download only referenced attachments
uv run --directory ~/.claude/vds-scripts --package vds-cli \
  vds-cli confluence content download-attachment <ATT_ID> --out "images/<filename>"

# B4: Convert storage XHTML → markdown (see Rule 2 Python snippet)
#   - ac:image → ![X](images/X) in-place
#   - tables stay as HTML
#   - output: content.md + images/

# B5: Verify (see Rule 4)
```

## Anti-Patterns (What NOT to Do)

| Anti-pattern | Why it fails |
|---|---|
| Fetch `body.view` then strip HTML | Loses image positions, macro-injected images leak in |
| `attachments --list` then download all | Downloads orphaned/versioned files (59 downloaded vs. 3 actually used) |
| Flatten `<table>` to markdown `\|` pipe tables | Destroys merged cells, nested bullets, colspan/rowspan |
| Collect images at end of document | Breaks image↔context mapping (which screen? which step?) |
| Skip verification | Silent data loss — user trusts output without knowing images are missing |