@ngocsangairvds/vsaf 4.1.16 → 4.1.18
package/package.json
CHANGED
|
@@ -112,6 +112,21 @@ uv run --directory ~/.claude/vds-scripts --package spec_orchestrator vds-spec --
|
|
|
112
112
|
- Research-backed investigation → `research-skill`
|
|
113
113
|
- Structural graph review → `code-review-graph-skill`
|
|
114
114
|
|
|
115
|
+
## Mandatory Rules
|
|
116
|
+
|
|
117
|
+
### Confluence Content Export
|
|
118
|
+
|
|
119
|
+
When fetching or exporting Confluence page content (especially BRD, KBNV, SRS pages with tables and images), you **MUST** read and follow `references/confluence-content-export.md` before proceeding. Key rules:
|
|
120
|
+
|
|
121
|
+
0. **Output to `.vsaf/docs/confluence/<page-title>/`** — never to `docs/` at project root
|
|
122
|
+
1. **Use `body.storage`** (not `body.view`) as source of truth for image references
|
|
123
|
+
2. **Only download attachments referenced in `body.storage`** — never bulk-download all attachments
|
|
124
|
+
3. **Preserve image position** in output — replace `ac:image` with `![<filename>](images/<filename>)` in-place, never collect at end
|
|
125
|
+
4. **Keep HTML tables** in markdown output — do not flatten to pipe tables
|
|
126
|
+
5. **Verify image count** matches between `body.storage` and output before reporting done
|
|
127
|
+
|
|
128
|
+
Failure to follow these rules causes silent data loss (wrong images downloaded, table structure destroyed, image↔text mapping broken).
|
|
129
|
+
|
|
115
130
|
## Progressive Disclosure
|
|
116
131
|
|
|
117
132
|
Use these bundled references for depth:
|
|
@@ -121,6 +136,7 @@ Use these bundled references for depth:
|
|
|
121
136
|
- `references/validation-commands.md`
|
|
122
137
|
- `references/development-commands.md`
|
|
123
138
|
- `references/specialist-routing.md`
|
|
139
|
+
- `references/confluence-content-export.md`
|
|
124
140
|
|
|
125
141
|
## Notes
|
|
126
142
|
|
|
@@ -0,0 +1,158 @@
|
|
|
1
|
+
# Confluence Content Export — Mandatory Rules
|
|
2
|
+
|
|
3
|
+
When extracting content from a Confluence page (BRD, KBNV, SRS, or any page with tables/images), follow these rules strictly. Violations cause data loss that is invisible to the user.
|
|
4
|
+
|
|
5
|
+
## Rule 0 — Output Directory
|
|
6
|
+
|
|
7
|
+
All Confluence export artifacts **MUST** be placed under `.vsaf/docs/` in the project root, **not** in `docs/` at root.
|
|
8
|
+
|
|
9
|
+
```
<project-root>/
├── .vsaf/
│   └── docs/
│       └── confluence/
│           ├── <page-title>/
│           │   ├── content.md
│           │   └── images/
│           │       ├── screen-login.png
│           │       └── flow-diagram.png
│           └── <another-page>/
│               ├── content.md
│               └── images/
└── docs/          ← project docs, NOT for Confluence exports
```
|
|
24
|
+
|
|
25
|
+
- Use `.vsaf/docs/confluence/<page-title>/` as the output directory for each page.
|
|
26
|
+
- This keeps Confluence artifacts separate from project documentation and co-located with other VSAF working files.
|
|
27
|
+
- Pass `--out .vsaf/docs/confluence/<page-title>/` when using CLI commands that accept an output path.
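
The directory layout above can be built programmatically. The following is a minimal sketch, not part of the package: the helper name `confluence_export_dir` and the title sanitisation (replacing path separators with `-`) are assumptions made for illustration.

```python
from pathlib import Path
import re


def confluence_export_dir(project_root: Path, page_title: str) -> Path:
    """Return (and create) <project_root>/.vsaf/docs/confluence/<page-title>/ plus images/."""
    # Hypothetical sanitisation: path separators in a title would otherwise
    # create unintended nested directories.
    safe_title = re.sub(r'[\\/]+', '-', page_title).strip()
    if safe_title in ("", ".", ".."):
        raise ValueError(f"unusable page title: {page_title!r}")
    out_dir = project_root / ".vsaf" / "docs" / "confluence" / safe_title
    # images/ sits at the same level as content.md
    (out_dir / "images").mkdir(parents=True, exist_ok=True)
    return out_dir
```

Calling `confluence_export_dir(Path.cwd(), "<page-title>")` returns the directory that `content.md` should be written to, and never touches `docs/` at the project root.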
|
|
28
|
+
|
|
29
|
+
## Rule 1 — Use `body.storage` as the Single Source of Truth for Image Refs
|
|
30
|
+
|
|
31
|
+
- **Never** rely on `body.view` for image references — it is post-render HTML that may add or drop images via macros.
|
|
32
|
+
- `body.storage` is raw XHTML containing `<ac:image><ri:attachment ri:filename="..."/></ac:image>` — this is the definitive set of inline images.
|
|
33
|
+
- **Only download attachments whose filename appears in `body.storage`**. Do not bulk-download from `attachments --list`.
|
|
34
|
+
|
|
35
|
+
```bash
# Step 1: fetch storage body
uv run --directory ~/.claude/vds-scripts --package vds-cli \
  vds-cli confluence content page <PAGE_ID> --expand body.storage
```
|
|
40
|
+
|
|
41
|
+
```python
# Step 2: extract referenced filenames from storage XHTML
import re

storage_xml: str = ...  # placeholder: replace with the body.storage content
referenced = set(re.findall(r'ri:filename="([^"]+)"', storage_xml))
# referenced = {'screen-login.png', 'flow-diagram.png', 'table-header.png'}
```
|
|
48
|
+
|
|
49
|
+
## Rule 2 — Preserve Image-to-Text Positional Mapping
|
|
50
|
+
|
|
51
|
+
When converting to a local markdown file:
|
|
52
|
+
|
|
53
|
+
- Replace each `<ac:image>...<ri:attachment ri:filename="X"/>...</ac:image>` with `![X](images/X)` **at its original position** in the document. Never collect images into a list at the end.
|
|
54
|
+
- If the content contains `<table>`, **keep HTML table markup** in the markdown output. Markdown viewers (GitHub, Obsidian, VS Code) all render HTML tables. Do not flatten tables with nested bullets or merged cells into pure-markdown `|` tables — this destroys row/column relationships.
|
|
55
|
+
- Place the `images/` folder at the same level as `content.md`.
|
|
56
|
+
|
|
57
|
+
```python
# Step 3: convert storage XHTML → markdown with inline image refs
import re


def storage_to_markdown(storage_xml: str) -> str:
    """Convert Confluence storage XHTML to markdown, preserving image positions and tables."""
    result = storage_xml

    # Replace ac:image with markdown image refs (in-place)
    def replace_ac_image(match: re.Match) -> str:
        filename = re.search(r'ri:filename="([^"]+)"', match.group(0))
        if filename:
            name = filename.group(1)
            # Emit the image exactly where the <ac:image> element stood
            return f'![{name}](images/{name})'
        # No ri:filename found: leave the element untouched
        return match.group(0)

    result = re.sub(
        r'<ac:image[^>]*>.*?</ac:image>',
        replace_ac_image,
        result,
        flags=re.DOTALL,
    )

    # Keep <table>...</table> blocks as-is (HTML in markdown)
    # Convert other XHTML elements to markdown as usual:
    #   <p> → newline, <h1> → #, <li> → -, <strong> → **, etc.
    # ...

    return result
```
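
The "keep tables as-is" step is only described in comments above. One way it might be done is to split the document on table blocks and convert only the text between them. This is a sketch under stated assumptions: `convert_outside_tables` and `strip_paragraph_tags` are hypothetical names, tables are assumed not to be nested, and the toy converter stands in for the real XHTML-to-markdown step.

```python
import re
from typing import Callable

# Assumption: tables are not nested; a nested <table> would need a real XML parser.
TABLE_BLOCK = re.compile(r'(<table\b.*?</table>)', re.DOTALL | re.IGNORECASE)


def convert_outside_tables(xhtml: str, convert: Callable[[str], str]) -> str:
    """Apply `convert` to everything except <table>...</table> blocks."""
    parts = TABLE_BLOCK.split(xhtml)
    # With one capturing group, split() returns table blocks at odd indices
    # and the text between them at even indices.
    return ''.join(part if i % 2 else convert(part) for i, part in enumerate(parts))


def strip_paragraph_tags(fragment: str) -> str:
    """Toy converter standing in for the real XHTML-to-markdown conversion."""
    return re.sub(r'</?p>', '\n', fragment)


sample = '<p>Before</p><table><tr><td><p>cell</p></td></tr></table><p>After</p>'
converted = convert_outside_tables(sample, strip_paragraph_tags)
```

Image replacement should still run over the whole document first, tables included, so that an `ac:image` inside a table cell becomes an inline image reference in that same cell.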
|
|
87
|
+
|
|
88
|
+
## Rule 3 — Classify Downloaded Attachments
|
|
89
|
+
|
|
90
|
+
If the user requests "all attachments" (audit/backup scenario):
|
|
91
|
+
|
|
92
|
+
- Split into two directories:
|
|
93
|
+
- `images/` — files referenced inline in `body.storage`
|
|
94
|
+
- `images_unused/` — attached to the page but **not** referenced inline
|
|
95
|
+
- Report summary: `Used: N inline / Total: M attached`
|
|
96
|
+
|
|
97
|
+
```bash
# Download only referenced attachments into .vsaf/docs/confluence/<page>/images/
OUT_DIR=".vsaf/docs/confluence/<page-title>"
mkdir -p "$OUT_DIR/images"
# Note: the unquoted $REFERENCED_FILES is split on whitespace, so filenames
# containing spaces need an array or a `while read` loop instead.
for filename in $REFERENCED_FILES; do
  att_id=$(vds-cli confluence content list-attachments <PAGE_ID> --json-only \
    | jq -r ".[] | select(.title == \"$filename\") | .id")
  uv run --directory ~/.claude/vds-scripts --package vds-cli \
    vds-cli confluence content download-attachment "$att_id" --out "$OUT_DIR/images/$filename"
done
```
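
For the audit/backup scenario, the split into `images/` and `images_unused/` can be computed before anything is downloaded. A minimal sketch follows, assuming the attachment titles have already been listed; the function name `classify_attachments` and its return shape are hypothetical.

```python
from __future__ import annotations


def classify_attachments(referenced: set[str], attached: list[str]) -> dict:
    """Split attachment titles into inline-referenced and unused, with a summary line."""
    titles = sorted(set(attached))  # collapse duplicate titles from the listing
    used = [name for name in titles if name in referenced]
    unused = [name for name in titles if name not in referenced]
    return {
        "images": used,            # referenced inline in body.storage
        "images_unused": unused,   # attached to the page but not referenced inline
        "summary": f"Used: {len(used)} inline / Total: {len(titles)} attached",
    }


report = classify_attachments(
    referenced={"screen-login.png", "flow-diagram.png"},
    attached=["screen-login.png", "flow-diagram.png", "old-draft.png"],
)
```

Each key maps directly to the target directory name, so the download loop can iterate over `report["images"]` and `report["images_unused"]` separately.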
|
|
108
|
+
|
|
109
|
+
## Rule 4 — Verify Before Reporting "Done"
|
|
110
|
+
|
|
111
|
+
After generating `content.md`:
|
|
112
|
+
|
|
113
|
+
1. Count inline image refs in output: `grep -o '!\[[^]]*](images/[^)]*)' content.md | wc -l` (`grep -c` counts matching lines, so it undercounts when two images share a line)
|
|
114
|
+
2. Count `ri:filename` occurrences in `body.storage`
|
|
115
|
+
3. **Both counts must match.** If they differ, log a warning with the diff:
|
|
116
|
+
|
|
117
|
+
```
|
|
118
|
+
WARNING: Image ref mismatch — storage has 3 ri:filename refs, content.md has 2 inline images.
|
|
119
|
+
Missing: screen-step3.png
|
|
120
|
+
```
|
|
121
|
+
|
|
122
|
+
Do not report success if counts diverge.
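
The count comparison can also be done in one place, which makes it easy to name the missing files. The sketch below is illustrative only: `check_image_refs` is a hypothetical helper, and it assumes filenames appear identically in `body.storage` and in `content.md` (no XML-entity or URL-encoding differences).

```python
from collections import Counter
import re


def check_image_refs(storage_xml: str, markdown: str) -> tuple[int, int, list[str]]:
    """Return (storage count, markdown count, filenames missing from the markdown)."""
    storage_refs = re.findall(r'ri:filename="([^"]+)"', storage_xml)
    markdown_refs = re.findall(r'!\[[^\]]*\]\(images/([^)]+)\)', markdown)
    # Counter subtraction keeps duplicates: a file referenced twice in storage
    # but only once in the output is still reported as missing once.
    missing = sorted((Counter(storage_refs) - Counter(markdown_refs)).elements())
    return len(storage_refs), len(markdown_refs), missing


storage = (
    '<ac:image><ri:attachment ri:filename="screen-step1.png"/></ac:image>'
    '<ac:image><ri:attachment ri:filename="screen-step2.png"/></ac:image>'
    '<ac:image><ri:attachment ri:filename="screen-step3.png"/></ac:image>'
)
output = "![screen-step1.png](images/screen-step1.png) ![screen-step2.png](images/screen-step2.png)"
n_storage, n_markdown, missing = check_image_refs(storage, output)
if n_storage != n_markdown:
    print(f"WARNING: storage has {n_storage} ri:filename refs, content.md has {n_markdown} inline images.")
    print(f"  Missing: {', '.join(missing)}")
```

Because both images in `output` sit on one line, a line-based count would report 1 here, while this check reports 2 and names the one file that is absent.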
|
|
123
|
+
|
|
124
|
+
## Rule 5 — Correct Invocation Sequence (Step by Step)
|
|
125
|
+
|
|
126
|
+
```bash
# B0: Set output directory
OUT_DIR=".vsaf/docs/confluence/<page-title>"
mkdir -p "$OUT_DIR/images"

# B1: Fetch storage body (source of truth for image refs)
uv run --directory ~/.claude/vds-scripts --package vds-cli \
  vds-cli confluence content page <PAGE_ID> --expand body.storage

# B2: Extract ri:filename values from storage XML (see Rule 1 Python snippet)

# B3: Download only referenced attachments
uv run --directory ~/.claude/vds-scripts --package vds-cli \
  vds-cli confluence content download-attachment <ATT_ID> --out "$OUT_DIR/images/<filename>"

# B4: Convert storage XHTML → markdown (see Rule 2 Python snippet)
#   - ac:image → ![<filename>](images/<filename>) in-place
#   - tables stay as HTML
#   - output: $OUT_DIR/content.md + $OUT_DIR/images/

# B5: Verify (see Rule 4)
```
|
|
148
|
+
|
|
149
|
+
## Anti-Patterns (What NOT to Do)
|
|
150
|
+
|
|
151
|
+
| Anti-pattern | Why it fails |
|---|---|
| Fetch `body.view` then strip HTML | Loses image positions, macro-injected images leak in |
| `attachments --list` then download all | Downloads orphaned/versioned files (59 vs 3 actually used) |
| Flatten `<table>` to markdown `\|` pipe tables | Destroys merged cells, nested bullets, colspan/rowspan |
| Collect images at end of document | Breaks image↔context mapping (which screen? which step?) |
| Skip verification | Silent data loss: user trusts output without knowing images are missing |
| Export to `docs/` at project root | Pollutes project docs; use `.vsaf/docs/confluence/` instead |
|