pull-cli 0.1.0 (py3-none-any wheel)

@@ -0,0 +1,218 @@
1
+ Metadata-Version: 2.4
2
+ Name: pull-cli
3
+ Version: 0.1.0
4
+ Summary: AI-optimized Confluence evidence package extractor
5
+ Project-URL: Homepage, https://github.com/ThomasRohde/pull-cli
6
+ Project-URL: Repository, https://github.com/ThomasRohde/pull-cli
7
+ Project-URL: Issues, https://github.com/ThomasRohde/pull-cli/issues
8
+ Author-email: Thomas Rohde <rohde.thomas@gmail.com>
9
+ License: MIT
10
+ License-File: LICENSE
11
+ Keywords: ai,atlassian,cli,confluence,markdown
12
+ Classifier: Development Status :: 3 - Alpha
13
+ Classifier: Environment :: Console
14
+ Classifier: Intended Audience :: Developers
15
+ Classifier: License :: OSI Approved :: MIT License
16
+ Classifier: Operating System :: OS Independent
17
+ Classifier: Programming Language :: Python :: 3
18
+ Classifier: Programming Language :: Python :: 3.11
19
+ Classifier: Programming Language :: Python :: 3.12
20
+ Classifier: Programming Language :: Python :: 3.13
21
+ Requires-Python: >=3.11
22
+ Requires-Dist: atlassian-python-api>=4.0.7
23
+ Requires-Dist: beautifulsoup4>=4.12.3
24
+ Requires-Dist: lxml>=5.2.0
25
+ Requires-Dist: markdownify>=0.13.1
26
+ Requires-Dist: pyyaml>=6.0.2
27
+ Provides-Extra: dev
28
+ Requires-Dist: hatch>=1.14.0; extra == 'dev'
29
+ Requires-Dist: pytest>=8.3.0; extra == 'dev'
30
+ Requires-Dist: ruff>=0.6.9; extra == 'dev'
31
+ Provides-Extra: extract
32
+ Requires-Dist: defusedxml>=0.7.1; extra == 'extract'
33
+ Requires-Dist: openpyxl>=3.1.5; extra == 'extract'
34
+ Requires-Dist: pypdf>=5.0.0; extra == 'extract'
35
+ Requires-Dist: python-docx>=1.1.2; extra == 'extract'
36
+ Requires-Dist: python-pptx>=1.0.2; extra == 'extract'
37
+ Description-Content-Type: text/markdown
38
+
39
+ # pull-cli
40
+
41
+ `pull-cli` installs the `pull` command, a read-only Confluence extractor that produces AI-consumable evidence packages. It is rendered-page-first: page Markdown and the optional Markdown bundle are generated from the current published page as visible to the authenticated user, while the storage-format XML is used for macro recovery, provenance, and fallback (and is written as `source.storage.xml` in full mode).
42
+
43
+ The default output mode is `simple`: a quiet, agent-facing package containing the root AI Markdown file, per-page Markdown files, assets and sidecars, and validation control files. Use `--output-mode full` when you also want `bundle.md`, page HTML snapshots, and storage-source sidecars; `--bundle` requests `bundle.md` in simple mode as well.
44
+
45
+ Confluence access is implemented through `atlassian-python-api` behind a small `pull_cli.clients` protocol. The extraction, redaction, manifest, asset, link, and validation contracts remain owned by `pull-cli`.
46
+
47
+ ## Install
48
+
49
+ ```bash
50
+ uvx pull-cli --help
51
+ uv tool install pull-cli
52
+ pip install pull-cli
53
+ ```
54
+
55
+ The package name is `pull-cli`. The import package is `pull_cli`. Console scripts are `pull` and `pull-cli`.
56
+
57
+ ## Quickstart
58
+
59
+ Cloud:
60
+
61
+ ```bash
62
+ export PULL_URL=https://example.atlassian.net/wiki
63
+ export PULL_USER=you@example.com
64
+ export PULL_TOKEN=your-api-token
65
+ pull 123456 -o pulled-confluence
66
+ ```
67
+
68
+ Data Center or Server:
69
+
70
+ ```bash
71
+ export PULL_URL=https://confluence.example.com/confluence
72
+ export PULL_TOKEN=your-personal-access-token
73
+ pull --page-id 123456 -o pulled-confluence
74
+ ```
75
+
76
+ `CONFPUB_URL`, `CONFPUB_USER`, `CONFPUB_TOKEN`, and `CONFPUB_SSL_VERIFY` are accepted as compatibility fallbacks; they are consulted only after CLI flags, `PULL_*` variables, and any `--config` file.
77
+
78
+ ## CLI Examples
79
+
80
+ ```bash
81
+ pull 123456 -o pulled
82
+ pull "https://example.atlassian.net/wiki/spaces/EA/pages/123456/Architecture" -o pulled
83
+ pull --space EA --title "Architecture Overview" -o pulled
84
+ pull --page-id 123456 --tree --depth 3 --max-pages 100 -o tree
85
+ pull --page-id 123456 --tree --assets all --extract-attachments -o offline
86
+ pull --page-id 123456 --tree --comments -o with-comments
87
+ pull --page-id 123456 --output-mode full -o full-evidence
88
+ pull --page-id 123456 --output-mode simple --bundle -o simple-with-bundle
89
+ pull --page-id 123456 --json -o pulled
90
+ pull validate pulled
91
+ pull guide --json
92
+ ```
93
+
94
+ Selector resolution order is: explicit `--page-id`, explicit `--url`, positional URL, positional numeric page ID, then `--space` plus `--title`.
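A minimal Python sketch of that precedence, assuming the five selector inputs arrive as optional strings. `resolve_selector` and `Selector` are hypothetical names for illustration, not `pull_cli` internals, and treating any other positional value as unusable is an assumption.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Selector:
    kind: str     # "page_id", "url", or "space_title"
    value: tuple  # the selector arguments, in order


def resolve_selector(
    page_id: Optional[str] = None,
    url: Optional[str] = None,
    positional: Optional[str] = None,
    space: Optional[str] = None,
    title: Optional[str] = None,
) -> Selector:
    """Apply the documented order: --page-id, --url, positional URL,
    positional numeric page ID, then --space plus --title."""
    if page_id:
        return Selector("page_id", (page_id,))
    if url:
        return Selector("url", (url,))
    if positional:
        if re.match(r"https?://", positional):
            return Selector("url", (positional,))
        if positional.isdigit():
            return Selector("page_id", (positional,))
    if space and title:
        return Selector("space_title", (space, title))
    raise ValueError("no usable page selector")
```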
95
+
96
+ ## Output Package
97
+
98
+ Default `simple` mode:
99
+
100
+ ```text
101
+ pulled-confluence/
102
+ ├── page-title.md
103
+ ├── page-title.yaml
104
+ ├── manifest.yaml
105
+ ├── pages/
106
+ │ └── 0001-page-slug/
107
+ │ ├── index.md
108
+ │ ├── page.json
109
+ │ ├── comments.md # with --comments, only when comments exist
110
+ │ └── assets/
111
+ └── diagnostics/
112
+ ├── warnings.jsonl
113
+ └── unresolved-links.md
114
+ ```
115
+
116
+ `page-title.md` is named from the sanitized root page title and is the recommended first file to give another AI agent. In simple mode it links only the reading/navigation surface: page Markdown paths, assets, sidecars, and explicitly requested agent-facing extras such as `bundle.md` or `chunks.jsonl`. Warning counts are shown, but control files are not linked from the root AI Markdown.
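The exact sanitization rules are not spelled out here. As an illustration only, one plausible Python approach transliterates to ASCII, lowercases, collapses runs of non-alphanumerics into hyphens, and prefixes page folders with a zero-padded order number in the style of `0001-page-slug`. `slugify`, `page_dir_name`, and the 60-character limit are assumptions.

```python
import re
import unicodedata


def slugify(title: str, max_len: int = 60) -> str:
    """Hypothetical rule: ASCII-only, lowercase, hyphen-separated."""
    # Decompose accents (é -> e + combining mark), then drop non-ASCII bytes.
    ascii_title = (
        unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode("ascii")
    )
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_title.lower()).strip("-")
    # Fall back to a fixed name when nothing usable survives.
    return slug[:max_len].rstrip("-") or "page"


def page_dir_name(order: int, title: str) -> str:
    """A zero-padded order prefix keeps directory listings in tree order."""
    return f"{order:04d}-{slugify(title)}"
```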
117
+
118
+ `page-title.yaml` is the machine-readable version of that AI navigation manifest, also named from the sanitized root page title. It intentionally omits noisy provenance and raw API details; use `manifest.yaml` when you need full validation/provenance data. The exact generated filenames are recorded in `manifest.yaml` under `paths.ai_entry` and `paths.ai_manifest`. AI navigation paths are package-root-relative: resolve them against the directory containing the root AI Markdown/YAML file, not the caller's shell working directory.
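A short Python sketch of resolving those paths, assuming you already hold the path of the AI Markdown/YAML file and a relative path read from it. The helper name and the check that rejects paths escaping the package are illustrative assumptions, not documented `pull` behavior.

```python
from pathlib import Path


def resolve_package_path(ai_file: Path, rel_path: str) -> Path:
    """Resolve a package-root-relative path against the AI entry file's folder.

    The anchor is the directory containing the AI Markdown/YAML file, never
    the caller's current working directory.
    """
    package_root = ai_file.resolve().parent
    # Split on "/" so the POSIX-style paths also work on Windows.
    candidate = package_root.joinpath(*rel_path.split("/")).resolve()
    # Defensive extra (assumption): refuse paths that climb out of the package.
    if not candidate.is_relative_to(package_root):
        raise ValueError(f"path escapes package root: {rel_path}")
    return candidate
```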
119
+
120
+ `manifest.yaml`, `page.json`, and diagnostics files are still written in simple mode so `pull validate <output-dir>` and provenance checks work. `--force` never deletes stale files from earlier runs; use `--clean` when switching modes if you need the physical tree to contain only files from the new mode.
121
+
122
+ `--output-mode full` adds the full evidence artifacts:
123
+
124
+ ```text
125
+ pulled-confluence/
126
+ ├── bundle.md
127
+ └── pages/
128
+ └── 0001-page-slug/
129
+ ├── index.html
130
+ └── source.storage.xml
131
+ ```
132
+
133
+ `bundle.md` concatenates pages in page/tree order with stable delimiters for AI use; local links embedded in the bundle are rebased to the package root. `index.html` and `source.storage.xml` are raw/reference artifacts, not the primary navigation surface.
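The rebasing step can be sketched in Python with `posixpath`, assuming each page's local links are relative to its own folder (for example `pages/0001-page-slug`). The function name is hypothetical, the delimiter format of `bundle.md` is not modeled, and leaving external and fragment-only links untouched is an assumption.

```python
import posixpath
from urllib.parse import urlsplit


def rebase_to_root(page_dir: str, href: str) -> str:
    """Rewrite a page-relative local link so it resolves from the package root.

    page_dir is the page folder relative to the root, e.g. "pages/0001-intro".
    Links with a scheme or host (https:, mailto:, ...), absolute paths, and
    fragment-only or query-only links are returned unchanged.
    """
    parts = urlsplit(href)
    if parts.scheme or parts.netloc or not parts.path or href.startswith("/"):
        return href
    rebased = posixpath.normpath(posixpath.join(page_dir, parts.path))
    if parts.query:
        rebased += f"?{parts.query}"
    if parts.fragment:
        rebased += f"#{parts.fragment}"
    return rebased
```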
134
+
135
+ For tree pulls, nested page paths are the default. The manifest always carries stable numeric ordering.
136
+
137
+ ## Auth and Config
138
+
139
+ Resolution order:
140
+
141
+ 1. CLI flags such as `--base-url`, `--user`, `--token`, `--ssl-verify`.
142
+ 2. `PULL_*` environment variables.
143
+ 3. Optional YAML config from `--config`.
144
+ 4. `CONFPUB_*` compatibility environment variables.
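A minimal Python sketch of that order for a single setting. It assumes the YAML config has already been loaded into a plain dict whose keys (`base_url`, `user`, `token`, `ssl_verify`) are hypothetical, as is the `resolve_setting` helper; treating empty values as unset is also an assumption.

```python
import os
from typing import Mapping, Optional

# Hypothetical mapping from setting name to environment-variable suffix.
_ENV_SUFFIX = {"base_url": "URL", "user": "USER", "token": "TOKEN", "ssl_verify": "SSL_VERIFY"}


def resolve_setting(
    name: str,
    cli_value: Optional[str],
    config: Mapping[str, str],
    env: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Return the first non-empty value in the documented order:
    CLI flag, PULL_* variable, --config YAML, CONFPUB_* variable."""
    suffix = _ENV_SUFFIX[name]
    candidates = (
        cli_value,
        env.get(f"PULL_{suffix}"),
        config.get(name),
        env.get(f"CONFPUB_{suffix}"),
    )
    for value in candidates:
        if value:
            return value
    return None
```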
145
+
146
+ `--ssl-verify` accepts `true`, `false`, or a CA bundle path.
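One way to interpret that value in Python, shown as a sketch. The bool-or-path result matches the shape of the `verify` argument in `requests`; whether `pull` forwards it that way, accepts other spellings such as `yes`, or checks the path up front are assumptions, and `parse_ssl_verify` is a hypothetical name.

```python
from pathlib import Path
from typing import Union


def parse_ssl_verify(raw: str) -> Union[bool, str]:
    """Map an --ssl-verify value to True, False, or a CA bundle path."""
    lowered = raw.strip().lower()
    if lowered == "true":
        return True
    if lowered == "false":
        return False
    # Anything else is treated as a CA bundle path (assumed to be a file).
    path = Path(raw).expanduser()
    if not path.is_file():
        raise ValueError(f"CA bundle not found: {raw}")
    return str(path)
```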
147
+
148
+ ## Macro, Asset, and Link Behavior
149
+
150
+ The extractor uses a macro adapter registry. Current adapters cover panels/admonitions, code/noformat, status, expand, tabs, layout flattening, TOC placeholders, children/page tree links when in scope, include/excerpt placeholders or inline source when available, attachments, displayed files, Jira placeholders, diagram snapshots, dynamic snapshots, HTML macro sanitization, and unknown macro placeholders.
151
+
152
+ The asset policy defaults to `visible`, which downloads rendered images, visible attachment links, file macros, and rendered diagram images where discoverable. `--assets page` downloads all page attachments. `--assets all` combines the visible/referenced assets with all page attachments and macro-listed files where discoverable. `--no-assets` skips downloads, keeps the original source links, and records warnings.
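The policies can be modeled as set operations. This Python sketch assumes the three candidate sets (visible assets, page attachments, macro-listed files) have already been discovered; the function name, the `"none"` stand-in for `--no-assets`, and the reading that `page` means page attachments only are assumptions.

```python
from typing import AbstractSet, FrozenSet


def select_assets(
    policy: str,
    visible: AbstractSet[str],
    page_attachments: AbstractSet[str],
    macro_files: AbstractSet[str],
) -> FrozenSet[str]:
    """Return the asset names to download under a given policy."""
    if policy == "visible":
        return frozenset(visible)
    if policy == "page":
        return frozenset(page_attachments)
    if policy == "all":
        return frozenset().union(visible, page_attachments, macro_files)
    if policy == "none":  # hypothetical stand-in for --no-assets
        return frozenset()
    raise ValueError(f"unknown asset policy: {policy}")
```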
153
+
154
+ Local links to pages in the pulled tree are rewritten to relative `index.md` paths. Downloaded asset links are rewritten to local files. External, mailto, Jira, and out-of-scope Confluence links are preserved. Same-page anchors are normalized where possible; unresolved anchors become diagnostics.
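A Python sketch of the in-tree page rewrite, assuming a map from page ID to package-root-relative folder and Cloud-style `/pages/<id>/` URLs. The regex, helper name, and map are hypothetical; Data Center URL shapes such as `viewpage.action?pageId=` are not handled here.

```python
import posixpath
import re
from typing import Mapping

# Hypothetical pattern for Cloud-style page URLs: .../pages/<id>/<title>
_PAGE_ID = re.compile(r"/pages/(\d+)")


def rewrite_page_link(href: str, current_dir: str, page_dirs: Mapping[str, str]) -> str:
    """Point links at pulled pages to their local index.md; keep the rest."""
    match = _PAGE_ID.search(href)
    if match is None or match.group(1) not in page_dirs:
        return href  # external, mailto, Jira, or out-of-scope: preserved
    target = posixpath.join(page_dirs[match.group(1)], "index.md")
    relative = posixpath.relpath(target, start=current_dir)
    _, _, fragment = href.partition("#")
    return f"{relative}#{fragment}" if fragment else relative
```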
155
+
156
+ ## Comments
157
+
158
+ Comments are skipped by default. Use `--comments` to fetch page-level and inline comments for each pulled page. When comments exist, `pull` writes a page-local `comments.md` sidecar with agent-readable metadata and Markdown-converted comment bodies.
159
+
160
+ Comment sidecars are agent-facing reading surfaces: the root AI Markdown page hierarchy links them in simple mode, the page `index.md` header links the local sidecar, and the AI YAML includes the optional comments path and count. If one page's comments cannot be fetched, the pull continues with `W_COMMENTS_FETCH_FAILED` and validation can still pass for the partial package.
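The continue-on-failure behavior can be sketched in Python as a per-page loop. The warning code comes from the text above; the `fetch` callable, the warning dict fields, and the function name are hypothetical.

```python
from typing import Callable, Dict, List


def collect_comments(
    page_ids: List[str],
    fetch: Callable[[str], List[str]],
    warnings: List[Dict[str, str]],
) -> Dict[str, List[str]]:
    """Fetch comments page by page; one failure becomes a warning, not an abort."""
    comments: Dict[str, List[str]] = {}
    for page_id in page_ids:
        try:
            bodies = fetch(page_id)
        except Exception as exc:  # broad on purpose: any fetch error is non-fatal
            warnings.append(
                {"code": "W_COMMENTS_FETCH_FAILED", "page_id": page_id, "detail": str(exc)}
            )
            continue
        # Mirror the rule that comments.md is written only when comments exist.
        if bodies:
            comments[page_id] = bodies
    return comments
```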
161
+
162
+ ## JSON Mode
163
+
164
+ With `--json` or `LLM=true`, stdout is exactly one JSON object with:
165
+
166
+ ```json
167
+ {
168
+ "schema_version": "1.0",
169
+ "request_id": "req_...",
170
+ "ok": true,
171
+ "command": "pull",
172
+ "target": {},
173
+ "result": {},
174
+ "warnings": [],
175
+ "errors": [],
176
+ "metrics": {}
177
+ }
178
+ ```
179
+
180
+ Progress, retries, warnings, and debug output belong on stderr.
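Because stdout carries exactly one JSON object, a caller can parse it as a single document. This Python sketch checks the top-level keys listed above; the helper name is hypothetical, and treating a missing key as an error is an assumption.

```python
import json
from typing import Any, Dict

REQUIRED_KEYS = (
    "schema_version", "request_id", "ok", "command",
    "target", "result", "warnings", "errors", "metrics",
)


def parse_envelope(stdout_text: str) -> Dict[str, Any]:
    """Parse --json stdout and check the documented top-level keys."""
    # json.loads rejects trailing data, so two objects on stdout fail here.
    envelope = json.loads(stdout_text)
    if not isinstance(envelope, dict):
        raise ValueError("envelope is not a JSON object")
    missing = [key for key in REQUIRED_KEYS if key not in envelope]
    if missing:
        raise ValueError(f"envelope missing keys: {missing}")
    return envelope
```

In practice you would pass it the `stdout` of something like `subprocess.run(["pull", "--page-id", "123456", "--json", "-o", "pulled"], capture_output=True, text=True)`, leaving stderr for progress output.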
181
+
182
+ ## Security
183
+
184
+ `pull` is read-only. It does not mutate Confluence, fetch drafts by default, bypass permissions, or call LLM services. Tokens, Authorization headers, cookies, signed download query parameters, and token-like strings are redacted before JSON envelopes, manifests, page metadata, and diagnostics are written.
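Two of those redactions, signed download query parameters and Authorization header values, can be sketched with the Python standard library. The parameter names in `SENSITIVE_PARAMS`, the `REDACTED` marker, and both function names are assumptions; the detection of arbitrary token-like strings is not reproduced.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

REDACTED = "REDACTED"
# Hypothetical set of query parameters treated as signing material.
SENSITIVE_PARAMS = {"token", "signature", "sig", "jwt", "x-amz-signature", "x-amz-credential"}
_AUTH_HEADER = re.compile(r"(?i)\b(authorization:\s*)(bearer|basic)\s+\S+")


def redact_url(url: str) -> str:
    """Replace the values of sensitive query parameters, keeping the rest."""
    parts = urlsplit(url)
    query = [
        (key, REDACTED if key.lower() in SENSITIVE_PARAMS else value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
    ]
    return urlunsplit(parts._replace(query=urlencode(query)))


def redact_auth_headers(text: str) -> str:
    """Mask the credential that follows 'Authorization: Bearer' or 'Basic'."""
    return _AUTH_HEADER.sub(lambda m: f"{m.group(1)}{m.group(2)} {REDACTED}", text)
```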
185
+
186
+ Rendered HTML snapshots are sanitized by removing executable tags and event attributes. HTML macro content is made inert before conversion.
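A standard-library Python sketch of making HTML inert: drop an assumed set of executable elements together with their content and strip every `on*` event attribute. The element list and the class and function names are assumptions; `pull` itself depends on `beautifulsoup4` and `lxml`, and its actual rules may differ.

```python
from html import escape
from html.parser import HTMLParser

# Assumed "executable" elements. <embed> is void (no end tag), so it is
# dropped without entering skip mode.
EXECUTABLE = {"script", "iframe", "object", "embed"}


class InertHTML(HTMLParser):
    """Re-emits HTML without executable elements or on* attributes.

    Comments and doctypes are dropped because their handlers are not overridden.
    """

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.parts: list[str] = []
        self.skip_depth = 0  # > 0 while inside a dropped element

    def _render(self, tag: str, attrs, close: str = "") -> str:
        rendered = [tag]
        for name, value in attrs:
            if name.startswith("on"):  # onclick, onerror, onload, ...
                continue
            rendered.append(name if value is None else f'{name}="{escape(value)}"')
        return "<" + " ".join(rendered) + close + ">"

    def handle_starttag(self, tag, attrs):
        if tag in EXECUTABLE:
            if tag != "embed":
                self.skip_depth += 1
        elif not self.skip_depth:
            self.parts.append(self._render(tag, attrs))

    def handle_startendtag(self, tag, attrs):
        if tag not in EXECUTABLE and not self.skip_depth:
            self.parts.append(self._render(tag, attrs, close=" /"))

    def handle_endtag(self, tag):
        if tag in EXECUTABLE:
            if tag != "embed" and self.skip_depth:
                self.skip_depth -= 1
        elif not self.skip_depth:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            # Text arrives unescaped (convert_charrefs=True), so escape it again.
            self.parts.append(escape(data, quote=False))


def make_inert(html_text: str) -> str:
    parser = InertHTML()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.parts)
```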
187
+
188
+ ## Validation
189
+
190
+ ```bash
191
+ pull validate pulled-confluence
192
+ pull validate pulled-confluence/manifest.yaml --json
193
+ ```
194
+
195
+ Validation checks manifest shape, AI navigation manifest paths, relative paths, page files, optional comment sidecars, asset checksums, diagnostics JSONL, Markdown local links, and token-like markers in text outputs.
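Two of those checks, asset checksums and diagnostics JSONL, are straightforward to sketch in Python. The use of SHA-256 and both function names are assumptions; where `pull` records expected checksums is not shown.

```python
import hashlib
import json
from pathlib import Path
from typing import List


def sha256_of(path: Path) -> str:
    """Hex digest of a file, read in chunks so large assets stay cheap."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def jsonl_errors(text: str) -> List[str]:
    """Report lines of a diagnostics JSONL file that are not JSON objects."""
    errors = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # blank lines are tolerated in this sketch
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            errors.append(f"line {number}: {exc.msg}")
            continue
        if not isinstance(record, dict):
            errors.append(f"line {number}: expected a JSON object")
    return errors
```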
196
+
197
+ ## Development
198
+
199
+ ```bash
200
+ uv sync --all-extras
201
+ uv run ruff check .
202
+ uv run pytest
203
+ uv build
204
+ uv run pull --help
205
+ uv run pull guide --json
206
+ uv run python tests/generate_fixture_output.py .tmp/generated-fixture
207
+ uv run pull validate .tmp/generated-fixture
208
+ ```
209
+
210
+ Live smoke testing requires a readable Confluence page and credentials through `PULL_*` or `CONFPUB_*`.
211
+
212
+ ## Releasing
213
+
214
+ Versions are managed from `src/pull_cli/__init__.py` through Hatch. Use `uv run hatch version patch`, `uv run hatch version minor`, or `uv run hatch version major`; `pull --version`, built package metadata, and GitHub release tags are expected to match. See [RELEASING.md](RELEASING.md) for the PyPI trusted publisher setup and release flow.
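A quick Python sketch of the consistency check between `__init__.py` and a release tag. It assumes a `__version__ = "..."` assignment, similar to what Hatch's regex version source reads, and tags of the form `v0.1.0`; both helper names are hypothetical.

```python
import re

# Finds a module-level `__version__ = "..."` line anywhere in the file.
_VERSION_LINE = re.compile(r"""^__version__\s*=\s*["']([^"']+)["']""", re.MULTILINE)


def version_from_init(init_text: str) -> str:
    match = _VERSION_LINE.search(init_text)
    if match is None:
        raise ValueError("no __version__ assignment found")
    return match.group(1)


def tag_matches(init_text: str, tag: str) -> bool:
    """Assumes release tags look like "v0.1.0" (a bare "0.1.0" also works)."""
    return tag.removeprefix("v") == version_from_init(init_text)
```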
215
+
216
+ ## License
217
+
218
+ MIT. See [LICENSE](LICENSE).
@@ -0,0 +1,31 @@
1
+ pull_cli/__init__.py,sha256=NwPRGSlXfZCb8Z4R1qZXHlFuXjmIGWj3CDBUsow4Rpk,105
2
+ pull_cli/__main__.py,sha256=UP0yjf5bUzmplYRElA6c5FuXZNu3sTDMJfOpZsy3Bqg,115
3
+ pull_cli/assets.py,sha256=jh1Jgl-la-zA2jA4fhy58Bt9geEuffKrKRuy5yIZnA8,8356
4
+ pull_cli/attachment_extractors.py,sha256=9gZr7K5hFN7sp_Xp9jSqXdQy2Gt-EH7bwU_8TKAbB6c,2769
5
+ pull_cli/cli.py,sha256=RWDrY0WWo-tmqAuOtOvoG7cYhEJL7eKsomaJWFLpjDo,14896
6
+ pull_cli/config.py,sha256=Itkuda5Ir3yY6RRYYCNGIOm6yfW1ntSXgP6tnrqJmS4,2294
7
+ pull_cli/crawler.py,sha256=dLA5OBQPSznqXkDEPfiQTEnnv5JSo-9fWM5SEFGSrwA,1582
8
+ pull_cli/envelope.py,sha256=cwfFkR_ZERf-U4FkB-69qTd3ohIs6uZpvwxMY4iPWZk,1647
9
+ pull_cli/errors.py,sha256=20lRBifyj_KTuCaJWEQWAExDPNlWGp2-_Wig3SS9Igo,1155
10
+ pull_cli/extractor.py,sha256=klALkkc5_UgeOzTuYhlcJ_2PvzoXRmxBay_eTR9HU0o,12911
11
+ pull_cli/guide.py,sha256=PEXHT_B01JVjJSgmt8s_HVSccja16TfLDufxXppFtMM,6047
12
+ pull_cli/html_normalizer.py,sha256=JAf8fRnDUnUaIGnK1fHNJCz3FSdzPANea1eLSvBHCTE,4225
13
+ pull_cli/links.py,sha256=Zbb8s8f-TT9x6hFLocz-u4WHqU_F7SJt8OGf6cUMinU,6136
14
+ pull_cli/macros.py,sha256=WO2rWdH24X0x7dpn_hDlj6kVfOSsw3jS-UlRNgSiFlo,18938
15
+ pull_cli/markdown_writer.py,sha256=Zl4MR4MMssnBrbbln-1lixaBCsfSM3tD9vVAzu7F7Xk,673
16
+ pull_cli/models.py,sha256=SDj4u_0Mj1tUbdxFBrsjGx7w1OUQoYxAgO9xnxXP3S8,6276
17
+ pull_cli/paths.py,sha256=DTt26KNqFTk08uUL1g24UcHFdbJ1wB3WGwEvLIzc74g,1220
18
+ pull_cli/resolver.py,sha256=Jyoznr9ff-_lwgcMtKal1Yo-4E58XD7JIOfddHmGxHI,2872
19
+ pull_cli/security.py,sha256=p5ozNBKvvT2g0LFgB8Qjf6HYxN_-laSEFuO7Q_EiWN8,3760
20
+ pull_cli/validator.py,sha256=nBiFbAgbil0dCpkr_VbL1BTefhT_SeuuKeSWQx5HGSk,18935
21
+ pull_cli/writer.py,sha256=7qN-7TzDG0cZCndddR8A6jxJs-AmB_FeUIyTbhj4Shs,31656
22
+ pull_cli/clients/__init__.py,sha256=54sYdetJJ03RnCtRP4xbINO0JqlRpoUB2PN-rlBx13k,267
23
+ pull_cli/clients/base.py,sha256=KxgYGCYFeu1NRiFo8-HayvRYVEjCrtmjGBBoHYfzx4Q,858
24
+ pull_cli/clients/cloud_v2.py,sha256=YhLk6mv1kKZq2NmuDC52uJFdfvE1Y9J-Er6wR9k0NJE,5571
25
+ pull_cli/clients/data_center.py,sha256=1zN35inHrCni-X2xxny_NanP3k5Lf4RPa5veJLdTn7M,15647
26
+ pull_cli/clients/hybrid.py,sha256=gBOSMrxvUbV9pHYP5YpWa53lL1CIE-WfyUc-A1sJtkA,424
27
+ pull_cli-0.1.0.dist-info/METADATA,sha256=6nxkPhYw8M26ogIi5YtjyQxnF0qt9TZo2a9HVirUcso,10056
28
+ pull_cli-0.1.0.dist-info/WHEEL,sha256=mffPy8wBnZQn2VnJUU5jE99KsxaSfiyMHV9Yt0aLVxs,87
29
+ pull_cli-0.1.0.dist-info/entry_points.txt,sha256=WbpaUb8ralPybdTXQvn4ntQzXX3PpcixXZJ9NZXz3LU,72
30
+ pull_cli-0.1.0.dist-info/licenses/LICENSE,sha256=XeaSn5y2iyNPWLW7TlPEWT8OkLhuZw-koqZ5bhUw5tg,1069
31
+ pull_cli-0.1.0.dist-info/RECORD,,
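Each RECORD row above is `path,hash,size`, where the hash is `sha256=` followed by the URL-safe base64 digest with its trailing `=` padding removed, as the wheel format specifies; RECORD's own row leaves hash and size empty. This Python sketch parses rows and recomputes that digest form; the function and type names are illustrative.

```python
import base64
import csv
import hashlib
import io
from typing import NamedTuple, Optional


class RecordRow(NamedTuple):
    path: str
    algorithm: Optional[str]
    digest: Optional[str]
    size: Optional[int]


def record_digest(data: bytes) -> str:
    """sha256 digest in RECORD form: URL-safe base64 without '=' padding."""
    raw = hashlib.sha256(data).digest()
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def parse_record(text: str) -> list[RecordRow]:
    rows = []
    for fields in csv.reader(io.StringIO(text)):
        if not fields:
            continue  # skip blank lines
        path, hash_field, size = fields
        algorithm, _, digest = hash_field.partition("=")
        rows.append(
            RecordRow(path, algorithm or None, digest or None, int(size) if size else None)
        )
    return rows
```

To check an unpacked file, compare `record_digest(Path(row.path).read_bytes())` with `row.digest` for rows whose `algorithm` is `sha256`.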
@@ -0,0 +1,4 @@
1
+ Wheel-Version: 1.0
2
+ Generator: hatchling 1.30.1
3
+ Root-Is-Purelib: true
4
+ Tag: py3-none-any
@@ -0,0 +1,3 @@
1
+ [console_scripts]
2
+ pull = pull_cli.cli:main
3
+ pull-cli = pull_cli.cli:main
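Both console scripts resolve to the same `main` function. A stdlib-only Python sketch reads this `entry_points.txt` format with `configparser` and wraps each row in `importlib.metadata.EntryPoint` without importing `pull_cli`; the helper name is hypothetical.

```python
import configparser
from importlib.metadata import EntryPoint


def console_scripts(entry_points_txt: str) -> dict[str, EntryPoint]:
    """Map each console script name to its module:function target."""
    # Only "=" separates name and value; ":" belongs to the target string.
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.optionxform = str  # keep script names exactly as written
    parser.read_string(entry_points_txt)
    return {
        name: EntryPoint(name=name, value=value, group="console_scripts")
        for name, value in parser.items("console_scripts")
    }
```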
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2026 Thomas Rohde
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.