@smart-cloud/publisher-exporter 1.1.19 → 1.1.20

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (2) hide show
  1. package/README.md +41 -0
  2. package/package.json +1 -1
package/README.md CHANGED
@@ -29,9 +29,13 @@ publisher-exporter install-browsers
29
29
  If `PLAYWRIGHT_BROWSERS_PATH` points to a shared system directory, create that directory first and make it writable by the same OS user that will run `publisher-exporter install-browsers`. A one-time elevated setup step to create or re-own the directory is fine. The later cron job does not need elevated privileges to use an already installed shared browser cache, but it does need read and execute access to that directory tree.
30
30
 
31
31
  ## Commands
32
+ - Normal usage:
32
33
 
33
34
  ```bash
34
35
  PUBLISHER_CONFIG=./publisher.config.json publisher-exporter crawl
36
+ PUBLISHER_CONFIG=./publisher.config.json publisher-exporter crawl --crawl-mode incremental
37
+ PUBLISHER_CONFIG=./publisher.config.json publisher-exporter crawl --resume-rewrite
38
+ PUBLISHER_CONFIG=./publisher.config.json publisher-exporter crawl --retry-timeouts
35
39
  PUBLISHER_CONFIG=./publisher.config.json publisher-exporter deploy
36
40
  PUBLISHER_CONFIG=./publisher.config.json publisher-exporter invalidate
37
41
  publisher-exporter publish
@@ -42,11 +46,40 @@ Equivalent `npx` usage:
42
46
 
43
47
  ```bash
44
48
  PUBLISHER_CONFIG=./publisher.config.json npx @smart-cloud/publisher-exporter crawl
49
+ PUBLISHER_CONFIG=./publisher.config.json npx @smart-cloud/publisher-exporter crawl --crawl-mode incremental
50
+ PUBLISHER_CONFIG=./publisher.config.json npx @smart-cloud/publisher-exporter crawl --resume-rewrite
45
51
  PUBLISHER_CONFIG=./publisher.config.json npx @smart-cloud/publisher-exporter deploy
46
52
  PUBLISHER_CONFIG=./publisher.config.json npx @smart-cloud/publisher-exporter invalidate
47
53
  npx @smart-cloud/publisher-exporter queue-runner --runtime-dir /srv/site/runtime --max-jobs 1
48
54
  ```
49
55
 
56
+ ## Crawl Modes And Repair Workflows
57
+
58
+ - `crawl` without extra flags performs a normal full crawl, asset download, and final text rewrite.
59
+ - `crawl --crawl-mode incremental` reuses the existing crawl manifest when possible and rewrites only the files affected by re-crawled pages, text assets, or changed asset-map entries.
60
+ - `crawl --resume-rewrite` skips discovery, rendering, and asset download, then reruns the final text rewrite over the existing output tree using the current rewrite rules.
61
+ - `crawl --retry-timeouts` retries timed-out URLs from the latest archived full crawl or publish log snapshot.
62
+
63
+ Use `--resume-rewrite` when the already exported output is otherwise valid but the rewrite logic changed or a previous crawl stopped during the rewrite phase. This is the fastest repair path for stale output caused by rewrite-only bugs such as escaped JSON replacement issues, protocol-relative asset URLs like `//host/path.css`, or an interrupted final rewrite.
64
+
65
+ If you need to repair existing exported HTML after a rewrite bug, prefer `--resume-rewrite` over a new incremental crawl. Incremental crawl only rewrites targeted files; `--resume-rewrite` reprocesses every text file in the current output.
66
+
67
+ During `--resume-rewrite`, the terminal can be quieter than a full crawl. Progress still updates in `runtime/current-progress.json` and the live crawl event snapshot under the configured log directory.
68
+
69
+ ## Direct CLI With Runtime-Managed Config
70
+
71
+ When the WordPress plugin already wrote `runtime/config.json` into shared publisher storage, direct CLI invocation still needs both the config path and the runtime directory.
72
+
73
+ Example:
74
+
75
+ ```bash
76
+ STATIC_PUBLISHER_RUNTIME_DIR=/mnt/site/runtime \
77
+ PUBLISHER_CONFIG=/mnt/site/runtime/config.json \
78
+ publisher-exporter crawl --resume-rewrite
79
+ ```
80
+
81
+ `PUBLISHER_CONFIG` selects the main exporter config file. `STATIC_PUBLISHER_RUNTIME_DIR` anchors runtime artifacts such as `current-progress.json`, `crawl-manifest.json`, queue files, and storage-relative output/log paths.
82
+
50
83
  ## Queue Runner
51
84
 
52
85
  The queue runner reads the runtime JSON files generated by the WordPress plugin.
@@ -123,6 +156,14 @@ Archived files are gzip-compressed per file and listed in `job.json`. WordPress
123
156
 
124
157
  Prune old archive directories with `publisher-exporter prune-logs --runtime-dir /srv/site/runtime --older-than-days 30`. Add that command to daily cron or another retention scheduler that matches your log policy.
125
158
 
159
+ ## Troubleshooting
160
+
161
+ - If exported HTML still contains old rewritten URLs after a rewrite fix, run `crawl --resume-rewrite`. This reprocesses every text file in the current output tree and does not depend on earlier `rewriteComplete: true` entries in `crawl-manifest.json`.
162
+ - If `--resume-rewrite` appears to stop after the initial resume message, check `runtime/current-progress.json` and `<logDir>/current-crawl-event.json` before assuming it is stuck. That phase can stay mostly quiet on stdout while it rewrites files.
163
+ - If direct CLI execution seems to ignore the expected runtime state, verify that `PUBLISHER_CONFIG` and `STATIC_PUBLISHER_RUNTIME_DIR` both point at the same runtime tree. A mismatched config path and runtime dir can make the run look idle or incomplete.
164
+ - Prefer incremental crawl when the existing manifest is trusted and you only need a fast recrawl of changed pages or assets. Prefer a full crawl when discovery rules changed, the manifest is missing or suspect, or you want a clean end-to-end rebuild.
165
+ - For generated 404 capture, set `generated404RequestPath` to a path that really returns HTTP 404 from the origin. The exporter validates that response before reusing it as the generated 404 page.
166
+
126
167
  ## WordPress Integration
127
168
 
128
169
  The companion WordPress plugin can optionally store an `External exporter dir` setting. Point it at this package root when you want PHP-side diagnostics to verify the local CLI install.
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@smart-cloud/publisher-exporter",
3
- "version": "1.1.19",
3
+ "version": "1.1.20",
4
4
  "license": "MIT",
5
5
  "type": "module",
6
6
  "description": "Headless Playwright static publisher for WordPress/Elementor sites with sitemap-only page discovery, strict asset capture, escaped URL rewrite, structured logs, and targeted retry modes.",