@sidub-inc/docuoria.cli 1.0.15

Files changed (65)
  1. package/dist/index.js +1056 -0
  2. package/package.json +56 -0
  3. package/payload/.claude-plugin/plugin.json +21 -0
  4. package/payload/MANIFEST.json +322 -0
  5. package/payload/SKILL.md +88 -0
  6. package/payload/assets/lib/Docuoria.dll +0 -0
  7. package/payload/assets/schemas/template-schema.json +413 -0
  8. package/payload/commands/classify.md +11 -0
  9. package/payload/commands/diagnose.md +11 -0
  10. package/payload/commands/extract.md +11 -0
  11. package/payload/commands/inspect.md +11 -0
  12. package/payload/commands/validate-template.md +11 -0
  13. package/payload/examples/01-extract-to-csv.md +49 -0
  14. package/payload/examples/02-classify-unknown-pdf.md +102 -0
  15. package/payload/examples/03-diagnose-failed-result.md +68 -0
  16. package/payload/references/classification.md +363 -0
  17. package/payload/references/decision-tree.md +43 -0
  18. package/payload/references/failure-tree.md +169 -0
  19. package/payload/references/pattern-authoring.md +40 -0
  20. package/payload/references/patterns.md +97 -0
  21. package/payload/references/privacy.md +36 -0
  22. package/payload/references/scripts.md +361 -0
  23. package/payload/references/template-reference.md +606 -0
  24. package/payload/references/workflow.md +163 -0
  25. package/payload/scripts/_common.csx +250 -0
  26. package/payload/scripts/classify.csx +53 -0
  27. package/payload/scripts/dry-run.csx +85 -0
  28. package/payload/scripts/evaluate-match.csx +72 -0
  29. package/payload/scripts/execute.csx +89 -0
  30. package/payload/scripts/inspect.csx +43 -0
  31. package/payload/scripts/list-templates.csx +34 -0
  32. package/payload/scripts/load-template.csx +54 -0
  33. package/payload/scripts/save-template.csx +53 -0
  34. package/payload/scripts/schema-info.csx +84 -0
  35. package/payload/scripts/test-groups.csx +44 -0
  36. package/payload/scripts/test-pattern.csx +61 -0
  37. package/payload/scripts/validate-template.csx +54 -0
  38. package/payload/skill/SKILL.md +88 -0
  39. package/payload/skill/assets/lib/Docuoria.dll +0 -0
  40. package/payload/skill/assets/schemas/template-schema.json +413 -0
  41. package/payload/skill/examples/01-extract-to-csv.md +49 -0
  42. package/payload/skill/examples/02-classify-unknown-pdf.md +102 -0
  43. package/payload/skill/examples/03-diagnose-failed-result.md +68 -0
  44. package/payload/skill/references/classification.md +363 -0
  45. package/payload/skill/references/decision-tree.md +43 -0
  46. package/payload/skill/references/failure-tree.md +169 -0
  47. package/payload/skill/references/pattern-authoring.md +40 -0
  48. package/payload/skill/references/patterns.md +97 -0
  49. package/payload/skill/references/privacy.md +36 -0
  50. package/payload/skill/references/scripts.md +361 -0
  51. package/payload/skill/references/template-reference.md +606 -0
  52. package/payload/skill/references/workflow.md +163 -0
  53. package/payload/skill/scripts/_common.csx +250 -0
  54. package/payload/skill/scripts/classify.csx +53 -0
  55. package/payload/skill/scripts/dry-run.csx +85 -0
  56. package/payload/skill/scripts/evaluate-match.csx +72 -0
  57. package/payload/skill/scripts/execute.csx +89 -0
  58. package/payload/skill/scripts/inspect.csx +43 -0
  59. package/payload/skill/scripts/list-templates.csx +34 -0
  60. package/payload/skill/scripts/load-template.csx +54 -0
  61. package/payload/skill/scripts/save-template.csx +53 -0
  62. package/payload/skill/scripts/schema-info.csx +84 -0
  63. package/payload/skill/scripts/test-groups.csx +44 -0
  64. package/payload/skill/scripts/test-pattern.csx +61 -0
  65. package/payload/skill/scripts/validate-template.csx +54 -0
@@ -0,0 +1,97 @@
1
+ # Patterns
2
+
3
+ The patterns below are *illustrative*: they demonstrate authoring techniques and are not copy-paste solutions. Real PDFs have layout quirks (broken words, Unicode whitespace, OCR drift), so a pattern that matches the visible text will often miss the engine's flattened haystack. Before using any pattern below, run `dotnet script scripts/test-pattern.csx -- --pdf <pdf> --pattern '<regex>'` and adapt to what `PatternTestResult.Matches` and `PatternTestResult.Gaps` actually report. See `pattern-authoring.md` for techniques.
4
+
5
+ ### 1. ISO 8601 date
6
+
7
+ - regex: `\b\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])\b`
8
+ - Matches: `2024-01-15`, `1999-12-31`.
9
+ - Non-matches: `2024-13-01`, `24-01-15`, `2024/01/15`.
10
+ - Field type: `DateOnly`.
11
+ - Teaches: alternation inside groups, combined with character classes, to express valid month/day ranges.
12
+
13
+ ### 2. US ZIP / CA postal code (combined)
14
+
15
+ - regex: `\b(?:\d{5}(?:-\d{4})?|[A-Z]\d[A-Z] ?\d[A-Z]\d)\b`
16
+ - Matches: `90210`, `90210-1234`, `K1A 0B1`, `K1A0B1`.
17
+ - Non-matches: `1234`, `9021O` (letter O instead of zero).
18
+ - Field type: `string`.
19
+ - Teaches: non-capturing groups and optional whitespace with `?`.
20
+
21
+ ### 3. Currency with symbol
22
+
23
+ - regex: `(?<currency>[$€£¥])\s?(?<amount>\d{1,3}(?:,\d{3})*(?:\.\d{2})?)`
24
+ - Matches: `$1,234.56`, `€ 99.00`, `£10`.
25
+ - Non-matches: `1234.56` (no symbol). `$1.234,56` (European grouping) is not matched in full, but an unanchored search still matches its `$1.23` prefix.
26
+ - Field type: `decimal`.
27
+ - Teaches: named groups (consumed downstream by `PatternExtractionSource.PrimaryGroup`).
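
A minimal Python sketch of how the named groups in pattern 3 separate the symbol from the amount. Python's `re` module spells named groups `(?P<name>...)` rather than the `(?<name>...)` form shown above, so the pattern is adapted accordingly. The helper name `parse_currency` is hypothetical and is not part of Docuoria.

```python
import re
from decimal import Decimal

# Pattern 3, adapted to Python's (?P<name>...) named-group syntax.
CURRENCY = re.compile(
    r"(?P<currency>[$€£¥])\s?(?P<amount>\d{1,3}(?:,\d{3})*(?:\.\d{2})?)"
)


def parse_currency(text):
    """Return (symbol, Decimal amount) for the first match, or None."""
    match = CURRENCY.search(text)
    if match is None:
        return None
    # Strip thousands separators before converting to a decimal value.
    amount = Decimal(match.group("amount").replace(",", ""))
    return match.group("currency"), amount


for sample in ["Total: $1,234.56", "€ 99.00", "£10", "1234.56", "$1.234,56"]:
    print(repr(sample), "->", parse_currency(sample))
```

The last sample shows why testing against real text matters: the search is unanchored, so `$1.234,56` yields the prefix `$1.23` instead of being rejected. Rejecting it outright would need a trailing lookahead or an anchor.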
28
+
29
+ ### 4. Currency, symbol-stripped numeric only
30
+
31
+ - regex: `(?<![\$€£¥\d])-?\d{1,3}(?:,\d{3})*\.\d{2}(?![\d])`
32
+ - Matches: `1,234.56`, `-99.00`.
33
+ - Non-matches: `1234` (no decimal), `1,234.5` (one fraction digit).
34
+ - Field type: `decimal`.
35
+ - Teaches: lookbehind/lookahead to reject embedded matches.
36
+
37
+ ### 5. Integer (standalone)
38
+
39
+ - regex: `(?<![\d.])\d+(?![\d.])`
40
+ - Matches: `42`, `1000`.
41
+ - Non-matches: `3.14`, `12345.67`.
42
+ - Field type: `int`.
43
+ - Teaches: negative lookarounds for "not part of a bigger token".
44
+
45
+ ### 6. Decimal (any precision)
46
+
47
+ - regex: `(?<![\d.])-?\d+\.\d+(?![\d.])`
48
+ - Matches: `3.14`, `-0.001`.
49
+ - Non-matches: `3.`, `.5`, `3.14.15`.
50
+ - Field type: `decimal`.
51
+ - Teaches: balanced lookarounds preventing partial overlap.
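
To see what the lookarounds in patterns 5 and 6 buy, the sketch below runs pattern 6 next to the same core without lookarounds. It uses Python's `re`, which accepts the pattern unchanged because the lookbehind is fixed-width; behaviour in the engine's own regex flavour should still be confirmed with `test-pattern.csx`. The name `NAIVE` is introduced here for comparison only.

```python
import re

# Pattern 6 exactly as listed above.
DECIMAL = re.compile(r"(?<![\d.])-?\d+\.\d+(?![\d.])")
# The same core with the lookarounds removed, for comparison only.
NAIVE = re.compile(r"-?\d+\.\d+")

for sample in ["3.14", "-0.001", "3.", ".5", "3.14.15"]:
    print(f"{sample!r}: guarded={DECIMAL.findall(sample)} naive={NAIVE.findall(sample)}")
```

On `3.14.15` the unguarded pattern reports `3.14`, a fragment of a larger token, while the guarded pattern reports nothing.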
52
+
53
+ ### 7. Percentage
54
+
55
+ - regex: `(?<value>-?\d+(?:\.\d+)?)\s?%`
56
+ - Matches: `50%`, `12.5 %`, `-3%`.
57
+ - Non-matches: `% 50`, `fifty percent`.
58
+ - Field type: `decimal`.
59
+ - Teaches: optional whitespace plus capture before a literal suffix.
60
+
61
+ ### 8. Email (RFC-pragmatic)
62
+
63
+ - regex: `\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b`
64
+ - Matches: `a@b.co`, `first.last+tag@example.com`.
65
+ - Non-matches: `a@b`, `@example.com`.
66
+ - Field type: `string`.
67
+ - Teaches: word-boundary anchoring and the pragmatic-vs-RFC-strict tradeoff.
68
+
69
+ ### 9. Phone E.164
70
+
71
+ - regex: `\+[1-9]\d{1,14}\b`
72
+ - Matches: `+14165551212`, `+442071838750`.
73
+ - Non-matches: `416-555-1212`, `+0123` (leading zero after `+`).
74
+ - Field type: `string`.
75
+ - Teaches: format normalisation upstream of extraction (the engine expects already-cleaned input).
76
+
77
+ ### 10. URL (http/https)
78
+
79
+ - regex: `https?://[^\s<>"']+`
80
+ - Matches: `https://example.com/path?q=1`, `http://a.b`.
81
+ - Non-matches: `ftp://x`, `example.com`.
82
+ - Field type: `string`.
83
+ - Teaches: negated character class for "until whitespace or quote".
84
+
85
+ ### 11. UUID v1–v5
86
+
87
+ - regex: `\b[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[1-5][0-9a-fA-F]{3}-[89abAB][0-9a-fA-F]{3}-[0-9a-fA-F]{12}\b`
88
+ - Matches: `550e8400-e29b-41d4-a716-446655440000`.
89
+ - Non-matches: `not-a-uuid`, all-zeros (fails both the version and the variant nibble).
90
+ - Field type: `Guid`.
91
+ - Teaches: position-specific character classes to validate format.
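
A short Python sketch of pattern 11 in use: the regex finds candidates, and the standard-library `uuid` module then reads the version nibble that the `[1-5]` class constrained. The helper `find_uuids` is hypothetical and only illustrates the idea behind the `Guid` field type.

```python
import re
import uuid

# Pattern 11 exactly as listed above, split across two literals for width.
UUID_V1_V5 = re.compile(
    r"\b[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[1-5][0-9a-fA-F]{3}"
    r"-[89abAB][0-9a-fA-F]{3}-[0-9a-fA-F]{12}\b"
)


def find_uuids(text):
    """Return every pattern-11 match in text as a uuid.UUID value."""
    return [uuid.UUID(m.group(0)) for m in UUID_V1_V5.finditer(text)]


found = find_uuids("id=550e8400-e29b-41d4-a716-446655440000")
print(found, [u.version for u in found])
print(find_uuids("00000000-0000-0000-0000-000000000000"))
```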
92
+
93
+ ## How to verify a pattern
94
+
95
+ - Run `dotnet script scripts/test-pattern.csx -- --pdf <pdf> --pattern '<regex>'`. If `PatternTestResult.HasMatches` is `false`, the pattern does not match this PDF's haystack; go to `pattern-authoring.md`.
96
+ - If matches are partial, run `dotnet script scripts/test-groups.csx -- --pdf <pdf> --pattern '<regex>'` and read `PatternGroupTestResult.Groups[*].MatchesIndependently` to find the failing group.
97
+ - If there are too many matches, tighten the pattern with lookarounds (patterns 4–6 demonstrate the technique).
@@ -0,0 +1,36 @@
1
+ # Local-processing privacy guarantee
2
+
3
+ ## Claim
4
+
5
+ When you call `Docuoria`, the PDF bytes you supply never leave the machine the engine runs on. The library reads, extracts, transforms, and renders the PDF entirely in-process. The only network calls first-party components make are template JSON reads and writes against an HTTP template store and, for retrieval steps, downloads of PDFs that a template explicitly references. Both are opt-in.
6
+
7
+ ## Evidence — extraction is in-process
8
+
9
+ Every PDF-consuming primitive on `IDocuoriaEngine` (`InspectAsync`, `TestPatternAsync`, `TestGroupsAsync`, `DryRunAsync`, `ExecuteTemplateAsync`, `EvaluateMatchAsync`, `EvaluateMatchRuleAsync`, `ClassifyAsync`) takes a `Stream` — never a URL or remote handle for the PDF. See `src/libs/Docuoria/Contracts/IDocuoriaEngine.cs`; each method's documentation carries the contract phrase "The PDF stream is opened and disposed within the call (D-13)." The implementation in `src/libs/Docuoria/Engine/DocuoriaEngine.cs` resolves the in-process `IPdfDocumentFactory` and walks the configured extraction/transformation/publish steps without leaving the process.
10
+
11
+ ## Evidence — the one network surface
12
+
13
+ `src/libs/Docuoria/Storage/ApiTemplateStoreProvider.cs` implements `ITemplateStoreProvider` and uses `IHttpClientFactory` (named client `ApiTemplateStoreProvider.HttpClientName = "Docuoria.TemplateStore"`) to read and write *templates*. Templates are JSON describing how to extract — they do not contain PDF bytes. A host that wants entirely local processing simply does not register `ApiTemplateStoreProvider`; the engine functions identically against a local store.
14
+
15
+ `src/libs/Docuoria/Pipeline/Retrieval/Http/HttpRetrievalProvider.cs` is an inbound fetch path the *host* opts into for retrieval steps; it does not upload PDFs supplied by the caller — it only downloads PDFs the template explicitly references.
16
+
17
+ ## What this does NOT promise
18
+
19
+ - If you wire a third-party logger, telemetry sink, or background storage handler around the engine, your hosting code may transmit data. The guarantee is about the library, not your host.
20
+ - If the host uses `HttpRetrievalProvider` to download a PDF before processing, the URL of that PDF is necessarily known to the network. The guarantee is about what happens *after* the engine has the bytes in memory.
21
+ - If you store templates via `ApiTemplateStoreProvider`, template content (which may contain regex patterns derived from the PDF's text) crosses the network. PDFs do not.
22
+
23
+ ## Verifying for yourself
24
+
25
+ 1. Search the library for outbound HTTP usage:
26
+ `Get-ChildItem -Path src/libs/Docuoria -Recurse -File | Select-String -Pattern 'HttpClient|HttpRequestMessage'`.
27
+ Confirm hits are confined to the template-store and retrieval surfaces:
28
+ - `Storage/ApiTemplateStoreProvider.cs`
29
+ - `Storage/Http/TemplateStoreCredentialHandler.cs`
30
+ - `Registration/HttpRetrievalProviderBuilderExtensions.cs`
31
+ - `Registration/TemplateStoreBuilderExtensions.cs`
32
+ - `Pipeline/Retrieval/Http/HttpRetrievalProvider.cs`
33
+
34
+ The extra hits beyond `ApiTemplateStoreProvider` and `HttpRetrievalProvider` are credential-handler and DI-registration support for those same two surfaces — they do not introduce new outbound paths.
35
+ 2. Confirm `IDocuoriaEngine` only accepts `Stream` for PDF input — open `src/libs/Docuoria/Contracts/IDocuoriaEngine.cs` and read every method signature.
36
+ 3. Run `dotnet test` with the network blocked (e.g. firewall rule) and observe extraction tests still pass.
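
Step 1 can also be scripted so the check is repeatable. The sketch below is a cross-platform Python equivalent of the search, under the assumption that the library sources are `.cs` files under a single root. The function names and the `ALLOWED` set are illustrative; the allowed paths are copied from the list above. The demo runs against a throwaway directory, not the real source tree.

```python
import re
import tempfile
from pathlib import Path

HTTP_USAGE = re.compile(r"HttpClient|HttpRequestMessage")

# Files the list above names as the expected template-store / retrieval surfaces.
ALLOWED = {
    "Storage/ApiTemplateStoreProvider.cs",
    "Storage/Http/TemplateStoreCredentialHandler.cs",
    "Registration/HttpRetrievalProviderBuilderExtensions.cs",
    "Registration/TemplateStoreBuilderExtensions.cs",
    "Pipeline/Retrieval/Http/HttpRetrievalProvider.cs",
}


def find_http_usage(root):
    """Return sorted root-relative paths of .cs files that mention HTTP types."""
    root = Path(root)
    hits = set()
    for path in root.rglob("*.cs"):
        text = path.read_text(encoding="utf-8", errors="replace")
        if HTTP_USAGE.search(text):
            hits.add(path.relative_to(root).as_posix())
    return sorted(hits)


def unexpected_hits(hits, allowed=ALLOWED):
    """Return hits that fall outside the expected network surfaces."""
    return [hit for hit in hits if hit not in allowed]


# Demo on a temporary tree with one expected hit and one clean file.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp)
    (demo / "Storage").mkdir()
    (demo / "Engine").mkdir()
    (demo / "Storage" / "ApiTemplateStoreProvider.cs").write_text(
        "IHttpClientFactory factory;", encoding="utf-8")
    (demo / "Engine" / "DocuoriaEngine.cs").write_text(
        "Stream pdf;", encoding="utf-8")
    demo_hits = find_http_usage(demo)
    demo_unexpected = unexpected_hits(demo_hits)

print(demo_hits, demo_unexpected)
```

Pointing `find_http_usage` at the library root and asserting that `unexpected_hits` is empty turns the manual inspection into a check that can run in CI.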
@@ -0,0 +1,361 @@
1
+ # Agent Scripts
2
+
3
+ This directory hosts the **agent-facing CLI surface** for `Docuoria`. Each script
4
+ is a [`dotnet-script`](https://github.com/dotnet-script/dotnet-script) `.csx` file that
5
+ binds a single SDK verb to a deterministic JSON contract. The scripts are designed for
6
+ non-interactive automation (LLM agents, CI jobs, shell pipelines) and uphold a strict
7
+ output contract:
8
+
9
+ - **Successful runs** emit a single line of UTF-8 JSON to **stdout**, exit code `0`.
10
+ - **Errors** emit a single `{"error":{"code","message","detail"}}` line to **stderr**,
11
+ non-zero exit code.
12
+ - All payloads serialize via `DocuoriaJsonOptions.Default` (camelCase, discriminator
13
+ `$type` for polymorphic results, `WhenWritingNull` ignore policy — see Classify for
14
+ the explicit-null exception).
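
For a caller, the contract above reduces to "exit code `0` means JSON on stdout, anything else means an error object on stderr". The Python sketch below is one hedged way to consume it. `ScriptError`, `parse_script_result`, and `run_script` are hypothetical names, and falling back to the `unhandled` code when stderr holds no error JSON is a choice made for this sketch, not documented behaviour.

```python
import json
import subprocess


class ScriptError(Exception):
    """Carries the code/message/detail fields of the error JSON."""

    def __init__(self, code, message, detail=None):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message
        self.detail = detail


def parse_script_result(returncode, stdout, stderr):
    """Apply the output contract to a finished script run."""
    if returncode == 0:
        # Success: a single line of JSON on stdout.
        return json.loads(stdout)
    try:
        error = json.loads(stderr)["error"]
        code, message = error["code"], error["message"]
        detail = error.get("detail")
    except (ValueError, KeyError, TypeError, AttributeError):
        # Assumption of this sketch: treat unparseable stderr as "unhandled".
        raise ScriptError("unhandled", stderr.strip() or "no error JSON on stderr") from None
    raise ScriptError(code, message, detail)


def run_script(script, *script_args):
    """Run a .csx script via dotnet-script and parse its result."""
    proc = subprocess.run(
        ["dotnet", "script", script, "--", *script_args],
        capture_output=True, text=True, encoding="utf-8",
    )
    return parse_script_result(proc.returncode, proc.stdout, proc.stderr)
```

A call such as `run_script("scripts/inspect.csx", "--pdf", "invoice.pdf")` would then either return the parsed payload or raise `ScriptError` with the script's `code`.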
15
+
16
+ Every script uses `#load "_common.csx"` to share host bootstrap, argument parsing
17
+ (`Cli.Require` / `Cli.Get` / `Cli.Has`), template-store registration, PDF stream
18
+ loading, and JSON writers.
19
+
20
+ > v1.4 invariants: confidence is binary (`1.0` / `0.0`), `ClassifyAsync` returns
21
+ > `null` when no template matches (CLS-02), and throws `InvalidOperationException`
22
+ > when no store is registered.
23
+
24
+ > **Distribution:** this directory is the **source** for the AI plugin's `scripts/`
25
+ > folder. `skills/build.ps1` copies these `.csx` files into `dist/docuoria/scripts/`
26
+ > and rewrites the SDK `#r` line in `_common.csx` to point at the bundled
27
+ > `assets/lib/Docuoria.dll`. In-repo development uses the relative
28
+ > `bin/Release/...dll` path; downstream consumers receive the bundled DLL.
29
+
30
+ ## Installation
31
+
32
+ ```powershell
33
+ dotnet tool install -g dotnet-script
34
+ # build the SDK once so _common.csx can reference the local DLL
35
+ dotnet build src/libs/Docuoria/Docuoria.csproj -c Release
36
+ ```
37
+
38
+ Run any script with:
39
+
40
+ ```powershell
41
+ dotnet script scripts/<name>.csx -- --pdf path\to\file.pdf
42
+ ```
43
+
44
+ The `--` separator forwards subsequent tokens as script arguments (exposed as the
45
+ `Args` global, an `IList<string>`).
46
+
47
+ ## Common Store Parameters
48
+
49
+ Scripts that access the template store (`classify`, `evaluate-match`,
50
+ `list-templates`, `load-template`, `save-template`) accept these shared flags
51
+ to configure the store backend. When neither `--store-path` nor `--store-url`
52
+ is provided, the local store defaults to `./templates`.
53
+
54
+ | Flag | Default | Description |
55
+ | -------------- | -------------- | ------------------------------------------------------------------------ |
56
+ | `--store-path` | `./templates` | Local file-system template store directory. |
57
+ | `--store-url` | _(none)_ | API template store URL (mutually exclusive with `--store-path`). |
58
+ | `--store-key` | _(none)_ | Function key for API store authentication (used with `--store-url`). |
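
An automation wrapper can enforce the rules in this table before launching a script. The Python sketch below builds the store flags and rejects the combinations the table rules out. `store_args` is a hypothetical helper, and raising on a `--store-key` without `--store-url` is an assumption drawn from the "used with `--store-url`" wording.

```python
def store_args(store_path=None, store_url=None, store_key=None):
    """Return the store flags as an argument list for a script invocation."""
    if store_path and store_url:
        raise ValueError("--store-path and --store-url are mutually exclusive")
    if store_key and not store_url:
        # Assumption: a key is only meaningful for the API store.
        raise ValueError("--store-key is only used with --store-url")
    args = []
    if store_url:
        args += ["--store-url", store_url]
        if store_key:
            args += ["--store-key", store_key]
    elif store_path:
        args += ["--store-path", store_path]
    # An empty list leaves the script on its ./templates default.
    return args


print(store_args())
print(store_args(store_path="./templates"))
print(store_args(store_url="https://example.invalid/api", store_key="secret"))
```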
59
+
60
+ ## Error JSON Shape
61
+
62
+ ```json
63
+ {
64
+ "error": {
65
+ "code": "kebab-case-identifier",
66
+ "message": "Human-readable summary.",
67
+ "detail": "Optional stack trace or full exception text."
68
+ }
69
+ }
70
+ ```
71
+
72
+ Common `code` values: `pdf-not-found`, `parse-error`, `already-exists`, `no-store`,
73
+ `unhandled`, `bad-format`.
74
+
75
+ ---
76
+
77
+ ## inspect.csx
78
+
79
+ **Synopsis.** Report low-level PDF structure (page count, text blocks, candidate
80
+ patterns) for a PDF — primary discovery step before authoring a template.
81
+
82
+ | Arg | Required | Description |
83
+ | --------- | -------- | -------------------------------------------- |
84
+ | `--pdf` | yes | Path to the source PDF. |
85
+ | `--page` | no | 1-based page index (default: all pages). |
86
+
87
+ **Output schema.** `PdfInspection` payload (`pageCount`, `pages[].blocks[]`, …).
88
+
89
+ **Exit codes.** `0` success · `1` unhandled · non-zero with error code `pdf-not-found` on missing input.
90
+
91
+ **Example.**
92
+
93
+ ```powershell
94
+ dotnet script scripts/inspect.csx -- --pdf invoice.pdf --page 1
95
+ ```
96
+
97
+ ---
98
+
99
+ ## test-pattern.csx
100
+
101
+ **Synopsis.** Evaluate a single extraction pattern against a PDF and report whether
102
+ it matched, with the captured value(s).
103
+
104
+ | Arg | Required | Description |
105
+ | -------------------- | -------- | ---------------------------------------------------------- |
106
+ | `--pattern` | yes | Inline pattern source (regex or DSL block). |
107
+ | `--pdf` | yes | Path to the source PDF. |
108
+ | `--page` | no | 1-based page index (default: all pages). |
109
+ | `--block-separator` | no | Override the block-separator regex used during extraction. |
110
+
111
+ **Output schema.** `{ hasMatches: bool, matches: [...] }`.
112
+
113
+ **Exit codes.** `0` success · `1` unhandled / `parse-error` · non-zero with error code `pdf-not-found` on missing input.
114
+
115
+ **Example.**
116
+
117
+ ```powershell
118
+ dotnet script scripts/test-pattern.csx -- --pattern 'Invoice #(\d+)' --pdf invoice.pdf
119
+ ```
120
+
121
+ ---
122
+
123
+ ## test-groups.csx
124
+
125
+ **Synopsis.** Evaluate a multi-group pattern and emit each named capture group's
126
+ match set — used when authoring repeating-row extractions.
127
+
128
+ | Arg | Required | Description |
129
+ | ----------- | -------- | -------------------------------------------- |
130
+ | `--pattern` | yes | Multi-group pattern source. |
131
+ | `--pdf` | yes | Path to the source PDF. |
132
+ | `--page` | no | 1-based page index (default: all pages). |
133
+
134
+ **Output schema.** `{ groups: { <name>: [matches...] } }`.
135
+
136
+ **Exit codes.** `0` success · `1` on extraction failure.
137
+
138
+ **Example.**
139
+
140
+ ```powershell
141
+ dotnet script scripts/test-groups.csx -- --pattern @rows.txt --pdf invoice.pdf
142
+ ```
143
+
144
+ ---
145
+
146
+ ## validate-template.csx
147
+
148
+ **Synopsis.** Parse a `Template` JSON file and report schema / semantic validation
149
+ results without executing it.
150
+
151
+ | Arg | Required | Description |
152
+ | ------------ | -------- | ------------------------------------ |
153
+ | `--template` | yes | Path to the template JSON file. |
154
+
155
+ **Output schema.** `{ valid: bool, errors: [string] }`.
156
+
157
+ **Exit codes.** `0` whenever the file parses (even on `valid:false`) · `1` for `parse-error`.
158
+
159
+ **Example.**
160
+
161
+ ```powershell
162
+ dotnet script scripts/validate-template.csx -- --template templates/invoice.json
163
+ ```
164
+
165
+ ---
166
+
167
+ ## dry-run.csx
168
+
169
+ **Synopsis.** Execute extraction + publish steps against a PDF **without** producing a
170
+ serialized output payload — useful for end-to-end pipeline validation. Optionally
171
+ preview formatted output with `--preview-as`.
172
+
173
+ | Arg | Required | Description |
174
+ | -------------- | -------- | ---------------------------------------------------------- |
175
+ | `--pdf` | yes | Path to the source PDF. |
176
+ | `--template` | yes | Path to the template JSON file. |
177
+ | `--preview-as` | no | Preview formatted output: `csv` or `json` (no file written). |
178
+
179
+ **Output schema.** `{ kind: "SucceededResult"|"FailedResult"|"RejectedResult", result }`.
180
+ With `--preview-as`: `{ kind, format, preview }`.
181
+
182
+ **Exit codes.** `0` success · `1` unhandled.
183
+
184
+ **Example.**
185
+
186
+ ```powershell
187
+ dotnet script scripts/dry-run.csx -- --pdf invoice.pdf --template templates/invoice.json
188
+ ```
189
+
190
+ ---
191
+
192
+ ## execute.csx
193
+
194
+ **Synopsis.** Full pipeline run with output generation. Writes a CSV or JSON payload
195
+ either to stdout or to `--output`.
196
+
197
+ | Arg | Required | Description |
198
+ | ------------ | -------- | -------------------------------------------------------------- |
199
+ | `--pdf` | yes | Path to the source PDF. |
200
+ | `--template` | yes | Path to the template JSON file. |
201
+ | `--format` | yes | `csv` or `json`. |
202
+ | `--output` | no | Write binary payload to this path. If omitted, stdout text. |
203
+
204
+ **Output schema.** Success: `{ status: "ok", format, output? }` (output is base64 / text).
205
+ Failure: `{ status: "rejected"|"failed", result }`.
206
+
207
+ **Exit codes.** `0` success · `1` rejected/failed · `2` `bad-format`.
208
+
209
+ **Example.**
210
+
211
+ ```powershell
212
+ dotnet script scripts/execute.csx -- --pdf invoice.pdf --template templates/invoice.json --format json --output out.json
213
+ ```
214
+
215
+ ---
216
+
217
+ ## evaluate-match.csx
218
+
219
+ **Synopsis.** Compute the aggregated match confidence between a PDF and a single
220
+ template. Confidence is `ruleConfidence × extractionProbeScore` (0.0 when either
221
+ fails, 1.0 when both are perfect). Template argument may be a file path **or** a
222
+ template identifier resolved through the configured store.
223
+
224
+ | Arg | Required | Description |
225
+ | -------------- | -------- | --------------------------------------------------------------------------------- |
226
+ | `--pdf` | yes | Path to the source PDF. |
227
+ | `--template` | yes | File path (`.json` / contains path separator) **or** template ID for store lookup. |
228
+ | `--store-path` | no | Local template store directory (default: `./templates`). |
229
+ | `--store-url` | no | API template store URL. |
230
+ | `--store-key` | no | Function key for API store authentication. |
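
Two rules from this section can be stated as a few lines of Python: the confidence product, and the way the `--template` value is read as either a file path or a store identifier. Both functions are hypothetical restatements of the prose and table above, not the script's actual code. In particular, the exact path test used by `evaluate-match.csx` is not shown here, so `looks_like_file_path` is an approximation.

```python
def aggregate_confidence(rule_confidence, extraction_probe_score):
    """Confidence is the product of the two scores, so a zero on either side wins."""
    return rule_confidence * extraction_probe_score


def looks_like_file_path(template_arg):
    """Approximate the table's rule: '.json' suffix or any path separator."""
    return (
        template_arg.endswith(".json")
        or "/" in template_arg
        or "\\" in template_arg
    )


print(aggregate_confidence(1.0, 1.0), aggregate_confidence(1.0, 0.0))
print(looks_like_file_path("templates/invoice.json"), looks_like_file_path("invoice"))
```

Under this reading, `--template invoice` in the example below is a store lookup, while `--template templates/invoice.json` is a file read.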
231
+
232
+ **Output schema.** `{ confidence, matchedRules }`.
233
+
234
+ **Exit codes.** `0` success · `1` template not found / unhandled.
235
+
236
+ **Example.**
237
+
238
+ ```powershell
239
+ dotnet script scripts/evaluate-match.csx -- --pdf invoice.pdf --template invoice
240
+ ```
241
+
242
+ ---
243
+
244
+ ## classify.csx
245
+
246
+ **Synopsis.** Run ranked classification across **all** registered templates and return
247
+ the top matches sorted by confidence (descending).
248
+
249
+ | Arg | Required | Description |
250
+ | -------------- | -------- | ------------------------------------------------- |
251
+ | `--pdf` | yes | Path to the source PDF. |
252
+ | `--top` | no | Maximum number of results to return (default: 5). |
253
+ | `--store-path` | no | Local template store directory (default: `./templates`). |
254
+ | `--store-url` | no | API template store URL. |
255
+ | `--store-key` | no | Function key for API store authentication. |
256
+
257
+ **Output schema.** `{ matches: [{ templateId, confidence }, ...] }`. Only functional
258
+ matches are included (root rule passes AND extraction probe > 0).
259
+
260
+ **Exit codes.** `0` success (including `match:null`) · `1` `no-store` if no template
261
+ store is registered · `1` unhandled.
262
+
263
+ **Example.**
264
+
265
+ ```powershell
266
+ dotnet script scripts/classify.csx -- --pdf invoice.pdf
267
+ ```
268
+
269
+ ---
270
+
271
+ ## list-templates.csx
272
+
273
+ **Synopsis.** Enumerate template identifiers from the configured store.
274
+
275
+ | Arg | Required | Description |
276
+ | -------------- | -------- | -------------------------------------------------------- |
277
+ | `--store-path` | no | Local template store directory (default: `./templates`). |
278
+ | `--store-url` | no | API template store URL. |
279
+ | `--store-key` | no | Function key for API store authentication. |
280
+
281
+ **Output schema.** `{ templates: [id, ...] }`.
282
+
283
+ **Exit codes.** `0` success · `1` unhandled.
284
+
285
+ **Example.**
286
+
287
+ ```powershell
288
+ dotnet script scripts/list-templates.csx -- --store-path ./templates
289
+ ```
290
+
291
+ ---
292
+
293
+ ## load-template.csx
294
+
295
+ **Synopsis.** Resolve a template by identifier and emit its JSON representation.
296
+
297
+ | Arg | Required | Description |
298
+ | -------------- | -------- | -------------------------------------------------------- |
299
+ | `--id` | yes | Template identifier. |
300
+ | `--output` | no | Write JSON to this file path instead of stdout. |
301
+ | `--store-path` | no | Local template store directory (default: `./templates`). |
302
+ | `--store-url` | no | API template store URL. |
303
+ | `--store-key` | no | Function key for API store authentication. |
304
+
305
+ **Output schema.** Without `--output`: full template JSON. With `--output`:
306
+ `{ status: "ok", path }`.
307
+
308
+ **Exit codes.** `0` success · `1` not-found / unhandled.
309
+
310
+ **Example.**
311
+
312
+ ```powershell
313
+ dotnet script scripts/load-template.csx -- --id invoice --output templates/invoice.json
314
+ ```
315
+
316
+ ---
317
+
318
+ ## save-template.csx
319
+
320
+ **Synopsis.** Persist a template JSON file to the configured store. Fails with
321
+ `already-exists` unless `--overwrite` is supplied.
322
+
323
+ | Arg | Required | Description |
324
+ | -------------- | -------- | ------------------------------------------------------------------------- |
325
+ | `--template` | yes | Path to the template JSON file to persist. |
326
+ | `--overwrite` | no | Boolean switch — overwrite an existing template with the same identifier. |
327
+ | `--store-path` | no | Local template store directory (default: `./templates`). |
328
+ | `--store-url` | no | API template store URL. |
329
+ | `--store-key` | no | Function key for API store authentication. |
330
+
331
+ **Output schema.** `{ status: "ok", identifier }`.
332
+
333
+ **Exit codes.** `0` success · `1` `already-exists` / parse-error / unhandled.
334
+
335
+ **Example.**
336
+
337
+ ```powershell
338
+ dotnet script scripts/save-template.csx -- --template templates/invoice.json --overwrite
339
+ ```
340
+
341
+ ---
342
+
343
+ ## Internals — `_common.csx`
344
+
345
+ `_common.csx` is the **single bootstrap** used by every script. It:
346
+
347
+ 1. References the locally-built `Docuoria.dll` and required NuGet packages
348
+ (`PdfPig`, `Tabula`, `CsvHelper`, `pythonnet`, `Microsoft.Extensions.Hosting` /
349
+ `DependencyInjection` / `Http`).
350
+ 2. Exposes `Cli.Require / Cli.Get / Cli.Has` for argument parsing (renamed from
351
+ `Args` to avoid shadowing the `dotnet-script` global of the same name).
352
+ 3. Builds a Generic Host via `ScriptHost.CreateHost(args, includeStore: bool)` which
353
+ wires `AddDocuoriaEngine`, `AddBuiltInMatchRules`, the CSV/JSON output
354
+ generators, and (optionally) the template store selected by the `--store-path` /
355
+ `--store-url` / `--store-key` flags.
356
+ 4. Provides `JsonOut.Write` / `JsonOut.Error` writers backed by
357
+ `DocuoriaJsonOptions.Default` and a `LoadPdf(path)` helper that exits with
358
+ `pdf-not-found` when the input is missing.
359
+
360
+ Scripts must declare `#nullable enable` after `#load "_common.csx"` because the
361
+ nullable context does not propagate across `#load` boundaries.