metanorma-release 0.2.23 → 0.2.25

Files changed (36)
  1. checksums.yaml +4 -4
  2. data/.rubocop_todo.yml +21 -319
  3. data/README.adoc +306 -91
  4. data/lib/metanorma/release/aggregation_pipeline.rb +65 -40
  5. data/lib/metanorma/release/asset_processor.rb +0 -2
  6. data/lib/metanorma/release/cache_store.rb +1 -0
  7. data/lib/metanorma/release/change_detector.rb +3 -2
  8. data/lib/metanorma/release/cli.rb +26 -2
  9. data/lib/metanorma/release/commands/aggregate.rb +0 -16
  10. data/lib/metanorma/release/commands/package.rb +9 -17
  11. data/lib/metanorma/release/commands/release_command.rb +19 -14
  12. data/lib/metanorma/release/config.rb +16 -7
  13. data/lib/metanorma/release/dependency_validation.rb +19 -0
  14. data/lib/metanorma/release/document_flattener.rb +173 -0
  15. data/lib/metanorma/release/index.rb +1 -1
  16. data/lib/metanorma/release/interfaces.rb +12 -0
  17. data/lib/metanorma/release/platform/github/manifest_reader.rb +3 -1
  18. data/lib/metanorma/release/platform/github/release_fetcher.rb +4 -10
  19. data/lib/metanorma/release/platform/github/topic_discoverer.rb +1 -1
  20. data/lib/metanorma/release/platform/github.rb +0 -4
  21. data/lib/metanorma/release/platform/local/fetcher.rb +5 -17
  22. data/lib/metanorma/release/platform/null/manifest_reader.rb +15 -0
  23. data/lib/metanorma/release/platform/null/publisher.rb +1 -1
  24. data/lib/metanorma/release/platform/null.rb +2 -0
  25. data/lib/metanorma/release/platform/static_discoverer.rb +19 -0
  26. data/lib/metanorma/release/platform.rb +1 -0
  27. data/lib/metanorma/release/platform_factory.rb +6 -21
  28. data/lib/metanorma/release/publication.rb +23 -161
  29. data/lib/metanorma/release/publication_serializer.rb +59 -0
  30. data/lib/metanorma/release/release_pipeline.rb +7 -15
  31. data/lib/metanorma/release/rxl_extractor.rb +106 -0
  32. data/lib/metanorma/release/site.rb +4 -154
  33. data/lib/metanorma/release/slug_strategy.rb +30 -15
  34. data/lib/metanorma/release/version.rb +1 -1
  35. data/lib/metanorma/release.rb +36 -19
  36. metadata +8 -2
data/README.adoc CHANGED
@@ -8,18 +8,25 @@ toc::[]
8
8
 
9
9
  == Overview
10
10
 
11
- `metanorma-release` manages the full release lifecycle of Metanorma documents through three actors:
11
+ `metanorma-release` manages the full release lifecycle of Metanorma documents through two config files:
12
12
 
13
- **Doc repo** (producer)::
14
- Discover compiled documents -> extract metadata from RXL -> detect changes -> package as zip -> release to a platform (GitHub Releases, local filesystem).
13
+ **Per-repo** (`metanorma.release.yml`)::
14
+ Defines routing rules that map documents to channel labels based on slug pattern, stage, or doctype.
15
+ Read by the `release` command to tag releases with channel metadata.
15
16
 
16
- **Org/publisher** (governor)::
17
- Define channels and routing rules via config. Map document metadata (stage, doctype) to channel labels that control visibility and distribution.
17
+ **Per-site** (`metanorma.aggregate.yml`)::
18
+ Defines discovery (which repos to aggregate), channel subscription (which channels to include), display categories, and output layout.
19
+ Read by the `aggregate` command to build a file tree + `index.json` for any site generator.
18
20
 
19
- **Aggregator** (consumer)::
20
- Discover repositories -> fetch published releases -> filter by channel and stage -> extract zip assets -> generate `index.json` with Relaton enrichment and a file tree for any site generator.
21
+ The output is platform-agnostic: a directory containing `index.json` and a tree of document files (HTML, PDF, XML, RXL).
22
+ Any site generator (Jekyll, Hugo, Vite) consumes that output independently.
21
23
 
22
- The output is platform-agnostic: a directory containing `index.json` and a tree of document files. Any site generator (Jekyll, Hugo, Vite) consumes that output independently.
24
+ === How it works
25
+
26
+ . **Compile** — Metanorma compiles `.adoc` sources to HTML, PDF, XML, and RXL (Relaton metadata).
27
+ . **Package & release** — `metanorma-release release` discovers compiled docs, extracts metadata from RXL, packages as ZIP, and publishes to GitHub Releases with channel labels derived from `metanorma.release.yml` routing rules.
28
+ . **Aggregate** — `metanorma-release aggregate` discovers repos by organization/topic, fetches their releases, filters by channel, extracts ZIP assets, enriches with Relaton bibliographic data, and writes `index.json` + file tree.
29
+ . **Present** — A site generator (Jekyll) reads `index.json` and renders the document registry.
23
30
 
24
31
  == Installation
25
32
 
@@ -45,20 +52,56 @@ Requires Ruby >= 3.2. Optional runtime dependencies:
45
52
 
46
53
  == Quick start
47
54
 
48
- === CLI
55
+ === In a document repository
56
+
57
+ Create `metanorma.release.yml`:
58
+
59
+ [source,yaml]
60
+ ----
61
+ documents:
62
+ - pattern: "cc-*"
63
+ channels: [public/standards]
64
+ ----
49
65
 
50
- The gem ships three commands:
66
+ Run the release:
51
67
 
52
68
  [source,sh]
53
69
  ----
54
- # Package compiled documents as zip archives
55
- metanorma-release package --output-dir _site
70
+ metanorma-release release --platform github --token $GITHUB_TOKEN
71
+ ----
72
+
73
+ === In an aggregator site
74
+
75
+ Create `metanorma.aggregate.yml`:
76
+
77
+ [source,yaml]
78
+ ----
79
+ source: github
80
+ output_dir: _site/docs
81
+ file_routing: flat
82
+
83
+ channels:
84
+ - public
85
+
86
+ display_categories:
87
+ - name: Standards
88
+ slug: standards
89
+ doctypes: [standard, specification, report]
90
+ - name: Guides
91
+ slug: guides
92
+ doctypes: [guide, advisory]
93
+
94
+ github:
95
+ organizations:
96
+ - MyOrg
97
+ topic: metanorma-release
98
+ ----
56
99
 
57
- # Package and release to a platform
58
- metanorma-release release --platform github --output-dir _site --token $GITHUB_TOKEN
100
+ Run the aggregation:
59
101
 
60
- # Aggregate published releases into a file tree + index.json
61
- metanorma-release aggregate --repos my-org/my-repo --output-dir _site/cc
102
+ [source,sh]
103
+ ----
104
+ metanorma-release aggregate
62
105
  ----
63
106
 
64
107
  === Ruby API
@@ -68,7 +111,6 @@ metanorma-release aggregate --repos my-org/my-repo --output-dir _site/cc
68
111
  # Discover publications from compiled RXL files
69
112
  publications = Metanorma::Release::Publication.discover("_site")
70
113
 
71
- # Each publication carries metadata from Relaton
72
114
  pub = publications.first
73
115
  pub.identifier # => "CC 18011:2018"
74
116
  pub.slug # => "cc-18011-2018"
@@ -78,13 +120,11 @@ pub.stage # => "60"
78
120
  pub.doctype # => "standard"
79
121
  pub.formats # => ["html", "pdf", "xml"]
80
122
 
81
- # Serialization (used in release body and sidecar metadata)
123
+ # Serialize (used in release body)
82
124
  pub.to_release_body # => "<!-- mn-release-metadata\n{...}\n-->"
83
- pub.to_json # => "{...}"
84
125
 
85
126
  # Parse from release body (used in aggregation)
86
127
  pub = Publication.from_release_body(body)
87
- pub = Publication.from_json(json_string)
88
128
  ----
89
129
 
90
130
  == CLI reference
@@ -124,7 +164,7 @@ metanorma-release release [options]
124
164
  |`--manifest FILE` |Release manifest file (default: `metanorma.release.yml`)
125
165
  |`--force` |Force release even if unchanged
126
166
  |`--force-replace PAT` |Glob pattern for forced replacement (repeatable)
127
- |`--channels CHANS` |Override channels
167
+ |`--channels CHANS` |Override channels (bypasses routing rules)
128
168
  |`--concurrency N` |Parallel workers (default: 4)
129
169
  |`--token TOKEN` |Platform auth token
130
170
  |`--config SOURCE` |Config file
@@ -148,111 +188,195 @@ metanorma-release aggregate [options]
148
188
  |`--source SOURCE` |Discovery source: `github`, `local:PATH` (default: `github`)
149
189
  |`--organizations ORGS` |Organization list (overrides config)
150
190
  |`--topic TOPIC` |Repository topic filter (default: `metanorma-release`)
151
- |`--repos REPOS` |Explicit repo list
152
- |`--channels CHANS` |Filter channels
153
- |`--stages STAGES` |Filter stages
191
+ |`--repos REPOS` |Explicit repo list (overrides discovery)
192
+ |`--channels CHANS` |Filter by channel (only aggregate matching channels)
154
193
  |`--output-dir DIR` |Output directory (default: `_site/cc`)
155
194
  |`--file-routing MODE` |File layout: `by-document`, `flat`, `by-format` (default: `by-document`)
156
- |`--[no-]include-drafts` |Include draft releases
195
+ |`--[no-]include-drafts` |Include draft releases (default: false)
157
196
  |`--concurrency N` |Parallel repos (default: 4)
158
197
  |`--min-documents N` |Fail if fewer documents found (default: 0)
159
198
  |`--token TOKEN` |Platform auth token (falls back to `GITHUB_TOKEN` env)
160
199
  |===
161
200
 
162
- ==== Aggregate config file
201
+ == Config files
202
+
203
+ === Per-repo: `metanorma.release.yml`
163
204
 
164
- Create `metanorma.aggregate.yml` in your project root:
205
+ Defines how documents in a repository are routed to channels.
206
+ Placed in the root of each Metanorma document repository.
207
+
208
+ ==== Simple single-channel
209
+
210
+ All documents go to one channel:
165
211
 
166
212
  [source,yaml]
167
213
  ----
168
- source: github
169
- output_dir: _site/cc
170
- file_routing: flat
171
- cache_dir: .cache/aggregate
172
-
173
- github:
174
- organizations:
175
- - MyOrg
176
- topic: metanorma-release
214
+ documents:
215
+ - pattern: "cc-*"
216
+ channels: [public/standards]
177
217
  ----
178
218
 
179
- CLI flags override config file values. Cache is always enabled (defaults to `.cache/aggregate/`).
219
+ ==== Multi-channel by document pattern
180
220
 
181
- == Concepts
221
+ Route different documents to different channels based on their slug:
182
222
 
183
- === Publication
223
+ [source,yaml]
224
+ ----
225
+ documents:
226
+ - pattern: "cc-s-*"
227
+ channels: [public/standards]
228
+ - pattern: "cc-r-*"
229
+ channels: [public/reports]
230
+ - pattern: "cc-a-*"
231
+ channels: [public/admin]
232
+ - pattern: "cc-adv-*"
233
+ channels: [public/advisories]
234
+ ----
184
235
 
185
- The central domain model. A `Publication` carries metadata from Relaton RXL extraction, files from the filesystem, and channels from config routing.
236
+ Pattern matching uses Ruby `File.fnmatch` glob syntax against the document slug.
237
+ The slug is derived from the document identifier: `CC 51020:2019` → `cc-51020-2019`.
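The identifier-to-slug mapping shown above can be approximated as follows. This is only a sketch, assuming the slug is the lowercased identifier with each run of non-alphanumeric characters collapsed to a single hyphen. `slugify` is a hypothetical helper, not part of the gem, and the gem's slug strategies may treat other identifiers differently.

```ruby
# Hypothetical helper (assumption): lowercase the identifier and collapse
# every run of non-alphanumeric characters into one hyphen.
def slugify(identifier)
  identifier
    .downcase
    .gsub(/[^a-z0-9]+/, "-") # "cc 51020:2019" becomes "cc-51020-2019"
    .gsub(/\A-+|-+\z/, "")   # trim hyphens left at either end
end

# A routing pattern is then tested against the slug with File.fnmatch.
slug = slugify("CC 51020:2019")
puts slug
puts File.fnmatch("cc-*", slug)
```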
186
238
 
187
- [source,ruby]
188
- ----
189
- pub = Metanorma::Release::Publication.new(
190
- identifier: "CC 18011:2018",
191
- slug: "cc-18011-2018",
192
- title: "Date and time — Explicit representation",
193
- edition: "1",
194
- stage: "60",
195
- doctype: "standard",
196
- revdate: "2018-06-01",
197
- files: [PublicationFile.new(format: "html", name: "cc-18011.html", path: "cc-18011.html")],
198
- channels: ["public"]
199
- )
239
+ ==== Routing by stage and doctype
200
240
 
201
- pub.base_dir # => "."
202
- pub.content_hash # => #<ContentHash ...>
203
- pub.with_channels(["members"]) # => new Publication with different channels
204
- ----
241
+ You can also route by stage or doctype instead of pattern:
205
242
 
206
- === Channels
243
+ [source,yaml]
244
+ ----
245
+ documents:
246
+ - stages: ["20", "30"]
247
+ channels: [internal]
248
+ - doctypes: [standard, specification]
249
+ channels: [public/standards]
250
+ - doctypes: [report]
251
+ channels: [public/reports]
252
+ ----
207
253
 
208
- Channels are simple string labels that control document visibility and distribution. Typical values: `public`, `members`, `internal`.
254
+ Multiple criteria are ANDed (a document must match all specified fields).
255
+ First matching entry wins. Documents not matching any entry default to `["public"]`.
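The routing semantics stated above (criteria ANDed within an entry, first matching entry wins, `["public"]` as the fallback) can be sketched like this. `Rule` and `route` are hypothetical names chosen for illustration, and the rule list is invented sample data; the gem's own config classes may be organized differently.

```ruby
# Hypothetical sketch of the documented routing semantics, not the gem's code.
Rule = Struct.new(:pattern, :stages, :doctypes, :channels, keyword_init: true) do
  # Every criterion that is present must match (AND); absent ones are ignored.
  def match?(slug:, stage:, doctype:)
    return false if pattern && !File.fnmatch(pattern, slug)
    return false if stages && !stages.include?(stage)
    return false if doctypes && !doctypes.include?(doctype)

    true
  end
end

DEFAULT_CHANNELS = ["public"].freeze

# First matching entry wins; documents matching nothing get the default.
def route(rules, slug:, stage:, doctype:)
  rule = rules.find { |r| r.match?(slug: slug, stage: stage, doctype: doctype) }
  rule ? rule.channels : DEFAULT_CHANNELS
end

RULES = [
  Rule.new(stages: %w[20 30], channels: ["internal"]),
  Rule.new(pattern: "cc-s-*", channels: ["public/standards"]),
  Rule.new(doctypes: ["report"], channels: ["public/reports"])
].freeze

p route(RULES, slug: "cc-s-18011-2018", stage: "60", doctype: "standard")
```

Because entries are tried in order, a stage-20 document whose slug also matches `cc-s-*` is routed to `internal` in this sketch, since the stage entry is listed first.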
209
256
 
210
- === Config
257
+ ==== Slug strategies
211
258
 
212
- A `metanorma.release.yml` config file defines channels and routing rules for an organization:
259
+ Tag naming varies by publisher. Set the default strategy and per-publisher overrides:
213
260
 
214
261
  [source,yaml]
215
262
  ----
216
- channels:
217
- - public
218
- - members
219
- - internal
220
-
221
- routing:
222
- default: [public]
223
- rules:
224
- - stage: ["20", "30"]
225
- channels: [internal]
226
- - stage: ["60"]
227
- channels: [public]
228
- - doctype: [report]
229
- channels: [public]
230
-
231
263
  slug:
232
264
  default: edition
233
265
  strategies:
234
266
  ietf: internet-draft
235
267
  ieee: draft-suffix
236
- iho: version
237
- ogc: version
238
268
  ----
239
269
 
240
- Routing rules match raw metadata values from Relaton (stage, doctype) to channel labels. When no config is present, all documents route to `public`.
270
+ [cols="1m,1,2",options="header"]
271
+ |===
272
+ |Strategy |Tag format |Used for
273
+ |`edition` (default) |`cc-18011-2018/ed1` |CalConnect, ISO
274
+ |`version` |`iho-s44/v1` |IHO, OGC
275
+ |`internet-draft` |`id-ietf-foo/1` |IETF drafts
276
+ |`rfc` |`rfc-1234/ed1` |IETF RFCs
277
+ |`draft-suffix` |`ieee-8021/d1` |IEEE
278
+ |===
279
+
280
+ The strategy is resolved from the document identifier prefix (e.g., `CC` → default, `IETF` → `ietf`).
241
281
 
242
- === Slug strategies
282
+ === Per-site: `metanorma.aggregate.yml`
243
283
 
244
- Tag and file naming varies by publisher (derived from the document identifier prefix). Strategies are resolved via a registry:
284
+ Defines how an aggregator site discovers, filters, and outputs documents.
245
285
 
246
- [cols="1m,1,2",options="header"]
286
+ [source,yaml]
287
+ ----
288
+ source: github
289
+ output_dir: _site/docs
290
+ file_routing: flat
291
+ cache_dir: .cache/aggregate
292
+ data_dir: _data
293
+
294
+ channels:
295
+ - public
296
+
297
+ include_drafts: true
298
+
299
+ display_categories:
300
+ - name: Standards, Specifications & Reports
301
+ slug: standards
302
+ doctypes:
303
+ - standard
304
+ - specification
305
+ - report
306
+ - name: Guides & Advisories
307
+ slug: guides
308
+ doctypes:
309
+ - guide
310
+ - advisory
311
+ - name: Directives
312
+ slug: directives
313
+ doctypes:
314
+ - directive
315
+ - name: Administrative
316
+ slug: administrative
317
+ doctypes:
318
+ - administrative
319
+
320
+ github:
321
+ organizations:
322
+ - CalConnect
323
+ topic: metanorma-release
324
+ ----
325
+
326
+ ==== Config reference
327
+
328
+ [cols="1m,1,3",options="header"]
247
329
  |===
248
- |Publisher |Strategy |Tag format
249
- |default (CalConnect, ISO) |`EditionSlug` |`cc-18011-2018/ed1`
250
- |IETF draft |`InternetDraftSlug` |`id-ietf-foo/1`
251
- |IETF RFC |`RfcSlug` |`rfc-1234/ed1`
252
- |IEEE |`DraftSuffixSlug` |`ieee-8021/d1`
253
- |IHO, OGC |`VersionSlug` |`iho-s44/v1`
330
+ |Key |Default |Description
331
+ |`source` |`github` |Discovery source: `github` or `local:PATH`
332
+ |`output_dir` |`_site/cc` |Where to write extracted files and `index.json`
333
+ |`file_routing` |`by-document` |File layout: `by-document`, `flat`, or `by-format`
334
+ |`cache_dir` |`.cache/aggregate` |Delta state cache for incremental builds
335
+ |`data_dir` |_none_ |If set, writes flattened `documents.json` for site generators
336
+ |`channels` |`[]` |Channel filter (empty = accept all channels)
337
+ |`include_drafts` |`false` |Whether to include draft-stage releases
338
+ |`display_categories` |`[]` |Maps doctypes to display categories for site output
339
+ |`github.organizations` |_required_ |GitHub orgs to scan for repositories
340
+ |`github.topic` |`metanorma-release` |Repository topic filter
254
341
  |===
255
342
 
343
+ CLI flags override config file values.
344
+
345
+ == Concepts
346
+
347
+ === Channels
348
+
349
+ Channels are hierarchical string labels that control document visibility.
350
+ They are assigned during release (per-repo config) and filtered during aggregation (per-site config).
351
+
352
+ Typical channel hierarchy:
353
+
354
+ ```
355
+ public/ → visible on public sites
356
+ public/standards → published standards
357
+ public/reports → technical reports
358
+ public/admin → administrative documents
359
+ members/ → members-only content
360
+ internal/ → not published to any site
361
+ ```
362
+
363
+ Channel matching is prefix-based: a filter for `public` matches `public/standards`, `public/reports`, etc.
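A minimal sketch of prefix-based channel filtering follows. It assumes that the match happens on `/`-separated segment boundaries (so `public` would not match a label such as `publications`), which the README does not state explicitly. `channel_matches?` and `accept?` are hypothetical helpers; the gem's `Channel` class may behave differently. The empty-filter behavior follows the config reference ("empty = accept all channels").

```ruby
# Hypothetical helpers (assumption): prefix match on "/" segment boundaries.
def channel_matches?(filter, channel)
  f = filter.chomp("/")
  c = channel.chomp("/")
  c == f || c.start_with?("#{f}/")
end

# A document is accepted if any of its channels matches any filter.
# An empty filter list accepts everything, as in the config reference.
def accept?(filters, channels)
  return true if filters.empty?

  channels.any? { |c| filters.any? { |f| channel_matches?(f, c) } }
end

puts accept?(["public"], ["public/standards"])
puts accept?(["public"], ["internal"])
```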
364
+
365
+ === Display categories
366
+
367
+ Display categories map document types to site sections.
368
+ Defined in `metanorma.aggregate.yml`, they group doctypes into user-facing categories:
369
+
370
+ [source,yaml]
371
+ ----
372
+ display_categories:
373
+ - name: Standards, Specifications & Reports
374
+ slug: standards
375
+ doctypes: [standard, specification, report]
376
+ ----
377
+
378
+ The aggregator resolves each document's display category from its doctype and includes `display_category` and `display_category_slug` fields in the output JSON.
379
+
256
380
  === File routing
257
381
 
258
382
  The aggregation pipeline supports three file layout modes:
@@ -265,6 +389,97 @@ The aggregation pipeline supports three file layout modes:
265
389
  |`by-format` |`html/cc-18011.html`
266
390
  |===
267
391
 
392
+ === Delta state caching
393
+
394
+ Aggregation is incremental: a delta state cache tracks which repos/tags have already been processed.
395
+ On subsequent runs, unchanged repos are skipped entirely.
396
+ The cache lives in `.cache/aggregate/` by default and should be persisted in CI (cache action, artifact upload).
397
+
398
+ === Output format
399
+
400
+ The aggregator writes:
401
+
402
+ * `index.json` — full document index with metadata, bibliographic data, and file references
403
+ * File tree — extracted document files (HTML, PDF, XML, RXL) organized by file routing mode
404
+ * `_data/documents.json` — flattened version for Jekyll site generators (if `data_dir` is set)
405
+
406
+ Each document in the output includes:
407
+
408
+ ```
409
+ slug, id, title, abstract, stage, doctype, edition, date,
410
+ channels, formats, files,
411
+ has_html, has_pdf, has_xml, has_rxl,
412
+ html_path, pdf_path, xml_path, rxl_path,
413
+ stage_css, doctype_class,
414
+ display_category, display_category_slug,
415
+ authors, committee,
416
+ bibliographic
417
+ ```
418
+
419
+ == Creating a new document repository
420
+
421
+ === Using the template
422
+
423
+ For CalConnect documents, use the https://github.com/CalConnect/cc-template[`cc-template`] repository template:
424
+
425
+ . Click **Use this template** on GitHub
426
+ . Name the repo `cc-{descriptive-name}` (e.g. `cc-icalendar-series`)
427
+ . Replace placeholder document numbers in `sources/` and `metanorma.yml`
428
+ . Add the `metanorma-release` topic to the repository
429
+ . Push to `main`
430
+
431
+ === Manual setup
432
+
433
+ . Create a repository with the `metanorma-release` GitHub topic
434
+ . Add `metanorma.release.yml` with routing rules:
435
+
436
+ +
437
+ [source,yaml]
438
+ ----
439
+ documents:
440
+ - pattern: "cc-*"
441
+ channels: [public/standards]
442
+ ----
443
+
444
+ . Add `metanorma.yml` with source file list:
445
+
446
+ +
447
+ [source,yaml]
448
+ ----
449
+ metanorma:
450
+ source:
451
+ files:
452
+ - sources/cc-51020.adoc
453
+ collection:
454
+ name: "My Document Title"
455
+ organization: CalConnect
456
+ ----
457
+
458
+ . Add a CI workflow (`.github/workflows/release.yml`):
459
+
460
+ +
461
+ [source,yaml]
462
+ ----
463
+ name: Release
464
+ on:
465
+ push:
466
+ branches: [main]
467
+ paths: ['sources/**', 'metanorma.yml', 'metanorma.release.yml']
468
+ workflow_dispatch:
469
+ permissions:
470
+ contents: write
471
+ jobs:
472
+ release:
473
+ uses: actions-mn/.github/.github/workflows/metanorma-release.yml@main
474
+ with:
475
+ default-visibility: private
476
+ secrets: inherit
477
+ ----
478
+
479
+ . Write your AsciiDoc source under `sources/`
480
+
481
+ The aggregator site will automatically discover your repository and publish its documents.
482
+
268
483
  == Architecture
269
484
 
270
485
  === Domain model
@@ -274,9 +489,9 @@ All core types are immutable, frozen value objects:
274
489
  * `Publication` -- metadata + files + channels + source
275
490
  * `PublicationFile` -- format, name, path
276
491
  * `PublicationSource` -- owner, repo, tag, url, date
277
- * `Channel` -- string label wrapper
278
- * `Index` -- collection of Publications with parameters
279
- * `Site` -- aggregated output (index + file tree + Relaton enrichment)
492
+ * `Channel` -- string label wrapper with prefix matching
493
+ * `Index` -- collection of Publications with schema version
494
+ * `Site` -- aggregated output (index + file tree + Relaton enrichment + display categories)
280
495
 
281
496
  === Dependency flow
282
497
 
data/lib/metanorma/release/aggregation_pipeline.rb CHANGED
@@ -7,12 +7,14 @@ module Metanorma
7
7
  keyword_init: true)
8
8
  RepoError = Struct.new(:tag, :message, keyword_init: true)
9
9
 
10
- class AggregationPipeline # rubocop:disable Metrics/ClassLength
10
+ class AggregationPipeline
11
11
  Dependencies = Struct.new(
12
12
  :discoverer, :fetcher, :manifest_reader,
13
13
  :metadata_filter, :asset_processor, :delta_state,
14
14
  keyword_init: true
15
15
  ) do
16
+ include DependencyValidation
17
+
16
18
  def initialize(**kwargs)
17
19
  super
18
20
  validate_types!
@@ -27,16 +29,6 @@ module Metanorma
27
29
  "manifest_reader")
28
30
  validate_interface!(delta_state, DeltaStateManager, "delta_state")
29
31
  end
30
-
31
- def validate_interface!(obj, mod, name)
32
- return if obj.is_a?(mod) || begin
33
- obj.class.ancestors.include?(mod)
34
- rescue StandardError
35
- false
36
- end
37
-
38
- raise ArgumentError, "#{name} must include #{mod}, got #{obj.class}"
39
- end
40
32
  end
41
33
 
42
34
  Config = Struct.new(
@@ -58,17 +50,13 @@ module Metanorma
58
50
  def run(config, output_dir)
59
51
  @deps.delta_state.load
60
52
  repos = @deps.discoverer.discover
61
- publications = []
62
- reports = []
63
- failed_repos = []
64
53
 
65
- repos.each do |repo|
66
- repo_docs, report = process_repo(repo, output_dir, config)
67
- publications.concat(repo_docs)
68
- reports << report
69
- rescue StandardError => e
70
- failed_repos << RepoError.new(tag: repo.to_s, message: e.message)
71
- raise if config.fail_on_error
54
+ if config.concurrency > 1
55
+ publications, reports, failed_repos = run_concurrent(repos,
56
+ output_dir, config)
57
+ else
58
+ publications, reports, failed_repos = run_sequential(repos,
59
+ output_dir, config)
72
60
  end
73
61
 
74
62
  @deps.delta_state.save
@@ -84,6 +72,52 @@ module Metanorma
84
72
 
85
73
  private
86
74
 
75
+ def run_sequential(repos, output_dir, config)
76
+ publications = []
77
+ reports = []
78
+ failed_repos = []
79
+
80
+ repos.each do |repo|
81
+ repo_docs, report = process_repo(repo, output_dir, config)
82
+ publications.concat(repo_docs)
83
+ reports << report
84
+ rescue StandardError => e
85
+ failed_repos << RepoError.new(tag: repo.to_s, message: e.message)
86
+ raise if config.fail_on_error
87
+ end
88
+
89
+ [publications, reports, failed_repos]
90
+ end
91
+
92
+ def run_concurrent(repos, output_dir, config)
93
+ publications = []
94
+ reports = []
95
+ failed_repos = []
96
+ mutex = Mutex.new
97
+ threads = repos.each_slice([
98
+ (repos.length.to_f / config.concurrency).ceil, 1
99
+ ].max).map do |batch|
100
+ Thread.new(batch) do |slice|
101
+ slice.each do |repo|
102
+ repo_docs, report = process_repo(repo, output_dir, config)
103
+ mutex.synchronize do
104
+ publications.concat(repo_docs)
105
+ reports << report
106
+ end
107
+ rescue StandardError => e
108
+ mutex.synchronize do
109
+ failed_repos << RepoError.new(tag: repo.to_s,
110
+ message: e.message)
111
+ end
112
+ raise if config.fail_on_error
113
+ end
114
+ end
115
+ end
116
+ threads.each { |t| t.join(300) }
117
+
118
+ [publications, reports, failed_repos]
119
+ end
120
+
87
121
  def process_repo(repo, output_dir, config)
88
122
  repo_key = repo.to_s
89
123
 
@@ -116,25 +150,23 @@ module Metanorma
116
150
  tag = release.tag_name
117
151
  current_tags << tag
118
152
 
119
- content_hash = extract_content_hash(release.body)
120
-
121
- if @deps.delta_state.processed?(repo_key, tag, content_hash)
122
- files = @deps.delta_state.release_files(repo_key, tag)
123
- if files.all? { |f| File.exist?(File.join(output_dir, f)) }
124
- publications << build_publication(metadata, files, content_hash,
125
- release, repo)
126
- next
127
- end
153
+ cached_files = @deps.delta_state.release_files(repo_key, tag)
154
+ if cached_files.any? && cached_files.all? do |f|
155
+ File.exist?(File.join(output_dir, f))
156
+ end
157
+ publications << build_publication(metadata, cached_files, release,
158
+ repo)
159
+ next
128
160
  end
129
161
 
130
162
  zip_asset = find_zip_asset(release)
131
163
  next unless zip_asset
132
164
 
133
165
  result = @deps.asset_processor.process(zip_asset.data, metadata_h)
134
- @deps.delta_state.mark_processed(repo_key, tag, content_hash,
166
+ @deps.delta_state.mark_processed(repo_key, tag, nil,
135
167
  result.files.map(&:path))
136
168
  publications << build_publication(metadata, result.files.map(&:path),
137
- content_hash, release, repo)
169
+ release, repo)
138
170
  rescue StandardError => e
139
171
  errors << RepoError.new(tag: release.tag_name, message: e.message)
140
172
  end
@@ -150,17 +182,10 @@ module Metanorma
150
182
  )]
151
183
  end
152
184
 
153
- def build_publication(metadata, files, _content_hash, release, repo)
185
+ def build_publication(metadata, files, release, repo)
154
186
  metadata.with_files_and_source(files, release, repo)
155
187
  end
156
188
 
157
- def extract_content_hash(body)
158
- return nil if body.nil?
159
-
160
- match = body.match(/^content-hash:([a-f0-9]+)/)
161
- match ? match[1] : nil
162
- end
163
-
164
189
  def find_zip_asset(release)
165
190
  return nil unless release.assets
166
191
 
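The batching used by `run_concurrent` in the hunk above can be tried in isolation with the sketch below. It reuses the same slice-size arithmetic (ceiling of count over concurrency, with a floor of 1 because `each_slice(0)` raises `ArgumentError`) and the same mutex-guarded accumulation. `batch_size` and `process_in_batches` are hypothetical names, and the per-item work is a stand-in for `process_repo`.

```ruby
# Slice size as computed in run_concurrent: ceil(count / concurrency), min 1.
def batch_size(item_count, concurrency)
  [(item_count.to_f / concurrency).ceil, 1].max
end

# Hypothetical stand-in for the pipeline: one thread per batch, results
# appended under a mutex so the shared array is never mutated concurrently.
def process_in_batches(items, concurrency, &work)
  results = []
  mutex = Mutex.new
  threads = items.each_slice(batch_size(items.length, concurrency)).map do |batch|
    Thread.new(batch) do |slice|
      slice.each do |item|
        value = work.call(item)
        mutex.synchronize { results << value }
      end
    end
  end
  threads.each(&:join) # no timeout here; the diff uses join(300)
  results
end

doubled = process_in_batches((1..10).to_a, 4) { |n| n * 2 }
puts doubled.sort.inspect
```

With 10 items and a concurrency of 4, the slice size is 3, giving four batches of 3, 3, 3, and 1 items. Note that `Thread#join` with a timeout returns `nil` when the thread has not finished in time; the hunk above does not inspect that return value, so a batch that exceeds 300 seconds would simply be missing from the collected results.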
data/lib/metanorma/release/asset_processor.rb CHANGED
@@ -12,8 +12,6 @@ module Metanorma
12
12
  class AssetProcessor
13
13
  ProcessResult = Struct.new(:files, :channels, keyword_init: true)
14
14
 
15
- CANONICALIZE_PATTERN = /-ed\d+(\.\d+)?-/
16
-
17
15
  def initialize(output_dir:, routing:, canonicalize: true)
18
16
  @output_dir = output_dir
19
17
  @routing = routing
data/lib/metanorma/release/cache_store.rb CHANGED
@@ -1,5 +1,6 @@
1
1
  # frozen_string_literal: true
2
2
 
3
+ require "fileutils"
3
4
  require "json"
4
5
 
5
6
  module Metanorma