@redpanda-data/docs-extensions-and-macros 4.11.1 → 4.12.0

Files changed (29)
  1. package/bin/doc-tools.js +4 -2
  2. package/package.json +3 -1
  3. package/tools/property-extractor/COMPUTED_CONSTANTS.md +173 -0
  4. package/tools/property-extractor/Makefile +12 -1
  5. package/tools/property-extractor/README.adoc +828 -97
  6. package/tools/property-extractor/compare-properties.js +38 -13
  7. package/tools/property-extractor/constant_resolver.py +610 -0
  8. package/tools/property-extractor/file_pair.py +42 -0
  9. package/tools/property-extractor/generate-handlebars-docs.js +41 -8
  10. package/tools/property-extractor/helpers/gt.js +9 -0
  11. package/tools/property-extractor/helpers/includes.js +17 -0
  12. package/tools/property-extractor/helpers/index.js +3 -0
  13. package/tools/property-extractor/helpers/isEnterpriseEnum.js +24 -0
  14. package/tools/property-extractor/helpers/renderPropertyExample.js +6 -5
  15. package/tools/property-extractor/overrides.json +248 -0
  16. package/tools/property-extractor/parser.py +254 -32
  17. package/tools/property-extractor/property_bag.py +40 -0
  18. package/tools/property-extractor/property_extractor.py +1417 -430
  19. package/tools/property-extractor/requirements.txt +1 -0
  20. package/tools/property-extractor/templates/property-backup.hbs +161 -0
  21. package/tools/property-extractor/templates/property.hbs +104 -49
  22. package/tools/property-extractor/templates/topic-property-backup.hbs +148 -0
  23. package/tools/property-extractor/templates/topic-property.hbs +72 -34
  24. package/tools/property-extractor/tests/test_known_values.py +617 -0
  25. package/tools/property-extractor/tests/transformers_test.py +81 -6
  26. package/tools/property-extractor/topic_property_extractor.py +23 -10
  27. package/tools/property-extractor/transformers.py +2191 -369
  28. package/tools/property-extractor/type_definition_extractor.py +669 -0
  29. package/tools/property-extractor/definitions.json +0 -245
@@ -1,206 +1,937 @@
1
- = Redpanda Property Generator
1
+ = Redpanda Property Extractor
2
2
 
3
- The Redpanda Property Generator is a CLI tool designed to extract properties from Redpanda's source code and generate a JSON output with their definitions as well as Asciidoc pages.
3
+ The Redpanda Property Extractor automatically extracts configuration properties and type definitions from Redpanda's C++ source code and generates JSON schemas and AsciiDoc documentation.
4
4
 
5
5
  == Prerequisites
6
6
 
7
- Ensure the following prerequisites are installed on your system:
7
+ Ensure the following prerequisites are installed:
8
8
 
9
9
  - https://www.python.org/downloads/[Python 3.10 or higher]
10
- - A C++ compiler (such as `gcc`, `clang`)
11
- - https://www.google.com/search?q=how+to+install+make[`make` utility] (to use the Makefile for automation)
12
- +
13
- To ensure `make` is available:
14
- +
10
+ - A C++ compiler (such as `gcc` or `clang`)
11
+ - https://www.gnu.org/software/make/[`make` utility]
12
+
13
+ Verify `make` installation:
14
+
15
15
  [,bash]
16
16
  ----
17
17
  make --version
18
18
  ----
19
19
 
20
- == Install
20
+ == Quick start
21
21
 
22
22
  . Clone the repository:
23
23
  +
24
24
  [,bash]
25
25
  ----
26
26
  git clone https://github.com/redpanda-data/docs-extensions-and-macros.git
27
- cd docs-extensions-and-macros
27
+ cd docs-extensions-and-macros/tools/property-extractor
28
28
  ----
29
29
 
30
- == Generate properties
31
-
32
- . Run the build process:
30
+ . Build and generate documentation:
33
31
  +
34
32
  [,bash]
35
33
  ----
36
- cd tools/property-extractor
37
34
  make build
38
35
  ----
39
36
  +
40
37
  This command:
41
38
  +
42
- - Sets up the Python virtual environment (`venv`).
43
- - Checks out the Redpanda source code to the specified branch or tag.
44
- - Runs the extractor to generate a JSON file at `gen/properties-output.json`.
45
- - Runs the docs generator to generate Asciidoc pages from the `properties-output.json`.
39
+ * Sets up a Python virtual environment
40
+ * Clones the Redpanda source code to the specified branch or tag
41
+ * Extracts properties and type definitions to `gen/properties-output.json`
42
+ * Generates AsciiDoc documentation files in `output/`
46
43
 
47
- . Locate the generated files:
44
+ . View generated files:
48
45
  +
49
46
  [,bash]
50
47
  ----
51
48
  ls gen/properties-output.json
52
- ls output
49
+ ls output/pages/
53
50
  ----
54
51
 
55
- To clean the environment and generated files:
52
+ To clean generated files:
56
53
 
57
54
  [,bash]
58
55
  ----
59
56
  make clean
60
57
  ----
61
58
 
62
- == Run the extractor manually
59
+ == How it works
60
+
61
+ === Architecture overview
62
+
63
+ The property extractor uses a multi-stage pipeline:
64
+
65
+ [source,text]
66
+ ----
67
+ C++ Source Code
68
+
69
+ [Tree-sitter Parser] → AST
70
+
71
+ [Property Extractor] → Raw properties
72
+
73
+ [Type Definition Extractor] → Auto-discovered types
74
+
75
+ [Transformers Pipeline] → Enriched properties
76
+
77
+ [Type Resolver] → Resolved types & defaults
78
+
79
+ [Enum Default Mapper] → User-facing enum values
80
+
81
+ [Chrono Evaluator] → Numeric values & human-readable times
82
+
83
+ [Overrides Applier] → Final properties
84
+
85
+ JSON Schema Output
86
+ ----
87
+
88
+ === Stage 1: Source code parsing
89
+
90
+ The extractor uses https://tree-sitter.github.io/tree-sitter/[Tree-sitter] to parse C++ source code into Abstract Syntax Trees (ASTs). It identifies property declarations in:
91
+
92
+ * `src/v/config/configuration.cc` - Broker and cluster properties
93
+ * `src/v/kafka/client/configuration.cc` - Kafka client properties
94
+ * Other configuration files
95
+
96
+ Properties are declared using Redpanda's property template classes:
97
+
98
+ [,cpp]
99
+ ----
100
+ property<std::optional<int>>("property_name", "Description")
101
+ .default_value(42)
102
+ .visibility(visibility::tunable);
103
+ ----
104
+
105
+ === Stage 2: Type definition extraction
106
+
107
+ The extractor automatically discovers type definitions from C++ headers:
108
+
109
+ ==== Automatically extracted types
110
+
111
+ [cols="1,2,2"]
112
+ |===
113
+ | Type category | Example | Extraction method
114
+
115
+ | *Structs and classes*
116
+ | `model::broker_endpoint`, `config::tls_config`
117
+ | Brace-counting algorithm extracts complete struct bodies including nested types and methods
118
+
119
+ | *Enumerations*
120
+ | `model::compression`, `config::tls_version`
121
+ | Regex pattern matching with support for four conversion function patterns: `_to_string()`, `operator<<`, `string_switch`, and `to_string_view()`
122
+
123
+ | *Type aliases*
124
+ | `using node_id = named_type<int32_t, ...>`
125
+ | Pattern matching for `using` declarations with underlying type resolution
126
+
127
+ | *Enum string mappings*
128
+ | `write_caching_mode::default_false` → `"false"`
129
+ | Extracted from enum-to-string conversion functions using four pattern-matching strategies
130
+ |===
131
+
132
+ ==== Enum string mapping patterns
133
+
134
+ The extractor supports four C++ patterns for mapping enum values to user-facing strings:
135
+
136
+ [cols="1,2"]
137
+ |===
138
+ | Pattern | C++ Code Example
139
+
140
+ | *Pattern 1: `_to_string()` method*
141
+ |
142
+ [,cpp]
143
+ ----
144
+ std::string_view write_caching_mode_to_string(write_caching_mode s) {
145
+ switch(s) {
146
+ case write_caching_mode::default_false:
147
+ return "false";
148
+ }
149
+ }
150
+ ----
151
+
152
+ | *Pattern 2: `operator<<` overload*
153
+ |
154
+ [,cpp]
155
+ ----
156
+ std::ostream& operator<<(std::ostream& os, compression c) {
157
+ switch(c) {
158
+ case compression::gzip:
159
+ os << "gzip";
160
+ }
161
+ }
162
+ ----
163
+
164
+ | *Pattern 3: `string_switch` reverse lookup*
165
+ |
166
+ [,cpp]
167
+ ----
168
+ compression from_string(std::string_view s) {
169
+ return string_switch<compression>(s)
170
+ .match("gzip", compression::gzip)
171
+ .match("snappy", compression::snappy);
172
+ }
173
+ ----
174
+
175
+ | *Pattern 4: `to_string_view()` function*
176
+ |
177
+ [,cpp]
178
+ ----
179
+ constexpr std::string_view to_string_view(tls_version v) {
180
+ switch(v) {
181
+ case tls_version::v1_0:
182
+ return "v1.0";
183
+ case tls_version::v1_2:
184
+ return "v1.2";
185
+ }
186
+ }
187
+ ----
188
+ |===
189
+
190
+ The extractor searches for these patterns in `.cc` files related to the enum's `.h` header file.
191
+
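As a rough illustration of the switch-based patterns above (patterns 1 and 4), the following sketch pulls `case ... return "..."` pairs out of a C++ snippet with a regular expression. The function name `extract_enum_strings` and the regex are assumptions made for this example; the real logic lives in `type_definition_extractor.py` and may work differently.

```python
import re

# Hypothetical pattern: matches `case ns::value: return "text";` across line breaks.
CASE_RETURN = re.compile(r'case\s+([\w:]+)\s*:\s*return\s+"([^"]*)"\s*;')


def extract_enum_strings(cpp_source):
    """Map each enumerator name (namespace stripped) to its user-facing string."""
    mappings = {}
    for qualified, text in CASE_RETURN.findall(cpp_source):
        # Keep only the enumerator itself, e.g. `tls_version::v1_0` -> `v1_0`.
        mappings[qualified.rsplit("::", 1)[-1]] = text
    return mappings


sample = '''
constexpr std::string_view to_string_view(tls_version v) {
    switch (v) {
    case tls_version::v1_0:
        return "v1.0";
    case tls_version::v1_2:
        return "v1.2";
    }
}
'''

print(extract_enum_strings(sample))
```

A regex like this would not cover the `operator<<` or `string_switch` patterns, which need their own matchers.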
192
+ ==== Type namespace resolution
193
+
194
+ The extractor resolves unqualified type names by trying common namespace prefixes:
195
+
196
+ * `config::` - Configuration types
197
+ * `model::` - Core data model types
198
+ * `security::` - Security and authentication types
199
+ * `net::` - Network types
200
+ * `kafka::` - Kafka protocol types
201
+ * `pandaproxy::` - Schema registry types
202
+
203
+ Example: An unqualified type `tls_version` automatically resolves to `config::tls_version` if found in the `config` namespace.
204
+
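The prefix lookup described above can be sketched as follows. This is a minimal illustration, not the extractor's actual code: the function name `resolve_type_name` and the search order are assumptions, and only the prefix list comes from the README.

```python
# Assumed search order, taken from the prefix list above.
NAMESPACE_PREFIXES = ["config", "model", "security", "net", "kafka", "pandaproxy"]


def resolve_type_name(name, definitions):
    """Return the qualified key for `name` in `definitions`, or None if unknown."""
    if name in definitions:
        return name
    if "::" in name:
        # Already qualified but not present: nothing more to try.
        return None
    for prefix in NAMESPACE_PREFIXES:
        candidate = f"{prefix}::{name}"
        if candidate in definitions:
            return candidate
    return None


definitions = {
    "config::tls_version": {"type": "enum"},
    "model::compression": {"type": "enum"},
}
print(resolve_type_name("tls_version", definitions))
```

If the same unqualified name existed in two namespaces, the first matching prefix would win in this sketch, so the order of the list matters.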
205
+ The extractor scans these source directories:
206
+
207
+ * `model/` - Core data model types
208
+ * `config/` - Configuration types
209
+ * `net/` - Network types
210
+ * `kafka/` - Kafka protocol types
211
+ * `pandaproxy/` - Schema registry types
212
+ * `security/` - Security and audit types
213
+ * `utils/` - Utility types
214
+
215
+ === Stage 3: Property enrichment
216
+
217
+ A series of transformers processes extracted properties:
218
+
219
+ [cols="1,2"]
220
+ |===
221
+ | Transformer | Function
222
+
223
+ | `BasicInfoTransformer`
224
+ | Extracts property names, types, and descriptions
225
+
226
+ | `VisibilityTransformer`
227
+ | Determines visibility (user, tunable, deprecated)
228
+
229
+ | `IsNullableTransformer`
230
+ | Detects optional properties
231
+
232
+ | `DefaultValueTransformer`
233
+ | Extracts and resolves default values
234
+
235
+ | `UnitsTransformer`
236
+ | Identifies units (bytes, milliseconds, etc.)
237
+
238
+ | `RequiresRestartTransformer`
239
+ | Determines if changes require restart
240
+
241
+ | `IsSecretTransformer`
242
+ | Marks sensitive properties
243
+ |===
244
+
245
+ ==== Deprecated property detection
246
+
247
+ The extractor identifies deprecated properties using three methods:
248
+
249
+ [cols="1,2,2"]
250
+ |===
251
+ | Detection method | C++ pattern | Result
252
+
253
+ | *Type-based*
254
+ | `deprecated_property<T>("name", ...)`
255
+ | Sets `is_deprecated: true` in JSON output
256
+
257
+ | *Metadata-based*
258
+ | `meta{.deprecated = "reason"}` +
259
+ `meta{.deprecated = yes}`
260
+ | Sets `is_deprecated: true` and optionally captures `deprecated_reason`
261
+
262
+ | *Visibility-based*
263
+ | `meta{.visibility = visibility::deprecated}`
264
+ | Sets `is_deprecated: true` and marks for migration documentation only
265
+ |===
266
+
267
+ Example C++ declarations:
268
+
269
+ [,cpp]
270
+ ----
271
+ // Type-based deprecation
272
+ deprecated_property<int>("old_setting", "Legacy configuration")
273
+ .default_value(42);
274
+
275
+ // Metadata-based deprecation with reason
276
+ property<bool>("legacy_mode", "Old behavior flag")
277
+ .default_value(false)
278
+ .visibility(visibility::user)
279
+ .meta{.deprecated = "Use new_mode instead"};
280
+
281
+ // Visibility-based deprecation
282
+ property<std::string>("obsolete_path", "Deprecated file path")
283
+ .default_value("/old/location")
284
+ .visibility(visibility::deprecated);
285
+ ----
286
+
287
+ Generated JSON output:
288
+
289
+ [,json]
290
+ ----
291
+ {
292
+ "old_setting": {
293
+ "type": "integer",
294
+ "default": 42,
295
+ "is_deprecated": true
296
+ },
297
+ "legacy_mode": {
298
+ "type": "boolean",
299
+ "default": false,
300
+ "is_deprecated": true,
301
+ "deprecated_reason": "Use new_mode instead"
302
+ },
303
+ "obsolete_path": {
304
+ "type": "string",
305
+ "default": "/old/location",
306
+ "is_deprecated": true,
307
+ "visibility": "deprecated"
308
+ }
309
+ }
310
+ ----
311
+
312
+ Deprecated properties appear in migration guides but are excluded from standard user documentation.
313
+
314
+ ==== Experimental property detection
315
+
316
+ The extractor identifies experimental properties that are in development or testing:
317
+
318
+ [cols="1,2,2"]
319
+ |===
320
+ | Detection method | C++ pattern | Result
321
+
322
+ | *Type-based*
323
+ | `experimental_property<T>("name", ...)`
324
+ | Sets `is_experimental_property: true` in JSON output
325
+
326
+ | *Metadata-based*
327
+ | `meta{.experimental = true}` +
328
+ `meta{.experimental = "description"}`
329
+ | Sets `is_experimental_property: true` and optionally captures experimental notes
330
+ |===
331
+
332
+ Example C++ declarations:
333
+
334
+ [,cpp]
335
+ ----
336
+ // Type-based experimental
337
+ experimental_property<int>("new_feature", "Feature in development")
338
+ .default_value(0);
339
+
340
+ // Metadata-based experimental
341
+ property<bool>("beta_mode", "Experimental feature flag")
342
+ .default_value(false)
343
+ .visibility(visibility::tunable)
344
+ .meta{.experimental = true};
345
+ ----
346
+
347
+ Generated JSON output:
348
+
349
+ [,json]
350
+ ----
351
+ {
352
+ "new_feature": {
353
+ "type": "integer",
354
+ "default": 0,
355
+ "is_experimental_property": true
356
+ },
357
+ "beta_mode": {
358
+ "type": "boolean",
359
+ "default": false,
360
+ "is_experimental_property": true
361
+ }
362
+ }
363
+ ----
364
+
365
+ Experimental properties are excluded from the documentation.
366
+
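To show how a consumer of the JSON output might act on these flags, here is a small sketch that separates standard, deprecated, and experimental properties. The function name `partition_properties` is hypothetical; only the `is_deprecated` and `is_experimental_property` field names come from the examples above.

```python
def partition_properties(properties):
    """Split properties into (documented, deprecated); experimental ones are dropped."""
    documented, deprecated = {}, {}
    for name, prop in properties.items():
        if prop.get("is_experimental_property"):
            continue  # excluded from the documentation entirely
        if prop.get("is_deprecated"):
            deprecated[name] = prop  # destined for migration guides only
        else:
            documented[name] = prop
    return documented, deprecated


properties = {
    "old_setting": {"type": "integer", "default": 42, "is_deprecated": True},
    "new_feature": {"type": "integer", "default": 0, "is_experimental_property": True},
    "kafka_api": {"type": "array"},
}
documented, deprecated = partition_properties(properties)
print(sorted(documented), sorted(deprecated))
```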
367
+ === Stage 4: Type resolution
368
+
369
+ The type resolver:
370
+
371
+ . Resolves `$ref` pointers to actual type definitions
372
+ . Expands C++ constructors into JSON-compatible default values
373
+ . Maps C++ types to JSON Schema types
374
+ . Applies enum constraints to properties
375
+
376
+ Example transformation:
377
+
378
+ [,cpp]
379
+ ----
380
+ // C++ source
381
+ property<std::vector<model::broker_endpoint>>("kafka_api")
382
+ .default_value({model::broker_endpoint{"internal", "127.0.0.1", 9092}})
383
+ ----
384
+
385
+ Becomes:
386
+
387
+ [,json]
388
+ ----
389
+ {
390
+ "kafka_api": {
391
+ "type": "array",
392
+ "items": {"$ref": "#/definitions/model::broker_endpoint"},
393
+ "default": [{"name": "internal", "address": "127.0.0.1", "port": 9092}]
394
+ }
395
+ }
396
+ ----
397
+
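The first step of the resolver, following `$ref` pointers, could look roughly like this. It is a sketch under stated assumptions: the helper name `resolve_refs` is invented here, references are assumed to use the `#/definitions/` prefix shown above, and definitions are assumed to be acyclic.

```python
def resolve_refs(node, definitions):
    """Return a copy of `node` with every known `$ref` replaced by its definition."""
    prefix = "#/definitions/"
    if isinstance(node, dict):
        ref = node.get("$ref")
        if isinstance(ref, str) and ref.startswith(prefix):
            target = definitions.get(ref[len(prefix):])
            if target is None:
                return dict(node)  # unknown reference: leave it untouched
            return resolve_refs(target, definitions)
        return {key: resolve_refs(value, definitions) for key, value in node.items()}
    if isinstance(node, list):
        return [resolve_refs(item, definitions) for item in node]
    return node


definitions = {
    "model::broker_endpoint": {
        "type": "object",
        "properties": {"name": {"type": "string"}, "port": {"type": "integer"}},
    }
}
kafka_api = {"type": "array", "items": {"$ref": "#/definitions/model::broker_endpoint"}}
resolved = resolve_refs(kafka_api, definitions)
print(resolved["items"]["type"])
```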
398
+ === Stage 5: Chrono expression evaluation and human-readable formatting
399
+
400
+ The extractor automatically evaluates C++ chrono expressions in default values and provides human-readable time representations:
401
+
402
+ ==== Chrono expression evaluation
403
+
404
+ Mathematical time expressions are converted to numeric values:
405
+
406
+ [,cpp]
407
+ ----
408
+ // C++ source with chrono expressions
409
+ property<std::chrono::milliseconds>("log_segment_ms_max")
410
+ .default_value(24h * 365); // One year in hours
411
+
412
+ property<std::chrono::seconds>("connection_timeout")
413
+ .default_value(7 * 24h); // One week
414
+ ----
415
+
416
+ The extractor:
417
+
418
+ 1. Parses time literals: `24h`, `365d`, `5min`, `30s`, `100ms`
419
+ 2. Evaluates arithmetic: `24h * 365`, `7 * 24h`, `60s + 30s`
420
+ 3. Converts to appropriate unit based on C++ type
421
+ 4. Adds human-readable representation for documentation
422
+
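The first three steps above can be sketched in a few lines of Python. This is an illustration only: `evaluate_chrono`, the literal table, and the type-to-unit table are assumptions for the example, and the sketch handles only integer literals combined with `*`, `+`, and `-`.

```python
import ast
import operator
import re

# Milliseconds per literal suffix (assumed set, based on the literals listed above).
MS_PER_UNIT = {"ms": 1, "s": 1_000, "min": 60_000, "h": 3_600_000, "d": 86_400_000}
# Target unit for each C++ duration type (only the two types used in the examples).
CPP_TYPE_UNIT = {"std::chrono::milliseconds": "ms", "std::chrono::seconds": "s"}
LITERAL = re.compile(r"\b(\d+)(ms|min|s|h|d)\b")
OPS = {ast.Mult: operator.mul, ast.Add: operator.add, ast.Sub: operator.sub}


def _eval(node):
    """Evaluate a tiny arithmetic AST made of integers and *, +, -."""
    if isinstance(node, ast.Constant) and type(node.value) is int:
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in OPS:
        return OPS[type(node.op)](_eval(node.left), _eval(node.right))
    raise ValueError("unsupported expression")


def evaluate_chrono(expr, cpp_type):
    """Convert a chrono expression to an integer count of the type's unit."""
    # Step 1: rewrite every time literal as its value in milliseconds.
    as_ms = LITERAL.sub(lambda m: str(int(m.group(1)) * MS_PER_UNIT[m.group(2)]), expr)
    # Step 2: evaluate the remaining integer arithmetic.
    total_ms = _eval(ast.parse(as_ms, mode="eval").body)
    # Step 3: convert to the unit implied by the C++ type.
    return total_ms // MS_PER_UNIT[CPP_TYPE_UNIT[cpp_type]]


print(evaluate_chrono("24h * 365", "std::chrono::milliseconds"))
```

The sketch does no dimensional checking, so a nonsensical input such as `24h * 2h` would still produce a number.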
423
+ Example transformation:
424
+
425
+ [cols="1,1,1,1"]
426
+ |===
427
+ | C++ Expression | C++ Type | Numeric Value | Human-Readable
428
+
429
+ | `24h * 365`
430
+ | `std::chrono::milliseconds`
431
+ | `31536000000`
432
+ | "1 year"
433
+
434
+ | `7 * 24h`
435
+ | `std::chrono::seconds`
436
+ | `604800`
437
+ | "1 week"
438
+
439
+ | `5min`
440
+ | `std::chrono::seconds`
441
+ | `300`
442
+ | "5 minutes"
443
+
444
+ | `24h`
445
+ | `std::chrono::milliseconds`
446
+ | `86400000`
447
+ | "1 day"
448
+ |===
449
+
450
+ Generated JSON output:
451
+
452
+ [,json]
453
+ ----
454
+ {
455
+ "log_segment_ms_max": {
456
+ "type": "integer",
457
+ "default": 31536000000,
458
+ "default_human_readable": "1 year",
459
+ "c_type": "std::chrono::milliseconds"
460
+ },
461
+ "connection_timeout": {
462
+ "type": "integer",
463
+ "default": 604800,
464
+ "default_human_readable": "1 week",
465
+ "c_type": "std::chrono::seconds"
466
+ }
467
+ }
468
+ ----
469
+
470
+ ==== Human-readable time formatting
471
+
472
+ The `format_time_human_readable()` function automatically selects the most appropriate time unit:
63
473
 
64
- To run the extractor tool directly:
474
+ * Prefers larger units (years > weeks > days > hours > minutes > seconds > milliseconds)
475
+ * Only uses a unit if the value divides evenly
476
+ * Example: 604800 seconds becomes "1 week" instead of "7 days"
477
+
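The selection rule above (largest unit that divides evenly) can be sketched like this. The function below is a hypothetical stand-in named `humanize_ms`, not the package's `format_time_human_readable()`; it assumes the input is in milliseconds and that a year is 365 days.

```python
# (unit name, milliseconds per unit), largest first; a year is assumed to be 365 days.
UNITS = [
    ("year", 365 * 86_400_000),
    ("week", 7 * 86_400_000),
    ("day", 86_400_000),
    ("hour", 3_600_000),
    ("minute", 60_000),
    ("second", 1_000),
    ("millisecond", 1),
]


def humanize_ms(total_ms):
    """Format a millisecond count using the largest unit that divides it evenly."""
    if total_ms == 0:
        return "0 milliseconds"
    for name, size in UNITS:
        if total_ms % size == 0:
            count = total_ms // size
            return f"{count} {name}" + ("" if count == 1 else "s")


print(humanize_ms(604_800 * 1_000))  # 604800 seconds, expressed in milliseconds
```

Because a unit is used only when the value divides evenly, 90 seconds stays "90 seconds" rather than being rounded to minutes.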
478
+ This human-readable format appears in documentation templates alongside the numeric value:
479
+
480
+ [source,asciidoc]
481
+ ----
482
+ | Default
483
+ | `604800` (1 week)
484
+ ----
485
+
486
+ === Stage 6: Enum default mapping
487
+
488
+ Raw C++ enum values are mapped to user-facing strings:
489
+
490
+ [,cpp]
491
+ ----
492
+ enum class write_caching_mode {
493
+ default_true,
494
+ default_false,
495
+ disabled
496
+ };
497
+
498
+ const char* write_caching_mode_to_string(write_caching_mode s) {
499
+ switch (s) { case write_caching_mode::default_false: return "false"; /* other cases elided */ }
500
+ // ...
501
+ }
502
+ ----
503
+
504
+ Properties using this enum automatically map:
505
+
506
+ * Default: `default_false` → `"false"`
507
+ * Enum values: `["true", "false", "disabled"]`
508
+
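A minimal sketch of the default mapping, assuming the string mappings have already been extracted into a dictionary. The function name `map_enum_default` is hypothetical, and the sample mapping for `default_true` and `disabled` is inferred from the enum value list above rather than taken from the source.

```python
def map_enum_default(raw_default, enum_string_mappings):
    """Translate a raw C++ enumerator into its user-facing string, if one is known."""
    # Accept both `default_false` and `write_caching_mode::default_false`.
    enumerator = raw_default.rsplit("::", 1)[-1]
    return enum_string_mappings.get(enumerator, raw_default)


# Inferred sample mapping for write_caching_mode.
write_caching_mappings = {
    "default_true": "true",
    "default_false": "false",
    "disabled": "disabled",
}
print(map_enum_default("write_caching_mode::default_false", write_caching_mappings))
```

Falling back to the raw value when no mapping exists keeps the output usable, at the cost of possibly exposing an internal enumerator name.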
509
+ === Stage 7: Override application
510
+
511
+ The `overrides.json` file allows customization of both properties and type definitions:
512
+
513
+ [,json]
514
+ ----
515
+ {
516
+ "properties": {
517
+ "kafka_api": {
518
+ "description": "Custom description",
519
+ "example": "kafka_api:\n - name: internal\n address: 0.0.0.0\n port: 9092"
520
+ }
521
+ },
522
+ "definitions": {
523
+ "model::compression": {
524
+ "enum": ["none", "gzip", "snappy", "lz4", "zstd", "producer"]
525
+ }
526
+ }
527
+ }
528
+ ----
529
+
530
+ == Command-line reference
531
+
532
+ === Basic usage
65
533
 
66
534
  [,bash]
67
535
  ----
68
- ./property_extractor.py --path <path-to-redpanda-source> [options]
536
+ ./property_extractor.py --path <redpanda-source-path> [options]
69
537
  ----
70
538
 
71
- === Command options
539
+ === Options
72
540
 
541
+ [cols="1,2,1"]
73
542
  |===
74
- | Option | Description
543
+ | Option | Description | Default
75
544
 
76
545
  | `--path <path>`
77
- | Path to the Redpanda source directory to extract properties from (required).
546
+ | Path to Redpanda source directory (required)
547
+ | None
548
+
78
549
  | `--recursive`
79
- | Recursively scan the provided path for header (`*.h`) and implementation (`*.cc`) file pairs.
80
- | `--output <output>`
81
- | Path to the output JSON file. If not provided, the output will be printed to the console.
82
- | `--definitions <definitions>`
83
- | Path to the `definitions.json` file for type definitions (default: included `definitions.json`).
550
+ | Recursively scan for header/implementation file pairs
551
+ | False
552
+
553
+ | `--output <file>`
554
+ | Output JSON file path
555
+ | stdout
556
+
557
+ | `--overrides <file>`
558
+ | JSON file with property and definition overrides
559
+ | `overrides.json`
560
+
84
561
  | `-v`, `--verbose`
85
- | Enable verbose logging for debugging purposes.
562
+ | Enable verbose logging
563
+ | False
86
564
  |===
87
565
 
88
- === Example command
566
+ === Examples
567
+
568
+ Extract properties from Redpanda source:
89
569
 
90
570
  [,bash]
91
571
  ----
92
- ./property_extractor.py --path ./tmp/redpanda --recursive --output autogenerated/properties.json
572
+ ./property_extractor.py --path ./tmp/redpanda/src/v --output properties.json
93
573
  ----
94
574
 
95
- === How it works
575
+ Use custom overrides:
96
576
 
97
- . The tool identifies pairs of header (`*.h`) and implementation (`*.cc`) files in the specified Redpanda source directory. This ensures that both the declaration and definition of properties are available.
577
+ [,bash]
578
+ ----
579
+ ./property_extractor.py \
580
+ --path ./tmp/redpanda/src/v \
581
+ --overrides custom-overrides.json \
582
+ --output properties.json
583
+ ----
98
584
 
99
- . Tree-sitter is used to parse the C{plus}{plus} source code and create abstract syntax trees (ASTs). Both the Tree-sitter C++ library (via a Git submodule) and its Python bindings (`tree_sitter`) are required for this step.
585
+ Enable verbose logging for debugging:
100
586
 
101
- . Custom logic in `property_extractor.py` processes the ASTs to extract property definitions from specific files like:
102
- +
103
- - `src/v/config/configuration.cc`
104
- - `src/v/kafka/client/configuration.cc`
587
+ [,bash]
588
+ ----
589
+ ./property_extractor.py --path ./tmp/redpanda/src/v --verbose
590
+ ----
105
591
 
106
- . Extracted properties are processed by a series of transformers to enrich and normalize the data. For example:
107
- +
108
- - `BasicInfoTransformer`: Extracts names and metadata.
109
- - `VisibilityTransformer`: Determines visibility (e.g., public or private).
110
- - `IsNullableTransformer`: Detects if a property is nullable.
592
+ == Customization
111
593
 
112
- . The `definitions.json` file is merged into the output, linking property types to their descriptions.
594
+ === When to add manual definitions
113
595
 
114
- === JSON output
596
+ You need manual definitions in `overrides.json` only for:
115
597
 
116
- The final JSON contains:
598
+ ==== 1. Types removed from codebase
117
599
 
118
- - `properties`: Extracted properties with metadata.
119
- - `definitions`: Type definitions, merged from `definitions.json`.
600
+ If a type was removed from Redpanda source but properties still reference it:
120
601
 
121
- Example JSON structure:
602
+ [,json]
603
+ ----
604
+ {
605
+ "definitions": {
606
+ "legacy_type": {
607
+ "type": "string",
608
+ "description": "Maintained for backward compatibility"
609
+ }
610
+ }
611
+ }
612
+ ----
613
+
614
+ ==== 2. Complex types not auto-extractable
615
+
616
+ Property classes inheriting from template base classes:
617
+
618
+ [,cpp]
619
+ ----
620
+ class retention_duration_property final
621
+ : public property<std::optional<std::chrono::milliseconds>> {
622
+ // Complex logic, no simple fields to extract
623
+ };
624
+ ----
625
+
626
+ Define manually:
627
+
628
+ [,json]
629
+ ----
630
+ {
631
+ "definitions": {
632
+ "retention_duration_property": {
633
+ "type": "integer",
634
+ "minimum": -2147483648,
635
+ "maximum": 2147483647
636
+ }
637
+ }
638
+ }
639
+ ----
640
+
641
+ ==== 3. Override auto-extracted definitions
642
+
643
+ Provide cleaner enum values or simplified field lists:
644
+
645
+ [,json]
646
+ ----
647
+ {
648
+ "definitions": {
649
+ "model::compression": {
650
+ "$comment": "Overrides auto-extracted enum to exclude internal values",
651
+ "enum": ["none", "gzip", "snappy", "lz4", "zstd", "producer"]
652
+ }
653
+ }
654
+ }
655
+ ----
656
+
657
+ ==== 4. Documentation-only types
658
+
659
+ Types needed for documentation but not in C++ source:
122
660
 
123
661
  [,json]
124
662
  ----
125
663
  {
126
- "properties": {
127
- "example_property": {
128
- "type": "string",
129
- "description": "An example property."
130
- }
664
+ "definitions": {
665
+ "custom_config_type": {
666
+ "type": "object",
667
+ "properties": {
668
+ "host": {"type": "string"},
669
+ "port": {"type": "integer"}
670
+ }
671
+ }
672
+ }
673
+ }
674
+ ----
675
+
676
+ === Override precedence
677
+
678
+ Definitions are applied in this order (later overrides earlier):
679
+
680
+ . Auto-extracted from C++ source
681
+ . `overrides.json` definitions
682
+
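The precedence rule can be illustrated with a small merge function. This is a sketch, assuming overrides are applied field by field on top of the auto-extracted entry; the name `apply_overrides` and the field-level merge are assumptions, and the real implementation may replace whole entries instead.

```python
import copy


def apply_overrides(extracted, overrides):
    """Return a new document where `overrides` fields win over extracted ones."""
    result = copy.deepcopy(extracted)
    for section in ("properties", "definitions"):
        target = result.setdefault(section, {})
        for name, fields in overrides.get(section, {}).items():
            # Later source (overrides.json) replaces earlier values key by key.
            target.setdefault(name, {}).update(copy.deepcopy(fields))
    return result


extracted = {
    "properties": {"kafka_api": {"type": "array", "description": "Extracted text"}},
    "definitions": {"model::compression": {"type": "enum", "enum": ["none", "gzip", "count"]}},
}
overrides = {
    "properties": {"kafka_api": {"description": "Custom description"}},
    "definitions": {"model::compression": {"enum": ["none", "gzip"]}},
}
merged = apply_overrides(extracted, overrides)
print(merged["properties"]["kafka_api"]["description"])
```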
683
+ === Overrides file format
684
+
685
+ The `overrides.json` file supports two top-level keys:
686
+
687
+ [,json]
688
+ ----
689
+ {
690
+ "$comment": "Property and definition overrides for Redpanda property extraction",
691
+
692
+ "properties": {
693
+ "property_name": {
694
+ "description": "Custom description text",
695
+ "example": ".Example\n[,yaml]\n----\nredpanda:\n property_name: value\n----",
696
+ "version": "24.3",
697
+ "related_topics": ["xref:topic.adoc[Link]"],
698
+ "default": "custom_default",
699
+ "config_scope": "broker",
700
+ "type": "string"
701
+ }
702
+ },
703
+
704
+ "definitions": {
705
+ "type::name": {
706
+ "$comment": "Overrides or adds type definition",
707
+ "type": "enum",
708
+ "enum": ["value1", "value2", "value3"],
709
+ "defined_in": "https://github.com/.../file.h#L123"
710
+ }
711
+ }
712
+ }
713
+ ----
714
+
715
+ Property override fields:
716
+
717
+ * `description` - Override auto-extracted description
718
+ * `example` - Add AsciiDoc example block
719
+ * `example_file` - Load example from external file
720
+ * `version` - Version when property was introduced
721
+ * `related_topics` - Array of cross-reference links
722
+ * `default` - Override default value
723
+ * `config_scope` - Specify scope for new properties (broker/cluster/topic)
724
+ * `type` - Specify type for new properties
725
+
726
+ == JSON output format
727
+
728
+ The extractor generates a JSON Schema-like document:
729
+
730
+ [,json]
731
+ ----
732
+ {
733
+ "properties": {
734
+ "property_name": {
735
+ "type": "string",
736
+ "description": "Property description",
737
+ "default": "default_value",
738
+ "required": false,
739
+ "visibility": "tunable",
740
+ "requires_restart": false,
741
+ "config_scope": "broker",
742
+ "units": "bytes",
743
+ "minimum": 0,
744
+ "maximum": 1000,
745
+ "enum": ["option1", "option2"],
746
+ "example": ".Example\n[,yaml]\n----\nredpanda:\n property_name: value\n----"
747
+ }
748
+ },
749
+
750
+ "definitions": {
751
+ "model::broker_endpoint": {
752
+ "type": "object",
753
+ "properties": {
754
+ "name": {"type": "string"},
755
+ "address": {"type": "string"},
756
+ "port": {"type": "integer", "minimum": 0, "maximum": 65535}
757
+ },
758
+ "defined_in": "model/metadata.h"
759
+ },
760
+
761
+ "model::compression": {
762
+ "type": "enum",
763
+ "enum": ["none", "gzip", "snappy", "lz4", "zstd", "producer"],
764
+ "enum_string_mappings": {
765
+ "compression_type_none": "none",
766
+ "compression_type_gzip": "gzip"
767
+ },
768
+ "defined_in": "model/compression.h"
131
769
  },
132
- "definitions": {
133
- "string": {
134
- "description": "A string type."
135
- }
770
+
771
+ "model::node_id": {
772
+ "type": "integer",
773
+ "minimum": -2147483648,
774
+ "maximum": 2147483647,
775
+ "alias_for": "named_type<int32_t, struct node_id_model_type>",
776
+ "defined_in": "model/fundamental.h"
136
777
  }
778
+ }
137
779
  }
138
780
  ----
139
781
 
140
- === Custom definitions
782
+ == Documentation generation
141
783
 
142
- You can provide a custom `definitions.json` file:
784
+ To generate AsciiDoc documentation from the JSON:
143
785
 
144
786
  [,bash]
145
787
  ----
146
- ./property_extractor.py --path ./tmp/redpanda --definitions custom-definitions.json --output autogenerated/custom-output.json
788
+ python3 generate_docs.py
147
789
  ----
148
790
 
149
- === Debugging
791
+ This creates:
792
+
793
+ * `output/pages/broker-properties.adoc` - Broker configuration
794
+ * `output/pages/cluster-properties.adoc` - Cluster configuration
795
+ * `output/pages/object-storage-properties.adoc` - Cloud storage configuration
796
+ * `output/pages/deprecated/partials/deprecated-properties.adoc` - Deprecated properties
797
+
798
+
799
+ == Troubleshooting
800
+
801
+ === Type not found
150
802
 
151
- Enable verbose logging to see detailed information:
803
+ If a property references a type that isn't extracted:
152
804
 
805
+ . Check if the type exists in Redpanda source:
806
+ +
153
807
  [,bash]
154
808
  ----
155
- ./property_extractor.py --path ./tmp/redpanda --verbose
809
+ find tmp/redpanda/src/v -name "*.h" -exec grep -l "your_type_name" {} \;
810
+ ----
811
+
812
+ . If found, check extraction:
813
+ +
814
+ [,bash]
815
+ ----
816
+ ./property_extractor.py --path tmp/redpanda/src/v --verbose 2>&1 | grep "your_type_name"
817
+ ----
818
+
819
+ . If not extracted, add manual definition to `overrides.json`
820
+
821
+ === Enum values incorrect
822
+
823
+ If enum values don't match user-facing strings:
824
+
825
+ . Check for an enum-to-string conversion function in source (`_to_string()`, `operator<<`, `string_switch`, or `to_string_view()`)
826
+ . If missing or incorrect, override in `overrides.json`:
827
+ +
828
+ [,json]
829
+ ----
830
+ {
831
+ "definitions": {
832
+ "model::your_enum": {
833
+ "enum": ["user_value1", "user_value2"]
834
+ }
835
+ }
836
+ }
156
837
  ----
157
838
 
158
- == Run the docs generator manually
839
+ === Missing property fields
159
840
 
160
- . Make sure you have the `autogenerated/properties-output.json` file, relative to the `Makefile` location.
841
+ If extracted properties lack descriptions or defaults:
161
842
 
162
- . Run the script:
843
+ . Check C++ source for property declaration
844
+ . Add override in `overrides.json`:
163
845
  +
846
+ [,json]
847
+ ----
848
+ {
849
+ "properties": {
850
+ "property_name": {
851
+ "description": "Detailed description",
852
+ "example": "..."
853
+ }
854
+ }
855
+ }
856
+ ----
857
+
858
+ === Build failures
859
+
860
+ Tree-sitter compilation errors:
861
+
164
862
  [,bash]
165
863
  ----
166
- python3 generate_docs.py
864
+ cd tree-sitter/tree-sitter-cpp
865
+ git submodule update --init --recursive
167
866
  ----
168
867
 
169
- The script will process the JSON and generate AsciiDoc files in the `output/pages/` directory.
868
+ Python dependency errors:
170
869
 
171
- === Output files
870
+ [,bash]
871
+ ----
872
+ make clean
873
+ make venv
874
+ ----
875
+
876
+ == Advanced usage
877
+
878
+ === Adding new transformers
879
+
880
+ To add custom property transformations:
881
+
882
+ . Create a transformer function in `transformers.py`:
883
+ +
884
+ [,python]
885
+ ----
886
+ def my_custom_transformer(properties):
887
+ """Add custom metadata to properties."""
888
+ for prop_name, prop in properties.items():
889
+ # Add custom logic
890
+ prop['custom_field'] = compute_value(prop)
891
+ return properties
892
+ ----
172
893
 
173
- The following files will be generated:
894
+ . Register in transformer pipeline in `property_extractor.py`:
895
+ +
896
+ [,python]
897
+ ----
898
+ properties = transform_files_with_properties(files_with_properties)
899
+ properties = my_custom_transformer(properties) # Add here
900
+ ----
174
901
 
175
- - Broker Properties: `output/pages/broker-properties.adoc`
176
- - Cluster Properties: `output/pages/cluster-properties.adoc`
177
- - Object Storage Properties: `output/pages/object-storage-properties.adoc`
178
- - Deprecated Properties: `output/pages/deprecated/partials/deprecated-properties.adoc`
902
+ === Extending type extraction
179
903
 
180
- === Error reports
904
+ To support additional C++ patterns:
181
905
 
182
- If the script encounters issues, it will generate error files in the `output/error/` directory:
906
+ . Add extraction method to `type_definition_extractor.py`
907
+ . Register in `_extract_from_file()` method
908
+ . Test extraction on sample files
183
909
 
184
- - `empty_description.txt`: Properties without descriptions.
185
- - `empty_type.txt`: Properties without types.
186
- - `max_without_min.txt`: Properties with a maximum value but no minimum.
187
- - `min_without_max.txt`: Properties with a minimum value but no maximum.
910
+ === Custom output formats
188
911
 
189
- The console output will summarize the errors and property statistics.
912
+ To generate additional output formats:
913
+
914
+ . Load the JSON output:
915
+ +
916
+ [,python]
917
+ ----
918
+ import json
919
+ with open('gen/properties-output.json') as f:
920
+ data = json.load(f)
921
+ ----
190
922
 
191
- === How it works
923
+ . Transform to desired format (YAML, XML, etc.)
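As one possible transformation, the sketch below turns the loaded JSON into CSV using only the standard library. The function name `properties_to_csv` and the chosen columns are assumptions for illustration; the README itself names only YAML and XML as examples of target formats.

```python
import csv
import io


def properties_to_csv(data):
    """Render the `properties` section as CSV text, one row per property."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["name", "type", "default", "description"])
    for name, prop in sorted(data.get("properties", {}).items()):
        writer.writerow([
            name,
            prop.get("type", ""),
            prop.get("default", ""),
            prop.get("description", ""),
        ])
    return buffer.getvalue()


data = {
    "properties": {
        "b_prop": {"type": "integer", "default": 42},
        "a_prop": {"type": "string", "description": "Example"},
    }
}
print(properties_to_csv(data))
```

In practice `data` would come from the `json.load()` call in the previous step rather than an inline dictionary.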
192
924
 
193
- . Input parsing:
194
- - The script loads the JSON file from the `autogenerated/` directory.
195
- - Properties are categorized into groups based on their `defined_in` field or specific naming conventions such as the `cloud_` prefix.
925
+ == Contributing
196
926
 
197
- . Validation:
198
- - Validates fields like `description`, `type`, `maximum`, and `minimum`.
199
- - Identifies missing or inconsistent data and logs these to error files.
927
+ When modifying the extractor:
200
928
 
201
- . Documentation generation:
202
- - Creates AsciiDoc files with categorized properties, including metadata such as type, default value, visibility, and restart requirements.
203
- - Appends appropriate titles, introductions, and formatting for each group.
929
+ . Test on multiple Redpanda versions
930
+ . Update `overrides.json` for new types
931
+ . Run validation: `make test`
932
+ . Document changes in this README
204
933
 
205
- . Error reporting: Generates error reports for easy debugging and correction of the input JSON.
934
+ == Additional resources
206
935
 
936
+ * https://github.com/redpanda-data/redpanda[Redpanda GitHub Repository]
937
+ * https://tree-sitter.github.io/tree-sitter/[Tree-sitter Documentation]