okf-schema 0.2.0__tar.gz

@@ -0,0 +1,220 @@
1
+ # Byte-compiled / optimized / DLL files
2
+ __pycache__/
3
+ *.py[codz]
4
+ *$py.class
5
+
6
+ # C extensions
7
+ *.so
8
+
9
+ # Distribution / packaging
10
+ .Python
11
+ build/
12
+ develop-eggs/
13
+ dist/
14
+ downloads/
15
+ eggs/
16
+ .eggs/
17
+ lib/
18
+ lib64/
19
+ parts/
20
+ sdist/
21
+ var/
22
+ wheels/
23
+ share/python-wheels/
24
+ *.egg-info/
25
+ .installed.cfg
26
+ *.egg
27
+ MANIFEST
28
+
29
+ # PyInstaller
30
+ # Usually these files are written by a python script from a template
31
+ # before PyInstaller builds the exe, so as to inject date/other infos into it.
32
+ *.manifest
33
+ *.spec
34
+
35
+ # Installer logs
36
+ pip-log.txt
37
+ pip-delete-this-directory.txt
38
+
39
+ # Unit test / coverage reports
40
+ htmlcov/
41
+ .tox/
42
+ .nox/
43
+ .coverage
44
+ .coverage.*
45
+ .cache
46
+ nosetests.xml
47
+ coverage.xml
48
+ *.cover
49
+ *.py.cover
50
+ .hypothesis/
51
+ .pytest_cache/
52
+ cover/
53
+
54
+ # Translations
55
+ *.mo
56
+ *.pot
57
+
58
+ # Django stuff:
59
+ *.log
60
+ local_settings.py
61
+ db.sqlite3
62
+ db.sqlite3-journal
63
+
64
+ # Flask stuff:
65
+ instance/
66
+ .webassets-cache
67
+
68
+ # Scrapy stuff:
69
+ .scrapy
70
+
71
+ # Sphinx documentation
72
+ docs/_build/
73
+
74
+ # PyBuilder
75
+ .pybuilder/
76
+ target/
77
+
78
+ # Jupyter Notebook
79
+ .ipynb_checkpoints
80
+
81
+ # IPython
82
+ profile_default/
83
+ ipython_config.py
84
+
85
+ # pyenv
86
+ # For a library or package, you might want to ignore these files since the code is
87
+ # intended to run in multiple environments; otherwise, check them in:
88
+ # .python-version
89
+
90
+ # pipenv
91
+ # According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
92
+ # However, in case of collaboration, if having platform-specific dependencies or dependencies
93
+ # having no cross-platform support, pipenv may install dependencies that don't work, or not
94
+ # install all needed dependencies.
95
+ # Pipfile.lock
96
+
97
+ # UV
98
+ # Similar to Pipfile.lock, it is generally recommended to include uv.lock in version control.
99
+ # This is especially recommended for binary packages to ensure reproducibility, and is more
100
+ # commonly ignored for libraries.
101
+ # uv.lock
102
+
103
+ # poetry
104
+ # Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
105
+ # This is especially recommended for binary packages to ensure reproducibility, and is more
106
+ # commonly ignored for libraries.
107
+ # https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
108
+ # poetry.lock
109
+ # poetry.toml
110
+
111
+ # pdm
112
+ # Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
113
+ # pdm recommends including project-wide configuration in pdm.toml, but excluding .pdm-python.
114
+ # https://pdm-project.org/en/latest/usage/project/#working-with-version-control
115
+ # pdm.lock
116
+ # pdm.toml
117
+ .pdm-python
118
+ .pdm-build/
119
+
120
+ # pixi
121
+ # Similar to Pipfile.lock, it is generally recommended to include pixi.lock in version control.
122
+ # pixi.lock
123
+ # Pixi creates a virtual environment in the .pixi directory, just like venv module creates one
124
+ # in the .venv directory. It is recommended not to include this directory in version control.
125
+ .pixi
126
+
127
+ # PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
128
+ __pypackages__/
129
+
130
+ # Celery stuff
131
+ celerybeat-schedule
132
+ celerybeat.pid
133
+
134
+ # Redis
135
+ *.rdb
136
+ *.aof
137
+ *.pid
138
+
139
+ # RabbitMQ
140
+ mnesia/
141
+ rabbitmq/
142
+ rabbitmq-data/
143
+
144
+ # ActiveMQ
145
+ activemq-data/
146
+
147
+ # SageMath parsed files
148
+ *.sage.py
149
+
150
+ # Environments
151
+ .env
152
+ .envrc
153
+ .venv
154
+ env/
155
+ venv/
156
+ ENV/
157
+ env.bak/
158
+ venv.bak/
159
+
160
+ # Spyder project settings
161
+ .spyderproject
162
+ .spyproject
163
+
164
+ # Rope project settings
165
+ .ropeproject
166
+
167
+ # mkdocs documentation
168
+ /site
169
+
170
+ # mypy
171
+ .mypy_cache/
172
+ .dmypy.json
173
+ dmypy.json
174
+
175
+ # Pyre type checker
176
+ .pyre/
177
+
178
+ # pytype static type analyzer
179
+ .pytype/
180
+
181
+ # Cython debug symbols
182
+ cython_debug/
183
+
184
+ # PyCharm
185
+ # JetBrains specific template is maintained in a separate JetBrains.gitignore that can
186
+ # be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
187
+ # and can be added to the global gitignore or merged into this file. For a more nuclear
188
+ # option (not recommended) you can uncomment the following to ignore the entire idea folder.
189
+ # .idea/
190
+
191
+ # Abstra
192
+ # Abstra is an AI-powered process automation framework.
193
+ # Ignore directories containing user credentials, local state, and settings.
194
+ # Learn more at https://abstra.io/docs
195
+ .abstra/
196
+
197
+ # Visual Studio Code
198
+ # Visual Studio Code specific template is maintained in a separate VisualStudioCode.gitignore
199
+ # that can be found at https://github.com/github/gitignore/blob/main/Global/VisualStudioCode.gitignore
200
+ # and can be added to the global gitignore or merged into this file. However, if you prefer,
201
+ # you could uncomment the following to ignore the entire vscode folder
202
+ # .vscode/
203
+ # Temporary file for partial code execution
204
+ tempCodeRunnerFile.py
205
+
206
+ # Ruff stuff:
207
+ .ruff_cache/
208
+
209
+ # PyPI configuration file
210
+ .pypirc
211
+
212
+ # Marimo
213
+ marimo/_static/
214
+ marimo/_lsp/
215
+ __marimo__/
216
+
217
+ # Streamlit
218
+ .streamlit/secrets.toml
219
+
220
+ .agents/changes/
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2026 Gaetan Semet
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
@@ -0,0 +1,281 @@
1
+ Metadata-Version: 2.4
2
+ Name: okf-schema
3
+ Version: 0.2.0
4
+ Summary: CLI tool and Python library for working with OKF (Open Knowledge Format) bundles
5
+ Project-URL: Homepage, https://github.com/gsemet/okf-schema
6
+ Project-URL: Documentation, https://okf-schema.readthedocs.io
7
+ Project-URL: Repository, https://github.com/gsemet/okf-schema
8
+ Project-URL: Issues, https://github.com/gsemet/okf-schema/issues
9
+ Author-email: Gaetan Semet <gaetan@xeberon.net>
10
+ License-Expression: MIT
11
+ License-File: LICENSE
12
+ Classifier: Development Status :: 4 - Beta
13
+ Classifier: Intended Audience :: Developers
14
+ Classifier: License :: OSI Approved :: MIT License
15
+ Classifier: Programming Language :: Python :: 3
16
+ Classifier: Programming Language :: Python :: 3.10
17
+ Classifier: Programming Language :: Python :: 3.11
18
+ Classifier: Programming Language :: Python :: 3.12
19
+ Classifier: Programming Language :: Python :: 3.13
20
+ Classifier: Topic :: Software Development :: Libraries :: Python Modules
21
+ Classifier: Topic :: Text Processing :: Markup
22
+ Requires-Python: >=3.10
23
+ Requires-Dist: click>=8.0
24
+ Requires-Dist: jsonschema>=4.23.0
25
+ Requires-Dist: pyjson5>=1.6.0
26
+ Requires-Dist: ruamel-yaml>=0.18.0
27
+ Description-Content-Type: text/markdown
28
+
29
+ # okf-schema
30
+
31
+ [![CI](https://github.com/gsemet/okf-schema/actions/workflows/ci.yml/badge.svg)](https://github.com/gsemet/okf-schema/actions/workflows/ci.yml)
32
+ [![PyPI](https://img.shields.io/pypi/v/okf-schema)](https://pypi.org/project/okf-schema/)
33
+ [![Python Versions](https://img.shields.io/pypi/pyversions/okf-schema)](https://pypi.org/project/okf-schema/)
34
+ [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
35
+
36
+ **okf-schema** is a CLI tool and Python library for working with **OKF (Open Knowledge Format)** bundles.
37
+ It validates frontmatter metadata against a JSONSchema and reformats frontmatter while preserving comments.
38
+
39
+ OKF is a markdown-based knowledge format where each concept is a markdown file with YAML frontmatter.
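As an illustration of the "markdown file with YAML frontmatter" layout, here is a minimal sketch that separates the frontmatter from the body using only the standard library. The function name `split_frontmatter` is hypothetical and is not part of the `okf-schema` API. The sketch assumes the frontmatter is delimited by two `---` lines at the top of the file, and it does not parse the YAML itself.

```python
def split_frontmatter(text: str) -> tuple[str, str]:
    """Split a markdown document into (frontmatter, body).

    Returns an empty frontmatter string when the document does not start
    with a '---' delimiter line or when the closing delimiter is missing.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return "", text
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            frontmatter = "\n".join(lines[1:i])
            body = "\n".join(lines[i + 1:])
            return frontmatter, body
    return "", text


doc = """---
type: concept
title: Retrieval-Augmented Generation
---

# Retrieval-Augmented Generation
"""
frontmatter, body = split_frontmatter(doc)
print(frontmatter)
```

The raw frontmatter string would then be handed to a YAML parser; the package metadata lists `ruamel-yaml` as a dependency for that purpose.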
40
+
41
+ > [!IMPORTANT]
42
+ > OKF-schema is opinionated. It delivers a valid OKF bundle, but it adds a structure to the frontmatter that
43
+ > is not allowed by the OKF specification:
44
+ >
45
+ > ```raw
46
+ > Type values are **not** registered centrally. Producers SHOULD pick
47
+ > values that are descriptive and self-explanatory; consumers MUST
48
+ > tolerate unknown types gracefully (typically by treating them as
49
+ > generic concepts).
50
+ > ```
51
+ >
52
+ > In a strict OKF bundle, the `type` field is mandatory but can take any value and the validator needs
53
+ > to allow any field in the frontmatter.
54
+ >
55
+ > OKF-schema **requires** the type to be one of the registered types in the `_schema/` directory
56
+ > and validates the frontmatter against the corresponding schema.
57
+ > Additional properties may or may not be allowed depending on the schema definition.
58
+
59
+ ## What `okf-schema` adds to OKF
60
+
61
+ Plain OKF only defines a folder of markdown files. `okf-schema` turns those files into a **validated, queryable knowledge base** by adding:
62
+
63
+ | Capability | What it does |
64
+ |-----------|--------------|
65
+ | **Schema-driven frontmatter validation** | Every concept's YAML frontmatter is checked against a JSONSchema. Invalid fields, missing required keys, or wrong types are reported as structured errors. |
66
+ | **Auto-discovered schemas** | Schemas live inside the bundle under `_schema/` (e.g. `_schema/concept.schema.yaml`). The `type` field in a concept's frontmatter tells `okf-schema` which schema file to load. A concept with `type: concept` is validated against `_schema/concept.schema.yaml`. Schemas can be written in **YAML**, **JSON**, or **JSON5** (JSON with comments and trailing commas). |
67
+ | **Bundle integrity checks** | Detects broken internal links, missing `index.md` files, malformed `log.md` entries, and reserved-file violations. |
68
+ | **Safe linting** | Normalizes YAML frontmatter by flattening nested lists and converting block-style to inline notation while preserving comments and custom quotes via `ruamel.yaml`. |
69
+ | **Analytics** | Reports bundle statistics through the `stats` subcommand. |
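To make the "Safe linting" row more concrete, the following sketch shows what flattening a nested list and rendering it inline could look like on plain Python data. It is only an approximation under stated assumptions: `flatten` and `to_inline` are hypothetical helpers, the rendering ignores YAML quoting rules, and it does not preserve comments the way the real tool does via `ruamel.yaml`.

```python
from typing import Any


def flatten(value: list[Any]) -> list[Any]:
    """Recursively flatten nested lists into one flat list, keeping order."""
    flat: list[Any] = []
    for item in value:
        if isinstance(item, list):
            flat.extend(flatten(item))
        else:
            flat.append(item)
    return flat


def to_inline(key: str, values: list[Any]) -> str:
    """Render a list as a single-line (inline) sequence; no quoting is applied."""
    return f"{key}: [{', '.join(str(v) for v in values)}]"


tags = ["rag", ["retrieval", ["llm"]], "knowledge-base"]
print(to_inline("tags", flatten(tags)))
```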
70
+
71
+ See a real schema definition in [`examples/ai-llm-knowledge-base/_schema/concept.schema.yaml`](examples/ai-llm-knowledge-base/_schema/concept.schema.yaml).
72
+
73
+ Example bundle structure:
74
+
75
+ ```raw
76
+ my-bundle/
77
+ ├── _schema/
78
+ │ ├── concept.schema.yaml
79
+ │ ├── tool.schema.json
80
+ │ └── paper.schema.json5
81
+ ├── concepts/
82
+ │ ├── rag.md
83
+ │ └── chain-of-thought.md
84
+ ├── tools/
85
+ │ ├── langchain.md
86
+ │ └── llamaindex.md
87
+ ├── papers/
88
+ │ ├── rag-paper.md
89
+ │ └── chain-of-thought-paper.md
90
+ ├── index.md
91
+ └── log.md
92
+ ```
93
+
94
+ The `type` field in each entity frontmatter determines which schema is used for validation.
95
+ For example, `type: concept` uses `_schema/concept.schema.yaml`, while `type: tool` uses `_schema/tool.schema.json`.
96
+
97
+ Schema extensions supported:
98
+
99
+ - `.schema.yaml` — YAML (human-friendly, supports comments and anchors)
100
+ - `.schema.json` — JSON (strict syntax, widely supported by editors)
101
+ - `.schema.json5` — JSON5 (JSON with comments, trailing commas, and unquoted keys)
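The lookup from a `type` value to a schema file can be sketched as below. This is an illustration, not the library's implementation: `find_schema_file` is a hypothetical name, and the order in which the three extensions are tried is an assumption of the sketch.

```python
import tempfile
from pathlib import Path
from typing import Optional

# The three extensions listed above; the lookup order is an assumption.
SCHEMA_EXTENSIONS = (".schema.yaml", ".schema.json", ".schema.json5")


def find_schema_file(bundle: Path, type_name: str) -> Optional[Path]:
    """Return the first `_schema/<type><ext>` file that exists, else None."""
    for ext in SCHEMA_EXTENSIONS:
        candidate = bundle / "_schema" / f"{type_name}{ext}"
        if candidate.is_file():
            return candidate
    return None


# Build a throwaway bundle with a single schema to exercise the lookup.
with tempfile.TemporaryDirectory() as tmp:
    bundle = Path(tmp)
    (bundle / "_schema").mkdir()
    (bundle / "_schema" / "tool.schema.json").write_text("{}")
    found = find_schema_file(bundle, "tool")
    missing = find_schema_file(bundle, "paper")
    print(found.name if found else None, missing)
```

Returning `None` for an unknown type leaves the caller free to decide how to report it; per the note at the top of this README, `okf-schema` requires the type to be registered.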
102
+
103
+ ## Installation
104
+
105
+ ```bash
106
+ uv tool install okf-schema
107
+ ```
108
+
109
+ ## Quickstart
110
+
111
+ ```bash
112
+ # Initialize a new OKF bundle
113
+ okf-schema init my-bundle
114
+
115
+ # Update index.md files for all directories
116
+ okf-schema index --path my-bundle/bundle
117
+
118
+ # Lint frontmatter (flatten nested lists and convert block-style to inline)
119
+ okf-schema lint --path my-bundle/bundle
120
+
121
+ # Validate a bundle
122
+ okf-schema validate --path my-bundle/bundle
123
+ # or enforce strict validation (fail on warnings)
124
+ okf-schema validate --path my-bundle/bundle --strict
125
+
126
+ # List all concepts
127
+ okf-schema list --path my-bundle/bundle
128
+ ```
129
+
130
+ ## CLI Reference
131
+
132
+ | Subcommand | Description |
133
+ |-----------|-------------|
134
+ | `init <name>` | Create a new OKF bundle directory structure |
135
+ | `new --path <dir> --name <name>` | Create a new concept file with frontmatter template |
136
+ | `validate --path <bundle>` | Validate bundle structure and frontmatter |
137
+ | `validate --path <bundle> --strict` | Validate and fail on warnings |
138
+ | `lint --path <bundle>` | Lint frontmatter: flatten nested lists and convert block-style to inline |
139
+ | `list --path <bundle>` | List all concepts in a bundle |
140
+ | `show --path <bundle> <concept>` | Show a single concept's frontmatter and body |
141
+ | `index --path <bundle>` | Regenerate all `index.md` files |
142
+ | `stats --path <bundle>` | Show bundle statistics |
143
+
144
+ ## Recommended Workflow
145
+
146
+ Before packaging or distributing a bundle, run these three commands in order and fix all warnings:
147
+
148
+ ```bash
149
+ okf-schema index --path my-bundle/bundle # regenerate index.md files
150
+ okf-schema lint --path my-bundle/bundle # flatten nested lists and convert block lists to inline
151
+ okf-schema validate --path my-bundle/bundle --strict # check structure, schema, and links; fail on warnings
152
+ ```
153
+
154
+ Only zip or ship the bundle once `validate --strict` reports **zero errors and zero warnings**. Warnings such as missing `index.md` (W4), block-style lists (W7), or broken cross-links (W2) signal issues that will degrade the experience for downstream consumers.
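For intuition about the missing `index.md` warning, here is a rough standalone check written with `pathlib`. It is not how `okf-schema` implements W4: the helper name `dirs_missing_index` is hypothetical, and the rules it applies (every directory including the bundle root needs an `index.md`, while `_schema/` and hidden directories are skipped) are assumptions of this sketch.

```python
import tempfile
from pathlib import Path


def dirs_missing_index(bundle: Path) -> list[Path]:
    """List bundle directories (root included) that lack an index.md file.

    Skipping `_schema/` and hidden directories is an assumption of this sketch.
    """
    candidates = [bundle] + [p for p in bundle.rglob("*") if p.is_dir()]
    missing = []
    for directory in candidates:
        rel_parts = directory.relative_to(bundle).parts
        if any(part == "_schema" or part.startswith(".") for part in rel_parts):
            continue
        if not (directory / "index.md").is_file():
            missing.append(directory)
    return sorted(missing)


# Throwaway bundle: the root has index.md, concepts/ does not.
with tempfile.TemporaryDirectory() as tmp:
    bundle = Path(tmp)
    (bundle / "index.md").write_text("# Index\n")
    (bundle / "_schema").mkdir()
    (bundle / "concepts").mkdir()
    (bundle / "concepts" / "rag.md").write_text("# RAG\n")
    names = [p.name for p in dirs_missing_index(bundle)]
    print(names)
```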
155
+
156
+ ## Example: AI & LLM Knowledge Base
157
+
158
+ The [`examples/ai-llm-knowledge-base/`](examples/ai-llm-knowledge-base/) directory contains a realistic knowledge base with **three concept types** — `concept`, `tool`, and `paper` — each validated by its own schema in `_schema/`.
159
+
160
+ ### How `type` selects the schema
161
+
162
+ The `type` field in a concept's frontmatter determines which schema file is loaded. A file with `type: concept` is validated against `_schema/concept.schema.yaml`; `type: tool` against `_schema/tool.schema.json`; and `type: paper` against `_schema/paper.schema.json5`.
163
+
164
+ ### Schema format support
165
+
166
+ `okf-schema` accepts schemas in three formats:
167
+
168
+ | Extension | Format | Notes |
169
+ |-----------|--------|-------|
170
+ | `.schema.yaml` | YAML | Human-friendly, supports comments and anchors |
171
+ | `.schema.json` | JSON | Strict syntax, widely supported by editors |
172
+ | `.schema.json5` | JSON5 | JSON with comments, trailing commas, and unquoted keys |
173
+
174
+ ### Schema highlights
175
+
176
+ **`concept.schema.yaml`** — AI concepts with enums, email validation, and kebab-case regex:
177
+
178
+ ```yaml
179
+ properties:
180
+ category:
181
+ enum: [LLM, AI Agent, Coding Agent, Prompt Engineering, Tooling, Evaluation]
182
+ maturity:
183
+ enum: [experimental, beta, production, deprecated]
184
+ author_email:
185
+ type: string
186
+ format: email
187
+ tags:
188
+ type: array
189
+ items:
190
+ pattern: "^[a-z0-9-]+$" # kebab-case only
191
+ ```
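The `pattern` and `enum` constraints in this excerpt can be tried out by hand with the standard library. The sketch below is a simplified stand-in for a JSONSchema validator, not the validator itself: `check_frontmatter` is a hypothetical helper, and it only covers the `tags` and `maturity` fields copied from the excerpt.

```python
import re

# Copied from the schema excerpt above.
TAG_PATTERN = re.compile(r"^[a-z0-9-]+$")
MATURITY_VALUES = {"experimental", "beta", "production", "deprecated"}


def check_frontmatter(data: dict) -> list[str]:
    """Return human-readable problems for the `tags` and `maturity` fields."""
    problems = []
    for tag in data.get("tags", []):
        if not TAG_PATTERN.search(tag):
            problems.append(f"tag {tag!r} is not kebab-case")
    maturity = data.get("maturity")
    if maturity is not None and maturity not in MATURITY_VALUES:
        problems.append(f"maturity {maturity!r} is not an allowed value")
    return problems


good = {"tags": ["rag", "knowledge-base"], "maturity": "production"}
bad = {"tags": ["RAG", "knowledge_base"], "maturity": "stable"}
print(check_frontmatter(good))
print(check_frontmatter(bad))
```

Uppercase letters and underscores both fall outside the character class, so `RAG` and `knowledge_base` are rejected while `knowledge-base` passes.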
192
+
193
+ **`tool.schema.json`** — Developer tools with URI validation and language enums:
194
+
195
+ ```json
196
+ {
197
+ "properties": {
198
+ "license": {
199
+ "enum": ["MIT", "Apache-2.0", "GPL-3.0", "Proprietary", "Other"]
200
+ },
201
+ "language": {
202
+ "enum": ["Python", "JavaScript", "TypeScript", "Rust", "Go", "Java", "Multi-language"]
203
+ },
204
+ "url": { "type": "string", "format": "uri" }
205
+ }
206
+ }
207
+ ```
208
+
209
+ **`paper.schema.json5`** — Research papers with year bounds and venue enums:
210
+
211
+ ```javascript
212
+ // JSON5 allows comments, trailing commas, and unquoted keys
213
+ {
214
+ properties: {
215
+ year: { type: "integer", minimum: 1950, maximum: 2030 },
216
+ venue: {
217
+ enum: ["NeurIPS", "ICML", "ICLR", "ACL", "EMNLP", "arXiv", "Other"]
218
+ },
219
+ bibtex_key: { pattern: "^[A-Za-z0-9_-]+$" },
220
+ },
221
+ }
222
+ ```
223
+
224
+ ### Concept file example (`concepts/rag.md`)
225
+
226
+ ```markdown
227
+ ---
228
+ type: concept
229
+ title: Retrieval-Augmented Generation
230
+ description: >
231
+ A technique that enhances LLM outputs by retrieving relevant documents
232
+ from an external knowledge store and injecting them into the prompt.
233
+ category: LLM
234
+ maturity: production
235
+ author_email: bob@example.com
236
+ complexity: intermediate
237
+ tags: [rag, retrieval, llm, knowledge-base]
238
+ related_tools: [LangChain, LlamaIndex, OpenAI-API]
239
+ ---
240
+
241
+ # Retrieval-Augmented Generation
242
+
243
+ RAG combines parametric knowledge (the model's weights) with non-parametric
244
+ knowledge (external documents) to reduce hallucinations...
245
+ ```
246
+
247
+ ### Validation in action
248
+
249
+ ```bash
250
+ # Validates all concepts, tools, and papers against their respective schemas
251
+ okf-schema validate --path examples/ai-llm-knowledge-base
252
+
253
+ # Show bundle statistics
254
+ okf-schema stats --path examples/ai-llm-knowledge-base
255
+ ```
256
+
257
+ ## Python API
258
+
259
+ ```python
260
+ from okf_schema.api import validate_bundle
261
+
262
+ report = validate_bundle("path/to/bundle")
263
+ for finding in report.findings:
264
+ print(finding.level, finding.message)
265
+
266
+ # The _schema/ directory inside the bundle is auto-discovered.
267
+ # You can also pass an explicit schema_db path:
268
+ # report = validate_bundle("path/to/bundle", schema_db="path/to/schemas")
269
+ ```
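Building on the `finding.level` attribute used above, a caller might summarize a report and mimic the "fail on warnings" behaviour of `--strict`. This is a sketch under assumptions: the level strings `"error"` and `"warning"` are guesses, `summarize` and `exit_code` are hypothetical helpers, and the findings are stand-in objects rather than real output from `validate_bundle`.

```python
from collections import Counter
from types import SimpleNamespace


def summarize(findings) -> Counter:
    """Count findings per level; each finding only needs a `.level` attribute."""
    return Counter(f.level for f in findings)


def exit_code(findings, strict: bool = False) -> int:
    """Return 1 on any error, or on any warning when strict is True."""
    counts = summarize(findings)
    if counts["error"] or (strict and counts["warning"]):
        return 1
    return 0


# Stand-in objects; a real report would come from validate_bundle(...).
# The level values "error" and "warning" are assumptions of this sketch.
findings = [
    SimpleNamespace(level="warning", message="missing index.md"),
    SimpleNamespace(level="warning", message="block-style list"),
]
print(summarize(findings))
print(exit_code(findings), exit_code(findings, strict=True))
```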
270
+
271
+ ## Documentation
272
+
273
+ Full documentation is available at [https://okf-schema.readthedocs.io](https://okf-schema.readthedocs.io).
274
+
275
+ ## Contributing
276
+
277
+ See [CONTRIBUTING.md](CONTRIBUTING.md) for development setup and guidelines.
278
+
279
+ ## License
280
+
281
+ MIT License — see [LICENSE](LICENSE) for details.