okf-generator 0.1.1__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,9 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2026 Umair Baig
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
6
+
7
+ The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
8
+
9
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
@@ -0,0 +1,320 @@
1
+ Metadata-Version: 2.4
2
+ Name: okf-generator
3
+ Version: 0.1.1
4
+ Summary: Generate OKF v0.1 knowledge bundles from codebases — Claude skill + OpenCode integration
5
+ Project-URL: Homepage, https://github.com/umairbaig/okf-generator
6
+ Project-URL: Documentation, https://github.com/umairbaig/okf-generator#readme
7
+ Project-URL: Repository, https://github.com/umairbaig/okf-generator
8
+ Project-URL: Issues, https://github.com/umairbaig/okf-generator/issues
9
+ Author: Umair Baig
10
+ License: MIT
11
+ License-File: LICENSE
12
+ Keywords: ai-agent,claude-skill,code-knowledge,codebase-indexing,llm,okf,open-knowledge-format,opencode,training-data,tree-sitter
13
+ Classifier: Development Status :: 3 - Alpha
14
+ Classifier: Intended Audience :: Developers
15
+ Classifier: License :: OSI Approved :: MIT License
16
+ Classifier: Programming Language :: Python :: 3
17
+ Classifier: Programming Language :: Python :: 3.11
18
+ Classifier: Programming Language :: Python :: 3.12
19
+ Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
20
+ Classifier: Topic :: Software Development :: Documentation
21
+ Classifier: Topic :: Software Development :: Libraries :: Python Modules
22
+ Requires-Python: >=3.11
23
+ Requires-Dist: pyyaml>=6.0
24
+ Requires-Dist: tqdm>=4.0
25
+ Requires-Dist: tree-sitter-go>=0.23.0
26
+ Requires-Dist: tree-sitter-java>=0.23.0
27
+ Requires-Dist: tree-sitter-javascript>=0.23.0
28
+ Requires-Dist: tree-sitter-python>=0.25.0
29
+ Requires-Dist: tree-sitter-ruby>=0.23.0
30
+ Requires-Dist: tree-sitter-rust>=0.23.0
31
+ Requires-Dist: tree-sitter-typescript>=0.23.0
32
+ Requires-Dist: tree-sitter>=0.25.0
33
+ Provides-Extra: all
34
+ Requires-Dist: build; extra == 'all'
35
+ Requires-Dist: hatch; extra == 'all'
36
+ Requires-Dist: openai>=1.0.0; extra == 'all'
37
+ Requires-Dist: pytest>=7.0; extra == 'all'
38
+ Requires-Dist: twine; extra == 'all'
39
+ Provides-Extra: dev
40
+ Requires-Dist: build; extra == 'dev'
41
+ Requires-Dist: hatch; extra == 'dev'
42
+ Requires-Dist: pytest>=7.0; extra == 'dev'
43
+ Requires-Dist: twine; extra == 'dev'
44
+ Provides-Extra: llm
45
+ Requires-Dist: openai>=1.0.0; extra == 'llm'
46
+ Description-Content-Type: text/markdown
47
+
48
+ <div align="center">
49
+
50
+ <img src="docs/banner.svg" alt="okf-generator banner" width="100%"/>
51
+
52
+ <br/>
53
+
54
+ [![PyPI version](https://img.shields.io/pypi/v/okf-generator?style=flat-square&color=7c3aed&label=PyPI)](https://pypi.org/project/okf-generator/)
55
+ [![Python](https://img.shields.io/pypi/pyversions/okf-generator?style=flat-square&color=06b6d4)](https://pypi.org/project/okf-generator/)
56
+ [![Tests](https://img.shields.io/github/actions/workflow/status/umairbaig/okf-generator/ci.yml?style=flat-square&label=tests&color=4ade80)](https://github.com/umairbaig/okf-generator/actions)
57
+ [![License: MIT](https://img.shields.io/badge/license-MIT-blue?style=flat-square)](LICENSE)
58
+ [![OKF v0.1](https://img.shields.io/badge/OKF-v0.1-7c3aed?style=flat-square)](https://cloud.google.com/blog/products/data-analytics/how-the-open-knowledge-format-can-improve-data-sharing)
59
+ [![Claude Skill](https://img.shields.io/badge/Claude-Skill-orange?style=flat-square&logo=anthropic)](SKILL.md)
60
+ [![PRs Welcome](https://img.shields.io/badge/PRs-welcome-brightgreen?style=flat-square)](CONTRIBUTING.md)
61
+
62
+ **Index any codebase into a structured OKF v0.1 knowledge bundle — then look up exact concepts for AI agents like OpenCode.**
63
+
64
+ [Installation](#installation) · [Quick Start](#quick-start) · [CLI Reference](#cli-reference) · [OpenCode Integration](#opencode-integration) · [Contributing](#contributing)
65
+
66
+ </div>
67
+
68
+ ---
69
+
70
+ ## What is this?
71
+
72
+ `okf-generator` converts your source code into an [Open Knowledge Format](https://cloud.google.com/blog/products/data-analytics/how-the-open-knowledge-format-can-improve-data-sharing) (OKF) v0.1 knowledge bundle — structured markdown files that AI agents can read, search, and reason over.
73
+
74
+ Instead of giving an AI your entire codebase, you give it exactly the concept it needs:
75
+
76
+ ```bash
77
+ # Before touching WorldBankConnector, look it up
78
+ okf lookup WorldBankConnector
79
+
80
+ # CLASS: WorldBankConnector
81
+ # Source : StockAI/RnD/python/connectors/economic_data.py line 51
82
+ # Description : Fetches World Bank development indicators via wbdata API.
83
+ # Methods : get_indicator, search
84
+ # Signature : class WorldBankConnector
85
+ ```
86
+
87
+ ## Features
88
+
89
+ - **6 languages** — Python (stdlib AST), JS/TS/Go/Java/Rust/Ruby (tree-sitter)
90
+ - **Zero LLM required** for extraction — deterministic, fast, offline-capable
91
+ - **OKF v0.1 conformant** — type, description, resource, tags, timestamp
92
+ - **Domain/resource-path layout** — bundle mirrors your source tree exactly
93
+ - **Resumable LLM enrichment** — enrich descriptions with any OpenAI-compat endpoint; safe to interrupt and rerun
94
+ - **OpenCode integration** — `AGENTS.md` + custom commands for pinpoint context injection
95
+ - **Training data pipeline** — convert bundle to JSONL pairs (codegen, QA, doc, summarize, crosslink)
96
+ - **Claude Skill** — install `SKILL.md` to trigger the full pipeline from natural language
97
+
98
+ ## Installation
99
+
100
+ ```bash
101
+ # Core (extraction only — no LLM required)
102
+ pip install okf-generator
103
+
104
+ # With LLM enrichment + training pair generation
105
+ pip install okf-generator[llm]
106
+ ```
107
+
108
+ **Requirements:** Python 3.11+
109
+
110
+ ## Quick Start
111
+
112
+ ```bash
113
+ # 1. Generate a knowledge bundle from your codebase
114
+ okf generate ./my_project ./okf_bundle
115
+
116
+ # 2. Look up a concept (works instantly, zero LLM)
117
+ okf lookup WorldBankConnector
118
+
119
+ # 3. Find all concepts from one file
120
+ okf lookup --file src/connectors/economic_data.py
121
+
122
+ # 4. Generate training pairs from the bundle
123
+ okf pairs ./okf_bundle ./train.jsonl
124
+
125
+ # 5. Regenerate SUMMARY.md after enrichment
126
+ okf summarize ./okf_bundle
127
+ ```
128
+
129
+ ## Bundle Layout
130
+
131
+ The output mirrors your source tree — not flat buckets:
132
+
133
+ ```
134
+ okf_bundle/
135
+ ├── SUMMARY.md ← bird's-eye view for AI agents
136
+ ├── index.md ← root navigation
137
+ ├── log.md ← generation history
138
+ └── StockAI/
139
+ └── RnD/
140
+ └── python/
141
+ └── connectors/
142
+ ├── index.md ← lists all concepts in this folder
143
+ ├── economic_data.md ← Module concept
144
+ └── economic_data/
145
+ ├── WorldBankConnector.md ← Class
146
+ ├── get_indicator.md ← Function
147
+ └── search.md ← Function
148
+ ```
149
+
150
+ Each file is OKF v0.1 conformant:
151
+
152
+ ```yaml
153
+ ---
154
+ type: Class
155
+ title: WorldBankConnector
156
+ description: Fetches World Bank development indicators via wbdata API.
157
+ resource: StockAI/RnD/python/connectors/economic_data.py
158
+ tags:
159
+ - lang:python
160
+ - type:Class
161
+ - module:StockAI
162
+ - domain:RnD
163
+ - git:branch:main
164
+ - git:repo:TrainLLMs
165
+ timestamp: '2026-05-23T09:01:21Z'
166
+ ---
167
+
168
+ # WorldBankConnector
169
+
170
+ ...signature, docstring, params, returns, methods, related concepts...
171
+ ```
172
+
173
+ ## CLI Reference
174
+
175
+ ### `okf generate`
176
+
177
+ ```
178
+ okf generate <source_dir> [output_dir]
179
+
180
+ Options:
181
+ --summarize <bundle_dir> Regenerate SUMMARY.md only (no re-scan)
182
+
183
+ Environment variables (LLM enrichment):
184
+ OKF_ENRICH=1 Enable LLM enrichment
185
+ OKF_BASE_URL OpenAI-compat base URL (default: https://api.anthropic.com/v1)
186
+ OKF_API_KEY API key
187
+ OKF_MODEL Model name (default: claude-sonnet-4-6)
188
+ OKF_MAX_WORKERS Parallel workers (default: 2)
189
+ ```
190
+
191
+ ### `okf lookup`
192
+
193
+ ```
194
+ okf lookup [query] [options]
195
+
196
+ Options:
197
+ --bundle PATH Bundle directory (default: ./okf_bundle)
198
+ --file PATH Filter by source file
199
+ --type TYPE Filter by concept type: Function | Class | Module
200
+ --tag TAG Filter by tag, repeatable: --tag lang:python
201
+ --limit N Max results (default: 10)
202
+ --compact One-line output per result
203
+ --json JSON output for programmatic use
204
+ --full Raw .md file content
205
+ --min-score N Minimum relevance score 0-1 (default: 0.1)
206
+ ```
207
+
208
+ ### `okf pairs`
209
+
210
+ ```
211
+ okf pairs <bundle_dir> [output_file]
212
+
213
+ Environment variables:
214
+ SKIP_SYNTH=1 Static pairs only (no LLM)
215
+ SYNTH_BASE_URL API endpoint
216
+ SYNTH_API_KEY API key
217
+ SYNTH_MODEL Model name
218
+ MAX_WORKERS Parallel workers (default: 3)
219
+ QA_PER_CONCEPT Q&A pairs per concept (default: 3)
220
+ PAIR_TYPES Comma-separated: codegen,qa,doc,summarize,crosslink
221
+ ```
222
+
223
+ ## Supported Languages
224
+
225
+ | Language | Parser | Extracts |
226
+ |----------|--------|---------|
227
+ | Python | stdlib `ast` | Functions, classes, params, return types, docstrings |
228
+ | JavaScript / TypeScript | tree-sitter | Functions, arrow fns, classes, JSDoc |
229
+ | Go | tree-sitter | Funcs, methods, structs, interfaces, GoDoc |
230
+ | Java | tree-sitter | Classes, methods, constructors, Javadoc |
231
+ | Rust | tree-sitter | Fns, structs, enums, traits, impl blocks, `///` |
232
+ | Ruby | tree-sitter | Defs, classes, modules, `#` comments |
233
+
234
+ ## LLM Enrichment
235
+
236
+ Works with any OpenAI-compatible endpoint — Claude, Ollama, llama.cpp, etc:
237
+
238
+ ```bash
239
+ # Using a local llama.cpp server
240
+ OKF_ENRICH=1 \
241
+ OKF_BASE_URL="http://localhost:8080/v1" \
242
+ OKF_API_KEY="llamabarn" \
243
+ OKF_MODEL="ggml-org/gemma-3-4b-it-qat-GGUF:Q4_0" \
244
+ OKF_MAX_WORKERS=2 \
245
+ okf generate ./my_project ./okf_bundle
246
+ ```
247
+
248
+ Enrichment is **resumable** — interrupt and rerun freely. Already-enriched concepts are skipped.
249
+
250
+ ## OpenCode Integration
251
+
252
+ ```bash
253
+ # 1. Tell OpenCode about the bundle (auto-loaded every session)
254
+ cat >> AGENTS.md << 'EOF'
255
+ ## OKF Knowledge Bundle
256
+ Before working on any class or function, look it up:
257
+ okf lookup --bundle ./okf_bundle <ConceptName>
258
+ EOF
259
+
260
+ # 2. Add a custom command
261
+ mkdir -p .opencode/commands
262
+ echo "RUN okf lookup --bundle ./okf_bundle \$NAME" > .opencode/commands/lookup.md
263
+ ```
264
+
265
+ Then in OpenCode: `/lookup NAME=WorldBankConnector`
266
+
267
+ See [docs/opencode-integration.md](references/opencode-integration.md) for full setup.
268
+
269
+ ## Python API
270
+
271
+ ```python
272
+ from okf.generator import scan_codebase, write_bundle, write_summary
273
+ from okf.lookup import load_bundle, search
274
+
275
+ # Generate bundle
276
+ concepts = scan_codebase("./my_project")
277
+ write_bundle(concepts, "./okf_bundle", "my_project", ["initial generation"])
278
+ write_summary("my_project", concepts, "./okf_bundle", {})
279
+
280
+ # Search concepts
281
+ bundle = load_bundle("./okf_bundle")
282
+ results = search(bundle, tokens=["WorldBankConnector"])
283
+ print(results[0]["description"])
284
+ ```
285
+
286
+ ## Training Data
287
+
288
+ Convert your OKF bundle into JSONL training pairs for fine-tuning:
289
+
290
+ ```bash
291
+ # 5 pair types: codegen, qa, doc, summarize, crosslink
292
+ okf pairs ./okf_bundle ./train.jsonl
293
+ ```
294
+
295
+ Each pair is in chat format compatible with most fine-tuning pipelines.
296
+
297
+ ## Claude Skill
298
+
299
+ Install `SKILL.md` to trigger the full pipeline from natural language inside Claude:
300
+
301
+ > *"Index my codebase"* → generates OKF bundle
302
+ > *"Look up WorldBankConnector"* → returns exact concept
303
+ > *"Generate training pairs from my bundle"* → outputs JSONL
304
+
305
+ ## Contributing
306
+
307
+ Contributions are welcome! See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.
308
+
309
+ ```bash
310
+ git clone https://github.com/umairbaig/okf-generator
311
+ cd okf-generator
312
+ pip install -e ".[dev]"
313
+ pytest tests/
314
+ ```
315
+
316
+ **Good first issues:** adding a new language parser, improving fuzzy search scoring, adding a CHANGELOG.
317
+
318
+ ## License
319
+
320
+ [MIT](LICENSE) — Copyright © 2026 Umair Baig
@@ -0,0 +1,273 @@
1
+ <div align="center">
2
+
3
+ <img src="docs/banner.svg" alt="okf-generator banner" width="100%"/>
4
+
5
+ <br/>
6
+
7
+ [![PyPI version](https://img.shields.io/pypi/v/okf-generator?style=flat-square&color=7c3aed&label=PyPI)](https://pypi.org/project/okf-generator/)
8
+ [![Python](https://img.shields.io/pypi/pyversions/okf-generator?style=flat-square&color=06b6d4)](https://pypi.org/project/okf-generator/)
9
+ [![Tests](https://img.shields.io/github/actions/workflow/status/umairbaig/okf-generator/ci.yml?style=flat-square&label=tests&color=4ade80)](https://github.com/umairbaig/okf-generator/actions)
10
+ [![License: MIT](https://img.shields.io/badge/license-MIT-blue?style=flat-square)](LICENSE)
11
+ [![OKF v0.1](https://img.shields.io/badge/OKF-v0.1-7c3aed?style=flat-square)](https://cloud.google.com/blog/products/data-analytics/how-the-open-knowledge-format-can-improve-data-sharing)
12
+ [![Claude Skill](https://img.shields.io/badge/Claude-Skill-orange?style=flat-square&logo=anthropic)](SKILL.md)
13
+ [![PRs Welcome](https://img.shields.io/badge/PRs-welcome-brightgreen?style=flat-square)](CONTRIBUTING.md)
14
+
15
+ **Index any codebase into a structured OKF v0.1 knowledge bundle — then look up exact concepts for AI agents like OpenCode.**
16
+
17
+ [Installation](#installation) · [Quick Start](#quick-start) · [CLI Reference](#cli-reference) · [OpenCode Integration](#opencode-integration) · [Contributing](#contributing)
18
+
19
+ </div>
20
+
21
+ ---
22
+
23
+ ## What is this?
24
+
25
+ `okf-generator` converts your source code into an [Open Knowledge Format](https://cloud.google.com/blog/products/data-analytics/how-the-open-knowledge-format-can-improve-data-sharing) (OKF) v0.1 knowledge bundle — structured markdown files that AI agents can read, search, and reason over.
26
+
27
+ Instead of giving an AI your entire codebase, you give it exactly the concept it needs:
28
+
29
+ ```bash
30
+ # Before touching WorldBankConnector, look it up
31
+ okf lookup WorldBankConnector
32
+
33
+ # CLASS: WorldBankConnector
34
+ # Source : StockAI/RnD/python/connectors/economic_data.py line 51
35
+ # Description : Fetches World Bank development indicators via wbdata API.
36
+ # Methods : get_indicator, search
37
+ # Signature : class WorldBankConnector
38
+ ```
39
+
40
+ ## Features
41
+
42
+ - **6 languages** — Python (stdlib AST), JS/TS/Go/Java/Rust/Ruby (tree-sitter)
43
+ - **Zero LLM required** for extraction — deterministic, fast, offline-capable
44
+ - **OKF v0.1 conformant** — type, description, resource, tags, timestamp
45
+ - **Domain/resource-path layout** — bundle mirrors your source tree exactly
46
+ - **Resumable LLM enrichment** — enrich descriptions with any OpenAI-compat endpoint; safe to interrupt and rerun
47
+ - **OpenCode integration** — `AGENTS.md` + custom commands for pinpoint context injection
48
+ - **Training data pipeline** — convert bundle to JSONL pairs (codegen, QA, doc, summarize, crosslink)
49
+ - **Claude Skill** — install `SKILL.md` to trigger the full pipeline from natural language
50
+
51
+ ## Installation
52
+
53
+ ```bash
54
+ # Core (extraction only — no LLM required)
55
+ pip install okf-generator
56
+
57
+ # With LLM enrichment + training pair generation
58
+ pip install okf-generator[llm]
59
+ ```
60
+
61
+ **Requirements:** Python 3.11+
62
+
63
+ ## Quick Start
64
+
65
+ ```bash
66
+ # 1. Generate a knowledge bundle from your codebase
67
+ okf generate ./my_project ./okf_bundle
68
+
69
+ # 2. Look up a concept (works instantly, zero LLM)
70
+ okf lookup WorldBankConnector
71
+
72
+ # 3. Find all concepts from one file
73
+ okf lookup --file src/connectors/economic_data.py
74
+
75
+ # 4. Generate training pairs from the bundle
76
+ okf pairs ./okf_bundle ./train.jsonl
77
+
78
+ # 5. Regenerate SUMMARY.md after enrichment
79
+ okf summarize ./okf_bundle
80
+ ```
81
+
82
+ ## Bundle Layout
83
+
84
+ The output mirrors your source tree — not flat buckets:
85
+
86
+ ```
87
+ okf_bundle/
88
+ ├── SUMMARY.md ← bird's-eye view for AI agents
89
+ ├── index.md ← root navigation
90
+ ├── log.md ← generation history
91
+ └── StockAI/
92
+ └── RnD/
93
+ └── python/
94
+ └── connectors/
95
+ ├── index.md ← lists all concepts in this folder
96
+ ├── economic_data.md ← Module concept
97
+ └── economic_data/
98
+ ├── WorldBankConnector.md ← Class
99
+ ├── get_indicator.md ← Function
100
+ └── search.md ← Function
101
+ ```
102
+
103
+ Each file is OKF v0.1 conformant:
104
+
105
+ ```yaml
106
+ ---
107
+ type: Class
108
+ title: WorldBankConnector
109
+ description: Fetches World Bank development indicators via wbdata API.
110
+ resource: StockAI/RnD/python/connectors/economic_data.py
111
+ tags:
112
+ - lang:python
113
+ - type:Class
114
+ - module:StockAI
115
+ - domain:RnD
116
+ - git:branch:main
117
+ - git:repo:TrainLLMs
118
+ timestamp: '2026-05-23T09:01:21Z'
119
+ ---
120
+
121
+ # WorldBankConnector
122
+
123
+ ...signature, docstring, params, returns, methods, related concepts...
124
+ ```
125
+
126
+ ## CLI Reference
127
+
128
+ ### `okf generate`
129
+
130
+ ```
131
+ okf generate <source_dir> [output_dir]
132
+
133
+ Options:
134
+ --summarize <bundle_dir> Regenerate SUMMARY.md only (no re-scan)
135
+
136
+ Environment variables (LLM enrichment):
137
+ OKF_ENRICH=1 Enable LLM enrichment
138
+ OKF_BASE_URL OpenAI-compat base URL (default: https://api.anthropic.com/v1)
139
+ OKF_API_KEY API key
140
+ OKF_MODEL Model name (default: claude-sonnet-4-6)
141
+ OKF_MAX_WORKERS Parallel workers (default: 2)
142
+ ```
143
+
144
+ ### `okf lookup`
145
+
146
+ ```
147
+ okf lookup [query] [options]
148
+
149
+ Options:
150
+ --bundle PATH Bundle directory (default: ./okf_bundle)
151
+ --file PATH Filter by source file
152
+ --type TYPE Filter by concept type: Function | Class | Module
153
+ --tag TAG Filter by tag, repeatable: --tag lang:python
154
+ --limit N Max results (default: 10)
155
+ --compact One-line output per result
156
+ --json JSON output for programmatic use
157
+ --full Raw .md file content
158
+ --min-score N Minimum relevance score 0-1 (default: 0.1)
159
+ ```
160
+
161
+ ### `okf pairs`
162
+
163
+ ```
164
+ okf pairs <bundle_dir> [output_file]
165
+
166
+ Environment variables:
167
+ SKIP_SYNTH=1 Static pairs only (no LLM)
168
+ SYNTH_BASE_URL API endpoint
169
+ SYNTH_API_KEY API key
170
+ SYNTH_MODEL Model name
171
+ MAX_WORKERS Parallel workers (default: 3)
172
+ QA_PER_CONCEPT Q&A pairs per concept (default: 3)
173
+ PAIR_TYPES Comma-separated: codegen,qa,doc,summarize,crosslink
174
+ ```
175
+
176
+ ## Supported Languages
177
+
178
+ | Language | Parser | Extracts |
179
+ |----------|--------|---------|
180
+ | Python | stdlib `ast` | Functions, classes, params, return types, docstrings |
181
+ | JavaScript / TypeScript | tree-sitter | Functions, arrow fns, classes, JSDoc |
182
+ | Go | tree-sitter | Funcs, methods, structs, interfaces, GoDoc |
183
+ | Java | tree-sitter | Classes, methods, constructors, Javadoc |
184
+ | Rust | tree-sitter | Fns, structs, enums, traits, impl blocks, `///` |
185
+ | Ruby | tree-sitter | Defs, classes, modules, `#` comments |
186
+
187
+ ## LLM Enrichment
188
+
189
+ Works with any OpenAI-compatible endpoint — Claude, Ollama, llama.cpp, etc:
190
+
191
+ ```bash
192
+ # Using a local llama.cpp server
193
+ OKF_ENRICH=1 \
194
+ OKF_BASE_URL="http://localhost:8080/v1" \
195
+ OKF_API_KEY="llamabarn" \
196
+ OKF_MODEL="ggml-org/gemma-3-4b-it-qat-GGUF:Q4_0" \
197
+ OKF_MAX_WORKERS=2 \
198
+ okf generate ./my_project ./okf_bundle
199
+ ```
200
+
201
+ Enrichment is **resumable** — interrupt and rerun freely. Already-enriched concepts are skipped.
202
+
203
+ ## OpenCode Integration
204
+
205
+ ```bash
206
+ # 1. Tell OpenCode about the bundle (auto-loaded every session)
207
+ cat >> AGENTS.md << 'EOF'
208
+ ## OKF Knowledge Bundle
209
+ Before working on any class or function, look it up:
210
+ okf lookup --bundle ./okf_bundle <ConceptName>
211
+ EOF
212
+
213
+ # 2. Add a custom command
214
+ mkdir -p .opencode/commands
215
+ echo "RUN okf lookup --bundle ./okf_bundle \$NAME" > .opencode/commands/lookup.md
216
+ ```
217
+
218
+ Then in OpenCode: `/lookup NAME=WorldBankConnector`
219
+
220
+ See [docs/opencode-integration.md](references/opencode-integration.md) for full setup.
221
+
222
+ ## Python API
223
+
224
+ ```python
225
+ from okf.generator import scan_codebase, write_bundle, write_summary
226
+ from okf.lookup import load_bundle, search
227
+
228
+ # Generate bundle
229
+ concepts = scan_codebase("./my_project")
230
+ write_bundle(concepts, "./okf_bundle", "my_project", ["initial generation"])
231
+ write_summary("my_project", concepts, "./okf_bundle", {})
232
+
233
+ # Search concepts
234
+ bundle = load_bundle("./okf_bundle")
235
+ results = search(bundle, tokens=["WorldBankConnector"])
236
+ print(results[0]["description"])
237
+ ```
238
+
239
+ ## Training Data
240
+
241
+ Convert your OKF bundle into JSONL training pairs for fine-tuning:
242
+
243
+ ```bash
244
+ # 5 pair types: codegen, qa, doc, summarize, crosslink
245
+ okf pairs ./okf_bundle ./train.jsonl
246
+ ```
247
+
248
+ Each pair is in chat format compatible with most fine-tuning pipelines.
249
+
250
+ ## Claude Skill
251
+
252
+ Install `SKILL.md` to trigger the full pipeline from natural language inside Claude:
253
+
254
+ > *"Index my codebase"* → generates OKF bundle
255
+ > *"Look up WorldBankConnector"* → returns exact concept
256
+ > *"Generate training pairs from my bundle"* → outputs JSONL
257
+
258
+ ## Contributing
259
+
260
+ Contributions are welcome! See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.
261
+
262
+ ```bash
263
+ git clone https://github.com/umairbaig/okf-generator
264
+ cd okf-generator
265
+ pip install -e ".[dev]"
266
+ pytest tests/
267
+ ```
268
+
269
+ **Good first issues:** adding a new language parser, improving fuzzy search scoring, adding a CHANGELOG.
270
+
271
+ ## License
272
+
273
+ [MIT](LICENSE) — Copyright © 2026 Umair Baig