@hasna/knowledge 0.2.12 → 0.2.14

package/README.md CHANGED
@@ -89,6 +89,10 @@ open-knowledge safety status --scope project --json
  # Inspect AI SDK provider credentials and model aliases
  open-knowledge providers status --scope project --json
  open-knowledge providers models --scope project --json
+
+ # Embed indexed chunks and run semantic search
+ open-knowledge embeddings index --scope project --model openai:text-embedding-3-small --json
+ open-knowledge embeddings search "company wiki policy" --scope project --json
  ```
 
  ## Commands
@@ -258,6 +262,22 @@ such as `default`, `fast`, `reasoning`, `sonnet`, and `deepseek`, and records
  provider capability metadata for structured output, tool use, tool streaming,
  reasoning, embeddings, and native web-search support.
 
+ ### embeddings
+ ```bash
+ open-knowledge embeddings status [--scope project] [--json]
+ open-knowledge embeddings index [--model openai:text-embedding-3-small] [--limit <n>] [--scope project] [--json]
+ open-knowledge embeddings search <query> [--model openai:text-embedding-3-small] [--limit <n>] [--scope project] [--json]
+ ```
+ Build and query the local vector index over derived knowledge chunks. The first
+ implementation stores vectors in SQLite as JSON rows in `chunk_embeddings` and
+ `vector_index_entries`, with provider/model/dimensions, source revision/hash,
+ chunk offsets, token counts, invalidation status, and provenance metadata. Raw
+ source bytes remain owned by `open-files`; semantic results return cited chunks
+ with source refs and revision metadata.
+
+ OpenAI embeddings use AI SDK v6 and `OPENAI_API_KEY`. `--fake` provides
+ deterministic local vectors for tests and offline smoke checks.
+
  ### help
  ```bash
  open-knowledge help [command]
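
Reviewer note on the `embeddings` section added in this hunk: the README describes vectors stored as JSON rows and a `--fake` mode that yields deterministic local vectors. The following is a minimal TypeScript sketch of that idea, not the package's implementation. The `EmbeddingRow` shape, the hashing scheme in `fakeEmbed`, and the `search` helper are all hypothetical names introduced for illustration; the only detail taken from the README is that vectors are serialized as JSON and that fake vectors are deterministic.

```typescript
// Hypothetical row shape: the real `chunk_embeddings` columns are not shown in the diff.
interface EmbeddingRow {
  chunkId: string;
  vectorJson: string; // JSON-encoded number[]
}

// Deterministic stand-in for an embedding model (an assumption, not the
// algorithm behind `--fake`): bucket each token by a simple rolling hash.
function fakeEmbed(text: string, dimensions = 8): number[] {
  const vector: number[] = new Array<number>(dimensions).fill(0);
  for (const token of text.toLowerCase().split(/\W+/).filter(Boolean)) {
    let hash = 0;
    for (let i = 0; i < token.length; i++) {
      hash = (hash * 31 + token.charCodeAt(i)) >>> 0; // keep within 32 bits
    }
    vector[hash % dimensions] += 1;
  }
  return vector;
}

// Cosine similarity between two equal-length vectors; 0 if either is all zeros.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}

// Brute-force search: decode every stored vector and rank by similarity.
function search(query: string, rows: EmbeddingRow[], limit = 5) {
  const queryVector = fakeEmbed(query);
  return rows
    .map((row) => ({
      chunkId: row.chunkId,
      score: cosineSimilarity(queryVector, JSON.parse(row.vectorJson) as number[]),
    }))
    .sort((x, y) => y.score - x.score)
    .slice(0, limit);
}
```

A brute-force scan like this is only reasonable for small local indexes; whether the package scans, caches, or uses a dedicated vector structure is not stated in the diff.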
@@ -293,8 +313,10 @@ The MCP server exposes item tools (`ok_add`, `ok_list`, `ok_get`, `ok_update`,
  `ok_delete`, `ok_archive`, `ok_restore`, `ok_upsert`, `ok_untag`,
  `ok_bulk_delete`, `ok_prune`, `ok_dedupe`, `ok_stats`, `ok_export`,
  `ok_import`, `ok_batch`), workspace/storage inspection (`ok_paths`,
- `ok_storage_status`), and source-ref parsing/resolution
- (`ok_parse_source_ref`, `ok_resolve_source`).
+ `ok_storage_status`), provider/embedding tools (`ok_provider_status`,
+ `ok_provider_models`, `ok_embeddings_status`, `ok_embeddings_index`,
+ `ok_semantic_search`), and source-ref parsing/resolution (`ok_parse_source_ref`,
+ `ok_resolve_source`).
 
  ## Source And Artifact Boundary
 
@@ -314,10 +336,21 @@ source ref. It does not copy raw files into the knowledge workspace; local file,
  S3, web, and open-files inputs are converted into redacted chunks with offsets,
  hashes, revision metadata, and FTS rows.
 
+ Chunks, resolver results, generated wiki pages, and index records carry
+ provenance metadata: source owner, source ref/URI, revision/hash, chunk offsets,
+ read-only status, citation requirements, and stale-source status. This keeps
+ future semantic search and wiki compile flows tied back to `open-files` instead
+ of detached Markdown.
+
+ Semantic indexing stores generated vector rows and provenance only. It does not
+ store raw S3 or local-file bytes in the knowledge app, so a future hosted/S3
+ wrapper can move generated artifacts to object storage while source ownership
+ and immutable object identity stay in `open-files`.
+
  AI provider configuration is local/BYOK by default. `open-knowledge` declares
  AI SDK v6 provider support through `ai`, `@ai-sdk/openai`,
  `@ai-sdk/anthropic`, and `@ai-sdk/deepseek`, but does not call providers until a
- future prompt/agent command explicitly requests a model.
+ prompt, embedding, or agent command explicitly requests a model.
 
  Generated knowledge artifacts can be stored locally under
  `.hasna/apps/knowledge/artifacts` or through the S3 artifact-store adapter.
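
Reviewer note on the provenance paragraph added in this hunk: the README lists the metadata each chunk carries (source owner, source ref/URI, revision/hash, chunk offsets, read-only status, citation requirements, stale-source status). The sketch below shows one way such a record could drive a stale-source check and a citation string. Every field name, the `isStale` and `formatCitation` helpers, the example ref, and the citation format are hypothetical; the diff does not show the package's actual schema.

```typescript
// Hypothetical provenance record modeled on the fields the README lists.
interface ChunkProvenance {
  sourceOwner: string;   // e.g. "open-files"
  sourceRef: string;     // source ref or URI, format assumed
  revisionHash: string;  // hash of the source revision the chunk was derived from
  offsetStart: number;   // chunk start offset within the source
  offsetEnd: number;     // chunk end offset within the source
  readOnly: boolean;
  citationRequired: boolean;
}

// A chunk is treated as stale when the source's current hash no longer
// matches the hash recorded at indexing time.
function isStale(provenance: ChunkProvenance, currentSourceHash: string): boolean {
  return provenance.revisionHash !== currentSourceHash;
}

// Illustrative citation format only: ref, revision, and offset range.
function formatCitation(provenance: ChunkProvenance): string {
  return `${provenance.sourceRef}@${provenance.revisionHash}#${provenance.offsetStart}-${provenance.offsetEnd}`;
}

// Usage with made-up values.
const example: ChunkProvenance = {
  sourceOwner: "open-files",
  sourceRef: "example/policy.md",
  revisionHash: "abc123",
  offsetStart: 0,
  offsetEnd: 512,
  readOnly: true,
  citationRequired: true,
};

console.log(isStale(example, "abc123")); // same hash, so the chunk is still current
console.log(formatCitation(example));
```

Keeping the revision hash on the derived record is what lets a stale vector row be invalidated without the knowledge app ever holding the raw source bytes.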