@hasna/knowledge 0.2.13 → 0.2.15

package/README.md CHANGED
@@ -89,6 +89,14 @@ open-knowledge safety status --scope project --json
  # Inspect AI SDK provider credentials and model aliases
  open-knowledge providers status --scope project --json
  open-knowledge providers models --scope project --json
+
+ # Embed indexed chunks and run semantic search
+ open-knowledge embeddings index --scope project --model openai:text-embedding-3-small --json
+ open-knowledge embeddings search "company wiki policy" --scope project --json
+
+ # Hybrid search over source chunks, generated wiki pages, indexes, and optional vectors
+ open-knowledge search "company wiki policy" --scope project --json
+ open-knowledge search "company wiki policy" --scope project --semantic --json
  ```
 
  ## Commands
@@ -233,6 +241,17 @@ Consume open-files JSON or JSONL change events. This invalidates matching
  source chunks and embeddings by source ref, revision, or hash, updates
  permission/path/delete metadata, and records a local run ledger.
 
+ ### search
+ ```bash
+ open-knowledge search <query> [--scope project] [--limit <n>] [--json]
+ open-knowledge search <query> --semantic [--model openai:text-embedding-3-small] [--scope project] [--json]
+ ```
+ Run hybrid search over `chunks_fts`, generated wiki chunks, wiki/index catalog
+ rows, and optional vector results. The default path is local-only keyword and
+ catalog search. `--semantic` embeds the query and merges vector results from
+ `vector_index_entries`, preserving source refs, artifact URIs, citations,
+ revision/hash metadata, and provenance in each structured result.
+
  ### safety
  ```bash
  open-knowledge safety status [--scope project] [--json]
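
Note on the `search` hunk above: the README says `--semantic` merges vector results into the keyword and catalog results, but it does not say how the merge is scored. The sketch below uses reciprocal rank fusion, a common technique chosen here purely for illustration; it is not confirmed to be what `open-knowledge` does. The `RankedHit` shape, the `fuseByReciprocalRank` name, and the sample identifiers are all hypothetical.

```typescript
// Hypothetical result shape; the real CLI's JSON fields are not shown in the README.
interface RankedHit {
  chunkId: string; // assumed stable identifier for a chunk
  sourceRef: string; // assumed provenance field carried through the merge
}

type ScoredHit = RankedHit & { score: number };

// Reciprocal rank fusion: every list contributes 1 / (k + rank) for each hit,
// so a chunk ranked well by several lists accumulates a higher score.
function fuseByReciprocalRank(lists: RankedHit[][], k = 60): ScoredHit[] {
  const fused = new Map<string, ScoredHit>();
  for (const list of lists) {
    list.forEach((hit, index) => {
      const contribution = 1 / (k + index + 1); // ranks are 1-based
      const existing = fused.get(hit.chunkId);
      if (existing) {
        existing.score += contribution;
      } else {
        // Keep the provenance fields of the first occurrence.
        fused.set(hit.chunkId, { ...hit, score: contribution });
      }
    });
  }
  return [...fused.values()].sort((a, b) => b.score - a.score);
}

// Placeholder data standing in for keyword results and vector results.
const keywordHits: RankedHit[] = [
  { chunkId: "c1", sourceRef: "example-source-a" },
  { chunkId: "c2", sourceRef: "example-source-b" },
];
const vectorHits: RankedHit[] = [
  { chunkId: "c2", sourceRef: "example-source-b" },
  { chunkId: "c3", sourceRef: "example-source-c" },
];

const merged = fuseByReciprocalRank([keywordHits, vectorHits]);
console.log(merged.map((hit) => hit.chunkId));
```

Rank-based fusion avoids having to normalize keyword scores against cosine scores, which is why it is a frequent default for hybrid retrieval; a weighted score blend would be an equally plausible design.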
@@ -258,6 +277,22 @@ such as `default`, `fast`, `reasoning`, `sonnet`, and `deepseek`, and records
  provider capability metadata for structured output, tool use, tool streaming,
  reasoning, embeddings, and native web-search support.
 
+ ### embeddings
+ ```bash
+ open-knowledge embeddings status [--scope project] [--json]
+ open-knowledge embeddings index [--model openai:text-embedding-3-small] [--limit <n>] [--scope project] [--json]
+ open-knowledge embeddings search <query> [--model openai:text-embedding-3-small] [--limit <n>] [--scope project] [--json]
+ ```
+ Build and query the local vector index over derived knowledge chunks. The first
+ implementation stores vectors in SQLite as JSON rows in `chunk_embeddings` and
+ `vector_index_entries`, with provider/model/dimensions, source revision/hash,
+ chunk offsets, token counts, invalidation status, and provenance metadata. Raw
+ source bytes remain owned by `open-files`; semantic results return cited chunks
+ with source refs and revision metadata.
+
+ OpenAI embeddings use AI SDK v6 and `OPENAI_API_KEY`. `--fake` provides
+ deterministic local vectors for tests and offline smoke checks.
+
  ### help
  ```bash
  open-knowledge help [command]
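
Note on the `embeddings` hunk above: the README describes vectors stored as JSON rows and a `--fake` mode with deterministic local vectors, without showing either. The sketch below illustrates the general idea under stated assumptions: a token-hashing stand-in for an embedding model and a brute-force cosine scan over JSON-serialized vectors. The `EmbeddingRow` shape, the function names, and the 16-dimension default are hypothetical and are not taken from the package.

```typescript
// Hypothetical row shape, loosely modelled on the metadata the README lists.
interface EmbeddingRow {
  chunkId: string;
  sourceRef: string;
  vectorJson: string; // vector serialized as a JSON array, as it might sit in a TEXT column
}

// Deterministic stand-in for an embedding model: hashes each token (FNV-1a)
// into a bucket, then L2-normalizes. It carries no real semantics.
function fakeEmbed(text: string, dimensions = 16): number[] {
  const vector = new Array<number>(dimensions).fill(0);
  for (const token of text.toLowerCase().split(/\W+/).filter(Boolean)) {
    let hash = 2166136261; // FNV-1a 32-bit offset basis
    for (let i = 0; i < token.length; i++) {
      hash ^= token.charCodeAt(i);
      hash = Math.imul(hash, 16777619) >>> 0; // keep as unsigned 32-bit
    }
    vector[hash % dimensions] += 1;
  }
  const norm = Math.hypot(...vector) || 1; // avoid dividing by zero for empty text
  return vector.map((value) => value / norm);
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denominator = Math.sqrt(normA) * Math.sqrt(normB);
  return denominator === 0 ? 0 : dot / denominator;
}

// Brute-force scan: parse each stored vector, score it, return the top hits
// with their provenance fields attached.
function searchRows(query: string, rows: EmbeddingRow[], limit = 5) {
  const queryVector = fakeEmbed(query);
  return rows
    .map((row) => ({
      chunkId: row.chunkId,
      sourceRef: row.sourceRef,
      score: cosineSimilarity(queryVector, JSON.parse(row.vectorJson) as number[]),
    }))
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}

// Placeholder chunks standing in for indexed knowledge chunks.
const sampleChunks = [
  { chunkId: "chunk-1", text: "company wiki policy for new hires" },
  { chunkId: "chunk-2", text: "quarterly revenue spreadsheet" },
];
const rows: EmbeddingRow[] = sampleChunks.map((chunk) => ({
  chunkId: chunk.chunkId,
  sourceRef: `example-source-${chunk.chunkId}`,
  vectorJson: JSON.stringify(fakeEmbed(chunk.text)),
}));

console.log(searchRows("company wiki policy", rows));
```

A linear scan over JSON rows is simple and dependency-free, which suits a first implementation and small local indexes; it would likely need a dedicated vector index once the chunk count grows.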
@@ -293,8 +328,10 @@ The MCP server exposes item tools (`ok_add`, `ok_list`, `ok_get`, `ok_update`,
  `ok_delete`, `ok_archive`, `ok_restore`, `ok_upsert`, `ok_untag`,
  `ok_bulk_delete`, `ok_prune`, `ok_dedupe`, `ok_stats`, `ok_export`,
  `ok_import`, `ok_batch`), workspace/storage inspection (`ok_paths`,
- `ok_storage_status`), and source-ref parsing/resolution
- (`ok_parse_source_ref`, `ok_resolve_source`).
+ `ok_storage_status`), provider/embedding tools (`ok_provider_status`,
+ `ok_provider_models`, `ok_embeddings_status`, `ok_embeddings_index`,
+ `ok_semantic_search`), hybrid retrieval (`ok_search`), and source-ref
+ parsing/resolution (`ok_parse_source_ref`, `ok_resolve_source`).
 
  ## Source And Artifact Boundary
 
@@ -320,10 +357,15 @@ read-only status, citation requirements, and stale-source status. This keeps
  future semantic search and wiki compile flows tied back to `open-files` instead
  of detached Markdown.
 
+ Semantic indexing stores generated vector rows and provenance only. It does not
+ store raw S3 or local-file bytes in the knowledge app, so a future hosted/S3
+ wrapper can move generated artifacts to object storage while source ownership
+ and immutable object identity stay in `open-files`.
+
  AI provider configuration is local/BYOK by default. `open-knowledge` declares
  AI SDK v6 provider support through `ai`, `@ai-sdk/openai`,
  `@ai-sdk/anthropic`, and `@ai-sdk/deepseek`, but does not call providers until a
- future prompt/agent command explicitly requests a model.
+ prompt, embedding, or agent command explicitly requests a model.
 
  Generated knowledge artifacts can be stored locally under
  `.hasna/apps/knowledge/artifacts` or through the S3 artifact-store adapter.