ai-navigator 0.1.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (35) hide show
  1. ai_navigator-0.1.0/CLAUDE.md +317 -0
  2. ai_navigator-0.1.0/PARSER_DESIGN.md +62 -0
  3. ai_navigator-0.1.0/PKG-INFO +443 -0
  4. ai_navigator-0.1.0/README.md +404 -0
  5. ai_navigator-0.1.0/SCHEMA_GUIDE.md +861 -0
  6. ai_navigator-0.1.0/_lab/logprob.py +208 -0
  7. ai_navigator-0.1.0/_lab/position.py +232 -0
  8. ai_navigator-0.1.0/pyproject.toml +64 -0
  9. ai_navigator-0.1.0/src/ai_navigator/__init__.py +57 -0
  10. ai_navigator-0.1.0/src/ai_navigator/conf_parser/__init__.py +4 -0
  11. ai_navigator-0.1.0/src/ai_navigator/conf_parser/parser.py +63 -0
  12. ai_navigator-0.1.0/src/ai_navigator/conf_parser/prompt.py +154 -0
  13. ai_navigator-0.1.0/src/ai_navigator/infra/__init__.py +39 -0
  14. ai_navigator-0.1.0/src/ai_navigator/infra/base_server.py +7 -0
  15. ai_navigator-0.1.0/src/ai_navigator/infra/const_configs.py +28 -0
  16. ai_navigator-0.1.0/src/ai_navigator/infra/credentials.py +64 -0
  17. ai_navigator-0.1.0/src/ai_navigator/infra/exceptions.py +51 -0
  18. ai_navigator-0.1.0/src/ai_navigator/infra/logger.py +19 -0
  19. ai_navigator-0.1.0/src/ai_navigator/infra/models.py +53 -0
  20. ai_navigator-0.1.0/src/ai_navigator/infra/state.py +72 -0
  21. ai_navigator-0.1.0/src/ai_navigator/infra/storage.py +278 -0
  22. ai_navigator-0.1.0/src/ai_navigator/parser/__init__.py +3 -0
  23. ai_navigator-0.1.0/src/ai_navigator/parser/response.py +168 -0
  24. ai_navigator-0.1.0/src/ai_navigator/pre_processor/__init__.py +3 -0
  25. ai_navigator-0.1.0/src/ai_navigator/pre_processor/image.py +92 -0
  26. ai_navigator-0.1.0/src/ai_navigator/schema/__init__.py +4 -0
  27. ai_navigator-0.1.0/src/ai_navigator/schema/composer.py +451 -0
  28. ai_navigator-0.1.0/src/ai_navigator/schema/extractor.py +168 -0
  29. ai_navigator-0.1.0/src/ai_navigator/server/__init__.py +6 -0
  30. ai_navigator-0.1.0/src/ai_navigator/server/anthropic_server.py +167 -0
  31. ai_navigator-0.1.0/src/ai_navigator/server/base_server.py +159 -0
  32. ai_navigator-0.1.0/src/ai_navigator/server/gemini_server.py +165 -0
  33. ai_navigator-0.1.0/src/ai_navigator/server/openai_server.py +149 -0
  34. ai_navigator-0.1.0/tests/__init__.py +0 -0
  35. ai_navigator-0.1.0/tests/test_basic.py +494 -0
@@ -0,0 +1,317 @@
1
+ # ai-navigator — Agent Guide
2
+
3
+ ## Project purpose
4
+
5
+ `ai-navigator` is a PyPI package that provides a unified, provider-agnostic interface for calling LLMs (OpenAI, Anthropic, Google Gemini) and building structured prompts. It normalises request/response handling, conversation storage, schema definition, image pre-processing, and response parsing into a coherent Python API.
6
+
7
+ ---
8
+
9
+ ## Repository layout
10
+
11
+ ```
12
+ ai-navigator/
13
+ ├── pyproject.toml
14
+ ├── src/ai_navigator/
15
+ │ ├── __init__.py # re-exports the most-used public symbols
16
+ │ ├── infra/ # FOUNDATION — pure data + utilities, no provider coupling
17
+ │ │ ├── exceptions.py # full exception hierarchy
18
+ │ │ ├── logger.py # get_logger(name) → stdlib Logger
19
+ │ │ ├── models.py # Message, Response, TokenUsage, ContentPart (Pydantic v2)
20
+ │ │ ├── const_configs.py # ConstConfigs — env-var-backed package constants
21
+ │ │ ├── credentials.py # CredentialsLoader — YAML fetch, override for other sources
22
+ │ │ ├── state.py # RequestState pipeline container
23
+ │ │ └── storage.py # StorageBase (SQLite default, no init args); StoreStatus
24
+ │ ├── server/ # SERVER LAYER — BaseServer + all provider implementations
25
+ │ │ ├── base_server.py # BaseServer (ABC) — credentials, conversation, dispatch
26
+ │ │ ├── openai_server.py # OpenAIServer
27
+ │ │ ├── anthropic_server.py # AnthropicServer
28
+ │ │ └── gemini_server.py # GeminiServer
29
+ │ ├── schema/
30
+ │ │ ├── composer.py # SchemaComposer — YAML → preprocess + schema_conversion
31
+ │ │ └── extractor.py # ResultExtractor — LLM result → flat leaf-node dict
32
+ │ ├── conf_parser/
33
+ │ │ ├── parser.py # ConfParser: multi-schema YAML config files
34
+ │ │ └── prompt.py # PromptBuilder: YAML-driven conversation assembly
35
+ │ ├── pre_processor/
36
+ │ │ └── image.py # ImageProcessor: local/URL/bytes → ContentPart
37
+ │ └── parser/
38
+ │ └── response.py # ResponseParser: JSON extract, Pydantic validate,
39
+ │ # find_value, enum check
40
+ └── tests/
41
+ └── test_basic.py # unit tests; no API keys needed
42
+ ```
43
+
44
+ ---
45
+
46
+ ## Module responsibilities
47
+
48
+ ### `infra/storage.py` — StorageBase
49
+
50
+ Concrete default implementation: all storage uses **SQLite** (three tables:
51
+ `pipeline_data`, `metrics`, `cache`). All I/O is wrapped in `try/except` —
52
+ permission errors degrade gracefully (log warning, return `None` /
53
+ `StoreStatus.ERROR`).
54
+
55
+ No constructor args — db path from `ConstConfigs.STORAGE_PATH`
56
+ (env `AI_NAVIGATOR_STORAGE_PATH`, default `ai_navigator.db`).
57
+ Override `_get_db_path()` to change location (e.g. in tests).
58
+
59
+ | pair | pipeline stage | table |
60
+ |---|---|---|
61
+ | `request_store` / `request_fetch` | raw user input | `pipeline_data` |
62
+ | `reference_store` / `reference_fetch` | processed schema / prompts | `pipeline_data` |
63
+ | `response_store` / `response_fetch` | server raw LLM response | `pipeline_data` |
64
+ | `status_store` / `status_fetch` | processing status | `pipeline_data` |
65
+ | `result_store` / `result_fetch` | extracted / parsed result | `pipeline_data` |
66
+ | `metric_report` / `metric_load` | aggregate metrics | `metrics` |
67
+ | `cache_store` / `cache_fetch` | high-frequency counters | `cache` |
68
+
69
+ Store methods return `StoreStatus.OK` / `StoreStatus.ERROR` (string constants).
70
+ Fetch methods return the value or `None` (not found).
71
+
72
+ Override any pair to swap the backend — store/fetch pairs **must be overridden
73
+ together**.
74
+
75
+ ---
76
+
77
+ ### `infra/state.py` — RequestState
78
+
79
+ Pipeline state container. Passed through processing stages so intermediate
80
+ steps don't require extra function arguments.
81
+
82
+ ```
83
+ request_data {"type": "message", "content": str | list}
84
+ {"type": "conversation", "messages": list[Message]}
85
+ {"type": "prompt", "template": list, "data_dict": dict}
86
+
87
+ params LLM / server parameters passed directly to the provider call.
88
+ Examples: temperature, max_tokens, top_p, logprobs, top_logprobs
89
+
90
+ configs Package-internal control knobs (NOT forwarded to provider).
91
+ Examples:
92
+ term_extract_discard bool default True
93
+ extract_list_elements bool default False
94
+
95
+ reference derived artefacts: e.g. {"schema": <SchemaComposer>}
96
+ — the processed schema lives here after .preprocess()
97
+
98
+ result populated by final stage with parsed LLM output
99
+
100
+ status Status(code=StatusCode.PENDING|OK|ERROR, message="")
101
+ ```
102
+
103
+ ### `schema/composer.py` — SchemaComposer
104
+
105
+ **Request-side** schema handling: YAML definition → OpenAI structured-output dict.
106
+
107
+ **YAML format**
108
+
109
+ ```yaml
110
+ meta:
111
+ name: ProductReview
112
+ description: Extract structured review data
113
+ version: "1.0"
114
+
115
+ defs: # optional — reusable definitions
116
+ score_range:
117
+ type: int
118
+ description: Score from 0 to 10
119
+
120
+ schema: # dict: term-name → spec (no "name:" key inside)
121
+ title:
122
+ type: str
123
+ description: Product title
124
+
125
+ rating:
126
+ ref: score_range # → {"$ref": "#/$defs/score_range"}
127
+
128
+ category:
129
+ type: enum
130
+ choices: [electronics, clothing, food]
131
+ config_confidence: true # pkg-internal: include in LogProbParser extraction
132
+
133
+ category_dyn:
134
+ type: enum
135
+ dynamic_choices: cat_list # choices from data_dict["cat_list"] at preprocess()
136
+ config_confidence: true
137
+
138
+ optional_note:
139
+ type: [str, null] # list of types → anyOf in JSON Schema
140
+
141
+ detail:
142
+ type: dict
143
+ terms: # nested terms also use dict format
144
+ reason:
145
+ type: str
146
+ score:
147
+ ref: score_range
148
+
149
+ tags:
150
+ type: list
151
+ item_type: str # str / int / float / bool (default: str)
152
+ choices: [fast, light] # optional — constrains list items
153
+ ```
154
+
155
+ **Supported types** — `str` / `string` / `free-text`, `int` / `integer`,
156
+ `float` / `number`, `bool` / `boolean`, `null`, `enum`, `list`, `dict`, `any`.
157
+ A list of types (e.g. `[str, null]`) produces `anyOf` in JSON Schema.
158
+
159
+ **`defs:` section** — optional reusable term definitions. Referenced inside
160
+ `schema:` (and nested `terms:`) with `ref: def_name`, which produces
161
+ `{"$ref": "#/$defs/def_name"}` in the output. Defs themselves support all
162
+ the same attributes as regular terms, including `dynamic_*`.
163
+
164
+ **Dynamic attributes** — any attribute can be made dynamic by prefixing it
165
+ with `dynamic_`. `dynamic_{attr}: key` sets `attr = data_dict[key]` at
166
+ `preprocess()` time, then removes the `dynamic_*` key. Examples:
167
+
168
+ | Attribute | Effect |
169
+ |---|---|
170
+ | `dynamic_type: key` | sets `type` (resolved before all other logic) |
171
+ | `dynamic_description: key` | sets `description` |
172
+ | `dynamic_choices: key` | sets `choices` for `enum` / `list` |
173
+ | `dynamic_terms: key` | sets `terms` for a `dict` term |
174
+
175
+ **`config_*` attributes** — any attribute whose name starts with `config_` is
176
+ a package-internal directive and is **never forwarded** to the JSON Schema
177
+ output (silently ignored by `schema_conversion()`).
178
+
179
+ **`config_confidence: bool`** — optional, default `false`. When `true` the
180
+ term is included in `confidence_terms()`, enabling `LogProbParser` to extract
181
+ a probability distribution for that field. Meaningful only on `enum` terms.
182
+
183
+ **`required` is absent from YAML** — all terms are always included in the JSON
184
+ Schema `required` array (OpenAI strict mode mandates this).
185
+
186
+ **Field-name constraint** — Term names must not contain `"."`. A `SchemaError`
187
+ is raised at `schema_conversion()` time.
188
+
189
+ **Two-phase workflow**
190
+
191
+ 1. `preprocess(data_dict)` → resolves all `dynamic_*` attributes; returns a new `SchemaComposer`.
192
+ 2. `schema_conversion(task_name=None)` → returns `{"type": "json_schema", "json_schema": {...}}`.
193
+
194
+ **Introspection helpers**
195
+ - `confidence_terms()` → `{path: [candidates]}` for `config_confidence: true` enum terms.
196
+ - `build_prompt_instruction()` → plain-text system-prompt fragment.
197
+
198
+ ### `schema/extractor.py` — ResultExtractor
199
+
200
+ **Response-side** schema handling: raw LLM dict → flat result dict.
201
+
202
+ Extraction is driven by the active **parse-type set** (not a leaf/non-leaf flag):
203
+
204
+ | Condition | Behaviour |
205
+ |---|---|
206
+ | `type == "dict"` (always active) | Recurse into `terms`; keys become dot-notation paths |
207
+ | `type == "list"` + `configs["extract_list_elements"]=True` | Flatten to `name_1`, `name_2`, … |
208
+ | Anything else | Return value as-is |
209
+
210
+ `configs["term_extract_discard"]` (bool, default `True`) controls whether the **parent key is kept** when a term is expanded:
211
+
212
+ - `True` (default): parent key discarded, only expanded children in result
213
+ - `False`: parent key also written to result alongside expanded children
214
+
215
+ ```python
216
+ extractor = ResultExtractor()
217
+
218
+ # Default — dict expanded, parent discarded
219
+ result = extractor.extract(data, composer)
220
+ # {"title": "Phone", "detail.reason": "fast", "detail.score": 9,
221
+ # "tags": ["a", "b"]}
222
+
223
+ # Keep parent key alongside children
224
+ result = extractor.extract(data, composer,
225
+ configs={"term_extract_discard": False})
226
+ # {"title": "Phone",
227
+ # "detail": {"reason": "fast", "score": 9}, ← also kept
228
+ # "detail.reason": "fast", "detail.score": 9,
229
+ # "tags": ["a", "b"]}
230
+
231
+ # List expansion + discard (default)
232
+ result = extractor.extract(data, composer,
233
+ configs={"extract_list_elements": True})
234
+ # {"detail.reason": "fast", "detail.score": 9,
235
+ # "tags_1": "a", "tags_2": "b"} ← "tags" gone
236
+ ```
237
+
238
+ ### `conf_parser/prompt.py` — PromptBuilder
239
+
240
+ ```yaml
241
+ - role: system
242
+ message:
243
+ - type: const_text
244
+ content: You are a helpful assistant.
245
+
246
+ - message: # role defaults to "user"
247
+ - type: const_text
248
+ content: "Describe this product:"
249
+ - type: dynamic_text
250
+ key: product_description
251
+ ```
252
+
253
+ `const_*` keeps literal `content`; `dynamic_*` reads `data_dict[key]`.
254
+ Single-text messages collapse to `Message(role, content: str)`.
255
+
256
+ ### `parser/` — response parsing
257
+
258
+ | File | Responsibility |
259
+ |---|---|
260
+ | `response.py` | `ResponseParser`: JSON extraction, Pydantic validation, `find_value`, enum validation. |
261
+
262
+ > `logprob.py` and `position.py` are offline — moved to `_lab/` (outside the
263
+ > package, not shipped). `config_confidence` is preserved in schema terms for
264
+ > future re-integration.
265
+
266
+ ---
267
+
268
+ ## Data flow
269
+
270
+ ```
271
+ RequestState.request_data
272
+ type="message" → normalise str|list → list[Message]
273
+ type="conversation" → pass through
274
+ type="prompt" → PromptBuilder.build(data_dict) → list[Message]
275
+
276
+ SchemaComposer.from_yaml(yaml_str)
277
+ .preprocess(data_dict) → stored in RequestState.reference["schema"]
278
+ .schema_conversion() → response_format dict for llm.response()
279
+
280
+ ConcreteServer.response(msgs, response_format=...)
281
+ → Response(content=..., raw=completion)
282
+
283
+ ResponseParser.parse_response(response)
284
+ → data: dict
285
+
286
+ ResultExtractor().extract(data, composer)
287
+ → {"title": "...", "detail.reason": "...", "detail.score": 9}
288
+ → stored in RequestState.result
289
+
290
+ ─── Optional: logprob (offline — _lab/logprob.py + _lab/position.py) ─────────
291
+
292
+ composer.confidence_terms()
293
+ → {"sentiment": ["正面", "负面", "中性"]}
294
+ (feeds LogProbParser when re-integrated)
295
+ ```
296
+
297
+ ---
298
+
299
+ ## Conventions
300
+
301
+ - `BaseServer` never reads or parses `self.credentials` — concrete server's job inside `_setup`.
302
+ - All SDK imports are **lazy** (inside `_setup`). The package imports cleanly without any provider SDK installed.
303
+ - `_chat` / `_response` must raise a `ProviderError` subclass (never a raw SDK exception).
304
+ - Path separator throughout `parser/` and `schema/` is **"."** (dot).
305
+ - Term names in schemas **must not contain "."**. Validated at `schema_conversion()` time.
306
+ - `static_*` types are fully YAML-defined; `dynamic_*` types require `preprocess(data_dict)`.
307
+
308
+ ---
309
+
310
+ ## Running tests
311
+
312
+ ```bash
313
+ pip install -e ".[dev]"
314
+ pytest tests/ -v
315
+ ```
316
+
317
+ No API keys required — all tests exercise `infra`, `parser`, and `schema` only.
@@ -0,0 +1,62 @@
1
+ # Parser & Schema 模块设计说明
2
+
3
+ ## 当前架构(v0.1)
4
+
5
+ ```
6
+ 用户 YAML schema
7
+
8
+
9
+ SchemaComposer.from_yaml()
10
+
11
+ ├── preprocess(data_dict) ← 解析所有 dynamic_* 属性
12
+
13
+ └── schema_conversion() ← 输出 OpenAI response_format dict
14
+ {"type": "json_schema", "json_schema": {...}}
15
+
16
+ LLM 调用(structured output)
17
+
18
+
19
+ ResponseParser.parse_response() ← 从响应文本中提取 JSON
20
+
21
+ ResultExtractor.extract(data, composer, configs)
22
+
23
+ ├── 默认:dict 类型展开,list 原样保留
24
+ │ 结果是 flat dict,key 用 dot-notation(如 "detail.score")
25
+
26
+ ├── configs["extract_list_elements"]=True
27
+ │ list 展开为 term_1, term_2, ...
28
+
29
+ └── configs["term_extract_discard"]=False
30
+ 展开的同时保留父节点原始 key
31
+ ```
32
+
33
+ ## Schema YAML 格式
34
+
35
+ ```
36
+ meta: 必填 — name, description, version
37
+ defs: 可选 — 可复用类型定义($defs)
38
+ schema: 必填 — dict 格式,key 即字段名
39
+ field_name:
40
+ type: str/int/float/bool/null/enum/list/dict/any
41
+ 或 [str, null] 形式 → anyOf
42
+ ref: 引用 defs 里的定义 → $ref
43
+ config_*: pkg 内部属性,不写入 JSON Schema
44
+ config_confidence: 标记 enum 字段进行 logprob 提取(暂时下线)
45
+ dynamic_*: 运行时从 data_dict 注入,preprocess() 后移除
46
+ ```
47
+
48
+ ## 路径分隔符约定
49
+
50
+ - 所有路径使用 `.`(点号)作为层级分隔符
51
+ - 字段名不允许含 `.`,在 `schema_conversion()` 时校验
52
+ - 数组展开后的 key 格式:`field_1`, `field_2`(下划线 + 1-based 序号)
53
+
54
+ ## LogProb 支持(暂时下线)
55
+
56
+ `position.py` 和 `logprob.py` 移至 `_lab/`,不打包。
57
+ `config_confidence` 字段保留,待后续集成时直接启用。
58
+
59
+ 重新集成步骤:
60
+ 1. 将 `_lab/logprob.py` 和 `_lab/position.py` 移回 `src/ai_navigator/parser/`
61
+ 2. 更新 `parser/__init__.py` 导出 `LogProbParser`, `JSONPositionParser` 等
62
+ 3. `SchemaComposer.confidence_terms()` 已返回 `{path: [candidates]}`,直接传给 `LogProbParser`