openaivec 0.10.5__tar.gz → 1.0.10__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (130)
  1. openaivec-1.0.10/.github/copilot-instructions.md +368 -0
  2. openaivec-1.0.10/.github/dependabot.yml +15 -0
  3. openaivec-0.10.5/.github/workflows/python-mkdocs.yml → openaivec-1.0.10/.github/workflows/docs.yml +3 -4
  4. openaivec-0.10.5/.github/workflows/python-package.yml → openaivec-1.0.10/.github/workflows/publish.yml +3 -2
  5. openaivec-1.0.10/.github/workflows/test-pr.yml +57 -0
  6. openaivec-0.10.5/.github/workflows/python-test.yml → openaivec-1.0.10/.github/workflows/test.yml +8 -1
  7. {openaivec-0.10.5 → openaivec-1.0.10}/.gitignore +4 -0
  8. openaivec-1.0.10/AGENTS.md +34 -0
  9. openaivec-1.0.10/PKG-INFO +399 -0
  10. openaivec-1.0.10/README.md +373 -0
  11. openaivec-1.0.10/docs/api/main.md +19 -0
  12. openaivec-1.0.10/docs/api/pandas_ext.md +3 -0
  13. openaivec-1.0.10/docs/api/spark.md +3 -0
  14. openaivec-1.0.10/docs/api/task.md +3 -0
  15. openaivec-1.0.10/docs/api/tasks/customer_support/customer_sentiment.md +3 -0
  16. openaivec-1.0.10/docs/api/tasks/customer_support/inquiry_classification.md +3 -0
  17. openaivec-1.0.10/docs/api/tasks/customer_support/inquiry_summary.md +3 -0
  18. openaivec-1.0.10/docs/api/tasks/customer_support/intent_analysis.md +3 -0
  19. openaivec-1.0.10/docs/api/tasks/customer_support/response_suggestion.md +3 -0
  20. openaivec-1.0.10/docs/api/tasks/customer_support/urgency_analysis.md +3 -0
  21. openaivec-1.0.10/docs/api/tasks/nlp/dependency_parsing.md +3 -0
  22. openaivec-1.0.10/docs/api/tasks/nlp/keyword_extraction.md +3 -0
  23. openaivec-1.0.10/docs/api/tasks/nlp/morphological_analysis.md +3 -0
  24. openaivec-1.0.10/docs/api/tasks/nlp/named_entity_recognition.md +3 -0
  25. openaivec-1.0.10/docs/api/tasks/nlp/sentiment_analysis.md +3 -0
  26. openaivec-1.0.10/docs/api/tasks/nlp/translation.md +3 -0
  27. openaivec-1.0.10/docs/contributor-guide.md +3 -0
  28. {openaivec-0.10.5 → openaivec-1.0.10}/docs/index.md +21 -18
  29. openaivec-1.0.10/docs/overrides/main.html +10 -0
  30. {openaivec-0.10.5 → openaivec-1.0.10}/mkdocs.yml +29 -8
  31. {openaivec-0.10.5 → openaivec-1.0.10}/pyproject.toml +24 -0
  32. openaivec-1.0.10/pytest.ini +42 -0
  33. openaivec-1.0.10/src/openaivec/__init__.py +18 -0
  34. openaivec-1.0.10/src/openaivec/_cache/__init__.py +12 -0
  35. openaivec-1.0.10/src/openaivec/_cache/optimize.py +109 -0
  36. openaivec-1.0.10/src/openaivec/_cache/proxy.py +806 -0
  37. openaivec-1.0.10/src/openaivec/_di.py +326 -0
  38. openaivec-1.0.10/src/openaivec/_embeddings.py +203 -0
  39. openaivec-0.10.5/src/openaivec/log.py → openaivec-1.0.10/src/openaivec/_log.py +2 -2
  40. openaivec-1.0.10/src/openaivec/_model.py +113 -0
  41. openaivec-0.10.5/src/openaivec/prompt.py → openaivec-1.0.10/src/openaivec/_prompt.py +95 -28
  42. openaivec-1.0.10/src/openaivec/_provider.py +207 -0
  43. openaivec-1.0.10/src/openaivec/_responses.py +511 -0
  44. openaivec-1.0.10/src/openaivec/_schema/__init__.py +9 -0
  45. openaivec-1.0.10/src/openaivec/_schema/infer.py +340 -0
  46. openaivec-1.0.10/src/openaivec/_schema/spec.py +350 -0
  47. openaivec-1.0.10/src/openaivec/_serialize.py +234 -0
  48. openaivec-0.10.5/src/openaivec/util.py → openaivec-1.0.10/src/openaivec/_util.py +25 -85
  49. openaivec-1.0.10/src/openaivec/pandas_ext.py +2135 -0
  50. openaivec-1.0.10/src/openaivec/spark.py +838 -0
  51. {openaivec-0.10.5 → openaivec-1.0.10}/src/openaivec/task/__init__.py +27 -29
  52. openaivec-1.0.10/src/openaivec/task/customer_support/__init__.py +26 -0
  53. {openaivec-0.10.5 → openaivec-1.0.10}/src/openaivec/task/customer_support/customer_sentiment.py +51 -41
  54. {openaivec-0.10.5 → openaivec-1.0.10}/src/openaivec/task/customer_support/inquiry_classification.py +86 -61
  55. {openaivec-0.10.5 → openaivec-1.0.10}/src/openaivec/task/customer_support/inquiry_summary.py +44 -45
  56. {openaivec-0.10.5 → openaivec-1.0.10}/src/openaivec/task/customer_support/intent_analysis.py +56 -41
  57. {openaivec-0.10.5 → openaivec-1.0.10}/src/openaivec/task/customer_support/response_suggestion.py +49 -43
  58. {openaivec-0.10.5 → openaivec-1.0.10}/src/openaivec/task/customer_support/urgency_analysis.py +76 -71
  59. {openaivec-0.10.5 → openaivec-1.0.10}/src/openaivec/task/nlp/__init__.py +4 -4
  60. {openaivec-0.10.5 → openaivec-1.0.10}/src/openaivec/task/nlp/dependency_parsing.py +19 -20
  61. {openaivec-0.10.5 → openaivec-1.0.10}/src/openaivec/task/nlp/keyword_extraction.py +22 -24
  62. {openaivec-0.10.5 → openaivec-1.0.10}/src/openaivec/task/nlp/morphological_analysis.py +25 -25
  63. {openaivec-0.10.5 → openaivec-1.0.10}/src/openaivec/task/nlp/named_entity_recognition.py +26 -28
  64. {openaivec-0.10.5 → openaivec-1.0.10}/src/openaivec/task/nlp/sentiment_analysis.py +29 -21
  65. {openaivec-0.10.5 → openaivec-1.0.10}/src/openaivec/task/nlp/translation.py +24 -30
  66. openaivec-1.0.10/src/openaivec/task/table/__init__.py +3 -0
  67. {openaivec-0.10.5 → openaivec-1.0.10}/src/openaivec/task/table/fillna.py +16 -16
  68. openaivec-1.0.10/tests/_cache/test_optimize.py +283 -0
  69. openaivec-1.0.10/tests/_cache/test_proxy.py +828 -0
  70. openaivec-1.0.10/tests/_cache/test_proxy_suggester.py +200 -0
  71. openaivec-1.0.10/tests/_schema/test_infer.py +379 -0
  72. openaivec-1.0.10/tests/_schema/test_spec.py +481 -0
  73. openaivec-1.0.10/tests/conftest.py +352 -0
  74. openaivec-1.0.10/tests/test_di.py +319 -0
  75. openaivec-1.0.10/tests/test_embeddings.py +159 -0
  76. openaivec-1.0.10/tests/test_pandas_ext.py +864 -0
  77. {openaivec-0.10.5 → openaivec-1.0.10}/tests/test_prompt.py +53 -8
  78. openaivec-1.0.10/tests/test_provider.py +489 -0
  79. openaivec-1.0.10/tests/test_responses.py +216 -0
  80. openaivec-1.0.10/tests/test_serialize.py +265 -0
  81. openaivec-1.0.10/tests/test_serialize_pydantic_v2_compliance.py +117 -0
  82. openaivec-1.0.10/tests/test_spark.py +423 -0
  83. {openaivec-0.10.5 → openaivec-1.0.10}/tests/test_task.py +41 -133
  84. openaivec-1.0.10/tests/test_util.py +228 -0
  85. openaivec-1.0.10/uv.lock +3147 -0
  86. openaivec-0.10.5/.github/workflows/python-update.yml +0 -36
  87. openaivec-0.10.5/PKG-INFO +0 -701
  88. openaivec-0.10.5/README.md +0 -677
  89. openaivec-0.10.5/docs/api/embeddings.md +0 -15
  90. openaivec-0.10.5/docs/api/pandas_ext.md +0 -15
  91. openaivec-0.10.5/docs/api/prompt.md +0 -15
  92. openaivec-0.10.5/docs/api/responses.md +0 -15
  93. openaivec-0.10.5/docs/api/spark.md +0 -15
  94. openaivec-0.10.5/docs/api/task.md +0 -19
  95. openaivec-0.10.5/docs/api/tasks/customer_support/customer_sentiment.md +0 -3
  96. openaivec-0.10.5/docs/api/tasks/customer_support/inquiry_classification.md +0 -3
  97. openaivec-0.10.5/docs/api/tasks/customer_support/inquiry_summary.md +0 -3
  98. openaivec-0.10.5/docs/api/tasks/customer_support/intent_analysis.md +0 -3
  99. openaivec-0.10.5/docs/api/tasks/customer_support/response_suggestion.md +0 -3
  100. openaivec-0.10.5/docs/api/tasks/customer_support/urgency_analysis.md +0 -3
  101. openaivec-0.10.5/docs/api/tasks/nlp/dependency_parsing.md +0 -15
  102. openaivec-0.10.5/docs/api/tasks/nlp/keyword_extraction.md +0 -15
  103. openaivec-0.10.5/docs/api/tasks/nlp/morphological_analysis.md +0 -15
  104. openaivec-0.10.5/docs/api/tasks/nlp/named_entity_recognition.md +0 -15
  105. openaivec-0.10.5/docs/api/tasks/nlp/sentiment_analysis.md +0 -15
  106. openaivec-0.10.5/docs/api/tasks/nlp/translation.md +0 -15
  107. openaivec-0.10.5/docs/api/util.md +0 -15
  108. openaivec-0.10.5/src/openaivec/__init__.py +0 -9
  109. openaivec-0.10.5/src/openaivec/embeddings.py +0 -172
  110. openaivec-0.10.5/src/openaivec/pandas_ext.py +0 -1047
  111. openaivec-0.10.5/src/openaivec/responses.py +0 -392
  112. openaivec-0.10.5/src/openaivec/serialize.py +0 -225
  113. openaivec-0.10.5/src/openaivec/spark.py +0 -619
  114. openaivec-0.10.5/src/openaivec/task/customer_support/__init__.py +0 -32
  115. openaivec-0.10.5/src/openaivec/task/model.py +0 -84
  116. openaivec-0.10.5/src/openaivec/task/table/__init__.py +0 -3
  117. openaivec-0.10.5/tests/test_embeddings.py +0 -120
  118. openaivec-0.10.5/tests/test_pandas_ext.py +0 -351
  119. openaivec-0.10.5/tests/test_responses.py +0 -173
  120. openaivec-0.10.5/tests/test_serialize.py +0 -331
  121. openaivec-0.10.5/tests/test_spark.py +0 -231
  122. openaivec-0.10.5/tests/test_util.py +0 -176
  123. openaivec-0.10.5/uv.lock +0 -2588
  124. {openaivec-0.10.5 → openaivec-1.0.10}/.env.example +0 -0
  125. {openaivec-0.10.5 → openaivec-1.0.10}/CODE_OF_CONDUCT.md +0 -0
  126. {openaivec-0.10.5 → openaivec-1.0.10}/LICENSE +0 -0
  127. {openaivec-0.10.5 → openaivec-1.0.10}/SECURITY.md +0 -0
  128. {openaivec-0.10.5 → openaivec-1.0.10}/SUPPORT.md +0 -0
  129. {openaivec-0.10.5 → openaivec-1.0.10}/docs/robots.txt +0 -0
  130. {openaivec-0.10.5 → openaivec-1.0.10}/tests/__init__.py +0 -0
@@ -0,0 +1,368 @@
1
+ # Copilot Instructions – openaivec
2
+
3
+ Concise guide for generating code that fits this project’s architecture, performance model, style, and public API. Favor these rules over generic heuristics.
4
+
5
+ ---
6
+
7
+ ## 1. Purpose & Scope
8
+
9
+ Provide high‑throughput, batched access to OpenAI / Azure OpenAI Responses + Embeddings for pandas & Spark with strict ordering, deduplication, and structured outputs.
10
+
11
+ ---
12
+
13
+ ## 2. Public Surface (primary exports)
14
+
15
+ From `openaivec.__init__`:
16
+
17
+ - `BatchResponses`, `AsyncBatchResponses`
18
+ - `BatchEmbeddings`, `AsyncBatchEmbeddings`
19
+ - `PreparedTask`, `FewShotPromptBuilder`
20
+
21
+ Entry points:
22
+
23
+ - Pandas accessors: `Series.ai` / `Series.aio`
24
+ - Spark UDF builders in `spark.py`
25
+ - Structured tasks under `task/`
26
+
27
+ Azure note: Use deployment name as `model`. Standard Azure OpenAI configuration uses:
28
+ - Base URL: `https://YOUR-RESOURCE-NAME.services.ai.azure.com/openai/v1/`
29
+ - API Version: `"preview"`
30
+ Warn if base URL not v1. Behavior otherwise mirrors OpenAI.
31
+
32
+ ---
33
+
34
+ ## 3. Architecture Map (roles)
35
+
36
+ Underscore modules are internal (not exported). Public surface = `__init__`, `pandas_ext.py`, `spark.py`, and `task/`.
37
+
38
+ Core batching & optimization:
39
+
40
+ - `_proxy.py`: Order‑preserving dedup, caching, progressive mini‑batch execution, progress bars (only notebooks), dynamic batch sizing when `batch_size=None` via `_optimize.BatchSizeSuggester`; sync + async variants.
41
+ - `_optimize.py`: `BatchSizeSuggester` adaptive control loop (targets 30–60s batches) + metrics capture.
42
+
43
+ Model / task abstractions:
44
+
45
+ - `_model.py`: Typed wrappers (model names, task configs, response/embedding model name value objects).
46
+ - `_prompt.py`: Few‑shot / structured prompt assembly (`FewShotPromptBuilder`).
47
+ - `task/`: Pre‑packaged `PreparedTask` definitions for common workflows (re-exported publicly).
48
+
49
+ LLM interaction layers:
50
+
51
+ - `_responses.py`: Vectorized JSON‑mode wrapper (`BatchResponses` / `AsyncBatchResponses`); enforces same‑length contract; structured parse via `responses.parse`; reasoning model temperature guard & enhanced guidance warnings; retries with `backoff`.
52
+ - `_embeddings.py`: Embedding batching (`BatchEmbeddings` / `AsyncBatchEmbeddings`) returning `np.float32` arrays, de‑dup aware.
53
+ - `_schema.py`: Dynamic schema inference (`SchemaInferer`) producing Pydantic models at runtime; internal, not exported.
54
+
55
+ I/O & provider setup:
56
+
57
+ - `_provider.py`: Environment-driven auto detection (OpenAI vs Azure). Registers defaults, validates Azure v1 base URL, DI container root (`CONTAINER`).
58
+ - `_di.py`: Lightweight dependency injection container; registration & resolution helpers.
59
+
60
+ Utilities & cross‑cutting concerns:
61
+
62
+ - `_util.py`: `backoff` / `backoff_async`, `TextChunker` token-based splitter.
63
+ - `_serialize.py`: Pydantic (de)serialization and Spark schema bridging support.
64
+ - `_log.py`: Observation decorator used for tracing (`@observe`).
65
+
66
+ DataFrame / Spark integration:
67
+
68
+ - `pandas_ext.py`: `.ai` / `.aio` accessors (sync + async), shared cache variants, model configuration helpers (`responses_model`, `embeddings_model`, `use`, `use_async`). Maintains Series length/index; optional auto batch size; exposes reasoning temperature control.
69
+ - `spark.py`: Async UDF builders (`responses_udf`, `task_udf`, `embeddings_udf`, `count_tokens_udf`, `split_to_chunks_udf`, `similarity_udf`). Per-partition duplicate caching; Pydantic → Spark `StructType` conversion; concurrency per executor with `max_concurrency`.
70
+ - `spark.py`: Async UDF builders (`responses_udf`, `task_udf`, `embeddings_udf`, `count_tokens_udf`, `split_to_chunks_udf`, `similarity_udf` – cosine similarity on embedding vectors). Per-partition duplicate caching; Pydantic → Spark `StructType` conversion; concurrency per executor with `max_concurrency`.
71
+
72
+ Observability & progress:
73
+
74
+ - Progress bars only when `show_progress=True` AND notebook environment heuristics in `_proxy.py` pass.
75
+ - Adaptive batch suggestions recorded automatically around each unit API call.
76
+
77
+ Public exports (`__init__.py`): `BatchResponses`, `AsyncBatchResponses`, `BatchEmbeddings`, `AsyncBatchEmbeddings`, `PreparedTask`, `FewShotPromptBuilder`.
78
+
79
+ ---
80
+
81
+ ## 4. Core Principles & Contracts
82
+
83
+ 1. Always batch via the Proxy; never per-item API loops.
84
+ 2. map_func must return a list of identical length & order; mismatch => raise `ValueError` after releasing events (deadlock prevention).
85
+ 3. Deduplicate inputs; restore original ordering in outputs.
86
+ 4. Preserve pandas index & Spark schema deterministically.
87
+ 5. Show progress only in notebooks and only if `show_progress=True`.
88
+ 6. Reasoning models (o1/o3 families and similar) must use `temperature=None`.
89
+ 7. Attach exponential backoff for transient RateLimit / 5xx errors.
90
+ 8. Structured outputs (Pydantic) preferred over free-form JSON/text.
91
+
92
+ ---
93
+
94
+ ## 5. Batching Proxy Rules
95
+
96
+ - Same-length return invariant is critical (break = bug).
97
+ - Async variant enforces `max_concurrency` via semaphore.
98
+ - Shared caches (`*_with_cache`) enable cross-operation reuse; do not bypass them.
99
+ - Release all waiting events if an exception occurs (avoid deadlocks).
100
+ - Progress bars use `tqdm.auto`; only displayed if notebook heuristics pass AND `show_progress=True`.
101
+
102
+ ---
103
+
104
+ ## 6. Responses API Guidelines
105
+
106
+ - Use Responses JSON mode (`responses.parse`).
107
+ - Reasoning model safety: force `temperature=None`; provide clear error guidance.
108
+ - Favor small, reusable prompts enabling dedup benefits.
109
+ - Encourage Pydantic `response_format` for schema validation & Spark schema inference.
110
+
111
+ ---
112
+
113
+ ## 7. Embeddings Guidelines
114
+
115
+ - Return `np.ndarray` of dtype `float32`.
116
+ - Batch sizes typically larger than for Responses; keep order stable.
117
+ - Avoid per-item postprocessing—vector ops should stay batched.
118
+
119
+ ---
120
+
121
+ ## 8. pandas Extension Rules
122
+
123
+ - `.ai.responses` / `.ai.embeddings` preserve Series length & index.
124
+ - Async via `.aio.*` with configurable `batch_size` & `max_concurrency`.
125
+ - `*_with_cache` shares a passed proxy (promote reuse, minimal API calls).
126
+ - No hidden reindexing or sorting; user order is authoritative.
127
+
128
+ ---
129
+
130
+ ## 9. Spark UDF Rules
131
+
132
+ - Cache duplicates per partition (dict lookup) before remote calls.
133
+ - Convert Pydantic -> Spark StructType; treat `Enum`/`Literal` as `StringType`.
134
+ - Respect reasoning `temperature=None` rule.
135
+ - Provide chunking & token counting via helper UDFs.
136
+ - Avoid excessive nested structs—keep schemas shallow & ergonomic.
137
+
138
+ ---
139
+
140
+ ## 10. Provider / Azure Rules
141
+
142
+ - Auto-detect provider from env variables; deployment name = model for Azure.
143
+ - Standard Azure OpenAI configuration:
144
+   - Base URL: `https://YOUR-RESOURCE-NAME.services.ai.azure.com/openai/v1/`
145
+   - API Version: `"preview"`
146
+ - Environment variables:
147
+ ```bash
148
+ export AZURE_OPENAI_API_KEY="your-azure-key"
149
+ export AZURE_OPENAI_BASE_URL="https://YOUR-RESOURCE-NAME.services.ai.azure.com/openai/v1/"
150
+ export AZURE_OPENAI_API_VERSION="preview"
151
+ ```
152
+ - Warn (don't fail) if Azure base URL not v1 format; still proceed.
153
+ - Keep code paths unified; avoid forking logic unless behavior diverges.
154
+
155
+ ---
156
+
157
+ ## 11. Coding Standards
158
+
159
+ - Python ≥ 3.10; Ruff for lint/format (`line-length=120`).
160
+ - Absolute imports (except re-export patterns in `__init__.py`) – enforced by Ruff rule TID252.
161
+ - Modern typing syntax (Python 3.10+):
162
+   - **Built-in generic types**: Use `list[T]`, `dict[K, V]`, `set[T]`, `tuple[T, ...]`, `type[T]` instead of `typing` equivalents
163
+   - **Union types**: Use `|` syntax (`int | str | None`) instead of `Union[...]`
164
+   - **Optional types**: Use `S | None` instead of `Optional[S]`
165
+   - **Collections.abc**: Use `collections.abc.Callable`, `collections.abc.Awaitable`, `collections.abc.Iterator` instead of `typing` equivalents
166
+ - Prefer `@dataclass` for simple immutable-ish contracts; use Pydantic only for validation-boundaries.
167
+ - Raise narrow exceptions (`ValueError`, `TypeError`) on contract violations—avoid broad except.
168
+ - Public APIs: Google-style docstrings with return/raises sections.
169
+
170
+ ---
171
+
172
+ ## 12. Testing Strategy
173
+
174
+ Live-first philosophy: call real OpenAI / Azure endpoints when tests validate core contracts and remain fast. Use mocks only for: (a) forced transient errors, (b) rare fault paths, (c) deterministic pure utilities.
175
+
176
+ Key rules:
177
+
178
+ 1. Skip (not fail) when credentials (`OPENAI_API_KEY` or Azure env) absent.
179
+ 2. Keep prompts minimal; batch size 1–4 for speed & cost.
180
+ 3. Assertions allow natural-language variance—focus on structure, ordering, lengths, types.
181
+ 4. Test dedup, ordering, cache reuse, concurrency limits, reasoning temperature enforcement.
182
+ 5. Inject retries by patching the smallest internal callable (not the whole client) for fault tests.
183
+ 6. Mark heavier suites separately if needed (e.g., `@pytest.mark.heavy_live`).
184
+ 7. Flake mitigation: broaden assertions (containment / regex / type+length) instead of pinning brittle verbatim strings.
185
+
186
+ ---
187
+
188
+ ## 13. Performance Guidance
189
+
190
+ - Responses batch size: 32–128 (default 128). Embeddings: 64–256.
191
+ - Async `max_concurrency`: typical 4–12 (tune per rate limits).
192
+ - Exploit dedup to collapse repeated prompts/inputs.
193
+ - Reuse caches across Series operations & Spark partitions.
194
+ - Avoid synchronous hotspots inside async loops (keep map_func lean).
195
+ - Automatic batch size mode targets ~30–60s per batch (`BatchSizeSuggester`).
196
+
197
+ ---
198
+
199
+ ## 14. Public / Internal Module Policy (`__all__`)
200
+
201
+ Public: `pandas_ext.py`, `spark.py`, everything under `task/`.
202
+ Internal: all underscore-prefixed modules; set `__all__ = []` explicitly.
203
+ Package exports: maintain alphabetical `__all__` in `__init__.py` for core classes (`BatchResponses`, etc.).
204
+ When adding public symbols: update `__all__`, docs (`docs/api/`), and examples if helpful.
205
+
206
+ Best practices:
207
+
208
+ 1. Internal-only code never leaks via wildcard import.
209
+ 2. Task modules export their primary callable/class.
210
+ 3. Keep `__all__` diff minimal & alphabetized.
211
+
212
+ ---
213
+
214
+ ## 15. Documentation
215
+
216
+ - New APIs: add or update `docs/api/*.md`; brief runnable snippet preferred over prose.
217
+ - Add concise example notebooks only if they illustrate distinct usage (avoid overlap).
218
+ - Update `mkdocs.yml` nav for new pages.
219
+
220
+ ---
221
+
222
+ ## 16. PR Checklist
223
+
224
+ - [ ] Ruff check & format pass.
225
+ - [ ] Public API contracts (length/order/types) preserved.
226
+ - [ ] All remote calls batched (no per-item loops).
227
+ - [ ] Reasoning models enforce `temperature=None`.
228
+ - [ ] Tests updated/added: live where feasible; skip gracefully without credentials.
229
+ - [ ] Mock usage (if any) narrowly scoped & justified.
230
+ - [ ] Docs + `__all__` updated for new public symbols.
231
+ - [ ] Performance considerations (batch sizes, concurrency) sensible.
232
+
233
+ ---
234
+
235
+ ## 17. Common Snippets
236
+
237
+ New batched API wrapper (sync):
238
+
239
+ ```python
240
+ @observe(_LOGGER)
241
+ @backoff(exceptions=[RateLimitError, InternalServerError], scale=1, max_retries=12)
242
+ def _unit_of_work(self, xs: list[str]) -> list[TOut]:
243
+     resp = self.client.api(xs)
244
+     return convert(resp)  # Same length/order
245
+
246
+ def create(self, inputs: list[str]) -> list[TOut]:
247
+     return self.cache.map(inputs, self._unit_of_work)
248
+ ```
249
+
250
+ Reasoning model temperature:
251
+
252
+ ```python
253
+ # o1/o3 & similar reasoning models must set temperature None
254
+ temperature=None
255
+ ```
256
+
257
+ pandas `.ai` with shared cache:
258
+
259
+ ```python
260
+ from openaivec._proxy import BatchingMapProxy
261
+ shared = BatchingMapProxy[str, str](batch_size=64)
262
+ df["text"].ai.responses_with_cache("instructions", cache=shared)
263
+ ```
264
+
265
+ Spark structured Responses UDF:
266
+
267
+ ```python
268
+ from pydantic import BaseModel
269
+ from openaivec.spark import responses_udf
270
+
271
+ class R(BaseModel):
272
+     value: str
273
+
274
+ udf = responses_udf(
275
+     instructions="Do something",
276
+     response_format=R,
277
+     batch_size=64,
278
+     max_concurrency=8,
279
+ )
280
+ ```
281
+
282
+ Register custom OpenAI / Azure clients for pandas extension:
283
+
284
+ ```python
285
+ from openai import OpenAI, AzureOpenAI, AsyncAzureOpenAI
286
+ from openaivec import pandas_ext
287
+
288
+ # OpenAI client
289
+ client = OpenAI(api_key="sk-...")
290
+ pandas_ext.use(client)
291
+
292
+ # Azure OpenAI sync
293
+ azure = AzureOpenAI(
294
+     api_key="...",
295
+     base_url="https://YOUR-RESOURCE-NAME.services.ai.azure.com/openai/v1/",
296
+     api_version="preview",
297
+ )
298
+ pandas_ext.use(azure)
299
+
300
+ # Azure OpenAI async
301
+ azure_async = AsyncAzureOpenAI(
302
+     api_key="...",
303
+     base_url="https://YOUR-RESOURCE-NAME.services.ai.azure.com/openai/v1/",
304
+     api_version="preview",
305
+ )
306
+ pandas_ext.use_async(azure_async)
307
+
308
+ # Override model names (optional)
309
+ pandas_ext.responses_model("gpt-4.1-mini")
310
+ pandas_ext.embeddings_model("text-embedding-3-small")
311
+ ```
312
+
313
+ ---
314
+
315
+ When unsure, inspect implementations (`_proxy.py`, `_responses.py`, `_embeddings.py`, `pandas_ext.py`, `spark.py`) and related tests. Keep suggestions minimal, batched, and structurally safe.
316
+
317
+ ---
318
+
319
+ ## 18. Dev Workflow Commands
320
+
321
+ Canonical local commands (uv-based). Prefer these in automation & docs.
322
+
323
+ Install (all extras + dev):
324
+
325
+ ```bash
326
+ uv sync --all-extras --dev
327
+ ```
328
+
329
+ Editable install (if needed by external tooling):
330
+
331
+ ```bash
332
+ uv pip install -e .
333
+ ```
334
+
335
+ Lint & format (Ruff):
336
+
337
+ ```bash
338
+ uv run ruff check . --fix
339
+ uv run ruff format .
340
+ ```
341
+
342
+ Run full test suite (quiet):
343
+
344
+ ```bash
345
+ uv run pytest -q
346
+ ```
347
+
348
+ Run a focused test:
349
+
350
+ ```bash
351
+ uv run pytest tests/test_responses.py::test_reasoning_temperature_guard -q
352
+ ```
353
+
354
+ Serve docs (MkDocs live reload):
355
+
356
+ ```bash
357
+ uv run mkdocs serve
358
+ ```
359
+
360
+ Environment setup notes:
361
+
362
+ - Set `OPENAI_API_KEY` or Azure trio (`AZURE_OPENAI_API_KEY`, `AZURE_OPENAI_BASE_URL`, `AZURE_OPENAI_API_VERSION`).
363
+ - Standard Azure OpenAI configuration:
364
+   - `AZURE_OPENAI_BASE_URL="https://YOUR-RESOURCE-NAME.services.ai.azure.com/openai/v1/"`
365
+   - `AZURE_OPENAI_API_VERSION="preview"`
366
+ - Tests auto-skip live paths when credentials absent.
367
+ - Use separate shell profiles per provider if switching frequently.
368
+ - Azure canonical base URL must end with `/openai/v1/` (e.g. `https://YOUR-RESOURCE-NAME.services.ai.azure.com/openai/v1/`); non‑v1 forms emit a warning.
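The batching contract described in sections 4 and 5 of `copilot-instructions.md` above (deduplicate inputs, call the batch function only on unique items, enforce the same-length invariant, restore the caller's order) can be illustrated with a minimal Python sketch. The names `map_with_dedup` and `fake_api` are hypothetical and exist only for this illustration. This is not the package's `BatchingMapProxy`, which per the instructions above also handles caching, progress bars, and dynamic batch sizing.

```python
from collections.abc import Callable, Hashable
from typing import TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


def map_with_dedup(
    inputs: list[K],
    map_func: Callable[[list[K]], list[V]],
    batch_size: int = 2,
) -> list[V]:
    """Hypothetical helper: run map_func once per unique input, in mini-batches."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    # dict.fromkeys drops duplicates while keeping first-seen order.
    unique = list(dict.fromkeys(inputs))
    results: dict[K, V] = {}
    for start in range(0, len(unique), batch_size):
        batch = unique[start : start + batch_size]
        outputs = map_func(batch)
        # Same-length contract: a mismatch is treated as a bug in map_func.
        if len(outputs) != len(batch):
            raise ValueError(
                f"map_func returned {len(outputs)} items for {len(batch)} inputs"
            )
        results.update(zip(batch, outputs))
    # Restore the caller's original order, duplicates included.
    return [results[x] for x in inputs]


calls: list[list[str]] = []


def fake_api(batch: list[str]) -> list[str]:
    """Stand-in for a remote call; records each batch it receives."""
    calls.append(batch)
    return [s.upper() for s in batch]


out = map_with_dedup(["a", "b", "a", "c", "b"], fake_api, batch_size=2)
```

In this sketch five inputs collapse to three unique items, so `fake_api` is called twice rather than five times, and the output still lines up with the input positions.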
@@ -0,0 +1,15 @@
1
+ version: 2
2
+
3
+ updates:
4
+   - package-ecosystem: "pip"
5
+     directory: "/"
6
+     schedule:
7
+       interval: "weekly"
8
+       day: "monday"
9
+       time: "01:00"
10
+     open-pull-requests-limit: 5
11
+     commit-message:
12
+       prefix: "chore"
13
+       include: "scope"
14
+     labels:
15
+       - dependencies
@@ -18,14 +18,13 @@ concurrency:
18
18
  jobs:
19
19
  build:
20
20
  runs-on: ubuntu-latest
21
- environment: pypi
22
21
 
23
22
  steps:
24
23
  - name: Checkout repository
25
24
  uses: actions/checkout@v4
26
25
 
27
26
  - name: Install uv
28
- uses: astral-sh/setup-uv@v5
27
+ uses: astral-sh/setup-uv@v7
29
28
  with:
30
29
  python-version: "3.10"
31
30
 
@@ -36,7 +35,7 @@ jobs:
36
35
  run: uv run mkdocs build -d site
37
36
 
38
37
  - name: Upload to GitHub Pages
39
- uses: actions/upload-pages-artifact@v3
38
+ uses: actions/upload-pages-artifact@56afc609e74202658d3ffba0e8f6dda462b719fa # v3.0.1
40
39
  with:
41
40
  path: site
42
41
  name: github-pages
@@ -50,6 +49,6 @@ jobs:
50
49
  steps:
51
50
  - name: Deploy to GitHub Pages
52
51
  id: deployment
53
- uses: actions/deploy-pages@v4
52
+ uses: actions/deploy-pages@d6db90164ac5ed86f2b6aed7e0febac5b3c0c03e # v4.0.5
54
53
  with:
55
54
  artifact_name: github-pages
@@ -6,6 +6,7 @@ on:
6
6
  - "v*.*.*"
7
7
 
8
8
  permissions:
9
+ contents: read
9
10
  id-token: write
10
11
 
11
12
  jobs:
@@ -18,7 +19,7 @@ jobs:
18
19
  uses: actions/checkout@v4
19
20
 
20
21
  - name: Install uv
21
- uses: astral-sh/setup-uv@v5
22
+ uses: astral-sh/setup-uv@v7
22
23
 
23
24
  - name: Set up Python
24
25
  run: uv python install 3.10
@@ -30,4 +31,4 @@ jobs:
30
31
  run: uv build
31
32
 
32
33
  - name: Publish to PyPI
33
- run: uv publish
34
+ uses: pypa/gh-action-pypi-publish@release/v1
@@ -0,0 +1,57 @@
1
+ name: ci-integration
2
+
3
+ on:
4
+   issue_comment:
5
+     types: [created]
6
+
7
+ permissions:
8
+   contents: read
9
+   pull-requests: read
10
+
11
+ jobs:
12
+   test:
13
+     if: >
14
+       github.event.issue.pull_request &&
15
+       contains(github.event.comment.body, '/run-integration') &&
16
+       (github.event.comment.author_association == 'MEMBER' ||
17
+       github.event.comment.author_association == 'OWNER' ||
18
+       github.event.comment.author_association == 'COLLABORATOR')
19
+
20
+     runs-on: ubuntu-latest
21
+     environment: integration
22
+     env:
23
+       OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
24
+
25
+     steps:
26
+       - name: Fetch PR head SHA
27
+         id: pr
28
+         shell: bash
29
+         run: |
30
+           set -euo pipefail
31
+           PR_API_URL="${{ github.event.issue.pull_request.url }}"
32
+           JSON="$(curl -sS -H "Authorization: Bearer ${{ secrets.GITHUB_TOKEN }}" \
33
+             -H "Accept: application/vnd.github+json" "$PR_API_URL")"
34
+           echo "sha=$(python -c 'import json,sys; print(json.load(sys.stdin)[\"head\"][\"sha\"])' <<<\"$JSON\")" >> "$GITHUB_OUTPUT"
35
+
36
+       - name: Checkout PR head commit
37
+         uses: actions/checkout@v4
38
+         with:
39
+           ref: ${{ steps.pr.outputs.sha }}
40
+
41
+       - name: Install uv
42
+         uses: astral-sh/setup-uv@v7
43
+
44
+       - name: Set up Python
45
+         run: uv python install 3.10
46
+
47
+       - name: Install dependencies via uv
48
+         run: uv sync --all-extras --dev
49
+
50
+       - name: Lint with ruff
51
+         run: uv run ruff check .
52
+
53
+       - name: Type check with pyright
54
+         run: uv run pyright src/openaivec || echo "Type check completed with issues - see above"
55
+
56
+       - name: Run tests
57
+         run: uv run pytest
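The `if:` condition in the `test-pr.yml` workflow above gates the integration job on three checks: the comment is on a pull request, its body contains `/run-integration`, and the commenter is a member, owner, or collaborator. As a reading aid, the same predicate can be sketched in Python against an `issue_comment` event payload. The function name `should_run_integration` and the constant `TRUSTED_ASSOCIATIONS` are hypothetical; the field names follow the expression in the workflow.

```python
TRUSTED_ASSOCIATIONS = frozenset({"MEMBER", "OWNER", "COLLABORATOR"})


def should_run_integration(event: dict) -> bool:
    """Hypothetical mirror of the workflow's `if:` expression."""
    issue = event.get("issue", {})
    comment = event.get("comment", {})
    # `issue.pull_request` is only present when the comment is on a pull request.
    is_pull_request = bool(issue.get("pull_request"))
    # GitHub's contains() ignores case, so the body is lowercased before matching.
    has_command = "/run-integration" in comment.get("body", "").lower()
    is_trusted = comment.get("author_association") in TRUSTED_ASSOCIATIONS
    return is_pull_request and has_command and is_trusted


example_event = {
    "issue": {"pull_request": {"url": "https://api.github.com/repos/OWNER/REPO/pulls/1"}},
    "comment": {"body": "please /run-integration", "author_association": "COLLABORATOR"},
}
decision = should_run_integration(example_event)
```

Restricting the trigger to trusted associations matters because the job runs with `OPENAI_API_KEY` in its environment and checks out the pull request's head commit.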
@@ -5,9 +5,13 @@ on:
5
5
  branches:
6
6
  - main
7
7
 
8
+ permissions:
9
+ contents: read
10
+
8
11
  jobs:
9
12
  test:
10
13
  runs-on: ubuntu-latest
14
+ environment: integration
11
15
  env:
12
16
  OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
13
17
 
@@ -16,7 +20,7 @@ jobs:
16
20
  uses: actions/checkout@v4
17
21
 
18
22
  - name: Install uv
19
- uses: astral-sh/setup-uv@v5
23
+ uses: astral-sh/setup-uv@v7
20
24
 
21
25
  - name: Set up Python
22
26
  run: uv python install 3.10
@@ -27,5 +31,8 @@ jobs:
27
31
  - name: Lint with ruff
28
32
  run: uv run ruff check .
29
33
 
34
+ - name: Type check with pyright
35
+ run: uv run pyright src/openaivec || echo "Type check completed with issues - see above"
36
+
30
37
  - name: Run tests
31
38
  run: uv run pytest
@@ -5,6 +5,7 @@ venv
5
5
 
6
6
  # Claude code file
7
7
  CLAUDE.md
8
+ .claude/
8
9
 
9
10
  ### ruff
10
11
  .ruff_cache
@@ -12,6 +13,9 @@ CLAUDE.md
12
13
  ### Jupyter
13
14
  *.ipynb
14
15
 
16
+ ### deps
17
+ *.dot
18
+
15
19
 
16
20
  ### Python template
17
21
  # Byte-compiled / optimized / DLL files
@@ -0,0 +1,34 @@
1
+ # Repository Guidelines
2
+
3
+ ## Project Layout
4
+ - `src/openaivec/`: batching core (`_proxy.py`, `_responses.py`, `_embeddings.py`), integrations (`pandas_ext.py`, `spark.py`), and tasks (`task/`); keep additions beside the APIs they extend.
5
+ - `tests/`: mirrors the source layout; use common pandas, Spark, and async fixtures.
6
+ - `docs/` holds MkDocs sources, `site/` generated pages, and `artifacts/` scratch assets kept out of releases.
7
+
8
+ ## Core Components & Contracts
9
+ - Remote work goes through `BatchingMapProxy`/`AsyncBatchingMapProxy`; they dedupe inputs, require same-length outputs, release waiters on failure, and show progress only when `show_progress=True` in notebooks.
10
+ - `_responses.py` enforces reasoning rules: o1/o3-family models must use `temperature=None`, and structured scenarios pass a Pydantic `response_format`.
11
+ - Reuse caches from `*_with_cache` or Spark UDF builders per operation and clear them afterward to avoid large payloads.
12
+
13
+ ## Development Workflow
14
+ - `uv sync --all-extras --dev` prepares extras and tooling; iterate with `uv run pytest -m "not slow and not requires_api"` before a full `uv run pytest`.
15
+ - `uv run ruff check . --fix` enforces style, `uv run pyright` guards API changes, and `uv build` validates the distribution.
16
+ - Use `uv pip install -e .` only when external tooling requires an editable install.
17
+
18
+ ## Coding Standards
19
+ - Target Python 3.10+, rely on absolute imports, and keep helpers private with leading underscores; public modules publish alphabetical `__all__`, internal ones set `__all__ = []`.
20
+ - Apply Google-style docstrings with `(type)` Args, Returns/Raises sections, double-backtick literals, and doctest-style `Example:` blocks (`>>>`) when useful.
21
+ - Async helpers end with `_async`; dataframe accessors use descriptive nouns (`responses`, `extract`); raise narrow exceptions (`ValueError`, `TypeError`).
22
+
23
+ ## Testing Guidelines
24
+ - Pytest discovers `tests/test_*.py`; parametrize to cover pandas vectorization, Spark UDFs, and async pathways.
25
+ - Mark network tests `@pytest.mark.requires_api`, long jobs `@pytest.mark.slow`, Spark flows `@pytest.mark.spark`; skip gracefully when credentials are missing.
26
+ - Add regression tests before fixes, assert on structure/length/order rather than verbatim text, and prefer shared fixtures over heavy mocking.
27
+
28
+ ## Collaboration
29
+ - Commits follow `type(scope): summary` (e.g., `fix(pandas): guard empty batch`) and avoid merge commits within feature branches.
30
+ - Pull requests explain motivation, outline the solution, link issues, list doc updates, and include the latest `uv run pytest` and `uv run ruff check . --fix` output; attach screenshots for doc or tutorial changes.
31
+
32
+ ## Environment & Secrets
33
+ - Export `OPENAI_API_KEY` or the Azure trio (`AZURE_OPENAI_API_KEY`, `AZURE_OPENAI_BASE_URL`, `AZURE_OPENAI_API_VERSION`) before running `requires_api` tests; Azure endpoints must end with `/openai/v1/`.
34
+ - Keep local secrets under `artifacts/`, never commit credentials, and rely on CI-managed secrets when extending automation.
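Both `AGENTS.md` and the Copilot instructions state that an Azure base URL should end with `/openai/v1/`, and that other forms produce a warning rather than a failure. A minimal sketch of such a warn-but-proceed check follows. The function name `check_azure_base_url` is hypothetical, and this is not the validation in `_provider.py`, whose implementation is not shown in this diff. Accepting a missing trailing slash is an assumption of the sketch.

```python
import warnings


def check_azure_base_url(base_url: str) -> bool:
    """Hypothetical check: True if base_url ends with /openai/v1 (trailing slash optional)."""
    is_v1 = base_url.rstrip("/").endswith("/openai/v1")
    if not is_v1:
        # Warn instead of raising so the caller can still proceed.
        warnings.warn(
            f"Azure base URL {base_url!r} does not end with '/openai/v1/'",
            UserWarning,
            stacklevel=2,
        )
    return is_v1


v1_ok = check_azure_base_url("https://YOUR-RESOURCE-NAME.services.ai.azure.com/openai/v1/")
```

Returning a boolean alongside the warning lets callers log or branch on the result without catching an exception.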