vectorwave 0.2.5__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (54)
  1. vectorwave-0.2.5/PKG-INFO +680 -0
  2. vectorwave-0.2.5/Readme.md +653 -0
  3. vectorwave-0.2.5/crates/Cargo.lock +331 -0
  4. vectorwave-0.2.5/crates/Cargo.toml +15 -0
  5. vectorwave-0.2.5/crates/src/lib.rs +148 -0
  6. vectorwave-0.2.5/pyproject.toml +43 -0
  7. vectorwave-0.2.5/src/vectorwave/__init__.py +29 -0
  8. vectorwave-0.2.5/src/vectorwave/batch/__init__.py +0 -0
  9. vectorwave-0.2.5/src/vectorwave/batch/batch.py +176 -0
  10. vectorwave-0.2.5/src/vectorwave/core/__init__.py +0 -0
  11. vectorwave-0.2.5/src/vectorwave/core/auto_injector.py +100 -0
  12. vectorwave-0.2.5/src/vectorwave/core/core.py +0 -0
  13. vectorwave-0.2.5/src/vectorwave/core/decorator.py +191 -0
  14. vectorwave-0.2.5/src/vectorwave/core/generator.py +140 -0
  15. vectorwave-0.2.5/src/vectorwave/core/llm/__init__.py +0 -0
  16. vectorwave-0.2.5/src/vectorwave/core/llm/base.py +47 -0
  17. vectorwave-0.2.5/src/vectorwave/core/llm/factory.py +13 -0
  18. vectorwave-0.2.5/src/vectorwave/core/llm/openai_client.py +79 -0
  19. vectorwave-0.2.5/src/vectorwave/database/__init__.py +0 -0
  20. vectorwave-0.2.5/src/vectorwave/database/archiver.py +100 -0
  21. vectorwave-0.2.5/src/vectorwave/database/dataset.py +150 -0
  22. vectorwave-0.2.5/src/vectorwave/database/db.py +413 -0
  23. vectorwave-0.2.5/src/vectorwave/database/db_search.py +496 -0
  24. vectorwave-0.2.5/src/vectorwave/exception/__init__.py +0 -0
  25. vectorwave-0.2.5/src/vectorwave/exception/exceptions.py +22 -0
  26. vectorwave-0.2.5/src/vectorwave/models/__init__.py +0 -0
  27. vectorwave-0.2.5/src/vectorwave/models/db_config.py +143 -0
  28. vectorwave-0.2.5/src/vectorwave/monitoring/__init__.py +0 -0
  29. vectorwave-0.2.5/src/vectorwave/monitoring/alert/__init__.py +0 -0
  30. vectorwave-0.2.5/src/vectorwave/monitoring/alert/base.py +8 -0
  31. vectorwave-0.2.5/src/vectorwave/monitoring/alert/factory.py +21 -0
  32. vectorwave-0.2.5/src/vectorwave/monitoring/alert/null_alerter.py +7 -0
  33. vectorwave-0.2.5/src/vectorwave/monitoring/alert/webhook_alerter.py +69 -0
  34. vectorwave-0.2.5/src/vectorwave/monitoring/monitoring.py +0 -0
  35. vectorwave-0.2.5/src/vectorwave/monitoring/tracer.py +538 -0
  36. vectorwave-0.2.5/src/vectorwave/prediction/__init__.py +0 -0
  37. vectorwave-0.2.5/src/vectorwave/prediction/predictor.py +0 -0
  38. vectorwave-0.2.5/src/vectorwave/search/__init__.py +0 -0
  39. vectorwave-0.2.5/src/vectorwave/search/execution_search.py +154 -0
  40. vectorwave-0.2.5/src/vectorwave/search/extended_search.py +0 -0
  41. vectorwave-0.2.5/src/vectorwave/search/rag_search.py +154 -0
  42. vectorwave-0.2.5/src/vectorwave/utils/__init__.py +0 -0
  43. vectorwave-0.2.5/src/vectorwave/utils/context.py +4 -0
  44. vectorwave-0.2.5/src/vectorwave/utils/function_cache.py +96 -0
  45. vectorwave-0.2.5/src/vectorwave/utils/healer.py +155 -0
  46. vectorwave-0.2.5/src/vectorwave/utils/replayer.py +265 -0
  47. vectorwave-0.2.5/src/vectorwave/utils/replayer_semantic.py +234 -0
  48. vectorwave-0.2.5/src/vectorwave/utils/return_caching_utils.py +152 -0
  49. vectorwave-0.2.5/src/vectorwave/utils/status.py +55 -0
  50. vectorwave-0.2.5/src/vectorwave/vectorizer/__init__.py +0 -0
  51. vectorwave-0.2.5/src/vectorwave/vectorizer/base.py +12 -0
  52. vectorwave-0.2.5/src/vectorwave/vectorizer/factory.py +51 -0
  53. vectorwave-0.2.5/src/vectorwave/vectorizer/huggingface_vectorizer.py +34 -0
  54. vectorwave-0.2.5/src/vectorwave/vectorizer/openai_vectorizer.py +46 -0
@@ -0,0 +1,680 @@
1
+ Metadata-Version: 2.4
2
+ Name: vectorwave
3
+ Version: 0.2.5
4
+ Classifier: Programming Language :: Python :: 3
5
+ Classifier: Programming Language :: Python :: 3.10
6
+ Classifier: Programming Language :: Python :: 3.11
7
+ Classifier: Programming Language :: Python :: 3.12
8
+ Classifier: Programming Language :: Python :: 3.13
9
+ Classifier: Programming Language :: Rust
10
+ Classifier: Operating System :: OS Independent
11
+ Classifier: Development Status :: 3 - Alpha
12
+ Classifier: Intended Audience :: Developers
13
+ Requires-Dist: weaviate-client>=4.0.0
14
+ Requires-Dist: pydantic-settings>=2.0.0
15
+ Requires-Dist: sentence-transformers
16
+ Requires-Dist: requests
17
+ Requires-Dist: openai
18
+ License-File: LICENSE
19
+ License-File: NOTICE
20
+ Summary: VectorWave: Seamless Auto-Vectorization Framework
21
+ Author-email: junyeonggim <junyeonggim5@gmail.com>
22
+ License-Expression: MIT
23
+ Requires-Python: >=3.10
24
+ Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM
25
+ Project-URL: Repository, https://github.com/cozymori/vectorwave
26
+
27
+
28
+ # VectorWave: Seamless Auto-Vectorization Framework
29
+
30
+ [LICENSE](LICENSE)
31
+
32
+ ## 🌟 Project Overview
33
+
34
+ **VectorWave** is a framework that uses **decorators** to automatically store and manage the output of Python functions and methods in a **vector database (Vector DB)**. A single decorator (`@vectorize`) turns function output into searchable vector data, so developers do not have to hand-write data collection, embedding generation, or vector database storage.
35
+
36
+ ---
37
+
38
+ ## ✨ Key Features
39
+
40
+ * **`@vectorize` Decorator:**
41
+     1. **Static Data Collection:** Upon script load, the function's source code, docstring, and metadata are stored once in the `VectorWaveFunctions` collection.
42
+     2. **Dynamic Data Logging:** Every time the function is called, execution time, success/failure status, error logs, and 'dynamic tags' are recorded in the `VectorWaveExecutions` collection.
43
+ * **(NEW) AI-Powered Function Documentation:** Uses a Large Language Model (LLM) to automatically generate the **`search_description`** and **`sequence_narrative`**. This significantly reduces manual effort and improves semantic search quality.
44
+ * **(NEW) Deferred Registration:** LLM documentation generation is deferred and runs only upon explicit command, **completely avoiding latency** during application startup.
45
+ * **Semantic Caching and Performance Optimization:**
46
+     * Determines a cache hit based on the **semantic similarity** of function inputs, bypassing function execution for identical or highly similar inputs and returning stored results instantly.
47
+     * This is highly effective in **significantly reducing execution latency** and cost, especially for high-cost computational functions (e.g., LLM calls, complex data processing).
48
+ * **Distributed Tracing:** Combines `@vectorize` and `@trace_span` decorators to group the execution of complex multi-step workflows under a single **`trace_id`** for analysis.
49
+ * **Search Interface:** Provides `search_functions` and `search_executions` to query stored vector data (function definitions) and logs (execution history), facilitating the building of RAG and monitoring systems.
50
+
51
+ ---
52
+
53
+ ## 🚀 Usage
54
+
55
+ VectorWave has two sides: **storing** data through decorators and **searching** it through functions. It also supports **execution flow tracing**.
56
+
57
+ ### 1. (Required) Database Initialization and Setup
58
+
59
+ ```python
+ import time
+ from vectorwave import (
+     vectorize,
+     initialize_database,
+     search_functions,
+     search_executions,
+     generate_and_register_metadata,  # Added for Auto-Doc
+ )
+ # Import trace_span separately for distributed tracing.
+ from vectorwave.monitoring.tracer import trace_span
+
+ # Call this only once, when the script starts.
+ try:
+     client = initialize_database()
+     print("VectorWave DB initialization successful.")
+ except Exception as e:
+     print(f"DB initialization failed: {e}")
+     exit()
+ ```
79
+
80
+ ### 2\. [Store] Using `@vectorize` and Distributed Tracing
81
+
82
+ `@vectorize` acts as the **root** of the trace, and applying `@trace_span` to internal functions groups the whole workflow under a **single `trace_id`**.
83
+
84
+ ```python
+ # --- Sub-span function: Captures arguments ---
+ @trace_span(attributes_to_capture=['user_id', 'amount'])
+ def step_1_validate_payment(user_id: str, amount: int):
+     """(Span) Validates payment. Records user_id and amount to logs."""
+     print(f" [SPAN 1] Validating payment for {user_id}...")
+     time.sleep(0.1)
+     return True
+
+ @trace_span(attributes_to_capture=['user_id', 'receipt_id'])
+ def step_2_send_receipt(user_id: str, receipt_id: str):
+     """(Span) Sends the receipt."""
+     print(f" [SPAN 2] Sending receipt {receipt_id}...")
+     time.sleep(0.2)
+
+
+ # --- Root function (Acts as @trace_root) ---
+ @vectorize(
+     search_description="Process user payment and return a receipt.",
+     sequence_narrative="After payment is complete, a receipt is sent via email.",
+     team="billing",  # ⬅️ Custom Tag (Recorded in all execution logs)
+     priority=1       # ⬅️ Custom Tag (Execution Priority)
+ )
+ def process_payment(user_id: str, amount: int):
+     """(Root Span) Executes the user payment workflow."""
+     print(f" [ROOT EXEC] process_payment: Starting workflow for {user_id}...")
+
+     # When calling sub-functions, the same trace_id is automatically inherited via ContextVar.
+     step_1_validate_payment(user_id=user_id, amount=amount)
+
+     receipt_id = f"receipt_{user_id}_{amount}"
+     step_2_send_receipt(user_id=user_id, receipt_id=receipt_id)
+
+     print(f" [ROOT DONE] process_payment")
+     return {"status": "success", "receipt_id": receipt_id}
+
+ # --- Function Execution ---
+ print("Now calling 'process_payment'...")
+ # This single call records 3 execution logs (spans) in the DB,
+ # and all three logs are grouped under one 'trace_id'.
+ process_payment("user_789", 5000)
+ ```
126
+
127
+ -----
128
+
129
+ ### 2.1. 💡 AI Documentation Setup (LLM Configuration)
130
+
131
+ To use the LLM feature, you must specify dependencies and environment variables.
132
+
133
+ #### Prerequisites for AI Auto-Documentation
134
+
135
+ To use the AI-powered documentation feature, you must have the `openai` library installed and configure your API key.
136
+
137
+ 1. **Install Library:**
138
+ ```bash
139
+ pip install openai
140
+ ```
141
+
142
+ 2. **Set API Key:** Add your valid OpenAI API key to your `.env` file.
143
+ ```ini
144
+ OPENAI_API_KEY="sk-proj-YOUR_API_KEY_HERE"
145
+ # WEAVIATE_GENERATIVE_MODULE="generative-openai" (Required to enable the Weaviate module when using OpenAI LLM)
146
+ ```
147
+
148
+ ### 2.2. 🚀 Usage: Auto-Generating Function Metadata (Auto=True)
149
+
150
+ Instead of manually defining `search_description` and `sequence_narrative`, you can use the `auto=True` flag.
151
+
156
+ 1. **Mark Function:** Set `auto=True`. It is **strongly recommended to include a detailed Docstring** to enhance the LLM's analysis quality.
157
+
158
+ ```python
+ # Code from vectorwave/test_ex/example.py
+ @vectorize(auto=True, team="loyalty-program")
+ def calculate_loyalty_points(purchase_amount: int, is_vip: bool):
+     """
+     Function to calculate loyalty points based on purchase amount.
+     VIP customers earn double points.
+     """
+     points = purchase_amount // 10
+     if is_vip:
+         points *= 2
+     return {"points": points, "tier": "VIP" if is_vip else "Regular"}
+ ```
171
+
172
+ 2. **Trigger Generation:** Call `generate_and_register_metadata()` **immediately after** all `@vectorize` function definitions are complete. This function calls the LLM, vectorizes the generated metadata, and registers it to the DB.
173
+
174
+ ```python
175
+ # ... (After defining the calculate_loyalty_points function above)
176
+
177
+ # [Mandatory] Must be called after all function definitions are complete.
178
+ print("🚀 Checking for functions needing auto-documentation...")
179
+ generate_and_register_metadata()
180
+ ```
181
+
182
+ > **Note:** Since this process involves LLM API calls, it can cause **latency** if run during server startup. It is recommended to execute this via a separate management script or an admin API endpoint in a production environment.
183
+ -----
184
+
185
+ #### Semantic Caching Example
186
+
187
+ Configure the system to prevent re-execution for similar inputs and return cached results.
188
+
189
+ ```python
+ from vectorwave import vectorize
+ import time
+
+ @vectorize(
+     search_description="High-cost summarization task using LLM",
+     sequence_narrative="LLM Summarization Step",
+     semantic_cache=True,        # Enable caching
+     cache_threshold=0.95,       # Cache hit if similarity is > 95%
+     capture_return_value=True   # Must capture return value for caching
+ )
+ def summarize_document(document_text: str):
+     # Actual LLM call or high-cost computation logic (e.g., 0.5s delay)
+     time.sleep(0.5)
+     print("--- [Cache Miss] Document is being summarized by LLM...")
+     return f"Summary of: {document_text[:20]}..."
+
+ # First call (cache miss): takes about 0.5s, and the result is stored in the DB.
+ result_1 = summarize_document("The first quarter results showed strong growth in Europe and Asia...")
+
+ # Second call (expected cache hit): returns the cached value without the 0.5s delay.
+ # "Q1 results" may semantically match "first quarter results", leading to a cache hit.
+ result_2 = summarize_document("The Q1 results demonstrated strong growth in Europe and Asia...")
+
+ # On a cache hit, result_2 is the value stored for result_1; the function body does not run.
+ ```
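
To make the threshold decision concrete, here is a minimal sketch of a similarity-based cache lookup. It is not VectorWave's implementation: `ToySemanticCache` is a hypothetical class, the vectors are hand-made stand-ins for real embeddings, and cosine similarity is assumed as the similarity measure.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


class ToySemanticCache:
    """Hypothetical in-memory cache keyed by input embeddings."""

    def __init__(self, threshold=0.95):
        self.threshold = threshold
        self.entries = []  # list of (embedding, cached_value)

    def store(self, embedding, value):
        self.entries.append((embedding, value))

    def lookup(self, embedding):
        """Return (hit, value) for the most similar stored entry."""
        best_value, best_score = None, -1.0
        for stored, value in self.entries:
            score = cosine_similarity(embedding, stored)
            if score > best_score:
                best_value, best_score = value, score
        if best_score > self.threshold:
            return True, best_value
        return False, None


cache = ToySemanticCache(threshold=0.95)
cache.store([1.0, 0.0, 0.0], "Summary A")

hit_close = cache.lookup([0.99, 0.05, 0.0])  # nearly the same direction
hit_far = cache.lookup([0.0, 1.0, 0.0])      # orthogonal, unrelated input
```

A higher threshold lowers the risk of returning a cached result for an input that only looks similar, at the cost of fewer hits.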
215
+
216
+ ### 3\. [Search ①] Function Definition Search (for RAG)
217
+
218
+ ```python
+ # Semantically search for functions related to 'payment'.
+ print("\n--- Searching for 'payment' related functions ---")
+ payment_funcs = search_functions(
+     query="User payment processing function",
+     limit=3
+ )
+ for func in payment_funcs:
+     print(f" - Function Name: {func['properties']['function_name']}")
+     print(f" - Description: {func['properties']['search_description']}")
+     print(f" - Distance (lower is more similar): {func['metadata'].distance:.4f}")
+ ```
230
+
231
+ ### 4\. [Search ②] Execution Log Search (Monitoring and Tracing)
232
+
233
+ The `search_executions` function can now search for all related execution logs (spans) based on the `trace_id`.
234
+
235
+ ```python
+ # 1. Find the Trace ID of the latest 'process_payment' workflow.
+ latest_payment_span = search_executions(
+     limit=1,
+     filters={"function_name": "process_payment"},
+     sort_by="timestamp_utc",
+     sort_ascending=False
+ )
+ trace_id = latest_payment_span[0]["trace_id"]
+
+ # 2. Search all spans belonging to that Trace ID, sorted chronologically.
+ print(f"\n--- Full Trace ({trace_id[:8]}...) ---")
+ trace_spans = search_executions(
+     limit=10,
+     filters={"trace_id": trace_id},
+     sort_by="timestamp_utc",
+     sort_ascending=True  # Ascending sort for workflow flow analysis
+ )
+
+ for i, span in enumerate(trace_spans):
+     print(f" - [Span {i+1}] {span['function_name']} ({span['duration_ms']:.2f}ms)")
+     # Captured arguments (user_id, amount, etc.) are also displayed.
+
+ # Expected result:
+ # - [Span 1] step_1_validate_payment (100.81ms)
+ # - [Span 2] step_2_send_receipt (202.06ms)
+ # - [Span 3] process_payment (333.18ms)
+ ```
263
+
264
+ -----
265
+
266
+ ## ⚙️ Configuration
267
+
268
+ VectorWave automatically reads Weaviate database connection information and **vectorization strategy** from **environment variables** or a `.env` file.
269
+
270
+ Create a `.env` file in your project's root directory (e.g., where `test_ex/example.py` is located) and set the necessary values.
271
+
272
+ ### Vectorization Strategy Settings (VECTORIZER)
273
+
274
+ You can select the text vectorization method via the `VECTORIZER` environment variable in your `test_ex/.env` file.
275
+
276
+ | `VECTORIZER` Setting | Description | Required Additional Settings |
277
+ | :--- | :--- | :--- |
278
+ | **`huggingface`** | (Recommended Default) Uses the `sentence-transformers` library on the local CPU for vectorization. No API key needed for testing. | `HF_MODEL_NAME` (e.g., "sentence-transformers/all-MiniLM-L6-v2") |
279
+ | **`openai_client`** | (High-Performance) Calls the OpenAI API directly via the Python client for models like `text-embedding-3-small`. | `OPENAI_API_KEY` (Valid OpenAI API key) |
280
+ | **`weaviate_module`** | (Docker Delegation) Delegates vectorization to the module built into the Weaviate Docker container (e.g., `text2vec-openai`). | `WEAVIATE_VECTORIZER_MODULE`, `OPENAI_API_KEY` |
281
+ | **`none`** | No vectorization is performed. Data is stored without vectors. | None |
282
+
283
+ #### ⚠️ Semantic Caching Prerequisites and Settings
284
+
285
+ To use `semantic_cache=True`, the following conditions must be met:
286
+
287
+ * **Vectorizer Required:** A **Python-based vectorizer** (`huggingface` or `openai_client`) must be configured in the library settings (`VECTORIZER` env var). Caching is automatically disabled if `weaviate_module` or `none` is set.
288
+ * **Return Value Capture Mandatory:** When `semantic_cache=True` is enabled, the `capture_return_value` parameter is automatically set to `True`.
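
The two rules above can be read as a small decision function. The sketch below is an illustration only: `effective_cache_settings` and `PYTHON_VECTORIZERS` are hypothetical names, and exact string matching on the `VECTORIZER` value is an assumption.

```python
# Vectorizers that run in Python, per the prerequisites above.
PYTHON_VECTORIZERS = {"huggingface", "openai_client"}


def effective_cache_settings(vectorizer, semantic_cache, capture_return_value=False):
    """Return the settings that would actually apply under the stated rules."""
    # Caching is only possible with a Python-based vectorizer.
    cache_on = semantic_cache and vectorizer in PYTHON_VECTORIZERS
    return {
        "semantic_cache": cache_on,
        # Enabling the cache forces return-value capture on.
        "capture_return_value": capture_return_value or cache_on,
    }


with_hf = effective_cache_settings("huggingface", semantic_cache=True)
with_module = effective_cache_settings("weaviate_module", semantic_cache=True)
```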
289
+
290
+ ### .env File Examples
291
+
292
+ Configure the contents of your `.env` file according to the strategy you intend to use.
293
+
294
+ #### Example 1: Using `huggingface` (Local, No API Key Needed)
295
+
296
+ Uses a `sentence-transformers` model on the local machine. Easy for testing as no API key is needed.
297
+
298
+ ```ini
299
+ # .env (For HuggingFace)
300
+ # --- Basic Weaviate Connection Settings ---
301
+ WEAVIATE_HOST=localhost
302
+ WEAVIATE_PORT=8080
303
+ WEAVIATE_GRPC_PORT=50051
304
+
305
+ # --- [Strategy 1] HuggingFace Settings ---
306
+ VECTORIZER="huggingface"
307
+ HF_MODEL_NAME="sentence-transformers/all-MiniLM-L6-v2"
308
+
309
+ # (OPENAI_API_KEY is not needed in this mode)
310
+ OPENAI_API_KEY=sk-...
311
+
312
+ # --- [Advanced] Custom Property Settings ---
313
+ CUSTOM_PROPERTIES_FILE_PATH=.weaviate_properties
314
+ FAILURE_MAPPING_FILE_PATH=.vectorwave_errors.json
315
+ RUN_ID=test-run-001
316
+ ```
317
+
318
+ #### Example 2: Using `openai_client` (Python Client, High-Performance)
319
+
320
+ Calls the OpenAI API directly via the `openai` Python library.
321
+
322
+ ```ini
323
+ # .env (For OpenAI Python Client)
324
+ # --- Basic Weaviate Connection Settings ---
325
+ WEAVIATE_HOST=localhost
326
+ WEAVIATE_PORT=8080
327
+ WEAVIATE_GRPC_PORT=50051
328
+
329
+ # --- [Strategy 2] OpenAI Client Settings ---
330
+ VECTORIZER="openai_client"
331
+
332
+ # [REQUIRED] Must enter a valid OpenAI API key.
333
+ OPENAI_API_KEY=sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxx
334
+
335
+ # (HF_MODEL_NAME is not used in this mode)
336
+ HF_MODEL_NAME=...
337
+
338
+ # --- [Advanced] Custom Property Settings ---
339
+ CUSTOM_PROPERTIES_FILE_PATH=.weaviate_properties
340
+ RUN_ID=test-run-001
341
+ ```
342
+
343
+ #### Example 3: Using `weaviate_module` (Docker Delegation)
344
+
345
+ Delegates vectorization to the Weaviate Docker container instead of Python. (Refer to `vw_docker.yml` settings)
346
+
347
+ ```ini
348
+ # .env (For Weaviate Module Delegation)
349
+ # --- Basic Weaviate Connection Settings ---
350
+ WEAVIATE_HOST=localhost
351
+ WEAVIATE_PORT=8080
352
+ WEAVIATE_GRPC_PORT=50051
353
+
354
+ # --- [Strategy 3] Weaviate Module Settings ---
355
+ VECTORIZER="weaviate_module"
356
+ WEAVIATE_VECTORIZER_MODULE=text2vec-openai
357
+ WEAVIATE_GENERATIVE_MODULE=generative-openai
358
+
359
+ # [REQUIRED] The Weaviate container reads and uses this API key.
360
+ OPENAI_API_KEY=sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxx
361
+
362
+ # --- [Advanced] Custom Property Settings ---
363
+ CUSTOM_PROPERTIES_FILE_PATH=.weaviate_properties
364
+ RUN_ID=test-run-001
365
+ ```
366
+
367
+ -----
368
+
369
+ ### 🚀 Advanced Failure Tracking (Error Code)
370
+
371
+ Beyond simple `status: "ERROR"` recording, `error_code` attributes are added to `VectorWaveExecutions` logs to categorize failure causes.
372
+
373
+ When an exception occurs in a function decorated with `@vectorize` or `@trace_span`, the `error_code` is automatically determined based on the following 3 priorities:
374
+
375
+ 1. **Custom Exception Attribute (Priority 1):**
376
+ The most specific method. If the raised exception object `e` has an `e.error_code` attribute, that value is used as the `error_code`.
377
+
378
+ ```python
+ class PaymentError(Exception):
+     def __init__(self, message, error_code):
+         super().__init__(message)
+         self.error_code = error_code  # ⬅️ This attribute is detected.
+
+ @vectorize(...)
+ def process_payment(amount):
+     if amount < 0:
+         raise PaymentError("Amount < 0", error_code="PAYMENT_NEGATIVE_AMOUNT")
+
+ # DB Log upon execution: { "status": "ERROR", "error_code": "PAYMENT_NEGATIVE_AMOUNT" }
+ ```
391
+
392
+ 2. **Global Mapping File (Priority 2):**
393
+ Manages common exceptions like `ValueError` centrally. The exception class name is looked up as a key in the JSON file specified by `FAILURE_MAPPING_FILE_PATH` (default: `.vectorwave_errors.json`) in the `.env` file.
394
+
395
+ **`.vectorwave_errors.json` Example:**
396
+
397
+ ```json
+ {
+   "ValueError": "INVALID_INPUT",
+   "KeyError": "CONFIG_MISSING",
+   "TypeError": "INVALID_INPUT"
+ }
+ ```
404
+
405
+ ```python
+ import os
+
+ @vectorize(...)
+ def get_config(key):
+     return os.environ[key]  # ⬅️ Raises KeyError if the variable is not set
+
+ # DB Log upon execution: { "status": "ERROR", "error_code": "CONFIG_MISSING" }
+ ```
412
+
413
+ 3. **Default Value (Priority 3):**
414
+ Any exception not covered by P1 or P2 has its exception class name (e.g., `"ZeroDivisionError"`) automatically saved as the `error_code`.
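
The three priorities can be summarized as a single lookup function. This is a sketch of the described behavior, not the library's code; `resolve_error_code` is a hypothetical name, and the mapping is passed in as a plain dict rather than loaded from `FAILURE_MAPPING_FILE_PATH`.

```python
def resolve_error_code(exc, mapping):
    """Pick an error code for an exception using the three priorities above."""
    # Priority 1: an explicit error_code attribute on the exception object.
    code = getattr(exc, "error_code", None)
    if code is not None:
        return code
    # Priority 2: the exception class name looked up in the global mapping.
    name = type(exc).__name__
    if name in mapping:
        return mapping[name]
    # Priority 3: fall back to the exception class name itself.
    return name


class PaymentError(Exception):
    def __init__(self, message, error_code):
        super().__init__(message)
        self.error_code = error_code


mapping = {"ValueError": "INVALID_INPUT", "KeyError": "CONFIG_MISSING"}

p1 = resolve_error_code(PaymentError("Amount < 0", "PAYMENT_NEGATIVE_AMOUNT"), mapping)
p2 = resolve_error_code(KeyError("DB_URL"), mapping)
p3 = resolve_error_code(ZeroDivisionError("division by zero"), mapping)
```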
415
+
416
+ **[Usage] Searching Failure Logs:**
417
+ You can now filter `search_executions` by `error_code` to aggregate only specific types of failures.
418
+
419
+ ```python
+ # Search all failure logs categorized as "INVALID_INPUT"
+ invalid_logs = search_executions(
+     filters={"error_code": "INVALID_INPUT"},
+     limit=10
+ )
+ ```
426
+
427
+ -----
428
+
429
+ ### Custom Properties and Dynamic Execution Tagging
430
+
431
+ VectorWave can store user-defined additional metadata alongside static data (function definitions) and dynamic data (execution logs). This works in two steps.
432
+
433
+ #### Step 1: Define Custom Schema (Tag "Allow-List")
434
+
435
+ Create a JSON file at the path specified by `CUSTOM_PROPERTIES_FILE_PATH` (default: `.weaviate_properties`) in the `.env` file.
436
+
437
+ This file instructs VectorWave to add **new properties (columns)** to the Weaviate collections. This file acts as the **"allow-list"** for all custom tags.
438
+
439
+ **`.weaviate_properties` Example:**
440
+
441
+ ```json
+ {
+   "run_id": {
+     "data_type": "TEXT",
+     "description": "The ID of the specific test run"
+   },
+   "experiment_id": {
+     "data_type": "TEXT",
+     "description": "Identifier for the experiment"
+   },
+   "team": {
+     "data_type": "TEXT",
+     "description": "The team responsible for this function"
+   },
+   "priority": {
+     "data_type": "INT",
+     "description": "Execution priority"
+   }
+ }
+ ```
461
+
462
+ * Properties defined above will be added to both `VectorWaveFunctions` and `VectorWaveExecutions` collections.
463
+
464
+ #### Step 2: Dynamic Execution Tagging (Adding Values)
465
+
466
+ When a function executes, VectorWave adds tags to the `VectorWaveExecutions` log. These tags are collected and merged in two ways:
467
+
468
+ **1. Global Tags (Environment Variables)**
469
+ VectorWave looks for environment variables whose **uppercase names** (e.g., `RUN_ID`, `EXPERIMENT_ID`) match the keys defined in Step 1. Found values are loaded as `global_custom_values` and added to *all* execution logs. Ideal for metadata spanning the entire script execution.
470
+
471
+ **2. Per-Function Tags (Decorator)**
472
+ Tags can be passed directly to the `@vectorize` decorator as keyword arguments (`**execution_tags`). Ideal for function-specific metadata.
473
+
474
+ ```python
+ # --- .env file ---
+ # RUN_ID=global-run-abc
+ # TEAM=default-team
+
+ @vectorize(
+     search_description="Process payment",
+     sequence_narrative="...",
+     team="billing",  # <-- Per-function tag
+     priority=1       # <-- Per-function tag
+ )
+ def process_payment():
+     pass
+
+ @vectorize(
+     search_description="Another function",
+     sequence_narrative="...",
+     run_id="override-run-xyz"  # <-- Overrides global tag
+ )
+ def other_function():
+     pass
+ ```
496
+
497
+ **Tag Merging and Validation Rules**
498
+
499
+ 1. **Validation (Important):** Tags (global or per-function) are **only** stored in Weaviate if their key (e.g., `run_id`, `team`, `priority`) is first defined in the `.weaviate_properties` file (Step 1). Tags not defined in the schema are **ignored**, and a warning is printed upon script startup.
500
+
501
+ 2. **Precedence (Override):** If a tag key is defined in both places (e.g., global `RUN_ID` from `.env` and per-function `run_id="override-xyz"`), the **per-function tag explicitly defined in the decorator always wins**.
502
+
503
+ **Resulting Logs:**
504
+
505
+ * `process_payment()` execution log: `{"run_id": "global-run-abc", "team": "billing", "priority": 1}`
506
+ * `other_function()` execution log: `{"run_id": "override-run-xyz", "team": "default-team"}`
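
The validation and precedence rules can be reproduced with a few lines of plain Python. The sketch below is illustrative only: `merge_tags` is a hypothetical helper, and the environment is passed in as a dict instead of being read from `os.environ`.

```python
def merge_tags(allowed_keys, environ, decorator_tags):
    """Merge global and per-function tags, keeping only allow-listed keys."""
    merged = {}
    # Global tags: environment variables whose uppercase name matches a schema key.
    for key in allowed_keys:
        value = environ.get(key.upper())
        if value is not None:
            merged[key] = value
    # Per-function tags override global ones; keys outside the allow-list are ignored.
    for key, value in decorator_tags.items():
        if key in allowed_keys:
            merged[key] = value
    return merged


allowed = {"run_id", "experiment_id", "team", "priority"}
env = {"RUN_ID": "global-run-abc", "TEAM": "default-team"}

process_payment_tags = merge_tags(allowed, env, {"team": "billing", "priority": 1})
other_function_tags = merge_tags(allowed, env, {"run_id": "override-run-xyz"})
unknown_key_tags = merge_tags(allowed, env, {"region": "eu"})  # 'region' is not in the schema
```

The first two results match the resulting logs listed above; the third shows an undeclared tag being dropped.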
507
+
508
+ -----
509
+
510
+ ### 🚀 Real-time Error Alerting (Webhook)
511
+
512
+ Beyond simple log storage, `VectorWave` can send real-time alerts via **Webhook** immediately upon error occurrence. This feature is built into the `tracer` and can be enabled simply by modifying the `.env` file.
513
+
514
+ **How it Works:**
515
+
516
+ 1. An exception occurs in a function decorated with `@trace_span` or `@vectorize`.
517
+ 2. The `tracer` immediately detects the error and calls the `alerter` object.
518
+ 3. The `alerter` reads the `.env` settings, uses `WebhookAlerter`, and dispatches the error information to the configured URL.
519
+ 4. The notification is optimized for **Discord Embed** format, sending a detailed report including the error code, trace ID, captured attributes (`user_id`, etc.), and the full stack trace.
520
+
521
+ **How to Enable:**
522
+ Add these two variables to your `test_ex/.env` file (or environment variables).
523
+
524
+ ```ini
525
+ # .env File
526
+
527
+ # 1. Set the alert strategy to 'webhook'. (Default: "none")
528
+ ALERTER_STRATEGY="webhook"
529
+
530
+ # 2. Enter your Webhook URL obtained from Discord or Slack.
531
+ ALERTER_WEBHOOK_URL="https://discord.com/api/webhooks/YOUR_HOOK_ID/..."
532
+ ```
533
+ Adding just these two lines and running `test_ex/example.py` will immediately send an alert to Discord when a `CustomValueError` occurs.
534
+
535
+ **Extensibility (Strategy Pattern):** This alert system is designed with the Strategy Pattern, so it can be extended to other notification channels such as email or PagerDuty by implementing the `BaseAlerter` interface.
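
As a rough picture of the Strategy Pattern described here, the sketch below defines an abstract alerter, a do-nothing variant, and a small factory. All names (`AlerterSketch`, `NullAlerterSketch`, `InMemoryAlerter`, `make_alerter`) and the `notify` method are assumptions made for illustration; the actual `BaseAlerter` interface may use different method names and arguments.

```python
from abc import ABC, abstractmethod


class AlerterSketch(ABC):
    """Hypothetical stand-in for an alerter interface."""

    @abstractmethod
    def notify(self, error_info: dict) -> None:
        """Deliver one error report."""


class NullAlerterSketch(AlerterSketch):
    """Does nothing; used when alerting is disabled."""

    def notify(self, error_info: dict) -> None:
        pass


class InMemoryAlerter(AlerterSketch):
    """Collects reports in a list instead of calling a webhook."""

    def __init__(self):
        self.sent = []

    def notify(self, error_info: dict) -> None:
        self.sent.append(error_info)


def make_alerter(strategy: str = "none") -> AlerterSketch:
    """Pick an alerter by strategy name; unknown names fall back to the null alerter."""
    registry = {"none": NullAlerterSketch, "memory": InMemoryAlerter}
    return registry.get(strategy, NullAlerterSketch)()


alerter = make_alerter("memory")
alerter.notify({"error_code": "INVALID_INPUT", "trace_id": "abc123"})
```

A new channel would then be one more subclass plus one more registry entry, with no change to the code that calls `notify`.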
536
+
537
+ -----
538
+
539
+ ## 🧪 Advanced Features: Testing and Maintenance
540
+
541
+ VectorWave provides powerful tools to utilize stored operational data for testing and maintenance.
542
+
543
+ ### 1\. Automated Regression Testing (Replay)
544
+
545
+ **Transform production logs into test cases.**
546
+ VectorWave records the **input arguments** and **return value** of the function upon execution. The `Replayer` uses this data to re-execute the function and verify if the result matches the past outcome, automatically detecting **regression** (breakage of existing functionality) due to code changes.
547
+
548
+ #### Enable Replay Mode
549
+
550
+ Add the `replay=True` option to the `@vectorize` decorator. Input arguments and return values will be automatically captured.
551
+
552
+ ```python
+ @vectorize(
+     search_description="Calculate payment amount",
+     sequence_narrative="Validate user and return total amount",
+     replay=True  # <--- Turn on this option to enable Replay!
+ )
+ def calculate_total(user_id: str, price: int, tax: float):
+     return price + (price * tax)
+ ```
561
+
562
+ #### Execute Tests (Replay Test)
563
+
564
+ Use the `VectorWaveReplayer` in a separate test script to validate the current code against past successful execution history.
565
+
566
+ ```python
+ from vectorwave.utils.replayer import VectorWaveReplayer
+
+ replayer = VectorWaveReplayer()
+
+ # Test the latest 10 successful logs of 'my_module.calculate_total'
+ result = replayer.replay("my_module.calculate_total", limit=10)
+
+ print(f"Passed: {result['passed']}, Failed: {result['failed']}")
+
+ if result['failed'] > 0:
+     for fail in result['failures']:
+         print(f"Mismatch! UUID: {fail['uuid']}, Expected: {fail['expected']}, Actual: {fail['actual']}")
+ ```
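
Conceptually, a replay run re-executes the function with each recorded input and compares the result against the recorded return value. The following sketch shows that loop with in-memory logs. It is not the `VectorWaveReplayer` implementation: the `replay` helper and the log field names `inputs` and `return_value` are assumptions, while the shape of the result dict follows the example above.

```python
def replay(func, logs, limit=10):
    """Re-run func on recorded inputs and compare against recorded outputs."""
    result = {"passed": 0, "failed": 0, "failures": []}
    for log in logs[:limit]:
        actual = func(**log["inputs"])
        if actual == log["return_value"]:
            result["passed"] += 1
        else:
            result["failed"] += 1
            result["failures"].append(
                {"uuid": log["uuid"], "expected": log["return_value"], "actual": actual}
            )
    return result


def calculate_total(user_id: str, price: int, tax: float):
    return price + (price * tax)


# Two hypothetical recorded executions; the second baseline no longer matches the code.
logs = [
    {"uuid": "a1", "inputs": {"user_id": "u1", "price": 100, "tax": 0.5}, "return_value": 150.0},
    {"uuid": "b2", "inputs": {"user_id": "u2", "price": 200, "tax": 0.25}, "return_value": 999.0},
]

report = replay(calculate_total, logs)
```

Exact equality is used here for simplicity; functions with non-deterministic or floating-point-sensitive output would need a looser comparison.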
580
+
581
+ #### Update Baseline
582
+
583
+ If the change in result is intentional due to logic modification, use the `update_baseline=True` option to save the current execution result as the new correct answer (Baseline) in the DB.
584
+
585
+ ```python
586
+ # Update the stored return value in the DB to the current function execution result.
587
+ replayer.replay("my_module.calculate_total", update_baseline=True)
588
+ ```
589
+
590
+ ### 2\. Data Archiving and Fine-tuning (Archiver)
591
+
592
+ **Manage database capacity and secure training datasets.**
593
+ Export old execution logs in **JSONL format** (suitable for LLM fine-tuning) or delete them from the database to save storage space.
594
+
595
+ ```python
+ from vectorwave.database.archiver import VectorWaveArchiver
+
+ archiver = VectorWaveArchiver()
+
+ # 1. Export to JSONL and Clear from DB (Export & Clear)
+ archiver.export_and_clear(
+     function_name="my_module.calculate_total",
+     output_file="data/training_dataset.jsonl",
+     clear_after_export=True  # Delete logs from DB after successful export
+ )
+
+ # 2. Delete Only (Purge)
+ archiver.export_and_clear(
+     function_name="my_module.calculate_total",
+     output_file="",
+     delete_only=True
+ )
+ ```
614
+
615
+ **Generated JSONL Example:**
616
+
617
+ ```json
618
+ {"messages": [{"role": "user", "content": "{\"price\": 100, \"tax\": 0.1}"}, {"role": "assistant", "content": "110.0"}]}
619
+ ```
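
A record in this shape can be produced with the standard `json` module. The helper below is a hypothetical sketch of the conversion, assuming the inputs are serialized as a JSON string for the user message and the return value is converted with `str()`; the archiver's actual serialization may differ.

```python
import json


def to_finetune_record(inputs, return_value):
    """Build one JSONL line in chat-message form from an execution log."""
    record = {
        "messages": [
            {"role": "user", "content": json.dumps(inputs)},
            {"role": "assistant", "content": str(return_value)},
        ]
    }
    # json.dumps emits a single line, which is what JSONL requires.
    return json.dumps(record)


line = to_finetune_record({"price": 100, "tax": 0.1}, 110.0)
parsed = json.loads(line)
```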
620
+
621
+ ## 🌊 Auto-Injection (Zero-Code Change Integration)
622
+
623
+ You don't need to modify your business logic to use VectorWave. Use `VectorWaveAutoInjector` to inject functionality externally.
624
+
625
+ ### How to use
626
+
627
+ 1. **Configure Global Settings**: Set default values like `team`, `priority`, or `auto` (pending mode).
628
+ 2. **Inject Modules**: Specify the target module path string.
629
+
630
+ ```python
+ from vectorwave import initialize_database, VectorWaveAutoInjector, generate_and_register_metadata
+
+ # 1. Initialize DB
+ initialize_database()
+
+ # 2. Configure AutoInjector (Global Settings)
+ VectorWaveAutoInjector.configure(
+     team="billing-team",
+     priority=1,
+     auto=True  # True: Collect metadata in memory (Pending), False: Save immediately to DB
+ )
+
+ # 3. Inject VectorWave into your module
+ # (No need to add @vectorize decorators in 'my_service.payment' code!)
+ VectorWaveAutoInjector.inject("my_service.payment")
+
+ # 4. Register Metadata (If auto=True)
+ generate_and_register_metadata()
+
+ # 5. Run your business logic
+ import my_service.payment
+ my_service.payment.process_transaction()
+ ```
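
External injection of this kind is commonly done by replacing a module's functions with wrapped versions. The sketch below shows the general technique on a throwaway module built in memory. It is not how `VectorWaveAutoInjector` is necessarily implemented; `inject`, `_wrap`, `CALLS`, and the `payment_demo` module are hypothetical.

```python
import functools
import inspect
import types

CALLS = []  # hypothetical in-memory execution log


def _wrap(func, tags):
    """Return a wrapper that logs each call before delegating to func."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        CALLS.append({"function_name": func.__name__, **tags})
        return func(*args, **kwargs)
    return wrapper


def inject(module, **tags):
    """Replace every function defined in module with a logging wrapper."""
    for name, obj in list(vars(module).items()):
        # Skip names that were merely imported into the module.
        if inspect.isfunction(obj) and obj.__module__ == module.__name__:
            setattr(module, name, _wrap(obj, tags))


# Build a stand-in for 'my_service.payment' without touching the filesystem.
payment = types.ModuleType("payment_demo")
exec("def process_transaction():\n    return 'done'\n", payment.__dict__)

inject(payment, team="billing-team")
outcome = payment.process_transaction()
```

One limitation of this technique is that code which ran `from module import func` before injection keeps a reference to the unwrapped function, so injection generally needs to happen before other modules import the target.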
654
+
655
+ ## 🌌 Ecosystem
656
+
657
+ VectorWave is part of a larger ecosystem designed to optimize the entire lifecycle of AI engineering, from observability to testing.
658
+
659
+ ### 🏄‍♂️ [VectorSurfer](https://github.com/cozymori/vectorsurfer)
660
+ > **Visualize your AI flows instantly.**
661
+ **VectorSurfer** is a comprehensive dashboard for VectorWave. It allows you to visualize execution traces, monitor error rates in real-time, and manage self-healing processes through an intuitive web interface.
662
+ * **Trace Visualization:** View complex execution flows (Spans) and latency waterfalls at a glance.
663
+ * **Error Monitoring:** Track error trends and inspect detailed failure logs.
664
+ * **Healer Interface:** Review and apply code fixes suggested by the VectorWave Healer.
665
+
666
+ ### ✅ [VectorCheck](https://github.com/cozymori/vectorcheck)
667
+ > **Test your AI with "Intent", not just strings.**
668
+ **VectorCheck** is an AI-native regression testing framework. Instead of brittle exact string matching (`assert a == b`), it uses vector similarity to validate if the AI's output matches the intended meaning ("Golden Data").
669
+ * **Semantic Assertions:** Pass tests if the output is *semantically* similar to the expected result, even if the wording differs.
670
+ * **Golden Data Replay:** Automatically fetch and replay successful production logs to verify new code changes.
671
+ * **CLI Dashboard:** Run tests and view results directly in your terminal with zero configuration.
672
+
673
+ ## 🤝 Contributing
674
+
675
+ We welcome all forms of contributions, including bug reports, feature requests, and code contributions. Please refer to [CONTRIBUTING.md](CONTRIBUTING.md) for details.
676
+
677
+ ## 📜 License
678
+
679
+ This project is distributed under the MIT License. Please check the [LICENSE](LICENSE) file for details.
680
+