databao-context-engine 0.1.2__py3-none-any.whl → 0.1.3__py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (86)
  1. databao_context_engine/__init__.py +18 -6
  2. databao_context_engine/build_sources/__init__.py +4 -0
  3. databao_context_engine/build_sources/{internal/build_runner.py → build_runner.py} +27 -23
  4. databao_context_engine/build_sources/build_service.py +53 -0
  5. databao_context_engine/build_sources/build_wiring.py +84 -0
  6. databao_context_engine/build_sources/export_results.py +41 -0
  7. databao_context_engine/build_sources/{internal/plugin_execution.py → plugin_execution.py} +3 -7
  8. databao_context_engine/cli/add_datasource_config.py +41 -15
  9. databao_context_engine/cli/commands.py +12 -43
  10. databao_context_engine/cli/info.py +2 -2
  11. databao_context_engine/databao_context_engine.py +137 -0
  12. databao_context_engine/databao_context_project_manager.py +96 -6
  13. databao_context_engine/datasources/add_config.py +34 -0
  14. databao_context_engine/{datasource_config → datasources}/check_config.py +18 -7
  15. databao_context_engine/datasources/datasource_context.py +93 -0
  16. databao_context_engine/{project → datasources}/datasource_discovery.py +17 -16
  17. databao_context_engine/{project → datasources}/types.py +64 -15
  18. databao_context_engine/init_project.py +25 -3
  19. databao_context_engine/introspection/property_extract.py +59 -30
  20. databao_context_engine/llm/errors.py +2 -8
  21. databao_context_engine/llm/install.py +13 -20
  22. databao_context_engine/llm/service.py +1 -3
  23. databao_context_engine/mcp/all_results_tool.py +2 -2
  24. databao_context_engine/mcp/mcp_runner.py +4 -2
  25. databao_context_engine/mcp/mcp_server.py +1 -4
  26. databao_context_engine/mcp/retrieve_tool.py +3 -11
  27. databao_context_engine/plugin_loader.py +111 -0
  28. databao_context_engine/pluginlib/build_plugin.py +25 -9
  29. databao_context_engine/pluginlib/config.py +16 -2
  30. databao_context_engine/plugins/databases/athena_introspector.py +85 -22
  31. databao_context_engine/plugins/databases/base_introspector.py +5 -3
  32. databao_context_engine/plugins/databases/clickhouse_introspector.py +22 -11
  33. databao_context_engine/plugins/databases/duckdb_introspector.py +1 -1
  34. databao_context_engine/plugins/databases/introspection_scope.py +11 -9
  35. databao_context_engine/plugins/databases/introspection_scope_matcher.py +2 -5
  36. databao_context_engine/plugins/databases/mssql_introspector.py +26 -17
  37. databao_context_engine/plugins/databases/mysql_introspector.py +23 -12
  38. databao_context_engine/plugins/databases/postgresql_introspector.py +2 -2
  39. databao_context_engine/plugins/databases/snowflake_introspector.py +43 -10
  40. databao_context_engine/plugins/plugin_loader.py +43 -42
  41. databao_context_engine/plugins/resources/parquet_introspector.py +2 -3
  42. databao_context_engine/project/info.py +31 -2
  43. databao_context_engine/project/init_project.py +16 -7
  44. databao_context_engine/project/layout.py +3 -3
  45. databao_context_engine/retrieve_embeddings/__init__.py +3 -0
  46. databao_context_engine/retrieve_embeddings/{internal/export_results.py → export_results.py} +2 -2
  47. databao_context_engine/retrieve_embeddings/{internal/retrieve_runner.py → retrieve_runner.py} +5 -9
  48. databao_context_engine/retrieve_embeddings/{internal/retrieve_service.py → retrieve_service.py} +3 -17
  49. databao_context_engine/retrieve_embeddings/retrieve_wiring.py +49 -0
  50. databao_context_engine/{serialisation → serialization}/yaml.py +1 -1
  51. databao_context_engine/services/chunk_embedding_service.py +23 -11
  52. databao_context_engine/services/factories.py +1 -46
  53. databao_context_engine/services/persistence_service.py +11 -11
  54. databao_context_engine/storage/connection.py +11 -7
  55. databao_context_engine/storage/exceptions/exceptions.py +2 -2
  56. databao_context_engine/storage/migrate.py +2 -4
  57. databao_context_engine/storage/migrations/V01__init.sql +6 -31
  58. databao_context_engine/storage/models.py +2 -23
  59. databao_context_engine/storage/repositories/chunk_repository.py +16 -12
  60. databao_context_engine/storage/repositories/factories.py +1 -12
  61. databao_context_engine/storage/repositories/vector_search_repository.py +8 -13
  62. databao_context_engine/system/properties.py +4 -2
  63. databao_context_engine-0.1.3.dist-info/METADATA +75 -0
  64. {databao_context_engine-0.1.2.dist-info → databao_context_engine-0.1.3.dist-info}/RECORD +68 -77
  65. databao_context_engine/build_sources/internal/build_service.py +0 -77
  66. databao_context_engine/build_sources/internal/build_wiring.py +0 -52
  67. databao_context_engine/build_sources/internal/export_results.py +0 -43
  68. databao_context_engine/build_sources/public/api.py +0 -4
  69. databao_context_engine/databao_engine.py +0 -85
  70. databao_context_engine/datasource_config/__init__.py +0 -0
  71. databao_context_engine/datasource_config/add_config.py +0 -50
  72. databao_context_engine/datasource_config/datasource_context.py +0 -60
  73. databao_context_engine/project/runs.py +0 -39
  74. databao_context_engine/retrieve_embeddings/internal/__init__.py +0 -0
  75. databao_context_engine/retrieve_embeddings/internal/retrieve_wiring.py +0 -29
  76. databao_context_engine/retrieve_embeddings/public/__init__.py +0 -0
  77. databao_context_engine/retrieve_embeddings/public/api.py +0 -3
  78. databao_context_engine/serialisation/__init__.py +0 -0
  79. databao_context_engine/services/run_name_policy.py +0 -8
  80. databao_context_engine/storage/repositories/datasource_run_repository.py +0 -136
  81. databao_context_engine/storage/repositories/run_repository.py +0 -157
  82. databao_context_engine-0.1.2.dist-info/METADATA +0 -187
  83. /databao_context_engine/{build_sources/internal → datasources}/__init__.py +0 -0
  84. /databao_context_engine/{build_sources/public → serialization}/__init__.py +0 -0
  85. {databao_context_engine-0.1.2.dist-info → databao_context_engine-0.1.3.dist-info}/WHEEL +0 -0
  86. {databao_context_engine-0.1.2.dist-info → databao_context_engine-0.1.3.dist-info}/entry_points.txt +0 -0
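
Several modules were promoted out of `internal`/`public` subpackages in this release (items 3, 7, 46–48 above), and the `serialisation` package was renamed to `serialization` (item 50). Downstream code that must tolerate both layouts can guard the old and new dotted paths with a small fallback helper. This is an illustrative sketch, not part of the package; the commented-out paths are taken from the rename entries above, and any symbols inside those modules are assumptions:

```python
import importlib
from types import ModuleType


def import_first(*candidates: str) -> ModuleType:
    """Return the first module from `candidates` that imports successfully."""
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    raise ImportError(f"none of {candidates} could be imported")


# Hypothetical usage for the rename in item 3 above — try the 0.1.3 path
# first, then fall back to the 0.1.2 path:
# build_runner = import_first(
#     "databao_context_engine.build_sources.build_runner",
#     "databao_context_engine.build_sources.internal.build_runner",
# )
```

The same pattern covers the `serialisation` → `serialization` rename; pinning an exact version range is usually the simpler fix when feasible.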
databao_context_engine/storage/repositories/run_repository.py
@@ -1,157 +0,0 @@
- from datetime import datetime
- from typing import Any, Optional
-
- import duckdb
-
- from databao_context_engine.services.run_name_policy import RunNamePolicy
- from databao_context_engine.storage.models import RunDTO
-
-
- class RunRepository:
-     def __init__(self, conn: duckdb.DuckDBPyConnection, run_name_policy: RunNamePolicy):
-         self._conn = conn
-         self._run_name_policy = run_name_policy
-
-     def create(
-         self, *, project_id: str, dce_version: Optional[str] = None, started_at: datetime | None = None
-     ) -> RunDTO:
-         if started_at is None:
-             started_at = datetime.now()
-         run_name = self._run_name_policy.build(run_started_at=started_at)
-
-         row = self._conn.execute(
-             """
-             INSERT INTO
-                 run (project_id, nemory_version, started_at, run_name)
-             VALUES
-                 (?, ?, ?, ?)
-             RETURNING
-                 *
-             """,
-             [project_id, dce_version, started_at, run_name],
-         ).fetchone()
-         if row is None:
-             raise RuntimeError("Run creation returned no object")
-         return self._row_to_dto(row)
-
-     def get(self, run_id: int) -> Optional[RunDTO]:
-         row = self._conn.execute(
-             """
-             SELECT
-                 *
-             FROM
-                 run
-             WHERE
-                 run_id = ?
-             """,
-             [run_id],
-         ).fetchone()
-         return self._row_to_dto(row) if row else None
-
-     def get_by_run_name(self, *, project_id: str, run_name: str) -> RunDTO | None:
-         row = self._conn.execute(
-             """
-             SELECT
-                 *
-             FROM
-                 run
-             WHERE
-                 run.project_id = ? AND run_name = ?
-             """,
-             [project_id, run_name],
-         ).fetchone()
-         return self._row_to_dto(row) if row else None
-
-     def get_latest_run_for_project(self, project_id: str) -> RunDTO | None:
-         row = self._conn.execute(
-             """
-             SELECT
-                 *
-             FROM
-                 run
-             WHERE
-                 run.project_id = ?
-             ORDER BY run.started_at DESC
-             LIMIT 1
-             """,
-             [project_id],
-         ).fetchone()
-         return self._row_to_dto(row) if row else None
-
-     def update(
-         self,
-         run_id: int,
-         *,
-         project_id: Optional[str] = None,
-         ended_at: Optional[datetime] = None,
-         dce_version: Optional[str] = None,
-     ) -> Optional[RunDTO]:
-         sets: list[Any] = []
-         params: list[Any] = []
-
-         if project_id is not None:
-             sets.append("project_id = ?")
-             params.append(project_id)
-         if ended_at is not None:
-             sets.append("ended_at = ?")
-             params.append(ended_at)
-         if dce_version is not None:
-             sets.append("nemory_version = ?")
-             params.append(dce_version)
-
-         if not sets:
-             return self.get(run_id)
-
-         params.append(run_id)
-         self._conn.execute(
-             f"""
-             UPDATE
-                 run
-             SET
-                 {", ".join(sets)}
-             WHERE
-                 run_id = ?
-             """,
-             params,
-         )
-
-         return self.get(run_id)
-
-     def delete(self, run_id: int) -> int:
-         row = self._conn.execute(
-             """
-             DELETE FROM
-                 run
-             WHERE
-                 run_id = ?
-             RETURNING
-                 run_id
-             """,
-             [run_id],
-         ).fetchone()
-         return 1 if row else 0
-
-     def list(self) -> list[RunDTO]:
-         rows = self._conn.execute(
-             """
-             SELECT
-                 *
-             FROM
-                 run
-             ORDER BY
-                 run_id DESC
-             """
-         ).fetchall()
-         return [self._row_to_dto(r) for r in rows]
-
-     @staticmethod
-     def _row_to_dto(row: tuple) -> RunDTO:
-         run_id, project_id, started_at, ended_at, dce_version, run_name = row
-         return RunDTO(
-             run_id=int(run_id),
-             run_name=run_name,
-             project_id=str(project_id),
-             started_at=started_at,
-             ended_at=ended_at,
-             nemory_version=dce_version,
-         )
databao_context_engine-0.1.2.dist-info/METADATA
@@ -1,187 +0,0 @@
- Metadata-Version: 2.3
- Name: databao-context-engine
- Version: 0.1.2
- Summary: Add your description here
- Requires-Dist: click>=8.3.0
- Requires-Dist: duckdb>=1.4.3
- Requires-Dist: pyyaml>=6.0.3
- Requires-Dist: requests>=2.32.5
- Requires-Dist: pymysql>=1.1.2
- Requires-Dist: clickhouse-connect>=0.10.0
- Requires-Dist: mcp>=1.23.3
- Requires-Dist: pyathena>=3.22.0
- Requires-Dist: snowflake-connector-python>=4.1.0
- Requires-Dist: pydantic>=2.12.4
- Requires-Dist: jinja2>=3.1.6
- Requires-Dist: asyncpg>=0.31.0
- Requires-Dist: asyncio>=4.0.0
- Requires-Dist: asyncpg-stubs>=0.31.1
- Requires-Dist: mssql-python>=1.0.0 ; extra == 'mssql'
- Requires-Python: >=3.12
- Provides-Extra: mssql
- Description-Content-Type: text/markdown
-
- [![official project](https://jb.gg/badges/official.svg)](https://confluence.jetbrains.com/display/ALL/JetBrains+on+GitHub)
- [![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](https://github.com/JetBrains/databao-context-engine/blob/main/LICENSE)
-
- [//]: # ([![PyPI version](https://img.shields.io/pypi/v/databao-context-engine.svg)](https://pypi.org/project/databao-context-engine))
-
- [//]: # ([![Python versions](https://img.shields.io/pypi/pyversions/databao-context-engine.svg)](https://pypi.org/project/databao-context-engine/))
-
-
- <h1 align="center">Databao Context Engine</h1>
- <p align="center">
-   <b>Semantic context for your LLMs — generated automatically.</b><br/>
-   No more copying schemas. No manual documentation. Just accurate answers.
- </p>
- <p align="center">
-   <a href="https://databao.app">Website</a>
-
- [//]: # (•)
-
- [//]: # ( <a href="#quickstart">Quickstart</a> •)
-
- [//]: # ( <a href="#supported-data-sources">Data Sources</a> •)
-
- [//]: # ( <a href="#contributing">Contributing</a>)
- </p>
-
- ---
-
- ## What is Databao Context Engine?
-
- Databao Context Engine **automatically generates governed semantic context** from your databases, BI tools, documents, and spreadsheets.
-
- Integrate it with any LLM to deliver **accurate, context-aware answers** — without copying schemas or writing documentation by hand.
-
- ```
- Your data sources → Context Engine → Unified semantic graph → Any LLM
- ```
-
- ## Why choose Databao Context Engine?
-
- | Feature | What it means for you |
- |----------------------------|----------------------------------------------------------------|
- | **Auto-generated context** | Extracts schemas, relationships, and semantics automatically |
- | **Runs locally** | Your data never leaves your environment |
- | **MCP integration** | Works with Claude Desktop, Cursor, and any MCP-compatible tool |
- | **Multiple sources** | Databases, dbt projects, spreadsheets, documents |
- | **Built-in benchmarks** | Measure and improve context quality over time |
- | **LLM agnostic** | OpenAI, Anthropic, Ollama, Gemini — use any model |
- | **Governed & versioned** | Track, version, and share context across your team |
- | **Dynamic or static** | Serve context via MCP server or export as artifact |
-
- # Prerequisites
-
- This README assumes you will use `uv` as your package manager.
-
- You can install it following the instructions [here](https://docs.astral.sh/uv/getting-started/installation/)
-
- If you are going to push to the repository, please make sure to install git pre-commit hooks by running
-
- ```bash
- uv run pre-commit install
- ```
-
- # How to run?
-
- You can run it with:
-
- ```bash
- uv run dce info
- ```
-
- Not providing the `info` subcommand or using the `--help` flag will show the help screen for the command.
-
- ## Using the dce command directly
-
- To be able to use the `dce` command directly (without using `uv run` or `python`) there are two options.
-
- ### Installing dce locally
-
- For that one needs to:
-
- 1. Build the project by running
-
- ```bash
- uv build
- ```
-
- 2. Installing the project on our machine by running:
-
- ```bash
- uv tool install -e .
- ```
-
- This second step will install the `dce` script on your machine and add it into your path.
-
- ### Create dce alias using nix
-
- This method will simply create a new shell environment with `dce` alias. For that one needs to install `nix` package
- manager (https://nixos.org/download/). After that one could simply run in the project root
-
- ```bash
- $ nix-shell
- ```
-
- which is a short version of `$ nix-shell shell.nix`.
-
- Alternatively, one could specify the path to the project repository
-
- ```bash
- $ nix-shell {path_to_dce_repository}
- ```
-
- After that, you can then directly use:
-
- ```bash
- dce --help
- ```
-
- Note: when we actually release our built Python package, users that don't use `uv` will still be able to install the CLI
- by using `pipx install` instead.
-
- # Running Mypy
-
- [mypy](https://mypy.readthedocs.io/en/stable/getting_started.html) has been added to the project for type checking.
-
- You can run it with the following:
-
- ```bash
- uv run mypy src --exclude "test_*" --exclude dist
- ```
-
- NB: the above runs type checking on all files within the `src` directory, excluding all test files.
-
- # Running tests
-
- You can run the tests with:
-
- ```bash
- uv run pytest
- ```
-
- (there is currently one test succeeding and one test failing in the project)
-
- # Generating JSON Schemas for our plugin's config files
-
- To be able to build a datasource, each plugin requires a yaml config file that describes how to connect to the
- datasource,
- as well as other information needed to customise the plugin.
-
- To document what each config file should look like, we can generate a JSON schema describing the fields allowed in that
- file.
-
- You can generate all JSON schemas for all plugins by running:
-
- ```bash
- uv run generate_configs_schemas
- ```
-
- Some options can be provided to the command to choose which plugins to include or exclude from the generation.
- To see the options available, you can refer to the help:
-
- ```bash
- uv run generate_configs_schemas --help
- ```
-