knowledgetree-rag 0.1.0__py3-none-any.whl

@@ -0,0 +1,267 @@
1
+ Metadata-Version: 2.4
2
+ Name: knowledgetree-rag
3
+ Version: 0.1.0
4
+ Summary: Knowledge-graph-powered RAG sidecar for infrastructure querying
5
+ Author: Knowledge Tree
6
+ License-Expression: MIT
7
+ Project-URL: Homepage, https://github.com/knowledgetree-dev/kt-rag
8
+ Requires-Python: >=3.10
9
+ Description-Content-Type: text/markdown
10
+ License-File: LICENSE
11
+ Requires-Dist: lightrag-hku[api]>=1.4.0
12
+ Requires-Dist: sentence-transformers>=3.0.0
13
+ Requires-Dist: transformers>=4.40.0
14
+ Requires-Dist: httpx>=0.27.0
15
+ Requires-Dist: structlog>=24.1.0
16
+ Requires-Dist: python-dotenv>=1.0.0
17
+ Requires-Dist: tenacity>=8.2.0
18
+ Requires-Dist: asyncpg>=0.29.0
19
+ Dynamic: license-file
20
+
21
+ <div align="center">
22
+ <h1>kt-rag</h1>
23
+ <p>Knowledge-graph-powered RAG sidecar for infrastructure querying</p>
24
+ </div>
25
+
26
+ <p align="center">
27
+ <a href="https://github.com/knowledgetree-dev/kt-rag/releases"><img src="https://img.shields.io/github/v/release/knowledgetree-dev/kt-rag?style=flat&label=version" alt="Version"></a>
28
+ <a href="https://github.com/knowledgetree-dev/kt-rag/actions/workflows/ci.yml"><img src="https://img.shields.io/github/actions/workflow/status/knowledgetree-dev/kt-rag/ci.yml?branch=main&style=flat&label=CI" alt="CI"></a>
29
+ <a href="LICENSE"><img src="https://img.shields.io/badge/license-MIT-blue.svg?style=flat" alt="MIT License"></a>
30
+ <a href="https://pypi.org/project/knowledgetree-rag/"><img src="https://img.shields.io/pypi/v/knowledgetree-rag?style=flat&label=PyPI" alt="PyPI"></a>
31
+ </p>
32
+
33
+ ---
34
+
35
+ Turn your infrastructure graph into a searchable knowledge base. kt-rag feeds discovered infrastructure data — services, databases, Kubernetes clusters, DNS zones, CI/CD pipelines — into a [LightRAG](https://github.com/HKUDS/LightRAG) knowledge graph, then lets you ask questions in plain English.
36
+
37
+ **What you can do:**
38
+
39
+ - "Which services depend on Postgres?" — get a structured answer with confidence scoring
40
+ - "What's the blast radius if us-east-1 goes down?" — trace dependencies across providers
41
+ - "Show me all EC2 instances tagged `production` in eu-west-1" — filter by metadata
42
+ - "Generate a runbook for the auth service" — auto-document from live graph data
43
+ - "What changed in my infrastructure since last week?" — incremental sync awareness
44
+
45
+ ## How It Works
46
+
47
+ ```
48
+ ┌─────────────┐    ┌──────────┐    ┌──────────┐    ┌──────┐
49
+ │  Discovery  │───▶│  kt-rag  │───▶│ LightRAG │───▶│ LLM  │
50
+ │   Plugins   │    │ seed/sync│    │ KG Store │    │      │
51
+ └─────────────┘    └──────────┘    └──────────┘    └──────┘
52
+
53
+                    ┌────┴────┐     ┌─────┴──────┐
54
+                    │   API   │     │ Query CLI  │
55
+                    │ Server  │     │ / Library  │
56
+                    └─────────┘     └────────────┘
57
+ ```
58
+
59
+ 1. **Plugins discover** your infrastructure (AWS, GitHub, K8s, etc.)
60
+ 2. **kt-rag seeds** entities and relationships into LightRAG's graph store
61
+ 3. **LightRAG** indexes everything for hybrid search (graph + vector)
62
+ 4. **Your question** triggers retrieval-augmented generation through an LLM
63
+
64
+ ## Quick Start
65
+
66
+ ### Install
67
+
68
+ ```bash
69
+ pip install knowledgetree-rag
70
+ ```
71
+
72
+ Or from source:
73
+
74
+ ```bash
75
+ git clone https://github.com/knowledgetree-dev/kt-rag.git
76
+ cd kt-rag
77
+ pip install -e .
78
+ ```
79
+
80
+ ### Configure
81
+
82
+ Copy the template and set your LLM API key:
83
+
84
+ ```bash
85
+ cp .env.example .env
86
+ # Edit .env with your LLM provider and API key
87
+ ```
88
+
89
+ Minimal `.env`:
90
+
91
+ ```ini
92
+ LLM_BINDING_API_KEY=sk-...
93
+ ```
94
+
95
+ ### Seed the graph
96
+
97
+ Populate LightRAG from your infrastructure data:
98
+
99
+ ```bash
100
+ kt-rag-seed
101
+ ```
102
+
103
+ ### Query
104
+
105
+ ```bash
106
+ kt-rag-query "which services depend on postgres?"
107
+ ```
108
+
109
+ Example output:
110
+
111
+ ```
112
+ Response: The following services depend on Postgres:
113
+ - auth-service (production, us-east-1)
114
+ - user-api (staging, eu-west-1)
115
+ - analytics-backend (production, us-west-2)
116
+ - docs-service (production, us-east-1)
117
+
118
+ Confidence: 0.82
119
+ ```
120
+
121
+ ### Start the API server
122
+
123
+ ```bash
124
+ kt-rag-server
125
+ # Listening on http://0.0.0.0:8085
126
+ ```
127
+
128
+ ```bash
129
+ curl -s http://localhost:8085/api/v1/rag/health
130
+ ```
131
+
132
+ ## CLI Commands
133
+
134
+ | Command | Description |
135
+ |---------|-------------|
136
+ | `kt-rag-seed` | Full seed from Knowledge Tree into LightRAG |
137
+ | `kt-rag-query "..."` | Natural-language query (supports `--filter`, `--min-score`, `--profile`) |
138
+ | `kt-rag-server` | REST API on port 8085 |
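
The `--filter` option narrows a query to resources whose metadata matches. The package's tests exercise a `MetadataFilter.filter(provider=..., type_prefix=..., tags=..., region=...)` call that returns a set of resource IDs. The following is a minimal sketch of that behaviour, not the package's implementation: `ResourceMeta` here is a stand-in mirroring the constructor arguments seen in the tests, `filter_ids` is a hypothetical helper, and requiring every requested tag to be present is an assumption.

```python
from dataclasses import dataclass, field
from typing import Iterable, Optional


@dataclass
class ResourceMeta:
    """Stand-in record; field names follow the package's test fixtures."""
    id: str
    provider: str
    type: str
    region: Optional[str] = None
    tags: set = field(default_factory=set)


def filter_ids(resources: Iterable[ResourceMeta], provider=None,
               type_prefix=None, tags=None, region=None) -> set:
    """Return IDs of resources matching every criterion that was supplied."""
    matched = set()
    for r in resources:
        if provider is not None and r.provider != provider:
            continue
        if type_prefix is not None and not r.type.startswith(type_prefix):
            continue
        if region is not None and r.region != region:
            continue
        # Assumption: a resource must carry all requested tags.
        if tags and not set(tags) <= r.tags:
            continue
        matched.add(r.id)
    return matched
```

Criteria combine with AND, and calling `filter_ids(resources)` with no criteria returns every ID, which matches what the package's filter tests expect.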
139
+
140
+ ## API
141
+
142
+ ### Query
143
+
144
+ ```http
145
+ POST /api/v1/rag/query
146
+ {
147
+   "question": "which services depend on postgres?",
148
+   "mode": "hybrid",
149
+   "filter": {"provider": "aws", "region": "us-east-1"},
150
+   "profile_name": "claude"
151
+ }
152
+ ```
153
+
154
+ **Response:**
155
+
156
+ ```json
157
+ {
157
+   "response": "The following services depend on Postgres...",
158
+   "confidence": 0.82,
159
+   "metadata": {
160
+     "mode": "hybrid",
161
+     "profile": "claude",
162
+     "tokens_used": 1247
163
+   }
165
+ }
166
+ ```
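
The request and response shapes above can be handled from Python roughly as follows. This is a minimal sketch using only the standard library; `build_query_request`, `parse_query_response`, and `QueryResult` are hypothetical names rather than part of kt-rag, and the field names are taken from the JSON examples above. The request is built but not sent.

```python
import json
import urllib.request
from dataclasses import dataclass

BASE_URL = "http://localhost:8085"  # default server port from the README


@dataclass
class QueryResult:
    response: str
    confidence: float
    metadata: dict


def build_query_request(question, mode="hybrid", filter_by=None, profile_name=None):
    """Build (but do not send) a POST request for /api/v1/rag/query."""
    payload = {"question": question, "mode": mode}
    if filter_by:
        payload["filter"] = filter_by
    if profile_name:
        payload["profile_name"] = profile_name
    return urllib.request.Request(
        f"{BASE_URL}/api/v1/rag/query",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_query_response(body):
    """Parse a response body shaped like the example above."""
    data = json.loads(body)
    return QueryResult(
        response=data["response"],
        confidence=float(data["confidence"]),
        metadata=data.get("metadata", {}),
    )
```

Against a running server, the request object could be passed to `urllib.request.urlopen` and the returned bytes to `parse_query_response`.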
167
+
168
+ ### Profile Management
169
+
170
+ ```http
171
+ # List profiles
172
+ GET /api/v1/rag/profiles
173
+
174
+ # Create profile
175
+ POST /api/v1/rag/profiles
176
+ {
177
+   "name": "claude",
178
+   "provider": "anthropic",
179
+   "api_key": "sk-ant-...",
180
+   "model": "claude-sonnet-4-20250514"
181
+ }
182
+
183
+ # Delete profile
184
+ DELETE /api/v1/rag/profiles/{name}
185
+ ```
186
+
187
+ ## Multi-LLM Profiles
188
+
189
+ Switch between LLM providers at query time:
190
+
191
+ ```bash
192
+ # Named profiles (persisted)
193
+ kt-rag-query "what's my blast radius?" --profile claude
194
+
195
+ # Ad-hoc via headers (server mode)
196
+ curl -H "X-KT-LLM-Provider: anthropic" \
197
+      -H "X-KT-LLM-Api-Key: sk-ant-..." \
198
+      -d '{"question": "..."}' \
199
+      http://localhost:8085/api/v1/rag/query
200
+ ```
201
+
202
+ ## Configuration
203
+
204
+ All config via environment variables or `.env`:
205
+
206
+ | Variable | Default | Description |
207
+ |----------|---------|-------------|
208
+ | `LLM_BINDING` | `openai` | LLM provider (`openai`, `anthropic`, `ollama`, `azure`) |
209
+ | `LLM_BINDING_API_KEY` | — | API key |
210
+ | `LLM_BINDING_MODEL` | `gpt-4o` | Model name |
211
+ | `EMBEDDING_BINDING` | `openai` | Embedding provider |
212
+ | `EMBEDDING_BINDING_MODEL` | `text-embedding-3-small` | Embedding model |
213
+ | `LIGHTRAG_DIR` | `./output` | LightRAG working directory |
214
+ | `LIGHTRAG_PORT` | `8085` | API server port |
215
+ | `KT_API_URL` | `http://localhost:8080` | Knowledge Tree API URL |
216
+ | `KT_API_TOKEN` | — | Knowledge Tree auth token |
217
+ | `PG_URI` | `postgresql://localhost:5432/knowledgetree` | Database (read-only seed) |
218
+ | `SYNC_INTERVAL` | `300` | Sync interval in seconds |
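
The defaults in the table above could be loaded along these lines. This is a sketch, not the package's `config.py`: `SketchConfig` and `load_config` are hypothetical names, only a subset of the variables is covered, and the attribute names `llm_binding`, `lightrag_port`, and `sync_interval` are borrowed from the package's config test.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class SketchConfig:
    llm_binding: str
    llm_model: str
    lightrag_dir: str
    lightrag_port: int
    kt_api_url: str
    sync_interval: int
    llm_api_key: Optional[str] = None


def load_config(env: Optional[Mapping[str, str]] = None) -> SketchConfig:
    """Read settings from a mapping (os.environ by default), applying the documented defaults."""
    env = os.environ if env is None else env
    return SketchConfig(
        llm_binding=env.get("LLM_BINDING", "openai"),
        llm_model=env.get("LLM_BINDING_MODEL", "gpt-4o"),
        lightrag_dir=env.get("LIGHTRAG_DIR", "./output"),
        lightrag_port=int(env.get("LIGHTRAG_PORT", "8085")),
        kt_api_url=env.get("KT_API_URL", "http://localhost:8080"),
        sync_interval=int(env.get("SYNC_INTERVAL", "300")),
        llm_api_key=env.get("LLM_BINDING_API_KEY"),  # no default: must be supplied
    )
```

Passing the environment as a mapping keeps the loader easy to test without mutating `os.environ`.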
219
+
220
+ ## Confidence Scoring
221
+
222
+ Every query response includes a 0.0–1.0 confidence score based on:
223
+
224
+ - **Retrieval quality** (30%) — how well the query matched stored entities
225
+ - **Context coverage** (30%) — how completely the retrieved data covers the question
226
+ - **Factual support** (40%) — whether the LLM's answer is grounded in retrieved facts
227
+
228
+ Set a minimum threshold:
229
+
230
+ ```bash
231
+ kt-rag-query "what repos are production?" --min-score 0.6
232
+ ```
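
The weighting above amounts to a weighted sum of the three components. The following sketch shows the arithmetic only; `overall_confidence` and `meets_min_score` are hypothetical helpers, the component keys `retrieval`, `coverage`, and `factual` follow the names used in the package's tests, and treating a missing component as 0.0 is an assumption.

```python
# Weights as documented: retrieval 30%, coverage 30%, factual support 40%.
WEIGHTS = {"retrieval": 0.30, "coverage": 0.30, "factual": 0.40}


def overall_confidence(components: dict) -> float:
    """Combine per-component scores (each clamped to 0.0-1.0) into one score."""
    total = 0.0
    for name, weight in WEIGHTS.items():
        value = components.get(name, 0.0)  # assumption: missing counts as 0.0
        value = min(1.0, max(0.0, value))
        total += weight * value
    return round(total, 4)


def meets_min_score(score: float, min_score: float) -> bool:
    """True when a score reaches a threshold such as the one given to --min-score."""
    return score >= min_score
```

For example, components of 0.9, 0.8, and 0.85 give 0.27 + 0.24 + 0.34 = 0.85, which would pass a `--min-score 0.6` threshold.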
233
+
234
+ ## Contributing
235
+
236
+ PRs are welcome. For feature requests or major changes, open an issue first.
237
+
238
+ ```bash
239
+ # Dev setup
240
+ python -m venv .venv && source .venv/bin/activate
241
+ pip install -e ".[dev]"
242
+
243
+ # Run tests
244
+ pytest tests/ -v --tb=short
245
+
246
+ # Type check
247
+ mypy .
248
+ ```
249
+
250
+ ## Project Structure
251
+
252
+ ```
253
+ ├── seed.py            # Full graph seed
254
+ ├── sync.py            # Incremental sync
255
+ ├── query.py           # CLI query entry point
256
+ ├── server.py          # FastAPI server
257
+ ├── config.py          # Environment config loader
258
+ ├── profile_store.py   # LLM provider profile persistence
259
+ ├── scorer.py          # Confidence scoring
260
+ ├── kt_client.py       # Knowledge Tree API client
261
+ ├── tests/             # Test suite
262
+ └── .env.example       # Config template
263
+ ```
264
+
265
+ ## License
266
+
267
+ MIT
@@ -0,0 +1,18 @@
1
+ knowledgetree_rag-0.1.0.dist-info/licenses/LICENSE,sha256=lzv8ZruH3OjN-UbQbSl7Qf0MEtQgaoZbV_-BdWuUwKo,1071
2
+ tests/test_config.py,sha256=ib1MJj8Cf292Hz8Cs8VfRaQP4xsJH_e3q_G4VGMQP0M,323
3
+ tests/test_kt_client.py,sha256=vCoPpp_9IM1xd_goBVUcQkeYFPikNuQ5ivsu8StHB3M,453
4
+ tests/test_kt_client_async.py,sha256=7GNmmpiPT-MKli0FbAMg8eLC_EzG_JcYnTK3ifoAqZY,4419
5
+ tests/test_metadata_filter.py,sha256=NCfTf2cc9xyQr8fP_Ttd9IqEEE89Uqqo81MsqkPiF1E,2147
6
+ tests/test_query.py,sha256=pjuEm92i4-A3wZ740H4t99mYsAha2k8rnlYbIhesnHU,4316
7
+ tests/test_rag_factory.py,sha256=prcL0r03ELdgLYbPRmmmyGby-myqdIUG7Mgwwtv6ueQ,1290
8
+ tests/test_scorer.py,sha256=Ydlr0QeygpgliB7zsV0j-8Sd6Jm4wBZqgszeAeMtS5U,2794
9
+ tests/test_seed.py,sha256=-C1YRcogpNQzWkvLFroV1wo5la98kMcVS6-rwwLFOaU,1062
10
+ tests/test_seed_async.py,sha256=eZv1usiiTOqiIHs_lUlOna3t0jIs8cjFAyHuM3-QnO8,3310
11
+ tests/test_sync.py,sha256=qyrYhTF06X0uxdQEIf_d314acoELR7dTF0FrFlINrOA,370
12
+ tests/test_sync_async.py,sha256=Zrg0PtzGnityjE7uKQ70B1W37Zf6x3Z5ZBzU_ifXq7I,3215
13
+ tests/test_sync_service.py,sha256=uZ38Jpvv3BXTh_mj-exaCHhqb_adoJNILBxYDr5tvpA,1777
14
+ knowledgetree_rag-0.1.0.dist-info/METADATA,sha256=vVR7uwFDcVKAy90Sz0Vjg2hG16RhI8I6dW2v9NNRNuE,7879
15
+ knowledgetree_rag-0.1.0.dist-info/WHEEL,sha256=aeYiig01lYGDzBgS8HxWXOg3uV61G9ijOsup-k9o1sk,91
16
+ knowledgetree_rag-0.1.0.dist-info/entry_points.txt,sha256=B-5MVjker0KsIEx5tv6Vr8y7STxfeaHinu63U-WKDdw,96
17
+ knowledgetree_rag-0.1.0.dist-info/top_level.txt,sha256=EdW7283x-lr_cbuivW8Ij7ANAP-ZJ9sLtILQseEPxXg,6
18
+ knowledgetree_rag-0.1.0.dist-info/RECORD,,
@@ -0,0 +1,5 @@
1
+ Wheel-Version: 1.0
2
+ Generator: setuptools (82.0.1)
3
+ Root-Is-Purelib: true
4
+ Tag: py3-none-any
5
+
@@ -0,0 +1,4 @@
1
+ [console_scripts]
2
+ kt-rag-query = query:main
3
+ kt-rag-seed = seed:main
4
+ kt-rag-server = server:main
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2025 Knowledge Tree
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
@@ -0,0 +1 @@
1
+ tests
tests/test_config.py ADDED
@@ -0,0 +1,13 @@
1
+ """Tests for config module."""
2
+
3
+ from __future__ import annotations
4
+
5
+ from config import Config
6
+
7
+
8
+ def test_config_defaults():
9
+     """Config returns sensible defaults when env vars are unset."""
10
+     cfg = Config()
11
+     assert cfg.llm_binding == "openai"
12
+     assert cfg.lightrag_port == 8085
13
+     assert cfg.sync_interval == 300
@@ -0,0 +1,16 @@
1
+ """Tests for kt_client module."""
2
+
3
+ from __future__ import annotations
4
+
5
+ import pytest
6
+
7
+ from kt_client import KTClient, KTResource, KTEdge
8
+
9
+
10
+ @pytest.mark.asyncio
11
+ async def test_kt_client_initialization():
12
+     """Client can be initialized with default or custom URLs."""
13
+     client = KTClient(base_url="http://test:8080", token="test-token")
14
+     assert client.base_url == "http://test:8080"
15
+     assert client.token == "test-token"
16
+     await client.close()
@@ -0,0 +1,147 @@
1
+ """Async tests for KTClient methods — HTTP calls mocked."""
2
+
3
+ from __future__ import annotations
4
+
5
+ from unittest.mock import AsyncMock, MagicMock, patch
6
+
7
+ import httpx
8
+ import pytest
9
+
10
+
11
+ @pytest.fixture
12
+ def mock_http_client():
13
+     """Return a patched async HTTP client so no real connection is made."""
14
+     with patch("httpx.AsyncClient") as mock:
15
+         instance = AsyncMock()
16
+         mock.return_value = instance
17
+         yield instance
18
+
19
+
20
+ def _mock_response(data: dict, status: int = 200):
21
+     """Build a mock httpx.Response; .json() is synchronous in httpx."""
22
+     resp = MagicMock(spec=httpx.Response)
23
+     resp.json.return_value = data
24
+     resp.raise_for_status.return_value = None
25
+     resp.status_code = status
26
+     return resp
27
+
28
+
29
+ @pytest.mark.asyncio
30
+ async def test_get_graph_export_parses_nodes_and_edges(mock_http_client):
31
+     """get_graph_export() returns correct KTGraph from API response."""
32
+     from kt_client import KTClient
33
+
34
+     mock_http_client.get.return_value = _mock_response({
35
+         "data": {
36
+             "nodes": [
37
+                 {
38
+                     "id": "github:repo:test/a",
39
+                     "type": "github.repository",
40
+                     "name": "repo-a",
41
+                     "provider": "github",
42
+                     "properties": {"visibility": "public"},
43
+                     "tags": ["production"],
44
+                 },
45
+             ],
46
+             "edges": [
47
+                 {
48
+                     "source_id": "github:repo:test/a",
49
+                     "target_id": "github:org:test",
50
+                     "type": "BELONGS_TO",
51
+                 },
52
+             ],
53
+         }
54
+     })
55
+
56
+     client = KTClient(base_url="http://test:8080", token="test-token")
57
+     graph = await client.get_graph_export()
58
+
59
+     assert len(graph.resources) == 1
60
+     assert graph.resources[0].id == "github:repo:test/a"
61
+     assert graph.resources[0].type == "github.repository"
62
+
63
+     assert len(graph.edges) == 1
64
+     assert graph.edges[0].type == "BELONGS_TO"
65
+
66
+     await client.close()
67
+
68
+
69
+ @pytest.mark.asyncio
70
+ async def test_get_graph_export_handles_unwrapped_body(mock_http_client):
71
+     """get_graph_export() works when API returns data directly (no "data" wrapper)."""
72
+     from kt_client import KTClient
73
+
74
+     mock_http_client.get.return_value = _mock_response({
75
+         "nodes": [
76
+             {"id": "github:repo:a", "type": "github.repository", "name": "a", "provider": "github"},
77
+         ],
78
+         "edges": [],
79
+     })
80
+
81
+     client = KTClient(base_url="http://test:8080", token="test-token")
82
+     graph = await client.get_graph_export()
83
+     assert len(graph.resources) == 1
84
+
85
+     await client.close()
86
+
87
+
88
+ @pytest.mark.asyncio
89
+ async def test_get_resources_returns_resource_list(mock_http_client):
90
+     """get_resources() parses paginated resource results."""
91
+     from kt_client import KTClient
92
+
93
+     mock_http_client.get.return_value = _mock_response({
94
+         "data": {
95
+             "results": [
96
+                 {
97
+                     "id": "github:repo:b",
98
+                     "type": "github.repository",
99
+                     "name": "repo-b",
100
+                     "provider": "github",
101
+                     "properties": {},
102
+                 },
103
+             ],
104
+         }
105
+     })
106
+
107
+     client = KTClient(base_url="http://test:8080", token="test-token")
108
+     resources = await client.get_resources()
109
+     assert len(resources) == 1
110
+     assert resources[0].name == "repo-b"
111
+
112
+     await client.close()
113
+
114
+
115
+ @pytest.mark.asyncio
116
+ async def test_get_changes_returns_change_list(mock_http_client):
117
+     """get_changes() returns raw change dicts."""
118
+     from kt_client import KTClient
119
+
120
+     mock_http_client.get.return_value = _mock_response({
121
+         "data": [
122
+             {"resourceName": "github:repo:a", "changeType": "UPDATED"},
123
+         ],
124
+     })
125
+
126
+     client = KTClient(base_url="http://test:8080", token="test-token")
127
+     changes = await client.get_changes(since="2026-01-01")
128
+     assert len(changes) == 1
129
+     assert changes[0]["changeType"] == "UPDATED"
130
+
131
+     await client.close()
132
+
133
+
134
+ @pytest.mark.asyncio
135
+ async def test_client_http_error_raises(mock_http_client):
136
+     """KTClient raises on HTTP errors."""
137
+     from kt_client import KTClient
138
+
139
+     mock_http_client.get.side_effect = httpx.HTTPStatusError(
140
+         "404", request=MagicMock(), response=MagicMock()
141
+     )
142
+
143
+     client = KTClient(base_url="http://test:8080", token="test-token")
144
+     with pytest.raises(httpx.HTTPStatusError):
145
+         await client.get_graph_export()
146
+
147
+     await client.close()
@@ -0,0 +1,65 @@
1
+ """Tests for metadata_filter module."""
2
+
3
+ from __future__ import annotations
4
+
5
+ import pytest
6
+
7
+ from metadata_filter import MetadataFilter, ResourceMeta
8
+
9
+
10
+ @pytest.fixture
11
+ def populated_filter():
12
+     """A MetadataFilter loaded with a few test resources."""
13
+     mf = MetadataFilter()
14
+     resources = [
15
+         ResourceMeta(id="github:repo:a", provider="github", type="github.repository", tags={"production"}),
16
+         ResourceMeta(id="github:repo:b", provider="github", type="github.repository", tags={"staging"}),
17
+         ResourceMeta(id="aws:ec2:i-123", provider="aws", type="aws.ec2", region="eu-west-1", tags={"production"}),
18
+         ResourceMeta(id="aws:rds:db-456", provider="aws", type="aws.rds", region="eu-west-1", tags={"production"}),
19
+     ]
20
+     # synchronous load bypass
21
+     mf._resources = {r.id: r for r in resources}
22
+     return mf
23
+
24
+
25
+ @pytest.mark.asyncio
26
+ async def test_filter_by_provider(populated_filter):
27
+     ids = populated_filter.filter(provider="github")
28
+     assert ids == {"github:repo:a", "github:repo:b"}
29
+
30
+
31
+ @pytest.mark.asyncio
32
+ async def test_filter_by_type(populated_filter):
33
+     ids = populated_filter.filter(type_prefix="aws.ec2")
34
+     assert ids == {"aws:ec2:i-123"}
35
+
36
+
37
+ @pytest.mark.asyncio
38
+ async def test_filter_by_tags(populated_filter):
39
+     ids = populated_filter.filter(tags=["production"])
40
+     assert ids == {"github:repo:a", "aws:ec2:i-123", "aws:rds:db-456"}
41
+
42
+
43
+ @pytest.mark.asyncio
44
+ async def test_filter_by_region(populated_filter):
45
+     ids = populated_filter.filter(region="eu-west-1")
46
+     assert ids == {"aws:ec2:i-123", "aws:rds:db-456"}
47
+
48
+
49
+ @pytest.mark.asyncio
50
+ async def test_filter_combined(populated_filter):
51
+     ids = populated_filter.filter(provider="aws", tags=["production"])
52
+     assert ids == {"aws:ec2:i-123", "aws:rds:db-456"}
53
+
54
+
55
+ @pytest.mark.asyncio
56
+ async def test_filter_no_match(populated_filter):
57
+     ids = populated_filter.filter(provider="azure")
58
+     assert ids == set()
59
+
60
+
61
+ @pytest.mark.asyncio
62
+ async def test_filter_no_args(populated_filter):
63
+     """No filters returns all IDs."""
64
+     ids = populated_filter.filter()
65
+     assert ids == {"github:repo:a", "github:repo:b", "aws:ec2:i-123", "aws:rds:db-456"}
tests/test_query.py ADDED
@@ -0,0 +1,136 @@
1
+ """Tests for query module — all externals mocked."""
2
+
3
+ from __future__ import annotations
4
+
5
+ from unittest.mock import AsyncMock, MagicMock, patch
6
+
7
+ import pytest
8
+
9
+
10
+ @pytest.mark.asyncio
11
+ async def test_ask_empty_filter_returns_early():
12
+     """ask() returns early when filter matches nothing."""
13
+     mock_filter = MagicMock()
14
+     mock_filter.filter.return_value = set()
15
+
16
+     import query as q
17
+
18
+     q._filter = mock_filter
19
+     q._scorer = MagicMock()
20
+
21
+     text, score, meta = await q.ask("test question", filter_by={"provider": "nonexistent"})
22
+     assert "No resources match" in text
23
+     assert score == 0.0
24
+     assert meta["matched"] == 0
25
+
26
+
27
+ @pytest.mark.asyncio
28
+ async def test_ask_passes_filter_to_lightrag():
29
+     """ask() sends filtered IDs as context to LightRAG query."""
30
+     mock_filter = MagicMock()
31
+     mock_filter.filter.return_value = {"github:repo:a", "github:repo:b"}
32
+
33
+     mock_scorer = MagicMock()
34
+     mock_scorer.score = AsyncMock()
35
+     score_result = MagicMock()
36
+     score_result.overall = 0.9
37
+     score_result.label = "high"
38
+     score_result.components = {"retrieval": 0.9, "coverage": 0.9, "factual": 0.9}
39
+     score_result.sources = []
40
+     mock_scorer.score.return_value = score_result
41
+
42
+     with patch("query.create_rag") as mock_create_rag:
43
+         mock_rag = MagicMock()
44
+         mock_rag.aquery = AsyncMock(return_value="repo-a and repo-b are related")
45
+         mock_rag.initialize_storages = AsyncMock()
46
+         mock_create_rag.return_value = mock_rag
47
+
48
+         import query as q
49
+
50
+         q._filter = mock_filter
51
+         q._scorer = mock_scorer
52
+
53
+         text, score, meta = await q.ask(
54
+             "which repos are related?",
55
+             filter_by={"provider": "github"},
56
+         )
57
+
58
+     # IDs should be in the query text sent to LightRAG
59
+     call_text = mock_rag.aquery.call_args[0][0]
60
+     assert "github:repo:a" in call_text
61
+     assert "Only consider these resource IDs" in call_text
62
+
63
+
64
+ @pytest.mark.asyncio
65
+ async def test_ask_returns_high_confidence():
66
+     """ask() returns response unchanged for high confidence."""
67
+     mock_filter = MagicMock()
68
+     mock_filter.filter.return_value = {"github:repo:a"}
69
+
70
+     mock_scorer = MagicMock()
71
+     score_result = MagicMock()
72
+     score_result.overall = 0.85
73
+     score_result.label = "high"
74
+     score_result.components = {"retrieval": 0.9, "coverage": 0.8, "factual": 0.85}
75
+     score_result.sources = ["github:repo:a"]
76
+     mock_scorer.score = AsyncMock(return_value=score_result)
77
+
78
+     import query as q
79
+
80
+     q._filter = mock_filter
81
+     q._scorer = mock_scorer
82
+
83
+     with patch("query.create_rag") as mock_create_rag:
84
+         mock_rag = MagicMock()
85
+         mock_rag.aquery = AsyncMock(return_value="repo-a is the main repository")
86
+         mock_rag.initialize_storages = AsyncMock()
87
+         mock_create_rag.return_value = mock_rag
88
+
89
+         text, score, meta = await q.ask("what is repo-a?")
90
+
91
+     assert "repo-a is the main repository" in text
92
+     assert "Uncertainty note" not in text
93
+     assert score == 0.85
94
+     assert meta["label"] == "high"
95
+
96
+
97
+ @pytest.mark.asyncio
98
+ async def test_ask_low_confidence_adds_warning():
99
+     """ask() returns a disclaimer for low confidence responses."""
100
+     mock_filter = MagicMock()
101
+     mock_filter.filter.return_value = {"github:repo:a"}
102
+
103
+     mock_scorer = MagicMock()
104
+     score_result = MagicMock()
105
+     score_result.overall = 0.3
106
+     score_result.label = "low"
107
+     score_result.components = {"retrieval": 0.2, "coverage": 0.0, "factual": 0.5}
108
+     score_result.sources = []
109
+     mock_scorer.score = AsyncMock(return_value=score_result)
110
+
111
+     import query as q
112
+
113
+     q._filter = mock_filter
114
+     q._scorer = mock_scorer
115
+
116
+     with patch("query.create_rag") as mock_create_rag:
117
+         mock_rag = MagicMock()
118
+         mock_rag.aquery = AsyncMock(return_value="some vague answer")
119
+         mock_rag.initialize_storages = AsyncMock()
120
+         mock_create_rag.return_value = mock_rag
121
+
122
+         text, score, meta = await q.ask("what is repo-a?")
123
+
124
+     assert text.startswith("I'm not sure")
125
+     assert "some vague answer" in text
126
+
127
+
128
+ def test_main_missing_args():
129
+     """main() exits with usage when no args given."""
130
+     import sys
131
+     from query import main
132
+
133
+     with patch.object(sys, "argv", ["query.py"]):
134
+         with pytest.raises(SystemExit) as exc:
135
+             main()
136
+         assert exc.value.code == 1
@@ -0,0 +1,45 @@
1
+ """Tests for rag_factory module."""
2
+
3
+ from __future__ import annotations
4
+
5
+ import os
6
+ from unittest.mock import MagicMock, patch
7
+
8
+ import pytest
9
+
10
+
11
+ def test_make_llm_func_openai():
12
+     """_make_llm_func returns a callable with openai binding."""
13
+     from rag_factory import _make_llm_func
14
+
15
+     func = _make_llm_func(api_key="sk-test", model="gpt-4o", base_url=None, binding="openai")
16
+     assert callable(func)
17
+
18
+
19
+ def test_make_embedding_func_openai():
20
+     """_make_embedding_func returns an EmbeddingFunc instance."""
21
+     from rag_factory import _make_embedding_func
22
+
23
+     ef = _make_embedding_func(binding="openai", model="text-embedding-3-small")
24
+     assert callable(ef.func)
25
+     assert ef.embedding_dim > 0
26
+
27
+
28
+ @patch("rag_factory.LightRAG")
29
+ def test_create_rag_returns_lightrag_instance(mock_lightrag):
30
+     """create_rag() returns a LightRAG instance."""
31
+     from rag_factory import create_rag
32
+
33
+     mock_instance = MagicMock()
34
+     mock_lightrag.return_value = mock_instance
35
+
36
+     result = create_rag()
37
+
38
+     assert result is mock_instance
39
+     mock_lightrag.assert_called_once()
40
+
41
+     # Verify working_dir comes from config
42
+     call_kwargs = mock_lightrag.call_args.kwargs
43
+     assert "working_dir" in call_kwargs
44
+     assert "llm_model_func" in call_kwargs
45
+     assert "embedding_func" in call_kwargs
tests/test_scorer.py ADDED
@@ -0,0 +1,84 @@
1
+ """Tests for scorer module."""
2
+
3
+ from __future__ import annotations
4
+
5
+ from scorer import Scorer, SourceChunk, ScoreResult, apply_threshold
6
+
7
+
8
+ def test_retrieval_score_high():
9
+     """High similarity chunks produce a high retrieval score."""
10
+     scorer = Scorer()
11
+     chunks = [
12
+         SourceChunk(text="repo A", similarity=0.92),
13
+         SourceChunk(text="repo B", similarity=0.88),
14
+         SourceChunk(text="repo C", similarity=0.85),
15
+     ]
16
+     score = scorer._compute_retrieval_score(chunks)
17
+     assert 0.85 <= score <= 0.95
18
+
19
+
20
+ def test_retrieval_score_empty():
21
+     """Empty chunks produce zero retrieval score."""
22
+     scorer = Scorer()
23
+     assert scorer._compute_retrieval_score([]) == 0.0
24
+
25
+
26
+ def test_coverage_score_all_valid():
27
+     """All cited IDs are known: full coverage."""
28
+     scorer = Scorer(known_ids={"github:repo:test/a", "github:repo:test/b"})
29
+     response = "depends on github:repo:test/a and github:repo:test/b"
30
+     score = scorer._compute_coverage_score(response)
31
+     assert score == 1.0
32
+
33
+
34
+ def test_coverage_score_partial():
35
+     """Some cited IDs are unknown: partial coverage."""
36
+     scorer = Scorer(known_ids={"github:repo:test/a"})
37
+     response = "depends on github:repo:test/a and github:repo:fake/x"
38
+     score = scorer._compute_coverage_score(response)
39
+     assert score == 0.5
40
+
41
+
42
+ def test_coverage_score_no_citations():
43
+     """No resource IDs in response: zero coverage."""
44
+     scorer = Scorer()
45
+     response = "this repo depends on postgres"
46
+     score = scorer._compute_coverage_score(response)
47
+     assert score == 0.0
48
+
49
+
50
+ def test_apply_threshold_high():
51
+     """High confidence responses pass through unchanged."""
52
+     result = ScoreResult(overall=0.85, label="high", components={})
53
+     assert apply_threshold(result, "postgres is running") == "postgres is running"
54
+
55
+
56
+ def test_apply_threshold_medium():
57
+     """Medium confidence appends a disclaimer."""
58
+     result = ScoreResult(
59
+         overall=0.65,
60
+         label="medium",
61
+         components={"retrieval": 0.7, "coverage": 0.6},
62
+     )
63
+     out = apply_threshold(result, "postgres is running")
64
+     assert "Uncertainty note" in out
65
+
66
+
67
+ def test_apply_threshold_low():
68
+     """Low confidence returns a prefixed warning."""
69
+     result = ScoreResult(overall=0.3, label="low", components={})
70
+     out = apply_threshold(result, "postgres is running")
71
+     assert out.startswith("I'm not sure")
72
+
73
+
74
+ def test_extract_resource_ids():
75
+     """Resource ID patterns are correctly extracted."""
76
+     text = "Check github:repo:simonbbbb/foo and aws:ec2:i-1234"
77
+     ids = Scorer._extract_resource_ids(text)
78
+     assert "github:repo:simonbbbb/foo" in ids
79
+     assert "aws:ec2:i-1234" in ids
80
+
81
+
82
+ def test_extract_resource_ids_none():
83
+     """No resource IDs found returns empty set."""
84
+     assert Scorer._extract_resource_ids("plain text") == set()
tests/test_seed.py ADDED
@@ -0,0 +1,37 @@
1
+ """Tests for seed module."""
2
+
3
+ from __future__ import annotations
4
+
5
+ from seed import _resource_to_text
6
+ from kt_client import KTResource, KTEdge
7
+
8
+
9
+ def test_resource_to_text_basic():
10
+     """A resource with no edges produces a minimal text block."""
11
+     r = KTResource(
12
+         id="github:repo:test/repo",
13
+         type="github.repository",
14
+         name="test-repo",
15
+         provider="github",
16
+         properties={"language": "Go", "visibility": "public"},
17
+         tags=None,
18
+     )
19
+     text = _resource_to_text(r, [])
20
+     assert "test-repo" in text
21
+     assert "github.repository" in text
22
+     assert "Go" in text
23
+
24
+
25
+ def test_resource_to_text_with_edges():
26
+     """A resource with relationships includes them in the text."""
27
+     r = KTResource(
28
+         id="github:repo:test/repo",
29
+         type="github.repository",
30
+         name="test-repo",
31
+         provider="github",
32
+         properties={},
33
+         tags=None,
34
+     )
35
+     edges = [KTEdge(source_id=r.id, target_id="github:org:test", type="BELONGS_TO")]
36
+     text = _resource_to_text(r, edges)
37
+     assert "BELONGS_TO" in text
@@ -0,0 +1,115 @@
1
+ """Async tests for seed module — all externals mocked."""
2
+
3
+ from __future__ import annotations
4
+
5
+ from unittest.mock import AsyncMock, MagicMock, patch
6
+
7
+ import pytest
8
+
9
+ from kt_client import KTEdge, KTGraph, KTResource
10
+
11
+
12
+ @pytest.mark.asyncio
13
+ async def test_seed_empty_graph():
14
+     """seed() returns early when the graph has no resources."""
15
+     with (
16
+         patch("seed.KTClient") as mock_kt_cls,
17
+         patch("seed.create_rag") as mock_create_rag,
18
+     ):
19
+         mock_kt = AsyncMock()
20
+         mock_kt.get_graph_export.return_value = KTGraph(resources=[], edges=[])
21
+         mock_kt_cls.return_value.__aenter__.return_value = mock_kt
22
+
23
+         from seed import seed
24
+
25
+         await seed()
26
+
27
+         mock_kt.get_graph_export.assert_awaited_once()
28
+         mock_create_rag.assert_not_called()
29
+
30
+
31
+ @pytest.mark.asyncio
32
+ async def test_seed_with_resources():
33
+     """seed() inserts each resource into LightRAG."""
34
+     resources = [
35
+         KTResource(
36
+             id="github:repo:test/a",
37
+             type="github.repository",
38
+             name="repo-a",
39
+             provider="github",
40
+             properties={"visibility": "public"},
41
+             tags=["production"],
42
+         ),
43
+         KTResource(
44
+             id="github:repo:test/b",
45
+             type="github.repository",
46
+             name="repo-b",
47
+             provider="github",
48
+             properties={"visibility": "private"},
49
+             tags=["staging"],
50
+         ),
51
+     ]
52
+     edges = [
53
+         KTEdge(source_id="github:repo:test/a", target_id="github:org:test", type="BELONGS_TO")
54
+     ]
55
+
56
+     with (
57
+         patch("seed.KTClient") as mock_kt_cls,
58
+         patch("seed.create_rag") as mock_create_rag,
59
+     ):
60
+         mock_kt = AsyncMock()
61
+         mock_kt.get_graph_export.return_value = KTGraph(resources=resources, edges=edges)
62
+         mock_kt_cls.return_value.__aenter__.return_value = mock_kt
63
+
64
+         mock_rag = MagicMock()
65
+         mock_rag.ainsert = AsyncMock(return_value="doc-1")
66
+         mock_rag.initialize_storages = AsyncMock()
67
+         mock_create_rag.return_value = mock_rag
68
+
69
+         from seed import seed
70
+
71
+         await seed()
72
+
73
+         assert mock_rag.ainsert.await_count == 2
74
+
75
+
76
+ @pytest.mark.asyncio
77
+ async def test_seed_counts_insertions():
78
+     """seed() counts resources that return a truthy doc_id."""
79
+     resources = [
80
+         KTResource(
81
+             id="github:repo:test/a",
82
+             type="github.repository",
83
+             name="repo-a",
84
+             provider="github",
85
+             properties={},
86
+             tags=None,
87
+         ),
88
+         KTResource(
89
+             id="github:repo:test/b",
90
+             type="github.repository",
91
+             name="repo-b",
92
+             provider="github",
93
+             properties={},
94
+             tags=None,
95
+         ),
96
+     ]
97
+
98
+     with (
99
+         patch("seed.KTClient") as mock_kt_cls,
100
+         patch("seed.create_rag") as mock_create_rag,
101
+     ):
102
+         mock_kt = AsyncMock()
103
+         mock_kt.get_graph_export.return_value = KTGraph(resources=resources, edges=[])
104
+         mock_kt_cls.return_value.__aenter__.return_value = mock_kt
105
+
106
+         mock_rag = MagicMock()
107
+         mock_rag.ainsert = AsyncMock(side_effect=["doc-1", ""])
108
+         mock_rag.initialize_storages = AsyncMock()
109
+         mock_create_rag.return_value = mock_rag
110
+
111
+         from seed import seed
112
+
113
+         await seed()
114
+
115
+         assert mock_rag.ainsert.await_count == 2
tests/test_sync.py ADDED
@@ -0,0 +1,15 @@
1
+ """Tests for sync module."""
2
+
3
+ from __future__ import annotations
4
+
5
+ import pytest
6
+
7
+
8
+ @pytest.mark.asyncio
9
+ async def test_sync_noop_when_no_changes():
10
+ """Sync returns 0 when there are no changes."""
11
+ # Integration test — requires running KT + LightRAG
12
+ # from sync import sync
13
+ # count = await sync(since="2100-01-01")
14
+ # assert count == 0
15
+ pytest.skip("integration test: requires running KT + LightRAG")
@@ -0,0 +1,99 @@
1
+ """Async tests for sync module — all externals mocked."""
2
+
3
+ from __future__ import annotations
4
+
5
+ from unittest.mock import AsyncMock, MagicMock, patch
6
+
7
+ import pytest
8
+
9
+ from kt_client import KTResource
10
+
11
+
12
+ @pytest.mark.asyncio
13
+ async def test_sync_no_changes():
14
+ """sync() returns 0 when there are no changes."""
15
+ with (
16
+ patch("sync.create_rag") as mock_create_rag,
17
+ patch("sync.KTClient") as mock_kt_cls,
18
+ ):
19
+ mock_kt = AsyncMock()
20
+ mock_kt.get_changes.return_value = []
21
+ mock_kt.get_resources.return_value = []
22
+ mock_kt_cls.return_value.__aenter__.return_value = mock_kt
23
+
24
+ mock_rag = MagicMock()
25
+ mock_rag.ainsert = AsyncMock(return_value="doc-1")
26
+ mock_rag.initialize_storages = AsyncMock()
27
+ mock_create_rag.return_value = mock_rag
28
+
29
+ from sync import sync
30
+
31
+ count = await sync(since="2100-01-01")
32
+ assert count == 0
33
+ mock_rag.ainsert.assert_not_awaited()
34
+
35
+
36
+ @pytest.mark.asyncio
37
+ async def test_sync_with_changes():
38
+ """sync() inserts only changed resources."""
39
+ with (
40
+ patch("sync.create_rag") as mock_create_rag,
41
+ patch("sync.KTClient") as mock_kt_cls,
42
+ ):
43
+ mock_kt = AsyncMock()
44
+ mock_kt.get_changes.return_value = [
45
+ {"resourceName": "github:repo:test/a"},
46
+ {"resourceName": "github:repo:test/b"},
47
+ ]
48
+ mock_kt.get_resources.return_value = [
49
+ KTResource(
50
+ id="github:repo:test/a", type="github.repository",
51
+ name="repo-a", provider="github", properties={}, tags=None,
52
+ ),
53
+ KTResource(
54
+ id="github:repo:test/b", type="github.repository",
55
+ name="repo-b", provider="github", properties={}, tags=None,
56
+ ),
57
+ KTResource(
58
+ id="github:repo:test/c", type="github.repository",
59
+ name="repo-c", provider="github", properties={}, tags=None,
60
+ ),
61
+ ]
62
+ mock_kt_cls.return_value.__aenter__.return_value = mock_kt
63
+
64
+ mock_rag = MagicMock()
65
+ mock_rag.ainsert = AsyncMock(return_value="doc-id")
66
+ mock_rag.initialize_storages = AsyncMock()
67
+ mock_create_rag.return_value = mock_rag
68
+
69
+ from sync import sync
70
+
71
+ count = await sync(since="2100-01-01")
72
+ assert count == 2
73
+ assert mock_rag.ainsert.await_count == 2
74
+
75
+
76
+ @pytest.mark.asyncio
77
+ async def test_sync_default_since():
78
+ """sync() sets default since to 24 hours ago when not provided."""
79
+ with (
80
+ patch("sync.create_rag") as mock_create_rag,
81
+ patch("sync.KTClient") as mock_kt_cls,
82
+ ):
83
+ mock_kt = AsyncMock()
84
+ mock_kt.get_changes.return_value = []
85
+ mock_kt.get_resources.return_value = []
86
+ mock_kt_cls.return_value.__aenter__.return_value = mock_kt
87
+
88
+ mock_rag = MagicMock()
89
+ mock_rag.initialize_storages = AsyncMock()
90
+ mock_create_rag.return_value = mock_rag
91
+
92
+ from sync import sync
93
+
94
+ await sync() # no since arg
95
+
96
+ # get_changes should receive an ISO 8601 datetime (date and time joined by "T")
97
+ call_args = mock_kt.get_changes.call_args
98
+ call_since = call_args.kwargs["since"]
99
+ assert "T" in call_since
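The default-`since` test above only checks that the value contains a "T". A stricter check could parse the timestamp and confirm it lies roughly 24 hours in the past. The following is a minimal sketch of such a check, not part of the package: the helper name `is_about_this_old` and its parameters are hypothetical, and treating timezone-naive values as UTC is an assumption about how `sync()` formats its default.

```python
from datetime import datetime, timedelta, timezone


def is_about_this_old(since: str, age: timedelta, tolerance: timedelta) -> bool:
    """Return True if the ISO 8601 string `since` lies roughly `age` in the past.

    Hypothetical helper for illustration; it is not part of kt-rag.
    """
    # datetime.fromisoformat() rejects a trailing "Z" before Python 3.11,
    # so normalise it to an explicit UTC offset first.
    parsed = datetime.fromisoformat(since.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        # Assumption: timezone-naive timestamps are meant as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    expected = datetime.now(timezone.utc) - age
    return abs(parsed - expected) <= tolerance
```

In the test, the final assertion could then become something like `assert is_about_this_old(call_since, timedelta(hours=24), timedelta(minutes=5))`, assuming the default really is computed at call time.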
@@ -0,0 +1,70 @@
1
+ """Tests for sync_service module — daemon loop behavior."""
2
+
3
+ from __future__ import annotations
4
+
5
+ import asyncio
6
+ from unittest.mock import MagicMock, patch
7
+
8
+ import pytest
9
+
10
+
11
+ @pytest.fixture
12
+ def mock_config():
13
+ """Patch sync_service.config with a mutable version."""
14
+ cfg = MagicMock()
15
+ cfg.sync_interval = 1 # short interval for tests
16
+ with patch("sync_service.config", cfg):
17
+ yield cfg
18
+
19
+
20
+ @pytest.mark.asyncio
21
+ async def test_sync_service_loop_calls_sync(mock_config):
22
+ """run_loop() calls sync() and stops."""
23
+ with patch("sync_service.sync") as mock_sync:
24
+ mock_sync.return_value = 0
25
+
26
+ import sync_service as svc
27
+
28
+ svc._running = True
29
+
30
+ from sync_service import run_loop
31
+
32
+ async def runner():
33
+ task = asyncio.create_task(run_loop())
34
+ await asyncio.sleep(0.15)
35
+ svc._running = False
36
+ await task
37
+
38
+ await runner()
39
+ mock_sync.assert_awaited_once()
40
+
41
+
42
+ @pytest.mark.asyncio
43
+ async def test_sync_service_handles_exception(mock_config):
44
+ """run_loop() logs errors and continues on sync failure."""
45
+ with patch("sync_service.sync") as mock_sync:
46
+ mock_sync.side_effect = [ValueError("KT unreachable"), 0]
47
+
48
+ import sync_service as svc
49
+
50
+ svc._running = True
51
+ from sync_service import run_loop
52
+
53
+ async def runner():
54
+ task = asyncio.create_task(run_loop())
55
+ await asyncio.sleep(2.5)
56
+ svc._running = False
57
+ await task
58
+
59
+ await runner()
60
+
61
+ assert mock_sync.await_count >= 2
62
+
63
+
64
+ def test_sync_service_shutdown_signal():
65
+ """_shutdown sets _running to False."""
66
+ import sync_service as svc
67
+
68
+ svc._running = True
69
+ svc._shutdown(2, None)  # signum 2 is SIGINT on POSIX; no frame is passed
70
+ assert svc._running is False
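The loop tests above stop `run_loop()` by sleeping for a fixed wall-clock time and then clearing `_running`, which makes them slow and sensitive to timing. One alternative is to let the mocked sync call clear the flag itself. The sketch below illustrates that pattern on a stand-in loop: `LoopState`, the `run_loop(sync_fn, state, interval)` signature, and the error handling are assumptions made for the example, not the actual `sync_service` implementation.

```python
import asyncio
from unittest.mock import AsyncMock


class LoopState:
    """Hypothetical stand-in for sync_service's module-level _running flag."""

    def __init__(self) -> None:
        self.running = True


async def run_loop(sync_fn, state: LoopState, interval: float) -> None:
    """Assumed loop shape: call sync, survive errors, sleep, repeat."""
    while state.running:
        try:
            await sync_fn()
        except Exception:
            pass  # a real service would log the failure here
        if state.running:
            await asyncio.sleep(interval)


async def demo() -> int:
    state = LoopState()
    calls = 0

    async def fake_sync() -> int:
        nonlocal calls
        calls += 1
        if calls == 1:
            # First call fails, as in the exception-handling test above.
            raise ValueError("KT unreachable")
        # Second call succeeds and stops the loop, so no timed sleep is needed.
        state.running = False
        return 0

    sync_fn = AsyncMock(side_effect=fake_sync)
    await run_loop(sync_fn, state, interval=0)
    return sync_fn.await_count


await_count = asyncio.run(demo())
```

Because the side effect decides when the loop ends, the number of sync calls is exact rather than "at least two", and the test finishes without waiting on the sync interval.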