bridgekit 0.1.1__tar.gz → 0.2.1__tar.gz

@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: bridgekit
- Version: 0.1.1
+ Version: 0.2.1
  Summary: AI tools that make you a better data scientist, not a redundant one.
  License: MIT
  Project-URL: Homepage, https://github.com/getbridgekit/bridgekit
@@ -15,8 +15,19 @@ Requires-Python: >=3.9
  Description-Content-Type: text/markdown
  License-File: LICENSE
  Requires-Dist: anthropic>=0.20.0
+ Requires-Dist: chromadb>=0.4.0
+ Requires-Dist: sentence-transformers>=2.0.0
+ Requires-Dist: pypdf>=3.0.0
+ Requires-Dist: python-docx>=1.0.0
+ Requires-Dist: python-pptx>=0.6.0
+ Requires-Dist: nbformat>=5.0.0
+ Provides-Extra: dev
+ Requires-Dist: pytest>=7.0.0; extra == "dev"
+ Requires-Dist: pytest-mock>=3.0.0; extra == "dev"
  Dynamic: license-file

+ <img src="assets/logo.png" width="150"/>
+
  # Bridgekit

  **AI tools that make you a better data scientist, not a redundant one.**
@@ -134,6 +145,49 @@ Paste your writeup as a string and call `evaluate()` — that's it.

  ---

+ ## Tool #2: Analysis Search
+
+ Ask questions across a collection of your past analysis documents. Point it at a folder and get answers grounded in your actual work — no digging through files manually.
+
+ Uses a vector database and semantic similarity to find relevant context across your documents — not keyword matching.
+
+ Supports `.txt`, `.md`, `.pdf`, `.docx`, `.pptx`, and `.ipynb` files.
+
+ > **Note:** The first run will download the MiniLM embedding model (~90MB). This is a one-time download — it gets cached locally for all subsequent calls.
+
+ **From a folder:**
+ ```python
+ from bridgekit import ask
+
+ print(ask("what drove churn in Q3?", source="reports/"))
+ ```
+
+ **From raw text:**
+ ```python
+ from bridgekit import ask
+
+ text = """
+ Q3 churn rose to 4.5%, driven by a product outage in August and a pricing
+ change in July that increased SMB costs by 12%.
+ """
+
+ print(ask("what caused the Q3 churn spike?", text=text))
+ ```
+
+ **Output** *(based on sample data included in the repo)*:
+ ```
+ Based on the Q3 2024 Churn Analysis, two primary factors drove the elevated
+ churn rate of 4.5%:
+
+ 1. August Product Outage — A 14-hour outage affected 3,800 accounts. Impacted
+ accounts churned at 8.1% vs 3.2% for unaffected accounts.
+
+ 2. July Pricing Change — SMB costs increased by an average of 12%, causing SMB
+ churn to spike to 7.2% — the highest single-month figure in the dataset.
+ ```
+
+ ---
+
  ## Why not just use Claude?

  You could. But you'd need to know what to ask, how to frame it, and what a good answer looks like. Bridgekit has that baked in — it knows you're a data scientist presenting findings, so it asks the right questions automatically. No prompt engineering required. Just paste your work and run it.
@@ -156,7 +210,7 @@ Bridgekit only ever sees text you write yourself — your narrative, your conclu

  ## What's next?

- Bridgekit is a suite, not a one-off. The analysis reviewer is tool #1. Coming next:
+ Bridgekit is a suite, not a one-off. Two tools are live, and more are coming:

  - **Statistical approach suggester** — describe your problem in plain English, get the right test and why
  - **Stakeholder translator** — turn your technical findings into a narrative a non-technical audience will actually follow
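
The Analysis Search section added to the README in this diff says retrieval works by semantic similarity rather than keyword matching. As a rough illustration of that idea only, the sketch below ranks a few documents by cosine similarity between vectors. The three-dimensional "embeddings", the document labels, and the helper names `cosine` and `top_k` are all made up for this example; the package itself delegates embedding and ranking to ChromaDB and a MiniLM model, and the distance metric ChromaDB applies depends on how the collection is configured.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, doc_vecs, k=2):
    """Return indices of the k vectors most similar to query_vec, best first."""
    ranked = sorted(
        range(len(doc_vecs)),
        key=lambda i: cosine(query_vec, doc_vecs[i]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical toy embeddings; a real model emits hundreds of dimensions.
docs = {
    "outage raised churn": [0.9, 0.1, 0.0],
    "pricing change hit SMB": [0.7, 0.6, 0.1],
    "office party photos": [0.0, 0.1, 0.9],
}
query = [1.0, 0.2, 0.0]  # stands in for the embedded question

names = list(docs)
best = [names[i] for i in top_k(query, list(docs.values()), k=2)]
print(best)
```

The unrelated document shares no keywords with anything here either way; the point is that ranking comes from vector geometry, so a question can match a passage that uses different wording.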
@@ -1,21 +1,4 @@
- Metadata-Version: 2.4
- Name: bridgekit
- Version: 0.1.1
- Summary: AI tools that make you a better data scientist, not a redundant one.
- License: MIT
- Project-URL: Homepage, https://github.com/getbridgekit/bridgekit
- Project-URL: Issues, https://github.com/getbridgekit/bridgekit/issues
- Keywords: data science,AI,analysis,evaluation,anthropic
- Classifier: Development Status :: 3 - Alpha
- Classifier: Intended Audience :: Science/Research
- Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
- Classifier: Programming Language :: Python :: 3
- Classifier: License :: OSI Approved :: MIT License
- Requires-Python: >=3.9
- Description-Content-Type: text/markdown
- License-File: LICENSE
- Requires-Dist: anthropic>=0.20.0
- Dynamic: license-file
+ <img src="assets/logo.png" width="150"/>

  # Bridgekit

@@ -134,6 +117,49 @@ Paste your writeup as a string and call `evaluate()` — that's it.

  ---

+ ## Tool #2: Analysis Search
+
+ Ask questions across a collection of your past analysis documents. Point it at a folder and get answers grounded in your actual work — no digging through files manually.
+
+ Uses a vector database and semantic similarity to find relevant context across your documents — not keyword matching.
+
+ Supports `.txt`, `.md`, `.pdf`, `.docx`, `.pptx`, and `.ipynb` files.
+
+ > **Note:** The first run will download the MiniLM embedding model (~90MB). This is a one-time download — it gets cached locally for all subsequent calls.
+
+ **From a folder:**
+ ```python
+ from bridgekit import ask
+
+ print(ask("what drove churn in Q3?", source="reports/"))
+ ```
+
+ **From raw text:**
+ ```python
+ from bridgekit import ask
+
+ text = """
+ Q3 churn rose to 4.5%, driven by a product outage in August and a pricing
+ change in July that increased SMB costs by 12%.
+ """
+
+ print(ask("what caused the Q3 churn spike?", text=text))
+ ```
+
+ **Output** *(based on sample data included in the repo)*:
+ ```
+ Based on the Q3 2024 Churn Analysis, two primary factors drove the elevated
+ churn rate of 4.5%:
+
+ 1. August Product Outage — A 14-hour outage affected 3,800 accounts. Impacted
+ accounts churned at 8.1% vs 3.2% for unaffected accounts.
+
+ 2. July Pricing Change — SMB costs increased by an average of 12%, causing SMB
+ churn to spike to 7.2% — the highest single-month figure in the dataset.
+ ```
+
+ ---
+
  ## Why not just use Claude?

  You could. But you'd need to know what to ask, how to frame it, and what a good answer looks like. Bridgekit has that baked in — it knows you're a data scientist presenting findings, so it asks the right questions automatically. No prompt engineering required. Just paste your work and run it.
@@ -156,7 +182,7 @@ Bridgekit only ever sees text you write yourself — your narrative, your conclu

  ## What's next?

- Bridgekit is a suite, not a one-off. The analysis reviewer is tool #1. Coming next:
+ Bridgekit is a suite, not a one-off. Two tools are live, and more are coming:

  - **Statistical approach suggester** — describe your problem in plain English, get the right test and why
  - **Stakeholder translator** — turn your technical findings into a narrative a non-technical audience will actually follow
@@ -0,0 +1,5 @@
+ from .reviewer import evaluate
+ from .search import ask
+
+ __version__ = "0.2.1"
+ __all__ = ["evaluate", "ask"]
@@ -56,6 +56,9 @@ def evaluate(text: str) -> str:
      Returns:
          Structured feedback across five dimensions.
      """
+     if not text or not text.strip():
+         raise ValueError("Text cannot be empty.")
+
      api_key = os.environ.get("ANTHROPIC_API_KEY")
      if not api_key:
          raise EnvironmentError(
@@ -0,0 +1,122 @@
+ import os
+ from pathlib import Path
+ import anthropic
+ import chromadb
+ from chromadb.utils.embedding_functions import SentenceTransformerEmbeddingFunction
+
+ CHUNK_SIZE = 150  # words per chunk
+ CHUNK_OVERLAP = 20
+
+
+ def _load_file(path: Path) -> str:
+     suffix = path.suffix.lower()
+     if suffix == ".pdf":
+         import pypdf
+         reader = pypdf.PdfReader(str(path))
+         return "\n".join(page.extract_text() or "" for page in reader.pages)
+     elif suffix == ".docx":
+         import docx
+         doc = docx.Document(str(path))
+         return "\n".join(p.text for p in doc.paragraphs if p.text.strip())
+     elif suffix == ".pptx":
+         from pptx import Presentation
+         prs = Presentation(str(path))
+         lines = []
+         for slide in prs.slides:
+             for shape in slide.shapes:
+                 if hasattr(shape, "text") and shape.text.strip():
+                     lines.append(shape.text)
+         return "\n".join(lines)
+     elif suffix == ".ipynb":
+         import nbformat
+         nb = nbformat.read(str(path), as_version=4)
+         lines = []
+         for cell in nb.cells:
+             if cell.cell_type in ("markdown", "code") and cell.source.strip():
+                 lines.append(cell.source)
+         return "\n\n".join(lines)
+     else:
+         return path.read_text(encoding="utf-8")
+
+
+ def _chunk(text: str) -> list[str]:
+     words = text.split()
+     chunks = []
+     i = 0
+     while i < len(words):
+         chunks.append(" ".join(words[i:i + CHUNK_SIZE]))
+         i += CHUNK_SIZE - CHUNK_OVERLAP
+     return [c for c in chunks if c.strip()]
+
+
+ def ask(question: str, source: str = None, text: str = None) -> str:
+     """
+     Ask a question across a collection of analysis documents or raw text.
+
+     Args:
+         question: The question to answer.
+         source: Path to a folder containing .txt, .md, .pdf, .docx, .pptx, or .ipynb files.
+         text: A raw text string to search instead of a folder.
+
+     Returns:
+         An answer grounded in the provided documents.
+     """
+     if not source and not text:
+         raise ValueError("Provide either 'source' (folder path) or 'text'.")
+
+     api_key = os.environ.get("ANTHROPIC_API_KEY")
+     if not api_key:
+         raise EnvironmentError(
+             "ANTHROPIC_API_KEY not found. Set it with: export ANTHROPIC_API_KEY=your_key_here"
+         )
+
+     # Collect chunks
+     chunks = []
+
+     if text:
+         chunks.extend(_chunk(text))
+
+     if source:
+         folder = Path(source).expanduser().resolve()
+         supported = {".txt", ".md", ".pdf", ".docx", ".pptx", ".ipynb"}
+         for file in sorted(folder.iterdir()):
+             if file.suffix.lower() in supported:
+                 content = _load_file(file)
+                 chunks.extend(_chunk(content))
+
+     if not chunks:
+         raise ValueError("No content found. Check your source folder or text input.")
+
+     # Embed and store in ChromaDB
+     embedding_fn = SentenceTransformerEmbeddingFunction(model_name="all-MiniLM-L6-v2")
+     client = chromadb.Client()
+     collection = client.get_or_create_collection(
+         name="bridgekit_ask",
+         embedding_function=embedding_fn
+     )
+     collection.add(
+         documents=chunks,
+         ids=[f"chunk_{i}" for i in range(len(chunks))]
+     )
+
+     # Retrieve most relevant chunks
+     results = collection.query(query_texts=[question], n_results=min(8, len(chunks)))
+     context = "\n\n".join(results["documents"][0])
+
+     # Generate answer with Claude
+     anthropic_client = anthropic.Anthropic(api_key=api_key)
+     message = anthropic_client.messages.create(
+         model="claude-opus-4-5",
+         max_tokens=1024,
+         system=(
+             "You are a senior data scientist answering questions based on analysis reports. "
+             "Answer only from the provided context. Be specific and cite findings where relevant. "
+             "If the context does not contain enough information to answer, say so clearly."
+         ),
+         messages=[{
+             "role": "user",
+             "content": f"Context from analysis reports:\n\n{context}\n\nQuestion: {question}"
+         }]
+     )
+
+     return message.content[0].text
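
The `_chunk` helper in the file above splits text into overlapping word windows, with `CHUNK_SIZE = 150` and `CHUNK_OVERLAP = 20`. A standalone sketch of that windowing may help show how the overlap plays out on a concrete input. The function name `chunk_words` and the explicit guard against `overlap >= size` are additions for illustration, not part of the package; the sizes mirror the constants in the diff.

```python
def chunk_words(text, size=150, overlap=20):
    """Split text into windows of `size` words, each sharing `overlap` words with the previous one."""
    if overlap >= size:
        # Illustrative guard: a non-positive step would never advance.
        raise ValueError("overlap must be smaller than size")
    words = text.split()
    step = size - overlap  # each window starts this many words after the last
    return [" ".join(words[i:i + size]) for i in range(0, len(words), step)]


# 300 synthetic words: w0, w1, ..., w299
sample = " ".join(f"w{i}" for i in range(300))
chunks = chunk_words(sample)

print(len(chunks))                        # number of windows
print([len(c.split()) for c in chunks])   # words per window
```

With a step of 130 words, windows start at words 0, 130, and 260, so the last window is shorter than the rest and the final 20 words of each window reappear at the start of the next.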
@@ -1,3 +1,33 @@
+ Metadata-Version: 2.4
+ Name: bridgekit
+ Version: 0.2.1
+ Summary: AI tools that make you a better data scientist, not a redundant one.
+ License: MIT
+ Project-URL: Homepage, https://github.com/getbridgekit/bridgekit
+ Project-URL: Issues, https://github.com/getbridgekit/bridgekit/issues
+ Keywords: data science,AI,analysis,evaluation,anthropic
+ Classifier: Development Status :: 3 - Alpha
+ Classifier: Intended Audience :: Science/Research
+ Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
+ Classifier: Programming Language :: Python :: 3
+ Classifier: License :: OSI Approved :: MIT License
+ Requires-Python: >=3.9
+ Description-Content-Type: text/markdown
+ License-File: LICENSE
+ Requires-Dist: anthropic>=0.20.0
+ Requires-Dist: chromadb>=0.4.0
+ Requires-Dist: sentence-transformers>=2.0.0
+ Requires-Dist: pypdf>=3.0.0
+ Requires-Dist: python-docx>=1.0.0
+ Requires-Dist: python-pptx>=0.6.0
+ Requires-Dist: nbformat>=5.0.0
+ Provides-Extra: dev
+ Requires-Dist: pytest>=7.0.0; extra == "dev"
+ Requires-Dist: pytest-mock>=3.0.0; extra == "dev"
+ Dynamic: license-file
+
+ <img src="assets/logo.png" width="150"/>
+
  # Bridgekit

  **AI tools that make you a better data scientist, not a redundant one.**
@@ -115,6 +145,49 @@ Paste your writeup as a string and call `evaluate()` — that's it.

  ---

+ ## Tool #2: Analysis Search
+
+ Ask questions across a collection of your past analysis documents. Point it at a folder and get answers grounded in your actual work — no digging through files manually.
+
+ Uses a vector database and semantic similarity to find relevant context across your documents — not keyword matching.
+
+ Supports `.txt`, `.md`, `.pdf`, `.docx`, `.pptx`, and `.ipynb` files.
+
+ > **Note:** The first run will download the MiniLM embedding model (~90MB). This is a one-time download — it gets cached locally for all subsequent calls.
+
+ **From a folder:**
+ ```python
+ from bridgekit import ask
+
+ print(ask("what drove churn in Q3?", source="reports/"))
+ ```
+
+ **From raw text:**
+ ```python
+ from bridgekit import ask
+
+ text = """
+ Q3 churn rose to 4.5%, driven by a product outage in August and a pricing
+ change in July that increased SMB costs by 12%.
+ """
+
+ print(ask("what caused the Q3 churn spike?", text=text))
+ ```
+
+ **Output** *(based on sample data included in the repo)*:
+ ```
+ Based on the Q3 2024 Churn Analysis, two primary factors drove the elevated
+ churn rate of 4.5%:
+
+ 1. August Product Outage — A 14-hour outage affected 3,800 accounts. Impacted
+ accounts churned at 8.1% vs 3.2% for unaffected accounts.
+
+ 2. July Pricing Change — SMB costs increased by an average of 12%, causing SMB
+ churn to spike to 7.2% — the highest single-month figure in the dataset.
+ ```
+
+ ---
+
  ## Why not just use Claude?

  You could. But you'd need to know what to ask, how to frame it, and what a good answer looks like. Bridgekit has that baked in — it knows you're a data scientist presenting findings, so it asks the right questions automatically. No prompt engineering required. Just paste your work and run it.
@@ -137,7 +210,7 @@ Bridgekit only ever sees text you write yourself — your narrative, your conclu

  ## What's next?

- Bridgekit is a suite, not a one-off. The analysis reviewer is tool #1. Coming next:
+ Bridgekit is a suite, not a one-off. Two tools are live, and more are coming:

  - **Statistical approach suggester** — describe your problem in plain English, get the right test and why
  - **Stakeholder translator** — turn your technical findings into a narrative a non-technical audience will actually follow
@@ -3,8 +3,11 @@ README.md
  pyproject.toml
  bridgekit/__init__.py
  bridgekit/reviewer.py
+ bridgekit/search.py
  bridgekit.egg-info/PKG-INFO
  bridgekit.egg-info/SOURCES.txt
  bridgekit.egg-info/dependency_links.txt
  bridgekit.egg-info/requires.txt
- bridgekit.egg-info/top_level.txt
+ bridgekit.egg-info/top_level.txt
+ tests/test_reviewer.py
+ tests/test_search.py
@@ -0,0 +1,11 @@
+ anthropic>=0.20.0
+ chromadb>=0.4.0
+ sentence-transformers>=2.0.0
+ pypdf>=3.0.0
+ python-docx>=1.0.0
+ python-pptx>=0.6.0
+ nbformat>=5.0.0
+
+ [dev]
+ pytest>=7.0.0
+ pytest-mock>=3.0.0
@@ -2,9 +2,12 @@
  requires = ["setuptools>=61.0"]
  build-backend = "setuptools.build_meta"

+ [tool.setuptools.packages.find]
+ include = ["bridgekit*"]
+
  [project]
  name = "bridgekit"
- version = "0.1.1"
+ version = "0.2.1"
  description = "AI tools that make you a better data scientist, not a redundant one."
  readme = "README.md"
  requires-python = ">=3.9"
@@ -19,6 +22,18 @@ classifiers = [
  ]
  dependencies = [
      "anthropic>=0.20.0",
+     "chromadb>=0.4.0",
+     "sentence-transformers>=2.0.0",
+     "pypdf>=3.0.0",
+     "python-docx>=1.0.0",
+     "python-pptx>=0.6.0",
+     "nbformat>=5.0.0",
+ ]
+
+ [project.optional-dependencies]
+ dev = [
+     "pytest>=7.0.0",
+     "pytest-mock>=3.0.0",
  ]

  [project.urls]
@@ -0,0 +1,162 @@
+ import os
+ import pytest
+ from unittest.mock import MagicMock, patch
+
+
+ # ---------------------------------------------------------------------------
+ # Helpers
+ # ---------------------------------------------------------------------------
+
+ def _make_mock_message(text: str):
+     """Build a minimal fake Anthropic message response."""
+     content_block = MagicMock()
+     content_block.text = text
+     message = MagicMock()
+     message.content = [content_block]
+     return message
+
+
+ FAKE_RESPONSE = (
+     "BRIDGEKIT ANALYSIS REVIEW\n"
+     "─────────────────────────────────────────\n\n"
+     "1. CLARITY\n"
+     "✅ STRONG — The writeup is clear and jargon-free.\n\n"
+     "2. AUDIENCE CLARITY\n"
+     "✅ STRONG — Written for the right audience.\n\n"
+     "3. STATISTICAL RIGOR\n"
+     "⚠️ NEEDS WORK — Sample size is not mentioned.\n\n"
+     "4. METHODOLOGY\n"
+     "✅ STRONG — Approach is well explained.\n\n"
+     "5. BUSINESS IMPACT\n"
+     "❌ MISSING — No quantified outcomes.\n\n"
+     "─────────────────────────────────────────\n"
+     "BOTTOM LINE\n"
+     "Add specific metrics to quantify business impact."
+ )
+
+
+ # ---------------------------------------------------------------------------
+ # Tests
+ # ---------------------------------------------------------------------------
+
+ class TestEvaluateReturnsString:
+     """evaluate() should return a plain string."""
+
+     def test_returns_string(self):
+         with patch.dict(os.environ, {"ANTHROPIC_API_KEY": "test-key"}):
+             with patch("anthropic.Anthropic") as MockAnthropic:
+                 mock_client = MagicMock()
+                 mock_client.messages.create.return_value = _make_mock_message(FAKE_RESPONSE)
+                 MockAnthropic.return_value = mock_client
+
+                 from bridgekit.reviewer import evaluate
+                 result = evaluate("We ran an A/B test on 500 users.")
+
+         assert isinstance(result, str)
+
+     def test_returns_non_empty_string(self):
+         with patch.dict(os.environ, {"ANTHROPIC_API_KEY": "test-key"}):
+             with patch("anthropic.Anthropic") as MockAnthropic:
+                 mock_client = MagicMock()
+                 mock_client.messages.create.return_value = _make_mock_message(FAKE_RESPONSE)
+                 MockAnthropic.return_value = mock_client
+
+                 from bridgekit.reviewer import evaluate
+                 result = evaluate("We ran an A/B test on 500 users.")
+
+         assert len(result) > 0
+
+
+ class TestEvaluateOutputStructure:
+     """evaluate() output should contain the required section headers."""
+
+     def test_output_contains_clarity(self):
+         with patch.dict(os.environ, {"ANTHROPIC_API_KEY": "test-key"}):
+             with patch("anthropic.Anthropic") as MockAnthropic:
+                 mock_client = MagicMock()
+                 mock_client.messages.create.return_value = _make_mock_message(FAKE_RESPONSE)
+                 MockAnthropic.return_value = mock_client
+
+                 from bridgekit.reviewer import evaluate
+                 result = evaluate("Some analysis text.")
+
+         assert "CLARITY" in result
+
+     def test_output_contains_bottom_line(self):
+         with patch.dict(os.environ, {"ANTHROPIC_API_KEY": "test-key"}):
+             with patch("anthropic.Anthropic") as MockAnthropic:
+                 mock_client = MagicMock()
+                 mock_client.messages.create.return_value = _make_mock_message(FAKE_RESPONSE)
+                 MockAnthropic.return_value = mock_client
+
+                 from bridgekit.reviewer import evaluate
+                 result = evaluate("Some analysis text.")
+
+         assert "BOTTOM LINE" in result
+
+     def test_output_contains_both_required_sections(self):
+         with patch.dict(os.environ, {"ANTHROPIC_API_KEY": "test-key"}):
+             with patch("anthropic.Anthropic") as MockAnthropic:
+                 mock_client = MagicMock()
+                 mock_client.messages.create.return_value = _make_mock_message(FAKE_RESPONSE)
+                 MockAnthropic.return_value = mock_client
+
+                 from bridgekit.reviewer import evaluate
+                 result = evaluate("Some analysis text.")
+
+         assert "CLARITY" in result and "BOTTOM LINE" in result
+
+
+ class TestEvaluateMissingApiKey:
+     """evaluate() should raise EnvironmentError when the API key is absent."""
+
+     def test_raises_environment_error_when_key_missing(self):
+         env = {k: v for k, v in os.environ.items() if k != "ANTHROPIC_API_KEY"}
+         with patch.dict(os.environ, env, clear=True):
+             from bridgekit.reviewer import evaluate
+             with pytest.raises(EnvironmentError):
+                 evaluate("Some analysis text.")
+
+     def test_error_message_mentions_key(self):
+         env = {k: v for k, v in os.environ.items() if k != "ANTHROPIC_API_KEY"}
+         with patch.dict(os.environ, env, clear=True):
+             from bridgekit.reviewer import evaluate
+             with pytest.raises(EnvironmentError, match="ANTHROPIC_API_KEY"):
+                 evaluate("Some analysis text.")
+
+
+ class TestEvaluateEmptyInput:
+     """evaluate() should raise ValueError for empty or whitespace-only input."""
+
+     def test_empty_string_raises_value_error(self):
+         with patch.dict(os.environ, {"ANTHROPIC_API_KEY": "test-key"}):
+             from bridgekit.reviewer import evaluate
+             with pytest.raises(ValueError, match="empty"):
+                 evaluate("")
+
+     def test_whitespace_only_raises_value_error(self):
+         with patch.dict(os.environ, {"ANTHROPIC_API_KEY": "test-key"}):
+             from bridgekit.reviewer import evaluate
+             with pytest.raises(ValueError, match="empty"):
+                 evaluate(" ")
+
+
+ class TestEvaluateApiCallShape:
+     """evaluate() should pass the user text through to the Anthropic API."""
+
+     def test_api_called_with_user_text(self):
+         user_text = "Our conversion rate improved after the campaign."
+         with patch.dict(os.environ, {"ANTHROPIC_API_KEY": "test-key"}):
+             with patch("anthropic.Anthropic") as MockAnthropic:
+                 mock_client = MagicMock()
+                 mock_client.messages.create.return_value = _make_mock_message(FAKE_RESPONSE)
+                 MockAnthropic.return_value = mock_client
+
+                 from bridgekit.reviewer import evaluate
+                 evaluate(user_text)
+
+         call_kwargs = mock_client.messages.create.call_args
+         # The user text should appear somewhere in the messages payload
+         messages_arg = call_kwargs.kwargs.get("messages") or call_kwargs.args[0]
+         content = str(messages_arg)
+         assert user_text in content
@@ -0,0 +1,238 @@
+ import os
+ import pytest
+ from pathlib import Path
+ from unittest.mock import MagicMock, patch
+ import tempfile
+
+
+ # ---------------------------------------------------------------------------
+ # Helpers
+ # ---------------------------------------------------------------------------
+
+ def _make_mock_message(text: str):
+     """Build a minimal fake Anthropic message response."""
+     content_block = MagicMock()
+     content_block.text = text
+     message = MagicMock()
+     message.content = [content_block]
+     return message
+
+
+ FAKE_ANSWER = "Based on the documents, the conversion rate increased by 12%."
+
+
+ def _make_mock_chromadb(chunks: list[str] | None = None):
+     """
+     Return a (mock_chromadb_module, mock_embedding_fn_class) pair whose
+     collection.query() returns the supplied chunks as context.
+     """
+     returned_docs = chunks if chunks is not None else ["sample context chunk"]
+
+     mock_collection = MagicMock()
+     mock_collection.query.return_value = {"documents": [returned_docs]}
+
+     mock_chroma_client = MagicMock()
+     mock_chroma_client.get_or_create_collection.return_value = mock_collection
+
+     mock_chromadb = MagicMock()
+     mock_chromadb.Client.return_value = mock_chroma_client
+
+     mock_embedding_fn_class = MagicMock()
+     mock_embedding_fn_class.return_value = MagicMock()
+
+     return mock_chromadb, mock_embedding_fn_class
+
+
+ # ---------------------------------------------------------------------------
+ # Tests
+ # ---------------------------------------------------------------------------
+
+ class TestAskReturnsString:
+     """ask() should return a non-empty string."""
+
+     def test_returns_string_with_text_input(self):
+         mock_chromadb, mock_ef = _make_mock_chromadb()
+         with patch.dict(os.environ, {"ANTHROPIC_API_KEY": "test-key"}):
+             with patch("anthropic.Anthropic") as MockAnthropic, \
+                  patch("chromadb.Client", mock_chromadb.Client), \
+                  patch(
+                      "chromadb.utils.embedding_functions.SentenceTransformerEmbeddingFunction",
+                      mock_ef,
+                  ):
+                 mock_client = MagicMock()
+                 mock_client.messages.create.return_value = _make_mock_message(FAKE_ANSWER)
+                 MockAnthropic.return_value = mock_client
+
+                 from bridgekit.search import ask
+                 result = ask("What was the conversion rate?", text="The conversion rate increased by 12%.")
+
+         assert isinstance(result, str)
+         assert len(result) > 0
+
+     def test_returns_non_empty_answer(self):
+         mock_chromadb, mock_ef = _make_mock_chromadb()
+         with patch.dict(os.environ, {"ANTHROPIC_API_KEY": "test-key"}):
+             with patch("anthropic.Anthropic") as MockAnthropic, \
+                  patch("chromadb.Client", mock_chromadb.Client), \
+                  patch(
+                      "chromadb.utils.embedding_functions.SentenceTransformerEmbeddingFunction",
+                      mock_ef,
+                  ):
+                 mock_client = MagicMock()
+                 mock_client.messages.create.return_value = _make_mock_message(FAKE_ANSWER)
+                 MockAnthropic.return_value = mock_client
+
+                 from bridgekit.search import ask
+                 result = ask("What was the conversion rate?", text="The conversion rate increased by 12%.")
+
+         assert result == FAKE_ANSWER
+
+
+ class TestAskMissingSourceAndText:
+     """ask() should raise ValueError when neither source nor text is supplied."""
+
+     def test_raises_value_error_with_no_inputs(self):
+         with patch.dict(os.environ, {"ANTHROPIC_API_KEY": "test-key"}):
+             from bridgekit.search import ask
+             with pytest.raises(ValueError, match="source"):
+                 ask("What happened?")
+
+     def test_raises_value_error_message_mentions_text(self):
+         with patch.dict(os.environ, {"ANTHROPIC_API_KEY": "test-key"}):
+             from bridgekit.search import ask
+             with pytest.raises(ValueError):
+                 ask("What happened?", source=None, text=None)
+
+
+ class TestAskMissingApiKey:
+     """ask() should raise EnvironmentError when ANTHROPIC_API_KEY is absent."""
+
+     def test_raises_environment_error_when_key_missing(self):
+         env = {k: v for k, v in os.environ.items() if k != "ANTHROPIC_API_KEY"}
+         with patch.dict(os.environ, env, clear=True):
+             from bridgekit.search import ask
+             with pytest.raises(EnvironmentError):
+                 ask("What happened?", text="Some text about results.")
+
+     def test_error_message_mentions_key(self):
+         env = {k: v for k, v in os.environ.items() if k != "ANTHROPIC_API_KEY"}
+         with patch.dict(os.environ, env, clear=True):
+             from bridgekit.search import ask
+             with pytest.raises(EnvironmentError, match="ANTHROPIC_API_KEY"):
+                 ask("What happened?", text="Some text about results.")
+
+
+ class TestAskWithTextInput:
+     """ask() should work correctly when called with the text= parameter."""
+
+     def test_text_input_reaches_api(self):
+         raw_text = "Revenue grew 25% year-over-year driven by enterprise sales."
+         mock_chromadb, mock_ef = _make_mock_chromadb([raw_text])
+
+         with patch.dict(os.environ, {"ANTHROPIC_API_KEY": "test-key"}):
+             with patch("anthropic.Anthropic") as MockAnthropic, \
+                  patch("chromadb.Client", mock_chromadb.Client), \
+                  patch(
+                      "chromadb.utils.embedding_functions.SentenceTransformerEmbeddingFunction",
+                      mock_ef,
+                  ):
+                 mock_client = MagicMock()
+                 mock_client.messages.create.return_value = _make_mock_message(FAKE_ANSWER)
+                 MockAnthropic.return_value = mock_client
+
+                 from bridgekit.search import ask
+                 ask("What drove revenue growth?", text=raw_text)
+
+         # Verify the Anthropic API was actually called once
+         assert mock_client.messages.create.call_count == 1
+
+     def test_text_input_included_in_context(self):
+         raw_text = "Churn dropped from 8% to 3% after onboarding improvements."
+         mock_chromadb, mock_ef = _make_mock_chromadb([raw_text])
+
+         with patch.dict(os.environ, {"ANTHROPIC_API_KEY": "test-key"}):
+             with patch("anthropic.Anthropic") as MockAnthropic, \
+                  patch("chromadb.Client", mock_chromadb.Client), \
+                  patch(
+                      "chromadb.utils.embedding_functions.SentenceTransformerEmbeddingFunction",
+                      mock_ef,
+                  ):
+                 mock_client = MagicMock()
+                 mock_client.messages.create.return_value = _make_mock_message(FAKE_ANSWER)
+                 MockAnthropic.return_value = mock_client
+
+                 from bridgekit.search import ask
+                 ask("What happened to churn?", text=raw_text)
+
+         call_kwargs = mock_client.messages.create.call_args
+         messages_arg = call_kwargs.kwargs.get("messages") or call_kwargs.args[0]
+         # The retrieved chunk (raw_text) should appear in the prompt context
+         assert raw_text in str(messages_arg)
+
+
+ class TestAskWithSourceFolder:
+     """ask() should load .txt files from a folder and pass their content to the API."""
+
+     def test_source_folder_with_txt_file(self):
+         with tempfile.TemporaryDirectory() as tmpdir:
+             sample_file = Path(tmpdir) / "report.txt"
+             sample_content = "The experiment showed a 15% lift in click-through rate."
+             sample_file.write_text(sample_content, encoding="utf-8")
+
+             mock_chromadb, mock_ef = _make_mock_chromadb([sample_content])
+
+             with patch.dict(os.environ, {"ANTHROPIC_API_KEY": "test-key"}):
+                 with patch("anthropic.Anthropic") as MockAnthropic, \
+                      patch("chromadb.Client", mock_chromadb.Client), \
+                      patch(
+                          "chromadb.utils.embedding_functions.SentenceTransformerEmbeddingFunction",
+                          mock_ef,
+                      ):
+                     mock_client = MagicMock()
+                     mock_client.messages.create.return_value = _make_mock_message(FAKE_ANSWER)
+                     MockAnthropic.return_value = mock_client
+
+                     from bridgekit.search import ask
+                     result = ask("What was the lift?", source=tmpdir)
+
+             assert isinstance(result, str)
+             assert len(result) > 0
+
+     def test_source_folder_calls_api_once(self):
+         with tempfile.TemporaryDirectory() as tmpdir:
+             (Path(tmpdir) / "notes.txt").write_text(
+                 "User satisfaction scores improved by 20 points.", encoding="utf-8"
+             )
+
+             mock_chromadb, mock_ef = _make_mock_chromadb()
+
+             with patch.dict(os.environ, {"ANTHROPIC_API_KEY": "test-key"}):
+                 with patch("anthropic.Anthropic") as MockAnthropic, \
+                      patch("chromadb.Client", mock_chromadb.Client), \
+                      patch(
+                          "chromadb.utils.embedding_functions.SentenceTransformerEmbeddingFunction",
+                          mock_ef,
+                      ):
+                     mock_client = MagicMock()
+                     mock_client.messages.create.return_value = _make_mock_message(FAKE_ANSWER)
+                     MockAnthropic.return_value = mock_client
+
+                     from bridgekit.search import ask
+                     ask("How did satisfaction change?", source=tmpdir)
+
+             assert mock_client.messages.create.call_count == 1
+
+     def test_source_folder_empty_raises_value_error(self):
+         with tempfile.TemporaryDirectory() as tmpdir:
+             # Folder exists but has no supported files
+             mock_chromadb, mock_ef = _make_mock_chromadb()
+
+             with patch.dict(os.environ, {"ANTHROPIC_API_KEY": "test-key"}):
+                 with patch("chromadb.Client", mock_chromadb.Client), \
+                      patch(
+                          "chromadb.utils.embedding_functions.SentenceTransformerEmbeddingFunction",
+                          mock_ef,
+                      ):
+                     from bridgekit.search import ask
+                     with pytest.raises(ValueError, match="No content found"):
+                         ask("What happened?", source=tmpdir)
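
The search tests above patch `chromadb.utils.embedding_functions.SentenceTransformerEmbeddingFunction`, while `search.py` in this diff binds that name with a `from ... import` at module load. In general, a name imported that way is looked up in the importing module, so a patch applied to the defining module may not reach it once the importing module has already been loaded. The sketch below demonstrates that behaviour with two throwaway modules; `fake_lib`, `fake_consumer`, `Embedder`, and `build` are hypothetical stand-ins and are not part of bridgekit or ChromaDB.

```python
import sys
import types
from unittest.mock import MagicMock, patch


def _real_embedder():
    return "real embedder"


# Hypothetical "library" module that defines the name.
fake_lib = types.ModuleType("fake_lib")
fake_lib.Embedder = _real_embedder
sys.modules["fake_lib"] = fake_lib

# Hypothetical "consumer" module that copies the name in with `from ... import`.
fake_consumer = types.ModuleType("fake_consumer")
exec(
    "from fake_lib import Embedder\n"
    "def build():\n"
    "    return Embedder()\n",
    fake_consumer.__dict__,
)
sys.modules["fake_consumer"] = fake_consumer

# Patching the defining module leaves the consumer's own reference untouched.
with patch("fake_lib.Embedder", MagicMock(return_value="mock embedder")):
    patched_at_source = fake_consumer.build()

# Patching the name where it is looked up is what build() actually sees.
with patch("fake_consumer.Embedder", MagicMock(return_value="mock embedder")):
    patched_at_use = fake_consumer.build()

print(patched_at_source)
print(patched_at_use)
```

Under that reading, a target such as `bridgekit.search.SentenceTransformerEmbeddingFunction` would be the analogous patch point for these tests, though whether the current targets happen to work depends on import order in a given test run.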
@@ -1,4 +0,0 @@
- from .reviewer import evaluate
-
- __version__ = "0.1.1"
- __all__ = ["evaluate"]
@@ -1 +0,0 @@
- anthropic>=0.20.0
File without changes
File without changes