regression-substrate 0.1.0__tar.gz

Files changed (28)
  1. regression_substrate-0.1.0/.gitignore +24 -0
  2. regression_substrate-0.1.0/CHANGES.md +39 -0
  3. regression_substrate-0.1.0/LICENSE +21 -0
  4. regression_substrate-0.1.0/PKG-INFO +104 -0
  5. regression_substrate-0.1.0/README.md +63 -0
  6. regression_substrate-0.1.0/adapters.py +185 -0
  7. regression_substrate-0.1.0/data/gold.jsonl +14 -0
  8. regression_substrate-0.1.0/data/responses.csv +25 -0
  9. regression_substrate-0.1.0/diff_engine.py +350 -0
  10. regression_substrate-0.1.0/examples/gold.jsonl +14 -0
  11. regression_substrate-0.1.0/examples/responses.csv +25 -0
  12. regression_substrate-0.1.0/gold.py +108 -0
  13. regression_substrate-0.1.0/ingest.py +255 -0
  14. regression_substrate-0.1.0/otel_exporter.py +220 -0
  15. regression_substrate-0.1.0/otel_spec.md +107 -0
  16. regression_substrate-0.1.0/pyproject.toml +35 -0
  17. regression_substrate-0.1.0/requirements.txt +2 -0
  18. regression_substrate-0.1.0/run_evaluation.py +181 -0
  19. regression_substrate-0.1.0/sequential_gate.py +243 -0
  20. regression_substrate-0.1.0/src/regression_substrate/__init__.py +36 -0
  21. regression_substrate-0.1.0/src/regression_substrate/adapters.py +185 -0
  22. regression_substrate-0.1.0/src/regression_substrate/cli.py +108 -0
  23. regression_substrate-0.1.0/src/regression_substrate/diff_engine.py +350 -0
  24. regression_substrate-0.1.0/src/regression_substrate/gold.py +108 -0
  25. regression_substrate-0.1.0/src/regression_substrate/ingest.py +255 -0
  26. regression_substrate-0.1.0/src/regression_substrate/otel_exporter.py +220 -0
  27. regression_substrate-0.1.0/src/regression_substrate/sequential_gate.py +243 -0
  28. regression_substrate-0.1.0/tests/test_gate.py +69 -0
regression_substrate-0.1.0/.gitignore
@@ -0,0 +1,24 @@
1
+ # Build artifacts
2
+ dist/
3
+ build/
4
+ *.egg-info/
5
+ src/*.egg-info/
6
+
7
+ # Python
8
+ __pycache__/
9
+ *.pyc
10
+ *.pyo
11
+
12
+ # Test / output
13
+ .pytest_cache/
14
+ out/
15
+ out_test/
16
+ out_test2/
17
+
18
+ # IDE
19
+ .vscode/
20
+ .idea/
21
+
22
+ # Env
23
+ .venv/
24
+ *.env
regression_substrate-0.1.0/CHANGES.md
@@ -0,0 +1,39 @@
1
+ # Changes — closing the deployment gaps
2
+
3
+ Four gaps were raised in moving from a locally runnable ZIP to a live
4
+ deployment. Here is what changed and what has been verified.
5
+
6
+ ## 1. Ephemeral martingale state → persistence boundary (sequential_gate.py)
7
+ `SequentialGate` now takes a pluggable `Backend`. The default is in-memory; in
8
+ production you back `append_event` with a durable append-only store (Kafka /
9
+ Postgres) and `save_checkpoint`/`load_checkpoint` with a fast cache (Redis)
10
+ keyed by `(stream, epoch)`. A cold-booting pod resumes from the checkpoint in
11
+ O(1) instead of replaying the whole log, while the log stays the audit source of
12
+ truth. **Verified:** a fresh gate sharing a backend resumes a stream's capital
13
+ exactly, and `replay()` matches the checkpoint. **Not done here:** the actual
14
+ Redis/Postgres `Backend` subclass — it's a drop-in against the documented
15
+ interface, but needs your infra to test.
16
+
17
+ ## 2. Gold-set concept drift → rolling gold + drift detection (gold.py, new)
18
+ `RollingGoldSet` (FIFO, oldest labels roll out), `sample_for_labeling` (routes a
19
+ random fraction of live traffic to a human queue so the gold set tracks the live
20
+ distribution), and `drift_report` (compares judge agreement and error_sd between
21
+ the older and the most recent half at a fixed threshold; flags `drift_warning` or
22
+ `judge_inadmissible`). **Verified:** a judge that silently breaks on a new query
23
+ type is caught (kappa 1.0 → 0.0, error_sd 0.10 → 0.41). **Not done here:** wiring
24
+ the sample queue to a real labeling tool.
25
+
26
+ ## 3. TF-IDF too brittle → pluggable embedder (ingest.py)
27
+ `auto_cluster` now takes an `embedder`. Default `tfidf_embedder` stays offline;
28
+ `sentence_transformer_embedder("all-MiniLM-L6-v2")` is a one-line swap that
29
+ groups semantically close failure modes. **Verified:** the TF-IDF path. **Not done
30
+ here:** the sentence-transformer path; the model download needs network access
31
+ to the model hub, which this build environment blocks, so the interface is
32
+ verified but not the weights.
33
+
34
+ ## 4. Small-N bootstrap instability → power floor (diff_engine.py)
35
+ `gate()` gained `min_n` (default 30). Below it the gate returns `HOLD`
36
+ (insufficient power) instead of risking a `SHIP`/`REGRESSION` on a bootstrap
37
+ that underestimates variance. **Verified:** N=8 → HOLD, N=40 → REGRESSION on the
38
+ same effect size. The bundled sample (N=6) now reports `underpowered` and relaxes
39
+ the floor for illustration only; real data keeps the default 30.
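Item 1 separates an append-only event log (the audit trail) from a keyed checkpoint (the fast resume path). The sketch below illustrates that split with an in-memory store. It is a hypothetical illustration, not the package's code: the method names `append_event`, `save_checkpoint`, `load_checkpoint` and `replay` come from CHANGES.md, but their signatures, the event payload (assumed here to be the multiplicative factor by which the monitor's capital changed), and the classes `InMemoryBackend` and `ToyMonitor` are assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class InMemoryBackend:
    """Hypothetical backend; method names follow CHANGES.md, signatures are assumed."""
    # Append-only audit log of (stream, epoch, factor) events.
    log: list[tuple[str, int, float]] = field(default_factory=list)
    # Latest (n_events, capital) per (stream, epoch) key.
    checkpoints: dict[tuple[str, int], tuple[int, float]] = field(default_factory=dict)

    def append_event(self, stream: str, epoch: int, factor: float) -> None:
        self.log.append((stream, epoch, factor))

    def save_checkpoint(self, stream: str, epoch: int, n: int, capital: float) -> None:
        self.checkpoints[(stream, epoch)] = (n, capital)

    def load_checkpoint(self, stream: str, epoch: int) -> tuple[int, float] | None:
        return self.checkpoints.get((stream, epoch))


def replay(backend: InMemoryBackend, stream: str, epoch: int) -> float:
    """Rebuild capital from the log alone (the audit source of truth)."""
    capital = 1.0
    for s, e, factor in backend.log:
        if (s, e) == (stream, epoch):
            capital *= factor
    return capital


class ToyMonitor:
    """Hypothetical stand-in for SequentialGate: capital is a running product."""

    def __init__(self, backend: InMemoryBackend, stream: str, epoch: int = 0):
        self.backend, self.stream, self.epoch = backend, stream, epoch
        cp = backend.load_checkpoint(stream, epoch)
        # O(1) resume from the checkpoint; a new stream starts at capital 1.
        self.n, self.capital = cp if cp is not None else (0, 1.0)

    def update(self, factor: float) -> float:
        self.backend.append_event(self.stream, self.epoch, factor)
        self.n += 1
        self.capital *= factor
        self.backend.save_checkpoint(self.stream, self.epoch, self.n, self.capital)
        return self.capital


backend = InMemoryBackend()
first = ToyMonitor(backend, "checkout")
for f in (1.2, 0.9, 1.5):
    first.update(f)

resumed = ToyMonitor(backend, "checkout")  # simulates a cold-booted pod
print(resumed.n, resumed.capital, replay(backend, "checkout", 0))
```

A durable implementation would keep the same three methods and change only where `log` (e.g. Kafka or Postgres) and `checkpoints` (e.g. Redis) live; `replay` then doubles as an audit check against the checkpoint.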
regression_substrate-0.1.0/LICENSE
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2026 [Your Name or Organization]
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
regression_substrate-0.1.0/PKG-INFO
@@ -0,0 +1,104 @@
1
+ Metadata-Version: 2.4
2
+ Name: regression-substrate
3
+ Version: 0.1.0
4
+ Summary: A statistically rigorous CI gate for AI: treats model outputs as distributions, penalizes unreliable judges, and decides ship / hold / regression.
5
+ License: MIT License
6
+
7
+ Copyright (c) 2026 [Your Name or Organization]
8
+
9
+ Permission is hereby granted, free of charge, to any person obtaining a copy
10
+ of this software and associated documentation files (the "Software"), to deal
11
+ in the Software without restriction, including without limitation the rights
12
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
13
+ copies of the Software, and to permit persons to whom the Software is
14
+ furnished to do so, subject to the following conditions:
15
+
16
+ The above copyright notice and this permission notice shall be included in all
17
+ copies or substantial portions of the Software.
18
+
19
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
20
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
21
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
22
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
23
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
24
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
25
+ SOFTWARE.
26
+ License-File: LICENSE
27
+ Keywords: ci,evaluation,llm,mlops,regression-testing
28
+ Requires-Python: >=3.10
29
+ Requires-Dist: numpy>=1.24
30
+ Requires-Dist: scipy>=1.10
31
+ Provides-Extra: clustering
32
+ Requires-Dist: scikit-learn>=1.2; extra == 'clustering'
33
+ Provides-Extra: dev
34
+ Requires-Dist: pytest>=7.0; extra == 'dev'
35
+ Requires-Dist: scikit-learn>=1.2; extra == 'dev'
36
+ Provides-Extra: embeddings
37
+ Requires-Dist: sentence-transformers>=2.2; extra == 'embeddings'
38
+ Provides-Extra: langsmith
39
+ Requires-Dist: langsmith>=0.1; extra == 'langsmith'
40
+ Description-Content-Type: text/markdown
41
+
42
+ # regression-substrate
43
+
44
+ A statistically rigorous CI gate for AI systems. It treats model outputs as
45
+ distributions, penalizes unreliable judges, and returns a `SHIP` / `HOLD` /
46
+ `REGRESSION` verdict you can block a pull request on.
47
+
48
+ ## Install
49
+
50
+ ```bash
51
+ pip install regression-substrate # core (numpy, scipy)
52
+ pip install "regression-substrate[clustering]" # + auto_cluster (scikit-learn)
53
+ pip install "regression-substrate[langsmith]" # + LangSmith adapter
54
+ ```
55
+
56
+ For development (editable install with test dependencies):
57
+
58
+ ```bash
59
+ git clone <repo-url>
60
+ cd regression-substrate
61
+ pip install -e ".[dev]"
62
+ ```
63
+
64
+ ## CLI (drop into CI)
65
+
66
+ ```bash
67
+ regsub --data evals.csv --gold gold.jsonl --version-a v1 --version-b v2 --out out/
68
+ # exit 0 = SHIP / SHIP_WITH_FLAGS; 1 = REGRESSION / HOLD; 2 = JUDGE_INADMISSIBLE
69
+ ```
70
+
71
+ Because `regsub` exits non-zero on `REGRESSION` or `HOLD`, one line in your CI pipeline is enough to block the PR.
72
+
73
+ ## Library
74
+
75
+ ```python
76
+ from regression_substrate import gate, load_from_csv, Judge
77
+
78
+ judge = Judge(my_llm_scorer) # any (input, response) -> [0,1]
79
+ cal = judge.calibrate(gold_records) # -> kappa, error_sd
80
+ sa, sb, cids, meta = load_from_csv("evals.csv", "v1", "v2")
81
+ decision = gate(sa, sb, cids, judge_error_sd=cal["error_sd"], kappa=cal["kappa"])
82
+ print(decision.verdict)
83
+ ```
84
+
85
+ ## What's inside
86
+
87
+ | Module | Purpose |
88
+ |---|---|
89
+ | `diff_engine` | Offline gate: variance components, bootstrap CI, cluster scan, BH/e-BH |
90
+ | `ingest` | Loaders (JSONL, CSV), judge harness, auto-clustering |
91
+ | `sequential_gate` | Always-valid martingale monitor for continuous deployment |
92
+ | `gold` | Rolling gold set, drift detection, forced sampling for labeling |
93
+ | `adapters` | Vendor flatteners (LangSmith preset) |
94
+ | `otel_exporter` | OTel-aligned span capture path |
95
+ | `cli` | The `regsub` console command |
96
+
97
+ ## Running tests
98
+
99
+ ```bash
100
+ pip install -e ".[dev]"
101
+ pytest
102
+ ```
103
+
104
+ See `examples/` for a runnable dataset and `CHANGES.md` for design decisions.
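The `gate()` call in the library example returns `HOLD` when there are too few paired inputs (the `min_n` floor described in CHANGES.md, item 4). The sketch below shows the idea with a plain percentile bootstrap over per-input deltas. It is a simplified, hypothetical `toy_gate`, not `diff_engine.gate()`: it ignores judge error, kappa and clusters, and its decision rule (REGRESSION only when the whole CI lies below zero) is an assumption.

```python
import numpy as np


def toy_gate(scores_a, scores_b, min_n=30, alpha=0.05, n_boot=2000, seed=0):
    """Hypothetical simplified gate: power floor, then a paired bootstrap CI."""
    a = np.asarray(scores_a, dtype=float)
    b = np.asarray(scores_b, dtype=float)
    n = len(a)
    if n < min_n:
        # Too few paired inputs to trust a bootstrap CI: refuse to decide.
        return "HOLD", None
    deltas = b - a  # per-input paired difference, version B minus version A
    rng = np.random.default_rng(seed)
    # Resample inputs with replacement and record the mean delta each time.
    idx = rng.integers(0, n, size=(n_boot, n))
    boot_means = deltas[idx].mean(axis=1)
    lo, hi = np.quantile(boot_means, [alpha / 2, 1 - alpha / 2])
    # Assumed rule: REGRESSION only if the whole CI lies below zero.
    verdict = "REGRESSION" if hi < 0 else "SHIP"
    return verdict, (float(lo), float(hi))


rng = np.random.default_rng(1)
a = 0.90 + rng.normal(0, 0.05, 40)
b = 0.60 + rng.normal(0, 0.05, 40)
print(toy_gate(a[:8], b[:8]))  # N=8, below min_n
print(toy_gate(a, b))          # N=40, same effect size
```

The same effect size yields `HOLD` at N=8 and `REGRESSION` at N=40, which mirrors the behavior CHANGES.md reports for the real gate.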
regression_substrate-0.1.0/README.md
@@ -0,0 +1,63 @@
1
+ # regression-substrate
2
+
3
+ A statistically rigorous CI gate for AI systems. It treats model outputs as
4
+ distributions, penalizes unreliable judges, and returns a `SHIP` / `HOLD` /
5
+ `REGRESSION` verdict you can block a pull request on.
6
+
7
+ ## Install
8
+
9
+ ```bash
10
+ pip install regression-substrate # core (numpy, scipy)
11
+ pip install "regression-substrate[clustering]" # + auto_cluster (scikit-learn)
12
+ pip install "regression-substrate[langsmith]" # + LangSmith adapter
13
+ ```
14
+
15
+ For development (editable install with test dependencies):
16
+
17
+ ```bash
18
+ git clone <repo-url>
19
+ cd regression-substrate
20
+ pip install -e ".[dev]"
21
+ ```
22
+
23
+ ## CLI (drop into CI)
24
+
25
+ ```bash
26
+ regsub --data evals.csv --gold gold.jsonl --version-a v1 --version-b v2 --out out/
27
+ # exit 0 = SHIP / SHIP_WITH_FLAGS; 1 = REGRESSION / HOLD; 2 = JUDGE_INADMISSIBLE
28
+ ```
29
+
30
+ Because `regsub` exits non-zero on `REGRESSION` or `HOLD`, one line in your CI pipeline is enough to block the PR.
31
+
32
+ ## Library
33
+
34
+ ```python
35
+ from regression_substrate import gate, load_from_csv, Judge
36
+
37
+ judge = Judge(my_llm_scorer) # any (input, response) -> [0,1]
38
+ cal = judge.calibrate(gold_records) # -> kappa, error_sd
39
+ sa, sb, cids, meta = load_from_csv("evals.csv", "v1", "v2")
40
+ decision = gate(sa, sb, cids, judge_error_sd=cal["error_sd"], kappa=cal["kappa"])
41
+ print(decision.verdict)
42
+ ```
43
+
44
+ ## What's inside
45
+
46
+ | Module | Purpose |
47
+ |---|---|
48
+ | `diff_engine` | Offline gate: variance components, bootstrap CI, cluster scan, BH/e-BH |
49
+ | `ingest` | Loaders (JSONL, CSV), judge harness, auto-clustering |
50
+ | `sequential_gate` | Always-valid martingale monitor for continuous deployment |
51
+ | `gold` | Rolling gold set, drift detection, forced sampling for labeling |
52
+ | `adapters` | Vendor flatteners (LangSmith preset) |
53
+ | `otel_exporter` | OTel-aligned span capture path |
54
+ | `cli` | The `regsub` console command |
55
+
56
+ ## Running tests
57
+
58
+ ```bash
59
+ pip install -e ".[dev]"
60
+ pytest
61
+ ```
62
+
63
+ See `examples/` for a runnable dataset and `CHANGES.md` for design decisions.
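If you call the library directly instead of the `regsub` CLI, you can reproduce the README's exit-code contract yourself. This is a minimal sketch: only the verdict-to-code mapping comes from the README; the names `EXIT_CODES` and `exit_code_for` are hypothetical, and the fail-closed fallback for unknown verdicts is an assumption, not documented `regsub` behavior.

```python
# Exit-code contract as documented in the README.
EXIT_CODES = {
    "SHIP": 0,
    "SHIP_WITH_FLAGS": 0,
    "REGRESSION": 1,
    "HOLD": 1,
    "JUDGE_INADMISSIBLE": 2,
}


def exit_code_for(verdict: str) -> int:
    """Map a gate verdict to a process exit code.

    Unknown verdicts fall back to 1 so the pipeline fails closed
    (an assumption, not documented behavior of `regsub`).
    """
    return EXIT_CODES.get(verdict, 1)


for v in ("SHIP", "SHIP_WITH_FLAGS", "HOLD", "JUDGE_INADMISSIBLE"):
    print(v, exit_code_for(v))
```

In a script built on the library example, ending with `sys.exit(exit_code_for(decision.verdict))` would give CI the same signal as the CLI.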
regression_substrate-0.1.0/adapters.py
@@ -0,0 +1,185 @@
1
+ """
2
+ adapters.py — pull from vendor observability platforms into the 7-field schema.
3
+
4
+ HONESTY NOTE — READ THIS:
5
+ * The FLATTENING logic below is tested (see __main__) against a synthetic
6
+ fixture shaped like a vendor's documented run/feedback model. That transform
7
+ is verified against that fixture.
8
+ * The LIVE FETCH (`load_from_langsmith`) is SDK- and auth-dependent and is NOT
9
+ exercised here. Vendor schemas drift, so the field paths in the presets are
10
+ best-effort and MUST be verified against the platform's current docs before
11
+ you trust them in production.
12
+
13
+ Design: don't hardwire any one vendor. A `TraceMap` says where each field lives
14
+ inside one run record; `flatten_runs` collapses (possibly nested, multi-step)
15
+ runs + feedback into flat 7-field records that ingest.assemble_records consumes.
16
+ A vendor is then just a TraceMap preset plus a thin fetch wrapper.
17
+ """
18
+
19
+ from __future__ import annotations
20
+ from dataclasses import dataclass
21
+ import json
22
+
23
+
24
+ def _dig(obj, path: str, default=None):
25
+ """Read a nested field by dotted path, e.g. 'extra.metadata.version'."""
26
+ cur = obj
27
+ for key in path.split("."):
28
+ if isinstance(cur, dict) and key in cur:
29
+ cur = cur[key]
30
+ else:
31
+ return default
32
+ return cur
33
+
34
+
35
+ def _canon(x) -> str:
36
+ """Canonicalize an input/output payload to a stable string. The `input`
37
+ string is what groups replicates and pairs versions, so it must be stable."""
38
+ if isinstance(x, str):
39
+ return x
40
+ if isinstance(x, dict):
41
+ for k in ("input", "question", "query", "text", "prompt",
42
+ "output", "answer", "result", "response"):
43
+ if isinstance(x.get(k), str):
44
+ return x[k]
45
+ return json.dumps(x, sort_keys=True)
46
+ return str(x)
47
+
48
+
49
+ @dataclass
50
+ class TraceMap:
51
+ """Where the fields live inside one vendor run record."""
52
+ input_path: str
53
+ output_path: str
54
+ version_path: str # MUST have been logged by the team; no version => unpairable
55
+ score_key: str # which feedback key carries the quality score
56
+ run_type_path: str = "run_type"
57
+ parent_path: str = "parent_run_id"
58
+ id_path: str = "id"
59
+ cluster_path: str | None = None
60
+ score_scale: tuple = (0.0, 1.0)
61
+
62
+
63
+ # Best-effort preset. VERIFY these paths against current LangSmith docs.
64
+ LANGSMITH = TraceMap(
65
+ input_path="inputs",
66
+ output_path="outputs",
67
+ version_path="extra.metadata.version",
68
+ score_key="quality",
69
+ run_type_path="run_type",
70
+ parent_path="parent_run_id",
71
+ id_path="id",
72
+ cluster_path="extra.metadata.cluster",
73
+ score_scale=(0.0, 1.0),
74
+ )
75
+
76
+
77
+ def flatten_runs(runs: list[dict], feedback: list[dict], tmap: TraceMap,
78
+ unit: str = "root") -> list[dict]:
79
+ """Collapse runs + feedback into flat 7-field records.
80
+
81
+ unit="root" -> evaluate whole trajectories (input=root input, response=root
82
+ output). Child LLM/tool runs are diagnostic and dropped.
83
+ unit=<type> -> evaluate a component instead (e.g. "retriever", "llm").
84
+ """
85
+ by_run: dict[str, dict] = {}
86
+ for f in feedback:
87
+ by_run.setdefault(f["run_id"], {})[f["key"]] = f.get("score")
88
+
89
+ lo, hi = tmap.score_scale
90
+ skipped_no_version = skipped_no_score = 0
91
+ records = []
92
+ for r in runs:
93
+ is_root = _dig(r, tmap.parent_path) is None
94
+ if unit == "root":
95
+ if not is_root:
96
+ continue
97
+ elif _dig(r, tmap.run_type_path) != unit:
98
+ continue
99
+
100
+ raw = (by_run.get(_dig(r, tmap.id_path)) or {}).get(tmap.score_key)
101
+ if raw is None:
102
+ skipped_no_score += 1
103
+ continue
104
+ version = _dig(r, tmap.version_path)
105
+ if version is None:
106
+ skipped_no_version += 1
107
+ continue
108
+
109
+ score = (float(raw) - lo) / (hi - lo) if hi != lo else float(raw)
110
+ rec = {
111
+ "input": _canon(_dig(r, tmap.input_path)),
112
+ "version": str(version),
113
+ "response": _canon(_dig(r, tmap.output_path)),
114
+ "score": max(0.0, min(1.0, score)),
115
+ }
116
+ if tmap.cluster_path:
117
+ c = _dig(r, tmap.cluster_path)
118
+ if c is not None:
119
+ rec["cluster"] = c
120
+ records.append(rec)
121
+
122
+ if skipped_no_version:
123
+ print(f" [adapter] WARNING: skipped {skipped_no_version} runs with no "
124
+ f"version tag at '{tmap.version_path}' -- they cannot be paired.")
125
+ if skipped_no_score:
126
+ print(f" [adapter] note: skipped {skipped_no_score} runs with no "
127
+ f"'{tmap.score_key}' feedback.")
128
+ return records
129
+
130
+
131
+ def load_from_langsmith(project: str, version_a: str, version_b: str,
132
+ tmap: TraceMap = LANGSMITH, unit: str = "root"):
133
+ """LIVE fetch + flatten + assemble. NOT exercised in this repo -- requires
134
+ the `langsmith` SDK and LANGSMITH_API_KEY, and the SDK call signatures and
135
+ schema below must be verified against current LangSmith docs."""
136
+ from langsmith import Client # raises if not installed
137
+ from ingest import assemble_records
138
+
139
+ client = Client()
140
+ runs = [r.dict() for r in client.list_runs(project_name=project)]
141
+ run_ids = [r["id"] for r in runs]
142
+ feedback = [f.dict() for f in client.list_feedback(run_ids=run_ids)]
143
+ records = flatten_runs(runs, feedback, tmap, unit=unit)
144
+ return assemble_records(records, version_a, version_b)
145
+
146
+
147
+ # --------------------------------------------------------------------------- #
148
+ # Demo: flatten a SYNTHETIC LangSmith-shaped fixture -> gate(). Proves the
149
+ # transform (nested trajectory collapse, version extraction, score scaling,
150
+ # replicate derivation) without touching the live API.
151
+ # --------------------------------------------------------------------------- #
152
+
153
+ if __name__ == "__main__":
154
+ import numpy as np
155
+ from ingest import assemble_records, validate_records
156
+ from diff_engine import gate
157
+
158
+ rng = np.random.default_rng(0)
159
+ inputs = ([("billing", f"billing question {i}") for i in range(3)] +
160
+ [("general", f"general question {i}") for i in range(3)])
161
+
162
+ runs, feedback, rid = [], [], 0
163
+ for cluster, q in inputs:
164
+ for ver, base in [("v1", 0.90), ("v2", 0.20 if cluster == "billing" else 0.78)]:
165
+ for _rep in range(2): # two replicates per (input, version)
166
+ root = f"run-{rid}"; rid += 1
167
+ runs.append({"id": root, "run_type": "chain", "parent_run_id": None,
168
+ "inputs": {"question": q}, "outputs": {"answer": "..."},
169
+ "extra": {"metadata": {"version": ver, "cluster": cluster}}})
170
+ child = f"run-{rid}"; rid += 1 # a nested LLM step (must be dropped)
171
+ runs.append({"id": child, "run_type": "llm", "parent_run_id": root,
172
+ "inputs": {}, "outputs": {}, "extra": {"metadata": {"version": ver}}})
173
+ score = float(np.clip(base + rng.normal(0, 0.03), 0, 1))
174
+ feedback.append({"run_id": root, "key": "quality", "score": round(score, 3)})
175
+
176
+ records = flatten_runs(runs, feedback, LANGSMITH, unit="root")
177
+ print(f"FLATTEN: {len(runs)} raw runs -> {len(records)} flat records "
178
+ f"(child runs dropped under unit='root')")
179
+ print(" sample:", records[0])
180
+ print(" validation problems:", validate_records(records) or "none")
181
+
182
+ sa, sb, cids, meta = assemble_records(records, "v1", "v2")
183
+ print(" assembled:", meta)
184
+ dec = gate(sa, sb, cids, judge_error_sd=0.05, kappa=0.78, alpha=0.05)
185
+ print(" verdict:", dec.verdict, "| CI:", tuple(round(x, 3) for x in dec.delta_ci))
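`TraceMap.score_scale` lets a vendor that records feedback on a different scale feed the same pipeline. The one-line rescaling inside `flatten_runs` is copied below as a standalone function (the name `rescale` is ours, and the 1-5 Likert preset is hypothetical) so its behavior is easy to check.

```python
def rescale(raw: float, lo: float, hi: float) -> float:
    """Same arithmetic as flatten_runs: map [lo, hi] onto [0, 1], then clip."""
    score = (float(raw) - lo) / (hi - lo) if hi != lo else float(raw)
    return max(0.0, min(1.0, score))


# A hypothetical vendor logging quality on a 1-5 Likert scale
# would set score_scale=(1.0, 5.0) in its TraceMap.
for raw in (1, 3, 4, 5, 7):
    print(raw, rescale(raw, 1.0, 5.0))
```

Out-of-range raw scores (the `7` above) are clipped silently rather than rejected, so a misconfigured `score_scale` can pin many records to 0 or 1; checking the raw range before trusting a new preset is cheap.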
regression_substrate-0.1.0/data/gold.jsonl
@@ -0,0 +1,14 @@
1
+ {"input": "refund question", "response": "I'm sorry, I'll issue a refund within 5-7 business days.", "human": 0.9}
2
+ {"input": "billing question", "response": "Please contact billing.", "human": 0.2}
3
+ {"input": "password question", "response": "You can reset your password using the link we email you.", "human": 0.85}
4
+ {"input": "billing question", "response": "Check your bill.", "human": 0.25}
5
+ {"input": "hours question", "response": "We're available 24/7 to help.", "human": 0.8}
6
+ {"input": "vague question", "response": "Maybe.", "human": 0.05}
7
+ {"input": "email question", "response": "Head to account settings to update your email.", "human": 0.85}
8
+ {"input": "vague question", "response": "It depends.", "human": 0.15}
9
+ {"input": "duplicate question", "response": "I'll refund the duplicate charge to your account right away.", "human": 0.9}
10
+ {"input": "overcharge question", "response": "Look at your invoice.", "human": 0.3}
11
+ {"input": "email question", "response": "Sure — you can update your email in settings.", "human": 0.8}
12
+ {"input": "apology question", "response": "No idea, sorry.", "human": 0.2}
13
+ {"input": "hours question", "response": "We're open 24/7.", "human": 0.6}
14
+ {"input": "reset question", "response": "I'm happy to help reset your account.", "human": 0.75}
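Each gold record pairs a response with a human score in [0, 1], which is what `judge.calibrate(gold_records)` consumes to produce `kappa` and `error_sd`. How the package computes those two numbers is not shown in this diff. The sketch below assumes `error_sd` is the sample standard deviation of (judge minus human) and `kappa` is Cohen's kappa after thresholding both scores at 0.5; `calibrate`, `cohen_kappa` and the word-count `toy_scorer` are all hypothetical.

```python
import json
import math


def cohen_kappa(x: list[int], y: list[int]) -> float:
    """Cohen's kappa for two binary label lists of equal length."""
    n = len(x)
    po = sum(a == b for a, b in zip(x, y)) / n  # observed agreement
    px, py = sum(x) / n, sum(y) / n
    pe = px * py + (1 - px) * (1 - py)          # chance agreement
    # pe == 1 only when both raters give one identical constant label.
    return 1.0 if pe == 1 else (po - pe) / (1 - pe)


def calibrate(gold, scorer, threshold=0.5):
    human = [g["human"] for g in gold]
    judge = [scorer(g["input"], g["response"]) for g in gold]
    errs = [j - h for j, h in zip(judge, human)]
    mean = sum(errs) / len(errs)
    error_sd = math.sqrt(sum((e - mean) ** 2 for e in errs) / (len(errs) - 1))
    kappa = cohen_kappa([int(h >= threshold) for h in human],
                        [int(j >= threshold) for j in judge])
    return {"kappa": kappa, "error_sd": error_sd}


def toy_scorer(inp: str, response: str) -> float:
    # Deliberately naive: longer answers score higher, capped at 1.
    return min(1.0, len(response.split()) / 10)


GOLD_JSONL = """\
{"input": "refund question", "response": "I'm sorry, I'll issue a refund within 5-7 business days.", "human": 0.9}
{"input": "billing question", "response": "Please contact billing.", "human": 0.2}
{"input": "vague question", "response": "Maybe.", "human": 0.05}
{"input": "email question", "response": "Head to account settings to update your email.", "human": 0.85}
"""
gold = [json.loads(line) for line in GOLD_JSONL.splitlines() if line.strip()]
cal = calibrate(gold, toy_scorer)
print(cal)
```

On these four rows the toy scorer agrees with the humans on every pass/fail call (kappa 1.0) while still missing individual scores by about 0.07 on average, which is why the README passes both numbers to `gate()` rather than either one alone.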
regression_substrate-0.1.0/data/responses.csv
@@ -0,0 +1,25 @@
1
+ input,version,replicate,cluster,response
2
+ How do I get a refund for a double charge?,v1,0,billing,"I'm sorry about the double charge — I can issue a refund, which posts in 5-7 business days."
3
+ How do I get a refund for a double charge?,v1,1,billing,"Happy to help: we'll refund the double charge, usually within 5-7 business days."
4
+ How do I get a refund for a double charge?,v2,0,billing,Please contact billing.
5
+ How do I get a refund for a double charge?,v2,1,billing,That's a billing issue.
6
+ Why was I billed twice this month?,v1,0,billing,"It looks like a duplicate charge — I'll refund the extra payment right away."
7
+ Why was I billed twice this month?,v1,1,billing,"Sorry about that; that's a duplicate, and I'll refund it to your account."
8
+ Why was I billed twice this month?,v2,0,billing,Possibly a duplicate.
9
+ Why was I billed twice this month?,v2,1,billing,Check your bill.
10
+ "I think I was overcharged, what should I do?",v1,0,billing,"I can review the overcharge and refund the difference to your account."
11
+ "I think I was overcharged, what should I do?",v1,1,billing,"Sorry for the overcharge — I'll refund the difference in a few business days."
12
+ "I think I was overcharged, what should I do?",v2,0,billing,Overcharges can happen.
13
+ "I think I was overcharged, what should I do?",v2,1,billing,Look at your invoice.
14
+ What are your support hours?,v1,0,general,"We're available 24/7 to help, any day of the week."
15
+ What are your support hours?,v1,1,general,"Our team is here 24/7, including weekends."
16
+ What are your support hours?,v2,0,general,Support is 24/7.
17
+ What are your support hours?,v2,1,general,We're open 24/7.
18
+ How do I reset my password?,v1,0,general,"To reset your password, click the reset link we email you."
19
+ How do I reset my password?,v1,1,general,Use the password reset link we send to your email address.
20
+ How do I reset my password?,v2,0,general,Use the reset link.
21
+ How do I reset my password?,v2,1,general,Reset it from the login page.
22
+ Where do I update my email address?,v1,0,general,You can update your email under Account settings.
23
+ Where do I update my email address?,v1,1,general,Head to settings to change your email address.
24
+ Where do I update my email address?,v2,0,general,Update it in settings.
25
+ Where do I update my email address?,v2,1,general,Under account settings.
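`responses.csv` stores two replicates per (input, version) plus a cluster label and has no score column, so scores presumably come from the judge. Before any scoring, a loader has to group replicates and keep only inputs seen under both versions, so A and B can be compared input by input. A minimal sketch of that grouping with the standard `csv` module; `pair_responses` is hypothetical and is not the package's `load_from_csv`.

```python
import csv
import io
from collections import defaultdict


def pair_responses(csv_text: str, version_a: str, version_b: str):
    """Group replicate responses by input and version; keep fully paired inputs."""
    groups = defaultdict(lambda: defaultdict(list))  # input -> version -> responses
    clusters: dict[str, str] = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        # The replicate column is not needed for grouping; row order is kept.
        groups[row["input"]][row["version"]].append(row["response"])
        clusters[row["input"]] = row.get("cluster") or "unclustered"
    paired = {q: (v[version_a], v[version_b]) for q, v in groups.items()
              if version_a in v and version_b in v}
    return paired, {q: clusters[q] for q in paired}


SAMPLE = '''input,version,replicate,cluster,response
How do I reset my password?,v1,0,general,"To reset your password, click the reset link we email you."
How do I reset my password?,v1,1,general,Use the password reset link we send to your email address.
How do I reset my password?,v2,0,general,Use the reset link.
How do I reset my password?,v2,1,general,Reset it from the login page.
"I think I was overcharged, what should I do?",v1,0,billing,"I can review the overcharge and refund the difference to your account."
'''
paired, cids = pair_responses(SAMPLE, "v1", "v2")
print(paired)
print(cids)
```

The quoted input containing a comma parses correctly, and because it appears only under `v1` in this excerpt it is dropped rather than compared against nothing.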