PyPI - xrouter-llm - Versions diffs - 0.1.0__tar.gz - Mend

xrouter-llm 0.1.0__tar.gz

This diff shows the content of publicly released package versions from one of the supported registries. It is provided for informational purposes only and reflects the changes between versions as they appear in the public registry.

Files changed (60)

xrouter_llm-0.1.0/LICENSE ADDED

@@ -0,0 +1,159 @@
+# Xagent Source License
+**Effective Date:** February 15, 2026
+Copyright © 2026 Xorbits Inc.
+---
+## 1. Overview
+The Xagent software, source code, and associated materials (the **“Software”**) are provided under this Xagent Source License (the **“License”**).
+This License provides source-available rights for use, modification, and internal commercial deployment, while restricting certain hosted/service and competitive uses.
+> **Note:** This License is **not** an OSI-approved open source license.
+---
+## 2. Acceptance
+By using, copying, modifying, distributing, or making available the Software, you agree to be bound by this License.
+---
+## 3. Grant of Rights
+Subject to the terms and conditions of this License, the licensor (**“Licensor”**) grants you a non-exclusive, worldwide, royalty-free, non-transferable, non-sublicensable license to:
+1. **Use** the Software;
+2. **Copy** the Software;
+3. **Modify** the Software and create derivative works;
+4. **Distribute** the Software (including derivative works) in source and/or object form; and
+5. **Deploy** the Software for internal business purposes.
+All rights not expressly granted are reserved.
+---
+## 4. Restrictions
+### 4.1 Hosted / Managed Service Restriction
+Except as expressly permitted below, you may not provide the Software, or any **Restricted Functionality** of the Software, to any **Third Party** as a hosted service, managed service, or otherwise make it available for use over a network.
+This prohibition includes (without limitation):
+* offering the Software as “Xagent-as-a-Service” or a shared agent execution platform for multiple Third Parties;
+* providing multiple Third Parties access to a shared runtime, orchestration, execution, scheduling, workflow, or UI environment powered by the Software; or
+* operating a multi-tenant service in which Third Parties can create, run, manage, or monitor agents or workflows using the Software.
+### Permitted Single-Tenant Deployment
+You may deploy and operate the Software on behalf of a single Third Party customer, provided that:
+1. the deployment is dedicated to that customer (single-tenant);
+2. the customer does not share access with other Third Parties;
+3. the Software is not offered as a generalized or reusable platform service to multiple customers;
+4. such deployment is limited to that specific customer’s internal use; and
+5. all Xagent trademarks, product names, copyright notices, and branding elements remain visible and unaltered within the Software and related user interfaces.
+Removal, replacement, white-labeling, or obscuring of Xagent branding in a single-tenant deployment is prohibited unless you have obtained a separate commercial license or written authorization from the Licensor.
+For clarity, internal deployment within your own organization and your Affiliated Entities is permitted.
+### 4.2 Competitive Use Restriction
+You may not use the Software to develop, offer, or operate a product or service whose primary purpose is to provide an agent orchestration runtime or agent execution platform that competes directly with the Licensor’s commercial Xagent offering.
+### 4.3 License Protection / Technical Restrictions
+You may not remove, disable, circumvent, or materially alter any license verification, usage limitation, feature gating, entitlement checking, or similar functionality included in the Software that is intended to enforce this License or commercial terms.
+### 4.4 Notice and Attribution
+You may not alter, remove, or obscure any licensing, copyright, attribution, or other notices included in the Software.
+If you distribute a modified version of the Software, you must include prominent notices stating that you have modified the Software.
+---
+## 5. Trademarks
+This License does not grant you any rights to use the Licensor’s trademarks, service marks, trade names, logos, or product names (including **“Xagent”**), except as required for reasonable and customary use in describing the origin of the Software.
+---
+## 6. Patents
+The Licensor grants you a license under any patent claims the Licensor can license, or becomes able to license, to make, have made, use, sell, offer for sale, import, and have imported the Software, subject to the restrictions in this License.
+This patent license does not apply to any patent claims infringed by your modifications or additions.
+If you or your company make any written claim (including in a lawsuit or administrative proceeding) that the Software infringes or contributes to infringement of any patent, then your patent license under this License terminates immediately.
+---
+## 7. Distribution Conditions
+If you distribute any copy of the Software (modified or unmodified), you must ensure that recipients receive a copy of this License.
+---
+## 8. Termination and Reinstatement
+If you violate this License, your rights under this License terminate automatically.
+If the Licensor provides notice of the violation and you cure the violation within **30 days** of receiving notice, your rights will be reinstated retroactively.
+If you violate this License after reinstatement, your rights terminate automatically and permanently.
+---
+## 9. Disclaimer of Warranty
+TO THE MAXIMUM EXTENT PERMITTED BY LAW, THE SOFTWARE IS PROVIDED **“AS IS”**, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, AND NON-INFRINGEMENT.
+---
+## 10. Limitation of Liability
+TO THE MAXIMUM EXTENT PERMITTED BY LAW, IN NO EVENT WILL THE LICENSOR BE LIABLE FOR ANY DAMAGES ARISING OUT OF OR RELATING TO THIS LICENSE OR THE SOFTWARE, WHETHER IN CONTRACT, TORT, OR OTHERWISE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
+---
+## 11. Definitions
+**“Affiliated Entities”** means any entity that controls, is controlled by, or is under common control with you.
+**“Control”** means ownership of more than 50% of the voting power or equity interests, or the power to direct management or policies.
+**“Restricted Functionality”** means the core runtime and orchestration capabilities of the Software, including (without limitation):
+* agent orchestration and task execution runtime;
+* multi-agent coordination and scheduling;
+* workflow execution and planning engine;
+* tool integration runtime and connectors;
+* management UI used to create, run, manage, or monitor agents/workflows.
+**“Third Party”** means any person or entity other than you and your Affiliated Entities.
+**“You”** means the individual or entity exercising rights under this License.
+---
+## 12. Commercial Licensing
+If you wish to use the Software in a way not permitted under this License (including offering a hosted or managed service), you may obtain a commercial license from the Licensor.
+---
+## 13. Miscellaneous
+If any provision of this License is held unenforceable, the remaining provisions will remain in effect.
+This License is the entire agreement regarding the Software and supersedes any prior or contemporaneous agreements relating to the Software.
+---
+**Version 1.0 — Effective February 15, 2026**

xrouter_llm-0.1.0/PKG-INFO ADDED

@@ -0,0 +1,351 @@
+Metadata-Version: 2.4
+Name: xrouter-llm
+Version: 0.1.0
+Summary: Prompt-aware LLM routing-decision service: predicts which model can complete a prompt and picks the cheapest one.
+Author: Xorbits Inc.
+License: # Xagent Source License
+        **Effective Date:** February 15, 2026
+        Copyright © 2026 Xorbits Inc.
+        ---
+        ## 1. Overview
+        The Xagent software, source code, and associated materials (the **“Software”**) are provided under this Xagent Source License (the **“License”**).
+        This License provides source-available rights for use, modification, and internal commercial deployment, while restricting certain hosted/service and competitive uses.
+        > **Note:** This License is **not** an OSI-approved open source license.
+        ---
+        ## 2. Acceptance
+        By using, copying, modifying, distributing, or making available the Software, you agree to be bound by this License.
+        ---
+        ## 3. Grant of Rights
+        Subject to the terms and conditions of this License, the licensor (**“Licensor”**) grants you a non-exclusive, worldwide, royalty-free, non-transferable, non-sublicensable license to:
+        1. **Use** the Software;
+        2. **Copy** the Software;
+        3. **Modify** the Software and create derivative works;
+        4. **Distribute** the Software (including derivative works) in source and/or object form; and
+        5. **Deploy** the Software for internal business purposes.
+        All rights not expressly granted are reserved.
+        ---
+        ## 4. Restrictions
+        ### 4.1 Hosted / Managed Service Restriction
+        Except as expressly permitted below, you may not provide the Software, or any **Restricted Functionality** of the Software, to any **Third Party** as a hosted service, managed service, or otherwise make it available for use over a network.
+        This prohibition includes (without limitation):
+        * offering the Software as “Xagent-as-a-Service” or a shared agent execution platform for multiple Third Parties;
+        * providing multiple Third Parties access to a shared runtime, orchestration, execution, scheduling, workflow, or UI environment powered by the Software; or
+        * operating a multi-tenant service in which Third Parties can create, run, manage, or monitor agents or workflows using the Software.
+        ### Permitted Single-Tenant Deployment
+        You may deploy and operate the Software on behalf of a single Third Party customer, provided that:
+        1. the deployment is dedicated to that customer (single-tenant);
+        2. the customer does not share access with other Third Parties;
+        3. the Software is not offered as a generalized or reusable platform service to multiple customers;
+        4. such deployment is limited to that specific customer’s internal use; and
+        5. all Xagent trademarks, product names, copyright notices, and branding elements remain visible and unaltered within the Software and related user interfaces.
+        Removal, replacement, white-labeling, or obscuring of Xagent branding in a single-tenant deployment is prohibited unless you have obtained a separate commercial license or written authorization from the Licensor.
+        For clarity, internal deployment within your own organization and your Affiliated Entities is permitted.
+        ### 4.2 Competitive Use Restriction
+        You may not use the Software to develop, offer, or operate a product or service whose primary purpose is to provide an agent orchestration runtime or agent execution platform that competes directly with the Licensor’s commercial Xagent offering.
+        ### 4.3 License Protection / Technical Restrictions
+        You may not remove, disable, circumvent, or materially alter any license verification, usage limitation, feature gating, entitlement checking, or similar functionality included in the Software that is intended to enforce this License or commercial terms.
+        ### 4.4 Notice and Attribution
+        You may not alter, remove, or obscure any licensing, copyright, attribution, or other notices included in the Software.
+        If you distribute a modified version of the Software, you must include prominent notices stating that you have modified the Software.
+        ---
+        ## 5. Trademarks
+        This License does not grant you any rights to use the Licensor’s trademarks, service marks, trade names, logos, or product names (including **“Xagent”**), except as required for reasonable and customary use in describing the origin of the Software.
+        ---
+        ## 6. Patents
+        The Licensor grants you a license under any patent claims the Licensor can license, or becomes able to license, to make, have made, use, sell, offer for sale, import, and have imported the Software, subject to the restrictions in this License.
+        This patent license does not apply to any patent claims infringed by your modifications or additions.
+        If you or your company make any written claim (including in a lawsuit or administrative proceeding) that the Software infringes or contributes to infringement of any patent, then your patent license under this License terminates immediately.
+        ---
+        ## 7. Distribution Conditions
+        If you distribute any copy of the Software (modified or unmodified), you must ensure that recipients receive a copy of this License.
+        ---
+        ## 8. Termination and Reinstatement
+        If you violate this License, your rights under this License terminate automatically.
+        If the Licensor provides notice of the violation and you cure the violation within **30 days** of receiving notice, your rights will be reinstated retroactively.
+        If you violate this License after reinstatement, your rights terminate automatically and permanently.
+        ---
+        ## 9. Disclaimer of Warranty
+        TO THE MAXIMUM EXTENT PERMITTED BY LAW, THE SOFTWARE IS PROVIDED **“AS IS”**, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, AND NON-INFRINGEMENT.
+        ---
+        ## 10. Limitation of Liability
+        TO THE MAXIMUM EXTENT PERMITTED BY LAW, IN NO EVENT WILL THE LICENSOR BE LIABLE FOR ANY DAMAGES ARISING OUT OF OR RELATING TO THIS LICENSE OR THE SOFTWARE, WHETHER IN CONTRACT, TORT, OR OTHERWISE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
+        ---
+        ## 11. Definitions
+        **“Affiliated Entities”** means any entity that controls, is controlled by, or is under common control with you.
+        **“Control”** means ownership of more than 50% of the voting power or equity interests, or the power to direct management or policies.
+        **“Restricted Functionality”** means the core runtime and orchestration capabilities of the Software, including (without limitation):
+        * agent orchestration and task execution runtime;
+        * multi-agent coordination and scheduling;
+        * workflow execution and planning engine;
+        * tool integration runtime and connectors;
+        * management UI used to create, run, manage, or monitor agents/workflows.
+        **“Third Party”** means any person or entity other than you and your Affiliated Entities.
+        **“You”** means the individual or entity exercising rights under this License.
+        ---
+        ## 12. Commercial Licensing
+        If you wish to use the Software in a way not permitted under this License (including offering a hosted or managed service), you may obtain a commercial license from the Licensor.
+        ---
+        ## 13. Miscellaneous
+        If any provision of this License is held unenforceable, the remaining provisions will remain in effect.
+        This License is the entire agreement regarding the Software and supersedes any prior or contemporaneous agreements relating to the Software.
+        ---
+        **Version 1.0 — Effective February 15, 2026**
+Project-URL: Homepage, https://github.com/xorbitsai/xrouter-llm
+Project-URL: Repository, https://github.com/xorbitsai/xrouter-llm
+Keywords: llm,router,routing,model-selection,irt,openrouter
+Classifier: License :: Other/Proprietary License
+Classifier: Programming Language :: Python :: 3
+Classifier: Programming Language :: Python :: 3.10
+Classifier: Programming Language :: Python :: 3.11
+Classifier: Programming Language :: Python :: 3.12
+Classifier: Programming Language :: Python :: 3.13
+Classifier: Intended Audience :: Developers
+Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
+Requires-Python: >=3.10
+Description-Content-Type: text/markdown
+License-File: LICENSE
+Requires-Dist: huggingface-hub>=0.23
+Requires-Dist: joblib>=1.3
+Requires-Dist: numpy>=1.24
+Requires-Dist: pandas>=2.0
+Requires-Dist: pyyaml>=6.0
+Requires-Dist: scikit-learn>=1.3
+Requires-Dist: scipy>=1.10
+Provides-Extra: dev
+Requires-Dist: pytest>=7.4; extra == "dev"
+Dynamic: license-file
+<div align="center">
+<img src="./assets/xorbits-logo.png" width="180px" alt="xorbits" />
+# xrouter-llm
+</div>
+`xrouter-llm` is a prompt-aware LLM **routing-decision** service. It answers
+"which model should serve this prompt?" and records the choice — it does NOT
+call the underlying LLMs.
+## Invariant
+```text
+Do not train:  prompt -> selected model
+Train:         prompt + model -> probability the model completes the prompt
+Decide:        predicted completion + cost -> cheapest model that can complete
+```
+Completion is factored into two decoupled axes (an IRT-style model):
+```text
+P(complete) = sigmoid(a * capability(model) + b * difficulty(prompt) + c)
+```
+- **capability(model)** = the mean of the model's published `gpqa_diamond` and
+  `livecodebench` (both full-coverage on the training side). Going wider doesn't
+  help at this data scale — a flat mean dilutes and learned weights overfit at
+  37 profiled models; see AGENTS.md "Capability benchmarks". Used directly, so a
+  brand-new model's benchmarks drive its ranking.
+- **difficulty(prompt)** = a Ridge regressor on a multilingual embedding
+  (`Qwen/Qwen3-Embedding-0.6B`), trained on each prompt's empirical pass-rate.
+  Multilingual (Chinese transfers from English training data). Picked over
+  `bge-m3` by a controlled probe (`scripts/probe_qwen_difficulty.py`): higher
+  held-out Pearson and it no longer rates trivial prompts ("1+1=?") as maximally
+  hard.
+This factoring is the key lesson: a single joint classifier could not rank
+unseen models by their benchmarks (on this data, model capability barely
+explains completion *marginally* — but it does once difficulty is controlled,
+which is exactly what the factored model exploits).
+## Components
+- `IRTRouter` (`irt_router.py`): the predictor (difficulty x capability).
+- `RoutingPolicy` (`policy.py`): "cheapest model whose predicted completion
+  clears `completion_threshold`; else the highest predicted completion".
+- `serving.py` / `server.py`: HTTP routing-decision API + single-page web UI.
+- `resources/config/models/`: a per-model YAML registry of capability profiles
+  (bundled in the package; resolve with `default_models_dir()`).
+- `resources/config/routers/`: named "auto configs" — a candidate model set +
+  policy (bundled; `default_routers_dir()`).
+- `resources/models/irt_router_350k.joblib`: the trained router shipped with the
+  package (`default_model_path()`).
+## Install
+```bash
+pip install xrouter-llm        # ships a trained router + model registry
+# or, for development:
+pip install -e ".[dev]"
+```
+The wheel bundles a trained router artifact, the model-profile registry, and the
+router configs, so a fresh install can serve immediately with no extra files.
+## Datasets
+The production difficulty model is trained on **multiple datasets combined**
+(all feed the difficulty axis; only profiled models feed the capability axis):
+| Source | Type | Scale | In production train? |
+| --- | --- | --- | --- |
+| `NPULH/LLMRouterBench` (350k stream sample) | single-turn QA / code / math (22 tasks) | 37 models x ~13.8k prompts | ✅ |
+| agent-psychometrics — Terminal-Bench 2.0 | terminal agent | 89 tasks x 112 subjects | ✅ `--dataset agentic:agentic/terminalbench` |
+| agent-psychometrics — SWE-bench Verified | coding agent | 500 tasks x 134 subjects | ✅ task text joined from `princeton-nlp/SWE-bench_Verified` |
+| agent-psychometrics — SWE-bench Pro / GSO | coding agent | 730x14 / 102x15 | ⛔ ship no local task text, external join needed |
+The current artifact trains on LLMRouterBench 350k **+ Terminal-Bench +
+SWE-bench Verified** (377,997 rows / ~14,364 prompts / 283 subjects). The
+agentic matrices come from
+[agent-psychometrics](https://github.com/dariakryvosheieva/agent-psychometrics)
+(MIT) via `agentic.py`. Only the 37 profiled llmrouterbench models feed the
+capability axis; agentic subjects feed difficulty only. RouterBench
+(`withmartian/routerbench`) remains a smaller legacy baseline. Local datasets and
+trained artifacts are not committed (`data/`, `artifacts/` are gitignored).
+Adding more agentic prompt types (e.g. your own traffic) is the only way to make
+difficulty accurate for task mixes outside coding/terminal — see AGENTS.md.
+## Train
+```bash
+xrouter-llm train-irt \
+  --dataset llmrouterbench:data/raw/llmrouterbench_stream_sample_350k \
+  --dataset agentic:agentic/terminalbench \
+  --dataset agentic:agentic/swebench_verified \
+  --benchmark-profiles artifacts/profiles/llmrouterbench_350k_profiles_priority_collected.json \
+  --output artifacts/models/irt_router_350k.joblib
+```
+Diagnostics: `sweep-thresholds` (cost/completion frontier + calibration) and
+`eval-model-holdout` (leave-one-model-out generalization).
+## Serve
+The bundled router, registry, and configs are the defaults, so a bare invocation
+works out of the box:
+```bash
+xrouter-llm serve --port 8080
+```
+Override any of them to use your own trained model or registry:
+```bash
+xrouter-llm serve \
+  --model artifacts/models/irt_router_350k.joblib \
+  --models-dir config/models --routers-dir config/routers \
+  --db artifacts/calls.db --port 8080
+```
+- `GET /` — single-page UI (prompt box, config picker, decision table, history)
+- `GET /api/configs`, `POST /api/route` (`{prompt, config, task?}`),
+  `GET /api/history?limit=N`
+- Every decision is logged to SQLite (`*.db`/`*.sqlite` are gitignored — the log
+  holds user prompts).
+## Model registry
+One YAML per supported model, bundled under
+`src/xrouter_llm/resources/config/models/` (capability profile: provider, costs,
+context, published benchmarks as 0-100 percentages). `model_id` is the model's
+canonical OpenRouter slug (e.g. `anthropic/claude-opus-4.8`). The bundled
+registry is the default for `--benchmark-profiles`; point it at your own
+directory or file to extend it. Add a model = add a file.
+```python
+from xrouter_llm import IRTRouter, default_model_path, default_models_dir, load_benchmark_profiles
+router = IRTRouter.load(default_model_path())
+for profile in load_benchmark_profiles(default_models_dir()).profiles():
+    router.add_benchmark_profile(profile)
+preds = router.predict("实现一个分布式一致性算法", model_ids=["claude-opus-4-8", "deepseek-v4-pro"])
+print({p.model_id: round(p.mu, 3) for p in preds})
+```
+## License
+`xrouter-llm` is released under the **Xagent Source License** (© Xorbits Inc.) —
+see [LICENSE](LICENSE). It is source-available, **not** an OSI-approved open
+source license.
+The license text is shared verbatim with [Xagent](https://github.com/xorbitsai/xagent);
+for this project the licensed "Software" is `xrouter-llm`, and the
+"Restricted Functionality" / hosted-service and competitive-use clauses apply to
+its routing-decision and model-selection capabilities. In short: use,
+modification, and internal/single-tenant deployment are permitted; offering it as
+a multi-tenant hosted/managed service, or a directly competing service, is not.
+See [LICENSE](LICENSE) for the controlling terms.
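
Note on the package description above (not part of the released files): the "Invariant" and "Components" sections describe a two-step decision, first a completion probability `sigmoid(a * capability + b * difficulty + c)`, then "cheapest model whose predicted completion clears `completion_threshold`; else the highest predicted completion". The sketch below illustrates that logic in plain Python. It is not the package's `IRTRouter` or `RoutingPolicy`: the `Candidate` type, the `choose` function, the coefficient values, the 0..1 scaling of capability and difficulty, and the use of `>=` for "clears" are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical stand-in for a registry entry (not the package's own type)."""
    model_id: str
    capability: float  # assumed rescaled to 0..1 for this sketch
    cost: float        # any consistent unit, e.g. price per million tokens


def p_complete(capability: float, difficulty: float,
               a: float, b: float, c: float) -> float:
    """sigmoid(a * capability + b * difficulty + c), as in the README formula."""
    z = a * capability + b * difficulty + c
    return 1.0 / (1.0 + math.exp(-z))


def choose(candidates, difficulty, threshold, a=4.0, b=-4.0, c=0.0):
    """Cheapest candidate whose predicted completion clears the threshold;
    if none does, the candidate with the highest predicted completion.
    The default coefficients are illustrative, not fitted values."""
    scored = [(p_complete(m.capability, difficulty, a, b, c), m) for m in candidates]
    passing = [(p, m) for p, m in scored if p >= threshold]
    if passing:
        return min(passing, key=lambda pm: pm[1].cost)[1]
    return max(scored, key=lambda pm: pm[0])[1]


candidates = [
    Candidate("small", capability=0.3, cost=1.0),
    Candidate("large", capability=0.9, cost=10.0),
]
easy_pick = choose(candidates, difficulty=0.1, threshold=0.6)
hard_pick = choose(candidates, difficulty=0.9, threshold=0.6)
```

With these made-up numbers, both candidates clear the threshold on the easy prompt, so the cheaper one is picked; on the hard prompt neither clears it, so the fallback returns the candidate with the higher predicted completion. A negative `b` encodes the assumption that higher difficulty lowers the completion probability.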

xrouter_llm-0.1.0/README.md ADDED

@@ -0,0 +1,161 @@
+<div align="center">
+<img src="./assets/xorbits-logo.png" width="180px" alt="xorbits" />
+# xrouter-llm
+</div>
+`xrouter-llm` is a prompt-aware LLM **routing-decision** service. It answers
+"which model should serve this prompt?" and records the choice — it does NOT
+call the underlying LLMs.
+## Invariant
+```text
+Do not train:  prompt -> selected model
+Train:         prompt + model -> probability the model completes the prompt
+Decide:        predicted completion + cost -> cheapest model that can complete
+```
+Completion is factored into two decoupled axes (an IRT-style model):
+```text
+P(complete) = sigmoid(a * capability(model) + b * difficulty(prompt) + c)
+```
+- **capability(model)** = the mean of the model's published `gpqa_diamond` and
+  `livecodebench` (both full-coverage on the training side). Going wider doesn't
+  help at this data scale — a flat mean dilutes and learned weights overfit at
+  37 profiled models; see AGENTS.md "Capability benchmarks". Used directly, so a
+  brand-new model's benchmarks drive its ranking.
+- **difficulty(prompt)** = a Ridge regressor on a multilingual embedding
+  (`Qwen/Qwen3-Embedding-0.6B`), trained on each prompt's empirical pass-rate.
+  Multilingual (Chinese transfers from English training data). Picked over
+  `bge-m3` by a controlled probe (`scripts/probe_qwen_difficulty.py`): higher
+  held-out Pearson and it no longer rates trivial prompts ("1+1=?") as maximally
+  hard.
+This factoring is the key lesson: a single joint classifier could not rank
+unseen models by their benchmarks (on this data, model capability barely
+explains completion *marginally* — but it does once difficulty is controlled,
+which is exactly what the factored model exploits).
+## Components
+- `IRTRouter` (`irt_router.py`): the predictor (difficulty x capability).
+- `RoutingPolicy` (`policy.py`): "cheapest model whose predicted completion
+  clears `completion_threshold`; else the highest predicted completion".
+- `serving.py` / `server.py`: HTTP routing-decision API + single-page web UI.
+- `resources/config/models/`: a per-model YAML registry of capability profiles
+  (bundled in the package; resolve with `default_models_dir()`).
+- `resources/config/routers/`: named "auto configs" — a candidate model set +
+  policy (bundled; `default_routers_dir()`).
+- `resources/models/irt_router_350k.joblib`: the trained router shipped with the
+  package (`default_model_path()`).
+## Install
+```bash
+pip install xrouter-llm        # ships a trained router + model registry
+# or, for development:
+pip install -e ".[dev]"
+```
+The wheel bundles a trained router artifact, the model-profile registry, and the
+router configs, so a fresh install can serve immediately with no extra files.
+## Datasets
+The production difficulty model is trained on **multiple datasets combined**
+(all feed the difficulty axis; only profiled models feed the capability axis):
+| Source | Type | Scale | In production train? |
+| --- | --- | --- | --- |
+| `NPULH/LLMRouterBench` (350k stream sample) | single-turn QA / code / math (22 tasks) | 37 models x ~13.8k prompts | ✅ |
+| agent-psychometrics — Terminal-Bench 2.0 | terminal agent | 89 tasks x 112 subjects | ✅ `--dataset agentic:agentic/terminalbench` |
+| agent-psychometrics — SWE-bench Verified | coding agent | 500 tasks x 134 subjects | ✅ task text joined from `princeton-nlp/SWE-bench_Verified` |
+| agent-psychometrics — SWE-bench Pro / GSO | coding agent | 730x14 / 102x15 | ⛔ ship no local task text, external join needed |
+The current artifact trains on LLMRouterBench 350k **+ Terminal-Bench +
+SWE-bench Verified** (377,997 rows / ~14,364 prompts / 283 subjects). The
+agentic matrices come from
+[agent-psychometrics](https://github.com/dariakryvosheieva/agent-psychometrics)
+(MIT) via `agentic.py`. Only the 37 profiled llmrouterbench models feed the
+capability axis; agentic subjects feed difficulty only. RouterBench
+(`withmartian/routerbench`) remains a smaller legacy baseline. Local datasets and
+trained artifacts are not committed (`data/`, `artifacts/` are gitignored).
+Adding more agentic prompt types (e.g. your own traffic) is the only way to make
+difficulty accurate for task mixes outside coding/terminal — see AGENTS.md.
+## Train
+```bash
+xrouter-llm train-irt \
+  --dataset llmrouterbench:data/raw/llmrouterbench_stream_sample_350k \
+  --dataset agentic:agentic/terminalbench \
+  --dataset agentic:agentic/swebench_verified \
+  --benchmark-profiles artifacts/profiles/llmrouterbench_350k_profiles_priority_collected.json \
+  --output artifacts/models/irt_router_350k.joblib
+```
+Diagnostics: `sweep-thresholds` (cost/completion frontier + calibration) and
+`eval-model-holdout` (leave-one-model-out generalization).
+## Serve
+The bundled router, registry, and configs are the defaults, so a bare invocation
+works out of the box:
+```bash
+xrouter-llm serve --port 8080
+```
+Override any of them to use your own trained model or registry:
+```bash
+xrouter-llm serve \
+  --model artifacts/models/irt_router_350k.joblib \
+  --models-dir config/models --routers-dir config/routers \
+  --db artifacts/calls.db --port 8080
+```
+- `GET /` — single-page UI (prompt box, config picker, decision table, history)
+- `GET /api/configs`, `POST /api/route` (`{prompt, config, task?}`),
+  `GET /api/history?limit=N`
+- Every decision is logged to SQLite (`*.db`/`*.sqlite` are gitignored — the log
+  holds user prompts).
+## Model registry
+One YAML per supported model, bundled under
+`src/xrouter_llm/resources/config/models/` (capability profile: provider, costs,
+context, published benchmarks as 0-100 percentages). `model_id` is the model's
+canonical OpenRouter slug (e.g. `anthropic/claude-opus-4.8`). The bundled
+registry is the default for `--benchmark-profiles`; point it at your own
+directory or file to extend it. Add a model = add a file.
+```python
+from xrouter_llm import IRTRouter, default_model_path, default_models_dir, load_benchmark_profiles
+router = IRTRouter.load(default_model_path())
+for profile in load_benchmark_profiles(default_models_dir()).profiles():
+    router.add_benchmark_profile(profile)
+preds = router.predict("实现一个分布式一致性算法", model_ids=["claude-opus-4-8", "deepseek-v4-pro"])
+print({p.model_id: round(p.mu, 3) for p in preds})
+```
+## License
+`xrouter-llm` is released under the **Xagent Source License** (© Xorbits Inc.) —
+see [LICENSE](LICENSE). It is source-available, **not** an OSI-approved open
+source license.
+The license text is shared verbatim with [Xagent](https://github.com/xorbitsai/xagent);
+for this project the licensed "Software" is `xrouter-llm`, and the
+"Restricted Functionality" / hosted-service and competitive-use clauses apply to
+its routing-decision and model-selection capabilities. In short: use,
+modification, and internal/single-tenant deployment are permitted; offering it as
+a multi-tenant hosted/managed service, or a directly competing service, is not.
+See [LICENSE](LICENSE) for the controlling terms.
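
Note on the "Serve" section of the README above (not part of the released files): the README lists `POST /api/route` with a `{prompt, config, task?}` payload but does not show a client call. A minimal sketch of building such a request with the standard library follows. It assumes the payload is sent as a JSON body with a `Content-Type: application/json` header, that the server listens on `localhost:8080` as in the README's `serve` example, and that a router config named `"default"` exists; the helper name `build_route_request` is hypothetical, and the response shape is not documented in the README, so it is not parsed here.

```python
import json
import urllib.request


def build_route_request(base_url, prompt, config, task=None):
    """Build (but do not send) a POST /api/route request.

    `task` is optional in the README's payload, so it is only
    included when the caller provides it.
    """
    payload = {"prompt": prompt, "config": config}
    if task is not None:
        payload["task"] = task
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/route",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# "default" is a placeholder config name, not one confirmed by the README.
req = build_route_request(
    "http://localhost:8080", "Summarize this changelog.", config="default"
)

# To send it against a running server:
# with urllib.request.urlopen(req) as resp:
#     decision = json.load(resp)
```

Keeping request construction separate from sending makes the payload easy to inspect without a running server. Since the README states that every decision is logged to SQLite along with the user prompt, any client like this should treat the prompts it sends as stored data.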