language-models 0.1.0 → 0.2.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/.turbo/turbo-build.log +5 -0
- package/.turbo/turbo-test.log +18 -0
- package/README.md +56 -142
- package/data/models.json +18805 -0
- package/dist/aliases.d.ts +5 -1
- package/dist/aliases.d.ts.map +1 -0
- package/dist/aliases.js +40 -4
- package/dist/aliases.js.map +1 -0
- package/dist/data/models.json +18805 -0
- package/dist/index.d.ts +10 -7
- package/dist/index.d.ts.map +1 -0
- package/dist/index.js +10 -7
- package/dist/index.js.map +1 -0
- package/dist/models.d.ts +94 -167
- package/dist/models.d.ts.map +1 -0
- package/dist/models.js +109 -69803
- package/dist/models.js.map +1 -0
- package/package.json +25 -23
- package/scripts/fetch-models.ts +115 -0
- package/src/aliases.test.ts +319 -0
- package/src/aliases.ts +48 -4
- package/src/index.test.ts +400 -0
- package/src/index.ts +20 -9
- package/src/models.test.ts +392 -0
- package/src/models.ts +174 -0
- package/tsconfig.json +5 -15
- package/vitest.config.ts +4 -14
- package/.editorconfig +0 -10
- package/.gitattributes +0 -4
- package/.releaserc.js +0 -129
- package/LICENSE +0 -21
- package/dist/parser.d.ts +0 -86
- package/dist/parser.js +0 -390
- package/dist/providers.d.ts +0 -1
- package/dist/providers.js +0 -76
- package/dist/types.d.ts +0 -127
- package/dist/types.js +0 -1
- package/eslint.config.js +0 -3
- package/generate/build-models.ts +0 -150
- package/generate/overwrites.ts +0 -12
- package/publish.js +0 -32
- package/roadmap.md +0 -54
- package/src/models.d.ts +0 -170
- package/src/models.js +0 -70434
- package/src/parser.ts +0 -485
- package/src/providers.ts +0 -79
- package/src/types.ts +0 -135
- package/tests/parser.test.ts +0 -11
- package/tests/regex.test.ts +0 -42
- package/tests/selector.test.ts +0 -53
- package/tests/setup.ts +0 -0
package/.turbo/turbo-test.log
ADDED
|
@@ -0,0 +1,18 @@
+
+> language-models@0.0.1 test /Users/nathanclevenger/projects/mdx.org.ai/primitives/packages/language-models
+> vitest
+
+
+DEV v2.1.9 /Users/nathanclevenger/projects/mdx.org.ai/primitives/packages/language-models
+
+✓ src/aliases.test.ts (48 tests) 8ms
+✓ src/models.test.ts (41 tests) 11ms
+✓ src/index.test.ts (39 tests) 14ms
+
+Test Files 3 passed (3)
+Tests 128 passed (128)
+Start at 10:31:48
+Duration 354ms (transform 130ms, setup 0ms, collect 167ms, tests 33ms, environment 0ms, prepare 183ms)
+
+PASS Waiting for file changes...
+press h to show help, press q to quit
package/README.md
CHANGED
|
@@ -1,165 +1,79 @@
-#
-
-Utilities and Tools for the AI SDK, Functions, Workflows, Observability, and Evals
-
-open questions:
-
-We need to support provider/creator/model
-Do we need or want a @ sign?
-Do we also support a creator/model syntax without provider? I think probably
-I think we should follow/embrace openrouter's syntax wherever possible - extending it though to add our requirements
-Do we also have our own version of openrouter/auto, openrouter/auto:online
-Can we tie the routing into the contents of the message and/or priority? (like performance, latency, throughput, cost, etc?)
-How do we handle reasoning? Follow the :reasoning flag from openrouter?
-Many new open models do not initially support tools or structured output, as that requires a lot of work by the hosting provider to make that function ... do we want a composite type tool that could use a fast/cheap model like gemini-2-flash-lite or 4o-mini to transform the output of the first model into the specified structured output?
-How do we want to handle other capabilities/tools?
-Should we route :online to native search models like gemini, perplexity, or the new 4o-search? or do we just follow the OpenRouter convention of injecting the Search results into the context window before calling?
-Do we want to support more general purpose tools? (like https://agentic.so/intro)
-How can we also support our own secure code execution tool (JS not python like Google / OAI)
-
-Clearly we will need to phase and iterate ... but we have to think very carefully because once we start using the API across multiple projects, it's going to be hard to change
-
-@{provider}/{creator}/{model}:{config,capabilities,tools,priorities}
-
-{creator}/{model}:{config,capabilities,tools,priorities}
-?model=deepseek-ai/deepseek-r1-distill-qwen-32b
-
-?model=@openai/openai/gpt-4o-search-preview
-
-?model=openai/gpt-4o:code,online
-
-@openrouter/deepseek-ai/deepseek-r1-distill-qwen-32b
-@cloudflare/deepseek-ai/deepseek-r1-distill-qwen-32b
-@google-vertex/deepseek-ai/deepseek-r1-distill-qwen-32b
-@google-ai-studio/anthropic/claude-3.7-sonnet:thinking
-@google-vertex/anthropic/claude-3.7-sonnet:thinking
-@open-router/anthropic/claude-3.7-sonnet:thinking-low,online,nitro
-
-Use Cases:
-
-Evals - we need to easily and dynamically/programmatically change models, settings, tools, configuration, and mode
-(for example - we need to test if something does better with or without reasoning ... and within reasoning, we need low, medium, and high models ... We also in some cases must force structured outputs to be structured outputs, because if tool use is required, say by an anthropic model, that supports structured outputs via tools, then that is incompatible ... but also, there are essentially 4 different ways to get Objects that match a certain schema:
-structured_output: the best by far, but very limited support ... I think only 3 providers support this today
-tool_use: There is support for many more providers with tool use, but only a subset actually enforces guaranteed schema ... but also has limitations on number of tools and forced tool to use
-response_format: Supported on a majority of providers and models - but not universal - this guarantees valid JSON responses, but not any guarantee about the schema. This JSON mode can also be used without a schema to just force the model to respond in a structured data form vs normal text/markdown
-system prompt: If the model/provider does not support any of the first 3, then you can just ask nicely ... but depending on the model, this will fail 10-50% of the time
-Experimentation - Humans or LLM-as-Judge can pick the preferred response from 2 or more models (ie. compare 4o, o2, 3.7, 3.7:reasoning, flash 2, flash 2.0 reasoning
-Specify best model - no cost/price requirements ... 4.5, o1-high, o3-mini-high, r1, 3.7, 3.7:reasoning-high, Flash 2 Pro, Flash 2 Reasoning
-Specify only reasoning:
-at runtime, be able to tweak provider/model/settings/tools via a simple query param for example ... ?model=*:reasoning+code would route that to any and all models with reasoning and a code execution tool ... currently adding tool, changing config, etc requires code changes ... we need all of these variables to be tweaked/edited/stored/evaluated at runtime without code changes
-also we need to opt in/out of caching (and maybe even logging req/res bodies in sensitive PII situations)
-also probably seed ...
-
-## Usage
-
-Examples:
-
-```ts
-// The name of the exported function should be changed, I just wanted to get the ball rolling
-import { getSupportedModel } from '@drivly/ai-utils'
-
-const model = getSupportedModel('@openai/openai/gpt-4o:code,online')
-const model = getSupportedModel('@anthropic/anthropic/claude-3.7-sonnet:thinking,code,online')
-const model = getSupportedModel([
-// Fallback support if a certain model is down / doesnt have the full capabilities supported
-req.args.model, // Attempt to use the model from the request, otherwise fallback to the next one
-'@openai/openai/gpt-4o:code,online,thinking',
-'@anthropic/anthropic/claude-3.7-sonnet:thinking,code,online',
-])
-```
-
-# Meta-model Manager
-
-To support complex busines requirements, we need a system that can automatically determine the best model for each task.
-
-### Classification of content
-
-A classification layer will be used to attach weights based on the prompt. The weights indicate the relative importance of each type of content in the prompt:
-
-`'Generate a business plan to sell water to a fish.'` -> `[ 'businessLogic:0.5', 'marketing:0.4', 'legal:0.21' ]`
-
-Each model in our system will maintain a unique set of performance weights, calculated from multiple feedback sources including the Arena as well as direct user-provided Eval Feedback. The classification layer's primary responsibility is to identify and tag content types within prompts.
-
-When users indicate their satisfaction with a model's response through positive feedback (such as giving a `👍`), our system automatically increases that model's score for the identified content tags. This scoring mechanism creates a self-improving selection algorithm that learns which models excel at specific content types over time, leading to increasingly accurate model selection for future similar prompts without requiring manual intervention.
-
-The key difference between this and systems such as NotDiamond is that we can offer multiple meta-models, each with their own focuses and strengths. Another benefit that we offer is that our weights are changing daily as we receive more feedback.
+# language-models

-
+Model listing and resolution for LLM providers. Fetches models from OpenRouter and resolves aliases to full model IDs.

-
-- Tags could be generated from the prompt, making it fully dynamic. Works best if we force the classification model to try its best to match an existing tag, but create a new one if it cant.
-- Version pinning.
+## Quick Start

-
+```typescript
+import { resolve, list, search } from 'language-models'

-
+// Resolve aliases to full model IDs
+resolve('opus') // 'anthropic/claude-opus-4.5'
+resolve('gpt-4o') // 'openai/gpt-4o'
+resolve('llama-70b') // 'meta-llama/llama-3.3-70b-instruct'
+resolve('mistral') // 'mistralai/mistral-large-2411'

-
+// List all available models
+const models = list()

-
-
-
-
-Using the above information, we can now sort our models using the weights to find the best model within a certain "meta-model". In this example, we're using the `frontier-reasoning` group of models that are best at reasoning and logic.
+// Search models
+const claudeModels = search('claude')
+```

-
+## API

-
+### `resolve(input: string): string`

-
-sequenceDiagram
-participant User
-participant Classification as Classification Layer
-participant MetaManager as Meta-model Manager
-participant ModelRegistry as Model Registry
-participant SelectedModel as Selected Model
-participant Feedback as Feedback System
+Resolve an alias or partial name to a full model ID.

-
-
-
+```typescript
+resolve('opus') // 'anthropic/claude-opus-4.5'
+resolve('sonnet') // 'anthropic/claude-sonnet-4.5'
+resolve('gpt') // 'openai/gpt-4o'
+resolve('llama') // 'meta-llama/llama-4-maverick'
+resolve('anthropic/claude-opus-4.5') // 'anthropic/claude-opus-4.5' (pass-through)
+```

-
-ModelRegistry->>MetaManager: Return candidate models with performance weights
+### `list(): ModelInfo[]`

-
-MetaManager->>SelectedModel: Route prompt to best model (e.g., claude-3.7-sonnet)
-SelectedModel->>User: Return response
+List all available models from OpenRouter.

-
-Feedback->>ModelRegistry: Update model scores for identified content tags
-Note over ModelRegistry: Increases claude-3.7-sonnet's<br/>score for businessLogic tag
+### `get(id: string): ModelInfo | undefined`

-
-```
+Get a model by exact ID.

-
+### `search(query: string): ModelInfo[]`

-
+Search models by ID or name.

-
-- PDF with reasoning (claude-3.7-sonnet) -> `*:pdf,reasoning`
-- Creative writing (gpt-4.5) -> `creative`
-- Extremely complex code debugging (o1-pro)
-- Cheap blog post writer (gemini-2.0-flash)
-- Better blog post writer (gpt-4o || claude-3.7-sonnet)
-- Best blog post writer (gpt-4.5)
-- Deal review (o3-mini || claude-3.7-sonnet || gemini-2.5-pro)
+## Available Aliases

-
+| Alias | Model ID |
+|-------|----------|
+| `opus` | anthropic/claude-opus-4.5 |
+| `sonnet` | anthropic/claude-sonnet-4.5 |
+| `haiku` | anthropic/claude-haiku-4.5 |
+| `claude` | anthropic/claude-sonnet-4.5 |
+| `gpt`, `gpt-4o`, `4o` | openai/gpt-4o |
+| `o1`, `o3`, `o3-mini` | openai/o1, openai/o3, openai/o3-mini |
+| `gemini`, `flash` | google/gemini-2.5-flash |
+| `gemini-pro` | google/gemini-2.5-pro |
+| `llama`, `llama-4` | meta-llama/llama-4-maverick |
+| `llama-70b` | meta-llama/llama-3.3-70b-instruct |
+| `mistral` | mistralai/mistral-large-2411 |
+| `codestral` | mistralai/codestral-2501 |
+| `deepseek` | deepseek/deepseek-chat |
+| `r1` | deepseek/deepseek-r1 |
+| `qwen` | qwen/qwen3-235b-a22b |
+| `grok` | x-ai/grok-3 |
+| `sonar` | perplexity/sonar-pro |

-
-Requirements: Must be good at creative writing, but cost less than $15 per million tokens
-Constraints: Must be able to handle complex code
-```
+## Updating Models

-
+Fetch the latest models from OpenRouter:

-```
-
-"gemini-2.0-flash",
-...
-]
+```bash
+pnpm fetch-models
```

-
+This updates `data/models.json` with all available models.