language-models 0.1.0 → 2.0.1

This diff reflects the changes between publicly released versions of the package as they appear in their respective public registries, and is provided for informational purposes only.
Files changed (51)
  1. package/.turbo/turbo-build.log +5 -0
  2. package/CHANGELOG.md +3 -0
  3. package/README.md +56 -142
  4. package/data/models.json +18805 -0
  5. package/dist/aliases.d.ts +5 -1
  6. package/dist/aliases.d.ts.map +1 -0
  7. package/dist/aliases.js +40 -4
  8. package/dist/aliases.js.map +1 -0
  9. package/dist/data/models.json +18805 -0
  10. package/dist/index.d.ts +10 -7
  11. package/dist/index.d.ts.map +1 -0
  12. package/dist/index.js +10 -7
  13. package/dist/index.js.map +1 -0
  14. package/dist/models.d.ts +94 -167
  15. package/dist/models.d.ts.map +1 -0
  16. package/dist/models.js +109 -69803
  17. package/dist/models.js.map +1 -0
  18. package/package.json +25 -23
  19. package/scripts/fetch-models.ts +115 -0
  20. package/src/aliases.test.ts +319 -0
  21. package/src/aliases.ts +48 -4
  22. package/src/index.test.ts +400 -0
  23. package/src/index.ts +20 -9
  24. package/src/models.test.ts +392 -0
  25. package/src/models.ts +174 -0
  26. package/tsconfig.json +5 -15
  27. package/vitest.config.ts +4 -14
  28. package/.editorconfig +0 -10
  29. package/.gitattributes +0 -4
  30. package/.releaserc.js +0 -129
  31. package/LICENSE +0 -21
  32. package/dist/parser.d.ts +0 -86
  33. package/dist/parser.js +0 -390
  34. package/dist/providers.d.ts +0 -1
  35. package/dist/providers.js +0 -76
  36. package/dist/types.d.ts +0 -127
  37. package/dist/types.js +0 -1
  38. package/eslint.config.js +0 -3
  39. package/generate/build-models.ts +0 -150
  40. package/generate/overwrites.ts +0 -12
  41. package/publish.js +0 -32
  42. package/roadmap.md +0 -54
  43. package/src/models.d.ts +0 -170
  44. package/src/models.js +0 -70434
  45. package/src/parser.ts +0 -485
  46. package/src/providers.ts +0 -79
  47. package/src/types.ts +0 -135
  48. package/tests/parser.test.ts +0 -11
  49. package/tests/regex.test.ts +0 -42
  50. package/tests/selector.test.ts +0 -53
  51. package/tests/setup.ts +0 -0
package/.turbo/turbo-build.log ADDED
@@ -0,0 +1,5 @@
+
+
+ > language-models@2.0.1 build /Users/nathanclevenger/projects/primitives.org.ai/packages/language-models
+ > tsc -p tsconfig.json && cp -r data dist/
+
package/CHANGELOG.md ADDED
@@ -0,0 +1,3 @@
+ # language-models
+
+ ## 2.0.1
package/README.md CHANGED
@@ -1,165 +1,79 @@
- # ai-models
-
- Utilities and Tools for the AI SDK, Functions, Workflows, Observability, and Evals
-
- open questions:
-
- We need to support provider/creator/model
- Do we need or want a @ sign?
- Do we also support a creator/model syntax without provider? I think probably
- I think we should follow/embrace openrouter's syntax wherever possible - extending it though to add our requirements
- Do we also have our own version of openrouter/auto, openrouter/auto:online
- Can we tie the routing into the contents of the message and/or priority? (like performance, latency, throughput, cost, etc?)
- How do we handle reasoning? Follow the :reasoning flag from openrouter?
- Many new open models do not initially support tools or structured output, as that requires a lot of work by the hosting provider to make that work ... do we want a composite type tool that could use a fast/cheap model like gemini-2-flash-lite or 4o-mini to transform the output of the first model into the specified structured output?
- How do we want to handle other capabilities/tools?
- Should we route :online to native search models like gemini, perplexity, or the new 4o-search? Or do we just follow the OpenRouter convention of injecting the search results into the context window before calling?
- Do we want to support more general-purpose tools? (like https://agentic.so/intro)
- How can we also support our own secure code execution tool (JS, not Python like Google / OAI)?
-
- Clearly we will need to phase and iterate ... but we have to think very carefully, because once we start using the API across multiple projects, it's going to be hard to change
-
- @{provider}/{creator}/{model}:{config,capabilities,tools,priorities}
-
- {creator}/{model}:{config,capabilities,tools,priorities}
- ?model=deepseek-ai/deepseek-r1-distill-qwen-32b
-
- ?model=@openai/openai/gpt-4o-search-preview
-
- ?model=openai/gpt-4o:code,online
-
- @openrouter/deepseek-ai/deepseek-r1-distill-qwen-32b
- @cloudflare/deepseek-ai/deepseek-r1-distill-qwen-32b
- @google-vertex/deepseek-ai/deepseek-r1-distill-qwen-32b
- @google-ai-studio/anthropic/claude-3.7-sonnet:thinking
- @google-vertex/anthropic/claude-3.7-sonnet:thinking
- @open-router/anthropic/claude-3.7-sonnet:thinking-low,online,nitro
-
- Use Cases:
-
- Evals - we need to easily and dynamically/programmatically change models, settings, tools, configuration, and mode
- (for example - we need to test whether something does better with or without reasoning ... and within reasoning, we need low, medium, and high modes ... We also in some cases must force structured outputs to be structured outputs, because if tool use is required, say by an Anthropic model that supports structured outputs via tools, then that is incompatible ... but also, there are essentially 4 different ways to get Objects that match a certain schema:
- structured_output: the best by far, but very limited support ... I think only 3 providers support this today
- tool_use: There is support for many more providers with tool use, but only a subset actually enforces a guaranteed schema ... it also has limitations on the number of tools and on forcing which tool to use
- response_format: Supported on a majority of providers and models - but not universal - this guarantees valid JSON responses, but no guarantee about the schema. This JSON mode can also be used without a schema to just force the model to respond in a structured data form vs normal text/markdown
- system prompt: If the model/provider does not support any of the first 3, then you can just ask nicely ... but depending on the model, this will fail 10-50% of the time
- Experimentation - Humans or LLM-as-Judge can pick the preferred response from 2 or more models (i.e. compare 4o, o2, 3.7, 3.7:reasoning, flash 2, flash 2.0 reasoning)
- Specify best model - no cost/price requirements ... 4.5, o1-high, o3-mini-high, r1, 3.7, 3.7:reasoning-high, Flash 2 Pro, Flash 2 Reasoning
- Specify only reasoning:
- at runtime, be able to tweak provider/model/settings/tools via a simple query param, for example ... ?model=*:reasoning+code would route to any and all models with reasoning and a code execution tool ... currently, adding a tool, changing config, etc. requires code changes ... we need all of these variables to be tweaked/edited/stored/evaluated at runtime without code changes
- also we need to opt in/out of caching (and maybe even logging req/res bodies in sensitive PII situations)
- also probably seed ...
-
- ## Usage
-
- Examples:
-
- ```ts
- // The name of the exported function should be changed, I just wanted to get the ball rolling
- import { getSupportedModel } from '@drivly/ai-utils'
-
- const model = getSupportedModel('@openai/openai/gpt-4o:code,online')
- const model = getSupportedModel('@anthropic/anthropic/claude-3.7-sonnet:thinking,code,online')
- const model = getSupportedModel([
-   // Fallback support if a certain model is down / doesn't have the full capabilities supported
-   req.args.model, // Attempt to use the model from the request, otherwise fall back to the next one
-   '@openai/openai/gpt-4o:code,online,thinking',
-   '@anthropic/anthropic/claude-3.7-sonnet:thinking,code,online',
- ])
- ```
-
- # Meta-model Manager
-
- To support complex business requirements, we need a system that can automatically determine the best model for each task.
-
- ### Classification of content
-
- A classification layer will be used to attach weights based on the prompt. The weights indicate the relative importance of each type of content in the prompt:
-
- `'Generate a business plan to sell water to a fish.'` -> `[ 'businessLogic:0.5', 'marketing:0.4', 'legal:0.21' ]`
-
- Each model in our system will maintain a unique set of performance weights, calculated from multiple feedback sources including the Arena as well as direct user-provided Eval Feedback. The classification layer's primary responsibility is to identify and tag content types within prompts.
-
- When users indicate their satisfaction with a model's response through positive feedback (such as giving a `👍`), our system automatically increases that model's score for the identified content tags. This scoring mechanism creates a self-improving selection algorithm that learns which models excel at specific content types over time, leading to increasingly accurate model selection for future similar prompts without requiring manual intervention.
-
- The key difference between this and systems such as NotDiamond is that we can offer multiple meta-models, each with its own focus and strengths. Another benefit is that our weights change daily as we receive more feedback.
+ # language-models

- ### Ideas and thoughts
+ Model listing and resolution for LLM providers. Fetches models from OpenRouter and resolves aliases to full model IDs.

- - One idea could be to slowly remove weights over time, marking models with less usage as less important. This would allow newer models to quickly bubble up to the top without needing a vast evaluation dataset. For example, if Gemini 3 releases and people start using it more, the weights for Gemini 2 will slowly decrease, making it less and less likely to be chosen as we become more confident in the newer model.
- - Tags could be generated from the prompt, making it fully dynamic. Works best if we force the classification model to try its best to match an existing tag, but create a new one if it can't.
- - Version pinning.
+ ## Quick Start

- ### Examples
+ ```typescript
+ import { resolve, list, search } from 'language-models'

- #### Meta model: frontier()
+ // Resolve aliases to full model IDs
+ resolve('opus') // 'anthropic/claude-opus-4.5'
+ resolve('gpt-4o') // 'openai/gpt-4o'
+ resolve('llama-70b') // 'meta-llama/llama-3.3-70b-instruct'
+ resolve('mistral') // 'mistralai/mistral-large-2411'

- #### Prompt: Generate a business plan to sell water to a fish.
+ // List all available models
+ const models = list()

- #### Weights: `[ 'businessLogic:0.75', 'marketing:0.4', 'legal:0.21' ]`
-
- #### Models
-
- Using the above information, we can now sort our models using the weights to find the best model within a certain "meta-model". In this example, we're using the `frontier-reasoning` group of models that are best at reasoning and logic.
+ // Search models
+ const claudeModels = search('claude')
+ ```

- Thanks to our classification layer, it's extremely easy to route the prompt to `claude-3.7-sonnet`, which best matches the `businessLogic` tag (among others).
+ ## API

- ## Sequence diagram
+ ### `resolve(input: string): string`

- ```mermaid
- sequenceDiagram
-     participant User
-     participant Classification as Classification Layer
-     participant MetaManager as Meta-model Manager
-     participant ModelRegistry as Model Registry
-     participant SelectedModel as Selected Model
-     participant Feedback as Feedback System
+ Resolve an alias or partial name to a full model ID.

-     User->>Classification: Send prompt "Generate business plan for fish water"
-     Note over Classification: Analyzes content types in prompt
-     Classification->>MetaManager: Provides weights [businessLogic:0.75, marketing:0.4, legal:0.21]
+ ```typescript
+ resolve('opus') // 'anthropic/claude-opus-4.5'
+ resolve('sonnet') // 'anthropic/claude-sonnet-4.5'
+ resolve('gpt') // 'openai/gpt-4o'
+ resolve('llama') // 'meta-llama/llama-4-maverick'
+ resolve('anthropic/claude-opus-4.5') // 'anthropic/claude-opus-4.5' (pass-through)
+ ```
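
The README does not spell out the resolution order behind these examples. A minimal sketch of one strategy consistent with them (exact alias lookup first, pass-through for known full IDs, then partial matching), where `ALIASES` and `MODEL_IDS` are hypothetical stand-ins for the package's real data and the actual `src/aliases.ts` may well differ:

```typescript
// Hypothetical sketch only; not the package's actual implementation.
const ALIASES: Record<string, string> = {
  opus: 'anthropic/claude-opus-4.5',
  gpt: 'openai/gpt-4o',
  llama: 'meta-llama/llama-4-maverick',
}

const MODEL_IDS = ['anthropic/claude-opus-4.5', 'openai/gpt-4o', 'meta-llama/llama-4-maverick']

function resolveSketch(input: string): string {
  // 1. An exact alias match wins
  if (input in ALIASES) return ALIASES[input]
  // 2. A known full provider/model ID passes through unchanged
  if (MODEL_IDS.includes(input)) return input
  // 3. Otherwise fall back to the first partial match on a model ID
  return MODEL_IDS.find((id) => id.includes(input)) ?? input
}
```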

-     MetaManager->>ModelRegistry: Request models matching meta-model type (e.g., "frontier-reasoning")
-     ModelRegistry->>MetaManager: Return candidate models with performance weights
+ ### `list(): ModelInfo[]`

-     Note over MetaManager: Ranks models based on content weights<br/>and historical performance
-     MetaManager->>SelectedModel: Route prompt to best model (e.g., claude-3.7-sonnet)
-     SelectedModel->>User: Return response
+ List all available models from OpenRouter.

-     User->>Feedback: Provides feedback (👍)
-     Feedback->>ModelRegistry: Update model scores for identified content tags
-     Note over ModelRegistry: Increases claude-3.7-sonnet's<br/>score for businessLogic tag
+ ### `get(id: string): ModelInfo | undefined`

-     Note over MetaManager: Future similar prompts more likely<br/>to select same high-performing model
- ```
+ Get a model by exact ID.

- `*:reasoning(sort:pricing,latency)`
+ ### `search(query: string): ModelInfo[]`

- ## Examples
+ Search models by ID or name.
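
This diff does not expand `src/models.ts`, so the exact `ModelInfo` shape is not shown here. Assuming it mirrors OpenRouter's model records with at least `id` and `name` fields, using the three functions together might look like this:

```typescript
import { list, get, search } from 'language-models'

// Iterate every model shipped in data/models.json
for (const model of list()) {
  console.log(model.id) // e.g. 'anthropic/claude-opus-4.5'
}

// get() returns undefined on a miss, so guard before using the result
const opus = get('anthropic/claude-opus-4.5')
if (opus) console.log(opus.name) // display name; assumes ModelInfo carries one

// search() matches against both model ID and name
const geminiModels = search('gemini')
```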

- - Best PDF model that's the cheapest (gemini-2.0-flash) -> `*:pdf(sort:pricing)`
- - PDF with reasoning (claude-3.7-sonnet) -> `*:pdf,reasoning`
- - Creative writing (gpt-4.5) -> `creative`
- - Extremely complex code debugging (o1-pro)
- - Cheap blog post writer (gemini-2.0-flash)
- - Better blog post writer (gpt-4o || claude-3.7-sonnet)
- - Best blog post writer (gpt-4.5)
- - Deal review (o3-mini || claude-3.7-sonnet || gemini-2.5-pro)
+ ## Available Aliases

- #### Example business requirements
+ | Alias | Model ID |
+ |-------|----------|
+ | `opus` | anthropic/claude-opus-4.5 |
+ | `sonnet` | anthropic/claude-sonnet-4.5 |
+ | `haiku` | anthropic/claude-haiku-4.5 |
+ | `claude` | anthropic/claude-sonnet-4.5 |
+ | `gpt`, `gpt-4o`, `4o` | openai/gpt-4o |
+ | `o1`, `o3`, `o3-mini` | openai/o1, openai/o3, openai/o3-mini |
+ | `gemini`, `flash` | google/gemini-2.5-flash |
+ | `gemini-pro` | google/gemini-2.5-pro |
+ | `llama`, `llama-4` | meta-llama/llama-4-maverick |
+ | `llama-70b` | meta-llama/llama-3.3-70b-instruct |
+ | `mistral` | mistralai/mistral-large-2411 |
+ | `codestral` | mistralai/codestral-2501 |
+ | `deepseek` | deepseek/deepseek-chat |
+ | `r1` | deepseek/deepseek-r1 |
+ | `qwen` | qwen/qwen3-235b-a22b |
+ | `grok` | x-ai/grok-3 |
+ | `sonar` | perplexity/sonar-pro |

- ```
- Requirements: Must be good at creative writing, but cost less than $15 per million tokens
- Constraints: Must be able to handle complex code
- ```
+ ## Updating Models

- We need to transform the above business requirements into a meta-model:
+ Fetch the latest models from OpenRouter:

- ```
- [
-   "gemini-2.0-flash",
-   ...
- ]
+ ```bash
+ pnpm fetch-models
  ```

- From the list, we can then sort by pricing, latency, and other constraints at completion time depending on the user's goals.
+ This updates `data/models.json` with all available models.
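
The new `scripts/fetch-models.ts` (115 lines) is not expanded in this diff either. A hypothetical sketch of the fetch-and-write step that `pnpm fetch-models` implies, using OpenRouter's public `GET https://openrouter.ai/api/v1/models` endpoint (which wraps the model array in a `data` envelope); the output path comes from the README above, everything else is an assumption:

```typescript
// Hypothetical stand-in for scripts/fetch-models.ts; the real script may differ.
import { writeFile } from 'node:fs/promises'

async function fetchModels() {
  // OpenRouter's public model-listing endpoint
  const res = await fetch('https://openrouter.ai/api/v1/models')
  if (!res.ok) throw new Error(`OpenRouter returned ${res.status}`)

  // The response wraps the model array in a `data` envelope
  const { data } = (await res.json()) as { data: unknown[] }

  // Persist the snapshot that list()/get()/search() read at runtime
  await writeFile('data/models.json', JSON.stringify(data, null, 2))
  console.log(`Wrote ${data.length} models to data/models.json`)
}

fetchModels()
```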