litmus-ai 1.0.0
- package/README.md +161 -0
- package/dist/cli.js +95793 -0
- package/dist/highlights-eq9cgrbb.scm +604 -0
- package/dist/highlights-ghv9g403.scm +205 -0
- package/dist/highlights-hk7bwhj4.scm +284 -0
- package/dist/highlights-r812a2qc.scm +150 -0
- package/dist/highlights-x6tmsnaa.scm +115 -0
- package/dist/injections-73j83es3.scm +27 -0
- package/dist/tree-sitter-javascript-nd0q4pe9.wasm +0 -0
- package/dist/tree-sitter-markdown-411r6y9b.wasm +0 -0
- package/dist/tree-sitter-markdown_inline-j5349f42.wasm +0 -0
- package/dist/tree-sitter-typescript-zxjzwt75.wasm +0 -0
- package/dist/tree-sitter-zig-e78zbjpm.wasm +0 -0
- package/package.json +35 -0
package/README.md
ADDED
|
@@ -0,0 +1,161 @@
|
|
|
1
|
+
Litmus
|
|
2
|
+
|
|
3
|
+
A terminal-based LLM benchmarking and evaluation tool built with **OpenTUI**. Compare multiple language models side by side, evaluate their tool-use capabilities, and score the results with LLM-as-judge evals.
|
|
4
|
+
|
|
5
|
+

|
|
6
|
+
|
|
7
|
+
## Features
|
|
8
|
+
|
|
9
|
+
### Model Comparison
|
|
10
|
+
|
|
11
|
+
- Run identical prompts across multiple LLMs simultaneously
|
|
12
|
+
- Real-time streaming responses with progress indicators
|
|
13
|
+
- Supports most models available through OpenRouter; compatibility is not guaranteed for every model
|
|
14
|
+
- Visual comparison grid with response timing
|
|
15
|
+
- **Multi-modal support** - Attach images to prompts (see Image Attachments below)
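
The fan-out described in this list can be sketched as follows. This is a minimal illustration, not Litmus's actual code: `ModelCall`, `RunResult`, and `runAll` are hypothetical names, and the model call is injected so the sketch does not assume any particular OpenRouter client.

```typescript
// Hypothetical shape of one model call; the real Litmus client is not shown in the README.
type ModelCall = (model: string, prompt: string) => Promise<string>;

interface RunResult {
  model: string;
  ok: boolean;
  output: string;    // response text, or the error message when ok is false
  elapsedMs: number; // wall-clock time for this model only
}

// Send the same prompt to every model at once and time each call independently.
async function runAll(models: string[], prompt: string, call: ModelCall): Promise<RunResult[]> {
  const runOne = async (model: string): Promise<RunResult> => {
    const started = Date.now();
    try {
      const output = await call(model, prompt);
      return { model, ok: true, output, elapsedMs: Date.now() - started };
    } catch (err) {
      // One failing model should not abort the rest of the comparison.
      const message = err instanceof Error ? err.message : String(err);
      return { model, ok: false, output: message, elapsedMs: Date.now() - started };
    }
  };
  // Promise.all starts every request before awaiting any of them,
  // and returns results in the same order as the input list.
  return Promise.all(models.map(runOne));
}
```

Catching errors per model means a single unsupported model shows up as a failed cell in the comparison rather than cancelling the whole run.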
|
|
16
|
+
|
|
17
|
+
|
|
18
|
+
### Image Attachments
|
|
19
|
+
|
|
20
|
+
Litmus supports multi-modal prompts with image attachments. You can attach images in multiple ways:
|
|
21
|
+
|
|
22
|
+
**Clipboard Paste (Ctrl+V)**
|
|
23
|
+
- Copy an image to your clipboard (Cmd/Ctrl+C on any image)
|
|
24
|
+
- Press `Ctrl+V` in the Benchmark view to attach
|
|
25
|
+
|
|
26
|
+
**File Path**
|
|
27
|
+
- Type or paste a file path to an image
|
|
28
|
+
- Supports `~/` home directory expansion
|
|
29
|
+
- Example: `~/photos/screenshot.png`
|
|
30
|
+
|
|
31
|
+
**Supported Formats**: PNG, JPG, JPEG, GIF, WebP, BMP
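
As a rough sketch of the file-path handling described above, the helpers below expand a leading `~/` and check the extension against the supported formats. `expandHome` and `isSupportedImage` are hypothetical names for illustration; how Litmus does this internally is not documented here.

```typescript
// Extensions taken from the "Supported Formats" list above.
const SUPPORTED_EXTENSIONS = ["png", "jpg", "jpeg", "gif", "webp", "bmp"];

// Replace a leading "~/" (or a bare "~") with the given home directory.
// The home directory is passed in so the function stays pure and easy to test.
function expandHome(filePath: string, homeDir: string): string {
  if (filePath === "~") return homeDir;
  if (filePath.slice(0, 2) === "~/") return homeDir + filePath.slice(1);
  return filePath;
}

// True when the path ends in one of the supported image extensions (case-insensitive).
function isSupportedImage(filePath: string): boolean {
  const dot = filePath.lastIndexOf(".");
  if (dot === -1) return false;
  const extension = filePath.slice(dot + 1).toLowerCase();
  return SUPPORTED_EXTENSIONS.indexOf(extension) !== -1;
}
```

In Node or Bun, the home directory would typically come from `os.homedir()`, for example `expandHome("~/photos/screenshot.png", os.homedir())`.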
|
|
32
|
+
|
|
33
|
+
**Image Controls**
|
|
34
|
+
- `x` - Remove last attached image
|
|
35
|
+
- `c` - Clear all attached images
|
|
36
|
+
- `i` - Open image input dialog (alternative method)
|
|
37
|
+
- Images are displayed above the prompt input when attached
|
|
38
|
+
|
|
39
|
+
|
|
40
|
+
### Evals using LLM-as-Judge
|
|
41
|
+
|
|
42
|
+
- Run automated evaluations using dedicated judge models
|
|
43
|
+
- Multi-criteria scoring (accuracy, relevance, reasoning, tool use)
|
|
44
|
+
- Pairwise comparisons and ranking
|
|
45
|
+
- Detailed reasoning and score breakdowns
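
One simple way to turn pairwise judgments into a ranking is to count wins per model. The sketch below assumes that approach purely for illustration; the README does not say how Litmus ranks models, and `PairResult` and `rankByWins` are hypothetical names.

```typescript
// One judged comparison between two models; winner is null for a tie.
interface PairResult {
  a: string;
  b: string;
  winner: string | null;
}

// Rank models by number of pairwise wins, most wins first.
function rankByWins(models: string[], results: PairResult[]): string[] {
  const wins: { [model: string]: number } = {};
  for (const model of models) wins[model] = 0;
  for (const result of results) {
    // Ignore ties and winners that are not in the model list.
    if (result.winner !== null && Object.prototype.hasOwnProperty.call(wins, result.winner)) {
      wins[result.winner] += 1;
    }
  }
  // Sort a copy; equal win counts fall back to the original input order.
  return models.slice().sort(
    (x, y) => wins[y] - wins[x] || models.indexOf(x) - models.indexOf(y)
  );
}
```

Win counting ignores how decisive each comparison was, so it is best read as a coarse ordering alongside the per-criterion scores.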
|
|
46
|
+
|
|
47
|
+

|
|
48
|
+
|
|
49
|
+
### Persistent Storage
|
|
50
|
+
|
|
51
|
+
- SQLite database for all benchmark runs and results
|
|
52
|
+
- Searchable history of past runs
|
|
53
|
+
- Track performance over time
|
|
54
|
+
|
|
55
|
+

|
|
56
|
+
|
|
57
|
+
|
|
58
|
+
|
|
59
|
+
## Installation
|
|
60
|
+
|
|
61
|
+
```bash
|
|
62
|
+
npm install -g litmus-ai
|
|
63
|
+
```
|
|
64
|
+
|
|
65
|
+
### Environment Setup
|
|
66
|
+
|
|
67
|
+
Create a `.env` file in your working directory or export the variables:
|
|
68
|
+
|
|
69
|
+
```bash
|
|
70
|
+
export OPENROUTER_API_KEY=your_key_here
|
|
71
|
+
```
|
|
72
|
+
|
|
73
|
+
## Quick Start
|
|
74
|
+
|
|
75
|
+
```bash
|
|
76
|
+
litmus
|
|
77
|
+
```
|
|
78
|
+
|
|
79
|
+
### Basic Workflow
|
|
80
|
+
|
|
81
|
+
1. **Select Models** - Choose from available models in the dropdown
|
|
82
|
+
2. **Enter Prompt** - Type your test prompt or select from templates
|
|
83
|
+
3. **Enable Tools** - Toggle tools to test function calling (optional)
|
|
84
|
+
4. **Generate** - Press `Enter` or `g` to run the benchmark
|
|
85
|
+
5. **Evaluate** - Press `e` in the Evaluation view to run LLM-as-judge scoring
|
|
86
|
+
|
|
87
|
+
## Configuration
|
|
88
|
+
|
|
89
|
+
### Environment Variables
|
|
90
|
+
|
|
91
|
+
```bash
|
|
92
|
+
OPENROUTER_API_KEY=your_key_here # Required - get from https://openrouter.ai
|
|
93
|
+
EXA_API_KEY=your_key_here # Optional - for web search tool (https://exa.ai)
|
|
94
|
+
```
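
A startup check for these variables might look like the sketch below. The variable names and the required/optional split come from the comments above; `checkEnv` itself is a hypothetical helper, not part of Litmus.

```typescript
// Variable names come from the README; the required/optional split follows its comments.
const REQUIRED_VARS = ["OPENROUTER_API_KEY"];
const OPTIONAL_VARS = ["EXA_API_KEY"];

type Env = { [name: string]: string | undefined };

// Report which required variables are missing and which optional ones are unset.
function checkEnv(env: Env): { missing: string[]; warnings: string[] } {
  // Treat undefined, empty, and whitespace-only values as unset.
  const isUnset = (name: string): boolean => (env[name] ?? "").trim() === "";
  return {
    missing: REQUIRED_VARS.filter(isUnset),
    warnings: OPTIONAL_VARS.filter(isUnset),
  };
}
```

In Node or Bun this could be called as `checkEnv(process.env)`. A non-empty `missing` list would be a reason to exit early, while `warnings` only means the web search tool is unavailable.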
|
|
95
|
+
|
|
96
|
+
|
|
97
|
+
## Evaluation Criteria
|
|
98
|
+
|
|
99
|
+
Litmus evaluates models on:
|
|
100
|
+
|
|
101
|
+
- **Accuracy** - Correctness of information
|
|
102
|
+
- **Completeness** - Thoroughness of response
|
|
103
|
+
- **Relevance** - How well it addresses the prompt
|
|
104
|
+
- **Clarity** - Communication quality
|
|
105
|
+
- **Tool Use** - Proper function calling (when applicable)
|
|
106
|
+
- **Overall Score** - Weighted combination
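
The weighted combination can be illustrated with a small sketch. The weights below are assumptions chosen for the example, since the README does not state the real ones, and `overallScore` is a hypothetical name. Because Tool Use applies only to some runs, the sketch renormalises over the criteria that were actually scored.

```typescript
type Criterion = "accuracy" | "completeness" | "relevance" | "clarity" | "toolUse";

// Assumed weights for illustration only; the README does not state the real ones.
const WEIGHTS: Record<Criterion, number> = {
  accuracy: 0.3,
  completeness: 0.2,
  relevance: 0.2,
  clarity: 0.15,
  toolUse: 0.15,
};

// Weighted mean over the criteria that were actually scored.
function overallScore(scores: Partial<Record<Criterion, number>>): number {
  let weighted = 0;
  let totalWeight = 0;
  for (const key of Object.keys(WEIGHTS) as Criterion[]) {
    const score = scores[key];
    if (score === undefined) continue; // e.g. toolUse when no tools were enabled
    weighted += score * WEIGHTS[key];
    totalWeight += WEIGHTS[key];
  }
  if (totalWeight === 0) throw new Error("no criteria were scored");
  // Dividing by the weight actually used keeps the result on the input scale.
  return weighted / totalWeight;
}
```

With this renormalisation, a response that scores the same value on every applicable criterion gets that same value overall, whether or not tools were part of the run.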
|
|
107
|
+
|
|
108
|
+
## Keyboard Shortcuts
|
|
109
|
+
|
|
110
|
+
### Global
|
|
111
|
+
|
|
112
|
+
- **Ctrl+K** - Toggle console
|
|
113
|
+
- **Tab** - Cycle focus
|
|
114
|
+
- **Escape** - Go back / focus navigation
|
|
115
|
+
|
|
116
|
+
### Benchmark View
|
|
117
|
+
|
|
118
|
+
- **g** - Generate responses
|
|
119
|
+
- **Enter** - Add model (when focused)
|
|
120
|
+
- **Space** - Toggle tool
|
|
121
|
+
- **d** - Remove last model
|
|
122
|
+
- **Ctrl+V** - Paste image from clipboard
|
|
123
|
+
- **i** - Open image input dialog
|
|
124
|
+
- **x** - Remove last attached image
|
|
125
|
+
- **c** - Clear all attached images
|
|
126
|
+
- **/** - Search/add models
|
|
127
|
+
|
|
128
|
+
### Evaluation View
|
|
129
|
+
|
|
130
|
+
- **e** - Run evaluation
|
|
131
|
+
- **Left/Right** - Select judge model
|
|
132
|
+
- **q** - Back to history
|
|
133
|
+
|
|
134
|
+
### History View
|
|
135
|
+
|
|
136
|
+
- **/** - Focus search
|
|
137
|
+
- **Enter** - Select run
|
|
138
|
+
- **Delete** - Remove run
|
|
139
|
+
|
|
140
|
+
|
|
141
|
+
## Development
|
|
142
|
+
|
|
143
|
+
```bash
|
|
144
|
+
# Install dependencies
|
|
145
|
+
bun install
|
|
146
|
+
|
|
147
|
+
# Run development mode
|
|
148
|
+
bun dev
|
|
149
|
+
|
|
150
|
+
# Build for production
|
|
151
|
+
bun run build
|
|
152
|
+
|
|
153
|
+
# Run tests
|
|
154
|
+
bun test
|
|
155
|
+
```
|
|
156
|
+
|
|
157
|
+
## License
|
|
158
|
+
|
|
159
|
+
MIT License - see LICENSE file for details.
|
|
160
|
+
|
|
161
|
+
- 🐛 [Issue Tracker](https://github.com/your-username/Litmus/issues)
|