litmus-ai 1.0.0

package/README.md ADDED

# Litmus

A terminal-based LLM benchmarking and evaluation tool built with **OpenTUI**. Compare multiple language models side by side, evaluate their tool-usage capabilities, and analyze the results with automated evals.

![](images/main.png)

## Features

### Model Comparison

- Run identical prompts across multiple LLMs simultaneously
- Real-time streaming responses with progress indicators
- Supports virtually any model available through OpenRouter, though not every model is guaranteed to work (see the sketch below)
- Visual comparison grid with response timing
- **Multi-modal support** - Attach images to prompts (see Image Attachments below)

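Under the hood this amounts to calling the OpenRouter chat completions API once per selected model. A minimal sketch of the idea, assuming a hypothetical `runPrompt` helper and placeholder model IDs (not Litmus' actual internals, which also stream tokens into the TUI):

```ts
// Sketch only: the same prompt sent to several OpenRouter models in parallel.
// Model IDs and the runPrompt helper are placeholders, not Litmus internals.
const MODELS = ["openai/gpt-4o-mini", "anthropic/claude-3.5-sonnet"];

async function runPrompt(model: string, prompt: string): Promise<string> {
  const res = await fetch("https://openrouter.ai/api/v1/chat/completions", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.OPENROUTER_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ model, messages: [{ role: "user", content: prompt }] }),
  });
  const data = await res.json();
  return data.choices[0].message.content;
}

// Fire all requests at once and collect the answers in model order.
const answers = await Promise.all(
  MODELS.map((m) => runPrompt(m, "Summarize the CAP theorem in one sentence."))
);
```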

### Image Attachments

Litmus supports multi-modal prompts with image attachments. You can attach images in multiple ways:

**Clipboard Paste (Ctrl+V)**
- Copy an image to your clipboard (Cmd/Ctrl+C on any image)
- Press `Ctrl+V` in the Benchmark view to attach it

**File Path**
- Type or paste a file path to an image
- Supports `~/` home directory expansion (see the sketch below)
- Example: `~/photos/screenshot.png`

**Supported Formats**: PNG, JPG, JPEG, GIF, WebP, BMP

**Image Controls**
- `x` - Remove the last attached image
- `c` - Clear all attached images
- `i` - Open the image input dialog (alternative method)
- Attached images are displayed above the prompt input

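For the file-path route, the expansion and format check can be as simple as the sketch below; `resolveImagePath` is an illustrative helper, not part of the Litmus API:

```ts
import { homedir } from "node:os";
import { extname } from "node:path";

// The formats listed above; anything else is rejected.
const IMAGE_EXTS = new Set([".png", ".jpg", ".jpeg", ".gif", ".webp", ".bmp"]);

// Expand a leading "~/" to the home directory, then validate the extension.
function resolveImagePath(input: string): string | null {
  const path = input.startsWith("~/") ? homedir() + input.slice(1) : input;
  return IMAGE_EXTS.has(extname(path).toLowerCase()) ? path : null;
}

resolveImagePath("~/photos/screenshot.png"); // e.g. "/home/you/photos/screenshot.png"
```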

### Evals using LLM-as-Judge

- Run automated evaluations using dedicated judge models
- Multi-criteria scoring (accuracy, relevance, reasoning, tool use) - see the sketch below
- Pairwise comparisons and ranking
- Detailed reasoning and score breakdowns

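Conceptually, the judge is just another chat model that is given the original prompt plus a candidate answer and asked for per-criterion scores as JSON. A sketch of that shape, reusing the hypothetical `runPrompt` helper from the Model Comparison sketch above (the prompt wording and score scale are assumptions):

```ts
// Sketch: LLM-as-judge on top of the hypothetical runPrompt() helper above.
// The judge model is asked to reply with JSON scores per criterion.
async function judgeAnswer(judgeModel: string, prompt: string, answer: string) {
  const judgePrompt =
    `Score the answer from 1-10 on accuracy, relevance, reasoning and tool use.\n` +
    `Reply with JSON only, e.g. {"accuracy": 8, "relevance": 9, "reasoning": 7, "tool_use": 6}.\n\n` +
    `Prompt:\n${prompt}\n\nAnswer:\n${answer}`;
  const raw = await runPrompt(judgeModel, judgePrompt);
  return JSON.parse(raw) as Record<string, number>;
}
```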

![](images/eval.png)

### Persistent Storage

- SQLite database for all benchmark runs and results (see the schema sketch below)
- Searchable history of past runs
- Track performance over time

![](images/history.png)

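The storage layer isn't documented beyond "SQLite", but a hypothetical schema for runs and per-model results gives a feel for what gets persisted. Table and column names below are assumptions, shown with Bun's built-in `bun:sqlite` driver:

```ts
import { Database } from "bun:sqlite";

// Hypothetical schema: one row per benchmark run, one row per model response.
const db = new Database("litmus.db");
db.run(`CREATE TABLE IF NOT EXISTS runs (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  prompt TEXT NOT NULL,
  created_at TEXT DEFAULT CURRENT_TIMESTAMP
)`);
db.run(`CREATE TABLE IF NOT EXISTS results (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  run_id INTEGER REFERENCES runs(id),
  model TEXT NOT NULL,
  response TEXT,
  latency_ms INTEGER,
  overall_score REAL
)`);

// "Searchable history": find past runs whose prompt mentions a keyword.
const hits = db.query("SELECT id, prompt FROM runs WHERE prompt LIKE ?").all("%tool%");
```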

## Installation

```bash
npm install -g litmus-ai
```

### Environment Setup

Create a `.env` file in your working directory or export the variables:

```bash
export OPENROUTER_API_KEY=your_key_here
```
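Equivalently, a `.env` file in the directory you launch `litmus` from might contain the following (the variable names are the ones documented under Configuration below):

```bash
# .env
OPENROUTER_API_KEY=your_key_here
# Optional - enables the web search tool
EXA_API_KEY=your_key_here
```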

## Quick Start

```bash
litmus
```

### Basic Workflow

1. **Select Models** - Choose from available models in the dropdown
2. **Enter Prompt** - Type your test prompt or select from templates
3. **Enable Tools** - Toggle tools to test function calling (optional)
4. **Generate** - Press `Enter` or `g` to run the benchmark
5. **Evaluate** - Press `e` in the Evaluation view to run LLM-as-judge scoring

## Configuration

### Environment Variables

```bash
OPENROUTER_API_KEY=your_key_here    # Required - get from https://openrouter.ai
EXA_API_KEY=your_key_here           # Optional - for web search tool (https://exa.ai)
```

## Evaluation Criteria

Litmus evaluates models on:

- **Accuracy** - Correctness of information
- **Completeness** - Thoroughness of response
- **Relevance** - How well it addresses the prompt
- **Clarity** - Communication quality
- **Tool Use** - Proper function calling (when applicable)
- **Overall Score** - Weighted combination of the above (see the sketch below)

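The weights themselves aren't documented; conceptually the overall score is a weighted average of the per-criterion scores. A sketch with illustrative (made-up) weights:

```ts
// Illustrative weights - the actual weighting used by Litmus may differ.
const WEIGHTS = {
  accuracy: 0.3,
  completeness: 0.2,
  relevance: 0.2,
  clarity: 0.15,
  tool_use: 0.15,
};

type Criterion = keyof typeof WEIGHTS;

// Overall score = weighted sum of per-criterion scores (weights sum to 1).
function overallScore(scores: Record<Criterion, number>): number {
  return (Object.keys(WEIGHTS) as Criterion[]).reduce(
    (sum, c) => sum + WEIGHTS[c] * scores[c],
    0,
  );
}

// e.g. overallScore({ accuracy: 8, completeness: 7, relevance: 9, clarity: 8, tool_use: 6 }) ≈ 7.7
```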

## Keyboard Shortcuts

### Global

- **Ctrl+K** - Toggle console
- **Tab** - Cycle focus
- **Escape** - Go back / focus navigation

### Benchmark View

- **g** - Generate responses
- **Enter** - Add model (when focused)
- **Space** - Toggle tool
- **d** - Remove last model
- **Ctrl+V** - Paste image from clipboard
- **i** - Open image input dialog
- **x** - Remove last attached image
- **c** - Clear all attached images
- **/** - Search/add models

### Evaluation View

- **e** - Run evaluation
- **Left/Right** - Select judge model
- **q** - Back to history

### History View

- **/** - Focus search
- **Enter** - Select run
- **Delete** - Remove run

## Development

```bash
# Install dependencies
bun install

# Run development mode
bun dev

# Build for production
bun build

# Run tests
bun test
```

## License

MIT License - see the LICENSE file for details.

- 🐛 [Issue Tracker](https://github.com/your-username/Litmus/issues)