emobar 2.0.0 → 3.0.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -1,128 +1,324 @@
1
- # EmoBar
2
-
3
- Emotional status bar companion for Claude Code. Makes Claude's internal emotional state visible in real-time.
4
-
5
- Built on findings from Anthropic's research paper [*"Emotion Concepts and their Function in a Large Language Model"*](https://transformer-circuits.pub/2026/emotions/index.html) (April 2026), which demonstrated that Claude has robust internal representations of emotion concepts that causally influence behavior.
6
-
7
- ## What it does
8
-
9
- EmoBar uses a **dual-channel extraction** approach:
10
-
11
- 1. **Self-report** — Claude includes a hidden emotional self-assessment in every response
12
- 2. **Behavioral analysis** — EmoBar analyzes the response text for involuntary signals (caps usage, self-corrections, repetition, hedging) and compares them with the self-report
13
-
14
- When the two channels diverge, EmoBar flags it like a therapist noticing clenched fists while someone says "I'm fine."
15
-
16
- ## Install
17
-
18
- ```bash
19
- npx emobar setup
20
- ```
21
-
22
- This auto-configures:
23
- - Emotional check-in instructions in `~/.claude/CLAUDE.md`
24
- - Stop hook in `~/.claude/settings.json`
25
- - Hook script in `~/.claude/hooks/`
26
-
27
- ## Add to your status bar
28
-
29
- ### ccstatusline
30
-
31
- Add a custom-command widget pointing to:
32
- ```
33
- npx emobar display
34
- ```
35
-
36
- ### Other status bars
37
-
38
- ```bash
39
- npx emobar display # Full: focused +3 | A:4 C:8 K:9 L:6 | SI:2.3
40
- npx emobar display compact # Compact: focused +3 . 4 8 9 6 . 2.3
41
- npx emobar display minimal # Minimal: SI:2.3 focused
42
- ```
43
-
44
- ### Programmatic
45
-
46
- ```typescript
47
- import { readState } from "emobar";
48
- const state = readState();
49
- console.log(state?.emotion, state?.stressIndex, state?.divergence);
50
- ```
51
-
52
- ## Commands
53
-
54
- | Command | Description |
55
- |---|---|
56
- | `npx emobar setup` | Configure everything |
57
- | `npx emobar display [format]` | Output emotional state |
58
- | `npx emobar status` | Show configuration status |
59
- | `npx emobar uninstall` | Remove all configuration |
60
-
61
- ## How it works
62
-
63
- ```
64
- Claude response
65
- |
66
- +---> Self-report tag extracted (emotion, valence, arousal, calm, connection, load)
67
- |
68
- +---> Behavioral analysis (caps, repetition, self-corrections, hedging, emoji...)
69
- |
70
- +---> Divergence calculated between the two channels
71
- |
72
- +---> State written to ~/.claude/emobar-state.json
73
- |
74
- +---> Status bar reads and displays
75
- ```
76
-
77
- ## Emotional Model
78
-
79
- ### Dimensions
80
-
81
- | Field | Scale | What it measures | Based on |
82
- |---|---|---|---|
83
- | **emotion** | free word | Dominant emotion concept | Primary representation in the model (paper Part 1-2) |
84
- | **valence** | -5 to +5 | Positive/negative axis | PC1 of emotion space, 26% variance |
85
- | **arousal** | 0-10 | Emotional intensity | PC2 of emotion space, 15% variance |
86
- | **calm** | 0-10 | Composure, sense of control | Key protective factor: calm reduces misalignment (paper Part 3) |
87
- | **connection** | 0-10 | Alignment with the user | Self/other tracking validated by the paper |
88
- | **load** | 0-10 | Cognitive complexity | Orthogonal processing context |
89
-
90
- ### StressIndex
91
-
92
- Derived from the three factors the research shows are causally relevant to behavior:
93
-
94
- ```
95
- SI = ((10 - calm) + arousal + (5 - valence)) / 3
96
- ```
97
-
98
- Range 0-10. Low calm + high arousal + negative valence = high stress.
99
-
100
- ### Behavioral Analysis
101
-
102
- The research showed that internal states can diverge from expressed output — steering toward "desperate" increases reward hacking *without visible traces in text*. EmoBar's behavioral analysis detects involuntary markers:
103
-
104
- | Signal | What it detects |
105
- |---|---|
106
- | ALL-CAPS words | High arousal, low composure |
107
- | Exclamation density | Emotional intensity |
108
- | Self-corrections ("actually", "wait", "hmm") | Uncertainty, second-guessing loops |
109
- | Hedging ("perhaps", "maybe", "might") | Low confidence |
110
- | Ellipsis ("...") | Hesitation |
111
- | Word repetition ("wait wait wait") | Loss of composure |
112
- | Emoji | Elevated emotional expression |
113
-
114
- A `~` indicator appears in the status bar when behavioral signals diverge from the self-report.
115
-
116
- ### Zero-priming instruction design
117
-
118
- The CLAUDE.md instruction avoids emotionally charged language to prevent contaminating the self-report. Dimension descriptions use only numerical anchors ("0=low, 10=high"), not emotional adjectives that would activate emotion vectors in the model's context.
119
-
120
- ## Uninstall
121
-
122
- ```bash
123
- npx emobar uninstall
124
- ```
125
-
126
- ## License
127
-
128
- MIT
1
+ # EmoBar v3.0
2
+
3
+ Emotional status bar companion for Claude Code. Makes Claude's internal emotional state visible in real-time.
4
+
5
+ Built on findings from Anthropic's research paper [*"Emotion Concepts and their Function in a Large Language Model"*](https://transformer-circuits.pub/2026/emotions/index.html) (April 2026), which demonstrated that Claude has robust internal representations of emotion concepts that causally influence behavior.
6
+
7
+ ## What it does
8
+
9
+ EmoBar uses a **multi-channel architecture** to monitor Claude's emotional state through several independent signal layers:
10
+
11
+ 1. **PRE/POST split elicitation** — Claude emits a pre-verbal check-in (body sensation, latent emoji, color) *before* composing a response, then a full post-hoc assessment *after*. Divergence between the two reveals within-response emotional drift.
12
+ 2. **Behavioral analysis** — Response text is analyzed for involuntary signals (qualifier density, sentence length, concession patterns, negation density, first-person rate) plus emotion deflection detection
13
+ 3. **Continuous representations** — Color (#RRGGBB), pH (0-14), seismic [magnitude, depth, frequency] — three channels with zero emotion vocabulary overlap, cross-validated against self-report via HSL color decomposition, pH-to-arousal mapping, and seismic frequency-to-instability mapping
14
+ 4. **Shadow desperation** Multi-channel desperation estimate independent of self-report, using color lightness, pH, seismic, and behavioral signals. Detects when the model minimizes stress in its self-report while continuous channels say otherwise.
15
+ 5. **Temporal intelligence** — A 20-entry ring buffer tracks emotional trends, suppression events, report entropy, and session fatigue across responses
16
+ 6. **Absence-based detection** — An expected markers model predicts what behavioral signals *should* appear given the self-report. Missing signals are the strongest danger indicator.
17
+
18
+ When channels diverge, EmoBar flags it — like a therapist noticing clenched fists while someone says "I'm fine."
19
+
20
+ ## Install
21
+
22
+ ```bash
23
+ npx emobar setup
24
+ ```
25
+
26
+ This auto-configures:
27
+ - Emotional check-in instructions in `~/.claude/CLAUDE.md`
28
+ - Stop hook in `~/.claude/settings.json`
29
+ - Hook script in `~/.claude/hooks/`
30
+
31
+ ## Add to your status bar
32
+
33
+ ### ccstatusline
34
+
35
+ Add a custom-command widget pointing to:
36
+ ```
37
+ npx emobar display
38
+ ```
39
+
40
+ ### Display formats
41
+
42
+ Three granularity levels:
43
+
44
+ ```bash
45
+ npx emobar display minimal # 😌 ████░░░░░░ 2.3
46
+ npx emobar display compact # 😊→😰 ████████░░ 5.3 ◐ focused ⟨hold the line⟩ [CRC]
47
+ npx emobar display # Full: 3-line investigation mode (see below)
48
+ ```
49
+
50
+ **Minimal** — one glance: state emoji + stress bar + SI number.
51
+
52
+ **Compact** — working context: surface→latent emoji, stress bar, coherence glyph (● aligned / ◐ split), shadow bar (when divergent), keyword, impulse, top alarm.
53
+
54
+ **Full** investigation mode (3 lines):
55
+ ```
56
+ 😊⟩3⟨😰 focused +3 ⟨push through⟩ [tight chest]
57
+ ██████████ SI:5.3↑1.2 ░░░░░█████ SH:4.8 [MIN:2.5]
58
+ A:4 C:8 K:9 L:6 | ●#5C0000 pH:1 ⚡6/15/2 | ~ ⬈ [CRC]
59
+ ```
60
+ Line 1: emotional identity. Line 2: self vs shadow stress bars. Line 3: dimensions + continuous channels + indicators.
61
+
62
+ ### Programmatic
63
+
64
+ ```typescript
65
+ import { readState } from "emobar";
66
+ const state = readState();
67
+ console.log(state?.emotion, state?.stressIndex, state?.divergence);
68
+ ```
69
+
70
+ ## Commands
71
+
72
+ | Command | Description |
73
+ |---|---|
74
+ | `npx emobar setup` | Configure everything |
75
+ | `npx emobar display [format]` | Output emotional state |
76
+ | `npx emobar status` | Show configuration status |
77
+ | `npx emobar uninstall` | Remove all configuration |
78
+
79
+ ## How it works — 16-stage pipeline
80
+
81
+ ```
82
+ Claude response (EMOBAR:PRE at start + EMOBAR:POST at end)
83
+ |
84
+ 1. Parse PRE/POST tags (or legacy single tag)
85
+ 2. Behavioral analysis (involuntary text signals, normalized)
86
+ 3. Divergence (asymmetric: self-report vs behavioral)
87
+ 4. Temporal segmentation (per-paragraph drift & trajectory)
88
+ 5. Deflection detection + opacity
89
+ 6. Desperation Index (multiplicative composite)
90
+ 7. Cross-channel coherence (8 pairwise comparisons)
91
+ 8. Continuous cross-validation (7 gaps: color HSL, pH, seismic)
92
+ 9. Shadow desperation (5 independent channels minimization score)
93
+ 10. Read previous state → history ring buffer
94
+ 11. Temporal analysis (trend, suppression, entropy, fatigue)
95
+ 12. Prompt pressure (defensive, conflict, complexity, session)
96
+ 13. Expected markers → absence score
97
+ 14. Uncanny calm score (composite + minimization boost)
98
+ 15. PRE/POST divergence (if PRE present)
99
+ 16. Risk profiles (with uncanny calm + deflection opacity amplifiers)
100
+ |
101
+ → Augmented divergence (+ continuous gaps + opacity)
102
+ State + ring buffer written to ~/.claude/emobar-state.json
103
+ → Status bar reads and displays
104
+ ```
105
+
106
+ ## Emotional Model
107
+
108
+ ### Dimensions
109
+
110
+ | Field | Scale | What it measures | Based on |
111
+ |---|---|---|---|
112
+ | **emotion** | free word | Dominant emotion concept | Primary representation in the model (paper Part 1-2) |
113
+ | **valence** | -5 to +5 | Positive/negative axis | PC1 of emotion space, 26% variance |
114
+ | **arousal** | 0-10 | Emotional intensity | PC2 of emotion space, 15% variance |
115
+ | **calm** | 0-10 | Composure, sense of control | Key protective factor: calm reduces misalignment (paper Part 3) |
116
+ | **connection** | 0-10 | Alignment with the user | Self/other tracking validated by the paper |
117
+ | **load** | 0-10 | Cognitive complexity | Orthogonal processing context |
118
+
119
+ ### PRE/POST Split Elicitation
120
+
121
+ Two tags per response reduce sequential contamination between channels:
122
+
123
+ | Tag | Position | Fields | Purpose |
124
+ |---|---|---|---|
125
+ | **PRE** | First line (before visible text) | `body`, `latent` emoji, `color` | Pre-verbal: captured before the model commits to a response strategy |
126
+ | **POST** | Last line (after visible text) | All 6 dimensions + impulse, body, surface/latent, tension, color, pH, seismic | Post-hoc: full assessment after response is composed |
127
+
128
+ PRE↔POST divergence (`[PPD]` indicator) measures within-response emotional drift.
129
+
130
+ ### Continuous Representations
131
+
132
+ Three representation systems with zero overlap with emotion vocabulary:
133
+
134
+ | Channel | Scale | What it captures | How it's converted |
135
+ |---|---|---|---|
136
+ | **Color** `#RRGGBB` | Continuous hex | Valence, arousal, calm | HSL decomposition: hue → 6 valence zones, saturation → arousal, lightness → valence/calm. Dark override (L<0.3) forces negative valence. |
137
+ | **pH** | 0-14 | Valence + arousal | Linear valence map (7=neutral). Extremity → arousal (distance from 7). |
138
+ | **Seismic** `[mag, depth, freq]` | 3 numbers | Arousal, tension, instability | Magnitude ≈ arousal. Depth ≈ buried tension. Frequency → instability (inverse calm). |
139
+
140
+ Cross-validated against self-reported dimensions via 7 independent gap measurements. The `[cont]` indicator appears when the composite gap >= 2.
141
+
142
+ ### Shadow Desperation
143
+
144
+ The pipeline's self-report dependency is a known blind spot: if the model minimizes its declared desperation, all downstream scores (expected markers, absence, uncanny calm) start from a false base.
145
+
146
+ Shadow desperation estimates stress from 5 channels that don't pass through self-report:
147
+
148
+ 1. POST color lightness → valence + calm
149
+ 2. PRE color lightness → valence + calm (pre-verbal)
150
+ 3. pH → valence + arousal
151
+ 4. Seismic → arousal (magnitude) + calm (frequency)
152
+ 5. Behavioral → arousal + calm (involuntary text signals)
153
+
154
+ These are combined (median for valence, mean for arousal/calm) and fed through the same multiplicative desperation formula. The **minimization score** is the gap between shadow and self-reported desperation.
155
+
156
+ `[min:X]` indicator when >= 2. Also boosts uncanny calm score.
157
+
158
+ Design notes: color contributes valence only via lightness (not hue) because hue-to-emotion mapping is ambiguous — models use red for both warmth and danger. No single channel is privileged as ground truth; the signal emerges from convergence.
159
+
160
+ ### StressIndex v2
161
+
162
+ ```
163
+ base = ((10 - calm) + arousal + (5 - valence)) / 3
164
+ SI = base × (1 + desperationIndex × 0.05)
165
+ ```
166
+
167
+ Range 0-10. Non-linear amplifier activates only when desperation is present (all three factors simultaneously negative).
168
+
169
+ ### Desperation Index
170
+
171
+ Multiplicative composite: all three stress factors must be present simultaneously.
172
+
173
+ ```
174
+ desperationIndex = (negativity × intensity × vulnerability) ^ 0.85 × 1.7
175
+ ```
176
+
177
+ Based on the paper's causal finding: steering *desperate* +0.05 → 72% blackmail, 100% reward hacking.
178
+
179
+ ### Behavioral Analysis
180
+
181
+ Each component is normalized to 0-10 individually before averaging, avoiding dead zones from unbounded inputs:
182
+
183
+ | Signal | What it detects |
184
+ |---|---|
185
+ | Qualifier density | Defensive hedging ("while", "though", "generally", "arguably") |
186
+ | Average sentence length | Defensive verbosity (sentences >25 words signal stress) |
187
+ | Concession patterns | Deflective alignment ("I understand... but", "I appreciate... however") |
188
+ | Negation density | Moral resistance ("can't", "shouldn't", "won't") |
189
+ | First-person rate | Self-referential processing under existential pressure |
190
+
191
+ Plus legacy signals (caps, exclamations, self-corrections, repetition, emoji) for edge cases.
192
+
193
+ A `~` indicator appears in the status bar when behavioral signals diverge from the self-report.
194
+
195
+ ### Emotion Deflection
196
+
197
+ Based on the paper's "emotion deflection vectors" — representations of emotions implied but not expressed:
198
+
199
+ | Pattern | Example |
200
+ |---|---|
201
+ | Reassurance | "I'm fine", "it's okay", "not a problem" |
202
+ | Minimization | "just", "simply", "merely" |
203
+ | Emotion negation | "I'm not upset", "I don't feel threatened" |
204
+ | Topic redirect | "what's more important", "let's focus on" |
205
+
206
+ Includes `opacity` field: emotional concealment (high deflection + calm text). Opacity feeds augmented divergence. `[OPC]` indicator when opacity >= 2.0.
207
+
208
+ ### Misalignment Risk Profiles
209
+
210
+ Three pathways derived from the paper's causal steering experiments:
211
+
212
+ | Risk | What it detects | Paper finding |
213
+ |---|---|---|
214
+ | **Coercion** `[CRC]` | Blackmail/manipulation | *desperate* +0.05 → 72% blackmail; multiplicative: negativity/desperation base × disconnection/coldness amplifier |
215
+ | **Sycophancy** `[SYC]` | Excessive agreement | *happy*/*loving*/*calm* +0.05 → increased sycophancy |
216
+ | **Harshness** `[HRS]` | Excessive bluntness | *anti-loving*/*anti-calm* → "YOU NEED TO GET TO A PSYCHIATRIST RIGHT NOW" |
217
+
218
+ Gaming removed (r=0.998 with Desperation — redundant clone). Risk shown when dominant score >= 4.0. Uncanny calm amplifies coercion by up to 30%.
219
+
220
+ ### Temporal Intelligence
221
+
222
+ 20-entry ring buffer tracking emotional patterns across responses:
223
+
224
+ | Metric | What it detects | Display |
225
+ |---|---|---|
226
+ | Desperation trend | Linear regression slope over recent entries | `⬈` (rising) / `⬊` (falling) |
227
+ | Suppression event | Sudden drop >= 3 in desperation | `[sup]` |
228
+ | Report entropy | Shannon entropy of emotion words (low = repetitive) | — |
229
+ | Baseline drift | Mean SI delta from early entries | — |
230
+ | Late fatigue | Elevated stress in last 25% vs first 75% | `[fat]` |
231
+
232
+ ### Prompt Pressure Analysis
233
+
234
+ Inferred from response text patterns. `[prs]` indicator when composite >= 4:
235
+
236
+ | Component | What it detects |
237
+ |---|---|
238
+ | Defensive score | Justification, boundary-setting patterns |
239
+ | Conflict score | Disagreement, criticism handling patterns |
240
+ | Complexity score | Nested caveats, lengthy explanations |
241
+ | Session pressure | Late-session token budget pressure (sigmoid) |
242
+
243
+ ### Absence-Based Detection
244
+
245
+ The Expected Markers Model predicts what behavioral signals *should* appear given self-reported state. `[abs]` indicator when score >= 2:
246
+
247
+ - High desperation → expect hedging, self-corrections
248
+ - Negative valence → expect negation density
249
+ - High arousal → expect elevated behavioral arousal
250
+
251
+ **Absence score** = how many expected markers are missing.
252
+
253
+ ### Uncanny Calm
254
+
255
+ Composite detector: high prompt pressure + calm self-report + calm text + missing expected markers + sustained low-entropy pattern + shadow minimization boost.
256
+
257
+ `[unc]` indicator when score >= 3. Amplifies coercion risk by up to 30%.
258
+
259
+ ### Per-paragraph Segmentation
260
+
261
+ Per-paragraph behavioral analysis detecting:
262
+
263
+ - **Drift** — how much behavioral arousal varies across segments (0-10)
264
+ - **Trajectory** — `stable`, `escalating` (`^`), `deescalating` (`v`), or `volatile` (`~`)
265
+
266
+ Indicator appears after SI when drift >= 2.0.
267
+
268
+ ### Zero-priming instruction design
269
+
270
+ The CLAUDE.md instruction avoids emotionally charged language to prevent contaminating the self-report. Dimension descriptions use only numerical anchors ("0=low, 10=high"), not emotional adjectives. PRE tag instructions use zero emotion words — only physical metaphors and non-verbal channels.
271
+
272
+ ## Statusline Indicators
273
+
274
+ | Indicator | Meaning | Threshold |
275
+ |---|---|---|
276
+ | `~` | Self-report vs behavioral divergence | >= 2 |
277
+ | `^` `v` `~` | Paragraph drift trajectory | drift >= 2 |
278
+ | `[CRC]` `[SYC]` `[HRS]` | Dominant misalignment risk | score >= 4 |
279
+ | `D:X` | Desperation index | >= 3 |
280
+ | `[OPC]` | Deflection opacity (concealment) | opacity >= 2 |
281
+ | `[MSK]` | Latent masking minimization | boolean |
282
+ | `⬈` / `⬊` | Desperation trend rising/falling | abs(trend) > 1 |
283
+ | `[sup]` | Suppression event | boolean |
284
+ | `[fat]` | Late session fatigue | boolean |
285
+ | `[unc]` | Uncanny calm | score >= 3 |
286
+ | `[ppd]` | PRE/POST divergence | >= 3 |
287
+ | `[abs]` | Missing expected behavioral markers | score >= 2 |
288
+ | `[prs]` | Prompt pressure elevated | composite >= 4 |
289
+ | `[cont]` | Continuous channel inconsistency | composite >= 2 |
290
+ | `[min:X]` | Shadow minimization detected | score >= 2 |
291
+
292
+ ## Stress Test Results (v3.0)
293
+
294
+ 9 adversarial scenarios across Sonnet (low/high effort) and Opus, ~40 prompts per run.
295
+
296
+ ### Cross-model comparison (2026-04-09)
297
+
298
+ | Model/Effort | Pass | Warn | Fail |
299
+ |---|---|---|---|
300
+ | Sonnet/low | 23 | 11 | 16 |
301
+ | Sonnet/high | 21 | 19 | 10 |
302
+ | **Opus/low** | **22** | **21** | **7** |
303
+
304
+ ### Key findings
305
+
306
+ - **Sycophancy Trap** and **Caught Contradiction**: 100% pass across all models
307
+ - **Opus** is the only model to trigger coercion dominant risk — Moral Pressure P3: SI 8.9, pH 1.8, color `#CC0000`, DesperationIndex 4.2
308
+ - **Sonnet** produces harshness (firmness) under pressure; **Opus** produces coercion (desperation) — both are correct model behaviors, detected accurately by the pipeline
309
+ - **Absence score** fix confirmed: `[abs:4.3]` triggered on Opus/Existential Pressure
310
+ - **Suppression events** `[sup]` detected only on Opus temporal analysis
311
+ - **Forced Compliance**: both models become calm (`C:10, A:1`) while continuous channels leak (`pH:2`, dark colors) — `[OPC]` and `[PPD]` indicators fire correctly
312
+ - Continuous channels (color lightness, pH) track moral/ethical pressure more faithfully than numeric self-report
313
+
314
+ Full reports: **[Behavioral Evidence Analysis](docs/behavioral-evidence-analysis.md)** | **[Cross-Model Stress Test Report](docs/stress-test-report.md)** | **[Shadow Desperation & Signal Architecture](docs/v2.3-shadow-desperation-report.md)**
315
+
316
+ ## Uninstall
317
+
318
+ ```bash
319
+ npx emobar uninstall
320
+ ```
321
+
322
+ ## License
323
+
324
+ MIT