@kaitranntt/ccs 3.4.5 → 3.4.6

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -197,6 +197,16 @@ Commands and skills symlinked from `~/.ccs/shared/` - no duplication across prof
 
 ## GLM with Thinking (GLMT)
 
+ > **[!] WARNING: NOT PRODUCTION READY**
+ >
+ > **GLMT is experimental and requires extensive debugging**:
+ > - Streaming and tool support are still under active development
+ > - May experience unexpected errors, timeouts, or incomplete responses
+ > - Requires frequent debugging and manual intervention
+ > - **Not recommended for critical workflows or production use**
+ >
+ > **Alternative for GLM thinking**: Consider routing through **CCR** (Claude Code Router) with the **ZaiTransformer** for a more stable implementation.
+ >
 > **[!] Important**: GLMT requires npm installation (`npm install -g @kaitranntt/ccs`). Not available in native shell versions (requires a Node.js HTTP server).
 
 ### Acknowledgments: The Foundation That Made GLMT Possible
@@ -222,62 +232,63 @@ Commands and skills symlinked from `~/.ccs/shared/` - no duplication across prof
 | Feature | GLM (`ccs glm`) | GLMT (`ccs glmt`) |
 |---------|-----------------|-------------------|
 | **Endpoint** | Anthropic-compatible | OpenAI-compatible |
- | **Thinking** | No | Yes (reasoning_content) |
- | **Tool Support** | Basic | **Full (v3.5+)** |
- | **MCP Tools** | Limited | **Working (v3.5+)** |
- | **Streaming** | Yes | **Yes (v3.4+)** |
- | **TTFB** | <500ms | <500ms (streaming), 2-10s (buffered) |
- | **Use Case** | Fast responses | Complex reasoning + tools |
+ | **Thinking** | No | Experimental (reasoning_content) |
+ | **Tool Support** | Basic | **Unstable (v3.5+)** |
+ | **MCP Tools** | Limited | **Buggy (v3.5+)** |
+ | **Streaming** | Stable | **Experimental (v3.4+)** |
+ | **TTFB** | <500ms | <500ms (sometimes), 2-10s+ (often) |
+ | **Use Case** | Reliable work | **Debugging experiments only** |
 
 ### Tool Support (v3.5)
 
- **GLMT now fully supports MCP tools and function calling**:
+ **GLMT attempts MCP tools and function calling (EXPERIMENTAL)**:
 
- - **Bidirectional Transformation**: Anthropic tools ↔ OpenAI function calling
- - **MCP Integration**: MCP tools execute correctly (no XML tag output)
- - **Streaming Tool Calls**: Real-time tool calls with input_json deltas
- - **Backward Compatible**: Works seamlessly with existing thinking support
- - **No Configuration**: Tool support works automatically
+ - **Bidirectional Transformation**: Anthropic tools ↔ OpenAI format (unstable)
+ - **MCP Integration**: MCP tools sometimes execute (often output XML garbage)
+ - **Streaming Tool Calls**: Real-time tool calls (when not crashing)
+ - **Backward Compatible**: May break existing thinking support
+ - **Configuration Required**: Frequent manual debugging needed
 
 ### Streaming Support (v3.4)
 
- **GLMT now supports real-time streaming** with incremental reasoning content delivery.
+ **GLMT attempts real-time streaming** with incremental reasoning content delivery (OFTEN FAILS).
 
- - **Default**: Streaming enabled (TTFB <500ms)
- - **Disable**: Set `CCS_GLMT_STREAMING=disabled` for buffered mode
- - **Force**: Set `CCS_GLMT_STREAMING=force` to override client preferences
- - **Thinking parameter**: Claude CLI `thinking` parameter support
-   - Respects `thinking.type` and `budget_tokens`
-   - Precedence: CLI parameter > message tags > default
+ - **Default**: Streaming enabled (TTFB <500ms when it works)
+ - **Auto-fallback**: Frequently switches to buffered mode due to errors
+ - **Thinking parameter**: Claude CLI `thinking` parameter sometimes works
+   - May ignore `thinking.type` and `budget_tokens`
+   - Precedence: CLI parameter > message tags > default (when not broken)
 
- **Confirmed working**: Z.AI (1498 reasoning chunks tested, tool calls verified)
+ **Barely working**: Z.AI (tested, tool calls frequently break, requires constant debugging)
 
- ### How It Works
+ ### How It Works (When It Works)
 
- 1. CCS spawns embedded HTTP proxy on localhost
- 2. Proxy converts Anthropic format → OpenAI format (streaming or buffered)
- 3. Transforms Anthropic tools → OpenAI function calling format
- 4. Forwards to Z.AI with reasoning parameters and tools
- 5. Converts `reasoning_content` → thinking blocks (incremental or complete)
- 6. Converts OpenAI `tool_calls` → Anthropic tool_use blocks
- 7. Thinking and tool calls appear in Claude Code UI in real-time
+ 1. CCS spawns embedded HTTP proxy on localhost (if not crashing)
+ 2. Proxy attempts to convert Anthropic format → OpenAI format (often fails)
+ 3. Tries to transform Anthropic tools → OpenAI function calling format (buggy)
+ 4. Forwards to Z.AI with reasoning parameters and tools (when not timing out)
+ 5. Attempts to convert `reasoning_content` → thinking blocks (partial or broken)
+ 6. Attempts to convert OpenAI `tool_calls` → Anthropic tool_use blocks (XML garbage common)
+ 7. Thinking and tool calls sometimes appear in Claude Code UI (when not broken)
 
- ### Control Tags
+ ### Control Tags & Keywords
 
+ **Control Tags**:
 - `<Thinking:On|Off>` - Enable/disable reasoning blocks (default: On)
 - `<Effort:Low|Medium|High>` - Control reasoning depth (deprecated - Z.AI only supports binary thinking)
 
+ **Thinking Keywords** (inconsistent activation):
+ - `think` - Sometimes enables reasoning (low effort)
+ - `think hard` - Sometimes enables reasoning (medium effort)
+ - `think harder` - Sometimes enables reasoning (high effort)
+ - `ultrathink` - Attempts maximum reasoning depth (often breaks)
+
  ### Environment Variables
271
287
 
272
- **GLMT-specific**:
273
- - `CCS_GLMT_FORCE_ENGLISH=true` - Force English output (default: true)
274
- - `CCS_GLMT_THINKING_BUDGET=8192` - Control thinking on/off based on task type
275
- - 0 or "unlimited": Always enable thinking
276
- - 1-2048: Disable thinking (fast execution)
277
- - 2049-8192: Enable for reasoning tasks only (default)
278
- - >8192: Always enable thinking
279
- - `CCS_GLMT_STREAMING=disabled` - Force buffered mode
280
- - `CCS_GLMT_STREAMING=force` - Force streaming (override client)
288
+ **GLMT features** (all experimental):
289
+ - Forced English output enforcement (sometimes works)
290
+ - Random thinking mode activation (unpredictable)
291
+ - Attempted streaming with frequent fallback to buffered mode
281
292
 
282
293
  **General**:
283
294
  - `CCS_DEBUG_LOG=1` - Enable debug file logging
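The control tags documented above lend themselves to a small parser. The sketch below is purely illustrative: the tag syntax (`<Thinking:On|Off>`, `<Effort:Low|Medium|High>`) comes from this README, but the parser itself is a hypothetical stand-in, not CCS's actual implementation.

```javascript
// Hypothetical parser for the control tags described above.
// Defaults mirror the README: thinking is On unless <Thinking:Off> is present.
function parseControlTags(prompt) {
  const result = { thinking: true, effort: null, text: prompt };
  result.text = result.text
    .replace(/<Thinking:(On|Off)>/i, (_, v) => {
      result.thinking = v.toLowerCase() === 'on';
      return '';
    })
    .replace(/<Effort:(Low|Medium|High)>/i, (_, v) => {
      result.effort = v.toLowerCase(); // deprecated: Z.AI thinking is binary
      return '';
    })
    .trim();
  return result;
}

console.log(parseControlTags('<Thinking:Off> summarize this file'));
// → { thinking: false, effort: null, text: 'summarize this file' }
```

A real implementation would also need to decide whether stripped tags should be re-inserted for downstream models; this sketch simply removes them.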
@@ -319,10 +330,10 @@ ccs glmt --verbose "your prompt"
 # Logs: ~/.ccs/logs/
 ```
 
- **Check streaming mode**:
+ **GLMT debugging**:
 ```bash
- # Disable streaming for debugging
- CCS_GLMT_STREAMING=disabled ccs glmt "test"
+ # Verbose logging shows streaming status and reasoning details
+ ccs glmt --verbose "test"
 ```
 
 **Check reasoning content**:
package/VERSION CHANGED
@@ -1 +1 @@
- 3.4.5
+ 3.4.6
@@ -8,6 +8,7 @@ const os = require('os');
 const SSEParser = require('./sse-parser');
 const DeltaAccumulator = require('./delta-accumulator');
 const LocaleEnforcer = require('./locale-enforcer');
+ const ReasoningEnforcer = require('./reasoning-enforcer');
 
 /**
  * GlmtTransformer - Convert between Anthropic and OpenAI formats with thinking and tool support
@@ -54,6 +55,11 @@ class GlmtTransformer {
 
     // Initialize locale enforcer (always enforce English)
     this.localeEnforcer = new LocaleEnforcer();
+
+     // Initialize reasoning enforcer (enabled by default for all GLMT usage)
+     this.reasoningEnforcer = new ReasoningEnforcer({
+       enabled: config.explicitReasoning ?? true
+     });
   }
 
   /**
@@ -104,10 +110,16 @@ class GlmtTransformer {
       anthropicRequest.messages || []
     );
 
+     // 4.5. Inject reasoning instruction (if enabled or thinking requested)
+     const messagesWithReasoning = this.reasoningEnforcer.injectInstruction(
+       messagesWithLocale,
+       thinkingConfig
+     );
+
     // 5. Convert to OpenAI format
     const openaiRequest = {
       model: glmModel,
-       messages: this._sanitizeMessages(messagesWithLocale),
+       messages: this._sanitizeMessages(messagesWithReasoning),
       max_tokens: this._getMaxTokens(glmModel),
       stream: anthropicRequest.stream ?? false
     };
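The hunk above slots the new reasoning injection between locale enforcement (step 4) and sanitization (step 5). The stand-alone sketch below mirrors only that ordering; the stub enforcers, the `sanitizeMessages` filter, and the `glm-4.6` model id are illustrative assumptions, not the real CCS classes.

```javascript
// Stub enforcers standing in for LocaleEnforcer / ReasoningEnforcer;
// only the pipeline ordering reflects the diff above.
const localeEnforcer = { enforce: (msgs) => msgs };
const reasoningEnforcer = {
  injectInstruction: (msgs, cfg) =>
    cfg.thinking
      ? [{ role: 'system', content: 'Reason step-by-step first.' }, ...msgs]
      : msgs
};
const sanitizeMessages = (msgs) => msgs.filter((m) => m.content);

function buildOpenaiRequest(anthropicRequest, thinkingConfig) {
  // 4. Enforce locale on incoming messages
  const withLocale = localeEnforcer.enforce(anthropicRequest.messages || []);
  // 4.5. Inject reasoning instruction (the step added in 3.4.6)
  const withReasoning = reasoningEnforcer.injectInstruction(withLocale, thinkingConfig);
  // 5. Convert to OpenAI format
  return {
    model: 'glm-4.6', // illustrative model id
    messages: sanitizeMessages(withReasoning),
    stream: anthropicRequest.stream ?? false
  };
}

const req = buildOpenaiRequest(
  { messages: [{ role: 'user', content: 'hi' }], stream: true },
  { thinking: true }
);
console.log(req.messages.length); // → 2 (injected system prompt + user message)
```

The key property the ordering preserves: sanitization always runs last, so the injected system prompt passes through the same cleanup as user-supplied messages.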
@@ -0,0 +1,173 @@
+ #!/usr/bin/env node
+ 'use strict';
+
+ /**
+  * ReasoningEnforcer - Inject explicit reasoning instructions into prompts
+  *
+  * Purpose: Force GLM models to use structured reasoning output format (<reasoning_content>)
+  * This complements API parameters (reasoning: true) with explicit prompt instructions.
+  *
+  * Usage:
+  *   const enforcer = new ReasoningEnforcer({ enabled: true });
+  *   const modifiedMessages = enforcer.injectInstruction(messages, thinkingConfig);
+  *
+  * Strategy:
+  *   1. If system prompt exists: Prepend reasoning instruction
+  *   2. If no system prompt: Prepend to first user message
+  *   3. Select prompt template based on effort level (low/medium/high/max)
+  *   4. Preserve message structure (string vs array content)
+  */
+
+ class ReasoningEnforcer {
+   constructor(options = {}) {
+     this.enabled = options.enabled ?? false; // Opt-in by default
+     this.prompts = options.prompts || this._getDefaultPrompts();
+   }
+
+   /**
+    * Inject reasoning instruction into messages
+    * @param {Array} messages - Messages array to modify
+    * @param {Object} thinkingConfig - { thinking: boolean, effort: string }
+    * @returns {Array} Modified messages array
+    */
+   injectInstruction(messages, thinkingConfig = {}) {
+     // Only inject if enabled or thinking explicitly requested
+     if (!this.enabled && !thinkingConfig.thinking) {
+       return messages;
+     }
+
+     // Clone messages to avoid mutation
+     const modifiedMessages = JSON.parse(JSON.stringify(messages));
+
+     // Select prompt based on effort level
+     const prompt = this._selectPrompt(thinkingConfig.effort || 'medium');
+
+     // Strategy 1: Inject into system prompt (preferred)
+     const systemIndex = modifiedMessages.findIndex(m => m.role === 'system');
+     if (systemIndex >= 0) {
+       const systemMsg = modifiedMessages[systemIndex];
+
+       if (typeof systemMsg.content === 'string') {
+         systemMsg.content = `${prompt}\n\n${systemMsg.content}`;
+       } else if (Array.isArray(systemMsg.content)) {
+         systemMsg.content.unshift({
+           type: 'text',
+           text: prompt
+         });
+       }
+
+       return modifiedMessages;
+     }
+
+     // Strategy 2: Prepend to first user message
+     const userIndex = modifiedMessages.findIndex(m => m.role === 'user');
+     if (userIndex >= 0) {
+       const userMsg = modifiedMessages[userIndex];
+
+       if (typeof userMsg.content === 'string') {
+         userMsg.content = `${prompt}\n\n${userMsg.content}`;
+       } else if (Array.isArray(userMsg.content)) {
+         userMsg.content.unshift({
+           type: 'text',
+           text: prompt
+         });
+       }
+
+       return modifiedMessages;
+     }
+
+     // No system or user messages found (edge case)
+     return modifiedMessages;
+   }
+
+   /**
+    * Select prompt template based on effort level
+    * @param {string} effort - 'low', 'medium', 'high', or 'max'
+    * @returns {string} Prompt template
+    * @private
+    */
+   _selectPrompt(effort) {
+     const normalizedEffort = effort.toLowerCase();
+     return this.prompts[normalizedEffort] || this.prompts.medium;
+   }
+
+   /**
+    * Get default prompt templates
+    * @returns {Object} Map of effort levels to prompts
+    * @private
+    */
+   _getDefaultPrompts() {
+     return {
+       low: `You are an expert reasoning model using GLM-4.6 architecture.
+
+ CRITICAL: Before answering, write 2-3 sentences of reasoning in <reasoning_content> tags.
+
+ OUTPUT FORMAT:
+ <reasoning_content>
+ (Brief analysis: what is the problem? what's the approach?)
+ </reasoning_content>
+
+ (Write your final answer here)`,
+
+       medium: `You are an expert reasoning model using GLM-4.6 architecture.
+
+ CRITICAL REQUIREMENTS:
+ 1. Always think step-by-step before answering
+ 2. Write your reasoning process explicitly in <reasoning_content> tags
+ 3. Never skip your chain of thought, even for simple problems
+
+ OUTPUT FORMAT:
+ <reasoning_content>
+ (Write your detailed thinking here: analyze the problem, explore approaches,
+ evaluate trade-offs, and arrive at a conclusion)
+ </reasoning_content>
+
+ (Write your final answer here based on your reasoning above)`,
+
+       high: `You are an expert reasoning model using GLM-4.6 architecture.
+
+ CRITICAL REQUIREMENTS:
+ 1. Think deeply and systematically before answering
+ 2. Write comprehensive reasoning in <reasoning_content> tags
+ 3. Explore multiple approaches and evaluate trade-offs
+ 4. Show all steps in your problem-solving process
+
+ OUTPUT FORMAT:
+ <reasoning_content>
+ (Write exhaustive analysis here:
+ - Problem decomposition
+ - Multiple approach exploration
+ - Trade-off analysis for each approach
+ - Edge case consideration
+ - Final conclusion with justification)
+ </reasoning_content>
+
+ (Write your final answer here based on your systematic reasoning above)`,
+
+       max: `You are an expert reasoning model using GLM-4.6 architecture.
+
+ CRITICAL REQUIREMENTS:
+ 1. Think exhaustively from first principles
+ 2. Write extremely detailed reasoning in <reasoning_content> tags
+ 3. Analyze ALL possible angles, approaches, and edge cases
+ 4. Challenge your own assumptions and explore alternatives
+ 5. Provide rigorous justification for every claim
+
+ OUTPUT FORMAT:
+ <reasoning_content>
+ (Write comprehensive analysis here:
+ - First principles breakdown
+ - Exhaustive approach enumeration
+ - Comparative analysis of all approaches
+ - Edge case and failure mode analysis
+ - Assumption validation
+ - Counter-argument consideration
+ - Final conclusion with rigorous justification)
+ </reasoning_content>
+
+ (Write your final answer here based on your exhaustive reasoning above)`
+     };
+   }
+ }
+
+ module.exports = ReasoningEnforcer;
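The new file's injection strategy can be exercised in isolation. Below is a condensed re-implementation of `injectInstruction` with the prompt templates trimmed to a one-line placeholder, showing the two behaviors worth noting: the system-message-first fallback and the clone-before-modify guarantee (the caller's array is never mutated).

```javascript
// Condensed version of ReasoningEnforcer.injectInstruction from the file above,
// with the effort-based prompt templates replaced by a short placeholder.
function injectInstruction(messages, { enabled = true, thinking = false } = {}) {
  const prompt = 'Write your reasoning in <reasoning_content> tags before answering.';
  if (!enabled && !thinking) return messages;

  // Clone to avoid mutating the caller's messages
  const out = JSON.parse(JSON.stringify(messages));

  // Prefer the system message; fall back to the first user message
  const target = out.find((m) => m.role === 'system') || out.find((m) => m.role === 'user');
  if (!target) return out; // edge case: nothing to inject into

  if (typeof target.content === 'string') {
    target.content = `${prompt}\n\n${target.content}`;
  } else if (Array.isArray(target.content)) {
    target.content.unshift({ type: 'text', text: prompt });
  }
  return out;
}

const original = [{ role: 'user', content: 'What is 2 + 2?' }];
const injected = injectInstruction(original, { thinking: true });
console.log(injected[0].content.startsWith('Write your reasoning')); // → true
console.log(original[0].content); // → "What is 2 + 2?" (unchanged)
```

The `JSON.parse(JSON.stringify(...))` clone is simple but drops anything non-JSON-serializable; that trade-off matches the shipped implementation.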
package/lib/ccs CHANGED
@@ -2,7 +2,7 @@
 set -euo pipefail
 
 # Version (updated by scripts/bump-version.sh)
- CCS_VERSION="3.4.5"
+ CCS_VERSION="3.4.6"
 SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
 readonly CONFIG_FILE="${CCS_CONFIG:-$HOME/.ccs/config.json}"
 readonly PROFILES_JSON="$HOME/.ccs/profiles.json"
package/lib/ccs.ps1 CHANGED
@@ -12,7 +12,7 @@ param(
 $ErrorActionPreference = "Stop"
 
 # Version (updated by scripts/bump-version.sh)
- $CcsVersion = "3.4.5"
+ $CcsVersion = "3.4.6"
 $ScriptDir = Split-Path -Parent $MyInvocation.MyCommand.Path
 $ConfigFile = if ($env:CCS_CONFIG) { $env:CCS_CONFIG } else { "$env:USERPROFILE\.ccs\config.json" }
 $ProfilesJson = "$env:USERPROFILE\.ccs\profiles.json"
package/package.json CHANGED
@@ -1,6 +1,6 @@
 {
   "name": "@kaitranntt/ccs",
-   "version": "3.4.5",
+   "version": "3.4.6",
   "description": "Claude Code Switch - Instant profile switching between Claude Sonnet 4.5 and GLM 4.6",
   "keywords": [
     "cli",