@kaitranntt/ccs 3.4.5 → 3.4.6

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -197,6 +197,16 @@ Commands and skills symlinked from `~/.ccs/shared/` - no duplication across prof
 
 ## GLM with Thinking (GLMT)
 
+ > **[!] WARNING: NOT PRODUCTION READY**
+ >
+ > **GLMT is experimental and requires extensive debugging**:
+ > - Streaming and tool support are still under active development
+ > - May experience unexpected errors, timeouts, or incomplete responses
+ > - Requires frequent debugging and manual intervention
+ > - **Not recommended for critical workflows or production use**
+ >
+ > **Alternative for GLM thinking**: Consider routing through **CCR** (Claude Code Router) with the **ZaiTransformer** for a more stable implementation.
+ >
 > **[!] Important**: GLMT requires npm installation (`npm install -g @kaitranntt/ccs`). Not available in native shell versions (requires a Node.js HTTP server).
 
 ### Acknowledgments: The Foundation That Made GLMT Possible
@@ -222,62 +232,63 @@ Commands and skills symlinked from `~/.ccs/shared/` - no duplication across prof
 | Feature | GLM (`ccs glm`) | GLMT (`ccs glmt`) |
 |---------|-----------------|-------------------|
 | **Endpoint** | Anthropic-compatible | OpenAI-compatible |
- | **Thinking** | No | Yes (reasoning_content) |
- | **Tool Support** | Basic | **Full (v3.5+)** |
- | **MCP Tools** | Limited | **Working (v3.5+)** |
- | **Streaming** | Yes | **Yes (v3.4+)** |
- | **TTFB** | <500ms | <500ms (streaming), 2-10s (buffered) |
- | **Use Case** | Fast responses | Complex reasoning + tools |
+ | **Thinking** | No | Experimental (reasoning_content) |
+ | **Tool Support** | Basic | **Unstable (v3.5+)** |
+ | **MCP Tools** | Limited | **Buggy (v3.5+)** |
+ | **Streaming** | Stable | **Experimental (v3.4+)** |
+ | **TTFB** | <500ms | <500ms (sometimes), 2-10s+ (often) |
+ | **Use Case** | Reliable work | **Debugging experiments only** |
 
 ### Tool Support (v3.5)
 
- **GLMT now fully supports MCP tools and function calling**:
+ **GLMT attempts MCP tools and function calling (EXPERIMENTAL)**:
 
- - **Bidirectional Transformation**: Anthropic tools ↔ OpenAI function calling
- - **MCP Integration**: MCP tools execute correctly (no XML tag output)
- - **Streaming Tool Calls**: Real-time tool calls with input_json deltas
- - **Backward Compatible**: Works seamlessly with existing thinking support
- - **No Configuration**: Tool support works automatically
+ - **Bidirectional Transformation**: Anthropic tools ↔ OpenAI format (unstable)
+ - **MCP Integration**: MCP tools sometimes execute (often output XML garbage)
+ - **Streaming Tool Calls**: Real-time tool calls (when not crashing)
+ - **Backward Compatible**: May break existing thinking support
+ - **Configuration Required**: Frequent manual debugging needed
 
 ### Streaming Support (v3.4)
 
- **GLMT now supports real-time streaming** with incremental reasoning content delivery.
+ **GLMT attempts real-time streaming** with incremental reasoning content delivery (OFTEN FAILS).
 
- - **Default**: Streaming enabled (TTFB <500ms)
- - **Disable**: Set `CCS_GLMT_STREAMING=disabled` for buffered mode
- - **Force**: Set `CCS_GLMT_STREAMING=force` to override client preferences
- - **Thinking parameter**: Claude CLI `thinking` parameter support
-   - Respects `thinking.type` and `budget_tokens`
-   - Precedence: CLI parameter > message tags > default
+ - **Default**: Streaming enabled (TTFB <500ms when it works)
+ - **Auto-fallback**: Frequently switches to buffered mode due to errors
+ - **Thinking parameter**: Claude CLI `thinking` parameter sometimes works
+   - May ignore `thinking.type` and `budget_tokens`
+   - Precedence: CLI parameter > message tags > default (when not broken)
 
- **Confirmed working**: Z.AI (1498 reasoning chunks tested, tool calls verified)
+ **Barely working**: Z.AI (tested, tool calls frequently break, requires constant debugging)
 
- ### How It Works
+ ### How It Works (When It Works)
 
- 1. CCS spawns embedded HTTP proxy on localhost
- 2. Proxy converts Anthropic format → OpenAI format (streaming or buffered)
- 3. Transforms Anthropic tools → OpenAI function calling format
- 4. Forwards to Z.AI with reasoning parameters and tools
- 5. Converts `reasoning_content` → thinking blocks (incremental or complete)
- 6. Converts OpenAI `tool_calls` → Anthropic tool_use blocks
- 7. Thinking and tool calls appear in Claude Code UI in real-time
+ 1. CCS spawns embedded HTTP proxy on localhost (if not crashing)
+ 2. Proxy attempts to convert Anthropic format → OpenAI format (often fails)
+ 3. Tries to transform Anthropic tools → OpenAI function calling format (buggy)
+ 4. Forwards to Z.AI with reasoning parameters and tools (when not timing out)
+ 5. Attempts to convert `reasoning_content` → thinking blocks (partial or broken)
+ 6. Attempts to convert OpenAI `tool_calls` → Anthropic tool_use blocks (XML garbage common)
+ 7. Thinking and tool calls sometimes appear in Claude Code UI (when not broken)
 
- ### Control Tags
+ ### Control Tags & Keywords
 
+ **Control Tags**:
 - `<Thinking:On|Off>` - Enable/disable reasoning blocks (default: On)
 - `<Effort:Low|Medium|High>` - Control reasoning depth (deprecated - Z.AI only supports binary thinking)
 
+ **Thinking Keywords** (inconsistent activation):
+ - `think` - Sometimes enables reasoning (low effort)
+ - `think hard` - Sometimes enables reasoning (medium effort)
+ - `think harder` - Sometimes enables reasoning (high effort)
+ - `ultrathink` - Attempts maximum reasoning depth (often breaks)
+
  ### Environment Variables
271
287
 
272
- **GLMT-specific**:
273
- - `CCS_GLMT_FORCE_ENGLISH=true` - Force English output (default: true)
274
- - `CCS_GLMT_THINKING_BUDGET=8192` - Control thinking on/off based on task type
275
- - 0 or "unlimited": Always enable thinking
276
- - 1-2048: Disable thinking (fast execution)
277
- - 2049-8192: Enable for reasoning tasks only (default)
278
- - >8192: Always enable thinking
279
- - `CCS_GLMT_STREAMING=disabled` - Force buffered mode
280
- - `CCS_GLMT_STREAMING=force` - Force streaming (override client)
288
+ **GLMT features** (all experimental):
289
+ - Forced English output enforcement (sometimes works)
290
+ - Random thinking mode activation (unpredictable)
291
+ - Attempted streaming with frequent fallback to buffered mode
281
292
 
282
293
  **General**:
283
294
  - `CCS_DEBUG_LOG=1` - Enable debug file logging
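The control tags documented above lend themselves to a small parser. The sketch below is purely illustrative: the tag syntax (`<Thinking:On|Off>`, `<Effort:Low|Medium|High>`) comes from this README, but the parser itself is a hypothetical stand-in, not CCS's actual implementation.

```javascript
// Hypothetical parser for the control tags described above.
// Defaults mirror the README: thinking is On unless <Thinking:Off> is present.
function parseControlTags(prompt) {
  const result = { thinking: true, effort: null, text: prompt };
  result.text = result.text
    .replace(/<Thinking:(On|Off)>/i, (_, v) => {
      result.thinking = v.toLowerCase() === 'on';
      return '';
    })
    .replace(/<Effort:(Low|Medium|High)>/i, (_, v) => {
      result.effort = v.toLowerCase(); // deprecated: Z.AI thinking is binary
      return '';
    })
    .trim();
  return result;
}

console.log(parseControlTags('<Thinking:Off> summarize this file'));
// → { thinking: false, effort: null, text: 'summarize this file' }
```

A real implementation would also need to decide whether stripped tags should be re-inserted for downstream models; this sketch simply removes them.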
@@ -319,10 +330,10 @@ ccs glmt --verbose "your prompt"
 # Logs: ~/.ccs/logs/
 ```
 
- **Check streaming mode**:
+ **GLMT debugging**:
 ```bash
- # Disable streaming for debugging
- CCS_GLMT_STREAMING=disabled ccs glmt "test"
+ # Verbose logging shows streaming status and reasoning details
+ ccs glmt --verbose "test"
 ```
 
 **Check reasoning content**:
package/VERSION CHANGED
@@ -1 +1 @@
- 3.4.5
+ 3.4.6
@@ -8,6 +8,7 @@ const os = require('os');
 const SSEParser = require('./sse-parser');
 const DeltaAccumulator = require('./delta-accumulator');
 const LocaleEnforcer = require('./locale-enforcer');
+ const ReasoningEnforcer = require('./reasoning-enforcer');
 
 /**
  * GlmtTransformer - Convert between Anthropic and OpenAI formats with thinking and tool support
@@ -54,6 +55,11 @@ class GlmtTransformer {
 
     // Initialize locale enforcer (always enforce English)
     this.localeEnforcer = new LocaleEnforcer();
+
+     // Initialize reasoning enforcer (enabled by default for all GLMT usage)
+     this.reasoningEnforcer = new ReasoningEnforcer({
+       enabled: config.explicitReasoning ?? true
+     });
   }
 
   /**
@@ -104,10 +110,16 @@ class GlmtTransformer {
       anthropicRequest.messages || []
     );
 
+     // 4.5. Inject reasoning instruction (if enabled or thinking requested)
+     const messagesWithReasoning = this.reasoningEnforcer.injectInstruction(
+       messagesWithLocale,
+       thinkingConfig
+     );
+
     // 5. Convert to OpenAI format
     const openaiRequest = {
       model: glmModel,
-       messages: this._sanitizeMessages(messagesWithLocale),
+       messages: this._sanitizeMessages(messagesWithReasoning),
       max_tokens: this._getMaxTokens(glmModel),
       stream: anthropicRequest.stream ?? false
     };
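The hunk above slots the new reasoning injection between locale enforcement (step 4) and sanitization (step 5). The stand-alone sketch below mirrors only that ordering; the stub enforcers, the `sanitizeMessages` filter, and the `glm-4.6` model id are illustrative assumptions, not the real CCS classes.

```javascript
// Stub enforcers standing in for LocaleEnforcer / ReasoningEnforcer;
// only the pipeline ordering reflects the diff above.
const localeEnforcer = { enforce: (msgs) => msgs };
const reasoningEnforcer = {
  injectInstruction: (msgs, cfg) =>
    cfg.thinking
      ? [{ role: 'system', content: 'Reason step-by-step first.' }, ...msgs]
      : msgs
};
const sanitizeMessages = (msgs) => msgs.filter((m) => m.content);

function buildOpenaiRequest(anthropicRequest, thinkingConfig) {
  // 4. Enforce locale on incoming messages
  const withLocale = localeEnforcer.enforce(anthropicRequest.messages || []);
  // 4.5. Inject reasoning instruction (the step added in 3.4.6)
  const withReasoning = reasoningEnforcer.injectInstruction(withLocale, thinkingConfig);
  // 5. Convert to OpenAI format
  return {
    model: 'glm-4.6', // illustrative model id
    messages: sanitizeMessages(withReasoning),
    stream: anthropicRequest.stream ?? false
  };
}

const req = buildOpenaiRequest(
  { messages: [{ role: 'user', content: 'hi' }], stream: true },
  { thinking: true }
);
console.log(req.messages.length); // → 2 (injected system prompt + user message)
```

The key property the ordering preserves: sanitization always runs last, so the injected system prompt passes through the same cleanup as user-supplied messages.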
@@ -0,0 +1,173 @@
+ #!/usr/bin/env node
+ 'use strict';
+
+ /**
+  * ReasoningEnforcer - Inject explicit reasoning instructions into prompts
+  *
+  * Purpose: Force GLM models to use structured reasoning output format (<reasoning_content>)
+  * This complements API parameters (reasoning: true) with explicit prompt instructions.
+  *
+  * Usage:
+  *   const enforcer = new ReasoningEnforcer({ enabled: true });
+  *   const modifiedMessages = enforcer.injectInstruction(messages, thinkingConfig);
+  *
+  * Strategy:
+  *   1. If system prompt exists: Prepend reasoning instruction
+  *   2. If no system prompt: Prepend to first user message
+  *   3. Select prompt template based on effort level (low/medium/high/max)
+  *   4. Preserve message structure (string vs array content)
+  */
+
+ class ReasoningEnforcer {
+   constructor(options = {}) {
+     this.enabled = options.enabled ?? false; // Opt-in by default
+     this.prompts = options.prompts || this._getDefaultPrompts();
+   }
+
+   /**
+    * Inject reasoning instruction into messages
+    * @param {Array} messages - Messages array to modify
+    * @param {Object} thinkingConfig - { thinking: boolean, effort: string }
+    * @returns {Array} Modified messages array
+    */
+   injectInstruction(messages, thinkingConfig = {}) {
+     // Only inject if enabled or thinking explicitly requested
+     if (!this.enabled && !thinkingConfig.thinking) {
+       return messages;
+     }
+
+     // Clone messages to avoid mutation
+     const modifiedMessages = JSON.parse(JSON.stringify(messages));
+
+     // Select prompt based on effort level
+     const prompt = this._selectPrompt(thinkingConfig.effort || 'medium');
+
+     // Strategy 1: Inject into system prompt (preferred)
+     const systemIndex = modifiedMessages.findIndex(m => m.role === 'system');
+     if (systemIndex >= 0) {
+       const systemMsg = modifiedMessages[systemIndex];
+
+       if (typeof systemMsg.content === 'string') {
+         systemMsg.content = `${prompt}\n\n${systemMsg.content}`;
+       } else if (Array.isArray(systemMsg.content)) {
+         systemMsg.content.unshift({
+           type: 'text',
+           text: prompt
+         });
+       }
+
+       return modifiedMessages;
+     }
+
+     // Strategy 2: Prepend to first user message
+     const userIndex = modifiedMessages.findIndex(m => m.role === 'user');
+     if (userIndex >= 0) {
+       const userMsg = modifiedMessages[userIndex];
+
+       if (typeof userMsg.content === 'string') {
+         userMsg.content = `${prompt}\n\n${userMsg.content}`;
+       } else if (Array.isArray(userMsg.content)) {
+         userMsg.content.unshift({
+           type: 'text',
+           text: prompt
+         });
+       }
+
+       return modifiedMessages;
+     }
+
+     // No system or user messages found (edge case)
+     return modifiedMessages;
+   }
+
+   /**
+    * Select prompt template based on effort level
+    * @param {string} effort - 'low', 'medium', 'high', or 'max'
+    * @returns {string} Prompt template
+    * @private
+    */
+   _selectPrompt(effort) {
+     const normalizedEffort = effort.toLowerCase();
+     return this.prompts[normalizedEffort] || this.prompts.medium;
+   }
+
+   /**
+    * Get default prompt templates
+    * @returns {Object} Map of effort levels to prompts
+    * @private
+    */
+   _getDefaultPrompts() {
+     return {
+       low: `You are an expert reasoning model using GLM-4.6 architecture.
+
+ CRITICAL: Before answering, write 2-3 sentences of reasoning in <reasoning_content> tags.
+
+ OUTPUT FORMAT:
+ <reasoning_content>
+ (Brief analysis: what is the problem? what's the approach?)
+ </reasoning_content>
+
+ (Write your final answer here)`,
+
+       medium: `You are an expert reasoning model using GLM-4.6 architecture.
+
+ CRITICAL REQUIREMENTS:
+ 1. Always think step-by-step before answering
+ 2. Write your reasoning process explicitly in <reasoning_content> tags
+ 3. Never skip your chain of thought, even for simple problems
+
+ OUTPUT FORMAT:
+ <reasoning_content>
+ (Write your detailed thinking here: analyze the problem, explore approaches,
+ evaluate trade-offs, and arrive at a conclusion)
+ </reasoning_content>
+
+ (Write your final answer here based on your reasoning above)`,
+
+       high: `You are an expert reasoning model using GLM-4.6 architecture.
+
+ CRITICAL REQUIREMENTS:
+ 1. Think deeply and systematically before answering
+ 2. Write comprehensive reasoning in <reasoning_content> tags
+ 3. Explore multiple approaches and evaluate trade-offs
+ 4. Show all steps in your problem-solving process
+
+ OUTPUT FORMAT:
+ <reasoning_content>
+ (Write exhaustive analysis here:
+ - Problem decomposition
+ - Multiple approach exploration
+ - Trade-off analysis for each approach
+ - Edge case consideration
+ - Final conclusion with justification)
+ </reasoning_content>
+
+ (Write your final answer here based on your systematic reasoning above)`,
+
+       max: `You are an expert reasoning model using GLM-4.6 architecture.
+
+ CRITICAL REQUIREMENTS:
+ 1. Think exhaustively from first principles
+ 2. Write extremely detailed reasoning in <reasoning_content> tags
+ 3. Analyze ALL possible angles, approaches, and edge cases
+ 4. Challenge your own assumptions and explore alternatives
+ 5. Provide rigorous justification for every claim
+
+ OUTPUT FORMAT:
+ <reasoning_content>
+ (Write comprehensive analysis here:
+ - First principles breakdown
+ - Exhaustive approach enumeration
+ - Comparative analysis of all approaches
+ - Edge case and failure mode analysis
+ - Assumption validation
+ - Counter-argument consideration
+ - Final conclusion with rigorous justification)
+ </reasoning_content>
+
+ (Write your final answer here based on your exhaustive reasoning above)`
+     };
+   }
+ }
+
+ module.exports = ReasoningEnforcer;
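The new file's injection strategy can be exercised in isolation. Below is a condensed re-implementation of `injectInstruction` with the prompt templates trimmed to a one-line placeholder, showing the two behaviors worth noting: the system-message-first fallback and the clone-before-modify guarantee (the caller's array is never mutated).

```javascript
// Condensed version of ReasoningEnforcer.injectInstruction from the file above,
// with the effort-based prompt templates replaced by a short placeholder.
function injectInstruction(messages, { enabled = true, thinking = false } = {}) {
  const prompt = 'Write your reasoning in <reasoning_content> tags before answering.';
  if (!enabled && !thinking) return messages;

  // Clone to avoid mutating the caller's messages
  const out = JSON.parse(JSON.stringify(messages));

  // Prefer the system message; fall back to the first user message
  const target = out.find((m) => m.role === 'system') || out.find((m) => m.role === 'user');
  if (!target) return out; // edge case: nothing to inject into

  if (typeof target.content === 'string') {
    target.content = `${prompt}\n\n${target.content}`;
  } else if (Array.isArray(target.content)) {
    target.content.unshift({ type: 'text', text: prompt });
  }
  return out;
}

const original = [{ role: 'user', content: 'What is 2 + 2?' }];
const injected = injectInstruction(original, { thinking: true });
console.log(injected[0].content.startsWith('Write your reasoning')); // → true
console.log(original[0].content); // → "What is 2 + 2?" (unchanged)
```

The `JSON.parse(JSON.stringify(...))` clone is simple but drops anything non-JSON-serializable; that trade-off matches the shipped implementation.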
package/lib/ccs CHANGED
@@ -2,7 +2,7 @@
 set -euo pipefail
 
 # Version (updated by scripts/bump-version.sh)
- CCS_VERSION="3.4.5"
+ CCS_VERSION="3.4.6"
 SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
 readonly CONFIG_FILE="${CCS_CONFIG:-$HOME/.ccs/config.json}"
 readonly PROFILES_JSON="$HOME/.ccs/profiles.json"
package/lib/ccs.ps1 CHANGED
@@ -12,7 +12,7 @@ param(
 $ErrorActionPreference = "Stop"
 
 # Version (updated by scripts/bump-version.sh)
- $CcsVersion = "3.4.5"
+ $CcsVersion = "3.4.6"
 $ScriptDir = Split-Path -Parent $MyInvocation.MyCommand.Path
 $ConfigFile = if ($env:CCS_CONFIG) { $env:CCS_CONFIG } else { "$env:USERPROFILE\.ccs\config.json" }
 $ProfilesJson = "$env:USERPROFILE\.ccs\profiles.json"
package/package.json CHANGED
@@ -1,6 +1,6 @@
 {
   "name": "@kaitranntt/ccs",
-   "version": "3.4.5",
+   "version": "3.4.6",
   "description": "Claude Code Switch - Instant profile switching between Claude Sonnet 4.5 and GLM 4.6",
   "keywords": [
     "cli",