sneakoscope 4.0.4 → 4.0.5
- package/README.md +9 -9
- package/crates/sks-core/Cargo.lock +1 -1
- package/crates/sks-core/Cargo.toml +1 -1
- package/crates/sks-core/src/main.rs +1 -1
- package/dist/bin/sks.js +1 -1
- package/dist/core/codex-app/glm-profile-schema.js +5 -1
- package/dist/core/commands/glm-command.js +15 -1
- package/dist/core/commands/mad-sks-command.js +65 -9
- package/dist/core/fsx.js +1 -1
- package/dist/core/perf/lru-cache.js +33 -0
- package/dist/core/providers/glm/glm-52-profile.js +14 -7
- package/dist/core/providers/glm/glm-52-request.js +40 -12
- package/dist/core/providers/glm/glm-52-response-guard.js +1 -2
- package/dist/core/providers/glm/glm-52-settings.js +50 -8
- package/dist/core/providers/glm/glm-bench.js +90 -0
- package/dist/core/providers/glm/glm-context-budget.js +15 -0
- package/dist/core/providers/glm/glm-context-cache.js +9 -0
- package/dist/core/providers/glm/glm-latency-trace.js +40 -0
- package/dist/core/providers/glm/glm-mad-launch.js +18 -3
- package/dist/core/providers/glm/glm-mad-mode.js +48 -20
- package/dist/core/providers/glm/glm-model-meta-cache.js +19 -0
- package/dist/core/providers/glm/glm-profile-resolver.js +104 -0
- package/dist/core/providers/glm/glm-reasoning-policy.js +15 -0
- package/dist/core/providers/glm/glm-request-cache.js +47 -0
- package/dist/core/providers/glm/glm-speed-context.js +82 -0
- package/dist/core/providers/glm/glm-speed-gate.js +40 -0
- package/dist/core/providers/glm/glm-speed-output-parser.js +40 -0
- package/dist/core/providers/glm/glm-tool-schema-cache.js +19 -0
- package/dist/core/version.js +1 -1
- package/package.json +1 -1
package/README.md
CHANGED
@@ -35,16 +35,16 @@ Set up this agent project with Sneakoscope Codex. Use [[mandarange/Sneakoscope-C
 
 ## 🚀 Current Release
 
-SKS **4.0.
+SKS **4.0.5** tunes only the GLM 5.2 MAD path: `sks --mad --glm` now defaults to an xhigh reasoning profile while recovering speed through compact GLM context, disabled default tools, streaming, request/schema caches, and redacted bench/trace artifacts. Ordinary `sks --mad`, Naruto/Team, and non-GLM Codex paths keep their existing defaults.
 
-What changed in 4.0.
+What changed in 4.0.5:
 
-- **GLM
-- **
-- **
-- **
-- **
-- **4.0.
+- **GLM-only xhigh speed profile.** `sks --mad --glm` keeps OpenRouter locked to `z-ai/glm-5.2`, uses `reasoning.effort: xhigh`, and bounds the default completion budget to the speed profile instead of changing global SKS reasoning defaults.
+- **Compact GLM request shape.** The GLM speed profile uses streaming, `tool_choice: none`, no fallback `models`, `provider.allow_fallbacks: false`, `provider.require_parameters: true`, and throughput/latency provider preferences.
+- **Opt-in GLM depth controls.** `--deep`, `--xhigh`, `--strict`, `--ttft`, and `--exact-provider` select explicit GLM profiles without affecting non-GLM routes.
+- **GLM speed infrastructure.** GLM-only context budgeting, encoded request cache, tool schema cache, model metadata cache, output envelope parsing, deterministic patch gating, latency traces, and `--bench` dry-run diagnostics are covered by tests.
+- **No GPT fallback panes.** GLM MAD keeps the existing GPT/codex-sdk native swarm disabled by default until a GLM worker backend exists, preserving the no-fallback guarantee.
+- **4.0.4 GLM launch proof remains.** Each GLM MAD launch still writes `mad-glm-launch.json` with provider/model/profile/wrapper evidence and keeps OpenRouter keys out of layout artifacts.
 
 SKS **3.1.16** was a launch-reliability patch on the 3.1.15 doctor-reliability release. It made `sks --mad` self-bootstrap a fresh project instead of dead-ending on a missing Codex config.
 
@@ -396,7 +396,7 @@ sks team open-zellij latest
 sks team attach-zellij latest
 ```
 
-Interactive SKS sessions use Zellij layouts. By default SKS launches Codex in Fast service tier with `--model gpt-5.5`, `-c service_tier="fast"`, the selected `model_reasoning_effort`, and `--no-alt-screen` for Zellij-backed interactive panes so terminal scrollback captures the conversation transcript. SKS
+Interactive SKS sessions use Zellij layouts. By default SKS launches Codex in Fast service tier with `--model gpt-5.5`, `-c service_tier="fast"`, the selected `model_reasoning_effort`, and `--no-alt-screen` for Zellij-backed interactive panes so terminal scrollback captures the conversation transcript. Non-GLM SKS sessions force the model to `gpt-5.5`; `sks --mad --glm` is the OpenRouter GLM 5.2 exception. `SKS_CODEX_MODEL` and `SKS_CODEX_FAST_HIGH=0` cannot downgrade or remove the non-GLM model pin. You can still set `SKS_CODEX_REASONING` to change reasoning effort, and `SKS_ZELLIJ_CODEX_ALT_SCREEN=1` restores Codex's alternate-screen UI for the next launch. Use `sks --mad --workspace <name>` for an explicit MAD session and `sks help` for CLI help.
 
 Before opening the interactive runtime, SKS checks the installed Codex CLI against npm `@openai/codex@latest`. If a newer version exists, it asks `Y/n`; answering `y` updates automatically with `npm i -g @openai/codex@latest` and then opens the runtime with the updated Codex CLI.
 
@@ -4,7 +4,7 @@ use std::io::{self, Read, Seek, SeekFrom};
 fn main() {
     let mut args = std::env::args().skip(1);
     match args.next().as_deref() {
-        Some("--version") => println!("sks-rs 4.0.
+        Some("--version") => println!("sks-rs 4.0.5"),
         Some("compact-info") => {
             let mut input = String::new();
             let _ = io::stdin().read_to_string(&mut input);
package/dist/bin/sks.js
CHANGED
@@ -13,7 +13,11 @@ export function validateGlmCodexAppModelProfile(value) {
         profile.model === GLM_52_OPENROUTER_MODEL ? null : 'glm_codex_app_profile_invalid_model',
         profile.mode === GLM_MAD_MODE ? null : 'glm_codex_app_profile_invalid_mode',
         profile.strictModelLock === true ? null : 'glm_codex_app_profile_not_strict',
-        profile.gptFallbackAllowed === false ? null : 'glm_codex_app_profile_allows_gpt_fallback'
+        profile.gptFallbackAllowed === false ? null : 'glm_codex_app_profile_allows_gpt_fallback',
+        profile.defaultProfile === 'speed' ? null : 'glm_codex_app_profile_default_not_speed',
+        profile.defaultSettings?.tool_choice === 'none' ? null : 'glm_codex_app_profile_default_tools_not_omitted',
+        profile.defaultSettings?.provider_require_parameters === true ? null : 'glm_codex_app_profile_default_does_not_require_parameters',
+        profile.defaultSettings?.provider_allow_fallbacks === false ? null : 'glm_codex_app_profile_allows_provider_fallback'
     ].filter((item) => Boolean(item));
     return {
         ok: blockers.length === 0,
@@ -1,10 +1,24 @@
 import { runMadGlmMode } from '../providers/glm/glm-mad-mode.js';
 import { flag } from '../../cli/args.js';
 import { madHighCommand } from './mad-sks-command.js';
+import { runGlmBench } from '../providers/glm/glm-bench.js';
+import { printJson } from '../../cli/output.js';
 export async function glmCommand(args = []) {
+    if (flag(args, '--bench')) {
+        const result = await runGlmBench(process.cwd(), args);
+        if (result.status === 'blocked')
+            process.exitCode = 1;
+        if (flag(args, '--json'))
+            printJson(result);
+        else if (result.status === 'blocked')
+            console.error(`GLM bench blocked: ${result.warnings.join(', ')}`);
+        else
+            console.log(`GLM bench: dry-run p50=${result.summary.speed_p50_total_ms}ms ratio=${result.summary.speed_vs_deep_ratio}`);
+        return result;
+    }
     const result = await runMadGlmMode(args);
     if (!result.ok || flag(args, '--repair') || flag(args, '--json'))
         return result;
-    return madHighCommand(['--glm', ...args], { glmReadiness: result });
+    return madHighCommand(['--glm', ...args], { glmReadiness: result, glmArgs: args });
 }
 //# sourceMappingURL=glm-command.js.map
@@ -30,13 +30,31 @@ export async function madHighCommand(args = [], deps = {}) {
     const subcommand = firstSubcommand(args);
     if (subcommand)
         return madSksSubcommand(subcommand, args.filter((arg) => String(arg) !== subcommand));
-    const cleanArgs = stripMadLaunchOnlyArgs(args);
     const rawArgs = (args || []).map((arg) => String(arg));
     const glmMadLaunch = isMadGlmLaunch(rawArgs, deps);
+    const glmOnlyFlagBlockers = findGlmOnlyMadFlagBlockers(rawArgs, glmMadLaunch);
+    if (glmOnlyFlagBlockers.length) {
+        const result = {
+            ok: false,
+            status: 'blocked',
+            blockers: glmOnlyFlagBlockers,
+            hint: 'GLM profile and diagnostics flags require sks --mad --glm.'
+        };
+        if (rawArgs.includes('--json'))
+            console.log(JSON.stringify(result, null, 2));
+        else {
+            console.error('SKS MAD launch blocked: GLM-only flags require --glm.');
+            for (const blocker of glmOnlyFlagBlockers)
+                console.error(`- ${blocker}`);
+        }
+        process.exitCode = 1;
+        return result;
+    }
+    const cleanArgs = stripMadLaunchOnlyArgs(args, { includeGlmFlags: glmMadLaunch });
     const madDbGrant = resolveMadLaunchMadDbGrant(rawArgs);
     const dryRun = rawArgs.includes('--dry-run');
     if (rawArgs.includes('--json') && !dryRun) {
-        const profile = glmMadLaunch ? buildMadGlmLaunchProfileNoWrite() : buildMadHighLaunchProfileNoWrite();
+        const profile = glmMadLaunch ? buildMadGlmLaunchProfileNoWrite(rawArgs) : buildMadHighLaunchProfileNoWrite();
         return console.log(JSON.stringify(profile, null, 2));
     }
     const update = { status: 'notice_only', non_blocking: true };
@@ -176,7 +194,7 @@ export async function madHighCommand(args = [], deps = {}) {
         return launchPreflight;
     }
     const madLaunch = await activateMadZellijPermissionState(process.cwd(), args);
-    const glmRuntime = glmMadLaunch ? await prepareMadGlmLaunchRuntime(madLaunch, deps) : null;
+    const glmRuntime = glmMadLaunch ? await prepareMadGlmLaunchRuntime(madLaunch, { ...deps, glmArgs: deps?.glmArgs || rawArgs }) : null;
     if (glmMadLaunch && !glmRuntime?.ok) {
         process.exitCode = 1;
         return glmRuntime;
@@ -325,7 +343,7 @@ function isMadGlmLaunch(args = [], deps = {}) {
 }
 async function prepareMadGlmLaunchRuntime(madLaunch, deps = {}) {
     const keyResolution = await resolveMadGlmLaunchKey(process.env);
-    const profile = buildMadGlmLaunchProfileNoWrite();
+    const profile = buildMadGlmLaunchProfileNoWrite(deps?.glmArgs || []);
     if (!keyResolution.key) {
         const blocked = {
             schema: 'sks.glm-mad-launch.v1',
@@ -334,6 +352,9 @@ async function prepareMadGlmLaunchRuntime(madLaunch, deps = {}) {
             mission_id: madLaunch.mission_id,
             provider: profile.provider,
             model: profile.model,
+            glm_profile: profile.glm_profile,
+            glm_mode: profile.glm_mode,
+            model_reasoning_effort: profile.model_reasoning_effort,
             gpt_fallback_allowed: false,
             blockers: keyResolution.blockers,
             warnings: keyResolution.warnings
@@ -367,6 +388,9 @@ async function prepareMadGlmLaunchRuntime(madLaunch, deps = {}) {
         type: 'mad_sks.glm_launch_profile_ready',
         provider: profile.provider,
         model: profile.model,
+        glm_profile: profile.glm_profile,
+        glm_mode: profile.glm_mode,
+        model_reasoning_effort: profile.model_reasoning_effort,
         key_source: keyResolution.source || null,
         gpt_fallback_allowed: false
     });
@@ -713,7 +737,7 @@ async function activateMadZellijPermissionState(cwd = process.cwd(), args = [])
     });
     return { mission_id: id, dir, gate, root };
 }
-function
+function baseMadLaunchOnlyFlags() {
     return new Set([
         '--mad',
         '--MAD',
@@ -760,8 +784,26 @@ function madLaunchOnlyFlags() {
         '--ack'
     ]);
 }
-function
+function glmMadLaunchOnlyFlags() {
     return new Set([
+        '--deep',
+        '--xhigh',
+        '--strict',
+        '--trace',
+        '--ttft',
+        '--exact-provider'
+    ]);
+}
+function madLaunchOnlyFlags(includeGlmFlags = false) {
+    const flags = baseMadLaunchOnlyFlags();
+    if (includeGlmFlags) {
+        for (const flag of glmMadLaunchOnlyFlags())
+            flags.add(flag);
+    }
+    return flags;
+}
+function madLaunchValueFlags(includeGlmFlags = false) {
+    const flags = new Set([
         '--mad-agents',
         '--mad-swarm-agents',
         '--mad-swarm-work-items',
@@ -769,6 +811,20 @@ function madLaunchValueFlags() {
         '--mad-swarm-prompt',
         '--ack'
     ]);
+    if (includeGlmFlags)
+        flags.add('--exact-provider');
+    return flags;
+}
+export function findGlmOnlyMadFlagBlockers(args = [], glmMadLaunch = false) {
+    if (glmMadLaunch)
+        return [];
+    const blockers = [];
+    const glmOnly = new Set([...glmMadLaunchOnlyFlags(), '--bench']);
+    for (const arg of args) {
+        if (glmOnly.has(String(arg)))
+            blockers.push(`glm_flag_requires_--glm:${arg}`);
+    }
+    return blockers;
 }
 export function defaultMadSwarmBackend(args = [], opts = {}) {
     const list = (args || []).map((arg) => String(arg));
@@ -785,9 +841,9 @@ export function defaultMadSwarmBackend(args = [], opts = {}) {
         return 'codex-sdk';
     return 'zellij';
 }
-function stripMadLaunchOnlyArgs(args = []) {
-    const flags = madLaunchOnlyFlags();
-    const valueFlags = madLaunchValueFlags();
+export function stripMadLaunchOnlyArgs(args = [], opts = {}) {
+    const flags = madLaunchOnlyFlags(Boolean(opts.includeGlmFlags));
+    const valueFlags = madLaunchValueFlags(Boolean(opts.includeGlmFlags));
     const out = [];
     for (let i = 0; i < args.length; i += 1) {
         const arg = String(args[i]);
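The GLM-only flag gate added in the hunks above can be exercised in isolation. The following is a minimal sketch, not the packaged module: `findBlockers` is a hypothetical stand-in for `findGlmOnlyMadFlagBlockers`, the flag list is copied from `glmMadLaunchOnlyFlags()` plus `--bench`, and the `glmMadLaunch` boolean is passed by hand because the body of `isMadGlmLaunch` is not part of this diff.

```javascript
// Flag names copied from glmMadLaunchOnlyFlags() in the diff, plus '--bench'.
const GLM_ONLY_FLAGS = new Set([
    '--deep', '--xhigh', '--strict', '--trace', '--ttft', '--exact-provider',
    '--bench'
]);

// Hypothetical stand-in for findGlmOnlyMadFlagBlockers: a GLM launch is never
// blocked; otherwise every GLM-only flag produces one blocker string.
function findBlockers(args = [], glmMadLaunch = false) {
    if (glmMadLaunch)
        return [];
    return args
        .map((arg) => String(arg))
        .filter((arg) => GLM_ONLY_FLAGS.has(arg))
        .map((arg) => `glm_flag_requires_--glm:${arg}`);
}

// A plain MAD launch that carries GLM-only flags is reported as blocked.
const blockedLaunch = findBlockers(['--mad', '--deep', '--bench'], false);
// The same kind of flag is accepted once the launch is a GLM launch.
const glmLaunch = findBlockers(['--mad', '--glm', '--deep'], true);

console.log(blockedLaunch);
console.log(glmLaunch);
```

In the diff, a non-empty blocker list makes `madHighCommand` set `process.exitCode = 1` and return a `status: 'blocked'` result before any launch state is written.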
package/dist/core/fsx.js
CHANGED
@@ -5,7 +5,7 @@ import os from 'node:os';
 import crypto from 'node:crypto';
 import { spawn } from 'node:child_process';
 import { fileURLToPath } from 'node:url';
-export const PACKAGE_VERSION = '4.0.
+export const PACKAGE_VERSION = '4.0.5';
 export const DEFAULT_PROCESS_TAIL_BYTES = 256 * 1024;
 export const DEFAULT_PROCESS_TIMEOUT_MS = 30 * 60 * 1000;
 export function nowIso() {
@@ -0,0 +1,33 @@
+export class SksLruCache {
+    maxEntries;
+    map = new Map();
+    constructor(maxEntries = 128) {
+        this.maxEntries = Math.max(1, Math.floor(maxEntries));
+    }
+    get size() {
+        return this.map.size;
+    }
+    get(key) {
+        const entry = this.map.get(key);
+        if (!entry)
+            return null;
+        this.map.delete(key);
+        this.map.set(key, entry);
+        return entry.value;
+    }
+    set(key, value, createdAt = Date.now()) {
+        if (this.map.has(key))
+            this.map.delete(key);
+        this.map.set(key, { key, value, createdAt });
+        while (this.map.size > this.maxEntries) {
+            const oldest = this.map.keys().next().value;
+            if (!oldest)
+                break;
+            this.map.delete(oldest);
+        }
+    }
+    clear() {
+        this.map.clear();
+    }
+}
+//# sourceMappingURL=lru-cache.js.map
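The eviction order of the Map-based cache above is easier to see with a small run. This is a minimal sketch assuming a compact stand-in class, `TinyLru` (a hypothetical name), that uses the same technique as `SksLruCache`: delete and re-insert on read, and drop the first Map key when over capacity. It omits the `createdAt` bookkeeping.

```javascript
// Compact stand-in for SksLruCache; same Map-based technique, fewer fields.
class TinyLru {
    constructor(maxEntries = 128) {
        this.maxEntries = Math.max(1, Math.floor(maxEntries));
        this.map = new Map();
    }
    get(key) {
        if (!this.map.has(key))
            return null;
        const value = this.map.get(key);
        // Re-insert so the key moves to the most-recent end of the Map.
        this.map.delete(key);
        this.map.set(key, value);
        return value;
    }
    set(key, value) {
        this.map.delete(key);
        this.map.set(key, value);
        // A Map iterates in insertion order, so its first key is the oldest.
        while (this.map.size > this.maxEntries)
            this.map.delete(this.map.keys().next().value);
    }
}

const cache = new TinyLru(2);
cache.set('a', 1);
cache.set('b', 2);
cache.get('a');     // 'a' becomes the most recently used key
cache.set('c', 3);  // capacity exceeded: 'b' is evicted, not 'a'

const keysAfter = [...cache.map.keys()];
console.log(keysAfter);
```

One difference from the diffed class: `SksLruCache.set` stops evicting when the oldest key is falsy (`if (!oldest) break;`), so keys such as `0` or `''` would behave differently there. The sketch has no such guard.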
@@ -1,7 +1,9 @@
-import {
+import { GLM_52_OPENROUTER_MODEL, GLM_MAD_MODE } from './glm-52-settings.js';
+import { profileFromConst } from './glm-profile-resolver.js';
 export const GLM_CODEX_APP_PROFILE_ID = 'sks/glm-5.2-mad';
-export const GLM_CODEX_APP_PROFILE_LABEL = 'GLM 5.2 (MAD / OpenRouter)';
+export const GLM_CODEX_APP_PROFILE_LABEL = 'GLM 5.2 (MAD XHigh Speed / OpenRouter)';
 export function buildGlmCodexAppModelProfile() {
+    const speed = profileFromConst('speed');
     return {
         schema: 'sks.codex-app-model-profile.v1',
         id: GLM_CODEX_APP_PROFILE_ID,
@@ -12,12 +14,17 @@ export function buildGlmCodexAppModelProfile() {
         strictModelLock: true,
         gptFallbackAllowed: false,
         requiresSecret: 'openrouter-api-key',
+        defaultProfile: 'speed',
         defaultSettings: {
-            temperature:
-            top_p:
-            reasoning_effort: '
-            tool_choice:
-            parallel_tool_calls:
+            temperature: speed.temperature,
+            top_p: speed.top_p,
+            reasoning_effort: 'xhigh',
+            tool_choice: speed.tool_choice,
+            parallel_tool_calls: speed.parallel_tool_calls,
+            max_tokens: speed.max_tokens,
+            provider_sort: speed.provider.sort || 'throughput',
+            provider_allow_fallbacks: false,
+            provider_require_parameters: speed.provider.require_parameters
         },
         codexCompatibility: {
             target: 'rust-v0.141.0',
@@ -1,34 +1,62 @@
 import { GLM_52_DEFAULT_REQUEST_SETTINGS, GLM_52_OPENROUTER_MODEL, clampGlm52MaxTokens } from './glm-52-settings.js';
+import { buildDeepReasoningConfig } from './glm-reasoning-policy.js';
+import { profileFromConst, resolveGlmProfileFromArgs } from './glm-profile-resolver.js';
 export function buildGlm52Request(input) {
+    const profile = resolveInputProfile(input.profile, input.args, input.reasoningEffort);
+    if (profile.blockers.length) {
+        throw new Error(`GLM request profile blocked: ${profile.blockers.join(', ')}`);
+    }
+    const strictOrDeepEffort = profile.reasoning_effort || (input.reasoningEffort === 'high' || input.reasoningEffort === 'xhigh' ? input.reasoningEffort : undefined);
+    const reasoning = profile.name === 'speed'
+        ? buildDeepReasoningConfig('xhigh')
+        : buildDeepReasoningConfig(strictOrDeepEffort || 'high');
     const request = {
         model: GLM_52_OPENROUTER_MODEL,
         messages: input.messages,
-        stream: input.stream ??
-        temperature:
-        top_p:
-        reasoning
-        max_tokens: clampGlm52MaxTokens(input.maxTokens),
-        tool_choice: input.toolChoice ??
-        parallel_tool_calls: input.parallelToolCalls ??
+        stream: input.stream ?? profile.stream,
+        temperature: profile.temperature,
+        top_p: profile.top_p,
+        ...(reasoning ? { reasoning } : {}),
+        max_tokens: clampGlm52MaxTokens(input.maxTokens ?? profile.max_tokens),
+        tool_choice: input.toolChoice ?? profile.tool_choice,
+        parallel_tool_calls: input.parallelToolCalls ?? profile.parallel_tool_calls,
+        ...(profile.stop && profile.name === 'speed' ? { stop: profile.stop } : {}),
         provider: {
             allow_fallbacks: false,
-            require_parameters:
-            sort: input.providerSort ??
-
+            require_parameters: profile.provider.require_parameters,
+            ...(profile.provider.sort || input.providerSort ? { sort: input.providerSort ?? profile.provider.sort } : {}),
+            ...(profile.provider.preferred_min_throughput ? { preferred_min_throughput: profile.provider.preferred_min_throughput } : {}),
+            ...(profile.provider.preferred_max_latency ? { preferred_max_latency: profile.provider.preferred_max_latency } : {}),
+            ...(profile.provider.order ? { order: profile.provider.order } : {})
+        },
+        ...(input.responseFormat || profile.response_format ? { response_format: input.responseFormat ?? profile.response_format } : {})
     };
     return {
         ...request,
-        ...(input.tools ? { tools: input.tools } : {})
-        ...(input.responseFormat ? { response_format: input.responseFormat } : {})
+        ...(input.tools && request.tool_choice !== 'none' ? { tools: input.tools } : {})
     };
 }
 export function buildGlm52KeyValidationRequest() {
     return buildGlm52Request({
         messages: [{ role: 'user', content: 'Reply with OK.' }],
+        profile: 'speed',
         stream: false,
         maxTokens: 1,
         toolChoice: 'none',
         parallelToolCalls: false
     });
 }
+function resolveInputProfile(profile, args, reasoningEffort) {
+    if (profile && typeof profile === 'object')
+        return profile;
+    if (profile)
+        return profileFromConst(profile);
+    if (args)
+        return resolveGlmProfileFromArgs(args);
+    if (reasoningEffort === 'xhigh')
+        return profileFromConst('xhigh');
+    if (reasoningEffort === 'high')
+        return profileFromConst('deep');
+    return profileFromConst(GLM_52_DEFAULT_REQUEST_SETTINGS.mode === 'mad-glm-speed' ? 'speed' : 'speed');
+}
 //# sourceMappingURL=glm-52-request.js.map
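The final spread in `buildGlm52Request` drops `tools` whenever the resolved `tool_choice` is `'none'`, which is how the speed profile ends up sending no tool schemas. The sketch below isolates just that conditional spread. `attachTools` and the `read_file` tool definition are hypothetical and exist only for illustration.

```javascript
// Hypothetical helper isolating the last spread of buildGlm52Request:
// tools are attached only when they exist and tool_choice is not 'none'.
function attachTools(request, tools) {
    return {
        ...request,
        ...(tools && request.tool_choice !== 'none' ? { tools } : {})
    };
}

// Illustrative tool schema; the name and shape are assumptions, not SKS tools.
const tools = [{
    type: 'function',
    function: { name: 'read_file', parameters: { type: 'object', properties: {} } }
}];

// Speed-like request: tool_choice 'none' means the tools array is omitted.
const speedLike = attachTools({ tool_choice: 'none' }, tools);
// Deep-like request: tool_choice 'auto' keeps the tools array.
const deepLike = attachTools({ tool_choice: 'auto' }, tools);

console.log(Object.keys(speedLike));
console.log(Object.keys(deepLike));
```

Omitting the key entirely, rather than sending an empty array, keeps the serialized request smaller. It also avoids sending a `tools` field alongside `tool_choice: 'none'`.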
@@ -11,8 +11,7 @@ export function assertGlm52ActualModel(responseModel) {
     }
     const normalized = responseModel.toLowerCase();
     if (normalized === GLM_52_OPENROUTER_MODEL ||
-        normalized.startsWith(`${GLM_52_OPENROUTER_MODEL}-`)
-        normalized.includes('glm-5.2')) {
+        normalized.startsWith(`${GLM_52_OPENROUTER_MODEL}-`)) {
         return {
             ok: true,
             code: 'ok',
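The hunk above removes the `includes('glm-5.2')` substring test from the response-model guard, leaving only an exact match or a `z-ai/glm-5.2-` prefix. The sketch below compares the two checks. It assumes the truncated removed line ended in `||`, and the sample model ids other than `z-ai/glm-5.2` are invented for illustration.

```javascript
const GLM_52_OPENROUTER_MODEL = 'z-ai/glm-5.2';

// Assumed reconstruction of the pre-4.0.5 condition (the removed line is
// truncated in the diff; the '||' joining the last two tests is an assumption).
function matchesBefore(model) {
    const normalized = model.toLowerCase();
    return normalized === GLM_52_OPENROUTER_MODEL ||
        normalized.startsWith(`${GLM_52_OPENROUTER_MODEL}-`) ||
        normalized.includes('glm-5.2');
}

// Condition as it appears on the added line: exact id or dash-suffixed variant.
function matchesAfter(model) {
    const normalized = model.toLowerCase();
    return normalized === GLM_52_OPENROUTER_MODEL ||
        normalized.startsWith(`${GLM_52_OPENROUTER_MODEL}-`);
}

// Hypothetical response model ids; only the first is taken from the diff.
const samples = [
    'z-ai/glm-5.2',
    'Z-AI/GLM-5.2-20260101',
    'other-vendor/glm-5.2-lookalike'
];
const comparison = samples.map((id) => ({
    id,
    before: matchesBefore(id),
    after: matchesAfter(id)
}));
console.log(comparison);
```

Under these assumptions, only the third id changes outcome: the substring test used to accept it and the stricter check rejects it.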
@@ -1,24 +1,66 @@
 export { OPENROUTER_CHAT_COMPLETIONS_URL } from '../openrouter/openrouter-types.js';
 export const GLM_52_OPENROUTER_MODEL = 'z-ai/glm-5.2';
-export const
-export const
+export const GLM_52_MODEL = GLM_52_OPENROUTER_MODEL;
+export const GLM_SPEED_MODE = 'mad-glm-speed';
+export const GLM_DEEP_MODE = 'mad-glm-deep';
+export const GLM_XHIGH_MODE = 'mad-glm-xhigh';
+export const GLM_STRICT_MODE = 'mad-glm-strict';
+export const GLM_MAD_MODE = GLM_SPEED_MODE;
+export const GLM_52_MAX_TOKENS_SPEED = 4096;
+export const GLM_52_MAX_TOKENS_DEFAULT = GLM_52_MAX_TOKENS_SPEED;
+export const GLM_52_MAX_TOKENS_DEEP = 16384;
+export const GLM_52_MAX_TOKENS_XHIGH = 32768;
 export const GLM_52_MAX_TOKENS_LONG = 65536;
 export const GLM_52_MAX_TOKENS_XLONG = 131072;
 export const GLM_52_TOP_PROVIDER_MAX_COMPLETION_TOKENS = 262144;
-export const
+export const GLM_SPEED_PROFILE = {
     model: GLM_52_OPENROUTER_MODEL,
-
-
-
+    mode: GLM_SPEED_MODE,
+    temperature: 0.2,
+    top_p: 0.85,
     stream: true,
     provider: {
         allow_fallbacks: false,
-        require_parameters: true
+        require_parameters: true,
+        sort: 'throughput',
+        preferred_min_throughput: { p50: 80, p90: 40 },
+        preferred_max_latency: { p50: 2, p90: 5 }
+    },
+    tool_choice: 'none',
+    parallel_tool_calls: false,
+    max_tokens: GLM_52_MAX_TOKENS_SPEED,
+    reasoning_effort: 'xhigh',
+    reasoning_default: 'xhigh-speed-optimized'
+};
+export const GLM_DEEP_PROFILE = {
+    model: GLM_52_OPENROUTER_MODEL,
+    mode: GLM_DEEP_MODE,
+    temperature: 0.3,
+    top_p: 0.9,
+    stream: true,
+    provider: {
+        allow_fallbacks: false,
+        require_parameters: true,
+        sort: 'throughput'
     },
     tool_choice: 'auto',
     parallel_tool_calls: false,
-    max_tokens:
+    max_tokens: GLM_52_MAX_TOKENS_DEEP,
+    reasoning_effort: 'high'
+};
+export const GLM_XHIGH_PROFILE = {
+    ...GLM_DEEP_PROFILE,
+    mode: GLM_XHIGH_MODE,
+    max_tokens: GLM_52_MAX_TOKENS_XHIGH,
+    reasoning_effort: 'xhigh'
+};
+export const GLM_STRICT_PROFILE = {
+    ...GLM_DEEP_PROFILE,
+    mode: GLM_STRICT_MODE,
+    structured_outputs: true,
+    response_format: 'json_schema'
 };
+export const GLM_52_DEFAULT_REQUEST_SETTINGS = GLM_SPEED_PROFILE;
 export function clampGlm52MaxTokens(value) {
     const numeric = Number.isFinite(value) ? Math.floor(Number(value)) : GLM_52_MAX_TOKENS_DEFAULT;
     return Math.max(1, Math.min(numeric, GLM_52_TOP_PROVIDER_MAX_COMPLETION_TOKENS));
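Because `GLM_52_MAX_TOKENS_DEFAULT` now aliases the 4096-token speed budget, the fallback behaviour of `clampGlm52MaxTokens` changes with it. The sketch below copies the function body and the two constants it depends on from the hunk above and runs a few inputs. The `clamped` object and its keys are illustrative only.

```javascript
// Constants as they appear on the added lines of the settings hunk.
const GLM_52_MAX_TOKENS_DEFAULT = 4096;
const GLM_52_TOP_PROVIDER_MAX_COMPLETION_TOKENS = 262144;

// Body copied from the context lines of the hunk.
function clampGlm52MaxTokens(value) {
    const numeric = Number.isFinite(value) ? Math.floor(Number(value)) : GLM_52_MAX_TOKENS_DEFAULT;
    return Math.max(1, Math.min(numeric, GLM_52_TOP_PROVIDER_MAX_COMPLETION_TOKENS));
}

const clamped = {
    missing: clampGlm52MaxTokens(undefined),    // falls back to the default
    fractional: clampGlm52MaxTokens(10.7),      // floored
    zero: clampGlm52MaxTokens(0),               // raised to the lower bound
    huge: clampGlm52MaxTokens(1e9),             // capped at the provider maximum
    numericString: clampGlm52MaxTokens('8192')  // Number.isFinite does not coerce strings
};
console.log(clamped);
```

The last case is worth noting for callers that read the budget from CLI arguments or environment variables: a numeric string is treated as missing and silently replaced by the default unless it is converted to a number first.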
@@ -0,0 +1,90 @@
+import path from 'node:path';
+import { nowIso, writeJsonAtomic } from '../../fsx.js';
+import { profileFromConst } from './glm-profile-resolver.js';
+import { createEmptyGlmLatencyTrace, writeGlmLatencyTrace } from './glm-latency-trace.js';
+const SYNTHETIC_CASES = Object.freeze([
+    benchCase('small doc edit', 'doc_edit', 420, 980),
+    benchCase('small TS function edit', 'small_edit', 460, 1100),
+    benchCase('failing test fix from small error', 'test_fix', 520, 1220),
+    benchCase('simple config edit', 'config_edit', 390, 930)
+]);
+export async function runGlmBench(root, args = []) {
+    const execute = args.includes('--execute');
+    if (execute) {
+        const blocked = {
+            schema: 'sks.glm-bench-result.v1',
+            version: '4.0.5',
+            generated_at: nowIso(),
+            status: 'blocked',
+            dry_run: true,
+            cases: [],
+            summary: {
+                speed_p50_total_ms: 0,
+                speed_p90_total_ms: 0,
+                speed_p50_ttft_ms: null
+            },
+            warnings: ['execute_requested_but_live_openrouter_bench_not_implemented']
+        };
+        await writeJsonAtomic(path.join(root, '.sneakoscope', 'glm', 'bench-blocked.json'), blocked);
+        return blocked;
+    }
+    const speedTotals = SYNTHETIC_CASES.map((row) => row.speed.total_ms);
+    const deepTotals = SYNTHETIC_CASES.map((row) => row.deep.total_ms);
+    const result = {
+        schema: 'sks.glm-bench-result.v1',
+        version: '4.0.5',
+        generated_at: nowIso(),
+        status: 'dry_run',
+        dry_run: true,
+        cases: SYNTHETIC_CASES,
+        summary: {
+            speed_p50_total_ms: percentile(speedTotals, 50),
+            speed_p90_total_ms: percentile(speedTotals, 90),
+            speed_p50_ttft_ms: null,
+            deep_p50_total_ms: percentile(deepTotals, 50),
+            speed_vs_deep_ratio: Number((percentile(speedTotals, 50) / percentile(deepTotals, 50)).toFixed(3))
+        },
+        warnings: ['synthetic_dry_run_no_network_no_gpt_key_required']
+    };
+    await writeJsonAtomic(path.join(root, '.sneakoscope', 'glm', 'bench-result.json'), result);
+    await writeGlmLatencyTrace(root, {
+        ...createEmptyGlmLatencyTrace('speed'),
+        total_ms: result.summary.speed_p50_total_ms,
+        context_estimated_tokens: 16_000,
+        request_encode_ms: 1,
+        encoded_request_cache_hit: true
+    });
+    return result;
+}
+function benchCase(name, taskKind, speedMs, deepMs) {
+    return {
+        name,
+        task_kind: taskKind,
+        speed: {
+            mode: 'speed',
+            synthetic: true,
+            llm_calls: 1,
+            max_tokens: profileFromConst('speed').max_tokens,
+            context_target_tokens: 16_000,
+            total_ms: speedMs,
+            ttft_ms: null
+        },
+        deep: {
+            mode: 'deep',
+            synthetic: true,
+            llm_calls: 1,
+            max_tokens: profileFromConst('deep').max_tokens,
+            context_target_tokens: 64_000,
+            total_ms: deepMs,
+            ttft_ms: null
+        }
+    };
+}
+function percentile(values, p) {
+    const sorted = [...values].sort((a, b) => a - b);
+    if (!sorted.length)
+        return 0;
+    const index = Math.min(sorted.length - 1, Math.max(0, Math.ceil((p / 100) * sorted.length) - 1));
+    return sorted[index] || 0;
+}
+//# sourceMappingURL=glm-bench.js.map
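The dry-run summary is fully determined by the four hard-coded synthetic cases, so its numbers can be reproduced without touching the filesystem. The sketch below copies `percentile` (a nearest-rank percentile) and the synthetic totals from the hunk above. The variable names for the results are illustrative.

```javascript
// Nearest-rank percentile, copied from the bench hunk.
function percentile(values, p) {
    const sorted = [...values].sort((a, b) => a - b);
    if (!sorted.length)
        return 0;
    const index = Math.min(sorted.length - 1, Math.max(0, Math.ceil((p / 100) * sorted.length) - 1));
    return sorted[index] || 0;
}

// total_ms values of the four synthetic cases, in declaration order.
const speedTotals = [420, 460, 520, 390];
const deepTotals = [980, 1100, 1220, 930];

// p50 over four sorted samples picks the second one; p90 picks the last.
const speedP50 = percentile(speedTotals, 50);
const speedP90 = percentile(speedTotals, 90);
const deepP50 = percentile(deepTotals, 50);
const ratio = Number((speedP50 / deepP50).toFixed(3));

console.log({ speedP50, speedP90, deepP50, ratio });
// → { speedP50: 420, speedP90: 520, deepP50: 980, ratio: 0.429 }
```

These figures describe only the constants baked into the module. As the `synthetic_dry_run_no_network_no_gpt_key_required` warning indicates, no request is sent, and `--execute` is reported as blocked rather than measured.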
@@ -0,0 +1,15 @@
+export const GLM_SPEED_CONTEXT_TARGET_TOKENS = 16_000;
+export const GLM_SPEED_CONTEXT_HARD_CAP_TOKENS = 32_000;
+export const GLM_DEEP_CONTEXT_TARGET_TOKENS = 64_000;
+export function estimateGlmTokens(text) {
+    if (!text)
+        return 0;
+    return Math.ceil(text.length / 4);
+}
+export function trimToEstimatedTokens(text, maxTokens) {
+    const maxChars = Math.max(0, Math.floor(maxTokens) * 4);
+    if (text.length <= maxChars)
+        return text;
+    return text.slice(0, maxChars);
+}
+//# sourceMappingURL=glm-context-budget.js.map
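The context budget above relies on a fixed heuristic of four characters per token rather than a real tokenizer. The sketch below copies both helpers from the hunk and runs them on a short string to show the rounding and trimming behaviour. The sample inputs are arbitrary.

```javascript
// Both helpers copied from the context-budget hunk.
function estimateGlmTokens(text) {
    if (!text)
        return 0;
    return Math.ceil(text.length / 4);
}
function trimToEstimatedTokens(text, maxTokens) {
    const maxChars = Math.max(0, Math.floor(maxTokens) * 4);
    if (text.length <= maxChars)
        return text;
    return text.slice(0, maxChars);
}

const sample = 'x'.repeat(10);

// 10 characters round up to 3 estimated tokens.
const estimated = estimateGlmTokens(sample);
// A 2-token budget keeps the first 8 characters.
const trimmed = trimToEstimatedTokens(sample, 2);
// Empty or missing text costs nothing.
const emptyEstimate = estimateGlmTokens('');

console.log({ estimated, trimmedLength: trimmed.length, emptyEstimate });
```

Two properties follow from the code as written. The estimate counts UTF-16 code units, so text made mostly of non-Latin characters may be over- or under-counted relative to a real tokenizer. The trim also keeps the head of the text and can cut in the middle of a surrogate pair.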
@@ -0,0 +1,9 @@
+import { SksLruCache } from '../../perf/lru-cache.js';
+export function createGlmContextCache(maxEntries = 64) {
+    const cache = new SksLruCache(maxEntries);
+    return {
+        getByDigest: (digest) => cache.get(digest),
+        set: (context) => cache.set(context.digest, context)
+    };
+}
+//# sourceMappingURL=glm-context-cache.js.map