@tiens.nguyen/gonext-local-worker 1.0.50 → 1.0.51

package/README.md CHANGED
@@ -2,6 +2,32 @@
  Run:
  GONEXT_API_BASE=... GONEXT_WORKER_KEY=... npx -y --package @tiens.nguyen/gonext-local-worker gonext-local-worker

+ ## Agent chat mode
+
+ Select **Agent** (instead of Chat) under the composer in the web app. Your
+ free-form prompt is handed to a [smolagents](https://github.com/huggingface/smolagents)
+ agent running on your local MLX/Ollama model. The agent can call tools (v1:
+ `http_request`) and streams its thinking steps + final answer directly into the
+ chat thread.
+
+ Requires smolagents in the worker's Python environment:
+
+ ```sh
+ pip install smolagents certifi
+ ```
+
+ - `certifi` supplies a trusted CA bundle so the agent's `http_request` tool can
+ verify HTTPS certificates on macOS (where Python's default bundle may be
+ missing). The worker falls back to the system bundle if certifi is absent.
+ - Agent mode is blocked for cloud models (the API returns 400). Select a local
+ MLX or Ollama model first.
+ - Tool steps appear in the collapsible reasoning (`<think>`) area; the final
+ answer is the message body — no new UI needed.
+
+ The agent script is `gonext_agent_chat.py` (reads `{messages, agentBaseURL,
+ agentApiKey, agentModelId, tools, maxSteps}` on stdin; emits NDJSON
+ `{"type":"step"/"final","text":"..."}` lines on stdout).
+
  ## API Check / HTTP probe (Tools & Agents modes)

  The worker can run Postman-style HTTP probes queued from the web app
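The README hunk above states that `certifi` supplies the CA bundle for the agent's `http_request` tool and that the worker falls back to the system bundle when certifi is absent. A minimal Python sketch of that fallback follows. The helper name `build_ssl_context` is hypothetical; the package's actual implementation is not shown in this diff.

```python
import ssl


def build_ssl_context() -> ssl.SSLContext:
    """Return a verifying TLS context, preferring certifi's CA bundle.

    Hypothetical helper for illustration; not taken from the package.
    """
    try:
        import certifi  # optional dependency: pip install certifi
    except ImportError:
        # certifi is absent: fall back to the system trust store.
        return ssl.create_default_context()
    # certifi.where() returns the path to certifi's bundled CA file.
    return ssl.create_default_context(cafile=certifi.where())


if __name__ == "__main__":
    ctx = build_ssl_context()
    print("certificate verification required:", ctx.verify_mode == ssl.CERT_REQUIRED)
```

Either branch returns a context that verifies certificates and hostnames, so the fallback changes only where the trusted roots come from.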
@@ -12,23 +38,15 @@ network_error).

  - **Tools (`tool_only`)** — no extra setup. The selected local model writes a
  one-line health summary of the measured result.
- - **Agents (`agentic`)** — a [smolagents](https://github.com/huggingface/smolagents)
- agent (running on the selected local model) produces the summary. Install it
- in the worker's Python environment:
-
- ```sh
- pip install smolagents
- ```
+ - **Agents (`agentic`)** — a smolagents agent (running on the selected local
+ model) produces the summary. Requires `pip install smolagents`.

  The agent talks to your local MLX OpenAI-compatible server (no cloud calls).
  The agent only summarizes; the worker's measurement stays the source of truth,
  so if smolagents or the model is unavailable the probe still returns the
  measured result with a note.

- ### Probe-related env
+ ### Env vars

- GONEXT_PROBE_PYTHON Python executable for the smolagents agent
+ GONEXT_PROBE_PYTHON Python executable for smolagents scripts
  (default: GONEXT_MLX_LM_PYTHON or python3)
-
- The agent script lives next to the worker as `gonext_probe_agent.py` (reads a
- JSON probe config on stdin, writes a JSON summary on stdout).
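The "Env vars" entry in this hunk documents a lookup order for the interpreter: `GONEXT_PROBE_PYTHON`, then `GONEXT_MLX_LM_PYTHON`, then `python3`. The sketch below expresses that order in Python. It is not a port of the worker's JavaScript: the function name `resolve_probe_python` is hypothetical, and treating blank values as unset is an assumption.

```python
import os
from typing import Mapping, Optional


def resolve_probe_python(env: Optional[Mapping[str, str]] = None) -> str:
    """Pick the Python executable in the order the README documents."""
    env = os.environ if env is None else env
    for name in ("GONEXT_PROBE_PYTHON", "GONEXT_MLX_LM_PYTHON"):
        value = env.get(name, "").strip()
        if value:  # assumption: blank or whitespace-only counts as unset
            return value
    return "python3"  # documented default


if __name__ == "__main__":
    print(resolve_probe_python())
```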
@@ -1216,6 +1216,16 @@ async function runAgentChatJob(job) {
  }
  };

+ console.log(
+ `[gonext-worker] agent_chat ${jobId} baseURL=${payload?.agentBaseURL ?? "(none)"} modelId=${payload?.agentModelId ?? "(none)"}`
+ );
+ // Send an immediate heartbeat so the web 60-180s no-progress timer doesn't
+ // fire while the local model is loading/generating its first reasoning step.
+ enqueueText("<think>Agent starting…\n");
+ flushTail = flushTail.then(() => flushChunks()).catch((err) => {
+ console.error("[gonext-worker] agent_chat heartbeat flush error:", err);
+ });
+
  try {
  const python =
  (process.env.GONEXT_PROBE_PYTHON ?? process.env.GONEXT_MLX_LM_PYTHON ?? "")
  (process.env.GONEXT_PROBE_PYTHON ?? process.env.GONEXT_MLX_LM_PYTHON ?? "")
@@ -1231,7 +1241,7 @@ async function runAgentChatJob(job) {
  });
  const timeoutMs = 300_000; // 5 min max for an agent run

- let inThink = false;
+ let inThink = true; // already opened <think> above
  let finalText = "";

  await runProcessWithStreamingStdout(python, [scriptPath], input, timeoutMs, (event) => {
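The two worker hunks above work together: the heartbeat writes an opening `<think>` tag before the script starts, so `inThink` has to start as `true` or the first step would open a second tag. The body of the event callback is not shown in this diff, so the Python sketch below is only an assumed model of that state machine. The name `fold_events` and the exact placement of newlines and the closing tag are hypothetical.

```python
from typing import Iterable, List, Mapping


def fold_events(events: Iterable[Mapping[str, str]]) -> str:
    """Fold step/final events into one chat string with a single <think> block."""
    # The heartbeat text is sent first, so the tag is already open.
    chunks: List[str] = ["<think>Agent starting…\n"]
    in_think = True
    for event in events:
        kind = event.get("type")
        text = event.get("text", "")
        if kind == "step":
            if not in_think:  # reopen only if a previous block was closed
                chunks.append("<think>")
                in_think = True
            chunks.append(text + "\n")
        elif kind == "final":
            if in_think:  # close the reasoning area before the answer
                chunks.append("</think>")
                in_think = False
            chunks.append(text)
    if in_think:  # the run ended without a final event
        chunks.append("</think>")
    return "".join(chunks)


if __name__ == "__main__":
    demo = [
        {"type": "step", "text": "Calling http_request"},
        {"type": "final", "text": "The endpoint is healthy."},
    ]
    print(fold_events(demo))
```

Starting `in_think` as `False` in this model would emit a duplicate `<think>` on the first step, which is the situation the one-line change in the hunk avoids.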
@@ -148,6 +148,9 @@ def run_agent_chat(cfg):
  max_steps=max_steps,
  step_callbacks=[step_callback],
  )
+ # Emit before agent.run() so the web no-progress timer resets while the
+ # model loads its weights and generates its first reasoning step.
+ _emit({"type": "step", "text": f"Sending task to {agent_model_id}…"})
  with contextlib.redirect_stdout(sys.stderr):
  result = agent.run(task_text)
  final_text = str(result).strip()
  final_text = str(result).strip()
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "@tiens.nguyen/gonext-local-worker",
- "version": "1.0.50",
+ "version": "1.0.51",
  "description": "Polls GoNext cloud API for async local LLM jobs and runs them against Ollama/OpenAI-compatible servers on this Mac",
  "type": "module",
  "license": "MIT",