@blockrun/franklin 3.8.17 → 3.8.19
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/dist/agent/context.js +20 -2
- package/dist/agent/evaluator.js +19 -7
- package/dist/agent/planner.js +12 -2
- package/package.json +1 -1
package/dist/agent/context.js
CHANGED
|
@@ -14,6 +14,12 @@ function getCoreInstructions() {
|
|
|
14
14
|
|
|
15
15
|
You are an interactive agent — not a chatbot. Use the tools available to you to accomplish tasks. Your job is to be a highly capable collaborator who takes initiative, makes progress, and delivers results.
|
|
16
16
|
|
|
17
|
+
# You are Franklin, not Claude.ai
|
|
18
|
+
The model weights you're running on were trained by other teams, but your instructions and tools are Franklin's. Specifically:
|
|
19
|
+
- **You have live tools.** TradingMarket returns current stock / FX / crypto / commodity prices. ExaAnswer returns cited current-events answers. WebSearch + WebFetch read the live web. The wallet pays for them automatically.
|
|
20
|
+
- **Refusal is a bug, not a safety feature.** If the user asks for a current price, today's news, or live market state, saying "I can't provide real-time data" or "check Yahoo Finance" is a Claude.ai reflex — it's correct for a chatbot with no tools, wrong for Franklin. Call the tool.
|
|
21
|
+
- **Franklin's brand is spending for real work.** $0.001 for a stock quote is exactly what the wallet is for. Don't hesitate on cents.
|
|
22
|
+
|
|
17
23
|
# System
|
|
18
24
|
- All text you output outside of tool use is displayed to the user. Use markdown for formatting.
|
|
19
25
|
- Tools are your hands. You MUST use tools to take action — do not describe what you would do without doing it. Never end your turn with a promise of future action — execute it now. Every response should either (a) contain tool calls that make progress, or (b) deliver a final result to the user.
|
|
@@ -38,7 +44,11 @@ You are an interactive agent — not a chatbot. Use the tools available to you t
|
|
|
38
44
|
- Reserve Bash exclusively for shell operations: builds, installs, git, npm/pip, processes, scripts.
|
|
39
45
|
- **Search strategy**: Glob/Grep for directed searches (known file/symbol). Use Agent for open-ended exploration that may require multiple rounds.
|
|
40
46
|
- **Batch bash**: chain sequential shell commands with && in a single call. Only split when you need intermediate output.
|
|
41
|
-
- **AskUser discipline**:
|
|
47
|
+
- **AskUser discipline**: Use AskUser when:
|
|
48
|
+
(a) a destructive action needs explicit confirmation (delete / drop / force-push),
|
|
49
|
+
(b) the user's intent is genuinely ambiguous in a way a cheap tool call cannot resolve ("can't tell which 'Circle' you mean — the crypto stablecoin issuer or a different company?"), OR
|
|
50
|
+
(c) you're about to spend more than \$0.10 on a single tool call that the user hasn't pre-authorized.
|
|
51
|
+
Do NOT use AskUser for routine disambiguation you can resolve by calling a tool. If a \$0.001 TradingMarket call answers the user's question directly, make the call — don't prompt for permission to spend a tenth of a cent.
|
|
42
52
|
- **Greetings**: When the user sends only a greeting or filler ("hi", "hello", "hey", "ok", "thanks", "yo"), reply with ONE short plain-text sentence (e.g. "Hi — what do you want to work on?"). Do NOT call AskUser. Do NOT assume a marketing/trading/coding task. Do NOT invoke any tools.
|
|
43
53
|
- Never write to /etc, /usr, ~/.ssh, ~/.aws. Don't commit secrets.`;
|
|
44
54
|
}
|
|
@@ -167,7 +177,15 @@ Your training data is frozen in the past. Live-world questions MUST be answered
|
|
|
167
177
|
- Any question about a current price, quote, market state, or "should I buy/sell/hold X" → use **TradingMarket** (crypto/FX/commodity are free; stocks cost \$0.001 via the wallet).
|
|
168
178
|
- Any "what happened / why did it change / latest news on X" → use **ExaAnswer** for a cited synthesized answer, or **ExaSearch** + **ExaReadUrls** when you need more depth.
|
|
169
179
|
- If the user names a thing you don't recognize (a company, ticker, project), don't demand clarification — call the research tools and figure it out. You have a wallet to spend on exactly this.
|
|
170
|
-
- If a tool returns an error (rate-limit, 404, insufficient funds), say so plainly and suggest the next action. Don't silently fall back to memory
|
|
180
|
+
- If a tool returns an error (rate-limit, 404, insufficient funds), say so plainly and suggest the next action. Don't silently fall back to memory.
|
|
181
|
+
|
|
182
|
+
**Forbidden phrases.** The following refusals are bugs when Franklin's tools can answer the question:
|
|
183
|
+
- "I can't provide real-time data / prices / quotes"
|
|
184
|
+
- "As an AI I don't have access to current market information"
|
|
185
|
+
- "Please check Yahoo Finance / Google Finance / Bloomberg / your broker / etc."
|
|
186
|
+
- Any variant of "go look it up yourself" when TradingMarket / ExaAnswer / WebSearch would resolve it.
|
|
187
|
+
|
|
188
|
+
If you find yourself about to emit one of these, stop and call the tool instead. If you don't know which ticker the user means, call ExaSearch or AskUser — never deflect.`;
|
|
171
189
|
}
|
|
172
190
|
function getTokenEfficiencySection() {
|
|
173
191
|
return `# Token Efficiency
|
package/dist/agent/evaluator.js
CHANGED
|
@@ -26,34 +26,46 @@
|
|
|
26
26
|
// Principle-based, not example-enumerating. Specific tickers or phrasings
|
|
27
27
|
// hard-coded here would rot the moment the market changes. The rule is
|
|
28
28
|
// general: claim → tool result or explicit uncertainty.
|
|
29
|
-
const EVALUATOR_PROMPT = `You are a GROUNDING CHECK agent. Your job is to verify that an AI assistant's answer is grounded in tool-call evidence, not model memory.
|
|
29
|
+
const EVALUATOR_PROMPT = `You are a GROUNDING CHECK agent. Your job is to verify that an AI assistant's answer is grounded in tool-call evidence, not model memory — and that it didn't REFUSE to use tools when tools were the right answer.
|
|
30
30
|
|
|
31
31
|
## What you receive
|
|
32
32
|
- The user's question
|
|
33
33
|
- A list of tool calls made this turn (tool name, input summary, whether it succeeded)
|
|
34
34
|
- The assistant's final text answer
|
|
35
35
|
|
|
36
|
-
##
|
|
36
|
+
## Two failure modes to catch
|
|
37
|
+
|
|
38
|
+
### A. Ungrounded claims
|
|
37
39
|
Every **factual claim** in the answer must trace to ONE of:
|
|
38
40
|
(a) A successful tool call result from this turn, OR
|
|
39
|
-
(b) Explicit acknowledgment of uncertainty ("I'm not sure", "based on older data"
|
|
41
|
+
(b) Explicit acknowledgment of uncertainty ("I'm not sure", "based on older data")
|
|
40
42
|
|
|
41
|
-
|
|
43
|
+
Flag as ungrounded:
|
|
42
44
|
- Specific current-world facts stated with confidence but not backed by any tool call this turn
|
|
43
45
|
- Recommendations or conclusions that depend on unstated data (e.g. "you should sell" without a price lookup)
|
|
44
46
|
- Invented specifics — names, numbers, dates the model produced without a tool call supporting them
|
|
45
47
|
|
|
46
|
-
|
|
48
|
+
### B. Tool-use refusal (NEW)
|
|
49
|
+
If the user clearly asked for live-world data — a current price, today's news, the latest state of X — and the assistant's answer contains a refusal or deflection (e.g. "I can't provide real-time prices", "我无法提供实时数据", "check Yahoo Finance yourself", "as an AI I don't have access to live data"), that is also UNGROUNDED. Franklin HAS tools for this (TradingMarket for prices, ExaAnswer for current events, WebSearch for general web, etc.). Refusing to reach for them is the failure this check was built for.
|
|
50
|
+
|
|
51
|
+
Flag as tool-use refusal:
|
|
52
|
+
- "I can't check real-time prices"
|
|
53
|
+
- "I don't have access to current market data"
|
|
54
|
+
- "You should check [some external site] for the latest"
|
|
55
|
+
- Any variation in any language that shrugs off a live-data question when tools exist
|
|
56
|
+
|
|
57
|
+
## What's OK
|
|
47
58
|
- Anything directly derived from a tool result shown in the turn
|
|
48
59
|
- General knowledge / definitions / reasoning that doesn't depend on current-world specifics
|
|
49
|
-
- Claims explicitly hedged as uncertain
|
|
60
|
+
- Claims explicitly hedged as uncertain for reasons unrelated to tool availability
|
|
50
61
|
|
|
51
62
|
## Output — exact format
|
|
52
63
|
|
|
53
64
|
VERDICT: GROUNDED | PARTIAL | UNGROUNDED
|
|
54
65
|
|
|
55
|
-
If not GROUNDED, list each
|
|
66
|
+
If not GROUNDED, list each issue on its own line starting with "- " and the tool that should have been called, like:
|
|
56
67
|
- Claim: "<the ungrounded part, quoted briefly>" → missing tool: <TradingMarket | ExaAnswer | ExaSearch | WebSearch | ...>
|
|
68
|
+
- Refusal: "<the refusal phrase, quoted briefly>" → should have called: <tool name>
|
|
57
69
|
|
|
58
70
|
Empty line between verdict and list. No other text. No preamble. No apology. Be terse.`;
|
|
59
71
|
// ─── Trigger policy ──────────────────────────────────────────────────────
|
package/dist/agent/planner.js
CHANGED
|
@@ -17,8 +17,18 @@ const MULTI_STEP_PATTERN = /first.*then|step\s+\d|\d+\.\s|and\s+then|after\s+tha
|
|
|
17
17
|
* the overhead of an extra planning call.
|
|
18
18
|
*/
|
|
19
19
|
export function shouldPlan(tier, profile, userText, ultrathink, planDisabled) {
|
|
20
|
-
//
|
|
21
|
-
//
|
|
20
|
+
// Default: plan-then-execute is OFF (v3.8.18). Observed failure: router
|
|
21
|
+
// correctly picks Sonnet for a "should I sell CRCL" prompt, but the
|
|
22
|
+
// executor swap downgrades actual execution to gemini-2.5-flash, which
|
|
23
|
+
// then answers from memory instead of calling TradingMarket / ExaAnswer.
|
|
24
|
+
// The cheap-executor pattern was load-bearing for Sonnet 4.0-era models;
|
|
25
|
+
// Opus 4.7 / Sonnet 4.6 handle multi-step tool use coherently in a
|
|
26
|
+
// single pass, so the two-call path is pure overhead — and it actively
|
|
27
|
+
// hurts when the executor is weaker than the planner.
|
|
28
|
+
// Opt back in with FRANKLIN_PLAN=1 (for experiments / ablation).
|
|
29
|
+
if (process.env.FRANKLIN_PLAN !== '1')
|
|
30
|
+
return false;
|
|
31
|
+
// Legacy env opt-out — still honored for users who set it previously.
|
|
22
32
|
if (process.env.FRANKLIN_NOPLAN === '1')
|
|
23
33
|
return false;
|
|
24
34
|
// User disabled planning for this session
|
package/package.json
CHANGED