@stephen-lord/other2 1.0.8 → 1.0.10
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/dist/docs/manus/CN-/346/211/222/345/256/214/345/205/250/347/275/221/346/234/200/345/274/272-AI-/345/233/242/351/230/237/347/232/204-Context-Engineering-/346/224/273/347/225/245/346/210/221/344/273/254/346/200/273/347/273/223/345/207/272/344/272/206/350/277/231-5-/345/244/247/346/226/271/346/263/225-/346/231/272/346/272/220/347/244/276/345/214/272.md +2464 -0
- package/dist/docs/manus/Context-Engineering-for-AI-Agents-Lessons-from-Building-Manus.md +212 -0
- package/dist/docs/manus/Context-Engineering-for-AI-Agents-Part-2.md +96 -0
- package/dist/docs/manus/Industry.md +94 -0
- package/dist/docs/manus/Observability-for-Manus-15-Agents-Logs-Retries-and-Error-Budgets.md +346 -0
- package/dist/docs/manus/OpenManus-Technical-Analysis-Architecture-and-Implementation-of-an-Open-Source-A.md +324 -0
- package/dist/docs/manus/README.md +85 -0
- package/dist/docs/manus/Tech-Constrained-Decoding-Agent-Reliability.md +81 -0
- package/dist/docs/manus/Tech-How-to-build-function-calling-and-JSON-mode.md +43 -0
- package/dist/docs/manus/Tech-Understanding-Logit-Bias-in-LLMs-Medium.md +1354 -0
- package/dist/docs/manus/The-Performance-Reality-KV-Cache-as-the-North-Star.md +155 -0
- package/dist/docs/manus/Why-Context-Engineering.md +125 -0
- package/dist/docs/manus/article_1_raw.md +1 -0
- package/dist/docs/manus/split_articles.py +52 -0
- package/dist/docs/manus//346/235/245/350/207/252-Manus-/347/232/204/344/270/200/346/211/213/345/210/206/344/272/253/345/246/202/344/275/225/346/236/204/345/273/272-AI-Agent-/347/232/204/344/270/212/344/270/213/346/226/207/345/267/245/347/250/213-/346/231/272/346/272/220/347/244/276/345/214/272.md +2180 -0
- package/dist/ui-ux-pro-max/SKILL.md +386 -0
- package/dist/ui-ux-pro-max/data/charts.csv +26 -0
- package/dist/ui-ux-pro-max/data/colors.csv +97 -0
- package/dist/ui-ux-pro-max/data/icons.csv +101 -0
- package/dist/ui-ux-pro-max/data/landing.csv +31 -0
- package/dist/ui-ux-pro-max/data/products.csv +97 -0
- package/dist/ui-ux-pro-max/data/prompts.csv +24 -0
- package/dist/ui-ux-pro-max/data/react-performance.csv +45 -0
- package/dist/ui-ux-pro-max/data/stacks/flutter.csv +53 -0
- package/dist/ui-ux-pro-max/data/stacks/html-tailwind.csv +56 -0
- package/dist/ui-ux-pro-max/data/stacks/jetpack-compose.csv +53 -0
- package/dist/ui-ux-pro-max/data/stacks/nextjs.csv +53 -0
- package/dist/ui-ux-pro-max/data/stacks/nuxt-ui.csv +51 -0
- package/dist/ui-ux-pro-max/data/stacks/nuxtjs.csv +59 -0
- package/dist/ui-ux-pro-max/data/stacks/react-native.csv +52 -0
- package/dist/ui-ux-pro-max/data/stacks/react.csv +54 -0
- package/dist/ui-ux-pro-max/data/stacks/shadcn.csv +61 -0
- package/dist/ui-ux-pro-max/data/stacks/svelte.csv +54 -0
- package/dist/ui-ux-pro-max/data/stacks/swiftui.csv +51 -0
- package/dist/ui-ux-pro-max/data/stacks/vue.csv +50 -0
- package/dist/ui-ux-pro-max/data/styles.csv +59 -0
- package/dist/ui-ux-pro-max/data/typography.csv +58 -0
- package/dist/ui-ux-pro-max/data/ui-reasoning.csv +101 -0
- package/dist/ui-ux-pro-max/data/ux-guidelines.csv +100 -0
- package/dist/ui-ux-pro-max/data/web-interface.csv +31 -0
- package/dist/ui-ux-pro-max/scripts/__pycache__/core.cpython-310.pyc +0 -0
- package/dist/ui-ux-pro-max/scripts/__pycache__/core.cpython-312.pyc +0 -0
- package/dist/ui-ux-pro-max/scripts/__pycache__/design_system.cpython-312.pyc +0 -0
- package/dist/ui-ux-pro-max/scripts/core.py +258 -0
- package/dist/ui-ux-pro-max/scripts/design_system.py +1066 -0
- package/dist/ui-ux-pro-max/scripts/search.py +106 -0
- package/package.json +6 -6
@@ -0,0 +1,85 @@

# Manus Context Engineering Document Index

This directory contains articles on Manus Context Engineering and logit masking techniques.

## Core Articles (Official Originals)

### English Originals

- **Context-Engineering-for-AI-Agents-Lessons-from-Building-Manus.md** - Peak Ji's official article on the core Context Engineering lessons from Manus
- **Why-Context-Engineering.md** - Analysis by Lance Martin
- **The-Performance-Reality-KV-Cache-as-the-North-Star.md** - A deep dive into KV-cache performance optimization
- **Context-Engineering-for-AI-Agents-Part-2.md** - Part 2 deep dive by Phil Schmid

### Chinese Translations and Commentary

- **AI代理的上下文工程:构建Manus的经验教训.md** - Official Chinese version (saved separately)
- **来自-Manus-的一手分享如何构建-AI-Agent-的上下文工程-智源社区.md** - BAAI community translation
- **CN-扒完全网最强-AI-团队的-Context-Engineering-攻略我们总结出了这-5-大方法-智源社区.md** - The five key methods summarized by the BAAI community
- **Manus-内部的-Context-工程经验精校高亮要点-人人都是产品经理.md** - Carefully edited translation with highlighted key points

## Technical Deep Dives

### Logit Masking / Constrained Decoding

- **Tech-Understanding-Logit-Bias-in-LLMs-Medium.md** - Detailed explanation of logit bias
- **Tech-How-to-build-function-calling-and-JSON-mode-for-open-source-and-fine-tuned-LLMs.md** - Implementing function calling with a state machine and logit biasing
- **Tech-Constrained-Decoding-and-Structured-Output-for-Agent-Reliability-Engineering-Notes.md** - Constrained decoding and structured output in detail

### Architecture and Practice

- **Observability-for-Manus-15-Agents-Logs-Retries-and-Error-Budgets.md** - Observability best practices for Manus 1.5
- **OpenManus-Technical-Analysis-Architecture-and-Implementation-of-an-Open-Source-A.md** - Analysis of the OpenManus open-source implementation
- **Industry.md** - ZenML's analysis of Context Engineering strategies

## Chinese In-Depth Commentary

- **【深度专题】Context-Engineering-是什么-为什么-Manus-团队花千万美金踩坑-只为搞懂-怎么喂模型-.md** - In-depth feature from Taiwan's AI Post Hub
- **大白话读懂Manus-上下文优化策略-开发者社区-火山引擎.md** - The optimization strategies explained in plain language

## Suggested Reading Order

### Beginner Path

1. Start with the **BAAI community summary** (CN-扒完全网最强...) to build a concept map
2. Then read the **official Chinese version** or the **carefully edited 人人都是产品经理 version** for the details
3. Finish with the **official English original** for the source wording

### Technical Deep-Dive Path

1. **Core Context Engineering principles** (official original)
2. **Logit bias in detail** (Tech-Understanding-Logit-Bias)
3. **Constrained decoding in practice** (Tech-Constrained-Decoding)
4. **Function calling implementation** (Tech-How-to-build-function-calling)

## Summary of Key Techniques

### 1. Optimize KV-cache hit rate

- Keep the prompt prefix stable
- Append only; never modify history
- Serialize deterministically
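The deterministic-serialization point can be sketched in a few lines. This is a minimal illustration, assuming the common `sort_keys` approach to keeping JSON byte-stable; it is not necessarily how Manus serializes state:

```python
import json

def stable_dumps(obj):
    # Sort keys and fix separators so the same logical state always
    # serializes to the same byte sequence, keeping the cached prefix valid.
    return json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)

# Two logically identical tool states built in different key orders
# (hypothetical example data):
a = {"tool": "browser", "args": {"url": "https://example.com", "tab": 1}}
b = {"args": {"tab": 1, "url": "https://example.com"}, "tool": "browser"}

assert stable_dumps(a) == stable_dumps(b)  # identical prompt prefix
assert json.dumps(a) != json.dumps(b)      # naive dumps differs -> cache miss
```

With unordered serialization, two identical agent states can produce different prompt bytes, silently invalidating the KV-cache from the first differing token onward.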
### 2. Logit masking for tool availability

- Keep every tool definition in context
- Control availability dynamically by masking token logits
- Use response-prefill techniques

### 3. The file system as external memory

- The context window is finite
- Use restorable compression strategies
- Keep URLs/paths as anchors for recovery

### 4. Keep errors in, and learn from them

- Preserve failure records
- Let the model learn from its mistakes
- Error recovery is a hallmark of agentic behavior

### 5. Avoid the few-shot trap

- Increase context diversity
- Break up rigid patterns
- Prevent mechanical imitation

## Key Concepts

- **Context Engineering**: shaping an AI agent's behavior by designing its input context
- **KV-Cache**: the key-value cache that stores already-computed token representations
- **Logit Masking**: controlling output by modifying the token probability distribution
- **Constrained Decoding**: restricting the model to outputs that match a specific format
- **State Machine**: tracks the set of tokens that are currently valid

---

*Last updated: 2026-03-28*
@@ -0,0 +1,81 @@

# Constrained Decoding and Structured Output for Agent Reliability

**Source:** https://notes.muthu.co/2025/11/constrained-decoding-and-structured-output-for-agent-reliability/

---

When building production AI agents, one of the most persistent problems is unpredictable output formats. An agent needs to call a tool with precise JSON parameters, but the LLM wraps the output in markdown code blocks, adds explanatory text, or hallucinates invalid field names.

## Concept Introduction

Constrained decoding modifies the token sampling process by masking invalid tokens, setting their probability to zero before sampling.

```
Standard Decoding:
P(next_token | context) → Sample from the full vocabulary

Constrained Decoding:
P(next_token | context, grammar) → Sample only from valid tokens
```

### Types of Constraints

- A JSON schema (only generate valid JSON matching the schema)
- A regular expression (output must match the regex)
- A context-free grammar (follow specific syntax rules)
- A finite-state machine (transition through defined states)

### Modern Implementation Techniques

- Token masking at inference time
- Incremental parsing to track valid next tokens
- Beam search with grammar-aware scoring
- Logit bias to steer generation probabilistically

## Core Algorithm: FSM-Guided Token Masking

```python
# Pseudocode: helpers such as schema_to_fsm, llm.forward, mask_logits,
# and sample are assumed to exist.
def constrained_decode(prompt, schema, max_tokens):
    # Compile the schema into a finite-state machine
    fsm = schema_to_fsm(schema)
    state = fsm.initial_state
    tokens = []

    for _ in range(max_tokens):
        # Get next-token logits from the LLM
        logits = llm.forward(prompt + tokens)

        # Mask tokens that are invalid in the current FSM state
        valid_tokens = fsm.get_valid_tokens(state)
        masked_logits = mask_logits(logits, valid_tokens)

        # Sample the next token from the masked distribution
        next_token = sample(masked_logits)
        tokens.append(next_token)

        # Advance the FSM
        state = fsm.transition(state, next_token)

        # Stop once the FSM reaches an accepting state
        if fsm.is_terminal(state):
            break

    return tokens
```
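A runnable toy version of that loop can make the masking step concrete. Everything here is hypothetical: a hand-built seven-token vocabulary, a hard-coded FSM for the sequence `{ "ok" : (true|false) }`, fake logits standing in for the model, and greedy selection instead of sampling:

```python
import random

# Hypothetical toy vocabulary; a real model has tens of thousands of tokens.
VOCAB = ['{', '"ok"', ':', 'true', 'false', '}', 'hello']

# Hand-built FSM accepting the token sequence: { "ok" : (true|false) }
TRANSITIONS = {
    0: {'{': 1},
    1: {'"ok"': 2},
    2: {':': 3},
    3: {'true': 4, 'false': 4},
    4: {'}': 5},
}
TERMINAL = 5

def fake_logits(_prefix):
    # Stand-in for llm.forward(): the unconstrained model strongly
    # prefers the chatty token 'hello'.
    return {tok: (5.0 if tok == 'hello' else random.uniform(-1.0, 1.0))
            for tok in VOCAB}

def constrained_decode(max_tokens=10):
    state, tokens = 0, []
    for _ in range(max_tokens):
        logits = fake_logits(tokens)
        valid = TRANSITIONS[state]
        # Masking step: only tokens valid in this state keep their logits.
        masked = {tok: logits[tok] for tok in valid}
        next_token = max(masked, key=masked.get)  # greedy "sample"
        tokens.append(next_token)
        state = valid[next_token]
        if state == TERMINAL:
            break
    return tokens

out = constrained_decode()
assert out[0] == '{' and out[-1] == '}' and out[3] in ('true', 'false')
```

Despite the model's strong preference for `hello`, that token never survives the mask, so the output is always a well-formed instance of the grammar.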
## Key Libraries

- **Outlines** - Fast regex and JSON-schema constraints using FSMs
- **Guidance** - Microsoft's grammar-based generation library
- **LM Format Enforcer** - Token masking for various formats

## Benchmarks (2024)

JSON validity:

- **Prompt engineering**: 82% valid JSON
- **Post-processing**: 89% valid JSON
- **Constrained decoding**: 99.8% valid JSON

Tool calling reliability:

- **Standard generation**: 76% executable calls
- **OpenAI function calling**: 94% executable
- **Outlines JSON mode**: 99.2% executable
@@ -0,0 +1,43 @@

# How to build function calling and JSON mode for open-source and fine-tuned LLMs

**Source:** https://baseten.co/blog/how-to-build-function-calling-and-json-mode-for-open-source-and-fine-tuned-llms

---

Use a state machine to generate token masks for logit biasing, enabling function calling and structured output at the model server level.

## Overview

Today, we announced support for function calling and structured output for LLMs deployed with our TensorRT-LLM Engine Builder. This adds support at the model server level for two key features:

- **Function calling**: also known as "tool use," this feature lets you pass a set of defined tools to an LLM as part of the request body. Based on the prompt, the model selects and returns the most appropriate function/tool from the provided options.

- **Structured output**: an evolution of "JSON mode," this feature enforces an output schema defined as part of the LLM input. The LLM output is guaranteed to adhere to the provided schema, with full Pydantic support.

## How structured output is generated

To understand how it's possible to guarantee structured output, we need to dive into the details of how a token is generated during LLM inference:

1. A vector of logits is emitted from the final layer of the LLM's neural network
2. A normalization function like softmax is applied to turn the logits into probabilities
3. Using these probabilities, a token is selected
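The three steps above can be sketched in a few lines of plain Python. The vocabulary and logit values are illustrative, not real model outputs:

```python
import math
import random

def softmax(logits):
    # Step 2: normalize logits into a probability distribution.
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Step 1: pretend the final layer produced one logit per vocabulary token.
vocab = ['{', 'Sure', 'the', '"']
logits = [2.0, 1.0, 0.5, 1.5]

probs = softmax(logits)
assert abs(sum(probs) - 1.0) < 1e-9

# Step 3: select a token according to those probabilities.
random.seed(0)
token = random.choices(vocab, weights=probs, k=1)[0]
```

Every token in the vocabulary has some nonzero probability here, which is exactly why an unconstrained model can wander off-format.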
### Logit biasing ensures token validity

The length of the logit vector is equal to the number of tokens in the model's vocabulary. For example, Llama 3 LLMs have a vocabulary of ~128,000 tokens.

For structured output, we only want to generate valid tokens. Logit biasing guarantees a valid output structure by identifying every invalid token and setting its score to negative infinity.
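A minimal sketch of that biasing step, using a toy four-token vocabulary in place of the ~128,000-entry logit vector:

```python
import math

def softmax(logits):
    # Tokens biased to -inf contribute exactly zero probability mass.
    finite = [x for x in logits if x != float('-inf')]
    m = max(finite)
    exps = [0.0 if x == float('-inf') else math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

vocab  = ['{', 'Sure', 'Here', '"']
logits = [1.0, 3.0, 2.0, 0.5]  # the raw model would rather chat than emit JSON

# Suppose the schema requires the output to start with '{':
# bias every other token's score to negative infinity.
valid = {'{'}
biased = [l if t in valid else float('-inf') for t, l in zip(vocab, logits)]

probs = softmax(biased)
assert probs[vocab.index('{')] == 1.0  # all probability mass on valid tokens
assert all(p == 0.0 for t, p in zip(vocab, probs) if t not in valid)
```

Because invalid tokens end up with probability zero, no sampling strategy (greedy, temperature, top-p) can ever pick them.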
### State machine provides token requirements

The model server tracks the output format using a state machine. Using the Outlines library, it:

1. Takes the schema provided for the model output
2. Transforms it into a regular expression
3. Generates a state machine from that regex

The state machine is cached in memory, and an appropriate token mask is created for each node. This means no calculations are made at inference time: the precomputed masks are simply applied based on whichever state is active.
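The caching idea can be sketched as a one-time precomputation over FSM states. The states, vocabulary, and data layout here are hypothetical toys; Outlines' actual internal structures differ:

```python
# Hypothetical: two FSM states over a five-token toy vocabulary.
VOCAB = ['{', '}', '"name"', ':', 'oops']
TRANSITIONS = {0: {'{'}, 1: {'"name"', '}'}}

# Built once when the schema is compiled, never during generation:
MASKS = {state: [tok in allowed for tok in VOCAB]
         for state, allowed in TRANSITIONS.items()}

def apply_mask(logits, state):
    # Inference-time cost is just a dict lookup plus an elementwise mask.
    mask = MASKS[state]
    return [l if ok else float('-inf') for l, ok in zip(logits, mask)]

masked = apply_mask([0.3, 1.2, 0.7, 0.1, 2.5], 1)
assert masked[0] == float('-inf')  # '{' is invalid in state 1
assert masked[1] == 1.2            # '}' is allowed
assert masked[4] == float('-inf')  # 'oops' is never allowed here
```

Moving the mask construction out of the decoding loop is what keeps per-token overhead negligible.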
## Key Takeaway

Thanks to pre-computed token masks, there's minimal latency impact from using constrained decoding. You can expect the same tokens per second when generating JSON as when generating ordinary text.