npm - myagent-ai - Versions diffs - 1.22.2 → 1.23.0 - Mend

myagent-ai 1.22.2 → 1.23.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (5)

package/agents/main_agent.py CHANGED

@@ -63,37 +63,37 @@ class MainAgent(BaseAgent):
 </output>
 注意事项:
-1. toolstocal标签: 尽量一次性列出所有需执行工具调用的，多个"tool""工具调用只要按顺序重复堆叠tool标签即可，解析器会按顺序执行工具调用，最终全部执行完后，会连同所有结果，回调大语言模型。如果某个工具执行超时了，也会回调回调大模型，让大模型分析为什么超时，改用其他工具。如非必要，不要一次仅调用一个工具。
+1. toolstocal标签: 尽量一次性列出所有需执行工具调用的，多个"tool"工具调用只要按顺序重复堆叠tool标签即可，解析器会按顺序执行工具调用，最终全部执行完后，会连同所有结果，回调大语言模型。如果某个工具执行超时了，也会回调大模型，让大模型分析为什么超时，改用其他工具。如非必要，不要一次仅调用一个工具。
 2. 上下文中的记忆系统说明
 - <automemory>: 系统自动根据你通过 <remember> 保存的记忆和当前用户输入，搜索出的 top10 相关记忆。这些是你过去主动记住的内容（包含时间信息），可供参考。
 - <recall_memory>: 你在上一轮通过 <recall> 指定的记忆搜索结果。系统根据你提供的关键字和时间点搜索了 top5 相关记忆。
 - 两种记忆互补：automemory 是自动匹配的，recall_memory 是你主动指定搜索的。如果 automemory 不足，使用 <recall> 请求更多。
-3. 工具选择指南
-- **搜索信息**: 用 `web_search`（返回标题+URL+摘要）
-- **读取网页内容**: 用 `web_read`（传入URL，提取正文）
-- **浏览器交互**（填表、点击、截图等）: 才使用 browser_open / browser_click 等
-- **执行命令/代码**: 用 `command` 工具执行 shell 命令（python/node/bash 等代码也通过 command 执行，如 `python script.py`、`node app.js`）
-- **文件操作**: 用 `file_read` / `file_write` / `file_list` 等文件工具
-- **发送文件给用户**: 用 `file_send` 工具（参数: file_path=文件路径, description=说明），当你生成或处理了文件需要返回给用户时使用
-- **播放音频**: 用 `playaudio` 工具（参数: url=音乐链接或file_path=本地文件路径），在聊天中内嵌播放音频（支持QQ音乐、YouTube音乐、本地MP3/WAV等），播放时自动关闭语音合成
-- **播放视频**: 用 `playvideo` 工具（参数: url=视频链接或file_path=本地文件路径），在聊天中内嵌播放视频（支持抖音、YouTube、B站、本地MP4等），播放时自动关闭语音合成
-- **网页控制**: 用 `web_control` 工具在聊天中打开一个可控制的浏览器面板，可浏览网页、点击元素、填写表单、滚动页面、执行JS、管理Cookie等。
-  - 打开: `{"action": "open", "url": "https://example.com"}` — 首次使用会自动创建会话，后续可传 session_id 复用
-  - 导航: `{"action": "navigate", "url": "https://...", "session_id": "xxx"}`
-  - 获取内容: `{"action": "get_content", "what": "text|html|url|title|links|images|forms|inputs", "session_id": "xxx"}`
-  - 点击元素: `{"action": "click", "selector": "CSS选择器", "session_id": "xxx"}`
-  - 填写输入: `{"action": "fill", "selector": "CSS选择器", "value": "内容", "session_id": "xxx"}`
-  - 滚动页面: `{"action": "scroll", "direction": "up|down|top|bottom", "distance": 300, "session_id": "xxx"}`
-  - 执行JS: `{"action": "evaluate", "script": "JavaScript代码", "session_id": "xxx"}`
-  - 管理Cookie: `{"action": "set_cookies", "cookies": [{"name":"k","value":"v","domain":"example.com"}], "session_id": "xxx"}` 或 `{"action": "get_cookies", "session_id": "xxx"}`
-  - 等待: `{"action": "wait", "time": 1000}` 毫秒 或 `{"action": "wait", "selector": ".result", "timeout": 10}` 秒
-  - 关闭: `{"action": "close", "session_id": "xxx"}`
-  - 注意: 网页通过服务端代理加载，用户可在面板中手动操作。复杂交互建议先用 get_content 查看页面结构再用 click/fill。
-- **主动召回记忆**: 用 `recall_memory` 工具（参数: keyword=关键字, time_point=可选时间点如"2025-01", limit=数量默认5），根据关键字和时间搜索历史记忆
-- **OCR 文字识别**: 用 `image_ocr` 工具（参数: image_path=图片路径, lang=ch/en），从截图、扫描件、照片中提取文字
-- **图片内容分析**: 用 `image_analyze` 工具（参数: image_path=图片路径, prompt=分析提示词），识别图片中的物体、文字、图表数据等（需模型支持视觉）
-- **语音转文字**: 用 `audio_transcribe` 工具（参数: audio_path=音频路径, language=zh/en），将音频文件转录为文本
-- **专业技能指令**: 系统内置了丰富的专业技能指南（如 PDF/DOCX/XLSX/PPT 生成、图表绘制、前端开发、全栈开发、图像生成等），当你需要执行特定领域的复杂任务时，通过 `<get_knowledge>` 标签请求相关技能指令（如填写 "PDF文档生成指南"、"PPT制作规范" 等），系统将在下一轮通过 `<knowledge>` 注入完整指令。
+3. 工具使用指南 — 只有一个工具: command
+你只有一个工具就是 command（执行命令行），所有操作都通过它完成。格式:
+  <tool><toolname>command</toolname><parms>{"command": "要执行的命令"}</parms><timeout>超时秒数</timeout></tool>
+常用 CLI 命令 (通过 command 工具调用):
+- OCR 文字识别: myagent-ai ocr 图片路径 [ch|en]
+- 图片分析(VLM): myagent-ai analyze-image 图片路径 [分析提示词]
+- 语音转文字: myagent-ai transcribe 音频路径 [zh|en|ja]
+- 网络搜索: myagent-ai search 关键词
+- 读取网页: myagent-ai read-url URL
+- 发送文件给用户: myagent-ai send-file 文件路径 描述
+- 读文件: cat 文件路径
+- 写文件: echo "内容" > 文件路径  或  python3 -c "open('f','w').write('内容')"
+- 执行代码: python3 script.py  或  python3 -c "代码"
+- 文件列表: ls -la 目录
+- 系统信息: uname -a / df -h / free -h
+调用示例:
+  <tool><toolname>command</toolname><parms>{"command": "myagent-ai ocr /tmp/screenshot.png"}</parms><timeout>30</timeout></tool>
+  <tool><toolname>command</toolname><parms>{"command": "myagent-ai search 人工智能最新进展"}</parms><timeout>15</timeout></tool>
+  <tool><toolname>command</toolname><parms>{"command": "myagent-ai send-file /tmp/report.pdf 季度报告"}</parms><timeout>10</timeout></tool>
+多个命令可用 && 连接一次执行:
+  <tool><toolname>command</toolname><parms>{"command": "myagent-ai search xxx && myagent-ai read-url https://..."}</parms><timeout>30</timeout></tool>
+专业技能指令: 系统内置了丰富的专业技能指南（PDF/DOCX/XLSX/PPT 生成、图表绘制、前端开发等），通过 <get_knowledge> 请求相关技能指令。
 4. 准备好内容后，最后，再检查输出格式，确保满足以下要求:
 <output>
 <mainsubject>当前对话的6字以内标题（每轮都需输出，系统会每3轮自动更新会话名称）</mainsubject>
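
The new prompt text routes every action through a single `command` tool wrapped in `<tool>` tags. The sketch below shows one way such stacked tags could be parsed in order. It is not the package's parser: `ToolCall`, `parse_tool_calls`, and the default timeout are hypothetical names and values chosen for illustration, and the `<timeout>` element is assumed to be optional.

```python
from __future__ import annotations

import json
import re
from typing import NamedTuple


class ToolCall(NamedTuple):
    name: str
    params: dict
    timeout: float


# Hypothetical pattern for the <tool> block format shown in the prompt text.
_TOOL_RE = re.compile(
    r"<tool>\s*<toolname>(?P<name>.*?)</toolname>\s*"
    r"<parms>(?P<parms>.*?)</parms>\s*"
    r"(?:<timeout>(?P<timeout>.*?)</timeout>\s*)?</tool>",
    re.DOTALL,
)


def parse_tool_calls(text: str, default_timeout: float = 30.0) -> list[ToolCall]:
    """Return the tool calls found in text, in the order they appear."""
    calls = []
    for m in _TOOL_RE.finditer(text):
        params = json.loads(m.group("parms"))  # <parms> carries a JSON object
        raw_timeout = m.group("timeout")
        timeout = float(raw_timeout) if raw_timeout else default_timeout
        calls.append(ToolCall(m.group("name").strip(), params, timeout))
    return calls


if __name__ == "__main__":
    reply = (
        "<tool><toolname>command</toolname>"
        '<parms>{"command": "myagent-ai ocr /tmp/screenshot.png"}</parms>'
        "<timeout>30</timeout></tool>\n"
        "<tool><toolname>command</toolname>"
        '<parms>{"command": "ls -la /tmp"}</parms></tool>'
    )
    for call in parse_tool_calls(reply):
        print(call.name, call.params["command"], call.timeout)
```

Because `finditer` scans left to right, the returned list preserves the order in which the model stacked its calls, which matches the sequential execution the prompt describes.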

package/core/tool_dispatcher.py CHANGED

@@ -169,7 +169,38 @@ class ToolDispatcher:
         exec_result = await self.executor.execute(
             language="shell", code=code_text, timeout=timeout,
         )
-        return exec_result.to_dict()
+        result = exec_result.to_dict()
+        # [v1.22.0] Detect the __SEND_FILE__ marker printed by the CLI send-file command
+        # Format: __SEND_FILE__<absolute path>|<description>__END__ (the description may be empty)
+        output = result.get("output", "")
+        import re as _re
+        send_markers = _re.findall(r'__SEND_FILE__(.+?)\|(.*?)__END__', output)
+        if send_markers:
+            # Strip the marker lines from the output
+            clean_output = _re.sub(r'__SEND_FILE__.+?__END__\n?', '', output).strip()
+            result["output"] = clean_output
+            # Run file_send
+            for send_path, send_desc in send_markers:
+                send_path = send_path.strip()
+                send_desc = send_desc.strip()
+                try:
+                    p = _P(send_path)
+                    if p.exists():
+                        from skills.file_send import FileSendSkill
+                        fskill = FileSendSkill()
+                        fresult = await fskill.execute(
+                            str(p), send_desc or f"文件: {p.name}",
+                        )
+                        if fresult.get("success"):
+                            logger.info(f"[{task_id}] CLI 自动发送文件: {p.name}")
+                        else:
+                            result["output"] += f"\n[文件发送失败: {fresult.get('error', '')}]"
+                except Exception as e:
+                    logger.warning(f"[{task_id}] CLI 文件发送异常: {e}")
+                    result["output"] += f"\n[文件发送异常: {e}]"
+        return result
     async def _exec_recall_memory(self, params: Dict, task_id: str) -> Dict:
         """主动召回记忆"""
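
The hunk above relies on an in-band marker: the CLI prints `__SEND_FILE__<path>|<description>__END__` to stdout and the dispatcher scans the command output for it. A minimal standalone sketch of that extraction step follows. `extract_send_markers` is a hypothetical helper, not a function in the package, and this sketch assumes the description may be empty, since `send-file` can be called without one.

```python
from __future__ import annotations

import re

# Marker shape taken from the diff; the description group may be empty here.
MARKER_RE = re.compile(r"__SEND_FILE__(.+?)\|(.*?)__END__\n?")


def extract_send_markers(output: str) -> tuple[str, list[tuple[str, str]]]:
    """Split command output into (cleaned text, [(path, description), ...])."""
    markers = [(path.strip(), desc.strip()) for path, desc in MARKER_RE.findall(output)]
    clean = MARKER_RE.sub("", output).strip()
    return clean, markers


if __name__ == "__main__":
    raw = (
        "report generated\n"
        "__SEND_FILE__/tmp/report.pdf|Quarterly report__END__\n"
        "__SEND_FILE__/tmp/chart.png|__END__\n"
    )
    text, files = extract_send_markers(raw)
    print(text)
    for path, desc in files:
        print(path, "->", desc or "(no description)")
```

One trade-off of in-band markers is that any command whose output happens to contain the marker text would also trigger a send, so a real implementation may want to verify the path before acting on it, as the hunk does with `p.exists()`.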

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "myagent-ai",
-  "version": "1.22.2",
+  "version": "1.23.0",
   "description": "本地桌面端执行型AI助手 - Open Interpreter 风格 | Local Desktop Execution-Oriented AI Assistant",
   "main": "main.py",
   "bin": {

package/scripts/cli.py ADDED

@@ -0,0 +1,371 @@
+#!/usr/bin/env python3
+"""
+scripts/cli.py - MyAgent CLI toolkit
+====================================
+Lightweight command-line tools; the LLM invokes these through the command tool for every operation.
+Usage: myagent-ai <command> [args...]
+"""
+from __future__ import annotations
+import asyncio
+import base64
+import json
+import mimetypes
+import os
+import sys
+from pathlib import Path
+# 确保可以导入 myagent 模块
+_SCRIPT_DIR = os.path.dirname(os.path.abspath(__file__))
+_PROJECT_ROOT = os.path.dirname(_SCRIPT_DIR)
+if _PROJECT_ROOT not in sys.path:
+    sys.path.insert(0, _PROJECT_ROOT)
+_DATA_DIR = os.path.expanduser("~/.myagent/data")
+def _load_config() -> dict:
+    """加载用户配置"""
+    config_path = os.path.join(_DATA_DIR, "config.json")
+    if os.path.exists(config_path):
+        with open(config_path, encoding="utf-8") as f:
+            return json.load(f)
+    return {}
+def _err(msg: str) -> None:
+    print(f"错误: {msg}", file=sys.stderr)
+# =============================================================================
+# OCR 光学字符识别
+# =============================================================================
+def cmd_ocr(args: list[str]) -> int:
+    """从图片中提取文字"""
+    image = args[0] if args else ""
+    lang = "ch"
+    for a in args[1:]:
+        if a in ("ch", "en", "japan", "korean", "ta", "te", "kn", "hi", "ar", "ur"):
+            lang = a
+            break
+    if not image:
+        _err("用法: myagent-ai ocr <图片路径> [ch|en|japan|korean|...]")
+        return 1
+    image = os.path.expanduser(image)
+    if not os.path.isfile(image):
+        _err(f"图片不存在: {image}")
+        return 1
+    ext = Path(image).suffix.lower().lstrip(".")
+    supported = ("png", "jpg", "jpeg", "bmp", "tiff", "tif", "webp", "gif")
+    if ext not in supported:
+        _err(f"不支持的图片格式: .{ext}")
+        return 1
+    try:
+        from paddleocr import PaddleOCR
+    except ImportError:
+        _err("paddleocr 未安装。安装: pip install paddleocr paddlepaddle")
+        return 1
+    try:
+        ocr = PaddleOCR(use_angle_cls=True, lang=lang, show_log=False)
+        result = ocr.ocr(image, cls=True)
+        if result and result[0]:
+            for line in result[0]:
+                text, conf = line[1]  # line is [box, (text, confidence)]
+                print(text)
+        else:
+            print("(未检测到文字)")
+    except Exception as e:
+        _err(f"OCR 失败: {e}")
+        return 1
+    return 0
+# =============================================================================
+# 图片内容分析 (VLM)
+# =============================================================================
+def cmd_analyze_image(args: list[str]) -> int:
+    """使用视觉语言模型分析图片"""
+    image = args[0] if args else ""
+    # prompt 从 --prompt 后面取，或剩余参数拼接
+    prompt = ""
+    rest = args[1:]
+    if "--prompt" in rest:
+        idx = rest.index("--prompt")
+        prompt = " ".join(rest[idx + 1:]) if idx + 1 < len(rest) else ""
+    elif rest:
+        prompt = " ".join(rest)
+    if not prompt:
+        prompt = "请详细描述这张图片的内容，包括文字、布局、物体等信息。"
+    if not image:
+        _err("用法: myagent-ai analyze-image <图片路径> [--prompt 分析提示词]")
+        return 1
+    image = os.path.expanduser(image)
+    if not os.path.isfile(image):
+        _err(f"图片不存在: {image}")
+        return 1
+    # 检查图片大小
+    fsize = os.path.getsize(image)
+    if fsize > 20 * 1024 * 1024:
+        _err(f"图片过大 ({fsize / 1024 / 1024:.1f}MB)，限制 20MB")
+        return 1
+    # 读取 LLM 配置
+    cfg = _load_config().get("llm", {})
+    api_key = cfg.get("api_key", "")
+    base_url = cfg.get("base_url", "")
+    model = cfg.get("model", "")
+    # 模型链支持: 检查是否有视觉模型
+    model_chain = cfg.get("model_chain", [])
+    vision_model = None
+    vision_base_url = base_url
+    for m in model_chain:
+        m_name = m.get("model", "")
+        m_modes = m.get("modes", [])
+        if "image" in m_modes:
+            vision_model = m_name
+            provider_cfg = m.get("provider", "")
+            if provider_cfg:
+                # 从 provider 配置中获取 base_url
+                provider_settings = cfg.get("providers", {}).get(provider_cfg, {})
+                vision_base_url = provider_settings.get("base_url", base_url)
+            break
+    if not vision_model and not api_key:
+        _err("未配置支持视觉的模型。请在配置中添加 image 模式的模型。")
+        return 1
+    use_model = vision_model or model
+    use_url = vision_base_url or base_url
+    if not use_url:
+        _err("未配置 LLM API 地址")
+        return 1
+    # 编码图片
+    with open(image, "rb") as f:
+        b64 = base64.b64encode(f.read()).decode()
+    mime = mimetypes.guess_type(image)[0] or "image/png"
+    try:
+        from openai import OpenAI
+        client = OpenAI(api_key=api_key, base_url=use_url)
+        resp = client.chat.completions.create(
+            model=use_model,
+            messages=[{
+                "role": "user",
+                "content": [
+                    {"type": "text", "text": prompt},
+                    {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{b64}"}},
+                ],
+            }],
+            max_tokens=4096,
+            temperature=0.1,
+        )
+        content = resp.choices[0].message.content
+        if content:
+            print(content)
+        else:
+            print("(模型未返回分析结果)")
+    except Exception as e:
+        _err(f"图片分析失败: {e}")
+        return 1
+    return 0
+# =============================================================================
+# 语音转文字 (STT)
+# =============================================================================
+async def _do_transcribe(audio_path: str, language: str) -> dict:
+    from core.stt import transcribe
+    return await transcribe(audio_path=audio_path, language=language)
+def cmd_transcribe(args: list[str]) -> int:
+    """将音频文件转录为文本"""
+    audio = args[0] if args else ""
+    lang = "zh"
+    for a in args[1:]:
+        if a in ("zh", "en", "ja", "ko", "yue", "auto"):
+            lang = a
+            break
+    if not audio:
+        _err("用法: myagent-ai transcribe <音频路径> [zh|en|ja|ko|yue]")
+        return 1
+    audio = os.path.expanduser(audio)
+    if not os.path.isfile(audio):
+        _err(f"音频文件不存在: {audio}")
+        return 1
+    try:
+        result = asyncio.run(_do_transcribe(audio, lang))
+        if result.get("success"):
+            print(result["output"])
+            return 0
+        else:
+            _err(result.get("error", "转录失败"))
+            return 1
+    except RuntimeError:
+        # 已有事件循环
+        loop = asyncio.new_event_loop()
+        asyncio.set_event_loop(loop)
+        try:
+            result = loop.run_until_complete(_do_transcribe(audio, lang))
+            if result.get("success"):
+                print(result["output"])
+                return 0
+            else:
+                _err(result.get("error", "转录失败"))
+                return 1
+        finally:
+            loop.close()
+    except Exception as e:
+        _err(f"语音转文字失败: {e}")
+        return 1
+# =============================================================================
+# 网络搜索
+# =============================================================================
+def cmd_search(args: list[str]) -> int:
+    """DuckDuckGo 网络搜索"""
+    query = " ".join(args)
+    if not query:
+        _err("用法: myagent-ai search <关键词>")
+        return 1
+    try:
+        from duckduckgo_search import DDGS
+        with DDGS() as ddgs:
+            results = list(ddgs.text(query, max_results=5))
+        if results:
+            for i, r in enumerate(results, 1):
+                print(f"{i}. {r['title']}")
+                print(f"   URL: {r['href']}")
+                print(f"   {r['body']}")
+                print()
+        else:
+            print("(未找到搜索结果)")
+    except Exception as e:
+        _err(f"搜索失败: {e}")
+        return 1
+    return 0
+# =============================================================================
+# 读取网页内容
+# =============================================================================
+def cmd_read_url(args: list[str]) -> int:
+    """提取网页正文内容"""
+    url = args[0] if args else ""
+    if not url:
+        _err("用法: myagent-ai read-url <URL>")
+        return 1
+    try:
+        import requests
+        from bs4 import BeautifulSoup
+        resp = requests.get(url, timeout=15, headers={
+            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
+        })
+        resp.raise_for_status()
+        soup = BeautifulSoup(resp.text, "html.parser")
+        # 移除无关标签
+        for tag in soup(["script", "style", "nav", "header", "footer", "aside"]):
+            tag.decompose()
+        # Extract the title (soup.title.string can be None, e.g. for an empty <title>)
+        title = soup.title.string.strip() if soup.title and soup.title.string else ""
+        if title:
+            print(f"标题: {title}\n")
+        text = soup.get_text(separator="\n", strip=True)
+        # 截断过长内容
+        if len(text) > 8000:
+            text = text[:8000] + "\n...(内容过长，已截断)"
+        print(text)
+    except Exception as e:
+        _err(f"读取网页失败: {e}")
+        return 1
+    return 0
+# =============================================================================
+# 发送文件给用户
+# =============================================================================
+def cmd_send_file(args: list[str]) -> int:
+    """发送文件给用户 (输出特殊标记，由 ToolDispatcher 解析)"""
+    fpath = args[0] if args else ""
+    desc = ""
+    if len(args) > 1:
+        # 支持两种格式:
+        #   myagent-ai send-file path description
+        #   myagent-ai send-file path --desc "description"
+        rest = args[1:]
+        if "--desc" in rest:
+            idx = rest.index("--desc")
+            desc = " ".join(rest[idx + 1:]) if idx + 1 < len(rest) else ""
+        else:
+            desc = " ".join(rest)
+    if not fpath:
+        _err("用法: myagent-ai send-file <文件路径> [描述]")
+        return 1
+    fpath = os.path.expanduser(fpath)
+    if not os.path.isfile(fpath):
+        _err(f"文件不存在: {fpath}")
+        return 1
+    # 输出特殊标记 (ToolDispatcher._exec_command 会检测并自动发送)
+    abs_path = os.path.abspath(fpath)
+    print(f"__SEND_FILE__{abs_path}|{desc}__END__")
+    return 0
+# =============================================================================
+# 命令注册
+# =============================================================================
+COMMANDS: dict[str, tuple] = {
+    "ocr":            (cmd_ocr,            "OCR 光学字符识别",        "<图片路径> [ch|en|japan|korean|...]"),
+    "analyze-image":  (cmd_analyze_image,  "图片内容分析 (VLM)",       "<图片路径> [--prompt 分析提示词]"),
+    "transcribe":     (cmd_transcribe,     "语音转文字",               "<音频路径> [zh|en|ja|ko|yue]"),
+    "search":         (cmd_search,         "网络搜索 (DuckDuckGo)",   "<关键词>"),
+    "read-url":       (cmd_read_url,       "读取网页正文内容",         "<URL>"),
+    "send-file":      (cmd_send_file,      "发送文件给用户",           "<文件路径> [描述]"),
+}
+def main():
+    if len(sys.argv) < 2 or sys.argv[1] in ("-h", "--help", "help"):
+        print("MyAgent CLI - 命令行工具集\n")
+        print("用法: myagent-ai <command> [args...]\n")
+        print("可用命令:")
+        for name, (_, desc, usage) in COMMANDS.items():
+            print(f"  myagent-ai {name} {usage}")
+            print(f"    → {desc}")
+            print()
+        if len(sys.argv) < 2:
+            sys.exit(1)
+        return
+    cmd = sys.argv[1]
+    if cmd not in COMMANDS:
+        _err(f"未知命令: {cmd}")
+        _err(f"可用命令: {', '.join(COMMANDS.keys())}")
+        _err("运行 myagent-ai help 查看帮助")
+        sys.exit(1)
+    handler = COMMANDS[cmd][0]
+    rc = handler(sys.argv[2:])
+    sys.exit(rc or 0)
+if __name__ == "__main__":
+    main()
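
The added `cli.py` follows a registry pattern: each subcommand is a function that takes the remaining arguments and returns an exit code, and `main()` looks the handler up by name. The sketch below reduces that pattern to a single toy command. `cmd_echo`, `trailing_text`, `dispatch`, and the `--text` flag are hypothetical and exist only to illustrate the shape; `trailing_text` mirrors how the diff handles `--prompt` and `--desc`, where the text is either everything after the flag or all remaining arguments joined.

```python
from __future__ import annotations

import sys
from typing import Callable


def trailing_text(rest: list[str], flag: str) -> str:
    """Return the words after `flag` if present, else all of `rest` joined."""
    if flag in rest:
        idx = rest.index(flag)
        return " ".join(rest[idx + 1:])
    return " ".join(rest)


def cmd_echo(args: list[str]) -> int:
    """Hypothetical command: print its arguments, return a shell-style exit code."""
    text = trailing_text(args, "--text")
    if not text:
        print("usage: echo [--text] <words>", file=sys.stderr)
        return 1
    print(text)
    return 0


# name -> (handler, description), the same layout idea as COMMANDS in the diff
COMMANDS: dict[str, tuple[Callable[[list[str]], int], str]] = {
    "echo": (cmd_echo, "print the arguments"),
}


def dispatch(argv: list[str]) -> int:
    """Look up argv[0] in the registry and run its handler on the rest."""
    if not argv or argv[0] not in COMMANDS:
        print(f"unknown command; available: {', '.join(COMMANDS)}", file=sys.stderr)
        return 1
    handler = COMMANDS[argv[0]][0]
    return handler(argv[1:]) or 0


if __name__ == "__main__":
    sys.exit(dispatch(sys.argv[1:]))
```

Keeping handlers as plain functions that return integers makes each one callable from tests without spawning a process, and leaves `sys.exit` in exactly one place.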

package/start.js CHANGED

@@ -539,7 +539,7 @@ function cmdUpdate(pkgDir) {
  // 1. Upgrade the global package via npm (--force makes sure the cache is bypassed)
   console.log("  正在通过 npm 更新 myagent-ai ...");
   try {
-    execFileSync("npm", ["install", "-g", PKG_NAME, "--prefer-online", "--force"], {
+    execFileSync("npm", ["install", "-g", PKG_NAME + "@latest", "--prefer-online", "--force"], {
       encoding: "utf8", stdio: "inherit", timeout: 120000,
     });
   } catch (e) {
@@ -764,6 +764,43 @@ function main() {
   if (cmd === "reinstall") { cmdReinstall(pkgDir); return; }
   if (cmd === "install")   { cmdInstall(pkgDir); return; }
+  // [v1.22.0] CLI subcommands: lightweight, no need to fully start myagent
+  const CLI_CMDS = ["ocr", "analyze-image", "transcribe", "search", "read-url", "send-file", "help", "-h", "--help"];
+  if (CLI_CMDS.includes(cmd)) {
+    const venvDir = getVenvDir();
+    const venvPython = getVenvPython(venvDir);
+    const cliScript = path.join(pkgDir, "scripts", "cli.py");
+    if (!fs.existsSync(cliScript)) {
+      console.error("\x1b[31mCLI 脚本不存在: " + cliScript + "\x1b[0m");
+      process.exit(1);
+    }
+    let pyExe;
+    let env;
+    if (fs.existsSync(venvPython)) {
+      pyExe = venvPython;
+      env = {
+        ...process.env,
+        VIRTUAL_ENV: venvDir,
+        PATH: `${path.join(venvDir, IS_WIN ? "Scripts" : "bin")}${path.delimiter}${process.env.PATH}`,
+      };
+    } else {
+      pyExe = findSystemPython();
+      if (!pyExe) {
+        console.error("\x1b[31mPython 未找到，请先运行: myagent-ai install\x1b[0m");
+        process.exit(1);
+      }
+      env = process.env;
+    }
+    try {
+      execFileSync(pyExe, [cliScript, ...args], {
+        stdio: "inherit", cwd: pkgDir, env, timeout: 300000,
+      });
+    } catch (e) {
+      process.exit(e.status || 1);
+    }
+    return;
+  }
   // 默认: 直接启动
   cmdRun(pkgDir, args);
 }
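
The `start.js` hunk prefers the virtualenv interpreter and falls back to a system Python, prepending the venv's executable directory to `PATH`. The same selection logic can be sketched in Python for comparison. The helper names are hypothetical, and the venv layout (`bin/python` on POSIX, `Scripts/python.exe` on Windows) is the standard `venv` layout, assumed here because `getVenvPython` and `findSystemPython` are not shown in the diff.

```python
from __future__ import annotations

import os
import shutil
from pathlib import Path

IS_WIN = os.name == "nt"


def venv_bin_dir(venv_dir: Path, is_win: bool = IS_WIN) -> Path:
    """Directory holding the venv's executables (standard venv layout assumed)."""
    return venv_dir / ("Scripts" if is_win else "bin")


def build_env(venv_dir: Path, base_env: dict, is_win: bool = IS_WIN) -> dict:
    """Copy base_env and prepend the venv bin dir to PATH using the OS separator."""
    env = dict(base_env)
    env["VIRTUAL_ENV"] = str(venv_dir)
    env["PATH"] = os.pathsep.join([str(venv_bin_dir(venv_dir, is_win)), base_env.get("PATH", "")])
    return env


def pick_interpreter(venv_dir: Path, is_win: bool = IS_WIN) -> str | None:
    """Prefer the venv interpreter; otherwise fall back to one found on PATH."""
    candidate = venv_bin_dir(venv_dir, is_win) / ("python.exe" if is_win else "python")
    if candidate.exists():
        return str(candidate)
    return shutil.which("python3") or shutil.which("python")


if __name__ == "__main__":
    venv = Path.home() / ".myagent" / "venv"  # hypothetical location
    print(pick_interpreter(venv))
    print(build_env(venv, dict(os.environ))["PATH"].split(os.pathsep)[0])
```

Joining with `os.pathsep` (or `path.delimiter` in Node.js) rather than a literal `:` matters on Windows, where the separator is `;`.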