openclacky 0.8.6 → 0.8.7 (RubyGems)

Files changed (27)

checksums.yaml +4 -4
data/CHANGELOG.md +25 -0
data/lib/clacky/agent/memory_updater.rb +2 -2
data/lib/clacky/agent/session_serializer.rb +7 -0
data/lib/clacky/agent.rb +34 -15
data/lib/clacky/default_skills/channel-setup/SKILL.md +37 -110
data/lib/clacky/default_skills/pdf-reader/SKILL.md +90 -0
data/lib/clacky/default_skills/skill-add/SKILL.md +39 -23
data/lib/clacky/default_skills/skill-add/scripts/install_from_zip.rb +233 -0
data/lib/clacky/server/http_server.rb +78 -3
data/lib/clacky/skill.rb +25 -11
data/lib/clacky/skill_loader.rb +14 -7
data/lib/clacky/tools/browser.rb +75 -56
data/lib/clacky/tools/file_reader.rb +3 -3
data/lib/clacky/tools/shell.rb +22 -0
data/lib/clacky/utils/file_processor.rb +2 -2
data/lib/clacky/version.rb +1 -1
data/lib/clacky/web/app.css +57 -0
data/lib/clacky/web/app.js +90 -16
data/lib/clacky/web/channels.js +1 -1
data/lib/clacky/web/icon-dark.svg +23 -0
data/lib/clacky/web/icon.svg +26 -0
data/lib/clacky/web/index.html +2 -1
data/lib/clacky/web/sessions.js +8 -4
data/lib/clacky/web/skills.js +60 -30
metadata +5 -2
data/lib/clacky/default_skills/skill-add/scripts/install_from_github.rb +0 -233

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: '0190a435c18d666a4d2cc7c65b301bd24f18cc6a0159f5d7a4ae25ee552883d7'
-  data.tar.gz: 62eecd22f6cc112aa00674c683002f14ada01d637b95551a5ca7ff29d408ad75
+  metadata.gz: 38f9805e951dec0f87bda1b64033e0ea7f0c5c6d1c4fd2427f57dfc13aec0835
+  data.tar.gz: f6f0d08206ead392ffbbc073bb92c5b8e5b4c9f4ecf37172153c4bf46f4963e0
 SHA512:
-  metadata.gz: 8b0dcb369eeb481fc32dd8e02d4cc22ed6b21f3fb5115d9145623444bb88be1cf95cf1c3f8b62988bd54c28888512ee90ecf4df74f98250a04f2f0fcbd9d77d8
-  data.tar.gz: a6d72920b58547540dd6b389cb8c5dadafc4cf9face794217a7d4c2245bb4f2b46fce4e83616074d1054b9f7542ee0601cad1f9e43fc9358563f6d0dc8b97124
+  metadata.gz: d7400735f1f2cbf9fa6b74e56aaa9264e881ab0618885e87b9757458b3b87bde01c5319db6d6f6833573792229c8aa635d5c09bab43cdde15e8cddfe2ce3e418
+  data.tar.gz: ef4dede49038208ff386f5b536ba4c64158e5b72f5599694f14ecf83bd3259b51be6af52bef10fbdea88fbc23f2b2b11c9316e1bdbb1f350c355a0fedeb23bd1

data/CHANGELOG.md CHANGED

@@ -7,6 +7,31 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ## [Unreleased]
+## [0.8.7] - 2026-03-13
+### Added
+- **PDF file upload and reading**: users can now upload PDF files directly in the WebUI chat; the agent reads and analyzes the content via the built-in `pdf-reader` skill
+- **WebUI favicon and SVG icons**: browser tab now shows the Clacky icon
+- **Public skill store install**: skills from the public store can be installed directly via the WebUI without a GitHub URL
+- **Auto-kill previous server on startup**: launching `clacky serve` now automatically kills any previously running instance via pidfile, preventing port conflicts
+### Improved
+- **Brand skill loading speed**: loading brand skills no longer triggers a network decryption request — name and description are now read from the local `brand_skills.json` cache, making New Session significantly faster
+- **Memory update UX**: memory update step now shows a spinner and info-style message instead of a bare log line
+- **Browser snapshot output**: snapshot output is compressed to reduce token cost when the agent uses browser tools
+- **Subagent output**: subagent task completion now shows a brief info line instead of a full "Task Complete" block, reducing noise in the parent agent's context
+### Fixed
+- **Subagent token delta on first iteration**: subagent now inherits `previous_total_tokens` correctly, fixing an inflated token count on the first tool iteration
+- **Chrome DevTools inspect URL**: updated the remote debugging URL to include the `#remote-debugging` fragment for correct navigation
+- **Shell output token explosion**: long lines in shell output are now truncated to prevent excessive token usage
+### More
+- Binary file size limit lowered from 5 MB to 512 KB to reduce accidental token cost
+- `kill_existing_server` logic moved from CLI into `HttpServer` for cleaner separation
+- Browser tool prefers `snapshot -i` over `screenshot` for lower token cost
+- Cross-platform PID file path using `Dir.tmpdir` instead of hardcoded `/tmp`
 ## [0.8.6] - 2026-03-12
 ### Added
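
The "Shell output token explosion" fix listed above can be pictured with a short sketch. The actual change in `shell.rb` is not shown in this part of the diff, so the helper name `truncate_long_lines`, the 500-character limit, and the marker text are all assumptions chosen for illustration, not the gem's implementation.

```ruby
# Hypothetical limit; the value used by the gem is not visible in this diff.
MAX_LINE_CHARS = 500

# Cap every line of a command's output at max_chars characters and
# append a short marker saying how much was dropped.
def truncate_long_lines(output, max_chars: MAX_LINE_CHARS)
  output.each_line.map do |line|
    body = line.chomp
    next line if body.length <= max_chars

    # Preserve the original line ending, if there was one.
    newline = line.end_with?("\n") ? "\n" : ""
    "#{body[0, max_chars]}... [#{body.length - max_chars} chars truncated]#{newline}"
  end.join
end

puts truncate_long_lines("ok\n" + ("x" * 2_000) + "\n")
```

Truncating per line rather than per output keeps short lines (where most of the useful signal tends to be) intact while bounding the cost of a single minified or binary-looking line.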

data/lib/clacky/agent/memory_updater.rb CHANGED

@@ -42,7 +42,7 @@ module Clacky
         @memory_prompt_injected = true
         @memory_updating = true
-        @ui&.show_info("Updating long-term memory...")
+        @ui&.show_progress("Updating long-term memory…")
         @messages << {
           role: "user",
@@ -62,7 +62,7 @@ module Clacky
         @messages.reject! { |m| m[:memory_update] }
         @memory_prompt_injected = false
         @memory_updating = false
-        @ui&.show_info("Memory updated.")
+        @ui&.clear_progress
       end
       private def memory_update_enabled?

data/lib/clacky/agent/session_serializer.rb CHANGED

@@ -275,6 +275,13 @@ module Clacky
             next unless source.is_a?(Hash) && source[:type].to_s == "base64"
             "data:#{source[:media_type]};base64,#{source[:data]}"
+          when "document"
+            # Anthropic PDF document block — return a sentinel string for frontend display
+            source = block[:source]
+            next unless source.is_a?(Hash) && source[:media_type].to_s == "application/pdf"
+            # Return a special marker so the frontend can render a PDF badge instead of an <img>
+            "pdf:#{source[:data]&.then { |d| d[0, 32] }}"  # prefix to identify without full payload
           end
         end
       end
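
To make the new `"document"` branch concrete, here is a standalone sketch of the sentinel it produces. The expression is copied from the added lines; the wrapper method `pdf_sentinel` and the sample block are hypothetical and exist only so the snippet runs on its own.

```ruby
# Hypothetical wrapper around the expression added in the "document" branch.
# Returns nil for anything that is not a PDF source (the diff uses `next`).
def pdf_sentinel(block)
  source = block[:source]
  return nil unless source.is_a?(Hash) && source[:media_type].to_s == "application/pdf"

  # Only the first 32 characters of the payload are kept as an identifier.
  "pdf:#{source[:data]&.then { |d| d[0, 32] }}"
end

# Sample block with made-up base64 data (120 characters long).
block = {
  type: "document",
  source: { type: "base64", media_type: "application/pdf", data: "JVBERi0xLjcK" * 10 }
}

puts pdf_sentinel(block)
```

Note that when `source[:data]` is `nil`, the safe-navigation call yields `nil` and the result is the bare string `"pdf:"`, so a consumer of this marker should not assume a non-empty suffix.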

data/lib/clacky/agent.rb CHANGED

@@ -141,7 +141,7 @@ module Clacky
       @config.model_name
     end
-    def run(user_input, images: [])
+    def run(user_input, images: [], files: [])
       # Start new task for Time Machine
       task_id = start_new_task
@@ -172,8 +172,8 @@ module Clacky
         @messages << system_message
       end
-      # Format user message with images if provided
-      user_content = format_user_content(user_input, images)
+      # Format user message with images and files if provided
+      user_content = format_user_content(user_input, images, files)
       @messages << { role: "user", content: user_content, task_id: task_id, created_at: Time.now.to_f }
       @total_tasks += 1
@@ -208,7 +208,12 @@ module Clacky
           # Check if done (no more tool calls needed)
           if response[:finish_reason] == "stop" || response[:tool_calls].nil? || response[:tool_calls].empty?
-            @ui&.show_assistant_message(response[:content]) if response[:content] && !response[:content].empty?
+            # During memory update phase, show LLM response as info (not a chat bubble)
+            if @memory_updating && response[:content] && !response[:content].empty?
+              @ui&.show_info("🧠 " + response[:content].strip)
+            elsif response[:content] && !response[:content].empty?
+              @ui&.show_assistant_message(response[:content])
+            end
             # Debug: log why we're stopping
             if @config.verbose && (response[:tool_calls].nil? || response[:tool_calls].empty?)
@@ -227,7 +232,8 @@ module Clacky
           end
           # Show assistant message if there's content before tool calls
-          if response[:content] && !response[:content].empty?
+          # During memory update phase, suppress text output (only tool calls matter)
+          if response[:content] && !response[:content].empty? && !@memory_updating
             @ui&.show_assistant_message(response[:content])
           end
@@ -272,13 +278,17 @@ module Clacky
           @modified_files_in_task = []  # Reset for next task
         end
-        @ui&.show_complete(
-          iterations: result[:iterations],
-          cost: result[:total_cost_usd],
-          duration: result[:duration_seconds],
-          cache_stats: result[:cache_stats],
-          awaiting_user_feedback: awaiting_user_feedback
-        )
+        if @is_subagent
+          @ui&.show_info("Subagent done (#{result[:iterations]} iterations, $#{result[:total_cost_usd].round(4)})")
+        else
+          @ui&.show_complete(
+            iterations: result[:iterations],
+            cost: result[:total_cost_usd],
+            duration: result[:duration_seconds],
+            cache_stats: result[:cache_stats],
+            awaiting_user_feedback: awaiting_user_feedback
+          )
+        end
         @hooks.trigger(:on_complete, result)
         result
       rescue Clacky::AgentInterrupted
@@ -714,6 +724,10 @@ module Clacky
         ui: @ui,
         profile: @agent_profile.name
       )
+      subagent.instance_variable_set(:@is_subagent, true)
+      # Inherit previous_total_tokens so the first iteration delta is calculated correctly
+      subagent.instance_variable_set(:@previous_total_tokens, @previous_total_tokens)
       # Deep clone messages to avoid cross-contamination
       subagent.instance_variable_set(:@messages, deep_clone(@messages))
@@ -809,11 +823,16 @@ module Clacky
     end
     # Format user content with optional images
+    # PDF files are handled upstream (server injects file path into message text),
+    # so this method only needs to handle images.
     # @param text [String] User's text input
     # @param images [Array<String>] Array of image file paths or data: URLs
-    # @return [String|Array] String if no images, Array with text and image_url objects if images present
-    private def format_user_content(text, images)
-      return text if images.nil? || images.empty?
+    # @param files [Array] Unused — kept for signature compatibility
+    # @return [String|Array] String if no images, Array with content blocks otherwise
+    private def format_user_content(text, images, files = [])
+      images ||= []
+      return text if images.empty?
       content = []
       content << { type: "text", text: text } unless text.nil? || text.empty?
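
The routing of the final LLM response during a memory update (the hunk at line 208 above) can be summarized as a small decision function. This is a sketch under assumptions: `final_response_display` is a hypothetical name, and it returns the UI method name as a symbol instead of calling `@ui`, so the logic can be checked in isolation.

```ruby
# Hypothetical pure-function version of the branch added to Agent#run.
# Returns [ui_method, text], or nil when there is nothing to display.
def final_response_display(content, memory_updating:)
  return nil if content.nil? || content.empty?

  if memory_updating
    # Memory-update replies are shown as an info line, not a chat bubble.
    [:show_info, "🧠 " + content.strip]
  else
    [:show_assistant_message, content]
  end
end

p final_response_display("  Saved two notes.\n", memory_updating: true)
p final_response_display("Here is the answer.", memory_updating: false)
```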

data/lib/clacky/default_skills/channel-setup/SKILL.md CHANGED

@@ -1,8 +1,8 @@
 ---
 name: channel-setup
 description: |
-  Configure IM platform channels (Feishu/Lark, WeCom) for open-clacky.
-  Uses browser automation to complete setup automatically — no manual credential copying.
+  Configure IM platform channels (Feishu, WeCom) for open-clacky.
+  Uses browser automation for navigation; guides the user to paste credentials and perform UI steps.
   Trigger on: "channel setup", "setup feishu", "setup wecom", "channel config",
   "channel status", "channel enable", "channel disable", "channel reconfigure", "channel doctor".
   Subcommands: setup, status, enable <platform>, disable <platform>, reconfigure, doctor.
@@ -21,23 +21,14 @@ allowed-tools:
 Configure IM platform channels for open-clacky. Config is stored at `~/.clacky/channels.yml`.
-## Core Rule: Never ask for credentials
-All credentials (App Secret, Bot Secret, etc.) must be read directly from browser snapshots.
-**Asking the user to copy, type, or provide any credential is a failure.**
-If automation cannot reveal a value, say so and suggest retrying — never fall back to manual input.
-**Exception**: For Feishu and WeCom, guide the user to paste credentials — do not take snapshots or screenshots to extract. Directly ask the user to reveal and paste.
 ## Browser Automation Principles
-- Before opening any platform URL, detect Chrome availability and confirm the browser to use with the user.
-- **CRITICAL**: When the user chooses "1. Use my Chrome", pass `isolated: false` on every browser tool call (open, snapshot, etc.). When the user chooses "2. Use built-in", pass `isolated: true`. Omitting this causes the wrong browser to be used.
-- **When using the user's Chrome** (isolated=false): use `tab new <url>` instead of `open <url>` so the page opens in a new tab rather than replacing the current one.
-- After every navigation, take a snapshot before interacting with the page.
+- **Always use built-in browser**: Pass `isolated: true` on every browser tool call. Do NOT ask the user to choose — use the built-in browser only.
+- **Never use `screenshot`**: Use `snapshot -i` instead to get page structure as text. Do NOT generate image files.
+- Use `open <url>` for navigation.
+- AI navigates; user performs form fills, clicks, and pastes when instructed.
 - If a login page or QR code appears, tell the user to log in and wait for "done" before continuing.
-- To read a hidden credential: take an interactive snapshot to find the reveal/eye button, click it, then read the now-visible value from the next snapshot.
-- If stuck (CAPTCHA, unexpected page, dialog, cannot find a UI element, scroll fails), take a screenshot, describe the situation, and **guide the user to help** — do NOT fall back to alternative navigation (e.g., switching tabs, trying different URLs). Ask the user to perform the specific step manually and reply "done" when ready.
-- Never print raw secrets — mask to last 4 characters in all output.
+- If stuck (CAPTCHA, unexpected page, dialog, cannot find a UI element), **guide the user to help** — ask the user to perform the specific step manually and reply "done" when ready.
 ---
@@ -76,7 +67,7 @@ If the file doesn't exist: "No channels configured yet. Run `/channel-setup setu
 Ask:
 > Which platform would you like to connect?
 >
-> 1. Feishu / Lark
+> 1. Feishu
 > 2. WeCom (Enterprise WeChat)
 ---
@@ -85,33 +76,27 @@ Ask:
 #### Phase 1 — Open Feishu Open Platform
-1. Detect Chrome and confirm browser preference with the user. **Remember**: user chose 1 → pass `isolated: false`; chose 2 → pass `isolated: true` on every browser call.
-2. Ask (if not clear from context):
-   > Are you using Feishu (China) or Lark (International)?
-   > 1. Feishu — https://open.feishu.cn
-   > 2. Lark — https://open.larksuite.com
-3. Navigate to `https://open.feishu.cn/app` (or `/larksuite.com/app`).
-4. Take a snapshot. If a login page or QR code is shown, tell the user to log in and wait for "done".
-5. Confirm the app list is visible.
+1. Navigate: `open https://open.feishu.cn/app`. Pass `isolated: true`.
+2. Use `snapshot -i` to check page state. If a login page or QR code is shown, tell the user to log in and wait for "done".
+3. Confirm the app list is visible.
 #### Phase 2 — Create a new app
-6. **Always create a new app** — do NOT reuse existing apps. Click "Create Enterprise Self-Built App", then create with name `Open Clacky` and description `AI assistant powered by open-clacky`.
+6. **Always create a new app** — do NOT reuse existing apps. Guide the user: "Click 'Create Enterprise Self-Built App', fill in name (e.g. Open Clacky) and description (e.g. AI assistant powered by open-clacky), then submit. Reply done." Wait for "done".
-#### Phase 3 — Get credentials
+#### Phase 3 — Enable Bot capability
-7. Navigate to the app's Credentials & Basic Info page.
-8. Do NOT take snapshots or screenshots. Directly guide the user: "Click the eye icon next to App Secret to reveal it. Copy App ID and App Secret, then paste here. Reply with: App ID: xxx, App Secret: xxx" (confirm back masked to last 4 chars).
+7. Feishu opens Add App Capabilities by default after creating an app. Guide the user: "Find the Bot capability card and click the Add button next to it, then reply done." Wait for "done".
-#### Phase 4 — Enable Bot capability
+#### Phase 4 — Get credentials
-10. Navigate to Add App Capabilities in the left menu.
-11. Find the Bot capability card and add it. Confirm any dialog.
+8. Navigate to Credentials & Basic Info in the left menu.
+9. Guide the user: "Copy App ID and App Secret, then paste here. Reply with: App ID: xxx, App Secret: xxx" Wait for "done".
 #### Phase 5 — Add message permissions
-12. Navigate to Permission Management and open the bulk import dialog.
-13. **Clear the default/example content first** (select all, delete), then paste the following JSON:
+10. Navigate to Permission Management and open the bulk import dialog.
+11. Guide the user: "In the bulk import dialog, clear the existing example first (select all, delete), then paste the following JSON. Reply done." Wait for "done". Do NOT try to clear or edit via browser — user does it.
 ```json
 {
@@ -126,45 +111,21 @@ Ask:
 }
 ```
-14. Confirm all three permissions appear as enabled.
+#### Phase 6 — Configure event subscription (Long Connection)
-#### Phase 6 — Configure event subscription
+**CRITICAL**: Feishu requires the long connection to be established *before* you can save the event config. The platform shows "No application connection detected, ensure long connection is established before saving" until `clacky server` is running and connected. Do NOT try to save until the connection is established.
-15. Navigate to Events & Callbacks.
-16. Change the subscription method to **Long Connection** and save.
-17. Add the event `im.message.receive_v1`.
+12. **Apply config and establish connection** — Run `curl -X POST http://localhost:7070/api/channels/feishu -H "Content-Type: application/json" -d '{"app_id":"...","app_secret":"...","domain":"..."}'`. The server hot-reloads the Feishu adapter and establishes the WebSocket.
+13. **Wait for connection** — Wait until the log shows `[feishu-ws] WebSocket connected ✅`.
+14. **Navigate to Events & Callbacks** — Then guide the user: "Select 'Long Connection' mode. Click Save. Then click Add Event, type `im.message.receive_v1` in the search box, select it, click Add. Reply done." Wait for "done".
 #### Phase 7 — Publish the app
-18. Navigate to Version Management & Release, create a new version (e.g. `1.0.0`), and publish.
-19. Note: personal accounts publish immediately; enterprise accounts require admin approval — tell the user if this applies.
-#### Phase 8 — Allowed users (optional)
+15. Navigate to Version Management & Release. Then guide the user: "Create a new version, fill in version (e.g. 1.0.0) and update description (e.g. Initial release for Open Clacky), then publish. Reply done." Wait for "done".
-20. Ask:
-    > Do you want to restrict which Feishu users can send tasks to the AI?
-    > Reply "skip" to allow everyone, or "yes" to configure a whitelist.
-21. If "yes":
-    - Tell the user to send any message to the Open Clacky bot in Feishu, then reply "done".
-    - Navigate to Log Search → Event Log, find the latest `im.message.receive_v1` event, and read `sender.sender_id.open_id` (format `ou_xxx`) directly from the page.
-    - Repeat for additional users if needed.
+#### Phase 8 — Finalize config and validate
-#### Phase 9 — Save config and validate
-Write `~/.clacky/channels.yml` (merge with existing content, never overwrite other platforms):
-```yaml
-channels:
-  feishu:
-    enabled: true
-    app_id: <from user paste>
-    app_secret: <from user paste>
-    domain: https://open.feishu.cn   # or https://open.larksuite.com
-    # allowed_users:                 # omit if not configured
-    #   - ou_xxx
-```
-Run `chmod 600 ~/.clacky/channels.yml`.
+Config was applied in step 12 (via API).
 Validate:
 ```bash
@@ -174,57 +135,23 @@ curl -s -X POST "${DOMAIN}/open-apis/auth/v3/tenant_access_token/internal" \
 ```
 Check for `"code":0`. If it fails, explain and offer to retry.
-On success: "✅ Feishu channel configured. Restart `clacky server` to activate."
+On success: "✅ Feishu channel configured. The channel is already active."
 ---
 ### WeCom setup
-First ask: "Are you an admin of your WeCom enterprise (can log in to work.weixin.qq.com)? Reply 1 or 2."
-- If using AskFollowupQuestion: pass options as `Yes — I am an enterprise admin` and `No — I am not an admin` (no leading numbers; the tool will add 1. and 2.).
-- **1** → use Admin Console flow (browser automation).
-- **2** → use Client flow (guide user in WeCom desktop client, then ask for Bot ID and Secret).
----
-#### Admin Console flow (user has admin access)
-**Principle**: Do NOT take snapshots or screenshots to inspect the UI. Directly guide the user through each step. For Bot ID and Secret, guide the user to paste them — do NOT try to extract from the page.
+1. Navigate: `open https://work.weixin.qq.com/wework_admin/frame#/aiHelper/create`. Pass `isolated: true`.
+2. Use `snapshot -i` to check page state. If a login page or QR code is shown, tell the user to log in and wait for "done".
+3. Steps 3–7: Do NOT take snapshots. Guide the user: "Scroll to the bottom of the right panel and click 'API mode creation'. Reply done." Wait for "done".
+4. Guide the user: "Click 'Add' next to 'Visible Range'. In the scope dialog, select the top-level company node (or specific users/departments). Click Confirm. Reply done." Wait for "done".
+5. Guide the user: "If Secret is not visible, click 'Get Secret'. Copy Bot ID and Secret **before** clicking Save — do NOT click 'Get Secret' again after copying (it invalidates the previous secret). Paste here. Reply with: Bot ID: xxx, Secret: xxx" Wait for "done".
+6. Guide the user: "Click Save. In the dialog, enter name (e.g. Open Clacky) and description (e.g. AI assistant powered by open-clacky). Click Confirm. Click Save again. Reply done." Wait for "done".
+7. **Apply config and hot-reload** — Parse credentials from step 5. Trim leading/trailing whitespace from bot_id and secret. Run `curl -X POST http://localhost:7070/api/channels/wecom -H "Content-Type: application/json" -d '{"bot_id":"...","secret":"..."}'`. Ensure bot_id (starts with `aib`) and secret (longer string) are not swapped.
-1. Detect Chrome and confirm browser preference with the user (if not already done). **Remember**: user chose 1 → pass `isolated: false` when calling browser; chose 2 → pass `isolated: true`.
-2. Navigate directly to `https://work.weixin.qq.com/wework_admin/frame#/aiHelper/create` (use `tab new <url>` when isolated=false). Pass the same `isolated` value on every browser call.
-3. Directly guide the user: "If you see a login page or QR code, log in. When the create page is visible, reply done." Wait for "done".
-4. Guide the user: "Scroll to the bottom of the right panel and click 'API mode creation', then reply done." Wait for "done".
-5. Guide the user: "In the scope dialog, select the top-level company node to allow all members, or select specific users/departments if you prefer. Click Confirm, then reply done." Wait for "done".
-6. Guide the user: "If the Secret is not yet visible, click 'Get Secret'. When both Bot ID and Secret are visible, copy them and paste here. Reply with: Bot ID: xxx, Secret: xxx" (confirm back masked to last 4 chars).
-7. Guide the user: "Click Save. In the dialog, enter name 'Open Clacky' and description 'AI assistant powered by open-clacky', click Confirm, then reply done." Wait for "done".
-8. Write config and run `chmod 600 ~/.clacky/channels.yml`.
----
-#### Client flow (user is not admin; cannot access admin console)
-Guide the user to operate in the **WeCom desktop client** (Workbench). No browser automation needed.
-1. Guide the user: "Open the WeCom desktop client → Workbench → Smart Bot → "Create Bot". Reply done when you see the creation page." Wait for "done".
-2. Guide the user: "Scroll to the bottom of the page and click 'API Mode'. Reply done." Wait for "done".
-3. Guide the user: "The Bot ID appears on the right side. Under 'API Configuration', find the Secret row and click 'Click to Reveal' if needed. Copy both and paste here. Reply with: Bot ID: xxx, Secret: xxx" (confirm back masked to last 4 chars).
-4. Guide the user: "Fill in name 'Open Clacky' and description 'AI assistant powered by open-clacky', click Save (or Confirm), then reply done." Wait for "done".
-5. Write `~/.clacky/channels.yml` and run `chmod 600 ~/.clacky/channels.yml`.
----
-#### Save config (both flows)
-```yaml
-channels:
-  wecom:
-    enabled: true
-    bot_id: <extracted or entered>
-    secret: <extracted or entered>
-```
+On success: "✅ WeCom channel configured."
-On success: "✅ WeCom channel configured. Restart `clacky server` to activate. To use the bot: WeCom client → Workbench → Management → click the bot details → Go to use."
+On success: "✅ WeCom channel configured. To use the bot: WeCom client → Contacts → select Smart Bot to see the newly created bot.".
 ---
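
The credential hygiene described in WeCom step 7 (trim whitespace, make sure `bot_id` and `secret` are not swapped) could be done before building the JSON body for the `curl` call. The following is a minimal sketch, not code from the gem: `wecom_payload` is a hypothetical helper, and the only rule it encodes is the one stated in the skill text, that a bot ID starts with `aib`.

```ruby
require "json"

# Hypothetical helper: normalize pasted credentials and build the JSON body
# for POST /api/channels/wecom as described in the skill text.
def wecom_payload(bot_id, secret)
  bot_id = bot_id.to_s.strip
  secret = secret.to_s.strip

  # The skill text says bot IDs start with "aib"; anything else suggests
  # the two values were pasted in the wrong order.
  unless bot_id.start_with?("aib")
    raise ArgumentError, "bot_id should start with 'aib' (are bot_id and secret swapped?)"
  end
  raise ArgumentError, "secret is empty" if secret.empty?

  JSON.generate(bot_id: bot_id, secret: secret)
end

# Made-up sample values for illustration only.
puts wecom_payload("  aibEXAMPLE123 ", " example-secret-value\n")
```

Raising on a suspicious pair, rather than silently swapping the values, keeps the user in the loop when the pasted credentials look wrong.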

data/lib/clacky/default_skills/pdf-reader/SKILL.md ADDED

@@ -0,0 +1,90 @@
+---
+name: pdf-reader
+description: 'Read and analyze PDF files. Use this skill when the user has attached a PDF or mentions a PDF file path and wants to understand, summarize, extract, or ask questions about its content. Trigger on: "read this PDF", "analyze the PDF", "what does this PDF say", "what is in this file", "里面有什么", "帮我看看这个PDF", "总结一下", "这份文件说了什么" — or when a message contains a PDF attachment reference even without an explicit question. Also trigger when the user asks vague questions like "what is this?", "summarize", "tell me about this" if a PDF is attached.'
+disable-model-invocation: false
+user-invocable: true
+---
+# PDF Reading Skill
+## Your Goal
+Extract text content from the PDF file and answer the user's question based on that content. If the user's question is vague or absent, default to providing a clear structured summary of the document.
+## Step 1 — Extract text from the PDF
+Use `pdftotext` (preferred, fastest) or Python `pdfplumber` as fallback.
+### Option A: pdftotext (use this first)
+```bash
+pdftotext -layout -enc UTF-8 "/path/to/file.pdf" -
+```
+- `-enc UTF-8` ensures correct encoding for Chinese, Japanese, and other non-Latin text
+- `-layout` preserves column layout for tables
+- The `-` at the end prints to stdout (no temp file needed)
+**Install if missing:**
+- macOS: `brew install poppler`
+- Ubuntu/Debian: `apt install poppler-utils`
+- CentOS/Fedora: `yum install poppler-utils`
+### Option B: Python pdfplumber (fallback if pdftotext not available)
+```python
+import pdfplumber
+with pdfplumber.open("/path/to/file.pdf") as pdf:
+    for i, page in enumerate(pdf.pages, 1):
+        text = page.extract_text()
+        if text:
+            print(f"--- Page {i} ---")
+            print(text)
+```
+### Option C: pypdf (last resort)
+```python
+from pypdf import PdfReader
+reader = PdfReader("/path/to/file.pdf")
+for i, page in enumerate(reader.pages, 1):
+    print(f"--- Page {i} ---")
+    print(page.extract_text())
+```
+## Step 2 — Handle large files
+If the extracted text is truncated or very long (>200 lines):
+- For a **summary request**: read the full output file instead of relying on stdout — save to a temp file first:
+  ```bash
+  pdftotext -layout -enc UTF-8 "/path/to/file.pdf" /tmp/pdf_extracted.txt
+  cat /tmp/pdf_extracted.txt
+  ```
+- For a **specific question**: use `grep` to locate relevant sections before reading the full content:
+  ```bash
+  grep -n "keyword" /tmp/pdf_extracted.txt | head -30
+  ```
+- Extract once, answer from memory — do NOT re-read the file multiple times.
+## Step 3 — Answer the user's question
+### Output format guidelines
+Adapt the response format to the document type:
+| Document type | Recommended format |
+|---|---|
+| Business plan / Report | Structured summary with ## headers per section |
+| Contract / Legal | Key clauses in bullet points, highlight dates and parties |
+| Academic paper | Abstract → Key findings → Methodology → Conclusions |
+| Invoice / Receipt | Table: item, amount, total |
+| General / Unknown | Brief overview paragraph + key points as bullets |
+**General rules:**
+- Use Markdown formatting (headers, bullets, tables) for clarity
+- Match the user's language — if they asked in Chinese, answer in Chinese
+- Lead with the most important information first
+- If the user asked a specific question, answer it directly before summarizing
+## Rules
+- Always use the **actual file path** from the `[PDF attached: ...]` message
+- If text extraction returns empty (scanned/image PDF), inform the user and suggest: `brew install tesseract` + `tesseract file.pdf output txt`
+- Do NOT re-read the file multiple times — extract once, answer from memory
+- If the user's question is vague (e.g. "里面有什么", "what is this?"), default to a full structured summary
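
The "use `pdftotext` first, fall back otherwise" rule from Step 1 of the skill can be sketched in Ruby. This is an illustration under assumptions: the helper names are hypothetical, and the skill itself only instructs the agent to run shell commands. The `pdftotext` arguments are the ones given in the skill text.

```ruby
require "open3"

# Check whether a program exists on PATH without invoking a shell.
def executable_on_path?(name)
  ENV.fetch("PATH", "").split(File::PATH_SEPARATOR).any? do |dir|
    candidate = File.join(dir, name)
    File.file?(candidate) && File.executable?(candidate)
  end
end

# Build the argument vector from Option A; "-" sends the text to stdout.
def pdftotext_command(pdf_path, out: "-")
  ["pdftotext", "-layout", "-enc", "UTF-8", pdf_path, out]
end

# Returns the extracted text, or nil when pdftotext is missing or fails,
# in which case a caller would move on to the Python fallbacks.
def extract_pdf_text(pdf_path)
  return nil unless executable_on_path?("pdftotext")

  stdout, status = Open3.capture2(*pdftotext_command(pdf_path))
  status.success? ? stdout : nil
end

p pdftotext_command("/path/to/file.pdf")
```

Passing the command as an argument array avoids shell quoting problems with paths that contain spaces or quotes.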

data/lib/clacky/default_skills/skill-add/SKILL.md CHANGED

@@ -1,44 +1,60 @@
 ---
 name: skill-add
-description: Install skills from GitHub. Use when user provides a GitHub URL to install a skill or skill collection. Trigger on phrases like "install skill from github", "add skill from repo", "/skill-add https://github.com/..."
+description: 'Install skills from a zip URL. Use this skill whenever the user wants to install a skill from a zip link, or uses commands like /skill-add with a URL. Trigger on phrases like: install skill, install from zip, skill from zip, skill from url, add skill from zip, 安装skill, 从zip安装skill.'
 disable-model-invocation: false
 user-invocable: true
 ---
-# Skill Add — Install Skills from GitHub
+# Skill Add — Zip Installer
-Install skills from a GitHub repository into `~/.clacky/skills/`.
+Installs a skill from a zip URL using the bundled `install_from_zip.rb` script.
-## Supported URL Formats
+## Finding the Script
-```
-# Install all skills found in the repo's skills/ directories
-https://github.com/user/repo
-# Install all skills under a specific subdirectory
-https://github.com/user/repo/skills
+The script lives inside this skill's directory, in one of two locations:
+- Global: `~/.clacky/skills/skill-add/scripts/`
+- Project-level: `.clacky/skills/skill-add/scripts/`
-# Install a single specific skill by subpath
-https://github.com/user/repo/skills/my-skill
+Locate it at runtime with:
+```bash
+ruby "$(find ~/.clacky/skills/skill-add .clacky/skills/skill-add -name 'install_from_zip.rb' 2>/dev/null | head -1)" <zip_url> <slug>
 ```
-## How to Run
+---
-Execute the installer script via shell:
+## How to Install
+When the user provides a `.zip` URL, run:
 ```bash
-ruby lib/clacky/default_skills/skill-add/scripts/install_from_github.rb <github_url>
+ruby "$(find ~/.clacky/skills/skill-add .clacky/skills/skill-add -name 'install_from_zip.rb' 2>/dev/null | head -1)" <zip_url> <slug>
 ```
-The script will:
-1. Clone the repo (shallow, depth=1)
-2. Locate the skill(s) based on the URL (whole repo scan, or specific subpath)
-3. Copy each skill to `~/.clacky/skills/<skill-name>/`
-4. Report what was installed
+- `<zip_url>` — the download URL provided by the user
+- `<slug>` — the skill's directory name; if not provided, infer it from the URL filename by stripping version suffixes (e.g. `canvas-design-1.2.0.zip` → `canvas-design`)
+The script handles everything automatically:
+- Downloads the zip (follows HTTP redirects)
+- Extracts and locates all `SKILL.md` files inside
+- Copies skill directories to `.clacky/skills/` in the current project (overwrites existing)
+- Reports installed skills with their descriptions
+**Do NOT manually download or unzip — the script handles everything.**
-After installation, the skill is immediately available as `/skill-name` in all sessions.
+## Example
+```
+/skill-add https://store.clacky.ai/skills/canvas-design-1.2.0.zip
+```
+```bash
+ruby "$(find ~/.clacky/skills/skill-add .clacky/skills/skill-add -name 'install_from_zip.rb' 2>/dev/null | head -1)" \
+  "https://store.clacky.ai/skills/canvas-design-1.2.0.zip" \
+  "canvas-design"
+```
 ## Notes
-- If a skill with the same name already exists, it is skipped (not overwritten)
-- For creating new skills from scratch, use the `skill-creator` skill instead
+- Skills install to `.clacky/skills/` in the current project
+- Project-level skills override global skills (`~/.clacky/skills/`)
+- If the user doesn't provide a URL, ask them for the zip URL — this skill only supports zip installs
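
The slug inference rule in the new skill text (`canvas-design-1.2.0.zip` → `canvas-design`) can be expressed as a short function. This is a sketch only: `infer_slug` and its regular expression are assumptions, and the contents of `install_from_zip.rb` are not shown in this part of the diff.

```ruby
require "uri"

# Hypothetical helper: derive a skill slug from a zip URL by taking the
# filename, dropping ".zip", and stripping a trailing version suffix
# such as "-1.2.0" or "-v2".
def infer_slug(zip_url)
  filename = File.basename(URI.parse(zip_url).path, ".zip")
  filename.sub(/-v?\d+(\.\d+)*\z/, "")
end

puts infer_slug("https://store.clacky.ai/skills/canvas-design-1.2.0.zip")
```

A name whose last hyphenated segment is purely numeric (for example `top-10.zip`) would lose that segment under this pattern, so an explicit `<slug>` argument remains the safer choice when one is available.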