agentsite-kit 1.0.0

package/README.md ADDED
@@ -0,0 +1,226 @@
1
+ # AgentSite Kit
2
+
3
+ [Chinese documentation](./README.zh-CN.md)
4
+
5
+ Make any website Agent-friendly.
6
+
7
+ AgentSite Kit scans your website's public pages, extracts and classifies content, then generates structured data and APIs that AI Agents can reliably read, search, and query.
8
+
9
+ In the age of AI search (Perplexity, ChatGPT Search) and autonomous Agents, websites need more than just human-readable HTML. AgentSite Kit adds an **Agent-readable layer** on top of your existing site — no redesign required.
10
+
11
+ ## What It Does
12
+
13
+ - **Scans** your site via sitemap or crawling and respects robots.txt
14
+ - **Classifies** pages automatically (docs, FAQ, blog, product, pricing, changelog, etc.)
15
+ - **Extracts** titles, summaries, headings, metadata, and body content
16
+ - **Generates** standardized output files: `llms.txt`, `agent-sitemap.json`, `agent-index.json`, and per-type structured JSON
17
+ - **Serves** a query API for Agents to search and retrieve your content
18
+ - **Supports MCP** (Model Context Protocol) for direct integration with AI tools
19
+ - **Detects changes** incrementally — only re-processes updated pages
20
+
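The README does not say how incremental change detection is implemented. One common approach is to fingerprint each page body and compare it with the fingerprint stored from the previous scan. The sketch below assumes that approach; `fingerprint` and `detectChanges` are hypothetical names and are not part of AgentSite Kit.

```javascript
// Hypothetical sketch: these names are illustrative, not AgentSite Kit APIs.

// FNV-1a-style 32-bit hash over UTF-16 code units, used as a content fingerprint.
function fingerprint(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, "0");
}

// previous: { url: fingerprint } from the last scan.
// currentPages: { url: body } from the new scan.
function detectChanges(previous, currentPages) {
  const added = [];
  const changed = [];
  const unchanged = [];
  const next = {};
  for (const [url, body] of Object.entries(currentPages)) {
    const fp = fingerprint(body);
    next[url] = fp;
    if (!Object.hasOwn(previous, url)) added.push(url);
    else if (previous[url] !== fp) changed.push(url);
    else unchanged.push(url);
  }
  const removed = Object.keys(previous).filter(
    (url) => !Object.hasOwn(currentPages, url)
  );
  // `next` is what a caller would persist for the following run.
  return { added, changed, unchanged, removed, next };
}
```

Only the URLs in `added` and `changed` would need re-processing; everything in `unchanged` can reuse the earlier results.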
21
+ ## Quick Start
22
+
23
+ ```bash
24
+ # Install dependencies and build
25
+ npm install
26
+ npm run build
27
+
28
+ # Initialize config (interactive)
29
+ npx agentsite init
30
+
31
+ # Scan your website
32
+ npx agentsite scan
33
+
34
+ # Generate Agent-friendly files
35
+ npx agentsite generate
36
+
37
+ # Start the API server
38
+ npx agentsite serve
39
+ ```
40
+
41
+ The API server runs on `http://localhost:3141` by default.
42
+
43
+ ## Commands
44
+
45
+ | Command | Description |
46
+ |---------|-------------|
47
+ | `agentsite init` | Create `agentsite.config.yaml` interactively |
48
+ | `agentsite init -t <template>` | Initialize with an industry template |
49
+ | `agentsite init --list-templates` | List available templates |
50
+ | `agentsite scan` | Scan website and classify pages |
51
+ | `agentsite scan --no-llm` | Scan without LLM-assisted classification |
52
+ | `agentsite generate` | Generate Agent-friendly files from scan results |
53
+ | `agentsite serve` | Start the API server |
54
+ | `agentsite serve -p 8080` | Start on a custom port |
55
+ | `agentsite update` | Incremental update — re-scan, detect changes, regenerate |
56
+ | `agentsite mcp` | Start MCP server over stdio |
57
+
58
+ ## Templates
59
+
60
+ Pre-configured templates for common site types:
61
+
62
+ - `docs-site` — Documentation websites
63
+ - `blog` — Blogs and content sites
64
+ - `saas` — SaaS product websites
65
+ - `knowledge-base` — Knowledge base / wiki sites
66
+ - `ecommerce` — E-commerce stores
67
+ - `portfolio` — Personal / portfolio sites
68
+ - `api-docs` — API documentation
69
+ - `community` — Community forums
70
+
71
+ ```bash
72
+ npx agentsite init -t saas
73
+ ```
74
+
75
+ ## API Endpoints
76
+
77
+ Once `agentsite serve` is running:
78
+
79
+ | Endpoint | Description |
80
+ |----------|-------------|
81
+ | `GET /api/health` | Health check |
82
+ | `GET /api/search?q=keyword` | Search across all content |
83
+ | `GET /api/pages/:id` | Get a single page |
84
+ | `GET /api/docs` | Documentation entries |
85
+ | `GET /api/faq` | FAQ entries |
86
+ | `GET /api/products` | Product entries |
87
+ | `GET /api/articles` | Blog / article entries |
88
+ | `GET /api/pricing` | Pricing information |
89
+ | `GET /api/changelog` | Changelog entries |
90
+ | `GET /api/stats` | Site statistics |
91
+ | `GET /api/config` | Current configuration |
92
+ | `GET /api/files` | Generated file listing |
93
+ | `GET /api/access-log` | Access log |
94
+ | `GET /api/sites` | All configured sites (multi-site) |
95
+
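As a rough illustration of calling the endpoints above from Node.js 18+ (which ships a global `fetch`), the sketch below builds request URLs and wraps the search call. It assumes the default address `http://localhost:3141` and that responses are JSON; the response shape is not documented here, so the helper returns whatever the server sends. The helper names are hypothetical.

```javascript
// Assumption: the server started by `agentsite serve` listens on the default port.
const BASE_URL = "http://localhost:3141";

// Build the URL for GET /api/search?q=...
function buildSearchUrl(query, baseUrl = BASE_URL) {
  const url = new URL("/api/search", baseUrl);
  url.searchParams.set("q", query);
  return url.toString();
}

// Build the URL for GET /api/pages/:id, escaping the id as one path segment.
function buildPageUrl(id, baseUrl = BASE_URL) {
  return new URL(`/api/pages/${encodeURIComponent(id)}`, baseUrl).toString();
}

// Run a search and return the parsed body (assumed to be JSON).
async function search(query, baseUrl = BASE_URL) {
  const response = await fetch(buildSearchUrl(query, baseUrl));
  if (!response.ok) {
    throw new Error(`Search failed with HTTP ${response.status}`);
  }
  return response.json();
}
```

With the server running, `search("pricing").then(console.log)` would print the raw result.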
96
+ ## Generated Files
97
+
98
+ After `scan` + `generate`, the `.agentsite/` directory contains:
99
+
100
+ ```
101
+ .agentsite/
102
+ ├── llms.txt # LLM-friendly plain text overview
103
+ ├── agent-sitemap.json # Agent-oriented sitemap
104
+ ├── agent-index.json # Structured site index
105
+ ├── scan-result.json # Raw scan data
106
+ └── data/
107
+     ├── docs.json # Documentation
108
+     ├── faq.json # FAQs
109
+     ├── products.json # Products
110
+     ├── articles.json # Articles / blog posts
111
+     ├── pricing.json # Pricing
112
+     └── changelog.json # Changelog
113
+ ```
114
+
115
+ ## Configuration
116
+
117
+ `agentsite.config.yaml` example:
118
+
119
+ ```yaml
120
+ site:
121
+   url: https://example.com
122
+   name: My Site
123
+   description: A brief description of your site
124
+
125
+ scan:
126
+   maxPages: 100
127
+   concurrency: 3
128
+   delayMs: 200
129
+   include:
130
+     - "**"
131
+   exclude: []
132
+   respectRobotsTxt: true
133
+
134
+ output:
135
+   dir: .agentsite
136
+   formats:
137
+     - llms-txt
138
+     - agent-sitemap
139
+     - agent-index
140
+     - structured
141
+
142
+ server:
143
+   port: 3141
144
+   rateLimit:
145
+     max: 60
146
+     timeWindow: 1 minute
147
+   accessLog: true
148
+
149
+ access:
150
+   allowedPages:
151
+     - "**"
152
+   blockedPages: []
153
+   allowedTypes:
154
+     - docs
155
+     - faq
156
+     - blog
157
+     - product
158
+     - pricing
159
+     - about
160
+     - contact
161
+     - changelog
162
+   summaryOnly: false
163
+   allowSearch: true
164
+ ```
165
+
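The `access` block combines glob patterns with a list of page types. The README does not define the exact matching rules, so the sketch below is only one plausible reading: a page is served when its type is in `allowedTypes`, its path matches some `allowedPages` pattern, and no `blockedPages` pattern matches. The glob handling (`**` spans path segments, `*` stays within one) and the precedence of `blockedPages` are assumptions, and `globToRegExp` and `isAllowed` are hypothetical helpers.

```javascript
// Hypothetical sketch of access filtering; not taken from AgentSite Kit's source.

// Convert a simple glob into an anchored regular expression.
// Assumed semantics: "**" matches across "/" and "*" matches within one segment.
function globToRegExp(glob) {
  const escaped = glob.replace(/[.+^${}()|[\]\\?]/g, "\\$&");
  const pattern = escaped
    .replace(/\*\*/g, "\u0000") // protect "**" while single "*" is expanded
    .replace(/\*/g, "[^/]*")
    .replace(/\u0000/g, ".*");
  return new RegExp(`^${pattern}$`);
}

// page: { path, type }. access: the `access` section of the config.
function isAllowed(page, access) {
  const matchesAny = (patterns) =>
    patterns.some((glob) => globToRegExp(glob).test(page.path));
  if (!access.allowedTypes.includes(page.type)) return false;
  if (matchesAny(access.blockedPages)) return false; // assumed: block wins
  return matchesAny(access.allowedPages);
}

// Example values mirroring the config above, plus one blocked pattern.
const access = {
  allowedPages: ["**"],
  blockedPages: ["/internal/**"],
  allowedTypes: ["docs", "faq", "blog", "product", "pricing", "about", "contact", "changelog"],
};
```

Under these assumptions, `isAllowed({ path: "/docs/intro", type: "docs" }, access)` is `true`, while any path under `/internal/` is rejected.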
166
+ ### LLM-Assisted Mode
167
+
168
+ Add an `llm` section to the config to enable AI-powered page classification and summarization:
169
+
170
+ ```yaml
171
+ llm:
171
+   provider: openai # or anthropic, etc.
172
+   apiKey: sk-...
173
+   model: gpt-4o-mini
175
+ ```
176
+
177
+ ## Docker
178
+
179
+ ```bash
180
+ # Build and run
181
+ docker compose up -d
182
+
183
+ # Or build manually
184
+ docker build -t agentsite .
185
+ docker run -p 3141:3141 -v ./agentsite.config.yaml:/app/agentsite.config.yaml:ro agentsite
186
+ ```
187
+
188
+ ## MCP Integration
189
+
190
+ AgentSite Kit can run as an MCP server, allowing AI tools (Claude, Cursor, etc.) to directly query your site data:
191
+
192
+ ```bash
193
+ npx agentsite mcp
194
+ ```
195
+
196
+ Add to your MCP client config:
197
+
198
+ ```json
199
+ {
200
+   "mcpServers": {
201
+     "my-site": {
202
+       "command": "npx",
203
+       "args": ["agentsite", "mcp"]
204
+     }
205
+   }
206
+ }
207
+ ```
208
+
209
+ ## Plugin System
210
+
211
+ AgentSite Kit supports plugins with lifecycle hooks:
212
+
213
+ ```yaml
214
+ plugins:
215
+   - ./my-plugin.js
216
+ ```
217
+
218
+ Plugins can hook into `beforeScan`, `afterScan`, `beforeGenerate`, and `afterGenerate` stages.
219
+
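The README names the four hooks but does not document their arguments or how a plugin module must be exported. The sketch below is therefore a guess at what `./my-plugin.js` could look like: it assumes a plugin is an object whose hook functions receive some context object, and that a scan context may expose a `pages` array. All of that is hypothetical.

```javascript
// Hypothetical plugin shape: hook arguments and the export form are assumptions.
function createTimingPlugin(log = console.log) {
  let startedAt = 0;
  return {
    name: "timing-plugin",

    beforeScan() {
      startedAt = Date.now();
    },
    afterScan(context) {
      // `context.pages` is an assumed field; fall back to 0 if it is absent.
      const pageCount = context?.pages?.length ?? 0;
      log(`[timing-plugin] scan finished: ${pageCount} pages in ${Date.now() - startedAt} ms`);
    },
    beforeGenerate() {
      startedAt = Date.now();
    },
    afterGenerate() {
      log(`[timing-plugin] generate finished in ${Date.now() - startedAt} ms`);
    },
  };
}

// Assumption: the loader accepts a CommonJS export. Use `export default`
// instead if plugins turn out to be loaded as ES modules.
if (typeof module !== "undefined") {
  module.exports = createTimingPlugin();
}
```

Keeping state in a closure rather than on `this` means the hooks still work if the loader calls them as detached functions.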
220
+ ## Requirements
221
+
222
+ - Node.js >= 18
223
+
224
+ ## License
225
+
226
+ MIT
@@ -0,0 +1,226 @@
1
+ # AgentSite Kit
2
+
3
+ [English](./README.md)
4
+
5
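+ Make any website Agent-friendly.
6
+
7
+ AgentSite Kit scans your website's public pages, extracts and classifies content, then generates structured data and APIs that AI Agents can reliably read, search, and query.
8
+
9
+ In the age of AI search (Perplexity, ChatGPT Search) and autonomous Agents, websites need more than human-readable HTML. AgentSite Kit adds an **Agent-readable layer** on top of your existing site, with no redesign required.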
10
+
11
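+ ## Features
12
+
13
+ - **Scan**: discovers pages via sitemap or crawling and respects robots.txt
14
+ - **Classify**: automatically identifies page types (docs, FAQ, blog, product, pricing, changelog, etc.)
15
+ - **Extract**: titles, summaries, heading hierarchy, metadata, and body content
16
+ - **Generate**: standardized output files (`llms.txt`, `agent-sitemap.json`, `agent-index.json`, and per-type structured JSON)
17
+ - **Serve**: provides a query API for Agents to search and retrieve content
18
+ - **MCP support**: compatible with the Model Context Protocol for direct integration with AI tools
19
+ - **Incremental updates**: detects changes and re-processes only updated pages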
20
+
21
+ ## Quick Start
22
+
23
+ ```bash
24
+ # Install dependencies and build
25
+ npm install
26
+ npm run build
27
+
28
+ # Initialize config (interactive)
29
+ npx agentsite init
30
+
31
+ # Scan your website
32
+ npx agentsite scan
33
+
34
+ # Generate Agent-readable files
35
+ npx agentsite generate
36
+
37
+ # Start the API server
38
+ npx agentsite serve
39
+ ```
40
+
41
+ The API server runs on `http://localhost:3141` by default.
42
+
43
+ ## Commands
44
+
45
+ | Command | Description |
46
+ |------|------|
47
+ | `agentsite init` | Create `agentsite.config.yaml` interactively |
48
+ | `agentsite init -t <template>` | Initialize with an industry template |
49
+ | `agentsite init --list-templates` | List available templates |
50
+ | `agentsite scan` | Scan website and classify pages |
51
+ | `agentsite scan --no-llm` | Scan without LLM-assisted classification |
52
+ | `agentsite generate` | Generate Agent-readable files from scan results |
53
+ | `agentsite serve` | Start the API server |
54
+ | `agentsite serve -p 8080` | Start on a custom port |
55
+ | `agentsite update` | Incremental update: re-scan, detect changes, regenerate |
56
+ | `agentsite mcp` | Start MCP server (stdio mode) |
57
+
58
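+ ## Templates
59
+
60
+ Built-in, pre-configured templates for common site types:
61
+
62
+ - `docs-site`: documentation sites
63
+ - `blog`: blogs / content sites
64
+ - `saas`: SaaS product websites
65
+ - `knowledge-base`: knowledge bases / wikis
66
+ - `ecommerce`: e-commerce sites
67
+ - `portfolio`: personal portfolios
68
+ - `api-docs`: API documentation
69
+ - `community`: community forums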
70
+
71
+ ```bash
72
+ npx agentsite init -t saas
73
+ ```
74
+
75
+ ## API Endpoints
76
+
77
+ Available once `agentsite serve` is running:
78
+
79
+ | Endpoint | Description |
80
+ |------|------|
81
+ | `GET /api/health` | Health check |
82
+ | `GET /api/search?q=keyword` | Full-text search |
83
+ | `GET /api/pages/:id` | Get a single page |
84
+ | `GET /api/docs` | Documentation data |
85
+ | `GET /api/faq` | FAQ data |
86
+ | `GET /api/products` | Product data |
87
+ | `GET /api/articles` | Article / blog data |
88
+ | `GET /api/pricing` | Pricing information |
89
+ | `GET /api/changelog` | Changelog |
90
+ | `GET /api/stats` | Site statistics |
91
+ | `GET /api/config` | Current configuration |
92
+ | `GET /api/files` | Generated file listing |
93
+ | `GET /api/access-log` | Access log |
94
+ | `GET /api/sites` | All configured sites (multi-site mode) |
95
+
96
+ ## Generated Files
97
+
98
+ After running `scan` + `generate`, the `.agentsite/` directory looks like this:
99
+
100
+ ```
101
+ .agentsite/
102
+ ├── llms.txt # LLM-friendly plain text overview
103
+ ├── agent-sitemap.json # Agent-specific sitemap
104
+ ├── agent-index.json # Structured site index
105
+ ├── scan-result.json # Raw scan data
106
+ └── data/
107
+     ├── docs.json # Documentation
108
+     ├── faq.json # FAQs
109
+     ├── products.json # Products
110
+     ├── articles.json # Articles / blog posts
111
+     ├── pricing.json # Pricing
112
+     └── changelog.json # Changelog
113
+ ```
114
+
115
+ ## Configuration
116
+
117
+ `agentsite.config.yaml` example:
118
+
119
+ ```yaml
120
+ site:
121
+   url: https://example.com
122
+   name: My Site
123
+   description: A brief description of your site
124
+
125
+ scan:
126
+   maxPages: 100
127
+   concurrency: 3
128
+   delayMs: 200
129
+   include:
130
+     - "**"
131
+   exclude: []
132
+   respectRobotsTxt: true
133
+
134
+ output:
135
+   dir: .agentsite
136
+   formats:
137
+     - llms-txt
138
+     - agent-sitemap
139
+     - agent-index
140
+     - structured
141
+
142
+ server:
143
+   port: 3141
144
+   rateLimit:
145
+     max: 60
146
+     timeWindow: 1 minute
147
+   accessLog: true
148
+
149
+ access:
150
+   allowedPages:
151
+     - "**"
152
+   blockedPages: []
153
+   allowedTypes:
154
+     - docs
155
+     - faq
156
+     - blog
157
+     - product
158
+     - pricing
159
+     - about
160
+     - contact
161
+     - changelog
162
+   summaryOnly: false
163
+   allowSearch: true
164
+ ```
165
+
166
+ ### LLM-Assisted Mode
167
+
168
+ Add LLM config to enable AI-powered page classification and summary generation:
169
+
170
+ ```yaml
171
+ llm:
172
+   provider: openai # or anthropic, etc.
173
+   apiKey: sk-...
174
+   model: gpt-4o-mini
175
+ ```
176
+
177
+ ## Docker Deployment
178
+
179
+ ```bash
180
+ # Build and run
181
+ docker compose up -d
182
+
183
+ # Or build manually
184
+ docker build -t agentsite .
185
+ docker run -p 3141:3141 -v ./agentsite.config.yaml:/app/agentsite.config.yaml:ro agentsite
186
+ ```
187
+
188
+ ## MCP Integration
189
+
190
+ AgentSite Kit can run as an MCP server, letting AI tools (Claude, Cursor, etc.) query your site data directly:
191
+
192
+ ```bash
193
+ npx agentsite mcp
194
+ ```
195
+
196
+ Add to your MCP client config:
197
+
198
+ ```json
199
+ {
200
+   "mcpServers": {
201
+     "my-site": {
202
+       "command": "npx",
203
+       "args": ["agentsite", "mcp"]
204
+     }
205
+   }
206
+ }
207
+ ```
208
+
209
+ ## Plugin System
210
+
211
+ AgentSite Kit supports plugins that can hook into each stage of the lifecycle:
212
+
213
+ ```yaml
214
+ plugins:
215
+   - ./my-plugin.js
216
+ ```
217
+
218
+ Supported hooks: `beforeScan`, `afterScan`, `beforeGenerate`, `afterGenerate`.
219
+
220
+ ## Requirements
221
+
222
+ - Node.js >= 18
223
+
224
+ ## License
225
+
226
+ MIT
@@ -0,0 +1,2 @@
1
+ #!/usr/bin/env node
2
+ import '../dist/index.js';