@praveencs/agent 0.7.2 → 0.7.4
- package/README.md +16 -0
- package/ROADMAP.md +35 -0
- package/dist/src/cli/index.js +1 -1
- package/docs/articles/01-vision-architecture.md +52 -0
- package/docs/articles/02-goal-decomposition.md +80 -0
- package/docs/articles/03-skill-execution.md +80 -0
- package/docs/articles/04-memory-persistence.md +69 -0
- package/docs/articles/05-self-improvement.md +89 -0
- package/package.json +3 -1
package/README.md
CHANGED
|
@@ -174,6 +174,22 @@ The runtime consists of several modular components:
|
|
|
174
174
|
|
|
175
175
|
---
|
|
176
176
|
|
|
177
|
+
|
|
178
|
+
## Learning Series
|
|
179
|
+
|
|
180
|
+
Want to understand how this agent works under the hood? Check out our 5-part architecture series:
|
|
181
|
+
|
|
182
|
+
1. [**Vision & Architecture**](docs/articles/01-vision-architecture.md) - The high-level design.
|
|
183
|
+
2. [**The Brain (Planner)**](docs/articles/02-goal-decomposition.md) - How goal decomposition works.
|
|
184
|
+
3. [**The Body (Executor)**](docs/articles/03-skill-execution.md) - Secure skill execution.
|
|
185
|
+
4. [**Memory & Context**](docs/articles/04-memory-persistence.md) - SQLite & Vector storage.
|
|
186
|
+
5. [**Self-Improvement**](docs/articles/05-self-improvement.md) - Metrics & The Auto-Fixer.
|
|
187
|
+
|
|
188
|
+
## What's Next?
|
|
189
|
+
|
|
190
|
+
We are just getting started. The future includes **Multi-Agent Swarms**, **Sandboxed Execution**, and **Voice Interfaces**.
|
|
191
|
+
Check out our detailed [**ROADMAP.md**](ROADMAP.md) to see where we are heading and how you can help build the future of autonomous software development.
|
|
192
|
+
|
|
177
193
|
## Contributing
|
|
178
194
|
|
|
179
195
|
We welcome contributions! Please see [CONTRIBUTING.md](CONTRIBUTING.md) for details on how to set up your development environment.
|
package/ROADMAP.md
ADDED
|
@@ -0,0 +1,35 @@
|
|
|
1
|
+
# Roadmap: The Future of @praveencs/agent
|
|
2
|
+
|
|
3
|
+
We have built a robust autonomous agent runtime (`v0.7.x`). But this is just the beginning.
|
|
4
|
+
Here is our vision for the next major milestones.
|
|
5
|
+
|
|
6
|
+
## Phase 1: Robustness & Safety (Current Focus)
|
|
7
|
+
- [ ] **Sandboxed Execution**: Run all shell skills inside ephemeral Docker containers to prevent accidental system damage.
|
|
8
|
+
- [ ] **Permission Scopes**: Fine-grained access control (e.g., "Allow read access to `/project` but write access only to `/project/src`").
|
|
9
|
+
- [ ] **Secrets Management**: Secure, encrypted storage for API keys integrated with system keychains.
|
|
10
|
+
|
|
11
|
+
## Phase 2: Multi-Agent Collaboration (The Swarm)
|
|
12
|
+
- [ ] **Agent-to-Agent Protocol**: Define a standard schema for agents to send messages and delegate tasks to each other.
|
|
13
|
+
- [ ] **Specialized Personas**:
|
|
14
|
+
- `Coder Agent`: Writes and tests code.
|
|
15
|
+
- `Reviewer Agent`: Critiques pull requests.
|
|
16
|
+
- `Architect Agent`: High-level system design.
|
|
17
|
+
- [ ] **Orchestrator**: A master process that spins up specialized agents for a complex goal.
|
|
18
|
+
|
|
19
|
+
## Phase 3: Multimodal Interfaces
|
|
20
|
+
- [ ] **Voice Interface**: Speak to your agent ("Deploy this to prod") and hear responses.
|
|
21
|
+
- [ ] **Vision Capabilities**: Allow the agent to "see" your screen or read images (e.g., "Fix the CSS on this screenshot").
|
|
22
|
+
- [ ] **IDE Integration**: VS Code extension to have the agent live in your editor sidebar.
|
|
23
|
+
|
|
24
|
+
## Phase 4: The Agent Cloud
|
|
25
|
+
- [ ] **Skill Hub**: A public registry (npm-style) to share and install community skills.
|
|
26
|
+
- [ ] **Remote Execution**: Run the heavy agent logic on a cloud server while controlling it from your laptop.
|
|
27
|
+
- [ ] **Web Dashboard**: Real-time visualization of agent thought processes, memory graph, and task plans.
|
|
28
|
+
|
|
29
|
+
## Join the Mission
|
|
30
|
+
This is an open-source journey. We need help with:
|
|
31
|
+
- Writing new Skills (see `docs/articles/03-skill-execution.md`)
|
|
32
|
+
- Improving the Planner prompt engineering
|
|
33
|
+
- Building the Web Dashboard
|
|
34
|
+
|
|
35
|
+
Submit a PR and let's build the future of work, together.
|
package/dist/src/cli/index.js
CHANGED
|
@@ -18,7 +18,7 @@ export function createCLI() {
|
|
|
18
18
|
program
|
|
19
19
|
.name('agent')
|
|
20
20
|
.description('Agent Runtime - autonomous, goal-oriented AI agent with skills, plans, memory, and permissioned tools')
|
|
21
|
-
.version('0.7.
|
|
21
|
+
.version('0.7.4')
|
|
22
22
|
.option('--verbose', 'Enable verbose output')
|
|
23
23
|
.option('--no-color', 'Disable colored output')
|
|
24
24
|
.option('--config <path>', 'Path to config file');
|
package/docs/articles/01-vision-architecture.md
ADDED
|
@@ -0,0 +1,52 @@
|
|
|
1
|
+
# Building an Autonomous AI Agent: The Vision & Architecture (Part 1/5)
|
|
2
|
+
|
|
3
|
+
Most AI projects today are chatbots. You type, they type back. But what if your AI could *act* on your behalf? What if it could plan a project, check your code, deploy your app, and even fix itself when things break, all while you sleep?
|
|
4
|
+
|
|
5
|
+
This is the promise of **Autonomous Agents**. Unlike chatbots, agents have:
|
|
6
|
+
1. **Goals** (long-term objectives)
|
|
7
|
+
2. **Memory** (persistence across sessions)
|
|
8
|
+
3. **Skills** (integrations with real-world tools)
|
|
9
|
+
4. **Autonomy** (looping execution without constant prompts)
|
|
10
|
+
|
|
11
|
+
In this 5-part series, we will break down exactly how we built `@praveencs/agent`, a powerful, open-source autonomous agent runtime.
|
|
12
|
+
|
|
13
|
+
## The Core Architecture
|
|
14
|
+
|
|
15
|
+
To build a truly functional agent, we need several distinct components working in harmony. We'll use a modular architecture:
|
|
16
|
+
|
|
17
|
+
### 1. The Brain (Planner)
|
|
18
|
+
Standard LLMs (like GPT-4) are great at answering questions but terrible at long-term execution. To fix this, we need a **Planner**.
|
|
19
|
+
- **Role**: Take a high-level goal ("Build a blog") and decompose it into small, atomic tasks ("Create Next.js app", "Set up DB", "Write About page").
|
|
20
|
+
- **Innovation**: We use a recursive decomposition strategy where the LLM acts as a project manager.
|
|
21
|
+
|
|
22
|
+
### 2. The Body (Executor)
|
|
23
|
+
Once we have a list of tasks, something needs to *do* them. This is the **Executor**.
|
|
24
|
+
- **Role**: Pick up the next task, determine the right tool (Skill) to use, and execute it.
|
|
25
|
+
- **Innovation**: We treat shell commands as first-class citizens. The agent can run `npm install`, `git commit`, or `docker build` just like a human developer.
|
|
26
|
+
|
|
27
|
+
### 3. The Memory (Context)
|
|
28
|
+
A developer who forgets the codebase every morning is useless. Our agent needs **Memory**.
|
|
29
|
+
- **Role**: Store project context ("This is a TypeScript project"), facts ("Staging IP is 10.0.0.5"), and learnings ("The last build failed because of a missing dependency").
|
|
30
|
+
- **Innovation**: We use SQLite with FTS5 (Full-Text Search) and a JSON-based vector-like storage to quickly retrieve relevant context for every task.
|
|
31
|
+
|
|
32
|
+
### 4. The Daemon (Autonomy)
|
|
33
|
+
The secret sauce is the loop. A chatbot waits for input. An agent runs in a loop.
|
|
34
|
+
- **Role**: A background process that constantly checks for pending tasks, file changes, or new goals.
|
|
35
|
+
|
|
36
|
+
## Tech Stack
|
|
37
|
+
|
|
38
|
+
We're building this in **TypeScript** (Node.js) because:
|
|
39
|
+
- **Ecosystem**: Access to millions of npm packages.
|
|
40
|
+
- **Safety**: Strong typing prevents runtime errors in complex logic flows.
|
|
41
|
+
- **Performance**: Validated through years of enterprise usage.
|
|
42
|
+
|
|
43
|
+
**Key Libraries:**
|
|
44
|
+
- `better-sqlite3`: Fast, synchronous SQLite access.
|
|
45
|
+
- `commander`: Powerful CLI framework.
|
|
46
|
+
- `openai/anthropic-sdk`: Interface to the LLM brains.
|
|
47
|
+
|
|
48
|
+
## What's Next?
|
|
49
|
+
|
|
50
|
+
In **Part 2**, we will dive into the code for **The Brain**. We'll write the `GoalDecomposer` class that turns vague requests into structured project plans.
|
|
51
|
+
|
|
52
|
+
Stay tuned!
|
package/docs/articles/02-goal-decomposition.md
ADDED
|
@@ -0,0 +1,80 @@
|
|
|
1
|
+
# Building The Brain: AI Goal Decomposition (Part 2/5)
|
|
2
|
+
|
|
3
|
+
In **Part 1**, we saw that the key difference between a chatbot and an agent is structure.
|
|
4
|
+
|
|
5
|
+
A chatbot might say: "Here is a list of steps to build a blog."
|
|
6
|
+
An agent says: "I have created 5 pending tasks for you to approve or let me execute."
|
|
7
|
+
|
|
8
|
+
This transformation happens in the **Planner**.
|
|
9
|
+
|
|
10
|
+
## The `GoalDecomposer`
|
|
11
|
+
|
|
12
|
+
### The Problem
|
|
13
|
+
LLMs are notoriously bad at holding long chains of reasoning. If you say "Build me a Facebook clone", they might spit out 200 lines of hallucinated code and then stop.
|
|
14
|
+
|
|
15
|
+
### The Solution: Recursion
|
|
16
|
+
Instead of one massive prompt, we can use **Recursive Decomposition**.
|
|
17
|
+
1. **Receive Goal**: "Build a blog app"
|
|
18
|
+
2. **Decompose**: Break it into 3-5 high-level tasks.
|
|
19
|
+
- "Set up Next.js"
|
|
20
|
+
- "Create Database schema"
|
|
21
|
+
- "Implement styling"
|
|
22
|
+
3. **Refine**: If a task is too complex ("Implement styling" is huge), decompose *that* task further.
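The three steps above can be sketched as a small recursive function. This is only an illustration under stated assumptions: `TaskNode`, `decomposeRecursively`, `stubDecompose`, and the depth limit are hypothetical names, and the decomposer is a plain synchronous stub standing in for the LLM call (a real one would be asynchronous).

```typescript
// Hypothetical types and names; not taken from src/goals/decomposer.ts.
interface TaskNode {
  title: string;
  subtasks: TaskNode[];
}

// Returns child task titles, or an empty list when the task is already atomic.
type Decompose = (title: string) => string[];

function decomposeRecursively(title: string, decompose: Decompose, maxDepth = 3): TaskNode {
  // Stop at the depth limit so a chatty model cannot recurse forever.
  const children = maxDepth > 0 ? decompose(title) : [];
  const subtasks: TaskNode[] = [];
  for (const child of children) {
    subtasks.push(decomposeRecursively(child, decompose, maxDepth - 1));
  }
  return { title, subtasks };
}

// Stub standing in for the LLM, using the blog example from the text.
const stubDecompose: Decompose = (title) => {
  const table: Record<string, string[]> = {
    "Build a blog app": ["Set up Next.js", "Create Database schema", "Implement styling"],
    "Implement styling": ["Install Tailwind", "Style the home page"],
  };
  return table[title] ?? [];
};

// Collect the leaves: these are the atomic tasks that would be queued.
function leaves(node: TaskNode): string[] {
  if (node.subtasks.length === 0) return [node.title];
  let out: string[] = [];
  for (const sub of node.subtasks) {
    out = out.concat(leaves(sub));
  }
  return out;
}
```

Only the leaves of the tree are handed to the executor; the inner nodes exist to keep track of which goal each atomic task belongs to.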
|
|
23
|
+
|
|
24
|
+
### The Code
|
|
25
|
+
|
|
26
|
+
We built `src/goals/decomposer.ts` to handle this. Here's the core logic:
|
|
27
|
+
|
|
28
|
+
```typescript
|
|
29
|
+
// 1. Construct the planning prompt
|
|
30
|
+
const systemPrompt = `You are an expert project manager AI.
|
|
31
|
+
Your goal is to break down a high-level objective into actionable, atomic tasks.
|
|
32
|
+
|
|
33
|
+
Rules:
|
|
34
|
+
1. Each task must be executable by a single skill (e.g., shell command, file write).
|
|
35
|
+
2. Define dependencies (Task B depends on Task A).
|
|
36
|
+
3. If a task is dangerous (e.g., delete DB), mark it 'requiresApproval: true'.
|
|
37
|
+
|
|
38
|
+
Output ONLY valid JSON:
|
|
39
|
+
{
|
|
40
|
+
"tasks": [
|
|
41
|
+
{ "title": "...", "skill": "git-clone", "dependsOn": [] },
|
|
42
|
+
...
|
|
43
|
+
]
|
|
44
|
+
}`;
|
|
45
|
+
|
|
46
|
+
// 2. Call the LLM
|
|
47
|
+
const completion = await llm.chat({
|
|
48
|
+
messages: [
|
|
49
|
+
{ role: 'system', content: systemPrompt },
|
|
50
|
+
{ role: 'user', content: `Goal: ${userGoal}` }
|
|
51
|
+
]
|
|
52
|
+
});
|
|
53
|
+
|
|
54
|
+
// 3. Parse and Store
|
|
55
|
+
const plan = JSON.parse(completion.content);
|
|
56
|
+
for (const task of plan.tasks) {
|
|
57
|
+
await goalStore.addTask(task);
|
|
58
|
+
}
|
|
59
|
+
```
|
|
60
|
+
|
|
61
|
+
### Why This Works Better
|
|
62
|
+
- **Context Window**: By breaking things down, each step fits comfortably in the LLM's context window.
|
|
63
|
+
- **Error Recovery**: If "Set up Database" fails, the agent knows exactly *which* part failed and can retry just that task, instead of restarting the whole conversation.
|
|
64
|
+
- **Parallelism**: Tasks without dependencies can be executed simultaneously (e.g., designing the logo while the database spins up).
|
|
65
|
+
|
|
66
|
+
### Real-World Example
|
|
67
|
+
|
|
68
|
+
**Goal**: "Deploy to Vercel"
|
|
69
|
+
|
|
70
|
+
**Decomposition**:
|
|
71
|
+
1. Task #1: `vercel login` (Skill: `shell-exec`)
|
|
72
|
+
2. Task #2: `vercel link` (Skill: `shell-exec`, Depends on #1)
|
|
73
|
+
3. Task #3: `vercel build` (Skill: `shell-exec`, Depends on #2)
|
|
74
|
+
4. Task #4: `vercel deploy --prod` (Skill: `shell-exec`, Depends on #3)
|
|
75
|
+
|
|
76
|
+
This structured approach transforms vague intents into a reliable execution graph.
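To make the "execution graph" idea concrete, here is a minimal sketch that groups tasks into batches whose dependencies are already satisfied. The `PlannedTask` shape and `toBatches` are assumptions made for illustration (numeric ids are used for brevity) and are not the package's actual scheduler.

```typescript
// Hypothetical plan shape, loosely following the JSON format in the planning prompt.
interface PlannedTask {
  id: number;
  title: string;
  dependsOn: number[];
}

// Group tasks into batches: every task in a batch has all of its dependencies
// satisfied by earlier batches, so the tasks within one batch could run in parallel.
function toBatches(tasks: PlannedTask[]): PlannedTask[][] {
  const done: number[] = [];
  let remaining = tasks.slice();
  const batches: PlannedTask[][] = [];
  while (remaining.length > 0) {
    const ready = remaining.filter((t) => t.dependsOn.every((d) => done.indexOf(d) !== -1));
    if (ready.length === 0) {
      throw new Error("Cycle or missing dependency in plan");
    }
    batches.push(ready);
    ready.forEach((t) => done.push(t.id));
    remaining = remaining.filter((t) => ready.indexOf(t) === -1);
  }
  return batches;
}

const vercelPlan: PlannedTask[] = [
  { id: 1, title: "vercel login", dependsOn: [] },
  { id: 2, title: "vercel link", dependsOn: [1] },
  { id: 3, title: "vercel build", dependsOn: [2] },
  { id: 4, title: "vercel deploy --prod", dependsOn: [3] },
];

const vercelBatches = toBatches(vercelPlan);
```

For the Vercel plan every task depends on the previous one, so each batch holds exactly one task. A plan with two independent tasks would place both in the first batch.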
|
|
77
|
+
|
|
78
|
+
## Next Up: Execution
|
|
79
|
+
|
|
80
|
+
In **Part 3**, we'll build the engine that actually *does* the work: the **Skill Executor**. We'll learn how to let an AI safely run shell commands.
|
package/docs/articles/03-skill-execution.md
ADDED
|
@@ -0,0 +1,80 @@
|
|
|
1
|
+
# Building The Body: Skill Execution & Tools (Part 3/5)
|
|
2
|
+
|
|
3
|
+
In **Part 2**, we built the Brain that breaks goals into tasks. Now, in **Part 3**, we give our agent hands.
|
|
4
|
+
|
|
5
|
+
The **Executor** is the engine that actually *performs* actions. It takes a task like "Run tests" and translates it into real-world commands (`npm test`).
|
|
6
|
+
|
|
7
|
+
## Skills as Code
|
|
8
|
+
|
|
9
|
+
A "Skill" is simply a prompt that teaches the LLM how to use a specific tool. We store skills as markdown files (`prompt.md`).
|
|
10
|
+
|
|
11
|
+
Example Skill: `git-commit`
|
|
12
|
+
```markdown
|
|
13
|
+
# Git Commit Skill
|
|
14
|
+
|
|
15
|
+
## Usage
|
|
16
|
+
When asked to commit changes, determine the message based on diffs.
|
|
17
|
+
|
|
18
|
+
## Output Format
|
|
19
|
+
```bash
|
|
20
|
+
git add .
|
|
21
|
+
git commit -m "feat: updated user model"
|
|
22
|
+
```
|
|
23
|
+
```
|
|
24
|
+
|
|
25
|
+
This simple format allows anyone to add new capabilities without writing complex TypeScript logic.
|
|
26
|
+
|
|
27
|
+
## The `TaskExecutor`
|
|
28
|
+
|
|
29
|
+
The `TaskExecutor` class (`src/goals/executor.ts`) is responsible for:
|
|
30
|
+
1. **Reading Task**: "Commit changes"
|
|
31
|
+
2. **Matching Skill**: Find `git-commit` skill.
|
|
32
|
+
3. **Prompting LLM**: Combine task + skill + context into a prompt.
|
|
33
|
+
4. **Executing Commands**: Safely run the extracted shell commands.
|
|
34
|
+
|
|
35
|
+
### Key Logic:
|
|
36
|
+
|
|
37
|
+
```typescript
|
|
38
|
+
// 1. Prepare Prompt
|
|
39
|
+
const prompt = `You are executing task: "${task.title}".
|
|
40
|
+
Use the skill: "${task.skill}".
|
|
41
|
+
|
|
42
|
+
Current Directory: ${process.cwd()}
|
|
43
|
+
|
|
44
|
+
Write the exact shell commands needed inside a bash block.`;
|
|
45
|
+
|
|
46
|
+
// 2. Get Commands from LLM
|
|
47
|
+
const response = await llm.chat({ messages: [{ role: 'user', content: prompt }] });
|
|
48
|
+
const commands = extractBashBlocks(response.content);
|
|
49
|
+
|
|
50
|
+
// 3. Execute Safely
|
|
51
|
+
for (const cmd of commands) {
|
|
52
|
+
if (isDangerous(cmd)) {
|
|
53
|
+
await requestApproval(cmd); // Human-in-the-loop safety
|
|
54
|
+
}
|
|
55
|
+
const result = await execAsync(cmd);
|
|
56
|
+
task.output += result.stdout;
|
|
57
|
+
}
|
|
58
|
+
```
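The snippet above calls `extractBashBlocks` without showing it. One possible shape for such a helper is sketched below; it is an assumption for illustration, and the package's own implementation may differ. It treats every non-empty, non-comment line inside a `bash`, `sh`, or `shell` fence as one command, so multi-line constructs such as heredocs or backslash continuations are not handled.

```typescript
// Three backticks, written as escapes so this snippet can itself live inside a Markdown fence.
const TICKS = "\u0060\u0060\u0060";

// Hypothetical implementation; the real extractBashBlocks is not shown in the article.
function extractBashBlocks(markdown: string): string[] {
  const fence = new RegExp(TICKS + "(?:bash|sh|shell)[ \\t]*\\r?\\n([\\s\\S]*?)" + TICKS, "g");
  const commands: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = fence.exec(markdown)) !== null) {
    const lines = match[1].split(/\r?\n/);
    for (const line of lines) {
      const trimmed = line.trim();
      // Skip blank lines and shell comments.
      if (trimmed.length > 0 && trimmed.charAt(0) !== "#") {
        commands.push(trimmed);
      }
    }
  }
  return commands;
}
```

Returning an empty array when the reply contains no fenced block lets the caller treat "the model produced no commands" as a task failure instead of silently doing nothing.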
|
|
59
|
+
|
|
60
|
+
### Safety First: The Human-in-the-loop
|
|
61
|
+
|
|
62
|
+
Giving an AI shell access is inherently risky. We mitigate this with:
|
|
63
|
+
- **Permission Scoping**: Skills can be marked `requiresApproval: true`.
|
|
64
|
+
- **Command Blocklisting**: Certain commands (`rm -rf /`) are blocked by default.
|
|
65
|
+
- **Sandboxing**: (Future) Run commands in Docker containers.
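As a rough illustration of the blocking idea, an `isDangerous` check could be a short list of regular expressions. The patterns below are hypothetical examples, not the package's real rules, and a pattern list like this is easy to bypass, so it complements approval prompts and sandboxing but does not replace them.

```typescript
// Hypothetical patterns; the package's real rules are not shown in the article.
const DANGEROUS_PATTERNS: RegExp[] = [
  /\brm\s+(-[a-zA-Z]+\s+)*\/(\s|$)/,          // rm with any flags aimed at the filesystem root
  /\bmkfs(\.\w+)?\b/,                         // formatting a filesystem
  /\bdd\s+.*\bof=\/dev\//,                    // dd writing to a raw device
  /\bgit\s+push\b.*\s(--force|-f)(\s|$)/,     // force-pushing over remote history
];

// True when the command matches any pattern and should go through approval.
function isDangerous(cmd: string): boolean {
  return DANGEROUS_PATTERNS.some((pattern) => pattern.test(cmd));
}
```

Commands that match would be routed through `requestApproval` (or rejected outright), while everything else runs as before.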
|
|
66
|
+
|
|
67
|
+
## The Execution Loop
|
|
68
|
+
|
|
69
|
+
The agent runs in a loop via the `Daemon` (`src/daemon/service.ts`).
|
|
70
|
+
1. **Check Queue**: Any pending tasks?
|
|
71
|
+
2. **Pop Task**: Get highest priority task.
|
|
72
|
+
3. **Execute**: Run `TaskExecutor.execute(task)`.
|
|
73
|
+
4. **Analyze**: Did it succeed?
|
|
74
|
+
- **Success**: Mark task complete, update goal progress.
|
|
75
|
+
- **Failure**: Auto-retry or mark as failed.
|
|
76
|
+
5. **Log**: Record activity in persistent memory.
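One pass of this loop can be sketched as a single `tick` function. Everything here is assumed for illustration: the `QueuedTask` shape, the retry budget of three attempts, and the synchronous `execute` callback (the real `TaskExecutor.execute` would be asynchronous and the real `Daemon` is not shown in the article).

```typescript
// Hypothetical shapes; not taken from src/daemon/service.ts.
type TaskStatus = "pending" | "completed" | "failed";

interface QueuedTask {
  title: string;
  priority: number; // higher runs first
  attempts: number;
  status: TaskStatus;
}

const MAX_ATTEMPTS = 3; // assumed retry budget

// One pass of the loop: pick the highest-priority pending task, run it, and
// record the outcome. Returns a log line, or null when the queue is idle.
function tick(queue: QueuedTask[], execute: (task: QueuedTask) => boolean): string | null {
  const pending = queue
    .filter((t) => t.status === "pending")
    .sort((a, b) => b.priority - a.priority);
  if (pending.length === 0) return null;

  const task = pending[0];
  task.attempts += 1;
  let ok = false;
  try {
    ok = execute(task);
  } catch {
    ok = false; // treat a thrown error as a failed attempt
  }
  if (ok) {
    task.status = "completed";
  } else if (task.attempts >= MAX_ATTEMPTS) {
    task.status = "failed";
  } // otherwise the task stays pending and is retried on a later tick
  return `${task.title}: ${ok ? "ok" : "failed"} (attempt ${task.attempts})`;
}
```

A daemon would call `tick` on a timer and write each returned line to persistent memory, which covers the "Log" step.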
|
|
77
|
+
|
|
78
|
+
## Next Up: Memory
|
|
79
|
+
|
|
80
|
+
An agent needs to remember what it did yesterday. In **Part 4**, we'll build the **Memory Store** using SQLite and Vector Search.
|
package/docs/articles/04-memory-persistence.md
ADDED
|
@@ -0,0 +1,69 @@
|
|
|
1
|
+
# Building The Memory: Long-Term Context (Part 4/5)
|
|
2
|
+
|
|
3
|
+
In **Part 3**, we gave our agent hands to execute tasks. But an agent with amnesia is frustrating.
|
|
4
|
+
|
|
5
|
+
Imagine hiring a developer who forgets your project structure every morning. That's why we need **Long-Term Memory**.
|
|
6
|
+
|
|
7
|
+
## The Problem of Context
|
|
8
|
+
|
|
9
|
+
LLMs have limited context windows (e.g., 128k tokens). You can't fit your entire codebase, documentation, and history into every prompt. We need to selectively retrieve only relevant information.
|
|
10
|
+
|
|
11
|
+
## The Solution: SQLite + Semantic Search
|
|
12
|
+
|
|
13
|
+
We built a custom `MemoryStore` using `better-sqlite3`.
|
|
14
|
+
|
|
15
|
+
### 1. Structured Data (Relational)
|
|
16
|
+
Tasks, Goals, and Metrics fit perfectly into traditional SQL tables.
|
|
17
|
+
- `goals`: Track high-level objectives.
|
|
18
|
+
- `tasks`: Track individual steps and success/failure.
|
|
19
|
+
- `audit_events`: Immutable log of every action taken.
|
|
20
|
+
|
|
21
|
+
### 2. Unstructured Data (Semantic)
|
|
22
|
+
But what about "The login button is broken on mobile"? This is unstructured text.
|
|
23
|
+
We store this in a `memories` table with **Vector Embeddings** or **Full-Text Search (FTS5)**.
|
|
24
|
+
|
|
25
|
+
We chose FTS5 for simplicity and speed:
|
|
26
|
+
```sql
|
|
27
|
+
CREATE VIRTUAL TABLE memories_fts USING fts5(content, metadata);
|
|
28
|
+
```
|
|
29
|
+
|
|
30
|
+
When you search for "login bug", SQLite's FTS5 engine finds relevant rows instantly based on keywords, even in large datasets.
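FTS5 `MATCH` expects its own query syntax, so free text with punctuation can fail to parse if passed through unchanged. The helper below is a hedged sketch of one way to turn a search phrase into a `MATCH` expression. `toFtsQuery` is a hypothetical name, and the choices to OR the terms together and to keep only ASCII letters and digits are assumptions.

```typescript
// Hypothetical helper: turns free text into an FTS5 MATCH expression.
function toFtsQuery(text: string): string {
  const terms = text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((t) => t.length > 0);
  // Quote each term so FTS5 reads it as a plain string, not as query syntax.
  return terms.map((t) => `"${t}"`).join(" OR ");
}

// The result would be bound as a parameter in a query such as:
//   SELECT content FROM memories_fts WHERE memories_fts MATCH ? ORDER BY rank LIMIT 5;
const ftsQuery = toFtsQuery("login bug (mobile)");
```

An empty result (for example, when the input is only whitespace) is best handled by the caller by skipping the search rather than passing an empty string to `MATCH`.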
|
|
31
|
+
|
|
32
|
+
### The Implementation
|
|
33
|
+
|
|
34
|
+
In `src/memory/store.ts`:
|
|
35
|
+
|
|
36
|
+
```typescript
|
|
37
|
+
// 1. Save a Memory
|
|
38
|
+
memoryStore.save(
|
|
39
|
+
"The login button is broken on mobile Safari.",
|
|
40
|
+
"fact", // Type: fact, rule, learned
|
|
41
|
+
"web-app", // Domain/Project
|
|
42
|
+
["bug", "login", "mobile"] // Tags
|
|
43
|
+
);
|
|
44
|
+
|
|
45
|
+
// 2. Retrieve Context
|
|
46
|
+
const relevant = memoryStore.search("login issues", 5);
|
|
47
|
+
// Returns: [{ content: "The login button is broken...", score: ... }]
|
|
48
|
+
|
|
49
|
+
// 3. Use in Prompt
|
|
50
|
+
const prompt = `Task: Fix login bug.
|
|
51
|
+
Context:
|
|
52
|
+
${relevant.map(m => `- ${m.content}`).join('\n')}
|
|
53
|
+
|
|
54
|
+
Based on this context, write a fix.`;
|
|
55
|
+
```
|
|
56
|
+
|
|
57
|
+
## The Learning Loop
|
|
58
|
+
|
|
59
|
+
Every time a task completes successfully (or fails spectacularly), we record a **Learned Memory**.
|
|
60
|
+
|
|
61
|
+
- **Success**: "Using `image: node:18-alpine` fixed the build error."
|
|
62
|
+
- **Failure**: "Don't use `rm -rf` without checking path first."
|
|
63
|
+
|
|
64
|
+
The next time a similar task comes up ("Fix build error"), the agent searches its memory, finds the previous solution, and applies it automatically. This creates a compounding intelligence effect.
|
|
65
|
+
|
|
66
|
+
## Next Up: Self-Improvement
|
|
67
|
+
|
|
68
|
+
In the final **Part 5**, we'll cover the most exciting feature: **The Auto-Fixer**.
|
|
69
|
+
How can an agent detect its own bugs and rewrite its own code?
|
package/docs/articles/05-self-improvement.md
ADDED
|
@@ -0,0 +1,89 @@
|
|
|
1
|
+
# Building Self-Improvement: The Auto-Fixer (Part 5/5)
|
|
2
|
+
|
|
3
|
+
In **Parts 1-4**, we built an agent that can plan, execute, and remember. But like any junior developer, it will make mistakes.
|
|
4
|
+
|
|
5
|
+
What if the `npm-install` skill breaks because of a new error message format?
|
|
6
|
+
|
|
7
|
+
Normally, you'd fix the code. But an **Autonomous Agent** should fix *itself*.
|
|
8
|
+
|
|
9
|
+
## The Metrics
|
|
10
|
+
|
|
11
|
+
First, we need to know something is broken. We track **Skill Metrics** (`src/memory/store.ts`):
|
|
12
|
+
|
|
13
|
+
```sql
|
|
14
|
+
CREATE TABLE skill_metrics (
|
|
15
|
+
skill TEXT PRIMARY KEY,
|
|
16
|
+
calls INTEGER DEFAULT 0,
|
|
17
|
+
successes INTEGER DEFAULT 0,
|
|
18
|
+
failures INTEGER DEFAULT 0,
|
|
19
|
+
total_duration_ms INTEGER DEFAULT 0
|
|
20
|
+
);
|
|
21
|
+
```
|
|
22
|
+
|
|
23
|
+
Whenever a skill runs, we record the outcome:
|
|
24
|
+
- **Success Rate**: `successes / calls`
|
|
25
|
+
- **Avg Duration**: `total_duration_ms / calls`
|
|
26
|
+
|
|
27
|
+
If `git-commit` fails 5 times in a row, its success rate can drop below 20% (for example, 1 success in 6 calls is about 17%). This triggers an alert.
|
|
28
|
+
|
|
29
|
+
## The Doctor
|
|
30
|
+
|
|
31
|
+
We built `src/skills/doctor.ts`. Its job is to diagnose failing skills.
|
|
32
|
+
|
|
33
|
+
1. **Check Metrics**: Find skills with >5 failures or <50% success rate.
|
|
34
|
+
2. **Analyze Logs**: Retrieve specific error messages (e.g., "fatal: not a git repository").
|
|
35
|
+
3. **Generate Report**: "Skill `git-commit` is failing because it's running outside a repo."
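The "Check Metrics" step can be sketched directly from the `skill_metrics` columns and the two thresholds named above. The function names are hypothetical, and treating a never-called skill as healthy is an assumption made to avoid dividing by zero.

```typescript
// Mirrors the columns of the skill_metrics table.
interface SkillMetrics {
  skill: string;
  calls: number;
  successes: number;
  failures: number;
  total_duration_ms: number;
}

function successRate(m: SkillMetrics): number {
  // Assumption: a skill that has never run is treated as healthy.
  return m.calls === 0 ? 1 : m.successes / m.calls;
}

function avgDurationMs(m: SkillMetrics): number {
  return m.calls === 0 ? 0 : m.total_duration_ms / m.calls;
}

// Thresholds from the "Check Metrics" step: more than 5 failures or under 50% success.
function findFailingSkills(all: SkillMetrics[]): string[] {
  return all
    .filter((m) => m.failures > 5 || successRate(m) < 0.5)
    .map((m) => m.skill);
}
```

Note that the two conditions catch different problems: the failure count flags heavily used skills that still fail often in absolute terms, while the success rate flags rarely used skills that fail most of the time.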
|
|
36
|
+
|
|
37
|
+
## The Auto-Fixer
|
|
38
|
+
|
|
39
|
+
The magic happens in `doctor.fix(skillName)`. It uses the LLM to patch the code.
|
|
40
|
+
|
|
41
|
+
### 1. Construct Prompt
|
|
42
|
+
```typescript
|
|
43
|
+
const prompt = `You act as an AI Tool Developer.
|
|
44
|
+
The skill "${skillName}" is failing repeatedly.
|
|
45
|
+
|
|
46
|
+
Current Source (prompt.md):
|
|
47
|
+
${currentCode}
|
|
48
|
+
|
|
49
|
+
Recent Errors:
|
|
50
|
+
- ${error1}
|
|
51
|
+
- ${error2}
|
|
52
|
+
|
|
53
|
+
Rewrite the prompt to handle these errors.`;
|
|
54
|
+
```
|
|
55
|
+
|
|
56
|
+
### 2. Generate Patch
|
|
57
|
+
The LLM reads the errors ("not a git repository") and decides to add a check:
|
|
58
|
+
```markdown
|
|
59
|
+
# Git Commit Skill
|
|
60
|
+
## Updated Instructions
|
|
61
|
+
1. Run `git status` first to verify repo.
|
|
62
|
+
2. If there are changes, add them.
|
|
63
|
+
3. Commit.
|
|
64
|
+
```
|
|
65
|
+
|
|
66
|
+
### 3. Apply & Reload
|
|
67
|
+
The agent overwrites `prompt.md` with the new version and reloads the skill instantly. The next execution uses the fixed logic.
|
|
68
|
+
|
|
69
|
+
## Conclusion
|
|
70
|
+
|
|
71
|
+
We have built a system that:
|
|
72
|
+
1. **Decomposes** vague goals into tasks.
|
|
73
|
+
2. **Executes** tasks using defined skills.
|
|
74
|
+
3. **Remembers** context and learnings.
|
|
75
|
+
4. **Monitors** itself and **Fixes** its own tools.
|
|
76
|
+
|
|
77
|
+
This loop (Plan, Do, Check, Act) is the foundation of autonomy.
|
|
78
|
+
|
|
79
|
+
### Series Recap
|
|
80
|
+
|
|
81
|
+
- **Part 1**: Architecture & Vision
|
|
82
|
+
- **Part 2**: The Brain (Goal Decomposition)
|
|
83
|
+
- **Part 3**: The Body (Skill Execution)
|
|
84
|
+
- **Part 4**: The Memory (Persistence)
|
|
85
|
+
- **Part 5**: Self-Improvement (Auto-Fixer)
|
|
86
|
+
|
|
87
|
+
You can explore the full source code and contribute at [GitHub](https://github.com/praveencs87/agent).
|
|
88
|
+
|
|
89
|
+
Happy Hacking!
|
package/package.json
CHANGED
|
@@ -1,10 +1,12 @@
|
|
|
1
1
|
{
|
|
2
2
|
"name": "@praveencs/agent",
|
|
3
|
-
"version": "0.7.
|
|
3
|
+
"version": "0.7.4",
|
|
4
4
|
"files": [
|
|
5
5
|
"dist",
|
|
6
6
|
"bin",
|
|
7
7
|
"README.md",
|
|
8
|
+
"ROADMAP.md",
|
|
9
|
+
"docs",
|
|
8
10
|
"src/index.d.ts"
|
|
9
11
|
],
|
|
10
12
|
"description": "CLI agent runtime with Skill Hub, Plan Files, and permissioned tools",
|