@praveencs/agent 0.7.2 → 0.7.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -174,6 +174,17 @@ The runtime consists of several modular components:
 
  ---
 
+
+ ## 📚 Learning Series
+
+ Want to understand how this agent works under the hood? Check out our 5-part architecture series:
+
+ 1. [**Vision & Architecture**](docs/articles/01-vision-architecture.md) - The high-level design.
+ 2. [**The Brain (Planner)**](docs/articles/02-goal-decomposition.md) - How goal decomposition works.
+ 3. [**The Body (Executor)**](docs/articles/03-skill-execution.md) - Secure skill execution.
+ 4. [**Memory & Context**](docs/articles/04-memory-persistence.md) - SQLite & vector storage.
+ 5. [**Self-Improvement**](docs/articles/05-self-improvement.md) - Metrics & the Auto-Fixer.
+
  ## 🤝 Contributing
 
  We welcome contributions! Please see [CONTRIBUTING.md](CONTRIBUTING.md) for details on how to set up your development environment.
@@ -18,7 +18,7 @@ export function createCLI() {
  program
  .name('agent')
  .description('Agent Runtime — autonomous, goal-oriented AI agent with skills, plans, memory, and permissioned tools')
- .version('0.7.2')
+ .version('0.7.3')
  .option('--verbose', 'Enable verbose output')
  .option('--no-color', 'Disable colored output')
  .option('--config <path>', 'Path to config file');
@@ -0,0 +1,52 @@
+ # Building an Autonomous AI Agent: The Vision & Architecture (Part 1/5)
+
+ Most AI projects today are chatbots. You type, they type back. But what if your AI could *act* on your behalf? What if it could plan a project, check your code, deploy your app, and even fix itself when things break—all while you sleep?
+
+ This is the promise of **Autonomous Agents**. Unlike chatbots, agents have:
+ 1. **Goals** (long-term objectives)
+ 2. **Memory** (persistence across sessions)
+ 3. **Skills** (integrations with real-world tools)
+ 4. **Autonomy** (looping execution without constant prompts)
+
+ In this 5-part series, we will break down exactly how we built `@praveencs/agent`, a powerful, open-source autonomous agent runtime.
+
+ ## 🏗️ The Core Architecture
+
+ To build a truly functional agent, we need several distinct components working in harmony. We'll use a modular architecture:
+
+ ### 1. The Brain (Planner)
+ Standard LLMs (like GPT-4) are great at answering questions but terrible at long-term execution. To fix this, we need a **Planner**.
+ - **Role**: Take a high-level goal ("Build a blog") and decompose it into small, atomic tasks ("Create Next.js app", "Set up DB", "Write About page").
+ - **Innovation**: We use a recursive decomposition strategy where the LLM acts as a project manager.
+
+ ### 2. The Body (Executor)
+ Once we have a list of tasks, something needs to *do* them. This is the **Executor**.
+ - **Role**: Pick up the next task, determine the right tool (Skill) to use, and execute it.
+ - **Innovation**: We treat shell commands as first-class citizens. The agent can run `npm install`, `git commit`, or `docker build` just like a human developer.
+
+ ### 3. The Memory (Context)
+ A developer who forgets the codebase every morning is useless. Our agent needs **Memory**.
+ - **Role**: Store project context ("This is a TypeScript project"), facts ("Staging IP is 10.0.0.5"), and learnings ("The last build failed because of a missing dependency").
+ - **Innovation**: We use SQLite with FTS5 (Full-Text Search) and a JSON-based vector-like storage to quickly retrieve relevant context for every task.
+
+ ### 4. The Daemon (Autonomy)
+ The secret sauce is the loop. A chatbot waits for input. An agent runs in a loop.
+ - **Role**: A background process that constantly checks for pending tasks, file changes, or new goals.
+
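The daemon's role above can be sketched as a tiny polling loop. This is a minimal stand-in, not the package's actual `src/daemon/service.ts`: the `TaskQueue` class and `execute` function below are hypothetical placeholders so the sketch runs on its own.

```typescript
// Minimal sketch of the daemon's polling loop.
// `TaskQueue` and `execute` are illustrative stand-ins, not the package's real API.
type Task = { id: number; title: string };

class TaskQueue {
  private tasks: Task[] = [];
  push(task: Task) { this.tasks.push(task); }
  pop(): Task | undefined { return this.tasks.shift(); }
  get pending(): number { return this.tasks.length; }
}

async function execute(task: Task): Promise<void> {
  console.log(`executing: ${task.title}`);
}

// One "tick" of the daemon: drain whatever is pending, then sleep until the next tick.
async function tick(queue: TaskQueue): Promise<number> {
  let done = 0;
  while (queue.pending > 0) {
    const task = queue.pop()!;
    await execute(task);
    done++;
  }
  return done;
}

const queue = new TaskQueue();
queue.push({ id: 1, title: "Set up Next.js" });
queue.push({ id: 2, title: "Create DB schema" });
tick(queue).then(n => console.log(`completed ${n} task(s)`));
```

A real daemon would wrap `tick` in `setInterval` (or a cron-like scheduler) and add file-watching triggers on top.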
+ ## 🛠️ Tech Stack
+
+ We're building this in **TypeScript** (Node.js) because:
+ - **Ecosystem**: Access to millions of npm packages.
+ - **Safety**: Strong typing catches many errors at compile time instead of at runtime.
+ - **Performance**: The Node.js runtime has been validated through years of production usage.
+
+ **Key Libraries:**
+ - `better-sqlite3`: Fast, synchronous SQLite access.
+ - `commander`: Powerful CLI framework.
+ - The `openai` / `anthropic` SDKs: Interface to the LLM brains.
+
+ ## 🚀 What's Next?
+
+ In **Part 2**, we will dive into the code for **The Brain**. We'll write the `GoalDecomposer` class that turns vague requests into structured project plans.
+
+ Stay tuned!
@@ -0,0 +1,80 @@
+ # Building The Brain: AI Goal Decomposition (Part 2/5)
+
+ In **Part 1**, we saw that the key difference between a chatbot and an agent is structure.
+
+ A chatbot might say: "Here is a list of steps to build a blog."
+ An agent says: "I have created 5 pending tasks for you to approve or let me execute."
+
+ This transformation happens in the **Planner**.
+
+ ## 🧠 The `GoalDecomposer`
+
+ ### The Problem
+ LLMs are notoriously bad at holding long chains of reasoning. If you say "Build me a Facebook clone", they might spit out 200 hallucinated lines of code and then stop.
+
+ ### The Solution: Recursion
+ Instead of one massive prompt, we use **Recursive Decomposition**.
+ 1. **Receive Goal**: "Build a blog app"
+ 2. **Decompose**: Break it into 3-5 high-level tasks.
+ - "Set up Next.js"
+ - "Create Database schema"
+ - "Implement styling"
+ 3. **Refine**: If a task is too complex ("Implement styling" is huge), decompose *that* task further.
+
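The refine step can be expressed as a small recursive wrapper around a single-level decomposer. A sketch under stated assumptions: `decomposeOnce` is a canned stand-in for the LLM call (so the example runs), and the `complex` flag stands in for whatever complexity heuristic the planner uses.

```typescript
// Recursive decomposition sketch. `decomposeOnce` stands in for the LLM call;
// a real implementation would prompt the model instead of returning canned tasks.
type Task = { title: string; complex?: boolean };

function decomposeOnce(goal: string): Task[] {
  if (goal === "Build a blog app") {
    return [
      { title: "Set up Next.js" },
      { title: "Implement styling", complex: true },
    ];
  }
  if (goal === "Implement styling") {
    return [{ title: "Install Tailwind" }, { title: "Create base layout" }];
  }
  return [{ title: goal }];
}

function decompose(goal: string, depth = 0): Task[] {
  if (depth >= 2) return [{ title: goal }]; // guard against runaway recursion
  const tasks = decomposeOnce(goal);
  // Refine: any task flagged as too complex gets decomposed again.
  return tasks.flatMap(t => (t.complex ? decompose(t.title, depth + 1) : [t]));
}

console.log(decompose("Build a blog app").map(t => t.title));
// ["Set up Next.js", "Install Tailwind", "Create base layout"]
```

The depth guard matters: without it, an LLM that keeps flagging tasks as "complex" would recurse forever.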
+ ### 💻 The Code
+
+ We built `src/goals/decomposer.ts` to handle this. Here's the core logic:
+
+ ```typescript
+ // 1. Construct the planning prompt
+ const systemPrompt = `You are an expert project manager AI.
+ Your goal is to break down a high-level objective into actionable, atomic tasks.
+
+ Rules:
+ 1. Each task must be executable by a single skill (e.g., shell command, file write).
+ 2. Define dependencies (Task B depends on Task A).
+ 3. If a task is dangerous (e.g., delete DB), mark it 'requiresApproval: true'.
+
+ Output ONLY valid JSON:
+ {
+   "tasks": [
+     { "title": "...", "skill": "git-clone", "dependsOn": [] },
+     ...
+   ]
+ }`;
+
+ // 2. Call the LLM
+ const completion = await llm.chat({
+   messages: [
+     { role: 'system', content: systemPrompt },
+     { role: 'user', content: `Goal: ${userGoal}` }
+   ]
+ });
+
+ // 3. Parse and Store
+ const plan = JSON.parse(completion.content);
+ for (const task of plan.tasks) {
+   await goalStore.addTask(task);
+ }
+ ```
+
+ ### Why This Works Better
+ - **Context Window**: By breaking things down, each step fits comfortably in the LLM's context window.
+ - **Error Recovery**: If "Set up Database" fails, the agent knows exactly *which* part failed and can retry just that task, instead of restarting the whole conversation.
+ - **Parallelism**: Tasks without dependencies can be executed simultaneously (e.g., designing the logo while the database spins up).
+
+ ### Real-World Example
+
+ **Goal**: "Deploy to Vercel"
+
+ **Decomposition**:
+ 1. Task #1: `vercel login` (Skill: `shell-exec`)
+ 2. Task #2: `vercel link` (Skill: `shell-exec`, Depends on #1)
+ 3. Task #3: `vercel build` (Skill: `shell-exec`, Depends on #2)
+ 4. Task #4: `vercel deploy --prod` (Skill: `shell-exec`, Depends on #3)
+
+ This structured approach transforms vague intents into a reliable execution graph.
+
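Because each task carries a `dependsOn` list, the plan is a directed acyclic graph, and a safe run order falls out of a topological sort. A sketch (here `dependsOn` holds task indices for brevity; this is an illustration, not the package's executor):

```typescript
// Derive an execution order from dependsOn edges via Kahn's algorithm.
type PlannedTask = { title: string; dependsOn: number[] };

function executionOrder(tasks: PlannedTask[]): string[] {
  const indegree = tasks.map(t => t.dependsOn.length);
  const dependents: number[][] = tasks.map(() => []);
  tasks.forEach((t, i) => t.dependsOn.forEach(d => dependents[d].push(i)));

  // Start with tasks that have no unmet dependencies.
  const ready = indegree.flatMap((deg, i) => (deg === 0 ? [i] : []));
  const order: string[] = [];
  while (ready.length > 0) {
    const i = ready.shift()!;
    order.push(tasks[i].title);
    for (const j of dependents[i]) {
      if (--indegree[j] === 0) ready.push(j);
    }
  }
  if (order.length !== tasks.length) throw new Error("dependency cycle detected");
  return order;
}

const plan: PlannedTask[] = [
  { title: "vercel login", dependsOn: [] },
  { title: "vercel link", dependsOn: [0] },
  { title: "vercel build", dependsOn: [1] },
  { title: "vercel deploy --prod", dependsOn: [2] },
];
console.log(executionOrder(plan));
// ["vercel login", "vercel link", "vercel build", "vercel deploy --prod"]
```

Tasks that become `ready` at the same time are exactly the ones that could run in parallel.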
+ ## 🚀 Next Up: Execution
+
+ In **Part 3**, we'll build the engine that actually *does* the work: the **Skill Executor**. We'll learn how to let an AI safely run shell commands.
@@ -0,0 +1,80 @@
+ # Building The Body: Skill Execution & Tools (Part 3/5)
+
+ In **Part 2**, we built the Brain that breaks goals into tasks. Now, in **Part 3**, we give our agent hands.
+
+ The **Executor** is the engine that actually *performs* actions. It takes a task like "Run tests" and translates it into real-world commands (`npm test`).
+
+ ## 🛠️ Skills as Code
+
+ A "Skill" is simply a prompt that teaches the LLM how to use a specific tool. We store skills as markdown files (`prompt.md`).
+
+ Example Skill: `git-commit`
+ ````markdown
+ # Git Commit Skill
+
+ ## Usage
+ When asked to commit changes, determine the message based on diffs.
+
+ ## Output Format
+ ```bash
+ git add .
+ git commit -m "feat: updated user model"
+ ```
+ ````
+
+ This simple format allows anyone to add new capabilities without writing complex TypeScript logic.
+
+ ## ⚡ The `TaskExecutor`
+
+ The `TaskExecutor` class (`src/goals/executor.ts`) is responsible for:
+ 1. **Reading the Task**: "Commit changes"
+ 2. **Matching a Skill**: Find the `git-commit` skill.
+ 3. **Prompting the LLM**: Combine task + skill + context into a prompt.
+ 4. **Executing Commands**: Safely run the extracted shell commands.
+
+ ### Key Logic:
+
+ ```typescript
+ // 1. Prepare Prompt
+ const prompt = `You are executing task: "${task.title}".
+ Use the skill: "${task.skill}".
+
+ Current Directory: ${process.cwd()}
+
+ Write the exact shell commands needed inside a bash block.`;
+
+ // 2. Get Commands from LLM
+ const response = await llm.chat({ messages: [{ role: 'user', content: prompt }] });
+ const commands = extractBashBlocks(response.content);
+
+ // 3. Execute Safely
+ for (const cmd of commands) {
+   if (isDangerous(cmd)) {
+     await requestApproval(cmd); // Human-in-the-loop safety
+   }
+   const result = await execAsync(cmd);
+   task.output += result.stdout;
+ }
+ ```
+
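The `extractBashBlocks` helper used above can be a small regex over the LLM's markdown reply. A sketch of what such a helper could look like (the package's real implementation may differ):

```typescript
// Pull the contents of every ```bash fenced block out of an LLM response.
function extractBashBlocks(markdown: string): string[] {
  const blocks: string[] = [];
  const fence = /```bash\n([\s\S]*?)```/g;
  let match: RegExpExecArray | null;
  while ((match = fence.exec(markdown)) !== null) {
    blocks.push(match[1].trim());
  }
  return blocks;
}

const reply = "Run these:\n```bash\ngit add .\ngit commit -m \"fix\"\n```\nDone.";
console.log(extractBashBlocks(reply));
// ["git add .\ngit commit -m \"fix\""]
```

The non-greedy `*?` keeps each match bounded to one block, so multiple bash blocks in one reply come back as separate commands.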
+ ### Safety First: The Human-in-the-Loop
+
+ Giving an AI shell access is inherently risky. We mitigate this with:
+ - **Permission Scoping**: Skills can be marked `requiresApproval: true`.
+ - **Command Blocklisting**: Certain commands (`rm -rf /`) are blocked by default.
+ - **Sandboxing**: (Future) Run commands in Docker containers.
+
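The blocklist check can start as a simple pattern match. A sketch, with illustrative patterns that are not the package's actual list:

```typescript
// Flag commands that should never run without human approval.
// These patterns are examples, not an exhaustive or official blocklist.
const DANGEROUS_PATTERNS: RegExp[] = [
  /\brm\s+-rf\s+\//,      // recursive delete from an absolute path
  /\bmkfs\b/,             // format a filesystem
  /\bdd\s+.*of=\/dev\//,  // raw writes to block devices
  /\bshutdown\b|\breboot\b/,
];

function isDangerous(cmd: string): boolean {
  return DANGEROUS_PATTERNS.some(p => p.test(cmd));
}

console.log(isDangerous("rm -rf /"));        // true
console.log(isDangerous("git commit -m x")); // false
```

Pattern matching alone is easy to evade (aliases, `sh -c`, encoded strings), which is why the approval gate and the planned Docker sandboxing exist as additional layers.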
+ ## 🔄 The Execution Loop
+
+ The agent runs in a loop via the `Daemon` (`src/daemon/service.ts`).
+ 1. **Check Queue**: Any pending tasks?
+ 2. **Pop Task**: Get the highest-priority task.
+ 3. **Execute**: Run `TaskExecutor.execute(task)`.
+ 4. **Analyze**: Did it succeed?
+ - **Success**: Mark the task complete, update goal progress.
+ - **Failure**: Auto-retry or mark as failed.
+ 5. **Log**: Record activity in persistent memory.
+
+ ## 🚀 Next Up: Memory
+
+ An agent needs to remember what it did yesterday. In **Part 4**, we'll build the **Memory Store** using SQLite and Vector Search.
@@ -0,0 +1,69 @@
+ # Building The Memory: Long-Term Context (Part 4/5)
+
+ In **Part 3**, we gave our agent hands to execute tasks. But an agent with amnesia is frustrating.
+
+ Imagine hiring a developer who forgets your project structure every morning. That's why we need **Long-Term Memory**.
+
+ ## 🧠 The Problem of Context
+
+ LLMs have limited context windows (e.g., 128k tokens). You can't fit your entire codebase, documentation, and history into every prompt. We need to selectively retrieve only relevant information.
+
+ ## 💾 The Solution: SQLite + Semantic Search
+
+ We built a custom `MemoryStore` using `better-sqlite3`.
+
+ ### 1. Structured Data (Relational)
+ Tasks, Goals, and Metrics fit perfectly into traditional SQL tables.
+ - `goals`: Track high-level objectives.
+ - `tasks`: Track individual steps and success/failure.
+ - `audit_events`: An immutable log of every action taken.
+
+ ### 2. Unstructured Data (Semantic)
+ But what about "The login button is broken on mobile"? This is unstructured text.
+ We store this in a `memories` table with **Vector Embeddings** or **Full-Text Search (FTS5)**.
+
+ We chose FTS5 for simplicity and speed:
+ ```sql
+ CREATE VIRTUAL TABLE memories_fts USING fts5(content, metadata);
+ ```
+
+ When you search for "login bug", SQLite's FTS5 engine finds relevant rows instantly based on keywords, even in large datasets.
+
+ ### 💻 The Implementation
+
+ In `src/memory/store.ts`:
+
+ ```typescript
+ // 1. Save a Memory
+ memoryStore.save(
+   "The login button is broken on mobile Safari.",
+   "fact",       // Type: fact, rule, learned
+   "web-app",    // Domain/Project
+   ["bug", "login", "mobile"] // Tags
+ );
+
+ // 2. Retrieve Context
+ const relevant = memoryStore.search("login issues", 5);
+ // Returns: [{ content: "The login button is broken...", score: ... }]
+
+ // 3. Use in Prompt
+ const prompt = `Task: Fix login bug.
+ Context:
+ ${relevant.map(m => `- ${m.content}`).join('\n')}
+
+ Based on this context, write a fix.`;
+ ```
+
+ ## 🔄 The Learning Loop
+
+ Every time a task completes successfully (or fails spectacularly), we record a **Learned Memory**.
+
+ - **Success**: "Using `image: node:18-alpine` fixed the build error."
+ - **Failure**: "Don't use `rm -rf` without checking the path first."
+
+ The next time a similar task comes up ("Fix build error"), the agent searches its memory, finds the previous solution, and applies it automatically. This creates a compounding intelligence effect.
+
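To make the retrieval step concrete without a database, here is a keyword-overlap scorer that mimics what the FTS5 query does. This is a simplified in-memory stand-in, not the package's `MemoryStore`:

```typescript
// In-memory stand-in for MemoryStore.search: rank memories by keyword overlap.
type Memory = { content: string; tags: string[] };

function search(memories: Memory[], query: string, limit: number) {
  const terms = query.toLowerCase().split(/\s+/);
  return memories
    .map(m => {
      const haystack = (m.content + " " + m.tags.join(" ")).toLowerCase();
      // Score = number of query terms found in the content or tags.
      const score = terms.filter(t => haystack.includes(t)).length;
      return { content: m.content, score };
    })
    .filter(r => r.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}

const memories: Memory[] = [
  { content: "The login button is broken on mobile Safari.", tags: ["bug", "login"] },
  { content: "Staging IP is 10.0.0.5", tags: ["infra"] },
];
console.log(search(memories, "login bug", 5));
// [{ content: "The login button is broken on mobile Safari.", score: 2 }]
```

FTS5 does the same job with an inverted index (plus stemming and BM25-style ranking), which is why it stays fast as the `memories` table grows.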
+ ## 🚀 Next Up: Self-Improvement
+
+ In the final **Part 5**, we'll cover the most exciting feature: **The Auto-Fixer**.
+ How can an agent detect its own bugs and rewrite its own code?
@@ -0,0 +1,89 @@
+ # Building Self-Improvement: The Auto-Fixer (Part 5/5)
+
+ In **Parts 1-4**, we built an agent that can plan, execute, and remember. But like any junior developer, it will make mistakes.
+
+ What if the `npm-install` skill breaks because of a new error message format?
+
+ Normally, you'd fix the code. But an **Autonomous Agent** should fix *itself*.
+
+ ## ❤️ The Metrics
+
+ First, we need to know something is broken. We track **Skill Metrics** (`src/memory/store.ts`):
+
+ ```sql
+ CREATE TABLE skill_metrics (
+   skill TEXT PRIMARY KEY,
+   calls INTEGER DEFAULT 0,
+   successes INTEGER DEFAULT 0,
+   failures INTEGER DEFAULT 0,
+   total_duration_ms INTEGER DEFAULT 0
+ );
+ ```
+
+ Whenever a skill runs, we record the outcome:
+ - **Success Rate**: `successes / calls`
+ - **Avg Duration**: `total_duration_ms / calls`
+
+ If `git-commit` fails 5 times in a row, its success rate can quickly drop below 20%. This triggers an alert.
+
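Recording and evaluating these counters is a few lines of arithmetic. A sketch, with an in-memory map standing in for the `skill_metrics` table and thresholds matching the ">5 failures or <50% success rate" rule the Doctor uses:

```typescript
// In-memory stand-in for the skill_metrics table.
type SkillMetrics = { calls: number; successes: number; failures: number; totalDurationMs: number };

const metrics = new Map<string, SkillMetrics>();

function record(skill: string, ok: boolean, durationMs: number): void {
  const m = metrics.get(skill) ?? { calls: 0, successes: 0, failures: 0, totalDurationMs: 0 };
  m.calls++;
  ok ? m.successes++ : m.failures++;
  m.totalDurationMs += durationMs;
  metrics.set(skill, m);
}

// A skill is "unhealthy" with more than 5 failures or under a 50% success rate.
function isUnhealthy(skill: string): boolean {
  const m = metrics.get(skill);
  if (!m || m.calls === 0) return false;
  return m.failures > 5 || m.successes / m.calls < 0.5;
}

record("git-commit", true, 120);
for (let i = 0; i < 5; i++) record("git-commit", false, 80);
console.log(isUnhealthy("git-commit")); // true: 1 success in 6 calls
```

Note the rate depends on history: five straight failures on top of one success gives roughly 17%, but on top of twenty successes the skill would still look healthy, which is why the absolute failure count is checked too.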
+ ## 🩺 The Doctor
+
+ We built `src/skills/doctor.ts`. Its job is to diagnose failing skills.
+
+ 1. **Check Metrics**: Find skills with >5 failures or <50% success rate.
+ 2. **Analyze Logs**: Retrieve specific error messages (e.g., "fatal: not a git repository").
+ 3. **Generate Report**: "Skill `git-commit` is failing because it's running outside a repo."
+
+ ## 🔧 The Auto-Fixer
+
+ The magic happens in `doctor.fix(skillName)`. It uses the LLM to patch the code.
+
+ ### 1. Construct Prompt
+ ```typescript
+ const prompt = `You act as an AI Tool Developer.
+ The skill "${skillName}" is failing repeatedly.
+
+ Current Source (prompt.md):
+ ${currentCode}
+
+ Recent Errors:
+ - ${error1}
+ - ${error2}
+
+ Rewrite the prompt to handle these errors.`;
+ ```
+
+ ### 2. Generate Patch
+ The LLM reads the errors ("not a git repository") and decides to add a check:
+ ```markdown
+ # Git Commit Skill
+ ## Updated Instructions
+ 1. Run `git status` first to verify the repo.
+ 2. If there are changes, add them.
+ 3. Commit.
+ ```
+
+ ### 3. Apply & Reload
+ The agent overwrites `prompt.md` with the new version and reloads the skill instantly. The next execution uses the fixed logic.
+
+ ## 🚀 Conclusion
+
+ We have built a system that:
+ 1. **Decomposes** vague goals into tasks.
+ 2. **Executes** tasks using defined skills.
+ 3. **Remembers** context and learnings.
+ 4. **Monitors** itself and **Fixes** its own tools.
+
+ This loop—Plan, Do, Check, Act—is the foundation of autonomy.
+
+ ### 📚 Series Recap
+
+ - **Part 1**: Architecture & Vision
+ - **Part 2**: The Brain (Goal Decomposition)
+ - **Part 3**: The Body (Skill Execution)
+ - **Part 4**: The Memory (Persistence)
+ - **Part 5**: Self-Improvement (Auto-Fixer)
+
+ You can explore the full source code and contribute at [GitHub](https://github.com/praveencs87/agent).
+
+ Happy Hacking!
package/package.json CHANGED
@@ -1,10 +1,11 @@
  {
  "name": "@praveencs/agent",
- "version": "0.7.2",
+ "version": "0.7.3",
  "files": [
  "dist",
  "bin",
  "README.md",
+ "docs",
  "src/index.d.ts"
  ],
  "description": "CLI agent runtime with Skill Hub, Plan Files, and permissioned tools",