pi-smart-reader 0.1.0
- package/README.md +77 -0
- package/benchmark.ts +55 -0
- package/benchmark_file.ts +139 -0
- package/docs/proposal.md +32 -0
- package/docs/spec.md +60 -0
- package/package.json +38 -0
- package/src/extractor.ts +91 -0
- package/src/index.ts +90 -0
- package/src/parser.ts +45 -0
- package/src/skeleton.ts +69 -0
- package/src/types/pi-coding-agent.d.ts +18 -0
- package/src/types/web-tree-sitter.d.ts +30 -0
- package/tasks.md +29 -0
- package/tsconfig.json +16 -0
- package/wasm/tree-sitter-typescript.wasm +1 -0
- package/wasm/tree-sitter.wasm +1 -0
package/README.md
ADDED
|
@@ -0,0 +1,77 @@
|
|
|
1
|
+
# pi-smart-reader 🔍
|
|
2
|
+
|
|
3
|
+
A structural code analysis extension for [Pi](https://pi.dev/) that eliminates "token bloat" by providing skeletal views and targeted symbol extraction.
|
|
4
|
+
|
|
5
|
+
[Pi packages](https://pi.dev/packages)
|
|
6
|
+
[License: MIT](https://opensource.org/licenses/MIT)
|
|
7
|
+
[npm package](https://www.npmjs.com/package/pi-smart-reader)
|
|
8
|
+
|
|
9
|
+
## 🚩 The Problem
|
|
10
|
+
Reading entire files is one of the most expensive operations for an AI agent. In large codebases, loading a 2,000-line file just to understand a single function:
|
|
11
|
+
- **Wastes Tokens**: Consumes a huge portion of the context window.
|
|
12
|
+
- **Dilutes Attention**: Buries the "signal" in a sea of "noise," leading to hallucinations or missed details.
|
|
13
|
+
- **Increases Cost**: Significantly raises the token count per request.
|
|
14
|
+
|
|
15
|
+
## ✨ The Solution
|
|
16
|
+
`pi-smart-reader` replaces blind reading with **Structural Extraction**. Instead of reading the whole file, the agent can now "skim" the API and surgically extract only the necessary logic.
|
|
17
|
+
|
|
18
|
+
## 🚀 Features
|
|
19
|
+
|
|
20
|
+
### 1. Skeleton View (`mode: "skeleton"`)
|
|
21
|
+
Generates a high-level map of the file. It preserves all class and function signatures but strips the implementation bodies.
|
|
22
|
+
- **Example**: 2,000 lines of code $\to$ 50 lines of API signatures.
|
|
23
|
+
- **Benefit**: Allows the agent to understand the file's capabilities without loading the whole content.
|
|
24
|
+
|
|
25
|
+
### 2. Targeted Symbol Extraction (`mode: "symbol"`)
|
|
26
|
+
Extracts the exact source code for a specific function, method, or variable.
|
|
27
|
+
- **Precision**: Uses AST (Abstract Syntax Tree) parsing to find the precise byte range of the symbol.
|
|
28
|
+
- **Efficiency**: Loads only the required logic into the context.
|
|
29
|
+
|
|
30
|
+
### 3. Dependency Awareness
|
|
31
|
+
When extracting a symbol, the tool scans the function body for calls to other symbols within the same file and suggests them as related dependencies.
|
|
32
|
+
|
|
33
|
+
## 🛠️ Installation
|
|
34
|
+
|
|
35
|
+
```bash
|
|
36
|
+
pi install npm:pi-smart-reader
|
|
37
|
+
```
|
|
38
|
+
|
|
39
|
+
## 📖 Quick Start
|
|
40
|
+
|
|
41
|
+
The extension adds the `smart_read` tool to your agent.
|
|
42
|
+
|
|
43
|
+
### Step 1: Get the Skeleton
|
|
44
|
+
Instead of `read(path)`, use:
|
|
45
|
+
```json
|
|
46
|
+
{
|
|
47
|
+
"tool": "smart_read",
|
|
48
|
+
"input": {
|
|
49
|
+
"path": "src/services/UserService.ts",
|
|
50
|
+
"options": { "mode": "skeleton" }
|
|
51
|
+
}
|
|
52
|
+
}
|
|
53
|
+
```
|
|
54
|
+
*The agent now sees all available methods without the noise of their implementations.*
|
|
55
|
+
|
|
56
|
+
### Step 2: Extract the Logic
|
|
57
|
+
Once the agent identifies the target method (e.g., `validateToken`), it extracts it:
|
|
58
|
+
```json
|
|
59
|
+
{
|
|
60
|
+
"tool": "smart_read",
|
|
61
|
+
"input": {
|
|
62
|
+
"path": "src/services/UserService.ts",
|
|
63
|
+
"options": {
|
|
64
|
+
"mode": "symbol",
|
|
65
|
+
"symbol": "validateToken"
|
|
66
|
+
}
|
|
67
|
+
}
|
|
68
|
+
}
|
|
69
|
+
```
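The two requests above share one shape, so an agent harness could build them with a small helper. This is a minimal sketch: `buildSmartReadRequest` is a hypothetical function, not part of the package, and only the `tool`, `input`, `path`, `options`, `mode`, and `symbol` fields are taken from the examples above.

```typescript
// Hypothetical helper (not part of pi-smart-reader): builds the request
// objects shown in Step 1 and Step 2.
type SmartReadMode = "skeleton" | "symbol";

interface SmartReadRequest {
  tool: "smart_read";
  input: {
    path: string;
    options: { mode: SmartReadMode; symbol?: string };
  };
}

function buildSmartReadRequest(
  path: string,
  mode: SmartReadMode,
  symbol?: string,
): SmartReadRequest {
  if (mode === "symbol" && !symbol) {
    // Mirrors the tool's own rule: symbol mode needs a symbol name.
    throw new Error("Symbol name is required for 'symbol' mode.");
  }
  const options: { mode: SmartReadMode; symbol?: string } = { mode };
  if (mode === "symbol" && symbol) options.symbol = symbol;
  return { tool: "smart_read", input: { path, options } };
}

// Step 1: skim the file's API. Step 2: pull out one method.
const skim = buildSmartReadRequest("src/services/UserService.ts", "skeleton");
const extract = buildSmartReadRequest(
  "src/services/UserService.ts",
  "symbol",
  "validateToken",
);
console.log(JSON.stringify(skim));
console.log(JSON.stringify(extract));
```

Rejecting a missing symbol name before the call saves a round trip, since the tool would return an error for that request anyway.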
|
|
70
|
+
|
|
71
|
+
## ⚙️ Technical Details
|
|
72
|
+
- **Engine**: Powered by `tree-sitter` for robust, language-aware parsing.
|
|
73
|
+
- **Complexity**: $O(N)$ parsing time in the file size $N$; after extraction, the context cost scales with the extracted fragment rather than with the whole file.
|
|
74
|
+
- **Language Support**: TypeScript and JavaScript, parsed with the `tree-sitter-typescript` grammar.
|
|
75
|
+
|
|
76
|
+
## 📜 License
|
|
77
|
+
MIT
|
package/benchmark.ts
ADDED
|
@@ -0,0 +1,55 @@
|
|
|
1
|
+
import { SmartParser } from "./src/parser";
|
|
2
|
+
import { SkeletonEngine } from "./src/skeleton";
|
|
3
|
+
import { SymbolExtractor } from "./src/extractor";
|
|
4
|
+
import { readFileSync } from "fs";
|
|
5
|
+
|
|
6
|
+
async function runBenchmark() {
|
|
7
|
+
console.log("📊 Starting pi-smart-reader Performance Benchmark...");
|
|
8
|
+
|
|
9
|
+
const parser = new SmartParser();
|
|
10
|
+
await parser.initialize({
|
|
11
|
+
wasmPath:
|
|
12
|
+
"https://github.com/tree-sitter/tree-sitter-wasm/releases/download/v0.20.0/tree-sitter.wasm",
|
|
13
|
+
languagePath:
|
|
14
|
+
"https://github.com/tree-sitter/tree-sitter-typescript/releases/download/v0.20.0/tree-sitter-typescript.wasm",
|
|
15
|
+
});
|
|
16
|
+
|
|
17
|
+
const skeletonEngine = new SkeletonEngine(parser);
|
|
18
|
+
const symbolExtractor = new SymbolExtractor(parser);
|
|
19
|
+
|
|
20
|
+
const filePath = "benchmark_file.ts";
|
|
21
|
+
const source = readFileSync(filePath, "utf8");
|
|
22
|
+
const fullSize = source.length;
|
|
23
|
+
|
|
24
|
+
console.log(`\nFile: ${filePath}`);
|
|
25
|
+
console.log(`Full File Size: ${fullSize} characters`);
|
|
26
|
+
|
|
27
|
+
// 1. Test Skeleton
|
|
28
|
+
const skeleton = skeletonEngine.generateSkeleton(source);
|
|
29
|
+
const skeletonSize = skeleton.length;
|
|
30
|
+
const skeletonReduction = ((1 - skeletonSize / fullSize) * 100).toFixed(2);
|
|
31
|
+
|
|
32
|
+
console.log(`\n--- Skeleton View ---`);
|
|
33
|
+
console.log(`Skeleton Size: ${skeletonSize} characters`);
|
|
34
|
+
console.log(`Reduction: ${skeletonReduction}%`);
|
|
35
|
+
|
|
36
|
+
// 2. Test Symbol Extraction
|
|
37
|
+
const targetSymbol = "login";
|
|
38
|
+
const { content, relatedSymbols } = symbolExtractor.extractSymbol(
|
|
39
|
+
source,
|
|
40
|
+
targetSymbol,
|
|
41
|
+
);
|
|
42
|
+
const symbolSize = content.length;
|
|
43
|
+
const symbolReduction = ((1 - symbolSize / fullSize) * 100).toFixed(2);
|
|
44
|
+
|
|
45
|
+
console.log(`\n--- Symbol Extraction [${targetSymbol}] ---`);
|
|
46
|
+
console.log(`Symbol Size: ${symbolSize} characters`);
|
|
47
|
+
console.log(`Reduction: ${symbolReduction}%`);
|
|
48
|
+
console.log(`Related Symbols Found: ${relatedSymbols.join(", ") || "None"}`);
|
|
49
|
+
|
|
50
|
+
console.log(
|
|
51
|
+
`\nConclusion: pi-smart-reader reduced context from ${fullSize} to as little as ${Math.min(skeletonSize, symbolSize)} characters.`,
|
|
52
|
+
);
|
|
53
|
+
}
|
|
54
|
+
|
|
55
|
+
runBenchmark().catch(console.error);
|
package/benchmark_file.ts
ADDED
|
@@ -0,0 +1,139 @@
|
|
|
1
|
+
/**
|
|
2
|
+
* Benchmark file for pi-smart-reader
|
|
3
|
+
* Contains many functions to test AST extraction and token reduction.
|
|
4
|
+
*/
|
|
5
|
+
|
|
6
|
+
export class UserAuthService {
|
|
7
|
+
public async login(username: string, password: string): Promise<boolean> {
|
|
8
|
+
console.log("Logging in user...");
|
|
9
|
+
const user = await this.findUser(username);
|
|
10
|
+
if (!user) return false;
|
|
11
|
+
const isValid = await this.verifyPassword(password, user.hash);
|
|
12
|
+
if (isValid) {
|
|
13
|
+
const token = this.generateToken(user.id);
|
|
14
|
+
await this.logSession(user.id, token);
|
|
15
|
+
return true;
|
|
16
|
+
}
|
|
17
|
+
return false;
|
|
18
|
+
}
|
|
19
|
+
|
|
20
|
+
private async findUser(username: string): Promise<any> {
|
|
21
|
+
// Simulate DB lookup
|
|
22
|
+
return { id: "123", hash: "hashed_password" };
|
|
23
|
+
}
|
|
24
|
+
|
|
25
|
+
private async verifyPassword(
|
|
26
|
+
password: string,
|
|
27
|
+
hash: string,
|
|
28
|
+
): Promise<boolean> {
|
|
29
|
+
// Simulate password check
|
|
30
|
+
return password === "password123";
|
|
31
|
+
}
|
|
32
|
+
|
|
33
|
+
private generateToken(userId: string): string {
|
|
34
|
+
return "jwt_token_" + userId + "_" + Date.now();
|
|
35
|
+
}
|
|
36
|
+
|
|
37
|
+
private async logSession(userId: string, token: string): Promise<void> {
|
|
38
|
+
console.log(`Session started for ${userId}`);
|
|
39
|
+
}
|
|
40
|
+
}
|
|
41
|
+
|
|
42
|
+
export class DataProcessor {
|
|
43
|
+
public processData(data: any[]): any[] {
|
|
44
|
+
console.log("Starting data processing...");
|
|
45
|
+
const filtered = this.filterInvalid(data);
|
|
46
|
+
const mapped = this.mapToInternal(filtered);
|
|
47
|
+
const sorted = this.sortData(mapped);
|
|
48
|
+
return this.finalize(sorted);
|
|
49
|
+
}
|
|
50
|
+
|
|
51
|
+
private filterInvalid(data: any[]): any[] {
|
|
52
|
+
return data.filter((item) => item !== null && item !== undefined);
|
|
53
|
+
}
|
|
54
|
+
|
|
55
|
+
private mapToInternal(data: any[]): any[] {
|
|
56
|
+
return data.map((item) => ({ ...item, processed: true }));
|
|
57
|
+
}
|
|
58
|
+
|
|
59
|
+
private sortData(data: any[]): any[] {
|
|
60
|
+
return data.sort((a, b) => a.id - b.id);
|
|
61
|
+
}
|
|
62
|
+
|
|
63
|
+
private finalize(data: any[]): any[] {
|
|
64
|
+
console.log("Finalizing process...");
|
|
65
|
+
return data;
|
|
66
|
+
}
|
|
67
|
+
}
|
|
68
|
+
|
|
69
|
+
function helperUtility1() {
|
|
70
|
+
console.log("Util 1");
|
|
71
|
+
return true;
|
|
72
|
+
}
|
|
73
|
+
|
|
74
|
+
function helperUtility2() {
|
|
75
|
+
console.log("Util 2");
|
|
76
|
+
return false;
|
|
77
|
+
}
|
|
78
|
+
|
|
79
|
+
// Adding more filler functions to increase file size
|
|
80
|
+
function filler1() {
|
|
81
|
+
return 1;
|
|
82
|
+
}
|
|
83
|
+
function filler2() {
|
|
84
|
+
return 2;
|
|
85
|
+
}
|
|
86
|
+
function filler3() {
|
|
87
|
+
return 3;
|
|
88
|
+
}
|
|
89
|
+
function filler4() {
|
|
90
|
+
return 4;
|
|
91
|
+
}
|
|
92
|
+
function filler5() {
|
|
93
|
+
return 5;
|
|
94
|
+
}
|
|
95
|
+
function filler6() {
|
|
96
|
+
return 6;
|
|
97
|
+
}
|
|
98
|
+
function filler7() {
|
|
99
|
+
return 7;
|
|
100
|
+
}
|
|
101
|
+
function filler8() {
|
|
102
|
+
return 8;
|
|
103
|
+
}
|
|
104
|
+
function filler9() {
|
|
105
|
+
return 9;
|
|
106
|
+
}
|
|
107
|
+
function filler10() {
|
|
108
|
+
return 10;
|
|
109
|
+
}
|
|
110
|
+
function filler11() {
|
|
111
|
+
return 11;
|
|
112
|
+
}
|
|
113
|
+
function filler12() {
|
|
114
|
+
return 12;
|
|
115
|
+
}
|
|
116
|
+
function filler13() {
|
|
117
|
+
return 13;
|
|
118
|
+
}
|
|
119
|
+
function filler14() {
|
|
120
|
+
return 14;
|
|
121
|
+
}
|
|
122
|
+
function filler15() {
|
|
123
|
+
return 15;
|
|
124
|
+
}
|
|
125
|
+
function filler16() {
|
|
126
|
+
return 16;
|
|
127
|
+
}
|
|
128
|
+
function filler17() {
|
|
129
|
+
return 17;
|
|
130
|
+
}
|
|
131
|
+
function filler18() {
|
|
132
|
+
return 18;
|
|
133
|
+
}
|
|
134
|
+
function filler19() {
|
|
135
|
+
return 19;
|
|
136
|
+
}
|
|
137
|
+
function filler20() {
|
|
138
|
+
return 20;
|
|
139
|
+
}
|
package/docs/proposal.md
ADDED
|
@@ -0,0 +1,32 @@
|
|
|
1
|
+
# Proposal: pi-smart-reader
|
|
2
|
+
|
|
3
|
+
## 1. Problem Statement
|
|
4
|
+
When working with large files, AI agents typically use the `read` tool, which loads the entire file content into the conversation context. This leads to several critical issues:
|
|
5
|
+
- **Token Exhaustion**: Large files quickly consume the context window, leaving less room for reasoning and output.
|
|
6
|
+
- **Attention Dilution**: The "Lost in the Middle" phenomenon, where the model overlooks critical details buried in noise.
|
|
7
|
+
- **Cost**: Higher token usage increases API costs.
|
|
8
|
+
|
|
9
|
+
## 2. Goal
|
|
10
|
+
Create a Pi extension that replaces "blind" file reading with **Structural Extraction**. The agent should be able to understand the "shape" of a file without reading its entire content, and then surgically extract only the relevant fragments of code needed for the task.
|
|
11
|
+
|
|
12
|
+
## 3. Core Features
|
|
13
|
+
### A. Skeleton View (`mode: "skeleton"`)
|
|
14
|
+
Instead of the full file, the tool generates a "skeletal" version. It strips all implementation details (function bodies, class internals) and preserves only the signatures.
|
|
15
|
+
- **Example**: A 2,000-line file becomes a 50-line list of exported functions and classes.
|
|
16
|
+
|
|
17
|
+
### B. Targeted Symbol Extraction (`mode: "symbol"`)
|
|
18
|
+
Allows the agent to request a specific function, method, or variable by name.
|
|
19
|
+
- **Precision**: Using AST (Abstract Syntax Tree) parsing, the tool extracts only the exact range of the requested symbol.
|
|
20
|
+
- **Efficiency**: Only the relevant code is loaded into the context.
|
|
21
|
+
|
|
22
|
+
### C. Dependency Mapping
|
|
23
|
+
When extracting a symbol, the tool scans for internal calls to other functions in the same file and suggests those related symbols to the agent.
|
|
24
|
+
|
|
25
|
+
## 4. Success Criteria
|
|
26
|
+
- **Token Reduction**: A 70-90% reduction in tokens used when analyzing large files.
|
|
27
|
+
- **Accuracy**: The agent must be able to identify the correct function to modify using only the Skeleton view.
|
|
28
|
+
- **Language Support**: Initial support for TypeScript and JavaScript using `web-tree-sitter`.
|
|
29
|
+
|
|
30
|
+
## 5. Non-Goals
|
|
31
|
+
- This is not a replacement for `read` when the entire file is actually needed (e.g., for a full refactor).
|
|
32
|
+
- It will not perform cross-file dependency resolution (that is the role of LSP).
|
package/docs/spec.md
ADDED
|
@@ -0,0 +1,60 @@
|
|
|
1
|
+
# Technical Specification: pi-smart-reader
|
|
2
|
+
|
|
3
|
+
## 1. Architecture Overview
|
|
4
|
+
`pi-smart-reader` is a Pi extension that provides a specialized tool `smart_read` which leverages AST parsing via `web-tree-sitter` to selectively extract code fragments.
|
|
5
|
+
|
|
6
|
+
## 2. Detection & Extraction Logic
|
|
7
|
+
|
|
8
|
+
### A. The Skeleton Engine
|
|
9
|
+
The engine uses Tree-sitter queries to identify "headers" of code blocks.
|
|
10
|
+
**Query Logic (TS/JS)**:
|
|
11
|
+
- Find all `function_declaration`, `method_definition`, and `variable_declaration` (with arrow functions).
|
|
12
|
+
- Extract the name and the parameter list.
|
|
13
|
+
- Replace the body (`{ ... }`) with a comment `// ... implementation`.
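The body-replacement step can be sketched without a parser. This is a minimal sketch that assumes the declaration's start and end rows are already known (in the package they come from the syntax node's position); `signatureFromLines` is a hypothetical name used for illustration.

```typescript
// Hypothetical helper: keeps a declaration's header and replaces everything
// from the first "{" onward with a placeholder comment.
// startRow and endRow are 0-based line indices of the declaration.
function signatureFromLines(
  lines: string[],
  startRow: number,
  endRow: number,
): string {
  let result = "";
  for (let i = startRow; i <= endRow; i++) {
    const line = lines[i] ?? "";
    const brace = line.indexOf("{");
    if (brace !== -1) {
      // Header ends here: keep the text before the brace, drop the body.
      result += line.slice(0, brace) + "{ // ... implementation";
      break;
    }
    // Multi-line parameter lists are kept verbatim until the brace appears.
    result += line + "\n";
  }
  return result.trimEnd();
}

const source = [
  "function add(",
  "  a: number,",
  "  b: number,",
  "): number {",
  "  return a + b;",
  "}",
].join("\n");

console.log(signatureFromLines(source.split("\n"), 0, 5));
```

A line-based cut like this stops at the first `{` it sees, so an object type or destructured parameter in the signature would truncate the header early. Locating the body node in the AST avoids that.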
|
|
14
|
+
|
|
15
|
+
### B. Symbol Extraction
|
|
16
|
+
When a `symbol` name is provided:
|
|
17
|
+
1. The AST is scanned for a node whose identifier matches the symbol name.
|
|
18
|
+
2. The `start` and `end` byte offsets of that node are identified.
|
|
19
|
+
3. The original file content is sliced using these offsets.
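The three steps above can be sketched with a hand-built tree standing in for the parser output. `MiniNode`, `findDeclaration`, and `extractByName` are hypothetical names, and the offsets are computed by hand rather than produced by tree-sitter.

```typescript
// Simplified stand-in for a syntax node; real tree-sitter nodes carry more.
interface MiniNode {
  type: string;
  name?: string; // identifier text, when the node declares something
  startIndex: number; // inclusive offset into the source string
  endIndex: number; // exclusive offset into the source string
  children: MiniNode[];
}

// Step 1: depth-first search for the first declaration whose name matches.
function findDeclaration(node: MiniNode, symbol: string): MiniNode | null {
  if (node.name === symbol) return node;
  for (const child of node.children) {
    const found = findDeclaration(child, symbol);
    if (found) return found;
  }
  return null;
}

// Steps 2 and 3: read the node's offsets and slice the original text.
function extractByName(source: string, root: MiniNode, symbol: string): string {
  const target = findDeclaration(root, symbol);
  if (!target) throw new Error(`Symbol '${symbol}' not found in file.`);
  return source.slice(target.startIndex, target.endIndex);
}

const first = "function a() { return 1; }";
const second = "function b() { return a() + 1; }";
const source = first + "\n" + second;

// Hand-built tree for the two functions above.
const root: MiniNode = {
  type: "program",
  startIndex: 0,
  endIndex: source.length,
  children: [
    {
      type: "function_declaration",
      name: "a",
      startIndex: 0,
      endIndex: first.length,
      children: [],
    },
    {
      type: "function_declaration",
      name: "b",
      startIndex: first.length + 1, // skip the newline between the functions
      endIndex: source.length,
      children: [],
    },
  ],
};

console.log(extractByName(source, root, "b"));
```

Matching on a declaration's own name, rather than on the first identifier with that text, keeps a call site that appears earlier in the file from being mistaken for the definition.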
|
|
20
|
+
|
|
21
|
+
### C. Dependency Scanning
|
|
22
|
+
While extracting a symbol body:
|
|
23
|
+
1. The body is parsed for `call_expression` nodes.
|
|
24
|
+
2. The identifiers of these calls are collected.
|
|
25
|
+
3. Any identifiers that match other symbols in the same file are returned as `relatedSymbols`.
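A minimal sketch of this scan, again over a simplified node shape rather than real tree-sitter nodes. `CallNode`, `findRelated`, and the small builder functions are hypothetical; the sketch assumes that method calls such as `this.findUser()` appear as a member expression whose property name is the callee.

```typescript
// Simplified stand-in for a syntax node (hypothetical, not tree-sitter's type).
interface CallNode {
  type: string;
  text: string;
  children: CallNode[];
}

// Collects callee names from call expressions, then keeps only the names
// that are declared in the same file.
function findRelated(body: CallNode, fileSymbols: Set<string>): string[] {
  const seen = new Set<string>();
  const walk = (n: CallNode): void => {
    if (n.type === "call_expression") {
      const callee = n.children[0];
      if (callee?.type === "identifier") {
        seen.add(callee.text); // plain call: helper()
      } else if (callee?.type === "member_expression") {
        // method call: this.helper() -> take the property name
        const prop = callee.children.find(
          (c) => c.type === "property_identifier",
        );
        if (prop) seen.add(prop.text);
      }
    }
    n.children.forEach((child) => walk(child));
  };
  walk(body);
  return Array.from(seen).filter((name) => fileSymbols.has(name));
}

// Tiny builders for the hand-made tree below.
const id = (text: string): CallNode => ({ type: "identifier", text, children: [] });
const member = (obj: string, prop: string): CallNode => ({
  type: "member_expression",
  text: `${obj}.${prop}`,
  children: [id(obj), { type: "property_identifier", text: prop, children: [] }],
});
const call = (callee: CallNode): CallNode => ({
  type: "call_expression",
  text: `${callee.text}()`,
  children: [callee],
});

// A body that calls this.findUser(), console.log(), and helperUtility1().
const body: CallNode = {
  type: "statement_block",
  text: "",
  children: [
    call(member("this", "findUser")),
    call(member("console", "log")),
    call(id("helperUtility1")),
  ],
};

const related = findRelated(
  body,
  new Set(["login", "findUser", "helperUtility1"]),
);
console.log(related);
```

Filtering against the file's own symbol set is what drops `log` here: it is called, but it is not declared in the file, so it is not a useful follow-up extraction.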
|
|
26
|
+
|
|
27
|
+
## 3. Tool API Design
|
|
28
|
+
|
|
29
|
+
**Tool Name**: `smart_read`
|
|
30
|
+
|
|
31
|
+
**Input Schema**:
|
|
32
|
+
```json
|
|
33
|
+
{
|
|
34
|
+
"path": "string",
|
|
35
|
+
"options": {
|
|
36
|
+
"mode": "'skeleton' | 'symbol'",
|
|
37
|
+
"symbol": "string (optional)",
|
|
38
|
+
"includeDependencies": "boolean (optional)"
|
|
39
|
+
}
|
|
40
|
+
}
|
|
41
|
+
```
|
|
42
|
+
|
|
43
|
+
**Output Format**:
|
|
44
|
+
- `mode: "skeleton"` $\to$ Returns the skeletal version of the file.
|
|
45
|
+
- `mode: "symbol"` $\to$ Returns the code fragment and a list of related symbols.
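The schema and output bullets can be expressed as TypeScript types with a small input check. This is a sketch under assumptions: the input fields follow the schema above, while the result shapes, `parseSmartReadInput`, and `summarize` are hypothetical and only modelled on the two output bullets.

```typescript
// Input shape from the schema above.
type SmartReadOptions =
  | { mode: "skeleton" }
  | { mode: "symbol"; symbol: string; includeDependencies?: boolean };

interface SmartReadInput {
  path: string;
  options: SmartReadOptions;
}

// Assumed result shapes, one per mode.
type SmartReadResult =
  | { mode: "skeleton"; content: string }
  | { mode: "symbol"; content: string; relatedSymbols: string[] };

// Narrows untyped tool input; returns an error object instead of throwing.
function parseSmartReadInput(raw: unknown): SmartReadInput | { error: string } {
  if (typeof raw !== "object" || raw === null) {
    return { error: "input must be an object" };
  }
  const { path, options } = raw as { path?: unknown; options?: unknown };
  if (typeof path !== "string") return { error: "path must be a string" };
  if (typeof options !== "object" || options === null) {
    return { error: "options must be an object" };
  }
  const { mode, symbol, includeDependencies } = options as Record<string, unknown>;
  if (mode === "skeleton") return { path, options: { mode: "skeleton" } };
  if (mode === "symbol") {
    if (typeof symbol !== "string" || symbol === "") {
      return { error: "symbol is required for 'symbol' mode" };
    }
    const opts: { mode: "symbol"; symbol: string; includeDependencies?: boolean } = {
      mode: "symbol",
      symbol,
    };
    if (typeof includeDependencies === "boolean") {
      opts.includeDependencies = includeDependencies;
    }
    return { path, options: opts };
  }
  return { error: `unsupported mode: ${String(mode)}` };
}

// The discriminated union lets callers branch on `mode` safely.
function summarize(result: SmartReadResult): string {
  return result.mode === "skeleton"
    ? `skeleton (${result.content.length} chars)`
    : `symbol (${result.content.length} chars, related: ${
        result.relatedSymbols.join(", ") || "none"
      })`;
}

console.log(parseSmartReadInput({ path: "a.ts", options: { mode: "symbol" } }));
console.log(summarize({ mode: "skeleton", content: "abc" }));
```

Modelling `options` as a union keyed on `mode` makes "symbol is required in symbol mode" a compile-time rule for typed callers, with the runtime check covering untyped input.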
|
|
46
|
+
|
|
47
|
+
## 4. Implementation Details
|
|
48
|
+
|
|
49
|
+
### A. Parser Strategy
|
|
50
|
+
Use `web-tree-sitter` for WASM-based parsing.
|
|
51
|
+
- **Language**: `tree-sitter-typescript` (covers JS and TS).
|
|
52
|
+
- **Loading**: Load the WASM binary on extension startup.
|
|
53
|
+
|
|
54
|
+
### B. Integration with Pi
|
|
55
|
+
The extension registers `smart_read` as a custom tool via `pi.registerTool()`.
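A minimal sketch of the registration flow, using a fake host so it runs on its own. `FakePi`, `ToolOptions`, and `activate` are hypothetical stand-ins; the option fields (`name`, `description`, `parameters`, `handler`) follow the call in this package's `src/index.ts`, and the real Pi API may expect more or different fields.

```typescript
// Hypothetical option shape, modelled on the call in src/index.ts.
interface ToolOptions {
  name: string;
  description: string;
  parameters: object;
  handler: (input: any) => Promise<unknown>;
}

// Fake host that only records registered tools.
class FakePi {
  readonly tools: Record<string, ToolOptions> = {};
  registerTool(options: ToolOptions): void {
    this.tools[options.name] = options;
  }
}

// Extension entry point: registers the tool once at startup.
function activate(pi: FakePi): void {
  pi.registerTool({
    name: "smart_read",
    description: "Read a file structurally.",
    parameters: {
      type: "object",
      properties: { path: { type: "string" }, options: { type: "object" } },
      required: ["path", "options"],
    },
    // Placeholder handler: echoes the request instead of parsing a file.
    handler: async (input) => ({ mode: input.options.mode, path: input.path }),
  });
}

const pi = new FakePi();
activate(pi);
const tool = pi.tools["smart_read"];
console.log(Object.keys(pi.tools));
```

Keeping the handler a thin dispatcher over the skeleton and symbol engines makes each engine testable without a running host.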
|
|
56
|
+
|
|
57
|
+
## 5. Performance & Complexity
|
|
58
|
+
- **Time Complexity**: $O(N)$ to parse the file, where $N$ is file size.
|
|
59
|
+
- **Space Complexity**: $O(T)$ where $T$ is the size of the extracted fragment.
|
|
60
|
+
- **Token Impact**: Moves from $O(FileSize)$ to $O(SymbolSize)$.
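As a worked example of the token impact, the reduction can be computed the same way the package's benchmark does, as one minus the size ratio. `reductionPercent` is a hypothetical helper name, and the sizes below are line counts used as a rough proxy for tokens.

```typescript
// Percentage of the original size that no longer enters the context.
function reductionPercent(fullSize: number, extractedSize: number): number {
  if (fullSize <= 0) throw new Error("fullSize must be positive");
  return (1 - extractedSize / fullSize) * 100;
}

// A 2,000-line file reduced to a 50-line skeleton, counted in lines.
console.log(reductionPercent(2000, 50).toFixed(2) + "%"); // prints "97.50%"
```

Line and character counts only approximate token counts, so a figure like this is an estimate until it is measured with the model's tokenizer.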
|
package/package.json
ADDED
|
@@ -0,0 +1,38 @@
|
|
|
1
|
+
{
|
|
2
|
+
"name": "pi-smart-reader",
|
|
3
|
+
"version": "0.1.0",
|
|
4
|
+
"description": "Optimizes token usage by providing skeletal views and targeted symbol extraction for large files.",
|
|
5
|
+
"main": "dist/index.js",
|
|
6
|
+
"types": "dist/index.d.ts",
|
|
7
|
+
"scripts": {
|
|
8
|
+
"build": "tsc",
|
|
9
|
+
"dev": "tsc -w"
|
|
10
|
+
},
|
|
11
|
+
"keywords": [
|
|
12
|
+
"pi",
|
|
13
|
+
"pi-package",
|
|
14
|
+
"context-optimization",
|
|
15
|
+
"ast",
|
|
16
|
+
"token-savings"
|
|
17
|
+
],
|
|
18
|
+
"author": "ZachDreamZ",
|
|
19
|
+
"license": "MIT",
|
|
20
|
+
"pi": {
|
|
21
|
+
"extensions": [
|
|
22
|
+
"dist/index.js"
|
|
23
|
+
]
|
|
24
|
+
},
|
|
25
|
+
"dependencies": {
|
|
26
|
+
"tree-sitter": "^0.21.1",
|
|
27
|
+
"tree-sitter-typescript": "^0.23.2",
|
|
28
|
+
"web-tree-sitter": "latest"
|
|
29
|
+
},
|
|
30
|
+
"peerDependencies": {
|
|
31
|
+
"@earendil-works/pi-ai": "*",
|
|
32
|
+
"@earendil-works/pi-coding-agent": "*"
|
|
33
|
+
},
|
|
34
|
+
"devDependencies": {
|
|
35
|
+
"@types/node": "^20.0.0",
|
|
36
|
+
"typescript": "^5.0.0"
|
|
37
|
+
}
|
|
38
|
+
}
|
package/src/extractor.ts
ADDED
|
@@ -0,0 +1,91 @@
|
|
|
1
|
+
import type { SmartParser } from "./parser";
|
|
2
|
+
import type { Node } from "web-tree-sitter";
|
|
3
|
+
|
|
4
|
+
export class SymbolExtractor {
|
|
5
|
+
constructor(private parser: SmartParser) {}
|
|
6
|
+
|
|
7
|
+
/**
|
|
8
|
+
* Extracts the full body of a specific symbol by name.
|
|
9
|
+
*/
|
|
10
|
+
public extractSymbol(
|
|
11
|
+
source: string,
|
|
12
|
+
symbolName: string,
|
|
13
|
+
): { content: string; relatedSymbols: string[] } {
|
|
14
|
+
const tree = this.parser.parse(source);
|
|
15
|
+
const root = tree.rootNode;
|
|
16
|
+
|
|
17
|
+
const targetNode = this.findSymbolNode(root, symbolName);
|
|
18
|
+
|
|
19
|
+
if (!targetNode) {
|
|
20
|
+
throw new Error(`Symbol '${symbolName}' not found in file.`);
|
|
21
|
+
}
|
|
22
|
+
|
|
23
|
+
const content = source.slice(targetNode.startIndex, targetNode.endIndex);
|
|
24
|
+
const related = this.findRelatedSymbols(targetNode, source);
|
|
25
|
+
|
|
26
|
+
return {
|
|
27
|
+
content,
|
|
28
|
+
relatedSymbols: related,
|
|
29
|
+
};
|
|
30
|
+
}
|
|
31
|
+
|
|
32
|
+
private findSymbolNode(node: Node, name: string): Node | null {
|
|
33
|
+
const findIdentifierNode = (n: Node): Node | null => {
|
|
34
|
+
if (n.type === "identifier" || n.type === "property_identifier") {
|
|
35
|
+
if (n.text === name) {
|
|
36
|
+
console.log(`Found identifier ${name} at ${n.startIndex}`);
|
|
37
|
+
return n;
|
|
38
|
+
}
|
|
39
|
+
}
|
|
40
|
+
for (const child of n.namedChildren) {
|
|
41
|
+
const found = findIdentifierNode(child);
|
|
42
|
+
if (found) return found;
|
|
43
|
+
}
|
|
44
|
+
return null;
|
|
45
|
+
};
|
|
46
|
+
|
|
47
|
+
const idNode = findIdentifierNode(node);
|
|
48
|
+
if (!idNode) {
|
|
49
|
+
console.log(`Identifier ${name} not found in AST.`);
|
|
50
|
+
return null;
|
|
51
|
+
}
|
|
52
|
+
|
|
53
|
+
let current: Node | null = idNode;
|
|
54
|
+
while (current) {
|
|
55
|
+
console.log(`Climbing: ${current.type}`);
|
|
56
|
+
if (
|
|
57
|
+
current.type === "function_declaration" ||
|
|
58
|
+
current.type === "method_definition" ||
|
|
59
|
+
current.type === "variable_declarator" ||
|
|
60
|
+
current.type === "class_declaration"
|
|
61
|
+
) {
|
|
62
|
+
console.log(`Found symbol container: ${current.type}`);
|
|
63
|
+
return current;
|
|
64
|
+
}
|
|
65
|
+
current = current.parent;
|
|
66
|
+
}
|
|
67
|
+
|
|
68
|
+
console.log(`Reached root without finding a symbol container for ${name}.`);
|
|
69
|
+
return null;
|
|
70
|
+
}
|
|
71
|
+
|
|
72
|
+
private findRelatedSymbols(node: Node, _source: string): string[] {
|
|
73
|
+
const related: string[] = [];
|
|
74
|
+
|
|
75
|
+
// Scan the subtree for call expressions
|
|
76
|
+
const walk = (n: Node) => {
|
|
77
|
+
if (n.type === "call_expression") {
|
|
78
|
+
const call = n.namedChildren.find((c) => c.type === "identifier");
|
|
79
|
+
if (call) {
|
|
80
|
+
related.push(call.text);
|
|
81
|
+
}
|
|
82
|
+
}
|
|
83
|
+
for (const child of n.namedChildren) {
|
|
84
|
+
walk(child);
|
|
85
|
+
}
|
|
86
|
+
};
|
|
87
|
+
|
|
88
|
+
walk(node);
|
|
89
|
+
return [...new Set(related)]; // Unique only
|
|
90
|
+
}
|
|
91
|
+
}
|
package/src/index.ts
ADDED
|
@@ -0,0 +1,90 @@
|
|
|
1
|
+
import type { ExtensionAPI } from "@earendil-works/pi-coding-agent";
|
|
2
|
+
import { SmartParser } from "./parser";
|
|
3
|
+
import { SkeletonEngine } from "./skeleton";
|
|
4
|
+
import { SymbolExtractor } from "./extractor";
|
|
5
|
+
import { readFileSync } from "fs";
|
|
6
|
+
|
|
7
|
+
export default async function (pi: ExtensionAPI) {
|
|
8
|
+
const parser = new SmartParser();
|
|
9
|
+
|
|
10
|
+
// Configuration for JS/TS. Note: SmartParser.initialize() currently ignores these paths and uses the native tree-sitter bindings.
|
|
11
|
+
const config = {
|
|
12
|
+
wasmPath:
|
|
13
|
+
"https://github.com/tree-sitter/tree-sitter-wasm/releases/download/v0.20.0/tree-sitter.wasm",
|
|
14
|
+
languagePath:
|
|
15
|
+
"https://github.com/tree-sitter/tree-sitter-typescript/releases/download/v0.20.0/tree-sitter-typescript.wasm",
|
|
16
|
+
};
|
|
17
|
+
|
|
18
|
+
await parser.initialize(config);
|
|
19
|
+
const skeletonEngine = new SkeletonEngine(parser);
|
|
20
|
+
const symbolExtractor = new SymbolExtractor(parser);
|
|
21
|
+
|
|
22
|
+
pi.registerTool({
|
|
23
|
+
name: "smart_read",
|
|
24
|
+
description:
|
|
25
|
+
"Read a file structurally. Use 'skeleton' mode to see the API of a large file, or 'symbol' mode to extract a specific function body.",
|
|
26
|
+
parameters: {
|
|
27
|
+
type: "object",
|
|
28
|
+
properties: {
|
|
29
|
+
path: { type: "string", description: "Path to the file to read" },
|
|
30
|
+
options: {
|
|
31
|
+
type: "object",
|
|
32
|
+
properties: {
|
|
33
|
+
mode: {
|
|
34
|
+
type: "string",
|
|
35
|
+
enum: ["skeleton", "symbol"],
|
|
36
|
+
description: "Extraction mode",
|
|
37
|
+
},
|
|
38
|
+
symbol: {
|
|
39
|
+
type: "string",
|
|
40
|
+
description:
|
|
41
|
+
"The name of the symbol to extract (required for 'symbol' mode)",
|
|
42
|
+
},
|
|
43
|
+
},
|
|
44
|
+
required: ["mode"],
|
|
45
|
+
},
|
|
46
|
+
},
|
|
47
|
+
required: ["path", "options"],
|
|
48
|
+
},
|
|
49
|
+
handler: async (input: any, _ctx: any) => {
|
|
50
|
+
const { path, options } = input;
|
|
51
|
+
|
|
52
|
+
try {
|
|
53
|
+
const source = readFileSync(path, "utf8");
|
|
54
|
+
|
|
55
|
+
if (options.mode === "skeleton") {
|
|
56
|
+
return {
|
|
57
|
+
content: skeletonEngine.generateSkeleton(source),
|
|
58
|
+
mode: "skeleton",
|
|
59
|
+
message:
|
|
60
|
+
"Skeletal view of the file generated. Implementation details stripped.",
|
|
61
|
+
};
|
|
62
|
+
}
|
|
63
|
+
|
|
64
|
+
if (options.mode === "symbol") {
|
|
65
|
+
if (!options.symbol) {
|
|
66
|
+
throw new Error("Symbol name is required for 'symbol' mode.");
|
|
67
|
+
}
|
|
68
|
+
|
|
69
|
+
const { content, relatedSymbols } = symbolExtractor.extractSymbol(
|
|
70
|
+
source,
|
|
71
|
+
options.symbol,
|
|
72
|
+
);
|
|
73
|
+
|
|
74
|
+
return {
|
|
75
|
+
content,
|
|
76
|
+
relatedSymbols,
|
|
77
|
+
mode: "symbol",
|
|
78
|
+
message: `Extracted symbol '${options.symbol}' and identified related dependencies.`,
|
|
79
|
+
};
|
|
80
|
+
}
|
|
81
|
+
|
|
82
|
+
throw new Error(`Unsupported mode: ${options.mode}`);
|
|
83
|
+
} catch (error: any) {
|
|
84
|
+
return {
|
|
85
|
+
error: `Failed to smart-read ${path}: ${error.message}`,
|
|
86
|
+
};
|
|
87
|
+
}
|
|
88
|
+
},
|
|
89
|
+
});
|
|
90
|
+
}
|
package/src/parser.ts
ADDED
|
@@ -0,0 +1,45 @@
|
|
|
1
|
+
import Parser from "tree-sitter";
|
|
2
|
+
import TypeScript from "tree-sitter-typescript";
|
|
3
|
+
|
|
4
|
+
export interface ParserConfig {
|
|
5
|
+
// Not strictly needed for native tree-sitter but kept for API compatibility
|
|
6
|
+
wasmPath?: string;
|
|
7
|
+
languagePath?: string;
|
|
8
|
+
}
|
|
9
|
+
|
|
10
|
+
export class SmartParser {
|
|
11
|
+
private parser: Parser | null = null;
|
|
12
|
+
private language: any | null = null;
|
|
13
|
+
|
|
14
|
+
/**
|
|
15
|
+
* Initializes the tree-sitter parser with TypeScript support.
|
|
16
|
+
*/
|
|
17
|
+
public async initialize(_config?: ParserConfig): Promise<void> {
|
|
18
|
+
try {
|
|
19
|
+
this.parser = new Parser();
|
|
20
|
+
this.language = TypeScript.typescript;
|
|
21
|
+
this.parser.setLanguage(this.language);
|
|
22
|
+
} catch (error) {
|
|
23
|
+
console.error("[pi-smart-reader] Initialization failed:", error);
|
|
24
|
+
throw new Error(`Failed to initialize tree-sitter: ${error}`);
|
|
25
|
+
}
|
|
26
|
+
}
|
|
27
|
+
|
|
28
|
+
/**
|
|
29
|
+
* Parses the source code into an AST.
|
|
30
|
+
*/
|
|
31
|
+
public parse(source: string) {
|
|
32
|
+
if (!this.parser) {
|
|
33
|
+
throw new Error("Parser not initialized. Call initialize() first.");
|
|
34
|
+
}
|
|
35
|
+
return this.parser.parse(source);
|
|
36
|
+
}
|
|
37
|
+
|
|
38
|
+
public getLanguage() {
|
|
39
|
+
return this.language;
|
|
40
|
+
}
|
|
41
|
+
|
|
42
|
+
public isInitialized(): boolean {
|
|
43
|
+
return this.parser !== null && this.language !== null;
|
|
44
|
+
}
|
|
45
|
+
}
|
package/src/skeleton.ts
ADDED
|
@@ -0,0 +1,69 @@
|
|
|
1
|
+
import type { SmartParser } from "./parser";
|
|
2
|
+
import type { Node } from "web-tree-sitter";
|
|
3
|
+
|
|
4
|
+
export class SkeletonEngine {
|
|
5
|
+
constructor(private parser: SmartParser) {}
|
|
6
|
+
|
|
7
|
+
/**
|
|
8
|
+
* Generates a skeletal view of the source code.
|
|
9
|
+
* Keeps signatures of functions and classes, but strips bodies.
|
|
10
|
+
*/
|
|
11
|
+
public generateSkeleton(source: string): string {
|
|
12
|
+
const tree = this.parser.parse(source);
|
|
13
|
+
const root = tree.rootNode;
|
|
14
|
+
|
|
15
|
+
let skeleton = "";
|
|
16
|
+
const lines = source.split("\n"); // split on real newlines, not a literal backslash-n
|
|
17
|
+
|
|
18
|
+
const walkAndLog = (node: any, depth = 0) => {
|
|
19
|
+
if (node.text.includes("login")) {
|
|
20
|
+
console.log(`Depth ${depth} | Type: ${node.type} | Text: ${node.text}`);
|
|
21
|
+
}
|
|
22
|
+
for (const child of node.namedChildren) {
|
|
23
|
+
walkAndLog(child, depth + 1);
|
|
24
|
+
}
|
|
25
|
+
};
|
|
26
|
+
walkAndLog(root);
|
|
27
|
+
|
|
28
|
+
// We iterate through the top-level named children
|
|
29
|
+
for (const node of root.namedChildren) {
|
|
30
|
+
if (this.isSignatureNode(node)) {
|
|
31
|
+
skeleton += this.extractSignature(node, lines) + "\n";
|
|
32
|
+
} else if (node.type === "comment") {
|
|
33
|
+
skeleton += node.text + "\n";
|
|
34
|
+
}
|
|
35
|
+
}
|
|
36
|
+
|
|
37
|
+
return skeleton || "// No structural symbols found in this file.";
|
|
38
|
+
}
|
|
39
|
+
|
|
40
|
+
private isSignatureNode(node: Node): boolean {
|
|
41
|
+
const types = [
|
|
42
|
+
"function_declaration",
|
|
43
|
+
"method_definition",
|
|
44
|
+
"class_declaration",
|
|
45
|
+
"variable_declaration",
|
|
46
|
+
];
|
|
47
|
+
return types.includes(node.type);
|
|
48
|
+
}
|
|
49
|
+
|
|
50
|
+
private extractSignature(node: Node, lines: string[]): string {
|
|
51
|
+
const startLine = node.startPosition.row;
|
|
52
|
+
const endLine = node.endPosition.row;
|
|
53
|
+
|
|
54
|
+
// For signatures, we want the line where the name/params are,
|
|
55
|
+
// but we want to stop before the opening brace '{'
|
|
56
|
+
let result = "";
|
|
57
|
+
for (let i = startLine; i <= endLine; i++) {
|
|
58
|
+
const line = lines[i] || "";
|
|
59
|
+
if (line.includes("{")) {
|
|
60
|
+
result += line.split("{")[0] + " { // ... implementation";
|
|
61
|
+
break;
|
|
62
|
+
}
|
|
63
|
+
result += line + "\n";
|
|
64
|
+
}
|
|
65
|
+
|
|
66
|
+
// Trim trailing newlines
|
|
67
|
+
return result.trimEnd();
|
|
68
|
+
}
|
|
69
|
+
}
|
package/src/types/pi-coding-agent.d.ts
ADDED
|
@@ -0,0 +1,18 @@
|
|
|
1
|
+
declare module "@earendil-works/pi-coding-agent" {
|
|
2
|
+
export interface ExtensionAPI {
|
|
3
|
+
on(
|
|
4
|
+
event: string,
|
|
5
|
+
handler: (event: any, ctx: ExtensionContext) => Promise<void> | void,
|
|
6
|
+
): void;
|
|
7
|
+
registerTool(options: any): void;
|
|
8
|
+
}
|
|
9
|
+
|
|
10
|
+
export interface ExtensionContext {
|
|
11
|
+
ui: {
|
|
12
|
+
notify(
|
|
13
|
+
message: string,
|
|
14
|
+
level: "info" | "success" | "warning" | "error",
|
|
15
|
+
): void;
|
|
16
|
+
};
|
|
17
|
+
}
|
|
18
|
+
}
|
package/src/types/web-tree-sitter.d.ts
ADDED
|
@@ -0,0 +1,30 @@
|
|
|
1
|
+
declare module 'web-tree-sitter' {
|
|
2
|
+
export default class Parser {
|
|
3
|
+
static async init(options: { wasmPath: string }): Promise<void>;
|
|
4
|
+
constructor();
|
|
5
|
+
setLanguage(language: any): void;
|
|
6
|
+
parse(source: string): Tree;
|
|
7
|
+
|
|
8
|
+
static Language: {
|
|
9
|
+
load(path: string): Promise<any>;
|
|
10
|
+
};
|
|
11
|
+
}
|
|
12
|
+
|
|
13
|
+
export interface Tree {
|
|
14
|
+
rootNode: Node;
|
|
15
|
+
// Simplified for this extension's needs
|
|
16
|
+
}
|
|
17
|
+
|
|
18
|
+
export interface Node {
|
|
19
|
+
id: number;
|
|
20
|
+
type: string;
|
|
21
|
+
text: string;
|
|
22
|
+
startPosition: { row: number; column: number };
|
|
23
|
+
endPosition: { row: number; column: number };
|
|
24
|
+
startIndex: number;
|
|
25
|
+
endIndex: number;
|
|
26
|
+
children: Node[];
|
|
27
|
+
parent: Node | null;
|
|
28
|
+
namedChildren: Node[];
|
|
29
|
+
}
|
|
30
|
+
}
|
package/tasks.md
ADDED
|
@@ -0,0 +1,29 @@
|
|
|
1
|
+
# Implementation Tasks: pi-smart-reader
|
|
2
|
+
|
|
3
|
+
## Phase 1: Foundation
|
|
4
|
+
- [ ] Initialize project with `package.json` and `tsconfig.json`.
|
|
5
|
+
- [ ] Setup `web-tree-sitter` WASM loader and language bindings.
|
|
6
|
+
|
|
7
|
+
## Phase 2: The Skeleton Engine
|
|
8
|
+
- [ ] Implement Tree-sitter queries to find function/class signatures.
|
|
9
|
+
- [ ] Implement the logic to strip function bodies and replace with `// ...`.
|
|
10
|
+
- [ ] Create a test suite with large TS/JS files to verify skeletal output.
|
|
11
|
+
|
|
12
|
+
## Phase 3: Precision Extraction
|
|
13
|
+
- [ ] Implement symbol lookup logic (mapping name $\to$ byte range).
|
|
14
|
+
- [ ] Implement the content slicing mechanism.
|
|
15
|
+
- [ ] Implement internal dependency scanning (finding calls within a function).
|
|
16
|
+
|
|
17
|
+
## Phase 4: Pi Tool Integration
|
|
18
|
+
- [ ] Define the `smart_read` tool schema.
|
|
19
|
+
- [ ] Implement the `smart_read` handler in the extension.
|
|
20
|
+
- [ ] Integrate the logic from Phase 2 and 3 into the handler.
|
|
21
|
+
|
|
22
|
+
## Phase 5: Benchmarking & Optimization
|
|
23
|
+
- [ ] Compare token usage: `read` vs `smart_read(skeleton)` vs `smart_read(symbol)`.
|
|
24
|
+
- [ ] Optimize AST queries for speed.
|
|
25
|
+
- [ ] Handle edge cases (anonymous functions, complex nested classes).
|
|
26
|
+
|
|
27
|
+
## Phase 6: Release
|
|
28
|
+
- [ ] Write professional README.md.
|
|
29
|
+
- [ ] Publish to npm and GitHub.
|
package/tsconfig.json
ADDED
|
@@ -0,0 +1,16 @@
|
|
|
1
|
+
{
|
|
2
|
+
"compilerOptions": {
|
|
3
|
+
"target": "ES2022",
|
|
4
|
+
"module": "NodeNext",
|
|
5
|
+
"moduleResolution": "NodeNext",
|
|
6
|
+
"outDir": "./dist",
|
|
7
|
+
"rootDir": "./src",
|
|
8
|
+
"strict": true,
|
|
9
|
+
"esModuleInterop": true,
|
|
10
|
+
"skipLibCheck": true,
|
|
11
|
+
"forceConsistentCasingInFileNames": true,
|
|
12
|
+
"declaration": true,
|
|
13
|
+
"typeRoots": ["./node_modules/@types", "src/types"]
|
|
14
|
+
},
|
|
15
|
+
"include": ["src/**/*"]
|
|
16
|
+
}
|
package/wasm/tree-sitter-typescript.wasm
ADDED
|
@@ -0,0 +1 @@
|
|
|
1
|
+
Not Found
|
package/wasm/tree-sitter.wasm
ADDED
|
@@ -0,0 +1 @@
|
|
|
1
|
+
Not Found
|