catport 1.0.0
- package/ARCHITECTURE.md +94 -0
- package/CONTRIBUTING.md +133 -0
- package/LICENSE +21 -0
- package/README.md +414 -0
- package/bin/catport +8 -0
- package/package.json +48 -0
- package/src/cli/args.js +133 -0
- package/src/cli/main.js +78 -0
- package/src/cli/parser.js +152 -0
- package/src/cli/ui.js +78 -0
- package/src/config/constants.js +62 -0
- package/src/config/ignores.js +119 -0
- package/src/config/loader.js +15 -0
- package/src/config/options.js +181 -0
- package/src/core/analyzer.js +23 -0
- package/src/core/bundler.js +165 -0
- package/src/core/extractor.js +76 -0
- package/src/core/ignore.js +65 -0
- package/src/core/processor.js +59 -0
- package/src/core/scanner.js +184 -0
- package/src/formatters/index.js +78 -0
- package/src/formatters/json.js +284 -0
- package/src/formatters/markdown.js +164 -0
- package/src/formatters/multipart.js +127 -0
- package/src/formatters/xml.js +221 -0
- package/src/formatters/yaml.js +147 -0
- package/src/index.js +11 -0
- package/src/optimizers/definitions.js +79 -0
- package/src/optimizers/index.js +96 -0
- package/src/optimizers/langs/batch.js +3 -0
- package/src/optimizers/langs/c_family.js +3 -0
- package/src/optimizers/langs/clojure.js +3 -0
- package/src/optimizers/langs/css.js +3 -0
- package/src/optimizers/langs/go.js +5 -0
- package/src/optimizers/langs/haskell.js +4 -0
- package/src/optimizers/langs/html.js +4 -0
- package/src/optimizers/langs/ini.js +4 -0
- package/src/optimizers/langs/javascript.js +11 -0
- package/src/optimizers/langs/lua.js +4 -0
- package/src/optimizers/langs/markdown.js +3 -0
- package/src/optimizers/langs/perl.js +3 -0
- package/src/optimizers/langs/php.js +4 -0
- package/src/optimizers/langs/powershell.js +5 -0
- package/src/optimizers/langs/python.js +5 -0
- package/src/optimizers/langs/ruby.js +4 -0
- package/src/optimizers/langs/rust.js +3 -0
- package/src/optimizers/langs/shell.js +4 -0
- package/src/optimizers/langs/sql.js +4 -0
- package/src/optimizers/langs/xml.js +3 -0
- package/src/optimizers/langs/yaml.js +3 -0
- package/src/optimizers/tokenizer.js +444 -0
- package/src/utils/git.js +35 -0
- package/src/utils/io.js +79 -0
- package/src/utils/logger.js +25 -0
- package/src/utils/path.js +59 -0
- package/src/utils/style.js +59 -0
package/ARCHITECTURE.md
ADDED
@@ -0,0 +1,94 @@

# System Architecture

**catport** is a modular, functional CLI utility designed around stream processing principles and the Strategy design pattern.

## Core Components

### 1. CLI Layer (`src/cli/`)

* **main.js**: The entry point. It initializes the IO interface, loads configuration, and invokes the Bundler or Extractor.
* **args.js**: A custom, zero-dependency argument parser. It handles flag parsing, type coercion, and schema validation against `src/config/options.js`.
* **ui.js**: Manages `stdout` and `stderr` formatting. It provides ANSI color support via `src/utils/style.js`.

### 2. Business Logic (`src/core/`)

* **Scanner**: A generator-based file system walker.
    * **Logic**: It recursively traverses directories.
    * **Filtering**: It parses `.gitignore` files at each directory level, creating a scoped ignore chain to correctly handle nested repositories and exclusions. It maintains a stack of "Ignore Matchers" where rules in deeper directories override or extend rules from parents.
    * **Lazy Evaluation**: Files are yielded one by one to keep memory usage low, regardless of project size.
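
The scoped ignore chain and lazy traversal can be illustrated with a small sketch. It walks an in-memory tree rather than the real file system so that it stays self-contained. The names `walk` and `parseIgnore` are hypothetical, and exact-name matching is a deliberate simplification of real `.gitignore` patterns, not catport's actual matching logic.

```javascript
// Hypothetical sketch: a lazy, generator-based walker over an in-memory tree.
// Directories are plain objects; files are strings. A ".gitignore" entry in a
// directory adds rules that apply to that directory and everything below it.

// Turn ignore-file text into matcher functions (simplified: exact names only).
function parseIgnore(text) {
  return text
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => line && !line.startsWith('#'))
    .map((name) => (entry) => entry === name);
}

function* walk(dir, prefix = '', matchers = []) {
  // Extend the inherited chain with this directory's own rules.
  const own = typeof dir['.gitignore'] === 'string' ? parseIgnore(dir['.gitignore']) : [];
  const chain = [...matchers, ...own];

  for (const [name, node] of Object.entries(dir)) {
    if (name === '.gitignore') continue;
    if (chain.some((matches) => matches(name))) continue;
    const path = prefix ? `${prefix}/${name}` : name;
    if (typeof node === 'string') {
      yield { path, size: node.length }; // metadata only, one file at a time
    } else {
      yield* walk(node, path, chain); // recurse with the scoped chain
    }
  }
}

const tree = {
  '.gitignore': 'node_modules\n',
  'index.js': 'export {};',
  node_modules: { 'dep.js': '...' },
  src: {
    '.gitignore': 'tmp.js\n',
    'a.js': 'a',
    'tmp.js': 't',
  },
  'tmp.js': 'kept at the root',
};

for (const file of walk(tree)) console.log(file.path);
```

The rule in `src/.gitignore` hides `src/tmp.js` but leaves the root-level `tmp.js` alone, which is the scoping behavior the ignore stack is meant to provide.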

* **Bundler**: The orchestration engine.
    * **Input**: Consumes the `Scanner` iterator.
    * **Processing**: Applies priority sorting, binary detection, and token budgeting.
    * **Concurrency**: Uses a batched `Promise.all` approach to read files in parallel (controlled by the `--concurrency` flag).
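
A batched `Promise.all` loop of the kind mentioned above could look roughly like this. The helper name `mapInBatches` and the stand-in `fakeRead` function are assumptions for illustration; the real Bundler reads files through its IO layer.

```javascript
// Hypothetical helper: process items in fixed-size batches, awaiting each
// batch with Promise.all before starting the next one.
async function mapInBatches(items, batchSize, worker) {
  const results = [];
  for (let i = 0; i < items.length; i += batchSize) {
    const batch = items.slice(i, i + batchSize);
    // At most `batchSize` workers are in flight at any time.
    const done = await Promise.all(batch.map((item) => worker(item)));
    results.push(...done);
  }
  return results; // same order as the input
}

// Usage with a stand-in for an asynchronous file read.
const fakeRead = async (name) => `contents of ${name}`;
mapInBatches(['a.js', 'b.js', 'c.js'], 2, fakeRead).then((contents) => {
  console.log(contents);
});
```

Batching keeps the implementation simple at the cost of waiting for the slowest read in each batch before the next batch starts.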

* **Processor**: The file content transformation unit.
    * **Reading**: Reads the file content (initially a small sample to detect binary, then the full content if text).
    * **Optimization Strategies**:
        * **Internal**: If a standard mode (`minify`, `comments`, etc.) is selected, it delegates to the `Optimizer` service.
        * **External (Custom Transforms)**: If `config.optimizeCmd` is set (via `-O "cmd"`), the Processor bypasses the internal logic. It detects if `{}` is present in the command string.
            * **With `{}`**: It invokes `io.exec(cmd)` replacing `{}` with the quoted, absolute file path.
            * **Without `{}`**: It invokes `io.execPipe(cmd, content)` passing the raw content to the child process's stdin.
            * This allows for infinite extensibility via standard Unix tools (sed, awk, grep, external minifiers).
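
The choice between the two external modes can be sketched as a pure function. `planTransform` and `shellQuote` are hypothetical names, the POSIX single-quote escaping is an assumption about how a path might be quoted safely, and replacing every `{}` occurrence (rather than only the first) is also an assumption.

```javascript
// Hypothetical sketch of choosing between the two external-transform modes.
// Wrap a value in single quotes, escaping embedded single quotes POSIX-style.
function shellQuote(value) {
  return `'${value.replace(/'/g, `'\\''`)}'`;
}

function planTransform(cmd, absolutePath) {
  if (cmd.includes('{}')) {
    // Placeholder mode: substitute each {} with the quoted absolute path.
    return { mode: 'exec', command: cmd.split('{}').join(shellQuote(absolutePath)) };
  }
  // Pipe mode: the command runs unchanged and receives the content on stdin.
  return { mode: 'pipe', command: cmd };
}

console.log(planTransform('wc -l {}', "/repo/it's.js"));
console.log(planTransform("tr '[:lower:]' '[:upper:]'", '/repo/a.js'));
```

Quoting the path before substitution matters because file names may contain spaces or quote characters that the shell would otherwise interpret.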

* **Extractor**: The parsing engine.
    * **Input**: Reads from `stdin` or a file.
    * **Parsing**: Delegates to the appropriate `Formatter` to reconstruct file objects.
    * **Security**: Enforces the sandbox. It resolves target paths and ensures they do not escape the extraction root via `..` segments.

* **Analyzer**: Analysis utilities.
    * **Binary Detection**: Checks the first 1024 bytes of a buffer for null bytes.
    * **Token Counting**: Uses a heuristic (char length / `charsPerToken`) for performance.
    * **Priority Scoring**: Assigns priority scores to files based on pattern matching rules.
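
The two Analyzer heuristics are simple enough to sketch directly. The function names are hypothetical, and rounding the token estimate up is an assumption, since the text only states the ratio.

```javascript
// Hypothetical sketch of the binary-detection and token-count heuristics.
const SAMPLE_SIZE = 1024;

// A buffer is treated as binary if a NUL byte appears in its first 1024 bytes.
function isBinary(bytes) {
  const limit = Math.min(bytes.length, SAMPLE_SIZE);
  for (let i = 0; i < limit; i++) {
    if (bytes[i] === 0) return true;
  }
  return false;
}

// Estimate tokens as character count divided by charsPerToken.
// Rounding up is an assumption made for this sketch.
function estimateTokens(text, charsPerToken = 4.2) {
  return Math.ceil(text.length / charsPerToken);
}

console.log(isBinary(new TextEncoder().encode('plain text'))); // false
console.log(isBinary(Uint8Array.from([0x89, 0x50, 0x00, 0x47]))); // true
console.log(estimateTokens('x'.repeat(100))); // 24
```

Both checks run in time proportional to the sample or the string length, which is why they are cheaper than content sniffing or a real tokenizer.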

### 3. Strategy Layer

* **Formatters** (`src/formatters/`): Pluggable serialization strategies.
    * Each formatter implements a standard interface: `header(meta)`, `file(fileObj)`, `footer(meta)`, and `parse(text)`.
    * Supported strategies: Markdown, XML, JSON, YAML, Multipart.

* **Optimizers** (`src/optimizers/`): Language-aware code optimization.
    * **Tokenizer**: A custom state machine (DFA) that parses source code byte-by-byte.
    * **Definitions**: Language-specific configurations (e.g., `src/optimizers/langs/python.js`) define valid comments, string delimiters, and regex literals.
    * **Safety**: The tokenizer identifies strings and regex literals to ensure that content within them is never modified, while safely removing surrounding whitespace and comments.
    * **Normalization**: Uses string enum values (`'minify'`, `'comments'`, `'whitespace'`) to select the strategy, ensuring configuration consistency across the app.
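
To illustrate why a state machine is needed for safe comment removal, here is a deliberately small sketch. It handles only `//` line comments and quoted strings, works character by character rather than on bytes, and is not catport's tokenizer; the function name is hypothetical.

```javascript
// Hypothetical sketch: strip `//` line comments without touching string
// literals. Real language definitions would also cover block comments,
// template literals, and regex literals.
function stripLineComments(source) {
  let out = '';
  let state = 'code'; // 'code' | 'string' | 'comment'
  let quote = '';

  for (let i = 0; i < source.length; i++) {
    const ch = source[i];
    if (state === 'code') {
      if (ch === '"' || ch === "'") {
        state = 'string';
        quote = ch;
        out += ch;
      } else if (ch === '/' && source[i + 1] === '/') {
        state = 'comment';
        i++; // skip the second slash
      } else {
        out += ch;
      }
    } else if (state === 'string') {
      out += ch;
      if (ch === '\\' && i + 1 < source.length) {
        out += source[++i]; // keep the escaped character verbatim
      } else if (ch === quote) {
        state = 'code';
      }
    } else if (ch === '\n') {
      // A comment ends at the newline, which is kept.
      state = 'code';
      out += ch;
    }
  }
  return out;
}

console.log(stripLineComments('const url = "http://x"; // trailing note\n'));
```

A plain regular expression such as `/\/\/.*$/gm` would have truncated the URL inside the string; tracking the current state is what prevents that.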

### 4. Utilities (`src/utils/`)

* **NodeIO**: A dependency injection wrapper for `node:fs`, `node:process`, and `node:child_process`. This allows the core logic to be unit-tested with a mock file system without mocking native modules.
* **Logger**: Provides leveled logging (ERROR, WARN, INFO, DEBUG).

## Data Flow

### Bundling Process

1. **CLI** parses arguments -> `Config` object.
2. **Scanner** yields `FileItem` objects (metadata only).
3. **Bundler** collects items, applies priority sorting.
4. **Bundler** reads file content in batches (IO).
5. **Processor** reads sample, checks binary, reads full content.
6. **Processor** transforms content via `Optimizer` OR external command.
7. **Analyzer** counts tokens of the processed block.
8. **Formatter** serializes the `FileItem` into a string.
9. **CLI** writes the string to the output stream.

### Extraction Process

1. **CLI** reads input stream -> String buffer.
2. **Formatter** detects format and parses String -> `FileItem[]`.
3. **Extractor** iterates over `FileItem[]`.
4. **Extractor** sanitizes path and checks security bounds.
5. **Extractor** writes content to disk (IO).

## Security Model

The extraction process assumes untrusted input (e.g., generated by an LLM).

* **Sandbox**: Files are written relative to the current working directory (or specified `--extract-dir`).
* **Path Traversal**: Paths like `../../etc/passwd` are detected. The system resolves the absolute path of the target and verifies it starts with the absolute path of the extraction root.
* **Overrides**: The `--unsafe` flag bypasses these checks.
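
The containment check can be sketched as follows. To keep the example free of imports it normalizes POSIX-style paths by hand; in Node.js that step would typically be done with `path.resolve()`. The function names are hypothetical, and comparing against the root plus a trailing separator is a precaution added in this sketch, not something the text above confirms.

```javascript
// Hypothetical sketch of the containment check, for POSIX-style paths only.
// Collapse ".", "..", and repeated slashes in an absolute path.
function normalize(absPath) {
  const parts = [];
  for (const segment of absPath.split('/')) {
    if (segment === '' || segment === '.') continue;
    if (segment === '..') parts.pop();
    else parts.push(segment);
  }
  return '/' + parts.join('/');
}

function resolveInside(root, target) {
  const base = normalize(root);
  // Absolute targets replace the root; relative ones are joined to it.
  const full = normalize(target.startsWith('/') ? target : `${base}/${target}`);
  // Compare against "root/" so that /work/out-evil is not accepted for /work/out.
  const prefix = base === '/' ? '/' : `${base}/`;
  if (full !== base && !full.startsWith(prefix)) {
    throw new Error(`Refusing to write outside ${base}: ${target}`);
  }
  return full;
}

console.log(resolveInside('/work/out', 'src/a.js')); // /work/out/src/a.js
try {
  resolveInside('/work/out', '../../etc/passwd');
} catch (err) {
  console.log(err.message);
}
```

A bare `startsWith(root)` test would accept a sibling directory such as `/work/out-evil`, which is why the sketch appends the separator before comparing.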

package/CONTRIBUTING.md
ADDED
@@ -0,0 +1,133 @@

# Contributing to catport

Thank you for your interest in contributing!
**catport** is a deliberately minimalist, zero-dependency Node.js tool that follows the Unix philosophy to the letter. To keep it fast, reliable, and composable for years to come, we enforce strict principles. Contributions that respect these constraints are merged quickly; anything that introduces complexity or deviates from the core mission will be politely declined.

## Project Philosophy

**catport** strictly adheres to the Unix tradition: *Do one thing, and do it well.*

- It is a pure filter: reads text from stdin, writes text to stdout, and does nothing else.
- It must remain predictable, composable, and pipeline-friendly.
- It works seamlessly with core utilities (`cat`, `grep`, `sed`, `awk`, `jq`, shell redirects, etc.).
- No hidden side effects, no unsolicited output to stderr unless reporting genuine errors, no unnecessary features.

If a proposed feature cannot be meaningfully used in a one-liner, it does not belong in catport.

## Core Technical Constraints

- **Zero external dependencies**
  We use only Node.js built-in modules. No lodash, chalk, commander, or any other third-party packages.
  This keeps the project lightweight, eliminates supply-chain risks, simplifies installation, and guarantees long-term maintainability.
  *Note: This means you cannot simply `npm install` a library to solve a problem. You must write the logic yourself or stick to the standard library.*

- **Performance First**
  The tool must stay fast and memory-efficient even when processing large amounts of text or running as part of a long pipeline over large codebases. Avoid blocking I/O where possible; prefer streaming transforms.

- **No build step**
  Plain JavaScript (ESM) that runs directly on Node.js ≥ 20. No transpilation, no bundlers.

- **Clarity over cleverness**
  The entire implementation should be understandable in one sitting by anyone familiar with Node.js streams.

## Contribution Guidelines

1. **Open an issue first** for anything beyond trivial fixes, so we can agree that the change aligns with the project’s scope.
2. Add or update tests when touching functionality (`npm test` must pass).
3. Keep the binary small and the `--help` output concise.
4. Update README or man page if you introduce visible behavior changes.

## Development Setup

Clone the repository and install development dependencies (linting only):

```bash
git clone https://github.com/Jelodar/catport.git
cd catport
npm install
```
Required runtime: **Node.js ≥ 20.0.0**

### Directory Structure

* `src/cli/`: **Interface Layer**. Handles argument parsing, validation, and UI output.
* `src/core/`: **Business Logic Layer**.
    * `Bundler`: Orchestrates the flow.
    * `Scanner`: Generator-based filesystem walker.
    * `Extractor`: Parser and writer for restoring files.
    * `Analyzer`: Heuristics for binary detection and token counting.
* `src/formatters/`: **Strategy Pattern**. Pluggable output formats (Markdown, XML, etc.).
* `src/optimizers/`: **Optimization Layer**. Language-aware tokenizers.
* `src/utils/`: **Infrastructure**. Stateless helpers (IO, Git, Logger).

Please see [ARCHITECTURE.md](ARCHITECTURE.md) for more architectural details.

### Data Flow

1. **CLI** parses arguments (zero-dep) -> Config Object.
2. **Scanner** yields `FileItem` objects lazily.
3. **Analyzer** enriches `FileItem` (isBinary, priority score).
4. **Bundler** filters, sorts, and batches `FileItems`.
5. **Processor** determines whether to use internal `Optimizer` or external shell commands.
6. **Optimizer** (if used) transforms content (Strategy).
7. **Formatter** serializes content (Strategy).
8. **IO** writes to stream.

### Testing

We use the native `node:test` runner. All new features and bug fixes must include unit tests.

### Running Tests

Run the full test suite:

```bash
# Run all tests
npm test

# Run specific suite
node --test tests/unit/scanner.test.js
```

All tests must pass and code must be formatted before a PR is considered.

## Contribution examples

### Adding a New Format

1. Create `src/formatters/<format>.js`.
2. Implement the required methods: `header(meta)`, `file(fileObj)`, `footer(meta)`, `parse(text)`.
3. Register the format in `src/formatters/index.js`.
4. Add unit tests in `tests/unit/formatters.test.js`.
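
As a rough illustration of step 2, a formatter for an invented plain-text format might look like the sketch below. The marker syntax, the `FOOTER_PREFIX` constant, and the shapes of `meta` and `fileObj` are all assumptions; check the existing formatters for the real field names and for how registration in `src/formatters/index.js` works.

```javascript
// Hypothetical formatter: a plain-text format with one "=== path ===" marker
// per file. The shapes of `meta` and `fileObj` are assumed for illustration.
const FOOTER_PREFIX = '# end of bundle';

const plainFormatter = {
  header(meta) {
    return `# bundle: ${meta.name}\n`;
  },
  file(fileObj) {
    return `=== ${fileObj.path} ===\n${fileObj.content}\n`;
  },
  footer(meta) {
    return `${FOOTER_PREFIX} (${meta.count} files)\n`;
  },
  // Inverse of file(): rebuild { path, content } objects from bundle text.
  parse(text) {
    const files = [];
    let current = null;
    for (const line of text.split('\n')) {
      if (line.startsWith(FOOTER_PREFIX)) break;
      const marker = /^=== (.+) ===$/.exec(line);
      if (marker) {
        current = { path: marker[1], lines: [] };
        files.push(current);
      } else if (current) {
        current.lines.push(line);
      }
    }
    return files.map(({ path, lines }) => ({ path, content: lines.join('\n') }));
  },
};

const meta = { name: 'demo', count: 2 };
const input = [
  { path: 'a.txt', content: 'hello\nworld' },
  { path: 'src/b.js', content: 'export const b = 1;' },
];
const bundle =
  plainFormatter.header(meta) +
  input.map((f) => plainFormatter.file(f)).join('') +
  plainFormatter.footer(meta);

console.log(plainFormatter.parse(bundle));
```

A round trip through `file()` and `parse()` is a natural first unit test for any new format. Note that this toy format would misparse file content containing a line that looks like a marker or footer; a production format needs an escaping scheme.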

### Adding a New Optimizer

* **External (Preferred for one-offs)**: If you need a specific transformation (e.g., minifying a rare language), consider simply using the `-O "command"` feature rather than modifying the core code. This aligns with our zero-dependency philosophy by leveraging user-installed tools (like `black` for Python or `gofmt` for Go).
* **Internal (Core support)**: To add native support for a new language in `minify` mode:
    1. Update `src/optimizers/definitions.js` with language-specific syntax rules (comments, strings, etc.).
    2. If complex logic is needed (e.g. non-standard string delimiters), update the `Tokenizer` state machine in `src/optimizers/tokenizer.js`.
    3. Add unit tests in `tests/unit/optimizer.test.js`.

## Contribution Workflow

1. **Open an issue first** for anything more than a typo or obvious bug fix.
   We want to confirm the change fits the project’s narrow scope before code is written.
2. Fork & create a descriptively named branch (`fix/xyz`, `feat/xyz`).
3. Write clean, tested code that respects the constraints above.
4. Update README and/or `--help` output if user-visible behavior changes.
5. Submit a Pull Request with:
   - A clear description of the problem and solution
   - Reference to the related issue

Small, focused PRs are strongly preferred.

If you’re unsure whether something fits, just ask in an issue; we’re friendly and respond quickly.

Thank you for helping keep **catport** fast, simple, and true to the Unix way!

package/LICENSE
ADDED
@@ -0,0 +1,21 @@

MIT License

Copyright (c) Reza Jelodar.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

package/README.md
ADDED
@@ -0,0 +1,414 @@

# catport

<p align="center">
  <samp>Deterministic filesystem serializer for LLM context generation.</samp>
  <br><br>
  <a href="https://www.npmjs.com/package/catport">
    <img src="https://img.shields.io/npm/v/catport?style=flat-square&color=000" alt="npm" />
  </a>
  <a href="./LICENSE">
    <img src="https://img.shields.io/npm/l/catport?style=flat-square&color=000" alt="license" />
  </a>
  <img src="https://img.shields.io/badge/dependencies-0-000?style=flat-square" alt="zero dependencies" />
  <img src="https://img.shields.io/badge/node-%3E%3D20-000?style=flat-square" alt="node" />
</p>

---

**catport** is a high-performance, **zero-dependency** Node.js tool that converts complex project directories into clean, structured, token-efficient text streams suitable for Large Language Models, and back again.

It functions as a high-fidelity, bidirectional conduit:

1. **Bundling**: Scans source directories, applies rigorous filtering (gitignore, binary detection), performs language-aware code optimization (minification), and produces a structured output stream.
2. **Extraction**: Ingests `catport` bundles (or LLM responses), validates structure, prevents path traversal attacks, and restores the filesystem.

All while respecting `.gitignore`, token budgets, code minification, and filesystem security.

---

## See it in action

```console
$ catport -g HEAD -O minify -o context.md

> Found 12 files changed in git
> Optimizing (minify)...
> Token estimate: 14.2k
> Wrote context.md (45ms)
```

---

## Why catport?

In an era of integrated AI tools like Cursor or Claude Code, you might wonder why a standalone CLI is necessary. The answer lies in control and composability.

Integrated tools are often black boxes. They use hidden heuristics to select context, locking you into their specific workflow and API. Catport takes a rigorous engineering approach. It generates pure, deterministic text that you can pipe anywhere. You can feed it to local Ollama instances, the ChatGPT web UI, or custom CI pipelines. You are not tied to a specific vendor.

It also solves the token economy problem. Most IDE plugins dump raw text, wasting valuable context window space. Catport's syntax-aware minification reduces payload size significantly, allowing you to fit more logic into the model's memory without losing meaning.

Cursor is built for flow. Catport is built for engineering. It is the `tar` command for the LLM age. Simple, fundamental, and model-agnostic.

### vs The World

| Feature | `catport` | `tar` / `cp` | IDE Plugins |
| :--- | :---: | :---: | :---: |
| **Output** | LLM-Ready Text | Binary / Raw | Raw Text |
| **Git Aware** | ✅ | ❌ | ⚠️ |
| **Token Budget** | ✅ | ❌ | ❌ |
| **Minification** | ✅ | ❌ | ❌ |
| **Restorable** | ✅ | ✅ | ❌ |
| **Dependencies** | 0 | 0 | Many |

---

## Features

* **Zero Dependencies**: Built entirely on the Node.js standard library. No bloat, no supply chain risks.
* **Polyglot Minification**: Advanced, syntax-aware whitespace removal for common file extensions and languages (JS, Python, Rust, Go, SQL, etc.).
* **Token Budgeting**: Prioritize essential files (`README.md`, `src/core/`) and hard-stop when a token limit is reached.
* **Git Aware**: Smartly filters `.gitignore` rules and can bundle only files changed relative to a specific git commit/branch.
* **Secure Extraction**: Sandboxed extraction prevents malicious paths (`../../etc/passwd`) from escaping the target directory.
* **Multiple Formats**: Output to Markdown, XML (CDATA), JSON, YAML, or MIME Multipart based on your prompting strategy.
* **Custom Transforms**: Pipe file content through external shell commands (e.g., `sed`, `awk`, `terser`) for limitless customization.

---

## Installation

### CLI Usage

Install globally to use as a command-line tool:

```bash
npm install -g catport
```

### Library Usage

Install as a dev dependency to use within your build scripts:

```bash
npm install -D catport
```

---

# CLI Usage

## Quick Start

### Bundling (Code to Text)

The default behavior bundles the current directory into a Markdown file.

```bash
# Bundle current directory to stdout
catport

# Bundle specific paths to a file
catport src/ tests/ -o context.md

# Bundle only git changes (unstaged + staged vs main)
catport -g main -o diff.md
```

### Extraction (Text to Code)

Extract code blocks from an LLM response or an existing bundle.

```bash
# Extract from a file to the current directory
catport -x response.md

# Extract to a specific directory
catport -x response.md -d ./output/
```

---

## CLI Reference

### General Options

| Flag | Description |
| --- | --- |
| `-h, --help` | Show help |
| `-V, --version` | Show version |
| `-v, --verbose` | Verbose logging (skipped files, token estimates, priority scores) |
| `-o, --output <file>` | Write to file instead of stdout |

### Bundling Options

| Flag | Description |
| --- | --- |
| `-f, --format <fmt>` | `md` / `xml` / `json` / `yaml` / `multipart` (default: `md`) |
| `-R, --reply-format <fmt>` | Tell the LLM which format to reply in (useful when sending MD but wanting XML back) |
| `-C, --context "<text>"` | Prepend custom system prompt / context block |
| `-T, --task "<text>"` | Append task instruction at the end |
| `-I, --no-instruct` | Disable auto-generated "how to reply" instructions |
| `-n, --no-structure` | Disable directory structure generation |
| `-l, --list-dirs` | Include directories in the structure listing |
| `-k, --skeleton` | Output only directory tree (no file contents) – great for high-level context before deep dives |
| `-O, --optimize <mode>` | `none` / `whitespace` / `comments` / `minify` OR shell command (default: `none`) |
| `-S, --max-size <size>` | Max size per file to process (e.g. `1MB`, `500KB`). Larger files are skipped unless a custom transform is used. (default: `10MB`) |
| `-c, --chars-per-token <n>` | Characters per token for budget estimation (default: `4.2`) |
| `-P, --concurrency <n>` | Max concurrent file reads (default: `32`) |

### Filtering & Scoping

| Flag | Description |
| --- | --- |
| `-e, --extensions <list>` | Comma-separated extensions to include (e.g. `js,ts,py,rust`) |
| `-i, --ignore <glob>` | Additional ignore globs (can be repeated) |
| `-u, --no-ignore` | **Unrestricted mode** – ignore `.gitignore` and built-in exclusions |
| `-g, --git-diff [ref]` | Only changed files vs `ref` (default: `HEAD`). Omit ref → unstaged + staged + untracked changes |
| `-b, --budget <tokens>` | Stop bundling when estimated tokens exceed this number |
| `-p, --priority <rule>` | `pattern:score` – higher score = bundled earlier when budget is tight (can be repeated) |

### XML-Specific

| Flag | Description |
| --- | --- |
| `-X, --xml-mode <mode>` | `auto` (default), `cdata`, or `escape` |

### Extraction Options

| Flag | Description |
| --- | --- |
| `-x, --extract` | Enable extraction mode |
| `-d, --extract-dir <dir>` | Target directory (default: current directory) |
| `-U, --unsafe` | **Disable security sandbox** – allow `../` and absolute paths (use only in trusted environments) |

---

## Optimization Levels (`-O`)

| Mode | What it does |
| --- | --- |
| `none` | Verbatim content |
| `whitespace` | Trim trailing whitespace, collapse blank lines |
| `comments` | + Remove `//`, `#`, `/* */` comments |
| `minify` | Advanced language-aware minification (preserves strings, regex, heredocs, indentation logic) |

---

## Deep Dive: Custom Transforms

`catport`'s design philosophy favors the "Unix Way": small tools working together. Instead of building a plugin system for every possible modification (obfuscation, transcoding, redaction), `catport` allows you to pipe file content through **any shell command**.

You can pass a command string to `--optimize` (`-O`) instead of a preset mode. This enables you to use standard tools like `sed`, `awk`, `tr`, or external CLI binaries.

### Mode A: Standard Streams (Piping)

If your command string does **not** contain the placeholder `{}`, `catport` will spawn your command as a child process.
1. It writes the file's original content to the child process's `stdin`.
2. It captures the `stdout` as the new file content.
3. It bundles the result.

**Examples:**

```bash
# Redact sensitive keys using sed
catport -O "sed 's/API_KEY=[a-zA-Z0-9]*/API_KEY=REDACTED/g'"

# Convert all text to uppercase (using tr)
catport -O "tr '[:lower:]' '[:upper:]'"

# Prepend a copyright header to every file
catport -O "cat copyright_header.txt -"
```

### Mode B: File Replacement (Placeholder)

If your command string contains `{}`, `catport` will substitute `{}` with the **absolute path** of the file being processed.
1. It executes the command directly.
2. It captures `stdout` as the new content.
3. It skips its own read of the original file content, so the file is not read twice.

This is useful for tools that expect a file path argument rather than stdin, or when you want to output metadata about a file instead of its content.

**Examples:**

```bash
# Use an external minifier (e.g., uglify-js)
catport -O "uglifyjs {} --compress --mangle"

# Bundle file statistics instead of content
catport -O "stat {}"

# Use wc to count lines
catport -O "wc -l {}"
```

### Supported Languages in `minify` mode

When using the built-in `minify` mode, `catport` uses a custom, zero-dependency tokenizer that understands the syntax of these languages to safely remove comments and whitespace without breaking strings or regex literals:

- **Web**: JavaScript, TypeScript, JSX/TSX, HTML, CSS, SCSS, Less
- **Scripting**: Python, Ruby, Perl, PHP, Lua, Bash/Zsh, PowerShell, Batch
- **Systems**: C, C++, C#, Java, Go, Rust, Swift, Kotlin
- **Data/Config**: JSON, YAML, XML, SQL, INI, TOML, Dockerfile, Clojure
- **Docs**: Markdown (collapses whitespace, strips HTML comments)

---
|
|
255
|
+
|
|
256
|
+
## Deep Dive: Git Integration (`-g`)
|
|
257
|
+
|
|
258
|
+
The `-g` or `--git-diff` flag turns `catport` into a smart diffing tool for LLMs. It leverages your local `git` binary to identify changed files.
|
|
259
|
+
|
|
260
|
+
### 1. Working Directory Changes (`-g` or `-g HEAD`)
|
|
261
|
+
By default, `catport -g` (equivalent to `-g HEAD`) bundles:
|
|
262
|
+
1. **Staged changes**: Files you have `git add`-ed.
|
|
263
|
+
2. **Unstaged changes**: Modified files you haven't added yet.
|
|
264
|
+
3. **Untracked files**: New files (unless ignored).
|
|
265
|
+
|
|
266
|
+
This is perfect for "Review my current work" prompts.
|
|
267
|
+
|
|
268
|
+
### 2. Branch Comparisons (`-g main`)
|
|
269
|
+
When you provide a ref, `catport` runs `git diff --name-only <ref>` to find files that differ between your current state and that reference.
|
|
270
|
+
* `catport -g main`: What has changed in my feature branch vs main?
|
|
271
|
+
* `catport -g v1.0.0`: What has changed since the last release?

### 3. CI/CD Usage
You can use this in CI pipelines to generate changelogs or automated PR summaries.
```bash
# In a GitHub Action
catport -g origin/main -O minify -o changes.md
# Send changes.md to LLM...
```

---

## Deep Dive: Token Budgeting & Priority

Context windows are limited (e.g., 32k, 128k, 1M tokens). `catport` helps you maximize value within that limit using `-b` (budget) and `-p` (priority).

### How Counting Works
`catport` estimates tokens with a heuristic: `length in characters / charsPerToken` (default: 4.2). This is faster than running a real tokenizer (like cl100k_base) and accurate enough for estimation.
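
As a rough illustration, the heuristic fits in a few lines. The function name `estimateTokens` and the choice to round up are assumptions made for this sketch; only the formula and the 4.2 default come from the text above.

```javascript
// Default ratio quoted above; real tokenizers vary by model and content.
const DEFAULT_CHARS_PER_TOKEN = 4.2;

// Estimate tokens from the character count.
// Note: `text.length` counts UTF-16 code units, which is close enough here.
// Rounding up is an assumption; it avoids under-counting short files.
function estimateTokens(text, charsPerToken = DEFAULT_CHARS_PER_TOKEN) {
  if (!(charsPerToken > 0)) {
    throw new RangeError('charsPerToken must be a positive number');
  }
  return Math.ceil(text.length / charsPerToken);
}

console.log(estimateTokens('hello world', 4)); // 11 chars / 4 = 2.75 -> 3
```

A lower `charsPerToken` makes the estimate more conservative, which may suit dense code or non-English text.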

### The Algorithm
1. **Scan**: All matching files are found.
2. **Score**: Each file is assigned a priority score (Default: 1).
   * Files matching a `-p` rule get that rule's score.
   * Example: `-p "package.json:100"` gives `package.json` score 100.
3. **Sort**: Files are sorted by Score (Descending) -> Name (Ascending).
4. **Bundle**: `catport` iterates through the sorted list.
   * It calculates the token cost of the file.
   * `Current Total + File Cost <= Budget`? Include it.
   * `Current Total + File Cost > Budget`? Skip it (and log a warning).
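
The steps above can be sketched as a small selection routine. This is an illustration under stated assumptions, not `catport`'s code: the helper names are hypothetical, the glob matching is a simplified stand-in (`**` spans directories, `*` does not), the first matching rule wins, files are sorted by path, and token costs are passed in precomputed.

```javascript
// Simplified glob support for this sketch only.
function globToRegExp(glob) {
  const escaped = glob.replace(/[.+?^${}()|[\]\\]/g, '\\$&');
  const pattern = escaped
    .replace(/\*\*/g, '\u0000') // park `**` so the next step leaves it alone
    .replace(/\*/g, '[^/]*')
    .replace(/\u0000/g, '.*');
  return new RegExp(`^${pattern}$`);
}

// Parse a "pattern:score" rule such as "package.json:100".
function parseRule(rule) {
  const i = rule.lastIndexOf(':');
  if (i === -1) throw new Error(`Rule needs a ":score" suffix: ${rule}`);
  return { regex: globToRegExp(rule.slice(0, i)), score: Number(rule.slice(i + 1)) };
}

// First matching rule wins (an assumption); unmatched files score 1.
function scoreOf(path, rules) {
  const hit = rules.find((r) => r.regex.test(path));
  return hit ? hit.score : 1;
}

// files: [{ path, tokens }]. Returns what fits, in bundle order, plus what was skipped.
function selectWithinBudget(files, ruleStrings, budget) {
  const rules = ruleStrings.map(parseRule);
  const sorted = files
    .map((f) => ({ ...f, score: scoreOf(f.path, rules) }))
    .sort((a, b) => b.score - a.score || a.path.localeCompare(b.path));

  const included = [];
  const skipped = [];
  let total = 0;
  for (const file of sorted) {
    if (total + file.tokens <= budget) {
      included.push(file.path);
      total += file.tokens;
    } else {
      skipped.push(file.path); // a real tool would log a warning here
    }
  }
  return { included, skipped, total };
}

const result = selectWithinBudget(
  [
    { path: 'src/big.js', tokens: 900 },
    { path: 'package.json', tokens: 100 },
    { path: 'src/small.js', tokens: 300 },
    { path: 'notes.txt', tokens: 50 },
  ],
  ['package.json:100', 'src/**:80'],
  500
);
console.log(result);
```

With a budget of 500, `src/big.js` is skipped while the smaller files behind it still fit, because skipping a file does not stop the loop.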

### Skeleton Mode (`-k`)
If your project is huge, start with `-k`.
1. Run `catport -k > tree.md`
2. Send `tree.md` to the LLM: *"Here is my file structure. Where should I add the new AuthController?"*
3. LLM responds with a path.
4. Run `catport src/controllers/AuthController.ts` to get context for just that file.

---

## Output Formats

| Format    | `--format` value | Best Use Case                                      |
|-----------|------------------|----------------------------------------------------|
| Markdown  | `md`             | Highest readability for most LLMs                  |
| XML       | `xml`            | Strict parsing, CDATA prevents hallucinated fences |
| JSON      | `json`           | Programmatic consumption, easy to parse post-LLM   |
| YAML      | `yaml`           | Clean, human-readable alternative to JSON          |
| Multipart | `multipart`      | Clean, low-complexity, token-saving format         |

---

## Configuration File (`.catport.json`)

`catport` looks for a `.catport.json` file in the current working directory.
CLI flags override these values.

```json
{
  "format": "xml",
  "optimize": "minify",
  "budget": 60000,
  "maxSize": "5MB",
  "charsPerToken": 4.0,
  "gitDiff": "origin/main",
  "ignore": [
    "dist/",
    "node_modules/",
    "**/*.test.ts"
  ],
  "priority": [
    "README.md:200",
    "package.json:150",
    "src/**:80",
    "*.md:50"
  ],
  "replyFormat": "xml",
  "xmlMode": "auto",
  "concurrency": 32
}
```
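
The precedence rule (CLI flags over file values) could be implemented along these lines. This is a hedged sketch: `resolveConfig` and `defined` are hypothetical names, and the `DEFAULTS` object holds placeholder values, since only the `charsPerToken` default of 4.2 is stated in this README.

```javascript
// Placeholder defaults for illustration; only charsPerToken (4.2) is documented.
const DEFAULTS = { format: 'md', optimize: 'none', charsPerToken: 4.2 };

// Drop keys whose value is undefined so an unset CLI flag cannot erase a file value.
function defined(obj) {
  return Object.fromEntries(
    Object.entries(obj).filter(([, value]) => value !== undefined)
  );
}

// Later sources win: defaults < .catport.json < CLI flags.
function resolveConfig(fileConfig, cliFlags) {
  return { ...DEFAULTS, ...defined(fileConfig), ...defined(cliFlags) };
}

const fileConfig = JSON.parse('{"format": "xml", "budget": 60000}');
const cliFlags = { format: 'json', budget: undefined };

// `format` comes from the CLI, `budget` from the file, the rest from defaults.
console.log(resolveConfig(fileConfig, cliFlags));
```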

---

## Programmatic Usage

You can use `catport` directly in your Node.js applications.

```javascript
import { Bundler, Extractor, NodeIO } from 'catport';

// 1. Bundle files
const config = {
  paths: ['./src'],
  format: 'xml',
  optimize: 'comments', // 'none', 'whitespace', 'comments', 'minify'
  ignore: ['**/*.spec.ts']
};

// Writes to stdout by default, or provide a custom IO writer
await Bundler.run(config, NodeIO);

// 2. Extract files
const extractConfig = {
  paths: ['response.xml'],
  extractDir: './output'
};

await Extractor.run(extractConfig, NodeIO);
```

---

## Security (Extraction Mode)

By default, extraction is **sandboxed**:

- All paths are normalized and resolved relative to `--extract-dir`
- `../` sequences that escape the target directory are blocked
- Absolute paths are rejected
- Hidden files (`.env`, `.git`) are allowed only if explicitly included
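
A path check in the spirit of the first three rules above might look like the following. This is a sketch using Node's built-in `path` module, not `catport`'s actual code; `resolveInside` is a hypothetical name, and the hidden-file rule is left out.

```javascript
const path = require('node:path');

// Resolve `entryPath` inside `extractDir`, or throw if it would land outside.
function resolveInside(extractDir, entryPath) {
  if (path.isAbsolute(entryPath)) {
    throw new Error(`Absolute path rejected: ${entryPath}`);
  }
  const root = path.resolve(extractDir);
  const target = path.resolve(root, entryPath); // also normalizes `.` and `..`
  const relative = path.relative(root, target);
  // If the relative path climbs upward, the target is outside the root.
  // (On Windows, a different drive yields an absolute `relative`.)
  if (
    relative === '..' ||
    relative.startsWith(`..${path.sep}`) ||
    path.isAbsolute(relative)
  ) {
    throw new Error(`Path escapes extract dir: ${entryPath}`);
  }
  return target;
}

// `..` segments that stay inside the directory are fine after normalization.
console.log(resolveInside('./output', 'src/a/../index.js'));
```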

Use `--unsafe` (`-U`) **only** when you fully trust the source (e.g., internal automation).

---

## Use Cases & Workflows

| Workflow                               | Typical Command Sequence                                                                         |
|----------------------------------------|--------------------------------------------------------------------------------------------------|
| **Code review**                        | `catport -g main -O minify -b 60000 -o context.md` → send → receive → `catport -x response.md`   |
| **Agentic coding loop**                | Use `--reply-format xml` + `--format xml` for reliable round-tripping                            |
| **High-level planning**                | `catport -k > tree.md` → ask LLM for architecture suggestions → then deep dive                   |
| **Token-constrained models**           | `-b 16000 -O minify` with strong priority rules                                                  |
| **Editor plugin / AI pair programmer** | Programmatic API + `--skeleton` + selective `-p` rules                                           |

---

## Contributing

We welcome contributions! Please see [CONTRIBUTING.md](CONTRIBUTING.md) for details on the architectural philosophy (Unix-style pipes, zero-deps) and coding standards.

## License

MIT