@woladi/sortai 0.1.0 → 0.1.1
- package/README.md +124 -90
- package/dist/cli.js +40 -5
- package/package.json +2 -2
package/README.md
CHANGED

    @@ -1,46 +1,34 @@
     # sortai
     
    -
    +[](https://www.npmjs.com/package/@woladi/sortai)
    +[](https://www.npmjs.com/package/@woladi/sortai)
    +[](./LICENSE)
     
    -
    +> macOS CLI that scans a folder, reads every document with **Apple Vision OCR**, and automatically writes **Finder tags** and **Finder comments** — so your files become searchable in Spotlight and browsable by tag in Finder. Runs fully offline by default. Cloud LLMs optional.
     
    -##
    -
    -- macOS 12+
    -- Node.js 20+
    -- Xcode Command Line Tools (`xcode-select --install`) — needed by `macos-vision` to build its Swift binary at install time
    -- One of:
    -- [Ollama](https://ollama.com) running locally (default) — keeps everything offline
    -- Anthropic or OpenAI API key — for cloud LLM with optional `--mask`
    -
    -## Quick start
    -
    -```bash
    -# First run creates ~/.config/sortai/config.json with the default taxonomy
    -npx sortai
    -
    -# Dry-run on the Desktop with local Ollama (default mistral-nemo)
    -npx sortai ~/Desktop --dry-run
    +## What it does
     
    -
    -npx sortai ~/Desktop
    -```
    +`sortai` walks a folder recursively, reads the content of PDFs and images using Apple's on-device Vision framework (via [`macos-vision`](https://www.npmjs.com/package/macos-vision)), and uses a language model to infer what the file is about. It then writes that understanding directly into the file's macOS metadata:
     
    -
    +- **Finder tags** — coloured labels visible in Finder's sidebar and file listings (e.g. `#Faktura`, `#Umowa`, `#CV`)
    +- **Finder comment** — a one-sentence description visible in the "Get Info" panel (`⌘I`) and in Spotlight search results
     
    -
    +These are standard macOS extended attributes (`xattr`), not a separate database. They travel with the file, work offline, and are indexed by Spotlight immediately.
     
    -
    -# Anthropic Claude, with PII masked locally via pseudonym-mcp before the upstream call
    -npx sortai ~/Desktop --cloud anthropic --mask --api-key sk-ant-...
    +### How it translates to Finder and Spotlight
     
    -
    -ANTHROPIC_API_KEY=sk-ant-... npx sortai ~/Desktop --cloud openai
    -```
    +After `sortai` runs, you can:
     
    -
    +| Action | How |
    +|--------|-----|
    +| Browse all invoices | Finder sidebar → click `#Faktura` tag |
    +| Search by tag in Spotlight | `⌘Space` → type `tag:Faktura` |
    +| Search by comment in Spotlight | `⌘Space` → type any word from the comment |
    +| Filter by tag in Finder | Finder → `⌘F` → Add criteria → Tags |
    +| See description without opening | Select file → `⌘I` → Spotlight Comments |
    +| Smart folder by tag | Finder → New Smart Folder → Tags is `Faktura` |
     
    -
    +Tags and comments are written as binary plist `xattr` entries (`com.apple.metadata:_kMDItemUserTags`, `com.apple.metadata:kMDItemFinderComment`) — the same format Finder itself uses when you manually add a tag. After writing, `sortai` calls `mdimport` to trigger immediate Spotlight reindexing.
     
     ## How it works
     

    @@ -48,32 +36,101 @@ When `--mask` is set, `sortai` spawns [`pseudonym-mcp`](https://www.npmjs.com/pa
     folder (recursive walk, .dotfiles + excluded dirs skipped)
     │
     ▼
    -dedup
    +dedup: SHA256 over file bytes → identical files → #Duplikat pre-tag
     │
     ▼ for each file
    -macos-vision
    -
    +macos-vision → Apple Vision OCR (on-device, no network)
    +│ PDF: auto-rasterised, page-bounded (default: first 2 pages)
    +│ Images: PNG, JPG, HEIC, WEBP
     │
     ▼
    -pretag
    +pretag: regex rules over filepath + OCR text → quick pre-tags
     │
     ▼ ≥4 pre-tags AND no OCR text → skip LLM (fast path)
    -LLM
    -├── default: local Ollama (mistral-nemo) —
    +LLM inference: filename + extension + pre-tags + OCR text → tags + comment
    +├── default: local Ollama (mistral-nemo) — 100% offline
     └── --cloud anthropic|openai:
    -├── --mask → pseudonym-mcp
    -├── cloud LLM
    -└── --mask → pseudonym-mcp
    +├── --mask → pseudonym-mcp masks PII in OCR text (PESEL, names, IBANs…)
    +├── cloud LLM receives masked OCR text
    +└── --mask → pseudonym-mcp restores originals in the returned comment
    +│
    +▼ strict-evidence validation (e.g. #Bank only if "iban"/"rachunek" appears literally)
    +│ per-file 180 s watchdog → fallback to pre-tags if LLM hangs
     │
    -
    -
    -macos.ts: xattr -wx + binary plist
    -├── com.apple.metadata:_kMDItemUserTags (Finder tags)
    -└── com.apple.metadata:kMDItemFinderComment (Finder comment)
    -├── mdimport <file> (Spotlight reindex, fire-and-forget)
    +xattr: write Finder tags + Finder comment as binary plist
    +mdimport: trigger Spotlight reindex (fire-and-forget)
     ```
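The `pretag` step and the fast-path check in the diagram can be sketched roughly as follows. This is an illustration, not sortai's actual code: the rule shape (`{ pattern, tag }`), the function names `preTagsFor` and `canSkipLlm`, and the sample rules are assumptions. Only the "≥4 pre-tags AND no OCR text" condition and the idea of matching regexes over filepath plus OCR text come from the README; the path normalisation mirrors the expression quoted in the 0.1.0 README.

```javascript
// Hypothetical rule shape: sortai's real config schema may differ.
const pathRules = [
  { pattern: "faktura|invoice", tag: "#Faktura" },
  { pattern: "umowa|contract", tag: "#Umowa" },
  { pattern: "\\bcv\\b|resume", tag: "#CV" },
];

// Match every rule against the normalised path plus OCR text; collect unique tags.
function preTagsFor(filePath, ocrText, rules) {
  // Path separators, underscores and hyphens become spaces so word boundaries work.
  const haystack = filePath.replace(/[\\/_-]/g, " ") + " " + ocrText;
  const tags = new Set();
  for (const { pattern, tag } of rules) {
    if (new RegExp(pattern, "i").test(haystack)) tags.add(tag);
  }
  return [...tags];
}

// Fast path from the diagram: enough pre-tags and nothing for the LLM to read.
function canSkipLlm(preTags, ocrText) {
  return preTags.length >= 4 && ocrText.trim() === "";
}

const tags = preTagsFor("/Users/me/Docs/2024_faktura-ACME.pdf", "Invoice no. 12", pathRules);
console.log(tags, canSkipLlm(tags, "Invoice no. 12"));
```

Several rules can match the same file, which is why the tags are collected in a set rather than returned on first match.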

     
    -
    +## The OCR engine: Apple Vision via macos-vision
    +
    +OCR is handled by [`macos-vision`](https://www.npmjs.com/package/macos-vision) — a Node.js package that calls Apple's native **Vision framework** (`VNRecognizeTextRequest`) directly. This means:
    +
    +- **No network calls for OCR** — recognition happens entirely on your CPU/GPU
    +- **No Python, no Tesseract, no external binaries** — Vision is built into macOS 12+
    +- **High accuracy** — the same engine used by Finder's "Look Up" and Live Text
    +- **PDF support** — PDFs are rasterised page-by-page; `sortai` reads the first 2 pages by default (configurable)
    +- **Image support** — PNG, JPG, JPEG, WEBP, HEIC
    +
    +## Privacy model
    +
    +| Mode | OCR | LLM | What leaves your machine |
    +|------|-----|-----|--------------------------|
    +| Default (Ollama) | Apple Vision, on-device | Local Ollama model | Nothing |
    +| `--cloud anthropic\|openai` | Apple Vision, on-device | Cloud API | Full OCR text of each file |
    +| `--cloud ... --mask` | Apple Vision, on-device | Cloud API | Masked OCR (`[PESEL:1]`, `[PERSON:1]`, …) |
    +
    +When `--mask` is set, `sortai` spawns [`pseudonym-mcp`](https://www.npmjs.com/package/pseudonym-mcp) as a local MCP server over stdio. Before each cloud call it runs `mask_text` on the OCR output (replacing real names, PESELs, IBANs, emails etc. with tokens), sends the masked text to the LLM, then runs `unmask_text` on the returned comment to restore the original values.
    +
    +> **Pseudonymisation is a defence-in-depth control, not a compliance silver bullet.** Pseudonymised data is still personal data under GDPR Art. 4(5). Read the `pseudonym-mcp` README for the honest limitations.
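The mask/unmask round trip described in the README text can be illustrated with a small token-substitution sketch. It is not how `pseudonym-mcp` is implemented: the two detectors (an 11-digit PESEL pattern and a simple email pattern) and the names `maskText` / `unmaskText` are assumptions chosen for the example. Only the `[LABEL:n]` token format is taken from the README.

```javascript
// Illustrative detectors only; pseudonym-mcp's real rules are more extensive.
const detectors = [
  { label: "PESEL", regex: /\b\d{11}\b/g },
  { label: "EMAIL", regex: /[\w.+-]+@[\w-]+(?:\.[\w-]+)+/g },
];

// Replace each detected value with a numbered token and remember the mapping.
function maskText(text) {
  const mapping = new Map(); // token -> original value
  const seen = new Map();    // label + value -> token, so repeats reuse one token
  let masked = text;
  for (const { label, regex } of detectors) {
    let counter = 0;
    masked = masked.replace(regex, (value) => {
      const key = `${label}\u0000${value}`;
      if (!seen.has(key)) {
        counter += 1;
        const token = `[${label}:${counter}]`;
        seen.set(key, token);
        mapping.set(token, value);
      }
      return seen.get(key);
    });
  }
  return { masked, mapping };
}

// Restore the originals in text that came back from the model.
function unmaskText(text, mapping) {
  let restored = text;
  for (const [token, value] of mapping) {
    restored = restored.split(token).join(value);
  }
  return restored;
}

const { masked, mapping } = maskText("PESEL 44051401359, contact jan@example.com");
console.log(masked);
console.log(unmaskText("Letter addressed to [EMAIL:1]", mapping));
```

The mapping never leaves the local process, so the cloud model only sees tokens. Anything the detectors miss still travels in clear text, which is the limitation the README's warning refers to.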

    +
    +## Requirements
    +
    +- macOS 12+
    +- Node.js 20+
    +- Xcode Command Line Tools — `xcode-select --install` (needed by `macos-vision` to build its Swift binary at install time)
    +- One of:
    +- [Ollama](https://ollama.com) running locally (default) — pull any model, e.g. `ollama pull mistral-nemo`
    +- Anthropic or OpenAI API key for cloud mode
    +
    +## Quick start
    +
    +```bash
    +# First run creates ~/.config/sortai/config.json and exits
    +npx @woladi/sortai
    +
    +# Dry-run: see what tags would be written, without touching any files
    +npx @woladi/sortai ~/Desktop --dry-run
    +
    +# Actually write Finder tags and comments
    +npx @woladi/sortai ~/Desktop
    +```
    +
    +> The first invocation writes the default config and exits. **Edit `~/.config/sortai/config.json`** to match your own tag taxonomy, then re-run.
    +
    +### Reset metadata before a fresh run
    +
    +```bash
    +# Remove all Finder tags and comments sortai previously wrote
    +npx @woladi/sortai ~/Desktop --clear
    +
    +# Preview what would be cleared without touching files
    +npx @woladi/sortai ~/Desktop --clear --dry-run
    +```
    +
    +After `--clear`, Spotlight is reindexed automatically (`mdimport`) so stale tags disappear from search immediately. Combine with a config change and re-run to start fresh with a new taxonomy.
    +
    +### Cloud mode (optional)
    +
    +```bash
    +# Anthropic Claude — OCR text sent to the API
    +npx @woladi/sortai ~/Desktop --cloud anthropic --api-key sk-ant-...
    +
    +# With PII pseudonymisation: only tokens like [PESEL:1] reach the cloud
    +npx @woladi/sortai ~/Desktop --cloud anthropic --mask --api-key sk-ant-...
    +
    +# OpenAI
    +OPENAI_API_KEY=sk-... npx @woladi/sortai ~/Desktop --cloud openai
    +```
     
     ## CLI flags
     

    @@ -82,25 +139,22 @@ macos.ts: xattr -wx + binary plist
     | `<folder>` | from config | Folder to scan recursively |
     | `--config <path>` | `~/.config/sortai/config.json` | Alternative config file |
     | `--dry-run` | off | Print results without writing tags/comments |
    +| `--clear` | off | Remove all sortai-written Finder tags and comments from every file in the folder |
     | `--model <name>` | `mistral-nemo` (Ollama) | LLM model name |
     | `--ollama-url <url>` | `http://localhost:11434` | Ollama server |
     | `--cloud anthropic\|openai` | — | Switch to a cloud LLM |
    -| `--api-key <key>` | env | API key
    -| `--mask` | off | Pseudonymise OCR via pseudonym-mcp
    +| `--api-key <key>` | env | API key (`SORTAI_API_KEY` / `ANTHROPIC_API_KEY` / `OPENAI_API_KEY`) |
    +| `--mask` | off | Pseudonymise OCR text via pseudonym-mcp before cloud call |
     | `--lang en\|pl` | `pl` | Language for pseudonym-mcp regex rules |
     | `--exclude <names>` | from config | Comma-separated folder names to skip |
     | `--limit <n>` | — | Process at most N files |
    -| `--skip-tagged` | off | Skip files that already carry `cfg.tags.autoTag` (`#AI_Sorted`
    -| `--no-dedup` | off | Skip SHA256
    +| `--skip-tagged` | off | Skip files that already carry `cfg.tags.autoTag` (`#AI_Sorted`) |
    +| `--no-dedup` | off | Skip SHA256 duplicate detection |
     | `--verbose` | off | Extra logs |
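The `--api-key` row lists three environment variables but does not say which one wins. One plausible resolution order is sketched below. The precedence (flag, then `SORTAI_API_KEY`, then the provider-specific variable) and the helper name `resolveApiKey` are assumptions, not documented behaviour.

```javascript
// Assumed precedence: --api-key flag, then SORTAI_API_KEY, then the provider's own variable.
const PROVIDER_ENV = { anthropic: "ANTHROPIC_API_KEY", openai: "OPENAI_API_KEY" };

function resolveApiKey(provider, flagValue, env) {
  const providerVar = PROVIDER_ENV[provider];
  if (!providerVar) throw new Error(`Unknown cloud provider: ${provider}`);
  const key = flagValue || env.SORTAI_API_KEY || env[providerVar];
  if (!key) {
    throw new Error(`No API key: pass --api-key or set SORTAI_API_KEY / ${providerVar}`);
  }
  return key;
}

// In a real CLI the third argument would be process.env.
console.log(resolveApiKey("openai", undefined, { OPENAI_API_KEY: "sk-demo" }));
```

Under this assumed order, `ANTHROPIC_API_KEY` is never consulted when `--cloud openai` is selected, which is the mismatch the corrected Quick start example avoids.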

     
    -Environment variables: `SORTAI_API_KEY`, `ANTHROPIC_API_KEY`, `OPENAI_API_KEY`.
    -
     ## Configuration
     
    -The
    -
    -The config file is plain JSON. Sections:
    +The first run writes `~/.config/sortai/config.json`. Edit it to fit your taxonomy:
     
     ```json
     {

    @@ -135,54 +189,34 @@ The config file is plain JSON. Sections:
     ],
     "autoTag": "#AI_Sorted"
     },
    -"context": "1-2 sentence description of yourself and ongoing matters — used by the LLM as background.
    +"context": "1-2 sentence description of yourself and ongoing matters — used by the LLM as background."
     }
     ```
     
    -
    -
    --
    --
    --
    --
    --
    --
    -- `tags.pathRules` — regex patterns over `path.replace(/[\\/_-]/g, " ") + " " + ocrText`. Multiple rules can match; results merge into `preTags`.
    -- `tags.autoTag` — appended to every successfully tagged file (sentinel so you can find "already processed" items in Finder and `--skip-tagged` works).
    -- `context` — pinned to the system prompt as background knowledge. **Edit this** — the default is a placeholder.
    +Key options:
    +
    +- **`tags.allowed`** — the full set of tags the LLM may return; anything outside this list is dropped.
    +- **`tags.strict`** — subset of `allowed`. A strict tag only lands on a file if at least one `strictEvidence` keyword appears verbatim in OCR or filename. Prevents false positives on sensitive categories like `#Bank` or `#Kredyt`.
    +- **`tags.autoTag`** — appended to every successfully processed file. Used as a sentinel by `--skip-tagged` so you don't re-process files on the next run.
    +- **`tags.pathRules`** — regex rules matched against the full filepath + OCR text. Matched tags become *pre-tags* that are always included and passed to the LLM as hints.
    +- **`ocr.startPage` / `ocr.maxPages`** — PDF page range. Default reads pages 1–2; raise `maxPages` for long documents where the key content is deeper.
    +- **`context`** — one or two sentences about yourself pinned to the LLM system prompt. The model uses this as background when writing comments (e.g. knowing you're a freelancer or a specific sector helps contextualise ambiguous documents).
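How `tags.allowed` and `tags.strict` interact can be shown with a small filter. This is a sketch under assumptions: the shape of `strictEvidence` (a map from tag to keyword list), the case-insensitive substring match, and the function name `validateTags` are not confirmed by the README. The `#Bank` keywords `iban` and `rachunek` are the ones given in the "How it works" diagram.

```javascript
// Hypothetical config fragment; the real schema may differ.
const tagsConfig = {
  allowed: ["#Faktura", "#Umowa", "#Bank", "#Kredyt"],
  strict: ["#Bank", "#Kredyt"],
  strictEvidence: {
    "#Bank": ["iban", "rachunek"],
    "#Kredyt": ["kredyt"],
  },
};

// Keep a tag only if it is allowed and, for strict tags, backed by literal evidence.
function validateTags(llmTags, fileName, ocrText, cfg) {
  const haystack = (fileName + " " + ocrText).toLowerCase();
  return llmTags.filter((tag) => {
    if (!cfg.allowed.includes(tag)) return false; // not in the taxonomy
    if (!cfg.strict.includes(tag)) return true;   // ordinary tag, no evidence needed
    const keywords = cfg.strictEvidence[tag] ?? [];
    return keywords.some((kw) => haystack.includes(kw.toLowerCase()));
  });
}

console.log(
  validateTags(["#Faktura", "#Bank", "#Wakacje"], "scan_001.pdf", "Faktura VAT 12/2024", tagsConfig)
);
```

In this run `#Wakacje` is dropped because it is not in `allowed`, and `#Bank` is dropped because neither keyword appears in the filename or OCR text.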

     
     ## Duplicate detection
     
     `sortai` ships two independent duplicate signals:
     
    -- **`#Duplikat`** — SHA256 over file bytes
    -- **`#PrawdopodobnaKopia`** — heuristic
    -
    -A file can carry both, one, or neither. Skip the hash pre-pass with `--no-dedup` if it's too slow on huge media libraries.
    -
    -## What about Markdown export?
    -
    -`sortai` is the *tagger*. If you want image/PDF → Markdown, use `macos-vision` directly:
    -
    -```bash
    -npx macos-vision --markdown invoice.pdf -o invoice.md
    -```
    -
    -That's the same Apple Vision + Ollama pipeline (VisionScribe), without the file-tagging layer.
    -
    -## Privacy
    +- **`#Duplikat`** — SHA256 hash over file bytes. Files in a group of ≥2 identical hashes all get this tag. Catches `cp`, sync conflicts, bit-identical copies regardless of filename. Skipped for files > `cfg.dedup.maxFileSizeMB` and for 0-byte files.
    +- **`#PrawdopodobnaKopia`** — heuristic matched against filename + OCR: detects `copy`, `kopia`, `duplikat`, `(2)` patterns. Catches macOS Finder "Duplicate", "Save As" copies, manual versioning — cases where bytes differ (different mtime, repacked PDF) but the file is logically a copy.
     
    --
    -- **`--cloud` without `--mask`**: the *full* OCR text of every scanned file is sent to your chosen provider. Use only when you trust the provider with the documents.
    -- **`--cloud --mask`**: the OCR text is masked locally first; tokens like `[PERSON:1]`, `[PESEL:1]` flow to the cloud instead of literals. Structure, dates, amounts, and any PII the regex/LLM detector misses still travel. See [`pseudonym-mcp`](https://www.npmjs.com/package/pseudonym-mcp) for the full caveats.
    -- File metadata is written via `osascript` (Apple Events). `sortai` makes no other network calls beyond your chosen LLM provider.
    +A file can carry both, one, or neither. Use `--no-dedup` to skip hashing on large media libraries.
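The two duplicate signals can be sketched as two independent checks. The code is illustrative only: it takes precomputed digests instead of hashing files, ignores the size limits mentioned in the README, and the regular expression is a guess at the kind of pattern the heuristic uses. The names `findHashDuplicates` and `looksLikeCopy` are hypothetical.

```javascript
// Group paths by a precomputed SHA256 hex digest; groups of two or more are duplicates.
function findHashDuplicates(digestByPath) {
  const groups = new Map(); // digest -> paths sharing it
  for (const [filePath, digest] of Object.entries(digestByPath)) {
    if (!groups.has(digest)) groups.set(digest, []);
    groups.get(digest).push(filePath);
  }
  return [...groups.values()].filter((paths) => paths.length >= 2).flat();
}

// Assumed pattern for the heuristic: "copy", "kopia", "duplikat" as whole words, or "(2)".
const COPY_PATTERN = /(?:^|[^a-z])(?:copy|kopia|duplikat)(?![a-z])|\(\d+\)/i;

function looksLikeCopy(text) {
  return COPY_PATTERN.test(text);
}

// Shortened fake digests stand in for real SHA256 values.
const digests = { "a.pdf": "aa11", "b.pdf": "bb22", "backup/a.pdf": "aa11" };
console.log(findHashDuplicates(digests));
console.log(looksLikeCopy("Faktura (2).pdf"), looksLikeCopy("copyright.pdf"));
```

Because the checks are independent, a byte-identical file named `report copy.pdf` would receive both tags, while a re-exported PDF with a changed name would only trip the heuristic.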

     
     ## Development
     
     ```bash
     git clone https://github.com/woladi/sortai.git
     cd sortai
    -npm install # macOS only; Linux/Windows
    +npm install # macOS only; on Linux/Windows use --ignore-scripts
     npm run typecheck
     npm run build
     node dist/cli.js --help

package/dist/cli.js
CHANGED

    @@ -4,12 +4,15 @@ import chalk from 'chalk';
     import ora from 'ora';
     import path from 'node:path';
     import { existsSync } from 'node:fs';
    +import { execFile } from 'node:child_process';
    +import { promisify } from 'node:util';
    +const execFileAsync = promisify(execFile);
     import { expandHome, loadConfig } from './config.js';
     import { walkFiles } from './walker.js';
     import { extractOcrText } from './ocr.js';
     import { preTagFromPath } from './pretag.js';
     import { mergeTags } from './tags.js';
    -import { writeFileMetadata } from './macos.js';
    +import { writeFileMetadata, clearMacosMetadata } from './macos.js';
     import { Masker } from './mask.js';
     import { inferTagsAndComment } from './llm/index.js';
     import { findDuplicates } from './dedup.js';
    @@ -58,6 +61,7 @@ async function main() {
             .argument('[folder]', 'folder to scan (overrides config.scan.folder)')
             .option('--config <path>', 'path to config JSON (default: ~/.config/sortai/config.json)')
             .option('--dry-run', 'do not write tags/comments; just log', false)
    +        .option('--clear', 'remove all sortai-written Finder tags and comments from every file in the folder', false)
             .option('--model <name>', 'LLM model name (default depends on provider)')
             .option('--ollama-url <url>', 'Ollama base URL (default: http://localhost:11434)')
             .option('--cloud <provider>', "use a cloud LLM: 'anthropic' or 'openai'")
    @@ -117,6 +121,40 @@ async function main() {
                 masker = undefined;
             }
         }
    +    if (opts.clear) {
    +        process.stdout.write(chalk.cyan(`🧹 Clearing sortai metadata from ${root}\n`));
    +        if (opts.dryRun)
    +            process.stdout.write(chalk.yellow(' [dry-run — no changes will be written]\n'));
    +        process.stdout.write('\n');
    +        const clearFiles = await walkFiles(root, cfg);
    +        let cleared = 0;
    +        let clearErrors = 0;
    +        for (const filePath of clearFiles) {
    +            const rel = path.relative(root, filePath);
    +            if (opts.dryRun) {
    +                process.stdout.write(chalk.gray(` 🗑 ${rel}\n`));
    +                cleared++;
    +                continue;
    +            }
    +            try {
    +                await clearMacosMetadata(filePath);
    +                execFileAsync('mdimport', [filePath]).catch(() => { });
    +                if (opts.verbose)
    +                    process.stdout.write(chalk.gray(` 🗑 ${rel}\n`));
    +                cleared++;
    +            }
    +            catch {
    +                process.stdout.write(chalk.red(` ❌ ${rel}\n`));
    +                clearErrors++;
    +            }
    +        }
    +        process.stdout.write('═══════════════════════════════════════════════════════\n');
    +        process.stdout.write(chalk.bold('✨ Done\n'));
    +        process.stdout.write(chalk.green(` 🗑 Cleared: ${cleared}\n`));
    +        if (clearErrors)
    +            process.stdout.write(chalk.red(` ❌ Errors: ${clearErrors}\n`));
    +        return;
    +    }
         process.stdout.write(chalk.cyan(`🚀 Scanning ${root}\n`));
         process.stdout.write(` Provider: ${cfg.llm.provider} (${cfg.llm.model})`);
         if (cfg.mask.enabled && masker)
    @@ -131,14 +169,11 @@ async function main() {
         let allFiles = await walkFiles(root, cfg);
         process.stdout.write(`📁 Files: ${allFiles.length}\n`);
         if (opts.skipTagged) {
    -        const { execFile } = await import('node:child_process');
    -        const { promisify } = await import('node:util');
    -        const exec = promisify(execFile);
             const before = allFiles.length;
             const filtered = [];
             for (const f of allFiles) {
                 try {
    -                const { stdout: md } = await
    +                const { stdout: md } = await execFileAsync('mdls', ['-name', 'kMDItemUserTags', '-raw', f], { timeout: 3_000 });
                     if (!md.includes(cfg.tags.autoTag))
                         filtered.push(f);
                 }

package/package.json
CHANGED

    @@ -1,7 +1,7 @@
     {
       "name": "@woladi/sortai",
    -  "version": "0.1.
    -  "description": "
    +  "version": "0.1.1",
    +  "description": "Automatically tag and describe your files using Apple Vision OCR + local Ollama or cloud LLM — writes native Finder tags and comments searchable in Spotlight",
       "type": "module",
       "main": "./dist/cli.js",
       "bin": {