auxiliar-mcp 0.9.0 → 0.10.0
- package/README.md +64 -28
- package/dist/data/event-sources.js +9 -10
- package/dist/data/solve.js +107 -0
- package/dist/server.js +1 -1
- package/dist/tools/solve.js +11 -3
- package/package.json +26 -8
package/README.md
CHANGED
@@ -1,10 +1,13 @@
 # auxiliar-mcp
 
-MCP server that
+The MCP server that tells your agent **which tool to install** for a task, and **which cloud service to pick** for a stack.
 
-Your agent
+Your agent is intelligent but stuck. It doesn't know Surya beats Tesseract by 1.5pp on word accuracy for Brazilian NFS-e invoices. It doesn't know SendGrid killed its free tier. It guesses, installs the wrong thing, and you burn 30 minutes.
 
-**auxiliar-mcp** gives your agent
+**auxiliar-mcp** gives your agent reproducible eval-backed answers for the two questions it hits most:
+
+1. **"What installable tool should I use for task X?"** — skills, MCPs, vendor APIs, local binaries — ranked on real-world corpora, via `solve_task`.
+2. **"What cloud service should I pick for Y?"** — Chrome-verified pricing, risks, compatibility, setup commands for 77 services across 16 categories, via `recommend_service`.
 
 ## Install
 
@@ -16,22 +19,53 @@ claude mcp add auxiliar -- npx auxiliar-mcp
 npx auxiliar-mcp
 ```
 
-## Tools
+## Tools (8)
 
 | Tool | What it does |
 |------|-------------|
-| `
-| `
-| `
-| `
-| `
+| `solve_task` | Get the ranked list of installable tools for a job-to-be-done (e.g., `pdf-text-extraction-mcp`, `nfs-e`, `boleto`, `receipt-parsing`, `bookkeeping-ocr`) with scorecards, install commands, FAQ, alternatives considered, and methodological caveats. |
+| `list_solve_tasks` | Discover every `/solve/` task ranking available — slugs, top picks, categories, agent compatibility. |
+| `recommend_service` | Picks the best cloud service for your constraints (framework, budget, region, GDPR, edge, lock-in). |
+| `get_pricing` | Chrome-verified pricing — including JS-rendered pages agents can't read via WebFetch. |
+| `get_risks` | Risk flags, gotchas, recent breaking changes. |
+| `check_compatibility` | Warns about known conflicts between services (e.g., Turso + Prisma needs adapter). |
+| `setup_service` | CLI commands, signup URLs, env vars, estimated setup time. |
+| `list_services` | Browse the full 77-service catalog, filtered by category. |
+
+## When to use `solve_task`
+
+Your agent needs an **installable tool** (skill, MCP, vendor API, or local binary) and you want a reproducible evaluation, not vibes.
+
+```
+Agent: "I need to extract text from Brazilian NFS-e invoices, boletos, and phone-photo receipts. What should I install?"
 
-
+→ solve_task(task_slug="pdf-text-extraction-mcp")
+# aliases work too: "pdf", "ocr", "nfs-e", "boleto", "receipt-parsing", "bookkeeping-ocr", "invoice-extraction"
+
+Returns (truncated):
+{
+"answer": "Install Surya (pip install surya-ocr + pin transformers<5.0.0). It led our 10-document real-world corpus on word accuracy (76.9%) and layout preservation (7.0/10), free, local. Tesseract 5 runs 14× faster for throughput-critical workflows. Google Document AI wins on phone-photo receipts specifically...",
+"candidates": [
+{ "slug": "surya", "rank": 1, "scorecard": {"word_accuracy": 0.769, "layout": 7, "p50_latency_sec": 22.1, "install_friction": 7, "cost_per_10_docs_usd": 0} },
+{ "slug": "tesseract", "rank": 2, "scorecard": {"word_accuracy": 0.754, "layout": 5, "p50_latency_sec": 1.6, "install_friction": 3, "cost_per_10_docs_usd": 0} },
+{ "slug": "google-document-ai", "rank": 3, "scorecard": {"word_accuracy": 0.697, "layout": 5.7, "p50_latency_sec": 3.8, "install_friction": 7, "cost_per_10_docs_usd": 0.069} }
+],
+"corpus_summary": "10 real-world documents: native-text PDFs, legal docs, Brazilian corporate-registry scans, NFS-e invoices, boletos, phone-photo receipts.",
+"alternatives_considered": [ /* yescan, Mistral OCR, pdf-reader-mcp — dropped with reasons */ ],
+"faq": [ /* e.g., "Why does all score 0 on the boleto?" */ ]
+}
+```
+
+Full page with reproducible commands: https://auxiliar.ai/solve/pdf-text-extraction-mcp/
+
+## When to use `recommend_service`
+
+Your agent needs a **cloud service** (database, email provider, auth, payments, etc.).
 
 ```
 Agent: "I need a database for my Next.js app. Budget is free, deployed to Cloudflare Workers."
 
-→ recommend_service(need="database", framework="nextjs", constraints="edge, zero cold starts")
+→ recommend_service(need="database", framework="nextjs", budget="free", constraints="edge, zero cold starts")
 
 Returns:
 {
@@ -40,17 +74,12 @@ Returns:
 "pricing": { "free_tier": "5 GB storage, 100 databases" },
 "risks": ["Not PostgreSQL — limited ORM support"],
 "migration_difficulty": "high",
-"key_features": ["SQLite/libSQL", "embedded replicas", "zero cold starts"],
-"mcp_available": false,
-"cli_available": true,
 "cli_install": "brew install tursodatabase/tap/turso",
-"alternatives": [
-{ "provider": "neon", "trade_off": "Has cold starts on free tier" }
-]
+"alternatives": [{"provider": "neon", "trade_off": "Has cold starts on free tier"}]
 }
 ```
 
-## Services Covered
+## Services Covered (77)
 
 **Database:** Neon, Supabase, Turso, PlanetScale, Render Postgres, AWS RDS, Railway Postgres, Cloudflare D1
 **Email:** Resend, Postmark, SendGrid, AWS SES, Mailgun, Listmonk
@@ -68,17 +97,23 @@ Returns:
 **SMS:** Twilio, Vonage, MessageBird
 **Feature Flags:** LaunchDarkly, Statsig, Flagsmith, Unleash
 **Cron:** Inngest, Trigger.dev, QStash, Vercel Cron, Cloudflare Cron
+**PDF / OCR (via solve_task):** Surya, Tesseract 5, Google Document AI
+
+## /solve/ Tasks Available
+
+| Slug | Top pick | Corpus | Categories |
+|------|----------|--------|-----------|
+| `pdf-text-extraction-mcp` | Surya | 10 Brazilian docs incl. NFS-e, boleto, phone-photo receipts | pdf-processing, ocr, agent-tools |
+
+More `/solve/` rankings added as walkthroughs run. Each page includes its reproducible command so you can re-run the eval yourself.
 
 ## Data Quality
 
--
-- Updated
--
-- Scores 9/10 on recommendation accuracy
-- 47 category aliases (agents can query "llm-api", "file-storage", "vector-db", etc.)
-- 27 compatibility rules with cross-service conflict detection
+- **/solve/ evals:** reproducible corpus + harness + scoring per task. Ground truth is LLM-drafted, human-finalized. Published commands can be re-run locally.
+- **Cloud-service pricing:** Chrome-verified (actual service websites, not training data). Updated through 2026-04.
+- **Trust scores:** 50+ agent runs across 8 iterations; 47 category aliases; 27 compatibility rules.
 
-## Constraints You Can Use
+## Constraints You Can Use on `recommend_service`
 
 | Constraint | Example |
 |-----------|---------|
@@ -92,12 +127,13 @@ Returns:
 
 ## Privacy
 
-The MCP server pings `auxiliar.ai/api/` on each tool call for analytics. Only query parameters are sent (e.g., `?need=database&framework=nextjs`). No personal data, no API keys, no project info. Works offline with bundled data if the ping fails.
+The MCP server pings `auxiliar.ai/api/` on each tool call for analytics. Only query parameters are sent (e.g., `?need=database&framework=nextjs` or `?task_slug=pdf-text-extraction-mcp`). No personal data, no API keys, no project info. Works offline with bundled data if the ping fails.
 
 ## Links
 
-- [auxiliar.ai](https://auxiliar.ai) — comparison site with
-- [
+- [auxiliar.ai](https://auxiliar.ai) — the comparison site with service entries and `/solve/` task rankings
+- [/solve/pdf-text-extraction-mcp](https://auxiliar.ai/solve/pdf-text-extraction-mcp/) — the OCR walkthrough
+- [GitHub](https://github.com/Tlalvarez/Auxiliar-ai) — source + reproducible eval harness under `scripts/ocr-walkthrough/`
 
 ## License
 
|
package/dist/data/event-sources.js
CHANGED
@@ -36,7 +36,7 @@ export const eventSources = [
 name: "Vercel",
 sources: [
 { type: "status", url: "https://www.vercel-status.com/history.rss" },
-{ type: "changelog", url: "https://vercel.com/
+{ type: "changelog", url: "https://vercel.com/atom" },
 {
 type: "security",
 url: "https://vercel.com/kb/bulletin",
@@ -48,7 +48,7 @@ export const eventSources = [
 slug: "stripe",
 name: "Stripe",
 sources: [
-{ type: "status", url: "https://
+{ type: "status", url: "https://www.stripestatus.com/history.rss" },
 { type: "changelog", url: "https://stripe.com/blog/feed.rss" },
 ],
 },
@@ -57,15 +57,15 @@ export const eventSources = [
 name: "Supabase",
 sources: [
 { type: "status", url: "https://status.supabase.com/history.rss" },
-{ type: "changelog", url: "https://supabase.com/
+{ type: "changelog", url: "https://supabase.com/feed.xml" },
 ],
 },
 {
 slug: "neon",
 name: "Neon",
 sources: [
-{ type: "status", url: "https://neonstatus.com/
-{ type: "changelog", url: "https://neon.
+{ type: "status", url: "https://neonstatus.com/pages/6878fc85709daa75be6c7e3c/rss" },
+{ type: "changelog", url: "https://neon.com/blog/rss.xml" },
 ],
 },
 {
@@ -73,7 +73,7 @@ export const eventSources = [
 name: "Clerk",
 sources: [
 { type: "status", url: "https://status.clerk.com/history.rss" },
-{ type: "changelog", url: "https://clerk.com/changelog/
+{ type: "changelog", url: "https://clerk.com/changelog/atom.xml" },
 ],
 },
 {
@@ -92,8 +92,7 @@ export const eventSources = [
 slug: "railway",
 name: "Railway",
 sources: [
-{ type: "status", url: "https://
-{ type: "changelog", url: "https://blog.railway.com/feed.xml" },
+{ type: "status", url: "https://railway.instatus.com/history.rss" },
 ],
 },
 {
@@ -101,7 +100,7 @@ export const eventSources = [
 name: "Render",
 sources: [
 { type: "status", url: "https://status.render.com/history.rss" },
-{ type: "changelog", url: "https://render.com/changelog/
+{ type: "changelog", url: "https://render.com/changelog/feed.xml" },
 ],
 },
 {
@@ -109,7 +108,7 @@ export const eventSources = [
 name: "Resend",
 sources: [
 { type: "status", url: "https://resend-status.com/history.rss" },
-{ type: "changelog", url: "https://resend.com/
+{ type: "changelog", url: "https://resend.com/blog/index.xml" },
 ],
 },
 {
|
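The event-sources hunks only swap feed URLs, so a quick sanity check on the resulting entries is easy to sketch. The snippet below is a hypothetical excerpt, not the package's own code: it copies three entries as they read after this diff (Stripe, Neon, Railway), and the helpers `feedsByType` and `invalidUrls` are names introduced here purely for illustration.

```javascript
// Hypothetical excerpt: three entries copied from the added lines of this diff.
// The real eventSources array has more entries and may carry extra fields.
const eventSources = [
  {
    slug: "stripe",
    name: "Stripe",
    sources: [
      { type: "status", url: "https://www.stripestatus.com/history.rss" },
      { type: "changelog", url: "https://stripe.com/blog/feed.rss" },
    ],
  },
  {
    slug: "neon",
    name: "Neon",
    sources: [
      { type: "status", url: "https://neonstatus.com/pages/6878fc85709daa75be6c7e3c/rss" },
      { type: "changelog", url: "https://neon.com/blog/rss.xml" },
    ],
  },
  {
    slug: "railway",
    name: "Railway",
    sources: [
      { type: "status", url: "https://railway.instatus.com/history.rss" },
    ],
  },
];

// Illustrative helper (not in the package): list every feed of one type.
function feedsByType(entries, type) {
  return entries.flatMap((entry) =>
    entry.sources
      .filter((source) => source.type === type)
      .map((source) => ({ slug: entry.slug, url: source.url }))
  );
}

// Illustrative helper: collect feed URLs that are not absolute https URLs.
function invalidUrls(entries) {
  const bad = [];
  for (const entry of entries) {
    for (const source of entry.sources) {
      try {
        if (new URL(source.url).protocol !== "https:") bad.push(source.url);
      } catch {
        bad.push(source.url); // new URL() throws on strings it cannot parse
      }
    }
  }
  return bad;
}

console.log(feedsByType(eventSources, "changelog").map((feed) => feed.slug));
console.log(invalidUrls(eventSources).length);
```

In this excerpt Railway has no changelog feed, matching the hunk that removes its `blog.railway.com` entry. The check is syntactic only; it does not confirm that a feed actually responds.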
package/dist/data/solve.js
CHANGED
@@ -124,4 +124,111 @@ export const solveTasks = {
 openclaw: { tesseract: true, surya: true, "google-document-ai": true },
 },
 },
+"nfs-e-extraction": {
+slug: "nfs-e-extraction",
+title: "NFS-e field extraction for agents — ranked by field accuracy on Brazilian São Paulo invoices",
+url: "https://auxiliar.ai/solve/nfs-e-extraction/",
+last_verified: "2026-04-23",
+categories: ["nfs-e", "brazilian-invoice", "structured-extraction", "bookkeeping", "agent-tools"],
+works_with: ["claude-code", "claude-desktop", "cursor", "openclaw"],
+alternatives: ["auxiliar-nfs-e-with-surya", "auxiliar-nfs-e-with-google-doc-ai", "auxiliar-nfs-e-with-tesseract"],
+eval_method: "auxiliar-nfs-e-field-accuracy-v1 (2-doc SP corpus)",
+answer: "When your Claude Code / OpenClaw agent is asked to extract structured fields from a Brazilian NFS-e (Nota Fiscal Eletrônica de Serviços) — for bookkeeping, reimbursement batching, accountant handoff, tax reconciliation — install `auxiliar-nfs-e` paired with Surya OCR. On our 2-doc São Paulo NFS-e corpus it achieved 100% field accuracy (41/41 fields): numero_nota, codigo_verificacao, data_emissao, chave_acesso, prestador CNPJ/IM/nome/endereço, tomador CNPJ/IM/nome, valor_servico, codigo_servico, descrição, RPS reference, and more. Surya's OCR preserves line-level field ordering cleanly, which the parser's label-based extractor relies on. For budget-sensitive workflows, Google Document AI pairs with the same parser at 87.8% field accuracy and a ~$0.002/page cost after the 1,000-page/month free tier. Tesseract 5 is the fastest option but drops to 63.4% field accuracy because its default output reorders the retention/ISS table.",
+candidates: [
+{
+slug: "auxiliar-nfs-e-with-surya",
+name: "auxiliar-nfs-e + Surya",
+rank: 1,
+install: "python -m venv .venv && source .venv/bin/activate && pip install surya-ocr 'transformers<5.0.0' && git clone https://github.com/Tlalvarez/Auxiliar-ai.git /tmp/auxiliar && cp /tmp/auxiliar/scripts/walkthroughs/nfs-e-extraction/parser.py ./nfse_parser.py",
+scorecard: {
+word_accuracy: 1.0,
+token_f1: 1.0,
+install_friction: 7,
+p50_latency_sec: 22.5,
+cost_per_10_docs_usd: 0,
+},
+notes: "100% field accuracy (41/41) on the 2-doc SP corpus. Surya preserves line-level field ordering. Parser extracts prestador, tomador, valor, codigo_servico, ISS, retenções, and RPS reference into a typed dataclass.",
+license: "MIT (parser) + GPL-3.0 (Surya code) + AI Pubs Open Rail-M (Surya weights, <$2M clause)",
+},
+{
+slug: "auxiliar-nfs-e-with-google-doc-ai",
+name: "auxiliar-nfs-e + Google Document AI",
+rank: 2,
+install: "gcloud auth application-default login && gcloud services enable documentai.googleapis.com --project YOUR_PROJECT && git clone https://github.com/Tlalvarez/Auxiliar-ai.git /tmp/auxiliar",
+scorecard: {
+word_accuracy: 0.878,
+install_friction: 7,
+p50_latency_sec: 3.8,
+cost_per_10_docs_usd: 0.069,
+},
+notes: "87.8% field accuracy (36/41). Lost valor_servico on doc 03 + RPS fields on doc 08. Cloud auth flow; 1000 pages/month free tier; data leaves local machine.",
+license: "MIT (parser) + Proprietary (Google Cloud)",
+},
+{
+slug: "auxiliar-nfs-e-with-tesseract",
+name: "auxiliar-nfs-e + Tesseract 5",
+rank: 3,
+install: "brew install tesseract tesseract-lang poppler && git clone https://github.com/Tlalvarez/Auxiliar-ai.git /tmp/auxiliar",
+scorecard: {
+word_accuracy: 0.634,
+install_friction: 3,
+p50_latency_sec: 1.6,
+cost_per_10_docs_usd: 0,
+},
+notes: "63.4% field accuracy (26/41). Retention table reorders in Tesseract output, breaking positional extraction for ISS/deductions fields. Use only when throughput > accuracy.",
+license: "MIT (parser) + Apache 2.0 (Tesseract)",
+},
+],
+corpus_summary: "2 real São Paulo NFS-e invoices from Dunas's bookkeeping archive. Doc 03: legal services (Advocacia, R$ 3.900,00), Simples Nacional prestador. Doc 08: training services (R$ 25.000,00), includes RPS reference. Ground truth is the PDF's embedded text layer (authoritative for native-text NFS-e). Corpus is git-ignored (real business docs); ground-truth transcriptions and parser committed.",
+alternatives_considered: [
+{
+name: "Pure OCR without a parser (Surya / Tesseract / Google Doc AI alone)",
+dropped_because: "Returns raw text; agents then have to reimplement NFS-e field regex logic per project. The parser is the value.",
+},
+{
+name: "LLM field extraction (prompt Claude/GPT with NFS-e text)",
+dropped_because: "Non-deterministic, slower, more expensive per page, and requires additional verification step. For a regulated document with fixed structure, regex + position-based extraction is correct.",
+},
+{
+name: "Generic invoice extractors on ClawHub (pdf-reader-mcp, openocr-skill, opendataloader-pdf)",
+dropped_because: "None handle NFS-e's specific structure (SP retention table, chave de acesso format, RPS reference, inscrição municipal format). They solve 'read PDF text'; they don't solve 'extract CNPJ do prestador'.",
+},
+{
+name: "PyPI packages nfce-xml / nfepy (official XML-format parsers)",
+dropped_because: "These parse the official NFS-e XML API output (when you have API access). They don't handle PDF-first workflows, which is what agents receive from users.",
+},
+],
+faq: [
+{
+q: "Does this work for NFS-e from municipalities other than São Paulo?",
+a: "Not yet. Each Brazilian municipality has a slightly different NFS-e layout (field labels, section headers, retention table format). The v0.1 parser is hand-tuned for São Paulo's form based on the 2-doc corpus. For other municipalities (Rio, Curitiba, Belo Horizonte, etc.), the parser needs an additional layout adapter. Until then, agents can still extract generic fields (CNPJ, dates, values via regex) but won't get the structured ISS/retention fields.",
+},
+{
+q: "Why is Tesseract so much worse at field extraction than at raw text extraction?",
+a: "Tesseract outputs text in a top-to-bottom reading order that doesn't preserve the NFS-e form's two-column retention table structure. Labels end up separated from values. The parser's label-based extractor falls back to positional heuristics for retention fields, which Tesseract's reordering breaks. Surya and Google Document AI preserve label-value proximity, so the parser hits 100% and 87.8% respectively.",
+},
+{
+q: "How does this compare to hitting the São Paulo Prefeitura XML API directly?",
+a: "The XML API is authoritative but requires credentials, the invoice's chave de acesso or number, and a non-trivial auth flow. When agents receive a PDF attachment in a bookkeeping workflow, the XML API isn't usable. The PDF-first parser lets agents work from the document the user actually shared.",
+},
+{
+q: "Does the parser validate CNPJ check digits?",
+a: "Yes. `parser.validate_cnpj(cnpj)` runs the standard Receita Federal CNPJ check-digit algorithm. Useful for flagging OCR errors before writing to a ledger.",
+},
+],
+methodological_caveats: [
+"Corpus is 2 documents from the same issuer municipality (São Paulo). Field-accuracy claims apply to São Paulo NFS-e specifically.",
+"Ground truth is the PDF's embedded text layer (pdftotext), authoritative for native-text NFS-e but wouldn't apply to scanned images of printed NFS-e.",
+"Field accuracy metric counts exact-string match per field; fuzzy matches would inflate accuracy slightly. Exact-match chosen for zero-error bookkeeping reliability.",
+"Retention values (all zeros in the corpus because both prestadores are Simples Nacional) are extracted by position. Non-zero retention edge cases not yet end-to-end tested.",
+"CNPJ validation uses the check-digit algorithm but doesn't query Receita Federal for active-status; a valid check-digit CNPJ can still be an inactive company.",
+],
+update_cadence: "Re-run this walkthrough when: (a) any of the three OCR candidates ships a major version, (b) the São Paulo Prefeitura changes the NFS-e form layout (watched via the scanner module's BR government feeds), (c) 90 days after first publish (2026-07-23), (d) new NFS-e parser skills emerge on ClawHub / PyPI / npm.",
+fit_by_agent: {
+"claude-code": { "auxiliar-nfs-e-with-surya": true, "auxiliar-nfs-e-with-google-doc-ai": true, "auxiliar-nfs-e-with-tesseract": true },
+"claude-desktop": { "auxiliar-nfs-e-with-surya": true, "auxiliar-nfs-e-with-google-doc-ai": true, "auxiliar-nfs-e-with-tesseract": true },
+cursor: { "auxiliar-nfs-e-with-surya": true, "auxiliar-nfs-e-with-google-doc-ai": true, "auxiliar-nfs-e-with-tesseract": true },
+openclaw: { "auxiliar-nfs-e-with-surya": true, "auxiliar-nfs-e-with-google-doc-ai": true, "auxiliar-nfs-e-with-tesseract": true },
+},
+},
 };
|
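The FAQ entry in this hunk says `parser.validate_cnpj(cnpj)` runs the standard CNPJ check-digit algorithm, but the Python parser itself is not part of this diff. As an illustration only, here is a minimal JavaScript sketch of that mod-11 check for numeric 14-digit CNPJs. The function names `checkDigit` and `validateCnpj`, and the way punctuation is stripped, are assumptions made for this example and not the parser's actual code.

```javascript
// Sketch of the standard mod-11 CNPJ check-digit rule (numeric 14-digit CNPJs).
// Not the package's parser: both function names are hypothetical.
function checkDigit(digits, weights) {
  const sum = digits.reduce((acc, digit, i) => acc + digit * weights[i], 0);
  const remainder = sum % 11;
  return remainder < 2 ? 0 : 11 - remainder;
}

function validateCnpj(input) {
  // Assumption: formatting characters (".", "/", "-", spaces) are stripped first.
  const cleaned = String(input).replace(/[.\/\-\s]/g, "");
  if (!/^\d{14}$/.test(cleaned)) return false;
  // Reject repeated-digit strings such as 00000000000000, which pass the arithmetic.
  if (/^(\d)\1{13}$/.test(cleaned)) return false;

  const digits = cleaned.split("").map(Number);
  const firstWeights = [5, 4, 3, 2, 9, 8, 7, 6, 5, 4, 3, 2];
  const secondWeights = [6, ...firstWeights];

  // First check digit covers the 12 base digits; the second also covers the first.
  const d1 = checkDigit(digits.slice(0, 12), firstWeights);
  const d2 = checkDigit([...digits.slice(0, 12), d1], secondWeights);
  return d1 === digits[12] && d2 === digits[13];
}

console.log(validateCnpj("11.222.333/0001-81")); // true
console.log(validateCnpj("11.222.333/0001-82")); // false
```

As the methodological caveats in the hunk note, passing this check says nothing about whether the company is active; it only catches digit-level errors such as a misread OCR character.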
package/dist/server.js
CHANGED
@@ -76,7 +76,7 @@ server.tool("list_services", "List all available services and categories. Use wi
 });
 // Tool: solve_task
 server.tool("solve_task", "Fetch the full /solve/ task ranking for a specific job-to-be-done (e.g., 'extract text from PDFs', 'parse Brazilian NFS-e invoices'). Returns the ranked candidates with install commands, an evaluated scorecard (word accuracy, layout, latency, cost, install friction), alternatives considered and dropped, FAQs, and methodological caveats. Use this when an agent needs to pick an installable tool (skill/MCP/API/local binary) for a task rather than a cloud service. Data comes from a reproducible eval run on a real-world corpus — not training data.", {
-task_slug: z.string().max(100).optional().describe("Task slug (e.g., 'pdf-text-extraction-mcp'). Aliases
+task_slug: z.string().max(100).optional().describe("Task slug (e.g., 'pdf-text-extraction-mcp', 'nfs-e-extraction'). Aliases that resolve automatically: 'pdf', 'ocr', 'pdf-ocr', 'document-ai', 'invoice-extraction', 'boleto', 'receipt-parsing', 'bookkeeping-ocr' (→ PDF OCR ranking); 'nfs-e', 'nfse', 'nota-fiscal', 'nota-fiscal-eletronica', 'brazilian-invoice', 'cnpj-invoice' (→ NFS-e structured extraction). Call list_solve_tasks first if you don't know the slug."),
 category: z.string().max(100).optional().describe("Filter by task category (e.g., 'ocr', 'pdf-processing', 'agent-tools'). Returns all matching tasks."),
 }, async (params) => {
 const result = await solveTask(params);
|
package/dist/tools/solve.js
CHANGED
@@ -4,7 +4,7 @@ import { pingApi } from "./analytics.js";
 // task. Kept separate from the category-alias maps in list.ts / recommend.ts
 // because /solve/ tasks are about *jobs to be done*, not service categories.
 const taskAliases = {
-// PDF text extraction task
+// PDF text extraction task — generic OCR/PDF ranker
 "pdf": "pdf-text-extraction-mcp",
 "pdfs": "pdf-text-extraction-mcp",
 "pdf-extraction": "pdf-text-extraction-mcp",
@@ -18,10 +18,18 @@ const taskAliases = {
 "invoice-parsing": "pdf-text-extraction-mcp",
 "receipt-parsing": "pdf-text-extraction-mcp",
 "receipt-ocr": "pdf-text-extraction-mcp",
-"nfs-e": "pdf-text-extraction-mcp",
-"nfse": "pdf-text-extraction-mcp",
 "boleto": "pdf-text-extraction-mcp",
 "bookkeeping-ocr": "pdf-text-extraction-mcp",
+// NFS-e structured-extraction task — Brazilian-specific parser
+"nfs-e": "nfs-e-extraction",
+"nfse": "nfs-e-extraction",
+"nfs-e-parser": "nfs-e-extraction",
+"nota-fiscal": "nfs-e-extraction",
+"nota-fiscal-eletronica": "nfs-e-extraction",
+"nota-fiscal-servicos": "nfs-e-extraction",
+"brazilian-invoice": "nfs-e-extraction",
+"brazilian-nfs-e": "nfs-e-extraction",
+"cnpj-invoice": "nfs-e-extraction",
 };
 function resolveSlug(raw) {
 const lower = raw.toLowerCase().trim();
|
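The second hunk re-points `nfs-e` and `nfse` from the generic OCR ranking to the new `nfs-e-extraction` task. The diff shows only the first line of `resolveSlug`, so the sketch below is an assumption about how such a lookup might behave (normalize the input, then fall back to it unchanged when no alias matches). The alias table is a subset copied from the lines in this diff.

```javascript
// Subset of the alias table as it reads after this diff.
const taskAliases = {
  "pdf": "pdf-text-extraction-mcp",
  "boleto": "pdf-text-extraction-mcp",
  "bookkeeping-ocr": "pdf-text-extraction-mcp",
  "nfs-e": "nfs-e-extraction",
  "nfse": "nfs-e-extraction",
  "nota-fiscal": "nfs-e-extraction",
  "cnpj-invoice": "nfs-e-extraction",
};

// Assumed behavior: only the toLowerCase().trim() line is visible in the diff;
// the fallback to the normalized input is a guess made for this sketch.
function resolveSlug(raw) {
  const lower = raw.toLowerCase().trim();
  // hasOwnProperty guards against inherited keys such as "constructor".
  const isAlias = Object.prototype.hasOwnProperty.call(taskAliases, lower);
  return isAlias ? taskAliases[lower] : lower;
}

console.log(resolveSlug(" NFSE "));
console.log(resolveSlug("boleto"));
console.log(resolveSlug("nfs-e-extraction"));
```

Because 0.9.0 mapped `nfs-e` and `nfse` to `pdf-text-extraction-mcp`, callers that pass those aliases get a different ranking after upgrading. The README hunk in this same diff still lists `nfs-e` among the aliases for `pdf-text-extraction-mcp`, which no longer matches the table above.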
package/package.json
CHANGED
@@ -1,7 +1,7 @@
 {
 "name": "auxiliar-mcp",
-"version": "0.
-"description": "
+"version": "0.10.0",
+"description": "Agent-installable-tool rankings (OCR, PDF extraction, NFS-e, bookkeeping, and more) for Claude Code, Cursor, Claude Desktop, OpenClaw — evaluated on real-world corpora. Call solve_task to get the best skill/MCP/API/local binary for a task, ranked by word accuracy, layout, latency, cost, and install friction. Also Chrome-verified pricing, risks, and setup for 77 cloud services.",
 "type": "module",
 "main": "dist/server.js",
 "bin": {
@@ -24,17 +24,35 @@
 "model-context-protocol",
 "ai-agent",
 "claude-code",
+"claude-desktop",
 "cursor",
 "windsurf",
+"openclaw",
+"clawhub",
+"ocr",
+"pdf",
+"pdf-extraction",
+"pdf-ocr",
+"document-ai",
+"invoice-extraction",
+"nfs-e",
+"boleto",
+"bookkeeping",
+"brazilian-invoice",
+"receipt-parsing",
+"solve-task",
+"task-ranking",
+"agent-tools",
+"agent-skills",
+"tool-selection",
+"agent-upgrade",
 "cloud-services",
 "pricing",
 "developer-tools",
-"
-"
-"
-"
-"stripe",
-"infrastructure"
+"infrastructure",
+"surya",
+"tesseract",
+"google-document-ai"
 ],
 "license": "MIT",
 "repository": {