@matheuskrumenauer/tanya 0.14.0-beta.0 → 0.17.0
- package/README.md +127 -11
- package/dist/chunk-5PSV2Y3X.js +16879 -0
- package/dist/chunk-5PSV2Y3X.js.map +1 -0
- package/dist/cli.js +4022 -16143
- package/dist/cli.js.map +1 -1
- package/dist/runInkChat-AZFI7553.js +950 -0
- package/dist/runInkChat-AZFI7553.js.map +1 -0
- package/package.json +5 -1
package/README.md
CHANGED
@@ -1,21 +1,28 @@
 # Tanya
 
+**A Claude-Code-style coding agent that actually works with DeepSeek.**
+
 [](https://github.com/matheusjkweber/tanya/actions/workflows/ci.yml)
 [](https://www.npmjs.com/package/@matheuskrumenauer/tanya)
 [](./LICENSE)
-[](https://github.com/matheusjkweber/tanya/graphs/contributors)
+[](./docs/benchmarks/eco-30-latest.json)
 
-Tanya is
+Existing tools (Cursor, Claude Code, and Chinese-native CLIs) produce malformed tool calls, dropped schemas, and silent failures on DeepSeek. Tanya is built specifically to handle DeepSeek's quirks - permissive tool-call parsing, retry-with-correction, schema flattening, reasoning-model support - without compromising the deterministic verifier that catches hallucinations cheap models would otherwise sneak past you.
 
-
-
-
+Works with: DeepSeek (primary), Qwen, Grok, Groq, Ollama, and any OpenAI-compatible endpoint.
+
+## Why this exists
 
-
+I have a PhD in AI and I use DeepSeek every day. Every coding-agent CLI I tried either broke tool calls, silently dropped schema details, or made verification feel like an afterthought. I built Tanya so I could actually work with DeepSeek and still have a verifier watching what the model changed.
 
 ## Install
 
+```bash
+npm i -g @matheuskrumenauer/tanya
+export DEEPSEEK_API_KEY=sk-...
+tanya
+```
+
 Local development:
 
 ```bash
@@ -48,6 +55,23 @@ npm install -g --os=linux --cpu=arm64 --libc=glibc @matheuskrumenauer/tanya
 Use `--cpu=amd64` on x64 containers. Tracking issue:
 https://github.com/matheusjkweber/tanya/issues/9.
 
+## Quick start
+
+```bash
+tanya ask "explain this repo"
+tanya run --verify "npm test" "fix the failing test"
+tanya providers test --provider deepseek
+```
+
+## What makes it work with DeepSeek
+
+- Permissive tool-call parsing recovers missing IDs, stringified arguments, missing wrappers, and other almost-OpenAI-compatible responses before a run falls over.
+- Retry-with-correction turns malformed tool calls into explicit repair prompts instead of silent no-ops.
+- Schema flattening keeps narrow providers from rejecting tool definitions with `$ref` or `oneOf` shapes.
+- Reasoning-model support separates `deepseek-reasoner` thinking from final answers, archives it, and tracks reasoning tokens in cost reports.
+- The verifier checks changed files, expected artifacts, validation output, and blockers after the model acts, so cheap-model drift has to pass deterministic review.
+- Defaults to `deepseek-v4-pro` and tracks DeepSeek's API roadmap; legacy aliases still work but warn before their scheduled deprecation.
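
As a rough illustration of the "permissive tool-call parsing" bullet above, the sketch below normalizes an almost-OpenAI-compatible tool call: it unwraps a call that lacks the `function` wrapper, parses stringified arguments, and synthesizes a missing `id`. The names `NormalizedToolCall`, `ParseResult`, and `normalizeToolCall` are hypothetical and are not taken from Tanya's source; the real parser may behave differently.

```typescript
// Hypothetical sketch, not Tanya's actual parser.
interface NormalizedToolCall {
  id: string;
  name: string;
  args: Record<string, unknown>;
}

type ParseResult =
  | { ok: true; call: NormalizedToolCall }
  | { ok: false; error: string };

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

function normalizeToolCall(raw: unknown, index: number): ParseResult {
  if (!isRecord(raw)) return { ok: false, error: "tool call is not an object" };

  // Accept both { function: { name, arguments } } and a bare { name, arguments }.
  const fn = isRecord(raw.function) ? raw.function : raw;
  const name = fn.name;
  if (typeof name !== "string" || name.length === 0) {
    return { ok: false, error: "tool call has no function name" };
  }

  // Arguments may arrive as an object or as a JSON string.
  let args: unknown = fn.arguments ?? {};
  if (typeof args === "string") {
    try {
      args = args.trim() === "" ? {} : JSON.parse(args);
    } catch {
      return { ok: false, error: `arguments for ${name} are not valid JSON` };
    }
  }
  if (!isRecord(args)) {
    return { ok: false, error: `arguments for ${name} must be a JSON object` };
  }

  // Recover a missing ID with a deterministic placeholder.
  const id = typeof raw.id === "string" && raw.id !== "" ? raw.id : `call_${index}`;
  return { ok: true, call: { id, name, args } };
}
```

In a design like this, the `error` string of a failed result is what a retry-with-correction step could quote back to the model in a repair prompt, rather than dropping the call silently.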
+
 ## Contributing
 
 Start with [CONTRIBUTING.md](./CONTRIBUTING.md) for local setup, tool and
@@ -64,7 +88,7 @@ Create `.env` from `.env.example`:
 ```bash
 DEEPSEEK_API_KEY=...
 DEEPSEEK_BASE_URL=https://api.deepseek.com
-TANYA_MODEL=deepseek-
+TANYA_MODEL=deepseek-v4-pro
 ```
 
 Use the reasoner profile for harder coding/planning tasks:
@@ -100,6 +124,9 @@ When set, Tanya appends a summary of completed tasks to the vault daily note. `t
 DeepSeek documents its API as OpenAI-compatible for chat completions:
 https://api-docs.deepseek.com/
 
+- Tracks the DeepSeek API roadmap: warns when legacy model names approach
+deprecation, with a documented migration path in `docs/providers.md`.
+
 ## Backward compatibility
 
 The old `tania` command remains as a binary alias for `tanya`, so existing
@@ -293,12 +320,54 @@ Escalations are visible: if a cheap route exhausts the malformed tool-call
 repair budget, Tanya emits `escalation_event` and uses the route fallback once,
 up to `TANYA_ESCALATION_CAP` per session.
 
+Per-turn reasoning budgets fall back to `TANYA_REASONING_CAP_SHORT` (default
+`2000`) and `TANYA_REASONING_CAP_LONG` (default `8000`) when a route pins no
+`reasoningCap` of its own.
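
The fallback order in the added paragraph above (route cap first, then environment variable, then built-in default) might look like the following sketch. The environment variable names and defaults come from the README text; `TurnKind`, `RouteRule`, and `resolveReasoningCap` are hypothetical names, and the environment is passed in as a plain record so the function stays pure.

```typescript
// Hypothetical sketch of the fallback order; only the env names and defaults come from the README.
type TurnKind = "short" | "long";

interface RouteRule {
  reasoningCap?: { maxTokens: number };
}

const DEFAULT_CAPS: Record<TurnKind, number> = { short: 2000, long: 8000 };

function parsePositiveInt(value: string | undefined): number | undefined {
  if (value === undefined) return undefined;
  const parsed = Number.parseInt(value, 10);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : undefined;
}

function resolveReasoningCap(
  route: RouteRule,
  kind: TurnKind,
  env: Record<string, string | undefined>,
): number {
  // 1. A cap pinned on the route wins.
  if (route.reasoningCap !== undefined) return route.reasoningCap.maxTokens;
  // 2. Otherwise consult the environment variable for this turn kind.
  const envName = kind === "short" ? "TANYA_REASONING_CAP_SHORT" : "TANYA_REASONING_CAP_LONG";
  // 3. Unset or unparsable values fall back to the documented default.
  return parsePositiveInt(env[envName]) ?? DEFAULT_CAPS[kind];
}
```

A caller would typically pass `process.env` as the third argument; treating an unparsable value as unset is a choice made for this sketch, not documented behavior.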
+
 See [docs/routing.md](./docs/routing.md) for schema, examples, context-window
-guards, per-tool model overrides,
+guards, per-tool model overrides, sub-agent model pins, and reasoning budgets.
+
+## Live status
+
+Interactive `tanya chat` sessions show a compact status footer derived from the
+same events already sent to the human sink:
+
+```text
+[deepseek:deepseek-chat | tool_call | $0.04 | 2 tools | 1 child]
+[awaiting permission: run_shell]
+[escalated deepseek:deepseek-chat->openai:gpt-4.1-mini: parse_failure]
+```
+
+The footer is TTY-only. Piped output and JSONL output stay byte-stable and
+receive no ANSI cursor control bytes. Disable it with
+`TANYA_LIVE_STATUS=0` or the legacy `TANIA_LIVE_STATUS=0` alias.
+
+See [docs/live-status.md](./docs/live-status.md) for the surfaced fields,
+streaming strategy, and TTY fallback behavior.
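
To make the footer format and the TTY-only rule above concrete, here is a minimal sketch that renders the first example line and decides whether the footer should be drawn. `FooterState`, `formatFooter`, and `footerEnabled` are hypothetical names; the field set is inferred from the example output only, and the authoritative list lives in `docs/live-status.md`.

```typescript
// Hypothetical sketch; fields are inferred from the README's example footer.
interface FooterState {
  provider: string;
  model: string;
  phase: string;
  costUsd: number;
  toolCalls: number;
  children: number;
}

function countLabel(count: number, singular: string, plural: string): string {
  return `${count} ${count === 1 ? singular : plural}`;
}

function formatFooter(s: FooterState): string {
  const parts = [
    `${s.provider}:${s.model}`,
    s.phase,
    `$${s.costUsd.toFixed(2)}`,
    countLabel(s.toolCalls, "tool", "tools"),
    countLabel(s.children, "child", "children"),
  ];
  return `[${parts.join(" | ")}]`;
}

function footerEnabled(isTTY: boolean, env: Record<string, string | undefined>): boolean {
  // Piped and JSONL output must stay byte-stable, so never draw without a TTY.
  if (!isTTY) return false;
  // Either the current variable or the legacy alias can disable the footer.
  return env.TANYA_LIVE_STATUS !== "0" && env.TANIA_LIVE_STATUS !== "0";
}
```

In a Node process the inputs would plausibly be `Boolean(process.stdout.isTTY)` and `process.env`; keeping them as parameters makes the decision easy to test.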
+
+## Reasoning models
+
+Reasoning routes such as `deepseek-reasoner`, `qwen3-thinking-*`, and
+`grok-3-reasoning` are handled as a separate stream. Tanya archives reasoning to
+`.tania/runs/<runId>/reasoning.jsonl`, emits `reasoning_chunk` events, and keeps
+assistant history reasoning-free so replay and verifier inputs stay stable.
+
+Reasoning tokens appear separately in `/cost` and `/budget`. Route rules can set
+`reasoningCap.maxTokens`; built-in defaults are 2k for planning-like turns and
+8k for synthesis/verification/reasoning turns. If the cap is exceeded, Tanya
+emits `reasoning_truncated` and asks the model to finish.
+
+Use `/memory --reasoning <runId>` to inspect archived reasoning. Use
+`TANYA_HIDE_REASONING=1` to hide reasoning from the human UI while preserving
+JSONL events. Verifier reasoning annotations are off by default; enable
+them with `--verbose-verifier` or `TANYA_VERIFIER_INCLUDE_REASONING=1`.
+
+See [docs/reasoning.md](./docs/reasoning.md) for provider notes, billing math,
+budget defaults, and UX modes.
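
The "separate stream" idea in the Reasoning models section above can be sketched as follows: reasoning deltas are collected apart from the answer, and collection stops once a token cap is exceeded. This assumes deltas expose thinking under `reasoning_content` and the answer under `content`, which is how DeepSeek's reasoner responses are commonly shaped; `StreamDelta`, `SplitResult`, `splitReasoning`, and the four-characters-per-token estimate are all assumptions made for this sketch.

```typescript
// Hypothetical sketch; the delta shape and the token estimate are assumptions.
interface StreamDelta {
  reasoning_content?: string | null;
  content?: string | null;
}

interface SplitResult {
  answer: string; // what would go into assistant history
  reasoningChunks: string[]; // what would be archived, one JSONL record per chunk
  truncated: boolean; // true where a `reasoning_truncated` event would fire
}

function splitReasoning(deltas: StreamDelta[], maxReasoningTokens: number): SplitResult {
  const reasoningChunks: string[] = [];
  let answer = "";
  let usedTokens = 0;
  let truncated = false;

  for (const delta of deltas) {
    if (delta.reasoning_content) {
      // Crude estimate: roughly four characters per token.
      usedTokens += Math.ceil(delta.reasoning_content.length / 4);
      if (usedTokens > maxReasoningTokens) {
        truncated = true;
      } else {
        reasoningChunks.push(delta.reasoning_content);
      }
    }
    // The answer never mixes with reasoning, so history stays reasoning-free.
    if (delta.content) answer += delta.content;
  }
  return { answer, reasoningChunks, truncated };
}
```

Keeping `answer` free of reasoning text is what lets replay and verifier inputs stay stable regardless of how much the model thought on a given run.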
 
 `--verify` adds required verification commands to the run context. Tanya must run and report each exact command before finishing the coding task.
 
-`tanya benchmark run --all` currently exercises
+`tanya benchmark run --all` currently exercises 23 executable low-to-medium regression fixtures: targeted edits, new files, dependency/lockfile updates, framework-style migrations, failing-test repair, frontend smoke checks, artifact/context reuse, streaming long-tool execution, compaction-boundary recovery, run-history logging, dirty worktrees, and report repair.
 
 By default, `tanya run` also performs an independent post-check after the agent finishes. If the workspace has a `typecheck` script, Tanya reruns that exact script with the local package manager (`npm`, `pnpm`, `yarn`, or `bun`). If not, it falls back to `npx tsc --noEmit --pretty false` when a `tsconfig` is present. If the workspace has a `test` script, Tanya reruns that as well unless the run already reported a passing test verification.
 
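
The independent post-check described in the last context paragraph of the hunk above is a small decision procedure, sketched here under stated assumptions. `Workspace` and `planPostChecks` are hypothetical names, and invoking scripts as `<package manager> run <script>` is an assumption about how "reruns that exact script" is carried out.

```typescript
// Hypothetical sketch of the post-check selection rules described in the README.
type PackageManager = "npm" | "pnpm" | "yarn" | "bun";

interface Workspace {
  scripts: Record<string, string>; // the `scripts` map from package.json
  packageManager: PackageManager;
  hasTsconfig: boolean;
}

function planPostChecks(ws: Workspace, testAlreadyPassed: boolean): string[] {
  const commands: string[] = [];

  if (ws.scripts.typecheck !== undefined) {
    // Prefer the workspace's own typecheck script.
    commands.push(`${ws.packageManager} run typecheck`);
  } else if (ws.hasTsconfig) {
    // Fallback named in the README when no script exists.
    commands.push("npx tsc --noEmit --pretty false");
  }

  // Skip the test rerun if the run already reported a passing test verification.
  if (ws.scripts.test !== undefined && !testAlreadyPassed) {
    commands.push(`${ws.packageManager} run test`);
  }
  return commands;
}
```

Returning the command list instead of executing it keeps the selection logic deterministic and easy to check separately from process spawning.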
@@ -361,6 +430,28 @@ Tanya trims model-visible tokens while keeping state reversible and auditable.
 
 See [docs/token-economy.md](./docs/token-economy.md) for the full model, cache locations, and tool-definition knobs.
 
+## Benchmarks
+
+Tanya includes an eval harness for verifier-stress suites, SWE-bench-Lite
+adapters, integration-provided suites, and the `eco-30` token-economy bench.
+
+```bash
+tanya eval --suite tanya-native --dry-run
+tanya eval --suite tanya-native --out .tania/eval/results/tanya-native.json
+tanya eval report .tania/eval/results/tanya-native.json
+tanya eval compare docs/benchmarks/tanya-native-latest.json .tania/eval/results/tanya-native.json --format markdown
+```
+
+Public snapshots live in [docs/benchmarks](./docs/benchmarks/). The eval result
+schema and determinism contract are documented in
+[docs/eval-format.md](./docs/eval-format.md).
+
+`eco-30` is the token-economy suite. Its reports include total cost, cost per
+pass, tokens per pass, reasoning share, and cost-regression checks. The
+`verifier-self-test` suite is the verifier moat regression net: known-correct
+and known-incorrect artifacts where the expected outcome is the verifier's
+classification, not the model's output.
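
The `eco-30` report fields named above can be read as simple ratios over per-case results. The sketch below shows one plausible set of definitions; the actual formulas are specified in `docs/eval-format.md`, and `CaseResult`, `EconomySummary`, and `summarizeEconomy` are hypothetical names. In particular, counting reasoning tokens as part of the total is an assumption.

```typescript
// Hypothetical sketch; metric definitions are assumptions, not the documented schema.
interface CaseResult {
  passed: boolean;
  costUsd: number;
  promptTokens: number;
  completionTokens: number;
  reasoningTokens: number;
}

interface EconomySummary {
  totalCostUsd: number;
  passes: number;
  costPerPass: number | null; // null when nothing passed
  tokensPerPass: number | null;
  reasoningShare: number; // reasoning tokens / all tokens
}

function summarizeEconomy(results: CaseResult[]): EconomySummary {
  let totalCostUsd = 0;
  let totalTokens = 0;
  let reasoningTokens = 0;
  let passes = 0;

  for (const r of results) {
    totalCostUsd += r.costUsd;
    totalTokens += r.promptTokens + r.completionTokens + r.reasoningTokens;
    reasoningTokens += r.reasoningTokens;
    if (r.passed) passes += 1;
  }

  return {
    totalCostUsd,
    passes,
    // Failed cases still cost money, so they inflate cost and tokens per pass.
    costPerPass: passes > 0 ? totalCostUsd / passes : null,
    tokensPerPass: passes > 0 ? totalTokens / passes : null,
    reasoningShare: totalTokens > 0 ? reasoningTokens / totalTokens : 0,
  };
}
```

Dividing total spend by passes (rather than averaging only over passing cases) is a deliberate choice in this sketch: it charges failed attempts to the suite, which is what a cost-regression check would want to notice.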
+
 ## Edit blocks
 
 `edit_block` applies bounded search/replace edits without falling back to a
@@ -391,6 +482,30 @@ pass.
 See [docs/edit-blocks.md](./docs/edit-blocks.md) for the full tool reference,
 permission model, confidence threshold, and failure modes.
 
+## Structural repo-map
+
+Lite prompts can include a generated structural map from
+`.tania/index/repo-map.json`. The map lists workspace-relative files, language,
+parser provenance, top-level symbols, imports, and exports so cheap providers
+can target likely files before spending turns on blind reads.
+
+Tanya indexes TypeScript/JavaScript, Python, Go, Swift, and Kotlin with a
+lightweight ripgrep-style parser and falls back to path-only entries when file
+content cannot be read. Generated, binary, ignored, and oversized files are
+skipped. The repo-map is advisory context only: agents must still read files
+before editing, and the verifier remains the final authority.
+
+Use `TANYA_LITE_PROMPT=1` to inject a ranked repo-map excerpt. Tune the default
+1000-token section budget with `TANYA_REPO_MAP_PROMPT_BUDGET`; the legacy
+`TANIA_*` alias is also accepted. If the prompt budget is tight, the repo-map
+drops before skill packs because it is generated and recoverable.
+
+Use `inspect_repo_map` when the model needs more structural detail by file,
+symbol, or language without burning prompt tokens on the whole map.
+
+See [docs/repo-map.md](./docs/repo-map.md) for schema, parser status, ranking,
+budget interaction, and cache invalidation.
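
A "ranked repo-map excerpt" under a token budget, as described in the section above, amounts to sorting entries and emitting lines until the budget runs out. The sketch below assumes each entry already carries a numeric `score` and estimates tokens at roughly four characters each; the real ranking and schema are described in `docs/repo-map.md`, and `RepoMapEntry`, `estimateTokens`, and `buildExcerpt` are hypothetical names.

```typescript
// Hypothetical sketch; the entry shape, scoring, and token estimate are assumptions.
interface RepoMapEntry {
  path: string; // workspace-relative file path
  symbols: string[]; // top-level symbols found in the file
  score: number; // higher means more relevant to the task
}

function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

function buildExcerpt(entries: RepoMapEntry[], budgetTokens: number = 1000): string[] {
  // Sort a copy so the caller's array is left untouched.
  const ranked = [...entries].sort((a, b) => b.score - a.score);
  const lines: string[] = [];
  let used = 0;

  for (const entry of ranked) {
    const line = `${entry.path}: ${entry.symbols.join(", ")}`;
    const cost = estimateTokens(line);
    // Stop at the first entry that would overflow the section budget.
    if (used + cost > budgetTokens) break;
    lines.push(line);
    used += cost;
  }
  return lines;
}
```

The default of 1000 mirrors the section budget stated in the README; a caller could instead pass a value parsed from `TANYA_REPO_MAP_PROMPT_BUDGET`.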
+
 Context files are generic JSON envelopes for caller-supplied task metadata, artifacts, instructions, and verification commands.
 
 ## Current Tools
@@ -398,6 +513,7 @@ Context files are generic JSON envelopes for caller-supplied task metadata, arti
 - `list_files`
 - `read_file`
 - `search`
+- `inspect_repo_map`
 - `inspect_project_context`
 - `find_reusable_artifacts`
 - `build_task_brief`
@@ -448,7 +564,7 @@ To make variants, override terminal copy with repeated `--line` flags:
 tanya video one-terminal-simctl \
 --output-dir assets/video \
 --basename install-failure \
---line '$ xcrun simctl install booted
+--line '$ xcrun simctl install booted DemoApp.app' \
 --line 'error: unable to find a booted simulator' \
 --line '$ xcrun simctl io booted screenshot out.png' \
 --line 'xcrun: error: selected device is not available'