npm - @matheuskrumenauer/tanya - Versions diffs - 0.14.0-beta.0 → 0.17.0 - Mend

@matheuskrumenauer/tanya 0.14.0-beta.0 → 0.17.0

Files changed (8)

package/README.md +127 -11
package/dist/chunk-5PSV2Y3X.js +16879 -0
package/dist/chunk-5PSV2Y3X.js.map +1 -0
package/dist/cli.js +4022 -16143
package/dist/cli.js.map +1 -1
package/dist/runInkChat-AZFI7553.js +950 -0
package/dist/runInkChat-AZFI7553.js.map +1 -0
package/package.json +5 -1

package/README.md CHANGED

@@ -1,21 +1,28 @@
 # Tanya
+**A Claude-Code-style coding agent that actually works with DeepSeek.**
 [![CI](https://github.com/matheusjkweber/tanya/actions/workflows/ci.yml/badge.svg)](https://github.com/matheusjkweber/tanya/actions/workflows/ci.yml)
 [![npm version](https://img.shields.io/npm/v/%40matheuskrumenauer%2Ftanya.svg)](https://www.npmjs.com/package/@matheuskrumenauer/tanya)
 [![license](https://img.shields.io/github/license/matheusjkweber/tanya.svg)](./LICENSE)
-[![node](https://img.shields.io/badge/node-%3E%3D20-brightgreen.svg)](./package.json)
-[![contributors](https://img.shields.io/github/contributors/matheusjkweber/tanya.svg)](https://github.com/matheusjkweber/tanya/graphs/contributors)
+[![DeepSeek-V4-Pro pass rate](https://img.shields.io/badge/deepseek--v4--pro%20pass%20rate-96.7%25-brightgreen)](./docs/benchmarks/eco-30-latest.json)
-Tanya is a live, tool-using AI CLI. It starts like Claude Code:
+Existing tools (Cursor, Claude Code, and Chinese-native CLIs) produce malformed tool calls, dropped schemas, and silent failures on DeepSeek. Tanya is built specifically to handle DeepSeek's quirks - permissive tool-call parsing, retry-with-correction, schema flattening, reasoning-model support - without compromising the deterministic verifier that catches hallucinations cheap models would otherwise sneak past you.
-```bash
-tanya
-```
+Works with: DeepSeek (primary), Qwen, Grok, Groq, Ollama, and any OpenAI-compatible endpoint.
+## Why this exists
-The first provider is DeepSeek. The provider layer is OpenAI-compatible, so future providers can be added with an API key, base URL, and model name.
+I have a PhD in AI and I use DeepSeek every day. Every coding-agent CLI I tried either broke tool calls, silently dropped schema details, or made verification feel like an afterthought. I built Tanya so I could actually work with DeepSeek and still have a verifier watching what the model changed.
 ## Install
+```bash
+npm i -g @matheuskrumenauer/tanya
+export DEEPSEEK_API_KEY=sk-...
+tanya
+```
 Local development:
 ```bash
@@ -48,6 +55,23 @@ npm install -g --os=linux --cpu=arm64 --libc=glibc @matheuskrumenauer/tanya
 Use `--cpu=amd64` on x64 containers. Tracking issue:
 https://github.com/matheusjkweber/tanya/issues/9.
+## Quick start
+```bash
+tanya ask "explain this repo"
+tanya run --verify "npm test" "fix the failing test"
+tanya providers test --provider deepseek
+```
+## What makes it work with DeepSeek
+- Permissive tool-call parsing recovers missing IDs, stringified arguments, missing wrappers, and other almost-OpenAI-compatible responses before a run falls over.
+- Retry-with-correction turns malformed tool calls into explicit repair prompts instead of silent no-ops.
+- Schema flattening keeps narrow providers from rejecting tool definitions with `$ref` or `oneOf` shapes.
+- Reasoning-model support separates `deepseek-reasoner` thinking from final answers, archives it, and tracks reasoning tokens in cost reports.
+- The verifier checks changed files, expected artifacts, validation output, and blockers after the model acts, so cheap-model drift has to pass deterministic review.
+- Defaults to `deepseek-v4-pro` and tracks DeepSeek's API roadmap; legacy aliases still work but warn before their scheduled deprecation.
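As a rough illustration of the first bullet above, permissive tool-call parsing might look something like the sketch below. This is not Tanya's code: the `ToolCall` and `ParseResult` shapes, the `normalizeToolCall` name, and the `call_<index>` placeholder ID format are all assumptions made for the example.

```typescript
// Hypothetical shapes; Tanya's real types are not shown in this diff.
interface ToolCall {
  id: string;
  name: string;
  arguments: Record<string, unknown>;
}

type ParseResult =
  | { ok: true; call: ToolCall }
  | { ok: false; error: string };

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Accepts either the OpenAI-style wrapper ({ id, function: { name, arguments } })
// or a bare { name, arguments } object, and repairs what it can.
function normalizeToolCall(raw: unknown, index: number): ParseResult {
  if (!isRecord(raw)) return { ok: false, error: "tool call is not an object" };

  // Missing wrapper: fall back to the object itself.
  const fn = isRecord(raw.function) ? raw.function : raw;

  const name = fn.name;
  if (typeof name !== "string" || name.length === 0) {
    return { ok: false, error: "tool call has no function name" };
  }

  // Stringified arguments: parse the JSON text into an object.
  let args: unknown = fn.arguments ?? {};
  if (typeof args === "string") {
    try {
      args = args.trim() === "" ? {} : JSON.parse(args);
    } catch {
      return { ok: false, error: `arguments for ${name} are not valid JSON` };
    }
  }
  if (!isRecord(args)) {
    return { ok: false, error: `arguments for ${name} must be a JSON object` };
  }

  // Missing ID: synthesize a placeholder from the call position (assumed format).
  const id = typeof raw.id === "string" && raw.id !== "" ? raw.id : `call_${index}`;

  return { ok: true, call: { id, name, arguments: args } };
}
```

In a design like this, the `error` string of a failed result is the natural input for a retry-with-correction prompt, so a malformed call becomes an explicit repair request rather than a silent no-op.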
 ## Contributing
 Start with [CONTRIBUTING.md](./CONTRIBUTING.md) for local setup, tool and
@@ -64,7 +88,7 @@ Create `.env` from `.env.example`:
 ```bash
 DEEPSEEK_API_KEY=...
 DEEPSEEK_BASE_URL=https://api.deepseek.com
-TANYA_MODEL=deepseek-chat
+TANYA_MODEL=deepseek-v4-pro
 ```
 Use the reasoner profile for harder coding/planning tasks:
@@ -100,6 +124,9 @@ When set, Tanya appends a summary of completed tasks to the vault daily note. `t
 DeepSeek documents its API as OpenAI-compatible for chat completions:
 https://api-docs.deepseek.com/
+- Tracks the DeepSeek API roadmap: warns when legacy model names approach
+  deprecation, with a documented migration path in `docs/providers.md`.
 ## Backward compatibility
 The old `tania` command remains as a binary alias for `tanya`, so existing
@@ -293,12 +320,54 @@ Escalations are visible: if a cheap route exhausts the malformed tool-call
 repair budget, Tanya emits `escalation_event` and uses the route fallback once,
 up to `TANYA_ESCALATION_CAP` per session.
+Per-turn reasoning budgets fall back to `TANYA_REASONING_CAP_SHORT` (default
+`2000`) and `TANYA_REASONING_CAP_LONG` (default `8000`) when a route pins no
+`reasoningCap` of its own.
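The fallback order described above can be sketched as follows. The environment variable names and defaults come from the README text; the `RouteRule` shape, the `"short"`/`"long"` turn labels, and the handling of unparsable values are assumptions for illustration only.

```typescript
type TurnKind = "short" | "long"; // hypothetical labels for the two budgets

interface RouteRule {
  // Hypothetical shape; the README only names `reasoningCap.maxTokens`.
  reasoningCap?: { maxTokens: number };
}

type Env = Record<string, string | undefined>;

// Assumption: missing, non-numeric, or non-positive values fall back to the default.
function parsePositiveInt(value: string | undefined, fallback: number): number {
  const parsed = Number.parseInt(value ?? "", 10);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : fallback;
}

function resolveReasoningCap(route: RouteRule, kind: TurnKind, env: Env): number {
  // A cap pinned on the route wins over any environment default.
  if (route.reasoningCap) return route.reasoningCap.maxTokens;
  return kind === "short"
    ? parsePositiveInt(env.TANYA_REASONING_CAP_SHORT, 2000)
    : parsePositiveInt(env.TANYA_REASONING_CAP_LONG, 8000);
}
```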
 See [docs/routing.md](./docs/routing.md) for schema, examples, context-window
-guards, per-tool model overrides, and sub-agent model pins.
+guards, per-tool model overrides, sub-agent model pins, and reasoning budgets.
+## Live status
+Interactive `tanya chat` sessions show a compact status footer derived from the
+same events already sent to the human sink:
+```text
+[deepseek:deepseek-chat | tool_call | $0.04 | 2 tools | 1 child]
+[awaiting permission: run_shell]
+[escalated deepseek:deepseek-chat->openai:gpt-4.1-mini: parse_failure]
+```
+The footer is TTY-only. Piped output and JSONL output stay byte-stable and
+receive no ANSI cursor control bytes. Disable it with
+`TANYA_LIVE_STATUS=0` or the legacy `TANIA_LIVE_STATUS=0` alias.
+See [docs/live-status.md](./docs/live-status.md) for the surfaced fields,
+streaming strategy, and TTY fallback behavior.
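A minimal sketch of the gating described in this section, assuming the footer is decided from a TTY flag plus the two environment variables. The function name and the precedence between `TANYA_LIVE_STATUS` and the legacy alias are assumptions; the README does not specify either.

```typescript
type Env = Record<string, string | undefined>;

function liveStatusEnabled(isTTY: boolean, env: Env): boolean {
  // Piped and JSONL output never get the footer.
  if (!isTTY) return false;
  // Assumption: the new name takes precedence over the legacy alias.
  const flag = env.TANYA_LIVE_STATUS ?? env.TANIA_LIVE_STATUS;
  return flag !== "0";
}
```

In a Node.js CLI this could be called as `liveStatusEnabled(Boolean(process.stdout.isTTY), process.env)`.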
+## Reasoning models
+Reasoning routes such as `deepseek-reasoner`, `qwen3-thinking-*`, and
+`grok-3-reasoning` are handled as a separate stream. Tanya archives reasoning to
+`.tania/runs/<runId>/reasoning.jsonl`, emits `reasoning_chunk` events, and keeps
+assistant history reasoning-free so replay and verifier inputs stay stable.
+Reasoning tokens appear separately in `/cost` and `/budget`. Route rules can set
+`reasoningCap.maxTokens`; built-in defaults are 2k for planning-like turns and
+8k for synthesis/verification/reasoning turns. If the cap is exceeded, Tanya
+emits `reasoning_truncated` and asks the model to finish.
+Use `/memory --reasoning <runId>` to inspect archived reasoning. Use
+`TANYA_HIDE_REASONING=1` to hide reasoning from the human UI while preserving
+JSONL events. Verifier reasoning annotations are off by default; enable
+them with `--verbose-verifier` or `TANYA_VERIFIER_INCLUDE_REASONING=1`.
+See [docs/reasoning.md](./docs/reasoning.md) for provider notes, billing math,
+budget defaults, and UX modes.
 `--verify` adds required verification commands to the run context. Tanya must run and report each exact command before finishing the coding task.
-`tanya benchmark run --all` currently exercises 27 executable low-to-medium regression fixtures: targeted edits, new files, dependency/lockfile updates, framework-style migrations, failing-test repair, frontend smoke checks, artifact/context reuse, streaming long-tool execution, compaction-boundary recovery, run-history logging, dirty worktrees, report repair, and the CosmoHQ mobile/backend smoke profiles.
+`tanya benchmark run --all` currently exercises 23 executable low-to-medium regression fixtures: targeted edits, new files, dependency/lockfile updates, framework-style migrations, failing-test repair, frontend smoke checks, artifact/context reuse, streaming long-tool execution, compaction-boundary recovery, run-history logging, dirty worktrees, and report repair.
 By default, `tanya run` also performs an independent post-check after the agent finishes. If the workspace has a `typecheck` script, Tanya reruns that exact script with the local package manager (`npm`, `pnpm`, `yarn`, or `bun`). If not, it falls back to `npx tsc --noEmit --pretty false` when a `tsconfig` is present. If the workspace has a `test` script, Tanya reruns that as well unless the run already reported a passing test verification.
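The post-check selection above can be sketched as a small planning function. This is an illustration under stated assumptions, not Tanya's implementation: the `Workspace` shape and `planPostChecks` name are hypothetical, package-manager detection is taken as an input, and the `<manager> run <script>` form is assumed as the way a script is rerun.

```typescript
type PackageManager = "npm" | "pnpm" | "yarn" | "bun";

// Hypothetical summary of what the post-check needs to know about a workspace.
interface Workspace {
  packageManager: PackageManager;
  scripts: Record<string, string>; // the "scripts" field of package.json
  hasTsconfig: boolean;
  testAlreadyPassed: boolean; // the run already reported a passing test verification
}

function planPostChecks(ws: Workspace): string[] {
  const commands: string[] = [];

  if ("typecheck" in ws.scripts) {
    // Prefer the workspace's own typecheck script.
    commands.push(`${ws.packageManager} run typecheck`);
  } else if (ws.hasTsconfig) {
    // Fallback named in the README.
    commands.push("npx tsc --noEmit --pretty false");
  }

  if ("test" in ws.scripts && !ws.testAlreadyPassed) {
    commands.push(`${ws.packageManager} run test`);
  }

  return commands;
}
```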
@@ -361,6 +430,28 @@ Tanya trims model-visible tokens while keeping state reversible and auditable.
 See [docs/token-economy.md](./docs/token-economy.md) for the full model, cache locations, and tool-definition knobs.
+## Benchmarks
+Tanya includes an eval harness for verifier-stress suites, SWE-bench-Lite
+adapters, integration-provided suites, and the `eco-30` token-economy bench.
+```bash
+tanya eval --suite tanya-native --dry-run
+tanya eval --suite tanya-native --out .tania/eval/results/tanya-native.json
+tanya eval report .tania/eval/results/tanya-native.json
+tanya eval compare docs/benchmarks/tanya-native-latest.json .tania/eval/results/tanya-native.json --format markdown
+```
+Public snapshots live in [docs/benchmarks](./docs/benchmarks/). The eval result
+schema and determinism contract are documented in
+[docs/eval-format.md](./docs/eval-format.md).
+`eco-30` is the token-economy suite. Its reports include total cost, cost per
+pass, tokens per pass, reasoning share, and cost-regression checks. The
+`verifier-self-test` suite is the verifier moat regression net: known-correct
+and known-incorrect artifacts where the expected outcome is the verifier's
+classification, not the model's output.
 ## Edit blocks
 `edit_block` applies bounded search/replace edits without falling back to a
@@ -391,6 +482,30 @@ pass.
 See [docs/edit-blocks.md](./docs/edit-blocks.md) for the full tool reference,
 permission model, confidence threshold, and failure modes.
+## Structural repo-map
+Lite prompts can include a generated structural map from
+`.tania/index/repo-map.json`. The map lists workspace-relative files, language,
+parser provenance, top-level symbols, imports, and exports so cheap providers
+can target likely files before spending turns on blind reads.
+Tanya indexes TypeScript/JavaScript, Python, Go, Swift, and Kotlin with a
+lightweight ripgrep-style parser and falls back to path-only entries when file
+content cannot be read. Generated, binary, ignored, and oversized files are
+skipped. The repo-map is advisory context only: agents must still read files
+before editing, and the verifier remains the final authority.
+Use `TANYA_LITE_PROMPT=1` to inject a ranked repo-map excerpt. Tune the default
+1000-token section budget with `TANYA_REPO_MAP_PROMPT_BUDGET`; the legacy
+`TANIA_*` alias is also accepted. If the prompt budget is tight, the repo-map
+drops before skill packs because it is generated and recoverable.
+Use `inspect_repo_map` when the model needs more structural detail by file,
+symbol, or language without burning prompt tokens on the whole map.
+See [docs/repo-map.md](./docs/repo-map.md) for schema, parser status, ranking,
+budget interaction, and cache invalidation.
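To make the "ranked repo-map excerpt" idea concrete, here is a hedged sketch of fitting ranked entries into a token budget. The `RepoMapEntry` shape is inferred from the fields listed above and may not match the real schema in `docs/repo-map.md`; the `score` field, the rendering format, and the four-characters-per-token estimate are assumptions.

```typescript
// Hypothetical entry shape based on the fields the README lists.
interface RepoMapEntry {
  path: string; // workspace-relative
  language: string;
  parser: string; // parser provenance
  symbols: string[];
  imports: string[];
  exports: string[];
  score: number; // assumed ranking score, higher is more relevant
}

// Crude estimate (about four characters per token); not Tanya's tokenizer.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

function renderEntry(entry: RepoMapEntry): string {
  return `${entry.path} [${entry.language}] ${entry.symbols.join(", ")}`;
}

// Greedily keeps the highest-ranked entries that fit the section budget.
function buildExcerpt(entries: RepoMapEntry[], budgetTokens = 1000): string[] {
  const ranked = [...entries].sort((a, b) => b.score - a.score);
  const lines: string[] = [];
  let used = 0;
  for (const entry of ranked) {
    const line = renderEntry(entry);
    const cost = estimateTokens(line);
    if (used + cost > budgetTokens) break;
    lines.push(line);
    used += cost;
  }
  return lines;
}
```

Anything that does not fit the excerpt would remain reachable on demand, which is the role the README gives to `inspect_repo_map`.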
 Context files are generic JSON envelopes for caller-supplied task metadata, artifacts, instructions, and verification commands.
 ## Current Tools
@@ -398,6 +513,7 @@ Context files are generic JSON envelopes for caller-supplied task metadata, arti
 - `list_files`
 - `read_file`
 - `search`
+- `inspect_repo_map`
 - `inspect_project_context`
 - `find_reusable_artifacts`
 - `build_task_brief`
@@ -448,7 +564,7 @@ To make variants, override terminal copy with repeated `--line` flags:
 tanya video one-terminal-simctl \
   --output-dir assets/video \
   --basename install-failure \
-  --line '$ xcrun simctl install booted CosmoKit.app' \
+  --line '$ xcrun simctl install booted DemoApp.app' \
   --line 'error: unable to find a booted simulator' \
   --line '$ xcrun simctl io booted screenshot out.png' \
   --line 'xcrun: error: selected device is not available'