axm-echo 0.0.1.dev0__tar.gz → 0.1.0__tar.gz
- axm_echo-0.1.0/.gitignore +48 -0
- axm_echo-0.1.0/CONTRIBUTING.md +16 -0
- axm_echo-0.1.0/PKG-INFO +84 -0
- axm_echo-0.1.0/README.md +58 -0
- axm_echo-0.1.0/docs/explanation/architecture.md +58 -0
- axm_echo-0.1.0/docs/howto/index.md +9 -0
- axm_echo-0.1.0/docs/howto/reuse-check-in-planning.md +78 -0
- axm_echo-0.1.0/docs/index.md +98 -0
- axm_echo-0.1.0/docs/reference/cli.md +136 -0
- axm_echo-0.1.0/docs/tutorials/getting-started.md +92 -0
- axm_echo-0.1.0/mkdocs.yml +14 -0
- axm_echo-0.1.0/pyproject.toml +262 -0
- axm_echo-0.1.0/src/axm_echo/__init__.py +42 -0
- axm_echo-0.1.0/src/axm_echo/_version.py +24 -0
- axm_echo-0.1.0/src/axm_echo/cluster.py +320 -0
- axm_echo-0.1.0/src/axm_echo/corpus.py +318 -0
- axm_echo-0.1.0/src/axm_echo/embedding.py +198 -0
- axm_echo-0.1.0/src/axm_echo/scope.py +77 -0
- axm_echo-0.1.0/src/axm_echo/structural.py +86 -0
- axm_echo-0.1.0/src/axm_echo/tools.py +606 -0
- axm_echo-0.1.0/src/axm_echo/waiver.py +168 -0
- axm_echo-0.1.0/tests/__init__.py +1 -0
- axm_echo-0.1.0/tests/conftest.py +3 -0
- axm_echo-0.1.0/tests/e2e/__init__.py +1 -0
- axm_echo-0.1.0/tests/e2e/conftest.py +3 -0
- axm_echo-0.1.0/tests/e2e/test_echo_check.py +78 -0
- axm_echo-0.1.0/tests/e2e/test_echo_code.py +78 -0
- axm_echo-0.1.0/tests/integration/__init__.py +1 -0
- axm_echo-0.1.0/tests/integration/conftest.py +3 -0
- axm_echo-0.1.0/tests/integration/test_boilerplate_calibration.py +240 -0
- axm_echo-0.1.0/tests/integration/test_discover_package_roots__extract_monorepo.py +164 -0
- axm_echo-0.1.0/tests/integration/test_echo_check.py +277 -0
- axm_echo-0.1.0/tests/integration/test_echo_code.py +581 -0
- axm_echo-0.1.0/tests/integration/test_embed__extract_package.py +240 -0
- axm_echo-0.1.0/tests/integration/test_load_scope.py +83 -0
- axm_echo-0.1.0/tests/unit/__init__.py +1 -0
- axm_echo-0.1.0/tests/unit/conftest.py +3 -0
- axm_echo-0.1.0/tests/unit/test_cluster.py +177 -0
- axm_echo-0.1.0/tests/unit/test_corpus.py +89 -0
- axm_echo-0.1.0/tests/unit/test_echo_code_guards.py +30 -0
- axm_echo-0.1.0/tests/unit/test_embedding.py +91 -0
- axm_echo-0.1.0/tests/unit/test_structural.py +65 -0
- axm_echo-0.1.0/tests/unit/test_version.py +11 -0
- axm_echo-0.1.0/tests/unit/test_waiver.py +94 -0
- axm_echo-0.0.1.dev0/PKG-INFO +0 -14
- axm_echo-0.0.1.dev0/README.md +0 -3
- axm_echo-0.0.1.dev0/pyproject.toml +0 -16
- axm_echo-0.0.1.dev0/src/axm_echo/__init__.py +0 -1
- {axm_echo-0.0.1.dev0 → axm_echo-0.1.0}/src/axm_echo/py.typed +0 -0
|
@@ -0,0 +1,48 @@
|
|
|
1
|
+
# Python
|
|
2
|
+
__pycache__/
|
|
3
|
+
*.py[cod]
|
|
4
|
+
*$py.class
|
|
5
|
+
*.so
|
|
6
|
+
# Generated version file
|
|
7
|
+
_version.py
|
|
8
|
+
.Python
|
|
9
|
+
build/
|
|
10
|
+
dist/
|
|
11
|
+
*.egg-info/
|
|
12
|
+
*.egg
|
|
13
|
+
|
|
14
|
+
# Virtual environments
|
|
15
|
+
.venv/
|
|
16
|
+
.uv/
|
|
17
|
+
venv/
|
|
18
|
+
ENV/
|
|
19
|
+
|
|
20
|
+
# Testing & Coverage
|
|
21
|
+
.pytest_cache/
|
|
22
|
+
.coverage
|
|
23
|
+
coverage.xml
|
|
24
|
+
coverage.json
|
|
25
|
+
htmlcov/
|
|
26
|
+
coverage_html/
|
|
27
|
+
.tox/
|
|
28
|
+
|
|
29
|
+
# Type checking & Linting
|
|
30
|
+
.mypy_cache/
|
|
31
|
+
.ruff_cache/
|
|
32
|
+
|
|
33
|
+
# IDE
|
|
34
|
+
.idea/
|
|
35
|
+
.vscode/
|
|
36
|
+
*.swp
|
|
37
|
+
*.swo
|
|
38
|
+
|
|
39
|
+
# Environment
|
|
40
|
+
.env
|
|
41
|
+
.envrc
|
|
42
|
+
|
|
43
|
+
# Documentation
|
|
44
|
+
site/
|
|
45
|
+
|
|
46
|
+
# OS
|
|
47
|
+
.DS_Store
|
|
48
|
+
Thumbs.db
|
|
@@ -0,0 +1,16 @@
|
|
|
1
|
+
# Contributing to axm-echo
|
|
2
|
+
|
|
3
|
+
Thanks for your interest in contributing!
|
|
4
|
+
|
|
5
|
+
## Development setup
|
|
6
|
+
|
|
7
|
+
```bash
|
|
8
|
+
uv sync --all-groups
|
|
9
|
+
uv run pytest
|
|
10
|
+
```
|
|
11
|
+
|
|
12
|
+
## Pull requests
|
|
13
|
+
|
|
14
|
+
- Follow Conventional Commits for commit messages.
|
|
15
|
+
- Run `make lint` and `make test` before opening a PR.
|
|
16
|
+
- Add tests for new features and bug fixes.
|
axm_echo-0.1.0/PKG-INFO
ADDED
|
@@ -0,0 +1,84 @@
|
|
|
1
|
+
Metadata-Version: 2.4
|
|
2
|
+
Name: axm-echo
|
|
3
|
+
Version: 0.1.0
|
|
4
|
+
Summary: Neural similarity & echo detection over code corpora (MiniLM + scikit-learn).
|
|
5
|
+
Project-URL: Homepage, https://github.com/axm-protocols/axm-forge-workspace
|
|
6
|
+
Project-URL: Documentation, https://axm-protocols.github.io/axm-forge-workspace/
|
|
7
|
+
Project-URL: Repository, https://github.com/axm-protocols/axm-forge-workspace.git
|
|
8
|
+
Project-URL: Issues, https://github.com/axm-protocols/axm-forge-workspace/issues
|
|
9
|
+
Author-email: Gabriel Jarry <gabriel@axm-protocols.io>
|
|
10
|
+
License-Expression: MIT
|
|
11
|
+
Classifier: Development Status :: 3 - Alpha
|
|
12
|
+
Classifier: Intended Audience :: Developers
|
|
13
|
+
Classifier: License :: OSI Approved :: MIT License
|
|
14
|
+
Classifier: Programming Language :: Python :: 3.12
|
|
15
|
+
Classifier: Programming Language :: Python :: 3.13
|
|
16
|
+
Classifier: Topic :: Software Development :: Libraries :: Python Modules
|
|
17
|
+
Classifier: Typing :: Typed
|
|
18
|
+
Requires-Python: >=3.12
|
|
19
|
+
Requires-Dist: axm
|
|
20
|
+
Requires-Dist: axm-ast
|
|
21
|
+
Requires-Dist: numpy>=2.5.0
|
|
22
|
+
Requires-Dist: scikit-learn>=1.9.0
|
|
23
|
+
Requires-Dist: sentence-transformers>=5.6.0
|
|
24
|
+
Requires-Dist: torch>=2.12.1
|
|
25
|
+
Description-Content-Type: text/markdown
|
|
26
|
+
|
|
27
|
+
# axm-echo
|
|
28
|
+
|
|
29
|
+
Neural similarity & echo detection over code corpora (MiniLM + scikit-learn).
|
|
30
|
+
|
|
31
|
+
<p align="center">
|
|
32
|
+
<a href="https://forge.axm-protocols.io/audit/"><img src="https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/axm-protocols/axm-forge-workspace/gh-pages/badges/axm-echo/axm-audit.json" alt="axm-audit"></a>
|
|
33
|
+
<a href="https://forge.axm-protocols.io/init/"><img src="https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/axm-protocols/axm-forge-workspace/gh-pages/badges/axm-echo/axm-init.json" alt="axm-init"></a>
|
|
34
|
+
<a href="https://github.com/axm-protocols/axm-forge-workspace/actions/workflows/axm-quality.yml"><img src="https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/axm-protocols/axm-forge-workspace/gh-pages/badges/axm-echo/coverage.json" alt="Coverage"></a>
|
|
35
|
+
<img src="https://img.shields.io/badge/python-3.12%2B-blue" alt="Python 3.12+">
|
|
36
|
+
</p>
|
|
37
|
+
|
|
38
|
+
---
|
|
39
|
+
|
|
40
|
+
## Overview
|
|
41
|
+
|
|
42
|
+
Neural similarity & echo detection over code corpora (MiniLM + scikit-learn).
|
|
43
|
+
|
|
44
|
+
## Features
|
|
45
|
+
|
|
46
|
+
- **Neural by default** — the `st` (MiniLM) backend ships in the base install
|
|
47
|
+
(`torch` + `sentence-transformers`) and runs in-process; no extra to enable.
|
|
48
|
+
- **`tfidf` opt-out** — the pure-CPU `numpy` + `scikit-learn` backend stays
|
|
49
|
+
available (`--backend tfidf`) for callers that want to avoid loading torch.
|
|
50
|
+
- Built on `axm-ast` for code-corpus extraction — the corpus feeding both
|
|
51
|
+
`echo_code` (cross-package dedup) and `echo_check` (reuse retrieval).
|
|
52
|
+
|
|
53
|
+
## Installation
|
|
54
|
+
|
|
55
|
+
```bash
|
|
56
|
+
# echo is neural by default — the install ships torch + sentence-transformers.
|
|
57
|
+
uv add axm-echo
|
|
58
|
+
```
|
|
59
|
+
|
|
60
|
+
Or as a workspace dependency in `pyproject.toml`:
|
|
61
|
+
|
|
62
|
+
```toml
|
|
63
|
+
[project]
|
|
64
|
+
dependencies = ["axm-echo"]
|
|
65
|
+
|
|
66
|
+
[tool.uv.sources]
|
|
67
|
+
axm-echo = { workspace = true }
|
|
68
|
+
```
|
|
69
|
+
|
|
70
|
+
## Development
|
|
71
|
+
|
|
72
|
+
This package is part of the **axm-forge-workspace** uv workspace.
|
|
73
|
+
|
|
74
|
+
```bash
|
|
75
|
+
# Run tests for this package
|
|
76
|
+
uv run pytest --package axm-echo
|
|
77
|
+
|
|
78
|
+
# From workspace root
|
|
79
|
+
make test-axm-echo
|
|
80
|
+
```
|
|
81
|
+
|
|
82
|
+
## License
|
|
83
|
+
|
|
84
|
+
MIT — © 2026 Gabriel Jarry
|
axm_echo-0.1.0/README.md
ADDED
|
@@ -0,0 +1,58 @@
|
|
|
1
|
+
# axm-echo
|
|
2
|
+
|
|
3
|
+
Neural similarity & echo detection over code corpora (MiniLM + scikit-learn).
|
|
4
|
+
|
|
5
|
+
<p align="center">
|
|
6
|
+
<a href="https://forge.axm-protocols.io/audit/"><img src="https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/axm-protocols/axm-forge-workspace/gh-pages/badges/axm-echo/axm-audit.json" alt="axm-audit"></a>
|
|
7
|
+
<a href="https://forge.axm-protocols.io/init/"><img src="https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/axm-protocols/axm-forge-workspace/gh-pages/badges/axm-echo/axm-init.json" alt="axm-init"></a>
|
|
8
|
+
<a href="https://github.com/axm-protocols/axm-forge-workspace/actions/workflows/axm-quality.yml"><img src="https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/axm-protocols/axm-forge-workspace/gh-pages/badges/axm-echo/coverage.json" alt="Coverage"></a>
|
|
9
|
+
<img src="https://img.shields.io/badge/python-3.12%2B-blue" alt="Python 3.12+">
|
|
10
|
+
</p>
|
|
11
|
+
|
|
12
|
+
---
|
|
13
|
+
|
|
14
|
+
## Overview
|
|
15
|
+
|
|
16
|
+
Neural similarity & echo detection over code corpora (MiniLM + scikit-learn).
|
|
17
|
+
|
|
18
|
+
## Features
|
|
19
|
+
|
|
20
|
+
- **Neural by default** — the `st` (MiniLM) backend ships in the base install
|
|
21
|
+
(`torch` + `sentence-transformers`) and runs in-process; no extra to enable.
|
|
22
|
+
- **`tfidf` opt-out** — the pure-CPU `numpy` + `scikit-learn` backend stays
|
|
23
|
+
available (`--backend tfidf`) for callers that want to avoid loading torch.
|
|
24
|
+
- Built on `axm-ast` for code-corpus extraction — the corpus feeding both
|
|
25
|
+
`echo_code` (cross-package dedup) and `echo_check` (reuse retrieval).
|
|
26
|
+
|
|
27
|
+
## Installation
|
|
28
|
+
|
|
29
|
+
```bash
|
|
30
|
+
# echo is neural by default — the install ships torch + sentence-transformers.
|
|
31
|
+
uv add axm-echo
|
|
32
|
+
```
|
|
33
|
+
|
|
34
|
+
Or as a workspace dependency in `pyproject.toml`:
|
|
35
|
+
|
|
36
|
+
```toml
|
|
37
|
+
[project]
|
|
38
|
+
dependencies = ["axm-echo"]
|
|
39
|
+
|
|
40
|
+
[tool.uv.sources]
|
|
41
|
+
axm-echo = { workspace = true }
|
|
42
|
+
```
|
|
43
|
+
|
|
44
|
+
## Development
|
|
45
|
+
|
|
46
|
+
This package is part of the **axm-forge-workspace** uv workspace.
|
|
47
|
+
|
|
48
|
+
```bash
|
|
49
|
+
# Run tests for this package
|
|
50
|
+
uv run pytest --package axm-echo
|
|
51
|
+
|
|
52
|
+
# From workspace root
|
|
53
|
+
make test-axm-echo
|
|
54
|
+
```
|
|
55
|
+
|
|
56
|
+
## License
|
|
57
|
+
|
|
58
|
+
MIT — © 2026 Gabriel Jarry
|
|
@@ -0,0 +1,58 @@
|
|
|
1
|
+
# Architecture
|
|
2
|
+
|
|
3
|
+
`axm-echo` is a flat set of single-responsibility modules — no `core/` /
|
|
4
|
+
`adapters/` split, no hexagonal layering. The two `axm.tools` entry points
|
|
5
|
+
(`echo_code`, `echo_check`) orchestrate a shared
|
|
6
|
+
**corpus → embed → compare** pipeline; everything else is a leaf the tools
|
|
7
|
+
compose.
|
|
8
|
+
|
|
9
|
+
```mermaid
|
|
10
|
+
graph TD
|
|
11
|
+
subgraph "Tools (axm.tools entry points)"
|
|
12
|
+
EchoCode["EchoCodeTool · echo_code"]
|
|
13
|
+
EchoCheck["EchoCheckTool · echo_check"]
|
|
14
|
+
end
|
|
15
|
+
|
|
16
|
+
subgraph "Pipeline"
|
|
17
|
+
Corpus["corpus · extract_package / extract_monorepo"]
|
|
18
|
+
Embedding["embedding · embed / neighbors (tfidf | st)"]
|
|
19
|
+
Cluster["cluster · cross_pairs / split_pairs / cluster_pairs"]
|
|
20
|
+
Waiver["waiver · cluster_hash / acknowledged"]
|
|
21
|
+
end
|
|
22
|
+
|
|
23
|
+
subgraph "Leaves"
|
|
24
|
+
Scope["scope · load_scope (~/axm/echo.toml)"]
|
|
25
|
+
Structural["structural · jaccard_similarity (stdlib, no torch)"]
|
|
26
|
+
end
|
|
27
|
+
|
|
28
|
+
EchoCode --> Corpus
|
|
29
|
+
EchoCode --> Embedding
|
|
30
|
+
EchoCode --> Cluster
|
|
31
|
+
EchoCode --> Waiver
|
|
32
|
+
EchoCheck --> Corpus
|
|
33
|
+
EchoCheck --> Embedding
|
|
34
|
+
Corpus --> Scope
|
|
35
|
+
Corpus -->|axm-ast| Embedding
|
|
36
|
+
```
|
|
37
|
+
|
|
38
|
+
## Modules
|
|
39
|
+
|
|
40
|
+
| Module | Role |
|
|
41
|
+
|---|---|
|
|
42
|
+
| `tools` | The `echo_code` / `echo_check` `AXMTool`s (MCP + CLI + DAG node). They run the pipeline and shape the `ToolResult`. |
|
|
43
|
+
| `corpus` | Extract public symbols from a package (`extract_package`) or the whole scope (`extract_monorepo`) via `axm-ast`; each `Symbol` carries an `embed_text`. |
|
|
44
|
+
| `embedding` | The two backends behind `embed()` — `tfidf` (scikit-learn, pure CPU) and `st` (MiniLM, neural). `neighbors()` does exact cosine top-k. |
|
|
45
|
+
| `cluster` | Cross-package candidate pairs (`cross_pairs`), the v7 anti-signal split (`split_pairs`: dupes / parallel-API / boilerplate), and union-find clustering. |
|
|
46
|
+
| `waiver` | The acknowledged-cluster mechanism: a stable `cluster_hash` and the `[[tool.axm-echo.acknowledged]]` waiver lifecycle (mark / stale). |
|
|
47
|
+
| `scope` | Resolve the workspace roots to scan from `~/axm/echo.toml`, degrading to the current directory when absent. |
|
|
48
|
+
| `structural` | 100%-structural similarity over `ast.FunctionDef` bodies (`statement_set` + `jaccard_similarity`); pure stdlib, never loads torch. This is the primitive that `duplicate_tests` reuses. |
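
As a rough illustration of the `structural` row, the sketch below compares two function bodies by Jaccard similarity over normalized statement dumps, using only the standard library. The names `statement_set` and `jaccard_similarity` come from the table above, but the bodies, the `_Normalizer` helper, and the exact normalization rules are assumptions made for this example, not the package's implementation.

```python
import ast
import copy


class _Normalizer(ast.NodeTransformer):
    """Hypothetical normalizer: blank out identifiers and constants."""

    def visit_Name(self, node: ast.Name) -> ast.AST:
        # Keep the load/store context, drop the concrete identifier.
        return ast.Name(id="_", ctx=node.ctx)

    def visit_Constant(self, node: ast.Constant) -> ast.AST:
        # Every literal collapses to the same placeholder constant.
        return ast.Constant(value=None)


def statement_set(func: ast.FunctionDef) -> frozenset[str]:
    """Return the set of normalized dumps of the top-level body statements."""
    dumps = []
    for stmt in func.body:
        # Work on a copy so the caller's tree is left untouched.
        normalized = _Normalizer().visit(copy.deepcopy(stmt))
        dumps.append(ast.dump(normalized))
    return frozenset(dumps)


def jaccard_similarity(a: frozenset[str], b: frozenset[str]) -> float:
    """Intersection size over union size; two empty sets count as identical
    (a convention chosen for this sketch)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def _parse_func(source: str) -> ast.FunctionDef:
    node = ast.parse(source).body[0]
    assert isinstance(node, ast.FunctionDef)
    return node


f = _parse_func("def f(x):\n    y = x + 1\n    return y\n")
g = _parse_func("def g(a):\n    b = a + 2\n    return b\n")
h = _parse_func("def h(a):\n    print(a)\n    return a\n")

same_shape = jaccard_similarity(statement_set(f), statement_set(g))
partial = jaccard_similarity(statement_set(f), statement_set(h))
```

Here `f` and `g` differ only in names and literals, so their statement sets coincide, while `f` and `h` share just the `return` statement.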
|
|
49
|
+
|
|
50
|
+
## Design decisions
|
|
51
|
+
|
|
52
|
+
| Decision | Rationale |
|
|
53
|
+
|---|---|
|
|
54
|
+
| Neural by default (`st`/MiniLM) | Docstring similarity wants semantics; `torch` + `sentence-transformers` ship in the base install. |
|
|
55
|
+
| `tfidf` backend kept | A pure-CPU opt-out for callers that must avoid loading torch — `embed(texts, backend="tfidf")` and `--backend tfidf`. |
|
|
56
|
+
| Lazy torch import | `torch` is imported only inside the `st` backend, so the `tfidf` path stays light at runtime even though torch is installed. |
|
|
57
|
+
| Flat modules, no hexagonal split | Each module is one concern with a small public surface; the tools compose them. No abstract ports to swap. |
|
|
58
|
+
| Exact cosine (no ANN) | Corpora are monorepo-sized; brute-force matmul is exact and fast enough, with no index to maintain. |
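
The "exact cosine" decision can be sketched without any index. The version below uses plain Python lists so it runs on the standard library alone; a matrix product over unit-normalized rows would compute the same scores. The name `top_k_cosine` and its signature are illustrative assumptions and are not the package's `neighbors()` API.

```python
import math


def _unit(vec: list[float]) -> list[float]:
    """Scale a vector to unit length; zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def top_k_cosine(
    query: list[float], rows: list[list[float]], k: int
) -> list[tuple[int, float]]:
    """Score every row against the query and keep the k best (index, score) pairs."""
    q = _unit(query)
    scored = [
        (index, sum(a * b for a, b in zip(q, _unit(row))))
        for index, row in enumerate(rows)
    ]
    # Brute force: sort all scores, highest first, then truncate.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


corpus = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
best = top_k_cosine([1.0, 0.0], corpus, k=2)
```

Every row is scored on each query, which is the trade-off the table describes: no index to build or keep fresh, at the cost of work linear in the corpus size.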
|
|
@@ -0,0 +1,9 @@
|
|
|
1
|
+
# How-To Guides
|
|
2
|
+
|
|
3
|
+
Task-oriented guides for common workflows.
|
|
4
|
+
|
|
5
|
+
## Available Guides
|
|
6
|
+
|
|
7
|
+
- [Reuse check during planning with `echo_check`](reuse-check-in-planning.md) —
|
|
8
|
+
retrieve existing monorepo symbols for a ticket's intention and decide
|
|
9
|
+
reuse / extend / develop before drafting it.
|
|
@@ -0,0 +1,78 @@
|
|
|
1
|
+
# Reuse check during planning with `echo_check`
|
|
2
|
+
|
|
3
|
+
When a plan is decomposed into tickets (the `/plan-tickets` workflow), every
|
|
4
|
+
ticket whose scope is *"develop a helper / function / class that does X"*
|
|
5
|
+
risks minting a duplicate of something the monorepo already provides — a
|
|
6
|
+
fifth retry helper, a third CSV reader, another `slugify`. `echo_check`
|
|
7
|
+
turns that risk into a deliberate decision.
|
|
8
|
+
|
|
9
|
+
This guide shows how to wire `echo_check` into the planning step that
|
|
10
|
+
gathers codebase intelligence, and how to act on its result.
|
|
11
|
+
|
|
12
|
+
## When to run it
|
|
13
|
+
|
|
14
|
+
Run the reuse check **only** for tickets that introduce *reusable
|
|
15
|
+
behaviour* — a new unit worth deduplicating. Skip it for pure glue, config
|
|
16
|
+
edits, dependency bumps, or scopes that are already "wire / refactor
|
|
17
|
+
existing code": there is no new helper to deduplicate there.
|
|
18
|
+
|
|
19
|
+
## 1. Retrieve the closest existing symbols
|
|
20
|
+
|
|
21
|
+
Call `echo_check` on the **intention** — a free-form description of the
|
|
22
|
+
behaviour the ticket would build:
|
|
23
|
+
|
|
24
|
+
```python
|
|
25
|
+
from axm_echo.tools import EchoCheckTool
|
|
26
|
+
|
|
27
|
+
result = EchoCheckTool().execute(
|
|
28
|
+
    intention="resilient HTTP call with retry on transient 5xx errors",
|
|
29
|
+
)
|
|
30
|
+
candidates = result.data["candidates"]
|
|
31
|
+
```
|
|
32
|
+
|
|
33
|
+
Via the CLI or MCP the same call is `axm echo_check --intention "..."` or
|
|
34
|
+
`axm_call(name="echo_check", arguments={"intention": "..."})`.
|
|
35
|
+
|
|
36
|
+
Each candidate carries its `qualname`, `package`, `score`, full docstring
|
|
37
|
+
(`doc_full`), a location `verdict`, and a `promotable` flag:
|
|
38
|
+
|
|
39
|
+
| Field | Meaning |
|
|
40
|
+
|---|---|
|
|
41
|
+
| `verdict = "reuse_canonical"` | The hit lives in the canonical commons (`axm-ingot`) — reuse the canonical symbol directly. |
|
|
42
|
+
| `verdict = "reuse_in_place"` | A real helper exists in some package but has not been canonicalised — reuse it **in place** from `<package>`; do not mint a duplicate just because it is not in the ingot yet. |
|
|
43
|
+
| `promotable = True` | A well-documented non-ingot candidate worth canonicalising later. |
|
|
44
|
+
|
|
45
|
+
An **empty** candidate list means nothing scored above the retrieval
|
|
46
|
+
threshold — the intention is genuinely novel.
|
|
47
|
+
|
|
48
|
+
## 2. Decide reuse / extend / develop — read the docstrings, not the score
|
|
49
|
+
|
|
50
|
+
`echo_check` *retrieves and ranks*; it deliberately does **not** decide.
|
|
51
|
+
A `PARTIAL` match (similar docstring, different contract) can outrank a
|
|
52
|
+
perfect one, so never branch on the score or the verdict tag alone. Read
|
|
53
|
+
each candidate's `doc_full` and signature, compare its real contract
|
|
54
|
+
against the intention, and pick one branch:
|
|
55
|
+
|
|
56
|
+
| Decision | When | Ticket effect |
|
|
57
|
+
|---|---|---|
|
|
58
|
+
| **reuse** | A candidate already does *exactly* what the intention needs. | Rewrite the ticket to *"import / reuse `<qualname>` from `<package>` + wire it in"*; drop the implementation tasks, keep only wiring + tests. |
|
|
59
|
+
| **extend** | A candidate is the right *canonical* home but misses a parameter / mode / edge case. | Emit an **extension ticket** on `<qualname>` in `<package>`, and make the consumer ticket `blocks`-depend on it (extension lands first). |
|
|
60
|
+
| **develop** | No candidate covers the intention (empty list, or all near-misses with a different contract). | Write the "develop a helper" ticket as normal. |
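
Because the decision rests on reading docstrings rather than on the score, one option is to render the candidates into a review sheet for the planner instead of branching in code. The sketch below assumes each candidate is a mapping with the fields listed earlier (`qualname`, `package`, `score`, `doc_full`, `verdict`, `promotable`); the `review_sheet` helper and the sample candidate (including its qualified name and score) are hypothetical.

```python
from typing import TypedDict


class Candidate(TypedDict):
    qualname: str
    package: str
    score: float
    doc_full: str
    verdict: str
    promotable: bool


def review_sheet(candidates: list[Candidate]) -> str:
    """Format ranked candidates so their full docstrings can be read side by side."""
    if not candidates:
        # An empty list means nothing crossed the retrieval threshold.
        return "no candidate above threshold: develop"
    blocks = []
    for rank, cand in enumerate(candidates, start=1):
        header = (
            f"{rank}. {cand['qualname']} [{cand['package']}] "
            f"score={cand['score']:.3f} {cand['verdict']}"
        )
        if cand["promotable"]:
            header += " (promotable)"
        blocks.append(f"{header}\n{cand['doc_full']}")
    return "\n\n".join(blocks)


# Hypothetical candidate, shaped like the fields described in this guide.
sample: list[Candidate] = [
    {
        "qualname": "axm_commons.http.request_with_retry",
        "package": "axm-commons",
        "score": 0.71,
        "doc_full": "Perform an HTTP request, retrying on transient errors.",
        "verdict": "reuse_in_place",
        "promotable": True,
    }
]
sheet = review_sheet(sample)
```

The helper only presents the evidence; choosing reuse, extend, or develop still depends on comparing each docstring's contract with the intention.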
|
|
61
|
+
|
|
62
|
+
## 3. Worked example
|
|
63
|
+
|
|
64
|
+
Spec line: *"the screener needs a resilient HTTP call (retry on 5xx)."*
|
|
65
|
+
|
|
66
|
+
```python
|
|
67
|
+
EchoCheckTool().execute(
|
|
68
|
+
    intention="resilient HTTP call with retry on transient errors",
|
|
69
|
+
)
|
|
70
|
+
```
|
|
71
|
+
|
|
72
|
+
If a candidate like `request_with_retry [axm-commons]` comes back with a
|
|
73
|
+
docstring matching the contract, **do not** emit "develop a retry helper".
|
|
74
|
+
Emit a **reuse** ticket — *"reuse `request_with_retry` from `axm-commons`
|
|
75
|
+
in the screener fetch path"* — with its implementation tasks dropped.
|
|
76
|
+
|
|
77
|
+
If nothing matches (empty candidate list), the helper genuinely does not
|
|
78
|
+
exist yet: emit the develop ticket.
|
|
@@ -0,0 +1,98 @@
|
|
|
1
|
+
---
|
|
2
|
+
hide:
|
|
3
|
+
- navigation
|
|
4
|
+
- toc
|
|
5
|
+
---
|
|
6
|
+
|
|
7
|
+
# axm-echo
|
|
8
|
+
|
|
9
|
+
<p align="center">
|
|
10
|
+
<strong>Neural similarity & echo detection over code corpora (MiniLM + scikit-learn).</strong>
|
|
11
|
+
</p>
|
|
12
|
+
|
|
13
|
+
<p align="center">
|
|
14
|
+
<a href="https://github.com/axm-protocols/axm-forge-workspace/actions/workflows/ci.yml">
|
|
15
|
+
<img src="https://github.com/axm-protocols/axm-forge-workspace/actions/workflows/ci.yml/badge.svg" alt="CI" />
|
|
16
|
+
</a>
|
|
17
|
+
<a href="https://github.com/axm-protocols/axm-forge-workspace/actions/workflows/axm-quality.yml">
|
|
18
|
+
<img src="https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/axm-protocols/axm-forge-workspace/gh-pages/badges/axm-echo/axm-init.json" alt="axm-init" />
|
|
19
|
+
</a>
|
|
20
|
+
<a href="https://github.com/axm-protocols/axm-forge-workspace/actions/workflows/axm-quality.yml">
|
|
21
|
+
<img src="https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/axm-protocols/axm-forge-workspace/gh-pages/badges/axm-echo/axm-audit.json" alt="axm-audit" />
|
|
22
|
+
</a>
|
|
23
|
+
<a href="https://github.com/axm-protocols/axm-forge-workspace/actions/workflows/axm-quality.yml">
|
|
24
|
+
<img src="https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/axm-protocols/axm-forge-workspace/gh-pages/badges/axm-echo/coverage.json" alt="Coverage" />
|
|
25
|
+
</a>
|
|
26
|
+
<img src="https://img.shields.io/badge/python-3.12+-blue.svg" alt="Python 3.12+" />
|
|
27
|
+
</p>
|
|
28
|
+
|
|
29
|
+
---
|
|
30
|
+
|
|
31
|
+
## Installation
|
|
32
|
+
|
|
33
|
+
```bash
|
|
34
|
+
# echo is neural by default — the install ships torch + sentence-transformers
|
|
35
|
+
# (MiniLM) alongside numpy + scikit-learn.
|
|
36
|
+
uv add axm-echo
|
|
37
|
+
```
|
|
38
|
+
|
|
39
|
+
The neural `st` backend is the in-process default. The `tfidf` backend stays
|
|
40
|
+
pure-CPU and never loads torch, for callers that want to skip the model.
|
|
41
|
+
|
|
42
|
+
## Quick Start
|
|
43
|
+
|
|
44
|
+
```python
|
|
45
|
+
from axm_echo import embed, extract_monorepo, neighbors
|
|
46
|
+
|
|
47
|
+
# 1. Build a corpus of public symbols across the configured workspaces
|
|
48
|
+
# (driven by ~/axm/echo.toml, falling back to the current dir).
|
|
49
|
+
symbols = extract_monorepo()
|
|
50
|
+
texts = [s["embed_text"] for s in symbols]
|
|
51
|
+
|
|
52
|
+
# 2. Embed it. "st" (MiniLM) is the neural default; this example opts into "tfidf", which stays pure-CPU.
|
|
53
|
+
matrix = embed(texts, backend="tfidf")
|
|
54
|
+
|
|
55
|
+
# 3. Find the nearest neighbours of a symbol (exact cosine top-k).
|
|
56
|
+
for idx, score in neighbors(matrix[0], matrix, k=5):
|
|
57
|
+
    print(f"{score:.3f} {symbols[idx]['qualname']}")
|
|
58
|
+
```
|
|
59
|
+
|
|
60
|
+
## Features
|
|
61
|
+
|
|
62
|
+
- ✅ **`echo_code` cross-package echo detection** — the `axm echo_code` tool
|
|
63
|
+
(MCP + CLI + DAG node) clusters intent-equivalent duplicate symbols across
|
|
64
|
+
packages, with the v7 anti-signals (trivial-accessor filter, parallel-API
|
|
65
|
+
demotion, boilerplate-frequency demotion) applied
|
|
66
|
+
- ✅ **Liveable `echo_code` report** — bounded `--top-n` display (the neural
|
|
67
|
+
pass still finds them all, only the output is capped; the total stays
|
|
68
|
+
visible), `--max-cluster-size` rejection of union-find over-merges, and an
|
|
69
|
+
acknowledged-cluster **waiver** workflow (`[[tool.axm-echo.acknowledged]]`
|
|
70
|
+
in the scan-root `pyproject.toml`) that excludes intended echoes and reports
|
|
71
|
+
stale waivers to retire
|
|
72
|
+
- ✅ **`echo_check` intent retrieval** — the `axm echo_check` tool
|
|
73
|
+
(MCP + CLI + DAG node) embeds a free-form intention and returns the top-k
|
|
74
|
+
nearest monorepo symbols with their docstrings, each tagged with a location
|
|
75
|
+
verdict (reuse canonical / reuse in place / promotable); it does the
|
|
76
|
+
retrieval, leaving the reuse / extend / develop decision to the caller
|
|
77
|
+
- ✅ **Structural similarity** — `statement_set` / `jaccard_similarity`
|
|
78
|
+
(with `flatten_body` / `normalize_dump`) compare two `ast.FunctionDef`
|
|
79
|
+
bodies by Jaccard over constant/identifier-normalized statement-sets;
|
|
80
|
+
100% structural, pure stdlib, never loads torch
|
|
81
|
+
- ✅ **Two embedding backends** — `st` (MiniLM `all-MiniLM-L6-v2`, the neural
|
|
82
|
+
default) and `tfidf` (code, scikit-learn), selected by a registry
|
|
83
|
+
- ✅ **Exact neighbour search** — brute-force cosine matmul, no ANN
|
|
84
|
+
- ✅ **Lazy torch import** — `torch` + `sentence-transformers` ship in the
|
|
85
|
+
base install (neural-by-default), but torch is imported only inside the
|
|
86
|
+
`st` backend, so the `tfidf` path never loads it at runtime
|
|
87
|
+
- ✅ **axm-ast corpus extractor** — public symbols with signature +
|
|
88
|
+
docstring, `embed_text` falling back to code when undocumented
|
|
89
|
+
- ✅ **Scope loader** — `~/axm/echo.toml`, graceful degradation to the
|
|
90
|
+
current workspace
|
|
91
|
+
- ✅ **Modern Python** — 3.12+ with strict typing
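
The backend registry and lazy-import bullets above can be pictured with a small sketch. It is an assumption about the general shape, not the package's code: `embed_sketch`, `BACKENDS`, and `_embed_counts` are hypothetical names, and the light backend here is a bag-of-words stand-in rather than real TF-IDF. The one grounded detail is that the heavy import sits inside the neural function, so selecting the light backend never triggers it.

```python
from collections.abc import Callable

Matrix = list[list[float]]


def _embed_counts(texts: list[str]) -> Matrix:
    """Bag-of-words stand-in for the light backend (not real TF-IDF)."""
    vocab = sorted({token for text in texts for token in text.split()})
    return [[float(text.split().count(token)) for token in vocab] for text in texts]


def _embed_st(texts: list[str]) -> Matrix:
    """Neural backend: the heavy import happens only when this function runs."""
    from sentence_transformers import SentenceTransformer  # deferred on purpose

    model = SentenceTransformer("all-MiniLM-L6-v2")
    return model.encode(texts).tolist()


# Registry mapping a backend name to its embedding function.
BACKENDS: dict[str, Callable[[list[str]], Matrix]] = {
    "tfidf": _embed_counts,
    "st": _embed_st,
}


def embed_sketch(texts: list[str], backend: str = "st") -> Matrix:
    """Dispatch to the selected backend; unknown names fail early."""
    if backend not in BACKENDS:
        raise ValueError(f"unknown backend: {backend!r}")
    return BACKENDS[backend](texts)


light = embed_sketch(["retry http call", "http call"], backend="tfidf")
```

Calling the `tfidf` entry above never executes the `sentence_transformers` import, which is the property the lazy-import bullet describes.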
|
|
92
|
+
|
|
93
|
+
---
|
|
94
|
+
|
|
95
|
+
<div style="text-align: center; margin: 2rem 0;">
|
|
96
|
+
<a href="tutorials/getting-started/" class="md-button md-button--primary">Get Started →</a>
|
|
97
|
+
<a href="reference/cli/" class="md-button">Reference</a>
|
|
98
|
+
</div>
|
|
@@ -0,0 +1,136 @@
|
|
|
1
|
+
# CLI Reference
|
|
2
|
+
|
|
3
|
+
## Commands
|
|
4
|
+
|
|
5
|
+
### `axm echo_code`
|
|
6
|
+
|
|
7
|
+
Detect cross-package code **echoes** — intent-equivalent duplicate symbols
|
|
8
|
+
across the configured monorepo. The tool walks the scope, embeds every public
|
|
9
|
+
documented symbol, finds cross-package pairs whose docstrings are semantically
|
|
10
|
+
close, applies the v7 anti-signals, and prints the surviving **clusters** plus
|
|
11
|
+
the demoted parallel-API / boilerplate buckets.
|
|
12
|
+
|
|
13
|
+
The command is auto-registered from the `axm.tools` entry point, so the same
|
|
14
|
+
implementation is reachable as an MCP tool and a DAG `tool_node` too.
|
|
15
|
+
|
|
16
|
+
```bash
|
|
17
|
+
# Cluster echoes across the corpus (~/axm/echo.toml scope, or the cwd).
|
|
18
|
+
axm echo_code
|
|
19
|
+
|
|
20
|
+
# The default backend is neural "st" (MiniLM, in-process). Opt into the
|
|
21
|
+
# pure-CPU tfidf backend to avoid loading torch.
|
|
22
|
+
axm echo_code --backend tfidf
|
|
23
|
+
|
|
24
|
+
# Raise the cosine floor for a candidate pair (default 0.55).
|
|
25
|
+
axm echo_code --threshold 0.7
|
|
26
|
+
|
|
27
|
+
# Show only the 10 nearest actionable clusters (the total stays in the header).
|
|
28
|
+
axm echo_code --top-n 10
|
|
29
|
+
|
|
30
|
+
# Tighten the over-merge guard (default 50): drop any component above 20.
|
|
31
|
+
axm echo_code --max-cluster-size 20
|
|
32
|
+
```
|
|
33
|
+
|
|
34
|
+
| Option | Default | Description |
|
|
35
|
+
| -- | -- | -- |
|
|
36
|
+
| `--backend` | `st` | Embedding backend: `st` (neural MiniLM, the in-process default) or `tfidf` (pure CPU, no torch). |
|
|
37
|
+
| `--threshold` | `0.55` | Minimum cosine similarity for a candidate pair. |
|
|
38
|
+
| `--top-n` | `30` | Bound the report to the N nearest *non-acknowledged* clusters. The neural pass still finds them all — only the display is bounded; the total count stays in the header. |
|
|
39
|
+
| `--max-cluster-size` | `50` | Reject any connected component larger than this as a union-find over-merge (a structural-conformity signal, not a duplicate echo; a genuine duplicate has 2-5 members). |
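
To make the `--max-cluster-size` guard concrete, here is a minimal union-find sketch under stated assumptions: candidate pairs arrive as tuples of symbol names, and any connected component above the limit is dropped. The function name `cluster_pairs_sketch` and its signature are hypothetical and may differ from the package's own clustering code.

```python
from collections import defaultdict


def cluster_pairs_sketch(
    pairs: list[tuple[str, str]], max_cluster_size: int = 50
) -> list[list[str]]:
    """Merge pairs into connected components and reject oversized ones."""
    parent: dict[str, str] = {}

    def find(node: str) -> str:
        parent.setdefault(node, node)
        while parent[node] != node:
            # Path halving keeps the trees shallow.
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for left, right in pairs:
        root_left, root_right = find(left), find(right)
        if root_left != root_right:
            parent[root_right] = root_left

    groups: dict[str, list[str]] = defaultdict(list)
    for node in parent:
        groups[find(node)].append(node)

    # Components above the limit are treated as over-merges and dropped.
    return [sorted(members) for members in groups.values()
            if len(members) <= max_cluster_size]


pairs = [("a.retry", "b.retry"), ("b.retry", "c.retry"), ("x.slug", "y.slug")]
all_clusters = cluster_pairs_sketch(pairs)
tight_clusters = cluster_pairs_sketch(pairs, max_cluster_size=2)
```

With the limit lowered to 2, the three-member chain is rejected and only the two-member cluster survives, mirroring how a tighter guard trims transitive over-merges.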
|
|
40
|
+
|
|
41
|
+
Output names the tool, the live/shown/actionable cluster counts, the corpus
|
|
42
|
+
size, and the demoted buckets, then lists each shown cluster's members with
|
|
43
|
+
their package and docstring first line:
|
|
44
|
+
|
|
45
|
+
```text
|
|
46
|
+
echo_code | 8 clusters, 3 shown (8 actionable) | corpus 16 symbols | 0 parallel-API · 0 boilerplate (demoted)
|
|
47
|
+
|
|
48
|
+
cluster 1 sim=1.000 (2 symbols)
|
|
49
|
+
axm_commons.errors.RateLimitError [axm-commons] “Raised when the upstream API rate limit has been exceeded.”
|
|
50
|
+
axm_bib.errors.RateLimitError [axm-bib] “Raised when the upstream API rate limit has been exceeded.”
|
|
51
|
+
```
|
|
52
|
+
|
|
53
|
+
#### Acknowledging a cluster (waiver)
|
|
54
|
+
|
|
55
|
+
A genuine cross-package echo that is *intended* (a parallel API, a deliberate
|
|
56
|
+
wrapper) is noise on every run. Acknowledge it in the **scan-root** `pyproject.toml`
|
|
57
|
+
(the first workspace root in `~/axm/echo.toml`) so it drops out of the
|
|
58
|
+
actionable top-N. Each entry is a 12-hex-digit `hash` (the `cluster_hash` reported in the tool's
|
|
59
|
+
`data.clusters[*].cluster_hash`) plus a non-empty `reason`:
|
|
60
|
+
|
|
61
|
+
```toml
|
|
62
|
+
[[tool.axm-echo.acknowledged]]
|
|
63
|
+
hash = "ca29d81fb73c"
|
|
64
|
+
reason = "parallel API, intended cross-package duplication"
|
|
65
|
+
```
|
|
66
|
+
|
|
67
|
+
An acknowledged *live* cluster is marked `acknowledged` and excluded from the
|
|
68
|
+
top-N and the `actionable_count`. The mechanism is self-cleaning: a waiver whose
|
|
69
|
+
hash no longer matches any live cluster is reported under
|
|
70
|
+
`data.stale_acknowledged` ("this waiver no longer serves a purpose, retire it")
|
|
71
|
+
— informative, never blocking. A malformed entry (bad hash, empty reason) is
|
|
72
|
+
rejected gracefully into `data.acknowledged_errors`; the run never crashes.
|
|
73
|
+
|
|
74
|
+
### `axm echo_check`
|
|
75
|
+
|
|
76
|
+
Retrieve the public symbols closest to a free-form **intention**, ranked by
|
|
77
|
+
semantic similarity across the whole monorepo. Before writing a new helper,
|
|
78
|
+
ask `echo_check` what already exists: it embeds the intention, returns the
|
|
79
|
+
top-k nearest documented symbols with their docstrings, and tags each with a
|
|
80
|
+
location **verdict** so you know whether to reuse the canonical symbol, reuse
|
|
81
|
+
one in place, or promote it.
|
|
82
|
+
|
|
83
|
+
The verdict is a *location* tag, not a decision: a high score means "this is
|
|
84
|
+
the closest existing promise", never "use this". The reuse / extend / develop
|
|
85
|
+
call is left to the calling agent — a partial match may legitimately score
|
|
86
|
+
above an exact one.
|
|
87
|
+
|
|
88
|
+
Like `echo_code`, the command is auto-registered from the `axm.tools` entry
|
|
89
|
+
point, so the same implementation is reachable as an MCP tool and a DAG
|
|
90
|
+
`tool_node` too.
|
|
91
|
+
|
|
92
|
+
```bash
|
|
93
|
+
# Retrieve the closest existing symbols for an intention.
|
|
94
|
+
axm echo_check --intention "HTTP request with retry and backoff"
|
|
95
|
+
|
|
96
|
+
# The default backend is neural "st" (MiniLM, in-process). Opt into the
|
|
97
|
+
# pure-CPU tfidf backend to avoid loading torch.
|
|
98
|
+
axm echo_check --intention "slugify a string" --backend tfidf
|
|
99
|
+
|
|
100
|
+
# Raise the retrieval floor / cap the number of candidates.
|
|
101
|
+
axm echo_check --intention "parse a CSV file" --threshold 0.5 --k 3
|
|
102
|
+
```
|
|
103
|
+
|
|
104
|
+
| Option | Default | Description |
|
|
105
|
+
| -- | -- | -- |
|
|
106
|
+
| `--intention` | `""` | Free-form description of the behaviour to implement. |
|
|
107
|
+
| `--backend` | `st` | Embedding backend: `st` (neural MiniLM, the in-process default) or `tfidf` (pure CPU, no torch). |
|
|
108
|
+
| `--k` | `10` | Maximum number of candidates to return. |
|
|
109
|
+
| `--threshold` | `0.30` | Minimum cosine similarity for a candidate to be retrieved. Below it the candidate is dropped, so a novel intention returns an empty list rather than a spurious match. |
|
|
110
|
+
|
|
111
|
+
The verdict is set by the candidate's package: a hit in `axm-ingot` is
|
|
112
|
+
`reuse_canonical`; anything else is `reuse_in_place` (with a `promotable→ingot`
|
|
113
|
+
hint when the symbol is documented well enough to be worth canonicalising).
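
The package-based verdict rule can be written down in a few lines. The mapping from `axm-ingot` to `reuse_canonical` and everything else to `reuse_in_place` follows the paragraph above; the `is_promotable` heuristic and its `min_doc_chars` cutoff are purely hypothetical, since the text does not say how "documented well enough" is measured.

```python
CANONICAL_PACKAGE = "axm-ingot"


def location_verdict(package: str) -> str:
    """Tag a candidate by where it lives, not by how well it matches."""
    return "reuse_canonical" if package == CANONICAL_PACKAGE else "reuse_in_place"


def is_promotable(package: str, doc_full: str, min_doc_chars: int = 80) -> bool:
    """Hypothetical heuristic: a non-canonical symbol with a long enough docstring."""
    return package != CANONICAL_PACKAGE and len(doc_full.strip()) >= min_doc_chars


verdicts = [location_verdict(pkg) for pkg in ("axm-ingot", "axm-bib")]
```

Note that neither function looks at the similarity score, in keeping with the verdict being a location tag rather than a decision.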
|
|
114
|
+
|
|
115
|
+
Output names the tool, the intention, the candidate count, and the corpus
|
|
116
|
+
size, then lists each ranked candidate with its package, similarity, verdict
|
|
117
|
+
and docstring first line:
|
|
118
|
+
|
|
119
|
+
```text
|
|
120
|
+
echo_check | “HTTP request with retry and backoff” | 1 candidates | corpus 2 symbols
|
|
121
|
+
|
|
122
|
+
1. axm_ingot.net.fetch_url [axm-ingot] sim=0.762 reuse_canonical
|
|
123
|
+
"Perform an HTTP request, retrying with backoff on transient errors."
|
|
124
|
+
```
|
|
125
|
+
|
|
126
|
+
When nothing crosses the threshold the report says so explicitly, rather than
|
|
127
|
+
surfacing a weak false match:
|
|
128
|
+
|
|
129
|
+
```text
|
|
130
|
+
echo_check | “render a mermaid sequence diagram” | 0 candidates | corpus 2 symbols
|
|
131
|
+
(no candidate above threshold — likely novel)
|
|
132
|
+
```
|
|
133
|
+
|
|
134
|
+
## Python API
|
|
135
|
+
|
|
136
|
+
Auto-generated API reference is available under [Python API](api/).
|