sal-torch 0.1.0__tar.gz
- sal_torch-0.1.0/.github/workflows/ci.yml +18 -0
- sal_torch-0.1.0/.github/workflows/publish.yml +32 -0
- sal_torch-0.1.0/.gitignore +12 -0
- sal_torch-0.1.0/CHANGELOG.md +10 -0
- sal_torch-0.1.0/CLAUDE.md +142 -0
- sal_torch-0.1.0/PKG-INFO +78 -0
- sal_torch-0.1.0/README.md +40 -0
- sal_torch-0.1.0/SAL_ROADMAP_V3.md +440 -0
- sal_torch-0.1.0/SAL_TORCH_DESIGN.md +574 -0
- sal_torch-0.1.0/docs/architecture_support.md +83 -0
- sal_torch-0.1.0/docs/getting_started.md +86 -0
- sal_torch-0.1.0/docs/how_sal_works.md +74 -0
- sal_torch-0.1.0/docs/licensing.md +56 -0
- sal_torch-0.1.0/examples/compare_with_without_sal.py +138 -0
- sal_torch-0.1.0/examples/full_control.py +102 -0
- sal_torch-0.1.0/examples/quickstart.py +87 -0
- sal_torch-0.1.0/examples/standalone_fi.py +64 -0
- sal_torch-0.1.0/pyproject.toml +34 -0
- sal_torch-0.1.0/sal/__init__.py +40 -0
- sal_torch-0.1.0/sal/arch_support.py +157 -0
- sal_torch-0.1.0/sal/callback.py +48 -0
- sal_torch-0.1.0/sal/config.py +55 -0
- sal_torch-0.1.0/sal/fi.py +201 -0
- sal_torch-0.1.0/sal/license.py +48 -0
- sal_torch-0.1.0/sal/masker.py +185 -0
- sal_torch-0.1.0/sal/report.py +10 -0
- sal_torch-0.1.0/sal/scanner.py +71 -0
- sal_torch-0.1.0/sal/trainer.py +49 -0
- sal_torch-0.1.0/sal-torch-0.1.0-dev.tar.gz +0 -0
- sal_torch-0.1.0/scripts/modal_integration_test.py +76 -0
- sal_torch-0.1.0/scripts/modal_run_examples.py +54 -0
- sal_torch-0.1.0/scripts/validate_published_results.py +203 -0
- sal_torch-0.1.0/tests/conftest.py +101 -0
- sal_torch-0.1.0/tests/test_arch_integration.py +116 -0
- sal_torch-0.1.0/tests/test_config.py +35 -0
- sal_torch-0.1.0/tests/test_fi.py +77 -0
- sal_torch-0.1.0/tests/test_masker.py +90 -0
- sal_torch-0.1.0/tests/test_training_integration.py +138 -0
sal_torch-0.1.0/.github/workflows/ci.yml
ADDED
@@ -0,0 +1,18 @@
+name: ci
+
+on:
+  push:
+    branches: [main]
+  pull_request:
+    branches: [main]
+
+jobs:
+  test:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - uses: actions/setup-python@v5
+        with:
+          python-version: "3.12"
+      - run: pip install -e ".[dev]"
+      - run: pytest tests/ -v
sal_torch-0.1.0/.github/workflows/publish.yml
ADDED
@@ -0,0 +1,32 @@
+name: publish
+
+on:
+  push:
+    tags:
+      - "v*"
+
+jobs:
+  test:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - uses: actions/setup-python@v5
+        with:
+          python-version: "3.12"
+      - run: pip install -e ".[dev]"
+      - run: pytest tests/ -v
+
+  publish:
+    needs: test
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - uses: actions/setup-python@v5
+        with:
+          python-version: "3.12"
+      - run: pip install build twine
+      - run: python -m build
+      - run: twine upload dist/*
+        env:
+          TWINE_USERNAME: __token__
+          TWINE_PASSWORD: ${{ secrets.CE_Publish }}
sal_torch-0.1.0/CHANGELOG.md
ADDED
@@ -0,0 +1,10 @@
+# Changelog
+
+## 0.1.0-dev (2026-06-25)
+
+- Initial scaffold
+- Core SAL: HeadMasker, SALConfig, SALCallback, SALTrainer
+- FI: activation graph extraction, Fragility Index, layer classification
+- FIScanner: one-shot structural analysis
+- Architecture auto-detection: 12 architectures
+- License system: Ed25519 offline verification
sal_torch-0.1.0/CLAUDE.md
ADDED
@@ -0,0 +1,142 @@
+# CLAUDE.md — sal-torch
+
+Context for future sessions working on this package.
+
+## What this is
+
+`sal-torch` is a **commercial** PyTorch package (BSL 1.1 license) that makes any
+transformer **compression-resilient** through training-time head masking. It is
+a **product**, not a research repo: clean API, clear errors, no research
+artifacts.
+
+Two independent, composable components:
+
+- **SAL (Structurally Adaptive Learning)** — training-time random head masking
+  that forces functional redistribution. *Perturbs.*
+- **FI (Fragility Index)** — a post-hoc structural fragility diagnostic.
+  *Measures.*
+
+SAL and FI are **separate**. A user can use either without the other.
+
+## Cardinal rules (do not violate)
+
+1. **RANDOM is the default method.** Zero overhead, best-or-equal accuracy at
+   validated scales (≤124M). Structured selection (FI/λ₂/liquefaction) all
+   *underperform* random in ablations and are not shipped as defaults.
+2. **SAL and FI are separate.** SAL perturbs, FI measures. Keep them decoupled.
+3. **Do NOT expose spectral-framework internals** in code, docstrings, or error
+   messages: no λ₂ / algebraic connectivity / Fiedler value, no TCGE, no
+   simplicial hierarchy, no Laplacian theory. FI is a **"fragility score"**,
+   full stop.
+4. **All unit tests must pass on CPU** with the tiny model fixture
+   (4 layers × 8 heads × 64 hidden). No GPU required for the unit suite.
+5. The **scaffold API is the target**; the reference research code
+   (`H:\Structural Awareness Loss\`) is the **source of truth for the
+   mechanism**. Keep the clean API, use the validated implementation details.
+6. Treat `pyproject.toml`, `LICENSE`, `README.md` as off-limits unless the task
+   explicitly says otherwise. Local only — do not push to any remote.
+
+## The validated mechanism (source of truth: reference research code)
+
+These details were reconciled against the canonical reference implementation
+(`sal_v5_extra.py`, the `random` branch — the validated default).
+
+### Head masking (`sal/masker.py`)
+
+- **Forward PRE-hook on the attention output projection** (`o_proj` /
+  `out_proj` / `c_proj` / `dense`). The hook zeros per-head slices of the
+  projection's **INPUT** — the concatenated per-head attention outputs, the
+  only place where feature dims map cleanly onto heads. **Never** zero the
+  projection *output* (post-projection features are fully mixed; zeroing them
+  does not zero heads — this was a bug in the original scaffold).
+- **Pruned heads accumulate.** During the prune window, randomly chosen heads
+  are deactivated progressively and **stay** deactivated. This progressive
+  structural damage is the causal mechanism — the model self-reorganizes to
+  operate without the removed heads.
+- After the window, the pruned set is **held** through end of training (not
+  restored), so adaptation is baked into the weights.
+- `schedule` controls how the pruned count grows: `"random"` (default,
+  window-proportional ramp), `"progressive"` (one head per `prune_interval`),
+  `"burst"` (full target at window open). All accumulate; all use random
+  selection.
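Reviewer sketch for the head-masking bullets above: a forward pre-hook that zeros per-head slices of an output projection's input. This is not the code in `sal/masker.py`. The helper name `make_head_mask_pre_hook`, the toy `nn.Linear` standing in for `o_proj`, and the sizes are assumptions made for illustration; only `register_forward_pre_hook` is the actual PyTorch API.

```python
import torch
from torch import nn


def make_head_mask_pre_hook(pruned_heads, num_heads):
    """Build a forward pre-hook that zeros the input slices of pruned heads.

    Hypothetical helper: assumes the hooked module receives one positional
    tensor whose last dim is the concatenation of per-head outputs.
    """
    def pre_hook(module, args):
        (concat_heads,) = args
        head_dim = concat_heads.shape[-1] // num_heads
        masked = concat_heads.clone()  # do not modify the caller's tensor
        for head in pruned_heads:
            masked[..., head * head_dim:(head + 1) * head_dim] = 0.0
        return (masked,)  # the returned tuple replaces the module's inputs
    return pre_hook


torch.manual_seed(0)
num_heads, head_dim = 8, 8
out_proj = nn.Linear(num_heads * head_dim, num_heads * head_dim)  # stand-in for o_proj
pruned_heads = {1, 5}

handle = out_proj.register_forward_pre_hook(
    make_head_mask_pre_hook(pruned_heads, num_heads)
)
concat = torch.randn(2, 3, num_heads * head_dim)  # [batch, seq, heads * head_dim]
hooked_out = out_proj(concat)
handle.remove()

# Reference: zero the same per-head slices by hand, then project without the hook.
manual = concat.clone()
for head in pruned_heads:
    manual[..., head * head_dim:(head + 1) * head_dim] = 0.0
manual_out = out_proj(manual)
```

Masking on the input side works because each pruned head owns a contiguous slice of the concatenated tensor. After the projection every output feature mixes all heads, so no output slice corresponds to a single head.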
+
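Reviewer sketch of the accumulate-and-hold behaviour and the three `schedule` values described above. The function names, the window parameters, and the exact rounding are assumptions; the diff shown here does not include `sal/masker.py`, so treat this as one plausible reading and not the shipped logic.

```python
import random


def target_pruned_count(step, window_start, window_end, target,
                        schedule="random", prune_interval=100):
    """How many heads should be pruned by `step` (hypothetical helper).

    Assumes window_end > window_start. After the window the count stays at
    whatever was reached at window_end, i.e. the pruned set is held.
    """
    if step < window_start:
        return 0
    elapsed = min(step, window_end) - window_start
    if schedule == "burst":
        count = target                       # full target at window open
    elif schedule == "progressive":
        count = elapsed // prune_interval    # one more head per interval
    elif schedule == "random":
        count = target * elapsed // (window_end - window_start)  # linear ramp
    else:
        raise ValueError(f"unknown schedule: {schedule!r}")
    return min(count, target)


def grow_pruned_set(pruned, all_heads, count, rng):
    """Add randomly chosen heads until `count` are pruned; never restore any."""
    candidates = sorted(h for h in all_heads if h not in pruned)
    need = max(0, count - len(pruned))
    pruned.update(rng.sample(candidates, min(need, len(candidates))))
    return pruned


# Toy run: 4 layers x 8 heads, prune 8 heads over steps 100..200, train to 300.
all_heads = [(layer, head) for layer in range(4) for head in range(8)]
rng = random.Random(0)
pruned = set()
sizes = []
for step in range(300):
    count = target_pruned_count(step, 100, 200, target=8)
    grow_pruned_set(pruned, all_heads, count, rng)
    sizes.append(len(pruned))
```

In this run `sizes` is zero before step 100, ramps up inside the window, and stays at 8 afterwards, since nothing ever leaves `pruned`.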
+### Fragility Index (`sal/fi.py`)
+
+- **FI = fraction of edges with zero triangle support** = `#{edges (i,j) with
+  (A·A)[i,j] == 0} / #edges`. Range [0,1]. Low = triangulated/robust, high =
+  fragile. This is the validated FI — **NOT** `1 − λ₂/λ_max` (the original
+  scaffold used the Laplacian Fiedler value; that is a separate, underperforming
+  research method and is forbidden by cardinal rule 3).
+- Graph build: capture the **input** to each output projection via a pre-hook;
+  per-head signature = mask-weighted seq-mean per sentence, flattened over
+  `[sentences, head_dim]`, float64; Pearson similarity between heads; binary
+  adjacency at a **fixed edge density (default 0.10 → 90th-percentile |sim|
+  threshold)**.
+- Layer classification (IMMUNE / BUFFER / CRITICAL) = relative FI change when a
+  layer's heads are removed (defaults: <1% immune, ≥5% critical). This is a
+  product API layered on top of the validated FI.
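Reviewer sketch of the FI definition and the fixed-density graph build described above, in plain Python so the arithmetic is easy to follow. The function names, the top-k edge selection used in place of an exact percentile, and the value returned for an edgeless graph are assumptions; `sal/fi.py` itself is not part of the hunks shown here.

```python
from itertools import combinations


def adjacency_at_density(sim, density=0.10):
    """Binary adjacency keeping the strongest |sim| pairs (hypothetical helper).

    Keeps round(density * n_pairs) edges, which approximates thresholding at
    the (1 - density) percentile of |sim| over off-diagonal pairs.
    """
    n = len(sim)
    pairs = sorted(combinations(range(n), 2),
                   key=lambda p: abs(sim[p[0]][p[1]]), reverse=True)
    n_edges = max(1, round(density * len(pairs)))
    adj = [[0] * n for _ in range(n)]
    for i, j in pairs[:n_edges]:
        adj[i][j] = adj[j][i] = 1
    return adj


def fragility_index(adj):
    """Fraction of edges (i, j) with (A.A)[i][j] == 0, i.e. in no triangle."""
    n = len(adj)
    edges = [(i, j) for i, j in combinations(range(n), 2) if adj[i][j]]
    if not edges:
        return 0.0  # assumption: an edgeless graph is reported as 0.0
    unsupported = sum(
        1 for i, j in edges
        if sum(adj[i][k] * adj[k][j] for k in range(n)) == 0  # common neighbours
    )
    return unsupported / len(edges)


triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]   # every edge closes a triangle
path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]       # no edge closes a triangle
pendant = [[0, 1, 1, 0], [1, 0, 1, 0], [1, 1, 0, 1], [0, 0, 1, 0]]

# Toy head-similarity matrix; the sign is ignored because |sim| is thresholded.
sim = [
    [1.0, 0.9, -0.8, 0.1],
    [0.9, 1.0, 0.7, 0.2],
    [-0.8, 0.7, 1.0, 0.3],
    [0.1, 0.2, 0.3, 1.0],
]
fi_from_sim = fragility_index(adjacency_at_density(sim, density=0.5))
```

The entry `(A·A)[i][j]` counts the common neighbours of `i` and `j`, so an edge with a zero entry sits in no triangle. The pendant graph has four edges and one of them is unsupported, which gives an FI of 0.25.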
+
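Reviewer sketch of the IMMUNE / BUFFER / CRITICAL thresholds quoted above. The function name, the use of the absolute relative change, and the handling of a zero baseline are assumptions; only the 1% and 5% defaults come from the text.

```python
def classify_layer(fi_full, fi_without_layer,
                   immune_below=0.01, critical_from=0.05):
    """Label a layer by the relative FI change when its heads are removed.

    Hypothetical helper: uses |delta| / baseline as the relative change.
    """
    if fi_full == 0:
        # Assumption: with a zero baseline, any change at all counts as critical.
        return "IMMUNE" if fi_without_layer == 0 else "CRITICAL"
    rel_change = abs(fi_without_layer - fi_full) / fi_full
    if rel_change < immune_below:
        return "IMMUNE"
    if rel_change >= critical_from:
        return "CRITICAL"
    return "BUFFER"


# Baseline FI 0.50; FI after removing each layer's heads (made-up numbers).
baseline = 0.50
ablated = {"layer_0": 0.502, "layer_1": 0.515, "layer_2": 0.60}
labels = {name: classify_layer(baseline, fi) for name, fi in ablated.items()}
```

Here the relative changes are 0.4%, 3%, and 20%, so the three layers land in IMMUNE, BUFFER, and CRITICAL respectively.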
+## Package layout
+
+```
+sal/
+  __init__.py      version, set_license(), license_info()
+  config.py        SALConfig (+ .auto(model)) — clean API target
+  masker.py        HeadMasker — pre-hook, accumulation, hold-after-window
+  callback.py      SALCallback (HF Trainer integration)
+  trainer.py       SALTrainer (standalone PyTorch loop)
+  fi.py            compute_fi (triangle fragility), extract_activation_graph, classify_layers
+  scanner.py       FIScanner, FIMonitor
+  arch_support.py  detect_architecture() — registry of supported archs
+  license.py       Ed25519 offline license (signature verify still a stub)
+  report.py        compliance report (stub — Phase 5)
+tests/             CPU-only unit tests + conftest tiny model fixture
+```
+
+Planned-but-not-yet-implemented (per design doc): `plasticity.py`,
+`guard.py`, `ExpertMasker`, PDF reports.
+
+## Development phase status
+
+- **Phase 1 (Reconciliation + Core SAL): DONE.** masker.py and fi.py aligned to
+  the validated mechanism; 26 unit tests pass on CPU.
+- **Phase 2 (Architecture Support): DONE.** Module/projection finding centralized
+  in `arch_support.py` (relative to `model.base_model`, so masker + FI hook the
+  same modules). Validated on real models — DistilBERT, GPT-2, ViT, BERT — both
+  on CPU and on a Modal T4 GPU. SAL training pipeline (SALConfig.auto -> SALCallback
+  -> HF Trainer -> FIScanner) verified end-to-end.
+- Next: Phase 3 (FIScanner/FIMonitor hardening), Phase 4 (PlasticityScanner),
+  Phase 5 (license signing + reports).
+
+## Integration tests
+
+CPU unit suite stays at **26 passed**; integration tests are marked
+`@pytest.mark.integration` and **skipped by default** (need network + model
+downloads):
+```
+python -m pytest                              # 26 unit tests (CPU), integration skipped
+python -m pytest --run-integration            # + real-model arch + training tests
+modal run scripts/modal_integration_test.py   # same logic on a Modal T4 GPU
+```
+The Modal image needs `pytest` + `accelerate>=1.1.0` (Trainer dep). On
+Windows, run Modal with `PYTHONUTF8=1` so the CLI can print its build glyphs.
+Transformers 5.x silently ignores `head_mask` — confirming why SAL hooks the
+projection directly.
+
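Reviewer sketch of how a `--run-integration` flag and a skipped-by-default `integration` marker are commonly wired up in a `conftest.py`. It follows the usual pytest hook pattern and is not the content of this package's `tests/conftest.py`, whose hunk is not shown above; the help text and skip reason are invented for illustration.

```python
import pytest


def pytest_addoption(parser):
    # Registers the opt-in flag used as `python -m pytest --run-integration`.
    parser.addoption(
        "--run-integration",
        action="store_true",
        default=False,
        help="also run tests marked @pytest.mark.integration",
    )


def pytest_configure(config):
    # Declares the marker so pytest does not warn about an unknown mark.
    config.addinivalue_line(
        "markers", "integration: needs network access and model downloads"
    )


def pytest_collection_modifyitems(config, items):
    # Without the flag, attach a skip marker to every integration test.
    if config.getoption("--run-integration"):
        return
    skip_integration = pytest.mark.skip(reason="needs --run-integration")
    for item in items:
        if "integration" in item.keywords:
            item.add_marker(skip_integration)
```

With hooks like these, a plain `python -m pytest` run reports integration tests as skipped instead of silently omitting them, which keeps the unit-test count stable.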
+## Reference docs
+
+- `SAL_TORCH_DESIGN.md` — architecture, target API, package structure, roadmap.
+  NOTE: §D5 mentions λ₂; that is superseded by cardinal rule 3 + the reference —
+  FI is the triangle-fragility score, not λ₂.
+- `SAL_ROADMAP_V3.md` — ablation results, validated claims (random > structured
+  selection; gains come from progressive training-time sparsification).
+- `H:\Structural Awareness Loss\` — private research reference. Canonical
+  mechanism: `sal_v5_extra.py` (`random` branch). Not shipped.
+
+## Testing
+
+```
+python -m pytest -q   # CPU only, ~30s, must be all-green
+```
+Fixtures in `tests/conftest.py`: `tiny_model` (custom 4×8×64 GPT-2-like),
+`tiny_config`, `probe_data`.
sal_torch-0.1.0/PKG-INFO
ADDED
@@ -0,0 +1,78 @@
+Metadata-Version: 2.4
+Name: sal-torch
+Version: 0.1.0
+Summary: Structurally Adaptive Learning — training-time sparsification for robust neural networks
+Author-email: Cognitive Engineering <contact@cognitive-engineering.dev>
+License-Expression: LicenseRef-BSL-1.1
+Keywords: compression,pruning,pytorch,sparsification,transformers
+Classifier: Development Status :: 3 - Alpha
+Classifier: Intended Audience :: Science/Research
+Classifier: Programming Language :: Python :: 3
+Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
+Requires-Python: >=3.10
+Requires-Dist: numpy
+Requires-Dist: scipy
+Requires-Dist: torch>=2.1
+Provides-Extra: all
+Requires-Dist: fpdf2; extra == 'all'
+Requires-Dist: matplotlib; extra == 'all'
+Requires-Dist: peft>=0.8; extra == 'all'
+Requires-Dist: pynacl>=1.5.0; extra == 'all'
+Requires-Dist: pytest-cov; extra == 'all'
+Requires-Dist: pytest>=7.0; extra == 'all'
+Requires-Dist: ruff; extra == 'all'
+Requires-Dist: transformers>=4.38; extra == 'all'
+Provides-Extra: crypto
+Requires-Dist: pynacl>=1.5.0; extra == 'crypto'
+Provides-Extra: dev
+Requires-Dist: pytest-cov; extra == 'dev'
+Requires-Dist: pytest>=7.0; extra == 'dev'
+Requires-Dist: ruff; extra == 'dev'
+Provides-Extra: hf
+Requires-Dist: peft>=0.8; extra == 'hf'
+Requires-Dist: transformers>=4.38; extra == 'hf'
+Provides-Extra: reports
+Requires-Dist: fpdf2; extra == 'reports'
+Requires-Dist: matplotlib; extra == 'reports'
+Description-Content-Type: text/markdown
+
+# sal-torch
+
+
+
+**Structurally Adaptive Learning for PyTorch**
+
+Training-time sparsification that makes neural networks structurally resilient to compression.
+
+## Install
+
+```bash
+pip install sal-torch          # core
+pip install sal-torch[hf]      # + HuggingFace Trainer
+pip install sal-torch[all]     # everything
+```
+
+```python
+from sal import SALConfig, SALCallback
+
+config = SALConfig.auto(model)
+trainer = Trainer(model=model, callbacks=[SALCallback(config)])
+trainer.train()
+```
+
+Three lines. Any transformer. Compression-resilient.
+
+## Examples
+
+- [`examples/quickstart.py`](examples/quickstart.py) — 3-line SAL training on DistilBERT
+- [`examples/standalone_fi.py`](examples/standalone_fi.py) — Fragility Index scan, no training
+- [`examples/full_control.py`](examples/full_control.py) — manual config + standalone trainer
+- [`examples/compare_with_without_sal.py`](examples/compare_with_without_sal.py) — SAL vs. baseline under compression
+
+New here? Start with [docs/getting_started.md](docs/getting_started.md).
+
+## License
+
+BSL 1.1 — free for research and evaluation. Commercial production requires a license.
+
+Built by [Cognitive Engineering](https://cognitive-engineering.dev) in Switzerland.
sal_torch-0.1.0/README.md
ADDED
@@ -0,0 +1,40 @@
+# sal-torch
+
+
+
+**Structurally Adaptive Learning for PyTorch**
+
+Training-time sparsification that makes neural networks structurally resilient to compression.
+
+## Install
+
+```bash
+pip install sal-torch          # core
+pip install sal-torch[hf]      # + HuggingFace Trainer
+pip install sal-torch[all]     # everything
+```
+
+```python
+from sal import SALConfig, SALCallback
+
+config = SALConfig.auto(model)
+trainer = Trainer(model=model, callbacks=[SALCallback(config)])
+trainer.train()
+```
+
+Three lines. Any transformer. Compression-resilient.
+
+## Examples
+
+- [`examples/quickstart.py`](examples/quickstart.py) — 3-line SAL training on DistilBERT
+- [`examples/standalone_fi.py`](examples/standalone_fi.py) — Fragility Index scan, no training
+- [`examples/full_control.py`](examples/full_control.py) — manual config + standalone trainer
+- [`examples/compare_with_without_sal.py`](examples/compare_with_without_sal.py) — SAL vs. baseline under compression
+
+New here? Start with [docs/getting_started.md](docs/getting_started.md).
+
+## License
+
+BSL 1.1 — free for research and evaluation. Commercial production requires a license.
+
+Built by [Cognitive Engineering](https://cognitive-engineering.dev) in Switzerland.