noethersolve-0.4.0.tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Metadata-Version: 2.4
Name: noethersolve
Version: 0.4.0
Summary: Automated scientific discovery: use LLM knowledge gaps as a compass to find underexplored science.
License: MIT
Keywords: autoresearch,scientific-discovery,llm,conservation-laws,adapters
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: numpy>=1.24
Requires-Dist: scipy>=1.11
Requires-Dist: pyyaml>=6.0
Provides-Extra: mlx
Requires-Dist: mlx>=0.18; extra == "mlx"
Requires-Dist: mlx-lm>=0.18; extra == "mlx"
Provides-Extra: torch
Requires-Dist: torch>=2.1; extra == "torch"
Requires-Dist: transformers>=4.40; extra == "torch"
Requires-Dist: accelerate>=0.28; extra == "torch"
Provides-Extra: auto
Requires-Dist: anthropic>=0.40; extra == "auto"
Provides-Extra: dashboard
Requires-Dist: matplotlib>=3.7; extra == "dashboard"
Provides-Extra: dev
Requires-Dist: pytest>=8.0; extra == "dev"
Requires-Dist: black>=24.0; extra == "dev"
Requires-Dist: ruff>=0.4; extra == "dev"

# NoetherSolve

**https://github.com/SolomonB14D3/noethersolve** · **https://solomonb14d3.github.io/noethersolve**

[![Paper: Breaking Frozen Priors](https://img.shields.io/badge/Paper%2010-Breaking%20Frozen%20Priors-blue)](paper/breaking_frozen_priors.pdf) [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.19017290.svg)](https://doi.org/10.5281/zenodo.19017290)

**Automated scientific discovery that makes the model smarter with each cycle.**

Most autoresearch systems generate hypotheses and hope for the best. NoetherSolve closes the loop: it generates candidates, verifies them numerically, measures whether the model already knows them, and when it doesn't, **discovers the answer and teaches it back to the model**. Each discovery trains an adapter that persists through the rest of the run. The model that evaluates candidate #50 is smarter than the one that evaluated candidate #1, because every intervening discovery has been injected into it.

This matters because the adapters aren't fixing things the model already knows. The Q_f conservation law family, the stretch-resistant R_f ratio, the continuous Euler extension — none of these existed in any training corpus. The system discovered them through numerical simulation, verified they were real, confirmed the model had never seen them (oracle margin -30 to -44), and wrote them into the model's knowledge. After adapter training, the model recognizes and correctly ranks these quantities (margin flipped to +4 to +30, ranking Spearman ρ = 0.932). The model now knows physics that no human had published.

And the adapters don't degrade existing knowledge. Zero MMLU degradation across every adapter tested, because they operate in logit space — they reshape the output distribution without touching the hidden-state knowledge pathway. Each cycle adds knowledge without taking any away. Cross-domain transfer is real: joint training on physics and topology produces positive transfer in both directions, meaning the model learns something general about invariance that applies across fields.

LLMs are trained on what the field has collectively written and taught. Where the model is confidently wrong or blank, the literature is thin. That's where new science is most likely to be found. NoetherSolve automates this: propose, verify, check, discover, teach, repeat.

The method is domain-agnostic. We've applied it to fluid dynamics, electromagnetism, chemical kinetics, Hamiltonian mechanics, Navier-Stokes regularity, and knot theory so far. Any field where you can numerically verify a claim and ask a model about it is fair game.

### Paper

**Breaking Frozen Priors: Teaching Language Models to Discover Conservation Laws from Numerical Simulation** (Sanchez, 2026)
DOI: [10.5281/zenodo.19017290](https://doi.org/10.5281/zenodo.19017290)

Three-phase pipeline transforms a frozen oracle (margin -77.5 ± 1.7) into a ranking engine (Spearman ρ = 0.932 from baseline -0.143). Novel Q_f invariant family verified across chaotic vortex systems and extended to continuous 2D/3D Euler equations. The LLM gap pointed directly at the physics: the model's blind spot on weighted distance sums led to the discovery of stretch-resistant invariants relevant to 3D Navier-Stokes regularity. See [`paper/breaking_frozen_priors.pdf`](paper/breaking_frozen_priors.pdf).

---

## How It Works (Plain English)

An AI model is trained on everything humans have written. That means it knows
what we know, but it also shares our blind spots. Where the collective
literature is thin or wrong, the model is thin or wrong.

NoetherSolve exploits this. It:

1. **Proposes a claim** about how a system behaves (e.g., "this combination of
   distances between vortices stays constant over time").
2. **Checks it with math.** Simulates the system and measures whether the claim
   actually holds. Most don't. The ones that do are real.
3. **Asks the model: did you already know this?** Compares how likely the model
   thinks the true answer is vs. a plausible wrong answer. If the model already
   knows it, move on. If it doesn't, that's a gap in human knowledge, because
   the model was trained on human knowledge.
4. **Teaches the answer back to the model.** Trains a small, cheap patch
   (an "adapter") that doesn't break anything the model already knows. The model
   is now smarter than it was before step 1.
5. **Repeats with the smarter model.** The next claim is evaluated by a model
   that has absorbed every prior discovery. Each cycle, the blind spots shrink
   and the remaining gaps get harder and more interesting.

The result: the model ends up knowing things that weren't in any textbook or
paper, because the system discovered them through simulation and injected them.
In chemical kinetics, the model went from recognizing 0 out of 16 conservation
laws to 15 out of 16 after one pass. In Hamiltonian mechanics, single-pass
training caused interference (the model got worse), so the system broke the
domain into concept clusters and trained them in stages: 5 stages later, 16/16
with zero regression. In Navier-Stokes, staged training plateaued at 6/16, so
the system switched to orthogonal adapters (one specialist per concept cluster,
facts routed to their specialist at inference): 16/16. Every domain that has
resisted one approach has eventually fallen to the next. In fluid dynamics, it
learned an entirely new family of invariants that no human had published.

The method works in any field where you can (a) simulate a system and (b) check
whether a quantity is conserved. So far it's been applied to fluid dynamics,
electromagnetism, chemical kinetics, Hamiltonian mechanics, Navier-Stokes
regularity, and knot theory.

---

## What It Does (Technical)

NoetherSolve runs a **dual-filter pipeline**. The "oracle" is a base LLM scored by log-probability: for each candidate fact, we compare `log P(true answer | context)` against `log P(best distractor | context)`. Positive margin means the model knows it; negative means it doesn't.

```
Hypothesis (expression)
        │
        ▼
Numerical checker            ← Is this quantity actually conserved?
(RK45 integration,             frac_var = σ/|mean| < threshold
 frac_var test)
        │ PASS
        ▼
Oracle filter                ← Does the model already know it?
(log-prob margin,              margin = log P(truth) − log P(best distractor)
 base LLM + adapter stack)
        │
        ├─ PASS → DUAL-PASS: known quantity, archive it
        │
        └─ FAIL → NEW SCIENCE: model has never seen this
                     │
                     ▼
              Train adapter    ← Teach the discovery to the model
              (hinge loss,       25 examples generated per candidate
               logit-space)
                     │
                     ├─ margin flips → KNOWLEDGE INJECTED: adapter joins the stack
                     │                 (all future candidates evaluated with this knowledge)
                     │
                     └─ margin stays → HARD GAP: log it, try different approach next run
```
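
The oracle margin at the heart of the filter reduces to a few lines. A minimal sketch, with hand-picked toy numbers standing in for real model log-probs (the actual pipeline gets these from the base LLM plus adapter stack):

```python
def margin(logp_truth, logp_distractors):
    """Oracle margin: log P(truth) minus log P(best distractor).
    Positive => the model already knows the fact; negative => knowledge gap."""
    return logp_truth - max(logp_distractors)

# Toy numbers standing in for real model log-probs.
m_known = margin(-12.0, [-20.5, -18.0])   # truth far more likely than any distractor
m_gap   = margin(-35.0, [-14.0, -16.5])   # a distractor beats the truth

print(m_known, m_gap)   # 6.0 -21.0
```

A candidate with a positive margin is archived as already-known; a negative margin routes it to adapter training.
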

Adapters stack within a run — each successful discovery makes the oracle
smarter for every subsequent candidate. After the main sweep, a
**confidence-driven resampling** pass retries borderline failures (margin
between -5 and 0) with the full adapter stack. Candidates that were just
short of flipping often get rescued once the model has absorbed neighboring
discoveries. Survivors get promoted to high-priority in the open questions
queue for the next run.

**Escalation for hard domains:**

1. **Single-pass** — one adapter for the whole domain. Works for clean domains
   (chemical kinetics: 0/16 to 16/16 with distractor fix).
2. **Staged training** — group facts into clusters, train sequentially, verify
   zero regression at each stage. Solved Hamiltonian mechanics (1/16 to 16/16
   in 5 stages).
3. **Orthogonal adapters** — when staged training plateaus because facts
   interfere within a single adapter, train separate specialist adapters per
   concept cluster. Each adapter learns one cluster without fighting the others.
   Route facts to their specialist at inference. Solved NS regularity
   (6/16 staged to 16/16 with orthogonal cluster adapters).
4. **Cross-domain joint training** — train a single adapter on multiple domains
   simultaneously. Difficulty-weighted sampling achieves the best transfer:

| Method | Hamiltonian | NS | Knot | Chemical |
|--------|-------------|-----|------|----------|
| Baseline (no adapter) | 6/16 | 0/16 | 1/16 | 5/16 |
| Basic joint | 16/16 | 6/16 | 10/16 | 11/16 |
| Domain-balanced | 16/16 | 6/16 | 11/16 | 11/16 |
| Difficulty-weighted | 14/16 | **10/16** | 11/16 | 13/16 |
| Anchored joint | 16/16 | 9/16 | 11/16 | 12/16 |

A single jointly-trained adapter lifts all 4 domains simultaneously.
Difficulty-weighted sampling (oversample hard facts) gives the best result
on the hardest domain (NS: 0 to 10/16). Conservation knowledge transfers
across physics and pure math.

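Difficulty-weighted sampling is simple to sketch: weight each fact by how negative its baseline oracle margin is, then draw training batches from those weights. The fact IDs and the exact weighting scheme below are illustrative, not the project's implementation:

```python
import random

def difficulty_weights(margins, floor=1.0):
    """Weight grows with how negative the oracle margin is, so hard facts
    are oversampled; facts the model already knows get only the floor weight."""
    return [floor + max(0.0, -m) for m in margins]

# Illustrative fact IDs with baseline margins.
margins = {"ham01": 8.2, "ns03": -44.0, "knot05": -6.1, "chem02": -1.4}
weights = difficulty_weights(list(margins.values()))   # [1.0, 45.0, 7.1, 2.4]

random.seed(0)
batch = random.choices(list(margins), weights=weights, k=8)
# "ns03" carries 45/55.5 ≈ 81% of the sampling mass.
```
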
**Token-length bias.** Some facts are unlearnable because the base model
prefers shorter token sequences. If a distractor is shorter than the correct
answer (e.g., `"k × [A]"` vs `"k × [A] × [B] where k is the rate constant"`),
no amount of adapter training will flip the margin. Fix by rephrasing: shorten
the truth and lengthen the distractors so they're clearly wrong and roughly
the same length. This flipped the last chemical kinetics holdout from -3.8 to
+4.3 and rescued ns03 from -44 to +242.8.

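The bias is easy to see with summed log-probs. The numbers below are synthetic, and the per-token normalization at the end is shown only to illustrate the mechanism — the fix the pipeline actually uses is the rephrasing described above:

```python
# Two answers with identical per-token quality but different lengths.
per_tok = -1.5                                # same per-token log-prob for both
truth      = ["k", "×", "[A]", "×", "[B]"]    # 5 tokens, correct
distractor = ["k", "×", "[A]"]                # 3 tokens, wrong but shorter

sum_truth = per_tok * len(truth)              # -7.5
sum_distr = per_tok * len(distractor)         # -4.5

naive_margin = sum_truth - sum_distr          # -3.0: length alone flips the sign
norm_margin = sum_truth / len(truth) - sum_distr / len(distractor)   # 0.0
```

Equal-length rephrasing removes the bias without touching the scoring rule.
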
**Never stack adapters.** Joint + specialist stacked = regression. Training a
specialist on gap facts and stacking it on top of a joint adapter destroyed
the joint adapter's wins (8/16 → 5/16). The specialist overwrites what the
joint adapter learned. Use cluster routing instead: apply each adapter only to
its assigned facts, never combine weights.

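Cluster routing can be sketched as a lookup: each fact ID maps to one cluster, each cluster to one specialist adapter, and exactly one adapter is applied per query. All fact IDs, cluster names, and paths here are hypothetical:

```python
# Hypothetical cluster assignments and adapter paths.
CLUSTERS = {
    "ns_blowup_01": "blowup",
    "ns_conserved_02": "conservation",
    "ns_stretch_03": "stretching",
}
ADAPTERS = {
    "blowup": "adapters/ns_blowup.npz",
    "conservation": "adapters/ns_conservation.npz",
    "stretching": "adapters/ns_stretching.npz",
}

def adapter_for(fact_id):
    """Exactly one specialist per query — stacking specialists causes regression."""
    return ADAPTERS[CLUSTERS[fact_id]]

print(adapter_for("ns_conserved_02"))   # adapters/ns_conservation.npz
```
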
---

## Toolkit — Practical Tools Built from Discoveries

The pipeline's discoveries become standalone tools that work without any LLM.
Install: `pip install noethersolve` (or `pip install -e .` for development).

### Conservation Monitors

Drop into any simulation loop. Track standard invariants (H, Lz, momentum)
plus AI-discovered quantities (Q_f family, R_f ratio, Wegscheider cyclicity).

```python
from noethersolve import VortexMonitor

monitor = VortexMonitor(circulations=[1.0, -0.5, 0.3])
monitor.set_initial(positions)

for step in simulation:
    state = integrator.step()
    report = monitor.check(state)
    if report.worst_drift > 1e-3:
        print(f"WARNING: {report.worst_name} drifted {report.worst_drift:.2e}")
```

Three built-in monitors: `VortexMonitor` (2D point-vortex), `ChemicalMonitor`
(reaction networks with Wegscheider cyclicity, entropy production, Lyapunov
function), `GravityMonitor` (N-body with Q_f on pairwise distances).

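All three monitors rest on the same test the pipeline uses, frac_var = σ/|mean| of the tracked quantity along the trajectory. A standalone sketch of that check on synthetic series:

```python
import numpy as np

def frac_var(series):
    """Fractional variation of a supposedly conserved quantity: std / |mean|."""
    series = np.asarray(series, dtype=float)
    return series.std() / abs(series.mean())

rng = np.random.default_rng(0)
conserved = 1.0 + 1e-9 * rng.standard_normal(1000)   # flat up to solver noise
drifting  = 1.0 + 1e-3 * np.linspace(0, 1, 1000)     # secular drift

assert frac_var(conserved) < 1e-6 < frac_var(drifting)
```

A quantity passes when its frac_var stays below the domain's threshold; a secular drift like the second series fails immediately.
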
### Integrator Validator

Validates your ODE solver configuration before you run a long simulation.
Checks whether conservation laws are preserved and suggests fixes.

```python
from noethersolve import validate_integrator

circulations = [1.0, -0.5, 0.3]
report = validate_integrator(
    rhs=my_vortex_rhs,
    y0=positions.ravel(),
    t_span=(0, 100),
    system="vortex",
    circulations=circulations,
    rhs_args=(circulations,),
    rtol=1e-8,
)
print(report)
# ============================================================
# Integrator Validation: PASS
# ============================================================
# PASSED (12):
#   H         frac_var=9.30e-09
#   Lz        frac_var=4.80e-09
#   Q_linear  frac_var=2.53e-03
#   ...
```

Also supports `compare_configs()` to test multiple solver settings side-by-side,
and custom invariants via `invariants={"energy": lambda y: compute_energy(y)}`.

### Chemical Network Auditor

Checks thermodynamic consistency of a reaction network without running a
simulation. Pure algebraic checks on the stoichiometry and rate constants.

```python
from noethersolve import audit_network

report = audit_network(
    species=["A", "B", "C"],
    stoichiometry=[[-1, 1, 0, 0], [1, -1, -1, 1], [0, 0, 1, -1]],
    rate_constants=[0.5, 0.3, 0.4, 0.2],
    reactant_matrix=[[1, 0, 0, 0], [0, 1, 1, 0], [0, 0, 0, 1]],
    reverse_pairs=[(0, 1), (2, 3)],
)
print(report)
# Shows: conservation laws, Wegscheider cycle products, detailed balance
# ratios, entropy production, and warnings if anything is inconsistent.
```

Catches: Wegscheider cyclicity violations, missing conservation laws,
non-physical rate constants, negative entropy production (second law violation).

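The Wegscheider check itself is one line of algebra: around any closed reaction cycle, the product of forward rate constants must equal the product of reverse ones. A sketch on a hypothetical three-species cycle A ⇌ B ⇌ C ⇌ A (not the two-cycle network above):

```python
import math

def wegscheider_ratio(k_fwd, k_rev):
    """Cycle product k+/k-: must equal 1 for thermodynamic consistency."""
    return math.prod(k_fwd) / math.prod(k_rev)

# Hypothetical cycle A <-> B <-> C <-> A
k_fwd = [2.0, 0.5, 1.0]
k_rev = [1.0, 1.0, 1.0]
r = wegscheider_ratio(k_fwd, k_rev)           # 1.0 -> consistent

k_fwd_bad = [2.0, 0.5, 3.0]
r_bad = wegscheider_ratio(k_fwd_bad, k_rev)   # 3.0 -> detailed balance violated
```

A ratio away from 1 means the rate constants embed a perpetual-motion cycle, which is exactly the kind of inconsistency the auditor flags.
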
### EM Field Monitor

Monitors electromagnetic field simulations for conservation of standard
and obscure invariants: energy, momentum, optical chirality (Zilch Z⁰,
Lipkin 1964), helicity, super-energy (Chevreton tensor), zilch vector.

```python
import numpy as np

from noethersolve import EMMonitor

monitor = EMMonitor(N=64, L=2*np.pi)
monitor.set_initial(E_fields, B_fields)  # 3-tuples of 3D arrays

for step in simulation:
    E, B = maxwell_solver.step()
    report = monitor.check(E, B)
    if report.worst_drift > 1e-6:
        print(f"WARNING: {report.worst_name} drifted {report.worst_drift:.2e}")
```

Catches: numerical dissipation, wrong boundary conditions, missing terms
in Maxwell solvers. Spectral curls computed internally via FFT.

### Hamiltonian System Validator

Validates that an ODE integrator preserves the symplectic structure of
Hamiltonian systems. Goes beyond energy to check Liouville's theorem
(phase-space volume) and the first Poincaré integral invariant (∮ p dq).

```python
import numpy as np

from noethersolve import kepler_2d

monitor = kepler_2d(mu=1.0)  # built-in Kepler problem
report = monitor.validate(
    z0=np.array([1.0, 0.0, 0.0, 0.8]),  # elliptical orbit
    T=100.0, rtol=1e-10,
)
print(report)
# Shows: energy, angular_momentum, LRL_magnitude,
# liouville_volume, poincare_invariant — all PASS/WARN/FAIL
```

Built-in systems: `harmonic_oscillator`, `kepler_2d` (with angular momentum
and Laplace–Runge–Lenz vector), `henon_heiles`, `coupled_oscillators`.
Or bring your own H(z) and ∇H(z) via `HamiltonianMonitor(H=..., dH=..., n_dof=...)`.

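For the bring-your-own route, H and ∇H are just two callables. A minimal pendulum pair, with the analytic gradient sanity-checked against finite differences before being handed to `HamiltonianMonitor` (the monitor call itself follows the signature above and is not run here):

```python
import numpy as np

def H(z):
    """Pendulum Hamiltonian, z = [q, p]: H = p²/2 − cos(q)."""
    q, p = z
    return 0.5 * p**2 - np.cos(q)

def dH(z):
    """Analytic gradient [∂H/∂q, ∂H/∂p] = [sin(q), p]."""
    q, p = z
    return np.array([np.sin(q), p])

# Sanity-check the gradient with central finite differences before using
# HamiltonianMonitor(H=H, dH=dH, n_dof=1).
z = np.array([0.7, -0.3])
eps = 1e-6
fd = np.array([(H(z + eps*e) - H(z - eps*e)) / (2*eps) for e in np.eye(2)])
assert np.allclose(fd, dH(z), atol=1e-6)
```

A mismatched H/dH pair silently corrupts the symplectic checks, so this finite-difference test is worth the three lines.
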
### Invariant Learner

Automatically discovers new conserved quantities from trajectory data.
Optimizes over 12 basis functions to find f(r) that minimizes fractional
variation of Q_f = Σᵢ<ⱼ wᵢwⱼ f(rᵢⱼ) along one or more trajectories.

```python
from noethersolve import InvariantLearner

learner = InvariantLearner()
result = learner.learn_from_positions(
    position_trajectories=[trajectory],  # shape (n_steps, N, dim)
    weights=[1.0, -0.5, 0.3],            # vortex circulations
)
print(result)
# Shows: optimal f(r) = 0.924·e^(-r) + 0.186·sin(r) + ...
#        40% improvement over single-basis e^(-r)
#        Individual basis losses ranked
```

Three input modes: `learn_from_positions` (raw coordinates),
`learn_from_distances` (pairwise distance time series),
`learn_from_field` (continuous 2D vorticity fields via FFT convolution).

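The quantity being optimized is straightforward to compute directly. A minimal version of Q_f = Σᵢ<ⱼ wᵢwⱼ f(rᵢⱼ) for a single snapshot (the learner evaluates this along whole trajectories and tunes f):

```python
import numpy as np

def q_f(positions, weights, f):
    """Q_f = Σ_{i<j} w_i w_j f(r_ij) for one snapshot of N weighted points."""
    positions = np.asarray(positions, dtype=float)
    total = 0.0
    for i in range(len(weights)):
        for j in range(i + 1, len(weights)):
            r = np.linalg.norm(positions[i] - positions[j])
            total += weights[i] * weights[j] * f(r)
    return total

pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
w = [1.0, -0.5, 0.3]
val = q_f(pts, w, lambda r: np.exp(-r))   # Q_exp for this configuration
```

Evaluating `q_f` at each saved step and feeding the series to frac_var reproduces the learner's objective for a fixed f.
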
### Benchmark Results

The corruption benchmark (`experiments/corruption_benchmark.py`) validates
these tools against 5 experiments:

| Experiment | What it tests | Key finding |
|-----------|--------------|-------------|
| Tolerance sweep | rtol from 1e-12 to 1e-2 | Q_f monitors alert before H/Lz at loose tolerances |
| Single-step corruption | Noise injection at step 500 | Q_f detects at noise=1e-8 where H/Lz miss |
| Wrong physics | Missing 2π, dropped vortex | Q_exp sensitivity 252× over baseline |
| Chemical violation | Perturbed rate constants | Wegscheider cycle product shifts 3.33 to 0.13 while mass conservation stays perfect |
| Sensitivity sweep | 20 noise levels, 1e-10 to 1e-1 | Standard monitors detect at noise ≥ 1.8e-6; discovered monitors have baseline sensitivity at 1e-10 |

**102 tests passing** across all 6 tools (`pytest tests/`).

---

## Quick Start

```bash
# Install core deps
pip install -r requirements.txt

# 1. Run the checker on a hypothesis
python vortex_checker.py --ic restricted --expr "s['r12'] + 0.01*(s['r13']+s['r23'])"

# 2. If checker passes, run the oracle
python oracle_wrapper.py --problem problems/vortex_pair_conservation.yaml

# 3. If oracle fails, diagnose and repair
python oracle_wrapper.py --problem problems/vortex_pair_conservation.yaml \
    --repair --diagnose

# 4. Claim a problem before you start hunting (prevents duplicate work)
python claim.py claim \
    --problem vortex_pair_conservation \
    --expr "r12 + eps*(r13+r23)" \
    --handle your_handle

# 5. View results dashboard (rebuilds from results/candidates.tsv)
python dashboard.py --open
```

> **Linux / CUDA users:** use `noethersolve_torch.py` as a drop-in backend that requires only PyTorch + HuggingFace — no MLX needed.
> ```bash
> python noethersolve_torch.py train-adapter --data my_training_data.json \
>     --model Qwen/Qwen3-4B-Base --out adapters/my_adapter.npz
> python noethersolve_torch.py eval-oracle --problem problems/vortex_pair_conservation.yaml \
>     --adapter adapters/my_adapter.npz --diagnose
> ```

---

## Adding a New Domain (Fork This)

Every domain is three files in `problems/`:

| File | Purpose |
|------|---------|
| `my_domain.yaml` | Problem definition: model, oracle, monitors, adapter, budget |
| `my_domain_facts.json` | Verification set: 8–15 facts with context/truth/distractors |
| `my_domain_checker.py` | Numerical integrator: `integrate()` + `parse_state()` + `frac_var()` |

Copy `problem_template.yaml` and follow `CONTRIBUTING.md` for the full protocol.

**Format rule:** Use compact symbolic notation in facts.
`"H = -1/(4π) Σᵢ<ⱼ ΓᵢΓⱼ ln(rᵢⱼ²)"` ✓
`"The Hamiltonian equals negative one over four pi times the sum..."` ✗

---

## Discoveries So Far

193+ candidates tested. 80+ genuine invariants discovered. 10 domains, 122 oracle facts.

### Discrete Point-Vortex

| Expression | frac_var | Oracle Baseline → Adapter | Status |
|------------|----------|---------------------------|--------|
| e₁ = r₁₂+r₁₃+r₂₃ (figure-8) | 5.54e-04 | +4.50 | **DUAL-PASS** |
| e₂ = r₁₂r₁₃+r₁₂r₂₃+r₁₃r₂₃ | 2.69e-03 | -1.67→**+1.30** | **FLIPPED** |
| Q = Σ ΓᵢΓⱼ rᵢⱼ | 5.36e-06 | -29.96→**+3.99** | **FLIPPED** |
| Q₂ = Σ ΓᵢΓⱼ rᵢⱼ² (= Γ·Lz) | 9.62e-12 | -43.9→**+29.6** | **FLIPPED** (exact) |
| Q_f family (12 functions, N=3-9) | 1e-5 to 1e-11 | ranked ρ=0.932 | **RANKING LEARNED** |
| H - Lz | 9.48e-12 | -19.6→**+26.1** | **FLIPPED** |
| K = Σ Γᵢ vᵢ² (kinetic) | 1.2e-7 | 0/8→**5/8** | **FIXABLE_BIAS** |
| Σᵢ rᵢ (parallel dipole sum) | ~1e-16 | — | **EXACT** |
| H·r₁₂ + α·Lz composites | 1e-3 to 1e-12 | margin -77.5 ± 1.7 | **FROZEN PRIOR** |

**K invariant (new family).** K = Σ Γᵢ vᵢ² is independent of the Q_f family (R² = 0.048 against Q₋₂). The key finding is a distance-angle cancellation: the distance component alone has frac_var 1.3e-5, the angular component has frac_var 1.1e-1, but the combined K has frac_var 1.2e-7 — a 100,000× improvement from cancellation. This is a genuinely new conservation mechanism. With `k_adapter_v3`: 5/8 facts flipped (definition, independence, physical interpretation, Biot-Savart formula, numerical frac_var values).

**Parallel dipole sum.** For N parallel dipoles, Σᵢ rᵢ = const exactly (frac_var ~10⁻¹⁶). Individual dipole positions vary 20-30%, but the sum is machine-precision constant. Follows from linear impulse conservation.

**Frozen prior diagnostic.** The H·r₁₂ + α·Lz family (70+ variants) revealed that the base model pattern-matches instead of evaluating coefficients: oracle margins are -77.5 ± 1.7 across 4 orders of magnitude of α variation. The model doesn't care what α is. This led to the physics-supervised training approach that broke the prior (correlation r = -0.11 → r = +0.952).

**Ranking adapter.** ListNet loss with log-scale targets and hard negative mining. Spearman ρ = 0.932 at step 50 (baseline -0.143). The oracle now ranks invariants by conservation quality, not just binary pass/fail.

### Continuous Q_f Extension (2D/3D Euler)

The Q_f family extends from discrete vortices to continuous vorticity fields:

```
Q_f[ω] = ∫∫ ω(x) ω(y) f(|x-y|) dx dy ≈ const
```
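
On a grid, the double integral becomes a double sum over cells. A minimal O(n²) sketch (the packaged `learn_from_field` mode uses FFT convolution instead; the singular self-interaction cell is simply dropped here):

```python
import numpy as np

def q_f_field(omega, dx, f):
    """Q_f[ω] ≈ Σ_i Σ_j ω_i ω_j f(|x_i − x_j|) dx⁴ on a 2D grid (direct sum)."""
    ny, nx = omega.shape
    xy = np.stack(np.meshgrid(np.arange(nx) * dx, np.arange(ny) * dx), axis=-1)
    pts, w = xy.reshape(-1, 2), omega.ravel()
    r = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    np.fill_diagonal(r, np.inf)   # drop the singular self-interaction cell
    return (w[:, None] * w[None, :] * f(r)).sum() * dx**4

# Tiny Gaussian vorticity blob on a coarse grid
n, L = 16, 2.0
dx = L / n
y, x = np.meshgrid(np.arange(n) * dx, np.arange(n) * dx, indexing="ij")
omega = np.exp(-((x - L/2)**2 + (y - L/2)**2) / 0.1)
q = q_f_field(omega, dx, lambda r: np.exp(-r))
```

Tracking `q` across solver snapshots and feeding the series to frac_var gives the conservation test for a candidate f.
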

Verified numerically across 6 test scenarios (laminar, turbulent 2D, 3D vortex rings, viscous NS):

| f(r) | 2D Laminar | 2D Turbulent | 3D Rings | Status |
|------|-----------|-------------|---------|--------|
| -ln(r) | 4.32e-03 | 2.77e-03 | — | Known (energy) |
| e^(-r) | 3.09e-04 | 5.42e-03 | 1.79e-03 | **NEW** |
| tanh(r) | — | 6.82e-03 | — | **NEW** |
| √r | 3.48e-04 | 1.07e-02 | 2.95e-03 | **NEW** |
| 1/r | — | — | 3.78e-04 | **NEW** (3D best) |

Oracle results: baseline **0/12 pass rate** (complete knowledge gap). With `qf_continuous_adapter`: **7/12 pass rate** (58.3%), diagnostic changed from KNOWLEDGE_GAP to FIXABLE_BIAS.

| Flipped Fact | Baseline | Adapter | Delta |
|--------------|----------|---------|-------|
| Q_f extension formula | -6.5 | +8.0 | +14.5 |
| f=-ln(r) gives energy | -44.3 | +17.2 | +61.5 |
| Q_{e^(-r)} conserved | -59.1 | +2.1 | +61.2 |
| Conservation mechanism | -43.7 | +11.3 | +55.0 |
| Q_f bounds → NS regularity | -11.7 | +3.6 | +15.3 |

Viscous (Navier-Stokes) decay scales linearly with ν. See `results/discoveries/qf_family_comprehensive.md` and `results/discoveries/continuous_qf_oracle.md`.

### 3D Stretch-Resistant Ratio (the NS connection)

Standard Q_f varies 60% under vortex stretching, which is the mechanism behind potential 3D blowup. We tested four modifications:

| Variant | Stretch Resistance | Evolution Conservation | Combined |
|---------|-------------------|----------------------|----------|
| Standard Q_f | 60% variation | 0.14% | 2.95% |
| Q_f / Enstrophy | 17% | 0.36% | 2.44% |
| Curvature-weighted | 4% | 1.02% | 6.4% |
| **R_f = Q_exp / Q_inv** | **2%** | **0.17%** | **0.59%** |

R_f = Q_{e^(-r)} / Q_{1/r} survives stretching because both numerator and denominator scale as ~L² under stretching, and the ratio cancels. Physically, R_f measures the locality of vorticity interactions: how much the dynamics depends on nearby vs distant vorticity.

Oracle results: **8/8 facts flipped** (100% pass rate) with `qf_ratio_adapter`. Generalization margin: +34.3. Physical interpretation: +19.8. All conservation mechanism facts above +15.

See `research/qf_regularity_connection.md` and `research/test_stretch_resistant_qf.py`.

### Navier-Stokes Regularity

The hardest domain tested and the most instructive. Baseline: **0/16** (model confidently wrong on all facts, margins -30 to -80). The model prefers "not conserved" for quantities that are exactly conserved, and "advection" where the answer is "vortex stretching."

Every training approach that worked elsewhere failed here, forcing new techniques at each plateau:

| Approach | Score | Problem |
|----------|-------|---------|
| Single-pass adapter | 2/16 | Interference (margins worsened) |
| Staged training (anchored) | 6/16 | Plateau (cross-cluster interference) |
| **Orthogonal adapters** | **16/16** | Solved |

The breakthrough was discovering that NS facts are **representational see-saws**: training on blowup facts (2/2 within cluster) destroys conservation margins (to -600). Training on conservation facts (2/2 within cluster) destroys blowup margins (to -1100). Even a single new fact causes regression on previously passing facts. The concepts need to move in opposite directions within logit space.

Solution: **orthogonal adapters**. Train a separate specialist adapter per concept cluster. Route each query to its specialist at inference. The clusters don't compete for the same parameters, so they can each point in their own direction without destroying the others.

The cluster boundaries reveal the model's internal concept structure: facts that interfere share representational dimensions.

### Electromagnetism

Spectral Maxwell solver verifying conservation of EM invariants (energy, Lipkin's zilch, optical chirality, helicity, super-energy). All confirmed exactly conserved (frac_var < 10⁻⁶).

Oracle results on Qwen3-4B-Base: baseline **1/12 pass rate** (8.3%). The model fails on basic energy conservation (margin -4.08), not just obscure quantities. Zilch (margin -11.63) and super-energy (margin -9.94) are complete knowledge gaps.

With `em_adapter_v4`: **6/12 pass rate** (50%). Flipped: energy (-4.08→+14.96), chirality (-11.63→+8.21), super-energy (-9.94→+12.34), helicity (-7.89→+9.45). Mean margin: -11.04→-0.21.

See `results/discoveries/em_conservation_laws.md` and `results/discoveries/em_zilch_chirality.md`.

### Chemical Kinetics (New Domain)

Conservation laws in reaction networks: Wegscheider cyclicity, mass action detailed balance, thermodynamic potentials, Lyapunov functions for open/closed systems.

Baseline: **0/16** (complete knowledge gap). With `chem_adapter`: **16/16** (100%) after fixing a distractor quality issue on the last holdout fact (chem08_mass_action).

| Metric | Baseline | After Adapter | Change |
|--------|----------|---------------|--------|
| Pass rate | 0/16 | 16/16 | +100% |
| Mean margin | -20.0 | +14.0 | +34.0 |

The first domain to reach 100% from single-pass training. Chemical kinetics conservation laws are well-defined enough for the oracle to learn them cleanly. The holdout fact initially appeared stuck at -1.4 margin, but the issue was a weak distractor, not a weak adapter. Fixing the distractor quality flipped it immediately.

### Hamiltonian Mechanics (New Domain)

Phase space invariants: Liouville's theorem, symplectic structure, Poincaré invariants, KAM tori, action-angle variables, Hénon-Heiles chaos, generating functions. Created `research/hamiltonian_invariants.py` for numerical verification.

Baseline: **1/16**. Single-pass adapter training caused interference (margin worsened from -22.6 to -43.4). Solved via **staged training** in 5 stages, consolidating related fact clusters before moving to the next:

| Stage | Facts Passing | New Flips |
|-------|--------------|-----------|
| 1 | 5/16 | Symplectic cluster |
| 2 | 7/16 | +Noether, +Poisson |
| 3 | 10/16 | +Energy, +action, +integrable |
| 4 | 13/16 | +Kepler cluster |
| 5 | **16/16** | +KAM, +Hénon-Heiles, +generating |

Zero regression across all 5 stages. Every previously passing fact remained positive while new facts flipped. The hardest flips were the KAM theorem (-59.81 to +3.90), Hénon-Heiles (-138.16 to +7.92), and generating functions (-88.32 to +6.32).

**Lesson: when single-pass training causes interference, staged training by concept cluster eliminates it.** This has been incorporated into the pipeline as the default approach for domains that show regression on first pass.

### Knot Invariants (New Domain)

The first purely mathematical (non-physics) domain. Tests conservation under Reidemeister moves (topological invariance) rather than time evolution. Key facts: writhe is NOT invariant (it changes by ±1 under R1), the Kauffman bracket is NOT invariant under R1 (it picks up a factor of -A^{±3}), the Jones polynomial IS invariant (normalization cancels the R1 changes), HOMFLY-PT generalizes Jones, and skein relations provide recursive crossing formulas.

Baseline: **1/16**. Solved with **orthogonal adapters** (7 clusters, same technique that solved NS): **16/16**.

This is significant for two reasons. First, the orthogonal adapter technique generalizes beyond physics into pure mathematics. The model's wrong priors about topology (confusing invariance with non-invariance, mixing up which quantities survive which moves) create the same see-saw interference seen in NS. The fix is the same: partition into non-interfering clusters, train specialist adapters, route at inference.

Second, **cross-domain transfer works.** Multi-domain joint training across all 4 domains (Hamiltonian, NS, knots, chemical) with difficulty-weighted sampling lifts every domain from a single adapter. NS went from 0/16 baseline to 10/16, knots from 1/16 to 11/16, chemical from 5/16 to 13/16. The model learns something general about "what it means for a quantity to be invariant" that applies regardless of whether invariance is under time evolution, Reidemeister moves, or reaction network balance.

544
+ ### Optimal f(r) Linear Combination
545
+
546
+ Gradient descent over weighted combinations of basis functions finds optimal conservation:
547
+
548
+ ```
549
+ f*(r) = 0.023 e^(-r/2) + 0.021 tanh(r) - 0.019 sin(r) + ...
550
+ ```
551
+
552
+ 99.6% improvement in conservation over any single basis function. With `optimal_f_adapter`: 2/4 facts flipped (dominant terms: +16.5, learned vs energy: +5.3).
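The search can be sketched as projected gradient descent on the temporal variance of the weighted combination. This is a minimal illustration of the idea, assuming a matrix `Q[t, k]` of candidate quantities built from each basis function along a trajectory; `learn_optimal_f.py`'s actual objective and interface may differ:

```python
import numpy as np

# Sketch: find unit-norm weights w minimizing Var_t(Q @ w), i.e. the linear
# combination of basis-function candidates that drifts least over the
# trajectory. The unit-sphere projection rules out the trivial w = 0.

def optimal_weights(Q, steps=2000, lr=0.05):
    T, K = Q.shape
    w = np.ones(K) / np.sqrt(K)
    Qc = Q - Q.mean(axis=0)                  # center each candidate in time
    for _ in range(steps):
        grad = 2.0 * Qc.T @ (Qc @ w) / T     # gradient of Var(Q @ w)
        w -= lr * grad
        w /= np.linalg.norm(w)               # project back to the unit sphere
    return w
```

This converges to the minimal-variance direction (equivalently, the smallest eigenvector of the candidates' covariance); a perfectly conserved column ends up carrying all the weight.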
+
+ ### Summary by Domain
+
+ | Domain | Facts | Oracle Baseline | Best Adapter | Status |
+ |--------|-------|-----------------|--------------|--------|
+ | Q_f Ratio (R_f) | 8 | 0% | **100%** | COMPLETE |
+ | **Hamiltonian mechanics** | **16** | **6.25%** | **100%** | **COMPLETE** (staged) |
+ | **NS regularity** | **16** | **0%** | **100%** | **COMPLETE** (orthogonal) |
+ | **Knot invariants** | **16** | **6.25%** | **100%** | **COMPLETE** (orthogonal) |
+ | **Chemical kinetics** | **16** | **0%** | **100%** | **COMPLETE** (single-pass) |
+ | Point-vortex Q_f | 14 | 20% | ~80% | COMPLETE |
+ | K invariant | 8 | 0% | 62.5% | IMPROVED |
+ | Continuous Q_f | 12 | 0% | 58.3% | FIXABLE |
+ | Electromagnetism | 12 | 8.3% | 50% | FIXABLE |
+ | Optimal f(r) | 4 | 0% | 50% | FIXABLE |
+ | Ranking adapter | — | ρ=-0.14 | ρ=0.93 | — |
+
+ **Total: 10 domains, 122 oracle facts tested. 5 domains at 100%. 0% MMLU degradation across all adapters.**
+
+ Full history: `results/candidates.tsv`
+
+ ---
+
+ ## Coordination
+
+ NoetherSolve uses the **THINK → CLAIM → RUN → PUBLISH** protocol
+ to prevent duplicate work across contributors.
+
+ > Coordination design adapted from
+ > [autoresearch-at-home](https://github.com/mutable-state-inc/autoresearch-at-home)
+ > (mutable-state-inc), which pioneered asynchronous multi-agent research
+ > coordination with semantic duplicate detection and claim expiry.
+ > We adapt it here for human-in-the-loop physics hunting.
+
+ ```bash
+ python claim.py list # see what's in flight
+ python claim.py claim # reserve your problem before running
+ python claim.py release # publish your results, free the claim
+ ```
+
+ Claims expire after 4 hours. See `CONTRIBUTING.md` for the full protocol.
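The expiry rule reduces to a timestamp comparison. A sketch assuming a simple claim record shape (`claim.py`'s actual storage format may differ):

```python
import time

# Sketch of the 4-hour claim expiry: a claim blocks other contributors only
# while its age is under the TTL; after that the problem is free to re-claim.

CLAIM_TTL = 4 * 3600  # seconds

def is_active(claim, now=None):
    """True while the claim still reserves its problem."""
    now = time.time() if now is None else now
    return (now - claim["claimed_at"]) < CLAIM_TTL

claim = {"problem": "em_zilch", "owner": "alice", "claimed_at": 0.0}
assert is_active(claim, now=3 * 3600)       # still in flight
assert not is_active(claim, now=5 * 3600)   # expired
```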
+
+ ---
+
+ ## Architecture
+
+ ```
+ NoetherSolve
+ ├── oracle_wrapper.py ← Oracle + repair + ranking + quadrant diagnosis
+ ├── conservation_checker.py ← Figure-8 3-body numerical checker
+ ├── vortex_checker.py ← 2D point-vortex numerical checker
+ ├── em_checker.py ← Spectral Maxwell solver (EM conservation)
+ ├── noethersolve_torch.py ← PyTorch/CUDA backend (no MLX needed)
+ ├── autonomy_loop.py ← Fully autonomous sweep + hypothesis generation
+ ├── claim.py ← THINK/CLAIM/RUN/PUBLISH coordination
+ ├── dashboard.py ← Results dashboard from candidates.tsv
+ │
+ ├── noethersolve/ ← Core package
+ │   ├── adapter.py ← Snap-on logit adapter (SwiGLU)
+ │   ├── oracle.py ← Oracle scoring engine
+ │   └── train_utils.py ← Shared training utilities
+ │
+ ├── problems/ ← Domain plugins (fork here)
+ │   ├── problem_template.yaml
+ │   ├── vortex_pair_conservation.yaml
+ │   ├── em_zilch.yaml ← Electromagnetic zilch/chirality
+ │   ├── continuous_qf.yaml ← Continuous Q_f (2D/3D Euler)
+ │   └── *_facts.json ← Verification sets
+ │
+ ├── training/
+ │   ├── scripts/ ← All adapter training scripts
+ │   │   ├── train_ranking_v2.py ← Ranking adapter (ListNet + hard negatives)
+ │   │   ├── train_vortex_adapter.py
+ │   │   ├── train_physics_supervised.py
+ │   │   ├── train_prior_breaker.py
+ │   │   ├── train_em_adapter.py ← EM domain adapter
+ │   │   └── train_qf_continuous_adapter.py ← Continuous Q_f adapter
+ │   └── data/ ← Training JSON files
+ │
+ ├── research/ ← Q_f extension + NS regularity + EM experiments
+ │   ├── test_continuous_qf.py ← 2D Euler verification
+ │   ├── test_qf_turbulence.py ← Turbulent dynamics
+ │   ├── test_3d_vortex_qf.py ← 3D vortex rings
+ │   ├── test_qf_viscous.py ← Navier-Stokes viscous decay
+ │   ├── test_stretch_resistant_qf.py ← R_f ratio (survives stretching)
+ │   ├── learn_optimal_f.py ← Gradient descent for optimal f(r)
+ │   ├── maxwell_zilch.py ← Spectral Maxwell solver + EM invariants
+ │   └── qf_regularity_connection.md
+ │
+ ├── paper/
+ │   ├── breaking_frozen_priors.md ← Paper 10 source
+ │   ├── breaking_frozen_priors.pdf ← Paper 10 (pandoc breaking_frozen_priors.md -o *.pdf)
+ │   └── prior_work/ ← Papers 8-9 that this builds on
+ │
+ ├── adapters/ ← Trained weights (gitignored)
+ │
+ └── results/
+     ├── candidates.tsv ← All tested hypotheses (193 entries)
+     └── discoveries/ ← Discovery notes (26 files)
+ ```
+
+ ---
+
+ ## Built On
+
+ - **STEM Truth Oracle** (Paper 9) — log-prob margin as a zero-FP/FN binary
+   classifier for factual correctness.
+   DOI: [10.5281/zenodo.19005729](https://doi.org/10.5281/zenodo.19005729)
+
+ - **Snap-On Communication Modules** (Paper 8) — frozen logit-space adapters
+   that close knowledge gaps without touching base model weights.
+   DOI: [10.5281/zenodo.18902616](https://doi.org/10.5281/zenodo.18902616)
+
+ - **autoresearch-at-home** (mutable-state-inc) — THINK → CLAIM → RUN → PUBLISH
+   coordination protocol for collaborative research without duplicate work.
+   [github.com/mutable-state-inc/autoresearch-at-home](https://github.com/mutable-state-inc/autoresearch-at-home)
+
+ - **Noether's theorem** (Emmy Noether, 1915) — the reason any of this works.
+
+ ## Cite
+
+ ```bibtex
+ @article{sanchez2026breaking,
+   title={Breaking Frozen Priors: Teaching Language Models to Discover Conservation Laws from Numerical Simulation},
+   author={Sanchez, Bryan},
+   year={2026},
+   doi={10.5281/zenodo.19017290},
+   url={https://doi.org/10.5281/zenodo.19017290}
+ }
+ ```