edmkit 0.0.0__tar.gz
- edmkit-0.0.0/.claude/settings.json +13 -0
- edmkit-0.0.0/.github/workflows/ci.yaml +40 -0
- edmkit-0.0.0/.github/workflows/release.yaml +29 -0
- edmkit-0.0.0/.gitignore +10 -0
- edmkit-0.0.0/.python-version +1 -0
- edmkit-0.0.0/CLAUDE.md +75 -0
- edmkit-0.0.0/LICENSE +21 -0
- edmkit-0.0.0/PKG-INFO +36 -0
- edmkit-0.0.0/README.md +25 -0
- edmkit-0.0.0/debug.py +235 -0
- edmkit-0.0.0/pyproject.toml +15 -0
- edmkit-0.0.0/ruff.toml +1 -0
- edmkit-0.0.0/src/edmkit/__init__.py +4 -0
- edmkit-0.0.0/src/edmkit/ccm.py +76 -0
- edmkit-0.0.0/src/edmkit/embedding.py +56 -0
- edmkit-0.0.0/src/edmkit/generate/__init__.py +4 -0
- edmkit-0.0.0/src/edmkit/generate/double_pendulum.py +52 -0
- edmkit-0.0.0/src/edmkit/generate/lorenz.py +22 -0
- edmkit-0.0.0/src/edmkit/generate/mackey_glass.py +17 -0
- edmkit-0.0.0/src/edmkit/simplex_projection.py +51 -0
- edmkit-0.0.0/src/edmkit/smap.py +72 -0
- edmkit-0.0.0/src/edmkit/tensor.py +15 -0
- edmkit-0.0.0/src/edmkit/util.py +190 -0
- edmkit-0.0.0/tests/smoke_test.py +44 -0
- edmkit-0.0.0/tests/test_embedding.py +12 -0
- edmkit-0.0.0/tests/test_simplex_projection.py +82 -0
- edmkit-0.0.0/tests/test_smap.py +143 -0
- edmkit-0.0.0/tests/test_util.py +163 -0
- edmkit-0.0.0/uv.lock +670 -0
edmkit-0.0.0/.github/workflows/ci.yaml
ADDED
@@ -0,0 +1,40 @@
+# Basic CI setup: Lint with ruff, run tests with pytest
+name: Test
+
+on:
+  pull_request:
+  push:
+    branches:
+      - main
+
+jobs:
+  lint:
+    name: Lint
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - uses: astral-sh/setup-uv@v3
+      - name: Ruff lint
+        run: uv run ruff check .
+      - name: Ruff format
+        run: uv run ruff format --diff .
+
+  type-check:
+    name: Type check
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - uses: astral-sh/setup-uv@v3
+      - name: Ty type check
+        run: uv run ty check
+
+  test:
+    name: Run tests
+    strategy:
+      matrix:
+        os: [ubuntu-latest, macos-latest, windows-latest]
+    runs-on: ${{ matrix.os }}
+    steps:
+      - uses: actions/checkout@v4
+      - uses: astral-sh/setup-uv@v3
+      - run: uv run pytest
edmkit-0.0.0/.github/workflows/release.yaml
ADDED
@@ -0,0 +1,29 @@
+name: Release
+
+on:
+  push:
+
+jobs:
+  pypi:
+    name: Publish to PyPI
+    runs-on: ubuntu-latest
+    environment:
+      name: pypi
+    permissions:
+      id-token: write
+      contents: read
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v5
+      - name: Install uv
+        uses: astral-sh/setup-uv@v6
+      - name: Install Python 3.13
+        run: uv python install 3.13
+      - name: Build
+        run: uv build
+      - name: Smoke test (wheel)
+        run: uv run --isolated --no-project --with dist/*.whl tests/smoke_test.py
+      - name: Smoke test (source distribution)
+        run: uv run --isolated --no-project --with dist/*.tar.gz tests/smoke_test.py
+      - name: Publish
+        run: uv publish
edmkit-0.0.0/.gitignore
ADDED
edmkit-0.0.0/.python-version
ADDED
@@ -0,0 +1 @@
+3.13.0
edmkit-0.0.0/CLAUDE.md
ADDED
@@ -0,0 +1,75 @@
+# CLAUDE.md
+
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+## Project Overview
+
+edmkit is a Python library for Empirical Dynamic Modeling (EDM), providing tools for nonlinear time series analysis including embedding, simplex projection, S-Map, and convergent cross mapping.
+
+## Development Commands
+
+### Testing
+```bash
+# Run all tests
+uv run pytest tests/
+
+# Run a specific test file
+uv run pytest tests/test_simplex_projection.py
+
+# Run tests with verbose output
+uv run pytest -v tests/
+```
+
+### Type Checking
+```bash
+# Type check
+uv run ty check .
+```
+
+### Linting and Formatting
+```bash
+# Check code style and linting issues
+uv run ruff check .
+
+# Auto-format code
+uv run ruff format .
+
+# Check specific file
+uv run ruff check src/edmkit/simplex_projection.py
+```
+
+### Installation and Dependencies
+```bash
+# Install in development mode with uv (recommended)
+uv sync --dev
+
+# Do not use pip interface
+```
+
+## Architecture and Code Structure
+
+### Core Modules
+- **embedding.py**: Time series embedding using lagged coordinates. Key function: `embed()`
+- **simplex_projection.py**: Nearest-neighbor forecasting in phase space. Key function: `simplex_projection()`
+- **smap.py**: Sequential Locally Weighted Global Linear Maps. Key function: `smap()`
+- **ccm.py**: Convergent Cross Mapping for causality detection.
+- **tensor.py**: Abstraction layer over tinygrad for GPU acceleration
+- **util.py**: Utility functions including distance calculations, autocorrelation, DTW
+
+### Key Design Patterns
+1. **Functional API**: Most functions are pure with NumPy array inputs/outputs
+2. **GPU Acceleration**: Uses tinygrad tensors internally for performance
+3. **Input Validation**: Extensive assertions with informative error messages
+4. **Vectorized Operations**: Leverages NumPy/tinygrad broadcasting for efficiency
+5. **Minimal Dependencies**: Keep dependencies to a minimum; every dependency must be justified
+
+### Testing Approach
+- Tests compare outputs against pyEDM reference implementation
+- Use synthetic data from generators (Lorenz, Mackey-Glass, etc.) as fixtures
+- Test files mirror source structure (e.g., `test_simplex_projection.py` for `simplex_projection.py`)
+
+### Important Implementation Details
+1. **Distance Calculations**: The library uses custom distance functions that handle NaN values specially
+2. **Platform Considerations**: Special handling for macOS non-ARM systems regarding Metal backend
+3. **API Stability**: The library is under active development - expect API changes
+4. **Tensor Conversion**: Functions convert between NumPy arrays and tinygrad tensors internally
edmkit-0.0.0/LICENSE
ADDED
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2025 temma
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
edmkit-0.0.0/PKG-INFO
ADDED
@@ -0,0 +1,36 @@
+Metadata-Version: 2.4
+Name: edmkit
+Version: 0.0.0
+Summary: Simple EDM (Empirical Dynamic Modeling) library
+Author-email: FUJISHIGE TEMMA <tenma.x0@gmail.com>
+License-File: LICENSE
+Requires-Python: >=3.13
+Requires-Dist: numpy>=2.3.4
+Requires-Dist: tinygrad>=0.11.0
+Description-Content-Type: text/markdown
+
+# edmkit
+
+This library is a collection of tools and utilities for Empirical Dynamic Modeling (EDM) and related tasks. It is designed to be fast, lightweight, and easy to use.
+
+::: warning
+This library is still under intensive development, so the API may change in the future.
+:::
+
+## Installation
+
+To install the library, you can use pip:
+
+```bash
+pip install edmkit
+```
+
+Or you can also use [uv](https://docs.astral.sh/uv/):
+
+```bash
+uv add edmkit
+```
+
+## Usage
+
+Most of the functions accept and return `numpy` arrays or `edmkit.Tensor` (an alias for `tinygrad.Tensor`).
edmkit-0.0.0/README.md
ADDED
@@ -0,0 +1,25 @@
+# edmkit
+
+This library is a collection of tools and utilities for Empirical Dynamic Modeling (EDM) and related tasks. It is designed to be fast, lightweight, and easy to use.
+
+::: warning
+This library is still under intensive development, so the API may change in the future.
+:::
+
+## Installation
+
+To install the library, you can use pip:
+
+```bash
+pip install edmkit
+```
+
+Or you can also use [uv](https://docs.astral.sh/uv/):
+
+```bash
+uv add edmkit
+```
+
+## Usage
+
+Most of the functions accept and return `numpy` arrays or `edmkit.Tensor` (an alias for `tinygrad.Tensor`).
edmkit-0.0.0/debug.py
ADDED
@@ -0,0 +1,235 @@
+import matplotlib.pyplot as plt
+import numpy as np
+import pandas as pd
+import pyEDM
+
+from edmkit import generate, tensor
+from edmkit.embedding import lagged_embed
+from edmkit.simplex_projection import simplex_projection
+from edmkit.util import pairwise_distance, topk
+
+
+def logistic_map(n: int = 200):
+    """Generate logistic map time series x."""
+    x = np.zeros(n)
+    x[0] = 0.1
+    # Logistic map
+    for i in range(1, n):
+        x[i] = 3.8 * x[i - 1] * (1 - x[i - 1])
+    return x
+
+
+def lorenz(n: int = 200):
+    """Generate Lorenz attractor time series x."""
+    sigma, rho, beta = 10, 28, 8 / 3
+    X0 = np.array([1.0, 1.0, 1.0])
+    t_max = 30
+    dt = t_max / (n * 10)  # Generate enough points before subsampling
+
+    t, X = generate.lorenz(sigma, rho, beta, X0, dt, t_max)
+    return X[::10, 0][:n]  # Ensure we get exactly n points
+
+
+def mackey_glass(n: int = 200):
+    """Generate Mackey-Glass time series x."""
+    tau, n_exponent = 17, 10
+    beta, gamma = 0.2, 0.1
+    x0 = 0.9
+    t_max = 200
+    dt = t_max / n
+
+    t, x = generate.mackey_glass(tau, n_exponent, beta, gamma, x0, dt, t_max)
+    return x
+
+
+params = [
+    ("logistic_map", 3, 2),
+    ("lorenz", 3, 1),
+    ("mackey_glass", 4, 2),
+]
+
+
+def main():
+    for data, E, tau in params:
+        if data == "logistic_map":
+            x = logistic_map()
+        elif data == "lorenz":
+            x = lorenz()
+        elif data == "mackey_glass":
+            x = mackey_glass()
+        else:
+            raise ValueError(f"Unknown data type: {data}")
+
+        # common parameters
+        lib_size = 150
+        Tp = 2
+
+        # pyEDM
+        df = pd.DataFrame({"time": np.arange(len(x)), "value": x})
+        lib, pred = f"1 {lib_size}", f"{lib_size + 1} {len(x)}"
+        pyedm_result = pyEDM.Simplex(dataFrame=df, lib=lib, pred=pred, E=E, tau=-tau, columns="value", target="value", Tp=Tp, verbose=False)
+        pyedm_predictions = pyedm_result["Predictions"].values[Tp:-Tp]  # first Tp values are NaN, last Tp values are not in true x
+
+        # edmkit
+        embedding = lagged_embed(x, tau, E)
+        shift = tau * (E - 1)  # embedding starts at this index (i.e. embedding[0][0] == x[shift])
+        X = embedding[: lib_size - shift]
+        Y = embedding[Tp : lib_size - shift + Tp, 0]  # shifted by Tp
+
+        query_points = embedding[lib_size - shift :]
+        edmkit_predictions = simplex_projection(X, Y, query_points)[:-Tp]  # last Tp values are not in true x
+
+        ground_truth = x[lib_size + Tp :]
+        pyedm_rmse = np.sqrt(np.mean((pyedm_predictions - ground_truth) ** 2))
+        edmkit_rmse = np.sqrt(np.mean((edmkit_predictions - ground_truth) ** 2))
+
+        diff = edmkit_predictions - pyedm_predictions
+
+        fig1 = plt.figure(figsize=(10, 6))
+
+        ax1 = fig1.add_subplot(2, 1, 1)
+
+        ax1.plot(ground_truth, label="Ground Truth", color="black")
+        ax1.plot(pyedm_predictions, label="pyEDM Predictions", linestyle="--", color="blue")
+        ax1.plot(edmkit_predictions, label="edmkit Predictions", linestyle=":", color="red")
+
+        print(pyedm_predictions.shape, edmkit_predictions.shape, ground_truth.shape, X.shape, Y.shape, query_points.shape)
+
+        large_diff = np.abs(diff) > np.max(np.abs(diff)) * 0.1
+        ax1.vlines(
+            np.where(large_diff)[0], ymin=0, ymax=1, color="orange", alpha=0.3, label="Significant Difference", transform=ax1.get_xaxis_transform()
+        )
+
+        ax2 = fig1.add_subplot(2, 1, 2, sharex=ax1)
+        ax2.plot(diff, label="Difference (edmkit - pyEDM)", color="green")
+        ax2.vlines(
+            np.where(large_diff)[0], ymin=0, ymax=1, color="orange", alpha=0.3, label="Significant Difference", transform=ax2.get_xaxis_transform()
+        )
+        ax2.axhline(0, color="black", linestyle="--")
+
+        fig1.tight_layout()
+        fig1.legend()
+        plt.show()
+
+        # display embedding in 3D
+        fig2 = plt.figure(figsize=(10, 6))
+        ax3 = fig2.add_subplot(111, projection="3d")
+
+        ax3.set_title(f"Embedding (E={E}, tau={tau})")
+
+        ax3.scatter(X[:, 0], X[:, 1], X[:, 2], c=np.arange(len(X)), cmap="viridis", s=3)  # type: ignore
+        ax3.set_xlabel("X(t)")
+        ax3.set_ylabel(f"X(t-{tau})")
+        ax3.set_zlabel(f"X(t-{2 * tau})")  # type: ignore
+
+        ax3.scatter(
+            query_points[:-Tp, 0],
+            query_points[:-Tp, 1],
+            query_points[:-Tp, 2],
+            c="black",
+            s=5,  # type: ignore
+            label="Query Points",
+        )
+        ax3.scatter(
+            query_points[:-Tp, 0][large_diff],
+            query_points[:-Tp, 1][large_diff],
+            query_points[:-Tp, 2][large_diff],
+            c="red",
+            s=5,  # type: ignore
+            label="Significant Difference",
+        )
+        ax3.scatter(
+            pyedm_predictions[large_diff],
+            pyedm_predictions[large_diff],
+            pyedm_predictions[large_diff],
+            c="blue",
+            s=5,  # type: ignore
+            label="pyEDM Predictions",
+            marker="x",
+        )
+        ax3.scatter(
+            edmkit_predictions[large_diff],
+            edmkit_predictions[large_diff],
+            edmkit_predictions[large_diff],
+            c="red",
+            s=5,  # type: ignore
+            label="edmkit Predictions",
+            marker="x",
+        )
+
+        fig2.tight_layout()
+        fig2.legend()
+        plt.show()
+
+        fig3 = plt.figure(figsize=(10, 6))
+        ax4 = fig3.add_subplot(111, projection="3d")
+        ax4.set_title("Sample #1 with Significant Difference (with neighbors)")
+
+        pyedm_result_object = pyEDM.Simplex(
+            dataFrame=df, lib=lib, pred=pred, E=E, tau=-tau, columns="value", target="value", Tp=Tp, verbose=False, returnObject=True
+        )
+        print(pyedm_result_object.libOverlap)
+
+        query_point = query_points[:-Tp][large_diff][0]
+
+        pyedm_neighbor_indices = pyedm_result_object.knn_neighbors[:-Tp][large_diff][0]
+
+        D = pairwise_distance(tensor.Tensor(query_points, dtype=tensor.dtypes.float32), tensor.Tensor(X, dtype=tensor.dtypes.float32)).numpy()
+        D = np.sqrt(D)
+
+        k: int = X.shape[1] + 1
+        neighbor_indices = np.zeros((len(query_points), k))
+
+        for i in range(len(query_points)):
+            # find k nearest neighbors
+            indices, distances = topk(D[i], k, largest=False)
+            neighbor_indices[i] = indices
+
+        edmkit_neighbor_index = neighbor_indices[:-Tp][large_diff][0].astype(int)
+
+        pyedm_neighbors = X[pyedm_neighbor_indices]
+        edmkit_neighbors = X[edmkit_neighbor_index]
+
+        ax4.scatter(X[:, 0], X[:, 1], X[:, 2], c=np.arange(len(X)), cmap="viridis", s=3, zorder=1)  # type: ignore
+        ax4.scatter(
+            query_point[0],
+            query_point[1],
+            query_point[2],
+            c="black",
+            s=10,  # type: ignore
+            label="Query Point",
+            zorder=2,
+        )
+        ax4.scatter(
+            edmkit_neighbors[:, 0],
+            edmkit_neighbors[:, 1],
+            edmkit_neighbors[:, 2],
+            c="red",
+            s=10,  # type: ignore
+            label="edmkit Neighbors",
+            marker="x",
+            zorder=3,
+        )
+        ax4.scatter(
+            pyedm_neighbors[:, 0],
+            pyedm_neighbors[:, 1],
+            pyedm_neighbors[:, 2],
+            c="blue",
+            s=10,  # type: ignore
+            label="pyEDM Neighbors",
+            marker="x",
+            zorder=4,
+        )
+
+        fig3.tight_layout()
+        fig3.legend()
+        plt.show()
+
+        try:
+            assert np.abs(pyedm_rmse - edmkit_rmse) < 1e-6, f"RMSE: pyEDM {pyedm_rmse}, edmkit {edmkit_rmse}, diff {np.abs(pyedm_rmse - edmkit_rmse)}"
+        except AssertionError as e:
+            print(f"AssertionError: {e}")
+
+
+if __name__ == "__main__":
+    main()
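The index bookkeeping in `debug.py` (the `shift`, the library/query split, and the `Tp` offset) is easy to get wrong, so here is a minimal NumPy-only sketch of it. It does not call edmkit or pyEDM: the embedding is built inline with the same column layout that the `shift` comment describes, and the series is chosen so that `x[i] == i`, which makes every value readable as a time index. The constants mirror the script; everything else is illustrative.

```python
import numpy as np

n, E, tau, Tp, lib_size = 200, 3, 2, 2, 150
x = np.arange(n, dtype=float)  # x[i] == i, so values double as time indices

shift = tau * (E - 1)
# Column j holds x delayed by j * tau, so row r corresponds to time r + shift.
embedding = np.stack([x[shift - j * tau : n - j * tau] for j in range(E)], axis=1)

X = embedding[: lib_size - shift]             # library points at times shift .. lib_size - 1
Y = embedding[Tp : lib_size - shift + Tp, 0]  # targets, Tp steps after each library point
query_points = embedding[lib_size - shift :]  # query points at times lib_size .. n - 1
ground_truth = x[lib_size + Tp :]             # values the first len(query_points) - Tp queries forecast

print(X.shape, Y.shape, query_points.shape, ground_truth.shape)
```

With this layout, a forecast for `query_points[i]` lines up with `ground_truth[i]`, which is why the script drops the last `Tp` predictions before computing the RMSE.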
edmkit-0.0.0/pyproject.toml
ADDED
@@ -0,0 +1,15 @@
+[project]
+name = "edmkit"
+version = "0.0.0"
+description = "Simple EDM (Empirical Dynamic Modeling) library"
+authors = [{ name = "FUJISHIGE TEMMA", email = "tenma.x0@gmail.com" }]
+readme = "README.md"
+requires-python = ">= 3.13"
+dependencies = ["numpy>=2.3.4", "tinygrad>=0.11.0"]
+
+[dependency-groups]
+dev = ["pyedm>=2.2.4", "pytest>=8.4.2", "ruff>=0.14.0", "ty>=0.0.1a22"]
+
+[build-system]
+requires = ["hatchling"]
+build-backend = "hatchling.build"
edmkit-0.0.0/ruff.toml
ADDED
@@ -0,0 +1 @@
+line-length = 150
edmkit-0.0.0/src/edmkit/ccm.py
ADDED
@@ -0,0 +1,76 @@
+import numpy as np
+
+from edmkit.embedding import lagged_embed
+from edmkit.tensor import Tensor, dtypes
+from edmkit.util import pad, pairwise_distance, topk
+
+
+def calculate_rho(observations: np.ndarray, predictions: np.ndarray):
+    assert len(observations) == len(predictions), "observations and predictions must have the same length"
+    return np.corrcoef(observations.T, predictions.T)[0, 1]
+
+
+def search_best_embedding(
+    x: np.ndarray, tau_list: list[int], e_list: list[int], Tp: int, max_L: int | None = None, rng: np.random.Generator | None = None
+):
+    assert all(tau > 0 for tau in tau_list), f"tau must be positive, got tau_list={tau_list}"
+    assert all(e > 0 for e in e_list), f"e must be positive, got e_list={e_list}"
+    assert max_L is None or max_L <= len(x), f"max_L must be less than or equal to len(x), got max_L={max_L} and len(x)={len(x)}"
+
+    if rng is None:
+        rng = np.random.default_rng()
+
+    if max_L is not None:
+        x = x[rng.choice(len(x), min(len(x), max_L), replace=False)]
+
+    # lagged_embed(x, tau, e).shape[0] == len(x) - (e - 1) * tau
+    min_L = len(x) - (max(e_list) - 1) * max(tau_list)
+    assert min_L > 0, (
+        f"Not enough data points to embed, got len(x)(={len(x)}) - (max(e)(={max(e_list)}) - 1) * max(tau)(={max(tau_list)}) = min_L(={min_L})"
+    )
+
+    embeddings: list[np.ndarray] = []
+    for tau in tau_list:
+        for e in e_list:
+            # align the time indices of the embeddings. note that the time index of the first embedding is `1 + (e - 1) * tau`
+            embeddings.append(lagged_embed(x, tau, e)[-min_L:])  # (min_L, e)
+
+    X = pad(embeddings)  # X.shape == (len(tau_list) * len(e_list), min_L, max(e_list))
+    D = pairwise_distance(Tensor(X, dtype=dtypes.float32)).numpy()
+
+    L = min_L
+    seq = np.arange(L)
+    lib_size = L // 2
+
+    rho = np.zeros((len(tau_list), len(e_list)))
+
+    for i, tau in enumerate(tau_list):
+        for j, e in enumerate(e_list):
+            batch = i * len(e_list) + j
+
+            sample_indices = np.arange(lib_size, L)
+
+            observations = X[batch, sample_indices, :e]
+            predictions = np.zeros((len(sample_indices), e), dtype=x.dtype)
+
+            for k, t in enumerate(sample_indices):
+                # [0, 1, 2, 3, 4 | 5, 6, 7, 8, 9, 10]; initialize mask, `|` separates lib and test
+                mask = np.ones(L, dtype=bool)
+
+                # [0, 1, 2, 3, 4 | 5, F, 7, 8, 9, 10], t = 6; exclude self
+                mask[t] = False
+
+                # [0, 1, 2, 3, 4 | 5, F, 7, 8, F, F ], Tp = 2; exclude last Tp points to prevent out-of-bound indexing on predictions
+                mask[-Tp:] = False
+
+                # [0, 1, 2, 3, 4 | F, F, F, F, F, F ], lib_size = 5; exclude test points
+                mask[lib_size:] = False
+
+                # find k(=e+1) nearest neighbors in phase space for simplex projection
+                indices_masked, _ = topk(D[batch, t, mask], e + 1, largest=False)
+                indices = seq[mask][indices_masked]
+                predictions[k] = X[batch, indices + Tp, :e].mean()
+
+            rho[i, j] = calculate_rho(observations, predictions)
+
+    return rho
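To make the neighbor-search step above concrete, the following is a minimal NumPy-only sketch of scoring one embedding by nearest-neighbor forecast skill. It is not the package's implementation. The function name `neighbor_forecast_rho` is hypothetical, the neighbors are averaged without weights, and, as an assumption of this sketch, each prediction is compared with the observation `Tp` steps after its test point. The library/test split and the `e + 1` neighbor count follow the comments in `ccm.py`.

```python
import numpy as np


def neighbor_forecast_rho(emb: np.ndarray, Tp: int, lib_size: int) -> float:
    """Pearson correlation between Tp-step-ahead forecasts and observations (hypothetical helper)."""
    n, e = emb.shape
    k = e + 1  # a simplex in e dimensions has e + 1 vertices

    # Library points whose Tp-step-ahead target still lies inside the library half.
    lib_idx = np.arange(0, lib_size - Tp)
    # Test points whose Tp-step-ahead observation exists.
    test_idx = np.arange(lib_size, n - Tp)

    # Euclidean distances from every test point to every library point.
    diff = emb[test_idx][:, None, :] - emb[lib_idx][None, :, :]
    dist = np.sqrt((diff**2).sum(axis=-1))

    # Indices (into emb) of the k nearest library points for each test point.
    nearest = lib_idx[np.argsort(dist, axis=1)[:, :k]]

    # Forecast: unweighted mean of where the neighbors were Tp steps later.
    predictions = emb[nearest + Tp, 0].mean(axis=1)
    observations = emb[test_idx + Tp, 0]
    return float(np.corrcoef(observations, predictions)[0, 1])


# Logistic map, same parameters as the generator in debug.py.
x = np.empty(200)
x[0] = 0.1
for i in range(1, len(x)):
    x[i] = 3.8 * x[i - 1] * (1 - x[i - 1])

emb = np.column_stack([x[1:], x[:-1]])  # e = 2, tau = 1, newest value first
rho = neighbor_forecast_rho(emb, Tp=1, lib_size=len(emb) // 2)
print(f"rho = {rho:.3f}")
```

Running this over a grid of `(tau, e)` pairs and keeping the pair with the highest correlation is the idea that the shape of the returned `rho` array suggests.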
edmkit-0.0.0/src/edmkit/embedding.py
ADDED
@@ -0,0 +1,56 @@
+import numpy as np
+
+
+def lagged_embed(x: np.ndarray, tau: int, e: int):
+    """Lagged embedding of a time series `x`.
+
+    Parameters
+    ----------
+    `x` : `np.ndarray` of shape `(N,)`
+    `tau` : `int`
+    `e` : `int`
+
+    Returns
+    -------
+    `np.ndarray` of shape `(N - (e - 1) * tau, e)`
+
+    Raises
+    ------
+    AssertionError
+        - If `x` is not a 1D array.
+        - If `tau` or `e` is not positive.
+        - If `(e - 1) * tau >= len(x)`.
+
+    Notes
+    -----
+    - While open to interpretation, it's generally more intuitive to consider the embedding as starting from the `(e - 1) * tau`th element of the original time series and ending at the `len(x) - 1`th element (the last value), rather than starting from the 0th element and ending at `len(x) - 1 - (e - 1) * tau`.
+    - This distinction reflects whether we think of "attaching past values to the present" or "attaching future values to the present". The information content of the result is the same either way.
+    - The use of `reversed` in the implementation emphasizes this perspective.
+
+    Examples
+    --------
+    ```
+    import numpy as np
+    from edmkit.embedding import lagged_embed
+
+    x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
+    tau = 2
+    e = 3
+
+    E = lagged_embed(x, tau, e)
+    print(E)
+    print(E.shape)
+    # [[4 2 0]
+    # [5 3 1]
+    # [6 4 2]
+    # [7 5 3]
+    # [8 6 4]
+    # [9 7 5]]
+    # (6, 3)
+    ```
+    """
+    assert len(x.shape) == 1, f"x must be a 1D array, got x.shape={x.shape}"
+    assert tau > 0 and e > 0, f"tau and e must be positive, got tau={tau}, e={e}"
+    assert (e - 1) * tau < x.shape[0], f"e and tau must satisfy `(e - 1) * tau < len(x)`, got e={e}, tau={tau}"
+
+    return np.array([x[tau * (e - 1) :]] + [x[tau * i : -tau * ((e - 1) - i)] for i in reversed(range(e - 1))]).transpose()
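The slicing in the `return` statement is compact, so here is an equivalent formulation written out with explicit start and stop indices. This is an illustrative sketch, not the package's code: `lagged_embed_sketch` is a hypothetical name, and it raises `ValueError` rather than using assertions. It reproduces the example from the docstring above.

```python
import numpy as np


def lagged_embed_sketch(x: np.ndarray, tau: int, e: int) -> np.ndarray:
    """Row r holds [x[t], x[t - tau], ..., x[t - (e - 1) * tau]] with t = r + (e - 1) * tau."""
    if x.ndim != 1:
        raise ValueError(f"x must be 1D, got shape {x.shape}")
    if tau <= 0 or e <= 0:
        raise ValueError(f"tau and e must be positive, got tau={tau}, e={e}")
    shift = (e - 1) * tau
    if shift >= len(x):
        raise ValueError(f"(e - 1) * tau must be smaller than len(x), got {shift} >= {len(x)}")

    n_rows = len(x) - shift
    # Column j is the series delayed by j * tau, trimmed to the common length n_rows.
    columns = [x[shift - j * tau : shift - j * tau + n_rows] for j in range(e)]
    return np.stack(columns, axis=1)


x = np.arange(10)
E = lagged_embed_sketch(x, tau=2, e=3)
print(E)
print(E.shape)  # (6, 3)
```

The first column is always the most recent value, which matches the "attaching past values to the present" reading described in the Notes section of the docstring.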