GLDF 0.9.0__py3-none-any.whl
- GLDF/__init__.py +2 -0
- GLDF/bridges/__init__.py +0 -0
- GLDF/bridges/causal_learn.py +185 -0
- GLDF/bridges/tigramite.py +143 -0
- GLDF/bridges/tigramite_plotting_modified.py +4764 -0
- GLDF/cit.py +274 -0
- GLDF/data_management.py +588 -0
- GLDF/data_processing.py +754 -0
- GLDF/frontend.py +537 -0
- GLDF/hccd.py +403 -0
- GLDF/hyperparams.py +205 -0
- GLDF/independence_atoms.py +78 -0
- GLDF/state_space_construction.py +288 -0
- GLDF/tutorials/01_preconfigured_quickstart.ipynb +302 -0
- GLDF/tutorials/02_detailed_configuration.ipynb +394 -0
- GLDF/tutorials/03_custom_patterns.ipynb +447 -0
- gldf-0.9.0.dist-info/METADATA +101 -0
- gldf-0.9.0.dist-info/RECORD +20 -0
- gldf-0.9.0.dist-info/WHEEL +4 -0
- gldf-0.9.0.dist-info/licenses/LICENSE +621 -0
|
@@ -0,0 +1,447 @@
|
|
|
1
|
+
{
|
|
2
|
+
"cells": [
|
|
3
|
+
{
|
|
4
|
+
"cell_type": "code",
|
|
5
|
+
"execution_count": 46,
|
|
6
|
+
"id": "9d68999a",
|
|
7
|
+
"metadata": {},
|
|
8
|
+
"outputs": [],
|
|
9
|
+
"source": [
|
|
10
|
+
"import GLDF\n",
|
|
11
|
+
"import numpy as np\n",
|
|
12
|
+
"import matplotlib.pyplot as plt"
|
|
13
|
+
]
|
|
14
|
+
},
|
|
15
|
+
{
|
|
16
|
+
"cell_type": "markdown",
|
|
17
|
+
"id": "f0fdba52",
|
|
18
|
+
"metadata": {},
|
|
19
|
+
"source": [
|
|
20
|
+
"# Tutorial 03: Custom Patterns - Approximate Periodicity\n",
|
|
21
|
+
"\n",
|
|
22
|
+
"Our framework can deal with very general prior structures about patterns.\n",
|
|
23
|
+
"This tutorial demonstrates how to describe a custom pattern, using patterns that are (approximately) periodic in time as an example.\n",
|
|
24
|
+
"To this end we\n",
|
|
25
|
+
"1. Describe our prior knowledge about the pattern in a formal way.\n",
|
|
26
|
+
"2. Configure HCCD to run with this custom pattern.\n",
|
|
27
|
+
"\n",
|
|
28
|
+
"For readability, we represent periodicity patterns by strings, from which the relevant properties can readily be extracted. For example:"
|
|
29
|
+
]
|
|
30
|
+
},
|
|
31
|
+
{
|
|
32
|
+
"cell_type": "code",
|
|
33
|
+
"execution_count": 47,
|
|
34
|
+
"id": "2637fe0b",
|
|
35
|
+
"metadata": {},
|
|
36
|
+
"outputs": [],
|
|
37
|
+
"source": [
|
|
38
|
+
"repeating_pattern = \"AACBCC\""
|
|
39
|
+
]
|
|
40
|
+
},
|
|
41
|
+
{
|
|
42
|
+
"cell_type": "markdown",
|
|
43
|
+
"id": "3ee0d384",
|
|
44
|
+
"metadata": {},
|
|
45
|
+
"source": [
|
|
46
|
+
"## 1. The Pattern\n",
|
|
47
|
+
"\n",
|
|
48
|
+
"We first have to formalize what \"approximately periodic\" means.\n",
|
|
49
|
+
"We can do so by implementing a derived class of data_management.CIT_DataPatterned.\n",
|
|
50
|
+
"\n",
|
|
51
|
+
"As described in the documentation, we have to implement (at least) three methods:\n",
|
|
52
|
+
"\n",
|
|
53
|
+
"* view_blocks: This is the core functionality. It encodes our prior idea about patterns\n",
"  by supplying \"blocks\" (of a given size) of \"similar\" data-points.\n",
"  Similar, in our case, are data-points that follow the periodic pattern.\n",
"  For example, if the approximate pattern is 'AAB', we expect regimes in the data to \"usually\" (but not always)\n",
"  follow the structure AAB AAB AAB AAB AAB AAB AAB AAB (spaces for readability).\n",
"  Now, if we want blocks of \"similar\" points of, say, size 8 from this sequence, we should consider,\n",
"  for example,\n",
"\n",
"    * block No1: XX. XX. XX. XX. ... ... ... ... (8 data-points, all from 'A')\n",
"    * block No2: ... ... ... ... XX. XX. XX. XX. (8 data-points, all from 'A')\n",
"    * block No3: ..X ..X ..X ..X ..X ..X ..X ..X (8 data-points, all from 'B')\n",
"    * for the remainder of the data, repeat this logic, collecting further blocks (possibly discarding some data-points at the end if rounding issues occur).\n",
"\n",
"  The code below is a simple (albeit likely not particularly efficient) implementation of this strategy.\n",
"\n",
"* static reproject_blocks: This describes how results on blocks can be \"reprojected\" onto the time-axis for plotting. (This can be a static function on the patterned data,\n",
"  or an arbitrary method of a factory-object as shown below on 'Pattern_PeriodicPersistent'.)\n",
"* get_actual_block_format: Blocks could be two-dimensional, in which case their format is not (for plotting purposes)\n",
"  fully described by their size. In our case, we simply copy through the requested size."
|
|
72
|
+
]
|
|
73
|
+
},
|
|
74
|
+
{
|
|
75
|
+
"cell_type": "markdown",
|
|
76
|
+
"id": "392096c4",
|
|
77
|
+
"metadata": {},
|
|
78
|
+
"source": [
|
|
79
|
+
"### 1.1 Analyzing Pattern-Strings\n",
|
|
80
|
+
"\n",
|
|
81
|
+
"Consider the following example:"
|
|
82
|
+
]
|
|
83
|
+
},
|
|
84
|
+
{
|
|
85
|
+
"cell_type": "code",
|
|
86
|
+
"execution_count": 48,
|
|
87
|
+
"id": "7f3551d3",
|
|
88
|
+
"metadata": {},
|
|
89
|
+
"outputs": [
|
|
90
|
+
{
|
|
91
|
+
"name": "stdout",
|
|
92
|
+
"output_type": "stream",
|
|
93
|
+
"text": [
|
|
94
|
+
"AACBCC\n",
|
|
95
|
+
"6\n",
|
|
96
|
+
"{'B', 'A', 'C'}\n",
|
|
97
|
+
"{'B': 1, 'A': 2, 'C': 3}\n",
|
|
98
|
+
"{'B': array([False, False, False, True, False, False]), 'A': array([ True, True, False, False, False, False]), 'C': array([False, False, True, False, True, True])}\n"
|
|
99
|
+
]
|
|
100
|
+
}
|
|
101
|
+
],
|
|
102
|
+
"source": [
|
|
103
|
+
"repeating_pattern = \"AACBCC\"\n",
"\n",
"pattern_len = len(repeating_pattern)\n",
"repeating_elements = set(repeating_pattern) # letters in pattern, e.g. {A, B, C} for pattern AACBCC\n",
"repetitions = {elem: repeating_pattern.count(elem) for elem in repeating_elements} # number of occurrences of each letter per period, e.g. {A:2, B:1, C:3}\n",
"masks = {elem: np.array(list(repeating_pattern))==elem for elem in repeating_elements} # per letter: boolean mask over one period\n",
"\n",
"print(repeating_pattern)\n",
"print(pattern_len)\n",
"print(repeating_elements)\n",
"print(repetitions)\n",
"print(masks)"
|
|
115
|
+
]
|
|
116
|
+
},
|
|
117
|
+
{
|
|
118
|
+
"cell_type": "markdown",
|
|
119
|
+
"id": "9e3429cc",
|
|
120
|
+
"metadata": {},
|
|
121
|
+
"source": [
|
|
122
|
+
"Generally, the following helper provides this kind of quick analysis:"
|
|
123
|
+
]
|
|
124
|
+
},
|
|
125
|
+
{
|
|
126
|
+
"cell_type": "code",
|
|
127
|
+
"execution_count": 49,
|
|
128
|
+
"id": "6f14a2cf",
|
|
129
|
+
"metadata": {},
|
|
130
|
+
"outputs": [],
|
|
131
|
+
"source": [
|
|
132
|
+
"class Pattern_PeriodicPersistent_Analyize:\n",
"    def __init__(self, repeating_pattern: str, N: int):\n",
"        self.N = N\n",
"        self.repeating_pattern = repeating_pattern\n",
"        self.pattern_len = len(repeating_pattern)\n",
"        self.repeating_elements = set(repeating_pattern) # letters in pattern, e.g. {A, B, C} for pattern AACBCC\n",
"        self.repetitions = {elem: repeating_pattern.count(elem) for elem in self.repeating_elements} # number of occurrences of each letter per period, e.g. {A:2, B:1, C:3}\n",
"        self.masks = {elem: np.array(list(repeating_pattern))==elem for elem in self.repeating_elements}\n",
"        self.full_masks = {elem: self.repeat_mask(mask) for elem, mask in self.masks.items()}\n",
"    \n",
"    def repeat_mask(self, mask):\n",
"        result = np.empty(self.N, dtype=bool)\n",
"        full_repetitions = int(len(result)/self.pattern_len)\n",
"        result[:full_repetitions*self.pattern_len] = np.tile(mask, full_repetitions)\n",
"        # fill the trailing partial period with the start of the pattern\n",
"        result[full_repetitions*self.pattern_len:] = mask[:len(result)-full_repetitions*self.pattern_len]\n",
"        return result\n",
"    \n",
"    def count_blocks_for_element(self, element, block_size) -> int:\n",
"        return int(np.count_nonzero(self.full_masks[element])/block_size)"
|
|
151
|
+
]
|
|
152
|
+
},
|
|
153
|
+
{
|
|
154
|
+
"cell_type": "markdown",
|
|
155
|
+
"id": "9e22b70f",
|
|
156
|
+
"metadata": {},
|
|
157
|
+
"source": [
|
|
158
|
+
"### 1.2 The Patterned Data\n",
|
|
159
|
+
"\n",
|
|
160
|
+
"The view_blocks method (and its per-element analogue _view_blocks_element) provides the main functionality, as described above."
|
|
161
|
+
]
|
|
162
|
+
},
|
|
163
|
+
{
|
|
164
|
+
"cell_type": "code",
|
|
165
|
+
"execution_count": 50,
|
|
166
|
+
"id": "b439432d",
|
|
167
|
+
"metadata": {},
|
|
168
|
+
"outputs": [],
|
|
169
|
+
"source": [
|
|
170
|
+
"class CIT_DataPatterned_PeriodicPersistent(GLDF.data_management.CIT_DataPatterned, Pattern_PeriodicPersistent_Analyize):\n",
"    def __init__(self, repeating_pattern: str, **args):\n",
"        GLDF.data_management.CIT_DataPatterned.__init__(self, **args)\n",
"        Pattern_PeriodicPersistent_Analyize.__init__(self, repeating_pattern, self.sample_count())\n",
"\n",
"\n",
"    def _view_blocks_element(self, element, block_size) -> tuple[np.ndarray, np.ndarray, np.ndarray | None]:\n",
"        block_count = self.count_blocks_for_element(element, block_size)\n",
"        aligned_N = block_size * block_count\n",
"\n",
"        data_masked_x = self.x_data[self.full_masks[element]]\n",
"        blocks_x = data_masked_x[:aligned_N].reshape((block_count, block_size))\n",
"        data_masked_y = self.y_data[self.full_masks[element]]\n",
"        blocks_y = data_masked_y[:aligned_N].reshape((block_count, block_size))\n",
"        if self.z_dim() > 0:\n",
"            data_masked_z = self.z_data[self.full_masks[element],:]\n",
"            blocks_z = data_masked_z[:aligned_N,:].reshape((block_count, block_size,-1))\n",
"        else:\n",
"            blocks_z = None\n",
"\n",
"        return blocks_x, blocks_y, blocks_z\n",
"\n",
"\n",
"    def view_blocks(self, block_size:int) -> GLDF.data_management.BlockView:\n",
"        all_blocks_x = []\n",
"        all_blocks_y = []\n",
"        all_blocks_z = []\n",
"        for elem in self.repeating_elements:\n",
"            blocks_x, blocks_y, blocks_z = self._view_blocks_element(elem, block_size)\n",
"            all_blocks_x.append(blocks_x)\n",
"            all_blocks_y.append(blocks_y)\n",
"            all_blocks_z.append(blocks_z)\n",
"        return GLDF.data_management.BlockView(\n",
"            pattern_provider=self,\n",
"            cache_id=None if self.cache_id is None else (*self.cache_id, block_size),\n",
"            x_blocks=np.vstack(all_blocks_x),\n",
"            y_blocks=np.vstack(all_blocks_y),\n",
"            z_blocks=np.vstack(all_blocks_z) if self.z_dim() > 0 else None\n",
"        )\n",
"    \n",
"    @staticmethod\n",
"    def get_actual_block_format(requested_size: int) -> int:\n",
"        return requested_size"
|
|
213
|
+
]
|
|
214
|
+
},
|
|
215
|
+
{
|
|
216
|
+
"cell_type": "markdown",
|
|
217
|
+
"id": "bd1ad9a9",
|
|
218
|
+
"metadata": {},
|
|
219
|
+
"source": [
|
|
220
|
+
"### 1.3 Factory Object\n",
|
|
221
|
+
"Finally, we wrap the above pattern into a factory-object and provide reprojection-logic for plotting."
|
|
222
|
+
]
|
|
223
|
+
},
|
|
224
|
+
{
|
|
225
|
+
"cell_type": "code",
|
|
226
|
+
"execution_count": 51,
|
|
227
|
+
"id": "c9a4bb8d",
|
|
228
|
+
"metadata": {},
|
|
229
|
+
"outputs": [],
|
|
230
|
+
"source": [
|
|
231
|
+
"class Pattern_PeriodicPersistent:\n",
"    def __init__(self, repeating_pattern: str):\n",
"        self.repeating_pattern = repeating_pattern\n",
"\n",
"    def __call__(self, **args):\n",
"        return CIT_DataPatterned_PeriodicPersistent(repeating_pattern=self.repeating_pattern, **args)\n",
"    \n",
"    def reproject_blocks(self, value_per_block: np.ndarray, block_configuration: GLDF.data_management.BlockView, data_configuration: tuple[int,...]) -> dict[str, np.ndarray]:\n",
"        analyize = Pattern_PeriodicPersistent_Analyize(self.repeating_pattern, N=block_configuration.pattern_provider.sample_count()) # patterned-data sample-size accounts for tau-max (window-count < N)\n",
"        \n",
"        block_size = block_configuration.block_size()\n",
"        read_offset = 0\n",
"        results = {}\n",
"\n",
"        for elem in analyize.repeating_elements:\n",
"            block_count = analyize.count_blocks_for_element(elem, block_size)\n",
"            elem_values = value_per_block[read_offset:read_offset+block_count]\n",
"            read_offset += block_count\n",
"            result = np.full( data_configuration, float(\"nan\") )\n",
"            renorm_block_size = int(block_size*analyize.pattern_len/analyize.repetitions[elem])\n",
"            result[:block_count*renorm_block_size] = elem_values.repeat(renorm_block_size)\n",
"            results[elem] = result\n",
"\n",
"        return results"
|
|
255
|
+
]
|
|
256
|
+
},
|
|
257
|
+
{
|
|
258
|
+
"cell_type": "markdown",
|
|
259
|
+
"id": "9636c7d3",
|
|
260
|
+
"metadata": {},
|
|
261
|
+
"source": [
|
|
262
|
+
"## 2. Modify the Configuration\n",
|
|
263
|
+
"\n",
|
|
264
|
+
"As a simple application of what is also explained in the second tutorial, we add a configuration using the above pattern in HCCD.\n",
|
|
265
|
+
"The pattern is actually associated to the data-manager, indeed we can use the default (time-series) data-manager\n",
|
|
266
|
+
"and provide the pattern in its constructor:"
|
|
267
|
+
]
|
|
268
|
+
},
|
|
269
|
+
{
|
|
270
|
+
"cell_type": "code",
|
|
271
|
+
"execution_count": 52,
|
|
272
|
+
"id": "760021f5",
|
|
273
|
+
"metadata": {},
|
|
274
|
+
"outputs": [],
|
|
275
|
+
"source": [
|
|
276
|
+
"class ConfigureHCCD_PeriodicPersistent(GLDF.frontend.ConfigureHCCD):\n",
"    def __init__(self, regimes_are_large: bool=True, alpha: float=0.01, alpha_pc1: float=0.1, repeating_pattern: str=\"AB\"):\n",
"        ts_config = GLDF.frontend.configure_hccd_temporal_regimes(regimes_are_large=regimes_are_large, alpha=alpha, alpha_pc1=alpha_pc1)\n",
"        from dataclasses import asdict\n",
"        super().__init__(**asdict(ts_config))\n",
"        self.repeating_pattern = repeating_pattern\n",
"\n",
"    def get_data_manager(self):\n",
"        return GLDF.data_management.DataManager_NumpyArray_Timeseries(self._data, pattern=Pattern_PeriodicPersistent(self.repeating_pattern))"
|
|
285
|
+
]
|
|
286
|
+
},
|
|
287
|
+
{
|
|
288
|
+
"cell_type": "code",
|
|
289
|
+
"execution_count": 53,
|
|
290
|
+
"id": "5c534929",
|
|
291
|
+
"metadata": {},
|
|
292
|
+
"outputs": [],
|
|
293
|
+
"source": [
|
|
294
|
+
"import numpy as np\n",
"\n",
"N = 1000\n",
"\n",
"# Odd times: the regime is active only in the last quarter\n",
"R1 = np.mod( np.arange(N), 2 ) == 1\n",
"R1[:int(3*N/4)] = False\n",
"\n",
"# Even times: the regime is always active (the commented-out lines would restrict it to the middle half)\n",
"R2 = np.mod( np.arange(N), 2 ) == 0\n",
"#R2[:int(N/4)] = False\n",
"#R2[int(3*N/4):] = False\n",
"\n",
"# the context is \"on\" whenever R1 or R2 is active\n",
"R = np.logical_or(R1, R2)\n",
"\n",
"\n",
"rng = np.random.default_rng()\n",
"X_noise = rng.standard_normal(N)\n",
"Y_noise = rng.standard_normal(N)\n",
"Z_noise = rng.standard_normal(N)\n",
"\n",
"X = np.empty_like(X_noise)\n",
"Y = np.empty_like(Y_noise)\n",
"Z = np.empty_like(Z_noise)\n",
"\n",
"def lag_one_or_zero(values, t):\n",
"    return values[t-1] if t > 0 else 0.0\n",
"\n",
"for t in range(N):\n",
"    X[t] = X_noise[t] + 0.2 * lag_one_or_zero(X, t)\n",
"    Z[t] = Z_noise[t]\n",
"    Y[t] = Y_noise[t] + R[t] * lag_one_or_zero(X, t) + Z[t]\n",
"\n",
"data = np.array([X,Y,Z]).T\n",
"var_names = [\"X\", \"Y\", \"Z\"]"
|
|
330
|
+
]
|
|
331
|
+
},
|
|
332
|
+
{
|
|
333
|
+
"cell_type": "markdown",
|
|
334
|
+
"id": "b7333338",
|
|
335
|
+
"metadata": {},
|
|
336
|
+
"source": [
|
|
337
|
+
"About the commented-out lines: if only, e.g., half of the data at odd times were used, only about a quarter of the data would remain, which may not suffice, especially for \"small\" N.\n",
"In principle, even times can have regimes of their own. These are currently commented out because otherwise persistent regimes would also work (in principle, or at least for infinite N)."
|
|
339
|
+
]
|
|
340
|
+
},
|
|
341
|
+
{
|
|
342
|
+
"cell_type": "code",
|
|
343
|
+
"execution_count": 54,
|
|
344
|
+
"id": "db5d4477",
|
|
345
|
+
"metadata": {},
|
|
346
|
+
"outputs": [
|
|
347
|
+
{
|
|
348
|
+
"data": {
|
|
349
|
+
"image/png": "",
|
|
350
|
+
"text/plain": [
|
|
351
|
+
"<Figure size 800x200 with 4 Axes>"
|
|
352
|
+
]
|
|
353
|
+
},
|
|
354
|
+
"metadata": {},
|
|
355
|
+
"output_type": "display_data"
|
|
356
|
+
}
|
|
357
|
+
],
|
|
358
|
+
"source": [
|
|
359
|
+
"fig, ax = plt.subplots(nrows=1, ncols=4, figsize=(8,2))\n",
"\n",
"for ax_i in ax:\n",
"    ax_i.set_ylim(-0.05, 1.05)\n",
"\n",
"ax[0].plot(R[::2], label=\"even times\")\n",
"ax[0].plot(R[1::2], label=\"odd times\", linestyle=\"dashed\")\n",
"ax[0].legend()\n",
"\n",
"ax[1].plot(np.arange(20)+240, R[240:260], '+')\n",
"ax[2].plot(np.arange(20)+490, R[490:510], '+')\n",
"ax[3].plot(np.arange(20)+740, R[740:760], '+')\n",
"\n",
"plt.show()"
|
|
373
|
+
]
|
|
374
|
+
},
|
|
375
|
+
{
|
|
376
|
+
"cell_type": "code",
|
|
377
|
+
"execution_count": 55,
|
|
378
|
+
"id": "7c18e304",
|
|
379
|
+
"metadata": {},
|
|
380
|
+
"outputs": [
|
|
381
|
+
{
|
|
382
|
+
"data": {
|
|
383
|
+
"image/png": "",
|
|
384
|
+
"text/plain": [
|
|
385
|
+
"<Figure size 400x400 with 1 Axes>"
|
|
386
|
+
]
|
|
387
|
+
},
|
|
388
|
+
"metadata": {},
|
|
389
|
+
"output_type": "display_data"
|
|
390
|
+
},
|
|
391
|
+
{
|
|
392
|
+
"data": {
|
|
393
|
+
"image/png": "",
|
|
394
|
+
"text/plain": [
|
|
395
|
+
"<Figure size 640x480 with 1 Axes>"
|
|
396
|
+
]
|
|
397
|
+
},
|
|
398
|
+
"metadata": {},
|
|
399
|
+
"output_type": "display_data"
|
|
400
|
+
}
|
|
401
|
+
],
|
|
402
|
+
"source": [
|
|
403
|
+
"config = ConfigureHCCD_PeriodicPersistent(repeating_pattern=\"AB\")\n",
"config.indicator_resolution_granularity = 80\n",
"\n",
"result = config.run(data)\n",
"result.var_names = var_names\n",
"import matplotlib.pyplot as plt\n",
"result.plot_labeled_union_graph()\n",
"plt.show()\n",
"for mi in result.model_indicators():\n",
"    mi.plot_resolution()\n",
"    plt.legend()\n",
"    plt.show()"
|
|
415
|
+
]
|
|
416
|
+
},
|
|
417
|
+
{
|
|
418
|
+
"cell_type": "markdown",
|
|
419
|
+
"id": "81498128",
|
|
420
|
+
"metadata": {},
|
|
421
|
+
"source": [
|
|
422
|
+
"Indicator-resolution can be poor, especially when indicator_resolution_granularity is chosen poorly. Given N=1000, there are 998 windows (at max-length, including PC1-parents for MCI tests), so, for example, indicator_resolution_granularity=100 (the default) leaves a remainder of 98 data-points. In addition, blocks effectively (per odd/even sub-dataset) have temporal extension 2*indicator_resolution_granularity, which reinforces this issue."
|
|
423
|
+
]
|
|
424
|
+
}
|
|
425
|
+
],
|
|
426
|
+
"metadata": {
|
|
427
|
+
"kernelspec": {
|
|
428
|
+
"display_name": ".venv",
|
|
429
|
+
"language": "python",
|
|
430
|
+
"name": "python3"
|
|
431
|
+
},
|
|
432
|
+
"language_info": {
|
|
433
|
+
"codemirror_mode": {
|
|
434
|
+
"name": "ipython",
|
|
435
|
+
"version": 3
|
|
436
|
+
},
|
|
437
|
+
"file_extension": ".py",
|
|
438
|
+
"mimetype": "text/x-python",
|
|
439
|
+
"name": "python",
|
|
440
|
+
"nbconvert_exporter": "python",
|
|
441
|
+
"pygments_lexer": "ipython3",
|
|
442
|
+
"version": "3.13.5"
|
|
443
|
+
}
|
|
444
|
+
},
|
|
445
|
+
"nbformat": 4,
|
|
446
|
+
"nbformat_minor": 5
|
|
447
|
+
}
|
|
@@ -0,0 +1,101 @@
|
|
|
1
|
+
Metadata-Version: 2.4
|
|
2
|
+
Name: GLDF
|
|
3
|
+
Version: 0.9.0
|
|
4
|
+
Summary: GLDF: Non-Homogeneous Causal Graph Discovery
|
|
5
|
+
Project-URL: Homepage, https://martin-rabel.github.io/Causal_GLDF/
|
|
6
|
+
Project-URL: Documentation, https://martin-rabel.github.io/Causal_GLDF/
|
|
7
|
+
Project-URL: Repository, https://github.com/martin-rabel/Causal_GLDF/
|
|
8
|
+
Project-URL: Issues, https://github.com/martin-rabel/Causal_GLDF/issues
|
|
9
|
+
Author: Martin Rabel
|
|
10
|
+
Author-email: Martin Rabel <martin.rabel@uni-potsdam.de>
|
|
11
|
+
License-Expression: GPL-3.0-or-later
|
|
12
|
+
License-File: LICENSE
|
|
13
|
+
Keywords: CPD,causal discovery,causality,clustering,graph discovery,non-stationary
|
|
14
|
+
Classifier: Operating System :: OS Independent
|
|
15
|
+
Classifier: Programming Language :: Python :: 3
|
|
16
|
+
Requires-Python: >=3.10
|
|
17
|
+
Requires-Dist: numpy
|
|
18
|
+
Requires-Dist: scipy
|
|
19
|
+
Provides-Extra: full
|
|
20
|
+
Requires-Dist: causal-learn>=0.1.4.3; extra == 'full'
|
|
21
|
+
Requires-Dist: matplotlib>=3.7.0; extra == 'full'
|
|
22
|
+
Requires-Dist: networkx>=3.0; extra == 'full'
|
|
23
|
+
Requires-Dist: tigramite>=5.2.0.0; extra == 'full'
|
|
24
|
+
Provides-Extra: minimal
|
|
25
|
+
Description-Content-Type: text/markdown
|
|
26
|
+
|
|
27
|
+
# GLDF: Non-Homogeneous Causal Graph Discovery
|
|
28
|
+
|
|
29
|
+
This is the reference implementation of the framework described by [1]
|
|
30
|
+
for causal graph discovery on non-homogeneous data.
|
|
31
|
+
|
|
32
|
+
## Introduction
|
|
33
|
+
|
|
34
|
+
Algorithms for causal graph discovery (CD) on IID or stationary data are readily
|
|
35
|
+
available. However, real-world data is rarely IID or stationary.
|
|
36
|
+
This framework particularly aims to find ways in which a causal graph
|
|
37
|
+
*changes* over time, space or patterns in other extraneous information attached
|
|
38
|
+
to data. It does so by focusing on an approach that is local in the graph (gL)
|
|
39
|
+
and testing qualitative questions directly (D), hence the name gLD-framework
|
|
40
|
+
(GLDF).
|
|
41
|
+
|
|
42
|
+
Besides statistical efficiency, another major strength of this approach
|
|
43
|
+
is its modularity. This allows the present framework to directly
|
|
44
|
+
build on existing CD-algorithm implementations.
|
|
45
|
+
At the same time, almost arbitrary "patterns" generalizing persistence
|
|
46
|
+
in time or space can be leveraged to account for regime-structure.
|
|
47
|
+
The framework extensively builds on modified conditional independence tests
|
|
48
|
+
(CITs), where individual (or all) modifications can easily be customized
|
|
49
|
+
if the need arises.
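
As a rough illustration of what a "pattern generalizing persistence" can mean, the following sketch groups time indices into blocks of points that are expected to share a regime. It uses only NumPy and does not call the GLDF API; the helper names `persistent_blocks` and `periodic_blocks` are hypothetical and chosen for this example only. It mirrors the 'AAB' example from the bundled tutorial on custom patterns.

```python
import numpy as np


def persistent_blocks(n_samples: int, block_size: int) -> np.ndarray:
    """Index blocks of consecutive time steps (plain persistence in time)."""
    n_blocks = n_samples // block_size
    return np.arange(n_blocks * block_size).reshape(n_blocks, block_size)


def periodic_blocks(n_samples: int, pattern: str, block_size: int) -> dict[str, np.ndarray]:
    """Index blocks whose time steps all carry the same pattern letter."""
    letters = np.array(list(pattern))
    # Letter attached to each time step when the pattern repeats cyclically.
    labels = letters[np.arange(n_samples) % len(pattern)]
    blocks = {}
    for letter in sorted(set(pattern)):
        idx = np.flatnonzero(labels == letter)
        n_blocks = len(idx) // block_size
        # Trailing indices that do not fill a whole block are discarded.
        blocks[letter] = idx[: n_blocks * block_size].reshape(n_blocks, block_size)
    return blocks


blocks = periodic_blocks(n_samples=24, pattern="AAB", block_size=8)
print(blocks["A"])  # two blocks, each built only from 'A' positions
print(blocks["B"])  # one block, built only from 'B' positions
```

With plain persistence, a block is a contiguous stretch of time; with a periodic pattern, a block is spread over several periods but only collects positions with the same letter.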
|
|
50
|
+
|
|
51
|
+
|
|
52
|
+
## Getting Started
|
|
53
|
+
|
|
54
|
+
The best place to get started is probably the extensive [documentation](https://martin-rabel.github.io/Causal_GLDF/).
|
|
55
|
+
Further, there are tutorials in the form of jupyter-notebooks in the sub-directory "tutorials".
|
|
56
|
+
This package is designed to be easily extensible and to integrate well out of the box
|
|
57
|
+
with [tigramite](https://github.com/jakobrunge/tigramite).
|
|
58
|
+
|
|
59
|
+
|
|
60
|
+
## Requirements
|
|
61
|
+
|
|
62
|
+
|
|
63
|
+
Minimal:
|
|
64
|
+
|
|
65
|
+
* python (version 3.10 or newer recommended, tested on 3.13.5) and its standard library
|
|
66
|
+
* numpy (version 2.0.0 or newer, tested on 2.3.2)
|
|
67
|
+
* scipy (version 1.10.0 or newer, tested on 1.16.1)
|
|
68
|
+
|
|
69
|
+
Recommended (additionally):
|
|
70
|
+
|
|
71
|
+
* matplotlib (version 3.7.0 or newer, tested on 3.10.5)
|
|
72
|
+
* tigramite (version 5.2.0.0 or newer, tested on 5.2.8.2)
|
|
73
|
+
* networkx (version 3.0 or newer, tested on 3.5), used via tigramite's graph plotting
|
|
74
|
+
* causal-learn (tested on 0.1.4.3)
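
To see which of the packages listed above are present in the current environment, a small check along the following lines may help. This is a sketch using only the standard library's `importlib.metadata`; the function name `installed_versions` is hypothetical, and the distribution names are taken from the lists above.

```python
from importlib.metadata import PackageNotFoundError, version

REQUIRED = ("numpy", "scipy")
OPTIONAL = ("matplotlib", "tigramite", "networkx", "causal-learn")


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


report = installed_versions(REQUIRED + OPTIONAL)
for name, ver in report.items():
    print(f"{name}: {ver or 'not installed'}")
```

The check only reports what is installed; it does not compare against the minimum versions given above.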
|
|
75
|
+
|
|
76
|
+
|
|
77
|
+
|
|
78
|
+
## References
|
|
79
|
+
|
|
80
|
+
[1] M. Rabel, J. Runge.
|
|
81
|
+
Context-Specific Causal Graph Discovery with Unobserved Contexts: Non-Stationarity, Regimes and Spatio-Temporal Patterns.
|
|
82
|
+
*archive preprint* [arXiv:2511.21537](https://arxiv.org/abs/2511.21537), 2025.
|
|
83
|
+
|
|
84
|
+
|
|
85
|
+
## User Agreement
|
|
86
|
+
|
|
87
|
+
By downloading this software you agree with the following points:
|
|
88
|
+
This software is provided without any warranty or conditions of any kind. We assume no responsibility for errors or omissions in the results and interpretations following from application of this software.
|
|
89
|
+
|
|
90
|
+
You commit to citing the paper listed above in your reports or publications.
|
|
91
|
+
|
|
92
|
+
|
|
93
|
+
## License
|
|
94
|
+
|
|
95
|
+
Copyright (C) 2025-2026 Martin Rabel
|
|
96
|
+
|
|
97
|
+
GNU General Public License v3.0
|
|
98
|
+
|
|
99
|
+
See the file LICENSE for full text.
|
|
100
|
+
|
|
101
|
+
This package is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version. This package is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
|