patchfm 1.1.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
patchfm-1.1.0/LICENSE ADDED
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2025 Samy-Melwan Vilhes
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
patchfm-1.1.0/PKG-INFO ADDED
@@ -0,0 +1,128 @@
1
+ Metadata-Version: 2.4
2
+ Name: patchfm
3
+ Version: 1.1.0
4
+ Summary: a Foundation Model for Univariate Time Series Forecasting
5
+ Author-email: Samy-Melwan Vilhes <samy-melwan.vilhes@insa-rouen.fr>
6
+ License: MIT License
7
+
8
+ Copyright (c) 2025 Samy-Melwan Vilhes
9
+
10
+ Permission is hereby granted, free of charge, to any person obtaining a copy
11
+ of this software and associated documentation files (the "Software"), to deal
12
+ in the Software without restriction, including without limitation the rights
13
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
14
+ copies of the Software, and to permit persons to whom the Software is
15
+ furnished to do so, subject to the following conditions:
16
+
17
+ The above copyright notice and this permission notice shall be included in all
18
+ copies or substantial portions of the Software.
19
+
20
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
21
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
22
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
23
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
24
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
25
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
26
+ SOFTWARE.
27
+ Project-URL: Repository, https://github.com/vilhess/PatchFM
28
+ Project-URL: Issues, https://github.com/vilhess/PatchFM/issues
29
+ Keywords: Transformer,LLM,Time Series,Zero-shot,Deep Learning
30
+ Classifier: Programming Language :: Python :: 3
31
+ Classifier: Operating System :: OS Independent
32
+ Requires-Python: >=3.11
33
+ Description-Content-Type: text/markdown
34
+ License-File: LICENSE
35
+ Requires-Dist: torch>=2.5.0
36
+ Requires-Dist: einops>=0.8.1
37
+ Requires-Dist: huggingface-hub>=0.35.1
38
+ Requires-Dist: rotary-embedding-torch>=0.8.9
39
+ Requires-Dist: numpy>=1.26.0
40
+ Dynamic: license-file
41
+
42
+ # A tutorial on how to build a Foundation Model for Univariate Time Series Forecasting
43
+
44
+ [Huggingface Model Card](https://huggingface.co/vilhess/PatchFM)
45
+
46
+ A transformer-based forecasting model for univariate time series. The approach mirrors Large Language Model (LLM) practice (next-token prediction becomes next-patch prediction) while remaining lightweight and practical compared to a classic LLM.
47
+
48
+ ## Highlights
49
+ - Next-patch prediction objective (autoregressive, causal)
50
+ - Patch-based representation of time series (tokens ↔ patches)
51
+ - Causal masking self-attention with RoPE (relative positions)
52
+ - RevIN (Reversible Instance Normalization) with causal statistics
53
+ - SwiGLU feed-forward networks
54
+ - Multi-quantile outputs (median + uncertainty bands)
55
+ - Efficient rollout with KV caching
56
+
57
+ ## Installation
58
+ ```bash
59
+ pip install patchfm
60
+ ```
61
+
62
+ ## Quick Start
63
+
64
+ ```python
65
+ import torch
66
+ from patchfm.configs import PatchFMConfig
67
+ from patchfm.model import Forecaster
68
+
69
+ # --- Instantiate model ---
70
+ config = PatchFMConfig()
71
+ model = Forecaster(config)
72
+
73
+ # --- Inference ---
74
+ forecast_horizon = 64
75
+ seq = torch.randn(1, 1024) # (batch, time)
76
+ pred_median, pred_quantiles = model(seq, forecast_horizon=forecast_horizon, quantiles=[0.1, 0.5, 0.9]) # (batch, horizon), (batch, horizon, quantiles)
77
+ ```
78
+
79
+ We provide an extended quick start example in [notebooks/tutorial.ipynb](./notebooks/tutorial.ipynb).
80
+ If you don't have suitable hardware, you can also run the extended quick start example in Google Colab:
81
+
82
+ <a target="_blank" href="https://colab.research.google.com/drive/17sdf-7luCkv5TaeLj3Z6kIaTDkwkz3VR?usp=share_link">
83
+ <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open Quick Start In Colab"/>
84
+ </a>
85
+
86
+ ## Method (TL;DR)
87
+ - Patching: Split a context signal of length $w$ into $P_{num} = w / P_{len}$ patches of length $P_{len}$.
88
+ - RevIN: Normalize patches using causal running mean/variance over past patches, and denormalize outputs to the original scale.
89
+ - Architecture: Input residual MLP → stacked Transformer blocks (MHA + SwiGLU FFN, pre-norm, residual) → $|\mathcal{Q}|$ output heads mapping back to patch space.
90
+ - Positional encoding: Rotary Position Embeddings (RoPE) applied to queries/keys.
91
+ - Training: Multi-quantile (pinball) loss across positions, elements, and quantiles $\mathcal{Q}$.
92
+ - Inference: Predict next patch; roll out autoregressively with KV caching for long horizons.
93
+
94
+ ## Problem Formulation
95
+ Given context patches $x_{p_1}, \ldots, x_{p_n}$, predict the next patch $x_{p_{i+1}}$ for each position $i$ using only past patches (causality). The model outputs quantiles $\{\hat{x}_{p_{i+1}}^{(q)}: q \in \mathcal{Q}\}$ with median (q=0.5) as the point forecast.
96
+
97
+ ## Loss: Multi-Quantile (Pinball)
98
+ For residual $u = x - \hat{x}^{(q)}$:
99
+ $$\rho_q(u) = \begin{cases} q\,u, & u \ge 0,\\ (q-1)\,u, & u < 0. \end{cases}$$
100
+ Aggregate over positions, patch elements, and quantiles.
101
+
102
+ ## Architecture
103
+ - Input MLP: $\mathbb{R}^{P_{len}} \to \mathbb{R}^{dim}$ residual 2-layer MLP (ReLU)
104
+ - Multi-Head Attention: causal mask, RoPE; queries/keys/values per head
105
+ - FFN: SwiGLU (SiLU-gated), pre-norm + residual
106
+ - Output heads: |Q| linear maps $\mathbb{R}^{dim} \to \mathbb{R}^{P_{len}}$ (one per quantile)
107
+
108
+ ### Model Details
109
+ - Patch size: 32
110
+ - Max context: 32 patches (1024 steps)
111
+ - Forecast horizon: 32 steps per forward pass
112
+ - Quantiles $\mathcal{Q}$: {0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9}
113
+ - Layers: 6
114
+ - Attention heads: 64 (head dim 32)
115
+ - Model dim: 2048
116
+ - Parameters: ~300M
117
+
118
+ ## Inference
119
+ - Single step: predict next patch ($P_{len}$ values)
120
+ - Long-horizon: append prediction to context and repeat (optionally drop oldest patch to keep window fixed)
121
+ - KV caching: reuse cached keys/values for past patches; compute new Q/K/V only for the appended patch
122
+
123
+ ## Acknowledgements
124
+ We thank the authors of the following repositories for inspiration and code snippets:
125
+ - [TiRex](https://github.com/NX-AI/tirex)
126
+
127
+ ## Citation
128
+ If you use this work, please cite the paper ...
@@ -0,0 +1,87 @@
1
+ # A tutorial on how to build a Foundation Model for Univariate Time Series Forecasting
2
+
3
+ [Huggingface Model Card](https://huggingface.co/vilhess/PatchFM)
4
+
5
+ A transformer-based forecasting model for univariate time series. The approach mirrors Large Language Model (LLM) practice (next-token prediction becomes next-patch prediction) while remaining lightweight and practical compared to a classic LLM.
6
+
7
+ ## Highlights
8
+ - Next-patch prediction objective (autoregressive, causal)
9
+ - Patch-based representation of time series (tokens ↔ patches)
10
+ - Causal masking self-attention with RoPE (relative positions)
11
+ - RevIN (Reversible Instance Normalization) with causal statistics
12
+ - SwiGLU feed-forward networks
13
+ - Multi-quantile outputs (median + uncertainty bands)
14
+ - Efficient rollout with KV caching
15
+
16
+ ## Installation
17
+ ```bash
18
+ pip install patchfm
19
+ ```
20
+
21
+ ## Quick Start
22
+
23
+ ```python
24
+ import torch
25
+ from patchfm.configs import PatchFMConfig
26
+ from patchfm.model import Forecaster
27
+
28
+ # --- Instantiate model ---
29
+ config = PatchFMConfig()
30
+ model = Forecaster(config)
31
+
32
+ # --- Inference ---
33
+ forecast_horizon = 64
34
+ seq = torch.randn(1, 1024) # (batch, time)
35
+ pred_median, pred_quantiles = model(seq, forecast_horizon=forecast_horizon, quantiles=[0.1, 0.5, 0.9]) # (batch, horizon), (batch, horizon, quantiles)
36
+ ```
37
+
38
+ We provide an extended quick start example in [notebooks/tutorial.ipynb](./notebooks/tutorial.ipynb).
39
+ If you don't have suitable hardware, you can also run the extended quick start example in Google Colab:
40
+
41
+ <a target="_blank" href="https://colab.research.google.com/drive/17sdf-7luCkv5TaeLj3Z6kIaTDkwkz3VR?usp=share_link">
42
+ <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open Quick Start In Colab"/>
43
+ </a>
44
+
45
+ ## Method (TL;DR)
46
+ - Patching: Split a context signal of length $w$ into $P_{num} = w / P_{len}$ patches of length $P_{len}$.
47
+ - RevIN: Normalize patches using causal running mean/variance over past patches, and denormalize outputs to the original scale.
48
+ - Architecture: Input residual MLP → stacked Transformer blocks (MHA + SwiGLU FFN, pre-norm, residual) → $|\mathcal{Q}|$ output heads mapping back to patch space.
49
+ - Positional encoding: Rotary Position Embeddings (RoPE) applied to queries/keys.
50
+ - Training: Multi-quantile (pinball) loss across positions, elements, and quantiles $\mathcal{Q}$.
51
+ - Inference: Predict next patch; roll out autoregressively with KV caching for long horizons.
52
+
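The patching step is just a reshape; a minimal standalone sketch (not the package's own code, which uses `einops`), with the default $P_{len} = 32$:

```python
import torch

patch_len = 32           # P_len
w = 1024                 # context length, divisible by patch_len

seq = torch.randn(2, w)  # (batch, w)

# Split each length-w signal into w // patch_len contiguous patches
patches = seq.reshape(seq.shape[0], w // patch_len, patch_len)
print(patches.shape)  # torch.Size([2, 32, 32])
```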
53
+ ## Problem Formulation
54
+ Given context patches $x_{p_1}, \ldots, x_{p_n}$, predict the next patch $x_{p_{i+1}}$ for each position $i$ using only past patches (causality). The model outputs quantiles $\{\hat{x}_{p_{i+1}}^{(q)}: q \in \mathcal{Q}\}$ with median (q=0.5) as the point forecast.
55
+
56
+ ## Loss: Multi-Quantile (Pinball)
57
+ For residual $u = x - \hat{x}^{(q)}$:
58
+ $$\rho_q(u) = \begin{cases} q\,u, & u \ge 0,\\ (q-1)\,u, & u < 0. \end{cases}$$
59
+ Aggregate over positions, patch elements, and quantiles.
60
+
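The piecewise definition above translates directly into code; a minimal sketch of the per-element pinball loss (an illustration, not the package's training code):

```python
import torch

def pinball_loss(target: torch.Tensor, pred: torch.Tensor, q: float) -> torch.Tensor:
    """rho_q(u) with u = target - pred: q*u when u >= 0, (q-1)*u when u < 0."""
    u = target - pred
    return torch.where(u >= 0, q * u, (q - 1) * u)

target = torch.tensor([0.0, 1.0])
pred = torch.tensor([1.0, 0.0])
loss = pinball_loss(target, pred, q=0.1)
# loss ≈ [0.9, 0.1]: for a low quantile, over-predicting costs (1-q)|u|,
# under-predicting only q|u|, which pushes the head toward the 0.1 quantile
```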
61
+ ## Architecture
62
+ - Input MLP: $\mathbb{R}^{P_{len}} \to \mathbb{R}^{dim}$ residual 2-layer MLP (ReLU)
63
+ - Multi-Head Attention: causal mask, RoPE; queries/keys/values per head
64
+ - FFN: SwiGLU (SiLU-gated), pre-norm + residual
65
+ - Output heads: |Q| linear maps $\mathbb{R}^{dim} \to \mathbb{R}^{P_{len}}$ (one per quantile)
66
+
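For reference, the SiLU-gated FFN mentioned above looks like this (a minimal sketch with illustrative dimensions; the package's own implementation additionally rounds the hidden size to a multiple of 256):

```python
import torch
import torch.nn as nn

class SwiGLU(nn.Module):
    """SwiGLU feed-forward: w2(silu(w1(x)) * w3(x)), no biases."""
    def __init__(self, d_model: int, hidden_dim: int):
        super().__init__()
        self.w1 = nn.Linear(d_model, hidden_dim, bias=False)  # gate branch
        self.w2 = nn.Linear(hidden_dim, d_model, bias=False)  # down projection
        self.w3 = nn.Linear(d_model, hidden_dim, bias=False)  # value branch
        self.act = nn.SiLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.w2(self.act(self.w1(x)) * self.w3(x))

ffn = SwiGLU(d_model=64, hidden_dim=128)
out = ffn(torch.randn(2, 8, 64))  # shape preserved: (batch, patches, d_model)
```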
67
+ ### Model Details
68
+ - Patch size: 32
69
+ - Max context: 32 patches (1024 steps)
70
+ - Forecast horizon: 32 steps per forward pass
71
+ - Quantiles $\mathcal{Q}$: {0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9}
72
+ - Layers: 6
73
+ - Attention heads: 64 (head dim 32)
74
+ - Model dim: 2048
75
+ - Parameters: ~300M
76
+
77
+ ## Inference
78
+ - Single step: predict next patch ($P_{len}$ values)
79
+ - Long-horizon: append prediction to context and repeat (optionally drop oldest patch to keep window fixed)
80
+ - KV caching: reuse cached keys/values for past patches; compute new Q/K/V only for the appended patch
81
+
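The long-horizon rollout can be sketched as below. `model_step` is a hypothetical stand-in for one forward pass returning the next patch (the real model also maintains a KV cache, elided here):

```python
import torch

patch_len = 32

def model_step(context: torch.Tensor) -> torch.Tensor:
    """Hypothetical one-step predictor: (batch, w) -> next patch (batch, patch_len).
    A real forward pass would run the transformer; here we just repeat the last value."""
    return context[:, -1:].repeat(1, patch_len)

def rollout(context: torch.Tensor, horizon: int) -> torch.Tensor:
    preds = []
    steps = -(-horizon // patch_len)  # ceil division, as in the package
    for _ in range(steps):
        next_patch = model_step(context)
        preds.append(next_patch)
        # Append the prediction and drop the oldest patch to keep the window fixed
        context = torch.cat([context[:, patch_len:], next_patch], dim=1)
    return torch.cat(preds, dim=1)[:, :horizon]

forecast = rollout(torch.randn(1, 1024), horizon=100)  # (1, 100)
```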
82
+ ## Acknowledgements
83
+ We thank the authors of the following repositories for inspiration and code snippets:
84
+ - [TiRex](https://github.com/NX-AI/tirex)
85
+
86
+ ## Citation
87
+ If you use this work, please cite the paper ...
@@ -0,0 +1,30 @@
1
+ [project]
2
+ name = "patchfm"
3
+ version = "1.1.0"
4
+ authors = [
5
+ { name="Samy-Melwan Vilhes", email="samy-melwan.vilhes@insa-rouen.fr" },
6
+ ]
7
+ description = "a Foundation Model for Univariate Time Series Forecasting"
8
+ readme = "README.md"
9
+ license = {file="LICENSE"}
10
+ requires-python = ">=3.11"
11
+ classifiers = [
12
+ "Programming Language :: Python :: 3",
13
+ "Operating System :: OS Independent",
14
+ ]
15
+ keywords = ["Transformer", "LLM", "Time Series", "Zero-shot", "Deep Learning"]
16
+ dependencies = [
17
+ "torch>=2.5.0",
18
+ "einops>=0.8.1",
19
+ "huggingface-hub>=0.35.1",
20
+ "rotary-embedding-torch>=0.8.9",
21
+ "numpy>=1.26.0"
22
+ ]
23
+
24
+ [project.urls]
25
+ Repository = "https://github.com/vilhess/PatchFM"
26
+ Issues = "https://github.com/vilhess/PatchFM/issues"
27
+
28
+ [build-system]
29
+ requires = ["setuptools >= 77.0.3"]
30
+ build-backend = "setuptools.build_meta"
@@ -0,0 +1,4 @@
1
+ [egg_info]
2
+ tag_build =
3
+ tag_date = 0
4
+
@@ -0,0 +1,2 @@
1
+ from patchfm.inference.forecaster import Forecaster
2
+ from patchfm.configs.model_config import PatchFMConfig
@@ -0,0 +1,21 @@
1
+ from dataclasses import dataclass, field, asdict
2
+
3
+ @dataclass
4
+ class PatchFMConfig:
5
+ max_seq_len: int = 1024
6
+ patch_len: int = 32
7
+ d_model: int = 2048
8
+ n_heads: int = 64
9
+ n_layers_encoder: int = 6
10
+ quantiles: list[float] = field(default_factory=lambda: [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9])
11
+
12
+ compile: bool = True
13
+
14
+ def __getitem__(self, key):
15
+ return getattr(self, key)
16
+
17
+ def __setitem__(self, key, value):
18
+ return setattr(self, key, value)
19
+
20
+ def to_dict(self):
21
+ return asdict(self)
@@ -0,0 +1,129 @@
1
+ import torch
2
+ import torch.nn as nn
3
+ from einops import rearrange
4
+ from patchfm.inference.modules import RevIN, ResidualBlock, TransformerEncoder, PatchFM
5
+
6
+
7
+ # --- Forecaster Model ---
8
+ class Forecaster(nn.Module):
9
+ def __init__(self, config):
10
+ super().__init__()
11
+
12
+ # Store config
13
+ self.max_seq_len = config["max_seq_len"]
14
+ self.patch_len = config["patch_len"]
15
+ self.d_model = config["d_model"]
16
+ self.n_heads = config["n_heads"]
17
+ self.n_layers_encoder = config["n_layers_encoder"]
18
+ self.quantiles = config["quantiles"]
19
+ self.n_quantiles = len(self.quantiles)
20
+
21
+ print("Loading base model from HuggingFace Hub...")
22
+ base_model = PatchFM.from_pretrained("vilhess/PatchFM")
23
+ self._init_from_base(base_model)
24
+
25
+ self.eval()
26
+ self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
27
+ self.to(self.device)
28
+
29
+ if config["compile"]:
30
+ # Rebinding `self` inside __init__ is a no-op for callers; compile the encoder submodule instead
+ self.transformer_encoder = torch.compile(self.transformer_encoder)
31
+
32
+ def _init_components(self):
33
+ """Initialize modules from scratch."""
34
+ self.revin = RevIN()
35
+ self.proj_embedding = ResidualBlock(
36
+ in_dim=self.patch_len,
37
+ hid_dim=2 * self.patch_len,
38
+ out_dim=self.d_model
39
+ )
40
+ self.transformer_encoder = TransformerEncoder(
41
+ d_model=self.d_model,
42
+ n_heads=self.n_heads,
43
+ n_layers=self.n_layers_encoder
44
+ )
45
+ self.proj_output = ResidualBlock(
46
+ in_dim=self.d_model,
47
+ hid_dim=2 * self.d_model,
48
+ out_dim=self.patch_len * self.n_quantiles
49
+ )
50
+
51
+ def _init_from_base(self, base_model):
52
+ """Initialize modules by reusing a pretrained PatchFM model."""
53
+ self.revin = base_model.revin
54
+ self.proj_embedding = base_model.proj_embedding
55
+ self.transformer_encoder = base_model.transformer_encoder
56
+ self.proj_output = base_model.proj_output
57
+
58
+ @torch.inference_mode()
59
+ def forecast(self, x: torch.Tensor, forecast_horizon: int | None = None, quantiles: list[float] | None = None) -> tuple[torch.Tensor, torch.Tensor]:
60
+ x = x.to(self.device)
61
+ # Ensure input shape (bs, length)
62
+ if x.ndim != 2:
63
+ x = x.unsqueeze(0)
64
+ bs, ws = x.size()
65
+
66
+ if ws > self.max_seq_len:
67
+ print(f"Warning: Input length {ws} exceeds max_seq_len {self.max_seq_len}. Truncating input.")
68
+ x = x[:, -self.max_seq_len:]
69
+ ws = self.max_seq_len
70
+
71
+ # Pad so length is divisible by patch_len
72
+ pad = (self.patch_len - ws % self.patch_len) % self.patch_len
73
+ if pad > 0:
74
+ x = torch.cat([x[:, :1].repeat(1, pad), x], dim=1)
75
+
76
+ # Default horizon = patch_len
77
+ forecast_horizon = forecast_horizon or self.patch_len
78
+
79
+ # Reshape into patches
80
+ x = rearrange(x, "b (pn pl) -> b pn pl", pl=self.patch_len)
81
+
82
+ rollouts = -(-forecast_horizon // self.patch_len) # ceil division
83
+ predictions = []
84
+
85
+ for _ in range(rollouts):
86
+
87
+ # Forward pass
88
+ x = self.revin(x, mode="norm")
89
+ x = self.proj_embedding(x)
90
+ x = self.transformer_encoder(x)
91
+ x = x[:, -1:, :] # Keep only the last patch for autoregressive forecasting
92
+ forecasting = self.proj_output(x)
93
+ forecasting = self.revin(forecasting, mode="denorm_last")
94
+
95
+ # Reshape to (bs, patch_num, patch_len, n_quantiles)
96
+ forecasting = rearrange(
97
+ forecasting, "b 1 (pl q) -> b 1 pl q",
98
+ pl=self.patch_len, q=self.n_quantiles
99
+ )
100
+
101
+ # Take median quantile (index 4)
102
+ patch_median = forecasting[:, -1:, :, 4].detach()
103
+ predictions.append(forecasting[:, -1, :, :])
104
+
105
+ # Append median patch for next rollout
106
+ x = patch_median.clone()
107
+
108
+ pred_quantiles = torch.cat(predictions, dim=1)
109
+ pred_quantiles = pred_quantiles[:, :forecast_horizon, :]
110
+ pred_median = pred_quantiles[:, :, 4]
111
+
112
+ pred_quantiles = pred_quantiles[..., [self.quantiles.index(q) for q in quantiles]] if quantiles is not None else pred_quantiles
113
+
114
+ self.clear_cache()
115
+
116
+ if torch.any(torch.isnan(pred_median)) or torch.any(torch.isinf(pred_median)):
117
+ print("Warning: NaN or Inf values detected in predictions. Returning zeros.")
118
+ pred_median = torch.zeros_like(pred_median)
119
+ pred_quantiles = torch.zeros_like(pred_quantiles)
120
+
121
+ return pred_median, pred_quantiles
122
+
123
+ def __call__(self, context: torch.Tensor, forecast_horizon: int | None = None, quantiles: list[float] | None = None) -> tuple[torch.Tensor, torch.Tensor]:
124
+ return self.forecast(context, forecast_horizon, quantiles)
125
+
126
+ def clear_cache(self):
127
+ self.revin.clear_cache()
128
+ for layer in self.transformer_encoder.layers:
129
+ layer.attn.clear_cache()
@@ -0,0 +1,251 @@
1
+ # Modules efficient for inference with caching
2
+
3
+ import torch
4
+ import torch.nn as nn
5
+ from einops import rearrange
6
+ from rotary_embedding_torch import RotaryEmbedding
7
+ from huggingface_hub import PyTorchModelHubMixin
8
+
9
+ def fill_nan_with_last_observed(x):
10
+ bs, pn, pl = x.size()
11
+ x = rearrange(x, "b pn pl -> (b pn) pl")
12
+ valid_mask = ~torch.isnan(x)
13
+ x_temp = torch.where(valid_mask, x, torch.zeros_like(x))
14
+ seq_indices = torch.arange(x.size(-1), device=x.device).unsqueeze(0)
15
+ valid_indices = torch.where(valid_mask, seq_indices, torch.tensor(-1, device=x.device))
16
+ last_valid_idx = torch.cummax(valid_indices, dim=-1)[0]
17
+ x = x_temp.gather(-1, torch.clamp(last_valid_idx, min=0))
18
+ x = rearrange(x, "(b pn) pl -> b pn pl", b=bs)
19
+ return x
20
+
21
+ class RevIN(nn.Module):
22
+ def __init__(self, eps=1e-5):
23
+ super().__init__()
24
+ self.eps = eps
25
+ self.cached_mean = None
26
+ self.cached_std = None
27
+
28
+ self.cached_cumsum_x = None
29
+ self.cached_cumsum_x2 = None
30
+ self.cached_counts = None
31
+
32
+ def forward(self, x, mode):
33
+ assert x.dim() == 3, "Input tensor must be (batch, n_patches, patch_len)"
34
+
35
+ x64 = x.double()
36
+
37
+ if mode == "norm":
38
+ mean, std = self._get_statistics(x64)
39
+ self.cached_mean, self.cached_std = mean[:, -1:].detach(), std[:, -1:].detach()
40
+ out = (x64 - mean) / std
41
+
42
+ nan_idx = out.isnan()
43
+ if nan_idx.any():
44
+ out = fill_nan_with_last_observed(out)
45
+
46
+ elif mode == "denorm_last":
47
+ assert self.cached_mean is not None and self.cached_std is not None, \
48
+ "Call forward(..., 'norm') before 'denorm_last'"
49
+ out = x64 * self.cached_std + self.cached_mean
50
+
51
+ else:
52
+ raise NotImplementedError(f"Mode '{mode}' not implemented.")
53
+
54
+ return out.float()
55
+
56
+ def _get_statistics(self, x):
57
+ """
58
+ Numerically stable mean and variance computation using
59
+ incremental mean and variance along the patch dimension.
60
+ x: (B, P, L) float64
61
+ Returns: mean, std (both (B, P, 1))
62
+ """
63
+ B, P, L = x.shape
64
+
65
+ nan_counts = torch.isnan(x).sum(-1, keepdim=True)
66
+ nan_counts = torch.cumsum(nan_counts, dim=1)
67
+
68
+ counts = torch.arange(1, P+1, device=x.device).view(1, P, 1).repeat(B, 1, 1) * L
69
+ counts = counts - nan_counts
70
+
71
+ if self.cached_counts is not None:
72
+ counts += self.cached_counts
73
+ self.cached_counts = counts[:, -1:, :]
74
+
75
+ cumsum_x = torch.cumsum(x.nansum(dim=-1, keepdim=True), dim=1)
76
+ if self.cached_cumsum_x is not None:
77
+ cumsum_x += self.cached_cumsum_x
78
+ self.cached_cumsum_x = cumsum_x[:, -1:, :]
79
+
80
+ mean = cumsum_x / counts
81
+
82
+ cumsum_x2 = torch.cumsum((x**2).nansum(dim=-1, keepdim=True), dim=1)
83
+ if self.cached_cumsum_x2 is not None:
84
+ cumsum_x2 += self.cached_cumsum_x2
85
+ self.cached_cumsum_x2 = cumsum_x2[:, -1:, :]
86
+
87
+ var = (cumsum_x2 - 2 * mean * cumsum_x + counts * mean**2) / counts
88
+ std = torch.sqrt(var + self.eps)
89
+
90
+ return mean, std
91
+
92
+ def clear_cache(self):
93
+ self.cached_cumsum_x = None
94
+ self.cached_cumsum_x2 = None
95
+ self.cached_counts = None
96
+
97
+
98
+ class ResidualBlock(nn.Module):
99
+ def __init__(self, in_dim, hid_dim, out_dim):
100
+ super().__init__()
101
+ self.hidden_layer = nn.Linear(in_dim, hid_dim)
102
+ self.output_layer = nn.Linear(hid_dim, out_dim)
103
+ self.residual_layer = nn.Linear(in_dim, out_dim)
104
+ self.act = nn.ReLU()
105
+
106
+ def forward(self, x):
107
+ hid = self.act(self.hidden_layer(x))
108
+ out = self.output_layer(hid)
109
+ res = self.residual_layer(x)
110
+ out = out+res
111
+ return out
112
+
113
+ class MultiHeadAttention(nn.Module):
114
+ def __init__(self, d_model, n_heads, last=False):
115
+ super().__init__()
116
+ assert d_model%n_heads==0, f"d_model ({d_model}) must be divisible by n_heads ({n_heads})"
117
+
118
+ self.WQ = nn.Linear(d_model, d_model)
119
+ self.WK = nn.Linear(d_model, d_model)
120
+ self.WV = nn.Linear(d_model, d_model)
121
+
122
+ self.out_proj = nn.Linear(d_model, d_model)
123
+
124
+ self.head_dim = d_model//n_heads
125
+ self.n_heads = n_heads
126
+
127
+ self.rope = RotaryEmbedding(dim=self.head_dim//2)
128
+
129
+ self.k_cache = None
130
+ self.v_cache = None
131
+
132
+ self.last = last
133
+
134
+ def forward(self, q):
135
+ bs, context, dim = q.size()
136
+ offset = 0
137
+ is_causal = True
138
+
139
+ k = q
140
+ v = q
141
+
142
+ if self.last:
143
+ q = q[:, -1:, :]
144
+ is_causal = False
145
+ offset += (context - 1)
146
+
147
+ q = self.WQ(q).reshape(bs, -1, self.n_heads, self.head_dim).transpose(1, 2)
148
+ k = self.WK(k).reshape(bs, -1, self.n_heads, self.head_dim).transpose(1, 2)
149
+ v = self.WV(v).reshape(bs, -1, self.n_heads, self.head_dim).transpose(1, 2)
150
+
151
+ if self.k_cache is not None and self.v_cache is not None:
152
+ offset += self.k_cache.size(2)
153
+ is_causal = False
154
+ k = torch.cat([self.k_cache, k], dim=2)
155
+ v = torch.cat([self.v_cache, v], dim=2)
156
+
157
+ self.k_cache = k
158
+ self.v_cache = v
159
+
160
+ q = self.rope.rotate_queries_or_keys(q, offset=offset)
161
+ k = self.rope.rotate_queries_or_keys(k)
162
+
163
+ values = nn.functional.scaled_dot_product_attention(q, k, v, is_causal=is_causal)
164
+
165
+ values = values.transpose(1, 2).reshape(bs, -1, dim)
166
+ values = self.out_proj(values)
167
+ return values
168
+
169
+ def clear_cache(self):
170
+ self.k_cache = None
171
+ self.v_cache = None
172
+
173
+ class FeedForward(nn.Module):
174
+ def __init__(self, d_model, multiple_of=256):
175
+ super().__init__()
176
+
177
+ hidden_dim = d_model*4
178
+ hidden_dim = int(2 * hidden_dim / 3)
179
+ hidden_dim = multiple_of * ((hidden_dim + multiple_of - 1) // multiple_of)
180
+
181
+ self.w1 = nn.Linear(d_model, hidden_dim, bias=False)
182
+ self.w2 = nn.Linear(hidden_dim, d_model, bias=False)
183
+ self.w3 = nn.Linear(d_model, hidden_dim, bias=False)
184
+
185
+ self.act = nn.SiLU()
186
+
187
+ def forward(self, x):
188
+ x = self.w2(self.act(self.w1(x)) * self.w3(x))
189
+ return x
190
+
191
+
192
+ class TransformerEncoderLayer(nn.Module):
193
+ def __init__(self, d_model, n_heads, last=False):
194
+ super().__init__()
195
+ self.ln1 = nn.LayerNorm(d_model)
196
+ self.attn = MultiHeadAttention(d_model=d_model, n_heads=n_heads, last=last)
197
+ self.ln2 = nn.LayerNorm(d_model)
198
+ self.ff = FeedForward(d_model=d_model)
199
+
200
+ def forward(self, x):
201
+ out_attn = self.attn(self.ln1((x)))
202
+ x = x + out_attn
203
+ out = x + self.ff(self.ln2(x))
204
+ return out
205
+
206
+ class TransformerEncoder(nn.Module):
207
+ def __init__(self, d_model, n_heads, n_layers):
208
+ super().__init__()
209
+ self.layers = nn.ModuleList(
210
+ [
211
+ TransformerEncoderLayer(d_model=d_model, n_heads=n_heads)
212
+ for _ in range(n_layers-1)
213
+ ]
214
+ )
215
+ self.layers.append(TransformerEncoderLayer(d_model=d_model, n_heads=n_heads, last=True))
216
+ self.norm = nn.LayerNorm(d_model)
217
+
218
+ def forward(self, x):
219
+ for layer in self.layers:
220
+ x = layer(x)
221
+ return self.norm(x)
222
+
223
+ class PatchFM(nn.Module, PyTorchModelHubMixin):
224
+ def __init__(self, config):
225
+ super().__init__()
226
+
227
+ # Store config
228
+ self.patch_len = config["patch_len"]
229
+ self.d_model = config["d_model"]
230
+ self.n_heads = config["n_heads"]
231
+ self.n_layers_encoder = config["n_layers_encoder"]
232
+ self.quantiles = config["quantiles"]
233
+ self.n_quantiles = len(self.quantiles)
234
+
235
+ # Components
236
+ self.revin = RevIN()
237
+ self.proj_embedding = ResidualBlock(
238
+ in_dim=self.patch_len,
239
+ hid_dim=2 * self.patch_len,
240
+ out_dim=self.d_model
241
+ )
242
+ self.transformer_encoder = TransformerEncoder(
243
+ d_model=self.d_model,
244
+ n_heads=self.n_heads,
245
+ n_layers=self.n_layers_encoder
246
+ )
247
+ self.proj_output = ResidualBlock(
248
+ in_dim=self.d_model,
249
+ hid_dim=2 * self.d_model,
250
+ out_dim=self.patch_len * self.n_quantiles
251
+ )
@@ -0,0 +1,128 @@
1
+ Metadata-Version: 2.4
2
+ Name: patchfm
3
+ Version: 1.1.0
4
+ Summary: a Foundation Model for Univariate Time Series Forecasting
5
+ Author-email: Samy-Melwan Vilhes <samy-melwan.vilhes@insa-rouen.fr>
6
+ License: MIT License
7
+
8
+ Copyright (c) 2025 Samy-Melwan Vilhes
9
+
10
+ Permission is hereby granted, free of charge, to any person obtaining a copy
11
+ of this software and associated documentation files (the "Software"), to deal
12
+ in the Software without restriction, including without limitation the rights
13
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
14
+ copies of the Software, and to permit persons to whom the Software is
15
+ furnished to do so, subject to the following conditions:
16
+
17
+ The above copyright notice and this permission notice shall be included in all
18
+ copies or substantial portions of the Software.
19
+
20
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
21
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
22
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
23
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
24
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
25
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
26
+ SOFTWARE.
27
+ Project-URL: Repository, https://github.com/vilhess/PatchFM
28
+ Project-URL: Issues, https://github.com/vilhess/PatchFM/issues
29
+ Keywords: Transformer,LLM,Time Series,Zero-shot,Deep Learning
30
+ Classifier: Programming Language :: Python :: 3
31
+ Classifier: Operating System :: OS Independent
32
+ Requires-Python: >=3.11
33
+ Description-Content-Type: text/markdown
34
+ License-File: LICENSE
35
+ Requires-Dist: torch>=2.5.0
36
+ Requires-Dist: einops>=0.8.1
37
+ Requires-Dist: huggingface-hub>=0.35.1
38
+ Requires-Dist: rotary-embedding-torch>=0.8.9
39
+ Requires-Dist: numpy>=1.26.0
40
+ Dynamic: license-file
41
+
42
+ # A tutorial on how to build a Foundation Model for Univariate Time Series Forecasting
43
+
44
+ [Huggingface Model Card](https://huggingface.co/vilhess/PatchFM)
45
+
46
+ A transformer-based forecasting model for univariate time series. The approach mirrors Large Language Model (LLM) practice (next-token prediction becomes next-patch prediction) while remaining lightweight and practical compared to a classic LLM.
47
+
48
+ ## Highlights
49
+ - Next-patch prediction objective (autoregressive, causal)
50
+ - Patch-based representation of time series (tokens ↔ patches)
51
+ - Causal masking self-attention with RoPE (relative positions)
52
+ - RevIN (Reversible Instance Normalization) with causal statistics
53
+ - SwiGLU feed-forward networks
54
+ - Multi-quantile outputs (median + uncertainty bands)
55
+ - Efficient rollout with KV caching
56
+
57
+ ## Installation
58
+ ```bash
59
+ pip install patchfm
60
+ ```
61
+
62
+ ## Quick Start
63
+
64
+ ```python
65
+ import torch
66
+ from patchfm.configs import PatchFMConfig
67
+ from patchfm.model import Forecaster
68
+
69
+ # --- Instantiate model ---
70
+ config = PatchFMConfig()
71
+ model = Forecaster(config)
72
+
73
+ # --- Inference ---
74
+ forecast_horizon = 64
75
+ seq = torch.randn(1, 1024) # (batch, time)
76
+ pred_median, pred_quantiles = model(seq, forecast_horizon=forecast_horizon, quantiles=[0.1, 0.5, 0.9]) # (batch, horizon), (batch, horizon, quantiles)
77
+ ```
78
+
79
+ We provide an extended quick start example in [notebooks/tutorial.ipynb](./notebooks/tutorial.ipynb).
80
+ If you don't have suitable hardware, you can also run the extended quick start example in Google Colab:
81
+
82
+ <a target="_blank" href="https://colab.research.google.com/drive/17sdf-7luCkv5TaeLj3Z6kIaTDkwkz3VR?usp=share_link">
83
+ <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open Quick Start In Colab"/>
84
+ </a>
85
+
86
+ ## Method (TL;DR)
87
+ - Patching: Split a context signal of length $w$ into $P_{num} = w / P_{len}$ patches of length $P_{len}$.
88
+ - RevIN: Normalize patches using causal running mean/variance over past patches, and denormalize outputs to the original scale.
89
+ - Architecture: Input residual MLP → stacked Transformer blocks (MHA + SwiGLU FFN, pre-norm, residual) → $|\mathcal{Q}|$ output heads mapping back to patch space.
90
+ - Positional encoding: Rotary Position Embeddings (RoPE) applied to queries/keys.
91
+ - Training: Multi-quantile (pinball) loss across positions, elements, and quantiles $\mathcal{Q}$.
92
+ - Inference: Predict next patch; roll out autoregressively with KV caching for long horizons.
93
+
94
+ ## Problem Formulation
95
+ Given context patches $x_{p_1}, \ldots, x_{p_n}$, predict the next patch $x_{p_{i+1}}$ for each position $i$ using only past patches (causality). The model outputs quantiles $\{\hat{x}_{p_{i+1}}^{(q)}: q \in \mathcal{Q}\}$ with median (q=0.5) as the point forecast.
96
+
97
+ ## Loss: Multi-Quantile (Pinball)
98
+ For residual $u = x - \hat{x}^{(q)}$:
99
+ $$\rho_q(u) = \begin{cases} q\,u, & u \ge 0,\\ (q-1)\,u, & u < 0. \end{cases}$$
100
+ Aggregate over positions, patch elements, and quantiles.
101
+
102
+ ## Architecture
103
+ - Input MLP: $\mathbb{R}^{P_{len}} \to \mathbb{R}^{dim}$ residual 2-layer MLP (ReLU)
104
+ - Multi-Head Attention: causal mask, RoPE; queries/keys/values per head
105
+ - FFN: SwiGLU (SiLU-gated), pre-norm + residual
106
+ - Output heads: |Q| linear maps $\mathbb{R}^{dim} \to \mathbb{R}^{P_{len}}$ (one per quantile)
107
+
108
+ ### Model Details
109
+ - Patch size: 32
110
+ - Max context: 32 patches (1024 steps)
111
+ - Forecast horizon: 32 steps per forward pass
112
+ - Quantiles $\mathcal{Q}$: {0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9}
113
+ - Layers: 6
114
+ - Attention heads: 64 (head dim 32)
115
+ - Model dim: 2048
116
+ - Parameters: ~300M
117
+
118
+ ## Inference
119
+ - Single step: predict next patch ($P_{len}$ values)
120
+ - Long-horizon: append prediction to context and repeat (optionally drop oldest patch to keep window fixed)
121
+ - KV caching: reuse cached keys/values for past patches; compute new Q/K/V only for the appended patch
122
+
123
+ ## Acknowledgements
124
+ We thank the authors of the following repositories for inspiration and code snippets:
125
+ - [TiRex](https://github.com/NX-AI/tirex)
126
+
127
+ ## Citation
128
+ If you use this work, please cite the paper ...
@@ -0,0 +1,12 @@
1
+ LICENSE
2
+ README.md
3
+ pyproject.toml
4
+ src/patchfm/__init__.py
5
+ src/patchfm.egg-info/PKG-INFO
6
+ src/patchfm.egg-info/SOURCES.txt
7
+ src/patchfm.egg-info/dependency_links.txt
8
+ src/patchfm.egg-info/requires.txt
9
+ src/patchfm.egg-info/top_level.txt
10
+ src/patchfm/configs/model_config.py
11
+ src/patchfm/inference/forecaster.py
12
+ src/patchfm/inference/modules.py
@@ -0,0 +1,5 @@
1
+ torch>=2.5.0
2
+ einops>=0.8.1
3
+ huggingface-hub>=0.35.1
4
+ rotary-embedding-torch>=0.8.9
5
+ numpy>=1.26.0
@@ -0,0 +1 @@
1
+ patchfm