patchfm 2.0.0__tar.gz → 2.1.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: patchfm
- Version: 2.0.0
+ Version: 2.1.0
  Summary: a Foundation Model for Univariate Time Series Forecasting
  Author-email: Samy-Melwan Vilhes <samy-melwan.vilhes@insa-rouen.fr>
  License: MIT License
@@ -43,37 +43,60 @@ Dynamic: license-file
  # A tutorial on how to build a Foundation Model for Univariate Time Series Forecasting

  [Huggingface Model Card](https://huggingface.co/vilhess/PatchFM)
+ [Paper Model Card](https://github.com/vilhess/PatchFM/blob/main/main.pdf)

- A transformer-based forecasting model for univariate time series. The approach mirrors Large Language Model (LLM) practices (next-token → next-patch) while remaining lightweight compared to a classic LLM and practical.
+ A concise, reproducible recipe for training a transformer-based, patch-to-patch forecasting model for univariate time series. The approach mirrors Large Language Model (LLM) practices (next-token → next-patch) while remaining lightweight compared to a classic LLM and practical.

  ## Highlights
  - Next-patch prediction objective (autoregressive, causal)
  - Patch-based representation of time series (tokens ↔ patches)
  - Causal masking self-attention with RoPE (relative positions)
- - RevIN (Reversible Instance Normalization) with causal statistics
+ - RevIN (Reversible Instance Normalization)
  - SwiGLU feed-forward networks
  - Multi-quantile outputs (median + uncertainty bands)
- - Efficient rollout with KV caching
+ - KV-cache for efficient long-horizon inference
+ - Autoregressive multi-quantile decoding [MOIRAI2.0](https://arxiv.org/pdf/2511.11698) (currently without KV-cache)

- ## Installation
+ ## Quick Start
+
+ ### from source code
+
+ 1. Clone the repository and install dependencies
  ```bash
- pip install patchfm
+ git clone https://github.com/vilhess/PatchFM
+ cd PatchFM
+ pip install -r requirements.txt
  ```
-
- ## Quick Start
+ 2. Run inference with a pretrained model from Huggingface Hub

  ```python
  import torch
- from patchfm import PatchFMConfig, Forecaster
+ from model import Forecaster
+ from configs import PatchFMConfig

  # --- Instantiate model ---
- config = PatchFMConfig()
+ config = PatchFMConfig(load_from_hub=True)
  model = Forecaster(config)

  # --- Inference ---
  forecast_horizon = 64
  seq = torch.randn(1, 1024) # (batch, time)
- pred_median, pred_quantiles = model(seq, forecast_horizon=forecast_horizon, quantiles=[0.1, 0.5, 0.9]) # (batch, forecast_horizon), (batch, forecast_horizon, quantiles)
+ pred_median, pred_quantiles = model(seq, forecast_horizon=forecast_horizon, quantiles=[0.1, 0.5, 0.9], quantile_decoding=True) # (batch, time), (batch, time, quantiles)
+ ```
+
+ ### from pip package
+
+ 1. Install the package from PyPI
+ ```bash
+ pip install patchfm
+ ```
+ 2. Run inference with a pretrained model from Huggingface Hub
+
+ ```python
+ import torch
+ from patchfm import PatchFMConfig, Forecaster
+
+ # same as above
  ```

  We provide an extended quick start example in [notebooks/tutorial.ipynb](./notebooks/tutorial.ipynb).
@@ -85,11 +108,12 @@ If you dont have suitable hardware you can run the the extended quick start exam

  ## Method (TL;DR)
  - Patching: Split a context signal of length $w$ into $P_{num} = w / P_{len}$ patches of length $P_{len}$.
- - RevIN: Normalize patches using causal running mean/variance over past patches, and denormalize outputs to the original scale.
+ - Causal RevIN: Normalize input signal and denormalize outputs to the original scale without statistics leakage.
  - Architecture: Input residual MLP → stacked Transformer blocks (MHA + SwiGLU FFN, pre-norm, residual) → $|\mathcal{Q}|$ output heads mapping back to patch space.
  - Positional encoding: Rotary Position Embeddings (RoPE) applied to queries/keys.
  - Training: Multi-quantile (pinball) loss across positions, elements, and quantiles $\mathcal{Q}$.
- - Inference: Predict next patch; roll out autoregressively with KV caching for long horizons.
+ - Inference: Predict next patch; roll out autoregressively for long horizons.
+ - KV-cache: during inference, cache keys/values to avoid redundant computations.

  ## Problem Formulation
  Given context patches $x_{p_1}, \ldots, x_{p_n}$, predict the next patch $x_{p_{i+1}}$ for each position $i$ using only past patches (causality). The model outputs quantiles $\{\hat{x}_{p_{i+1}}^{(q)}: q \in \mathcal{Q}\}$ with median (q=0.5) as the point forecast.
@@ -118,11 +142,37 @@ Aggregate over positions, patch elements, and quantiles.
  ## Inference
  - Single step: predict next patch ($P_{len}$ values)
  - Long-horizon: append prediction to context and repeat (optionally drop oldest patch to keep window fixed)
- - KV caching: reuse cached keys/values for past patches; compute new Q/K/V only for the appended patch
+
+ ## Datasets
+ - UTSD (Unified Time Series Dataset) [UTSD]: seven domains (Energy, IoT, Nature, Web, Health, Transport, Environment). We work with UTSD-12G (~18M series after preprocessing).
+ - GIFT-Eval pretraining dataset [GIFT]: aligned with the GIFT-Eval dataset but without data leakage issue with the benchmark. The dataset contains approximately 71 univariate and 17 multivariate time series datasets from various
+ domains and various frequencies. After preprocessing, this yields approximately 600K univariate series.
+ - Artificial: ~1M synthetic series (sinusoidal, linear, polynomial, logarithmic) plus mixtures via TSMixup [Chronos]; Gaussian Process samples via KernelSynth (mixtures of RBF/periodic/linear kernels with swept hyperparameters).
+
+ ## Repository Layout
+
+ - `model/training/` — main PatchFM model class
+
+ - `modules.py` - core modules (Residual Layers, MHA, SwiGLU, RoPE, Transformer Encoder, ...)
+ - `revin.py` — causal RevIN
+ - `loss.py` — multi-quantile (pinball) loss
+ - `trainer.py` — PyTorch Lightning trainer class
+
+ - `model/inference/` — main PatchFM model class for inference
+ - `modules.py` — core modules with caching support
+ - `forecaster.py` — Forecasting model and rollout logic
+
+ - `dataset/` — data loading and preprocessing
+ - `artificial.py` — synthetic dataset : artificial signals + TSMixup + KernelSynth
+ - `utsd.py` — Unified Time Series Dataset (UTSD) loading and preprocessing
+ - `gift.py` — GIFT-Eval pretraining dataset loading and preprocessing
+ - `get_data.py` — utility to fetch and preprocess datasets
+ - `generate_data.py` — utility to generate and save the KernelSynth dataset (long to generate)
+
+ - `configs/` — model and training configurations
+ - `notebooks/inference` — how to load a trained model and generate forecasts
+ - `training.py` — training script using PyTorch Lightning

  ## Acknowledgements
  We thank the authors of the following repositories for inspiration and code snippets:
  - [TiRex](https://github.com/NX-AI/tirex)
-
- ## Citation
- If you use this work, please cite the paper ...
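To make the next-patch objective and patching scheme described in the README above concrete, here is a minimal sketch (illustrative only, not the package's training code) of how a length-1024 series could be cut into patches of length 32 and paired with next-patch targets; the shapes follow the README, while the variable names are assumptions.

```python
import torch

# Minimal sketch of the next-patch setup (illustrative, not PatchFM code).
patch_len = 32
series = torch.randn(4, 1024)                      # (batch, time): 32 patches of length 32
patches = series.unfold(1, patch_len, patch_len)   # (batch, num_patches, patch_len)
inputs = patches[:, :-1]                           # patches 1 .. n-1, seen causally
targets = patches[:, 1:]                           # patch i+1 is the target at position i
print(inputs.shape, targets.shape)                 # torch.Size([4, 31, 32]) torch.Size([4, 31, 32])
```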
@@ -0,0 +1,136 @@
+ # A tutorial on how to build a Foundation Model for Univariate Time Series Forecasting
+
+ [Huggingface Model Card](https://huggingface.co/vilhess/PatchFM)
+ [Paper Model Card](https://github.com/vilhess/PatchFM/blob/main/main.pdf)
+
+ A concise, reproducible recipe for training a transformer-based, patch-to-patch forecasting model for univariate time series. The approach mirrors Large Language Model (LLM) practices (next-token → next-patch) while remaining lightweight compared to a classic LLM and practical.
+
+ ## Highlights
+ - Next-patch prediction objective (autoregressive, causal)
+ - Patch-based representation of time series (tokens ↔ patches)
+ - Causal masking self-attention with RoPE (relative positions)
+ - RevIN (Reversible Instance Normalization)
+ - SwiGLU feed-forward networks
+ - Multi-quantile outputs (median + uncertainty bands)
+ - KV-cache for efficient long-horizon inference
+ - Autoregressive multi-quantile decoding [MOIRAI2.0](https://arxiv.org/pdf/2511.11698) (currently without KV-cache)
+
+ ## Quick Start
+
+ ### from source code
+
+ 1. Clone the repository and install dependencies
+ ```bash
+ git clone https://github.com/vilhess/PatchFM
+ cd PatchFM
+ pip install -r requirements.txt
+ ```
+ 2. Run inference with a pretrained model from Huggingface Hub
+
+ ```python
+ import torch
+ from model import Forecaster
+ from configs import PatchFMConfig
+
+ # --- Instantiate model ---
+ config = PatchFMConfig(load_from_hub=True)
+ model = Forecaster(config)
+
+ # --- Inference ---
+ forecast_horizon = 64
+ seq = torch.randn(1, 1024) # (batch, time)
+ pred_median, pred_quantiles = model(seq, forecast_horizon=forecast_horizon, quantiles=[0.1, 0.5, 0.9], quantile_decoding=True) # (batch, time), (batch, time, quantiles)
+ ```
+
+ ### from pip package
+
+ 1. Install the package from PyPI
+ ```bash
+ pip install patchfm
+ ```
+ 2. Run inference with a pretrained model from Huggingface Hub
+
+ ```python
+ import torch
+ from patchfm import PatchFMConfig, Forecaster
+
+ # same as above
+ ```
+
+ We provide an extended quick start example in [notebooks/tutorial.ipynb](./notebooks/tutorial.ipynb).
+ If you don't have suitable hardware, you can also run the extended quick start example in Google Colab:
+
+ <a target="_blank" href="https://colab.research.google.com/drive/17sdf-7luCkv5TaeLj3Z6kIaTDkwkz3VR?usp=share_link">
+ <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open Quick Start In Colab"/>
+ </a>
+
+ ## Method (TL;DR)
+ - Patching: Split a context signal of length $w$ into $P_{num} = w / P_{len}$ patches of length $P_{len}$.
+ - Causal RevIN: Normalize input signal and denormalize outputs to the original scale without statistics leakage.
+ - Architecture: Input residual MLP → stacked Transformer blocks (MHA + SwiGLU FFN, pre-norm, residual) → $|\mathcal{Q}|$ output heads mapping back to patch space.
+ - Positional encoding: Rotary Position Embeddings (RoPE) applied to queries/keys.
+ - Training: Multi-quantile (pinball) loss across positions, elements, and quantiles $\mathcal{Q}$.
+ - Inference: Predict next patch; roll out autoregressively for long horizons.
+ - KV-cache: during inference, cache keys/values to avoid redundant computations.
+
+ ## Problem Formulation
+ Given context patches $x_{p_1}, \ldots, x_{p_n}$, predict the next patch $x_{p_{i+1}}$ for each position $i$ using only past patches (causality). The model outputs quantiles $\{\hat{x}_{p_{i+1}}^{(q)}: q \in \mathcal{Q}\}$ with median (q=0.5) as the point forecast.
+
+ ## Loss: Multi-Quantile (Pinball)
+ For residual $u = x - \hat{x}^{(q)}$:
+ $$\rho_q(u) = \begin{cases} q\,u, & u \ge 0,\\ (q-1)\,u, & u < 0. \end{cases}$$
+ Aggregate over positions, patch elements, and quantiles.
+
+ ## Architecture
+ - Input MLP: $\mathbb{R}^{P_{len}} \to \mathbb{R}^{dim}$ residual 2-layer MLP (ReLU)
+ - Multi-Head Attention: causal mask, RoPE; queries/keys/values per head
+ - FFN: SwiGLU (SiLU-gated), pre-norm + residual
+ - Output heads: |Q| linear maps $\mathbb{R}^{dim} \to \mathbb{R}^{P_{len}}$ (one per quantile)
+
+ ### Model Details
+ - Patch size: 32
+ - Max context: 32 patches (1024 steps)
+ - Forecast horizon: 32 steps per forward pass
+ - Quantiles $\mathcal{Q}$: {0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9}
+ - Layers: 6
+ - Attention heads: 64 (head dim 32)
+ - Model dim: 2048
+ - Parameters: ~300M
+
+ ## Inference
+ - Single step: predict next patch ($P_{len}$ values)
+ - Long-horizon: append prediction to context and repeat (optionally drop oldest patch to keep window fixed)
+
+ ## Datasets
+ - UTSD (Unified Time Series Dataset) [UTSD]: seven domains (Energy, IoT, Nature, Web, Health, Transport, Environment). We work with UTSD-12G (~18M series after preprocessing).
+ - GIFT-Eval pretraining dataset [GIFT]: aligned with the GIFT-Eval dataset but without data leakage issue with the benchmark. The dataset contains approximately 71 univariate and 17 multivariate time series datasets from various
+ domains and various frequencies. After preprocessing, this yields approximately 600K univariate series.
+ - Artificial: ~1M synthetic series (sinusoidal, linear, polynomial, logarithmic) plus mixtures via TSMixup [Chronos]; Gaussian Process samples via KernelSynth (mixtures of RBF/periodic/linear kernels with swept hyperparameters).
+
+ ## Repository Layout
+
+ - `model/training/` — main PatchFM model class
+
+ - `modules.py` - core modules (Residual Layers, MHA, SwiGLU, RoPE, Transformer Encoder, ...)
+ - `revin.py` — causal RevIN
+ - `loss.py` — multi-quantile (pinball) loss
+ - `trainer.py` — PyTorch Lightning trainer class
+
+ - `model/inference/` — main PatchFM model class for inference
+ - `modules.py` — core modules with caching support
+ - `forecaster.py` — Forecasting model and rollout logic
+
+ - `dataset/` — data loading and preprocessing
+ - `artificial.py` — synthetic dataset : artificial signals + TSMixup + KernelSynth
+ - `utsd.py` — Unified Time Series Dataset (UTSD) loading and preprocessing
+ - `gift.py` — GIFT-Eval pretraining dataset loading and preprocessing
+ - `get_data.py` — utility to fetch and preprocess datasets
+ - `generate_data.py` — utility to generate and save the KernelSynth dataset (long to generate)
+
+ - `configs/` — model and training configurations
+ - `notebooks/inference` — how to load a trained model and generate forecasts
+ - `training.py` — training script using PyTorch Lightning
+
+ ## Acknowledgements
+ We thank the authors of the following repositories for inspiration and code snippets:
+ - [TiRex](https://github.com/NX-AI/tirex)
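The loss section of the README gives the pinball loss only as a formula; the following is a minimal PyTorch sketch of the multi-quantile objective, aggregated by a simple mean (illustrative; the function name and tensor shapes are assumptions, not the package's `loss.py`).

```python
import torch

def pinball_loss(target: torch.Tensor, pred_q: torch.Tensor, quantiles: list[float]) -> torch.Tensor:
    # target: (..., patch_len); pred_q: (..., patch_len, n_quantiles)
    q = torch.tensor(quantiles, device=target.device)  # quantile levels, broadcast over the last axis
    u = target.unsqueeze(-1) - pred_q                   # residual u = x - x_hat^(q)
    loss = torch.maximum(q * u, (q - 1.0) * u)          # rho_q(u): q*u if u >= 0 else (q-1)*u
    return loss.mean()                                  # aggregate over positions, elements, quantiles

# toy check with the README's nine quantile levels
target = torch.randn(2, 8, 32)
pred_q = torch.randn(2, 8, 32, 9)
print(pinball_loss(target, pred_q, [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]))
```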
@@ -1,6 +1,6 @@
  [project]
  name = "patchfm"
- version = "2.0.0"
+ version = "2.1.0"
  authors = [
  { name="Samy-Melwan Vilhes", email="samy-melwan.vilhes@insa-rouen.fr" },
  ]
@@ -1,5 +1,6 @@
  import torch
  import torch.nn as nn
+ import numpy as np
  from einops import rearrange
  from patchfm.inference.modules import CausalRevIN, ResidualBlock, TransformerEncoder, PatchFM, SeqTypeConverter

@@ -20,7 +21,7 @@ class Forecaster(nn.Module):
  self.max_patches = self.max_seq_len // self.patch_len

  print("Loading base model from HuggingFace Hub...")
- base_model = PatchFM.from_pretrained("vilhess/PatchFM")
+ base_model = PatchFM.from_pretrained("vilhess/PatchFM-CausalRevIN-asinh")
  self._init_from_base(base_model)

  self.eval()
@@ -59,7 +60,13 @@ class Forecaster(nn.Module):
  self.proj_output = base_model.proj_output

  @torch.inference_mode()
- def forecast(self, x: torch.Tensor, forecast_horizon: int | None = None, quantiles: list[float] | None = None) -> torch.Tensor:
+ def auto_regressive_quantile_decoding(self, x: torch.Tensor, forecast_horizon: int | None = None, quantiles: list[float] | None = None) -> torch.Tensor:
+
+ q = torch.tensor(self.quantiles, device=self.device)
+
+ # Default horizon = patch_len
+ forecast_horizon = forecast_horizon or self.patch_len
+
  x = self.converter.convert(x)
  assert x.ndim in (1, 2), f"Input dimension must be 1D (time) or 2D (batch, time), got {x.ndim}D."

@@ -81,20 +88,32 @@ class Forecaster(nn.Module):
  if pad > 0:
  x = torch.cat([x[:, :1].repeat(1, pad), x], dim=1)

- # Default horizon = patch_len
- forecast_horizon = forecast_horizon or self.patch_len
-
  # Reshape into patches
- x = rearrange(x, "b (pn pl) -> b pn pl", pl=self.patch_len)
+ x = rearrange(x, "b (pn pl) -> b pn pl", pl=self.patch_len)

  rollouts = -(-forecast_horizon // self.patch_len) # ceil division
  predictions = []

- for _ in range(rollouts):
+ # 1st Forward pass
+ x = self.revin(x, mode="norm")
+ x = self.proj_embedding(x)
+ x = self.transformer_encoder(x)
+ x = x[:, -1:, :] # Keep only the last patch for autoregressive forecasting
+
+ forecasting = self.proj_output(x)
+ forecasting = self.revin(forecasting, mode="denorm")
+
+ # Reshape to (bs, patch_num, patch_len, n_quantiles)
+ forecasting = rearrange(
+ forecasting, "b 1 (pl q) -> b 1 pl q",
+ pl=self.patch_len, q=self.n_quantiles
+ )
+ x = forecasting.permute(0, 3, 1, 2).reshape(forecasting.size(0)*self.n_quantiles, 1, self.patch_len)

- if x.size(1) > self.max_patches:
- x = x[:, -self.max_patches:, :]
+ predictions.append(forecasting[:, -1, :, :].detach())

+ for _ in range(rollouts-1):
+
  # Forward pass
  x = self.revin(x, mode="norm")
  x = self.proj_embedding(x)
@@ -103,27 +122,30 @@ class Forecaster(nn.Module):
  forecasting = self.proj_output(x)
  forecasting = self.revin(forecasting, mode="denorm")

- # Reshape to (bs, patch_num, patch_len, n_quantiles)
+ # Reshape to (bs*n_quantiles, patch_num, patch_len, n_quantiles)
  forecasting = rearrange(
  forecasting, "b 1 (pl q) -> b 1 pl q",
  pl=self.patch_len, q=self.n_quantiles
  )
+
+ forecasting = rearrange(
+ forecasting, "(b q) 1 pl h -> b q 1 pl h",
+ q=self.n_quantiles
+ )
+ forecasting = forecasting.permute(0, 2, 3, 1, 4).flatten(start_dim=-2) # batch x 1 x patch_len x n_quantiles**2
+ forecasting = torch.quantile(forecasting, q, dim=-1) # n_quantiles x batch x 1 x patch_len

- # Take median quantile (index 4)
- patch_median = forecasting[:, -1:, :, 4].detach()
- predictions.append(forecasting[:, -1, :, :])
+ x = forecasting.permute(1, 0, 2, 3).reshape(-1, 1, self.patch_len)
+ predictions.append(forecasting.permute(1, 2, 3, 0)[:, 0].detach())
+
+ self.clear_cache()

- # Append median patch for next rollout
- x = patch_median.clone()
-
  pred_quantiles = torch.cat(predictions, dim=1)
  pred_quantiles = pred_quantiles[:, :forecast_horizon, :]
  pred_median = pred_quantiles[:, :, 4]

  pred_quantiles = pred_quantiles[..., [self.quantiles.index(q) for q in quantiles]] if quantiles is not None else pred_quantiles

- self.clear_cache()
-
  if torch.any(torch.isnan(pred_median)) or torch.any(torch.isinf(pred_median)):
  print("Warning: NaN or Inf values detected in predictions. Returning zeros.")
  pred_median = torch.zeros_like(pred_median)
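The rollout above feeds every quantile head back as its own context stream, so each original series accumulates n_quantiles × n_quantiles candidate values per future time step before `torch.quantile` collapses them back to the requested levels. A small shape-only sketch of that aggregation step (illustrative, with assumed toy dimensions):

```python
import torch

# Toy illustration of the quantile re-aggregation used in the rollout loop above.
quantiles = torch.tensor([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9])
batch, patch_len, n_q = 2, 32, quantiles.numel()

# n_q forecasts per quantile stream -> n_q**2 candidates per future value
candidates = torch.randn(batch, 1, patch_len, n_q * n_q)
agg = torch.quantile(candidates, quantiles, dim=-1)   # (n_q, batch, 1, patch_len)
print(agg.shape)                                      # torch.Size([9, 2, 1, 32])
```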
@@ -136,10 +158,30 @@ class Forecaster(nn.Module):
  pred_median, pred_quantiles = self.converter.deconvert(pred_median, pred_quantiles)
  return pred_median, pred_quantiles

- def __call__(self, context: torch.Tensor, forecast_horizon: int | None = None, quantiles: list[float] | None = None) -> torch.Tensor:
- return self.forecast(context, forecast_horizon, quantiles)
+ def __call__(self, context: torch.Tensor, forecast_horizon: int | None = None, quantiles: list[float] | None = None, flip_equivariance: bool = False) -> torch.Tensor:
+ if flip_equivariance:
+ print("Flip equivariance enabled: forecast = (f(x) - f(-x)) / 2. This requires multiplying by 2 the batch size (Reverso: Efficient Time Series Foundation Models for Zero-shot Forecasting).")
+ bs = context.size(0)
+ context_flipped = -context
+ concat_context = torch.cat([context, context_flipped], dim=0)
+ pred_median_full, pred_quantiles_full = self.auto_regressive_quantile_decoding(concat_context, forecast_horizon, quantiles)
+ pred_median, pred_quantiles = pred_median_full[:bs], pred_quantiles_full[:bs]
+ pred_median2, pred_quantiles2 = pred_median_full[bs:], pred_quantiles_full[bs:]
+ pred_median = (pred_median - pred_median2) / 2
+ pred_quantiles = (pred_quantiles - flip_last_dim(pred_quantiles2)) / 2
+ else:
+ pred_median, pred_quantiles = self.auto_regressive_quantile_decoding(context, forecast_horizon, quantiles)
+ return pred_median, pred_quantiles

  def clear_cache(self):
  self.revin.clear_cache()
  for layer in self.transformer_encoder.layers:
- layer.attn.clear_cache()
+ layer.attn.clear_cache()
+
+ def flip_last_dim(x):
+ if isinstance(x, torch.Tensor):
+ return torch.flip(x, dims=[-1])
+ elif isinstance(x, np.ndarray):
+ return np.flip(x, axis=-1)
+ else:
+ raise TypeError(f"Unsupported type: {type(x)}")
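The new `flip_equivariance` option in `__call__` symmetrizes the forecast as (f(x) − f(−x)) / 2 by doubling the batch; the quantile tensor from the flipped input is additionally reversed along its quantile axis via `flip_last_dim`, presumably because the q-th quantile of a negated forecast corresponds to the (1−q)-th quantile of the original. A minimal sketch of the point-forecast part, with a stand-in predictor:

```python
import torch

# Sketch of the flip-equivariance trick (illustrative; `predictor` is a stand-in, not PatchFM).
def symmetrized_forecast(predictor, x: torch.Tensor) -> torch.Tensor:
    bs = x.size(0)
    both = torch.cat([x, -x], dim=0)   # double the batch, as in __call__
    out = predictor(both)
    return (out[:bs] - out[bs:]) / 2   # forecast = (f(x) - f(-x)) / 2

predictor = lambda s: s[:, -4:]        # toy "model": echo the last four values
x = torch.randn(3, 16)
print(symmetrized_forecast(predictor, x).shape)   # torch.Size([3, 4])
```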
@@ -136,11 +136,14 @@ class CausalRevIN(nn.Module):
  counts = counts - nan_counts

  if self.cached_counts is not None:
+ factor = B//self.cached_counts.size(0)
+ self.cached_counts = self.cached_counts.repeat_interleave(factor, dim=0)
  counts += self.cached_counts
  self.cached_counts = counts[:, -1:, :]

  cumsum_x = torch.cumsum(x.nansum(dim=-1, keepdim=True), dim=1)
  if self.cached_cumsum_x is not None:
+ self.cached_cumsum_x = self.cached_cumsum_x.repeat_interleave(factor, dim=0)
  cumsum_x += self.cached_cumsum_x
  self.cached_cumsum_x = cumsum_x[:, -1:, :]

@@ -149,6 +152,7 @@ class CausalRevIN(nn.Module):

  cumsum_x2 = torch.cumsum((x**2).nansum(dim=-1, keepdim=True), dim=1)
  if self.cached_cumsum_x2 is not None:
+ self.cached_cumsum_x2 = self.cached_cumsum_x2.repeat_interleave(factor, dim=0)
  cumsum_x2 += self.cached_cumsum_x2
  self.cached_cumsum_x2 = cumsum_x2[:, -1:, :]

@@ -161,6 +165,8 @@ class CausalRevIN(nn.Module):
  self.cached_cumsum_x = None
  self.cached_cumsum_x2 = None
  self.cached_counts = None
+ self.cached_mean = None
+ self.cached_std = None

  class ResidualBlock(nn.Module):
  def __init__(self, in_dim, hid_dim, out_dim):
@@ -218,6 +224,9 @@ class MultiHeadAttention(nn.Module):
  if self.k_cache is not None and self.v_cache is not None:
  offset += self.k_cache.size(2)
  is_causal = False
+ factor = q.size(0) // self.k_cache.size(0)
+ self.k_cache = self.k_cache.repeat_interleave(factor, dim=0)
+ self.v_cache = self.v_cache.repeat_interleave(factor, dim=0)
  k = torch.cat([self.k_cache, k], dim=2)
  v = torch.cat([self.v_cache, v], dim=2)

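The `repeat_interleave` calls added to `CausalRevIN` and `MultiHeadAttention` handle the batch growing from B to B × n_quantiles after the first decoding step, while the cached statistics and keys/values were computed with batch B. A small sketch of the expansion (illustrative shapes only):

```python
import torch

# Toy illustration of expanding a cached tensor to match an enlarged batch.
B, n_quantiles, heads, cached_len, head_dim = 2, 3, 4, 5, 8

k_cache = torch.randn(B, heads, cached_len, head_dim)    # cache built with batch B
k_new = torch.randn(B * n_quantiles, heads, 1, head_dim)  # new step, batch B * n_quantiles

factor = k_new.size(0) // k_cache.size(0)                 # = n_quantiles
k_cache = k_cache.repeat_interleave(factor, dim=0)        # each cached row copied n_quantiles times, order preserved
k = torch.cat([k_cache, k_new], dim=2)
print(k.shape)                                            # torch.Size([6, 4, 6, 8])
```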
@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: patchfm
- Version: 2.0.0
+ Version: 2.1.0
  Summary: a Foundation Model for Univariate Time Series Forecasting
  Author-email: Samy-Melwan Vilhes <samy-melwan.vilhes@insa-rouen.fr>
  License: MIT License
@@ -43,37 +43,60 @@ Dynamic: license-file
  # A tutorial on how to build a Foundation Model for Univariate Time Series Forecasting

  [Huggingface Model Card](https://huggingface.co/vilhess/PatchFM)
+ [Paper Model Card](https://github.com/vilhess/PatchFM/blob/main/main.pdf)

- A transformer-based forecasting model for univariate time series. The approach mirrors Large Language Model (LLM) practices (next-token → next-patch) while remaining lightweight compared to a classic LLM and practical.
+ A concise, reproducible recipe for training a transformer-based, patch-to-patch forecasting model for univariate time series. The approach mirrors Large Language Model (LLM) practices (next-token → next-patch) while remaining lightweight compared to a classic LLM and practical.

  ## Highlights
  - Next-patch prediction objective (autoregressive, causal)
  - Patch-based representation of time series (tokens ↔ patches)
  - Causal masking self-attention with RoPE (relative positions)
- - RevIN (Reversible Instance Normalization) with causal statistics
+ - RevIN (Reversible Instance Normalization)
  - SwiGLU feed-forward networks
  - Multi-quantile outputs (median + uncertainty bands)
- - Efficient rollout with KV caching
+ - KV-cache for efficient long-horizon inference
+ - Autoregressive multi-quantile decoding [MOIRAI2.0](https://arxiv.org/pdf/2511.11698) (currently without KV-cache)

- ## Installation
+ ## Quick Start
+
+ ### from source code
+
+ 1. Clone the repository and install dependencies
  ```bash
- pip install patchfm
+ git clone https://github.com/vilhess/PatchFM
+ cd PatchFM
+ pip install -r requirements.txt
  ```
-
- ## Quick Start
+ 2. Run inference with a pretrained model from Huggingface Hub

  ```python
  import torch
- from patchfm import PatchFMConfig, Forecaster
+ from model import Forecaster
+ from configs import PatchFMConfig

  # --- Instantiate model ---
- config = PatchFMConfig()
+ config = PatchFMConfig(load_from_hub=True)
  model = Forecaster(config)

  # --- Inference ---
  forecast_horizon = 64
  seq = torch.randn(1, 1024) # (batch, time)
- pred_median, pred_quantiles = model(seq, forecast_horizon=forecast_horizon, quantiles=[0.1, 0.5, 0.9]) # (batch, forecast_horizon), (batch, forecast_horizon, quantiles)
+ pred_median, pred_quantiles = model(seq, forecast_horizon=forecast_horizon, quantiles=[0.1, 0.5, 0.9], quantile_decoding=True) # (batch, time), (batch, time, quantiles)
+ ```
+
+ ### from pip package
+
+ 1. Install the package from PyPI
+ ```bash
+ pip install patchfm
+ ```
+ 2. Run inference with a pretrained model from Huggingface Hub
+
+ ```python
+ import torch
+ from patchfm import PatchFMConfig, Forecaster
+
+ # same as above
  ```

  We provide an extended quick start example in [notebooks/tutorial.ipynb](./notebooks/tutorial.ipynb).
@@ -85,11 +108,12 @@ If you dont have suitable hardware you can run the the extended quick start exam

  ## Method (TL;DR)
  - Patching: Split a context signal of length $w$ into $P_{num} = w / P_{len}$ patches of length $P_{len}$.
- - RevIN: Normalize patches using causal running mean/variance over past patches, and denormalize outputs to the original scale.
+ - Causal RevIN: Normalize input signal and denormalize outputs to the original scale without statistics leakage.
  - Architecture: Input residual MLP → stacked Transformer blocks (MHA + SwiGLU FFN, pre-norm, residual) → $|\mathcal{Q}|$ output heads mapping back to patch space.
  - Positional encoding: Rotary Position Embeddings (RoPE) applied to queries/keys.
  - Training: Multi-quantile (pinball) loss across positions, elements, and quantiles $\mathcal{Q}$.
- - Inference: Predict next patch; roll out autoregressively with KV caching for long horizons.
+ - Inference: Predict next patch; roll out autoregressively for long horizons.
+ - KV-cache: during inference, cache keys/values to avoid redundant computations.

  ## Problem Formulation
  Given context patches $x_{p_1}, \ldots, x_{p_n}$, predict the next patch $x_{p_{i+1}}$ for each position $i$ using only past patches (causality). The model outputs quantiles $\{\hat{x}_{p_{i+1}}^{(q)}: q \in \mathcal{Q}\}$ with median (q=0.5) as the point forecast.
@@ -118,11 +142,37 @@ Aggregate over positions, patch elements, and quantiles.
  ## Inference
  - Single step: predict next patch ($P_{len}$ values)
  - Long-horizon: append prediction to context and repeat (optionally drop oldest patch to keep window fixed)
- - KV caching: reuse cached keys/values for past patches; compute new Q/K/V only for the appended patch
+
+ ## Datasets
+ - UTSD (Unified Time Series Dataset) [UTSD]: seven domains (Energy, IoT, Nature, Web, Health, Transport, Environment). We work with UTSD-12G (~18M series after preprocessing).
+ - GIFT-Eval pretraining dataset [GIFT]: aligned with the GIFT-Eval dataset but without data leakage issue with the benchmark. The dataset contains approximately 71 univariate and 17 multivariate time series datasets from various
+ domains and various frequencies. After preprocessing, this yields approximately 600K univariate series.
+ - Artificial: ~1M synthetic series (sinusoidal, linear, polynomial, logarithmic) plus mixtures via TSMixup [Chronos]; Gaussian Process samples via KernelSynth (mixtures of RBF/periodic/linear kernels with swept hyperparameters).
+
+ ## Repository Layout
+
+ - `model/training/` — main PatchFM model class
+
+ - `modules.py` - core modules (Residual Layers, MHA, SwiGLU, RoPE, Transformer Encoder, ...)
+ - `revin.py` — causal RevIN
+ - `loss.py` — multi-quantile (pinball) loss
+ - `trainer.py` — PyTorch Lightning trainer class
+
+ - `model/inference/` — main PatchFM model class for inference
+ - `modules.py` — core modules with caching support
+ - `forecaster.py` — Forecasting model and rollout logic
+
+ - `dataset/` — data loading and preprocessing
+ - `artificial.py` — synthetic dataset : artificial signals + TSMixup + KernelSynth
+ - `utsd.py` — Unified Time Series Dataset (UTSD) loading and preprocessing
+ - `gift.py` — GIFT-Eval pretraining dataset loading and preprocessing
+ - `get_data.py` — utility to fetch and preprocess datasets
+ - `generate_data.py` — utility to generate and save the KernelSynth dataset (long to generate)
+
+ - `configs/` — model and training configurations
+ - `notebooks/inference` — how to load a trained model and generate forecasts
+ - `training.py` — training script using PyTorch Lightning

  ## Acknowledgements
  We thank the authors of the following repositories for inspiration and code snippets:
  - [TiRex](https://github.com/NX-AI/tirex)
-
- ## Citation
- If you use this work, please cite the paper ...
patchfm-2.0.0/README.md DELETED
@@ -1,86 +0,0 @@
- # A tutorial on how to build a Foundation Model for Univariate Time Series Forecasting
-
- [Huggingface Model Card](https://huggingface.co/vilhess/PatchFM)
-
- A transformer-based forecasting model for univariate time series. The approach mirrors Large Language Model (LLM) practices (next-token → next-patch) while remaining lightweight compared to a classic LLM and practical.
-
- ## Highlights
- - Next-patch prediction objective (autoregressive, causal)
- - Patch-based representation of time series (tokens ↔ patches)
- - Causal masking self-attention with RoPE (relative positions)
- - RevIN (Reversible Instance Normalization) with causal statistics
- - SwiGLU feed-forward networks
- - Multi-quantile outputs (median + uncertainty bands)
- - Efficient rollout with KV caching
-
- ## Installation
- ```bash
- pip install patchfm
- ```
-
- ## Quick Start
-
- ```python
- import torch
- from patchfm import PatchFMConfig, Forecaster
-
- # --- Instantiate model ---
- config = PatchFMConfig()
- model = Forecaster(config)
-
- # --- Inference ---
- forecast_horizon = 64
- seq = torch.randn(1, 1024) # (batch, time)
- pred_median, pred_quantiles = model(seq, forecast_horizon=forecast_horizon, quantiles=[0.1, 0.5, 0.9]) # (batch, forecast_horizon), (batch, forecast_horizon, quantiles)
- ```
-
- We provide an extended quick start example in [notebooks/tutorial.ipynb](./notebooks/tutorial.ipynb).
- If you dont have suitable hardware you can run the the extended quick start example example also in Google Colab:
-
- <a target="_blank" href="https://colab.research.google.com/drive/17sdf-7luCkv5TaeLj3Z6kIaTDkwkz3VR?usp=share_link">
- <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open Quick Start In Colab"/>
- </a>
-
- ## Method (TL;DR)
- - Patching: Split a context signal of length $w$ into $P_{num} = w / P_{len}$ patches of length $P_{len}$.
- - RevIN: Normalize patches using causal running mean/variance over past patches, and denormalize outputs to the original scale.
- - Architecture: Input residual MLP → stacked Transformer blocks (MHA + SwiGLU FFN, pre-norm, residual) → $|\mathcal{Q}|$ output heads mapping back to patch space.
- - Positional encoding: Rotary Position Embeddings (RoPE) applied to queries/keys.
- - Training: Multi-quantile (pinball) loss across positions, elements, and quantiles $\mathcal{Q}$.
- - Inference: Predict next patch; roll out autoregressively with KV caching for long horizons.
-
- ## Problem Formulation
- Given context patches $x_{p_1}, \ldots, x_{p_n}$, predict the next patch $x_{p_{i+1}}$ for each position $i$ using only past patches (causality). The model outputs quantiles $\{\hat{x}_{p_{i+1}}^{(q)}: q \in \mathcal{Q}\}$ with median (q=0.5) as the point forecast.
-
- ## Loss: Multi-Quantile (Pinball)
- For residual $u = x - \hat{x}^{(q)}$:
- $$\rho_q(u) = \begin{cases} q\,u, & u \ge 0,\\ (q-1)\,u, & u < 0. \end{cases}$$
- Aggregate over positions, patch elements, and quantiles.
-
- ## Architecture
- - Input MLP: $\mathbb{R}^{P_{len}} \to \mathbb{R}^{dim}$ residual 2-layer MLP (ReLU)
- - Multi-Head Attention: causal mask, RoPE; queries/keys/values per head
- - FFN: SwiGLU (SiLU-gated), pre-norm + residual
- - Output heads: |Q| linear maps $\mathbb{R}^{dim} \to \mathbb{R}^{P_{len}}$ (one per quantile)
-
- ### Model Details
- - Patch size: 32
- - Max context: 32 patches (1024 steps)
- - Forecast horizon: 32 steps per forward pass
- - Quantiles $\mathcal{Q}$: {0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9}
- - Layers: 6
- - Attention heads: 64 (head dim 32)
- - Model dim: 2048
- - Parameters: ~300M
-
- ## Inference
- - Single step: predict next patch ($P_{len}$ values)
- - Long-horizon: append prediction to context and repeat (optionally drop oldest patch to keep window fixed)
- - KV caching: reuse cached keys/values for past patches; compute new Q/K/V only for the appended patch
-
- ## Acknowledgements
- We thank the authors of the following repositories for inspiration and code snippets:
- - [TiRex](https://github.com/NX-AI/tirex)
-
- ## Citation
- If you use this work, please cite the paper ...
File without changes
File without changes
File without changes