patchfm 2.0.0__tar.gz → 2.1.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: patchfm
- Version: 2.0.0
+ Version: 2.1.0
  Summary: a Foundation Model for Univariate Time Series Forecasting
  Author-email: Samy-Melwan Vilhes <samy-melwan.vilhes@insa-rouen.fr>
  License: MIT License
@@ -43,37 +43,60 @@ Dynamic: license-file
  # A tutorial on how to build a Foundation Model for Univariate Time Series Forecasting

  [Huggingface Model Card](https://huggingface.co/vilhess/PatchFM)
+ [Paper Model Card](https://github.com/vilhess/PatchFM/blob/main/main.pdf)

- A transformer-based forecasting model for univariate time series. The approach mirrors Large Language Model (LLM) practices (next-token → next-patch) while remaining lightweight compared to a classic LLM and practical.
+ A concise, reproducible recipe for training a transformer-based, patch-to-patch forecasting model for univariate time series. The approach mirrors Large Language Model (LLM) practices (next-token → next-patch) while remaining lightweight compared to a classic LLM and practical.

  ## Highlights
  - Next-patch prediction objective (autoregressive, causal)
  - Patch-based representation of time series (tokens ↔ patches)
  - Causal masking self-attention with RoPE (relative positions)
- - RevIN (Reversible Instance Normalization) with causal statistics
+ - RevIN (Reversible Instance Normalization)
  - SwiGLU feed-forward networks
  - Multi-quantile outputs (median + uncertainty bands)
- - Efficient rollout with KV caching
+ - KV-cache for efficient long-horizon inference
+ - Autoregressive multi-quantile decoding [MOIRAI2.0](https://arxiv.org/pdf/2511.11698) (currently without KV-cache)

- ## Installation
+ ## Quick Start
+
+ ### from source code
+
+ 1. Clone the repository and install dependencies
  ```bash
- pip install patchfm
+ git clone https://github.com/vilhess/PatchFM
+ cd PatchFM
+ pip install -r requirements.txt
  ```
-
- ## Quick Start
+ 2. Run inference with a pretrained model from Huggingface Hub

  ```python
  import torch
- from patchfm import PatchFMConfig, Forecaster
+ from model import Forecaster
+ from configs import PatchFMConfig

  # --- Instantiate model ---
- config = PatchFMConfig()
+ config = PatchFMConfig(load_from_hub=True)
  model = Forecaster(config)

  # --- Inference ---
  forecast_horizon = 64
  seq = torch.randn(1, 1024) # (batch, time)
- pred_median, pred_quantiles = model(seq, forecast_horizon=forecast_horizon, quantiles=[0.1, 0.5, 0.9]) # (batch, forecast_horizon), (batch, forecast_horizon, quantiles)
+ pred_median, pred_quantiles = model(seq, forecast_horizon=forecast_horizon, quantiles=[0.1, 0.5, 0.9], quantile_decoding=True) # (batch, time), (batch, time, quantiles)
+ ```
+
+ ### from pip package
+
+ 1. Install the package from PyPI
+ ```bash
+ pip install patchfm
+ ```
+ 2. Run inference with a pretrained model from Huggingface Hub
+
+ ```python
+ import torch
+ from patchfm import PatchFMConfig, Forecaster
+
+ # same as above
  ```

  We provide an extended quick start example in [notebooks/tutorial.ipynb](./notebooks/tutorial.ipynb).
@@ -85,11 +108,12 @@ If you dont have suitable hardware you can run the the extended quick start exam

  ## Method (TL;DR)
  - Patching: Split a context signal of length $w$ into $P_{num} = w / P_{len}$ patches of length $P_{len}$.
- - RevIN: Normalize patches using causal running mean/variance over past patches, and denormalize outputs to the original scale.
+ - Causal RevIN: Normalize input signal and denormalize outputs to the original scale without statistics leakage.
  - Architecture: Input residual MLP → stacked Transformer blocks (MHA + SwiGLU FFN, pre-norm, residual) → $|\mathcal{Q}|$ output heads mapping back to patch space.
  - Positional encoding: Rotary Position Embeddings (RoPE) applied to queries/keys.
  - Training: Multi-quantile (pinball) loss across positions, elements, and quantiles $\mathcal{Q}$.
- - Inference: Predict next patch; roll out autoregressively with KV caching for long horizons.
+ - Inference: Predict next patch; roll out autoregressively for long horizons.
+ - KV-cache: during inference, cache keys/values to avoid redundant computations.

  ## Problem Formulation
  Given context patches $x_{p_1}, \ldots, x_{p_n}$, predict the next patch $x_{p_{i+1}}$ for each position $i$ using only past patches (causality). The model outputs quantiles $\{\hat{x}_{p_{i+1}}^{(q)}: q \in \mathcal{Q}\}$ with median (q=0.5) as the point forecast.
@@ -118,11 +142,37 @@ Aggregate over positions, patch elements, and quantiles.
  ## Inference
  - Single step: predict next patch ($P_{len}$ values)
  - Long-horizon: append prediction to context and repeat (optionally drop oldest patch to keep window fixed)
- - KV caching: reuse cached keys/values for past patches; compute new Q/K/V only for the appended patch
+
+ ## Datasets
+ - UTSD (Unified Time Series Dataset) [UTSD]: seven domains (Energy, IoT, Nature, Web, Health, Transport, Environment). We work with UTSD-12G (~18M series after preprocessing).
+ - GIFT-Eval pretraining dataset [GIFT]: aligned with the GIFT-Eval dataset but without data leakage issue with the benchmark. The dataset contains approximately 71 univariate and 17 multivariate time series datasets from various
+ domains and various frequencies. After preprocessing, this yields approximately 600K univariate series.
+ - Artificial: ~1M synthetic series (sinusoidal, linear, polynomial, logarithmic) plus mixtures via TSMixup [Chronos]; Gaussian Process samples via KernelSynth (mixtures of RBF/periodic/linear kernels with swept hyperparameters).
+
+ ## Repository Layout
+
+ - `model/training/` — main PatchFM model class
+
+ - `modules.py` - core modules (Residual Layers, MHA, SwiGLU, RoPE, Transformer Encoder, ...)
+ - `revin.py` — causal RevIN
+ - `loss.py` — multi-quantile (pinball) loss
+ - `trainer.py` — PyTorch Lightning trainer class
+
+ - `model/inference/` — main PatchFM model class for inference
+ - `modules.py` — core modules with caching support
+ - `forecaster.py` — Forecasting model and rollout logic
+
+ - `dataset/` — data loading and preprocessing
+ - `artificial.py` — synthetic dataset : artificial signals + TSMixup + KernelSynth
+ - `utsd.py` — Unified Time Series Dataset (UTSD) loading and preprocessing
+ - `gift.py` — GIFT-Eval pretraining dataset loading and preprocessing
+ - `get_data.py` — utility to fetch and preprocess datasets
+ - `generate_data.py` — utility to generate and save the KernelSynth dataset (long to generate)
+
+ - `configs/` — model and training configurations
+ - `notebooks/inference` — how to load a trained model and generate forecasts
+ - `training.py` — training script using PyTorch Lightning

  ## Acknowledgements
  We thank the authors of the following repositories for inspiration and code snippets:
  - [TiRex](https://github.com/NX-AI/tirex)
-
- ## Citation
- If you use this work, please cite the paper ...
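To make the next-patch objective and patching scheme described in the README above concrete, here is a minimal sketch (illustrative only, not the package's training code) of how a length-1024 series could be cut into patches of length 32 and paired with next-patch targets; the shapes follow the README, while the variable names are assumptions.

```python
import torch

# Minimal sketch of the next-patch setup (illustrative, not PatchFM code).
patch_len = 32
series = torch.randn(4, 1024)                      # (batch, time): 32 patches of length 32
patches = series.unfold(1, patch_len, patch_len)   # (batch, num_patches, patch_len)
inputs = patches[:, :-1]                           # patches 1 .. n-1, seen causally
targets = patches[:, 1:]                           # patch i+1 is the target at position i
print(inputs.shape, targets.shape)                 # torch.Size([4, 31, 32]) torch.Size([4, 31, 32])
```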
@@ -0,0 +1,136 @@
+ # A tutorial on how to build a Foundation Model for Univariate Time Series Forecasting
+
+ [Huggingface Model Card](https://huggingface.co/vilhess/PatchFM)
+ [Paper Model Card](https://github.com/vilhess/PatchFM/blob/main/main.pdf)
+
+ A concise, reproducible recipe for training a transformer-based, patch-to-patch forecasting model for univariate time series. The approach mirrors Large Language Model (LLM) practices (next-token → next-patch) while remaining lightweight compared to a classic LLM and practical.
+
+ ## Highlights
+ - Next-patch prediction objective (autoregressive, causal)
+ - Patch-based representation of time series (tokens ↔ patches)
+ - Causal masking self-attention with RoPE (relative positions)
+ - RevIN (Reversible Instance Normalization)
+ - SwiGLU feed-forward networks
+ - Multi-quantile outputs (median + uncertainty bands)
+ - KV-cache for efficient long-horizon inference
+ - Autoregressive multi-quantile decoding [MOIRAI2.0](https://arxiv.org/pdf/2511.11698) (currently without KV-cache)
+
+ ## Quick Start
+
+ ### from source code
+
+ 1. Clone the repository and install dependencies
+ ```bash
+ git clone https://github.com/vilhess/PatchFM
+ cd PatchFM
+ pip install -r requirements.txt
+ ```
+ 2. Run inference with a pretrained model from Huggingface Hub
+
+ ```python
+ import torch
+ from model import Forecaster
+ from configs import PatchFMConfig
+
+ # --- Instantiate model ---
+ config = PatchFMConfig(load_from_hub=True)
+ model = Forecaster(config)
+
+ # --- Inference ---
+ forecast_horizon = 64
+ seq = torch.randn(1, 1024) # (batch, time)
+ pred_median, pred_quantiles = model(seq, forecast_horizon=forecast_horizon, quantiles=[0.1, 0.5, 0.9], quantile_decoding=True) # (batch, time), (batch, time, quantiles)
+ ```
+
+ ### from pip package
+
+ 1. Install the package from PyPI
+ ```bash
+ pip install patchfm
+ ```
+ 2. Run inference with a pretrained model from Huggingface Hub
+
+ ```python
+ import torch
+ from patchfm import PatchFMConfig, Forecaster
+
+ # same as above
+ ```
+
+ We provide an extended quick start example in [notebooks/tutorial.ipynb](./notebooks/tutorial.ipynb).
+ If you don't have suitable hardware, you can also run the extended quick start example in Google Colab:
+
+ <a target="_blank" href="https://colab.research.google.com/drive/17sdf-7luCkv5TaeLj3Z6kIaTDkwkz3VR?usp=share_link">
+ <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open Quick Start In Colab"/>
+ </a>
+
+ ## Method (TL;DR)
+ - Patching: Split a context signal of length $w$ into $P_{num} = w / P_{len}$ patches of length $P_{len}$.
+ - Causal RevIN: Normalize input signal and denormalize outputs to the original scale without statistics leakage.
+ - Architecture: Input residual MLP → stacked Transformer blocks (MHA + SwiGLU FFN, pre-norm, residual) → $|\mathcal{Q}|$ output heads mapping back to patch space.
+ - Positional encoding: Rotary Position Embeddings (RoPE) applied to queries/keys.
+ - Training: Multi-quantile (pinball) loss across positions, elements, and quantiles $\mathcal{Q}$.
+ - Inference: Predict next patch; roll out autoregressively for long horizons.
+ - KV-cache: during inference, cache keys/values to avoid redundant computations.
+
+ ## Problem Formulation
+ Given context patches $x_{p_1}, \ldots, x_{p_n}$, predict the next patch $x_{p_{i+1}}$ for each position $i$ using only past patches (causality). The model outputs quantiles $\{\hat{x}_{p_{i+1}}^{(q)}: q \in \mathcal{Q}\}$ with median (q=0.5) as the point forecast.
+
+ ## Loss: Multi-Quantile (Pinball)
+ For residual $u = x - \hat{x}^{(q)}$:
+ $$\rho_q(u) = \begin{cases} q\,u, & u \ge 0,\\ (q-1)\,u, & u < 0. \end{cases}$$
+ Aggregate over positions, patch elements, and quantiles.
+
+ ## Architecture
+ - Input MLP: $\mathbb{R}^{P_{len}} \to \mathbb{R}^{dim}$ residual 2-layer MLP (ReLU)
+ - Multi-Head Attention: causal mask, RoPE; queries/keys/values per head
+ - FFN: SwiGLU (SiLU-gated), pre-norm + residual
+ - Output heads: |Q| linear maps $\mathbb{R}^{dim} \to \mathbb{R}^{P_{len}}$ (one per quantile)
+
+ ### Model Details
+ - Patch size: 32
+ - Max context: 32 patches (1024 steps)
+ - Forecast horizon: 32 steps per forward pass
+ - Quantiles $\mathcal{Q}$: {0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9}
+ - Layers: 6
+ - Attention heads: 64 (head dim 32)
+ - Model dim: 2048
+ - Parameters: ~300M
+
+ ## Inference
+ - Single step: predict next patch ($P_{len}$ values)
+ - Long-horizon: append prediction to context and repeat (optionally drop oldest patch to keep window fixed)
+
+ ## Datasets
+ - UTSD (Unified Time Series Dataset) [UTSD]: seven domains (Energy, IoT, Nature, Web, Health, Transport, Environment). We work with UTSD-12G (~18M series after preprocessing).
+ - GIFT-Eval pretraining dataset [GIFT]: aligned with the GIFT-Eval dataset but without data leakage issue with the benchmark. The dataset contains approximately 71 univariate and 17 multivariate time series datasets from various
+ domains and various frequencies. After preprocessing, this yields approximately 600K univariate series.
+ - Artificial: ~1M synthetic series (sinusoidal, linear, polynomial, logarithmic) plus mixtures via TSMixup [Chronos]; Gaussian Process samples via KernelSynth (mixtures of RBF/periodic/linear kernels with swept hyperparameters).
+
+ ## Repository Layout
+
+ - `model/training/` — main PatchFM model class
+
+ - `modules.py` - core modules (Residual Layers, MHA, SwiGLU, RoPE, Transformer Encoder, ...)
+ - `revin.py` — causal RevIN
+ - `loss.py` — multi-quantile (pinball) loss
+ - `trainer.py` — PyTorch Lightning trainer class
+
+ - `model/inference/` — main PatchFM model class for inference
+ - `modules.py` — core modules with caching support
+ - `forecaster.py` — Forecasting model and rollout logic
+
+ - `dataset/` — data loading and preprocessing
+ - `artificial.py` — synthetic dataset : artificial signals + TSMixup + KernelSynth
+ - `utsd.py` — Unified Time Series Dataset (UTSD) loading and preprocessing
+ - `gift.py` — GIFT-Eval pretraining dataset loading and preprocessing
+ - `get_data.py` — utility to fetch and preprocess datasets
+ - `generate_data.py` — utility to generate and save the KernelSynth dataset (long to generate)
+
+ - `configs/` — model and training configurations
+ - `notebooks/inference` — how to load a trained model and generate forecasts
+ - `training.py` — training script using PyTorch Lightning
+
+ ## Acknowledgements
+ We thank the authors of the following repositories for inspiration and code snippets:
+ - [TiRex](https://github.com/NX-AI/tirex)
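The loss section of the README gives the pinball loss only as a formula; the following is a minimal PyTorch sketch of the multi-quantile objective, aggregated by a simple mean (illustrative; the function name and tensor shapes are assumptions, not the package's `loss.py`).

```python
import torch

def pinball_loss(target: torch.Tensor, pred_q: torch.Tensor, quantiles: list[float]) -> torch.Tensor:
    # target: (..., patch_len); pred_q: (..., patch_len, n_quantiles)
    q = torch.tensor(quantiles, device=target.device)  # quantile levels, broadcast over the last axis
    u = target.unsqueeze(-1) - pred_q                   # residual u = x - x_hat^(q)
    loss = torch.maximum(q * u, (q - 1.0) * u)          # rho_q(u): q*u if u >= 0 else (q-1)*u
    return loss.mean()                                  # aggregate over positions, elements, quantiles

# toy check with the README's nine quantile levels
target = torch.randn(2, 8, 32)
pred_q = torch.randn(2, 8, 32, 9)
print(pinball_loss(target, pred_q, [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]))
```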
@@ -1,6 +1,6 @@
  [project]
  name = "patchfm"
- version = "2.0.0"
+ version = "2.1.0"
  authors = [
  { name="Samy-Melwan Vilhes", email="samy-melwan.vilhes@insa-rouen.fr" },
  ]
@@ -1,5 +1,6 @@
  import torch
  import torch.nn as nn
+ import numpy as np
  from einops import rearrange
  from patchfm.inference.modules import CausalRevIN, ResidualBlock, TransformerEncoder, PatchFM, SeqTypeConverter

@@ -20,7 +21,7 @@ class Forecaster(nn.Module):
  self.max_patches = self.max_seq_len // self.patch_len

  print("Loading base model from HuggingFace Hub...")
- base_model = PatchFM.from_pretrained("vilhess/PatchFM")
+ base_model = PatchFM.from_pretrained("vilhess/PatchFM-CausalRevIN-asinh")
  self._init_from_base(base_model)

  self.eval()
@@ -59,7 +60,13 @@ class Forecaster(nn.Module):
  self.proj_output = base_model.proj_output

  @torch.inference_mode()
- def forecast(self, x: torch.Tensor, forecast_horizon: int | None = None, quantiles: list[float] | None = None) -> torch.Tensor:
+ def auto_regressive_quantile_decoding(self, x: torch.Tensor, forecast_horizon: int | None = None, quantiles: list[float] | None = None) -> torch.Tensor:
+
+ q = torch.tensor(self.quantiles, device=self.device)
+
+ # Default horizon = patch_len
+ forecast_horizon = forecast_horizon or self.patch_len
+
  x = self.converter.convert(x)
  assert x.ndim in (1, 2), f"Input dimension must be 1D (time) or 2D (batch, time), got {x.ndim}D."

@@ -81,20 +88,32 @@ class Forecaster(nn.Module):
  if pad > 0:
  x = torch.cat([x[:, :1].repeat(1, pad), x], dim=1)

- # Default horizon = patch_len
- forecast_horizon = forecast_horizon or self.patch_len
-
  # Reshape into patches
- x = rearrange(x, "b (pn pl) -> b pn pl", pl=self.patch_len)
+ x = rearrange(x, "b (pn pl) -> b pn pl", pl=self.patch_len)

  rollouts = -(-forecast_horizon // self.patch_len) # ceil division
  predictions = []

- for _ in range(rollouts):
+ # 1st Forward pass
+ x = self.revin(x, mode="norm")
+ x = self.proj_embedding(x)
+ x = self.transformer_encoder(x)
+ x = x[:, -1:, :] # Keep only the last patch for autoregressive forecasting
+
+ forecasting = self.proj_output(x)
+ forecasting = self.revin(forecasting, mode="denorm")
+
+ # Reshape to (bs, patch_num, patch_len, n_quantiles)
+ forecasting = rearrange(
+ forecasting, "b 1 (pl q) -> b 1 pl q",
+ pl=self.patch_len, q=self.n_quantiles
+ )
+ x = forecasting.permute(0, 3, 1, 2).reshape(forecasting.size(0)*self.n_quantiles, 1, self.patch_len)

- if x.size(1) > self.max_patches:
- x = x[:, -self.max_patches:, :]
+ predictions.append(forecasting[:, -1, :, :].detach())

+ for _ in range(rollouts-1):
+
  # Forward pass
  x = self.revin(x, mode="norm")
  x = self.proj_embedding(x)
@@ -103,27 +122,30 @@ class Forecaster(nn.Module):
  forecasting = self.proj_output(x)
  forecasting = self.revin(forecasting, mode="denorm")

- # Reshape to (bs, patch_num, patch_len, n_quantiles)
+ # Reshape to (bs*n_quantiles, patch_num, patch_len, n_quantiles)
  forecasting = rearrange(
  forecasting, "b 1 (pl q) -> b 1 pl q",
  pl=self.patch_len, q=self.n_quantiles
  )
+
+ forecasting = rearrange(
+ forecasting, "(b q) 1 pl h -> b q 1 pl h",
+ q=self.n_quantiles
+ )
+ forecasting = forecasting.permute(0, 2, 3, 1, 4).flatten(start_dim=-2) # batch x 1 x patch_len x n_quantiles**2
+ forecasting = torch.quantile(forecasting, q, dim=-1) # n_quantiles x batch x 1 x patch_len

- # Take median quantile (index 4)
- patch_median = forecasting[:, -1:, :, 4].detach()
- predictions.append(forecasting[:, -1, :, :])
+ x = forecasting.permute(1, 0, 2, 3).reshape(-1, 1, self.patch_len)
+ predictions.append(forecasting.permute(1, 2, 3, 0)[:, 0].detach())
+
+ self.clear_cache()

- # Append median patch for next rollout
- x = patch_median.clone()
-
  pred_quantiles = torch.cat(predictions, dim=1)
  pred_quantiles = pred_quantiles[:, :forecast_horizon, :]
  pred_median = pred_quantiles[:, :, 4]

  pred_quantiles = pred_quantiles[..., [self.quantiles.index(q) for q in quantiles]] if quantiles is not None else pred_quantiles

- self.clear_cache()
-
  if torch.any(torch.isnan(pred_median)) or torch.any(torch.isinf(pred_median)):
  print("Warning: NaN or Inf values detected in predictions. Returning zeros.")
  pred_median = torch.zeros_like(pred_median)
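The rollout above feeds every quantile head back as its own context stream, so each original series accumulates n_quantiles × n_quantiles candidate values per future time step before `torch.quantile` collapses them back to the requested levels. A small shape-only sketch of that aggregation step (illustrative, with assumed toy dimensions):

```python
import torch

# Toy illustration of the quantile re-aggregation used in the rollout loop above.
quantiles = torch.tensor([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9])
batch, patch_len, n_q = 2, 32, quantiles.numel()

# n_q forecasts per quantile stream -> n_q**2 candidates per future value
candidates = torch.randn(batch, 1, patch_len, n_q * n_q)
agg = torch.quantile(candidates, quantiles, dim=-1)   # (n_q, batch, 1, patch_len)
print(agg.shape)                                      # torch.Size([9, 2, 1, 32])
```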
@@ -136,10 +158,30 @@ class Forecaster(nn.Module):
  pred_median, pred_quantiles = self.converter.deconvert(pred_median, pred_quantiles)
  return pred_median, pred_quantiles

- def __call__(self, context: torch.Tensor, forecast_horizon: int | None = None, quantiles: list[float] | None = None) -> torch.Tensor:
- return self.forecast(context, forecast_horizon, quantiles)
+ def __call__(self, context: torch.Tensor, forecast_horizon: int | None = None, quantiles: list[float] | None = None, flip_equivariance: bool = False) -> torch.Tensor:
+ if flip_equivariance:
+ print("Flip equivariance enabled: forecast = (f(x) - f(-x)) / 2. This requires multiplying by 2 the batch size (Reverso: Efficient Time Series Foundation Models for Zero-shot Forecasting).")
+ bs = context.size(0)
+ context_flipped = -context
+ concat_context = torch.cat([context, context_flipped], dim=0)
+ pred_median_full, pred_quantiles_full = self.auto_regressive_quantile_decoding(concat_context, forecast_horizon, quantiles)
+ pred_median, pred_quantiles = pred_median_full[:bs], pred_quantiles_full[:bs]
+ pred_median2, pred_quantiles2 = pred_median_full[bs:], pred_quantiles_full[bs:]
+ pred_median = (pred_median - pred_median2) / 2
+ pred_quantiles = (pred_quantiles - flip_last_dim(pred_quantiles2)) / 2
+ else:
+ pred_median, pred_quantiles = self.auto_regressive_quantile_decoding(context, forecast_horizon, quantiles)
+ return pred_median, pred_quantiles

  def clear_cache(self):
  self.revin.clear_cache()
  for layer in self.transformer_encoder.layers:
- layer.attn.clear_cache()
+ layer.attn.clear_cache()
+
+ def flip_last_dim(x):
+ if isinstance(x, torch.Tensor):
+ return torch.flip(x, dims=[-1])
+ elif isinstance(x, np.ndarray):
+ return np.flip(x, axis=-1)
+ else:
+ raise TypeError(f"Unsupported type: {type(x)}")
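The new `flip_equivariance` option in `__call__` symmetrizes the forecast as (f(x) − f(−x)) / 2 by doubling the batch; the quantile tensor from the flipped input is additionally reversed along its quantile axis via `flip_last_dim`, presumably because the q-th quantile of a negated forecast corresponds to the (1−q)-th quantile of the original. A minimal sketch of the point-forecast part, with a stand-in predictor:

```python
import torch

# Sketch of the flip-equivariance trick (illustrative; `predictor` is a stand-in, not PatchFM).
def symmetrized_forecast(predictor, x: torch.Tensor) -> torch.Tensor:
    bs = x.size(0)
    both = torch.cat([x, -x], dim=0)   # double the batch, as in __call__
    out = predictor(both)
    return (out[:bs] - out[bs:]) / 2   # forecast = (f(x) - f(-x)) / 2

predictor = lambda s: s[:, -4:]        # toy "model": echo the last four values
x = torch.randn(3, 16)
print(symmetrized_forecast(predictor, x).shape)   # torch.Size([3, 4])
```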
@@ -136,11 +136,14 @@ class CausalRevIN(nn.Module):
  counts = counts - nan_counts

  if self.cached_counts is not None:
+ factor = B//self.cached_counts.size(0)
+ self.cached_counts = self.cached_counts.repeat_interleave(factor, dim=0)
  counts += self.cached_counts
  self.cached_counts = counts[:, -1:, :]

  cumsum_x = torch.cumsum(x.nansum(dim=-1, keepdim=True), dim=1)
  if self.cached_cumsum_x is not None:
+ self.cached_cumsum_x = self.cached_cumsum_x.repeat_interleave(factor, dim=0)
  cumsum_x += self.cached_cumsum_x
  self.cached_cumsum_x = cumsum_x[:, -1:, :]

@@ -149,6 +152,7 @@ class CausalRevIN(nn.Module):

  cumsum_x2 = torch.cumsum((x**2).nansum(dim=-1, keepdim=True), dim=1)
  if self.cached_cumsum_x2 is not None:
+ self.cached_cumsum_x2 = self.cached_cumsum_x2.repeat_interleave(factor, dim=0)
  cumsum_x2 += self.cached_cumsum_x2
  self.cached_cumsum_x2 = cumsum_x2[:, -1:, :]

@@ -161,6 +165,8 @@ class CausalRevIN(nn.Module):
  self.cached_cumsum_x = None
  self.cached_cumsum_x2 = None
  self.cached_counts = None
+ self.cached_mean = None
+ self.cached_std = None

  class ResidualBlock(nn.Module):
  def __init__(self, in_dim, hid_dim, out_dim):
@@ -218,6 +224,9 @@ class MultiHeadAttention(nn.Module):
  if self.k_cache is not None and self.v_cache is not None:
  offset += self.k_cache.size(2)
  is_causal = False
+ factor = q.size(0) // self.k_cache.size(0)
+ self.k_cache = self.k_cache.repeat_interleave(factor, dim=0)
+ self.v_cache = self.v_cache.repeat_interleave(factor, dim=0)
  k = torch.cat([self.k_cache, k], dim=2)
  v = torch.cat([self.v_cache, v], dim=2)

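The `repeat_interleave` calls added to `CausalRevIN` and `MultiHeadAttention` handle the batch growing from B to B × n_quantiles after the first decoding step, while the cached statistics and keys/values were computed with batch B. A small sketch of the expansion (illustrative shapes only):

```python
import torch

# Toy illustration of expanding a cached tensor to match an enlarged batch.
B, n_quantiles, heads, cached_len, head_dim = 2, 3, 4, 5, 8

k_cache = torch.randn(B, heads, cached_len, head_dim)    # cache built with batch B
k_new = torch.randn(B * n_quantiles, heads, 1, head_dim)  # new step, batch B * n_quantiles

factor = k_new.size(0) // k_cache.size(0)                 # = n_quantiles
k_cache = k_cache.repeat_interleave(factor, dim=0)        # each cached row copied n_quantiles times, order preserved
k = torch.cat([k_cache, k_new], dim=2)
print(k.shape)                                            # torch.Size([6, 4, 6, 8])
```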
@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: patchfm
- Version: 2.0.0
+ Version: 2.1.0
  Summary: a Foundation Model for Univariate Time Series Forecasting
  Author-email: Samy-Melwan Vilhes <samy-melwan.vilhes@insa-rouen.fr>
  License: MIT License
@@ -43,37 +43,60 @@ Dynamic: license-file
  # A tutorial on how to build a Foundation Model for Univariate Time Series Forecasting

  [Huggingface Model Card](https://huggingface.co/vilhess/PatchFM)
+ [Paper Model Card](https://github.com/vilhess/PatchFM/blob/main/main.pdf)

- A transformer-based forecasting model for univariate time series. The approach mirrors Large Language Model (LLM) practices (next-token → next-patch) while remaining lightweight compared to a classic LLM and practical.
+ A concise, reproducible recipe for training a transformer-based, patch-to-patch forecasting model for univariate time series. The approach mirrors Large Language Model (LLM) practices (next-token → next-patch) while remaining lightweight compared to a classic LLM and practical.

  ## Highlights
  - Next-patch prediction objective (autoregressive, causal)
  - Patch-based representation of time series (tokens ↔ patches)
  - Causal masking self-attention with RoPE (relative positions)
- - RevIN (Reversible Instance Normalization) with causal statistics
+ - RevIN (Reversible Instance Normalization)
  - SwiGLU feed-forward networks
  - Multi-quantile outputs (median + uncertainty bands)
- - Efficient rollout with KV caching
+ - KV-cache for efficient long-horizon inference
+ - Autoregressive multi-quantile decoding [MOIRAI2.0](https://arxiv.org/pdf/2511.11698) (currently without KV-cache)

- ## Installation
+ ## Quick Start
+
+ ### from source code
+
+ 1. Clone the repository and install dependencies
  ```bash
- pip install patchfm
+ git clone https://github.com/vilhess/PatchFM
+ cd PatchFM
+ pip install -r requirements.txt
  ```
-
- ## Quick Start
+ 2. Run inference with a pretrained model from Huggingface Hub

  ```python
  import torch
- from patchfm import PatchFMConfig, Forecaster
+ from model import Forecaster
+ from configs import PatchFMConfig

  # --- Instantiate model ---
- config = PatchFMConfig()
+ config = PatchFMConfig(load_from_hub=True)
  model = Forecaster(config)

  # --- Inference ---
  forecast_horizon = 64
  seq = torch.randn(1, 1024) # (batch, time)
- pred_median, pred_quantiles = model(seq, forecast_horizon=forecast_horizon, quantiles=[0.1, 0.5, 0.9]) # (batch, forecast_horizon), (batch, forecast_horizon, quantiles)
+ pred_median, pred_quantiles = model(seq, forecast_horizon=forecast_horizon, quantiles=[0.1, 0.5, 0.9], quantile_decoding=True) # (batch, time), (batch, time, quantiles)
+ ```
+
+ ### from pip package
+
+ 1. Install the package from PyPI
+ ```bash
+ pip install patchfm
+ ```
+ 2. Run inference with a pretrained model from Huggingface Hub
+
+ ```python
+ import torch
+ from patchfm import PatchFMConfig, Forecaster
+
+ # same as above
  ```

  We provide an extended quick start example in [notebooks/tutorial.ipynb](./notebooks/tutorial.ipynb).
@@ -85,11 +108,12 @@ If you dont have suitable hardware you can run the the extended quick start exam

  ## Method (TL;DR)
  - Patching: Split a context signal of length $w$ into $P_{num} = w / P_{len}$ patches of length $P_{len}$.
- - RevIN: Normalize patches using causal running mean/variance over past patches, and denormalize outputs to the original scale.
+ - Causal RevIN: Normalize input signal and denormalize outputs to the original scale without statistics leakage.
  - Architecture: Input residual MLP → stacked Transformer blocks (MHA + SwiGLU FFN, pre-norm, residual) → $|\mathcal{Q}|$ output heads mapping back to patch space.
  - Positional encoding: Rotary Position Embeddings (RoPE) applied to queries/keys.
  - Training: Multi-quantile (pinball) loss across positions, elements, and quantiles $\mathcal{Q}$.
- - Inference: Predict next patch; roll out autoregressively with KV caching for long horizons.
+ - Inference: Predict next patch; roll out autoregressively for long horizons.
+ - KV-cache: during inference, cache keys/values to avoid redundant computations.

  ## Problem Formulation
  Given context patches $x_{p_1}, \ldots, x_{p_n}$, predict the next patch $x_{p_{i+1}}$ for each position $i$ using only past patches (causality). The model outputs quantiles $\{\hat{x}_{p_{i+1}}^{(q)}: q \in \mathcal{Q}\}$ with median (q=0.5) as the point forecast.
@@ -118,11 +142,37 @@ Aggregate over positions, patch elements, and quantiles.
  ## Inference
  - Single step: predict next patch ($P_{len}$ values)
  - Long-horizon: append prediction to context and repeat (optionally drop oldest patch to keep window fixed)
- - KV caching: reuse cached keys/values for past patches; compute new Q/K/V only for the appended patch
+
+ ## Datasets
+ - UTSD (Unified Time Series Dataset) [UTSD]: seven domains (Energy, IoT, Nature, Web, Health, Transport, Environment). We work with UTSD-12G (~18M series after preprocessing).
+ - GIFT-Eval pretraining dataset [GIFT]: aligned with the GIFT-Eval dataset but without data leakage issue with the benchmark. The dataset contains approximately 71 univariate and 17 multivariate time series datasets from various
+ domains and various frequencies. After preprocessing, this yields approximately 600K univariate series.
+ - Artificial: ~1M synthetic series (sinusoidal, linear, polynomial, logarithmic) plus mixtures via TSMixup [Chronos]; Gaussian Process samples via KernelSynth (mixtures of RBF/periodic/linear kernels with swept hyperparameters).
+
+ ## Repository Layout
+
+ - `model/training/` — main PatchFM model class
+
+ - `modules.py` - core modules (Residual Layers, MHA, SwiGLU, RoPE, Transformer Encoder, ...)
+ - `revin.py` — causal RevIN
+ - `loss.py` — multi-quantile (pinball) loss
+ - `trainer.py` — PyTorch Lightning trainer class
+
+ - `model/inference/` — main PatchFM model class for inference
+ - `modules.py` — core modules with caching support
+ - `forecaster.py` — Forecasting model and rollout logic
+
+ - `dataset/` — data loading and preprocessing
+ - `artificial.py` — synthetic dataset : artificial signals + TSMixup + KernelSynth
+ - `utsd.py` — Unified Time Series Dataset (UTSD) loading and preprocessing
+ - `gift.py` — GIFT-Eval pretraining dataset loading and preprocessing
+ - `get_data.py` — utility to fetch and preprocess datasets
+ - `generate_data.py` — utility to generate and save the KernelSynth dataset (long to generate)
+
+ - `configs/` — model and training configurations
+ - `notebooks/inference` — how to load a trained model and generate forecasts
+ - `training.py` — training script using PyTorch Lightning

  ## Acknowledgements
  We thank the authors of the following repositories for inspiration and code snippets:
  - [TiRex](https://github.com/NX-AI/tirex)
-
- ## Citation
- If you use this work, please cite the paper ...
patchfm-2.0.0/README.md DELETED
@@ -1,86 +0,0 @@
- # A tutorial on how to build a Foundation Model for Univariate Time Series Forecasting
-
- [Huggingface Model Card](https://huggingface.co/vilhess/PatchFM)
-
- A transformer-based forecasting model for univariate time series. The approach mirrors Large Language Model (LLM) practices (next-token → next-patch) while remaining lightweight compared to a classic LLM and practical.
-
- ## Highlights
- - Next-patch prediction objective (autoregressive, causal)
- - Patch-based representation of time series (tokens ↔ patches)
- - Causal masking self-attention with RoPE (relative positions)
- - RevIN (Reversible Instance Normalization) with causal statistics
- - SwiGLU feed-forward networks
- - Multi-quantile outputs (median + uncertainty bands)
- - Efficient rollout with KV caching
-
- ## Installation
- ```bash
- pip install patchfm
- ```
-
- ## Quick Start
-
- ```python
- import torch
- from patchfm import PatchFMConfig, Forecaster
-
- # --- Instantiate model ---
- config = PatchFMConfig()
- model = Forecaster(config)
-
- # --- Inference ---
- forecast_horizon = 64
- seq = torch.randn(1, 1024) # (batch, time)
- pred_median, pred_quantiles = model(seq, forecast_horizon=forecast_horizon, quantiles=[0.1, 0.5, 0.9]) # (batch, forecast_horizon), (batch, forecast_horizon, quantiles)
- ```
-
- We provide an extended quick start example in [notebooks/tutorial.ipynb](./notebooks/tutorial.ipynb).
- If you dont have suitable hardware you can run the the extended quick start example example also in Google Colab:
-
- <a target="_blank" href="https://colab.research.google.com/drive/17sdf-7luCkv5TaeLj3Z6kIaTDkwkz3VR?usp=share_link">
- <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open Quick Start In Colab"/>
- </a>
-
- ## Method (TL;DR)
- - Patching: Split a context signal of length $w$ into $P_{num} = w / P_{len}$ patches of length $P_{len}$.
- - RevIN: Normalize patches using causal running mean/variance over past patches, and denormalize outputs to the original scale.
- - Architecture: Input residual MLP → stacked Transformer blocks (MHA + SwiGLU FFN, pre-norm, residual) → $|\mathcal{Q}|$ output heads mapping back to patch space.
- - Positional encoding: Rotary Position Embeddings (RoPE) applied to queries/keys.
- - Training: Multi-quantile (pinball) loss across positions, elements, and quantiles $\mathcal{Q}$.
- - Inference: Predict next patch; roll out autoregressively with KV caching for long horizons.
-
- ## Problem Formulation
- Given context patches $x_{p_1}, \ldots, x_{p_n}$, predict the next patch $x_{p_{i+1}}$ for each position $i$ using only past patches (causality). The model outputs quantiles $\{\hat{x}_{p_{i+1}}^{(q)}: q \in \mathcal{Q}\}$ with median (q=0.5) as the point forecast.
-
- ## Loss: Multi-Quantile (Pinball)
- For residual $u = x - \hat{x}^{(q)}$:
- $$\rho_q(u) = \begin{cases} q\,u, & u \ge 0,\\ (q-1)\,u, & u < 0. \end{cases}$$
- Aggregate over positions, patch elements, and quantiles.
-
- ## Architecture
- - Input MLP: $\mathbb{R}^{P_{len}} \to \mathbb{R}^{dim}$ residual 2-layer MLP (ReLU)
- - Multi-Head Attention: causal mask, RoPE; queries/keys/values per head
- - FFN: SwiGLU (SiLU-gated), pre-norm + residual
- - Output heads: |Q| linear maps $\mathbb{R}^{dim} \to \mathbb{R}^{P_{len}}$ (one per quantile)
-
- ### Model Details
- - Patch size: 32
- - Max context: 32 patches (1024 steps)
- - Forecast horizon: 32 steps per forward pass
- - Quantiles $\mathcal{Q}$: {0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9}
- - Layers: 6
- - Attention heads: 64 (head dim 32)
- - Model dim: 2048
- - Parameters: ~300M
-
- ## Inference
- - Single step: predict next patch ($P_{len}$ values)
- - Long-horizon: append prediction to context and repeat (optionally drop oldest patch to keep window fixed)
- - KV caching: reuse cached keys/values for past patches; compute new Q/K/V only for the appended patch
-
- ## Acknowledgements
- We thank the authors of the following repositories for inspiration and code snippets:
- - [TiRex](https://github.com/NX-AI/tirex)
-
- ## Citation
- If you use this work, please cite the paper ...
File without changes
File without changes
File without changes