patchfm 1.1.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
patchfm-1.1.0/LICENSE ADDED
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2025 Samy-Melwan Vilhes
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
patchfm-1.1.0/PKG-INFO ADDED
@@ -0,0 +1,128 @@
1
+ Metadata-Version: 2.4
2
+ Name: patchfm
3
+ Version: 1.1.0
4
+ Summary: a Foundation Model for Univariate Time Series Forecasting
5
+ Author-email: Samy-Melwan Vilhes <samy-melwan.vilhes@insa-rouen.fr>
6
+ License: MIT License
7
+
8
+ Copyright (c) 2025 Samy-Melwan Vilhes
9
+
10
+ Permission is hereby granted, free of charge, to any person obtaining a copy
11
+ of this software and associated documentation files (the "Software"), to deal
12
+ in the Software without restriction, including without limitation the rights
13
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
14
+ copies of the Software, and to permit persons to whom the Software is
15
+ furnished to do so, subject to the following conditions:
16
+
17
+ The above copyright notice and this permission notice shall be included in all
18
+ copies or substantial portions of the Software.
19
+
20
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
21
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
22
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
23
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
24
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
25
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
26
+ SOFTWARE.
27
+ Project-URL: Repository, https://github.com/vilhess/PatchFM
28
+ Project-URL: Issues, https://github.com/vilhess/PatchFM/issues
29
+ Keywords: Transformer,LLM,Time Series,Zero-shot,Deep Learning
30
+ Classifier: Programming Language :: Python :: 3
31
+ Classifier: Operating System :: OS Independent
32
+ Requires-Python: >=3.11
33
+ Description-Content-Type: text/markdown
34
+ License-File: LICENSE
35
+ Requires-Dist: torch>=2.5.0
36
+ Requires-Dist: einops>=0.8.1
37
+ Requires-Dist: huggingface-hub>=0.35.1
38
+ Requires-Dist: rotary-embedding-torch>=0.8.9
39
+ Requires-Dist: numpy>=1.26.0
40
+ Dynamic: license-file
41
+
42
+ # A tutorial on how to build a Foundation Model for Univariate Time Series Forecasting
43
+
44
+ [Huggingface Model Card](https://huggingface.co/vilhess/PatchFM)
45
+
46
+ A transformer-based forecasting model for univariate time series. The approach mirrors Large Language Model (LLM) practice (next-token prediction becomes next-patch prediction) while remaining lightweight and practical compared to a classic LLM.
47
+
48
+ ## Highlights
49
+ - Next-patch prediction objective (autoregressive, causal)
50
+ - Patch-based representation of time series (tokens ↔ patches)
51
+ - Causal masking self-attention with RoPE (relative positions)
52
+ - RevIN (Reversible Instance Normalization) with causal statistics
53
+ - SwiGLU feed-forward networks
54
+ - Multi-quantile outputs (median + uncertainty bands)
55
+ - Efficient rollout with KV caching
56
+
57
+ ## Installation
58
+ ```bash
59
+ pip install patchfm
60
+ ```
61
+
62
+ ## Quick Start
63
+
64
+ ```python
65
+ import torch
66
+ from patchfm.configs import PatchFMConfig
67
+ from patchfm.model import Forecaster
68
+
69
+ # --- Instantiate model ---
70
+ config = PatchFMConfig()
71
+ model = Forecaster(config)
72
+
73
+ # --- Inference ---
74
+ forecast_horizon = 64
75
+ seq = torch.randn(1, 1024) # (batch, time)
76
+ pred_median, pred_quantiles = model(seq, forecast_horizon=forecast_horizon, quantiles=[0.1, 0.5, 0.9]) # (batch, horizon), (batch, horizon, quantiles)
77
+ ```
78
+
79
+ We provide an extended quick start example in [notebooks/tutorial.ipynb](./notebooks/tutorial.ipynb).
80
+ If you don't have suitable hardware, you can also run the extended quick start example in Google Colab:
81
+
82
+ <a target="_blank" href="https://colab.research.google.com/drive/17sdf-7luCkv5TaeLj3Z6kIaTDkwkz3VR?usp=share_link">
83
+ <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open Quick Start In Colab"/>
84
+ </a>
85
+
86
+ ## Method (TL;DR)
87
+ - Patching: Split a context signal of length $w$ into $P_{num} = w / P_{len}$ patches of length $P_{len}$.
88
+ - RevIN: Normalize patches using causal running mean/variance over past patches, and denormalize outputs to the original scale.
89
+ - Architecture: Input residual MLP → stacked Transformer blocks (MHA + SwiGLU FFN, pre-norm, residual) → $|\mathcal{Q}|$ output heads mapping back to patch space.
90
+ - Positional encoding: Rotary Position Embeddings (RoPE) applied to queries/keys.
91
+ - Training: Multi-quantile (pinball) loss across positions, elements, and quantiles $\mathcal{Q}$.
92
+ - Inference: Predict next patch; roll out autoregressively with KV caching for long horizons.
93
+
94
+ ## Problem Formulation
95
+ Given context patches $x_{p_1}, \ldots, x_{p_n}$, predict the next patch $x_{p_{i+1}}$ for each position $i$ using only past patches (causality). The model outputs quantiles $\{\hat{x}_{p_{i+1}}^{(q)}: q \in \mathcal{Q}\}$ with median (q=0.5) as the point forecast.
96
+
97
+ ## Loss: Multi-Quantile (Pinball)
98
+ For residual $u = x - \hat{x}^{(q)}$:
99
+ $$\rho_q(u) = \begin{cases} q\,u, & u \ge 0,\\ (q-1)\,u, & u < 0. \end{cases}$$
100
+ Aggregate over positions, patch elements, and quantiles.
101
+
102
+ ## Architecture
103
+ - Input MLP: $\mathbb{R}^{P_{len}} \to \mathbb{R}^{dim}$ residual 2-layer MLP (ReLU)
104
+ - Multi-Head Attention: causal mask, RoPE; queries/keys/values per head
105
+ - FFN: SwiGLU (SiLU-gated), pre-norm + residual
106
+ - Output heads: |Q| linear maps $\mathbb{R}^{dim} \to \mathbb{R}^{P_{len}}$ (one per quantile)
107
+
108
+ ### Model Details
109
+ - Patch size: 32
110
+ - Max context: 32 patches (1024 steps)
111
+ - Forecast horizon: 32 steps per forward pass
112
+ - Quantiles $\mathcal{Q}$: {0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9}
113
+ - Layers: 6
114
+ - Attention heads: 64 (head dim 32)
115
+ - Model dim: 2048
116
+ - Parameters: ~300M
117
+
118
+ ## Inference
119
+ - Single step: predict next patch ($P_{len}$ values)
120
+ - Long-horizon: append prediction to context and repeat (optionally drop oldest patch to keep window fixed)
121
+ - KV caching: reuse cached keys/values for past patches; compute new Q/K/V only for the appended patch
122
+
123
+ ## Acknowledgements
124
+ We thank the authors of the following repositories for inspiration and code snippets:
125
+ - [TiRex](https://github.com/NX-AI/tirex)
126
+
127
+ ## Citation
128
+ If you use this work, please cite the paper ...
@@ -0,0 +1,87 @@
1
+ # A tutorial on how to build a Foundation Model for Univariate Time Series Forecasting
2
+
3
+ [Huggingface Model Card](https://huggingface.co/vilhess/PatchFM)
4
+
5
+ A transformer-based forecasting model for univariate time series. The approach mirrors Large Language Model (LLM) practice (next-token prediction becomes next-patch prediction) while remaining lightweight and practical compared to a classic LLM.
6
+
7
+ ## Highlights
8
+ - Next-patch prediction objective (autoregressive, causal)
9
+ - Patch-based representation of time series (tokens ↔ patches)
10
+ - Causal masking self-attention with RoPE (relative positions)
11
+ - RevIN (Reversible Instance Normalization) with causal statistics
12
+ - SwiGLU feed-forward networks
13
+ - Multi-quantile outputs (median + uncertainty bands)
14
+ - Efficient rollout with KV caching
15
+
16
+ ## Installation
17
+ ```bash
18
+ pip install patchfm
19
+ ```
20
+
21
+ ## Quick Start
22
+
23
+ ```python
24
+ import torch
25
+ from patchfm.configs import PatchFMConfig
26
+ from patchfm.model import Forecaster
27
+
28
+ # --- Instantiate model ---
29
+ config = PatchFMConfig()
30
+ model = Forecaster(config)
31
+
32
+ # --- Inference ---
33
+ forecast_horizon = 64
34
+ seq = torch.randn(1, 1024) # (batch, time)
35
+ pred_median, pred_quantiles = model(seq, forecast_horizon=forecast_horizon, quantiles=[0.1, 0.5, 0.9]) # (batch, horizon), (batch, horizon, quantiles)
36
+ ```
37
+
38
+ We provide an extended quick start example in [notebooks/tutorial.ipynb](./notebooks/tutorial.ipynb).
39
+ If you don't have suitable hardware, you can also run the extended quick start example in Google Colab:
40
+
41
+ <a target="_blank" href="https://colab.research.google.com/drive/17sdf-7luCkv5TaeLj3Z6kIaTDkwkz3VR?usp=share_link">
42
+ <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open Quick Start In Colab"/>
43
+ </a>
44
+
45
+ ## Method (TL;DR)
46
+ - Patching: Split a context signal of length $w$ into $P_{num} = w / P_{len}$ patches of length $P_{len}$.
47
+ - RevIN: Normalize patches using causal running mean/variance over past patches, and denormalize outputs to the original scale.
48
+ - Architecture: Input residual MLP → stacked Transformer blocks (MHA + SwiGLU FFN, pre-norm, residual) → $|\mathcal{Q}|$ output heads mapping back to patch space.
49
+ - Positional encoding: Rotary Position Embeddings (RoPE) applied to queries/keys.
50
+ - Training: Multi-quantile (pinball) loss across positions, elements, and quantiles $\mathcal{Q}$.
51
+ - Inference: Predict next patch; roll out autoregressively with KV caching for long horizons.
52
+
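The patching step is just a reshape; a minimal standalone sketch (not the package's own code, which uses `einops`), with the default $P_{len} = 32$:

```python
import torch

patch_len = 32           # P_len
w = 1024                 # context length, divisible by patch_len

seq = torch.randn(2, w)  # (batch, w)

# Split each length-w signal into w // patch_len contiguous patches
patches = seq.reshape(seq.shape[0], w // patch_len, patch_len)
print(patches.shape)  # torch.Size([2, 32, 32])
```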
53
+ ## Problem Formulation
54
+ Given context patches $x_{p_1}, \ldots, x_{p_n}$, predict the next patch $x_{p_{i+1}}$ for each position $i$ using only past patches (causality). The model outputs quantiles $\{\hat{x}_{p_{i+1}}^{(q)}: q \in \mathcal{Q}\}$ with median (q=0.5) as the point forecast.
55
+
56
+ ## Loss: Multi-Quantile (Pinball)
57
+ For residual $u = x - \hat{x}^{(q)}$:
58
+ $$\rho_q(u) = \begin{cases} q\,u, & u \ge 0,\\ (q-1)\,u, & u < 0. \end{cases}$$
59
+ Aggregate over positions, patch elements, and quantiles.
60
+
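The piecewise definition above translates directly into code; a minimal sketch of the per-element pinball loss (an illustration, not the package's training code):

```python
import torch

def pinball_loss(target: torch.Tensor, pred: torch.Tensor, q: float) -> torch.Tensor:
    """rho_q(u) with u = target - pred: q*u when u >= 0, (q-1)*u when u < 0."""
    u = target - pred
    return torch.where(u >= 0, q * u, (q - 1) * u)

target = torch.tensor([0.0, 1.0])
pred = torch.tensor([1.0, 0.0])
loss = pinball_loss(target, pred, q=0.1)
# loss ≈ [0.9, 0.1]: for a low quantile, over-predicting costs (1-q)|u|,
# under-predicting only q|u|, which pushes the head toward the 0.1 quantile
```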
61
+ ## Architecture
62
+ - Input MLP: $\mathbb{R}^{P_{len}} \to \mathbb{R}^{dim}$ residual 2-layer MLP (ReLU)
63
+ - Multi-Head Attention: causal mask, RoPE; queries/keys/values per head
64
+ - FFN: SwiGLU (SiLU-gated), pre-norm + residual
65
+ - Output heads: |Q| linear maps $\mathbb{R}^{dim} \to \mathbb{R}^{P_{len}}$ (one per quantile)
66
+
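For reference, the SiLU-gated FFN mentioned above looks like this (a minimal sketch with illustrative dimensions; the package's own implementation additionally rounds the hidden size to a multiple of 256):

```python
import torch
import torch.nn as nn

class SwiGLU(nn.Module):
    """SwiGLU feed-forward: w2(silu(w1(x)) * w3(x)), no biases."""
    def __init__(self, d_model: int, hidden_dim: int):
        super().__init__()
        self.w1 = nn.Linear(d_model, hidden_dim, bias=False)  # gate branch
        self.w2 = nn.Linear(hidden_dim, d_model, bias=False)  # down projection
        self.w3 = nn.Linear(d_model, hidden_dim, bias=False)  # value branch
        self.act = nn.SiLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.w2(self.act(self.w1(x)) * self.w3(x))

ffn = SwiGLU(d_model=64, hidden_dim=128)
out = ffn(torch.randn(2, 8, 64))  # shape preserved: (batch, patches, d_model)
```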
67
+ ### Model Details
68
+ - Patch size: 32
69
+ - Max context: 32 patches (1024 steps)
70
+ - Forecast horizon: 32 steps per forward pass
71
+ - Quantiles $\mathcal{Q}$: {0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9}
72
+ - Layers: 6
73
+ - Attention heads: 64 (head dim 32)
74
+ - Model dim: 2048
75
+ - Parameters: ~300M
76
+
77
+ ## Inference
78
+ - Single step: predict next patch ($P_{len}$ values)
79
+ - Long-horizon: append prediction to context and repeat (optionally drop oldest patch to keep window fixed)
80
+ - KV caching: reuse cached keys/values for past patches; compute new Q/K/V only for the appended patch
81
+
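The long-horizon rollout can be sketched as below. `model_step` is a hypothetical stand-in for one forward pass returning the next patch (the real model also maintains a KV cache, elided here):

```python
import torch

patch_len = 32

def model_step(context: torch.Tensor) -> torch.Tensor:
    """Hypothetical one-step predictor: (batch, w) -> next patch (batch, patch_len).
    A real forward pass would run the transformer; here we just repeat the last value."""
    return context[:, -1:].repeat(1, patch_len)

def rollout(context: torch.Tensor, horizon: int) -> torch.Tensor:
    preds = []
    steps = -(-horizon // patch_len)  # ceil division, as in the package
    for _ in range(steps):
        next_patch = model_step(context)
        preds.append(next_patch)
        # Append the prediction and drop the oldest patch to keep the window fixed
        context = torch.cat([context[:, patch_len:], next_patch], dim=1)
    return torch.cat(preds, dim=1)[:, :horizon]

forecast = rollout(torch.randn(1, 1024), horizon=100)  # (1, 100)
```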
82
+ ## Acknowledgements
83
+ We thank the authors of the following repositories for inspiration and code snippets:
84
+ - [TiRex](https://github.com/NX-AI/tirex)
85
+
86
+ ## Citation
87
+ If you use this work, please cite the paper ...
@@ -0,0 +1,30 @@
1
+ [project]
2
+ name = "patchfm"
3
+ version = "1.1.0"
4
+ authors = [
5
+ { name="Samy-Melwan Vilhes", email="samy-melwan.vilhes@insa-rouen.fr" },
6
+ ]
7
+ description = "a Foundation Model for Univariate Time Series Forecasting"
8
+ readme = "README.md"
9
+ license = {file="LICENSE"}
10
+ requires-python = ">=3.11"
11
+ classifiers = [
12
+ "Programming Language :: Python :: 3",
13
+ "Operating System :: OS Independent",
14
+ ]
15
+ keywords = ["Transformer", "LLM", "Time Series", "Zero-shot", "Deep Learning"]
16
+ dependencies = [
17
+ "torch>=2.5.0",
18
+ "einops>=0.8.1",
19
+ "huggingface-hub>=0.35.1",
20
+ "rotary-embedding-torch>=0.8.9",
21
+ "numpy>=1.26.0"
22
+ ]
23
+
24
+ [project.urls]
25
+ Repository = "https://github.com/vilhess/PatchFM"
26
+ Issues = "https://github.com/vilhess/PatchFM/issues"
27
+
28
+ [build-system]
29
+ requires = ["setuptools >= 77.0.3"]
30
+ build-backend = "setuptools.build_meta"
@@ -0,0 +1,4 @@
1
+ [egg_info]
2
+ tag_build =
3
+ tag_date = 0
4
+
@@ -0,0 +1,2 @@
1
+ from patchfm.inference.forecaster import Forecaster
2
+ from patchfm.configs.model_config import PatchFMConfig
@@ -0,0 +1,21 @@
1
+ from dataclasses import dataclass, field, asdict
2
+
3
+ @dataclass
4
+ class PatchFMConfig:
5
+ max_seq_len: int = 1024
6
+ patch_len: int = 32
7
+ d_model: int = 2048
8
+ n_heads: int = 64
9
+ n_layers_encoder: int = 6
10
+ quantiles: list[float] = field(default_factory=lambda: [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9])
11
+
12
+ compile: bool = True
13
+
14
+ def __getitem__(self, key):
15
+ return getattr(self, key)
16
+
17
+ def __setitem__(self, key, value):
18
+ return setattr(self, key, value)
19
+
20
+ def to_dict(self):
21
+ return asdict(self)
@@ -0,0 +1,129 @@
1
+ import torch
2
+ import torch.nn as nn
3
+ from einops import rearrange
4
+ from patchfm.inference.modules import RevIN, ResidualBlock, TransformerEncoder, PatchFM
5
+
6
+
7
+ # --- Forecaster Model ---
8
+ class Forecaster(nn.Module):
9
+ def __init__(self, config):
10
+ super().__init__()
11
+
12
+ # Store config
13
+ self.max_seq_len = config["max_seq_len"]
14
+ self.patch_len = config["patch_len"]
15
+ self.d_model = config["d_model"]
16
+ self.n_heads = config["n_heads"]
17
+ self.n_layers_encoder = config["n_layers_encoder"]
18
+ self.quantiles = config["quantiles"]
19
+ self.n_quantiles = len(self.quantiles)
20
+
21
+ print("Loading base model from HuggingFace Hub...")
22
+ base_model = PatchFM.from_pretrained("vilhess/PatchFM")
23
+ self._init_from_base(base_model)
24
+
25
+ self.eval()
26
+ self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
27
+ self.to(self.device)
28
+
29
+ if config["compile"]:
30
+ # Rebinding `self` inside __init__ is a no-op for callers; compile the encoder submodule instead
+ self.transformer_encoder = torch.compile(self.transformer_encoder)
31
+
32
+ def _init_components(self):
33
+ """Initialize modules from scratch."""
34
+ self.revin = RevIN()
35
+ self.proj_embedding = ResidualBlock(
36
+ in_dim=self.patch_len,
37
+ hid_dim=2 * self.patch_len,
38
+ out_dim=self.d_model
39
+ )
40
+ self.transformer_encoder = TransformerEncoder(
41
+ d_model=self.d_model,
42
+ n_heads=self.n_heads,
43
+ n_layers=self.n_layers_encoder
44
+ )
45
+ self.proj_output = ResidualBlock(
46
+ in_dim=self.d_model,
47
+ hid_dim=2 * self.d_model,
48
+ out_dim=self.patch_len * self.n_quantiles
49
+ )
50
+
51
+ def _init_from_base(self, base_model):
52
+ """Initialize modules by reusing a pretrained PatchFM model."""
53
+ self.revin = base_model.revin
54
+ self.proj_embedding = base_model.proj_embedding
55
+ self.transformer_encoder = base_model.transformer_encoder
56
+ self.proj_output = base_model.proj_output
57
+
58
+ @torch.inference_mode()
59
+ def forecast(self, x: torch.Tensor, forecast_horizon: int | None = None, quantiles: list[float] | None = None) -> tuple[torch.Tensor, torch.Tensor]:
60
+ x = x.to(self.device)
61
+ # Ensure input shape (bs, length)
62
+ if x.ndim != 2:
63
+ x = x.unsqueeze(0)
64
+ bs, ws = x.size()
65
+
66
+ if ws > self.max_seq_len:
67
+ print(f"Warning: Input length {ws} exceeds max_seq_len {self.max_seq_len}. Truncating input.")
68
+ x = x[:, -self.max_seq_len:]
69
+ ws = self.max_seq_len
70
+
71
+ # Pad so length is divisible by patch_len
72
+ pad = (self.patch_len - ws % self.patch_len) % self.patch_len
73
+ if pad > 0:
74
+ x = torch.cat([x[:, :1].repeat(1, pad), x], dim=1)
75
+
76
+ # Default horizon = patch_len
77
+ forecast_horizon = forecast_horizon or self.patch_len
78
+
79
+ # Reshape into patches
80
+ x = rearrange(x, "b (pn pl) -> b pn pl", pl=self.patch_len)
81
+
82
+ rollouts = -(-forecast_horizon // self.patch_len) # ceil division
83
+ predictions = []
84
+
85
+ for _ in range(rollouts):
86
+
87
+ # Forward pass
88
+ x = self.revin(x, mode="norm")
89
+ x = self.proj_embedding(x)
90
+ x = self.transformer_encoder(x)
91
+ x = x[:, -1:, :] # Keep only the last patch for autoregressive forecasting
92
+ forecasting = self.proj_output(x)
93
+ forecasting = self.revin(forecasting, mode="denorm_last")
94
+
95
+ # Reshape to (bs, patch_num, patch_len, n_quantiles)
96
+ forecasting = rearrange(
97
+ forecasting, "b 1 (pl q) -> b 1 pl q",
98
+ pl=self.patch_len, q=self.n_quantiles
99
+ )
100
+
101
+ # Take median quantile (index 4)
102
+ patch_median = forecasting[:, -1:, :, 4].detach()
103
+ predictions.append(forecasting[:, -1, :, :])
104
+
105
+ # Append median patch for next rollout
106
+ x = patch_median.clone()
107
+
108
+ pred_quantiles = torch.cat(predictions, dim=1)
109
+ pred_quantiles = pred_quantiles[:, :forecast_horizon, :]
110
+ pred_median = pred_quantiles[:, :, 4]
111
+
112
+ pred_quantiles = pred_quantiles[..., [self.quantiles.index(q) for q in quantiles]] if quantiles is not None else pred_quantiles
113
+
114
+ self.clear_cache()
115
+
116
+ if torch.any(torch.isnan(pred_median)) or torch.any(torch.isinf(pred_median)):
117
+ print("Warning: NaN or Inf values detected in predictions. Returning zeros.")
118
+ pred_median = torch.zeros_like(pred_median)
119
+ pred_quantiles = torch.zeros_like(pred_quantiles)
120
+
121
+ return pred_median, pred_quantiles
122
+
123
+ def __call__(self, context: torch.Tensor, forecast_horizon: int | None = None, quantiles: list[float] | None = None) -> tuple[torch.Tensor, torch.Tensor]:
124
+ return self.forecast(context, forecast_horizon, quantiles)
125
+
126
+ def clear_cache(self):
127
+ self.revin.clear_cache()
128
+ for layer in self.transformer_encoder.layers:
129
+ layer.attn.clear_cache()
@@ -0,0 +1,251 @@
1
+ # Modules efficient for inference with caching
2
+
3
+ import torch
4
+ import torch.nn as nn
5
+ from einops import rearrange
6
+ from rotary_embedding_torch import RotaryEmbedding
7
+ from huggingface_hub import PyTorchModelHubMixin
8
+
9
+ def fill_nan_with_last_observed(x):
10
+ bs, pn, pl = x.size()
11
+ x = rearrange(x, "b pn pl -> (b pn) pl")
12
+ valid_mask = ~torch.isnan(x)
13
+ x_temp = torch.where(valid_mask, x, torch.zeros_like(x))
14
+ seq_indices = torch.arange(x.size(-1), device=x.device).unsqueeze(0)
15
+ valid_indices = torch.where(valid_mask, seq_indices, torch.tensor(-1, device=x.device))
16
+ last_valid_idx = torch.cummax(valid_indices, dim=-1)[0]
17
+ x = x_temp.gather(-1, torch.clamp(last_valid_idx, min=0))
18
+ x = rearrange(x, "(b pn) pl -> b pn pl", b=bs)
19
+ return x
20
+
21
+ class RevIN(nn.Module):
22
+ def __init__(self, eps=1e-5):
23
+ super().__init__()
24
+ self.eps = eps
25
+ self.cached_mean = None
26
+ self.cached_std = None
27
+
28
+ self.cached_cumsum_x = None
29
+ self.cached_cumsum_x2 = None
30
+ self.cached_counts = None
31
+
32
+ def forward(self, x, mode):
33
+ assert x.dim() == 3, "Input tensor must be (batch, n_patches, patch_len)"
34
+
35
+ x64 = x.double()
36
+
37
+ if mode == "norm":
38
+ mean, std = self._get_statistics(x64)
39
+ self.cached_mean, self.cached_std = mean[:, -1:].detach(), std[:, -1:].detach()
40
+ out = (x64 - mean) / std
41
+
42
+ nan_idx = out.isnan()
43
+ if nan_idx.any():
44
+ out = fill_nan_with_last_observed(out)
45
+
46
+ elif mode == "denorm_last":
47
+ assert self.cached_mean is not None and self.cached_std is not None, \
48
+ "Call forward(..., 'norm') before 'denorm_last'"
49
+ out = x64 * self.cached_std + self.cached_mean
50
+
51
+ else:
52
+ raise NotImplementedError(f"Mode '{mode}' not implemented.")
53
+
54
+ return out.float()
55
+
56
+ def _get_statistics(self, x):
57
+ """
58
+ Numerically stable mean and variance computation using
59
+ incremental mean and variance along the patch dimension.
60
+ x: (B, P, L) float64
61
+ Returns: mean, std (both (B, P, 1))
62
+ """
63
+ B, P, L = x.shape
64
+
65
+ nan_counts = torch.isnan(x).sum(-1, keepdim=True)
66
+ nan_counts = torch.cumsum(nan_counts, dim=1)
67
+
68
+ counts = torch.arange(1, P+1, device=x.device).view(1, P, 1).repeat(B, 1, 1) * L
69
+ counts = counts - nan_counts
70
+
71
+ if self.cached_counts is not None:
72
+ counts += self.cached_counts
73
+ self.cached_counts = counts[:, -1:, :]
74
+
75
+ cumsum_x = torch.cumsum(x.nansum(dim=-1, keepdim=True), dim=1)
76
+ if self.cached_cumsum_x is not None:
77
+ cumsum_x += self.cached_cumsum_x
78
+ self.cached_cumsum_x = cumsum_x[:, -1:, :]
79
+
80
+ mean = cumsum_x / counts
81
+
82
+ cumsum_x2 = torch.cumsum((x**2).nansum(dim=-1, keepdim=True), dim=1)
83
+ if self.cached_cumsum_x2 is not None:
84
+ cumsum_x2 += self.cached_cumsum_x2
85
+ self.cached_cumsum_x2 = cumsum_x2[:, -1:, :]
86
+
87
+ var = (cumsum_x2 - 2 * mean * cumsum_x + counts * mean**2) / counts
88
+ std = torch.sqrt(var + self.eps)
89
+
90
+ return mean, std
91
+
92
+ def clear_cache(self):
93
+ self.cached_cumsum_x = None
94
+ self.cached_cumsum_x2 = None
95
+ self.cached_counts = None
96
+
97
+
98
+ class ResidualBlock(nn.Module):
99
+ def __init__(self, in_dim, hid_dim, out_dim):
100
+ super().__init__()
101
+ self.hidden_layer = nn.Linear(in_dim, hid_dim)
102
+ self.output_layer = nn.Linear(hid_dim, out_dim)
103
+ self.residual_layer = nn.Linear(in_dim, out_dim)
104
+ self.act = nn.ReLU()
105
+
106
+ def forward(self, x):
107
+ hid = self.act(self.hidden_layer(x))
108
+ out = self.output_layer(hid)
109
+ res = self.residual_layer(x)
110
+ out = out+res
111
+ return out
112
+
113
+ class MultiHeadAttention(nn.Module):
114
+ def __init__(self, d_model, n_heads, last=False):
115
+ super().__init__()
116
+ assert d_model%n_heads==0, f"d_model ({d_model}) must be divisible by n_heads ({n_heads})"
117
+
118
+ self.WQ = nn.Linear(d_model, d_model)
119
+ self.WK = nn.Linear(d_model, d_model)
120
+ self.WV = nn.Linear(d_model, d_model)
121
+
122
+ self.out_proj = nn.Linear(d_model, d_model)
123
+
124
+ self.head_dim = d_model//n_heads
125
+ self.n_heads = n_heads
126
+
127
+ self.rope = RotaryEmbedding(dim=self.head_dim//2)
128
+
129
+ self.k_cache = None
130
+ self.v_cache = None
131
+
132
+ self.last = last
133
+
134
+ def forward(self, q):
135
+ bs, context, dim = q.size()
136
+ offset = 0
137
+ is_causal = True
138
+
139
+ k = q
140
+ v = q
141
+
142
+ if self.last:
143
+ q = q[:, -1:, :]
144
+ is_causal = False
145
+ offset += (context - 1)
146
+
147
+ q = self.WQ(q).reshape(bs, -1, self.n_heads, self.head_dim).transpose(1, 2)
148
+ k = self.WK(k).reshape(bs, -1, self.n_heads, self.head_dim).transpose(1, 2)
149
+ v = self.WV(v).reshape(bs, -1, self.n_heads, self.head_dim).transpose(1, 2)
150
+
151
+ if self.k_cache is not None and self.v_cache is not None:
152
+ offset += self.k_cache.size(2)
153
+ is_causal = False
154
+ k = torch.cat([self.k_cache, k], dim=2)
155
+ v = torch.cat([self.v_cache, v], dim=2)
156
+
157
+ self.k_cache = k
158
+ self.v_cache = v
159
+
160
+ q = self.rope.rotate_queries_or_keys(q, offset=offset)
161
+ k = self.rope.rotate_queries_or_keys(k)
162
+
163
+ values = nn.functional.scaled_dot_product_attention(q, k, v, is_causal=is_causal)
164
+
165
+ values = values.transpose(1, 2).reshape(bs, -1, dim)
166
+ values = self.out_proj(values)
167
+ return values
168
+
169
+ def clear_cache(self):
170
+ self.k_cache = None
171
+ self.v_cache = None
172
+
173
+ class FeedForward(nn.Module):
174
+ def __init__(self, d_model, multiple_of=256):
175
+ super().__init__()
176
+
177
+ hidden_dim = d_model*4
178
+ hidden_dim = int(2 * hidden_dim / 3)
179
+ hidden_dim = multiple_of * ((hidden_dim + multiple_of - 1) // multiple_of)
180
+
181
+ self.w1 = nn.Linear(d_model, hidden_dim, bias=False)
182
+ self.w2 = nn.Linear(hidden_dim, d_model, bias=False)
183
+ self.w3 = nn.Linear(d_model, hidden_dim, bias=False)
184
+
185
+ self.act = nn.SiLU()
186
+
187
+ def forward(self, x):
188
+ x = self.w2(self.act(self.w1(x)) * self.w3(x))
189
+ return x
190
+
191
+
192
+ class TransformerEncoderLayer(nn.Module):
193
+ def __init__(self, d_model, n_heads, last=False):
194
+ super().__init__()
195
+ self.ln1 = nn.LayerNorm(d_model)
196
+ self.attn = MultiHeadAttention(d_model=d_model, n_heads=n_heads, last=last)
197
+ self.ln2 = nn.LayerNorm(d_model)
198
+ self.ff = FeedForward(d_model=d_model)
199
+
200
+ def forward(self, x):
201
+ out_attn = self.attn(self.ln1((x)))
202
+ x = x + out_attn
203
+ out = x + self.ff(self.ln2(x))
204
+ return out
205
+
206
+ class TransformerEncoder(nn.Module):
207
+ def __init__(self, d_model, n_heads, n_layers):
208
+ super().__init__()
209
+ self.layers = nn.ModuleList(
210
+ [
211
+ TransformerEncoderLayer(d_model=d_model, n_heads=n_heads)
212
+ for _ in range(n_layers-1)
213
+ ]
214
+ )
215
+ self.layers.append(TransformerEncoderLayer(d_model=d_model, n_heads=n_heads, last=True))
216
+ self.norm = nn.LayerNorm(d_model)
217
+
218
+ def forward(self, x):
219
+ for layer in self.layers:
220
+ x = layer(x)
221
+ return self.norm(x)
222
+
223
+ class PatchFM(nn.Module, PyTorchModelHubMixin):
224
+ def __init__(self, config):
225
+ super().__init__()
226
+
227
+ # Store config
228
+ self.patch_len = config["patch_len"]
229
+ self.d_model = config["d_model"]
230
+ self.n_heads = config["n_heads"]
231
+ self.n_layers_encoder = config["n_layers_encoder"]
232
+ self.quantiles = config["quantiles"]
233
+ self.n_quantiles = len(self.quantiles)
234
+
235
+ # Components
236
+ self.revin = RevIN()
237
+ self.proj_embedding = ResidualBlock(
238
+ in_dim=self.patch_len,
239
+ hid_dim=2 * self.patch_len,
240
+ out_dim=self.d_model
241
+ )
242
+ self.transformer_encoder = TransformerEncoder(
243
+ d_model=self.d_model,
244
+ n_heads=self.n_heads,
245
+ n_layers=self.n_layers_encoder
246
+ )
247
+ self.proj_output = ResidualBlock(
248
+ in_dim=self.d_model,
249
+ hid_dim=2 * self.d_model,
250
+ out_dim=self.patch_len * self.n_quantiles
251
+ )
@@ -0,0 +1,128 @@
1
+ Metadata-Version: 2.4
2
+ Name: patchfm
3
+ Version: 1.1.0
4
+ Summary: a Foundation Model for Univariate Time Series Forecasting
5
+ Author-email: Samy-Melwan Vilhes <samy-melwan.vilhes@insa-rouen.fr>
6
+ License: MIT License
7
+
8
+ Copyright (c) 2025 Samy-Melwan Vilhes
9
+
10
+ Permission is hereby granted, free of charge, to any person obtaining a copy
11
+ of this software and associated documentation files (the "Software"), to deal
12
+ in the Software without restriction, including without limitation the rights
13
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
14
+ copies of the Software, and to permit persons to whom the Software is
15
+ furnished to do so, subject to the following conditions:
16
+
17
+ The above copyright notice and this permission notice shall be included in all
18
+ copies or substantial portions of the Software.
19
+
20
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
21
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
22
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
23
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
24
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
25
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
26
+ SOFTWARE.
27
+ Project-URL: Repository, https://github.com/vilhess/PatchFM
28
+ Project-URL: Issues, https://github.com/vilhess/PatchFM/issues
29
+ Keywords: Transformer,LLM,Time Series,Zero-shot,Deep Learning
30
+ Classifier: Programming Language :: Python :: 3
31
+ Classifier: Operating System :: OS Independent
32
+ Requires-Python: >=3.11
33
+ Description-Content-Type: text/markdown
34
+ License-File: LICENSE
35
+ Requires-Dist: torch>=2.5.0
36
+ Requires-Dist: einops>=0.8.1
37
+ Requires-Dist: huggingface-hub>=0.35.1
38
+ Requires-Dist: rotary-embedding-torch>=0.8.9
39
+ Requires-Dist: numpy>=1.26.0
40
+ Dynamic: license-file
41
+
42
+ # A tutorial on how to build a Foundation Model for Univariate Time Series Forecasting
43
+
44
+ [Huggingface Model Card](https://huggingface.co/vilhess/PatchFM)
45
+
46
+ A transformer-based forecasting model for univariate time series. The approach mirrors Large Language Model (LLM) practice (next-token prediction becomes next-patch prediction) while remaining lightweight and practical compared to a classic LLM.
47
+
48
+ ## Highlights
49
+ - Next-patch prediction objective (autoregressive, causal)
50
+ - Patch-based representation of time series (tokens ↔ patches)
51
+ - Causal masking self-attention with RoPE (relative positions)
52
+ - RevIN (Reversible Instance Normalization) with causal statistics
53
+ - SwiGLU feed-forward networks
54
+ - Multi-quantile outputs (median + uncertainty bands)
55
+ - Efficient rollout with KV caching
56
+
57
+ ## Installation
58
+ ```bash
59
+ pip install patchfm
60
+ ```
61
+
62
+ ## Quick Start
63
+
64
+ ```python
65
+ import torch
66
+ from patchfm.configs import PatchFMConfig
67
+ from patchfm.model import Forecaster
68
+
69
+ # --- Instantiate model ---
70
+ config = PatchFMConfig()
71
+ model = Forecaster(config)
72
+
73
+ # --- Inference ---
74
+ forecast_horizon = 64
75
+ seq = torch.randn(1, 1024) # (batch, time)
76
+ pred_median, pred_quantiles = model(seq, forecast_horizon=forecast_horizon, quantiles=[0.1, 0.5, 0.9]) # (batch, horizon), (batch, horizon, quantiles)
77
+ ```
78
+
79
+ We provide an extended quick start example in [notebooks/tutorial.ipynb](./notebooks/tutorial.ipynb).
80
+ If you don't have suitable hardware, you can also run the extended quick start example in Google Colab:
81
+
82
+ <a target="_blank" href="https://colab.research.google.com/drive/17sdf-7luCkv5TaeLj3Z6kIaTDkwkz3VR?usp=share_link">
83
+ <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open Quick Start In Colab"/>
84
+ </a>
85
+
86
+ ## Method (TL;DR)
87
+ - Patching: Split a context signal of length $w$ into $P_{num} = w / P_{len}$ patches of length $P_{len}$.
88
+ - RevIN: Normalize patches using causal running mean/variance over past patches, and denormalize outputs to the original scale.
89
+ - Architecture: Input residual MLP → stacked Transformer blocks (MHA + SwiGLU FFN, pre-norm, residual) → $|\mathcal{Q}|$ output heads mapping back to patch space.
90
+ - Positional encoding: Rotary Position Embeddings (RoPE) applied to queries/keys.
91
+ - Training: Multi-quantile (pinball) loss across positions, elements, and quantiles $\mathcal{Q}$.
92
+ - Inference: Predict next patch; roll out autoregressively with KV caching for long horizons.
93
+
94
+ ## Problem Formulation
95
+ Given context patches $x_{p_1}, \ldots, x_{p_n}$, predict the next patch $x_{p_{i+1}}$ for each position $i$ using only past patches (causality). The model outputs quantiles $\{\hat{x}_{p_{i+1}}^{(q)}: q \in \mathcal{Q}\}$ with median (q=0.5) as the point forecast.
96
+
97
+ ## Loss: Multi-Quantile (Pinball)
98
+ For residual $u = x - \hat{x}^{(q)}$:
99
+ $$\rho_q(u) = \begin{cases} q\,u, & u \ge 0,\\ (q-1)\,u, & u < 0. \end{cases}$$
100
+ Aggregate over positions, patch elements, and quantiles.
101
+
102
+ ## Architecture
103
+ - Input MLP: $\mathbb{R}^{P_{len}} \to \mathbb{R}^{dim}$ residual 2-layer MLP (ReLU)
104
+ - Multi-Head Attention: causal mask, RoPE; queries/keys/values per head
105
+ - FFN: SwiGLU (SiLU-gated), pre-norm + residual
106
+ - Output heads: |Q| linear maps $\mathbb{R}^{dim} \to \mathbb{R}^{P_{len}}$ (one per quantile)
107
+
108
+ ### Model Details
109
+ - Patch size: 32
110
+ - Max context: 32 patches (1024 steps)
111
+ - Forecast horizon: 32 steps per forward pass
112
+ - Quantiles $\mathcal{Q}$: {0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9}
113
+ - Layers: 6
114
+ - Attention heads: 64 (head dim 32)
115
+ - Model dim: 2048
116
+ - Parameters: ~300M
117
+
118
+ ## Inference
119
+ - Single step: predict next patch ($P_{len}$ values)
120
+ - Long-horizon: append prediction to context and repeat (optionally drop oldest patch to keep window fixed)
121
+ - KV caching: reuse cached keys/values for past patches; compute new Q/K/V only for the appended patch
122
+
123
+ ## Acknowledgements
124
+ We thank the authors of the following repositories for inspiration and code snippets:
125
+ - [TiRex](https://github.com/NX-AI/tirex)
126
+
127
+ ## Citation
128
+ If you use this work, please cite the paper ...
@@ -0,0 +1,12 @@
1
+ LICENSE
2
+ README.md
3
+ pyproject.toml
4
+ src/patchfm/__init__.py
5
+ src/patchfm.egg-info/PKG-INFO
6
+ src/patchfm.egg-info/SOURCES.txt
7
+ src/patchfm.egg-info/dependency_links.txt
8
+ src/patchfm.egg-info/requires.txt
9
+ src/patchfm.egg-info/top_level.txt
10
+ src/patchfm/configs/model_config.py
11
+ src/patchfm/inference/forecaster.py
12
+ src/patchfm/inference/modules.py
@@ -0,0 +1,5 @@
1
+ torch>=2.5.0
2
+ einops>=0.8.1
3
+ huggingface-hub>=0.35.1
4
+ rotary-embedding-torch>=0.8.9
5
+ numpy>=1.26.0
@@ -0,0 +1 @@
1
+ patchfm