cortexflowx 0.1.0__tar.gz → 0.2.0__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- {cortexflowx-0.1.0 → cortexflowx-0.2.0}/CHANGELOG.md +16 -0
- {cortexflowx-0.1.0 → cortexflowx-0.2.0}/PKG-INFO +3 -3
- {cortexflowx-0.1.0 → cortexflowx-0.2.0}/README.md +2 -2
- {cortexflowx-0.1.0 → cortexflowx-0.2.0}/pyproject.toml +1 -1
- {cortexflowx-0.1.0 → cortexflowx-0.2.0}/src/cortexflow/brain2audio.py +32 -5
- {cortexflowx-0.1.0 → cortexflowx-0.2.0}/src/cortexflow/brain2img.py +24 -6
- {cortexflowx-0.1.0 → cortexflowx-0.2.0}/src/cortexflow/brain2text.py +52 -10
- {cortexflowx-0.1.0 → cortexflowx-0.2.0}/src/cortexflow/dit.py +14 -1
- {cortexflowx-0.1.0 → cortexflowx-0.2.0}/tests/test_brain2audio.py +20 -0
- {cortexflowx-0.1.0 → cortexflowx-0.2.0}/tests/test_brain2img.py +21 -0
- {cortexflowx-0.1.0 → cortexflowx-0.2.0}/tests/test_brain2text.py +48 -0
- {cortexflowx-0.1.0 → cortexflowx-0.2.0}/.github/workflows/ci.yml +0 -0
- {cortexflowx-0.1.0 → cortexflowx-0.2.0}/.gitignore +0 -0
- {cortexflowx-0.1.0 → cortexflowx-0.2.0}/CITATION.cff +0 -0
- {cortexflowx-0.1.0 → cortexflowx-0.2.0}/LICENSE +0 -0
- {cortexflowx-0.1.0 → cortexflowx-0.2.0}/examples/brain2audio_demo.py +0 -0
- {cortexflowx-0.1.0 → cortexflowx-0.2.0}/examples/brain2img_demo.py +0 -0
- {cortexflowx-0.1.0 → cortexflowx-0.2.0}/examples/brain2text_demo.py +0 -0
- {cortexflowx-0.1.0 → cortexflowx-0.2.0}/src/cortexflow/__init__.py +0 -0
- {cortexflowx-0.1.0 → cortexflowx-0.2.0}/src/cortexflow/_types.py +0 -0
- {cortexflowx-0.1.0 → cortexflowx-0.2.0}/src/cortexflow/brain_encoder.py +0 -0
- {cortexflowx-0.1.0 → cortexflowx-0.2.0}/src/cortexflow/flow_matching.py +0 -0
- {cortexflowx-0.1.0 → cortexflowx-0.2.0}/src/cortexflow/training.py +0 -0
- {cortexflowx-0.1.0 → cortexflowx-0.2.0}/src/cortexflow/vae.py +0 -0
- {cortexflowx-0.1.0 → cortexflowx-0.2.0}/tests/conftest.py +0 -0
- {cortexflowx-0.1.0 → cortexflowx-0.2.0}/tests/test_brain_encoder.py +0 -0
- {cortexflowx-0.1.0 → cortexflowx-0.2.0}/tests/test_dit.py +0 -0
- {cortexflowx-0.1.0 → cortexflowx-0.2.0}/tests/test_flow_matching.py +0 -0
- {cortexflowx-0.1.0 → cortexflowx-0.2.0}/tests/test_init.py +0 -0
- {cortexflowx-0.1.0 → cortexflowx-0.2.0}/tests/test_integration.py +0 -0
- {cortexflowx-0.1.0 → cortexflowx-0.2.0}/tests/test_training.py +0 -0
- {cortexflowx-0.1.0 → cortexflowx-0.2.0}/tests/test_types.py +0 -0
- {cortexflowx-0.1.0 → cortexflowx-0.2.0}/tests/test_vae.py +0 -0
CHANGELOG.md

```diff
@@ -2,6 +2,22 @@
 
 All notable changes to CortexFlow will be documented in this file.
 
+## [0.2.0] - 2026-04-12
+
+### Added
+
+- **Semantic diversity**: All pipelines now support `num_samples` parameter to generate multiple diverse reconstructions per brain input. Each sample uses independent noise (image/audio) or independent random draws (text), producing semantically varied outputs.
+- **Nucleus (top-p) sampling** for Brain2Text: `top_p` parameter enables nucleus filtering alongside top-k, giving finer control over text generation diversity.
+- When `num_samples > 1`, output shapes become `(B, num_samples, ...)` for image/audio; text metadata returns grouped lists per brain input.
+
+## [0.1.1] - 2025-06-26
+
+### Fixed
+
+- **DiT zero-init**: `_initialize_weights()` no longer overwrites critical zero-initialized gating parameters (AdaLN modulation, final layer projection). Output is now exactly zero at initialization for stable training.
+- **Brain2Text BOS mismatch**: Training now prepends BOS token so the model learns to predict from BOS context, matching inference behavior.
+- **Brain2Text empty output**: `reconstruct()` now correctly skips the BOS token when decoding generated sequences to text.
+
 ## [0.1.0] - 2025-06-25
 
 ### Added
```
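Taken together, the 0.2.0 changes read from the call site as sketched below. This is a minimal sketch, not package documentation: the builder keyword arguments are copied from the test hunks further down, while the top-level `cortexflow` imports and the `BrainData(voxels=...)` constructor are assumptions this diff does not confirm.

```python
# Hedged sketch of the 0.2.0 additions (num_samples, top_p).
# Assumed: cortexflow exports these names and BrainData takes voxels=...;
# builder kwargs are the ones used in the tests below.
import torch
from cortexflow import BrainData, build_brain2img, build_brain2text

img_model = build_brain2img(n_voxels=64, img_size=8, hidden_dim=16, depth=1, num_heads=4)
img_model.eval()
brain = BrainData(voxels=torch.randn(2, 64))  # assumed constructor

# num_samples > 1: one independent noise draw per sample, grouped output
res = img_model.reconstruct(brain, num_steps=50, num_samples=3)
print(res.output.shape)             # torch.Size([2, 3, 3, 8, 8])
print(res.metadata["num_samples"])  # 3

txt_model = build_brain2text(n_voxels=64, max_len=16, hidden_dim=16, depth=1)
txt_model.eval()

# top_p nucleus filtering plus grouped text samples
res = txt_model.reconstruct(brain, max_len=16, top_p=0.9, top_k=0, num_samples=2)
print(res.metadata["texts"])        # list of lists: texts[i] holds 2 strings
```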
PKG-INFO

````diff
@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: cortexflowx
-Version: 0.1.0
+Version: 0.2.0
 Summary: Brain-to-image/audio/text reconstruction using Diffusion Transformers and Flow Matching. Decode what someone saw, heard, or thought from fMRI.
 Project-URL: Homepage, https://github.com/stef41/cortexflow
 Project-URL: Repository, https://github.com/stef41/cortexflow
@@ -65,12 +65,12 @@ fMRI voxels
 ## Installation
 
 ```bash
-pip install
+pip install cortexflowx
 ```
 
 With audio support:
 ```bash
-pip install
+pip install cortexflowx[audio]
 ```
 
 ## Quick Start
````
pyproject.toml

```diff
@@ -4,7 +4,7 @@ build-backend = "hatchling.build"
 
 [project]
 name = "cortexflowx"
-version = "0.1.0"
+version = "0.2.0"
 description = "Brain-to-image/audio/text reconstruction using Diffusion Transformers and Flow Matching. Decode what someone saw, heard, or thought from fMRI."
 readme = "README.md"
 license = {text = "Apache-2.0"}
```
src/cortexflow/brain2audio.py

```diff
@@ -211,25 +211,52 @@ class Brain2Audio(nn.Module):
         brain_data: BrainData,
         num_steps: int = 50,
         cfg_scale: float = 3.0,
+        num_samples: int = 1,
     ) -> ReconstructionResult:
-        """Reconstruct audio mel spectrogram from brain activity."""
+        """Reconstruct audio mel spectrogram from brain activity.
+
+        Args:
+            brain_data: fMRI data to decode.
+            num_steps: Number of ODE solver steps.
+            cfg_scale: Classifier-free guidance scale.
+            num_samples: Number of diverse samples per brain input.
+                Each sample uses independent noise, producing semantically
+                varied reconstructions. Output shape becomes
+                ``(B, num_samples, n_mels, T)`` when ``num_samples > 1``.
+
+        Returns:
+            ReconstructionResult with the decoded mel spectrogram(s).
+        """
         B = brain_data.batch_size
         device = brain_data.voxels.device
         brain_global, brain_tokens = self.brain_encoder(brain_data.voxels)
 
-        mel_shape = (B, self.n_mels, self.audio_len)
+        # Repeat conditioning for multiple samples per input
+        if num_samples > 1:
+            brain_global = brain_global.repeat_interleave(num_samples, dim=0)
+            brain_tokens = brain_tokens.repeat_interleave(num_samples, dim=0)
+
+        BN = B * num_samples
+
+        mel_shape = (BN, self.n_mels, self.audio_len)
         mel = self.flow_matcher.sample(
             self.dit, mel_shape, brain_global, brain_tokens,
             num_steps=num_steps, cfg_scale=cfg_scale,
-            brain_global_uncond=self.uncond_global.expand(B, -1),
-            brain_tokens_uncond=self.uncond_tokens.expand(B, -1, -1),
+            brain_global_uncond=self.uncond_global.expand(BN, -1),
+            brain_tokens_uncond=self.uncond_tokens.expand(BN, -1, -1),
         )
+
+        # Reshape to (B, num_samples, n_mels, T) when generating multiple
+        if num_samples > 1:
+            mel = mel.view(B, num_samples, self.n_mels, self.audio_len)
+
         return ReconstructionResult(
             modality=Modality.AUDIO,
             output=mel,
-            brain_condition=brain_global,
+            brain_condition=brain_global[:B],
             n_steps=num_steps,
             cfg_scale=cfg_scale,
+            metadata={"num_samples": num_samples},
         )
 
     @staticmethod
```
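The batching pattern in this hunk recurs in brain2img.py and brain2text.py below, so it is worth seeing in isolation. A standalone toy illustration in plain torch, not package code:

```python
# Conditioning is duplicated per sample with repeat_interleave, generation
# runs at batch size B * num_samples, and the flat batch is folded back to
# (B, num_samples, ...). Interleaving keeps all samples of input i adjacent,
# which is what makes the final view() group rows correctly.
import torch

B, num_samples, n_mels, T = 2, 3, 4, 6
cond = torch.randn(B, 8)                               # one row per brain input
cond_rep = cond.repeat_interleave(num_samples, dim=0)  # (6, 8): rows 0-2 are input 0
assert torch.equal(cond_rep[0], cond_rep[2])           # samples share conditioning

mel = torch.randn(B * num_samples, n_mels, T)          # independent noise per sample
mel = mel.view(B, num_samples, n_mels, T)              # fold back: (B, num_samples, ...)
assert mel.shape == (2, 3, 4, 6)
```

The same adjacency is what lets the text pipeline index `generated[i * num_samples + s]` when grouping decoded strings per brain input.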
src/cortexflow/brain2img.py

```diff
@@ -154,6 +154,7 @@ class Brain2Image(nn.Module):
         brain_data: BrainData,
         num_steps: int = 50,
         cfg_scale: float = 4.0,
+        num_samples: int = 1,
     ) -> ReconstructionResult:
         """Reconstruct an image from brain activity.
 
@@ -161,9 +162,13 @@ class Brain2Image(nn.Module):
             brain_data: fMRI data to decode.
             num_steps: Number of ODE solver steps.
             cfg_scale: Classifier-free guidance scale.
+            num_samples: Number of diverse samples per brain input.
+                Each sample uses independent noise, producing semantically
+                varied reconstructions. Output shape becomes
+                ``(B, num_samples, C, H, W)`` when ``num_samples > 1``.
 
         Returns:
-            ReconstructionResult with the decoded image.
+            ReconstructionResult with the decoded image(s).
         """
         B = brain_data.batch_size
         device = brain_data.voxels.device
@@ -171,12 +176,19 @@ class Brain2Image(nn.Module):
         # Encode brain
         brain_global, brain_tokens = self.encode_brain(brain_data)
 
+        # Repeat conditioning for multiple samples per input
+        if num_samples > 1:
+            brain_global = brain_global.repeat_interleave(num_samples, dim=0)
+            brain_tokens = brain_tokens.repeat_interleave(num_samples, dim=0)
+
+        BN = B * num_samples
+
         # Unconditional embeddings for CFG
-        uncond_global = self.uncond_global.expand(B, -1)
-        uncond_tokens = self.uncond_tokens.expand(B, -1, -1)
+        uncond_global = self.uncond_global.expand(BN, -1)
+        uncond_tokens = self.uncond_tokens.expand(BN, -1, -1)
 
-        # Sample latents via flow matching
-        latent_shape = (B, self._latent_channels, self._latent_size, self._latent_size)
+        # Sample latents via flow matching (each gets independent noise)
+        latent_shape = (BN, self._latent_channels, self._latent_size, self._latent_size)
         z = self.flow_matcher.sample(
             self.dit,
             shape=latent_shape,
@@ -192,12 +204,18 @@ class Brain2Image(nn.Module):
         images = self.vae.decode(z)
         images = images.clamp(0, 1)
 
+        # Reshape to (B, num_samples, C, H, W) when generating multiple
+        if num_samples > 1:
+            C, H, W = images.shape[1:]
+            images = images.view(B, num_samples, C, H, W)
+
         return ReconstructionResult(
             modality=Modality.IMAGE,
             output=images,
-            brain_condition=brain_global,
+            brain_condition=brain_global[:B],
             n_steps=num_steps,
             cfg_scale=cfg_scale,
+            metadata={"num_samples": num_samples},
         )
 
 
```
src/cortexflow/brain2text.py

```diff
@@ -190,16 +190,22 @@ class Brain2Text(nn.Module):
 
         Args:
             text_tokens: ``(B, T)`` target token IDs (byte-level).
+                Raw text bytes — BOS is prepended automatically.
             brain_data: Corresponding fMRI data.
 
         Returns:
             Scalar cross-entropy loss.
         """
+        B, T = text_tokens.shape
         _, brain_tokens = self.brain_encoder(brain_data.voxels)
 
-        # Teacher forcing: shift inputs/targets by one
-        input_tokens = text_tokens[:, :-1]
-        target_tokens = text_tokens[:, 1:]
+        # Prepend BOS so the model learns to predict from BOS context
+        # Input: [BOS, t1, t2, ..., t_{T-1}], Target: [t1, t2, ..., t_T]
+        bos = torch.full(
+            (B, 1), self.bos_token, dtype=torch.long, device=text_tokens.device
+        )
+        input_tokens = torch.cat([bos, text_tokens[:, :-1]], dim=1)
+        target_tokens = text_tokens
 
         logits = self.decoder(input_tokens, brain_tokens)
         return F.cross_entropy(
@@ -215,6 +221,8 @@ class Brain2Text(nn.Module):
         max_len: int | None = None,
         temperature: float = 0.8,
         top_k: int = 50,
+        top_p: float = 0.0,
+        num_samples: int = 1,
     ) -> ReconstructionResult:
         """Reconstruct text from brain activity via autoregressive decoding.
 
@@ -222,7 +230,16 @@ class Brain2Text(nn.Module):
             brain_data: fMRI data to decode.
             max_len: Maximum generation length.
             temperature: Sampling temperature.
-            top_k: Top-k filtering.
+            top_k: Top-k filtering (0 to disable).
+            top_p: Nucleus sampling threshold (0.0 to disable). When set,
+                only the smallest set of tokens with cumulative probability
+                >= ``top_p`` are kept. Promotes semantic diversity.
+            num_samples: Number of diverse samples per brain input.
+                Each sample decodes independently with different random
+                draws, producing semantically varied texts. When
+                ``num_samples > 1``, ``metadata["texts"]`` is a list of
+                lists: ``texts[i]`` contains ``num_samples`` strings for
+                brain input *i*.
 
         Returns:
             ReconstructionResult with generated text as metadata.
@@ -233,8 +250,14 @@ class Brain2Text(nn.Module):
 
         _, brain_tokens = self.brain_encoder(brain_data.voxels)
 
+        # Repeat conditioning for multiple samples per input
+        if num_samples > 1:
+            brain_tokens = brain_tokens.repeat_interleave(num_samples, dim=0)
+
+        BN = B * num_samples
+
         # Start with BOS token
-        generated = torch.full((B, 1), self.bos_token, dtype=torch.long, device=device)
+        generated = torch.full((BN, 1), self.bos_token, dtype=torch.long, device=device)
 
         for _ in range(gen_len - 1):
             logits = self.decoder(generated, brain_tokens)
@@ -242,10 +265,20 @@ class Brain2Text(nn.Module):
 
             # Top-k filtering
             if top_k > 0:
-                topk_vals, _ = next_logits.topk(top_k, dim=-1)
+                topk_vals, _ = next_logits.topk(min(top_k, next_logits.size(-1)), dim=-1)
                 threshold = topk_vals[:, -1].unsqueeze(-1)
                 next_logits = next_logits.masked_fill(next_logits < threshold, float("-inf"))
 
+            # Nucleus (top-p) filtering
+            if top_p > 0.0:
+                sorted_logits, sorted_idx = next_logits.sort(dim=-1, descending=True)
+                cum_probs = sorted_logits.softmax(dim=-1).cumsum(dim=-1)
+                # Remove tokens with cumulative probability above top_p
+                mask = cum_probs - sorted_logits.softmax(dim=-1) >= top_p
+                sorted_logits[mask] = float("-inf")
+                # Scatter back
+                next_logits = sorted_logits.scatter(1, sorted_idx, sorted_logits)
+
             probs = F.softmax(next_logits, dim=-1)
             next_token = torch.multinomial(probs, num_samples=1)
             generated = torch.cat([generated, next_token], dim=1)
@@ -254,14 +287,23 @@ class Brain2Text(nn.Module):
             if (next_token == 0).all():
                 break
 
-        # Decode to text
-        texts = [self.tokens_to_text(generated[i]) for i in range(B)]
+        # Decode to text (skip BOS token at position 0)
+        if num_samples > 1:
+            # Group samples: texts[i] = list of num_samples strings
+            texts = []
+            for i in range(B):
+                group = []
+                for s in range(num_samples):
+                    group.append(self.tokens_to_text(generated[i * num_samples + s, 1:]))
+                texts.append(group)
+        else:
+            texts = [self.tokens_to_text(generated[i, 1:]) for i in range(B)]
 
         return ReconstructionResult(
             modality=Modality.TEXT,
             output=generated,
-            brain_condition=brain_tokens.mean(dim=1),
-            metadata={"texts": texts},
+            brain_condition=brain_tokens[:B].mean(dim=1),
+            metadata={"texts": texts, "num_samples": num_samples},
         )
 
 
```
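One subtlety in the nucleus filter above: the mask is `cum_probs - probs >= top_p` rather than `cum_probs > top_p`, so the token that crosses the threshold is kept and the retained set always reaches cumulative mass of at least `top_p`. A standalone numeric check with toy logits, not package code:

```python
# With sorted probs ~ [0.49, 0.30, 0.18, 0.02] and top_p = 0.6, the mask
# comes out [False, False, True, True]: the 0.30 token that crosses the
# 0.6 boundary survives, so the kept mass (~0.79) is >= top_p.
import torch

logits = torch.tensor([[2.0, 1.5, 1.0, -1.0]])
sorted_logits, sorted_idx = logits.sort(dim=-1, descending=True)
probs = sorted_logits.softmax(dim=-1)
cum_probs = probs.cumsum(dim=-1)
mask = cum_probs - probs >= 0.6      # drop only tokens past the boundary
sorted_logits[mask] = float("-inf")
next_logits = sorted_logits.scatter(1, sorted_idx, sorted_logits)  # unsort
print(mask)         # tensor([[False, False,  True,  True]])
print(next_logits)  # tensor([[2.0000, 1.5000,    -inf,    -inf]])
```

A naive `cum_probs > top_p` mask would keep only the 0.49 token here, retaining less probability mass than requested.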
src/cortexflow/dit.py

```diff
@@ -330,7 +330,11 @@ class DiffusionTransformer(nn.Module):
         self._initialize_weights()
 
     def _initialize_weights(self) -> None:
-        """Initialize weights following DiT conventions."""
+        """Initialize weights following DiT conventions.
+
+        Global Xavier init first, then re-apply zero-init on gating
+        parameters (AdaLN modulation, final layer) for stable training.
+        """
 
         def _init(m: nn.Module) -> None:
             if isinstance(m, nn.Linear):
@@ -344,6 +348,15 @@ class DiffusionTransformer(nn.Module):
 
         self.apply(_init)
 
+        # Re-apply zero-init on gating parameters (overwritten by global init)
+        for block in self.blocks:
+            nn.init.zeros_(block.adaLN_modulation[-1].weight)
+            nn.init.zeros_(block.adaLN_modulation[-1].bias)
+        nn.init.zeros_(self.final_layer.adaLN[-1].weight)
+        nn.init.zeros_(self.final_layer.adaLN[-1].bias)
+        nn.init.zeros_(self.final_layer.proj.weight)
+        nn.init.zeros_(self.final_layer.proj.bias)
+
     def unpatchify(self, x: torch.Tensor) -> torch.Tensor:
         """Reshape patch tokens back to spatial latent maps.
 
```
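The 0.1.1 fix in this hunk is an ordering issue: `self.apply(_init)` visits every `nn.Linear`, including the gating layers that were deliberately zero-initialized beforehand, so the zeros must be re-applied afterwards. A minimal standalone repro with a plain `nn.Linear`, illustrative rather than the package's classes:

```python
# apply() runs the global initializer over every submodule, clobbering an
# earlier zero-init; re-applying the zeros last restores exact-zero output.
import torch.nn as nn

proj = nn.Linear(4, 4)
nn.init.zeros_(proj.weight)   # intended zero-init (gating / final projection)
nn.init.zeros_(proj.bias)

def _init(m: nn.Module) -> None:
    if isinstance(m, nn.Linear):
        nn.init.xavier_uniform_(m.weight)  # global init, as in _initialize_weights

proj.apply(_init)
print(proj.weight.abs().sum().item())  # > 0: the zeros were overwritten

nn.init.zeros_(proj.weight)            # the fix: zero-init re-applied last
print(proj.weight.abs().sum().item())  # 0.0, so the layer's output is exactly zero
```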
tests/test_brain2audio.py

```diff
@@ -91,6 +91,26 @@ class TestBrain2Audio:
         assert result.output.shape[0] == 1
 
 
+    def test_reconstruct_num_samples(self, model, brain_data):
+        model.eval()
+        result = model.reconstruct(brain_data, num_steps=2, num_samples=3)
+        assert result.output.shape == (BATCH, 3, N_MELS, AUDIO_LEN)
+        assert result.metadata["num_samples"] == 3
+
+    def test_diverse_audio_samples_differ(self, model, brain_data):
+        """Multiple audio samples from same brain input should differ."""
+        model.eval()
+        result = model.reconstruct(brain_data, num_steps=2, num_samples=2)
+        sample_0 = result.output[:, 0]
+        sample_1 = result.output[:, 1]
+        assert not torch.allclose(sample_0, sample_1), "Diverse samples should differ"
+
+    def test_reconstruct_num_samples_1(self, model, brain_data):
+        model.eval()
+        result = model.reconstruct(brain_data, num_steps=2, num_samples=1)
+        assert result.output.shape == (BATCH, N_MELS, AUDIO_LEN)
+
+
 class TestMelToWaveform:
     def test_output_shape(self):
         mel = torch.rand(1, N_MELS, 16).abs() + 0.01
```
tests/test_brain2img.py

```diff
@@ -79,6 +79,27 @@ class TestBrain2Image:
         assert result.output.shape[0] == 1
 
 
+    def test_reconstruct_num_samples(self, model, brain_data):
+        model.eval()
+        result = model.reconstruct(brain_data, num_steps=2, num_samples=3)
+        assert result.output.shape == (BATCH, 3, 3, IMG_SIZE, IMG_SIZE)
+        assert result.metadata["num_samples"] == 3
+
+    def test_diverse_samples_differ(self, model, brain_data):
+        """Multiple samples from same brain input should differ."""
+        model.eval()
+        result = model.reconstruct(brain_data, num_steps=2, num_samples=2)
+        sample_0 = result.output[:, 0]
+        sample_1 = result.output[:, 1]
+        assert not torch.allclose(sample_0, sample_1), "Diverse samples should differ"
+
+    def test_reconstruct_num_samples_1(self, model, brain_data):
+        """num_samples=1 should behave like the default."""
+        model.eval()
+        result = model.reconstruct(brain_data, num_steps=2, num_samples=1)
+        assert result.output.shape == (BATCH, 3, IMG_SIZE, IMG_SIZE)
+
+
 class TestBuildBrain2Img:
     def test_default_build(self):
         model = build_brain2img(n_voxels=64, img_size=8, hidden_dim=16, depth=1, num_heads=4)
```
tests/test_brain2text.py

```diff
@@ -142,6 +142,54 @@ class TestBrain2Text:
         assert len(result.metadata["texts"]) == 1
 
 
+    def test_reconstruct_num_samples(self, model, brain_data):
+        model.eval()
+        result = model.reconstruct(brain_data, max_len=8, num_samples=3)
+        texts = result.metadata["texts"]
+        assert len(texts) == BATCH
+        for group in texts:
+            assert isinstance(group, list)
+            assert len(group) == 3
+            for t in group:
+                assert isinstance(t, str)
+
+    def test_diverse_text_samples_differ(self, model, brain_data):
+        """Multiple text samples from same brain input should differ."""
+        model.eval()
+        # High temperature for max diversity
+        result = model.reconstruct(
+            brain_data, max_len=8, temperature=1.5, num_samples=4,
+        )
+        texts = result.metadata["texts"]
+        # At least one brain input should produce non-identical samples
+        any_differ = False
+        for group in texts:
+            if len(set(group)) > 1:
+                any_differ = True
+                break
+        assert any_differ, "At high temperature, diverse samples should differ"
+
+    def test_reconstruct_top_p(self, model, brain_data):
+        model.eval()
+        result = model.reconstruct(brain_data, max_len=8, top_p=0.9, top_k=0)
+        assert result.output.shape[0] == BATCH
+        assert len(result.metadata["texts"]) == BATCH
+
+    def test_reconstruct_top_p_and_top_k(self, model, brain_data):
+        model.eval()
+        result = model.reconstruct(brain_data, max_len=8, top_p=0.9, top_k=20)
+        assert result.output.shape[0] == BATCH
+
+    def test_reconstruct_num_samples_1(self, model, brain_data):
+        """num_samples=1 should return flat list of strings."""
+        model.eval()
+        result = model.reconstruct(brain_data, max_len=8, num_samples=1)
+        texts = result.metadata["texts"]
+        assert len(texts) == BATCH
+        for t in texts:
+            assert isinstance(t, str)
+
+
 class TestBuildBrain2Text:
     def test_default_build(self):
         model = build_brain2text(n_voxels=64, max_len=16, hidden_dim=16, depth=1)
```