PyPI - liger-kernel-nightly - Versions diffs - 0.5.2.dev20241211213024__py3-none-any.whl → 0.5.2.dev20241212000548__py3-none-any.whl - Mend

liger-kernel-nightly 0.5.2.dev20241211213024__py3-none-any.whl → 0.5.2.dev20241212000548__py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (8)

liger_kernel/chunked_loss/README.md ADDED

@@ -0,0 +1,25 @@
+# Liger FlexChunkLoss: Alignment and Distillation loss
+Liger FlexChunkLoss offers a versatile interface, delivering up to 80% memory savings and a 10% throughput boost for post-training loss functions, including alignment (DPO, ORPO, CPO) and very soon, distillation. Its flexible design supports custom losses, ensuring efficiency gains across diverse use cases.
+### User interface
+FlexChunkLoss offers two flexible usage options:
+1. **Via `Liger[Custom Loss]Trainer`**
+   For example, by simply replacing the HuggingFace `ORPOTrainer` with `LigerORPOTrainer` in your code, you can leverage our optimized ORPO implementation and immediately benefit from improved performance.
+2. **Using `nn.Module` Implementations of Custom Loss Functions**
+   Explore the [LigerORPOTrainer implementation](https://github.com/linkedin/Liger-Kernel/blob/main/src/liger_kernel/transformers/orpo_trainer.py) to see how the modular design integrates custom loss functions seamlessly.
+### What's under the hood?
+We employ chunking and fused kernel optimizations to enhance performance. By fusing the final linear layer with loss computation and calculating backward gradients during the forward pass, we significantly reduce the need for storing intermediate activations. All operations are implemented in PyTorch, leveraging `torch.compile` to streamline kernel execution without relying on extensive low-level optimizations. Additionally, we minimize `torch.compile` recompilations to reduce overhead and ensure consistent performance gains.
+### Extending to custom loss functions
+We provide two base classes: `LigerFusedLinearPreferenceBase` for alignment use cases and `LigerFusedLinearDistillationBase` for distillation use cases. These base classes manage chunking, kernel fusions, and Torch compilation.
+To implement a custom loss function, you need to create a subclass that defines the custom preference or distillation loss function, capable of processing a given input chunk. The base class will take care of the optimizations, handling most of the heavy lifting for you.
+For a working example, refer to the [ORPO loss implementation](https://github.com/linkedin/Liger-Kernel/blob/main/src/liger_kernel/chunked_loss/orpo_loss.py).
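The README's "What's under the hood?" paragraph describes fusing the final linear layer with the loss, processing the batch in chunks, and computing gradients during the forward pass. The function below is a hypothetical, dependency-free sketch of that idea, not Liger's PyTorch implementation. It materializes only one chunk of logits at a time and returns the gradients with respect to both the inputs and the weight alongside the loss. `chunked_linear_cross_entropy` is an illustrative name, and plain cross-entropy stands in for the alignment losses.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def _log_softmax(logits: Sequence[float]) -> List[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]


def chunked_linear_cross_entropy(
    x: Matrix, weight: Matrix, target: Sequence[int], chunk_size: int
) -> Tuple[float, Matrix, Matrix]:
    """Mean cross-entropy of softmax(x @ weight^T) against `target`.

    Gradients w.r.t. `x` and `weight` are computed in the same pass, so only
    one (chunk_size, vocab_size) block of logits exists at any time.
    Returns (loss, grad_x, grad_weight).
    """
    n, hidden, vocab = len(x), len(x[0]), len(weight)
    loss = 0.0
    grad_x: Matrix = []
    grad_w: Matrix = [[0.0] * hidden for _ in range(vocab)]
    for start in range(0, n, chunk_size):
        x_chunk = x[start:start + chunk_size]
        t_chunk = target[start:start + chunk_size]
        # Fused "linear" step: logits for this chunk only; the full
        # (n, vocab) logits matrix is never built.
        logits_chunk = [[_dot(w_row, row) for w_row in weight] for row in x_chunk]
        for row, t, logits in zip(x_chunk, t_chunk, logits_chunk):
            logp = _log_softmax(logits)
            loss -= logp[t] / n
            # dL/dlogits = (softmax(logits) - one_hot(t)) / n
            dlogits = [(math.exp(lp) - (1.0 if v == t else 0.0)) / n for v, lp in enumerate(logp)]
            # dL/dx_row = dlogits @ weight
            grad_x.append([sum(dlogits[v] * weight[v][h] for v in range(vocab)) for h in range(hidden)])
            # dL/dweight += outer(dlogits, x_row), accumulated across chunks
            for v in range(vocab):
                for h in range(hidden):
                    grad_w[v][h] += dlogits[v] * row[h]
    return loss, grad_x, grad_w
```

Because `grad_x` and `grad_weight` already exist once the forward pass ends, a backward step only has to scale them by the upstream gradient of the scalar loss, which is why no intermediate logits need to be kept. Per the README, Liger does this with PyTorch operations and `torch.compile`; the list-based loops here are only for readability.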
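The extension model in "Extending to custom loss functions" (the base class owns chunking, the subclass supplies a loss for one input chunk) can be illustrated with a small pair of classes. `ChunkedPreferenceBase` and `SigmoidPreferenceLoss` are hypothetical and do not reproduce the method names or signatures of `LigerFusedLinearPreferenceBase`. The sketch also starts from precomputed per-sequence log-probabilities and omits the fused linear layer. The example subclass uses a CPO-style sigmoid loss, -log sigmoid(beta * (chosen - rejected)).

```python
import math
from abc import ABC, abstractmethod
from typing import Sequence


def log_sigmoid(z: float) -> float:
    # Numerically stable log(sigmoid(z)) for both signs of z.
    return -math.log1p(math.exp(-z)) if z >= 0 else z - math.log1p(math.exp(z))


class ChunkedPreferenceBase(ABC):
    """Hypothetical base class: owns the chunking loop; subclasses own the loss."""

    def __init__(self, chunk_size: int = 2, beta: float = 0.1):
        self.chunk_size = chunk_size
        self.beta = beta

    @abstractmethod
    def preference_loss_fn(
        self, chosen_logps: Sequence[float], rejected_logps: Sequence[float]
    ) -> float:
        """Return the summed loss for one chunk of (chosen, rejected) pairs."""

    def __call__(self, chosen_logps: Sequence[float], rejected_logps: Sequence[float]) -> float:
        n = len(chosen_logps)
        total = 0.0
        for start in range(0, n, self.chunk_size):
            end = start + self.chunk_size
            total += self.preference_loss_fn(chosen_logps[start:end], rejected_logps[start:end])
        # Normalize by the number of pairs, not the number of chunks.
        return total / n


class SigmoidPreferenceLoss(ChunkedPreferenceBase):
    """CPO-style loss: -log sigmoid(beta * (chosen - rejected)), summed over the chunk."""

    def preference_loss_fn(self, chosen_logps, rejected_logps):
        return -sum(
            log_sigmoid(self.beta * (c - r)) for c, r in zip(chosen_logps, rejected_logps)
        )
```

The subclass contains only the loss formula; chunk boundaries and normalization stay in the base class. That is the division of labor the README describes, with the ORPO loss file linked above as the project's own working example.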

liger_kernel/chunked_loss/fused_linear_preference.py CHANGED

@@ -29,7 +29,7 @@ class LigerFusedLinearPreferenceBase(torch.autograd.Function):
         compute_nll_loss=True,
         compiled=True,
         use_ref_model=False,
-        ref_input=None,
+        # TODO: ref input
         ref_weight=None,
         ref_bias=None,
         **loss_kwargs,
@@ -59,7 +59,6 @@ class LigerFusedLinearPreferenceBase(torch.autograd.Function):
             compute_nll_loss (bool): Whether to compute NLL loss.
             compiled (bool): Whether to use torch compile for chunk accumulation.
             use_ref_model (bool): Whether to use a reference model for the alignment loss.
-            ref_input (torch.Tensor): Reference input tensor. Shape: (batch_size, seq_len, hidden_size).
             ref_weight (torch.Tensor): Reference weight tensor. Shape: (vocab_size, hidden_size).
             ref_bias (torch.Tensor, optional): Reference bias tensor. Shape: (vocab_size,).
             loss_kwargs (dict): Other possible arguments that a loss function might need
@@ -93,7 +92,6 @@ class LigerFusedLinearPreferenceBase(torch.autograd.Function):
             compute_nll_loss=compute_nll_loss,
             full_target=target,
             use_ref_model=use_ref_model,
-            ref_input=ref_input,
             ref_weight=ref_weight,
             ref_bias=ref_bias,
             **loss_kwargs,
@@ -303,7 +301,6 @@ class LigerFusedLinearPreferenceBase(torch.autograd.Function):
         beta=0.1,
         compute_nll_loss=True,
         use_ref_model=False,
-        ref_input=None,
         ref_weight=None,
         ref_bias=None,
         **loss_kwargs,
@@ -322,7 +319,6 @@ class LigerFusedLinearPreferenceBase(torch.autograd.Function):
             beta (float): Weight for the preference loss.
             compute_nll_loss (bool): Whether to compute NLL loss.
             use_ref_model (bool): Whether to use a reference model for the alignment loss.
-            ref_input (torch.Tensor): Reference input tensor. Shape: (2 * chunk_size, sequence_length, hidden_size).
             ref_weight (torch.Tensor): Reference weight tensor. Shape: (vocab_size, hidden_size).
             ref_bias (torch.Tensor, optional): Reference bias tensor. Shape: (vocab_size,).
             loss_kwargs (dict): Additional arguments for the loss function.
@@ -361,7 +357,7 @@ class LigerFusedLinearPreferenceBase(torch.autograd.Function):
                     ref_rejected_logits,
                     ref_chosen_nll_loss,
                 ) = LigerFusedLinearPreferenceBase.chunk_forward(
-                    ref_input,
+                    input_chunk,
                     ref_weight,
                     target_chunk,
                     ref_bias,
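With `ref_input` removed, the reference branch of the chunk loop now passes the policy's `input_chunk` to `chunk_forward` together with `ref_weight` and `ref_bias`. The reference log-probabilities therefore come from the same hidden states projected through the reference head. The added `# TODO: ref input` comment suggests a separate reference input may be reintroduced later. The sketch below shows this data flow for one sequence in plain Python. `sequence_logp` and `policy_reference_logratio` are hypothetical helpers, not functions from `fused_linear_preference.py`, and averaging log-probabilities over tokens is an assumption made for the illustration.

```python
import math
from typing import List, Optional, Sequence

Matrix = List[List[float]]


def sequence_logp(
    hidden: Matrix, weight: Matrix, target: Sequence[int], bias: Optional[Sequence[float]] = None
) -> float:
    """Mean log-probability of `target` under softmax(hidden @ weight^T + bias)."""
    total = 0.0
    for h_row, t in zip(hidden, target):
        logits = [
            sum(w * h for w, h in zip(w_row, h_row)) + (bias[v] if bias is not None else 0.0)
            for v, w_row in enumerate(weight)
        ]
        m = max(logits)
        log_z = m + math.log(sum(math.exp(z - m) for z in logits))
        total += logits[t] - log_z
    return total / len(target)


def policy_reference_logratio(
    input_chunk: Matrix,
    weight: Matrix,
    ref_weight: Matrix,
    target_chunk: Sequence[int],
    bias: Optional[Sequence[float]] = None,
    ref_bias: Optional[Sequence[float]] = None,
) -> float:
    # Both projections read the same hidden states (input_chunk); only the
    # output weight and bias differ, mirroring the call site after this change.
    policy = sequence_logp(input_chunk, weight, target_chunk, bias)
    reference = sequence_logp(input_chunk, ref_weight, target_chunk, ref_bias)
    return policy - reference
```

One consequence of this design is that when `ref_weight` equals `weight` (and the biases match), the log-ratio is exactly zero regardless of the input, since both branches see identical hidden states.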

{liger_kernel_nightly-0.5.2.dev20241211213024.dist-info → liger_kernel_nightly-0.5.2.dev20241212000548.dist-info}/METADATA RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.1
 Name: liger_kernel_nightly
-Version: 0.5.2.dev20241211213024
+Version: 0.5.2.dev20241212000548
 Summary: Efficient Triton kernels for LLM Training
 License: BSD 2-CLAUSE LICENSE
         Copyright 2024 LinkedIn Corporation
@@ -32,11 +32,6 @@ License-File: LICENSE
 License-File: NOTICE
 Requires-Dist: torch>=2.1.2
 Requires-Dist: triton>=2.3.1
-Provides-Extra: amd
-Requires-Dist: torch>=2.6.0.dev; extra == "amd"
-Requires-Dist: setuptools-scm>=8; extra == "amd"
-Requires-Dist: torchvision>=0.20.0.dev; extra == "amd"
-Requires-Dist: triton>=3.0.0; extra == "amd"
 Provides-Extra: dev
 Requires-Dist: transformers>=4.44.2; extra == "dev"
 Requires-Dist: matplotlib>=3.7.2; extra == "dev"
@@ -47,12 +42,11 @@ Requires-Dist: pytest>=7.1.2; extra == "dev"
 Requires-Dist: pytest-xdist; extra == "dev"
 Requires-Dist: pytest-rerunfailures; extra == "dev"
 Requires-Dist: datasets>=2.19.2; extra == "dev"
-Requires-Dist: torchvision>=0.16.2; extra == "dev"
 Requires-Dist: seaborn; extra == "dev"
-Provides-Extra: transformers
-Requires-Dist: transformers~=4.0; extra == "transformers"
-Provides-Extra: trl
-Requires-Dist: trl>=0.11.0; extra == "trl"
+Provides-Extra: fmt
+Requires-Dist: flake8; extra == "fmt"
+Requires-Dist: isort; extra == "fmt"
+Requires-Dist: black; extra == "fmt"
 <a name="readme-top"></a>
@@ -202,11 +196,13 @@ To install from source:
 ```bash
 git clone https://github.com/linkedin/Liger-Kernel.git
 cd Liger-Kernel
+# Install Default Dependencies
+# Setup.py will detect whether you are using AMD or NVIDIA
 pip install -e .
-# or if installing on amd platform
-pip install -e .[amd] --extra-index-url https://download.pytorch.org/whl/nightly/rocm6.2 # rocm6.2
-# or if using transformers
-pip install -e .[transformers]
+# Setup Development Dependencies
+pip install -e ".[dev]"
 ```
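In this release the `amd`, `transformers`, and `trl` extras disappear from METADATA, `torchvision` is no longer a `dev` requirement, and a new `fmt` extra (flake8, isort, black) appears. Core metadata uses RFC 822-style headers, so the header block of a METADATA file can be read with the standard library's email parser to see which extras a given wheel declares. `provides_extras` and `requirements_for_extra` are hypothetical helpers, and the marker matching in the second one is a deliberately naive string comparison.

```python
from email.parser import HeaderParser
from typing import List


def provides_extras(metadata_text: str) -> List[str]:
    """Return the Provides-Extra values declared in a METADATA file's header block."""
    headers = HeaderParser().parsestr(metadata_text)
    return headers.get_all("Provides-Extra") or []


def requirements_for_extra(metadata_text: str, extra: str) -> List[str]:
    """Requirements gated on `extra == "<extra>"` (naive string match on the marker)."""
    headers = HeaderParser().parsestr(metadata_text)
    marker = f'extra == "{extra}"'
    return [
        req.split(";", 1)[0].strip()
        for req in headers.get_all("Requires-Dist") or []
        if req.split(";", 1)[-1].strip() == marker
    ]
```

For an installed distribution, `importlib.metadata.metadata("liger_kernel_nightly").get_all("Provides-Extra")` should return the same list without reading the file by hand. Beyond a quick check, `Requires-Dist` markers are better evaluated with the `packaging` library than by string matching.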

{liger_kernel_nightly-0.5.2.dev20241211213024.dist-info → liger_kernel_nightly-0.5.2.dev20241212000548.dist-info}/RECORD RENAMED

@@ -1,12 +1,13 @@
 liger_kernel/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
 liger_kernel/env_report.py,sha256=FViyPju795lB6z4k2TZldvBSmQdcS0A2hcnDxepJrDo,1822
 liger_kernel/utils.py,sha256=HJa-xVKOohDn6pLVIx-Fv0V9h0QAL3qZGQNRICI-OpI,249
+liger_kernel/chunked_loss/README.md,sha256=K6rucm6nqHpWCmxUOhBYcE3apwQxAy0TfRUippR7Icw,2243
 liger_kernel/chunked_loss/__init__.py,sha256=R2wCcz4Y0kTAve926DH3k182XKezpXeACMHj05g9Mm8,346
 liger_kernel/chunked_loss/cpo_loss.py,sha256=Qu1Ul2A12sp6CqIT-atPbHWFb_LLtINEA9mOpIRx_0g,3097
 liger_kernel/chunked_loss/dpo_loss.py,sha256=H9_RRhclckHYM2sd75tgbnf8IxC_PU2JCALbgtPQvwc,4222
 liger_kernel/chunked_loss/functional.py,sha256=9Gr-YXIuEzEJkBUhDx3G2fuQayckLor7cC7svhmPML4,549
 liger_kernel/chunked_loss/fused_linear_distillation.py,sha256=2BH6DCPjsR2zS6zcwFPcIIZRhLF8SohjGdKsAJ_301o,10222
-liger_kernel/chunked_loss/fused_linear_preference.py,sha256=qeRod4MFVttj62uPFhgKAWNNjVrqiEvu5SjZfRnOGzI,15389
+liger_kernel/chunked_loss/fused_linear_preference.py,sha256=vlWfaaIECWvCQhY9PM7zRI0vKThIrydMf6P44bXn1EE,15114
 liger_kernel/chunked_loss/orpo_loss.py,sha256=ZuKGjbkIYzV4UzvupNdq6vyxCp7-BztQkUt8ZnFvKos,3531
 liger_kernel/chunked_loss/simpo_loss.py,sha256=Wa4LOlDG9PbJkOOkKg8hbKvnKgg7OTBz6-qIkwPK1yw,3275
 liger_kernel/ops/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
@@ -57,9 +58,9 @@ liger_kernel/transformers/trainer/__init__.py,sha256=c4OQVJmhNOloj0JYSEc0j_cQuBb
 liger_kernel/transformers/trainer/orpo_trainer.py,sha256=jko6oq_XQdBSmXubp05E-_YXOyhtB5Bj75dg5YNwOsE,7517
 liger_kernel/triton/__init__.py,sha256=yfRe0zMb47QnqjecZWG7LnanfCTzeku7SgWRAwNVmzU,101
 liger_kernel/triton/monkey_patch.py,sha256=5BcGKTtdqeYchypBIBopGIWPx1-cFALz7sOKoEsqXJ0,1584
-liger_kernel_nightly-0.5.2.dev20241211213024.dist-info/LICENSE,sha256=OhzLDHJ0to4a8sodVLELZiCFylZ1NAAYLs-HrjPy0ag,1312
-liger_kernel_nightly-0.5.2.dev20241211213024.dist-info/METADATA,sha256=biU1_vrGRLdmGynYauvf4YfAVHrgN2RtGWe_CNuAD3c,20721
-liger_kernel_nightly-0.5.2.dev20241211213024.dist-info/NOTICE,sha256=njwnoPZLh9AN8SJQzxvCGLHi-8X__AvWRze6joNXIY8,2066
-liger_kernel_nightly-0.5.2.dev20241211213024.dist-info/WHEEL,sha256=P9jw-gEje8ByB7_hXoICnHtVCrEwMQh-630tKvQWehc,91
-liger_kernel_nightly-0.5.2.dev20241211213024.dist-info/top_level.txt,sha256=2eghu4hA3LnkM7ElW92tQ8zegWKgSbeo-k-aGe1YnvY,13
-liger_kernel_nightly-0.5.2.dev20241211213024.dist-info/RECORD,,
+liger_kernel_nightly-0.5.2.dev20241212000548.dist-info/LICENSE,sha256=OhzLDHJ0to4a8sodVLELZiCFylZ1NAAYLs-HrjPy0ag,1312
+liger_kernel_nightly-0.5.2.dev20241212000548.dist-info/METADATA,sha256=NfFECBU1FHBc34_9Ybi5h4iFRUTmKUeNCcdqvPzhbR4,20392
+liger_kernel_nightly-0.5.2.dev20241212000548.dist-info/NOTICE,sha256=njwnoPZLh9AN8SJQzxvCGLHi-8X__AvWRze6joNXIY8,2066
+liger_kernel_nightly-0.5.2.dev20241212000548.dist-info/WHEEL,sha256=P9jw-gEje8ByB7_hXoICnHtVCrEwMQh-630tKvQWehc,91
+liger_kernel_nightly-0.5.2.dev20241212000548.dist-info/top_level.txt,sha256=2eghu4hA3LnkM7ElW92tQ8zegWKgSbeo-k-aGe1YnvY,13
+liger_kernel_nightly-0.5.2.dev20241212000548.dist-info/RECORD,,
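Each RECORD line has the form `path,sha256=<digest>,size`, where the digest is the URL-safe base64 encoding of the file's SHA-256 hash with the trailing `=` padding removed. For example, both empty `__init__.py` entries above (size 0) carry the hash of empty input. The RECORD file lists itself with empty hash and size fields. The helpers below are a minimal sketch for checking extracted files against a RECORD; `record_digest` and `verify_record` are hypothetical names, and the file contents are passed in as a dict rather than read from disk.

```python
import base64
import csv
import hashlib
import io
from typing import Dict, List


def record_digest(data: bytes) -> str:
    """RECORD-style hash: URL-safe base64 of the SHA-256 digest, '=' padding stripped."""
    return base64.urlsafe_b64encode(hashlib.sha256(data).digest()).rstrip(b"=").decode("ascii")


def verify_record(record_text: str, files: Dict[str, bytes]) -> List[str]:
    """Return paths whose hash or size does not match; entries without a hash are skipped."""
    mismatches = []
    for row in csv.reader(io.StringIO(record_text)):
        if not row:
            continue  # tolerate blank lines
        path, hash_field, size_field = row
        if not hash_field:  # e.g. the RECORD file's own "RECORD,," entry
            continue
        algo, _, expected = hash_field.partition("=")
        data = files[path]
        if algo != "sha256" or record_digest(data) != expected or int(size_field) != len(data):
            mismatches.append(path)
    return mismatches
```

Using `csv.reader` rather than splitting on commas keeps the parser correct for paths that contain commas, since RECORD is a CSV file.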

{liger_kernel_nightly-0.5.2.dev20241211213024.dist-info → liger_kernel_nightly-0.5.2.dev20241212000548.dist-info}/LICENSE RENAMED

File without changes

{liger_kernel_nightly-0.5.2.dev20241211213024.dist-info → liger_kernel_nightly-0.5.2.dev20241212000548.dist-info}/NOTICE RENAMED

File without changes

{liger_kernel_nightly-0.5.2.dev20241211213024.dist-info → liger_kernel_nightly-0.5.2.dev20241212000548.dist-info}/WHEEL RENAMED

File without changes

{liger_kernel_nightly-0.5.2.dev20241211213024.dist-info → liger_kernel_nightly-0.5.2.dev20241212000548.dist-info}/top_level.txt RENAMED

File without changes
