PyPI - liger-kernel-nightly - Versions diffs - 0.5.2.dev20241212055650__py3-none-any.whl → 0.5.2.dev20241212060541__py3-none-any.whl - Mend

liger-kernel-nightly 0.5.2.dev20241212055650__py3-none-any.whl → 0.5.2.dev20241212060541__py3-none-any.whl

Files changed (6)

{liger_kernel_nightly-0.5.2.dev20241212055650.dist-info → liger_kernel_nightly-0.5.2.dev20241212060541.dist-info}/METADATA RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.1
 Name: liger_kernel_nightly
-Version: 0.5.2.dev20241212055650
+Version: 0.5.2.dev20241212060541
 Summary: Efficient Triton kernels for LLM Training
 License: BSD 2-CLAUSE LICENSE
         Copyright 2024 LinkedIn Corporation
@@ -119,7 +119,7 @@ Requires-Dist: seaborn; extra == "dev"
 **Liger Kernel** is a collection of Triton kernels designed specifically for LLM training. It can effectively increase multi-GPU **training throughput by 20%** and reduce **memory usage by 60%**. We have implemented **Hugging Face Compatible** `RMSNorm`, `RoPE`, `SwiGLU`, `CrossEntropy`, `FusedLinearCrossEntropy`, and more to come. The kernel works out of the box with [Flash Attention](https://github.com/Dao-AILab/flash-attention), [PyTorch FSDP](https://pytorch.org/tutorials/intermediate/FSDP_tutorial.html), and [Microsoft DeepSpeed](https://github.com/microsoft/DeepSpeed). We welcome contributions from the community to gather the best kernels for LLM training.
-We've also added optimized Post-Training kernels that deliver **up to 80% memory savings** for alignment and distillation tasks. We support losses like DPO, CPO, ORPO, SimPO, JSD, and many more. Check out our [deep dive thread](https://x.com/hsu_byron/status/1866577403918917655)
+We've also added optimized Post-Training kernels that deliver **up to 80% memory savings** for alignment and distillation tasks. We support losses like DPO, CPO, ORPO, SimPO, JSD, and many more. Check out [how we optimize the memory](https://x.com/hsu_byron/status/1866577403918917655).
 ## Supercharge Your Model with Liger Kernel
@@ -136,6 +136,19 @@ With one line of code, Liger Kernel can increase throughput by more than 20% and
 > - Benchmark conditions: LLaMA 3-8B, Batch Size = 8, Data Type = `bf16`, Optimizer = AdamW, Gradient Checkpointing = True, Distributed Strategy = FSDP1 on 8 A100s.
 > - Hugging Face models start to OOM at a 4K context length, whereas Hugging Face + Liger Kernel scales up to 16K.
+## Optimize post-training with Liger Kernel
+![Post Training](https://raw.githubusercontent.com/linkedin/Liger-Kernel/main/docs/images/post-training.png)
+We provide optimized post-training kernels like DPO, ORPO, SimPO, and more, which can reduce memory usage by up to 80%. You can easily use them as Python modules.
+```python
+from liger_kernel.chunked_loss import LigerFusedLinearORPOLoss
+orpo_loss = LigerFusedLinearORPOLoss()
+y = orpo_loss(lm_head.weight, x, target)
+```
 ## Examples
 | **Use Case**                                    | **Description**                                                                                   |

{liger_kernel_nightly-0.5.2.dev20241212055650.dist-info → liger_kernel_nightly-0.5.2.dev20241212060541.dist-info}/RECORD RENAMED

@@ -58,9 +58,9 @@ liger_kernel/transformers/trainer/__init__.py,sha256=c4OQVJmhNOloj0JYSEc0j_cQuBb
 liger_kernel/transformers/trainer/orpo_trainer.py,sha256=jko6oq_XQdBSmXubp05E-_YXOyhtB5Bj75dg5YNwOsE,7517
 liger_kernel/triton/__init__.py,sha256=yfRe0zMb47QnqjecZWG7LnanfCTzeku7SgWRAwNVmzU,101
 liger_kernel/triton/monkey_patch.py,sha256=5BcGKTtdqeYchypBIBopGIWPx1-cFALz7sOKoEsqXJ0,1584
-liger_kernel_nightly-0.5.2.dev20241212055650.dist-info/LICENSE,sha256=OhzLDHJ0to4a8sodVLELZiCFylZ1NAAYLs-HrjPy0ag,1312
-liger_kernel_nightly-0.5.2.dev20241212055650.dist-info/METADATA,sha256=V0dpxYDogziib6ySh0AqgHcxSwRs-Kpy1IEfQl-Z_eo,20518
-liger_kernel_nightly-0.5.2.dev20241212055650.dist-info/NOTICE,sha256=njwnoPZLh9AN8SJQzxvCGLHi-8X__AvWRze6joNXIY8,2066
-liger_kernel_nightly-0.5.2.dev20241212055650.dist-info/WHEEL,sha256=P9jw-gEje8ByB7_hXoICnHtVCrEwMQh-630tKvQWehc,91
-liger_kernel_nightly-0.5.2.dev20241212055650.dist-info/top_level.txt,sha256=2eghu4hA3LnkM7ElW92tQ8zegWKgSbeo-k-aGe1YnvY,13
-liger_kernel_nightly-0.5.2.dev20241212055650.dist-info/RECORD,,
+liger_kernel_nightly-0.5.2.dev20241212060541.dist-info/LICENSE,sha256=OhzLDHJ0to4a8sodVLELZiCFylZ1NAAYLs-HrjPy0ag,1312
+liger_kernel_nightly-0.5.2.dev20241212060541.dist-info/METADATA,sha256=J64c14dbQAzCW0-j89DnVcgt1VxXesKDC-szl0_2dvU,21001
+liger_kernel_nightly-0.5.2.dev20241212060541.dist-info/NOTICE,sha256=njwnoPZLh9AN8SJQzxvCGLHi-8X__AvWRze6joNXIY8,2066
+liger_kernel_nightly-0.5.2.dev20241212060541.dist-info/WHEEL,sha256=P9jw-gEje8ByB7_hXoICnHtVCrEwMQh-630tKvQWehc,91
+liger_kernel_nightly-0.5.2.dev20241212060541.dist-info/top_level.txt,sha256=2eghu4hA3LnkM7ElW92tQ8zegWKgSbeo-k-aGe1YnvY,13
+liger_kernel_nightly-0.5.2.dev20241212060541.dist-info/RECORD,,
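
Each RECORD line above has the form `path,sha256=<digest>,<size in bytes>`, where the digest is the URL-safe base64 encoding of the file's SHA-256 hash with the trailing `=` padding stripped; the RECORD file lists itself with empty hash and size fields (`RECORD,,`). That is why, in this diff, only the METADATA entry gets a new hash and size (20518 → 21001 bytes), while LICENSE, NOTICE, WHEEL, and top_level.txt keep identical hashes under the renamed `.dist-info` directory. A minimal sketch of how one such entry can be computed, assuming the file contents are already in memory as bytes; the helper name `record_entry` and the example path are hypothetical:

```python
import base64
import hashlib


def record_entry(path: str, data: bytes) -> str:
    """Build one wheel RECORD line: path,sha256=<urlsafe-b64-nopad digest>,<size>."""
    digest = hashlib.sha256(data).digest()
    # URL-safe base64 with the trailing '=' padding removed, as RECORD entries use
    encoded = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    # The last field is the file size in bytes
    return f"{path},sha256={encoded},{len(data)}"


if __name__ == "__main__":
    # Hypothetical empty file, used only to show the shape of the output
    print(record_entry("example_pkg/__init__.py", b""))
```

Because the hash covers the exact file bytes, any README edit embedded in METADATA changes that entry, while byte-identical files produce the same hash regardless of the directory rename.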

{liger_kernel_nightly-0.5.2.dev20241212055650.dist-info → liger_kernel_nightly-0.5.2.dev20241212060541.dist-info}/LICENSE RENAMED

File without changes

{liger_kernel_nightly-0.5.2.dev20241212055650.dist-info → liger_kernel_nightly-0.5.2.dev20241212060541.dist-info}/NOTICE RENAMED

File without changes

{liger_kernel_nightly-0.5.2.dev20241212055650.dist-info → liger_kernel_nightly-0.5.2.dev20241212060541.dist-info}/WHEEL RENAMED

File without changes

{liger_kernel_nightly-0.5.2.dev20241212055650.dist-info → liger_kernel_nightly-0.5.2.dev20241212060541.dist-info}/top_level.txt RENAMED

File without changes
