PyPI - litgpt - Versions diffs - 0.3.0__tar.gz → 0.4.0.dev0__tar.gz - Mend

litgpt 0.3.0__tar.gz → 0.4.0.dev0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (99)

{litgpt-0.3.0 → litgpt-0.4.0.dev0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.1
 Name: litgpt
-Version: 0.3.0
+Version: 0.4.0.dev0
 Summary: Hackable implementation of state-of-the-art open-source LLMs
 Author-email: Lightning AI <contact@lightning.ai>
 License:                                  Apache License
@@ -210,23 +210,24 @@ Project-URL: documentation, https://github.com/lightning-AI/litgpt/tutorials
 Description-Content-Type: text/markdown
 License-File: LICENSE
 Requires-Dist: torch>=2.2.0
-Requires-Dist: lightning==2.3.0.dev20240328
+Requires-Dist: lightning==2.3.0.dev20240428
 Requires-Dist: jsonargparse[signatures]>=4.27.6
-Requires-Dist: litserve>=0.1.0
 Provides-Extra: test
 Requires-Dist: pytest>=8.1.1; extra == "test"
 Requires-Dist: pytest-rerunfailures>=14.0; extra == "test"
 Requires-Dist: pytest-timeout>=2.3.1; extra == "test"
+Requires-Dist: pytest-dependency>=0.6.0; extra == "test"
 Requires-Dist: transformers>=4.38.0; extra == "test"
 Requires-Dist: einops>=0.7.0; extra == "test"
 Requires-Dist: protobuf>=4.23.4; extra == "test"
-Requires-Dist: lightning-thunder==0.2.0.dev20240404; python_version >= "3.10" and extra == "test"
+Requires-Dist: lightning-thunder==0.2.0.dev20240505; python_version >= "3.10" and extra == "test"
 Provides-Extra: all
 Requires-Dist: bitsandbytes==0.42.0; extra == "all"
 Requires-Dist: sentencepiece>=0.2.0; extra == "all"
 Requires-Dist: tokenizers>=0.15.2; extra == "all"
 Requires-Dist: requests>=2.31.0; extra == "all"
-Requires-Dist: litdata>=0.2.2; extra == "all"
+Requires-Dist: litdata==0.2.6; extra == "all"
+Requires-Dist: litserve==0.1.1dev0; extra == "all"
 Requires-Dist: zstandard>=0.22.0; extra == "all"
 Requires-Dist: pandas>=1.9.0; extra == "all"
 Requires-Dist: pyarrow>=15.0.2; extra == "all"
@@ -247,7 +248,11 @@ Requires-Dist: huggingface_hub[hf_transfer]>=0.21.0; extra == "all"
 Uses the latest state-of-the-art techniques:
-✅ flash attention &nbsp; &nbsp;  ✅ fp4/8/16/32 &nbsp; &nbsp;  ✅ LoRA, QLoRA, Adapter (v1, v2) &nbsp; &nbsp;  ✅ FSDP &nbsp; &nbsp;  ✅ 1-1000+ GPUs/TPUs
+<pre>
+✅ flash attention    ✅ fp4/8/16/32        ✅ LoRA, QLoRA, Adapter
+✅ FSDP               ✅ 1-1000+ GPUs/TPUs  ✅ 20+ LLMs
+</pre>
 ---
@@ -273,63 +278,70 @@ Uses the latest state-of-the-art techniques:
 <img src="https://pl-bolts-doc-images.s3.us-east-2.amazonaws.com/GithubLitGPTDAG2.png" alt="LitGPT steps" width="auto"/>
 &nbsp;
-# Finetune, pretrain and deploy LLMs Lightning fast ⚡⚡
+# Finetune, pretrain and deploy LLMs Lightning fast ⚡⚡
 LitGPT is a command-line tool designed to easily [finetune](#finetune-an-llm), [pretrain](#pretrain-an-llm), [evaluate](#use-an-llm), and [deploy](#deploy-an-llm) [20+ LLMs](#choose-from-20-llms) **on your own data**. It features highly-optimized [training recipes](#training-recipes) for the world's most powerful open-source large language models (LLMs).
-We reimplemented all model architectures and training recipes from scratch for 4 reasons:
+We reimplemented all model architectures and training recipes from scratch for 4 reasons:
-1. Remove all abstraction layers and have single file implementations.
-2. Guarantee Apache 2.0 compliance to enable enterprise use without limits.
-3. Optimized each model's architectural detail to maximize performance, reduce costs, and speed up training.
-4. Highly-optimized [recipe configs](#training-recipes) we have tested at enterprise scale.
+1. Remove all abstraction layers and have single file implementations.
+2. Guarantee Apache 2.0 compliance to enable enterprise use without limits.
+3. Optimized each model's architectural detail to maximize performance, reduce costs, and speed up training.
+4. Highly-optimized [recipe configs](#training-recipes) we have tested at enterprise scale.
 ---
 &nbsp;
 # Choose from 20+ LLMs
-LitGPT has 🤯 **custom, from-scratch implementations** of [20+ LLMs](tutorials/download_model_weights.md) without layers of abstraction:
+LitGPT has 🤯 **custom, from-scratch implementations** of [20+ LLMs](tutorials/download_model_weights.md) without layers of abstraction:
 | Model | Model size | Author | Reference |
 |----|----|----|----|
-| Llama 3 | 8B, 70B | Meta AI | [Meta AI 2024](https://github.com/meta-llama/llama3)                                                                     |
-| Llama 2 | 7B, 13B, 70B | Meta AI | [Touvron et al. 2023](https://arxiv.org/abs/2307.09288)                                                                      |
+| Llama 3 | 8B, 70B | Meta AI | [Meta AI 2024](https://github.com/meta-llama/llama3) |
+| Llama 2 | 7B, 13B, 70B | Meta AI | [Touvron et al. 2023](https://arxiv.org/abs/2307.09288) |
 | Code Llama | 7B, 13B, 34B, 70B | Meta AI | [Rozière et al. 2023](https://arxiv.org/abs/2308.12950) |
-| Mistral | 7B | Mistral AI | [Mistral website](https://mistral.ai/)                                                                                       |
+| Mixtral MoE | 8x7B | Mistral AI | [Mistral AI 2023](https://mistral.ai/news/mixtral-of-experts/)                                                                      |
+| Mistral | 7B | Mistral AI | [Mistral AI 2023](https://mistral.ai/news/announcing-mistral-7b/)                                                                         |
 | CodeGemma | 7B | Google | [Google Team, Google Deepmind](https://ai.google.dev/gemma/docs/codegemma) |
 | ... | ... | ... | ...   |
 <details>
   <summary>See full list of 20+ LLMs</summary>
-&nbsp;
+&nbsp;
 #### All models
 | Model | Model size | Author | Reference |
 |----|----|----|----|
-| CodeGemma | 7B | Google | [Google Team, Google Deepmind](https://ai.google.dev/gemma/docs/codegemma) |
-| Code Llama | 7B, 13B, 34B, 70B | Meta AI | [Rozière et al. 2023](https://arxiv.org/abs/2308.12950) |
-| Dolly | 3B, 7B, 12B | Databricks | [Conover et al. 2023](https://www.databricks.com/blog/2023/04/12/dolly-first-open-commercially-viable-instruction-tuned-llm) |
-| Falcon | 7B, 40B, 180B | TII UAE | [TII 2023](https://falconllm.tii.ae)                                                                                         |
-| FreeWilly2 (Stable Beluga 2) | 70B | Stability AI | [Stability AI 2023](https://stability.ai/blog/stable-beluga-large-instruction-fine-tuned-models)                             |
-| Function Calling Llama 2 | 7B | Trelis | [Trelis et al. 2023](https://huggingface.co/Trelis/Llama-2-7b-chat-hf-function-calling-v2)                                   |
-| Gemma | 2B, 7B | Google | [Google Team, Google Deepmind](https://storage.googleapis.com/deepmind-media/gemma/gemma-report.pdf)                         |
-| Llama 2 | 7B, 13B, 70B | Meta AI | [Touvron et al. 2023](https://arxiv.org/abs/2307.09288)                                                                      |
-| Llama 3 | 8B, 70B | Meta AI | [Meta AI 2024](https://github.com/meta-llama/llama3)                                                                     |
-| LongChat | 7B, 13B | LMSYS | [LongChat Team 2023](https://lmsys.org/blog/2023-06-29-longchat/)                                                            |
-| Mistral | 7B | Mistral AI | [Mistral website](https://mistral.ai/)                                                                                       |
-| Nous-Hermes | 7B, 13B, 70B | NousResearch | [Org page](https://huggingface.co/NousResearch)                                                                              |
-| OpenLLaMA | 3B, 7B, 13B | OpenLM Research | [Geng & Liu 2023](https://github.com/openlm-research/open_llama)                                                             |
-| Phi | 1.3B, 2.7B | Microsoft Research  | [Li et al. 2023](https://arxiv.org/abs/2309.05463)                                                                           |
+| CodeGemma | 7B | Google | [Google Team, Google Deepmind](https://ai.google.dev/gemma/docs/codegemma)                                                                 |
+| Code Llama | 7B, 13B, 34B, 70B | Meta AI | [Rozière et al. 2023](https://arxiv.org/abs/2308.12950)                                                                   |
+| Danube2 | 1.8B | H2O.ai | [H2O.ai](https://h2o.ai/platform/danube-1-8b/)                                                                                             |
+| Dolly | 3B, 7B, 12B | Databricks | [Conover et al. 2023](https://www.databricks.com/blog/2023/04/12/dolly-first-open-commercially-viable-instruction-tuned-llm)      |
+| Falcon | 7B, 40B, 180B | TII UAE | [TII 2023](https://falconllm.tii.ae)                                                                                              |
+| FreeWilly2 (Stable Beluga 2) | 70B | Stability AI | [Stability AI 2023](https://stability.ai/blog/stable-beluga-large-instruction-fine-tuned-models)                 |
+| Function Calling Llama 2 | 7B | Trelis | [Trelis et al. 2023](https://huggingface.co/Trelis/Llama-2-7b-chat-hf-function-calling-v2)                                  |
+| Gemma | 2B, 7B | Google | [Google Team, Google Deepmind](https://storage.googleapis.com/deepmind-media/gemma/gemma-report.pdf)                                       |
+| Llama 2 | 7B, 13B, 70B | Meta AI | [Touvron et al. 2023](https://arxiv.org/abs/2307.09288)                                                                           |
+| Llama 3 | 8B, 70B | Meta AI | [Meta AI 2024](https://github.com/meta-llama/llama3)                                                                                   |
+| LongChat | 7B, 13B | LMSYS | [LongChat Team 2023](https://lmsys.org/blog/2023-06-29-longchat/)                                                                       |
+| MicroLlama | 300M | Ken Wang | [MicroLlama repo](https://github.com/keeeeenw/MicroLlama)
+| Mixtral MoE | 8x7B | Mistral AI | [Mistral AI 2023](https://mistral.ai/news/mixtral-of-experts/)                                                                     |
+| Mistral | 7B | Mistral AI | [Mistral AI 2023](https://mistral.ai/news/announcing-mistral-7b/)                                                                        |
+| Nous-Hermes | 7B, 13B, 70B | NousResearch | [Org page](https://huggingface.co/NousResearch)                                                                          |
+| OpenLLaMA | 3B, 7B, 13B | OpenLM Research | [Geng & Liu 2023](https://github.com/openlm-research/open_llama)                                                         |
+| Phi | 1.3B, 2.7B | Microsoft Research  | [Li et al. 2023](https://arxiv.org/abs/2309.05463)                                                                          |
 | Platypus | 7B, 13B, 70B |  Lee et al. | [Lee, Hunter, and Ruiz 2023](https://arxiv.org/abs/2308.07317)                                                               |
-| Pythia | {14,31,70,160,410}M, {1,1.4,2.8,6.9,12}B | EleutherAI | [Biderman et al. 2023](https://arxiv.org/abs/2304.01373)                                                                     |
-| RedPajama-INCITE | 3B, 7B | Together | [Together 2023](https://together.ai/blog/redpajama-models-v1)                                                                |
-| StableCode | 3B | Stability AI | [Stability AI 2023](https://stability.ai/blog/stablecode-llm-generative-ai-coding)                                           |
-| StableLM  | 3B, 7B | Stability AI | [Stability AI 2023](https://github.com/Stability-AI/StableLM)                                                                |
-| StableLM Zephyr | 3B | Stability AI | [Stability AI 2023](https://stability.ai/blog/stablecode-llm-generative-ai-coding)                                           |
-| TinyLlama | 1.1B | Zhang et al. | [Zhang et al. 2023](https://github.com/jzhang38/TinyLlama)                                                                   |
-| Vicuna | 7B, 13B, 33B | LMSYS | [Li et al. 2023](https://lmsys.org/blog/2023-03-30-vicuna/)
+| Pythia | {14,31,70,160,410}M, {1,1.4,2.8,6.9,12}B | EleutherAI | [Biderman et al. 2023](https://arxiv.org/abs/2304.01373)                                            |
+| RedPajama-INCITE | 3B, 7B | Together | [Together 2023](https://together.ai/blog/redpajama-models-v1)                                                                 |
+| StableCode | 3B | Stability AI | [Stability AI 2023](https://stability.ai/blog/stablecode-llm-generative-ai-coding)                                                  |
+| StableLM  | 3B, 7B | Stability AI | [Stability AI 2023](https://github.com/Stability-AI/StableLM)                                                                    |
+| StableLM Zephyr | 3B | Stability AI | [Stability AI 2023](https://stability.ai/blog/stablecode-llm-generative-ai-coding)                                             |
+| TinyLlama | 1.1B | Zhang et al. | [Zhang et al. 2023](https://github.com/jzhang38/TinyLlama)                                                                         |
+| Vicuna | 7B, 13B, 33B | LMSYS | [Li et al. 2023](https://lmsys.org/blog/2023-03-30-vicuna/)                                                                          |
+**Tip**: You can list all available models by running the `litgpt download list` command.
 </details>
@@ -361,41 +373,44 @@ pip install -e '.[all]'
 &nbsp;
 # Quick start
-After installing LitGPT, select the model and action you want to take on that model (finetune, pretrain, evaluate, deploy, etc...):
+After installing LitGPT, select the model and action you want to take on that model (finetune, pretrain, evaluate, deploy, etc...):
 ```bash
 # ligpt [action] [model]
 litgpt  download  meta-llama/Meta-Llama-3-8B-Instruct
 litgpt  chat      meta-llama/Meta-Llama-3-8B-Instruct
-litgpt  finetune  meta-llama/Meta-Llama-3-8B-Instruct
-litgpt  pretrain  meta-llama/Meta-Llama-3-8B-Instruct
-litgpt  serve     meta-llama/Meta-Llama-3-8B-Instruct
+litgpt  finetune  meta-llama/Meta-Llama-3-8B-Instruct
+litgpt  pretrain  meta-llama/Meta-Llama-3-8B-Instruct
+litgpt  serve     meta-llama/Meta-Llama-3-8B-Instruct
 ```
 &nbsp;
 ###  Use an LLM for inference
-Use LLMs for inference to test its chatting capabilities, run evaluations, or extract embeddings, etc...
-Here's an example showing how to use the Mistral 7B LLM.
+Use LLMs for inference to test its chatting capabilities, run evaluations, or extract embeddings, etc.
+Here's an example showing how to use the Phi-2 LLM.
 <a target="_blank" href="https://lightning.ai/lightning-ai/studios/litgpt-chat">
   <img src="https://pl-bolts-doc-images.s3.us-east-2.amazonaws.com/app-2/studio-badge.svg" alt="Open In Studio"/>
 </a>
-&nbsp;
+&nbsp;
 ```bash
-# 1) Download a pretrained model
-litgpt download --repo_id mistralai/Mistral-7B-Instruct-v0.2
+# 1) List all available models in litgpt
+litgpt download list
+# 2) Download a pretrained model
+litgpt download microsoft/phi-2
-# 2) Chat with the model
-litgpt chat \
-  --checkpoint_dir checkpoints/mistralai/Mistral-7B-Instruct-v0.2
+# 3) Chat with the model
+litgpt chat microsoft/phi-2
 >> Prompt: What do Llamas eat?
 ```
-For more information, refer to the [download](tutorials/download_model_weights.md) and [inference](tutorials/inference.md) tutorials.
+The download of certain models requires an additional access token. You can read more about this in the [download](tutorials/download_model_weights.md#specific-models-and-access-tokens) documentation.
+For more information on the different inference options, refer to the [inference](tutorials/inference.md) tutorial.
 &nbsp;
@@ -406,37 +421,35 @@ For more information, refer to the [download](tutorials/download_model_weights.m
   <img src="https://pl-bolts-doc-images.s3.us-east-2.amazonaws.com/app-2/studio-badge.svg" alt="Open In Studio"/>
 </a>
-&nbsp;
+&nbsp;
 ```bash
 # 1) Download a pretrained model
-litgpt download --repo_id microsoft/phi-2
+litgpt download microsoft/phi-2
 # 2) Finetune the model
 curl -L https://huggingface.co/datasets/ksaw008/finance_alpaca/resolve/main/finance_alpaca.json -o my_custom_dataset.json
-litgpt finetune \
-  --checkpoint_dir checkpoints/microsoft/phi-2 \
+litgpt finetune microsoft/phi-2 \
   --data JSON \
   --data.json_path my_custom_dataset.json \
   --data.val_split_fraction 0.1 \
   --out_dir out/custom-model
 # 3) Chat with the model
-litgpt chat \
-  --checkpoint_dir out/custom-model/final
+litgpt chat out/custom-model/final
 ```
 &nbsp;
-### Pretrain an LLM
+### Pretrain an LLM
 Train an LLM from scratch on your own data via pretraining:
 <a target="_blank" href="https://lightning.ai/lightning-ai/studios/litgpt-pretrain">
 <img src="https://pl-bolts-doc-images.s3.us-east-2.amazonaws.com/app-2/studio-badge.svg"; alt="Open In Studio"/>
 </a>
-&nbsp;
+&nbsp;
 ```bash
 mkdir -p custom_texts
@@ -444,35 +457,32 @@ curl https://www.gutenberg.org/cache/epub/24440/pg24440.txt --output custom_text
 curl https://www.gutenberg.org/cache/epub/26393/pg26393.txt --output custom_texts/book2.txt
 # 1) Download a tokenizer
-litgpt download \
-  --repo_id EleutherAI/pythia-160m \
+litgpt download EleutherAI/pythia-160m \
   --tokenizer_only True
 # 2) Pretrain the model
-litgpt pretrain \
-  --model_name pythia-160m \
-  --tokenizer_dir checkpoints/EleutherAI/pythia-160m \
+litgpt pretrain EleutherAI/pythia-160m \
+  --tokenizer_dir EleutherAI/pythia-160m \
   --data TextFiles \
   --data.train_data_path "custom_texts/" \
   --train.max_tokens 10_000_000 \
   --out_dir out/custom-model
 # 3) Chat with the model
-litgpt chat \
-  --checkpoint_dir out/custom-model/final
+litgpt chat out/custom-model/final
 ```
 &nbsp;
-### Continue pretraining an LLM
-This is another way of finetuning that specializes an already pretrained model by training on custom data:
+### Continue pretraining an LLM
+This is another way of finetuning that specializes an already pretrained model by training on custom data:
 <a target="_blank" href="https://lightning.ai/lightning-ai/studios/litgpt-continue-pretraining">
 <img src="https://pl-bolts-doc-images.s3.us-east-2.amazonaws.com/app-2/studio-badge.svg"; alt="Open In Studio"/>
 </a>
-&nbsp;
+&nbsp;
 ```bash
 mkdir -p custom_texts
@@ -480,27 +490,25 @@ curl https://www.gutenberg.org/cache/epub/24440/pg24440.txt --output custom_text
 curl https://www.gutenberg.org/cache/epub/26393/pg26393.txt --output custom_texts/book2.txt
 # 1) Download a pretrained model
-litgpt download --repo_id EleutherAI/pythia-160m
+litgpt download EleutherAI/pythia-160m
 # 2) Continue pretraining the model
-litgpt pretrain \
-  --model_name pythia-160m \
-  --tokenizer_dir checkpoints/EleutherAI/pythia-160m \
-  --initial_checkpoint_dir checkpoints/EleutherAI/pythia-160m \
+litgpt pretrain EleutherAI/pythia-160m \
+  --tokenizer_dir EleutherAI/pythia-160m \
+  --initial_checkpoint_dir EleutherAI/pythia-160m \
   --data TextFiles \
   --data.train_data_path "custom_texts/" \
   --train.max_tokens 10_000_000 \
   --out_dir out/custom-model
 # 3) Chat with the model
-litgpt chat \
-  --checkpoint_dir out/custom-model/final
+litgpt chat out/custom-model/final
 ```
 &nbsp;
 ### Deploy an LLM
-Once you're ready to deploy a finetuned LLM, run this command:
+Once you're ready to deploy a finetuned LLM, run this command:
 <a target="_blank" href="https://lightning.ai/lightning-ai/studios/litgpt-serve">
   <img src="https://pl-bolts-doc-images.s3.us-east-2.amazonaws.com/app-2/studio-badge.svg" alt="Open In Studio"/>
@@ -509,20 +517,20 @@ Once you're ready to deploy a finetuned LLM, run this command:
 &nbsp;
 ```bash
-# locate the checkpoint to your finetuned or pretrained model and call the `serve` command:
-litgpt serve --checkpoint_dir path/to/your/checkpoint/microsoft/phi-2
+# locate the checkpoint to your finetuned or pretrained model and call the `serve` command:
+litgpt serve microsoft/phi-2
-# Alternative: if you haven't finetuned, download any checkpoint to deploy it:
-litgpt download --repo_id microsoft/phi-2
-litgpt serve --checkpoint_dir checkpoints/microsoft/phi-2
+# Alternative: if you haven't finetuned, download any checkpoint to deploy it:
+litgpt download microsoft/phi-2
+litgpt serve microsoft/phi-2
 ```
-Test the server in a separate terminal and integrate the model API into your AI product:
+Test the server in a separate terminal and integrate the model API into your AI product:
 ```python
 # 3) Use the server (in a separate session)
 import requests, json
  response = requests.post(
-     "http://127.0.0.1:8000/predict",
+     "http://127.0.0.1:8000/predict",
      json={"prompt": "Fix typos in the following sentence: Exampel input"}
 )
 print(response.json()["output"])
@@ -731,7 +739,7 @@ litgpt finetune \
 &nbsp;
-# Community
+# Community
 ## Get involved!
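
The `Requires-Dist` changes in the `PKG-INFO` hunk above can also be summarized programmatically. The following is a minimal sketch, not part of either package release: it assumes the two `PKG-INFO` files are available as text, and the helper names `requires_dist` and `diff_requirements` are hypothetical. The sample strings are abbreviated excerpts of the metadata shown in this diff, and parsing relies only on Python's standard `email.parser`, since `PKG-INFO` uses RFC 822 style headers.

```python
from email.parser import Parser

# Abbreviated excerpts of the two PKG-INFO files shown in the diff above.
OLD_PKG_INFO = """\
Metadata-Version: 2.1
Name: litgpt
Version: 0.3.0
Requires-Dist: torch>=2.2.0
Requires-Dist: lightning==2.3.0.dev20240328
Requires-Dist: jsonargparse[signatures]>=4.27.6
Requires-Dist: litserve>=0.1.0
Requires-Dist: litdata>=0.2.2; extra == "all"
"""

NEW_PKG_INFO = """\
Metadata-Version: 2.1
Name: litgpt
Version: 0.4.0.dev0
Requires-Dist: torch>=2.2.0
Requires-Dist: lightning==2.3.0.dev20240428
Requires-Dist: jsonargparse[signatures]>=4.27.6
Requires-Dist: litdata==0.2.6; extra == "all"
Requires-Dist: litserve==0.1.1dev0; extra == "all"
"""


def requires_dist(pkg_info_text):
    """Return the set of Requires-Dist values found in PKG-INFO text."""
    msg = Parser().parsestr(pkg_info_text, headersonly=True)
    return set(msg.get_all("Requires-Dist") or [])


def diff_requirements(old_text, new_text):
    """Return (removed, added) requirement strings, each sorted."""
    old_reqs = requires_dist(old_text)
    new_reqs = requires_dist(new_text)
    return sorted(old_reqs - new_reqs), sorted(new_reqs - old_reqs)


removed, added = diff_requirements(OLD_PKG_INFO, NEW_PKG_INFO)
for req in removed:
    print(f"- {req}")
for req in added:
    print(f"+ {req}")
```

Comparing whole requirement strings means a changed pin shows up as one removal plus one addition, which mirrors how the hunk above presents `lightning`, `litdata`, and `litserve`. Note that `litserve` moves from a core requirement to the `all` extra, which a name-only comparison would hide.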
