llmflowstack 1.2.6__tar.gz → 1.3.0__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- {llmflowstack-1.2.6 → llmflowstack-1.3.0}/PKG-INFO +33 -132
- {llmflowstack-1.2.6 → llmflowstack-1.3.0}/README.md +27 -120
- llmflowstack-1.3.0/llmflowstack/__init__.py +27 -0
- llmflowstack-1.3.0/llmflowstack/callbacks/force_json.py +428 -0
- llmflowstack-1.3.0/llmflowstack/collators/multimodal_causal.py +122 -0
- llmflowstack-1.3.0/llmflowstack/decoders/__init__.py +18 -0
- llmflowstack-1.3.0/llmflowstack/decoders/base_decoder.py +694 -0
- llmflowstack-1.3.0/llmflowstack/decoders/gemma_3.py +143 -0
- llmflowstack-1.3.0/llmflowstack/decoders/gpt_2.py +106 -0
- llmflowstack-1.3.0/llmflowstack/decoders/gpt_oss.py +174 -0
- llmflowstack-1.3.0/llmflowstack/decoders/llama_3.py +123 -0
- llmflowstack-1.3.0/llmflowstack/decoders/llama_4.py +134 -0
- llmflowstack-1.3.0/llmflowstack/decoders/medgemma.py +169 -0
- llmflowstack-1.3.0/llmflowstack/decoders/qwen_3.py +194 -0
- {llmflowstack-1.2.6 → llmflowstack-1.3.0}/llmflowstack/rag/VectorDatabase.py +48 -14
- {llmflowstack-1.2.6 → llmflowstack-1.3.0}/llmflowstack/rag/__init__.py +0 -4
- llmflowstack-1.3.0/llmflowstack/schemas/__init__.py +6 -0
- llmflowstack-1.3.0/llmflowstack/schemas/params.py +106 -0
- llmflowstack-1.3.0/pyproject.toml +37 -0
- llmflowstack-1.2.6/llmflowstack/__init__.py +0 -23
- llmflowstack-1.2.6/llmflowstack/callbacks/stop_on_token.py +0 -16
- llmflowstack-1.2.6/llmflowstack/decoders/BaseDecoder.py +0 -487
- llmflowstack-1.2.6/llmflowstack/decoders/GPT_OSS.py +0 -300
- llmflowstack-1.2.6/llmflowstack/decoders/Gemma.py +0 -327
- llmflowstack-1.2.6/llmflowstack/decoders/LLaMA3.py +0 -244
- llmflowstack-1.2.6/llmflowstack/decoders/LLaMA4.py +0 -324
- llmflowstack-1.2.6/llmflowstack/decoders/MedGemma.py +0 -275
- llmflowstack-1.2.6/llmflowstack/decoders/__init__.py +0 -13
- llmflowstack-1.2.6/llmflowstack/schemas/__init__.py +0 -9
- llmflowstack-1.2.6/llmflowstack/schemas/params.py +0 -40
- llmflowstack-1.2.6/llmflowstack/utils/generation_utils.py +0 -30
- llmflowstack-1.2.6/pyproject.toml +0 -43
- {llmflowstack-1.2.6 → llmflowstack-1.3.0}/.github/workflows/python-publish.yml +0 -0
- {llmflowstack-1.2.6 → llmflowstack-1.3.0}/.gitignore +0 -0
- {llmflowstack-1.2.6 → llmflowstack-1.3.0}/LICENSE +0 -0
- {llmflowstack-1.2.6 → llmflowstack-1.3.0}/llmflowstack/callbacks/__init__.py +0 -0
- {llmflowstack-1.2.6 → llmflowstack-1.3.0}/llmflowstack/callbacks/log_collector.py +0 -0
- {llmflowstack-1.2.6 → llmflowstack-1.3.0}/llmflowstack/utils/__init__.py +0 -0
- {llmflowstack-1.2.6 → llmflowstack-1.3.0}/llmflowstack/utils/evaluation_methods.py +0 -0
- {llmflowstack-1.2.6 → llmflowstack-1.3.0}/llmflowstack/utils/exceptions.py +0 -0
- {llmflowstack-1.2.6 → llmflowstack-1.3.0}/llmflowstack/utils/logging.py +0 -0
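The decoder modules are renamed in this release: 1.2.6 shipped CamelCase files (`GPT_OSS.py`, `LLaMA3.py`, `Gemma.py`, ...) that 1.3.0 replaces with snake_case modules (`gpt_oss.py`, `llama_3.py`, `gemma_3.py`, ...) whose classes are re-exported from the new `llmflowstack/__init__.py` shown at the end of this diff. A minimal migration sketch using only names that appear in this diff; the checkpoint paths are placeholders:

```python
# 1.2.6-style imports used the old class names (per the removed modules and hunk context):
# from llmflowstack import GPT_OSS, LLaMA3

# 1.3.0-style imports use the renamed exports from llmflowstack/__init__.py:
from llmflowstack import GenerationParams, GptOss, Llama3

# Placeholder checkpoint paths; substitute your local model directories.
llama = Llama3(checkpoint="/llama-3.1-8b-Instruct")
llama.generate("Why is the sky blue?")

gpt_oss = GptOss(checkpoint="/gpt-oss-20b", quantization=True, seed=1234)
gpt_oss.generate(
    data=gpt_oss.build_input(input_text="Tell me a joke!"),
    params=GenerationParams(mode="sample", max_new_tokens=1024, temperature=0.3),
)
```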
{llmflowstack-1.2.6 → llmflowstack-1.3.0}/PKG-INFO

@@ -1,35 +1,29 @@
 Metadata-Version: 2.4
 Name: llmflowstack
-Version: 1.2.6
-Summary: LLMFlowStack is a framework for training and using LLMs (LLaMA, GPT-OSS, Gemma, ...). Supports DAPT, fine-tuning, and distributed inference.
+Version: 1.3.0
+Summary: LLMFlowStack is a framework for training and using LLMs (LLaMA, GPT-OSS, Gemma, ...). Supports DAPT, fine-tuning, and distributed inference.
 Author-email: Gustavo Henrique Ferreira Cruz <gustavohferreiracruz@gmail.com>
 License: MIT
 License-File: LICENSE
 Requires-Python: >=3.12
 Requires-Dist: accelerate
 Requires-Dist: bert-score
-Requires-Dist: bitsandbytes
 Requires-Dist: chromadb
 Requires-Dist: datasets
 Requires-Dist: evaluate
-Requires-Dist:
+Requires-Dist: fbgemm-gpu-genai
 Requires-Dist: kernels
 Requires-Dist: langchain-chroma
 Requires-Dist: langchain-community
 Requires-Dist: nltk
-Requires-Dist: numpy
-Requires-Dist: openai-harmony
-Requires-Dist: pandas
 Requires-Dist: peft
+Requires-Dist: pillow
 Requires-Dist: rouge-score
 Requires-Dist: safetensors
-Requires-Dist: scikit-learn
-Requires-Dist: scipy
 Requires-Dist: sentence-transformers
 Requires-Dist: torch
-Requires-Dist:
-Requires-Dist:
-Requires-Dist: transformers
+Requires-Dist: torchao
+Requires-Dist: transformers==4.57.6
 Requires-Dist: triton
 Requires-Dist: trl
 Description-Content-Type: text/markdown
@@ -53,32 +47,23 @@ The goal is to make experimentation with LLMs more accessible, without the need
 This framework is designed to provide flexibility when working with different open-source and commercial LLMs. Currently, the following models are supported:

 - **GPT-OSS**
-
   - [`GPT-OSS 20B`](https://huggingface.co/openai/gpt-oss-20b)
   - [`GPT-OSS 120B`](https://huggingface.co/openai/gpt-oss-120b)
-  > Fine-Tuning, DAPT and Inference Available

 - **LLaMA 3**
-
   - [`LLaMA 3.1 8B - Instruct`](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct)
   - [`LLaMA 3.1 70B - Instruct`](https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct)
   - [`LLaMA 3.3 70B - Instruct`](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct)
   - [`LLaMA 3.3 405B - Instruct`](https://huggingface.co/meta-llama/Llama-3.1-405B-Instruct)
-  > Fine-Tuning, DAPT and Inference Available

 - **LLaMA 4**
-
   - [`LLaMA 4 Scout - Instruct`](https://huggingface.co/meta-llama/Llama-4-Scout-17B-16E-Instruct)
-  > DAPT and Inference Available

 - **Gemma**
-
   - [`Gemma 3 27B - Instruct`](https://huggingface.co/google/gemma-3-27b-it)
-  > DAPT and Inference Available

 - **MedGemma**
-  - [`MedGemma 27B
-  > Fine-Tuning, DAPT and Inference Available
+  - [`MedGemma 27B - Instruct`](https://huggingface.co/google/medgemma-27b-it)

 > Other architectures based on those **may** function correctly.

@@ -101,22 +86,22 @@ This section presents a bit of what you can do with the framework.
 You can load as many models as your hardware allows (H100 GPU recommended)...

 ```python
-from llmflowstack import
+from llmflowstack import GptOss, Llama3

-# Loading a
-first_model =
+# Loading a Llama model
+first_model = Llama3()
 first_model.load_checkpoint(
   checkpoint="/llama-3.1-8b-Instruct",
 )

-# Loading a quantized
-second_model =
+# Loading a quantized Llama model
+second_model = Llama3(
   checkpoint="/llama-3.3-70b-Instruct",
   quantization="4bit"
 )

 # Loading a GPT-OSS, quantized and with seed
-thrid_model =
+thrid_model = GptOss(
   checkpoint="/gpt-oss-20b",
   quantization=True,
   seed=1234
@@ -126,32 +111,31 @@ thrid_model = GPT_OSS(
 ### Inference Examples

 ```python
-> from llmflowstack import
+> from llmflowstack import GptOss, GenerationParams

-> gpt_oss_model =
+> gpt_oss_model = GptOss(checkpoint="/gpt-oss-120b")

 > gpt_oss_model.generate("Tell me a joke!")
 'Why did the scarecrow become a successful motivational speaker? Because he was outstanding **in** his field! 🌾😄'

 # Exclusive for GPT-OSS
-> gpt_oss_model.set_reasoning_level("High")
+> gpt_oss_model.set_reasoning_level("High") # Low, Medium, High, Off

 > custom_input = gpt_oss_model.build_input(
   input_text="Tell me another joke!",
   developer_message="You are a clown and after every joke, you should say 'HONK HONK'"
 )
 > gpt_oss_model.generate(
-
+  data=custom_input,
   params=GenerationParams(
+    mode="sample", # greedy, sample or beam
     max_new_tokens=1024,
-
-    temperature=0.3
-  )
+    temperature=0.3
   )
 )
 'Why did the scarecrow win an award? Because he was outstanding in his field! \n\nHONK HONK'

-> llama_model =
+> llama_model = Llama3(checkpoint="/llama-3.3-70B-Instruct", quantization="4bit")
 > llama_model.generate("Why is the sky blue?")
 'The sky appears blue because of a phenomenon called Rayleigh scattering, which is the scattering of light'

@@ -162,7 +146,7 @@ thrid_model = GPT_OSS(
 You can also generate tokens using a streamer, that is, receiving one token at a time by using the iterator version of the generate function:

 ```python
-llama_4 =
+llama_4 = Llama4(
   checkpoint="llama-4-scout-17b-16e-instruct"
 )

@@ -175,10 +159,10 @@ for text in it:
 ### Training Examples (DAPT & Fine-tune)

 ```python
-from llmflowstack import
+from llmflowstack import Llama3
 from llmflowstack.schemas import TrainParams

-model =
+model = Llama3(
   checkpoint="llama-3.1-8b-Instruct"
 )

@@ -186,28 +170,29 @@ model = LLaMA3(
 dataset = []
 dataset.append(model.build_input(
   input_text="Chico is a cat, which color he is?",
-
+  output_text="Black!"
 ))

 dataset.append(model.build_input(
   input_text="Fred is a dog, which color he is?",
-
+  output_text="White!"
 ))

 # Does the DAPT in the full model
-model.
-
+model.train(
+  train_data=dataset,
   params=TrainParams(
     batch_size=1,
     epochs=3,
     gradient_accumulation=1,
     lr=2e-5
-  )
+  ),
+  mode="DAPT"
 )

 # Does the fine-tune this time
-model.
-
+model.train(
+  train_data=dataset,
   params=TrainParams(
     batch_size=1,
     gradient_accumulation=1,
@@ -216,7 +201,8 @@ model.fine_tune(
   ),
   save_at_end=True,
   # It will save the model
-  save_path="./output"
+  save_path="./output",
+  mode="FT"
 )

 # Saving the final result
@@ -224,88 +210,3 @@ model.save_checkpoint(
   path="./model-output"
 )
 ```
-
-### RAG Pipeline
-
-A prototype of a RAG pipeline is also available. You can instantiate and use it as follows:
-
-```python
-from llmflowstack import VectorDatabase
-
-vector_db = VectorDatabase(
-  checkpoint="jina-embeddings-v4",
-  chunk_size=1000,
-  chunk_overlap=200
-)
-
-# Create or load an existing collection
-vector_db.get_collection(
-  collection_name="memory_rag",
-  persist_directory="./memory"
-)
-
-vector_db.get_collection(
-  collection_name="files_rag",
-  persist_directory="./files"
-)
-
-# You may also omit the persist directory; in this case, the RAG data will be stored in memory
-vector_db.get_collection(
-  collection_name="files_rag"
-)
-
-# To create a new document in a collection
-vector_db.create(
-  collection_name="memory_rag",
-  information="User loves Pizza!", # Main information to be indexed in the vector database
-  other_info={"category": "food"},
-  can_split=False, # Indicates whether the information can be split into chunks
-  should_index=True # Defaults to True — defines whether the document should be indexed or only returned as a Document instance
-)
-
-# After adding documents, you can query the database
-query_result = vector_db.query(
-  collection_name="memory_rag",
-  query="pizza",
-  filter={"category": "food"},
-  k=3 # Number of chunks to retrieve
-)
-
-print(query_result)
-# > "User loves Pizza!"
-```
-
-### NLP Evaluation
-
-> **Disclaimer**
-> These evaluation functions are designed for batch processing. Models and encoders are loaded internally on each call, which may be inefficient for per-sample or streaming evaluation.
-
-```python
-> from llmflowstack import text_evaluation
-> from llmflowstack.utils import (bert_score_evaluation, bleu_score_evaluation, cosine_similarity_evaluation, rouge_evaluation)
-
-# Predictions from some model
-> predictions = ["Chico is a dog, and he is orange!", "Fred is a cat, and he is white!"]
-# References text (ground truth)
-> references = ["Chico is a cat, and he is black!", "Fred is a dog, and he is white!"]
-
-# BERT Score Evaluation
-> bert_score_evaluation(predictions, references)
-{'bertscore_precision': 0.9773, 'bertscore_recall': 0.9773, 'bertscore_f1': 0.9773}
-
-# Bleu Score Evaluation
-> bleu_score_evaluation(predictions, references)
-{'bleu_score': 0.3656}
-
-# Cosine Similarity Evaluation
-> cosine_similarity_evaluation(predictions, references)
-{'cosine_similarity': 0.7443}
-
-# Rouge Score Evaluation
-> rouge_evaluation(predictions, references)
-{'rouge1': 0.8125, 'rouge2': 0.6429, 'rougeL': 0.8125}
-
-# All-in-one function
-> text_evaluation(predictions, references)
-{'bertscore_precision': 0.9773, 'bertscore_recall': 0.9773, 'bertscore_f1': 0.9773, 'bleu_score': 0.3656, 'cosine_similarity': 0.7443, 'rouge1': 0.8125, 'rouge2': 0.6429, 'rougeL': 0.8125}
-```
{llmflowstack-1.2.6 → llmflowstack-1.3.0}/README.md

@@ -17,32 +17,23 @@ The goal is to make experimentation with LLMs more accessible, without the need
 This framework is designed to provide flexibility when working with different open-source and commercial LLMs. Currently, the following models are supported:

 - **GPT-OSS**
-
   - [`GPT-OSS 20B`](https://huggingface.co/openai/gpt-oss-20b)
   - [`GPT-OSS 120B`](https://huggingface.co/openai/gpt-oss-120b)
-  > Fine-Tuning, DAPT and Inference Available

 - **LLaMA 3**
-
   - [`LLaMA 3.1 8B - Instruct`](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct)
   - [`LLaMA 3.1 70B - Instruct`](https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct)
   - [`LLaMA 3.3 70B - Instruct`](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct)
   - [`LLaMA 3.3 405B - Instruct`](https://huggingface.co/meta-llama/Llama-3.1-405B-Instruct)
-  > Fine-Tuning, DAPT and Inference Available

 - **LLaMA 4**
-
   - [`LLaMA 4 Scout - Instruct`](https://huggingface.co/meta-llama/Llama-4-Scout-17B-16E-Instruct)
-  > DAPT and Inference Available

 - **Gemma**
-
   - [`Gemma 3 27B - Instruct`](https://huggingface.co/google/gemma-3-27b-it)
-  > DAPT and Inference Available

 - **MedGemma**
-  - [`MedGemma 27B
-  > Fine-Tuning, DAPT and Inference Available
+  - [`MedGemma 27B - Instruct`](https://huggingface.co/google/medgemma-27b-it)

 > Other architectures based on those **may** function correctly.

@@ -65,22 +56,22 @@ This section presents a bit of what you can do with the framework.
 You can load as many models as your hardware allows (H100 GPU recommended)...

 ```python
-from llmflowstack import
+from llmflowstack import GptOss, Llama3

-# Loading a
-first_model =
+# Loading a Llama model
+first_model = Llama3()
 first_model.load_checkpoint(
   checkpoint="/llama-3.1-8b-Instruct",
 )

-# Loading a quantized
-second_model =
+# Loading a quantized Llama model
+second_model = Llama3(
   checkpoint="/llama-3.3-70b-Instruct",
   quantization="4bit"
 )

 # Loading a GPT-OSS, quantized and with seed
-thrid_model =
+thrid_model = GptOss(
   checkpoint="/gpt-oss-20b",
   quantization=True,
   seed=1234
@@ -90,32 +81,31 @@ thrid_model = GPT_OSS(
 ### Inference Examples

 ```python
-> from llmflowstack import
+> from llmflowstack import GptOss, GenerationParams

-> gpt_oss_model =
+> gpt_oss_model = GptOss(checkpoint="/gpt-oss-120b")

 > gpt_oss_model.generate("Tell me a joke!")
 'Why did the scarecrow become a successful motivational speaker? Because he was outstanding **in** his field! 🌾😄'

 # Exclusive for GPT-OSS
-> gpt_oss_model.set_reasoning_level("High")
+> gpt_oss_model.set_reasoning_level("High") # Low, Medium, High, Off

 > custom_input = gpt_oss_model.build_input(
   input_text="Tell me another joke!",
   developer_message="You are a clown and after every joke, you should say 'HONK HONK'"
 )
 > gpt_oss_model.generate(
-
+  data=custom_input,
   params=GenerationParams(
+    mode="sample", # greedy, sample or beam
     max_new_tokens=1024,
-
-    temperature=0.3
-  )
+    temperature=0.3
   )
 )
 'Why did the scarecrow win an award? Because he was outstanding in his field! \n\nHONK HONK'

-> llama_model =
+> llama_model = Llama3(checkpoint="/llama-3.3-70B-Instruct", quantization="4bit")
 > llama_model.generate("Why is the sky blue?")
 'The sky appears blue because of a phenomenon called Rayleigh scattering, which is the scattering of light'

@@ -126,7 +116,7 @@ thrid_model = GPT_OSS(
 You can also generate tokens using a streamer, that is, receiving one token at a time by using the iterator version of the generate function:

 ```python
-llama_4 =
+llama_4 = Llama4(
   checkpoint="llama-4-scout-17b-16e-instruct"
 )

@@ -139,10 +129,10 @@ for text in it:
 ### Training Examples (DAPT & Fine-tune)

 ```python
-from llmflowstack import
+from llmflowstack import Llama3
 from llmflowstack.schemas import TrainParams

-model =
+model = Llama3(
   checkpoint="llama-3.1-8b-Instruct"
 )

@@ -150,28 +140,29 @@ model = LLaMA3(
 dataset = []
 dataset.append(model.build_input(
   input_text="Chico is a cat, which color he is?",
-
+  output_text="Black!"
 ))

 dataset.append(model.build_input(
   input_text="Fred is a dog, which color he is?",
-
+  output_text="White!"
 ))

 # Does the DAPT in the full model
-model.
-
+model.train(
+  train_data=dataset,
   params=TrainParams(
     batch_size=1,
     epochs=3,
     gradient_accumulation=1,
     lr=2e-5
-  )
+  ),
+  mode="DAPT"
 )

 # Does the fine-tune this time
-model.
-
+model.train(
+  train_data=dataset,
   params=TrainParams(
     batch_size=1,
     gradient_accumulation=1,
@@ -180,7 +171,8 @@ model.fine_tune(
   ),
   save_at_end=True,
   # It will save the model
-  save_path="./output"
+  save_path="./output",
+  mode="FT"
 )

 # Saving the final result
@@ -188,88 +180,3 @@ model.save_checkpoint(
   path="./model-output"
 )
 ```
-
-### RAG Pipeline
-
-A prototype of a RAG pipeline is also available. You can instantiate and use it as follows:
-
-```python
-from llmflowstack import VectorDatabase
-
-vector_db = VectorDatabase(
-  checkpoint="jina-embeddings-v4",
-  chunk_size=1000,
-  chunk_overlap=200
-)
-
-# Create or load an existing collection
-vector_db.get_collection(
-  collection_name="memory_rag",
-  persist_directory="./memory"
-)
-
-vector_db.get_collection(
-  collection_name="files_rag",
-  persist_directory="./files"
-)
-
-# You may also omit the persist directory; in this case, the RAG data will be stored in memory
-vector_db.get_collection(
-  collection_name="files_rag"
-)
-
-# To create a new document in a collection
-vector_db.create(
-  collection_name="memory_rag",
-  information="User loves Pizza!", # Main information to be indexed in the vector database
-  other_info={"category": "food"},
-  can_split=False, # Indicates whether the information can be split into chunks
-  should_index=True # Defaults to True — defines whether the document should be indexed or only returned as a Document instance
-)
-
-# After adding documents, you can query the database
-query_result = vector_db.query(
-  collection_name="memory_rag",
-  query="pizza",
-  filter={"category": "food"},
-  k=3 # Number of chunks to retrieve
-)
-
-print(query_result)
-# > "User loves Pizza!"
-```
-
-### NLP Evaluation
-
-> **Disclaimer**
-> These evaluation functions are designed for batch processing. Models and encoders are loaded internally on each call, which may be inefficient for per-sample or streaming evaluation.
-
-```python
-> from llmflowstack import text_evaluation
-> from llmflowstack.utils import (bert_score_evaluation, bleu_score_evaluation, cosine_similarity_evaluation, rouge_evaluation)
-
-# Predictions from some model
-> predictions = ["Chico is a dog, and he is orange!", "Fred is a cat, and he is white!"]
-# References text (ground truth)
-> references = ["Chico is a cat, and he is black!", "Fred is a dog, and he is white!"]
-
-# BERT Score Evaluation
-> bert_score_evaluation(predictions, references)
-{'bertscore_precision': 0.9773, 'bertscore_recall': 0.9773, 'bertscore_f1': 0.9773}
-
-# Bleu Score Evaluation
-> bleu_score_evaluation(predictions, references)
-{'bleu_score': 0.3656}
-
-# Cosine Similarity Evaluation
-> cosine_similarity_evaluation(predictions, references)
-{'cosine_similarity': 0.7443}
-
-# Rouge Score Evaluation
-> rouge_evaluation(predictions, references)
-{'rouge1': 0.8125, 'rouge2': 0.6429, 'rougeL': 0.8125}
-
-# All-in-one function
-> text_evaluation(predictions, references)
-{'bertscore_precision': 0.9773, 'bertscore_recall': 0.9773, 'bertscore_f1': 0.9773, 'bleu_score': 0.3656, 'cosine_similarity': 0.7443, 'rouge1': 0.8125, 'rouge2': 0.6429, 'rougeL': 0.8125}
-```
llmflowstack-1.3.0/llmflowstack/__init__.py (new file)

@@ -0,0 +1,27 @@
+from .decoders.gemma_3 import Gemma3
+from .decoders.gpt_2 import Gpt2
+from .decoders.gpt_oss import GptOss
+from .decoders.llama_3 import Llama3
+from .decoders.llama_4 import Llama4
+from .decoders.medgemma import MedGemma
+#from .decoders.qwen_3 import Qwen3
+from .rag.VectorDatabase import VectorDatabase
+from .schemas.params import GenerationParams, TrainParams
+from .utils.evaluation_methods import text_evaluation
+
+__all__ = [
+  "Gemma3",
+  "Gpt2",
+  "GptOss",
+  "Llama3",
+  "Llama4",
+  "MedGemma",
+  # "Qwen3",
+
+  "VectorDatabase",
+
+  "GenerationParams",
+  "TrainParams",
+
+  "text_evaluation"
+]