mmgp 1.0.5__tar.gz → 1.1.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -1,6 +1,6 @@
 Metadata-Version: 2.1
 Name: mmgp
-Version: 1.0.5
+Version: 1.1.0
 Summary: Memory Management for the GPU Poor
 Author-email: deepbeepmeep <deepbeepmeep@yahoo.com>
 License: GNU GENERAL PUBLIC LICENSE
@@ -683,7 +683,11 @@ License-File: LICENSE.md
 Requires-Dist: torch>=2.1.0
 Requires-Dist: optimum-quanto
 
-**------------------ Memory Management for the GPU Poor by DeepBeepMeep ------------------**
+
+<p align="center">
+<H2>Memory Management for the GPU Poor by DeepBeepMeep</H2>
+</p>
+
 
 This module contains multiples optimisations so that models such as Flux (and derived), Mochi, CogView, HunyuanVideo, ... can run smoothly on a 24 GB GPU limited card.
 This a replacement for the accelerate library that should in theory manage offloading, but doesn't work properly with models that are loaded / unloaded several
@@ -693,31 +697,50 @@ Requirements:
 - GPU: RTX 3090/ RTX 4090 (24 GB of VRAM)
 - RAM: minimum 48 GB, recommended 64 GB
 
+First you need to install the module in your current project with:
+```shell
+pip install mmgp
+```
+
 It is almost plug and play and just needs to be invoked from the main app just after the model pipeline has been created.
-1) First make sure that the pipeline explictly loads the models in the CPU device
-for instance: pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16).to("cpu")
+1) First make sure that the pipeline explictly loads the models in the CPU device, for instance:
+```
+pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16).to("cpu")
+```
+
 2) Once every potential Lora has been loaded and merged, add the following lines:
 
-*from mmgp import offload*
-*offload.me(pipe)*
-
-The 'transformer' model that contains usually the video or image generator is quantized on the fly by default to 8 bits. If you want to save time on disk and reduce the loading time, you may want to load directly a prequantized model. In that case you need to set the option *quantizeTransformer* to *False* to turn off on the fly quantization.
+```
+from mmgp import offload
+offload.me(pipe)
+```
+The 'transformer' model in the pipe contains usually the video or image generator is quantized on the fly by default to 8 bits. If you want to save time on disk and reduce the loading time, you may want to load directly a prequantized model. In that case you need to set the option *quantizeTransformer* to *False* to turn off on the fly quantization.
 
 If you have more than 64GB RAM you may want to enable RAM pinning with the option *pinInRAM = True*. You will get in return super fast loading / unloading of models
 (this can save significant time if the same pipeline is run multiple times in a row)
 
-Sometime there isn't an explicit pipe object as each submodel is loaded separately in the main app. If this is the case, you need to create a dictionary that manually maps all the models.
-
+Sometime there isn't an explicit pipe object as each submodel is loaded separately in the main app. If this is the case, you need to create a dictionary that manually maps all the models.\
 For instance :
-for flux derived models: *pipe = { "text_encoder": clip, "text_encoder_2": t5, "transformer": model, "vae":ae }*
-for mochi: *pipe = { "text_encoder": self.text_encoder, "transformer": self.dit, "vae":self.decoder }*
+
+
+- for flux derived models:
+```
+pipe = { "text_encoder": clip, "text_encoder_2": t5, "transformer": model, "vae":ae }
+```
+- for mochi:
+```
+pipe = { "text_encoder": self.text_encoder, "transformer": self.dit, "vae":self.decoder }
+```
+
 
 Please note that there should be always one model whose Id is 'transformer'. It corresponds to the main image / video model which usually needs to be quantized (this is done on the fly by default when loading the model).
 
 Becareful, lots of models use the T5 XXL as a text encoder. However, quite often their corresponding pipeline configurations point at the official Google T5 XXL repository
 where there is a huge 40GB model to download and load. It is cumbersorme as it is a 32 bits model and contains the decoder part of T5 that is not used.
 I suggest you use instead one of the 16 bits encoder only version available around, for instance:
-*text_encoder_2 = T5EncoderModel.from_pretrained("black-forest-labs/FLUX.1-dev", subfolder="text_encoder_2", torch_dtype=torch.float16)*
+```
+text_encoder_2 = T5EncoderModel.from_pretrained("black-forest-labs/FLUX.1-dev", subfolder="text_encoder_2", torch_dtype=torch.float16)
+```
 
 Sometime just providing the pipe won't be sufficient as you will need to change the content of the core model:
 - For instance you may need to disable an existing CPU offload logic that already exists (such as manual calls to move tensors between cuda and the cpu)
@@ -727,6 +750,6 @@ You are free to use my module for non commercial use as long you give me proper
 
 Thanks to
 ---------
-Huggingface / accelerate for the hooking examples
-Huggingface / quanto for their very useful quantizer
-gau-nernst for his Pinnig RAM samples
+- Huggingface / accelerate for the hooking examples
+- Huggingface / quanto for their very useful quantizer
+- gau-nernst for his Pinnig RAM samples
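The README changes above say that, when there is no explicit pipe object, the app must build a dictionary mapping submodels by id, with exactly one entry keyed `"transformer"`. A minimal runnable sketch of that contract (the `_Stub` placeholders are illustrative stand-ins for real models, and `offload.me` itself is not invoked here):

```python
# Hypothetical stand-ins for the real submodels; the README only
# requires that the dict contains an entry whose id is "transformer".
class _Stub:
    pass

clip, t5, model, ae = _Stub(), _Stub(), _Stub(), _Stub()

# Manual mapping for Flux-derived models, as shown in the diff above.
pipe = {"text_encoder": clip, "text_encoder_2": t5, "transformer": model, "vae": ae}

# mmgp expects one main image/video model keyed "transformer".
assert "transformer" in pipe
print(sorted(pipe))  # -> ['text_encoder', 'text_encoder_2', 'transformer', 'vae']
```

A dict built this way would then be passed to `offload.me(pipe)` in place of a diffusers pipeline.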
@@ -1,4 +1,8 @@
-**------------------ Memory Management for the GPU Poor by DeepBeepMeep ------------------**
+
+<p align="center">
+<H2>Memory Management for the GPU Poor by DeepBeepMeep</H2>
+</p>
+
 
 This module contains multiples optimisations so that models such as Flux (and derived), Mochi, CogView, HunyuanVideo, ... can run smoothly on a 24 GB GPU limited card.
 This a replacement for the accelerate library that should in theory manage offloading, but doesn't work properly with models that are loaded / unloaded several
@@ -8,31 +12,50 @@ Requirements:
 - GPU: RTX 3090/ RTX 4090 (24 GB of VRAM)
 - RAM: minimum 48 GB, recommended 64 GB
 
+First you need to install the module in your current project with:
+```shell
+pip install mmgp
+```
+
 It is almost plug and play and just needs to be invoked from the main app just after the model pipeline has been created.
-1) First make sure that the pipeline explictly loads the models in the CPU device
-for instance: pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16).to("cpu")
+1) First make sure that the pipeline explictly loads the models in the CPU device, for instance:
+```
+pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16).to("cpu")
+```
+
 2) Once every potential Lora has been loaded and merged, add the following lines:
 
-*from mmgp import offload*
-*offload.me(pipe)*
-
-The 'transformer' model that contains usually the video or image generator is quantized on the fly by default to 8 bits. If you want to save time on disk and reduce the loading time, you may want to load directly a prequantized model. In that case you need to set the option *quantizeTransformer* to *False* to turn off on the fly quantization.
+```
+from mmgp import offload
+offload.me(pipe)
+```
+The 'transformer' model in the pipe contains usually the video or image generator is quantized on the fly by default to 8 bits. If you want to save time on disk and reduce the loading time, you may want to load directly a prequantized model. In that case you need to set the option *quantizeTransformer* to *False* to turn off on the fly quantization.
 
 If you have more than 64GB RAM you may want to enable RAM pinning with the option *pinInRAM = True*. You will get in return super fast loading / unloading of models
 (this can save significant time if the same pipeline is run multiple times in a row)
 
-Sometime there isn't an explicit pipe object as each submodel is loaded separately in the main app. If this is the case, you need to create a dictionary that manually maps all the models.
-
+Sometime there isn't an explicit pipe object as each submodel is loaded separately in the main app. If this is the case, you need to create a dictionary that manually maps all the models.\
 For instance :
-for flux derived models: *pipe = { "text_encoder": clip, "text_encoder_2": t5, "transformer": model, "vae":ae }*
-for mochi: *pipe = { "text_encoder": self.text_encoder, "transformer": self.dit, "vae":self.decoder }*
+
+
+- for flux derived models:
+```
+pipe = { "text_encoder": clip, "text_encoder_2": t5, "transformer": model, "vae":ae }
+```
+- for mochi:
+```
+pipe = { "text_encoder": self.text_encoder, "transformer": self.dit, "vae":self.decoder }
+```
+
 
 Please note that there should be always one model whose Id is 'transformer'. It corresponds to the main image / video model which usually needs to be quantized (this is done on the fly by default when loading the model).
 
 Becareful, lots of models use the T5 XXL as a text encoder. However, quite often their corresponding pipeline configurations point at the official Google T5 XXL repository
 where there is a huge 40GB model to download and load. It is cumbersorme as it is a 32 bits model and contains the decoder part of T5 that is not used.
 I suggest you use instead one of the 16 bits encoder only version available around, for instance:
-*text_encoder_2 = T5EncoderModel.from_pretrained("black-forest-labs/FLUX.1-dev", subfolder="text_encoder_2", torch_dtype=torch.float16)*
+```
+text_encoder_2 = T5EncoderModel.from_pretrained("black-forest-labs/FLUX.1-dev", subfolder="text_encoder_2", torch_dtype=torch.float16)
+```
 
 Sometime just providing the pipe won't be sufficient as you will need to change the content of the core model:
 - For instance you may need to disable an existing CPU offload logic that already exists (such as manual calls to move tensors between cuda and the cpu)
@@ -42,6 +65,6 @@ You are free to use my module for non commercial use as long you give me proper
 
 Thanks to
 ---------
-Huggingface / accelerate for the hooking examples
-Huggingface / quanto for their very useful quantizer
-gau-nernst for his Pinnig RAM samples
+- Huggingface / accelerate for the hooking examples
+- Huggingface / quanto for their very useful quantizer
+- gau-nernst for his Pinnig RAM samples
@@ -0,0 +1,15 @@
+[project]
+name = "mmgp"
+version = "1.1.0"
+authors = [
+  { name = "deepbeepmeep", email = "deepbeepmeep@yahoo.com" },
+]
+description = "Memory Management for the GPU Poor"
+readme = "README.md"
+requires-python = ">=3.10"
+license = { file = "LICENSE.md" }
+dependencies = [
+  "torch >= 2.1.0",
+  "optimum-quanto",
+]
+
@@ -1,6 +1,6 @@
 Metadata-Version: 2.1
 Name: mmgp
-Version: 1.0.5
+Version: 1.1.0
 Summary: Memory Management for the GPU Poor
 Author-email: deepbeepmeep <deepbeepmeep@yahoo.com>
 License: GNU GENERAL PUBLIC LICENSE
@@ -683,7 +683,11 @@ License-File: LICENSE.md
 Requires-Dist: torch>=2.1.0
 Requires-Dist: optimum-quanto
 
-**------------------ Memory Management for the GPU Poor by DeepBeepMeep ------------------**
+
+<p align="center">
+<H2>Memory Management for the GPU Poor by DeepBeepMeep</H2>
+</p>
+
 
 This module contains multiples optimisations so that models such as Flux (and derived), Mochi, CogView, HunyuanVideo, ... can run smoothly on a 24 GB GPU limited card.
 This a replacement for the accelerate library that should in theory manage offloading, but doesn't work properly with models that are loaded / unloaded several
@@ -693,31 +697,50 @@ Requirements:
 - GPU: RTX 3090/ RTX 4090 (24 GB of VRAM)
 - RAM: minimum 48 GB, recommended 64 GB
 
+First you need to install the module in your current project with:
+```shell
+pip install mmgp
+```
+
 It is almost plug and play and just needs to be invoked from the main app just after the model pipeline has been created.
-1) First make sure that the pipeline explictly loads the models in the CPU device
-for instance: pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16).to("cpu")
+1) First make sure that the pipeline explictly loads the models in the CPU device, for instance:
+```
+pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16).to("cpu")
+```
+
 2) Once every potential Lora has been loaded and merged, add the following lines:
 
-*from mmgp import offload*
-*offload.me(pipe)*
-
-The 'transformer' model that contains usually the video or image generator is quantized on the fly by default to 8 bits. If you want to save time on disk and reduce the loading time, you may want to load directly a prequantized model. In that case you need to set the option *quantizeTransformer* to *False* to turn off on the fly quantization.
+```
+from mmgp import offload
+offload.me(pipe)
+```
+The 'transformer' model in the pipe contains usually the video or image generator is quantized on the fly by default to 8 bits. If you want to save time on disk and reduce the loading time, you may want to load directly a prequantized model. In that case you need to set the option *quantizeTransformer* to *False* to turn off on the fly quantization.
 
 If you have more than 64GB RAM you may want to enable RAM pinning with the option *pinInRAM = True*. You will get in return super fast loading / unloading of models
 (this can save significant time if the same pipeline is run multiple times in a row)
 
-Sometime there isn't an explicit pipe object as each submodel is loaded separately in the main app. If this is the case, you need to create a dictionary that manually maps all the models.
-
+Sometime there isn't an explicit pipe object as each submodel is loaded separately in the main app. If this is the case, you need to create a dictionary that manually maps all the models.\
 For instance :
-for flux derived models: *pipe = { "text_encoder": clip, "text_encoder_2": t5, "transformer": model, "vae":ae }*
-for mochi: *pipe = { "text_encoder": self.text_encoder, "transformer": self.dit, "vae":self.decoder }*
+
+
+- for flux derived models:
+```
+pipe = { "text_encoder": clip, "text_encoder_2": t5, "transformer": model, "vae":ae }
+```
+- for mochi:
+```
+pipe = { "text_encoder": self.text_encoder, "transformer": self.dit, "vae":self.decoder }
+```
+
 
 Please note that there should be always one model whose Id is 'transformer'. It corresponds to the main image / video model which usually needs to be quantized (this is done on the fly by default when loading the model).
 
 Becareful, lots of models use the T5 XXL as a text encoder. However, quite often their corresponding pipeline configurations point at the official Google T5 XXL repository
 where there is a huge 40GB model to download and load. It is cumbersorme as it is a 32 bits model and contains the decoder part of T5 that is not used.
 I suggest you use instead one of the 16 bits encoder only version available around, for instance:
-*text_encoder_2 = T5EncoderModel.from_pretrained("black-forest-labs/FLUX.1-dev", subfolder="text_encoder_2", torch_dtype=torch.float16)*
+```
+text_encoder_2 = T5EncoderModel.from_pretrained("black-forest-labs/FLUX.1-dev", subfolder="text_encoder_2", torch_dtype=torch.float16)
+```
 
 Sometime just providing the pipe won't be sufficient as you will need to change the content of the core model:
 - For instance you may need to disable an existing CPU offload logic that already exists (such as manual calls to move tensors between cuda and the cpu)
@@ -727,6 +750,6 @@ You are free to use my module for non commercial use as long you give me proper
 
 Thanks to
 ---------
-Huggingface / accelerate for the hooking examples
-Huggingface / quanto for their very useful quantizer
-gau-nernst for his Pinnig RAM samples
+- Huggingface / accelerate for the hooking examples
+- Huggingface / quanto for their very useful quantizer
+- gau-nernst for his Pinnig RAM samples
@@ -2,7 +2,6 @@ LICENSE.md
 README.md
 pyproject.toml
 src/__init__.py
-src/_version.py
 src/mmgp.py
 src/mmgp.egg-info/PKG-INFO
 src/mmgp.egg-info/SOURCES.txt
@@ -1,3 +1,2 @@
 __init__
-_version
 mmgp
mmgp-1.0.5/pyproject.toml DELETED
@@ -1,74 +0,0 @@
1
- [project]
2
- name = "mmgp"
3
- authors = [
4
- { name = "deepbeepmeep", email = "deepbeepmeep@yahoo.com" },
5
- ]
6
- description = "Memory Management for the GPU Poor"
7
- readme = "README.md"
8
- requires-python = ">=3.10"
9
- license = { file = "LICENSE.md" }
10
- dynamic = ["version"]
11
- dependencies = [
12
- "torch >= 2.1.0",
13
- "optimum-quanto",
14
- ]
15
-
16
- [project.optional-dependencies]
17
-
18
-
19
- [build-system]
20
- build-backend = "setuptools.build_meta"
21
- requires = ["setuptools>=64", "wheel", "setuptools_scm>=8"]
22
-
23
- [tool.ruff]
24
- line-length = 110
25
- target-version = "py310"
26
- extend-exclude = ["/usr/lib/*"]
27
-
28
- [tool.ruff.lint]
29
- ignore = [
30
- "E501", # line too long - will be fixed in format
31
- ]
32
-
33
- [tool.ruff.format]
34
- quote-style = "double"
35
- indent-style = "space"
36
- line-ending = "auto"
37
- skip-magic-trailing-comma = false
38
- docstring-code-format = true
39
- exclude = [
40
- "src/_version.py", # generated by setuptools_scm
41
- ]
42
-
43
- [tool.ruff.lint.isort]
44
- combine-as-imports = true
45
- force-wrap-aliases = true
46
- known-local-folder = ["src"]
47
- known-first-party = ["mmgp"]
48
-
49
- [tool.pyright]
50
- include = ["src"]
51
- exclude = [
52
- "**/__pycache__", # cache directories
53
- "./typings", # generated type stubs
54
- ]
55
- stubPath = "./typings"
56
-
57
- [tool.tomlsort]
58
- in_place = true
59
- no_sort_tables = true
60
- spaces_before_inline_comment = 1
61
- spaces_indent_inline_array = 2
62
- trailing_comma_inline_array = true
63
- sort_first = [
64
- "project",
65
- "build-system",
66
- "tool.setuptools",
67
- ]
68
-
69
- # needs to be last for CI reasons
70
- [tool.setuptools_scm]
71
- write_to = "src/_version.py"
72
- parentdir_prefix_version = "mmgp-"
73
- fallback_version = "1.0.5"
74
- version_scheme = "post-release"
@@ -1,16 +0,0 @@
1
- # file generated by setuptools_scm
2
- # don't change, don't track in version control
3
- TYPE_CHECKING = False
4
- if TYPE_CHECKING:
5
- from typing import Tuple, Union
6
- VERSION_TUPLE = Tuple[Union[int, str], ...]
7
- else:
8
- VERSION_TUPLE = object
9
-
10
- version: str
11
- __version__: str
12
- __version_tuple__: VERSION_TUPLE
13
- version_tuple: VERSION_TUPLE
14
-
15
- __version__ = version = '1.0.5'
16
- __version_tuple__ = version_tuple = (1, 0, 5)
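With the setuptools_scm-generated `_version.py` deleted and the version pinned statically, the `__version_tuple__` that file used to provide can be derived from the version string itself. A minimal sketch under that assumption (the helper name is illustrative, not part of mmgp):

```python
def version_tuple(version: str) -> tuple:
    """Split a version string into an int/str tuple, mirroring the
    __version_tuple__ that the deleted _version.py carried."""
    return tuple(int(p) if p.isdigit() else p for p in version.split("."))

print(version_tuple("1.1.0"))  # -> (1, 1, 0)
```

Non-numeric segments (e.g. a pre-release suffix) are kept as strings, which matches the int/str union the generated file declared.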