mmgp 3.0.9.tar.gz → 3.1.1.tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Potentially problematic release.
- {mmgp-3.0.9/src/mmgp.egg-info → mmgp-3.1.1}/PKG-INFO +3 -3
- {mmgp-3.0.9 → mmgp-3.1.1}/README.md +2 -2
- {mmgp-3.0.9 → mmgp-3.1.1}/pyproject.toml +1 -1
- {mmgp-3.0.9 → mmgp-3.1.1}/src/mmgp/offload.py +697 -583
- {mmgp-3.0.9 → mmgp-3.1.1}/src/mmgp/safetensors2.py +40 -30
- {mmgp-3.0.9 → mmgp-3.1.1/src/mmgp.egg-info}/PKG-INFO +3 -3
- {mmgp-3.0.9 → mmgp-3.1.1}/LICENSE.md +0 -0
- {mmgp-3.0.9 → mmgp-3.1.1}/setup.cfg +0 -0
- {mmgp-3.0.9 → mmgp-3.1.1}/src/__init__.py +0 -0
- {mmgp-3.0.9 → mmgp-3.1.1}/src/mmgp/__init__.py +0 -0
- {mmgp-3.0.9 → mmgp-3.1.1}/src/mmgp.egg-info/SOURCES.txt +0 -0
- {mmgp-3.0.9 → mmgp-3.1.1}/src/mmgp.egg-info/dependency_links.txt +0 -0
- {mmgp-3.0.9 → mmgp-3.1.1}/src/mmgp.egg-info/requires.txt +0 -0
- {mmgp-3.0.9 → mmgp-3.1.1}/src/mmgp.egg-info/top_level.txt +0 -0
PKG-INFO:

@@ -1,6 +1,6 @@
 Metadata-Version: 2.2
 Name: mmgp
-Version: 3.0.9
+Version: 3.1.1
 Summary: Memory Management for the GPU Poor
 Author-email: deepbeepmeep <deepbeepmeep@yahoo.com>
 License: GNU GENERAL PUBLIC LICENSE
@@ -17,7 +17,7 @@ Requires-Dist: peft


 <p align="center">
-<H2>Memory Management 3.0 for the GPU Poor by DeepBeepMeep</H2>
+<H2>Memory Management 3.1.0 for the GPU Poor by DeepBeepMeep</H2>
 </p>


@@ -100,7 +100,7 @@ For example:
 The smaller this number, the more VRAM left for image data / longer video but also the slower because there will be lots of loading / unloading between the RAM and the VRAM. If model is too big to fit in a budget, it will be broken down in multiples parts that will be unloaded / loaded consequently. The speed of low budget can be increased (up to 2 times) by turning on the options pinnedMemory and asyncTransfers.
 - asyncTransfers: boolean, load to the GPU the next model part while the current part is being processed. This requires twice the budget if any is defined. This may increase speed by 20% (mostly visible on fast modern GPUs).
 - verboseLevel: number between 0 and 2 (1 by default), provides various level of feedback of the different processes
-- compile: list of model ids to compile, may accelerate up x2 depending on the type of GPU. As of 01/01/2025 it will work only on Linux or WSL since compilation relies on Triton which is not yet supported on Windows
+- compile: list of model ids to compile, may accelerate up x2 depending on the type of GPU. It makes sens to compile only the model that is frequently used such as the "transformer" model in the case of video or image generation. As of 01/01/2025 it will work only on Linux or WSL since compilation relies on Triton which is not yet supported on Windows

 If you are short on RAM and plan to work with quantized models, it is recommended to load pre-quantized models direclty rather than using on the fly quantization, it will be faster and consume slightly less RAM.

README.md:

@@ -1,6 +1,6 @@


 <p align="center">
-<H2>Memory Management 3.0 for the GPU Poor by DeepBeepMeep</H2>
+<H2>Memory Management 3.1.0 for the GPU Poor by DeepBeepMeep</H2>
 </p>


@@ -83,7 +83,7 @@ For example:
 The smaller this number, the more VRAM left for image data / longer video but also the slower because there will be lots of loading / unloading between the RAM and the VRAM. If model is too big to fit in a budget, it will be broken down in multiples parts that will be unloaded / loaded consequently. The speed of low budget can be increased (up to 2 times) by turning on the options pinnedMemory and asyncTransfers.
 - asyncTransfers: boolean, load to the GPU the next model part while the current part is being processed. This requires twice the budget if any is defined. This may increase speed by 20% (mostly visible on fast modern GPUs).
 - verboseLevel: number between 0 and 2 (1 by default), provides various level of feedback of the different processes
-- compile: list of model ids to compile, may accelerate up x2 depending on the type of GPU. As of 01/01/2025 it will work only on Linux or WSL since compilation relies on Triton which is not yet supported on Windows
+- compile: list of model ids to compile, may accelerate up x2 depending on the type of GPU. It makes sens to compile only the model that is frequently used such as the "transformer" model in the case of video or image generation. As of 01/01/2025 it will work only on Linux or WSL since compilation relies on Triton which is not yet supported on Windows

 If you are short on RAM and plan to work with quantized models, it is recommended to load pre-quantized models direclty rather than using on the fly quantization, it will be faster and consume slightly less RAM.

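For readers trying out the changed options, here is a minimal usage sketch. The keyword names pinnedMemory, asyncTransfers, verboseLevel and compile come from the README excerpt above; the offload.all entry point, the budgets argument and the model id are assumptions and may differ in 3.1.1.

```python
# Hypothetical sketch only: option names are taken from the README excerpt in
# this diff; offload.all, the budgets argument and the model id are assumptions.
import torch
from diffusers import DiffusionPipeline
from mmgp import offload

# Build the pipeline on CPU first; mmgp moves its parts to the GPU on demand.
pipe = DiffusionPipeline.from_pretrained(
    "placeholder/video-model",      # placeholder id, not a real checkpoint
    torch_dtype=torch.bfloat16,
)

offload.all(
    pipe,
    pinnedMemory=True,              # pin RAM pages to speed up RAM <-> VRAM transfers
    asyncTransfers=True,            # prefetch the next model part while the current one computes
    budgets={"transformer": 3000},  # assumed: per-model VRAM budget in MB (smaller = less VRAM, slower)
    compile=["transformer"],        # compile only the frequently used model (Linux / WSL + Triton only)
    verboseLevel=1,                 # 0-2, amount of progress feedback (1 by default)
)
```

The compile=["transformer"] choice mirrors the new README wording: only the model that runs on every step is worth compiling, and as of 01/01/2025 compilation requires Triton, so it works only on Linux or WSL.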