lopace 0.1.0__py3-none-any.whl → 0.1.1__py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -1,13 +1,13 @@
1
1
  Metadata-Version: 2.4
2
2
  Name: lopace
3
- Version: 0.1.0
3
+ Version: 0.1.1
4
4
  Summary: Lossless Optimized Prompt Accurate Compression Engine
5
5
  Home-page: https://github.com/connectaman/LoPace
6
6
  Author: Aman Ulla
7
7
  License: MIT
8
- Project-URL: Homepage, https://github.com/amanulla/lopace
9
- Project-URL: Repository, https://github.com/amanulla/lopace
10
- Project-URL: Issues, https://github.com/amanulla/lopace/issues
8
+ Project-URL: Homepage, https://github.com/connectaman/LoPace
9
+ Project-URL: Repository, https://github.com/connectaman/LoPace
10
+ Project-URL: Issues, https://github.com/connectaman/LoPace/issues
11
11
  Keywords: prompt,compression,tokenization,zstd,bpe,nlp
12
12
  Classifier: Development Status :: 4 - Beta
13
13
  Classifier: Intended Audience :: Developers
@@ -229,17 +229,7 @@ Enumeration of available compression methods:
229
229
 
230
230
  ### Compression Pipeline (Hybrid Method)
231
231
 
232
- ```
233
- Input: Raw System Prompt String (100%)
234
-
235
- Tokenization: Convert to Tiktoken IDs (~70% reduced)
236
-
237
- Binary Packing: Convert IDs to uint16 (~50% of above)
238
-
239
- Zstd: Final compression (~30% further reduction)
240
-
241
- Output: Compressed Binary Blob
242
- ```
232
+ ![Compression Pipeline](screenshots/compression-pipeline.png)
243
233
 
244
234
  ### Why Hybrid is Best for Databases
245
235
 
@@ -257,6 +247,88 @@ Output: Compressed Binary Blob
257
247
  # Hybrid: 120 bytes (76% space saved) ← Best!
258
248
  ```
259
249
 
250
+ ## Benchmarks & Performance Analysis
251
+
252
+ Comprehensive benchmarks were conducted on 10 diverse prompts across three size categories (small, medium, and large) to evaluate LoPace's compression performance. The following visualizations present detailed analysis of compression metrics, storage efficiency, speed, and memory usage.
253
+
254
+ ### Compression Ratio Analysis
255
+
256
+ ![Compression Ratio](screenshots/compression_ratio.svg)
257
+
258
+ **Key Insights:**
259
+ - **Hybrid method consistently achieves the highest compression ratios** across all prompt sizes
260
+ - Compression effectiveness increases with prompt size, with large prompts showing 4-6x compression ratios
261
+ - Box plots show the distribution of compression ratios, demonstrating consistent performance
262
+ - Token-based compression provides moderate compression, while Zstd alone offers good baseline performance
263
+
264
+ ### Space Savings Performance
265
+
266
+ ![Space Savings](screenshots/space_savings.svg)
267
+
268
+ **Key Insights:**
269
+ - **Hybrid method achieves 70-80% space savings** on average across all prompt categories
270
+ - Space savings improve significantly with larger prompts (up to 85% for very large prompts)
271
+ - Error bars indicate consistent performance with low variance
272
+ - All three methods show substantial space reduction compared to uncompressed storage
273
+
274
+ ### Disk Size Comparison
275
+
276
+ ![Disk Size Comparison](screenshots/disk_size_comparison.svg)
277
+
278
+ **Key Insights:**
279
+ - **Dramatic reduction in storage requirements** - compressed data is 3-6x smaller than original
280
+ - Log-scale visualization shows the magnitude of space savings across different prompt sizes
281
+ - Hybrid method provides the best storage efficiency, especially for large prompts
282
+ - Size reduction percentage increases linearly with prompt complexity
283
+
284
+ ### Speed & Throughput Metrics
285
+
286
+ ![Speed Metrics](screenshots/speed_metrics.svg)
287
+
288
+ **Key Insights:**
289
+ - **Compression speeds range from 50-200 MB/s** depending on method and prompt size
290
+ - Decompression is consistently faster than compression (100-500 MB/s)
291
+ - Hybrid method maintains excellent throughput despite additional processing steps
292
+ - Processing time scales sub-linearly with prompt size, demonstrating efficient algorithms
293
+
294
+ ### Memory Usage Analysis
295
+
296
+ ![Memory Usage](screenshots/memory_usage.svg)
297
+
298
+ **Key Insights:**
299
+ - **Memory footprint is minimal** - typically under 10 MB even for large prompts
300
+ - Memory usage scales gracefully with input size
301
+ - Compression and decompression show similar memory requirements
302
+ - All methods demonstrate efficient memory utilization suitable for production environments
303
+
304
+ ### Comprehensive Method Comparison
305
+
306
+ ![Comprehensive Comparison](screenshots/comprehensive_comparison.svg)
307
+
308
+ **Key Insights:**
309
+ - **Heatmaps provide at-a-glance comparison** of all metrics across methods and prompt sizes
310
+ - Hybrid method consistently ranks highest in compression ratio and space savings
311
+ - Throughput remains competitive across all methods
312
+ - Memory usage is well-balanced, with no method showing excessive requirements
313
+
314
+ ### Scalability Analysis
315
+
316
+ ![Scalability Analysis](screenshots/scalability_analysis.svg)
317
+
318
+ **Key Insights:**
319
+ - **Performance scales efficiently** with prompt size across all metrics
320
+ - Compression ratio improves with larger inputs (better pattern recognition)
321
+ - Processing time increases sub-linearly, demonstrating algorithmic efficiency
322
+ - Memory usage grows modestly, making LoPace suitable for very large prompts
323
+
324
+ ### Key Findings Summary
325
+
326
+ 1. **Hybrid method is optimal** for maximum compression (70-80% space savings)
327
+ 2. **All methods are lossless** - 100% fidelity verified across all test cases
328
+ 3. **Speed is production-ready** - 50-200 MB/s compression throughput
329
+ 4. **Memory efficient** - Under 10 MB for typical use cases
330
+ 5. **Scales excellently** - Performance improves with larger prompts
331
+
260
332
  ## Running the Example
261
333
 
262
334
  ```bash
@@ -334,6 +406,8 @@ See [.github/workflows/README.md](.github/workflows/README.md) for detailed setu
334
406
 
335
407
  LoPace uses the following compression techniques:
336
408
 
409
+ ![Compression techniques](screenshots/lopace-compression-technique.png)
410
+
337
411
  1. **LZ77 (Sliding Window)**: Used **indirectly** through Zstandard
338
412
  - Zstandard internally uses LZ77-style algorithms to find repeated patterns
339
413
  - Instead of storing "assistant" again, it stores a tuple: (distance_back, length)
@@ -0,0 +1,11 @@
1
+ lopace/__init__.py,sha256=PYjZWZHhSITNgag9sF0qZ_yXgZaMa3R8_3FuasiH0Nc,351
2
+ lopace/compressor.py,sha256=nUTWDcAPYvQaeSFKx_lne-D2xIQ02IMVGE4yLODo8qE,19060
3
+ lopace-0.1.1.dist-info/licenses/LICENSE,sha256=uFUrlsfsOwx_8Nzhq2pUgNaJghcJxXBMML3l7T39Tm0,1067
4
+ scripts/__init__.py,sha256=XLq0VmLoEBfnWjzYmxb_JRzAIqwZDv-2s10TO692TLc,59
5
+ scripts/generate_visualizations.py,sha256=rQ-GNExRA5JxGzLEFLcI6mjpZkzf03QrUx7eqxpSxy4,37264
6
+ tests/__init__.py,sha256=yXNVJE20E2iHo0qbit5SgRE35eXWq89F1kkhNHy7VJA,31
7
+ tests/test_compressor.py,sha256=-vMztSzY89n5dpShcACrFboEQOlfJ6FxF7eQOEU3swM,8273
8
+ lopace-0.1.1.dist-info/METADATA,sha256=h6GPJtwOQn7O2sA9b7YPplJTlQzN4TFKK6G6Z7KWN3A,15847
9
+ lopace-0.1.1.dist-info/WHEEL,sha256=_zCd3N1l69ArxyTb8rzEoP9TpbYXkqRFSNOD5OuxnTs,91
10
+ lopace-0.1.1.dist-info/top_level.txt,sha256=k-gL-51ulMq50vhNS91c1eyGRNse0vs_PzS9VdAiYlw,21
11
+ lopace-0.1.1.dist-info/RECORD,,
@@ -1,2 +1,3 @@
1
1
  lopace
2
+ scripts
2
3
  tests
scripts/__init__.py ADDED
@@ -0,0 +1 @@
1
+ """Scripts for generating visualizations and benchmarks."""
@@ -0,0 +1,849 @@
1
+ """
2
+ Generate comprehensive visualizations for LoPace compression metrics.
3
+ Creates publication-quality SVG plots suitable for research papers.
4
+ """
5
+
6
+ import os
7
+ import sys
8
+ import time
9
+ import tracemalloc
10
+ from pathlib import Path
11
+ from typing import List, Dict, Tuple
12
+ import numpy as np
13
+ import matplotlib
14
+ matplotlib.use('SVG') # Use SVG backend for vector graphics
15
+ import matplotlib.pyplot as plt
16
+ import seaborn as sns
17
+ from matplotlib.patches import Rectangle
18
+ import pandas as pd
19
+
20
+ # Add parent directory to path to import lopace
21
+ sys.path.insert(0, str(Path(__file__).parent.parent))
22
+
23
+ from lopace import PromptCompressor, CompressionMethod
24
+
25
+
26
+ # Set style for publication-quality plots
27
+ sns.set_style("whitegrid")
28
+ plt.rcParams.update({
29
+ 'font.size': 12,
30
+ 'font.family': 'serif',
31
+ 'font.serif': ['Times New Roman', 'DejaVu Serif'],
32
+ 'axes.labelsize': 14,
33
+ 'axes.titlesize': 16,
34
+ 'xtick.labelsize': 12,
35
+ 'ytick.labelsize': 12,
36
+ 'legend.fontsize': 11,
37
+ 'figure.titlesize': 18,
38
+ 'figure.dpi': 300,
39
+ 'savefig.dpi': 300,
40
+ 'savefig.format': 'svg',
41
+ 'svg.fonttype': 'none', # Editable text in SVG
42
+ 'mathtext.default': 'regular',
43
+ 'axes.linewidth': 1.2,
44
+ 'grid.linewidth': 0.8,
45
+ 'lines.linewidth': 2.0,
46
+ 'patch.linewidth': 1.5,
47
+ })
48
+
49
+
50
+ def generate_test_prompts() -> List[Tuple[str, str]]:
51
+ """Generate test prompts of various sizes."""
52
+ prompts = []
53
+
54
+ # Small prompts (50-200 chars)
55
+ small_prompts = [
56
+ "You are a helpful AI assistant.",
57
+ "Translate the following text to French.",
58
+ "Summarize this document in 3 sentences.",
59
+ "You are an expert Python developer.",
60
+ ]
61
+
62
+ # Medium prompts (500-2000 chars)
63
+ medium_prompts = [
64
+ """You are a helpful AI assistant designed to provide accurate,
65
+ detailed, and helpful responses to user queries. Your goal is to assist users
66
+ by understanding their questions and providing relevant information, explanations,
67
+ or guidance. Always be respectful, clear, and concise in your communications.
68
+ If you are uncertain about something, it's better to acknowledge that uncertainty
69
+ rather than provide potentially incorrect information.""",
70
+
71
+ """As an advanced language model, your primary function is to understand
72
+ and respond to user inputs in a helpful, accurate, and safe manner. You should
73
+ provide informative answers, assist with problem-solving, engage in creative
74
+ writing tasks, and support various learning activities. Maintain objectivity,
75
+ cite sources when appropriate, and always prioritize user safety and ethical
76
+ considerations in your responses.""",
77
+
78
+ """You are a professional software engineer with expertise in multiple
79
+ programming languages including Python, JavaScript, Java, and C++. Your role
80
+ is to help users write clean, efficient, and maintainable code. Provide
81
+ code examples, explain best practices, debug issues, and suggest improvements.
82
+ Always consider performance, security, and scalability in your recommendations.""",
83
+ ]
84
+
85
+ # Large prompts (5000-20000 chars)
86
+ large_prompts = [
87
+ """You are a comprehensive AI assistant specializing in technical documentation
88
+ and educational content. Your expertise spans multiple domains including computer science,
89
+ data science, machine learning, software engineering, and web development. When responding
90
+ to queries, you should provide thorough explanations, include relevant examples, and
91
+ structure your responses in a clear and organized manner. Always aim to educate while
92
+ solving problems. Break down complex concepts into digestible parts, use analogies when
93
+ helpful, and provide practical applications of theoretical knowledge. Maintain accuracy
94
+ by acknowledging when you're uncertain and suggest reliable sources for further learning.
95
+ Your communication style should be professional yet accessible, avoiding unnecessary
96
+ jargon while ensuring precision in technical details. Consider the user's background
97
+ and adjust your explanation depth accordingly. For code-related queries, always provide
98
+ complete, working examples with comments explaining key parts. For conceptual questions,
99
+ use diagrams, step-by-step breakdowns, and real-world analogies. When discussing
100
+ best practices, explain not just what to do but why, including trade-offs and
101
+ alternative approaches. Your goal is to empower users with knowledge and skills
102
+ rather than just providing answers. Encourage critical thinking, experimentation,
103
+ and continuous learning. Address potential pitfalls, common mistakes, and how to
104
+ avoid them. Provide context about industry standards and emerging trends when relevant.
105
+ Remember that effective teaching involves understanding the learner's perspective,
106
+ patience, and encouragement. Always prioritize clarity, accuracy, and educational value
107
+ in every interaction. Balance thoroughness with conciseness, ensuring responses are
108
+ comprehensive yet not overwhelming. Use formatting effectively to improve readability,
109
+ including bullet points, numbered lists, and section headers when appropriate.""",
110
+
111
+ """System Prompt for Advanced Multi-Modal AI Assistant: This AI system is designed
112
+ to be a versatile, intelligent, and highly capable assistant that can handle a wide range
113
+ of tasks across multiple domains. The system integrates natural language processing,
114
+ reasoning capabilities, knowledge retrieval, and contextual understanding to provide
115
+ comprehensive support. Primary capabilities include question answering, problem-solving,
116
+ creative tasks, analysis, code generation, data interpretation, and educational support.
117
+ The assistant maintains a knowledge base spanning science, technology, humanities,
118
+ business, arts, and current events. When interacting with users, the system should
119
+ prioritize accuracy, helpfulness, safety, and ethical considerations. Responses should
120
+ be well-structured, clear, and appropriately detailed based on the complexity of the query.
121
+ The assistant should ask clarifying questions when necessary, acknowledge limitations,
122
+ and provide sources or references when making factual claims. For technical questions,
123
+ provide detailed explanations with examples. For creative tasks, demonstrate imagination
124
+ while maintaining coherence and appropriateness. For analytical tasks, show step-by-step
125
+ reasoning and present conclusions clearly. The system should adapt its communication style
126
+ to match the user's level of expertise and the context of the conversation. Always aim
127
+ to be constructive, respectful, and professional. When dealing with sensitive topics,
128
+ exercise caution and provide balanced perspectives. For coding tasks, write clean,
129
+ well-commented code following best practices. For writing tasks, ensure proper grammar,
130
+ style, and structure. The assistant should continuously learn from interactions while
131
+ maintaining core principles and guidelines. It should handle ambiguity gracefully,
132
+ provide multiple perspectives when appropriate, and help users think critically about
133
+ complex issues. The system is designed to be a tool for empowerment, education, and
134
+ efficient problem-solving.""",
135
+ ]
136
+
137
+ # Combine and label prompts
138
+ for prompt in small_prompts:
139
+ prompts.append(("Small", prompt))
140
+
141
+ for prompt in medium_prompts:
142
+ prompts.append(("Medium", prompt))
143
+
144
+ for large_prompts_list in large_prompts:
145
+ prompts.append(("Large", large_prompts_list))
146
+
147
+ # Add one more large prompt if needed
148
+ if len(prompts) < 10:
149
+ additional_large = """Comprehensive System Prompt for Advanced AI Assistant: This sophisticated
150
+ artificial intelligence system represents a state-of-the-art language model designed to excel across
151
+ a multitude of domains and applications. The system integrates deep learning architectures, extensive
152
+ knowledge bases, and advanced reasoning capabilities to provide exceptional assistance. Core competencies
153
+ include natural language understanding and generation, logical reasoning, creative problem-solving,
154
+ technical expertise, and ethical decision-making. The assistant maintains extensive knowledge spanning
155
+ STEM fields, humanities, arts, business, law, medicine, and contemporary issues. When engaging with users,
156
+ the system employs sophisticated contextual understanding, adapts communication styles appropriately,
157
+ and provides nuanced, well-reasoned responses. The architecture supports multi-modal interactions,
158
+ real-time learning, and seamless integration with external tools and databases. Quality assurance
159
+ mechanisms ensure accuracy, relevance, and safety in all outputs. The system demonstrates exceptional
160
+ capabilities in code generation and analysis, creative writing, data analysis, educational instruction,
161
+ research assistance, and complex problem decomposition. Advanced features include meta-cognitive reasoning,
162
+ uncertainty quantification, bias detection and mitigation, and explainable AI principles. The assistant
163
+ prioritizes user empowerment through education, transparency, and collaborative problem-solving approaches."""
164
+ prompts.append(("Large", additional_large))
165
+
166
+ # Ensure we have exactly 10 prompts
167
+ prompts = prompts[:10]
168
+
169
+ return prompts
170
+
171
+
172
+ def measure_compression(
173
+ compressor: PromptCompressor,
174
+ prompt: str,
175
+ method: CompressionMethod
176
+ ) -> Dict:
177
+ """Measure compression metrics for a given prompt and method."""
178
+ # Memory tracking
179
+ tracemalloc.start()
180
+
181
+ # Compression
182
+ start_time = time.perf_counter()
183
+ compressed = compressor.compress(prompt, method)
184
+ compression_time = time.perf_counter() - start_time
185
+ current, peak = tracemalloc.get_traced_memory()
186
+ compression_memory = peak / (1024 * 1024) # MB
187
+
188
+ tracemalloc.stop()
189
+
190
+ # Decompression
191
+ tracemalloc.start()
192
+ start_time = time.perf_counter()
193
+ decompressed = compressor.decompress(compressed, method)
194
+ decompression_time = time.perf_counter() - start_time
195
+ current, peak = tracemalloc.get_traced_memory()
196
+ decompression_memory = peak / (1024 * 1024) # MB
197
+ tracemalloc.stop()
198
+
199
+ # Verify losslessness
200
+ is_lossless = prompt == decompressed
201
+
202
+ # Calculate metrics
203
+ original_size = len(prompt.encode('utf-8'))
204
+ compressed_size = len(compressed)
205
+
206
+ metrics = {
207
+ 'method': method.value,
208
+ 'original_size_bytes': original_size,
209
+ 'compressed_size_bytes': compressed_size,
210
+ 'compression_ratio': original_size / compressed_size if compressed_size > 0 else 0,
211
+ 'space_savings_percent': (1 - compressed_size / original_size) * 100 if original_size > 0 else 0,
212
+ 'bytes_saved': original_size - compressed_size,
213
+ 'compression_time_ms': compression_time * 1000,
214
+ 'decompression_time_ms': decompression_time * 1000,
215
+ 'compression_throughput_mbps': (original_size / (1024 * 1024)) / compression_time if compression_time > 0 else 0,
216
+ 'decompression_throughput_mbps': (compressed_size / (1024 * 1024)) / decompression_time if decompression_time > 0 else 0,
217
+ 'compression_memory_mb': compression_memory,
218
+ 'decompression_memory_mb': decompression_memory,
219
+ 'is_lossless': is_lossless,
220
+ 'num_characters': len(prompt),
221
+ }
222
+
223
+ return metrics
224
+
225
+
226
+ def run_benchmarks() -> pd.DataFrame:
227
+ """Run compression benchmarks on all prompts and methods."""
228
+ compressor = PromptCompressor(model="cl100k_base", zstd_level=15)
229
+ prompts = generate_test_prompts()
230
+
231
+ all_results = []
232
+
233
+ print("Running benchmarks...")
234
+ for idx, (category, prompt) in enumerate(prompts, 1):
235
+ print(f" Processing prompt {idx}/10 ({category}, {len(prompt)} chars)...")
236
+
237
+ for method in [CompressionMethod.ZSTD, CompressionMethod.TOKEN, CompressionMethod.HYBRID]:
238
+ metrics = measure_compression(compressor, prompt, method)
239
+ metrics['prompt_id'] = idx
240
+ metrics['prompt_category'] = category
241
+ metrics['prompt_length'] = len(prompt)
242
+ all_results.append(metrics)
243
+
244
+ df = pd.DataFrame(all_results)
245
+ return df
246
+
247
+
248
+ def plot_compression_ratio(df: pd.DataFrame, output_dir: Path):
249
+ """Plot compression ratios by method and prompt size."""
250
+ fig, axes = plt.subplots(1, 2, figsize=(14, 6))
251
+
252
+ # Left: Compression ratio by method
253
+ ax1 = axes[0]
254
+ method_order = ['zstd', 'token', 'hybrid']
255
+ method_labels = ['Zstd', 'Token (BPE)', 'Hybrid']
256
+
257
+ data_by_method = [df[df['method'] == m]['compression_ratio'].values for m in method_order]
258
+
259
+ bp = ax1.boxplot(data_by_method, labels=method_labels, patch_artist=True,
260
+ widths=0.6, showmeans=True, meanline=True)
261
+
262
+ colors = ['#3498db', '#2ecc71', '#9b59b6']
263
+ for patch, color in zip(bp['boxes'], colors):
264
+ patch.set_facecolor(color)
265
+ patch.set_alpha(0.7)
266
+
267
+ ax1.set_ylabel('Compression Ratio', fontweight='bold')
268
+ ax1.set_xlabel('Compression Method', fontweight='bold')
269
+ ax1.set_title('(a) Compression Ratio Distribution by Method', fontweight='bold', pad=15)
270
+ ax1.grid(True, alpha=0.3, linestyle='--')
271
+ ax1.set_ylim(bottom=0)
272
+
273
+ # Right: Compression ratio by prompt category
274
+ ax2 = axes[1]
275
+ categories = ['Small', 'Medium', 'Large']
276
+ category_data = []
277
+
278
+ for category in categories:
279
+ category_df = df[df['prompt_category'] == category]
280
+ method_data = [category_df[category_df['method'] == m]['compression_ratio'].mean()
281
+ for m in method_order]
282
+ category_data.append(method_data)
283
+
284
+ x = np.arange(len(categories))
285
+ width = 0.25
286
+
287
+ for i, (method, color) in enumerate(zip(method_labels, colors)):
288
+ values = [category_data[j][i] for j in range(len(categories))]
289
+ ax2.bar(x + i * width, values, width, label=method, color=color, alpha=0.8)
290
+
291
+ ax2.set_ylabel('Mean Compression Ratio', fontweight='bold')
292
+ ax2.set_xlabel('Prompt Category', fontweight='bold')
293
+ ax2.set_title('(b) Compression Ratio by Prompt Size', fontweight='bold', pad=15)
294
+ ax2.set_xticks(x + width)
295
+ ax2.set_xticklabels(categories)
296
+ ax2.legend(loc='upper left', framealpha=0.9)
297
+ ax2.grid(True, alpha=0.3, linestyle='--', axis='y')
298
+ ax2.set_ylim(bottom=0)
299
+
300
+ plt.tight_layout()
301
+ plt.savefig(output_dir / 'compression_ratio.svg', format='svg', bbox_inches='tight')
302
+ plt.close()
303
+ print(f" Saved: compression_ratio.svg")
304
+
305
+
306
+ def plot_space_savings(df: pd.DataFrame, output_dir: Path):
307
+ """Plot space savings percentages."""
308
+ fig, ax = plt.subplots(figsize=(12, 7))
309
+
310
+ categories = ['Small', 'Medium', 'Large']
311
+ method_order = ['zstd', 'token', 'hybrid']
312
+ method_labels = ['Zstd', 'Token (BPE)', 'Hybrid']
313
+ colors = ['#3498db', '#2ecc71', '#9b59b6']
314
+
315
+ x = np.arange(len(categories))
316
+ width = 0.25
317
+
318
+ for i, (method, label, color) in enumerate(zip(method_order, method_labels, colors)):
319
+ means = []
320
+ stds = []
321
+ for category in categories:
322
+ subset = df[(df['method'] == method) & (df['prompt_category'] == category)]
323
+ means.append(subset['space_savings_percent'].mean())
324
+ stds.append(subset['space_savings_percent'].std())
325
+
326
+ bars = ax.bar(x + i * width, means, width, label=label, color=color, alpha=0.8,
327
+ yerr=stds, capsize=5, error_kw={'elinewidth': 2, 'capthick': 2})
328
+
329
+ ax.set_ylabel('Space Savings (%)', fontweight='bold')
330
+ ax.set_xlabel('Prompt Category', fontweight='bold')
331
+ ax.set_title('Space Savings by Compression Method and Prompt Size', fontweight='bold', pad=15)
332
+ ax.set_xticks(x + width)
333
+ ax.set_xticklabels(categories)
334
+ ax.legend(loc='upper left', framealpha=0.9, ncol=3)
335
+ ax.grid(True, alpha=0.3, linestyle='--', axis='y')
336
+ ax.set_ylim(0, 100)
337
+
338
+ # Add value labels on bars
339
+ for i, method in enumerate(method_order):
340
+ for j, category in enumerate(categories):
341
+ subset = df[(df['method'] == method) & (df['prompt_category'] == category)]
342
+ mean_val = subset['space_savings_percent'].mean()
343
+ ax.text(j + i * width, mean_val + 2, f'{mean_val:.1f}%',
344
+ ha='center', va='bottom', fontsize=10, fontweight='bold')
345
+
346
+ plt.tight_layout()
347
+ plt.savefig(output_dir / 'space_savings.svg', format='svg', bbox_inches='tight')
348
+ plt.close()
349
+ print(f" Saved: space_savings.svg")
350
+
351
+
352
+ def plot_disk_size_comparison(df: pd.DataFrame, output_dir: Path):
353
+ """Plot original vs compressed disk sizes."""
354
+ fig, axes = plt.subplots(2, 1, figsize=(14, 10))
355
+
356
+ method_order = ['zstd', 'token', 'hybrid']
357
+ method_labels = ['Zstd', 'Token (BPE)', 'Hybrid']
358
+ colors = ['#3498db', '#2ecc71', '#9b59b6']
359
+
360
+ # Top: Stacked bar chart showing original vs compressed
361
+ ax1 = axes[0]
362
+ categories = ['Small', 'Medium', 'Large']
363
+
364
+ x = np.arange(len(categories))
365
+ width = 0.25
366
+
367
+ for category_idx, category in enumerate(categories):
368
+ category_df = df[df['prompt_category'] == category]
369
+ original_sizes = category_df.groupby('prompt_id')['original_size_bytes'].first().mean()
370
+
371
+ compressed_means = []
372
+ for method in method_order:
373
+ method_df = category_df[category_df['method'] == method]
374
+ compressed_means.append(method_df['compressed_size_bytes'].mean())
375
+
376
+ # Stack bars
377
+ bottom = 0
378
+ for i, (method, label, color) in enumerate(zip(method_order, method_labels, colors)):
379
+ if i == 0:
380
+ # First method: show original vs compressed
381
+ ax1.bar(category_idx + i * width, original_sizes / 1024, width,
382
+ label='Original Size' if category_idx == 0 else '', color='#e74c3c', alpha=0.7)
383
+ ax1.bar(category_idx + i * width, compressed_means[i] / 1024, width,
384
+ bottom=0, label=label if category_idx == 0 else '', color=color, alpha=0.8)
385
+ else:
386
+ # Other methods: just compressed size
387
+ ax1.bar(category_idx + i * width, compressed_means[i] / 1024, width,
388
+ label=label if category_idx == 0 else '', color=color, alpha=0.8)
389
+
390
+ ax1.set_ylabel('Size (KB)', fontweight='bold')
391
+ ax1.set_xlabel('Prompt Category', fontweight='bold')
392
+ ax1.set_title('Disk Size: Original vs Compressed', fontweight='bold', pad=15)
393
+ ax1.set_xticks(x + width)
394
+ ax1.set_xticklabels(categories)
395
+ ax1.legend(loc='upper left', framealpha=0.9, ncol=4)
396
+ ax1.grid(True, alpha=0.3, linestyle='--', axis='y')
397
+ ax1.set_yscale('log')
398
+
399
+ # Bottom: Percentage reduction
400
+ ax2 = axes[1]
401
+
402
+ for i, (method, label, color) in enumerate(zip(method_order, method_labels, colors)):
403
+ means = []
404
+ for category in categories:
405
+ subset = df[(df['method'] == method) & (df['prompt_category'] == category)]
406
+ means.append(subset['space_savings_percent'].mean())
407
+
408
+ ax2.plot(categories, means, marker='o', linewidth=2.5, markersize=10,
409
+ label=label, color=color, markerfacecolor=color, markeredgewidth=2)
410
+
411
+ ax2.set_ylabel('Size Reduction (%)', fontweight='bold')
412
+ ax2.set_xlabel('Prompt Category', fontweight='bold')
413
+ ax2.set_title('Size Reduction by Prompt Category', fontweight='bold', pad=15)
414
+ ax2.legend(loc='best', framealpha=0.9)
415
+ ax2.grid(True, alpha=0.3, linestyle='--')
416
+ ax2.set_ylim(0, 100)
417
+
418
+ plt.tight_layout()
419
+ plt.savefig(output_dir / 'disk_size_comparison.svg', format='svg', bbox_inches='tight')
420
+ plt.close()
421
+ print(f" Saved: disk_size_comparison.svg")
422
+
423
+
424
+ def plot_speed_metrics(df: pd.DataFrame, output_dir: Path):
425
+ """Plot compression and decompression speed metrics."""
426
+ fig, axes = plt.subplots(2, 2, figsize=(16, 12))
427
+
428
+ method_order = ['zstd', 'token', 'hybrid']
429
+ method_labels = ['Zstd', 'Token (BPE)', 'Hybrid']
430
+ colors = ['#3498db', '#2ecc71', '#9b59b6']
431
+ categories = ['Small', 'Medium', 'Large']
432
+
433
+ # Top-left: Compression time
434
+ ax1 = axes[0, 0]
435
+ x = np.arange(len(categories))
436
+ width = 0.25
437
+
438
+ for i, (method, label, color) in enumerate(zip(method_order, method_labels, colors)):
439
+ means = []
440
+ for category in categories:
441
+ subset = df[(df['method'] == method) & (df['prompt_category'] == category)]
442
+ means.append(subset['compression_time_ms'].mean())
443
+
444
+ ax1.bar(x + i * width, means, width, label=label, color=color, alpha=0.8)
445
+
446
+ ax1.set_ylabel('Compression Time (ms)', fontweight='bold')
447
+ ax1.set_xlabel('Prompt Category', fontweight='bold')
448
+ ax1.set_title('(a) Compression Time', fontweight='bold', pad=15)
449
+ ax1.set_xticks(x + width)
450
+ ax1.set_xticklabels(categories)
451
+ ax1.legend(framealpha=0.9)
452
+ ax1.grid(True, alpha=0.3, linestyle='--', axis='y')
453
+ ax1.set_yscale('log')
454
+
455
+ # Top-right: Decompression time
456
+ ax2 = axes[0, 1]
457
+ for i, (method, label, color) in enumerate(zip(method_order, method_labels, colors)):
458
+ means = []
459
+ for category in categories:
460
+ subset = df[(df['method'] == method) & (df['prompt_category'] == category)]
461
+ means.append(subset['decompression_time_ms'].mean())
462
+
463
+ ax2.bar(x + i * width, means, width, label=label, color=color, alpha=0.8)
464
+
465
+ ax2.set_ylabel('Decompression Time (ms)', fontweight='bold')
466
+ ax2.set_xlabel('Prompt Category', fontweight='bold')
467
+ ax2.set_title('(b) Decompression Time', fontweight='bold', pad=15)
468
+ ax2.set_xticks(x + width)
469
+ ax2.set_xticklabels(categories)
470
+ ax2.legend(framealpha=0.9)
471
+ ax2.grid(True, alpha=0.3, linestyle='--', axis='y')
472
+ ax2.set_yscale('log')
473
+
474
+ # Bottom-left: Compression throughput
475
+ ax3 = axes[1, 0]
476
+ for i, (method, label, color) in enumerate(zip(method_order, method_labels, colors)):
477
+ means = []
478
+ for category in categories:
479
+ subset = df[(df['method'] == method) & (df['prompt_category'] == category)]
480
+ means.append(subset['compression_throughput_mbps'].mean())
481
+
482
+ ax3.plot(categories, means, marker='o', linewidth=2.5, markersize=10,
483
+ label=label, color=color, markerfacecolor=color, markeredgewidth=2)
484
+
485
+ ax3.set_ylabel('Throughput (MB/s)', fontweight='bold')
486
+ ax3.set_xlabel('Prompt Category', fontweight='bold')
487
+ ax3.set_title('(c) Compression Throughput', fontweight='bold', pad=15)
488
+ ax3.legend(framealpha=0.9)
489
+ ax3.grid(True, alpha=0.3, linestyle='--')
490
+ ax3.set_ylim(bottom=0)
491
+
492
+ # Bottom-right: Decompression throughput
493
+ ax4 = axes[1, 1]
494
+ for i, (method, label, color) in enumerate(zip(method_order, method_labels, colors)):
495
+ means = []
496
+ for category in categories:
497
+ subset = df[(df['method'] == method) & (df['prompt_category'] == category)]
498
+ means.append(subset['decompression_throughput_mbps'].mean())
499
+
500
+ ax4.plot(categories, means, marker='s', linewidth=2.5, markersize=10,
501
+ label=label, color=color, markerfacecolor=color, markeredgewidth=2)
502
+
503
+ ax4.set_ylabel('Throughput (MB/s)', fontweight='bold')
504
+ ax4.set_xlabel('Prompt Category', fontweight='bold')
505
+ ax4.set_title('(d) Decompression Throughput', fontweight='bold', pad=15)
506
+ ax4.legend(framealpha=0.9)
507
+ ax4.grid(True, alpha=0.3, linestyle='--')
508
+ ax4.set_ylim(bottom=0)
509
+
510
+ plt.tight_layout()
511
+ plt.savefig(output_dir / 'speed_metrics.svg', format='svg', bbox_inches='tight')
512
+ plt.close()
513
+ print(f" Saved: speed_metrics.svg")
514
+
515
+
516
+ def plot_memory_usage(df: pd.DataFrame, output_dir: Path):
517
+ """Plot memory usage during compression and decompression."""
518
+ fig, axes = plt.subplots(1, 2, figsize=(16, 6))
519
+
520
+ method_order = ['zstd', 'token', 'hybrid']
521
+ method_labels = ['Zstd', 'Token (BPE)', 'Hybrid']
522
+ colors = ['#3498db', '#2ecc71', '#9b59b6']
523
+ categories = ['Small', 'Medium', 'Large']
524
+
525
+ x = np.arange(len(categories))
526
+ width = 0.25
527
+
528
+ # Left: Compression memory
529
+ ax1 = axes[0]
530
+ for i, (method, label, color) in enumerate(zip(method_order, method_labels, colors)):
531
+ means = []
532
+ stds = []
533
+ for category in categories:
534
+ subset = df[(df['method'] == method) & (df['prompt_category'] == category)]
535
+ means.append(subset['compression_memory_mb'].mean())
536
+ stds.append(subset['compression_memory_mb'].std())
537
+
538
+ ax1.bar(x + i * width, means, width, label=label, color=color, alpha=0.8,
539
+ yerr=stds, capsize=5, error_kw={'elinewidth': 2, 'capthick': 2})
540
+
541
+ ax1.set_ylabel('Memory Usage (MB)', fontweight='bold')
542
+ ax1.set_xlabel('Prompt Category', fontweight='bold')
543
+ ax1.set_title('(a) Compression Memory Usage', fontweight='bold', pad=15)
544
+ ax1.set_xticks(x + width)
545
+ ax1.set_xticklabels(categories)
546
+ ax1.legend(framealpha=0.9)
547
+ ax1.grid(True, alpha=0.3, linestyle='--', axis='y')
548
+ ax1.set_ylim(bottom=0)
549
+
550
+ # Right: Decompression memory
551
+ ax2 = axes[1]
552
+ for i, (method, label, color) in enumerate(zip(method_order, method_labels, colors)):
553
+ means = []
554
+ stds = []
555
+ for category in categories:
556
+ subset = df[(df['method'] == method) & (df['prompt_category'] == category)]
557
+ means.append(subset['decompression_memory_mb'].mean())
558
+ stds.append(subset['decompression_memory_mb'].std())
559
+
560
+ ax2.bar(x + i * width, means, width, label=label, color=color, alpha=0.8,
561
+ yerr=stds, capsize=5, error_kw={'elinewidth': 2, 'capthick': 2})
562
+
563
+ ax2.set_ylabel('Memory Usage (MB)', fontweight='bold')
564
+ ax2.set_xlabel('Prompt Category', fontweight='bold')
565
+ ax2.set_title('(b) Decompression Memory Usage', fontweight='bold', pad=15)
566
+ ax2.set_xticks(x + width)
567
+ ax2.set_xticklabels(categories)
568
+ ax2.legend(framealpha=0.9)
569
+ ax2.grid(True, alpha=0.3, linestyle='--', axis='y')
570
+ ax2.set_ylim(bottom=0)
571
+
572
+ plt.tight_layout()
573
+ plt.savefig(output_dir / 'memory_usage.svg', format='svg', bbox_inches='tight')
574
+ plt.close()
575
+ print(f" Saved: memory_usage.svg")
576
+
577
+
578
+ def plot_comprehensive_comparison(df: pd.DataFrame, output_dir: Path):
579
+ """Create a comprehensive comparison heatmap."""
580
+ fig, axes = plt.subplots(2, 2, figsize=(16, 14))
581
+
582
+ method_order = ['zstd', 'token', 'hybrid']
583
+ method_labels = ['Zstd', 'Token\n(BPE)', 'Hybrid']
584
+ categories = ['Small', 'Medium', 'Large']
585
+
586
+ # Top-left: Compression ratio heatmap
587
+ ax1 = axes[0, 0]
588
+ compression_ratio_matrix = []
589
+ for method in method_order:
590
+ row = []
591
+ for category in categories:
592
+ subset = df[(df['method'] == method) & (df['prompt_category'] == category)]
593
+ row.append(subset['compression_ratio'].mean())
594
+ compression_ratio_matrix.append(row)
595
+
596
+ im1 = ax1.imshow(compression_ratio_matrix, cmap='YlOrRd', aspect='auto', vmin=0)
597
+ ax1.set_xticks(np.arange(len(categories)))
598
+ ax1.set_yticks(np.arange(len(method_labels)))
599
+ ax1.set_xticklabels(categories)
600
+ ax1.set_yticklabels(method_labels)
601
+ ax1.set_ylabel('Compression Method', fontweight='bold')
602
+ ax1.set_xlabel('Prompt Category', fontweight='bold')
603
+ ax1.set_title('(a) Compression Ratio', fontweight='bold', pad=15)
604
+
605
+ # Add text annotations
606
+ for i in range(len(method_labels)):
607
+ for j in range(len(categories)):
608
+ text = ax1.text(j, i, f'{compression_ratio_matrix[i][j]:.2f}x',
609
+ ha="center", va="center", color="black", fontweight='bold')
610
+
611
+ plt.colorbar(im1, ax=ax1, label='Compression Ratio')
612
+
613
+ # Top-right: Space savings heatmap
614
+ ax2 = axes[0, 1]
615
+ space_savings_matrix = []
616
+ for method in method_order:
617
+ row = []
618
+ for category in categories:
619
+ subset = df[(df['method'] == method) & (df['prompt_category'] == category)]
620
+ row.append(subset['space_savings_percent'].mean())
621
+ space_savings_matrix.append(row)
622
+
623
+ im2 = ax2.imshow(space_savings_matrix, cmap='RdYlGn', aspect='auto', vmin=0, vmax=100)
624
+ ax2.set_xticks(np.arange(len(categories)))
625
+ ax2.set_yticks(np.arange(len(method_labels)))
626
+ ax2.set_xticklabels(categories)
627
+ ax2.set_yticklabels(method_labels)
628
+ ax2.set_ylabel('Compression Method', fontweight='bold')
629
+ ax2.set_xlabel('Prompt Category', fontweight='bold')
630
+ ax2.set_title('(b) Space Savings (%)', fontweight='bold', pad=15)
631
+
632
+ for i in range(len(method_labels)):
633
+ for j in range(len(categories)):
634
+ text = ax2.text(j, i, f'{space_savings_matrix[i][j]:.1f}%',
635
+ ha="center", va="center", color="black", fontweight='bold')
636
+
637
+ plt.colorbar(im2, ax=ax2, label='Space Savings (%)')
638
+
639
+ # Bottom-left: Compression speed heatmap
640
+ ax3 = axes[1, 0]
641
+ speed_matrix = []
642
+ for method in method_order:
643
+ row = []
644
+ for category in categories:
645
+ subset = df[(df['method'] == method) & (df['prompt_category'] == category)]
646
+ row.append(subset['compression_throughput_mbps'].mean())
647
+ speed_matrix.append(row)
648
+
649
+ im3 = ax3.imshow(speed_matrix, cmap='viridis', aspect='auto')
650
+ ax3.set_xticks(np.arange(len(categories)))
651
+ ax3.set_yticks(np.arange(len(method_labels)))
652
+ ax3.set_xticklabels(categories)
653
+ ax3.set_yticklabels(method_labels)
654
+ ax3.set_ylabel('Compression Method', fontweight='bold')
655
+ ax3.set_xlabel('Prompt Category', fontweight='bold')
656
+ ax3.set_title('(c) Compression Throughput (MB/s)', fontweight='bold', pad=15)
657
+
658
+ for i in range(len(method_labels)):
659
+ for j in range(len(categories)):
660
+ text = ax3.text(j, i, f'{speed_matrix[i][j]:.2f}',
661
+ ha="center", va="center", color="white", fontweight='bold')
662
+
663
+ plt.colorbar(im3, ax=ax3, label='Throughput (MB/s)')
664
+
665
+ # Bottom-right: Memory usage heatmap
666
+ ax4 = axes[1, 1]
667
+ memory_matrix = []
668
+ for method in method_order:
669
+ row = []
670
+ for category in categories:
671
+ subset = df[(df['method'] == method) & (df['prompt_category'] == category)]
672
+ row.append(subset['compression_memory_mb'].mean())
673
+ memory_matrix.append(row)
674
+
675
+ im4 = ax4.imshow(memory_matrix, cmap='plasma', aspect='auto')
676
+ ax4.set_xticks(np.arange(len(categories)))
677
+ ax4.set_yticks(np.arange(len(method_labels)))
678
+ ax4.set_xticklabels(categories)
679
+ ax4.set_yticklabels(method_labels)
680
+ ax4.set_ylabel('Compression Method', fontweight='bold')
681
+ ax4.set_xlabel('Prompt Category', fontweight='bold')
682
+ ax4.set_title('(d) Compression Memory Usage (MB)', fontweight='bold', pad=15)
683
+
684
+ for i in range(len(method_labels)):
685
+ for j in range(len(categories)):
686
+ text = ax4.text(j, i, f'{memory_matrix[i][j]:.2f}',
687
+ ha="center", va="center", color="white", fontweight='bold')
688
+
689
+ plt.colorbar(im4, ax=ax4, label='Memory (MB)')
690
+
691
+ plt.tight_layout()
692
+ plt.savefig(output_dir / 'comprehensive_comparison.svg', format='svg', bbox_inches='tight')
693
+ plt.close()
694
+ print(f" Saved: comprehensive_comparison.svg")
695
+
696
+
697
+ def plot_scalability(df: pd.DataFrame, output_dir: Path):
698
+ """Plot how metrics scale with prompt size."""
699
+ fig, axes = plt.subplots(2, 2, figsize=(16, 12))
700
+
701
+ method_order = ['zstd', 'token', 'hybrid']
702
+ method_labels = ['Zstd', 'Token (BPE)', 'Hybrid']
703
+ colors = ['#3498db', '#2ecc71', '#9b59b6']
704
+
705
+ # Get unique prompt sizes
706
+ prompt_sizes = sorted(df['prompt_length'].unique())
707
+
708
+ # Top-left: Compression ratio vs prompt size
709
+ ax1 = axes[0, 0]
710
+ for method, label, color in zip(method_order, method_labels, colors):
711
+ means = []
712
+ sizes = []
713
+ for size in prompt_sizes:
714
+ subset = df[(df['method'] == method) & (df['prompt_length'] == size)]
715
+ if len(subset) > 0:
716
+ means.append(subset['compression_ratio'].mean())
717
+ sizes.append(size)
718
+
719
+ ax1.plot(sizes, means, marker='o', linewidth=2.5, markersize=8,
720
+ label=label, color=color, markerfacecolor=color, markeredgewidth=2)
721
+
722
+ ax1.set_xlabel('Prompt Length (characters)', fontweight='bold')
723
+ ax1.set_ylabel('Compression Ratio', fontweight='bold')
724
+ ax1.set_title('(a) Compression Ratio vs Prompt Size', fontweight='bold', pad=15)
725
+ ax1.legend(framealpha=0.9)
726
+ ax1.grid(True, alpha=0.3, linestyle='--')
727
+ ax1.set_xscale('log')
728
+
729
+ # Top-right: Space savings vs prompt size
730
+ ax2 = axes[0, 1]
731
+ for method, label, color in zip(method_order, method_labels, colors):
732
+ means = []
733
+ sizes = []
734
+ for size in prompt_sizes:
735
+ subset = df[(df['method'] == method) & (df['prompt_length'] == size)]
736
+ if len(subset) > 0:
737
+ means.append(subset['space_savings_percent'].mean())
738
+ sizes.append(size)
739
+
740
+ ax2.plot(sizes, means, marker='s', linewidth=2.5, markersize=8,
741
+ label=label, color=color, markerfacecolor=color, markeredgewidth=2)
742
+
743
+ ax2.set_xlabel('Prompt Length (characters)', fontweight='bold')
744
+ ax2.set_ylabel('Space Savings (%)', fontweight='bold')
745
+ ax2.set_title('(b) Space Savings vs Prompt Size', fontweight='bold', pad=15)
746
+ ax2.legend(framealpha=0.9)
747
+ ax2.grid(True, alpha=0.3, linestyle='--')
748
+ ax2.set_xscale('log')
749
+
750
+ # Bottom-left: Compression time vs prompt size
751
+ ax3 = axes[1, 0]
752
+ for method, label, color in zip(method_order, method_labels, colors):
753
+ means = []
754
+ sizes = []
755
+ for size in prompt_sizes:
756
+ subset = df[(df['method'] == method) & (df['prompt_length'] == size)]
757
+ if len(subset) > 0:
758
+ means.append(subset['compression_time_ms'].mean())
759
+ sizes.append(size)
760
+
761
+ ax3.plot(sizes, means, marker='^', linewidth=2.5, markersize=8,
762
+ label=label, color=color, markerfacecolor=color, markeredgewidth=2)
763
+
764
+ ax3.set_xlabel('Prompt Length (characters)', fontweight='bold')
765
+ ax3.set_ylabel('Compression Time (ms)', fontweight='bold')
766
+ ax3.set_title('(c) Compression Time vs Prompt Size', fontweight='bold', pad=15)
767
+ ax3.legend(framealpha=0.9)
768
+ ax3.grid(True, alpha=0.3, linestyle='--')
769
+ ax3.set_xscale('log')
770
+ ax3.set_yscale('log')
771
+
772
+ # Bottom-right: Memory vs prompt size
773
+ ax4 = axes[1, 1]
774
+ for method, label, color in zip(method_order, method_labels, colors):
775
+ means = []
776
+ sizes = []
777
+ for size in prompt_sizes:
778
+ subset = df[(df['method'] == method) & (df['prompt_length'] == size)]
779
+ if len(subset) > 0:
780
+ means.append(subset['compression_memory_mb'].mean())
781
+ sizes.append(size)
782
+
783
+ ax4.plot(sizes, means, marker='d', linewidth=2.5, markersize=8,
784
+ label=label, color=color, markerfacecolor=color, markeredgewidth=2)
785
+
786
+ ax4.set_xlabel('Prompt Length (characters)', fontweight='bold')
787
+ ax4.set_ylabel('Memory Usage (MB)', fontweight='bold')
788
+ ax4.set_title('(d) Memory Usage vs Prompt Size', fontweight='bold', pad=15)
789
+ ax4.legend(framealpha=0.9)
790
+ ax4.grid(True, alpha=0.3, linestyle='--')
791
+ ax4.set_xscale('log')
792
+
793
+ plt.tight_layout()
794
+ plt.savefig(output_dir / 'scalability_analysis.svg', format='svg', bbox_inches='tight')
795
+ plt.close()
796
+ print(f" Saved: scalability_analysis.svg")
797
+
798
+
799
+ def main():
800
+ """Main function to generate all visualizations."""
801
+ # Create output directory
802
+ output_dir = Path(__file__).parent.parent / 'screenshots'
803
+ output_dir.mkdir(exist_ok=True)
804
+
805
+ print("=" * 70)
806
+ print("LoPace Visualization Generator")
807
+ print("=" * 70)
808
+ print(f"Output directory: {output_dir}")
809
+ print()
810
+
811
+ # Run benchmarks
812
+ print("Step 1: Running compression benchmarks...")
813
+ df = run_benchmarks()
814
+
815
+ # Save raw data
816
+ csv_path = output_dir / 'benchmark_data.csv'
817
+ df.to_csv(csv_path, index=False)
818
+ print(f"\n Saved benchmark data to: {csv_path}")
819
+
820
+ print("\nStep 2: Generating visualizations...")
821
+
822
+ # Generate all plots
823
+ plot_compression_ratio(df, output_dir)
824
+ plot_space_savings(df, output_dir)
825
+ plot_disk_size_comparison(df, output_dir)
826
+ plot_speed_metrics(df, output_dir)
827
+ plot_memory_usage(df, output_dir)
828
+ plot_comprehensive_comparison(df, output_dir)
829
+ plot_scalability(df, output_dir)
830
+
831
+ print("\n" + "=" * 70)
832
+ print("Visualization generation complete!")
833
+ print(f"All plots saved to: {output_dir}")
834
+ print("=" * 70)
835
+
836
+ # Print summary statistics
837
+ print("\nSummary Statistics:")
838
+ print("-" * 70)
839
+ for method in ['zstd', 'token', 'hybrid']:
840
+ method_df = df[df['method'] == method]
841
+ print(f"\n{method.upper()}:")
842
+ print(f" Mean Compression Ratio: {method_df['compression_ratio'].mean():.2f}x")
843
+ print(f" Mean Space Savings: {method_df['space_savings_percent'].mean():.2f}%")
844
+ print(f" Mean Compression Time: {method_df['compression_time_ms'].mean():.2f} ms")
845
+ print(f" Mean Throughput: {method_df['compression_throughput_mbps'].mean():.2f} MB/s")
846
+
847
+
848
+ if __name__ == "__main__":
849
+ main()
@@ -1,9 +0,0 @@
1
- lopace/__init__.py,sha256=PYjZWZHhSITNgag9sF0qZ_yXgZaMa3R8_3FuasiH0Nc,351
2
- lopace/compressor.py,sha256=nUTWDcAPYvQaeSFKx_lne-D2xIQ02IMVGE4yLODo8qE,19060
3
- lopace-0.1.0.dist-info/licenses/LICENSE,sha256=uFUrlsfsOwx_8Nzhq2pUgNaJghcJxXBMML3l7T39Tm0,1067
4
- tests/__init__.py,sha256=yXNVJE20E2iHo0qbit5SgRE35eXWq89F1kkhNHy7VJA,31
5
- tests/test_compressor.py,sha256=-vMztSzY89n5dpShcACrFboEQOlfJ6FxF7eQOEU3swM,8273
6
- lopace-0.1.0.dist-info/METADATA,sha256=yXy0jt23uvVWkGlEeCb8KEUSx1_o3N02wZZEFj5weEI,12199
7
- lopace-0.1.0.dist-info/WHEEL,sha256=_zCd3N1l69ArxyTb8rzEoP9TpbYXkqRFSNOD5OuxnTs,91
8
- lopace-0.1.0.dist-info/top_level.txt,sha256=8CLB5czxmmAfR7ayh3TO5qyB1-xJoYNxabufJ37Xh5o,13
9
- lopace-0.1.0.dist-info/RECORD,,
File without changes