ultrasav 0.1.0__tar.gz → 0.2.4__tar.gz

This diff shows the changes between two publicly released versions of the package, as they appear in the public registry. It is provided for informational purposes only.
Files changed (31)
  1. {ultrasav-0.1.0/src/ultrasav.egg-info → ultrasav-0.2.4}/PKG-INFO +598 -565
  2. {ultrasav-0.1.0 → ultrasav-0.2.4}/README.md +109 -61
  3. {ultrasav-0.1.0 → ultrasav-0.2.4}/pyproject.toml +5 -16
  4. {ultrasav-0.1.0 → ultrasav-0.2.4}/src/ultrasav/__init__.py +43 -34
  5. ultrasav-0.1.0/src/ultrasav/def_add_cases.py → ultrasav-0.2.4/src/ultrasav/_add_cases.py +4 -4
  6. ultrasav-0.1.0/src/ultrasav/class_data.py → ultrasav-0.2.4/src/ultrasav/_data.py +6 -3
  7. ultrasav-0.1.0/src/ultrasav/def_merge_data.py → ultrasav-0.2.4/src/ultrasav/_merge_data.py +1 -1
  8. ultrasav-0.1.0/src/ultrasav/def_merge_meta.py → ultrasav-0.2.4/src/ultrasav/_merge_meta.py +1 -1
  9. ultrasav-0.2.4/src/ultrasav/_metadata.py +1239 -0
  10. ultrasav-0.2.4/src/ultrasav/_set_value_labels.py +209 -0
  11. ultrasav-0.1.0/src/ultrasav/def_write_files.py → ultrasav-0.2.4/src/ultrasav/_write_files.py +12 -13
  12. {ultrasav-0.1.0 → ultrasav-0.2.4}/src/ultrasav/metaman/__init__.py +18 -12
  13. ultrasav-0.2.4/src/ultrasav/metaman/_describe.py +421 -0
  14. ultrasav-0.1.0/src/ultrasav/metaman/def_get_meta.py → ultrasav-0.2.4/src/ultrasav/metaman/_get_meta.py +2 -2
  15. ultrasav-0.1.0/src/ultrasav/metaman/def_make_datamap.py → ultrasav-0.2.4/src/ultrasav/metaman/_make_datamap.py +1 -1
  16. ultrasav-0.1.0/src/ultrasav/metaman/def_make_labels.py → ultrasav-0.2.4/src/ultrasav/metaman/_make_labels.py +2 -2
  17. ultrasav-0.1.0/src/ultrasav/metaman/def_map_engine.py → ultrasav-0.2.4/src/ultrasav/metaman/_map_engine.py +530 -529
  18. ultrasav-0.1.0/src/ultrasav/metaman/def_map_to_excel.py → ultrasav-0.2.4/src/ultrasav/metaman/_map_to_excel.py +2 -2
  19. ultrasav-0.1.0/src/ultrasav/metaman/def_write_excel_engine.py → ultrasav-0.2.4/src/ultrasav/metaman/_write_excel_engine.py +1 -1
  20. ultrasav-0.1.0/LICENSE +0 -21
  21. ultrasav-0.1.0/PKG-INFO +0 -565
  22. ultrasav-0.1.0/setup.cfg +0 -4
  23. ultrasav-0.1.0/src/ultrasav/class_metadata.py +0 -570
  24. ultrasav-0.1.0/src/ultrasav.egg-info/SOURCES.txt +0 -26
  25. ultrasav-0.1.0/src/ultrasav.egg-info/dependency_links.txt +0 -1
  26. ultrasav-0.1.0/src/ultrasav.egg-info/requires.txt +0 -25
  27. ultrasav-0.1.0/src/ultrasav.egg-info/top_level.txt +0 -1
  28. ultrasav-0.1.0/src/ultrasav/def_make_dummy.py → ultrasav-0.2.4/src/ultrasav/_make_dummy.py +0 -0
  29. ultrasav-0.1.0/src/ultrasav/def_read_files.py → ultrasav-0.2.4/src/ultrasav/_read_files.py +0 -0
  30. ultrasav-0.1.0/src/ultrasav/metaman/pastel_color_schemes.py → ultrasav-0.2.4/src/ultrasav/metaman/_color_schemes.py +0 -0
  31. ultrasav-0.1.0/src/ultrasav/metaman/def_detect_variable_type.py → ultrasav-0.2.4/src/ultrasav/metaman/_detect_variable_type.py +0 -0
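
Most renames in the file list follow one pattern: a `def_` or `class_` filename prefix is replaced by a leading underscore. The sketch below is one way to parse entries written in the explicit `old → new +A -D` form and check them against that pattern. The helper names `parse_entry` and `follows_underscore_convention` and the regular expression are illustrative assumptions; they are not part of ultrasav or of the tool that produced this diff, and the brace-compressed entries (`{old → new}/file`) are deliberately not handled.

```python
import re
from pathlib import PurePosixPath

# Matches "<old path> → <new path> +<added> -<removed>", with an optional
# leading list number ("5. ") and an optional leading slash on each path.
# Hypothetical pattern for illustration only.
ENTRY = re.compile(r"^\s*(?:\d+\.\s+)?/?(\S+) → /?(\S+) \+(\d+) -(\d+)\s*$")


def parse_entry(line):
    """Return (old_name, new_name, added, removed), or None if the line
    is not an explicit rename entry."""
    m = ENTRY.match(line)
    if m is None:
        return None
    old, new, added, removed = m.groups()
    return (
        PurePosixPath(old).name,
        PurePosixPath(new).name,
        int(added),
        int(removed),
    )


def follows_underscore_convention(old_name, new_name):
    """True when 'def_x.py' or 'class_x.py' became '_x.py'."""
    for prefix in ("def_", "class_"):
        if old_name.startswith(prefix):
            return new_name == "_" + old_name[len(prefix):]
    return False


line = (
    "5. ultrasav-0.1.0/src/ultrasav/def_add_cases.py → "
    "ultrasav-0.2.4/src/ultrasav/_add_cases.py +4 -4"
)
old_name, new_name, added, removed = parse_entry(line)
print(old_name, "->", new_name, follows_underscore_convention(old_name, new_name))
```

Among the renames listed, `pastel_color_schemes.py` → `_color_schemes.py` is the only one that drops a different prefix, so the check returns `False` for it.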
@@ -1,565 +1,598 @@
1
- Metadata-Version: 2.4
2
- Name: ultrasav
3
- Version: 0.1.0
4
- Summary: A Python package for working with SPSS/SAV files with two-track architecture separating data and metadata operations
5
- Author: Albert Li
6
- Maintainer: Albert Li
7
- License: MIT
8
- Project-URL: Homepage, https://github.com/albertxli/ultrasav
9
- Project-URL: Documentation, https://ultrasav.readthedocs.io
10
- Project-URL: Changelog, https://github.com/albertxli/ultrasav/blob/main/CHANGELOG.md
11
- Keywords: spss,spss labels,spss files,sav files,sav,statistics,data-science,data-processing,survey-data,metadata,spss metadata,pyreadstat,dataframe-agnostic,polars,pandas,read spss,read sav,write spss,write sav,merge spss,merge sav,datamap,spss-datamap,validation,data-quality,tidyspss,metaprinter
12
- Classifier: Development Status :: 4 - Beta
13
- Classifier: Intended Audience :: Science/Research
14
- Classifier: Intended Audience :: Developers
15
- Classifier: License :: OSI Approved :: MIT License
16
- Classifier: Programming Language :: Python
17
- Classifier: Programming Language :: Python :: 3
18
- Classifier: Programming Language :: Python :: 3.10
19
- Classifier: Programming Language :: Python :: 3.11
20
- Classifier: Programming Language :: Python :: 3.12
21
- Classifier: Programming Language :: Python :: 3.13
22
- Classifier: Operating System :: OS Independent
23
- Classifier: Topic :: Scientific/Engineering
24
- Classifier: Topic :: Software Development :: Libraries :: Python Modules
25
- Requires-Python: >=3.11
26
- Description-Content-Type: text/markdown
27
- License-File: LICENSE
28
- Requires-Dist: pandas>=2.2.0
29
- Requires-Dist: polars>=1.3.0
30
- Requires-Dist: pyreadstat>=1.3.2
31
- Requires-Dist: narwhals>=2.11.0
32
- Requires-Dist: openpyxl>=3.0.0
33
- Requires-Dist: xlsxwriter>=3.1.0
34
- Provides-Extra: excel
35
- Requires-Dist: fastexcel>=0.9.0; extra == "excel"
36
- Provides-Extra: dev
37
- Requires-Dist: pytest>=7.0.0; extra == "dev"
38
- Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
39
- Requires-Dist: black>=23.0.0; extra == "dev"
40
- Requires-Dist: ruff>=0.1.0; extra == "dev"
41
- Requires-Dist: mypy>=1.0.0; extra == "dev"
42
- Requires-Dist: pre-commit>=3.0.0; extra == "dev"
43
- Provides-Extra: docs
44
- Requires-Dist: sphinx>=6.0.0; extra == "docs"
45
- Requires-Dist: sphinx-rtd-theme>=1.0.0; extra == "docs"
46
- Requires-Dist: myst-parser>=1.0.0; extra == "docs"
47
- Provides-Extra: all
48
- Requires-Dist: ultrasav[dev,docs,excel]; extra == "all"
49
- Dynamic: license-file
50
-
51
- # ultrasav
52
-
53
- ⚡ An 'Ultra-powerful' Python package for preparing production-ready SPSS/SAV files using a two-track architecture that separates data and metadata operations.
54
-
55
- > *"Specium Ray for your data!" - Transform SPSS files with the power of Ultra*
56
-
57
-
58
- ## 💡 Motivation
59
-
60
- **ultrasav** is built as a thoughtful wrapper around the excellent pyreadstat package. We're not here to reinvent the wheel for reading and writing SAV files - pyreadstat already does that brilliantly!
61
-
62
- Instead, ultrasav provides additional transformation tools for tasks that are commonly done by folks who work with SAV files regularly:
63
- - 🏷️ **Rename variables** - Change variable names in batch with clean methodology
64
- - 🔄 **Recode values** - Transform codes across multiple variables with clean syntax
65
- - 🏷️ **Update labels** - Batch update variable labels and value labels without losing track
66
- - 📊 **Reorganize columns** - Move variables to specific positions for standardized layouts
67
- - 📀 **Merge files intelligently** - Stack survey data while preserving all metadata
68
- - 🎯 **Handle missing values** - Consistent missing value definitions across datasets
69
- - 🦸 **Inspect & report metadata** - Generate datamaps and validation reports with metaman
70
-
71
- ## 🎯 Core Philosophy
72
-
73
- **ultrasav** follows a simple but powerful principle: **Data and Metadata are two independent layers that only come together at read/write time.**
74
-
75
- ```
76
- ┌─────────────┐           ┌─────────────┐
77
- │    DATA     │           │  METADATA   │
78
- │  DataFrame  │           │   Labels    │
79
- │ Operations  │           │   Formats   │
80
- └─────────────┘           └─────────────┘
81
-        │                         │
82
-        └────────┬────────────────┘
83
-
84
-
85
-          ┌─────────────┐
86
-          │  WRITE SAV  │
87
-          └─────────────┘
88
- ```
89
-
90
- ### The Common Problems
91
-
92
- If you work with SPSS files in Python, you've probably asked yourself:
93
-
94
- - How do I bulk update variable labels and value labels?
95
- - How do I quickly relocate variables to ideal positions?
96
- - How do I merge datasets — and more specifically, how are the labels being merged?
97
- - How can I see a comprehensive datamap of my data?
98
- - Most importantly: **How do I prepare a tidy SPSS file with clean labels and metadata that is production-ready?**
99
-
100
- ultrasav answers all of these.
101
-
102
- ### The ultrasav Way
103
-
104
- ```python
105
- import ultrasav as ul
106
-
107
- # Read → splits into two independent tracks
108
- df, meta = ul.read_sav("survey.sav")
109
-
110
- # Track 1 - Data: Transform data freely
111
- data = ul.Data(df) # Wrap df into our Data class
112
- df = data.move(first=['id']).rename({'Q1': 'satisfaction'}).replace({'satisfaction': {6: 99}}).to_native()
113
-
114
- # Track 2 - Metadata: Update metadata independently
115
- meta = ul.Metadata(meta) # Wrap meta into our Metadata class
116
- meta.column_labels = {'satisfaction': 'Overall satisfaction'}
117
- meta.variable_value_labels={'recommend': {0: 'No', 1: 'Yes'}}
118
-
119
-
120
- # Convergence: Reunite at write time
121
- ul.write_sav(df, meta, "clean_survey.sav")
122
- ```
123
-
124
- The goal is to provide you with a **clean and easy-to-understand way** to transform your SPSS data that you can use in real production workflows with minimal tweaking.
125
-
126
- ### 🚀 DataFrame-Agnostic Design
127
-
128
- One of ultrasav's superpowers is being **dataframe-agnostic** — it works seamlessly with both **polars** and **pandas** thanks to [narwhals](https://github.com/MarcoGorelli/narwhals) under the hood:
129
-
130
- - 🐻‍❄️ **Polars by default** - Blazing fast performance out of the box
131
- - 🐼 **Pandas fully supported** - Use `output_format="pandas"` when needed
132
- - 🔄 **Switch freely** - Convert between pandas and polars anytime
133
- - 🔧 **Future-proof** - Ready for whatever dataframe library comes next
134
-
135
- **Default output format: Polars** — All operations return polars DataFrames by default for blazing-fast performance. Pandas is fully supported via the `output_format="pandas"` parameter.
136
-
137
- ```python
138
- import ultrasav as ul
139
-
140
- # Polars by default
141
- df_pl, meta = ul.read_sav("survey.sav", output_format="polars")
142
-
143
- # Or explicitly request pandas
144
- df_pd, meta = ul.read_sav("survey.sav", output_format="pandas")
145
-
146
- # The Data class works with either
147
- data = ul.Data(df_pl) # Works with both Polars and pandas!
148
-
149
- # Transform using ultrasav's consistent API
150
- data = data.rename({"Q1": "satisfaction"}).replace({'satisfaction': {6: 99}})
151
- df_native = data.to_native() # Get back your polars DataFrame
152
- ```
153
-
154
- ### Who Is This For?
155
-
156
- - 📊 **Market Researchers** - Merge waves, standardize labels, prepare deliverables
157
- - 🔬 **Data Scientists** - Clean survey data, prepare features, maintain metadata
158
- - 🏭 **Data Engineers** - Build robust pipelines that preserve SPSS metadata
159
- - 🎓 **Academic Researchers** - Manage longitudinal studies, harmonize datasets
160
- - 📈 **Anyone working with SPSS** - If you use SAV files regularly, this is for you!
161
-
162
- ## 🚀 Installation
163
-
164
- ```bash
165
- # Using uv
166
- uv add ultrasav
167
-
168
- # Or using pip
169
- pip install ultrasav
170
- ```
171
-
172
- ## 📚 Quick Start
173
-
174
- ### Basic Usage
175
-
176
- ```python
177
- import ultrasav as ul
178
-
179
- # Read SPSS file - automatically splits into data and metadata
180
- df, meta = ul.read_sav("survey.sav")
181
- # Note: You can also use pyreadstat directly - our classes work with pyreadstat meta objects too
182
-
183
- # Track 1: Process data independently
184
- data = ul.Data(df) # Wrap in Data class for transformations
185
- data = data.move(first=["ID", "Date"]) # Reorder columns
186
- data = data.rename({"Q1": "Satisfaction"}) # Rename columns
187
- data = data.replace({"Satisfaction": {99: None}}) # Replace values
188
- df = data.to_native() # Back to native DataFrame
189
-
190
- # Track 2: Process metadata independently
191
- meta.column_labels = {"Satisfaction": "Customer Satisfaction Score"}
192
- meta.variable_value_labels = {
193
- "Satisfaction": {1: "Very Dissatisfied", 5: "Very Satisfied"}
194
- }
195
- meta.variable_measure = {
196
- 'Satisfaction': 'ordinal',
197
- 'Gender': 'nominal',
198
- 'Age': 'scale',
199
- }
200
-
201
- # Convergence: Write both tracks to SPSS
202
- ul.write_sav(df, meta, "cleaned_survey.sav")
203
- ```
204
-
205
- ### Merging Files
206
-
207
- ```python
208
- import ultrasav as ul
209
-
210
- # Merge multiple files vertically with automatic metadata handling
211
- df, meta = ul.add_cases([
212
- "wave1.sav",
213
- "wave2.sav",
214
- "wave3.sav"
215
- ])
216
-
217
- # Metadata is automatically preserved from top to bottom.
218
- # A source-tracking column is automatically added to show each row's origin.
219
- # Example: mrgsrc: ["wave1.sav", "wave2.sav", "wave3.sav"]
220
-
221
- ul.write_sav(df, meta, "merged_output.sav")
222
- ```
223
-
224
- ### Advanced Merging
225
-
226
- ```python
227
- import ultrasav as ul
228
-
229
- # Use specific metadata template for all files
230
- standard_meta = ul.Metadata() # Create an empty meta object
231
- standard_meta.column_labels = {"Q1": "Satisfaction", "Q2": "Loyalty"}
232
- standard_meta.variable_value_labels = {
233
- "Satisfaction": {1: "Very Dissatisfied", 5: "Very Satisfied"}
234
- }
235
-
236
- data, meta = ul.add_cases(
237
- inputs=["file1.sav", "file2.sav", "file3.csv"],
238
- meta=[standard_meta], # Apply this metadata to merged data
239
- source_col="mrgsrc", # Auto append column 'mrgsrc' to track source files
240
- output_format="polars" # Explicit format (polars is default)
241
- )
242
- ```
243
-
244
- ### Writing Back
245
-
246
- ```python
247
- # Read SPSS file
248
- df, meta = ul.read_sav("huge_survey.sav")
249
-
250
- # All ultrasav operations work the same
251
- df = ul.Data(df).rename({"Q1": "satisfaction"}).drop(["unused_var"]).to_native()
252
-
253
- # Efficient write-back
254
- # Simply provide the 'meta' object; labels and formats are applied automatically.
255
- # Compatible with both ultrasav and pyreadstat meta objects.
256
- ul.write_sav(df, meta, "processed_data.sav")
257
- ```
258
-
259
- ## 🦸 Metaman: The Metadata Submodule
260
-
261
- ultrasav includes **metaman**, a powerful submodule for metadata inspection, extraction, and reporting. All metaman functions are accessible directly from the top-level `ul` namespace.
262
-
263
- ### Generate Validation Datamaps
264
-
265
- Create comprehensive datamaps showing variable types, value distributions, and data quality metrics:
266
-
267
- ```python
268
- import ultrasav as ul
269
-
270
- df, meta = ul.read_sav("survey.sav")
271
-
272
- # Create a validation datamap
273
- datamap = ul.make_datamap(df, meta)
274
-
275
- # Export to beautifully formatted Excel
276
- # This function supports polars only at the moment
277
- ul.map_to_excel(datamap, "validation_report.xlsx")
278
-
279
- # Use custom color schemes
280
- ul.map_to_excel(
281
- datamap,
282
- "validation_report.xlsx",
283
- alternating_group_formats=ul.get_color_scheme("pastel_blue")
284
- )
285
- ```
286
-
287
- The datamap includes:
288
- - Variable names and labels
289
- - Variable types (single-select, multi-select, numeric, text, date)
290
- - Value codes and labels
291
- - Value counts and percentages
292
- - Missing data flags
293
- - Missing value label detection
294
-
295
- **Note: Variable types are inferred from both SPSS data and metadata on a best-effort basis and may not always perfectly reflect the true underlying types.**
296
-
297
- ### Extract Metadata to Python Files
298
-
299
- Save existing metadata (if any) from a sav file as importable Python dictionaries for reuse across projects:
300
-
301
- ```python
302
- import ultrasav as ul
303
-
304
- df, meta = ul.read_sav("survey.sav")
305
-
306
- # Extract metadata (labels) to in-memory python object
307
- meta_dict = ul.get_meta(meta)
308
-
309
- # Extract and save ALL metadata to a Python file
310
- meta_dict = ul.get_meta(meta, include_all=True, output_path="survey_labels.py")
311
- ```
312
-
313
- ### Create Labels from Excel Templates
314
-
315
- Build label dictionaries from scratch using Excel templates - perfect for translating surveys or standardizing labels:
316
-
317
- ```python
318
- import ultrasav as ul
319
-
320
- # Excel file with 'col_label' and 'value_label' sheets
321
- col_labels, val_labels = ul.make_labels(
322
- input_path="label_template.xlsx",
323
- output_path="translated_labels.py" # optional
324
- )
325
- ```
326
-
327
- **Excel Structure:**
328
-
329
- Your Excel file should have two sheets:
330
-
331
- 1. **Column Labels Sheet** (default sheet name: "col_label"):
332
- | variable | label |
333
- |----------|-------|
334
- | age | Age of respondent |
335
- | gender | Gender |
336
- | income | Annual household income |
337
-
338
- 2. **Value Labels Sheet** (default sheet name: "value_label"):
339
- | variable | value | label |
340
- |----------|-------|-------|
341
- | gender | 1 | Male |
342
- | gender | 2 | Female |
343
- | income | 1 | Under $25k |
344
- | income | 2 | $25k-50k |
345
-
346
- ## 📖 API Reference
347
-
348
- ### Core Functions
349
-
350
- #### `read_sav(filepath, output_format="polars")`
351
- Read SPSS file and return separated data and metadata.
352
- This is a wrapper around pyreadstat.read_sav with some additional encoding handling
353
-
354
- ```python
355
- df, meta = ul.read_sav("survey.sav")
356
- ```
357
-
358
- #### `write_sav(data, meta, filepath)`
359
- Write data and metadata to SPSS file.
360
-
361
- ```python
362
- ul.write_sav(df, meta, "processed_data.sav")
363
- ```
364
-
365
- #### `add_cases(inputs, meta=None, source_col="mrgsrc")`
366
- Merge multiple files/dataframes vertically with metadata handling, return merged data and metadata.
367
-
368
- ```python
369
- df_merged, meta_merged = ul.add_cases(["wave1.sav","wave2.sav", "wave3.sav"])
370
- ```
371
-
372
- ### Classes
373
-
374
- #### `Data`
375
- Handles all dataframe operations while maintaining compatibility with both Polars and pandas.
376
-
377
- ```python
378
- import ultrasav as ul
379
-
380
- df, meta = ul.read_sav("survey.sav") # Returns a Polars DataFrame and meta object
381
-
382
- # Convert polars or pandas df into our ul.Data() class
383
- data = ul.Data(df)
384
-
385
- # Data Class Methods
386
- # move - to relocate columns
387
- data = data.move(
388
- first=['respondent_id'],
389
- last=['timestamp'],
390
- before={'age': 'gender'}, # place 'age' column before 'gender'
391
- after={'wave': ['age', 'gender', 'income']} # place demographic columns after 'wave'
392
- )
393
-
394
- # rename - to rename columns
395
- data = data.rename({"old": "new"})
396
-
397
- # replace - to replace/recode values
398
- data = data.replace({"col": {1: 100}})
399
-
400
- # select - to select columns
401
- data = data.select(['age', 'gender'])
402
-
403
- # drop - to drop columns
404
- data = data.drop(['id', 'language'])
405
-
406
- # to_native - to return ul.Data(df) back to its native dataframe
407
- df = data.to_native() # Get back Polars/pandas DataFrame
408
-
409
- # Optionally, use chaining for cleaner code
410
- df = (
411
- ul.Data(df)
412
- .move(first=['respondent_id'])
413
- .rename({"old": "new"})
414
- .replace({"col": {1: 100}})
415
- .select(['age', 'gender'])
416
- .drop(['id', 'language'])
417
- .to_native()
418
- )
419
- ```
420
-
421
- #### `Metadata`
422
- Manages all SPSS metadata independently from data.
423
-
424
- ```python
425
- import ultrasav as ul
426
-
427
- df, meta = ul.read_sav("survey.sav")
428
-
429
- meta = ul.Metadata(meta)
430
-
431
- # All updatable metadata
432
-
433
- meta.column_labels = {"Q1": "Question 1"}
434
- meta.variable_value_labels = {"Q1": {1: "Yes", 0: "No"}}
435
- meta.variable_measure = {"age": "scale"}
436
- meta.variable_format = {"age": "F3.0", "city_name": "A50"}
437
- meta.variable_display_width = {"city_name": 50,}
438
- meta.missing_ranges = {"Q1": [99], "Q2": [{"lo":998,"hi":999}]}
439
- meta.notes = "Created on 2025-02-15"
440
- meta.file_label = "My Survey 2025"
441
-
442
- # Optionally, use '.update()' to update everything at once
443
- meta = meta.update(
444
- column_labels = {"Q1": "Question 1"},
445
- variable_value_labels = {"Q1": {1: "Yes", 0: "No"}},
446
- variable_measure = {"age": "scale"},
447
- variable_format = {"age": "F3.0", "city_name": "A50"},
448
- ...
449
- )
450
-
451
- # You can update any writable metadata fields supported by pyreadstat.
452
- ```
453
- **Metadata Updating Logic**
454
- - Original metadata is preserved and never destroyed
455
- - User updates overlay on top of originals
456
- - When you set `meta.column_labels = {"Q1": "New Label"}`:
457
- - This updates Q1's column label if there is an existing column label within the original meta.column_labels
458
- - If Q1 is not in the original metadata, then Q1's new label will simply be appended at the bottom of the meta.column_labels dict
459
- - All other column labels remain unchanged
460
- - Original metadata still exists underneath
461
- - This update logic applies to all updatable metadata
462
-
463
- **Note on `variable_value_labels` Update Behavior:**
464
-
465
- When updating `meta.variable_value_labels`, the entire value-label dictionary for a variable is **replaced**, not merged.
466
-
467
- ```python
468
- # Original metadata
469
- meta.variable_value_labels = {"Q1": {1: "Yes", 2: "No", 99: "Unsure"}}
470
-
471
- # User update
472
- meta.variable_value_labels = {"Q1": {1: "Yes", 0: "No"}}
473
-
474
- # Result for Q1 becomes:
475
- {"Q1": {1: "Yes", 0: "No"}} # Previous values 2 and 99 are NOT preserved
476
- ```
477
-
478
- This means:
479
- - Only the value-label pairs explicitly provided in the update are kept
480
- - The entire dictionary for that variable is replaced at once
481
- - Variable-level entries are preserved (e.g., "Q1" still exists), but value-level merging does not occur
482
-
483
- This follows ultrasav's design principle: metadata updates overlay at the variable level — never partially merged — ensuring clean and intentional metadata after each update.
484
-
485
- **Critical Design Choice:**
486
- - When you rename an existing column "Q1" to "Q1a" in data, the associated metadata does not automatically carry over
487
- - You must explicitly provide new metadata for the newly renamed column "Q1a"
488
- - No automatic tracking or mapping between old and new names
489
-
490
-
491
- ### 🦸 Metaman Functions
492
-
493
- #### `make_datamap(df, meta, output_format=None)`
494
- Create a validation datamap from data and metadata.
495
-
496
- ```python
497
- datamap = ul.make_datamap(df, meta)
498
- ```
499
-
500
- #### `map_to_excel(df, file_path, **kwargs)`
501
- Export datamap to formatted Excel with merged cells and alternating colors.
502
-
503
- ```python
504
- ul.map_to_excel(datamap, "report.xlsx") # Saves datamap to Excel
505
- ul.map_to_excel(datamap, "report.xlsx", alternating_group_formats=ul.get_color_scheme("pastel_blue"))
506
- ```
507
-
508
- #### `get_meta(meta, output_path=None, include_all=False)`
509
- Extract metadata to a Python file or dictionary.
510
-
511
- ```python
512
- meta_dict = ul.get_meta(meta) # Returns meta_dict in memory
513
- ul.get_meta(meta, output_path="labels.py") # Saves to file
514
- ```
515
-
516
- #### `make_labels(input_path, output_path=None)`
517
- Create label dictionaries from an Excel template.
518
-
519
- ```python
520
- col_labels, val_labels = ul.make_labels("template.xlsx") # Returns label dicts in memory
521
- col_labels, val_labels = ul.make_labels("template.xlsx", "labels.py") # Saves to file
522
- ```
523
-
524
- #### `detect_variable_type(df, meta, column)`
525
- Detect variable type (single-select, multi-select, numeric, text, date).
526
-
527
- ```python
528
- var_type = ul.detect_variable_type(df, meta, "Q1")
529
- ```
530
-
531
- #### `get_color_scheme(name)`
532
- Get a color scheme for Excel formatting.
533
-
534
- ```python
535
- scheme = ul.get_color_scheme("pastel_blue")
536
- # Options: "classic_grey", "pastel_green", "pastel_blue", "pastel_purple", "pastel_indigo"
537
- ```
538
-
539
- ## ⚡ Why "ultrasav"?
540
-
541
- The name combines "Ultra" (inspired by Ultraman) with "SAV" (SPSS file format), representing the ultra-powerful transformation capabilities of this package. Just like Ultraman's Specium Ray, ultrasav splits and recombines data with precision and power!
542
-
543
- And **metaman**? He's the metadata superhero who swoops in to inspect, validate, and report on your SPSS data! 🦸
544
-
545
- ## 🤝 Contributing
546
-
547
- Contributions are welcome! Please feel free to submit a Pull Request.
548
-
549
- ## 📄 License
550
-
551
- MIT License - see LICENSE file for details.
552
-
553
- ## 🙏 Acknowledgments
554
-
555
- - Built on top of [pyreadstat](https://github.com/Roche/pyreadstat) for SPSS file handling
556
- - Uses [narwhals](https://github.com/MarcoGorelli/narwhals) for dataframe compatibility
557
- - Excel export powered by [xlsxwriter](https://github.com/jmcnamara/XlsxWriter)
558
-
559
- ## 📬 Contact
560
-
561
- - Author: Albert Li
562
-
563
- ## 📄 Version History
564
-
565
- - **0.1.0**: Initial release with two-track architecture for data/metadata separation and metaman submodule for metadata inspection & reporting
1
+ Metadata-Version: 2.3
2
+ Name: ultrasav
3
+ Version: 0.2.4
4
+ Summary: A Python package for working with SPSS/SAV files with two-track architecture separating data and metadata operations
5
+ Keywords: spss,spss labels,spss files,sav files,sav,statistics,data-science,data-processing,survey-data,metadata,spss metadata,pyreadstat,dataframe-agnostic,polars,pandas,read spss,read sav,write spss,write sav,merge spss,merge sav,datamap,spss-datamap,validation,data-quality,tidyspss,metaprinter
6
+ Author: Albert Li
7
+ License: MIT
8
+ Requires-Dist: pandas>=2.2.0
9
+ Requires-Dist: polars>=1.3.0
10
+ Requires-Dist: pyreadstat>=1.3.2
11
+ Requires-Dist: narwhals>=2.11.0
12
+ Requires-Dist: openpyxl>=3.0.0
13
+ Requires-Dist: xlsxwriter>=3.1.0
14
+ Requires-Dist: ultrasav[excel,dev,docs] ; extra == 'all'
15
+ Requires-Dist: pytest>=7.0.0 ; extra == 'dev'
16
+ Requires-Dist: pytest-cov>=4.0.0 ; extra == 'dev'
17
+ Requires-Dist: black>=23.0.0 ; extra == 'dev'
18
+ Requires-Dist: ruff>=0.1.0 ; extra == 'dev'
19
+ Requires-Dist: mypy>=1.0.0 ; extra == 'dev'
20
+ Requires-Dist: pre-commit>=3.0.0 ; extra == 'dev'
21
+ Requires-Dist: sphinx>=6.0.0 ; extra == 'docs'
22
+ Requires-Dist: sphinx-rtd-theme>=1.0.0 ; extra == 'docs'
23
+ Requires-Dist: myst-parser>=1.0.0 ; extra == 'docs'
24
+ Requires-Dist: fastexcel>=0.9.0 ; extra == 'excel'
25
+ Maintainer: Albert Li
26
+ Requires-Python: >=3.11
27
+ Project-URL: Homepage, https://github.com/albertxli/ultrasav
28
+ Project-URL: Documentation, https://ultrasav.readthedocs.io
29
+ Project-URL: Changelog, https://github.com/albertxli/ultrasav/blob/main/CHANGELOG.md
30
+ Provides-Extra: all
31
+ Provides-Extra: dev
32
+ Provides-Extra: docs
33
+ Provides-Extra: excel
34
+ Description-Content-Type: text/markdown
35
+
36
+ # ⚡ultrasav
37
+
38
+ An 'Ultra-powerful' Python package for preparing production-ready SPSS/SAV files using a two-track architecture that separates data and metadata operations.
39
+
40
+
41
+
42
+ ## 💡 Motivation
43
+
44
+ **ultrasav** is built as a thoughtful wrapper around the excellent pyreadstat package. We're not here to reinvent the wheel for reading and writing SAV files - pyreadstat already does that brilliantly!
45
+
46
+ Instead, ultrasav provides additional transformation tools for tasks that are commonly done by folks who work with SAV files regularly:
47
+ - 🏷️ **Rename variables** - Change variable names in batch with clean methodology
48
+ - 🔄 **Recode values** - Transform codes across multiple variables with clean syntax
49
+ - 🏷️ **Update labels** - Batch update variable labels and value labels without losing track
50
+ - 📊 **Reorganize columns** - Move variables to specific positions for standardized layouts
51
+ - 📀 **Merge files intelligently** - Stack survey data while preserving all metadata
52
+ - 🎯 **Handle missing values** - Consistent missing value definitions across datasets
53
+ - 🦸 **Inspect & report metadata** - Generate datamaps and validation reports with metaman
54
+
55
+ ## 🎯 Core Philosophy
56
+
57
+ **ultrasav** follows a simple but powerful principle: **Data and Metadata are two independent layers that only come together at read/write time.**
58
+
59
+ ```
60
+ ┌─────────────┐           ┌─────────────┐
61
+ │    DATA     │           │  METADATA   │
62
+ │  DataFrame  │           │   Labels    │
63
+ │ Operations  │           │   Formats   │
64
+ └─────────────┘           └─────────────┘
65
+        │                         │
66
+        └────────┬────────────────┘
67
+
68
+
69
+          ┌─────────────┐
70
+          │  WRITE SAV  │
71
+          └─────────────┘
72
+ ```
73
+
74
+ ### The Common Problems
75
+
76
+ If you work with SPSS files in Python, you've probably asked yourself:
77
+
78
+ - How do I bulk update variable labels and value labels?
79
+ - How do I quickly relocate variables to ideal positions?
80
+ - How do I merge datasets — and more specifically, how are the labels being merged?
81
+ - How can I see a comprehensive datamap of my data?
82
+ - Most importantly: **How do I prepare a tidy SPSS file with clean labels and metadata that is production-ready?**
83
+
84
+ ultrasav answers all of these.
85
+
86
+ ### The ultrasav Way
87
+
88
+ ```python
89
+ import ultrasav as ul
90
+
91
+ # Read → splits into two independent tracks
92
+ df, meta = ul.read_sav("survey.sav")
93
+
94
+ # Track 1 - Data: Transform data freely
95
+ data = ul.Data(df) # Wrap df into our Data class
96
+ df = data.move(first=['id']).rename({'Q1': 'satisfaction'}).replace({'satisfaction': {6: 99}}).to_native()
97
+
98
+ # Track 2 - Metadata: Update metadata independently (immutable - returns NEW object)
99
+ meta = ul.Metadata(meta) # Wrap meta into our Metadata class
100
+ meta = meta.update(
101
+ column_labels={'satisfaction': 'Overall satisfaction'},
102
+ variable_value_labels={'recommend': {0: 'No', 1: 'Yes'}}
103
+ )
104
+
105
+ # Convergence: Reunite at write time
106
+ ul.write_sav(df, meta, "clean_survey.sav")
107
+ ```
108
+
109
+ The goal is to provide you with a **clean and easy-to-understand way** to transform your SPSS data that you can use in real production workflows with minimal tweaking.
110
+
111
+ ### 🚀 DataFrame-Agnostic Design
112
+
113
+ One of ultrasav's superpowers is being **dataframe-agnostic** — it works seamlessly with both **polars** and **pandas** thanks to [narwhals](https://github.com/MarcoGorelli/narwhals) under the hood:
114
+
115
+ - 🐻‍❄️ **Polars by default** - Blazing fast performance out of the box
116
+ - 🐼 **Pandas fully supported** - Use `output_format="pandas"` when needed
117
+ - 🔄 **Switch freely** - Convert between pandas and polars anytime
118
+ - 🔧 **Future-proof** - Ready for whatever dataframe library comes next
119
+
120
+ **Default output format: Polars** All operations return polars DataFrames by default for blazing-fast performance. Pandas is fully supported via the `output_format="pandas"` parameter.
121
+
122
+ ```python
123
+ import ultrasav as ul
124
+
125
+ # Polars by default
126
+ df_pl, meta = ul.read_sav("survey.sav", output_format="polars")
127
+
128
+ # Or explicitly request pandas
129
+ df_pd, meta = ul.read_sav("survey.sav", output_format="pandas")
130
+
131
+ # The Data class works with either
132
+ data = ul.Data(df_pl) # Works with both Polars and pandas!
133
+
134
+ # Transform using ultrasav's consistent API
135
+ data = data.rename({"Q1": "satisfaction"}).replace({'satisfaction': {6: 99}})
136
+ df_native = data.to_native() # Get back your polars DataFrame
137
+ ```
138
+
139
+ ### Who Is This For?
140
+
141
+ - 📊 **Market Researchers** - Merge waves, standardize labels, prepare deliverables
142
+ - 🔬 **Data Scientists** - Clean survey data, prepare features, maintain metadata
143
+ - 🏭 **Data Engineers** - Build robust pipelines that preserve SPSS metadata
144
+ - 🎓 **Academic Researchers** - Manage longitudinal studies, harmonize datasets
145
+ - 📈 **Anyone working with SPSS** - If you use SAV files regularly, this is for you!
146
+
147
+ ## 🚀 Installation
148
+
149
+ ```bash
150
+ # Using uv
151
+ uv add ultrasav
152
+
153
+ # Or using pip
154
+ pip install ultrasav
155
+ ```
156
+
157
+ ## 📚 Quick Start
158
+
159
+ ### Basic Usage
160
+
161
+ ```python
162
+ import ultrasav as ul
163
+
164
+ # Read SPSS file - automatically splits into data and metadata
165
+ df, meta = ul.read_sav("survey.sav")
166
+ # Note: You can also use pyreadstat directly - our classes work with pyreadstat meta objects too
167
+
168
+ # Track 1: Process data independently
169
+ data = ul.Data(df) # Wrap in Data class for transformations
170
+ data = data.move(first=["ID", "Date"]) # Reorder columns
171
+ data = data.rename({"Q1": "Satisfaction"}) # Rename columns
172
+ data = data.replace({"Satisfaction": {99: None}}) # Replace values
173
+ df = data.to_native() # Back to native DataFrame
174
+
175
+ # Track 2: Process metadata independently (immutable updates)
176
+ meta = ul.Metadata(meta)
177
+ meta = meta.update(
178
+ column_labels={"Satisfaction": "Customer Satisfaction Score"},
179
+ variable_value_labels={
180
+ "Satisfaction": {1: "Very Dissatisfied", 5: "Very Satisfied"}
181
+ },
182
+ variable_measure={
183
+ 'Satisfaction': 'ordinal',
184
+ 'Gender': 'nominal',
185
+ 'Age': 'scale',
186
+ }
187
+ )
188
+
189
+ # Convergence: Write both tracks to SPSS
190
+ ul.write_sav(df, meta, "cleaned_survey.sav")
191
+ ```
192
+
193
+ ### Merging Files
194
+
195
+ ```python
196
+ import ultrasav as ul
197
+
198
+ # Merge multiple files vertically with automatic metadata handling
199
+ df, meta = ul.add_cases([
200
+ "wave1.sav",
201
+ "wave2.sav",
202
+ "wave3.sav"
203
+ ])
204
+
205
+ # Metadata is automatically preserved from top to bottom.
206
+ # A source-tracking column is automatically added to show each row's origin.
207
+ # Example: mrgsrc: ["wave1.sav", "wave2.sav", "wave3.sav"]
208
+
209
+ ul.write_sav(df, meta, "merged_output.sav")
210
+ ```
211
+
212
+ ### Advanced Merging
213
+
214
+ ```python
215
+ import ultrasav as ul
216
+
217
+ # Use specific metadata template for all files
218
+ standard_meta = ul.Metadata() # Create an empty meta object
219
+ standard_meta = standard_meta.update(
220
+ column_labels={"Q1": "Satisfaction", "Q2": "Loyalty"},
221
+ variable_value_labels={
222
+ "Satisfaction": {1: "Very Dissatisfied", 5: "Very Satisfied"}
223
+ }
224
+ )
225
+
226
+ data, meta = ul.add_cases(
227
+ inputs=["file1.sav", "file2.sav", "file3.csv"],
228
+ meta=[standard_meta], # Apply this metadata to merged data
229
+ source_col="mrgsrc", # Auto append column 'mrgsrc' to track source files
230
+ output_format="polars" # Explicit format (polars is default)
231
+ )
232
+ ```
233
+
234
+ ### Writing Back
235
+
236
+ ```python
237
+ # Read SPSS file
238
+ df, meta = ul.read_sav("huge_survey.sav")
239
+
240
+ # All ultrasav operations work the same
241
+ df = ul.Data(df).rename({"Q1": "satisfaction"}).drop(["unused_var"]).to_native()
242
+
243
+ # Efficient write-back
244
+ # Simply provide the 'meta' object; labels and formats are applied automatically.
245
+ # Compatible with both ultrasav and pyreadstat meta objects.
246
+ ul.write_sav(df, meta, "processed_data.sav")
247
+
248
+ # For compressed output, use .zsav extension with compress=True
249
+ meta = ul.Metadata(meta).update(compress=True)
250
+ ul.write_sav(df, meta, "compressed_data.zsav")
251
+ ```
252
+
253
+ ## 🦸 Metaman: The Metadata Submodule
254
+
255
+ ultrasav includes **metaman**, a powerful submodule for metadata inspection, extraction, and reporting. All metaman functions are accessible directly from the top-level `ul` namespace.
256
+
257
+ ### Generate Validation Datamaps
258
+
259
+ Create comprehensive datamaps showing variable types, value distributions, and data quality metrics:
260
+
261
+ ```python
262
+ import ultrasav as ul
263
+
264
+ df, meta = ul.read_sav("survey.sav")
265
+
266
+ # Create a validation datamap
267
+ datamap = ul.make_datamap(df, meta)
268
+
269
+ # Export to beautifully formatted Excel
270
+ # This function supports polars only at the moment
271
+ ul.map_to_excel(datamap, "validation_report.xlsx")
272
+
273
+ # Use custom color schemes
274
+ ul.map_to_excel(
275
+ datamap,
276
+ "validation_report.xlsx",
277
+ alternating_group_formats=ul.get_color_scheme("pastel_blue")
278
+ )
279
+ ```
280
+
281
+ The datamap includes:
282
+ - Variable names and labels
283
+ - Variable types (single-select, multi-select, numeric, text, date)
284
+ - Value codes and labels
285
+ - Value counts and percentages
286
+ - Missing data flags
287
+ - Missing value label detection
288
+
289
+ **Note: Variable types are inferred from both SPSS data and metadata on a best-effort basis and may not always perfectly reflect the true underlying types.**
290
+
291
+ ### Extract Metadata to Python Files
292
+
293
+ Save existing metadata (if any) from a sav file as importable Python dictionaries for reuse across projects:
294
+
295
+ ```python
296
+ import ultrasav as ul
297
+
298
+ df, meta = ul.read_sav("survey.sav")
299
+
300
+ # Extract metadata (labels) to in-memory python object
301
+ meta_dict = ul.get_meta(meta)
302
+
303
+ # Extract and save ALL metadata to a Python file
304
+ meta_dict = ul.get_meta(meta, include_all=True, output_path="survey_labels.py")
305
+ ```
306
+
307
+ ### Create Labels from Excel Templates
308
+
309
+ Build label dictionaries from scratch using Excel templates - perfect for translating surveys or standardizing labels:
310
+
311
+ ```python
312
+ import ultrasav as ul
313
+
314
+ # Excel file with 'col_label' and 'value_label' sheets
315
+ col_labels, val_labels = ul.make_labels(
316
+ input_path="label_template.xlsx",
317
+ output_path="translated_labels.py" # optional
318
+ )
319
+ ```
320
+
321
+ **Excel Structure:**
322
+
323
+ Your Excel file should have two sheets:
324
+
325
+ 1. **Column Labels Sheet** (default sheet name: "col_label"):
326
+ | variable | label |
327
+ |----------|-------|
328
+ | age | Age of respondent |
329
+ | gender | Gender |
330
+ | income | Annual household income |
331
+
332
+ 2. **Value Labels Sheet** (default sheet name: "value_label"):
333
+ | variable | value | label |
334
+ |----------|-------|-------|
335
+ | gender | 1 | Male |
336
+ | gender | 2 | Female |
337
+ | income | 1 | Under $25k |
338
+ | income | 2 | $25k-50k |
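
The two sheets map naturally onto two dictionaries. The sketch below assumes the results follow the same shapes as `column_labels` and `variable_value_labels` used elsewhere in this README; that is an assumption, since the exact return structure of `make_labels` is not spelled out here. It uses plain tuples holding the sample rows from the tables above instead of reading an Excel file.

```python
# Sample rows copied from the two sheet layouts above.
col_rows = [
    ("age", "Age of respondent"),
    ("gender", "Gender"),
    ("income", "Annual household income"),
]
val_rows = [
    ("gender", 1, "Male"),
    ("gender", 2, "Female"),
    ("income", 1, "Under $25k"),
    ("income", 2, "$25k-50k"),
]

# One label per variable: {variable: label}
col_labels = dict(col_rows)

# One nested dict per variable: {variable: {value: label}}
val_labels = {}
for variable, value, label in val_rows:
    val_labels.setdefault(variable, {})[value] = label

print(col_labels)
print(val_labels)
```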
339
+
340
+ ## 📖 API Reference
341
+
342
+ ### Core Functions
343
+
344
+ #### `read_sav(filepath, output_format="polars")`
345
+ Read SPSS file and return separated data and metadata.
346
+ This is a wrapper around `pyreadstat.read_sav` with some additional encoding handling.
347
+
348
+ ```python
349
+ df, meta = ul.read_sav("survey.sav")
350
+ ```
351
+
352
+ #### `write_sav(data, meta, filepath, **overrides)`
353
+ Write data and metadata to SPSS file.
354
+
355
+ ```python
356
+ ul.write_sav(df, meta, "processed_data.sav")
357
+
358
+ # With compression (must use .zsav extension)
359
+ meta_compressed = ul.Metadata(meta).update(compress=True)
360
+ ul.write_sav(df, meta_compressed, "compressed_data.zsav")
361
+ ```
362
+
363
+ **Compression Validation:** When `compress=True` in metadata, the destination file must have a `.zsav` extension. A `ValueError` is raised if you attempt to write a compressed file with a `.sav` extension.
364
+
365
+ ```python
366
+ # This will raise ValueError
367
+ meta = ul.Metadata().update(compress=True)
368
+ ul.write_sav(df, meta, "output.sav") # ❌ Wrong extension!
369
+ # ValueError: Metadata has compress=True but destination file 'output.sav'
370
+ # has extension '.sav'. Compressed SPSS files must use the '.zsav' extension.
371
+
372
+ # Correct usage
373
+ ul.write_sav(df, meta, "output.zsav") # ✅ Correct
374
+ ```
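
For readers who want the same guard in their own pipeline code before calling `write_sav`, the rule can be sketched with the standard library. `check_compress_extension` is a hypothetical helper written for this illustration, not part of ultrasav, and it covers only the case described above (compressed output going to a non-`.zsav` path).

```python
from pathlib import Path


def check_compress_extension(dst, compress):
    """Hypothetical pre-flight check: compressed output must end in '.zsav'."""
    ext = Path(dst).suffix
    if compress and ext != ".zsav":
        raise ValueError(
            f"compress=True but destination file '{dst}' has extension '{ext}'. "
            "Compressed SPSS files must use the '.zsav' extension."
        )


# Matching extension: no error is raised.
check_compress_extension("output.zsav", compress=True)

# Mismatched extension: the check raises ValueError.
try:
    check_compress_extension("output.sav", compress=True)
except ValueError as exc:
    print(exc)
```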
375
+
376
+ #### `add_cases(inputs, meta=None, source_col="mrgsrc")`
377
+ Merge multiple files/dataframes vertically with metadata handling. Returns the merged data and metadata.
378
+
379
+ ```python
380
+ df_merged, meta_merged = ul.add_cases(["wave1.sav", "wave2.sav", "wave3.sav"])
381
+ ```
382
+
383
+ ### Classes
384
+
385
+ #### `Data`
386
+ Handles all dataframe operations while maintaining compatibility with both Polars and pandas.
387
+
388
+ ```python
389
+ import ultrasav as ul
390
+
391
+ df, meta = ul.read_sav("survey.sav") # Returns a Polars DataFrame and meta object
392
+
393
+ # Convert polars or pandas df into our ul.Data() class
394
+ data = ul.Data(df)
395
+
396
+ # Data Class Methods
397
+ # move - to relocate columns
398
+ data = data.move(
399
+ first=['respondent_id'],
400
+ last=['timestamp'],
401
+ before={'age': 'gender'}, # place 'age' column before 'gender'
402
+ after={'wave': ['age', 'gender', 'income']} # place demographic columns after 'wave'
403
+ )
404
+
405
+ # rename - to rename columns
406
+ data = data.rename({"old": "new"})
407
+
408
+ # replace - to replace/recode values
409
+ data = data.replace({"col": {1: 100}})
410
+
411
+ # select - to select columns
412
+ data = data.select(['age', 'gender'])
413
+
414
+ # drop - to drop columns
415
+ data = data.drop(['id', 'language'])
416
+
417
+ # to_native - to return ul.Data(df) back to its native dataframe
418
+ df = data.to_native() # Get back Polars/pandas DataFrame
419
+
420
+ # Optionally, use chaining for cleaner code
421
+ df = (
422
+ ul.Data(df)
423
+ .move(first=['respondent_id'])
424
+ .rename({"old": "new"})
425
+ .replace({"col": {1: 100}})
426
+ .select(['age', 'gender'])
427
+ .drop(['id', 'language'])
428
+ .to_native()
429
+ )
430
+ ```
431
+
432
+ #### `Metadata`
433
+ Manages all SPSS metadata independently from data. Uses **immutable updates**: every update operation returns a NEW Metadata object, and nothing is modified in place.
434
+
435
+ ```python
436
+ import ultrasav as ul
437
+
438
+ df, meta = ul.read_sav("survey.sav")
439
+
440
+ meta = ul.Metadata(meta)
441
+
442
+ # Use .update() to update metadata (returns NEW object)
443
+ meta = meta.update(
444
+ column_labels={"Q1": "Question 1"},
445
+ variable_value_labels={"Q1": {1: "Yes", 0: "No"}},
446
+ variable_measure={"age": "scale"},
447
+ variable_format={"age": "F3.0", "city_name": "A50"},
448
+ variable_display_width={"city_name": 50},
449
+ missing_ranges={"Q1": [99], "Q2": [{"lo": 998, "hi": 999}]},
450
+ note="Created on 2025-02-15",
451
+ file_label="My Survey 2025",
452
+ compress=False, # Set to True for .zsav output
453
+ row_compress=False
454
+ )
455
+
456
+ # Or use convenience with_*() methods for single updates
457
+ meta = meta.with_column_labels({"Q2": "Question 2"})
458
+ meta = meta.with_file_label("Updated Survey 2025")
459
+ meta = meta.with_compress(True) # For .zsav output
460
+
461
+ # Chain multiple updates
462
+ meta = (meta
463
+ .with_column_labels({"Q1": "Question 1"})
464
+ .with_variable_measure({"Q1": "nominal"})
465
+ .with_file_label("My Survey 2025")
466
+ )
467
+
468
+ # Access metadata properties (read-only)
469
+ print(meta.column_labels) # {'Q1': 'Question 1', ...}
470
+ print(meta.variable_value_labels) # {'Q1': {1: 'Yes', 0: 'No'}, ...}
471
+ print(meta.compress) # True/False
472
+ ```
473
+
474
+ **Immutable Design:**
475
+ - Original metadata is preserved and never destroyed
476
+ - All `update()` and `with_*()` methods return NEW Metadata objects
477
+ - The original object remains unchanged
478
+
479
+ ```python
480
+ meta1 = ul.Metadata(meta)
481
+ meta2 = meta1.update(column_labels={"Q1": "New Label"})
482
+ # meta1 is UNCHANGED, meta2 has the update
483
+ ```
484
+
485
+ **Metadata Updating Logic:**
486
+ - User updates overlay on top of originals
487
+ - When you update `column_labels={"Q1": "New Label"}`:
488
+ - This updates Q1's column label if there is an existing column label
489
+ - If Q1 is not in the original metadata, Q1's new label will be appended
490
+ - All other column labels remain unchanged
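
The overlay rule for column labels, combined with the immutable-update pattern, can be sketched in plain Python. This is an illustration of the behavior described above, not ultrasav's implementation; `LabelStore` and its `update` method are hypothetical names.

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class LabelStore:
    """Hypothetical stand-in for a metadata object holding column labels."""

    column_labels: dict = field(default_factory=dict)

    def update(self, column_labels):
        # Overlay the user's labels on a copy of the originals and
        # return a new object; `self` is never modified.
        merged = {**self.column_labels, **column_labels}
        return replace(self, column_labels=merged)


meta1 = LabelStore({"Q1": "Old Label", "Q2": "Question 2"})
meta2 = meta1.update({"Q1": "New Label", "Q3": "Question 3"})

print(meta1.column_labels)  # originals untouched
print(meta2.column_labels)  # Q1 overwritten, Q2 kept, Q3 appended
```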
491
+
492
+ **Note on `variable_value_labels` Update Behavior:**
493
+
494
+ When updating `variable_value_labels`, the entire value-label dictionary for a variable is **replaced**, not merged.
495
+
496
+ ```python
497
+ # Original metadata
498
+ meta = ul.Metadata({"variable_value_labels": {"Q1": {1: "Yes", 2: "No", 99: "Unsure"}}})
499
+
500
+ # User update
501
+ meta = meta.update(variable_value_labels={"Q1": {1: "Yes", 0: "No"}})
502
+
503
+ # Result for Q1 becomes:
504
+ {"Q1": {1: "Yes", 0: "No"}} # Previous values 2 and 99 are NOT preserved
505
+ ```
506
+
507
+ This means:
508
+ - Only the value-label pairs explicitly provided in the update are kept
509
+ - The entire dictionary for that variable is replaced at once
510
+ - Variable-level entries are preserved (e.g., "Q1" still exists), but value-level merging does not occur
511
+
512
+ This follows ultrasav's design principle: metadata updates overlay at the variable level and are never partially merged, which keeps the metadata clean and intentional after each update.
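
In plain-dictionary terms, a variable-level overlay behaves like a shallow merge: keys are matched at the variable level, and each matched variable's nested dictionary is swapped out whole. The snippet below is a minimal sketch of that semantics using ordinary dicts. It does not call ultrasav, and the `Q2` entry is made-up sample data.

```python
original = {
    "Q1": {1: "Yes", 2: "No", 99: "Unsure"},
    "Q2": {1: "Agree", 2: "Disagree"},
}
update = {"Q1": {1: "Yes", 0: "No"}}

# Shallow merge: Q1's whole value-label dict is replaced, Q2 is left alone.
result = {**original, **update}

print(result["Q1"])  # only the pairs supplied in the update
print(result["Q2"])  # untouched by the update
```

A deep (value-level) merge would instead have kept codes 2 and 99 for `Q1`, which is exactly what the design above avoids.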
513
+
514
+ **Critical Design Choice:**
515
+ - When you rename an existing column "Q1" to "Q1a" in data, the associated metadata does not automatically carry over
516
+ - You must explicitly provide new metadata for the newly renamed column "Q1a"
517
+ - No automatic tracking or mapping between old and new names
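
Because renames are not tracked, one way to keep labels in step is to re-key your own label dictionaries with the same mapping you pass to `rename`. The helper below is a hypothetical sketch in plain Python, not an ultrasav function.

```python
def rekey_labels(labels, rename_map):
    """Hypothetical helper: copy `labels`, renaming keys found in `rename_map`."""
    return {rename_map.get(name, name): value for name, value in labels.items()}


rename_map = {"Q1": "Q1a"}
column_labels = {"Q1": "Question 1", "Q2": "Question 2"}
value_labels = {"Q1": {1: "Yes", 0: "No"}}

new_column_labels = rekey_labels(column_labels, rename_map)
new_value_labels = rekey_labels(value_labels, rename_map)

print(new_column_labels)
print(new_value_labels)
```

The re-keyed dictionaries could then be supplied through `meta.update(column_labels=..., variable_value_labels=...)` alongside `data.rename(rename_map)`.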
518
+
519
+
520
+ ### 🦸 Metaman Functions
521
+
522
+ #### `make_datamap(df, meta, output_format=None)`
523
+ Create a validation datamap from data and metadata.
524
+
525
+ ```python
526
+ datamap = ul.make_datamap(df, meta)
527
+ ```
528
+
529
+ #### `map_to_excel(df, file_path, **kwargs)`
530
+ Export datamap to formatted Excel with merged cells and alternating colors.
531
+
532
+ ```python
533
+ ul.map_to_excel(datamap, "report.xlsx") # Saves datamap to Excel
534
+ ul.map_to_excel(datamap, "report.xlsx", alternating_group_formats=ul.get_color_scheme("pastel_blue"))
535
+ ```
536
+
537
+ #### `get_meta(meta, output_path=None, include_all=False)`
538
+ Extract metadata to a Python file or dictionary.
539
+
540
+ ```python
541
+ meta_dict = ul.get_meta(meta) # Returns meta_dict in memory
542
+ ul.get_meta(meta, output_path="labels.py") # Saves to file
543
+ ```
544
+
545
+ #### `make_labels(input_path, output_path=None)`
546
+ Create label dictionaries from an Excel template.
547
+
548
+ ```python
549
+ col_labels, val_labels = ul.make_labels("template.xlsx") # Returns label dicts in memory
550
+ col_labels, val_labels = ul.make_labels("template.xlsx", "labels.py") # Saves to file
551
+ ```
552
+
553
+ #### `detect_variable_type(df, meta, column)`
554
+ Detect variable type (single-select, multi-select, numeric, text, date).
555
+
556
+ ```python
557
+ var_type = ul.detect_variable_type(df, meta, "Q1")
558
+ ```
559
+
560
+ #### `get_color_scheme(name)`
561
+ Get a color scheme for Excel formatting.
562
+
563
+ ```python
564
+ scheme = ul.get_color_scheme("pastel_blue")
565
+ # Options: "classic_grey", "pastel_green", "pastel_blue", "pastel_purple", "pastel_indigo"
566
+ ```
567
+
568
+ #### `describe(df, meta, columns)`
569
+
570
+ Print a quick variable summary, including variable metadata and value distributions:
571
+
572
+ ```python
573
+ # Single variable
574
+ ul.describe(df, meta, "Q1")
575
+
576
+ # Multiple variables
577
+ ul.describe(df, meta, ["Q1", "Q2", "Q3"])
578
+
579
+ # Get summary dict without printing
580
+ summary = ul.describe(df, meta, "Q1", print_output=False)
581
+ ```
582
+
583
+ ## ⚡ Why "ultrasav"?
584
+
585
+ The name combines "Ultra" (super-powered) with "SAV" (SPSS file format), representing the ultra-powerful transformation capabilities of this package. Just like Ultraman's Specium Ray, ultrasav splits and recombines data with precision and power!
586
+
587
+ And **metaman**? He's the metadata superhero who swoops in to inspect, validate, and report on your SPSS data! 🦸
588
+
589
+
590
+ ## 📄 License
591
+
592
+ MIT License - see LICENSE file for details.
593
+
594
+ ## 🙏 Acknowledgments
595
+
596
+ - Built on top of [pyreadstat](https://github.com/Roche/pyreadstat) for SPSS file handling
597
+ - Uses [narwhals](https://github.com/MarcoGorelli/narwhals) for dataframe compatibility
598
+ - Excel export powered by [xlsxwriter](https://github.com/jmcnamara/XlsxWriter)