pyreadstat 1.2.8__tar.gz → 1.3.0__tar.gz
Potentially problematic release: this version of pyreadstat might be problematic.
- {pyreadstat-1.2.8/pyreadstat.egg-info → pyreadstat-1.3.0}/PKG-INFO +1 -1
- {pyreadstat-1.2.8 → pyreadstat-1.3.0}/README.md +12 -6
- {pyreadstat-1.2.8 → pyreadstat-1.3.0}/pyreadstat/__init__.py +1 -1
- {pyreadstat-1.2.8 → pyreadstat-1.3.0}/pyreadstat/_readstat_parser.c +2078 -2023
- {pyreadstat-1.2.8 → pyreadstat-1.3.0}/pyreadstat/_readstat_parser.pxd +1 -1
- {pyreadstat-1.2.8 → pyreadstat-1.3.0}/pyreadstat/_readstat_parser.pyx +12 -2
- {pyreadstat-1.2.8 → pyreadstat-1.3.0}/pyreadstat/_readstat_writer.c +4520 -3814
- {pyreadstat-1.2.8 → pyreadstat-1.3.0}/pyreadstat/_readstat_writer.pxd +2 -1
- {pyreadstat-1.2.8 → pyreadstat-1.3.0}/pyreadstat/_readstat_writer.pyx +62 -11
- {pyreadstat-1.2.8 → pyreadstat-1.3.0}/pyreadstat/pyreadstat.c +2217 -2137
- {pyreadstat-1.2.8 → pyreadstat-1.3.0}/pyreadstat/pyreadstat.pyx +28 -18
- {pyreadstat-1.2.8 → pyreadstat-1.3.0}/pyreadstat/readstat_api.pxd +7 -2
- {pyreadstat-1.2.8 → pyreadstat-1.3.0/pyreadstat.egg-info}/PKG-INFO +1 -1
- {pyreadstat-1.2.8 → pyreadstat-1.3.0}/setup.py +1 -1
- {pyreadstat-1.2.8 → pyreadstat-1.3.0}/src/sas/readstat_sas.c +7 -2
- {pyreadstat-1.2.8 → pyreadstat-1.3.0}/src/sas/readstat_sas7bcat_read.c +12 -2
- {pyreadstat-1.2.8 → pyreadstat-1.3.0}/src/sas/readstat_sas7bcat_write.c +8 -2
- {pyreadstat-1.2.8 → pyreadstat-1.3.0}/src/sas/readstat_sas7bdat_read.c +13 -5
- {pyreadstat-1.2.8 → pyreadstat-1.3.0}/src/sas/readstat_xport_write.c +9 -5
- {pyreadstat-1.2.8 → pyreadstat-1.3.0}/src/spss/readstat_por_read.c +1 -2
- {pyreadstat-1.2.8 → pyreadstat-1.3.0}/src/spss/readstat_por_write.c +2 -3
- pyreadstat-1.3.0/src/spss/readstat_sav_parse.c +843 -0
- pyreadstat-1.3.0/src/spss/readstat_sav_parse_mr_name.c +546 -0
- {pyreadstat-1.2.8 → pyreadstat-1.3.0}/src/spss/readstat_sav_read.c +9 -12
- {pyreadstat-1.2.8 → pyreadstat-1.3.0}/tests/test_basic.py +13 -0
- pyreadstat-1.2.8/src/spss/readstat_sav_parse.c +0 -872
- pyreadstat-1.2.8/src/spss/readstat_sav_parse_mr_name.c +0 -468
- {pyreadstat-1.2.8 → pyreadstat-1.3.0}/LICENSE +0 -0
- {pyreadstat-1.2.8 → pyreadstat-1.3.0}/MANIFEST.in +0 -0
- {pyreadstat-1.2.8 → pyreadstat-1.3.0}/pyproject.toml +0 -0
- {pyreadstat-1.2.8 → pyreadstat-1.3.0}/pyreadstat/conditional_includes.h +0 -0
- {pyreadstat-1.2.8 → pyreadstat-1.3.0}/pyreadstat/pyfunctions.py +0 -0
- {pyreadstat-1.2.8 → pyreadstat-1.3.0}/pyreadstat/pyreadstat.pxd +0 -0
- {pyreadstat-1.2.8 → pyreadstat-1.3.0}/pyreadstat/worker.py +0 -0
- {pyreadstat-1.2.8 → pyreadstat-1.3.0}/pyreadstat.egg-info/SOURCES.txt +0 -0
- {pyreadstat-1.2.8 → pyreadstat-1.3.0}/pyreadstat.egg-info/dependency_links.txt +0 -0
- {pyreadstat-1.2.8 → pyreadstat-1.3.0}/pyreadstat.egg-info/requires.txt +0 -0
- {pyreadstat-1.2.8 → pyreadstat-1.3.0}/pyreadstat.egg-info/top_level.txt +0 -0
- {pyreadstat-1.2.8 → pyreadstat-1.3.0}/setup.cfg +0 -0
- {pyreadstat-1.2.8 → pyreadstat-1.3.0}/src/CKHashTable.c +0 -0
- {pyreadstat-1.2.8 → pyreadstat-1.3.0}/src/CKHashTable.h +0 -0
- {pyreadstat-1.2.8 → pyreadstat-1.3.0}/src/readstat.h +0 -0
- {pyreadstat-1.2.8 → pyreadstat-1.3.0}/src/readstat_bits.c +0 -0
- {pyreadstat-1.2.8 → pyreadstat-1.3.0}/src/readstat_bits.h +0 -0
- {pyreadstat-1.2.8 → pyreadstat-1.3.0}/src/readstat_convert.c +0 -0
- {pyreadstat-1.2.8 → pyreadstat-1.3.0}/src/readstat_convert.h +0 -0
- {pyreadstat-1.2.8 → pyreadstat-1.3.0}/src/readstat_error.c +0 -0
- {pyreadstat-1.2.8 → pyreadstat-1.3.0}/src/readstat_iconv.h +0 -0
- {pyreadstat-1.2.8 → pyreadstat-1.3.0}/src/readstat_io_unistd.c +0 -0
- {pyreadstat-1.2.8 → pyreadstat-1.3.0}/src/readstat_io_unistd.h +0 -0
- {pyreadstat-1.2.8 → pyreadstat-1.3.0}/src/readstat_malloc.c +0 -0
- {pyreadstat-1.2.8 → pyreadstat-1.3.0}/src/readstat_malloc.h +0 -0
- {pyreadstat-1.2.8 → pyreadstat-1.3.0}/src/readstat_metadata.c +0 -0
- {pyreadstat-1.2.8 → pyreadstat-1.3.0}/src/readstat_parser.c +0 -0
- {pyreadstat-1.2.8 → pyreadstat-1.3.0}/src/readstat_strings.h +0 -0
- {pyreadstat-1.2.8 → pyreadstat-1.3.0}/src/readstat_value.c +0 -0
- {pyreadstat-1.2.8 → pyreadstat-1.3.0}/src/readstat_variable.c +0 -0
- {pyreadstat-1.2.8 → pyreadstat-1.3.0}/src/readstat_writer.c +0 -0
- {pyreadstat-1.2.8 → pyreadstat-1.3.0}/src/readstat_writer.h +0 -0
- {pyreadstat-1.2.8 → pyreadstat-1.3.0}/src/sas/ieee.c +0 -0
- {pyreadstat-1.2.8 → pyreadstat-1.3.0}/src/sas/ieee.h +0 -0
- {pyreadstat-1.2.8 → pyreadstat-1.3.0}/src/sas/readstat_sas.h +0 -0
- {pyreadstat-1.2.8 → pyreadstat-1.3.0}/src/sas/readstat_sas7bdat_write.c +0 -0
- {pyreadstat-1.2.8 → pyreadstat-1.3.0}/src/sas/readstat_sas_rle.c +0 -0
- {pyreadstat-1.2.8 → pyreadstat-1.3.0}/src/sas/readstat_sas_rle.h +0 -0
- {pyreadstat-1.2.8 → pyreadstat-1.3.0}/src/sas/readstat_xport.c +0 -0
- {pyreadstat-1.2.8 → pyreadstat-1.3.0}/src/sas/readstat_xport.h +0 -0
- {pyreadstat-1.2.8 → pyreadstat-1.3.0}/src/sas/readstat_xport_parse_format.c +0 -0
- {pyreadstat-1.2.8 → pyreadstat-1.3.0}/src/sas/readstat_xport_parse_format.h +0 -0
- {pyreadstat-1.2.8 → pyreadstat-1.3.0}/src/sas/readstat_xport_read.c +0 -0
- {pyreadstat-1.2.8 → pyreadstat-1.3.0}/src/spss/readstat_por.c +0 -0
- {pyreadstat-1.2.8 → pyreadstat-1.3.0}/src/spss/readstat_por.h +0 -0
- {pyreadstat-1.2.8 → pyreadstat-1.3.0}/src/spss/readstat_por_parse.c +0 -0
- {pyreadstat-1.2.8 → pyreadstat-1.3.0}/src/spss/readstat_por_parse.h +0 -0
- {pyreadstat-1.2.8 → pyreadstat-1.3.0}/src/spss/readstat_sav.c +0 -0
- {pyreadstat-1.2.8 → pyreadstat-1.3.0}/src/spss/readstat_sav.h +0 -0
- {pyreadstat-1.2.8 → pyreadstat-1.3.0}/src/spss/readstat_sav_compress.c +0 -0
- {pyreadstat-1.2.8 → pyreadstat-1.3.0}/src/spss/readstat_sav_compress.h +0 -0
- {pyreadstat-1.2.8 → pyreadstat-1.3.0}/src/spss/readstat_sav_parse.h +0 -0
- {pyreadstat-1.2.8 → pyreadstat-1.3.0}/src/spss/readstat_sav_parse_mr_name.h +0 -0
- {pyreadstat-1.2.8 → pyreadstat-1.3.0}/src/spss/readstat_sav_parse_timestamp.c +0 -0
- {pyreadstat-1.2.8 → pyreadstat-1.3.0}/src/spss/readstat_sav_parse_timestamp.h +0 -0
- {pyreadstat-1.2.8 → pyreadstat-1.3.0}/src/spss/readstat_sav_write.c +0 -0
- {pyreadstat-1.2.8 → pyreadstat-1.3.0}/src/spss/readstat_spss.c +0 -0
- {pyreadstat-1.2.8 → pyreadstat-1.3.0}/src/spss/readstat_spss.h +0 -0
- {pyreadstat-1.2.8 → pyreadstat-1.3.0}/src/spss/readstat_spss_parse.c +0 -0
- {pyreadstat-1.2.8 → pyreadstat-1.3.0}/src/spss/readstat_spss_parse.h +0 -0
- {pyreadstat-1.2.8 → pyreadstat-1.3.0}/src/spss/readstat_zsav_compress.c +0 -0
- {pyreadstat-1.2.8 → pyreadstat-1.3.0}/src/spss/readstat_zsav_compress.h +0 -0
- {pyreadstat-1.2.8 → pyreadstat-1.3.0}/src/spss/readstat_zsav_read.c +0 -0
- {pyreadstat-1.2.8 → pyreadstat-1.3.0}/src/spss/readstat_zsav_read.h +0 -0
- {pyreadstat-1.2.8 → pyreadstat-1.3.0}/src/spss/readstat_zsav_write.c +0 -0
- {pyreadstat-1.2.8 → pyreadstat-1.3.0}/src/spss/readstat_zsav_write.h +0 -0
- {pyreadstat-1.2.8 → pyreadstat-1.3.0}/src/stata/readstat_dta.c +0 -0
- {pyreadstat-1.2.8 → pyreadstat-1.3.0}/src/stata/readstat_dta.h +0 -0
- {pyreadstat-1.2.8 → pyreadstat-1.3.0}/src/stata/readstat_dta_parse_timestamp.c +0 -0
- {pyreadstat-1.2.8 → pyreadstat-1.3.0}/src/stata/readstat_dta_parse_timestamp.h +0 -0
- {pyreadstat-1.2.8 → pyreadstat-1.3.0}/src/stata/readstat_dta_read.c +0 -0
- {pyreadstat-1.2.8 → pyreadstat-1.3.0}/src/stata/readstat_dta_write.c +0 -0
- {pyreadstat-1.2.8 → pyreadstat-1.3.0}/tests/test_version.py +0 -0
@@ -333,7 +333,8 @@ df, meta = pyreadstat.read_sas7bdat('/path/to/a/file.sas7bdat', usecols=["variab
 A challenge when reading large files is the time consumed in the operation. In order to alleviate this
 pyreadstat provides a function "read\_file\_multiprocessing" to read a file in parallel processes using
 the python multiprocessing library. As it reads the whole file in one go you need to have enough RAM for the operation. If
-that is not the case look at Reading rows in chunks (next section)
+that is not the case look at Reading rows in chunks (next section). Notice however that you can combine reading in parallel
+with reading in chunks as described in the next section.
 
 Speed ups in the process will depend on a number of factors such as number of processes available, RAM,
 content of the file etc.
@@ -598,11 +599,17 @@ translated as NaN by default and to the correspoding string value if
 user_missing is set to True. meta.missing_ranges will show the string
 value as well.
 
-
+When writing a pandas dataframe to a sav file, if user defined missing values are not set, NaNs are translated to
+empty strings, as there is no other possibility to represent those missing values and user defined missing values
+are not set automatically.
+
+When reading a sav into a pandas dataframe, if the value in
 a character variable is an empty string (''), it will not be translated to NaN, but will stay as an empty string. This
-is because the empty string is a valid character value in SPSS and pyreadstat preserves that property.
+is because the empty string is a valid character value in SPSS and pyreadstat preserves that property.
+
+This behaviour generates an asymetrical situation that has to be managed by the user. You can convert
 empty strings to nan very easily with pandas if you think it is appropiate
-for your dataset.
+for your dataset, or you can use defined missing values as described before.
 
 
 ##### SAS and STATA
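Note on the hunk above: the added README text says empty strings in character variables can be converted to NaN with pandas, but shows no code for it. The following is a minimal sketch of one way to do that, not the package's own recommendation. The dataframe, the column names and the hand-picked `char_cols` list are hypothetical stand-ins for the result of reading a sav file.

```python
import pandas as pd

# Hypothetical dataframe standing in for what reading a sav file might return:
# one character column containing an empty string, one numeric column.
df = pd.DataFrame({"name": ["anna", "", "ben"], "age": [31.0, 40.0, 25.0]})

# Character columns to clean; chosen by hand here (an assumption of this sketch).
char_cols = ["name"]

for col in char_cols:
    # Series.mask replaces the entries where the condition is True with NaN.
    df[col] = df[col].mask(df[col] == "")

# Count how many values in the character column are now missing.
n_missing = int(df["name"].isna().sum())
```

Listing the character columns explicitly keeps the conversion away from columns where an empty string is a meaningful value.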
@@ -641,7 +648,6 @@ df, meta = pyreadstat.read_dta("/path/to/file.dta", user_missing=True, apply_val
 
 Empty strings are still transtaled as empty strings and not as NaN.
 
-
 The information about what values are user missing is stored in the meta object, in the variable missing_user_values.
 This is a list listing all user defined missing values.
 
@@ -798,7 +804,7 @@ pyreadstat.write_sav(df, path, variable_format=formats)
 ```
 
 The appropiate formats to use are beyond the scope of this documentation. Probably you want to read a file
-produced in the original application and use meta.
+produced in the original application and use meta.original_variable\_types to get the formats. Otherwise look
 for the documentation of the original application.
 
 ##### SPSS