tmlt.analytics 0.4.2__tar.gz → 0.6.2a3__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- tmlt_analytics-0.6.2a3/CHANGELOG.rst +623 -0
- {tmlt.analytics-0.4.2 → tmlt_analytics-0.6.2a3}/NOTICE +1 -1
- tmlt_analytics-0.6.2a3/PKG-INFO +119 -0
- tmlt_analytics-0.6.2a3/README.md +78 -0
- {tmlt.analytics-0.4.2 → tmlt_analytics-0.6.2a3}/doc/_static/css/custom.css +11 -14
- {tmlt.analytics-0.4.2 → tmlt_analytics-0.6.2a3}/doc/_static/js/version-banner.js +6 -4
- tmlt_analytics-0.6.2a3/doc/_templates/build-info.html +3 -0
- {tmlt.analytics-0.4.2 → tmlt_analytics-0.6.2a3}/doc/_templates/layout.html +2 -0
- tmlt_analytics-0.6.2a3/doc/_templates/package-name.html +3 -0
- tmlt_analytics-0.6.2a3/doc/_templates/sidebar-nav-bs.html +19 -0
- tmlt_analytics-0.6.2a3/doc/conf.py +259 -0
- tmlt_analytics-0.6.2a3/doc/howto-guides/aws-glue.rst +177 -0
- tmlt_analytics-0.6.2a3/doc/howto-guides/bigquery/bigquery-setup.rst +36 -0
- tmlt_analytics-0.6.2a3/doc/howto-guides/bigquery/docker-image.rst +82 -0
- tmlt_analytics-0.6.2a3/doc/howto-guides/bigquery/index.rst +37 -0
- tmlt_analytics-0.6.2a3/doc/howto-guides/bigquery/inputs-outputs.rst +214 -0
- tmlt_analytics-0.6.2a3/doc/howto-guides/bigquery/parameters.rst +192 -0
- tmlt_analytics-0.6.2a3/doc/howto-guides/bigquery/running-the-program.rst +112 -0
- tmlt_analytics-0.6.2a3/doc/howto-guides/bigquery/setup.rst +50 -0
- tmlt_analytics-0.6.2a3/doc/howto-guides/databricks.rst +136 -0
- tmlt_analytics-0.6.2a3/doc/howto-guides/index.rst +18 -0
- tmlt_analytics-0.6.2a3/doc/howto-guides/installation.rst +133 -0
- tmlt_analytics-0.6.2a3/doc/howto-guides/troubleshooting.rst +105 -0
- tmlt_analytics-0.6.2a3/doc/images/api_diagram.svg +1 -0
- tmlt_analytics-0.6.2a3/doc/images/chart_books_borrowed_by_zip.png +0 -0
- tmlt_analytics-0.6.2a3/doc/images/chart_books_by_unique_members.png +0 -0
- tmlt_analytics-0.6.2a3/doc/images/chart_favorite_genres.png +0 -0
- tmlt_analytics-0.6.2a3/doc/images/chart_genres_by_age.png +0 -0
- tmlt_analytics-0.6.2a3/doc/images/chart_total_books_borrowed_in_zipcodes.png +0 -0
- tmlt_analytics-0.6.2a3/doc/images/flat_map_row_example.svg +1 -0
- tmlt_analytics-0.6.2a3/doc/images/flow_chart_truncation.svg +1 -0
- tmlt_analytics-0.6.2a3/doc/images/glue_graph.png +0 -0
- tmlt_analytics-0.6.2a3/doc/images/index_howto_guides.svg +10 -0
- tmlt_analytics-0.6.2a3/doc/images/intuitive_noise_visualization.png +0 -0
- tmlt_analytics-0.6.2a3/doc/images/mock_checkout_logs.svg +1 -0
- tmlt_analytics-0.6.2a3/doc/images/private_join_example.svg +1 -0
- tmlt_analytics-0.6.2a3/doc/images/private_join_tables.svg +1 -0
- tmlt_analytics-0.6.2a3/doc/images/public_join_example_zips.svg +1 -0
- tmlt_analytics-0.6.2a3/doc/images/tmlt_api_overview_diagram.svg +388 -0
- tmlt_analytics-0.6.2a3/doc/images/tuning-parameters-error-vs-clamping-varied-budgets.png +0 -0
- tmlt_analytics-0.6.2a3/doc/images/tuning-parameters-error-vs-clamping.png +0 -0
- tmlt_analytics-0.6.2a3/doc/index.rst +160 -0
- tmlt.analytics-0.4.2/doc/additional-resources/privacy_policy.rst → tmlt_analytics-0.6.2a3/doc/privacy-policy.rst +6 -1
- {tmlt.analytics-0.4.2 → tmlt_analytics-0.6.2a3}/doc/templates/python/class.rst +34 -3
- {tmlt.analytics-0.4.2 → tmlt_analytics-0.6.2a3}/doc/templates/python/module.rst +15 -2
- {tmlt.analytics-0.4.2 → tmlt_analytics-0.6.2a3}/doc/topic-guides/index.rst +2 -1
- {tmlt.analytics-0.4.2 → tmlt_analytics-0.6.2a3}/doc/topic-guides/nulls-nans-infinities.rst +25 -17
- {tmlt.analytics-0.4.2 → tmlt_analytics-0.6.2a3}/doc/topic-guides/privacy-budgets.rst +40 -27
- {tmlt.analytics-0.4.2 → tmlt_analytics-0.6.2a3}/doc/topic-guides/privacy-promise.rst +57 -33
- {tmlt.analytics-0.4.2 → tmlt_analytics-0.6.2a3}/doc/topic-guides/spark.rst +57 -13
- tmlt_analytics-0.6.2a3/doc/topic-guides/understanding-sensitivity.rst +232 -0
- {tmlt.analytics-0.4.2 → tmlt_analytics-0.6.2a3}/doc/topic-guides/working-with-sessions.rst +10 -6
- {tmlt.analytics-0.4.2 → tmlt_analytics-0.6.2a3}/doc/tutorials/clamping-bounds.rst +10 -9
- {tmlt.analytics-0.4.2 → tmlt_analytics-0.6.2a3}/doc/tutorials/first-steps.rst +32 -46
- {tmlt.analytics-0.4.2 → tmlt_analytics-0.6.2a3}/doc/tutorials/groupby-queries.rst +164 -7
- tmlt_analytics-0.6.2a3/doc/tutorials/index.rst +33 -0
- tmlt_analytics-0.6.2a3/doc/tutorials/more-with-privacy-ids.rst +344 -0
- {tmlt.analytics-0.4.2 → tmlt_analytics-0.6.2a3}/doc/tutorials/privacy-budget-basics.rst +35 -7
- tmlt_analytics-0.6.2a3/doc/tutorials/privacy-id-basics.rst +377 -0
- {tmlt.analytics-0.4.2 → tmlt_analytics-0.6.2a3}/doc/tutorials/simple-transformations.rst +56 -42
- tmlt_analytics-0.6.2a3/pyproject.toml +250 -0
- {tmlt.analytics-0.4.2/test/unit → tmlt_analytics-0.6.2a3/test}/__init__.py +1 -1
- tmlt_analytics-0.6.2a3/test/conftest.py +333 -0
- {tmlt.analytics-0.4.2 → tmlt_analytics-0.6.2a3}/test/system/__init__.py +1 -1
- tmlt_analytics-0.6.2a3/test/system/conftest.py +117 -0
- tmlt_analytics-0.6.2a3/test/system/session/__init__.py +4 -0
- tmlt_analytics-0.6.2a3/test/system/session/conftest.py +64 -0
- tmlt_analytics-0.6.2a3/test/system/session/ids/__init__.py +4 -0
- tmlt_analytics-0.6.2a3/test/system/session/ids/queries/__init__.py +4 -0
- tmlt_analytics-0.6.2a3/test/system/session/ids/queries/conftest.py +33 -0
- tmlt_analytics-0.6.2a3/test/system/session/ids/queries/test_flat_map_by_id.py +377 -0
- tmlt_analytics-0.6.2a3/test/system/session/ids/test_constraint_propagation.py +490 -0
- tmlt_analytics-0.6.2a3/test/system/session/ids/test_count_distinct_optimization.py +193 -0
- tmlt_analytics-0.6.2a3/test/system/session/ids/test_id_col_operations.py +257 -0
- tmlt_analytics-0.6.2a3/test/system/session/ids/test_l0_linf_truncation.py +648 -0
- tmlt_analytics-0.6.2a3/test/system/session/ids/test_l1_truncation.py +342 -0
- tmlt_analytics-0.6.2a3/test/system/session/ids/test_partition.py +205 -0
- tmlt_analytics-0.6.2a3/test/system/session/mixed/__init__.py +4 -0
- tmlt_analytics-0.6.2a3/test/system/session/mixed/test_mixed_session.py +158 -0
- tmlt_analytics-0.6.2a3/test/system/session/rows/__init__.py +4 -0
- tmlt_analytics-0.6.2a3/test/system/session/rows/conftest.py +761 -0
- tmlt_analytics-0.6.2a3/test/system/session/rows/test_add_max_rows.py +1306 -0
- tmlt_analytics-0.6.2a3/test/system/session/rows/test_add_max_rows_in_max_groups.py +164 -0
- tmlt_analytics-0.6.2a3/test/system/session/rows/test_add_max_rows_infs_nulls.py +508 -0
- tmlt_analytics-0.6.2a3/test/system/session/rows/test_invalid.py +267 -0
- tmlt_analytics-0.6.2a3/test/system/session/test_budgets.py +97 -0
- tmlt_analytics-0.6.2a3/test/system/session/test_invalid_constraints.py +65 -0
- {tmlt.analytics-0.4.2/test → tmlt_analytics-0.6.2a3/test/unit}/__init__.py +1 -1
- tmlt_analytics-0.6.2a3/test/unit/base_builder/test_builder.py +143 -0
- tmlt_analytics-0.6.2a3/test/unit/base_builder/test_mixins.py +209 -0
- tmlt_analytics-0.6.2a3/test/unit/keysets/__init__.py +4 -0
- tmlt_analytics-0.6.2a3/test/unit/keysets/test_keyset.py +707 -0
- tmlt_analytics-0.6.2a3/test/unit/keysets/test_product_keyset.py +396 -0
- {tmlt.analytics-0.4.2 → tmlt_analytics-0.6.2a3}/test/unit/query_expr_compiler/__init__.py +2 -2
- tmlt_analytics-0.6.2a3/test/unit/query_expr_compiler/test_measurement_visitor.py +1928 -0
- {tmlt.analytics-0.4.2 → tmlt_analytics-0.6.2a3}/test/unit/query_expr_compiler/test_output_schema_visitor.py +607 -291
- tmlt_analytics-0.6.2a3/test/unit/query_expr_compiler/transformation_visitor/__init__.py +4 -0
- tmlt_analytics-0.6.2a3/test/unit/query_expr_compiler/transformation_visitor/conftest.py +487 -0
- tmlt_analytics-0.6.2a3/test/unit/query_expr_compiler/transformation_visitor/test_add_keys.py +643 -0
- tmlt_analytics-0.6.2a3/test/unit/query_expr_compiler/transformation_visitor/test_add_rows.py +936 -0
- tmlt_analytics-0.6.2a3/test/unit/query_expr_compiler/transformation_visitor/test_constraints.py +188 -0
- tmlt_analytics-0.6.2a3/test/unit/test_binning_spec.py +539 -0
- tmlt_analytics-0.6.2a3/test/unit/test_catalog.py +62 -0
- tmlt_analytics-0.6.2a3/test/unit/test_cleanup.py +24 -0
- tmlt_analytics-0.6.2a3/test/unit/test_config.py +134 -0
- tmlt_analytics-0.6.2a3/test/unit/test_constraints.py +165 -0
- tmlt_analytics-0.6.2a3/test/unit/test_neighboring_relations.py +350 -0
- tmlt_analytics-0.6.2a3/test/unit/test_noise_info.py +127 -0
- tmlt_analytics-0.6.2a3/test/unit/test_privacy_budget.py +704 -0
- tmlt_analytics-0.6.2a3/test/unit/test_privacy_budget_rounding_helper.py +151 -0
- tmlt_analytics-0.6.2a3/test/unit/test_protected_change.py +101 -0
- tmlt_analytics-0.6.2a3/test/unit/test_query_builder.py +1349 -0
- {tmlt.analytics-0.4.2 → tmlt_analytics-0.6.2a3}/test/unit/test_query_expr_compiler.py +462 -556
- tmlt_analytics-0.6.2a3/test/unit/test_query_expression.py +583 -0
- tmlt_analytics-0.6.2a3/test/unit/test_query_expression_visitor.py +213 -0
- tmlt_analytics-0.6.2a3/test/unit/test_schema.py +122 -0
- {tmlt.analytics-0.4.2 → tmlt_analytics-0.6.2a3}/test/unit/test_schema_conversion.py +30 -30
- tmlt_analytics-0.6.2a3/test/unit/test_session.py +2778 -0
- tmlt_analytics-0.6.2a3/test/unit/test_table_identifiers.py +58 -0
- tmlt_analytics-0.6.2a3/test/unit/test_transformation_utils.py +165 -0
- tmlt_analytics-0.6.2a3/test/unit/test_truncation_strategy.py +24 -0
- tmlt_analytics-0.6.2a3/test/unit/test_utils.py +13 -0
- tmlt_analytics-0.6.2a3/test_requirements.txt +3 -0
- tmlt_analytics-0.6.2a3/tmlt/analytics/__init__.py +58 -0
- tmlt_analytics-0.6.2a3/tmlt/analytics/_base_builder.py +202 -0
- tmlt_analytics-0.6.2a3/tmlt/analytics/_catalog.py +117 -0
- {tmlt.analytics-0.4.2 → tmlt_analytics-0.6.2a3}/tmlt/analytics/_coerce_spark_schema.py +9 -19
- tmlt_analytics-0.6.2a3/tmlt/analytics/_neighboring_relation.py +404 -0
- tmlt_analytics-0.6.2a3/tmlt/analytics/_neighboring_relation_visitor.py +177 -0
- {tmlt.analytics-0.4.2 → tmlt_analytics-0.6.2a3}/tmlt/analytics/_noise_info.py +73 -6
- tmlt.analytics-0.4.2/tmlt/analytics/query_expr.py → tmlt_analytics-0.6.2a3/tmlt/analytics/_query_expr.py +447 -221
- {tmlt.analytics-0.4.2 → tmlt_analytics-0.6.2a3}/tmlt/analytics/_query_expr_compiler/__init__.py +1 -1
- tmlt_analytics-0.6.2a3/tmlt/analytics/_query_expr_compiler/_base_measurement_visitor.py +1836 -0
- tmlt_analytics-0.6.2a3/tmlt/analytics/_query_expr_compiler/_base_transformation_visitor.py +1611 -0
- tmlt_analytics-0.6.2a3/tmlt/analytics/_query_expr_compiler/_compiler.py +265 -0
- tmlt_analytics-0.6.2a3/tmlt/analytics/_query_expr_compiler/_constraint_propagation.py +217 -0
- tmlt_analytics-0.6.2a3/tmlt/analytics/_query_expr_compiler/_measurement_visitor.py +164 -0
- tmlt_analytics-0.6.2a3/tmlt/analytics/_query_expr_compiler/_output_schema_visitor.py +919 -0
- tmlt_analytics-0.6.2a3/tmlt/analytics/_query_expr_compiler/_transformation_visitor.py +12 -0
- {tmlt.analytics-0.4.2 → tmlt_analytics-0.6.2a3}/tmlt/analytics/_schema.py +148 -22
- tmlt_analytics-0.6.2a3/tmlt/analytics/_table_identifier.py +60 -0
- tmlt_analytics-0.6.2a3/tmlt/analytics/_table_reference.py +116 -0
- tmlt_analytics-0.6.2a3/tmlt/analytics/_transformation_utils.py +324 -0
- tmlt_analytics-0.6.2a3/tmlt/analytics/_type_checking.py +17 -0
- tmlt_analytics-0.6.2a3/tmlt/analytics/_utils.py +124 -0
- {tmlt.analytics-0.4.2 → tmlt_analytics-0.6.2a3}/tmlt/analytics/binning_spec.py +138 -51
- tmlt_analytics-0.6.2a3/tmlt/analytics/cleanup.py +17 -0
- tmlt_analytics-0.6.2a3/tmlt/analytics/config.py +175 -0
- tmlt_analytics-0.6.2a3/tmlt/analytics/constraints/__init__.py +24 -0
- tmlt_analytics-0.6.2a3/tmlt/analytics/constraints/_base.py +32 -0
- tmlt_analytics-0.6.2a3/tmlt/analytics/constraints/_simplify.py +21 -0
- tmlt_analytics-0.6.2a3/tmlt/analytics/constraints/_truncation.py +375 -0
- tmlt_analytics-0.6.2a3/tmlt/analytics/keyset.py +585 -0
- tmlt_analytics-0.6.2a3/tmlt/analytics/privacy_budget.py +558 -0
- tmlt_analytics-0.6.2a3/tmlt/analytics/protected_change.py +130 -0
- {tmlt.analytics-0.4.2 → tmlt_analytics-0.6.2a3}/tmlt/analytics/query_builder.py +1332 -637
- tmlt_analytics-0.6.2a3/tmlt/analytics/session.py +1821 -0
- {tmlt.analytics-0.4.2 → tmlt_analytics-0.6.2a3}/tmlt/analytics/truncation_strategy.py +34 -15
- tmlt_analytics-0.6.2a3/tmlt/analytics/utils.py +172 -0
- tmlt.analytics-0.4.2/CHANGELOG.md +0 -103
- tmlt.analytics-0.4.2/PKG-INFO +0 -71
- tmlt.analytics-0.4.2/README.md +0 -38
- tmlt.analytics-0.4.2/doc/_templates/build-info.html +0 -1
- tmlt.analytics-0.4.2/doc/_templates/package-name.html +0 -1
- tmlt.analytics-0.4.2/doc/additional-resources/changelog.rst +0 -196
- tmlt.analytics-0.4.2/doc/additional-resources/contact.rst +0 -10
- tmlt.analytics-0.4.2/doc/additional-resources/index.rst +0 -14
- tmlt.analytics-0.4.2/doc/additional-resources/license.rst +0 -15
- tmlt.analytics-0.4.2/doc/conf.py +0 -219
- tmlt.analytics-0.4.2/doc/images/chart_favorite_genres.png +0 -0
- tmlt.analytics-0.4.2/doc/index.rst +0 -97
- tmlt.analytics-0.4.2/doc/installation.rst +0 -183
- tmlt.analytics-0.4.2/doc/intersphinx_mapping.json +0 -32
- tmlt.analytics-0.4.2/doc/tutorials/index.rst +0 -20
- tmlt.analytics-0.4.2/examples/interactive_evaluation.ipynb +0 -227
- tmlt.analytics-0.4.2/examples/private_join.ipynb +0 -209
- tmlt.analytics-0.4.2/examples/zcdp_puredp_switching.ipynb +0 -249
- tmlt.analytics-0.4.2/pyproject.toml +0 -154
- tmlt.analytics-0.4.2/setup.py +0 -36
- tmlt.analytics-0.4.2/test/system/test_session.py +0 -2109
- tmlt.analytics-0.4.2/test/unit/query_expr_compiler/test_measurement_visitor.py +0 -2237
- tmlt.analytics-0.4.2/test/unit/query_expr_compiler/test_transformation_visitor.py +0 -1059
- tmlt.analytics-0.4.2/test/unit/test_binning_spec.py +0 -375
- tmlt.analytics-0.4.2/test/unit/test_catalog.py +0 -105
- tmlt.analytics-0.4.2/test/unit/test_cleanup.py +0 -25
- tmlt.analytics-0.4.2/test/unit/test_keyset.py +0 -647
- tmlt.analytics-0.4.2/test/unit/test_privacy_budget.py +0 -87
- tmlt.analytics-0.4.2/test/unit/test_privacy_budget_rounding_helper.py +0 -147
- tmlt.analytics-0.4.2/test/unit/test_query_builder.py +0 -1127
- tmlt.analytics-0.4.2/test/unit/test_query_expression.py +0 -534
- tmlt.analytics-0.4.2/test/unit/test_query_expression_visitor.py +0 -176
- tmlt.analytics-0.4.2/test/unit/test_schema.py +0 -66
- tmlt.analytics-0.4.2/test/unit/test_session.py +0 -1251
- tmlt.analytics-0.4.2/test/unit/test_truncation_strategy.py +0 -28
- tmlt.analytics-0.4.2/test_requirements.txt +0 -2
- tmlt.analytics-0.4.2/tmlt/analytics/__init__.py +0 -8
- tmlt.analytics-0.4.2/tmlt/analytics/_catalog.py +0 -182
- tmlt.analytics-0.4.2/tmlt/analytics/_privacy_budget_rounding_helper.py +0 -65
- tmlt.analytics-0.4.2/tmlt/analytics/_query_expr_compiler/_compiler.py +0 -211
- tmlt.analytics-0.4.2/tmlt/analytics/_query_expr_compiler/_measurement_visitor.py +0 -744
- tmlt.analytics-0.4.2/tmlt/analytics/_query_expr_compiler/_output_schema_visitor.py +0 -982
- tmlt.analytics-0.4.2/tmlt/analytics/_query_expr_compiler/_transformation_visitor.py +0 -849
- tmlt.analytics-0.4.2/tmlt/analytics/cleanup.py +0 -21
- tmlt.analytics-0.4.2/tmlt/analytics/keyset.py +0 -306
- tmlt.analytics-0.4.2/tmlt/analytics/privacy_budget.py +0 -111
- tmlt.analytics-0.4.2/tmlt/analytics/session.py +0 -1210
- {tmlt.analytics-0.4.2 → tmlt_analytics-0.6.2a3}/LICENSE +0 -0
- {tmlt.analytics-0.4.2 → tmlt_analytics-0.6.2a3}/LICENSE.docs +0 -0
- {tmlt.analytics-0.4.2 → tmlt_analytics-0.6.2a3}/doc/_static/favicon.ico +0 -0
- {tmlt.analytics-0.4.2 → tmlt_analytics-0.6.2a3}/doc/_static/logo.png +0 -0
- {tmlt.analytics-0.4.2 → tmlt_analytics-0.6.2a3}/doc/images/chart_age_at_joining.png +0 -0
- {tmlt.analytics-0.4.2 → tmlt_analytics-0.6.2a3}/doc/images/chart_attacker_certainty.png +0 -0
- {tmlt.analytics-0.4.2 → tmlt_analytics-0.6.2a3}/doc/images/chart_average_age_by_edu.png +0 -0
- {tmlt.analytics-0.4.2 → tmlt_analytics-0.6.2a3}/doc/images/chart_counts_age_gender.png +0 -0
- {tmlt.analytics-0.4.2 → tmlt_analytics-0.6.2a3}/doc/images/chart_counts_different_eps.png +0 -0
- {tmlt.analytics-0.4.2 → tmlt_analytics-0.6.2a3}/doc/images/chart_counts_edu+sex.png +0 -0
- {tmlt.analytics-0.4.2 → tmlt_analytics-0.6.2a3}/doc/images/chart_counts_education.png +0 -0
- {tmlt.analytics-0.4.2 → tmlt_analytics-0.6.2a3}/doc/images/chart_error_vs_partition_age_edu.png +0 -0
- {tmlt.analytics-0.4.2 → tmlt_analytics-0.6.2a3}/doc/images/chart_filters_education.png +0 -0
- {tmlt.analytics-0.4.2 → tmlt_analytics-0.6.2a3}/doc/images/chart_quantiles_education.png +0 -0
- {tmlt.analytics-0.4.2 → tmlt_analytics-0.6.2a3}/doc/images/chart_senior_counts_1.png +0 -0
- {tmlt.analytics-0.4.2 → tmlt_analytics-0.6.2a3}/doc/images/chart_senior_counts_2.png +0 -0
- {tmlt.analytics-0.4.2 → tmlt_analytics-0.6.2a3}/doc/images/chart_teen_edu_counts.png +0 -0
- {tmlt.analytics-0.4.2 → tmlt_analytics-0.6.2a3}/doc/images/chart_younger_age_counts.png +0 -0
- {tmlt.analytics-0.4.2 → tmlt_analytics-0.6.2a3}/doc/images/clamping_bounds_averages.png +0 -0
- {tmlt.analytics-0.4.2 → tmlt_analytics-0.6.2a3}/doc/images/clamping_bounds_schema.png +0 -0
- {tmlt.analytics-0.4.2 → tmlt_analytics-0.6.2a3}/doc/images/histogram_books_borrowed.png +0 -0
- {tmlt.analytics-0.4.2 → tmlt_analytics-0.6.2a3}/doc/images/index_api.svg +0 -0
- {tmlt.analytics-0.4.2 → tmlt_analytics-0.6.2a3}/doc/images/index_more.svg +0 -0
- {tmlt.analytics-0.4.2 → tmlt_analytics-0.6.2a3}/doc/images/index_topic_guides.svg +0 -0
- {tmlt.analytics-0.4.2 → tmlt_analytics-0.6.2a3}/doc/images/index_tutorials.svg +0 -0
- {tmlt.analytics-0.4.2 → tmlt_analytics-0.6.2a3}/doc/images/logo.png +0 -0
- {tmlt.analytics-0.4.2 → tmlt_analytics-0.6.2a3}/tmlt/analytics/py.typed +0 -0
tmlt_analytics-0.6.2a3/CHANGELOG.rst
@@ -0,0 +1,623 @@
..
    SPDX-License-Identifier: CC-BY-SA-4.0
    Copyright Tumult Labs 2024
.. _analytics-changelog:

Changelog
=========

Unreleased
----------

Changed
~~~~~~~
- Upgraded to typeguard 4.
- Privacy budgets now support division, addition, and subtraction.
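
The budget arithmetic in the entry above can be pictured with a small stand-in class. This is a hedged sketch, not the library's implementation. ``PureBudgetSketch`` is a hypothetical immutable class that holds only an epsilon. It assumes that dividing a budget by an integer splits it into equal shares, for example one share per query. Exact fractions are used only to keep the example free of floating-point rounding.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class PureBudgetSketch:
    """Hypothetical pure-DP budget; NOT tmlt.analytics.PureDPBudget."""

    epsilon: Fraction

    def __add__(self, other: "PureBudgetSketch") -> "PureBudgetSketch":
        # Under sequential composition, pure-DP epsilons add up.
        return PureBudgetSketch(self.epsilon + other.epsilon)

    def __sub__(self, other: "PureBudgetSketch") -> "PureBudgetSketch":
        # Budget left over after spending `other`; it may not go negative.
        if other.epsilon > self.epsilon:
            raise ValueError("cannot subtract a larger budget")
        return PureBudgetSketch(self.epsilon - other.epsilon)

    def __truediv__(self, parts: int) -> "PureBudgetSketch":
        # Split the budget into `parts` equal shares.
        if parts <= 0:
            raise ValueError("parts must be positive")
        return PureBudgetSketch(self.epsilon / parts)


total = PureBudgetSketch(Fraction(3))
per_query = total / 4            # epsilon 3/4 for each of four queries
remaining = total - per_query    # epsilon 9/4 left after one query
print(per_query.epsilon, remaining.epsilon, per_query + remaining == total)
# 3/4 9/4 True
```

Addition mirrors sequential composition for pure differential privacy, where the epsilons of successive queries add up. The real budget classes may handle other budget types, validation, and rounding differently.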


.. _v0.16.1:

0.16.1 - 2024-09-04
-------------------

This is a maintenance release, with no externally-visible changes.

.. _v0.16.0:

0.16.0 - 2024-08-21
-------------------
This release adds a new :meth:`QueryBuilder.flat_map_by_id <tmlt.analytics.query_builder.QueryBuilder.flat_map_by_id>` transformation, improved constraint support when using :meth:`~tmlt.analytics.session.Session.partition_and_create`, and performance improvements.

Added
~~~~~
- Added a new transformation, :meth:`QueryBuilder.flat_map_by_id <tmlt.analytics.query_builder.QueryBuilder.flat_map_by_id>`, which allows user-defined transformations to be applied to groups of rows sharing an ID on tables with the :class:`~tmlt.analytics.protected_change.AddRowsWithID` protected change.
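
Conceptually, a by-ID flat map passes all rows that share an ID to a user-defined function at once, instead of one row at a time. The pure-Python sketch below illustrates only that grouping step on a list of dictionaries. ``flat_map_by_id_sketch``, the ``member_id`` column, and the choice to copy the ID onto every output row are assumptions of this example. They are not the signature or documented behavior of ``QueryBuilder.flat_map_by_id``.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Row = Dict[str, object]


def flat_map_by_id_sketch(
    rows: List[Row],
    id_column: str,
    f: Callable[[List[Row]], List[Row]],
) -> List[Row]:
    """Hypothetical helper: apply `f` to each group of rows sharing an ID."""
    groups: Dict[object, List[Row]] = defaultdict(list)
    for row in rows:
        groups[row[id_column]].append(row)

    output: List[Row] = []
    for id_value, group in groups.items():
        # `f` sees every row for one ID, so it can compute per-ID results.
        for new_row in f(group):
            # In this sketch, the ID is re-attached to each output row.
            output.append({id_column: id_value, **new_row})
    return output


rows = [
    {"member_id": 1, "books": 2},
    {"member_id": 1, "books": 5},
    {"member_id": 2, "books": 1},
]

# Example user function: collapse each member's rows into a single total.
totals = flat_map_by_id_sketch(
    rows, "member_id", lambda group: [{"total_books": sum(r["books"] for r in group)}]
)
print(totals)
# [{'member_id': 1, 'total_books': 7}, {'member_id': 2, 'total_books': 1}]
```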


Fixed
~~~~~
- Significantly improved the performance of coercing Session input dataframe columns to supported types.

Changed
~~~~~~~
- :meth:`~tmlt.analytics.session.Session.partition_and_create` can now be used on a table with an :class:`~tmlt.analytics.protected_change.AddRowsWithID` protected change if a :class:`~tmlt.analytics.constraints.MaxRowsPerID` constraint is present, converting the table being partitioned into one with an :class:`~tmlt.analytics.protected_change.AddMaxRows` protected change.
  The behavior when using :meth:`~tmlt.analytics.session.Session.partition_and_create` on such a table with a :class:`~tmlt.analytics.constraints.MaxGroupsPerID` constraint has not changed.
  If both :class:`~tmlt.analytics.constraints.MaxRowsPerID` and :class:`~tmlt.analytics.constraints.MaxGroupsPerID` constraints are present, the :class:`~tmlt.analytics.constraints.MaxRowsPerID` constraint is ignored and only the :class:`~tmlt.analytics.constraints.MaxGroupsPerID` constraint gets applied.
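
A small sketch suggests why a :class:`~tmlt.analytics.constraints.MaxRowsPerID` constraint makes the conversion to :class:`~tmlt.analytics.protected_change.AddMaxRows` possible. Once each ID keeps at most ``k`` rows, adding or removing one ID changes the table by at most ``k`` rows. The helper ``truncate_rows_per_id`` is hypothetical and simply keeps the first ``k`` rows seen for each ID. The entry above does not describe how Tumult Analytics selects rows during truncation, and this example does not reproduce it.

```python
from collections import Counter
from typing import Dict, List

Row = Dict[str, object]


def truncate_rows_per_id(rows: List[Row], id_column: str, k: int) -> List[Row]:
    """Hypothetical helper: keep at most `k` rows for each ID value."""
    kept: Counter = Counter()
    output: List[Row] = []
    for row in rows:
        id_value = row[id_column]
        if kept[id_value] < k:
            kept[id_value] += 1
            output.append(row)
    return output


rows = [
    {"id": "a", "genre": "mystery"},
    {"id": "a", "genre": "poetry"},
    {"id": "a", "genre": "history"},
    {"id": "b", "genre": "poetry"},
]
truncated = truncate_rows_per_id(rows, "id", k=2)

# Each ID now contributes at most k rows, so adding or removing one ID
# adds or removes at most k rows, wherever those rows end up after partitioning.
max_rows_per_id = max(Counter(r["id"] for r in truncated).values())
print(len(truncated), max_rows_per_id)  # 3 2
```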

.. _v0.15.0:

0.15.0 - 2024-08-12
-------------------
This release extends the :meth:`~tmlt.analytics.query_builder.GroupedQueryBuilder.get_bounds` method so it can get upper and lower bounds for each group in a dataframe.
In addition, it changes the object used to represent queries to the new :class:`~tmlt.analytics.query_builder.Query` class, and updates the format in which table schemas are returned.


Added
~~~~~
- Added a dependency on the library ``tabulate`` to improve table displays from :meth:`~tmlt.analytics.session.Session.describe`.
- Added the ability to call :meth:`~tmlt.analytics.query_builder.GroupedQueryBuilder.get_bounds` after calling :meth:`~tmlt.analytics.query_builder.QueryBuilder.groupby`, for determining upper and lower bounds for a column per group in a differentially private way.

Changed
~~~~~~~
- *Backwards-incompatible*: The :meth:`~tmlt.analytics.query_builder.QueryBuilder.get_bounds` query now returns a dataframe instead of a tuple when evaluated.
- *Backwards-incompatible*: The :meth:`Session.get_schema() <tmlt.analytics.session.Session.get_schema>` and :meth:`KeySet.schema() <tmlt.analytics.keyset.KeySet.schema>` methods now return a normal dictionary of column names to :class:`~tmlt.analytics.query_builder.ColumnDescriptor`\ s, rather than a specialized ``Schema`` type.
  This brings them more in line with the rest of the Tumult Analytics API, but could impact code that used some functionality available through the ``Schema`` type.
  Uses of these methods where the result is treated as a dictionary should not be impacted.
- :class:`~tmlt.analytics.query_builder.QueryBuilder` now returns a :class:`~tmlt.analytics.query_builder.Query` object instead of a ``QueryExpr`` or ``AggregatedQueryBuilder`` when a query is created.
  This should not affect code using :class:`~tmlt.analytics.query_builder.QueryBuilder` unless it directly inspects these objects.
- GroupbyCount queries now return :class:`~tmlt.analytics.query_builder.GroupbyCountQuery`, a subclass of :class:`~tmlt.analytics.query_builder.Query` that has the :meth:`~tmlt.analytics.query_builder.GroupbyCountQuery.suppress` post-processing method.
- :meth:`~tmlt.analytics.session.Session.evaluate` now accepts :class:`~tmlt.analytics.query_builder.Query` objects instead of ``QueryExpr`` objects.
- Replaced asserts with custom exceptions in cases where internal errors are detected.
  Internal errors are now raised as :class:`~tmlt.analytics.AnalyticsInternalError`.
- Updated to Tumult Core 0.16.1.

Removed
~~~~~~~
- QueryExprs (previously in ``tmlt.analytics.query_expr``) have been removed from the Tumult Analytics public API.
  Queries should be created using :class:`~tmlt.analytics.query_builder.QueryBuilder`, which returns a new :class:`~tmlt.analytics.query_builder.Query` when a query is created.
- Removed the ``query_expr`` attribute from the :class:`~tmlt.analytics.query_builder.QueryBuilder` class.
- Removed support for Pandas 1.2 and 1.3 due to a known bug in Pandas versions below 1.4.

.. _v0.14.0:

0.14.0 - 2024-07-18
-------------------

Tumult Analytics 0.14.0 introduces experimental support for Python 3.12.
Full support for Python 3.12 and Pandas 2 will not be available until the release of PySpark 4.0.
In addition, Python 3.7 is no longer supported.

This release also deprecates the ``tmlt.analytics.query_expr`` module.
Use of ``QueryExpr`` and its subtypes to create queries has been discouraged for a long time, and these types will be removed from the Tumult Analytics API in an upcoming release.
Other types from this module have been moved into the ``tmlt.analytics.query_builder`` module, though they may be imported from either until the ``query_expr`` module is removed.

Added
~~~~~
- Tumult Analytics now has experimental support for Python 3.12 using Pandas 2.

Changed
~~~~~~~
- Mechanism enums (e.g. :class:`~tmlt.analytics.query_builder.CountMechanism`) should now be imported from :mod:`tmlt.analytics.query_builder`.
  The current query expression module (``tmlt.analytics.query_expr``) will be removed from the public API in an upcoming release.

Removed
~~~~~~~
- Removed support for Python 3.7.

Deprecated
~~~~~~~~~~
- QueryExprs (previously in ``tmlt.analytics.query_expr``) will be removed from the Tumult Analytics public API in an upcoming release.
  Queries should be created using :class:`~tmlt.analytics.query_builder.QueryBuilder` instead.

.. _v0.13.0:

0.13.0 - 2024-07-03
-------------------
This release makes some supporting classes immutable.


Changed
~~~~~~~
- Made :class:`~tmlt.analytics.binning_spec.BinningSpec` immutable.

.. _v0.12.0:

0.12.0 - 2024-06-18
-------------------

This release adds support for left public joins.

Added
~~~~~
- Added support for left public joins to :meth:`~.join_public`; previously, only inner joins were supported.
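
Plain Python is enough to show the difference between the two join types. An inner join drops private rows whose key has no match in the public table. A left join keeps those rows and leaves the public table's columns empty. This sketch assumes standard SQL left-join semantics, with unmatched columns set to ``None``, the Python analogue of null. The ``join_sketch`` helper and the column names are hypothetical, and this is not how :meth:`~.join_public` is implemented.

```python
from typing import Dict, List

Row = Dict[str, object]


def join_sketch(private: List[Row], public: List[Row], on: str, how: str = "inner") -> List[Row]:
    """Hypothetical join of two lists of rows on a single key column."""
    if how not in ("inner", "left"):
        raise ValueError("how must be 'inner' or 'left'")
    # Columns contributed by the public table, excluding the join key.
    public_columns = sorted({c for r in public for c in r if c != on})
    output: List[Row] = []
    for p_row in private:
        matches = [r for r in public if r[on] == p_row[on]]
        if matches:
            for m in matches:
                output.append({**p_row, **m})
        elif how == "left":
            # Left join: keep the unmatched private row, with public columns as None.
            output.append({**p_row, **{c: None for c in public_columns}})
    return output


private = [{"zip": "10001", "books": 3}, {"zip": "99999", "books": 1}]
public = [{"zip": "10001", "city": "New York"}]

print(len(join_sketch(private, public, "zip", how="inner")))  # 1
print(join_sketch(private, public, "zip", how="left")[1])
# {'zip': '99999', 'books': 1, 'city': None}
```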


.. _v0.11.0:

0.11.0 - 2024-06-05
-------------------

This release introduces support in the query language for suppressing aggregates below a certain threshold, providing an easier and clearer way to express queries where small values must be dropped due to potentially high noise.

For macOS users, it also introduces native support for Apple silicon, allowing Tumult Analytics to be used on ARM-based Macs without the need for Rosetta.
Take a look at the updated :ref:`installation guide <Installation instructions>` for more information about this.
If you have an existing installation that uses Rosetta, ensure that you are using a supported native Python installation when switching over.
Users with Intel-based Macs should not be affected.

Added
~~~~~
- Added a ``tmlt.analytics.query_expr.SuppressAggregates`` query type, for suppressing aggregates less than a certain threshold.
  This is currently only supported for post-processing ``tmlt.analytics.query_expr.GroupByCount`` queries.
  These can be built using the :class:`~tmlt.analytics.query_builder.QueryBuilder` by calling ``AggregatedQueryBuilder.suppress`` after building a GroupByCount query.
  As part of this change, query builders now return an ``tmlt.analytics.query_builder.AggregatedQueryBuilder`` instead of a ``tmlt.analytics.query_expr.QueryExpr`` when aggregating;
  the ``tmlt.analytics.query_builder.AggregatedQueryBuilder`` can be passed to :meth:`Session.evaluate <tmlt.analytics.session.Session.evaluate>`, so most existing code should not need to be migrated.
- Added :meth:`~tmlt.analytics.keyset.KeySet.cache` and :meth:`~tmlt.analytics.keyset.KeySet.uncache` methods to :class:`~tmlt.analytics.keyset.KeySet` for caching and uncaching the underlying Spark dataframe.
  These methods can be used to improve performance because KeySets follow Spark's lazy evaluation model.
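
The suppression described in the first entry above is a post-processing step on counts that are already noisy. Groups whose noisy count falls below the threshold are dropped from the output. The pure-Python sketch below shows only that idea. The ``suppress_sketch`` helper, the dictionary representation, and the example counts are hypothetical and do not reflect the library's API or output format. The step only post-processes the noisy counts, so by the post-processing property of differential privacy it does not weaken the privacy guarantee.

```python
from typing import Dict


def suppress_sketch(noisy_counts: Dict[str, int], threshold: int) -> Dict[str, int]:
    """Hypothetical helper: drop groups whose noisy count is below `threshold`."""
    return {group: count for group, count in noisy_counts.items() if count >= threshold}


# Made-up noisy counts; small values are dominated by noise.
noisy_counts = {"fiction": 412, "poetry": 7, "travel": -2, "history": 58}
print(suppress_sketch(noisy_counts, threshold=10))
# {'fiction': 412, 'history': 58}
```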

Changed
~~~~~~~
- :class:`~tmlt.analytics.privacy_budget.PureDPBudget`, :class:`~tmlt.analytics.privacy_budget.ApproxDPBudget`, and :class:`~tmlt.analytics.privacy_budget.RhoZCDPBudget` are now immutable classes.
- :class:`~tmlt.analytics.privacy_budget.PureDPBudget` and :class:`~tmlt.analytics.privacy_budget.ApproxDPBudget` are no longer considered equal if they have the same epsilon and the :class:`~tmlt.analytics.privacy_budget.ApproxDPBudget` has a delta of zero.
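
Both changes above match a familiar Python pattern. Frozen dataclasses are immutable, and instances of two different dataclasses never compare equal, even when their shared fields match. The classes below are hypothetical stand-ins, not the real :class:`~tmlt.analytics.privacy_budget.PureDPBudget` or :class:`~tmlt.analytics.privacy_budget.ApproxDPBudget`. This sketch makes no claim about how those classes are implemented.

```python
import dataclasses
from dataclasses import dataclass


@dataclass(frozen=True)
class PureSketch:
    """Hypothetical stand-in for a pure-DP budget."""

    epsilon: float


@dataclass(frozen=True)
class ApproxSketch:
    """Hypothetical stand-in for an approximate-DP budget."""

    epsilon: float
    delta: float


pure = PureSketch(1.0)
approx = ApproxSketch(1.0, 0.0)

# Different classes: dataclass __eq__ returns NotImplemented, so == is False.
print(pure == approx)  # False

# Frozen instances reject attribute assignment.
try:
    pure.epsilon = 2.0
except dataclasses.FrozenInstanceError:
    print("immutable")
```

A side benefit of this design is that frozen dataclasses with generated equality are also hashable, so such values can be used as dictionary keys.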

.. _v0.10.2:

0.10.2 - 2024-05-31
-------------------

Changed
~~~~~~~
- Column order is now preserved when selecting columns from a :class:`~tmlt.analytics.keyset.KeySet`.

.. _v0.10.1:

0.10.1 - 2024-05-28
-------------------

This release contains no externally-visible changes from the previous version.


.. _v0.10.0:

0.10.0 - 2024-05-17
-------------------

This release adds a new :meth:`~tmlt.analytics.query_builder.QueryBuilder.get_bounds` aggregation.
It also includes performance improvements for :class:`~tmlt.analytics.keyset.KeySet`\ s and other quality-of-life improvements.

Added
~~~~~
- Added the :meth:`QueryBuilder.get_bounds <tmlt.analytics.query_builder.QueryBuilder.get_bounds>` method, for determining upper and lower bounds for a column in a differentially private way.

Changed
~~~~~~~
- If a :class:`~tmlt.analytics.session.Session.Builder` has only one
  private dataframe *and* that dataframe uses the
  :class:`~tmlt.analytics.protected_change.AddRowsWithID` protected change,
  the relevant ID space will automatically be added to the Builder when
  :meth:`~tmlt.analytics.session.Session.Builder.build` is called.
- :class:`~tmlt.analytics.keyset.KeySet` is now an abstract class, in order to
  make some KeySet operations (column selection after cross-products) more
  efficient.
  Behavior is unchanged for users of the :meth:`~tmlt.analytics.keyset.KeySet.from_dict`
  and :meth:`~tmlt.analytics.keyset.KeySet.from_dataframe` constructors.

Fixed
~~~~~
- Stopped trying to set extra options for Java 11, and removed the error raised when these options are not set. Removed ``get_java_11_config()``.
- Updated the minimum supported Spark version to 3.1.1 to prevent a Java 11 error.

.. _v0.9.0:

0.9.0 - 2024-04-16
------------------

This is a maintenance release, fixing a number of bugs and improving our API documentation.

Note that the 0.9.x release series will be the last to support Python 3.7, which has not been receiving security updates for several months.
If this is a problem, please `reach out to us <mailto:info@tmlt.io>`_.

Changed
~~~~~~~
- :class:`~tmlt.analytics.keyset.KeySet` equality is now performed without converting the underlying dataframe to Pandas.
- :meth:`~tmlt.analytics.session.Session.partition_and_create`: the ``column`` and ``splits`` arguments are now annotated as required.
- The minimum supported version of Tumult Core is now 0.13.0.
- The :meth:`QueryBuilder.variance <tmlt.analytics.query_builder.QueryBuilder.variance>`, :meth:`QueryBuilder.stdev <tmlt.analytics.query_builder.QueryBuilder.stdev>`, :meth:`GroupedQueryBuilder.variance <tmlt.analytics.query_builder.GroupedQueryBuilder.variance>`, and :meth:`GroupedQueryBuilder.stdev <tmlt.analytics.query_builder.GroupedQueryBuilder.stdev>` methods now calculate the sample variance or standard deviation, rather than the population variance or standard deviation.
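
The two definitions differ only in the denominator. The population variance divides the sum of squared deviations from the mean by n. The sample variance divides it by n - 1, which is known as Bessel's correction. Python's standard ``statistics`` module provides both, so the difference is easy to see on a small, made-up dataset. This only illustrates the definitions. It does not involve the noise that Tumult Analytics adds to these aggregates.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # mean is 5; squared deviations sum to 32

population_var = statistics.pvariance(data)  # 32 / 8
sample_var = statistics.variance(data)       # 32 / 7

print(f"population: var={population_var:.4f}, stdev={statistics.pstdev(data):.4f}")
print(f"sample:     var={sample_var:.4f}, stdev={statistics.stdev(data):.4f}")
# population: var=4.0000, stdev=2.0000
# sample:     var=4.5714, stdev=2.1381
```

The gap between the two shrinks as n grows, because the ratio n / (n - 1) approaches 1.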

Removed
~~~~~~~
- *Backwards-incompatible*: The ``stability`` and ``grouping_column`` parameters to :meth:`Session.from_dataframe <tmlt.analytics.session.Session.from_dataframe>` and :meth:`Session.Builder.with_private_dataframe <tmlt.analytics.session.Session.Builder.with_private_dataframe>` have been removed (deprecated since :ref:`0.7.0 <v0.7.0>`).
  As a result, the ``protected_change`` parameter to those methods is now required.

Fixed
~~~~~
- The error message when attempting to overspend an :class:`~tmlt.analytics.privacy_budget.ApproxDPBudget` now more clearly indicates which component of the budget was insufficient to evaluate the query.
- :meth:`QueryBuilder.get_groups <tmlt.analytics.query_builder.QueryBuilder.get_groups>` now automatically excludes ID columns if no columns are specified.
- Flat maps now correctly ignore ``max_rows`` when it does not apply.
  Previously they would raise a warning saying that ``max_rows`` was ignored, but would still use it to limit the number of rows in the output.

.. _v0.8.3:

0.8.3 - 2024-02-27
------------------

This is a maintenance release that adds support for newer versions of Tumult Core. It contains no API changes.

.. _v0.8.2:

0.8.2 - 2023-11-29
------------------

This release addresses a serious security vulnerability in PyArrow: `CVE-2023-47248 <https://nvd.nist.gov/vuln/detail/CVE-2023-47248>`__.
It is **strongly recommended** that all users update to this version of Analytics or apply one of the mitigations described in the `GitHub Advisory <https://github.com/advisories/GHSA-5wvp-7f3h-6wmm>`__.

Changed
~~~~~~~
- Increased minimum supported version of Tumult Core to 0.11.5.
  As a result:

  - Increased the minimum supported version of PyArrow to 14.0.1 for Python 3.8 and above.
  - Added dependency on ``pyarrow-hotfix`` on Python 3.7.
    Note that if you are using Python 3.7, the hotfix must be imported before using PySpark in order to be effective.
    Analytics imports the hotfix, so importing Analytics before using Spark will also work.

.. _v0.8.1:

0.8.1 - 2023-10-30
------------------

This release adds support for Python 3.11, as well as compatibility with newer versions of various dependencies, including PySpark.
It also includes documentation improvements, but no API changes.

.. _v0.8.0:

0.8.0 - 2023-08-15
------------------

This is a maintenance release that addresses a performance regression for complex queries and improves naming consistency in some areas of the Tumult Analytics API.

Added
~~~~~
- Added the :meth:`QueryBuilder.get_groups <tmlt.analytics.query_builder.QueryBuilder.get_groups>` function, for determining groupby keys for a table in a differentially private way.

Changed
~~~~~~~
- *Backwards-incompatible*: Renamed ``DropExcess.max_records`` to :attr:`~tmlt.analytics.truncation_strategy.TruncationStrategy.DropExcess.max_rows`.
- *Backwards-incompatible*: Renamed ``FlatMap.max_num_rows`` to ``FlatMap.max_rows``.
- Changed the name of an argument for :meth:`QueryBuilder.flat_map()<tmlt.analytics.query_builder.QueryBuilder.flat_map>` from ``max_num_rows`` to ``max_rows``. The old ``max_num_rows`` argument is deprecated and will be removed in a future release.

Fixed
~~~~~
- Upgraded to version 0.11 of Tumult Core.
  This addresses a performance issue introduced in Tumult Analytics 0.7.0 where some complex queries compiled much more slowly than they had previously.

.. _v0.7.3:

0.7.3 - 2023-07-13
------------------

Fixed
~~~~~
- Fixed a crash in public and private joins.

.. _v0.7.2:

0.7.2 - 2023-06-15
------------------

This release adds support for running Tumult Analytics on Python 3.10.
It also enables adding continuous Gaussian noise to query results, and addresses a number of bugs and API inconsistencies.

Added
~~~~~
- Tumult Analytics now supports Python 3.10 in addition to the previously-supported versions.
- Queries evaluated with zCDP budgets can now use continuous Gaussian noise, allowing the use of Gaussian noise for queries with non-integer results.

Changed
~~~~~~~
- The :meth:`QueryBuilder.replace_null_and_nan()<tmlt.analytics.query_builder.QueryBuilder.replace_null_and_nan>` and :meth:`QueryBuilder.drop_null_and_nan()<tmlt.analytics.query_builder.QueryBuilder.drop_null_and_nan>` methods now accept empty column specifications on tables with an :class:`~tmlt.analytics.protected_change.AddRowsWithID` protected change.
  Replacing/dropping nulls on ID columns is still not allowed, but the ID column will now automatically be excluded in this case rather than raising an exception.
- :meth:`BinningSpec.bins()<tmlt.analytics.binning_spec.BinningSpec.bins>` previously included the NaN bin only if the provided bin edges were floats.
  However, float-valued columns can be binned with integer bin edges, which resulted in a confusing situation where a :class:`~tmlt.analytics.binning_spec.BinningSpec` could indicate that it would not use a NaN bin but still place values in the NaN bin.
  To avoid this, :meth:`BinningSpec.bins()<tmlt.analytics.binning_spec.BinningSpec.bins>` now always includes the NaN bin if one was specified, regardless of whether the bin edge type can represent NaN values.
- The automatically-generated bin names in :class:`~tmlt.analytics.binning_spec.BinningSpec` now quote strings when they are used as bin edges.
  For example, the bin generated by ``BinningSpec(["0", "1"])`` is now ``['0', '1']`` where it was previously ``[0, 1]``.
  Bins with edges of other types are not affected.
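
The quoting rule behaves like Python's ``repr()``, which puts quotes around strings and leaves numbers bare. The ``bin_name`` helper below is a hypothetical illustration that reproduces the two names from the example above. It is not the BinningSpec implementation, and it ignores options such as which bin endpoints are open or closed.

```python
from typing import Union

Edge = Union[int, float, str]


def bin_name(lower: Edge, upper: Edge) -> str:
    """Hypothetical: format a bin as [lower, upper], quoting string edges via repr()."""
    return f"[{lower!r}, {upper!r}]"


print(bin_name(0, 1))      # [0, 1]
print(bin_name("0", "1"))  # ['0', '1']
```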

Fixed
~~~~~
- Creating a :class:`~tmlt.analytics.session.Session` with multiple tables in an ID space used to fail if some of those tables' ID columns allowed nulls and others did not.
  This no longer occurs, and in such cases all of the tables' ID columns are made nullable.

.. _v0.7.1:

0.7.1 - 2023-05-23
------------------

This is a maintenance release that mainly contains documentation updates.
It also fixes a bug where installing Tumult Analytics using pip 23 and above could fail due to a dependency mismatch.

.. _v0.7.0:

0.7.0 - 2023-04-27
------------------

This release adds support for *privacy identifiers*:
Tumult Analytics can now protect input tables in which the differential privacy guarantee needs to hide the presence of arbitrarily many rows sharing the same value in a particular column.
For example, this may be used to protect each user of a service when every row in a table is associated with a user ID.

Privacy identifiers are set up using the new :class:`~tmlt.analytics.protected_change.AddRowsWithID` protected change.
A number of features have been added to the API to support this, including alternative behaviors for various query transformations when working with IDs and the new concept of :mod:`~tmlt.analytics.constraints`.
To get started with these features, take a look at the new :ref:`Working with privacy IDs <Working with privacy IDs>` and :ref:`Doing more with privacy IDs <Advanced IDs features>` tutorials.

Added
~~~~~
- A new :class:`~tmlt.analytics.protected_change.AddRowsWithID` protected change has been added, which protects the addition or removal of all rows with the same value in a specified column.
  See the documentation for :class:`~tmlt.analytics.protected_change.AddRowsWithID` and the :ref:`Doing more with privacy IDs <Advanced IDs features>` tutorial for more information.

  - When creating a Session with :class:`~tmlt.analytics.protected_change.AddRowsWithID` using a :class:`Session.Builder<tmlt.analytics.session.Session.Builder>`, you must use the new :meth:`~tmlt.analytics.session.Session.Builder.with_id_space` method to specify the identifier space(s) of tables using this protected change.
  - When creating a Session with :meth:`Session.from_dataframe()<tmlt.analytics.session.Session.from_dataframe>`, specifying an ID space is not necessary.

- :class:`~tmlt.analytics.query_builder.QueryBuilder` has a new method, :meth:`~tmlt.analytics.query_builder.QueryBuilder.enforce`, for enforcing constraints on a table.
  Types for representing these constraints are located in the new :mod:`tmlt.analytics.constraints` module.
- A new method, :meth:`Session.describe()<tmlt.analytics.session.Session.describe>`, has been added to provide a summary of the tables in a :class:`~tmlt.analytics.session.Session`, or of a single table or the output of a query.
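
Constraints matter with privacy IDs for the following reason. Under ``AddRowsWithID``, a single ID may own arbitrarily many rows, so adding or removing one ID can change a row count by an unbounded amount. Enforcing a limit on rows per ID caps that change. The plain-Python sketch below illustrates the idea. ``truncate_rows_per_id`` is hypothetical, and keeping the first rows seen for each ID is a simplifying assumption, not the rule Tumult Analytics uses to choose which rows to keep.

```python
from collections import defaultdict
from typing import Any, DefaultDict, Dict, List

Row = Dict[str, Any]


def truncate_rows_per_id(rows: List[Row], id_column: str, max_rows: int) -> List[Row]:
    """Hypothetical: keep at most ``max_rows`` rows for each value of ``id_column``."""
    kept: List[Row] = []
    counts: DefaultDict[Any, int] = defaultdict(int)
    for row in rows:
        key = row[id_column]
        if counts[key] < max_rows:  # assumption: the first rows seen for an ID are kept
            counts[key] += 1
            kept.append(row)
    return kept


visits = [
    {"user_id": "a", "page": "home"},
    {"user_id": "a", "page": "cart"},
    {"user_id": "a", "page": "checkout"},
    {"user_id": "b", "page": "home"},
]
truncated = truncate_rows_per_id(visits, "user_id", max_rows=2)
print([row["page"] for row in truncated])  # ['home', 'cart', 'home']
# After truncation, adding or removing one user changes a row count by at most 2.
```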

Changed
~~~~~~~
- :meth:`QueryBuilder.join_private()<tmlt.analytics.query_builder.QueryBuilder.join_private>` now accepts the name of a private table as ``right_operand``.
  For example, ``QueryBuilder("table").join_private("foo")`` is equivalent to ``QueryBuilder("table").join_private(QueryBuilder("foo"))``.
- The ``max_num_rows`` parameter to :meth:`QueryBuilder.flat_map()<tmlt.analytics.query_builder.QueryBuilder.flat_map>` is now optional when applied to tables with an :class:`~tmlt.analytics.protected_change.AddRowsWithID` protected change.
- *Backwards-incompatible*: The parameters to :meth:`QueryBuilder.flat_map()<tmlt.analytics.query_builder.QueryBuilder.flat_map>` have been reordered, moving ``max_num_rows`` to be the last parameter.
- *Backwards-incompatible*: The lower and upper bounds for quantile, sum, average, variance, and standard deviation queries can no longer be equal to one another.
  The lower bound must now be strictly less than the upper bound.
- *Backwards-incompatible*: Renamed :meth:`QueryBuilder.filter()<tmlt.analytics.query_builder.QueryBuilder.filter>` ``predicate`` argument to ``condition``.
- *Backwards-incompatible*: Renamed ``tmlt.analytics.query_expr.Filter`` query expression ``predicate`` property to ``condition``.
- *Backwards-incompatible*: Renamed :meth:`KeySet.filter()<tmlt.analytics.keyset.KeySet.filter>` ``expr`` argument to ``condition``.
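
Below is a minimal sketch of the stricter bounds rule. It uses a hypothetical ``clamped_sum`` helper that computes an exact, noiseless sum after clamping each value into ``[low, high]``. The sketch only illustrates the validation and clamping, not how Tumult Analytics evaluates queries.

```python
from typing import Iterable


def clamped_sum(values: Iterable[float], low: float, high: float) -> float:
    """Hypothetical, noiseless: sum after clamping each value into [low, high]."""
    # Equal bounds are rejected: the lower bound must be strictly below the upper one.
    if not low < high:
        raise ValueError(f"lower bound {low} must be strictly less than upper bound {high}")
    return sum(min(max(v, low), high) for v in values)


print(clamped_sum([1, 5, 12], low=0, high=10))  # 16, since 12 is clamped to 10
```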

Deprecated
~~~~~~~~~~
- The ``stability`` and ``grouping_column`` parameters to :meth:`Session.from_dataframe()<tmlt.analytics.session.Session.from_dataframe>` and :meth:`Session.Builder.with_private_dataframe()<tmlt.analytics.session.Session.Builder.with_private_dataframe>` are deprecated, and will be removed in a future release.
  The ``protected_change`` parameter should be used instead, and will become required.

Removed
~~~~~~~
- The ``attr_name`` parameter to :meth:`Session.partition_and_create()<tmlt.analytics.session.Session.partition_and_create>`, which was deprecated in version 0.5.0, has been removed.

Fixed
~~~~~
- :meth:`Session.add_public_dataframe()<tmlt.analytics.session.Session.add_public_dataframe>` used to allow creation of a public table with the same name as an existing public table, which was neither intended nor fully supported by some :class:`~tmlt.analytics.session.Session` methods.
  It now raises a ``ValueError`` in this case.
- Some query patterns on tables containing nulls could cause grouped aggregations to produce the wrong set of group keys in their output.
  This no longer happens.
- In certain unusual cases, join transformations could erroneously drop rows containing nulls in columns that were not being joined on.
  These rows are no longer dropped.

.. _v0.6.1:

0.6.1 - 2022-12-07
------------------

This is a maintenance release which introduces a number of documentation improvements, but has no publicly-visible API changes.

.. _v0.6.0:

0.6.0 - 2022-12-06
------------------

.. _changelog#protected-change:

This release introduces a new way to specify what unit of data is protected by the privacy guarantee of a :class:`~tmlt.analytics.session.Session`.
A new ``protected_change`` parameter is available when creating a :class:`~tmlt.analytics.session.Session`, taking an instance of the new :class:`~tmlt.analytics.protected_change.ProtectedChange` class which describes the largest unit of data in the resulting table on which the differential privacy guarantee will hold.
See the documentation for the :mod:`~tmlt.analytics.protected_change` module for more information about the available protected changes and how to use them.

The ``stability`` and ``grouping_column`` parameters which were used to specify this information are still accepted, and work as before, but they will be deprecated and eventually removed in future releases.
The default behavior of assuming ``stability=1`` if no other information is given will also be deprecated and removed, on a similar timeline to ``stability`` and ``grouping_column``; instead, explicitly specify ``protected_change=AddOneRow()``.
These changes should make the privacy guarantees provided by the :class:`~tmlt.analytics.session.Session` interface easier to understand and harder to misuse, and allow for future support for other units of protection that were not representable with the existing API.

Added
~~~~~
- As described above, :meth:`Session.Builder.with_private_dataframe <tmlt.analytics.session.Session.Builder.with_private_dataframe>` and :meth:`Session.from_dataframe <tmlt.analytics.session.Session.from_dataframe>` now have a new parameter, ``protected_change``.
  This parameter takes an instance of one of the classes defined in the new :mod:`~tmlt.analytics.protected_change` module, specifying the unit of data in the corresponding table to be protected.

.. _v0.5.1:

0.5.1 - 2022-11-16
------------------

Changed
~~~~~~~

- Updated to Tumult Core 0.6.0.

.. _v0.5.0:

0.5.0 - 2022-10-17
------------------

Added
~~~~~

- Added a diagram to the API reference page.
- Analytics now does an additional Spark configuration check for users running Java 11+ at the time of Analytics Session initialization. If the user is running Java 11 or higher with an incorrect Spark configuration, Analytics raises an informative exception.
- Added a method to check that basic Analytics functionality works (``tmlt.analytics.utils.check_installation``).

Changed
~~~~~~~

- *Backwards-incompatible*: Changed argument names for ``QueryBuilder.count_distinct`` and ``KeySet.__getitem__`` from ``cols`` to ``columns``, for consistency. The old argument has been deprecated, but is still available.
- *Backwards-incompatible*: Changed the argument name for ``Session.partition_and_create`` from ``attr_name`` to ``column``. The old argument has been deprecated, but is still available.
- Improved the error message shown when a filter expression is invalid.
- Updated to Tumult Core 0.5.0.
  As a result, ``python-flint`` is no longer a transitive dependency, simplifying the Analytics installation process.

Deprecated
~~~~~~~~~~

- The contents of the ``cleanup`` module have been moved to the ``utils`` module. The ``cleanup`` module will be removed in a future version.

.. _v0.4.2:

0.4.2 - 2022-09-06
------------------

Fixed
~~~~~

- Switched to Core version 0.4.3 to avoid warnings when evaluating some queries.

.. _v0.4.1:

0.4.1 - 2022-08-25
------------------

Added
~~~~~

- Added ``QueryBuilder.histogram`` function, which provides a shorthand for generating binned data counts.
- Analytics now checks to see if the user is running Java 11 or higher. If they are, Analytics either sets the appropriate Spark options (if Spark is not yet running) or raises an informative exception (if Spark is running and configured incorrectly).
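
Conceptually, a histogram assigns each value to a bin and counts the values in each bin. In Tumult Analytics those counts are computed with differential privacy noise. The noiseless sketch below uses ``bisect`` and half-open bins ``[lo, hi)``, which is an assumption made for illustration. It is not the ``QueryBuilder.histogram`` implementation and does not follow its bin-boundary conventions.

```python
from bisect import bisect_right
from typing import Dict, List, Sequence


def histogram(values: Sequence[float], edges: Sequence[float]) -> Dict[str, int]:
    """Hypothetical, noiseless: count values in half-open bins [lo, hi)."""
    labels: List[str] = [f"[{lo}, {hi})" for lo, hi in zip(edges, edges[1:])]
    counts = {label: 0 for label in labels}
    for v in values:
        i = bisect_right(edges, v) - 1  # index of the last edge that is <= v
        if 0 <= i < len(labels):  # values outside [edges[0], edges[-1]) are skipped
            counts[labels[i]] += 1
    return counts


print(histogram([1, 5, 12, 19, 25], edges=[0, 10, 20]))
# {'[0, 10)': 2, '[10, 20)': 2}
```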

Changed
~~~~~~~

- Improved documentation for ``QueryBuilder.map`` and ``QueryBuilder.flat_map``.

Fixed
~~~~~

- Switched to Core version 0.4.2, which contains a fix for an issue that sometimes caused queries to fail to be compiled.

.. _v0.4.0:

0.4.0 - 2022-07-22
------------------

Added
~~~~~

- ``Session.from_dataframe`` and ``Session.Builder.with_private_dataframe`` now have a ``grouping_column`` option and support non-integer stabilities.
  This allows setting up grouping columns like those that result from grouping flatmaps when loading data.
  This is an advanced feature, and should be used carefully.

.. _v0.3.0:

0.3.0 - 2022-06-23
------------------

Added
~~~~~

- Added ``QueryBuilder.bin_column`` and an associated ``BinningSpec`` type.
- Dates may now be used in ``KeySet``\ s.
- Added support for DataFrames containing NaN and null values. Columns created by Map and FlatMap are now marked as potentially containing NaN and null values.
- Added ``QueryBuilder.replace_null_and_nan`` function, which replaces null and NaN values with specified defaults.
- Added ``QueryBuilder.replace_infinite`` function, which replaces positive and negative infinity values with specified defaults.
- Added ``QueryBuilder.drop_null_and_nan`` function, which drops null and NaN values for specified columns.
- Added ``QueryBuilder.drop_infinite`` function, which drops infinite values for specified columns.
- Aggregations (sum, quantile, average, variance, and standard deviation) now silently drop null and NaN values before being performed.
- Aggregations (sum, quantile, average, variance, and standard deviation) now silently clamp infinite values (+infinity and -infinity) to the query’s lower and upper bounds.
- Added a ``cleanup`` module with two functions: a ``cleanup`` function to remove the current temporary table (which should be called before ``spark.stop()``), and a ``remove_all_temp_tables`` function that removes all temporary tables ever created by Analytics.
- Added a topic guide in the documentation for Tumult Analytics’ treatment of null, NaN, and infinite values.
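
The two aggregation rules above can be sketched in plain Python: null and NaN values are dropped, and infinities are clamped to the query's bounds. ``prepare_for_aggregation`` is a hypothetical helper, and ``None`` stands in for a Spark null. In Tumult Analytics the cleaned values then feed a noisy aggregation, which is omitted here.

```python
import math
from typing import List, Optional


def prepare_for_aggregation(
    values: List[Optional[float]], low: float, high: float
) -> List[float]:
    """Hypothetical: drop null/NaN, then clamp +/-infinity to [low, high]."""
    cleaned: List[float] = []
    for v in values:
        if v is None or math.isnan(v):
            continue  # nulls and NaNs are silently dropped
        if math.isinf(v):
            v = high if v > 0 else low  # infinities are clamped to the bounds
        cleaned.append(v)  # finite values are left unchanged in this sketch
    return cleaned


data = [1.5, None, float("nan"), float("inf"), -float("inf"), 3.0]
print(prepare_for_aggregation(data, low=0.0, high=10.0))  # [1.5, 10.0, 0.0, 3.0]
```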

Changed
~~~~~~~

- *Backwards-incompatible*: Sessions no longer allow DataFrames to contain a column named ``""`` (the empty string).
- *Backwards-incompatible*: You can no longer call ``Session.Builder.with_privacy_budget`` multiple times on the same builder.
- *Backwards-incompatible*: You can no longer call ``Session.add_private_data`` multiple times with the same source ID.
- *Backwards-incompatible*: Sessions now use the DataFrame’s schema to determine which columns are nullable.

Removed
~~~~~~~

- *Backwards-incompatible*: Removed ``groupby_public_source`` and ``groupby_domains`` from ``QueryBuilder``.
- *Backwards-incompatible*: ``Session.from_csv`` and CSV-related methods on ``Session.Builder`` have been removed.
  Instead, use ``spark.read.csv`` along with ``Session.from_dataframe`` and other dataframe-based methods.
- *Backwards-incompatible*: Removed the ``validate`` option from ``Session.from_dataframe``, ``Session.add_public_dataframe``, ``Session.Builder.with_private_dataframe``, and ``Session.Builder.with_public_dataframe``.
- *Backwards-incompatible*: Removed ``KeySet.contains_nan_or_null``.

Fixed
~~~~~

- *Backwards-incompatible*: ``KeySet``\ s now explicitly check for and disallow the use of floats and timestamps as keys.
  This has always been the intended behavior, but it was previously not checked for and could work or cause non-obvious errors depending on the situation.
- ``KeySet.dataframe()`` now always returns a dataframe where all rows are distinct.
- Under certain circumstances, evaluating a ``GroupByCountDistinct`` query expression used to modify the input ``QueryExpr``.
  This no longer occurs.
- It is now possible to partition on a column created by a grouping flat map, which previously raised an exception from Core.

.. _v0.2.1:

0.2.1 - 2022-04-14 (internal release)
-------------------------------------

Added
~~~~~

- Added support for basic operations (filter, map, etc.) on Spark date and timestamp columns.
  ``ColumnType`` has two new variants, ``DATE`` and ``TIMESTAMP``, to support these.
- Future documentation will now include any exceptions defined in Analytics.

Changed
~~~~~~~

- Switched Session to use Persist/Unpersist instead of Cache.

.. _v0.2.0:

0.2.0 - 2022-03-28 (internal release)
-------------------------------------

Removed
~~~~~~~

- Support for evaluating multiple queries at once has been removed entirely.
- Columns that are neither floats nor doubles will no longer be checked for NaN values.
- The ``BIT`` variant of the ``ColumnType`` enum was removed, as it was not supported elsewhere in Analytics.

Changed
~~~~~~~

- *Backwards-incompatible*: Renamed ``query_exprs`` parameter in ``Session.evaluate`` to ``query_expr``.
- *Backwards-incompatible*: ``QueryBuilder.join_public`` and the ``JoinPublic`` query expression can now accept public tables specified as Spark dataframes. The existing behavior using public source IDs is still supported, but the ``public_id`` parameter/property is now called ``public_table``.
- Installation on Python 3.7.1 through 3.7.3 is now allowed.
- KeySets now do type coercion on creation, matching the type coercion that Sessions do for private sources.
- Sessions created by ``partition_and_create`` must be used in the order they were created, and using the parent session will forcibly close all child sessions.
  Sessions can be manually closed with ``session.stop()``.

Fixed
~~~~~

- Joining with a public table that contains no NaNs, but has a column where NaNs are allowed, previously caused an error when compiling queries. This is now handled correctly.

.. _v0.1.1:

0.1.1 - 2022-02-28 (internal release)
-------------------------------------

Added
~~~~~

- Added a ``KeySet`` class, which will eventually be used for all GroupBy queries.
- Added ``QueryBuilder.groupby()``, a new group-by based on ``KeySet``\ s.

Changed
~~~~~~~

- The Analytics library now uses ``KeySet`` and ``QueryBuilder.groupby()`` for all GroupBy queries.
- The various ``Session`` methods for loading in data from CSV no longer support loading the data’s schema from a file.
- Session now returns a more user-friendly error message when the user provides a privacy budget of 0.
- Removed all instances of the old name of this library, and replaced them with “Analytics”.

Deprecated
~~~~~~~~~~

- ``QueryBuilder.groupby_domains()`` and ``QueryBuilder.groupby_public_source()`` are now deprecated in favor of using ``QueryBuilder.groupby()`` with ``KeySet``\ s.
  They will be removed in a future version.

.. _v0.1.0:

0.1.0 - 2022-02-15 (internal release)
-------------------------------------

Added
~~~~~

- Initial release.