splink 4.0.0.dev5__tar.gz → 4.0.0.dev8__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (161) hide show
  1. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/PKG-INFO +48 -44
  2. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/README.md +47 -42
  3. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/pyproject.toml +1 -2
  4. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/__init__.py +1 -1
  5. splink-4.0.0.dev8/splink/backends/athena.py +3 -0
  6. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/blocking_analysis.py +2 -0
  7. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/comparison_library.py +10 -0
  8. splink-4.0.0.dev8/splink/exploratory.py +5 -0
  9. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/accuracy.py +24 -21
  10. splink-4.0.0.dev8/splink/internals/athena/database_api.py +266 -0
  11. splink-4.0.0.dev8/splink/internals/athena/dataframe.py +119 -0
  12. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/blocking.py +30 -31
  13. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/blocking_analysis.py +104 -22
  14. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/blocking_rule_library.py +29 -2
  15. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/cluster_studio.py +7 -7
  16. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/comparison.py +2 -1
  17. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/comparison_creator.py +13 -8
  18. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/comparison_level_composition.py +0 -1
  19. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/comparison_level_library.py +1 -1
  20. splink-4.0.0.dev8/splink/internals/comparison_library.py +1120 -0
  21. splink-4.0.0.dev8/splink/internals/comparison_vector_values.py +96 -0
  22. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/completeness.py +0 -3
  23. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/connected_components.py +14 -48
  24. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/database_api.py +13 -2
  25. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/databricks/enable_splink.py +16 -18
  26. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/dialects.py +21 -10
  27. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/duckdb/dataframe.py +3 -1
  28. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/edge_metrics.py +6 -6
  29. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/em_training_session.py +36 -22
  30. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/estimate_u.py +13 -10
  31. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/find_brs_with_comparison_counts_below_threshold.py +20 -3
  32. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/find_matches_to_new_records.py +16 -5
  33. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/input_column.py +5 -5
  34. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/labelling_tool.py +2 -2
  35. splink-4.0.0.dev8/splink/internals/linker.py +766 -0
  36. splink-4.0.0.dev8/splink/internals/linker_components/clustering.py +278 -0
  37. splink-4.0.0.dev8/splink/internals/linker_components/evaluation.py +393 -0
  38. splink-4.0.0.dev8/splink/internals/linker_components/inference.py +614 -0
  39. splink-4.0.0.dev8/splink/internals/linker_components/misc.py +89 -0
  40. splink-4.0.0.dev8/splink/internals/linker_components/table_management.py +217 -0
  41. splink-4.0.0.dev8/splink/internals/linker_components/training.py +476 -0
  42. splink-4.0.0.dev8/splink/internals/linker_components/visualisations.py +412 -0
  43. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/m_from_labels.py +5 -2
  44. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/m_training.py +18 -7
  45. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/match_weights_histogram.py +10 -3
  46. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/optimise_cost_of_brs.py +2 -3
  47. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/predict.py +1 -6
  48. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/profile_data.py +1 -1
  49. splink-4.0.0.dev5/splink/internals/comparison_helpers.py → splink-4.0.0.dev8/splink/internals/similarity_analysis.py +39 -21
  50. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/spark/database_api.py +9 -0
  51. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/splink_dataframe.py +8 -8
  52. splink-4.0.0.dev8/splink/internals/sqlite/__init__.py +0 -0
  53. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/term_frequencies.py +5 -3
  54. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/unlinkables.py +2 -2
  55. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/vertically_concatenate.py +2 -2
  56. splink-4.0.0.dev5/splink/comparison_template_library.py +0 -15
  57. splink-4.0.0.dev5/splink/exploratory.py +0 -4
  58. splink-4.0.0.dev5/splink/internals/athena/linker.py +0 -563
  59. splink-4.0.0.dev5/splink/internals/comparison_library.py +0 -646
  60. splink-4.0.0.dev5/splink/internals/comparison_template_library.py +0 -666
  61. splink-4.0.0.dev5/splink/internals/comparison_vector_values.py +0 -30
  62. splink-4.0.0.dev5/splink/internals/linker.py +0 -2835
  63. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/LICENSE +0 -0
  64. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/backends/spark.py +0 -0
  65. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/backends/sqlite.py +0 -0
  66. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/comparison_level_library.py +0 -0
  67. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/datasets.py +0 -0
  68. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/__init__.py +0 -0
  69. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/athena/__init__.py +0 -0
  70. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/athena/athena_helpers/__init__.py +0 -0
  71. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/athena/athena_helpers/athena_transforms.py +0 -0
  72. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/athena/athena_helpers/athena_utils.py +0 -0
  73. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/block_from_labels.py +0 -0
  74. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/blocking_rule_creator.py +0 -0
  75. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/blocking_rule_creator_utils.py +0 -0
  76. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/cache_dict_with_logging.py +0 -0
  77. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/charts.py +0 -0
  78. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/column_expression.py +0 -0
  79. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/comparison_level.py +0 -0
  80. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/comparison_level_creator.py +0 -0
  81. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/comparison_level_sql.py +0 -0
  82. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/comparison_vector_distribution.py +0 -0
  83. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/constants.py +0 -0
  84. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/cost_of_blocking_rules.py +0 -0
  85. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/databricks/__init__.py +0 -0
  86. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/datasets/__init__.py +0 -0
  87. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/default_from_jsonschema.py +0 -0
  88. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/duckdb/__init__.py +0 -0
  89. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/duckdb/database_api.py +0 -0
  90. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/duckdb/duckdb_helpers/__init__.py +0 -0
  91. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/duckdb/duckdb_helpers/duckdb_helpers.py +0 -0
  92. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/exceptions.py +0 -0
  93. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/expectation_maximisation.py +0 -0
  94. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/files/DEPENDENCY_LICENSES.txt +0 -0
  95. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/files/chart_defs/accuracy_chart.json +0 -0
  96. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/files/chart_defs/blocking_rule_generated_comparisons.json +0 -0
  97. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/files/chart_defs/comparator_score_chart.json +0 -0
  98. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/files/chart_defs/comparator_score_threshold_chart.json +0 -0
  99. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/files/chart_defs/completeness.json +0 -0
  100. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/files/chart_defs/m_u_parameters_interactive_history.json +0 -0
  101. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/files/chart_defs/match_weight_histogram.json +0 -0
  102. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/files/chart_defs/match_weights_interactive_history.json +0 -0
  103. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/files/chart_defs/match_weights_waterfall.json +0 -0
  104. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/files/chart_defs/missingness.json +0 -0
  105. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/files/chart_defs/parameter_estimate_comparisons.json +0 -0
  106. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/files/chart_defs/phonetic_match_chart.json +0 -0
  107. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/files/chart_defs/precision_recall.json +0 -0
  108. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/files/chart_defs/probability_two_random_records_match_iteration.json +0 -0
  109. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/files/chart_defs/profile_data.json +0 -0
  110. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/files/chart_defs/roc.json +0 -0
  111. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/files/chart_defs/tf_adjustment_chart.json +0 -0
  112. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/files/chart_defs/threshold_selection_tool.json +0 -0
  113. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/files/chart_defs/unlinkables_chart_def.json +0 -0
  114. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/files/external_js/d3@7.8.5 +0 -0
  115. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/files/external_js/stdlib.js@5.8.3 +0 -0
  116. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/files/external_js/vega-embed@6.20.2 +0 -0
  117. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/files/external_js/vega-lite@5.2.0 +0 -0
  118. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/files/external_js/vega@5.21.0 +0 -0
  119. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/files/labelling_tool/slt.js +0 -0
  120. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/files/labelling_tool/template.j2 +0 -0
  121. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/files/settings_jsonschema.json +0 -0
  122. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/files/spark_jars/scala-udf-similarity-0.1.0_classic.jar +0 -0
  123. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/files/spark_jars/scala-udf-similarity-0.1.0_spark3.3.jar +0 -0
  124. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/files/spark_jars/scala-udf-similarity-0.1.1_spark3.x.jar +0 -0
  125. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/files/splink_cluster_studio/cluster_template.j2 +0 -0
  126. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/files/splink_cluster_studio/custom.css +0 -0
  127. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/files/splink_comparison_viewer/custom.css +0 -0
  128. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/files/splink_comparison_viewer/template.j2 +0 -0
  129. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/files/splink_vis_utils/splink_vis_utils.js +0 -0
  130. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/files/templates/single_chart_template.html +0 -0
  131. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/graph_metrics.py +0 -0
  132. {splink-4.0.0.dev5/splink/internals/postgres → splink-4.0.0.dev8/splink/internals/linker_components}/__init__.py +0 -0
  133. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/logging_messages.py +0 -0
  134. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/lower_id_on_lhs.py +0 -0
  135. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/m_u_records_to_parameters.py +0 -0
  136. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/match_key_analysis.py +0 -0
  137. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/misc.py +0 -0
  138. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/parse_sql.py +0 -0
  139. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/pipeline.py +0 -0
  140. {splink-4.0.0.dev5/splink/internals/settings_validation → splink-4.0.0.dev8/splink/internals/postgres}/__init__.py +0 -0
  141. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/postgres/database_api.py +0 -0
  142. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/postgres/dataframe.py +0 -0
  143. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/settings.py +0 -0
  144. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/settings_creator.py +0 -0
  145. {splink-4.0.0.dev5/splink/internals/spark → splink-4.0.0.dev8/splink/internals/settings_validation}/__init__.py +0 -0
  146. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/settings_validation/log_invalid_columns.py +0 -0
  147. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/settings_validation/settings_column_cleaner.py +0 -0
  148. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/settings_validation/settings_validation_log_strings.py +0 -0
  149. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/settings_validation/valid_types.py +0 -0
  150. {splink-4.0.0.dev5/splink/internals/spark/spark_helpers → splink-4.0.0.dev8/splink/internals/spark}/__init__.py +0 -0
  151. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/spark/dataframe.py +0 -0
  152. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/spark/jar_location.py +0 -0
  153. {splink-4.0.0.dev5/splink/internals/sqlite → splink-4.0.0.dev8/splink/internals/spark/spark_helpers}/__init__.py +0 -0
  154. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/spark/spark_helpers/custom_spark_dialect.py +0 -0
  155. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/splink_comparison_viewer.py +0 -0
  156. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/sql_transform.py +0 -0
  157. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/sqlite/database_api.py +0 -0
  158. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/sqlite/dataframe.py +0 -0
  159. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/unique_id_concat.py +0 -0
  160. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/validate_jsonschema.py +0 -0
  161. {splink-4.0.0.dev5 → splink-4.0.0.dev8}/splink/internals/waterfall_chart.py +0 -0
@@ -1,6 +1,6 @@
1
1
  Metadata-Version: 2.1
2
2
  Name: splink
3
- Version: 4.0.0.dev5
3
+ Version: 4.0.0.dev8
4
4
  Summary: Fast probabilistic data linkage at scale
5
5
  Home-page: https://github.com/moj-analytical-services/splink
6
6
  License: MIT
@@ -27,7 +27,6 @@ Requires-Dist: jsonschema (>=3.2)
27
27
  Requires-Dist: numpy (>=1.17.3) ; python_version < "3.12"
28
28
  Requires-Dist: numpy (>=1.26.0) ; python_version >= "3.12"
29
29
  Requires-Dist: pandas (>1.3.5)
30
- Requires-Dist: phonetics (>=1.0.5)
31
30
  Requires-Dist: psycopg2-binary (>=2.8.0) ; extra == "postgres"
32
31
  Requires-Dist: pyspark (>=3.2.1) ; extra == "pyspark" or extra == "spark"
33
32
  Requires-Dist: sqlglot (>=13.0.0)
@@ -51,11 +50,11 @@ Splink is a Python package for probabilistic record linkage (entity resolution)
51
50
 
52
51
  ## Key Features
53
52
 
54
- ⚡ **Speed:** Capable of linking a million records on a laptop in around a minute.
55
- 🎯 **Accuracy:** Support for term frequency adjustments and user-defined fuzzy matching logic.
56
- 🌐 **Scalability:** Execute linkage in Python (using DuckDB) or big-data backends like AWS Athena or Spark for 100+ million records.
57
- 🎓 **Unsupervised Learning:** No training data is required for model training.
58
- 📊 **Interactive Outputs:** A suite of interactive visualisations help users understand their model and diagnose problems.
53
+ ⚡ **Speed:** Capable of linking a million records on a laptop in around a minute.<br>
54
+ 🎯 **Accuracy:** Support for term frequency adjustments and user-defined fuzzy matching logic.<br>
55
+ 🌐 **Scalability:** Execute linkage in Python (using DuckDB) or big-data backends like AWS Athena or Spark for 100+ million records.<br>
56
+ 🎓 **Unsupervised Learning:** No training data is required for model training.<br>
57
+ 📊 **Interactive Outputs:** A suite of interactive visualisations help users understand their model and diagnose problems.<br>
59
58
 
60
59
  Splink's linkage algorithm is based on Fellegi-Sunter's model of record linkage, with various customisations to improve accuracy.
61
60
 
@@ -75,19 +74,16 @@ and clusters these links to produce an estimated person ID:
75
74
 
76
75
  ## What data does Splink work best with?
77
76
 
78
- Before using Splink, input data should be standardised, with consistent column names and formatting (e.g., lowercased, punctuation cleaned up, etc.).
79
-
80
77
  Splink performs best with input data containing **multiple** columns that are **not highly correlated**. For instance, if the entity type is persons, you may have columns for full name, date of birth, and city. If the entity type is companies, you could have columns for name, turnover, sector, and telephone number.
81
78
 
82
- High correlation occurs when the value of a column is highly constrained (predictable) from the value of another column. For example, a 'city' field is almost perfectly correlated with 'postcode'. Gender is highly correlated with 'first name'. Correlation is particularly problematic if **all** of your input columns are highly correlated.
79
+ High correlation occurs when one column is highly predictable from another - for instance, city can be predicted from postcode. Correlation is particularly problematic if **all** of your input columns are highly correlated.
83
80
 
84
81
  Splink is not designed for linking a single column containing a 'bag of words'. For example, a table with a single 'company name' column, and no other details.
85
82
 
86
83
  ## Documentation
87
84
 
88
- The homepage for the Splink documentation can be found [here](https://moj-analytical-services.github.io/splink/). Interactive demos can be found [here](https://github.com/moj-analytical-services/splink/tree/master/docs/demos), or by clicking the following Binder link:
85
+ The homepage for the Splink documentation can be found [here](https://moj-analytical-services.github.io/splink/), including a [tutorial](https://moj-analytical-services.github.io/splink/demos/tutorials/00_Tutorial_Introduction.html) and [examples](https://moj-analytical-services.github.io/splink/demos/examples/examples_index.html) that can be run in the browser.
89
86
 
90
- [![Binder](https://mybinder.org/badge.svg)](https://mybinder.org/v2/gh/moj-analytical-services/splink/binder_branch?labpath=docs%2Fdemos%2Ftutorials%2F00_Tutorial_Introduction.ipynb)
91
87
 
92
88
  The specification of the Fellegi Sunter statistical model behind `splink` is similar as that used in the R [fastLink package](https://github.com/kosukeimai/fastLink). Accompanying the fastLink package is an [academic paper](http://imai.fas.harvard.edu/research/files/linkage.pdf) that describes this model. The [Splink documentation site](https://moj-analytical-services.github.io/splink/topic_guides/fellegi_sunter.html) and a [series of interactive articles](https://www.robinlinacre.com/probabilistic_linkage/) also explores the theory behind Splink.
93
89
 
@@ -143,43 +139,56 @@ The following code demonstrates how to estimate the parameters of a deduplicatio
143
139
  For a more detailed tutorial, please see [here](https://moj-analytical-services.github.io/splink/demos/tutorials/00_Tutorial_Introduction.html).
144
140
 
145
141
  ```py
146
- from splink.duckdb.linker import DuckDBLinker
147
- import splink.duckdb.comparison_library as cl
148
- import splink.duckdb.comparison_template_library as ctl
149
- from splink.duckdb.blocking_rule_library import block_on
150
- from splink.datasets import splink_datasets
142
+ import splink.comparison_library as cl
143
+ import splink.comparison_template_library as ctl
144
+ from splink import DuckDBAPI, Linker, SettingsCreator, block_on, splink_datasets
145
+
146
+ db_api = DuckDBAPI()
151
147
 
152
148
  df = splink_datasets.fake_1000
153
149
 
154
- settings = {
155
- "link_type": "dedupe_only",
156
- "blocking_rules_to_generate_predictions": [
150
+ settings = SettingsCreator(
151
+ link_type="dedupe_only",
152
+ comparisons=[
153
+ cl.JaroWinklerAtThresholds("first_name", [0.9, 0.7]),
154
+ cl.JaroAtThresholds("surname", [0.9, 0.7]),
155
+ ctl.DateComparison(
156
+ "dob",
157
+ input_is_string=True,
158
+ datetime_metrics=["year", "month"],
159
+ datetime_thresholds=[1, 1],
160
+ ),
161
+ cl.ExactMatch("city").configure(term_frequency_adjustments=True),
162
+ ctl.EmailComparison("email"),
163
+ ],
164
+ blocking_rules_to_generate_predictions=[
157
165
  block_on("first_name"),
158
166
  block_on("surname"),
159
- ],
160
- "comparisons": [
161
- ctl.name_comparison("first_name"),
162
- ctl.name_comparison("surname"),
163
- ctl.date_comparison("dob", cast_strings_to_date=True),
164
- cl.exact_match("city", term_frequency_adjustments=True),
165
- ctl.email_comparison("email", include_username_fuzzy_level=False),
166
- ],
167
- }
167
+ ]
168
+ )
169
+
170
+ linker = Linker(df, settings, db_api)
171
+
172
+ linker.training.estimate_probability_two_random_records_match(
173
+ [block_on("first_name", "surname")],
174
+ recall=0.7,
175
+ )
168
176
 
169
- linker = DuckDBLinker(df, settings)
170
- linker.estimate_u_using_random_sampling(max_pairs=1e6)
177
+ linker.training.estimate_u_using_random_sampling(max_pairs=1e6)
171
178
 
172
- blocking_rule_for_training = block_on(["first_name", "surname"])
179
+ linker.training.estimate_parameters_using_expectation_maximisation(
180
+ block_on("first_name", "surname")
181
+ )
173
182
 
174
- linker.estimate_parameters_using_expectation_maximisation(blocking_rule_for_training, estimate_without_term_frequencies=True)
183
+ linker.training.estimate_parameters_using_expectation_maximisation(block_on("dob"))
175
184
 
176
- blocking_rule_for_training = block_on("dob")
177
- linker.estimate_parameters_using_expectation_maximisation(blocking_rule_for_training, estimate_without_term_frequencies=True)
185
+ pairwise_predictions = linker.inference.predict(threshold_match_weight=-10)
178
186
 
179
- pairwise_predictions = linker.predict()
187
+ clusters = linker.clustering.cluster_pairwise_predictions_at_threshold(
188
+ pairwise_predictions, 0.95
189
+ )
180
190
 
181
- clusters = linker.cluster_pairwise_predictions_at_threshold(pairwise_predictions, 0.95)
182
- clusters.as_pandas_dataframe(limit=5)
191
+ df_clusters = clusters.as_pandas_dataframe(limit=5)
183
192
  ```
184
193
 
185
194
  ## Videos
@@ -187,13 +196,10 @@ clusters.as_pandas_dataframe(limit=5)
187
196
  - [An introductory presentation on Splink](https://www.youtube.com/watch?v=msz3T741KQI)
188
197
  - [An introduction to the Splink Comparison Viewer dashboard](https://www.youtube.com/watch?v=DNvCMqjipis)
189
198
 
190
- ## Charts Gallery
191
-
192
- You can see all of the interactive charts provided in Splink by checking out the [Charts Gallery](https://moj-analytical-services.github.io/splink/charts/index.html).
193
199
 
194
200
  ## Support
195
201
 
196
- To find the best place to ask a question, report a bug or get general advice, please refer to our [Contributing Guide](./CONTRIBUTING.md).
202
+ To find the best place to ask a question, report a bug or get general advice, please refer to our [Guide](./CONTRIBUTING.md).
197
203
 
198
204
  ## Use Cases
199
205
 
@@ -201,8 +207,6 @@ To see how users are using Splink in the wild, check out the [Use Cases](https:/
201
207
 
202
208
  ## Awards
203
209
 
204
- ❓ Future of Government Awards 2023: Open Source Creation - [Shortlisted, result to be announced shortly](https://futureofgovernment.com/en)
205
-
206
210
  🥈 Civil Service Awards 2023: Best Use of Data, Science, and Technology - [Runner up](https://www.civilserviceawards.com/best-use-of-data-science-and-technology-award-2/)
207
211
 
208
212
  🥇 Analysis in Government Awards 2022: People's Choice Award - [Winner](https://analysisfunction.civilservice.gov.uk/news/announcing-the-winner-of-the-first-analysis-in-government-peoples-choice-award/)
@@ -15,11 +15,11 @@ Splink is a Python package for probabilistic record linkage (entity resolution)
15
15
 
16
16
  ## Key Features
17
17
 
18
- ⚡ **Speed:** Capable of linking a million records on a laptop in around a minute.
19
- 🎯 **Accuracy:** Support for term frequency adjustments and user-defined fuzzy matching logic.
20
- 🌐 **Scalability:** Execute linkage in Python (using DuckDB) or big-data backends like AWS Athena or Spark for 100+ million records.
21
- 🎓 **Unsupervised Learning:** No training data is required for model training.
22
- 📊 **Interactive Outputs:** A suite of interactive visualisations help users understand their model and diagnose problems.
18
+ ⚡ **Speed:** Capable of linking a million records on a laptop in around a minute.<br>
19
+ 🎯 **Accuracy:** Support for term frequency adjustments and user-defined fuzzy matching logic.<br>
20
+ 🌐 **Scalability:** Execute linkage in Python (using DuckDB) or big-data backends like AWS Athena or Spark for 100+ million records.<br>
21
+ 🎓 **Unsupervised Learning:** No training data is required for model training.<br>
22
+ 📊 **Interactive Outputs:** A suite of interactive visualisations help users understand their model and diagnose problems.<br>
23
23
 
24
24
  Splink's linkage algorithm is based on Fellegi-Sunter's model of record linkage, with various customisations to improve accuracy.
25
25
 
@@ -39,19 +39,16 @@ and clusters these links to produce an estimated person ID:
39
39
 
40
40
  ## What data does Splink work best with?
41
41
 
42
- Before using Splink, input data should be standardised, with consistent column names and formatting (e.g., lowercased, punctuation cleaned up, etc.).
43
-
44
42
  Splink performs best with input data containing **multiple** columns that are **not highly correlated**. For instance, if the entity type is persons, you may have columns for full name, date of birth, and city. If the entity type is companies, you could have columns for name, turnover, sector, and telephone number.
45
43
 
46
- High correlation occurs when the value of a column is highly constrained (predictable) from the value of another column. For example, a 'city' field is almost perfectly correlated with 'postcode'. Gender is highly correlated with 'first name'. Correlation is particularly problematic if **all** of your input columns are highly correlated.
44
+ High correlation occurs when one column is highly predictable from another - for instance, city can be predicted from postcode. Correlation is particularly problematic if **all** of your input columns are highly correlated.
47
45
 
48
46
  Splink is not designed for linking a single column containing a 'bag of words'. For example, a table with a single 'company name' column, and no other details.
49
47
 
50
48
  ## Documentation
51
49
 
52
- The homepage for the Splink documentation can be found [here](https://moj-analytical-services.github.io/splink/). Interactive demos can be found [here](https://github.com/moj-analytical-services/splink/tree/master/docs/demos), or by clicking the following Binder link:
50
+ The homepage for the Splink documentation can be found [here](https://moj-analytical-services.github.io/splink/), including a [tutorial](https://moj-analytical-services.github.io/splink/demos/tutorials/00_Tutorial_Introduction.html) and [examples](https://moj-analytical-services.github.io/splink/demos/examples/examples_index.html) that can be run in the browser.
53
51
 
54
- [![Binder](https://mybinder.org/badge.svg)](https://mybinder.org/v2/gh/moj-analytical-services/splink/binder_branch?labpath=docs%2Fdemos%2Ftutorials%2F00_Tutorial_Introduction.ipynb)
55
52
 
56
53
  The specification of the Fellegi Sunter statistical model behind `splink` is similar to that used in the R [fastLink package](https://github.com/kosukeimai/fastLink). Accompanying the fastLink package is an [academic paper](http://imai.fas.harvard.edu/research/files/linkage.pdf) that describes this model. The [Splink documentation site](https://moj-analytical-services.github.io/splink/topic_guides/fellegi_sunter.html) and a [series of interactive articles](https://www.robinlinacre.com/probabilistic_linkage/) also explores the theory behind Splink.
57
54
 
@@ -107,43 +104,56 @@ The following code demonstrates how to estimate the parameters of a deduplicatio
107
104
  For a more detailed tutorial, please see [here](https://moj-analytical-services.github.io/splink/demos/tutorials/00_Tutorial_Introduction.html).
108
105
 
109
106
  ```py
110
- from splink.duckdb.linker import DuckDBLinker
111
- import splink.duckdb.comparison_library as cl
112
- import splink.duckdb.comparison_template_library as ctl
113
- from splink.duckdb.blocking_rule_library import block_on
114
- from splink.datasets import splink_datasets
107
+ import splink.comparison_library as cl
108
+ import splink.comparison_template_library as ctl
109
+ from splink import DuckDBAPI, Linker, SettingsCreator, block_on, splink_datasets
110
+
111
+ db_api = DuckDBAPI()
115
112
 
116
113
  df = splink_datasets.fake_1000
117
114
 
118
- settings = {
119
- "link_type": "dedupe_only",
120
- "blocking_rules_to_generate_predictions": [
115
+ settings = SettingsCreator(
116
+ link_type="dedupe_only",
117
+ comparisons=[
118
+ cl.JaroWinklerAtThresholds("first_name", [0.9, 0.7]),
119
+ cl.JaroAtThresholds("surname", [0.9, 0.7]),
120
+ ctl.DateComparison(
121
+ "dob",
122
+ input_is_string=True,
123
+ datetime_metrics=["year", "month"],
124
+ datetime_thresholds=[1, 1],
125
+ ),
126
+ cl.ExactMatch("city").configure(term_frequency_adjustments=True),
127
+ ctl.EmailComparison("email"),
128
+ ],
129
+ blocking_rules_to_generate_predictions=[
121
130
  block_on("first_name"),
122
131
  block_on("surname"),
123
- ],
124
- "comparisons": [
125
- ctl.name_comparison("first_name"),
126
- ctl.name_comparison("surname"),
127
- ctl.date_comparison("dob", cast_strings_to_date=True),
128
- cl.exact_match("city", term_frequency_adjustments=True),
129
- ctl.email_comparison("email", include_username_fuzzy_level=False),
130
- ],
131
- }
132
+ ]
133
+ )
134
+
135
+ linker = Linker(df, settings, db_api)
136
+
137
+ linker.training.estimate_probability_two_random_records_match(
138
+ [block_on("first_name", "surname")],
139
+ recall=0.7,
140
+ )
132
141
 
133
- linker = DuckDBLinker(df, settings)
134
- linker.estimate_u_using_random_sampling(max_pairs=1e6)
142
+ linker.training.estimate_u_using_random_sampling(max_pairs=1e6)
135
143
 
136
- blocking_rule_for_training = block_on(["first_name", "surname"])
144
+ linker.training.estimate_parameters_using_expectation_maximisation(
145
+ block_on("first_name", "surname")
146
+ )
137
147
 
138
- linker.estimate_parameters_using_expectation_maximisation(blocking_rule_for_training, estimate_without_term_frequencies=True)
148
+ linker.training.estimate_parameters_using_expectation_maximisation(block_on("dob"))
139
149
 
140
- blocking_rule_for_training = block_on("dob")
141
- linker.estimate_parameters_using_expectation_maximisation(blocking_rule_for_training, estimate_without_term_frequencies=True)
150
+ pairwise_predictions = linker.inference.predict(threshold_match_weight=-10)
142
151
 
143
- pairwise_predictions = linker.predict()
152
+ clusters = linker.clustering.cluster_pairwise_predictions_at_threshold(
153
+ pairwise_predictions, 0.95
154
+ )
144
155
 
145
- clusters = linker.cluster_pairwise_predictions_at_threshold(pairwise_predictions, 0.95)
146
- clusters.as_pandas_dataframe(limit=5)
156
+ df_clusters = clusters.as_pandas_dataframe(limit=5)
147
157
  ```
148
158
 
149
159
  ## Videos
@@ -151,13 +161,10 @@ clusters.as_pandas_dataframe(limit=5)
151
161
  - [A introductory presentation on Splink](https://www.youtube.com/watch?v=msz3T741KQI)
152
162
  - [An introduction to the Splink Comparison Viewer dashboard](https://www.youtube.com/watch?v=DNvCMqjipis)
153
163
 
154
- ## Charts Gallery
155
-
156
- You can see all of the interactive charts provided in Splink by checking out the [Charts Gallery](https://moj-analytical-services.github.io/splink/charts/index.html).
157
164
 
158
165
  ## Support
159
166
 
160
- To find the best place to ask a question, report a bug or get general advice, please refer to our [Contributing Guide](./CONTRIBUTING.md).
167
+ To find the best place to ask a question, report a bug or get general advice, please refer to our [Guide](./CONTRIBUTING.md).
161
168
 
162
169
  ## Use Cases
163
170
 
@@ -165,8 +172,6 @@ To see how users are using Splink in the wild, check out the [Use Cases](https:/
165
172
 
166
173
  ## Awards
167
174
 
168
- ❓ Future of Government Awards 2023: Open Source Creation - [Shortlisted, result to be announced shortly](https://futureofgovernment.com/en)
169
-
170
175
  🥈 Civil Service Awards 2023: Best Use of Data, Science, and Technology - [Runner up](https://www.civilserviceawards.com/best-use-of-data-science-and-technology-award-2/)
171
176
 
172
177
  🥇 Analysis in Government Awards 2022: People's Choice Award - [Winner](https://analysisfunction.civilservice.gov.uk/news/announcing-the-winner-of-the-first-analysis-in-government-peoples-choice-award/)
@@ -1,6 +1,6 @@
1
1
  [tool.poetry]
2
2
  name = "splink"
3
- version = "4.0.0.dev5"
3
+ version = "4.0.0.dev8"
4
4
  description = "Fast probabilistic data linkage at scale"
5
5
  authors = ["Robin Linacre <robinlinacre@hotmail.com>", "Sam Lindsay", "Theodore Manassis", "Tom Hepworth", "Andy Bond", "Ross Kennedy"]
6
6
  license = "MIT"
@@ -17,7 +17,6 @@ duckdb = ">=0.9.2"
17
17
  sqlglot = ">=13.0.0"
18
18
  altair = "^5.0.1"
19
19
  Jinja2 = ">=3.0.3"
20
- phonetics = ">=1.0.5"
21
20
 
22
21
  # need to manually specify numpy versions suitable for CI
23
22
  # 1.24.4 works with python 3.8, but not 3.12
@@ -44,7 +44,7 @@ def __getattr__(name):
44
44
  raise AttributeError(f"module 'splink' has no attribute '{name}'") from None
45
45
 
46
46
 
47
- __version__ = "4.0.0.dev5"
47
+ __version__ = "4.0.0.dev8"
48
48
 
49
49
 
50
50
  __all__ = [
@@ -0,0 +1,3 @@
1
+ from splink.internals.athena.database_api import AthenaAPI
2
+
3
+ __all__ = ["AthenaAPI"]
@@ -2,10 +2,12 @@ from .internals.blocking_analysis import (
2
2
  count_comparisons_from_blocking_rule,
3
3
  cumulative_comparisons_to_be_scored_from_blocking_rules_chart,
4
4
  cumulative_comparisons_to_be_scored_from_blocking_rules_data,
5
+ n_largest_blocks,
5
6
  )
6
7
 
7
8
  __all__ = [
8
9
  "count_comparisons_from_blocking_rule",
9
10
  "cumulative_comparisons_to_be_scored_from_blocking_rules_chart",
10
11
  "cumulative_comparisons_to_be_scored_from_blocking_rules_data",
12
+ "n_largest_blocks",
11
13
  ]
@@ -4,13 +4,18 @@ from splink.internals.comparison_library import (
4
4
  ArrayIntersectAtSizes,
5
5
  CustomComparison,
6
6
  DamerauLevenshteinAtThresholds,
7
+ DateOfBirthComparison,
7
8
  DistanceFunctionAtThresholds,
8
9
  DistanceInKMAtThresholds,
10
+ EmailComparison,
9
11
  ExactMatch,
12
+ ForenameSurnameComparison,
10
13
  JaccardAtThresholds,
11
14
  JaroAtThresholds,
12
15
  JaroWinklerAtThresholds,
13
16
  LevenshteinAtThresholds,
17
+ NameComparison,
18
+ PostcodeComparison,
14
19
  )
15
20
 
16
21
  __all__ = [
@@ -26,4 +31,9 @@ __all__ = [
26
31
  "AbsoluteDateDifferenceAtThresholds",
27
32
  "ArrayIntersectAtSizes",
28
33
  "DistanceInKMAtThresholds",
34
+ "DateOfBirthComparison",
35
+ "EmailComparison",
36
+ "ForenameSurnameComparison",
37
+ "NameComparison",
38
+ "PostcodeComparison",
29
39
  ]
@@ -0,0 +1,5 @@
1
+ from .internals import similarity_analysis
2
+ from .internals.completeness import completeness_chart
3
+ from .internals.profile_data import profile_columns
4
+
5
+ __all__ = ["completeness_chart", "profile_columns", "similarity_analysis"]
@@ -1,7 +1,7 @@
1
1
  from __future__ import annotations
2
2
 
3
3
  from copy import deepcopy
4
- from typing import TYPE_CHECKING
4
+ from typing import TYPE_CHECKING, Optional
5
5
 
6
6
  from splink.internals.block_from_labels import block_from_labels
7
7
  from splink.internals.blocking import BlockingRule
@@ -307,8 +307,11 @@ def _select_found_by_blocking_rules(linker: "Linker") -> str:
307
307
 
308
308
 
309
309
  def truth_space_table_from_labels_table(
310
- linker, labels_tablename, threshold_actual=0.5, match_weight_round_to_nearest=None
311
- ):
310
+ linker: Linker,
311
+ labels_tablename: str,
312
+ threshold_actual: float = 0.5,
313
+ match_weight_round_to_nearest: Optional[float] = None,
314
+ ) -> SplinkDataFrame:
312
315
  pipeline = CTEPipeline()
313
316
 
314
317
  nodes_with_tf = compute_df_concat_with_tf(linker, pipeline)
@@ -323,7 +326,7 @@ def truth_space_table_from_labels_table(
323
326
  )
324
327
  pipeline.enqueue_list_of_sqls(sqls)
325
328
 
326
- df_truth_space_table = linker.db_api.sql_pipeline_to_splink_dataframe(pipeline)
329
+ df_truth_space_table = linker._db_api.sql_pipeline_to_splink_dataframe(pipeline)
327
330
 
328
331
  return df_truth_space_table
329
332
 
@@ -356,7 +359,7 @@ def truth_space_table_from_labels_column(
356
359
  """
357
360
 
358
361
  pipeline.enqueue_sql(sql, "__splink__cartesian_product")
359
- cartesian_count = linker.db_api.sql_pipeline_to_splink_dataframe(pipeline)
362
+ cartesian_count = linker._db_api.sql_pipeline_to_splink_dataframe(pipeline)
360
363
  row_count_df = cartesian_count.as_record_dict()
361
364
  cartesian_count.drop_table_from_database_and_remove_from_cache()
362
365
 
@@ -393,7 +396,7 @@ def truth_space_table_from_labels_column(
393
396
  )
394
397
  pipeline.enqueue_list_of_sqls(sqls)
395
398
 
396
- df_truth_space_table = linker.db_api.sql_pipeline_to_splink_dataframe(pipeline)
399
+ df_truth_space_table = linker._db_api.sql_pipeline_to_splink_dataframe(pipeline)
397
400
 
398
401
  return df_truth_space_table
399
402
 
@@ -439,12 +442,12 @@ def predictions_from_sample_of_pairwise_labels_sql(linker, labels_tablename):
439
442
 
440
443
 
441
444
  def prediction_errors_from_labels_table(
442
- linker,
443
- labels_tablename,
444
- include_false_positives=True,
445
- include_false_negatives=True,
446
- threshold=0.5,
447
- ):
445
+ linker: Linker,
446
+ labels_tablename: str,
447
+ include_false_positives: bool = True,
448
+ include_false_negatives: bool = True,
449
+ threshold: float = 0.5,
450
+ ) -> SplinkDataFrame:
448
451
  pipeline = CTEPipeline()
449
452
  nodes_with_tf = compute_df_concat_with_tf(linker, pipeline)
450
453
  pipeline = CTEPipeline([nodes_with_tf])
@@ -486,7 +489,7 @@ def prediction_errors_from_labels_table(
486
489
 
487
490
  pipeline.enqueue_sql(sql, "__splink__labels_with_fp_fn_status")
488
491
 
489
- return linker.db_api.sql_pipeline_to_splink_dataframe(pipeline)
492
+ return linker._db_api.sql_pipeline_to_splink_dataframe(pipeline)
490
493
 
491
494
 
492
495
  def _predict_from_label_column_sql(linker, label_colname):
@@ -509,18 +512,18 @@ def _predict_from_label_column_sql(linker, label_colname):
509
512
  settings._additional_column_names_to_retain.append(label_colname)
510
513
 
511
514
  # Now we want to create predictions
512
- df_predict = linker.predict()
515
+ df_predict = linker.inference.predict()
513
516
 
514
517
  return df_predict
515
518
 
516
519
 
517
520
  def prediction_errors_from_label_column(
518
- linker,
519
- label_colname,
520
- include_false_positives=True,
521
- include_false_negatives=True,
522
- threshold=0.5,
523
- ):
521
+ linker: Linker,
522
+ label_colname: str,
523
+ include_false_positives: bool = True,
524
+ include_false_negatives: bool = True,
525
+ threshold: float = 0.5,
526
+ ) -> SplinkDataFrame:
524
527
  df_predict = _predict_from_label_column_sql(
525
528
  linker,
526
529
  label_colname,
@@ -577,6 +580,6 @@ def prediction_errors_from_label_column(
577
580
 
578
581
  pipeline.enqueue_sql(sql, "__splink__predictions_from_label_column_fp_fn_only")
579
582
 
580
- predictions = linker.db_api.sql_pipeline_to_splink_dataframe(pipeline)
583
+ predictions = linker._db_api.sql_pipeline_to_splink_dataframe(pipeline)
581
584
 
582
585
  return predictions