resolvekit 0.0.1__tar.gz → 0.1.1__tar.gz
- resolvekit-0.1.1/LICENSE +21 -0
- resolvekit-0.1.1/PKG-INFO +225 -0
- resolvekit-0.1.1/README.md +176 -0
- resolvekit-0.1.1/pyproject.toml +333 -0
- resolvekit-0.1.1/src/resolvekit/NOTICE.md +87 -0
- resolvekit-0.1.1/src/resolvekit/__init__.py +121 -0
- resolvekit-0.1.1/src/resolvekit/_convenience.py +603 -0
- resolvekit-0.1.1/src/resolvekit/_data/geo/admin1/metadata.json +60 -0
- resolvekit-0.1.1/src/resolvekit/_data/geo/admin2/metadata.json +60 -0
- resolvekit-0.1.1/src/resolvekit/_data/geo/admin3/metadata.json +61 -0
- resolvekit-0.1.1/src/resolvekit/_data/geo/admin4/metadata.json +60 -0
- resolvekit-0.1.1/src/resolvekit/_data/geo/admin5/metadata.json +60 -0
- resolvekit-0.1.1/src/resolvekit/_data/geo/cities/metadata.json +61 -0
- resolvekit-0.1.1/src/resolvekit/_data/geo/continental_unions/entities.sqlite +0 -0
- resolvekit-0.1.1/src/resolvekit/_data/geo/continental_unions/metadata.json +44 -0
- resolvekit-0.1.1/src/resolvekit/_data/geo/continental_unions/symspell.dict +164 -0
- resolvekit-0.1.1/src/resolvekit/_data/geo/continents/entities.sqlite +0 -0
- resolvekit-0.1.1/src/resolvekit/_data/geo/continents/metadata.json +44 -0
- resolvekit-0.1.1/src/resolvekit/_data/geo/continents/symspell.dict +87 -0
- resolvekit-0.1.1/src/resolvekit/_data/geo/countries/entities.sqlite +0 -0
- resolvekit-0.1.1/src/resolvekit/_data/geo/countries/geo_calibrator.json +7 -0
- resolvekit-0.1.1/src/resolvekit/_data/geo/countries/metadata.json +49 -0
- resolvekit-0.1.1/src/resolvekit/_data/geo/countries/symspell.dict +4489 -0
- resolvekit-0.1.1/src/resolvekit/_data/geo/regions/entities.sqlite +0 -0
- resolvekit-0.1.1/src/resolvekit/_data/geo/regions/metadata.json +44 -0
- resolvekit-0.1.1/src/resolvekit/_data/geo/regions/symspell.dict +1500 -0
- resolvekit-0.1.1/src/resolvekit/_data/manifest.json +267 -0
- resolvekit-0.1.1/src/resolvekit/_data/org/companies/entities.sqlite +0 -0
- resolvekit-0.1.1/src/resolvekit/_data/org/companies/metadata.json +44 -0
- resolvekit-0.1.1/src/resolvekit/_data/org/companies/symspell.dict +1262 -0
- resolvekit-0.1.1/src/resolvekit/_data/org/data_sources/entities.sqlite +0 -0
- resolvekit-0.1.1/src/resolvekit/_data/org/data_sources/metadata.json +44 -0
- resolvekit-0.1.1/src/resolvekit/_data/org/data_sources/symspell.dict +313 -0
- resolvekit-0.1.1/src/resolvekit/_data/org/governments/entities.sqlite +0 -0
- resolvekit-0.1.1/src/resolvekit/_data/org/governments/metadata.json +46 -0
- resolvekit-0.1.1/src/resolvekit/_data/org/governments/symspell.dict +3828 -0
- resolvekit-0.1.1/src/resolvekit/_data/org/lenders/entities.sqlite +0 -0
- resolvekit-0.1.1/src/resolvekit/_data/org/lenders/metadata.json +44 -0
- resolvekit-0.1.1/src/resolvekit/_data/org/lenders/symspell.dict +196 -0
- resolvekit-0.1.1/src/resolvekit/_data/org/political_parties/entities.sqlite +0 -0
- resolvekit-0.1.1/src/resolvekit/_data/org/political_parties/metadata.json +44 -0
- resolvekit-0.1.1/src/resolvekit/_data/org/political_parties/symspell.dict +160 -0
- resolvekit-0.1.1/src/resolvekit/_data/org/providers/entities.sqlite +0 -0
- resolvekit-0.1.1/src/resolvekit/_data/org/providers/metadata.json +44 -0
- resolvekit-0.1.1/src/resolvekit/_data/org/providers/symspell.dict +598 -0
- resolvekit-0.1.1/src/resolvekit/_data/parse/deny_list.json +26 -0
- resolvekit-0.1.1/src/resolvekit/_pandas_integration.py +125 -0
- resolvekit-0.1.1/src/resolvekit/_polars_integration.py +100 -0
- resolvekit-0.1.1/src/resolvekit/builder/__init__.py +40 -0
- resolvekit-0.1.1/src/resolvekit/builder/_outcomes.py +29 -0
- resolvekit-0.1.1/src/resolvekit/builder/api.py +310 -0
- resolvekit-0.1.1/src/resolvekit/builder/containment.py +160 -0
- resolvekit-0.1.1/src/resolvekit/builder/country_geonames_aliases.py +256 -0
- resolvekit-0.1.1/src/resolvekit/builder/datapack_layout.py +99 -0
- resolvekit-0.1.1/src/resolvekit/builder/entity_validity.py +87 -0
- resolvekit-0.1.1/src/resolvekit/builder/formal_names.py +226 -0
- resolvekit-0.1.1/src/resolvekit/builder/geo_shared.py +644 -0
- resolvekit-0.1.1/src/resolvekit/builder/groups.py +146 -0
- resolvekit-0.1.1/src/resolvekit/builder/inspection.py +105 -0
- resolvekit-0.1.1/src/resolvekit/builder/models.py +165 -0
- resolvekit-0.1.1/src/resolvekit/builder/module_catalog.py +257 -0
- resolvekit-0.1.1/src/resolvekit/builder/oecd_dac.py +699 -0
- resolvekit-0.1.1/src/resolvekit/builder/pipeline/__init__.py +37 -0
- resolvekit-0.1.1/src/resolvekit/builder/pipeline/build_report.py +66 -0
- resolvekit-0.1.1/src/resolvekit/builder/pipeline/changelog.py +167 -0
- resolvekit-0.1.1/src/resolvekit/builder/pipeline/chunk.py +243 -0
- resolvekit-0.1.1/src/resolvekit/builder/pipeline/contribution.py +171 -0
- resolvekit-0.1.1/src/resolvekit/builder/pipeline/core.py +236 -0
- resolvekit-0.1.1/src/resolvekit/builder/pipeline/discover.py +409 -0
- resolvekit-0.1.1/src/resolvekit/builder/pipeline/enrich.py +398 -0
- resolvekit-0.1.1/src/resolvekit/builder/pipeline/geo_staging.py +218 -0
- resolvekit-0.1.1/src/resolvekit/builder/pipeline/packaging.py +415 -0
- resolvekit-0.1.1/src/resolvekit/builder/pipeline/promote.py +49 -0
- resolvekit-0.1.1/src/resolvekit/builder/pipeline/qa.py +58 -0
- resolvekit-0.1.1/src/resolvekit/builder/pipeline/reconcile.py +186 -0
- resolvekit-0.1.1/src/resolvekit/builder/pipeline/stages.py +543 -0
- resolvekit-0.1.1/src/resolvekit/builder/pipeline/types.py +205 -0
- resolvekit-0.1.1/src/resolvekit/builder/presets.py +40 -0
- resolvekit-0.1.1/src/resolvekit/builder/registry.py +68 -0
- resolvekit-0.1.1/src/resolvekit/builder/sources/__init__.py +11 -0
- resolvekit-0.1.1/src/resolvekit/builder/sources/datacommons/__init__.py +43 -0
- resolvekit-0.1.1/src/resolvekit/builder/sources/datacommons/adapter.py +154 -0
- resolvekit-0.1.1/src/resolvekit/builder/sources/datacommons/base_dc_api.py +154 -0
- resolvekit-0.1.1/src/resolvekit/builder/sources/datacommons/bundle.py +127 -0
- resolvekit-0.1.1/src/resolvekit/builder/sources/datacommons/canonicalize.py +229 -0
- resolvekit-0.1.1/src/resolvekit/builder/sources/datacommons/client.py +452 -0
- resolvekit-0.1.1/src/resolvekit/builder/sources/datacommons/constants.py +36 -0
- resolvekit-0.1.1/src/resolvekit/builder/sources/datacommons/geo/__init__.py +15 -0
- resolvekit-0.1.1/src/resolvekit/builder/sources/datacommons/geo/_admin_walk.py +356 -0
- resolvekit-0.1.1/src/resolvekit/builder/sources/datacommons/geo/_chunk_callback.py +73 -0
- resolvekit-0.1.1/src/resolvekit/builder/sources/datacommons/geo/_geo_regions.py +92 -0
- resolvekit-0.1.1/src/resolvekit/builder/sources/datacommons/geo/_ordered_emitter.py +111 -0
- resolvekit-0.1.1/src/resolvekit/builder/sources/datacommons/geo/_progress_context.py +27 -0
- resolvekit-0.1.1/src/resolvekit/builder/sources/datacommons/geo/_streaming.py +143 -0
- resolvekit-0.1.1/src/resolvekit/builder/sources/datacommons/geo/_type_mappings.py +65 -0
- resolvekit-0.1.1/src/resolvekit/builder/sources/datacommons/geo/adapter.py +126 -0
- resolvekit-0.1.1/src/resolvekit/builder/sources/datacommons/geo/dc_api.py +287 -0
- resolvekit-0.1.1/src/resolvekit/builder/sources/datacommons/geo/discovery.py +415 -0
- resolvekit-0.1.1/src/resolvekit/builder/sources/datacommons/geo/fetch.py +123 -0
- resolvekit-0.1.1/src/resolvekit/builder/sources/datacommons/geo/mappings.py +177 -0
- resolvekit-0.1.1/src/resolvekit/builder/sources/datacommons/geo/profile.py +77 -0
- resolvekit-0.1.1/src/resolvekit/builder/sources/datacommons/geo/prominence.py +152 -0
- resolvekit-0.1.1/src/resolvekit/builder/sources/datacommons/models.py +140 -0
- resolvekit-0.1.1/src/resolvekit/builder/sources/datacommons/node.py +283 -0
- resolvekit-0.1.1/src/resolvekit/builder/sources/datacommons/org/__init__.py +11 -0
- resolvekit-0.1.1/src/resolvekit/builder/sources/datacommons/org/adapter.py +60 -0
- resolvekit-0.1.1/src/resolvekit/builder/sources/datacommons/org/dc_api.py +133 -0
- resolvekit-0.1.1/src/resolvekit/builder/sources/datacommons/org/discovery.py +79 -0
- resolvekit-0.1.1/src/resolvekit/builder/sources/datacommons/org/fetch.py +73 -0
- resolvekit-0.1.1/src/resolvekit/builder/sources/datacommons/org/mappings.py +83 -0
- resolvekit-0.1.1/src/resolvekit/builder/sources/datacommons/org/profile.py +155 -0
- resolvekit-0.1.1/src/resolvekit/builder/sources/datacommons/rows.py +277 -0
- resolvekit-0.1.1/src/resolvekit/builder/sources/datacommons/specs.py +76 -0
- resolvekit-0.1.1/src/resolvekit/builder/sources/datacommons/text.py +20 -0
- resolvekit-0.1.1/src/resolvekit/builder/sources/discovery_events.py +149 -0
- resolvekit-0.1.1/src/resolvekit/builder/sources/protocol.py +129 -0
- resolvekit-0.1.1/src/resolvekit/builder/sources/seed/continents.py +181 -0
- resolvekit-0.1.1/src/resolvekit/builder/sources/seed/m49.py +571 -0
- resolvekit-0.1.1/src/resolvekit/builder/sources/wikidata/__init__.py +8 -0
- resolvekit-0.1.1/src/resolvekit/builder/sources/wikidata/aliases.py +221 -0
- resolvekit-0.1.1/src/resolvekit/builder/sources/wikidata/sitelinks.py +91 -0
- resolvekit-0.1.1/src/resolvekit/builder/sqlite/__init__.py +55 -0
- resolvekit-0.1.1/src/resolvekit/builder/sqlite/constants.py +14 -0
- resolvekit-0.1.1/src/resolvekit/builder/sqlite/context.py +15 -0
- resolvekit-0.1.1/src/resolvekit/builder/sqlite/diff.py +294 -0
- resolvekit-0.1.1/src/resolvekit/builder/sqlite/export.py +211 -0
- resolvekit-0.1.1/src/resolvekit/builder/sqlite/specs.py +99 -0
- resolvekit-0.1.1/src/resolvekit/builder/sqlite/validate.py +176 -0
- resolvekit-0.1.1/src/resolvekit/builder/sqlite/write.py +219 -0
- resolvekit-0.1.1/src/resolvekit/builder/state.py +368 -0
- resolvekit-0.1.1/src/resolvekit/builder/utils.py +64 -0
- resolvekit-0.1.1/src/resolvekit/calibration/__init__.py +112 -0
- resolvekit-0.1.1/src/resolvekit/calibration/adapters/__init__.py +36 -0
- resolvekit-0.1.1/src/resolvekit/calibration/adapters/_cldr_source.py +77 -0
- resolvekit-0.1.1/src/resolvekit/calibration/adapters/_latin_filter.py +14 -0
- resolvekit-0.1.1/src/resolvekit/calibration/adapters/_wikidata_client.py +100 -0
- resolvekit-0.1.1/src/resolvekit/calibration/adapters/cldr.py +104 -0
- resolvekit-0.1.1/src/resolvekit/calibration/adapters/geonames.py +183 -0
- resolvekit-0.1.1/src/resolvekit/calibration/adapters/multilingual_names.py +272 -0
- resolvekit-0.1.1/src/resolvekit/calibration/adapters/synthetic.py +447 -0
- resolvekit-0.1.1/src/resolvekit/calibration/adapters/wikidata.py +205 -0
- resolvekit-0.1.1/src/resolvekit/calibration/dataset.py +321 -0
- resolvekit-0.1.1/src/resolvekit/calibration/evaluation.py +210 -0
- resolvekit-0.1.1/src/resolvekit/calibration/fitting.py +254 -0
- resolvekit-0.1.1/src/resolvekit/calibration/models.py +110 -0
- resolvekit-0.1.1/src/resolvekit/calibration/scoring_model.py +87 -0
- resolvekit-0.1.1/src/resolvekit/calibration/train.py +537 -0
- resolvekit-0.1.1/src/resolvekit/calibration/vectorize.py +77 -0
- resolvekit-0.1.1/src/resolvekit/core/__init__.py +117 -0
- resolvekit-0.1.1/src/resolvekit/core/api/__init__.py +7 -0
- resolvekit-0.1.1/src/resolvekit/core/api/_byod.py +362 -0
- resolvekit-0.1.1/src/resolvekit/core/api/_pivot.py +77 -0
- resolvekit-0.1.1/src/resolvekit/core/api/batch.py +164 -0
- resolvekit-0.1.1/src/resolvekit/core/api/bulk.py +787 -0
- resolvekit-0.1.1/src/resolvekit/core/api/cache.py +103 -0
- resolvekit-0.1.1/src/resolvekit/core/api/code_lookup.py +219 -0
- resolvekit-0.1.1/src/resolvekit/core/api/containment_api.py +105 -0
- resolvekit-0.1.1/src/resolvekit/core/api/diagnostics.py +176 -0
- resolvekit-0.1.1/src/resolvekit/core/api/entity_lookup.py +123 -0
- resolvekit-0.1.1/src/resolvekit/core/api/group_api.py +219 -0
- resolvekit-0.1.1/src/resolvekit/core/api/info.py +186 -0
- resolvekit-0.1.1/src/resolvekit/core/api/inspect.py +125 -0
- resolvekit-0.1.1/src/resolvekit/core/api/loading/__init__.py +50 -0
- resolvekit-0.1.1/src/resolvekit/core/api/loading/module_catalog.py +329 -0
- resolvekit-0.1.1/src/resolvekit/core/api/loading/pack_loader.py +168 -0
- resolvekit-0.1.1/src/resolvekit/core/api/loading/paths.py +213 -0
- resolvekit-0.1.1/src/resolvekit/core/api/loading/store_builder.py +131 -0
- resolvekit-0.1.1/src/resolvekit/core/api/modules.py +112 -0
- resolvekit-0.1.1/src/resolvekit/core/api/output_spec.py +333 -0
- resolvekit-0.1.1/src/resolvekit/core/api/output_view.py +184 -0
- resolvekit-0.1.1/src/resolvekit/core/api/query_prep.py +107 -0
- resolvekit-0.1.1/src/resolvekit/core/api/resolve_flow.py +366 -0
- resolvekit-0.1.1/src/resolvekit/core/api/resolver.py +2673 -0
- resolvekit-0.1.1/src/resolvekit/core/api/snap.py +115 -0
- resolvekit-0.1.1/src/resolvekit/core/api/suggest_flow.py +283 -0
- resolvekit-0.1.1/src/resolvekit/core/byod/__init__.py +21 -0
- resolvekit-0.1.1/src/resolvekit/core/byod/build.py +361 -0
- resolvekit-0.1.1/src/resolvekit/core/byod/builder.py +37 -0
- resolvekit-0.1.1/src/resolvekit/core/byod/cache.py +250 -0
- resolvekit-0.1.1/src/resolvekit/core/byod/intake.py +464 -0
- resolvekit-0.1.1/src/resolvekit/core/byod/result.py +41 -0
- resolvekit-0.1.1/src/resolvekit/core/config.py +133 -0
- resolvekit-0.1.1/src/resolvekit/core/datapack.py +513 -0
- resolvekit-0.1.1/src/resolvekit/core/download_api.py +110 -0
- resolvekit-0.1.1/src/resolvekit/core/engine/__init__.py +57 -0
- resolvekit-0.1.1/src/resolvekit/core/engine/_stages.py +475 -0
- resolvekit-0.1.1/src/resolvekit/core/engine/config.py +47 -0
- resolvekit-0.1.1/src/resolvekit/core/engine/decision.py +293 -0
- resolvekit-0.1.1/src/resolvekit/core/engine/enrichment.py +582 -0
- resolvekit-0.1.1/src/resolvekit/core/engine/interfaces.py +531 -0
- resolvekit-0.1.1/src/resolvekit/core/engine/multi_runner.py +979 -0
- resolvekit-0.1.1/src/resolvekit/core/engine/router.py +328 -0
- resolvekit-0.1.1/src/resolvekit/core/engine/runner.py +860 -0
- resolvekit-0.1.1/src/resolvekit/core/engine/suggest_rank.py +288 -0
- resolvekit-0.1.1/src/resolvekit/core/engine/tier_utils.py +147 -0
- resolvekit-0.1.1/src/resolvekit/core/errors.py +546 -0
- resolvekit-0.1.1/src/resolvekit/core/errors_base.py +38 -0
- resolvekit-0.1.1/src/resolvekit/core/explain/__init__.py +58 -0
- resolvekit-0.1.1/src/resolvekit/core/explain/events.py +41 -0
- resolvekit-0.1.1/src/resolvekit/core/explain/feature_text.py +47 -0
- resolvekit-0.1.1/src/resolvekit/core/explain/helpers.py +89 -0
- resolvekit-0.1.1/src/resolvekit/core/explain/protocol.py +32 -0
- resolvekit-0.1.1/src/resolvekit/core/explain/renderers.py +368 -0
- resolvekit-0.1.1/src/resolvekit/core/explain/result_html.py +245 -0
- resolvekit-0.1.1/src/resolvekit/core/explain/result_types.py +29 -0
- resolvekit-0.1.1/src/resolvekit/core/explain/scorecard.py +588 -0
- resolvekit-0.1.1/src/resolvekit/core/explain/sink.py +119 -0
- resolvekit-0.1.1/src/resolvekit/core/linking/__init__.py +24 -0
- resolvekit-0.1.1/src/resolvekit/core/linking/base_linker.py +104 -0
- resolvekit-0.1.1/src/resolvekit/core/linking/base_normalizer.py +58 -0
- resolvekit-0.1.1/src/resolvekit/core/linking/linker.py +133 -0
- resolvekit-0.1.1/src/resolvekit/core/linking/normalizer.py +51 -0
- resolvekit-0.1.1/src/resolvekit/core/merge.py +99 -0
- resolvekit-0.1.1/src/resolvekit/core/model/__init__.py +82 -0
- resolvekit-0.1.1/src/resolvekit/core/model/_repr.py +92 -0
- resolvekit-0.1.1/src/resolvekit/core/model/bulk_result.py +593 -0
- resolvekit-0.1.1/src/resolvekit/core/model/candidate.py +186 -0
- resolvekit-0.1.1/src/resolvekit/core/model/crosswalk.py +262 -0
- resolvekit-0.1.1/src/resolvekit/core/model/entity.py +250 -0
- resolvekit-0.1.1/src/resolvekit/core/model/entity_attributes.py +142 -0
- resolvekit-0.1.1/src/resolvekit/core/model/features.py +14 -0
- resolvekit-0.1.1/src/resolvekit/core/model/generation.py +45 -0
- resolvekit-0.1.1/src/resolvekit/core/model/inspection.py +122 -0
- resolvekit-0.1.1/src/resolvekit/core/model/name_grammar.py +179 -0
- resolvekit-0.1.1/src/resolvekit/core/model/query.py +99 -0
- resolvekit-0.1.1/src/resolvekit/core/model/result.py +486 -0
- resolvekit-0.1.1/src/resolvekit/core/module_registry.py +382 -0
- resolvekit-0.1.1/src/resolvekit/core/overlay_loader.py +166 -0
- resolvekit-0.1.1/src/resolvekit/core/parse/__init__.py +9 -0
- resolvekit-0.1.1/src/resolvekit/core/parse/_pivot.py +109 -0
- resolvekit-0.1.1/src/resolvekit/core/parse/automaton.py +302 -0
- resolvekit-0.1.1/src/resolvekit/core/parse/denylist.py +66 -0
- resolvekit-0.1.1/src/resolvekit/core/parse/detect.py +179 -0
- resolvekit-0.1.1/src/resolvekit/core/parse/engine.py +273 -0
- resolvekit-0.1.1/src/resolvekit/core/parse/link.py +381 -0
- resolvekit-0.1.1/src/resolvekit/core/parse/offsets.py +442 -0
- resolvekit-0.1.1/src/resolvekit/core/parse/result.py +336 -0
- resolvekit-0.1.1/src/resolvekit/core/registry.py +280 -0
- resolvekit-0.1.1/src/resolvekit/core/remote.py +330 -0
- resolvekit-0.1.1/src/resolvekit/core/store/__init__.py +13 -0
- resolvekit-0.1.1/src/resolvekit/core/store/composed_sqlite.py +434 -0
- resolvekit-0.1.1/src/resolvekit/core/store/composite.py +191 -0
- resolvekit-0.1.1/src/resolvekit/core/store/interface.py +315 -0
- resolvekit-0.1.1/src/resolvekit/core/store/merging.py +255 -0
- resolvekit-0.1.1/src/resolvekit/core/store/sqlite.py +1028 -0
- resolvekit-0.1.1/src/resolvekit/core/store/sqlite_helpers.py +159 -0
- resolvekit-0.1.1/src/resolvekit/core/store/store_view.py +175 -0
- resolvekit-0.1.1/src/resolvekit/core/util/__init__.py +21 -0
- resolvekit-0.1.1/src/resolvekit/core/util/normalization.py +327 -0
- resolvekit-0.1.1/src/resolvekit/core/util/sentinel.py +183 -0
- resolvekit-0.1.1/src/resolvekit/core/version.py +118 -0
- resolvekit-0.1.1/src/resolvekit/diagnostics/__init__.py +77 -0
- resolvekit-0.1.1/src/resolvekit/errors/__init__.py +70 -0
- resolvekit-0.1.1/src/resolvekit/extensions.py +32 -0
- resolvekit-0.1.1/src/resolvekit/packs/_artifacts.py +43 -0
- resolvekit-0.1.1/src/resolvekit/packs/custom/__init__.py +12 -0
- resolvekit-0.1.1/src/resolvekit/packs/custom/decision.py +27 -0
- resolvekit-0.1.1/src/resolvekit/packs/custom/extractor.py +71 -0
- resolvekit-0.1.1/src/resolvekit/packs/custom/features.py +38 -0
- resolvekit-0.1.1/src/resolvekit/packs/custom/normalizer.py +20 -0
- resolvekit-0.1.1/src/resolvekit/packs/custom/pack.py +142 -0
- resolvekit-0.1.1/src/resolvekit/packs/custom/scoring.py +69 -0
- resolvekit-0.1.1/src/resolvekit/packs/custom/sources/__init__.py +13 -0
- resolvekit-0.1.1/src/resolvekit/packs/custom/sources/exact_code.py +53 -0
- resolvekit-0.1.1/src/resolvekit/packs/custom/sources/exact_name.py +86 -0
- resolvekit-0.1.1/src/resolvekit/packs/custom/sources/fts.py +24 -0
- resolvekit-0.1.1/src/resolvekit/packs/custom/sources/fuzzy.py +20 -0
- resolvekit-0.1.1/src/resolvekit/packs/geo/__init__.py +39 -0
- resolvekit-0.1.1/src/resolvekit/packs/geo/_specificity.py +22 -0
- resolvekit-0.1.1/src/resolvekit/packs/geo/build/__init__.py +5 -0
- resolvekit-0.1.1/src/resolvekit/packs/geo/build/builder.py +33 -0
- resolvekit-0.1.1/src/resolvekit/packs/geo/constraints/__init__.py +13 -0
- resolvekit-0.1.1/src/resolvekit/packs/geo/constraints/containment.py +112 -0
- resolvekit-0.1.1/src/resolvekit/packs/geo/constraints/membership.py +62 -0
- resolvekit-0.1.1/src/resolvekit/packs/geo/constraints/temporal.py +19 -0
- resolvekit-0.1.1/src/resolvekit/packs/geo/constraints/type_constraint.py +51 -0
- resolvekit-0.1.1/src/resolvekit/packs/geo/decision.py +119 -0
- resolvekit-0.1.1/src/resolvekit/packs/geo/extractor.py +171 -0
- resolvekit-0.1.1/src/resolvekit/packs/geo/features.py +51 -0
- resolvekit-0.1.1/src/resolvekit/packs/geo/linker.py +12 -0
- resolvekit-0.1.1/src/resolvekit/packs/geo/normalizer.py +11 -0
- resolvekit-0.1.1/src/resolvekit/packs/geo/pack.py +315 -0
- resolvekit-0.1.1/src/resolvekit/packs/geo/routing.py +77 -0
- resolvekit-0.1.1/src/resolvekit/packs/geo/scoring.py +187 -0
- resolvekit-0.1.1/src/resolvekit/packs/geo/sources/__init__.py +17 -0
- resolvekit-0.1.1/src/resolvekit/packs/geo/sources/_short_input.py +183 -0
- resolvekit-0.1.1/src/resolvekit/packs/geo/sources/exact_code.py +193 -0
- resolvekit-0.1.1/src/resolvekit/packs/geo/sources/exact_name.py +114 -0
- resolvekit-0.1.1/src/resolvekit/packs/geo/sources/fts.py +31 -0
- resolvekit-0.1.1/src/resolvekit/packs/geo/sources/fuzzy.py +132 -0
- resolvekit-0.1.1/src/resolvekit/packs/geo/sources/fuzzy_retrieval.py +53 -0
- resolvekit-0.1.1/src/resolvekit/packs/geo/sources/query_shapes.py +49 -0
- resolvekit-0.1.1/src/resolvekit/packs/geo/sources/symspell.py +231 -0
- resolvekit-0.1.1/src/resolvekit/packs/org/__init__.py +6 -0
- resolvekit-0.1.1/src/resolvekit/packs/org/_acronym.py +33 -0
- resolvekit-0.1.1/src/resolvekit/packs/org/build/__init__.py +5 -0
- resolvekit-0.1.1/src/resolvekit/packs/org/build/builder.py +198 -0
- resolvekit-0.1.1/src/resolvekit/packs/org/constraints/__init__.py +15 -0
- resolvekit-0.1.1/src/resolvekit/packs/org/constraints/country_relevance.py +72 -0
- resolvekit-0.1.1/src/resolvekit/packs/org/constraints/parent_org.py +70 -0
- resolvekit-0.1.1/src/resolvekit/packs/org/constraints/temporal.py +28 -0
- resolvekit-0.1.1/src/resolvekit/packs/org/constraints/type_constraint.py +11 -0
- resolvekit-0.1.1/src/resolvekit/packs/org/decision.py +144 -0
- resolvekit-0.1.1/src/resolvekit/packs/org/feature_extractor.py +110 -0
- resolvekit-0.1.1/src/resolvekit/packs/org/features.py +52 -0
- resolvekit-0.1.1/src/resolvekit/packs/org/linker.py +12 -0
- resolvekit-0.1.1/src/resolvekit/packs/org/normalizer.py +47 -0
- resolvekit-0.1.1/src/resolvekit/packs/org/pack.py +167 -0
- resolvekit-0.1.1/src/resolvekit/packs/org/routing.py +41 -0
- resolvekit-0.1.1/src/resolvekit/packs/org/scoring.py +66 -0
- resolvekit-0.1.1/src/resolvekit/packs/org/sources/__init__.py +17 -0
- resolvekit-0.1.1/src/resolvekit/packs/org/sources/acronym.py +76 -0
- resolvekit-0.1.1/src/resolvekit/packs/org/sources/exact_code.py +78 -0
- resolvekit-0.1.1/src/resolvekit/packs/org/sources/exact_name.py +66 -0
- resolvekit-0.1.1/src/resolvekit/packs/org/sources/fts.py +24 -0
- resolvekit-0.1.1/src/resolvekit/packs/org/sources/fuzzy.py +19 -0
- resolvekit-0.1.1/src/resolvekit/packs/org/sources/symspell.py +26 -0
- resolvekit-0.1.1/src/resolvekit/pandas/__init__.py +18 -0
- resolvekit-0.1.1/src/resolvekit/polars/__init__.py +20 -0
- resolvekit-0.1.1/src/resolvekit/shared/__init__.py +60 -0
- resolvekit-0.1.1/src/resolvekit/shared/build/__init__.py +7 -0
- resolvekit-0.1.1/src/resolvekit/shared/build/base_builder.py +624 -0
- resolvekit-0.1.1/src/resolvekit/shared/build/schema.py +59 -0
- resolvekit-0.1.1/src/resolvekit/shared/constraints/__init__.py +17 -0
- resolvekit-0.1.1/src/resolvekit/shared/constraints/temporal_constraint.py +167 -0
- resolvekit-0.1.1/src/resolvekit/shared/constraints/type_constraint.py +143 -0
- resolvekit-0.1.1/src/resolvekit/shared/scoring_base.py +96 -0
- resolvekit-0.1.1/src/resolvekit/shared/sources/__init__.py +16 -0
- resolvekit-0.1.1/src/resolvekit/shared/sources/code_helpers.py +34 -0
- resolvekit-0.1.1/src/resolvekit/shared/sources/fts_base.py +131 -0
- resolvekit-0.1.1/src/resolvekit/shared/sources/fuzzy_base.py +153 -0
- resolvekit-0.1.1/src/resolvekit/shared/sources/fuzzy_retrieval_base.py +182 -0
- resolvekit-0.1.1/src/resolvekit/shared/sources/symspell_base.py +544 -0
- resolvekit-0.1.1/src/resolvekit/types/__init__.py +49 -0
- resolvekit-0.0.1/PKG-INFO +0 -36
- resolvekit-0.0.1/README.md +0 -2
- resolvekit-0.0.1/pyproject.toml +0 -214
- resolvekit-0.0.1/src/resolvekit/README.md +0 -134
- resolvekit-0.0.1/src/resolvekit/__init__.py +0 -67
- resolvekit-0.0.1/src/resolvekit/api/README.md +0 -165
- resolvekit-0.0.1/src/resolvekit/api/__init__.py +0 -10
- resolvekit-0.0.1/src/resolvekit/api/convenience.py +0 -53
- resolvekit-0.0.1/src/resolvekit/api/resolver.py +0 -457
- resolvekit-0.0.1/src/resolvekit/builders/README.md +0 -173
- resolvekit-0.0.1/src/resolvekit/calibration/README.md +0 -351
- resolvekit-0.0.1/src/resolvekit/calibration/__init__.py +0 -12
- resolvekit-0.0.1/src/resolvekit/calibration/calibrator.py +0 -184
- resolvekit-0.0.1/src/resolvekit/calibration/features.py +0 -139
- resolvekit-0.0.1/src/resolvekit/calibration/models.py +0 -78
- resolvekit-0.0.1/src/resolvekit/cli/README.md +0 -215
- resolvekit-0.0.1/src/resolvekit/cli/main.py +0 -18
- resolvekit-0.0.1/src/resolvekit/config.py +0 -128
- resolvekit-0.0.1/src/resolvekit/constants.py +0 -252
- resolvekit-0.0.1/src/resolvekit/constraints/README.md +0 -102
- resolvekit-0.0.1/src/resolvekit/constraints/__init__.py +0 -17
- resolvekit-0.0.1/src/resolvekit/constraints/constraint_engine.py +0 -111
- resolvekit-0.0.1/src/resolvekit/constraints/hierarchy_validator.py +0 -148
- resolvekit-0.0.1/src/resolvekit/constraints/membership_validator.py +0 -60
- resolvekit-0.0.1/src/resolvekit/constraints/protocols.py +0 -33
- resolvekit-0.0.1/src/resolvekit/constraints/temporal_validator.py +0 -43
- resolvekit-0.0.1/src/resolvekit/constraints/type_validator.py +0 -42
- resolvekit-0.0.1/src/resolvekit/data/README.md +0 -165
- resolvekit-0.0.1/src/resolvekit/data/__init__.py +0 -14
- resolvekit-0.0.1/src/resolvekit/data/alias_repository.py +0 -206
- resolvekit-0.0.1/src/resolvekit/data/code_repository.py +0 -85
- resolvekit-0.0.1/src/resolvekit/data/context_filters.py +0 -49
- resolvekit-0.0.1/src/resolvekit/data/db_manager.py +0 -196
- resolvekit-0.0.1/src/resolvekit/data/entity_repository.py +0 -466
- resolvekit-0.0.1/src/resolvekit/data/membership_repository.py +0 -107
- resolvekit-0.0.1/src/resolvekit/data/query_builder.py +0 -177
- resolvekit-0.0.1/src/resolvekit/data/schema.py +0 -122
- resolvekit-0.0.1/src/resolvekit/disambiguation/README.md +0 -72
- resolvekit-0.0.1/src/resolvekit/extraction/README.md +0 -204
- resolvekit-0.0.1/src/resolvekit/matchers/README.md +0 -77
- resolvekit-0.0.1/src/resolvekit/matchers/__init__.py +0 -65
- resolvekit-0.0.1/src/resolvekit/matchers/alias_exact.py +0 -65
- resolvekit-0.0.1/src/resolvekit/matchers/canonical_name.py +0 -62
- resolvekit-0.0.1/src/resolvekit/matchers/cascade.py +0 -127
- resolvekit-0.0.1/src/resolvekit/matchers/code_validators.py +0 -250
- resolvekit-0.0.1/src/resolvekit/matchers/exact_code.py +0 -177
- resolvekit-0.0.1/src/resolvekit/matchers/fts_matcher.py +0 -106
- resolvekit-0.0.1/src/resolvekit/matchers/fuzzy_matcher.py +0 -142
- resolvekit-0.0.1/src/resolvekit/matchers/priorities.py +0 -174
- resolvekit-0.0.1/src/resolvekit/matchers/protocols.py +0 -75
- resolvekit-0.0.1/src/resolvekit/normalization/README.md +0 -192
- resolvekit-0.0.1/src/resolvekit/normalization/__init__.py +0 -8
- resolvekit-0.0.1/src/resolvekit/normalization/normalizer.py +0 -164
- resolvekit-0.0.1/src/resolvekit/overlays/README.md +0 -226
- resolvekit-0.0.1/src/resolvekit/types.py +0 -534
- resolvekit-0.0.1/src/resolvekit/utils/README.md +0 -188
- resolvekit-0.0.1/src/resolvekit/utils/__init__.py +0 -48
- resolvekit-0.0.1/src/resolvekit/utils/cache.py +0 -109
- resolvekit-0.0.1/src/resolvekit/utils/dates.py +0 -339
- resolvekit-0.0.1/src/resolvekit/utils/errors.py +0 -145
- resolvekit-0.0.1/src/resolvekit/utils/files.py +0 -366
- resolvekit-0.0.1/src/resolvekit/utils/logging.py +0 -219
- resolvekit-0.0.1/src/resolvekit/utils/text.py +0 -475
- resolvekit-0.0.1/src/resolvekit/utils/validation.py +0 -301
- {resolvekit-0.0.1/src/resolvekit/builders → resolvekit-0.1.1/src/resolvekit/builder/sources/seed}/__init__.py +0 -0
- {resolvekit-0.0.1/src/resolvekit/cli → resolvekit-0.1.1/src/resolvekit/packs}/__init__.py +0 -0
- resolvekit-0.0.1/src/resolvekit/disambiguation/__init__.py → resolvekit-0.1.1/src/resolvekit/packs/geo/data/.gitkeep +0 -0
- resolvekit-0.0.1/src/resolvekit/extraction/__init__.py → resolvekit-0.1.1/src/resolvekit/packs/org/data/.gitkeep +0 -0
- resolvekit-0.0.1/src/resolvekit/overlays/__init__.py → resolvekit-0.1.1/src/resolvekit/py.typed +0 -0
resolvekit-0.1.1/LICENSE
ADDED
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2025–2026 Jorge Rivera
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
resolvekit-0.1.1/PKG-INFO
ADDED
@@ -0,0 +1,225 @@
+Metadata-Version: 2.4
+Name: resolvekit
+Version: 0.1.1
+Summary: Entity and place resolution system that maps messy place/entity strings and codes to canonical entities
+Keywords: entity-resolution,geocoding,place-names,data-commons,iso-codes,offline,disambiguation,normalization
+Author: Jorge Rivera
+Author-email: Jorge Rivera <jorge.rivera@one.org>
+License-Expression: MIT
+License-File: LICENSE
+Classifier: Development Status :: 4 - Beta
+Classifier: Intended Audience :: Developers
+Classifier: Intended Audience :: Science/Research
+Classifier: Operating System :: OS Independent
+Classifier: Programming Language :: Python :: 3
+Classifier: Programming Language :: Python :: 3.12
+Classifier: Programming Language :: Python :: 3.13
+Classifier: Programming Language :: Python :: 3 :: Only
+Classifier: Topic :: Scientific/Engineering :: GIS
+Classifier: Topic :: Scientific/Engineering :: Information Analysis
+Classifier: Topic :: Text Processing :: Linguistic
+Classifier: Typing :: Typed
+Requires-Dist: packaging>=24.0
+Requires-Dist: pooch>=1.8.0,<2.0
+Requires-Dist: pydantic>=2.9,<3.0
+Requires-Dist: rapidfuzz>=3.0.0,<4.0
+Requires-Dist: symspellpy>=6.7.7,<7.0
+Requires-Dist: spacy>=3.7,<4.0 ; extra == 'baselines'
+Requires-Dist: gecko-syndata>=0.6.4 ; extra == 'calibration'
+Requires-Dist: scikit-learn>=1.8.0 ; extra == 'calibration'
+Requires-Dist: babel>=2.18.0 ; extra == 'data'
+Requires-Dist: datacommons-client>=2.1.4 ; extra == 'data'
+Requires-Dist: pycountry>=23.12.11 ; extra == 'data'
+Requires-Dist: pyyaml>=6.0 ; extra == 'data'
+Requires-Dist: pandas>=2.0 ; extra == 'pandas'
+Requires-Dist: ahocorasick-rs>=1.0.3,<2.0 ; extra == 'parsing'
+Requires-Dist: polars>=1.0 ; extra == 'polars'
+Requires-Python: >=3.12
+Project-URL: Homepage, https://github.com/jm-rivera/resolvekit
+Project-URL: Repository, https://github.com/jm-rivera/resolvekit
+Project-URL: Documentation, https://jm-rivera.github.io/resolvekit/
+Project-URL: Issues, https://github.com/jm-rivera/resolvekit/issues
+Provides-Extra: baselines
+Provides-Extra: calibration
+Provides-Extra: data
+Provides-Extra: pandas
+Provides-Extra: parsing
+Provides-Extra: polars
+Description-Content-Type: text/markdown
+
+# resolvekit
+
+Resolve messy place and entity strings — and codes — to canonical entity IDs, offline and deterministically. Feed it `"Brasil"`, `"Cote dIvoire"`, or `"Republic of Korea"` and get back `country/BRA`, `country/CIV`, `country/KOR`.
+
+- **Offline and deterministic.** No network call, no LLM, no external service at resolution time. The same input gives the same output, today and next year.
+- **Countries to cities to organizations.** Resolve countries, UN M.49 regions, continents, sub-national admin levels 1–5, cities, and organizations through one pipeline. Most alternatives stop at countries.
+- **Typo- and alias-tolerant.** Exact-code and exact-name matching, full-text search, fuzzy matching, and typo correction, with a calibrated confidence score on every result.
+- **Built for tabular data.** `bulk()` cleans a whole pandas or polars column in one call, deduplicating repeated values.
+- **A graph, not a lookup table.** List the members of the EU, NATO, or OECD, check membership, convert between code systems, and query it all as of a past date.
+
+## Install
+
+```bash
+# with uv
+uv add resolvekit # Python >= 3.12
+uv add "resolvekit[pandas]" # add the pandas integration for bulk()
+
+# with pip
+pip install resolvekit
+pip install "resolvekit[pandas]"
+```
+
+Country, region, continent, and organization data ships in the wheel and works offline immediately. Sub-national admin levels and cities are fetched on first use — see [the docs](https://jm-rivera.github.io/resolvekit/).
+
+## Quickstart
+
+```python
+import resolvekit as rk
|
|
78
|
+
|
|
79
|
+
rk.resolve_id("United States") # "country/USA"
|
|
80
|
+
rk.resolve("Germany", to="iso3") # "DEU"
|
|
81
|
+
rk.resolve("Japan", to="flag") # "🇯🇵"
|
|
82
|
+
rk.resolve("Tanzania", to="dcid") # "country/TZA"
|
|
83
|
+
```
|
|
84
|
+
|
|
85
|
+
`to=` pivots a resolved entity to `iso2`, `iso3`, `name`, `flag`, `dcid`, `wikidata`, and other code systems.
|
|
86
|
+
|
|
87
|
+
Clean a DataFrame column in one call. `bulk()` deduplicates internally, so a 10,000-row column with 50 distinct values runs 50 resolutions, not 10,000:
|
|
88
|
+
|
|
89
|
+
```python
|
|
90
|
+
import pandas as pd
|
|
91
|
+
import resolvekit as rk
|
|
92
|
+
|
|
93
|
+
df = pd.DataFrame({"country": ["United States", "Brasil", "Cote dIvoire", "n/a"]})
|
|
94
|
+
df["iso3"] = rk.bulk(values=df["country"], to="iso3")
|
|
95
|
+
# country iso3
|
|
96
|
+
# United States USA
|
|
97
|
+
# Brasil BRA
|
|
98
|
+
# Cote dIvoire CIV
|
|
99
|
+
# n/a None
|
|
100
|
+
```
|
|
101
|
+
|
|
102
|
+
## What you can do with it
|
|
103
|
+
|
|
104
|
+
### Every result carries a calibrated confidence score
|
|
105
|
+
|
|
106
|
+
`confidence` is a calibrated probability, not a raw similarity score — `0.93` means the pipeline estimates roughly a 93% chance the match is correct. A code match and a fuzzy name match are on the same scale:
|
|
107
|
+
|
|
108
|
+
```python
|
|
109
|
+
r = rk.resolve("US")
|
|
110
|
+
r.entity_id # "country/USA"
|
|
111
|
+
r.confidence # 0.951
|
|
112
|
+
r.match_tier # "exact_code"
|
|
113
|
+
```
|
|
114
|
+
|
|
115
|
+
Call `r.explain(verbosity="full").as_text()` for a scorecard of which matchers fired, what they matched, and why this candidate won. It renders to text, Markdown, or JSON.
|
|
116
|
+
|
|
117
|
+
### It abstains instead of guessing
|
|
118
|
+
|
|
119
|
+
When two candidates score too close to separate, the result is ambiguous — `confidence` is `None`, and each candidate carries its own score:
|
|
120
|
+
|
|
121
|
+
```python
|
|
122
|
+
r = rk.resolve("Congo")
|
|
123
|
+
r.is_ambiguous # True
|
|
124
|
+
[(c.entity_id, round(c.confidence, 3)) for c in r.candidates[:2]]
|
|
125
|
+
# [("country/COD", 0.908), ("country/COG", 0.908)]
|
|
126
|
+
```
|
|
127
|
+
|
|
128
|
+
`resolve_id()` raises `AmbiguousResolutionError` by default; pass `on_ambiguous="null"` for `None` or `"best"` to take the top candidate. Placeholder inputs like `"n/a"` or `"unknown"` short-circuit to no-match before scoring.
|
|
129
|
+
|
|
130
|
+
### Query the graph — including as of a past date
|
|
131
|
+
|
|
132
|
+
Entities carry typed, time-aware relations. Ask who belongs to a group, what a region contains, and what either looked like on a given date:
|
|
133
|
+
|
|
134
|
+
```python
|
|
135
|
+
from datetime import date
|
|
136
|
+
|
|
137
|
+
r = rk.default()
|
|
138
|
+
|
|
139
|
+
r.members_of("EU", as_codes="iso3") # 27 codes
|
|
140
|
+
r.is_member("United Kingdom", "EU") # False — left in 2020
|
|
141
|
+
r.is_member("United Kingdom", "EU", as_of=date(2018, 1, 1)) # True
|
|
142
|
+
r.within("Eastern Africa", entity_type="geo.country", to="iso3")
|
|
143
|
+
# ['BDI', 'COM', 'DJI', 'ERI', 'ETH', 'KEN', 'MDG', 'MOZ', 'MUS',
|
|
144
|
+
# 'MWI', 'MYT', 'RWA', 'SOM', 'SSD', 'SYC', 'TZA', 'UGA', 'ZMB', 'ZWE']
|
|
145
|
+
```
|
|
146
|
+
|
|
147
|
+
`as_of` drops candidates outside their existence window before scoring — it's a hard filter, not a score penalty.
|
|
148
|
+
|
|
149
|
+
### Extract entities from free text
|
|
150
|
+
|
|
151
|
+
`parse()` scans text with an Aho-Corasick dictionary pass, then runs the same resolution pipeline on each span. No NER model, no network call (needs `pip install "resolvekit[parsing]"`):
|
|
152
|
+
|
|
153
|
+
```python
|
|
154
|
+
import resolvekit as rk
|
|
155
|
+
|
|
156
|
+
for e in rk.parse("Leaders from Kenya, Uganda and the United States met to discuss trade."):
|
|
157
|
+
if e.entity_id:
|
|
158
|
+
print(f"{e.surface!r} [{e.start}:{e.end}] -> {e.entity_id} ({e.confidence:.2f})")
|
|
159
|
+
# 'Kenya' [13:18] -> country/KEN (0.91)
|
|
160
|
+
# 'Uganda' [20:26] -> country/UGA (0.91)
|
|
161
|
+
# 'the United States' [31:48] -> country/USA (0.91)
|
|
162
|
+
```
|
|
163
|
+
|
|
164
|
+
`parse_bulk()` runs over a list or Series and tags each span with its source row index.
|
|
165
|
+
|
|
166
|
+
### Bring your own data
|
|
167
|
+
|
|
168
|
+
`Resolver.from_records()` builds a resolver from a list of dicts, a DataFrame, or a CSV — no schema, no server:
|
|
169
|
+
|
|
170
|
+
```python
|
|
171
|
+
from resolvekit import Resolver
|
|
172
|
+
|
|
173
|
+
r = Resolver.from_records(
|
|
174
|
+
[{"id": "w1", "label": "Widget", "sku": "abc"},
|
|
175
|
+
{"id": "w2", "label": "Gadget", "sku": "xyz"}],
|
|
176
|
+
domain="custom", name="label", id="id", codes=["sku"],
|
|
177
|
+
)
|
|
178
|
+
r.resolve("Widget").entity_id # "custom/w1"
|
|
179
|
+
r.entity(sku="abc").entity_id # "custom/w1"
|
|
180
|
+
```
|
|
181
|
+
|
|
182
|
+
`Resolver.augment()` joins your own columns onto an existing resolver's entities by code, attaching attributes without a rebuild and reporting what linked, what was minted, and what was skipped.
|
|
183
|
+
|
|
184
|
+
## How it compares
|
|
185
|
+
|
|
186
|
+
resolvekit is benchmarked against eight other resolvers on a public, reproducible suite — run it yourself with `uv run python -m benchmarks`. Methodology and per-dataset numbers are in [benchmarks/README.md](https://github.com/jm-rivera/resolvekit/blob/main/benchmarks/README.md); the figures below are from the committed 2026-06-10 run. A dash means the tool was skipped because the dataset is outside its scope, not that it scored zero.
|
|
187
|
+
|
|
188
|
+
| tool | offline | entity types | `countries_en` | `countries_multilingual` | `admin` | `cities` |
|
|
189
|
+
|---|---|---|---|---|---|---|
|
|
190
|
+
| **resolvekit** | yes | country, admin1–5, city, continent, org | **0.793** | 0.632 | **0.935** | **0.858** |
|
|
191
|
+
| **resolvekit** (typed) | yes | same, with type hints | **0.794** | 0.614 | **0.977** | **0.862** |
|
|
192
|
+
| hdx_python_country | yes | country | 0.642 | 0.565 | — | — |
|
|
193
|
+
| countryguess | yes | country | 0.675 | 0.512 | — | — |
|
|
194
|
+
| country_converter | yes | country | 0.566 | 0.419 | — | — |
|
|
195
|
+
| geonamescache | yes | country, city | 0.057 | 0.148 | — | 0.000 |
|
|
196
|
+
| rapidfuzz_dict | yes | country | 0.469 | 0.370 | — | — |
|
|
197
|
+
| pycountry | yes | country | 0.099 | 0.143 | — | — |
|
|
198
|
+
| data_commons_resolve | no | country, admin1–2, city | 0.625 | **0.827** | 0.598 | 0.502 |
|
|
199
|
+
|
|
200
|
+
The lead is widest on sub-national data. Six of the eight competitors are country-only and can't answer admin or city queries at all. On `cities`, resolvekit scores 0.858 against data_commons_resolve's 0.502 and geonamescache's 0.000; on `admin`, 0.935 against data_commons_resolve's 0.598. resolvekit is also the only tool that emits a calibrated confidence score (ECE 0.043 on the geo eval set).
|
|
201
|
+
|
|
202
|
+
## How it works
|
|
203
|
+
|
|
204
|
+
**The data.** resolvekit is built from public authority datasets — [Data Commons](https://datacommons.org/), ISO 3166, the UN M.49 standard, Wikidata, and World Bank/UN statistical groupings. Each entity gets one stable ID (countries reuse the Data Commons dcid, so `country/DEU` is the same key Data Commons uses) plus codes across dozens of systems: ISO 2/3/numeric, Wikidata, GND, VIAF, OpenStreetMap, IOC, and more. Names and codes drift over time; the ID stays fixed. The dataset is compiled into SQLite files that ship on disk, so resolution never makes a network call.
|
|
205
|
+
|
|
206
|
+
**The graph.** Those entities are nodes in a graph, not rows in a flat code table. A small set of typed, time-aware edges — `member_of`, `contained_in`, `subsidiary_of` — connect them, each carrying a `[valid_from, valid_until)` validity window. That structure answers questions a code-conversion library can't: which countries were in the EU on a given date, what a region contains, which group an organization belongs to. It's a small, fixed graph for entity relations — there's no query language, just the membership and containment methods on `Resolver`.
|
|
207
|
+
|
|
208
|
+
**Resolution.** A query is normalized, then run through a cascade of matchers, cheapest first — exact code, exact name, full-text search, fuzzy edit distance, then SymSpell typo correction — stopping once a confident match is found. Each candidate is scored by a calibrated model that folds match tier, edit distance, query length, and entity prominence into one confidence value, so a code match and a fuzzy name match sit on the same scale. The result is `resolved` when the top candidate clears the threshold and leads the runner-up by a margin, `ambiguous` when two are too close to separate, and `no_match` when nothing clears it.
|
|
209
|
+
|
|
210
|
+
Country, region, continent, and organization modules ship in the wheel; sub-national admin levels and cities are separate packs fetched on demand. See [How resolution works](https://jm-rivera.github.io/resolvekit/explanation/how-resolution-works/), [The entity graph](https://jm-rivera.github.io/resolvekit/explanation/knowledge-graph/), and [Offline-first and the data-pack split](https://jm-rivera.github.io/resolvekit/explanation/offline-and-data-packs/).
|
|
211
|
+
|
|
212
|
+
## Documentation
|
|
213
|
+
|
|
214
|
+
Full documentation — tutorial, how-to guides, API reference, and design notes — lives at **[jm-rivera.github.io/resolvekit](https://jm-rivera.github.io/resolvekit/)**.
|
|
215
|
+
|
|
216
|
+
- [Install](https://jm-rivera.github.io/resolvekit/getting-started/install/)
|
|
217
|
+
- [Your first resolution](https://jm-rivera.github.io/resolvekit/getting-started/first-resolution/) (tutorial)
|
|
218
|
+
- [How-to guides](https://jm-rivera.github.io/resolvekit/how-to/clean-a-dataframe-column/)
|
|
219
|
+
- [API reference](https://jm-rivera.github.io/resolvekit/reference/api/)
|
|
220
|
+
- [How resolution works](https://jm-rivera.github.io/resolvekit/explanation/how-resolution-works/)
|
|
221
|
+
- [Roadmap](https://jm-rivera.github.io/resolvekit/roadmap/)
|
|
222
|
+
|
|
223
|
+
## License
|
|
224
|
+
|
|
225
|
+
MIT — see [LICENSE](https://github.com/jm-rivera/resolvekit/blob/main/LICENSE). Bundled data is covered under multiple licenses; see [NOTICE.md](https://github.com/jm-rivera/resolvekit/blob/main/src/resolvekit/NOTICE.md) for third-party data attributions.
|
|
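Note on the **Resolution** paragraph in the PKG-INFO long description above: it says a result is `resolved` when the top candidate clears a threshold and leads the runner-up by a margin, `ambiguous` when two candidates are too close, and `no_match` otherwise. The sketch below is one way to read that rule and is not taken from the package source. `Candidate`, `classify`, `THRESHOLD`, and `MARGIN` are hypothetical names, and the two cut-off values are assumptions, since the README does not state them.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Candidate:
    entity_id: str
    confidence: float


# Hypothetical cut-offs for illustration only; the real values are not given in the README.
THRESHOLD = 0.80
MARGIN = 0.05


def classify(candidates: List[Candidate]) -> str:
    """Return 'resolved', 'ambiguous', or 'no_match' for a list of scored candidates."""
    ranked = sorted(candidates, key=lambda c: c.confidence, reverse=True)
    # Nothing clears the threshold: abstain with no_match.
    if not ranked or ranked[0].confidence < THRESHOLD:
        return "no_match"
    # Top candidate clears the threshold but does not lead the runner-up by the margin.
    if len(ranked) > 1 and ranked[0].confidence - ranked[1].confidence < MARGIN:
        return "ambiguous"
    return "resolved"


congo = [Candidate("country/COD", 0.908), Candidate("country/COG", 0.908)]
print(classify(congo))
```

With these assumed cut-offs, the README's `"Congo"` example (two candidates tied at 0.908) is classified as ambiguous, while a single candidate at 0.951 is classified as resolved.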
@@ -0,0 +1,176 @@
|
|
|
1
|
+
# resolvekit
|
|
2
|
+
|
|
3
|
+
Resolve messy place and entity strings — and codes — to canonical entity IDs, offline and deterministically. Feed it `"Brasil"`, `"Cote dIvoire"`, or `"Republic of Korea"` and get back `country/BRA`, `country/CIV`, `country/KOR`.
|
|
4
|
+
|
|
5
|
+
- **Offline and deterministic.** No network call, no LLM, no external service at resolution time. The same input gives the same output, today and next year.
|
|
6
|
+
- **Countries to cities to organizations.** Resolve countries, UN M.49 regions, continents, sub-national admin levels 1–5, cities, and organizations through one pipeline. Most alternatives stop at countries.
|
|
7
|
+
- **Typo- and alias-tolerant.** Exact-code and exact-name matching, full-text search, fuzzy matching, and typo correction, with a calibrated confidence score on every result.
|
|
8
|
+
- **Built for tabular data.** `bulk()` cleans a whole pandas or polars column in one call, deduplicating repeated values.
|
|
9
|
+
- **A graph, not a lookup table.** List the members of the EU, NATO, or OECD, check membership, convert between code systems, and query it all as of a past date.
|
|
10
|
+
|
|
11
|
+
## Install
|
|
12
|
+
|
|
13
|
+
```bash
|
|
14
|
+
# with uv
|
|
15
|
+
uv add resolvekit # Python >= 3.12
|
|
16
|
+
uv add "resolvekit[pandas]" # add the pandas integration for bulk()
|
|
17
|
+
|
|
18
|
+
# with pip
|
|
19
|
+
pip install resolvekit
|
|
20
|
+
pip install "resolvekit[pandas]"
|
|
21
|
+
```
|
|
22
|
+
|
|
23
|
+
Country, region, continent, and organization data ships in the wheel and works offline immediately. Sub-national admin levels and cities are fetched on first use — see [the docs](https://jm-rivera.github.io/resolvekit/).
|
|
24
|
+
|
|
25
|
+
## Quickstart
|
|
26
|
+
|
|
27
|
+
```python
|
|
28
|
+
import resolvekit as rk
|
|
29
|
+
|
|
30
|
+
rk.resolve_id("United States") # "country/USA"
|
|
31
|
+
rk.resolve("Germany", to="iso3") # "DEU"
|
|
32
|
+
rk.resolve("Japan", to="flag") # "🇯🇵"
|
|
33
|
+
rk.resolve("Tanzania", to="dcid") # "country/TZA"
|
|
34
|
+
```
|
|
35
|
+
|
|
36
|
+
`to=` pivots a resolved entity to `iso2`, `iso3`, `name`, `flag`, `dcid`, `wikidata`, and other code systems.
|
|
37
|
+
|
|
38
|
+
Clean a DataFrame column in one call. `bulk()` deduplicates internally, so a 10,000-row column with 50 distinct values runs 50 resolutions, not 10,000:
|
|
39
|
+
|
|
40
|
+
```python
|
|
41
|
+
import pandas as pd
|
|
42
|
+
import resolvekit as rk
|
|
43
|
+
|
|
44
|
+
df = pd.DataFrame({"country": ["United States", "Brasil", "Cote dIvoire", "n/a"]})
|
|
45
|
+
df["iso3"] = rk.bulk(values=df["country"], to="iso3")
|
|
46
|
+
# country iso3
|
|
47
|
+
# United States USA
|
|
48
|
+
# Brasil BRA
|
|
49
|
+
# Cote dIvoire CIV
|
|
50
|
+
# n/a None
|
|
51
|
+
```
|
|
52
|
+
|
|
53
|
+
## What you can do with it
|
|
54
|
+
|
|
55
|
+
### Every result carries a calibrated confidence score
|
|
56
|
+
|
|
57
|
+
`confidence` is a calibrated probability, not a raw similarity score — `0.93` means the pipeline estimates roughly a 93% chance the match is correct. A code match and a fuzzy name match are on the same scale:
|
|
58
|
+
|
|
59
|
+
```python
|
|
60
|
+
r = rk.resolve("US")
|
|
61
|
+
r.entity_id # "country/USA"
|
|
62
|
+
r.confidence # 0.951
|
|
63
|
+
r.match_tier # "exact_code"
|
|
64
|
+
```
|
|
65
|
+
|
|
66
|
+
Call `r.explain(verbosity="full").as_text()` for a scorecard of which matchers fired, what they matched, and why this candidate won. It renders to text, Markdown, or JSON.
|
|
67
|
+
|
|
68
|
+
### It abstains instead of guessing
|
|
69
|
+
|
|
70
|
+
When two candidates score too close to separate, the result is ambiguous — `confidence` is `None`, and each candidate carries its own score:
|
|
71
|
+
|
|
72
|
+
```python
|
|
73
|
+
r = rk.resolve("Congo")
|
|
74
|
+
r.is_ambiguous # True
|
|
75
|
+
[(c.entity_id, round(c.confidence, 3)) for c in r.candidates[:2]]
|
|
76
|
+
# [("country/COD", 0.908), ("country/COG", 0.908)]
|
|
77
|
+
```
|
|
78
|
+
|
|
79
|
+
`resolve_id()` raises `AmbiguousResolutionError` by default; pass `on_ambiguous="null"` for `None` or `"best"` to take the top candidate. Placeholder inputs like `"n/a"` or `"unknown"` short-circuit to no-match before scoring.
|
|
80
|
+
|
|
81
|
+
### Query the graph — including as of a past date
|
|
82
|
+
|
|
83
|
+
Entities carry typed, time-aware relations. Ask who belongs to a group, what a region contains, and what either looked like on a given date:
|
|
84
|
+
|
|
85
|
+
```python
|
|
86
|
+
from datetime import date
|
|
87
|
+
|
|
88
|
+
r = rk.default()
|
|
89
|
+
|
|
90
|
+
r.members_of("EU", as_codes="iso3") # 27 codes
|
|
91
|
+
r.is_member("United Kingdom", "EU") # False — left in 2020
|
|
92
|
+
r.is_member("United Kingdom", "EU", as_of=date(2018, 1, 1)) # True
|
|
93
|
+
r.within("Eastern Africa", entity_type="geo.country", to="iso3")
|
|
94
|
+
# ['BDI', 'COM', 'DJI', 'ERI', 'ETH', 'KEN', 'MDG', 'MOZ', 'MUS',
|
|
95
|
+
# 'MWI', 'MYT', 'RWA', 'SOM', 'SSD', 'SYC', 'TZA', 'UGA', 'ZMB', 'ZWE']
|
|
96
|
+
```
|
|
97
|
+
|
|
98
|
+
`as_of` drops candidates outside their existence window before scoring — it's a hard filter, not a score penalty.
|
|
99
|
+
|
|
100
|
+
### Extract entities from free text
|
|
101
|
+
|
|
102
|
+
`parse()` scans text with an Aho-Corasick dictionary pass, then runs the same resolution pipeline on each span. No NER model, no network call (needs `pip install "resolvekit[parsing]"`):
|
|
103
|
+
|
|
104
|
+
```python
|
|
105
|
+
import resolvekit as rk
|
|
106
|
+
|
|
107
|
+
for e in rk.parse("Leaders from Kenya, Uganda and the United States met to discuss trade."):
|
|
108
|
+
if e.entity_id:
|
|
109
|
+
print(f"{e.surface!r} [{e.start}:{e.end}] -> {e.entity_id} ({e.confidence:.2f})")
|
|
110
|
+
# 'Kenya' [13:18] -> country/KEN (0.91)
|
|
111
|
+
# 'Uganda' [20:26] -> country/UGA (0.91)
|
|
112
|
+
# 'the United States' [31:48] -> country/USA (0.91)
|
|
113
|
+
```
|
|
114
|
+
|
|
115
|
+
`parse_bulk()` runs over a list or Series and tags each span with its source row index.
|
|
116
|
+
|
|
117
|
+
### Bring your own data
|
|
118
|
+
|
|
119
|
+
`Resolver.from_records()` builds a resolver from a list of dicts, a DataFrame, or a CSV — no schema, no server:
|
|
120
|
+
|
|
121
|
+
```python
|
|
122
|
+
from resolvekit import Resolver
|
|
123
|
+
|
|
124
|
+
r = Resolver.from_records(
|
|
125
|
+
[{"id": "w1", "label": "Widget", "sku": "abc"},
|
|
126
|
+
{"id": "w2", "label": "Gadget", "sku": "xyz"}],
|
|
127
|
+
domain="custom", name="label", id="id", codes=["sku"],
|
|
128
|
+
)
|
|
129
|
+
r.resolve("Widget").entity_id # "custom/w1"
|
|
130
|
+
r.entity(sku="abc").entity_id # "custom/w1"
|
|
131
|
+
```
|
|
132
|
+
|
|
133
|
+
`Resolver.augment()` joins your own columns onto an existing resolver's entities by code, attaching attributes without a rebuild and reporting what linked, what was minted, and what was skipped.
|
|
134
|
+
|
|
135
|
+
## How it compares
|
|
136
|
+
|
|
137
|
+
resolvekit is benchmarked against eight other resolvers on a public, reproducible suite — run it yourself with `uv run python -m benchmarks`. Methodology and per-dataset numbers are in [benchmarks/README.md](https://github.com/jm-rivera/resolvekit/blob/main/benchmarks/README.md); the figures below are from the committed 2026-06-10 run. A dash means the tool was skipped because the dataset is outside its scope, not that it scored zero.
|
|
138
|
+
|
|
139
|
+
| tool | offline | entity types | `countries_en` | `countries_multilingual` | `admin` | `cities` |
|
|
140
|
+
|---|---|---|---|---|---|---|
|
|
141
|
+
| **resolvekit** | yes | country, admin1–5, city, continent, org | **0.793** | 0.632 | **0.935** | **0.858** |
|
|
142
|
+
| **resolvekit** (typed) | yes | same, with type hints | **0.794** | 0.614 | **0.977** | **0.862** |
|
|
143
|
+
| hdx_python_country | yes | country | 0.642 | 0.565 | — | — |
|
|
144
|
+
| countryguess | yes | country | 0.675 | 0.512 | — | — |
|
|
145
|
+
| country_converter | yes | country | 0.566 | 0.419 | — | — |
|
|
146
|
+
| geonamescache | yes | country, city | 0.057 | 0.148 | — | 0.000 |
|
|
147
|
+
| rapidfuzz_dict | yes | country | 0.469 | 0.370 | — | — |
|
|
148
|
+
| pycountry | yes | country | 0.099 | 0.143 | — | — |
|
|
149
|
+
| data_commons_resolve | no | country, admin1–2, city | 0.625 | **0.827** | 0.598 | 0.502 |
|
|
150
|
+
|
|
151
|
+
The lead is widest on sub-national data. Six of the eight competitors are country-only and can't answer admin or city queries at all. On `cities`, resolvekit scores 0.858 against data_commons_resolve's 0.502 and geonamescache's 0.000; on `admin`, 0.935 against data_commons_resolve's 0.598. resolvekit is also the only tool that emits a calibrated confidence score (ECE 0.043 on the geo eval set).
|
|
152
|
+
|
|
153
|
+
## How it works
|
|
154
|
+
|
|
155
|
+
**The data.** resolvekit is built from public authority datasets — [Data Commons](https://datacommons.org/), ISO 3166, the UN M.49 standard, Wikidata, and World Bank/UN statistical groupings. Each entity gets one stable ID (countries reuse the Data Commons dcid, so `country/DEU` is the same key Data Commons uses) plus codes across dozens of systems: ISO 2/3/numeric, Wikidata, GND, VIAF, OpenStreetMap, IOC, and more. Names and codes drift over time; the ID stays fixed. The dataset is compiled into SQLite files that ship on disk, so resolution never makes a network call.
|
|
156
|
+
|
|
157
|
+
**The graph.** Those entities are nodes in a graph, not rows in a flat code table. A small set of typed, time-aware edges — `member_of`, `contained_in`, `subsidiary_of` — connect them, each carrying a `[valid_from, valid_until)` validity window. That structure answers questions a code-conversion library can't: which countries were in the EU on a given date, what a region contains, which group an organization belongs to. It's a small, fixed graph for entity relations — there's no query language, just the membership and containment methods on `Resolver`.
|
|
158
|
+
|
|
159
|
+
**Resolution.** A query is normalized, then run through a cascade of matchers, cheapest first — exact code, exact name, full-text search, fuzzy edit distance, then SymSpell typo correction — stopping once a confident match is found. Each candidate is scored by a calibrated model that folds match tier, edit distance, query length, and entity prominence into one confidence value, so a code match and a fuzzy name match sit on the same scale. The result is `resolved` when the top candidate clears the threshold and leads the runner-up by a margin, `ambiguous` when two are too close to separate, and `no_match` when nothing clears it.
|
|
160
|
+
|
|
161
|
+
Country, region, continent, and organization modules ship in the wheel; sub-national admin levels and cities are separate packs fetched on demand. See [How resolution works](https://jm-rivera.github.io/resolvekit/explanation/how-resolution-works/), [The entity graph](https://jm-rivera.github.io/resolvekit/explanation/knowledge-graph/), and [Offline-first and the data-pack split](https://jm-rivera.github.io/resolvekit/explanation/offline-and-data-packs/).
|
|
162
|
+
|
|
163
|
+
## Documentation
|
|
164
|
+
|
|
165
|
+
Full documentation — tutorial, how-to guides, API reference, and design notes — lives at **[jm-rivera.github.io/resolvekit](https://jm-rivera.github.io/resolvekit/)**.
|
|
166
|
+
|
|
167
|
+
- [Install](https://jm-rivera.github.io/resolvekit/getting-started/install/)
|
|
168
|
+
- [Your first resolution](https://jm-rivera.github.io/resolvekit/getting-started/first-resolution/) (tutorial)
|
|
169
|
+
- [How-to guides](https://jm-rivera.github.io/resolvekit/how-to/clean-a-dataframe-column/)
|
|
170
|
+
- [API reference](https://jm-rivera.github.io/resolvekit/reference/api/)
|
|
171
|
+
- [How resolution works](https://jm-rivera.github.io/resolvekit/explanation/how-resolution-works/)
|
|
172
|
+
- [Roadmap](https://jm-rivera.github.io/resolvekit/roadmap/)
|
|
173
|
+
|
|
174
|
+
## License
|
|
175
|
+
|
|
176
|
+
MIT — see [LICENSE](https://github.com/jm-rivera/resolvekit/blob/main/LICENSE). Bundled data is covered under multiple licenses; see [NOTICE.md](https://github.com/jm-rivera/resolvekit/blob/main/src/resolvekit/NOTICE.md) for third-party data attributions.
|
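Note on the README's graph description above: edges carry a `[valid_from, valid_until)` validity window, and `as_of` is described as a hard filter rather than a score penalty. The sketch below shows what a half-open window check could look like. It does not use the resolvekit API. `Edge` and `is_active` are hypothetical names, and the entity labels and membership dates are assumptions chosen to mirror the README's United Kingdom example, not data taken from the package.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Edge:
    source: str
    relation: str
    target: str
    valid_from: Optional[date]   # None means no known start
    valid_until: Optional[date]  # None means still valid; the end is exclusive


def is_active(edge: Edge, as_of: date) -> bool:
    """True if as_of falls inside the half-open window [valid_from, valid_until)."""
    starts_ok = edge.valid_from is None or edge.valid_from <= as_of
    ends_ok = edge.valid_until is None or as_of < edge.valid_until
    return starts_ok and ends_ok


# Illustrative edge; these dates are this sketch's assumption, not package data.
uk_in_eu = Edge("country/GBR", "member_of", "EU", date(1973, 1, 1), date(2020, 2, 1))

print(is_active(uk_in_eu, date(2018, 1, 1)))
print(is_active(uk_in_eu, date(2020, 2, 1)))
```

Because the end of the window is exclusive, the last day of membership in this sketch is 2020-01-31. Used as a hard filter, an edge that fails the check is dropped before any scoring rather than having its score reduced.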