resolvekit 0.0.1__tar.gz → 0.1.1__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (403)
  1. resolvekit-0.1.1/LICENSE +21 -0
  2. resolvekit-0.1.1/PKG-INFO +225 -0
  3. resolvekit-0.1.1/README.md +176 -0
  4. resolvekit-0.1.1/pyproject.toml +333 -0
  5. resolvekit-0.1.1/src/resolvekit/NOTICE.md +87 -0
  6. resolvekit-0.1.1/src/resolvekit/__init__.py +121 -0
  7. resolvekit-0.1.1/src/resolvekit/_convenience.py +603 -0
  8. resolvekit-0.1.1/src/resolvekit/_data/geo/admin1/metadata.json +60 -0
  9. resolvekit-0.1.1/src/resolvekit/_data/geo/admin2/metadata.json +60 -0
  10. resolvekit-0.1.1/src/resolvekit/_data/geo/admin3/metadata.json +61 -0
  11. resolvekit-0.1.1/src/resolvekit/_data/geo/admin4/metadata.json +60 -0
  12. resolvekit-0.1.1/src/resolvekit/_data/geo/admin5/metadata.json +60 -0
  13. resolvekit-0.1.1/src/resolvekit/_data/geo/cities/metadata.json +61 -0
  14. resolvekit-0.1.1/src/resolvekit/_data/geo/continental_unions/entities.sqlite +0 -0
  15. resolvekit-0.1.1/src/resolvekit/_data/geo/continental_unions/metadata.json +44 -0
  16. resolvekit-0.1.1/src/resolvekit/_data/geo/continental_unions/symspell.dict +164 -0
  17. resolvekit-0.1.1/src/resolvekit/_data/geo/continents/entities.sqlite +0 -0
  18. resolvekit-0.1.1/src/resolvekit/_data/geo/continents/metadata.json +44 -0
  19. resolvekit-0.1.1/src/resolvekit/_data/geo/continents/symspell.dict +87 -0
  20. resolvekit-0.1.1/src/resolvekit/_data/geo/countries/entities.sqlite +0 -0
  21. resolvekit-0.1.1/src/resolvekit/_data/geo/countries/geo_calibrator.json +7 -0
  22. resolvekit-0.1.1/src/resolvekit/_data/geo/countries/metadata.json +49 -0
  23. resolvekit-0.1.1/src/resolvekit/_data/geo/countries/symspell.dict +4489 -0
  24. resolvekit-0.1.1/src/resolvekit/_data/geo/regions/entities.sqlite +0 -0
  25. resolvekit-0.1.1/src/resolvekit/_data/geo/regions/metadata.json +44 -0
  26. resolvekit-0.1.1/src/resolvekit/_data/geo/regions/symspell.dict +1500 -0
  27. resolvekit-0.1.1/src/resolvekit/_data/manifest.json +267 -0
  28. resolvekit-0.1.1/src/resolvekit/_data/org/companies/entities.sqlite +0 -0
  29. resolvekit-0.1.1/src/resolvekit/_data/org/companies/metadata.json +44 -0
  30. resolvekit-0.1.1/src/resolvekit/_data/org/companies/symspell.dict +1262 -0
  31. resolvekit-0.1.1/src/resolvekit/_data/org/data_sources/entities.sqlite +0 -0
  32. resolvekit-0.1.1/src/resolvekit/_data/org/data_sources/metadata.json +44 -0
  33. resolvekit-0.1.1/src/resolvekit/_data/org/data_sources/symspell.dict +313 -0
  34. resolvekit-0.1.1/src/resolvekit/_data/org/governments/entities.sqlite +0 -0
  35. resolvekit-0.1.1/src/resolvekit/_data/org/governments/metadata.json +46 -0
  36. resolvekit-0.1.1/src/resolvekit/_data/org/governments/symspell.dict +3828 -0
  37. resolvekit-0.1.1/src/resolvekit/_data/org/lenders/entities.sqlite +0 -0
  38. resolvekit-0.1.1/src/resolvekit/_data/org/lenders/metadata.json +44 -0
  39. resolvekit-0.1.1/src/resolvekit/_data/org/lenders/symspell.dict +196 -0
  40. resolvekit-0.1.1/src/resolvekit/_data/org/political_parties/entities.sqlite +0 -0
  41. resolvekit-0.1.1/src/resolvekit/_data/org/political_parties/metadata.json +44 -0
  42. resolvekit-0.1.1/src/resolvekit/_data/org/political_parties/symspell.dict +160 -0
  43. resolvekit-0.1.1/src/resolvekit/_data/org/providers/entities.sqlite +0 -0
  44. resolvekit-0.1.1/src/resolvekit/_data/org/providers/metadata.json +44 -0
  45. resolvekit-0.1.1/src/resolvekit/_data/org/providers/symspell.dict +598 -0
  46. resolvekit-0.1.1/src/resolvekit/_data/parse/deny_list.json +26 -0
  47. resolvekit-0.1.1/src/resolvekit/_pandas_integration.py +125 -0
  48. resolvekit-0.1.1/src/resolvekit/_polars_integration.py +100 -0
  49. resolvekit-0.1.1/src/resolvekit/builder/__init__.py +40 -0
  50. resolvekit-0.1.1/src/resolvekit/builder/_outcomes.py +29 -0
  51. resolvekit-0.1.1/src/resolvekit/builder/api.py +310 -0
  52. resolvekit-0.1.1/src/resolvekit/builder/containment.py +160 -0
  53. resolvekit-0.1.1/src/resolvekit/builder/country_geonames_aliases.py +256 -0
  54. resolvekit-0.1.1/src/resolvekit/builder/datapack_layout.py +99 -0
  55. resolvekit-0.1.1/src/resolvekit/builder/entity_validity.py +87 -0
  56. resolvekit-0.1.1/src/resolvekit/builder/formal_names.py +226 -0
  57. resolvekit-0.1.1/src/resolvekit/builder/geo_shared.py +644 -0
  58. resolvekit-0.1.1/src/resolvekit/builder/groups.py +146 -0
  59. resolvekit-0.1.1/src/resolvekit/builder/inspection.py +105 -0
  60. resolvekit-0.1.1/src/resolvekit/builder/models.py +165 -0
  61. resolvekit-0.1.1/src/resolvekit/builder/module_catalog.py +257 -0
  62. resolvekit-0.1.1/src/resolvekit/builder/oecd_dac.py +699 -0
  63. resolvekit-0.1.1/src/resolvekit/builder/pipeline/__init__.py +37 -0
  64. resolvekit-0.1.1/src/resolvekit/builder/pipeline/build_report.py +66 -0
  65. resolvekit-0.1.1/src/resolvekit/builder/pipeline/changelog.py +167 -0
  66. resolvekit-0.1.1/src/resolvekit/builder/pipeline/chunk.py +243 -0
  67. resolvekit-0.1.1/src/resolvekit/builder/pipeline/contribution.py +171 -0
  68. resolvekit-0.1.1/src/resolvekit/builder/pipeline/core.py +236 -0
  69. resolvekit-0.1.1/src/resolvekit/builder/pipeline/discover.py +409 -0
  70. resolvekit-0.1.1/src/resolvekit/builder/pipeline/enrich.py +398 -0
  71. resolvekit-0.1.1/src/resolvekit/builder/pipeline/geo_staging.py +218 -0
  72. resolvekit-0.1.1/src/resolvekit/builder/pipeline/packaging.py +415 -0
  73. resolvekit-0.1.1/src/resolvekit/builder/pipeline/promote.py +49 -0
  74. resolvekit-0.1.1/src/resolvekit/builder/pipeline/qa.py +58 -0
  75. resolvekit-0.1.1/src/resolvekit/builder/pipeline/reconcile.py +186 -0
  76. resolvekit-0.1.1/src/resolvekit/builder/pipeline/stages.py +543 -0
  77. resolvekit-0.1.1/src/resolvekit/builder/pipeline/types.py +205 -0
  78. resolvekit-0.1.1/src/resolvekit/builder/presets.py +40 -0
  79. resolvekit-0.1.1/src/resolvekit/builder/registry.py +68 -0
  80. resolvekit-0.1.1/src/resolvekit/builder/sources/__init__.py +11 -0
  81. resolvekit-0.1.1/src/resolvekit/builder/sources/datacommons/__init__.py +43 -0
  82. resolvekit-0.1.1/src/resolvekit/builder/sources/datacommons/adapter.py +154 -0
  83. resolvekit-0.1.1/src/resolvekit/builder/sources/datacommons/base_dc_api.py +154 -0
  84. resolvekit-0.1.1/src/resolvekit/builder/sources/datacommons/bundle.py +127 -0
  85. resolvekit-0.1.1/src/resolvekit/builder/sources/datacommons/canonicalize.py +229 -0
  86. resolvekit-0.1.1/src/resolvekit/builder/sources/datacommons/client.py +452 -0
  87. resolvekit-0.1.1/src/resolvekit/builder/sources/datacommons/constants.py +36 -0
  88. resolvekit-0.1.1/src/resolvekit/builder/sources/datacommons/geo/__init__.py +15 -0
  89. resolvekit-0.1.1/src/resolvekit/builder/sources/datacommons/geo/_admin_walk.py +356 -0
  90. resolvekit-0.1.1/src/resolvekit/builder/sources/datacommons/geo/_chunk_callback.py +73 -0
  91. resolvekit-0.1.1/src/resolvekit/builder/sources/datacommons/geo/_geo_regions.py +92 -0
  92. resolvekit-0.1.1/src/resolvekit/builder/sources/datacommons/geo/_ordered_emitter.py +111 -0
  93. resolvekit-0.1.1/src/resolvekit/builder/sources/datacommons/geo/_progress_context.py +27 -0
  94. resolvekit-0.1.1/src/resolvekit/builder/sources/datacommons/geo/_streaming.py +143 -0
  95. resolvekit-0.1.1/src/resolvekit/builder/sources/datacommons/geo/_type_mappings.py +65 -0
  96. resolvekit-0.1.1/src/resolvekit/builder/sources/datacommons/geo/adapter.py +126 -0
  97. resolvekit-0.1.1/src/resolvekit/builder/sources/datacommons/geo/dc_api.py +287 -0
  98. resolvekit-0.1.1/src/resolvekit/builder/sources/datacommons/geo/discovery.py +415 -0
  99. resolvekit-0.1.1/src/resolvekit/builder/sources/datacommons/geo/fetch.py +123 -0
  100. resolvekit-0.1.1/src/resolvekit/builder/sources/datacommons/geo/mappings.py +177 -0
  101. resolvekit-0.1.1/src/resolvekit/builder/sources/datacommons/geo/profile.py +77 -0
  102. resolvekit-0.1.1/src/resolvekit/builder/sources/datacommons/geo/prominence.py +152 -0
  103. resolvekit-0.1.1/src/resolvekit/builder/sources/datacommons/models.py +140 -0
  104. resolvekit-0.1.1/src/resolvekit/builder/sources/datacommons/node.py +283 -0
  105. resolvekit-0.1.1/src/resolvekit/builder/sources/datacommons/org/__init__.py +11 -0
  106. resolvekit-0.1.1/src/resolvekit/builder/sources/datacommons/org/adapter.py +60 -0
  107. resolvekit-0.1.1/src/resolvekit/builder/sources/datacommons/org/dc_api.py +133 -0
  108. resolvekit-0.1.1/src/resolvekit/builder/sources/datacommons/org/discovery.py +79 -0
  109. resolvekit-0.1.1/src/resolvekit/builder/sources/datacommons/org/fetch.py +73 -0
  110. resolvekit-0.1.1/src/resolvekit/builder/sources/datacommons/org/mappings.py +83 -0
  111. resolvekit-0.1.1/src/resolvekit/builder/sources/datacommons/org/profile.py +155 -0
  112. resolvekit-0.1.1/src/resolvekit/builder/sources/datacommons/rows.py +277 -0
  113. resolvekit-0.1.1/src/resolvekit/builder/sources/datacommons/specs.py +76 -0
  114. resolvekit-0.1.1/src/resolvekit/builder/sources/datacommons/text.py +20 -0
  115. resolvekit-0.1.1/src/resolvekit/builder/sources/discovery_events.py +149 -0
  116. resolvekit-0.1.1/src/resolvekit/builder/sources/protocol.py +129 -0
  117. resolvekit-0.1.1/src/resolvekit/builder/sources/seed/continents.py +181 -0
  118. resolvekit-0.1.1/src/resolvekit/builder/sources/seed/m49.py +571 -0
  119. resolvekit-0.1.1/src/resolvekit/builder/sources/wikidata/__init__.py +8 -0
  120. resolvekit-0.1.1/src/resolvekit/builder/sources/wikidata/aliases.py +221 -0
  121. resolvekit-0.1.1/src/resolvekit/builder/sources/wikidata/sitelinks.py +91 -0
  122. resolvekit-0.1.1/src/resolvekit/builder/sqlite/__init__.py +55 -0
  123. resolvekit-0.1.1/src/resolvekit/builder/sqlite/constants.py +14 -0
  124. resolvekit-0.1.1/src/resolvekit/builder/sqlite/context.py +15 -0
  125. resolvekit-0.1.1/src/resolvekit/builder/sqlite/diff.py +294 -0
  126. resolvekit-0.1.1/src/resolvekit/builder/sqlite/export.py +211 -0
  127. resolvekit-0.1.1/src/resolvekit/builder/sqlite/specs.py +99 -0
  128. resolvekit-0.1.1/src/resolvekit/builder/sqlite/validate.py +176 -0
  129. resolvekit-0.1.1/src/resolvekit/builder/sqlite/write.py +219 -0
  130. resolvekit-0.1.1/src/resolvekit/builder/state.py +368 -0
  131. resolvekit-0.1.1/src/resolvekit/builder/utils.py +64 -0
  132. resolvekit-0.1.1/src/resolvekit/calibration/__init__.py +112 -0
  133. resolvekit-0.1.1/src/resolvekit/calibration/adapters/__init__.py +36 -0
  134. resolvekit-0.1.1/src/resolvekit/calibration/adapters/_cldr_source.py +77 -0
  135. resolvekit-0.1.1/src/resolvekit/calibration/adapters/_latin_filter.py +14 -0
  136. resolvekit-0.1.1/src/resolvekit/calibration/adapters/_wikidata_client.py +100 -0
  137. resolvekit-0.1.1/src/resolvekit/calibration/adapters/cldr.py +104 -0
  138. resolvekit-0.1.1/src/resolvekit/calibration/adapters/geonames.py +183 -0
  139. resolvekit-0.1.1/src/resolvekit/calibration/adapters/multilingual_names.py +272 -0
  140. resolvekit-0.1.1/src/resolvekit/calibration/adapters/synthetic.py +447 -0
  141. resolvekit-0.1.1/src/resolvekit/calibration/adapters/wikidata.py +205 -0
  142. resolvekit-0.1.1/src/resolvekit/calibration/dataset.py +321 -0
  143. resolvekit-0.1.1/src/resolvekit/calibration/evaluation.py +210 -0
  144. resolvekit-0.1.1/src/resolvekit/calibration/fitting.py +254 -0
  145. resolvekit-0.1.1/src/resolvekit/calibration/models.py +110 -0
  146. resolvekit-0.1.1/src/resolvekit/calibration/scoring_model.py +87 -0
  147. resolvekit-0.1.1/src/resolvekit/calibration/train.py +537 -0
  148. resolvekit-0.1.1/src/resolvekit/calibration/vectorize.py +77 -0
  149. resolvekit-0.1.1/src/resolvekit/core/__init__.py +117 -0
  150. resolvekit-0.1.1/src/resolvekit/core/api/__init__.py +7 -0
  151. resolvekit-0.1.1/src/resolvekit/core/api/_byod.py +362 -0
  152. resolvekit-0.1.1/src/resolvekit/core/api/_pivot.py +77 -0
  153. resolvekit-0.1.1/src/resolvekit/core/api/batch.py +164 -0
  154. resolvekit-0.1.1/src/resolvekit/core/api/bulk.py +787 -0
  155. resolvekit-0.1.1/src/resolvekit/core/api/cache.py +103 -0
  156. resolvekit-0.1.1/src/resolvekit/core/api/code_lookup.py +219 -0
  157. resolvekit-0.1.1/src/resolvekit/core/api/containment_api.py +105 -0
  158. resolvekit-0.1.1/src/resolvekit/core/api/diagnostics.py +176 -0
  159. resolvekit-0.1.1/src/resolvekit/core/api/entity_lookup.py +123 -0
  160. resolvekit-0.1.1/src/resolvekit/core/api/group_api.py +219 -0
  161. resolvekit-0.1.1/src/resolvekit/core/api/info.py +186 -0
  162. resolvekit-0.1.1/src/resolvekit/core/api/inspect.py +125 -0
  163. resolvekit-0.1.1/src/resolvekit/core/api/loading/__init__.py +50 -0
  164. resolvekit-0.1.1/src/resolvekit/core/api/loading/module_catalog.py +329 -0
  165. resolvekit-0.1.1/src/resolvekit/core/api/loading/pack_loader.py +168 -0
  166. resolvekit-0.1.1/src/resolvekit/core/api/loading/paths.py +213 -0
  167. resolvekit-0.1.1/src/resolvekit/core/api/loading/store_builder.py +131 -0
  168. resolvekit-0.1.1/src/resolvekit/core/api/modules.py +112 -0
  169. resolvekit-0.1.1/src/resolvekit/core/api/output_spec.py +333 -0
  170. resolvekit-0.1.1/src/resolvekit/core/api/output_view.py +184 -0
  171. resolvekit-0.1.1/src/resolvekit/core/api/query_prep.py +107 -0
  172. resolvekit-0.1.1/src/resolvekit/core/api/resolve_flow.py +366 -0
  173. resolvekit-0.1.1/src/resolvekit/core/api/resolver.py +2673 -0
  174. resolvekit-0.1.1/src/resolvekit/core/api/snap.py +115 -0
  175. resolvekit-0.1.1/src/resolvekit/core/api/suggest_flow.py +283 -0
  176. resolvekit-0.1.1/src/resolvekit/core/byod/__init__.py +21 -0
  177. resolvekit-0.1.1/src/resolvekit/core/byod/build.py +361 -0
  178. resolvekit-0.1.1/src/resolvekit/core/byod/builder.py +37 -0
  179. resolvekit-0.1.1/src/resolvekit/core/byod/cache.py +250 -0
  180. resolvekit-0.1.1/src/resolvekit/core/byod/intake.py +464 -0
  181. resolvekit-0.1.1/src/resolvekit/core/byod/result.py +41 -0
  182. resolvekit-0.1.1/src/resolvekit/core/config.py +133 -0
  183. resolvekit-0.1.1/src/resolvekit/core/datapack.py +513 -0
  184. resolvekit-0.1.1/src/resolvekit/core/download_api.py +110 -0
  185. resolvekit-0.1.1/src/resolvekit/core/engine/__init__.py +57 -0
  186. resolvekit-0.1.1/src/resolvekit/core/engine/_stages.py +475 -0
  187. resolvekit-0.1.1/src/resolvekit/core/engine/config.py +47 -0
  188. resolvekit-0.1.1/src/resolvekit/core/engine/decision.py +293 -0
  189. resolvekit-0.1.1/src/resolvekit/core/engine/enrichment.py +582 -0
  190. resolvekit-0.1.1/src/resolvekit/core/engine/interfaces.py +531 -0
  191. resolvekit-0.1.1/src/resolvekit/core/engine/multi_runner.py +979 -0
  192. resolvekit-0.1.1/src/resolvekit/core/engine/router.py +328 -0
  193. resolvekit-0.1.1/src/resolvekit/core/engine/runner.py +860 -0
  194. resolvekit-0.1.1/src/resolvekit/core/engine/suggest_rank.py +288 -0
  195. resolvekit-0.1.1/src/resolvekit/core/engine/tier_utils.py +147 -0
  196. resolvekit-0.1.1/src/resolvekit/core/errors.py +546 -0
  197. resolvekit-0.1.1/src/resolvekit/core/errors_base.py +38 -0
  198. resolvekit-0.1.1/src/resolvekit/core/explain/__init__.py +58 -0
  199. resolvekit-0.1.1/src/resolvekit/core/explain/events.py +41 -0
  200. resolvekit-0.1.1/src/resolvekit/core/explain/feature_text.py +47 -0
  201. resolvekit-0.1.1/src/resolvekit/core/explain/helpers.py +89 -0
  202. resolvekit-0.1.1/src/resolvekit/core/explain/protocol.py +32 -0
  203. resolvekit-0.1.1/src/resolvekit/core/explain/renderers.py +368 -0
  204. resolvekit-0.1.1/src/resolvekit/core/explain/result_html.py +245 -0
  205. resolvekit-0.1.1/src/resolvekit/core/explain/result_types.py +29 -0
  206. resolvekit-0.1.1/src/resolvekit/core/explain/scorecard.py +588 -0
  207. resolvekit-0.1.1/src/resolvekit/core/explain/sink.py +119 -0
  208. resolvekit-0.1.1/src/resolvekit/core/linking/__init__.py +24 -0
  209. resolvekit-0.1.1/src/resolvekit/core/linking/base_linker.py +104 -0
  210. resolvekit-0.1.1/src/resolvekit/core/linking/base_normalizer.py +58 -0
  211. resolvekit-0.1.1/src/resolvekit/core/linking/linker.py +133 -0
  212. resolvekit-0.1.1/src/resolvekit/core/linking/normalizer.py +51 -0
  213. resolvekit-0.1.1/src/resolvekit/core/merge.py +99 -0
  214. resolvekit-0.1.1/src/resolvekit/core/model/__init__.py +82 -0
  215. resolvekit-0.1.1/src/resolvekit/core/model/_repr.py +92 -0
  216. resolvekit-0.1.1/src/resolvekit/core/model/bulk_result.py +593 -0
  217. resolvekit-0.1.1/src/resolvekit/core/model/candidate.py +186 -0
  218. resolvekit-0.1.1/src/resolvekit/core/model/crosswalk.py +262 -0
  219. resolvekit-0.1.1/src/resolvekit/core/model/entity.py +250 -0
  220. resolvekit-0.1.1/src/resolvekit/core/model/entity_attributes.py +142 -0
  221. resolvekit-0.1.1/src/resolvekit/core/model/features.py +14 -0
  222. resolvekit-0.1.1/src/resolvekit/core/model/generation.py +45 -0
  223. resolvekit-0.1.1/src/resolvekit/core/model/inspection.py +122 -0
  224. resolvekit-0.1.1/src/resolvekit/core/model/name_grammar.py +179 -0
  225. resolvekit-0.1.1/src/resolvekit/core/model/query.py +99 -0
  226. resolvekit-0.1.1/src/resolvekit/core/model/result.py +486 -0
  227. resolvekit-0.1.1/src/resolvekit/core/module_registry.py +382 -0
  228. resolvekit-0.1.1/src/resolvekit/core/overlay_loader.py +166 -0
  229. resolvekit-0.1.1/src/resolvekit/core/parse/__init__.py +9 -0
  230. resolvekit-0.1.1/src/resolvekit/core/parse/_pivot.py +109 -0
  231. resolvekit-0.1.1/src/resolvekit/core/parse/automaton.py +302 -0
  232. resolvekit-0.1.1/src/resolvekit/core/parse/denylist.py +66 -0
  233. resolvekit-0.1.1/src/resolvekit/core/parse/detect.py +179 -0
  234. resolvekit-0.1.1/src/resolvekit/core/parse/engine.py +273 -0
  235. resolvekit-0.1.1/src/resolvekit/core/parse/link.py +381 -0
  236. resolvekit-0.1.1/src/resolvekit/core/parse/offsets.py +442 -0
  237. resolvekit-0.1.1/src/resolvekit/core/parse/result.py +336 -0
  238. resolvekit-0.1.1/src/resolvekit/core/registry.py +280 -0
  239. resolvekit-0.1.1/src/resolvekit/core/remote.py +330 -0
  240. resolvekit-0.1.1/src/resolvekit/core/store/__init__.py +13 -0
  241. resolvekit-0.1.1/src/resolvekit/core/store/composed_sqlite.py +434 -0
  242. resolvekit-0.1.1/src/resolvekit/core/store/composite.py +191 -0
  243. resolvekit-0.1.1/src/resolvekit/core/store/interface.py +315 -0
  244. resolvekit-0.1.1/src/resolvekit/core/store/merging.py +255 -0
  245. resolvekit-0.1.1/src/resolvekit/core/store/sqlite.py +1028 -0
  246. resolvekit-0.1.1/src/resolvekit/core/store/sqlite_helpers.py +159 -0
  247. resolvekit-0.1.1/src/resolvekit/core/store/store_view.py +175 -0
  248. resolvekit-0.1.1/src/resolvekit/core/util/__init__.py +21 -0
  249. resolvekit-0.1.1/src/resolvekit/core/util/normalization.py +327 -0
  250. resolvekit-0.1.1/src/resolvekit/core/util/sentinel.py +183 -0
  251. resolvekit-0.1.1/src/resolvekit/core/version.py +118 -0
  252. resolvekit-0.1.1/src/resolvekit/diagnostics/__init__.py +77 -0
  253. resolvekit-0.1.1/src/resolvekit/errors/__init__.py +70 -0
  254. resolvekit-0.1.1/src/resolvekit/extensions.py +32 -0
  255. resolvekit-0.1.1/src/resolvekit/packs/_artifacts.py +43 -0
  256. resolvekit-0.1.1/src/resolvekit/packs/custom/__init__.py +12 -0
  257. resolvekit-0.1.1/src/resolvekit/packs/custom/decision.py +27 -0
  258. resolvekit-0.1.1/src/resolvekit/packs/custom/extractor.py +71 -0
  259. resolvekit-0.1.1/src/resolvekit/packs/custom/features.py +38 -0
  260. resolvekit-0.1.1/src/resolvekit/packs/custom/normalizer.py +20 -0
  261. resolvekit-0.1.1/src/resolvekit/packs/custom/pack.py +142 -0
  262. resolvekit-0.1.1/src/resolvekit/packs/custom/scoring.py +69 -0
  263. resolvekit-0.1.1/src/resolvekit/packs/custom/sources/__init__.py +13 -0
  264. resolvekit-0.1.1/src/resolvekit/packs/custom/sources/exact_code.py +53 -0
  265. resolvekit-0.1.1/src/resolvekit/packs/custom/sources/exact_name.py +86 -0
  266. resolvekit-0.1.1/src/resolvekit/packs/custom/sources/fts.py +24 -0
  267. resolvekit-0.1.1/src/resolvekit/packs/custom/sources/fuzzy.py +20 -0
  268. resolvekit-0.1.1/src/resolvekit/packs/geo/__init__.py +39 -0
  269. resolvekit-0.1.1/src/resolvekit/packs/geo/_specificity.py +22 -0
  270. resolvekit-0.1.1/src/resolvekit/packs/geo/build/__init__.py +5 -0
  271. resolvekit-0.1.1/src/resolvekit/packs/geo/build/builder.py +33 -0
  272. resolvekit-0.1.1/src/resolvekit/packs/geo/constraints/__init__.py +13 -0
  273. resolvekit-0.1.1/src/resolvekit/packs/geo/constraints/containment.py +112 -0
  274. resolvekit-0.1.1/src/resolvekit/packs/geo/constraints/membership.py +62 -0
  275. resolvekit-0.1.1/src/resolvekit/packs/geo/constraints/temporal.py +19 -0
  276. resolvekit-0.1.1/src/resolvekit/packs/geo/constraints/type_constraint.py +51 -0
  277. resolvekit-0.1.1/src/resolvekit/packs/geo/decision.py +119 -0
  278. resolvekit-0.1.1/src/resolvekit/packs/geo/extractor.py +171 -0
  279. resolvekit-0.1.1/src/resolvekit/packs/geo/features.py +51 -0
  280. resolvekit-0.1.1/src/resolvekit/packs/geo/linker.py +12 -0
  281. resolvekit-0.1.1/src/resolvekit/packs/geo/normalizer.py +11 -0
  282. resolvekit-0.1.1/src/resolvekit/packs/geo/pack.py +315 -0
  283. resolvekit-0.1.1/src/resolvekit/packs/geo/routing.py +77 -0
  284. resolvekit-0.1.1/src/resolvekit/packs/geo/scoring.py +187 -0
  285. resolvekit-0.1.1/src/resolvekit/packs/geo/sources/__init__.py +17 -0
  286. resolvekit-0.1.1/src/resolvekit/packs/geo/sources/_short_input.py +183 -0
  287. resolvekit-0.1.1/src/resolvekit/packs/geo/sources/exact_code.py +193 -0
  288. resolvekit-0.1.1/src/resolvekit/packs/geo/sources/exact_name.py +114 -0
  289. resolvekit-0.1.1/src/resolvekit/packs/geo/sources/fts.py +31 -0
  290. resolvekit-0.1.1/src/resolvekit/packs/geo/sources/fuzzy.py +132 -0
  291. resolvekit-0.1.1/src/resolvekit/packs/geo/sources/fuzzy_retrieval.py +53 -0
  292. resolvekit-0.1.1/src/resolvekit/packs/geo/sources/query_shapes.py +49 -0
  293. resolvekit-0.1.1/src/resolvekit/packs/geo/sources/symspell.py +231 -0
  294. resolvekit-0.1.1/src/resolvekit/packs/org/__init__.py +6 -0
  295. resolvekit-0.1.1/src/resolvekit/packs/org/_acronym.py +33 -0
  296. resolvekit-0.1.1/src/resolvekit/packs/org/build/__init__.py +5 -0
  297. resolvekit-0.1.1/src/resolvekit/packs/org/build/builder.py +198 -0
  298. resolvekit-0.1.1/src/resolvekit/packs/org/constraints/__init__.py +15 -0
  299. resolvekit-0.1.1/src/resolvekit/packs/org/constraints/country_relevance.py +72 -0
  300. resolvekit-0.1.1/src/resolvekit/packs/org/constraints/parent_org.py +70 -0
  301. resolvekit-0.1.1/src/resolvekit/packs/org/constraints/temporal.py +28 -0
  302. resolvekit-0.1.1/src/resolvekit/packs/org/constraints/type_constraint.py +11 -0
  303. resolvekit-0.1.1/src/resolvekit/packs/org/decision.py +144 -0
  304. resolvekit-0.1.1/src/resolvekit/packs/org/feature_extractor.py +110 -0
  305. resolvekit-0.1.1/src/resolvekit/packs/org/features.py +52 -0
  306. resolvekit-0.1.1/src/resolvekit/packs/org/linker.py +12 -0
  307. resolvekit-0.1.1/src/resolvekit/packs/org/normalizer.py +47 -0
  308. resolvekit-0.1.1/src/resolvekit/packs/org/pack.py +167 -0
  309. resolvekit-0.1.1/src/resolvekit/packs/org/routing.py +41 -0
  310. resolvekit-0.1.1/src/resolvekit/packs/org/scoring.py +66 -0
  311. resolvekit-0.1.1/src/resolvekit/packs/org/sources/__init__.py +17 -0
  312. resolvekit-0.1.1/src/resolvekit/packs/org/sources/acronym.py +76 -0
  313. resolvekit-0.1.1/src/resolvekit/packs/org/sources/exact_code.py +78 -0
  314. resolvekit-0.1.1/src/resolvekit/packs/org/sources/exact_name.py +66 -0
  315. resolvekit-0.1.1/src/resolvekit/packs/org/sources/fts.py +24 -0
  316. resolvekit-0.1.1/src/resolvekit/packs/org/sources/fuzzy.py +19 -0
  317. resolvekit-0.1.1/src/resolvekit/packs/org/sources/symspell.py +26 -0
  318. resolvekit-0.1.1/src/resolvekit/pandas/__init__.py +18 -0
  319. resolvekit-0.1.1/src/resolvekit/polars/__init__.py +20 -0
  320. resolvekit-0.1.1/src/resolvekit/shared/__init__.py +60 -0
  321. resolvekit-0.1.1/src/resolvekit/shared/build/__init__.py +7 -0
  322. resolvekit-0.1.1/src/resolvekit/shared/build/base_builder.py +624 -0
  323. resolvekit-0.1.1/src/resolvekit/shared/build/schema.py +59 -0
  324. resolvekit-0.1.1/src/resolvekit/shared/constraints/__init__.py +17 -0
  325. resolvekit-0.1.1/src/resolvekit/shared/constraints/temporal_constraint.py +167 -0
  326. resolvekit-0.1.1/src/resolvekit/shared/constraints/type_constraint.py +143 -0
  327. resolvekit-0.1.1/src/resolvekit/shared/scoring_base.py +96 -0
  328. resolvekit-0.1.1/src/resolvekit/shared/sources/__init__.py +16 -0
  329. resolvekit-0.1.1/src/resolvekit/shared/sources/code_helpers.py +34 -0
  330. resolvekit-0.1.1/src/resolvekit/shared/sources/fts_base.py +131 -0
  331. resolvekit-0.1.1/src/resolvekit/shared/sources/fuzzy_base.py +153 -0
  332. resolvekit-0.1.1/src/resolvekit/shared/sources/fuzzy_retrieval_base.py +182 -0
  333. resolvekit-0.1.1/src/resolvekit/shared/sources/symspell_base.py +544 -0
  334. resolvekit-0.1.1/src/resolvekit/types/__init__.py +49 -0
  335. resolvekit-0.0.1/PKG-INFO +0 -36
  336. resolvekit-0.0.1/README.md +0 -2
  337. resolvekit-0.0.1/pyproject.toml +0 -214
  338. resolvekit-0.0.1/src/resolvekit/README.md +0 -134
  339. resolvekit-0.0.1/src/resolvekit/__init__.py +0 -67
  340. resolvekit-0.0.1/src/resolvekit/api/README.md +0 -165
  341. resolvekit-0.0.1/src/resolvekit/api/__init__.py +0 -10
  342. resolvekit-0.0.1/src/resolvekit/api/convenience.py +0 -53
  343. resolvekit-0.0.1/src/resolvekit/api/resolver.py +0 -457
  344. resolvekit-0.0.1/src/resolvekit/builders/README.md +0 -173
  345. resolvekit-0.0.1/src/resolvekit/calibration/README.md +0 -351
  346. resolvekit-0.0.1/src/resolvekit/calibration/__init__.py +0 -12
  347. resolvekit-0.0.1/src/resolvekit/calibration/calibrator.py +0 -184
  348. resolvekit-0.0.1/src/resolvekit/calibration/features.py +0 -139
  349. resolvekit-0.0.1/src/resolvekit/calibration/models.py +0 -78
  350. resolvekit-0.0.1/src/resolvekit/cli/README.md +0 -215
  351. resolvekit-0.0.1/src/resolvekit/cli/main.py +0 -18
  352. resolvekit-0.0.1/src/resolvekit/config.py +0 -128
  353. resolvekit-0.0.1/src/resolvekit/constants.py +0 -252
  354. resolvekit-0.0.1/src/resolvekit/constraints/README.md +0 -102
  355. resolvekit-0.0.1/src/resolvekit/constraints/__init__.py +0 -17
  356. resolvekit-0.0.1/src/resolvekit/constraints/constraint_engine.py +0 -111
  357. resolvekit-0.0.1/src/resolvekit/constraints/hierarchy_validator.py +0 -148
  358. resolvekit-0.0.1/src/resolvekit/constraints/membership_validator.py +0 -60
  359. resolvekit-0.0.1/src/resolvekit/constraints/protocols.py +0 -33
  360. resolvekit-0.0.1/src/resolvekit/constraints/temporal_validator.py +0 -43
  361. resolvekit-0.0.1/src/resolvekit/constraints/type_validator.py +0 -42
  362. resolvekit-0.0.1/src/resolvekit/data/README.md +0 -165
  363. resolvekit-0.0.1/src/resolvekit/data/__init__.py +0 -14
  364. resolvekit-0.0.1/src/resolvekit/data/alias_repository.py +0 -206
  365. resolvekit-0.0.1/src/resolvekit/data/code_repository.py +0 -85
  366. resolvekit-0.0.1/src/resolvekit/data/context_filters.py +0 -49
  367. resolvekit-0.0.1/src/resolvekit/data/db_manager.py +0 -196
  368. resolvekit-0.0.1/src/resolvekit/data/entity_repository.py +0 -466
  369. resolvekit-0.0.1/src/resolvekit/data/membership_repository.py +0 -107
  370. resolvekit-0.0.1/src/resolvekit/data/query_builder.py +0 -177
  371. resolvekit-0.0.1/src/resolvekit/data/schema.py +0 -122
  372. resolvekit-0.0.1/src/resolvekit/disambiguation/README.md +0 -72
  373. resolvekit-0.0.1/src/resolvekit/extraction/README.md +0 -204
  374. resolvekit-0.0.1/src/resolvekit/matchers/README.md +0 -77
  375. resolvekit-0.0.1/src/resolvekit/matchers/__init__.py +0 -65
  376. resolvekit-0.0.1/src/resolvekit/matchers/alias_exact.py +0 -65
  377. resolvekit-0.0.1/src/resolvekit/matchers/canonical_name.py +0 -62
  378. resolvekit-0.0.1/src/resolvekit/matchers/cascade.py +0 -127
  379. resolvekit-0.0.1/src/resolvekit/matchers/code_validators.py +0 -250
  380. resolvekit-0.0.1/src/resolvekit/matchers/exact_code.py +0 -177
  381. resolvekit-0.0.1/src/resolvekit/matchers/fts_matcher.py +0 -106
  382. resolvekit-0.0.1/src/resolvekit/matchers/fuzzy_matcher.py +0 -142
  383. resolvekit-0.0.1/src/resolvekit/matchers/priorities.py +0 -174
  384. resolvekit-0.0.1/src/resolvekit/matchers/protocols.py +0 -75
  385. resolvekit-0.0.1/src/resolvekit/normalization/README.md +0 -192
  386. resolvekit-0.0.1/src/resolvekit/normalization/__init__.py +0 -8
  387. resolvekit-0.0.1/src/resolvekit/normalization/normalizer.py +0 -164
  388. resolvekit-0.0.1/src/resolvekit/overlays/README.md +0 -226
  389. resolvekit-0.0.1/src/resolvekit/types.py +0 -534
  390. resolvekit-0.0.1/src/resolvekit/utils/README.md +0 -188
  391. resolvekit-0.0.1/src/resolvekit/utils/__init__.py +0 -48
  392. resolvekit-0.0.1/src/resolvekit/utils/cache.py +0 -109
  393. resolvekit-0.0.1/src/resolvekit/utils/dates.py +0 -339
  394. resolvekit-0.0.1/src/resolvekit/utils/errors.py +0 -145
  395. resolvekit-0.0.1/src/resolvekit/utils/files.py +0 -366
  396. resolvekit-0.0.1/src/resolvekit/utils/logging.py +0 -219
  397. resolvekit-0.0.1/src/resolvekit/utils/text.py +0 -475
  398. resolvekit-0.0.1/src/resolvekit/utils/validation.py +0 -301
  399. {resolvekit-0.0.1/src/resolvekit/builders → resolvekit-0.1.1/src/resolvekit/builder/sources/seed}/__init__.py +0 -0
  400. {resolvekit-0.0.1/src/resolvekit/cli → resolvekit-0.1.1/src/resolvekit/packs}/__init__.py +0 -0
  401. /resolvekit-0.0.1/src/resolvekit/disambiguation/__init__.py → /resolvekit-0.1.1/src/resolvekit/packs/geo/data/.gitkeep +0 -0
  402. /resolvekit-0.0.1/src/resolvekit/extraction/__init__.py → /resolvekit-0.1.1/src/resolvekit/packs/org/data/.gitkeep +0 -0
  403. /resolvekit-0.0.1/src/resolvekit/overlays/__init__.py → /resolvekit-0.1.1/src/resolvekit/py.typed +0 -0
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2025–2026 Jorge Rivera
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
@@ -0,0 +1,225 @@
+ Metadata-Version: 2.4
+ Name: resolvekit
+ Version: 0.1.1
+ Summary: Entity and place resolution system that maps messy place/entity strings and codes to canonical entities
+ Keywords: entity-resolution,geocoding,place-names,data-commons,iso-codes,offline,disambiguation,normalization
+ Author: Jorge Rivera
+ Author-email: Jorge Rivera <jorge.rivera@one.org>
+ License-Expression: MIT
+ License-File: LICENSE
+ Classifier: Development Status :: 4 - Beta
+ Classifier: Intended Audience :: Developers
+ Classifier: Intended Audience :: Science/Research
+ Classifier: Operating System :: OS Independent
+ Classifier: Programming Language :: Python :: 3
+ Classifier: Programming Language :: Python :: 3.12
+ Classifier: Programming Language :: Python :: 3.13
+ Classifier: Programming Language :: Python :: 3 :: Only
+ Classifier: Topic :: Scientific/Engineering :: GIS
+ Classifier: Topic :: Scientific/Engineering :: Information Analysis
+ Classifier: Topic :: Text Processing :: Linguistic
+ Classifier: Typing :: Typed
+ Requires-Dist: packaging>=24.0
+ Requires-Dist: pooch>=1.8.0,<2.0
+ Requires-Dist: pydantic>=2.9,<3.0
+ Requires-Dist: rapidfuzz>=3.0.0,<4.0
+ Requires-Dist: symspellpy>=6.7.7,<7.0
+ Requires-Dist: spacy>=3.7,<4.0 ; extra == 'baselines'
+ Requires-Dist: gecko-syndata>=0.6.4 ; extra == 'calibration'
+ Requires-Dist: scikit-learn>=1.8.0 ; extra == 'calibration'
+ Requires-Dist: babel>=2.18.0 ; extra == 'data'
+ Requires-Dist: datacommons-client>=2.1.4 ; extra == 'data'
+ Requires-Dist: pycountry>=23.12.11 ; extra == 'data'
+ Requires-Dist: pyyaml>=6.0 ; extra == 'data'
+ Requires-Dist: pandas>=2.0 ; extra == 'pandas'
+ Requires-Dist: ahocorasick-rs>=1.0.3,<2.0 ; extra == 'parsing'
+ Requires-Dist: polars>=1.0 ; extra == 'polars'
+ Requires-Python: >=3.12
+ Project-URL: Homepage, https://github.com/jm-rivera/resolvekit
+ Project-URL: Repository, https://github.com/jm-rivera/resolvekit
+ Project-URL: Documentation, https://jm-rivera.github.io/resolvekit/
+ Project-URL: Issues, https://github.com/jm-rivera/resolvekit/issues
+ Provides-Extra: baselines
+ Provides-Extra: calibration
+ Provides-Extra: data
+ Provides-Extra: pandas
+ Provides-Extra: parsing
+ Provides-Extra: polars
+ Description-Content-Type: text/markdown
+
+ # resolvekit
+
+ Resolve messy place and entity strings — and codes — to canonical entity IDs, offline and deterministically. Feed it `"Brasil"`, `"Cote dIvoire"`, or `"Republic of Korea"` and get back `country/BRA`, `country/CIV`, `country/KOR`.
+
+ - **Offline and deterministic.** No network call, no LLM, no external service at resolution time. The same input gives the same output, today and next year.
55
+ - **Countries to cities to organizations.** Resolve countries, UN M.49 regions, continents, sub-national admin levels 1–5, cities, and organizations through one pipeline. Most alternatives stop at countries.
56
+ - **Typo- and alias-tolerant.** Exact-code and exact-name matching, full-text search, fuzzy matching, and typo correction, with a calibrated confidence score on every result.
57
+ - **Built for tabular data.** `bulk()` cleans a whole pandas or polars column in one call, deduplicating repeated values.
58
+ - **A graph, not a lookup table.** List the members of the EU, NATO, or OECD, check membership, convert between code systems, and query it all as of a past date.
59
+
60
+ ## Install
61
+
62
+ ```bash
63
+ # with uv
64
+ uv add resolvekit # Python >= 3.12
65
+ uv add "resolvekit[pandas]" # add the pandas integration for bulk()
66
+
67
+ # with pip
68
+ pip install resolvekit
69
+ pip install "resolvekit[pandas]"
70
+ ```
71
+
72
+ Country, region, continent, and organization data ships in the wheel and works offline immediately. Sub-national admin levels and cities are fetched on first use — see [the docs](https://jm-rivera.github.io/resolvekit/).
73
+
74
+ ## Quickstart
75
+
76
+ ```python
77
+ import resolvekit as rk
78
+
79
+ rk.resolve_id("United States") # "country/USA"
80
+ rk.resolve("Germany", to="iso3") # "DEU"
81
+ rk.resolve("Japan", to="flag") # "🇯🇵"
82
+ rk.resolve("Tanzania", to="dcid") # "country/TZA"
83
+ ```
84
+
85
+ `to=` pivots a resolved entity to `iso2`, `iso3`, `name`, `flag`, `dcid`, `wikidata`, and other code systems.
86
+
87
+ Clean a DataFrame column in one call. `bulk()` deduplicates internally, so a 10,000-row column with 50 distinct values runs 50 resolutions, not 10,000:
88
+
89
+ ```python
90
+ import pandas as pd
91
+ import resolvekit as rk
92
+
93
+ df = pd.DataFrame({"country": ["United States", "Brasil", "Cote dIvoire", "n/a"]})
94
+ df["iso3"] = rk.bulk(values=df["country"], to="iso3")
95
+ # country iso3
96
+ # United States USA
97
+ # Brasil BRA
98
+ # Cote dIvoire CIV
99
+ # n/a None
100
+ ```
101
+
102
+ ## What you can do with it
103
+
104
+ ### Every result carries a calibrated confidence score
105
+
106
+ `confidence` is a calibrated probability, not a raw similarity score — `0.93` means the pipeline estimates roughly a 93% chance the match is correct. A code match and a fuzzy name match are on the same scale:
107
+
108
+ ```python
109
+ r = rk.resolve("US")
110
+ r.entity_id # "country/USA"
111
+ r.confidence # 0.951
112
+ r.match_tier # "exact_code"
113
+ ```
114
+
115
+ Call `r.explain(verbosity="full").as_text()` for a scorecard of which matchers fired, what they matched, and why this candidate won. It renders to text, Markdown, or JSON.
116
+
117
+ ### It abstains instead of guessing
118
+
119
+ When two candidates score too close to separate, the result is ambiguous — `confidence` is `None`, and each candidate carries its own score:
120
+
121
+ ```python
122
+ r = rk.resolve("Congo")
123
+ r.is_ambiguous # True
124
+ [(c.entity_id, round(c.confidence, 3)) for c in r.candidates[:2]]
125
+ # [("country/COD", 0.908), ("country/COG", 0.908)]
126
+ ```
127
+
128
+ `resolve_id()` raises `AmbiguousResolutionError` by default; pass `on_ambiguous="null"` to return `None` instead, or `on_ambiguous="best"` to take the top candidate. Placeholder inputs like `"n/a"` or `"unknown"` short-circuit to no-match before scoring.
129
+
130
+ ### Query the graph — including as of a past date
131
+
132
+ Entities carry typed, time-aware relations. Ask who belongs to a group, what a region contains, and what either looked like on a given date:
133
+
134
+ ```python
135
+ from datetime import date
136
+
137
+ r = rk.default()
138
+
139
+ r.members_of("EU", as_codes="iso3") # 27 codes
140
+ r.is_member("United Kingdom", "EU") # False — left in 2020
141
+ r.is_member("United Kingdom", "EU", as_of=date(2018, 1, 1)) # True
142
+ r.within("Eastern Africa", entity_type="geo.country", to="iso3")
143
+ # ['BDI', 'COM', 'DJI', 'ERI', 'ETH', 'KEN', 'MDG', 'MOZ', 'MUS',
144
+ # 'MWI', 'MYT', 'RWA', 'SOM', 'SSD', 'SYC', 'TZA', 'UGA', 'ZMB', 'ZWE']
145
+ ```
146
+
147
+ `as_of` drops candidates outside their existence window before scoring — it's a hard filter, not a score penalty.
148
+
149
+ ### Extract entities from free text
150
+
151
+ `parse()` scans text with an Aho-Corasick dictionary pass, then runs the same resolution pipeline on each span. No NER model, no network call (needs `pip install "resolvekit[parsing]"`):
152
+
153
+ ```python
154
+ import resolvekit as rk
155
+
156
+ for e in rk.parse("Leaders from Kenya, Uganda and the United States met to discuss trade."):
157
+ if e.entity_id:
158
+ print(f"{e.surface!r} [{e.start}:{e.end}] -> {e.entity_id} ({e.confidence:.2f})")
159
+ # 'Kenya' [13:18] -> country/KEN (0.91)
160
+ # 'Uganda' [20:26] -> country/UGA (0.91)
161
+ # 'the United States' [31:48] -> country/USA (0.91)
162
+ ```
163
+
164
+ `parse_bulk()` runs over a list or Series and tags each span with its source row index.
165
+
166
+ ### Bring your own data
167
+
168
+ `Resolver.from_records()` builds a resolver from a list of dicts, a DataFrame, or a CSV — no schema, no server:
169
+
170
+ ```python
171
+ from resolvekit import Resolver
172
+
173
+ r = Resolver.from_records(
174
+ [{"id": "w1", "label": "Widget", "sku": "abc"},
175
+ {"id": "w2", "label": "Gadget", "sku": "xyz"}],
176
+ domain="custom", name="label", id="id", codes=["sku"],
177
+ )
178
+ r.resolve("Widget").entity_id # "custom/w1"
179
+ r.entity(sku="abc").entity_id # "custom/w1"
180
+ ```
181
+
182
+ `Resolver.augment()` joins your own columns onto an existing resolver's entities by code, attaching attributes without a rebuild and reporting what linked, what was minted, and what was skipped.
183
+
184
+ ## How it compares
185
+
186
+ resolvekit is benchmarked against seven other resolvers on a public, reproducible suite; run it yourself with `uv run python -m benchmarks`. Methodology and per-dataset numbers are in [benchmarks/README.md](https://github.com/jm-rivera/resolvekit/blob/main/benchmarks/README.md); the figures below are from the committed 2026-06-10 run. A dash means the tool was skipped because the dataset is outside its scope, not that it scored zero.
187
+
188
+ | tool | offline | entity types | `countries_en` | `countries_multilingual` | `admin` | `cities` |
189
+ |---|---|---|---|---|---|---|
190
+ | **resolvekit** | yes | country, admin1–5, city, continent, org | **0.793** | 0.632 | **0.935** | **0.858** |
191
+ | **resolvekit** (typed) | yes | same, with type hints | **0.794** | 0.614 | **0.977** | **0.862** |
192
+ | hdx_python_country | yes | country | 0.642 | 0.565 | — | — |
193
+ | countryguess | yes | country | 0.675 | 0.512 | — | — |
194
+ | country_converter | yes | country | 0.566 | 0.419 | — | — |
195
+ | geonamescache | yes | country, city | 0.057 | 0.148 | — | 0.000 |
196
+ | rapidfuzz_dict | yes | country | 0.469 | 0.370 | — | — |
197
+ | pycountry | yes | country | 0.099 | 0.143 | — | — |
198
+ | data_commons_resolve | no | country, admin1–2, city | 0.625 | **0.827** | 0.598 | 0.502 |
199
+
200
+ The lead is widest on sub-national data. Five of the seven competitors are country-only and can't answer admin or city queries at all. On `cities`, resolvekit scores 0.858 against data_commons_resolve's 0.502 and geonamescache's 0.000; on `admin`, 0.935 against data_commons_resolve's 0.598. The one dataset where resolvekit trails is `countries_multilingual`, where the online data_commons_resolve scores 0.827 against resolvekit's 0.632. resolvekit is also the only tool that emits a calibrated confidence score, with an expected calibration error (ECE) of 0.043 on the geo eval set.
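Expected calibration error (ECE), the calibration metric quoted above, measures the gap between stated confidence and observed accuracy. Predictions are grouped into confidence bins, and the per-bin gaps are averaged, weighted by bin size. The sketch below follows the standard equal-width-bin definition. The 10-bin choice is an assumption, since the benchmark suite's exact binning isn't stated here.

```python
def expected_calibration_error(
    confidences: list[float], correct: list[bool], n_bins: int = 10
) -> float:
    """ECE with equal-width bins: sum over bins of (bin share) * |accuracy - mean confidence|."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    bins: list[list[tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 lands in the top bin
        bins[idx].append((conf, ok))
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        accuracy = sum(ok for _, ok in bucket) / len(bucket)
        mean_conf = sum(c for c, _ in bucket) / len(bucket)
        ece += (len(bucket) / n) * abs(accuracy - mean_conf)
    return ece


# Ten predictions at confidence 0.9: calibrated if 9 of 10 are right.
calibrated = expected_calibration_error([0.9] * 10, [True] * 9 + [False])
overconfident = expected_calibration_error([0.9] * 10, [True] * 5 + [False] * 5)
print(round(calibrated, 3), round(overconfident, 3))  # 0.0 0.4
```

A lower ECE means a stated confidence like `0.9` is closer to the observed hit rate. This is what lets a code match and a fuzzy match share one scale.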
201
+
202
+ ## How it works
203
+
204
+ **The data.** resolvekit is built from public authority datasets — [Data Commons](https://datacommons.org/), ISO 3166, the UN M.49 standard, Wikidata, and World Bank/UN statistical groupings. Each entity gets one stable ID (countries reuse the Data Commons dcid, so `country/DEU` is the same key Data Commons uses) plus codes across dozens of systems: ISO 2/3/numeric, Wikidata, GND, VIAF, OpenStreetMap, IOC, and more. Names and codes drift over time; the ID stays fixed. The dataset is compiled into SQLite files that ship on disk, so resolution never makes a network call.
205
+
206
+ **The graph.** Those entities are nodes in a graph, not rows in a flat code table. A small set of typed, time-aware edges (`member_of`, `contained_in`, `subsidiary_of`) connects them, and each edge carries a `[valid_from, valid_until)` validity window. That structure answers questions a code-conversion library can't: which countries were in the EU on a given date, what a region contains, which group an organization belongs to. It's a small, fixed graph for entity relations: there's no query language, just the membership and containment methods on `Resolver`.
207
+
208
+ **Resolution.** A query is normalized, then run through a cascade of matchers, cheapest first — exact code, exact name, full-text search, fuzzy edit distance, then SymSpell typo correction — stopping once a confident match is found. Each candidate is scored by a calibrated model that folds match tier, edit distance, query length, and entity prominence into one confidence value, so a code match and a fuzzy name match sit on the same scale. The result is `resolved` when the top candidate clears the threshold and leads the runner-up by a margin, `ambiguous` when two are too close to separate, and `no_match` when nothing clears it.
209
+
210
+ Country, region, continent, and organization modules ship in the wheel; sub-national admin levels and cities are separate packs fetched on demand. See [How resolution works](https://jm-rivera.github.io/resolvekit/explanation/how-resolution-works/), [The entity graph](https://jm-rivera.github.io/resolvekit/explanation/knowledge-graph/), and [Offline-first and the data-pack split](https://jm-rivera.github.io/resolvekit/explanation/offline-and-data-packs/).
211
+
212
+ ## Documentation
213
+
214
+ Full documentation — tutorial, how-to guides, API reference, and design notes — lives at **[jm-rivera.github.io/resolvekit](https://jm-rivera.github.io/resolvekit/)**.
215
+
216
+ - [Install](https://jm-rivera.github.io/resolvekit/getting-started/install/)
217
+ - [Your first resolution](https://jm-rivera.github.io/resolvekit/getting-started/first-resolution/) (tutorial)
218
+ - [How-to guides](https://jm-rivera.github.io/resolvekit/how-to/clean-a-dataframe-column/)
219
+ - [API reference](https://jm-rivera.github.io/resolvekit/reference/api/)
220
+ - [How resolution works](https://jm-rivera.github.io/resolvekit/explanation/how-resolution-works/)
221
+ - [Roadmap](https://jm-rivera.github.io/resolvekit/roadmap/)
222
+
223
+ ## License
224
+
225
+ MIT — see [LICENSE](https://github.com/jm-rivera/resolvekit/blob/main/LICENSE). Bundled data is covered under multiple licenses; see [NOTICE.md](https://github.com/jm-rivera/resolvekit/blob/main/src/resolvekit/NOTICE.md) for third-party data attributions.
@@ -0,0 +1,176 @@
1
+ # resolvekit
2
+
3
+ Resolve messy place and entity strings — and codes — to canonical entity IDs, offline and deterministically. Feed it `"Brasil"`, `"Cote dIvoire"`, or `"Republic of Korea"` and get back `country/BRA`, `country/CIV`, `country/KOR`.
4
+
5
+ - **Offline and deterministic.** No network call, no LLM, no external service at resolution time. The same input gives the same output, today and next year.
6
+ - **Countries to cities to organizations.** Resolve countries, UN M.49 regions, continents, sub-national admin levels 1–5, cities, and organizations through one pipeline. Most alternatives stop at countries.
7
+ - **Typo- and alias-tolerant.** Exact-code and exact-name matching, full-text search, fuzzy matching, and typo correction, with a calibrated confidence score on every result.
8
+ - **Built for tabular data.** `bulk()` cleans a whole pandas or polars column in one call, deduplicating repeated values.
9
+ - **A graph, not a lookup table.** List the members of the EU, NATO, or OECD, check membership, convert between code systems, and query it all as of a past date.
10
+
11
+ ## Install
12
+
13
+ ```bash
14
+ # with uv
15
+ uv add resolvekit # Python >= 3.12
16
+ uv add "resolvekit[pandas]" # add the pandas integration for bulk()
17
+
18
+ # with pip
19
+ pip install resolvekit
20
+ pip install "resolvekit[pandas]"
21
+ ```
22
+
23
+ Country, region, continent, and organization data ships in the wheel and works offline immediately. Sub-national admin levels and cities are fetched on first use — see [the docs](https://jm-rivera.github.io/resolvekit/).
24
+
25
+ ## Quickstart
26
+
27
+ ```python
28
+ import resolvekit as rk
29
+
30
+ rk.resolve_id("United States") # "country/USA"
31
+ rk.resolve("Germany", to="iso3") # "DEU"
32
+ rk.resolve("Japan", to="flag") # "🇯🇵"
33
+ rk.resolve("Tanzania", to="dcid") # "country/TZA"
34
+ ```
35
+
36
+ `to=` pivots a resolved entity to `iso2`, `iso3`, `name`, `flag`, `dcid`, `wikidata`, and other code systems.
37
+
38
+ Clean a DataFrame column in one call. `bulk()` deduplicates internally, so a 10,000-row column with 50 distinct values runs 50 resolutions, not 10,000:
39
+
40
+ ```python
41
+ import pandas as pd
42
+ import resolvekit as rk
43
+
44
+ df = pd.DataFrame({"country": ["United States", "Brasil", "Cote dIvoire", "n/a"]})
45
+ df["iso3"] = rk.bulk(values=df["country"], to="iso3")
46
+ # country iso3
47
+ # United States USA
48
+ # Brasil BRA
49
+ # Cote dIvoire CIV
50
+ # n/a None
51
+ ```
52
+
53
+ ## What you can do with it
54
+
55
+ ### Every result carries a calibrated confidence score
56
+
57
+ `confidence` is a calibrated probability, not a raw similarity score — `0.93` means the pipeline estimates roughly a 93% chance the match is correct. A code match and a fuzzy name match are on the same scale:
58
+
59
+ ```python
60
+ r = rk.resolve("US")
61
+ r.entity_id # "country/USA"
62
+ r.confidence # 0.951
63
+ r.match_tier # "exact_code"
64
+ ```
65
+
66
+ Call `r.explain(verbosity="full").as_text()` for a scorecard of which matchers fired, what they matched, and why this candidate won. It renders to text, Markdown, or JSON.
67
+
68
+ ### It abstains instead of guessing
69
+
70
+ When two candidates score too close to separate, the result is ambiguous — `confidence` is `None`, and each candidate carries its own score:
71
+
72
+ ```python
73
+ r = rk.resolve("Congo")
74
+ r.is_ambiguous # True
75
+ [(c.entity_id, round(c.confidence, 3)) for c in r.candidates[:2]]
76
+ # [("country/COD", 0.908), ("country/COG", 0.908)]
77
+ ```
78
+
79
+ `resolve_id()` raises `AmbiguousResolutionError` by default; pass `on_ambiguous="null"` to return `None` instead, or `on_ambiguous="best"` to take the top candidate. Placeholder inputs like `"n/a"` or `"unknown"` short-circuit to no-match before scoring.
80
+
81
+ ### Query the graph — including as of a past date
82
+
83
+ Entities carry typed, time-aware relations. Ask who belongs to a group, what a region contains, and what either looked like on a given date:
84
+
85
+ ```python
86
+ from datetime import date
87
+
88
+ r = rk.default()
89
+
90
+ r.members_of("EU", as_codes="iso3") # 27 codes
91
+ r.is_member("United Kingdom", "EU") # False — left in 2020
92
+ r.is_member("United Kingdom", "EU", as_of=date(2018, 1, 1)) # True
93
+ r.within("Eastern Africa", entity_type="geo.country", to="iso3")
94
+ # ['BDI', 'COM', 'DJI', 'ERI', 'ETH', 'KEN', 'MDG', 'MOZ', 'MUS',
95
+ # 'MWI', 'MYT', 'RWA', 'SOM', 'SSD', 'SYC', 'TZA', 'UGA', 'ZMB', 'ZWE']
96
+ ```
97
+
98
+ `as_of` drops candidates outside their existence window before scoring — it's a hard filter, not a score penalty.
99
+
100
+ ### Extract entities from free text
101
+
102
+ `parse()` scans text with an Aho-Corasick dictionary pass, then runs the same resolution pipeline on each span. No NER model, no network call (needs `pip install "resolvekit[parsing]"`):
103
+
104
+ ```python
105
+ import resolvekit as rk
106
+
107
+ for e in rk.parse("Leaders from Kenya, Uganda and the United States met to discuss trade."):
108
+ if e.entity_id:
109
+ print(f"{e.surface!r} [{e.start}:{e.end}] -> {e.entity_id} ({e.confidence:.2f})")
110
+ # 'Kenya' [13:18] -> country/KEN (0.91)
111
+ # 'Uganda' [20:26] -> country/UGA (0.91)
112
+ # 'the United States' [31:48] -> country/USA (0.91)
113
+ ```
114
+
115
+ `parse_bulk()` runs over a list or Series and tags each span with its source row index.
116
+
117
+ ### Bring your own data
118
+
119
+ `Resolver.from_records()` builds a resolver from a list of dicts, a DataFrame, or a CSV — no schema, no server:
120
+
121
+ ```python
122
+ from resolvekit import Resolver
123
+
124
+ r = Resolver.from_records(
125
+ [{"id": "w1", "label": "Widget", "sku": "abc"},
126
+ {"id": "w2", "label": "Gadget", "sku": "xyz"}],
127
+ domain="custom", name="label", id="id", codes=["sku"],
128
+ )
129
+ r.resolve("Widget").entity_id # "custom/w1"
130
+ r.entity(sku="abc").entity_id # "custom/w1"
131
+ ```
132
+
133
+ `Resolver.augment()` joins your own columns onto an existing resolver's entities by code, attaching attributes without a rebuild and reporting what linked, what was minted, and what was skipped.
134
+
135
+ ## How it compares
136
+
137
+ resolvekit is benchmarked against seven other resolvers on a public, reproducible suite; run it yourself with `uv run python -m benchmarks`. Methodology and per-dataset numbers are in [benchmarks/README.md](https://github.com/jm-rivera/resolvekit/blob/main/benchmarks/README.md); the figures below are from the committed 2026-06-10 run. A dash means the tool was skipped because the dataset is outside its scope, not that it scored zero.
138
+
139
+ | tool | offline | entity types | `countries_en` | `countries_multilingual` | `admin` | `cities` |
140
+ |---|---|---|---|---|---|---|
141
+ | **resolvekit** | yes | country, admin1–5, city, continent, org | **0.793** | 0.632 | **0.935** | **0.858** |
142
+ | **resolvekit** (typed) | yes | same, with type hints | **0.794** | 0.614 | **0.977** | **0.862** |
143
+ | hdx_python_country | yes | country | 0.642 | 0.565 | — | — |
144
+ | countryguess | yes | country | 0.675 | 0.512 | — | — |
145
+ | country_converter | yes | country | 0.566 | 0.419 | — | — |
146
+ | geonamescache | yes | country, city | 0.057 | 0.148 | — | 0.000 |
147
+ | rapidfuzz_dict | yes | country | 0.469 | 0.370 | — | — |
148
+ | pycountry | yes | country | 0.099 | 0.143 | — | — |
149
+ | data_commons_resolve | no | country, admin1–2, city | 0.625 | **0.827** | 0.598 | 0.502 |
150
+
151
+ The lead is widest on sub-national data. Five of the seven competitors are country-only and can't answer admin or city queries at all. On `cities`, resolvekit scores 0.858 against data_commons_resolve's 0.502 and geonamescache's 0.000; on `admin`, 0.935 against data_commons_resolve's 0.598. The one dataset where resolvekit trails is `countries_multilingual`, where the online data_commons_resolve scores 0.827 against resolvekit's 0.632. resolvekit is also the only tool that emits a calibrated confidence score, with an expected calibration error (ECE) of 0.043 on the geo eval set.
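Expected calibration error (ECE), the calibration metric quoted above, measures the gap between stated confidence and observed accuracy. Predictions are grouped into confidence bins, and the per-bin gaps are averaged, weighted by bin size. The sketch below follows the standard equal-width-bin definition. The 10-bin choice is an assumption, since the benchmark suite's exact binning isn't stated here.

```python
def expected_calibration_error(
    confidences: list[float], correct: list[bool], n_bins: int = 10
) -> float:
    """ECE with equal-width bins: sum over bins of (bin share) * |accuracy - mean confidence|."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    bins: list[list[tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 lands in the top bin
        bins[idx].append((conf, ok))
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        accuracy = sum(ok for _, ok in bucket) / len(bucket)
        mean_conf = sum(c for c, _ in bucket) / len(bucket)
        ece += (len(bucket) / n) * abs(accuracy - mean_conf)
    return ece


# Ten predictions at confidence 0.9: calibrated if 9 of 10 are right.
calibrated = expected_calibration_error([0.9] * 10, [True] * 9 + [False])
overconfident = expected_calibration_error([0.9] * 10, [True] * 5 + [False] * 5)
print(round(calibrated, 3), round(overconfident, 3))  # 0.0 0.4
```

A lower ECE means a stated confidence like `0.9` is closer to the observed hit rate. This is what lets a code match and a fuzzy match share one scale.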
152
+
153
+ ## How it works
154
+
155
+ **The data.** resolvekit is built from public authority datasets — [Data Commons](https://datacommons.org/), ISO 3166, the UN M.49 standard, Wikidata, and World Bank/UN statistical groupings. Each entity gets one stable ID (countries reuse the Data Commons dcid, so `country/DEU` is the same key Data Commons uses) plus codes across dozens of systems: ISO 2/3/numeric, Wikidata, GND, VIAF, OpenStreetMap, IOC, and more. Names and codes drift over time; the ID stays fixed. The dataset is compiled into SQLite files that ship on disk, so resolution never makes a network call.
156
+
157
+ **The graph.** Those entities are nodes in a graph, not rows in a flat code table. A small set of typed, time-aware edges (`member_of`, `contained_in`, `subsidiary_of`) connects them, and each edge carries a `[valid_from, valid_until)` validity window. That structure answers questions a code-conversion library can't: which countries were in the EU on a given date, what a region contains, which group an organization belongs to. It's a small, fixed graph for entity relations: there's no query language, just the membership and containment methods on `Resolver`.
158
+
159
+ **Resolution.** A query is normalized, then run through a cascade of matchers, cheapest first — exact code, exact name, full-text search, fuzzy edit distance, then SymSpell typo correction — stopping once a confident match is found. Each candidate is scored by a calibrated model that folds match tier, edit distance, query length, and entity prominence into one confidence value, so a code match and a fuzzy name match sit on the same scale. The result is `resolved` when the top candidate clears the threshold and leads the runner-up by a margin, `ambiguous` when two are too close to separate, and `no_match` when nothing clears it.
160
+
161
+ Country, region, continent, and organization modules ship in the wheel; sub-national admin levels and cities are separate packs fetched on demand. See [How resolution works](https://jm-rivera.github.io/resolvekit/explanation/how-resolution-works/), [The entity graph](https://jm-rivera.github.io/resolvekit/explanation/knowledge-graph/), and [Offline-first and the data-pack split](https://jm-rivera.github.io/resolvekit/explanation/offline-and-data-packs/).
162
+
163
+ ## Documentation
164
+
165
+ Full documentation — tutorial, how-to guides, API reference, and design notes — lives at **[jm-rivera.github.io/resolvekit](https://jm-rivera.github.io/resolvekit/)**.
166
+
167
+ - [Install](https://jm-rivera.github.io/resolvekit/getting-started/install/)
168
+ - [Your first resolution](https://jm-rivera.github.io/resolvekit/getting-started/first-resolution/) (tutorial)
169
+ - [How-to guides](https://jm-rivera.github.io/resolvekit/how-to/clean-a-dataframe-column/)
170
+ - [API reference](https://jm-rivera.github.io/resolvekit/reference/api/)
171
+ - [How resolution works](https://jm-rivera.github.io/resolvekit/explanation/how-resolution-works/)
172
+ - [Roadmap](https://jm-rivera.github.io/resolvekit/roadmap/)
173
+
174
+ ## License
175
+
176
+ MIT — see [LICENSE](https://github.com/jm-rivera/resolvekit/blob/main/LICENSE). Bundled data is covered under multiple licenses; see [NOTICE.md](https://github.com/jm-rivera/resolvekit/blob/main/src/resolvekit/NOTICE.md) for third-party data attributions.