@pencil-agent/nano-pencil 2.0.0 → 2.0.1

Files changed (195)
  1. package/README.md +267 -267
  2. package/dist/build-meta.json +3 -3
  3. package/dist/core/export-html/AGENT.md +11 -11
  4. package/dist/core/export-html/template.css +971 -971
  5. package/dist/core/export-html/template.html +54 -54
  6. package/dist/core/mcp/mcp-client.d.ts +3 -1
  7. package/dist/core/mcp/mcp-client.js +6 -6
  8. package/dist/core/mcp/mcp-config.d.ts +3 -3
  9. package/dist/core/mcp/mcp-config.js +1 -1
  10. package/dist/core/mcp/mcp-manager.d.ts +5 -1
  11. package/dist/core/mcp/mcp-manager.js +1 -1
  12. package/dist/core/platform/config/resource-loader.d.ts +2 -0
  13. package/dist/core/platform/config/resource-loader.js +2 -2
  14. package/dist/core/runtime/agent-session.d.ts +12 -0
  15. package/dist/core/runtime/agent-session.js +8 -8
  16. package/dist/core/runtime/sdk.d.ts +8 -0
  17. package/dist/core/runtime/sdk.js +1 -1
  18. package/dist/extensions/builtin/AGENT.md +115 -115
  19. package/dist/extensions/builtin/browser/AGENT.md +17 -17
  20. package/dist/extensions/builtin/browser/agent-workspace/agent_helpers.py +12 -12
  21. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/amazon/product-search.md +198 -198
  22. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/archive-org/scraping.md +341 -341
  23. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/arxiv/scraping.md +311 -311
  24. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/arxiv-bulk/scraping.md +333 -333
  25. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/atlas/overview.md +70 -70
  26. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/booking-com/scraping.md +578 -578
  27. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/capterra/scraping.md +440 -440
  28. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/centilebrain/generate-estimates.md +110 -110
  29. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/coingecko/scraping.md +325 -325
  30. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/coinmarketcap/scraping.md +463 -463
  31. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/coursera/scraping.md +360 -360
  32. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/craigslist/scraping.md +390 -390
  33. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/crossref/scraping.md +568 -568
  34. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/dev-to/scraping.md +323 -323
  35. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/duckduckgo/scraping.md +349 -349
  36. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/ebay/scraping.md +435 -435
  37. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/etsy/scraping.md +506 -506
  38. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/eventbrite/scraping.md +363 -363
  39. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/expedia/automation.md +168 -168
  40. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/facebook/groups.md +236 -236
  41. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/facebook/pages.md +295 -295
  42. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/framer/editor.md +108 -108
  43. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/fred/scraping.md +493 -493
  44. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/g2/scraping.md +580 -580
  45. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/genius/scraping.md +511 -511
  46. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/github/repo-actions.md +65 -65
  47. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/github/scraping.md +184 -184
  48. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/glassdoor/scraping.md +543 -543
  49. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/gmail/compose.md +122 -122
  50. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/goodreads/scraping.md +461 -461
  51. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/gutenberg/scraping.md +383 -383
  52. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/hackernews/scraping.md +243 -243
  53. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/howlongtobeat/scraping.md +473 -473
  54. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/imdb/scraping.md +271 -271
  55. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/itch-io/scraping.md +436 -436
  56. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/job-boards/indeed-glassdoor.md +1021 -1021
  57. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/letterboxd/scraping.md +349 -349
  58. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/linkedin/invitation-manager.md +109 -109
  59. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/loom/folder-enumeration.md +170 -170
  60. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/macrotrends/scraping.md +537 -537
  61. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/medium/article-hydration.md +120 -120
  62. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/medium/scraping.md +414 -414
  63. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/metacritic/scraping.md +477 -477
  64. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/musicbrainz/scraping.md +478 -478
  65. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/nasa/scraping.md +339 -339
  66. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/news-aggregation/multi-source.md +205 -205
  67. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/open-library/scraping.md +472 -472
  68. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/openalex/scraping.md +470 -470
  69. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/openstreetmap/scraping.md +490 -490
  70. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/package-registries/npm-pypi.md +478 -478
  71. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/polymarket/scraping.md +234 -234
  72. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/producthunt/scraping.md +307 -307
  73. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/pubmed/scraping.md +421 -421
  74. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/quora/scraping.md +364 -364
  75. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/rawg/scraping.md +352 -352
  76. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/reddit/scraping.md +124 -124
  77. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/rest-countries/scraping.md +233 -233
  78. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/sec-edgar/scraping.md +361 -361
  79. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/shopify-admin/README.md +36 -36
  80. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/shopify-admin/embedded-apps.md +72 -72
  81. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/shopify-admin/knowledge-base.md +109 -109
  82. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/shopify-admin/polaris-inputs.md +137 -137
  83. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/soundcloud/scraping.md +362 -362
  84. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/spotify/scraping.md +339 -339
  85. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/stackoverflow/scraping.md +435 -435
  86. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/steam/scraping.md +575 -575
  87. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/substack/scraping.md +338 -338
  88. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/thetechgeeks/pricing.md +52 -52
  89. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/tiktok/upload.md +107 -107
  90. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/tradingview/scraping.md +309 -309
  91. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/trello/boards-and-lists.md +88 -88
  92. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/trustpilot/scraping.md +375 -375
  93. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/walmart/scraping.md +444 -444
  94. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/wayback-machine/scraping.md +306 -306
  95. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/weather/scraping.md +398 -398
  96. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/wellfound/scraping.md +596 -596
  97. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/world-bank/scraping.md +356 -356
  98. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/xiaohongshu/scraping.md +84 -84
  99. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/youtube/scraping.md +418 -418
  100. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/zillow/scraping.md +433 -433
  101. package/dist/extensions/builtin/browser/browser.md +73 -73
  102. package/dist/extensions/builtin/browser/install.md +142 -142
  103. package/dist/extensions/builtin/browser/interaction-skills/connection.md +48 -48
  104. package/dist/extensions/builtin/browser/interaction-skills/cookies.md +3 -3
  105. package/dist/extensions/builtin/browser/interaction-skills/cross-origin-iframes.md +3 -3
  106. package/dist/extensions/builtin/browser/interaction-skills/dialogs.md +64 -64
  107. package/dist/extensions/builtin/browser/interaction-skills/downloads.md +3 -3
  108. package/dist/extensions/builtin/browser/interaction-skills/drag-and-drop.md +3 -3
  109. package/dist/extensions/builtin/browser/interaction-skills/dropdowns.md +3 -3
  110. package/dist/extensions/builtin/browser/interaction-skills/iframes.md +3 -3
  111. package/dist/extensions/builtin/browser/interaction-skills/network-requests.md +3 -3
  112. package/dist/extensions/builtin/browser/interaction-skills/print-as-pdf.md +3 -3
  113. package/dist/extensions/builtin/browser/interaction-skills/profile-sync.md +90 -90
  114. package/dist/extensions/builtin/browser/interaction-skills/screenshots.md +17 -17
  115. package/dist/extensions/builtin/browser/interaction-skills/scrolling.md +3 -3
  116. package/dist/extensions/builtin/browser/interaction-skills/shadow-dom.md +3 -3
  117. package/dist/extensions/builtin/browser/interaction-skills/tabs.md +69 -69
  118. package/dist/extensions/builtin/browser/interaction-skills/uploads.md +1 -1
  119. package/dist/extensions/builtin/browser/interaction-skills/viewport.md +3 -3
  120. package/dist/extensions/builtin/browser/src/browser_harness/AGENT.md +15 -15
  121. package/dist/extensions/builtin/browser/src/browser_harness/__init__.py +8 -8
  122. package/dist/extensions/builtin/browser/src/browser_harness/_ipc.py +90 -90
  123. package/dist/extensions/builtin/browser/src/browser_harness/admin.py +722 -722
  124. package/dist/extensions/builtin/browser/src/browser_harness/daemon.py +328 -328
  125. package/dist/extensions/builtin/browser/src/browser_harness/helpers.py +396 -396
  126. package/dist/extensions/builtin/browser/src/browser_harness/run.py +103 -103
  127. package/dist/extensions/builtin/discipline/skills/brainstorming/SKILL.md +33 -33
  128. package/dist/extensions/builtin/discipline/skills/executing-plans/SKILL.md +25 -25
  129. package/dist/extensions/builtin/discipline/skills/finishing-development-branch/SKILL.md +25 -25
  130. package/dist/extensions/builtin/discipline/skills/receiving-code-review/SKILL.md +22 -22
  131. package/dist/extensions/builtin/discipline/skills/requesting-code-review/SKILL.md +31 -31
  132. package/dist/extensions/builtin/discipline/skills/systematic-debugging/SKILL.md +28 -28
  133. package/dist/extensions/builtin/discipline/skills/test-driven-development/SKILL.md +32 -32
  134. package/dist/extensions/builtin/discipline/skills/using-git-worktrees/SKILL.md +25 -25
  135. package/dist/extensions/builtin/discipline/skills/verification-before-completion/SKILL.md +27 -27
  136. package/dist/extensions/builtin/discipline/skills/writing-plans/SKILL.md +26 -26
  137. package/dist/extensions/builtin/goal/README.md +67 -67
  138. package/dist/extensions/builtin/grub/README.md +112 -112
  139. package/dist/extensions/builtin/link-world/agent-workspace/README.md +16 -16
  140. package/dist/extensions/builtin/link-world/internet-search/internet-search.md +65 -65
  141. package/dist/extensions/builtin/link-world/link-world-agent.md +82 -82
  142. package/dist/extensions/builtin/link-world/linkworld.md +313 -313
  143. package/dist/extensions/builtin/link-world/network-routing/network-routing.md +67 -67
  144. package/dist/extensions/builtin/loop/README.md +92 -92
  145. package/dist/extensions/builtin/mcp/figma-design.md +68 -68
  146. package/dist/extensions/builtin/mcp/mcp-management.md +85 -85
  147. package/dist/extensions/builtin/recap/AGENT.md +15 -15
  148. package/dist/extensions/builtin/sal/README.md +72 -72
  149. package/dist/extensions/builtin/security-audit/README.md +289 -289
  150. package/dist/extensions/builtin/team/AGENT.md +112 -112
  151. package/dist/extensions/builtin/team/TESTING.md +299 -299
  152. package/dist/extensions/builtin/token-save/README.md +56 -56
  153. package/dist/extensions/optional/AGENT.md +10 -10
  154. package/dist/modes/interactive/interactive-mode.js +36 -36
  155. package/dist/modes/interactive/theme/dark.json +85 -85
  156. package/dist/modes/interactive/theme/light.json +84 -84
  157. package/dist/modes/interactive/theme/theme-schema.json +335 -335
  158. package/dist/modes/interactive/theme/warm.json +81 -81
  159. package/dist/node_modules/@pencil-agent/agent-core/dist/agent-loop.js +3 -2
  160. package/dist/node_modules/@pencil-agent/agent-core/dist/structured-adaptive-agent-loop.js +2 -1
  161. package/dist/node_modules/@pencil-agent/ai/dist/cli.js +0 -0
  162. package/docs/cc-agent-design.md +1297 -0
  163. package/docs/cc-tui-design.md +1333 -0
  164. package/docs/codex-goal-command-impl.md +1055 -1055
  165. package/docs/codex-goal-vs-grub.md +500 -500
  166. package/docs/custom-provider.md +27 -27
  167. package/docs/extensions.md +27 -27
  168. package/docs/keybindings.md +27 -27
  169. package/docs/loop 重构完成总结.md +250 -250
  170. package/docs/loop 重构完成报告.md +122 -122
  171. package/docs/loop 重构方案.md +1222 -1222
  172. package/docs/loop 重构方案实现报告.md +158 -158
  173. package/docs/loop 重构方案对比分析.md +128 -128
  174. package/docs/loop 重构计划.md +320 -320
  175. package/docs/loop-usage-examples.md +214 -214
  176. package/docs/models.md +27 -27
  177. package/docs/nanoPencil-学习计划.md +170 -0
  178. package/docs/packages.md +27 -27
  179. package/docs/pi-design-philosophy.md +457 -457
  180. package/docs/planmode.md +1987 -1987
  181. package/docs/prompt-templates.md +27 -27
  182. package/docs/providers.md +27 -27
  183. package/docs/scan-report.md +3820 -0
  184. package/docs/sdk.md +27 -27
  185. package/docs/skills.md +27 -27
  186. package/docs/themes.md +27 -27
  187. package/docs/tui.md +27 -27
  188. package/docs//345/257/271/346/240/207Claude-Code.md +1775 -0
  189. package/docs//351/230/277/351/207/214/345/267/264/345/267/264/350/264/242/346/212/245/345/210/206/346/236/220/344/271/246.md +261 -0
  190. package/package.json +190 -190
  191. package/docs/ACP/345/215/217/350/256/256/351/233/206/346/210/220/345/274/200/345/217/221/346/226/207/346/241/243.md +0 -851
  192. package/docs/SDK-TESTING.md +0 -364
  193. package/docs/mem-core/346/212/200/346/234/257/346/226/207/346/241/243.md +0 -593
  194. package/docs/startup-performance-optimization.md +0 -301
  195. package/docs/认知地图.md +0 -47
@@ -1,490 +1,490 @@
1
- # OpenStreetMap — Nominatim Geocoding + Overpass API
2
-
3
- Two fully public, no-auth APIs. Everything is a direct HTTP call — never need a browser.
4
-
5
- - **Nominatim**: geocoding (place name → lat/lon and reverse). Rate limit: 1 req/s.
6
- - **Overpass API**: spatial query engine over the full OSM dataset. Rate limit: 2 concurrent slots per IP on the public instance.
7
-
8
- **Do not use `http_get` without overriding `User-Agent`** — its default `Mozilla/5.0` is blocked by both APIs with HTTP 403. Pass `headers={"User-Agent": "browser-harness/1.0"}` on every call.
9
-
10
- ---
11
-
12
- ## Fastest path: forward geocode a place
13
-
14
- ```python
15
- import json, urllib.parse
16
- from helpers import http_get
17
-
18
- UA = {"User-Agent": "browser-harness/1.0"}
19
-
20
- def geocode(query: str, limit: int = 3) -> list[dict]:
21
-     q = urllib.parse.quote(query)
22
-     raw = http_get(
23
-         f"https://nominatim.openstreetmap.org/search?q={q}&format=json&limit={limit}&addressdetails=1",
24
-         headers=UA
25
-     )
26
-     return json.loads(raw)  # [] when nothing found
27
-
28
- results = geocode("Eiffel Tower")
29
- # results[0]['display_name'] == 'Tour Eiffel, 5, Avenue Anatole France, ..., 75007, France'
30
- # results[0]['lat'] == '48.8582599' ← STRING, not float
31
- # results[0]['lon'] == '2.2945006' ← STRING, not float
32
- # results[0]['type'] == 'tower'
33
- # results[0]['class'] == 'man_made'
34
- # results[0]['importance'] == 0.6205937724353116
35
- # results[0]['osm_type'] == 'way'
36
- # results[0]['osm_id'] == 5013364
37
- # results[0]['boundingbox'] == ['48.8574753', '48.8590453', '2.2933119', '2.2956897'] ← all strings
38
- # results[0]['address']['city'] == 'Paris'
39
- # results[0]['address']['postcode'] == '75007'
40
- # results[0]['address']['country'] == 'France'
41
- # results[0]['address']['country_code'] == 'fr'
42
- ```
43
-
44
- ---
45
-
46
- ## Nominatim: all four query modes
47
-
48
- ### 1. Forward geocode (free-text)
49
-
50
- ```python
51
- import json, urllib.parse
52
- from helpers import http_get
53
-
54
- UA = {"User-Agent": "browser-harness/1.0"}
55
-
56
- raw = http_get(
57
- "https://nominatim.openstreetmap.org/search?q=Eiffel+Tower&format=json&limit=3&addressdetails=1",
58
- headers=UA
59
- )
60
- results = json.loads(raw)
61
- # Returns [] when nothing found — no exception
62
-
63
- # Useful optional params:
64
- # &addressdetails=1 → adds 'address' dict to each result (city, postcode, road, etc.)
65
- # &extratags=1 → adds 'extratags' dict (website, wikidata, phone, etc.)
66
- # &namedetails=1 → adds 'namedetails' dict (name:en, name:fr, etc.)
67
- # &countrycodes=fr,de → restrict to countries (comma-separated ISO 3166-1 alpha-2)
68
- # &viewbox=2.2,48.8,2.4,48.9 &bounded=1 → restrict to bounding box (lon_min,lat_min,lon_max,lat_max)
69
- ```
70
-
71
- ### 2. Reverse geocode (lat/lon → address)
72
-
73
- ```python
74
- raw = http_get(
75
- "https://nominatim.openstreetmap.org/reverse?lat=48.8584&lon=2.2945&format=json",
76
- headers=UA
77
- )
78
- result = json.loads(raw)
79
- # result['display_name'] == 'Avenue Gustave Eiffel, Quartier du Gros-Caillou, ..., France'
80
- # result['address']['road'] == 'Avenue Gustave Eiffel'
81
- # result['address']['city'] == 'Paris'
82
- # result['address']['postcode'] == '75007'
83
- # result['address']['country'] == 'France'
84
- # result['address']['country_code'] == 'fr'
85
- # result['address']['state'] == 'Île-de-France'
86
- # result['lat'], result['lon'] → strings (not floats)
87
-
88
- # Optional: &zoom=N (0-18) controls granularity of the returned address
89
- # zoom=3 → country, zoom=10 → city, zoom=18 → street/building (default)
90
- ```
91
-
92
- ### 3. Structured search (field-based)
93
-
94
- ```python
95
- raw = http_get(
96
- "https://nominatim.openstreetmap.org/search?city=Paris&country=France&format=json&limit=1",
97
- headers=UA
98
- )
99
- result = json.loads(raw)[0]
100
- # result['name'] == 'Paris'
101
- # result['lat'] == '48.8534951'
102
- # result['lon'] == '2.3483915'
103
- # result['type'] == 'administrative'
104
- # result['place_rank'] == 12 (lower = broader: 4=country, 8=state, 12=city, 30=POI)
105
- # result['addresstype'] == 'city'
106
- # result['boundingbox'] == ['48.8155755', '48.9021560', '2.2241220', '2.4697602']
107
-
108
- # Supported structured params: street, city, county, state, country, postalcode
109
- ```
110
-
111
- ### 4. Lookup by OSM ID
112
-
113
- ```python
114
- # Prefix: N=node, W=way, R=relation
115
- raw = http_get(
116
- "https://nominatim.openstreetmap.org/lookup?osm_ids=W5013364&format=json",
117
- headers=UA
118
- )
119
- result = json.loads(raw)
120
- # Returns list. Eiffel Tower way: result[0]['name'] == 'Tour Eiffel'
121
- # Supports up to 50 IDs: osm_ids=W5013364,N123456,R789
122
- ```
123
-
124
- ---
125
-
126
- ## Nominatim response field reference
127
-
128
- | Field | Type | Notes |
129
- |-------|------|-------|
130
- | `place_id` | int | Internal Nominatim ID — do not cache long-term, can change |
131
- | `osm_type` | str | `"node"`, `"way"`, or `"relation"` |
132
- | `osm_id` | int | The OSM element ID |
133
- | `lat` | **str** | Latitude as string — convert with `float(r['lat'])` |
134
- | `lon` | **str** | Longitude as string — convert with `float(r['lon'])` |
135
- | `display_name` | str | Full human-readable address string |
136
- | `name` | str | Short name of the place |
137
- | `type` | str | OSM type tag value: `"tower"`, `"administrative"`, `"restaurant"`, etc. |
138
- | `class` | str | OSM key: `"man_made"`, `"boundary"`, `"amenity"`, `"highway"`, etc. |
139
- | `addresstype` | str | Semantic category: `"city"`, `"road"`, `"man_made"`, etc. |
140
- | `place_rank` | int | Hierarchy rank: 4=country, 8=state, 12=city, 16=suburb, 30=POI |
141
- | `importance` | float | 0–1 relevance score (higher = more notable) |
142
- | `boundingbox` | list[str] | `[south_lat, north_lat, west_lon, east_lon]` — all strings, note unusual order |
143
- | `licence` | str | ODbL attribution string — include in user-facing output |
144
- | `address` | dict | Only present with `&addressdetails=1` or in reverse results |
145
-
146
- `address` dict common keys: `road`, `house_number`, `quarter`, `suburb`, `city_district`, `city`, `state`, `postcode`, `country`, `country_code`, `ISO3166-2-lvl4/6`.
147
-
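The field table above notes that `lat`, `lon`, and `boundingbox` arrive as strings, and that the bounding box uses the order `[south, north, west, east]`. A minimal sketch of a conversion step follows. The helper name `normalize_place` and the output keys are illustrative assumptions, not part of Nominatim or the harness.

```python
def normalize_place(result: dict) -> dict:
    """Convert Nominatim's string coordinates to floats and reorder the bbox.

    `normalize_place` and its output keys are hypothetical names for this sketch.
    """
    # Nominatim order: [south_lat, north_lat, west_lon, east_lon], all strings
    south, north, west, east = (float(v) for v in result["boundingbox"])
    return {
        "name": result.get("name") or result.get("display_name", ""),
        "lat": float(result["lat"]),
        "lon": float(result["lon"]),
        # Overpass bbox filters expect (south, west, north, east)
        "overpass_bbox": (south, west, north, east),
    }


# Sample values copied from the Eiffel Tower result shown earlier
sample = {
    "display_name": "Tour Eiffel, ...",
    "lat": "48.8582599",
    "lon": "2.2945006",
    "boundingbox": ["48.8574753", "48.8590453", "2.2933119", "2.2956897"],
}
place = normalize_place(sample)
```

Reordering the bounding box once, at the point of parsing, avoids mixing the two conventions later when a geocoded area is passed to an Overpass query.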
148
- ---
149
-
150
- ## Overpass API: query OSM data by tags
151
-
152
- Overpass is a read-only query engine over the full OSM planet. It supports finding POIs by tag, radius, bounding box, and combinations.
153
-
154
- **Endpoint**: `https://overpass-api.de/api/interpreter`
155
- **Backup instances** (use when main is overloaded, which happens often):
156
- - `https://overpass.openstreetmap.fr/api/interpreter` — requires non-Mozilla User-Agent
157
-
158
- **http_get works for GET requests** — pass `headers={"User-Agent": "browser-harness/1.0"}`. For POST, use `urllib` directly (see example below).
159
-
160
- ### GET query (simplest for http_get)
161
-
162
- ```python
163
- import json, urllib.parse
164
- from helpers import http_get
165
-
166
- UA = {"User-Agent": "browser-harness/1.0"}
167
- OVERPASS = "https://overpass.openstreetmap.fr/api/interpreter"
168
-
169
- def overpass_get(query: str) -> dict:
170
-     url = f"{OVERPASS}?data={urllib.parse.quote(query)}"
171
-     raw = http_get(url, headers=UA)
172
-     if not raw.startswith("{"):
173
-         raise RuntimeError(f"Overpass error (HTML returned): {raw[:200]}")
174
-     return json.loads(raw)
175
-
176
- # Find cafes in central Paris (bbox: south_lat, west_lon, north_lat, east_lon)
177
- r = overpass_get('[out:json][timeout:25];node["amenity"="cafe"](48.855,2.295,48.862,2.308);out 10;')
178
- # r['version'] == 0.6
179
- # r['generator'] == 'Overpass API 0.7.62.7 375dc00a'
180
- # r['elements'] → list of matching OSM elements
181
-
182
- for cafe in r['elements']:
183
-     print(cafe['tags'].get('name'), cafe['lat'], cafe['lon'])
184
- # 'Café de l\'Alma' 48.8609068 2.3015143
185
- # 'Le Campanella' 48.8585847 2.3032822
186
- # 'Kozy Bosquet' 48.855445 2.3054013
187
-
188
- # Find restaurants within 500m radius of a point (around filter)
189
- r = overpass_get(
190
- '[out:json][timeout:25];node["amenity"="restaurant"](around:500,37.7749,-122.4194);out 10;'
191
- )
192
- for rest in r['elements']:
193
-     print(rest['tags'].get('name'), rest['tags'].get('cuisine',''))
194
- # 'Nepalese Indian Cusine' 'indian;nepali'
195
- # 'Local Diner' 'coffee_shop;italian;burger;seafood'
196
- # 'Moya Cafe' ''
197
- ```
198
-
199
- ### POST query (for complex QL, avoids URL length limits)
200
-
201
- ```python
202
- import json, urllib.parse, urllib.request, gzip
203
- from helpers import http_get # http_get is GET-only; use urllib for POST
204
-
205
- OVERPASS = "https://overpass.openstreetmap.fr/api/interpreter"
206
-
207
- def overpass_post(query: str) -> dict:
208
-     """POST to Overpass: no URL length limits, preferred for multi-statement QL."""
209
-     data = urllib.parse.urlencode({"data": query}).encode()
210
-     req = urllib.request.Request(
211
-         OVERPASS, data=data, method="POST",
212
-         headers={
213
-             "User-Agent": "browser-harness/1.0",
214
-             "Content-Type": "application/x-www-form-urlencoded",
215
-             "Accept-Encoding": "gzip",
216
-         }
217
-     )
218
-     with urllib.request.urlopen(req, timeout=30) as r:
219
-         body = r.read()
220
-         if r.headers.get("Content-Encoding") == "gzip":
221
-             body = gzip.decompress(body)
222
-     body = body.decode()
223
-     if not body.startswith("{"):
224
-         raise RuntimeError(f"Overpass error (HTML): {body[:300]}")
225
-     return json.loads(body)
226
-
227
- # Example: cafes in Paris bbox
228
- r = overpass_post('[out:json][timeout:25];node["amenity"="cafe"](48.855,2.295,48.862,2.308);out 5;')
229
- print(len(r['elements'])) # 5 (or up to 5)
230
- ```
231
-
232
- ### Overpass element structure
233
-
234
- Every element in `r['elements']` is a dict with at minimum:
235
-
236
- ```python
237
- {
238
- "type": "node", # "node", "way", or "relation"
239
- "id": 308684349, # int — OSM element ID (stable, use for dedup)
240
- "lat": 48.8609068, # float — ONLY present for node type
241
- "lon": 2.3015143, # float — ONLY present for node type
242
- "tags": { # dict — all OSM tags on this element
243
- "amenity": "cafe",
244
- "name": "Café de l'Alma",
245
- "name:fr": "Café de l'Alma",
246
- "outdoor_seating": "yes",
247
- "payment:credit_cards": "yes",
248
- "phone": "+33 1 45 51 56 74",
249
- "opening_hours": "Mo-Sa 08:00-23:00; Su 09:00-19:00", # optional
250
- "website": "https://...", # optional
251
- "wheelchair": "yes" # optional
252
- }
253
- }
254
- ```
255
-
256
- For `way` elements, use `out center;` to add a `center` dict with lat/lon alongside the list of node IDs:
257
-
258
- ```python
259
- # way element with out center:
260
- {
261
- "type": "way",
262
- "id": 338411946,
263
- "center": {"lat": 48.8660087, "lon": 2.3153233}, # center of the way's bounding box (not a true centroid)
264
- "nodes": [3454913623, 3454913707, ...], # node IDs forming the boundary
265
- "tags": {"amenity": "cafe", "name": "Café 1902", ...}
266
- }
267
-
268
- # Query to get both nodes and ways with lat/lon:
269
- query = '[out:json][timeout:25];(node["amenity"="cafe"](48.85,2.29,48.87,2.32);way["amenity"="cafe"](48.85,2.29,48.87,2.32););out center 20;'
270
- r = overpass_get(query)
271
- for el in r['elements']:
272
-     if el['type'] == 'node':
273
-         lat, lon = el['lat'], el['lon']
274
-     else:  # way
275
-         lat, lon = el['center']['lat'], el['center']['lon']
276
-     print(el['tags'].get('name'), lat, lon)
277
- ```
278
-
279
- ### Overpass QL quick reference
280
-
281
- ```
282
- [out:json][timeout:25] # Required header: JSON output, 25s timeout
283
- [maxsize:52428800] # Optional: 50MB max result size (default is server limit)
284
-
285
- node["amenity"="cafe"](south,west,north,east);out N;
286
- # ↑ bbox order: south_lat, west_lon, north_lat, east_lon
287
- # Note: DIFFERENT from Nominatim's boundingbox field which is [south,north,west,east]
288
-
289
- node["amenity"="cafe"](around:RADIUS_METERS,LAT,LON);out N;
290
-
291
- node["amenity"~"cafe|restaurant"](bbox);out N; # regex match on tag value
292
- node[!"name"](bbox);out N; # elements WITHOUT the 'name' tag
293
- node["name"~"Star",i](bbox);out N; # case-insensitive regex
294
-
295
- # Union of types:
296
- (node["amenity"="cafe"](bbox); way["amenity"="cafe"](bbox););out center N;
297
-
298
- # Multiple tags (AND logic):
299
- node["amenity"="cafe"]["outdoor_seating"="yes"](bbox);out N;
300
- ```
301
-
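The quick reference above can be turned into a small query builder so that the bbox order and the union syntax are written in one place. This is a sketch under stated assumptions: `build_bbox_query` is a hypothetical helper, it only supports exact-match tag filters, and it does not escape quotes inside tag keys or values.

```python
def build_bbox_query(tags, bbox, element_types=("node",), limit=10, timeout=25):
    """Build an Overpass QL string for exact-match tags inside a bounding box.

    bbox must be in Overpass order: (south, west, north, east).
    Hypothetical helper for illustration; tag values are not escaped.
    """
    south, west, north, east = bbox
    # Multiple tag filters on one statement combine with AND logic
    filters = "".join(f'["{key}"="{value}"]' for key, value in tags.items())
    area = f"({south},{west},{north},{east})"
    statements = "".join(f"{etype}{filters}{area};" for etype in element_types)
    if len(element_types) > 1:
        # Union of several element types: ( stmt; stmt; );
        statements = f"({statements});"
    # 'out center' adds a center point to ways; nodes keep their own lat/lon
    return f"[out:json][timeout:{timeout}];{statements}out center {limit};"


single = build_bbox_query({"amenity": "cafe"}, (48.855, 2.295, 48.862, 2.308))
union = build_bbox_query(
    {"amenity": "cafe"}, (48.85, 2.29, 48.87, 2.32),
    element_types=("node", "way"), limit=20,
)
```

The `union` string matches the node-plus-way query shown in the element structure section, so it can be passed unchanged to either the GET or the POST helper.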
302
- ---
303
-
304
- ## OSM tile server (reference only, no scraping)
305
-
306
- ```
307
- https://{a,b,c}.tile.openstreetmap.org/{z}/{x}/{y}.png
308
- ```
309
-
310
- - Subdomains `a`, `b`, `c` for load balancing
311
- - `z` = zoom level 0–19, `x`/`y` = tile coordinates
312
- - Returns 256×256 PNG tiles
313
- - Policy: max 2 req/s per IP, non-commercial use, must display OSM attribution
314
- - Tile coordinate calculator: `https://wiki.openstreetmap.org/wiki/Slippy_map_tilenames`
315
- - Bulk tile downloading is prohibited — use Overpass or data extracts instead
316
-
317
- ```python
318
- # Convert lat/lon to tile coordinates
319
- import math
320
-
321
- def lat_lon_to_tile(lat, lon, zoom):
322
-     n = 2 ** zoom
323
-     x = int((lon + 180) / 360 * n)
324
-     y = int((1 - math.log(math.tan(math.radians(lat)) + 1 / math.cos(math.radians(lat))) / math.pi) / 2 * n)
325
-     return x, y
326
-
327
- x, y = lat_lon_to_tile(48.8582, 2.2945, 14)
328
- url = f"https://a.tile.openstreetmap.org/14/{x}/{y}.png"
329
- # url == 'https://a.tile.openstreetmap.org/14/8296/5636.png'
330
- ```
331
-
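The inverse of the conversion above is sometimes useful, for example to find which area a tile covers. The sketch below uses the standard slippy-map formula from the OSM wiki page linked earlier; `tile_to_lat_lon` is a name chosen for this example and returns the tile's north-west corner.

```python
import math


def tile_to_lat_lon(x: int, y: int, zoom: int) -> tuple[float, float]:
    """Return (lat, lon) of the north-west corner of tile x/y at the given zoom."""
    n = 2 ** zoom
    lon = x / n * 360.0 - 180.0
    # Inverse Web Mercator: latitude from the normalized tile row
    lat = math.degrees(math.atan(math.sinh(math.pi * (1 - 2 * y / n))))
    return lat, lon


top_left = tile_to_lat_lon(0, 0, 0)   # corner of the single zoom-0 tile
origin = tile_to_lat_lon(1, 1, 1)     # tile corner at the equator / prime meridian
```

Calling it with `(x + 1, y + 1)` gives the opposite corner, so the pair of results bounds the tile.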
332
- ---
333
-
334
- ## Rate limits
335
-
336
- | API | Limit | Enforcement | 429 behavior |
337
- |-----|-------|-------------|--------------|
338
- | Nominatim | 1 req/s | Soft — rapid requests work but you get delayed/dropped | Returns HTTP 403 if your IP is banned (not 429) |
339
- | Overpass (main) | 2 concurrent slots per IP | Hard — 3rd concurrent req returns HTML error immediately | HTML error page with `rate_limited` in body |
340
- | Overpass (main) | Also: query complexity quota | Resets over time (~per hour) | HTML error page with `rate_limited` |
341
- | Tile server | 2 req/s per IP | Soft/hard | IP block |
342
-
343
- **Check your Overpass quota**:
344
- ```python
345
- raw = http_get("https://overpass-api.de/api/status", headers={"User-Agent": "browser-harness/1.0"})
346
- print(raw)
347
- # Connected as: 1728118854
348
- # Rate limit: 2
349
- # 2 slots available now.
350
- # Slot available after: 2026-04-18T11:00:00Z, in 30 seconds.
351
- ```
352
-
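The status text can be parsed before sending a heavy query. The sketch below assumes the response keeps the plain-text shape shown in the sample output above; the format is not guaranteed, and `parse_overpass_status` and its keys are hypothetical names.

```python
import re


def parse_overpass_status(text: str) -> dict:
    """Extract slot information from the /api/status text (assumed format)."""
    rate = re.search(r"Rate limit:\s*(\d+)", text)
    free = re.search(r"(\d+) slots? available now", text)
    # "Slot available after: <timestamp>, in N seconds." may appear several times
    waits = [int(s) for s in re.findall(r"in (\d+) seconds", text)]
    return {
        "rate_limit": int(rate.group(1)) if rate else None,
        "slots_available": int(free.group(1)) if free else 0,
        "next_slot_in_seconds": min(waits) if waits else None,
    }


sample_status = (
    "Connected as: 1728118854\n"
    "Rate limit: 2\n"
    "2 slots available now.\n"
    "Slot available after: 2026-04-18T11:00:00Z, in 30 seconds.\n"
)
status = parse_overpass_status(sample_status)
```

A caller could wait `next_slot_in_seconds` when `slots_available` is 0 instead of sending a request that would be rejected.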
353
- **Handle rate limiting in production**:
354
- ```python
355
- import json, time, urllib.parse  # http_get is imported from helpers, as in the earlier examples
356
-
357
- def overpass_get_with_retry(query: str, max_retries: int = 3) -> dict:
358
-     for attempt in range(max_retries):
359
-         url = f"https://overpass.openstreetmap.fr/api/interpreter?data={urllib.parse.quote(query)}"
360
-         raw = http_get(url, headers={"User-Agent": "browser-harness/1.0"})
361
-         if raw.startswith("{"):
362
-             return json.loads(raw)
363
-         if "rate_limited" in raw or "too busy" in raw:
364
-             wait = 2 ** attempt * 10  # 10s, 20s, 40s
365
-             time.sleep(wait)
366
-             continue
367
-         raise RuntimeError(f"Overpass error: {raw[:200]}")
368
-     raise RuntimeError("Overpass: too many retries")
369
- ```
370
-
371
- ---
372
-
373
- ## Complete working example
374
-
375
- ```python
376
- import json, time, urllib.parse, urllib.request, gzip
377
- from helpers import http_get
378
-
379
- UA = {"User-Agent": "browser-harness/1.0"}
380
- NOMINATIM = "https://nominatim.openstreetmap.org"
381
- OVERPASS = "https://overpass.openstreetmap.fr/api/interpreter"
382
-
383
- def geocode(query: str, limit: int = 1) -> list[dict]:
384
- """Forward geocode — returns [] if nothing found."""
385
- q = urllib.parse.quote(query)
386
- raw = http_get(f"{NOMINATIM}/search?q={q}&format=json&limit={limit}&addressdetails=1", headers=UA)
387
- return json.loads(raw)
388
-
389
- def reverse_geocode(lat: float, lon: float) -> dict:
390
- """Reverse geocode — always returns a result (nearest road/place)."""
391
- raw = http_get(f"{NOMINATIM}/reverse?lat={lat}&lon={lon}&format=json", headers=UA)
392
- return json.loads(raw)
393
-
394
- def overpass_get(query: str) -> list[dict]:
395
- """Run an Overpass QL query, return elements list."""
396
- url = f"{OVERPASS}?data={urllib.parse.quote(query)}"
397
- raw = http_get(url, headers=UA)
398
- if not raw.startswith("{"):
399
- raise RuntimeError(f"Overpass error: {raw[:200]}")
400
- return json.loads(raw)["elements"]
401
-
402
- def overpass_post(query: str) -> list[dict]:
403
- """POST variant — avoids URL length limits for complex queries."""
404
- data = urllib.parse.urlencode({"data": query}).encode()
405
- req = urllib.request.Request(
406
- OVERPASS, data=data, method="POST",
407
- headers={"User-Agent": "browser-harness/1.0",
408
- "Content-Type": "application/x-www-form-urlencoded",
409
- "Accept-Encoding": "gzip"}
410
- )
411
- with urllib.request.urlopen(req, timeout=30) as r:
412
- body = r.read()
413
- if r.headers.get("Content-Encoding") == "gzip":
414
- body = gzip.decompress(body)
415
- body = body.decode()
416
- if not body.startswith("{"):
417
- raise RuntimeError(f"Overpass error: {body[:300]}")
418
- return json.loads(body)["elements"]
419
-
420
- # --- Usage examples (validated 2026-04-18) ---
421
-
422
- # 1. Geocode a landmark
423
- places = geocode("Eiffel Tower", limit=3)
424
- # places[0]['lat'] == '48.8582599' (string)
425
- # places[0]['lon'] == '2.2945006' (string)
426
- # places[0]['display_name'] == 'Tour Eiffel, 5, Avenue Anatole France, ..., 75007, France'
427
- # places[0]['address']['city'] == 'Paris'
428
- lat = float(places[0]['lat'])
429
- lon = float(places[0]['lon'])
430
-
431
- # 2. Reverse geocode the coordinates
432
- addr = reverse_geocode(lat, lon)
433
- # addr['address']['road'] == 'Avenue Gustave Eiffel'
434
- # addr['address']['city'] == 'Paris'
435
- # addr['address']['postcode']== '75007'
436
- # addr['address']['country'] == 'France'
437
-
438
- # 3. Find nearby cafes (wait 1s between nominatim and overpass if same script)
439
- time.sleep(1)
440
- cafes = overpass_get(
441
- f"[out:json][timeout:25];node[\"amenity\"=\"cafe\"](around:500,{lat},{lon});out 10;"
442
- )
443
- for cafe in cafes:
444
- print(f"{cafe['tags'].get('name','?'):30s} {cafe['lat']:.4f}, {cafe['lon']:.4f}")
445
- # Café de l'Alma 48.8609, 2.3015
446
- # Le Campanella 48.8586, 2.3033
447
-
448
- # 4. Structured city lookup + find restaurants in bounding box
449
- time.sleep(1)
450
- paris = geocode("Paris, France")[0]
451
- bb = paris['boundingbox'] # [south_lat, north_lat, west_lon, east_lon] ← Nominatim order!
452
- # For Overpass: need (south_lat, west_lon, north_lat, east_lon) ← DIFFERENT order
453
- south, north, west, east = bb[0], bb[1], bb[2], bb[3]
454
- # Restrict to center slice to avoid massive result set
455
- center_bbox = f"48.855,2.295,48.865,2.315"
456
- rests = overpass_post(
457
- f"[out:json][timeout:25];node[\"amenity\"=\"restaurant\"]({center_bbox});out 5;"
458
- )
459
- print(f"Found {len(rests)} restaurants near Paris center")
460
- ```
461
-
462
- ---
463
-
464
- ## Gotchas
465
-
466
- **`http_get` default UA (`Mozilla/5.0`) is blocked by both APIs.** Always pass `headers={"User-Agent": "browser-harness/1.0"}`. The `headers` kwarg in `http_get` does a `.update()` so it properly overrides the default. Confirmed: Mozilla/5.0 → 403 on Nominatim; `browser-harness/1.0` → 200.
467
-
468
- **Blocked User-Agent patterns on Nominatim**: `Mozilla/5.0`, `python-requests/*`, `Wget/*`. Accepted: any non-generic app-style UA like `browser-harness/1.0`, `MyApp/2.0`, `curl/7.x`. Nominatim policy requires a descriptive UA with contact info, but in practice any non-library string passes.
469
-
470
- **Nominatim lat/lon are strings, Overpass lat/lon are floats.** Always convert Nominatim coordinates: `float(result['lat'])`. Overpass element `lat`/`lon` are native Python floats — no conversion needed.
471
-
472
- **Nominatim `boundingbox` field order is `[south_lat, north_lat, west_lon, east_lon]` — NOT `[south, west, north, east]`.** Overpass bbox uses `(south_lat, west_lon, north_lat, east_lon)`. When feeding a Nominatim bounding box into Overpass, you must reorder: `f"({bb[0]},{bb[2]},{bb[1]},{bb[3]})"`.
473
-
474
- **`overpass-api.de` main instance is frequently overloaded.** Returns HTTP 504 (timeout) or an HTML error page with `rate_limited` when busy. The FR mirror (`overpass.openstreetmap.fr`) is usually more responsive but also blocks `Mozilla/5.0`. Always detect non-JSON responses: `if not raw.startswith("{")`.
475
-
476
- **Overpass error responses are HTML, not JSON.** The API returns HTTP 200 with an HTML error page when rate-limited or when the server is too busy. Always check `raw.startswith("{")` before parsing.
477
-
478
- **Overpass rate limit: 2 concurrent slots, NOT 2 requests/s.** You can run 2 queries simultaneously. A 3rd concurrent query immediately returns an error. Sequential queries with no sleep between them work fine as long as each completes before the next starts.
479
-
480
- **`out N;` limits results to N elements — use it.** Without a limit, large bounding boxes can return thousands of elements and hit the 512MB memory limit, returning a `maxsize` error. Default safe limit: `out 50;` for exploration, `out 500;` for bulk collection.
481
-
482
- **Overpass QL bbox order is `(south, west, north, east)` — latitude FIRST.** This is the opposite of the standard GeoJSON convention `[west, south, east, north]`. The `around:` filter uses `(around:METERS,LAT,LON)` — note lat before lon.
483
-
484
- **`name` tag in Overpass is the local-language name.** For Paris cafes this is French. English names may appear under `name:en` but are often absent. Never assume `name` is in English.
485
-
486
- **Nominatim `/reverse` always returns the nearest result** — it never returns an empty response (unlike `/search`). If the coordinates are in the ocean, it still returns the nearest coastline or country.
487
-
488
- **`place_id` is internal and ephemeral** — do not store it for long-term use. Use `osm_type` + `osm_id` for stable references (e.g., `way/5013364` for the Eiffel Tower).
489
-
490
- **Overpass `http_get` POST workaround**: `http_get` only supports GET. For POST requests (needed to avoid URL length limits for complex multi-statement QL), use `urllib.request.Request` directly as shown in the `overpass_post()` example above.
1
+ # OpenStreetMap — Nominatim Geocoding + Overpass API
2
+
3
+ Two fully public, no-auth APIs. Everything is a direct HTTP call, so a browser is never needed.
4
+
5
+ - **Nominatim**: geocoding (place name → lat/lon and reverse). Rate limit: 1 req/s.
6
+ - **Overpass API**: spatial query engine over the full OSM dataset. Rate limit: 2 concurrent slots per IP on the public instance.
7
+
8
+ **Do not use `http_get` without overriding `User-Agent`** — its default `Mozilla/5.0` is blocked by both APIs with HTTP 403. Pass `headers={"User-Agent": "browser-harness/1.0"}` on every call.
9
+
10
+ ---
11
+
12
+ ## Fastest path: forward geocode a place
13
+
14
+ ```python
15
+ import json, urllib.parse
16
+ from helpers import http_get
17
+
18
+ UA = {"User-Agent": "browser-harness/1.0"}
19
+
20
+ def geocode(query: str, limit: int = 3) -> list[dict]:
21
+ q = urllib.parse.quote(query)
22
+ raw = http_get(
23
+ f"https://nominatim.openstreetmap.org/search?q={q}&format=json&limit={limit}&addressdetails=1",
24
+ headers=UA
25
+ )
26
+ return json.loads(raw) # [] when nothing found
27
+
28
+ results = geocode("Eiffel Tower")
29
+ # results[0]['display_name'] == 'Tour Eiffel, 5, Avenue Anatole France, ..., 75007, France'
30
+ # results[0]['lat'] == '48.8582599' ← STRING, not float
31
+ # results[0]['lon'] == '2.2945006' ← STRING, not float
32
+ # results[0]['type'] == 'tower'
33
+ # results[0]['class'] == 'man_made'
34
+ # results[0]['importance'] == 0.6205937724353116
35
+ # results[0]['osm_type'] == 'way'
36
+ # results[0]['osm_id'] == 5013364
37
+ # results[0]['boundingbox'] == ['48.8574753', '48.8590453', '2.2933119', '2.2956897'] ← all strings
38
+ # results[0]['address']['city'] == 'Paris'
39
+ # results[0]['address']['postcode'] == '75007'
40
+ # results[0]['address']['country'] == 'France'
41
+ # results[0]['address']['country_code'] == 'fr'
42
+ ```
43
+
44
+ ---
45
+
46
+ ## Nominatim: all four query modes
47
+
48
+ ### 1. Forward geocode (free-text)
49
+
50
+ ```python
51
+ import json, urllib.parse
52
+ from helpers import http_get
53
+
54
+ UA = {"User-Agent": "browser-harness/1.0"}
55
+
56
+ raw = http_get(
57
+ "https://nominatim.openstreetmap.org/search?q=Eiffel+Tower&format=json&limit=3&addressdetails=1",
58
+ headers=UA
59
+ )
60
+ results = json.loads(raw)
61
+ # Returns [] when nothing found — no exception
62
+
63
+ # Useful optional params:
64
+ # &addressdetails=1 → adds 'address' dict to each result (city, postcode, road, etc.)
65
+ # &extratags=1 → adds 'extratags' dict (website, wikidata, phone, etc.)
66
+ # &namedetails=1 → adds 'namedetails' dict (name:en, name:fr, etc.)
67
+ # &countrycodes=fr,de → restrict to countries (comma-separated ISO 3166-1 alpha-2)
68
+ # &viewbox=2.2,48.8,2.4,48.9 &bounded=1 → restrict to bounding box (lon_min,lat_min,lon_max,lat_max)
69
+ ```
70
+
71
+ ### 2. Reverse geocode (lat/lon → address)
72
+
73
+ ```python
74
+ raw = http_get(
75
+ "https://nominatim.openstreetmap.org/reverse?lat=48.8584&lon=2.2945&format=json",
76
+ headers=UA
77
+ )
78
+ result = json.loads(raw)
79
+ # result['display_name'] == 'Avenue Gustave Eiffel, Quartier du Gros-Caillou, ..., France'
80
+ # result['address']['road'] == 'Avenue Gustave Eiffel'
81
+ # result['address']['city'] == 'Paris'
82
+ # result['address']['postcode'] == '75007'
83
+ # result['address']['country'] == 'France'
84
+ # result['address']['country_code'] == 'fr'
85
+ # result['address']['state'] == 'Île-de-France'
86
+ # result['lat'], result['lon'] → strings (not floats)
87
+
88
+ # Optional: &zoom=N (0-18) controls granularity of the returned address
89
+ # zoom=3 → country, zoom=10 → city, zoom=18 → street/building (default)
90
+ ```
91
+
92
+ ### 3. Structured search (field-based)
93
+
94
+ ```python
95
+ raw = http_get(
96
+ "https://nominatim.openstreetmap.org/search?city=Paris&country=France&format=json&limit=1",
97
+ headers=UA
98
+ )
99
+ result = json.loads(raw)[0]
100
+ # result['name'] == 'Paris'
101
+ # result['lat'] == '48.8534951'
102
+ # result['lon'] == '2.3483915'
103
+ # result['type'] == 'administrative'
104
+ # result['place_rank'] == 12 (lower = broader: 4=country, 8=state, 12=city, 30=POI)
105
+ # result['addresstype'] == 'city'
106
+ # result['boundingbox'] == ['48.8155755', '48.9021560', '2.2241220', '2.4697602']
107
+
108
+ # Supported structured params: street, city, county, state, country, postalcode
109
+ ```
110
+
111
+ ### 4. Lookup by OSM ID
112
+
113
+ ```python
114
+ # Prefix: N=node, W=way, R=relation
115
+ raw = http_get(
116
+ "https://nominatim.openstreetmap.org/lookup?osm_ids=W5013364&format=json",
117
+ headers=UA
118
+ )
119
+ result = json.loads(raw)
120
+ # Returns list. Eiffel Tower way: result[0]['name'] == 'Tour Eiffel'
121
+ # Supports up to 50 IDs: osm_ids=W5013364,N123456,R789
122
+ ```
123
+
124
+ ---
125
+
126
+ ## Nominatim response field reference
127
+
128
+ | Field | Type | Notes |
129
+ |-------|------|-------|
130
+ | `place_id` | int | Internal Nominatim ID — do not cache long-term, can change |
131
+ | `osm_type` | str | `"node"`, `"way"`, or `"relation"` |
132
+ | `osm_id` | int | The OSM element ID |
133
+ | `lat` | **str** | Latitude as string — convert with `float(r['lat'])` |
134
+ | `lon` | **str** | Longitude as string — convert with `float(r['lon'])` |
135
+ | `display_name` | str | Full human-readable address string |
136
+ | `name` | str | Short name of the place |
137
+ | `type` | str | OSM type tag value: `"tower"`, `"administrative"`, `"restaurant"`, etc. |
138
+ | `class` | str | OSM key: `"man_made"`, `"boundary"`, `"amenity"`, `"highway"`, etc. |
139
+ | `addresstype` | str | Semantic category: `"city"`, `"road"`, `"man_made"`, etc. |
140
+ | `place_rank` | int | Hierarchy rank: 4=country, 8=state, 12=city, 16=suburb, 30=POI |
141
+ | `importance` | float | 0–1 relevance score (higher = more notable) |
142
+ | `boundingbox` | list[str] | `[south_lat, north_lat, west_lon, east_lon]` — all strings, note unusual order |
143
+ | `licence` | str | ODbL attribution string — include in user-facing output |
144
+ | `address` | dict | Only present with `&addressdetails=1` or in reverse results |
145
+
146
+ `address` dict common keys: `road`, `house_number`, `quarter`, `suburb`, `city_district`, `city`, `state`, `postcode`, `country`, `country_code`, `ISO3166-2-lvl4/6`.
147
+
148
+ ---
149
+
150
+ ## Overpass API: query OSM data by tags
151
+
152
+ Overpass is a read-only query engine over the full OSM planet. It supports finding POIs by tag, radius, bounding box, and combinations.
153
+
154
+ **Endpoint**: `https://overpass-api.de/api/interpreter`
155
+ **Backup instances** (use when main is overloaded, which happens often):
156
+ - `https://overpass.openstreetmap.fr/api/interpreter` — requires non-Mozilla User-Agent
157
+
158
+ **http_get works for GET requests** — pass `headers={"User-Agent": "browser-harness/1.0"}`. For POST, use `urllib` directly (see example below).
159
+
160
+ ### GET query (simplest for http_get)
161
+
162
+ ```python
163
+ import json, urllib.parse
164
+ from helpers import http_get
165
+
166
+ UA = {"User-Agent": "browser-harness/1.0"}
167
+ OVERPASS = "https://overpass.openstreetmap.fr/api/interpreter"
168
+
169
+ def overpass_get(query: str) -> dict:
170
+ url = f"{OVERPASS}?data={urllib.parse.quote(query)}"
171
+ raw = http_get(url, headers=UA)
172
+ if not raw.startswith("{"):
173
+ raise RuntimeError(f"Overpass error (HTML returned): {raw[:200]}")
174
+ return json.loads(raw)
175
+
176
+ # Find cafes in central Paris (bbox: south_lat, west_lon, north_lat, east_lon)
177
+ r = overpass_get('[out:json][timeout:25];node["amenity"="cafe"](48.855,2.295,48.862,2.308);out 10;')
178
+ # r['version'] == 0.6
179
+ # r['generator'] == 'Overpass API 0.7.62.7 375dc00a'
180
+ # r['elements'] → list of matching OSM elements
181
+
182
+ for cafe in r['elements']:
183
+ print(cafe['tags'].get('name'), cafe['lat'], cafe['lon'])
184
+ # 'Café de l\'Alma' 48.8609068 2.3015143
185
+ # 'Le Campanella' 48.8585847 2.3032822
186
+ # 'Kozy Bosquet' 48.855445 2.3054013
187
+
188
+ # Find restaurants within 500m radius of a point (around filter)
189
+ r = overpass_get(
190
+ '[out:json][timeout:25];node["amenity"="restaurant"](around:500,37.7749,-122.4194);out 10;'
191
+ )
192
+ for rest in r['elements']:
193
+ print(rest['tags'].get('name'), rest['tags'].get('cuisine',''))
194
+ # 'Nepalese Indian Cusine' 'indian;nepali'
195
+ # 'Local Diner' 'coffee_shop;italian;burger;seafood'
196
+ # 'Moya Cafe' ''
197
+ ```
198
+
199
+ ### POST query (for complex QL, avoids URL length limits)
200
+
201
+ ```python
202
+ import json, urllib.parse, urllib.request, gzip
203
+ from helpers import http_get # http_get is GET-only; use urllib for POST
204
+
205
+ OVERPASS = "https://overpass.openstreetmap.fr/api/interpreter"
206
+
207
+ def overpass_post(query: str) -> dict:
208
+ """POST to Overpass — no URL length limits, preferred for multi-statement QL."""
209
+ data = urllib.parse.urlencode({"data": query}).encode()
210
+ req = urllib.request.Request(
211
+ OVERPASS, data=data, method="POST",
212
+ headers={
213
+ "User-Agent": "browser-harness/1.0",
214
+ "Content-Type": "application/x-www-form-urlencoded",
215
+ "Accept-Encoding": "gzip",
216
+ }
217
+ )
218
+ with urllib.request.urlopen(req, timeout=30) as r:
219
+ body = r.read()
220
+ if r.headers.get("Content-Encoding") == "gzip":
221
+ body = gzip.decompress(body)
222
+ body = body.decode()
223
+ if not body.startswith("{"):
224
+ raise RuntimeError(f"Overpass error (HTML): {body[:300]}")
225
+ return json.loads(body)
226
+
227
+ # Example: cafes in Paris bbox
228
+ r = overpass_post('[out:json][timeout:25];node["amenity"="cafe"](48.855,2.295,48.862,2.308);out 5;')
229
+ print(len(r['elements'])) # 5 (or up to 5)
230
+ ```
231
+
232
+ ### Overpass element structure
233
+
234
+ Every element in `r['elements']` is a dict with at minimum:
235
+
236
+ ```python
237
+ {
238
+ "type": "node", # "node", "way", or "relation"
239
+ "id": 308684349, # int — OSM element ID (stable, use for dedup)
240
+ "lat": 48.8609068, # float — ONLY present for node type
241
+ "lon": 2.3015143, # float — ONLY present for node type
242
+ "tags": { # dict — all OSM tags on this element
243
+ "amenity": "cafe",
244
+ "name": "Café de l'Alma",
245
+ "name:fr": "Café de l'Alma",
246
+ "outdoor_seating": "yes",
247
+ "payment:credit_cards": "yes",
248
+ "phone": "+33 1 45 51 56 74",
249
+ "opening_hours": "Mo-Sa 08:00-23:00; Su 09:00-19:00", # optional
250
+ "website": "https://...", # optional
251
+ "wheelchair": "yes" # optional
252
+ }
253
+ }
254
+ ```
255
+
256
+ For `way` elements, use `out center;` to get a `center` dict with lat/lon instead of a node list:
257
+
258
+ ```python
259
+ # way element with out center:
260
+ {
261
+ "type": "way",
262
+ "id": 338411946,
263
+ "center": {"lat": 48.8660087, "lon": 2.3153233}, # centroid of the polygon
264
+ "nodes": [3454913623, 3454913707, ...], # node IDs forming the boundary
265
+ "tags": {"amenity": "cafe", "name": "Café 1902", ...}
266
+ }
267
+
268
+ # Query to get both nodes and ways with lat/lon:
269
+ query = '[out:json][timeout:25];(node["amenity"="cafe"](48.85,2.29,48.87,2.32);way["amenity"="cafe"](48.85,2.29,48.87,2.32););out center 20;'
270
+ r = overpass_get(query)
271
+ for el in r['elements']:
272
+ if el['type'] == 'node':
273
+ lat, lon = el['lat'], el['lon']
274
+ else: # way
275
+ lat, lon = el['center']['lat'], el['center']['lon']
276
+ print(el['tags'].get('name'), lat, lon)
277
+ ```
278
+
279
+ ### Overpass QL quick reference
280
+
281
+ ```
282
+ [out:json][timeout:25] # Required header: JSON output, 25s timeout
283
+ [maxsize:52428800] # Optional: 50MB max result size (default is server limit)
284
+
285
+ node["amenity"="cafe"](south,west,north,east);out N;
286
+ # ↑ bbox order: south_lat, west_lon, north_lat, east_lon
287
+ # Note: DIFFERENT from Nominatim's boundingbox field which is [south,north,west,east]
288
+
289
+ node["amenity"="cafe"](around:RADIUS_METERS,LAT,LON);out N;
290
+
291
+ node["amenity"~"cafe|restaurant"](bbox);out N; # regex match on tag value
292
+ node[!"name"](bbox);out N; # elements WITHOUT the 'name' tag
293
+ node["name"~"Star",i](bbox);out N; # case-insensitive regex
294
+
295
+ # Union of types:
296
+ (node["amenity"="cafe"](bbox); way["amenity"="cafe"](bbox););out center N;
297
+
298
+ # Multiple tags (AND logic):
299
+ node["amenity"="cafe"]["outdoor_seating"="yes"](bbox);out N;
300
+ ```
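The quick-reference patterns above can be assembled programmatically. This is a minimal sketch, assuming a hypothetical helper `build_overpass_query` (not part of `helpers`) that only handles exact tag matches and a bounding box; tag values are not escaped, so it is only suitable for trusted input.

```python
def build_overpass_query(tags, bbox, kinds=("node",), limit=50, timeout=25):
    """Hypothetical helper: build an Overpass QL string for exact tag matches.

    bbox is (south_lat, west_lon, north_lat, east_lon), i.e. Overpass order.
    """
    south, west, north, east = bbox
    # Multiple tag filters on one statement combine with AND logic.
    filters = "".join(f'["{key}"="{value}"]' for key, value in tags.items())
    area = f"({south},{west},{north},{east})"
    # One statement per element type, wrapped in a union.
    body = "".join(f"{kind}{filters}{area};" for kind in kinds)
    # Ways and relations carry no lat/lon of their own, so ask for a center.
    out = "out center" if any(kind != "node" for kind in kinds) else "out"
    return f"[out:json][timeout:{timeout}];({body});{out} {limit};"


query = build_overpass_query({"amenity": "cafe"}, (48.855, 2.295, 48.862, 2.308), limit=10)
print(query)
```

Passing `kinds=("node", "way")` switches the output statement to `out center`, matching the union pattern in the reference.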
301
+
302
+ ---
303
+
304
+ ## OSM tile server (reference only, no scraping)
305
+
306
+ ```
307
+ https://{a,b,c}.tile.openstreetmap.org/{z}/{x}/{y}.png
308
+ ```
309
+
310
+ - Subdomains `a`, `b`, `c` for load balancing
311
+ - `z` = zoom level 0–19, `x`/`y` = tile coordinates
312
+ - Returns 256×256 PNG tiles
313
+ - Policy: max 2 req/s per IP, non-commercial use, must display OSM attribution
314
+ - Tile coordinate calculator: `https://wiki.openstreetmap.org/wiki/Slippy_map_tilenames`
315
+ - Bulk tile downloading is prohibited — use Overpass or data extracts instead
316
+
317
+ ```python
318
+ # Convert lat/lon to tile coordinates
319
+ import math
320
+
321
+ def lat_lon_to_tile(lat, lon, zoom):
322
+ n = 2 ** zoom
323
+ x = int((lon + 180) / 360 * n)
324
+ y = int((1 - math.log(math.tan(math.radians(lat)) + 1 / math.cos(math.radians(lat))) / math.pi) / 2 * n)
325
+ return x, y
326
+
327
+ x, y = lat_lon_to_tile(48.8582, 2.2945, 14)
328
+ url = f"https://a.tile.openstreetmap.org/14/{x}/{y}.png"
329
+ # url == 'https://a.tile.openstreetmap.org/14/8296/5636.png'
330
+ ```
331
+
332
+ ---
333
+
334
+ ## Rate limits
335
+
336
+ | API | Limit | Enforcement | Behavior when limited |
337
+ |-----|-------|-------------|--------------|
338
+ | Nominatim | 1 req/s | Soft — rapid requests work but you get delayed/dropped | Returns HTTP 403 if your IP is banned (not 429) |
339
+ | Overpass (main) | 2 concurrent slots per IP | Hard — 3rd concurrent req returns HTML error immediately | HTML error page with `rate_limited` in body |
340
+ | Overpass (main) | Also: query complexity quota | Resets over time (~per hour) | HTML error page with `rate_limited` |
341
+ | Tile server | 2 req/s per IP | Soft/hard | IP block |
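The 1 req/s Nominatim limit in the table above can be respected with a small client-side throttle. This is a minimal sketch, assuming a hypothetical `MinIntervalThrottle` class (not part of `helpers`); the clock and sleep functions are injectable so the logic can be exercised without real waiting.

```python
import time


class MinIntervalThrottle:
    """Hypothetical helper: enforce a minimum gap between successive calls."""

    def __init__(self, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call; None before the first

    def wait(self):
        """Sleep just long enough that calls are at least min_interval apart."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


nominatim_throttle = MinIntervalThrottle(1.0)
```

Calling `nominatim_throttle.wait()` immediately before each Nominatim request keeps sequential calls at or below one per second.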
342
+
343
+ **Check your Overpass quota**:
344
+ ```python
345
+ raw = http_get("https://overpass-api.de/api/status", headers={"User-Agent": "browser-harness/1.0"})
346
+ print(raw)
347
+ # Connected as: 1728118854
348
+ # Rate limit: 2
349
+ # 2 slots available now.
350
+ # Slot available after: 2026-04-18T11:00:00Z, in 30 seconds.
351
+ ```
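The status text can be turned into numbers before deciding whether to send a query. This is a rough sketch, assuming the plain-text layout shown in the sample output above; `parse_overpass_status` is a hypothetical helper, and the real endpoint may emit lines this parser does not anticipate.

```python
import re


def parse_overpass_status(text):
    """Hypothetical parser for /api/status text (layout assumed from the sample)."""
    rate = re.search(r"Rate limit:\s*(\d+)", text)
    free = re.search(r"(\d+) slots? available now", text)
    # Each busy slot reports how many seconds until it frees up.
    waits = [int(s) for s in re.findall(r"in (\d+) seconds", text)]
    return {
        "rate_limit": int(rate.group(1)) if rate else None,
        "slots_available": int(free.group(1)) if free else 0,
        "wait_seconds": waits,
    }


sample = (
    "Connected as: 1728118854\n"
    "Rate limit: 2\n"
    "2 slots available now.\n"
    "Slot available after: 2026-04-18T11:00:00Z, in 30 seconds.\n"
)
status = parse_overpass_status(sample)
print(status)
```

When `slots_available` is 0, sleeping for `min(wait_seconds)` before retrying is one reasonable policy.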
352
+
353
+ **Handle rate limiting in production**:
354
+ ```python
355
+ import json, time, urllib.parse
356
+
357
+ def overpass_get_with_retry(query: str, max_retries: int = 3) -> dict:
358
+ for attempt in range(max_retries):
359
+ url = f"https://overpass.openstreetmap.fr/api/interpreter?data={urllib.parse.quote(query)}"
360
+ raw = http_get(url, headers={"User-Agent": "browser-harness/1.0"})
361
+ if raw.startswith("{"):
362
+ return json.loads(raw)
363
+ if "rate_limited" in raw or "too busy" in raw:
364
+ wait = 2 ** attempt * 10 # 10s, 20s, 40s
365
+ time.sleep(wait)
366
+ continue
367
+ raise RuntimeError(f"Overpass error: {raw[:200]}")
368
+ raise RuntimeError("Overpass: too many retries")
369
+ ```
370
+
371
+ ---
372
+
373
+ ## Complete working example
374
+
375
+ ```python
376
+ import json, time, urllib.parse, urllib.request, gzip
377
+ from helpers import http_get
378
+
379
+ UA = {"User-Agent": "browser-harness/1.0"}
380
+ NOMINATIM = "https://nominatim.openstreetmap.org"
381
+ OVERPASS = "https://overpass.openstreetmap.fr/api/interpreter"
382
+
383
+ def geocode(query: str, limit: int = 1) -> list[dict]:
384
+ """Forward geocode — returns [] if nothing found."""
385
+ q = urllib.parse.quote(query)
386
+ raw = http_get(f"{NOMINATIM}/search?q={q}&format=json&limit={limit}&addressdetails=1", headers=UA)
387
+ return json.loads(raw)
388
+
389
+ def reverse_geocode(lat: float, lon: float) -> dict:
390
+ """Reverse geocode: returns the nearest road/place, or a dict with an 'error' key if nothing is nearby."""
391
+ raw = http_get(f"{NOMINATIM}/reverse?lat={lat}&lon={lon}&format=json", headers=UA)
392
+ return json.loads(raw)
393
+
394
+ def overpass_get(query: str) -> list[dict]:
395
+ """Run an Overpass QL query, return elements list."""
396
+ url = f"{OVERPASS}?data={urllib.parse.quote(query)}"
397
+ raw = http_get(url, headers=UA)
398
+ if not raw.startswith("{"):
399
+ raise RuntimeError(f"Overpass error: {raw[:200]}")
400
+ return json.loads(raw)["elements"]
401
+
402
+ def overpass_post(query: str) -> list[dict]:
403
+ """POST variant — avoids URL length limits for complex queries."""
404
+ data = urllib.parse.urlencode({"data": query}).encode()
405
+ req = urllib.request.Request(
406
+ OVERPASS, data=data, method="POST",
407
+ headers={"User-Agent": "browser-harness/1.0",
408
+ "Content-Type": "application/x-www-form-urlencoded",
409
+ "Accept-Encoding": "gzip"}
410
+ )
411
+ with urllib.request.urlopen(req, timeout=30) as r:
412
+ body = r.read()
413
+ if r.headers.get("Content-Encoding") == "gzip":
414
+ body = gzip.decompress(body)
415
+ body = body.decode()
416
+ if not body.startswith("{"):
417
+ raise RuntimeError(f"Overpass error: {body[:300]}")
418
+ return json.loads(body)["elements"]
419
+
420
+ # --- Usage examples (validated 2026-04-18) ---
421
+
422
+ # 1. Geocode a landmark
423
+ places = geocode("Eiffel Tower", limit=3)
424
+ # places[0]['lat'] == '48.8582599' (string)
425
+ # places[0]['lon'] == '2.2945006' (string)
426
+ # places[0]['display_name'] == 'Tour Eiffel, 5, Avenue Anatole France, ..., 75007, France'
427
+ # places[0]['address']['city'] == 'Paris'
428
+ lat = float(places[0]['lat'])
429
+ lon = float(places[0]['lon'])
430
+
431
+ # 2. Reverse geocode the coordinates
432
+ addr = reverse_geocode(lat, lon)
433
+ # addr['address']['road'] == 'Avenue Gustave Eiffel'
434
+ # addr['address']['city'] == 'Paris'
435
+ # addr['address']['postcode']== '75007'
436
+ # addr['address']['country'] == 'France'
437
+
438
+ # 3. Find nearby cafes (wait 1s between nominatim and overpass if same script)
439
+ time.sleep(1)
440
+ cafes = overpass_get(
441
+ f"[out:json][timeout:25];node[\"amenity\"=\"cafe\"](around:500,{lat},{lon});out 10;"
442
+ )
443
+ for cafe in cafes:
444
+ print(f"{cafe['tags'].get('name','?'):30s} {cafe['lat']:.4f}, {cafe['lon']:.4f}")
445
+ # Café de l'Alma 48.8609, 2.3015
446
+ # Le Campanella 48.8586, 2.3033
447
+
448
+ # 4. Free-text city lookup + find restaurants in bounding box
449
+ time.sleep(1)
450
+ paris = geocode("Paris, France")[0]
451
+ bb = paris['boundingbox'] # [south_lat, north_lat, west_lon, east_lon] ← Nominatim order!
452
+ # For Overpass: need (south_lat, west_lon, north_lat, east_lon) ← DIFFERENT order
453
+ south, north, west, east = bb[0], bb[1], bb[2], bb[3]
454
+ # Restrict to center slice to avoid massive result set
455
+ center_bbox = "48.855,2.295,48.865,2.315"
456
+ rests = overpass_post(
457
+ f"[out:json][timeout:25];node[\"amenity\"=\"restaurant\"]({center_bbox});out 5;"
458
+ )
459
+ print(f"Found {len(rests)} restaurants near Paris center")
460
+ ```
461
+
462
+ ---
463
+
464
+ ## Gotchas
465
+
466
+ **`http_get` default UA (`Mozilla/5.0`) is blocked by both APIs.** Always pass `headers={"User-Agent": "browser-harness/1.0"}`. The `headers` kwarg in `http_get` does a `.update()` so it properly overrides the default. Confirmed: Mozilla/5.0 → 403 on Nominatim; `browser-harness/1.0` → 200.
467
+
468
+ **Blocked User-Agent patterns on Nominatim**: `Mozilla/5.0`, `python-requests/*`, `Wget/*`. Accepted: any non-generic app-style UA like `browser-harness/1.0`, `MyApp/2.0`, `curl/7.x`. Nominatim policy requires a descriptive UA with contact info, but in practice any non-library string passes.
469
+
470
+ **Nominatim lat/lon are strings, Overpass lat/lon are floats.** Always convert Nominatim coordinates: `float(result['lat'])`. Overpass element `lat`/`lon` are native Python floats — no conversion needed.
471
+
472
+ **Nominatim `boundingbox` field order is `[south_lat, north_lat, west_lon, east_lon]` — NOT `[south, west, north, east]`.** Overpass bbox uses `(south_lat, west_lon, north_lat, east_lon)`. When feeding a Nominatim bounding box into Overpass, you must reorder: `f"({bb[0]},{bb[2]},{bb[1]},{bb[3]})"`.
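The reordering described above can be wrapped in a small helper so the index shuffle lives in one place. This is a minimal sketch; `nominatim_bbox_to_overpass` is a hypothetical name, and the sanity check on the ordering is an added assumption rather than something either API requires.

```python
def nominatim_bbox_to_overpass(boundingbox):
    """Hypothetical helper: convert Nominatim's boundingbox to an Overpass bbox filter.

    Input:  [south_lat, north_lat, west_lon, east_lon]  (list of strings)
    Output: "(south_lat,west_lon,north_lat,east_lon)"
    """
    south, north, west, east = (float(v) for v in boundingbox)
    if south > north:
        raise ValueError("expected south_lat <= north_lat; is the input in Nominatim order?")
    return f"({south},{west},{north},{east})"


bb = ['48.8574753', '48.8590453', '2.2933119', '2.2956897']
print(nominatim_bbox_to_overpass(bb))
```

The result can be appended directly to a statement such as `node["amenity"="cafe"]`.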
473
+
474
+ **`overpass-api.de` main instance is frequently overloaded.** Returns HTTP 504 (timeout) or an HTML error page with `rate_limited` when busy. The FR mirror (`overpass.openstreetmap.fr`) is usually more responsive but also blocks `Mozilla/5.0`. Always detect non-JSON responses: `if not raw.startswith("{")`.
475
+
476
+ **Overpass error responses are HTML, not JSON.** The API returns HTTP 200 with an HTML error page when rate-limited or when the server is too busy. Always check `raw.startswith("{")` before parsing.
477
+
478
+ **Overpass rate limit: 2 concurrent slots, NOT 2 requests/s.** You can run 2 queries simultaneously. A 3rd concurrent query immediately returns an error. Sequential queries with no sleep between them work fine as long as each completes before the next starts.
479
+
480
+ **`out N;` limits results to N elements — use it.** Without a limit, large bounding boxes can return thousands of elements and hit the 512MB memory limit, returning a `maxsize` error. Default safe limit: `out 50;` for exploration, `out 500;` for bulk collection.
481
+
482
+ **Overpass QL bbox order is `(south, west, north, east)` — latitude FIRST.** This is the opposite of the standard GeoJSON convention `[west, south, east, north]`. The `around:` filter uses `(around:METERS,LAT,LON)` — note lat before lon.
483
+
484
+ **`name` tag in Overpass is the local-language name.** For Paris cafes this is French. English names may appear under `name:en` but are often absent. Never assume `name` is in English.
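A fallback chain handles the missing-translation case. This is a small sketch, assuming a hypothetical helper `preferred_name` that prefers `name:<lang>` and falls back to the local `name`.

```python
def preferred_name(tags, lang="en"):
    """Hypothetical helper: return name:<lang> if present, else the local name, else None."""
    return tags.get(f"name:{lang}") or tags.get("name") or None


cafe_tags = {"amenity": "cafe", "name": "Café de l'Alma", "name:fr": "Café de l'Alma"}
print(preferred_name(cafe_tags))  # no name:en tag, so the French name is used
```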
485
+
486
+ **Nominatim `/reverse` returns a single object, not a list.** For most coordinates this is the nearest road or place. For points far from any mapped feature (for example, open ocean) it can instead return `{"error": "Unable to geocode"}`, so check for an `error` key before reading `address`.
487
+
488
+ **`place_id` is internal and ephemeral** — do not store it for long-term use. Use `osm_type` + `osm_id` for stable references (e.g., `way/5013364` for the Eiffel Tower).
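A stored `osm_type` + `osm_id` pair can be turned back into a `/lookup` request using the `N`/`W`/`R` prefixes. This is a minimal sketch with hypothetical helper names (`osm_ref`, `lookup_url`); the 50-ID cap mirrors the limit stated for `/lookup` earlier in this file.

```python
OSM_PREFIX = {"node": "N", "way": "W", "relation": "R"}


def osm_ref(osm_type, osm_id):
    """Hypothetical helper: ('way', 5013364) -> 'W5013364'."""
    return f"{OSM_PREFIX[osm_type]}{int(osm_id)}"


def lookup_url(refs):
    """Hypothetical helper: build a Nominatim /lookup URL for 1 to 50 references."""
    refs = list(refs)
    if not 1 <= len(refs) <= 50:
        raise ValueError("Nominatim /lookup accepts between 1 and 50 IDs per request")
    return "https://nominatim.openstreetmap.org/lookup?osm_ids=" + ",".join(refs) + "&format=json"


print(lookup_url([osm_ref("way", 5013364)]))
```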
489
+
490
+ **Overpass `http_get` POST workaround**: `http_get` only supports GET. For POST requests (needed to avoid URL length limits for complex multi-statement QL), use `urllib.request.Request` directly as shown in the `overpass_post()` example above.