@pencil-agent/nano-pencil 2.0.1 → 2.0.2

Files changed (186)
  1. package/README.md +267 -267
  2. package/dist/build-meta.json +3 -3
  3. package/dist/core/export-html/AGENT.md +11 -11
  4. package/dist/core/export-html/template.css +971 -971
  5. package/dist/core/export-html/template.html +54 -54
  6. package/dist/core/model/custom-providers.js +1 -1
  7. package/dist/core/model-registry.js +5 -5
  8. package/dist/extensions/builtin/AGENT.md +115 -115
  9. package/dist/extensions/builtin/browser/AGENT.md +17 -17
  10. package/dist/extensions/builtin/browser/agent-workspace/agent_helpers.py +12 -12
  11. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/amazon/product-search.md +198 -198
  12. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/archive-org/scraping.md +341 -341
  13. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/arxiv/scraping.md +311 -311
  14. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/arxiv-bulk/scraping.md +333 -333
  15. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/atlas/overview.md +70 -70
  16. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/booking-com/scraping.md +578 -578
  17. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/capterra/scraping.md +440 -440
  18. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/centilebrain/generate-estimates.md +110 -110
  19. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/coingecko/scraping.md +325 -325
  20. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/coinmarketcap/scraping.md +463 -463
  21. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/coursera/scraping.md +360 -360
  22. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/craigslist/scraping.md +390 -390
  23. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/crossref/scraping.md +568 -568
  24. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/dev-to/scraping.md +323 -323
  25. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/duckduckgo/scraping.md +349 -349
  26. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/ebay/scraping.md +435 -435
  27. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/etsy/scraping.md +506 -506
  28. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/eventbrite/scraping.md +363 -363
  29. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/expedia/automation.md +168 -168
  30. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/facebook/groups.md +236 -236
  31. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/facebook/pages.md +295 -295
  32. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/framer/editor.md +108 -108
  33. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/fred/scraping.md +493 -493
  34. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/g2/scraping.md +580 -580
  35. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/genius/scraping.md +511 -511
  36. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/github/repo-actions.md +65 -65
  37. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/github/scraping.md +184 -184
  38. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/glassdoor/scraping.md +543 -543
  39. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/gmail/compose.md +122 -122
  40. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/goodreads/scraping.md +461 -461
  41. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/gutenberg/scraping.md +383 -383
  42. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/hackernews/scraping.md +243 -243
  43. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/howlongtobeat/scraping.md +473 -473
  44. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/imdb/scraping.md +271 -271
  45. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/itch-io/scraping.md +436 -436
  46. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/job-boards/indeed-glassdoor.md +1021 -1021
  47. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/letterboxd/scraping.md +349 -349
  48. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/linkedin/invitation-manager.md +109 -109
  49. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/loom/folder-enumeration.md +170 -170
  50. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/macrotrends/scraping.md +537 -537
  51. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/medium/article-hydration.md +120 -120
  52. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/medium/scraping.md +414 -414
  53. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/metacritic/scraping.md +477 -477
  54. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/musicbrainz/scraping.md +478 -478
  55. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/nasa/scraping.md +339 -339
  56. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/news-aggregation/multi-source.md +205 -205
  57. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/open-library/scraping.md +472 -472
  58. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/openalex/scraping.md +470 -470
  59. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/openstreetmap/scraping.md +490 -490
  60. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/package-registries/npm-pypi.md +478 -478
  61. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/polymarket/scraping.md +234 -234
  62. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/producthunt/scraping.md +307 -307
  63. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/pubmed/scraping.md +421 -421
  64. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/quora/scraping.md +364 -364
  65. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/rawg/scraping.md +352 -352
  66. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/reddit/scraping.md +124 -124
  67. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/rest-countries/scraping.md +233 -233
  68. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/sec-edgar/scraping.md +361 -361
  69. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/shopify-admin/README.md +36 -36
  70. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/shopify-admin/embedded-apps.md +72 -72
  71. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/shopify-admin/knowledge-base.md +109 -109
  72. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/shopify-admin/polaris-inputs.md +137 -137
  73. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/soundcloud/scraping.md +362 -362
  74. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/spotify/scraping.md +339 -339
  75. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/stackoverflow/scraping.md +435 -435
  76. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/steam/scraping.md +575 -575
  77. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/substack/scraping.md +338 -338
  78. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/thetechgeeks/pricing.md +52 -52
  79. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/tiktok/upload.md +107 -107
  80. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/tradingview/scraping.md +309 -309
  81. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/trello/boards-and-lists.md +88 -88
  82. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/trustpilot/scraping.md +375 -375
  83. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/walmart/scraping.md +444 -444
  84. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/wayback-machine/scraping.md +306 -306
  85. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/weather/scraping.md +398 -398
  86. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/wellfound/scraping.md +596 -596
  87. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/world-bank/scraping.md +356 -356
  88. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/xiaohongshu/scraping.md +84 -84
  89. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/youtube/scraping.md +418 -418
  90. package/dist/extensions/builtin/browser/agent-workspace/domain-skills/zillow/scraping.md +433 -433
  91. package/dist/extensions/builtin/browser/browser.md +73 -73
  92. package/dist/extensions/builtin/browser/install.md +142 -142
  93. package/dist/extensions/builtin/browser/interaction-skills/connection.md +48 -48
  94. package/dist/extensions/builtin/browser/interaction-skills/cookies.md +3 -3
  95. package/dist/extensions/builtin/browser/interaction-skills/cross-origin-iframes.md +3 -3
  96. package/dist/extensions/builtin/browser/interaction-skills/dialogs.md +64 -64
  97. package/dist/extensions/builtin/browser/interaction-skills/downloads.md +3 -3
  98. package/dist/extensions/builtin/browser/interaction-skills/drag-and-drop.md +3 -3
  99. package/dist/extensions/builtin/browser/interaction-skills/dropdowns.md +3 -3
  100. package/dist/extensions/builtin/browser/interaction-skills/iframes.md +3 -3
  101. package/dist/extensions/builtin/browser/interaction-skills/network-requests.md +3 -3
  102. package/dist/extensions/builtin/browser/interaction-skills/print-as-pdf.md +3 -3
  103. package/dist/extensions/builtin/browser/interaction-skills/profile-sync.md +90 -90
  104. package/dist/extensions/builtin/browser/interaction-skills/screenshots.md +17 -17
  105. package/dist/extensions/builtin/browser/interaction-skills/scrolling.md +3 -3
  106. package/dist/extensions/builtin/browser/interaction-skills/shadow-dom.md +3 -3
  107. package/dist/extensions/builtin/browser/interaction-skills/tabs.md +69 -69
  108. package/dist/extensions/builtin/browser/interaction-skills/uploads.md +1 -1
  109. package/dist/extensions/builtin/browser/interaction-skills/viewport.md +3 -3
  110. package/dist/extensions/builtin/browser/src/browser_harness/AGENT.md +15 -15
  111. package/dist/extensions/builtin/browser/src/browser_harness/__init__.py +8 -8
  112. package/dist/extensions/builtin/browser/src/browser_harness/_ipc.py +90 -90
  113. package/dist/extensions/builtin/browser/src/browser_harness/admin.py +722 -722
  114. package/dist/extensions/builtin/browser/src/browser_harness/daemon.py +328 -328
  115. package/dist/extensions/builtin/browser/src/browser_harness/helpers.py +396 -396
  116. package/dist/extensions/builtin/browser/src/browser_harness/run.py +103 -103
  117. package/dist/extensions/builtin/discipline/skills/brainstorming/SKILL.md +33 -33
  118. package/dist/extensions/builtin/discipline/skills/executing-plans/SKILL.md +25 -25
  119. package/dist/extensions/builtin/discipline/skills/finishing-development-branch/SKILL.md +25 -25
  120. package/dist/extensions/builtin/discipline/skills/receiving-code-review/SKILL.md +22 -22
  121. package/dist/extensions/builtin/discipline/skills/requesting-code-review/SKILL.md +31 -31
  122. package/dist/extensions/builtin/discipline/skills/systematic-debugging/SKILL.md +28 -28
  123. package/dist/extensions/builtin/discipline/skills/test-driven-development/SKILL.md +32 -32
  124. package/dist/extensions/builtin/discipline/skills/using-git-worktrees/SKILL.md +25 -25
  125. package/dist/extensions/builtin/discipline/skills/verification-before-completion/SKILL.md +27 -27
  126. package/dist/extensions/builtin/discipline/skills/writing-plans/SKILL.md +26 -26
  127. package/dist/extensions/builtin/goal/README.md +67 -67
  128. package/dist/extensions/builtin/grub/README.md +112 -112
  129. package/dist/extensions/builtin/link-world/agent-workspace/README.md +16 -16
  130. package/dist/extensions/builtin/link-world/internet-search/internet-search.md +65 -65
  131. package/dist/extensions/builtin/link-world/link-world-agent.md +82 -82
  132. package/dist/extensions/builtin/link-world/linkworld.md +313 -313
  133. package/dist/extensions/builtin/link-world/network-routing/network-routing.md +67 -67
  134. package/dist/extensions/builtin/loop/README.md +92 -92
  135. package/dist/extensions/builtin/mcp/figma-design.md +68 -68
  136. package/dist/extensions/builtin/mcp/mcp-management.md +85 -85
  137. package/dist/extensions/builtin/recap/AGENT.md +15 -15
  138. package/dist/extensions/builtin/sal/README.md +72 -72
  139. package/dist/extensions/builtin/security-audit/README.md +289 -289
  140. package/dist/extensions/builtin/team/AGENT.md +112 -112
  141. package/dist/extensions/builtin/team/TESTING.md +299 -299
  142. package/dist/extensions/builtin/token-save/README.md +56 -56
  143. package/dist/extensions/optional/AGENT.md +10 -10
  144. package/dist/modes/interactive/controllers/input-submit-controller.js +2 -2
  145. package/dist/modes/interactive/controllers/stream-render-controller.js +2 -2
  146. package/dist/modes/interactive/interactive-mode.js +19 -19
  147. package/dist/modes/interactive/theme/dark.json +85 -85
  148. package/dist/modes/interactive/theme/light.json +84 -84
  149. package/dist/modes/interactive/theme/theme-schema.json +335 -335
  150. package/dist/modes/interactive/theme/warm.json +81 -81
  151. package/dist/node_modules/@pencil-agent/ai/dist/cli.js +0 -0
  152. package/dist/node_modules/@pencil-agent/ai/dist/models.generated.js +1 -1
  153. package/docs/ACP/345/215/217/350/256/256/351/233/206/346/210/220/345/274/200/345/217/221/346/226/207/346/241/243.md +851 -0
  154. package/docs/SDK-TESTING.md +364 -0
  155. package/docs/codex-goal-command-impl.md +1055 -1055
  156. package/docs/codex-goal-vs-grub.md +500 -500
  157. package/docs/custom-provider.md +27 -27
  158. package/docs/extensions.md +27 -27
  159. package/docs/keybindings.md +27 -27
  160. package/docs/loop 重构完成总结.md +250 -250
  161. package/docs/loop 重构完成报告.md +122 -122
  162. package/docs/loop 重构方案.md +1222 -1222
  163. package/docs/loop 重构方案实现报告.md +158 -158
  164. package/docs/loop 重构方案对比分析.md +128 -128
  165. package/docs/loop 重构计划.md +320 -320
  166. package/docs/loop-usage-examples.md +214 -214
  167. package/docs/mem-core/346/212/200/346/234/257/346/226/207/346/241/243.md +593 -0
  168. package/docs/models.md +27 -27
  169. package/docs/packages.md +27 -27
  170. package/docs/pi-design-philosophy.md +457 -457
  171. package/docs/planmode.md +1987 -1987
  172. package/docs/prompt-templates.md +27 -27
  173. package/docs/providers.md +27 -27
  174. package/docs/sdk.md +27 -27
  175. package/docs/skills.md +27 -27
  176. package/docs/startup-performance-optimization.md +301 -0
  177. package/docs/themes.md +27 -27
  178. package/docs/tui.md +27 -27
  179. package/docs/认知地图.md +47 -0
  180. package/package.json +190 -190
  181. package/docs/cc-agent-design.md +0 -1297
  182. package/docs/cc-tui-design.md +0 -1333
  183. package/docs/nanoPencil-学习计划.md +0 -170
  184. package/docs/scan-report.md +0 -3820
  185. package/docs//345/257/271/346/240/207Claude-Code.md +0 -1775
  186. package/docs//351/230/277/351/207/214/345/267/264/345/267/264/350/264/242/346/212/245/345/210/206/346/236/220/344/271/246.md +0 -261
@@ -1,490 +1,490 @@
1
- # OpenStreetMap — Nominatim Geocoding + Overpass API
2
-
3
- Two fully public, no-auth APIs. Everything is a direct HTTP call — never need a browser.
4
-
5
- - **Nominatim**: geocoding (place name → lat/lon and reverse). Rate limit: 1 req/s.
6
- - **Overpass API**: spatial query engine over the full OSM dataset. Rate limit: 2 concurrent slots per IP on the public instance.
7
-
8
- **Do not use `http_get` without overriding `User-Agent`** — its default `Mozilla/5.0` is blocked by both APIs with HTTP 403. Pass `headers={"User-Agent": "browser-harness/1.0"}` on every call.
9
-
10
- ---
11
-
12
- ## Fastest path: forward geocode a place
13
-
14
- ```python
15
- import json, urllib.parse
16
- from helpers import http_get
17
-
18
- UA = {"User-Agent": "browser-harness/1.0"}
19
-
20
- def geocode(query: str, limit: int = 3) -> list[dict]:
21
- q = urllib.parse.quote(query)
22
- raw = http_get(
23
- f"https://nominatim.openstreetmap.org/search?q={q}&format=json&limit={limit}&addressdetails=1",
24
- headers=UA
25
- )
26
- return json.loads(raw) # [] when nothing found
27
-
28
- results = geocode("Eiffel Tower")
29
- # results[0]['display_name'] == 'Tour Eiffel, 5, Avenue Anatole France, ..., 75007, France'
30
- # results[0]['lat'] == '48.8582599' ← STRING, not float
31
- # results[0]['lon'] == '2.2945006' ← STRING, not float
32
- # results[0]['type'] == 'tower'
33
- # results[0]['class'] == 'man_made'
34
- # results[0]['importance'] == 0.6205937724353116
35
- # results[0]['osm_type'] == 'way'
36
- # results[0]['osm_id'] == 5013364
37
- # results[0]['boundingbox'] == ['48.8574753', '48.8590453', '2.2933119', '2.2956897'] ← all strings
38
- # results[0]['address']['city'] == 'Paris'
39
- # results[0]['address']['postcode'] == '75007'
40
- # results[0]['address']['country'] == 'France'
41
- # results[0]['address']['country_code'] == 'fr'
42
- ```
43
-
44
- ---
45
-
46
- ## Nominatim: all four query modes
47
-
48
- ### 1. Forward geocode (free-text)
49
-
50
- ```python
51
- import json, urllib.parse
52
- from helpers import http_get
53
-
54
- UA = {"User-Agent": "browser-harness/1.0"}
55
-
56
- raw = http_get(
57
- "https://nominatim.openstreetmap.org/search?q=Eiffel+Tower&format=json&limit=3&addressdetails=1",
58
- headers=UA
59
- )
60
- results = json.loads(raw)
61
- # Returns [] when nothing found — no exception
62
-
63
- # Useful optional params:
64
- # &addressdetails=1 → adds 'address' dict to each result (city, postcode, road, etc.)
65
- # &extratags=1 → adds 'extratags' dict (website, wikidata, phone, etc.)
66
- # &namedetails=1 → adds 'namedetails' dict (name:en, name:fr, etc.)
67
- # &countrycodes=fr,de → restrict to countries (comma-separated ISO 3166-1 alpha-2)
68
- # &viewbox=2.2,48.8,2.4,48.9&bounded=1 → restrict to bounding box (lon_min,lat_min,lon_max,lat_max)
69
- ```
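The optional parameters listed above can be combined without hand-building the query string. A minimal sketch using only the standard library; `build_search_url` is a hypothetical helper introduced here for illustration and is not part of `helpers`:

```python
import urllib.parse

NOMINATIM_SEARCH = "https://nominatim.openstreetmap.org/search"

def build_search_url(query: str, limit: int = 3, **options) -> str:
    """Return a Nominatim /search URL; extra keyword options are appended as given."""
    params = {"q": query, "format": "json", "limit": limit}
    params.update(options)  # e.g. addressdetails=1, countrycodes="fr,de"
    # urlencode percent-encodes values, so commas and spaces are safe to pass in
    return f"{NOMINATIM_SEARCH}?{urllib.parse.urlencode(params)}"

url = build_search_url("Eiffel Tower", addressdetails=1, countrycodes="fr,de")
print(url)
```

The resulting URL can be passed to `http_get` with the `User-Agent` override, as in the calls above.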
70
-
71
- ### 2. Reverse geocode (lat/lon → address)
72
-
73
- ```python
74
- raw = http_get(
75
- "https://nominatim.openstreetmap.org/reverse?lat=48.8584&lon=2.2945&format=json",
76
- headers=UA
77
- )
78
- result = json.loads(raw)
79
- # result['display_name'] == 'Avenue Gustave Eiffel, Quartier du Gros-Caillou, ..., France'
80
- # result['address']['road'] == 'Avenue Gustave Eiffel'
81
- # result['address']['city'] == 'Paris'
82
- # result['address']['postcode'] == '75007'
83
- # result['address']['country'] == 'France'
84
- # result['address']['country_code'] == 'fr'
85
- # result['address']['state'] == 'Île-de-France'
86
- # result['lat'], result['lon'] → strings (not floats)
87
-
88
- # Optional: &zoom=N (0-18) controls granularity of the returned address
89
- # zoom=3 → country, zoom=10 → city, zoom=18 → street/building (default)
90
- ```
91
-
92
- ### 3. Structured search (field-based)
93
-
94
- ```python
95
- raw = http_get(
96
- "https://nominatim.openstreetmap.org/search?city=Paris&country=France&format=json&limit=1",
97
- headers=UA
98
- )
99
- result = json.loads(raw)[0]
100
- # result['name'] == 'Paris'
101
- # result['lat'] == '48.8534951'
102
- # result['lon'] == '2.3483915'
103
- # result['type'] == 'administrative'
104
- # result['place_rank'] == 12 (lower = broader: 4=country, 8=state, 12=city, 30=POI)
105
- # result['addresstype'] == 'city'
106
- # result['boundingbox'] == ['48.8155755', '48.9021560', '2.2241220', '2.4697602']
107
-
108
- # Supported structured params: street, city, county, state, country, postalcode
109
- ```
110
-
111
- ### 4. Lookup by OSM ID
112
-
113
- ```python
114
- # Prefix: N=node, W=way, R=relation
115
- raw = http_get(
116
- "https://nominatim.openstreetmap.org/lookup?osm_ids=W5013364&format=json",
117
- headers=UA
118
- )
119
- result = json.loads(raw)
120
- # Returns list. Eiffel Tower way: result[0]['name'] == 'Tour Eiffel'
121
- # Supports up to 50 IDs: osm_ids=W5013364,N123456,R789
122
- ```
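Search results already carry `osm_type` and `osm_id`, so the `osm_ids` value for a lookup can be derived from them. A small sketch, assuming the N/W/R prefixes and the 50-ID cap described above; `to_osm_ids` is a hypothetical helper name:

```python
PREFIX = {"node": "N", "way": "W", "relation": "R"}

def to_osm_ids(results: list[dict], max_ids: int = 50) -> str:
    """Build the comma-separated osm_ids value from search results."""
    ids = [f"{PREFIX[r['osm_type']]}{r['osm_id']}" for r in results]
    return ",".join(ids[:max_ids])  # anything past the cap is silently dropped here

sample = [
    {"osm_type": "way", "osm_id": 5013364},
    {"osm_type": "node", "osm_id": 123456},
]
print(to_osm_ids(sample))  # W5013364,N123456
```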
123
-
124
- ---
125
-
126
- ## Nominatim response field reference
127
-
128
- | Field | Type | Notes |
129
- |-------|------|-------|
130
- | `place_id` | int | Internal Nominatim ID — do not cache long-term, can change |
131
- | `osm_type` | str | `"node"`, `"way"`, or `"relation"` |
132
- | `osm_id` | int | The OSM element ID |
133
- | `lat` | **str** | Latitude as string — convert with `float(r['lat'])` |
134
- | `lon` | **str** | Longitude as string — convert with `float(r['lon'])` |
135
- | `display_name` | str | Full human-readable address string |
136
- | `name` | str | Short name of the place |
137
- | `type` | str | OSM type tag value: `"tower"`, `"administrative"`, `"restaurant"`, etc. |
138
- | `class` | str | OSM key: `"man_made"`, `"boundary"`, `"amenity"`, `"highway"`, etc. |
139
- | `addresstype` | str | Semantic category: `"city"`, `"road"`, `"man_made"`, etc. |
140
- | `place_rank` | int | Hierarchy rank: 4=country, 8=state, 12=city, 16=suburb, 30=POI |
141
- | `importance` | float | 0–1 relevance score (higher = more notable) |
142
- | `boundingbox` | list[str] | `[south_lat, north_lat, west_lon, east_lon]` — all strings, note unusual order |
143
- | `licence` | str | ODbL attribution string — include in user-facing output |
144
- | `address` | dict | Only present with `&addressdetails=1` or in reverse results |
145
-
146
- `address` dict common keys: `road`, `house_number`, `quarter`, `suburb`, `city_district`, `city`, `state`, `postcode`, `country`, `country_code`, `ISO3166-2-lvl4/6`.
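Because `lat` and `lon` arrive as strings, it can help to convert them once at the edge. The sketch below also picks the result with the highest `importance`; `to_point` and `best_match` are hypothetical names, and the second sample entry is made up for illustration:

```python
from typing import Optional

def to_point(result: dict) -> tuple[float, float]:
    """Convert Nominatim's string coordinates to a (lat, lon) float pair."""
    return float(result["lat"]), float(result["lon"])

def best_match(results: list[dict]) -> Optional[dict]:
    """Return the result with the highest importance, or None for an empty list."""
    return max(results, key=lambda r: r.get("importance", 0.0), default=None)

sample = [
    {"lat": "36.1125", "lon": "-115.1725", "importance": 0.25},  # made-up entry
    {"lat": "48.8582599", "lon": "2.2945006", "importance": 0.6205937724353116},
]
top = best_match(sample)
print(to_point(top))
```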
147
-
148
- ---
149
-
150
- ## Overpass API: query OSM data by tags
151
-
152
- Overpass is a read-only query engine over the full OSM planet. It supports finding POIs by tag, radius, bounding box, and combinations.
153
-
154
- **Endpoint**: `https://overpass-api.de/api/interpreter`
155
- **Backup instances** (use when main is overloaded, which happens often):
156
- - `https://overpass.openstreetmap.fr/api/interpreter` — requires non-Mozilla User-Agent
157
-
158
- **http_get works for GET requests** — pass `headers={"User-Agent": "browser-harness/1.0"}`. For POST, use `urllib` directly (see example below).
159
-
160
- ### GET query (simplest for http_get)
161
-
162
- ```python
163
- import json, urllib.parse
164
- from helpers import http_get
165
-
166
- UA = {"User-Agent": "browser-harness/1.0"}
167
- OVERPASS = "https://overpass.openstreetmap.fr/api/interpreter"
168
-
169
- def overpass_get(query: str) -> dict:
170
- url = f"{OVERPASS}?data={urllib.parse.quote(query)}"
171
- raw = http_get(url, headers=UA)
172
- if not raw.startswith("{"):
173
- raise RuntimeError(f"Overpass error (HTML returned): {raw[:200]}")
174
- return json.loads(raw)
175
-
176
- # Find cafes in central Paris (bbox: south_lat, west_lon, north_lat, east_lon)
177
- r = overpass_get('[out:json][timeout:25];node["amenity"="cafe"](48.855,2.295,48.862,2.308);out 10;')
178
- # r['version'] == 0.6
179
- # r['generator'] == 'Overpass API 0.7.62.7 375dc00a'
180
- # r['elements'] → list of matching OSM elements
181
-
182
- for cafe in r['elements']:
183
- print(cafe['tags'].get('name'), cafe['lat'], cafe['lon'])
184
- # 'Café de l\'Alma' 48.8609068 2.3015143
185
- # 'Le Campanella' 48.8585847 2.3032822
186
- # 'Kozy Bosquet' 48.855445 2.3054013
187
-
188
- # Find restaurants within 500m radius of a point (around filter)
189
- r = overpass_get(
190
- '[out:json][timeout:25];node["amenity"="restaurant"](around:500,37.7749,-122.4194);out 10;'
191
- )
192
- for rest in r['elements']:
193
- print(rest['tags'].get('name'), rest['tags'].get('cuisine',''))
194
- # 'Nepalese Indian Cusine' 'indian;nepali'
195
- # 'Local Diner' 'coffee_shop;italian;burger;seafood'
196
- # 'Moya Cafe' ''
197
- ```
198
-
199
- ### POST query (for complex QL, avoids URL length limits)
200
-
201
- ```python
202
- import json, urllib.parse, urllib.request, gzip
203
- from helpers import http_get # http_get is GET-only; use urllib for POST
204
-
205
- OVERPASS = "https://overpass.openstreetmap.fr/api/interpreter"
206
-
207
- def overpass_post(query: str) -> dict:
208
- """POST to Overpass — no URL length limits, preferred for multi-statement QL."""
209
- data = urllib.parse.urlencode({"data": query}).encode()
210
- req = urllib.request.Request(
211
- OVERPASS, data=data, method="POST",
212
- headers={
213
- "User-Agent": "browser-harness/1.0",
214
- "Content-Type": "application/x-www-form-urlencoded",
215
- "Accept-Encoding": "gzip",
216
- }
217
- )
218
- with urllib.request.urlopen(req, timeout=30) as r:
219
- body = r.read()
220
- if r.headers.get("Content-Encoding") == "gzip":
221
- body = gzip.decompress(body)
222
- body = body.decode()
223
- if not body.startswith("{"):
224
- raise RuntimeError(f"Overpass error (HTML): {body[:300]}")
225
- return json.loads(body)
226
-
227
- # Example: cafes in Paris bbox
228
- r = overpass_post('[out:json][timeout:25];node["amenity"="cafe"](48.855,2.295,48.862,2.308);out 5;')
229
- print(len(r['elements'])) # 5 (or up to 5)
230
- ```
231
-
232
- ### Overpass element structure
233
-
234
- Every element in `r['elements']` is a dict with at minimum:
235
-
236
- ```python
237
- {
238
- "type": "node", # "node", "way", or "relation"
239
- "id": 308684349, # int — OSM element ID (stable, use for dedup)
240
- "lat": 48.8609068, # float — ONLY present for node type
241
- "lon": 2.3015143, # float — ONLY present for node type
242
- "tags": { # dict — all OSM tags on this element
243
- "amenity": "cafe",
244
- "name": "Café de l'Alma",
245
- "name:fr": "Café de l'Alma",
246
- "outdoor_seating": "yes",
247
- "payment:credit_cards": "yes",
248
- "phone": "+33 1 45 51 56 74",
249
- "opening_hours": "Mo-Sa 08:00-23:00; Su 09:00-19:00", # optional
250
- "website": "https://...", # optional
251
- "wheelchair": "yes" # optional
252
- }
253
- }
254
- ```
255
-
256
- For `way` elements, use `out center;` to add a `center` dict with lat/lon alongside the list of node IDs:
257
-
258
- ```python
259
- # way element with out center:
260
- {
261
- "type": "way",
262
- "id": 338411946,
263
- "center": {"lat": 48.8660087, "lon": 2.3153233}, # centroid of the polygon
264
- "nodes": [3454913623, 3454913707, ...], # node IDs forming the boundary
265
- "tags": {"amenity": "cafe", "name": "Café 1902", ...}
266
- }
267
-
268
- # Query to get both nodes and ways with lat/lon:
269
- query = '[out:json][timeout:25];(node["amenity"="cafe"](48.85,2.29,48.87,2.32);way["amenity"="cafe"](48.85,2.29,48.87,2.32););out center 20;'
270
- r = overpass_get(query)
271
- for el in r['elements']:
272
- if el['type'] == 'node':
273
- lat, lon = el['lat'], el['lon']
274
- else: # way
275
- lat, lon = el['center']['lat'], el['center']['lon']
276
- print(el['tags'].get('name'), lat, lon)
277
- ```
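The node/way branching above can be folded into one helper, together with de-duplication on the element ID. A sketch under one assumption worth stating: OSM IDs are unique only within an element type, so the key used here is `(type, id)`. Both function names are hypothetical:

```python
def element_point(el: dict):
    """Return (lat, lon) for a node, or for an element fetched with `out center`; else None."""
    if "lat" in el and "lon" in el:
        return el["lat"], el["lon"]
    center = el.get("center")
    if center:
        return center["lat"], center["lon"]
    return None

def dedup_elements(elements: list[dict]) -> list[dict]:
    """Drop repeated elements, keeping the first occurrence of each (type, id)."""
    seen, unique = set(), []
    for el in elements:
        key = (el["type"], el["id"])
        if key not in seen:
            seen.add(key)
            unique.append(el)
    return unique

sample = [
    {"type": "node", "id": 308684349, "lat": 48.8609068, "lon": 2.3015143},
    {"type": "way", "id": 338411946, "center": {"lat": 48.8660087, "lon": 2.3153233}},
    {"type": "node", "id": 308684349, "lat": 48.8609068, "lon": 2.3015143},  # repeat
]
points = [element_point(el) for el in dedup_elements(sample)]
print(points)
```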
278
-
279
- ### Overpass QL quick reference
280
-
281
- ```
282
- [out:json][timeout:25] # Required header: JSON output, 25s timeout
283
- [maxsize:52428800] # Optional: 50MB max result size (default is server limit)
284
-
285
- node["amenity"="cafe"](south,west,north,east);out N;
286
- # ↑ bbox order: south_lat, west_lon, north_lat, east_lon
287
- # Note: DIFFERENT from Nominatim's boundingbox field which is [south,north,west,east]
288
-
289
- node["amenity"="cafe"](around:RADIUS_METERS,LAT,LON);out N;
290
-
291
- node["amenity"~"cafe|restaurant"](bbox);out N; # regex match on tag value
292
- node[!"name"](bbox);out N; # elements WITHOUT the 'name' tag
293
- node["name"~"Star",i](bbox);out N; # case-insensitive regex
294
-
295
- # Union of types:
296
- (node["amenity"="cafe"](bbox); way["amenity"="cafe"](bbox););out center N;
297
-
298
- # Multiple tags (AND logic):
299
- node["amenity"="cafe"]["outdoor_seating"="yes"](bbox);out N;
300
- ```
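The two bounding-box orders are easy to mix up when a Nominatim result feeds an Overpass query. A minimal sketch of the reordering, assuming the `[south, north, west, east]` layout documented for `boundingbox`; the helper names are hypothetical:

```python
def nominatim_bbox_to_overpass(boundingbox: list[str]) -> str:
    """Reorder [south, north, west, east] into Overpass's south,west,north,east."""
    south, north, west, east = boundingbox
    return f"{south},{west},{north},{east}"

def cafes_query(boundingbox: list[str], limit: int = 10) -> str:
    """Build an Overpass QL query for cafe nodes inside a Nominatim bounding box."""
    bbox = nominatim_bbox_to_overpass(boundingbox)
    return f'[out:json][timeout:25];node["amenity"="cafe"]({bbox});out {limit};'

eiffel_bbox = ["48.8574753", "48.8590453", "2.2933119", "2.2956897"]
print(cafes_query(eiffel_bbox, limit=5))
```

The strings are passed through unchanged, so no float conversion (and no rounding) is involved.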
301
-
302
- ---
303
-
304
- ## OSM tile server (reference only, no scraping)
305
-
306
- ```
307
- https://{a,b,c}.tile.openstreetmap.org/{z}/{x}/{y}.png
308
- ```
309
-
310
- - Subdomains `a`, `b`, `c` for load balancing
311
- - `z` = zoom level 0–19, `x`/`y` = tile coordinates
312
- - Returns 256×256 PNG tiles
313
- - Policy: max 2 req/s per IP, non-commercial use, must display OSM attribution
314
- - Tile coordinate calculator: `https://wiki.openstreetmap.org/wiki/Slippy_map_tilenames`
315
- - Bulk tile downloading is prohibited — use Overpass or data extracts instead
316
-
317
- ```python
318
- # Convert lat/lon to tile coordinates
319
- import math
320
-
321
- def lat_lon_to_tile(lat, lon, zoom):
322
- n = 2 ** zoom
323
- x = int((lon + 180) / 360 * n)
324
- y = int((1 - math.log(math.tan(math.radians(lat)) + 1 / math.cos(math.radians(lat))) / math.pi) / 2 * n)
325
- return x, y
326
-
327
- x, y = lat_lon_to_tile(48.8582, 2.2945, 14)
328
- url = f"https://a.tile.openstreetmap.org/14/{x}/{y}.png"
329
- # url == 'https://a.tile.openstreetmap.org/14/8296/5636.png'
330
- ```
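The inverse conversion is useful for checking which area a tile covers. The sketch below uses the standard Web Mercator inverse; `tile_to_lat_lon` and `tile_bounds` are hypothetical names, and the bounds come back in the same south, west, north, east order that Overpass expects:

```python
import math

def tile_to_lat_lon(x: int, y: int, zoom: int) -> tuple[float, float]:
    """Return (lat, lon) of the north-west corner of tile x/y at the given zoom."""
    n = 2 ** zoom
    lon = x / n * 360.0 - 180.0
    lat = math.degrees(math.atan(math.sinh(math.pi * (1 - 2 * y / n))))
    return lat, lon

def tile_bounds(x: int, y: int, zoom: int) -> tuple[float, float, float, float]:
    """Return (south, west, north, east) for a tile."""
    north, west = tile_to_lat_lon(x, y, zoom)
    south, east = tile_to_lat_lon(x + 1, y + 1, zoom)  # next tile's NW corner
    return south, west, north, east

south, west, north, east = tile_bounds(8296, 5636, 14)
print(south, west, north, east)
```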
331
-
332
- ---
333
-
334
- ## Rate limits
335
-
336
- | API | Limit | Enforcement | 429 behavior |
337
- |-----|-------|-------------|--------------|
338
- | Nominatim | 1 req/s | Soft — rapid requests work but you get delayed/dropped | Returns HTTP 403 if your IP is banned (not 429) |
339
- | Overpass (main) | 2 concurrent slots per IP | Hard — 3rd concurrent req returns HTML error immediately | HTML error page with `rate_limited` in body |
340
- | Overpass (main) | Also: query complexity quota | Resets over time (~per hour) | HTML error page with `rate_limited` |
341
- | Tile server | 2 req/s per IP | Soft/hard | IP block |
342
-
343
- **Check your Overpass quota**:
344
- ```python
345
- raw = http_get("https://overpass-api.de/api/status", headers={"User-Agent": "browser-harness/1.0"})
346
- print(raw)
347
- # Connected as: 1728118854
348
- # Rate limit: 2
349
- # 2 slots available now.
350
- # Slot available after: 2026-04-18T11:00:00Z, in 30 seconds.
351
- ```
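The status text can be turned into numbers before deciding whether to send a query. This sketch assumes the response looks like the sample shown above; the regular expressions and the `parse_overpass_status` name are illustrative and may need adjusting if the server's wording differs:

```python
import re

def parse_overpass_status(text: str) -> dict:
    """Extract the rate limit, free slots, and shortest wait from /api/status text."""
    limit = re.search(r"Rate limit:\s*(\d+)", text)
    free = re.search(r"(\d+) slots? available now", text)
    waits = [int(s) for s in re.findall(r"in (\d+) seconds", text)]
    return {
        "rate_limit": int(limit.group(1)) if limit else None,
        "slots_available": int(free.group(1)) if free else 0,
        "next_slot_in_s": min(waits) if waits else None,
    }

sample = """Connected as: 1728118854
Rate limit: 2
2 slots available now.
Slot available after: 2026-04-18T11:00:00Z, in 30 seconds.
"""
status = parse_overpass_status(sample)
print(status)
```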
352
-
353
- **Handle rate limiting in production**:
354
- ```python
355
- import json, time, urllib.parse  # http_get comes from helpers, as in the earlier blocks
356
-
357
- def overpass_get_with_retry(query: str, max_retries: int = 3) -> dict:
358
- for attempt in range(max_retries):
359
- url = f"https://overpass.openstreetmap.fr/api/interpreter?data={urllib.parse.quote(query)}"
360
- raw = http_get(url, headers={"User-Agent": "browser-harness/1.0"})
361
- if raw.startswith("{"):
362
- return json.loads(raw)
363
- if "rate_limited" in raw or "too busy" in raw:
364
- wait = 2 ** attempt * 10 # 10s, 20s, 40s
365
- time.sleep(wait)
366
- continue
367
- raise RuntimeError(f"Overpass error: {raw[:200]}")
368
- raise RuntimeError("Overpass: too many retries")
369
- ```
370
-
371
- ---
372
-
373
- ## Complete working example
374
-
375
- ```python
376
- import json, time, urllib.parse, urllib.request, gzip
377
- from helpers import http_get
378
-
379
- UA = {"User-Agent": "browser-harness/1.0"}
380
- NOMINATIM = "https://nominatim.openstreetmap.org"
381
- OVERPASS = "https://overpass.openstreetmap.fr/api/interpreter"
382
-
383
- def geocode(query: str, limit: int = 1) -> list[dict]:
384
- """Forward geocode — returns [] if nothing found."""
385
- q = urllib.parse.quote(query)
386
- raw = http_get(f"{NOMINATIM}/search?q={q}&format=json&limit={limit}&addressdetails=1", headers=UA)
387
- return json.loads(raw)
388
-
389
- def reverse_geocode(lat: float, lon: float) -> dict:
390
- """Reverse geocode — always returns a result (nearest road/place)."""
391
- raw = http_get(f"{NOMINATIM}/reverse?lat={lat}&lon={lon}&format=json", headers=UA)
392
- return json.loads(raw)
393
-
394
- def overpass_get(query: str) -> list[dict]:
395
- """Run an Overpass QL query, return elements list."""
396
- url = f"{OVERPASS}?data={urllib.parse.quote(query)}"
397
- raw = http_get(url, headers=UA)
398
- if not raw.startswith("{"):
399
- raise RuntimeError(f"Overpass error: {raw[:200]}")
400
- return json.loads(raw)["elements"]
401
-
402
- def overpass_post(query: str) -> list[dict]:
403
- """POST variant — avoids URL length limits for complex queries."""
404
- data = urllib.parse.urlencode({"data": query}).encode()
405
- req = urllib.request.Request(
406
- OVERPASS, data=data, method="POST",
407
- headers={"User-Agent": "browser-harness/1.0",
408
- "Content-Type": "application/x-www-form-urlencoded",
409
- "Accept-Encoding": "gzip"}
410
- )
411
- with urllib.request.urlopen(req, timeout=30) as r:
412
- body = r.read()
413
- if r.headers.get("Content-Encoding") == "gzip":
414
- body = gzip.decompress(body)
415
- body = body.decode()
416
- if not body.startswith("{"):
417
- raise RuntimeError(f"Overpass error: {body[:300]}")
418
- return json.loads(body)["elements"]
419
-
420
- # --- Usage examples (validated 2026-04-18) ---
421
-
422
- # 1. Geocode a landmark
423
- places = geocode("Eiffel Tower", limit=3)
424
- # places[0]['lat'] == '48.8582599' (string)
425
- # places[0]['lon'] == '2.2945006' (string)
426
- # places[0]['display_name'] == 'Tour Eiffel, 5, Avenue Anatole France, ..., 75007, France'
427
- # places[0]['address']['city'] == 'Paris'
428
- lat = float(places[0]['lat'])
429
- lon = float(places[0]['lon'])
430
-
431
- # 2. Reverse geocode the coordinates
432
- addr = reverse_geocode(lat, lon)
433
- # addr['address']['road'] == 'Avenue Gustave Eiffel'
434
- # addr['address']['city'] == 'Paris'
435
- # addr['address']['postcode']== '75007'
436
- # addr['address']['country'] == 'France'
437
-
438
- # 3. Find nearby cafes (wait 1s between nominatim and overpass if same script)
439
- time.sleep(1)
440
- cafes = overpass_get(
441
- f"[out:json][timeout:25];node[\"amenity\"=\"cafe\"](around:500,{lat},{lon});out 10;"
442
- )
443
- for cafe in cafes:
444
- print(f"{cafe['tags'].get('name','?'):30s} {cafe['lat']:.4f}, {cafe['lon']:.4f}")
445
- # Café de l'Alma 48.8609, 2.3015
446
- # Le Campanella 48.8586, 2.3033
447
-
448
- # 4. Structured city lookup + find restaurants in bounding box
449
- time.sleep(1)
450
- paris = geocode("Paris, France")[0]
451
- bb = paris['boundingbox'] # [south_lat, north_lat, west_lon, east_lon] ← Nominatim order!
452
- # For Overpass: need (south_lat, west_lon, north_lat, east_lon) ← DIFFERENT order
453
- south, north, west, east = bb[0], bb[1], bb[2], bb[3]
454
- # Restrict to center slice to avoid massive result set
455
- center_bbox = f"48.855,2.295,48.865,2.315"
456
- rests = overpass_post(
457
- f"[out:json][timeout:25];node[\"amenity\"=\"restaurant\"]({center_bbox});out 5;"
458
- )
459
- print(f"Found {len(rests)} restaurants near Paris center")
460
- ```
461
-
462
- ---
463
-
464
- ## Gotchas
465
-
466
- **`http_get` default UA (`Mozilla/5.0`) is blocked by both APIs.** Always pass `headers={"User-Agent": "browser-harness/1.0"}`. The `headers` kwarg in `http_get` does a `.update()` so it properly overrides the default. Confirmed: Mozilla/5.0 → 403 on Nominatim; `browser-harness/1.0` → 200.
467
-
468
- **Blocked User-Agent patterns on Nominatim**: `Mozilla/5.0`, `python-requests/*`, `Wget/*`. Accepted: any non-generic app-style UA like `browser-harness/1.0`, `MyApp/2.0`, `curl/7.x`. Nominatim policy requires a descriptive UA with contact info, but in practice any non-library string passes.
469
-
470
- **Nominatim lat/lon are strings, Overpass lat/lon are floats.** Always convert Nominatim coordinates: `float(result['lat'])`. Overpass element `lat`/`lon` are native Python floats — no conversion needed.
471
-
472
- **Nominatim `boundingbox` field order is `[south_lat, north_lat, west_lon, east_lon]` — NOT `[south, west, north, east]`.** Overpass bbox uses `(south_lat, west_lon, north_lat, east_lon)`. When feeding a Nominatim bounding box into Overpass, you must reorder: `f"({bb[0]},{bb[2]},{bb[1]},{bb[3]})"`.
473
-
474
- **`overpass-api.de` main instance is frequently overloaded.** Returns HTTP 504 (timeout) or an HTML error page with `rate_limited` when busy. The FR mirror (`overpass.openstreetmap.fr`) is usually more responsive but also blocks `Mozilla/5.0`. Always detect non-JSON responses: `if not raw.startswith("{")`.
475
-
476
- **Overpass error responses are HTML, not JSON.** The API returns HTTP 200 with an HTML error page when rate-limited or when the server is too busy. Always check `raw.startswith("{")` before parsing.
477
-
478
- **Overpass rate limit: 2 concurrent slots, NOT 2 requests/s.** You can run 2 queries simultaneously. A 3rd concurrent query immediately returns an error. Sequential queries with no sleep between them work fine as long as each completes before the next starts.
479
-
480
- **`out N;` limits results to N elements — use it.** Without a limit, large bounding boxes can return thousands of elements and hit the 512MB memory limit, returning a `maxsize` error. Default safe limit: `out 50;` for exploration, `out 500;` for bulk collection.
481
-
482
- **Overpass QL bbox order is `(south, west, north, east)` — latitude FIRST.** This is the opposite of the standard GeoJSON convention `[west, south, east, north]`. The `around:` filter uses `(around:METERS,LAT,LON)` — note lat before lon.
483
-
484
- **`name` tag in Overpass is the local-language name.** For Paris cafes this is French. English names may appear under `name:en` but are often absent. Never assume `name` is in English.
485
-
486
- **Nominatim `/reverse` always returns the nearest result** — it never returns an empty response (unlike `/search`). If the coordinates are in the ocean, it still returns the nearest coastline or country.
487
-
488
- **`place_id` is internal and ephemeral** — do not store it for long-term use. Use `osm_type` + `osm_id` for stable references (e.g., `way/5013364` for the Eiffel Tower).
489
-
490
- **Overpass `http_get` POST workaround**: `http_get` only supports GET. For POST requests (needed to avoid URL length limits for complex multi-statement QL), use `urllib.request.Request` directly as shown in the `overpass_post()` example above.
1
+ # OpenStreetMap — Nominatim Geocoding + Overpass API
2
+
3
+ Two fully public, no-auth APIs. Everything is a direct HTTP call, so you never need a browser.
4
+
5
+ - **Nominatim**: geocoding (place name → lat/lon and reverse). Rate limit: 1 req/s.
6
+ - **Overpass API**: spatial query engine over the full OSM dataset. Rate limit: 2 concurrent slots per IP on the public instance.
7
+
8
+ **Do not use `http_get` without overriding `User-Agent`** — its default `Mozilla/5.0` is blocked by both APIs with HTTP 403. Pass `headers={"User-Agent": "browser-harness/1.0"}` on every call.
9
+
10
+ ---
11
+
12
+ ## Fastest path: forward geocode a place
13
+
14
+ ```python
15
+ import json, urllib.parse
16
+ from helpers import http_get
17
+
18
+ UA = {"User-Agent": "browser-harness/1.0"}
19
+
20
+ def geocode(query: str, limit: int = 3) -> list[dict]:
21
+ q = urllib.parse.quote(query)
22
+ raw = http_get(
23
+ f"https://nominatim.openstreetmap.org/search?q={q}&format=json&limit={limit}&addressdetails=1",
24
+ headers=UA
25
+ )
26
+ return json.loads(raw) # [] when nothing found
27
+
28
+ results = geocode("Eiffel Tower")
29
+ # results[0]['display_name'] == 'Tour Eiffel, 5, Avenue Anatole France, ..., 75007, France'
30
+ # results[0]['lat'] == '48.8582599' ← STRING, not float
31
+ # results[0]['lon'] == '2.2945006' ← STRING, not float
32
+ # results[0]['type'] == 'tower'
33
+ # results[0]['class'] == 'man_made'
34
+ # results[0]['importance'] == 0.6205937724353116
35
+ # results[0]['osm_type'] == 'way'
36
+ # results[0]['osm_id'] == 5013364
37
+ # results[0]['boundingbox'] == ['48.8574753', '48.8590453', '2.2933119', '2.2956897'] ← all strings
38
+ # results[0]['address']['city'] == 'Paris'
39
+ # results[0]['address']['postcode'] == '75007'
40
+ # results[0]['address']['country'] == 'France'
41
+ # results[0]['address']['country_code'] == 'fr'
42
+ ```
43
+
44
+ ---
45
+
46
+ ## Nominatim: all four query modes
47
+
48
+ ### 1. Forward geocode (free-text)
49
+
50
+ ```python
51
+ import json, urllib.parse
52
+ from helpers import http_get
53
+
54
+ UA = {"User-Agent": "browser-harness/1.0"}
55
+
56
+ raw = http_get(
57
+ "https://nominatim.openstreetmap.org/search?q=Eiffel+Tower&format=json&limit=3&addressdetails=1",
58
+ headers=UA
59
+ )
60
+ results = json.loads(raw)
61
+ # Returns [] when nothing found — no exception
62
+
63
+ # Useful optional params:
64
+ # &addressdetails=1 → adds 'address' dict to each result (city, postcode, road, etc.)
65
+ # &extratags=1 → adds 'extratags' dict (website, wikidata, phone, etc.)
66
+ # &namedetails=1 → adds 'namedetails' dict (name:en, name:fr, etc.)
67
+ # &countrycodes=fr,de → restrict to countries (comma-separated ISO 3166-1 alpha-2)
68
+ # &viewbox=2.2,48.8,2.4,48.9 &bounded=1 → restrict to bounding box (lon_min,lat_min,lon_max,lat_max)
69
+ ```
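The optional parameters listed above are easier to combine when the query string is assembled from a dict instead of by hand. The helper below is a minimal sketch: `build_search_url` and its keyword arguments are hypothetical names introduced here for illustration, and only the parameters already named in this section are used.

```python
import urllib.parse

NOMINATIM_SEARCH = "https://nominatim.openstreetmap.org/search"

def build_search_url(query, limit=3, countrycodes=None, viewbox=None,
                     addressdetails=False, extratags=False, namedetails=False):
    """Assemble a /search URL (hypothetical helper, not part of helpers.py).

    viewbox is (lon_min, lat_min, lon_max, lat_max), i.e. longitude first.
    """
    params = {"q": query, "format": "json", "limit": limit}
    if countrycodes:
        params["countrycodes"] = ",".join(countrycodes)
    if viewbox:
        params["viewbox"] = ",".join(str(v) for v in viewbox)
        params["bounded"] = 1  # restrict results to the viewbox
    if addressdetails:
        params["addressdetails"] = 1
    if extratags:
        params["extratags"] = 1
    if namedetails:
        params["namedetails"] = 1
    # urlencode percent-escapes the values, so no manual quoting is needed
    return f"{NOMINATIM_SEARCH}?{urllib.parse.urlencode(params)}"

url = build_search_url(
    "Eiffel Tower",
    countrycodes=["fr"],
    viewbox=(2.2, 48.8, 2.4, 48.9),
    addressdetails=True,
)
```

The resulting `url` can be passed to `http_get(url, headers=UA)` exactly like the hand-written URLs in this section.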
70
+
71
+ ### 2. Reverse geocode (lat/lon → address)
72
+
73
+ ```python
74
+ raw = http_get(
75
+ "https://nominatim.openstreetmap.org/reverse?lat=48.8584&lon=2.2945&format=json",
76
+ headers=UA
77
+ )
78
+ result = json.loads(raw)
79
+ # result['display_name'] == 'Avenue Gustave Eiffel, Quartier du Gros-Caillou, ..., France'
80
+ # result['address']['road'] == 'Avenue Gustave Eiffel'
81
+ # result['address']['city'] == 'Paris'
82
+ # result['address']['postcode'] == '75007'
83
+ # result['address']['country'] == 'France'
84
+ # result['address']['country_code'] == 'fr'
85
+ # result['address']['state'] == 'Île-de-France'
86
+ # result['lat'], result['lon'] → strings (not floats)
87
+
88
+ # Optional: &zoom=N (0-18) controls granularity of the returned address
89
+ # zoom=3 → country, zoom=10 → city, zoom=18 → street/building (default)
90
+ ```
91
+
92
+ ### 3. Structured search (field-based)
93
+
94
+ ```python
95
+ raw = http_get(
96
+ "https://nominatim.openstreetmap.org/search?city=Paris&country=France&format=json&limit=1",
97
+ headers=UA
98
+ )
99
+ result = json.loads(raw)[0]
100
+ # result['name'] == 'Paris'
101
+ # result['lat'] == '48.8534951'
102
+ # result['lon'] == '2.3483915'
103
+ # result['type'] == 'administrative'
104
+ # result['place_rank'] == 12 (lower = broader: 4=country, 8=state, 12=city, 30=POI)
105
+ # result['addresstype'] == 'city'
106
+ # result['boundingbox'] == ['48.8155755', '48.9021560', '2.2241220', '2.4697602']
107
+
108
+ # Supported structured params: street, city, county, state, country, postalcode
109
+ ```
110
+
111
+ ### 4. Lookup by OSM ID
112
+
113
+ ```python
114
+ # Prefix: N=node, W=way, R=relation
115
+ raw = http_get(
116
+ "https://nominatim.openstreetmap.org/lookup?osm_ids=W5013364&format=json",
117
+ headers=UA
118
+ )
119
+ result = json.loads(raw)
120
+ # Returns list. Eiffel Tower way: result[0]['name'] == 'Tour Eiffel'
121
+ # Supports up to 50 IDs: osm_ids=W5013364,N123456,R789
122
+ ```
123
+
124
+ ---
125
+
126
+ ## Nominatim response field reference
127
+
128
+ | Field | Type | Notes |
129
+ |-------|------|-------|
130
+ | `place_id` | int | Internal Nominatim ID — do not cache long-term, can change |
131
+ | `osm_type` | str | `"node"`, `"way"`, or `"relation"` |
132
+ | `osm_id` | int | The OSM element ID |
133
+ | `lat` | **str** | Latitude as string — convert with `float(r['lat'])` |
134
+ | `lon` | **str** | Longitude as string — convert with `float(r['lon'])` |
135
+ | `display_name` | str | Full human-readable address string |
136
+ | `name` | str | Short name of the place |
137
+ | `type` | str | OSM type tag value: `"tower"`, `"administrative"`, `"restaurant"`, etc. |
138
+ | `class` | str | OSM key: `"man_made"`, `"boundary"`, `"amenity"`, `"highway"`, etc. |
139
+ | `addresstype` | str | Semantic category: `"city"`, `"road"`, `"man_made"`, etc. |
140
+ | `place_rank` | int | Hierarchy rank: 4=country, 8=state, 12=city, 16=suburb, 30=POI |
141
+ | `importance` | float | 0–1 relevance score (higher = more notable) |
142
+ | `boundingbox` | list[str] | `[south_lat, north_lat, west_lon, east_lon]` — all strings, note unusual order |
143
+ | `licence` | str | ODbL attribution string — include in user-facing output |
144
+ | `address` | dict | Only present with `&addressdetails=1` or in reverse results |
145
+
146
+ `address` dict common keys: `road`, `house_number`, `quarter`, `suburb`, `city_district`, `city`, `state`, `postcode`, `country`, `country_code`, `ISO3166-2-lvl4/6`.
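
Because `lat`, `lon`, and every `boundingbox` entry arrive as strings, it can help to convert a result once, right after parsing. The function below is a sketch under that assumption; `normalize_place` and the shape of its return value are hypothetical, and the sample dict only repeats values shown earlier in this document.

```python
def normalize_place(result: dict) -> dict:
    """Convert the string-typed coordinate fields of one Nominatim result to floats."""
    # boundingbox order is [south_lat, north_lat, west_lon, east_lon]
    south, north, west, east = (float(v) for v in result["boundingbox"])
    return {
        "lat": float(result["lat"]),
        "lon": float(result["lon"]),
        "bbox": {"south": south, "north": north, "west": west, "east": east},
        # 'address' is only present with addressdetails=1 or in reverse results
        "city": result.get("address", {}).get("city"),
    }

sample = {
    "lat": "48.8582599",
    "lon": "2.2945006",
    "boundingbox": ["48.8574753", "48.8590453", "2.2933119", "2.2956897"],
    "address": {"city": "Paris"},
}
place = normalize_place(sample)
```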
147
+
148
+ ---
149
+
150
+ ## Overpass API: query OSM data by tags
151
+
152
+ Overpass is a read-only query engine over the full OSM planet. It supports finding POIs by tag, radius, bounding box, and combinations.
153
+
154
+ **Endpoint**: `https://overpass-api.de/api/interpreter`
155
+ **Backup instances** (use when main is overloaded, which happens often):
156
+ - `https://overpass.openstreetmap.fr/api/interpreter` — requires non-Mozilla User-Agent
157
+
158
+ **http_get works for GET requests** — pass `headers={"User-Agent": "browser-harness/1.0"}`. For POST, use `urllib` directly (see example below).
159
+
160
+ ### GET query (simplest for http_get)
161
+
162
+ ```python
163
+ import json, urllib.parse
164
+ from helpers import http_get
165
+
166
+ UA = {"User-Agent": "browser-harness/1.0"}
167
+ OVERPASS = "https://overpass.openstreetmap.fr/api/interpreter"
168
+
169
+ def overpass_get(query: str) -> dict:
170
+ url = f"{OVERPASS}?data={urllib.parse.quote(query)}"
171
+ raw = http_get(url, headers=UA)
172
+ if not raw.startswith("{"):
173
+ raise RuntimeError(f"Overpass error (HTML returned): {raw[:200]}")
174
+ return json.loads(raw)
175
+
176
+ # Find cafes in central Paris (bbox: south_lat, west_lon, north_lat, east_lon)
177
+ r = overpass_get('[out:json][timeout:25];node["amenity"="cafe"](48.855,2.295,48.862,2.308);out 10;')
178
+ # r['version'] == 0.6
179
+ # r['generator'] == 'Overpass API 0.7.62.7 375dc00a'
180
+ # r['elements'] → list of matching OSM elements
181
+
182
+ for cafe in r['elements']:
183
+ print(cafe['tags'].get('name'), cafe['lat'], cafe['lon'])
184
+ # 'Café de l\'Alma' 48.8609068 2.3015143
185
+ # 'Le Campanella' 48.8585847 2.3032822
186
+ # 'Kozy Bosquet' 48.855445 2.3054013
187
+
188
+ # Find restaurants within 500m radius of a point (around filter)
189
+ r = overpass_get(
190
+ '[out:json][timeout:25];node["amenity"="restaurant"](around:500,37.7749,-122.4194);out 10;'
191
+ )
192
+ for rest in r['elements']:
193
+ print(rest['tags'].get('name'), rest['tags'].get('cuisine',''))
194
+ # 'Nepalese Indian Cusine' 'indian;nepali'
195
+ # 'Local Diner' 'coffee_shop;italian;burger;seafood'
196
+ # 'Moya Cafe' ''
197
+ ```
198
+
199
+ ### POST query (for complex QL, avoids URL length limits)
200
+
201
+ ```python
202
+ import json, urllib.parse, urllib.request, gzip
203
+ from helpers import http_get # http_get is GET-only; use urllib for POST
204
+
205
+ OVERPASS = "https://overpass.openstreetmap.fr/api/interpreter"
206
+
207
+ def overpass_post(query: str) -> dict:
208
+ """POST to Overpass — no URL length limits, preferred for multi-statement QL."""
209
+ data = urllib.parse.urlencode({"data": query}).encode()
210
+ req = urllib.request.Request(
211
+ OVERPASS, data=data, method="POST",
212
+ headers={
213
+ "User-Agent": "browser-harness/1.0",
214
+ "Content-Type": "application/x-www-form-urlencoded",
215
+ "Accept-Encoding": "gzip",
216
+ }
217
+ )
218
+ with urllib.request.urlopen(req, timeout=30) as r:
219
+ body = r.read()
220
+ if r.headers.get("Content-Encoding") == "gzip":
221
+ body = gzip.decompress(body)
222
+ body = body.decode()
223
+ if not body.startswith("{"):
224
+ raise RuntimeError(f"Overpass error (HTML): {body[:300]}")
225
+ return json.loads(body)
226
+
227
+ # Example: cafes in Paris bbox
228
+ r = overpass_post('[out:json][timeout:25];node["amenity"="cafe"](48.855,2.295,48.862,2.308);out 5;')
229
+ print(len(r['elements'])) # 5 (or up to 5)
230
+ ```
231
+
232
+ ### Overpass element structure
233
+
234
+ Every element in `r['elements']` is a dict with at minimum:
235
+
236
+ ```python
237
+ {
238
+ "type": "node", # "node", "way", or "relation"
239
+ "id": 308684349, # int — OSM element ID (stable, use for dedup)
240
+ "lat": 48.8609068, # float — ONLY present for node type
241
+ "lon": 2.3015143, # float — ONLY present for node type
242
+ "tags": { # dict — all OSM tags on this element
243
+ "amenity": "cafe",
244
+ "name": "Café de l'Alma",
245
+ "name:fr": "Café de l'Alma",
246
+ "outdoor_seating": "yes",
247
+ "payment:credit_cards": "yes",
248
+ "phone": "+33 1 45 51 56 74",
249
+ "opening_hours": "Mo-Sa 08:00-23:00; Su 09:00-19:00", # optional
250
+ "website": "https://...", # optional
251
+ "wheelchair": "yes" # optional
252
+ }
253
+ }
254
+ ```
255
+
256
+ For `way` elements, use `out center;` to get a `center` dict with lat/lon instead of a node list:
257
+
258
+ ```python
259
+ # way element with out center:
260
+ {
261
+ "type": "way",
262
+ "id": 338411946,
263
+ "center": {"lat": 48.8660087, "lon": 2.3153233}, # centroid of the polygon
264
+ "nodes": [3454913623, 3454913707, ...], # node IDs forming the boundary
265
+ "tags": {"amenity": "cafe", "name": "Café 1902", ...}
266
+ }
267
+
268
+ # Query to get both nodes and ways with lat/lon:
269
+ query = '[out:json][timeout:25];(node["amenity"="cafe"](48.85,2.29,48.87,2.32);way["amenity"="cafe"](48.85,2.29,48.87,2.32););out center 20;'
270
+ r = overpass_get(query)
271
+ for el in r['elements']:
272
+ if el['type'] == 'node':
273
+ lat, lon = el['lat'], el['lon']
274
+ else: # way
275
+ lat, lon = el['center']['lat'], el['center']['lon']
276
+ print(el['tags'].get('name'), lat, lon)
277
+ ```
278
+
279
+ ### Overpass QL quick reference
280
+
281
+ ```
282
+ [out:json][timeout:25] # Required header: JSON output, 25s timeout
283
+ [maxsize:52428800] # Optional: 50MB max result size (default is server limit)
284
+
285
+ node["amenity"="cafe"](south,west,north,east);out N;
286
+ # ↑ bbox order: south_lat, west_lon, north_lat, east_lon
287
+ # Note: DIFFERENT from Nominatim's boundingbox field which is [south,north,west,east]
288
+
289
+ node["amenity"="cafe"](around:RADIUS_METERS,LAT,LON);out N;
290
+
291
+ node["amenity"~"cafe|restaurant"](bbox);out N; # regex match on tag value
292
+ node[!"name"](bbox);out N; # elements WITHOUT the 'name' tag
293
+ node["name"~"Star",i](bbox);out N; # case-insensitive regex
294
+
295
+ # Union of types:
296
+ (node["amenity"="cafe"](bbox); way["amenity"="cafe"](bbox););out center N;
297
+
298
+ # Multiple tags (AND logic):
299
+ node["amenity"="cafe"]["outdoor_seating"="yes"](bbox);out N;
300
+ ```
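
The patterns in the quick reference can be generated rather than hand-typed, which keeps the bbox and `around` argument order in one place. This is a minimal sketch: `build_query` is a hypothetical helper, it only covers exact-match tag filters, and it assumes tag keys and values contain no double quotes (nothing is escaped).

```python
def build_query(tags, bbox=None, around=None, limit=50, timeout=25, element="node"):
    """Build a single-statement Overpass QL query string.

    bbox   -- (south_lat, west_lon, north_lat, east_lon)
    around -- (radius_meters, lat, lon)
    """
    # Several ["key"="value"] filters in a row are combined with AND logic
    filters = "".join(f'["{key}"="{value}"]' for key, value in tags.items())
    if around is not None:
        radius, lat, lon = around
        area = f"(around:{radius},{lat},{lon})"
    elif bbox is not None:
        south, west, north, east = bbox
        area = f"({south},{west},{north},{east})"
    else:
        raise ValueError("pass either bbox or around; unbounded queries are too large")
    return f"[out:json][timeout:{timeout}];{element}{filters}{area};out {limit};"

cafes_q = build_query({"amenity": "cafe"}, bbox=(48.855, 2.295, 48.862, 2.308), limit=10)
rests_q = build_query({"amenity": "restaurant"}, around=(500, 37.7749, -122.4194), limit=10)
```

Both strings match the queries used in the GET example above and can be passed to `overpass_get` or `overpass_post`.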
301
+
302
+ ---
303
+
304
+ ## OSM tile server (reference only, no scraping)
305
+
306
+ ```
307
+ https://{a,b,c}.tile.openstreetmap.org/{z}/{x}/{y}.png
308
+ ```
309
+
310
+ - Subdomains `a`, `b`, `c` for load balancing
311
+ - `z` = zoom level 0–19, `x`/`y` = tile coordinates
312
+ - Returns 256×256 PNG tiles
313
+ - Policy: max 2 req/s per IP, non-commercial use, must display OSM attribution
314
+ - Tile coordinate calculator: `https://wiki.openstreetmap.org/wiki/Slippy_map_tilenames`
315
+ - Bulk tile downloading is prohibited — use Overpass or data extracts instead
316
+
317
+ ```python
318
+ # Convert lat/lon to tile coordinates
319
+ import math
320
+
321
+ def lat_lon_to_tile(lat, lon, zoom):
322
+ n = 2 ** zoom
323
+ x = int((lon + 180) / 360 * n)
324
+ y = int((1 - math.log(math.tan(math.radians(lat)) + 1 / math.cos(math.radians(lat))) / math.pi) / 2 * n)
325
+ return x, y
326
+
327
+ x, y = lat_lon_to_tile(48.8582, 2.2945, 14)
328
+ url = f"https://a.tile.openstreetmap.org/14/{x}/{y}.png"
329
+ # url == 'https://a.tile.openstreetmap.org/14/8296/5636.png'
330
+ ```
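
Going the other way, from tile coordinates back to latitude and longitude, uses the inverse of the same Web Mercator formula. The sketch below repeats `lat_lon_to_tile` so it is self-contained; `tile_to_lat_lon` is a name introduced here, and it accepts fractional tile coordinates so that `x + 0.5, y + 0.5` gives the tile center.

```python
import math

def lat_lon_to_tile(lat, lon, zoom):
    """Forward conversion, as in the block above."""
    n = 2 ** zoom
    x = int((lon + 180) / 360 * n)
    y = int((1 - math.log(math.tan(math.radians(lat)) + 1 / math.cos(math.radians(lat))) / math.pi) / 2 * n)
    return x, y

def tile_to_lat_lon(x, y, zoom):
    """Inverse conversion: integer x, y give the tile's north-west corner."""
    n = 2 ** zoom
    lon = x / n * 360 - 180
    lat = math.degrees(math.atan(math.sinh(math.pi * (1 - 2 * y / n))))
    return lat, lon

x, y = lat_lon_to_tile(48.8582, 2.2945, 14)
center_lat, center_lon = tile_to_lat_lon(x + 0.5, y + 0.5, 14)
```

Converting the tile center back with `lat_lon_to_tile` yields the same `(x, y)`, which is a quick sanity check for both functions.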
331
+
332
+ ---
333
+
334
+ ## Rate limits
335
+
336
+ | API | Limit | Enforcement | Response when limited |
337
+ |-----|-------|-------------|--------------|
338
+ | Nominatim | 1 req/s | Soft — rapid requests work but you get delayed/dropped | Returns HTTP 403 if your IP is banned (not 429) |
339
+ | Overpass (main) | 2 concurrent slots per IP | Hard — 3rd concurrent req returns HTML error immediately | HTML error page with `rate_limited` in body |
340
+ | Overpass (main) | Also: query complexity quota | Resets over time (~per hour) | HTML error page with `rate_limited` |
341
+ | Tile server | 2 req/s per IP | Soft/hard | IP block |
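
One way to stay under the Nominatim limit of 1 req/s is a small client-side throttle that sleeps only when the previous call was too recent. The class below is a sketch, not part of `helpers.py`; `Throttle` and its injectable `clock` and `sleep` parameters are assumptions made here so the logic can be exercised without real waiting.

```python
import time

class Throttle:
    """Blocks so that consecutive wait() calls are at least min_interval seconds apart."""

    def __init__(self, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self._min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, None before the first one

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self._min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now

nominatim_throttle = Throttle(min_interval=1.0)
# Call nominatim_throttle.wait() immediately before each Nominatim http_get(...).
```

The first call never sleeps, and later calls sleep only for the part of the interval that has not already elapsed.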
342
+
343
+ **Check your Overpass quota**:
344
+ ```python
345
+ raw = http_get("https://overpass-api.de/api/status", headers={"User-Agent": "browser-harness/1.0"})
346
+ print(raw)
347
+ # Connected as: 1728118854
348
+ # Rate limit: 2
349
+ # 2 slots available now.
350
+ # Slot available after: 2026-04-18T11:00:00Z, in 30 seconds.
351
+ ```
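
The status endpoint returns plain text, so deciding whether to send a query now means pulling the numbers out with regular expressions. The parser below is a sketch that assumes the text keeps the shape of the sample output above; `parse_overpass_status` is a hypothetical helper, and the format is not a documented, stable interface.

```python
import re

def parse_overpass_status(text: str) -> dict:
    """Extract slot information from /api/status text (format assumed from the sample)."""
    rate = re.search(r"Rate limit:\s*(\d+)", text)
    free = re.search(r"(\d+) slots? available now", text)
    # One "Slot available after: ..., in N seconds." line per busy slot
    waits = [int(s) for s in re.findall(r"in (\d+) seconds", text)]
    return {
        "rate_limit": int(rate.group(1)) if rate else None,
        "slots_available": int(free.group(1)) if free else 0,
        "wait_seconds": waits,
    }

sample_status = """Connected as: 1728118854
Rate limit: 2
2 slots available now.
Slot available after: 2026-04-18T11:00:00Z, in 30 seconds.
"""
status = parse_overpass_status(sample_status)
```

When `slots_available` is 0, sleeping for `min(wait_seconds)` before the next query is a reasonable default.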
352
+
353
+ **Handle rate limiting in production**:
354
+ ```python
355
+ import json, time, urllib.parse
356
+
357
+ def overpass_get_with_retry(query: str, max_retries: int = 3) -> dict:
358
+ for attempt in range(max_retries):
359
+ url = f"https://overpass.openstreetmap.fr/api/interpreter?data={urllib.parse.quote(query)}"
360
+ raw = http_get(url, headers={"User-Agent": "browser-harness/1.0"})
361
+ if raw.startswith("{"):
362
+ return json.loads(raw)
363
+ if "rate_limited" in raw or "too busy" in raw:
364
+ wait = 2 ** attempt * 10 # 10s, 20s, 40s
365
+ time.sleep(wait)
366
+ continue
367
+ raise RuntimeError(f"Overpass error: {raw[:200]}")
368
+ raise RuntimeError("Overpass: too many retries")
369
+ ```
370
+
371
+ ---
372
+
373
+ ## Complete working example
374
+
375
+ ```python
376
+ import json, time, urllib.parse, urllib.request, gzip
377
+ from helpers import http_get
378
+
379
+ UA = {"User-Agent": "browser-harness/1.0"}
380
+ NOMINATIM = "https://nominatim.openstreetmap.org"
381
+ OVERPASS = "https://overpass.openstreetmap.fr/api/interpreter"
382
+
383
+ def geocode(query: str, limit: int = 1) -> list[dict]:
384
+ """Forward geocode — returns [] if nothing found."""
385
+ q = urllib.parse.quote(query)
386
+ raw = http_get(f"{NOMINATIM}/search?q={q}&format=json&limit={limit}&addressdetails=1", headers=UA)
387
+ return json.loads(raw)
388
+
389
+ def reverse_geocode(lat: float, lon: float) -> dict:
390
+ """Reverse geocode — always returns a result (nearest road/place)."""
391
+ raw = http_get(f"{NOMINATIM}/reverse?lat={lat}&lon={lon}&format=json", headers=UA)
392
+ return json.loads(raw)
393
+
394
+ def overpass_get(query: str) -> list[dict]:
395
+ """Run an Overpass QL query, return elements list."""
396
+ url = f"{OVERPASS}?data={urllib.parse.quote(query)}"
397
+ raw = http_get(url, headers=UA)
398
+ if not raw.startswith("{"):
399
+ raise RuntimeError(f"Overpass error: {raw[:200]}")
400
+ return json.loads(raw)["elements"]
401
+
402
+ def overpass_post(query: str) -> list[dict]:
403
+ """POST variant — avoids URL length limits for complex queries."""
404
+ data = urllib.parse.urlencode({"data": query}).encode()
405
+ req = urllib.request.Request(
406
+ OVERPASS, data=data, method="POST",
407
+ headers={"User-Agent": "browser-harness/1.0",
408
+ "Content-Type": "application/x-www-form-urlencoded",
409
+ "Accept-Encoding": "gzip"}
410
+ )
411
+ with urllib.request.urlopen(req, timeout=30) as r:
412
+ body = r.read()
413
+ if r.headers.get("Content-Encoding") == "gzip":
414
+ body = gzip.decompress(body)
415
+ body = body.decode()
416
+ if not body.startswith("{"):
417
+ raise RuntimeError(f"Overpass error: {body[:300]}")
418
+ return json.loads(body)["elements"]
419
+
420
+ # --- Usage examples (validated 2026-04-18) ---
421
+
422
+ # 1. Geocode a landmark
423
+ places = geocode("Eiffel Tower", limit=3)
424
+ # places[0]['lat'] == '48.8582599' (string)
425
+ # places[0]['lon'] == '2.2945006' (string)
426
+ # places[0]['display_name'] == 'Tour Eiffel, 5, Avenue Anatole France, ..., 75007, France'
427
+ # places[0]['address']['city'] == 'Paris'
428
+ lat = float(places[0]['lat'])
429
+ lon = float(places[0]['lon'])
430
+
431
+ # 2. Reverse geocode the coordinates
432
+ addr = reverse_geocode(lat, lon)
433
+ # addr['address']['road'] == 'Avenue Gustave Eiffel'
434
+ # addr['address']['city'] == 'Paris'
435
+ # addr['address']['postcode']== '75007'
436
+ # addr['address']['country'] == 'France'
437
+
438
+ # 3. Find nearby cafes (wait 1s between nominatim and overpass if same script)
439
+ time.sleep(1)
440
+ cafes = overpass_get(
441
+ f"[out:json][timeout:25];node[\"amenity\"=\"cafe\"](around:500,{lat},{lon});out 10;"
442
+ )
443
+ for cafe in cafes:
444
+ print(f"{cafe['tags'].get('name','?'):30s} {cafe['lat']:.4f}, {cafe['lon']:.4f}")
445
+ # Café de l'Alma 48.8609, 2.3015
446
+ # Le Campanella 48.8586, 2.3033
447
+
448
+ # 4. Structured city lookup + find restaurants in bounding box
449
+ time.sleep(1)
450
+ paris = geocode("Paris, France")[0]
451
+ bb = paris['boundingbox'] # [south_lat, north_lat, west_lon, east_lon] ← Nominatim order!
452
+ # For Overpass: need (south_lat, west_lon, north_lat, east_lon) ← DIFFERENT order
453
+ south, north, west, east = bb[0], bb[1], bb[2], bb[3]
454
+ # Restrict to center slice to avoid massive result set
455
+ center_bbox = "48.855,2.295,48.865,2.315"
456
+ rests = overpass_post(
457
+ f"[out:json][timeout:25];node[\"amenity\"=\"restaurant\"]({center_bbox});out 5;"
458
+ )
459
+ print(f"Found {len(rests)} restaurants near Paris center")
460
+ ```
461
+
462
+ ---
463
+
464
+ ## Gotchas
465
+
466
+ **`http_get` default UA (`Mozilla/5.0`) is blocked by both APIs.** Always pass `headers={"User-Agent": "browser-harness/1.0"}`. The `headers` kwarg in `http_get` does a `.update()` so it properly overrides the default. Confirmed: Mozilla/5.0 → 403 on Nominatim; `browser-harness/1.0` → 200.
467
+
468
+ **Blocked User-Agent patterns on Nominatim**: `Mozilla/5.0`, `python-requests/*`, `Wget/*`. Accepted: any non-generic app-style UA like `browser-harness/1.0`, `MyApp/2.0`, `curl/7.x`. Nominatim policy requires a descriptive UA with contact info, but in practice any non-library string passes.
469
+
470
+ **Nominatim lat/lon are strings, Overpass lat/lon are floats.** Always convert Nominatim coordinates: `float(result['lat'])`. Overpass element `lat`/`lon` are native Python floats — no conversion needed.
471
+
472
+ **Nominatim `boundingbox` field order is `[south_lat, north_lat, west_lon, east_lon]` — NOT `[south, west, north, east]`.** Overpass bbox uses `(south_lat, west_lon, north_lat, east_lon)`. When feeding a Nominatim bounding box into Overpass, you must reorder: `f"({bb[0]},{bb[2]},{bb[1]},{bb[3]})"`.
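
The reordering described above can be wrapped in a small function so the index shuffle lives in one place. `nominatim_bbox_to_overpass` is a hypothetical name; the sample values are the Paris bounding box from the structured search example.

```python
def nominatim_bbox_to_overpass(bb) -> str:
    """Reorder Nominatim [south, north, west, east] into Overpass (south,west,north,east)."""
    south, north, west, east = bb
    return f"({south},{west},{north},{east})"

paris_bb = ["48.8155755", "48.9021560", "2.2241220", "2.4697602"]
overpass_bbox = nominatim_bbox_to_overpass(paris_bb)
query = f'[out:json][timeout:25];node["amenity"="cafe"]{overpass_bbox};out 10;'
```

The string values from Nominatim can be interpolated directly, because the query is text anyway; no `float()` conversion is needed for this step.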
473
+
474
+ **`overpass-api.de` main instance is frequently overloaded.** Returns HTTP 504 (timeout) or an HTML error page with `rate_limited` when busy. The FR mirror (`overpass.openstreetmap.fr`) is usually more responsive but also blocks `Mozilla/5.0`. Always detect non-JSON responses: `if not raw.startswith("{")`.
475
+
476
+ **Overpass error responses are HTML, not JSON.** The API returns HTTP 200 with an HTML error page when rate-limited or when the server is too busy. Always check `raw.startswith("{")` before parsing.
477
+
478
+ **Overpass rate limit: 2 concurrent slots, NOT 2 requests/s.** You can run 2 queries simultaneously. A 3rd concurrent query immediately returns an error. Sequential queries with no sleep between them work fine as long as each completes before the next starts.
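
When queries run in parallel, a semaphore sized to the slot count keeps the number of in-flight requests at or below two. The sketch below assumes a single process on a single IP; `run_limited` and its `fetch` callback are hypothetical, and `fetch` would typically be `overpass_get` or `overpass_post`.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

def run_limited(fetch, queries, slots=2, max_workers=8):
    """Run fetch(query) for every query with at most `slots` calls in flight."""
    gate = threading.BoundedSemaphore(slots)

    def guarded(query):
        # Blocks here until one of the `slots` permits is free
        with gate:
            return fetch(query)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves the input order of the results
        return list(pool.map(guarded, queries))
```

This does not coordinate between separate processes or machines that share an IP; those would still compete for the same two slots.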
479
+
480
+ **`out N;` limits results to N elements — use it.** Without a limit, large bounding boxes can return thousands of elements and hit the 512MB memory limit, returning a `maxsize` error. Default safe limit: `out 50;` for exploration, `out 500;` for bulk collection.
481
+
482
+ **Overpass QL bbox order is `(south, west, north, east)`, latitude FIRST.** GeoJSON uses the reverse axis order, longitude first: `[west, south, east, north]`. The `around:` filter is `(around:METERS,LAT,LON)`, again with lat before lon.
483
+
484
+ **`name` tag in Overpass is the local-language name.** For Paris cafes this is French. English names may appear under `name:en` but are often absent. Never assume `name` is in English.
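
A simple fallback chain handles the missing-translation case: prefer the requested language, then the local `name`, then a placeholder. `display_name` and the `"(unnamed)"` placeholder are choices made for this sketch, not OSM conventions.

```python
def display_name(tags: dict, lang: str = "en") -> str:
    """Pick name:<lang> if present, else the local name, else a placeholder."""
    return tags.get(f"name:{lang}") or tags.get("name") or "(unnamed)"

cafe_tags = {"amenity": "cafe", "name": "Café de l'Alma", "name:fr": "Café de l'Alma"}
label = display_name(cafe_tags)  # no name:en tag, so the local name is used
```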
485
+
486
+ **Nominatim `/reverse` always returns the nearest result** — it never returns an empty response (unlike `/search`). If the coordinates are in the ocean, it still returns the nearest coastline or country.
487
+
488
+ **`place_id` is internal and ephemeral** — do not store it for long-term use. Use `osm_type` + `osm_id` for stable references (e.g., `way/5013364` for the Eiffel Tower).
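
A stable reference can be built from a result and later turned back into the prefixed ID that `/lookup` expects (`N`, `W`, or `R`, as shown in the lookup section). Both function names below are hypothetical; the `type/id` string format follows the `way/5013364` example above.

```python
_LOOKUP_PREFIX = {"node": "N", "way": "W", "relation": "R"}

def stable_ref(result: dict) -> str:
    """Build a storable reference such as 'way/5013364' from a Nominatim result."""
    return f"{result['osm_type']}/{result['osm_id']}"

def lookup_id(ref: str) -> str:
    """Convert 'way/5013364' into the 'W5013364' form used by the osm_ids parameter."""
    osm_type, osm_id = ref.split("/")
    return f"{_LOOKUP_PREFIX[osm_type]}{int(osm_id)}"

ref = stable_ref({"osm_type": "way", "osm_id": 5013364, "place_id": 123})
lookup_url = f"https://nominatim.openstreetmap.org/lookup?osm_ids={lookup_id(ref)}&format=json"
```

The `place_id` value in the sample dict is a made-up number; it is included only to show that it is ignored.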
489
+
490
+ **Overpass `http_get` POST workaround**: `http_get` only supports GET. For POST requests (needed to avoid URL length limits for complex multi-statement QL), use `urllib.request.Request` directly as shown in the `overpass_post()` example above.