udata-hydra-csvapi 0.1.0.dev209__tar.gz → 0.2.0.dev0__tar.gz

@@ -0,0 +1,229 @@
+ Metadata-Version: 2.1
+ Name: udata-hydra-csvapi
+ Version: 0.2.0.dev0
+ Summary: API for CSV converted by udata-hydra
+ License: MIT
+ Author: data.gouv.fr
+ Author-email: opendatateam@data.gouv.fr
+ Requires-Python: >=3.11,<4.0
+ Classifier: License :: OSI Approved :: MIT License
+ Classifier: Programming Language :: Python :: 3
+ Classifier: Programming Language :: Python :: 3.11
+ Classifier: Programming Language :: Python :: 3.12
+ Requires-Dist: aiohttp (>=3.8.4,<4.0.0)
+ Requires-Dist: aiohttp-cors (==0.7.0)
+ Requires-Dist: aiohttp-swagger (==1.0.16)
+ Requires-Dist: sentry-sdk (>=2.13.0,<3.0.0)
+ Description-Content-Type: text/markdown
+
+ # Api-tabular
+
+ This connects to [hydra](https://github.com/datagouv/hydra) and serves the converted CSVs as an API.
+
+ ## Run locally
+
+ Start [hydra](https://github.com/datagouv/hydra) via `docker compose`.
+
+ Launch this project:
+
+ ```shell
+ docker compose up
+ ```
+
+ You can now access the raw postgrest API at http://localhost:8080.
+
+ Then launch the proxy (i.e. the app):
+
+ ```shell
+ poetry install
+ poetry run adev runserver -p8005 api_tabular/app.py # API for CSV files apified by udata-hydra
+ poetry run adev runserver -p8005 api_tabular/metrics.py # API for udata's metrics (also on port 8005: run one app at a time)
+ ```
+
+ Query postgrest through the proxy using a `resource_id`, as shown below. The test `resource_id` is `aaaaaaaa-1111-bbbb-2222-cccccccccccc`.
+
+ ## API
+
+ ### Metadata for a resource
+
+ ```shell
+ curl http://localhost:8005/api/resources/aaaaaaaa-1111-bbbb-2222-cccccccccccc/
+ ```
+
+ ```json
+ {
+   "created_at": "2023-04-21T22:54:22.043492+00:00",
+   "url": "https://data.gouv.fr/datasets/example/resources/fake.csv",
+   "links": [
+     {
+       "href": "/api/resources/aaaaaaaa-1111-bbbb-2222-cccccccccccc/profile/",
+       "type": "GET",
+       "rel": "profile"
+     },
+     {
+       "href": "/api/resources/aaaaaaaa-1111-bbbb-2222-cccccccccccc/data/",
+       "type": "GET",
+       "rel": "data"
+     },
+     {
+       "href": "/api/resources/aaaaaaaa-1111-bbbb-2222-cccccccccccc/swagger/",
+       "type": "GET",
+       "rel": "swagger"
+     }
+   ]
+ }
+ ```
+
+ ### Profile (csv-detective output) for a resource
+
+ ```shell
+ curl http://localhost:8005/api/resources/aaaaaaaa-1111-bbbb-2222-cccccccccccc/profile/
+ ```
+
+ ```json
+ {
+   "profile": {
+     "header": [
+       "id",
+       "score",
+       "decompte",
+       "is_true",
+       "birth",
+       "liste"
+     ]
+   },
+   "...": "..."
+ }
+ ```
+
+ ### Data for a resource (i.e. the resource API)
+
+ ```shell
+ curl http://localhost:8005/api/resources/aaaaaaaa-1111-bbbb-2222-cccccccccccc/data/
+ ```
+
+ ```json
+ {
+   "data": [
+     {
+       "__id": 1,
+       "id": " 8c7a6452-9295-4db2-b692-34104574fded",
+       "score": 0.708,
+       "decompte": 90,
+       "is_true": false,
+       "birth": "1949-07-16",
+       "liste": "[0]"
+     },
+     ...
+   ],
+   "links": {
+     "profile": "http://localhost:8005/api/resources/aaaaaaaa-1111-bbbb-2222-cccccccccccc/profile/",
+     "swagger": "http://localhost:8005/api/resources/aaaaaaaa-1111-bbbb-2222-cccccccccccc/swagger/",
+     "next": "http://localhost:8005/api/resources/aaaaaaaa-1111-bbbb-2222-cccccccccccc/data/?page=2&page_size=20",
+     "prev": null
+   },
+   "meta": {
+     "page": 1,
+     "page_size": 20,
+     "total": 1000
+   }
+ }
+ ```
+
+ This endpoint accepts the following operators in the query string (replace `column_name` with the name of an actual column):
+
+ ```
+ # sort by column
+ column_name__sort=asc
+ column_name__sort=desc
+
+ # exact value
+ column_name__exact=value
+
+ # differs (not equal to value)
+ column_name__differs=value
+
+ # contains (strings only)
+ column_name__contains=value
+
+ # in (value in list)
+ column_name__in=value1,value2,value3
+
+ # less than or equal to
+ column_name__less=value
+
+ # greater than or equal to
+ column_name__greater=value
+
+ # strictly less than
+ column_name__strictly_less=value
+
+ # strictly greater than
+ column_name__strictly_greater=value
+ ```
+
+ For instance (quote the URL so the shell does not interpret `&`):
+ ```shell
+ curl "http://localhost:8005/api/resources/aaaaaaaa-1111-bbbb-2222-cccccccccccc/data/?score__greater=0.9&decompte__exact=13"
+ ```
+ returns
+ ```json
+ {
+   "data": [
+     {
+       "__id": 52,
+       "id": " 5174f26d-d62b-4adb-a43a-c3b6288fa2f6",
+       "score": 0.985,
+       "decompte": 13,
+       "is_true": false,
+       "birth": "1980-03-23",
+       "liste": "[0]"
+     },
+     {
+       "__id": 543,
+       "id": " 8705df7c-8a6a-49e2-9514-cf2fb532525e",
+       "score": 0.955,
+       "decompte": 13,
+       "is_true": true,
+       "birth": "1965-02-06",
+       "liste": "[0, 1, 2]"
+     }
+   ],
+   "links": {
+     "profile": "http://localhost:8005/api/resources/aaaaaaaa-1111-bbbb-2222-cccccccccccc/profile/",
+     "swagger": "http://localhost:8005/api/resources/aaaaaaaa-1111-bbbb-2222-cccccccccccc/swagger/",
+     "next": null,
+     "prev": null
+   },
+   "meta": {
+     "page": 1,
+     "page_size": 20,
+     "total": 2
+   }
+ }
+ ```
+
+ Pagination is controlled with the `page` and `page_size` query parameters (by default, `page_size` is 20 and may not exceed 50):
+ ```shell
+ curl "http://localhost:8005/api/resources/aaaaaaaa-1111-bbbb-2222-cccccccccccc/data/?page=2&page_size=30"
+ ```
+
+
+ ## Contributing
+
+ ### Pre-commit hook
+
+ This repository uses a [pre-commit](https://pre-commit.com/) hook that lints and formats code before each commit.
+ Please install it with:
+ ```shell
+ poetry run pre-commit install
+ ```
+
+ ### Lint and format code
+
+ To lint, format and sort imports, this repository uses [Ruff](https://astral.sh/ruff/).
+ You can run the following command to lint and format the code:
+ ```shell
+ poetry run ruff check --fix && poetry run ruff format
+ ```
+
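A minimal client-side sketch (Python, standard library only) of how the operator syntax above composes into a query string. `build_data_url` and `BASE_URL` are hypothetical names, not part of api_tabular; the sketch assumes the proxy listens on http://localhost:8005 as in the README and keeps commas unescaped in `__in` lists.

```python
from urllib.parse import urlencode

# Assumption: the proxy runs locally on port 8005, as in the README.
BASE_URL = "http://localhost:8005/api/resources"


def build_data_url(resource_id, filters=(), page=None, page_size=None):
    """Hypothetical helper: build a /data/ URL from (column, operator, value) triples.

    Each triple becomes `column__operator=value`; list values are joined with
    commas, which is the format the `__in` operator expects.
    """
    params = []
    for column, operator, value in filters:
        if isinstance(value, (list, tuple)):
            value = ",".join(str(v) for v in value)
        params.append((f"{column}__{operator}", value))
    if page is not None:
        params.append(("page", page))
    if page_size is not None:
        params.append(("page_size", page_size))
    url = f"{BASE_URL}/{resource_id}/data/"
    # safe="," keeps the comma separators of `__in` lists readable
    return f"{url}?{urlencode(params, safe=',')}" if params else url


print(build_data_url(
    "aaaaaaaa-1111-bbbb-2222-cccccccccccc",
    [("score", "greater", 0.9), ("decompte", "exact", 13)],
))
```

The printed URL is the one used in the filtering example above; passing it to `curl` inside quotes keeps the shell from treating `&` as a control operator.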
@@ -1,8 +1,7 @@
  import os
-
  from pathlib import Path
 
- import toml
+ import tomllib
 
 
  class Configurator:
@@ -16,15 +15,23 @@ class Configurator:
 
      def configure(self):
          # load default settings
-         configuration = toml.load(Path(__file__).parent / "config_default.toml")
-         if "POSTGREST_ENDPOINT" in os.environ:
-             configuration["PG_RST_URL"] = f"http://{os.getenv('POSTGREST_ENDPOINT')}"
+         with open(Path(__file__).parent / "config_default.toml", "rb") as f:
+             configuration = tomllib.load(f)
 
-         configuration["PG_RST_URL"]
          # override with local settings
          local_settings = os.environ.get("CSVAPI_SETTINGS", Path.cwd() / "config.toml")
          if Path(local_settings).exists():
-             configuration.update(toml.load(local_settings))
+             with open(local_settings, "rb") as f:
+                 configuration.update(tomllib.load(f))
+
+         # override with os env settings
+         for config_key in configuration:
+             if config_key in os.environ:
+                 configuration[config_key] = os.getenv(config_key)
+
+         # Make sure PGREST_ENDPOINT has a scheme
+         if not configuration["PGREST_ENDPOINT"].startswith("http"):
+             configuration["PGREST_ENDPOINT"] = f"http://{configuration['PGREST_ENDPOINT']}"
 
          self.configuration = configuration
          self.check()
@@ -0,0 +1,187 @@
+ import os
+
+ import aiohttp_cors
+ import sentry_sdk
+ from aiohttp import ClientSession, web
+ from aiohttp_swagger import setup_swagger
+ from sentry_sdk.integrations.aiohttp import AioHttpIntegration
+
+ from api_tabular import config
+ from api_tabular.error import QueryException
+ from api_tabular.query import (
+     get_resource,
+     get_resource_data,
+     get_resource_data_streamed,
+ )
+ from api_tabular.utils import (
+     build_link_with_page,
+     build_sql_query_string,
+     build_swagger_file,
+     url_for,
+ )
+
+ routes = web.RouteTableDef()
+
+ sentry_sdk.init(
+     dsn=config.SENTRY_DSN,
+     integrations=[AioHttpIntegration()],
+     traces_sample_rate=1.0,
+ )
+
+
+ @routes.get(r"/api/resources/{rid}/", name="meta")
+ async def resource_meta(request):
+     resource_id = request.match_info["rid"]
+     resource = await get_resource(request.app["csession"], resource_id, ["created_at", "url"])
+     return web.json_response(
+         {
+             "created_at": resource["created_at"],
+             "url": resource["url"],
+             "links": [
+                 {
+                     "href": url_for(request, "profile", rid=resource_id, _external=True),
+                     "type": "GET",
+                     "rel": "profile",
+                 },
+                 {
+                     "href": url_for(request, "data", rid=resource_id, _external=True),
+                     "type": "GET",
+                     "rel": "data",
+                 },
+                 {
+                     "href": url_for(request, "swagger", rid=resource_id, _external=True),
+                     "type": "GET",
+                     "rel": "swagger",
+                 },
+             ],
+         }
+     )
+
+
+ @routes.get(r"/api/resources/{rid}/profile/", name="profile")
+ async def resource_profile(request):
+     resource_id = request.match_info["rid"]
+     resource = await get_resource(request.app["csession"], resource_id, ["profile:csv_detective"])
+     return web.json_response(resource)
+
+
+ @routes.get(r"/api/resources/{rid}/swagger/", name="swagger")
+ async def resource_swagger(request):
+     resource_id = request.match_info["rid"]
+     resource = await get_resource(request.app["csession"], resource_id, ["profile:csv_detective"])
+     swagger_string = build_swagger_file(resource["profile"]["columns"], resource_id)
+     return web.Response(body=swagger_string)
+
+
+ @routes.get(r"/api/resources/{rid}/data/", name="data")
+ async def resource_data(request):
+     resource_id = request.match_info["rid"]
+     query_string = request.query_string.split("&") if request.query_string else []
+     page = int(request.query.get("page", "1"))
+     page_size = int(request.query.get("page_size", config.PAGE_SIZE_DEFAULT))
+
+     if page_size > config.PAGE_SIZE_MAX:
+         raise QueryException(
+             400,
+             None,
+             "Invalid query string",
+             f"Page size exceeds allowed maximum: {config.PAGE_SIZE_MAX}",
+         )
+     if page > 1:
+         offset = page_size * (page - 1)
+     else:
+         offset = 0
+
+     try:
+         sql_query = build_sql_query_string(query_string, page_size, offset)
+     except ValueError:
+         raise QueryException(400, None, "Invalid query string", "Malformed query")
+
+     resource = await get_resource(request.app["csession"], resource_id, ["parsing_table"])
+     response, total = await get_resource_data(request.app["csession"], resource, sql_query)
+
+     next = build_link_with_page(request, query_string, page + 1, page_size)
+     prev = build_link_with_page(request, query_string, page - 1, page_size)
+     body = {
+         "data": response,
+         "links": {
+             "profile": url_for(request, "profile", rid=resource_id, _external=True),
+             "swagger": url_for(request, "swagger", rid=resource_id, _external=True),
+             "next": next if page_size + offset < total else None,
+             "prev": prev if page > 1 else None,
+         },
+         "meta": {"page": page, "page_size": page_size, "total": total},
+     }
+     return web.json_response(body)
+
+
+ @routes.get(r"/api/resources/{rid}/data/csv/", name="csv")
+ async def resource_data_csv(request):
+     resource_id = request.match_info["rid"]
+     query_string = request.query_string.split("&") if request.query_string else []
+
+     try:
+         sql_query = build_sql_query_string(query_string)
+     except ValueError:
+         raise QueryException(400, None, "Invalid query string", "Malformed query")
+
+     resource = await get_resource(request.app["csession"], resource_id, ["parsing_table"])
+
+     response_headers = {
+         "Content-Disposition": f'attachment; filename="{resource_id}.csv"',
+         "Content-Type": "text/csv",
+     }
+     response = web.StreamResponse(headers=response_headers)
+     await response.prepare(request)
+
+     async for chunk in get_resource_data_streamed(request.app["csession"], resource, sql_query):
+         await response.write(chunk)
+
+     await response.write_eof()
+     return response
+
+
+ @routes.get(r"/health/")
+ async def get_health(request):
+     return web.HTTPOk()
+
+
+ async def app_factory():
+     async def on_startup(app):
+         app["csession"] = ClientSession()
+
+     async def on_cleanup(app):
+         await app["csession"].close()
+
+     app = web.Application()
+     app.add_routes(routes)
+     app.on_startup.append(on_startup)
+     app.on_cleanup.append(on_cleanup)
+
+     cors = aiohttp_cors.setup(
+         app,
+         defaults={
+             "*": aiohttp_cors.ResourceOptions(
+                 allow_credentials=True, expose_headers="*", allow_headers="*"
+             )
+         },
+     )
+     for route in list(app.router.routes()):
+         cors.add(route)
+
+     setup_swagger(
+         app,
+         swagger_url=config.DOC_PATH,
+         ui_version=3,
+         swagger_from_file="ressource_app_swagger.yaml",
+     )
+
+     return app
+
+
+ def run():
+     web.run_app(app_factory(), path=os.environ.get("CSVAPI_APP_SOCKET_PATH"))
+
+
+ if __name__ == "__main__":
+     run()
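`resource_data` rejects any `page_size` above `PAGE_SIZE_MAX`, computes `offset = page_size * (page - 1)` (0 for the first page), and only emits `links.next` when `page_size + offset < total` and `links.prev` when `page > 1`. The pure-Python sketch below restates those rules outside aiohttp so they can be checked against the README examples; `paginate` and `PageInfo` are hypothetical names, and a `ValueError` stands in for the handler's `QueryException(400, ...)`.

```python
from dataclasses import dataclass


@dataclass
class PageInfo:
    offset: int
    has_next: bool
    has_prev: bool


def paginate(page: int, page_size: int, total: int, page_size_max: int = 50) -> PageInfo:
    """Mirror the offset / next / prev rules of resource_data (sketch)."""
    if page_size > page_size_max:
        # resource_data raises QueryException(400, ...) at this point
        raise ValueError(f"Page size exceeds allowed maximum: {page_size_max}")
    offset = page_size * (page - 1) if page > 1 else 0
    return PageInfo(
        offset=offset,
        has_next=page_size + offset < total,  # rows remain after this page
        has_prev=page > 1,
    )
```

With the README's numbers, `paginate(1, 20, 1000)` has a next page (20 < 1000), while `paginate(1, 20, 2)` does not, which matches the `"next": null` in the filtered example.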
@@ -0,0 +1,8 @@
+ PGREST_ENDPOINT = "http://localhost:8080"
+ SERVER_NAME = "localhost:8005"
+ SCHEME = "http"
+ SENTRY_DSN = ""
+ PAGE_SIZE_DEFAULT = 20
+ PAGE_SIZE_MAX = 50
+ BATCH_SIZE = 50000
+ DOC_PATH = "/api/doc"
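With `PAGE_SIZE_MAX = 50` as the default cap, reading a whole resource through `/data/` means following `links.next` until it is `null`. A minimal standard-library client sketch, assuming the JSON shape shown in the README; `iter_rows` and `fetch_json` are hypothetical names, and `fetch` is injectable so the loop can be exercised without a running server.

```python
import json
from typing import Callable, Iterator
from urllib.request import urlopen


def fetch_json(url: str) -> dict:
    """GET a URL and decode its JSON body."""
    with urlopen(url) as response:
        return json.load(response)


def iter_rows(first_url: str, fetch: Callable[[str], dict] = fetch_json) -> Iterator[dict]:
    """Yield every row of a /data/ listing, following links.next until it is null."""
    url = first_url
    while url:
        body = fetch(url)
        yield from body["data"]
        url = body["links"]["next"]
```

For example, `iter_rows("http://localhost:8005/api/resources/aaaaaaaa-1111-bbbb-2222-cccccccccccc/data/?page_size=50")` walks the test resource 50 rows at a time. For full exports, the `/data/csv/` route added in this release streams the filtered rows as a CSV attachment instead; since it calls `build_sql_query_string` without page arguments, it appears not to be paginated.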