PyPI - udata-hydra-csvapi - Versions diffs - 0.1.0.dev209__tar.gz → 0.2.0.dev0__tar.gz - Mend

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (18)

udata_hydra_csvapi-0.2.0.dev0/PKG-INFO ADDED

@@ -0,0 +1,229 @@
+Metadata-Version: 2.1
+Name: udata-hydra-csvapi
+Version: 0.2.0.dev0
+Summary: API for CSV converted by udata-hydra
+License: MIT
+Author: data.gouv.fr
+Author-email: opendatateam@data.gouv.fr
+Requires-Python: >=3.11,<4.0
+Classifier: License :: OSI Approved :: MIT License
+Classifier: Programming Language :: Python :: 3
+Classifier: Programming Language :: Python :: 3.11
+Classifier: Programming Language :: Python :: 3.12
+Requires-Dist: aiohttp (>=3.8.4,<4.0.0)
+Requires-Dist: aiohttp-cors (==0.7.0)
+Requires-Dist: aiohttp-swagger (==1.0.16)
+Requires-Dist: sentry-sdk (>=2.13.0,<3.0.0)
+Description-Content-Type: text/markdown
+# Api-tabular
+This connects to [hydra](https://github.com/datagouv/hydra) and serves the converted CSVs as an API.
+## Run locally
+Start [hydra](https://github.com/datagouv/hydra) via `docker compose`.
+Launch this project:
+```shell
+docker compose up
+```
+You can now access the raw postgrest API on http://localhost:8080.
+Now you can launch the proxy (ie the app):
+```shell
+poetry install
+poetry run adev runserver -p8005 api_tabular/app.py        # Api related to apified CSV files by udata-hydra
+poetry run adev runserver -p8005 api_tabular/metrics.py    # Api related to udata's metrics
+```
+And query postgrest via the proxy using a `resource_id`, cf below. Test resource_id is `aaaaaaaa-1111-bbbb-2222-cccccccccccc`
+## API
+### Meta informations on resource
+```shell
+curl http://localhost:8005/api/resources/aaaaaaaa-1111-bbbb-2222-cccccccccccc/
+```
+```json
+{
+  "created_at": "2023-04-21T22:54:22.043492+00:00",
+  "url": "https://data.gouv.fr/datasets/example/resources/fake.csv",
+  "links": [
+    {
+      "href": "/api/resources/aaaaaaaa-1111-bbbb-2222-cccccccccccc/profile/",
+      "type": "GET",
+      "rel": "profile"
+    },
+    {
+      "href": "/api/resources/aaaaaaaa-1111-bbbb-2222-cccccccccccc/data/",
+      "type": "GET",
+      "rel": "data"
+    },
+    {
+      "href": "/api/resources/aaaaaaaa-1111-bbbb-2222-cccccccccccc/swagger/",
+      "type": "GET",
+      "rel": "swagger"
+    }
+  ]
+}
+```
+### Profile (csv-detective output) for a resource
+```shell
+curl http://localhost:8005/api/resources/aaaaaaaa-1111-bbbb-2222-cccccccccccc/profile/
+```
+```json
+{
+  "profile": {
+    "header": [
+        "id",
+        "score",
+        "decompte",
+        "is_true",
+        "birth",
+        "liste"
+    ]
+  },
+  "...": "..."
+}
+```
+### Data for a resource (ie resource API)
+```shell
+curl http://localhost:8005/api/resources/aaaaaaaa-1111-bbbb-2222-cccccccccccc/data/
+```
+```json
+{
+  "data": [
+    {
+        "__id": 1,
+        "id": " 8c7a6452-9295-4db2-b692-34104574fded",
+        "score": 0.708,
+        "decompte": 90,
+        "is_true": false,
+        "birth": "1949-07-16",
+        "liste": "[0]"
+    },
+    ...
+  ],
+  "links": {
+      "profile": "http://localhost:8005/api/resources/aaaaaaaa-1111-bbbb-2222-cccccccccccc/profile/",
+      "swagger": "http://localhost:8005/api/resources/aaaaaaaa-1111-bbbb-2222-cccccccccccc/swagger/",
+      "next": "http://localhost:8005/api/resources/aaaaaaaa-1111-bbbb-2222-cccccccccccc/data/?page=2&page_size=20",
+      "prev": null
+  },
+  "meta": {
+      "page": 1,
+      "page_size": 20,
+      "total": 1000
+  }
+}
+```
+This endpoint can be queried with the following operators as query string (replacing `column_name` with the name of an actual column):
+```
+# sort by column
+column_name__sort=asc
+column_name__sort=desc
+# exact value
+column_name__exact=value
+# differs
+column_name__differs=value
+# contains (for strings only)
+column_name__contains=value
+# in (value in list)
+column_name__in=value1,value2,value3
+# less
+column_name__less=value
+# greater
+column_name__greater=value
+# strictly less
+column_name__strictly_less=value
+# strictly greater
+column_name__strictly_greater=value
+```
+For instance:
+```shell
+curl http://localhost:8005/api/resources/aaaaaaaa-1111-bbbb-2222-cccccccccccc/data/?score__greater=0.9&decompte__exact=13
+```
+returns
+```json
+{
+  "data": [
+    {
+      "__id": 52,
+      "id": " 5174f26d-d62b-4adb-a43a-c3b6288fa2f6",
+      "score": 0.985,
+      "decompte": 13,
+      "is_true": false,
+      "birth": "1980-03-23",
+      "liste": "[0]"
+    },
+    {
+      "__id": 543,
+      "id": " 8705df7c-8a6a-49e2-9514-cf2fb532525e",
+      "score": 0.955,
+      "decompte": 13,
+      "is_true": true,
+      "birth": "1965-02-06",
+      "liste": "[0, 1, 2]"
+    }
+  ],
+  "links": {
+    "profile": "http://localhost:8005/api/resources/aaaaaaaa-1111-bbbb-2222-cccccccccccc/profile/",
+    "swagger": "http://localhost:8005/api/resources/aaaaaaaa-1111-bbbb-2222-cccccccccccc/swagger/",
+    "next": null,
+    "prev": null
+  },
+  "meta": {
+    "page": 1,
+    "page_size": 20,
+    "total": 2
+  }
+}
+```
+Pagination is made through queries with `page` and `page_size`:
+```shell
+curl http://localhost:8005/api/resources/aaaaaaaa-1111-bbbb-2222-cccccccccccc/data/?page=2&page_size=30
+```
+## Contributing
+### Pre-commit hook
+This repository uses a [pre-commit](https://pre-commit.com/) hook which lint and format code before each commit.
+Please install it with:
+```shell
+poetry run pre-commit install
+```
+### Lint and format code
+To lint, format and sort imports, this repository uses [Ruff](https://astral.sh/ruff/).
+You can run the following command to lint and format the code:
+```shell
+poetry run ruff check --fix && poetry run ruff format
+```
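
Note on the paginated responses documented above: each `data` response carries a `links.next` URL that is `null` on the last page, so a client can walk a whole resource by following it. The sketch below is only an illustration under that assumption. `iter_rows` and `fake_fetch` are hypothetical names that are not part of the package, and `fake_fetch` stands in for a real HTTP call that returns the decoded JSON body.

```python
from typing import Callable, Iterator


def iter_rows(first_url: str, fetch: Callable[[str], dict]) -> Iterator[dict]:
    """Yield every row by following `links.next` until it is null."""
    url = first_url
    while url is not None:
        page = fetch(url)  # hypothetical: returns the decoded JSON body
        yield from page["data"]
        url = page["links"]["next"]  # None on the last page


# Stand-in for a real HTTP client, shaped like the documented response.
def fake_fetch(url: str) -> dict:
    pages = {
        "/data/": {
            "data": [{"__id": 1}, {"__id": 2}],
            "links": {"next": "/data/?page=2"},
        },
        "/data/?page=2": {
            "data": [{"__id": 3}],
            "links": {"next": None},
        },
    }
    return pages[url]


rows = list(iter_rows("/data/", fake_fetch))
```

Passing the fetch function in as a parameter keeps the loop independent of any particular HTTP library.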

udata_hydra_csvapi-0.2.0.dev0/README.md ADDED

@@ -0,0 +1,210 @@
+# Api-tabular
+This connects to [hydra](https://github.com/datagouv/hydra) and serves the converted CSVs as an API.
+## Run locally
+Start [hydra](https://github.com/datagouv/hydra) via `docker compose`.
+Launch this project:
+```shell
+docker compose up
+```
+You can now access the raw postgrest API on http://localhost:8080.
+Now you can launch the proxy (ie the app):
+```shell
+poetry install
+poetry run adev runserver -p8005 api_tabular/app.py        # Api related to apified CSV files by udata-hydra
+poetry run adev runserver -p8005 api_tabular/metrics.py    # Api related to udata's metrics
+```
+And query postgrest via the proxy using a `resource_id`, cf below. Test resource_id is `aaaaaaaa-1111-bbbb-2222-cccccccccccc`
+## API
+### Meta informations on resource
+```shell
+curl http://localhost:8005/api/resources/aaaaaaaa-1111-bbbb-2222-cccccccccccc/
+```
+```json
+{
+  "created_at": "2023-04-21T22:54:22.043492+00:00",
+  "url": "https://data.gouv.fr/datasets/example/resources/fake.csv",
+  "links": [
+    {
+      "href": "/api/resources/aaaaaaaa-1111-bbbb-2222-cccccccccccc/profile/",
+      "type": "GET",
+      "rel": "profile"
+    },
+    {
+      "href": "/api/resources/aaaaaaaa-1111-bbbb-2222-cccccccccccc/data/",
+      "type": "GET",
+      "rel": "data"
+    },
+    {
+      "href": "/api/resources/aaaaaaaa-1111-bbbb-2222-cccccccccccc/swagger/",
+      "type": "GET",
+      "rel": "swagger"
+    }
+  ]
+}
+```
+### Profile (csv-detective output) for a resource
+```shell
+curl http://localhost:8005/api/resources/aaaaaaaa-1111-bbbb-2222-cccccccccccc/profile/
+```
+```json
+{
+  "profile": {
+    "header": [
+        "id",
+        "score",
+        "decompte",
+        "is_true",
+        "birth",
+        "liste"
+    ]
+  },
+  "...": "..."
+}
+```
+### Data for a resource (ie resource API)
+```shell
+curl http://localhost:8005/api/resources/aaaaaaaa-1111-bbbb-2222-cccccccccccc/data/
+```
+```json
+{
+  "data": [
+    {
+        "__id": 1,
+        "id": " 8c7a6452-9295-4db2-b692-34104574fded",
+        "score": 0.708,
+        "decompte": 90,
+        "is_true": false,
+        "birth": "1949-07-16",
+        "liste": "[0]"
+    },
+    ...
+  ],
+  "links": {
+      "profile": "http://localhost:8005/api/resources/aaaaaaaa-1111-bbbb-2222-cccccccccccc/profile/",
+      "swagger": "http://localhost:8005/api/resources/aaaaaaaa-1111-bbbb-2222-cccccccccccc/swagger/",
+      "next": "http://localhost:8005/api/resources/aaaaaaaa-1111-bbbb-2222-cccccccccccc/data/?page=2&page_size=20",
+      "prev": null
+  },
+  "meta": {
+      "page": 1,
+      "page_size": 20,
+      "total": 1000
+  }
+}
+```
+This endpoint can be queried with the following operators as query string (replacing `column_name` with the name of an actual column):
+```
+# sort by column
+column_name__sort=asc
+column_name__sort=desc
+# exact value
+column_name__exact=value
+# differs
+column_name__differs=value
+# contains (for strings only)
+column_name__contains=value
+# in (value in list)
+column_name__in=value1,value2,value3
+# less
+column_name__less=value
+# greater
+column_name__greater=value
+# strictly less
+column_name__strictly_less=value
+# strictly greater
+column_name__strictly_greater=value
+```
+For instance:
+```shell
+curl http://localhost:8005/api/resources/aaaaaaaa-1111-bbbb-2222-cccccccccccc/data/?score__greater=0.9&decompte__exact=13
+```
+returns
+```json
+{
+  "data": [
+    {
+      "__id": 52,
+      "id": " 5174f26d-d62b-4adb-a43a-c3b6288fa2f6",
+      "score": 0.985,
+      "decompte": 13,
+      "is_true": false,
+      "birth": "1980-03-23",
+      "liste": "[0]"
+    },
+    {
+      "__id": 543,
+      "id": " 8705df7c-8a6a-49e2-9514-cf2fb532525e",
+      "score": 0.955,
+      "decompte": 13,
+      "is_true": true,
+      "birth": "1965-02-06",
+      "liste": "[0, 1, 2]"
+    }
+  ],
+  "links": {
+    "profile": "http://localhost:8005/api/resources/aaaaaaaa-1111-bbbb-2222-cccccccccccc/profile/",
+    "swagger": "http://localhost:8005/api/resources/aaaaaaaa-1111-bbbb-2222-cccccccccccc/swagger/",
+    "next": null,
+    "prev": null
+  },
+  "meta": {
+    "page": 1,
+    "page_size": 20,
+    "total": 2
+  }
+}
+```
+Pagination is made through queries with `page` and `page_size`:
+```shell
+curl http://localhost:8005/api/resources/aaaaaaaa-1111-bbbb-2222-cccccccccccc/data/?page=2&page_size=30
+```
+## Contributing
+### Pre-commit hook
+This repository uses a [pre-commit](https://pre-commit.com/) hook which lint and format code before each commit.
+Please install it with:
+```shell
+poetry run pre-commit install
+```
+### Lint and format code
+To lint, format and sort imports, this repository uses [Ruff](https://astral.sh/ruff/).
+You can run the following command to lint and format the code:
+```shell
+poetry run ruff check --fix && poetry run ruff format
+```
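
Note on the filter operators documented in this README: each filter is a `column_name__operator=value` pair in the query string. A minimal sketch of assembling such a URL from Python follows. `build_data_url` is a hypothetical helper, not something the package provides, and the base URL is the test resource from the README.

```python
from urllib.parse import urlencode

BASE = "http://localhost:8005/api/resources/aaaaaaaa-1111-bbbb-2222-cccccccccccc/data/"


def build_data_url(base, filters, page=None, page_size=None):
    """Build a data URL from (column, operator, value) triples.

    Hypothetical helper: it only formats the `column__operator=value`
    convention described in the README and does not validate operators.
    """
    params = [(f"{column}__{operator}", str(value)) for column, operator, value in filters]
    if page is not None:
        params.append(("page", str(page)))
    if page_size is not None:
        params.append(("page_size", str(page_size)))
    return f"{base}?{urlencode(params)}" if params else base


url = build_data_url(BASE, [("score", "greater", 0.9), ("decompte", "exact", 13)])
print(url)
```

When such a URL is pasted into a shell command, it should be quoted, since an unquoted `&` is interpreted by the shell rather than passed to `curl`.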

{udata_hydra_csvapi-0.1.0.dev209 → udata_hydra_csvapi-0.2.0.dev0}/api_tabular/__init__.py RENAMED

@@ -1,8 +1,7 @@
 import os
 from pathlib import Path
-import toml
+import tomllib
 class Configurator:
@@ -16,15 +15,23 @@ class Configurator:
     def configure(self):
         # load default settings
-        configuration = toml.load(Path(__file__).parent / "config_default.toml")
-        if "POSTGREST_ENDPOINT" in os.environ:
-            configuration["PG_RST_URL"] = f"http://{os.getenv('POSTGREST_ENDPOINT')}"
+        with open(Path(__file__).parent / "config_default.toml", "rb") as f:
+            configuration = tomllib.load(f)
-        configuration["PG_RST_URL"]
         # override with local settings
         local_settings = os.environ.get("CSVAPI_SETTINGS", Path.cwd() / "config.toml")
         if Path(local_settings).exists():
-            configuration.update(toml.load(local_settings))
+            with open(local_settings, "rb") as f:
+                configuration.update(tomllib.load(f))
+        # override with os env settings
+        for config_key in configuration:
+            if config_key in os.environ:
+                configuration[config_key] = os.getenv(config_key)
+        # Make sure PGREST_ENDPOINT has a scheme
+        if not configuration["PGREST_ENDPOINT"].startswith("http"):
+            configuration["PGREST_ENDPOINT"] = f"http://{configuration['PGREST_ENDPOINT']}"
         self.configuration = configuration
         self.check()
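
Note on the new override order in `configure()`: defaults are loaded first, then the local settings file, then any environment variable whose name matches an existing key, and finally `PGREST_ENDPOINT` is given a scheme if it lacks one. The sketch below mirrors that order with plain dictionaries instead of TOML files. `layer_settings` is a hypothetical name used only for illustration.

```python
def layer_settings(defaults: dict, local: dict, environ: dict) -> dict:
    """Apply defaults, then local settings, then matching environment values."""
    settings = dict(defaults)
    settings.update(local)
    # Only keys already present in the settings can be overridden from the environment.
    for key in settings:
        if key in environ:
            settings[key] = environ[key]
    # Make sure the endpoint carries a scheme, as the diff does.
    if not settings["PGREST_ENDPOINT"].startswith("http"):
        settings["PGREST_ENDPOINT"] = f"http://{settings['PGREST_ENDPOINT']}"
    return settings


settings = layer_settings(
    defaults={"PGREST_ENDPOINT": "http://localhost:8080", "PAGE_SIZE_MAX": 50},
    local={"PAGE_SIZE_MAX": 100},
    environ={"PGREST_ENDPOINT": "postgrest:8080", "PAGE_SIZE_MAX": "200", "UNRELATED": "x"},
)
print(settings)
```

One consequence worth keeping in mind: environment values are always strings, so a numeric setting such as `PAGE_SIZE_MAX` arrives as `"200"` rather than `200` unless it is converted somewhere else in the code.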

udata_hydra_csvapi-0.2.0.dev0/api_tabular/app.py ADDED

@@ -0,0 +1,187 @@
+import os
+import aiohttp_cors
+import sentry_sdk
+from aiohttp import ClientSession, web
+from aiohttp_swagger import setup_swagger
+from sentry_sdk.integrations.aiohttp import AioHttpIntegration
+from api_tabular import config
+from api_tabular.error import QueryException
+from api_tabular.query import (
+    get_resource,
+    get_resource_data,
+    get_resource_data_streamed,
+)
+from api_tabular.utils import (
+    build_link_with_page,
+    build_sql_query_string,
+    build_swagger_file,
+    url_for,
+)
+routes = web.RouteTableDef()
+sentry_sdk.init(
+    dsn=config.SENTRY_DSN,
+    integrations=[AioHttpIntegration()],
+    traces_sample_rate=1.0,
+)
+@routes.get(r"/api/resources/{rid}/", name="meta")
+async def resource_meta(request):
+    resource_id = request.match_info["rid"]
+    resource = await get_resource(request.app["csession"], resource_id, ["created_at", "url"])
+    return web.json_response(
+        {
+            "created_at": resource["created_at"],
+            "url": resource["url"],
+            "links": [
+                {
+                    "href": url_for(request, "profile", rid=resource_id, _external=True),
+                    "type": "GET",
+                    "rel": "profile",
+                },
+                {
+                    "href": url_for(request, "data", rid=resource_id, _external=True),
+                    "type": "GET",
+                    "rel": "data",
+                },
+                {
+                    "href": url_for(request, "swagger", rid=resource_id, _external=True),
+                    "type": "GET",
+                    "rel": "swagger",
+                },
+            ],
+        }
+    )
+@routes.get(r"/api/resources/{rid}/profile/", name="profile")
+async def resource_profile(request):
+    resource_id = request.match_info["rid"]
+    resource = await get_resource(request.app["csession"], resource_id, ["profile:csv_detective"])
+    return web.json_response(resource)
+@routes.get(r"/api/resources/{rid}/swagger/", name="swagger")
+async def resource_swagger(request):
+    resource_id = request.match_info["rid"]
+    resource = await get_resource(request.app["csession"], resource_id, ["profile:csv_detective"])
+    swagger_string = build_swagger_file(resource["profile"]["columns"], resource_id)
+    return web.Response(body=swagger_string)
+@routes.get(r"/api/resources/{rid}/data/", name="data")
+async def resource_data(request):
+    resource_id = request.match_info["rid"]
+    query_string = request.query_string.split("&") if request.query_string else []
+    page = int(request.query.get("page", "1"))
+    page_size = int(request.query.get("page_size", config.PAGE_SIZE_DEFAULT))
+    if page_size > config.PAGE_SIZE_MAX:
+        raise QueryException(
+            400,
+            None,
+            "Invalid query string",
+            f"Page size exceeds allowed maximum: {config.PAGE_SIZE_MAX}",
+        )
+    if page > 1:
+        offset = page_size * (page - 1)
+    else:
+        offset = 0
+    try:
+        sql_query = build_sql_query_string(query_string, page_size, offset)
+    except ValueError:
+        raise QueryException(400, None, "Invalid query string", "Malformed query")
+    resource = await get_resource(request.app["csession"], resource_id, ["parsing_table"])
+    response, total = await get_resource_data(request.app["csession"], resource, sql_query)
+    next = build_link_with_page(request, query_string, page + 1, page_size)
+    prev = build_link_with_page(request, query_string, page - 1, page_size)
+    body = {
+        "data": response,
+        "links": {
+            "profile": url_for(request, "profile", rid=resource_id, _external=True),
+            "swagger": url_for(request, "swagger", rid=resource_id, _external=True),
+            "next": next if page_size + offset < total else None,
+            "prev": prev if page > 1 else None,
+        },
+        "meta": {"page": page, "page_size": page_size, "total": total},
+    }
+    return web.json_response(body)
+@routes.get(r"/api/resources/{rid}/data/csv/", name="csv")
+async def resource_data_csv(request):
+    resource_id = request.match_info["rid"]
+    query_string = request.query_string.split("&") if request.query_string else []
+    try:
+        sql_query = build_sql_query_string(query_string)
+    except ValueError:
+        raise QueryException(400, None, "Invalid query string", "Malformed query")
+    resource = await get_resource(request.app["csession"], resource_id, ["parsing_table"])
+    response_headers = {
+        "Content-Disposition": f'attachment; filename="{resource_id}.csv"',
+        "Content-Type": "text/csv",
+    }
+    response = web.StreamResponse(headers=response_headers)
+    await response.prepare(request)
+    async for chunk in get_resource_data_streamed(request.app["csession"], resource, sql_query):
+        await response.write(chunk)
+    await response.write_eof()
+    return response
+@routes.get(r"/health/")
+async def get_health(request):
+    return web.HTTPOk()
+async def app_factory():
+    async def on_startup(app):
+        app["csession"] = ClientSession()
+    async def on_cleanup(app):
+        await app["csession"].close()
+    app = web.Application()
+    app.add_routes(routes)
+    app.on_startup.append(on_startup)
+    app.on_cleanup.append(on_cleanup)
+    cors = aiohttp_cors.setup(
+        app,
+        defaults={
+            "*": aiohttp_cors.ResourceOptions(
+                allow_credentials=True, expose_headers="*", allow_headers="*"
+            )
+        },
+    )
+    for route in list(app.router.routes()):
+        cors.add(route)
+    setup_swagger(
+        app,
+        swagger_url=config.DOC_PATH,
+        ui_version=3,
+        swagger_from_file="ressource_app_swagger.yaml",
+    )
+    return app
+def run():
+    web.run_app(app_factory(), path=os.environ.get("CSVAPI_APP_SOCKET_PATH"))
+if __name__ == "__main__":
+    run()
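
Note on the pagination arithmetic in `resource_data`: the offset is `page_size * (page - 1)` for pages after the first, a `next` link is emitted only while `page_size + offset < total`, and a `prev` link only when `page > 1`. The sketch below isolates that arithmetic. `page_window` is a hypothetical function, not part of the module.

```python
def page_window(page: int, page_size: int, total: int) -> tuple[int, bool, bool]:
    """Return (offset, has_next, has_prev) using the same rules as the handler."""
    offset = page_size * (page - 1) if page > 1 else 0
    has_next = page_size + offset < total
    has_prev = page > 1
    return offset, has_next, has_prev


# First page of 1000 rows: a next link, no prev link.
print(page_window(1, 20, 1000))
# Last page of 1000 rows: no next link, a prev link.
print(page_window(50, 20, 1000))
# Two matching rows on a single page: neither link.
print(page_window(1, 20, 2))
```

These three cases line up with the sample responses in the README, where the filtered query with `total: 2` returns `null` for both `next` and `prev`.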

udata_hydra_csvapi-0.2.0.dev0/api_tabular/config_default.toml ADDED

@@ -0,0 +1,8 @@
+PGREST_ENDPOINT = "http://localhost:8080"
+SERVER_NAME = "localhost:8005"
+SCHEME = "http"
+SENTRY_DSN = ""
+PAGE_SIZE_DEFAULT = 20
+PAGE_SIZE_MAX = 50
+BATCH_SIZE = 50000
+DOC_PATH = "/api/doc"
