gooddata-flight-server 1.34.1.dev1__py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Potentially problematic release.

This version of gooddata-flight-server might be problematic.

Files changed (49)
  1. gooddata_flight_server/__init__.py +23 -0
  2. gooddata_flight_server/_version.py +7 -0
  3. gooddata_flight_server/cli.py +137 -0
  4. gooddata_flight_server/config/__init__.py +1 -0
  5. gooddata_flight_server/config/config.py +536 -0
  6. gooddata_flight_server/errors/__init__.py +1 -0
  7. gooddata_flight_server/errors/error_code.py +209 -0
  8. gooddata_flight_server/errors/error_info.py +475 -0
  9. gooddata_flight_server/exceptions.py +16 -0
  10. gooddata_flight_server/health/__init__.py +1 -0
  11. gooddata_flight_server/health/health_check_http_server.py +103 -0
  12. gooddata_flight_server/health/server_health_monitor.py +83 -0
  13. gooddata_flight_server/metrics.py +16 -0
  14. gooddata_flight_server/py.typed +1 -0
  15. gooddata_flight_server/server/__init__.py +1 -0
  16. gooddata_flight_server/server/auth/__init__.py +1 -0
  17. gooddata_flight_server/server/auth/auth_middleware.py +83 -0
  18. gooddata_flight_server/server/auth/token_verifier.py +62 -0
  19. gooddata_flight_server/server/auth/token_verifier_factory.py +55 -0
  20. gooddata_flight_server/server/auth/token_verifier_impl.py +41 -0
  21. gooddata_flight_server/server/base.py +63 -0
  22. gooddata_flight_server/server/default.logging.ini +28 -0
  23. gooddata_flight_server/server/flight_rpc/__init__.py +1 -0
  24. gooddata_flight_server/server/flight_rpc/flight_middleware.py +162 -0
  25. gooddata_flight_server/server/flight_rpc/flight_server.py +228 -0
  26. gooddata_flight_server/server/flight_rpc/flight_service.py +279 -0
  27. gooddata_flight_server/server/flight_rpc/server_methods.py +200 -0
  28. gooddata_flight_server/server/server_base.py +321 -0
  29. gooddata_flight_server/server/server_main.py +116 -0
  30. gooddata_flight_server/tasks/__init__.py +1 -0
  31. gooddata_flight_server/tasks/base.py +21 -0
  32. gooddata_flight_server/tasks/metrics.py +115 -0
  33. gooddata_flight_server/tasks/task.py +193 -0
  34. gooddata_flight_server/tasks/task_error.py +60 -0
  35. gooddata_flight_server/tasks/task_executor.py +96 -0
  36. gooddata_flight_server/tasks/task_result.py +363 -0
  37. gooddata_flight_server/tasks/temporal_container.py +247 -0
  38. gooddata_flight_server/tasks/thread_task_executor.py +639 -0
  39. gooddata_flight_server/utils/__init__.py +1 -0
  40. gooddata_flight_server/utils/libc_utils.py +35 -0
  41. gooddata_flight_server/utils/logging.py +158 -0
  42. gooddata_flight_server/utils/methods_discovery.py +98 -0
  43. gooddata_flight_server/utils/otel_tracing.py +142 -0
  44. gooddata_flight_server-1.34.1.dev1.data/scripts/gooddata-flight-server +10 -0
  45. gooddata_flight_server-1.34.1.dev1.dist-info/LICENSE.txt +7 -0
  46. gooddata_flight_server-1.34.1.dev1.dist-info/METADATA +749 -0
  47. gooddata_flight_server-1.34.1.dev1.dist-info/RECORD +49 -0
  48. gooddata_flight_server-1.34.1.dev1.dist-info/WHEEL +5 -0
  49. gooddata_flight_server-1.34.1.dev1.dist-info/top_level.txt +1 -0
@@ -0,0 +1,749 @@
Metadata-Version: 2.2
Name: gooddata-flight-server
Version: 1.34.1.dev1
Summary: Flight RPC server to host custom functions
Author: GoodData
Author-email: support@gooddata.com
License: MIT
Project-URL: Documentation, https://gooddata-flight-server.readthedocs.io/en/v1.34.1.dev1
Project-URL: Source, https://github.com/gooddata/gooddata-python-sdk
Keywords: gooddata,flight,rpc,flight rpc,custom functions,analytics,headless,business,intelligence,headless-bi,cloud,native,semantic,layer,sql,metrics
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Database
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Software Development
Classifier: Typing :: Typed
Requires-Python: >=3.9.0
Description-Content-Type: text/markdown
License-File: LICENSE.txt
Requires-Dist: dynaconf<4.0.0,>=3.1.11
Requires-Dist: opentelemetry-api<=2.0.0,>=1.24.0
Requires-Dist: opentelemetry-sdk<=2.0.0,>=1.24.0
Requires-Dist: orjson<4.0.0,>=3.8.5
Requires-Dist: prometheus-client~=0.20.0
Requires-Dist: pyarrow>=16.1.0
Requires-Dist: readerwriterlock~=1.0.9
Requires-Dist: structlog<25.0.0,>=24.0.0
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: keywords
Dynamic: license
Dynamic: project-url
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

# GoodData Flight Server

The GoodData Flight Server is an opinionated, pluggable Flight RPC Server implementation.

It builds on top of the Flight RPC components provided by [PyArrow](https://pypi.org/project/pyarrow/) and
adds functions and capabilities typically needed when building production-ready
Flight RPC data services:

- A robust configuration system leveraging [Dynaconf](https://www.dynaconf.com/)
- Enablement of data service observability (logging, metrics, tracing)
- Health checking exposed via liveness and readiness endpoints
- Token-based authentication with pluggable token verification methods

In addition, the server comes with infrastructure that you can leverage
for building the data service functionality itself:

- A library for generating and serving Flights created using long-running tasks
- An extendable error handling infrastructure that allows your service to
  provide error information in a structured manner

Code in this package is derived from our production codebase, where we run
and operate many different data services and where this infrastructure is
proven and battle-tested.

## Getting Started

The `gooddata-flight-server` package is like any other. You can install it
using `pip install gooddata-flight-server` or - more commonly - add it as a dependency
to your project.

The server takes care of all the boilerplate, and you take care of implementing
the Flight RPC methods - similarly to how you would implement them using PyArrow's Flight
server.

Here is a very simple example of a data service's Flight RPC method implementation:

```python
import gooddata_flight_server as gf
import pyarrow.flight


class DataServiceMethods(gf.FlightServerMethods):
    """
    This example data service serves some sample static data. Any
    DoGet request will return that static data. All other Flight RPC
    methods are left unimplemented.

    Note how the class holds onto the `ServerContext` - the implementations
    will usually want to do this because the context contains additional
    dependencies such as:

    - Location to send out in FlightInfo
    - Health monitor that the implementation can use to indicate
      its status
    - Task executor to perform long-running tasks
    """

    StaticData = pyarrow.table({
        "col1": [1, 2, 3]
    })

    def __init__(self, ctx: gf.ServerContext) -> None:
        self._ctx = ctx

    def do_get(self,
               context: pyarrow.flight.ServerCallContext,
               ticket: pyarrow.flight.Ticket
               ) -> pyarrow.flight.FlightDataStream:
        return pyarrow.flight.RecordBatchStream(
            self.StaticData
        )


@gf.flight_server_methods
def my_service(ctx: gf.ServerContext) -> gf.FlightServerMethods:
    """
    Factory function for the data service. It returns the implementation of the
    Flight RPC methods, which is then integrated into the server.

    The ServerContext passed in `ctx` allows you to access the available configuration
    and various useful server components.
    """
    return DataServiceMethods(ctx)


if __name__ == "__main__":
    # additional options & config files can be passed to the
    # create_server method; more on this later
    server = gf.create_server(my_service)
    server.start()

    # the main thread will block on this call
    #
    # SIGINT/SIGTERM causes graceful shutdown - the method will
    # exit once the server is stopped.
    server.wait_for_stop()
```

Notice the annotated `my_service` function. This is a factory for your data service's
Flight RPC methods. The server will call it at the appropriate time during startup
and pass in the full context available at that time, from which your code can access:

- the available configuration loaded using Dynaconf
- health-checking components
- components to use for running long-running tasks.

During startup, the server will register signal handlers for SIGINT and SIGTERM - it will
perform graceful shutdown and tear everything down in the correct order when it receives them.

The server also comes with a simple CLI that you can use to start it up and load a particular
data service:

```shell
$ gooddata-flight-server start --methods-provider my_service.main
```

The CLI will import the `my_service.main` Python module and look for a function decorated
with `@flight_server_methods`. It will start the server and make it initialize your data service
implementation and integrate it into the Flight RPC server.

Without any configuration, the server will bind to `127.0.0.1:17001` and run without TLS and
without any authentication. It will not start the health check or metrics endpoints and will not
start the OpenTelemetry exporters.

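With those defaults in place, a quick smoke test against the sample service above might look like this (an illustrative sketch; the sample service returns its static table for any ticket):

```python
import pyarrow.flight

# connect to a locally running server started with the defaults (no TLS, no authentication)
client = pyarrow.flight.FlightClient("grpc://127.0.0.1:17001")

# the sample DataServiceMethods.do_get ignores the ticket contents
reader = client.do_get(pyarrow.flight.Ticket(b"anything"))
print(reader.read_all())
```
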
NOTE: the CLI also has other arguments that let you specify the configuration files to load and
the logging configuration to use.

### Configuration

The server uses [Dynaconf](https://www.dynaconf.com/) for all its configuration. There are
many settings already in place to influence the server's configuration and behavior. Your data service
code can also leverage the Dynaconf config to configure itself: you can pass any number of configuration
files / env variables at startup; the server will load them all using Dynaconf and let your code
work with the Dynaconf structures.

We recommend checking out the Dynaconf documentation to learn more about how it works and
what its capabilities are. This text will only highlight the most common usage.

The available server settings are documented in the [sample-config.toml](sample-config.toml).
You can take this and use it as a template for your own configuration.

To use a configuration file during startup, you can start the server like this:

```shell
$ gooddata-flight-server start \
  --methods-provider my_service.main \
  --config server.config.toml
```

In case your service needs its own configuration, it is often a good idea to keep it in
a separate file and pass that at startup as well:

```shell
$ gooddata-flight-server start \
  --methods-provider my_service.main \
  --config server.config.toml my_service.config.toml
```

#### Environment variables

All settings that you can code into the config file can also be provided using environment
variables.

The server's Dynaconf integration is set up so that all environment variables are
expected to be prefixed with `GOODDATA_FLIGHT_`.

The environment variable naming convention is set up by Dynaconf and goes as follows:
`GOODDATA_FLIGHT_{SECTION}__{SETTING_NAME}`

Here, `SECTION` is, for example, `[server]`. For convenience, the [sample-config.toml](sample-config.toml)
indicates the full name of the respective environment variable in each setting's documentation.

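For instance - an illustrative sketch assuming a `[server]` section with a `listen_host` setting - the corresponding environment variable would be:

```shell
# hypothetical example: the [server].listen_host setting expressed as an environment variable
export GOODDATA_FLIGHT_SERVER__LISTEN_HOST="0.0.0.0"
```
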
#### Configuration for your service

If your service needs its own configuration, you should aim to have a TOML config file like this:

```toml
[my_service]
# env: GOODDATA_FLIGHT_MY_SERVICE__OPT1
opt1 = "value"
```

When you provide such a config file to the server, it will parse it and make its contents available in `ctx.settings`.
You can then access the value of this setting in your factory function, for example like this:

```python
import gooddata_flight_server as gf

_MY_CONFIG_SECTION = "my_service"


@gf.flight_server_methods
def my_service(ctx: gf.ServerContext) -> gf.FlightServerMethods:
    opt1 = ctx.settings.get(f"{_MY_CONFIG_SECTION}.opt1")

    # ... create and return server methods ...
```

### Authentication

Currently, the server supports two modes of authentication:

- no authentication
- token-based authentication, with the ability to plug in custom token verification logic

The token verification method that comes built-in with the server is a simple one: the token is
an arbitrary, secret value shared between server and client. You configure the list of valid secret
tokens at server start-up and then, at your discretion, distribute these secret values to clients.

By default, the server runs with no authentication. To turn on the token-based authentication,
you have to:

- Set the `authentication_method` setting to `token`.

  By default, the server will use the built-in token verification strategy
  called `EnumeratedTokenVerification`.

- Configure the secret tokens.

  You can do this using an environment variable: `GOODDATA_FLIGHT_ENUMERATED_TOKENS__TOKENS='["", ""]'`.
  Put the secret token(s) inside the quotes. Alternatively, you can code the tokens into a configuration file
  such as this:

  ```toml
  [enumerated_tokens]
  tokens = ["", ""]
  ```

  IMPORTANT: never commit secrets to your VCS.

With this setup in place, the server will expect the Flight clients to include a token in the
`authorization` header in the form of `Bearer <token>`. The token must be present on every
call.

Here is an example of how to make a call that includes the `authorization` header:

```python
import pyarrow.flight


def example_call_using_tokens():
    opts = pyarrow.flight.FlightCallOptions(headers=[(b"authorization", b"Bearer <token>")])
    client = pyarrow.flight.FlightClient("grpc+tls://localhost:17001")

    for flight in client.list_flights(b"", opts):
        print(flight)
```

## Developer Manual

This part of the documentation explains additional capabilities of the server.

### Long-running tasks

Part of this package is a component that you can use to generate Flight data using long-running
tasks: the `TaskExecutor` component. The server will configure and create an instance of `TaskExecutor`
at startup; your service can access it via the `ServerContext`.

The `TaskExecutor` implementation is built on top of `ThreadPoolExecutor`: you can configure the number of
threads available for your tasks using the `task_threads` setting. Each active task will use one thread from
this pool. If all threads are occupied, the tasks will be queued using a FIFO strategy.

To use the `TaskExecutor`, you have to encapsulate the Flight data generation logic into a class
that extends the `Task` interface. Here, in the `run()` method, you implement the necessary
algorithm that generates the data.

The `Task` interface comes with a contract for how your code should return the result (data) or raise
errors. The `TaskExecutor` will hold onto the results generated by your task and retain them for
a configured amount of time (see the `task_result_ttl_sec` setting). The infrastructure recognizes that
your task may generate a result that can be consumed either repeatedly (say Arrow Tables) or just
once (say a RecordBatchReader backed by a live stream).

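As an illustration, both settings might be tuned in the configuration file like this (a sketch only - the section name and values are assumptions; consult [sample-config.toml](sample-config.toml) for the authoritative names and defaults):

```toml
[server]
# number of threads available for running tasks (assumed value)
task_threads = 32
# how long task results are retained before being dropped (assumed value)
task_result_ttl_sec = 600
```
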
Here is an example showing how to code a task, how to integrate its execution, and how to
send out the data it generated:

```python
from typing import Union, Any

import pyarrow.flight

import gooddata_flight_server as gf


class MyServiceTask(gf.Task):
    def __init__(
        self,
        task_specific_payload: Any,
        cmd: bytes,
    ):
        super().__init__(cmd)

        self._task_specific_payload = task_specific_payload

    def run(self) -> Union[gf.TaskResult, gf.TaskError]:
        # tasks support cancellation; your code can check for
        # cancellation at any time; if the task was cancelled the
        # method will raise an exception.
        #
        # do not forget to do cleanup on cancellation
        self.check_cancelled()

        # ... do whatever is needed to generate the data
        data: pyarrow.RecordBatchReader = some_method_to_generate_data()

        # when the data is ready, wrap it in a result that implements
        # the FlightDataTaskResult interface; there are built-in implementations
        # to wrap an Arrow Table or an Arrow RecordBatchReader.
        #
        # you can write your own result if you need special handling
        # of the result and/or resources bound to the result.
        return gf.FlightDataTaskResult.for_data(data)


class DataServiceMethods(gf.FlightServerMethods):
    def __init__(self, ctx: gf.ServerContext) -> None:
        self._ctx = ctx

    def _prepare_flight_info(self, task_result: gf.TaskExecutionResult) -> pyarrow.flight.FlightInfo:
        if task_result.error is not None:
            raise task_result.error.as_flight_error()

        if task_result.cancelled:
            raise gf.ErrorInfo.for_reason(
                gf.ErrorCode.COMMAND_CANCELLED,
                f"Service call was cancelled. Invocation task was: '{task_result.task_id}'.",
            ).to_server_error()

        result = task_result.result

        return pyarrow.flight.FlightInfo(
            schema=result.get_schema(),
            descriptor=pyarrow.flight.FlightDescriptor.for_command(task_result.cmd),
            endpoints=[
                pyarrow.flight.FlightEndpoint(
                    ticket=pyarrow.flight.Ticket(ticket=task_result.task_id.encode()),
                    locations=[self._ctx.location],
                )
            ],
            total_records=-1,
            total_bytes=-1,
        )

    def get_flight_info(
        self,
        context: pyarrow.flight.ServerCallContext,
        descriptor: pyarrow.flight.FlightDescriptor,
    ) -> pyarrow.flight.FlightInfo:
        cmd = descriptor.command
        # parse & validate the command
        some_parsed_command = ...

        # create your custom task; you will usually pass the parsed command
        # so that the task knows what to do. The 'raw' command is required as well because
        # it should be bounced back in the FlightInfo
        task = MyServiceTask(task_specific_payload=some_parsed_command, cmd=cmd)
        self._ctx.task_executor.submit(task)

        # wait for the task to complete
        result = self._ctx.task_executor.wait_for_result(task_id=task.task_id)

        # once the task completes, create the FlightInfo or raise an exception in
        # case the task failed. The ticket in the FlightInfo should contain the
        # task identifier.
        return self._prepare_flight_info(result)

    def do_get(self,
               context: pyarrow.flight.ServerCallContext,
               ticket: pyarrow.flight.Ticket
               ) -> pyarrow.flight.FlightDataStream:
        # the caller comes to pick up the data; the ticket should be the task identifier
        task_id = ticket.ticket.decode()

        # this utility method on the base class takes care of everything needed
        # to correctly create a FlightDataStream from the task result (or die trying
        # in case the task result is no longer present, or the result indicates that
        # the task has failed)
        return self.do_get_task_result(context, self._ctx.task_executor, task_id)
```

### Custom token verification strategy

At the moment, the built-in token verification strategy supported by the server is the
most basic one. In cases when this strategy is not good enough, you can code your own
and plug it into the server.

The `TokenVerificationStrategy` interface sets the contract for your custom strategy. You
implement this class inside a Python module and then tell the server to load that
module.

For example, you create a module `my_service.auth.custom_token_verification` where you
implement the verification strategy:

```python
import gooddata_flight_server as gf
import pyarrow.flight
from typing import Any


class MyCustomTokenVerification(gf.TokenVerificationStrategy):
    def verify(self, call_info: pyarrow.flight.CallInfo, token: str) -> Any:
        # implement your arbitrary logic here;
        #
        # see method and class documentation to learn more
        raise NotImplementedError

    @classmethod
    def create(cls, ctx: gf.ServerContext) -> "gf.TokenVerificationStrategy":
        # the code has a chance to read any necessary settings from the `ctx.settings`
        # property and then use those values to construct the class
        #
        # see method and class documentation to learn more
        return MyCustomTokenVerification()
```

Then, you can use the `token_verification` setting to tell the server to look up
and load the token verification strategy from the `my_service.auth.custom_token_verification` module.

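Wiring this together in the configuration might look like the following sketch (the section name is an assumption; check [sample-config.toml](sample-config.toml) for the exact placement of these settings):

```toml
[server]
authentication_method = "token"
token_verification = "my_service.auth.custom_token_verification"
```
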
Using a custom verification strategy, you can implement support for, say, JWT tokens, or look
up valid tokens inside some database.

NOTE: As is, the server infrastructure does not concern itself with how the clients actually
obtain the valid tokens. At the moment, this is outside of this project's scope. You can distribute
tokens to clients using some procedure or implement custom APIs where clients have to log in
in order to obtain a valid token.

### Logging

The server comes with `structlog` installed by default. `structlog` is used and configured
so that it uses the Python stdlib logging backend. The `structlog` pipeline is set up so that:

- In dev mode, the logs are pretty-printed to the console (achieved by the `--dev-log` option of the server)
- In production deployments, the logs are serialized into JSON (using orjson) which is then written out.
  This is ideal for consumption in log aggregators.

By default, the stdlib loggers are configured using the [default.logging.ini](./gooddata_flight_server/server/default.logging.ini)
file. In the default setup, all INFO-level logs are emitted.

If you want to customize the logging configuration, then:

- make a copy of this file and tweak it as you need
- either pass the path to your config file to the `create_server` function or use the `--logging-config`
  argument on the CLI (see the example below)

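For example (the configuration file name here is purely illustrative):

```shell
$ gooddata-flight-server start \
  --methods-provider my_service.main \
  --logging-config my_service.logging.ini
```
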
The config file is the standard Python logging configuration file. You can learn about its intricacies
in the Python documentation.

NOTE: you typically do not want to touch the formatter settings inside the logging ini file - the
`structlog` library creates the entire log lines according to the deployment mode.

The use of `structlog` and loggers is fairly straightforward:

```python
import structlog

_LOGGER = structlog.get_logger("my_service")
_LOGGER.info("event-name", some_event_key="value_to_log")
```

#### Recommendations

Here are a few assorted recommendations based on our production experience with `structlog`:

- You can log complex objects such as lists, tuples, dicts and data classes without a problem
  - Be careful though. What can be serialized into the dev-log may not always serialize
    using `orjson` into the production logs
- Always log exceptions using the special [exc_info](https://www.structlog.org/en/stable/exceptions.html) event key
  (see the sketch after this list).
- Mind the cardinality of the logger instances. If you have a class of which you may have thousands of
  instances, then it is **not a good idea** to create a logger instance for each instance of your class - even
  if the logger name is the same; this is because each logger instance comes with memory overhead.

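A minimal sketch of the exception-logging recommendation (the event and key names are illustrative):

```python
import structlog

_LOGGER = structlog.get_logger("my_service")


def risky_operation():
    try:
        raise RuntimeError("boom")
    except RuntimeError:
        # exc_info=True attaches the current exception so that structlog can render it
        _LOGGER.error("operation-failed", operation="risky_operation", exc_info=True)
```
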
### Prometheus Metrics

The server can be configured to start an HTTP endpoint that exposes the values of Prometheus
metrics. This is disabled by default.

To get started with Prometheus metrics, you need to:

- Set `metrics_host` and `metrics_port`

  - Check out the config file comments to learn more about these settings (a configuration sketch follows below).
  - What you have to remember is that the Prometheus scraper is an external process that
    needs to reach the HTTP endpoint via the network.

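For illustration, such a configuration might look like this (a sketch only - the section name and port value are assumptions; see [sample-config.toml](sample-config.toml) for the exact names and defaults):

```toml
[server]
# bind the metrics endpoint so that the external Prometheus scraper can reach it
metrics_host = "0.0.0.0"
metrics_port = 17101
```
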
From then on, you can start using the Prometheus client to create various types of metrics. For example:

```python
from prometheus_client import Counter

# instantiate counter
MY_COUNTER = Counter(
    "my_counter",
    "Fitting description of `my_counter`.",
)


def some_function():
    # ...
    MY_COUNTER.inc()
```

#### Recommendations

Here are a few assorted recommendations based on our production experience:

- You must avoid double-declaration of metrics. If you try to define a metric with the same
  identifier twice, the registration will fail.

- It is nice to declare all/most metrics in a single place. For example, create a `my_metrics.py`
  file and in it have a `MyMetrics` class with one static field per metric (see the sketch below).

  This approach leads to better 'discoverability' of available metrics just by looking
  at the code. Using a class with a static field per metric in turn makes imports and autocomplete
  more convenient.

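A sketch of that approach (the metric names and types are illustrative):

```python
from prometheus_client import Counter, Histogram


class MyMetrics:
    # one static field per metric; import MyMetrics wherever values are recorded
    REQUESTS_TOTAL = Counter(
        "my_service_requests_total",
        "Total number of requests handled by my_service.",
    )
    REQUEST_DURATION = Histogram(
        "my_service_request_duration_seconds",
        "Time spent handling a single request.",
    )
```
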
### Open Telemetry

The server can be configured to integrate with OpenTelemetry and to start and auto-configure
the OpenTelemetry exporters. It will also auto-fill the ResourceAttributes by doing discovery where possible.

See the `otel_*` options in the configuration files to learn more. In a nutshell, it
goes as follows:

- Configure which exporter to use using the `otel_exporter_type` setting.

  Nowadays, `otlp-grpc` or `otlp-http` is the usual choice.

  Depending on the exporter you use, you may/must specify additional, exporter-specific
  environment variables to configure the exporter. The supported environment variables
  are documented in the respective OpenTelemetry exporter package; i.e. they are not
  something special to GoodData's Flight Server.

  See the [official exporter documentation](https://opentelemetry-python.readthedocs.io/en/latest/exporter/otlp/otlp.html#module-opentelemetry.exporter.otlp.proto.grpc).

- Install the respective exporter package.

- Tweak the other `otel_*` settings: you must at minimum set the `otel_service_name` (see the sketch below).

  The settings apart from `otel_service_name` will fall back to defaults.

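An illustrative configuration sketch (the section name and values are assumptions; consult [sample-config.toml](sample-config.toml) for the exact names):

```toml
[server]
otel_exporter_type = "otlp-grpc"
otel_service_name = "my-data-service"
```
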
To start tracing, you need to initialize a tracer. You can do so as follows:

```python
from opentelemetry import trace

MY_TRACER: trace.Tracer = trace.ProxyTracer("my_tracer")
```

Typically, you want to create one instance of the tracer for your entire data service and then
import that instance and use it wherever needed to create spans:

```python
from your_module_with_tracer import MY_TRACER


def some_function():
    # ... code
    with MY_TRACER.start_as_current_span("do_some_work") as span:
        # ... code
        pass
```

Note: there are many ways to instrument your code with spans. See the [OpenTelemetry documentation](https://opentelemetry.io/docs/languages/python/instrumentation/)
to find out more.

#### Recommendations

Here are a few assorted recommendations based on our production experience:

- Always use the `ProxyTracer`. The underlying initialization code done by the server
  will correctly set the actual tracer that will be called from the `ProxyTracer`.

  This way, if you turn off OpenTelemetry (by commenting out the `otel_exporter_type` setting or setting it
  to 'none'), the NoOpTracer will be injected under the covers and all the tracing code will
  be a no-op as well.

### Health Checks

The server comes with a basic health-checking infrastructure - this is especially useful
when deploying to environments (such as k8s) that monitor the health of your server and can automatically
restart it in case of problems.

When you configure the `health_check_host` (and optionally also `health_check_port`) setting, the
server will expose two HTTP endpoints (a quick manual check is sketched after the list):

- `/ready` - indicates whether the server is up and ready to serve requests.

  The endpoint will respond with status `500` if the server is not ready. Otherwise, it will respond with
  `202`. The server is deemed ready when all its modules are up and the Flight RPC server is
  'unlocked' to handle requests.

- `/live` - indicates whether the server is still alive and can be used. The liveness is determined
  from the status of the modules.

  Each of the server's modules can report its status to a central health checking service. If any of
  the modules is unhealthy, the whole server is unhealthy.

  Similarly to readiness, the server will respond with status `500` when not healthy. Otherwise, it
  will respond with status `202`.

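For a quick manual check - a sketch assuming the health-check endpoint was configured to listen on `localhost:8877` (a hypothetical host/port):

```shell
$ curl -i http://localhost:8877/ready
$ curl -i http://localhost:8877/live
```
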
Creating health-checks is fairly straightforward:

- Your service's factory function receives the `ServerContext`

- The `ServerContext` contains the `health` property, which returns an instance of `ServerHealthMonitor`

- At this point, your code should hold onto / propagate the health monitor to any mission-critical
  modules / components used by your implementation

- The `ServerHealthMonitor` has a `set_module_status(module, status)` method - you can use this to indicate status

  - The module `name` argument to this method can be anything you see fit
  - The status is either `ModuleHealthStatus.OK` or `ModuleHealthStatus.NOT_OK`
  - When your module is `NOT_OK`, the entire server is `NOT_OK`
  - Usually, there is a grace period for which the server can be `NOT_OK`; after the time is up, the
    environment will restart the server
  - If you return your module back to `OK` status, the server returns to `OK` status as well - thus
    avoiding the automatic restarts.

Here is an example component using health monitoring:

```python
import gooddata_flight_server as gf


class YourMissionCriticalComponent:
    """
    Let's say this component is used to perform some heavy lifting / important job.

    The component is created in your service's factory and is used during Flight RPC
    invocation. You propagate the `health` monitor to it at construction time.
    """
    def __init__(self, health: gf.ServerHealthMonitor) -> None:
        self._health = health

    def some_important_method(self):
        try:
            # this does some important work
            return
        except OSError:
            # it runs into some kind of unrecoverable error (OSError here is purely an example);
            # by setting the status to NOT_OK, your component indicates that it is unhealthy
            # and the /live endpoint will report the entire server as unhealthy.
            #
            # usually, the liveness checks have a grace period. if you set the module back
            # to `gf.ModuleHealthStatus.OK`, everything turns healthy again. If the grace
            # period elapses, the server will usually be restarted by the environment.
            self._health.set_module_status("YourMissionCriticalComponent", gf.ModuleHealthStatus.NOT_OK)
            raise
```

## Troubleshooting

### Clients cannot read data during GetFlightInfo->DoGet flow; getting DNS errors

The root cause here is usually a misconfiguration of `listen_host` and `advertise_host`.

You must always remember that `GetFlightInfo` returns a `FlightInfo` that is used
by clients to obtain the data using `DoGet`. The `FlightInfo` contains the location(s)
that the client will connect to - they must be reachable by the client.

There are a few things to check:

1. Ensure that your service implementation correctly sets the Location in the FlightInfo.

   Usually, you want to set the location to the value that your service implementation
   receives in the `ServerContext`. This location is prepared by the server and contains
   the value of `advertise_host` and `advertise_port`.

2. Ensure that the `advertise_host` is set correctly; mistakes can happen easily, especially
   in dockerized environments. The documentation of `listen_host` and `advertise_host`
   has additional detail.

   To highlight the specifics of a Dockerized deployment (see the configuration sketch below):

   - The server most often needs to listen on `0.0.0.0`
   - The server must, however, advertise a different hostname/IP - one that is reachable from
     outside the container

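An illustrative sketch of such a setup, assuming the settings live in the `[server]` section (the advertised hostname is hypothetical):

```toml
[server]
listen_host = "0.0.0.0"
advertise_host = "data-service.example.com"
advertise_port = 17001
```
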
### The server's RSS keeps growing; it looks like the server is leaking memory

This can usually be observed on servers that are write-heavy: servers that handle a lot
of `DoPut` or `DoExchange` requests. When such servers run in environments that enforce
RSS limits, they can end up being killed.

Often, this is not a leak but a behavior of `malloc`. Even if you tell PyArrow to use
the `jemalloc` allocator, the underlying gRPC server used by Flight RPC will use `malloc`, and
by default `malloc` will take its time returning unused memory back to the system.

And since the gRPC server is responsible for allocating memory for the received Arrow data,
it is often the `DoPut` or `DoExchange` workloads that look like they are leaking memory.

If the RSS size is a problem (say you are running the service inside k8s with a memory limit), the
usual strategy is to:

1. Set / tweak the malloc behavior using the `GLIBC_TUNABLES` environment variable; reduce
   the malloc trim threshold and possibly also reduce the number of malloc arenas.

   Here is a quite aggressive setting: `GLIBC_TUNABLES="glibc.malloc.trim_threshold=4:glibc.malloc.arena_max=2:glibc.malloc.tcache_count=0"`

2. Periodically call `malloc_trim` to poke malloc to trim any unneeded allocations and
   return them to the system.

   The GoodData Flight server already implements periodic malloc trim. By default, the interval
   is set to `30 seconds`. You can change this interval using the `malloc_trim_interval_sec`
   setting.

Additionally, we recommend reading up on [Python Memory Management](https://realpython.com/python-memory-management/) -
especially the part about CPython not returning unused blocks back to the system. This may be another reason for
RSS growth - the tricky bit here being that it really depends on the object creation patterns in your service.