gooddata-flight-server 1.28.0 (py3-none-any.whl)

Files changed (49)
  1. gooddata_flight_server/__init__.py +23 -0
  2. gooddata_flight_server/_version.py +7 -0
  3. gooddata_flight_server/cli.py +137 -0
  4. gooddata_flight_server/config/__init__.py +1 -0
  5. gooddata_flight_server/config/config.py +536 -0
  6. gooddata_flight_server/errors/__init__.py +1 -0
  7. gooddata_flight_server/errors/error_code.py +209 -0
  8. gooddata_flight_server/errors/error_info.py +475 -0
  9. gooddata_flight_server/exceptions.py +16 -0
  10. gooddata_flight_server/health/__init__.py +1 -0
  11. gooddata_flight_server/health/health_check_http_server.py +103 -0
  12. gooddata_flight_server/health/server_health_monitor.py +83 -0
  13. gooddata_flight_server/metrics.py +16 -0
  14. gooddata_flight_server/py.typed +1 -0
  15. gooddata_flight_server/server/__init__.py +1 -0
  16. gooddata_flight_server/server/auth/__init__.py +1 -0
  17. gooddata_flight_server/server/auth/auth_middleware.py +83 -0
  18. gooddata_flight_server/server/auth/token_verifier.py +62 -0
  19. gooddata_flight_server/server/auth/token_verifier_factory.py +55 -0
  20. gooddata_flight_server/server/auth/token_verifier_impl.py +41 -0
  21. gooddata_flight_server/server/base.py +63 -0
  22. gooddata_flight_server/server/default.logging.ini +28 -0
  23. gooddata_flight_server/server/flight_rpc/__init__.py +1 -0
  24. gooddata_flight_server/server/flight_rpc/flight_middleware.py +162 -0
  25. gooddata_flight_server/server/flight_rpc/flight_server.py +230 -0
  26. gooddata_flight_server/server/flight_rpc/flight_service.py +281 -0
  27. gooddata_flight_server/server/flight_rpc/server_methods.py +200 -0
  28. gooddata_flight_server/server/server_base.py +321 -0
  29. gooddata_flight_server/server/server_main.py +116 -0
  30. gooddata_flight_server/tasks/__init__.py +1 -0
  31. gooddata_flight_server/tasks/base.py +21 -0
  32. gooddata_flight_server/tasks/metrics.py +115 -0
  33. gooddata_flight_server/tasks/task.py +193 -0
  34. gooddata_flight_server/tasks/task_error.py +60 -0
  35. gooddata_flight_server/tasks/task_executor.py +96 -0
  36. gooddata_flight_server/tasks/task_result.py +363 -0
  37. gooddata_flight_server/tasks/temporal_container.py +247 -0
  38. gooddata_flight_server/tasks/thread_task_executor.py +639 -0
  39. gooddata_flight_server/utils/__init__.py +1 -0
  40. gooddata_flight_server/utils/libc_utils.py +35 -0
  41. gooddata_flight_server/utils/logging.py +158 -0
  42. gooddata_flight_server/utils/methods_discovery.py +98 -0
  43. gooddata_flight_server/utils/otel_tracing.py +142 -0
  44. gooddata_flight_server-1.28.0.data/scripts/gooddata-flight-server +10 -0
  45. gooddata_flight_server-1.28.0.dist-info/LICENSE.txt +1066 -0
  46. gooddata_flight_server-1.28.0.dist-info/METADATA +737 -0
  47. gooddata_flight_server-1.28.0.dist-info/RECORD +49 -0
  48. gooddata_flight_server-1.28.0.dist-info/WHEEL +5 -0
  49. gooddata_flight_server-1.28.0.dist-info/top_level.txt +1 -0
gooddata_flight_server-1.28.0.dist-info/METADATA: @@ -0,0 +1,737 @@
Metadata-Version: 2.1
Name: gooddata-flight-server
Version: 1.28.0
Summary: Flight RPC server to host custom functions
Author: GoodData
Author-email: support@gooddata.com
License: MIT
Project-URL: Documentation, https://gooddata-flight-server.readthedocs.io/en/v1.28.0
Project-URL: Source, https://github.com/gooddata/gooddata-python-sdk
Keywords: gooddata,flight,rpc,flight rpc,custom functions,analytics,headless,business,intelligence,headless-bi,cloud,native,semantic,layer,sql,metrics
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Database
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Software Development
Classifier: Typing :: Typed
Requires-Python: >=3.9.0
Description-Content-Type: text/markdown
License-File: LICENSE.txt
Requires-Dist: dynaconf <4.0.0,>=3.1.11
Requires-Dist: opentelemetry-api <=2.0.0,>=1.24.0
Requires-Dist: opentelemetry-sdk <=2.0.0,>=1.24.0
Requires-Dist: orjson <4.0.0,>=3.8.5
Requires-Dist: prometheus-client ~=0.20.0
Requires-Dist: pyarrow >=16.1.0
Requires-Dist: readerwriterlock ~=1.0.9
Requires-Dist: structlog <25.0.0,>=24.0.0

# GoodData Flight Server

The GoodData Flight Server is an opinionated, pluggable Flight RPC server implementation.

It builds on top of the Flight RPC components provided by [PyArrow](https://pypi.org/project/pyarrow/) and
adds functions and capabilities typically needed when building production-ready
Flight RPC data services:

- A robust configuration system leveraging [Dynaconf](https://www.dynaconf.com/)
- Enablement of data service observability (logging, metrics, tracing)
- Health checking exposed via liveness and readiness endpoints
- Token-based authentication with pluggable token verification methods

In addition, the server comes with infrastructure that you can leverage
for building the data service functionality itself:

- A library for generating and serving Flights created by long-running tasks
- An extensible error-handling infrastructure that allows your service to
  provide error information in a structured manner

The code in this package is derived from our production codebase, where we run
and operate many different data services and where this infrastructure is proven
and battle-tested.

## Getting Started

The `gooddata-flight-server` package is like any other. You can install it
using `pip install gooddata-flight-server` or - more commonly - add it as a dependency
to your project.

The server takes care of all the boilerplate, and you take care of implementing
the Flight RPC methods - similarly to how you would implement them using PyArrow's Flight
server.

Here is a very simple example of a data service's Flight RPC methods implementation:

```python
import gooddata_flight_server as gf
import pyarrow.flight


class DataServiceMethods(gf.FlightServerMethods):
    """
    This example data service serves some sample static data. Any
    DoGet request will return that static data. All other Flight RPC
    methods are left unimplemented.

    Note how the class holds onto the `ServerContext` - the implementations
    will usually want to do this because the context contains additional
    dependencies such as:

    - Location to send out in FlightInfo
    - Health monitor that the implementation can use to indicate
      its status
    - Task executor to perform long-running tasks
    """

    StaticData = pyarrow.table({
        "col1": [1, 2, 3]
    })

    def __init__(self, ctx: gf.ServerContext) -> None:
        self._ctx = ctx

    def do_get(
        self,
        context: pyarrow.flight.ServerCallContext,
        ticket: pyarrow.flight.Ticket,
    ) -> pyarrow.flight.FlightDataStream:
        return pyarrow.flight.RecordBatchStream(self.StaticData)


@gf.flight_server_methods
def my_service(ctx: gf.ServerContext) -> gf.FlightServerMethods:
    """
    Factory function for the data service. It returns the implementation of the
    Flight RPC methods, which is then integrated into the server.

    The ServerContext passed in `ctx` allows you to access the available configuration
    and various useful server components.
    """
    return DataServiceMethods(ctx)


if __name__ == "__main__":
    # additional options & config files can be passed to the
    # create_server method; more on this later
    server = gf.create_server(my_service)
    server.start()

    # the main thread will block on this call
    #
    # SIGINT/SIGTERM causes graceful shutdown - the method will
    # exit once the server is stopped.
    server.wait_for_stop()
```

Notice the decorated `my_service` function. This is a factory for your data service's
Flight RPC methods. The server will call it at the appropriate time during startup.
It will pass you the full context available at that time, from which your code can access:

- the available configuration loaded using Dynaconf
- health-checking components
- components to use for running long-running tasks.

During startup, the server will register signal handlers for SIGINT and SIGTERM - it will
perform a graceful shutdown and tear everything down in the correct order when it receives them.

The server also comes with a simple CLI that you can use to start it up and load a particular
data service:

```shell
$ gooddata-flight-server start --methods-provider my_service.main
```

The CLI will import the `my_service.main` Python module and look for a function decorated
with `@flight_server_methods`. It will start the server and make it initialize your data service
implementation and integrate it into the Flight RPC server.

Without any configuration, the server will bind to `127.0.0.1:17001` and run without TLS and
without any authentication. It will not start the health-check or metrics endpoints and will not start
the OpenTelemetry exporters.

NOTE: the CLI also has other arguments that let you specify configuration files to load and
the logging configuration to use.

### Configuration

The server uses [Dynaconf](https://www.dynaconf.com/) for all its configuration. There are
many settings already in place that influence the server's configuration and behavior. Your data service
code can also leverage the Dynaconf config to configure itself: you can pass any number of configuration
files / env variables at startup; the server will load them all using Dynaconf and let your code
work with the Dynaconf structures.

We recommend checking out the Dynaconf documentation to learn more about how it works and
what its capabilities are. This text only highlights the most common usage.

The available server settings are documented in the [sample-config.toml](sample-config.toml).
You can take this file and use it as a template for your own configuration.

To use a configuration file during startup, start the server like this:

```shell
$ gooddata-flight-server start \
  --methods-provider my_service.main \
  --config server.config.toml
```

In case your service needs its own configuration, it is often a good idea to keep it in
a separate file and add that file at startup:

```shell
$ gooddata-flight-server start \
  --methods-provider my_service.main \
  --config server.config.toml my_service.config.toml
```

#### Environment variables

All settings that you can put into the config file can also be provided using environment
variables.

The server's Dynaconf integration is set up so that all environment variables are
expected to be prefixed with `GOODDATA_FLIGHT_`.

The environment variable naming convention is set up by Dynaconf and goes as follows:
`GOODDATA_FLIGHT_{SECTION}__{SETTING_NAME}`

Here, `SECTION` is, for example, `SERVER` for settings in the `[server]` section. For convenience, the [sample-config.toml](sample-config.toml)
indicates the full name of the respective environment variable in each setting's documentation.

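For illustration, here is how the convention maps a setting to its environment variable. The sketch mirrors the annotation style used in the sample config; the pairing of the `[server]` section with the `listen_host` setting is shown as an assumed example - check [sample-config.toml](sample-config.toml) for the authoritative names:

```toml
[server]
# env: GOODDATA_FLIGHT_SERVER__LISTEN_HOST
listen_host = "127.0.0.1"
```
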
#### Configuration for your service

If your service needs its own configuration, you should aim to have a TOML config file like this:

```toml
[my_service]
# env: GOODDATA_FLIGHT_MY_SERVICE__OPT1
opt1 = "value"
```

When you provide such a config file to the server, it will parse it and make its contents available in `ctx.settings`.
You can then access the value of this setting in your factory function, for example like this:

```python
import gooddata_flight_server as gf

_MY_CONFIG_SECTION = "my_service"


@gf.flight_server_methods
def my_service(ctx: gf.ServerContext) -> gf.FlightServerMethods:
    opt1 = ctx.settings.get(f"{_MY_CONFIG_SECTION}.opt1")

    # ... create and return the server methods ...
    ...
```

### Authentication

Currently, the server supports two modes of authentication:

- no authentication
- token-based authentication, which allows you to plug in custom token verification logic

The token verification method that comes built in with the server is a simple one: the token is
an arbitrary, secret value shared between server and client. You configure the list of valid secret
tokens at server start-up and then distribute these secret values to clients at your discretion.

By default, the server runs with no authentication. To turn on the token-based authentication,
you have to:

- Set the `authentication_method` setting to `token`.

  By default, the server will use the built-in token verification strategy
  called `EnumeratedTokenVerification`.

- Configure the secret tokens.

  You can do this using an environment variable: `GOODDATA_FLIGHT_ENUMERATED_TOKENS__TOKENS='["", ""]'`.
  Put the secret token(s) inside the quotes. Alternatively, you can code the tokens into a configuration file
  such as this:

  ```toml
  [enumerated_tokens]
  tokens = ["", ""]
  ```

  IMPORTANT: never commit secrets to your VCS.

With this setup in place, the server will expect the Flight clients to include a token in the
`authorization` header in the form `Bearer <token>`. The token must be present on every
call.

Here is an example of how to make a call that includes the `authorization` header:

```python
import pyarrow.flight


def example_call_using_tokens():
    opts = pyarrow.flight.FlightCallOptions(headers=[(b"authorization", b"Bearer <token>")])
    client = pyarrow.flight.FlightClient("grpc+tls://localhost:17001")

    for flight in client.list_flights(b"", opts):
        print(flight)
```

## Developer Manual

This part of the documentation explains additional capabilities of the server.

### Long-running tasks

Part of this package is a component that you can use to generate Flight data using long-running
tasks: the `TaskExecutor` component. The server will configure and create an instance of the TaskExecutor
at startup; your service can access it via the `ServerContext`.

The `TaskExecutor` implementation wraps a `ThreadPoolExecutor`: you can configure the number of
threads available for your tasks using the `task_threads` setting. Each active task will use one thread from
this pool. If all threads are occupied, the tasks will be queued using a FIFO strategy.

To use the `TaskExecutor`, you have to encapsulate the Flight data generation logic in a class
that extends the `Task` interface. In its `run()` method you implement the
algorithm that generates the data.

The `Task` interface comes with a contract for how your code should return the result (data) or raise
errors. The `TaskExecutor` will hold onto the results generated by your task and retain them for
a configured amount of time (see the `task_result_ttl_sec` setting). The infrastructure recognizes that
your task may generate a result that can be consumed either repeatedly (say, an Arrow Table) or just
once (say, a RecordBatchReader backed by a live stream).

Here is an example showing how to code a task, how to integrate its execution and how to
send out the data it generated:

```python
from typing import Any, Union

import pyarrow.flight

import gooddata_flight_server as gf


class MyServiceTask(gf.Task):
    def __init__(
        self,
        task_specific_payload: Any,
        cmd: bytes,
    ):
        super().__init__(cmd)

        self._task_specific_payload = task_specific_payload

    def run(self) -> Union[gf.TaskResult, gf.TaskError]:
        # tasks support cancellation; your code can check for
        # cancellation at any time; if the task was cancelled, the
        # method will raise an exception.
        #
        # do not forget to do cleanup on cancellation
        self.check_cancelled()

        # ... do whatever is needed to generate the data
        # (`some_method_to_generate_data` is a placeholder for your own logic)
        data: pyarrow.RecordBatchReader = some_method_to_generate_data()

        # when the data is ready, wrap it in a result that implements
        # the FlightDataTaskResult interface; there are built-in implementations
        # to wrap an Arrow Table or an Arrow RecordBatchReader.
        #
        # you can write your own result if you need special handling
        # of the result and/or resources bound to the result.
        return gf.FlightDataTaskResult.for_data(data)


class DataServiceMethods(gf.FlightServerMethods):
    def __init__(self, ctx: gf.ServerContext) -> None:
        self._ctx = ctx

    def _prepare_flight_info(self, task_result: gf.TaskExecutionResult) -> pyarrow.flight.FlightInfo:
        if task_result.error is not None:
            raise task_result.error.as_flight_error()

        if task_result.cancelled:
            raise gf.ErrorInfo.for_reason(
                gf.ErrorCode.COMMAND_CANCELLED,
                f"Service call was cancelled. Invocation task was: '{task_result.task_id}'.",
            ).to_server_error()

        result = task_result.result

        return pyarrow.flight.FlightInfo(
            schema=result.get_schema(),
            descriptor=pyarrow.flight.FlightDescriptor.for_command(task_result.cmd),
            endpoints=[
                pyarrow.flight.FlightEndpoint(
                    ticket=pyarrow.flight.Ticket(ticket=task_result.task_id.encode()),
                    locations=[self._ctx.location],
                )
            ],
            total_records=-1,
            total_bytes=-1,
        )

    def get_flight_info(
        self,
        context: pyarrow.flight.ServerCallContext,
        descriptor: pyarrow.flight.FlightDescriptor,
    ) -> pyarrow.flight.FlightInfo:
        cmd = descriptor.command
        # parse & validate the command
        some_parsed_command = ...

        # create your custom task; you will usually pass the parsed command
        # so that the task knows what to do. The 'raw' command is needed as well because
        # it should be bounced back in the FlightInfo.
        task = MyServiceTask(task_specific_payload=some_parsed_command, cmd=cmd)
        self._ctx.task_executor.submit(task)

        # wait for the task to complete
        result = self._ctx.task_executor.wait_for_result(task_id=task.task_id)

        # once the task completes, create the FlightInfo or raise an exception in
        # case the task failed. The ticket in the FlightInfo should contain the
        # task identifier.
        return self._prepare_flight_info(result)

    def do_get(
        self,
        context: pyarrow.flight.ServerCallContext,
        ticket: pyarrow.flight.Ticket,
    ) -> pyarrow.flight.FlightDataStream:
        # the caller comes to pick up the data; the ticket should be the task identifier
        task_id = ticket.ticket.decode()

        # this utility method on the base class takes care of everything needed
        # to correctly create a FlightDataStream from the task result (or die trying
        # in case the task result is no longer present, or the result indicates that
        # the task has failed)
        return self.do_get_task_result(context, self._ctx.task_executor, task_id)
```

### Custom token verification strategy

At the moment, the built-in token verification strategy supported by the server is the
most basic one. In cases where this strategy is not good enough, you can code your own
and plug it into the server.

The `TokenVerificationStrategy` interface sets the contract for your custom strategy. You
implement this class inside a Python module and then tell the server to load that
module.

For example, you create a module `my_service.auth.custom_token_verification` where you
implement the verification strategy:

```python
from typing import Any

import pyarrow.flight

import gooddata_flight_server as gf


class MyCustomTokenVerification(gf.TokenVerificationStrategy):
    def verify(self, call_info: pyarrow.flight.CallInfo, token: str) -> Any:
        # implement your arbitrary logic here;
        #
        # see method and class documentation to learn more
        raise NotImplementedError

    @classmethod
    def create(cls, ctx: gf.ServerContext) -> "gf.TokenVerificationStrategy":
        # the code has a chance to read any necessary settings from the `ctx.settings`
        # property and then use those values to construct the class
        #
        # see method and class documentation to learn more
        return MyCustomTokenVerification()
```

Then, you can use the `token_verification` setting to tell the server to look up
and load the token verification strategy from the `my_service.auth.custom_token_verification` module.
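
A minimal configuration sketch follows - the placement of both settings in the `[server]` section is an assumption here; verify the exact placement and names in [sample-config.toml](sample-config.toml):

```toml
# assumption: both settings live in the [server] section; see sample-config.toml
[server]
authentication_method = "token"
token_verification = "my_service.auth.custom_token_verification"
```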

Using a custom verification strategy, you can implement support for, say, JWT tokens, or look
up valid tokens in some database.

NOTE: As is, the server infrastructure does not concern itself with how the clients actually
obtain the valid tokens. At the moment, this is outside of this project's scope. You can distribute
tokens to clients using some procedure or implement custom APIs where clients have to log in
in order to obtain a valid token.

### Logging

The server comes with `structlog` installed by default. `structlog` is configured
so that it uses the Python stdlib logging backend. The `structlog` pipeline is set up so that:

- In dev mode, the logs are pretty-printed to the console (achieved by the `--dev-log` option of the server)
- In production deployments, the logs are serialized into JSON (using orjson) which is then written out.
  This is ideal for consumption by log aggregators.

By default, the stdlib loggers are configured using the [default.logging.ini](./gooddata_flight_server/server/default.logging.ini)
file. In the default setup, all INFO-level logs are emitted.

If you want to customize the logging configuration, then:

- make a copy of this file and tweak it as you need
- either pass the path to your config file to the `create_server` function or use the `--logging-config`
  argument of the CLI

The config file is a standard Python logging configuration file. You can learn about its intricacies
in the Python documentation.

NOTE: you typically do not want to touch the formatter settings inside the logging ini file - the
`structlog` library creates the entire log lines according to the deployment mode.

The use of `structlog` and loggers is fairly straightforward:

```python
import structlog

_LOGGER = structlog.get_logger("my_service")
_LOGGER.info("event-name", some_event_key="value_to_log")
```

#### Recommendations

Here are a few assorted recommendations based on our production experience with `structlog`:

- You can log complex objects such as lists, tuples, dicts and data classes without a problem
  - Be careful though: what serializes fine in the dev log may not always serialize
    using `orjson` in the production logs
- Always log exceptions using the special [exc_info](https://www.structlog.org/en/stable/exceptions.html) event key.
- Mind the cardinality of the logger instances. If you have a class of which there may be thousands of
  instances, then it is **not a good idea** to create a logger instance for each instance of your class - even
  if the logger name is the same; this is because each logger instance comes with memory overhead.
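
To illustrate the exception-logging recommendation above, here is a minimal sketch (the logger and event names are illustrative):

```python
import structlog

_LOGGER = structlog.get_logger("my_service")


def refresh_cache() -> None:
    try:
        ...  # work that may fail
    except Exception:
        # exc_info=True attaches the active exception to the event; the configured
        # pipeline then renders it - pretty-printed in dev mode, structured in the
        # JSON production logs
        _LOGGER.error("cache-refresh-failed", exc_info=True)
        raise
```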

### Prometheus Metrics

The server can be configured to start an HTTP endpoint that exposes the values of Prometheus
metrics. This is disabled by default.

To get started with Prometheus metrics, you need to:

- Set `metrics_host` and `metrics_port` (a configuration sketch follows below)

  - Check out the config file comments to learn more about these settings.
  - What you have to remember is that the Prometheus scraper is an external process that
    needs to reach the HTTP endpoint over the network.

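Here is a configuration sketch for the metrics endpoint - the `[server]` section and the port value are assumptions; consult the config file comments mentioned above for the authoritative placement:

```toml
# assumption: settings shown under [server]; the port value is illustrative
[server]
metrics_host = "0.0.0.0"   # must be reachable by the external Prometheus scraper
metrics_port = 17101
```
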
From then on, you can start using the Prometheus client to create various types of metrics. For example:

```python
from prometheus_client import Counter

# instantiate a counter; this registers it in the default registry
MY_COUNTER = Counter(
    "my_counter",
    "Fitting description of `my_counter`.",
)


def some_function():
    # ...
    MY_COUNTER.inc()
```

#### Recommendations

Here are a few assorted recommendations based on our production experience:

- You must avoid double-declaration of metrics. If you try to define a metric with the same
  identifier twice, the registration will fail.

- It is nice to declare all/most metrics in a single place. For example, create a `my_metrics.py`
  file and in it have a `MyMetrics` class with one static field per metric; see the sketch below.

  This approach leads to better 'discoverability' of the available metrics just by looking
  at the code. Using a class with a static field per metric in turn makes imports and autocomplete
  more convenient.
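
Here is a minimal sketch of that recommendation (the metric names and types are illustrative):

```python
from prometheus_client import Counter, Histogram


class MyMetrics:
    """Single place declaring all metrics of the service."""

    # total number of handled calls; registered once at import time
    CALLS = Counter(
        "my_service_calls_total",
        "Total number of Flight RPC calls handled by my_service.",
    )

    # duration of the calls, in seconds
    CALL_DURATION = Histogram(
        "my_service_call_duration_seconds",
        "Duration of Flight RPC calls handled by my_service.",
    )
```

Elsewhere in the code, you then simply call `MyMetrics.CALLS.inc()` or time a block with `MyMetrics.CALL_DURATION.time()`.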

### OpenTelemetry

The server can be configured to integrate with OpenTelemetry: it can start and auto-configure
OpenTelemetry exporters. It will also auto-fill the ResourceAttributes by doing discovery where possible.

See the `otel_*` options in the configuration files to learn more. In a nutshell, it
goes as follows (a configuration sketch follows the list):

- Configure which exporter to use via the `otel_exporter_type` setting.

  Nowadays, `otlp-grpc` or `otlp-http` is the usual choice.

  Depending on the exporter you use, you may/must specify additional, exporter-specific
  environment variables to configure the exporter. The supported environment variables
  are documented in the respective OpenTelemetry exporter package; i.e. they are not
  something special to GoodData's Flight Server.

  See the [official exporter documentation](https://opentelemetry-python.readthedocs.io/en/latest/exporter/otlp/otlp.html#module-opentelemetry.exporter.otlp.proto.grpc).

- Install the respective exporter package.

- Tweak the other `otel_*` settings: you must at minimum set the `otel_service_name`.

  The settings apart from `otel_service_name` will fall back to defaults.
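
A minimal configuration sketch - the `[server]` section placement is an assumption; the `otel_exporter_type` and `otel_service_name` names come from the settings described above:

```toml
# assumption: otel_* settings shown under [server]; check sample-config.toml
[server]
otel_exporter_type = "otlp-grpc"
otel_service_name = "my-data-service"
```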

To start tracing, you need to initialize a tracer. You can do so as follows:

```python
from opentelemetry import trace

MY_TRACER: trace.Tracer = trace.ProxyTracer("my_tracer")
```

Typically, you want to create one tracer instance for your entire data service and then
import that instance and use it wherever needed to create spans:

```python
from your_module_with_tracer import MY_TRACER


def some_function():
    # ... code
    with MY_TRACER.start_as_current_span("do_some_work") as span:
        # ... code
        pass
```

Note: there are many ways to instrument your code with spans. See the [OpenTelemetry documentation](https://opentelemetry.io/docs/languages/python/instrumentation/)
to find out more.

#### Recommendations

Here are a few assorted recommendations based on our production experience:

- Always use the `ProxyTracer`. The underlying initialization code done by the server
  will correctly set the actual tracer that will be called from the ProxyTracer.

  This way, if you turn off OpenTelemetry (by commenting out the `otel_exporter_type` setting or setting it
  to 'none'), the NoOpTracer will be injected under the covers and all the tracing code will
  be a no-op as well.

### Health Checks

The server comes with a basic health-checking infrastructure - this is especially useful
when deploying to environments (such as k8s) that monitor the health of your server and can automatically
restart it in case of problems.

When you configure the `health_check_host` (and optionally also `health_check_port`) setting, the
server will expose two HTTP endpoints:

- `/ready` - indicates whether the server is up and ready to serve requests

  The endpoint will respond with status `500` if the server is not ready. Otherwise, it will respond with
  `202`. The server is deemed ready when all its modules (which includes your FlexFunctions) are
  up and the Flight RPC server is 'unlocked' to handle requests.

- `/live` - indicates whether the server is still alive and can be used. The liveness is determined
  from the status of the modules.

  Each of the server's modules can report its status to a central health-checking service. If any of
  the modules is unhealthy, the whole server is unhealthy.

  Similarly to readiness, the server will respond with status `500` when not healthy. Otherwise, it
  will respond with status `202`.

Creating health checks is fairly straightforward:

- Your service's factory function receives the `ServerContext`

- The `ServerContext` contains the `health` property - which returns an instance of `ServerHealthMonitor`

- At this point, your code should hold onto / propagate the health monitor to any mission-critical
  modules / components used by your implementation

- The `ServerHealthMonitor` has a `set_module_status(module, status)` method - you can use this to indicate status

  - The module `name` argument to this method can be anything you see fit
  - The status is either `ModuleHealthStatus.OK` or `ModuleHealthStatus.NOT_OK`
  - When your module is `NOT_OK`, the entire server is `NOT_OK`
  - Usually, there is a grace period for which the server can be `NOT_OK`; after the time is up,
    the environment will restart the server
  - If you return your module back to `OK` status, the server returns to `OK` status as well - thus
    avoiding the automatic restart.

Here is an example component using health monitoring:

```python
import gooddata_flight_server as gf


class YourMissionCriticalComponent:
    """
    Let's say this component is used to perform some heavy lifting / important job.

    The component is created in your service's factory and is used during Flight RPC
    invocations. You propagate the `health` monitor to it at construction time.
    """

    def __init__(self, health: gf.ServerHealthMonitor) -> None:
        self._health = health

    def some_important_method(self):
        try:
            # this does some important work
            return
        except OSError:
            # it runs into some kind of unrecoverable error (OSError here is purely an example);
            # by setting the status to NOT_OK, your component indicates that it is unhealthy
            # and the /live endpoint will report the entire server as unhealthy.
            #
            # usually, the liveness checks have a grace period. if you set the module back
            # to `gf.ModuleHealthStatus.OK` everything turns healthy again. If the grace
            # period elapses, the server will usually be restarted by the environment.
            self._health.set_module_status("YourMissionCriticalComponent", gf.ModuleHealthStatus.NOT_OK)
            raise
```
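
To tie this together, here is a sketch of a factory function that wires the health monitor into such a component (the `DataServiceMethods` class is assumed to accept the component in addition to the context):

```python
import gooddata_flight_server as gf


@gf.flight_server_methods
def my_service(ctx: gf.ServerContext) -> gf.FlightServerMethods:
    # propagate the health monitor to the mission-critical component at construction time
    component = YourMissionCriticalComponent(health=ctx.health)

    # hypothetical methods class that uses the component when serving Flight RPC calls
    return DataServiceMethods(ctx, component)
```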

## Troubleshooting

### Clients cannot read data during the GetFlightInfo->DoGet flow; getting DNS errors

The root cause here is usually a misconfiguration of `listen_host` and `advertise_host`.

You must always remember that `GetFlightInfo` returns a `FlightInfo` that is used
by clients to obtain the data using `DoGet`. The `FlightInfo` contains the location(s)
that the client will connect to - they must be reachable by the client.

There are a few things to check:

1. Ensure that your service implementation correctly sets the Location in the FlightInfo

   Usually, you want to set the location to the value that your service implementation
   receives in the `ServerContext`. This location is prepared by the server and contains
   the value of `advertise_host` and `advertise_port`.

2. Ensure that the `advertise_host` is set correctly; mistakes can happen easily, especially
   in dockerized environments. The documentation of `listen_host` and `advertise_host`
   has additional detail.

To highlight the specifics of a Dockerized deployment (see the configuration sketch below):

- The server most often needs to listen on `0.0.0.0`
- The server must, however, advertise a different hostname/IP - one that is reachable from
  outside the container
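
A configuration sketch for such a Dockerized deployment - assuming these settings live in the `[server]` section and using an illustrative hostname:

```toml
# assumption: settings shown under [server]; the hostname and port are illustrative
[server]
listen_host = "0.0.0.0"                       # bind inside the container
advertise_host = "data-service.example.com"   # reachable from outside the container
advertise_port = 17001
```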

### The server's RSS keeps on growing; it looks like the server is leaking memory

This is usually observed on servers that are write-heavy: servers that handle a lot
of `DoPut` or `DoExchange` requests. When such servers run in environments that enforce
RSS limits, they can end up being killed.

Often, this is not a leak but a behavior of `malloc`. Even if you tell PyArrow to use
the `jemalloc` allocator, the underlying gRPC server used by Flight RPC will use `malloc`, and
by default `malloc` will take its time returning unused memory back to the system.

And since the gRPC server is responsible for allocating memory for the received Arrow data,
it is often the `DoPut` or `DoExchange` workloads that look like they are leaking memory.

If the RSS size is a problem (say you are running the service inside k8s with a memory limit), the
usual strategy is to:

1. Set / tweak malloc behavior using the `GLIBC_TUNABLES` environment variable; reduce
   the malloc trim threshold and possibly also reduce the number of malloc arenas.

   Here is a quite aggressive setting: `GLIBC_TUNABLES="glibc.malloc.trim_threshold=4:glibc.malloc.arena_max=2:glibc.malloc.tcache_count=0"`

2. Periodically call `malloc_trim` to poke malloc to trim any unneeded allocations and
   return them to the system.

   The GoodData Flight Server already implements periodic malloc trim. By default, the interval
   is set to `30 seconds`. You can change this interval using the `malloc_trim_interval_sec`
   setting.

Additionally, we recommend reading up on [Python Memory Management](https://realpython.com/python-memory-management/) -
especially the part about CPython not returning unused blocks back to the system. This may be another reason for
RSS growth - the tricky bit here being that it really depends on the object creation patterns in your service.