@flying-pillow/manager 0.1.0-beta.10
- package/README.md +132 -0
- package/dist/assets/application/docker/docker-compose.gpu.yml +16 -0
- package/dist/assets/application/docker/docker-compose.yml +579 -0
- package/dist/assets/application/migrations/0.0.0/entity_model.json +12589 -0
- package/dist/assets/application/migrations/0.1.0-beta.0/entity_model.json +12589 -0
- package/dist/assets/application/migrations/0.1.0-beta.1/entity_model.json +12589 -0
- package/dist/assets/application/migrations/0.1.0-beta.2/entity_model.json +18600 -0
- package/dist/assets/application/migrations/0.1.0-beta.5/entity_model.json +18600 -0
- package/dist/assets/application/migrations/0.1.0-beta.8/entity_model.json +18600 -0
- package/dist/assets/application/migrations/0.1.0-beta.9/entity_model.json +18600 -0
- package/dist/assets/application/observability/README.md +130 -0
- package/dist/assets/application/observability/grafana/provisioning/dashboards/celery_dashboard.json +1045 -0
- package/dist/assets/application/observability/grafana/provisioning/dashboards/dashboard.yml +13 -0
- package/dist/assets/application/observability/grafana/provisioning/datasources/datasources.yml +23 -0
- package/dist/assets/application/observability/otel-collector/otel-collector-config.yml +77 -0
- package/dist/assets/application/observability/prometheus/prometheus.yml +19 -0
- package/dist/assets/application/observability/surrealist/instance.json +24 -0
- package/dist/assets/gateway/docker/docker-compose.gateway.yml +127 -0
- package/dist/assets/gateway/terminal/Dockerfile +19 -0
- package/dist/assets/gateway/terminal/entrypoint.sh +28 -0
- package/dist/cli.js +36858 -0
- package/package.json +52 -0
# Flying-Pillow Observability Stack

This document outlines the architecture and configuration of the observability stack for the Flying-Pillow monorepo. The stack is designed to provide comprehensive logging, tracing, and metrics for all services, both TypeScript and Python-based.

## 1. Core Philosophy

The observability stack adheres to the following principles:

- **Centralized Collection:** All telemetry data (logs, traces, metrics) is sent to a central OpenTelemetry (OTel) Collector.
- **Standardized Format:** All services emit telemetry in the OpenTelemetry Protocol (OTLP) format.
- **Schema-Driven:** Telemetry data structures are defined in Zod (TypeScript) and Pydantic (Python) schemas, ensuring cross-language consistency. Attribute naming follows the `snake_case` convention to align with OpenTelemetry Semantic Conventions.
- **Correlation:** Logs are automatically correlated with traces via `trace_id` and `span_id` for seamless debugging across services.
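
Concretely, a correlated log record might look roughly like this (all field names and values here are illustrative; the authoritative shape is defined by the Zod/Pydantic schemas):

```json
{
  "severity_text": "ERROR",
  "message": "Failed to process upload",
  "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
  "span_id": "00f067aa0ba902b7",
  "organizationId": "org_123",
  "role": "admin"
}
```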

## 2. Architecture Overview

The stack consists of application services, a central OTel Collector, and dedicated backend storage systems for each telemetry signal. All components run in a shared Docker network.

```mermaid
graph TD
    subgraph "Application Services"
        A[app SvelteKit] --> C
        W[wss Node.js] --> C
        H[hub FastAPI] --> C
        P[processor Python/Celery] --> C
    end

    subgraph "Observability Pipeline"
        C(OTel Collector)
        C -- Traces --> J[Jaeger]
        C -- Metrics --> PR[Prometheus]
        C -- Logs --> L[Loki]
    end

    subgraph "Visualization"
        G[Grafana]
        G --> J
        G --> PR
        G --> L
    end

    style C fill:#f9f,stroke:#333,stroke-width:2px
```

## 3. Data Flow and Configuration

### 3.1. Tracing

- **SDKs:** Services are instrumented using the appropriate OpenTelemetry SDK (`@opentelemetry/sdk-node` for TS, `opentelemetry-sdk` for Python).
- **Protocol:** OTLP/HTTP.
- **Destination:** `otel-collector:4318/v1/traces`.
- **Collector Pipeline:** Receives traces, batches them, and exports them to Jaeger.
- **Backend:** Jaeger (in-memory `all-in-one` image for development).
- **Visualization:** Grafana (Jaeger data source).
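
The same collector endpoint can also be supplied without code changes through the standard environment variables defined by the OpenTelemetry specification. A minimal sketch (the service name `hub` is just an illustrative value):

```shell
# Standard OpenTelemetry SDK configuration variables (from the OTel spec).
# The service name below is an illustrative example.
export OTEL_SERVICE_NAME="hub"
export OTEL_EXPORTER_OTLP_PROTOCOL="http/protobuf"  # OTLP/HTTP
export OTEL_EXPORTER_OTLP_ENDPOINT="http://otel-collector:4318"
# SDKs derive the signal-specific path themselves (e.g. /v1/traces).
```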

### 3.2. Metrics

- **SDKs:** Services use the OTel SDKs to collect runtime and custom metrics.
- **Protocol:** OTLP/HTTP.
- **Destination:** `otel-collector:4318/v1/metrics`.
- **Collector Pipeline:** Receives metrics, batches them, and exports them to a Prometheus-compatible endpoint.
- **Backend:** Prometheus (with persistent volume storage).
- **Visualization:** Grafana (Prometheus data source).

### 3.3. Logging

The logging pipeline is the most sophisticated, designed to produce structured, labeled, and correlated logs.

#### Data Structure

Log records are structured according to schemas defined in `apps/common` and `services/common`. The key components are:

- **Trace Context:** Ambient context for the request (`trace_id`, `organizationId`, `userId`, `role`). Injected automatically by the logger.
- **Log Metadata:** Developer-provided context (`message`, `function`, `args`, `data`, `error_data`).
- **Convention:** The public logger API accepts a developer-friendly format, which is transformed internally into a serializable, schema-compliant object before being sent to the pipeline.
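
The Error-to-`error_data` transformation can be sketched as follows. This is a simplified stand-in, not the actual `ServerLogger` implementation, and the `ErrorData` field names are assumptions:

```typescript
// Simplified sketch of turning a live Error into a plain, serializable
// object (live Errors serialize to "{}" under JSON.stringify). The real
// ServerLogger's schema may differ.
interface ErrorData {
  name: string;
  message: string;
  stack?: string;
}

function toErrorData(err: unknown): ErrorData {
  if (err instanceof Error) {
    return { name: err.name, message: err.message, stack: err.stack };
  }
  // Non-Error throwables are stringified so they still serialize cleanly.
  return { name: "UnknownError", message: String(err) };
}

const error_data = toErrorData(new Error("upload failed"));
// error_data is now safe to attach to a log record and send over OTLP.
```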

#### Detailed Data Flow

```mermaid
sequenceDiagram
    participant App as Application Code
    participant Logger as Type-Safe Logger
    participant Winston as Winston.js
    participant OTel as OTel Collector
    participant Loki
    participant Grafana

    App->>Logger: logger.error("Message", {error: new Error(), ...});
    Logger->>Logger: Transforms live Error into serializable `error_data` object.
    Logger->>Winston: .log("Message", {..., error_data: {...}});
    Winston->>Winston: Injects context (trace_id, org_id, etc.).
    Winston->>OTel: Sends final structured log object via OTLP.

    OTel->>OTel: **transform processor** copies `organizationId`, `role`, etc. to resource attributes.
    OTel->>OTel: **resource processor** adds `loki.resource.labels` hint.
    OTel->>Loki: Exports log, creating indexed labels from hints.

    Grafana->>Loki: Queries with `{service_name="app", role="admin"}`.
    Loki-->>Grafana: Returns matching log streams.
    Grafana->>Grafana: **| json** parser processes log body.
    Grafana->>Grafana: **| json attributes="attributes"** parses nested attributes.
    Grafana->>Grafana: **| line_format** creates clean summary view.
    Grafana->>Grafana: Displays interactive, nested details on expand.
```

#### Key Configurations

- **`apps/common/src/system/logging/ServerLogger.ts`:** The TypeScript logger implementation. Responsible for transforming the developer-provided log metadata into a serializable format that avoids conflicts with the pipeline.
- **`observability/otel-collector/otel-collector-config.yml`:** The heart of the pipeline.
  - Uses a `transform` processor to copy log record attributes (e.g., `organizationId`, `role`) to the resource attributes.
  - Uses a `resource` processor to add the `loki.resource.labels` hint, which tells the Loki exporter to create indexed labels from all the enriched resource attributes.
  - This ensures logs are filterable in Loki by `service_name`, `organizationId`, `role`, `event`, etc.
- **Grafana Dashboard:**
  - Uses a multi-stage parser query (`... | json | json attributes="attributes"`) to correctly handle the nested log structure produced by the collector.
  - Uses `line_format` to create a clean, human-readable summary for each log row.
  - Uses "Derived Fields" in the Loki data source settings to create a clickable link from the `traceid` field to the corresponding trace in Jaeger.
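
Putting the collector pieces together, the pipeline described above could look roughly like this. This is an illustrative sketch, not the shipped config file; exact processor statements, exporter names, and backend addresses depend on the collector-contrib version in use:

```yaml
receivers:
  otlp:
    protocols:
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch: {}
  transform/logs:
    log_statements:
      - context: log
        statements:
          # Promote selected record attributes to resource attributes.
          - set(resource.attributes["organizationId"], attributes["organizationId"])
          - set(resource.attributes["role"], attributes["role"])
  resource:
    attributes:
      # Hint for the Loki exporter: turn these resource attributes
      # into indexed labels.
      - key: loki.resource.labels
        value: service.name, organizationId, role
        action: insert

exporters:
  otlp/jaeger:
    endpoint: jaeger:4317
    tls:
      insecure: true
  prometheus:
    endpoint: 0.0.0.0:8889
  loki:
    endpoint: http://loki:3100/loki/api/v1/push

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp/jaeger]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [prometheus]
    logs:
      receivers: [otlp]
      processors: [transform/logs, resource, batch]
      exporters: [loki]
```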

## 4. How to Use

### For Developers

- **Logging:** Import the singleton `logger` instance. Call methods like `logger.info(message, meta)` and `logger.error(message, meta)`. Follow the type-safe `LogMetaType` and `ErrorMetaType` schemas to provide rich context.
- **Context:** The `ServerContext` is automatically propagated and its data is injected into all logs. You do not need to manually add `trace_id` or `organizationId` to log calls.
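
A typical call site might look like this. The inline logger stub only stands in for the real singleton so the snippet is self-contained, and the meta field values are hypothetical:

```typescript
// Stand-in for the real singleton logger; the actual instance is imported
// from the common package and sends records through Winston/OTLP.
const logger = {
  error(message: string, meta: Record<string, unknown>): void {
    console.error(message, meta);
  },
};

// trace_id / organizationId come from the ambient ServerContext, so the
// call site supplies only developer-facing metadata (values hypothetical).
logger.error("Upload processing failed", {
  function: "processUpload",
  args: { uploadId: "up_42" },
  error: new Error("checksum mismatch"),
});
```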

### For Operations / Debugging

- **Grafana:** The primary interface for visualization.
  - Use the "Flying-Pillow Application Logs" dashboard.
  - Use the dropdowns to filter logs by `service_name`, `organizationId`, `role`, etc.
  - Expand log rows to see full, interactive context, including stack traces and function arguments.
  - Click the link on the `traceid` field to pivot directly to the full distributed trace in Jaeger.
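
For ad-hoc queries in Explore, the dashboard's query pattern can be reused directly (label values and the `line_format` fields below are illustrative):

```logql
{service_name="app", organizationId="org_123"}
  | json
  | json attributes="attributes"
  | line_format "{{.severity_text}} {{.message}}"
```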