boundary-analyzer 0.2.0 (PyPI, tar.gz): version diff


Files changed (65)

boundary_analyzer-0.2.0/CHANGELOG.md ADDED

@@ -0,0 +1,34 @@
+# Changelog
+## v1.0.0 (2026-06-11)
+### Features
+- **SCOM pipeline** : computes Service-COhesion Metric from Jaeger traces (health filtering, endpoint extraction, DB table detection, endpoint-table mapping, threshold analysis, report generation)
+- **CLI tool** : `mba` / `boundary-analyzer` commands (`run`, `setup`, `dashboard`, `teastore`)
+- **Auto-instrumentation** : auto-detects Python microservices (FastAPI, Flask, Django), injects OpenTelemetry, collects traces via Jaeger, runs SCOM analysis
+- **TeaStore support** : Docker Compose deployment with OTel Java agent, traffic generator, trace exporter, full SCOM pipeline
+- **Dashboard** : interactive Dash web UI for SCOM results
+- **LLM analysis** (optional) : AI-powered narrative report via OpenRouter (Qwen), disabled by default
+### Improvements
+- Segment-based health matching (`HEALTH_KEYWORDS`) instead of fragile `endswith` — `/health/all`, `/auth/health`, `/ready/isready`, `/metrics` (via `http.target`) correctly filtered
+- `--skip-no-db-services` flag to exclude stateless services (proxy, orchestrator, etc.) from SCOM ranking
+- `run_teastore()` function extracted for programmatic access
+### Bug fixes
+- MissingGreenlet in classroom-repository (added `selectinload`)
+- datetime timezone-aware comparison in enrollment-service
+- `academic_year` int→str conversion in enrollment-service
+- Scope bug in `cleaned_parts` variable in CLI cleanup logic
+- SQLAlchemy duplicate instrumentation (event listeners only, no `SQLAlchemyInstrumentor`/`AsyncPGInstrumentor`)
+- `[project.scripts]` whitespace in pyproject.toml
+### Tests
+- 74 tests total (58 existing + 16 TeaStore)
+- TeaStore synthetic fixtures (persistence-service with 5 tables, auth-service without DB)
+- 3 test classes : TeaStorePipelineTest, TeaStoreSkipNoDbTest, TeaStoreNoFilterTest
+### Infrastructure
+- CI via GitHub Actions (`.github/workflows/ci.yml`) — Python 3.11 × 3.12
+- `mba` CLI alias alongside `boundary-analyzer`
+- Version bump to 0.2.0
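The segment-based health matching mentioned under Improvements can be illustrated with a minimal sketch. The name `HEALTH_KEYWORDS` comes from the changelog, but its contents and both helper functions below are assumptions, since the package's actual implementation is not shown in this part of the diff.

```python
from urllib.parse import urlsplit

# Assumed keyword set; the real HEALTH_KEYWORDS may contain other entries.
HEALTH_KEYWORDS = frozenset({"health", "ready", "metrics"})


def is_health_endpoint(target: str) -> bool:
    """Return True if any path segment equals a health keyword."""
    path = urlsplit(target).path
    segments = [segment.lower() for segment in path.split("/") if segment]
    return any(segment in HEALTH_KEYWORDS for segment in segments)


def is_health_endpoint_endswith(target: str) -> bool:
    """The fragile variant: matches only paths that end with /health."""
    return target.rstrip("/").endswith("/health")


for candidate in ("/health/all", "/auth/health", "/ready/isready", "/metrics"):
    print(candidate, is_health_endpoint(candidate), is_health_endpoint_endswith(candidate))
```

Matching whole segments rather than a suffix catches `/health/all` and `/metrics`, which the suffix check misses, while a path such as `/users/healthy-recipes` is still kept because no segment equals a keyword.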

boundary_analyzer-0.2.0/PKG-INFO ADDED

@@ -0,0 +1,50 @@
+Metadata-Version: 2.4
+Name: boundary-analyzer
+Version: 0.2.0
+Summary: SCOM-based microservice boundary analysis from Jaeger traces
+Author-email: Ray Ague <rayague03@gmail.com>
+License-Expression: MIT
+Project-URL: Homepage, https://github.com/rayague/measure-automation
+Project-URL: Repository, https://github.com/rayague/measure-automation
+Requires-Python: >=3.11
+Description-Content-Type: text/markdown
+Requires-Dist: requests>=2.31.0
+Requires-Dist: pandas>=2.2.0
+Requires-Dist: PyYAML>=6.0.1
+Requires-Dist: dash>=2.14.0
+Requires-Dist: plotly>=5.18.0
+# Changelog
+## v1.0.0 (2026-06-11)
+### Features
+- **SCOM pipeline** : computes Service-COhesion Metric from Jaeger traces (health filtering, endpoint extraction, DB table detection, endpoint-table mapping, threshold analysis, report generation)
+- **CLI tool** : `mba` / `boundary-analyzer` commands (`run`, `setup`, `dashboard`, `teastore`)
+- **Auto-instrumentation** : auto-detects Python microservices (FastAPI, Flask, Django), injects OpenTelemetry, collects traces via Jaeger, runs SCOM analysis
+- **TeaStore support** : Docker Compose deployment with OTel Java agent, traffic generator, trace exporter, full SCOM pipeline
+- **Dashboard** : interactive Dash web UI for SCOM results
+- **LLM analysis** (optional) : AI-powered narrative report via OpenRouter (Qwen), disabled by default
+### Improvements
+- Segment-based health matching (`HEALTH_KEYWORDS`) instead of fragile `endswith` — `/health/all`, `/auth/health`, `/ready/isready`, `/metrics` (via `http.target`) correctly filtered
+- `--skip-no-db-services` flag to exclude stateless services (proxy, orchestrator, etc.) from SCOM ranking
+- `run_teastore()` function extracted for programmatic access
+### Bug fixes
+- MissingGreenlet in classroom-repository (added `selectinload`)
+- datetime timezone-aware comparison in enrollment-service
+- `academic_year` int→str conversion in enrollment-service
+- Scope bug in `cleaned_parts` variable in CLI cleanup logic
+- SQLAlchemy duplicate instrumentation (event listeners only, no `SQLAlchemyInstrumentor`/`AsyncPGInstrumentor`)
+- `[project.scripts]` whitespace in pyproject.toml
+### Tests
+- 74 tests total (58 existing + 16 TeaStore)
+- TeaStore synthetic fixtures (persistence-service with 5 tables, auth-service without DB)
+- 3 test classes : TeaStorePipelineTest, TeaStoreSkipNoDbTest, TeaStoreNoFilterTest
+### Infrastructure
+- CI via GitHub Actions (`.github/workflows/ci.yml`) — Python 3.11 × 3.12
+- `mba` CLI alias alongside `boundary-analyzer`
+- Version bump to 0.2.0

boundary_analyzer-0.2.0/README.md ADDED

@@ -0,0 +1,269 @@
+# measure-automation
+A tool for analyzing microservice boundaries using runtime traces from OpenTelemetry/Jaeger. It computes the Service Cohesion Measure (SCOM) to flag low-cohesion services whose boundaries may be drawn incorrectly.
+## Prerequisites
+- Python 3.11+
+- Jaeger instance running (http://localhost:16686 by default)
+- Your services instrumented with OpenTelemetry
+## Installation
+```powershell
+python -m pip install -e .
+```
+Or install dependencies manually:
+```powershell
+python -m pip install requests pandas pyyaml dash plotly
+```
+## Configuration
+Edit `config/settings.yaml`:
+```yaml
+jaeger_base_url: "http://localhost:16686"
+service_name: "YOUR_SERVICE_NAME"  # Set to a real service from Jaeger
+lookback_minutes: 10
+limit_traces: 20
+output_dir: "data/raw/traces"
+# SCOM calculation method
+# - "paper": CI/CImax normalization from the paper (endpoints < 2 => 0)
+# - "weighted": weighted Jaccard (legacy)
+# - "simple": unweighted Jaccard (legacy)
+scom_method: "weighted"  # Options: "paper", "weighted" or "simple"
+table_weighting: true
+endpoint_weighting: true
+# Threshold method for suspicious services
+threshold_method: "percentile"  # Options: "percentile", "zscore", or "fixed"
+threshold_percentile: 25.0
+threshold_zscore: -1.5
+scom_threshold: 0.5
+```
+## Pipeline Steps
+Run the whole pipeline with one command (next section), or run Step 01 to Step 08 individually, in order.
+## One-command (Professional) Usage
+After installation, you can run the full pipeline with a single command.
+```powershell
+boundary-analyzer run
+```
+Equivalent:
+```powershell
+python -m boundary_analyzer run
+```
+### Options
+- **`--skip-collect`**
+  Skips Step 01 (Jaeger trace collection) and reuses the existing traces in the folder configured by `output_dir` in `config/settings.yaml`.
+- **`--dashboard`**
+  Launches the dashboard after the pipeline completes.
+- **`--data-dir <path>`**
+  Base directory containing `interim/` and `processed/` for the dashboard (default: `data`).
+- **`--dash-host <host>`**
+  Dashboard bind host (default: `127.0.0.1`). Use `0.0.0.0` to expose on LAN.
+- **`--dash-port <port>`**
+  Dashboard port (default: `8050`).
+- **`--settings <path>`**
+  Path to `settings.yaml`. This applies to all pipeline steps.
+  Note: for MongoDB spans, the tool counts collections as "tables" (for backward compatibility in CSV/report columns).
+### Examples
+Run everything (collect traces + compute results + report):
+```powershell
+boundary-analyzer run
+```
+Reuse traces already collected and open the dashboard:
+```powershell
+boundary-analyzer run --skip-collect --dashboard
+```
+Launch only the dashboard:
+```powershell
+boundary-analyzer dashboard
+```
+The dashboard is available by default at:
+`http://127.0.0.1:8050`
+Launch the dashboard for a different results folder:
+```powershell
+boundary-analyzer dashboard --data-dir .\demo-service\scom_report
+```
+## Setup mode (when your project has no Jaeger / OpenTelemetry)
+If your target project is not instrumented yet (no OpenTelemetry, no Jaeger), you can use the auto-setup command.
+It will:
+- detect the framework
+- install OpenTelemetry packages (unless you pass `--no-install`)
+- generate an instrumentation file for your app
+- start Jaeger (unless you pass `--no-jaeger`)
+- ask you to restart your app and send some traffic
+- collect traces and run the analysis
+```powershell
+boundary-analyzer setup --project-path .\path\to\your-service
+```
+Common options:
+- **`--framework <name>`**: force a framework instead of auto-detect
+- **`--service-name <name>`**: set the Jaeger service name
+- **`--no-jaeger`**: skip starting Jaeger (use if already running)
+- **`--no-install`**: skip installing OpenTelemetry packages
+- **`--jaeger-host <host>`**: Jaeger host (default: `localhost`)
+Example:
+```powershell
+boundary-analyzer setup --project-path .\demo-service --service-name demo-service
+```
+Run setup and open the dashboard on the generated results:
+```powershell
+boundary-analyzer setup --project-path .\demo-service --service-name demo-service --dashboard
+```
+### Step 01: Collect traces from Jaeger
+Collects trace data from the Jaeger API for a specific service.
+```powershell
+python .\src\boundary_analyzer\pipeline\step_01_collect_traces.py
+```
+**Output:** `data/raw/traces/jaeger_traces_{service}_{timestamp}.json`
+### Step 02: Read and flatten traces
+Reads all trace files and flattens spans into a CSV format.
+```powershell
+python .\src\boundary_analyzer\pipeline\step_02_read_traces.py
+```
+**Output:** `data/interim/spans.csv`
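A minimal sketch of the flattening in Step 02, assuming the trace files follow the JSON shape returned by the Jaeger query API (`spans`, `references`, `tags`, `processes`). The column names and both helper functions are hypothetical; the package's real `spans.csv` layout is not shown in this diff.

```python
import csv
import io

# Hypothetical column layout for the flattened CSV.
COLUMNS = [
    "trace_id", "span_id", "parent_span_id", "service",
    "operation", "http_method", "http_target", "db_statement",
]


def flatten_trace(trace: dict) -> list[dict]:
    """Turn one Jaeger trace into a list of flat span rows."""
    processes = trace.get("processes", {})
    rows = []
    for span in trace.get("spans", []):
        # The parent is taken from the first CHILD_OF reference, if any.
        parent_id = next(
            (ref["spanID"] for ref in span.get("references", [])
             if ref.get("refType") == "CHILD_OF"),
            "",
        )
        tags = {tag["key"]: tag["value"] for tag in span.get("tags", [])}
        process = processes.get(span.get("processID"), {})
        rows.append({
            "trace_id": span["traceID"],
            "span_id": span["spanID"],
            "parent_span_id": parent_id,
            "service": process.get("serviceName", ""),
            "operation": span.get("operationName", ""),
            "http_method": tags.get("http.method", ""),
            "http_target": tags.get("http.target", ""),
            "db_statement": tags.get("db.statement", ""),
        })
    return rows


def rows_to_csv(rows: list[dict]) -> str:
    """Serialize rows to CSV text (kept in memory here, not written to disk)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


example_trace = {
    "traceID": "t1",
    "spans": [
        {"traceID": "t1", "spanID": "a1", "operationName": "GET /orders/{id}",
         "references": [], "processID": "p1",
         "tags": [{"key": "http.method", "type": "string", "value": "GET"},
                  {"key": "http.target", "type": "string", "value": "/orders/42"}]},
        {"traceID": "t1", "spanID": "b2", "operationName": "SELECT",
         "references": [{"refType": "CHILD_OF", "traceID": "t1", "spanID": "a1"}],
         "processID": "p1",
         "tags": [{"key": "db.statement", "type": "string",
                   "value": "SELECT * FROM orders WHERE id = 42"}]},
    ],
    "processes": {"p1": {"serviceName": "orders"}},
}

print(rows_to_csv(flatten_trace(example_trace)))
```

Keeping the parent span ID as its own column is what later allows a database span to be traced back to the HTTP span that triggered it.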
+### Step 03: Find endpoints
+Extracts HTTP endpoints from spans (method + route normalization).
+```powershell
+python .\src\boundary_analyzer\pipeline\step_03_find_endpoints.py
+```
+**Output:** `data/interim/endpoints.csv`
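One way the "method + route normalization" of Step 03 could work is sketched below. The heuristics (numeric and UUID segments collapse to `{id}`) and both function names are assumptions for illustration, not the package's actual rules.

```python
import re

# Assumed heuristics for path segments that carry an identifier.
_UUID = re.compile(r"^[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$")
_NUMBER = re.compile(r"^\d+$")


def normalize_route(target: str) -> str:
    """Drop the query string and replace ID-like segments with {id}."""
    path = target.split("?", 1)[0]
    segments = []
    for segment in path.split("/"):
        if _NUMBER.match(segment) or _UUID.match(segment):
            segments.append("{id}")
        else:
            segments.append(segment)
    normalized = "/".join(segments).rstrip("/")
    return normalized or "/"


def endpoint_key(method: str, target: str) -> str:
    """Combine HTTP method and normalized route into one endpoint label."""
    return f"{method.upper()} {normalize_route(target)}"


print(endpoint_key("get", "/orders/42?expand=items"))
print(endpoint_key("DELETE", "/users/123e4567-e89b-12d3-a456-426614174000/"))
```

Without this step, `/orders/41` and `/orders/42` would count as two endpoints and distort any per-endpoint cohesion score.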
+### Step 04: Find database tables
+Extracts database table names from SQL operations in spans.
+```powershell
+python .\src\boundary_analyzer\pipeline\step_04_find_db_tables.py
+```
+**Output:** `data/interim/db_operations.csv`
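Table extraction from SQL statements, as described in Step 04, might look like the regex-based sketch below. The pattern and `extract_tables` are hypothetical and deliberately simple: subqueries, CTEs, and clauses such as `FOR UPDATE SKIP LOCKED` would need a real SQL parser.

```python
import re

# Keywords that are normally followed by a table name.
_TABLE_PATTERN = re.compile(
    r'\b(?:FROM|JOIN|INTO|UPDATE)\s+([A-Za-z_][\w.]*|"[^"]+"|`[^`]+`)',
    re.IGNORECASE,
)


def extract_tables(statement: str) -> list[str]:
    """Return the distinct table names referenced by a SQL statement."""
    tables: list[str] = []
    for match in _TABLE_PATTERN.finditer(statement):
        # Strip identifier quoting and fold case so names compare equal.
        name = match.group(1).strip('"`').lower()
        if name not in tables:
            tables.append(name)
    return tables


print(extract_tables(
    "SELECT o.id FROM orders o JOIN customers c ON c.id = o.customer_id"
))
```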
+### Step 05: Build endpoint-table mapping
+Links endpoints to database tables by walking the span parent chain.
+```powershell
+python .\src\boundary_analyzer\pipeline\step_05_build_mapping.py
+```
+**Output:** `data/interim/endpoint_table_map.csv`
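Walking the span parent chain, as Step 05 describes, can be sketched as follows. The row fields (`span_id`, `parent_span_id`, `endpoint`, `table`) and the function name are assumptions, and span IDs are assumed to be unique across the input.

```python
def build_endpoint_table_map(spans: list[dict]) -> dict[str, set[str]]:
    """Attach each DB span's table to the nearest ancestor that has an endpoint."""
    by_id = {span["span_id"]: span for span in spans}
    mapping: dict[str, set[str]] = {}
    for span in spans:
        table = span.get("table")
        if not table:
            continue
        current = span
        seen: set[str] = set()  # guards against malformed, cyclic parent links
        while current is not None and current["span_id"] not in seen:
            seen.add(current["span_id"])
            endpoint = current.get("endpoint")
            if endpoint:
                mapping.setdefault(endpoint, set()).add(table)
                break
            current = by_id.get(current.get("parent_span_id"))
    return mapping


example_spans = [
    {"span_id": "a", "parent_span_id": "", "endpoint": "GET /orders/{id}"},
    {"span_id": "b", "parent_span_id": "a"},  # internal span, no endpoint
    {"span_id": "c", "parent_span_id": "b", "table": "orders"},
    {"span_id": "d", "parent_span_id": "b", "table": "customers"},
    {"span_id": "e", "parent_span_id": "", "table": "audit_log"},  # no HTTP ancestor
]

print(build_endpoint_table_map(example_spans))
```

Database spans with no HTTP ancestor, such as span `e`, are dropped in this sketch; whether the package keeps or discards them is not stated in the README.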
+### Step 06: Compute SCOM scores
+Calculates the Service Cohesion Measure for each service using the method selected by `scom_method` (weighted Jaccard similarity with the settings shown above).
+```powershell
+python .\src\boundary_analyzer\pipeline\step_06_compute_scom.py
+```
+**Output:** `data/processed/service_scom.csv`
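A sketch of a weighted-Jaccard cohesion score, under stated assumptions: each endpoint is described by a table-to-weight mapping (for example access counts), similarity between two endpoints is the sum of per-table minima over the sum of per-table maxima, and the service score is the mean over all endpoint pairs. The exact formula used by the package is documented in `docs/research_method.md`, which is not part of this excerpt, so treat this as one plausible reading rather than the package's method.

```python
from itertools import combinations


def weighted_jaccard(a: dict[str, float], b: dict[str, float]) -> float:
    """Weighted Jaccard similarity: sum of minima over sum of maxima."""
    tables = set(a) | set(b)
    numerator = sum(min(a.get(t, 0.0), b.get(t, 0.0)) for t in tables)
    denominator = sum(max(a.get(t, 0.0), b.get(t, 0.0)) for t in tables)
    return numerator / denominator if denominator else 0.0


def service_cohesion(endpoint_tables: dict[str, dict[str, float]]) -> float:
    """Mean pairwise similarity; services with fewer than 2 endpoints score 0."""
    pairs = list(combinations(endpoint_tables.values(), 2))
    if not pairs:
        return 0.0
    return sum(weighted_jaccard(a, b) for a, b in pairs) / len(pairs)


usage = {
    "GET /orders/{id}": {"orders": 2, "customers": 1},
    "POST /orders": {"orders": 1},
    "GET /stats": {"events": 3},
}

print(round(service_cohesion(usage), 4))
```

In this example only the two `/orders` endpoints share a table, so two of the three pairs contribute zero and the score is low, which is the pattern the tool is meant to surface.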
+### Step 07: Rank and flag suspicious services
+Applies a threshold (percentile, Z-score, or fixed value) to flag services with low cohesion.
+```powershell
+python .\src\boundary_analyzer\pipeline\step_07_rank_and_flag.py
+```
+**Output:**
+- `data/processed/service_rank.csv`
+- `data/processed/suspicious_services.csv`
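The three threshold methods from `config/settings.yaml` could be applied roughly as sketched below. The parameter names mirror the settings keys, but the comparison rules (at or below the percentile cutoff, at or below the Z-score, strictly below the fixed threshold) and the helper names are assumptions.

```python
import statistics


def percentile(values: list[float], pct: float) -> float:
    """Linearly interpolated percentile for 0 <= pct <= 100."""
    ordered = sorted(values)
    rank = (pct / 100.0) * (len(ordered) - 1)
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (rank - lower) * (ordered[upper] - ordered[lower])


def flag_suspicious(
    scores: dict[str, float],
    method: str = "percentile",
    threshold_percentile: float = 25.0,
    threshold_zscore: float = -1.5,
    scom_threshold: float = 0.5,
) -> list[str]:
    """Return the names of services whose score falls under the threshold."""
    if not scores:
        return []
    values = list(scores.values())
    if method == "percentile":
        cutoff = percentile(values, threshold_percentile)
        return sorted(name for name, s in scores.items() if s <= cutoff)
    if method == "zscore":
        mean = statistics.fmean(values)
        stdev = statistics.pstdev(values)
        if stdev == 0:
            return []  # all scores identical, nothing stands out
        return sorted(
            name for name, s in scores.items()
            if (s - mean) / stdev <= threshold_zscore
        )
    if method == "fixed":
        return sorted(name for name, s in scores.items() if s < scom_threshold)
    raise ValueError(f"unknown threshold method: {method}")


scores = {"orders": 0.9, "billing": 0.8, "catalog": 0.7, "gateway": 0.6, "legacy": 0.1}
for method in ("percentile", "zscore", "fixed"):
    print(method, flag_suspicious(scores, method=method))
```

The percentile method always flags some services because it is relative to the population, whereas the fixed and Z-score methods can return an empty list when every service is cohesive.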
+### Step 08: Generate report
+Creates a Markdown report with analysis results.
+```powershell
+python .\src\boundary_analyzer\pipeline\step_08_make_report.py
+```
+**Output:** `reports/latest/report.md`
+## Dashboard
+Launch the interactive dashboard to visualize results:
+```powershell
+python .\src\boundary_analyzer\dashboard\app.py
+```
+The dashboard will be available at `http://localhost:8050`
+## Quick Start
+Run all steps in sequence:
+```powershell
+python .\src\boundary_analyzer\pipeline\step_01_collect_traces.py
+python .\src\boundary_analyzer\pipeline\step_02_read_traces.py
+python .\src\boundary_analyzer\pipeline\step_03_find_endpoints.py
+python .\src\boundary_analyzer\pipeline\step_04_find_db_tables.py
+python .\src\boundary_analyzer\pipeline\step_05_build_mapping.py
+python .\src\boundary_analyzer\pipeline\step_06_compute_scom.py
+python .\src\boundary_analyzer\pipeline\step_07_rank_and_flag.py
+python .\src\boundary_analyzer\pipeline\step_08_make_report.py
+python .\src\boundary_analyzer\dashboard\app.py
+```
+## Documentation
+See `docs/research_method.md` for detailed information about:
+- Core concepts (coupling, cohesion, wrong cuts)
+- Analysis method and limitations
+- SCOM calculation formula
+- Threshold selection methods
+- Future improvements

boundary_analyzer-0.2.0/pyproject.toml ADDED

@@ -0,0 +1,29 @@
+[project]
+name = "boundary-analyzer"
+version = "0.2.0"
+description = "SCOM-based microservice boundary analysis from Jaeger traces"
+readme = "CHANGELOG.md"
+license = "MIT"
+authors = [{ name = "Ray Ague", email = "rayague03@gmail.com" }]
+requires-python = ">=3.11"
+dependencies = [
+  "requests>=2.31.0",
+  "pandas>=2.2.0",
+  "PyYAML>=6.0.1",
+  "dash>=2.14.0",
+  "plotly>=5.18.0",
+]
+[project.urls]
+Homepage = "https://github.com/rayague/measure-automation"
+Repository = "https://github.com/rayague/measure-automation"
+[project.scripts]
+boundary-analyzer = "boundary_analyzer.cli:main"
+mba = "boundary_analyzer.cli:main"
+[tool.setuptools]
+package-dir = {"" = "src"}
+[tool.setuptools.packages.find]
+where = ["src"]

boundary_analyzer-0.2.0/setup.cfg ADDED

@@ -0,0 +1,4 @@
+[egg_info]
+tag_build =
+tag_date = 0

boundary_analyzer-0.2.0/src/boundary_analyzer/__init__.py ADDED

File without changes

boundary_analyzer-0.2.0/src/boundary_analyzer/__main__.py ADDED

@@ -0,0 +1,7 @@
+from __future__ import annotations
+from boundary_analyzer.cli import main
+if __name__ == "__main__":
+    raise SystemExit(main())
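Both the `[project.scripts]` entries and `__main__.py` call `boundary_analyzer.cli:main` and use its return value as the process exit code. The contents of `cli.py` are not shown in this excerpt, so the following is only a sketch of a compatible `main`, built with `argparse` from the subcommands and flags the README documents. The `--settings` default and the printed message are assumptions.

```python
from __future__ import annotations

import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser covering the `run` and `dashboard` flags listed in the README."""
    parser = argparse.ArgumentParser(prog="boundary-analyzer")
    subparsers = parser.add_subparsers(dest="command", required=True)

    run = subparsers.add_parser("run", help="run the full pipeline")
    run.add_argument("--skip-collect", action="store_true")
    run.add_argument("--dashboard", action="store_true")
    run.add_argument("--settings", default="config/settings.yaml")  # assumed default

    dashboard = subparsers.add_parser("dashboard", help="launch only the dashboard")

    # Dashboard options are shared by both subcommands.
    for sub in (run, dashboard):
        sub.add_argument("--data-dir", default="data")
        sub.add_argument("--dash-host", default="127.0.0.1")
        sub.add_argument("--dash-port", type=int, default=8050)
    return parser


def main(argv: list[str] | None = None) -> int:
    """Parse arguments and return an exit code suitable for SystemExit."""
    args = build_parser().parse_args(argv)
    print(f"would execute: {args.command}")  # placeholder for real dispatch
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
```

Returning an integer instead of calling `sys.exit` inside `main` keeps the function easy to call from tests and from the two console-script aliases.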

boundary_analyzer-0.2.0/src/boundary_analyzer/auto_setup/__init__.py ADDED

@@ -0,0 +1,4 @@
+# auto_setup/__init__.py
+# This file makes auto_setup a proper Python package.
+# You can import from it like:
+#   from boundary_analyzer.auto_setup.setup_instrumentation import detect_framework

boundary_analyzer-0.2.0/src/boundary_analyzer/auto_setup/django_wrapper.py ADDED

@@ -0,0 +1,48 @@
+"""
+otel_instrumentation.py – Django / Django REST Framework
+==========================================================
+This file sets up OpenTelemetry tracing for your Django application.
+HOW IT WORKS:
+  - Every HTTP request Django handles becomes a span.
+  - SQLAlchemy queries also become spans (Django ORM queries need a DB-driver instrumentor instead).
+  - All spans are sent to Jaeger.
+YOU DO NOT NEED TO EDIT THIS FILE.
+Add these 2 lines BEFORE django.setup() in your manage.py or wsgi.py:
+    from otel_instrumentation import init_tracing
+    init_tracing()
+"""
+from opentelemetry import trace
+from opentelemetry.sdk.trace import TracerProvider
+from opentelemetry.sdk.trace.export import BatchSpanProcessor
+from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
+from opentelemetry.sdk.resources import Resource
+from opentelemetry.instrumentation.django import DjangoInstrumentor
+from opentelemetry.instrumentation.sqlalchemy import SQLAlchemyInstrumentor
+def init_tracing():
+    """
+    Call this function ONCE, before django.setup() is called.
+    """
+    resource = Resource.create({"service.name": "{{SERVICE_NAME}}"})
+    exporter = OTLPSpanExporter(
+        endpoint="http://{{JAEGER_HOST}}:{{JAEGER_GRPC_PORT}}"
+    )
+    provider = TracerProvider(resource=resource)
+    provider.add_span_processor(BatchSpanProcessor(exporter))
+    trace.set_tracer_provider(provider)
+    # Automatically trace every Django view / URL
+    DjangoInstrumentor().instrument()
+    # Trace SQLAlchemy queries; the Django ORM does not go through SQLAlchemy
+    SQLAlchemyInstrumentor().instrument()
+    print(f"[OTel] Tracing enabled → Jaeger at {{JAEGER_HOST}}:{{JAEGER_GRPC_PORT}}")

boundary_analyzer-0.2.0/src/boundary_analyzer/auto_setup/djangorest_wrapper.py ADDED

@@ -0,0 +1,44 @@
+"""
+otel_instrumentation.py – Django REST Framework
+=================================================
+Same setup as Django — DRF sits on top of Django,
+so we instrument at the Django level.
+Add these 2 lines BEFORE django.setup():
+    from otel_instrumentation import init_tracing
+    init_tracing()
+"""
+# Django REST Framework uses the same Django instrumentation
+from opentelemetry import trace
+from opentelemetry.sdk.trace import TracerProvider
+from opentelemetry.sdk.trace.export import BatchSpanProcessor
+from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
+from opentelemetry.sdk.resources import Resource
+from opentelemetry.instrumentation.django import DjangoInstrumentor
+from opentelemetry.instrumentation.sqlalchemy import SQLAlchemyInstrumentor
+def init_tracing():
+    """
+    Initialize tracing for Django REST Framework.
+    DRF is built on top of Django, so we instrument Django directly.
+    """
+    resource = Resource.create({"service.name": "{{SERVICE_NAME}}"})
+    exporter = OTLPSpanExporter(
+        endpoint="http://{{JAEGER_HOST}}:{{JAEGER_GRPC_PORT}}"
+    )
+    provider = TracerProvider(resource=resource)
+    provider.add_span_processor(BatchSpanProcessor(exporter))
+    trace.set_tracer_provider(provider)
+    # Instrument every DRF/Django view automatically
+    DjangoInstrumentor().instrument()
+    # Instrument SQLAlchemy queries; the Django ORM does not go through SQLAlchemy
+    SQLAlchemyInstrumentor().instrument()
+    print(f"[OTel] Tracing enabled → Jaeger at {{JAEGER_HOST}}:{{JAEGER_GRPC_PORT}}")

boundary_analyzer-0.2.0/src/boundary_analyzer/auto_setup/fastapi_wrapper.py ADDED

@@ -0,0 +1,59 @@
+"""
+otel_instrumentation.py – FastAPI
+===================================
+This file sets up OpenTelemetry tracing for your FastAPI application.
+HOW IT WORKS:
+  - Every HTTP request becomes a span (recorded event).
+  - Every database query (via SQLAlchemy) also becomes a span.
+  - All spans are sent to Jaeger so we can see and analyze them.
+YOU DO NOT NEED TO EDIT THIS FILE.
+Just call init_tracing() at the top of your main.py (before app = FastAPI()).
+"""
+from opentelemetry import trace
+from opentelemetry.sdk.trace import TracerProvider
+from opentelemetry.sdk.trace.export import BatchSpanProcessor
+from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
+from opentelemetry.sdk.resources import Resource
+from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor
+from opentelemetry.instrumentation.sqlalchemy import SQLAlchemyInstrumentor
+def init_tracing(app=None):
+    """
+    Call this function ONCE at the start of your app.
+    If you pass your FastAPI app object, HTTP routes will be traced automatically.
+    Example:
+        app = FastAPI()
+        init_tracing(app)
+    If you do not pass the app, FastAPI is instrumented globally instead,
+    so call init_tracing() before creating the app.
+    """
+    # Tell Jaeger the name of this service
+    resource = Resource.create({"service.name": "{{SERVICE_NAME}}"})
+    # Send traces to Jaeger via gRPC
+    exporter = OTLPSpanExporter(
+        endpoint="http://{{JAEGER_HOST}}:{{JAEGER_GRPC_PORT}}"
+    )
+    provider = TracerProvider(resource=resource)
+    provider.add_span_processor(BatchSpanProcessor(exporter))
+    trace.set_tracer_provider(provider)
+    # Instrument database queries automatically
+    SQLAlchemyInstrumentor().instrument()
+    # Instrument the FastAPI app if provided
+    if app is not None:
+        FastAPIInstrumentor.instrument_app(app)
+    else:
+        # Patches fastapi.FastAPI so every app created after this call is instrumented
+        FastAPIInstrumentor().instrument()
+    print(f"[OTel] Tracing enabled → Jaeger at {{JAEGER_HOST}}:{{JAEGER_GRPC_PORT}}")

boundary_analyzer-0.2.0/src/boundary_analyzer/auto_setup/flask_wrapper.py ADDED

@@ -0,0 +1,53 @@
+"""
+otel_instrumentation.py – Flask
+================================
+This file sets up OpenTelemetry tracing for your Flask application.
+HOW IT WORKS:
+  - Every HTTP request your app receives becomes a "span" (a recorded event).
+  - Every database query also becomes a span.
+  - All spans are sent to Jaeger so we can see them.
+YOU DO NOT NEED TO EDIT THIS FILE.
+Just call init_tracing() at the top of your main app file.
+"""
+from opentelemetry import trace
+from opentelemetry.sdk.trace import TracerProvider
+from opentelemetry.sdk.trace.export import BatchSpanProcessor
+from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
+from opentelemetry.sdk.resources import Resource
+from opentelemetry.instrumentation.flask import FlaskInstrumentor
+from opentelemetry.instrumentation.sqlalchemy import SQLAlchemyInstrumentor
+def init_tracing():
+    """
+    Call this function ONCE at the start of your app.
+    It configures OpenTelemetry to send traces to Jaeger.
+    """
+    # 'resource' tells Jaeger which service these traces belong to
+    resource = Resource.create({"service.name": "{{SERVICE_NAME}}"})
+    # 'exporter' sends the traces to Jaeger over gRPC
+    exporter = OTLPSpanExporter(
+        endpoint="http://{{JAEGER_HOST}}:{{JAEGER_GRPC_PORT}}"
+    )
+    # 'provider' is the main tracing engine
+    provider = TracerProvider(resource=resource)
+    # 'processor' batches spans and sends them to the exporter
+    provider.add_span_processor(BatchSpanProcessor(exporter))
+    # Register the provider globally
+    trace.set_tracer_provider(provider)
+    # Automatically trace every Flask HTTP request
+    FlaskInstrumentor().instrument()
+    # Automatically trace every SQLAlchemy database query
+    SQLAlchemyInstrumentor().instrument()
+    print(f"[OTel] Tracing enabled → Jaeger at {{JAEGER_HOST}}:{{JAEGER_GRPC_PORT}}")
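The four wrapper files above are templates: `{{SERVICE_NAME}}`, `{{JAEGER_HOST}}`, and `{{JAEGER_GRPC_PORT}}` have to be replaced before the generated `otel_instrumentation.py` is usable. The code that performs the substitution is not shown in this excerpt, so the sketch below is a hypothetical renderer based on plain string replacement. The port value 4317 is the standard OTLP gRPC port and is an assumption about what the tool fills in.

```python
def render_template(template: str, values: dict[str, str]) -> str:
    """Replace each {{NAME}} placeholder with its value."""
    rendered = template
    for name, value in values.items():
        rendered = rendered.replace("{{" + name + "}}", value)
    return rendered


# Shortened stand-in for one of the wrapper templates.
TEMPLATE = '''service_name = "{{SERVICE_NAME}}"
endpoint = "http://{{JAEGER_HOST}}:{{JAEGER_GRPC_PORT}}"
print(f"[OTel] Tracing enabled -> Jaeger at {{JAEGER_HOST}}:{{JAEGER_GRPC_PORT}}")
'''

rendered = render_template(
    TEMPLATE,
    {
        "SERVICE_NAME": "demo-service",
        "JAEGER_HOST": "localhost",
        "JAEGER_GRPC_PORT": "4317",  # assumed: default OTLP gRPC port
    },
)

# Confirm the generated file is valid Python before writing it anywhere.
compile(rendered, "otel_instrumentation.py", "exec")
print(rendered)
```

Because `{{` and `}}` are escaped braces inside an f-string, an unrendered template would still run and print the literal text `{JAEGER_HOST}:{JAEGER_GRPC_PORT}`, so checking that no `{{` remains after rendering is a cheap safeguard.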