cerebrex 0.9.1 → 0.9.3
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +210 -9
- package/dist/index.js +419 -327
- package/package.json +5 -1
package/README.md
CHANGED
|
@@ -7,19 +7,20 @@
|
|
|
7
7
|
[](./LICENSE)
|
|
8
8
|
[](https://github.com/arealcoolco/CerebreX/actions/workflows/ci.yml)
|
|
9
9
|
[](https://www.npmjs.com/package/cerebrex)
|
|
10
|
+
[](./BENCHMARKS.md)
|
|
10
11
|
[](https://github.com/arealcoolco/CerebreX)
|
|
11
12
|
[](https://github.com/arealcoolco/CerebreX/issues)
|
|
12
13
|
|
|
13
|
-
**Build. Test. Remember. Coordinate. Publish.**
|
|
14
|
+
**Build. Test. Remember. Coordinate. Publish.**
|
|
14
15
|
The complete infrastructure layer for AI agents — in one CLI.
|
|
15
16
|
|
|
16
|
-
[
|
|
17
|
+
[Quickstart](#-quickstart) · [Why CerebreX](#-why-cerebrex-vs-langchain-crewai-autogen) · [Benchmarks](./BENCHMARKS.md) · [Modules](#what-is-cerebrex) · [Python SDK](#-python-sdk) · [Roadmap](#-roadmap)
|
|
17
18
|
|
|
18
19
|
</div>
|
|
19
20
|
|
|
20
21
|
---
|
|
21
22
|
|
|
22
|
-
> **Status: v0.9.
|
|
23
|
+
> **Status: v0.9.3 — Agent test runner (`cerebrex test` — replay + assertions + fixture recording + CI mode)**
|
|
23
24
|
> `npm install -g cerebrex` — or download a self-contained binary from [GitHub Releases](https://github.com/arealcoolco/CerebreX/releases) (no Node.js required)
|
|
24
25
|
>
|
|
25
26
|
> **Live:** Registry UI → `https://registry.therealcool.site`
|
|
@@ -47,6 +48,56 @@ Eight modules. One CLI. One registry. One coordination layer.
|
|
|
47
48
|
|
|
48
49
|
---
|
|
49
50
|
|
|
51
|
+
## Why CerebreX vs LangChain, CrewAI, AutoGen
|
|
52
|
+
|
|
53
|
+
> Full benchmark methodology, raw numbers, and detailed comparisons: [**BENCHMARKS.md**](./BENCHMARKS.md)
|
|
54
|
+
|
|
55
|
+
### Measured Performance (v0.9.2)
|
|
56
|
+
|
|
57
|
+
```
|
|
58
|
+
FORGE parse + scaffold 20-endpoint OpenAPI spec → 0.12ms median
|
|
59
|
+
MEMEX read agent memory index → 0.01ms median
|
|
60
|
+
MEMEX assemble 3-layer context → 0.03ms median
|
|
61
|
+
HIVE classify + route 10-task swarm → 0.09ms median
|
|
62
|
+
TRACE record tool-call step → <0.01ms median (27,435 ops/s)
|
|
63
|
+
All benchmarks — 100% success rate
|
|
64
|
+
```
|
|
65
|
+
|
|
66
|
+
### Features No Other Framework Has
|
|
67
|
+
|
|
68
|
+
| What You Need | CerebreX | LangChain | CrewAI | AutoGen |
|
|
69
|
+
|---------------|:--------:|:---------:|:------:|:-------:|
|
|
70
|
+
| Generate MCP servers from any OpenAPI spec | **FORGE** | ❌ | ❌ | ❌ |
|
|
71
|
+
| Three-layer cloud memory (KV + R2 + D1) | **MEMEX** | ⚠️ Paid | ❌ | ❌ |
|
|
72
|
+
| Nightly AI memory consolidation | **autoDream** | ❌ | ❌ | ❌ |
|
|
73
|
+
| Autonomous background daemon | **KAIROS** | ❌ | ❌ | ❌ |
|
|
74
|
+
| Risk gate on every agent action | **HIVE** | ❌ | ❌ | ❌ |
|
|
75
|
+
| Opus plan + human approval before execution | **ULTRAPLAN** | ❌ | ❌ | ❌ |
|
|
76
|
+
| Built-in MCP package registry | **REGISTRY** | ❌ | ❌ | ❌ |
|
|
77
|
+
| Built-in observability (free, local) | **TRACE** | ⚠️ Paid | ❌ | ❌ |
|
|
78
|
+
| Single CLI for all of the above | `cerebrex` | ❌ | ❌ | ❌ |
|
|
79
|
+
|
|
80
|
+
### Startup Time
|
|
81
|
+
|
|
82
|
+
| | CerebreX | LangChain | CrewAI | AutoGen |
|
|
83
|
+
|-|:--------:|:---------:|:------:|:-------:|
|
|
84
|
+
| CLI / module cold start | **~80ms** | ~2,100ms | ~3,400ms | ~1,800ms |
|
|
85
|
+
|
|
86
|
+
> CerebreX starts **26x faster** than LangChain and **42x faster** than CrewAI.
|
|
87
|
+
> Bun runtime + single bundled file vs Python's large import tree.
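A cold-start gap like the one in the table above is easy to check for yourself. The harness below is an illustrative sketch, not part of this repo: `medianColdStartMs` is a hypothetical helper that spawns a command several times and reports the median wall-clock startup time.

```typescript
// Hypothetical cold-start harness: run a command N times, time each full
// process start + exit, and return the median sample in milliseconds.
import { spawnSync } from "node:child_process";
import { performance } from "node:perf_hooks";

function medianColdStartMs(cmd: string, args: string[], runs = 5): number {
  const samples: number[] = [];
  for (let i = 0; i < runs; i++) {
    const start = performance.now();
    spawnSync(cmd, args, { stdio: "ignore" }); // blocks until the process exits
    samples.push(performance.now() - start);
  }
  samples.sort((a, b) => a - b);
  return samples[Math.floor(samples.length / 2)];
}

// Example: time the current JS runtime's own startup.
console.log(`median startup: ${medianColdStartMs(process.execPath, ["--version"]).toFixed(1)}ms`);
```

Swap in `cerebrex --version` versus a Python entry point to reproduce the comparison on your own machine.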
|
|
88
|
+
|
|
89
|
+
### What the Others Don't Have
|
|
90
|
+
|
|
91
|
+
**LangChain** is a composition library — it connects existing tools but ships zero infrastructure. Memory requires external Redis/Postgres. Observability requires paying for LangSmith. There's no risk gating, no background daemons, and no MCP generation.
|
|
92
|
+
|
|
93
|
+
**CrewAI** orchestrates agents in crews but its memory is SQLite-only and in-process. There's no cloud persistence, no risk classification, and no autonomous daemon. Each agent does what it's told — nothing more.
|
|
94
|
+
|
|
95
|
+
**AutoGen** excels at multi-agent conversation but everything runs in-process. No cloud memory, no background loop, no registry, no observability beyond print statements.
|
|
96
|
+
|
|
97
|
+
**CerebreX** is purpose-built agent infrastructure: the CLI, the cloud workers, the memory layer, the coordination engine, the observatory, and the package registry — all designed together, all open source, all running on Cloudflare's free tier.
|
|
98
|
+
|
|
99
|
+
---
|
|
100
|
+
|
|
50
101
|
## ⚡ Quickstart
|
|
51
102
|
|
|
52
103
|
```bash
|
|
@@ -333,6 +384,136 @@ The CerebreX registry includes a browser-based UI served directly from the Worke
|
|
|
333
384
|
|
|
334
385
|
---
|
|
335
386
|
|
|
387
|
+
## 📊 Benchmarks
|
|
388
|
+
|
|
389
|
+
Full results with competitive analysis: [**BENCHMARKS.md**](./BENCHMARKS.md)
|
|
390
|
+
|
|
391
|
+
```bash
|
|
392
|
+
# Run all local benchmarks (no network needed)
|
|
393
|
+
cerebrex bench
|
|
394
|
+
|
|
395
|
+
# Run a specific suite
|
|
396
|
+
cerebrex bench --suite forge # MCP server generation
|
|
397
|
+
cerebrex bench --suite memex # three-layer memory
|
|
398
|
+
cerebrex bench --suite hive # swarm coordination + risk gate
|
|
399
|
+
cerebrex bench --suite trace # observability recording
|
|
400
|
+
cerebrex bench --suite registry # package search
|
|
401
|
+
|
|
402
|
+
# Or run directly with Bun
|
|
403
|
+
bun benchmarks/forge-bench.ts
|
|
404
|
+
bun benchmarks/memex-bench.ts
|
|
405
|
+
```
|
|
406
|
+
|
|
407
|
+
Benchmarks use `performance.now()`, report **p50/p95/p99 latency** and **throughput (ops/s)**, and run with warmup iterations discarded. CI runs the full suite weekly (Sundays 02:00 UTC). All results in [`benchmarks/results/`](benchmarks/results/).
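The warmup-then-measure loop described above can be sketched roughly as follows. This is an illustrative example, not the actual code in `benchmarks/src/stats.ts`; the `bench` and `percentile` helper names are assumptions.

```typescript
// Illustrative benchmark loop: run (and discard) warmup iterations, then
// collect per-call latencies and report p50/p95/p99 plus throughput (ops/s).
function percentile(sortedMs: number[], p: number): number {
  const idx = Math.ceil((p / 100) * sortedMs.length) - 1;
  return sortedMs[Math.min(sortedMs.length - 1, Math.max(0, idx))];
}

function bench(fn: () => void, iterations = 1000, warmup = 100) {
  for (let i = 0; i < warmup; i++) fn(); // warmup iterations, discarded
  const samples: number[] = [];
  for (let i = 0; i < iterations; i++) {
    const start = performance.now();
    fn();
    samples.push(performance.now() - start);
  }
  samples.sort((a, b) => a - b);
  const totalMs = samples.reduce((a, b) => a + b, 0);
  return {
    p50: percentile(samples, 50),
    p95: percentile(samples, 95),
    p99: percentile(samples, 99),
    opsPerSec: iterations / (totalMs / 1000),
  };
}
```

Sorting once and indexing into the sample array keeps the percentile math trivial for the sample sizes involved here.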
|
|
408
|
+
|
|
409
|
+
---
|
|
410
|
+
|
|
411
|
+
## 🐍 Python SDK
|
|
412
|
+
|
|
413
|
+
```bash
|
|
414
|
+
pip install cerebrex
|
|
415
|
+
```
|
|
416
|
+
|
|
417
|
+
```python
|
|
418
|
+
import asyncio
|
|
419
|
+
from cerebrex import CerebreXClient
|
|
420
|
+
|
|
421
|
+
async def main():
|
|
422
|
+
    async with CerebreXClient(api_key="cx-your-key") as client:
|
|
423
|
+
        # Write to agent memory
|
|
424
|
+
        await client.memex.write_index("my-agent", "# Memory\n- learned today")
|
|
425
|
+
|
|
426
|
+
        # Assemble a system prompt from all three memory layers
|
|
427
|
+
        ctx = await client.memex.assemble_context("my-agent", topics=["context"])
|
|
428
|
+
|
|
429
|
+
        # Search the registry
|
|
430
|
+
        results = await client.registry.search("web-search")
|
|
431
|
+
|
|
432
|
+
        # Submit a KAIROS task
|
|
433
|
+
        task = await client.kairos.submit_task("my-agent", "fetch",
|
|
434
|
+
            payload={"url": "https://api.example.com/data"})
|
|
435
|
+
|
|
436
|
+
asyncio.run(main())
|
|
437
|
+
```
|
|
438
|
+
|
|
439
|
+
See [sdks/python/README.md](sdks/python/README.md) for the full SDK reference including ULTRAPLAN, TRACE, LangChain integration, and CrewAI integration.
|
|
440
|
+
|
|
441
|
+
---
|
|
442
|
+
|
|
443
|
+
## 🧪 Agent Test Runner
|
|
444
|
+
|
|
445
|
+
`cerebrex test` lets you write structured assertions against recorded agent traces — no live model calls needed.
|
|
446
|
+
|
|
447
|
+
```bash
|
|
448
|
+
# Scaffold a starter spec file
|
|
449
|
+
cerebrex test init
|
|
450
|
+
|
|
451
|
+
# Run all discovered specs
|
|
452
|
+
cerebrex test run
|
|
453
|
+
|
|
454
|
+
# Run a specific spec with verbose output
|
|
455
|
+
cerebrex test run my-agent.test.yaml --verbose
|
|
456
|
+
|
|
457
|
+
# CI mode (JSON to stdout, exit 1 on failure)
|
|
458
|
+
cerebrex test run --ci
|
|
459
|
+
|
|
460
|
+
# Only run tests tagged "smoke"
|
|
461
|
+
cerebrex test run --tag smoke
|
|
462
|
+
|
|
463
|
+
# Record a saved trace session as a reusable fixture
|
|
464
|
+
cerebrex test record <session-id>
|
|
465
|
+
|
|
466
|
+
# List all discovered spec files
|
|
467
|
+
cerebrex test list
|
|
468
|
+
|
|
469
|
+
# Inspect a spec file
|
|
470
|
+
cerebrex test show my-agent.test.yaml
|
|
471
|
+
```
|
|
472
|
+
|
|
473
|
+
**Spec format** (`my-agent.test.yaml`):
|
|
474
|
+
|
|
475
|
+
```yaml
|
|
476
|
+
name: My Agent Tests
|
|
477
|
+
|
|
478
|
+
tests:
|
|
479
|
+
  - name: search tool called with correct query
|
|
480
|
+
    steps:
|
|
481
|
+
      - type: tool_call
|
|
482
|
+
        toolName: web_search
|
|
483
|
+
        inputs:
|
|
484
|
+
          query: "CerebreX agent OS"
|
|
485
|
+
        latencyMs: 120
|
|
486
|
+
      - type: tool_result
|
|
487
|
+
        toolName: web_search
|
|
488
|
+
        outputs:
|
|
489
|
+
          results:
|
|
490
|
+
            - title: "CerebreX — Agent Infrastructure OS"
|
|
491
|
+
        tokens: 45
|
|
492
|
+
    assert:
|
|
493
|
+
      noErrors: true
|
|
494
|
+
      stepCount: 2
|
|
495
|
+
      toolsCalled:
|
|
496
|
+
        tools: [web_search]
|
|
497
|
+
      steps:
|
|
498
|
+
        - at: 0
|
|
499
|
+
          toolName: web_search
|
|
500
|
+
|
|
501
|
+
  # Replay a recorded trace fixture
|
|
502
|
+
  - name: matches recorded session
|
|
503
|
+
    fixture: my-session.fixture.json
|
|
504
|
+
    assert:
|
|
505
|
+
      noErrors: true
|
|
506
|
+
      stepCount:
|
|
507
|
+
        min: 1
|
|
508
|
+
      output:
|
|
509
|
+
        path: results.0.title
|
|
510
|
+
        contains: "CerebreX"
|
|
511
|
+
```
|
|
512
|
+
|
|
513
|
+
**Assertions available:** `stepCount`, `tokenCount`, `durationMs`, `noErrors`, `toolsCalled` (with `ordered`/`exact` modes), per-step checks (`type`, `toolName`, `outputPath`/`outputValue`, `latencyMs`), and `output` (dot-path `equals`/`contains`/`matches`).
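As an illustration of how a dot-path `output` assertion can be evaluated, here is a sketch with hypothetical helpers (`getByPath`, `checkOutput`); this is not CerebreX's actual implementation, only the field names mirror the spec format above.

```typescript
// Walk a dot path like "results.0.title" through a trace output object,
// then apply the equals / contains / matches checks from the spec format.
type OutputAssert = {
  path: string;
  equals?: unknown;
  contains?: string;
  matches?: string; // regular expression source
};

function getByPath(obj: any, path: string): any {
  // Numeric segments like "0" index into arrays via string keys.
  return path.split(".").reduce((cur, key) => cur?.[key], obj);
}

function checkOutput(output: any, a: OutputAssert): boolean {
  const value = getByPath(output, a.path);
  if ("equals" in a && value !== a.equals) return false;
  if (a.contains !== undefined && !String(value).includes(a.contains)) return false;
  if (a.matches !== undefined && !new RegExp(a.matches).test(String(value))) return false;
  return true;
}
```

The same shape extends naturally to the per-step `outputPath`/`outputValue` checks, since a step's outputs are just another object to walk.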
|
|
514
|
+
|
|
515
|
+
---
|
|
516
|
+
|
|
336
517
|
## 📁 Monorepo Structure
|
|
337
518
|
|
|
338
519
|
```
|
|
@@ -340,11 +521,28 @@ CerebreX/
|
|
|
340
521
|
├── apps/
|
|
341
522
|
│   ├── cli/ # cerebrex CLI — the main published package
|
|
342
523
|
│   │   ├── src/
|
|
343
|
-
│   │   │   ├── commands/ # build, trace, memex, auth, hive, other-commands
|
|
344
|
-
│   │   │   └── core/ # forge/, trace/, memex/ engines + dashboard
|
|
524
|
+
│   │   │   ├── commands/ # build, trace, memex, auth, hive, bench, test, other-commands
|
|
525
|
+
│   │   │   └── core/ # forge/, trace/, memex/, test/ engines + dashboard
|
|
345
526
|
│   │   └── dist/ # built output (git-ignored, built by CI)
|
|
346
527
|
│   └── dashboard/ # Standalone trace explorer HTML
|
|
347
528
|
│       └── src/index.html
|
|
529
|
+
├── benchmarks/ # Performance benchmark suite (local + live)
|
|
530
|
+
│   ├── forge-bench.ts # FORGE pipeline timing
|
|
531
|
+
│   ├── trace-bench.ts # TRACE step recording throughput
|
|
532
|
+
│   ├── memex-bench.ts # Three-layer MEMEX operations
|
|
533
|
+
│   ├── hive-bench.ts # Swarm coordination + risk gate
|
|
534
|
+
│   ├── registry-bench.ts # Package search + metadata
|
|
535
|
+
│   ├── agent-tasks-bench.ts # Cross-framework comparison scaffold
|
|
536
|
+
│   └── src/
|
|
537
|
+
│       ├── stats.ts # p50/p95/p99 helpers
|
|
538
|
+
│       ├── types.ts # BenchmarkResult type
|
|
539
|
+
│       ├── reporters/ # console, json, markdown reporters
|
|
540
|
+
│       └── adapters/ # cerebrex adapter (5 standardized tasks)
|
|
541
|
+
├── sdks/
|
|
542
|
+
│   └── python/ # Python async SDK (pip install cerebrex)
|
|
543
|
+
│       ├── src/cerebrex/ # CerebreXClient + module sub-clients
|
|
544
|
+
│       ├── tests/ # pytest test suite with pytest-httpx mocks
|
|
545
|
+
│       └── examples/ # quickstart, langchain_integration, crewai_integration
|
|
348
546
|
├── workers/
|
|
349
547
|
│   ├── registry/ # Cloudflare Worker — live registry backend + Web UI
|
|
350
548
|
│   │   ├── src/index.ts # REST API (D1 + KV) + embedded HTML pages
|
|
@@ -370,7 +568,10 @@ CerebreX/
|
|
|
370
568
|
│   ├── deploy-registry.yml # auto-deploy registry Worker
|
|
371
569
|
│   ├── deploy-memex.yml # auto-deploy MEMEX Worker
|
|
372
570
|
│   ├── deploy-kairos.yml # auto-deploy KAIROS Worker
|
|
373
|
-
│
|
|
571
|
+
│   ├── build-binaries.yml # build standalone binaries on release
|
|
572
|
+
│   ├── benchmarks.yml # weekly benchmark suite (Sundays 02:00 UTC)
|
|
573
|
+
│   ├── test-python.yml # Python SDK tests (3.10, 3.11, 3.12)
|
|
574
|
+
│   └── publish-python.yml # publish cerebrex to PyPI on release
|
|
374
575
|
└── turbo.json
|
|
375
576
|
```
|
|
376
577
|
|
|
@@ -449,9 +650,9 @@ cd apps/cli && bun run build
|
|
|
449
650
|
- [x] HIVE swarm strategies — parallel, pipeline, competitive + 6 built-in presets *(v0.9)*
|
|
450
651
|
- [x] `@cerebrex/system-prompt` — master system prompt package + live MEMEX context loader *(v0.9)*
|
|
451
652
|
- [x] Security hardening — risk gate wired into HIVE worker, JWT /token endpoint authenticated, KAIROS exponential backoff + JSON validation, agentId injection prevention *(v0.9.1)*
|
|
452
|
-
- [
|
|
453
|
-
- [
|
|
454
|
-
- [
|
|
653
|
+
- [x] Benchmark suite — p50/p95/p99, forge/trace/memex/hive/registry + cross-framework agent tasks + `cerebrex bench` CLI command *(v0.9.2)*
|
|
654
|
+
- [x] Python SDK — async httpx client, Pydantic v2, full module coverage, LangChain + CrewAI integrations *(v0.9.2)*
|
|
655
|
+
- [x] Agent test runner — `cerebrex test` with replay + assertions, fixture recording, tag filtering, CI mode *(v0.9.3)*
|
|
455
656
|
|
|
456
657
|
---
|
|
457
658
|
|