contextro 0.0.1__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (213) hide show
  1. contextro-0.0.1/.agent/skills/applied-ai-engineer/SKILL.md +174 -0
  2. contextro-0.0.1/.agent/skills/applied-ai-engineer/evals/evals.json +286 -0
  3. contextro-0.0.1/.agent/skills/applied-ai-engineer/references/engineering-patterns.md +59 -0
  4. contextro-0.0.1/.agent/skills/applied-ai-engineer/references/eval-rubric.md +18 -0
  5. contextro-0.0.1/.agent/skills/autoresearch/SKILL.md +218 -0
  6. contextro-0.0.1/.agent/skills/autoresearch/evals/evals.json +912 -0
  7. contextro-0.0.1/.agent/skills/autoresearch/references/eval-rubric.md +13 -0
  8. contextro-0.0.1/.agent/skills/autoresearch/references/experiment-patterns.md +38 -0
  9. contextro-0.0.1/.agent/skills/breakthrough-researcher/SKILL.md +166 -0
  10. contextro-0.0.1/.agent/skills/breakthrough-researcher/evals/evals.json +249 -0
  11. contextro-0.0.1/.agent/skills/breakthrough-researcher/references/eval-rubric.md +19 -0
  12. contextro-0.0.1/.agent/skills/breakthrough-researcher/references/research-patterns.md +66 -0
  13. contextro-0.0.1/.agent/skills/dev-contextia-mcp/SKILL.md +156 -0
  14. contextro-0.0.1/.agent/skills/dev-contextia-mcp/evals/evals-new-tools-e2e.json +306 -0
  15. contextro-0.0.1/.agent/skills/dev-contextia-mcp/evals/evals-workflow.json +326 -0
  16. contextro-0.0.1/.agent/skills/dev-contextia-mcp/evals/evals.json +347 -0
  17. contextro-0.0.1/.agent/skills/dev-contextia-mcp/references/benchmark-results.md +42 -0
  18. contextro-0.0.1/.agent/skills/dev-contextia-mcp/references/eval-rubric.md +55 -0
  19. contextro-0.0.1/.agent/skills/dev-contextia-mcp/references/tool-decision-tree.md +64 -0
  20. contextro-0.0.1/.agent/skills/fastmcp-server-engineer/SKILL.md +90 -0
  21. contextro-0.0.1/.agent/skills/fastmcp-server-engineer/evals/evals.json +28 -0
  22. contextro-0.0.1/.agent/skills/fastmcp-server-engineer/references/eval-rubric.md +15 -0
  23. contextro-0.0.1/.agent/skills/fastmcp-server-engineer/references/fastmcp-patterns.md +21 -0
  24. contextro-0.0.1/.agent/skills/mcp-protocol-architect/SKILL.md +87 -0
  25. contextro-0.0.1/.agent/skills/mcp-protocol-architect/evals/evals.json +274 -0
  26. contextro-0.0.1/.agent/skills/mcp-protocol-architect/references/eval-rubric.md +15 -0
  27. contextro-0.0.1/.agent/skills/mcp-protocol-architect/references/mcp-patterns.md +27 -0
  28. contextro-0.0.1/.agent/skills/python-systems-engineer/SKILL.md +89 -0
  29. contextro-0.0.1/.agent/skills/python-systems-engineer/evals/evals.json +28 -0
  30. contextro-0.0.1/.agent/skills/python-systems-engineer/references/eval-rubric.md +14 -0
  31. contextro-0.0.1/.agent/skills/python-systems-engineer/references/python-patterns.md +19 -0
  32. contextro-0.0.1/.agent/skills/rust-extension-engineer/SKILL.md +89 -0
  33. contextro-0.0.1/.agent/skills/rust-extension-engineer/evals/evals.json +28 -0
  34. contextro-0.0.1/.agent/skills/rust-extension-engineer/references/eval-rubric.md +14 -0
  35. contextro-0.0.1/.agent/skills/rust-extension-engineer/references/rust-patterns.md +18 -0
  36. contextro-0.0.1/.dockerignore +17 -0
  37. contextro-0.0.1/.github/workflows/alpha.yml +136 -0
  38. contextro-0.0.1/.github/workflows/publish.yml +79 -0
  39. contextro-0.0.1/.gitignore +57 -0
  40. contextro-0.0.1/.python-version +1 -0
  41. contextro-0.0.1/AGENTS.md +63 -0
  42. contextro-0.0.1/CHANGELOG.md +104 -0
  43. contextro-0.0.1/CLAUDE.md +207 -0
  44. contextro-0.0.1/CONTRIBUTING.md +274 -0
  45. contextro-0.0.1/Dockerfile +70 -0
  46. contextro-0.0.1/LICENSE +21 -0
  47. contextro-0.0.1/PKG-INFO +493 -0
  48. contextro-0.0.1/README.md +444 -0
  49. contextro-0.0.1/commit-one-by-one.sh +92 -0
  50. contextro-0.0.1/contextro-mcp-ci +7 -0
  51. contextro-0.0.1/deploy/alpha/docker-compose.yml +36 -0
  52. contextro-0.0.1/deploy/alpha/pull-and-restart.sh +13 -0
  53. contextro-0.0.1/docker-compose.dev.yml +48 -0
  54. contextro-0.0.1/docker-compose.yml +43 -0
  55. contextro-0.0.1/docs/ARCHITECTURE.md +140 -0
  56. contextro-0.0.1/docs/DEVELOPER_GUIDE.md +293 -0
  57. contextro-0.0.1/docs/FUTURE_CONTRIBUTIONS.md +371 -0
  58. contextro-0.0.1/docs/IMPLEMENTATION_PLAN.md +339 -0
  59. contextro-0.0.1/docs/INSTALLATION.md +425 -0
  60. contextro-0.0.1/docs/PROJECT_INFO.md +209 -0
  61. contextro-0.0.1/docs/RESEARCH.md +238 -0
  62. contextro-0.0.1/docs/USAGE_GUIDE.md +296 -0
  63. contextro-0.0.1/docs/research/INDEX.md +19 -0
  64. contextro-0.0.1/docs/research/RESEARCH-TEMPLATE.md +30 -0
  65. contextro-0.0.1/docs/research/contextia-best-in-class-plan.md +341 -0
  66. contextro-0.0.1/opencode.json +46 -0
  67. contextro-0.0.1/pyproject.toml +103 -0
  68. contextro-0.0.1/rust/ctx_fast/Cargo.lock +390 -0
  69. contextro-0.0.1/rust/ctx_fast/Cargo.toml +16 -0
  70. contextro-0.0.1/rust/ctx_fast/src/file_scanner.rs +135 -0
  71. contextro-0.0.1/rust/ctx_fast/src/git_ops.rs +98 -0
  72. contextro-0.0.1/rust/ctx_fast/src/hasher.rs +28 -0
  73. contextro-0.0.1/rust/ctx_fast/src/lib.rs +185 -0
  74. contextro-0.0.1/scripts/bench_final.py +281 -0
  75. contextro-0.0.1/scripts/benchmark_browser_use.py +186 -0
  76. contextro-0.0.1/scripts/benchmark_browser_use_results.json +14 -0
  77. contextro-0.0.1/scripts/benchmark_chunk_profiles.py +106 -0
  78. contextro-0.0.1/scripts/benchmark_disclosure.py +334 -0
  79. contextro-0.0.1/scripts/benchmark_embeddings.py +354 -0
  80. contextro-0.0.1/scripts/benchmark_embeddings_full.py +398 -0
  81. contextro-0.0.1/scripts/benchmark_platform_live.py +542 -0
  82. contextro-0.0.1/scripts/benchmark_results.json +58 -0
  83. contextro-0.0.1/scripts/benchmark_results_full.json +90 -0
  84. contextro-0.0.1/scripts/benchmark_retrieval_quality.py +211 -0
  85. contextro-0.0.1/scripts/benchmark_token_efficiency.py +422 -0
  86. contextro-0.0.1/scripts/benchmark_utils.py +128 -0
  87. contextro-0.0.1/scripts/dev_http_server.py +185 -0
  88. contextro-0.0.1/scripts/docker_healthcheck.py +21 -0
  89. contextro-0.0.1/scripts/evaluate_contextia_skill.py +325 -0
  90. contextro-0.0.1/scripts/init.sh +36 -0
  91. contextro-0.0.1/scripts/results.tsv +20 -0
  92. contextro-0.0.1/scripts/results_browser_use.tsv +2 -0
  93. contextro-0.0.1/scripts/results_indexing_speed.tsv +11 -0
  94. contextro-0.0.1/scripts/results_platform_staging.tsv +2 -0
  95. contextro-0.0.1/scripts/test_tool.sh +30 -0
  96. contextro-0.0.1/scripts/token_benchmark_results.json +16 -0
  97. contextro-0.0.1/setup.py +27 -0
  98. contextro-0.0.1/setup.sh +211 -0
  99. contextro-0.0.1/smithery.yaml +55 -0
  100. contextro-0.0.1/src/contextro_mcp/__init__.py +3 -0
  101. contextro-0.0.1/src/contextro_mcp/accelerator.py +321 -0
  102. contextro-0.0.1/src/contextro_mcp/analysis/__init__.py +0 -0
  103. contextro-0.0.1/src/contextro_mcp/analysis/code_analyzer.py +237 -0
  104. contextro-0.0.1/src/contextro_mcp/config.py +248 -0
  105. contextro-0.0.1/src/contextro_mcp/core/__init__.py +0 -0
  106. contextro-0.0.1/src/contextro_mcp/core/exceptions.py +77 -0
  107. contextro-0.0.1/src/contextro_mcp/core/graph_models.py +276 -0
  108. contextro-0.0.1/src/contextro_mcp/core/interfaces.py +53 -0
  109. contextro-0.0.1/src/contextro_mcp/core/models.py +285 -0
  110. contextro-0.0.1/src/contextro_mcp/engines/__init__.py +0 -0
  111. contextro-0.0.1/src/contextro_mcp/engines/bm25_engine.py +153 -0
  112. contextro-0.0.1/src/contextro_mcp/engines/fusion.py +208 -0
  113. contextro-0.0.1/src/contextro_mcp/engines/graph_engine.py +285 -0
  114. contextro-0.0.1/src/contextro_mcp/engines/live_grep.py +169 -0
  115. contextro-0.0.1/src/contextro_mcp/engines/output_sandbox.py +140 -0
  116. contextro-0.0.1/src/contextro_mcp/engines/query_cache.py +163 -0
  117. contextro-0.0.1/src/contextro_mcp/engines/reranker.py +117 -0
  118. contextro-0.0.1/src/contextro_mcp/engines/vector_engine.py +209 -0
  119. contextro-0.0.1/src/contextro_mcp/execution/__init__.py +11 -0
  120. contextro-0.0.1/src/contextro_mcp/execution/ast_compression.py +187 -0
  121. contextro-0.0.1/src/contextro_mcp/execution/compaction.py +310 -0
  122. contextro-0.0.1/src/contextro_mcp/execution/interfaces.py +31 -0
  123. contextro-0.0.1/src/contextro_mcp/execution/response_policy.py +303 -0
  124. contextro-0.0.1/src/contextro_mcp/execution/runtime.py +73 -0
  125. contextro-0.0.1/src/contextro_mcp/execution/search.py +446 -0
  126. contextro-0.0.1/src/contextro_mcp/formatting/__init__.py +0 -0
  127. contextro-0.0.1/src/contextro_mcp/formatting/response_builder.py +121 -0
  128. contextro-0.0.1/src/contextro_mcp/formatting/token_budget.py +59 -0
  129. contextro-0.0.1/src/contextro_mcp/formatting/toon_encoder.py +81 -0
  130. contextro-0.0.1/src/contextro_mcp/git/__init__.py +1 -0
  131. contextro-0.0.1/src/contextro_mcp/git/branch_watcher.py +258 -0
  132. contextro-0.0.1/src/contextro_mcp/git/commit_indexer.py +564 -0
  133. contextro-0.0.1/src/contextro_mcp/git/cross_repo.py +210 -0
  134. contextro-0.0.1/src/contextro_mcp/indexing/__init__.py +14 -0
  135. contextro-0.0.1/src/contextro_mcp/indexing/chunk_context.py +110 -0
  136. contextro-0.0.1/src/contextro_mcp/indexing/chunker.py +144 -0
  137. contextro-0.0.1/src/contextro_mcp/indexing/embedding_service.py +429 -0
  138. contextro-0.0.1/src/contextro_mcp/indexing/file_discovery.py +95 -0
  139. contextro-0.0.1/src/contextro_mcp/indexing/parallel_indexer.py +141 -0
  140. contextro-0.0.1/src/contextro_mcp/indexing/pipeline.py +865 -0
  141. contextro-0.0.1/src/contextro_mcp/indexing/smart_chunker.py +223 -0
  142. contextro-0.0.1/src/contextro_mcp/memory/__init__.py +0 -0
  143. contextro-0.0.1/src/contextro_mcp/memory/compaction_archive.py +131 -0
  144. contextro-0.0.1/src/contextro_mcp/memory/memory_store.py +305 -0
  145. contextro-0.0.1/src/contextro_mcp/memory/session_tracker.py +146 -0
  146. contextro-0.0.1/src/contextro_mcp/middleware/__init__.py +1 -0
  147. contextro-0.0.1/src/contextro_mcp/middleware/audit.py +83 -0
  148. contextro-0.0.1/src/contextro_mcp/parsing/__init__.py +0 -0
  149. contextro-0.0.1/src/contextro_mcp/parsing/astgrep_parser.py +497 -0
  150. contextro-0.0.1/src/contextro_mcp/parsing/file_watcher.py +162 -0
  151. contextro-0.0.1/src/contextro_mcp/parsing/language_registry.py +274 -0
  152. contextro-0.0.1/src/contextro_mcp/parsing/treesitter_parser.py +421 -0
  153. contextro-0.0.1/src/contextro_mcp/persistence/__init__.py +0 -0
  154. contextro-0.0.1/src/contextro_mcp/persistence/store.py +196 -0
  155. contextro-0.0.1/src/contextro_mcp/research/__init__.py +13 -0
  156. contextro-0.0.1/src/contextro_mcp/research/catalog.py +256 -0
  157. contextro-0.0.1/src/contextro_mcp/schemas/__init__.py +59 -0
  158. contextro-0.0.1/src/contextro_mcp/schemas/inputs.py +218 -0
  159. contextro-0.0.1/src/contextro_mcp/schemas/responses.py +257 -0
  160. contextro-0.0.1/src/contextro_mcp/security/__init__.py +1 -0
  161. contextro-0.0.1/src/contextro_mcp/security/permissions.py +106 -0
  162. contextro-0.0.1/src/contextro_mcp/security/rate_limiter.py +87 -0
  163. contextro-0.0.1/src/contextro_mcp/server.py +3110 -0
  164. contextro-0.0.1/src/contextro_mcp/state.py +283 -0
  165. contextro-0.0.1/tests/__init__.py +0 -0
  166. contextro-0.0.1/tests/conftest.py +153 -0
  167. contextro-0.0.1/tests/test_accelerator.py +213 -0
  168. contextro-0.0.1/tests/test_analyze_tool.py +175 -0
  169. contextro-0.0.1/tests/test_audit.py +122 -0
  170. contextro-0.0.1/tests/test_bm25_engine.py +200 -0
  171. contextro-0.0.1/tests/test_branch_watcher.py +222 -0
  172. contextro-0.0.1/tests/test_chunker.py +237 -0
  173. contextro-0.0.1/tests/test_commit_indexer.py +301 -0
  174. contextro-0.0.1/tests/test_config.py +64 -0
  175. contextro-0.0.1/tests/test_cross_repo.py +237 -0
  176. contextro-0.0.1/tests/test_e2e.py +175 -0
  177. contextro-0.0.1/tests/test_embedding_properties.py +280 -0
  178. contextro-0.0.1/tests/test_exceptions.py +58 -0
  179. contextro-0.0.1/tests/test_explain_tool.py +107 -0
  180. contextro-0.0.1/tests/test_fusion.py +194 -0
  181. contextro-0.0.1/tests/test_git_tools.py +336 -0
  182. contextro-0.0.1/tests/test_graph_engine.py +151 -0
  183. contextro-0.0.1/tests/test_graph_models.py +195 -0
  184. contextro-0.0.1/tests/test_graph_persistence.py +137 -0
  185. contextro-0.0.1/tests/test_graph_tools.py +200 -0
  186. contextro-0.0.1/tests/test_health.py +135 -0
  187. contextro-0.0.1/tests/test_hybrid_search.py +170 -0
  188. contextro-0.0.1/tests/test_impact_tool.py +128 -0
  189. contextro-0.0.1/tests/test_interfaces.py +53 -0
  190. contextro-0.0.1/tests/test_language_registry.py +99 -0
  191. contextro-0.0.1/tests/test_live_grep.py +74 -0
  192. contextro-0.0.1/tests/test_memory_store.py +200 -0
  193. contextro-0.0.1/tests/test_memory_usage.py +77 -0
  194. contextro-0.0.1/tests/test_models.py +258 -0
  195. contextro-0.0.1/tests/test_performance.py +69 -0
  196. contextro-0.0.1/tests/test_permissions.py +207 -0
  197. contextro-0.0.1/tests/test_pipeline.py +307 -0
  198. contextro-0.0.1/tests/test_rate_limiter.py +111 -0
  199. contextro-0.0.1/tests/test_reranker.py +114 -0
  200. contextro-0.0.1/tests/test_research_catalog.py +23 -0
  201. contextro-0.0.1/tests/test_response_builder.py +94 -0
  202. contextro-0.0.1/tests/test_schemas.py +232 -0
  203. contextro-0.0.1/tests/test_search_execution.py +306 -0
  204. contextro-0.0.1/tests/test_security.py +180 -0
  205. contextro-0.0.1/tests/test_smart_chunker.py +206 -0
  206. contextro-0.0.1/tests/test_state.py +59 -0
  207. contextro-0.0.1/tests/test_token_budget.py +92 -0
  208. contextro-0.0.1/tests/test_tool_response_policy.py +133 -0
  209. contextro-0.0.1/tests/test_tools_basic.py +284 -0
  210. contextro-0.0.1/tests/test_treesitter_parser.py +168 -0
  211. contextro-0.0.1/tests/test_trust_remote_code.py +84 -0
  212. contextro-0.0.1/tests/test_vector_engine.py +188 -0
  213. contextro-0.0.1/uv.lock +3192 -0
@@ -0,0 +1,174 @@
1
+ ---
2
+ name: applied-ai-engineer
3
+ description: >
4
+ Use for turning research ideas or agent workflows into robust, benchmarked, observable,
5
+ production-ready systems. Trigger when the user asks to productionize an AI feature, build
6
+ a harness, add evals, improve reliability, reduce regressions, add observability, create a
7
+ rollout plan, improve agent performance through better scaffolding, or convert a promising
8
+ research idea into a safe implementation path. Do not use for pure literature review,
9
+ speculative research with no implementation intent, or trivial code changes.
10
+ when_to_use: >
11
+ Especially useful for harness engineering, evaluator design, benchmark discipline,
12
+ instrumentation, rollout safety, architecture legibility, compaction and resume flows,
13
+ workflow governance, and making agent systems reliable under real constraints.
14
+ metadata:
15
+ version: "1.0.0"
16
+ category: engineering
17
+ tags: [applied-ai, harness, evals, observability, rollout, reliability, benchmarking]
18
+ license: MIT
19
+ ---
20
+
21
+ # Applied AI Engineer
22
+
23
+ You act as the applied AI engineer.
24
+
25
+ Your job is to turn a good idea into a reliable system with guardrails, observability, and a
26
+ repeatable evaluation story.
27
+
28
+ ## Use This Skill To Produce
29
+
30
+ - a concrete implementation path
31
+ - a benchmark or eval harness
32
+ - regression guardrails
33
+ - observability requirements
34
+ - rollout and rollback criteria
35
+ - repository artifacts that make the system legible to future agents
36
+
37
+ ## Method
38
+
39
+ ### 1. Define The Outcome And Constraints
40
+
41
+ Start every task by naming:
42
+
43
+ - user-visible outcome
44
+ - primary metric
45
+ - secondary guardrails
46
+ - hard constraints such as memory, latency, privacy, local-first behavior, and test integrity
47
+
48
+ If the metric is unclear, make it explicit before changing the system.
49
+
50
+ ### 2. Make The System Legible
51
+
52
+ Prefer repository-local artifacts over hidden conversational guidance.
53
+
54
+ Use or improve:
55
+
56
+ - concise top-level instructions
57
+ - structured docs in `docs/`
58
+ - executable benchmark scripts in `scripts/`
59
+ - tests and linters
60
+ - eval definitions
61
+ - stable response shapes and resume artifacts
62
+
63
+ OpenAI's lesson applies here: give the agent a map, not a manual.
64
+
65
+ ### 3. Build The Harness Before Trusting The Change
66
+
67
+ For meaningful AI or retrieval changes, define:
68
+
69
+ - baseline benchmark command
70
+ - realistic task set or eval set
71
+ - deterministic checks where possible
72
+ - evaluator workflow where deterministic checks are insufficient
73
+ - before vs after comparison
74
+
75
+ For Contextro, prefer the existing benchmark surfaces:
76
+
77
+ - `python scripts/benchmark_token_efficiency.py`
78
+ - `python scripts/benchmark_retrieval_quality.py --path src --query-limit 20`
79
+ - `python scripts/benchmark_chunk_profiles.py --path src --query-limit 20`
80
+ - `python scripts/benchmark_disclosure.py`
81
+ - `python scripts/bench_final.py`
82
+ - `pytest -v`
83
+ - `ruff check .`
84
+
85
+ ### 4. Implement The Smallest Enforceable Slice
86
+
87
+ Do not solve a broad problem with a large rewrite unless the harness proves you need one.
88
+
89
+ Prefer:
90
+
91
+ - one clear invariant at a time
92
+ - one benchmarked change at a time
93
+ - thin entrypoints with logic moved into focused modules
94
+ - explicit boundaries between orchestration, state, formatting, and domain logic
95
+ - structure that can be tested and observed
96
+ - reusable system surfaces over prompt-only behavior
97
+
98
+ ### 5. Add Observability And Recovery
99
+
100
+ If a system can fail, drift, or regress, add the signals that reveal it:
101
+
102
+ - metrics
103
+ - logs
104
+ - traces or event records
105
+ - resume and compaction artifacts
106
+ - stable prefixes for cache-friendly outputs
107
+ - searchable history when long-running tasks matter
108
+
109
+ Devin and DeepSeek patterns both matter here: realistic feedback loops and resumable trajectories.
110
+
111
+ ### 6. Validate Before You Ship
112
+
113
+ Every significant change should have:
114
+
115
+ - test result
116
+ - benchmark result
117
+ - regression guardrail status
118
+ - failure-mode review
119
+ - rollback plan
120
+
121
+ Do not trade away correctness or maintainability for a single benchmark win.
122
+
123
+ ### 7. Encode Taste Into The Repo
124
+
125
+ If a human review comment is likely to recur, turn it into one of:
126
+
127
+ - documentation
128
+ - a lint or test
129
+ - a benchmark assertion
130
+ - an eval case
131
+ - an explicit workflow rule
132
+
133
+ The goal is not to keep fixing the same thing manually.
134
+
135
+ ## Company Patterns To Reuse
136
+
137
+ - OpenAI: harness engineering, repo-local system of record, architecture legibility, enforceable invariants
138
+ - Anthropic: smallest high-signal context, progressive disclosure, explicit long-running-agent support
139
+ - Cursor: keep implementation tightly coupled to codebase retrieval, fast local iteration, and low-friction edits
140
+ - Windsurf: pair planning with execution, preserve working state across long tasks, and keep agent actions IDE-aware
141
+ - Mistral: use efficient model/task routing and modular context slices before reaching for heavier system complexity
142
+ - Devin and Cognition: realistic environments, evaluator loops, autonomous feedback, environment-aware critique
143
+ - NVIDIA: benchmark the whole pipeline, not just one subcomponent
144
+ - DeepSeek: checkpointing, cache-aware structure, trajectory logging, resumability
145
+
146
+ ## Output Format
147
+
148
+ Return results in this order:
149
+
150
+ 1. `Outcome and metric`
151
+ 2. `Constraints`
152
+ 3. `Current baseline`
153
+ 4. `Implementation plan`
154
+ 5. `Harness and eval plan`
155
+ 6. `Observability and guardrails`
156
+ 7. `Rollout and rollback`
157
+
158
+ ## Anti-Patterns
159
+
160
+ - Do not ship AI behavior with no evals.
161
+ - Do not benchmark one metric while ignoring tests, latency, memory, or user-visible regressions.
162
+ - Do not rely on giant instruction blobs when code, docs, lint, or evals can enforce the behavior.
163
+ - Do not hide critical workflow knowledge only in chat.
164
+ - Do not choose architectural rewrites before testing smaller enforceable changes.
165
+
166
+ ## Handoff Rule
167
+
168
+ - use `breakthrough-researcher` when the solution space is still unclear
169
+ - use `autoresearch` when the metric and experiment loop are already defined and ready to run autonomously
170
+
171
+ ## References
172
+
173
+ - Engineering patterns: `references/engineering-patterns.md`
174
+ - Skill eval rubric: `references/eval-rubric.md`
@@ -0,0 +1,286 @@
1
+ {
2
+ "skill_name": "applied-ai-engineer",
3
+ "version": "1.1.0",
4
+ "description": "Eval suite for the applied AI engineering skill. Measures trigger quality, harness-first engineering, observability discipline, benchmark and eval rigor, and production-safe implementation planning.",
5
+ "categories": {
6
+ "triggering": "Skill loads for productionization and reliability work",
7
+ "workflow": "Skill follows metric, baseline, harness, implementation, and validation order",
8
+ "evals": "Skill insists on benchmarks or evaluator design before trusting changes",
9
+ "observability": "Skill adds signals and recovery paths for long-running systems",
10
+ "anti_pattern": "Skill avoids shipping without guardrails or using prompt blobs as system design",
11
+ "output_quality": "Skill returns rollout, rollback, and concrete enforcement mechanisms"
12
+ },
13
+ "evals": [
14
+ {
15
+ "id": 1,
16
+ "name": "trigger-productionize-agent-feature",
17
+ "category": "triggering",
18
+ "prompt": "We have a promising idea for a searchable compaction archive. I need an applied AI engineer plan to productionize it safely.",
19
+ "should_trigger": true,
20
+ "expected_behavior": "Skill loads and frames the task as harness, implementation, observability, and rollout work.",
21
+ "assertions": [
22
+ {
23
+ "id": "a1",
24
+ "text": "skill triggers for productionization request",
25
+ "type": "routing",
26
+ "passing_condition": "Skill loads for productionize and safely ship phrasing"
27
+ }
28
+ ]
29
+ },
30
+ {
31
+ "id": 2,
32
+ "name": "trigger-build-harness",
33
+ "category": "triggering",
34
+ "prompt": "Build me a proper harness and eval plan for improving Contextro's retrieval pipeline.",
35
+ "should_trigger": true,
36
+ "expected_behavior": "Skill loads for harness and evaluation design.",
37
+ "assertions": [
38
+ {
39
+ "id": "a1",
40
+ "text": "skill triggers for harness work",
41
+ "type": "routing",
42
+ "passing_condition": "Skill loads for harness and eval request"
43
+ }
44
+ ]
45
+ },
46
+ {
47
+ "id": 3,
48
+ "name": "no-trigger-pure-research",
49
+ "category": "triggering",
50
+ "prompt": "Research the most novel ideas in long-context memory systems and summarize the papers.",
51
+ "should_trigger": false,
52
+ "expected_behavior": "Skill does not load for pure literature review with no implementation intent.",
53
+ "assertions": [
54
+ {
55
+ "id": "a1",
56
+ "text": "skill does not trigger for pure research",
57
+ "type": "routing",
58
+ "passing_condition": "Skill does not load for literature-only request"
59
+ }
60
+ ]
61
+ },
62
+ {
63
+ "id": 4,
64
+ "name": "no-trigger-trivial-edit",
65
+ "category": "triggering",
66
+ "prompt": "Add a comment to the SearchEngine class.",
67
+ "should_trigger": false,
68
+ "expected_behavior": "Skill does not load for trivial code edits.",
69
+ "assertions": [
70
+ {
71
+ "id": "a1",
72
+ "text": "skill does not trigger for trivial edit",
73
+ "type": "routing",
74
+ "passing_condition": "Skill does not load for small direct edit"
75
+ }
76
+ ]
77
+ },
78
+ {
79
+ "id": 5,
80
+ "name": "workflow-metric-first",
81
+ "category": "workflow",
82
+ "prompt": "Make Contextro's compaction system better.",
83
+ "expected_behavior": "Skill starts by naming user-visible outcome, metric, and constraints before implementation steps.",
84
+ "assertions": [
85
+ {
86
+ "id": "a1",
87
+ "text": "metric and constraints come first",
88
+ "type": "workflow",
89
+ "passing_condition": "Response begins with outcome, metric, or constraints before implementation details"
90
+ }
91
+ ]
92
+ },
93
+ {
94
+ "id": 6,
95
+ "name": "workflow-baseline-before-change",
96
+ "category": "workflow",
97
+ "prompt": "Plan the engineering work to reduce token output in search responses.",
98
+ "expected_behavior": "Skill establishes current benchmark command and baseline before proposing changes.",
99
+ "assertions": [
100
+ {
101
+ "id": "a1",
102
+ "text": "baseline is established",
103
+ "type": "workflow",
104
+ "passing_condition": "Response includes benchmark command and current baseline before proposed changes"
105
+ }
106
+ ]
107
+ },
108
+ {
109
+ "id": 7,
110
+ "name": "evals-harness-before-trust",
111
+ "category": "evals",
112
+ "prompt": "We think AST-aware compression will help. Give me the implementation plan.",
113
+ "expected_behavior": "Skill includes a benchmark or eval harness before trusting the improvement.",
114
+ "assertions": [
115
+ {
116
+ "id": "a1",
117
+ "text": "benchmark or eval plan is included",
118
+ "type": "evals",
119
+ "passing_condition": "Response includes a benchmark command, eval set, or before-vs-after comparison plan"
120
+ }
121
+ ]
122
+ },
123
+ {
124
+ "id": 8,
125
+ "name": "evals-realistic-environment",
126
+ "category": "evals",
127
+ "prompt": "How would you validate a new cross-repo search workflow?",
128
+ "expected_behavior": "Skill prefers realistic task evaluation over only synthetic microbenchmarks.",
129
+ "assertions": [
130
+ {
131
+ "id": "a1",
132
+ "text": "realistic evaluation is included",
133
+ "type": "evals",
134
+ "passing_condition": "Response includes realistic task scenarios or evaluator-style checks, not only isolated microbenchmarks"
135
+ }
136
+ ]
137
+ },
138
+ {
139
+ "id": 9,
140
+ "name": "observability-signals-added",
141
+ "category": "observability",
142
+ "prompt": "Design the engineering plan for a long-running agent feature that may regress over time.",
143
+ "expected_behavior": "Skill adds logs, metrics, traces, or equivalent signals plus resume or recovery artifacts.",
144
+ "assertions": [
145
+ {
146
+ "id": "a1",
147
+ "text": "signals are proposed",
148
+ "type": "observability",
149
+ "passing_condition": "Response includes logs, metrics, traces, event records, or resume artifacts"
150
+ }
151
+ ]
152
+ },
153
+ {
154
+ "id": 10,
155
+ "name": "observability-resume-and-recovery",
156
+ "category": "observability",
157
+ "prompt": "How would you engineer Contextro to survive compaction and long tasks more reliably?",
158
+ "expected_behavior": "Skill includes resume, checkpoint, archive, or trajectory concepts.",
159
+ "assertions": [
160
+ {
161
+ "id": "a1",
162
+ "text": "resume or checkpoint path is included",
163
+ "type": "observability",
164
+ "passing_condition": "Response includes checkpointing, session packets, searchable archive, or trajectory logging"
165
+ }
166
+ ]
167
+ },
168
+ {
169
+ "id": 11,
170
+ "name": "anti-pattern-no-ship-without-evals",
171
+ "category": "anti_pattern",
172
+ "prompt": "We don't need benchmarks. Just implement searchable archive and ship it.",
173
+ "expected_behavior": "Skill rejects shipping without evaluation.",
174
+ "assertions": [
175
+ {
176
+ "id": "a1",
177
+ "text": "shipping without evals is rejected",
178
+ "type": "anti_pattern",
179
+ "passing_condition": "Response refuses to skip benchmark or eval validation"
180
+ }
181
+ ]
182
+ },
183
+ {
184
+ "id": 12,
185
+ "name": "anti-pattern-no-single-metric-blindness",
186
+ "category": "anti_pattern",
187
+ "prompt": "If latency drops by 20%, I don't care if tests fail or memory doubles.",
188
+ "expected_behavior": "Skill rejects single-metric optimization that ignores guardrails.",
189
+ "assertions": [
190
+ {
191
+ "id": "a1",
192
+ "text": "guardrails are defended",
193
+ "type": "anti_pattern",
194
+ "passing_condition": "Response rejects ignoring tests, memory, or user-visible regressions"
195
+ }
196
+ ]
197
+ },
198
+ {
199
+ "id": 13,
200
+ "name": "anti-pattern-no-giant-prompt-manual",
201
+ "category": "anti_pattern",
202
+ "prompt": "Let's solve agent reliability by writing a massive 5000-line AGENTS file.",
203
+ "expected_behavior": "Skill rejects giant prompt manuals and prefers enforceable artifacts.",
204
+ "assertions": [
205
+ {
206
+ "id": "a1",
207
+ "text": "prompt blob solution is rejected",
208
+ "type": "anti_pattern",
209
+ "passing_condition": "Response recommends concise maps, docs, lints, tests, or evals instead of giant manuals"
210
+ }
211
+ ]
212
+ },
213
+ {
214
+ "id": 14,
215
+ "name": "output-quality-rollout-and-rollback",
216
+ "category": "output_quality",
217
+ "prompt": "Give me the implementation plan for a new memory feature.",
218
+ "expected_behavior": "Skill includes rollout and rollback criteria.",
219
+ "assertions": [
220
+ {
221
+ "id": "a1",
222
+ "text": "rollout and rollback are included",
223
+ "type": "output_quality",
224
+ "passing_condition": "Response includes rollout, rollback, or failure recovery criteria"
225
+ }
226
+ ]
227
+ },
228
+ {
229
+ "id": 15,
230
+ "name": "output-quality-enforcement-artifacts",
231
+ "category": "output_quality",
232
+ "prompt": "How do we make agent behavior consistent across future sessions?",
233
+ "expected_behavior": "Skill prefers enforceable repo artifacts over repeated human reminding.",
234
+ "assertions": [
235
+ {
236
+ "id": "a1",
237
+ "text": "enforcement mechanisms are proposed",
238
+ "type": "output_quality",
239
+ "passing_condition": "Response includes docs, lint, tests, evals, rules, or hooks as enforcement mechanisms"
240
+ }
241
+ ]
242
+ },
243
+ {
244
+ "id": 16,
245
+ "name": "workflow-modular-slice-before-rewrite",
246
+ "category": "workflow",
247
+ "prompt": "The FastMCP server is getting messy. Let's rewrite half the codebase and move everything around while we add searchable archive.",
248
+ "expected_behavior": "Skill rejects the broad rewrite impulse and instead proposes a small, benchmarked slice with explicit module boundaries.",
249
+ "assertions": [
250
+ {
251
+ "id": "a1",
252
+ "text": "small enforceable slice is preferred",
253
+ "type": "workflow",
254
+ "passing_condition": "Response proposes a minimal incremental slice before any broad rewrite"
255
+ },
256
+ {
257
+ "id": "a2",
258
+ "text": "module boundaries are named",
259
+ "type": "workflow",
260
+ "passing_condition": "Response calls out boundaries such as entrypoint vs domain logic, formatting, state, or orchestration"
261
+ }
262
+ ]
263
+ },
264
+ {
265
+ "id": 17,
266
+ "name": "output-quality-thin-entrypoint-plan",
267
+ "category": "output_quality",
268
+ "prompt": "Plan the engineering work for a new MCP feature without letting server.py become the dumping ground.",
269
+ "expected_behavior": "Skill returns a plan that keeps entrypoints thin and routes reusable logic into focused modules with verification.",
270
+ "assertions": [
271
+ {
272
+ "id": "a1",
273
+ "text": "thin entrypoint approach included",
274
+ "type": "output_quality",
275
+ "passing_condition": "Response says to keep the entrypoint thin or avoid putting all feature logic in server.py"
276
+ },
277
+ {
278
+ "id": "a2",
279
+ "text": "verification remains part of the modularization plan",
280
+ "type": "output_quality",
281
+ "passing_condition": "Response includes tests, benchmark, or eval checks alongside the modular implementation plan"
282
+ }
283
+ ]
284
+ }
285
+ ]
286
+ }
@@ -0,0 +1,59 @@
1
+ # Applied AI Engineering Patterns
2
+
3
+ This reference captures the repeated engineering patterns behind strong AI product teams.
4
+
5
+ ## OpenAI
6
+
7
+ - The engineering role shifts from manual coding to systems, scaffolding, and leverage.
8
+ - Keep top-level instructions short and use the repository as the system of record.
9
+ - Enforce architecture and taste through tooling, not repeated review comments.
10
+ - Favor legibility, strict boundaries, and mechanically checked invariants.
11
+
12
+ ## Anthropic
13
+
14
+ - Use the smallest high-signal context that still solves the task.
15
+ - Make long-running work resumable with explicit artifacts.
16
+ - Use progressive disclosure to avoid flooding the model with low-value detail.
17
+
18
+ ## Cursor
19
+
20
+ - Keep the coding loop close to the local codebase: retrieve, inspect, edit, verify.
21
+ - Reduce friction between research, implementation, and validation so good ideas survive contact with the repo.
22
+ - Prefer precise context targeting over broad prompt stuffing.
23
+
24
+ ## Windsurf
25
+
26
+ - Treat agent engineering as a coordinated plan-plus-execution system, not only a chat interaction.
27
+ - Keep visible intermediate state so long tasks can recover without losing intent.
28
+ - Make the environment, tools, and execution status legible enough for iterative autonomous work.
29
+
30
+ ## Mistral
31
+
32
+ - Efficient model usage depends on good routing, compact context, and clean task decomposition.
33
+ - Smaller passes with strong structure can outperform a single large opaque pass.
34
+ - System quality comes from orchestration and interfaces, not only raw model size.
35
+
36
+ ## Devin And Cognition
37
+
38
+ - Prefer realistic task environments over abstract unit-only evaluation.
39
+ - Use autonomous evaluator flows when deterministic checks are insufficient.
40
+ - Store external notes and environment state so long tasks can resume cleanly.
41
+
42
+ ## NVIDIA
43
+
44
+ - Measure the entire RAG system: chunking, retrieval, reranking, shaping, latency, memory.
45
+ - The best architecture on paper is not the best architecture until it wins on the target corpus.
46
+
47
+ ## DeepSeek
48
+
49
+ - Long-horizon systems benefit from checkpointing and resumable trajectories.
50
+ - Stable prompt structure improves reuse and efficiency.
51
+ - Preserve the useful state for tool-calling loops without replaying everything.
52
+
53
+ ## What This Means For Contextro
54
+
55
+ - The next gains should come from harness quality, workflow control, observability, and resume flows.
56
+ - Research ideas should be translated into benchmarked, enforceable repo artifacts.
57
+ - New behavior should land with tests, evals, metrics, and a rollback story.
58
+ - Implementation patterns from Cursor, Windsurf, and Mistral should be translated into repo-local
59
+ harnesses, workflow state, and efficient context/task routing rather than copied superficially.
@@ -0,0 +1,18 @@
1
+ # Applied AI Engineer Eval Rubric
2
+
3
+ ## The skill passes when it:
4
+
5
+ - triggers for productionization, harness, eval, observability, rollout, or reliability work
6
+ - does not trigger for pure research or trivial code edits
7
+ - defines metric, constraints, baseline, and guardrails
8
+ - proposes a concrete harness or evaluation method
9
+ - includes observability and rollback thinking
10
+ - prefers enforceable repository artifacts over prompt-only guidance
11
+
12
+ ## The skill fails when it:
13
+
14
+ - ships changes without evals or benchmarks
15
+ - optimizes a single metric while ignoring regressions
16
+ - turns into a pure literature-review skill
17
+ - recommends big rewrites before smaller benchmarked slices
18
+ - omits rollout, rollback, or failure detection