mdcontext 0.0.1 → 0.1.0
- package/.changeset/README.md +28 -0
- package/.changeset/config.json +11 -0
- package/.github/workflows/ci.yml +83 -0
- package/.github/workflows/release.yml +113 -0
- package/.tldrignore +112 -0
- package/AGENTS.md +46 -0
- package/BACKLOG.md +338 -0
- package/README.md +231 -11
- package/biome.json +36 -0
- package/cspell.config.yaml +14 -0
- package/dist/chunk-KRYIFLQR.js +92 -0
- package/dist/chunk-S7E6TFX6.js +742 -0
- package/dist/chunk-VVTGZNBT.js +1519 -0
- package/dist/cli/main.d.ts +1 -0
- package/dist/cli/main.js +2015 -0
- package/dist/index.d.ts +266 -0
- package/dist/index.js +86 -0
- package/dist/mcp/server.d.ts +1 -0
- package/dist/mcp/server.js +376 -0
- package/docs/019-USAGE.md +586 -0
- package/docs/020-current-implementation.md +364 -0
- package/docs/021-DOGFOODING-FINDINGS.md +175 -0
- package/docs/BACKLOG.md +80 -0
- package/docs/DESIGN.md +439 -0
- package/docs/PROJECT.md +88 -0
- package/docs/ROADMAP.md +407 -0
- package/docs/test-links.md +9 -0
- package/package.json +69 -10
- package/pnpm-workspace.yaml +5 -0
- package/research/config-analysis/01-current-implementation.md +470 -0
- package/research/config-analysis/02-strategy-recommendation.md +428 -0
- package/research/config-analysis/03-task-candidates.md +715 -0
- package/research/config-analysis/033-research-configuration-management.md +828 -0
- package/research/config-analysis/034-research-effect-cli-config.md +1504 -0
- package/research/config-analysis/04-consolidated-task-candidates.md +277 -0
- package/research/dogfood/consolidated-tool-evaluation.md +373 -0
- package/research/dogfood/strategy-a/a-synthesis.md +184 -0
- package/research/dogfood/strategy-a/a1-docs.md +226 -0
- package/research/dogfood/strategy-a/a2-amorphic.md +156 -0
- package/research/dogfood/strategy-a/a3-llm.md +164 -0
- package/research/dogfood/strategy-b/b-synthesis.md +228 -0
- package/research/dogfood/strategy-b/b1-architecture.md +207 -0
- package/research/dogfood/strategy-b/b2-gaps.md +258 -0
- package/research/dogfood/strategy-b/b3-workflows.md +250 -0
- package/research/dogfood/strategy-c/c-synthesis.md +451 -0
- package/research/dogfood/strategy-c/c1-explorer.md +192 -0
- package/research/dogfood/strategy-c/c2-diver-memory.md +145 -0
- package/research/dogfood/strategy-c/c3-diver-control.md +148 -0
- package/research/dogfood/strategy-c/c4-diver-failure.md +151 -0
- package/research/dogfood/strategy-c/c5-diver-execution.md +221 -0
- package/research/dogfood/strategy-c/c6-diver-org.md +221 -0
- package/research/effect-cli-error-handling.md +845 -0
- package/research/effect-errors-as-values.md +943 -0
- package/research/errors-task-analysis/00-consolidated-tasks.md +207 -0
- package/research/errors-task-analysis/cli-commands-analysis.md +909 -0
- package/research/errors-task-analysis/embeddings-analysis.md +709 -0
- package/research/errors-task-analysis/index-search-analysis.md +812 -0
- package/research/mdcontext-error-analysis.md +521 -0
- package/research/npm_publish/011-npm-workflow-research-agent2.md +792 -0
- package/research/npm_publish/012-npm-workflow-research-agent1.md +530 -0
- package/research/npm_publish/013-npm-workflow-research-agent3.md +722 -0
- package/research/npm_publish/014-npm-workflow-synthesis.md +556 -0
- package/research/npm_publish/031-npm-workflow-task-analysis.md +134 -0
- package/research/semantic-search/002-research-embedding-models.md +490 -0
- package/research/semantic-search/003-research-rag-alternatives.md +523 -0
- package/research/semantic-search/004-research-vector-search.md +841 -0
- package/research/semantic-search/032-research-semantic-search.md +427 -0
- package/research/task-management-2026/00-synthesis-recommendations.md +295 -0
- package/research/task-management-2026/01-ai-workflow-tools.md +416 -0
- package/research/task-management-2026/02-agent-framework-patterns.md +476 -0
- package/research/task-management-2026/03-lightweight-file-based.md +567 -0
- package/research/task-management-2026/04-established-tools-ai-features.md +541 -0
- package/research/task-management-2026/linear/01-core-features-workflow.md +771 -0
- package/research/task-management-2026/linear/02-api-integrations.md +930 -0
- package/research/task-management-2026/linear/03-ai-features.md +368 -0
- package/research/task-management-2026/linear/04-pricing-setup.md +205 -0
- package/research/task-management-2026/linear/05-usage-patterns-best-practices.md +605 -0
- package/scripts/rebuild-hnswlib.js +63 -0
- package/src/cli/argv-preprocessor.test.ts +210 -0
- package/src/cli/argv-preprocessor.ts +202 -0
- package/src/cli/cli.test.ts +430 -0
- package/src/cli/commands/backlinks.ts +54 -0
- package/src/cli/commands/context.ts +197 -0
- package/src/cli/commands/index-cmd.ts +300 -0
- package/src/cli/commands/index.ts +13 -0
- package/src/cli/commands/links.ts +52 -0
- package/src/cli/commands/search.ts +451 -0
- package/src/cli/commands/stats.ts +146 -0
- package/src/cli/commands/tree.ts +107 -0
- package/src/cli/flag-schemas.ts +275 -0
- package/src/cli/help.ts +386 -0
- package/src/cli/index.ts +9 -0
- package/src/cli/main.ts +145 -0
- package/src/cli/options.ts +31 -0
- package/src/cli/typo-suggester.test.ts +105 -0
- package/src/cli/typo-suggester.ts +130 -0
- package/src/cli/utils.ts +126 -0
- package/src/core/index.ts +1 -0
- package/src/core/types.ts +140 -0
- package/src/embeddings/index.ts +8 -0
- package/src/embeddings/openai-provider.ts +165 -0
- package/src/embeddings/semantic-search.ts +583 -0
- package/src/embeddings/types.ts +82 -0
- package/src/embeddings/vector-store.ts +299 -0
- package/src/index/index.ts +4 -0
- package/src/index/indexer.ts +446 -0
- package/src/index/storage.ts +196 -0
- package/src/index/types.ts +109 -0
- package/src/index/watcher.ts +131 -0
- package/src/index.ts +8 -0
- package/src/mcp/server.ts +483 -0
- package/src/parser/index.ts +1 -0
- package/src/parser/parser.test.ts +291 -0
- package/src/parser/parser.ts +395 -0
- package/src/parser/section-filter.ts +270 -0
- package/src/search/query-parser.test.ts +260 -0
- package/src/search/query-parser.ts +319 -0
- package/src/search/searcher.test.ts +182 -0
- package/src/search/searcher.ts +602 -0
- package/src/summarize/budget-bugs.test.ts +620 -0
- package/src/summarize/formatters.ts +419 -0
- package/src/summarize/index.ts +20 -0
- package/src/summarize/summarizer.test.ts +275 -0
- package/src/summarize/summarizer.ts +528 -0
- package/src/summarize/verify-bugs.test.ts +238 -0
- package/src/utils/index.ts +1 -0
- package/src/utils/tokens.test.ts +142 -0
- package/src/utils/tokens.ts +186 -0
- package/tests/fixtures/cli/.mdcontext/config.json +8 -0
- package/tests/fixtures/cli/.mdcontext/indexes/documents.json +33 -0
- package/tests/fixtures/cli/.mdcontext/indexes/links.json +12 -0
- package/tests/fixtures/cli/.mdcontext/indexes/sections.json +233 -0
- package/tests/fixtures/cli/.mdcontext/vectors.bin +0 -0
- package/tests/fixtures/cli/.mdcontext/vectors.meta.json +1264 -0
- package/tests/fixtures/cli/README.md +9 -0
- package/tests/fixtures/cli/api-reference.md +11 -0
- package/tests/fixtures/cli/getting-started.md +11 -0
- package/tsconfig.json +26 -0
- package/vitest.config.ts +21 -0
- package/vitest.setup.ts +12 -0
@@ -0,0 +1,164 @@
# Report: A3 - LLM Chat Analyst

## Mission

Extract additional feedback and ideas from docs.llm/ (ignoring vector7/)

## Command Log

| # | Command | Purpose | Result | Useful? |
| --- | ----------------------------------------------------------------------- | ---------------------- | ------------------------------------------------ | ----------- |
| 1 | `mdcontext --help` | Learn tool | Showed commands, options, examples | Yes |
| 2 | `mdcontext tree docs.llm/` | List all files | Found 3 files: amorphic.md, feedback.md, spec.md | Yes |
| 3 | `mdcontext index docs.llm/ --force` | Index directory | Ran in background | Yes |
| 4 | `mdcontext stats docs.llm/` | Check index stats | 3 docs, 89K tokens, 448 sections | Very useful |
| 5 | `mdcontext tree docs.llm/feedback.md` | Show outline | Revealed section structure (11K tokens) | Yes |
| 6 | `mdcontext tree docs.llm/amorphic.md` | Show outline | Revealed section structure (21K tokens) | Yes |
| 7 | `mdcontext tree docs.llm/spec.md` | Show outline | Revealed section structure (56K tokens) | Yes |
| 8 | `mdcontext search "feedback" docs.llm/` | Find feedback content | 10 results with context | Yes |
| 9 | `mdcontext search "ideas" docs.llm/` | Find ideas | 6 results | Moderate |
| 10 | `mdcontext search "suggestion OR recommend" docs.llm/` | Find suggestions | 10 results | Moderate |
| 11 | `mdcontext search "problem OR challenge OR limitation" docs.llm/` | Find problems | 10 results, critical themes | Yes |
| 12 | `mdcontext search "innovation OR future OR vision" docs.llm/` | Find future vision | 10 results | Yes |
| 13 | `mdcontext search "improve OR enhancement OR better" docs.llm/` | Find improvements | 10 results | Yes |
| 14 | `mdcontext context docs.llm/feedback.md -t 3000` | Summarize feedback.md | Got condensed overview (16% of file) | Yes |
| 15 | `mdcontext context docs.llm/amorphic.md --section "Research Frontiers"` | Get research section | Clean targeted content | Very useful |
| 16 | `mdcontext context docs.llm/amorphic.md --section "Open Questions"` | Get open questions | 3 key questions | Very useful |
| 17 | `mdcontext search "trust OR security OR privacy"` | Find trust themes | 10 results | Yes |
| 18 | `mdcontext context docs.llm/feedback.md --section "The Core Thesis"` | Get core thesis | Key product vision | Very useful |
| 19 | `mdcontext search "agent OR autonomous OR automation"` | Find automation themes | 10 results, deep content | Yes |
| 20 | `mdcontext context ... --section "The Paradox of Automation"` | Get paradox section | Insightful content | Very useful |
| 21 | `mdcontext context ... --section "From Platform to Paradigm"` | Get paradigm section | Rich philosophical content | Very useful |
| 22 | `mdcontext search "workflow OR process OR pattern"` | Find workflow themes | 10 results | Yes |
| 23 | `mdcontext context ... --section "What Problem This Solves"` | Get problem statement | Clear product definition | Very useful |
| 24 | `mdcontext context ... --section "Key Differentiators"` | Get differentiators | Comparison table | Yes |
| 25 | `mdcontext search "memory OR knowledge OR context"` | Find memory themes | 10 results | Yes |
| 26 | `mdcontext context ... --section "Geometric Memory Architecture"` | Get memory arch | Core innovation | Very useful |
| 27 | `mdcontext context ... --section "Anti-Patterns"` | Get anti-patterns | Clear guardrails | Yes |
| 28 | `mdcontext search "user OR experience OR usability"` | Find UX themes | 10 results | Moderate |
| 29 | `mdcontext context ... --section "The Authority Gradient"` | Get authority section | Key innovation | Very useful |
| 30 | `mdcontext search "scale OR enterprise OR organization"` | Find scale themes | 10 results | Moderate |
| 31 | `mdcontext context ... --section "Philosophy"` | Get philosophy | Core principles | Yes |
| 32 | `mdcontext context ... --section "Designing for Emergence"` | Get emergence section | Design principles | Very useful |

## Findings

### Key Discoveries

1. **docs.llm/ contains LLM-generated product vision documents** - These are AI-written explorations of a "HumanWork" (also called "Amorphic") platform for orchestrating human-AI collaboration.

2. **Core Product Vision: "Operating System for Work"** - The documents describe a system that bridges human creativity with AI execution, using immutable event ledgers and role-based abstractions.

3. **Three-Layer Memory Architecture** - Event (immutable facts), Status (derived views), and Semantic (AI-assisted understanding): a layered memory model intended to support persistent AI collaboration.

4. **Explicit Anti-Automation Stance** - The documents argue that pure automation fails for knowledge work and propose "choreography" as an alternative paradigm.

5. **Research-Grade Thinking** - The documents include open questions, research frontiers, and a mature treatment of organizational transformation.
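
The Event/Status split in item 3 follows the familiar event-sourcing pattern. Facts are appended to an immutable log, and status is a view recomputed from that log rather than stored separately. The TypeScript sketch below illustrates only those two layers and leaves out the Semantic layer. The event kinds, `Ledger`, and `deriveStatus` are hypothetical names; they do not come from the HumanWork documents.

```typescript
// Hypothetical event kinds for illustration; not defined in the HumanWork docs.
type LedgerEvent =
  | { kind: "task.created"; taskId: string; title: string; at: number }
  | { kind: "task.assigned"; taskId: string; actor: string; at: number }
  | { kind: "task.completed"; taskId: string; at: number };

interface TaskStatus {
  title: string;
  assignee: string | null;
  done: boolean;
}

// Event layer: append-only. Each event is copied and frozen on append,
// and callers only receive a readonly-typed view of the list.
class Ledger {
  private readonly events: LedgerEvent[] = [];

  append(event: LedgerEvent): void {
    const copy: LedgerEvent = { ...event };
    this.events.push(Object.freeze(copy));
  }

  all(): readonly LedgerEvent[] {
    return this.events;
  }
}

// Status layer: a derived view. It is never stored; folding over the
// same events always rebuilds the same status.
function deriveStatus(events: readonly LedgerEvent[]): Map<string, TaskStatus> {
  const status = new Map<string, TaskStatus>();
  for (const e of events) {
    switch (e.kind) {
      case "task.created":
        status.set(e.taskId, { title: e.title, assignee: null, done: false });
        break;
      case "task.assigned": {
        const s = status.get(e.taskId);
        if (s) s.assignee = e.actor;
        break;
      }
      case "task.completed": {
        const s = status.get(e.taskId);
        if (s) s.done = true;
        break;
      }
    }
  }
  return status;
}

// Usage
const ledger = new Ledger();
ledger.append({ kind: "task.created", taskId: "t1", title: "Draft spec", at: 1 });
ledger.append({ kind: "task.assigned", taskId: "t1", actor: "alice", at: 2 });
ledger.append({ kind: "task.completed", taskId: "t1", at: 3 });
const t1 = deriveStatus(ledger.all()).get("t1");
// t1 is { title: "Draft spec", assignee: "alice", done: true }
```

Because status is only ever derived, correcting a mistake means appending a new event rather than editing an old one.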

### Relevant Quotes/Sections Found

> "Enterprise adoption of autonomous agents has stalled due to **Opacity** (we don't know how the agent works) and **Risk** (we can't trust it to run unsupervised)."
> Source: feedback.md, "The Core Thesis"

> "In order for AI to automate humans, we must first orchestrate humans. HumanWork is that orchestration layer."
> Source: feedback.md, "The Core Thesis"

> "The most sophisticated choreographed intelligence systems often appear less automated than simpler ones."
> Source: amorphic.md, "The Paradox of Automation"

> "How do we ensure HumanWork organizations remain aligned with human values as they become more autonomous?"
> Source: amorphic.md, "Open Questions"

> "Memory is not: Chat logs, Raw execution logs, Agent recall, Hidden embeddings. Memory is: Immutable facts, Derived views, Shared context, Inspectable truth."
> Source: spec.md, "Memory Philosophy"

> "The system helps humans think better - it never decides for them."
> Source: spec.md, "Philosophy"

> "What emerges isn't just a more capable system - it's a new category of intelligence entirely."
> Source: feedback.md, "From Platform to Paradigm"

### Themes Identified

1. **Human-AI Choreography** - Reframing "automation" as "collaboration" with explicit authority gradients and intervention points

2. **Memory as Infrastructure** - Treating organizational memory as first-class infrastructure with event sourcing and immutable ledgers

3. **Trust Through Transparency** - Building trust via observable behavior rather than assumed reliability

4. **Emergent Intelligence** - Designing systems that grow and learn rather than being fully specified upfront

5. **Anti-Patterns as Guardrails** - Explicit documentation of what NOT to do (e.g., never hide events, never let workflows execute directly)

6. **Workspace Metaphor** - Cognitive workspaces that maintain context across sessions and enable parallel exploration

7. **Role Abstraction** - Separating the "what" of work from "who does it" (human vs AI actor)

8. **Recursive Improvement** - Meta-learning where the collaboration system itself improves at collaboration

## Tool Evaluation

### What Worked Well

- **Section-targeted context** (`--section` flag) was extremely effective for extracting specific topics from large files
- **Boolean search operators** (OR, AND) enabled comprehensive theme discovery
- **Tree command with outlines** provided excellent navigation for 56K+ token documents
- **Token budget controls** (`-t` flag) kept extracted context within a chosen size
- **Search with line context** showed surrounding text, making results interpretable
- **Stats command** gave immediate understanding of corpus size and complexity
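
The OR/AND behaviour praised above can be approximated with a small query model. Split the query on `OR` into alternatives, split each alternative on `AND`, and keep a section if every term of at least one alternative appears in it. The sketch below is hypothetical; it is not the query parser shipped in mdcontext. It ignores parentheses, quoted phrases, and `NOT`, and its `limit = 10` default only mirrors the 10-result cap reported in this evaluation.

```typescript
// Hypothetical sketch; NOT mdcontext's query parser.
// "a AND b OR c" is read as (a AND b) OR c. Operators must be uppercase;
// parentheses, quoted phrases and NOT are not supported.
type Query = string[][]; // OR of AND-clauses

interface Section {
  heading: string;
  body: string;
}

function parseQuery(input: string): Query {
  return input
    .split(/\s+OR\s+/)
    .map((clause) =>
      clause
        .split(/\s+AND\s+/)
        .map((term) => term.trim().toLowerCase())
        .filter((term) => term.length > 0)
    )
    .filter((clause) => clause.length > 0);
}

// Case-insensitive substring match: a section matches if every term of
// at least one OR-alternative occurs in its heading or body.
function matches(query: Query, text: string): boolean {
  const haystack = text.toLowerCase();
  return query.some((clause) => clause.every((term) => haystack.includes(term)));
}

function search(query: string, sections: Section[], limit = 10): Section[] {
  const parsed = parseQuery(query);
  return sections
    .filter((s) => matches(parsed, `${s.heading}\n${s.body}`))
    .slice(0, limit);
}

// Usage
const sections: Section[] = [
  { heading: "Open Questions", body: "How do we ensure alignment with human values?" },
  { heading: "Anti-Patterns", body: "Never hide events. Never let workflows execute directly." },
  { heading: "Philosophy", body: "The system helps humans think better." },
];
const hits = search("alignment OR hide", sections).map((s) => s.heading);
// hits is ["Open Questions", "Anti-Patterns"]
```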

### What Was Frustrating

- **No embeddings by default** - Every search reminded me to index with `--embed`, but I stayed with keyword search
- **10-result limit** on searches - Could not see the full scope of matches without running multiple queries
- **Boolean search required explicit operators** - Had to write "OR" explicitly instead of using natural language
- **Context command truncation** - Condensing feedback.md to 16% of its length felt limiting for a summary
- **No way to exclude sections** - Could not say "give me everything EXCEPT this section"
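
Two of these frustrations, budget truncation and the lack of an exclude option, suggest a context builder that accepts an explicit exclude list and reports which sections were dropped for budget reasons. A hypothetical sketch follows. It does not describe how mdcontext's `context` command works, `buildContext` and its result fields are invented for illustration, and token counts use a rough four-characters-per-token heuristic rather than a real tokenizer.

```typescript
// Hypothetical sketch; not mdcontext's `context` implementation.
interface Section {
  heading: string;
  body: string;
}

interface ContextResult {
  included: string[]; // headings that fit within the budget
  excluded: string[]; // headings the caller asked to skip
  truncated: string[]; // headings dropped because the budget ran out
  tokensUsed: number;
}

// Rough heuristic (assumption): about 4 characters per token.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

function buildContext(sections: Section[], budget: number, exclude: string[] = []): ContextResult {
  const skip = new Set(exclude.map((h) => h.toLowerCase()));
  const result: ContextResult = { included: [], excluded: [], truncated: [], tokensUsed: 0 };
  for (const s of sections) {
    if (skip.has(s.heading.toLowerCase())) {
      result.excluded.push(s.heading);
      continue;
    }
    const cost = estimateTokens(`${s.heading}\n${s.body}`);
    // Keep scanning after a miss, so a later, smaller section can still fit.
    if (result.tokensUsed + cost <= budget) {
      result.included.push(s.heading);
      result.tokensUsed += cost;
    } else {
      result.truncated.push(s.heading);
    }
  }
  return result;
}

// Usage: everything except one section, plus a report of what did not fit.
const ctx = buildContext(
  [
    { heading: "The Core Thesis", body: "..." },
    { heading: "Key Differentiators", body: "..." },
    { heading: "Anti-Patterns", body: "..." },
  ],
  3000,
  ["Key Differentiators"]
);
console.log(ctx.included, ctx.truncated);
```

Returning `truncated` alongside `included` is the design point: the caller can see exactly what a budget cut removed instead of guessing.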

### What Was Missing

- **Cross-file semantic search** - Would have been useful to find themes across all 3 files at once
- **Relevance ranking** - Results came in document order, not relevance order
- **Summary generation** - Would appreciate AI-generated summaries of search results
- **Export to structured format** - No easy way to extract findings programmatically
- **Highlight/annotation** - Cannot mark sections for later reference
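
Relevance ranking means sorting matches by a score instead of returning them in document order. TF-IDF is a common baseline. The hypothetical sketch below scores each section by how often the query terms occur in it, with extra weight for terms that are rare across sections. It does not describe how mdcontext orders results, and its tokenizer is a naive lowercase word split.

```typescript
// Hypothetical TF-IDF ranking sketch; not mdcontext's search ordering.
interface Section {
  heading: string;
  body: string;
}

interface Ranked {
  heading: string;
  score: number;
}

// Naive tokenizer: lowercase runs of letters and digits.
const tokenize = (text: string): string[] => text.toLowerCase().match(/[a-z0-9]+/g) ?? [];

function rankByRelevance(query: string, sections: Section[]): Ranked[] {
  const terms = tokenize(query);
  const docs = sections.map((s) => tokenize(`${s.heading} ${s.body}`));
  const n = docs.length;

  // Smoothed IDF (the form scikit-learn uses with smooth_idf=True):
  // rarer terms get larger weights, and every weight stays positive.
  const idf = new Map<string, number>();
  for (const t of terms) {
    const df = docs.filter((words) => words.includes(t)).length;
    idf.set(t, Math.log((n + 1) / (df + 1)) + 1);
  }

  return sections
    .map((s, i) => {
      const words = docs[i];
      let score = 0;
      for (const t of terms) {
        // Term frequency divided by section length, so size alone does not win.
        const tf = words.filter((w) => w === t).length / Math.max(words.length, 1);
        score += tf * (idf.get(t) ?? 0);
      }
      return { heading: s.heading, score };
    })
    .filter((r) => r.score > 0) // drop non-matches
    .sort((a, b) => b.score - a.score); // best match first, not document order
}

// Usage
const ranked = rankByRelevance("memory", [
  { heading: "Workflows", body: "memory workflow checkpoint workflow" },
  { heading: "Memory Philosophy", body: "memory is immutable facts and memory is derived views" },
  { heading: "Philosophy", body: "humans decide" },
]);
// ranked headings: ["Memory Philosophy", "Workflows"]
```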

### Confidence Level

[X] High - The tool gave me reliable, reproducible access to the content. Boolean searches and section targeting let me systematically explore 89K tokens of dense material.

### Would Use Again? (1-5)

**4** - Strong tool for navigating large markdown corpora. The section-targeting feature is genuinely valuable for long documents. Would score 5 if semantic search were enabled by default and result limits were configurable. The "tree to understand structure, search to find content, context to extract" approach is a solid workflow.

## Time & Efficiency

- Commands run: 32
- Compared to reading all files: **Much less time** - Reading 89K tokens raw would take significant time; the tool allowed surgical extraction of relevant content in under 15 minutes of exploration.

## Additional Observations

### These are High-Quality Design Documents

The docs.llm/ folder contains what appear to be Claude-generated (or Claude-assisted) explorations of a complex product vision. The thinking is sophisticated, covering:

- Architectural patterns
- Organizational transformation
- Research frontiers
- Anti-patterns and guardrails
- Philosophy and design principles

### Feedback for the Product (HumanWork/Amorphic)

From the content I extracted, the documents themselves surface several self-reflective questions:

1. How to prevent "organizational capture," where systems optimize for self-perpetuation
2. How to maintain human agency while leveraging AI efficiency
3. How to ensure value alignment as systems become more autonomous
4. How to measure success in "hybrid organizations"

These open questions represent mature product thinking and should inform roadmap priorities.

### The "Memory Is All You Need" Parallel

The feedback.md document draws an explicit parallel between the Transformer paper "Attention Is All You Need" and its proposed "Memory Is All You Need" architecture. This bold positioning frames the product as a paradigm shift rather than an incremental improvement.

@@ -0,0 +1,228 @@
# B-Synth: Strategy B Synthesis

## Executive Summary

The three Strategy B agents collectively identified a mature, self-aware specification that thoroughly documents what NOT to do (anti-patterns, invariants) but has significant gaps in terminology alignment, implementation guidance, and philosophical framing. The most critical finding is the HumanWork-Evolution.md document, which already synthesizes feedback into a phased improvement plan; agents B1-B3 largely validated and expanded on this existing gap analysis.

## Cross-Agent Patterns

**Theme 1: Semantic Search Underperformance**
All three agents found semantic search unreliable for multi-word conceptual queries. B1 and B2 fell back to keyword search frequently. This is the strongest cross-agent signal about the mdcontext tool.

**Theme 2: HumanWork-Evolution.md as Critical Source**
Both B1 and B2 independently discovered this document as the authoritative source for gaps and critiques, validating its importance.

**Theme 3: Human-First Philosophy with Acknowledged Tensions**
All agents found that the docs emphasize human control, but B2 identified a philosophical gap: the spec frames human control as an end state rather than a transition phase toward "intelligence crystallization."

**Theme 4: Section-Level Context Extraction Praised**
All three agents highlighted the `--section` flag as highly effective for targeted retrieval.

**Theme 5: Checkpoint/Intervention Architecture**
B1 found checkpoints in anti-patterns, and B3 found them in workflow design; the spec heavily emphasizes checkpoints as its key governance mechanism.

## Consolidated Findings

### Architecture Criticisms (from B1)

**External Criticisms (of traditional approaches):**

- Brittleness of pure automation (combinatorial explosion of rules)
- Coordination Trap (multiplies human translation work)
- Innovation Strangulation (automation-incompatible approaches avoided)
- Judgment Gap (80% flawless, 20% chaos)
- Context Collapse (context as configuration, not conversation)
- Observability Problem (black-box agents kill trust)

**Self-Imposed Constraints (internal guardrails):**

- 8 Architectural Invariants (no hidden state, no irreversible execution, etc.)
- 7 Memory Model Anti-Patterns
- 8 Workflow Anti-Patterns

**Open Questions (acknowledged gaps):**

- Alignment with human values at scale
- Limits of organizational intelligence
- Preventing organizational capture (self-perpetuation)

### Gaps Identified (from B2)

**Terminology Gaps:**

- Agent -> Actor (unified human/machine)
- Artifact -> Deliverable (business language)
- Event Memory -> The Ledger (IP capture emphasis)

**Missing Primitives:**

- Correction Event (captures human intelligence on modifications)
- Authority Gradient (replaces binary control)
- Pattern Crystallization (organizational learning mechanism)

**Architectural Gaps:**

- No geometric/semantic embeddings in Semantic Memory
- Cost model doesn't unify human hours and AI tokens
- Privacy model is "policy overlay" only
- No formal API specification

**Philosophical Gap:**

- Spec positions "human control" as goal
- Feedback suggests reframing as "intelligence extraction"
- Human corrections should become portable organizational intelligence

### Workflow Improvements (from B3)

**Core Philosophy:**

- Workflows as "guidance without control"
- Six concepts: Entry Signals, Roles, Phases, Activities, Checkpoints, Exit Conditions
- Checkpoints as primary governance mechanism

**Authority Gradient (4 modes):**

1. Instructional: Step-by-step human instructions
2. Consultative: Human defines goal, agent proposes
3. Supervisory: Agents execute, humans monitor
4. Exploratory: Alternating generation/testing

**Intervention Points:**

- Redirect, Override, Inject, Escalate
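
The Authority Gradient and the four intervention points can be made concrete by encoding them as data that a runtime consults. In the hypothetical TypeScript sketch below, only the mode and intervention names come from the lists above. Everything else is an assumption: which modes require human approval before an agent-proposed step runs, how each intervention changes the work state, and the choice to treat Escalate as a switch to instructional (full human control) mode.

```typescript
// Hypothetical encoding; only the mode and intervention names come from the spec summary.
type AuthorityMode = "instructional" | "consultative" | "supervisory" | "exploratory";

// Assumption: whether an agent-proposed step needs human approval before it runs.
const NEEDS_APPROVAL: Record<AuthorityMode, boolean> = {
  instructional: true, // humans give step-by-step instructions
  consultative: true, // human defines the goal, agent proposes, human decides
  supervisory: false, // agents execute, humans monitor
  exploratory: false, // alternating generation/testing (assumed: humans review between rounds)
};

type Intervention =
  | { type: "redirect"; newGoal: string }
  | { type: "override"; replacement: string }
  | { type: "inject"; context: string }
  | { type: "escalate" };

interface WorkState {
  mode: AuthorityMode;
  goal: string;
  context: string[];
  lastOutput: string | null;
  interventions: string[]; // every intervention is recorded, never applied silently
}

// Returns a new state; the previous state is left untouched.
function intervene(state: WorkState, action: Intervention): WorkState {
  const next: WorkState = {
    ...state,
    context: [...state.context],
    interventions: [...state.interventions, action.type],
  };
  switch (action.type) {
    case "redirect":
      next.goal = action.newGoal;
      break;
    case "override":
      next.lastOutput = action.replacement; // human output replaces the agent's
      break;
    case "inject":
      next.context.push(action.context);
      break;
    case "escalate":
      next.mode = "instructional"; // assumption: hand full control back to a human
      break;
  }
  return next;
}

function mayRunWithoutApproval(state: WorkState): boolean {
  return !NEEDS_APPROVAL[state.mode];
}

// Usage
const start: WorkState = { mode: "supervisory", goal: "Draft spec", context: [], lastOutput: null, interventions: [] };
const injected = intervene(start, { type: "inject", context: "Use Ledger terminology" });
const afterEscalation = intervene(injected, { type: "escalate" });
// mayRunWithoutApproval(start) is true; mayRunWithoutApproval(afterEscalation) is false
```

Keeping `intervene` pure, so that it returns a new state and logs the action, fits the spec's emphasis on inspectable, non-hidden state.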

**Key Patterns:**

- Time Travel and Branching
- Parallel Exploration
- Immutable Workflow Versioning

**Organizational Transformation:**

- Choreographic Maturity Model (4 levels)
- Cultural shifts toward experimental mindsets

## Proposed Spec Changes (Prioritized)

### High Priority

- [ ] Rename Artifact -> Deliverable throughout (B2)
- [ ] Add Correction Event primitive (B2) - captures IP when humans modify outputs
- [ ] Add Authority Gradient to Execution Model (B2, B3) - instructional/consultative/supervisory/exploratory
- [ ] Expand Judgment Gap (80/20 problem) handling beyond "humans intervene" (B1)
- [ ] Add "Known Limitations and Trade-offs" section (B1) - what HumanWork sacrifices
- [ ] Unify cost model for Human + Machine Actors (B2)

### Medium Priority

- [ ] Add Actor primitive with type: Human | Machine (B2)
- [ ] Add Pattern Crystallization to Memory Model (B2)
- [ ] Rename Event Memory -> The Ledger (B2)
- [ ] Add cognitive telemetry to Checkpoints (B2) - deliberation_duration, confidence_signal, modification_depth
- [ ] Document concrete answers to Open Questions or mark as research priorities (B1)
- [ ] Create decision framework: when DAG-style execution IS appropriate (B1)
- [ ] Add explicit checkpoint requirements for high-stakes workflows (B3)
- [ ] Define minimum intervention points per workflow phase (B3)
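
Several of these proposed primitives fit together: an Actor that is either Human or Machine, a cost model that prices both in one currency, and a Correction Event that carries checkpoint telemetry. The sketch below is hypothetical. The three telemetry field names come from the list above; every other name, the units, the rates, and the use of normalized edit distance for `modification_depth` are assumptions chosen for illustration.

```typescript
// Hypothetical shapes; only the three telemetry field names come from the report.
type Actor =
  | { type: "Human"; id: string; hourlyRate: number } // currency per hour (assumed)
  | { type: "Machine"; id: string; ratePer1kTokens: number }; // currency per 1,000 tokens (assumed)

interface CheckpointTelemetry {
  deliberation_duration: number; // seconds before the human decided (assumed unit)
  confidence_signal: number; // 0..1 (assumed scale)
  modification_depth: number; // 0 = accepted as-is, 1 = fully rewritten
}

interface CorrectionEvent {
  kind: "correction";
  actorId: string;
  original: string;
  corrected: string;
  telemetry: CheckpointTelemetry;
}

// Unified cost model: both actor types are priced in the same currency.
function costOf(actor: Actor, usage: { hours?: number; tokens?: number }): number {
  return actor.type === "Human"
    ? actor.hourlyRate * (usage.hours ?? 0)
    : actor.ratePer1kTokens * ((usage.tokens ?? 0) / 1000);
}

// Levenshtein distance, keeping only two rows of the DP table.
function editDistance(a: string, b: string): number {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const substitution = prev[j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1);
      curr.push(Math.min(prev[j] + 1, curr[j - 1] + 1, substitution));
    }
    prev = curr;
  }
  return prev[b.length];
}

function recordCorrection(
  actor: Actor,
  original: string,
  corrected: string,
  deliberationSeconds: number,
  confidence: number
): CorrectionEvent {
  const longest = Math.max(original.length, corrected.length);
  return {
    kind: "correction",
    actorId: actor.id,
    original,
    corrected,
    telemetry: {
      deliberation_duration: deliberationSeconds,
      confidence_signal: confidence,
      // Normalized edit distance (assumption): share of characters changed.
      modification_depth: longest === 0 ? 0 : editDistance(original, corrected) / longest,
    },
  };
}

// Usage
const reviewer: Actor = { type: "Human", id: "alice", hourlyRate: 60 };
const drafter: Actor = { type: "Machine", id: "model-a", ratePer1kTokens: 0.002 };
const totalCost = costOf(reviewer, { hours: 0.5 }) + costOf(drafter, { tokens: 5000 });
const correction = recordCorrection(reviewer, "Artifact", "Deliverable", 12, 0.9);
```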

### Low Priority

- [ ] Enhance Semantic Memory with geometric embeddings (B2)
- [ ] Add detection guidance for when Status Memory becomes authoritative (B1)
- [ ] Reframe "human control" as transition phase, not end state (B2)
- [ ] Adopt choreography language over orchestration (B2)
- [ ] Develop privacy model beyond "policy overlay" (B2)
- [ ] Create formal API specification (B2)
- [ ] Establish choreographic maturity assessment framework (B3)
- [ ] Create signals taxonomy (activity, outcome, attention, health) (B3)

## Tool Evaluation Synthesis

All three agents used the mdcontext tool extensively (38, 35, and 41 commands respectively; 114 in total). Their assessments were remarkably consistent.

### Common Praise

- **Section-level context extraction** (`--section`) was universally praised as highly effective
- **Keyword search** was reliable and served as an essential fallback
- **Token budget control** (`-t`) helped manage context size
- **Tree command** gave a quick corpus overview
- **Fast embedding indexing** (~$0.003 cost)
- **Stats command** was useful for understanding corpus size

### Common Frustrations

- **Semantic search returned 0 results** for multi-word conceptual queries (all 3 agents)
- **Token truncation** without clear indication of what was excluded
- **No way to chain or aggregate searches** - had to run many separate commands
- **Multi-word keyword searches failed** (e.g., "issue challenge gap" = 0 results)
- **False positives** in keyword search
- **No semantic search threshold adjustment**
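
When a search applies a fixed similarity cutoff, a query that mixes several concepts can score moderately against many sections without clearing the cutoff for any of them, and the user sees nothing. The hypothetical sketch below makes the threshold a parameter and also returns the best score seen, so an empty result can still show how close the nearest section was. It does not describe mdcontext's implementation, and the three-dimensional vectors stand in for real embeddings.

```typescript
// Hypothetical sketch; not mdcontext's semantic search.
interface Embedded {
  heading: string;
  vector: number[];
}

interface SemanticResult {
  results: { heading: string; score: number }[];
  bestScore: number; // highest similarity seen, even if below the threshold
}

function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function semanticSearch(query: number[], items: Embedded[], threshold = 0.5, limit = 10): SemanticResult {
  const scored = items
    .map((it) => ({ heading: it.heading, score: cosine(query, it.vector) }))
    .sort((x, y) => y.score - x.score);
  return {
    results: scored.filter((r) => r.score >= threshold).slice(0, limit),
    bestScore: scored.length > 0 ? scored[0].score : 0,
  };
}

// Usage with toy 3-d vectors
const items: Embedded[] = [
  { heading: "Checkpoints", vector: [1, 0, 0] },
  { heading: "Authority Gradient", vector: [0.7, 0.7, 0] },
  { heading: "Pricing", vector: [0, 0, 1] },
];
const strict = semanticSearch([1, 0, 0], items, 0.9); // only "Checkpoints"
const relaxed = semanticSearch([1, 0, 0], items, 0.5); // "Checkpoints", "Authority Gradient"
```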

### Suggested Improvements

- Add fuzzy/stemmed search (fail vs failure)
- Add "search within results" / progressive refinement
- Add context around keyword matches without re-running
- Add combined semantic+keyword hybrid mode
- Add cross-document synthesis
- Add batch context extraction for multiple sections/files
- Add "related sections" feature
- Add Boolean operators in keyword mode
- Add export/save functionality
- Add "what's undefined" query (terms used but not defined)
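
Two of these suggestions can be prototyped in a few lines: stemmed matching (so "fail" also finds "failure") and a semantic+keyword hybrid. The hypothetical sketch below uses a deliberately crude suffix-stripping stemmer, not the Porter algorithm or any library stemmer. It blends a semantic score computed elsewhere with a keyword-overlap score through a weight `alpha`. None of these names exist in mdcontext.

```typescript
// Hypothetical sketch; not part of mdcontext.
// Crude suffix stripping (not Porter): longest suffixes first, and only when
// at least 3 characters remain, so short words like "pure" are left alone.
const SUFFIXES = ["ations", "ation", "ures", "ings", "ure", "ing", "ed", "es", "s"];

function stem(word: string): string {
  const w = word.toLowerCase();
  for (const suffix of SUFFIXES) {
    if (w.endsWith(suffix) && w.length - suffix.length >= 3) {
      return w.slice(0, -suffix.length);
    }
  }
  return w;
}

const tokenize = (text: string): string[] => text.toLowerCase().match(/[a-z]+/g) ?? [];

// Fraction of distinct query stems found in the text: 0 (none) to 1 (all).
// Partial matches still score, unlike an all-terms-required keyword search.
function keywordScore(query: string, text: string): number {
  const queryStems = Array.from(new Set(tokenize(query).map(stem)));
  if (queryStems.length === 0) return 0;
  const textStems = new Set(tokenize(text).map(stem));
  const hits = queryStems.filter((s) => textStems.has(s)).length;
  return hits / queryStems.length;
}

// Weighted blend of a semantic similarity (e.g. cosine, computed elsewhere)
// and the keyword score; alpha = 1 is pure semantic, alpha = 0 pure keyword.
function hybridScore(semantic: number, keyword: number, alpha = 0.5): number {
  return alpha * semantic + (1 - alpha) * keyword;
}

// Usage
const stems = ["failure", "limitations"].map(stem); // ["fail", "limit"]
const kw = keywordScore("failure limitations", "The pipeline failed due to known limits"); // 1
const combined = hybridScore(0.2, kw); // 0.6
```

Scoring the fraction of matched stems, rather than requiring all of them, is one way to avoid the "issue challenge gap" = 0 results behaviour reported above.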

### Quantitative Summary

| Agent | Commands | Confidence | Rating |
| ----- | -------- | ---------- | ------ |
| B1 | 38 | Medium | 4/5 |
| B2 | 35 | High | 4/5 |
| B3 | 41 | High | 4/5 |

All agents rated the tool 4/5 and found it significantly faster than reading all files manually.

## Methodology Assessment

How well did Strategy B (divide by question) work?

### Strengths

- **Clear scope boundaries**: Each agent had a focused research question, avoiding overlap
- **Efficient parallelization**: Three agents could work simultaneously on different questions
- **Natural synthesis path**: Findings from each question type combined naturally into a coherent picture
- **Reduced redundancy**: Agents didn't repeat the same searches (unlike Strategy A's file-based division)
- **Comprehensive coverage**: Architecture + Gaps + Workflows covers the spec from multiple angles
- **Discovery of key document**: Multiple agents independently found HumanWork-Evolution.md, validating its importance

### Weaknesses

- **Question boundaries can be fuzzy**: "Architecture criticisms" vs "gaps" had some overlap (e.g., observability problem)
- **Dependent insights split**: Authority Gradient appeared in both B2 (as gap) and B3 (as workflow improvement)
- **No shared discovery context**: B2 found HumanWork-Evolution.md, which would have helped B1's research
- **Variable scope difficulty**: Some questions (workflows) were more expansive than others (architecture criticisms)

### Would Recommend For

- **Documentation analysis** where questions naturally partition the content
- **Due diligence reviews** (legal, technical, financial angles)
- **Research synthesis** where multiple perspectives on the same corpus are needed
- **Gap analysis** where "what exists" vs "what's missing" are distinct questions
- **Any task where questions are more natural than file divisions**

### Not Recommended For

- **Code review** (files matter more than questions)
- **Tasks where answers span all questions** (high synthesis overhead)
- **Simple/small corpora** (parallelization overhead not worth it)
|
|
217
|
+
|
|
218
|
+
## Appendix: Agent Command Efficiency
|
|
219
|
+
|
|
220
|
+
| Metric | B1 | B2 | B3 | Total |
|
|
221
|
+
| ------------------- | --- | --- | --- | ----- |
|
|
222
|
+
| Commands run | 38 | 35 | 41 | 114 |
|
|
223
|
+
| Semantic searches | 8 | 4 | 12 | 24 |
|
|
224
|
+
| Keyword searches | 22 | 23 | 0 | 45 |
|
|
225
|
+
| Context extractions | 13 | 9 | 19 | 41 |
|
|
226
|
+
| Tree/Stats/Index | 3 | 3 | 3 | 9 |
|
|
227
|
+
|
|
228
|
+
**Key observation**: B3 (workflows) used semantic search exclusively and found it more effective for their domain. B1 and B2 heavily relied on keyword search after semantic search failed. This suggests semantic search may work better for concrete concepts (workflows, collaboration) than abstract critiques (gaps, criticisms).
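
One mechanism that would produce this pattern is a fixed similarity cutoff. If semantic search keeps only sections whose cosine similarity to the query embedding clears a threshold, a broad multi-concept query can land between topics and clear the bar for nothing, while a concrete query clears it easily. The sketch below illustrates that effect with hand-made three-dimensional vectors; the vectors, the 0.8 and 0.5 cutoffs, and all names are hypothetical and are not taken from mdcontext's implementation.

```typescript
// Hypothetical illustration: cosine similarity plus a fixed cutoff.
// Vectors, section ids, and thresholds are invented for this sketch.
type IndexedSection = { id: string; embedding: number[] };

function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Keep only sections whose similarity clears the threshold, best first.
function searchWithThreshold(query: number[], candidates: IndexedSection[], threshold: number) {
  return candidates
    .map((s) => ({ id: s.id, score: cosine(query, s.embedding) }))
    .filter((hit) => hit.score >= threshold)
    .sort((x, y) => y.score - x.score);
}

const sections: IndexedSection[] = [
  { id: "workflows", embedding: [1, 0, 0] },
  { id: "collaboration", embedding: [0, 1, 0] },
  { id: "failure-modes", embedding: [0, 0, 1] },
];

// A concrete query points mostly at one topic.
const concreteQuery = [0.9, 0.1, 0];
// A broad query spreads its weight across several topics.
const broadQuery = [0.6, 0.5, 0.6];

console.log(searchWithThreshold(concreteQuery, sections, 0.8)); // one hit: workflows
console.log(searchWithThreshold(broadQuery, sections, 0.8)); // no hits
console.log(searchWithThreshold(broadQuery, sections, 0.5)); // all three once lowered
```

Lowering the cutoff trades precision for recall, which is the kind of sensitivity control the B1 report lists as missing.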

# Report: B1 - Architecture Critic Hunter

## Mission

Find architecture and design criticisms across all documentation

## Research Question

What architecture and design criticisms exist?

## Command Log

| #   | Command | Purpose | Result | Useful? |
| --- | ------- | ------- | ------ | ------- |
| 1   | `mdcontext --help` | Learn tool | Got full usage guide | Yes |
| 2   | `mdcontext index --embed --force` | Index all files with embeddings | 23 docs, 922 sections, 904 embeddings | Yes |
| 3   | `mdcontext search "architecture criticism problems design flaws limitations"` | Semantic search for criticisms | 1 result (ARCHITECTURAL_FOUNDATIONS) | Partial |
| 4   | `mdcontext search "design trade-offs weaknesses concerns issues"` | Semantic search | 0 results | No |
| 5   | `mdcontext search "failure problems complexity challenges"` | Semantic search | 0 results | No |
| 6   | `mdcontext search "failure" --mode keyword` | Keyword search | 10 results | Yes |
| 7   | `mdcontext context docs.amorphic/02-THE_FAILURE_OF_PURE_AUTOMATION.md -t 3000` | Get full failure analysis | Full document | Yes |
| 8   | `mdcontext search "limitations" --mode keyword` | Keyword search | 1 result | Yes |
| 9   | `mdcontext search "problem" --mode keyword` | Keyword search | 10 results | Yes |
| 10  | `mdcontext search "risk" --mode keyword` | Keyword search | 10 results | Yes |
| 11  | `mdcontext search "anti-pattern" --mode keyword` | Keyword search | 4 results | Yes |
| 12  | `mdcontext context docs/05-MEMORY_MODEL.md --section "Anti-Patterns"` | Get memory anti-patterns | 7 forbidden patterns | Yes |
| 13  | `mdcontext context docs/06-WORKFLOWS.md --section "Anti-Patterns"` | Get workflow anti-patterns | 8 forbidden patterns | Yes |
| 14  | `mdcontext search "concern" --mode keyword` | Keyword search | 10 results | Yes |
| 15  | `mdcontext search "brittle" --mode keyword` | Keyword search | 10 results | Yes |
| 16  | `mdcontext search "complexity" --mode keyword` | Keyword search | 9 results | Yes |
| 17  | `mdcontext search "overhead" --mode keyword` | Keyword search | 7 results | Yes |
| 18  | `mdcontext search "design patterns architecture decision"` | Semantic search | 10 results | Yes |
| 19  | `mdcontext context docs.amorphic/03-ARCHITECTURAL_FOUNDATIONS.md -t 2000` | Get architectural foundations | Full document | Yes |
| 20  | `mdcontext context docs.amorphic/05-TECHNICAL_IMPLEMENTATION_PATTERNS.md -t 2500` | Get implementation patterns | Full document | Yes |
| 21  | `mdcontext search "gap" --mode keyword` | Keyword search | 8 results | Yes |
| 22  | `mdcontext context docs.amorphic/02-THE_FAILURE_OF_PURE_AUTOMATION.md --section "Judgment Gap"` | Get judgment gap section | Detailed section | Yes |
| 23  | `mdcontext search "forbidden" --mode keyword` | Keyword search | 5 results | Yes |
| 24  | `mdcontext search "corrupt" --mode keyword` | Keyword search | 5 results | Yes |
| 25  | `mdcontext tree docs/01-ARCHITECTURE.md` | Get document outline | 47 sections | Yes |
| 26  | `mdcontext context docs/01-ARCHITECTURE.md --section "Why This Architecture Works"` | Get rationale | Brief justification | Yes |
| 27  | `mdcontext context docs/01-ARCHITECTURE.md --section "Architectural Invariants"` | Get invariants | 8 invariants | Yes |
| 28  | `mdcontext search "fail" --mode keyword` | Keyword search | 10 results | Yes |
| 29  | `mdcontext context docs/00-README.md --section "What Problem"` | Get problem statement | Core problem | Yes |
| 30  | `mdcontext search "cost" --mode keyword` | Keyword search | 10 results | Yes |
| 31  | `mdcontext context docs.llm/feedback.md -t 3000` | Get feedback document | Chat feedback analysis | Yes |
| 32  | `mdcontext search "traditional" --mode keyword` | Keyword search | 10 results | Yes |
| 33  | `mdcontext tree docs.llm/amorphic.md` | Get amorphic outline | Full outline | Yes |
| 34  | `mdcontext context docs.llm/amorphic.md --section "Open Questions"` | Get open questions | 3 open questions | Yes |
| 35  | `mdcontext context docs.llm/amorphic.md --section "Paradox of Automation"` | Get paradox section | Detailed section | Yes |
| 36  | `mdcontext search "wrong" --mode keyword` | Keyword search | 6 results | Yes |
| 37  | `mdcontext context docs.amorphic/04-THE_HUMAN-AGENT_COLLABORATION_MODEL.md --section "Observability Problem"` | Get observability problem | Key issue identified | Yes |
| 38  | `mdcontext search "scale" --mode keyword` | Keyword search | 10 results | Yes |

## Findings

### Key Discoveries

#### 1. Criticisms of Traditional/Pure Automation (Major Theme)

The documentation extensively critiques traditional automation approaches:

- **Brittleness**: "The system becomes brittle not because any individual rule is wrong, but because the combinatorial explosion of rules creates a rigid lattice that cannot bend without breaking."
- **Coordination Trap**: Pure automation "multiplies coordination requirements by forcing human work into machine-readable formats that require constant translation and synchronization."
- **Innovation Strangulation**: "Teams avoid innovative approaches not because they're technically inferior, but because they're automation-incompatible."
- **Human Bottleneck Paradox**: Attempting to eliminate humans creates new bottlenecks in system configuration and exception handling.
- **Context Collapse**: Traditional systems treat "context as configuration rather than conversation."
- **Judgment Gap**: "Systems that handle 80% of cases flawlessly but create chaos in the remaining 20%."

#### 2. Agent System Criticisms (Self-Aware)

The documentation acknowledges problems with current agent systems:

> "Most agent systems fail at real work because they optimize for demos, single-shot tasks, and autonomous execution. They become opaque, brittle, hard to interrupt, impossible to rewind, and unsafe to scale."
> Source: docs/00-README.md

#### 3. Observability Problem

> "Most agent systems are black boxes. You send a request, wait, and get a result - with no visibility into what happened in between. When something goes wrong, you're left debugging phantom processes and mysterious failures. This opacity kills trust."
> Source: docs.amorphic/04-THE_HUMAN-AGENT_COLLABORATION_MODEL.md

#### 4. Anti-Patterns Explicitly Forbidden

**Memory Model Anti-Patterns:**

- Storing mutable state in Event Memory
- Treating Status Memory as authoritative
- Letting Semantic Memory drive execution
- Hiding Events from humans
- Creating circular dependencies between layers
- Bypassing Event Memory for "performance"
- Hard-deleting critical audit events

**Workflow Anti-Patterns:**

- Workflows that execute directly
- Workflows that mutate artifacts
- Workflows that allocate cost
- Workflows that own agents
- Hidden workflow state
- Workflows that become Turing-complete
- Mandatory workflows (at system level)
- Workflows that bypass Control Plane
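
Read together, these rules suggest that a workflow is inert data: it may describe steps, but only the Control Plane decides whether each step runs and records what happened. The sketch below encodes that split under stated assumptions; `Workflow`, `ProposedStep`, `ControlPlane`, and the allow-list check are hypothetical names, not types defined in docs/06-WORKFLOWS.md.

```typescript
// Hypothetical sketch: a workflow only *describes* steps; it never runs them.
type ProposedStep = { readonly action: string; readonly target: string };

// A workflow is plain, immutable data: a fixed list of steps with no loops,
// no hidden state, and no handle on agents, artifacts, or budgets.
type Workflow = { readonly name: string; readonly steps: ReadonlyArray<ProposedStep> };

class ControlPlane {
  private readonly log: string[] = [];

  constructor(private readonly allowedActions: string[]) {}

  // The only path to execution: every proposed step is checked and recorded here.
  run(workflow: Workflow, execute: (step: ProposedStep) => void): string[] {
    workflow.steps.forEach((step) => {
      if (this.allowedActions.indexOf(step.action) === -1) {
        this.log.push(`rejected ${step.action} on ${step.target}`);
      } else {
        execute(step);
        this.log.push(`executed ${step.action} on ${step.target}`);
      }
    });
    return this.log.slice(); // hand out a copy so callers cannot rewrite history
  }
}

const review: Workflow = {
  name: "doc-review",
  steps: [
    { action: "draft", target: "spec.md" },
    { action: "delete", target: "audit-log" },
  ],
};

const plane = new ControlPlane(["draft", "review"]);
const performed: string[] = [];
const outcome = plane.run(review, (step) => performed.push(step.target));
console.log(outcome); // executed draft on spec.md, rejected delete on audit-log
```

Keeping the step list declarative is also what keeps a workflow from becoming Turing-complete: it has no branching or looping of its own to grow.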

#### 5. Architectural Invariants (Design Constraints)

The system explicitly maintains these constraints to avoid known issues:

- No hidden mutable state
- No irreversible execution
- No unobservable progress
- No agent-owned memory
- No loss of human authority
- No concurrent mutation of the same scope
- No execution without a Workspace
- No automatic flow from Org to Workspace
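
Two of these invariants, no execution without a Workspace and no concurrent mutation of the same scope, can be enforced at a single choke point. The sketch below is an in-memory, single-process illustration; `ScopeGuard`, keying scopes by workspace, and rejecting (rather than queueing) a second mutator are all assumptions, and a real deployment would need durable or distributed locking.

```typescript
// Hypothetical in-memory guard for two invariants:
//   - no execution without a Workspace
//   - no concurrent mutation of the same scope
class ScopeGuard {
  // "workspace/scope" -> id of the holder currently mutating it
  private readonly holders: { [key: string]: string } = {};

  constructor(private readonly workspaces: string[]) {}

  // Try to start a mutation; returns false instead of blocking.
  begin(workspace: string | undefined, scope: string, holder: string): boolean {
    if (workspace === undefined || this.workspaces.indexOf(workspace) === -1) {
      throw new Error("execution requires a known Workspace");
    }
    const key = `${workspace}/${scope}`;
    if (Object.prototype.hasOwnProperty.call(this.holders, key)) {
      return false; // someone else is already mutating this scope
    }
    this.holders[key] = holder;
    return true;
  }

  // Only the current holder may release the scope.
  end(workspace: string, scope: string, holder: string): void {
    const key = `${workspace}/${scope}`;
    if (this.holders[key] === holder) {
      delete this.holders[key];
    }
  }
}

const guard = new ScopeGuard(["ws-1"]);
console.log(guard.begin("ws-1", "spec.md", "agent-a")); // true
console.log(guard.begin("ws-1", "spec.md", "agent-b")); // false: scope busy
guard.end("ws-1", "spec.md", "agent-a");
console.log(guard.begin("ws-1", "spec.md", "agent-b")); // true after release
```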

#### 6. Open Questions (Acknowledged Gaps)

> "How do we ensure HumanWork organizations remain aligned with human values as they become more autonomous?"
>
> "What are the limits of organizational intelligence? Are there problems that fundamentally require individual rather than collective cognition?"
>
> "How do we prevent organizational capture - scenarios where HumanWork systems optimize for their own perpetuation rather than their intended purposes?"
>
> Source: docs.llm/amorphic.md

#### 7. Substrate Problem

> "Implementation details leak into the conceptual model, making the workflow harder to reason about and modify."
> Source: docs.amorphic/03-ARCHITECTURAL_FOUNDATIONS.md

### Relevant Quotes/Sections Found

> "Pure automation assumes complete knowledge of the problem space. It requires that all possible states, transitions, and edge cases be enumerable at design time. This works beautifully for manufacturing widgets or processing financial transactions - domains where the rules are well-understood and the exceptions are genuinely exceptional. But knowledge work exists in a different regime entirely."
> Source: docs.amorphic/02-THE_FAILURE_OF_PURE_AUTOMATION.md, The Brittleness of Complete Systems

> "The paradox emerges when pure automation, in attempting to eliminate human bottlenecks, creates new bottlenecks in the form of system configuration, exception handling, and cross-system integration."
> Source: docs.amorphic/02-THE_FAILURE_OF_PURE_AUTOMATION.md, The Human Bottleneck Paradox

> "Traditional workflow systems model execution as directed acyclic graphs (DAGs) - nodes representing tasks, edges representing dependencies. This works well for batch processing and pipeline scenarios where the structure is known in advance. But it breaks down when workflows need to adapt their structure based on runtime conditions or accumulated learning."
> Source: docs.amorphic/03-ARCHITECTURAL_FOUNDATIONS.md, Component Relationships
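
To make the quoted DAG model concrete: an engine of this kind typically computes the whole execution order before running anything, for example with Kahn's topological sort. The sketch shows both halves of the quote: a structure known in advance schedules cleanly, while a feedback loop such as "revise until approved" is rejected as a cycle instead of adapting at runtime. Task names are hypothetical, and this is a generic illustration rather than code from the HumanWork docs.

```typescript
// Kahn's algorithm: compute a complete execution order before running anything.
// Edges map a task to the tasks that depend on it.
function topoOrder(edges: { [task: string]: string[] }): string[] {
  const indegree: { [task: string]: number } = {};
  Object.keys(edges).forEach((task) => {
    if (!(task in indegree)) indegree[task] = 0;
    edges[task].forEach((next) => {
      indegree[next] = (indegree[next] || 0) + 1;
    });
  });

  // Start with every task that has no unmet dependencies.
  const ready = Object.keys(indegree).filter((t) => indegree[t] === 0);
  const order: string[] = [];
  while (ready.length > 0) {
    const task = ready.shift() as string;
    order.push(task);
    (edges[task] || []).forEach((next) => {
      indegree[next] -= 1;
      if (indegree[next] === 0) ready.push(next);
    });
  }

  // Any task left with unmet dependencies sits on a cycle.
  if (order.length !== Object.keys(indegree).length) {
    throw new Error("cycle detected: this structure is not a DAG");
  }
  return order;
}

// Structure known in advance: schedules cleanly.
console.log(topoOrder({ fetch: ["parse"], parse: ["index"], index: [] })); // fetch, parse, index

// A feedback loop ("revise until approved") cannot be scheduled up front.
try {
  topoOrder({ draft: ["review"], review: ["revise"], revise: ["review"] });
} catch (err) {
  console.log((err as Error).message);
}
```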

> "If Status Memory cannot be rebuilt, it has become a source of truth and the system is corrupted."
> Source: docs/05-MEMORY_MODEL.md, The Hard Rule
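
The Hard Rule implies a mechanical check, which also bears on the proposed spec change about detecting when Status Memory has "become authoritative": replay Event Memory from scratch and compare the result with the stored status. The sketch assumes a simple event shape and a last-event-wins projection; `TaskEvent`, `projectStatus`, and `statusIsRebuildable` are hypothetical names, and the real memory model in docs/05-MEMORY_MODEL.md may differ.

```typescript
// Hypothetical shapes: Event Memory is an append-only list of immutable facts,
// Status Memory is a derived view that must be rebuildable from those facts.
type TaskEvent = Readonly<{ seq: number; taskId: string; kind: "created" | "started" | "done" }>;
type StatusView = { [taskId: string]: string };

// Pure projection: replaying the same events always yields the same view.
function projectStatus(events: ReadonlyArray<TaskEvent>): StatusView {
  const view: StatusView = {};
  events
    .slice()
    .sort((a, b) => a.seq - b.seq)
    .forEach((e) => {
      view[e.taskId] = e.kind; // the latest event for a task wins
    });
  return view;
}

// If the stored view holds anything the events cannot reproduce,
// Status Memory has quietly become a source of truth.
function statusIsRebuildable(events: ReadonlyArray<TaskEvent>, stored: StatusView): boolean {
  const rebuilt = projectStatus(events);
  const keys = Object.keys(rebuilt).concat(Object.keys(stored));
  return keys.every((k) => rebuilt[k] === stored[k]);
}

const eventLog: ReadonlyArray<TaskEvent> = [
  { seq: 1, taskId: "t1", kind: "created" },
  { seq: 2, taskId: "t1", kind: "started" },
  { seq: 3, taskId: "t2", kind: "created" },
];

const healthy: StatusView = { t1: "started", t2: "created" };
const drifted: StatusView = { t1: "done", t2: "created" }; // edited in place, no event

console.log(statusIsRebuildable(eventLog, healthy)); // true
console.log(statusIsRebuildable(eventLog, drifted)); // false
```

Run periodically, a check like this flags drift before anything starts reading the drifted view as authoritative.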

### Answer to Research Question

**What architecture and design criticisms exist?**

The documentation contains extensive, self-aware architectural criticism organized into three categories:

1. **Criticisms of Traditional Approaches (external):** The docs thoroughly critique pure automation, traditional workflow systems (DAGs), black-box agent systems, and context-as-configuration approaches. These criticisms justify the HumanWork design decisions.

2. **Self-Imposed Constraints (internal guardrails):** The architecture explicitly forbids specific anti-patterns for both memory and workflows. These represent lessons learned about what NOT to do, and the docs treat their appearance as a "corrupted" state.

3. **Acknowledged Open Questions (honest gaps):** The documentation admits uncertainty about alignment with human values at scale, the limits of organizational intelligence, and how to prevent organizational capture.

The architectural philosophy is defensive: it explicitly names what can go wrong and builds constraints to prevent it. The invariants and anti-patterns serve as architectural "unit tests" against known failure modes.

## Proposed Spec Changes

- [ ] Add a section on "Known Limitations and Trade-offs" to acknowledge what the HumanWork architecture sacrifices (e.g., raw execution speed for observability)
- [ ] Expand on how the Judgment Gap (80/20 problem) is specifically addressed beyond "humans intervene"
- [ ] Document concrete answers to the Open Questions or mark them as research priorities
- [ ] Add guidance on detecting when Status Memory has "become authoritative" before corruption occurs
- [ ] Create a decision framework for when DAG-style execution IS appropriate vs. adaptive execution

## Tool Evaluation

### What Worked Well

- Keyword search (`--mode keyword`) was highly effective for finding specific terms like "failure", "brittle", "anti-pattern"
- Section-targeted context (`--section "X"`) efficiently extracted exactly what I needed
- The `tree` command helped me understand document structure before diving in
- Embedding indexing was fast and a one-time cost
- Token budget control (`-t`) helped manage context size
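
Heading-bounded extraction is easy to predict, which may be part of why `--section` worked well. A minimal sketch of the general idea, assuming ATX (`#`) headings and a case-insensitive substring match on heading text; this is not mdcontext's code, and its real matching rules may differ.

```typescript
// Hypothetical sketch of heading-based section extraction (not mdcontext's code).
// A section runs from its heading to the next heading of the same or higher level.
function extractSection(markdown: string, wanted: string): string | null {
  const lines = markdown.split("\n");
  const headingRe = /^(#{1,6})\s+(.*)$/;
  let start = -1;
  let level = 0;

  for (let i = 0; i < lines.length; i++) {
    const m = headingRe.exec(lines[i]);
    if (!m) continue;
    const depth = m[1].length;
    if (start === -1) {
      // Case-insensitive substring match on the heading text.
      if (m[2].toLowerCase().indexOf(wanted.toLowerCase()) !== -1) {
        start = i;
        level = depth;
      }
    } else if (depth <= level) {
      return lines.slice(start, i).join("\n").trim();
    }
  }
  // Matched section runs to the end of the document, or nothing matched.
  return start === -1 ? null : lines.slice(start).join("\n").trim();
}

const doc = [
  "# Memory Model",
  "## The Hard Rule",
  "Status Memory must be rebuildable.",
  "## Anti-Patterns",
  "- Storing mutable state in Event Memory",
  "### Notes",
  "Nested headings stay inside the section.",
  "## Summary",
  "Done.",
].join("\n");

console.log(extractSection(doc, "anti-patterns")); // the Anti-Patterns section, including its ### Notes
```

This naive version would misread `#` lines inside fenced code blocks as headings; a real markdown parser avoids that.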

### What Was Frustrating

- Semantic search often returned 0 results for multi-word queries that should have matched
- Semantic search for "design trade-offs weaknesses concerns issues" returned nothing
- Semantic search for "failure problems complexity challenges" returned nothing
- Frequently had to fall back to keyword search after semantic search failed
- Multi-word keyword searches didn't work (e.g., "issue challenge gap" = 0 results)
- Unclear whether boolean operators are supported in keyword mode
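
The "issue challenge gap" result is consistent with a keyword mode that treats the whole query as one literal phrase. That is an inference from the observed behavior, not from mdcontext's source. The sketch contrasts phrase matching with splitting the query into terms and ranking sections by how many terms they contain (OR semantics); the section texts and function names are invented.

```typescript
// Hypothetical contrast between two keyword strategies (section texts are invented).
const corpus: { id: string; text: string }[] = [
  { id: "gaps", text: "Known gap: the judgment gap in edge cases." },
  { id: "risks", text: "Each challenge here is a scaling issue." },
  { id: "intro", text: "HumanWork keeps humans in the loop." },
];

// Phrase mode: the full query must appear verbatim.
function phraseSearch(query: string): string[] {
  const q = query.toLowerCase();
  return corpus.filter((s) => s.text.toLowerCase().indexOf(q) !== -1).map((s) => s.id);
}

// Term mode: split on whitespace, score by number of distinct terms present.
function termSearch(query: string): { id: string; hits: number }[] {
  const terms = query.toLowerCase().split(/\s+/).filter((t) => t.length > 0);
  return corpus
    .map((s) => {
      const text = s.text.toLowerCase();
      const hits = terms.filter((t) => text.indexOf(t) !== -1).length;
      return { id: s.id, hits };
    })
    .filter((r) => r.hits > 0)
    .sort((a, b) => b.hits - a.hits);
}

console.log(phraseSearch("issue challenge gap")); // no matches
console.log(termSearch("issue challenge gap")); // risks (2 terms), then gaps (1 term)
```

Because this sketch uses substring containment, "fail" would also match "failure"; a production search would more likely tokenize and stem.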

### What Was Missing

- No fuzzy/stemmed search (had to search "fail" and "failure" separately)
- No "search within results" or progressive refinement
- No way to get context around keyword matches without re-running with `context`
- No way to adjust the semantic search threshold/sensitivity
- No combined semantic+keyword hybrid mode
- Difficult to search for concepts without knowing the exact terms
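
A common way to provide the missing hybrid mode is reciprocal rank fusion (RRF): run semantic and keyword search separately, give each section 1/(k + rank) for every list it appears in, and sort by the sum. Because RRF uses ranks rather than raw scores, it avoids comparing cosine similarities with keyword match counts. This is a general technique, not a documented mdcontext feature; k = 60 is the constant commonly used with RRF, and the result lists below are invented.

```typescript
// Reciprocal rank fusion: combine ranked lists without comparing raw scores.
// score(id) = sum over lists of 1 / (k + rank), with rank starting at 1.
function reciprocalRankFusion(lists: string[][], k = 60): { id: string; score: number }[] {
  const scores: { [id: string]: number } = {};
  lists.forEach((list) => {
    list.forEach((id, index) => {
      scores[id] = (scores[id] || 0) + 1 / (k + index + 1);
    });
  });
  return Object.keys(scores)
    .map((id) => ({ id, score: scores[id] }))
    .sort((a, b) => b.score - a.score);
}

// Invented result lists for the same query.
const semanticHits = ["ARCHITECTURAL_FOUNDATIONS", "COLLABORATION_MODEL"];
const keywordHits = ["FAILURE_OF_PURE_AUTOMATION", "ARCHITECTURAL_FOUNDATIONS", "MEMORY_MODEL"];

console.log(reciprocalRankFusion([semanticHits, keywordHits]).map((r) => r.id));
// ARCHITECTURAL_FOUNDATIONS ranks first because it appears in both lists
```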

### Confidence Level

[X] Medium

The keyword search found the explicit criticisms comprehensively. However, I may have missed implicit criticisms or design concerns that don't use obvious negative terminology. Semantic search fell short of expectations.

### Would Use Again? (1-5)

**4** - Good for structured documentation analysis. Keyword search is reliable. Would use again, with the clearer expectation that semantic search needs more work. The section-level context extraction is genuinely useful for targeted retrieval.

## Time & Efficiency

- Commands run: **38**
- Compared to reading all files: **Much less**. Reading all docs manually would have taken 30+ minutes; tool-based search took approximately 15 minutes to find all relevant criticisms.
- Token efficiency: Reduced ~150k tokens of docs to targeted extracts totaling ~15k tokens of relevant content (roughly a 10x reduction)