mdcontext 0.0.1 → 0.1.0

Files changed (140)
  1. package/.changeset/README.md +28 -0
  2. package/.changeset/config.json +11 -0
  3. package/.github/workflows/ci.yml +83 -0
  4. package/.github/workflows/release.yml +113 -0
  5. package/.tldrignore +112 -0
  6. package/AGENTS.md +46 -0
  7. package/BACKLOG.md +338 -0
  8. package/README.md +231 -11
  9. package/biome.json +36 -0
  10. package/cspell.config.yaml +14 -0
  11. package/dist/chunk-KRYIFLQR.js +92 -0
  12. package/dist/chunk-S7E6TFX6.js +742 -0
  13. package/dist/chunk-VVTGZNBT.js +1519 -0
  14. package/dist/cli/main.d.ts +1 -0
  15. package/dist/cli/main.js +2015 -0
  16. package/dist/index.d.ts +266 -0
  17. package/dist/index.js +86 -0
  18. package/dist/mcp/server.d.ts +1 -0
  19. package/dist/mcp/server.js +376 -0
  20. package/docs/019-USAGE.md +586 -0
  21. package/docs/020-current-implementation.md +364 -0
  22. package/docs/021-DOGFOODING-FINDINGS.md +175 -0
  23. package/docs/BACKLOG.md +80 -0
  24. package/docs/DESIGN.md +439 -0
  25. package/docs/PROJECT.md +88 -0
  26. package/docs/ROADMAP.md +407 -0
  27. package/docs/test-links.md +9 -0
  28. package/package.json +69 -10
  29. package/pnpm-workspace.yaml +5 -0
  30. package/research/config-analysis/01-current-implementation.md +470 -0
  31. package/research/config-analysis/02-strategy-recommendation.md +428 -0
  32. package/research/config-analysis/03-task-candidates.md +715 -0
  33. package/research/config-analysis/033-research-configuration-management.md +828 -0
  34. package/research/config-analysis/034-research-effect-cli-config.md +1504 -0
  35. package/research/config-analysis/04-consolidated-task-candidates.md +277 -0
  36. package/research/dogfood/consolidated-tool-evaluation.md +373 -0
  37. package/research/dogfood/strategy-a/a-synthesis.md +184 -0
  38. package/research/dogfood/strategy-a/a1-docs.md +226 -0
  39. package/research/dogfood/strategy-a/a2-amorphic.md +156 -0
  40. package/research/dogfood/strategy-a/a3-llm.md +164 -0
  41. package/research/dogfood/strategy-b/b-synthesis.md +228 -0
  42. package/research/dogfood/strategy-b/b1-architecture.md +207 -0
  43. package/research/dogfood/strategy-b/b2-gaps.md +258 -0
  44. package/research/dogfood/strategy-b/b3-workflows.md +250 -0
  45. package/research/dogfood/strategy-c/c-synthesis.md +451 -0
  46. package/research/dogfood/strategy-c/c1-explorer.md +192 -0
  47. package/research/dogfood/strategy-c/c2-diver-memory.md +145 -0
  48. package/research/dogfood/strategy-c/c3-diver-control.md +148 -0
  49. package/research/dogfood/strategy-c/c4-diver-failure.md +151 -0
  50. package/research/dogfood/strategy-c/c5-diver-execution.md +221 -0
  51. package/research/dogfood/strategy-c/c6-diver-org.md +221 -0
  52. package/research/effect-cli-error-handling.md +845 -0
  53. package/research/effect-errors-as-values.md +943 -0
  54. package/research/errors-task-analysis/00-consolidated-tasks.md +207 -0
  55. package/research/errors-task-analysis/cli-commands-analysis.md +909 -0
  56. package/research/errors-task-analysis/embeddings-analysis.md +709 -0
  57. package/research/errors-task-analysis/index-search-analysis.md +812 -0
  58. package/research/mdcontext-error-analysis.md +521 -0
  59. package/research/npm_publish/011-npm-workflow-research-agent2.md +792 -0
  60. package/research/npm_publish/012-npm-workflow-research-agent1.md +530 -0
  61. package/research/npm_publish/013-npm-workflow-research-agent3.md +722 -0
  62. package/research/npm_publish/014-npm-workflow-synthesis.md +556 -0
  63. package/research/npm_publish/031-npm-workflow-task-analysis.md +134 -0
  64. package/research/semantic-search/002-research-embedding-models.md +490 -0
  65. package/research/semantic-search/003-research-rag-alternatives.md +523 -0
  66. package/research/semantic-search/004-research-vector-search.md +841 -0
  67. package/research/semantic-search/032-research-semantic-search.md +427 -0
  68. package/research/task-management-2026/00-synthesis-recommendations.md +295 -0
  69. package/research/task-management-2026/01-ai-workflow-tools.md +416 -0
  70. package/research/task-management-2026/02-agent-framework-patterns.md +476 -0
  71. package/research/task-management-2026/03-lightweight-file-based.md +567 -0
  72. package/research/task-management-2026/04-established-tools-ai-features.md +541 -0
  73. package/research/task-management-2026/linear/01-core-features-workflow.md +771 -0
  74. package/research/task-management-2026/linear/02-api-integrations.md +930 -0
  75. package/research/task-management-2026/linear/03-ai-features.md +368 -0
  76. package/research/task-management-2026/linear/04-pricing-setup.md +205 -0
  77. package/research/task-management-2026/linear/05-usage-patterns-best-practices.md +605 -0
  78. package/scripts/rebuild-hnswlib.js +63 -0
  79. package/src/cli/argv-preprocessor.test.ts +210 -0
  80. package/src/cli/argv-preprocessor.ts +202 -0
  81. package/src/cli/cli.test.ts +430 -0
  82. package/src/cli/commands/backlinks.ts +54 -0
  83. package/src/cli/commands/context.ts +197 -0
  84. package/src/cli/commands/index-cmd.ts +300 -0
  85. package/src/cli/commands/index.ts +13 -0
  86. package/src/cli/commands/links.ts +52 -0
  87. package/src/cli/commands/search.ts +451 -0
  88. package/src/cli/commands/stats.ts +146 -0
  89. package/src/cli/commands/tree.ts +107 -0
  90. package/src/cli/flag-schemas.ts +275 -0
  91. package/src/cli/help.ts +386 -0
  92. package/src/cli/index.ts +9 -0
  93. package/src/cli/main.ts +145 -0
  94. package/src/cli/options.ts +31 -0
  95. package/src/cli/typo-suggester.test.ts +105 -0
  96. package/src/cli/typo-suggester.ts +130 -0
  97. package/src/cli/utils.ts +126 -0
  98. package/src/core/index.ts +1 -0
  99. package/src/core/types.ts +140 -0
  100. package/src/embeddings/index.ts +8 -0
  101. package/src/embeddings/openai-provider.ts +165 -0
  102. package/src/embeddings/semantic-search.ts +583 -0
  103. package/src/embeddings/types.ts +82 -0
  104. package/src/embeddings/vector-store.ts +299 -0
  105. package/src/index/index.ts +4 -0
  106. package/src/index/indexer.ts +446 -0
  107. package/src/index/storage.ts +196 -0
  108. package/src/index/types.ts +109 -0
  109. package/src/index/watcher.ts +131 -0
  110. package/src/index.ts +8 -0
  111. package/src/mcp/server.ts +483 -0
  112. package/src/parser/index.ts +1 -0
  113. package/src/parser/parser.test.ts +291 -0
  114. package/src/parser/parser.ts +395 -0
  115. package/src/parser/section-filter.ts +270 -0
  116. package/src/search/query-parser.test.ts +260 -0
  117. package/src/search/query-parser.ts +319 -0
  118. package/src/search/searcher.test.ts +182 -0
  119. package/src/search/searcher.ts +602 -0
  120. package/src/summarize/budget-bugs.test.ts +620 -0
  121. package/src/summarize/formatters.ts +419 -0
  122. package/src/summarize/index.ts +20 -0
  123. package/src/summarize/summarizer.test.ts +275 -0
  124. package/src/summarize/summarizer.ts +528 -0
  125. package/src/summarize/verify-bugs.test.ts +238 -0
  126. package/src/utils/index.ts +1 -0
  127. package/src/utils/tokens.test.ts +142 -0
  128. package/src/utils/tokens.ts +186 -0
  129. package/tests/fixtures/cli/.mdcontext/config.json +8 -0
  130. package/tests/fixtures/cli/.mdcontext/indexes/documents.json +33 -0
  131. package/tests/fixtures/cli/.mdcontext/indexes/links.json +12 -0
  132. package/tests/fixtures/cli/.mdcontext/indexes/sections.json +233 -0
  133. package/tests/fixtures/cli/.mdcontext/vectors.bin +0 -0
  134. package/tests/fixtures/cli/.mdcontext/vectors.meta.json +1264 -0
  135. package/tests/fixtures/cli/README.md +9 -0
  136. package/tests/fixtures/cli/api-reference.md +11 -0
  137. package/tests/fixtures/cli/getting-started.md +11 -0
  138. package/tsconfig.json +26 -0
  139. package/vitest.config.ts +21 -0
  140. package/vitest.setup.ts +12 -0
@@ -0,0 +1,164 @@
1
+ # Report: A3 - LLM Chat Analyst
2
+
3
+ ## Mission
4
+
5
+ Extract additional feedback and ideas from docs.llm/ (ignoring vector7/)
6
+
7
+ ## Command Log
8
+
9
+ | # | Command | Purpose | Result | Useful? |
10
+ | --- | ----------------------------------------------------------------------- | ---------------------- | ------------------------------------------------ | ----------- |
11
+ | 1 | `mdcontext --help` | Learn tool | Showed commands, options, examples | Yes |
12
+ | 2 | `mdcontext tree docs.llm/` | List all files | Found 3 files: amorphic.md, feedback.md, spec.md | Yes |
13
+ | 3 | `mdcontext index docs.llm/ --force` | Index directory | Ran in background | Yes |
14
+ | 4 | `mdcontext stats docs.llm/` | Check index stats | 3 docs, 89K tokens, 448 sections | Very useful |
15
+ | 5 | `mdcontext tree docs.llm/feedback.md` | Show outline | Revealed section structure (11K tokens) | Yes |
16
+ | 6 | `mdcontext tree docs.llm/amorphic.md` | Show outline | Revealed section structure (21K tokens) | Yes |
17
+ | 7 | `mdcontext tree docs.llm/spec.md` | Show outline | Revealed section structure (56K tokens) | Yes |
18
+ | 8 | `mdcontext search "feedback" docs.llm/` | Find feedback content | 10 results with context | Yes |
19
+ | 9 | `mdcontext search "ideas" docs.llm/` | Find ideas | 6 results | Moderate |
20
+ | 10 | `mdcontext search "suggestion OR recommend" docs.llm/` | Find suggestions | 10 results | Moderate |
21
+ | 11 | `mdcontext search "problem OR challenge OR limitation" docs.llm/` | Find problems | 10 results, critical themes | Yes |
22
+ | 12 | `mdcontext search "innovation OR future OR vision" docs.llm/` | Find future vision | 10 results | Yes |
23
+ | 13 | `mdcontext search "improve OR enhancement OR better" docs.llm/` | Find improvements | 10 results | Yes |
24
+ | 14 | `mdcontext context docs.llm/feedback.md -t 3000` | Summarize feedback.md | Got condensed overview (16% of file) | Yes |
25
+ | 15 | `mdcontext context docs.llm/amorphic.md --section "Research Frontiers"` | Get research section | Clean targeted content | Very useful |
26
+ | 16 | `mdcontext context docs.llm/amorphic.md --section "Open Questions"` | Get open questions | 3 key questions | Very useful |
27
+ | 17 | `mdcontext search "trust OR security OR privacy"` | Find trust themes | 10 results | Yes |
28
+ | 18 | `mdcontext context docs.llm/feedback.md --section "The Core Thesis"` | Get core thesis | Key product vision | Very useful |
29
+ | 19 | `mdcontext search "agent OR autonomous OR automation"` | Find automation themes | 10 results, deep content | Yes |
30
+ | 20 | `mdcontext context ... --section "The Paradox of Automation"` | Get paradox section | Insightful content | Very useful |
31
+ | 21 | `mdcontext context ... --section "From Platform to Paradigm"` | Get paradigm section | Rich philosophical content | Very useful |
32
+ | 22 | `mdcontext search "workflow OR process OR pattern"` | Find workflow themes | 10 results | Yes |
33
+ | 23 | `mdcontext context ... --section "What Problem This Solves"` | Get problem statement | Clear product definition | Very useful |
34
+ | 24 | `mdcontext context ... --section "Key Differentiators"` | Get differentiators | Comparison table | Yes |
35
+ | 25 | `mdcontext search "memory OR knowledge OR context"` | Find memory themes | 10 results | Yes |
36
+ | 26 | `mdcontext context ... --section "Geometric Memory Architecture"` | Get memory arch | Core innovation | Very useful |
37
+ | 27 | `mdcontext context ... --section "Anti-Patterns"` | Get anti-patterns | Clear guardrails | Yes |
38
+ | 28 | `mdcontext search "user OR experience OR usability"` | Find UX themes | 10 results | Moderate |
39
+ | 29 | `mdcontext context ... --section "The Authority Gradient"` | Get authority section | Key innovation | Very useful |
40
+ | 30 | `mdcontext search "scale OR enterprise OR organization"` | Find scale themes | 10 results | Moderate |
41
+ | 31 | `mdcontext context ... --section "Philosophy"` | Get philosophy | Core principles | Yes |
42
+ | 32 | `mdcontext context ... --section "Designing for Emergence"` | Get emergence section | Design principles | Very useful |
43
+
44
+ ## Findings
45
+
46
+ ### Key Discoveries
47
+
48
+ 1. **docs.llm/ contains LLM-generated product vision documents** - These are AI-written explorations of a "HumanWork" (also called "Amorphic") platform for human-AI collaboration orchestration.
49
+
50
+ 2. **Core Product Vision: "Operating System for Work"** - The documents describe a system that bridges human creativity with AI execution, using immutable event ledgers and role-based abstractions.
51
+
52
+ 3. **Three-Layer Memory Architecture** - Event (immutable facts), Status (derived views), Semantic (AI-assisted understanding) - a sophisticated memory model for persistent AI collaboration.
53
+
54
+ 4. **Explicit Anti-Automation Stance** - The documents articulate why pure automation fails for knowledge work and propose "choreography" as an alternative paradigm.
55
+
56
+ 5. **Research-Grade Thinking** - Includes open questions, research frontiers, and mature consideration of organizational transformation.
57
+
58
+ ### Relevant Quotes/Sections Found
59
+
60
+ > "Enterprise adoption of autonomous agents has stalled due to **Opacity** (we don't know how the agent works) and **Risk** (we can't trust it to run unsupervised)."
61
+ > Source: feedback.md, "The Core Thesis"
62
+
63
+ > "In order for AI to automate humans, we must first orchestrate humans. HumanWork is that orchestration layer."
64
+ > Source: feedback.md, "The Core Thesis"
65
+
66
+ > "The most sophisticated choreographed intelligence systems often appear less automated than simpler ones."
67
+ > Source: amorphic.md, "The Paradox of Automation"
68
+
69
+ > "How do we ensure HumanWork organizations remain aligned with human values as they become more autonomous?"
70
+ > Source: amorphic.md, "Open Questions"
71
+
72
+ > "Memory is not: Chat logs, Raw execution logs, Agent recall, Hidden embeddings. Memory is: Immutable facts, Derived views, Shared context, Inspectable truth."
73
+ > Source: spec.md, "Memory Philosophy"
74
+
75
+ > "The system helps humans think better - it never decides for them."
76
+ > Source: spec.md, "Philosophy"
77
+
78
+ > "What emerges isn't just a more capable system - it's a new category of intelligence entirely."
79
+ > Source: feedback.md, "From Platform to Paradigm"
80
+
81
+ ### Themes Identified
82
+
83
+ 1. **Human-AI Choreography** - Reframing "automation" as "collaboration" with explicit authority gradients and intervention points
84
+
85
+ 2. **Memory as Infrastructure** - Treating organizational memory as first-class infrastructure with event sourcing and immutable ledgers
86
+
87
+ 3. **Trust Through Transparency** - Building trust via observable behavior rather than assumed reliability
88
+
89
+ 4. **Emergent Intelligence** - Designing systems that grow and learn rather than being fully specified upfront
90
+
91
+ 5. **Anti-Patterns as Guardrails** - Explicit documentation of what NOT to do (e.g., never hide events, never let workflows execute directly)
92
+
93
+ 6. **Workspace Metaphor** - Cognitive workspaces that maintain context across sessions and enable parallel exploration
94
+
95
+ 7. **Role Abstraction** - Separating the "what" of work from "who does it" (human vs AI actor)
96
+
97
+ 8. **Recursive Improvement** - Meta-learning where the collaboration system itself improves at collaboration
98
+
99
+ ## Tool Evaluation
100
+
101
+ ### What Worked Well
102
+
103
+ - **Section-targeted context** (`--section` flag) was extremely effective for extracting specific topics from large files
104
+ - **Boolean search operators** (OR, AND) enabled comprehensive theme discovery
105
+ - **Tree command with outlines** provided excellent navigation for 56K+ token documents
106
+ - **Token budget controls** (`-t` flag) allowed reasonable context extraction
107
+ - **Search with line context** showed surrounding text, making results interpretable
108
+ - **Stats command** gave immediate understanding of corpus size and complexity
109
+
110
+ ### What Was Frustrating
111
+
112
+ - **No embeddings by default** - Every search reminded me to run `--embed` but I stayed with keyword search
113
+ - **10 result limit** on searches - Could not see full scope of matches without multiple queries
114
+ - **Boolean search required explicit operators** - Had to use "OR" not natural language
115
+ - **Context command truncation** - The 16% of feedback.md felt limiting for a summary
116
+ - **No way to exclude sections** - Could not say "give me everything EXCEPT this section"
117
+
118
+ ### What Was Missing
119
+
120
+ - **Cross-file semantic search** - Would have been useful to find themes across all 3 files at once
121
+ - **Relevance ranking** - Results came in document order, not relevance order
122
+ - **Summary generation** - Would appreciate AI-generated summaries of search results
123
+ - **Export to structured format** - No easy way to extract findings programmatically
124
+ - **Highlight/annotation** - Cannot mark sections for later reference
125
+
126
+ ### Confidence Level
127
+
128
+ [X] High - The tool gave me reliable, reproducible access to the content. Boolean searches and section targeting let me systematically explore 89K tokens of dense material.
129
+
130
+ ### Would Use Again? (1-5)
131
+
132
+ **4** - Strong tool for navigating large markdown corpora. The section-targeting feature is genuinely valuable for long documents. Would score 5 if semantic search were enabled by default and result limits were configurable. The approach of "tree to understand structure, search to find content, context to extract" is a solid workflow.
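
The "tree, search, context" workflow depends on explicit `OR` queries and section-targeted extraction. A minimal TypeScript sketch of assembling those command lines is shown here. `buildSearchCommand` and `buildSectionCommand` are hypothetical helpers, not part of mdcontext; they only reproduce command shapes that already appear in the command log.

```typescript
// Hypothetical helpers (not part of mdcontext): they only assemble command
// strings in the shapes recorded in the command log of this report.

/** Join theme keywords with explicit OR operators, as in commands 10-13. */
function buildSearchCommand(terms: string[], path: string): string {
  if (terms.length === 0) {
    throw new Error("at least one search term is required");
  }
  const query = terms.join(" OR ");
  return `mdcontext search "${query}" ${path}`;
}

/** Build a section-targeted context command, as in commands 15-16. */
function buildSectionCommand(file: string, section: string): string {
  return `mdcontext context ${file} --section "${section}"`;
}

console.log(buildSearchCommand(["problem", "challenge", "limitation"], "docs.llm/"));
console.log(buildSectionCommand("docs.llm/amorphic.md", "Open Questions"));
```

Generating the queries from a keyword list keeps theme sweeps consistent across runs, which matters because natural-language phrasing did not work in place of explicit operators.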
133
+
134
+ ## Time & Efficiency
135
+
136
+ - Commands run: 32
137
+ - Compared to reading all files: **Much less** - 89K tokens would take significant time to read raw. The tool allowed surgical extraction of relevant content in under 15 minutes of exploration.
138
+
139
+ ## Additional Observations
140
+
141
+ ### These are High-Quality Design Documents
142
+
143
+ The docs.llm/ folder contains what appears to be a Claude-generated (or Claude-assisted) exploration of a complex product vision. The thinking is sophisticated, covering:
144
+
145
+ - Architectural patterns
146
+ - Organizational transformation
147
+ - Research frontiers
148
+ - Anti-patterns and guardrails
149
+ - Philosophy and design principles
150
+
151
+ ### Feedback for the Product (HumanWork/Amorphic)
152
+
153
+ From the content I extracted, the documents themselves surface several self-reflective questions:
154
+
155
+ 1. How to prevent "organizational capture" where systems optimize for self-perpetuation
156
+ 2. How to maintain human agency while leveraging AI efficiency
157
+ 3. How to ensure value alignment as systems become more autonomous
158
+ 4. How to measure success in "hybrid organizations"
159
+
160
+ These open questions represent mature product thinking and should inform roadmap priorities.
161
+
162
+ ### The "Memory Is All You Need" Parallel
163
+
164
+ The feedback.md document draws an explicit parallel between the Transformer "Attention Is All You Need" breakthrough and its proposed "Memory Is All You Need" architecture. This is bold positioning that frames the product as a paradigm shift rather than an incremental improvement.
@@ -0,0 +1,228 @@
1
+ # B-Synth: Strategy B Synthesis
2
+
3
+ ## Executive Summary
4
+
5
+ The three Strategy B agents collectively identified a mature, self-aware specification that thoroughly documents what NOT to do (anti-patterns, invariants) but has significant gaps in terminology alignment, implementation guidance, and philosophical framing. The most critical finding is the HumanWork-Evolution.md document, which already synthesizes feedback into a phased improvement plan; agents B1-B3 largely validated and expanded on this existing gap analysis.
6
+
7
+ ## Cross-Agent Patterns
8
+
9
+ **Theme 1: Semantic Search Underperformance**
10
+ All three agents found semantic search unreliable for multi-word conceptual queries. All fell back to keyword search frequently. This is the strongest cross-agent signal about the mdcontext tool.
11
+
12
+ **Theme 2: HumanWork-Evolution.md as Critical Source**
13
+ Both B1 and B2 independently discovered this document as the authoritative source for gaps and critiques, validating its importance.
14
+
15
+ **Theme 3: Human-First Philosophy with Acknowledged Tensions**
16
+ All agents found that the docs emphasize human control, but B2 identified a philosophical gap: the spec frames human control as an end state rather than a transition phase toward "intelligence crystallization."
17
+
18
+ **Theme 4: Section-Level Context Extraction Praised**
19
+ All three agents highlighted the `--section` flag as highly effective for targeted retrieval.
20
+
21
+ **Theme 5: Checkpoint/Intervention Architecture**
22
+ B1 found checkpoints in anti-patterns, B3 found them in workflow design - the spec heavily emphasizes checkpoints as the key governance mechanism.
23
+
24
+ ## Consolidated Findings
25
+
26
+ ### Architecture Criticisms (from B1)
27
+
28
+ **External Criticisms (of traditional approaches):**
29
+
30
+ - Brittleness of pure automation (combinatorial explosion of rules)
31
+ - Coordination Trap (multiplies human translation work)
32
+ - Innovation Strangulation (automation-incompatible approaches avoided)
33
+ - Judgment Gap (80% flawless, 20% chaos)
34
+ - Context Collapse (context as configuration, not conversation)
35
+ - Observability Problem (black-box agents kill trust)
36
+
37
+ **Self-Imposed Constraints (internal guardrails):**
38
+
39
+ - 8 Architectural Invariants (no hidden state, no irreversible execution, etc.)
40
+ - 7 Memory Model Anti-Patterns
41
+ - 8 Workflow Anti-Patterns
42
+
43
+ **Open Questions (acknowledged gaps):**
44
+
45
+ - Alignment with human values at scale
46
+ - Limits of organizational intelligence
47
+ - Preventing organizational capture (self-perpetuation)
48
+
49
+ ### Gaps Identified (from B2)
50
+
51
+ **Terminology Gaps:**
52
+
53
+ - Agent -> Actor (unified human/machine)
54
+ - Artifact -> Deliverable (business language)
55
+ - Event Memory -> The Ledger (IP capture emphasis)
56
+
57
+ **Missing Primitives:**
58
+
59
+ - Correction Event (captures human intelligence on modifications)
60
+ - Authority Gradient (replaces binary control)
61
+ - Pattern Crystallization (organizational learning mechanism)
62
+
63
+ **Architectural Gaps:**
64
+
65
+ - No geometric/semantic embeddings in Semantic Memory
66
+ - Cost model doesn't unify human hours and AI tokens
67
+ - Privacy model is "policy overlay" only
68
+ - No formal API specification
69
+
70
+ **Philosophical Gap:**
71
+
72
+ - Spec positions "human control" as goal
73
+ - Feedback suggests reframing as "intelligence extraction"
74
+ - Human corrections should become portable organizational intelligence
75
+
76
+ ### Workflow Improvements (from B3)
77
+
78
+ **Core Philosophy:**
79
+
80
+ - Workflows as "guidance without control"
81
+ - Six concepts: Entry Signals, Roles, Phases, Activities, Checkpoints, Exit Conditions
82
+ - Checkpoints as primary governance mechanism
83
+
84
+ **Authority Gradient (4 modes):**
85
+
86
+ 1. Instructional: Step-by-step human instructions
87
+ 2. Consultative: Human defines goal, agent proposes
88
+ 3. Supervisory: Agents execute, humans monitor
89
+ 4. Exploratory: Alternating generation/testing
90
+
91
+ **Intervention Points:**
92
+
93
+ - Redirect, Override, Inject, Escalate
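
The four authority modes and four intervention points listed above can be sketched as TypeScript union types. This is an illustrative assumption only: the spec under review describes these concepts in prose, and the identifiers `AuthorityMode`, `Intervention`, `MODE_DESCRIPTIONS`, and `isIntervention` are hypothetical names introduced here.

```typescript
// Assumption: names and shapes are illustrative. The reviewed spec defines
// these concepts in prose, not as code.

type AuthorityMode = "instructional" | "consultative" | "supervisory" | "exploratory";
type Intervention = "redirect" | "override" | "inject" | "escalate";

// Descriptions are copied from the four modes listed in this synthesis.
const MODE_DESCRIPTIONS: Record<AuthorityMode, string> = {
  instructional: "Step-by-step human instructions",
  consultative: "Human defines goal, agent proposes",
  supervisory: "Agents execute, humans monitor",
  exploratory: "Alternating generation/testing",
};

const INTERVENTIONS: readonly Intervention[] = ["redirect", "override", "inject", "escalate"];

/** Type guard: narrow an arbitrary string to a known intervention point. */
function isIntervention(value: string): value is Intervention {
  return (INTERVENTIONS as readonly string[]).indexOf(value) !== -1;
}

console.log(MODE_DESCRIPTIONS.supervisory);
console.log(isIntervention("override"));
```

Modeling the gradient as a closed union rather than a boolean flag reflects the "replaces binary control" framing from the B2 gap list.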
94
+
95
+ **Key Patterns:**
96
+
97
+ - Time Travel and Branching
98
+ - Parallel Exploration
99
+ - Immutable Workflow Versioning
100
+
101
+ **Organizational Transformation:**
102
+
103
+ - Choreographic Maturity Model (4 levels)
104
+ - Cultural shifts toward experimental mindsets
105
+
106
+ ## Proposed Spec Changes (Prioritized)
107
+
108
+ ### High Priority
109
+
110
+ - [ ] Rename Artifact -> Deliverable throughout (B2)
111
+ - [ ] Add Correction Event primitive (B2) - captures IP when humans modify outputs
112
+ - [ ] Add Authority Gradient to Execution Model (B2, B3) - instructional/consultative/supervisory/exploratory
113
+ - [ ] Expand Judgment Gap (80/20 problem) handling beyond "humans intervene" (B1)
114
+ - [ ] Add "Known Limitations and Trade-offs" section (B1) - what HumanWork sacrifices
115
+ - [ ] Unify cost model for Human + Machine Actors (B2)
116
+
117
+ ### Medium Priority
118
+
119
+ - [ ] Add Actor primitive with type: Human | Machine (B2)
120
+ - [ ] Add Pattern Crystallization to Memory Model (B2)
121
+ - [ ] Rename Event Memory -> The Ledger (B2)
122
+ - [ ] Add cognitive telemetry to Checkpoints (B2) - deliberation_duration, confidence_signal, modification_depth
123
+ - [ ] Document concrete answers to Open Questions or mark as research priorities (B1)
124
+ - [ ] Create decision framework: when DAG-style execution IS appropriate (B1)
125
+ - [ ] Add explicit checkpoint requirements for high-stakes workflows (B3)
126
+ - [ ] Define minimum intervention points per workflow phase (B3)
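
Two of the proposals above, the Actor primitive and cognitive telemetry on Checkpoints, can be sketched as TypeScript interfaces. Only the names `Actor`, `Human | Machine`, `deliberation_duration`, `confidence_signal`, and `modification_depth` come from the proposals; the field types, units, the `id` field, `CheckpointRecord`, and `humanCheckpoints` are assumptions made for illustration.

```typescript
// Assumptions: field types and units are not given in the source. Only the
// names Actor (Human | Machine), deliberation_duration, confidence_signal,
// and modification_depth come from the proposed spec changes.

type ActorType = "Human" | "Machine";

interface Actor {
  id: string; // hypothetical identifier
  type: ActorType;
}

interface CheckpointTelemetry {
  deliberation_duration: number; // assumed unit: seconds
  confidence_signal: number; // assumed range: 0..1
  modification_depth: number; // assumed: number of edits made to the output
}

interface CheckpointRecord {
  actor: Actor;
  telemetry: CheckpointTelemetry;
}

/** Keep only checkpoints that were decided by human actors. */
function humanCheckpoints(records: CheckpointRecord[]): CheckpointRecord[] {
  return records.filter((r) => r.actor.type === "Human");
}

const sample: CheckpointRecord[] = [
  { actor: { id: "a1", type: "Human" }, telemetry: { deliberation_duration: 42, confidence_signal: 0.8, modification_depth: 2 } },
  { actor: { id: "a2", type: "Machine" }, telemetry: { deliberation_duration: 1, confidence_signal: 0.5, modification_depth: 0 } },
];
console.log(humanCheckpoints(sample).length);
```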
127
+
128
+ ### Low Priority
129
+
130
+ - [ ] Enhance Semantic Memory with geometric embeddings (B2)
131
+ - [ ] Add detection guidance for when Status Memory becomes authoritative (B1)
132
+ - [ ] Reframe "human control" as transition phase, not end state (B2)
133
+ - [ ] Adopt choreography language over orchestration (B2)
134
+ - [ ] Develop privacy model beyond "policy overlay" (B2)
135
+ - [ ] Create formal API specification (B2)
136
+ - [ ] Establish choreographic maturity assessment framework (B3)
137
+ - [ ] Create signals taxonomy (activity, outcome, attention, health) (B3)
138
+
139
+ ## Tool Evaluation Synthesis
140
+
141
+ All three agents used the mdcontext tool extensively (38, 35, and 41 commands respectively = 114 total commands). Their assessments were remarkably consistent.
142
+
143
+ ### Common Praise
144
+
145
+ - **Section-level context extraction** (`--section`) was universally praised as highly effective
146
+ - **Keyword search** was a reliable and essential fallback
147
+ - **Token budget control** (`-t`) helped manage context size
148
+ - **Tree command** gave quick corpus overview
149
+ - **Fast embedding indexing** (~$0.003 cost)
150
+ - **Stats command** was useful for understanding corpus size
151
+
152
+ ### Common Frustrations
153
+
154
+ - **Semantic search returned 0 results** for multi-word conceptual queries (all 3 agents)
155
+ - **Token truncation** without clear indication of what was excluded
156
+ - **No way to chain or aggregate searches** - had to run many separate commands
157
+ - **Multi-word keyword searches failed** (e.g., "issue challenge gap" = 0 results)
158
+ - **False positives** in keyword search
159
+ - **No semantic search threshold adjustment**
160
+
161
+ ### Suggested Improvements
162
+
163
+ - Add fuzzy/stemmed search (fail vs failure)
164
+ - Add "search within results" / progressive refinement
165
+ - Add context around keyword matches without re-running
166
+ - Add combined semantic+keyword hybrid mode
167
+ - Add cross-document synthesis
168
+ - Add batch context extraction for multiple sections/files
169
+ - Add "related sections" feature
170
+ - Add Boolean operators in keyword mode
171
+ - Add export/save functionality
172
+ - Add "what's undefined" query (terms used but not defined)
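
The first suggestion, stemmed search so that "fail" also matches "failure", can be illustrated with a deliberately naive suffix-stripping sketch in TypeScript. This is not how mdcontext searches and it is much cruder than a real stemming algorithm; `naiveStem`, `stemMatches`, and the suffix list are hypothetical and chosen only to demonstrate the idea.

```typescript
// Assumption: a deliberately naive suffix-stripping stemmer for illustration.
// It is not mdcontext's behavior and is far cruder than a real stemmer.

const SUFFIXES = ["ures", "ure", "ing", "ed", "s"];

/** Lowercase a word and strip the first matching suffix, if any. */
function naiveStem(word: string): string {
  const lower = word.toLowerCase();
  for (const suffix of SUFFIXES) {
    const stemLength = lower.length - suffix.length;
    // Require a stem of at least three characters to avoid over-stripping.
    if (stemLength >= 3 && lower.slice(stemLength) === suffix) {
      return lower.slice(0, stemLength);
    }
  }
  return lower;
}

/** True when any token in the text shares a stem with the query word. */
function stemMatches(query: string, text: string): boolean {
  const target = naiveStem(query);
  return text
    .split(/\W+/)
    .some((token) => token.length > 0 && naiveStem(token) === target);
}

console.log(stemMatches("fail", "The failure of pure automation"));
```

A matcher of this kind trades precision for recall, so it would likely add to the false positives already reported for keyword search unless combined with ranking.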
173
+
174
+ ### Quantitative Summary
175
+
176
+ | Agent | Commands | Confidence | Rating |
177
+ | ----- | -------- | ---------- | ------ |
178
+ | B1 | 38 | Medium | 4/5 |
179
+ | B2 | 35 | High | 4/5 |
180
+ | B3 | 41 | High | 4/5 |
181
+
182
+ All agents rated the tool 4/5 and found it significantly faster than reading all files manually.
183
+
184
+ ## Methodology Assessment
185
+
186
+ How well did Strategy B (divide by question) work?
187
+
188
+ ### Strengths
189
+
190
+ - **Clear scope boundaries**: Each agent had a focused research question, avoiding overlap
191
+ - **Efficient parallelization**: Three agents could work simultaneously on different questions
192
+ - **Natural synthesis path**: Findings from each question type combined naturally into a coherent picture
193
+ - **Reduced redundancy**: Agents didn't repeat the same searches (unlike Strategy A's file-based division)
194
+ - **Comprehensive coverage**: Architecture + Gaps + Workflows covers the spec from multiple angles
195
+ - **Discovery of key document**: Multiple agents independently found HumanWork-Evolution.md, validating its importance
196
+
197
+ ### Weaknesses
198
+
199
+ - **Question boundaries can be fuzzy**: "Architecture criticisms" vs "gaps" had some overlap (e.g., observability problem)
200
+ - **Dependent insights split**: Authority Gradient appeared in both B2 (as gap) and B3 (as workflow improvement)
201
+ - **No shared discovery context**: B2 found HumanWork-Evolution.md which would have helped B1's research
202
+ - **Variable scope difficulty**: Some questions (workflows) were more expansive than others (architecture criticisms)
203
+
204
+ ### Would Recommend For
205
+
206
+ - **Documentation analysis** where questions naturally partition the content
207
+ - **Due diligence reviews** (legal, technical, financial angles)
208
+ - **Research synthesis** where multiple perspectives on the same corpus are needed
209
+ - **Gap analysis** where "what exists" vs "what's missing" are distinct questions
210
+ - **Any task where questions are more natural than file divisions**
211
+
212
+ ### Not Recommended For
213
+
214
+ - **Code review** (files matter more than questions)
215
+ - **Tasks where answers span all questions** (high synthesis overhead)
216
+ - **Simple/small corpora** (parallelization overhead not worth it)
217
+
218
+ ## Appendix: Agent Command Efficiency
219
+
220
+ | Metric | B1 | B2 | B3 | Total |
221
+ | ------------------- | --- | --- | --- | ----- |
222
+ | Commands run | 38 | 35 | 41 | 114 |
223
+ | Semantic searches | 8 | 4 | 12 | 24 |
224
+ | Keyword searches | 22 | 23 | 0 | 45 |
225
+ | Context extractions | 13 | 9 | 19 | 41 |
226
+ | Tree/Stats/Index | 3 | 3 | 3 | 9 |
227
+
228
+ **Key observation**: B3 (workflows) used semantic search exclusively (12 semantic searches, no keyword searches) and found it more effective for its domain. B1 and B2 relied heavily on keyword search after semantic search failed. This suggests semantic search may work better for concrete concepts (workflows, collaboration) than for abstract critiques (gaps, criticisms).
@@ -0,0 +1,207 @@
1
+ # Report: B1 - Architecture Critic Hunter
2
+
3
+ ## Mission
4
+
5
+ Find architecture and design criticisms across all documentation
6
+
7
+ ## Research Question
8
+
9
+ What architecture and design criticisms exist?
10
+
11
+ ## Command Log
12
+
13
+ | # | Command | Purpose | Result | Useful? |
14
+ | --- | ------------------------------------------------------------------------------------------------------------- | ------------------------------- | ------------------------------------- | ------- |
15
+ | 1 | `mdcontext --help` | Learn tool | Got full usage guide | Yes |
16
+ | 2 | `mdcontext index --embed --force` | Index all files with embeddings | 23 docs, 922 sections, 904 embeddings | Yes |
17
+ | 3 | `mdcontext search "architecture criticism problems design flaws limitations"` | Semantic search for criticisms | 1 result (ARCHITECTURAL_FOUNDATIONS) | Partial |
18
+ | 4 | `mdcontext search "design trade-offs weaknesses concerns issues"` | Semantic search | 0 results | No |
19
+ | 5 | `mdcontext search "failure problems complexity challenges"` | Semantic search | 0 results | No |
20
+ | 6 | `mdcontext search "failure" --mode keyword` | Keyword search | 10 results | Yes |
21
+ | 7 | `mdcontext context docs.amorphic/02-THE_FAILURE_OF_PURE_AUTOMATION.md -t 3000` | Get full failure analysis | Full document | Yes |
22
+ | 8 | `mdcontext search "limitations" --mode keyword` | Keyword search | 1 result | Yes |
23
+ | 9 | `mdcontext search "problem" --mode keyword` | Keyword search | 10 results | Yes |
24
+ | 10 | `mdcontext search "risk" --mode keyword` | Keyword search | 10 results | Yes |
25
+ | 11 | `mdcontext search "anti-pattern" --mode keyword` | Keyword search | 4 results | Yes |
26
+ | 12 | `mdcontext context docs/05-MEMORY_MODEL.md --section "Anti-Patterns"` | Get memory anti-patterns | 7 forbidden patterns | Yes |
27
+ | 13 | `mdcontext context docs/06-WORKFLOWS.md --section "Anti-Patterns"` | Get workflow anti-patterns | 8 forbidden patterns | Yes |
28
+ | 14 | `mdcontext search "concern" --mode keyword` | Keyword search | 10 results | Yes |
29
+ | 15 | `mdcontext search "brittle" --mode keyword` | Keyword search | 10 results | Yes |
30
+ | 16 | `mdcontext search "complexity" --mode keyword` | Keyword search | 9 results | Yes |
31
+ | 17 | `mdcontext search "overhead" --mode keyword` | Keyword search | 7 results | Yes |
32
+ | 18 | `mdcontext search "design patterns architecture decision"` | Semantic search | 10 results | Yes |
33
+ | 19 | `mdcontext context docs.amorphic/03-ARCHITECTURAL_FOUNDATIONS.md -t 2000` | Get architectural foundations | Full document | Yes |
34
+ | 20 | `mdcontext context docs.amorphic/05-TECHNICAL_IMPLEMENTATION_PATTERNS.md -t 2500` | Get implementation patterns | Full document | Yes |
35
+ | 21 | `mdcontext search "gap" --mode keyword` | Keyword search | 8 results | Yes |
36
+ | 22 | `mdcontext context docs.amorphic/02-THE_FAILURE_OF_PURE_AUTOMATION.md --section "Judgment Gap"` | Get judgment gap section | Detailed section | Yes |
37
+ | 23 | `mdcontext search "forbidden" --mode keyword` | Keyword search | 5 results | Yes |
38
+ | 24 | `mdcontext search "corrupt" --mode keyword` | Keyword search | 5 results | Yes |
39
+ | 25 | `mdcontext tree docs/01-ARCHITECTURE.md` | Get document outline | 47 sections | Yes |
40
+ | 26 | `mdcontext context docs/01-ARCHITECTURE.md --section "Why This Architecture Works"` | Get rationale | Brief justification | Yes |
41
+ | 27 | `mdcontext context docs/01-ARCHITECTURE.md --section "Architectural Invariants"` | Get invariants | 8 invariants | Yes |
42
+ | 28 | `mdcontext search "fail" --mode keyword` | Keyword search | 10 results | Yes |
43
+ | 29 | `mdcontext context docs/00-README.md --section "What Problem"` | Get problem statement | Core problem | Yes |
44
+ | 30 | `mdcontext search "cost" --mode keyword` | Keyword search | 10 results | Yes |
45
+ | 31 | `mdcontext context docs.llm/feedback.md -t 3000` | Get feedback document | Chat feedback analysis | Yes |
46
+ | 32 | `mdcontext search "traditional" --mode keyword` | Keyword search | 10 results | Yes |
47
+ | 33 | `mdcontext tree docs.llm/amorphic.md` | Get amorphic outline | Full outline | Yes |
48
+ | 34 | `mdcontext context docs.llm/amorphic.md --section "Open Questions"` | Get open questions | 3 open questions | Yes |
49
+ | 35 | `mdcontext context docs.llm/amorphic.md --section "Paradox of Automation"` | Get paradox section | Detailed section | Yes |
50
+ | 36 | `mdcontext search "wrong" --mode keyword` | Keyword search | 6 results | Yes |
51
+ | 37 | `mdcontext context docs.amorphic/04-THE_HUMAN-AGENT_COLLABORATION_MODEL.md --section "Observability Problem"` | Get observability problem | Key issue identified | Yes |
52
+ | 38 | `mdcontext search "scale" --mode keyword` | Keyword search | 10 results | Yes |
53
+
54
+ ## Findings
55
+
56
+ ### Key Discoveries
57
+
58
+ #### 1. Criticisms of Traditional/Pure Automation (Major Theme)
59
+
60
+ The documentation extensively critiques traditional automation approaches:
61
+
62
+ - **Brittleness**: "The system becomes brittle not because any individual rule is wrong, but because the combinatorial explosion of rules creates a rigid lattice that cannot bend without breaking."
63
+ - **Coordination Trap**: Pure automation "multiplies coordination requirements by forcing human work into machine-readable formats that require constant translation and synchronization."
64
+ - **Innovation Strangulation**: "Teams avoid innovative approaches not because they're technically inferior, but because they're automation-incompatible."
65
+ - **Human Bottleneck Paradox**: Attempting to eliminate humans creates new bottlenecks in system configuration and exception handling.
66
+ - **Context Collapse**: Traditional systems treat "context as configuration rather than conversation."
67
+ - **Judgment Gap**: "Systems that handle 80% of cases flawlessly but create chaos in the remaining 20%."
68
+
69
+ #### 2. Agent System Criticisms (Self-Aware)
70
+
71
+ The documentation acknowledges problems with current agent systems:
72
+
73
+ > "Most agent systems fail at real work because they optimize for demos, single-shot tasks, and autonomous execution. They become opaque, brittle, hard to interrupt, impossible to rewind, and unsafe to scale."
74
+ > Source: docs/00-README.md
75
+
76
+ #### 3. Observability Problem
77
+
78
+ > "Most agent systems are black boxes. You send a request, wait, and get a result - with no visibility into what happened in between. When something goes wrong, you're left debugging phantom processes and mysterious failures. This opacity kills trust."
79
+ > Source: docs.amorphic/04-THE_HUMAN-AGENT_COLLABORATION_MODEL.md
80
+
81
+ #### 4. Anti-Patterns Explicitly Forbidden
82
+
83
+ **Memory Model Anti-Patterns:**
84
+
85
+ - Storing mutable state in Event Memory
86
+ - Treating Status Memory as authoritative
87
+ - Letting Semantic Memory drive execution
88
+ - Hiding Events from humans
89
+ - Creating circular dependencies between layers
90
+ - Bypassing Event Memory for "performance"
91
+ - Hard-deleting critical audit events
92
+
93
+ **Workflow Anti-Patterns:**
94
+
95
+ - Workflows that execute directly
96
+ - Workflows that mutate artifacts
97
+ - Workflows that allocate cost
98
+ - Workflows that own agents
99
+ - Hidden workflow state
100
+ - Workflows that become Turing-complete
101
+ - Mandatory workflows (at system level)
102
+ - Workflows that bypass Control Plane
103
+
104
+ #### 5. Architectural Invariants (Design Constraints)
105
+
106
+ The system explicitly maintains these constraints to avoid known issues:
107
+
108
+ - No hidden mutable state
109
+ - No irreversible execution
110
+ - No unobservable progress
111
+ - No agent-owned memory
112
+ - No loss of human authority
113
+ - No concurrent mutation of the same scope
114
+ - No execution without a Workspace
115
+ - No automatic flow from Org to Workspace
116
+
117
+ #### 6. Open Questions (Acknowledged Gaps)
118
+
119
+ > "How do we ensure HumanWork organizations remain aligned with human values as they become more autonomous?"
120
+ > "What are the limits of organizational intelligence? Are there problems that fundamentally require individual rather than collective cognition?"
121
+ > "How do we prevent organizational capture - scenarios where HumanWork systems optimize for their own perpetuation rather than their intended purposes?"
122
+ > Source: docs.llm/amorphic.md
123
+
124
+ #### 7. Substrate Problem
125
+
126
+ > "Implementation details leak into the conceptual model, making the workflow harder to reason about and modify."
127
+ > Source: docs.amorphic/03-ARCHITECTURAL_FOUNDATIONS.md
128
+
129
+ ### Relevant Quotes/Sections Found
130
+
131
+ > "Pure automation assumes complete knowledge of the problem space. It requires that all possible states, transitions, and edge cases be enumerable at design time. This works beautifully for manufacturing widgets or processing financial transactions - domains where the rules are well-understood and the exceptions are genuinely exceptional. But knowledge work exists in a different regime entirely."
132
+ > Source: docs.amorphic/02-THE_FAILURE_OF_PURE_AUTOMATION.md, The Brittleness of Complete Systems
133
+
134
+ > "The paradox emerges when pure automation, in attempting to eliminate human bottlenecks, creates new bottlenecks in the form of system configuration, exception handling, and cross-system integration."
135
+ > Source: docs.amorphic/02-THE_FAILURE_OF_PURE_AUTOMATION.md, The Human Bottleneck Paradox
136
+
137
+ > "Traditional workflow systems model execution as directed acyclic graphs (DAGs) - nodes representing tasks, edges representing dependencies. This works well for batch processing and pipeline scenarios where the structure is known in advance. But it breaks down when workflows need to adapt their structure based on runtime conditions or accumulated learning."
138
+ > Source: docs.amorphic/03-ARCHITECTURAL_FOUNDATIONS.md, Component Relationships
139
+
140
+ > "If Status Memory cannot be rebuilt, it has become a source of truth and the system is corrupted."
141
+ > Source: docs/05-MEMORY_MODEL.md, The Hard Rule
142
+
143
+ ### Answer to Research Question
144
+
145
+ **What architecture and design criticisms exist?**
146
+
147
+ The documentation contains extensive, self-aware architectural criticism organized into three categories:
148
+
149
+ 1. **Criticisms of Traditional Approaches (external):** The docs thoroughly critique pure automation, traditional workflow systems (DAGs), black-box agent systems, and context-as-configuration approaches. These criticisms justify the HumanWork design decisions.
150
+
151
+ 2. **Self-Imposed Constraints (internal guardrails):** The architecture explicitly forbids specific anti-patterns for both memory and workflows. These represent lessons learned about what NOT to do; if one appears, the system is treated as being in a "corrupted" state.
152
+
153
+ 3. **Acknowledged Open Questions (honest gaps):** The documentation admits uncertainty about alignment with human values at scale, limits of organizational intelligence, and preventing organizational capture.
154
+
155
+ The architectural philosophy is defensive - explicitly naming what can go wrong and building constraints to prevent it. The invariants and anti-patterns serve as architectural "unit tests" against known failure modes.
156
+
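One way to read the "unit tests" framing is that the Hard Rule quoted above ("If Status Memory cannot be rebuilt, it has become a source of truth") can be checked mechanically. The sketch below is a hedged illustration only. It assumes that Status Memory is a view derived from an append-only event log, and the `(task_id, new_state)` event shape and both function names are hypothetical rather than taken from the HumanWork docs.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical event shape for illustration: (task_id, new_state).
Event = Tuple[str, str]


def rebuild_status(events: Iterable[Event]) -> Dict[str, str]:
    """Fold the event log into a status view; the latest event per task wins."""
    status: Dict[str, str] = {}
    for task_id, new_state in events:
        status[task_id] = new_state
    return status


def status_is_derivable(events: Iterable[Event], stored_status: Dict[str, str]) -> bool:
    """True when the stored status equals what the events alone would produce."""
    return rebuild_status(events) == stored_status


events = [("t1", "queued"), ("t1", "running"), ("t2", "queued")]
healthy = {"t1": "running", "t2": "queued"}
drifted = {"t1": "running", "t2": "done"}  # "done" was never recorded as an event

print(status_is_derivable(events, healthy))
print(status_is_derivable(events, drifted))
```

A mismatch means the stored status carries information that no event explains, which is the situation the anti-pattern "Treating Status Memory as authoritative" warns about.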
157
+ ## Proposed Spec Changes
158
+
159
+ - [ ] Add a section on "Known Limitations and Trade-offs" to acknowledge what the HumanWork architecture sacrifices (e.g., raw execution speed for observability)
160
+ - [ ] Expand on how the Judgment Gap (80/20 problem) is specifically addressed beyond "humans intervene"
161
+ - [ ] Document concrete answers to the Open Questions or mark them as research priorities
162
+ - [ ] Add guidance on detecting when Status Memory has "become authoritative" before corruption
163
+ - [ ] Create a decision framework for when DAG-style execution IS appropriate vs. adaptive execution
164
+
165
+ ## Tool Evaluation
166
+
167
+ ### What Worked Well
168
+
169
+ - Keyword search (`--mode keyword`) was highly effective for finding specific terms like "failure", "brittle", "anti-pattern"
170
+ - Section-targeted context (`--section "X"`) efficiently extracted exactly what I needed
171
+ - The `tree` command helped understand document structure before diving in
172
+ - Embedding indexing was fast and a one-time cost
173
+ - Token budget control (`-t`) helped manage context size
174
+
175
+ ### What Was Frustrating
176
+
177
+ - Semantic search often returned 0 results for multi-word queries that should have matched
178
+ - Semantic search for "design trade-offs weaknesses concerns issues" returned nothing
179
+ - Semantic search for "failure problems complexity challenges" returned nothing
180
+ - Had to fall back to keyword search frequently after semantic search failed
180
+ - Multi-word keyword searches didn't work (e.g., "issue challenge gap" returned 0 results)
181
+ - Unclear whether keyword mode supports boolean operators
183
+
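The manual workaround described in this list (semantic search first, then one single-term keyword search per word) can be sketched as a small wrapper. This is an illustration under assumptions, not part of mdcontext. The argument lists use only the `search` forms that appear in the command log, and the `run` callable that executes a command and returns result lines is hypothetical, since the tool's output format is not shown here.

```python
from typing import Callable, List

# Hypothetical runner: takes an argv list and returns one string per result.
Runner = Callable[[List[str]], List[str]]


def semantic_argv(query: str) -> List[str]:
    """Semantic search form, as used in the command log."""
    return ["mdcontext", "search", query]


def keyword_argv(term: str) -> List[str]:
    """Keyword search form, as used in the command log."""
    return ["mdcontext", "search", term, "--mode", "keyword"]


def search_with_fallback(query: str, run: Runner) -> List[str]:
    """Try semantic search; on zero results, run one keyword search per term."""
    results = run(semantic_argv(query))
    if results:
        return results
    merged: List[str] = []
    for term in query.split():
        for hit in run(keyword_argv(term)):
            if hit not in merged:  # keep first-seen order, drop duplicates
                merged.append(hit)
    return merged
```

Splitting the query into single terms mirrors the finding that multi-word keyword searches returned nothing. In real use, `run` would wrap a subprocess call and parse the tool's output, which this sketch deliberately leaves open.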
184
+ ### What Was Missing
185
+
186
+ - No fuzzy/stemmed search (had to search "fail" vs "failure" separately)
187
+ - No "search within results" or progressive refinement
188
+ - No way to get context around keyword matches without re-running with `context`
189
+ - No way to adjust the semantic search threshold/sensitivity
190
+ - No combined semantic+keyword hybrid mode
191
+ - Difficult to search for concepts without exact terms
192
+
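As a sketch of what a combined semantic+keyword mode could look like, the example below merges two ranked result lists with reciprocal rank fusion (RRF), a common rank-merging technique. It is offered as one possible approach, not as mdcontext's design. The file names come from this report, but the two rankings are invented for illustration and are not real tool output.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Merge ranked lists: each document scores sum(1 / (k + rank)) over the lists."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] += 1.0 / (k + rank)
    # Highest score first; ties broken alphabetically so output is deterministic.
    return sorted(scores, key=lambda doc: (-scores[doc], doc))


# Illustrative rankings only (not actual search output).
semantic_hits = ["03-ARCHITECTURAL_FOUNDATIONS.md", "01-ARCHITECTURE.md"]
keyword_hits = [
    "02-THE_FAILURE_OF_PURE_AUTOMATION.md",
    "03-ARCHITECTURAL_FOUNDATIONS.md",
    "05-MEMORY_MODEL.md",
]

fused = reciprocal_rank_fusion([semantic_hits, keyword_hits])
print(fused)
```

A document that appears in both lists accumulates score from each, so it rises above documents found by only one mode. The constant `k = 60` is a conventional default and would need tuning against real queries.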
193
+ ### Confidence Level
194
+
195
+ [X] Medium
196
+
197
+ The keyword search found the explicit criticisms comprehensively. However, I may have missed implicit criticisms or design concerns that don't use obvious negative terminology. Semantic search underperformed expectations.
198
+
199
+ ### Would Use Again? (1-5)
200
+
201
+ **4** - Good for structured documentation analysis. Keyword search is reliable. Would use again but with clearer expectations that semantic search needs more work. The section-level context extraction is genuinely useful for targeted retrieval.
202
+
203
+ ## Time & Efficiency
204
+
205
+ - Commands run: **38**
206
+ - Compared to reading all files: **Much less** - Would have taken 30+ minutes to read all docs manually. Tool-based search took approximately 15 minutes to find all relevant criticisms.
207
+ - Token efficiency: Reduced ~150k tokens of docs to targeted extracts totaling ~15k tokens of relevant content
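The efficiency figures above reduce to simple arithmetic. The sketch below uses the report's approximate numbers; the per-command average is a derived illustration that assumes the extracted content is spread across all 38 commands.

```python
docs_tokens = 150_000    # approximate size of the full docs, from the report
extract_tokens = 15_000  # approximate size of the targeted extracts
commands = 38            # commands run by B1

reduction = 1 - extract_tokens / docs_tokens   # fraction of tokens avoided
compression = docs_tokens / extract_tokens     # how many times smaller
tokens_per_command = extract_tokens / commands # average relevant tokens per command

print(f"reduction: {reduction:.0%}")
print(f"compression: {compression:.0f}x")
print(f"relevant tokens per command: {tokens_per_command:.0f}")
```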