woods 1.0.0 → 1.2.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: ab164a85b76d9c97fc6142836da5349a444e9c62f507622fb327f5cc8f434ed4
4
- data.tar.gz: 66752a95ddb4183a6f78d47417690242cfc3ad2bdfc622b8740fe2fbc388658e
3
+ metadata.gz: 927abae1f4f641405384261569e1d25f94a672ca986d1c50093b3f6a56b7db38
4
+ data.tar.gz: fa35b4320669d195a8e4f377400b6999e735aebf5447071ee3353eaa8856840b
5
5
  SHA512:
6
- metadata.gz: 2d53024eefb62544ba536f23b1c9f36bebab988fc75223ef72e1d2ffd1d2ed0b46b2507781b040726b8059d14c9f6eefa3faa1c4d6b0a4b6c5019905ef41675d
7
- data.tar.gz: 8d5c7a1e7ab4c7b401e61140a9ec5bea06848244d08192f05b0cc088a93980b3208cf3f22a0319545857051dc0b2a234f4d4c2ef8a5789ef108080f179aa6f99
6
+ metadata.gz: 5ae6ef3436f6aa6b936b46103480e797a8a6e0fb4250f5dcc8bc721c2b9b911739e5d5aebd5b8b97c6788d58dcd19e9dbd5c6211a3400283e60084ce80c6d031
7
+ data.tar.gz: 6b38946aca86d407ab6d516d32dda4c5797adfcd27249b1685bdebb249ff34e71d62e3eabb266c991d1b32ad2df815e2a56dc924615c1def9df0c4c6754cd629
data/CHANGELOG.md CHANGED
@@ -5,6 +5,23 @@ All notable changes to this project will be documented in this file.
5
5
  The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
6
6
  and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
7
7
 
8
+ ## [1.2.0] - 2026-03-27
9
+
10
+ ### Added
11
+
12
+ - **Unblocked Documents API exporter** — sync extraction data to an Unblocked collection for code review and Q&A context
13
+ - `Woods::Unblocked::Client` — REST client with retry and daily budget rate limiting
14
+ - `Woods::Unblocked::DocumentBuilder` — type-specific Markdown formatters optimized for review context (blast radius, entry points, associations, side effects)
15
+ - `Woods::Unblocked::Exporter` — full/partial sync orchestrator with priority ordering
16
+ - `Woods::Unblocked::RateLimiter` — daily budget tracking (1000 calls/day)
17
+ - New rake tasks: `woods:unblocked_sync` (alias: `woods:relay`)
18
+ - New config: `unblocked_api_token`, `unblocked_collection_id`, `unblocked_repo_url`
19
+ - Integration guide: `docs/UNBLOCKED_INTEGRATION.md`
20
+ - **Domain cluster detection** in `GraphAnalyzer` — groups code units into semantic domains using namespace prefixes and graph connectivity
21
+ - `GraphAnalyzer#domain_clusters` — hybrid namespace + graph clustering with hub identification, entry point detection, and boundary edge mapping
22
+ - New MCP tool: `domain_clusters` with `min_size` and `types` filters
23
+ - New renderer: `render_domain_clusters` in MarkdownRenderer
24
+
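The changelog describes `Woods::Unblocked::RateLimiter` only as daily budget tracking (1000 calls/day). Below is a minimal sketch of that idea. It assumes the budget resets at UTC midnight and that the clock is injectable. The `DailyBudget` class and its methods are hypothetical and are not the gem's API.

```ruby
require 'date'

# Hypothetical sketch of a per-day call budget; not the real Woods::Unblocked::RateLimiter API.
class DailyBudget
  def initialize(limit: 1000, clock: -> { Time.now.utc })
    @limit = limit
    @clock = clock # injectable so tests can move time forward
    @day = nil
    @used = 0
  end

  # Counts one call and returns true if today's budget allowed it, false if it is exhausted.
  def consume
    roll_over_if_new_day
    return false if @used >= @limit

    @used += 1
    true
  end

  # Calls still available for the current UTC day.
  def remaining
    roll_over_if_new_day
    @limit - @used
  end

  private

  # Resets the counter the first time the clock reports a new calendar day.
  def roll_over_if_new_day
    today = @clock.call.to_date
    return if today == @day

    @day = today
    @used = 0
  end
end

now = Time.utc(2026, 3, 27, 23, 59)
budget = DailyBudget.new(limit: 2, clock: -> { now })
budget.consume   # => true
budget.consume   # => true
budget.consume   # => false (budget spent for 2026-03-27)
now = Time.utc(2026, 3, 28, 0, 1)
budget.remaining # => 2 (new day, counter reset)
```

Tying the counter to a calendar day, rather than to a rolling 24-hour window, is the simplest reading of "per day". The real limiter may make a different choice.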
8
25
  ## [0.3.1] - 2026-03-04
9
26
 
10
27
  ### Fixed
data/README.md CHANGED
@@ -65,6 +65,40 @@ Woods boots your Rails app, introspects everything using runtime APIs, and write
65
65
 
66
66
  Your `User` model includes `Auditable`, `Searchable`, and `SoftDeletable`. An AI tool reading `app/models/user.rb` sees 40 lines. Woods inlines all three concerns directly into the extracted unit — the AI sees the full 200-line behavioral surface area in one block.
67
67
 
68
+ ```ruby
69
+ # What your AI sees (app/models/user.rb), simplified:
70
+ class User < ApplicationRecord
71
+ include Auditable
72
+ include Searchable
73
+ end
74
+
75
+ # What Woods produces — full source with schema + inlined concerns:
76
+ # == Schema Information
77
+ # email :string not null
78
+ # name :string
79
+ #
80
+ # class User < ApplicationRecord
81
+ # include Auditable
82
+ # include Searchable
83
+ # validates :email, presence: true, uniqueness: true
84
+ # ...
85
+ # end
86
+ #
87
+ # ┌─────────────────────────────────────────────────────────────────────┐
88
+ # │ Included from: Auditable │
89
+ # └─────────────────────────────────────────────────────────────────────┘
90
+ # def audit_trail ...
91
+ # ─────────────────────────── End Auditable ───────────────────────────
92
+ #
93
+ # ┌─────────────────────────────────────────────────────────────────────┐
94
+ # │ Included from: Searchable │
95
+ # └─────────────────────────────────────────────────────────────────────┘
96
+ # scope :search, ->(q) { where("name ILIKE ?", "%#{q}%") }
97
+ # ─────────────────────────── End Searchable ───────────────────────────
98
+ ```
99
+
100
+ The `metadata[:inlined_concerns]` array lists which concerns were resolved, so retrieval can filter by concern inclusion.
101
+
68
102
  ### Schema Prepending
69
103
 
70
104
  Model source gets a header with actual column types, indexes, and foreign keys pulled from the live database. No more guessing whether `name` is a `string` or `text`, or whether there's an index on `email`.
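Concern-based filtering relies on the `metadata[:inlined_concerns]` array. Below is a minimal sketch that filters already-extracted units. It assumes each unit is a JSON object shaped like the README's `Order` example and that the JSON is already in memory, so file loading is omitted. `units_including` is a hypothetical helper and is not part of Woods.

```ruby
require 'json'

# Hypothetical helper: keep only units whose metadata lists the given concern.
# Units parsed from JSON have string keys, hence 'metadata' / 'inlined_concerns'.
def units_including(units, concern)
  units.select do |unit|
    Array(unit.dig('metadata', 'inlined_concerns')).include?(concern)
  end
end

units = JSON.parse(<<~UNITS)
  [
    { "identifier": "User",  "metadata": { "inlined_concerns": ["Auditable", "Searchable"] } },
    { "identifier": "Order", "metadata": { "inlined_concerns": ["Auditable"] } },
    { "identifier": "Tag",   "metadata": {} }
  ]
UNITS

units_including(units, 'Searchable').map { |u| u['identifier'] } # => ["User"]
units_including(units, 'Auditable').map { |u| u['identifier'] }  # => ["User", "Order"]
```

`Array(...)` turns a missing `inlined_concerns` key into an empty list. Units without concerns, such as `Tag`, are therefore skipped and do not raise.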
@@ -83,6 +117,122 @@ Controller source gets a route map prepended showing the real HTTP verb + path +
83
117
 
84
118
  ---
85
119
 
120
+ ## Examples
121
+
122
+ ### Extracted Model with Schema and Associations
123
+
124
+ After extraction, each model is a self-contained JSON file with schema, associations, validations, and inlined concern source:
125
+
126
+ ```json
127
+ {
128
+ "type": "model",
129
+ "identifier": "Order",
130
+ "file_path": "app/models/order.rb",
131
+ "source_code": "# == Schema Information\n# id :bigint not null, pk\n# user_id :bigint not null, fk\n# status :integer default(0)\n# total_cents :integer\n#\nclass Order < ApplicationRecord\n belongs_to :user\n has_many :line_items\n validates :status, inclusion: { in: %w[pending active shipped] }\n ...\nend\n\n# ┌───────────────────────────────────────────────────────────────────┐\n# │ Included from: Auditable │\n# └───────────────────────────────────────────────────────────────────┘\n# module Auditable\n# ...\n# end\n# ──────────────────────── End Auditable ────────────────────────────",
132
+ "metadata": {
133
+ "associations": [
134
+ { "type": "belongs_to", "name": "user", "target": "User" },
135
+ { "type": "has_many", "name": "line_items", "target": "LineItem" }
136
+ ],
137
+ "validations": [
138
+ { "attribute": "status", "type": "inclusion", "options": { "in": ["pending", "active", "shipped"] } }
139
+ ],
140
+ "enums": { "status": { "pending": 0, "active": 1, "shipped": 2 } },
141
+ "scopes": [{ "name": "active", "source": "-> { where(status: :active) }" }],
142
+ "inlined_concerns": ["Auditable"]
143
+ },
144
+ "dependencies": [
145
+ { "type": "model", "target": "User", "via": "belongs_to" },
146
+ { "type": "model", "target": "LineItem", "via": "has_many" }
147
+ ]
148
+ }
149
+ ```
150
+
151
+ ### Callback Chain with Side-Effects
152
+
153
+ Woods resolves the full callback chain in execution order and detects side-effects — which columns get written, which jobs get enqueued, which mailers fire:
154
+
155
+ ```json
156
+ "callbacks": [
157
+ { "type": "before_validation", "filter": "normalize_email", "kind": "before", "conditions": {} },
158
+ { "type": "before_save", "filter": "set_slug", "kind": "before", "conditions": {},
159
+ "side_effects": { "columns_written": ["slug"], "jobs_enqueued": [], "services_called": [], "mailers_triggered": [], "database_reads": [], "operations": [] } },
160
+ { "type": "after_commit", "filter": "send_welcome", "kind": "after", "conditions": {},
161
+ "side_effects": { "columns_written": [], "jobs_enqueued": ["WelcomeEmailJob"], "services_called": [], "mailers_triggered": ["UserMailer"], "database_reads": [], "operations": [] } }
162
+ ]
163
+ ```
164
+
165
+ Side-effects are detected by `CallbackAnalyzer`, which scans callback method bodies for patterns like `self.col =` (column writes), `perform_later` (job enqueues), and `deliver_later` (mailer triggers). These side-effects are a common blind spot for AI tools, because nothing at the call site (for example `user.save`) reveals them.
166
+
167
+ ### Route-to-Controller Lookup
168
+
169
+ Every route becomes its own `ExtractedUnit` with the controller and action bound from the live routing table:
170
+
171
+ ```json
172
+ {
173
+ "type": "route",
174
+ "identifier": "POST /checkout",
175
+ "metadata": {
176
+ "controller": "orders",
177
+ "action": "create",
178
+ "route_name": "checkout"
179
+ }
180
+ }
181
+ ```
182
+
183
+ To find which controller handles a URL, use the MCP `search` tool:
184
+
185
+ ```json
186
+ { "tool": "search", "params": { "query": "/checkout", "types": ["route"] } }
187
+ ```
188
+
189
+ This returns all matching route units with their controller and action — no guessing about custom routes, nested resources, or engine mount points.
190
+
191
+ ### Looking Up a Model's Full Structure
192
+
193
+ Use the MCP `lookup` tool to get a model's complete JSON representation — schema, associations, validations, callbacks, and inlined concerns in one call:
194
+
195
+ ```json
196
+ { "tool": "lookup", "params": { "identifier": "Order", "include_source": true } }
197
+ ```
198
+
199
+ Returns the full `ExtractedUnit` JSON shown in the example above, including `source_code` (with schema header and inlined concerns), `metadata` (associations, callbacks, validations, enums, scopes), `dependencies`, and `dependents`.
200
+
201
+ To get just the structured metadata without source code:
202
+
203
+ ```json
204
+ { "tool": "lookup", "params": { "identifier": "Order", "include_source": false, "sections": ["metadata"] } }
205
+ ```
206
+
207
+ ### Finding Jobs Enqueued by a Service
208
+
209
+ Use the MCP `dependencies` tool to trace what a service triggers:
210
+
211
+ ```json
212
+ { "tool": "dependencies", "params": { "identifier": "CheckoutService", "depth": 2, "types": ["job"] } }
213
+ ```
214
+
215
+ Returns all job units reachable from `CheckoutService` within 2 hops — including jobs triggered indirectly via model callbacks (e.g., `CheckoutService` → `Order` → `OrderConfirmationJob`).
216
+
217
+ ### Runtime-Generated Method Detection
218
+
219
+ Because Woods runs inside the booted Rails process, it captures every method Rails generates dynamically — enum predicates, association builders, attribute accessors, and scope methods that static analysis tools cannot see:
220
+
221
+ ```json
222
+ {
223
+ "identifier": "Order",
224
+ "metadata": {
225
+ "enums": { "status": { "pending": 0, "active": 1, "shipped": 2 } },
226
+ "scopes": [{ "name": "active", "source": "-> { where(status: :active) }" }],
227
+ "associations": [{ "type": "has_many", "name": "line_items", "target": "LineItem" }]
228
+ }
229
+ }
230
+ ```
231
+
232
+ Static tools miss generated methods such as `active?` and `pending?` (enum predicates), `line_item_ids` and `line_items=` (has_many accessors), and dynamically registered scopes. Woods captures the declarations behind them because it inspects the runtime class after Rails has processed every DSL declaration, instead of parsing source text.
233
+
234
+ ---
235
+
86
236
  ## Connect to Your AI Tool
87
237
 
88
238
  Woods ships two MCP servers. Most users only need the **Index Server**.
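The README says `CallbackAnalyzer` scans callback bodies for `self.col =`, `perform_later`, and `deliver_later`. Below is a deliberately simplified, regex-only sketch of that kind of scan. The patterns and the `scan_side_effects` name are assumptions, not Woods' implementation. The patterns miss forms such as `self.count += 1`, `update_column`, and `Job.set(...).perform_later`.

```ruby
# Hypothetical regex-based sketch of callback side-effect detection; not Woods' CallbackAnalyzer.
COLUMN_WRITE = /\bself\.(\w+)\s*=(?!=)/ # self.slug = ... (but not self.slug == ...)
JOB_ENQUEUE  = /\b([A-Z]\w*(?:::[A-Z]\w*)*Job)\.perform_later\b/
MAILER_CALL  = /\b([A-Z]\w*Mailer)\.\w+(?:\([^)]*\))?\.deliver_later\b/

# Returns the column names, job classes, and mailer classes referenced in a method body.
def scan_side_effects(method_body)
  {
    columns_written: method_body.scan(COLUMN_WRITE).flatten.uniq,
    jobs_enqueued: method_body.scan(JOB_ENQUEUE).flatten.uniq,
    mailers_triggered: method_body.scan(MAILER_CALL).flatten.uniq
  }
end

body = <<~RUBY
  def send_welcome
    WelcomeEmailJob.perform_later(id)
    UserMailer.welcome(self).deliver_later
  end
RUBY

scan_side_effects(body)
# => { columns_written: [], jobs_enqueued: ["WelcomeEmailJob"], mailers_triggered: ["UserMailer"] }
```

Pattern matching on source text is cheap, but it cannot follow calls into other methods. A runtime-aware analyzer therefore still has to decide how deep to look.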
@@ -223,6 +373,26 @@ Woods is backend-agnostic. Your app database, vector store, embedding provider,
223
373
 
224
374
  See [Backend Matrix](docs/BACKEND_MATRIX.md) for supported combinations and [Configuration Reference](docs/CONFIGURATION_REFERENCE.md) for every option with defaults.
225
375
 
376
+ ### Environment-Specific Configuration
377
+
378
+ ```ruby
379
+ Woods.configure do |config|
380
+ config.output_dir = Rails.root.join('tmp/woods')
381
+
382
+ # CI: only extract models and controllers for faster builds
383
+ config.extractors = %i[models controllers] if ENV['CI']
384
+
385
+ # Environment-conditional embedding provider
386
+ if ENV['OPENAI_API_KEY']
387
+ config.embedding_provider = :openai
388
+ config.embedding_options = { api_key: ENV['OPENAI_API_KEY'] }
389
+ else
390
+ config.embedding_provider = :ollama
391
+ config.embedding_options = { base_url: 'http://localhost:11434' }
392
+ end
393
+ end
394
+ ```
395
+
226
396
  ---
227
397
 
228
398
  ## Keeping the Index Current
@@ -289,12 +459,15 @@ Everything flows through `ExtractedUnit` — the universal data structure. Each
289
459
  |-------|-----------------|
290
460
  | `identifier` | Class name or descriptive key (`"User"`, `"POST /orders"`) |
291
461
  | `type` | Category (`:model`, `:controller`, `:service`, `:job`, etc.) |
462
+ | `file_path` | Source file location relative to Rails root |
463
+ | `namespace` | Module namespace (`"Admin"`, `nil` for top-level) |
292
464
  | `source_code` | Annotated source with inlined concerns and schema |
293
465
  | `metadata` | Structured data — associations, callbacks, routes, fields |
294
466
  | `dependencies` | What this unit depends on (forward edges) |
295
467
  | `dependents` | What depends on this unit (reverse edges) |
296
468
  | `chunks` | Semantic sub-sections for large units |
297
- | `estimated_tokens` | Token count for LLM context budgeting |
469
+ | `extracted_at` | ISO 8601 timestamp of extraction |
470
+ | `source_hash` | SHA-256 digest for change detection |
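The table describes `source_hash` as a SHA-256 digest for change detection. Below is a minimal sketch of that check. It assumes the digest is computed over the unit's raw source text, since the table does not say exactly which bytes Woods hashes. `stale?` is a hypothetical helper.

```ruby
require 'digest'

# Hypothetical helper: a unit is stale when its current source no longer matches the stored digest.
def stale?(stored_hash, current_source)
  Digest::SHA256.hexdigest(current_source) != stored_hash
end

source = "class Order < ApplicationRecord\nend\n"
stored = Digest::SHA256.hexdigest(source) # 64 hex characters

stale?(stored, source)                                        # => false
stale?(stored, source.sub('end', "  belongs_to :user\nend")) # => true
```

Comparing against a fixed-size digest is cheaper than keeping the previous source around. Any change to the hashed bytes, even whitespace, produces a different digest.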
298
471
 
299
472
  ### Output Structure
300
473
 
@@ -323,7 +496,7 @@ tmp/woods/
323
496
  │ │
324
497
  │ ┌────────────┐ ┌─────────────┐ ┌──────────────────────┐ │
325
498
  │ │ Extract │───>│ Resolve │───>│ Write JSON │ │
326
- │ │ 34 types │ │ graph + │ │ per unit │ │
499
+ │ │ 33 types │ │ graph + │ │ per unit │ │
327
500
  │ │ │ │ git data │ │ │ │
328
501
  │ └────────────┘ └─────────────┘ └──────────────────────┘ │
329
502
  └──────────────────────────────────────────────────────────────────┘
@@ -17,6 +17,10 @@ require_relative '../lib/woods/console/server'
17
17
  config_path = ENV.fetch('WOODS_CONSOLE_CONFIG', File.expand_path('~/.woods/console.yml'))
18
18
  config = File.exist?(config_path) ? YAML.safe_load_file(config_path) : {}
19
19
 
20
+ # Suppress json-schema MultiJSON deprecation notice that would pollute stderr
21
+ # during MCP stdio transport. The notice fires when multi_json is in the bundle.
22
+ JSON::Validator.use_multi_json = false if defined?(JSON::Validator) && JSON::Validator.respond_to?(:use_multi_json=)
23
+
20
24
  server = Woods::Console::Server.build(config: config)
21
25
  transport = MCP::Server::Transports::StdioTransport.new(server)
22
26
  transport.open
data/exe/woods-mcp CHANGED
@@ -19,6 +19,10 @@ require_relative '../lib/woods/mcp/bootstrapper'
19
19
  require_relative '../lib/woods/embedding/text_preparer'
20
20
  require_relative '../lib/woods/embedding/indexer'
21
21
 
22
+ # Suppress json-schema MultiJSON deprecation notice that would pollute stderr
23
+ # during MCP stdio transport. The notice fires when multi_json is in the bundle.
24
+ JSON::Validator.use_multi_json = false if defined?(JSON::Validator) && JSON::Validator.respond_to?(:use_multi_json=)
25
+
22
26
  index_dir = Woods::MCP::Bootstrapper.resolve_index_dir(ARGV)
23
27
  retriever = Woods::MCP::Bootstrapper.build_retriever
24
28
  snapshot_store = Woods::MCP::Bootstrapper.build_snapshot_store(index_dir)
data/lib/tasks/woods.rake CHANGED
@@ -618,4 +618,58 @@ namespace :woods do
618
618
 
619
619
  desc 'Send findings from the field — sync to Notion (alias for notion_sync)'
620
620
  task send: :notion_sync
621
+
622
+ desc 'Sync extraction data to Unblocked collection (Documents API)'
623
+ task unblocked_sync: :environment do
624
+ require 'woods/unblocked/exporter'
625
+
626
+ config = Woods.configuration
627
+ config.unblocked_api_token = ENV.fetch('UNBLOCKED_API_TOKEN', nil) || config.unblocked_api_token
628
+ config.unblocked_collection_id = ENV.fetch('UNBLOCKED_COLLECTION_ID', nil) || config.unblocked_collection_id
629
+ config.unblocked_repo_url = ENV.fetch('UNBLOCKED_REPO_URL', nil) || config.unblocked_repo_url
630
+
631
+ unless config.unblocked_api_token
632
+ puts 'ERROR: Unblocked API token not configured.'
633
+ puts 'Set UNBLOCKED_API_TOKEN env var or configure unblocked_api_token in Woods.configure.'
634
+ exit 1
635
+ end
636
+
637
+ unless config.unblocked_collection_id
638
+ puts 'ERROR: Unblocked collection ID not configured.'
639
+ puts 'Set UNBLOCKED_COLLECTION_ID env var or configure unblocked_collection_id in Woods.configure.'
640
+ exit 1
641
+ end
642
+
643
+ unless config.unblocked_repo_url
644
+ puts 'ERROR: Repository URL not configured.'
645
+ puts 'Set UNBLOCKED_REPO_URL env var or configure unblocked_repo_url in Woods.configure.'
646
+ puts 'Example: https://github.com/your-org/your-repo'
647
+ exit 1
648
+ end
649
+
650
+ output_dir = ENV.fetch('WOODS_OUTPUT', config.output_dir)
651
+
652
+ puts 'Syncing extraction data to Unblocked...'
653
+ puts " Output dir: #{output_dir}"
654
+ puts " Collection: #{config.unblocked_collection_id}"
655
+ puts " Repo URL: #{config.unblocked_repo_url}"
656
+ puts
657
+
658
+ exporter = Woods::Unblocked::Exporter.new(index_dir: output_dir)
659
+ stats = exporter.sync_all
660
+
661
+ puts
662
+ puts 'Sync complete!'
663
+ puts " Documents synced: #{stats[:synced]}"
664
+ puts " Documents skipped: #{stats[:skipped]}"
665
+
666
+ if stats[:errors].any?
667
+ puts " Errors: #{stats[:errors].size}"
668
+ stats[:errors].first(5).each { |e| puts " - #{e}" }
669
+ puts " ... and #{stats[:errors].size - 5} more" if stats[:errors].size > 5
670
+ end
671
+ end
672
+
673
+ desc 'Relay findings to Unblocked (alias for unblocked_sync)'
674
+ task relay: :unblocked_sync
621
675
  end
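In the `unblocked_sync` task, each `UNBLOCKED_*` environment variable takes precedence over the configured value, and the task exits when a setting is still missing. Below, the same rule is sketched as a standalone helper so it can be tested without Rake. `resolve_settings`, and the plain `Hash` standing in for `Woods.configuration`, are assumptions.

```ruby
# Hypothetical helper mirroring the task's precedence: env var first, then configured value.
SETTINGS = {
  unblocked_api_token: 'UNBLOCKED_API_TOKEN',
  unblocked_collection_id: 'UNBLOCKED_COLLECTION_ID',
  unblocked_repo_url: 'UNBLOCKED_REPO_URL'
}.freeze

# Returns [resolved_settings, missing_keys]; blank strings count as missing.
def resolve_settings(config, env = ENV)
  resolved = SETTINGS.to_h do |key, env_name|
    [key, env.fetch(env_name, nil) || config[key]]
  end
  missing = resolved.select { |_, value| value.nil? || value.to_s.strip.empty? }.keys
  [resolved, missing]
end

resolved, missing = resolve_settings(
  { unblocked_api_token: 'from-config', unblocked_repo_url: 'https://github.com/your-org/your-repo' },
  { 'UNBLOCKED_API_TOKEN' => 'from-env' }
)
resolved[:unblocked_api_token] # => "from-env"
missing                        # => [:unblocked_collection_id]
```

This sketch differs from the task in one way: it treats blank strings as missing. The task's `unless` checks only catch `nil`, so a setting like `UNBLOCKED_API_TOKEN=""` would get past them.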
@@ -327,6 +327,9 @@ module Woods
327
327
  callbacks: extract_callbacks(model),
328
328
  scopes: extract_scopes(model, source),
329
329
  enums: extract_enums(model),
330
+ inlined_concerns: extract_included_modules(model)
331
+ .select { |mod| mod.name && concern_source(mod) }
332
+ .map { |mod| mod.name.demodulize },
330
333
 
331
334
  # API surface
332
335
  class_methods: model.methods(false).sort,
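The new `inlined_concerns` entry keeps the included modules that are named and that `concern_source` can resolve, then strips each namespace with `demodulize`. Below is the same select/map shape in plain Ruby, using these stand-ins: `included_modules` for `extract_included_modules`, a hypothetical `app_concern?` predicate for `concern_source`, and `split('::').last` for ActiveSupport's `demodulize`.

```ruby
module Concerns
  module Auditable; end
  module Searchable; end
end

class Record
  include Comparable # stdlib module, filtered out below (as is Kernel)
  include Concerns::Auditable
  include Concerns::Searchable
end

# Hypothetical stand-in for concern_source(mod): treat anything under Concerns:: as an app concern.
def app_concern?(mod)
  mod.name.start_with?('Concerns::')
end

# Same select/map shape as the extractor; split('::').last plays the role of demodulize.
def inlined_concern_names(klass)
  klass.included_modules
       .select { |mod| mod.name && app_concern?(mod) }
       .map { |mod| mod.name.split('::').last }
end

inlined_concern_names(Record).sort # => ["Auditable", "Searchable"]
```

Only the last namespace segment is kept. Two concerns with the same name in different namespaces would therefore look identical in the list.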
@@ -611,7 +614,7 @@ module Woods
611
614
  def extract_dependencies(model, source = nil)
612
615
  # Associations point to other models
613
616
  deps = model.reflect_on_all_associations.filter_map do |assoc|
614
- { type: :model, target: assoc.class_name, via: :association }
617
+ { type: :model, target: assoc.class_name, via: assoc.macro }
615
618
  rescue NameError => e
616
619
  @warnings << "[#{model.name}] Skipping broken association dep #{assoc.name}: #{e.message}"
617
620
  nil
@@ -154,6 +154,52 @@ module Woods
154
154
  end
155
155
  end
156
156
 
157
+ # Group units into semantic domains using namespace prefixes and graph connectivity.
158
+ #
159
+ # Strategy:
160
+ # 1. Seed clusters from top-level namespace prefixes (e.g., ShippingProfile::*, Order::*)
161
+ # 2. Assign unnamespaced units to their most-connected cluster
162
+ # 3. Merge small clusters (< min_size) into their most-connected neighbor
163
+ # 4. For each cluster, identify the hub (highest PageRank) and entry points
164
+ # 5. Compute boundary edges between clusters
165
+ #
166
+ # @param min_size [Integer] Minimum units per cluster before merging (default: 3)
167
+ # @param types [Array<String>, nil] Filter to these unit types (default: all)
168
+ # @return [Array<Hash>] Clusters sorted by member count descending.
169
+ # Each hash: { name:, hub:, members:, member_count:, entry_points:, boundary_edges:, types: }
170
+ def domain_clusters(min_size: 3, types: nil)
171
+ nodes = graph_nodes
172
+ return [] if nodes.empty?
173
+
174
+ # Filter by types if specified
175
+ filtered_ids = if types
176
+ type_set = types.map(&:to_s)
177
+ nodes.select { |_, meta| type_set.include?(meta[:type].to_s) }.keys
178
+ else
179
+ nodes.keys
180
+ end
181
+
182
+ return [] if filtered_ids.empty?
183
+
184
+ # Step 1: Seed clusters from namespace prefixes
185
+ clusters = seed_namespace_clusters(filtered_ids, nodes)
186
+
187
+ # Step 2: Assign unnamespaced/root units to most-connected cluster
188
+ assign_orphaned_units(clusters, filtered_ids, nodes)
189
+
190
+ # Step 3: Merge small clusters
191
+ merge_small_clusters(clusters, min_size)
192
+
193
+ # Step 4: Enrich each cluster with hub, entry points, boundary edges
194
+ pagerank_scores = @graph.pagerank
195
+ enrich_clusters(clusters, nodes, pagerank_scores)
196
+
197
+ # Sort by member count descending
198
+ clusters.values
199
+ .select { |c| c[:members].any? }
200
+ .sort_by { |c| -c[:member_count] }
201
+ end
202
+
157
203
  # Full analysis report combining all structural metrics.
158
204
  #
159
205
  # @return [Hash] Complete analysis with :orphans, :dead_ends, :hubs,
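Steps 1 and 2 of the `domain_clusters` strategy can be tried on a toy graph without the rest of the analyzer. This sketch assumes the graph is a plain `Hash` of forward edges, mapping each identifier to its dependencies. Like `find_most_connected_cluster`, it counts edges in both directions. The method names are illustrative and are not part of `GraphAnalyzer`.

```ruby
require 'set'

# Step 1 (sketch): group namespaced ids by top-level prefix, e.g. "Order::Refund" -> "Order".
def seed_clusters(ids)
  ids.each_with_object({}) do |id, clusters|
    parts = id.split('::')
    next if parts.size < 2

    (clusters[parts.first] ||= Set.new) << id
  end
end

# Step 2 (sketch): attach each unnamespaced id to the cluster it shares the most edges with,
# counting both its dependencies and its dependents. Ids with no such edges stay unassigned.
def assign_orphans(clusters, ids, edges)
  ids.reject { |id| id.include?('::') }.each do |id|
    dependents = edges.select { |_, deps| deps.include?(id) }.keys
    neighbours = edges.fetch(id, []) + dependents
    counts = clusters.transform_values { |members| neighbours.count { |n| members.include?(n) } }
    best, score = counts.max_by { |_, count| count }
    clusters[best] << id if best && score.positive?
  end
  clusters
end

edges = {
  'OrdersController' => ['Order::Checkout'],
  'Order::Checkout' => ['Order::Refund', 'Shipping::Label'],
  'Shipping::Label' => ['Shipping::Rate'],
  'Account' => []
}
ids = edges.keys | edges.values.flatten
clusters = assign_orphans(seed_clusters(ids), ids, edges)
clusters.transform_values(&:to_a)
# => {"Order" => ["Order::Checkout", "Order::Refund", "OrdersController"],
#     "Shipping" => ["Shipping::Label", "Shipping::Rate"]}
```

`Account` has no edges, so it stays unassigned. The analyzer's `assign_orphaned_units` likewise skips units with no connection to any cluster.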
@@ -182,6 +228,171 @@ module Woods
182
228
 
183
229
  private
184
230
 
231
+ # ──────────────────────────────────────────────────────────────────────
232
+ # Domain Cluster Helpers
233
+ # ──────────────────────────────────────────────────────────────────────
234
+
235
+ # Extract the top-level namespace prefix for clustering.
236
+ # "ShippingProfile::Setting" => "ShippingProfile"
237
+ # "Order::Transactions::Refund" => "Order"
238
+ # "Account" => nil (no namespace)
239
+ def cluster_prefix(identifier)
240
+ parts = identifier.to_s.split('::')
241
+ parts.size > 1 ? parts.first : nil
242
+ end
243
+
244
+ # Seed initial clusters from namespace prefixes.
245
+ def seed_namespace_clusters(filtered_ids, _nodes)
246
+ clusters = {}
247
+
248
+ filtered_ids.each do |id|
249
+ prefix = cluster_prefix(id)
250
+ next unless prefix
251
+
252
+ clusters[prefix] ||= { name: prefix, members: [], member_set: Set.new }
253
+ clusters[prefix][:members] << id
254
+ clusters[prefix][:member_set].add(id)
255
+ end
256
+
257
+ clusters
258
+ end
259
+
260
+ # Assign units with no namespace prefix to their most-connected cluster.
261
+ def assign_orphaned_units(clusters, filtered_ids, _nodes)
262
+ return if clusters.empty?
263
+
264
+ unassigned = filtered_ids.select { |id| cluster_prefix(id).nil? }
265
+
266
+ unassigned.each do |id|
267
+ best_cluster = find_most_connected_cluster(id, clusters)
268
+ next unless best_cluster
269
+
270
+ clusters[best_cluster][:members] << id
271
+ clusters[best_cluster][:member_set].add(id)
272
+ end
273
+ end
274
+
275
+ # Find which cluster a unit has the most connections to.
276
+ def find_most_connected_cluster(identifier, clusters)
277
+ connections = Hash.new(0)
278
+
279
+ # Check forward edges (dependencies)
280
+ @graph.dependencies_of(identifier).each do |dep|
281
+ clusters.each do |name, cluster|
282
+ connections[name] += 1 if cluster[:member_set].include?(dep)
283
+ end
284
+ end
285
+
286
+ # Check reverse edges (dependents)
287
+ @graph.dependents_of(identifier).each do |dep|
288
+ clusters.each do |name, cluster|
289
+ connections[name] += 1 if cluster[:member_set].include?(dep)
290
+ end
291
+ end
292
+
293
+ return nil if connections.empty?
294
+
295
+ connections.max_by { |_, count| count }.first
296
+ end
297
+
298
+ # Merge clusters smaller than min_size into their most-connected neighbor.
299
+ def merge_small_clusters(clusters, min_size)
300
+ loop do
301
+ small = clusters.select { |_, c| c[:members].size < min_size }
302
+ break if small.empty?
303
+
304
+ # Merge the smallest cluster first
305
+ name, cluster = small.min_by { |_, c| c[:members].size }
306
+
307
+ # Find which other cluster this one connects to most
308
+ target = find_merge_target(cluster, clusters, name)
309
+
310
+ if target
311
+ clusters[target][:members].concat(cluster[:members])
312
+ cluster[:members].each { |id| clusters[target][:member_set].add(id) }
313
+ end
314
+
315
+ clusters.delete(name)
316
+ end
317
+ end
318
+
319
+ # Find the best cluster to merge into (most cross-cluster edges).
320
+ def find_merge_target(cluster, all_clusters, exclude_name)
321
+ connections = Hash.new(0)
322
+
323
+ cluster[:members].each do |id|
324
+ (@graph.dependencies_of(id) + @graph.dependents_of(id)).each do |connected|
325
+ all_clusters.each do |name, other|
326
+ next if name == exclude_name
327
+
328
+ connections[name] += 1 if other[:member_set].include?(connected)
329
+ end
330
+ end
331
+ end
332
+
333
+ return nil if connections.empty?
334
+
335
+ connections.max_by { |_, count| count }.first
336
+ end
337
+
338
+ # Enrich clusters with hub, entry points, boundary edges, and type breakdown.
339
+ def enrich_clusters(clusters, nodes, pagerank_scores)
340
+ clusters.each_value do |cluster|
341
+ members = cluster[:members]
342
+ member_set = cluster[:member_set]
343
+
344
+ # Hub: highest PageRank within the cluster
345
+ hub_id = members.max_by { |id| pagerank_scores[id] || 0 }
346
+ cluster[:hub] = hub_id
347
+
348
+ # Entry points: controllers and GraphQL resolvers in the cluster's dependents
349
+ entry_types = %w[controller graphql_resolver graphql_mutation graphql_query]
350
+ entry_points = Set.new
351
+ members.each do |id|
352
+ @graph.dependents_of(id).each do |dep|
353
+ meta = nodes[dep]
354
+ entry_points.add(dep) if meta && entry_types.include?(meta[:type].to_s)
355
+ end
356
+ end
357
+ cluster[:entry_points] = entry_points.to_a
358
+
359
+ # Boundary edges: connections that cross cluster boundaries
360
+ boundary = []
361
+ members.each do |id|
362
+ @graph.dependencies_of(id).each do |dep|
363
+ next if member_set.include?(dep)
364
+
365
+ dep_meta = nodes[dep]
366
+ next unless dep_meta
367
+
368
+ boundary << { from: id, to: dep, via: 'dependency' }
369
+ end
370
+
371
+ @graph.dependents_of(id).each do |dep|
372
+ next if member_set.include?(dep)
373
+
374
+ dep_meta = nodes[dep]
375
+ next unless dep_meta
376
+
377
+ boundary << { from: dep, to: id, via: 'dependent' }
378
+ end
379
+ end
380
+ # Deduplicate and limit boundary edges
381
+ cluster[:boundary_edges] = boundary.uniq { |e| [e[:from], e[:to]] }.first(20)
382
+
383
+ # Type breakdown
384
+ type_counts = members.each_with_object(Hash.new(0)) do |id, counts|
385
+ meta = nodes[id]
386
+ counts[meta[:type].to_s] += 1 if meta
387
+ end
388
+ cluster[:types] = type_counts
389
+
390
+ # Final shape
391
+ cluster[:member_count] = members.size
392
+ cluster.delete(:member_set) # Internal tracking, not part of output
393
+ end
394
+ end
395
+
185
396
  # ──────────────────────────────────────────────────────────────────────
186
397
  # Graph Accessors
187
398
  # ──────────────────────────────────────────────────────────────────────
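`enrich_clusters` picks each cluster's hub as the member with the highest score from `@graph.pagerank`. That implementation is not part of this diff. To show why heavily depended-upon units win, here is a textbook power-iteration PageRank over forward dependency edges. Two things are assumptions: that dependents "vote" for their dependencies, and the damping and iteration defaults.

```ruby
# Textbook power-iteration PageRank, assumed for illustration (not Woods' graph code).
# edges maps each identifier to the identifiers it depends on.
def pagerank(edges, damping: 0.85, iterations: 50)
  nodes = (edges.keys + edges.values.flatten).uniq
  return {} if nodes.empty?

  n = nodes.size.to_f
  rank = nodes.to_h { |node| [node, 1.0 / n] }

  iterations.times do
    # Rank held by nodes with no outgoing edges is spread evenly so the total stays 1.0.
    dangling = nodes.select { |node| edges.fetch(node, []).empty? }.sum { |node| rank[node] }
    base = ((1.0 - damping) + damping * dangling) / n
    next_rank = nodes.to_h { |node| [node, base] }

    edges.each do |from, targets|
      next if targets.empty?

      share = damping * rank[from] / targets.size
      targets.each { |to| next_rank[to] += share }
    end
    rank = next_rank
  end
  rank
end

edges = {
  'OrdersController' => ['Order'],
  'CheckoutService' => %w[Order Payment],
  'Payment' => ['Order']
}
scores = pagerank(edges)
# Same selection rule as enrich_clusters: the member with the highest score is the hub.
hub = scores.keys.max_by { |id| scores[id] || 0 }
hub # => "Order"
```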
@@ -165,6 +165,67 @@ module Woods
165
165
  lines.join("\n").rstrip
166
166
  end
167
167
 
168
+ # ── domain_clusters ────────────────────────────────────────
169
+
170
+ # @param data [Hash] Domain cluster data with :clusters and :total
171
+ # @return [String] Markdown domain cluster overview
172
+ def render_domain_clusters(data, **)
173
+ clusters = fetch_key(data, :clusters) || []
174
+ total = fetch_key(data, :total) || clusters.size
175
+ lines = []
176
+ lines << '## Domain Clusters'
177
+ lines << ''
178
+ lines << "#{total} domains detected."
179
+ lines << ''
180
+
181
+ clusters.each do |cluster|
182
+ name = cluster[:name] || cluster['name']
183
+ member_count = cluster[:member_count] || cluster['member_count'] || 0
184
+ hub = cluster[:hub] || cluster['hub']
185
+ lines << "### #{name} (#{member_count} units)"
186
+ lines << ''
187
+ lines << "**Hub:** #{hub}" if hub
188
+ lines << ''
189
+
190
+ # Type breakdown
191
+ types = cluster[:types] || cluster['types']
192
+ if types.is_a?(Hash) && types.any?
193
+ type_parts = types.sort_by { |_, count| -count }.map { |type, count| "#{count} #{type}s" }
194
+ lines << "**Types:** #{type_parts.join(', ')}"
195
+ end
196
+
197
+ # Entry points
198
+ entry_points = cluster[:entry_points] || cluster['entry_points'] || []
199
+ lines << "**Entry points:** #{entry_points.first(10).join(', ')}" if entry_points.any?
200
+
201
+ # Members (show first 15)
202
+ members = cluster[:members] || cluster['members'] || []
203
+ if members.any?
204
+ lines << ''
205
+ lines << '**Members:**'
206
+ members.first(15).each { |m| lines << "- #{m}" }
207
+ lines << "- _... and #{members.size - 15} more_" if members.size > 15
208
+ end
209
+
210
+ # Boundary edges (show first 10)
211
+ boundaries = cluster[:boundary_edges] || cluster['boundary_edges'] || []
212
+ if boundaries.any?
213
+ lines << ''
214
+ lines << '**Boundary connections:**'
215
+ boundaries.first(10).each do |edge|
216
+ from = edge[:from] || edge['from']
217
+ to = edge[:to] || edge['to']
218
+ via = edge[:via] || edge['via']
219
+ lines << "- #{from} → #{to} (#{via})"
220
+ end
221
+ end
222
+
223
+ lines << ''
224
+ end
225
+
226
+ lines.join("\n").rstrip
227
+ end
228
+
168
229
  # ── pagerank ────────────────────────────────────────────────
169
230
 
170
231
  # @param data [Hash] PageRank data with :total_nodes and :results
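`render_domain_clusters` looks up every field by both symbol and string (`cluster[:name] || cluster['name']`). A likely reason, though the diff does not say, is that clusters can arrive straight from `GraphAnalyzer#domain_clusters` with symbol keys, or after a JSON round-trip with string keys. The sketch below shows an alternative: normalizing keys once up front. It is an assumption, not what the renderer does, and `deep_symbolize` is a hypothetical helper.

```ruby
require 'json'

# Hypothetical helper: recursively turn string keys into symbols so callers need one lookup form.
def deep_symbolize(value)
  case value
  when Hash then value.to_h { |key, val| [key.to_sym, deep_symbolize(val)] }
  when Array then value.map { |item| deep_symbolize(item) }
  else value
  end
end

json = '{"name":"Order","member_count":3,' \
       '"boundary_edges":[{"from":"Order::Checkout","to":"Shipping::Label","via":"dependency"}]}'
cluster = deep_symbolize(JSON.parse(json))
cluster[:name]                       # => "Order"
cluster[:boundary_edges].first[:via] # => "dependency"
```

When the JSON is parsed in the same process, `JSON.parse(json, symbolize_names: true)` does the same in one step.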