woods 1.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (185)
  1. checksums.yaml +7 -0
  2. data/CHANGELOG.md +89 -0
  3. data/CODE_OF_CONDUCT.md +83 -0
  4. data/CONTRIBUTING.md +65 -0
  5. data/LICENSE.txt +21 -0
  6. data/README.md +406 -0
  7. data/exe/woods-console +59 -0
  8. data/exe/woods-console-mcp +22 -0
  9. data/exe/woods-mcp +34 -0
  10. data/exe/woods-mcp-http +37 -0
  11. data/exe/woods-mcp-start +58 -0
  12. data/lib/generators/woods/install_generator.rb +32 -0
  13. data/lib/generators/woods/pgvector_generator.rb +37 -0
  14. data/lib/generators/woods/templates/add_pgvector_to_woods.rb.erb +15 -0
  15. data/lib/generators/woods/templates/create_woods_tables.rb.erb +43 -0
  16. data/lib/tasks/woods.rake +621 -0
  17. data/lib/tasks/woods_evaluation.rake +115 -0
  18. data/lib/woods/ast/call_site_extractor.rb +106 -0
  19. data/lib/woods/ast/method_extractor.rb +71 -0
  20. data/lib/woods/ast/node.rb +116 -0
  21. data/lib/woods/ast/parser.rb +614 -0
  22. data/lib/woods/ast.rb +6 -0
  23. data/lib/woods/builder.rb +200 -0
  24. data/lib/woods/cache/cache_middleware.rb +199 -0
  25. data/lib/woods/cache/cache_store.rb +264 -0
  26. data/lib/woods/cache/redis_cache_store.rb +116 -0
  27. data/lib/woods/cache/solid_cache_store.rb +111 -0
  28. data/lib/woods/chunking/chunk.rb +84 -0
  29. data/lib/woods/chunking/semantic_chunker.rb +295 -0
  30. data/lib/woods/console/adapters/cache_adapter.rb +58 -0
  31. data/lib/woods/console/adapters/good_job_adapter.rb +33 -0
  32. data/lib/woods/console/adapters/job_adapter.rb +68 -0
  33. data/lib/woods/console/adapters/sidekiq_adapter.rb +33 -0
  34. data/lib/woods/console/adapters/solid_queue_adapter.rb +33 -0
  35. data/lib/woods/console/audit_logger.rb +75 -0
  36. data/lib/woods/console/bridge.rb +177 -0
  37. data/lib/woods/console/confirmation.rb +90 -0
  38. data/lib/woods/console/connection_manager.rb +173 -0
  39. data/lib/woods/console/console_response_renderer.rb +74 -0
  40. data/lib/woods/console/embedded_executor.rb +373 -0
  41. data/lib/woods/console/model_validator.rb +81 -0
  42. data/lib/woods/console/rack_middleware.rb +87 -0
  43. data/lib/woods/console/safe_context.rb +82 -0
  44. data/lib/woods/console/server.rb +612 -0
  45. data/lib/woods/console/sql_validator.rb +172 -0
  46. data/lib/woods/console/tools/tier1.rb +118 -0
  47. data/lib/woods/console/tools/tier2.rb +117 -0
  48. data/lib/woods/console/tools/tier3.rb +110 -0
  49. data/lib/woods/console/tools/tier4.rb +79 -0
  50. data/lib/woods/coordination/pipeline_lock.rb +109 -0
  51. data/lib/woods/cost_model/embedding_cost.rb +88 -0
  52. data/lib/woods/cost_model/estimator.rb +128 -0
  53. data/lib/woods/cost_model/provider_pricing.rb +67 -0
  54. data/lib/woods/cost_model/storage_cost.rb +52 -0
  55. data/lib/woods/cost_model.rb +22 -0
  56. data/lib/woods/db/migrations/001_create_units.rb +38 -0
  57. data/lib/woods/db/migrations/002_create_edges.rb +35 -0
  58. data/lib/woods/db/migrations/003_create_embeddings.rb +37 -0
  59. data/lib/woods/db/migrations/004_create_snapshots.rb +45 -0
  60. data/lib/woods/db/migrations/005_create_snapshot_units.rb +40 -0
  61. data/lib/woods/db/migrations/006_rename_tables.rb +34 -0
  62. data/lib/woods/db/migrator.rb +73 -0
  63. data/lib/woods/db/schema_version.rb +73 -0
  64. data/lib/woods/dependency_graph.rb +236 -0
  65. data/lib/woods/embedding/indexer.rb +140 -0
  66. data/lib/woods/embedding/openai.rb +126 -0
  67. data/lib/woods/embedding/provider.rb +162 -0
  68. data/lib/woods/embedding/text_preparer.rb +112 -0
  69. data/lib/woods/evaluation/baseline_runner.rb +115 -0
  70. data/lib/woods/evaluation/evaluator.rb +139 -0
  71. data/lib/woods/evaluation/metrics.rb +79 -0
  72. data/lib/woods/evaluation/query_set.rb +148 -0
  73. data/lib/woods/evaluation/report_generator.rb +90 -0
  74. data/lib/woods/extracted_unit.rb +145 -0
  75. data/lib/woods/extractor.rb +1028 -0
  76. data/lib/woods/extractors/action_cable_extractor.rb +201 -0
  77. data/lib/woods/extractors/ast_source_extraction.rb +46 -0
  78. data/lib/woods/extractors/behavioral_profile.rb +309 -0
  79. data/lib/woods/extractors/caching_extractor.rb +261 -0
  80. data/lib/woods/extractors/callback_analyzer.rb +246 -0
  81. data/lib/woods/extractors/concern_extractor.rb +292 -0
  82. data/lib/woods/extractors/configuration_extractor.rb +219 -0
  83. data/lib/woods/extractors/controller_extractor.rb +404 -0
  84. data/lib/woods/extractors/database_view_extractor.rb +278 -0
  85. data/lib/woods/extractors/decorator_extractor.rb +253 -0
  86. data/lib/woods/extractors/engine_extractor.rb +223 -0
  87. data/lib/woods/extractors/event_extractor.rb +211 -0
  88. data/lib/woods/extractors/factory_extractor.rb +289 -0
  89. data/lib/woods/extractors/graphql_extractor.rb +892 -0
  90. data/lib/woods/extractors/i18n_extractor.rb +117 -0
  91. data/lib/woods/extractors/job_extractor.rb +374 -0
  92. data/lib/woods/extractors/lib_extractor.rb +218 -0
  93. data/lib/woods/extractors/mailer_extractor.rb +269 -0
  94. data/lib/woods/extractors/manager_extractor.rb +188 -0
  95. data/lib/woods/extractors/middleware_extractor.rb +133 -0
  96. data/lib/woods/extractors/migration_extractor.rb +469 -0
  97. data/lib/woods/extractors/model_extractor.rb +988 -0
  98. data/lib/woods/extractors/phlex_extractor.rb +252 -0
  99. data/lib/woods/extractors/policy_extractor.rb +191 -0
  100. data/lib/woods/extractors/poro_extractor.rb +229 -0
  101. data/lib/woods/extractors/pundit_extractor.rb +223 -0
  102. data/lib/woods/extractors/rails_source_extractor.rb +473 -0
  103. data/lib/woods/extractors/rake_task_extractor.rb +343 -0
  104. data/lib/woods/extractors/route_extractor.rb +181 -0
  105. data/lib/woods/extractors/scheduled_job_extractor.rb +331 -0
  106. data/lib/woods/extractors/serializer_extractor.rb +339 -0
  107. data/lib/woods/extractors/service_extractor.rb +217 -0
  108. data/lib/woods/extractors/shared_dependency_scanner.rb +91 -0
  109. data/lib/woods/extractors/shared_utility_methods.rb +281 -0
  110. data/lib/woods/extractors/state_machine_extractor.rb +398 -0
  111. data/lib/woods/extractors/test_mapping_extractor.rb +225 -0
  112. data/lib/woods/extractors/validator_extractor.rb +211 -0
  113. data/lib/woods/extractors/view_component_extractor.rb +311 -0
  114. data/lib/woods/extractors/view_template_extractor.rb +261 -0
  115. data/lib/woods/feedback/gap_detector.rb +89 -0
  116. data/lib/woods/feedback/store.rb +119 -0
  117. data/lib/woods/filename_utils.rb +32 -0
  118. data/lib/woods/flow_analysis/operation_extractor.rb +206 -0
  119. data/lib/woods/flow_analysis/response_code_mapper.rb +154 -0
  120. data/lib/woods/flow_assembler.rb +290 -0
  121. data/lib/woods/flow_document.rb +191 -0
  122. data/lib/woods/flow_precomputer.rb +102 -0
  123. data/lib/woods/formatting/base.rb +30 -0
  124. data/lib/woods/formatting/claude_adapter.rb +98 -0
  125. data/lib/woods/formatting/generic_adapter.rb +56 -0
  126. data/lib/woods/formatting/gpt_adapter.rb +64 -0
  127. data/lib/woods/formatting/human_adapter.rb +78 -0
  128. data/lib/woods/graph_analyzer.rb +374 -0
  129. data/lib/woods/mcp/bootstrapper.rb +96 -0
  130. data/lib/woods/mcp/index_reader.rb +394 -0
  131. data/lib/woods/mcp/renderers/claude_renderer.rb +81 -0
  132. data/lib/woods/mcp/renderers/json_renderer.rb +17 -0
  133. data/lib/woods/mcp/renderers/markdown_renderer.rb +353 -0
  134. data/lib/woods/mcp/renderers/plain_renderer.rb +240 -0
  135. data/lib/woods/mcp/server.rb +962 -0
  136. data/lib/woods/mcp/tool_response_renderer.rb +85 -0
  137. data/lib/woods/model_name_cache.rb +51 -0
  138. data/lib/woods/notion/client.rb +217 -0
  139. data/lib/woods/notion/exporter.rb +219 -0
  140. data/lib/woods/notion/mapper.rb +40 -0
  141. data/lib/woods/notion/mappers/column_mapper.rb +57 -0
  142. data/lib/woods/notion/mappers/migration_mapper.rb +39 -0
  143. data/lib/woods/notion/mappers/model_mapper.rb +161 -0
  144. data/lib/woods/notion/mappers/shared.rb +22 -0
  145. data/lib/woods/notion/rate_limiter.rb +68 -0
  146. data/lib/woods/observability/health_check.rb +79 -0
  147. data/lib/woods/observability/instrumentation.rb +34 -0
  148. data/lib/woods/observability/structured_logger.rb +57 -0
  149. data/lib/woods/operator/error_escalator.rb +81 -0
  150. data/lib/woods/operator/pipeline_guard.rb +92 -0
  151. data/lib/woods/operator/status_reporter.rb +80 -0
  152. data/lib/woods/railtie.rb +38 -0
  153. data/lib/woods/resilience/circuit_breaker.rb +99 -0
  154. data/lib/woods/resilience/index_validator.rb +167 -0
  155. data/lib/woods/resilience/retryable_provider.rb +108 -0
  156. data/lib/woods/retrieval/context_assembler.rb +261 -0
  157. data/lib/woods/retrieval/query_classifier.rb +133 -0
  158. data/lib/woods/retrieval/ranker.rb +277 -0
  159. data/lib/woods/retrieval/search_executor.rb +316 -0
  160. data/lib/woods/retriever.rb +152 -0
  161. data/lib/woods/ruby_analyzer/class_analyzer.rb +170 -0
  162. data/lib/woods/ruby_analyzer/dataflow_analyzer.rb +77 -0
  163. data/lib/woods/ruby_analyzer/fqn_builder.rb +18 -0
  164. data/lib/woods/ruby_analyzer/mermaid_renderer.rb +280 -0
  165. data/lib/woods/ruby_analyzer/method_analyzer.rb +143 -0
  166. data/lib/woods/ruby_analyzer/trace_enricher.rb +143 -0
  167. data/lib/woods/ruby_analyzer.rb +87 -0
  168. data/lib/woods/session_tracer/file_store.rb +104 -0
  169. data/lib/woods/session_tracer/middleware.rb +143 -0
  170. data/lib/woods/session_tracer/redis_store.rb +106 -0
  171. data/lib/woods/session_tracer/session_flow_assembler.rb +254 -0
  172. data/lib/woods/session_tracer/session_flow_document.rb +223 -0
  173. data/lib/woods/session_tracer/solid_cache_store.rb +139 -0
  174. data/lib/woods/session_tracer/store.rb +81 -0
  175. data/lib/woods/storage/graph_store.rb +120 -0
  176. data/lib/woods/storage/metadata_store.rb +196 -0
  177. data/lib/woods/storage/pgvector.rb +195 -0
  178. data/lib/woods/storage/qdrant.rb +205 -0
  179. data/lib/woods/storage/vector_store.rb +167 -0
  180. data/lib/woods/temporal/json_snapshot_store.rb +245 -0
  181. data/lib/woods/temporal/snapshot_store.rb +345 -0
  182. data/lib/woods/token_utils.rb +19 -0
  183. data/lib/woods/version.rb +5 -0
  184. data/lib/woods.rb +246 -0
  185. metadata +270 -0
checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA256:
3
+ metadata.gz: ab164a85b76d9c97fc6142836da5349a444e9c62f507622fb327f5cc8f434ed4
4
+ data.tar.gz: 66752a95ddb4183a6f78d47417690242cfc3ad2bdfc622b8740fe2fbc388658e
5
+ SHA512:
6
+ metadata.gz: 2d53024eefb62544ba536f23b1c9f36bebab988fc75223ef72e1d2ffd1d2ed0b46b2507781b040726b8059d14c9f6eefa3faa1c4d6b0a4b6c5019905ef41675d
7
+ data.tar.gz: 8d5c7a1e7ab4c7b401e61140a9ec5bea06848244d08192f05b0cc088a93980b3208cf3f22a0319545857051dc0b2a234f4d4c2ef8a5789ef108080f179aa6f99
data/CHANGELOG.md ADDED
@@ -0,0 +1,89 @@
1
+ # Changelog
2
+
3
+ All notable changes to this project will be documented in this file.
4
+
5
+ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
6
+ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
7
+
8
+ ## [0.3.1] - 2026-03-04
9
+
10
+ ### Fixed
11
+
12
+ - **Gemspec version** now reads from `version.rb` instead of being hardcoded — prevents version mismatch during gem builds
13
+ - **Release workflow** replaced `rake release` (fails on tag-triggered detached HEAD) with `gem build` + `gem push`
14
+
15
+ ## [0.3.0] - 2026-03-04
16
+
17
+ ### Added
18
+
19
+ - **Redis/SolidCache caching layer** for retrieval pipeline with TTL, namespace isolation, and nil-caching
20
+ - **Engine classification** — engines tagged as `:framework` or `:application` based on install path (handles Docker vendor paths)
21
+ - **Graph analysis staleness tracking** — `generated_at` timestamp and `graph_sha` for detecting stale analysis
22
+ - **Docker setup guide** (`docs/DOCKER_SETUP.md`) — split architecture, volume mounts, bridge mode, troubleshooting
23
+ - **Context7 documentation suite** — 10 new user-facing docs optimized for AI retrieval: FAQ, Troubleshooting, Architecture, Extractor Reference, WHY Woods, MCP Tool Cookbook, and 3 Context7 skills
24
+ - **`context7.json`** configuration for controlling Context7 indexing scope
25
+
26
+ ### Fixed
27
+
28
+ - **Vendor path leak** in source file resolution across 9 extractors — framework gems under `vendor/bundle` no longer produce empty source
29
+ - **Prism cross-version compatibility** — handle API differences between Prism versions
30
+ - **`schema_sha`** now supports `db/structure.sql` fallback (not just `db/schema.rb`)
31
+ - **ViewComponent extractor** skips framework-internal components with no resolvable source file
32
+ - **HTTP connection reuse** and retry handling in embedding providers
33
+ - **DependencyGraph `to_h`** returns a dup to prevent cache pollution
34
+ - **MCP tool counts** corrected across all documentation (27 index / 31 console)
35
+ - **TROUBLESHOOTING.md** corrected: `config.extractors` controls retrieval scope, not which extractors run
36
+
37
+ ### Changed
38
+
39
+ - **README streamlined** from 620 to 325 lines — added Quick Start, Documentation table; removed verbose sections in favor of links to dedicated docs
40
+ - **Internal rake tasks** (`retrieve`, `self_analyze`) hidden from `rails -T`
41
+ - **Estimated tokens memoization** removed to prevent stale values after source changes
42
+ - **Simplification sweep** — dead code removal, shared helper extraction, bug fixes across caching and retrieval layers
43
+
44
+ ### Performance
45
+
46
+ - Critical hotspots fixed across extraction, storage, and retrieval pipelines
47
+ - `fetch_key` optimization for falsy value handling in cache layer
48
+
49
+ ## [0.2.1] - 2026-02-19
50
+
51
+ ### Changed
52
+
53
+ - Switch release workflow to RubyGems trusted publishing
54
+
55
+ ## [0.2.0] - 2026-02-19
56
+
57
+ ### Added
58
+
59
+ - **Embedded console MCP server** for zero-config Rails querying (no bridge process needed)
60
+ - **Console MCP setup guide** (`docs/CONSOLE_MCP_SETUP.md`) — stdio, Docker, HTTP/Rack, SSH bridge options
61
+ - **CODEOWNERS** and issue template configuration
62
+
63
+ ### Fixed
64
+
65
+ - MCP gem compatibility and symbol key handling in embedded executor
66
+ - Duplicate URI warning in gemspec
67
+
68
+ ## [0.1.0] - 2026-02-18
69
+
70
+ ### Added
71
+
72
+ - **Extraction layer** with 13 extractors: Model, Controller, Service, Job, Mailer, Phlex, ViewComponent, GraphQL, Serializer, Manager, Policy, Validator, RailsSource
73
+ - **Dependency graph** with PageRank scoring and GraphAnalyzer (orphans, hubs, cycles, bridges)
74
+ - **Storage interfaces** with InMemory, SQLite, Pgvector, and Qdrant adapters
75
+ - **Embedding pipeline** with OpenAI and Ollama providers, TextPreparer, resumable Indexer
76
+ - **Semantic chunking** with type-aware splitting (model sections, controller per-action)
77
+ - **Context formatting** adapters for Claude, GPT, generic LLMs, and humans
78
+ - **Retrieval pipeline** with QueryClassifier, SearchExecutor, RRF Ranker, ContextAssembler
79
+ - **Retriever orchestrator** with degradation tiers and RetrievalTrace
80
+ - **Schema management** with versioned migrations and Rails generators
81
+ - **Observability** with ActiveSupport::Notifications instrumentation, structured logging, health checks
82
+ - **Resilience** with CircuitBreaker, RetryableProvider, IndexValidator
83
+ - **MCP Index Server** (21 tools) for AI agent codebase retrieval
84
+ - **Console MCP Server** (31 tools across 4 tiers) for live Rails data access
85
+ - **AST layer** with Prism adapter for method extraction and call site analysis
86
+ - **RubyAnalyzer** for class, method, and data flow analysis
87
+ - **Flow extraction** with FlowAssembler, OperationExtractor, FlowDocument
88
+ - **Evaluation harness** with Precision@k, Recall, MRR metrics and baseline comparisons
89
+ - **Rake tasks** for extraction, incremental indexing, framework source, validation, stats, evaluation
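The "RRF Ranker" entry above presumably refers to Reciprocal Rank Fusion, a common way to merge several ranked result lists into one. The sketch below shows the general technique only; the `rrf_fuse` helper, the constant `k = 60`, and the unit names are assumptions for illustration and are not taken from the gem's source.

```ruby
# Reciprocal Rank Fusion: each list contributes 1 / (k + rank) to an item's score,
# where rank starts at 1. Items ranked well in several lists rise to the top.
# Hypothetical helper, not the gem's Ranker.
def rrf_fuse(rankings, k: 60)
  scores = Hash.new(0.0)
  rankings.each do |ranking|
    ranking.each_with_index { |id, index| scores[id] += 1.0 / (k + index + 1) }
  end
  # Highest score first; ties are broken by identifier so the output is stable.
  scores.sort_by { |id, score| [-score, id] }.map(&:first)
end

keyword_hits = %w[User Order Cart]   # e.g. results from a keyword search
vector_hits  = %w[Order Cart User]   # e.g. results from a vector search
fused = rrf_fuse([keyword_hits, vector_hits])
```

Because only ranks are used, the fused order does not depend on the raw scores of each search backend being comparable.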
data/CODE_OF_CONDUCT.md ADDED
@@ -0,0 +1,83 @@
1
+ # Contributor Covenant Code of Conduct
2
+
3
+ ## Our Pledge
4
+
5
+ We as members, contributors, and leaders pledge to make participation in our community a harassment-free experience for everyone, regardless of age, body size, visible or invisible disability, ethnicity, sex characteristics, gender identity and expression, level of experience, education, socio-economic status, nationality, personal appearance, race, caste, color, religion, or sexual identity and orientation.
6
+
7
+ We pledge to act and interact in ways that contribute to an open, welcoming, diverse, inclusive, and healthy community.
8
+
9
+ ## Our Standards
10
+
11
+ Examples of behavior that contributes to a positive environment for our community include:
12
+
13
+ * Demonstrating empathy and kindness toward other people
14
+ * Being respectful of differing opinions, viewpoints, and experiences
15
+ * Giving and gracefully accepting constructive feedback
16
+ * Accepting responsibility and apologizing to those affected by our mistakes, and learning from the experience
17
+ * Focusing on what is best not just for us as individuals, but for the overall community
18
+
19
+ Examples of unacceptable behavior include:
20
+
21
+ * The use of sexualized language or imagery, and sexual attention or advances of any kind
22
+ * Trolling, insulting or derogatory comments, and personal or political attacks
23
+ * Public or private harassment
24
+ * Publishing others' private information, such as a physical or email address, without their explicit permission
25
+ * Other conduct which could reasonably be considered inappropriate in a professional setting
26
+
27
+ ## Enforcement Responsibilities
28
+
29
+ Community leaders are responsible for clarifying and enforcing our standards of acceptable behavior and will take appropriate and fair corrective action in response to any behavior that they deem inappropriate, threatening, offensive, or harmful.
30
+
31
+ Community leaders have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct, and will communicate reasons for moderation decisions when appropriate.
32
+
33
+ ## Scope
34
+
35
+ This Code of Conduct applies within all community spaces, and also applies when an individual is officially representing the community in public spaces. Examples of representing our community include using an official e-mail address, posting via an official social media account, or acting as an appointed representative at an online or offline event.
36
+
37
+ ## Enforcement
38
+
39
+ Instances of abusive, harassing, or otherwise unacceptable behavior may be reported to the community leaders responsible for enforcement at info@leah.wtf. All complaints will be reviewed and investigated promptly and fairly.
40
+
41
+ All community leaders are obligated to respect the privacy and security of the reporter of any incident.
42
+
43
+ ## Enforcement Guidelines
44
+
45
+ Community leaders will follow these Community Impact Guidelines in determining the consequences for any action they deem in violation of this Code of Conduct:
46
+
47
+ ### 1. Correction
48
+
49
+ **Community Impact**: Use of inappropriate language or other behavior deemed unprofessional or unwelcome in the community.
50
+
51
+ **Consequence**: A private, written warning from community leaders, providing clarity around the nature of the violation and an explanation of why the behavior was inappropriate. A public apology may be requested.
52
+
53
+ ### 2. Warning
54
+
55
+ **Community Impact**: A violation through a single incident or series of actions.
56
+
57
+ **Consequence**: A warning with consequences for continued behavior. No interaction with the people involved, including unsolicited interaction with those enforcing the Code of Conduct, for a specified period of time. This includes avoiding interactions in community spaces as well as external channels like social media. Violating these terms may lead to a temporary or permanent ban.
58
+
59
+ ### 3. Temporary Ban
60
+
61
+ **Community Impact**: A serious violation of community standards, including sustained inappropriate behavior.
62
+
63
+ **Consequence**: A temporary ban from any sort of interaction or public communication with the community for a specified period of time. No public or private interaction with the people involved, including unsolicited interaction with those enforcing the Code of Conduct, is allowed during this period. Violating these terms may lead to a permanent ban.
64
+
65
+ ### 4. Permanent Ban
66
+
67
+ **Community Impact**: Demonstrating a pattern of violation of community standards, including sustained inappropriate behavior, harassment of an individual, or aggression toward or disparagement of classes of individuals.
68
+
69
+ **Consequence**: A permanent ban from any sort of public interaction within the community.
70
+
71
+ ## Attribution
72
+
73
+ This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 2.1, available at [https://www.contributor-covenant.org/version/2/1/code_of_conduct.html][v2.1].
74
+
75
+ Community Impact Guidelines were inspired by [Mozilla's code of conduct enforcement ladder][Mozilla CoC].
76
+
77
+ For answers to common questions about this code of conduct, see the FAQ at [https://www.contributor-covenant.org/faq][FAQ]. Translations are available at [https://www.contributor-covenant.org/translations][translations].
78
+
79
+ [homepage]: https://www.contributor-covenant.org
80
+ [v2.1]: https://www.contributor-covenant.org/version/2/1/code_of_conduct.html
81
+ [Mozilla CoC]: https://github.com/mozilla/diversity
82
+ [FAQ]: https://www.contributor-covenant.org/faq
83
+ [translations]: https://www.contributor-covenant.org/translations
data/CONTRIBUTING.md ADDED
@@ -0,0 +1,65 @@
1
+ # Contributing to Woods
2
+
3
+ Thank you for your interest in contributing to Woods!
4
+
5
+ ## Bug Reports
6
+
7
+ Please open an issue on GitHub with:
8
+
9
+ - A clear description of the bug
10
+ - Steps to reproduce
11
+ - Expected vs. actual behavior
12
+ - Your Ruby version, Rails version, and database adapter
13
+
14
+ ## Feature Requests
15
+
16
+ Open an issue describing:
17
+
18
+ - The problem you're trying to solve
19
+ - Your proposed solution
20
+ - Any alternatives you've considered
21
+
22
+ ## Pull Requests
23
+
24
+ 1. Fork the repo and create your branch from `main`
25
+ 2. Install dependencies: `bin/setup`
26
+ 3. Make your changes
27
+ 4. Add tests for new functionality
28
+ 5. Ensure the test suite passes: `bundle exec rake spec`
29
+ 6. Ensure code style passes: `bundle exec rubocop`
30
+ 7. Update CHANGELOG.md with your changes
31
+ 8. Open a pull request
32
+
33
+ ## Development Setup
34
+
35
+ ```bash
36
+ git clone https://github.com/lost-in-the/woods.git
37
+ cd woods
38
+ bin/setup
39
+ bundle exec rake spec # Run tests
40
+ bundle exec rubocop # Check style
41
+ ```
42
+
43
+ ## Testing
44
+
45
+ Woods has two test suites:
46
+
47
+ - **Gem unit specs** (`spec/`): Run with `bundle exec rake spec`. No Rails boot required.
48
+ - **Integration specs**: Run inside a host Rails app to test real extraction.
49
+
50
+ All new features need tests. Bug fixes should include a regression test.
51
+
52
+ ## Code Style
53
+
54
+ - `frozen_string_literal: true` on every file
55
+ - YARD documentation on public methods
56
+ - `rescue StandardError`, never bare `rescue`
57
+ - All extractors return `Array<ExtractedUnit>`
58
+
59
+ ## Runtime Introspection Requirement
60
+
61
+ Woods uses runtime introspection, not static parsing. If your feature requires access to Rails internals (ActiveRecord reflections, route introspection, etc.), it must run inside a booted Rails environment. Unit tests should use mocks/stubs; integration tests should run in a real Rails app.
62
+
63
+ ## License
64
+
65
+ By contributing, you agree that your contributions will be licensed under the MIT License.
data/LICENSE.txt ADDED
@@ -0,0 +1,21 @@
1
+ The MIT License (MIT)
2
+
3
+ Copyright (c) 2024-2026 Leah Armstrong
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,406 @@
1
+ # Woods
2
+
3
+ **Your AI coding assistant is guessing about your Rails app. Woods gives it the real answers.**
4
+
5
+ Rails hides enormous amounts of behavior behind conventions, concerns, and runtime magic. When you ask an AI assistant "what callbacks fire when a User saves?" or "what routes map to this controller?", it guesses from training data — and gets it wrong. Woods runs *inside* your Rails app, extracts what's actually happening at runtime, and serves that context directly to your AI tools via [MCP](https://modelcontextprotocol.io/).
6
+
7
+ Works with **Claude Code**, **Cursor**, **Windsurf**, and any MCP-compatible tool.
8
+
9
+ ---
10
+
11
+ ## The Problem
12
+
13
+ Ask your AI assistant about your Rails app and watch it confidently hallucinate:
14
+
15
+ | You ask | What the AI says | What's actually true |
16
+ |---------|-----------------|---------------------|
17
+ | "What callbacks fire when User saves?" | `before_save :set_slug` | 11 callbacks across 4 files, including 3 from concerns |
18
+ | "What routes map to OrdersController?" | Standard REST routes | Custom `POST /checkout`, nested under `/shops/:shop_id` |
19
+ | "What does the checkout flow do?" | Describes `CheckoutService` | Misses that `order.save!` triggers 3 callbacks that enqueue 2 jobs |
20
+
21
+ The AI isn't bad; it just can't see what Rails is doing. Your 40-line model file can carry roughly ten times that much behavior once you factor in included concerns, schema context, callback chains, validations, and association reflections. Static analysis of the file alone misses most of it.
22
+
23
+ **Woods fixes this by running inside Rails and extracting what's actually there.**
24
+
25
+ See [Why Woods?](docs/WHY_CODEBASE_INDEX.md) for detailed before/after examples.
26
+
27
+ ---
28
+
29
+ ## Quick Start
30
+
31
+ Five steps from install to asking questions:
32
+
33
+ ```bash
34
+ # 1. Add to your Rails app's Gemfile
35
+ gem 'woods', group: :development
36
+
37
+ # 2. Install and configure
38
+ bundle install
39
+ rails generate woods:install
40
+
41
+ # 3. Extract your codebase (the rake task boots your Rails environment)
42
+ bundle exec rake woods:extract
43
+ # Aliases: woods:scan
44
+
45
+ # 4. Verify it worked
46
+ bundle exec rake woods:stats
47
+ # Aliases: woods:look
48
+
49
+ # 5. Add the MCP server to your AI tool (see "Connect to Your AI Tool" below)
50
+ ```
51
+
52
+ After extraction, your AI tool gets accurate, structured context about every model, controller, service, job, route, and more — including all the behavior that Rails hides.
53
+
54
+ > **Docker?** Run extraction inside the container: `docker compose exec app bundle exec rake woods:extract`. The MCP server runs on the host reading volume-mounted output. See [Docker Setup](docs/DOCKER_SETUP.md).
55
+
56
+ See [Getting Started](docs/GETTING_STARTED.md) for the full walkthrough including storage presets, CI setup, and common first-run issues.
57
+
58
+ ---
59
+
60
+ ## What Does It Actually Do?
61
+
62
+ Woods boots your Rails app, introspects everything using runtime APIs, and writes structured JSON that your AI tools can read. Here's what that means in practice:
63
+
64
+ ### Concern Inlining
65
+
66
+ Your `User` model includes `Auditable`, `Searchable`, and `SoftDeletable`. An AI tool reading `app/models/user.rb` sees 40 lines. Woods inlines all three concerns directly into the extracted unit — the AI sees the full 200-line behavioral surface area in one block.
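As a rough illustration of what inlining has to discover, the sketch below uses plain Ruby reflection to list the methods a class gains from its mixins. It is not Woods's implementation: the modules, the `User` class, and the `behavioral_surface` helper are hypothetical, and plain modules stand in for `ActiveSupport::Concern`.

```ruby
# Hypothetical concerns; plain Ruby modules stand in for ActiveSupport::Concern.
module Auditable
  def audit_trail; end
end

module Searchable
  def search_terms; end
end

# Hypothetical model that mixes both concerns in.
class User
  include Auditable
  include Searchable

  def full_name; end
end

# Group instance methods by where they are defined, as seen at runtime.
# Modules that every Object already includes (such as Kernel) are ignored.
def behavioral_surface(klass)
  mixins = klass.included_modules - Object.included_modules
  surface = { klass.name => klass.instance_methods(false).sort }
  mixins.each { |mod| surface[mod.name] = mod.instance_methods(false).sort }
  surface
end

surface = behavioral_surface(User)
```

The file that defines `User` mentions only `full_name`, yet the runtime view also lists the methods contributed by `Auditable` and `Searchable`. A real extractor would additionally need each concern's source text, which this sketch does not attempt.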
67
+
68
+ ### Schema Prepending
69
+
70
+ Model source gets a header with actual column types, indexes, and foreign keys pulled from the live database. No more guessing whether `name` is a `string` or `text`, or whether there's an index on `email`.
71
+
72
+ ### Route Binding
73
+
74
+ Controller source gets a route map prepended showing the real HTTP verb + path + constraints for every action. No more assuming standard REST when your app has custom routes and nested resources.
75
+
76
+ ### Dependency Graph
77
+
78
+ 34 extractors build a bidirectional graph: what each unit depends on, and what depends on it. Change a concern and trace every model it touches. Refactor a service and see every controller that calls it. PageRank scoring identifies the most important nodes in your codebase.
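A bidirectional graph with PageRank scoring can be sketched in a few lines of plain Ruby. This is a minimal illustration under stated assumptions, not the gem's `DependencyGraph`: the unit names, the `dependents_of` and `page_rank` helpers, the damping factor of 0.85, and the fixed iteration count are all hypothetical.

```ruby
# Hypothetical units; each key depends on the units listed in its value.
DEPENDS_ON = {
  "OrdersController" => %w[CheckoutService Order],
  "CheckoutService" => %w[Order User],
  "Order" => %w[User],
  "User" => []
}.freeze

# Invert the forward edges to answer "what depends on this unit?".
def dependents_of(graph)
  reverse = graph.keys.to_h { |name| [name, []] }
  graph.each { |from, targets| targets.each { |to| reverse[to] << from } }
  reverse
end

# Power-iteration PageRank; rank flows from a unit to the units it depends on.
def page_rank(graph, damping: 0.85, iterations: 50)
  n = graph.size.to_f
  ranks = graph.keys.to_h { |name| [name, 1.0 / n] }
  iterations.times do
    # Units with no outgoing edges spread their rank evenly over all units.
    dangling = graph.sum { |name, targets| targets.empty? ? ranks[name] : 0.0 }
    base = (1.0 - damping) / n + damping * dangling / n
    nxt = graph.keys.to_h { |name| [name, base] }
    graph.each do |name, targets|
      targets.each { |to| nxt[to] += damping * ranks[name] / targets.size }
    end
    ranks = nxt
  end
  ranks
end

ranks = page_rank(DEPENDS_ON)
most_depended_on = ranks.max_by { |_name, rank| rank }.first
```

In this toy graph every path eventually reaches `User`, so it ends up with the highest rank, while `dependents_of(DEPENDS_ON)["User"]` lists the units to re-check when `User` changes.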
79
+
80
+ ### Callback Side-Effect Analysis
81
+
82
+ `CallbackAnalyzer` detects what actually happens inside callbacks: which columns get written, which jobs get enqueued, which services get called, and which mailers fire. Callback side effects are a frequent source of unexpected bugs in Rails apps, and something AI tools often get wrong.
83
+
84
+ ---
85
+
86
+ ## Connect to Your AI Tool
87
+
88
+ Woods ships two MCP servers. Most users only need the **Index Server**.
89
+
90
+ ### Index Server — Reads Pre-Extracted Data (No Rails Required)
91
+
92
+ 27 tools for code lookup, dependency traversal, semantic search, graph analysis, and more. Reads static JSON from disk — fast, no Rails boot needed.
93
+
94
+ **Claude Code** — add to `.mcp.json` in your project root:
95
+
96
+ ```json
97
+ {
98
+ "mcpServers": {
99
+ "woods": {
100
+ "command": "woods-mcp-start",
101
+ "args": ["./tmp/woods"]
102
+ }
103
+ }
104
+ }
105
+ ```
106
+
107
+ > `woods-mcp-start` is a self-healing wrapper that validates the index, checks dependencies, and auto-restarts on failure. Recommended for Claude Code.
108
+
109
+ **Cursor / Windsurf** — add to your MCP config:
110
+
111
+ ```json
112
+ {
113
+ "mcpServers": {
114
+ "woods": {
115
+ "command": "woods-mcp",
116
+ "args": ["/path/to/your-rails-app/tmp/woods"]
117
+ }
118
+ }
119
+ }
120
+ ```
121
+
122
+ ### Console Server — Live Rails Queries (Optional)
123
+
124
+ 31 tools for querying real database records, monitoring job queues, running model diagnostics, and checking schema. Connects to a live Rails process. Every query runs in a rolled-back transaction with SQL validation — safe for development use.
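To give a feel for what "SQL validation" can mean, here is a deliberately conservative read-only check in plain Ruby. It is a sketch only and not the gem's `SqlValidator`: the `read_only_sql?` helper and the keyword list are assumptions, and a production validator would need a proper SQL parser.

```ruby
# Hypothetical keyword blocklist; any statement containing one of these words is rejected.
FORBIDDEN_KEYWORDS = %w[INSERT UPDATE DELETE DROP ALTER TRUNCATE CREATE GRANT REVOKE].freeze

# Returns true only for a single statement that starts with SELECT or WITH
# and contains none of the forbidden keywords as whole words.
def read_only_sql?(sql)
  statement = sql.strip.sub(/;\s*\z/, "")   # allow one trailing semicolon
  return false if statement.include?(";")   # reject stacked statements
  return false unless statement.match?(/\A(SELECT|WITH)\b/i)

  FORBIDDEN_KEYWORDS.none? { |keyword| statement.match?(/\b#{keyword}\b/i) }
end

allowed = read_only_sql?("SELECT id, created_at FROM users WHERE id = 1;")
blocked = read_only_sql?("SELECT 1; DROP TABLE users")
```

The check errs on the side of rejecting queries: a string literal that happens to contain the word `update` would also be refused. In a Rails process, a guard like this would typically be paired with running the query inside a transaction that raises `ActiveRecord::Rollback`, so that nothing persists even if a write slips through.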
125
+
126
+ ```json
127
+ {
128
+ "mcpServers": {
129
+ "woods-console": {
130
+ "command": "bundle",
131
+ "args": ["exec", "rake", "woods:console"],
132
+ "cwd": "/path/to/your-rails-app"
133
+ }
134
+ }
135
+ }
136
+ ```
137
+
138
+ See [MCP Servers](docs/MCP_SERVERS.md) for the full tool catalog and [MCP Tool Cookbook](docs/MCP_TOOL_COOKBOOK.md) for scenario-based examples.
139
+
140
+ ---
141
+
142
+ ## What Gets Extracted
143
+
144
+ 34 extractors cover every major Rails concept:
145
+
146
+ | Category | What's Extracted | Key Details |
147
+ |----------|-----------------|-------------|
148
+ | **Models** | Schema, associations, validations, scopes, callbacks, enums | Concerns inlined, callback side-effects analyzed |
149
+ | **Controllers** | Actions, filters, permitted params, response formats | Route map prepended, per-action filter chains |
150
+ | **Services & Jobs** | Entry points, dependencies, retry config, queue names | Includes services, interactors, operations, commands |
151
+ | **Views & Components** | ERB templates, Phlex components, ViewComponents | Partial references, slot definitions, prop interfaces |
152
+ | **Routes & Middleware** | Full route table, middleware stack order | Constraint resolution, engine mount points |
153
+ | **GraphQL** | Types, mutations, resolvers, fields | Relay connections, argument definitions |
154
+ | **Background Work** | Jobs, mailers, Action Cable channels, scheduled tasks | Queue configuration, retry policies |
155
+ | **Data Layer** | Migrations, database views, state machines, events | DDL metadata, reversibility, transition graphs |
156
+ | **Testing** | Factories, test-to-source mappings | FactoryBot definitions, spec file associations |
157
+ | **Framework Source** | Rails internals, gem source for exact installed versions | Pinned to your `Gemfile.lock` versions |
158
+
159
+ See [Extractor Reference](docs/EXTRACTOR_REFERENCE.md) for per-extractor documentation with configuration options and example output.
160
+
161
+ ---
162
+
163
+ ## Use Cases
164
+
165
+ ### For AI-Assisted Development
166
+
167
+ - **Context-aware code generation** — your AI sees the full model (with concerns, schema, and callbacks) before writing new code
168
+ - **Feature planning** — query the dependency graph to understand blast radius before changing anything
169
+ - **PR context** — compute affected units from a diff and explain downstream impact
170
+ - **Code review** — surface hidden callback side-effects that a reviewer might miss
171
+ - **Onboarding** — new team members ask "how does checkout work?" and get the real execution flow
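
The "PR context" idea above amounts to walking reverse edges of the dependency graph from whatever a diff touched. The following is a minimal sketch of that walk, not Woods' implementation: the `SAMPLE_DEPENDENTS` map and the `affected_units` helper are hypothetical names introduced for illustration.

```ruby
require 'set'

# Hypothetical reverse-edge map (an assumption for this sketch):
# each unit maps to the units that depend on it.
SAMPLE_DEPENDENTS = {
  'User'             => %w[OrdersController CheckoutService],
  'Order'            => %w[CheckoutService OrderMailer],
  'CheckoutService'  => %w[OrdersController],
  'OrdersController' => [],
  'OrderMailer'      => []
}.freeze

# Breadth-first walk over reverse edges, starting from the changed units.
# Returns the changed units plus everything reachable from them, sorted.
def affected_units(changed, dependents)
  seen = Set.new(changed)
  queue = changed.dup
  until queue.empty?
    unit = queue.shift
    dependents.fetch(unit, []).each do |dependent|
      # Set#add? returns nil when the element is already present.
      queue << dependent if seen.add?(dependent)
    end
  end
  seen.to_a.sort
end

puts affected_units(['Order'], SAMPLE_DEPENDENTS).inspect
```

In this sample graph, changing `Order` reaches `CheckoutService`, `OrderMailer`, and, through `CheckoutService`, `OrdersController`.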
172
+
173
+ ### For Architecture & Technical Debt
174
+
175
+ - **Dead code detection** — `GraphAnalyzer` finds orphaned units with no dependents
176
+ - **Hub identification** — find models with 50+ dependents that are bottlenecks
177
+ - **Cycle detection** — circular dependencies surfaced automatically
178
+ - **Migration risk** — DDL metadata shows which pending migrations touch large tables
179
+ - **API surface audit** — every endpoint, its method, path, filters, and permitted params
180
+ - **Callback chain auditing** — the #1 source of Rails bugs, now visible and traceable
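
The orphan, hub, and cycle checks listed above can be illustrated on a plain hash of forward edges. This is a rough sketch using Ruby's standard `tsort` library, not the `GraphAnalyzer` API: the `SAMPLE_EDGES` data, the helper names, and the threshold of 3 are all assumptions chosen for the example.

```ruby
require 'tsort'

# Hypothetical forward-edge map: each unit maps to the units it depends on.
SAMPLE_EDGES = {
  'OrdersController' => %w[Order CheckoutService],
  'CheckoutService'  => %w[Order Invoice],
  'Order'            => %w[Invoice],
  'Invoice'          => %w[Order],  # Order <-> Invoice form a cycle
  'LegacyExporter'   => %w[Order]   # nothing depends on this unit
}.freeze

# Counts how many units depend on each unit (in-degree).
def dependent_counts(graph)
  counts = Hash.new(0)
  graph.each_key { |unit| counts[unit] += 0 } # make sure every unit has an entry
  graph.each_value { |deps| deps.each { |dep| counts[dep] += 1 } }
  counts
end

# Units that nothing depends on.
def orphans(graph)
  dependent_counts(graph).select { |_unit, count| count.zero? }.keys.sort
end

# Units with at least `threshold` dependents.
def hubs(graph, threshold)
  dependent_counts(graph).select { |_unit, count| count >= threshold }.keys.sort
end

# Strongly connected components with more than one member are cycles.
# Self-references (a unit depending on itself) are not reported here.
def cycles(graph)
  each_node  = ->(&block) { graph.each_key(&block) }
  each_child = ->(unit, &block) { graph.fetch(unit, []).each(&block) }
  TSort.strongly_connected_components(each_node, each_child)
       .select { |component| component.size > 1 }
       .map(&:sort)
end

puts "orphans: #{orphans(SAMPLE_EDGES).inspect}"
puts "hubs:    #{hubs(SAMPLE_EDGES, 3).inspect}"
puts "cycles:  #{cycles(SAMPLE_EDGES).inspect}"
```

Note that `OrdersController` shows up as an orphan in this naive version. Entry points such as controllers have no dependents by design, so a real dead-code check would need to exclude them.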
181
+
182
+ ---
183
+
184
+ ## Configuration
185
+
186
+ ### Zero-Config Start
187
+
188
+ The install generator creates a working configuration. The only option it needs is `output_dir`, which defaults to `tmp/woods`, so setting it explicitly is optional:
189
+
190
+ ```ruby
191
+ # config/initializers/woods.rb
192
+ Woods.configure do |config|
193
+   config.output_dir = Rails.root.join('tmp/woods')
194
+ end
195
+ ```
196
+
197
+ ### Storage Presets
198
+
199
+ For embedding and semantic search, use a preset to configure storage and embedding together:
200
+
201
+ ```ruby
202
+ # Local development — no external services needed
203
+ Woods.configure_with_preset(:local)
204
+
205
+ # PostgreSQL — pgvector + OpenAI embeddings
206
+ Woods.configure_with_preset(:postgresql)
207
+
208
+ # Production scale — Qdrant + OpenAI embeddings
209
+ Woods.configure_with_preset(:production)
210
+ ```
211
+
212
+ ### Backend Compatibility
213
+
214
+ Woods is backend-agnostic. Your app database, vector store, embedding provider, and job system are all configurable independently:
215
+
216
+ | Component | Options |
217
+ |-----------|---------|
218
+ | **App Database** | MySQL, PostgreSQL, SQLite |
219
+ | **Vector Store** | In-memory, pgvector, Qdrant |
220
+ | **Embeddings** | OpenAI, Ollama (local, free) |
221
+ | **Job System** | Sidekiq, Solid Queue, GoodJob, inline |
222
+ | **View Layer** | ERB, Phlex, ViewComponent |
223
+
224
+ See [Backend Matrix](docs/BACKEND_MATRIX.md) for supported combinations and [Configuration Reference](docs/CONFIGURATION_REFERENCE.md) for every option with defaults.
225
+
226
+ ---
227
+
228
+ ## Keeping the Index Current
229
+
230
+ ### Incremental Updates
231
+
232
+ After the initial extraction, update only changed files — typically 5-10x faster:
233
+
234
+ ```bash
235
+ bundle exec rake woods:incremental
236
+ # Alias: woods:tend
237
+ ```
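
How Woods decides which files changed is not described here. As a generic illustration of incremental change detection, the sketch below compares SHA-256 checksums between two snapshots of a directory. The helper names `checksums` and `changed_paths` are hypothetical, and the approach is an assumption rather than the gem's actual mechanism.

```ruby
require 'digest'
require 'fileutils'
require 'tmpdir'

# Maps each Ruby file under `root` (relative path) to its SHA-256 digest.
def checksums(root)
  Dir.glob('**/*.rb', base: root).sort.to_h do |path|
    [path, Digest::SHA256.file(File.join(root, path)).hexdigest]
  end
end

# Compares two checksum snapshots.
# :modified covers new files and files whose content changed.
def changed_paths(previous, current)
  {
    modified: current.reject { |path, sum| previous[path] == sum }.keys,
    removed: previous.keys - current.keys
  }
end

# Demo on a throwaway directory: edit one file, delete another.
result = Dir.mktmpdir do |root|
  FileUtils.mkdir_p(File.join(root, 'app/models'))
  user  = File.join(root, 'app/models/user.rb')
  order = File.join(root, 'app/models/order.rb')
  File.write(user, "class User; end\n")
  File.write(order, "class Order; end\n")
  before = checksums(root)

  File.write(user, "class User\n  def name; end\nend\n")
  File.delete(order)
  changed_paths(before, checksums(root))
end

puts result.inspect
```

Only the paths in `:modified` would need re-extraction, and those in `:removed` would be dropped from the index.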
238
+
239
+ ### CI Integration
240
+
241
+ ```yaml
242
+ # .github/workflows/index.yml
243
+ jobs:
244
+   index:
245
+     runs-on: ubuntu-latest
246
+     steps:
247
+       - uses: actions/checkout@v4
248
+         with:
249
+           fetch-depth: 2
250
+       - name: Update index
251
+         run: bundle exec rake woods:incremental
252
+         env:
253
+           GITHUB_BASE_REF: ${{ github.base_ref }}
254
+ ```
255
+
256
+ ### Other Tasks
257
+
258
+ ```bash
259
+ rake woods:validate           # Check index integrity (alias: woods:vet)
260
+ rake woods:stats              # Show unit counts and graph stats (alias: woods:look)
261
+ rake woods:clean              # Remove index output (alias: woods:clear)
262
+ rake woods:embed              # Embed units for semantic search (alias: woods:nest)
263
+ rake woods:embed_incremental  # Embed changed units only (alias: woods:hone)
264
+ rake woods:notion_sync        # Sync models/columns to Notion (alias: woods:send)
265
+ ```
266
+
267
+ ---
268
+
269
+ ## How It Works Under the Hood
270
+
271
+ ```
272
+ Inside your Rails app (rake task):
273
+   1. Boot Rails, eager-load all application classes
274
+   2. 34 extractors introspect models, controllers, routes, etc.
275
+   3. Dependency graph is built with forward + reverse edges
276
+   4. Git metadata enriches each unit (last modified, contributors, churn)
277
+   5. JSON output written to tmp/woods/
278
+
279
+ On the host (no Rails needed):
280
+   6. Embedding pipeline chunks and vectorizes units (optional)
281
+   7. MCP Index Server reads JSON and answers AI tool queries
282
+ ```
283
+
284
+ ### The ExtractedUnit
285
+
286
+ Everything flows through `ExtractedUnit` — the universal data structure. Each unit carries:
287
+
288
+ | Field | What It Contains |
289
+ |-------|-----------------|
290
+ | `identifier` | Class name or descriptive key (`"User"`, `"POST /orders"`) |
291
+ | `type` | Category (`:model`, `:controller`, `:service`, `:job`, etc.) |
292
+ | `source_code` | Annotated source with inlined concerns and schema |
293
+ | `metadata` | Structured data — associations, callbacks, routes, fields |
294
+ | `dependencies` | What this unit depends on (forward edges) |
295
+ | `dependents` | What depends on this unit (reverse edges) |
296
+ | `chunks` | Semantic sub-sections for large units |
297
+ | `estimated_tokens` | Token count for LLM context budgeting |
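
To make the table concrete, here is a small sketch that mirrors those fields in a plain `Struct` and uses `estimated_tokens` to pack units into a context budget. `UnitSketch` and `fit_to_budget` are illustrative names, not Woods classes or methods, and the token counts are invented sample values.

```ruby
# Illustrative stand-in for the fields listed in the table above.
# This is not the gem's ExtractedUnit class.
UnitSketch = Struct.new(
  :identifier, :type, :source_code, :metadata,
  :dependencies, :dependents, :chunks, :estimated_tokens,
  keyword_init: true
)

# Greedily keeps units, in the order given, while they still fit the budget.
# Units that are too large for the remaining budget are skipped.
def fit_to_budget(units, budget)
  remaining = budget
  units.select do |unit|
    next false if unit.estimated_tokens > remaining

    remaining -= unit.estimated_tokens
    true
  end
end

sample_units = [
  UnitSketch.new(identifier: 'User', type: :model, estimated_tokens: 1200),
  UnitSketch.new(identifier: 'Order', type: :model, estimated_tokens: 900),
  UnitSketch.new(identifier: 'CheckoutService', type: :service, estimated_tokens: 2500)
]

selected = fit_to_budget(sample_units, 2500)
puts selected.map(&:identifier).inspect
```

With a 2,500-token budget, `User` and `Order` fit (2,100 tokens together) and `CheckoutService` is skipped. A real retrieval step would likely order units by relevance before packing.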
298
+
299
+ ### Output Structure
300
+
301
+ ```
302
+ tmp/woods/
303
+ ├── manifest.json              # Git SHA, timestamps, checksums
304
+ ├── dependency_graph.json      # Full graph with PageRank scores
305
+ ├── SUMMARY.md                 # Human-readable overview
306
+ ├── models/
307
+ │   ├── _index.json            # Quick lookup index
308
+ │   ├── User.json              # Full unit with inlined concerns
309
+ │   └── Order.json
310
+ ├── controllers/
311
+ │   └── OrdersController.json  # With route map prepended
312
+ ├── services/
313
+ │   └── CheckoutService.json
314
+ └── rails_source/
315
+     └── ...                    # Framework source for installed versions
316
+ ```
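
Because the output is plain JSON on disk, any script can read it without booting Rails. The sketch below loads one unit file following the directory layout shown above. The `load_unit` helper is hypothetical, and the JSON keys in the sample are assumed to match the `ExtractedUnit` field names, so check a real unit file before relying on them.

```ruby
require 'fileutils'
require 'json'
require 'tmpdir'

# Reads <index_dir>/<category>/<identifier>.json and returns a Hash with
# symbol keys, or nil when the file does not exist.
def load_unit(index_dir, category, identifier)
  path = File.join(index_dir, category, "#{identifier}.json")
  return nil unless File.file?(path)

  JSON.parse(File.read(path), symbolize_names: true)
end

# Demo against a throwaway directory so the example is self-contained.
Dir.mktmpdir do |index_dir|
  FileUtils.mkdir_p(File.join(index_dir, 'models'))
  sample = {
    identifier: 'User',           # key names are assumptions
    type: 'model',
    dependencies: ['Order'],
    estimated_tokens: 1200
  }
  File.write(File.join(index_dir, 'models', 'User.json'), JSON.generate(sample))

  unit = load_unit(index_dir, 'models', 'User')
  puts "#{unit[:identifier]} depends on #{unit[:dependencies].join(', ')}"
  puts load_unit(index_dir, 'models', 'Missing').inspect
end
```

Against a real index, `index_dir` would be `tmp/woods` (or whatever `output_dir` is set to).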
317
+
318
+ ### Architecture Diagram
319
+
320
+ ```
321
+ ┌──────────────────────────────────────────────────────────────────┐
322
+ │                        Rails Application                         │
323
+ │                                                                  │
324
+ │  ┌────────────┐    ┌─────────────┐    ┌──────────────────────┐   │
325
+ │  │ Extract    │───>│ Resolve     │───>│ Write JSON           │   │
326
+ │  │ 34 types   │    │ graph +     │    │ per unit             │   │
327
+ │  │            │    │ git data    │    │                      │   │
328
+ │  └────────────┘    └─────────────┘    └──────────────────────┘   │
329
+ └──────────────────────────────────────────────────────────────────┘
330
+
331
+ ┌─────────────────────────┘
332
+
333
+ ┌──────────────────────────────────────────────────────────────────┐
334
+ │                      Host / CI Environment                       │
335
+ │                                                                  │
336
+ │  ┌────────────┐    ┌─────────────┐    ┌──────────────────────┐   │
337
+ │  │ Embed      │───>│ Vector Store│    │ MCP Index Server     │   │
338
+ │  │ OpenAI /   │    │ pgvector /  │    │ 27 tools             │   │
339
+ │  │ Ollama     │    │ Qdrant      │    │ No Rails required    │   │
340
+ │  └────────────┘    └─────────────┘    └──────────────────────┘   │
341
+ │                                                                  │
342
+ │  ┌────────────────────────────────┐                              │
343
+ │  │ Console MCP Server             │                              │
344
+ │  │ 31 tools, bridges to Rails     │                              │
345
+ │  └────────────────────────────────┘                              │
346
+ └──────────────────────────────────────────────────────────────────┘
347
+ ```
348
+
349
+ See [Architecture](docs/ARCHITECTURE.md) for the deep dive — extraction phases, graph internals, retrieval pipeline, and semantic chunking.
350
+
351
+ ---
352
+
353
+ ## Advanced Features
354
+
355
+ | Feature | What It Does | Guide |
356
+ |---------|-------------|-------|
357
+ | **Semantic Search** | Natural-language queries like "find email validation logic" | [Configuration Reference](docs/CONFIGURATION_REFERENCE.md) |
358
+ | **Temporal Snapshots** | Compare extraction state across git SHAs | [FAQ](docs/FAQ.md#what-are-temporal-snapshots) |
359
+ | **Session Tracing** | Record which code paths fire during a browser session | [FAQ](docs/FAQ.md#what-does-the-session-tracer-do) |
360
+ | **Notion Export** | Sync model/column data to Notion for non-technical stakeholders | [Notion Integration](docs/NOTION_INTEGRATION.md) |
361
+ | **Graph Analysis** | Find orphans, hubs, cycles, bridges in your dependency graph | [Architecture](docs/ARCHITECTURE.md) |
362
+ | **Evaluation Harness** | Measure retrieval precision, recall, and MRR | [Architecture](docs/ARCHITECTURE.md) |
363
+ | **Flow Precomputation** | Per-action request flow maps (controller → model → jobs) | [Configuration Reference](docs/CONFIGURATION_REFERENCE.md) |
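
For readers unfamiliar with the metrics named in the Evaluation Harness row, the sketch below computes precision@k, recall@k, and mean reciprocal rank (MRR) using their standard definitions. It is independent of Woods' harness; the function names and sample queries are made up for illustration, and each `retrieved` list is assumed to contain no duplicates.

```ruby
# Fraction of the top-k results that are relevant.
def precision_at_k(retrieved, relevant, k)
  return 0.0 unless k.positive?

  (retrieved.first(k) & relevant).size.to_f / k
end

# Fraction of all relevant items that appear in the top-k results.
def recall_at_k(retrieved, relevant, k)
  return 0.0 if relevant.empty?

  (retrieved.first(k) & relevant).size.to_f / relevant.size
end

# 1 / rank of the first relevant result, or 0.0 if none was retrieved.
def reciprocal_rank(retrieved, relevant)
  index = retrieved.index { |id| relevant.include?(id) }
  index ? 1.0 / (index + 1) : 0.0
end

# Mean of the reciprocal ranks over a set of queries.
def mean_reciprocal_rank(queries)
  return 0.0 if queries.empty?

  queries.sum { |q| reciprocal_rank(q[:retrieved], q[:relevant]) } / queries.size
end

sample_queries = [
  { retrieved: %w[User Order Invoice], relevant: %w[Order] },
  { retrieved: %w[CheckoutService User], relevant: %w[CheckoutService Order] }
]

first = sample_queries.first
puts "precision@3: #{precision_at_k(first[:retrieved], first[:relevant], 3)}"
puts "recall@3:    #{recall_at_k(first[:retrieved], first[:relevant], 3)}"
puts "MRR:         #{mean_reciprocal_rank(sample_queries)}"
```

In the sample, the first query finds its only relevant unit at rank 2 (reciprocal rank 0.5) and the second at rank 1 (reciprocal rank 1.0), so the MRR is 0.75.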
364
+
365
+ ---
366
+
367
+ ## Documentation
368
+
369
+ | Guide | Who It's For | Description |
370
+ |-------|-------------|-------------|
371
+ | [Getting Started](docs/GETTING_STARTED.md) | Everyone | Install, configure, extract, inspect |
372
+ | [FAQ](docs/FAQ.md) | Everyone | Common questions about setup, extraction, MCP, Docker |
373
+ | [Troubleshooting](docs/TROUBLESHOOTING.md) | Everyone | Symptom → cause → fix |
374
+ | [MCP Servers](docs/MCP_SERVERS.md) | Setup | Full tool catalog for Claude Code, Cursor, Windsurf |
375
+ | [MCP Tool Cookbook](docs/MCP_TOOL_COOKBOOK.md) | Daily use | Scenario-based "how do I..." examples |
376
+ | [Docker Setup](docs/DOCKER_SETUP.md) | Docker users | Container extraction + host MCP server |
377
+ | [Configuration Reference](docs/CONFIGURATION_REFERENCE.md) | Customization | Every option with defaults |
378
+ | [Extractor Reference](docs/EXTRACTOR_REFERENCE.md) | Deep dive | What each of the 34 extractors captures |
379
+ | [Architecture](docs/ARCHITECTURE.md) | Contributors | Pipeline stages, graph internals, retrieval |
380
+ | [Backend Matrix](docs/BACKEND_MATRIX.md) | Infrastructure | Supported database, vector, and embedding combos |
381
+ | [Why Woods?](docs/WHY_CODEBASE_INDEX.md) | Evaluation | Detailed before/after comparisons |
382
+
383
+ ---
384
+
385
+ ## Requirements
386
+
387
+ - Ruby >= 3.0
388
+ - Rails >= 6.1
389
+
390
+ Works with MySQL, PostgreSQL, and SQLite. No additional infrastructure required for basic extraction — embedding and vector search are optional add-ons.
391
+
392
+ ## Development
393
+
394
+ ```bash
395
+ bin/setup # Install dependencies
396
+ bundle exec rake spec # Run tests (~2500 examples)
397
+ bundle exec rubocop # Lint
398
+ ```
399
+
400
+ ## Contributing
401
+
402
+ Bug reports and pull requests are welcome on GitHub at https://github.com/lost-in-the/woods. See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.
403
+
404
+ ## License
405
+
406
+ Available as open source under the [MIT License](LICENSE.txt).