broadlistening 0.7.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (38)
  1. checksums.yaml +7 -0
  2. data/.rspec +3 -0
  3. data/.rubocop.yml +3 -0
  4. data/CHANGELOG.md +40 -0
  5. data/CLAUDE.md +112 -0
  6. data/LICENSE +24 -0
  7. data/LICENSE-AGPLv3.txt +661 -0
  8. data/README.md +195 -0
  9. data/Rakefile +77 -0
  10. data/exe/broadlistening +6 -0
  11. data/lib/broadlistening/argument.rb +136 -0
  12. data/lib/broadlistening/cli.rb +196 -0
  13. data/lib/broadlistening/comment.rb +128 -0
  14. data/lib/broadlistening/compatibility.rb +375 -0
  15. data/lib/broadlistening/config.rb +190 -0
  16. data/lib/broadlistening/context.rb +180 -0
  17. data/lib/broadlistening/csv_loader.rb +109 -0
  18. data/lib/broadlistening/hierarchical_clustering.rb +142 -0
  19. data/lib/broadlistening/kmeans.rb +185 -0
  20. data/lib/broadlistening/llm_client.rb +84 -0
  21. data/lib/broadlistening/pipeline.rb +129 -0
  22. data/lib/broadlistening/planner.rb +114 -0
  23. data/lib/broadlistening/provider.rb +97 -0
  24. data/lib/broadlistening/spec_loader.rb +86 -0
  25. data/lib/broadlistening/status.rb +132 -0
  26. data/lib/broadlistening/steps/aggregation.rb +228 -0
  27. data/lib/broadlistening/steps/base_step.rb +42 -0
  28. data/lib/broadlistening/steps/clustering.rb +103 -0
  29. data/lib/broadlistening/steps/embedding.rb +40 -0
  30. data/lib/broadlistening/steps/extraction.rb +73 -0
  31. data/lib/broadlistening/steps/initial_labelling.rb +85 -0
  32. data/lib/broadlistening/steps/merge_labelling.rb +93 -0
  33. data/lib/broadlistening/steps/overview.rb +36 -0
  34. data/lib/broadlistening/version.rb +5 -0
  35. data/lib/broadlistening.rb +44 -0
  36. data/schema/hierarchical_result.json +152 -0
  37. data/sig/broadlistening.rbs +4 -0
  38. metadata +194 -0
checksums.yaml ADDED
@@ -0,0 +1,7 @@
+ ---
+ SHA256:
+   metadata.gz: 23f5c91f6f1458def1f7d9a8f48ffe18e5065298020042ea83c8ee4b567e747e
+   data.tar.gz: de2380dc80b27514a6831b6e8bb84cef37562e824a42e4e57b55cd07e4e3da71
+ SHA512:
+   metadata.gz: 6fc398f25fc7a5dc7935958fc5ff100234d8a52a312b45c6a2f8f77b0b0210a9c89e9186cf3139dd25652834d4d249cbb16ec225e753ea8289f9882f150b7ffc
+   data.tar.gz: 7d4ef0cfceae1f7a781c6d8e03e2bcd2b649d7ffc3c41ce04bc47fc1cdd25689f6a63604b01f88b33f163049b95a0cee869feeb3397834fe8a8a22e268ef22ff
data/.rspec ADDED
@@ -0,0 +1,3 @@
+ --format documentation
+ --color
+ --require spec_helper
data/.rubocop.yml ADDED
@@ -0,0 +1,3 @@
+ inherit_gem:
+   rubocop-rails-omakase: rubocop.yml
+
data/CHANGELOG.md ADDED
@@ -0,0 +1,40 @@
+ # Changelog
+
+ All notable changes to this project will be documented in this file.
+
+ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
+ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+
+ ## [0.7.0] - 2024-11-30
+
+ ### Added
+ - Multi-provider support for LLM
+   - OpenAI (default)
+   - Azure OpenAI
+   - Google Gemini
+   - OpenRouter
+   - Local LLM (Ollama)
+ - CLI class compatible with Python kouchou-ai `hierarchical_main.py`
+   - `broadlistening CONFIG [options]` command
+   - `-f, --force` option to force re-run all steps
+   - `-o, --only STEP` option to run only specified step
+   - `--skip-interaction` option to skip confirmation prompt
+   - Auto-generates output directory from config filename (e.g., `config/report.json` → `outputs/report/`)
+ - Config class now supports `input`, `question`, `name`, and `intro` fields for Python compatibility
+
+ ### Changed
+ - Extracted Provider class to separate LLM provider configuration from Config
+ - Removed Services namespace, moved classes directly to Broadlistening module
+
+ ### Fixed
+ - Changed PROVIDERS hash keys from strings to symbols for consistency
+ - Use single worker in notification tests for deterministic behavior
+
+ ## [0.1.0] - 2024-11-30
+
+ ### Added
+ - Initial implementation of Broadlistening pipeline
+ - 7-step pipeline: extraction, embedding, clustering, initial_labelling, merge_labelling, overview, aggregation
+ - Incremental execution support with status tracking
+ - Output format compatible with Kouchou-AI Python implementation
+ - ActiveSupport::Notifications for pipeline events
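
The changelog entry for 0.7.0 gives one example of the output-directory rule (`config/report.json` → `outputs/report/`). A minimal Ruby sketch of that mapping follows. It is an illustration only and is not taken from the gem: the method name `default_output_dir` and the `base_dir` keyword are hypothetical, and the assumption is that only the last file extension is dropped.

```ruby
# Hypothetical helper, not the gem's code: derive a default output
# directory from a config path, following the changelog's single example
# (config/report.json -> outputs/report).
def default_output_dir(config_path, base_dir: "outputs")
  # File.basename with ".*" drops the directory part and the last extension.
  name = File.basename(config_path, ".*")
  File.join(base_dir, name)
end

puts default_output_dir("config/report.json") # prints "outputs/report"
```

Deriving the directory from the config filename keeps separate reports from overwriting each other without requiring an explicit output option.
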
data/CLAUDE.md ADDED
@@ -0,0 +1,112 @@
+ # CLAUDE.md
+
+ This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+ ## Overview
+
+ Broadlistening is a Ruby gem that implements the broadlistening pipeline for clustering and analyzing public comments using LLM. It is a Ruby port of the Kouchou-AI (広聴AI) Python implementation, designed for use in Rails applications and other Ruby environments.
+
+ ## Development Commands
+
+ ```bash
+ # Setup
+ bin/setup
+
+ # Run all tests
+ bundle exec rspec
+
+ # Run a single test file
+ bundle exec rspec spec/pipeline_spec.rb
+
+ # Run compatibility tests only
+ bundle exec rspec spec/compatibility/
+
+ # Run a specific test by line number
+ bundle exec rspec spec/pipeline_spec.rb:42
+
+ # Linting
+ bundle exec rubocop
+
+ # Auto-fix lint issues
+ bundle exec rubocop -A
+
+ # Interactive console
+ bin/console
+ ```
+
+ ### Compatibility Tasks
+
+ ```bash
+ # Validate Python output structure against schema
+ bundle exec rake compatibility:validate_python
+
+ # Compare Python and Ruby outputs
+ bundle exec rake "compatibility:compare[python_output.json,ruby_output.json]"
+ ```
+
+ ## Architecture
+
+ ### Pipeline Flow
+
+ The pipeline processes comments through 7 sequential steps:
+
+ 1. **Extraction** - Extract opinions from comments using LLM
+ 2. **Embedding** - Vectorize extracted opinions using OpenAI embeddings
+ 3. **Clustering** - UMAP dimensionality reduction + KMeans + hierarchical clustering
+ 4. **Initial Labelling** - LLM-based labeling for each cluster
+ 5. **Merge Labelling** - Hierarchical label integration
+ 6. **Overview** - LLM-generated summary of all clusters
+ 7. **Aggregation** - Assemble final JSON output
+
+ ### Key Components
+
+ - **Pipeline** (`lib/broadlistening/pipeline.rb`) - Orchestrates step execution, handles incremental execution
+ - **Context** (`lib/broadlistening/context.rb`) - Manages all data flowing through pipeline, supports load/save for incremental execution
+ - **Config** (`lib/broadlistening/config.rb`) - Configuration management, compatible with Python config.json format
+ - **SpecLoader** (`lib/broadlistening/spec_loader.rb`) - Loads step specifications from Python's hierarchical_specs.json
+ - **Planner** (`lib/broadlistening/planner.rb`) - Determines which steps need to run based on dependencies
+ - **Status** (`lib/broadlistening/status.rb`) - Tracks execution status and locking
+
+ ### Services
+
+ - `LlmClient` - OpenAI API wrapper for chat completions
+ - `KMeans` - K-means clustering implementation using Numo::NArray
+ - `HierarchicalClustering` - Builds hierarchical cluster structure
+
+ ### Steps
+
+ All steps inherit from `Steps::BaseStep` and implement `execute` method:
+ - Steps read from and write to Context
+ - Each step has an output file defined in Context::OUTPUT_FILES
+ - Dependencies between steps are defined in hierarchical_specs.json
+
+ ## Technology Stack
+
+ - **Ruby**: >= 3.1.0
+ - **Numerical Computing**: Numo::NArray
+ - **Dimensionality Reduction**: umappp (C++ native extension)
+ - **LLM**: ruby-openai
+ - **Parallelization**: parallel gem
+ - **Schema Validation**: json_schemer
+ - **Code Style**: rubocop-rails-omakase, rubocop-rspec
+
+ ## Kouchou-AI Compatibility
+
+ This gem produces output compatible with Kouchou-AI's hierarchical_result.json format:
+ - Schema defined in `schema/hierarchical_result.json`
+ - Use `Compatibility.validate_with_schema(output)` to validate
+ - Step specs loaded from Python's `server/broadlistening/pipeline/hierarchical_specs.json`
+
+ ### Output Format
+
+ Final result includes: arguments, clusters, comments, propertyMap, translations, overview, config
+
+ ## Native Extension Notes
+
+ The umappp gem requires a C++ compiler:
+ ```bash
+ # macOS
+ CXX=clang++ gem install umappp
+
+ # Use Rice 4.6.x (compatibility issues with 4.7.x)
+ ```
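
CLAUDE.md describes seven sequential steps and a Planner that decides which steps need to run, while the changelog lists `--force` and `--only STEP` options. The sketch below shows one way such planning could work. It is not the gem's Planner: the step names come from the changelog, but the `plan` method, its keyword arguments, and the strictly linear dependency chain are assumptions made for illustration (the real dependencies live in `hierarchical_specs.json`, which is not shown in this diff).

```ruby
require "set"

# Step names as listed in the changelog; the order is the pipeline order.
STEPS = %i[extraction embedding clustering initial_labelling
           merge_labelling overview aggregation].freeze

# Hypothetical planner: returns the steps to run, in pipeline order.
# Assumption: each step depends on the one before it, so the first
# unfinished step and everything after it must run.
def plan(completed:, force: false, only: nil)
  if only
    raise ArgumentError, "unknown step: #{only}" unless STEPS.include?(only)
    return [only] # analogous to running only the specified step
  end
  return STEPS.dup if force # analogous to forcing a re-run of all steps

  done = completed.to_set
  first_pending = STEPS.index { |step| !done.include?(step) }
  first_pending ? STEPS[first_pending..] : []
end

p plan(completed: %i[extraction embedding])
# prints [:clustering, :initial_labelling, :merge_labelling, :overview, :aggregation]
```

With a linear chain, incremental execution reduces to finding the first step without a recorded result. A real dependency graph would instead re-run a step only when one of its declared inputs changed.
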
data/LICENSE ADDED
@@ -0,0 +1,24 @@
+ GNU AFFERO GENERAL PUBLIC LICENSE
+ Version 3, 19 November 2007
+
+ Copyright (C) 2025 Masayoshi Takahashi
+ Based on Kouchou-AI (https://github.com/digitaldemocracy2030/kouchou-ai)
+ by Digital Democracy 2030
+
+ This program is free software: you can redistribute it and/or modify
+ it under the terms of the GNU Affero General Public License as published
+ by the Free Software Foundation, either version 3 of the License, or
+ (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU Affero General Public License for more details.
+
+ You should have received a copy of the GNU Affero General Public License
+ along with this program. If not, see <https://www.gnu.org/licenses/>.
+
+ ---
+
+ The full text of the GNU Affero General Public License version 3
+ is included in the file LICENSE-AGPLv3.txt in this repository.