broadlistening 0.7.0
- checksums.yaml +7 -0
- data/.rspec +3 -0
- data/.rubocop.yml +3 -0
- data/CHANGELOG.md +40 -0
- data/CLAUDE.md +112 -0
- data/LICENSE +24 -0
- data/LICENSE-AGPLv3.txt +661 -0
- data/README.md +195 -0
- data/Rakefile +77 -0
- data/exe/broadlistening +6 -0
- data/lib/broadlistening/argument.rb +136 -0
- data/lib/broadlistening/cli.rb +196 -0
- data/lib/broadlistening/comment.rb +128 -0
- data/lib/broadlistening/compatibility.rb +375 -0
- data/lib/broadlistening/config.rb +190 -0
- data/lib/broadlistening/context.rb +180 -0
- data/lib/broadlistening/csv_loader.rb +109 -0
- data/lib/broadlistening/hierarchical_clustering.rb +142 -0
- data/lib/broadlistening/kmeans.rb +185 -0
- data/lib/broadlistening/llm_client.rb +84 -0
- data/lib/broadlistening/pipeline.rb +129 -0
- data/lib/broadlistening/planner.rb +114 -0
- data/lib/broadlistening/provider.rb +97 -0
- data/lib/broadlistening/spec_loader.rb +86 -0
- data/lib/broadlistening/status.rb +132 -0
- data/lib/broadlistening/steps/aggregation.rb +228 -0
- data/lib/broadlistening/steps/base_step.rb +42 -0
- data/lib/broadlistening/steps/clustering.rb +103 -0
- data/lib/broadlistening/steps/embedding.rb +40 -0
- data/lib/broadlistening/steps/extraction.rb +73 -0
- data/lib/broadlistening/steps/initial_labelling.rb +85 -0
- data/lib/broadlistening/steps/merge_labelling.rb +93 -0
- data/lib/broadlistening/steps/overview.rb +36 -0
- data/lib/broadlistening/version.rb +5 -0
- data/lib/broadlistening.rb +44 -0
- data/schema/hierarchical_result.json +152 -0
- data/sig/broadlistening.rbs +4 -0
- metadata +194 -0
checksums.yaml
ADDED
@@ -0,0 +1,7 @@
+---
+SHA256:
+  metadata.gz: 23f5c91f6f1458def1f7d9a8f48ffe18e5065298020042ea83c8ee4b567e747e
+  data.tar.gz: de2380dc80b27514a6831b6e8bb84cef37562e824a42e4e57b55cd07e4e3da71
+SHA512:
+  metadata.gz: 6fc398f25fc7a5dc7935958fc5ff100234d8a52a312b45c6a2f8f77b0b0210a9c89e9186cf3139dd25652834d4d249cbb16ec225e753ea8289f9882f150b7ffc
+  data.tar.gz: 7d4ef0cfceae1f7a781c6d8e03e2bcd2b649d7ffc3c41ce04bc47fc1cdd25689f6a63604b01f88b33f163049b95a0cee869feeb3397834fe8a8a22e268ef22ff
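Note: the digests in checksums.yaml can be compared against the bytes of the unpacked `metadata.gz` and `data.tar.gz` entries. The following is a minimal Ruby sketch of such a check, not RubyGems' own verification code; the helper name `checksums_match?` and its arguments are assumptions made for illustration.

```ruby
require "digest"
require "yaml"

# Hypothetical helper: returns true only if both the SHA256 and SHA512 digests
# listed for entry_name in a checksums.yaml document match file_bytes.
def checksums_match?(checksums_yaml, entry_name, file_bytes)
  checksums = YAML.safe_load(checksums_yaml)
  actual = {
    "SHA256" => Digest::SHA256.hexdigest(file_bytes),
    "SHA512" => Digest::SHA512.hexdigest(file_bytes)
  }
  actual.all? do |algorithm, digest|
    expected = checksums.dig(algorithm, entry_name)
    # A missing entry counts as a mismatch rather than a pass.
    !expected.nil? && expected.to_s == digest
  end
end
```

A call might look like `checksums_match?(File.read("checksums.yaml"), "data.tar.gz", File.binread("data.tar.gz"))`, assuming both files were extracted from the `.gem` archive into the current directory.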
data/.rspec
ADDED
data/.rubocop.yml
ADDED
data/CHANGELOG.md
ADDED
@@ -0,0 +1,40 @@
+# Changelog
+
+All notable changes to this project will be documented in this file.
+
+The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
+and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+
+## [0.7.0] - 2024-11-30
+
+### Added
+- Multi-provider support for LLM
+  - OpenAI (default)
+  - Azure OpenAI
+  - Google Gemini
+  - OpenRouter
+  - Local LLM (Ollama)
+- CLI class compatible with Python kouchou-ai `hierarchical_main.py`
+  - `broadlistening CONFIG [options]` command
+  - `-f, --force` option to force re-run all steps
+  - `-o, --only STEP` option to run only specified step
+  - `--skip-interaction` option to skip confirmation prompt
+  - Auto-generates output directory from config filename (e.g., `config/report.json` → `outputs/report/`)
+- Config class now supports `input`, `question`, `name`, and `intro` fields for Python compatibility
+
+### Changed
+- Extracted Provider class to separate LLM provider configuration from Config
+- Removed Services namespace, moved classes directly to Broadlistening module
+
+### Fixed
+- Changed PROVIDERS hash keys from strings to symbols for consistency
+- Use single worker in notification tests for deterministic behavior
+
+## [0.1.0] - 2024-11-30
+
+### Added
+- Initial implementation of Broadlistening pipeline
+- 7-step pipeline: extraction, embedding, clustering, initial_labelling, merge_labelling, overview, aggregation
+- Incremental execution support with status tracking
+- Output format compatible with Kouchou-AI Python implementation
+- ActiveSupport::Notifications for pipeline events
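Note: the CLI options listed in the 0.7.0 changelog entry could be parsed roughly as follows. This is a hedged sketch built on Ruby's standard `OptionParser`, not the contents of `lib/broadlistening/cli.rb`; the method name `parse_cli` and the shape of the returned hash are assumptions.

```ruby
require "optparse"

# Hypothetical sketch: parses `broadlistening CONFIG [options]` and derives the
# output directory from the config filename, as the changelog describes
# (config/report.json -> outputs/report/).
def parse_cli(argv)
  options = { force: false, only: nil, skip_interaction: false }
  parser = OptionParser.new do |opts|
    opts.banner = "Usage: broadlistening CONFIG [options]"
    opts.on("-f", "--force", "Force re-run all steps") { options[:force] = true }
    opts.on("-o", "--only STEP", "Run only the specified step") { |step| options[:only] = step }
    opts.on("--skip-interaction", "Skip confirmation prompt") { options[:skip_interaction] = true }
  end

  # parse returns the non-option arguments; the first one is taken as CONFIG.
  config_path = parser.parse(argv).first
  raise ArgumentError, parser.banner if config_path.nil?

  name = File.basename(config_path, File.extname(config_path))
  # The trailing empty segment makes File.join emit a trailing slash.
  options.merge(config: config_path, output_dir: File.join("outputs", name, ""))
end
```

For example, `parse_cli(["config/report.json", "--force"])` would yield a hash whose `:output_dir` is `outputs/report/` under these assumptions.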
data/CLAUDE.md
ADDED
@@ -0,0 +1,112 @@
+# CLAUDE.md
+
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+## Overview
+
+Broadlistening is a Ruby gem that implements the broadlistening pipeline for clustering and analyzing public comments using LLM. It is a Ruby port of the Kouchou-AI (広聴AI) Python implementation, designed for use in Rails applications and other Ruby environments.
+
+## Development Commands
+
+```bash
+# Setup
+bin/setup
+
+# Run all tests
+bundle exec rspec
+
+# Run a single test file
+bundle exec rspec spec/pipeline_spec.rb
+
+# Run compatibility tests only
+bundle exec rspec spec/compatibility/
+
+# Run a specific test by line number
+bundle exec rspec spec/pipeline_spec.rb:42
+
+# Linting
+bundle exec rubocop
+
+# Auto-fix lint issues
+bundle exec rubocop -A
+
+# Interactive console
+bin/console
+```
+
+### Compatibility Tasks
+
+```bash
+# Validate Python output structure against schema
+bundle exec rake compatibility:validate_python
+
+# Compare Python and Ruby outputs
+bundle exec rake "compatibility:compare[python_output.json,ruby_output.json]"
+```
+
+## Architecture
+
+### Pipeline Flow
+
+The pipeline processes comments through 7 sequential steps:
+
+1. **Extraction** - Extract opinions from comments using LLM
+2. **Embedding** - Vectorize extracted opinions using OpenAI embeddings
+3. **Clustering** - UMAP dimensionality reduction + KMeans + hierarchical clustering
+4. **Initial Labelling** - LLM-based labeling for each cluster
+5. **Merge Labelling** - Hierarchical label integration
+6. **Overview** - LLM-generated summary of all clusters
+7. **Aggregation** - Assemble final JSON output
+
+### Key Components
+
+- **Pipeline** (`lib/broadlistening/pipeline.rb`) - Orchestrates step execution, handles incremental execution
+- **Context** (`lib/broadlistening/context.rb`) - Manages all data flowing through pipeline, supports load/save for incremental execution
+- **Config** (`lib/broadlistening/config.rb`) - Configuration management, compatible with Python config.json format
+- **SpecLoader** (`lib/broadlistening/spec_loader.rb`) - Loads step specifications from Python's hierarchical_specs.json
+- **Planner** (`lib/broadlistening/planner.rb`) - Determines which steps need to run based on dependencies
+- **Status** (`lib/broadlistening/status.rb`) - Tracks execution status and locking
+
+### Services
+
+- `LlmClient` - OpenAI API wrapper for chat completions
+- `KMeans` - K-means clustering implementation using Numo::NArray
+- `HierarchicalClustering` - Builds hierarchical cluster structure
+
+### Steps
+
+All steps inherit from `Steps::BaseStep` and implement `execute` method:
+- Steps read from and write to Context
+- Each step has an output file defined in Context::OUTPUT_FILES
+- Dependencies between steps are defined in hierarchical_specs.json
+
+## Technology Stack
+
+- **Ruby**: >= 3.1.0
+- **Numerical Computing**: Numo::NArray
+- **Dimensionality Reduction**: umappp (C++ native extension)
+- **LLM**: ruby-openai
+- **Parallelization**: parallel gem
+- **Schema Validation**: json_schemer
+- **Code Style**: rubocop-rails-omakase, rubocop-rspec
+
+## Kouchou-AI Compatibility
+
+This gem produces output compatible with Kouchou-AI's hierarchical_result.json format:
+- Schema defined in `schema/hierarchical_result.json`
+- Use `Compatibility.validate_with_schema(output)` to validate
+- Step specs loaded from Python's `server/broadlistening/pipeline/hierarchical_specs.json`
+
+### Output Format
+
+Final result includes: arguments, clusters, comments, propertyMap, translations, overview, config
+
+## Native Extension Notes
+
+The umappp gem requires a C++ compiler:
+```bash
+# macOS
+CXX=clang++ gem install umappp
+
+# Use Rice 4.6.x (compatibility issues with 4.7.x)
+```
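Note: CLAUDE.md describes a Planner that decides which steps need to run based on dependencies, with incremental execution handled by the Pipeline. The sketch below illustrates one way such planning could work; it is not the code in `lib/broadlistening/planner.rb`. The dependency table is an assumed linear chain over the seven step names from the changelog, whereas the real dependencies live in `hierarchical_specs.json`, which this diff does not show. Both method names are hypothetical.

```ruby
# Hypothetical dependency table (assumed linear chain). Keys must be listed in
# dependency order, because plan_steps relies on hash insertion order.
def assumed_step_dependencies
  {
    extraction: [],
    embedding: [:extraction],
    clustering: [:embedding],
    initial_labelling: [:clustering],
    merge_labelling: [:initial_labelling],
    overview: [:merge_labelling],
    aggregation: [:overview]
  }
end

# Returns the steps to run, in order. A step is scheduled when force is set,
# when it has not been completed yet, or when any step it depends on is
# itself scheduled to run again.
def plan_steps(completed, force: false, dependencies: assumed_step_dependencies)
  dependencies.each_with_object([]) do |(step, deps), to_run|
    stale = force || !completed.include?(step) || deps.any? { |dep| to_run.include?(dep) }
    to_run << step if stale
  end
end
```

Under these assumptions, a run in which only `extraction` and `embedding` have completed would schedule `clustering` and every step after it, while `force: true` schedules all seven steps.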
data/LICENSE
ADDED
@@ -0,0 +1,24 @@
+GNU AFFERO GENERAL PUBLIC LICENSE
+Version 3, 19 November 2007
+
+Copyright (C) 2025 Masayoshi Takahashi
+Based on Kouchou-AI (https://github.com/digitaldemocracy2030/kouchou-ai)
+by Digital Democracy 2030
+
+This program is free software: you can redistribute it and/or modify
+it under the terms of the GNU Affero General Public License as published
+by the Free Software Foundation, either version 3 of the License, or
+(at your option) any later version.
+
+This program is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+GNU Affero General Public License for more details.
+
+You should have received a copy of the GNU Affero General Public License
+along with this program. If not, see <https://www.gnu.org/licenses/>.
+
+---
+
+The full text of the GNU Affero General Public License version 3
+is included in the file LICENSE-AGPLv3.txt in this repository.