sentiment_insights 0.1.0

checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA256:
3
+ metadata.gz: 7c3ede714068979aaa53076ed6fef752235b50d45ab57f388d9baf8a0e952baf
4
+ data.tar.gz: 6fc6b77e018b41a3503d45938b65575f0bd90dfbd8a16ac26257c738591709f3
5
+ SHA512:
6
+ metadata.gz: d5025dc6ab42d5b4358d66084e3306773984e796da881acf917fd879d579375d152cb7458477d3071f080d43bbb90b0ea8b122833343b963f5b84aac15d11adf
7
+ data.tar.gz: 63f1abac85a05be6ee68006eeb5729bcc53f1a0f4932ad13067e9ca06f4b89be9ce62f88ca5c8008d647862b29fd7704658a6bb3153dcd8063923650a6c2eb63
data/.gitignore ADDED
@@ -0,0 +1,14 @@
1
+ /.bundle/
2
+ /.yardoc
3
+ /_yardoc/
4
+ /coverage/
5
+ /doc/
6
+ /pkg/
7
+ /spec/reports/
8
+ /tmp/
9
+
10
+ # rspec failure tracking
11
+ .rspec_status
12
+ .env
13
+ .idea/
14
+ .travis.yml
data/.rspec ADDED
@@ -0,0 +1,3 @@
1
+ --format documentation
2
+ --color
3
+ --require spec_helper
@@ -0,0 +1,74 @@
1
+ # Contributor Covenant Code of Conduct
2
+
3
+ ## Our Pledge
4
+
5
+ In the interest of fostering an open and welcoming environment, we as
6
+ contributors and maintainers pledge to making participation in our project and
7
+ our community a harassment-free experience for everyone, regardless of age, body
8
+ size, disability, ethnicity, gender identity and expression, level of experience,
9
+ nationality, personal appearance, race, religion, or sexual identity and
10
+ orientation.
11
+
12
+ ## Our Standards
13
+
14
+ Examples of behavior that contributes to creating a positive environment
15
+ include:
16
+
17
+ * Using welcoming and inclusive language
18
+ * Being respectful of differing viewpoints and experiences
19
+ * Gracefully accepting constructive criticism
20
+ * Focusing on what is best for the community
21
+ * Showing empathy towards other community members
22
+
23
+ Examples of unacceptable behavior by participants include:
24
+
25
+ * The use of sexualized language or imagery and unwelcome sexual attention or
26
+ advances
27
+ * Trolling, insulting/derogatory comments, and personal or political attacks
28
+ * Public or private harassment
29
+ * Publishing others' private information, such as a physical or electronic
30
+ address, without explicit permission
31
+ * Other conduct which could reasonably be considered inappropriate in a
32
+ professional setting
33
+
34
+ ## Our Responsibilities
35
+
36
+ Project maintainers are responsible for clarifying the standards of acceptable
37
+ behavior and are expected to take appropriate and fair corrective action in
38
+ response to any instances of unacceptable behavior.
39
+
40
+ Project maintainers have the right and responsibility to remove, edit, or
41
+ reject comments, commits, code, wiki edits, issues, and other contributions
42
+ that are not aligned to this Code of Conduct, or to ban temporarily or
43
+ permanently any contributor for other behaviors that they deem inappropriate,
44
+ threatening, offensive, or harmful.
45
+
46
+ ## Scope
47
+
48
+ This Code of Conduct applies both within project spaces and in public spaces
49
+ when an individual is representing the project or its community. Examples of
50
+ representing a project or community include using an official project e-mail
51
+ address, posting via an official social media account, or acting as an appointed
52
+ representative at an online or offline event. Representation of a project may be
53
+ further defined and clarified by project maintainers.
54
+
55
+ ## Enforcement
56
+
57
+ Instances of abusive, harassing, or otherwise unacceptable behavior may be
58
+ reported by contacting the maintainer at mathrails@gmail.com. All
59
+ complaints will be reviewed and investigated and will result in a response that
60
+ is deemed necessary and appropriate to the circumstances. The project team is
61
+ obligated to maintain confidentiality with regard to the reporter of an incident.
62
+ Further details of specific enforcement policies may be posted separately.
63
+
64
+ Project maintainers who do not follow or enforce the Code of Conduct in good
65
+ faith may face temporary or permanent repercussions as determined by other
66
+ members of the project's leadership.
67
+
68
+ ## Attribution
69
+
70
+ This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4,
71
+ available at [http://contributor-covenant.org/version/1/4][version]
72
+
73
+ [homepage]: http://contributor-covenant.org
74
+ [version]: http://contributor-covenant.org/version/1/4/
data/Gemfile ADDED
@@ -0,0 +1,8 @@
1
+ source "https://rubygems.org"
2
+
3
+ git_source(:github) {|repo_name| "https://github.com/#{repo_name}" }
4
+
5
+ # Specify your gem's dependencies in sentiment_insights.gemspec
6
+ gemspec
7
+
8
+ gem "dotenv", "~> 2.8", :group => :development
data/Gemfile.lock ADDED
@@ -0,0 +1,57 @@
1
+ PATH
2
+ remote: .
3
+ specs:
4
+ sentiment_insights (0.1.0)
5
+ aws-sdk-comprehend (~> 1.98.0)
6
+ sentimental (~> 1.4.0)
7
+
8
+ GEM
9
+ remote: https://rubygems.org/
10
+ specs:
11
+ aws-eventstream (1.3.2)
12
+ aws-partitions (1.1094.0)
13
+ aws-sdk-comprehend (1.98.0)
14
+ aws-sdk-core (~> 3, >= 3.216.0)
15
+ aws-sigv4 (~> 1.5)
16
+ aws-sdk-core (3.222.3)
17
+ aws-eventstream (~> 1, >= 1.3.0)
18
+ aws-partitions (~> 1, >= 1.992.0)
19
+ aws-sigv4 (~> 1.9)
20
+ base64
21
+ jmespath (~> 1, >= 1.6.1)
22
+ logger
23
+ aws-sigv4 (1.11.0)
24
+ aws-eventstream (~> 1, >= 1.0.2)
25
+ base64 (0.2.0)
26
+ diff-lcs (1.6.1)
27
+ dotenv (2.8.1)
28
+ jmespath (1.6.2)
29
+ logger (1.7.0)
30
+ rake (13.2.1)
31
+ rspec (3.13.0)
32
+ rspec-core (~> 3.13.0)
33
+ rspec-expectations (~> 3.13.0)
34
+ rspec-mocks (~> 3.13.0)
35
+ rspec-core (3.13.3)
36
+ rspec-support (~> 3.13.0)
37
+ rspec-expectations (3.13.3)
38
+ diff-lcs (>= 1.2.0, < 2.0)
39
+ rspec-support (~> 3.13.0)
40
+ rspec-mocks (3.13.2)
41
+ diff-lcs (>= 1.2.0, < 2.0)
42
+ rspec-support (~> 3.13.0)
43
+ rspec-support (3.13.2)
44
+ sentimental (1.4.1)
45
+
46
+ PLATFORMS
47
+ arm64-darwin-24
48
+
49
+ DEPENDENCIES
50
+ bundler (~> 2.0)
51
+ dotenv (~> 2.8)
52
+ rake (~> 13.0)
53
+ rspec (~> 3.0)
54
+ sentiment_insights!
55
+
56
+ BUNDLED WITH
57
+ 2.4.21
data/LICENSE ADDED
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2025 mathrailsAI
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
data/LICENSE.txt ADDED
@@ -0,0 +1,21 @@
1
+ The MIT License (MIT)
2
+
3
+ Copyright (c) 2025 mathrailsAI
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in
13
+ all copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
21
+ THE SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,265 @@
1
+ # SentimentInsights 💬📊
2
+
3
+ **SentimentInsights** is a Ruby gem that helps you uncover meaningful insights from open-ended survey responses using Natural Language Processing (NLP). It supports multi-provider analysis via OpenAI, AWS Comprehend, or a local fallback engine.
4
+
5
+ ---
6
+
7
+ ## ✨ Features
8
+
9
+ ### ✅ 1. Sentiment Analysis
10
+
11
+ Quickly classify and summarize user responses as positive, neutral, or negative — globally or by segment (e.g., age, region).
12
+
13
+ #### 🔍 Example Call
14
+
15
+ ```ruby
16
+ insight = SentimentInsights::Insights::Sentiment.new
17
+ result = insight.analyze(entries)
18
+ ```
19
+
20
+ #### 📾 Sample Output
21
+
22
+ ```ruby
23
+ {:global_summary=>
24
+ {:total_count=>5,
25
+ :positive_count=>3,
26
+ :neutral_count=>0,
27
+ :negative_count=>2,
28
+ :positive_percentage=>60.0,
29
+ :neutral_percentage=>0.0,
30
+ :negative_percentage=>40.0,
31
+ :net_sentiment_score=>20.0},
32
+ :segment_summary=>
33
+ {:age=>
34
+ {"25-34"=>
35
+ {:total_count=>3,
36
+ :positive_count=>3,
37
+ :neutral_count=>0,
38
+ :negative_count=>0,
39
+ :positive_percentage=>100.0,
40
+ :neutral_percentage=>0.0,
41
+ :negative_percentage=>0.0,
42
+ :net_sentiment_score=>100.0}},
43
+ :top_positive_comments=>
44
+ [{:answer=>
45
+ "I absolutely loved the experience shopping with Everlane. The website is clean,\n" +
46
+ "product descriptions are spot-on, and my jeans arrived two days early with eco-friendly packaging.",
47
+ :score=>0.9}],
48
+ :top_negative_comments=>
49
+ [{:answer=>
50
+ "The checkout flow on your site was a nightmare. The promo code from your Instagram campaign didn’t work,\n" +
51
+ "and it kept redirecting me to the homepage. Shopify integration needs a serious fix.",
52
+ :score=>-0.7}],
53
+ :responses=>
54
+ [{:answer=>
55
+ "I absolutely loved the experience shopping with Everlane. The website is clean,\n" +
56
+ "product descriptions are spot-on, and my jeans arrived two days early with eco-friendly packaging.",
57
+ :segment=>{:age=>"25-34", :region=>"West"},
58
+ :sentiment_label=>:positive,
59
+ :sentiment_score=>0.9}]}}
60
+ ```
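The sample output above is consistent with `net_sentiment_score` being the positive percentage minus the negative percentage (60.0 - 40.0 = 20.0 globally, 100.0 - 0.0 = 100.0 for the 25-34 segment). The following is a minimal sketch of that arithmetic under that assumption; the helper name `summarize_labels` is hypothetical and is not part of the gem.

```ruby
# Hypothetical helper (not part of the gem): builds a summary hash shaped like
# the sample output from a flat list of sentiment labels.
def summarize_labels(labels)
  total = labels.size
  counts = Hash.new(0)
  labels.each { |label| counts[label] += 1 }

  # Share of the total as a percentage, rounded to one decimal place.
  # Returns 0.0 for empty input to avoid dividing by zero.
  pct = ->(n) { total.zero? ? 0.0 : (n * 100.0 / total).round(1) }

  {
    total_count: total,
    positive_count: counts[:positive],
    neutral_count: counts[:neutral],
    negative_count: counts[:negative],
    positive_percentage: pct.(counts[:positive]),
    neutral_percentage: pct.(counts[:neutral]),
    negative_percentage: pct.(counts[:negative]),
    # Assumed definition: positive share minus negative share.
    net_sentiment_score: (pct.(counts[:positive]) - pct.(counts[:negative])).round(1)
  }
end

summary = summarize_labels(%i[positive positive positive negative negative])
```

With three positive and two negative labels, this reproduces the counts and percentages of the `:global_summary` shown above.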
61
+
62
+ ### ✅ 2. Key Phrase Extraction
63
+
64
+ Extract frequently mentioned phrases and identify their associated sentiment and segment spread.
65
+
66
+ ```ruby
67
+ insight = SentimentInsights::Insights::KeyPhrases.new
68
+ result = insight.extract(entries, question: question)
69
+ ```
70
+
71
+ ```ruby
72
+ {:phrases=>
73
+ [{:phrase=>"everlane",
74
+ :mentions=>["r_1"],
75
+ :summary=>
76
+ {:total_mentions=>1,
77
+ :sentiment_distribution=>{:positive=>1, :negative=>0, :neutral=>0},
78
+ :segment_distribution=>{:age=>{"25-34"=>1}, :region=>{"West"=>1}}}}],
79
+ :responses=>
80
+ [{:id=>"r_1",
81
+ :sentence=>
82
+ "I absolutely loved the experience shopping with Everlane. The website is clean,\n" +
83
+ "product descriptions are spot-on, and my jeans arrived two days early with eco-friendly packaging.",
84
+ :sentiment=>:positive,
85
+ :segment=>{:age=>"25-34", :region=>"West"}}]}
86
+ ```
87
+
88
+ ### ✅ 3. Entity Recognition
89
+
90
+ Identify named entities like organizations, products, and people, and track them by sentiment and segment.
91
+
92
+ ```ruby
93
+ insight = SentimentInsights::Insights::Entities.new
94
+ result = insight.extract(entries, question: question)
95
+ ```
96
+
97
+ ```ruby
98
+ {:entities=>
99
+ [{:entity=>"everlane",
100
+ :type=>"ORGANIZATION",
101
+ :mentions=>["r_1"],
102
+ :summary=>
103
+ {:total_mentions=>1,
104
+ :segment_distribution=>{:age=>{"25-34"=>1}, :region=>{"West"=>1}}}},
105
+ {:entity=>"jeans",
106
+ :type=>"PRODUCT",
107
+ :mentions=>["r_1"],
108
+ :summary=>
109
+ {:total_mentions=>1,
110
+ :segment_distribution=>{:age=>{"25-34"=>1}, :region=>{"West"=>1}}}},
111
+ {:entity=>"24 hours",
112
+ :type=>"TIME",
113
+ :mentions=>["r_4"],
114
+ :summary=>
115
+ {:total_mentions=>1,
116
+ :segment_distribution=>{:age=>{"45-54"=>1}, :region=>{"Midwest"=>1}}}}],
117
+ :responses=>
118
+ [{:id=>"r_1",
119
+ :sentence=>
120
+ "I absolutely loved the experience shopping with Everlane. The website is clean,\n" +
121
+ "product descriptions are spot-on, and my jeans arrived two days early with eco-friendly packaging.",
122
+ :segment=>{:age=>"25-34", :region=>"West"}},
123
+ {:id=>"r_4",
124
+ :sentence=>
125
+ "I reached out to your Zendesk support team about a missing package, and while they responded within 24 hours,\n" +
126
+ "the response was copy-paste and didn't address my issue directly.",
127
+ :segment=>{:age=>"45-54", :region=>"Midwest"}}]}
128
+ ```
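Each entity's `:summary` in the sample output can be derived from its `:mentions` ids and the `:responses` array. The sketch below shows one way to do that; `summarize_mentions` is a hypothetical helper, not the gem's implementation, and it assumes every response carries an `:id` and an optional `:segment` hash.

```ruby
# Hypothetical helper (not part of the gem): derives the :summary hash for one
# entity from its mention ids and the :responses array.
def summarize_mentions(mention_ids, responses)
  by_id = responses.to_h { |r| [r[:id], r] }

  # segment name => { segment value => number of mentions }
  distribution = Hash.new { |h, k| h[k] = Hash.new(0) }

  mention_ids.each do |id|
    segment = by_id.dig(id, :segment) || {}
    segment.each { |name, value| distribution[name][value] += 1 }
  end

  { total_mentions: mention_ids.size, segment_distribution: distribution }
end

responses = [
  { id: "r_1", segment: { age: "25-34", region: "West" } },
  { id: "r_4", segment: { age: "45-54", region: "Midwest" } }
]
summary = summarize_mentions(["r_1"], responses)
```

For the `"everlane"` entity, mentioned only in `r_1`, this yields one total mention spread over the `"25-34"` age group and the `"West"` region, matching the sample.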
129
+
130
+ ### 🔜 4. Topic Modeling *(Coming Soon)*
131
+
132
+ Automatically group similar responses into topics and subthemes.
133
+
134
+ ---
135
+
136
+ ## 🔌 Supported Providers
137
+
138
+ | Feature | OpenAI ✅ | AWS Comprehend ✅ | Sentimental (Local) ⚠️ |
139
+ | ------------------ | -------------- | ---------------- | ---------------------- |
140
+ | Sentiment Analysis | ✅ | ✅ | ✅ |
141
+ | Key Phrases | ✅ | ✅ | ❌ Not supported |
142
+ | Entities | ✅ | ✅ | ❌ Not supported |
143
+ | Topics | 🔜 Coming Soon | 🔜 Coming Soon | ❌ |
144
+
145
+ Legend: ✅ Supported | 🔜 Coming Soon | ❌ Not Available | ⚠️ Partial
146
+
147
+ ---
148
+
149
+ ## 📅 Example Input
150
+
151
+ ```ruby
152
+ question = "What did you like or dislike about your recent shopping experience with us?"
153
+
154
+ entries = [
155
+ {
156
+ answer: "I absolutely loved the experience shopping with Everlane. The website is clean,\nproduct descriptions are spot-on, and my jeans arrived two days early with eco-friendly packaging.",
157
+ segment: { age: "25-34", region: "West" }
158
+ },
159
+ {
160
+ answer: "The checkout flow on your site was a nightmare. The promo code from your Instagram campaign didn’t work,\nand it kept redirecting me to the homepage. Shopify integration needs a serious fix.",
161
+ segment: { age: "35-44", region: "South" }
162
+ },
163
+ {
164
+ answer: "Apple Pay made the mobile checkout super fast. I placed an order while waiting for my coffee at Starbucks.\nGreat job optimizing the app UX—this is a game-changer.",
165
+ segment: { age: "25-34", region: "West" }
166
+ },
167
+ {
168
+ answer: "I reached out to your Zendesk support team about a missing package, and while they responded within 24 hours,\nthe response was copy-paste and didn't address my issue directly.",
169
+ segment: { age: "45-54", region: "Midwest" }
170
+ },
171
+ {
172
+ answer: "Shipping delays aside, I really liked the personalized note inside the box. Small gestures like that\nmake the Uniqlo brand stand out. Will definitely recommend to friends.",
173
+ segment: { age: "25-34", region: "West" }
174
+ }
175
+ ]
176
+ ```
177
+
178
+ ---
179
+
180
+ ## 🚀 Quick Start
181
+
182
+ ```ruby
183
+ # Install first, from a shell: gem install sentiment_insights
184
+ require "sentiment_insights"
185
+ require "json" # needed for JSON.pretty_generate below
186
+ # Configure the provider
187
+ SentimentInsights.configure do |config|
188
+ config.provider = :openai # or :aws, :sentimental
189
+ end
190
+
191
+ # Run analysis
192
+ insight = SentimentInsights::Insights::Sentiment.new
193
+ result = insight.analyze(entries)
194
+ puts JSON.pretty_generate(result)
195
+ ```
196
+
197
+ ---
198
+
199
+ ## 🔑 Environment Variables
200
+
201
+ ### OpenAI
202
+
203
+ ```bash
204
+ OPENAI_API_KEY=your_openai_key_here
205
+ ```
206
+
207
+ ### AWS Comprehend
208
+
209
+ ```bash
210
+ AWS_ACCESS_KEY_ID=your_aws_key
211
+ AWS_SECRET_ACCESS_KEY=your_aws_secret
212
+ AWS_REGION=us-east-1
213
+ ```
214
+
215
+ ---
216
+
217
+ ## 💎 Ruby Compatibility
218
+
219
+ - **Minimum Ruby version:** 2.7
220
+ - Tested on: 2.7, 3.0, 3.1, 3.2
221
+
222
+ ---
223
+
224
+ ## 🔮 Testing
225
+
226
+ ```bash
227
+ bundle exec rspec
228
+ ```
229
+
230
+ ---
231
+
232
+ ## 📋 Roadmap
233
+
234
+ - [x] Sentiment Analysis
235
+ - [x] Key Phrase Extraction
236
+ - [x] Entity Recognition
237
+ - [ ] Topic Modeling
238
+ - [ ] CSV/JSON Export Helpers
239
+ - [ ] Visual Dashboard Add-on
240
+
241
+ ---
242
+
243
+ ## 📄 License
244
+
245
+ MIT License
246
+
247
+ ---
248
+
249
+ ## 🙌 Contributing
250
+
251
+ Pull requests welcome! Please open an issue to discuss major changes first.
252
+
253
+ ---
254
+
255
+ ## 💬 Acknowledgements
256
+
257
+ - [OpenAI GPT](https://platform.openai.com/docs)
258
+ - [AWS Comprehend](https://docs.aws.amazon.com/comprehend/latest/dg/what-is.html)
259
+ - [Sentimental Gem](https://github.com/7compass/sentimental)
260
+
261
+ ---
262
+
263
+ ## 📢 Questions?
264
+
265
+ File an issue or reach out on [GitHub](https://github.com/your-repo)
data/Rakefile ADDED
@@ -0,0 +1,6 @@
1
+ require "bundler/gem_tasks"
2
+ require "rspec/core/rake_task"
3
+
4
+ RSpec::Core::RakeTask.new(:spec)
5
+
6
+ task :default => :spec
data/bin/console ADDED
@@ -0,0 +1,15 @@
1
+ #!/usr/bin/env ruby
2
+
3
+ require "bundler/setup"
4
+ require "dotenv/load"
5
+ require "sentiment_insights"
6
+
7
+ # You can add fixtures and/or initialization code here to make experimenting
8
+ # with your gem easier. You can also use a different console, if you like.
9
+
10
+ # (If you use this, don't forget to add pry to your Gemfile!)
11
+ # require "pry"
12
+ # Pry.start
13
+
14
+ require "irb"
15
+ IRB.start(__FILE__)
data/bin/setup ADDED
@@ -0,0 +1,8 @@
1
+ #!/usr/bin/env bash
2
+ set -euo pipefail
3
+ IFS=$'\n\t'
4
+ set -vx
5
+
6
+ bundle install
7
+
8
+ # Do any other automated setup that you need to do here
@@ -0,0 +1,48 @@
1
+ require "sentiment_insights/clients/sentiment/open_ai_client"
2
+ require "sentiment_insights/clients/sentiment/aws_comprehend_client"
3
+ require "sentiment_insights/clients/sentiment/sentimental_client"
4
+
5
+ require "sentiment_insights/insights/sentiment"
6
+ require "sentiment_insights/insights/key_phrases"
7
+ require "sentiment_insights/insights/entities"
8
+ require "sentiment_insights/insights/topics"
9
+
10
+ module SentimentInsights
11
+ class Analyzer
12
+ def initialize(provider: SentimentInsights.configuration.provider)
13
+ @provider = provider
14
+ end
15
+
16
+ # Sentiment Analysis
17
+ def sentiment(text)
18
+ SentimentInsights::Insights::Sentiment.new(@provider).analyze(text)
19
+ end
20
+
21
+ def sentiment_batch(texts)
22
+ SentimentInsights::Insights::Sentiment.new(@provider).analyze_batch(texts)
23
+ end
24
+
25
+ # Key Phrase Extraction
26
+ def key_phrases(text)
27
+ SentimentInsights::Insights::KeyPhrases.new(@provider).extract(text)
28
+ end
29
+
30
+ def key_phrases_batch(texts, options: {})
31
+ SentimentInsights::Insights::KeyPhrases.new(@provider).extract_batch(texts, options)
32
+ end
33
+
34
+ # Entity Recognition
35
+ def entities(text)
36
+ SentimentInsights::Insights::Entities.new(@provider).extract(text)
37
+ end
38
+
39
+ def entities_batch(texts, options: {})
40
+ SentimentInsights::Insights::Entities.new(@provider).extract_batch(texts, options)
41
+ end
42
+
43
+ # Topic Modeling
44
+ def topics_batch(texts, options: {})
45
+ SentimentInsights::Insights::Topics.new(@provider).model_topics(texts, options)
46
+ end
47
+ end
48
+ end
@@ -0,0 +1,84 @@
1
+ require 'aws-sdk-comprehend'
2
+ require 'logger'
3
+
4
+ module SentimentInsights
5
+ module Clients
6
+ module Entities
7
+ class AwsClient
8
+ MAX_BATCH_SIZE = 25
9
+
10
+ def initialize(region: 'us-east-1')
11
+ @client = Aws::Comprehend::Client.new(region: region)
12
+ @logger = Logger.new($stdout)
13
+ end
14
+
15
+ def extract_batch(entries, question: nil)
16
+ responses = []
17
+ entity_map = Hash.new { |h, k| h[k] = [] }
18
+
19
+ entries.each_slice(MAX_BATCH_SIZE).with_index do |batch, batch_idx|
20
+ texts = batch.map { |e| e[:answer].to_s.strip[0...5000] }
21
+
22
+ begin
23
+ resp = @client.batch_detect_entities({
24
+ text_list: texts,
25
+ language_code: 'en'
26
+ })
27
+
28
+ resp.result_list.each do |res| # result_list omits failed items, so use res.index
29
+ entry_index = (batch_idx * MAX_BATCH_SIZE) + res.index
30
+ entry = entries[entry_index]
31
+ sentence = texts[res.index]
32
+ response_id = "r_#{entry_index + 1}"
33
+
34
+ responses << {
35
+ id: response_id,
36
+ sentence: sentence,
37
+ segment: entry[:segment] || {}
38
+ }
39
+
40
+ entities = res.entities.map do |e|
41
+ {
42
+ text: e.text.downcase.strip,
43
+ type: e.type
44
+ }
45
+ end.uniq { |e| [e[:text], e[:type]] }
46
+
47
+ entities.each do |ent|
48
+ key = [ent[:text], ent[:type]]
49
+ entity_map[key] << response_id
50
+ end
51
+ end
52
+
53
+ resp.error_list.each do |error|
54
+ @logger.warn "AWS entity error at index #{error.index}: #{error.error_code}"
55
+ end
56
+
57
+ rescue Aws::Comprehend::Errors::ServiceError => e
58
+ @logger.error "AWS Comprehend error: #{e.message}"
59
+ batch.each_with_index do |entry, i|
60
+ entry_index = (batch_idx * MAX_BATCH_SIZE) + i
61
+ responses << {
62
+ id: "r_#{entry_index + 1}",
63
+ sentence: entry[:answer].to_s.strip[0...5000],
64
+ segment: entry[:segment] || {}
65
+ }
66
+ end
67
+ end
68
+ end
69
+
70
+ entities = entity_map.map do |(text, type), ref_ids|
71
+ {
72
+ entity: text,
73
+ type: type,
74
+ mentions: ref_ids.uniq,
75
+ summary: nil
76
+ }
77
+ end
78
+
79
+ { entities: entities, responses: responses }
80
+ end
81
+ end
82
+ end
83
+ end
84
+ end
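The client above slices the entries into batches of 25 and rebuilds a global response id from the batch number and the item's position. A batch API that reports per-item failures can return fewer results than inputs, so the position reported by each result, not its place in the result list, is what identifies the input it belongs to. The sketch below illustrates that mapping without calling AWS; `fake_batch_call` and `response_ids` are hypothetical names, and the stub simply treats empty texts as failed items.

```ruby
MAX_BATCH_SIZE = 25

# Hypothetical stand-in for a batch API: returns one result per successful
# input, each carrying the input's position within the batch.
def fake_batch_call(texts)
  texts.each_with_index
       .reject { |text, _index| text.empty? }
       .map { |text, index| { index: index, length: text.length } }
end

# Collects the global "r_N" ids of all entries that produced a result.
def response_ids(entries)
  ids = []
  entries.each_slice(MAX_BATCH_SIZE).with_index do |batch, batch_idx|
    texts = batch.map { |e| e[:answer].to_s.strip }
    fake_batch_call(texts).each do |res|
      # Global position = batch offset + position reported by the result.
      entry_index = (batch_idx * MAX_BATCH_SIZE) + res[:index]
      ids << "r_#{entry_index + 1}"
    end
  end
  ids
end

# 27 entries span two batches; the second entry is empty and is skipped.
entries = Array.new(27) { |i| { answer: i == 1 ? "" : "answer #{i}" } }
ids = response_ids(entries)
```

Here the skipped second entry leaves a gap at `r_2`, while the entries after it keep the ids that match their original positions.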