sentimentscopeai 1.0.0 (tar.gz)
- sentimentscopeai-1.0.0/LICENSE +21 -0
- sentimentscopeai-1.0.0/PKG-INFO +244 -0
- sentimentscopeai-1.0.0/README.md +229 -0
- sentimentscopeai-1.0.0/pyproject.toml +23 -0
- sentimentscopeai-1.0.0/sentimentscopeai/__init__.py +2 -0
- sentimentscopeai-1.0.0/sentimentscopeai/core.py +373 -0
- sentimentscopeai-1.0.0/sentimentscopeai.egg-info/PKG-INFO +244 -0
- sentimentscopeai-1.0.0/sentimentscopeai.egg-info/SOURCES.txt +10 -0
- sentimentscopeai-1.0.0/sentimentscopeai.egg-info/dependency_links.txt +1 -0
- sentimentscopeai-1.0.0/sentimentscopeai.egg-info/requires.txt +2 -0
- sentimentscopeai-1.0.0/sentimentscopeai.egg-info/top_level.txt +1 -0
- sentimentscopeai-1.0.0/setup.cfg +4 -0

sentimentscopeai-1.0.0/LICENSE
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2025 Vignesh Thondikulam

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

sentimentscopeai-1.0.0/PKG-INFO
@@ -0,0 +1,244 @@
Metadata-Version: 2.4
Name: sentimentscopeai
Version: 1.0.0
Summary: Transformer-based review sentiment analysis and actionable insight generation.
Author: Vignesh Thondikulam
License: MIT
Project-URL: Homepage, https://github.com/VigneshT24/SentimentScopeAI
Project-URL: Repository, https://github.com/VigneshT24/SentimentScopeAI
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: torch
Requires-Dist: transformers
Dynamic: license-file

# SentimentScopeAI
## Fine-Grained Review Sentiment Analysis & Insight Generation

SentimentScopeAI is a Python-based NLP system that uses PyTorch and pre-trained HuggingFace Transformers models to go beyond binary sentiment classification. It analyzes, interprets, and reasons over collections of user reviews to help companies improve their products and services.

Rather than treating sentiment analysis as a black-box prediction task, this project focuses on semantic interpretation, explainability, and the generation of aggregated insights, simulating how a human analyst would read and summarize large volumes of feedback.

## Project Motivation

SentimentScopeAI is designed to answer deeper, more practical questions:
* What does a numerical rating actually mean in context?
* How consistent are opinions across many reviews?
* What actionable advice can be derived from collective sentiment?

## Features

1.) Pre-Trained Sentiment Modeling (PyTorch + HuggingFace)
* Uses pre-trained transformer models from HuggingFace
* Integrated via PyTorch for inference and extensibility
* Enables robust sentiment understanding without training from scratch
* Designed so downstream logic operates on model outputs, not raw text

2.) Rating Meaning Inference
* Implemented the infer_rating_meaning() function
* Converts numerical ratings (1–5) into semantic interpretations
* Uses sentiment signals, linguistic tone, and contextual cues
* Handles:
  * Mixed sentiment
  * Neutral or ambiguous phrasing
  * Disagreement between rating score and review text

Example:
```
Rating: 3
→ "Mixed experience with noticeable positives and recurring issues."
```
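The step from per-review star predictions to a rating meaning can be sketched in plain Python. This is a minimal illustration, assuming the predictions are already available as integers from 1 to 5; `average_stars`, `rating_to_sentiment`, and the label strings are hypothetical, while the band boundaries follow the thresholds used in `core.py`. The package itself additionally rewrites the chosen sentence with a T5 paraphrasing model.

```python
# Hypothetical helpers for illustration; not part of the sentimentscopeai API.
# The half-open bands mirror the thresholds in core.py's __infer_rating_meaning().
SENTIMENT_BANDS = [
    (1.0, 2.0, "very negative"),
    (2.0, 3.0, "negative"),
    (3.0, 4.0, "mixed"),
    (4.0, 5.0, "positive"),
]


def average_stars(predicted: list[int]) -> float:
    """Average the per-review star predictions (each an int from 1 to 5)."""
    if not predicted:
        raise ValueError("no reviews to average")
    return sum(predicted) / len(predicted)


def rating_to_sentiment(rating: float) -> str:
    """Map an average 1-5 star rating to a coarse sentiment label."""
    if not 1.0 <= rating <= 5.0:
        raise ValueError(f"rating must be between 1 and 5, got {rating}")
    for low, high, label in SENTIMENT_BANDS:
        if low <= rating < high:
            return label
    return "very positive"  # only an average of exactly 5.0 reaches this line


rating = average_stars([2, 3, 4, 4])  # (2 + 3 + 4 + 4) / 4 = 3.25
print(rating, rating_to_sentiment(rating))  # 3.25 mixed
```

Raising on out-of-range averages, rather than silently picking a band, makes an empty or unreadable review file fail loudly instead of being reported as a sentiment.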

3.) Structured Review Ingestion
* Reviews are parsed in a structured format (JSON / Python objects)
* Each review preserves:
  * Company name
  * Service or product name
  * Full review text
* Enables batch analysis across multiple reviews per service

4.) Explainable, Deterministic Pipeline
* Downstream reasoning is transparent and testable
* No opaque end-to-end predictions
* Model outputs are interpreted rather than blindly trusted
* Designed for debugging, auditing, and future research extension

5.) Cross-Review Advice Generation
* Reads all reviews for a given product or service
* Aggregates sentiment signals across users
* Detects recurring strengths and weaknesses
* Generates actionable advice for stakeholders

These steps take the system from analysis → reasoning → recommendation generation.

Example:
```
For <Service Name>: overall sentiment is mixed reflecting a balance
of positive and negative feedback

The following specific issues were extracted from negative reviews:

1) missed a few appointments
2) not signed into the right account
3) interface is horrible
4) find the interface confusing
5) invitations and acceptances are terrible
```
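Merging per-review problem phrases into a numbered list like the example output above can be sketched as follows. It assumes each negative review has already been reduced to short phrases (in the package, the Flan-T5 extraction step does this); `format_issue_report` and its inputs are hypothetical, and the package's own `generate_summary()` may merge and word issues differently. Dropping outputs of three characters or fewer mirrors the note in `__extract_negative_aspects()`.

```python
# Hypothetical helper for illustration; not the package's generate_summary().
def format_issue_report(service_name: str, overall: str, issues_per_review: list[list[str]]) -> str:
    """Merge per-review issue phrases into a numbered, de-duplicated report."""
    seen = set()
    merged = []
    for phrases in issues_per_review:
        for phrase in phrases:
            key = " ".join(phrase.lower().split())  # normalize case and whitespace
            # Skip repeats and very short outputs (3 characters or fewer), likely artifacts.
            if len(key) > 3 and key not in seen:
                seen.add(key)
                merged.append(key)
    lines = [
        f"For {service_name}: {overall}",
        "",
        "The following specific issues were extracted from negative reviews:",
        "",
    ]
    lines += [f"{number}) {issue}" for number, issue in enumerate(merged, start=1)]
    return "\n".join(lines)


report = format_issue_report(
    "Google Calendar",
    "overall sentiment is mixed",
    [["missed a few appointments", "Interface is horrible"], ["interface is  horrible", "bad"]],
)
print(report)
```

Normalizing before de-duplicating lets two reviewers who describe the same problem with different casing or spacing count as one issue.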

## System Architecture Overview

```
Reviews
↓
Pre-trained Transformer (HuggingFace + PyTorch)
↓
Sentiment Signals
↓
Rating Meaning Inference
↓
Cross-Review Aggregation
↓
Advice Generation
```

## Tech-Stack

* **Language**: Python
* **Deep Learning**: PyTorch
* **NLP Models**: pre-trained HuggingFace Transformers models (`nlptown/bert-base-multilingual-uncased-sentiment` for star ratings, `Vamsi/T5_Paraphrase_Paws` for paraphrasing) and Flan-T5 (`google/flan-t5-large`) for problem extraction
* **Concepts**:
  * Sentiment analysis
  * Semantic interpretation
  * Explainable AI
  * Aggregated reasoning
* **Data Handling**: JSON, Python data structures

## Project Structure (Simplified)

```
SentimentScopeAI/
│
├── sentimentscopeAI.py # Core sentiment + inference logic
├── README.md # Documentation
└── requirements.txt # Dependencies (PyTorch, Transformers)
```

## Why SentimentScopeAI?

Every organization collects feedback, but reading hundreds or thousands of reviews is time-consuming, inconsistent, and difficult to scale. Important insights are often buried in repetitive comments, while actionable criticism gets overlooked.

SentimentScopeAI is designed to do the heavy lifting:
* Reads and analyzes large volumes of reviews automatically
* Identifies recurring pain points across users
* Distills unstructured feedback into clear, constructive, and actionable advice
* Helps teams focus on what to improve rather than sorting through raw text

## Installation & Usage

SentimentScopeAI is distributed as a Python package and can be installed via pip:

```
pip install sentimentscopeai
```

Requirements:
* Python 3.9 or newer (Python 3.10 or above is recommended for best performance and compatibility)
* PyTorch
* HuggingFace Transformers
* Internet connection on first run (to download pre-trained models)

All required dependencies are automatically installed with the package.

## Basic Usage

```python
from sentimentscopeai import SentimentScopeAI

# Relative paths are resolved against the current working directory (the class calls
# os.path.abspath), so pass e.g. "Testing/companyreview.json" relative to where you
# run Python, or pass an absolute path.
review_bot = SentimentScopeAI("Testing/companyreview.json")

print(review_bot.generate_summary())
```

What Happens Internally

* Reviews are parsed from a structured JSON file
* A star rating (1–5) is predicted for each review with a pre-trained BERT sentiment model (PyTorch + HuggingFace)
* The average rating is mapped to a sentiment category, which is then rephrased by a T5 paraphrasing model
* Flan-T5 extracts specific, actionable problems from negative reviews, which feed the final advice

## IMPORTANT NOTICE:

1.) Strict JSON Input Format (Required)

SentimentScopeAI only accepts JSON input.
The review file must follow this exact structure:

```json
[
  {
    "company_name": "Company Name",
    "service_name": "Product or Service Name",
    "review": "Full user review text goes here."
  },
  {
    "company_name": "Company Name",
    "service_name": "Product or Service Name",
    "review": "Another review text."
  }
]
```

Missing fields, incorrect keys, or non-JSON formats will cause parsing errors.

2.) JSON Must Be Valid

* File must be UTF-8 encoded
* No trailing commas
* No comments
* Must be a list ([]), not a single object

You can use a JSON validator if you are unsure.

3.) One Company & One Service per JSON File (Required)

Example:
```json
[
  {
    "company_name": "Google",
    "service_name": "Google Calendar",
    "review": "The interface is clean and easy to use."
  },
  {
    "company_name": "Google",
    "service_name": "Google Calendar",
    "review": "Reminders are helpful, but syncing can be slow."
  }
]
```

This restriction is intentional:

* Sentiment aggregation assumes a single shared context
* Advice generation relies on consistent product-level patterns
* Mixing services can produce misleading summaries and recommendations

If you need to analyze multiple products or companies, create separate JSON files and run SentimentScopeAI independently for each dataset.
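Before running the package, a review file can be checked against these rules with the standard library alone. The sketch below is hypothetical (`validate_review_file` and `validate_review_data` are not part of SentimentScopeAI): it reads the file as UTF-8, parses it with Python's strict `json` module, and reports a non-list top level, missing or non-string fields, and files that mix companies or services.

```python
import json
from pathlib import Path

# Hypothetical pre-flight check; not part of the sentimentscopeai package.
REQUIRED_KEYS = ("company_name", "service_name", "review")


def validate_review_data(data) -> list[str]:
    """Return a list of problems; an empty list means the structure looks usable."""
    if not isinstance(data, list):
        return ["top level must be a JSON list ([]), not a single object"]
    if not data:
        return ["the list contains no reviews"]
    problems = []
    for index, entry in enumerate(data):
        if not isinstance(entry, dict):
            problems.append(f"entry {index} is not a JSON object")
            continue
        for key in REQUIRED_KEYS:
            if not isinstance(entry.get(key), str):
                problems.append(f"entry {index} is missing string field '{key}'")
    entries = [e for e in data if isinstance(e, dict)]
    companies = {e.get("company_name") for e in entries}
    services = {e.get("service_name") for e in entries}
    if len(companies) > 1 or len(services) > 1:
        problems.append("file mixes companies or services; use one file per service")
    return problems


def validate_review_file(path: str) -> list[str]:
    """Read the file as UTF-8, parse it as strict JSON, then check its structure."""
    try:
        data = json.loads(Path(path).read_text(encoding="utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError) as exc:
        # Python's json module rejects trailing commas and comments.
        return [f"not valid UTF-8 JSON: {exc}"]
    return validate_review_data(data)


sample = [
    {"company_name": "Google", "service_name": "Google Calendar", "review": "The interface is clean and easy to use."},
    {"company_name": "Google", "service_name": "Google Drive", "review": "Uploads are slow."},
]
print(validate_review_data(sample))
```

On a well-formed single-service file such as the Google Calendar example, `validate_review_file(...)` returns an empty list; a missing file still raises `FileNotFoundError`, so path mistakes are not mistaken for format problems.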

4.) Model Loading Behavior

* Transformer models are lazy-loaded: each model and tokenizer is downloaded and loaded only when it is first needed
* First run may take longer due to:
  * Model downloads
  * Tokenizer initialization
* Subsequent runs are significantly faster because downloaded models are cached locally by HuggingFace

This design keeps start-up fast and avoids holding models in memory that a given run never uses.

**SentimentScopeAI is provided as-is; its authors are not liable for any damages arising from its use.
Do not include personal, sensitive, or confidential information in review data.
All input data is processed locally and is not used for model training or retained beyond execution.
Users are responsible for ensuring ethical and appropriate use of the system.**

sentimentscopeai-1.0.0/pyproject.toml
@@ -0,0 +1,23 @@
[build-system]
requires = ["setuptools>=61.0", "wheel"]
build-backend = "setuptools.build_meta"

[project]
name = "sentimentscopeai"
version = "1.0.0"
description = "Transformer-based review sentiment analysis and actionable insight generation."
readme = "README.md"
requires-python = ">=3.9"
license = { text = "MIT" }
authors = [
    { name = "Vignesh Thondikulam" }
]

dependencies = [
    "torch",
    "transformers"
]

[project.urls]
Homepage = "https://github.com/VigneshT24/SentimentScopeAI"
Repository = "https://github.com/VigneshT24/SentimentScopeAI"

sentimentscopeai-1.0.0/sentimentscopeai/core.py
@@ -0,0 +1,373 @@
import torch
import json
import os
import string
import random
import textwrap
from transformers import (AutoTokenizer, AutoModelForSequenceClassification, AutoModelForSeq2SeqLM, T5ForConditionalGeneration, T5Tokenizer, set_seed)

class SentimentScopeAI:
    ## Private attributes
    __hf_model_name = None
    __hf_tokenizer = None
    __hf_model = None
    __pytorch_model_name = None
    __pytorch_tokenizer = None
    __pytorch_model = None
    __json_file_path = None
    __service_name = None
    __device = None
    __notable_negatives = []
    __extraction_model = None
    __extraction_tokenizer = None

    def __init__(self, file_path):
        """Initialize the SentimentScopeAI class with the specified JSON file path."""
        self.__hf_model_name = "Vamsi/T5_Paraphrase_Paws"
        self.__pytorch_model_name = "nlptown/bert-base-multilingual-uncased-sentiment"
        self.__extraction_model_name = "google/flan-t5-large"
        self.__json_file_path = os.path.abspath(file_path)
        self.__device = "cuda" if torch.cuda.is_available() else "cpu"

    @property
    def hf_model(self):
        """Lazy loader for the Paraphrase Model."""
        if self.__hf_model is None:
            self.__hf_model = AutoModelForSeq2SeqLM.from_pretrained(self.__hf_model_name)
        return self.__hf_model

    @property
    def hf_tokenizer(self):
        """Lazy loader for the T5 Tokenizer."""
        if self.__hf_tokenizer is None:
            self.__hf_tokenizer = T5Tokenizer.from_pretrained(self.__hf_model_name, legacy=True)
        return self.__hf_tokenizer

    @property
    def pytorch_tokenizer(self):
        """Lazy loader for the PyTorch Tokenizer."""
        if self.__pytorch_tokenizer is None:
            self.__pytorch_tokenizer = AutoTokenizer.from_pretrained(self.__pytorch_model_name)
        return self.__pytorch_tokenizer

    @property
    def pytorch_model(self):
        """Lazy loader for the PyTorch Model."""
        if self.__pytorch_model is None:
            self.__pytorch_model = AutoModelForSequenceClassification.from_pretrained(
                self.__pytorch_model_name
            ).to(self.__device)
        return self.__pytorch_model

    @property
    def extraction_model(self):
        """Lazy loader for the Flan-T5 extraction model."""
        if self.__extraction_model is None:
            self.__extraction_model = T5ForConditionalGeneration.from_pretrained(
                self.__extraction_model_name
            ).to(self.__device)
        return self.__extraction_model

    @property
    def extraction_tokenizer(self):
        """Lazy loader for the Flan-T5 tokenizer."""
        if self.__extraction_tokenizer is None:
            self.__extraction_tokenizer = AutoTokenizer.from_pretrained(
                self.__extraction_model_name
            )
        return self.__extraction_tokenizer

    def __get_predictive_star(self, text) -> int:
        """
        Predict the sentiment star rating for the given text review.

        Args:
            text (str): The text review to analyze.
        Returns:
            int: The predicted star rating (1 to 5).
        """
        # Truncate to BERT's 512-token limit so very long reviews do not raise an error.
        inputs = self.pytorch_tokenizer(
            text, return_tensors="pt", truncation=True, max_length=512
        ).to(self.__device)

        with torch.no_grad():
            outputs = self.pytorch_model(**inputs)

        logits = outputs.logits
        prediction = torch.argmax(logits, dim=-1).item()

        # The model's class indices 0-4 correspond to 1-5 stars.
        num_star = prediction + 1
        return num_star

    def __calculate_all_review(self) -> float:
        """
        Calculate the average predicted star rating across all reviews in the JSON file.

        Args:
            None
        Returns:
            float: The average predicted star rating (1.0 to 5.0), or 0.0 if the
                file is empty or could not be read.
        """
        try:
            # The README requires UTF-8 input, so read it explicitly as UTF-8.
            with open(self.__json_file_path, 'r', encoding='utf-8') as reviews_file:
                all_reviews = json.load(reviews_file)
            total = 0
            for entry in all_reviews:
                total += self.__get_predictive_star(entry['review'])
                self.__service_name = entry['service_name']
            return (total / len(all_reviews)) if all_reviews else 0.0
        except FileNotFoundError:
            print("The JSON file you provided doesn't exist. Please provide a valid company review file.")
        except json.JSONDecodeError:
            print("Could not decode JSON file. Check for valid JSON syntax.")
        except PermissionError:
            print("Permission denied to open the JSON file.")
        except Exception as e:
            print(f"An unexpected error occurred: {e}")
        # Reached only after an error was reported above; 0.0 signals "no usable rating".
        return 0.0

    def __paraphrase_statement(self, statement: str) -> list[str]:
        """Generates multiple unique paraphrased variations of a given string.

        Uses a Hugging Face transformer model to generate five variations of the
        input statement. Results are normalized (lowercased, stripped of
        punctuation, and whitespace-cleaned) to ensure uniqueness.

        Args:
            statement (str): The text to be paraphrased.

        Returns:
            list[str]: A list of unique, cleaned paraphrased strings.
                Returns [""] if the input is None, empty, or whitespace.
        """
        set_seed(random.randint(0, 2**32 - 1))

        if statement is None or statement.isspace() or statement == "":
            return [""]

        prompt = f"paraphrase: {statement}"
        encoder = self.hf_tokenizer(prompt, return_tensors="pt", truncation=True)

        output = self.hf_model.generate(
            **encoder,
            max_length=48,
            do_sample=True,
            top_p=0.99,
            top_k=50,
            temperature=1.0,
            num_return_sequences=5,
            repetition_penalty=1.2,
        )

        resultant = self.hf_tokenizer.batch_decode(output, skip_special_tokens=True)

        seen = set()
        translator = str.maketrans('', '', string.punctuation)

        for list_sentence in resultant:
            # Lowercase, drop punctuation, and trim trailing spaces so near-identical outputs collapse.
            list_sentence = list_sentence.lower().strip()
            list_sentence = list_sentence.translate(translator).rstrip(' ')
            seen.add(list_sentence)

        unique = list(seen)
        return unique

    def __infer_rating_meaning(self) -> str:
        """Translates numerical rating scores into descriptive, paraphrased sentiment.

        Calculates the aggregate review score and maps it to a sentiment category
        (ranging from 'Very Negative' to 'Very Positive'). To avoid repetitive
        output, the final description is passed through an AI paraphrasing
        engine and a random variation is selected.

        Returns:
            str: A randomly selected paraphrased sentence describing the
                overall service sentiment.
        """
        overall_rating = self.__calculate_all_review()

        def generate_sentence(rating_summ):
            return f"For {self.__service_name}: " + random.choice(self.__paraphrase_statement(rating_summ)).strip()

        # No usable rating (empty or unreadable file): without this check the value
        # would fall through to the "very positive" branch below.
        if overall_rating is None or overall_rating < 1.0:
            return "No reviews could be analyzed. Check that the JSON file exists and follows the required format."
        if 1.0 <= overall_rating < 2.0:
            return generate_sentence("Overall sentiment is very negative, indicating widespread dissatisfaction among users.")
        elif 2.0 <= overall_rating < 3.0:
            return generate_sentence("Overall sentiment is negative, suggesting notable dissatisfaction across reviews.")
        elif 3.0 <= overall_rating < 4.0:
            return generate_sentence("Overall sentiment is mixed, reflecting a balance of positive and negative feedback.")
        elif 4.0 <= overall_rating < 5.0:
            return generate_sentence("Overall sentiment is positive, indicating general user satisfaction.")
        else:
            return generate_sentence("Overall sentiment is very positive, reflecting strong user approval and satisfaction.")
|
|
207
|
+
|
|
    def __extract_negative_aspects(self, review: str) -> list[str]:
        """
        Extract actionable negative aspects from a review using AI-based text generation.

        This method uses the Flan-T5 language model to identify specific, constructive
        problems mentioned in a review. Unlike simple sentiment analysis, this extracts
        concrete issues that describe what is broken, missing, or difficult, filtering
        out vague emotional words like "horrible" or "bad".

        The extraction focuses on actionable feedback that can help improve a product
        or service, such as "notifications arrive at wrong times" rather than just
        "notifications are bad".

        Args:
            review (str): The review text to analyze for negative aspects.

        Returns:
            list[str]: A list of specific problem phrases extracted from the review.
                Each phrase describes a concrete issue. Returns an empty list
                if the review is empty, contains only whitespace, or no
                problems are identified.

        Note:
            This method uses the Flan-T5 model, which is loaded lazily on first use.
            Processing time depends on review length and available hardware (CPU/GPU).
            Extracted lines of 3 characters or fewer are filtered out as likely artifacts.
        """
        if not review or review.isspace():
            return []

        prompt = f"""What problems does this review mention? List each as a brief phrase.
Each problem should be describing what is wrong, DON'T OUTPUT one word lines like "horrible" or "bad".
Make sure they are CONSTRUCTIVE CRITICISM THAT CAN HELP SOMEONE IMPROVE

Review: {review}

Problems mentioned:"""

        inputs = self.extraction_tokenizer(
            prompt,
            return_tensors="pt",
            max_length=512,
            truncation=True
        ).to(self.__device)

        outputs = self.extraction_model.generate(
            **inputs,
            max_length=150,
            num_beams=4,
            temperature=0.7,
            do_sample=True,
            top_p=0.9,
            early_stopping=True
        )

        result = self.extraction_tokenizer.decode(outputs[0], skip_special_tokens=True)
        if result.strip().lower() in ['none', 'none.', 'no problems', '']:
            return []

        issues = []
        for line in result.split('\n'):
            line = line.strip()
            line = line.lstrip('•-*1234567890.) ')
            if line and len(line) > 3:
                issues.append(line)

        return issues

    def output_all_reviews(self) -> None:
        """
        Print every review in the JSON file in a readable format.

        Args:
            None
        Returns:
            None
        """
        try:
            with open(self.__json_file_path, 'r') as file:
                company_reviews = json.load(file)
                for i, entry in enumerate(company_reviews, 1):
                    print(f"Review #{i}")
                    print(f"Company Name: {entry['company_name']}")
                    print(f"Service Name: {entry['service_name']}")
                    print(f"Review: {textwrap.fill(entry['review'], width=70)}")
                    print("\n\n")
        except FileNotFoundError:
            print("The JSON file you inputted doesn't exist. Please input a valid company review file.")
        except json.JSONDecodeError:
            print("Could not decode JSON file. Check for valid JSON syntax.")
        except PermissionError:
            print("Permission denied to open the JSON file.")
        except Exception as e:
            print(f"An unexpected error occurred: {e}")

    def generate_summary(self) -> str:
        """
        Generate a formatted sentiment summary based on user reviews for a service.

        This method reads a JSON file containing user reviews, infers the overall
        sentiment rating, and produces a structured, human-readable summary.
        The summary includes:
            - A concise, paraphrased explanation of the inferred sentiment rating
            - A numbered list of specific issues extracted from reviews whose
              predicted star rating is 2 or lower

        Long items are wrapped to a fixed line width (70 characters) while
        preserving list structure and readability.

        File-access and JSON-decoding errors are caught and reported with
        descriptive messages instead of being raised.

        Returns:
            str
                A multi-paragraph, text-wrapped sentiment summary suitable for
                console output, logs, or reports.

        Raises:
            None
                Errors raised while reading or decoding the review file are
                handled internally with descriptive messages to prevent
                interruption of execution.
        """
        try:
            reviews = []
            with open(self.__json_file_path, 'r') as file:
                company_reviews = json.load(file)
                for i, entry in enumerate(company_reviews, 1):
                    if self.__get_predictive_star(entry['review']) <= 2:
                        for part in self.__extract_negative_aspects(entry['review']):
                            self.__notable_negatives.append(part)
                    self.__service_name = entry['service_name']
                    reviews.append(entry['review'])
        except FileNotFoundError:
            print("The JSON file you inputted doesn't exist. Please input a valid company review file.")
        except json.JSONDecodeError:
            print("Could not decode JSON file. Check for valid JSON syntax.")
        except PermissionError:
            print("Permission denied to open the JSON file.")
        except Exception as e:
            print(f"An unexpected error occurred: {e}")

        def format_numbered_list(items):
            if not items:
                return "None found"

            lines = []
            for i, item in enumerate(items, start=1):
                prefix = f"{i}) "
                wrapper = textwrap.TextWrapper(
                    width=70,
                    initial_indent=prefix,
                    subsequent_indent=" " * len(prefix) + " "
                )
                lines.append(wrapper.fill(str(item)))
            return "\n".join(lines)

        rating_meaning = self.__infer_rating_meaning()

        parts = [
            textwrap.fill(rating_meaning, width=70),
            textwrap.fill("The following reviews highlight some concerns users have expressed:", width=70),
            format_numbered_list(self.__notable_negatives)
        ]

        return "\n\n".join(parts)
@@ -0,0 +1,244 @@
Metadata-Version: 2.4
Name: sentimentscopeai
Version: 1.0.0
Summary: Transformer-based review sentiment analysis and actionable insight generation.
Author: Vignesh Thondikulam
License: MIT
Project-URL: Homepage, https://github.com/VigneshT24/SentimentScopeAI
Project-URL: Repository, https://github.com/VigneshT24/SentimentScopeAI
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: torch
Requires-Dist: transformers
Dynamic: license-file

# SentimentScopeAI
## Fine-Grained Review Sentiment Analysis & Insight Generation

SentimentScopeAI is a Python-based NLP system that uses PyTorch and pre-trained HuggingFace Transformers models to move beyond binary sentiment classification. It analyzes, interprets, and reasons over collections of user reviews to help companies improve their products and services.

Rather than treating sentiment analysis as a black-box prediction task, this project focuses on semantic interpretation, explainability, and the generation of aggregated insights, simulating how a human analyst would read and summarize large volumes of feedback.

## Project Motivation

SentimentScopeAI is designed to answer deeper, more practical questions than a single positive/negative label can:
* What does a numerical rating actually mean in context?
* How consistent are opinions across many reviews?
* What actionable advice can be derived from collective sentiment?

## Features

1.) Pre-Trained Sentiment Modeling (PyTorch + HuggingFace)
* Uses pre-trained transformer models from HuggingFace
* Integrated via PyTorch for inference and extensibility
* Enables robust sentiment understanding without training from scratch
* Designed so downstream logic operates on model outputs, not raw text

2.) Rating Meaning Inference
* Implemented in the internal `__infer_rating_meaning()` method, which `generate_summary()` calls
* Converts the aggregate rating (1–5) into one of five semantic interpretations, from "very negative" to "very positive"
* Uses sentiment signals, linguistic tone, and contextual cues
* Handles:
  * Mixed sentiment
  * Neutral or ambiguous phrasing
  * Disagreement between rating score and review text

Example:
```
Rating: 3
→ "Mixed experience with noticeable positives and recurring issues."
```

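The five interpretations can be sketched as a plain threshold function. It mirrors the `if`/`elif` thresholds in `__infer_rating_meaning()` in core.py but leaves out the paraphrasing step. The function name `rating_band` and its short labels are illustrative and are not part of the package API.

```python
def rating_band(overall_rating: float) -> str:
    """Map an aggregate 1-5 rating to a sentiment label (illustrative helper)."""
    if 1.0 <= overall_rating < 2.0:
        return "very negative"
    elif 2.0 <= overall_rating < 3.0:
        return "negative"
    elif 3.0 <= overall_rating < 4.0:
        return "mixed"
    elif 4.0 <= overall_rating < 5.0:
        return "positive"
    # As in core.py, everything else (5.0, and also values below 1.0) lands here.
    return "very positive"


for score in (1.4, 2.0, 3.0, 4.5, 5.0):
    print(f"{score} -> {rating_band(score)}")
```

Each lower bound is inclusive. An average of exactly 3.0 therefore reads as "mixed", which matches the `Rating: 3` example above.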
3.) Structured Review Ingestion
* Reviews are loaded from a structured JSON file into Python objects
* Each review preserves:
  * Company name
  * Service or product name
  * Full review text
* Enables batch analysis across multiple reviews per service

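Here is a minimal sketch of ingesting the file into typed Python objects. It assumes the three-key format described under IMPORTANT NOTICE later in this README. The `Review` dataclass and `load_reviews` function are hypothetical; core.py itself works with the plain dictionaries returned by `json.load`.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Review:
    """One entry of the review file (hypothetical type, not part of the package)."""
    company_name: str
    service_name: str
    review: str


def load_reviews(path: str) -> list[Review]:
    """Read a JSON list of review objects into Review instances."""
    with open(path, "r", encoding="utf-8") as file:
        entries = json.load(file)
    # Review(**entry) raises TypeError if a key is missing or unexpected.
    return [Review(**entry) for entry in entries]
```

Unpacking each entry into a frozen dataclass gives a loud error on a wrong key, rather than a later `KeyError` deep in the analysis.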
4.) Explainable Pipeline
* Downstream reasoning is transparent and testable
* No opaque end-to-end predictions
* Model outputs are interpreted rather than blindly trusted
* Designed for debugging, auditing, and future research extension
* Note: output wording is not fully deterministic. Paraphrase selection uses `random.choice`, and Flan-T5 issue extraction samples with `do_sample=True`

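Two steps in core.py are stochastic. `__infer_rating_meaning()` picks a paraphrase with `random.choice`, and `__extract_negative_aspects()` calls `generate()` with `do_sample=True`. If you need repeatable wording in tests, one option is to seed the generators first; this is an assumption, not a documented package feature. The sketch below covers only the `random` half and uses stand-in paraphrases. For the model side you would additionally need something like `torch.manual_seed` or `transformers.set_seed`.

```python
import random

# Stand-in paraphrases; in core.py these come from __paraphrase_statement().
variants = [
    "Overall sentiment is mixed.",
    "Reviews show a balance of praise and complaints.",
    "Users report both positives and negatives.",
]


def pick_variant(seed: int) -> str:
    # random.choice draws from the module-level generator, which random.seed resets.
    random.seed(seed)
    return random.choice(variants)


print(pick_variant(7) == pick_variant(7))  # prints True: same seed, same pick
```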
5.) Cross-Review Advice Generation
* Read all reviews for a given product or service
* Aggregate sentiment signals across users
* Detect recurring strengths and weaknesses
* Generate actionable advice for stakeholders

These steps transition the system from analysis → reasoning → recommendation generation.

Example:
```
For <Service Name>: overall sentiment is mixed reflecting a balance
of positive and negative feedback

The following specific issues were extracted from negative reviews:

1) missed a few appointments
2) not signed into the right account
3) interface is horrible
4) find the interface confusing
5) invitations and acceptances are terrible
```

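The numbered issue list is built by a helper nested inside `generate_summary()` in core.py. It uses `textwrap.TextWrapper` with a 70-character width and a hanging indent. Below, the same logic is lifted out as a standalone function; the `width` parameter is an addition for experimentation.

```python
import textwrap


def format_numbered_list(items: list[str], width: int = 70) -> str:
    """Render items as '1) ...', '2) ...' with wrapped, indented continuation lines."""
    if not items:
        return "None found"

    lines = []
    for i, item in enumerate(items, start=1):
        prefix = f"{i}) "
        wrapper = textwrap.TextWrapper(
            width=width,
            initial_indent=prefix,
            # Continuation lines are indented one column past the prefix.
            subsequent_indent=" " * len(prefix) + " ",
        )
        lines.append(wrapper.fill(str(item)))
    return "\n".join(lines)


print(format_numbered_list(["missed a few appointments", "find the interface confusing"]))
```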
## System Architecture Overview

```
Reviews
↓
Pre-trained Transformer (HuggingFace + PyTorch)
↓
Sentiment Signals
↓
Rating Meaning Inference
↓
Cross-Review Aggregation
↓
Advice Generation
```

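The flow in the diagram can be sketched with the model-backed stages passed in as plain functions. That keeps the control flow testable without downloading any models. The 2-star cutoff for issue extraction matches `generate_summary()` in core.py. Averaging the predicted stars is an assumption about what `__calculate_all_review()` computes. `summarize` and the lambda stand-ins are hypothetical.

```python
from statistics import mean
from typing import Callable


def summarize(
    reviews: list[str],
    predict_star: Callable[[str], int],
    extract_issues: Callable[[str], list[str]],
    describe: Callable[[float], str],
) -> tuple[str, list[str]]:
    """Reviews -> sentiment signals -> rating meaning + aggregated issues."""
    # Sentiment signals: one predicted 1-5 star rating per review.
    stars = [predict_star(text) for text in reviews]
    # Cross-review aggregation: collect issues only from low-rated reviews.
    issues = [
        issue
        for text, star in zip(reviews, stars)
        if star <= 2
        for issue in extract_issues(text)
    ]
    # Rating meaning inference on the aggregate score (mean is an assumption).
    return describe(mean(stars)), issues


# Toy stand-ins for the transformer-backed stages.
verdict, issues = summarize(
    ["Syncing is slow and reminders arrive late.", "Clean, easy interface."],
    predict_star=lambda text: 1 if "slow" in text else 5,
    extract_issues=lambda text: ["syncing is slow", "reminders arrive late"],
    describe=lambda score: f"average {score:.1f}",
)
print(verdict, issues)
```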
## Tech-Stack

* **Language**: Python
* **Deep Learning**: PyTorch
* **NLP Models**: HuggingFace Transformers (pre-trained), Flan-T5
* **Concepts**:
  * Sentiment analysis
  * Semantic interpretation
  * Explainable AI
  * Aggregated reasoning
* **Data Handling**: JSON, Python data structures

## Project Structure (Simplified)

```
SentimentScopeAI/
│
├── sentimentscopeAI.py   # Core sentiment + inference logic
├── README.md             # Documentation
└── requirements.txt      # Dependencies (PyTorch, Transformers)
```

## Why SentimentScopeAI?

Every organization collects feedback, but reading hundreds or thousands of reviews is time-consuming, inconsistent, and difficult to scale. Important insights are often buried in repetitive comments, while actionable criticism gets overlooked.

SentimentScopeAI is designed to do the heavy lifting:
* Reads and analyzes large volumes of reviews automatically
* Identifies recurring pain points across users
* Distills unstructured feedback into clear, constructive, and actionable advice
* Helps teams focus on what to improve rather than sorting through raw text

## Installation & Usage

SentimentScopeAI is distributed as a Python package and can be installed via pip:

```
pip install sentimentscopeai
```

Requirements:
* Python 3.9 or newer (Python 3.10 or above is recommended for best performance and compatibility)
* PyTorch
* HuggingFace Transformers
* Internet connection on first run (to download pre-trained models)

All required dependencies are automatically installed with the package.

## Basic Usage

```python
from sentimentscopeai import SentimentScopeAI

# Pass the path to the JSON file relative to your current working directory
# (e.g. "Testing/companyreview.json") or an absolute path, not just the bare file name.
review_bot = SentimentScopeAI("Testing/companyreview.json")

print(review_bot.generate_summary())
```

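If your reviews start out as Python data, a small helper can write them in the required format before you pass the path to `SentimentScopeAI`. `write_review_file` is a hypothetical helper, not part of the package. It writes UTF-8 JSON with one shared company and service name, as the notices below require.

```python
import json
from pathlib import Path


def write_review_file(path: str, company: str, service: str, texts: list[str]) -> Path:
    """Write reviews as a JSON list of {company_name, service_name, review} objects."""
    entries = [
        {"company_name": company, "service_name": service, "review": text}
        for text in texts
    ]
    out = Path(path)
    out.parent.mkdir(parents=True, exist_ok=True)  # e.g. create "Testing/" if needed
    out.write_text(json.dumps(entries, indent=2, ensure_ascii=False), encoding="utf-8")
    return out


# Usage (hypothetical data):
# write_review_file("Testing/companyreview.json", "Google", "Google Calendar",
#                   ["The interface is clean and easy to use."])
```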
What Happens Internally

* Reviews are parsed from a structured JSON file
* Sentiment is inferred using pre-trained transformer models (PyTorch + HuggingFace)
* Rating meanings are semantically interpreted
* Flan-T5 extracts specific, actionable problems from reviews predicted at 2 stars or below, and these are listed in the summary

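`__extract_negative_aspects()` post-processes Flan-T5's raw output before adding it to the summary. That cleanup can be lifted out as a pure function for testing. The copy below keeps core.py's logic; the name `clean_issue_lines` is hypothetical.

```python
def clean_issue_lines(result: str) -> list[str]:
    """Turn raw model output into a list of issue phrases (mirrors core.py)."""
    if result.strip().lower() in ["none", "none.", "no problems", ""]:
        return []

    issues = []
    for line in result.split("\n"):
        # Strip bullet/number characters; lstrip treats its argument as a set.
        line = line.strip().lstrip("•-*1234567890.) ")
        # Drop lines of 3 characters or fewer, e.g. "bad".
        if line and len(line) > 3:
            issues.append(line)
    return issues


print(clean_issue_lines("1. syncing is slow\n- bad\n* reminders arrive late"))
```

`str.lstrip` removes any leading characters from the set rather than a fixed prefix. As a result, an issue that begins with a number loses it: `clean_issue_lines("3 days without sync")` returns `['days without sync']`.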
## IMPORTANT NOTICE:

1.) Strict JSON Input Format (Required)

SentimentScopeAI only accepts JSON input.
The review file must follow this exact structure:

```json
[
  {
    "company_name": "Company Name",
    "service_name": "Product or Service Name",
    "review": "Full user review text goes here."
  },
  {
    "company_name": "Company Name",
    "service_name": "Product or Service Name",
    "review": "Another review text."
  }
]
```

Missing fields, incorrect keys, or non-JSON formats will cause parsing errors.

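You can check a file against this structure yourself before running the analysis. The sketch below is a hypothetical helper, not part of the package. It reports every entry that is not an object with exactly the three string fields, so each problem surfaces with its index instead of as one generic error.

```python
import json

REQUIRED_KEYS = {"company_name", "service_name", "review"}


def find_format_problems(text: str) -> list[str]:
    """Return human-readable problems with a review file's JSON text (empty if valid)."""
    data = json.loads(text)  # raises json.JSONDecodeError on invalid syntax
    if not isinstance(data, list):
        return ["top level must be a JSON list ([...]), not a single object"]

    problems = []
    for index, entry in enumerate(data):
        if not isinstance(entry, dict):
            problems.append(f"entry {index}: expected an object")
            continue
        keys = set(entry)
        missing = REQUIRED_KEYS - keys
        extra = keys - REQUIRED_KEYS
        if missing:
            problems.append(f"entry {index}: missing {sorted(missing)}")
        if extra:
            problems.append(f"entry {index}: unexpected {sorted(extra)}")
        for key in sorted(REQUIRED_KEYS & keys):
            if not isinstance(entry[key], str):
                problems.append(f"entry {index}: '{key}' must be a string")
    return problems
```

It takes the file's text, so it can be called as `find_format_problems(Path("Testing/companyreview.json").read_text(encoding="utf-8"))`.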
2.) JSON Must Be Valid

* File must be UTF-8 encoded
* No trailing commas
* No comments
* Must be a list ([]), not a single object

You can use a JSON validator if you are unsure.

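core.py loads the file with Python's standard `json` module, which enforces these syntax rules. Trailing commas and comments are rejected with `json.JSONDecodeError`, the same exception core.py reports as "Could not decode JSON file." The hypothetical `is_valid_json` helper below offers a quick pre-check. Reading the file with an explicit `encoding="utf-8"` makes the check independent of the platform's default encoding.

```python
import json


def is_valid_json(text: str) -> tuple[bool, str]:
    """Return (True, "") if text parses as JSON, else (False, error location and message)."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return False, f"line {err.lineno}, column {err.colno}: {err.msg}"
    return True, ""


print(is_valid_json('[{"review": "ok"}]'))    # valid
print(is_valid_json('[{"review": "ok"},]'))   # trailing comma is rejected
print(is_valid_json('// comment\n[]'))        # comments are not JSON
```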
3.) One Company & One Service per JSON File (Required)

EXAMPLE:
```json
[
  {
    "company_name": "Google",
    "service_name": "Google Calendar",
    "review": "The interface is clean and easy to use."
  },
  {
    "company_name": "Google",
    "service_name": "Google Calendar",
    "review": "Reminders are helpful, but syncing can be slow."
  }
]
```

This restriction is intentional:

* Sentiment aggregation assumes a single shared context
* Advice generation relies on consistent product-level patterns
* Mixing services can produce misleading summaries and recommendations

If you need to analyze multiple products or companies, create separate JSON files and run SentimentScopeAI independently for each dataset.

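If your reviews currently live in one mixed list, group them by `(company_name, service_name)` and write each group to its own file before running SentimentScopeAI on each. One service per file also matters for the labeling. `generate_summary()` in core.py sets the service name from each entry as it reads the file. In a mixed file, the summary is therefore labeled with whichever service appears last. Here is a minimal standard-library sketch; `group_by_service` is a hypothetical helper.

```python
from collections import defaultdict


def group_by_service(entries: list[dict]) -> dict[tuple[str, str], list[dict]]:
    """Group review entries by (company_name, service_name), preserving order."""
    groups: dict[tuple[str, str], list[dict]] = defaultdict(list)
    for entry in entries:
        groups[(entry["company_name"], entry["service_name"])].append(entry)
    return dict(groups)


mixed = [
    {"company_name": "Google", "service_name": "Google Calendar", "review": "Clean interface."},
    {"company_name": "Google", "service_name": "Gmail", "review": "Search is fast."},
    {"company_name": "Google", "service_name": "Google Calendar", "review": "Syncing can be slow."},
]
for (company, service), group in group_by_service(mixed).items():
    print(company, service, len(group))
```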
4.) Model Loading Behavior

* Transformer models are lazy-loaded
* First run may take longer due to:
  * Model downloads
  * Tokenizer initialization
* Subsequent runs are significantly faster

This design improves startup efficiency and memory usage.

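Lazy loading means a model is created only the first time it is needed and is then reused. Here is a minimal sketch of the pattern using `functools.cached_property`. The `expensive_load` stand-in takes the place of what would in practice be a HuggingFace `from_pretrained(...)` call. This illustrates the general technique and does not claim to show how core.py implements it.

```python
from functools import cached_property


class LazyModels:
    """Defers an expensive load until first access (illustrative, not core.py)."""

    def __init__(self) -> None:
        self.load_count = 0

    def expensive_load(self) -> str:
        # Stand-in for downloading/initializing a model and tokenizer.
        self.load_count += 1
        return "model"

    @cached_property
    def model(self) -> str:
        # Runs on first access only; the result is cached on the instance.
        return self.expensive_load()


models = LazyModels()
print(models.load_count)  # 0: nothing loaded at construction
_ = models.model
_ = models.model
print(models.load_count)  # 1: loaded once, then reused
```

This pattern saves load time within a single process. The faster subsequent runs across processes come from HuggingFace caching downloaded files locally, which is separate from this pattern.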
**SentimentScopeAI is provided as-is and is not liable for any damages arising from its use.
Do not include personal, sensitive, or confidential information in review data.
All input data is processed locally and is not used for model training or retained beyond execution.
Users are responsible for ensuring ethical and appropriate use of the system.**
@@ -0,0 +1,10 @@
LICENSE
README.md
pyproject.toml
sentimentscopeai/__init__.py
sentimentscopeai/core.py
sentimentscopeai.egg-info/PKG-INFO
sentimentscopeai.egg-info/SOURCES.txt
sentimentscopeai.egg-info/dependency_links.txt
sentimentscopeai.egg-info/requires.txt
sentimentscopeai.egg-info/top_level.txt
@@ -0,0 +1 @@
@@ -0,0 +1 @@
sentimentscopeai