sentimentscopeai 1.2.2__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2025 Vignesh Thondikulam
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
@@ -0,0 +1,200 @@
1
+ Metadata-Version: 2.4
2
+ Name: sentimentscopeai
3
+ Version: 1.2.2
4
+ Summary: Transformer-based review sentiment analysis and actionable insight generation.
5
+ Author: Vignesh Thondikulam
6
+ License: MIT
7
+ Project-URL: Homepage, https://github.com/VigneshT24/SentimentScopeAI
8
+ Project-URL: Repository, https://github.com/VigneshT24/SentimentScopeAI
9
+ Requires-Python: >=3.9
10
+ Description-Content-Type: text/markdown
11
+ License-File: LICENSE
12
+ Requires-Dist: torch
13
+ Requires-Dist: transformers
14
+ Requires-Dist: sentencepiece
15
+ Dynamic: license-file
16
+
17
+ ![Sentiment Scope AI Logo](https://github.com/user-attachments/assets/493203ec-ed08-4ad7-8e1a-1c4aced128cb)
18
+
19
+
20
+ # SentimentScopeAI
21
+ ## Fine-Grained Review Sentiment Analysis & Insight Generation
22
+
23
+ SentimentScopeAI is a Python-based NLP system that leverages PyTorch and HuggingFace Transformers (pre-trained models) to move beyond binary sentiment classification and instead analyze, interpret, and reason over collections of user reviews to help companies improve their products/services.
24
+
25
+ Rather than treating sentiment analysis as a black-box prediction task, this project focuses on semantic interpretation, explainability, and the generation of aggregated insights, simulating how a human analyst would read and summarize large volumes of feedback.
26
+
27
+ ## Project Motivation
28
+
29
+ SentimentScopeAI is designed to answer this one main question:
30
+ * What actionable advice can be derived from collective sentiment?
31
+
32
+ ## Features
33
+
34
+ 1.) Pre-Trained Sentiment Modeling (PyTorch + HuggingFace)
35
+ * Uses pre-trained transformer models from HuggingFace
36
+ * Integrated via PyTorch for inference and extensibility
37
+ * Enables robust sentiment understanding without training from scratch
38
+ * Designed so downstream logic operates on model outputs, not raw text
39
+
40
+ 2.) Rating Meaning Inference
41
+ * Implemented the infer_rating_meaning() function
42
+ * Converts numerical ratings (1–5) into semantic interpretations
43
+ * Uses sentiment signals, linguistic tone, and contextual cues
44
+ * Handles:
45
+ * Mixed sentiment
46
+ * Neutral or ambiguous phrasing
47
+ * Disagreement between rating score and review text
48
+
49
+ Example:
50
+ ```
51
+ Rating: 3
52
+ → "Mixed experience with noticeable positives and recurring issues."
53
+ ```
54
+
55
+ 3.) Explainable, Deterministic Pipeline
56
+ * Downstream reasoning is transparent and testable
57
+ * No opaque end-to-end predictions
58
+ * Model outputs are interpreted rather than blindly trusted
59
+ * Designed for debugging, auditing, and future research extension
60
+
61
+ 4.) Summary Generation
62
+ * Read all reviews for a given product or service
63
+ * Aggregate sentiment signals across users
64
+ * Detect recurring strengths and weaknesses
65
+ * Generate a summary of all reviews to help stakeholders
66
+
67
+ These steps transition the system from analysis → reasoning → recommendation generation.
68
+
69
+ Example:
70
+ ```
71
+ For <Company Name>'s <Service Name>: overall sentiment is mixed reflecting a balance
72
+ of positive and negative feedback
73
+
74
+ The following specific issues were extracted from negative reviews:
75
+
76
+ 1) missed a few appointments
77
+ 2) not signed into the right account
78
+ 3) interface is horrible
79
+ 4) find the interface confusing
80
+ 5) invitations and acceptances are terrible
81
+ ```
82
+
83
+ ## System Architecture Overview
84
+
85
+ ```
86
+ Reviews
87
+
88
+ Pre-trained Transformer (HuggingFace + PyTorch)
89
+
90
+ Sentiment Signals
91
+
92
+ Rating Meaning Inference
93
+
94
+ Summary Generation
95
+ ```
96
+
97
+ ## Tech-Stack
98
+
99
+ * **Language**: Python
100
+ * **Deep Learning**: PyTorch
101
+ * **NLP Models**: HuggingFace Transformers (pre-trained), Flan-T5
102
+ * **Aggregated reasoning**
103
+ * **Data Handling**: JSON, Python data structures
104
+
105
+ ## Why SentimentScopeAI?
106
+
107
+ Every organization collects feedback - but reading hundreds or thousands of reviews is time-consuming, inconsistent, and difficult to scale. Important insights are often buried in repetitive comments, while actionable criticism gets overlooked.
108
+
109
+ SentimentScopeAI is designed to do the heavy lifting:
110
+ * Reads and analyzes large volumes of reviews automatically
111
+ * Identifies recurring pain points across users
112
+ * Picks out the main negative point from each review
113
+ * Helps teams focus on what to improve rather than sorting through raw text
114
+
115
+ ## Installation & Usage
116
+
117
+ SentimentScopeAI is distributed as a Python package and can be installed via pip:
118
+
119
+ ```
120
+ pip install sentimentscopeai
121
+ ```
122
+
123
+ Requirements:
124
+ * Python 3.9 or newer (Python 3.10 or above is recommended for best performance and compatibility)
125
+ * PyTorch
126
+ * HuggingFace Transformers
127
+ * Internet connection
128
+
129
+ All required dependencies are automatically installed with the package.
130
+
131
+ ## Basic Usage:
132
+
133
+ ```python
134
+ from sentimentscopeai import SentimentScopeAI
135
+
136
+ # If the file is not found, pass a path that includes the folder (e.g. current_folder/json_file_name) instead of the bare file name
137
+ review_bot = SentimentScopeAI("json_file_name", "company_name", "service_name")
138
+
139
+ print(review_bot.generate_summary())
140
+ ```
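A minimal sketch of preparing an input file for the call above, assuming the reviews are already available as a Python list of strings. The helper `write_reviews` and the file name `reviews.json` are illustrative and not part of the package. It returns an absolute path, which sidesteps the relative-path lookup problem mentioned in the usage comment.

```python
import json
from pathlib import Path


def write_reviews(reviews, folder, name="reviews.json"):
    """Write review strings as a UTF-8 JSON array and return the absolute path."""
    path = Path(folder).resolve() / name
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(list(reviews), fh, ensure_ascii=False, indent=2)
    return str(path)


# Hypothetical usage (needs the package and model downloads, so it is left commented out):
# from sentimentscopeai import SentimentScopeAI
# path = write_reviews(["Support never answered my ticket."], ".")
# print(SentimentScopeAI(path, "company_name", "service_name").generate_summary())
```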
141
+
142
+ What Happens Internally
143
+
144
+ * Reviews are parsed from a structured JSON file
145
+ * Sentiment is inferred using pre-trained transformer models (PyTorch + HuggingFace)
146
+ * Rating meanings are semantically interpreted
147
+ * Flan-T5 finds the negatives from each review and summarizes the whole file
148
+
149
+ ## IMPORTANT NOTICE:
150
+
151
+ 1.) JSON Input Format (Required)
152
+
153
+ SentimentScopeAI only accepts JSON input.
154
+ The review file must follow this exact structure:
155
+
156
+ ```json
157
+ [
158
+ "review_text",
159
+ "review_text",
160
+ "review_text",
161
+ ...
162
+ ]
163
+ ```
164
+
165
+ Other structures (objects with keys, non-string entries) or non-JSON files are not supported and can cause errors or misleading output.
166
+
167
+ 2.) JSON Must Be Valid
168
+
169
+ * File must be UTF-8 encoded
170
+ * No trailing commas
171
+ * No comments
172
+ * Must be a list ([]), not a single object
173
+
174
+ You can use a JSON validator if you are unsure.
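The checks listed above can also be run locally before handing a file to the package. The following is a sketch only: `load_reviews` is a hypothetical helper, not a function the package exposes, and the rule that each entry must be a non-empty string is an assumption drawn from the format shown in item 1.

```python
import json


def load_reviews(path):
    """Load a review file and check that it is a JSON array of non-empty strings."""
    with open(path, "r", encoding="utf-8") as fh:
        # json.load raises json.JSONDecodeError for trailing commas, comments, etc.
        data = json.load(fh)
    if not isinstance(data, list):
        raise ValueError("top-level JSON value must be a list of review strings")
    bad = [i for i, item in enumerate(data)
           if not isinstance(item, str) or not item.strip()]
    if bad:
        raise ValueError(f"entries at positions {bad} are not non-empty strings")
    return data
```

Since `json.JSONDecodeError` is a subclass of `ValueError`, a caller can catch `ValueError` to cover both syntax and structure problems.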
175
+
176
+ 3.) One Company & One Service per JSON File (Required)
177
+
178
+ This restriction is intentional:
179
+
180
+ * Sentiment aggregation assumes a single shared context
181
+ * Summary generation relies on consistent product-level patterns
182
+ * Mixing services can produce misleading summaries and recommendations
183
+
184
+ If you need to analyze multiple products or companies, create separate JSON files and run SentimentScopeAI independently for each dataset.
185
+
186
+ 4.) Model Loading Behavior
187
+
188
+ * Transformer models are lazy-loaded
189
+ * First run may take longer due to:
190
+ * Model downloads
191
+ * Tokenizer initialization
192
+ * Subsequent runs are significantly faster
193
+
194
+ This design improves startup efficiency and memory usage.
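The lazy-loading idea can be illustrated without any model at all. The class below is a generic sketch with hypothetical names (`LazyResource`, `factory`, `load_count`); it only shows the pattern of building an expensive object on first access and reusing it afterwards.

```python
class LazyResource:
    """Builds an expensive object on first access and reuses it afterwards."""

    def __init__(self, factory):
        self._factory = factory  # callable that creates the resource
        self._value = None
        self.load_count = 0      # how many times the factory actually ran

    @property
    def value(self):
        if self._value is None:  # first access: build and cache
            self._value = self._factory()
            self.load_count += 1
        return self._value


resource = LazyResource(lambda: {"weights": "loaded"})
first = resource.value   # factory runs here
second = resource.value  # cached object is returned
```

Nothing is built until `value` is read, so constructing the wrapper stays cheap even when the factory is slow.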
195
+
196
+
197
+ **SentimentScopeAI is provided as-is, and its author is not liable for any damages arising from its use.
198
+ Do not include personal, sensitive, or confidential information in review data.
199
+ All input data is processed locally and is not used for model training or retained beyond execution.
200
+ Users are responsible for ensuring ethical and appropriate use of the system.**
@@ -0,0 +1,184 @@
1
+ ![Sentiment Scope AI Logo](https://github.com/user-attachments/assets/493203ec-ed08-4ad7-8e1a-1c4aced128cb)
2
+
3
+
4
+ # SentimentScopeAI
5
+ ## Fine-Grained Review Sentiment Analysis & Insight Generation
6
+
7
+ SentimentScopeAI is a Python-based NLP system that leverages PyTorch and HuggingFace Transformers (pre-trained models) to move beyond binary sentiment classification and instead analyze, interpret, and reason over collections of user reviews to help companies improve their products/services.
8
+
9
+ Rather than treating sentiment analysis as a black-box prediction task, this project focuses on semantic interpretation, explainability, and the generation of aggregated insights, simulating how a human analyst would read and summarize large volumes of feedback.
10
+
11
+ ## Project Motivation
12
+
13
+ SentimentScopeAI is designed to answer this one main question:
14
+ * What actionable advice can be derived from collective sentiment?
15
+
16
+ ## Features
17
+
18
+ 1.) Pre-Trained Sentiment Modeling (PyTorch + HuggingFace)
19
+ * Uses pre-trained transformer models from HuggingFace
20
+ * Integrated via PyTorch for inference and extensibility
21
+ * Enables robust sentiment understanding without training from scratch
22
+ * Designed so downstream logic operates on model outputs, not raw text
23
+
24
+ 2.) Rating Meaning Inference
25
+ * Implemented the infer_rating_meaning() function
26
+ * Converts numerical ratings (1–5) into semantic interpretations
27
+ * Uses sentiment signals, linguistic tone, and contextual cues
28
+ * Handles:
29
+ * Mixed sentiment
30
+ * Neutral or ambiguous phrasing
31
+ * Disagreement between rating score and review text
32
+
33
+ Example:
34
+ ```
35
+ Rating: 3
36
+ → "Mixed experience with noticeable positives and recurring issues."
37
+ ```
38
+
39
+ 3.) Explainable, Deterministic Pipeline
40
+ * Downstream reasoning is transparent and testable
41
+ * No opaque end-to-end predictions
42
+ * Model outputs are interpreted rather than blindly trusted
43
+ * Designed for debugging, auditing, and future research extension
44
+
45
+ 4.) Summary Generation
46
+ * Read all reviews for a given product or service
47
+ * Aggregate sentiment signals across users
48
+ * Detect recurring strengths and weaknesses
49
+ * Generate a summary of all reviews to help stakeholders
50
+
51
+ These steps transition the system from analysis → reasoning → recommendation generation.
52
+
53
+ Example:
54
+ ```
55
+ For <Company Name>'s <Service Name>: overall sentiment is mixed reflecting a balance
56
+ of positive and negative feedback
57
+
58
+ The following specific issues were extracted from negative reviews:
59
+
60
+ 1) missed a few appointments
61
+ 2) not signed into the right account
62
+ 3) interface is horrible
63
+ 4) find the interface confusing
64
+ 5) invitations and acceptances are terrible
65
+ ```
66
+
67
+ ## System Architecture Overview
68
+
69
+ ```
70
+ Reviews
71
+
72
+ Pre-trained Transformer (HuggingFace + PyTorch)
73
+
74
+ Sentiment Signals
75
+
76
+ Rating Meaning Inference
77
+
78
+ Summary Generation
79
+ ```
80
+
81
+ ## Tech-Stack
82
+
83
+ * **Language**: Python
84
+ * **Deep Learning**: PyTorch
85
+ * **NLP Models**: HuggingFace Transformers (pre-trained), Flan-T5
86
+ * **Aggregated reasoning**
87
+ * **Data Handling**: JSON, Python data structures
88
+
89
+ ## Why SentimentScopeAI?
90
+
91
+ Every organization collects feedback - but reading hundreds or thousands of reviews is time-consuming, inconsistent, and difficult to scale. Important insights are often buried in repetitive comments, while actionable criticism gets overlooked.
92
+
93
+ SentimentScopeAI is designed to do the heavy lifting:
94
+ * Reads and analyzes large volumes of reviews automatically
95
+ * Identifies recurring pain points across users
96
+ * Picks out the main negative point from each review
97
+ * Helps teams focus on what to improve rather than sorting through raw text
98
+
99
+ ## Installation & Usage
100
+
101
+ SentimentScopeAI is distributed as a Python package and can be installed via pip:
102
+
103
+ ```
104
+ pip install sentimentscopeai
105
+ ```
106
+
107
+ Requirements:
108
+ * Python 3.9 or newer (Python 3.10 or above is recommended for best performance and compatibility)
109
+ * PyTorch
110
+ * HuggingFace Transformers
111
+ * Internet connection
112
+
113
+ All required dependencies are automatically installed with the package.
114
+
115
+ ## Basic Usage:
116
+
117
+ ```python
118
+ from sentimentscopeai import SentimentScopeAI
119
+
120
+ # If the file is not found, pass a path that includes the folder (e.g. current_folder/json_file_name) instead of the bare file name
121
+ review_bot = SentimentScopeAI("json_file_name", "company_name", "service_name")
122
+
123
+ print(review_bot.generate_summary())
124
+ ```
125
+
126
+ What Happens Internally
127
+
128
+ * Reviews are parsed from a structured JSON file
129
+ * Sentiment is inferred using pre-trained transformer models (PyTorch + HuggingFace)
130
+ * Rating meanings are semantically interpreted
131
+ * Flan-T5 finds the negatives from each review and summarizes the whole file
132
+
133
+ ## IMPORTANT NOTICE:
134
+
135
+ 1.) JSON Input Format (Required)
136
+
137
+ SentimentScopeAI only accepts JSON input.
138
+ The review file must follow this exact structure:
139
+
140
+ ```json
141
+ [
142
+ "review_text",
143
+ "review_text",
144
+ "review_text",
145
+ ...
146
+ ]
147
+ ```
148
+
149
+ Other structures (objects with keys, non-string entries) or non-JSON files are not supported and can cause errors or misleading output.
150
+
151
+ 2.) JSON Must Be Valid
152
+
153
+ * File must be UTF-8 encoded
154
+ * No trailing commas
155
+ * No comments
156
+ * Must be a list ([]), not a single object
157
+
158
+ You can use a JSON validator if you are unsure.
159
+
160
+ 3.) One Company & One Service per JSON File (Required)
161
+
162
+ This restriction is intentional:
163
+
164
+ * Sentiment aggregation assumes a single shared context
165
+ * Summary generation relies on consistent product-level patterns
166
+ * Mixing services can produce misleading summaries and recommendations
167
+
168
+ If you need to analyze multiple products or companies, create separate JSON files and run SentimentScopeAI independently for each dataset.
169
+
170
+ 4.) Model Loading Behavior
171
+
172
+ * Transformer models are lazy-loaded
173
+ * First run may take longer due to:
174
+ * Model downloads
175
+ * Tokenizer initialization
176
+ * Subsequent runs are significantly faster
177
+
178
+ This design improves startup efficiency and memory usage.
179
+
180
+
181
+ **SentimentScopeAI is provided as-is, and its author is not liable for any damages arising from its use.
182
+ Do not include personal, sensitive, or confidential information in review data.
183
+ All input data is processed locally and is not used for model training or retained beyond execution.
184
+ Users are responsible for ensuring ethical and appropriate use of the system.**
@@ -0,0 +1,24 @@
1
+ [build-system]
2
+ requires = ["setuptools>=61.0", "wheel"]
3
+ build-backend = "setuptools.build_meta"
4
+
5
+ [project]
6
+ name = "sentimentscopeai"
7
+ version = "1.2.2"
8
+ description = "Transformer-based review sentiment analysis and actionable insight generation."
9
+ readme = "README.md"
10
+ requires-python = ">=3.9"
11
+ license = { text = "MIT" }
12
+ authors = [
13
+ { name = "Vignesh Thondikulam" }
14
+ ]
15
+
16
+ dependencies = [
17
+ "torch",
18
+ "transformers",
19
+ "sentencepiece"
20
+ ]
21
+
22
+ [project.urls]
23
+ Homepage = "https://github.com/VigneshT24/SentimentScopeAI"
24
+ Repository = "https://github.com/VigneshT24/SentimentScopeAI"
@@ -0,0 +1,2 @@
1
+ from .core import SentimentScopeAI
2
+ __all__ = ["SentimentScopeAI"]
@@ -0,0 +1,455 @@
1
+ import torch
2
+ import json
3
+ import os
4
+ import string
5
+ import random
6
+ import textwrap
7
+ import time
8
+ import sys
9
+ import threading
10
+ from difflib import SequenceMatcher
11
+ from transformers import (AutoTokenizer, AutoModelForSequenceClassification, AutoModelForSeq2SeqLM, T5ForConditionalGeneration, T5Tokenizer, set_seed)
12
+
13
+ class SentimentScopeAI:
14
+ ## Private attributes
15
+ __hf_model_name = None
16
+ __hf_tokenizer = None
17
+ __hf_model = None
18
+ __pytorch_model_name = None
19
+ __pytorch_tokenizer = None
20
+ __pytorch_model = None
21
+ __json_file_path = None
22
+ __device = None
23
+ __notable_negatives = []
24
+ __extraction_model = None
25
+ __extraction_tokenizer = None
26
+ __company_name = None
27
+ __service_name = None
28
+ __stop_timer = None
29
+ __timer_thread = None
30
+
31
+ def __init__(self, file_path, company_name, service_name):
32
+ """
33
+ Initialize the SentimentScopeAI class with the specified JSON file path, company's name, and service's name.
34
+
35
+ Args:
36
+ - file_path (str): specified JSON file path
37
+ - company_name (str): name of the company being reviewed
38
+ - service_name (str): name of the company's service/product being reviewed
39
+
40
+ Returns:
41
+ None
42
+ """
43
+ self.__hf_model_name = "Vamsi/T5_Paraphrase_Paws"
44
+ self.__pytorch_model_name = "nlptown/bert-base-multilingual-uncased-sentiment"
45
+ self.__extraction_model_name = "google/flan-t5-large"
46
+ self.__company_name = company_name
47
+ self.__service_name = service_name
48
+ self.__json_file_path = os.path.abspath(file_path)
49
+ print("""
50
+ ─────────────────────────────────────────────────────────────────────────────
51
+ SentimentScopeAI can make mistakes. This AI may produce incomplete summaries,
52
+ misclassify sentiment, or categorize positive feedback as negative. Please
53
+ verify critical insights before making decisions based on this analysis.
54
+ ─────────────────────────────────────────────────────────────────────────────
55
+ """)
56
+ self.__device = "cuda" if torch.cuda.is_available() else "cpu"
57
+ self.__stop_timer = threading.Event()
58
+ self.__timer_thread = threading.Thread(target=self.__time_threading, daemon=True)  # daemon: an early error return must not keep the process alive
59
+ self.__timer_thread.start()
60
+
61
+ @property
62
+ def hf_model(self):
63
+ """Lazy loader for the Paraphrase Model."""
64
+ if self.__hf_model is None:
65
+ self.__hf_model = AutoModelForSeq2SeqLM.from_pretrained(self.__hf_model_name)
66
+ return self.__hf_model
67
+
68
+ @property
69
+ def hf_tokenizer(self):
70
+ """Lazy loader for the Paraphrase Tokenizer."""
71
+ if self.__hf_tokenizer is None:
72
+ self.__hf_tokenizer = T5Tokenizer.from_pretrained(self.__hf_model_name, legacy=True)
73
+ return self.__hf_tokenizer
74
+
75
+ @property
76
+ def pytorch_tokenizer(self):
77
+ """Lazy loader for the PyTorch Tokenizer."""
78
+ if self.__pytorch_tokenizer is None:
79
+ self.__pytorch_tokenizer = AutoTokenizer.from_pretrained(self.__pytorch_model_name)
80
+ return self.__pytorch_tokenizer
81
+
82
+ @property
83
+ def pytorch_model(self):
84
+ """Lazy loader for the PyTorch Model."""
85
+ if self.__pytorch_model is None:
86
+ self.__pytorch_model = AutoModelForSequenceClassification.from_pretrained(
87
+ self.__pytorch_model_name
88
+ ).to(self.__device)
89
+ return self.__pytorch_model
90
+
91
+ @property
92
+ def extraction_model(self):
93
+ """Lazy loader for the Flan-T5 extraction model."""
94
+ if self.__extraction_model is None:
95
+ self.__extraction_model = T5ForConditionalGeneration.from_pretrained(
96
+ self.__extraction_model_name
97
+ ).to(self.__device)
98
+ return self.__extraction_model
99
+
100
+ @property
101
+ def extraction_tokenizer(self):
102
+ """Lazy loader for the Flan-T5 tokenizer."""
103
+ if self.__extraction_tokenizer is None:
104
+ self.__extraction_tokenizer = AutoTokenizer.from_pretrained(
105
+ self.__extraction_model_name
106
+ )
107
+ return self.__extraction_tokenizer
108
+
109
+ def __time_threading(self) -> None:
110
+ """Time Threading for elapsed timer while SentimentScopeAI processes"""
111
+ start_time = time.time()
112
+ while not self.__stop_timer.is_set():
113
+ elapsed_time = time.time() - start_time
114
+ mins, secs = divmod(elapsed_time, 60)
115
+ hours, mins = divmod(mins, 60)
116
+
117
+ timer_display = f"SentimentScopeAI is processing (elapsed time): {int(hours):02}:{int(mins):02}:{int(secs):02}"
118
+ sys.stdout.write('\r' + timer_display)
119
+ sys.stdout.flush()
120
+
121
+ time.sleep(0.1)
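Review note: the timer is started in `__init__` and only stopped late in `generate_summary()`, after the error-handling `return` statements. A more defensive arrangement is sketched below, under the assumption that the work can be passed in as a callable. The name `run_with_timer` and its parameters are hypothetical. The sketch stops the timer in a `finally` block so it ends on both success and failure, and uses `Event.wait` so the thread reacts to the stop signal without a separate sleep.

```python
import sys
import threading
import time


def run_with_timer(work, interval=0.1, out=sys.stdout):
    """Run work() while a background thread prints an elapsed-time counter."""
    stop = threading.Event()

    def tick():
        start = time.monotonic()
        # wait() returns True as soon as stop is set, which ends the loop
        while not stop.wait(interval):
            elapsed = int(time.monotonic() - start)
            mins, secs = divmod(elapsed, 60)
            hours, mins = divmod(mins, 60)
            out.write(f"\rprocessing (elapsed time): {hours:02}:{mins:02}:{secs:02}")
            out.flush()

    timer = threading.Thread(target=tick, daemon=True)
    timer.start()
    try:
        return work()
    finally:
        # Always stop the timer, even if work() raised
        stop.set()
        timer.join()
```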
122
+
123
+ def __get_predictive_star(self, text: str) -> int:
124
+ """
125
+ Predict the sentiment star rating for the given text review.
126
+
127
+ Args:
128
+ text (str): The text review to analyze.
129
+ Returns:
130
+ int: The predicted star rating (1 to 5).
131
+ """
132
+
133
+ max_len = getattr(self.pytorch_tokenizer, "model_max_length", 512)
134
+
135
+ inputs = self.pytorch_tokenizer(
136
+ text,
137
+ return_tensors="pt",
138
+ truncation=True,
139
+ max_length=max_len
140
+ ).to(self.__device)
141
+
142
+ with torch.no_grad():
143
+ outputs = self.pytorch_model(**inputs)
144
+
145
+ logits = outputs.logits
146
+ prediction = torch.argmax(logits, dim=-1).item()
147
+
148
+ num_star = prediction + 1
149
+ return num_star
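Review note: the last step of the method above maps the index of the largest logit to a star count. The sketch below shows that step in plain Python, assuming the classifier emits five logits ordered from 1 star to 5 stars, as the `prediction + 1` line implies. `star_from_logits` is an illustrative name.

```python
def star_from_logits(logits):
    """Return a 1-5 star rating: index of the largest logit, shifted by one."""
    if len(logits) != 5:
        raise ValueError("expected one logit per star class (5 values)")
    best_index = max(range(len(logits)), key=lambda i: logits[i])
    return best_index + 1


stars = star_from_logits([-1.2, 0.3, 2.9, 0.8, -0.5])  # largest value is at index 2
```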
150
+
151
+ def __calculate_all_review(self) -> float:
152
+ """
153
+ Calculate the average predicted star rating across all reviews in the JSON file.
154
+
155
+ Args:
156
+ None
157
+ Returns:
158
+ float: The average predicted star rating, or 0 if the file contains no reviews.
159
+ """
160
+ # no try/except needed here: generate_summary() has already opened and parsed this file
161
+ with open(self.__json_file_path, 'r', encoding='utf-8') as reviews_file:
162
+ all_reviews = json.load(reviews_file)
163
+ total_stars = 0  # renamed from `sum` to avoid shadowing the built-in
164
+ num_reviews = 0
165
+ for i, entry in enumerate(all_reviews, 1):
166
+ single_review_rating = self.__get_predictive_star(entry)
167
+ total_stars += single_review_rating
168
+ num_reviews = i
169
+ return (total_stars / num_reviews) if num_reviews != 0 else 0
170
+
171
+ def __paraphrase_statement(self, statement: str) -> list[str]:
172
+ """
173
+ Generates multiple unique paraphrased variations of a given string.
174
+
175
+ Uses a Hugging Face transformer model to generate five variations of the
176
+ input statement. Results are normalized (lowercased, stripped of
177
+ punctuation, and whitespace-cleaned) to ensure uniqueness.
178
+
179
+ Args:
180
+ statement (str): The text to be paraphrased.
181
+
182
+ Returns:
183
+ list[str]: A list of unique, cleaned paraphrased strings.
184
+ Returns [""] if the input is None, empty, or whitespace.
185
+ """
186
+ set_seed(random.randint(0, 2**32 - 1))
187
+
188
+ if statement is None or statement.isspace() or statement == "":
189
+ return [""]
190
+
191
+ prompt = f"paraphrase: {statement}"
192
+ encoder = self.hf_tokenizer(prompt, return_tensors="pt", truncation=True)
193
+
194
+ output = self.hf_model.generate(
195
+ **encoder,
196
+ max_length=48,
197
+ do_sample=True,
198
+ top_p=0.99,
199
+ top_k=50,
200
+ temperature=1.0,
201
+ num_return_sequences=5,
202
+ repetition_penalty=1.2,
203
+ )
204
+
205
+ resultant = self.hf_tokenizer.batch_decode(output, skip_special_tokens=True)
206
+
207
+ seen = set()
208
+ unique = []
209
+ translator = str.maketrans('', '', string.punctuation)
210
+
211
+ for list_sentence in resultant:
212
+ list_sentence = list_sentence.lower().strip()
213
+ list_sentence = list_sentence.translate(translator)
214
+ while (list_sentence[-1:] == ' '):
215
+ list_sentence = list_sentence[:-1]
216
+ seen.add(list_sentence)
217
+
218
+ for set_sentence in seen:
219
+ unique.append(set_sentence)
220
+
221
+ return unique
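Review note: the normalization and de-duplication at the end of the method above can be written more compactly. The sketch below uses hypothetical helper names. Unlike the set-based loop in the diff, it keeps first-seen order and collapses internal runs of whitespace.

```python
import string

_PUNCT = str.maketrans("", "", string.punctuation)


def normalize(sentence):
    """Lowercase, drop ASCII punctuation, and collapse runs of whitespace."""
    return " ".join(sentence.lower().translate(_PUNCT).split())


def unique_normalized(sentences):
    """Normalize each sentence and drop repeats, keeping first-seen order."""
    return list(dict.fromkeys(normalize(s) for s in sentences))


variants = unique_normalized(
    ["Overall, it is good.", "overall it is good", "It is  fine!"]
)
```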
222
+
223
+ def __infer_rating_meaning(self) -> str:
224
+ """
225
+ Translates numerical rating scores into descriptive, paraphrased sentiment.
226
+
227
+ Calculates the aggregate review score and maps it to a sentiment category
228
+ (ranging from 'Very Negative' to 'Very Positive'). To avoid repetitive
229
+ output, the final description is passed through an AI paraphrasing
230
+ engine and a random variation is selected.
231
+
232
+ Args:
233
+ None
234
+
235
+ Returns:
236
+ str: A randomly selected paraphrased sentence describing the
237
+ overall service sentiment.
238
+ """
239
+ overall_rating = self.__calculate_all_review()
240
+
241
+ if not overall_rating:
242
+ return "No reviews were found in the JSON file, so no overall sentiment could be inferred."
243
+
244
+ def generate_sentence(rating_summ):
245
+ return f"For {self.__company_name}'s {self.__service_name}: " + random.choice(self.__paraphrase_statement(rating_summ)).strip()
246
+
247
+ if 1.0 <= overall_rating < 2.0:
248
+ return generate_sentence("Overall sentiment is very negative, indicating widespread dissatisfaction among users.")
249
+ elif 2.0 <= overall_rating < 3.0:
250
+ return generate_sentence("Overall sentiment is negative, suggesting notable dissatisfaction across reviews.")
251
+ elif 3.0 <= overall_rating < 4.0:
252
+ return generate_sentence("Overall sentiment is mixed, reflecting a balance of positive and negative feedback.")
253
+ elif 4.0 <= overall_rating < 5.0:
254
+ return generate_sentence("Overall sentiment is positive, indicating general user satisfaction.")
255
+ else:
256
+ return generate_sentence("Overall sentiment is very positive, reflecting strong user approval and satisfaction.")
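Review note: the threshold logic above can be isolated as a small pure function, which makes it easy to test without loading any model. The thresholds are copied from the branches in the diff. The function name and the `"unknown"` label for out-of-range averages are additions for illustration.

```python
def sentiment_bucket(average_rating):
    """Map an average star rating to a coarse sentiment label."""
    if not 1.0 <= average_rating <= 5.0:
        return "unknown"  # e.g. 0 when there were no reviews to average
    if average_rating < 2.0:
        return "very negative"
    if average_rating < 3.0:
        return "negative"
    if average_rating < 4.0:
        return "mixed"
    if average_rating < 5.0:
        return "positive"
    return "very positive"
```

Only an average of exactly 5.0 reaches the last label, which matches the `else` branch in the diff for valid averages.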
257
+
258
+ def __delete_duplicate(self, issues: list[str]) -> list[str]:
259
+ """
260
+ Filters out duplicate and near-duplicate issue strings using fuzzy matching.
261
+
262
+ The method normalizes strings by converting them to lowercase and stripping
263
+ whitespace. It ignores issues that are empty or contain two or fewer words.
264
+ A string is considered a duplicate if its similarity ratio (via SequenceMatcher)
265
+ is greater than 0.75 compared to any already accepted issue.
266
+
267
+ Args:
268
+ issues (list[str]): A list of raw issue descriptions to be processed.
269
+
270
+ Returns:
271
+ list[str]: A list of unique, normalized issue strings that met the
272
+ length and similarity requirements.
273
+ """
274
+ if not issues:
275
+ return []
276
+
277
+
278
+ result = []
279
+ for issue in issues:
280
+ if not issue or len(issue.split()) <= 2:
281
+ continue
282
+ issue = issue.lower().strip()
283
+
284
+ is_dup = any(SequenceMatcher(None, issue, existing).ratio() > 0.75 for existing in result)
285
+
286
+ if not is_dup:
287
+ result.append(issue)
288
+ return result
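Review note: a standalone sketch of the fuzzy de-duplication described in the docstring above, using the same `difflib.SequenceMatcher` ratio and the 0.75 cutoff. The function and parameter names are illustrative.

```python
from difflib import SequenceMatcher


def drop_near_duplicates(issues, threshold=0.75, min_words=3):
    """Keep the first occurrence of each issue; skip short or near-identical ones."""
    kept = []
    for issue in issues:
        issue = (issue or "").lower().strip()
        if len(issue.split()) < min_words:
            continue  # too short to be a useful issue description
        if any(SequenceMatcher(None, issue, seen).ratio() > threshold for seen in kept):
            continue  # close enough to something already kept
        kept.append(issue)
    return kept


sample = drop_near_duplicates([
    "The interface is confusing",
    "the interface is confusing!",
    "missed a few appointments",
    "too slow",
])
```

Each candidate is compared against every issue already kept, so the cost grows roughly with the square of the number of distinct issues.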
289
+
290
+ def __extract_negative_aspects(self, review: str) -> list[str]:
291
+ """
292
+ Extract actionable negative aspects from a review using AI-based text generation.
293
+
294
+ This method uses the Flan-T5 language model to identify specific, constructive
295
+ problems mentioned in a review. Unlike simple sentiment analysis, this extracts
296
+ concrete issues that describe what is broken, missing, or difficult - filtering
297
+ out vague emotional words like "horrible" or "bad".
298
+
299
+ The extraction focuses on actionable feedback that can help improve a product
300
+ or service, such as "notifications arrive at wrong times" rather than just
301
+ "notifications are bad".
302
+
303
+ Args:
304
+ review (str): The review text to analyze for negative aspects.
305
+
306
+ Returns:
307
+ list[str]: A list of specific problem phrases extracted from the review.
308
+ Each phrase describes a concrete issue. Returns an empty list
309
+ if the review is empty, contains only whitespace, or no
310
+ problems are identified.
311
+
312
+ Note:
313
+ This method uses the Flan-T5 model which is loaded lazily on first use.
314
+ Processing time depends on review length and available hardware (CPU/GPU).
315
+ Very short outputs (≤3 characters) are filtered out as likely artifacts.
316
+ """
317
+ if not review or review.isspace():
318
+ return []
319
+
320
+ prompt = f"""
321
+ Extract ONE actionable operational/service issue from the review.
322
+
323
+ Output EXACTLY one line:
324
+ - if the review has no issue, the issue is vague, or seems positive, output exactly this: none
325
+ - or: a 6 to 14 word issue describing what went wrong (include concrete details like time/fee if present).
326
+
327
+ Do NOT output:
328
+ - vague sentiment ("rude", "treated like garbage", "horrible")
329
+ - filler ("where to start", "unbelievable")
330
+ - person names (keep role only)
331
+ - location-only statements
332
+
333
+ Use only words from the review. Do not invent details.
334
+
335
+ Review:
336
+ {review}
337
+
338
+ Answer:
339
+ """.strip()
340
+
341
+ inputs = self.extraction_tokenizer(
342
+ prompt,
343
+ return_tensors="pt",
344
+ max_length=512,
345
+ truncation=True
346
+ ).to(self.__device)
347
+
348
+ outputs = self.extraction_model.generate(
349
+ **inputs,
350
+ max_new_tokens=30,
351
+ num_beams=5,
352
+ do_sample=False,
353
+ no_repeat_ngram_size=3,
354
+ early_stopping=True,
355
+ )
356
+
357
+ result = self.extraction_tokenizer.decode(outputs[0], skip_special_tokens=True)
358
+ if result.strip().lower() in ['none', 'none.', 'no problems', '']:
359
+ return []
360
+
361
+ issues = []
362
+ for line in result.split('\n'):
363
+ line = line.strip()
364
+ line = line.lstrip('•-*1234567890.) ')
365
+ if line and len(line) > 3:
366
+ issues.append(line)
367
+
368
+ return issues
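Review note: the post-processing of the generated text can be exercised on its own, without the model. The sketch below mirrors the filtering in the diff under a hypothetical name, `parse_issues`. It also shows a side effect of the `lstrip` character set: any leading digits are removed, including ones that belong to the issue text.

```python
def parse_issues(raw_output):
    """Turn raw generated text into a list of issue phrases."""
    text = raw_output.strip()
    if text.lower() in {"none", "none.", "no problems", ""}:
        return []
    issues = []
    for line in text.split("\n"):
        # Remove list markers; this strips *any* leading digits as well
        line = line.strip().lstrip("•-*1234567890.) ")
        if len(line) > 3:  # drop very short fragments
            issues.append(line)
    return issues


numbered = parse_issues("1) app crashes when uploading large photos")
digit_lost = parse_issues("3 hour wait for support")
```

Here `digit_lost` comes back as `["hour wait for support"]`, so a review whose issue starts with a number loses that detail.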
369
+
370
+ def generate_summary(self) -> str:
371
+ """
372
+ Generate a formatted sentiment summary based on user reviews for a service.
373
+
374
+ This method reads a JSON file containing user reviews, infers the overall
375
+ sentiment rating, and produces a structured, human-readable summary.
376
+ The summary includes:
377
+ - A concise explanation of the inferred sentiment rating
378
+ - A numbered list of representative negatives mentioned
379
+
380
+ Long-form reviews are wrapped to a fixed line width while preserving
381
+ list structure and readability.
382
+
383
+ The method is resilient to common file and parsing errors and will
384
+ emit descriptive messages if the input file cannot be accessed or
385
+ decoded properly.
386
+
387
+ Args:
388
+ None
389
+
390
+ Returns:
391
+ str
392
+ A multi-paragraph, text-wrapped sentiment summary suitable for
393
+ console output, logs, or reports.
394
+
395
+ Raises:
396
+ None
397
+ All exceptions are handled internally with descriptive error
398
+ messages to prevent interruption of execution.
399
+ """
400
+ try:
401
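+ reviews, self.__notable_negatives = [], []  # rebind per call so results never accumulate on the shared class-level list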
402
+ with open(self.__json_file_path, 'r', encoding='utf-8') as file:
403
+ company_reviews = json.load(file)
404
+ for i, entry in enumerate(company_reviews, 1):
405
+ for part in self.__extract_negative_aspects(entry):
406
+ self.__notable_negatives.append(part)
407
+ reviews.append(entry)
408
+ except FileNotFoundError:
409
+ return ("JSON FILE PATH IS UNIDENTIFIABLE, please try inputting the name properly (e.g. \"companyreview.json\").")
410
+ except json.JSONDecodeError:
411
+ return ("Could not decode JSON file. Check for valid JSON syntax.")
412
+ except PermissionError:
413
+ return ("Permission denied to open the JSON file.")
414
+ except Exception as e:
415
+ return (f"An unexpected error occurred: {e}")
416
+
417
+ self.__notable_negatives = self.__delete_duplicate(self.__notable_negatives)
418
+
419
+ def format_numbered_list(items):
420
+ if not items:
421
+ return "None found"
422
+
423
+ lines = []
424
+ for i, item in enumerate(items, start=1):
425
+ prefix = f"{i}) "
426
+ wrapper = textwrap.TextWrapper(
427
+ width=70,
428
+ initial_indent=prefix,
429
+ subsequent_indent=" " * len(prefix) + " "
430
+ )
431
+ lines.append(wrapper.fill(str(item)))
432
+ return "\n".join(lines)
433
+
434
+ self.__stop_timer.set()
435
+ self.__timer_thread.join()
436
+ print()
437
+ print()
438
+
439
+ rating_meaning = self.__infer_rating_meaning()
440
+
441
+ parts = [textwrap.fill(rating_meaning, width=70)]
442
+
443
+ if self.__calculate_all_review() >= 4:
444
+ parts.append(
445
+ textwrap.fill(
446
+ "Since the overall rating is good, I don't have any notable negatives to mention.",
447
+ width=70))
448
+ else:
449
+ parts.append(
450
+ textwrap.fill(
451
+ "The following reviews highlight some concerns users have expressed:",
452
+ width=70))
453
+ parts.append(format_numbered_list(self.__notable_negatives))
454
+
455
+ return "\n\n".join(parts)
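The output-cleaning and list-formatting steps in the hunk above can be exercised on their own. The following is a minimal standalone sketch, not the package's API: `parse_issues` and the module-level `format_numbered_list` are hypothetical free functions that restate the logic shown in the diff, assuming the model output arrives as a plain string with one issue per line.

```python
import textwrap
from typing import List

# Characters stripped from the start of each line (bullets, digits, punctuation).
# Same set as in the diff; note it also removes leading digits that belong to the text.
BULLET_CHARS = "•-*1234567890.) "


def parse_issues(result: str) -> List[str]:
    """Split raw model output into cleaned issue strings (hypothetical helper)."""
    if result.strip().lower() in ("none", "none.", "no problems", ""):
        return []
    issues = []
    for line in result.split("\n"):
        line = line.strip().lstrip(BULLET_CHARS)
        if len(line) > 3:  # drop empty lines and very short fragments
            issues.append(line)
    return issues


def format_numbered_list(items: List[str], width: int = 70) -> str:
    """Render items as a numbered list, wrapping long entries to `width` columns."""
    if not items:
        return "None found"
    lines = []
    for i, item in enumerate(items, start=1):
        prefix = f"{i}) "
        wrapper = textwrap.TextWrapper(
            width=width,
            initial_indent=prefix,
            # Continuation lines are indented one column past the prefix, as in the diff.
            subsequent_indent=" " * len(prefix) + " ",
        )
        lines.append(wrapper.fill(str(item)))
    return "\n".join(lines)


if __name__ == "__main__":
    raw = "1) interface is horrible\n- missed a few appointments\n\nok"
    print(format_numbered_list(parse_issues(raw)))
```

Because `lstrip` removes every leading character in the set, an issue that genuinely begins with a number (for example "3 days late") loses that number. A stricter pattern would avoid this, at the cost of departing from the code in the diff.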
@@ -0,0 +1,200 @@
1
+ Metadata-Version: 2.4
2
+ Name: sentimentscopeai
3
+ Version: 1.2.2
4
+ Summary: Transformer-based review sentiment analysis and actionable insight generation.
5
+ Author: Vignesh Thondikulam
6
+ License: MIT
7
+ Project-URL: Homepage, https://github.com/VigneshT24/SentimentScopeAI
8
+ Project-URL: Repository, https://github.com/VigneshT24/SentimentScopeAI
9
+ Requires-Python: >=3.9
10
+ Description-Content-Type: text/markdown
11
+ License-File: LICENSE
12
+ Requires-Dist: torch
13
+ Requires-Dist: transformers
14
+ Requires-Dist: sentencepiece
15
+ Dynamic: license-file
16
+
17
+ ![Sentiment Scope AI Logo](https://github.com/user-attachments/assets/493203ec-ed08-4ad7-8e1a-1c4aced128cb)
18
+
19
+
20
+ # SentimentScopeAI
21
+ ## Fine-Grained Review Sentiment Analysis & Insight Generation
22
+
23
+ SentimentScopeAI is a Python-based NLP system that uses PyTorch and pre-trained HuggingFace Transformers models to move beyond binary sentiment classification: it analyzes, interprets, and reasons over collections of user reviews to help companies improve their products and services.
24
+
25
+ Rather than treating sentiment analysis as a black-box prediction task, this project focuses on semantic interpretation, explainability, and the generation of aggregated insights, simulating how a human analyst would read and summarize large volumes of feedback.
26
+
27
+ ## Project Motivation
28
+
29
+ SentimentScopeAI is designed to answer this one main question:
30
+ * What actionable advice can be derived from collective sentiment?
31
+
32
+ ## Features
33
+
34
+ 1.) Pre-Trained Sentiment Modeling (PyTorch + HuggingFace)
35
+ * Uses pre-trained transformer models from HuggingFace
36
+ * Integrated via PyTorch for inference and extensibility
37
+ * Enables robust sentiment understanding without training from scratch
38
+ * Designed so downstream logic operates on model outputs, not raw text
39
+
40
+ 2.) Rating Meaning Inference
41
+ * Implemented in the infer_rating_meaning() function
42
+ * Converts numerical ratings (1–5) into semantic interpretations
43
+ * Uses sentiment signals, linguistic tone, and contextual cues
44
+ * Handles:
45
+ * Mixed sentiment
46
+ * Neutral or ambiguous phrasing
47
+ * Disagreement between rating score and review text
48
+
49
+ Example:
50
+ ```
51
+ Rating: 3
52
+ → "Mixed experience with noticeable positives and recurring issues."
53
+ ```
54
+
55
+ 3.) Explainable, Deterministic Pipeline
56
+ * Downstream reasoning is transparent and testable
57
+ * No opaque end-to-end predictions
58
+ * Model outputs are interpreted rather than blindly trusted
59
+ * Designed for debugging, auditing, and future research extension
60
+
61
+ 4.) Summary Generation
62
+ * Read all reviews for a given product or service
63
+ * Aggregate sentiment signals across users
64
+ * Detect recurring strengths and weaknesses
65
+ * Generate a summary of all reviews to help stakeholders
66
+
67
+ These steps transition the system from analysis → reasoning → recommendation generation.
68
+
69
+ Example:
70
+ ```
71
+ For <Company Name>'s <Service Name>: overall sentiment is mixed reflecting a balance
72
+ of positive and negative feedback
73
+
74
+ The following specific issues were extracted from negative reviews:
75
+
76
+ 1) missed a few appointments
77
+ 2) not signed into the right account
78
+ 3) interface is horrible
79
+ 4) find the interface confusing
80
+ 5) invitations and acceptances are terrible
81
+ ```
82
+
83
+ ## System Architecture Overview
84
+
85
+ ```
86
+ Reviews
87
+
88
+ Pre-trained Transformer (HuggingFace + PyTorch)
89
+
90
+ Sentiment Signals
91
+
92
+ Rating Meaning Inference
93
+
94
+ Summary Generation
95
+ ```
96
+
97
+ ## Tech Stack
98
+
99
+ * **Language**: Python
100
+ * **Deep Learning**: PyTorch
101
+ * **NLP Models**: HuggingFace Transformers (pre-trained), Flan-T5
102
+ * **Aggregated reasoning**
103
+ * **Data Handling**: JSON, Python data structures
104
+
105
+ ## Why SentimentScopeAI?
106
+
107
+ Every organization collects feedback, but reading hundreds or thousands of reviews is time-consuming, inconsistent, and difficult to scale. Important insights are often buried in repetitive comments, while actionable criticism gets overlooked.
108
+
109
+ SentimentScopeAI is designed to do the heavy lifting:
110
+ * Reads and analyzes large volumes of reviews automatically
111
+ * Identifies recurring pain points across users
112
+ * Picks out the main negative point from each review
113
+ * Helps teams focus on what to improve rather than sorting through raw text
114
+
115
+ ## Installation & Usage
116
+
117
+ SentimentScopeAI is distributed as a Python package and can be installed via pip:
118
+
119
+ ```
120
+ pip install sentimentscopeai
121
+ ```
122
+
123
+ Requirements:
124
+ * Python 3.9 or newer (Python 3.10 or above is recommended for best performance and compatibility)
125
+ * PyTorch
126
+ * HuggingFace Transformers
127
+ * Internet connection
128
+
129
+ All required dependencies are automatically installed with the package.
130
+
131
+ ## Basic Usage:
132
+
133
+ ```python
134
+ from sentimentscopeai import SentimentScopeAI
135
+
136
+ # If the bare file name does not work, pass the path including the folder: current_folder/json_file_name
137
+ review_bot = SentimentScopeAI("json_file_name", "company_name", "service_name")
138
+
139
+ print(review_bot.generate_summary())
140
+ ```
141
+
142
+ What Happens Internally
143
+
144
+ * Reviews are parsed from a structured JSON file
145
+ * Sentiment is inferred using pre-trained transformer models (PyTorch + HuggingFace)
146
+ * Rating meanings are semantically interpreted
147
+ * Flan-T5 finds the negatives from each review and summarizes the whole file
148
+
149
+ ## IMPORTANT NOTICE:
150
+
151
+ 1.) JSON Input Format (Required)
152
+
153
+ SentimentScopeAI only accepts JSON input.
154
+ The review file must follow this exact structure:
155
+
156
+ ```json
157
+ [
158
+ "review_text",
159
+ "review_text",
160
+ "review_text",
161
+ ...
162
+ ]
163
+ ```
164
+
165
+ Files that do not follow this structure, or that are not valid JSON, will cause errors.
166
+
167
+ 2.) JSON Must Be Valid
168
+
169
+ * File must be UTF-8 encoded
170
+ * No trailing commas
171
+ * No comments
172
+ * Must be a list ([]), not a single object
173
+
174
+ You can use a JSON validator if you are unsure.
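The format rules above can also be checked before the file is handed to the package. The following is a minimal sketch using only the standard library; `load_reviews` is a hypothetical helper and not part of SentimentScopeAI. It assumes the file is meant to be a UTF-8 encoded JSON list of review strings, as described in the notice.

```python
import json
from pathlib import Path
from typing import List


def load_reviews(path: str) -> List[str]:
    """Load a review file and check that it is a JSON list of strings.

    Hypothetical pre-flight check, not part of the package.
    Raises json.JSONDecodeError for invalid JSON (trailing commas, comments)
    and ValueError if the structure is not a list of strings.
    """
    text = Path(path).read_text(encoding="utf-8")
    data = json.loads(text)
    if not isinstance(data, list):
        raise ValueError("top-level JSON value must be a list ([])")
    bad = [i for i, item in enumerate(data) if not isinstance(item, str)]
    if bad:
        raise ValueError(f"non-string entries at positions: {bad}")
    return data
```

Running this check first turns a malformed file into an immediate, specific error rather than a failure partway through analysis.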
175
+
176
+ 3.) One Company & One Service per JSON File (Required)
177
+
178
+ This restriction is intentional:
179
+
180
+ * Sentiment aggregation assumes a single shared context
181
+ * Summary generation relies on consistent product-level patterns
182
+ * Mixing services can produce misleading summaries and recommendations
183
+
184
+ If you need to analyze multiple products or companies, create separate JSON files and run SentimentScopeAI independently for each dataset.
185
+
186
+ 4.) Model Loading Behavior
187
+
188
+ * Transformer models are lazy-loaded
189
+ * First run may take longer due to:
190
+ * Model downloads
191
+ * Tokenizer initialization
192
+ * Subsequent runs are significantly faster
193
+
194
+ This design improves startup efficiency and memory usage.
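The lazy-loading behavior described above can be illustrated with a small, generic pattern. This is a sketch of the general technique, not the package's implementation: `LazyModel` and its `loader` argument are hypothetical, and a real loader would typically call into HuggingFace Transformers rather than return a placeholder string.

```python
from typing import Any, Callable, Optional


class LazyModel:
    """Defer an expensive load until the model is first used (hypothetical wrapper)."""

    def __init__(self, loader: Callable[[], Any]) -> None:
        self._loader = loader               # called at most once, on first access
        self._model: Optional[Any] = None   # nothing is loaded at construction time
        self.load_count = 0                 # kept only to make the behavior observable

    @property
    def model(self) -> Any:
        if self._model is None:
            self._model = self._loader()
            self.load_count += 1
        return self._model


if __name__ == "__main__":
    lazy = LazyModel(lambda: "fake-model")  # stand-in for a real model download
    print(lazy.load_count)  # nothing loaded yet
    print(lazy.model)       # first access triggers the load
    print(lazy.load_count)  # later accesses reuse the cached object
```

Constructing the wrapper is cheap. The cost of downloading and initializing is paid only on first access, which matches the slower first run noted above.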
195
+
196
+
197
+ **SentimentScopeAI is provided as-is, and its author is not liable for any damages arising from its use.
198
+ Do not include personal, sensitive, or confidential information in review data.
199
+ All input data is processed locally and is not used for model training or retained beyond execution.
200
+ Users are responsible for ensuring ethical and appropriate use of the system.**
@@ -0,0 +1,10 @@
1
+ LICENSE
2
+ README.md
3
+ pyproject.toml
4
+ sentimentscopeai/__init__.py
5
+ sentimentscopeai/core.py
6
+ sentimentscopeai.egg-info/PKG-INFO
7
+ sentimentscopeai.egg-info/SOURCES.txt
8
+ sentimentscopeai.egg-info/dependency_links.txt
9
+ sentimentscopeai.egg-info/requires.txt
10
+ sentimentscopeai.egg-info/top_level.txt
@@ -0,0 +1,3 @@
1
+ torch
2
+ transformers
3
+ sentencepiece
@@ -0,0 +1 @@
1
+ sentimentscopeai
@@ -0,0 +1,4 @@
1
+ [egg_info]
2
+ tag_build =
3
+ tag_date = 0
4
+