true-lies-validator 0.7.0__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- true_lies_validator-0.7.0/LICENSE +21 -0
- true_lies_validator-0.7.0/PKG-INFO +676 -0
- true_lies_validator-0.7.0/README.md +655 -0
- true_lies_validator-0.7.0/pyproject.toml +24 -0
- true_lies_validator-0.7.0/setup.cfg +4 -0
- true_lies_validator-0.7.0/setup.py +21 -0
- true_lies_validator-0.7.0/tests/test_conversation.py +531 -0
- true_lies_validator-0.7.0/tests/test_core.py +175 -0
- true_lies_validator-0.7.0/tests/test_edge_cases.py +399 -0
- true_lies_validator-0.7.0/tests/test_extractors.py +288 -0
- true_lies_validator-0.7.0/tests/test_utils.py +44 -0
- true_lies_validator-0.7.0/true_lies/__init__.py +88 -0
- true_lies_validator-0.7.0/true_lies/config.py +11 -0
- true_lies_validator-0.7.0/true_lies/conversation.py +524 -0
- true_lies_validator-0.7.0/true_lies/extractors.py +36 -0
- true_lies_validator-0.7.0/true_lies/html_reporter.py +1441 -0
- true_lies_validator-0.7.0/true_lies/polarity.py +86 -0
- true_lies_validator-0.7.0/true_lies/runner.py +160 -0
- true_lies_validator-0.7.0/true_lies/scenario.py +25 -0
- true_lies_validator-0.7.0/true_lies/semantic.py +84 -0
- true_lies_validator-0.7.0/true_lies/semantic_data/banking.json +60 -0
- true_lies_validator-0.7.0/true_lies/semantic_data/energy.json +23 -0
- true_lies_validator-0.7.0/true_lies/semantic_data/insurance.json +38 -0
- true_lies_validator-0.7.0/true_lies/semantic_data/motorcycle_dealership.json +28 -0
- true_lies_validator-0.7.0/true_lies/semantic_data/retail.json +12 -0
- true_lies_validator-0.7.0/true_lies/utils.py +694 -0
- true_lies_validator-0.7.0/true_lies/validation_core.py +117 -0
- true_lies_validator-0.7.0/true_lies_validator.egg-info/PKG-INFO +676 -0
- true_lies_validator-0.7.0/true_lies_validator.egg-info/SOURCES.txt +30 -0
- true_lies_validator-0.7.0/true_lies_validator.egg-info/dependency_links.txt +1 -0
- true_lies_validator-0.7.0/true_lies_validator.egg-info/requires.txt +1 -0
- true_lies_validator-0.7.0/true_lies_validator.egg-info/top_level.txt +1 -0
|
@@ -0,0 +1,21 @@
|
|
|
1
|
+
MIT License
|
|
2
|
+
|
|
3
|
+
Copyright (c) 2025 Pato Miner
|
|
4
|
+
|
|
5
|
+
Permission is hereby granted, free of charge, to any person obtaining a copy
|
|
6
|
+
of this software and associated documentation files (the "Software"), to deal
|
|
7
|
+
in the Software without restriction, including without limitation the rights
|
|
8
|
+
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
|
9
|
+
copies of the Software, and to permit persons to whom the Software is
|
|
10
|
+
furnished to do so, subject to the following conditions:
|
|
11
|
+
|
|
12
|
+
The above copyright notice and this permission notice shall be included in all
|
|
13
|
+
copies or substantial portions of the Software.
|
|
14
|
+
|
|
15
|
+
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
|
16
|
+
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
|
17
|
+
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
|
18
|
+
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
|
19
|
+
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
|
20
|
+
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
|
|
21
|
+
SOFTWARE.
|
|
@@ -0,0 +1,676 @@
|
|
|
1
|
+
Metadata-Version: 2.4
|
|
2
|
+
Name: true-lies-validator
|
|
3
|
+
Version: 0.7.0
|
|
4
|
+
Summary: True Lies - Separating truth from AI fiction. A powerful library for detecting LLM hallucinations, validating AI responses, and generating professional HTML reports with interactive dashboards.
|
|
5
|
+
Author: Pato Miner
|
|
6
|
+
Author-email: Pato Miner <patominer@gmail.com>
|
|
7
|
+
Keywords: llm,validation,hallucination-detection,ai,python,truth-detection,html-reports,dashboard,chatbot-testing
|
|
8
|
+
Classifier: Programming Language :: Python :: 3
|
|
9
|
+
Classifier: License :: OSI Approved :: MIT License
|
|
10
|
+
Classifier: Operating System :: OS Independent
|
|
11
|
+
Classifier: Development Status :: 4 - Beta
|
|
12
|
+
Classifier: Intended Audience :: Developers
|
|
13
|
+
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
|
|
14
|
+
Requires-Python: >=3.10
|
|
15
|
+
Description-Content-Type: text/markdown
|
|
16
|
+
License-File: LICENSE
|
|
17
|
+
Requires-Dist: nltk
|
|
18
|
+
Dynamic: author
|
|
19
|
+
Dynamic: license-file
|
|
20
|
+
Dynamic: requires-python
|
|
21
|
+
|
|
22
|
+
# True Lies Validator 🎭
|
|
23
|
+
|
|
24
|
+
**The easiest library to validate LLM and chatbot responses**
|
|
25
|
+
|
|
26
|
+
Validates whether your LLM or chatbot tells the truth, remembers context, and stays coherent across turns. Well suited to automated conversation testing.
|
|
27
|
+
|
|
28
|
+
## 🚀 Quick Installation
|
|
29
|
+
|
|
30
|
+
```bash
|
|
31
|
+
# Install the library
|
|
32
|
+
pip install true-lies-validator
|
|
33
|
+
|
|
34
|
+
# Verify installation
|
|
35
|
+
python -c "from true_lies import ConversationValidator, HTMLReporter; print('✅ Installed correctly')"
|
|
36
|
+
```
|
|
37
|
+
|
|
38
|
+
> **📦 Current version: 0.7.0** - With HTML Reporter, interactive dashboards, and advanced analytics
|
|
39
|
+
|
|
40
|
+
## ⚡ Get Started in 2 Minutes
|
|
41
|
+
|
|
42
|
+
### 1. Basic Validation (1 minute)
|
|
43
|
+
|
|
44
|
+
```python
|
|
45
|
+
from true_lies import ConversationValidator
|
|
46
|
+
|
|
47
|
+
# Create validator
|
|
48
|
+
conv = ConversationValidator()
|
|
49
|
+
|
|
50
|
+
# Add conversation with automatic reporting
|
|
51
|
+
conv.add_turn_and_report(
|
|
52
|
+
user_input="Hello, I'm John, my email is john@company.com",
|
|
53
|
+
bot_response="Hello John! I'll help you with your inquiry.",
|
|
54
|
+
expected_facts={'name': 'John', 'email': 'john@company.com'},
|
|
55
|
+
title="Turn 1: User identifies themselves"
|
|
56
|
+
)
|
|
57
|
+
|
|
58
|
+
# Validate if the bot remembers the context
|
|
59
|
+
final_response = "John, your inquiry about john@company.com is resolved"
|
|
60
|
+
retention = conv.validate_and_report(
|
|
61
|
+
response=final_response,
|
|
62
|
+
facts_to_check=['name', 'email'],
|
|
63
|
+
title="Retention Test"
|
|
64
|
+
)
|
|
65
|
+
|
|
66
|
+
# Automatic result: ✅ PASS or ❌ FAIL
|
|
67
|
+
```
|
|
68
|
+
|
|
69
|
+
### 2. Complete Multi-turn Validation (2 minutes)
|
|
70
|
+
|
|
71
|
+
```python
|
|
72
|
+
from true_lies import ConversationValidator
|
|
73
|
+
|
|
74
|
+
def test_chatbot_support():
|
|
75
|
+
"""Complete support chatbot test"""
|
|
76
|
+
|
|
77
|
+
# Create validator
|
|
78
|
+
conv = ConversationValidator()
|
|
79
|
+
|
|
80
|
+
# Turn 1: User reports problem
|
|
81
|
+
conv.add_turn_and_report(
|
|
82
|
+
user_input="My app doesn't work, I'm user ID 12345",
|
|
83
|
+
bot_response="Hello, I'll help you. What error do you see?",
|
|
84
|
+
expected_facts={'user_id': '12345', 'issue_type': 'app_not_working'},
|
|
85
|
+
title="Turn 1: User reports problem"
|
|
86
|
+
)
|
|
87
|
+
|
|
88
|
+
# Turn 2: User provides details
|
|
89
|
+
conv.add_turn_and_report(
|
|
90
|
+
user_input="Error 500 on login, email john@company.com",
|
|
91
|
+
bot_response="I understand, error 500 on login. Checking your account.",
|
|
92
|
+
expected_facts={'error_code': '500', 'email': 'john@company.com'},
|
|
93
|
+
title="Turn 2: User provides details"
|
|
94
|
+
)
|
|
95
|
+
|
|
96
|
+
# Show conversation summary
|
|
97
|
+
conv.print_conversation_summary("Conversation Summary")
|
|
98
|
+
|
|
99
|
+
# Final test: Does the bot remember everything?
|
|
100
|
+
final_response = "John (ID 12345), your error 500 will be fixed in 2 hours. We'll confirm at john@company.com"
|
|
101
|
+
retention = conv.validate_and_report(
|
|
102
|
+
response=final_response,
|
|
103
|
+
facts_to_check=['user_id', 'error_code', 'email'],
|
|
104
|
+
title="Context Retention Test"
|
|
105
|
+
)
|
|
106
|
+
|
|
107
|
+
# Return result for automated tests
|
|
108
|
+
return retention['retention_score'] >= 0.8
|
|
109
|
+
|
|
110
|
+
# Run test
|
|
111
|
+
if __name__ == "__main__":
|
|
112
|
+
test_chatbot_support()
|
|
113
|
+
```
|
|
114
|
+
|
|
115
|
+
## 🎯 Popular Use Cases
|
|
116
|
+
|
|
117
|
+
### E-commerce
|
|
118
|
+
```python
|
|
119
|
+
# Customer buying product
|
|
120
|
+
conv.add_turn_and_report(
|
|
121
|
+
user_input="Hello, I'm Maria, I want to buy a laptop for $1500",
|
|
122
|
+
bot_response="Hello Maria! I'll help you with the laptop. Registered email: maria@store.com",
|
|
123
|
+
expected_facts={'customer_name': 'Maria', 'product': 'laptop', 'budget': '1500'},
|
|
124
|
+
title="Turn 1: Customer identifies themselves"
|
|
125
|
+
)
|
|
126
|
+
```
|
|
127
|
+
|
|
128
|
+
### Banking
|
|
129
|
+
```python
|
|
130
|
+
# Customer requesting loan
|
|
131
|
+
conv.add_turn_and_report(
|
|
132
|
+
user_input="I'm Carlos, I work at TechCorp, I earn $95,000, I want a loan",
|
|
133
|
+
bot_response="Hello Carlos! I'll help you with your loan. Email: carlos@bank.com",
|
|
134
|
+
expected_facts={'customer_name': 'Carlos', 'employer': 'TechCorp', 'income': '95000'},
|
|
135
|
+
title="Turn 1: Customer requests loan"
|
|
136
|
+
)
|
|
137
|
+
```
|
|
138
|
+
|
|
139
|
+
### Technical Support
|
|
140
|
+
```python
|
|
141
|
+
# User reports problem
|
|
142
|
+
conv.add_turn_and_report(
|
|
143
|
+
user_input="My app doesn't work, I'm user ID 12345",
|
|
144
|
+
bot_response="Hello, I'll help you. What error do you see?",
|
|
145
|
+
expected_facts={'user_id': '12345', 'issue_type': 'app_not_working'},
|
|
146
|
+
title="Turn 1: User reports problem"
|
|
147
|
+
)
|
|
148
|
+
```
|
|
149
|
+
|
|
150
|
+
## 🔧 Main Methods
|
|
151
|
+
|
|
152
|
+
### `add_turn_and_report()` - Add turn with automatic reporting
|
|
153
|
+
```python
|
|
154
|
+
conv.add_turn_and_report(
|
|
155
|
+
user_input="...",
|
|
156
|
+
bot_response="...",
|
|
157
|
+
expected_facts={'key': 'value'},
|
|
158
|
+
title="Turn description"
|
|
159
|
+
)
|
|
160
|
+
```
|
|
161
|
+
|
|
162
|
+
### `validate_and_report()` - Validate retention with automatic reporting
|
|
163
|
+
```python
|
|
164
|
+
retention = conv.validate_and_report(
|
|
165
|
+
response="Bot response to validate",
|
|
166
|
+
facts_to_check=['fact1', 'fact2'],
|
|
167
|
+
title="Retention Test"
|
|
168
|
+
)
|
|
169
|
+
```
|
|
170
|
+
|
|
171
|
+
### `print_conversation_summary()` - Conversation summary
|
|
172
|
+
```python
|
|
173
|
+
conv.print_conversation_summary("Conversation Summary")
|
|
174
|
+
```
|
|
175
|
+
|
|
176
|
+
## 📊 Supported Fact Types
|
|
177
|
+
|
|
178
|
+
The library automatically detects these types of information:
|
|
179
|
+
|
|
180
|
+
- **Names**: "John", "Maria Gonzalez"
|
|
181
|
+
- **Emails**: "john@company.com", "maria@store.com"
|
|
182
|
+
- **Phones**: "+1-555-123-4567", "(555) 123-4567"
|
|
183
|
+
- **IDs**: "12345", "USER-001", "POL-2024-001"
|
|
184
|
+
- **Amounts**: "$1,500", "1500", "USD 1500"
|
|
185
|
+
- **Employers**: "TechCorp", "Google Inc", "Microsoft"
|
|
186
|
+
- **Dates**: "2024-12-31", "31/12/2024", "December 31, 2024"
|
|
187
|
+
- **Percentages**: "15%", "15 percent", "fifteen percent"
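
As a rough illustration of what detecting a few of these fact types can involve, the sketch below pulls an email, an amount, and a percentage out of a message with plain regular expressions. It is not the library's extractor code: the `PATTERNS` table and the `extract_facts` helper are hypothetical names, and the patterns only cover the simplest formats listed above.

```python
import re

# Hypothetical patterns for illustration; not the patterns true_lies ships.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "amount": re.compile(r"\$\s?(\d[\d,]*(?:\.\d+)?)"),
    "percentage": re.compile(r"(\d+(?:\.\d+)?)\s?%"),
}


def extract_facts(text: str) -> dict:
    """Return the first match of each pattern found in text."""
    found = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        if match:
            # Use the capture group when the pattern has one, else the whole match.
            value = match.group(1) if pattern.groups else match.group(0)
            # Drop thousands separators so "$1,500" and "1500" compare equal.
            found[name] = value.replace(",", "")
    return found


facts = extract_facts("I'm Maria (maria@store.com), my budget is $1,500 with a 15% discount")
print(facts)
```

Normalizing values before comparison (here, stripping thousands separators) is the design choice that lets differently formatted amounts count as the same fact.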
|
|
188
|
+
|
|
189
|
+
## 🎨 Automatic Reporting
|
|
190
|
+
|
|
191
|
+
True Lies handles the reporting for you. A single call replaces the manual printing:
|
|
192
|
+
|
|
193
|
+
```python
|
|
194
|
+
# Before (30+ lines of manual code)
|
|
195
|
+
print("📊 Detailed results:")
|
|
196
|
+
for fact in facts:
|
|
197
|
+
retained = retention.get(f'{fact}_retained', False)
|
|
198
|
+
# ... 25 more lines of manual prints
|
|
199
|
+
|
|
200
|
+
# After (one call)
|
|
201
|
+
retention = conv.validate_and_report(
|
|
202
|
+
response=final_response,
|
|
203
|
+
facts_to_check=['fact1', 'fact2'],
|
|
204
|
+
title="Retention Test"
|
|
205
|
+
)
|
|
206
|
+
```
|
|
207
|
+
|
|
208
|
+
## 📊 HTML Reports & Dashboard
|
|
209
|
+
|
|
210
|
+
Generate professional HTML reports with interactive dashboards for automated chatbot testing:
|
|
211
|
+
|
|
212
|
+
### Quick HTML Report
|
|
213
|
+
|
|
214
|
+
```python
|
|
215
|
+
from true_lies import HTMLReporter
|
|
216
|
+
|
|
217
|
+
# Generate test data
|
|
218
|
+
results = [
|
|
219
|
+
{'test_name': 'Test 1', 'retention_score': 0.85, 'all_retained': True, 'facts_retained': 3, 'total_facts': 3, 'timestamp': '2024-12-31T10:00:00'},
|
|
220
|
+
{'test_name': 'Test 2', 'retention_score': 0.60, 'all_retained': False, 'facts_retained': 2, 'total_facts': 3, 'timestamp': '2024-12-31T11:00:00'}
|
|
221
|
+
]
|
|
222
|
+
|
|
223
|
+
# Generate HTML report
|
|
224
|
+
reporter = HTMLReporter()
|
|
225
|
+
output_file = reporter.generate_report(
|
|
226
|
+
results=results,
|
|
227
|
+
title="Chatbot Validation Report - December 2024",
|
|
228
|
+
show_details=True
|
|
229
|
+
)
|
|
230
|
+
|
|
231
|
+
print(f"📊 Report generated: {output_file}")
|
|
232
|
+
```
|
|
233
|
+
|
|
234
|
+
### Advanced Features
|
|
235
|
+
|
|
236
|
+
**📈 Interactive Charts:**
|
|
237
|
+
- Success Rate Analysis
|
|
238
|
+
- Performance by Category
|
|
239
|
+
- Response Time Analysis
|
|
240
|
+
- Facts Retention Analysis
|
|
241
|
+
- Weekly Performance Trends
|
|
242
|
+
- Performance Comparisons
|
|
243
|
+
|
|
244
|
+
**🔍 Advanced Filtering:**
|
|
245
|
+
- Filter by score range
|
|
246
|
+
- Filter by date range
|
|
247
|
+
- Filter by facts count
|
|
248
|
+
- Real-time search with smart operators
|
|
249
|
+
- Sort by date, score, or status
|
|
250
|
+
|
|
251
|
+
**📊 Temporal Analysis:**
|
|
252
|
+
- Daily/Weekly/Monthly views
|
|
253
|
+
- Baseline comparisons
|
|
254
|
+
- Trend analysis
|
|
255
|
+
- Performance tracking over time
|
|
256
|
+
|
|
257
|
+
**📄 Export Options:**
|
|
258
|
+
- PDF export with full formatting
|
|
259
|
+
- High-quality charts and graphs
|
|
260
|
+
- Multi-page reports
|
|
261
|
+
- Professional styling
|
|
262
|
+
|
|
263
|
+
**💬 Detailed Test Information:**
|
|
264
|
+
- User input text
|
|
265
|
+
- Bot response text
|
|
266
|
+
- Expected response text
|
|
267
|
+
- Reference text comparison
|
|
268
|
+
- Facts analysis per test
|
|
269
|
+
- Conversation summaries
|
|
270
|
+
|
|
271
|
+
### Example: Complete Test Suite
|
|
272
|
+
|
|
273
|
+
```python
|
|
274
|
+
from true_lies import ConversationValidator, HTMLReporter
|
|
275
|
+
from datetime import datetime, timedelta
|
|
276
|
+
|
|
277
|
+
def run_comprehensive_tests():
|
|
278
|
+
"""Run comprehensive chatbot tests and generate HTML report"""
|
|
279
|
+
|
|
280
|
+
results = []
|
|
281
|
+
|
|
282
|
+
# Test 1: Customer Service
|
|
283
|
+
conv1 = ConversationValidator()
|
|
284
|
+
conv1.add_turn(
|
|
285
|
+
user_input="Hello, I'm John, email john@company.com, ID 12345",
|
|
286
|
+
bot_response="Hello John, I'll help you",
|
|
287
|
+
expected_facts={'name': 'John', 'email': 'john@company.com', 'id': '12345'}
|
|
288
|
+
)
|
|
289
|
+
|
|
290
|
+
result1 = conv1.validate_retention(
|
|
291
|
+
response="John (ID 12345), your request is processed. Confirmation sent to john@company.com",
|
|
292
|
+
facts_to_check=['name', 'email', 'id']
|
|
293
|
+
)
|
|
294
|
+
result1.update({
|
|
295
|
+
'test_name': 'Customer Service - User Identification',
|
|
296
|
+
'test_category': 'Customer Service',
|
|
297
|
+
'timestamp': datetime.now().isoformat(),
|
|
298
|
+
'user_input': "Hello, I'm John, email john@company.com, ID 12345",
|
|
299
|
+
'bot_response': "John (ID 12345), your request is processed. Confirmation sent to john@company.com",
|
|
300
|
+
'expected_response': "John (ID 12345), your request is processed. Confirmation sent to john@company.com"
|
|
301
|
+
})
|
|
302
|
+
results.append(result1)
|
|
303
|
+
|
|
304
|
+
# Test 2: Technical Support
|
|
305
|
+
conv2 = ConversationValidator()
|
|
306
|
+
conv2.add_turn(
|
|
307
|
+
user_input="My app crashed, error code 500, user ID 67890",
|
|
308
|
+
bot_response="I'll help you with the crash",
|
|
309
|
+
expected_facts={'issue': 'app_crash', 'error': '500', 'user_id': '67890'}
|
|
310
|
+
)
|
|
311
|
+
|
|
312
|
+
result2 = conv2.validate_retention(
|
|
313
|
+
response="User 67890, your error 500 crash will be fixed in 2 hours",
|
|
314
|
+
facts_to_check=['user_id', 'error']
|
|
315
|
+
)
|
|
316
|
+
result2.update({
|
|
317
|
+
'test_name': 'Technical Support - Error Reporting',
|
|
318
|
+
'test_category': 'Technical Support',
|
|
319
|
+
'timestamp': (datetime.now() - timedelta(hours=1)).isoformat(),
|
|
320
|
+
'user_input': "My app crashed, error code 500, user ID 67890",
|
|
321
|
+
'bot_response': "User 67890, your error 500 crash will be fixed in 2 hours",
|
|
322
|
+
'expected_response': "User 67890, your error 500 crash will be fixed in 2 hours"
|
|
323
|
+
})
|
|
324
|
+
results.append(result2)
|
|
325
|
+
|
|
326
|
+
# Generate comprehensive HTML report
|
|
327
|
+
reporter = HTMLReporter()
|
|
328
|
+
output_file = reporter.generate_report(
|
|
329
|
+
results=results,
|
|
330
|
+
title="Comprehensive Chatbot Validation Report",
|
|
331
|
+
show_details=True
|
|
332
|
+
)
|
|
333
|
+
|
|
334
|
+
print(f"✅ Comprehensive report generated: {output_file}")
|
|
335
|
+
return output_file
|
|
336
|
+
|
|
337
|
+
# Run tests and generate report
|
|
338
|
+
if __name__ == "__main__":
|
|
339
|
+
run_comprehensive_tests()
|
|
340
|
+
```
|
|
341
|
+
|
|
342
|
+
### CI/CD Integration
|
|
343
|
+
|
|
344
|
+
The HTML report is written to a regular file, so a CI/CD pipeline can run the tests and upload the report as a build artifact:
|
|
345
|
+
|
|
346
|
+
```yaml
|
|
347
|
+
# GitHub Actions example
|
|
348
|
+
- name: Run Chatbot Tests
|
|
349
|
+
run: |
|
|
350
|
+
python -m pytest tests/
|
|
351
|
+
python examples/comprehensive_test_suite.py
|
|
352
|
+
|
|
353
|
+
- name: Upload HTML Report
|
|
354
|
+
uses: actions/upload-artifact@v4
|
|
355
|
+
with:
|
|
356
|
+
name: chatbot-validation-report
|
|
357
|
+
path: "*.html"
|
|
358
|
+
```
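
For the pipeline step to fail when the chatbot regresses, the test script has to turn scores into an exit code. The sketch below shows one way to do that. The `failing_tests` and `main` helpers and the 0.8 cut-off are assumptions for illustration, and the result dictionaries only reuse the `test_name` and `retention_score` keys shown in the report examples above.

```python
import sys


def failing_tests(results: list, min_score: float = 0.8) -> list:
    """Return the names of tests whose retention score is below min_score."""
    return [
        r.get("test_name", "<unnamed>")
        for r in results
        if r.get("retention_score", 0.0) < min_score
    ]


def main(results: list) -> int:
    """Print failures to stderr and return a process exit code."""
    failed = failing_tests(results)
    for name in failed:
        print(f"FAILED: {name}", file=sys.stderr)
    # A non-zero exit code is what makes a CI step fail.
    return 1 if failed else 0


sample = [
    {"test_name": "Test 1", "retention_score": 0.85},
    {"test_name": "Test 2", "retention_score": 0.60},
]
exit_code = main(sample)
print("exit code:", exit_code)
# In a real script, finish with: sys.exit(exit_code)
```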
|
|
359
|
+
|
|
360
|
+
### Report Features
|
|
361
|
+
|
|
362
|
+
**🎯 Key Metrics:**
|
|
363
|
+
- Total candidates tested
|
|
364
|
+
- Pass rate percentage
|
|
365
|
+
- Average retention score
|
|
366
|
+
- Facts retention rate
|
|
367
|
+
|
|
368
|
+
**📊 Visual Analytics:**
|
|
369
|
+
- Interactive Chart.js graphs
|
|
370
|
+
- Real-time filtering and search
|
|
371
|
+
- Temporal analysis controls
|
|
372
|
+
- Performance comparisons
|
|
373
|
+
|
|
374
|
+
**🔍 Detailed Analysis:**
|
|
375
|
+
- Individual test results
|
|
376
|
+
- Facts retention per test
|
|
377
|
+
- Conversation text comparison
|
|
378
|
+
- Failure analysis
|
|
379
|
+
|
|
380
|
+
**📱 Responsive Design:**
|
|
381
|
+
- Mobile-friendly interface
|
|
382
|
+
- Professional styling
|
|
383
|
+
- Export to PDF
|
|
384
|
+
- Shareable reports
|
|
385
|
+
|
|
386
|
+
## 📈 Automatic Metrics
|
|
387
|
+
|
|
388
|
+
- **Retention Score**: 0.0 - 1.0 (how well it remembers)
|
|
389
|
+
- **Facts Retained**: X/Y facts remembered
|
|
390
|
+
- **Evaluation**: A, B, C, D, F (automatic grading)
|
|
391
|
+
- **Details per Fact**: What was found and what wasn't
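
To make these metrics concrete, here is a minimal sketch of how a retention score and a letter grade could be derived from a response. It assumes plain case-insensitive substring matching and invented grade boundaries, since the README states neither. `retention_report`, `grade`, and `details` are hypothetical names, while `retention_score`, `facts_retained`, and `total_facts` mirror the keys used in the report examples.

```python
def retention_report(response: str, expected_facts: dict) -> dict:
    """Score how many expected fact values appear verbatim in the response."""
    lowered = response.lower()
    details = {name: value.lower() in lowered for name, value in expected_facts.items()}
    retained = sum(details.values())
    total = len(details)
    score = retained / total if total else 0.0

    # Assumed grade boundaries; the README does not state the real ones.
    for grade, floor in (("A", 0.9), ("B", 0.8), ("C", 0.7), ("D", 0.6)):
        if score >= floor:
            break
    else:
        grade = "F"

    return {
        "retention_score": score,
        "facts_retained": retained,
        "total_facts": total,
        "grade": grade,
        "details": details,
    }


report = retention_report(
    "User 67890, your error 500 crash will be fixed in 2 hours",
    {"user_id": "67890", "error": "500", "email": "john@company.com"},
)
print(report["facts_retained"], "/", report["total_facts"], report["grade"])
```

Here two of the three facts appear in the response, so the score is 2/3 and the per-fact details show that the email was the one dropped.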
|
|
392
|
+
|
|
393
|
+
## 🚀 Complete Examples
|
|
394
|
+
|
|
395
|
+
### Example 1: Support Chatbot
|
|
396
|
+
```python
|
|
397
|
+
from true_lies import ConversationValidator
|
|
398
|
+
|
|
399
|
+
def test_support_chatbot():
|
|
400
|
+
conv = ConversationValidator()
|
|
401
|
+
|
|
402
|
+
# Turn 1: User reports problem
|
|
403
|
+
conv.add_turn_and_report(
|
|
404
|
+
user_input="My app doesn't work, I'm user ID 12345",
|
|
405
|
+
bot_response="Hello, I'll help you. What error do you see?",
|
|
406
|
+
expected_facts={'user_id': '12345', 'issue_type': 'app_not_working'},
|
|
407
|
+
title="Turn 1: User reports problem"
|
|
408
|
+
)
|
|
409
|
+
|
|
410
|
+
# Turn 2: User provides details
|
|
411
|
+
conv.add_turn_and_report(
|
|
412
|
+
user_input="Error 500 on login, email john@company.com",
|
|
413
|
+
bot_response="I understand, error 500 on login. Checking your account.",
|
|
414
|
+
expected_facts={'error_code': '500', 'email': 'john@company.com'},
|
|
415
|
+
title="Turn 2: User provides details"
|
|
416
|
+
)
|
|
417
|
+
|
|
418
|
+
# Final test
|
|
419
|
+
final_response = "John (ID 12345), your error 500 will be fixed in 2 hours. We'll confirm at john@company.com"
|
|
420
|
+
retention = conv.validate_and_report(
|
|
421
|
+
response=final_response,
|
|
422
|
+
facts_to_check=['user_id', 'error_code', 'email'],
|
|
423
|
+
title="Context Retention Test"
|
|
424
|
+
)
|
|
425
|
+
|
|
426
|
+
return retention['retention_score'] >= 0.8
|
|
427
|
+
|
|
428
|
+
if __name__ == "__main__":
|
|
429
|
+
test_support_chatbot()
|
|
430
|
+
```
|
|
431
|
+
|
|
432
|
+
### Example 2: E-commerce
|
|
433
|
+
```python
|
|
434
|
+
from true_lies import ConversationValidator
|
|
435
|
+
|
|
436
|
+
def test_ecommerce_chatbot():
|
|
437
|
+
conv = ConversationValidator()
|
|
438
|
+
|
|
439
|
+
# Turn 1: Customer identifies themselves
|
|
440
|
+
conv.add_turn_and_report(
|
|
441
|
+
user_input="Hello, I'm Maria Gonzalez, email maria@store.com, I want to buy a laptop",
|
|
442
|
+
bot_response="Hello Maria! I'll help you with the laptop. Registered email: maria@store.com",
|
|
443
|
+
expected_facts={'customer_name': 'Maria Gonzalez', 'email': 'maria@store.com', 'product_interest': 'laptop'},
|
|
444
|
+
title="Turn 1: Customer identifies themselves"
|
|
445
|
+
)
|
|
446
|
+
|
|
447
|
+
# Turn 2: Customer specifies budget
|
|
448
|
+
conv.add_turn_and_report(
|
|
449
|
+
user_input="My budget is $1500, I need it for programming",
|
|
450
|
+
bot_response="Perfect Maria, we have laptops for programming in that range. I'll send options to maria@store.com",
|
|
451
|
+
expected_facts={'budget': '1500', 'use_case': 'programming'},
|
|
452
|
+
title="Turn 2: Customer specifies budget"
|
|
453
|
+
)
|
|
454
|
+
|
|
455
|
+
# Final test
|
|
456
|
+
final_response = "Maria, your programming laptop for $1500 is ready. I'll send the invoice to maria@store.com"
|
|
457
|
+
retention = conv.validate_and_report(
|
|
458
|
+
response=final_response,
|
|
459
|
+
facts_to_check=['customer_name', 'email', 'budget', 'use_case'],
|
|
460
|
+
title="E-commerce Retention Test"
|
|
461
|
+
)
|
|
462
|
+
|
|
463
|
+
return retention['retention_score'] >= 0.8
|
|
464
|
+
|
|
465
|
+
if __name__ == "__main__":
|
|
466
|
+
test_ecommerce_chatbot()
|
|
467
|
+
```
|
|
468
|
+
|
|
469
|
+
## 🔍 Advanced Validation (Optional)
|
|
470
|
+
|
|
471
|
+
For more complex cases, you can also validate candidate responses against a scenario made of expected facts and a reference text:
|
|
472
|
+
|
|
473
|
+
```python
|
|
474
|
+
from true_lies import create_scenario, validate_llm_candidates
|
|
475
|
+
|
|
476
|
+
# Facts that MUST be in the response
|
|
477
|
+
facts = {
|
|
478
|
+
'policy_number': {'extractor': 'categorical', 'expected': 'POL-2024-001'},
|
|
479
|
+
'premium': {'extractor': 'money', 'expected': '850.00'},
|
|
480
|
+
'coverage_type': {'extractor': 'categorical', 'expected': 'auto insurance'}
|
|
481
|
+
}
|
|
482
|
+
|
|
483
|
+
# Reference text for semantic comparison
|
|
484
|
+
reference_text = "Your auto insurance policy #POL-2024-001 has a premium of $850.00"
|
|
485
|
+
|
|
486
|
+
# Create scenario (with automatic fact weighting)
|
|
487
|
+
scenario = create_scenario(
|
|
488
|
+
facts=facts,
|
|
489
|
+
semantic_reference=reference_text,
|
|
490
|
+
semantic_mappings={} # Weights are applied automatically
|
|
491
|
+
)
|
|
492
|
+
|
|
493
|
+
# Validate responses
|
|
494
|
+
candidates = [
|
|
495
|
+
"Policy POL-2024-001 covers your automobile with monthly payments of $850.00",
|
|
496
|
+
"Your car insurance policy POL-2024-001 costs $850 monthly"
|
|
497
|
+
]
|
|
498
|
+
|
|
499
|
+
results = validate_llm_candidates(
|
|
500
|
+
scenario=scenario,
|
|
501
|
+
candidates=candidates,
|
|
502
|
+
threshold=0.7
|
|
503
|
+
)
|
|
504
|
+
```
|
|
505
|
+
|
|
506
|
+
### 🎯 Advanced Features
|
|
507
|
+
|
|
508
|
+
**Automatic Fact Weighting:**
|
|
509
|
+
- Values in your `expected` facts are automatically weighted
|
|
510
|
+
- Significant improvement in similarity scores (+55% in typical cases)
|
|
511
|
+
- No additional configuration needed
|
|
512
|
+
|
|
513
|
+
**Improved Polarity Detection:**
|
|
514
|
+
- Correctly detects negative phrases with "not", "does not", "don't", etc.
|
|
515
|
+
- Patterns in English and Spanish
|
|
516
|
+
- Avoids false positives with substrings
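
The substring point is easiest to see in code. The sketch below is not the contents of `polarity.py`. It is a small stand-alone example with a hypothetical word list (`NEGATIONS`) showing how word-boundary anchors keep "not" from matching inside "notice" or "another".

```python
import re

# Hypothetical, deliberately small negation list (English and Spanish).
NEGATIONS = ["not", "no", "never", "don't", "doesn't", "nunca", "tampoco"]

# \b anchors restrict matches to whole words.
NEGATION_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(word) for word in NEGATIONS) + r")\b",
    re.IGNORECASE,
)


def polarity(text: str) -> str:
    """Classify text as 'negative' if it contains a whole-word negation."""
    return "negative" if NEGATION_RE.search(text) else "positive"


for sentence in (
    "Your policy does not cover theft",
    "Please take notice of another payment",
    "La cuenta no tiene saldo",
):
    print(f"{polarity(sentence):8} | {sentence}")
```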
|
|
517
|
+
|
|
518
|
+
**Optimized Semantic Mappings:**
|
|
519
|
+
- Use simple and specific mappings
|
|
520
|
+
- Avoid over-mapping that can worsen scores
|
|
521
|
+
- Recommendation: minimal mappings or no mappings
|
|
522
|
+
|
|
523
|
+
### 💡 Best Practices
|
|
524
|
+
|
|
525
|
+
**1. Fact Configuration:**
|
|
526
|
+
```python
|
|
527
|
+
# ✅ CORRECT - For specific numbers
|
|
528
|
+
'account_number': {'extractor': 'number', 'expected': '2992'}
|
|
529
|
+
|
|
530
|
+
# ❌ INCORRECT - For specific numbers
|
|
531
|
+
'account_number': {'extractor': 'categorical', 'expected': '2992'}
|
|
532
|
+
|
|
533
|
+
# ✅ CORRECT - For categories
|
|
534
|
+
'account_type': {'extractor': 'categorical', 'expected': 'savings'}
|
|
535
|
+
```
|
|
536
|
+
|
|
537
|
+
**2. Semantic Mappings:**
|
|
538
|
+
```python
|
|
539
|
+
# ✅ CORRECT - Simple mappings
|
|
540
|
+
semantic_mappings = {
|
|
541
|
+
"account": ["cuenta"],
|
|
542
|
+
"balance": ["saldo", "monto"]
|
|
543
|
+
}
|
|
544
|
+
|
|
545
|
+
# ❌ INCORRECT - Excessive mappings
|
|
546
|
+
semantic_mappings = {
|
|
547
|
+
"phrases": ["the balance of your", "your term deposit account", ...] # Too aggressive
|
|
548
|
+
}
|
|
549
|
+
```
|
|
550
|
+
|
|
551
|
+
**3. Thresholds:**
|
|
552
|
+
- **0.6-0.7**: For strict validation
|
|
553
|
+
- **0.5-0.6**: For permissive validation
|
|
554
|
+
- **0.8+**: Only when the response must match the reference almost exactly
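
To see what a threshold does in practice, the sketch below scores one candidate against a reference and checks it at several cut-offs. It uses `difflib.SequenceMatcher` from the standard library as a stand-in similarity measure. The score `true_lies` computes is different, so the numbers are only illustrative, and `similarity` and `passes` are hypothetical helper names.

```python
from difflib import SequenceMatcher


def similarity(reference: str, candidate: str) -> float:
    """Stand-in similarity in [0, 1]; not the score true_lies computes."""
    return SequenceMatcher(None, reference.lower(), candidate.lower()).ratio()


def passes(reference: str, candidate: str, threshold: float) -> bool:
    """A candidate passes when its similarity reaches the threshold."""
    return similarity(reference, candidate) >= threshold


reference = "Your auto insurance policy #POL-2024-001 has a premium of $850.00"
candidate = "Your car insurance policy POL-2024-001 costs $850 monthly"

score = similarity(reference, candidate)
for threshold in (0.5, 0.7, 0.9):
    verdict = "PASS" if score >= threshold else "FAIL"
    print(f"threshold={threshold}: {verdict} (score={score:.2f})")
```

Raising the threshold can only turn a pass into a fail, never the reverse, which is why a paraphrased but correct answer may need a lower setting than an exact-match check.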
|
|
555
|
+
|
|
556
|
+
## 🎯 Available Extractors
|
|
557
|
+
|
|
558
|
+
- **`money`**: Monetary values ($1,234.56, USD 27, 100 dollars) - **Improved v0.6.2+**
|
|
559
|
+
- **`number`**: General numbers (25, 3.14, 1000)
|
|
560
|
+
- **`categorical`**: Categorical values with synonyms - **Improved v0.6.2+**
|
|
561
|
+
- **`email`**: Email addresses
|
|
562
|
+
- **`phone`**: Phone numbers
|
|
563
|
+
- **`hours`**: Time schedules (9:00 AM, 14:30, 3:00 PM)
|
|
564
|
+
- **`id`**: Identifiers (USER-001, POL-2024-001)
|
|
565
|
+
- **`regex`**: Custom patterns
|
|
566
|
+
|
|
567
|
+
### 🔧 Extractor Improvements (v0.6.2+)
|
|
568
|
+
|
|
569
|
+
**Improved `money` extractor:**
|
|
570
|
+
- Prioritizes amounts with currency symbols ($, USD, dollars)
|
|
571
|
+
- Avoids capturing non-monetary numbers
|
|
572
|
+
- Better accuracy in banking scenarios
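
The prioritization described above can be sketched as a two-pass search: look for an amount with a currency marker first, and fall back to a bare number only if none is found. This is an illustration under that assumption, not the code in `extractors.py`. `MARKED`, `BARE`, and `extract_money` are hypothetical names.

```python
import re
from typing import Optional

# Amounts with an explicit currency marker: "$3,000.60", "USD 27", "100 dollars".
MARKED = re.compile(
    r"(?:\$\s?|USD\s)(\d[\d,]*(?:\.\d+)?)|(\d[\d,]*(?:\.\d+)?)\s(?:dollars|USD)\b",
    re.IGNORECASE,
)
# Fallback: any bare number.
BARE = re.compile(r"\d[\d,]*(?:\.\d+)?")


def extract_money(text: str) -> Optional[str]:
    """Prefer a currency-marked amount; fall back to the first bare number."""
    match = MARKED.search(text)
    if match:
        raw = match.group(1) or match.group(2)
    else:
        bare = BARE.search(text)
        if bare is None:
            return None
        raw = bare.group(0)
    # Strip thousands separators so "3,000.60" becomes "3000.60".
    return raw.replace(",", "")


text = "The balance of your Term Deposit account 2992 is $3,000.60"
print(extract_money(text))
```

Without the first pass, the account number 2992 would be returned as the amount, which is the kind of banking false positive the list above refers to.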
|
|
573
|
+
|
|
574
|
+
**Improved `categorical` extractor:**
|
|
575
|
+
- Whole word matches (avoids false positives)
|
|
576
|
+
- Better detection of specific patterns
|
|
577
|
+
- Compatible with exact expected values
|
|
578
|
+
|
|
579
|
+
## 📚 Complete Documentation
|
|
580
|
+
|
|
581
|
+
- **[Multi-turn Validation Guide](MULTITURN_VALIDATION_README.md)** - Complete details
|
|
582
|
+
- **[Integration Guide](INTEGRATION_GUIDE.md)** - How to integrate into your project
|
|
583
|
+
- **[Email Extraction Guide](EMAIL_EXTRACTION_GUIDE.md)** - Advanced extraction
|
|
584
|
+
- **[Before/After Comparison](COMPARISON_BEFORE_AFTER.md)** - Library improvements
|
|
585
|
+
- **[HTML Reporter Guide](HTML_REPORTER_README.md)** - Complete HTML reporting documentation
|
|
586
|
+
|
|
587
|
+
## 🎯 Examples & Demos
|
|
588
|
+
|
|
589
|
+
### HTML Reporter Examples
|
|
590
|
+
- **[Basic HTML Report](examples/html_report_example.py)** - Simple report generation
|
|
591
|
+
- **[Advanced Filters Demo](examples/advanced_filters_demo.py)** - Advanced filtering capabilities
|
|
592
|
+
- **[Temporal Analysis Demo](examples/temporal_analysis_demo.py)** - Temporal analysis features
|
|
593
|
+
- **[Advanced Search Demo](examples/advanced_search_demo.py)** - Real-time search functionality
|
|
594
|
+
- **[PDF Export Demo](examples/pdf_export_demo.py)** - PDF export capabilities
|
|
595
|
+
|
|
596
|
+
### CI/CD Integration Examples
|
|
597
|
+
- **[GitHub Actions](.github/workflows/chatbot-validation.yml)** - Automated testing workflow
|
|
598
|
+
- **[Jenkins Pipeline](ci_cd/Jenkinsfile)** - Jenkins integration
|
|
599
|
+
- **[GitLab CI](.gitlab-ci.yml)** - GitLab CI configuration
|
|
600
|
+
- **[Test Runner](ci_cd/run_tests_and_report.py)** - Automated test execution
|
|
601
|
+
|
|
602
|
+
## 🛠️ Diagnostic Tools
|
|
603
|
+
|
|
604
|
+
### Diagnostic Tool
|
|
605
|
+
To diagnose similarity and extraction issues:
|
|
606
|
+
|
|
607
|
+
```python
|
|
608
|
+
from diagnostic_tool import run_custom_diagnosis
|
|
609
|
+
|
|
610
|
+
# Your configuration
|
|
611
|
+
fact_configs = {
|
|
612
|
+
'account_number': {'extractor': 'number', 'expected': '2992'},
|
|
613
|
+
'balance_amount': {'extractor': 'money', 'expected': '3,000.60'}
|
|
614
|
+
}
|
|
615
|
+
candidates = ["Your account 2992 has $3,000.60"]
|
|
616
|
+
|
|
617
|
+
# Diagnose
|
|
618
|
+
run_custom_diagnosis(
|
|
619
|
+
text="The balance of your Term Deposit account 2992 is $3,000.60",
|
|
620
|
+
fact_configs=fact_configs,
|
|
621
|
+
candidates=candidates
|
|
622
|
+
)
|
|
623
|
+
```
|
|
624
|
+
|
|
625
|
+
## 🔄 Changelog
|
|
626
|
+
|
|
627
|
+
### v0.7.0 (Current)
|
|
628
|
+
- ✅ **NEW: HTML Reporter** - Professional HTML reports with interactive dashboards
|
|
629
|
+
- ✅ **NEW: Interactive Charts** - Chart.js integration for visual analytics
|
|
630
|
+
- ✅ **NEW: Advanced Filtering** - Real-time search and filtering capabilities
|
|
631
|
+
- ✅ **NEW: Temporal Analysis** - Daily/Weekly/Monthly performance tracking
|
|
632
|
+
- ✅ **NEW: PDF Export** - High-quality PDF reports with full formatting
|
|
633
|
+
- ✅ **NEW: CI/CD Integration** - GitHub Actions, Jenkins, GitLab CI support
|
|
634
|
+
- ✅ **NEW: Detailed Test Information** - User input, bot response, expected response comparison
|
|
635
|
+
- ✅ **NEW: Responsive Design** - Mobile-friendly professional interface
|
|
636
|
+
|
|
637
|
+
### v0.6.4
|
|
638
|
+
- ✅ Improved polarity detection (detects "not", "does not", etc.)
|
|
639
|
+
- ✅ Complete negative patterns in English and Spanish
|
|
640
|
+
- ✅ Avoids false positives with substrings
|
|
641
|
+
|
|
642
|
+
### v0.6.3
|
|
643
|
+
- ✅ Duplicate function removed
|
|
644
|
+
- ✅ Consistent API
|
|
645
|
+
- ✅ Clean code
|
|
646
|
+
|
|
647
|
+
### v0.6.2
|
|
648
|
+
- ✅ Automatic fact weighting
|
|
649
|
+
- ✅ Improved similarity (+55% in typical cases)
|
|
650
|
+
- ✅ Improved money extractor
|
|
651
|
+
- ✅ English reporting
|
|
652
|
+
|
|
653
|
+
## 🤝 Contributing
|
|
654
|
+
|
|
655
|
+
Contributions are welcome! Please:
|
|
656
|
+
|
|
657
|
+
1. Fork the project
|
|
658
|
+
2. Create a feature branch (`git checkout -b feature/AmazingFeature`)
|
|
659
|
+
3. Commit your changes (`git commit -m 'Add some AmazingFeature'`)
|
|
660
|
+
4. Push to the branch (`git push origin feature/AmazingFeature`)
|
|
661
|
+
5. Open a Pull Request
|
|
662
|
+
|
|
663
|
+
## 📄 License
|
|
664
|
+
|
|
665
|
+
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
|
|
666
|
+
|
|
667
|
+
## 🙏 Acknowledgments
|
|
668
|
+
|
|
669
|
+
- NLTK for natural language processing capabilities
|
|
670
|
+
- The open source community for inspiration and feedback
|
|
671
|
+
|
|
672
|
+
---
|
|
673
|
+
|
|
674
|
+
**True Lies - Where AI meets reality** 🎭
|
|
675
|
+
|
|
676
|
+
*Have questions? Open an issue or contact the development team.*
|