eval-ai-library 0.3.3__py3-none-any.whl → 0.3.10__py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Potentially problematic release: this version of eval-ai-library might be problematic.

@@ -1,6 +1,6 @@
1
1
  Metadata-Version: 2.4
2
2
  Name: eval-ai-library
3
- Version: 0.3.3
3
+ Version: 0.3.10
4
4
  Summary: Comprehensive AI Model Evaluation Framework with support for multiple LLM providers
5
5
  Author-email: Aleksandr Meshkov <alekslynx90@gmail.com>
6
6
  License: MIT
@@ -45,6 +45,7 @@ Requires-Dist: html2text>=2020.1.16
45
45
  Requires-Dist: markdown>=3.4.0
46
46
  Requires-Dist: pandas>=2.0.0
47
47
  Requires-Dist: striprtf>=0.0.26
48
+ Requires-Dist: flask>=3.0.0
48
49
  Provides-Extra: dev
49
50
  Requires-Dist: pytest>=7.0.0; extra == "dev"
50
51
  Requires-Dist: pytest-asyncio>=0.21.0; extra == "dev"
@@ -807,6 +808,170 @@ response, cost = await chat_complete(
807
808
  )
808
809
  ```
809
810
 
811
+ ## Dashboard
812
+
813
+ The library includes an interactive web dashboard for visualizing evaluation results. Evaluation results are automatically saved to a local cache and can be viewed in the browser.
814
+
815
+ ### Features
816
+
817
+ - 📊 **Interactive Charts**: Visual representation of metrics with Chart.js
818
+ - 📈 **Metrics Summary**: Aggregate statistics across all evaluations
819
+ - 🔍 **Detailed View**: Drill down into individual test cases and metric results
820
+ - 💾 **Session History**: Access past evaluation runs
821
+ - 🎨 **Beautiful UI**: Modern, responsive interface with color-coded results
822
+ - 🔄 **Real-time Updates**: Refresh to see new evaluation results
823
+
824
+ ### Starting the Dashboard
825
+
826
+ The dashboard runs as a separate server that you start once and keep running:
827
+ ```bash
828
+ # Start dashboard server (from your project directory)
829
+ eval-lib dashboard
830
+
831
+ # Custom port if 14500 is busy
832
+ eval-lib dashboard --port 8080
833
+
834
+ # Custom cache directory
835
+ eval-lib dashboard --cache-dir /path/to/cache
836
+ ```
837
+
838
+ Once started, the dashboard is available at `http://localhost:14500` (or the port you passed with `--port`).
839
+
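+ The dashboard server also exposes the small JSON API that the page itself uses: `/api/latest`, `/api/sessions`, `/api/session/<session_id>`, and `/api/clear` (see `eval_lib/cli.py` later in this diff). A minimal sketch for polling it from Python, assuming the default port and using only the standard library:
+ ```python
+ import json
+ import urllib.request
+ from urllib.error import HTTPError
+ 
+ try:
+     # Fetch the most recent cached session from a running dashboard
+     with urllib.request.urlopen("http://localhost:14500/api/latest") as resp:
+         latest = json.load(resp)
+     print(latest["session_id"], latest["timestamp"])
+ except HTTPError:
+     # The endpoint returns 404 when nothing has been cached yet
+     print("No results cached yet")
+ ```
+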
840
+ ### Saving Results to Dashboard
841
+
842
+ Enable dashboard cache saving in your evaluation:
843
+ ```python
844
+ import asyncio
845
+ from eval_lib import (
846
+     evaluate,
847
+     EvalTestCase,
848
+     AnswerRelevancyMetric,
849
+     FaithfulnessMetric
850
+ )
851
+
852
+ async def evaluate_with_dashboard():
853
+     test_cases = [
854
+         EvalTestCase(
855
+             input="What is the capital of France?",
856
+             actual_output="Paris is the capital.",
857
+             expected_output="Paris",
858
+             retrieval_context=["Paris is the capital of France."]
859
+         )
860
+     ]
861
+
862
+     metrics = [
863
+         AnswerRelevancyMetric(model="gpt-4o-mini", threshold=0.7),
864
+         FaithfulnessMetric(model="gpt-4o-mini", threshold=0.8)
865
+     ]
866
+
867
+     # Results are saved to .eval_cache/ for dashboard viewing
868
+     results = await evaluate(
869
+         test_cases=test_cases,
870
+         metrics=metrics,
871
+         show_dashboard=True,  # ← Enable dashboard cache
872
+         session_name="My First Evaluation"  # Optional session name
873
+     )
874
+
875
+     return results
876
+
877
+ asyncio.run(evaluate_with_dashboard())
878
+ ```
879
+
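+ Under the hood, `show_dashboard=True` calls `save_results_to_cache()` from `eval_lib.dashboard_server` (added later in this diff), which appends one session entry to `.eval_cache/results.json`. The entry is shaped roughly like this sketch of the structure built by `_parse_results` (field values here are illustrative only):
+ ```python
+ # Illustrative session entry; the values are made up
+ {
+     "session_id": "My First Evaluation",
+     "timestamp": "2024-01-01 12:00:00",
+     "data": {
+         "test_cases": [...],       # per-test inputs/outputs and metric results
+         "metrics_summary": {...},  # avg_score, passed/failed, success_rate per metric
+         "total_cost": 0.00123,
+         "total_tests": 1,
+     },
+ }
+ ```
+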
880
+ ### Typical Workflow
881
+
882
+ **Terminal 1 - Start Dashboard (once):**
883
+ ```bash
884
+ cd ~/my_project
885
+ eval-lib dashboard
886
+ # Leave this terminal open - dashboard stays running
887
+ ```
888
+
889
+ **Terminal 2 - Run Evaluations (multiple times):**
890
+ ```python
891
+ # Run evaluation 1
892
+ results1 = await evaluate(
893
+     test_cases=test_cases1,
894
+     metrics=metrics,
895
+     show_dashboard=True,
896
+     session_name="Evaluation 1"
897
+ )
898
+
899
+ # Run evaluation 2
900
+ results2 = await evaluate(
901
+     test_cases=test_cases2,
902
+     metrics=metrics,
903
+     show_dashboard=True,
904
+     session_name="Evaluation 2"
905
+ )
906
+
907
+ # All results are cached and viewable in the dashboard
908
+ ```
909
+
910
+ **Browser:**
911
+ - Open `http://localhost:14500`
912
+ - Refresh page (F5) to see new evaluation results
913
+ - Switch between different evaluation sessions using the dropdown
914
+
915
+ ### Dashboard Views
916
+
917
+ **Summary Cards:**
918
+ - Total test cases evaluated
919
+ - Total cost across all evaluations
920
+ - Number of metrics used
921
+
922
+ **Metrics Overview:**
923
+ - Average scores per metric
924
+ - Pass/fail counts
925
+ - Success rates
926
+ - Model used for evaluation
927
+ - Total cost per metric
928
+
929
+ **Detailed Results Table:**
930
+ - Test case inputs and outputs
931
+ - Individual metric scores
932
+ - Pass/fail status
933
+ - Click "View Details" for full information including:
934
+ - Complete input/output/expected output
935
+ - Full retrieval context
936
+ - Detailed evaluation reasoning
937
+ - Complete evaluation logs
938
+
939
+ **Charts:**
940
+ - Bar chart: Average scores by metric
941
+ - Doughnut chart: Success rate distribution
942
+
943
+ ### Cache Management
944
+
945
+ Results are stored in `.eval_cache/results.json` in your project directory:
946
+ ```bash
947
+ # View cache contents
948
+ cat .eval_cache/results.json
949
+
950
+ # Clear cache via dashboard
951
+ # Click "Clear Cache" button in dashboard UI
952
+
953
+ # Or manually delete cache
954
+ rm -rf .eval_cache/
955
+ ```
956
+
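+ You can also inspect the cache programmatically through the `DashboardCache` class (defined in `eval_lib/dashboard_server.py`, shown later in this diff); a minimal sketch:
+ ```python
+ from eval_lib.dashboard_server import DashboardCache
+ 
+ # Loads .eval_cache/results.json from the current directory by default
+ cache = DashboardCache()
+ 
+ for session in cache.get_all():
+     data = session["data"]
+     print(session["session_id"], session["timestamp"], data["total_tests"])
+ 
+ # cache.clear()  # removes all sessions, like the dashboard's "Clear Cache" button
+ ```
+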
957
+ ### CLI Commands
958
+ ```bash
959
+ # Start dashboard with defaults
960
+ eval-lib dashboard
961
+
962
+ # Custom port
963
+ eval-lib dashboard --port 8080
964
+
965
+ # Custom cache directory
966
+ eval-lib dashboard --cache-dir /path/to/project/.eval_cache
967
+
968
+ # Check library version
969
+ eval-lib version
970
+
971
+ # Help
972
+ eval-lib help
973
+ ```
974
+
810
975
  ## Custom LLM Providers
811
976
 
812
977
  The library supports custom LLM providers through the `CustomLLMClient` abstract base class. This allows you to integrate any LLM provider, including internal corporate models, locally-hosted models, or custom endpoints.
@@ -1,7 +1,10 @@
1
- eval_ai_library-0.3.3.dist-info/licenses/LICENSE,sha256=rK9uLDgWNrCHNdp-Zma_XghDE7Fs0u0kDi3WMcmYx6w,1074
2
- eval_lib/__init__.py,sha256=ySdAQb2DQma2y-ERuFv3VQEAq3S8d8G4vORfo__aqfk,3087
3
- eval_lib/evaluate.py,sha256=GjlXZb5dnl44LCaJwdkyGCYcC50zoNZn3NrofzNAVJ0,11490
1
+ eval_ai_library-0.3.10.dist-info/licenses/LICENSE,sha256=rK9uLDgWNrCHNdp-Zma_XghDE7Fs0u0kDi3WMcmYx6w,1074
2
+ eval_lib/__init__.py,sha256=OMrncAoUbbrJXfaYf8k2wJEGw1e2r9k-s1uXkerZ9mE,3204
3
+ eval_lib/cli.py,sha256=Fvnj6HgCQ3lhx28skweALgHSm3FMEpavQCB3o_sQhtE,4731
4
+ eval_lib/dashboard_server.py,sha256=6ND7ujtzN0PdMyVmJFnKDWrIf4kaodnetLZRPUhYHas,6751
5
+ eval_lib/evaluate.py,sha256=LEjwPsuuPGpdwes-xXesCKtKlBFFMF5X1CpIGJIrZ20,12630
4
6
  eval_lib/evaluation_schema.py,sha256=7IDd_uozqewhh7k0p1hKut_20udvRxxkV6thclxKUg0,1904
7
+ eval_lib/html.py,sha256=_tBTtwxZpjIwc3TVOyLGDw2VFD77aAeA47JdovoZ0CI,24094
5
8
  eval_lib/llm_client.py,sha256=eeTVhCLR1uYbhqOEOSBt3wWPKuzgzA9v8m0F9f-4Gqg,14910
6
9
  eval_lib/metric_pattern.py,sha256=wULgMNDeAqJC_Qjglo7bYzY2eGhA_PmY_hA_qGfg0sI,11730
7
10
  eval_lib/price.py,sha256=jbmkkUTxPuXrkSHuaJYPl7jSzfDIzQ9p_swWWs26UJ0,1986
@@ -28,7 +31,8 @@ eval_lib/metrics/faithfulness_metric/faithfulness.py,sha256=OqamlhTOps7d-NOStSIK
28
31
  eval_lib/metrics/geval/geval.py,sha256=mNciHXnqU2drOJsWlYmbwftGiKM89-Ykw2f6XneIGBM,10629
29
32
  eval_lib/metrics/restricted_refusal_metric/restricted_refusal.py,sha256=4QqYgGMcp6W9Lw-v4s0AlUhMSOKvBOEgnLvhqVXaT9I,4286
30
33
  eval_lib/metrics/toxicity_metric/toxicity.py,sha256=rBE1_fvpbCRdBpBep1y1LTIhofKR8GD4Eh76EOYzxL0,4076
31
- eval_ai_library-0.3.3.dist-info/METADATA,sha256=S6nodzMnFB5T1Gvtsg19qi1TEwxGtwc9CqLaBWxgPnM,43879
32
- eval_ai_library-0.3.3.dist-info/WHEEL,sha256=_zCd3N1l69ArxyTb8rzEoP9TpbYXkqRFSNOD5OuxnTs,91
33
- eval_ai_library-0.3.3.dist-info/top_level.txt,sha256=uQHpEd2XI0oZgq1eCww9zMvVgDJgwXMWkCD45fYUzEg,9
34
- eval_ai_library-0.3.3.dist-info/RECORD,,
34
+ eval_ai_library-0.3.10.dist-info/METADATA,sha256=pevxrimXqbreKbRwHZ0GBu_VXsfGhles6OMN2SBOJHo,47969
35
+ eval_ai_library-0.3.10.dist-info/WHEEL,sha256=_zCd3N1l69ArxyTb8rzEoP9TpbYXkqRFSNOD5OuxnTs,91
36
+ eval_ai_library-0.3.10.dist-info/entry_points.txt,sha256=VTDuJiTezDkBLQw1NWcRoOOuZPHqYgOCcVIoYno-L00,47
37
+ eval_ai_library-0.3.10.dist-info/top_level.txt,sha256=uQHpEd2XI0oZgq1eCww9zMvVgDJgwXMWkCD45fYUzEg,9
38
+ eval_ai_library-0.3.10.dist-info/RECORD,,
@@ -0,0 +1,2 @@
1
+ [console_scripts]
2
+ eval-lib = eval_lib.cli:main
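
This new `entry_points.txt` is what installs the `eval-lib` command: on install, pip generates a console script that imports `eval_lib.cli` and calls `main()`. As a sketch, the same entry point can be resolved programmatically (assumes the wheel is installed and Python 3.10+ for the `group` keyword):

```python
from importlib.metadata import entry_points

# Find the console script this wheel registers and load its callable
matches = [ep for ep in entry_points(group="console_scripts") if ep.name == "eval-lib"]
if matches:
    main = matches[0].load()  # -> eval_lib.cli:main
```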
eval_lib/__init__.py CHANGED
@@ -7,7 +7,7 @@ A powerful library for evaluating AI models with support for multiple LLM provid
7
7
  and a wide range of evaluation metrics for RAG systems and AI agents.
8
8
  """
9
9
 
10
- __version__ = "0.3.3"
10
+ __version__ = "0.3.10"
11
11
  __author__ = "Aleksandr Meshkov"
12
12
 
13
13
  # Core evaluation functions
@@ -66,6 +66,10 @@ from eval_lib.agent_metrics import (
66
66
  KnowledgeRetentionMetric
67
67
  )
68
68
 
69
+ from .dashboard_server import (
70
+ DashboardCache
71
+ )
72
+
69
73
 
70
74
  def __getattr__(name):
71
75
  """
@@ -136,4 +140,7 @@ __all__ = [
136
140
  # Utils
137
141
  "score_agg",
138
142
  "extract_json_block",
143
+
144
+ # Dashboard
145
+ 'start_dashboard',
146
+ 'DashboardCache',
139
147
  ]
eval_lib/cli.py ADDED
@@ -0,0 +1,166 @@
1
+ # eval_lib/cli.py
2
+ """
3
+ Command-line interface for Eval AI Library
4
+ """
5
+
6
+ import argparse
7
+ import sys
8
+ from pathlib import Path
9
+
10
+
11
+ def run_dashboard():
12
+ """Run dashboard server from CLI"""
13
+ parser = argparse.ArgumentParser(
14
+ description='Eval AI Library Dashboard Server',
15
+ prog='eval-lib dashboard'
16
+ )
17
+ parser.add_argument(
18
+ '--port',
19
+ type=int,
20
+ default=14500,
21
+ help='Port to run dashboard on (default: 14500)'
22
+ )
23
+ parser.add_argument(
24
+ '--host',
25
+ type=str,
26
+ default='0.0.0.0',
27
+ help='Host to bind to (default: 0.0.0.0)'
28
+ )
29
+ parser.add_argument(
30
+ '--cache-dir',
31
+ type=str,
32
+ default='.eval_cache',
33
+ help='Path to cache directory (default: .eval_cache)'
34
+ )
35
+
36
+ args = parser.parse_args(sys.argv[2:]) # Skip 'eval-lib' and 'dashboard'
37
+
38
+ # Import here to avoid loading everything for --help
39
+ from eval_lib.dashboard_server import DashboardCache
40
+ from eval_lib.html import HTML_TEMPLATE
41
+ from flask import Flask, render_template_string, jsonify
42
+
43
+ # Create cache with custom directory
44
+ def get_fresh_cache():
45
+ """Reload cache from disk"""
46
+ return DashboardCache(cache_dir=args.cache_dir)
47
+
48
+ cache = get_fresh_cache()
49
+
50
+ print("="*70)
51
+ print("📊 Eval AI Library - Dashboard Server")
52
+ print("="*70)
53
+
54
+ # Check cache
55
+ latest = cache.get_latest()
56
+ if latest:
57
+ print(f"\n✅ Found cached results:")
58
+ print(f" Latest session: {latest['session_id']}")
59
+ print(f" Timestamp: {latest['timestamp']}")
60
+ print(f" Total sessions: {len(cache.get_all())}")
61
+ else:
62
+ print("\n⚠️ No cached results found")
63
+ print(" Run an evaluation with show_dashboard=True to populate cache")
64
+
65
+ print(f"\n🚀 Starting server...")
66
+ print(f" URL: http://localhost:{args.port}")
67
+ print(f" Host: {args.host}")
68
+ print(f" Cache: {Path(args.cache_dir).absolute()}")
69
+ print(f"\n💡 Keep this terminal open to keep the server running")
70
+ print(f" Press Ctrl+C to stop\n")
71
+ print("="*70 + "\n")
72
+
73
+ app = Flask(__name__)
74
+ app.config['WTF_CSRF_ENABLED'] = False
75
+
76
+ @app.route('/')
77
+ def index():
78
+ return render_template_string(HTML_TEMPLATE)
79
+
80
+ @app.route('/favicon.ico')
81
+ def favicon():
82
+ return '', 204
83
+
84
+ @app.after_request
85
+ def after_request(response):
86
+ response.headers['Access-Control-Allow-Origin'] = '*'
87
+ response.headers['Access-Control-Allow-Methods'] = 'GET, POST, OPTIONS'
88
+ response.headers['Access-Control-Allow-Headers'] = 'Content-Type'
89
+ return response
90
+
91
+ @app.route('/api/latest')
92
+ def api_latest():
93
+ cache = get_fresh_cache()
94
+ latest = cache.get_latest()
95
+ if latest:
96
+ return jsonify(latest)
97
+ return jsonify({'error': 'No results available'}), 404
98
+
99
+ @app.route('/api/sessions')
100
+ def api_sessions():
101
+ cache = get_fresh_cache()
102
+ sessions = [
103
+ {
104
+ 'session_id': s['session_id'],
105
+ 'timestamp': s['timestamp'],
106
+ 'total_tests': s['data']['total_tests']
107
+ }
108
+ for s in cache.get_all()
109
+ ]
110
+ return jsonify(sessions)
111
+
112
+ @app.route('/api/session/<session_id>')
113
+ def api_session(session_id):
114
+ cache = get_fresh_cache()
115
+ session = cache.get_by_session(session_id)
116
+ if session:
117
+ return jsonify(session)
118
+ return jsonify({'error': 'Session not found'}), 404
119
+
120
+ @app.route('/api/clear')
121
+ def api_clear():
122
+ cache = get_fresh_cache()
123
+ cache.clear()
124
+ return jsonify({'message': 'Cache cleared'})
125
+
126
+ try:
127
+ app.run(
128
+ host=args.host,
129
+ port=args.port,
130
+ debug=False,
131
+ use_reloader=False,
132
+ threaded=True
133
+ )
134
+ except KeyboardInterrupt:
135
+ print("\n\n🛑 Dashboard server stopped")
136
+
137
+
138
+ def main():
139
+ """Main CLI entry point"""
140
+ parser = argparse.ArgumentParser(
141
+ description='Eval AI Library CLI',
142
+ usage='eval-lib <command> [options]'
143
+ )
144
+ parser.add_argument(
145
+ 'command',
146
+ help='Command to run (dashboard, version, help)'
147
+ )
148
+
149
+ # Parse only the command
150
+ args = parser.parse_args(sys.argv[1:2])
151
+
152
+ if args.command == 'dashboard':
153
+ run_dashboard()
154
+ elif args.command == 'version':
155
+ from eval_lib import __version__
156
+ print(f"Eval AI Library v{__version__}")
157
+ elif args.command == 'help':
158
+ parser.print_help()
159
+ else:
160
+ print(f"Unknown command: {args.command}")
161
+ print("Available commands: dashboard, version, help")
162
+ sys.exit(1)
163
+
164
+
165
+ if __name__ == '__main__':
166
+ main()
@@ -0,0 +1,172 @@
1
+ # eval_lib/dashboard_server.py
2
+
3
+ import json
4
+ from pathlib import Path
5
+ from typing import List, Dict, Any, Optional
6
+ from datetime import datetime
7
+
8
+
9
+ class DashboardCache:
10
+ """Cache to store evaluation results for the dashboard"""
11
+
12
+ def __init__(self, cache_dir: str = ".eval_cache"):
13
+ self.cache_dir = Path(cache_dir)
14
+ self.cache_dir.mkdir(exist_ok=True)
15
+ self.cache_file = self.cache_dir / "results.json"
16
+ self.results_history = []
17
+ self._load_cache()
18
+
19
+ def _load_cache(self):
20
+ """Load cache from file"""
21
+ if self.cache_file.exists():
22
+ try:
23
+ with open(self.cache_file, 'r', encoding='utf-8') as f:
24
+ self.results_history = json.load(f)
25
+ except Exception as e:
26
+ print(f"Warning: Could not load cache: {e}")
27
+ self.results_history = []
28
+
29
+ def _save_cache(self):
30
+ """Save cache to file"""
31
+ try:
32
+ with open(self.cache_file, 'w', encoding='utf-8') as f:
33
+ json.dump(self.results_history, f,
34
+ indent=2, ensure_ascii=False)
35
+ except Exception as e:
36
+ print(f"Warning: Could not save cache: {e}")
37
+
38
+ def add_results(self, results: List[tuple], session_name: Optional[str] = None) -> str:
39
+ """Add new results to the cache"""
40
+ import time
41
+ session_id = session_name or f"session_{int(time.time())}"
42
+ parsed_data = self._parse_results(results)
43
+
44
+ session_data = {
45
+ 'session_id': session_id,
46
+ 'timestamp': datetime.now().strftime('%Y-%m-%d %H:%M:%S'),
47
+ 'data': parsed_data
48
+ }
49
+
50
+ self.results_history.append(session_data)
51
+ self._save_cache()
52
+
53
+ return session_id
54
+
55
+ def get_latest(self) -> Optional[Dict[str, Any]]:
56
+ """Get latest results"""
57
+ if self.results_history:
58
+ return self.results_history[-1]
59
+ return None
60
+
61
+ def get_all(self) -> List[Dict[str, Any]]:
62
+ """Get all results"""
63
+ return self.results_history
64
+
65
+ def get_by_session(self, session_id: str) -> Optional[Dict[str, Any]]:
66
+ """Get results by session_id"""
67
+ for session in self.results_history:
68
+ if session['session_id'] == session_id:
69
+ return session
70
+ return None
71
+
72
+ def clear(self):
73
+ """Clear the cache"""
74
+ self.results_history = []
75
+ self._save_cache()
76
+
77
+ def _parse_results(self, results: List[tuple]) -> Dict[str, Any]:
78
+ """Parse raw results into structured format for dashboard"""
79
+
80
+ test_cases = []
81
+ metrics_summary = {}
82
+ total_cost = 0.0
83
+
84
+ for test_idx, test_results in results:
85
+ for result in test_results:
86
+ test_case_data = {
87
+ 'test_index': test_idx,
88
+ 'input': result.input[:100] + '...' if len(result.input) > 100 else result.input,
89
+ 'input_full': result.input,
90
+ 'actual_output': result.actual_output[:200] if result.actual_output else '',
91
+ 'actual_output_full': result.actual_output,
92
+ 'expected_output': result.expected_output[:200] if result.expected_output else '',
93
+ 'expected_output_full': result.expected_output,
94
+ 'retrieval_context': result.retrieval_context if result.retrieval_context else [],
95
+ 'metrics': []
96
+ }
97
+
98
+ for metric_data in result.metrics_data:
99
+ # Determine model name
100
+ if isinstance(metric_data.evaluation_model, str):
101
+ model_name = metric_data.evaluation_model
102
+ else:
103
+ # For CustomLLMClient
104
+ try:
105
+ model_name = metric_data.evaluation_model.get_model_name()
106
+ except Exception:
107
+ model_name = str(
108
+ type(metric_data.evaluation_model).__name__)
109
+
110
+ test_case_data['metrics'].append({
111
+ 'name': metric_data.name,
112
+ 'score': round(metric_data.score, 3),
113
+ 'success': metric_data.success,
114
+ 'threshold': metric_data.threshold,
115
+ 'reason': metric_data.reason[:300] if metric_data.reason else '',
116
+ 'reason_full': metric_data.reason,
117
+ 'evaluation_model': model_name,
118
+ 'evaluation_cost': metric_data.evaluation_cost,
119
+ 'evaluation_log': metric_data.evaluation_log
120
+ })
121
+
122
+ if metric_data.name not in metrics_summary:
123
+ metrics_summary[metric_data.name] = {
124
+ 'scores': [],
125
+ 'passed': 0,
126
+ 'failed': 0,
127
+ 'threshold': metric_data.threshold,
128
+ 'total_cost': 0.0,
129
+ 'model': model_name
130
+ }
131
+
132
+ metrics_summary[metric_data.name]['scores'].append(
133
+ metric_data.score)
134
+ if metric_data.success:
135
+ metrics_summary[metric_data.name]['passed'] += 1
136
+ else:
137
+ metrics_summary[metric_data.name]['failed'] += 1
138
+
139
+ if metric_data.evaluation_cost:
140
+ total_cost += metric_data.evaluation_cost
141
+ metrics_summary[metric_data.name]['total_cost'] += metric_data.evaluation_cost
142
+
143
+ test_cases.append(test_case_data)
144
+
145
+ for metric_name, data in metrics_summary.items():
146
+ data['avg_score'] = sum(data['scores']) / \
147
+ len(data['scores']) if data['scores'] else 0
148
+ data['success_rate'] = (data['passed'] / (data['passed'] + data['failed'])
149
+ * 100) if (data['passed'] + data['failed']) > 0 else 0
150
+
151
+ return {
152
+ 'test_cases': test_cases,
153
+ 'metrics_summary': metrics_summary,
154
+ 'total_cost': total_cost,
155
+ 'total_tests': len(test_cases)
156
+ }
157
+
158
+
159
+ def save_results_to_cache(results: List[tuple], session_name: Optional[str] = None) -> str:
160
+ """
161
+ Save evaluation results to cache for dashboard viewing.
162
+ Cache is always saved to .eval_cache/ in the current directory.
163
+
164
+ Args:
165
+ results: Evaluation results from evaluate()
166
+ session_name: Optional name for the session
167
+
168
+ Returns:
169
+ Session ID
170
+ """
171
+ cache = DashboardCache()
172
+ return cache.add_results(results, session_name)
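
`_parse_results` expects the tuples produced by `evaluate()`: each item is `(test_index, [results...])`, where every result carries `metrics_data` entries with `name`, `score`, `success`, `threshold`, `reason`, `evaluation_model`, `evaluation_cost`, and `evaluation_log`. A smoke-test sketch using hypothetical stand-in objects (not the library's real classes) that mirror those attributes:

```python
from types import SimpleNamespace

from eval_lib.dashboard_server import DashboardCache

# Hypothetical stand-ins; the real objects come from evaluate()
metric = SimpleNamespace(
    name="AnswerRelevancy", score=0.91, success=True, threshold=0.7,
    reason="Answer addresses the question.", evaluation_model="gpt-4o-mini",
    evaluation_cost=0.0004, evaluation_log={},
)
result = SimpleNamespace(
    input="What is the capital of France?",
    actual_output="Paris is the capital.",
    expected_output="Paris",
    retrieval_context=["Paris is the capital of France."],
    metrics_data=[metric],
)

cache = DashboardCache()
print(cache.add_results([(0, [result])], session_name="smoke-test"))
```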
eval_lib/evaluate.py CHANGED
@@ -68,7 +68,9 @@ def _print_summary(results: List, total_cost: float, total_time: float, passed:
68
68
  async def evaluate(
69
69
  test_cases: List[EvalTestCase],
70
70
  metrics: List[MetricPattern],
71
- verbose: bool = True
71
+ verbose: bool = True,
72
+ show_dashboard: bool = False,
73
+ session_name: str = None,
72
74
  ) -> List[Tuple[None, List[TestCaseResult]]]:
73
75
  """
74
76
  Evaluate test cases with multiple metrics.
@@ -77,6 +79,8 @@ async def evaluate(
77
79
  test_cases: List of test cases to evaluate
78
80
  metrics: List of metrics to apply
79
81
  verbose: Enable detailed logging (default: True)
82
+ show_dashboard: Launch interactive web dashboard (default: False)
83
+ dashboard_port: Port for dashboard server (default: 14500)
84
+ session_name: Name for this evaluation session
85
+ cache_dir: Directory to store cache (default: .eval_cache)
80
86
 
81
87
  Returns:
82
88
  List of evaluation results
@@ -183,6 +189,23 @@ async def evaluate(
183
189
  _print_summary(results, total_cost, total_time,
184
190
  total_passed, total_tests)
185
191
 
192
+ if show_dashboard:
193
+ from eval_lib.dashboard_server import save_results_to_cache
194
+
195
+ session_id = save_results_to_cache(results, session_name)
196
+
197
+ if verbose:
198
+ print(f"\n{Colors.BOLD}{Colors.GREEN}{'='*70}{Colors.ENDC}")
199
+ print(f"{Colors.BOLD}{Colors.GREEN}📊 DASHBOARD{Colors.ENDC}")
200
+ print(f"{Colors.BOLD}{Colors.GREEN}{'='*70}{Colors.ENDC}")
201
+ print(
202
+ f"\n✅ Results saved to cache: {Colors.CYAN}{session_id}{Colors.ENDC}")
203
+ print(f"\n💡 To view results, run:")
204
+ print(f" {Colors.YELLOW}eval-lib dashboard{Colors.ENDC}")
205
+ print(
206
+ f"\n Then open: {Colors.CYAN}http://localhost:14500{Colors.ENDC}")
207
+ print(f"\n{Colors.BOLD}{Colors.GREEN}{'='*70}{Colors.ENDC}\n")
208
+
186
209
  return results
187
210
 
188
211
 
eval_lib/html.py ADDED
@@ -0,0 +1,736 @@
1
+ HTML_TEMPLATE = """
2
+ <!DOCTYPE html>
3
+ <html lang="en">
4
+ <head>
5
+ <meta charset="UTF-8">
6
+ <meta name="viewport" content="width=device-width, initial-scale=1.0">
7
+ <title>Eval AI Library - Interactive Dashboard</title>
8
+ <script src="https://cdn.jsdelivr.net/npm/chart.js"></script>
9
+ <style>
10
+ * {
11
+ margin: 0;
12
+ padding: 0;
13
+ box-sizing: border-box;
14
+ }
15
+
16
+ body {
17
+ font-family: 'Segoe UI', Tahoma, Geneva, Verdana, sans-serif;
18
+ background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
19
+ padding: 20px;
20
+ min-height: 100vh;
21
+ }
22
+
23
+ .container {
24
+ max-width: 1400px;
25
+ margin: 0 auto;
26
+ background: white;
27
+ border-radius: 20px;
28
+ padding: 30px;
29
+ box-shadow: 0 20px 60px rgba(0,0,0,0.3);
30
+ }
31
+
32
+ header {
33
+ display: flex;
34
+ justify-content: space-between;
35
+ align-items: center;
36
+ margin-bottom: 40px;
37
+ padding-bottom: 20px;
38
+ border-bottom: 3px solid #667eea;
39
+ }
40
+
41
+ h1 {
42
+ color: #667eea;
43
+ font-size: 2.5em;
44
+ }
45
+
46
+ .controls {
47
+ display: flex;
48
+ gap: 10px;
49
+ align-items: center;
50
+ }
51
+
52
+ select, button {
53
+ padding: 10px 20px;
54
+ border-radius: 8px;
55
+ border: 2px solid #667eea;
56
+ background: white;
57
+ color: #667eea;
58
+ font-weight: 600;
59
+ cursor: pointer;
60
+ transition: all 0.3s;
61
+ }
62
+
63
+ button:hover {
64
+ background: #667eea;
65
+ color: white;
66
+ }
67
+
68
+ .timestamp {
69
+ color: #666;
70
+ font-size: 0.9em;
71
+ margin-left: 20px;
72
+ }
73
+
74
+ .summary {
75
+ display: grid;
76
+ grid-template-columns: repeat(auto-fit, minmax(200px, 1fr));
77
+ gap: 20px;
78
+ margin-bottom: 40px;
79
+ }
80
+
81
+ .summary-card {
82
+ background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
83
+ color: white;
84
+ padding: 25px;
85
+ border-radius: 15px;
86
+ text-align: center;
87
+ box-shadow: 0 5px 15px rgba(0,0,0,0.2);
88
+ transition: transform 0.3s;
89
+ }
90
+
91
+ .summary-card:hover {
92
+ transform: translateY(-5px);
93
+ }
94
+
95
+ .summary-card h3 {
96
+ font-size: 0.9em;
97
+ margin-bottom: 10px;
98
+ opacity: 0.9;
99
+ }
100
+
101
+ .summary-card .value {
102
+ font-size: 2em;
103
+ font-weight: bold;
104
+ }
105
+
106
+ .metrics-grid {
107
+ display: grid;
108
+ grid-template-columns: repeat(auto-fit, minmax(280px, 1fr));
109
+ gap: 20px;
110
+ margin-bottom: 40px;
111
+ }
112
+
113
+ .metric-card {
114
+ background: #f8f9fa;
115
+ border-radius: 15px;
116
+ padding: 20px;
117
+ box-shadow: 0 3px 10px rgba(0,0,0,0.1);
118
+ transition: transform 0.3s;
119
+ }
120
+
121
+ .metric-card:hover {
122
+ transform: translateY(-5px);
123
+ }
124
+
125
+ .metric-card h3 {
126
+ color: #667eea;
127
+ margin-bottom: 15px;
128
+ font-size: 1.1em;
129
+ }
130
+
131
+ .metric-score {
132
+ font-size: 2.5em;
133
+ font-weight: bold;
134
+ color: #764ba2;
135
+ margin-bottom: 15px;
136
+ }
137
+
138
+ .metric-details p {
139
+ margin: 8px 0;
140
+ color: #555;
141
+ font-size: 0.9em;
142
+ }
143
+
144
+ .charts {
145
+ display: grid;
146
+ grid-template-columns: repeat(auto-fit, minmax(400px, 1fr));
147
+ gap: 30px;
148
+ margin-bottom: 40px;
149
+ }
150
+
151
+ .chart-container {
152
+ background: #f8f9fa;
153
+ border-radius: 15px;
154
+ padding: 20px;
155
+ box-shadow: 0 3px 10px rgba(0,0,0,0.1);
156
+ }
157
+
158
+ .chart-container h2 {
159
+ color: #667eea;
160
+ margin-bottom: 20px;
161
+ font-size: 1.3em;
162
+ }
163
+
164
+ table {
165
+ width: 100%;
166
+ border-collapse: collapse;
167
+ background: white;
168
+ border-radius: 10px;
169
+ overflow: hidden;
170
+ box-shadow: 0 3px 10px rgba(0,0,0,0.1);
171
+ }
172
+
173
+ th {
174
+ background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
175
+ color: white;
176
+ padding: 15px;
177
+ text-align: left;
178
+ font-weight: 600;
179
+ cursor: pointer;
180
+ user-select: none;
181
+ }
182
+
183
+ th:hover {
184
+ background: linear-gradient(135deg, #5568d3 0%, #653a8b 100%);
185
+ }
186
+
187
+ td {
188
+ padding: 12px 15px;
189
+ border-bottom: 1px solid #eee;
190
+ font-size: 0.9em;
191
+ }
192
+
193
+ tr.success {
194
+ background: #f0fdf4;
195
+ }
196
+
197
+ tr.failed {
198
+ background: #fef2f2;
199
+ }
200
+
201
+ tr:hover {
202
+ background: #f8f9fa !important;
203
+ }
204
+
205
+ .reason {
206
+ max-width: 300px;
207
+ color: #666;
208
+ }
209
+
210
+ .view-details-btn {
211
+ background: #667eea;
212
+ color: white;
213
+ border: none;
214
+ padding: 5px 12px;
215
+ border-radius: 5px;
216
+ cursor: pointer;
217
+ font-size: 0.85em;
218
+ transition: all 0.3s;
219
+ }
220
+
221
+ .view-details-btn:hover {
222
+ background: #5568d3;
223
+ transform: scale(1.05);
224
+ }
225
+
226
+ /* Modal styles */
227
+ .modal {
228
+ display: none;
229
+ position: fixed;
230
+ z-index: 1000;
231
+ left: 0;
232
+ top: 0;
233
+ width: 100%;
234
+ height: 100%;
235
+ overflow: auto;
236
+ background-color: rgba(0,0,0,0.7);
237
+ animation: fadeIn 0.3s;
238
+ }
239
+
240
+ @keyframes fadeIn {
241
+ from { opacity: 0; }
242
+ to { opacity: 1; }
243
+ }
244
+
245
+ .modal-content {
246
+ background-color: #fefefe;
247
+ margin: 2% auto;
248
+ padding: 30px;
249
+ border-radius: 15px;
250
+ width: 90%;
251
+ max-width: 900px;
252
+ max-height: 90vh;
253
+ overflow-y: auto;
254
+ box-shadow: 0 20px 60px rgba(0,0,0,0.3);
255
+ animation: slideIn 0.3s;
256
+ }
257
+
258
+ @keyframes slideIn {
259
+ from {
260
+ transform: translateY(-50px);
261
+ opacity: 0;
262
+ }
263
+ to {
264
+ transform: translateY(0);
265
+ opacity: 1;
266
+ }
267
+ }
268
+
269
+ .modal-header {
270
+ display: flex;
271
+ justify-content: space-between;
272
+ align-items: center;
273
+ margin-bottom: 20px;
274
+ padding-bottom: 15px;
275
+ border-bottom: 2px solid #667eea;
276
+ }
277
+
278
+ .modal-header h2 {
279
+ color: #667eea;
280
+ margin: 0;
281
+ }
282
+
283
+ .close {
284
+ color: #aaa;
285
+ font-size: 35px;
286
+ font-weight: bold;
287
+ cursor: pointer;
288
+ transition: color 0.3s;
289
+ }
290
+
291
+ .close:hover {
292
+ color: #667eea;
293
+ }
294
+
295
+ .detail-section {
296
+ margin: 20px 0;
297
+ padding: 15px;
298
+ background: #f8f9fa;
299
+ border-radius: 10px;
300
+ border-left: 4px solid #667eea;
301
+ }
302
+
303
+ .detail-section h3 {
304
+ color: #667eea;
305
+ margin-bottom: 10px;
306
+ font-size: 1.1em;
307
+ }
308
+
309
+ .detail-section pre {
310
+ background: white;
311
+ padding: 15px;
312
+ border-radius: 8px;
313
+ overflow-x: auto;
314
+ font-size: 0.85em;
315
+ line-height: 1.5;
316
+ }
317
+
318
+ .detail-section p {
319
+ margin: 8px 0;
320
+ color: #555;
321
+ line-height: 1.6;
322
+ }
323
+
324
+ .badge {
325
+ display: inline-block;
326
+ padding: 4px 10px;
327
+ border-radius: 12px;
328
+ font-size: 0.8em;
329
+ font-weight: 600;
330
+ margin-right: 8px;
331
+ }
332
+
333
+ .badge-success {
334
+ background: #d1fae5;
335
+ color: #065f46;
336
+ }
337
+
338
+ .badge-failed {
339
+ background: #fee2e2;
340
+ color: #991b1b;
341
+ }
342
+
343
+ .loading {
344
+ text-align: center;
345
+ padding: 40px;
346
+ color: #667eea;
347
+ font-size: 1.2em;
348
+ }
349
+
350
+ .no-data {
351
+ text-align: center;
352
+ padding: 60px;
353
+ color: #999;
354
+ }
355
+
356
+ .no-data h2 {
357
+ color: #667eea;
358
+ margin-bottom: 20px;
359
+ }
360
+
361
+ @media (max-width: 768px) {
362
+ .charts {
363
+ grid-template-columns: 1fr;
364
+ }
365
+
366
+ .metrics-grid {
367
+ grid-template-columns: 1fr;
368
+ }
369
+
370
+ header {
371
+ flex-direction: column;
372
+ gap: 15px;
373
+ }
374
+
375
+ .modal-content {
376
+ width: 95%;
377
+ margin: 5% auto;
378
+ padding: 20px;
379
+ }
380
+ }
381
+ </style>
382
+ </head>
383
+ <body>
384
+ <div class="container">
385
+ <header>
386
+ <div>
387
+ <h1>📊 Eval AI Library Dashboard</h1>
388
+ <span class="timestamp" id="timestamp">Loading...</span>
389
+ </div>
390
+ <div class="controls">
391
+ <select id="sessionSelect" onchange="loadSession()">
392
+ <option value="">Loading sessions...</option>
393
+ </select>
394
+ <button onclick="refreshData()">🔄 Refresh</button>
395
+ <button onclick="clearCache()">🗑️ Clear Cache</button>
396
+ </div>
397
+ </header>
398
+
399
+ <div id="content" class="loading">
400
+ Loading data...
401
+ </div>
402
+ </div>
403
+
404
+ <!-- Modal for detailed information -->
405
+ <div id="detailsModal" class="modal">
406
+ <div class="modal-content">
407
+ <div class="modal-header">
408
+ <h2>📋 Evaluation Details</h2>
409
+ <span class="close" onclick="closeModal()">&times;</span>
410
+ </div>
411
+ <div id="modalBody"></div>
412
+ </div>
413
+ </div>
414
+
415
+ <script>
416
+ let currentData = null;
417
+ let scoresChart = null;
418
+ let successChart = null;
419
+
420
+ // Load the list of sessions
421
+ async function loadSessions() {
422
+ try {
423
+ const response = await fetch('/api/sessions');
424
+ const sessions = await response.json();
425
+
426
+ const select = document.getElementById('sessionSelect');
427
+ select.innerHTML = '<option value="latest">Latest Results</option>';
428
+
429
+ sessions.reverse().forEach(session => {
430
+ const option = document.createElement('option');
431
+ option.value = session.session_id;
432
+ option.textContent = `${session.session_id} (${session.timestamp}) - ${session.total_tests} tests`;
433
+ select.appendChild(option);
434
+ });
435
+ } catch (error) {
436
+ console.error('Error loading sessions:', error);
437
+ }
438
+ }
439
+
440
+ // Load session data
441
+ async function loadSession() {
442
+ const select = document.getElementById('sessionSelect');
443
+ const sessionId = select.value;
444
+
445
+ try {
446
+ let url = '/api/latest';
447
+ if (sessionId && sessionId !== 'latest') {
448
+ url = `/api/session/${sessionId}`;
449
+ }
450
+
451
+ const response = await fetch(url);
452
+ if (!response.ok) {
453
+ showNoData();
454
+ return;
455
+ }
456
+
457
+ const session = await response.json();
458
+ currentData = session.data;
459
+ renderDashboard(session);
460
+ } catch (error) {
461
+ console.error('Error loading session:', error);
462
+ showNoData();
463
+ }
464
+ }
465
+
466
+ // Show the "no data" message
467
+ function showNoData() {
468
+ document.getElementById('content').innerHTML = `
469
+ <div class="no-data">
470
+ <h2>No evaluation results available</h2>
471
+ <p>Run an evaluation with <code>show_dashboard=True</code> to see results here.</p>
472
+ </div>
473
+ `;
474
+ }
475
+
476
+ // Render the dashboard
477
+ function renderDashboard(session) {
478
+ const data = session.data;
479
+ document.getElementById('timestamp').textContent = `Generated: ${session.timestamp}`;
480
+
481
+ const metricsLabels = Object.keys(data.metrics_summary);
482
+ const metricsScores = metricsLabels.map(m => data.metrics_summary[m].avg_score);
483
+ const metricsSuccessRates = metricsLabels.map(m => data.metrics_summary[m].success_rate);
484
+
485
+ let metricCards = '';
486
+ for (const [metricName, metricData] of Object.entries(data.metrics_summary)) {
487
+ metricCards += `
488
+ <div class="metric-card">
489
+ <h3>${metricName}</h3>
490
+ <div class="metric-score">${metricData.avg_score.toFixed(3)}</div>
491
+ <div class="metric-details">
492
+ <p>✅ Passed: ${metricData.passed}</p>
493
+ <p>❌ Failed: ${metricData.failed}</p>
494
+ <p>📊 Success Rate: ${metricData.success_rate.toFixed(1)}%</p>
495
+ <p>🎯 Threshold: ${metricData.threshold}</p>
496
+ <p>🤖 Model: ${metricData.model}</p>
497
+ <p>💰 Total Cost: $${metricData.total_cost.toFixed(6)}</p>
498
+ </div>
499
+ </div>
500
+ `;
501
+ }
502
+
503
+ let tableRows = '';
504
+ data.test_cases.forEach((testCase, tcIdx) => {
505
+ testCase.metrics.forEach((metric, mIdx) => {
506
+ const statusEmoji = metric.success ? '✅' : '❌';
507
+ const statusClass = metric.success ? 'success' : 'failed';
508
+
509
+ tableRows += `
510
+ <tr class="${statusClass}">
511
+ <td>${testCase.test_index}</td>
512
+ <td>${testCase.input}</td>
513
+ <td>${metric.name}</td>
514
+ <td>${metric.score.toFixed(3)}</td>
515
+ <td>${metric.threshold}</td>
516
+ <td>${statusEmoji}</td>
517
+ <td>${metric.evaluation_model}</td>
518
+ <td>$${(metric.evaluation_cost || 0).toFixed(6)}</td>
519
+ <td>
520
+ <button class="view-details-btn" onclick="showDetails(${tcIdx}, ${mIdx})">
521
+ View Details
522
+ </button>
523
+ </td>
524
+ </tr>
525
+ `;
526
+ });
527
+ });
528
+
529
+ document.getElementById('content').innerHTML = `
530
+ <div class="summary">
531
+ <div class="summary-card">
532
+ <h3>Total Tests</h3>
533
+ <div class="value">${data.total_tests}</div>
534
+ </div>
535
+ <div class="summary-card">
536
+ <h3>Total Cost</h3>
537
+ <div class="value">$${data.total_cost.toFixed(6)}</div>
538
+ </div>
539
+ <div class="summary-card">
540
+ <h3>Metrics</h3>
541
+ <div class="value">${metricsLabels.length}</div>
542
+ </div>
543
+ </div>
544
+
545
+ <h2 style="color: #667eea; margin-bottom: 20px;">📈 Metrics Summary</h2>
546
+ <div class="metrics-grid">
547
+ ${metricCards}
548
+ </div>
549
+
550
+ <h2 style="color: #667eea; margin-bottom: 20px;">📊 Charts</h2>
551
+ <div class="charts">
552
+ <div class="chart-container">
553
+ <h2>Average Scores by Metric</h2>
554
+ <canvas id="scoresChart"></canvas>
555
+ </div>
556
+ <div class="chart-container">
557
+ <h2>Success Rate by Metric</h2>
558
+ <canvas id="successChart"></canvas>
559
+ </div>
560
+ </div>
561
+
562
+ <h2 style="color: #667eea; margin: 40px 0 20px 0;">📋 Detailed Results</h2>
563
+ <table>
564
+ <thead>
565
+ <tr>
566
+ <th>Test #</th>
567
+ <th>Input</th>
568
+ <th>Metric</th>
569
+ <th>Score</th>
570
+ <th>Threshold</th>
571
+ <th>Status</th>
572
+ <th>Model</th>
573
+ <th>Cost</th>
574
+ <th>Actions</th>
575
+ </tr>
576
+ </thead>
577
+ <tbody>
578
+ ${tableRows}
579
+ </tbody>
580
+ </table>
581
+ `;
582
+
583
+ renderCharts(metricsLabels, metricsScores, metricsSuccessRates);
584
+ }
585
+
586
+ // Show details in the modal window
587
+ function showDetails(testCaseIdx, metricIdx) {
588
+ const testCase = currentData.test_cases[testCaseIdx];
589
+ const metric = testCase.metrics[metricIdx];
590
+
591
+ const statusBadge = metric.success
592
+ ? '<span class="badge badge-success">✅ PASSED</span>'
593
+ : '<span class="badge badge-failed">❌ FAILED</span>';
594
+
595
+ let modalContent = `
596
+ <div class="detail-section">
597
+ <h3>Test Case #${testCase.test_index}</h3>
598
+ <p><strong>Input:</strong> ${testCase.input_full}</p>
599
+ <p><strong>Actual Output:</strong> ${testCase.actual_output_full || 'N/A'}</p>
600
+ <p><strong>Expected Output:</strong> ${testCase.expected_output_full || 'N/A'}</p>
601
+ </div>
602
+
603
+ <div class="detail-section">
604
+ <h3>Metric: ${metric.name}</h3>
605
+ ${statusBadge}
606
+ <p><strong>Score:</strong> ${metric.score.toFixed(3)} / ${metric.threshold}</p>
607
+ <p><strong>Model:</strong> ${metric.evaluation_model}</p>
608
+ <p><strong>Cost:</strong> $${(metric.evaluation_cost || 0).toFixed(6)}</p>
609
+ </div>
610
+
611
+ <div class="detail-section">
612
+ <h3>Reason</h3>
613
+ <p>${metric.reason_full || metric.reason}</p>
614
+ </div>
615
+ `;
616
+
617
+ // Add the retrieval context if present
618
+ if (testCase.retrieval_context && testCase.retrieval_context.length > 0) {
619
+ modalContent += `
620
+ <div class="detail-section">
621
+ <h3>Retrieval Context (${testCase.retrieval_context.length} chunks)</h3>
622
+ ${testCase.retrieval_context.map((ctx, idx) => `
623
+ <p><strong>Chunk ${idx + 1}:</strong></p>
624
+ <p style="margin-left: 20px; color: #666;">${ctx.substring(0, 300)}${ctx.length > 300 ? '...' : ''}</p>
625
+ `).join('')}
626
+ </div>
627
+ `;
628
+ }
629
+
630
+ // Add the evaluation log if present
631
+ if (metric.evaluation_log) {
632
+ modalContent += `
633
+ <div class="detail-section">
634
+ <h3>Evaluation Log</h3>
635
+ <pre>${JSON.stringify(metric.evaluation_log, null, 2)}</pre>
636
+ </div>
637
+ `;
638
+ }
639
+
640
+ document.getElementById('modalBody').innerHTML = modalContent;
641
+ document.getElementById('detailsModal').style.display = 'block';
642
+ }
643
+
644
+ // Close the modal window
645
+ function closeModal() {
646
+ document.getElementById('detailsModal').style.display = 'none';
647
+ }
648
+
649
+ // Close when clicking outside the modal
650
+ window.onclick = function(event) {
651
+ const modal = document.getElementById('detailsModal');
652
+ if (event.target == modal) {
653
+ closeModal();
654
+ }
655
+ }
656
+
657
+ // Render the charts
658
+ function renderCharts(labels, scores, successRates) {
659
+ if (scoresChart) scoresChart.destroy();
660
+ if (successChart) successChart.destroy();
661
+
662
+ const scoresCtx = document.getElementById('scoresChart').getContext('2d');
663
+ scoresChart = new Chart(scoresCtx, {
664
+ type: 'bar',
665
+ data: {
666
+ labels: labels,
667
+ datasets: [{
668
+ label: 'Average Score',
669
+ data: scores,
670
+ backgroundColor: 'rgba(102, 126, 234, 0.8)',
671
+ borderColor: 'rgba(102, 126, 234, 1)',
672
+ borderWidth: 2
673
+ }]
674
+ },
675
+ options: {
676
+ responsive: true,
677
+ scales: {
678
+ y: {
679
+ beginAtZero: true,
680
+ max: 1.0
681
+ }
682
+ }
683
+ }
684
+ });
685
+
686
+ const successCtx = document.getElementById('successChart').getContext('2d');
687
+ successChart = new Chart(successCtx, {
688
+ type: 'doughnut',
689
+ data: {
690
+ labels: labels,
691
+ datasets: [{
692
+ label: 'Success Rate (%)',
693
+ data: successRates,
694
+ backgroundColor: [
695
+ 'rgba(102, 126, 234, 0.8)',
696
+ 'rgba(118, 75, 162, 0.8)',
697
+ 'rgba(237, 100, 166, 0.8)',
698
+ 'rgba(255, 154, 158, 0.8)',
699
+ 'rgba(250, 208, 196, 0.8)'
700
+ ],
701
+ borderWidth: 2
702
+ }]
703
+ },
704
+ options: {
705
+ responsive: true
706
+ }
707
+ });
708
+ }
709
+
710
+ // Refresh the data
711
+ function refreshData() {
712
+ loadSessions();
713
+ loadSession();
714
+ }
715
+
716
+ // Clear the cache
717
+ async function clearCache() {
718
+ if (confirm('Are you sure you want to clear all cached results?')) {
719
+ try {
720
+ await fetch('/api/clear');
721
+ alert('Cache cleared!');
722
+ refreshData();
723
+ } catch (error) {
724
+ console.error('Error clearing cache:', error);
725
+ alert('Error clearing cache');
726
+ }
727
+ }
728
+ }
729
+
730
+ // Initialization
731
+ loadSessions();
732
+ loadSession();
733
+ </script>
734
+ </body>
735
+ </html>
736
+ """