user-simulator 0.1.0__py3-none-any.whl

Files changed (37)
  1. user_sim/__init__.py +0 -0
  2. user_sim/cli/__init__.py +0 -0
  3. user_sim/cli/gen_user_profile.py +34 -0
  4. user_sim/cli/init_project.py +65 -0
  5. user_sim/cli/sensei_chat.py +481 -0
  6. user_sim/cli/sensei_check.py +103 -0
  7. user_sim/cli/validation_check.py +143 -0
  8. user_sim/core/__init__.py +0 -0
  9. user_sim/core/ask_about.py +665 -0
  10. user_sim/core/data_extraction.py +260 -0
  11. user_sim/core/data_gathering.py +134 -0
  12. user_sim/core/interaction_styles.py +147 -0
  13. user_sim/core/role_structure.py +608 -0
  14. user_sim/core/user_simulator.py +302 -0
  15. user_sim/handlers/__init__.py +0 -0
  16. user_sim/handlers/asr_module.py +128 -0
  17. user_sim/handlers/html_parser_module.py +202 -0
  18. user_sim/handlers/image_recognition_module.py +139 -0
  19. user_sim/handlers/pdf_parser_module.py +123 -0
  20. user_sim/utils/__init__.py +0 -0
  21. user_sim/utils/config.py +47 -0
  22. user_sim/utils/cost_tracker.py +153 -0
  23. user_sim/utils/cost_tracker_v2.py +193 -0
  24. user_sim/utils/errors.py +15 -0
  25. user_sim/utils/exceptions.py +47 -0
  26. user_sim/utils/languages.py +78 -0
  27. user_sim/utils/register_management.py +62 -0
  28. user_sim/utils/show_logs.py +63 -0
  29. user_sim/utils/token_cost_calculator.py +338 -0
  30. user_sim/utils/url_management.py +60 -0
  31. user_sim/utils/utilities.py +568 -0
  32. user_simulator-0.1.0.dist-info/METADATA +733 -0
  33. user_simulator-0.1.0.dist-info/RECORD +37 -0
  34. user_simulator-0.1.0.dist-info/WHEEL +5 -0
  35. user_simulator-0.1.0.dist-info/entry_points.txt +6 -0
  36. user_simulator-0.1.0.dist-info/licenses/LICENSE.txt +21 -0
  37. user_simulator-0.1.0.dist-info/top_level.txt +1 -0
@@ -0,0 +1,733 @@
1
+ Metadata-Version: 2.4
2
+ Name: user-simulator
3
+ Version: 0.1.0
4
+ Summary: LLM-based user simulator for chatbot testing.
5
+ Author: Alejandro Del Pozzo Escalera, Juan de Lara Jaramillo, Esther Guerra Sánchez
6
+ License: MIT License
7
+
8
+ Copyright (c) [year] [fullname]
9
+
10
+ Permission is hereby granted, free of charge, to any person obtaining a copy
11
+ of this software and associated documentation files (the "Software"), to deal
12
+ in the Software without restriction, including without limitation the rights
13
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
14
+ copies of the Software, and to permit persons to whom the Software is
15
+ furnished to do so, subject to the following conditions:
16
+
17
+ The above copyright notice and this permission notice shall be included in all
18
+ copies or substantial portions of the Software.
19
+
20
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
21
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
22
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
23
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
24
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
25
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
26
+ SOFTWARE.
27
+ Project-URL: Homepage, https://github.com/satori-chatbots/user-simulator
28
+ Requires-Python: >=3.12
29
+ Description-Content-Type: text/markdown
30
+ License-File: LICENSE.txt
31
+ Requires-Dist: allpairspy>=2.5.1
32
+ Requires-Dist: beautifulsoup4>=4.13.4
33
+ Requires-Dist: colorama>=0.4.6
34
+ Requires-Dist: httpx>=0.28.1
35
+ Requires-Dist: langchain>=0.3.25
36
+ Requires-Dist: langchain-openai>=0.3.23
37
+ Requires-Dist: pandas>=2.3.0
38
+ Requires-Dist: pillow>=11.2.1
39
+ Requires-Dist: pymupdf>=1.26.1
40
+ Requires-Dist: pyyaml>=6.0.2
41
+ Requires-Dist: requests>=2.32.4
42
+ Requires-Dist: scikit-learn>=1.7.0
43
+ Requires-Dist: selenium>=4.33.0
44
+ Requires-Dist: webdriver-manager>=4.0.2
45
+ Dynamic: license-file
46
+
47
+ # User simulator for chatbot testing
48
+
49
+ ## Description
50
+ The evolution of technology has increased the complexity of chatbots, and also of their testing methods. With the introduction of LLMs, chatbots are capable of humanizing
51
+ conversations and imitating the pragmatics of natural language. Several approaches have been created in order to evaluate the
52
+ performance of chatbots.
53
+
54
+ The code in this project allows creating test cases based on conversations that a user simulator will have
55
+ with the chatbot to test.
56
+
57
+ ## Usage
58
+
59
+ In order to run the simulator, a chatbot must already be deployed (e.g. Taskyto, Rasa...).
60
+
61
+ The script `sensei_chat.py` contains the functions to load the user simulator profile, start a conversation with the chatbot
62
+ and save this conversation and its configuration parameters. The user simulator profile is stored in yaml files,
63
+ which should be located in the project folder created.
64
+
65
+
66
+ ## Environment Configuration
67
+
68
+ In order to install all the necessary packages, install the requirements.txt file in a virtual environment as:
69
+ `pip install -r requirements.txt`.
70
+
71
+ Recommended python version: v3.12
72
+
73
+ Since Sensei is based on LangChain, different LLM providers can be used to run the tests. An API key of the provider
74
+ selected must be set as an environment variable, with its corresponding variable name (Ex: OPENAI_API_KEY, GOOGLE_API_KEY...).
75
+ For more information about model providers and LangChain, visit the following link: https://python.langchain.com/docs/integrations/chat/
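The API key requirement above can be checked before starting a run. The following is a minimal sketch, not part of Sensei: the helper name `missing_api_keys` is hypothetical, and the variable names are the examples given above.

```python
import os


def missing_api_keys(required, environ=None):
    """Return the names in `required` that are unset or empty.

    `environ` defaults to the real process environment; passing a plain
    dict makes the check easy to try out without touching real keys.
    """
    env = os.environ if environ is None else environ
    return [name for name in required if not env.get(name)]


# Hypothetical usage: list the provider keys that are still missing.
print(missing_api_keys(["OPENAI_API_KEY", "GOOGLE_API_KEY"]))
```

Only the key for the provider actually selected in the profile needs to be present.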
76
+
77
+
78
+
79
+ In some cases, an exception could happen while installing the packages. Some troubleshooting steps to try are:
80
+
81
+ - Upgrade pip: `pip install --upgrade pip`
82
+ - Upgrade wheel and setuptools: `pip install --upgrade wheel setuptools`
83
+
84
+ ## Initialization
85
+
86
+ An initialization process is required in order to create a project folder which will contain all information regarding
87
+ the execution of the tests.
88
+
89
+ To create this project folder, the script `init_project.py` must be run before anything else along with the arguments
90
+ `--path "project_path" --name "project_name"`.
91
+
92
+ Example: `python init_project.py --path C:\your\project\path --name pizza-shop-test`
93
+
94
+ The project folder will be created following the structure below:
95
+ ```
96
+ project_folder/
97
+ |
98
+ |_ personalities/
99
+ |_ profiles/
100
+ | |
101
+ | |_ user profiles (.yml) / user profile folders
102
+ |
103
+ |_ rules/
104
+ |_ types/
105
+ |_ run.yml
106
+ ```
107
+
108
+ A project folder contains 5 elements:
109
+
110
+ - personalities: This folder is used to store the custom personalities created by the user
111
+ - profiles: In this folder, all user profiles will be stored as single YAML files or as execution
112
+ folders of YAML files.
113
+ - rules: Here, rules for metamorphic testing are stored.
114
+ - types: This folder contains all custom data types created by the user for data input and data extraction.
115
+ - run.yml: This file allows the user to create a run configuration instead of creating a whole command line with
116
+ execution parameters. This file is structured as follows:
117
+ ```
118
+ project_folder: # name of the project folder
119
+
120
+ user_profile: # name of the user profile YAML to use or name of the folder containing the user profiles.
121
+ technology: # chatbot technology to test (Taskyto, Ada, Kuki, etc.).
122
+ connector: # path to the chatbot connector to use.
123
+ connector_parameters: # (Optional) If the chatbot connector contains editable parameters, they can be defined here as a dictionary
124
+ param_1:
125
+ param_2:
126
+ extract: # path where conversation outputs and reports will be stored.
127
+ execution_parameters: # additional execution parameters.
128
+ # - verbose
129
+ # - clean_cache
130
+ # - update_cache
131
+ # - ignore_cache
132
+ ```
133
+
134
+
135
+ ## User Profile YAML Configuration
136
+
137
+ This file contains all the properties the user will follow in order to carry out the conversation. Since the user simulator is
138
+ based on OpenAI GPT-4o LLM technology, some of the fields should be written as prompts in natural language. For these fields, a
139
+ prompt engineering task should be carried out by the tester to narrow down the role of the user simulator and guide its
140
+ behaviour. A description of the fields and an example of the YAML structure are given below.
141
+
142
+ ```
143
+ test_name: "pizza_order_test_custom"
144
+
145
+ llm:
146
+ temperature: 0.8
147
+ model: gpt-4o
148
+ model_prov: openai
149
+ format:
150
+ type: text
151
+
152
+ user:
153
+ language: English
154
+ role: you have to act as a user ordering a pizza to a pizza shop.
155
+ context:
156
+ - personality: personalities/formal-user.yml
157
+ - your name is Jon Doe
158
+ goals:
159
+ - "a {{size}} custom pizza with {{toppings}}"
160
+ - "{{cans}} cans of {{drink}}"
161
+ - how long is going to take the pizza to arrive
162
+ - how much will it cost
163
+
164
+ - size:
165
+ function: another()
166
+ type: string
167
+ data:
168
+ - small
169
+ - medium
170
+ - big
171
+
172
+ - toppings:
173
+ function: random(rand)
174
+ type: string
175
+ data:
176
+ - cheese
177
+ - mushrooms
178
+ - pepperoni
179
+
180
+ - cans:
181
+ function: forward(drink)
182
+ type: int
183
+ data:
184
+ min: 1
185
+ max: 3
186
+ step: 1
187
+
188
+ - drink:
189
+ function: forward()
190
+ type: string
191
+ data:
192
+ - sprite
193
+ - coke
194
+ - Orange Fanta
195
+
196
+ chatbot:
197
+ is_starter: True
198
+ fallback: I'm sorry it's a little loud in my pizza shop, can you say that again?
199
+ output:
200
+ - price:
201
+ type: money
202
+ description: The final price of the pizza order
203
+ - time:
204
+ type: time
205
+ description: how long is going to take the pizza to be ready
206
+ - order_id:
207
+ type: str
208
+ description: my order ID
209
+
210
+ conversation:
211
+ number: sample(0.2)
212
+ goal_style:
213
+ steps: 5
214
+ interaction_style:
215
+ - random:
216
+ - make spelling mistakes
217
+ - all questions
218
+ - long phrases
219
+ - change language:
220
+ - italian
221
+ - portuguese
222
+ - chinese
223
+
224
+ ```
225
+
226
+ ## test_name
227
+
228
+ This field defines the name of the test suite. This name will be assigned to the exported test file and the folder containing the tests.
229
+
230
+ ## llm
231
+ This parameter establishes the characteristics of the LLM. It consists of a dictionary with four fields: "model", "model_prov", "temperature" and "format".
232
+ - model: This parameter indicates the llm model that will carry out the conversation as the user simulator. Models to use should be available in
233
+ LangChain's OpenAI module.
234
+ - model_prov: This optional parameter specifies the model's provider. Since there are different available providers in LangChain that may
235
+ contain the same model, in some cases it is necessary to specify the provider in order to avoid confusion, for example:
236
+ - Gemini models are available in "google-genai" or "google-vertexai" providers.
237
+ - Llama models are available in different providers such as Groq or Fireworks AI.
238
+ - temperature: This parameter controls the randomness and diversity of the responses generated by the LLM. The supported value is a float between 0.0 and 1.0.
239
+ - format: This parameter allows the tester to enable the speech recognition module in order to test ASR based chatbots, or
240
+ enable the text module to test text chatbots. This parameter contains two sub parameters: "type" and "config". "type" indicates if
241
+ the conversation will use the text module or the speech module, and "config" allows the tester to provide the path to a YAML
242
+ file with the personalized configuration of the speech module. "config" is only available when "type" is set to "speech".
243
+
244
+
245
+ The whole llm parameter is optional, thus if it is not instantiated in the yaml file, model, temperature and
246
+ format will be set to default values, which are "gpt-4o", "0.8", and "type: text" respectively.
247
+
248
+
249
+ ## user
250
+
251
+ This field defines the properties of the user simulator in 4 parameters: language, role, context and goals.
252
+
253
+ ### language
254
+
255
+ This parameter defines the main language that will be used in the conversations. If no language is provided, it is set to English by default.
256
+
257
+ ### role
258
+
259
+ In this field, the tester should define the role the user will play during the conversation as a prompt, according to the chatbot to test.
260
+
261
+ ### context
262
+
263
+ This field consists of a list of prompts that will define some characteristics of the user simulator.
264
+ This can be used to define the name of the user, the availability for an appointment, allergies or intolerances, etc.
265
+ An option for loading predefined "personalities" can be enabled by typing inside of this field "personality:" and the
266
+ path to the YAML file containing the desired personality. These personalities can go along with characteristics added
267
+ by the programmer.
268
+
269
+ ### goals
270
+
271
+ This field, named "ask_about" in previous versions, is used to narrow down the conversation topics the user simulator will carry out with the chatbot.
272
+ It consists of a list of strings and dictionaries.
273
+
274
+ The tester defines a list of prompts with indications for the user simulator to check on the chatbot.
275
+ These prompts can contain variables that should be called inside the text between double curly braces, as in {{var}}.
276
+ Variables are useful to provide variability in the testing process and should be instantiated in the list as
277
+ shown in the example above with the exact same name as written between brackets (case-sensitive).
278
+
279
+ Variables follow a specific structure defined by 3 fields as shown below: data, type and function.
280
+ ```
281
+ goals:
282
+ - "cost estimation for photos of {{number_photo}} artworks"
283
+ - number_photo:
284
+ function: forward()
285
+ type: int
286
+ data:
287
+ step: 2
288
+ min: 1
289
+ max: 6
290
+
291
+ # data: (only with float)
292
+ # steps: 0.2 // linspace: 5
293
+ # min: 1
294
+ # max: 6
295
+ ```
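To illustrate how a `{{var}}` placeholder in a goal could be filled with a value, here is a minimal sketch. It is not Sensei's implementation: the function name `fill_goal` and the error handling are assumptions made for illustration only.

```python
import re

# Matches a variable written between double curly braces, e.g. {{size}}.
VAR_PATTERN = re.compile(r"\{\{(\w+)\}\}")


def fill_goal(goal, values):
    """Replace every {{name}} in `goal` with values[name] (case-sensitive)."""

    def replace(match):
        name = match.group(1)
        if name not in values:
            raise KeyError(f"variable '{name}' is not defined in goals")
        value = values[name]
        # Several picked samples are joined into one readable string.
        if isinstance(value, (list, tuple)):
            return ", ".join(str(item) for item in value)
        return str(value)

    return VAR_PATTERN.sub(replace, goal)


print(fill_goal("cost estimation for photos of {{number_photo}} artworks",
                {"number_photo": 3}))
# prints: cost estimation for photos of 3 artworks
```

Because the lookup is an exact dictionary match, `{{Size}}` and `{{size}}` are treated as different variables, which mirrors the case-sensitivity noted above.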
296
+ ### type
297
+ This field indicates the type of data that will be substituted in the variable placement.
298
+
299
+ Types can be default or custom. Default types are included in Sensei's source code and consist of "string", "int" and
300
+ "float". Custom types are defined by the user and must be included in the "types" folder inside the project folder.
301
+
302
+ Custom types follow the structure in the example below:
303
+
304
+ ```
305
+ # Structure for phone_number.yml
306
+
307
+ name: phone_number
308
+ type_description: A phone number
309
+ format: r"^\d{3}-\d{7}$"
310
+ extraction: str
311
+ ```
312
+
313
+ ```
314
+ # Structure for currency.yml
315
+
316
+ name: currency
317
+ type_description: a float number with a currency
318
+ format: r'\d+(?:[.,]\d+)?\s*(?:[\$\€\£]|USD|EUR)'
319
+ extraction:
320
+ value:
321
+ type: float
322
+ description: a float value
323
+ currency:
324
+ type: string
325
+ description: a currency unit
326
+ ```
327
+ - name: indicates the type's name. It must be identical to the name of the yaml file containing the custom type information.
328
+ - type_description: this is a prompt to describe the type created.
329
+ - format: this field defines the format that data will follow in python regular expressions.
330
+ - extraction: The extraction field defines how to extract the relevant data from a matched value based on the format (regex). Its structure depends on the complexity of the extracted information:
331
+
332
+ - If the extracted value corresponds directly to a single, basic Python type (e.g. str, int, float),
333
+ you can simply specify the type name. This means the entire match is returned as a single value of that type,
334
+ with no further breakdown required.
335
+ - If the extracted value contains multiple meaningful components (e.g. a number and a currency, a date and time, etc.), then the extraction field must define a structured object.
336
+ Each key represents a component to extract, with its own type and description.
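The two extraction styles can be pictured with plain Python regular expressions. This is a sketch under assumptions, not Sensei's extraction code (which may rely on an LLM): the named groups in `CURRENCY` and both helper functions are hypothetical, while the phone pattern is the one from `phone_number.yml`.

```python
import re

# Simple case: the whole match is returned as a single str value.
PHONE = re.compile(r"^\d{3}-\d{7}$")

# Structured case: hypothetical named groups split the match into components.
CURRENCY = re.compile(r"(?P<value>\d+(?:[.,]\d+)?)\s*(?P<currency>[$€£]|USD|EUR)")


def extract_phone(text):
    """Return the text itself if it is a valid phone number, else None."""
    return text if PHONE.match(text) else None


def extract_currency(text):
    """Return {'value': float, 'currency': str} for the first match, else None."""
    match = CURRENCY.search(text)
    if match is None:
        return None
    # Accept both "12.50" and "12,50" as decimal notation.
    value = float(match.group("value").replace(",", "."))
    return {"value": value, "currency": match.group("currency")}


print(extract_phone("555-1234567"))
print(extract_currency("Your total is 12,50 EUR"))
```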
337
+
338
+ ### data
339
+ Here, the data list to use will be defined. In general, data lists must be defined manually by the user, but there
340
+ are some cases where it can be created automatically.
341
+
342
+ As shown in the example above, instead of defining a list of the amount of artworks,
343
+ it is possible to automatically create an integer or float list based on range instructions using a 'min, max, step' structure,
344
+ where min refers to the minimum value of the list, max refers to the maximum value of the list,
345
+ and step refers to the separation steps between samples. When working with float data, the "linspace"
346
+ parameter can also be used instead of step, where samples will be listed with a linear separation step between them.
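The range instructions can be sketched as a small list builder. This is an illustration only: the function `build_range` is hypothetical, and two behaviours are assumptions rather than documented facts, namely that `max` is included when the step lands on it and that `linspace` gives the number of evenly spaced samples.

```python
def build_range(minimum, maximum, step=None, linspace=None):
    """Build a data list from min/max and either a step or a sample count."""
    if (step is None) == (linspace is None):
        raise ValueError("provide exactly one of 'step' or 'linspace'")
    if linspace is not None:
        if linspace < 2:
            return [float(minimum)]
        # Evenly spaced samples, both ends included.
        gap = (maximum - minimum) / (linspace - 1)
        return [round(minimum + i * gap, 10) for i in range(linspace)]
    if step <= 0:
        raise ValueError("'step' must be positive")
    # The small epsilon guards against float division landing just below an integer.
    count = int((maximum - minimum) / step + 1e-9) + 1
    return [round(minimum + i * step, 10) for i in range(count)]


print(build_range(1, 6, step=2))      # [1, 3, 5]
print(build_range(1, 6, linspace=5))  # [1.0, 2.25, 3.5, 4.75, 6.0]
```

With the `number_photo` example above (min 1, max 6, step 2), this sketch yields three samples.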
347
+
348
+ This field also allows the user to create data lists based on prompts by using the function "any()".
349
+ ```
350
+ - drink:
351
+ function: another()
352
+ type: string
353
+ data:
354
+ - Sprite
355
+ - Coca-Cola
356
+ - Pepsi
357
+ - any(3 soda drinks)
358
+ - any(alcoholic drinks)
359
+ ```
360
+ By using this function, an LLM creates a list following the instructions provided by the user inside the parenthesis.
361
+ This function can be used alone in the list or accompanied by other items added by the user. When used with other items,
362
+ the "any()" function will exclude these items from the list generation process in case they're related to the instruction. Multiple
363
+ "any()" functions can be used inside the list.
364
+ Note that if no amount is specified in the prompt, the "any()" function will create a list with an unpredictable amount of items.
365
+
366
+
367
+ The possibility to add personalized list functions to create data lists is another option available in this field,
368
+ as shown in the example below.
369
+
370
+ ```
371
+ - number:
372
+ function: forward()
373
+ type: int
374
+ data:
375
+ file: list_functions/number_list.py
376
+ function_name:
377
+ args:
378
+ - 1
379
+ - 6
380
+ - 2
381
+
382
+ - pizza_type:
383
+ function: forward()
384
+ type: string
385
+ data:
386
+ file: list_functions/number_list.py
387
+ function_name: shuffle_list
388
+ args: list_functions/list_of_things.yml
389
+ ```
390
+ In these two examples, a personalized list function is implemented in "data". The structure consists of three parameters:
391
+ - file: The path to the .py file where the function is created
392
+ - function_name: the name of the function to run inside the .py file
393
+ - args: the required input args for the function.
394
+
395
+ List functions are fully personalized by the user.
396
+
397
+ ### function
398
+ Functions are useful to determine how data will be added to the prompt.
399
+
400
+ Since the data is listed, functions are used to iterate through these lists in order to change the information
401
+ inside the variable in each conversation. The functions available in this update are the following:
402
+
403
+ - default(): the default() function assigns all data in the list to the variable in the prompt.
404
+ - random(): this function picks only one random sample inside the list assigned to the variable.
405
+ - random(5): this function picks a certain amount of random samples inside the list. In this example, 5 random
406
+ samples will be picked from the list. This number can't exceed the list length.
407
+ - random(rand): this function picks a random amount of random samples inside the list.
408
+ This amount will not exceed the list length.
409
+ - another(): the another() function will always randomly pick a different sample until finishing the options.
410
+ - another(5): when a certain amount is defined inside the brackets, the another() function will pick this number of
411
+ samples without repetition between conversations until finishing the options.
412
+ - forward(): this function iterates through each of the samples in the list one by one. Multiple
413
+ forward() functions can be nested in order to cover all possible combinations. To nest forward() functions, reference the variable to nest by typing
414
+ its name inside the parenthesis, as shown in the example below:
415
+ ```
416
+ goals:
417
+ - "{{cans}} cans of {{drink}}"
418
+
419
+ - cans:
420
+ function: forward(drink)
421
+ type: int
422
+ data:
423
+ min: 1
424
+ max: 3
425
+ step: 1
426
+
427
+ - drink:
428
+ function: forward()
429
+ type: string
430
+ data:
431
+ - sprite
432
+ - coke
433
+ - Orange Fanta
434
+
435
+ ```
436
+ - pairwise(): This function iterates through data by creating pairwise based combinations for pairwise testing. Pairwise
437
+ testing is a combinatorial test-design technique that focuses on covering all possible
438
+ pairs of input parameter values at least once. It’s based on the observation that most software faults are triggered
439
+ by interactions of just two parameters, so exercising every pair often finds the majority of bugs
440
+ with far fewer tests than full Cartesian‐product enumeration.
441
+
442
+ The pairwise function must be applied to more than 1 variable in order to create the combinations matrix to iterate.
443
+ Variables will change in each conversation based on the matrix construction.
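As an illustration of what nested forward() functions cover, the sketch below enumerates every combination of two variables with `itertools.product`. It is not Sensei's implementation: the helper `forward_combinations` is hypothetical, and the order in which Sensei actually visits the combinations is an assumption.

```python
from itertools import product


def forward_combinations(variables):
    """Return one dict per combination of the given variables.

    `variables` maps each variable name to its data list; the first
    variable changes slowest, like the outer loop of nested loops.
    """
    names = list(variables)
    value_lists = [variables[name] for name in names]
    return [dict(zip(names, combo)) for combo in product(*value_lists)]


combos = forward_combinations({
    "cans": [1, 2, 3],
    "drink": ["sprite", "coke", "Orange Fanta"],
})
for combo in combos:
    print(f"{combo['cans']} cans of {combo['drink']}")
```

For the cans/drink example this gives 3 x 3 = 9 conversations, whereas pairwise() becomes useful once three or more variables make the full product too large.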
444
+
445
+ ## chatbot
446
+
447
+ This field provides information about the chatbot configuration and the data to be obtained from the conversation.
448
+
449
+ ### is_starter
450
+
451
+ This parameter defines whether the chatbot will start the conversation or not. The value supported is boolean and
452
+ will be set depending on the chatbot to test.
453
+
454
+ ### fallback
455
+
456
+ Here, the tester should provide the chatbot's original fallback message in order to allow the user simulator to detect
457
+ fallbacks. This is needed to avoid fallback loops, allowing the user simulator to rephrase the query or change the topic.
458
+
459
+ ### output
460
+
461
+ This field helps the tester extract specific information from the conversation once it is finished. It is used for data validation tasks.
462
+
463
+ The tester defines the data to obtain from the conversation in order to validate the consistency and
464
+ performance of the chatbot. This output field must follow the structure below:
465
+
466
+ ```
467
+ output:
468
+ - price:
469
+ type: currency
470
+ description: The final price of the pizza order
471
+ - time:
472
+ type: time
473
+ description: how long is going to take the pizza to be ready
474
+ - order_id:
475
+ type: string
476
+ description: my order ID
477
+ ```
478
+
479
+ A name for the data to output must be defined. Each output must contain these two parameters:
480
+
481
+ - type: this defines the type of value to output. These types can be default or custom as defined in the "type"
482
+ parameter in "goals". Default types are the following:
483
+ - int: Outputs data as an integer.
484
+ - float: Outputs data as a float.
485
+ - string: Outputs data as text.
486
+ - time/time("format"): Outputs data in a time format. An output format can be specified by adding a parenthesis with the
487
+ desired format written in natural language. Ex: time(UTC), time(hh:mm:ss), time(show time in hours, minutes and seconds)
488
+ - date/date("format"): Outputs data in a date format. Following the same logic as the "time" type, a date format can be specified
489
+ in natural language. Ex: date(mm/dd/yyyy), date(day-month-year), date(show date in days, months and years)
490
+ - list[type]: Outputs a list of the specified data inside the brackets
491
+
492
+
493
+ - description: In this parameter, the tester should prompt a text defining which information has to be obtained from the conversation.
494
+
495
+
496
+ ## conversation
497
+
498
+ This field defines some parameters that will dictate how the conversations will be generated. It consists
499
+ of 4 parameters: number, max_cost, goal_style and interaction_style.
500
+
501
+ ```
502
+ conversation:
503
+ number: 3
504
+ max_cost: 1
505
+ goal_style:
506
+ steps: 5
507
+ max_cost: 0.1
508
+ interaction_style:
509
+ - random:
510
+ - make spelling mistakes
511
+ - all questions
512
+ - long phrases
513
+ - change language:
514
+ - italian
515
+ - portuguese
516
+ - chinese
517
+ ```
518
+
519
+ ### number
520
+ This parameter specifies the number of conversations to generate. You can assign a specific numeric value to this field to define an exact number of conversations.
521
+ Example: number: 2 (This will generate 2 conversations.)
522
+
523
+ Alternatively, the number of conversations can be determined by the number of combinations derived from the value matrix
524
+ generated by nested forward or pairwise functions—provided these functions are included in the "goals" field.
525
+ To use this method, set the number field to "combinations".
526
+
527
+ - _combinations_:
528
+ This option calculates the maximum number of conversations that can be generated based on the total
529
+ number of possible combinations from the value matrices produced only by the forward and pairwise functions.
530
+ The biggest number of combinations obtained by any of the available matrices will be used.
531
+ ````
532
+ conversation:
533
+ number: combinations
534
+ ````
535
+
536
+ - _combinations(float)_:
537
+ To reduce the number of conversations, you can specify a percentage by including a float value between 0 and 1 in parentheses.
538
+ This value will be used to calculate a proportion of the total number of generated conversations.
539
+ ````
540
+ conversation:
541
+ number: combinations(0.6) # this will use only 60% of the total conversations.
542
+ ````
543
+
544
+ - _combinations(float, function)_:
545
+ It is possible to reference the value matrix generated by a specific function to determine the number of conversations
546
+ by including the function's name in parentheses.
547
+ ````
548
+ conversation:
549
+ number: combinations(0.6, pairwise) # this will use only 60% of the total conversations from the pairwise matrix.
550
+ ````
551
+
552
+ ````
553
+ conversation:
554
+ number: combinations(1, forward) # this will use 100% of the total conversations from the biggest forward matrix.
555
+ ````
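The three `combinations` forms can be summarized with a small calculation. This is only a sketch: the function `conversations_from_spec` and its `matrix_sizes` input are hypothetical, and rounding the proportion up to a whole conversation is an assumption, since the rounding rule is not stated here.

```python
import math
import re

SPEC = re.compile(r"combinations(?:\(\s*([0-9.]+)\s*(?:,\s*(\w+)\s*)?\))?")


def conversations_from_spec(spec, matrix_sizes):
    """Compute the number of conversations for a `number` value.

    `matrix_sizes` maps a function name ("forward", "pairwise") to the size
    of the biggest matrix produced by that function.
    """
    match = SPEC.fullmatch(spec.strip())
    if match is None:
        raise ValueError(f"unsupported value: {spec!r}")
    fraction = float(match.group(1)) if match.group(1) else 1.0
    if not 0 < fraction <= 1:
        raise ValueError("the proportion must be in the range (0, 1]")
    function = match.group(2)
    # Without a function name, the biggest available matrix is used.
    total = matrix_sizes[function] if function else max(matrix_sizes.values())
    # Assumption: round up so that at least one conversation is generated.
    return max(1, math.ceil(total * fraction))


sizes = {"forward": 9, "pairwise": 12}  # hypothetical matrix sizes
print(conversations_from_spec("combinations", sizes))
print(conversations_from_spec("combinations(0.6, pairwise)", sizes))
```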
556
+
557
+
558
+ ### max_cost
559
+ Since there is a cost to using LLMs, the max_cost parameter has been introduced to keep the expenditure
560
+ under control by setting a limit on the cost of the execution. This parameter is optional and the value represents
561
+ price in dollars.
562
+
563
+ ### goal_style
564
+ This defines how the conversation should end. There are 3 options in this update
565
+ - steps: the tester should input the number of interactions to be done before the conversation ends.
566
+ - random steps: a random number of interactions will be done between 1 and an amount defined by the user. This amount can't exceed 20.
567
+ - all_answered: the conversation will end as soon as all the queries in "goals" have been asked by the user and answered by the chatbot.
568
+ This option creates an internal data frame that verifies if all "goals" queries are being responded or confirmed, and it is possible to export this
569
+ dataframe once the conversation ended by setting the "export" field as True, as shown in the following example. This field is not mandatory, thus if only
570
+ "all_answered" is defined, the export field is set as False by default.
571
+ When all_answered is set, conversations are regulated with a loop break based on the chatbot's fallback message in order to avoid infinite loops when the chatbot does
572
+ not know how to answer to several questions made by the user. But, in some cases, this loop break can be dodged due to hallucinations from the chatbot, leading to
573
+ irrelevant and extremely long conversations. To avoid this, a "limit" parameter is implemented in order to give the tester the possibility to stop the conversation
574
+ after a specific amount of interactions in case the loop break was not triggered before or all queries were not answered. This parameter is not mandatory either and will
575
+ be set to 30 interactions by default.
576
+ ```
577
+ goal_style:
578
+ all_answered:
579
+ export: True
580
+ limit: 20
581
+ ```
582
+ - default: the default mode enables "all_answered" mode with 'export' set as False and 'limit' set to 30, since no steps are defined.
583
+ - max_cost (individual): This parameter mimics the functionality of the max_cost parameter defined a level above. However, the
584
+ cost limit is set for each individual conversation inside the execution. Once this limit is surpassed, the conversation ends
585
+ and the next one is executed. This parameter is optional, but when used, it must be defined in conjunction with the goal
586
+ styles explained before.
587
+ ```
588
+ conversation:
589
+ number: sample(0.2)
590
+ max_cost: 1 # cost limit per execution
591
+ goal_style:
592
+ steps: 5
593
+ max_cost: 0.1 # cost limit per conversation
594
+ ```
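The interplay of the two cost limits can be sketched as two nested checks. This is a simplified model, not Sensei's cost tracker: the function `run_with_budgets` is hypothetical, and the per-turn costs are supplied as plain numbers instead of being measured from LLM calls.

```python
def run_with_budgets(conversations, execution_limit, conversation_limit):
    """Simulate cost limits over lists of per-turn costs (in dollars).

    Returns (per-conversation results, total spent), where each result is
    (turns used, amount spent in that conversation).
    """
    total = 0.0
    results = []
    for turn_costs in conversations:
        if total >= execution_limit:
            break  # the execution budget is exhausted: no more conversations
        spent = 0.0
        turns = 0
        for cost in turn_costs:
            spent += cost
            turns += 1
            # A conversation ends once either limit has been surpassed.
            if spent > conversation_limit or total + spent > execution_limit:
                break
        total += spent
        results.append((turns, round(spent, 6)))
    return results, round(total, 6)


print(run_with_budgets([[0.04, 0.04, 0.04, 0.04], [0.02, 0.02]],
                       execution_limit=1, conversation_limit=0.1))
```

In this model a limit is only detected after the turn that exceeds it, so the final spend can sit slightly above the configured value.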
595
+
596
+ ### interaction_style
597
+ This indicates how the user simulator should carry out the conversation. There are 7 options in this update
598
+ - long phrase: the user will use very long phrases to write any query.
599
+ - change your mind: the user will change its mind eventually. Useful in conversations when the user has to
600
+ provide information, such as toppings on a pizza, an appointment date...
601
+ - change language: the user will change the language in the middle of a conversation. This should be defined as a list
602
+ of languages inside the parameter, as shown in the example above.
603
+ - make spelling mistakes: the user will make typos and spelling mistakes during the conversation
604
+ - single question: the user makes only one query per interaction from "goals" field.
605
+ - all questions: the user asks everything inside the "goals" field in one interaction.
606
+ - random: this option allows creating a list inside of it with any of the interaction styles mentioned above.
607
+ Then, it selects a random amount of interaction styles to apply to the conversation. Here's an example on how to apply this interaction style:
608
+ ```
609
+ interaction_style:
610
+ - random:
611
+ - make spelling mistakes
612
+ - all questions
613
+ - long phrases
614
+ - change language:
615
+ - italian
616
+ - portuguese
617
+ - chinese
618
+ ```
619
+ - default: the user simulator will carry out the conversation in a natural way.
620
+
621
+ ## Execution
622
+
623
+ The test process can be started in two ways:
624
+
625
+ ### Command execution
626
+
627
+ The sensei_chat.py script must be executed along with some command-line arguments for a successful execution.
628
+
629
+ Example:
630
+
631
+ ```
632
+ --technology
633
+ taskyto
634
+
635
+ --connector
636
+ path/to/connector.yml
637
+
638
+ --connector_parameters
639
+ "{\"api_url\": \"http://127.0.0.1:5000\"}" #dictionary-like format is required
640
+
641
+ --project_path
642
+ C:\path\to\project\folder
643
+
644
+ --user_profile
645
+ profile_1.yaml \\\\\ folder_of_profiles
646
+
647
+ --extract
648
+ C:\path\to\extract\output\information
649
+
650
+ --verbose
651
+ ```
652
+
653
+
654
+ - --technology: Chatbot technology to test.
655
+ - --connector: path to the chatbot connector to use.
656
+ - --connector_parameters: dynamic parameters for the selected chatbot connector
657
+ - --project_path: The project path where all testing content is stored for a specific project.
658
+ - --user_profile: name of the user profile YAML or the folder containing user profiles to use in the testing process.
659
+ - --extract: path where conversation outputs and reports will be stored.
660
+ - --verbose: shows logs during the testing process.
661
+ - --clean_cache: cache is cleaned after the testing process.
662
+ - --update_cache: cache is updated with new content if previous cache was saved.
663
+ - --ignore_cache: cache is ignored during the testing process.
664
+
665
+
666
+ ### run.yml execution
667
+
668
+ The sensei_chat.py script must be executed with the argument --run_from_yaml pointing to a project folder path which contains the
669
+ run.yml configuration file explained previously in the "initialization" section.
670
+
671
+ Once the arguments are assigned inside the run.yml, the execution can be performed.
672
+
673
+ Example:
674
+
675
+ `--run_from_yaml examples/academic_helper`
676
+
677
+
678
+ ## Profile Validation
679
+
680
+ The validation_check.py script enables testers to carry out a validation process on the generated profile.
681
+ It produces two types of output files: a JSON file that reports any formatting errors detected in the
682
+ profile, and CSV files containing the matrices generated by the functions utilized within the profile.
683
+
684
+ ![img.png](data%2Freadme_data%2Fimg.png)
685
+
686
+ The script contains the following run arguments:
687
+
688
+ - --profile: Specifies the directory containing the profile to be validated.
689
+ - --export: Defines the path where the output files will be saved.
690
+ - --combined_matrix: If enabled, generates a single combined matrix of all function elements instead of separate matrices for each function.
691
+ - --verbose: Displays detailed logs of the validation process.
692
+
693
+ Example:
694
+ `--profile path\to\profile.yml --export export\path --combined_matrix --verbose`
695
+
696
+ ## Connectors
697
+
698
+ Connectors enable SENSEI to interface with various chatbot APIs.
699
+ These connectors are defined in a YAML configuration file, with their structure varying based
700
+ on the specific communication protocols and design of each chatbot API.
701
+
702
+ SENSEI provides a generic Chatbot Connector class that enables users to integrate with chatbot APIs in a standardized manner.
703
+ An example implementation is shown below:
704
+
705
+ ```
706
+ config:
707
+ api_url: "https://xxxxxx"
708
+ headers:
709
+ Content-Type: 'application/json'
710
+ timeout: 10000
711
+ request_key: "queryInput.text.text"
712
+ response_key: "queryResult.fulfillmentText"
713
+
714
+ payload:
715
+ queryInput:
716
+ text:
717
+ text:
718
+ languageCode: "en"
719
+
720
+ parameters:
721
+ - api_url
722
+
723
+ ```
724
+
725
+ A generic connector contains 3 fields:
726
+
727
+ - config: Contains API-specific configuration details such as the endpoint URL, headers, timeout, and
728
+ keys used to extract request and response data. Developers should tailor this section based on the API requirements.
729
+ - payload: Defines the structure of the request payload used to communicate with the chatbot API.
730
+ - parameters (optional): Lists dynamic parameters that can be passed to the connector at runtime, allowing for
731
+ flexible configuration without modifying the core YAML file.
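One plausible reading of `request_key` and `response_key` is that they are dotted paths into the request payload and the JSON response. The sketch below works under that assumption and is not the connector's actual code: the helpers `set_by_path`, `get_by_path` and `build_payload` are hypothetical, and the nesting of `PAYLOAD_TEMPLATE` is a guess at the example payload above.

```python
import copy


def set_by_path(data, dotted_key, value):
    """Set data[a][b][c] = value for a dotted key such as "a.b.c"."""
    keys = dotted_key.split(".")
    node = data
    for key in keys[:-1]:
        child = node.get(key)
        if not isinstance(child, dict):
            child = {}
            node[key] = child
        node = child
    node[keys[-1]] = value


def get_by_path(data, dotted_key, default=None):
    """Read data[a][b][c] for a dotted key, or `default` if the path is missing."""
    node = data
    for key in dotted_key.split("."):
        if not isinstance(node, dict) or key not in node:
            return default
        node = node[key]
    return node


# Assumed nesting of the example payload shown above.
PAYLOAD_TEMPLATE = {"queryInput": {"text": {"text": None, "languageCode": "en"}}}


def build_payload(template, request_key, message):
    """Copy the payload template and place the user message at request_key."""
    payload = copy.deepcopy(template)
    set_by_path(payload, request_key, message)
    return payload


payload = build_payload(PAYLOAD_TEMPLATE, "queryInput.text.text", "I want a pizza")
reply = get_by_path({"queryResult": {"fulfillmentText": "Sure!"}},
                    "queryResult.fulfillmentText")
print(payload, reply)
```

Copying the template before writing into it keeps the configured payload reusable across turns.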
732
+
733
+