user-simulator 0.1.0__py3-none-any.whl
- user_sim/__init__.py +0 -0
- user_sim/cli/__init__.py +0 -0
- user_sim/cli/gen_user_profile.py +34 -0
- user_sim/cli/init_project.py +65 -0
- user_sim/cli/sensei_chat.py +481 -0
- user_sim/cli/sensei_check.py +103 -0
- user_sim/cli/validation_check.py +143 -0
- user_sim/core/__init__.py +0 -0
- user_sim/core/ask_about.py +665 -0
- user_sim/core/data_extraction.py +260 -0
- user_sim/core/data_gathering.py +134 -0
- user_sim/core/interaction_styles.py +147 -0
- user_sim/core/role_structure.py +608 -0
- user_sim/core/user_simulator.py +302 -0
- user_sim/handlers/__init__.py +0 -0
- user_sim/handlers/asr_module.py +128 -0
- user_sim/handlers/html_parser_module.py +202 -0
- user_sim/handlers/image_recognition_module.py +139 -0
- user_sim/handlers/pdf_parser_module.py +123 -0
- user_sim/utils/__init__.py +0 -0
- user_sim/utils/config.py +47 -0
- user_sim/utils/cost_tracker.py +153 -0
- user_sim/utils/cost_tracker_v2.py +193 -0
- user_sim/utils/errors.py +15 -0
- user_sim/utils/exceptions.py +47 -0
- user_sim/utils/languages.py +78 -0
- user_sim/utils/register_management.py +62 -0
- user_sim/utils/show_logs.py +63 -0
- user_sim/utils/token_cost_calculator.py +338 -0
- user_sim/utils/url_management.py +60 -0
- user_sim/utils/utilities.py +568 -0
- user_simulator-0.1.0.dist-info/METADATA +733 -0
- user_simulator-0.1.0.dist-info/RECORD +37 -0
- user_simulator-0.1.0.dist-info/WHEEL +5 -0
- user_simulator-0.1.0.dist-info/entry_points.txt +6 -0
- user_simulator-0.1.0.dist-info/licenses/LICENSE.txt +21 -0
- user_simulator-0.1.0.dist-info/top_level.txt +1 -0
Metadata-Version: 2.4
Name: user-simulator
Version: 0.1.0
Summary: LLM-based user simulator for chatbot testing.
Author: Alejandro Del Pozzo Escalera, Juan de Lara Jaramillo, Esther Guerra Sánchez
License: MIT License

Copyright (c) [year] [fullname]

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
Project-URL: Homepage, https://github.com/satori-chatbots/user-simulator
Requires-Python: >=3.12
Description-Content-Type: text/markdown
License-File: LICENSE.txt
Requires-Dist: allpairspy>=2.5.1
Requires-Dist: beautifulsoup4>=4.13.4
Requires-Dist: colorama>=0.4.6
Requires-Dist: httpx>=0.28.1
Requires-Dist: langchain>=0.3.25
Requires-Dist: langchain-openai>=0.3.23
Requires-Dist: pandas>=2.3.0
Requires-Dist: pillow>=11.2.1
Requires-Dist: pymupdf>=1.26.1
Requires-Dist: pyyaml>=6.0.2
Requires-Dist: requests>=2.32.4
Requires-Dist: scikit-learn>=1.7.0
Requires-Dist: selenium>=4.33.0
Requires-Dist: webdriver-manager>=4.0.2
Dynamic: license-file

# User simulator for chatbot testing

## Description

The evolution of technology has increased the complexity of chatbots and of their testing methods. With the introduction of LLMs, chatbots are capable of humanizing
conversations and imitating the pragmatics of natural language. Several approaches have been proposed to evaluate the
performance of chatbots.

The code in this project allows creating test cases based on conversations that a user simulator will have
with the chatbot under test.

## Usage

To run the simulator, the chatbot under test must already be deployed (e.g., Taskyto or Rasa).

The script `sensei_chat.py` contains the functions to load the user simulator profile, start a conversation with the chatbot,
and save this conversation and its configuration parameters. User simulator profiles are stored in YAML files,
which should be located in the project folder created during initialization.

## Environment Configuration

To install all the necessary packages, install the dependencies listed in requirements.txt inside a virtual environment:
`pip install -r requirements.txt`.

Recommended Python version: 3.12

Since Sensei is based on LangChain, different LLM providers can be used to run the tests. An API key for the selected
provider must be set as an environment variable with the provider's corresponding variable name (e.g., OPENAI_API_KEY, GOOGLE_API_KEY).
For more information about model providers and LangChain, visit the following link: https://python.langchain.com/docs/integrations/chat/

In some cases, an exception may be raised while installing the packages. Some troubleshooting steps are:

- Upgrade pip: `pip install --upgrade pip`
- Upgrade wheel and setuptools: `pip install --upgrade wheel setuptools`

## Initialization

An initialization process is required to create a project folder, which will contain all the information regarding
the execution of the tests.

To create this project folder, run the script `init_project.py` before anything else, with the arguments
`--path "project_path" --name "project_name"`.

Example: `python init_project.py --path C:\your\project\path --name pizza-shop-test`

The project folder will be created with the following structure:

```
project_folder/
|
|_ personalities/
|_ profiles/
|   |
|   |_ user profiles (.yml) / user profile folders
|
|_ rules/
|_ types/
|_ run.yml
```

A project folder contains five elements:

- personalities: this folder stores the custom personalities created by the user.
- profiles: this folder stores all user profiles, either as single YAML files or as execution
  folders of YAML files.
- rules: this folder stores the rules for metamorphic testing.
- types: this folder contains all custom data types created by the user for data input and data extraction.
- run.yml: this file allows the user to create a run configuration instead of writing a whole command line with
  execution parameters. This file is structured as follows:
```
project_folder: # name of the project folder

user_profile: # name of the user profile YAML to use or name of the folder containing the user profiles.
technology: # chatbot technology to test (Taskyto, Ada, Kuki, etc.).
connector: # path to the chatbot connector to use.
connector_parameters: # (Optional) If the chatbot connector contains editable parameters, they can be defined here as a dictionary
  param_1:
  param_2:
extract: # path where conversation outputs and reports will be stored.
execution_parameters: # additional execution parameters.
  # - verbose
  # - clean_cache
  # - update_cache
  # - ignore_cache
```

## User Profile YAML Configuration

This file contains all the properties the user simulator will follow to carry out the conversation. Since the user simulator is
driven by an LLM (OpenAI's GPT-4o by default), some of the fields must be written as natural-language prompts. For these fields,
the tester should carry out a prompt engineering task to narrow down the role of the user simulator and guide its
behaviour. A description of the fields and an example of the YAML structure are given below.

```
test_name: "pizza_order_test_custom"

llm:
  temperature: 0.8
  model: gpt-4o
  model_prov: openai
  format:
    type: text

user:
  language: English
  role: you have to act as a user ordering a pizza to a pizza shop.
  context:
    - personality: personalities/formal-user.yml
    - your name is Jon Doe
  goals:
    - "a {{size}} custom pizza with {{toppings}}"
    - "{{cans}} cans of {{drink}}"
    - how long is going to take the pizza to arrive
    - how much will it cost

    - size:
        function: another()
        type: string
        data:
          - small
          - medium
          - big

    - toppings:
        function: random(rand)
        type: string
        data:
          - cheese
          - mushrooms
          - pepperoni

    - cans:
        function: forward(drink)
        type: int
        data:
          min: 1
          max: 3
          step: 1

    - drink:
        function: forward()
        type: string
        data:
          - sprite
          - coke
          - Orange Fanta

chatbot:
  is_starter: True
  fallback: I'm sorry it's a little loud in my pizza shop, can you say that again?
  output:
    - price:
        type: money
        description: The final price of the pizza order
    - time:
        type: time
        description: how long is going to take the pizza to be ready
    - order_id:
        type: str
        description: my order ID

conversation:
  number: sample(0.2)
  goal_style:
    steps: 5
  interaction_style:
    - random:
        - make spelling mistakes
        - all questions
        - long phrases
        - change language:
            - italian
            - portuguese
            - chinese
```

## test_name

This field defines the name of the test suite. This name will be assigned to the exported test file and to the folder containing the tests.

## llm
This parameter sets the characteristics of the LLM. It consists of a dictionary with the fields "model", "model_prov", "temperature" and "format".
- model: the LLM that will carry out the conversation as the user simulator. The model must be available through
  LangChain's OpenAI module, unless another provider is specified in "model_prov".
- model_prov: this optional parameter specifies the model's provider. Since several LangChain providers may offer the
  same model, in some cases the provider must be specified to avoid ambiguity, for example:
  - Gemini models are available through the "google-genai" and "google-vertexai" providers.
  - Llama models are available through several providers, such as Groq or Fireworks AI.
- temperature: this parameter controls the randomness and diversity of the responses generated by the LLM. The supported value is a float between 0.0 and 1.0.
- format: this parameter allows the tester to enable the speech recognition module to test ASR-based chatbots, or
  the text module to test text chatbots. It contains two sub-parameters: "type" and "config". "type" indicates whether
  the conversation will use the text module or the speech module, and "config" allows the tester to provide the path to a YAML
  file with a personalized configuration of the speech module. "config" is only available when "type" is set to "speech".

The whole llm parameter is optional: if it is not defined in the YAML file, model, temperature and
format are set to their default values, which are "gpt-4o", 0.8 and "type: text" respectively.
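The default-and-validate rules above can be summarized in a short sketch. Only the default values, the temperature range and the "config only with speech" rule come from this README. The function `resolve_llm_config` is hypothetical, and treating "text" and "speech" as the only valid format types is an assumption.

```python
import copy

# Documented defaults for the llm section.
LLM_DEFAULTS = {"model": "gpt-4o", "temperature": 0.8, "format": {"type": "text"}}


def resolve_llm_config(profile: dict) -> dict:
    """Return the profile's llm section merged over the documented defaults."""
    llm = copy.deepcopy(LLM_DEFAULTS)
    llm.update(profile.get("llm") or {})
    llm["temperature"] = float(llm["temperature"])
    if not 0.0 <= llm["temperature"] <= 1.0:
        raise ValueError(f"temperature must be between 0.0 and 1.0, got {llm['temperature']}")
    # A format section without "type" falls back to text.
    fmt = {"type": "text", **(llm.get("format") or {})}
    if fmt["type"] not in ("text", "speech"):
        raise ValueError(f"unsupported format type: {fmt['type']!r}")
    if "config" in fmt and fmt["type"] != "speech":
        raise ValueError('"config" is only available when the format type is "speech"')
    llm["format"] = fmt
    return llm
```

With an empty profile, this returns the documented defaults. A profile that only sets `model_prov` keeps that key and still inherits the default temperature and format.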

## user

This field defines the properties of the user simulator through four parameters: language, role, context and goals.

### language

This parameter defines the main language that will be used in the conversations. If no language is provided, it defaults to English.

### role

In this field, the tester defines, as a prompt, the role the user will play during the conversation, according to the chatbot under test.

### context

This field consists of a list of prompts that define some characteristics of the user simulator.
It can be used to define the user's name, their availability for an appointment, allergies or intolerances, etc.
Predefined "personalities" can be loaded by adding a "personality:" entry inside this field, followed by the
path to the YAML file containing the desired personality. These personalities can be combined with characteristics added
by the tester.

### goals

This field, named "ask_about" in previous versions, is used to narrow down the topics the user simulator will discuss with the chatbot.
It consists of a list of strings and dictionaries.

The tester defines a list of prompts with indications of what the user simulator should check with the chatbot.
These prompts can contain variables, written inside the text between double curly braces: {{var}}.
Variables are useful to provide variability in the testing process and must be defined in the same list, as
shown in the example above, with exactly the same name as written between the braces (case-sensitive).
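Because variable names must match exactly, a mismatch such as `{{drink}}` versus `Drink:` is easy to miss. The following sketch compares the placeholders used in the goal strings with the variables defined in the same list. The helper `check_goal_variables` is hypothetical, and tolerating spaces inside the braces is an assumption of this sketch.

```python
import re

# Matches {{name}} placeholders; allowing spaces inside the braces is an assumption.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def check_goal_variables(goals: list) -> tuple[set[str], set[str]]:
    """Return (undefined, unused) variable names for a parsed "goals" list."""
    prompts = [g for g in goals if isinstance(g, str)]
    # Dictionary items in "goals" are variable definitions keyed by name.
    defined = {name for g in goals if isinstance(g, dict) for name in g}
    used = {name for p in prompts for name in PLACEHOLDER.findall(p)}
    return used - defined, defined - used
```

Names are compared as plain, case-sensitive strings. A goal using `{{drink}}` next to a definition named `Drink` therefore reports `drink` as undefined and `Drink` as unused.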

Variables follow a specific structure defined by three fields, as shown below: data, type and function.
```
goals:
  - "cost estimation for photos of {{number_photo}} artworks"
  - number_photo:
      function: forward()
      type: int
      data:
        step: 2
        min: 1
        max: 6

      # data: (only with float)
      #   step: 0.2    (or linspace: 5)
      #   min: 1
      #   max: 6
```

### type
This field indicates the type of data that will be substituted into the variable placeholder.

Types can be default or custom. Default types are built into Sensei's source code: "string", "int" and
"float". Custom types are defined by the user and must be placed in the "types" folder inside the project folder.

Custom types follow the structure in the examples below:

```
# Structure for phone_number.yml

name: phone_number
type_description: A phone number
format: r"^\d{3}-\d{7}$"
extraction: str
```

```
# Structure for currency.yml

name: currency
type_description: a float number with a currency
format: r'\d+(?:[.,]\d+)?\s*(?:[\$\€\£]|USD|EUR)'
extraction:
  value:
    type: float
    description: a float value
  currency:
    type: string
    description: a currency unit
```
- name: the type's name. It must be identical to the name of the YAML file containing the custom type definition.
- type_description: a prompt describing the type.
- format: the format the data must follow, written as a Python regular expression.
- extraction: defines how to extract the relevant data from a value matched by the format (regex). Its structure depends on the complexity of the extracted information:

  - If the extracted value corresponds directly to a single, basic Python type (e.g. str, int, float),
    you can simply specify the type name. This means the entire match is returned as a single value of that type,
    with no further breakdown required.
  - If the extracted value contains multiple meaningful components (e.g. a number and a currency, or a date and a time), then the extraction field must define a structured object.
    Each key represents a component to extract, with its own type and description.

### data
This field defines the data list to use. In general, data lists must be defined manually by the user, but in
some cases they can be created automatically.

As shown in the example above, instead of listing the number of artworks explicitly,
an integer or float list can be created automatically from range instructions using a 'min, max, step' structure,
where min is the minimum value of the list, max is the maximum value of the list,
and step is the separation between consecutive samples. When working with float data, the "linspace"
parameter can also be used instead of step, in which case the samples are spaced linearly between min and max.
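The range instructions can be expanded into explicit lists as shown in the sketch below. The README does not state whether max is inclusive, or whether `linspace` gives the number of samples. This sketch assumes that max is included when a step lands on it exactly, and that `linspace` is the sample count with both endpoints included, as in `numpy.linspace`. The helper `expand_range` is hypothetical.

```python
def expand_range(data: dict) -> list:
    """Expand a {min, max, step} or {min, max, linspace} spec into a list of values.

    Assumptions: max is inclusive when reachable, and linspace is the number of
    samples with both endpoints included (numpy.linspace semantics).
    """
    lo, hi = data["min"], data["max"]
    if "linspace" in data:
        n = int(data["linspace"])
        if n < 2:
            return [float(lo)]
        width = (hi - lo) / (n - 1)
        return [lo + i * width for i in range(n)]
    step = data["step"]
    if step <= 0:
        raise ValueError("step must be positive")
    count = int((hi - lo) / step + 1e-9) + 1  # small tolerance for float rounding
    return [lo + i * step for i in range(count)]
```

Under these assumptions, the `number_photo` example (min 1, max 6, step 2) expands to `[1, 3, 5]`, and the `cans` example (min 1, max 3, step 1) expands to `[1, 2, 3]`.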

The data field also allows the user to create data lists from prompts by using the function "any()".
```
- drink:
    function: another()
    type: string
    data:
      - Sprite
      - Coca-Cola
      - Pepsi
      - any(3 soda drinks)
      - any(alcoholic drinks)
```
With this function, an LLM creates a list following the instructions provided by the user inside the parentheses.
This function can be used alone in the list or together with other items added by the user. When used with other items,
"any()" excludes those items from the generated list if they are related to the instruction. Multiple
"any()" functions can be used inside the same list.
Note that if no amount is specified in the prompt, "any()" will create a list with an unpredictable number of items.

Custom list functions can also be used to create data lists in this field,
as shown in the example below.

```
- number:
    function: forward()
    type: int
    data:
      file: list_functions/number_list.py
      function_name:
      args:
        - 1
        - 6
        - 2

- pizza_type:
    function: forward()
    type: string
    data:
      file: list_functions/number_list.py
      function_name: shuffle_list
      args: list_functions/list_of_things.yml
```
In these two examples, a custom list function is used in "data". The structure consists of three parameters:
- file: the path to the .py file where the function is defined.
- function_name: the name of the function to run inside the .py file.
- args: the input arguments required by the function.

List functions are entirely defined by the user.

### function
Functions determine how data is inserted into the prompt.

Since the data is stored in lists, functions are used to iterate through these lists in order to change the value
of the variable in each conversation. The currently available functions are the following:

- default(): assigns all the data in the list to the variable in the prompt.
- random(): picks a single random sample from the list assigned to the variable.
- random(5): picks a given number of random samples from the list. In this example, 5 random
  samples will be picked from the list. This number can't exceed the list length.
- random(rand): picks a random number of random samples from the list.
  This number will not exceed the list length.
- another(): always picks a different random sample until all options are exhausted.
- another(5): when a number is given inside the parentheses, another() picks that number of
  samples without repetition between conversations until all options are exhausted.
- forward(): iterates through the samples in the list one by one. Multiple forward() functions
  can be nested to cover all possible combinations. To nest forward() functions, reference the variable to be nested by writing
  its name inside the parentheses, as shown in the example below:
```
goals:
  - "{{cans}} cans of {{drink}}"

  - cans:
      function: forward(drink)
      type: int
      data:
        min: 1
        max: 3
        step: 1

  - drink:
      function: forward()
      type: string
      data:
        - sprite
        - coke
        - Orange Fanta

```
- pairwise(): iterates through the data by creating pairwise combinations for pairwise testing. Pairwise
  testing is a combinatorial test-design technique that focuses on covering all possible
  pairs of input parameter values at least once. It’s based on the observation that most software faults are triggered
  by interactions of just two parameters, so exercising every pair often finds the majority of bugs
  with far fewer tests than full Cartesian product enumeration.

The pairwise function must be applied to at least two variables in order to create the combination matrix to iterate over.
Variables will change in each conversation based on the matrix construction.
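The difference between nested forward() and pairwise() can be illustrated with the standard library. In the sketch below, nested forward() is modeled as a full Cartesian product, following the "all combinations" description above, although the iteration order Sensei uses is an assumption. A small coverage check then shows that a 4-row pairwise set is enough for three two-valued parameters, whose full product has 8 rows. The package metadata lists `allpairspy` as a dependency, which suggests pairwise matrices are generated with that library, but this sketch does not use it. The rows below were built by hand.

```python
from itertools import combinations, product

# Documented example: cans uses forward(drink), so both lists are nested.
cans = [1, 2, 3]
drinks = ["sprite", "coke", "Orange Fanta"]

# Nested forward() modeled as a Cartesian product: one conversation per combination.
forward_matrix = list(product(cans, drinks))


def covers_all_pairs(rows: list[tuple], values: list[list]) -> bool:
    """True if every value pair of every two parameters appears in some row."""
    for i, j in combinations(range(len(values)), 2):
        needed = set(product(values[i], values[j]))
        seen = {(row[i], row[j]) for row in rows}
        if not needed <= seen:
            return False
    return True


# Three two-valued parameters: the full product has 8 rows, but these
# 4 hand-built rows already cover every pair of values.
params = [[0, 1], [0, 1], [0, 1]]
pairwise_rows = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
```

The savings grow with the number of parameters: the full product grows multiplicatively, while a pairwise set grows far more slowly.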

## chatbot

This field provides information about the chatbot configuration and the data to be obtained from the conversation.

### is_starter

This parameter defines whether the chatbot starts the conversation. The value is a boolean and
depends on the chatbot under test.

### fallback

Here, the tester should provide the chatbot's original fallback message so that the user simulator can detect
fallbacks. This is needed to avoid fallback loops, allowing the user simulator to rephrase the query or change the topic.

### output

This field lets the tester extract specific information from the conversation once it has finished. It is used for data validation tasks.

The tester defines which data to obtain from the conversation in order to validate the consistency and
performance of the chatbot. This output field must follow the structure below:

```
output:
  - price:
      type: currency
      description: The final price of the pizza order
  - time:
      type: time
      description: how long is going to take the pizza to be ready
  - order_id:
      type: string
      description: my order ID
```

Each output needs a name and must contain these two parameters:

- type: the type of value to output. These types can be default or custom, as described for the "type"
  parameter in "goals". The default types are the following:
  - int: outputs data as an integer.
  - float: outputs data as a float.
  - string: outputs data as text.
  - time/time("format"): outputs data in a time format. An output format can be specified by adding parentheses with the
    desired format written in natural language. Ex: time(UTC), time(hh:mm:ss), time(show time in hours, minutes and seconds)
  - date/date("format"): outputs data in a date format. Following the same logic as the "time" type, a date format can be specified
    in natural language. Ex: date(mm/dd/yyyy), date(day-month-year), date(show date in days, months and years)
  - list[type]: outputs a list of the data type specified inside the brackets.

- description: a prompt describing which information has to be obtained from the conversation.

## conversation

This field defines the parameters that dictate how the conversations are generated. It consists
of the parameters number, max_cost (optional), goal_style and interaction_style.

```
conversation:
  number: 3
  max_cost: 1
  goal_style:
    steps: 5
    max_cost: 0.1
  interaction_style:
    - random:
        - make spelling mistakes
        - all questions
        - long phrases
        - change language:
            - italian
            - portuguese
            - chinese
```

### number
This parameter specifies the number of conversations to generate. A specific numeric value can be assigned to this field to define an exact number of conversations.
Example: number: 2 (this generates 2 conversations).

Alternatively, the number of conversations can be determined by the number of combinations derived from the value matrix
generated by nested forward or pairwise functions, provided these functions are used in the "goals" field.
To use this method, set the number field to "combinations".

- _combinations_:
  This option calculates the maximum number of conversations that can be generated based on the total
  number of possible combinations from the value matrices produced by the forward and pairwise functions only.
  The largest number of combinations among the available matrices is used.
```
conversation:
  number: combinations
```

- _combinations(float)_:
  To reduce the number of conversations, specify a fraction by including a float value between 0 and 1 in parentheses.
  This value is used to take a proportion of the total number of conversations.
```
conversation:
  number: combinations(0.6) # this will use only 60% of the total conversations.
```

- _combinations(float, function)_:
  It is possible to reference the value matrix generated by a specific function to determine the number of conversations
  by including the function's name in parentheses.
```
conversation:
  number: combinations(0.6, pairwise) # this will use only 60% of the total conversations from the pairwise matrix.
```

```
conversation:
  number: combinations(1, forward) # this will use 100% of the total conversations from the largest forward matrix.
```
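The arithmetic behind the `combinations` settings can be sketched as a small parser. The README does not say how fractional results are rounded. This sketch assumes they are rounded up and that at least one conversation always runs. The `matrix_sizes` input (sizes of the matrices each function produced) and the `conversation_count` helper are hypothetical.

```python
import math
import re

# Accepts "combinations", "combinations(0.6)" and "combinations(0.6, pairwise)".
SPEC = re.compile(r"combinations(?:\(\s*([\d.]+)\s*(?:,\s*(\w+)\s*)?\))?")


def conversation_count(number, matrix_sizes: dict[str, list[int]]) -> int:
    """Number of conversations for a "number" setting.

    matrix_sizes maps "forward"/"pairwise" to the sizes of their matrices.
    Rounding up and the minimum of one are assumptions of this sketch.
    """
    if isinstance(number, int):
        return number
    match = SPEC.fullmatch(str(number).strip())
    if match is None:
        raise ValueError(f"unsupported number setting: {number!r}")
    fraction = float(match.group(1) or 1.0)
    if not 0.0 < fraction <= 1.0:
        raise ValueError("the fraction must be in (0, 1]")
    function = match.group(2)
    if function:
        sizes = matrix_sizes.get(function, [])
    else:
        sizes = [size for group in matrix_sizes.values() for size in group]
    if not sizes:
        raise ValueError("no forward or pairwise matrix available")
    # round() first so float noise such as 3.0000000000000004 does not round up.
    return max(1, math.ceil(round(fraction * max(sizes), 9)))
```

For example, the nested cans/drink forward example gives a 3 × 3 = 9 combination matrix, so `combinations(0.6, forward)` would yield 6 conversations under the round-up assumption.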

### max_cost
Since using LLMs has a cost, the max_cost parameter has been introduced to keep expenditure
under control by setting a limit on the cost of the execution. This parameter is optional, and its value is expressed
in dollars.

### goal_style
This defines how the conversation should end. The available options are:
- steps: the number of interactions to perform before the conversation ends.
- random steps: a random number of interactions between 1 and an amount defined by the user. This amount can't exceed 20.
- all_answered: the conversation ends once all the queries in "goals" have been asked by the user and answered by the chatbot.
  This option creates an internal data frame that tracks whether all "goals" queries have been answered or confirmed. This
  data frame can be exported once the conversation has ended by setting the "export" field to True, as shown in the following example. This field is optional: if only
  "all_answered" is defined, export is set to False by default.
  When all_answered is set, conversations are regulated by a loop break based on the chatbot's fallback message, to avoid infinite loops when the chatbot does
  not know how to answer several of the user's questions. However, in some cases this loop break can be bypassed due to chatbot hallucinations, leading to
  irrelevant and extremely long conversations. To avoid this, a "limit" parameter lets the tester stop the conversation
  after a specific number of interactions if the loop break was not triggered or not all queries were answered. This parameter is also optional and
  defaults to 30 interactions.
```
goal_style:
  all_answered:
    export: True
    limit: 20
```
- default: the default mode enables "all_answered" mode with 'export' set to False and 'limit' set to 30, since no steps are defined.
- max_cost (individual): this parameter mimics the max_cost parameter defined one level above. However, the
  cost limit applies to each individual conversation within the execution. Once this limit is exceeded, the conversation ends
  and the next one is executed. This parameter is optional, but when used, it must be defined together with one of the goal
  styles explained above.
```
conversation:
  number: sample(0.2)
  max_cost: 1 # cost limit per execution
  goal_style:
    steps: 5
    max_cost: 0.1 # cost limit per conversation
```
|
595
|
+
|
596
|
+
### interaction_style
|
597
|
+
This indicates how the user simulator should carry out the conversation. There are 7 options in this update
- long phrase: the user will use very long phrases to write any query.
- change your mind: the user will eventually change their mind. This is useful in conversations where the user has to provide information, such as the toppings on a pizza or an appointment date.
- change language: the user will change the language in the middle of a conversation. The languages should be defined as a list inside the parameter, as shown in the example above.
- make spelling mistakes: the user will make typos and spelling mistakes during the conversation.
- single question: the user makes only one query from the "goals" field per interaction.
- all questions: the user asks everything in the "goals" field in a single interaction.
- random: this option takes a list containing any of the interaction styles mentioned above and selects a random number of them to apply to the conversation. Here's an example of how to apply this interaction style:
```
interaction_style:
  - random:
      - make spelling mistakes
      - all questions
      - long phrases
      - change language:
          - italian
          - portuguese
          - chinese
```
- default: the user simulator will carry out the conversation in a natural way.
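
The selection performed by the random style can be pictured with the sketch below. It makes two hypothetical assumptions: the YAML list loads into Python as plain strings plus a mapping for "change language", and at least one style is always chosen. `pick_random_styles` is an illustrative name, not part of SENSEI.

```python
import random
from typing import Any, List, Optional, Sequence

# The "random" entry from the YAML example above, as a YAML loader would return it:
# plain styles are strings, while "change language" is a mapping to its language list.
RANDOM_OPTIONS: List[Any] = [
    "make spelling mistakes",
    "all questions",
    "long phrases",
    {"change language": ["italian", "portuguese", "chinese"]},
]


def pick_random_styles(options: Sequence[Any], rng: Optional[random.Random] = None) -> List[Any]:
    """Hypothetical helper: pick a random, non-empty subset of the listed styles."""
    if not options:
        raise ValueError("'random' needs at least one interaction style")
    rng = rng or random.Random()
    count = rng.randint(1, len(options))     # ASSUMPTION: at least one style is always applied
    return rng.sample(list(options), count)  # distinct entries, in random order


if __name__ == "__main__":
    print(pick_random_styles(RANDOM_OPTIONS, random.Random(7)))
```

Passing a seeded `random.Random` makes a run reproducible, which can help when comparing conversations across test executions.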

## Execution

The test process can be started in two ways:

### Command execution

The sensei_chat.py script must be executed with the command-line arguments described below.

Example:

```
--technology taskyto
--connector path/to/connector.yml
--connector_parameters "{\"api_url\": \"http://127.0.0.1:5000\"}"  # dictionary-like (JSON) format is required
--project_path C:\path\to\project\folder
--user_profile profile_1.yaml  # or a folder of profiles, e.g. folder_of_profiles
--extract C:\path\to\extract\output\information
--verbose
```

- --technology: Chatbot technology to test.
- --connector: Path to the chatbot connector to use.
- --connector_parameters: Dynamic parameters for the selected chatbot connector.
- --project_path: The project path where all testing content for a specific project is stored.
- --user_profile: Name of the user profile YAML, or the folder containing the user profiles, to use in the testing process.
- --extract: Path where conversation outputs and reports will be stored.
- --verbose: Shows logs during the testing process.
- --clean_cache: Cleans the cache after the testing process.
- --update_cache: Updates the cache with new content if a previous cache was saved.
- --ignore_cache: Ignores the cache during the testing process.
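
The `--connector_parameters` value is passed as a single shell argument holding a JSON object, which is why the quotes are escaped in the example. The argparse parser below mirrors the documented flags to show how such a value can be turned into a dictionary. It is a sketch under that assumption, not the parser that sensei_chat.py actually uses; `parse_connector_parameters` and `build_parser` are illustrative names.

```python
import argparse
import json
from typing import Any, Dict


def parse_connector_parameters(raw: str) -> Dict[str, Any]:
    """Parse the --connector_parameters value, which must be a JSON object."""
    try:
        value = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise argparse.ArgumentTypeError(f"not valid JSON: {exc}") from exc
    if not isinstance(value, dict):
        raise argparse.ArgumentTypeError("expected a JSON object such as {\"api_url\": \"...\"}")
    return value


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical mirror of the documented flags; not the parser shipped in sensei_chat.py."""
    parser = argparse.ArgumentParser(prog="sensei_chat.py")
    parser.add_argument("--technology", help="chatbot technology to test, e.g. taskyto")
    parser.add_argument("--connector", help="path to the connector YAML")
    parser.add_argument("--connector_parameters", type=parse_connector_parameters, default=None,
                        help="JSON object with dynamic connector parameters")
    parser.add_argument("--project_path", help="project folder with the testing content")
    parser.add_argument("--user_profile", help="profile YAML file or folder of profiles")
    parser.add_argument("--extract", help="output folder for conversations and reports")
    for flag in ("--verbose", "--clean_cache", "--update_cache", "--ignore_cache"):
        parser.add_argument(flag, action="store_true")  # boolean switches
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args([
        "--technology", "taskyto",
        "--connector", "path/to/connector.yml",
        "--connector_parameters", '{"api_url": "http://127.0.0.1:5000"}',
        "--user_profile", "profile_1.yaml",
        "--verbose",
    ])
    print(args.connector_parameters["api_url"])  # http://127.0.0.1:5000
```

Invalid JSON or a non-object value makes argparse print an error and exit with status 2. This surfaces quoting mistakes before any conversation starts.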

### run.yml execution

The sensei_chat.py script must be executed with the --run_from_yaml argument, pointing to a project folder that contains the
run.yml configuration file explained previously in the "initialization" section.

Once the arguments are set inside run.yml, the execution can be performed.

Example:

`--run_from_yaml examples/academic_helper`

## Profile Validation

The validation_check.py script enables testers to carry out a validation process on the generated profile.
It produces two types of output files: a JSON file that reports any formatting errors detected in the
profile, and CSV files containing the matrices generated by the functions used within the profile.

The script accepts the following run arguments:

- --profile: Specifies the path of the profile to be validated.
- --export: Defines the path where the output files will be saved.
- --combined_matrix: If enabled, generates a single combined matrix of all function elements instead of a separate matrix for each function.
- --verbose: Displays detailed logs of the validation process.

Example:
`--profile path\to\profile.yml --export export\path --combined_matrix --verbose`

## Connectors

Connectors enable SENSEI to interface with various chatbot APIs.
These connectors are defined in a YAML configuration file, whose structure varies depending
on the communication protocol and design of each chatbot API.

SENSEI provides a generic Chatbot Connector class that lets users integrate with chatbot APIs in a standardized manner.
An example configuration is shown below:

```
config:
  api_url: "https://xxxxxx"
  headers:
    Content-Type: 'application/json'
  timeout: 10000
  request_key: "queryInput.text.text"
  response_key: "queryResult.fulfillmentText"

payload:
  queryInput:
    text:
      text:
      languageCode: "en"

parameters:
  - api_url
```

A generic connector is defined by three fields:

- config: Contains API-specific configuration details such as the endpoint URL, headers, timeout, and the
  keys used to locate the request and response data. Developers should tailor this section to the API requirements.
- payload: Defines the structure of the request payload used to communicate with the chatbot API.
- parameters (optional): Lists dynamic parameters that can be passed to the connector at runtime, allowing
  flexible configuration without modifying the core YAML file.
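
The dotted request_key and response_key values describe paths inside the JSON bodies. In the example above, the user message goes into queryInput.text.text of the payload, and the chatbot reply is read from queryResult.fulfillmentText of the response. The sketch below shows this mapping and one possible way of applying runtime parameters. All function names are hypothetical, treating each entry in parameters as a key of config is an assumption, and the HTTP call itself is left out.

```python
import copy
from typing import Any, Dict

# The example connector YAML above, as a YAML loader would return it
# (the empty "text:" entry loads as None).
EXAMPLE_CONNECTOR: Dict[str, Any] = {
    "config": {
        "api_url": "https://xxxxxx",
        "headers": {"Content-Type": "application/json"},
        "timeout": 10000,
        "request_key": "queryInput.text.text",
        "response_key": "queryResult.fulfillmentText",
    },
    "payload": {"queryInput": {"text": {"text": None, "languageCode": "en"}}},
    "parameters": ["api_url"],
}


def set_by_path(data: Dict[str, Any], dotted_key: str, value: Any) -> None:
    """Write value at a dotted path, creating intermediate dicts when missing."""
    *parents, leaf = dotted_key.split(".")
    node = data
    for key in parents:
        if not isinstance(node.get(key), dict):
            node[key] = {}
        node = node[key]
    node[leaf] = value


def get_by_path(data: Dict[str, Any], dotted_key: str) -> Any:
    """Read the value stored at a dotted path; raises KeyError if a key is missing."""
    node: Any = data
    for key in dotted_key.split("."):
        node = node[key]
    return node


def apply_parameters(connector: Dict[str, Any], runtime: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy whose config entries listed under 'parameters' take runtime values."""
    merged = copy.deepcopy(connector)
    for name in connector.get("parameters") or []:
        if name in runtime:
            merged["config"][name] = runtime[name]  # ASSUMPTION: parameters name config keys
    return merged


def build_request(connector: Dict[str, Any], message: str) -> Dict[str, Any]:
    """Fill a copy of the payload template with the user message at request_key."""
    payload = copy.deepcopy(connector["payload"])
    set_by_path(payload, connector["config"]["request_key"], message)
    return payload


if __name__ == "__main__":
    conn = apply_parameters(EXAMPLE_CONNECTOR, {"api_url": "http://127.0.0.1:5000"})
    print(build_request(conn, "Hello"))
    # {'queryInput': {'text': {'text': 'Hello', 'languageCode': 'en'}}}
    reply = {"queryResult": {"fulfillmentText": "Hi! How can I help?"}}
    print(get_by_path(reply, conn["config"]["response_key"]))  # Hi! How can I help?
```

Copying the payload template for every request keeps the loaded YAML unchanged, so the same connector can be reused across turns and conversations.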