flex-cartesian 1.3.0 → 2.0.0.beta
- checksums.yaml +4 -4
- data/CHANGELOG.md +4 -0
- data/LICENSE +6 -0
- data/README.md +133 -439
- data/lib/analyzer.rb +48 -0
- data/lib/analyzers/morris.rb +268 -0
- data/lib/flex-cartesian/flex-cartesian-analyzer.rb +16 -0
- data/lib/flex-cartesian/flex-cartesian-core.rb +624 -0
- data/lib/flex-cartesian/flex-cartesian-deprecations.rb +13 -0
- data/lib/flex-cartesian/flex-cartesian-io.rb +192 -0
- data/lib/flex-cartesian/flex-cartesian-utilities.rb +126 -0
- data/lib/flex-cartesian.rb +13 -331
- data/lib/version.rb +3 -0
- data/lib/visualization/html.rb +217 -0
- metadata +48 -26
data/README.md
CHANGED
|
@@ -1,530 +1,224 @@
|
|
|
1
|
-
|
|
1
|
+
<h1 align="center">FlexCartesian</h1>
|
|
2
|
+
<p align="center">
|
|
3
|
+
<b>Model real systems as functions of parameters.<br>Extract behavioural blueprints.<br>Get insights with a few lines of code.</b>
|
|
4
|
+
</p>
|
|
2
5
|
|
|
3
|
-
|
|
4
|
-
|
|
5
|
-
## Features
|
|
6
|
-
|
|
7
|
-
✅ Named dimensions with arbitrary keys
|
|
8
|
-
|
|
9
|
-
✅ Enumerate over Cartesian space with a single block argument
|
|
10
|
-
|
|
11
|
-
✅ Actions on Cartesian are decoupled from dimensionality: `s.cartesian { |v| do_something(v) }`
|
|
12
|
-
|
|
13
|
-
✅ Conditions for Cartesian space: `s.cond(:set) { |v| v.dim1 > v.dim2 }`
|
|
14
|
-
|
|
15
|
-
✅ Calculation over named dimensions: `s.cartesian { |v| puts "#{v.dim1} and #{v.dim2}" }`
|
|
6
|
+
---
|
|
16
7
|
|
|
17
|
-
|
|
8
|
+
# What Is It
|
|
18
9
|
|
|
19
|
-
|
|
10
|
+
FlexCartesian is a novel approach to parameter space analysis. It introduces the Parametric Behaviour Blueprinting paradigm, abbreviated as PBB.
|
|
20
11
|
|
|
21
|
-
|
|
12
|
+
# What Is It For
|
|
22
13
|
|
|
23
|
-
|
|
14
|
+
Most systems around us are functions of parameters.<br/>
|
|
24
15
|
|
|
25
|
-
|
|
16
|
+
The LLM you use has inference parameters, and its functions are response quality, response time, and throughput. The cloud storage you use has configuration parameters, and its functions are IOPS and throughput. Even the car you drive has driving parameters, and its function is cost per mile.
|
|
26
17
|
|
|
27
|
-
|
|
18
|
+
As a rule, a parametric system's behavior is characterized by its function. Naturally, you want to tune those parameters to bring the system to its absolute best operating mode: the lowest cost per mile, the highest storage IOPS, or the lowest response time from an LLM. This leads to two fundamental questions:
|
|
28
19
|
|
|
29
|
-
|
|
20
|
+
<p align="center">
|
|
21
|
+
<b>HOW DO PARAMETERS INFLUENCE THE BEHAVIOUR OF THE SYSTEM?</b>
|
|
22
|
+
</p>
|
|
30
23
|
|
|
31
|
-
|
|
24
|
+
<p align="center">
|
|
25
|
+
<b>WHAT IS THE ABSOLUTE BEST OPERATING MODE OF THE SYSTEM?</b>
|
|
26
|
+
</p>
|
|
32
27
|
|
|
33
|
-
|
|
28
|
+
FlexCartesian addresses both questions. It explores the behavior of your system and identifies its optimal operating modes.
|
|
34
29
|
|
|
35
|
-
|
|
30
|
+
# Why It Exists
|
|
36
31
|
|
|
37
|
-
|
|
32
|
+
FlexCartesian fills several practical gaps left by conventional benchmarking and parameter space analysis tools.
|
|
38
33
|
|
|
39
|
-
|
|
40
|
-
- Metrics: `throughput`, `latency`, `memory`
|
|
41
|
-
- Output: CSV or Markdown tables
|
|
34
|
+
**> _Data gathering is separated from data analysis._**
|
|
42
35
|
|
|
43
|
-
|
|
36
|
+
Specifically, benchmarking tools blindly provide raw data, while modeling tools blindly assume that data has already been prepared somehow.
|
|
37
|
+
There is a need for a single tool that builds a model of a system and probes live data from it in a structured, consistent way that perfectly aligns with that model.
|
|
44
38
|
|
|
45
|
-
|
|
39
|
+
**> _Data gathered from systems is often scattered, inconsistent, unstructured, and incomplete._**
|
|
46
40
|
|
|
47
|
-
|
|
48
|
-
|
|
49
|
-
- With computed evaluation metrics like `accuracy`, `AUC`, etc
|
|
41
|
+
What's even worse, you rarely know in advance if there are gaps in the fetched data or where they might be.
|
|
42
|
+
There is a need for a tool that establishes a rigorous mathematical model of the system first, and then gathers data exactly as the model requires to represent the system consistently.
|
|
50
43
|
|
|
51
|
-
|
|
44
|
+
**> _System analysis is spread across System Engineer, System Architect, and Data Scientist roles._**
|
|
52
45
|
|
|
53
|
-
|
|
46
|
+
The first role knows how to benchmark and gather data. The second role understands the system's architecture. The third knows how to explore the data. There is a need for a simple tool that enables any of these roles to conduct full-cycle analysis without delays. For example, a System Architect should be able to iteratively run an analysis, updating the data in the model quickly and independently.
|
|
54
47
|
|
|
55
|
-
|
|
56
|
-
region: ["us-west", "eu-central"]
|
|
57
|
-
tier: ["basic", "pro"]
|
|
58
|
-
replicas: [1, 3, 5]
|
|
59
|
-
```
|
|
48
|
+
**> _Heavy-weight scripting is required to turn specialized libraries into end-user tools._**
|
|
60
49
|
|
|
61
|
-
|
|
62
|
-
```ruby
|
|
63
|
-
s.cond(:set) { |v| (v.tier == "basic" ? v.replicas == 1 : true) }
|
|
64
|
-
```
|
|
50
|
+
There is a need for a high-level, concise Domain-Specific Language (DSL) to gather data, explore it, and model the parametric system.
|
|
65
51
|
|
|
66
|
-
|
|
67
|
-
Generate and benchmark all valid CLI calls:
|
|
52
|
+
# Essential Advantages
|
|
68
53
|
|
|
69
|
-
|
|
70
|
-
myapp --threads=4 --batch=32 --backend=torch
|
|
71
|
-
```
|
|
54
|
+
FlexCartesian elevates the traditional parameter space analysis paradigm to the next level, which we call **Parametric Behaviour Blueprinting (PBB)**.
|
|
72
55
|
|
|
73
|
-
|
|
56
|
+
Conventional parameter space analysis takes the existence of parameter values and system states for granted, focusing solely on exploring the state (sensitivity, robustness, trade-offs, extrema, heatmaps, etc.). PBB extends this scope further:
|
|
74
57
|
|
|
75
|
-
|
|
76
|
-
Automatically cover input parameter spaces for:
|
|
58
|
+
**1. Live data gathering.** FlexCartesian fetches data directly from a real system (digital or physical) using defined `behavioural functions`.
|
|
77
59
|
|
|
78
|
-
|
|
79
|
-
- User roles: ["guest", "user", "admin"]
|
|
80
|
-
- Language settings: ["en", "fr", "de"]
|
|
60
|
+
**2. Evolving blueprints.** A live linkage to the real system maintains a behavioural blueprint that evolves over time. This is driven by `data sources` feeding the behavioural functions, natively supporting the temporal dimension in the model.
|
|
81
61
|
|
|
82
|
-
|
|
83
|
-
Generate multidimensional experimental spaces for:
|
|
62
|
+
**3. Structured Consistency.** FlexCartesian maintains the data gathered from the real system in a structured, complete order. This is enforced by FlexCartesian's core mathematical model: `parameter space` + `conditions` + `behavioural functions`. These three concepts guarantee that the system's behavior is described correctly for any valid combination of parameters.
|
|
84
63
|
|
|
85
|
-
-
|
|
86
|
-
- Bioinformatics parameter sweeps
|
|
87
|
-
- Network behavior modeling, etc
|
|
64
|
+
**4. Reverse Linkage for Simulation.** FlexCartesian doesn't just use the live linkage to gather data; it also lets you use the blueprint as a substitute for the real system. This unlocks new opportunities in system modeling, testing, and integration. It is particularly useful for air-gapped systems or AI training, where real system data is unavailable or prohibitively expensive to obtain.
|
|
88
65
|
|
|
89
|
-
|
|
90
|
-
Output Cartesian data as:
|
|
66
|
+
Additionally, FlexCartesian implements PBB via a highly expressive Ruby-based DSL. This allows you to execute powerful concepts in just a single line of code, natively integrating all the flexibility and elegance of Ruby.
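
To make the core model from point 3 concrete, here is a minimal plain-Ruby sketch of `parameter space` + `conditions` + `behavioural functions`. It deliberately does not call the FlexCartesian API. The `tier`/`replicas` dimensions and the condition are borrowed from the previous README; the `cost` function and its numbers are made up for illustration only.

```ruby
# Plain-Ruby sketch of the model; all names here are illustrative assumptions.

# Parameter space: named dimensions with their candidate values.
dimensions = {
  tier:     ["basic", "pro"],
  replicas: [1, 3, 5]
}

# Conditions: predicates that every valid combination must satisfy.
conditions = [
  ->(v) { v[:tier] == "basic" ? v[:replicas] == 1 : true }
]

# Behavioural functions: values computed for each valid combination.
# `cost` is a made-up stand-in for a measurement taken from a real system.
functions = {
  cost: ->(v) { v[:replicas] * (v[:tier] == "pro" ? 10 : 4) }
}

names  = dimensions.keys
values = dimensions.values

# Cartesian product of all dimension values, turned into named vectors.
vectors = values.first.product(*values.drop(1)).map { |combo| names.zip(combo).to_h }

# Keep only the combinations allowed by every condition.
valid = vectors.select { |v| conditions.all? { |c| c.call(v) } }

# Evaluate each function on each valid combination.
blueprint = valid.map do |v|
  v.merge(functions.transform_values { |f| f.call(v) })
end

blueprint.each { |row| puts row.inspect }
```

Of the six raw combinations, only four survive the condition, and each surviving row carries its computed `cost`. That is the sense in which the result is complete for every valid combination.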
|
|
91
67
|
|
|
92
|
-
|
|
93
|
-
- CSV (for Excel, Google Sheets, and more advanced BI tools)
|
|
94
|
-
- Plain text (for CLI previews)
|
|
68
|
+
## Example #1: Avoiding semantic shift in ChatGPT
|
|
95
69
|
|
|
96
|
-
|
|
97
|
-
Use it to drive automated test inputs for:
|
|
70
|
+
Suppose we want to find the optimal operating mode for ChatGPT: specifically, the ranges of `temperature` and `tokens` where the model gives stable, consistent answers to repeated questions. A lack of such stability is known as _semantic shift_. It is crucial to avoid in fields like law or science, where an AI assistant must provide reliable answers based on a strict corpus of documents.
|
|
98
71
|
|
|
99
|
-
-
|
|
100
|
-
- Minitest table-driven tests
|
|
101
|
-
- PyTest parameterization
|
|
102
73
|
|
|
103
|
-
|
|
74
|
+
While you can run [this example](https://github.com/Yuri-Rassokhin/flex-cartesian/tree/main/examples/13_chatgpt_semantic_shift) yourself, here's how FlexCartesian determines a semantic-shift-free operating mode, step by step.
|
|
104
75
|
|
|
105
|
-
|
|
106
|
-
bundle install
|
|
107
|
-
gem build flex-cartesian.gemspec
|
|
108
|
-
gem install flex-cartesian-*.gem
|
|
109
|
-
```
|
|
110
|
-
|
|
111
|
-
|
|
112
|
-
|
|
113
|
-
## Usage
|
|
76
|
+
Enable FlexCartesian:
|
|
114
77
|
|
|
115
78
|
```ruby
|
|
116
|
-
#!/usr/bin/ruby
|
|
117
|
-
|
|
118
79
|
require 'flex-cartesian'
|
|
119
|
-
|
|
120
|
-
|
|
121
|
-
|
|
122
|
-
# BASIC CONCEPTS
|
|
123
|
-
|
|
124
|
-
# 1. Cartesian object is a set of combinations of values of dimensions.
|
|
125
|
-
# 2. Dimensions always have names.
|
|
126
|
-
|
|
127
|
-
puts "\nDefine named dimensions"
|
|
128
|
-
example = {
|
|
129
|
-
dim1: [1, 2],
|
|
130
|
-
dim2: ['x', 'y'],
|
|
131
|
-
dim3: [true, false]
|
|
132
|
-
}
|
|
133
|
-
|
|
134
|
-
puts "\nCreate Cartesian space"
|
|
135
|
-
s = FlexCartesian.new(example)
|
|
136
|
-
|
|
137
|
-
def do_something(v)
|
|
138
|
-
# do something here on vector v and its components
|
|
139
|
-
end
|
|
140
|
-
|
|
141
|
-
|
|
142
|
-
|
|
143
|
-
# ITERATION OVER CARTESIAN SPACE
|
|
144
|
-
|
|
145
|
-
# 3. Iterator is dimensionality-agnostic, that is, has a vector syntax that hides dimensions under the hood.
|
|
146
|
-
# This keeps foundational code intact, and isolates modifications in the iterator body 'do_something'.
|
|
147
|
-
# 4. For efficiency on VERY large Cartesian spaces, there are
|
|
148
|
-
# a). lazy evaluation of each combination
|
|
149
|
-
# b). progress bar to track time-consuming calculations.
|
|
150
|
-
|
|
151
|
-
puts "\nIterate over all Cartesian combinations and execute action (dimensionality-agnostic style)"
|
|
152
|
-
s.cartesian { |v| do_something(v) }
|
|
153
|
-
|
|
154
|
-
puts "\nIterate over all Cartesian combinations and execute action (dimensionality-aware style)"
|
|
155
|
-
s.cartesian { |v| puts "#{v.dim1} & #{v.dim2}" if v.dim3 }
|
|
156
|
-
|
|
157
|
-
puts "\nIterate and display progress bar (useful for large Cartesian spaces)"
|
|
158
|
-
s.progress_each { |v| do_something(v) }
|
|
159
|
-
|
|
160
|
-
puts "\nIterate in lazy mode, without materializing entire Cartesian product in memory"
|
|
161
|
-
s.cartesian(lazy: true).take(2).each { |v| do_something(v) }
|
|
162
|
-
|
|
163
|
-
|
|
164
|
-
|
|
165
|
-
# FUNCTIONS ON CARTESIAN SPACE
|
|
166
|
-
|
|
167
|
-
# 5. A function is a virtual dimension that is calculated based on a vector of base dimensions.
|
|
168
|
-
# You can think of a function as a scalar field defined on Cartesian space.
|
|
169
|
-
# 6. Functions are printed as virtual dimensions in `.output`.
|
|
170
|
-
# 7. Functions do not add to `.size` of Cartesian space.
|
|
171
|
-
|
|
172
|
-
puts "\nAdd function 'triple'"
|
|
173
|
-
puts "Note: function is visualized in .output as a new dimension"
|
|
174
|
-
s.func(:add, :triple) { |v| v.dim1 * 3 + (v.dim3 ? 1: 0) }
|
|
175
|
-
s.func(:run)
|
|
176
|
-
s.output
|
|
177
|
-
|
|
178
|
-
puts "\nAdd and then remove function 'test'"
|
|
179
|
-
s.func(:add, :test) { |v| v.dim3 ? 1 : 0 } # booleans have no #to_i in Ruby
|
|
180
|
-
s.func(:del, :test)
|
|
181
|
-
|
|
182
|
-
puts "\nThis function will calculate first, always"
|
|
183
|
-
s.func(:add, :test_first, order: :first) { puts "HERE" }
|
|
184
|
-
|
|
185
|
-
puts "\nThis function will calculate last, always"
|
|
186
|
-
s.func(:add, :test_last, order: :last) { puts "HERE" }
|
|
187
|
-
|
|
188
|
-
puts "\nNew functions respect first and last ones"
|
|
189
|
-
s.func(:add, :test_more) { puts "HERE" }
|
|
190
|
-
|
|
191
|
-
puts "\nFirst and last functions are handy for pre- and post-processing for each combination"
|
|
192
|
-
|
|
193
|
-
# CONDITIONS ON CARTESIAN SPACE
|
|
194
|
-
|
|
195
|
-
# 8. A condition is a logical constraint for allowed combinations of Cartesian space.
|
|
196
|
-
# 9. Using conditions, you can take a slice of Cartesian space.
|
|
197
|
-
# In particular, you can reflect semantical dependency of dimensional values.
|
|
198
|
-
|
|
199
|
-
puts "Build Cartesian space that includes only odd values of 'dim1' dimension"
|
|
200
|
-
s.cond(:set) { |v| v.dim1.odd? }
|
|
201
|
-
puts "print all the conditions in format 'index | condition '"
|
|
202
|
-
s.cond
|
|
203
|
-
puts "Test the condition: print the updated Cartesian space"
|
|
204
|
-
s.output
|
|
205
|
-
puts "Test the condition: check the updated size of Cartesian space"
|
|
206
|
-
puts "New size: #{s.size}"
|
|
207
|
-
puts "Clear condition #0"
|
|
208
|
-
s.cond(:unset, index: 0)
|
|
209
|
-
puts "Clear all conditions"
|
|
210
|
-
s.cond(:clear)
|
|
211
|
-
puts "Restored size without conditions: #{s.size}"
|
|
212
|
-
|
|
213
|
-
|
|
214
|
-
|
|
215
|
-
# PRINT
|
|
216
|
-
|
|
217
|
-
puts "\nPrint Cartesian space as plain table, all functions included"
|
|
218
|
-
s.output
|
|
219
|
-
puts "\nPrint Cartesian space as Markdown"
|
|
220
|
-
s.output(format: :markdown)
|
|
221
|
-
puts "\nPrint Cartesian space as CSV"
|
|
222
|
-
s.output(format: :csv)
|
|
223
|
-
puts "\nGranular output of Cartesian space dimensions: #{s.dimensions(values: false)}"
|
|
224
|
-
puts "\nGranular output of Cartesian vectors:"
|
|
225
|
-
s.cartesian { |v| puts s.dimensions(v, separator: ' ') }
|
|
226
|
-
|
|
227
|
-
|
|
228
|
-
|
|
229
|
-
# IMPORT / EXPORT
|
|
230
|
-
|
|
231
|
-
puts "\nImport Cartesian space from JSON (similar method for YAML)"
|
|
232
|
-
File.write('example.json', JSON.pretty_generate(example))
|
|
233
|
-
puts "\nNote: after import, all assigned functions will calculate again, and they appear in the output"
|
|
234
|
-
s.import('example.json').output
|
|
235
|
-
puts "\nExport Cartesian space to YAML (similar method for JSON)"
|
|
236
|
-
s.export('example.yaml', format: :yaml)
|
|
237
|
-
|
|
238
|
-
|
|
239
|
-
|
|
240
|
-
# UTILITIES
|
|
241
|
-
|
|
242
|
-
puts "\nGet number of Cartesian combinations"
|
|
243
|
-
puts "Note: .size counts only dimensions, it ignores virtual constructs (functions, conditions, etc.)"
|
|
244
|
-
puts "Total size of Cartesian space: #{s.size}"
|
|
245
|
-
puts "\nPartially converting Cartesian space to array:"
|
|
246
|
-
array = s.to_a(limit: 3)
|
|
247
|
-
puts array.inspect
|
|
248
80
|
```
|
|
249
81
|
|
|
250
|
-
|
|
251
|
-
|
|
252
|
-
The most common use case for FlexCartesian is sweep analysis, that is, analysis of target value on all possible combinations of its parameters.
|
|
253
|
-
FlexCartesian has been designed to provide a concise form for sweep analysis:
|
|
82
|
+
Define the parameter space:
|
|
254
83
|
|
|
255
84
|
```ruby
|
|
256
|
-
|
|
257
|
-
|
|
258
|
-
|
|
259
|
-
s = FlexCartesian.new(path: './config.json')
|
|
260
|
-
|
|
261
|
-
# Define the values we want to calculate on all possible combinations of parameters
|
|
262
|
-
s.func(:add, :cmd) { |v| v.threads * v.batch }
|
|
263
|
-
s.func(:add, :performance) { |v| v.cmd / 3 }
|
|
264
|
-
|
|
265
|
-
# Calculate
|
|
266
|
-
s.func(:run)
|
|
267
|
-
|
|
268
|
-
# Save result as CSV, to easily open it in any business analytics tool
|
|
269
|
-
s.output(format: :csv, file: './benchmark.csv')
|
|
270
|
-
# For convenience, print result to the terminal
|
|
271
|
-
s.output
|
|
272
|
-
```
|
|
273
|
-
|
|
274
|
-
As this code is a little artificial, let us build a real-world example.
|
|
275
|
-
Suppose we want to analyze ping performance from our machine to several DNS providers: Google DNS, Cloudflare DNS, and Cisco OpenDNS.
|
|
276
|
-
For each of those services, we would like to know:
|
|
277
|
-
|
|
278
|
-
- What is our ping time?
|
|
279
|
-
- How does ping scale by packet size?
|
|
280
|
-
- How do ping statistics vary with the number of pings?
|
|
281
|
-
|
|
282
|
-
These input parameters form the following dimensions.
|
|
283
|
-
|
|
284
|
-
```json
|
|
285
|
-
{
|
|
286
|
-
"count": [2, 4],
|
|
287
|
-
"size": [32, 64],
|
|
288
|
-
"target": [
|
|
289
|
-
"8.8.8.8", // Google DNS
|
|
290
|
-
"1.1.1.1", // Cloudflare DNS
|
|
291
|
-
"208.67.222.222" // Cisco OpenDNS
|
|
292
|
-
]
|
|
293
|
-
}
|
|
85
|
+
space = FlexCartesian.new({
|
|
86
|
+
temperature: [0.0, 0.2, 0.4, 0.6, 0.8, 1.0],
|
|
87
|
+
tokens: [20, 50, 100, 200, 400]})
|
|
294
88
|
```
|
|
295
89
|
|
|
296
|
-
|
|
297
|
-
Let us build the code to run over these parameters.
|
|
90
|
+
Define the behavioural function `response`. For any combination of parameters, it returns the response given by ChatGPT to the same test question:
|
|
298
91
|
|
|
299
92
|
```ruby
|
|
300
|
-
|
|
93
|
+
msg = [ { role: "system", content: "You are a precise and consistent assistant." },
|
|
94
|
+
{ role: "user", content: "Explain quantum mechanics in one sentence." } ]
|
|
301
95
|
|
|
302
|
-
|
|
303
|
-
|
|
304
|
-
result = {} # raw result of each ping
|
|
305
|
-
|
|
306
|
-
s.func(:add, :command) { |v| "ping -c #{v.count} -s #{v.size} #{v.target}" } # ping command
|
|
307
|
-
s.func(:add, :raw_ping, hide: true) { |v| result[v.command] ||= `#{v.command} 2>&1` } # capturing ping result
|
|
308
|
-
s.func(:add, :time) { |v| v.raw_ping[/min\/avg\/max\/(?:mdev|stddev) = [^\/]+\/([^\/]+)/, 1]&.to_f } # fetch ping time from result
|
|
309
|
-
s.func(:add, :min) { |v| v.raw_ping[/min\/avg\/max\/(?:mdev|stddev) = ([^\/]+)/, 1]&.to_f } # fetch min time from result
|
|
310
|
-
s.func(:add, :loss) { |v| v.raw_ping[/(\d+(?:\.\d+)?)% packet loss/, 1]&.to_f } # fetch ping loss from result
|
|
311
|
-
|
|
312
|
-
s.func(:run, progress: true, title: "Pinging") # Sweep analysis! Benchmark all possible combinations of parameters
|
|
313
|
-
|
|
314
|
-
s.output(format: :csv, file: './result.csv') # save benchmark result as CSV
|
|
315
|
-
|
|
316
|
-
s.output(colorize: true) # for convenience, show result in terminal
|
|
96
|
+
space.func(:add, :response) { |v| llm(temperature: v.temperature, max_tokens: v.tokens, messages: msg ) }
|
|
317
97
|
```
|
|
318
98
|
|
|
319
|
-
|
|
99
|
+
Enrich the system with two additional behavioural functions. For any answer provided by `response`, the `embedding` function returns its vector embedding. The `semantic_shift` function then calculates how far the response drifts away from the very first answer ("anchor") given by the model. This value is exactly what we need!
|
|
320
100
|
|
|
321
|
-
|
|
101
|
+
```ruby
|
|
102
|
+
anchor = nil # declared outside the blocks so that both functions share the same variable
space.func(:add, :embedding) { |v| anchor ||= embed(v.response); embed(v.response) }
|
|
103
|
+
space.func(:add, :semantic_shift) { |v| (1.0 - cosine(v.embedding, anchor)).round(2) }
|
|
104
|
+
```
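
The snippet above relies on a `cosine` helper that the README does not show. As a minimal sketch, assuming embeddings are plain arrays of numbers, it could look like the following. The `cosine` and `semantic_shift` definitions here are hypothetical stand-ins, not code taken from the linked example.

```ruby
# Hypothetical helper: cosine similarity between two equal-length numeric vectors.
def cosine(a, b)
  raise ArgumentError, "vectors must have the same length" unless a.size == b.size

  dot    = a.zip(b).sum { |x, y| x * y }
  norm_a = Math.sqrt(a.sum { |x| x * x })
  norm_b = Math.sqrt(b.sum { |x| x * x })
  return 0.0 if norm_a.zero? || norm_b.zero? # no direction to compare

  dot / (norm_a * norm_b)
end

# Semantic shift as defined above: 1 - cosine similarity, rounded to two decimals.
def semantic_shift(embedding, anchor)
  (1.0 - cosine(embedding, anchor)).round(2)
end

anchor = [1.0, 0.0, 0.0]
puts semantic_shift([1.0, 0.0, 0.0], anchor) # same direction as the anchor
puts semantic_shift([0.0, 1.0, 0.0], anchor) # orthogonal to the anchor
```

A vector pointing the same way as the anchor gives a shift of `0.0`, and an orthogonal one gives `1.0`. Vectors pointing in opposite directions would give `2.0` under this definition.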
|
|
322
105
|
|
|
323
|
-
|
|
106
|
+
Next, we compute all the functions across the entire parameter space.
|
|
107
|
+
Behind the scenes, FlexCartesian iterates each function over all possible parameter combinations.
|
|
324
108
|
|
|
325
|
-
|
|
326
|
-
|
|
109
|
+
```ruby
|
|
110
|
+
space.func(:run, progress: true)
|
|
111
|
+
```
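
To illustrate what such a run does conceptually, here is a plain-Ruby sketch in which chained functions are evaluated for every parameter combination and a later function reads the result of an earlier one. It is not the FlexCartesian implementation, and the library's actual evaluation order may differ. The `response` lambda is a deterministic stand-in for a call to a real system, and `drift` is a simplified stand-in for `semantic_shift`.

```ruby
# Plain-Ruby sketch of a function run; names and values are illustrative assumptions.
space = { temperature: [0.0, 0.5], tokens: [20, 50] }

anchor = nil # shared by the closures below; set on the first combination only

# Functions in definition order; each one receives the row built so far.
functions = {
  response: ->(row) { row[:temperature] * row[:tokens] },
  drift:    ->(row) { anchor ||= row[:response]; (row[:response] - anchor).abs }
}

names = space.keys
rows = space.values.first.product(*space.values.drop(1)).map do |combo|
  row = names.zip(combo).to_h
  # Evaluate functions in order so that `drift` can read `response`.
  functions.each { |name, f| row[name] = f.call(row) }
  row
end

rows.each { |row| puts row.inspect }
```

Because `anchor` is declared outside the lambdas, it keeps the value from the first combination, and every later row is compared against it.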
|
|
327
112
|
|
|
328
|
-
|
|
329
|
-
|
|
330
|
-
- Local file systems using FS-based utilities
|
|
331
|
-
- Local CPU RAM using RAM disk or specialized benchmarks for CPU RAM
|
|
332
|
-
- Database performance using SQL client or non-SQL client utilities
|
|
333
|
-
- Performance of object storage of cloud providers, be it AWS S3, OCI Object Storage, or anything else
|
|
334
|
-
- Performance of any AI model, from simplistic YOLO to heavy-weight LLM such as LLAMA, Cohere, or DeepSeek
|
|
335
|
-
- ... Any other target application or service
|
|
113
|
+
Upon completion, FlexCartesian holds the Parametric Behaviour Blueprint of our system.
|
|
114
|
+
Now, we can visualize this PBB as a 2D heatmap showing how the semantics of ChatGPT's answers depend on `tokens` and `temperature`:
|
|
336
115
|
|
|
337
|
-
|
|
338
|
-
|
|
339
|
-
|
|
116
|
+
```ruby
|
|
117
|
+
space.visualize(x: :temperature, y: :tokens, func: :semantic_shift, output: "./viz.html")
|
|
118
|
+
```
|
|
340
119
|
|
|
341
|
-
|
|
120
|
+
When we open `./viz.html` in a browser, we see the semantic shift varying from `0.0` (identical to the first answer) to `1.0` (totally inconsistent):
|
|
342
121
|
|
|
122
|
+
<p align="center">
|
|
123
|
+
<img src="docs/assets/viz/example-low-rate.gif" width="600"/>
|
|
124
|
+
</p>
|
|
343
125
|
|
|
126
|
+
The heatmap gives us our answer: to completely avoid semantic shift, we should keep the temperature at or below `0.2`.
|
|
127
|
+
The number of tokens has no measurable influence on response stability.
|
|
128
|
+
Notably, even at temperatures beyond `0.2`, the responses remain respectably consistent, with the shift barely reaching `0.2`.
|
|
344
129
|
|
|
345
|
-
|
|
130
|
+
Finally, we want to mathematically assess the influence of each parameter:
|
|
346
131
|
|
|
347
|
-
### Initialization
|
|
348
132
|
```ruby
|
|
349
|
-
|
|
133
|
+
space.analyzer(:morris, trajectories: 10, step: 0.1, seed: 42).output(func: :semantic_shift, format: :markdown)
|
|
350
134
|
```
|
|
351
|
-
- `dimensions_hash`: optional hash with named dimensions; each value can be an `Enumerable` (arrays, ranges, etc)
|
|
352
|
-
- `path`: optional path to file with stored dimensions, JSON and YAML supported
|
|
353
|
-
- `format`: optional format of `path` file, defaults to JSON
|
|
354
135
|
|
|
355
|
-
|
|
356
|
-
```ruby
|
|
357
|
-
dimensions = {
|
|
358
|
-
dim1: [1, 2],
|
|
359
|
-
dim2: ['x', 'y'],
|
|
360
|
-
dim3: [true, false]
|
|
361
|
-
}
|
|
136
|
+
This produces a Markdown table quantifying the influence of the parameters:
|
|
362
137
|
|
|
363
|
-
|
|
364
|
-
|
|
138
|
+
|parameter |influence[semantic_shift]|deviation|probes|category|linearity |recommendation |
|
|
139
|
+
|-----------|-------------------------|---------|------|--------|-----------------|-------------------------------------------------------------------------------------------------|
|
|
140
|
+
|tokens |0.14 |0.22 |10 |strong |highly non-linear|critical parameter with complex interactions; prioritize for variance-based analysis (e.g. Sobol)|
|
|
141
|
+
|temperature|0.09 |0.19 |10 |strong |highly non-linear|critical parameter with complex interactions; prioritize for variance-based analysis (e.g. Sobol)|
|
|
365
142
|
|
|
366
|
-
|
|
143
|
+
This sensitivity table confirms the strong influence of both parameters and, most importantly, categorizes their influence as highly non-linear. This means we cannot make decisions based on just a few isolated probes. Instead, we must build a complete behavioural blueprint across the entire parameter space and find the "sweet spot": temperature <= 0.2, regardless of tokens.
|
|
367
144
|
|
|
368
|
-
|
|
145
|
+
## Example #2: Sensitivity of AWS DynamoDB servers
|
|
369
146
|
|
|
370
|
-
|
|
371
|
-
```ruby
|
|
372
|
-
# With block
|
|
373
|
-
cartesian(dims = nil, lazy: false) { |vector| ... }
|
|
374
|
-
# Without block: returns Enumerator
|
|
375
|
-
cartesian(dims = nil, lazy: false)
|
|
376
|
-
```
|
|
377
|
-
- `dims`: optional dimensions hash (default is the one provided at initialization).
|
|
378
|
-
- `lazy`: if true, returns a lazy enumerator.
|
|
147
|
+
Let's look at [another example](https://github.com/Yuri-Rassokhin/flex-cartesian/tree/main/examples/09_ping_visualize). If we ping AWS DynamoDB, how do specific parameters influence the ping time? For this example, let's analyze two of them: the geographic region of the endpoint and the packet size.
|
|
379
148
|
|
|
380
|
-
|
|
381
|
-
```ruby
|
|
382
|
-
s.cartesian { |v| puts "#{v.dim1} - #{v.dim2}" }
|
|
383
|
-
```
|
|
149
|
+
Here is the full code showing how FlexCartesian finds the answer:
|
|
384
150
|
|
|
385
|
-
---
|
|
386
|
-
|
|
387
|
-
### Handling Functions
|
|
388
|
-
```ruby
|
|
389
|
-
func(command = :print, name = nil, hide: false, progress: false, title: "calculating functions", order: nil, &block)
|
|
390
|
-
```
|
|
391
|
-
- `command`: symbol, one of the following
|
|
392
|
-
- `:add` to add function as a virtual dimension to Cartesian space
|
|
393
|
-
- `:del` to delete function from Cartesian space
|
|
394
|
-
- `:print` as default action, prints all the functions added to Cartesian space
|
|
395
|
-
- `:run` to calculate all the functions defined for Cartesian space
|
|
396
|
-
- `name`: symbol, name of the virtual dimension, e.g. `:my_function`
|
|
397
|
-
- `hide`: flag that hides or shows the function in .output; it is useful to hide intermediate calculations
|
|
398
|
-
- `progress`: show progress bar during `:run`, useful for large Cartesian space
|
|
399
|
-
- `title`: title of the progress bar
|
|
400
|
-
- `order`: can be `:first` or `:last` to make the function calculate before or after all other functions
|
|
401
|
-
- `block`: a function that receives each vector and returns a computed value
|
|
402
|
-
|
|
403
|
-
Functions show up in `.output` like additional (virtual) dimensions.
|
|
404
|
-
|
|
405
|
-
> Note: functions must be calculated explicitly using the `:run` command.
|
|
406
|
-
> Before the first calculation, a function has `nil` values in `.output`.
|
|
407
|
-
> Explicit `:run` is required to unambiguously control the points in the execution flow where heavy computation takes place.
|
|
408
|
-
> Otherwise, automated recalculation of functions, perhaps, during `.output` would be a difficult-to-track computational burden.
|
|
409
|
-
|
|
410
|
-
Example:
|
|
411
151
|
```ruby
|
|
412
|
-
|
|
413
|
-
|
|
414
|
-
s.func(:run)
|
|
415
|
-
|
|
416
|
-
s.output(format: :markdown)
|
|
417
|
-
# | dim1 | dim2 | increment |
|
|
418
|
-
# |------|------|--------|
|
|
419
|
-
# | 1 | "A" | 2 |
|
|
420
|
-
# | 1 | "B" | 2 |
|
|
421
|
-
# ...
|
|
422
|
-
```
|
|
423
|
-
|
|
152
|
+
# enable FlexCartesian
|
|
153
|
+
require 'flex-cartesian'
|
|
424
154
|
|
|
425
|
-
|
|
155
|
+
# define parameter space
|
|
156
|
+
space = FlexCartesian.new({
|
|
157
|
+
size: [64, 512, 1400, 1500, 4096, 8192],
|
|
158
|
+
target: [ "dynamodb.eu-central-1.amazonaws.com", # Frankfurt
|
|
159
|
+
"dynamodb.us-east-1.amazonaws.com", # Virginia, US
|
|
160
|
+
"dynamodb.sa-east-1.amazonaws.com", # Sao Paulo
|
|
161
|
+
"dynamodb.ap-northeast-1.amazonaws.com", # Tokyo
|
|
162
|
+
"dynamodb.af-south-1.amazonaws.com"]}) # Cape Town
|
|
426
163
|
|
|
427
|
-
|
|
428
|
-
|
|
429
|
-
|
|
430
|
-
|
|
431
|
-
|
|
164
|
+
# define behavioural functions:
|
|
165
|
+
# 1. 'command' constructs ping command
|
|
166
|
+
# 2. 'raw' executes the command and returns raw result
|
|
167
|
+
# 3. 'time' extracts ping time from the result
|
|
168
|
+
# 4. 'cap' is a constant that draws the 150 ms ping threshold on the visualization
|
|
169
|
+
result = {}
|
|
170
|
+
space.func(:add, :command) { |v| "ping -c 4 -s #{v.size} #{v.target}" } # count of 4 is an assumed value: this space defines no `count` or `interval` dimension
|
|
171
|
+
space.func(:add, :raw, hide: true) { |v| result[v.command] ||= `#{v.command} 2>&1` }
|
|
172
|
+
space.func(:add, :time) { |v| v.raw[/min\/avg\/max\/(?:mdev|stddev) = [^\/]+\/([^\/]+)/, 1]&.to_f&.round(2) } # nil when ping printed no summary line
|
|
173
|
+
space.func(:add, :cap) { |v| 150 }
|
|
432
174
|
|
|
433
|
-
|
|
175
|
+
# Now we compute all the functions in the parameter space
|
|
176
|
+
space.func(:run, progress: true)
|
|
434
177
|
|
|
435
|
-
|
|
436
|
-
|
|
437
|
-
|
|
178
|
+
# Visualize behavioural blueprint as a 2D-heatmap
|
|
179
|
+
# It will show two functions - ping time (:time) and 150 ms threshold (:cap)
|
|
180
|
+
space.visualize(x: :size, y: :target, func: [ :time, :cap ], output: "./viz.html")
|
|
438
181
|
```
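
The `time` and `raw` functions above combine two small techniques: extracting the average round-trip time from ping's summary line, and caching command output so each command runs only once. The sketch below isolates both without touching the network. The sample summary line uses made-up numbers, and `avg_ping_ms` and `run` are hypothetical names introduced only for this illustration.

```ruby
# Sample summary line in the format printed by Linux ping (values are made up).
SAMPLE = "rtt min/avg/max/mdev = 10.123/12.456/15.789/1.234 ms"

# Same pattern as the `time` function above: skip `min`, capture `avg`.
AVG_RTT = %r{min/avg/max/(?:mdev|stddev) = [^/]+/([^/]+)}

# Returns the average round-trip time in ms, or nil if there is no summary line.
# `&.` is applied to `round` as well, because Ruby's safe navigation
# guards only the single call it is attached to.
def avg_ping_ms(raw)
  raw[AVG_RTT, 1]&.to_f&.round(2)
end

puts avg_ping_ms(SAMPLE).inspect
puts avg_ping_ms("4 packets transmitted, 0 received, 100% packet loss").inspect

# Memoization as in `raw`: the body runs once per distinct command string.
calls = 0
cache = {}
run = lambda do |cmd|
  cache[cmd] ||= begin
    calls += 1 # counts real executions
    SAMPLE     # stand-in for the captured output of the command
  end
end

2.times { run.call("ping -c 4 -s 64 example.invalid") }
puts calls
```

Returning `nil` for an unmatched line matters here, since a ping that gets no replies prints no summary line at all.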
|
|
439
|
-
- `limit`: maximum number of combinations to collect.
|
|
440
182
|
|
|
441
|
-
|
|
183
|
+
By running this code, you'll generate an interactive HTML heatmap `./viz.html` illustrating how geography and packet size affect ping times to DynamoDB.
|
|
184
|
+
|
|
185
|
+
As you can see, FlexCartesian's expressive DSL packs complex system profiling into simple one-liners.
|
|
186
|
+
If you need a mathematically rigorous assessment of these parameters, simply add one more line:
|
|
442
187
|
|
|
443
|
-
### Iterate with Progress Bar
|
|
444
188
|
```ruby
|
|
445
|
-
|
|
189
|
+
space.analyzer(:morris, trajectories: 10, step: 0.1, seed: 42).output(func: :time)
|
|
446
190
|
```
|
|
447
|
-
Displays a progress bar using `ruby-progressbar`.
|
|
448
191
|
|
|
449
|
-
|
|
192
|
+
This applies [Morris sensitivity analysis](https://en.wikipedia.org/wiki/Morris_method) directly to the behavioural blueprint extracted by FlexCartesian.
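
For intuition about what elementary-effects screening measures, here is a simplified one-at-a-time sketch in plain Ruby. It is not FlexCartesian's `:morris` analyzer, and it does not build the trajectories of the full Morris design: it perturbs each input separately from random base points. Inputs are assumed to be scaled to `[0, 1]`, `model` is a toy function, and the `influence` and `deviation` keys are chosen here only to echo the column names in the table shown earlier.

```ruby
# Simplified elementary-effects screening in the spirit of the Morris method.
# Assumptions: inputs scaled to [0, 1]; `model` is a toy stand-in function.
def elementary_effects(model, names, samples: 10, step: 0.1, seed: 42)
  rng = Random.new(seed)
  effects = names.to_h { |n| [n, []] }

  samples.times do
    # Random base point, kept low enough that base + step stays within [0, 1].
    base = names.to_h { |n| [n, rng.rand * (1.0 - step)] }
    f0 = model.call(base)

    names.each do |n|
      moved = base.merge(n => base[n] + step) # perturb one input at a time
      effects[n] << (model.call(moved) - f0) / step
    end
  end

  effects.transform_values do |ee|
    mean_abs = ee.sum { |e| e.abs } / ee.size
    mean     = ee.sum / ee.size
    # Population standard deviation, used here for brevity.
    sigma    = Math.sqrt(ee.sum { |e| (e - mean)**2 } / ee.size)
    { influence: mean_abs.round(2), deviation: sigma.round(2) }
  end
end

# Toy model: linear in `a`, quadratic in `b`, independent of `c`.
model  = ->(x) { 2.0 * x[:a] + x[:b]**2 + 0.0 * x[:c] }
result = elementary_effects(model, %i[a b c])
result.each { |name, stats| puts "#{name}: #{stats}" }
```

A linear input such as `a` shows a constant effect with zero deviation. A non-linear input such as `b` shows an effect that varies with the base point, which is what a non-zero deviation signals. An input the model ignores, such as `c`, scores zero on both.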
|
|
450
193
|
|
|
451
|
-
|
|
452
|
-
```ruby
|
|
453
|
-
output(separator: " | ", colorize: false, align: true, format: :plain, limit: nil, file: nil)
|
|
454
|
-
```
|
|
455
|
-
- `separator`: how to visually separate columns in the output
|
|
456
|
-
- `colorize`: whether to colorize output or not
|
|
457
|
-
- `align`: whether to align output by column or not
|
|
458
|
-
- `format`: one of `:plain`, `:markdown`, or `:csv`
|
|
459
|
-
- `limit`: break the output after the first `limit` Cartesian combinations
|
|
460
|
-
- `file`: print to `file`
|
|
461
|
-
|
|
462
|
-
Prints all combinations in table form (plain/markdown/CSV).
|
|
463
|
-
Markdown example:
|
|
464
|
-
```markdown
|
|
465
|
-
| dim1 | dim2 |
|
|
466
|
-
|------|------|
|
|
467
|
-
| 1 | "a" |
|
|
468
|
-
| 2 | "b" |
|
|
469
|
-
```
|
|
194
|
+
## API Documentation
|
|
470
195
|
|
|
471
|
-
|
|
472
|
-
dimensions(data = @dimensions, raw: false, separator: ', ', dimensions: true, values: true)
|
|
473
|
-
```
|
|
474
|
-
Generates formatted Cartesian dimensions or vectors of Cartesian space
|
|
475
|
-
- `data`: what to format, either a Cartesian vector (usually `s.cartesian { |v| ... }`) or the entire set of Cartesian dimensions
|
|
476
|
-
- `raw`: if enabled, overrides any other formatting flags and returns the same as `.inspect`
|
|
477
|
-
- `separator`: how to separate individual components
|
|
478
|
-
- `dimensions`: whether or not show dimension names
|
|
479
|
-
- `values`: whether or not show value(s) associated with dimensions
|
|
196
|
+
Detailed API documentation is available [here](docs/api/api.md).
|
|
480
197
|
|
|
481
|
-
|
|
198
|
+
## Installation
|
|
482
199
|
|
|
483
|
-
|
|
484
|
-
|
|
485
|
-
import(path, format: :json)
|
|
200
|
+
```bash
|
|
201
|
+
gem install flex-cartesian
|
|
486
202
|
```
|
|
487
|
-
- `path`: input file
|
|
488
|
-
- `format`: format to read, `:json` and `:yaml` supported
|
|
489
203
|
|
|
490
|
-
|
|
491
|
-
```ruby
|
|
492
|
-
s.from_json("file.json")
|
|
493
|
-
s.from_yaml("file.yaml")
|
|
494
|
-
```
|
|
204
|
+
## Status
|
|
495
205
|
|
|
496
|
-
|
|
206
|
+
This project is actively developed. Please [submit](https://github.com/Yuri-Rassokhin/flex-cartesian/issues) your feature requests or bug reports.
|
|
497
207
|
|
|
498
|
-
|
|
499
|
-
```ruby
|
|
500
|
-
export(path, format: :json)
|
|
501
|
-
```
|
|
502
|
-
- `path`: output file
|
|
503
|
-
- `format`: format to export, `:json` and `:yaml` supported
|
|
208
|
+
## Contributing
|
|
504
209
|
|
|
505
|
-
|
|
506
|
-
|
|
507
|
-
|
|
508
|
-
```
|
|
509
|
-
- `command`: one of the following
|
|
510
|
-
- `:set` to set the condition to Cartesian space
|
|
511
|
-
- `:unset` to remove the `index` condition from Cartesian space
|
|
512
|
-
- `:clear` to remove all conditions from Cartesian space
|
|
513
|
-
- `:print` default command, prints all the conditions on the Cartesian space
|
|
514
|
-
- `index`: index of the condition set to Cartesian space, it is used to remove specified condition
|
|
515
|
-
- `block`: definition of the condition, it should return `true` or `false` to avoid unpredictable behavior
|
|
516
|
-
|
|
517
|
-
Example:
|
|
518
|
-
```ruby
|
|
519
|
-
s.cond(:set) { |v| v.dim1 > v.dim3 }
|
|
520
|
-
s.cond # defaults to s.cond(:print) and shows all the conditions in the form 'index | definition'
|
|
521
|
-
s.cond(:unset, 0) # remove the condition
|
|
522
|
-
s.cond(:clear) # remove all conditions, if any
|
|
523
|
-
```
|
|
210
|
+
Bug reports and pull requests are welcome on [GitHub](https://github.com/Yuri-Rassokhin/flex-cartesian).
|
|
211
|
+
|
|
212
|
+
If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement".
|
|
524
213
|
|
|
214
|
+
1. Fork the Project
|
|
215
|
+
2. Create your Feature Branch (`git checkout -b feature/AmazingFeature`)
|
|
216
|
+
3. Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
|
|
217
|
+
4. Push to the Branch (`git push origin feature/AmazingFeature`)
|
|
218
|
+
5. Open a Pull Request
|
|
525
219
|
|
|
220
|
+
This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to standard open-source etiquette.
|
|
526
221
|
|
|
527
222
|
## License
|
|
528
223
|
|
|
529
|
-
This project is licensed under the terms of the GNU General Public License v3.0.
|
|
530
|
-
See [LICENSE](LICENSE) for more details.
|
|
224
|
+
This project is licensed under the terms of the GNU General Public License v3.0. See [LICENSE](LICENSE) for more details.
|