@grnsft/if 0.2.1 → 0.3.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (87)
  1. package/.commitlintrc.js +4 -7
  2. package/README.md +5 -1
  3. package/Refactor-migration-guide.md +342 -0
  4. package/build/{models/export-csv.d.ts → builtins/export-csv-raw.d.ts} +1 -1
  5. package/build/builtins/export-csv-raw.js +129 -0
  6. package/build/builtins/export-csv.d.ts +10 -0
  7. package/build/builtins/export-csv.js +83 -0
  8. package/build/builtins/export-log.js +18 -0
  9. package/build/builtins/export-yaml.js +27 -0
  10. package/build/builtins/group-by.d.ts +5 -0
  11. package/build/builtins/group-by.js +56 -0
  12. package/build/{models → builtins}/index.js +1 -1
  13. package/build/{models → builtins}/time-sync.d.ts +2 -2
  14. package/build/builtins/time-sync.js +304 -0
  15. package/build/config/config.js +2 -2
  16. package/build/config/params.js +56 -1
  17. package/build/config/strings.d.ts +4 -2
  18. package/build/config/strings.js +7 -9
  19. package/build/index.js +3 -1
  20. package/build/lib/aggregate.d.ts +1 -1
  21. package/build/lib/aggregate.js +4 -3
  22. package/build/lib/compute.js +9 -10
  23. package/build/lib/exhaust.js +8 -5
  24. package/build/lib/initialize.d.ts +2 -2
  25. package/build/lib/initialize.js +9 -5
  26. package/build/lib/load.d.ts +33 -2
  27. package/build/lib/load.js +4 -4
  28. package/build/lib/parameterize.d.ts +1 -1
  29. package/build/lib/parameterize.js +6 -4
  30. package/build/types/aggregation.d.ts +0 -1
  31. package/build/types/aggregation.js +1 -1
  32. package/build/types/compute.d.ts +3 -3
  33. package/build/types/compute.js +1 -1
  34. package/build/types/interface.d.ts +14 -1
  35. package/build/types/interface.js +6 -1
  36. package/build/types/manifest.d.ts +9 -40
  37. package/build/types/manifest.js +1 -1
  38. package/build/types/parameters.d.ts +2 -12
  39. package/build/types/parameters.js +1 -1
  40. package/build/types/plugin-storage.d.ts +6 -0
  41. package/build/types/plugin-storage.js +3 -0
  42. package/build/util/aggregation-helper.d.ts +1 -1
  43. package/build/util/aggregation-helper.js +11 -12
  44. package/build/util/errors.d.ts +1 -1
  45. package/build/util/errors.js +2 -2
  46. package/build/util/json.js +25 -5
  47. package/build/util/log-memoize.d.ts +3 -0
  48. package/build/util/log-memoize.js +19 -0
  49. package/build/util/logger.d.ts +1 -1
  50. package/build/util/logger.js +2 -3
  51. package/build/util/plugin-storage.d.ts +14 -0
  52. package/build/util/plugin-storage.js +34 -0
  53. package/build/util/validations.d.ts +144 -11
  54. package/build/util/validations.js +19 -17
  55. package/examples/manifests/asim-demo.yml +74 -0
  56. package/examples/manifests/basic-demo.yml +4 -9
  57. package/examples/manifests/boavizta-cloud.yml +20 -0
  58. package/examples/manifests/cim.yml +3 -3
  59. package/examples/manifests/divide.yml +38 -0
  60. package/examples/manifests/generics.yml +4 -4
  61. package/examples/manifests/group-by.yml +13 -13
  62. package/examples/manifests/mock-observation.yml +2 -2
  63. package/examples/manifests/nesting-demo.yml +2 -2
  64. package/examples/manifests/nesting.yml +16 -16
  65. package/examples/manifests/pipeline-demo-1.yml +4 -4
  66. package/examples/manifests/pipeline-demo-2.yml +9 -9
  67. package/examples/manifests/pipeline-demo.yml +10 -10
  68. package/examples/manifests/pipeline-teads-sci.yml +4 -4
  69. package/examples/manifests/pipeline-with-generics.yml +8 -8
  70. package/examples/manifests/pipeline-with-mocks.yml +10 -10
  71. package/examples/manifests/regex.yml +23 -0
  72. package/package.json +6 -3
  73. package/src/builtins/README.md +852 -0
  74. package/build/models/export-csv.js +0 -129
  75. package/build/models/export-log.js +0 -18
  76. package/build/models/export-yaml.js +0 -24
  77. package/build/models/group-by.d.ts +0 -11
  78. package/build/models/group-by.js +0 -56
  79. package/build/models/time-sync.js +0 -304
  80. package/build/types/initialize.d.ts +0 -4
  81. package/build/types/initialize.js +0 -3
  82. package/build/types/load.d.ts +0 -7
  83. package/build/types/load.js +0 -3
  84. package/src/models/README.md +0 -268
  85. /package/build/{models → builtins}/export-log.d.ts +0 -0
  86. /package/build/{models → builtins}/export-yaml.d.ts +0 -0
  87. /package/build/{models → builtins}/index.d.ts +0 -0
@@ -0,0 +1,852 @@
1
+ # IF builtins
2
+
3
+ There are three built-in features of IF:
4
+
5
+ - time-sync
6
+ - CSV exporter
7
+ - groupby
8
+
9
+ On this page, you can find the documentation for each of these three builtins.
10
+
11
+ ## Time-sync
12
+
13
+ Time sync standardizes the start time, end time and temporal resolution of all output data across an entire tree.
14
+
15
+ ### Parameters
16
+
17
+ #### Plugin config
18
+
19
+ The following should be defined in the plugin initialization:
20
+
21
+ - `start-time`: global start time as ISO 8601 string
22
+ - `end-time`: global end time as ISO 8601 string
23
+ - `interval`: temporal resolution in seconds
24
+ - `error-on-padding`: avoid zero/'zeroish' padding (if needed) and error out instead. `false` by default.
25
+
26
+ #### Inputs:
27
+
28
+ - `inputs`: an array of observations
29
+
30
+ #### Returns
31
+
32
+ - `inputs`: time-synchronized version of the tree
33
+
34
+
35
+
36
+
37
+ #### Overview
38
+
39
+ A manifest file for a tree might contain many nodes, each representing a different part of an application's stack or even different applications running on different machines. It is therefore common to have time series data in each component that is not directly comparable to other components, either because the temporal resolution of the data is different, they cover different periods, or there are gaps in some records (e.g. some apps might burst but then go dormant, while others run continuously). This makes post-hoc visualization, analysis and aggregation of data from groups of nodes difficult to achieve. To address this, we created a time synchronization plugin that takes in non-uniform time series and snaps them all to a regular timeline with uniform start time, end time and temporal resolution.
40
+
41
+ We do this by implementing the following logic:
42
+
43
+ - Shift readings to nearest whole seconds
44
+ - Upsample the time series to a base resolution (1s)
45
+ - Resample to desired resolution by batching 1s entries
46
+ - Extrapolate or trim to ensure all time series share global start and end dates
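The first step in the list, shifting readings to the nearest whole second, can be pictured with a small TypeScript sketch. The function name `roundToNearestSecond` is hypothetical and chosen for this example only; it is not taken from the IF codebase.

```ts
// Hypothetical helper: snaps an ISO 8601 timestamp to the nearest whole second.
function roundToNearestSecond(timestamp: string): string {
  const ms = new Date(timestamp).getTime();
  // Round the millisecond value to a multiple of 1000 and format it back to ISO 8601.
  return new Date(Math.round(ms / 1000) * 1000).toISOString();
}
```

Calling `roundToNearestSecond('2023-12-12T00:00:00.600Z')` yields a timestamp one second later, while a reading at `.400Z` is shifted back to the start of its second.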
47
+
48
+ The next section explains each stage in more detail.
49
+
50
+ #### Details
51
+
52
+ ##### Upsampling rules
53
+
54
+ A set of `inputs` is naturally a time series because all `observations` include a `timestamp` and a `duration`, measured in seconds.
55
+ For each `observation` in `inputs` we check whether the duration is greater than 1 second. If `duration` is greater than 1 second, we create N new `observation` objects, where N is equal to `duration`. This means we have an `observation` for every second between the initial timestamp and the end of the observation period. Each new object receives a timestamp incremented by one second.
56
+
57
+ This looks as follows:
58
+
59
+ ```ts
60
+ [{timestamp: '2023-12-12T00:00:00.000Z', duration: 5}]
61
+
62
+ // becomes
63
+ [
64
+ {timestamp: '2023-12-12T00:00:00.000Z', duration: 1},
65
+ {timestamp: '2023-12-12T00:00:01.000Z', duration: 1},
66
+ {timestamp: '2023-12-12T00:00:02.000Z', duration: 1},
67
+ {timestamp: '2023-12-12T00:00:03.000Z', duration: 1},
68
+ {timestamp: '2023-12-12T00:00:04.000Z', duration: 1}
69
+ ]
70
+ ```
71
+
72
+ Each `observation` actually includes many key-value pairs. The precise content of the `observation` is not known until runtime because it depends on which plugins have been included in the pipeline. Different values have to be treated differently when we upsample in time. The method we use to upsample depends on the `aggregation-method` defined for each key in `units.yml`.
73
+
74
+ If the right way to aggregate a value is to sum it, then the right way to upsample it is to divide by `duration`, effectively spreading the total out evenly across the new, higher resolution `observations` so that the total across the same bucket of time is unchanged (i.e. if the total for some value is 10 when there is one entry with `duration = 10s`, then the total should still be 10 when there are 10 entries each with `duration = 1s`).
75
+
76
+ On the other hand, if the right way to aggregate a value is to take its average over some time period, the value should be copied unchanged into the newly upsampled `observations`. This is appropriate for values that are proportional or percentages, such as `cpu/utilization`. Treating these values as constants means the average over the `duration` for an observation is identical whether you consider the initial `observation` or the upsampled set of N `observation`s.
77
+
78
+ Constants can simply be copied as-is. An example is `grid/carbon-intensity`: this value does not change depending on how frequently you observe it.
79
+
80
+ Therefore, we apply this logic and the resulting flow looks as follows (the `aggregation-method` for `carbon` and `energy` is `sum`, `grid/carbon-intensity` is a constant and `cpu/utilization` is expressed as a percentage):
81
+
82
+ ```ts
83
+ [{timestamp: '2023-12-12T00:00:00.000Z', duration: 5, 'cpu/utilization': 12, carbon: 5, energy: 10, 'grid/carbon-intensity': 471}]
84
+
85
+ // becomes
86
+
87
+ [
88
+ {timestamp: '2023-12-12T00:00:00.000Z', duration: 1, 'cpu/utilization': 12, carbon: 1, energy: 2, 'grid/carbon-intensity': 471},
89
+ {timestamp: '2023-12-12T00:00:01.000Z', duration: 1, 'cpu/utilization': 12, carbon: 1, energy: 2, 'grid/carbon-intensity': 471},
90
+ {timestamp: '2023-12-12T00:00:02.000Z', duration: 1, 'cpu/utilization': 12, carbon: 1, energy: 2, 'grid/carbon-intensity': 471},
91
+ {timestamp: '2023-12-12T00:00:03.000Z', duration: 1, 'cpu/utilization': 12, carbon: 1, energy: 2, 'grid/carbon-intensity': 471},
92
+ {timestamp: '2023-12-12T00:00:04.000Z', duration: 1, 'cpu/utilization': 12, carbon: 1, energy: 2, 'grid/carbon-intensity': 471}
94
+ ]
95
+ ```
96
+
97
+ The end result is that for each `observation`, we upsample the time series to yield 1 second resolution data between `timestamp` and `timestamp + duration`.
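As a rough illustration of the upsampling rules above, the following TypeScript sketch expands one observation into 1 second entries. The `AGGREGATION_METHODS` table and the `upsample` function are hypothetical names chosen for this example and are not part of the IF codebase; in IF the aggregation method for each key comes from the project's parameter definitions.

```ts
type AggregationMethod = 'sum' | 'avg' | 'none';
type Observation = {timestamp: string; duration: number; [key: string]: string | number};

// Assumed lookup table, for illustration only.
const AGGREGATION_METHODS: Partial<Record<string, AggregationMethod>> = {
  carbon: 'sum',
  energy: 'sum',
  'cpu/utilization': 'avg',
  'grid/carbon-intensity': 'none',
};

function upsample(observation: Observation): Observation[] {
  const startMs = new Date(observation.timestamp).getTime();
  const result: Observation[] = [];
  // One entry per second, starting at the original timestamp.
  for (let second = 0; second < observation.duration; second++) {
    const entry: Observation = {
      timestamp: new Date(startMs + second * 1000).toISOString(),
      duration: 1,
    };
    for (const [key, value] of Object.entries(observation)) {
      if (key === 'timestamp' || key === 'duration') continue;
      const method = AGGREGATION_METHODS[key] ?? 'none';
      // Summed values are spread evenly; averaged values and constants are copied.
      entry[key] =
        method === 'sum' && typeof value === 'number'
          ? value / observation.duration
          : value;
    }
    result.push(entry);
  }
  return result;
}
```

Spreading summed values evenly keeps the total per time bucket unchanged, which is the property the text above relies on.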
98
+
99
+ ##### Gap-filling
100
+
101
+ Sometimes there might be discontinuities in the time series between one `observation` and another. For example, we might have two `observations` in a set of `inputs` that have timestamps spaced 10 seconds apart, but the `duration` of the first `observation` is only 5 seconds. In this case, 5 seconds of data are unaccounted for and create a discontinuity in the time series.
102
+
103
+ To solve this problem, for all but the first `observation` in the `inputs` array, we grab the `timestamp` and `duration` from the previous `observation` and check that `timestamp[N] + duration[N] == timestamp[N+1]`. If this condition is not satisfied, we backfill the missing data with a "zero-observation" which is identical to the surrounding observations except any values whose `aggregation-method` is `sum` are set to zero. This is equivalent to assuming that when there is no data available, the app being monitored is switched off.
104
+
105
+ The end result of this gap-filling is that we have continuous 1 second resolution data that can be resampled to a new temporal resolution.
106
+
107
+ ```ts
108
+ [
109
+ {timestamp: '2023-12-12T00:00:00.000Z', duration: 5, 'cpu/utilization': 12, carbon: 5, energy: 10, 'grid/carbon-intensity': 471},
110
+ {timestamp: '2023-12-12T00:00:08.000Z', duration: 2, 'cpu/utilization': 12, carbon: 5, energy: 10, 'grid/carbon-intensity': 471}
111
+ ]
112
+
113
+ // There are 3 seconds of missing data between the end of timestamp[0] + duration, and timestamp[1]
114
+ // After expansion and infilling, the array becomes:
115
+
116
+ [
117
+ {timestamp: '2023-12-12T00:00:00.000Z', duration: 1, 'cpu/utilization': 12, carbon: 1, energy: 2, 'grid/carbon-intensity': 471},
118
+ {timestamp: '2023-12-12T00:00:01.000Z', duration: 1, 'cpu/utilization': 12, carbon: 1, energy: 2, 'grid/carbon-intensity': 471},
119
+ {timestamp: '2023-12-12T00:00:02.000Z', duration: 1, 'cpu/utilization': 12, carbon: 1, energy: 2, 'grid/carbon-intensity': 471},
120
+ {timestamp: '2023-12-12T00:00:03.000Z', duration: 1, 'cpu/utilization': 12, carbon: 1, energy: 2, 'grid/carbon-intensity': 471},
121
+ {timestamp: '2023-12-12T00:00:04.000Z', duration: 1, 'cpu/utilization': 12, carbon: 1, energy: 2, 'grid/carbon-intensity': 471},
122
+ {timestamp: '2023-12-12T00:00:05.000Z', duration: 1, 'cpu/utilization': 0, carbon: 0, energy: 0, 'grid/carbon-intensity': 471},
123
+ {timestamp: '2023-12-12T00:00:06.000Z', duration: 1, 'cpu/utilization': 0, carbon: 0, energy: 0, 'grid/carbon-intensity': 471},
124
+ {timestamp: '2023-12-12T00:00:07.000Z', duration: 1, 'cpu/utilization': 0, carbon: 0, energy: 0, 'grid/carbon-intensity': 471},
125
+ {timestamp: '2023-12-12T00:00:08.000Z', duration: 1, 'cpu/utilization': 12, carbon: 2.5, energy: 5, 'grid/carbon-intensity': 471},
126
+ {timestamp: '2023-12-12T00:00:09.000Z', duration: 1, 'cpu/utilization': 12, carbon: 2.5, energy: 5, 'grid/carbon-intensity': 471}
127
+ ]
128
+ ```
129
+
130
+ Note that when `error-on-padding` is `true` no gap-filling is performed and the plugin will error out instead.
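The gap check described above, `timestamp[N] + duration[N] == timestamp[N+1]`, can be sketched in TypeScript as follows. This is a minimal illustration under stated assumptions, not the plugin's implementation: `SUM_KEYS`, `zeroObservation` and `fillGaps` are hypothetical names, and the sketch follows the rule as written in the prose, zeroing only the values whose `aggregation-method` is `sum`.

```ts
type Observation = {timestamp: string; duration: number; [key: string]: string | number};

// Assumed for illustration: the keys whose aggregation method is `sum`.
const SUM_KEYS = new Set(['carbon', 'energy']);

// Copies a neighbouring observation, sets summed values to zero, duration to 1 second.
function zeroObservation(template: Observation, timestampMs: number): Observation {
  const zero: Observation = {
    ...template,
    timestamp: new Date(timestampMs).toISOString(),
    duration: 1,
  };
  for (const key of Object.keys(zero)) {
    if (SUM_KEYS.has(key)) zero[key] = 0;
  }
  return zero;
}

function fillGaps(inputs: Observation[]): Observation[] {
  const filled: Observation[] = [];
  inputs.forEach((current, index) => {
    if (index > 0) {
      const previous = inputs[index - 1];
      const expectedStart =
        new Date(previous.timestamp).getTime() + previous.duration * 1000;
      const actualStart = new Date(current.timestamp).getTime();
      // Insert one zero-observation for every missing second.
      for (let t = expectedStart; t < actualStart; t += 1000) {
        filled.push(zeroObservation(previous, t));
      }
    }
    filled.push(current);
  });
  return filled;
}
```

The sketch fills gaps between the original observations; combined with upsampling of each observation, it gives a continuous 1 second series.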
131
+
132
+ ##### Trimming and padding
133
+
134
+ To ensure parity across all the components in a tree, we need to synchronize the start and end times for all time series. To do this, we pass the `time-sync` plugin some global config: `startTime`, `endTime` and `interval`. The `startTime` is the timestamp where _all_ input arrays across the entire tree should begin, and `endTime` is the timestamp where _all_ input arrays across the entire tree should end. `interval` is the time resolution we ultimately want to resample to.
135
+
136
+ To synchronize the time series start and end we check the first element of `inputs` for each node in the tree and determine whether it is earlier, later or equal to the global start time. If it is equal then no action is required. If the `input` start time is earlier than the global start time, we simply discard entries from the front of the array until the start times are aligned. If the `input` start time is after the global start time, then we pad with our "zero-observation" object - one for every second separating the global start time from the `input` start time. The same process is repeated for the end time - we either trim away `input` data or pad it out with "zero-observation" objects.
137
+
138
+ For example, for `startTime = 2023-12-12T00:00:00.000Z` and `endTime = 2023-12-12T00:00:15.000Z`:
139
+
140
+ ```ts
141
+ [
142
+ {timestamp: '2023-12-12T00:00:05.000Z', duration: 5, 'cpu/utilization': 12, carbon: 5, energy: 10, 'grid/carbon-intensity': 471},
143
+ ]
144
+
145
+ // There are 5 seconds missing from the start and end. After padding, the array becomes:
146
+
147
+ [
148
+ {timestamp: '2023-12-12T00:00:00.000Z', duration: 1, 'cpu/utilization': 0, carbon: 0, energy: 0, 'grid/carbon-intensity': 471},
149
+ {timestamp: '2023-12-12T00:00:01.000Z', duration: 1, 'cpu/utilization': 0, carbon: 0, energy: 0, 'grid/carbon-intensity': 471},
150
+ {timestamp: '2023-12-12T00:00:02.000Z', duration: 1, 'cpu/utilization': 0, carbon: 0, energy: 0, 'grid/carbon-intensity': 471},
151
+ {timestamp: '2023-12-12T00:00:03.000Z', duration: 1, 'cpu/utilization': 0, carbon: 0, energy: 0, 'grid/carbon-intensity': 471},
152
+ {timestamp: '2023-12-12T00:00:04.000Z', duration: 1, 'cpu/utilization': 0, carbon: 0, energy: 0, 'grid/carbon-intensity': 471},
153
+ {timestamp: '2023-12-12T00:00:05.000Z', duration: 1, 'cpu/utilization': 12, carbon: 1, energy: 2, 'grid/carbon-intensity': 471},
154
+ {timestamp: '2023-12-12T00:00:06.000Z', duration: 1, 'cpu/utilization': 12, carbon: 1, energy: 2, 'grid/carbon-intensity': 471},
155
+ {timestamp: '2023-12-12T00:00:07.000Z', duration: 1, 'cpu/utilization': 12, carbon: 1, energy: 2, 'grid/carbon-intensity': 471},
156
+ {timestamp: '2023-12-12T00:00:08.000Z', duration: 1, 'cpu/utilization': 12, carbon: 1, energy: 2, 'grid/carbon-intensity': 471},
157
+ {timestamp: '2023-12-12T00:00:09.000Z', duration: 1, 'cpu/utilization': 12, carbon: 1, energy: 2, 'grid/carbon-intensity': 471},
158
+ {timestamp: '2023-12-12T00:00:10.000Z', duration: 1, 'cpu/utilization': 0, carbon: 0, energy: 0, 'grid/carbon-intensity': 471},
159
+ {timestamp: '2023-12-12T00:00:11.000Z', duration: 1, 'cpu/utilization': 0, carbon: 0, energy: 0, 'grid/carbon-intensity': 471},
160
+ {timestamp: '2023-12-12T00:00:12.000Z', duration: 1, 'cpu/utilization': 0, carbon: 0, energy: 0, 'grid/carbon-intensity': 471},
161
+ {timestamp: '2023-12-12T00:00:13.000Z', duration: 1, 'cpu/utilization': 0, carbon: 0, energy: 0, 'grid/carbon-intensity': 471},
162
+ {timestamp: '2023-12-12T00:00:14.000Z', duration: 1, 'cpu/utilization': 0, carbon: 0, energy: 0, 'grid/carbon-intensity': 471}
163
+
164
+ ]
165
+ ```
166
+
167
+ Note that when `error-on-padding` is `true` no padding is performed and the plugin will error out instead.
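The trimming and padding step can be sketched in TypeScript for a series that is already at 1 second resolution. The names `SUM_KEYS`, `zeroObservation` and `trimAndPad` are hypothetical, and treating the end time as exclusive is an assumption made to match the example above, where the last entry starts one second before `endTime`. As in the prose, only summed values are zeroed here.

```ts
type Observation = {timestamp: string; duration: number; [key: string]: string | number};

// Assumed for illustration: the keys whose aggregation method is `sum`.
const SUM_KEYS = new Set(['carbon', 'energy']);
const toMs = (timestamp: string): number => new Date(timestamp).getTime();

function zeroObservation(template: Observation, ms: number): Observation {
  const zero: Observation = {...template, timestamp: new Date(ms).toISOString(), duration: 1};
  for (const key of Object.keys(zero)) {
    if (SUM_KEYS.has(key)) zero[key] = 0;
  }
  return zero;
}

// Expects `series` to be continuous, sorted, 1 second resolution data.
function trimAndPad(series: Observation[], startTime: string, endTime: string): Observation[] {
  if (series.length === 0) return [];
  const startMs = toMs(startTime);
  const endMs = toMs(endTime);
  const first = series[0];
  const last = series[series.length - 1];

  // Pad the front with one zero-observation per missing second.
  const front: Observation[] = [];
  for (let t = startMs; t < Math.min(toMs(first.timestamp), endMs); t += 1000) {
    front.push(zeroObservation(first, t));
  }
  // Trim entries that fall outside the global window.
  const kept = series.filter(o => toMs(o.timestamp) >= startMs && toMs(o.timestamp) < endMs);
  // Pad the back in the same way.
  const back: Observation[] = [];
  for (let t = Math.max(toMs(last.timestamp) + 1000, startMs); t < endMs; t += 1000) {
    back.push(zeroObservation(last, t));
  }
  return [...front, ...kept, ...back];
}
```

A version that honours `error-on-padding` would throw instead of filling `front` or `back` whenever either loop has work to do.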
168
+
169
+ ##### Resampling rules
170
+
171
+ Now that we have synchronized, continuous, high resolution time series data, we can resample. To achieve this, we use `interval`, which sets the global temporal resolution for the final, processed time series. `interval` is expressed in units of seconds, which means we can simply batch `observations` together in groups of size `interval`. For each value in each object we either sum, average or copy the values into one single summary object representing each time bucket of size `interval`, depending on the `aggregation-method` defined in `params.ts`. The returned array is the final, synchronized time series at the desired temporal resolution.
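The batching described above can be sketched in TypeScript. This is an illustration under assumptions rather than the plugin's code: `METHODS` stands in for the aggregation methods defined in `params.ts`, `resample` is a hypothetical name, and the input is assumed to be a continuous 1 second series.

```ts
type AggregationMethod = 'sum' | 'avg' | 'none';
type Observation = {timestamp: string; duration: number; [key: string]: string | number};

// Assumed lookup table, for illustration only.
const METHODS: Partial<Record<string, AggregationMethod>> = {
  carbon: 'sum',
  energy: 'sum',
  'cpu/utilization': 'avg',
  'grid/carbon-intensity': 'none',
};

function resample(series: Observation[], interval: number): Observation[] {
  const result: Observation[] = [];
  for (let i = 0; i < series.length; i += interval) {
    const batch = series.slice(i, i + interval);
    // Each batch entry lasts 1 second, so the bucket duration is the batch size.
    const summary: Observation = {timestamp: batch[0].timestamp, duration: batch.length};
    for (const key of Object.keys(batch[0])) {
      if (key === 'timestamp' || key === 'duration') continue;
      const method = METHODS[key] ?? 'none';
      const first = batch[0][key];
      if (method === 'none' || typeof first !== 'number') {
        summary[key] = first; // constants and non-numeric values are copied
        continue;
      }
      const total = batch.reduce((sum, o) => sum + Number(o[key]), 0);
      summary[key] = method === 'sum' ? total : total / batch.length;
    }
    result.push(summary);
  }
  return result;
}
```

If the series length is not a multiple of `interval`, the final bucket in this sketch is simply shorter than the others.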
172
+
173
+
174
+ #### Assumptions and limitations
175
+
176
+ To do time synchronization, we assume:
177
+
178
+ - There is no environmental impact for an application when there is no data available.
179
+ - Evenly distributing the total for a `duration` across higher resolution `observations` is appropriate, as opposed to having some non-uniform distribution.
180
+
181
+
182
+ ### TypeScript implementation
183
+
184
+
185
+ To run the plugin, you must first create an instance of `TimeSync`.
186
+ Then, you can call `execute()`.
187
+
188
+ ```typescript
189
+ const globalConfig = {
190
+ 'start-time': '2023-12-12T00:00:00.000Z',
191
+ 'end-time': '2023-12-12T00:00:30.000Z',
192
+ interval: 10
193
+ }
194
+ const timeSync = TimeSync(globalConfig);
195
+ const results = timeSync.execute([
196
+ {
197
+ timestamp: '2023-12-12T00:00:00.000Z',
198
+ duration: 10,
199
+ 'cpu/utilization': 10,
200
+ carbon: 100,
201
+ energy: 100,
202
+ requests: 300
203
+ },
204
+ {
205
+ timestamp: '2023-12-12T00:00:10.000Z',
206
+ duration: 10,
207
+ 'cpu/utilization': 20,
208
+ carbon: 100,
209
+ energy: 100,
210
+ requests: 380
211
+ }
212
+ ])
213
+ ```
214
+
215
+ ### Example manifest
216
+
217
+ IF users will typically call the plugin as part of a pipeline defined in a `manifest`
218
+ file. In this case, instantiating and configuring the plugin is handled by
219
+ `ie` and does not have to be done explicitly by the user.
220
+ The following is an example `manifest` that calls `time-sync`:
221
+
222
+ ```yaml
223
+ name: time-sync-demo
224
+ description: impl with 2 levels of nesting with non-uniform timing of observations
225
+ tags:
226
+ initialize:
227
+ plugins:
228
+ teads-curve:
229
+ method: TeadsCurve
230
+ path: '@grnsft/if-unofficial-plugins'
231
+ sci-e:
232
+ method: SciE
233
+ path: '@grnsft/if-plugins'
234
+ sci-m:
235
+ path: '@grnsft/if-plugins'
236
+ method: SciM
237
+ sci-o:
238
+ method: SciO
239
+ path: '@grnsft/if-plugins'
240
+ time-sync:
241
+ method: TimeSync
242
+ path: builtin
243
+ global-config:
244
+ start-time: '2023-12-12T00:00:00.000Z' # ISO timestamp
245
+ end-time: '2023-12-12T00:01:00.000Z' # ISO timestamp
246
+ interval: 5 # seconds
247
+ tree:
248
+ children:
249
+ child: # an advanced grouping node
250
+ pipeline:
251
+ - teads-curve
252
+ - sci-e
253
+ - sci-m
254
+ - sci-o
255
+ - time-sync
256
+ config:
257
+ teads-curve:
258
+ cpu/thermal-design-power: 65
259
+ sci-m:
260
+ device/emissions-embodied: 251000 # gCO2eq
261
+ time-reserved: 3600 # 1 hour in s
262
+ device/expected-lifespan: 126144000 # 4 years in seconds
263
+ resources-reserved: 1
264
+ resources-total: 1
265
+ sci-o:
266
+ grid/carbon-intensity: 457 # gCO2/kwh
267
+ children:
268
+ child-1:
269
+ inputs:
270
+ - timestamp: '2023-12-12T00:00:00.000Z'
271
+ duration: 10
272
+ cpu/utilization: 10
273
+ carbon: 100
274
+ energy: 100
275
+ requests: 300
276
+ - timestamp: '2023-12-12T00:00:10.000Z'
277
+ duration: 10
278
+ cpu/utilization: 20
279
+ carbon: 200
280
+ energy: 200
281
+ requests: 380
282
+
283
+ ```
284
+
285
+
286
+ ## CSV Exporter
287
+
288
+ IF supports exporting data to CSV files. This provides users with a data format that enables visualization and data analysis using standard data analysis tools.
289
+
290
+ ### Manifest config
291
+
292
+ To export your data to a CSV file, you have to provide a small piece of config data to your manifest file:
293
+
294
+ ```yaml
295
+ initialize:
296
+ outputs:
297
+ - csv
298
+ ```
299
+
300
+ You can also add `- yaml` if you want to export to both `yaml` and `csv` simultaneously.
301
+
302
+ ### CLI command
303
+
304
+ Then, you must select the metric you want to export to CSV. The name of that metric must be appended to the savepath passed to the `--output` flag in the CLI, after a hash symbol (`#`).
305
+
306
+ For example, to export the `carbon` data from your tree to a CSV file:
307
+
308
+ ```sh
309
+ ie --manifest example.yml --output example#carbon
310
+ ```
311
+
312
+ This will save a CSV file called `example.csv`. The contents will look similar to the following:
313
+
314
+ | | | | | |
315
+ | ---------------------------------------------- | ---------------- | ---------------------------- | ---------------------------- | ---------------------------- |
316
+ | **Path** | **Aggregated** | **2024-03-05T00:00:00.000Z** | **2024-03-05T00:05:00.000Z** | **2024-03-05T00:10:00.000Z** |
317
+ | tree.carbon | 425.289232008725 | 17.9269877157543 | 8.9024388783018 | 45.6021901509012 |
318
+ | tree.children.westus3.carbon | 104.696836722878 | 3.59973803197887 | 3.47438149032372 | 6.91318436533634 |
319
+ | tree.children.westus3.children.server-1.carbon | 104.696836722878 | 3.59973803197887 | 3.47438149032372 | 6.91318436533634 |
320
+ | tree.children.france.carbon | 320.592395285847 | 14.3272496837754 | 5.42805738797808 | 38.6890057855649 |
321
+ | tree.children.france.children.server-2.carbon | 320.592395285847 | 14.3272496837754 | 5.42805738797808 | 38.6890057855649 |
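The layout of the table above, with one row per node path, an aggregated column and one column per timestamp, can be reproduced with a short TypeScript sketch. The `TreeNode` shape and the functions `collectRows` and `toCsv` are hypothetical and written for this example; they are not the exporter's actual code, and the sketch assumes every node shares the root's timestamps.

```ts
type Output = {timestamp: string; [key: string]: string | number};
type TreeNode = {
  outputs?: Output[];
  aggregated?: Record<string, number>;
  children?: Record<string, TreeNode>;
};

// Walks the tree depth-first and emits one row per node that carries the metric.
function collectRows(node: TreeNode, path: string, metric: string, rows: string[][] = []): string[][] {
  const values = (node.outputs ?? []).map(o => String(o[metric] ?? ''));
  const aggregated = node.aggregated?.[metric];
  if (values.length > 0 || aggregated !== undefined) {
    rows.push([`${path}.${metric}`, String(aggregated ?? ''), ...values]);
  }
  for (const [name, child] of Object.entries(node.children ?? {})) {
    collectRows(child, `${path}.children.${name}`, metric, rows);
  }
  return rows;
}

function toCsv(tree: TreeNode, metric: string): string {
  // Column headers come from the root node's output timestamps.
  const timestamps = (tree.outputs ?? []).map(o => o.timestamp);
  const header = ['Path', 'Aggregated', ...timestamps];
  return [header, ...collectRows(tree, 'tree', metric)]
    .map(row => row.join(','))
    .join('\n');
}
```

Because only the selected metric is kept, each node collapses to a single row, which is why the CSV is so much more compact than the tree it is derived from.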
322
+
323
+
324
+ ### Comparing CSV to Yaml
325
+
326
+ The CSV above is generated from the following yaml. The `carbon` metric is extracted and added to the CSV. Otherwise, the CSV is an exact representation of the following yaml tree. You can see that the CSV representation is *much* easier to understand than the full yaml tree:
327
+
328
+ ```yaml
329
+ tree:
330
+ pipeline:
331
+ - mock-observations
332
+ - group-by
333
+ - cloud-metadata
334
+ - time-sync
335
+ - watttime
336
+ - teads-curve
337
+ - operational-carbon
338
+ defaults:
339
+ grid/carbon-intensity: 500
340
+ config:
341
+ group-by:
342
+ group:
343
+ - cloud/region
344
+ - name
345
+ children:
346
+ westus3:
347
+ children:
348
+ server-1:
349
+ inputs:
350
+ - timestamp: '2024-03-05T00:00:00.000Z'
351
+ duration: 300
352
+ name: server-1
353
+ cloud/instance-type: Standard_E64_v3
354
+ cloud/region: westus3
355
+ cloud/vendor: azure
356
+ cpu/utilization: 66
357
+ grid/carbon-intensity: 500
358
+ - timestamp: '2024-03-05T00:05:00.000Z'
359
+ duration: 300
360
+ name: server-1
361
+ cloud/instance-type: Standard_E64_v3
362
+ cloud/region: westus3
363
+ cloud/vendor: azure
364
+ cpu/utilization: 4
365
+ grid/carbon-intensity: 500
366
+ - timestamp: '2024-03-05T00:10:00.000Z'
367
+ duration: 300
368
+ name: server-1
369
+ cloud/instance-type: Standard_E64_v3
370
+ cloud/region: westus3
371
+ cloud/vendor: azure
372
+ cpu/utilization: 54
373
+ grid/carbon-intensity: 500
374
+ - timestamp: '2024-03-05T00:15:00.000Z'
375
+ duration: 300
376
+ name: server-1
377
+ cloud/instance-type: Standard_E64_v3
378
+ cloud/region: westus3
379
+ cloud/vendor: azure
380
+ cpu/utilization: 19
381
+ grid/carbon-intensity: 500
382
+ outputs:
383
+ - timestamp: '2024-03-05T00:00:00.000Z'
384
+ duration: 300
385
+ name: server-1
386
+ cloud/instance-type: Standard_E64_v3
387
+ cloud/region: westus3
388
+ cloud/vendor: azure
389
+ cpu/utilization: 65.78
390
+ grid/carbon-intensity: 369.4947514218548
391
+ vcpus-allocated: 64
392
+ vcpus-total: 64
393
+ memory-available: 432
394
+ physical-processor: >-
395
+ Intel® Xeon® Platinum 8370C,Intel® Xeon® Platinum 8272CL,Intel®
396
+ Xeon® 8171M 2.1 GHz,Intel® Xeon® E5-2673 v4 2.3 GHz
397
+ cpu/thermal-design-power: 269.1
398
+ cloud/region-cfe: CAISO
399
+ cloud/region-em-zone-id: US-CAL-CISO
400
+ cloud/region-wt-id: CAISO_NORTH
401
+ cloud/region-location: US West (N. California)
402
+ cloud/region-geolocation: 34.0497,-118.1326
403
+ geolocation: 34.0497,-118.1326
404
+ cpu/energy: 0.018934842060004835
405
+ carbon: 6.996324760173567
406
+ - timestamp: '2024-03-05T00:05:00.000Z'
407
+ duration: 300
408
+ name: server-1
409
+ cloud/instance-type: Standard_E64_v3
410
+ cloud/region: westus3
411
+ cloud/vendor: azure
412
+ cpu/utilization: 3.986666666666667
413
+ grid/carbon-intensity: 369.38452029076234
414
+ vcpus-allocated: 64
415
+ vcpus-total: 64
416
+ memory-available: 432
417
+ physical-processor: >-
418
+ Intel® Xeon® Platinum 8370C,Intel® Xeon® Platinum 8272CL,Intel®
419
+ Xeon® 8171M 2.1 GHz,Intel® Xeon® E5-2673 v4 2.3 GHz
420
+ cpu/thermal-design-power: 269.1
421
+ cloud/region-cfe: CAISO
422
+ cloud/region-em-zone-id: US-CAL-CISO
423
+ cloud/region-wt-id: CAISO_NORTH
424
+ cloud/region-location: US West (N. California)
425
+ cloud/region-geolocation: 34.0497,-118.1326
426
+ geolocation: 34.0497,-118.1326
427
+ cpu/energy: 0.004545546617763956
428
+ carbon: 1.6790545568620359
429
+ - timestamp: '2024-03-05T00:10:00.000Z'
430
+ duration: 300
431
+ name: server-1
432
+ cloud/instance-type: Standard_E64_v3
433
+ cloud/region: westus3
434
+ cloud/vendor: azure
435
+ cpu/utilization: 53.82
436
+ grid/carbon-intensity: 372.58122309244305
437
+ vcpus-allocated: 64
438
+ vcpus-total: 64
439
+ memory-available: 432
440
+ physical-processor: >-
441
+ Intel® Xeon® Platinum 8370C,Intel® Xeon® Platinum 8272CL,Intel®
442
+ Xeon® 8171M 2.1 GHz,Intel® Xeon® E5-2673 v4 2.3 GHz
443
+ cpu/thermal-design-power: 269.1
444
+ cloud/region-cfe: CAISO
445
+ cloud/region-em-zone-id: US-CAL-CISO
446
+ cloud/region-wt-id: CAISO_NORTH
447
+ cloud/region-location: US West (N. California)
448
+ cloud/region-geolocation: 34.0497,-118.1326
449
+ geolocation: 34.0497,-118.1326
450
+ cpu/energy: 0.017357893372978016
451
+ carbon: 6.467225143212361
452
+ - timestamp: '2024-03-05T00:15:00.000Z'
453
+ duration: 300
454
+ name: server-1
455
+ cloud/instance-type: Standard_E64_v3
456
+ cloud/region: westus3
457
+ cloud/vendor: azure
458
+ cpu/utilization: 18.936666666666667
459
+ grid/carbon-intensity: 434.20042537311633
460
+ vcpus-allocated: 64
461
+ vcpus-total: 64
462
+ memory-available: 432
463
+ physical-processor: >-
464
+ Intel® Xeon® Platinum 8370C,Intel® Xeon® Platinum 8272CL,Intel®
465
+ Xeon® 8171M 2.1 GHz,Intel® Xeon® E5-2673 v4 2.3 GHz
466
+ cpu/thermal-design-power: 269.1
467
+ cloud/region-cfe: CAISO
468
+ cloud/region-em-zone-id: US-CAL-CISO
469
+ cloud/region-wt-id: CAISO_NORTH
470
+ cloud/region-location: US West (N. California)
471
+ cloud/region-geolocation: 34.0497,-118.1326
472
+ geolocation: 34.0497,-118.1326
473
+ cpu/energy: 0.010385485956624245
474
+ carbon: 4.5093824200727735
475
+ aggregated:
476
+ carbon: 19.651986880320734
477
+ outputs:
478
+ - carbon: 6.996324760173567
479
+ timestamp: '2024-03-05T00:00:00.000Z'
480
+ duration: 300
481
+ - carbon: 1.6790545568620359
482
+ timestamp: '2024-03-05T00:05:00.000Z'
483
+ duration: 300
484
+ - carbon: 6.467225143212361
485
+ timestamp: '2024-03-05T00:10:00.000Z'
486
+ duration: 300
487
+ - carbon: 4.5093824200727735
488
+ timestamp: '2024-03-05T00:15:00.000Z'
489
+ duration: 300
490
+ aggregated:
491
+ carbon: 19.651986880320734
492
+ france:
493
+ children:
494
+ server-2:
495
+ inputs:
496
+ - timestamp: '2024-03-05T00:00:00.000Z'
497
+ duration: 300
498
+ name: server-2
499
+ cloud/instance-type: Standard_E64_v3
500
+ cloud/region: france
501
+ cloud/vendor: azure
502
+ cpu/utilization: 15
503
+ grid/carbon-intensity: 500
504
+ - timestamp: '2024-03-05T00:05:00.000Z'
505
+ duration: 300
506
+ name: server-2
507
+ cloud/instance-type: Standard_E64_v3
508
+ cloud/region: france
509
+ cloud/vendor: azure
510
+ cpu/utilization: 78
511
+ grid/carbon-intensity: 500
512
+ - timestamp: '2024-03-05T00:10:00.000Z'
513
+ duration: 300
514
+ name: server-2
515
+ cloud/instance-type: Standard_E64_v3
516
+ cloud/region: france
517
+ cloud/vendor: azure
518
+ cpu/utilization: 16
519
+ grid/carbon-intensity: 500
520
+ - timestamp: '2024-03-05T00:15:00.000Z'
521
+ duration: 300
522
+ name: server-2
523
+ cloud/instance-type: Standard_E64_v3
524
+ cloud/region: france
525
+ cloud/vendor: azure
526
+ cpu/utilization: 6
527
+ grid/carbon-intensity: 500
528
+ outputs:
529
+ - timestamp: '2024-03-05T00:00:00.000Z'
530
+ duration: 300
531
+ name: server-2
532
+ cloud/instance-type: Standard_E64_v3
533
+ cloud/region: france
534
+ cloud/vendor: azure
535
+ cpu/utilization: 14.95
536
+ grid/carbon-intensity: 1719.1647205176753
537
+ vcpus-allocated: 64
538
+ vcpus-total: 64
539
+ memory-available: 432
540
+ physical-processor: >-
541
+ Intel® Xeon® Platinum 8370C,Intel® Xeon® Platinum 8272CL,Intel®
542
+ Xeon® 8171M 2.1 GHz,Intel® Xeon® E5-2673 v4 2.3 GHz
543
+ cpu/thermal-design-power: 269.1
544
+ cloud/region-cfe: France
545
+ cloud/region-em-zone-id: FR
546
+ cloud/region-wt-id: FR
547
+ cloud/region-location: Paris
548
+ cloud/region-geolocation: 48.8567,2.3522
549
+ geolocation: 48.8567,2.3522
550
+ cpu/energy: 0.00905914075141129
551
+ carbon: 15.574155178030272
552
+ - timestamp: '2024-03-05T00:05:00.000Z'
553
+ duration: 300
554
+ name: server-2
555
+ cloud/instance-type: Standard_E64_v3
556
+ cloud/region: france
557
+ cloud/vendor: azure
558
+ cpu/utilization: 77.74
559
+ grid/carbon-intensity: 1719.0544893865829
560
+ vcpus-allocated: 64
561
+ vcpus-total: 64
562
+ memory-available: 432
563
+ physical-processor: >-
564
+ Intel® Xeon® Platinum 8370C,Intel® Xeon® Platinum 8272CL,Intel®
565
+ Xeon® 8171M 2.1 GHz,Intel® Xeon® E5-2673 v4 2.3 GHz
566
+ cpu/thermal-design-power: 269.1
567
+ cloud/region-cfe: France
568
+ cloud/region-em-zone-id: FR
569
+ cloud/region-wt-id: FR
570
+ cloud/region-location: Paris
571
+ cloud/region-geolocation: 48.8567,2.3522
572
+ geolocation: 48.8567,2.3522
573
+ cpu/energy: 0.020379266251888902
574
+ carbon: 35.0330691407141
575
+ - timestamp: '2024-03-05T00:10:00.000Z'
576
+ duration: 300
577
+ name: server-2
578
+ cloud/instance-type: Standard_E64_v3
579
+ cloud/region: france
580
+ cloud/vendor: azure
581
+ cpu/utilization: 15.946666666666667
582
+ grid/carbon-intensity: 1718.8707708347622
583
+ vcpus-allocated: 64
584
+ vcpus-total: 64
585
+ memory-available: 432
586
+ physical-processor: >-
587
+ Intel® Xeon® Platinum 8370C,Intel® Xeon® Platinum 8272CL,Intel®
588
+ Xeon® 8171M 2.1 GHz,Intel® Xeon® E5-2673 v4 2.3 GHz
589
+ cpu/thermal-design-power: 269.1
590
+ cloud/region-cfe: France
591
+ cloud/region-em-zone-id: FR
592
+ cloud/region-wt-id: FR
593
+ cloud/region-location: Paris
594
+ cloud/region-geolocation: 48.8567,2.3522
595
+ geolocation: 48.8567,2.3522
596
+ cpu/energy: 0.009405866514354337
597
+ carbon: 16.16746902589712
598
+ - timestamp: '2024-03-05T00:15:00.000Z'
599
+ duration: 300
600
+ name: server-2
601
+ cloud/instance-type: Standard_E64_v3
602
+ cloud/region: france
603
+ cloud/vendor: azure
604
+ cpu/utilization: 5.98
605
+ grid/carbon-intensity: 1718.6686804277592
606
+ vcpus-allocated: 64
607
+ vcpus-total: 64
608
+ memory-available: 432
609
+ physical-processor: >-
610
+ Intel® Xeon® Platinum 8370C,Intel® Xeon® Platinum 8272CL,Intel®
611
+ Xeon® 8171M 2.1 GHz,Intel® Xeon® E5-2673 v4 2.3 GHz
612
+ cpu/thermal-design-power: 269.1
613
+ cloud/region-cfe: France
614
+ cloud/region-em-zone-id: FR
615
+ cloud/region-wt-id: FR
616
+ cloud/region-location: Paris
617
+ cloud/region-geolocation: 48.8567,2.3522
618
+ geolocation: 48.8567,2.3522
619
+ cpu/energy: 0.0054492484351820105
620
+ carbon: 9.365452617417297
621
+ aggregated:
622
+ carbon: 76.1401459620588
623
+ outputs:
624
+ - carbon: 15.574155178030272
625
+ timestamp: '2024-03-05T00:00:00.000Z'
626
+ duration: 300
627
+ - carbon: 35.0330691407141
628
+ timestamp: '2024-03-05T00:05:00.000Z'
629
+ duration: 300
630
+ - carbon: 16.16746902589712
631
+ timestamp: '2024-03-05T00:10:00.000Z'
632
+ duration: 300
633
+ - carbon: 9.365452617417297
634
+ timestamp: '2024-03-05T00:15:00.000Z'
635
+ duration: 300
636
+ aggregated:
637
+ carbon: 76.1401459620588
638
+ outputs:
639
+ - carbon: 22.57047993820384
640
+ timestamp: '2024-03-05T00:00:00.000Z'
641
+ duration: 300
642
+ - carbon: 36.71212369757613
643
+ timestamp: '2024-03-05T00:05:00.000Z'
644
+ duration: 300
645
+ - carbon: 22.63469416910948
646
+ timestamp: '2024-03-05T00:10:00.000Z'
647
+ duration: 300
648
+ - carbon: 13.87483503749007
649
+ timestamp: '2024-03-05T00:15:00.000Z'
650
+ duration: 300
651
+ aggregated:
652
+ carbon: 95.79213284237952
653
+ ```
654
+
655
+ ### CSV and aggregation
656
+
657
+ The CSV representation of the output data is a helpful way to build intuition for how aggregation works. What we refer to as "horizontal" aggregation is an aggregation across the *rows* of the CSV: you can replicate the IF aggregation function by summing the cells in each row. Similarly, "vertical" aggregation can be replicated by summing the *columns* of the CSV. The column sum is only conceptually accurate: the CSV contains both parent nodes and their children, so you have to leave one of the two out of the sum to avoid counting the same values twice.
658
+
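The row and column sums described above can be sketched in TypeScript. This is a minimal illustration, not IF's aggregation code: the `Row` type, the `horizontal` and `vertical` helpers, and the numbers are all hypothetical, and the table holds only leaf components so that nothing is counted twice.

```typescript
// Hypothetical CSV-like table: one row per component, one column per timestep.
// Names and values are made up for illustration, not taken from a real run.
type Row = {component: string; values: number[]};

const sampleRows: Row[] = [
  {component: 'server-1', values: [1, 2, 3]},
  {component: 'server-2', values: [4, 5, 6]},
];

// "Horizontal" aggregation: sum the cells of each row,
// giving one total per component across all timesteps.
function horizontal(table: Row[]): Record<string, number> {
  const totals: Record<string, number> = {};
  for (const row of table) {
    totals[row.component] = row.values.reduce((sum, value) => sum + value, 0);
  }
  return totals;
}

// "Vertical" aggregation: sum each column,
// giving one total per timestep across all components.
// Pass in leaf rows only; including parent rows would double count.
function vertical(table: Row[]): number[] {
  const totals: number[] = [];
  for (const row of table) {
    row.values.forEach((value, column) => {
      totals[column] = (totals[column] ?? 0) + value;
    });
  }
  return totals;
}

console.log(horizontal(sampleRows)); // server-1: 6, server-2: 15
console.log(vertical(sampleRows)); // 5, 7, 9
```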
659
+
660
+ ## Groupby
661
+
662
+ Groupby is an IF plugin that reorganizes a tree according to keys provided by the user. This allows users to regroup their observations according to various properties of their application. For example, the following manifest file contains a flat array of observations. This is how you might expect data to arrive from an importer plugin, maybe one that hits a metrics API for a cloud service.
663
+
664
+
665
+ ```yaml
666
+ name: if-demo
667
+ description: demo pipeline
668
+ graph:
669
+   children:
670
+     my-app:
671
+       pipeline:
672
+         - group-by
673
+         - teads-curve
674
+       config:
675
+         group-by:
676
+           - cloud-region
677
+           - instance-type
678
+       inputs:
679
+         - timestamp: 2023-07-06T00:00
680
+           duration: 300
681
+           instance-type: A1
682
+           region: uk-west
683
+           cpu-util: 99
684
+         - timestamp: 2023-07-06T05:00
685
+           duration: 300
686
+           instance-type: A1
687
+           region: uk-west
688
+           cpu-util: 23
689
+         - timestamp: 2023-07-06T10:00
690
+           duration: 300
691
+           instance-type: A1
692
+           region: uk-west
693
+           cpu-util: 12
694
+         - timestamp: 2023-07-06T00:00 # note: the timestamps restart here for the second instance type
695
+           duration: 300
696
+           instance-type: B1
697
+           region: uk-east
698
+           cpu-util: 11
699
+         - timestamp: 2023-07-06T05:00
700
+           duration: 300
701
+           instance-type: B1
702
+           region: uk-east
703
+           cpu-util: 67
704
+         - timestamp: 2023-07-06T10:00
705
+           duration: 300
706
+           instance-type: B1
707
+           region: uk-east
708
+           cpu-util: 1
709
+ ```
710
+
711
+ However, each observation contains an `instance-type` field that varies between observations. There are two instance types being represented in this array of observations. This means there are duplicate entries for the same timestamp in this array. This is the problem that `group-by` solves. You provide `instance-type` as a key to the `group-by` plugin and it extracts the data belonging to the different instances and separates them into independent arrays. The above example would be restructured so that instance types `A1` and `B1` have their own data, as follows:
712
+
713
+
714
+ ```yaml
715
+ graph:
716
+   children:
717
+     my-app:
718
+       pipeline:
719
+         # - group-by
720
+         - teads-curve
721
+       config:
722
+         group-by:
723
+           group:
724
+             - cloud-region
725
+             - instance-type
726
+       children:
727
+         A1:
728
+           inputs:
729
+             - timestamp: 2023-07-06T00:00
730
+               duration: 300
731
+               instance-type: A1
732
+               region: uk-west
733
+               cpu-util: 99
734
+             - timestamp: 2023-07-06T05:00
735
+               duration: 300
736
+               instance-type: A1
737
+               region: uk-west
738
+               cpu-util: 23
739
+             - timestamp: 2023-07-06T10:00
740
+               duration: 300
741
+               instance-type: A1
742
+               region: uk-west
743
+               cpu-util: 12
744
+         B1:
745
+           inputs:
746
+             - timestamp: 2023-07-06T00:00
747
+               duration: 300
748
+               instance-type: B1
749
+               region: uk-east
750
+               cpu-util: 11
751
+             - timestamp: 2023-07-06T05:00
752
+               duration: 300
753
+               instance-type: B1
754
+               region: uk-east
755
+               cpu-util: 67
756
+             - timestamp: 2023-07-06T10:00
757
+               duration: 300
758
+               instance-type: B1
759
+               region: uk-east
760
+               cpu-util: 1
761
+ ```
762
+
763
+ ### Using `group-by`
764
+
765
+ To use `group-by`, you have to initialize it as a plugin and invoke it in a pipeline.
766
+
767
+ The initialization looks as follows:
768
+
769
+ ```yaml
770
+ initialize:
771
+   plugins:
772
+     group-by:
773
+       path: 'builtin'
774
+       method: GroupBy
775
+ ```
776
+
777
+ You then have to provide config defining which keys to group by in each component. This is done at the component level (i.e. not global config).
778
+ For example:
779
+
780
+
781
+ ```yaml
782
+ tree:
783
+   children:
784
+     my-app:
785
+       pipeline:
786
+         - group-by
787
+       config:
788
+         group-by:
789
+           group:
790
+             - region
791
+             - instance-type
792
+ ```
793
+
794
+ In the example above, the plugin would regroup the input data for the specific component by `region` and by `instance-type`.
795
+
796
+ Assuming the values `A1` and `B1` are found for `instance-type` and the values `uk-east` and `uk-west` are found for `region`, the result of `group-by` would look similar to the following:
797
+
798
+
799
+ ```yaml
800
+ tree:
801
+   children:
802
+     my-app:
803
+       pipeline:
804
+         - group-by
805
+       config:
806
+         group-by:
807
+           group:
808
+             - region
809
+             - instance-type
810
+       children:
811
+         uk-west:
812
+           children:
813
+             A1:
814
+               inputs:
815
+                 - timestamp: 2023-07-06T00:00
816
+                   duration: 300
817
+                   instance-type: A1
818
+                   region: uk-west
819
+                   cpu-util: 99
820
+                 - timestamp: 2023-07-06T05:00
821
+                   duration: 300
822
+                   instance-type: A1
823
+                   region: uk-west
824
+                   cpu-util: 23
825
+                 - timestamp: 2023-07-06T10:00
826
+                   duration: 300
827
+                   instance-type: A1
828
+                   region: uk-west
829
+                   cpu-util: 12
830
+         uk-east:
831
+           children:
832
+             B1:
833
+               inputs:
834
+                 - timestamp: 2023-07-06T00:00
835
+                   duration: 300
836
+                   instance-type: B1
837
+                   region: uk-east
838
+                   cpu-util: 11
839
+                 - timestamp: 2023-07-06T05:00
840
+                   duration: 300
841
+                   instance-type: B1
842
+                   region: uk-east
843
+                   cpu-util: 67
844
+                 - timestamp: 2023-07-06T10:00
845
+                   duration: 300
846
+                   instance-type: B1
847
+                   region: uk-east
848
+                   cpu-util: 1
849
+ ```
850
+
851
+ This reorganized data can then be used to feed the rest of a computation pipeline.
852
+
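The regrouping performed by `group-by` can be sketched as a small TypeScript function. It is an illustration under stated assumptions, not the source of the builtin `GroupBy` plugin: the `Observation` and `GroupNode` types and the `groupBy` helper are hypothetical names, and throwing on a missing key is a choice made for this sketch.

```typescript
// Hypothetical types for this sketch; they are not IF's own type definitions.
type Observation = Record<string, string | number>;

interface GroupNode {
  inputs?: Observation[];
  children?: Record<string, GroupNode>;
}

// Nest a flat array of observations by the given keys, in order.
// Each key adds one level of `children`; the observations end up
// in the `inputs` array of the deepest node they belong to.
function groupBy(observations: Observation[], keys: string[]): GroupNode {
  const root: GroupNode = {};
  for (const observation of observations) {
    let node = root;
    for (const key of keys) {
      const value = observation[key];
      if (value === undefined) {
        // Assumption of this sketch: a missing key is treated as an error.
        throw new Error(`observation is missing group key "${key}"`);
      }
      if (node.children === undefined) {
        node.children = {};
      }
      const label = String(value);
      let child = node.children[label];
      if (child === undefined) {
        child = {};
        node.children[label] = child;
      }
      node = child;
    }
    if (node.inputs === undefined) {
      node.inputs = [];
    }
    node.inputs.push(observation);
  }
  return root;
}

const sampleObservations: Observation[] = [
  {timestamp: '2023-07-06T00:00', duration: 300, 'instance-type': 'A1', region: 'uk-west', 'cpu-util': 99},
  {timestamp: '2023-07-06T05:00', duration: 300, 'instance-type': 'A1', region: 'uk-west', 'cpu-util': 23},
  {timestamp: '2023-07-06T00:00', duration: 300, 'instance-type': 'B1', region: 'uk-east', 'cpu-util': 11},
];

const groupedTree = groupBy(sampleObservations, ['region', 'instance-type']);
console.log(JSON.stringify(groupedTree, null, 2));
```

Grouping by `region` first and `instance-type` second yields `uk-west` and `uk-east` at the top level, each with its instance types nested below, which mirrors the order of the keys in the `group` config.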