gitgreen 1.3.0 → 1.4.0
- package/README.md +275 -380
- package/package.json +1 -1
- package/scripts/build_power_profiles.py +0 -268
package/README.md
CHANGED
|
@@ -1,488 +1,383 @@
|
|
|
1
|
-
|
|
1
|
+
<p align="center">
|
|
2
|
+
<h1 align="center">GitGreen</h1>
|
|
3
|
+
<p align="center">
|
|
4
|
+
<strong>Carbon footprint tracking for GitLab CI/CD pipelines</strong>
|
|
5
|
+
</p>
|
|
6
|
+
<p align="center">
|
|
7
|
+
<a href="#quick-start">Quick Start</a> •
|
|
8
|
+
<a href="#installation">Installation</a> •
|
|
9
|
+
<a href="#usage">Usage</a> •
|
|
10
|
+
<a href="#methodology">Methodology</a> •
|
|
11
|
+
<a href="#faq">FAQ</a>
|
|
12
|
+
</p>
|
|
13
|
+
</p>
|
|
14
|
+
|
|
15
|
+
---
|
|
16
|
+
|
|
17
|
+
[](https://www.npmjs.com/package/gitgreen)
|
|
18
|
+
[](LICENSE)
|
|
19
|
+
[](https://nodejs.org/)
|
|
20
|
+
|
|
21
|
+
GitGreen measures the carbon emissions of your CI/CD jobs using real CPU/RAM metrics, grid carbon intensity data, and research-backed power models. Works with **GCP** and **AWS** runners.
|
|
2
22
|
|
|
3
|
-
|
|
4
|
-
|
|
5
|
-
|
|
6
|
-
|
|
7
|
-
|
|
8
|
-
|
|
9
|
-
### Required
|
|
10
|
-
|
|
11
|
-
- **Node.js** (v20 or higher) - [Download](https://nodejs.org/)
|
|
12
|
-
- Required to run the CLI tool
|
|
13
|
-
- Check version: `node --version`
|
|
23
|
+
```
|
|
24
|
+
Total emissions: 2.847 gCO₂e
|
|
25
|
+
├── CPU: 1.923 gCO₂e
|
|
26
|
+
├── RAM: 0.412 gCO₂e
|
|
27
|
+
└── Scope 3: 0.512 gCO₂e (embodied carbon)
|
|
28
|
+
```
|
|
14
29
|
|
|
15
|
-
|
|
16
|
-
- This project uses `pnpm@8.15.4` as the package manager
|
|
17
|
-
- Install pnpm: `npm install -g pnpm`
|
|
30
|
+
## Quick Start
|
|
18
31
|
|
|
19
|
-
|
|
20
|
-
|
|
21
|
-
|
|
32
|
+
```bash
|
|
33
|
+
# Install globally
|
|
34
|
+
npm install -g gitgreen
|
|
22
35
|
|
|
23
|
-
|
|
36
|
+
# Initialize in your GitLab project
|
|
37
|
+
cd your-repo
|
|
38
|
+
gitgreen init
|
|
39
|
+
```
|
|
24
40
|
|
|
25
|
-
|
|
26
|
-
- Required for GCP provider setup and authentication
|
|
27
|
-
- After installation, authenticate: `gcloud auth login`
|
|
28
|
-
- Set default project: `gcloud config set project YOUR_PROJECT_ID`
|
|
29
|
-
- Check version: `gcloud --version`
|
|
41
|
+
The wizard configures everything: provider credentials, carbon budgets, and CI/CD integration.
|
|
30
42
|
|
|
31
|
-
|
|
32
|
-
- Required for AWS provider (can use environment variables or AWS CLI config)
|
|
33
|
-
- Configure credentials: `aws configure`
|
|
34
|
-
- Or set environment variables: `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_REGION`
|
|
43
|
+
## Table of Contents
|
|
35
44
|
|
|
36
|
-
|
|
45
|
+
- [Installation](#installation)
|
|
46
|
+
- [Usage](#usage)
|
|
47
|
+
- [Advanced](#advanced)
|
|
48
|
+
- [Providers](#providers)
|
|
49
|
+
- [Output Integrations](#output-integrations)
|
|
50
|
+
- [Database Schema](#database-schema)
|
|
51
|
+
- [Methodology](#methodology)
|
|
52
|
+
- [Architecture](#architecture)
|
|
53
|
+
- [Configuration](#configuration)
|
|
54
|
+
- [FAQ](#faq)
|
|
55
|
+
- [Contributing](#contributing)
|
|
56
|
+
- [References](#references)
|
|
57
|
+
- [License](#license)
|
|
37
58
|
|
|
38
|
-
|
|
39
|
-
- Recommended for easier GitLab authentication and CI/CD variable management
|
|
40
|
-
- After installation, authenticate: `glab auth login`
|
|
41
|
-
- Check version: `glab --version`
|
|
42
|
-
- If not installed, you'll need to provide a GitLab Personal Access Token manually
|
|
59
|
+
## Installation
|
|
43
60
|
|
|
44
|
-
###
|
|
61
|
+
### Prerequisites
|
|
45
62
|
|
|
46
|
-
|
|
47
|
-
|
|
48
|
-
|
|
63
|
+
| Requirement | Version | Notes |
|
|
64
|
+
|-------------|---------|-------|
|
|
65
|
+
| Node.js | 20+ | [Download](https://nodejs.org/) |
|
|
66
|
+
| Git | Any | For GitLab project detection |
|
|
67
|
+
| gcloud CLI | Any | Required for GCP ([Install](https://cloud.google.com/sdk/docs/install-sdk)) |
|
|
68
|
+
| AWS CLI | Any | Required for AWS ([Install](https://aws.amazon.com/cli/)) |
|
|
69
|
+
| glab CLI | Any | Optional, for easier GitLab auth ([Install](https://docs.gitlab.com/cli/)) |
|
|
49
70
|
|
|
50
|
-
|
|
51
|
-
- From npm (global CLI):
|
|
52
|
-
```bash
|
|
53
|
-
npm install -g gitgreen
|
|
54
|
-
gitgreen --help
|
|
55
|
-
```
|
|
71
|
+
### Install
|
|
56
72
|
|
|
57
|
-
Run tests:
|
|
58
73
|
```bash
|
|
59
|
-
npm
|
|
74
|
+
npm install -g gitgreen
|
|
60
75
|
```
|
|
61
76
|
|
|
62
|
-
|
|
63
|
-
```bash
|
|
64
|
-
npm build
|
|
65
|
-
npm stress
|
|
66
|
-
```
|
|
77
|
+
### API Keys
|
|
67
78
|
|
|
68
|
-
|
|
79
|
+
- **Electricity Maps** (required): [Get free API key](https://api-portal.electricitymaps.com)
|
|
69
80
|
|
|
70
|
-
|
|
81
|
+
## Usage
|
|
71
82
|
|
|
72
|
-
|
|
83
|
+
### Initialize a Project
|
|
73
84
|
|
|
74
85
|
```bash
|
|
75
|
-
# In your repo
|
|
76
86
|
gitgreen init
|
|
77
87
|
```
|
|
78
88
|
|
|
79
|
-
The
|
|
80
|
-
|
|
81
|
-
|
|
82
|
-
|
|
83
|
-
|
|
84
|
-
|
|
85
|
-
|
|
86
|
-
- Append a ready-made job to your `.gitlab-ci.yml`
|
|
87
|
-
- Print the CI/CD variable checklist (ELECTRICITY_MAPS_API_KEY, provider credentials, budget flags)
|
|
88
|
-
|
|
89
|
-
### How It Works
|
|
90
|
-
|
|
91
|
-
Once initialized, all subsequent pipelines will run on the configured runner, and their performance will be automatically measured. The carbon tracking is implemented using GitLab CI/CD components as a final step in your pipeline. The carbon tracking job itself is not computationally expensive, so it adds minimal overhead to your CI/CD workflows.
|
|
92
|
-
|
|
93
|
-
Key environment variables:
|
|
94
|
-
- `ELECTRICITY_MAPS_API_KEY` (required) - Get a free key from https://api-portal.electricitymaps.com
|
|
95
|
-
- `GCP_PROJECT_ID` / `GOOGLE_CLOUD_PROJECT` (required for GCP) - Your GCP project ID
|
|
96
|
-
- `AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY` (required for AWS CloudWatch) - Credentials with CloudWatch read access on the runner
|
|
97
|
-
- `AWS_REGION` (for AWS) - Region of the runner EC2 instance
|
|
98
|
-
|
|
99
|
-
The `gitgreen init` wizard will automatically set all necessary GitLab CI/CD variables for you.
|
|
100
|
-
|
|
101
|
-
## Providers
|
|
102
|
-
|
|
103
|
-
GitGreen supports multiple cloud providers:
|
|
104
|
-
|
|
105
|
-
- **GCP**: Fully wired — the CLI parser expects GCP Monitoring JSON, the GitLab component polls GCE metrics, and `gitgreen init` provisions or reuses GCP runners.
|
|
106
|
-
- **AWS**: CloudWatch-backed — the CLI can pull metrics directly via `--from-cloudwatch`, the GitLab component fetches CloudWatch CPU/RAM data, and `gitgreen init` can configure or provision EC2 runners.
|
|
107
|
-
|
|
108
|
-
## Architecture
|
|
109
|
-
Runtime flow:
|
|
110
|
-
```
|
|
111
|
-
CPU/RAM timeseries (GCP Monitoring JSON or custom collector)
|
|
112
|
-
|
|
|
113
|
-
v
|
|
114
|
-
+------------------+ +----------------------------+
|
|
115
|
-
| CLI (gitgreen) |--->| CarbonCalculator |
|
|
116
|
-
| - parse metrics | | - PowerProfileRepository |<- machine data in data/*.json
|
|
117
|
-
| - CLI/env opts | | - ZoneMapper (region→zone) |<- static + runtime PUE mapping
|
|
118
|
-
+------------------+ | - IntensityProvider |<- Electricity Maps API
|
|
119
|
-
+-------------+------------+
|
|
120
|
-
|
|
|
121
|
-
v
|
|
122
|
-
+-----------------------------+
|
|
123
|
-
| Report formatter |
|
|
124
|
-
| - Markdown/JSON artifacts |
|
|
125
|
-
| - Budget evaluation |
|
|
126
|
-
+-------------+---------------+
|
|
127
|
-
|
|
|
128
|
-
v
|
|
129
|
-
+-----------------------------+
|
|
130
|
-
| GitLab client (optional) |
|
|
131
|
-
| - MR note via CI_JOB_TOKEN |
|
|
132
|
-
+-----------------------------+
|
|
133
|
-
```
|
|
134
|
-
|
|
135
|
-
GitLab CI path:
|
|
136
|
-
```
|
|
137
|
-
Pipeline starts → component script fetches CPU/RAM timeseries from GCP Monitoring or AWS CloudWatch
|
|
138
|
-
→ writes JSON files → runs `gitgreen --provider gcp|aws ...`
|
|
139
|
-
→ emits `carbon-report.md` / `carbon-report.json`
|
|
140
|
-
→ optional MR note when CI_JOB_TOKEN is present
|
|
141
|
-
```
|
|
142
|
-
|
|
143
|
-
## Output Integrations
|
|
144
|
-
|
|
145
|
-
During `gitgreen init` you can opt into exporting GitGreen data to external systems. The wizard includes an integration step with two optional sinks:
|
|
89
|
+
The wizard will:
|
|
90
|
+
1. Configure your cloud provider (GCP/AWS)
|
|
91
|
+
2. Set machine type and region
|
|
92
|
+
3. Configure carbon budgets
|
|
93
|
+
4. Set up database exports (optional)
|
|
94
|
+
5. Append the tracking job to `.gitlab-ci.yml`
|
|
95
|
+
6. Store credentials in GitLab CI/CD variables
|
|
146
96
|
|
|
147
|
-
|
|
148
|
-
- **Runner inventory** – the machine catalog that powers your GitLab runners, including machine type and scope 3 estimates.
|
|
97
|
+
**That's it!** Run your next pipeline and carbon emissions will be calculated automatically. Results are posted as MR comments, written to your configured database, and stored as artifacts of the carbon job.
|
|
149
98
|
|
|
150
|
-
|
|
151
|
-
- **MySQL** – populates `GITGREEN_JOB_MYSQL_*` / `GITGREEN_RUNNER_MYSQL_*` and inserts rows through a standard MySQL client.
|
|
152
|
-
- **PostgreSQL** – captures host, port, credentials, schema, table, and SSL mode (`GITGREEN_JOB_POSTGRES_*` / `GITGREEN_RUNNER_POSTGRES_*`) for storage in Postgres.
|
|
99
|
+
---
|
|
153
100
|
|
|
154
|
-
|
|
101
|
+
## Advanced
|
|
155
102
|
|
|
156
|
-
|
|
157
|
-
- writes each carbon calculation (typed summary columns plus CPU/RAM timeseries rows) and optional runner inventory snapshot into the configured sink.
|
|
103
|
+
### Providers
|
|
158
104
|
|
|
159
|
-
|
|
105
|
+
| Provider | Metrics Source | Status |
|
|
106
|
+
|----------|---------------|--------|
|
|
107
|
+
| **GCP** | Cloud Monitoring API | Full support |
|
|
108
|
+
| **AWS** | CloudWatch API | Full support |
|
|
160
109
|
|
|
161
|
-
|
|
110
|
+
### Output Integrations
|
|
162
111
|
|
|
163
|
-
|
|
164
|
-
2. The data target it handles (`job` vs `runner`)
|
|
165
|
-
3. Prompted credential fields (label, env var key, input type, default, mask flag)
|
|
112
|
+
Export carbon data to external databases for analytics and dashboards.
|
|
166
113
|
|
|
167
|
-
|
|
114
|
+
#### Supported Connectors
|
|
168
115
|
|
|
169
|
-
|
|
116
|
+
| Connector | Job Data | Runner Inventory |
|
|
117
|
+
|-----------|----------|------------------|
|
|
118
|
+
| MySQL | `GITGREEN_JOB_MYSQL_*` | `GITGREEN_RUNNER_MYSQL_*` |
|
|
119
|
+
| PostgreSQL | `GITGREEN_JOB_POSTGRES_*` | `GITGREEN_RUNNER_POSTGRES_*` |
|
|
170
120
|
|
|
171
|
-
|
|
121
|
+
#### Database Migrations
|
|
172
122
|
|
|
173
123
|
```bash
|
|
174
|
-
gitgreen migrate --scope job #
|
|
175
|
-
gitgreen migrate --scope runner #
|
|
176
|
-
gitgreen migrate --scope all #
|
|
124
|
+
gitgreen migrate --scope job # Job emissions tables
|
|
125
|
+
gitgreen migrate --scope runner # Runner inventory tables
|
|
126
|
+
gitgreen migrate --scope all # Both
|
|
177
127
|
```
|
|
178
128
|
|
|
179
|
-
|
|
129
|
+
Migrations run automatically in CI pipelines.
|
|
180
130
|
|
|
181
|
-
###
|
|
131
|
+
### Database Schema
|
|
182
132
|
|
|
183
|
-
|
|
133
|
+
<details>
|
|
134
|
+
<summary><strong>Job Table</strong> (per-job emissions)</summary>
|
|
184
135
|
|
|
185
136
|
| Column | Type | Description |
|
|
186
137
|
|--------|------|-------------|
|
|
187
|
-
| `id` | BIGSERIAL |
|
|
188
|
-
| `ingested_at` | TIMESTAMPTZ |
|
|
189
|
-
| `provider` | TEXT |
|
|
190
|
-
| `region` | TEXT | Cloud region/zone
|
|
191
|
-
| `machine_type` | TEXT | Instance type
|
|
192
|
-
| `
|
|
193
|
-
| `
|
|
194
|
-
| `
|
|
195
|
-
| `
|
|
196
|
-
| `
|
|
197
|
-
| `
|
|
198
|
-
| `
|
|
199
|
-
| `
|
|
200
|
-
| `
|
|
201
|
-
| `
|
|
202
|
-
| `
|
|
203
|
-
| `
|
|
204
|
-
| `
|
|
205
|
-
| `
|
|
206
|
-
|
|
207
|
-
|
|
208
|
-
|
|
209
|
-
|
|
210
|
-
|
|
211
|
-
| `runner_revision` | TEXT | GitLab runner revision |
|
|
212
|
-
| `payload` | JSONB | Full calculation result as JSON (for extensibility) |
|
|
213
|
-
|
|
214
|
-
#### Job timeseries table
|
|
138
|
+
| `id` | BIGSERIAL | Primary key |
|
|
139
|
+
| `ingested_at` | TIMESTAMPTZ | Insert timestamp |
|
|
140
|
+
| `provider` | TEXT | `gcp` or `aws` |
|
|
141
|
+
| `region` | TEXT | Cloud region/zone |
|
|
142
|
+
| `machine_type` | TEXT | Instance type |
|
|
143
|
+
| `runtime_seconds` | INT | Job duration |
|
|
144
|
+
| `total_emissions` | DOUBLE | Total gCO₂eq |
|
|
145
|
+
| `cpu_emissions` | DOUBLE | CPU gCO₂eq |
|
|
146
|
+
| `ram_emissions` | DOUBLE | RAM gCO₂eq |
|
|
147
|
+
| `scope3_emissions` | DOUBLE | Embodied gCO₂eq |
|
|
148
|
+
| `carbon_intensity` | DOUBLE | Grid intensity (gCO₂eq/kWh) |
|
|
149
|
+
| `pue` | DOUBLE | Power Usage Effectiveness |
|
|
150
|
+
| `carbon_budget` | DOUBLE | Budget threshold |
|
|
151
|
+
| `over_budget` | BOOLEAN | Budget exceeded |
|
|
152
|
+
| `gitlab_project_id` | BIGINT | GitLab project |
|
|
153
|
+
| `gitlab_pipeline_id` | BIGINT | Pipeline ID |
|
|
154
|
+
| `gitlab_job_id` | BIGINT | Job ID |
|
|
155
|
+
| `gitlab_job_name` | TEXT | Job name |
|
|
156
|
+
| `payload` | JSONB | Full result JSON |
|
|
157
|
+
|
|
158
|
+
</details>
|
|
159
|
+
|
|
160
|
+
<details>
|
|
161
|
+
<summary><strong>Timeseries Table</strong> (CPU/RAM metrics)</summary>
|
|
215
162
|
|
|
216
163
|
| Column | Type | Description |
|
|
217
164
|
|--------|------|-------------|
|
|
218
|
-
| `id` | BIGSERIAL |
|
|
219
|
-
| `job_id` | BIGINT | Foreign key to
|
|
220
|
-
| `metric` | TEXT |
|
|
221
|
-
| `ts` | TIMESTAMPTZ | Timestamp
|
|
222
|
-
| `value` | DOUBLE | Metric value
|
|
165
|
+
| `id` | BIGSERIAL | Primary key |
|
|
166
|
+
| `job_id` | BIGINT | Foreign key to job |
|
|
167
|
+
| `metric` | TEXT | `cpu`, `ram_used`, `ram_size` |
|
|
168
|
+
| `ts` | TIMESTAMPTZ | Timestamp |
|
|
169
|
+
| `value` | DOUBLE | Metric value |
|
|
170
|
+
|
|
171
|
+
</details>
|
|
223
172
|
|
|
224
|
-
|
|
173
|
+
<details>
|
|
174
|
+
<summary><strong>Runner Inventory Table</strong></summary>
|
|
225
175
|
|
|
226
176
|
| Column | Type | Description |
|
|
227
177
|
|--------|------|-------------|
|
|
228
|
-
| `id` | BIGSERIAL |
|
|
229
|
-
| `ingested_at` | TIMESTAMPTZ | When the record was inserted |
|
|
178
|
+
| `id` | BIGSERIAL | Primary key |
|
|
230
179
|
| `runner_id` | TEXT | GitLab runner ID |
|
|
231
|
-
| `runner_description` | TEXT | Runner description |
|
|
232
|
-
| `runner_version` | TEXT | GitLab runner version |
|
|
233
|
-
| `runner_revision` | TEXT | GitLab runner revision |
|
|
234
|
-
| `runner_platform` | TEXT | OS platform (e.g., `linux`) |
|
|
235
|
-
| `runner_architecture` | TEXT | CPU architecture (e.g., `amd64`) |
|
|
236
|
-
| `runner_executor` | TEXT | Executor type (e.g., `docker`, `shell`) |
|
|
237
|
-
| `runner_tags` | TEXT | Comma-separated runner tags |
|
|
238
180
|
| `machine_type` | TEXT | Instance type |
|
|
239
181
|
| `provider` | TEXT | Cloud provider |
|
|
240
|
-
| `region` | TEXT |
|
|
241
|
-
| `
|
|
242
|
-
| `
|
|
243
|
-
| `gcp_zone` | TEXT | GCP zone |
|
|
244
|
-
| `aws_region` | TEXT | AWS region (if applicable) |
|
|
245
|
-
| `aws_instance_id` | TEXT | AWS EC2 instance ID |
|
|
246
|
-
| `last_job_machine_type` | TEXT | Machine type from most recent job |
|
|
247
|
-
| `last_job_region` | TEXT | Region from most recent job |
|
|
248
|
-
| `last_job_provider` | TEXT | Provider from most recent job |
|
|
249
|
-
| `last_job_runtime_seconds` | INT | Runtime of most recent job |
|
|
250
|
-
| `last_job_total_emissions` | DOUBLE | Emissions from most recent job in gCO2eq |
|
|
251
|
-
| `last_job_recorded_at` | TIMESTAMPTZ | When the most recent job was recorded |
|
|
252
|
-
| `payload` | JSONB | Full runner metadata as JSON |
|
|
253
|
-
|
|
254
|
-
## Adding a provider
|
|
255
|
-
1. Extend `CloudProvider` and the provider guard in `src/index.ts` so the calculator accepts the new key.
|
|
256
|
-
2. Add machine power data (`<provider>_machine_power_profiles.json`) and, if needed, CPU profiles to `data/`, then update `PowerProfileRepository.loadMachineData` to load it.
|
|
257
|
-
3. Map regions to Electricity Maps zones and a PUE default in `ZoneMapper` (or via `data/runtime-pue-mappings.json` for runtime overrides).
|
|
258
|
-
4. Parse that provider's metrics into the `TimeseriesPoint` shape (timestamp + numeric value) alongside RAM size/usage, and update the CLI/init/templates to pull those metrics.
|
|
259
|
-
5. Wire any CI automation (runner tags, MR note flags) to pass the correct provider, machine type, and region strings.
|
|
260
|
-
|
|
261
|
-
## Publish
|
|
262
|
-
- Ensure version bump in `package.json`
|
|
263
|
-
- Run `pnpm -C node-module build && pnpm -C node-module test`
|
|
264
|
-
- Publish: `pnpm -C node-module publish --access public`
|
|
182
|
+
| `region` | TEXT | Region |
|
|
183
|
+
| `last_job_total_emissions` | DOUBLE | Last job emissions |
|
|
184
|
+
| `payload` | JSONB | Full metadata |
|
|
265
185
|
|
|
266
|
-
|
|
186
|
+
</details>
|
|
267
187
|
|
|
268
|
-
|
|
188
|
+
## Methodology
|
|
269
189
|
|
|
270
|
-
|
|
190
|
+
GitGreen's calculations are based on peer-reviewed methodologies from [re:cinq](https://re-cinq.com/blog/cloud-cpu-energy-consumption) and [Teads](https://github.com/re-cinq/emissions-data).
|
|
271
191
|
|
|
272
|
-
###
|
|
192
|
+
### Formula
|
|
273
193
|
|
|
274
|
-
|
|
194
|
+
```
|
|
195
|
+
E_total = E_operational + E_embodied
|
|
275
196
|
|
|
276
|
-
|
|
277
|
-
|
|
278
|
-
|
|
279
|
-
- **AWS CloudWatch API** - To fetch CPU/RAM metrics from your AWS account (requires your AWS credentials)
|
|
197
|
+
E_operational = (P_cpu + P_ram) × runtime_hours × PUE × carbon_intensity
|
|
198
|
+
E_embodied = scope3_hourly × runtime_hours
|
|
199
|
+
```
|
|
280
200
|
|
|
281
|
-
|
|
201
|
+
| Variable | Description | Source |
|
|
202
|
+
|----------|-------------|--------|
|
|
203
|
+
| `P_cpu` | CPU power (kW) | Interpolated from utilization |
|
|
204
|
+
| `P_ram` | RAM power (0.5 W/GB × used GB) | Industry-standard estimate for DDR4 |
|
|
205
|
+
| `PUE` | Data center efficiency | Google/AWS published data |
|
|
206
|
+
| `carbon_intensity` | Grid emissions (gCO₂eq/kWh) | Electricity Maps API |
|
|
207
|
+
| `scope3_hourly` | Embodied carbon rate | Dell R740 LCA study |
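
The formula can be worked through by hand. The sketch below is only an illustration of the arithmetic, not GitGreen's implementation: the function names are hypothetical, every input value is made up, and it assumes power is expressed in kW and `scope3_hourly` in gCO₂eq per hour.

```python
# Illustrative sketch of the formula above. All names and input values are
# assumptions chosen for the example, not values produced by GitGreen.

def operational_emissions_g(p_cpu_kw, p_ram_kw, runtime_hours, pue, carbon_intensity):
    """E_operational = (P_cpu + P_ram) x runtime_hours x PUE x carbon_intensity."""
    energy_kwh = (p_cpu_kw + p_ram_kw) * runtime_hours * pue
    return energy_kwh * carbon_intensity  # carbon_intensity in gCO2eq/kWh


def embodied_emissions_g(scope3_hourly_g, runtime_hours):
    """E_embodied = scope3_hourly x runtime_hours (scope3_hourly assumed in g/h)."""
    return scope3_hourly_g * runtime_hours


# Hypothetical 15-minute job: 20 W of CPU power, 8 GB of RAM at 0.5 W/GB.
p_cpu_kw = 0.020
p_ram_kw = 8 * 0.5 / 1000      # 4 W expressed in kW
runtime_hours = 0.25

e_operational = operational_emissions_g(p_cpu_kw, p_ram_kw, runtime_hours,
                                         pue=1.1, carbon_intensity=400.0)
e_embodied = embodied_emissions_g(scope3_hourly_g=2.0, runtime_hours=runtime_hours)
e_total = e_operational + e_embodied

print(f"operational: {e_operational:.2f} g, embodied: {e_embodied:.2f} g, "
      f"total: {e_total:.2f} g")
```

With these made-up inputs the job draws 0.0066 kWh after PUE, which gives about 2.64 gCO₂eq operational plus 0.50 gCO₂eq embodied.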
|
|
282
208
|
|
|
283
|
-
###
|
|
209
|
+
### CPU Power Model
|
|
284
210
|
|
|
285
|
-
|
|
211
|
+
CPU power is **non-linear** with utilization. We use cubic spline interpolation across measured data points:
|
|
286
212
|
|
|
287
|
-
|
|
288
|
-
|
|
289
|
-
|
|
290
|
-
|
|
213
|
+
| Utilization | Power (% of TDP) |
|
|
214
|
+
|-------------|------------------|
|
|
215
|
+
| 0% (idle) | 1.7% |
|
|
216
|
+
| 10% | 3.4% |
|
|
217
|
+
| 50% | 16.9% |
|
|
218
|
+
| 100% | 100% |
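
To make the interpolation concrete, here is a minimal pure-Python sketch of a cubic spline through the four points in the table. It assumes natural boundary conditions (zero second derivative at both ends); the README does not say which boundary conditions GitGreen uses, and the function name `natural_cubic_spline` is hypothetical.

```python
from bisect import bisect_right


def natural_cubic_spline(xs, ys):
    """Return a function interpolating (xs, ys) with a natural cubic spline.

    Assumption: natural boundary conditions. xs must be strictly increasing.
    """
    n = len(xs) - 1                      # number of intervals
    h = [xs[i + 1] - xs[i] for i in range(n)]

    # Right-hand side of the tridiagonal system for the second-derivative terms.
    alpha = [0.0] * (n + 1)
    for i in range(1, n):
        alpha[i] = 3 * (ys[i + 1] - ys[i]) / h[i] - 3 * (ys[i] - ys[i - 1]) / h[i - 1]

    # Forward sweep of the tridiagonal solve.
    l = [1.0] * (n + 1)
    mu = [0.0] * (n + 1)
    z = [0.0] * (n + 1)
    for i in range(1, n):
        l[i] = 2 * (xs[i + 1] - xs[i - 1]) - h[i - 1] * mu[i - 1]
        mu[i] = h[i] / l[i]
        z[i] = (alpha[i] - h[i - 1] * z[i - 1]) / l[i]

    # Back substitution yields the per-interval polynomial coefficients.
    b = [0.0] * n
    c = [0.0] * (n + 1)
    d = [0.0] * n
    for j in range(n - 1, -1, -1):
        c[j] = z[j] - mu[j] * c[j + 1]
        b[j] = (ys[j + 1] - ys[j]) / h[j] - h[j] * (c[j + 1] + 2 * c[j]) / 3
        d[j] = (c[j + 1] - c[j]) / (3 * h[j])

    def evaluate(x):
        x = min(max(x, xs[0]), xs[-1])   # clamp to the measured range
        j = min(bisect_right(xs, x) - 1, n - 1)
        dx = x - xs[j]
        return ys[j] + b[j] * dx + c[j] * dx ** 2 + d[j] * dx ** 3

    return evaluate


# Utilization (%) -> power (% of TDP), taken from the table above.
power_pct_of_tdp = natural_cubic_spline([0, 10, 50, 100], [1.7, 3.4, 16.9, 100.0])

for util in (0, 10, 50, 100):
    print(util, round(power_pct_of_tdp(util), 3))
```

The spline reproduces the table values at the four measured points. Between them the result depends on the boundary conditions chosen, so intermediate values from this sketch may differ from what GitGreen reports.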
|
|
291
219
|
|
|
292
|
-
|
|
220
|
+
**VM Power Correction:** Cloud VMs share physical CPUs. We scale power by the ratio of VM vCPUs to physical threads:
|
|
293
221
|
|
|
294
|
-
|
|
222
|
+
```
|
|
223
|
+
P_vm = TDP × ratio × (vm_vcpus / physical_threads)
|
|
224
|
+
```
|
|
295
225
|
|
|
296
|
-
|
|
226
|
+
| CPU | Cores | Threads | TDP | Source |
|
|
227
|
+
|-----|-------|---------|-----|--------|
|
|
228
|
+
| Intel Xeon Gold 6268CL | 24 | 48 | 205W | [Product Listing](https://www.ebay.com/p/2321792675) |
|
|
229
|
+
| Intel Xeon Platinum 8481C | 56 | 112 | 350W | [TechPowerUp](https://www.techpowerup.com/cpu-specs/xeon-platinum-8481c.c3992) |
|
|
230
|
+
| AMD EPYC 7B12 | 64 | 128 | 240W | [Newegg](https://www.newegg.com/p/1FR-00G6-00026) |
|
|
231
|
+
| Ampere Altra Q64-30 | 64 | 64 | 180W | [Ampere](https://amperecomputing.com/en/briefs/ampere-altra-family-product-brief) |
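
As a rough illustration of the VM power correction, the sketch below applies the `P_vm` formula to the Xeon Gold 6268CL row of the table. The function name is hypothetical, `power_ratio` is assumed to be the fraction of TDP drawn at the current utilization (1.0 at full load), and the 80-vCPU VM size is an example input.

```python
def vm_power_watts(tdp_watts, power_ratio, vm_vcpus, physical_threads):
    """P_vm = TDP x ratio x (vm_vcpus / physical_threads).

    power_ratio is assumed to be the fraction of TDP at the current
    utilization, e.g. 0.017 at idle and 1.0 at 100% load.
    """
    if physical_threads <= 0:
        raise ValueError("physical_threads must be positive")
    return tdp_watts * power_ratio * (vm_vcpus / physical_threads)


# Example: an 80-vCPU VM backed by Xeon Gold 6268CL CPUs (205 W TDP, 48 threads).
full_load_w = vm_power_watts(205, 1.0, vm_vcpus=80, physical_threads=48)
idle_w = vm_power_watts(205, 0.017, vm_vcpus=80, physical_threads=48)

print(f"full load: {full_load_w:.1f} W, idle: {idle_w:.2f} W")
```

At full load this gives roughly 342 W, since the VM spans more threads than one physical CPU provides.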
|
|
297
232
|
|
|
298
|
-
|
|
299
|
-
2. Adding the GitLab CI/CD component to your `.gitlab-ci.yml` manually
|
|
300
|
-
3. Running `gitgreen` directly with command-line options instead of using the component
|
|
233
|
+
### Scope 3 (Embodied Carbon)
|
|
301
234
|
|
|
302
|
-
|
|
235
|
+
Manufacturing emissions are amortized over an assumed hardware lifespan of 6 years:
|
|
303
236
|
|
|
304
|
-
|
|
237
|
+
| Component | Emissions |
|
|
238
|
+
|-----------|-----------|
|
|
239
|
+
| Base server | ~1000 kgCO₂eq |
|
|
240
|
+
| Per CPU | ~100 kgCO₂eq |
|
|
241
|
+
| Per 32GB DIMM | ~44 kgCO₂eq |
|
|
242
|
+
| Per SSD | ~50-100 kgCO₂eq |
|
|
305
243
|
|
|
306
|
-
|
|
244
|
+
*Source: Dell PowerEdge R740 Life Cycle Assessment*
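
The amortization itself is simple arithmetic. The sketch below spreads a server's manufacturing emissions evenly over a 6-year lifespan to get an hourly rate. The server configuration (one additional CPU and four 32GB DIMMs on top of the base server) and the function name are assumptions made for illustration. How GitGreen apportions the whole-server rate to a single VM is not described here, so that step is left out.

```python
HOURS_PER_YEAR = 365 * 24


def embodied_hourly_g(total_kg, lifespan_years=6):
    """Spread manufacturing emissions (kgCO2eq) evenly over the lifespan, in g/hour."""
    return total_kg * 1000 / (lifespan_years * HOURS_PER_YEAR)


# Hypothetical server built from the approximate figures in the table above.
server_kg = 1000 + 1 * 100 + 4 * 44   # base + one CPU + four 32GB DIMMs
server_rate_g_per_hour = embodied_hourly_g(server_kg)

print(f"{server_kg} kgCO2eq over 6 years = {server_rate_g_per_hour:.2f} g/hour")
```

For this assumed configuration the whole server carries about 24.3 gCO₂eq of embodied carbon per hour of operation.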
|
|
307
245
|
|
|
308
|
-
|
|
309
|
-
- **Pipeline metadata** (start/end times) from GitLab CI/CD environment variables
|
|
246
|
+
### Accuracy & Limitations
|
|
310
247
|
|
|
311
|
-
|
|
248
|
+
**Expected Accuracy:**
|
|
249
|
+
- Relative comparisons: Excellent (comparing job A vs B)
|
|
250
|
+
- Absolute values: ±15-25% for CPU-bound workloads
|
|
312
251
|
|
|
313
|
-
|
|
252
|
+
**Known Limitations:**
|
|
314
253
|
|
|
315
|
-
|
|
254
|
+
| Limitation | Impact | Notes |
|
|
255
|
+
|------------|--------|-------|
|
|
256
|
+
| No GPU modeling | High | GPU jobs underestimated |
|
|
257
|
+
| Fixed RAM power | Low | 0.5 W/GB industry standard |
|
|
258
|
+
| No network/storage I/O | Low | <5% of typical job power |
|
|
316
259
|
|
|
317
|
-
|
|
318
|
-
```bash
|
|
319
|
-
gitgreen --provider gcp \
|
|
320
|
-
--machine e2-standard-4 \
|
|
321
|
-
--region us-central1-a \
|
|
322
|
-
--cpu-timeseries cpu.json \
|
|
323
|
-
--ram-used-timeseries ram-used.json \
|
|
324
|
-
--ram-size-timeseries ram-size.json \
|
|
325
|
-
--out-md report.md
|
|
326
|
-
```
|
|
260
|
+
## Architecture
|
|
327
261
|
|
|
328
|
-
**AWS example:**
|
|
329
|
-
```bash
|
|
330
|
-
gitgreen --provider aws \
|
|
331
|
-
--machine t3.medium \
|
|
332
|
-
--region us-east-1 \
|
|
333
|
-
--cpu-timeseries cpu.json \
|
|
334
|
-
--ram-used-timeseries ram-used.json \
|
|
335
|
-
--ram-size-timeseries ram-size.json \
|
|
336
|
-
--out-md report.md
|
|
337
262
|
```
|
|
338
|
-
|
|
339
|
-
|
|
340
|
-
|
|
341
|
-
|
|
342
|
-
|
|
343
|
-
|
|
344
|
-
|
|
345
|
-
|
|
263
|
+
┌─────────────────────────────────────────────────────────────┐
|
|
264
|
+
│ GitLab Pipeline │
|
|
265
|
+
├─────────────────────────────────────────────────────────────┤
|
|
266
|
+
│ ┌─────────────┐ ┌──────────────────────────────────┐ │
|
|
267
|
+
│ │ Your Jobs │───▶│ GitGreen Carbon Tracking Job │ │
|
|
268
|
+
│ └─────────────┘ │ ├─ Fetch CPU/RAM metrics │ │
|
|
269
|
+
│ │ ├─ Calculate emissions │ │
|
|
270
|
+
│ │ ├─ Post MR comment │ │
|
|
271
|
+
│ │ └─ Export to database │ │
|
|
272
|
+
│ └──────────────────────────────────┘ │
|
|
273
|
+
└─────────────────────────────────────────────────────────────┘
|
|
274
|
+
│
|
|
275
|
+
┌────────────────────┼────────────────────┐
|
|
276
|
+
▼ ▼ ▼
|
|
277
|
+
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
|
|
278
|
+
│ GCP Monitoring │ │ Electricity Maps│ │ MySQL/Postgres │
|
|
279
|
+
│ AWS CloudWatch │ │ API │ │ (optional) │
|
|
280
|
+
└─────────────────┘ └─────────────────┘ └─────────────────┘
|
|
346
281
|
```
|
|
347
282
|
|
|
348
|
-
|
|
283
|
+
## Configuration
|
|
349
284
|
|
|
350
|
-
|
|
285
|
+
### Adding a New Provider
|
|
351
286
|
|
|
352
|
-
|
|
287
|
+
1. Add machine power profiles to `data/<provider>_machine_power_profiles.json`
|
|
288
|
+
2. Update `PowerProfileRepository.loadMachineData()`
|
|
289
|
+
3. Map regions to Electricity Maps zones in `ZoneMapper`
|
|
290
|
+
4. Parse metrics into `TimeseriesPoint` format
|
|
291
|
+
5. Wire CI automation for provider-specific settings
|
|
353
292
|
|
|
354
|
-
###
|
|
293
|
+
### Source Data Files
|
|
355
294
|
|
|
356
|
-
|
|
295
|
+
| File | Description |
|
|
296
|
+
|------|-------------|
|
|
297
|
+
| `data/cpu_physical_specs.json` | Physical CPU specs with sources |
|
|
298
|
+
| `data/cpu_power_profiles.json` | TDP and power ratios |
|
|
299
|
+
| `data/gcp_machine_power_profiles.json` | GCP machine mappings |
|
|
300
|
+
| `data/aws_machine_power_profiles.json` | AWS instance mappings |
|
|
301
|
+
| `data/source/*.csv` | Original re:cinq research data |
|
|
357
302
|
|
|
358
|
-
|
|
359
|
-
E_total = E_operational + E_embodied
|
|
303
|
+
## FAQ
|
|
360
304
|
|
|
361
|
-
|
|
362
|
-
|
|
363
|
-
```
|
|
305
|
+
<details>
|
|
306
|
+
<summary><strong>Are my credentials secure?</strong></summary>
|
|
364
307
|
|
|
365
|
-
|
|
366
|
-
|
|
367
|
-
- **P_ram**: RAM power consumption (0.5 W/GB × used GB)
|
|
368
|
-
- **PUE**: Power Usage Effectiveness of the data center (typically 1.1-1.2 for hyperscalers)
|
|
369
|
-
- **carbon_intensity**: Grid carbon intensity in gCO2eq/kWh from Electricity Maps API
|
|
370
|
-
- **scope3_emissions_hourly**: Amortized embodied carbon from manufacturing
|
|
308
|
+
Yes. GitGreen stores all credentials in GitLab CI/CD variables (encrypted). Nothing is written to your local disk.
|
|
309
|
+
</details>
|
|
371
310
|
|
|
372
|
-
|
|
311
|
+
<details>
|
|
312
|
+
<summary><strong>What APIs does GitGreen call?</strong></summary>
|
|
373
313
|
|
|
374
|
-
|
|
314
|
+
Only APIs you explicitly configure:
|
|
315
|
+
- Electricity Maps API (carbon intensity)
|
|
316
|
+
- GCP Monitoring API (metrics)
|
|
317
|
+
- AWS CloudWatch API (metrics)
|
|
318
|
+
- GitLab API (MR comments, via CI_JOB_TOKEN)
|
|
375
319
|
|
|
376
|
-
|
|
377
|
-
|
|
378
|
-
| 0% (idle) | ~1.7% |
|
|
379
|
-
| 10% | ~3.4% |
|
|
380
|
-
| 50% | ~16.9% |
|
|
381
|
-
| 100% | 100% |
|
|
320
|
+
GitGreen has no backend server.
|
|
321
|
+
</details>
|
|
382
322
|
|
|
383
|
-
|
|
384
|
-
|
|
385
|
-
P_cpu(util) = CubicSpline([0, 10, 50, 100], [W_idle, W_10, W_50, W_100])(util)
|
|
386
|
-
```
|
|
323
|
+
<details>
|
|
324
|
+
<summary><strong>Will it break my .gitlab-ci.yml?</strong></summary>
|
|
387
325
|
|
|
388
|
-
|
|
326
|
+
No. GitGreen creates a backup before any changes, only appends content (never modifies existing jobs), and asks for confirmation.
|
|
327
|
+
</details>
|
|
389
328
|
|
|
390
|
-
|
|
329
|
+
<details>
|
|
330
|
+
<summary><strong>Can I run it without the wizard?</strong></summary>
|
|
391
331
|
|
|
392
|
-
|
|
332
|
+
Yes. Set CI/CD variables manually and run `gitgreen` with CLI options.
|
|
333
|
+
</details>
|
|
393
334
|
|
|
394
|
-
|
|
395
|
-
P_vm = TDP × ratio × (vm_vcpus / physical_threads)
|
|
396
|
-
```
|
|
335
|
+
## Contributing
|
|
397
336
|
|
|
398
|
-
|
|
337
|
+
1. Fork the repository
|
|
338
|
+
2. Create a feature branch
|
|
339
|
+
3. Run tests: `npm test`
|
|
340
|
+
4. Submit a pull request
|
|
399
341
|
|
|
400
|
-
|
|
401
|
-
|-----|-------|---------|-----|--------|
|
|
402
|
-
| Intel Xeon Gold 6268CL | 24 | 48 | 205W | [eBay/Intel](https://www.ebay.com/p/2321792675) |
|
|
403
|
-
| Intel Xeon Gold 6253CL | 18 | 36 | 205W | [PassMark](https://www.cpubenchmark.net/cpu.php?cpu=Intel+Xeon+Gold+6253CL) |
|
|
404
|
-
| Intel Xeon Platinum 8481C | 56 | 112 | 350W | [TechPowerUp](https://www.techpowerup.com/cpu-specs/xeon-platinum-8481c.c3992) |
|
|
405
|
-
| Intel Xeon Platinum 8373C | 36 | 72 | 300W | [Wikipedia](https://en.wikipedia.org/wiki/List_of_Intel_Xeon_processors_(Ice_Lake-based)) |
|
|
406
|
-
| AMD EPYC 7B12 | 64 | 128 | 240W | [Newegg](https://www.newegg.com/p/1FR-00G6-00026) |
|
|
407
|
-
| Ampere Altra Q64-30 | 64 | 64 | 180W | [Ampere](https://amperecomputing.com/en/briefs/ampere-altra-family-product-brief) |
|
|
408
|
-
|
|
409
|
-
**Example:** For `n2-standard-80` (80 vCPUs, Xeon Gold 6268CL with 48 threads):
|
|
410
|
-
- Old (inflated): 205W × 80 = 16,400W at 100%
|
|
411
|
-
- Corrected: 205W × (80/48) = 342W at 100%
|
|
342
|
+
## References
|
|
412
343
|
|
|
413
|
-
###
|
|
414
|
-
|
|
415
|
-
Scope 3 emissions represent the carbon footprint of manufacturing, shipping, and disposing of hardware. Based on Dell PowerEdge R740 LCA data:
|
|
416
|
-
|
|
417
|
-
| Component | Emissions (kgCO2eq) |
|
|
418
|
-
|-----------|---------------------|
|
|
419
|
-
| Base server (1 socket, low DRAM) | ~1000 |
|
|
420
|
-
| Per additional CPU | ~100 |
|
|
421
|
-
| Per 32GB DIMM | ~44 |
|
|
422
|
-
| Per SSD | ~50-100 |
|
|
344
|
+
### Research & Methodology
|
|
423
345
|
|
|
424
|
-
|
|
346
|
+
| Source | Description |
|
|
347
|
+
|--------|-------------|
|
|
348
|
+
| [re:cinq Cloud CPU Energy Consumption](https://re-cinq.com/blog/cloud-cpu-energy-consumption) | CPU power modeling methodology |
|
|
349
|
+
| [re:cinq emissions-data](https://github.com/re-cinq/emissions-data) | Machine power profiles and ratios |
|
|
350
|
+
| [Teads Engineering](https://engineering.teads.com/) | Original research on cloud carbon |
|
|
351
|
+
| Dell PowerEdge R740 LCA | Scope 3 embodied carbon data |
|
|
425
352
|
|
|
426
353
|
### Data Sources
|
|
427
354
|
|
|
428
355
|
| Data | Source |
|
|
429
356
|
|------|--------|
|
|
430
|
-
|
|
|
431
|
-
| GCP machine
|
|
432
|
-
|
|
|
433
|
-
|
|
|
434
|
-
|
|
|
435
|
-
|
|
436
|
-
### Accuracy & Limitations
|
|
437
|
-
|
|
438
|
-
#### What's grounded in real research
|
|
357
|
+
| Real-time carbon intensity | [Electricity Maps API](https://www.electricitymaps.com/) |
|
|
358
|
+
| GCP machine specifications | [Google Cloud CPU Platforms](https://cloud.google.com/compute/docs/cpu-platforms) |
|
|
359
|
+
| GCP data center PUE | [Google Data Center Efficiency](https://www.google.com/about/datacenters/efficiency/) |
|
|
360
|
+
| Intel CPU specifications | [Intel ARK](https://ark.intel.com/) |
|
|
361
|
+
| AMD CPU specifications | [AMD Product Pages](https://www.amd.com/en/products/specifications/processors) |
|
|
362
|
+
| Ampere CPU specifications | [Ampere Product Briefs](https://amperecomputing.com/briefs) |
|
|
439
363
|
|
|
440
|
-
|
|
441
|
-
|-----------|-------------|--------|
|
|
442
|
-
| CPU power ratios | Measured at 0%, 10%, 50%, 100% utilization | [re:cinq/Teads](https://github.com/re-cinq/emissions-data) |
|
|
443
|
-
| Physical CPU specs | Official vendor specifications | Intel ARK, TechPowerUp, AMD, Ampere |
|
|
444
|
-
| Carbon intensity | Real-time grid data | [Electricity Maps API](https://www.electricitymaps.com/) |
|
|
445
|
-
| PUE values | Published data center efficiency | [Google](https://www.google.com/about/datacenters/efficiency/) |
|
|
446
|
-
| Scope 3 methodology | Life cycle assessment | Dell PowerEdge R740 LCA |
|
|
364
|
+
### CPU Specifications Used
|
|
447
365
|
|
|
448
|
-
|
|
449
|
-
|
|
450
|
-
|
|
451
|
-
|
|
366
|
+
| CPU | Cores | Threads | TDP | Source |
|
|
367
|
+
|-----|-------|---------|-----|--------|
|
|
368
|
+
| Intel Xeon Gold 6268CL | 24 | 48 | 205W | [Product Listing](https://www.ebay.com/p/2321792675) |
|
|
369
|
+
| Intel Xeon Gold 6253CL | 18 | 36 | 205W | [PassMark](https://www.cpubenchmark.net/cpu.php?cpu=Intel+Xeon+Gold+6253CL) |
|
|
370
|
+
| Intel Xeon Platinum 8481C | 56 | 112 | 350W | [TechPowerUp](https://www.techpowerup.com/cpu-specs/xeon-platinum-8481c.c3992) |
|
|
371
|
+
| Intel Xeon Platinum 8373C | 36 | 72 | 300W | [Wikipedia](https://en.wikipedia.org/wiki/List_of_Intel_Xeon_processors_(Ice_Lake-based)) |
|
|
372
|
+
| AMD EPYC 7B12 | 64 | 128 | 240W | [Newegg](https://www.newegg.com/p/1FR-00G6-00026) |
|
|
373
|
+
| Ampere Altra Q64-30 | 64 | 64 | 180W | [Ampere Brief](https://amperecomputing.com/en/briefs/ampere-altra-family-product-brief) |
|
|
452
374
|
|
|
453
|
-
|
|
375
|
+
## License
|
|
454
376
|
|
|
455
|
-
|
|
456
|
-
|------------|--------|-------|
|
|
457
|
-
| RAM uses fixed 0.5 W/GB | Low | Industry standard estimate for DDR4 |
|
|
458
|
-
| Scope 3 only for some AWS types | Medium | GCP scope 3 data not yet available |
|
|
459
|
-
| No GPU modeling | High (if using GPUs) | GPU-heavy jobs will be underestimated |
|
|
460
|
-
| No network I/O modeling | Low | Typically <5% of job power |
|
|
461
|
-
| No storage I/O modeling | Low | Typically <5% of job power |
|
|
462
|
-
| Multi-tenant overhead | Low | Actual power may be 5-10% lower due to shared resources |
|
|
463
|
-
| CPU specs may be incomplete | Medium | Falls back to unscaled profile if CPU not in database |
|
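The fixed 0.5 W/GB RAM figure in the table above translates into energy with simple arithmetic. The following is a minimal sketch of that conversion; the function name and signature are hypothetical and only the 0.5 W/GB constant comes from the table.

```python
RAM_WATTS_PER_GB = 0.5  # fixed DDR4 estimate quoted in the limitations table


def ram_energy_kwh(memory_gb: float, runtime_hours: float) -> float:
    """Hypothetical helper: RAM energy in kWh for a job of the given duration."""
    watts = memory_gb * RAM_WATTS_PER_GB
    return watts * runtime_hours / 1000.0  # W x h -> kWh


if __name__ == "__main__":
    # A 16 GB runner busy for 30 minutes.
    print(ram_energy_kwh(16, 0.5))
```

Because the factor is constant, RAM energy in this model depends only on allocated memory and runtime, not on how much memory the job actually touches.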
|
464
|
-
|
|
465
|
-
#### What this means for you
|
|
466
|
-
|
|
467
|
-
- **CI/CD optimization**: The relative comparisons are reliable - if job A shows 2x the emissions of job B, that's meaningful
|
|
468
|
-
- **Absolute reporting**: Use the numbers for directional guidance, not precise carbon accounting
|
|
469
|
-
- **Trend tracking**: Week-over-week and month-over-month trends are accurate
|
|
470
|
-
- **GPU workloads**: Currently underestimated - GPU power not modeled
|
|
471
|
-
|
|
472
|
-
### Source Data
|
|
473
|
-
|
|
474
|
-
Raw source data files are available in `data/`:
|
|
475
|
-
- `cpu_physical_specs.json` - Physical CPU specs with thread counts and sources (our research)
|
|
476
|
-
- `cpu_power_profiles.json` - TDP and power ratios per CPU type
|
|
477
|
-
- `gcp_machine_power_profiles.json` - GCP machine type to power mappings
|
|
478
|
-
- `aws_machine_power_profiles.json` - AWS instance type to power mappings
|
|
479
|
-
|
|
480
|
-
Original re:cinq data in `data/source/`:
|
|
481
|
-
- `GCP Machine types - CPU Profiles.csv` - TDP and power ratios per CPU
|
|
482
|
-
- `GCP Machine types - Instances.csv` - Machine type to CPU mappings
|
|
483
|
-
- `GCP Machine types - Scope 3 Ratios.csv` - Embodied carbon factors
|
|
484
|
-
- `GCP Machine types - Dell R740 LCA.csv` - Life cycle assessment reference
|
|
377
|
+
MIT License - see [LICENSE](LICENSE) for details.
|
|
485
378
|
|
|
486
|
-
|
|
379
|
+
---
|
|
487
380
|
|
|
488
|
-
|
|
381
|
+
<p align="center">
|
|
382
|
+
<sub>Built with care for a sustainable software future.</sub>
|
|
383
|
+
</p>
|
package/package.json
CHANGED
package/scripts/build_power_profiles.py
|
@@ -1,268 +0,0 @@
|
|
|
1
|
-
#!/usr/bin/env python3
|
|
2
|
-
"""
|
|
3
|
-
Build complete GCP machine power profiles by correlating data across multiple CSV files.
|
|
4
|
-
The power data is spread across different files and needs to be combined.
|
|
5
|
-
"""
|
|
6
|
-
import pandas as pd
|
|
7
|
-
import json
|
|
8
|
-
import numpy as np
|
|
9
|
-
|
|
10
|
-
def find_actual_power_data():
|
|
11
|
-
"""
|
|
12
|
-
Search through all CSV files to find where the actual power consumption data is
|
|
13
|
-
"""
|
|
14
|
-
print("=== SEARCHING FOR ACTUAL POWER DATA ===")
|
|
15
|
-
|
|
16
|
-
# 1. Check Machine Ratios file
|
|
17
|
-
print("\n--- Machine Ratios File ---")
|
|
18
|
-
ratios_df = pd.read_csv("data/GCP Machine types - Machine Ratios.csv")
|
|
19
|
-
|
|
20
|
-
# Look for non-NaN power data
|
|
21
|
-
power_cols = ['CPU Name', 'PkgWatt Idle', 'PkgWatt CPUStress 10%', 'PkgWatt CPUStress 50%', 'PkgWatt Average 100%']
|
|
22
|
-
power_data = ratios_df[power_cols].dropna()
|
|
23
|
-
|
|
24
|
-
print(f"Rows with power data: {len(power_data)}")
|
|
25
|
-
if len(power_data) > 0:
|
|
26
|
-
print("Sample power data from Machine Ratios:")
|
|
27
|
-
print(power_data.head())
|
|
28
|
-
|
|
29
|
-
# 2. Check CPU Profiles file
|
|
30
|
-
print("\n--- CPU Profiles File ---")
|
|
31
|
-
cpu_df = pd.read_csv("data/GCP Machine types - CPU Profiles.csv")
|
|
32
|
-
|
|
33
|
-
# Look for ratio data that can be used to calculate power
|
|
34
|
-
ratio_cols = ['Processor SKU', 'TDP (W)', 'IDLE ratio', '10% ratio', '50% Ratio']
|
|
35
|
-
cpu_power = cpu_df[ratio_cols].dropna()
|
|
36
|
-
|
|
37
|
-
print(f"Rows with CPU ratio data: {len(cpu_power)}")
|
|
38
|
-
if len(cpu_power) > 0:
|
|
39
|
-
print("Sample CPU ratio data:")
|
|
40
|
-
print(cpu_power.head())
|
|
41
|
-
|
|
42
|
-
# 3. Check Bare Metal Profiles (this had actual power data in the first analysis)
|
|
43
|
-
print("\n--- Bare Metal Power Profiles ---")
|
|
44
|
-
bare_metal_df = pd.read_csv("data/GCP Machine types - Bare Metal Power Profiles.csv")
|
|
45
|
-
|
|
46
|
-
# Find rows with actual power measurements
|
|
47
|
-
bare_metal_power_cols = ['Product Name', 'PkgWatt Idle', 'PkgWatt CPUStress 10%', 'PkgWatt CPUStress 50%', 'PkgWatt CPUStress 100%']
|
|
48
|
-
bare_metal_power = bare_metal_df[bare_metal_power_cols].dropna()
|
|
49
|
-
|
|
50
|
-
print(f"Rows with bare metal power data: {len(bare_metal_power)}")
|
|
51
|
-
if len(bare_metal_power) > 0:
|
|
52
|
-
print("Sample bare metal power data:")
|
|
53
|
-
print(bare_metal_power.head())
|
|
54
|
-
|
|
55
|
-
return power_data, cpu_power, bare_metal_power
|
|
56
|
-
|
|
57
|
-
def correlate_cpu_to_machines():
|
|
58
|
-
"""
|
|
59
|
-
Try to correlate CPU types to GCP machine instances
|
|
60
|
-
"""
|
|
61
|
-
print("\n=== CORRELATING CPU TYPES TO GCP MACHINES ===")
|
|
62
|
-
|
|
63
|
-
# Load instances file
|
|
64
|
-
instances_df = pd.read_csv("data/GCP Machine types - Instances.csv")
|
|
65
|
-
|
|
66
|
-
# Look at CPU information in instances
|
|
67
|
-
cpu_info_cols = ['Instance type', 'Platform CPU Name', 'Instance vCPU', 'Instance Memory (in GB)']
|
|
68
|
-
cpu_info = instances_df[cpu_info_cols].dropna(subset=['Platform CPU Name'])
|
|
69
|
-
|
|
70
|
-
print(f"Machines with CPU information: {len(cpu_info)}")
|
|
71
|
-
if len(cpu_info) > 0:
|
|
72
|
-
print("Sample CPU information:")
|
|
73
|
-
print(cpu_info.head(10))
|
|
74
|
-
|
|
75
|
-
# Get unique CPU types used in GCP
|
|
76
|
-
unique_cpus = cpu_info['Platform CPU Name'].unique()
|
|
77
|
-
print(f"\nUnique CPU types in GCP: {len(unique_cpus)}")
|
|
78
|
-
for cpu in unique_cpus[:10]: # Show first 10
|
|
79
|
-
print(f" - {cpu}")
|
|
80
|
-
|
|
81
|
-
return cpu_info
|
|
82
|
-
|
|
83
|
-
def build_power_profiles_from_ratios():
|
|
84
|
-
"""
|
|
85
|
-
Build power profiles using CPU ratios and TDP values
|
|
86
|
-
"""
|
|
87
|
-
print("\n=== BUILDING POWER PROFILES FROM CPU RATIOS ===")
|
|
88
|
-
|
|
89
|
-
# Load CPU profiles with ratios
|
|
90
|
-
cpu_df = pd.read_csv("data/GCP Machine types - CPU Profiles.csv")
|
|
91
|
-
|
|
92
|
-
# Clean and process the data
|
|
93
|
-
ratio_data = cpu_df[['Processor SKU', 'TDP (W)', 'IDLE ratio', '10% ratio', '50% Ratio']].dropna()
|
|
94
|
-
|
|
95
|
-
if len(ratio_data) > 0:
|
|
96
|
-
print(f"CPUs with ratio data: {len(ratio_data)}")
|
|
97
|
-
|
|
98
|
-
# Calculate actual power consumption from ratios
|
|
99
|
-
power_profiles = {}
|
|
100
|
-
|
|
101
|
-
for _, row in ratio_data.iterrows():
|
|
102
|
-
cpu_name = row['Processor SKU']
|
|
103
|
-
tdp_watts = row['TDP (W)']
|
|
104
|
-
idle_ratio = row['IDLE ratio']
|
|
105
|
-
ratio_10 = row['10% ratio']
|
|
106
|
-
ratio_50 = row['50% Ratio']
|
|
107
|
-
|
|
108
|
-
# Calculate power at different utilization levels
|
|
109
|
-
# Using TDP and ratios to estimate power consumption
|
|
110
|
-
power_profiles[cpu_name] = {
|
|
111
|
-
'tdp_watts': tdp_watts,
|
|
112
|
-
'power_profile': [
|
|
113
|
-
{'percentage': 0, 'watts': tdp_watts * idle_ratio},
|
|
114
|
-
{'percentage': 10, 'watts': tdp_watts * ratio_10},
|
|
115
|
-
{'percentage': 50, 'watts': tdp_watts * ratio_50},
|
|
116
|
-
{'percentage': 100, 'watts': tdp_watts} # Assume 100% = TDP
|
|
117
|
-
]
|
|
118
|
-
}
|
|
119
|
-
|
|
120
|
-
print("\nSample calculated power profiles:")
|
|
121
|
-
for i, (cpu_name, profile) in enumerate(power_profiles.items()):
|
|
122
|
-
if i < 3: # Show first 3
|
|
123
|
-
print(f"\n{cpu_name}:")
|
|
124
|
-
print(f" TDP: {profile['tdp_watts']}W")
|
|
125
|
-
for point in profile['power_profile']:
|
|
126
|
-
print(f" {point['percentage']}%: {point['watts']:.1f}W")
|
|
127
|
-
|
|
128
|
-
# Save CPU power profiles
|
|
129
|
-
with open('cpu_power_profiles.json', 'w') as f:
|
|
130
|
-
json.dump(power_profiles, f, indent=2)
|
|
131
|
-
|
|
132
|
-
print(f"\n✅ Saved {len(power_profiles)} CPU power profiles to cpu_power_profiles.json")
|
|
133
|
-
|
|
134
|
-
return power_profiles
|
|
135
|
-
else:
|
|
136
|
-
print("❌ No usable ratio data found")
|
|
137
|
-
return {}
|
|
138
|
-
|
|
139
|
-
def map_gcp_machines_to_power():
|
|
140
|
-
"""
|
|
141
|
-
Map GCP machine types to their CPU power profiles
|
|
142
|
-
"""
|
|
143
|
-
print("\n=== MAPPING GCP MACHINES TO POWER PROFILES ===")
|
|
144
|
-
|
|
145
|
-
# Load instances and find CPU mappings
|
|
146
|
-
instances_df = pd.read_csv("data/GCP Machine types - Instances.csv")
|
|
147
|
-
cpu_info = instances_df[['Instance type', 'Platform CPU Name', 'Instance vCPU', 'Instance Memory (in GB)']].dropna(subset=['Platform CPU Name'])
|
|
148
|
-
|
|
149
|
-
# Load CPU power profiles
|
|
150
|
-
try:
|
|
151
|
-
with open('cpu_power_profiles.json', 'r') as f:
|
|
152
|
-
cpu_power_profiles = json.load(f)
|
|
153
|
-
except:
|
|
154
|
-
print("❌ CPU power profiles not found. Run build_power_profiles_from_ratios() first.")
|
|
155
|
-
return {}
|
|
156
|
-
|
|
157
|
-
# Map GCP machines to their power profiles
|
|
158
|
-
gcp_machine_profiles = {}
|
|
159
|
-
|
|
160
|
-
for _, row in cpu_info.iterrows():
|
|
161
|
-
machine_type = row['Instance type']
|
|
162
|
-
cpu_name = row['Platform CPU Name']
|
|
163
|
-
vcpus = row['Instance vCPU']
|
|
164
|
-
memory_gb = row['Instance Memory (in GB)']
|
|
165
|
-
|
|
166
|
-
# Find matching CPU power profile (exact match or partial match)
|
|
167
|
-
matching_cpu = None
|
|
168
|
-
for cpu_profile_name in cpu_power_profiles.keys():
|
|
169
|
-
if cpu_name in cpu_profile_name or cpu_profile_name in cpu_name:
|
|
170
|
-
matching_cpu = cpu_profile_name
|
|
171
|
-
break
|
|
172
|
-
|
|
173
|
-
if matching_cpu:
|
|
174
|
-
# Scale power consumption based on vCPUs (since profiles are per-CPU)
|
|
175
|
-
base_profile = cpu_power_profiles[matching_cpu]
|
|
176
|
-
|
|
177
|
-
gcp_machine_profiles[machine_type] = {
|
|
178
|
-
'vcpus': vcpus,
|
|
179
|
-
'memory_gb': memory_gb,
|
|
180
|
-
'platform_cpu': cpu_name,
|
|
181
|
-
'matched_cpu_profile': matching_cpu,
|
|
182
|
-
'cpu_power_profile': [
|
|
183
|
-
{
|
|
184
|
-
'percentage': point['percentage'],
|
|
185
|
-
'watts': point['watts'] * (vcpus / base_profile.get('vcpus', 1)) # Scale by vCPU count
|
|
186
|
-
}
|
|
187
|
-
for point in base_profile['power_profile']
|
|
188
|
-
]
|
|
189
|
-
}
|
|
190
|
-
|
|
191
|
-
if gcp_machine_profiles:
|
|
192
|
-
print(f"✅ Mapped {len(gcp_machine_profiles)} GCP machines to power profiles")
|
|
193
|
-
|
|
194
|
-
# Show sample mappings
|
|
195
|
-
print("\nSample GCP machine power profiles:")
|
|
196
|
-
for i, (machine, profile) in enumerate(gcp_machine_profiles.items()):
|
|
197
|
-
if i < 3:
|
|
198
|
-
print(f"\n{machine} ({profile['vcpus']} vCPUs, {profile['memory_gb']}GB):")
|
|
199
|
-
print(f" CPU: {profile['platform_cpu']}")
|
|
200
|
-
print(f" Matched profile: {profile['matched_cpu_profile']}")
|
|
201
|
-
for point in profile['cpu_power_profile']:
|
|
202
|
-
print(f" {point['percentage']}%: {point['watts']:.1f}W")
|
|
203
|
-
|
|
204
|
-
# Save complete GCP machine power profiles
|
|
205
|
-
with open('gcp_machine_power_profiles.json', 'w') as f:
|
|
206
|
-
json.dump(gcp_machine_profiles, f, indent=2)
|
|
207
|
-
|
|
208
|
-
print(f"\n✅ Saved complete GCP machine power profiles to gcp_machine_power_profiles.json")
|
|
209
|
-
|
|
210
|
-
return gcp_machine_profiles
|
|
211
|
-
else:
|
|
212
|
-
print("❌ Could not map any GCP machines to power profiles")
|
|
213
|
-
return {}
|
|
214
|
-
|
|
215
|
-
def final_recommendations():
|
|
216
|
-
"""
|
|
217
|
-
Provide final implementation recommendations based on available data
|
|
218
|
-
"""
|
|
219
|
-
print("\n" + "="*60)
|
|
220
|
-
print("FINAL IMPLEMENTATION RECOMMENDATIONS")
|
|
221
|
-
print("="*60)
|
|
222
|
-
|
|
223
|
-
try:
|
|
224
|
-
with open('gcp_machine_power_profiles.json', 'r') as f:
|
|
225
|
-
profiles = json.load(f)
|
|
226
|
-
|
|
227
|
-
print("✅ SUCCESS: Ready for re:cinq implementation!")
|
|
228
|
-
print(f" - {len(profiles)} GCP machine types with power profiles")
|
|
229
|
-
print(" - Power consumption curves: 0%, 10%, 50%, 100% CPU utilization")
|
|
230
|
-
print(" - Data ready for cubic spline interpolation")
|
|
231
|
-
|
|
232
|
-
print("\n📋 IMPLEMENTATION STEPS:")
|
|
233
|
-
print("1. Load gcp_machine_power_profiles.json in CarbonService")
|
|
234
|
-
print("2. Extract machine type from GitLab runner tags")
|
|
235
|
-
print("3. Use cubic-spline package for power interpolation")
|
|
236
|
-
print("4. Get real-time carbon intensity from Electricity Maps API")
|
|
237
|
-
print("5. Apply Google data center PUE values")
|
|
238
|
-
print("6. Calculate: interpolated_power(kW) × runtime(h) × PUE × carbon_intensity")
|
|
239
|
-
|
|
240
|
-
print(f"\n🔧 SAMPLE CALCULATION CODE:")
|
|
241
|
-
sample_machine = list(profiles.keys())[0]
|
|
242
|
-
sample_profile = profiles[sample_machine]
|
|
243
|
-
print(f"// Example for {sample_machine}")
|
|
244
|
-
print("const powerProfile = [")
|
|
245
|
-
for point in sample_profile['cpu_power_profile']:
|
|
246
|
-
print(f" {{ percentage: {point['percentage']}, watts: {point['watts']:.1f} }},")
|
|
247
|
-
print("];")
|
|
248
|
-
print("const powerWatts = cubicSplineInterpolation(powerProfile, cpuUtilization);")
|
|
249
|
-
|
|
250
|
-
except FileNotFoundError:
|
|
251
|
-
print("❌ No machine power profiles generated")
|
|
252
|
-
print(" Run the correlation functions first")
|
|
253
|
-
|
|
254
|
-
if __name__ == "__main__":
|
|
255
|
-
# Step 1: Find where actual power data exists
|
|
256
|
-
machine_ratios_power, cpu_ratios, bare_metal_power = find_actual_power_data()
|
|
257
|
-
|
|
258
|
-
# Step 2: Correlate CPU information
|
|
259
|
-
cpu_info = correlate_cpu_to_machines()
|
|
260
|
-
|
|
261
|
-
# Step 3: Build power profiles from available data
|
|
262
|
-
cpu_profiles = build_power_profiles_from_ratios()
|
|
263
|
-
|
|
264
|
-
# Step 4: Map GCP machines to power profiles
|
|
265
|
-
gcp_profiles = map_gcp_machines_to_power()
|
|
266
|
-
|
|
267
|
-
# Step 5: Final recommendations
|
|
268
|
-
final_recommendations()
|
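The removed script builds a four-point power profile per CPU (TDP multiplied by the idle, 10%, and 50% ratios, with 100% load assumed equal to TDP) and lists the emissions formula as `interpolated_power(kW) × runtime(h) × PUE × carbon_intensity`. The sketch below strings those two steps together. It is an illustration under stated assumptions: piecewise-linear interpolation is swapped in for the cubic spline the script mentions, carbon intensity is assumed to be in gCO₂e/kWh, the function names are hypothetical, and the ratio values in the example are made up rather than taken from the re:cinq data.

```python
from bisect import bisect_right


def build_power_profile(tdp_watts, idle_ratio, ratio_10, ratio_50):
    """Four-point profile, mirroring build_power_profiles_from_ratios()."""
    return [
        {"percentage": 0, "watts": tdp_watts * idle_ratio},
        {"percentage": 10, "watts": tdp_watts * ratio_10},
        {"percentage": 50, "watts": tdp_watts * ratio_50},
        {"percentage": 100, "watts": tdp_watts},  # assume 100% load == TDP
    ]


def interpolate_watts(profile, cpu_utilization):
    """Piecewise-linear stand-in for the cubic spline named in the script."""
    xs = [p["percentage"] for p in profile]
    ys = [p["watts"] for p in profile]
    u = min(max(cpu_utilization, xs[0]), xs[-1])  # clamp to the profile range
    i = min(bisect_right(xs, u), len(xs) - 1)     # index of the right-hand knot
    x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (u - x0) / (x1 - x0)


def job_emissions_g(power_watts, runtime_hours, pue, intensity_g_per_kwh):
    """power(kW) x runtime(h) x PUE x carbon intensity (assumed gCO2e/kWh)."""
    return (power_watts / 1000.0) * runtime_hours * pue * intensity_g_per_kwh


if __name__ == "__main__":
    # Made-up ratios for a 200 W part; not values from the source CSVs.
    profile = build_power_profile(200, idle_ratio=0.1, ratio_10=0.3, ratio_50=0.7)
    watts = interpolate_watts(profile, 30)
    print(watts, job_emissions_g(watts, 0.5, 1.1, 400))
```

With these made-up inputs, 30% utilization falls halfway between the 10% and 50% knots, giving roughly 100 W and about 22 gCO₂e for a half-hour job. A cubic spline would trace a smoother curve through the same four points, so its values between knots would differ somewhat from the linear ones.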