gitgreen 1.3.0 → 1.4.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -1,488 +1,383 @@
1
- # GitGreen CLI
1
+ <p align="center">
2
+ <h1 align="center">GitGreen</h1>
3
+ <p align="center">
4
+ <strong>Carbon footprint tracking for GitLab CI/CD pipelines</strong>
5
+ </p>
6
+ <p align="center">
7
+ <a href="#quick-start">Quick Start</a> •
8
+ <a href="#installation">Installation</a> •
9
+ <a href="#usage">Usage</a> •
10
+ <a href="#methodology">Methodology</a> •
11
+ <a href="#faq">FAQ</a>
12
+ </p>
13
+ </p>
14
+
15
+ ---
16
+
17
+ [![npm version](https://img.shields.io/npm/v/gitgreen.svg)](https://www.npmjs.com/package/gitgreen)
18
+ [![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](LICENSE)
19
+ [![Node.js](https://img.shields.io/badge/Node.js-20+-339933?logo=node.js&logoColor=white)](https://nodejs.org/)
20
+
21
+ GitGreen measures the carbon emissions of your CI/CD jobs using real CPU/RAM metrics, grid carbon intensity data, and research-backed power models. Works with **GCP** and **AWS** runners.
2
22
 
3
- Self-contained carbon calculation CLI for GitLab jobs (no API server). It reuses the same power profiles and budget reporting as the existing implementation, pulls Electricity Maps intensity, and can post Merge Request notes via `CI_JOB_TOKEN`. Works with both GCP and AWS runners.
4
-
5
- ## Prerequisites
6
-
7
- Before installing GitGreen CLI, ensure you have the following installed and configured:
8
-
9
- ### Required
10
-
11
- - **Node.js** (v20 or higher) - [Download](https://nodejs.org/)
12
- - Required to run the CLI tool
13
- - Check version: `node --version`
23
+ ```
24
+ Total emissions: 2.847 gCO₂e
25
+ ├── CPU: 1.923 gCO₂e
26
+ ├── RAM: 0.412 gCO₂e
27
+ └── Scope 3: 0.512 gCO₂e (embodied carbon)
28
+ ```
14
29
 
15
- - **npm** or **pnpm** - Package manager (comes with Node.js)
16
- - This project uses `pnpm@8.15.4` as the package manager
17
- - Install pnpm: `npm install -g pnpm`
30
+ ## Quick Start
18
31
 
19
- - **Git** - [Download](https://git-scm.com/)
20
- - Required for detecting GitLab project information during initialization
21
- - Check version: `git --version`
32
+ ```bash
33
+ # Install globally
34
+ npm install -g gitgreen
22
35
 
23
- ### Cloud Provider CLIs (Required based on provider)
36
+ # Initialize in your GitLab project
37
+ cd your-repo
38
+ gitgreen init
39
+ ```
24
40
 
25
- - **Google Cloud SDK (gcloud CLI)** - [Installation Guide](https://cloud.google.com/sdk/docs/install-sdk)
26
- - Required for GCP provider setup and authentication
27
- - After installation, authenticate: `gcloud auth login`
28
- - Set default project: `gcloud config set project YOUR_PROJECT_ID`
29
- - Check version: `gcloud --version`
41
+ The wizard configures everything: provider credentials, carbon budgets, and CI/CD integration.
30
42
 
31
- - **AWS Credentials** - [AWS CLI Installation](https://aws.amazon.com/cli/) (optional, but recommended)
32
- - Required for AWS provider (can use environment variables or AWS CLI config)
33
- - Configure credentials: `aws configure`
34
- - Or set environment variables: `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_REGION`
43
+ ## Table of Contents
35
44
 
36
- ### Optional (but recommended)
45
+ - [Installation](#installation)
46
+ - [Usage](#usage)
47
+ - [Advanced](#advanced)
48
+ - [Providers](#providers)
49
+ - [Output Integrations](#output-integrations)
50
+ - [Database Schema](#database-schema)
51
+ - [Methodology](#methodology)
52
+ - [Architecture](#architecture)
53
+ - [Configuration](#configuration)
54
+ - [FAQ](#faq)
55
+ - [Contributing](#contributing)
56
+ - [References](#references)
57
+ - [License](#license)
37
58
 
38
- - **GitLab CLI (glab)** - [Installation Guide](https://docs.gitlab.com/cli/)
39
- - Recommended for easier GitLab authentication and CI/CD variable management
40
- - After installation, authenticate: `glab auth login`
41
- - Check version: `glab --version`
42
- - If not installed, you'll need to provide a GitLab Personal Access Token manually
59
+ ## Installation
43
60
 
44
- ### API Keys
61
+ ### Prerequisites
45
62
 
46
- - **Electricity Maps API Key** - [Get free key](https://api-portal.electricitymaps.com)
47
- - Required for carbon intensity data
48
- - You'll be prompted for this during `gitgreen init`
63
+ | Requirement | Version | Notes |
64
+ |-------------|---------|-------|
65
+ | Node.js | 20+ | [Download](https://nodejs.org/) |
66
+ | Git | Any | For GitLab project detection |
67
+ | gcloud CLI | Any | Required for GCP ([Install](https://cloud.google.com/sdk/docs/install-sdk)) |
68
+ | AWS CLI | Any | Required for AWS ([Install](https://aws.amazon.com/cli/)) |
69
+ | glab CLI | Any | Optional, for easier GitLab auth ([Install](https://docs.gitlab.com/cli/)) |
49
70
 
50
- ## Install
51
- - From npm (global CLI):
52
- ```bash
53
- npm install -g gitgreen
54
- gitgreen --help
55
- ```
71
+ ### Install
56
72
 
57
- Run tests:
58
73
  ```bash
59
- npm test
74
+ npm install -g gitgreen
60
75
  ```
61
76
 
62
- Stress test multiple live configs (build first, real APIs):
63
- ```bash
64
- npm build
65
- npm stress
66
- ```
77
+ ### API Keys
67
78
 
68
- ## Usage
79
+ - **Electricity Maps** (required): [Get free API key](https://api-portal.electricitymaps.com)
69
80
 
70
- ### Initializing a New GitLab Project
81
+ ## Usage
71
82
 
72
- To get started with carbon tracking in any GitLab project (any project with a remote pointing to a GitLab instance), run:
83
+ ### Initialize a Project
73
84
 
74
85
  ```bash
75
- # In your repo
76
86
  gitgreen init
77
87
  ```
78
88
 
79
- The initialization wizard will guide you through the setup process:
80
- - Configure provider/machine/region settings
81
- - Set carbon budgets
82
- - Configure MR note preferences
83
- - Choose to use an existing runner or spin up a new one with supported providers
84
-
85
- After initialization, the wizard will:
86
- - Append a ready-made job to your `.gitlab-ci.yml`
87
- - Print the CI/CD variable checklist (ELECTRICITY_MAPS_API_KEY, provider credentials, budget flags)
88
-
89
- ### How It Works
90
-
91
- Once initialized, all subsequent pipelines will run on the configured runner, and their performance will be automatically measured. The carbon tracking is implemented using GitLab CI/CD components as a final step in your pipeline. The carbon tracking job itself is not computationally expensive, so it adds minimal overhead to your CI/CD workflows.
92
-
93
- Key environment variables:
94
- - `ELECTRICITY_MAPS_API_KEY` (required) - Get a free key from https://api-portal.electricitymaps.com
95
- - `GCP_PROJECT_ID` / `GOOGLE_CLOUD_PROJECT` (required for GCP) - Your GCP project ID
96
- - `AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY` (required for AWS CloudWatch) - Credentials with CloudWatch read access on the runner
97
- - `AWS_REGION` (for AWS) - Region of the runner EC2 instance
98
-
99
- The `gitgreen init` wizard will automatically set all necessary GitLab CI/CD variables for you.
100
-
101
- ## Providers
102
-
103
- GitGreen supports multiple cloud providers:
104
-
105
- - **GCP**: Fully wired — the CLI parser expects GCP Monitoring JSON, the GitLab component polls GCE metrics, and `gitgreen init` provisions or reuses GCP runners.
106
- - **AWS**: CloudWatch-backed — the CLI can pull metrics directly via `--from-cloudwatch`, the GitLab component fetches CloudWatch CPU/RAM data, and `gitgreen init` can configure or provision EC2 runners.
107
-
108
- ## Architecture
109
- Runtime flow:
110
- ```
111
- CPU/RAM timeseries (GCP Monitoring JSON or custom collector)
112
- |
113
- v
114
- +------------------+ +----------------------------+
115
- | CLI (gitgreen) |--->| CarbonCalculator |
116
- | - parse metrics | | - PowerProfileRepository |<- machine data in data/*.json
117
- | - CLI/env opts | | - ZoneMapper (region→zone) |<- static + runtime PUE mapping
118
- +------------------+ | - IntensityProvider |<- Electricity Maps API
119
- +-------------+------------+
120
- |
121
- v
122
- +-----------------------------+
123
- | Report formatter |
124
- | - Markdown/JSON artifacts |
125
- | - Budget evaluation |
126
- +-------------+---------------+
127
- |
128
- v
129
- +-----------------------------+
130
- | GitLab client (optional) |
131
- | - MR note via CI_JOB_TOKEN |
132
- +-----------------------------+
133
- ```
134
-
135
- GitLab CI path:
136
- ```
137
- Pipeline starts → component script fetches CPU/RAM timeseries from GCP Monitoring or AWS CloudWatch
138
- → writes JSON files → runs `gitgreen --provider gcp|aws ...`
139
- → emits `carbon-report.md` / `carbon-report.json`
140
- → optional MR note when CI_JOB_TOKEN is present
141
- ```
142
-
143
- ## Output Integrations
144
-
145
- During `gitgreen init` you can opt into exporting GitGreen data to external systems. The wizard includes an integration step with two optional sinks:
89
+ The wizard will:
90
+ 1. Configure your cloud provider (GCP/AWS)
91
+ 2. Set machine type and region
92
+ 3. Configure carbon budgets
93
+ 4. Set up database exports (optional)
94
+ 5. Append the tracking job to `.gitlab-ci.yml`
95
+ 6. Store credentials in GitLab CI/CD variables
146
96
 
147
- - **Per-job carbon data** emissions, runtime, and runner tags for every CI job.
148
- - **Runner inventory** – the machine catalog that powers your GitLab runners, including machine type and scope 3 estimates.
97
+ **That's it!** Run your next pipeline and carbon emissions will be calculated automatically. Results appear as MR comments, in your configured database, and as job artifacts.
149
98
 
150
- Built-in connectors today:
151
- - **MySQL** – populates `GITGREEN_JOB_MYSQL_*` / `GITGREEN_RUNNER_MYSQL_*` and inserts rows through a standard MySQL client.
152
- - **PostgreSQL** – captures host, port, credentials, schema, table, and SSL mode (`GITGREEN_JOB_POSTGRES_*` / `GITGREEN_RUNNER_POSTGRES_*`) for storage in Postgres.
99
+ ---
153
100
 
154
- When you select either connector, the wizard captures host, port, username, password, database, and target table names and stores them in CI/CD variables. It immediately connects with those credentials to ensure the database, schema, and table exist (job sinks also create a `<table>_timeseries` table linked via foreign key). During CI, the GitGreen CLI automatically detects those env vars and:
101
+ ## Advanced
155
102
 
156
- - runs `gitgreen migrate --scope job|runner` to apply any pending migrations (tracked per DB via `gitgreen_migrations`);
157
- - writes each carbon calculation (typed summary columns plus CPU/RAM timeseries rows) and optional runner inventory snapshot into the configured sink.
103
+ ### Providers
158
104
 
159
- ### Extending the interface
105
+ | Provider | Metrics Source | Status |
106
+ |----------|---------------|--------|
107
+ | **GCP** | Cloud Monitoring API | Full support |
108
+ | **AWS** | CloudWatch API | Full support |
160
109
 
161
- Additional connectors can be added without touching the wizard logic. Each destination implements the `OutputIntegration` interface in `src/lib/integrations/output-integrations.ts`, which specifies:
110
+ ### Output Integrations
162
111
 
163
- 1. Display metadata (`id`, `name`, `description`)
164
- 2. The data target it handles (`job` vs `runner`)
165
- 3. Prompted credential fields (label, env var key, input type, default, mask flag)
112
+ Export carbon data to external databases for analytics and dashboards.
166
113
 
167
- To add another sink (for example PostgreSQL or a webhook), create a new entry in that file with the fields your integration needs. Re-run `gitgreen init` and the option will automatically appear in the integration step.
114
+ #### Supported Connectors
168
115
 
169
- ### Database migrations
116
+ | Connector | Job Data | Runner Inventory |
117
+ |-----------|----------|------------------|
118
+ | MySQL | `GITGREEN_JOB_MYSQL_*` | `GITGREEN_RUNNER_MYSQL_*` |
119
+ | PostgreSQL | `GITGREEN_JOB_POSTGRES_*` | `GITGREEN_RUNNER_POSTGRES_*` |
170
120
 
171
- Structured sinks rely on migrations tracked in `gitgreen_migrations`. Run them whenever you update GitGreen or change table names:
121
+ #### Database Migrations
172
122
 
173
123
  ```bash
174
- gitgreen migrate --scope job # apply job sink migrations (summary + timeseries)
175
- gitgreen migrate --scope runner # apply runner inventory migrations
176
- gitgreen migrate --scope all # convenience wrapper (used by the GitLab component)
124
+ gitgreen migrate --scope job # Job emissions tables
125
+ gitgreen migrate --scope runner # Runner inventory tables
126
+ gitgreen migrate --scope all # Both
177
127
  ```
178
128
 
179
- The GitLab component automatically runs `gitgreen migrate --scope job` and `--scope runner` before calculating emissions, so pipelines stay in sync even when you change versions.
129
+ Migrations run automatically in CI pipelines.
180
130
 
181
- ### Table schemas
131
+ ### Database Schema
182
132
 
183
- #### Job table (per-job emissions)
133
+ <details>
134
+ <summary><strong>Job Table</strong> (per-job emissions)</summary>
184
135
 
185
136
  | Column | Type | Description |
186
137
  |--------|------|-------------|
187
- | `id` | BIGSERIAL | Auto-incrementing primary key |
188
- | `ingested_at` | TIMESTAMPTZ | When the record was inserted |
189
- | `provider` | TEXT | Cloud provider (`gcp` or `aws`) |
190
- | `region` | TEXT | Cloud region/zone where the job ran |
191
- | `machine_type` | TEXT | Instance type (e.g., `e2-standard-4`, `t3.medium`) |
192
- | `cpu_points` | INT | Number of CPU metric data points collected |
193
- | `ram_points` | INT | Number of RAM metric data points collected |
194
- | `runtime_seconds` | INT | Total job duration in seconds |
195
- | `total_emissions` | DOUBLE | Total carbon emissions in gCO2eq (cpu + ram + scope3) |
196
- | `cpu_emissions` | DOUBLE | Emissions from CPU usage in gCO2eq |
197
- | `ram_emissions` | DOUBLE | Emissions from RAM usage in gCO2eq |
198
- | `scope3_emissions` | DOUBLE | Embodied carbon (manufacturing, shipping, disposal) in gCO2eq |
199
- | `carbon_intensity` | DOUBLE | Grid carbon intensity in gCO2eq/kWh from Electricity Maps |
200
- | `pue` | DOUBLE | Power Usage Effectiveness of the data center |
201
- | `carbon_budget` | DOUBLE | Configured carbon budget threshold in gCO2eq |
202
- | `over_budget` | BOOLEAN | Whether total emissions exceeded the budget |
203
- | `gitlab_project_id` | BIGINT | GitLab project ID |
204
- | `gitlab_pipeline_id` | BIGINT | GitLab pipeline ID |
205
- | `gitlab_job_id` | BIGINT | GitLab job ID |
206
- | `gitlab_job_name` | TEXT | Name of the GitLab job |
207
- | `runner_id` | TEXT | GitLab runner ID |
208
- | `runner_description` | TEXT | Runner description from GitLab |
209
- | `runner_tags` | TEXT | Comma-separated runner tags |
210
- | `runner_version` | TEXT | GitLab runner version |
211
- | `runner_revision` | TEXT | GitLab runner revision |
212
- | `payload` | JSONB | Full calculation result as JSON (for extensibility) |
213
-
214
- #### Job timeseries table
138
+ | `id` | BIGSERIAL | Primary key |
139
+ | `ingested_at` | TIMESTAMPTZ | Insert timestamp |
140
+ | `provider` | TEXT | `gcp` or `aws` |
141
+ | `region` | TEXT | Cloud region/zone |
142
+ | `machine_type` | TEXT | Instance type |
143
+ | `runtime_seconds` | INT | Job duration |
144
+ | `total_emissions` | DOUBLE | Total gCO₂eq |
145
+ | `cpu_emissions` | DOUBLE | CPU gCO₂eq |
146
+ | `ram_emissions` | DOUBLE | RAM gCO₂eq |
147
+ | `scope3_emissions` | DOUBLE | Embodied gCO₂eq |
148
+ | `carbon_intensity` | DOUBLE | Grid intensity (gCO₂eq/kWh) |
149
+ | `pue` | DOUBLE | Power Usage Effectiveness |
150
+ | `carbon_budget` | DOUBLE | Budget threshold |
151
+ | `over_budget` | BOOLEAN | Budget exceeded |
152
+ | `gitlab_project_id` | BIGINT | GitLab project |
153
+ | `gitlab_pipeline_id` | BIGINT | Pipeline ID |
154
+ | `gitlab_job_id` | BIGINT | Job ID |
155
+ | `gitlab_job_name` | TEXT | Job name |
156
+ | `payload` | JSONB | Full result JSON |
157
+
158
+ </details>
159
+
160
+ <details>
161
+ <summary><strong>Timeseries Table</strong> (CPU/RAM metrics)</summary>
215
162
 
216
163
  | Column | Type | Description |
217
164
  |--------|------|-------------|
218
- | `id` | BIGSERIAL | Auto-incrementing primary key |
219
- | `job_id` | BIGINT | Foreign key to the job table |
220
- | `metric` | TEXT | Metric name: `cpu`, `ram_used`, or `ram_size` |
221
- | `ts` | TIMESTAMPTZ | Timestamp of the data point |
222
- | `value` | DOUBLE | Metric value (CPU utilization %, RAM bytes used, or RAM bytes total) |
165
+ | `id` | BIGSERIAL | Primary key |
166
+ | `job_id` | BIGINT | Foreign key to job |
167
+ | `metric` | TEXT | `cpu`, `ram_used`, `ram_size` |
168
+ | `ts` | TIMESTAMPTZ | Timestamp |
169
+ | `value` | DOUBLE | Metric value |
170
+
171
+ </details>
223
172
 
224
- #### Runner inventory table
173
+ <details>
174
+ <summary><strong>Runner Inventory Table</strong></summary>
225
175
 
226
176
  | Column | Type | Description |
227
177
  |--------|------|-------------|
228
- | `id` | BIGSERIAL | Auto-incrementing primary key |
229
- | `ingested_at` | TIMESTAMPTZ | When the record was inserted |
178
+ | `id` | BIGSERIAL | Primary key |
230
179
  | `runner_id` | TEXT | GitLab runner ID |
231
- | `runner_description` | TEXT | Runner description |
232
- | `runner_version` | TEXT | GitLab runner version |
233
- | `runner_revision` | TEXT | GitLab runner revision |
234
- | `runner_platform` | TEXT | OS platform (e.g., `linux`) |
235
- | `runner_architecture` | TEXT | CPU architecture (e.g., `amd64`) |
236
- | `runner_executor` | TEXT | Executor type (e.g., `docker`, `shell`) |
237
- | `runner_tags` | TEXT | Comma-separated runner tags |
238
180
  | `machine_type` | TEXT | Instance type |
239
181
  | `provider` | TEXT | Cloud provider |
240
- | `region` | TEXT | Cloud region |
241
- | `gcp_project_id` | TEXT | GCP project ID (if applicable) |
242
- | `gcp_instance_id` | TEXT | GCP instance ID |
243
- | `gcp_zone` | TEXT | GCP zone |
244
- | `aws_region` | TEXT | AWS region (if applicable) |
245
- | `aws_instance_id` | TEXT | AWS EC2 instance ID |
246
- | `last_job_machine_type` | TEXT | Machine type from most recent job |
247
- | `last_job_region` | TEXT | Region from most recent job |
248
- | `last_job_provider` | TEXT | Provider from most recent job |
249
- | `last_job_runtime_seconds` | INT | Runtime of most recent job |
250
- | `last_job_total_emissions` | DOUBLE | Emissions from most recent job in gCO2eq |
251
- | `last_job_recorded_at` | TIMESTAMPTZ | When the most recent job was recorded |
252
- | `payload` | JSONB | Full runner metadata as JSON |
253
-
254
- ## Adding a provider
255
- 1. Extend `CloudProvider` and the provider guard in `src/index.ts` so the calculator accepts the new key.
256
- 2. Add machine power data (`<provider>_machine_power_profiles.json`) and, if needed, CPU profiles to `data/`, then update `PowerProfileRepository.loadMachineData` to load it.
257
- 3. Map regions to Electricity Maps zones and a PUE default in `ZoneMapper` (or via `data/runtime-pue-mappings.json` for runtime overrides).
258
- 4. Parse that provider's metrics into the `TimeseriesPoint` shape (timestamp + numeric value) alongside RAM size/usage, and update the CLI/init/templates to pull those metrics.
259
- 5. Wire any CI automation (runner tags, MR note flags) to pass the correct provider, machine type, and region strings.
260
-
261
- ## Publish
262
- - Ensure version bump in `package.json`
263
- - Run `pnpm -C node-module build && pnpm -C node-module test`
264
- - Publish: `pnpm -C node-module publish --access public`
182
+ | `region` | TEXT | Region |
183
+ | `last_job_total_emissions` | DOUBLE | Last job emissions |
184
+ | `payload` | JSONB | Full metadata |
265
185
 
266
- ## FAQ
186
+ </details>
267
187
 
268
- ### Are my API keys and credentials secure?
188
+ ## Methodology
269
189
 
270
- Yes. GitGreen does not save your keys locally. During `gitgreen init`, all sensitive credentials (API keys, service account keys, AWS credentials) are stored only in GitLab CI/CD variables, which are encrypted and managed by GitLab. The CLI never writes credentials to disk on your local machine.
190
+ GitGreen's calculations follow the research-backed methodology developed by [re:cinq](https://re-cinq.com/blog/cloud-cpu-energy-consumption) and [Teads](https://github.com/re-cinq/emissions-data), adapted for CI/CD workloads.
271
191
 
272
- ### Does GitGreen call any third-party APIs?
192
+ ### Formula
273
193
 
274
- GitGreen only calls APIs that you explicitly configure and authorize:
194
+ ```
195
+ E_total = E_operational + E_embodied
275
196
 
276
- - **Electricity Maps API** - For carbon intensity data (requires your API key)
277
- - **GitLab API** - Only when using `glab` CLI or providing a PAT, to set CI/CD variables
278
- - **GCP Monitoring API** - To fetch CPU/RAM metrics from your GCP project (requires your service account)
279
- - **AWS CloudWatch API** - To fetch CPU/RAM metrics from your AWS account (requires your AWS credentials)
197
+ E_operational = (P_cpu + P_ram) × runtime_hours × PUE × carbon_intensity
198
+ E_embodied = scope3_hourly × runtime_hours
199
+ ```
280
200
 
281
- GitGreen does not operate any backend services or call any APIs owned by the GitGreen project. All API calls are made directly from your environment or CI/CD pipeline using your credentials.
201
+ | Variable | Description | Source |
202
+ |----------|-------------|--------|
203
+ | `P_cpu` | CPU power (kW) | Interpolated from utilization |
204
+ | `P_ram` | RAM power (0.5 W/GB) | Industry standard for DDR4 |
205
+ | `PUE` | Data center efficiency | Google/AWS published data |
206
+ | `carbon_intensity` | Grid emissions (gCO₂eq/kWh) | Electricity Maps API |
207
+ | `scope3_hourly` | Embodied carbon rate | Dell R740 LCA study |
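The operational half of the formula can be sketched in a few lines. This is an illustrative calculation with made-up inputs, not GitGreen's internal code (function and variable names are assumptions):

```javascript
// Sketch of E_operational from the formula above.
// Units: power in kW, runtime in hours, intensity in gCO2eq/kWh.
function operationalEmissions(pCpuKw, pRamKw, runtimeHours, pue, intensity) {
  return (pCpuKw + pRamKw) * runtimeHours * pue * intensity;
}

// e.g. 30 W of CPU + 2 W of RAM for a 10-minute job,
// PUE 1.1, grid intensity 400 gCO2eq/kWh:
const g = operationalEmissions(0.030, 0.002, 10 / 60, 1.1, 400);
console.log(g.toFixed(3)); // → 2.347 gCO2eq
```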
282
208
 
283
- ### How do I know GitGreen won't destroy my `.gitlab-ci.yml` file?
209
+ ### CPU Power Model
284
210
 
285
- GitGreen takes safety precautions when modifying your `.gitlab-ci.yml`:
211
+ CPU power is **non-linear** with utilization. We use cubic spline interpolation across measured data points:
286
212
 
287
- 1. **Automatic backups** - Before making any changes, GitGreen creates a backup of your existing `.gitlab-ci.yml` file
288
- 2. **Backup location** - The backup path is printed to the console (e.g., `/tmp/gitlab-ci-backup-{timestamp}.yml`)
289
- 3. **Append-only changes** - GitGreen only appends new content to your CI file; it never removes or modifies existing jobs
290
- 4. **Confirmation prompt** - You're asked to confirm before any changes are made
213
+ | Utilization | Power (% of TDP) |
214
+ |-------------|------------------|
215
+ | 0% (idle) | 1.7% |
216
+ | 10% | 3.4% |
217
+ | 50% | 16.9% |
218
+ | 100% | 100% |
291
219
 
292
- If something goes wrong, you can restore your file from the backup location that was printed.
220
+ **VM Power Correction:** Cloud VMs share physical CPUs. We scale power by the ratio of VM vCPUs to physical threads:
293
221
 
294
- ### Can I use GitGreen without the `init` wizard?
222
+ ```
223
+ P_vm = TDP × ratio × (vm_vcpus / physical_threads)
224
+ ```
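A worked example of this correction, using the Xeon Gold 6268CL row from the table below (the function name is illustrative, not GitGreen's API):

```javascript
// P_vm = TDP × ratio × (vm_vcpus / physical_threads)
function vmPowerWatts(tdpWatts, ratio, vmVcpus, physicalThreads) {
  return tdpWatts * ratio * (vmVcpus / physicalThreads);
}

// An 80-vCPU VM on a Xeon Gold 6268CL (48 threads, 205 W TDP)
// at 100% utilization (ratio = 1.0):
const watts = vmPowerWatts(205, 1.0, 80, 48);
console.log(watts.toFixed(0)); // → 342 (vs. a naive 205 W × 80 = 16,400 W)
```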
295
225
 
296
- Yes. You can configure GitGreen manually by:
226
+ | CPU | Cores | Threads | TDP | Source |
227
+ |-----|-------|---------|-----|--------|
228
+ | Intel Xeon Gold 6268CL | 24 | 48 | 205W | [eBay/Intel](https://www.ebay.com/p/2321792675) |
229
+ | Intel Xeon Platinum 8481C | 56 | 112 | 350W | [TechPowerUp](https://www.techpowerup.com/cpu-specs/xeon-platinum-8481c.c3992) |
230
+ | AMD EPYC 7B12 | 64 | 128 | 240W | [Newegg](https://www.newegg.com/p/1FR-00G6-00026) |
231
+ | Ampere Altra Q64-30 | 64 | 64 | 180W | [Ampere](https://amperecomputing.com/en/briefs/ampere-altra-family-product-brief) |
297
232
 
298
- 1. Setting the required CI/CD variables in your GitLab project settings
299
- 2. Adding the GitLab CI/CD component to your `.gitlab-ci.yml` manually
300
- 3. Running `gitgreen` directly with command-line options instead of using the component
233
+ ### Scope 3 (Embodied Carbon)
301
234
 
302
- The `init` wizard is a convenience tool, but all functionality is available through manual configuration.
235
+ Manufacturing emissions amortized over hardware lifespan (6 years):
303
236
 
304
- ### What data does GitGreen collect?
237
+ | Component | Emissions |
238
+ |-----------|-----------|
239
+ | Base server | ~1000 kgCO₂eq |
240
+ | Per CPU | ~100 kgCO₂eq |
241
+ | Per 32GB DIMM | ~44 kgCO₂eq |
242
+ | Per SSD | ~50-100 kgCO₂eq |
305
243
 
306
- GitGreen only collects:
244
+ *Source: Dell PowerEdge R740 Life Cycle Assessment*
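The amortization works out as follows (a minimal sketch; per-VM allocation is omitted):

```javascript
// Embodied kgCO2eq spread over a 6-year server lifespan,
// giving an hourly gCO2eq rate.
const LIFESPAN_HOURS = 6 * 365 * 24; // 52,560 h

function embodiedHourlyRate(kgCO2eq) {
  return (kgCO2eq * 1000) / LIFESPAN_HOURS; // gCO2eq per hour
}

// A ~1000 kgCO2eq base server amortizes to roughly 19 gCO2eq/hour:
console.log(embodiedHourlyRate(1000).toFixed(1)); // → 19.0
```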
307
245
 
308
- - **CPU and RAM usage metrics** from your cloud provider (GCP Monitoring or AWS CloudWatch)
309
- - **Pipeline metadata** (start/end times) from GitLab CI/CD environment variables
246
+ ### Accuracy & Limitations
310
247
 
311
- GitGreen does not collect any personal information, code, or data from your repositories. All calculations happen locally in your CI/CD pipeline.
248
+ **Expected Accuracy:**
249
+ - Relative comparisons: Excellent (comparing job A vs B)
250
+ - Absolute values: ±15-25% for CPU-bound workloads
312
251
 
313
- ### Can I run GitGreen locally for testing?
252
+ **Known Limitations:**
314
253
 
315
- Yes. You can run `gitgreen` directly with your own metrics files:
254
+ | Limitation | Impact | Notes |
255
+ |------------|--------|-------|
256
+ | No GPU modeling | High | GPU jobs underestimated |
257
+ | Fixed RAM power | Low | 0.5 W/GB industry standard |
258
+ | No network/storage I/O | Low | <5% of typical job power |
316
259
 
317
- **GCP example:**
318
- ```bash
319
- gitgreen --provider gcp \
320
- --machine e2-standard-4 \
321
- --region us-central1-a \
322
- --cpu-timeseries cpu.json \
323
- --ram-used-timeseries ram-used.json \
324
- --ram-size-timeseries ram-size.json \
325
- --out-md report.md
326
- ```
260
+ ## Architecture
327
261
 
328
- **AWS example:**
329
- ```bash
330
- gitgreen --provider aws \
331
- --machine t3.medium \
332
- --region us-east-1 \
333
- --cpu-timeseries cpu.json \
334
- --ram-used-timeseries ram-used.json \
335
- --ram-size-timeseries ram-size.json \
336
- --out-md report.md
337
262
  ```
338
-
339
- Or use `--from-cloudwatch` to fetch metrics directly from AWS CloudWatch:
340
- ```bash
341
- gitgreen --provider aws \
342
- --machine t3.medium \
343
- --region us-east-1 \
344
- --from-cloudwatch \
345
- --out-md report.md
263
+ ┌─────────────────────────────────────────────────────────────┐
264
+ │ GitLab Pipeline │
265
+ ├─────────────────────────────────────────────────────────────┤
266
+ │ ┌─────────────┐ ┌──────────────────────────────────┐ │
267
+ │ │ Your Jobs │───▶│ GitGreen Carbon Tracking Job │ │
268
+ └─────────────┘ │ ├─ Fetch CPU/RAM metrics │ │
269
+ │ │ ├─ Calculate emissions │ │
270
+ │ │ ├─ Post MR comment │ │
271
+ │ │ └─ Export to database │ │
272
+ │ └──────────────────────────────────┘ │
273
+ └─────────────────────────────────────────────────────────────┘
274
+
275
+ ┌────────────────────┼────────────────────┐
276
+ ▼ ▼ ▼
277
+ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
278
+ │ GCP Monitoring │ │ Electricity Maps│ │ MySQL/Postgres │
279
+ │ AWS CloudWatch │ │ API │ │ (optional) │
280
+ └─────────────────┘ └─────────────────┘ └─────────────────┘
346
281
  ```
347
282
 
348
- This is useful for testing before integrating into your CI/CD pipeline.
283
+ ## Configuration
349
284
 
350
- ## Carbon Calculation Methodology
285
+ ### Adding a New Provider
351
286
 
352
- GitGreen's carbon calculations are based on the methodology developed by [re:cinq](https://re-cinq.com/blog/cloud-cpu-energy-consumption) and [Teads](https://github.com/re-cinq/emissions-data), adapted for CI/CD workloads.
287
+ 1. Add machine power profiles to `data/<provider>_machine_power_profiles.json`
288
+ 2. Update `PowerProfileRepository.loadMachineData()`
289
+ 3. Map regions to Electricity Maps zones in `ZoneMapper`
290
+ 4. Parse metrics into `TimeseriesPoint` format
291
+ 5. Wire CI automation for provider-specific settings
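Step 4 above can be sketched like this — a hypothetical converter from a provider's raw metric rows into the `TimeseriesPoint` shape (timestamp + numeric value); the raw input field names are invented for illustration:

```javascript
// Map raw provider metric rows to { timestamp, value } points.
// `time` and `avg` are hypothetical field names of the raw API response.
function toTimeseriesPoints(rawRows) {
  return rawRows.map((row) => ({
    timestamp: new Date(row.time).toISOString(),
    value: Number(row.avg),
  }));
}

const points = toTimeseriesPoints([
  { time: "2024-01-01T00:00:00Z", avg: "42.5" },
]);
console.log(points[0].value); // 42.5
```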
353
292
 
354
- ### Formula
293
+ ### Source Data Files
355
294
 
356
- The total carbon emissions for a CI job are calculated as:
295
+ | File | Description |
296
+ |------|-------------|
297
+ | `data/cpu_physical_specs.json` | Physical CPU specs with sources |
298
+ | `data/cpu_power_profiles.json` | TDP and power ratios |
299
+ | `data/gcp_machine_power_profiles.json` | GCP machine mappings |
300
+ | `data/aws_machine_power_profiles.json` | AWS instance mappings |
301
+ | `data/source/*.csv` | Original re:cinq research data |
357
302
 
358
- ```
359
- E_total = E_operational + E_embodied
303
+ ## FAQ
360
304
 
361
- E_operational = (P_cpu + P_ram) × runtime_hours × PUE × carbon_intensity
362
- E_embodied = scope3_emissions_hourly × runtime_hours
363
- ```
305
+ <details>
306
+ <summary><strong>Are my credentials secure?</strong></summary>
364
307
 
365
- Where:
366
- - **P_cpu**: CPU power consumption in kW, interpolated from utilization
367
- - **P_ram**: RAM power consumption (0.5 W/GB × used GB)
368
- - **PUE**: Power Usage Effectiveness of the data center (typically 1.1-1.2 for hyperscalers)
369
- - **carbon_intensity**: Grid carbon intensity in gCO2eq/kWh from Electricity Maps API
370
- - **scope3_emissions_hourly**: Amortized embodied carbon from manufacturing
308
+ Yes. GitGreen stores all credentials in GitLab CI/CD variables (encrypted). Nothing is written to your local disk.
309
+ </details>
371
310
 
372
- ### CPU Power Interpolation
311
+ <details>
312
+ <summary><strong>What APIs does GitGreen call?</strong></summary>
373
313
 
374
- CPU power is not linear with utilization. We use **cubic spline interpolation** across 4 measured points:
314
+ Only APIs you explicitly configure:
315
+ - Electricity Maps API (carbon intensity)
316
+ - GCP Monitoring API (metrics)
317
+ - AWS CloudWatch API (metrics)
318
+ - GitLab API (MR comments, via CI_JOB_TOKEN)
375
319
 
376
- | Utilization | Power Ratio (of TDP) |
377
- |-------------|---------------------|
378
- | 0% (idle) | ~1.7% |
379
- | 10% | ~3.4% |
380
- | 50% | ~16.9% |
381
- | 100% | 100% |
320
+ GitGreen has no backend server.
321
+ </details>
382
322
 
383
- Power at any utilization is calculated as:
384
- ```
385
- P_cpu(util) = CubicSpline([0, 10, 50, 100], [W_idle, W_10, W_50, W_100])(util)
386
- ```
323
+ <details>
324
+ <summary><strong>Will it break my .gitlab-ci.yml?</strong></summary>
387
325
 
388
- The power values are derived from CPU TDP (Thermal Design Power) multiplied by empirically measured ratios from the [re:cinq emissions-data](https://github.com/re-cinq/emissions-data) project.
326
+ No. GitGreen creates a backup before any changes, only appends content (never modifies existing jobs), and asks for confirmation.
327
+ </details>
389
328
 
390
- ### Physical vCPU Correction
329
+ <details>
330
+ <summary><strong>Can I run it without the wizard?</strong></summary>
391
331
 
392
- Cloud VMs share physical CPUs. To accurately estimate power for a VM, we scale by the ratio of VM vCPUs to physical CPU threads:
332
+ Yes. Set CI/CD variables manually and run `gitgreen` with CLI options.
333
+ </details>
393
334
 
394
- ```
395
- P_vm = TDP × ratio × (vm_vcpus / physical_threads)
396
- ```
335
+ ## Contributing
397
336
 
398
- Physical thread counts are sourced from official specifications:
337
+ 1. Fork the repository
338
+ 2. Create a feature branch
339
+ 3. Run tests: `npm test`
340
+ 4. Submit a pull request
399
341
 
400
- | CPU | Cores | Threads | TDP | Source |
401
- |-----|-------|---------|-----|--------|
402
- | Intel Xeon Gold 6268CL | 24 | 48 | 205W | [eBay/Intel](https://www.ebay.com/p/2321792675) |
403
- | Intel Xeon Gold 6253CL | 18 | 36 | 205W | [PassMark](https://www.cpubenchmark.net/cpu.php?cpu=Intel+Xeon+Gold+6253CL) |
404
- | Intel Xeon Platinum 8481C | 56 | 112 | 350W | [TechPowerUp](https://www.techpowerup.com/cpu-specs/xeon-platinum-8481c.c3992) |
405
- | Intel Xeon Platinum 8373C | 36 | 72 | 300W | [Wikipedia](https://en.wikipedia.org/wiki/List_of_Intel_Xeon_processors_(Ice_Lake-based)) |
406
- | AMD EPYC 7B12 | 64 | 128 | 240W | [Newegg](https://www.newegg.com/p/1FR-00G6-00026) |
407
- | Ampere Altra Q64-30 | 64 | 64 | 180W | [Ampere](https://amperecomputing.com/en/briefs/ampere-altra-family-product-brief) |
408
-
409
- **Example:** For `n2-standard-80` (80 vCPUs, Xeon Gold 6268CL with 48 threads):
410
- - Old (inflated): 205W × 80 = 16,400W at 100%
411
- - Corrected: 205W × (80/48) = 342W at 100%
342
+ ## References
412
343
 
413
- ### Scope 3 (Embodied Carbon)
414
-
415
- Scope 3 emissions represent the carbon footprint of manufacturing, shipping, and disposing of hardware. Based on Dell PowerEdge R740 LCA data:
416
-
417
- | Component | Emissions (kgCO2eq) |
418
- |-----------|---------------------|
419
- | Base server (1 socket, low DRAM) | ~1000 |
420
- | Per additional CPU | ~100 |
421
- | Per 32GB DIMM | ~44 |
422
- | Per SSD | ~50-100 |
344
+ ### Research & Methodology
423
345
 
424
- These are amortized over a 6-year server lifespan (~0.019 gCO2eq/hour conversion factor) and allocated proportionally to VM resources.
346
+ | Source | Description |
347
+ |--------|-------------|
348
+ | [re:cinq Cloud CPU Energy Consumption](https://re-cinq.com/blog/cloud-cpu-energy-consumption) | CPU power modeling methodology |
349
+ | [re:cinq emissions-data](https://github.com/re-cinq/emissions-data) | Machine power profiles and ratios |
350
+ | [Teads Engineering](https://engineering.teads.com/) | Original research on cloud carbon |
351
+ | Dell PowerEdge R740 LCA | Scope 3 embodied carbon data |
425
352
 
426
353
  ### Data Sources
427
354
 
428
355
  | Data | Source |
429
356
  |------|--------|
430
- | CPU power ratios | [re:cinq emissions-data](https://github.com/re-cinq/emissions-data) |
431
- | GCP machine specs | [Google Cloud CPU Platforms](https://cloud.google.com/compute/docs/cpu-platforms) |
432
- | Carbon intensity | [Electricity Maps API](https://www.electricitymaps.com/) |
433
- | PUE values | [Google Data Center Efficiency](https://www.google.com/about/datacenters/efficiency/) |
434
- | Scope 3 LCA data | Dell PowerEdge R740 Life Cycle Assessment |
435
-
436
- ### Accuracy & Limitations
437
-
438
- #### What's grounded in real research
357
+ | Real-time carbon intensity | [Electricity Maps API](https://www.electricitymaps.com/) |
358
+ | GCP machine specifications | [Google Cloud CPU Platforms](https://cloud.google.com/compute/docs/cpu-platforms) |
359
+ | GCP data center PUE | [Google Data Center Efficiency](https://www.google.com/about/datacenters/efficiency/) |
360
+ | Intel CPU specifications | [Intel ARK](https://ark.intel.com/) |
361
+ | AMD CPU specifications | [AMD Product Pages](https://www.amd.com/en/products/specifications/processors) |
362
+ | Ampere CPU specifications | [Ampere Product Briefs](https://amperecomputing.com/briefs) |
439
363
 
440
- | Component | Methodology | Source |
441
- |-----------|-------------|--------|
442
- | CPU power ratios | Measured at 0%, 10%, 50%, 100% utilization | [re:cinq/Teads](https://github.com/re-cinq/emissions-data) |
443
- | Physical CPU specs | Official vendor specifications | Intel ARK, TechPowerUp, AMD, Ampere |
444
- | Carbon intensity | Real-time grid data | [Electricity Maps API](https://www.electricitymaps.com/) |
445
- | PUE values | Published data center efficiency | [Google](https://www.google.com/about/datacenters/efficiency/) |
446
- | Scope 3 methodology | Life cycle assessment | Dell PowerEdge R740 LCA |
364
+ ### CPU Specifications Used
447
365
 
448
- #### Expected accuracy
449
-
450
- - **Relative accuracy**: Excellent for comparing jobs (A vs B) and tracking trends
451
- - **Absolute accuracy**: ±15-25% for CPU-bound workloads
366
+ | CPU | Cores | Threads | TDP | Source |
367
+ |-----|-------|---------|-----|--------|
368
+ | Intel Xeon Gold 6268CL | 24 | 48 | 205W | [Product Listing](https://www.ebay.com/p/2321792675) |
369
+ | Intel Xeon Gold 6253CL | 18 | 36 | 205W | [PassMark](https://www.cpubenchmark.net/cpu.php?cpu=Intel+Xeon+Gold+6253CL) |
370
+ | Intel Xeon Platinum 8481C | 56 | 112 | 350W | [TechPowerUp](https://www.techpowerup.com/cpu-specs/xeon-platinum-8481c.c3992) |
371
+ | Intel Xeon Platinum 8373C | 36 | 72 | 300W | [Wikipedia](https://en.wikipedia.org/wiki/List_of_Intel_Xeon_processors_(Ice_Lake-based)) |
372
+ | AMD EPYC 7B12 | 64 | 128 | 240W | [Newegg](https://www.newegg.com/p/1FR-00G6-00026) |
373
+ | Ampere Altra Q64-30 | 64 | 64 | 180W | [Ampere Brief](https://amperecomputing.com/en/briefs/ampere-altra-family-product-brief) |
452
374
 
453
- #### Known limitations
375
+ ## License
454
376
 
455
- | Limitation | Impact | Notes |
456
- |------------|--------|-------|
457
- | RAM uses fixed 0.5 W/GB | Low | Industry standard estimate for DDR4 |
458
- | Scope 3 only for some AWS types | Medium | GCP scope 3 data not yet available |
459
- | No GPU modeling | High (if using GPUs) | GPU-heavy jobs will be underestimated |
460
- | No network I/O modeling | Low | Typically <5% of job power |
461
- | No storage I/O modeling | Low | Typically <5% of job power |
462
- | Multi-tenant overhead | Low | Actual power may be 5-10% lower due to shared resources |
463
- | CPU specs may be incomplete | Medium | Falls back to unscaled profile if CPU not in database |
464
-
465
- #### What this means for you
466
-
467
- - **CI/CD optimization**: The relative comparisons are reliable - if job A shows 2x the emissions of job B, that's meaningful
468
- - **Absolute reporting**: Use the numbers for directional guidance, not precise carbon accounting
469
- - **Trend tracking**: Week-over-week and month-over-month trends are accurate
470
- - **GPU workloads**: Currently underestimated - GPU power not modeled
471
-
472
- ### Source Data
473
-
474
- Raw source data files are available in `data/`:
475
- - `cpu_physical_specs.json` - Physical CPU specs with thread counts and sources (our research)
476
- - `cpu_power_profiles.json` - TDP and power ratios per CPU type
477
- - `gcp_machine_power_profiles.json` - GCP machine type to power mappings
478
- - `aws_machine_power_profiles.json` - AWS instance type to power mappings
479
-
480
- Original re:cinq data in `data/source/`:
481
- - `GCP Machine types - CPU Profiles.csv` - TDP and power ratios per CPU
482
- - `GCP Machine types - Instances.csv` - Machine type to CPU mappings
483
- - `GCP Machine types - Scope 3 Ratios.csv` - Embodied carbon factors
484
- - `GCP Machine types - Dell R740 LCA.csv` - Life cycle assessment reference
377
+ MIT License - see [LICENSE](LICENSE) for details.
485
378
 
486
- ## License
379
+ ---
487
380
 
488
- MIT License - see [LICENSE](LICENSE) file for details.
381
+ <p align="center">
382
+ <sub>Built with care for a sustainable software future.</sub>
383
+ </p>
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "gitgreen",
3
- "version": "1.3.0",
3
+ "version": "1.4.0",
4
4
  "description": "GitGreen CLI for carbon reporting in GitLab pipelines (GCP/AWS)",
5
5
  "main": "dist/index.js",
6
6
  "types": "dist/index.d.ts",
@@ -1,268 +0,0 @@
1
- #!/usr/bin/env python3
2
- """
3
- Build complete GCP machine power profiles by correlating data across multiple CSV files.
4
- The power data is spread across different files and needs to be combined.
5
- """
6
- import pandas as pd
7
- import json
8
- import numpy as np
9
-
10
- def find_actual_power_data():
11
- """
12
- Search through all CSV files to find where the actual power consumption data is
13
- """
14
- print("=== SEARCHING FOR ACTUAL POWER DATA ===")
15
-
16
- # 1. Check Machine Ratios file
17
- print("\n--- Machine Ratios File ---")
18
- ratios_df = pd.read_csv("data/GCP Machine types - Machine Ratios.csv")
19
-
20
- # Look for non-NaN power data
21
- power_cols = ['CPU Name', 'PkgWatt Idle', 'PkgWatt CPUStress 10%', 'PkgWatt CPUStress 50%', 'PkgWatt Average 100%']
22
- power_data = ratios_df[power_cols].dropna()
23
-
24
- print(f"Rows with power data: {len(power_data)}")
25
- if len(power_data) > 0:
26
- print("Sample power data from Machine Ratios:")
27
- print(power_data.head())
28
-
29
- # 2. Check CPU Profiles file
30
- print("\n--- CPU Profiles File ---")
31
- cpu_df = pd.read_csv("data/GCP Machine types - CPU Profiles.csv")
32
-
33
- # Look for ratio data that can be used to calculate power
34
- ratio_cols = ['Processor SKU', 'TDP (W)', 'IDLE ratio', '10% ratio', '50% Ratio']
35
- cpu_power = cpu_df[ratio_cols].dropna()
36
-
37
- print(f"Rows with CPU ratio data: {len(cpu_power)}")
38
- if len(cpu_power) > 0:
39
- print("Sample CPU ratio data:")
40
- print(cpu_power.head())
41
-
42
- # 3. Check Bare Metal Profiles (this had actual power data in the first analysis)
43
- print("\n--- Bare Metal Power Profiles ---")
44
- bare_metal_df = pd.read_csv("data/GCP Machine types - Bare Metal Power Profiles.csv")
45
-
46
- # Find rows with actual power measurements
47
- bare_metal_power_cols = ['Product Name', 'PkgWatt Idle', 'PkgWatt CPUStress 10%', 'PkgWatt CPUStress 50%', 'PkgWatt CPUStress 100%']
48
- bare_metal_power = bare_metal_df[bare_metal_power_cols].dropna()
49
-
50
- print(f"Rows with bare metal power data: {len(bare_metal_power)}")
51
- if len(bare_metal_power) > 0:
52
- print("Sample bare metal power data:")
53
- print(bare_metal_power.head())
54
-
55
- return power_data, cpu_power, bare_metal_power
56
-
57
- def correlate_cpu_to_machines():
58
- """
59
- Try to correlate CPU types to GCP machine instances
60
- """
61
- print("\n=== CORRELATING CPU TYPES TO GCP MACHINES ===")
62
-
63
- # Load instances file
64
- instances_df = pd.read_csv("data/GCP Machine types - Instances.csv")
65
-
66
- # Look at CPU information in instances
67
- cpu_info_cols = ['Instance type', 'Platform CPU Name', 'Instance vCPU', 'Instance Memory (in GB)']
68
- cpu_info = instances_df[cpu_info_cols].dropna(subset=['Platform CPU Name'])
69
-
70
- print(f"Machines with CPU information: {len(cpu_info)}")
71
- if len(cpu_info) > 0:
72
- print("Sample CPU information:")
73
- print(cpu_info.head(10))
74
-
75
- # Get unique CPU types used in GCP
76
- unique_cpus = cpu_info['Platform CPU Name'].unique()
77
- print(f"\nUnique CPU types in GCP: {len(unique_cpus)}")
78
- for cpu in unique_cpus[:10]: # Show first 10
79
- print(f" - {cpu}")
80
-
81
- return cpu_info
82
-
83
- def build_power_profiles_from_ratios():
84
- """
85
- Build power profiles using CPU ratios and TDP values
86
- """
87
- print("\n=== BUILDING POWER PROFILES FROM CPU RATIOS ===")
88
-
89
- # Load CPU profiles with ratios
90
- cpu_df = pd.read_csv("data/GCP Machine types - CPU Profiles.csv")
91
-
92
- # Clean and process the data
93
- ratio_data = cpu_df[['Processor SKU', 'TDP (W)', 'IDLE ratio', '10% ratio', '50% Ratio']].dropna()
94
-
95
- if len(ratio_data) > 0:
96
- print(f"CPUs with ratio data: {len(ratio_data)}")
97
-
98
- # Calculate actual power consumption from ratios
99
- power_profiles = {}
100
-
101
- for _, row in ratio_data.iterrows():
102
- cpu_name = row['Processor SKU']
103
- tdp_watts = row['TDP (W)']
104
- idle_ratio = row['IDLE ratio']
105
- ratio_10 = row['10% ratio']
106
- ratio_50 = row['50% Ratio']
107
-
108
- # Calculate power at different utilization levels
109
- # Using TDP and ratios to estimate power consumption
110
- power_profiles[cpu_name] = {
111
- 'tdp_watts': tdp_watts,
112
- 'power_profile': [
113
- {'percentage': 0, 'watts': tdp_watts * idle_ratio},
114
- {'percentage': 10, 'watts': tdp_watts * ratio_10},
115
- {'percentage': 50, 'watts': tdp_watts * ratio_50},
116
- {'percentage': 100, 'watts': tdp_watts} # Assume 100% = TDP
117
- ]
118
- }
119
-
120
- print("\nSample calculated power profiles:")
121
- for i, (cpu_name, profile) in enumerate(power_profiles.items()):
122
- if i < 3: # Show first 3
123
- print(f"\n{cpu_name}:")
124
- print(f" TDP: {profile['tdp_watts']}W")
125
- for point in profile['power_profile']:
126
- print(f" {point['percentage']}%: {point['watts']:.1f}W")
127
-
128
- # Save CPU power profiles
129
- with open('cpu_power_profiles.json', 'w') as f:
130
- json.dump(power_profiles, f, indent=2)
131
-
132
- print(f"\n✅ Saved {len(power_profiles)} CPU power profiles to cpu_power_profiles.json")
133
-
134
- return power_profiles
135
- else:
136
- print("❌ No usable ratio data found")
137
- return {}
138
-
139
- def map_gcp_machines_to_power():
140
- """
141
- Map GCP machine types to their CPU power profiles
142
- """
143
- print("\n=== MAPPING GCP MACHINES TO POWER PROFILES ===")
144
-
145
- # Load instances and find CPU mappings
146
- instances_df = pd.read_csv("data/GCP Machine types - Instances.csv")
147
- cpu_info = instances_df[['Instance type', 'Platform CPU Name', 'Instance vCPU', 'Instance Memory (in GB)']].dropna(subset=['Platform CPU Name'])
148
-
149
- # Load CPU power profiles
150
- try:
151
- with open('cpu_power_profiles.json', 'r') as f:
152
- cpu_power_profiles = json.load(f)
153
- except (FileNotFoundError, json.JSONDecodeError):
154
- print("❌ CPU power profiles not found. Run build_power_profiles_from_ratios() first.")
155
- return {}
156
-
157
- # Map GCP machines to their power profiles
158
- gcp_machine_profiles = {}
159
-
160
- for _, row in cpu_info.iterrows():
161
- machine_type = row['Instance type']
162
- cpu_name = row['Platform CPU Name']
163
- vcpus = row['Instance vCPU']
164
- memory_gb = row['Instance Memory (in GB)']
165
-
166
- # Find matching CPU power profile (exact match or partial match)
167
- matching_cpu = None
168
- for cpu_profile_name in cpu_power_profiles.keys():
169
- if cpu_name in cpu_profile_name or cpu_profile_name in cpu_name:
170
- matching_cpu = cpu_profile_name
171
- break
172
-
173
- if matching_cpu:
174
- # Scale power consumption based on vCPUs (since profiles are per-CPU)
175
- base_profile = cpu_power_profiles[matching_cpu]
176
-
177
- gcp_machine_profiles[machine_type] = {
178
- 'vcpus': vcpus,
179
- 'memory_gb': memory_gb,
180
- 'platform_cpu': cpu_name,
181
- 'matched_cpu_profile': matching_cpu,
182
- 'cpu_power_profile': [
183
- {
184
- 'percentage': point['percentage'],
185
- 'watts': point['watts'] * (vcpus / base_profile.get('vcpus', 1)) # Scale by vCPU count
186
- }
187
- for point in base_profile['power_profile']
188
- ]
189
- }
190
-
191
- if gcp_machine_profiles:
192
- print(f"✅ Mapped {len(gcp_machine_profiles)} GCP machines to power profiles")
193
-
194
- # Show sample mappings
195
- print("\nSample GCP machine power profiles:")
196
- for i, (machine, profile) in enumerate(gcp_machine_profiles.items()):
197
- if i < 3:
198
- print(f"\n{machine} ({profile['vcpus']} vCPUs, {profile['memory_gb']}GB):")
199
- print(f" CPU: {profile['platform_cpu']}")
200
- print(f" Matched profile: {profile['matched_cpu_profile']}")
201
- for point in profile['cpu_power_profile']:
202
- print(f" {point['percentage']}%: {point['watts']:.1f}W")
203
-
204
- # Save complete GCP machine power profiles
205
- with open('gcp_machine_power_profiles.json', 'w') as f:
206
- json.dump(gcp_machine_profiles, f, indent=2)
207
-
208
- print(f"\n✅ Saved complete GCP machine power profiles to gcp_machine_power_profiles.json")
209
-
210
- return gcp_machine_profiles
211
- else:
212
- print("❌ Could not map any GCP machines to power profiles")
213
- return {}
214
-
215
- def final_recommendations():
216
- """
217
- Provide final implementation recommendations based on available data
218
- """
219
- print("\n" + "="*60)
220
- print("FINAL IMPLEMENTATION RECOMMENDATIONS")
221
- print("="*60)
222
-
223
- try:
224
- with open('gcp_machine_power_profiles.json', 'r') as f:
225
- profiles = json.load(f)
226
-
227
- print("✅ SUCCESS: Ready for re:cinq implementation!")
228
- print(f" - {len(profiles)} GCP machine types with power profiles")
229
- print(" - Power consumption curves: 0%, 10%, 50%, 100% CPU utilization")
230
- print(" - Data ready for cubic spline interpolation")
231
-
232
- print("\n📋 IMPLEMENTATION STEPS:")
233
- print("1. Load gcp_machine_power_profiles.json in CarbonService")
234
- print("2. Extract machine type from GitLab runner tags")
235
- print("3. Use cubic-spline package for power interpolation")
236
- print("4. Get real-time carbon intensity from Electricity Maps API")
237
- print("5. Apply Google data center PUE values")
238
- print("6. Calculate: interpolated_power(kW) × runtime(h) × PUE × carbon_intensity")
239
-
240
- print(f"\n🔧 SAMPLE CALCULATION CODE:")
241
- sample_machine = list(profiles.keys())[0]
242
- sample_profile = profiles[sample_machine]
243
- print(f"// Example for {sample_machine}")
244
- print("const powerProfile = [")
245
- for point in sample_profile['cpu_power_profile']:
246
- print(f" {{ percentage: {point['percentage']}, watts: {point['watts']:.1f} }},")
247
- print("];")
248
- print("const powerWatts = cubicSplineInterpolation(powerProfile, cpuUtilization);")
249
-
250
- except FileNotFoundError:
251
- print("❌ No machine power profiles generated")
252
- print(" Run the correlation functions first")
253
-
254
- if __name__ == "__main__":
255
- # Step 1: Find where actual power data exists
256
- machine_ratios_power, cpu_ratios, bare_metal_power = find_actual_power_data()
257
-
258
- # Step 2: Correlate CPU information
259
- cpu_info = correlate_cpu_to_machines()
260
-
261
- # Step 3: Build power profiles from available data
262
- cpu_profiles = build_power_profiles_from_ratios()
263
-
264
- # Step 4: Map GCP machines to power profiles
265
- gcp_profiles = map_gcp_machines_to_power()
266
-
267
- # Step 5: Final recommendations
268
- final_recommendations()