greenmining 1.1.9__tar.gz → 1.2.1__tar.gz
- greenmining-1.2.1/CHANGELOG.md +107 -0
- greenmining-1.2.1/PKG-INFO +311 -0
- greenmining-1.2.1/README.md +254 -0
- {greenmining-1.1.9 → greenmining-1.2.1}/greenmining/__init__.py +29 -10
- {greenmining-1.1.9 → greenmining-1.2.1}/greenmining/analyzers/__init__.py +0 -8
- greenmining-1.2.1/greenmining/controllers/repository_controller.py +156 -0
- {greenmining-1.1.9 → greenmining-1.2.1}/greenmining/services/local_repo_analyzer.py +15 -8
- greenmining-1.2.1/greenmining.egg-info/PKG-INFO +311 -0
- {greenmining-1.1.9 → greenmining-1.2.1}/greenmining.egg-info/SOURCES.txt +0 -12
- {greenmining-1.1.9 → greenmining-1.2.1}/pyproject.toml +1 -1
- greenmining-1.1.9/CHANGELOG.md +0 -154
- greenmining-1.1.9/PKG-INFO +0 -865
- greenmining-1.1.9/README.md +0 -808
- greenmining-1.1.9/greenmining/analyzers/power_regression.py +0 -211
- greenmining-1.1.9/greenmining/analyzers/qualitative_analyzer.py +0 -394
- greenmining-1.1.9/greenmining/analyzers/version_power_analyzer.py +0 -246
- greenmining-1.1.9/greenmining/config.py +0 -91
- greenmining-1.1.9/greenmining/controllers/repository_controller.py +0 -161
- greenmining-1.1.9/greenmining/presenters/__init__.py +0 -7
- greenmining-1.1.9/greenmining/presenters/console_presenter.py +0 -143
- greenmining-1.1.9/greenmining.egg-info/PKG-INFO +0 -865
- {greenmining-1.1.9 → greenmining-1.2.1}/LICENSE +0 -0
- {greenmining-1.1.9 → greenmining-1.2.1}/MANIFEST.in +0 -0
- {greenmining-1.1.9 → greenmining-1.2.1}/greenmining/__main__.py +0 -0
- {greenmining-1.1.9 → greenmining-1.2.1}/greenmining/analyzers/code_diff_analyzer.py +0 -0
- {greenmining-1.1.9 → greenmining-1.2.1}/greenmining/analyzers/metrics_power_correlator.py +0 -0
- {greenmining-1.1.9 → greenmining-1.2.1}/greenmining/analyzers/statistical_analyzer.py +0 -0
- {greenmining-1.1.9 → greenmining-1.2.1}/greenmining/analyzers/temporal_analyzer.py +0 -0
- {greenmining-1.1.9 → greenmining-1.2.1}/greenmining/controllers/__init__.py +0 -0
- {greenmining-1.1.9 → greenmining-1.2.1}/greenmining/energy/__init__.py +0 -0
- {greenmining-1.1.9 → greenmining-1.2.1}/greenmining/energy/base.py +0 -0
- {greenmining-1.1.9 → greenmining-1.2.1}/greenmining/energy/carbon_reporter.py +0 -0
- {greenmining-1.1.9 → greenmining-1.2.1}/greenmining/energy/codecarbon_meter.py +0 -0
- {greenmining-1.1.9 → greenmining-1.2.1}/greenmining/energy/cpu_meter.py +0 -0
- {greenmining-1.1.9 → greenmining-1.2.1}/greenmining/energy/rapl.py +0 -0
- {greenmining-1.1.9 → greenmining-1.2.1}/greenmining/gsf_patterns.py +0 -0
- {greenmining-1.1.9 → greenmining-1.2.1}/greenmining/models/__init__.py +0 -0
- {greenmining-1.1.9 → greenmining-1.2.1}/greenmining/models/aggregated_stats.py +0 -0
- {greenmining-1.1.9 → greenmining-1.2.1}/greenmining/models/analysis_result.py +0 -0
- {greenmining-1.1.9 → greenmining-1.2.1}/greenmining/models/commit.py +0 -0
- {greenmining-1.1.9 → greenmining-1.2.1}/greenmining/models/repository.py +0 -0
- {greenmining-1.1.9 → greenmining-1.2.1}/greenmining/services/__init__.py +0 -0
- {greenmining-1.1.9 → greenmining-1.2.1}/greenmining/services/commit_extractor.py +0 -0
- {greenmining-1.1.9 → greenmining-1.2.1}/greenmining/services/data_aggregator.py +0 -0
- {greenmining-1.1.9 → greenmining-1.2.1}/greenmining/services/data_analyzer.py +0 -0
- {greenmining-1.1.9 → greenmining-1.2.1}/greenmining/services/github_graphql_fetcher.py +0 -0
- {greenmining-1.1.9 → greenmining-1.2.1}/greenmining/services/reports.py +0 -0
- {greenmining-1.1.9 → greenmining-1.2.1}/greenmining/utils.py +0 -0
- {greenmining-1.1.9 → greenmining-1.2.1}/greenmining.egg-info/dependency_links.txt +0 -0
- {greenmining-1.1.9 → greenmining-1.2.1}/greenmining.egg-info/requires.txt +0 -0
- {greenmining-1.1.9 → greenmining-1.2.1}/greenmining.egg-info/top_level.txt +0 -0
- {greenmining-1.1.9 → greenmining-1.2.1}/setup.cfg +0 -0
- {greenmining-1.1.9 → greenmining-1.2.1}/setup.py +0 -0
@@ -0,0 +1,107 @@
+# Changelog
+
+## [1.2.1] - 2026-02-01
+
+### Fixed
+- Clone directory collision in `LocalRepoAnalyzer` when multiple repos share the same name (e.g. `open-android/Android` vs `hmkcode/Android` vs `duckduckgo/Android`)
+- Race condition corruption during parallel analysis (`could not lock config file` errors)
+- Aligned clone path sanitization with `RepositoryController._sanitize_repo_name` (owner_repo format)
+
+### Changed
+- Clone directory structure now uses unique `owner_repo/` parent dirs per repository
+
+## [1.2.0] - 2026-01-31
+
+### Added
+- `clone_repositories()` top-level function for cloning repos into `./greenmining_repos/` with sanitized directory names
+- Repository name sanitization (`_sanitize_repo_name`) to prevent filesystem issues from special characters
+- 2 missing official GSF patterns: "Match Utilization Requirements with Pre-configured Servers", "Optimize Impact on Customer Devices and Equipment"
+- 11 new green keywords (energy proportionality, backward compatible, customer device, device lifetime, etc.)
+- GSF pattern database now covers 100% of the official Green Software Foundation catalog (61/61)
+
+### Changed
+- Repositories now clone to `./greenmining_repos/` instead of `/tmp` (fixes OS cleanup and permission issues)
+- `fetch_repositories()` takes direct parameters -- no Config intermediary
+- All function defaults are explicit parameters instead of config file values
+- Default supported languages updated from 7 to 20 (matches experiment scope)
+- Library reference documentation added to mkdocs navigation
+
+### Removed
+- **`config.py`** module entirely (Config class, get_config singleton, .env/YAML loading layer)
+- **`__version__.py`** (stale orphaned file with wrong version 1.0.5)
+- **`services/github_fetcher.py`** (empty deprecated REST API stub)
+- **`analyzers/power_regression.py`** (PowerRegressionDetector -- requires running repo code, not feasible in current pipeline)
+- **`analyzers/version_power_analyzer.py`** (VersionPowerAnalyzer -- same reason)
+- **`analyzers/qualitative_analyzer.py`** (QualitativeAnalyzer -- unused)
+- **`presenters/`** module (ConsolePresenter -- never used by any code)
+- **`docs/reference/config-options.md`** (obsolete config reference page)
+- 10 dead utility functions (estimate_tokens, estimate_cost, print_banner, print_section, load_csv_file, handle_github_rate_limit, format_duration, truncate_text, create_checkpoint, load_checkpoint)
+- 35+ unused Config attributes that were set but never read
+- Dead imports across 14 files
+- Dead methods: DataAnalyzer._check_green_awareness, DataAnalyzer._detect_known_pattern, CommitExtractor._extract_commit_metadata, StatisticalAnalyzer.pattern_adoption_rate_analysis, CodeCarbonMeter.get_carbon_intensity, Config.validate
+
+## [1.1.9] - 2026-01-31
+
+### Removed
+- Web dashboard module (`greenmining/dashboard/`) and Flask dependency
+- Dashboard documentation page and all dashboard references
+
+### Fixed
+- ReadTheDocs experiment page not rendering (trailing whitespace in mkdocs nav)
+- Plotly rendering in notebook (nbformat dependency)
+
+## [1.1.6] - 2026-01-31
+
+### Fixed
+- EnergyMetrics property aliases (`energy_joules`, `average_power_watts`)
+- Parallel energy measurement conflict with shared meter instance
+- StatisticalAnalyzer timezone-aware date handling
+- DataFrame column collision in pattern correlation analysis
+
+### Added
+- `since_date` / `to_date` parameters for date-bounded commit analysis
+- `created_before` / `pushed_after` search filters
+- GraphQL API and experiment documentation pages
+- Full process metrics and method-level metrics documentation
+
+### Changed
+- Energy measurement demonstrates all 4 backends: RAPL, CPU Meter, CodeCarbon, tracemalloc
+- Removed all PyDriller references (replaced with gitpython + lizard)
+
+### Removed
+- Qualitative Validation and Carbon Footprint Reporting steps from experiment
+
+## [0.1.12] - 2025-12-03
+
+### Added
+- Custom search keywords for repository fetching (`--keywords` option)
+- `fetch_repositories()` function exposed in public API
+
+### Changed
+- README updated to reflect 122 patterns (was showing 76 in PyPI description)
+
+## [0.1.11] - 2025-12-03
+
+### Added
+- Expanded pattern database from 76 to 122 patterns
+- Added 9 new categories
+- Expanded keywords from 190 to 321
+- VU Amsterdam 2024 research patterns for ML systems
+
+## [0.1.0] - 2025-12-02
+
+### Added
+- Initial release
+- Core functionality for GSF pattern mining
+- Support for 100 microservices repositories
+- Pattern matching with 76 GSF patterns
+- Green awareness analysis
+- Docker containerization
+
+[1.2.1]: https://github.com/adam-bouafia/greenmining/compare/v1.2.0...v1.2.1
+[1.2.0]: https://github.com/adam-bouafia/greenmining/compare/v1.1.9...v1.2.0
+[1.1.9]: https://github.com/adam-bouafia/greenmining/compare/v1.1.6...v1.1.9
+[1.1.6]: https://github.com/adam-bouafia/greenmining/compare/v0.1.12...v1.1.6
+[0.1.12]: https://github.com/adam-bouafia/greenmining/compare/v0.1.11...v0.1.12
+[0.1.11]: https://github.com/adam-bouafia/greenmining/compare/v0.1.0...v0.1.11
+[0.1.0]: https://github.com/adam-bouafia/greenmining/releases/tag/v0.1.0
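The 1.2.1 changelog entries describe clone directories keyed by an `owner_repo` name so that same-named repositories from different owners no longer collide. The sketch below illustrates that idea only. `sanitize_repo_name` and `clone_dir` are hypothetical helpers written for this note, not the package's `_sanitize_repo_name`, and the character whitelist is an assumption.

```python
import re
from pathlib import Path


def sanitize_repo_name(full_name: str) -> str:
    """Turn 'owner/repo' into a filesystem-safe 'owner_repo' string (hypothetical helper)."""
    # Join owner and repo with an underscore, then replace anything outside
    # a conservative whitelist so the result is usable as a directory name.
    joined = full_name.strip().strip("/").replace("/", "_")
    return re.sub(r"[^A-Za-z0-9._-]", "_", joined)


def clone_dir(base: Path, full_name: str) -> Path:
    """Return a per-repository parent directory under the base clone folder."""
    return base / sanitize_repo_name(full_name)


if __name__ == "__main__":
    base = Path("./greenmining_repos")
    # The three repositories named in the changelog share the repo name "Android".
    for name in ("open-android/Android", "hmkcode/Android", "duckduckgo/Android"):
        print(clone_dir(base, name))
```

Because the owner is part of the directory name, the three `Android` repositories land in three different directories. An underscore join is not strictly collision-free (`a_b/c` and `a/b_c` map to the same name), so a real implementation may need a stricter scheme.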
@@ -0,0 +1,311 @@
+Metadata-Version: 2.4
+Name: greenmining
+Version: 1.2.1
+Summary: An empirical Python library for Mining Software Repositories (MSR) in Green IT research
+Author-email: Adam Bouafia <a.bouafia@student.vu.nl>
+License: MIT
+Project-URL: Homepage, https://github.com/adam-bouafia/greenmining
+Project-URL: Documentation, https://github.com/adam-bouafia/greenmining#readme
+Project-URL: Linkedin, https://www.linkedin.com/in/adam-bouafia/
+Project-URL: Repository, https://github.com/adam-bouafia/greenmining
+Project-URL: Issues, https://github.com/adam-bouafia/greenmining/issues
+Project-URL: Changelog, https://github.com/adam-bouafia/greenmining/blob/main/CHANGELOG.md
+Keywords: green-software,gsf,msr,mining-software-repositories,green-it,sustainability,carbon-footprint,energy-efficiency,repository-analysis,github-analysis,pydriller,empirical-software-engineering
+Classifier: Development Status :: 3 - Alpha
+Classifier: Intended Audience :: Developers
+Classifier: Intended Audience :: Science/Research
+Classifier: Topic :: Software Development :: Quality Assurance
+Classifier: Topic :: Scientific/Engineering :: Information Analysis
+Classifier: License :: OSI Approved :: MIT License
+Classifier: Programming Language :: Python :: 3
+Classifier: Programming Language :: Python :: 3.9
+Classifier: Programming Language :: Python :: 3.10
+Classifier: Programming Language :: Python :: 3.11
+Classifier: Programming Language :: Python :: 3.12
+Classifier: Programming Language :: Python :: 3.13
+Classifier: Operating System :: OS Independent
+Requires-Python: >=3.9
+Description-Content-Type: text/markdown
+License-File: LICENSE
+Requires-Dist: PyGithub
+Requires-Dist: PyDriller
+Requires-Dist: pandas
+Requires-Dist: colorama
+Requires-Dist: tabulate
+Requires-Dist: tqdm
+Requires-Dist: matplotlib
+Requires-Dist: plotly
+Requires-Dist: python-dotenv
+Requires-Dist: requests
+Provides-Extra: dev
+Requires-Dist: pytest; extra == "dev"
+Requires-Dist: pytest-cov; extra == "dev"
+Requires-Dist: pytest-mock; extra == "dev"
+Requires-Dist: black; extra == "dev"
+Requires-Dist: ruff; extra == "dev"
+Requires-Dist: mypy; extra == "dev"
+Requires-Dist: build; extra == "dev"
+Requires-Dist: twine; extra == "dev"
+Provides-Extra: energy
+Requires-Dist: psutil; extra == "energy"
+Requires-Dist: codecarbon; extra == "energy"
+Provides-Extra: docs
+Requires-Dist: sphinx; extra == "docs"
+Requires-Dist: sphinx-rtd-theme; extra == "docs"
+Requires-Dist: myst-parser; extra == "docs"
+Dynamic: license-file
+
+# greenmining
+
+An empirical Python library for Mining Software Repositories (MSR) in Green IT research.
+
+[](https://pypi.org/project/greenmining/)
+[](https://pypi.org/project/greenmining/)
+[](LICENSE)
+[](https://greenmining.readthedocs.io/)
+
+## Overview
+
+`greenmining` is a research-grade Python library designed for **empirical Mining Software Repositories (MSR)** studies in **Green IT**. It enables researchers and practitioners to:
+
+- **Mine repositories at scale** - Search, fetch, and clone GitHub repositories via GraphQL API with configurable filters
+- **Classify green commits** - Detect 124 sustainability patterns from the Green Software Foundation (GSF) catalog using 332 keywords
+- **Analyze any repository by URL** - Direct Git-based analysis with support for private repositories
+- **Measure energy consumption** - RAPL, CodeCarbon, and CPU Energy Meter backends for power profiling
+- **Carbon footprint reporting** - CO2 emissions calculation with 20+ country profiles and cloud region support
+- **Method-level analysis** - Per-method complexity and metrics via Lizard integration
+- **Generate research datasets** - Statistical analysis, temporal trends, and publication-ready reports
+
+## Installation
+
+### Via pip
+
+```bash
+pip install greenmining
+```
+
+### With energy measurement
+
+```bash
+pip install greenmining[energy]
+```
+
+### From source
+
+```bash
+git clone https://github.com/adam-bouafia/greenmining.git
+cd greenmining
+pip install -e .
+```
+
+## Quick Start
+
+### Pattern Detection
+
+```python
+from greenmining import GSF_PATTERNS, is_green_aware, get_pattern_by_keywords
+
+print(f"Total patterns: {len(GSF_PATTERNS)}") # 124 patterns across 15 categories
+
+commit_msg = "Optimize Redis caching to reduce energy consumption"
+if is_green_aware(commit_msg):
+    patterns = get_pattern_by_keywords(commit_msg)
+    print(f"Matched patterns: {patterns}")
+```
+
+### Fetch Repositories
+
+```python
+from greenmining import fetch_repositories
+
+repos = fetch_repositories(
+    github_token="your_token",
+    max_repos=50,
+    min_stars=500,
+    keywords="kubernetes cloud-native",
+    languages=["Python", "Go"],
+    created_after="2020-01-01",
+    pushed_after="2023-01-01",
+)
+
+for repo in repos[:5]:
+    print(f"- {repo.full_name} ({repo.stars} stars)")
+```
+
+### Clone Repositories
+
+```python
+from greenmining import fetch_repositories, clone_repositories
+
+repos = fetch_repositories(github_token="your_token", max_repos=10, keywords="android")
+
+# Clone into ./greenmining_repos/ with sanitized directory names
+paths = clone_repositories(repos)
+print(f"Cloned {len(paths)} repositories")
+```
+
+### Analyze Repositories by URL
+
+```python
+from greenmining import analyze_repositories
+
+results = analyze_repositories(
+    urls=[
+        "https://github.com/kubernetes/kubernetes",
+        "https://github.com/istio/istio",
+    ],
+    max_commits=100,
+    parallel_workers=2,
+    energy_tracking=True,
+    energy_backend="auto",
+    method_level_analysis=True,
+    include_source_code=True,
+    github_token="your_token",
+    since_date="2020-01-01",
+    to_date="2025-12-31",
+)
+
+for result in results:
+    print(f"{result.name}: {result.green_commit_rate:.1%} green")
+```
+
+### Access Pattern Data
+
+```python
+from greenmining import GSF_PATTERNS
+
+# Get patterns by category
+cloud = {k: v for k, v in GSF_PATTERNS.items() if v['category'] == 'cloud'}
+print(f"Cloud patterns: {len(cloud)}")
+
+# All categories
+categories = set(p['category'] for p in GSF_PATTERNS.values())
+print(f"Categories: {sorted(categories)}")
+```
+
+### Energy Measurement
+
+```python
+from greenmining.energy import get_energy_meter, CPUEnergyMeter
+
+# Auto-detect best backend
+meter = get_energy_meter("auto")
+meter.start()
+# ... your workload ...
+result = meter.stop()
+print(f"Energy: {result.joules:.2f} J, Power: {result.watts_avg:.2f} W")
+```
+
+### Statistical Analysis
+
+```python
+from greenmining.analyzers import StatisticalAnalyzer, TemporalAnalyzer
+
+stat = StatisticalAnalyzer()
+temporal = TemporalAnalyzer(granularity="quarter")
+
+# Pattern correlations, effect sizes, temporal trends
+# See experiment notebook for full usage
+```
+
+### Metrics-to-Power Correlation
+
+```python
+from greenmining.analyzers import MetricsPowerCorrelator
+
+correlator = MetricsPowerCorrelator()
+correlator.fit(
+    metrics=["complexity", "nloc", "code_churn"],
+    metrics_values={
+        "complexity": [10, 20, 30, 40],
+        "nloc": [100, 200, 300, 400],
+        "code_churn": [50, 100, 150, 200],
+    },
+    power_measurements=[5.0, 8.0, 12.0, 15.0],
+)
+print(f"Feature importance: {correlator.feature_importance}")
+```
+
+## Features
+
+### Core Capabilities
+
+- **Pattern Detection**: 124 sustainability patterns across 15 categories from the GSF catalog
+- **Keyword Analysis**: 332 green software detection keywords
+- **Repository Fetching**: GraphQL API with date, star, and language filters
+- **Repository Cloning**: Sanitized directory names in `./greenmining_repos/`
+- **URL-Based Analysis**: Direct Git-based analysis from GitHub URLs (HTTPS and SSH)
+- **Batch Processing**: Parallel analysis of multiple repositories
+- **Private Repository Support**: Authentication via SSH keys or GitHub tokens
+
+### Analysis & Measurement
+
+- **Energy Measurement**: RAPL, CodeCarbon, and CPU Energy Meter backends
+- **Carbon Footprint Reporting**: CO2 emissions with 20+ country profiles (AWS, GCP, Azure)
+- **Metrics-to-Power Correlation**: Pearson and Spearman analysis between code metrics and power
+- **Method-Level Analysis**: Per-method complexity metrics via Lizard integration
+- **Source Code Access**: Before/after source code for refactoring detection
+- **Process Metrics**: DMM size, complexity, interfacing via PyDriller
+- **Statistical Analysis**: Correlations, effect sizes, and temporal trends
+- **Multi-format Output**: JSON, CSV, pandas DataFrame
+
+### Energy Backends
+
+| Backend | Platform | Metrics | Requirements |
+|---------|----------|---------|--------------|
+| **RAPL** | Linux (Intel/AMD) | CPU/RAM energy (Joules) | `/sys/class/powercap/` access |
+| **CodeCarbon** | Cross-platform | Energy + Carbon emissions (gCO2) | `pip install codecarbon` |
+| **CPU Meter** | All platforms | Estimated CPU energy (Joules) | Optional: `pip install psutil` |
+| **Auto** | All platforms | Best available backend | Automatic detection |
+
+### GSF Pattern Categories
+
+**124 patterns across 15 categories:**
+
+| Category | Patterns | Examples |
+|----------|----------|----------|
+| Cloud | 42 | Auto-scaling, serverless, right-sizing, region selection |
+| Web | 17 | CDN, caching, lazy loading, compression |
+| AI/ML | 19 | Model pruning, quantization, edge inference |
+| Database | 5 | Indexing, query optimization, connection pooling |
+| Networking | 8 | Protocol optimization, HTTP/2, gRPC |
+| Network | 6 | Request batching, GraphQL, circuit breakers |
+| Microservices | 4 | Service decomposition, graceful shutdown |
+| Infrastructure | 4 | Alpine containers, IaC, renewable regions |
+| General | 8 | Feature flags, precomputation, background jobs |
+| Others | 11 | Caching, resource, data, async, code, monitoring |
+
+## Development
+
+```bash
+git clone https://github.com/adam-bouafia/greenmining.git
+cd greenmining
+pip install -e ".[dev]"
+
+pytest tests/
+black greenmining/ tests/
+ruff check greenmining/ tests/
+```
+
+## Requirements
+
+- Python 3.9+
+- PyGithub, PyDriller, pandas, colorama, tqdm
+
+**Optional:**
+
+```bash
+pip install greenmining[energy] # psutil, codecarbon
+pip install greenmining[dev] # pytest, black, ruff, mypy
+```
+
+## License
+
+MIT License - See [LICENSE](LICENSE) for details.
+
+## Links
+
+- **GitHub**: https://github.com/adam-bouafia/greenmining
+- **PyPI**: https://pypi.org/project/greenmining/
+- **Documentation**: https://greenmining.readthedocs.io/
+- **Docker Hub**: https://hub.docker.com/r/adambouafia/greenmining
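The header of the PKG-INFO hunk uses RFC 822 style fields, so the `Requires-Dist` and `Provides-Extra` lines can be read with Python's standard `email` parser. The sketch below groups requirements by extra. It runs on a shortened sample of the metadata, `requirements_by_extra` is a hypothetical helper, and the marker handling assumes only the simple `extra == "name"` form seen in this file.

```python
from collections import defaultdict
from email import message_from_string

# A shortened copy of the metadata fields, used as sample input.
PKG_INFO = """\
Metadata-Version: 2.4
Name: greenmining
Version: 1.2.1
Requires-Dist: pandas
Requires-Dist: requests
Requires-Dist: pytest; extra == "dev"
Requires-Dist: psutil; extra == "energy"
Requires-Dist: codecarbon; extra == "energy"
"""


def requirements_by_extra(pkg_info_text: str) -> dict:
    """Group Requires-Dist entries by extra; core requirements use the key ''."""
    msg = message_from_string(pkg_info_text)
    grouped = defaultdict(list)
    for entry in msg.get_all("Requires-Dist", []):
        name, _, marker = entry.partition(";")
        extra = ""
        if "extra ==" in marker:
            # Assumed marker shape: extra == "energy"
            extra = marker.split("==", 1)[1].strip().strip("\"'")
        grouped[extra].append(name.strip())
    return dict(grouped)


if __name__ == "__main__":
    for extra, names in requirements_by_extra(PKG_INFO).items():
        print(extra or "(core)", names)
```

Real environment markers can be more complex than this. For anything beyond a quick inspection, the `packaging` library's `Requirement` class is the more robust way to parse them.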
@@ -0,0 +1,254 @@
+# greenmining
+
+An empirical Python library for Mining Software Repositories (MSR) in Green IT research.
+
+[](https://pypi.org/project/greenmining/)
+[](https://pypi.org/project/greenmining/)
+[](LICENSE)
+[](https://greenmining.readthedocs.io/)
+
+## Overview
+
+`greenmining` is a research-grade Python library designed for **empirical Mining Software Repositories (MSR)** studies in **Green IT**. It enables researchers and practitioners to:
+
+- **Mine repositories at scale** - Search, fetch, and clone GitHub repositories via GraphQL API with configurable filters
+- **Classify green commits** - Detect 124 sustainability patterns from the Green Software Foundation (GSF) catalog using 332 keywords
+- **Analyze any repository by URL** - Direct Git-based analysis with support for private repositories
+- **Measure energy consumption** - RAPL, CodeCarbon, and CPU Energy Meter backends for power profiling
+- **Carbon footprint reporting** - CO2 emissions calculation with 20+ country profiles and cloud region support
+- **Method-level analysis** - Per-method complexity and metrics via Lizard integration
+- **Generate research datasets** - Statistical analysis, temporal trends, and publication-ready reports
+
+## Installation
+
+### Via pip
+
+```bash
+pip install greenmining
+```
+
+### With energy measurement
+
+```bash
+pip install greenmining[energy]
+```
+
+### From source
+
+```bash
+git clone https://github.com/adam-bouafia/greenmining.git
+cd greenmining
+pip install -e .
+```
+
+## Quick Start
+
+### Pattern Detection
+
+```python
+from greenmining import GSF_PATTERNS, is_green_aware, get_pattern_by_keywords
+
+print(f"Total patterns: {len(GSF_PATTERNS)}") # 124 patterns across 15 categories
+
+commit_msg = "Optimize Redis caching to reduce energy consumption"
+if is_green_aware(commit_msg):
+    patterns = get_pattern_by_keywords(commit_msg)
+    print(f"Matched patterns: {patterns}")
+```
+
+### Fetch Repositories
+
+```python
+from greenmining import fetch_repositories
+
+repos = fetch_repositories(
+    github_token="your_token",
+    max_repos=50,
+    min_stars=500,
+    keywords="kubernetes cloud-native",
+    languages=["Python", "Go"],
+    created_after="2020-01-01",
+    pushed_after="2023-01-01",
+)
+
+for repo in repos[:5]:
+    print(f"- {repo.full_name} ({repo.stars} stars)")
+```
+
+### Clone Repositories
+
+```python
+from greenmining import fetch_repositories, clone_repositories
+
+repos = fetch_repositories(github_token="your_token", max_repos=10, keywords="android")
+
+# Clone into ./greenmining_repos/ with sanitized directory names
+paths = clone_repositories(repos)
+print(f"Cloned {len(paths)} repositories")
+```
+
+### Analyze Repositories by URL
+
+```python
+from greenmining import analyze_repositories
+
+results = analyze_repositories(
+    urls=[
+        "https://github.com/kubernetes/kubernetes",
+        "https://github.com/istio/istio",
+    ],
+    max_commits=100,
+    parallel_workers=2,
+    energy_tracking=True,
+    energy_backend="auto",
+    method_level_analysis=True,
+    include_source_code=True,
+    github_token="your_token",
+    since_date="2020-01-01",
+    to_date="2025-12-31",
+)
+
+for result in results:
+    print(f"{result.name}: {result.green_commit_rate:.1%} green")
+```
+
+### Access Pattern Data
+
+```python
+from greenmining import GSF_PATTERNS
+
+# Get patterns by category
+cloud = {k: v for k, v in GSF_PATTERNS.items() if v['category'] == 'cloud'}
+print(f"Cloud patterns: {len(cloud)}")
+
+# All categories
+categories = set(p['category'] for p in GSF_PATTERNS.values())
+print(f"Categories: {sorted(categories)}")
+```
+
+### Energy Measurement
+
+```python
+from greenmining.energy import get_energy_meter, CPUEnergyMeter
+
+# Auto-detect best backend
+meter = get_energy_meter("auto")
+meter.start()
+# ... your workload ...
+result = meter.stop()
+print(f"Energy: {result.joules:.2f} J, Power: {result.watts_avg:.2f} W")
+```
+
+### Statistical Analysis
+
+```python
+from greenmining.analyzers import StatisticalAnalyzer, TemporalAnalyzer
+
+stat = StatisticalAnalyzer()
+temporal = TemporalAnalyzer(granularity="quarter")
+
+# Pattern correlations, effect sizes, temporal trends
+# See experiment notebook for full usage
+```
+
+### Metrics-to-Power Correlation
+
+```python
+from greenmining.analyzers import MetricsPowerCorrelator
+
+correlator = MetricsPowerCorrelator()
+correlator.fit(
+    metrics=["complexity", "nloc", "code_churn"],
+    metrics_values={
+        "complexity": [10, 20, 30, 40],
+        "nloc": [100, 200, 300, 400],
+        "code_churn": [50, 100, 150, 200],
+    },
+    power_measurements=[5.0, 8.0, 12.0, 15.0],
+)
+print(f"Feature importance: {correlator.feature_importance}")
+```
+
+## Features
+
+### Core Capabilities
+
+- **Pattern Detection**: 124 sustainability patterns across 15 categories from the GSF catalog
+- **Keyword Analysis**: 332 green software detection keywords
+- **Repository Fetching**: GraphQL API with date, star, and language filters
+- **Repository Cloning**: Sanitized directory names in `./greenmining_repos/`
+- **URL-Based Analysis**: Direct Git-based analysis from GitHub URLs (HTTPS and SSH)
+- **Batch Processing**: Parallel analysis of multiple repositories
+- **Private Repository Support**: Authentication via SSH keys or GitHub tokens
+
+### Analysis & Measurement
+
+- **Energy Measurement**: RAPL, CodeCarbon, and CPU Energy Meter backends
+- **Carbon Footprint Reporting**: CO2 emissions with 20+ country profiles (AWS, GCP, Azure)
+- **Metrics-to-Power Correlation**: Pearson and Spearman analysis between code metrics and power
+- **Method-Level Analysis**: Per-method complexity metrics via Lizard integration
+- **Source Code Access**: Before/after source code for refactoring detection
+- **Process Metrics**: DMM size, complexity, interfacing via PyDriller
+- **Statistical Analysis**: Correlations, effect sizes, and temporal trends
+- **Multi-format Output**: JSON, CSV, pandas DataFrame
+
+### Energy Backends
+
+| Backend | Platform | Metrics | Requirements |
+|---------|----------|---------|--------------|
+| **RAPL** | Linux (Intel/AMD) | CPU/RAM energy (Joules) | `/sys/class/powercap/` access |
+| **CodeCarbon** | Cross-platform | Energy + Carbon emissions (gCO2) | `pip install codecarbon` |
+| **CPU Meter** | All platforms | Estimated CPU energy (Joules) | Optional: `pip install psutil` |
+| **Auto** | All platforms | Best available backend | Automatic detection |
+
+### GSF Pattern Categories
+
+**124 patterns across 15 categories:**
+
+| Category | Patterns | Examples |
+|----------|----------|----------|
+| Cloud | 42 | Auto-scaling, serverless, right-sizing, region selection |
+| Web | 17 | CDN, caching, lazy loading, compression |
+| AI/ML | 19 | Model pruning, quantization, edge inference |
+| Database | 5 | Indexing, query optimization, connection pooling |
+| Networking | 8 | Protocol optimization, HTTP/2, gRPC |
+| Network | 6 | Request batching, GraphQL, circuit breakers |
+| Microservices | 4 | Service decomposition, graceful shutdown |
+| Infrastructure | 4 | Alpine containers, IaC, renewable regions |
+| General | 8 | Feature flags, precomputation, background jobs |
+| Others | 11 | Caching, resource, data, async, code, monitoring |
+
+## Development
+
+```bash
+git clone https://github.com/adam-bouafia/greenmining.git
+cd greenmining
+pip install -e ".[dev]"
+
+pytest tests/
+black greenmining/ tests/
+ruff check greenmining/ tests/
+```
+
+## Requirements
+
+- Python 3.9+
+- PyGithub, PyDriller, pandas, colorama, tqdm
+
+**Optional:**
+
+```bash
+pip install greenmining[energy] # psutil, codecarbon
+pip install greenmining[dev] # pytest, black, ruff, mypy
+```
+
+## License
+
+MIT License - See [LICENSE](LICENSE) for details.
+
+## Links
+
+- **GitHub**: https://github.com/adam-bouafia/greenmining
+- **PyPI**: https://pypi.org/project/greenmining/
+- **Documentation**: https://greenmining.readthedocs.io/
+- **Docker Hub**: https://hub.docker.com/r/adambouafia/greenmining
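The README lists "Pearson and Spearman analysis between code metrics and power" as a feature. The standard-library sketch below shows what those two coefficients compute, using the sample values from the README's `MetricsPowerCorrelator` example. `pearson`, `ranks`, and `spearman` are illustrative helpers written for this note and do not reproduce the package's internals.

```python
import math


def pearson(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # Raises ZeroDivisionError if either input is constant (zero variance).
    return cov / math.sqrt(var_x * var_y)


def ranks(values):
    """Average 1-based ranks, so tied values share the same rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(ranks(xs), ranks(ys))


if __name__ == "__main__":
    complexity = [10, 20, 30, 40]
    power = [5.0, 8.0, 12.0, 15.0]
    print(f"Pearson:  {pearson(complexity, power):.4f}")
    print(f"Spearman: {spearman(complexity, power):.4f}")
```

Pearson measures how close the relationship is to a straight line, while Spearman only asks whether power rises whenever the metric rises. On this sample the ordering is perfectly monotonic, so Spearman is 1, while Pearson comes out slightly below 1 because the power increments are not uniform.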