greenmining 1.0.3__tar.gz → 1.0.5__tar.gz
- {greenmining-1.0.3/greenmining.egg-info → greenmining-1.0.5}/PKG-INFO +69 -173
- {greenmining-1.0.3 → greenmining-1.0.5}/README.md +66 -169
- greenmining-1.0.5/greenmining/__init__.py +43 -0
- greenmining-1.0.5/greenmining/__main__.py +12 -0
- greenmining-1.0.5/greenmining/__version__.py +3 -0
- greenmining-1.0.5/greenmining/analyzers/__init__.py +13 -0
- {greenmining-1.0.3 → greenmining-1.0.5}/greenmining/analyzers/code_diff_analyzer.py +151 -61
- {greenmining-1.0.3 → greenmining-1.0.5}/greenmining/analyzers/qualitative_analyzer.py +15 -81
- {greenmining-1.0.3 → greenmining-1.0.5}/greenmining/analyzers/statistical_analyzer.py +8 -69
- {greenmining-1.0.3 → greenmining-1.0.5}/greenmining/analyzers/temporal_analyzer.py +16 -72
- greenmining-1.0.5/greenmining/config.py +188 -0
- greenmining-1.0.5/greenmining/controllers/__init__.py +7 -0
- greenmining-1.0.5/greenmining/controllers/repository_controller.py +231 -0
- greenmining-1.0.5/greenmining/energy/__init__.py +13 -0
- greenmining-1.0.5/greenmining/energy/base.py +165 -0
- greenmining-1.0.5/greenmining/energy/codecarbon_meter.py +146 -0
- greenmining-1.0.5/greenmining/energy/rapl.py +157 -0
- {greenmining-1.0.3 → greenmining-1.0.5}/greenmining/gsf_patterns.py +4 -26
- {greenmining-1.0.3 → greenmining-1.0.5}/greenmining/models/__init__.py +1 -5
- {greenmining-1.0.3 → greenmining-1.0.5}/greenmining/models/aggregated_stats.py +4 -4
- {greenmining-1.0.3 → greenmining-1.0.5}/greenmining/models/analysis_result.py +4 -4
- {greenmining-1.0.3 → greenmining-1.0.5}/greenmining/models/commit.py +5 -5
- {greenmining-1.0.3 → greenmining-1.0.5}/greenmining/models/repository.py +5 -5
- greenmining-1.0.5/greenmining/presenters/__init__.py +7 -0
- {greenmining-1.0.3 → greenmining-1.0.5}/greenmining/presenters/console_presenter.py +24 -24
- greenmining-1.0.5/greenmining/services/__init__.py +17 -0
- {greenmining-1.0.3 → greenmining-1.0.5}/greenmining/services/commit_extractor.py +8 -152
- {greenmining-1.0.3 → greenmining-1.0.5}/greenmining/services/data_aggregator.py +45 -175
- {greenmining-1.0.3 → greenmining-1.0.5}/greenmining/services/data_analyzer.py +9 -202
- greenmining-1.0.5/greenmining/services/github_fetcher.py +210 -0
- greenmining-1.0.5/greenmining/services/github_graphql_fetcher.py +361 -0
- greenmining-1.0.5/greenmining/services/local_repo_analyzer.py +387 -0
- {greenmining-1.0.3 → greenmining-1.0.5}/greenmining/services/reports.py +33 -137
- {greenmining-1.0.3 → greenmining-1.0.5}/greenmining/utils.py +21 -149
- {greenmining-1.0.3 → greenmining-1.0.5/greenmining.egg-info}/PKG-INFO +69 -173
- {greenmining-1.0.3 → greenmining-1.0.5}/greenmining.egg-info/SOURCES.txt +12 -10
- {greenmining-1.0.3 → greenmining-1.0.5}/greenmining.egg-info/requires.txt +2 -2
- {greenmining-1.0.3 → greenmining-1.0.5}/pyproject.toml +4 -8
- {greenmining-1.0.3 → greenmining-1.0.5}/setup.py +1 -1
- greenmining-1.0.3/greenmining/__init__.py +0 -61
- greenmining-1.0.3/greenmining/__main__.py +0 -6
- greenmining-1.0.3/greenmining/__version__.py +0 -3
- greenmining-1.0.3/greenmining/analyzers/__init__.py +0 -17
- greenmining-1.0.3/greenmining/analyzers/ml_feature_extractor.py +0 -512
- greenmining-1.0.3/greenmining/analyzers/nlp_analyzer.py +0 -365
- greenmining-1.0.3/greenmining/cli.py +0 -471
- greenmining-1.0.3/greenmining/config.py +0 -141
- greenmining-1.0.3/greenmining/controllers/__init__.py +0 -11
- greenmining-1.0.3/greenmining/controllers/repository_controller.py +0 -172
- greenmining-1.0.3/greenmining/main.py +0 -37
- greenmining-1.0.3/greenmining/presenters/__init__.py +0 -11
- greenmining-1.0.3/greenmining/services/__init__.py +0 -13
- greenmining-1.0.3/greenmining/services/github_fetcher.py +0 -323
- greenmining-1.0.3/greenmining.egg-info/entry_points.txt +0 -2
- greenmining-1.0.3/pytest.ini +0 -22
- {greenmining-1.0.3 → greenmining-1.0.5}/CHANGELOG.md +0 -0
- {greenmining-1.0.3 → greenmining-1.0.5}/LICENSE +0 -0
- {greenmining-1.0.3 → greenmining-1.0.5}/MANIFEST.in +0 -0
- {greenmining-1.0.3 → greenmining-1.0.5}/greenmining.egg-info/dependency_links.txt +0 -0
- {greenmining-1.0.3 → greenmining-1.0.5}/greenmining.egg-info/top_level.txt +0 -0
- {greenmining-1.0.3 → greenmining-1.0.5}/setup.cfg +0 -0
@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: greenmining
-Version: 1.0.
+Version: 1.0.5
 Summary: Analyze GitHub repositories to identify green software engineering patterns and energy-efficient practices
 Author-email: Adam Bouafia <a.bouafia@student.vu.nl>
 License: MIT
@@ -23,20 +23,19 @@ Classifier: Programming Language :: Python :: 3.11
 Classifier: Programming Language :: Python :: 3.12
 Classifier: Programming Language :: Python :: 3.13
 Classifier: Operating System :: OS Independent
-Classifier: Environment :: Console
 Requires-Python: >=3.9
 Description-Content-Type: text/markdown
 License-File: LICENSE
 Requires-Dist: PyGithub>=2.1.1
 Requires-Dist: PyDriller>=2.5
 Requires-Dist: pandas>=2.2.0
-Requires-Dist: click>=8.1.7
 Requires-Dist: colorama>=0.4.6
 Requires-Dist: tabulate>=0.9.0
 Requires-Dist: tqdm>=4.66.0
 Requires-Dist: matplotlib>=3.8.0
 Requires-Dist: plotly>=5.18.0
 Requires-Dist: python-dotenv>=1.0.0
+Requires-Dist: requests>=2.31.0
 Provides-Extra: dev
 Requires-Dist: pytest>=7.4.0; extra == "dev"
 Requires-Dist: pytest-cov>=4.1.0; extra == "dev"
@@ -44,7 +43,7 @@ Requires-Dist: pytest-mock>=3.12.0; extra == "dev"
 Requires-Dist: black>=23.12.0; extra == "dev"
 Requires-Dist: ruff>=0.1.9; extra == "dev"
 Requires-Dist: mypy>=1.8.0; extra == "dev"
-Requires-Dist: build>=1.0.
+Requires-Dist: build>=1.0.5; extra == "dev"
 Requires-Dist: twine>=4.0.2; extra == "dev"
 Provides-Extra: docs
 Requires-Dist: sphinx>=7.2.0; extra == "docs"
@@ -62,7 +61,7 @@ Green mining for microservices repositories.

 ## Overview

-`greenmining` is a Python library
+`greenmining` is a Python library for analyzing GitHub repositories to identify green software engineering practices and energy-efficient patterns. It detects sustainable software patterns across cloud, web, AI, database, networking, and general categories.

 ## Installation

@@ -88,37 +87,6 @@ docker pull adambouafia/greenmining:latest

 ## Quick Start

-### CLI Usage
-
-```bash
-# Set your GitHub token
-export GITHUB_TOKEN="your_github_token"
-
-# Run full analysis pipeline
-greenmining pipeline --max-repos 100
-
-# Fetch repositories with custom keywords
-greenmining fetch --max-repos 100 --min-stars 100 --keywords "kubernetes docker cloud-native"
-
-# Fetch with default (microservices)
-greenmining fetch --max-repos 100 --min-stars 100
-
-# Extract commits
-greenmining extract --max-commits 50
-
-# Analyze for green patterns
-greenmining analyze
-
-# Analyze with advanced features
-greenmining analyze --enable-nlp --enable-ml-features --enable-diff-analysis
-
-# Aggregate results with temporal analysis
-greenmining aggregate --enable-temporal --temporal-granularity quarter --enable-enhanced-stats
-
-# Generate report
-greenmining report
-```
-
 ### Python API

 #### Basic Pattern Detection
@@ -137,7 +105,7 @@ if is_green_aware(commit_msg):
 # Output: ['Cache Static Data', 'Use Efficient Cache Strategies']
 ```

-#### Fetch Repositories with Custom Keywords
+#### Fetch Repositories with Custom Keywords

 ```python
 from greenmining import fetch_repositories
@@ -176,8 +144,6 @@ for repo in repos[:5]:
 ```python
 from greenmining.services.commit_extractor import CommitExtractor
 from greenmining.services.data_analyzer import DataAnalyzer
-from greenmining.analyzers.nlp_analyzer import NLPAnalyzer
-from greenmining.analyzers.ml_feature_extractor import MLFeatureExtractor
 from greenmining import fetch_repositories

 # Fetch repositories with custom keywords
@@ -197,23 +163,10 @@ extractor = CommitExtractor(
 # Initialize analyzer with advanced features
 analyzer = DataAnalyzer(
 enable_diff_analysis=False, # Enable code diff analysis (slower but more accurate)
-enable_nlp=True, # Enable NLP-enhanced pattern detection
-enable_ml_features=True, # Enable ML feature extraction
 patterns=None, # Custom pattern dict (default: GSF_PATTERNS)
 batch_size=10 # Batch processing size (default: 10)
 )

-# Optional: Configure NLP analyzer separately
-nlp_analyzer = NLPAnalyzer(
-enable_stemming=True, # Enable morphological analysis (optimize→optimizing)
-enable_synonyms=True # Enable semantic synonym matching (cache→buffer)
-)
-
-# Optional: Configure ML feature extractor
-ml_extractor = MLFeatureExtractor(
-green_keywords=None # Custom keyword list (default: built-in 19 keywords)
-)
-
 # Extract commits from first repo
 commits = extractor.extract_commits(
 repository=repos[0], # PyGithub Repository object
@@ -229,18 +182,9 @@ commits = extractor.extract_commits(

 **DataAnalyzer Parameters:**
 - `enable_diff_analysis` (bool, default=False): Enable code diff analysis (slower)
-- `enable_nlp` (bool, default=False): Enable NLP-enhanced pattern detection
-- `enable_ml_features` (bool, default=False): Enable ML feature extraction
 - `patterns` (dict, optional): Custom pattern dictionary (default: GSF_PATTERNS)
 - `batch_size` (int, default=10): Number of commits to process in each batch

-**NLPAnalyzer Parameters:**
-- `enable_stemming` (bool, default=True): Enable morphological variant matching
-- `enable_synonyms` (bool, default=True): Enable semantic synonym expansion
-
-**MLFeatureExtractor Parameters:**
-- `green_keywords` (list[str], optional): Custom green keywords list
-
 # Analyze commits for green patterns
 results = []
 for commit in commits:
@@ -249,18 +193,6 @@ for commit in commits:
 results.append(result)
 print(f"Green commit found: {commit.message[:50]}...")
 print(f" Patterns: {result['known_pattern']}")
-
-# Access NLP analysis results (NEW)
-if 'nlp_analysis' in result:
-nlp = result['nlp_analysis']
-print(f" NLP: {nlp['morphological_count']} morphological matches, "
-f"{nlp['semantic_count']} semantic matches")
-
-# Access ML features (NEW)
-if 'ml_features' in result:
-ml = result['ml_features']['text']
-print(f" ML Features: {ml['word_count']} words, "
-f"keyword density: {ml['keyword_density']:.2f}")
 ```

 #### Access Sustainability Patterns Data
@@ -296,7 +228,7 @@ print(f"Available categories: {sorted(categories)}")
 # 'monitoring', 'network', 'networking', 'resource', 'web']
 ```

-#### Advanced Analysis: Temporal Trends
+#### Advanced Analysis: Temporal Trends

 ```python
 from greenmining.services.data_aggregator import DataAggregator
@@ -306,7 +238,7 @@ from greenmining.analyzers.qualitative_analyzer import QualitativeAnalyzer
 # Initialize aggregator with all advanced features
 aggregator = DataAggregator(
 config=None, # Config object (optional)
-
+enable_stats=True, # Enable statistical analysis (correlations, trends)
 enable_temporal=True, # Enable temporal trend analysis
 temporal_granularity="quarter" # Time granularity: day/week/month/quarter/year
 )
@@ -330,7 +262,7 @@ aggregated = aggregator.aggregate(

 **DataAggregator Parameters:**
 - `config` (Config, optional): Configuration object
-- `
+- `enable_stats` (bool, default=False): Enable pattern correlations and effect size analysis
 - `enable_temporal` (bool, default=False): Enable temporal trend analysis over time
 - `temporal_granularity` (str, default="quarter"): Time granularity (day/week/month/quarter/year)

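The `temporal_granularity` values listed in the hunk above (day/week/month/quarter/year) can be pictured as a mapping from a commit date to a period label. The sketch below is illustrative only: `period_key` and `count_by_period` are hypothetical helpers written for this note, the commit dates are invented, and nothing here is taken from greenmining's own `temporal_analyzer.py`, which this diff does not show.

```python
from collections import Counter
from datetime import date


def period_key(d: date, granularity: str) -> str:
    """Map a date to a period label (hypothetical helper, not greenmining's API)."""
    if granularity == "day":
        return d.isoformat()
    if granularity == "week":
        # ISO weeks can belong to the neighbouring calendar year.
        iso_year, iso_week, _ = d.isocalendar()
        return f"{iso_year}-W{iso_week:02d}"
    if granularity == "month":
        return f"{d.year}-{d.month:02d}"
    if granularity == "quarter":
        return f"{d.year}-Q{(d.month - 1) // 3 + 1}"
    if granularity == "year":
        return str(d.year)
    raise ValueError(f"unknown granularity: {granularity!r}")


def count_by_period(dates, granularity="quarter"):
    """Count how many dates fall into each period."""
    return Counter(period_key(d, granularity) for d in dates)


# Invented commit dates, for illustration only.
commit_dates = [date(2024, 1, 15), date(2024, 3, 31), date(2024, 4, 1), date(2024, 12, 30)]
quarterly = count_by_period(commit_dates, "quarter")
print(dict(quarterly))
```

A coarser granularity yields fewer, larger buckets, which is presumably why the README defaults to `"quarter"` for trend analysis.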
@@ -428,7 +360,7 @@ repositories = fetch_repositories(
 min_stars=10,
 keywords="software engineering",
 )
-print(f"
+print(f"Fetched {len(repositories)} repositories")

 # STAGE 2: Extract Commits
 print("\nExtracting commits...")
@@ -440,7 +372,7 @@ extractor = CommitExtractor(
 timeout=120,
 )
 all_commits = extractor.extract_from_repositories(repositories)
-print(f"
+print(f"Extracted {len(all_commits)} commits")

 # Save commits
 extractor.save_results(
@@ -452,8 +384,6 @@ extractor.save_results(
 # STAGE 3: Analyze Commits
 print("\nAnalyzing commits...")
 analyzer = DataAnalyzer(
-enable_nlp=True,
-enable_ml_features=True,
 enable_diff_analysis=False, # Set to True for detailed code analysis (slower)
 )
 analyzed_commits = analyzer.analyze_commits(all_commits)
@@ -461,8 +391,8 @@ analyzed_commits = analyzer.analyze_commits(all_commits)
 # Count green-aware commits
 green_count = sum(1 for c in analyzed_commits if c.get("green_aware", False))
 green_percentage = (green_count / len(analyzed_commits) * 100) if analyzed_commits else 0
-print(f"
-print(f"
+print(f"Analyzed {len(analyzed_commits)} commits")
+print(f"Green-aware: {green_count} ({green_percentage:.1f}%)")

 # Save analysis
 analyzer.save_results(analyzed_commits, output_dir / "analyzed.json")
@@ -470,7 +400,7 @@ analyzer.save_results(analyzed_commits, output_dir / "analyzed.json")
 # STAGE 4: Aggregate Results
 print("\nAggregating results...")
 aggregator = DataAggregator(
-
+enable_stats=True,
 enable_temporal=True,
 temporal_granularity="quarter",
 )
@@ -490,15 +420,15 @@ print("\n" + "="*80)
 print("ANALYSIS COMPLETE")
 print("="*80)
 aggregator.print_summary(results)
-print(f"\
+print(f"\nResults saved in: {output_dir.absolute()}")
 ```

 **What this example does:**

 1. **Fetches repositories** from GitHub based on keywords and filters
 2. **Extracts commits** from each repository (up to 1000 per repo)
-3. **Analyzes commits** for green software patterns
-4. **Aggregates results** with temporal analysis and
+3. **Analyzes commits** for green software patterns
+4. **Aggregates results** with temporal analysis and statistics
 5. **Saves results** to JSON and CSV files for further analysis

 **Expected output files:**
@@ -513,17 +443,13 @@ print(f"\n📁 Results saved in: {output_dir.absolute()}")
 ### Docker Usage

 ```bash
-#
-docker run -v $(pwd)/data:/app/data \
-adambouafia/greenmining:latest
-
-# With custom configuration
-docker run -v $(pwd)/.env:/app/.env:ro \
--v $(pwd)/data:/app/data \
-adambouafia/greenmining:latest pipeline --max-repos 50
+# Interactive shell with Python
+docker run -it -v $(pwd)/data:/app/data \
+adambouafia/greenmining:latest python

-#
-docker run -
+# Run Python script
+docker run -v $(pwd)/data:/app/data \
+adambouafia/greenmining:latest python your_script.py
 ```

 ## Configuration
@@ -549,14 +475,12 @@ EXCLUDE_BOT_COMMITS=true

 # Optional - Analysis Features
 ENABLE_DIFF_ANALYSIS=false
-ENABLE_NLP=true
-ENABLE_ML_FEATURES=true
 BATCH_SIZE=10

 # Optional - Temporal Analysis
 ENABLE_TEMPORAL=true
 TEMPORAL_GRANULARITY=quarter
-
+ENABLE_STATS=true

 # Optional - Output
 OUTPUT_DIR=./data
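The `.env` keys in the hunk above are plain strings, so whatever loads them has to turn `true`/`false` into booleans and validate the granularity. A minimal sketch of that conversion follows. It assumes nothing about how greenmining's `config.py` actually parses these values (the diff does not show it): `env_flag`, `env_choice`, and the accepted truthy spellings are hypothetical choices made for this example.

```python
import os

GRANULARITIES = {"day", "week", "month", "quarter", "year"}
TRUTHY = {"1", "true", "yes", "on"}  # assumed spellings, not taken from greenmining


def env_flag(name, default=False, env=None):
    """Read a boolean setting such as ENABLE_STATS=true (hypothetical helper)."""
    env = os.environ if env is None else env
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in TRUTHY


def env_choice(name, default, allowed, env=None):
    """Read a setting that must be one of a fixed set of values."""
    env = os.environ if env is None else env
    value = env.get(name, default).strip().lower()
    if value not in allowed:
        raise ValueError(f"{name} must be one of {sorted(allowed)}, got {value!r}")
    return value


# The 1.0.5 values from the hunk above, passed as a dict instead of the real environment.
example_env = {
    "ENABLE_DIFF_ANALYSIS": "false",
    "ENABLE_TEMPORAL": "true",
    "TEMPORAL_GRANULARITY": "quarter",
    "ENABLE_STATS": "true",
}
settings = {
    "enable_diff_analysis": env_flag("ENABLE_DIFF_ANALYSIS", env=example_env),
    "enable_temporal": env_flag("ENABLE_TEMPORAL", env=example_env),
    "enable_stats": env_flag("ENABLE_STATS", env=example_env),
    "temporal_granularity": env_choice(
        "TEMPORAL_GRANULARITY", "quarter", GRANULARITIES, env=example_env
    ),
}
print(settings)
```

Passing the mapping explicitly keeps the example free of side effects on the real process environment.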
@@ -586,14 +510,12 @@ config = Config(

 # Analysis Options
 enable_diff_analysis=False, # Enable code diff analysis
-enable_nlp=True, # Enable NLP features
-enable_ml_features=True, # Enable ML feature extraction
 batch_size=10, # Batch processing size

 # Temporal Analysis
 enable_temporal=True, # Enable temporal trend analysis
 temporal_granularity="quarter", # day/week/month/quarter/year
-
+enable_stats=True, # Enable statistical analysis

 # Output Configuration
 output_dir="./data", # Output directory path
@@ -619,6 +541,50 @@ config = Config(
 - **Docker Support**: Pre-built images for containerized analysis
 - **Programmatic API**: Full Python API for custom workflows and integrations
 - **Clean Architecture**: Modular design with services layer (Fetcher, Extractor, Analyzer, Aggregator, Reports)
+- **Energy Measurement**: Real-time energy consumption tracking via RAPL (Linux) or CodeCarbon (cross-platform)
+
+### Energy Measurement
+
+greenmining includes built-in energy measurement capabilities for tracking the carbon footprint of your analysis:
+
+#### Backend Options
+
+| Backend | Platform | Metrics | Requirements |
+|---------|----------|---------|--------------|
+| **RAPL** | Linux (Intel/AMD) | CPU/RAM energy (Joules) | `/sys/class/powercap/` access |
+| **CodeCarbon** | Cross-platform | Energy + Carbon emissions (gCO2) | `pip install codecarbon` |
+
+#### Python API
+
+```python
+from greenmining.energy import RAPLEnergyMeter, CodeCarbonMeter
+
+# RAPL (Linux only)
+rapl = RAPLEnergyMeter()
+if rapl.is_available():
+rapl.start()
+# ... run analysis ...
+result = rapl.stop()
+print(f"Energy: {result.energy_joules:.2f} J")
+
+# CodeCarbon (cross-platform)
+cc = CodeCarbonMeter()
+if cc.is_available():
+cc.start()
+# ... run analysis ...
+result = cc.stop()
+print(f"Energy: {result.energy_joules:.2f} J")
+print(f"Carbon: {result.carbon_grams:.4f} gCO2")
+```
+
+#### Experiment Results
+
+CodeCarbon was verified with a real experiment:
+- **Repository**: flask (pallets/flask)
+- **Commits analyzed**: 10
+- **Energy measured**: 160.6 J
+- **Carbon emissions**: 0.0119 gCO2
+- **Duration**: 11.28 seconds

 ### Pattern Database

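The Energy Measurement section added in the hunk above shows a start/stop pattern and one set of experiment figures. The sketch below wraps that pattern in a context manager and derives two further numbers from the reported figures. It relies only on the three calls the README shows (`is_available()`, `start()`, `stop()`) and the two result attributes (`energy_joules`, `carbon_grams`). `FakeMeter`, `Reading`, `measured`, and `summarize` are hypothetical stand-ins written for this note so the example runs without greenmining installed, and whether the real meter classes support `with` directly is not shown in the diff.

```python
from contextlib import contextmanager
from dataclasses import dataclass


@dataclass
class Reading:
    """Stand-in result object; models only the two attributes the README prints."""
    energy_joules: float
    carbon_grams: float = 0.0


class FakeMeter:
    """Hypothetical meter exposing the same three methods the README calls."""

    def __init__(self, reading):
        self._reading = reading
        self.running = False

    def is_available(self):
        return True

    def start(self):
        self.running = True

    def stop(self):
        self.running = False
        return self._reading


@contextmanager
def measured(meter):
    """Start the meter, run the block, and stop the meter even if the block raises."""
    if not meter.is_available():
        raise RuntimeError("energy backend is not available on this machine")
    meter.start()
    box = {}
    try:
        yield box
    finally:
        box["result"] = meter.stop()


def summarize(energy_joules, carbon_grams, duration_s):
    """Derive average power and implied carbon intensity from raw readings."""
    energy_kwh = energy_joules / 3.6e6  # 1 kWh = 3.6e6 J
    return {
        "avg_power_w": energy_joules / duration_s,
        "energy_kwh": energy_kwh,
        "g_co2_per_kwh": carbon_grams / energy_kwh,
    }


# Figures copied from the README's experiment: 160.6 J, 0.0119 gCO2, 11.28 s.
meter = FakeMeter(Reading(energy_joules=160.6, carbon_grams=0.0119))
with measured(meter) as box:
    pass  # ... run analysis ...

stats = summarize(box["result"].energy_joules, box["result"].carbon_grams, duration_s=11.28)
print(f"{stats['avg_power_w']:.2f} W average, {stats['g_co2_per_kwh']:.0f} gCO2/kWh implied")
```

The average power and implied grid intensity are arithmetic on the README's numbers, not values reported by the package; with a real meter, `FakeMeter(...)` would be replaced by `RAPLEnergyMeter()` or `CodeCarbonMeter()` as imported in the hunk.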
@@ -683,77 +649,6 @@ Alpine containers, Infrastructure as Code, renewable energy regions, container o
 ### 15. General (8 patterns)
 Feature flags, incremental processing, precomputation, background jobs, workflow optimization

-## CLI Commands
-
-| Command | Description | Key Options |
-|---------|-------------|-------------|
-| `fetch` | Fetch repositories from GitHub with custom keywords | `--max-repos`, `--min-stars`, `--languages`, `--keywords` |
-| `extract` | Extract commit history from repositories | `--max-commits` per repository |
-| `analyze` | Analyze commits for green patterns | `--enable-nlp`, `--enable-ml-features`, `--enable-diff-analysis` |
-| `aggregate` | Aggregate analysis results | `--enable-temporal`, `--temporal-granularity`, `--enable-enhanced-stats` |
-| `report` | Generate comprehensive report | Creates Markdown and CSV outputs |
-| `pipeline` | Run complete analysis pipeline | `--max-repos`, `--max-commits` (all-in-one) |
-| `status` | Show current analysis status | Displays progress and file statistics |
-
-### Command Details
-
-#### Fetch Repositories
-```bash
-# Fetch with custom search keywords
-greenmining fetch --max-repos 100 --min-stars 50 --languages Python --keywords "kubernetes docker"
-
-# Fetch microservices (default)
-greenmining fetch --max-repos 100 --min-stars 50 --languages Python
-```
-Options:
-- `--max-repos`: Maximum repositories to fetch (default: 100)
-- `--min-stars`: Minimum GitHub stars (default: 100)
-- `--languages`: Filter by programming languages (default: "Python,Java,Go,JavaScript,TypeScript")
-- `--keywords`: Custom search keywords (default: "microservices")
-
-#### Extract Commits
-```bash
-greenmining extract --max-commits 50
-```
-Options:
-- `--max-commits`: Maximum commits per repository (default: 50)
-
-#### Analyze Commits (with Advanced Features)
-```bash
-# Basic analysis
-greenmining analyze
-
-# Advanced analysis with all features
-greenmining analyze --enable-nlp --enable-ml-features --enable-diff-analysis --batch-size 20
-```
-Options:
-- `--batch-size`: Batch size for processing (default: 10)
-- `--enable-diff-analysis`: Enable code diff analysis (slower but more accurate)
-- `--enable-nlp`: Enable NLP-enhanced pattern detection with morphological variants and synonyms
-- `--enable-ml-features`: Enable ML feature extraction for model training
-
-#### Aggregate Results (with Temporal Analysis)
-```bash
-# Basic aggregation
-greenmining aggregate
-
-# Advanced aggregation with temporal trends
-greenmining aggregate --enable-temporal --temporal-granularity quarter --enable-enhanced-stats
-```
-Options:
-- `--enable-enhanced-stats`: Enable enhanced statistical analysis (correlations, effect sizes)
-- `--enable-temporal`: Enable temporal trend analysis
-- `--temporal-granularity`: Time period granularity (choices: day, week, month, quarter, year)
-
-#### Run Pipeline
-```bash
-greenmining pipeline --max-repos 50 --max-commits 100
-```
-Options:
-- `--max-repos`: Repositories to analyze
-- `--max-commits`: Commits per repository
-- Executes: fetch → extract → analyze → aggregate → report
-
 ## Output Files

 All outputs are saved to the `data/` directory:
@@ -793,6 +688,7 @@ ruff check greenmining/ tests/
 - PyDriller >= 2.5
 - pandas >= 2.2.0
 - click >= 8.1.7
+- codecarbon >= 2.0.0 (optional, for cross-platform energy measurement)

 ## License
