greenmining 1.0.3__tar.gz → 1.0.4__tar.gz
- {greenmining-1.0.3/greenmining.egg-info → greenmining-1.0.4}/PKG-INFO +61 -151
- {greenmining-1.0.3 → greenmining-1.0.4}/README.md +58 -147
- greenmining-1.0.4/greenmining/__init__.py +43 -0
- greenmining-1.0.4/greenmining/__main__.py +12 -0
- greenmining-1.0.4/greenmining/__version__.py +3 -0
- greenmining-1.0.4/greenmining/analyzers/__init__.py +13 -0
- {greenmining-1.0.3 → greenmining-1.0.4}/greenmining/analyzers/code_diff_analyzer.py +151 -61
- {greenmining-1.0.3 → greenmining-1.0.4}/greenmining/analyzers/qualitative_analyzer.py +15 -81
- {greenmining-1.0.3 → greenmining-1.0.4}/greenmining/analyzers/statistical_analyzer.py +8 -69
- {greenmining-1.0.3 → greenmining-1.0.4}/greenmining/analyzers/temporal_analyzer.py +16 -72
- greenmining-1.0.4/greenmining/config.py +188 -0
- greenmining-1.0.4/greenmining/controllers/__init__.py +7 -0
- greenmining-1.0.4/greenmining/controllers/repository_controller.py +231 -0
- greenmining-1.0.4/greenmining/energy/__init__.py +13 -0
- greenmining-1.0.4/greenmining/energy/base.py +165 -0
- greenmining-1.0.4/greenmining/energy/codecarbon_meter.py +146 -0
- greenmining-1.0.4/greenmining/energy/rapl.py +157 -0
- {greenmining-1.0.3 → greenmining-1.0.4}/greenmining/gsf_patterns.py +4 -26
- {greenmining-1.0.3 → greenmining-1.0.4}/greenmining/models/__init__.py +1 -5
- {greenmining-1.0.3 → greenmining-1.0.4}/greenmining/models/aggregated_stats.py +4 -4
- {greenmining-1.0.3 → greenmining-1.0.4}/greenmining/models/analysis_result.py +4 -4
- {greenmining-1.0.3 → greenmining-1.0.4}/greenmining/models/commit.py +5 -5
- {greenmining-1.0.3 → greenmining-1.0.4}/greenmining/models/repository.py +5 -5
- greenmining-1.0.4/greenmining/presenters/__init__.py +7 -0
- {greenmining-1.0.3 → greenmining-1.0.4}/greenmining/presenters/console_presenter.py +24 -24
- greenmining-1.0.4/greenmining/services/__init__.py +17 -0
- {greenmining-1.0.3 → greenmining-1.0.4}/greenmining/services/commit_extractor.py +8 -152
- {greenmining-1.0.3 → greenmining-1.0.4}/greenmining/services/data_aggregator.py +45 -175
- {greenmining-1.0.3 → greenmining-1.0.4}/greenmining/services/data_analyzer.py +9 -202
- greenmining-1.0.4/greenmining/services/github_fetcher.py +212 -0
- greenmining-1.0.4/greenmining/services/github_graphql_fetcher.py +371 -0
- greenmining-1.0.4/greenmining/services/local_repo_analyzer.py +387 -0
- {greenmining-1.0.3 → greenmining-1.0.4}/greenmining/services/reports.py +33 -137
- {greenmining-1.0.3 → greenmining-1.0.4}/greenmining/utils.py +21 -149
- {greenmining-1.0.3 → greenmining-1.0.4/greenmining.egg-info}/PKG-INFO +61 -151
- {greenmining-1.0.3 → greenmining-1.0.4}/greenmining.egg-info/SOURCES.txt +12 -10
- {greenmining-1.0.3 → greenmining-1.0.4}/greenmining.egg-info/requires.txt +2 -2
- {greenmining-1.0.3 → greenmining-1.0.4}/pyproject.toml +4 -8
- greenmining-1.0.3/greenmining/__init__.py +0 -61
- greenmining-1.0.3/greenmining/__main__.py +0 -6
- greenmining-1.0.3/greenmining/__version__.py +0 -3
- greenmining-1.0.3/greenmining/analyzers/__init__.py +0 -17
- greenmining-1.0.3/greenmining/analyzers/ml_feature_extractor.py +0 -512
- greenmining-1.0.3/greenmining/analyzers/nlp_analyzer.py +0 -365
- greenmining-1.0.3/greenmining/cli.py +0 -471
- greenmining-1.0.3/greenmining/config.py +0 -141
- greenmining-1.0.3/greenmining/controllers/__init__.py +0 -11
- greenmining-1.0.3/greenmining/controllers/repository_controller.py +0 -172
- greenmining-1.0.3/greenmining/main.py +0 -37
- greenmining-1.0.3/greenmining/presenters/__init__.py +0 -11
- greenmining-1.0.3/greenmining/services/__init__.py +0 -13
- greenmining-1.0.3/greenmining/services/github_fetcher.py +0 -323
- greenmining-1.0.3/greenmining.egg-info/entry_points.txt +0 -2
- greenmining-1.0.3/pytest.ini +0 -22
- {greenmining-1.0.3 → greenmining-1.0.4}/CHANGELOG.md +0 -0
- {greenmining-1.0.3 → greenmining-1.0.4}/LICENSE +0 -0
- {greenmining-1.0.3 → greenmining-1.0.4}/MANIFEST.in +0 -0
- {greenmining-1.0.3 → greenmining-1.0.4}/greenmining.egg-info/dependency_links.txt +0 -0
- {greenmining-1.0.3 → greenmining-1.0.4}/greenmining.egg-info/top_level.txt +0 -0
- {greenmining-1.0.3 → greenmining-1.0.4}/setup.cfg +0 -0
- {greenmining-1.0.3 → greenmining-1.0.4}/setup.py +0 -0
@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: greenmining
-Version: 1.0.
+Version: 1.0.4
 Summary: Analyze GitHub repositories to identify green software engineering patterns and energy-efficient practices
 Author-email: Adam Bouafia <a.bouafia@student.vu.nl>
 License: MIT
@@ -23,20 +23,19 @@ Classifier: Programming Language :: Python :: 3.11
 Classifier: Programming Language :: Python :: 3.12
 Classifier: Programming Language :: Python :: 3.13
 Classifier: Operating System :: OS Independent
-Classifier: Environment :: Console
 Requires-Python: >=3.9
 Description-Content-Type: text/markdown
 License-File: LICENSE
 Requires-Dist: PyGithub>=2.1.1
 Requires-Dist: PyDriller>=2.5
 Requires-Dist: pandas>=2.2.0
-Requires-Dist: click>=8.1.7
 Requires-Dist: colorama>=0.4.6
 Requires-Dist: tabulate>=0.9.0
 Requires-Dist: tqdm>=4.66.0
 Requires-Dist: matplotlib>=3.8.0
 Requires-Dist: plotly>=5.18.0
 Requires-Dist: python-dotenv>=1.0.0
+Requires-Dist: requests>=2.31.0
 Provides-Extra: dev
 Requires-Dist: pytest>=7.4.0; extra == "dev"
 Requires-Dist: pytest-cov>=4.1.0; extra == "dev"
@@ -44,7 +43,7 @@ Requires-Dist: pytest-mock>=3.12.0; extra == "dev"
 Requires-Dist: black>=23.12.0; extra == "dev"
 Requires-Dist: ruff>=0.1.9; extra == "dev"
 Requires-Dist: mypy>=1.8.0; extra == "dev"
-Requires-Dist: build>=1.0.
+Requires-Dist: build>=1.0.4; extra == "dev"
 Requires-Dist: twine>=4.0.2; extra == "dev"
 Provides-Extra: docs
 Requires-Dist: sphinx>=7.2.0; extra == "docs"
@@ -88,37 +87,6 @@ docker pull adambouafia/greenmining:latest
 
 ## Quick Start
 
-### CLI Usage
-
-```bash
-# Set your GitHub token
-export GITHUB_TOKEN="your_github_token"
-
-# Run full analysis pipeline
-greenmining pipeline --max-repos 100
-
-# Fetch repositories with custom keywords
-greenmining fetch --max-repos 100 --min-stars 100 --keywords "kubernetes docker cloud-native"
-
-# Fetch with default (microservices)
-greenmining fetch --max-repos 100 --min-stars 100
-
-# Extract commits
-greenmining extract --max-commits 50
-
-# Analyze for green patterns
-greenmining analyze
-
-# Analyze with advanced features
-greenmining analyze --enable-nlp --enable-ml-features --enable-diff-analysis
-
-# Aggregate results with temporal analysis
-greenmining aggregate --enable-temporal --temporal-granularity quarter --enable-enhanced-stats
-
-# Generate report
-greenmining report
-```
-
 ### Python API
 
 #### Basic Pattern Detection
@@ -197,23 +165,10 @@ extractor = CommitExtractor(
 # Initialize analyzer with advanced features
 analyzer = DataAnalyzer(
     enable_diff_analysis=False, # Enable code diff analysis (slower but more accurate)
-    enable_nlp=True, # Enable NLP-enhanced pattern detection
-    enable_ml_features=True, # Enable ML feature extraction
     patterns=None, # Custom pattern dict (default: GSF_PATTERNS)
     batch_size=10 # Batch processing size (default: 10)
 )
 
-# Optional: Configure NLP analyzer separately
-nlp_analyzer = NLPAnalyzer(
-    enable_stemming=True, # Enable morphological analysis (optimize→optimizing)
-    enable_synonyms=True # Enable semantic synonym matching (cache→buffer)
-)
-
-# Optional: Configure ML feature extractor
-ml_extractor = MLFeatureExtractor(
-    green_keywords=None # Custom keyword list (default: built-in 19 keywords)
-)
-
 # Extract commits from first repo
 commits = extractor.extract_commits(
     repository=repos[0], # PyGithub Repository object
@@ -229,18 +184,9 @@ commits = extractor.extract_commits(
 
 **DataAnalyzer Parameters:**
 - `enable_diff_analysis` (bool, default=False): Enable code diff analysis (slower)
-- `enable_nlp` (bool, default=False): Enable NLP-enhanced pattern detection
-- `enable_ml_features` (bool, default=False): Enable ML feature extraction
 - `patterns` (dict, optional): Custom pattern dictionary (default: GSF_PATTERNS)
 - `batch_size` (int, default=10): Number of commits to process in each batch
 
-**NLPAnalyzer Parameters:**
-- `enable_stemming` (bool, default=True): Enable morphological variant matching
-- `enable_synonyms` (bool, default=True): Enable semantic synonym expansion
-
-**MLFeatureExtractor Parameters:**
-- `green_keywords` (list[str], optional): Custom green keywords list
-
 # Analyze commits for green patterns
 results = []
 for commit in commits:
@@ -306,7 +252,7 @@ from greenmining.analyzers.qualitative_analyzer import QualitativeAnalyzer
 # Initialize aggregator with all advanced features
 aggregator = DataAggregator(
     config=None, # Config object (optional)
-
+    enable_stats=True, # Enable statistical analysis (correlations, trends)
     enable_temporal=True, # Enable temporal trend analysis
     temporal_granularity="quarter" # Time granularity: day/week/month/quarter/year
 )
@@ -330,7 +276,7 @@ aggregated = aggregator.aggregate(
 
 **DataAggregator Parameters:**
 - `config` (Config, optional): Configuration object
-- `
+- `enable_stats` (bool, default=False): Enable pattern correlations and effect size analysis
 - `enable_temporal` (bool, default=False): Enable temporal trend analysis over time
 - `temporal_granularity` (str, default="quarter"): Time granularity (day/week/month/quarter/year)
 
@@ -452,8 +398,6 @@ extractor.save_results(
 # STAGE 3: Analyze Commits
 print("\nAnalyzing commits...")
 analyzer = DataAnalyzer(
-    enable_nlp=True,
-    enable_ml_features=True,
     enable_diff_analysis=False, # Set to True for detailed code analysis (slower)
 )
 analyzed_commits = analyzer.analyze_commits(all_commits)
@@ -470,7 +414,7 @@ analyzer.save_results(analyzed_commits, output_dir / "analyzed.json")
 # STAGE 4: Aggregate Results
 print("\nAggregating results...")
 aggregator = DataAggregator(
-
+    enable_stats=True,
     enable_temporal=True,
     temporal_granularity="quarter",
 )
@@ -497,8 +441,8 @@ print(f"\n📁 Results saved in: {output_dir.absolute()}")
 
 1. **Fetches repositories** from GitHub based on keywords and filters
 2. **Extracts commits** from each repository (up to 1000 per repo)
-3. **Analyzes commits** for green software patterns
-4. **Aggregates results** with temporal analysis and
+3. **Analyzes commits** for green software patterns
+4. **Aggregates results** with temporal analysis and statistics
 5. **Saves results** to JSON and CSV files for further analysis
 
 **Expected output files:**
@@ -513,17 +457,13 @@ print(f"\n📁 Results saved in: {output_dir.absolute()}")
 ### Docker Usage
 
 ```bash
-#
-docker run -v $(pwd)/data:/app/data \
-  adambouafia/greenmining:latest
-
-# With custom configuration
-docker run -v $(pwd)/.env:/app/.env:ro \
-  -v $(pwd)/data:/app/data \
-  adambouafia/greenmining:latest pipeline --max-repos 50
+# Interactive shell with Python
+docker run -it -v $(pwd)/data:/app/data \
+  adambouafia/greenmining:latest python
 
-#
-docker run -
+# Run Python script
+docker run -v $(pwd)/data:/app/data \
+  adambouafia/greenmining:latest python your_script.py
 ```
 
 ## Configuration
@@ -549,14 +489,12 @@ EXCLUDE_BOT_COMMITS=true
 
 # Optional - Analysis Features
 ENABLE_DIFF_ANALYSIS=false
-ENABLE_NLP=true
-ENABLE_ML_FEATURES=true
 BATCH_SIZE=10
 
 # Optional - Temporal Analysis
 ENABLE_TEMPORAL=true
 TEMPORAL_GRANULARITY=quarter
-
+ENABLE_STATS=true
 
 # Optional - Output
 OUTPUT_DIR=./data
@@ -586,14 +524,12 @@ config = Config(
 
     # Analysis Options
     enable_diff_analysis=False, # Enable code diff analysis
-    enable_nlp=True, # Enable NLP features
-    enable_ml_features=True, # Enable ML feature extraction
     batch_size=10, # Batch processing size
 
     # Temporal Analysis
     enable_temporal=True, # Enable temporal trend analysis
     temporal_granularity="quarter", # day/week/month/quarter/year
-
+    enable_stats=True, # Enable statistical analysis
 
     # Output Configuration
     output_dir="./data", # Output directory path
@@ -619,6 +555,50 @@ config = Config(
 - **Docker Support**: Pre-built images for containerized analysis
 - **Programmatic API**: Full Python API for custom workflows and integrations
 - **Clean Architecture**: Modular design with services layer (Fetcher, Extractor, Analyzer, Aggregator, Reports)
+- **Energy Measurement**: Real-time energy consumption tracking via RAPL (Linux) or CodeCarbon (cross-platform)
+
+### Energy Measurement
+
+greenmining includes built-in energy measurement capabilities for tracking the carbon footprint of your analysis:
+
+#### Backend Options
+
+| Backend | Platform | Metrics | Requirements |
+|---------|----------|---------|--------------|
+| **RAPL** | Linux (Intel/AMD) | CPU/RAM energy (Joules) | `/sys/class/powercap/` access |
+| **CodeCarbon** | Cross-platform | Energy + Carbon emissions (gCO2) | `pip install codecarbon` |
+
+#### Python API
+
+```python
+from greenmining.energy import RAPLEnergyMeter, CodeCarbonMeter
+
+# RAPL (Linux only)
+rapl = RAPLEnergyMeter()
+if rapl.is_available():
+    rapl.start()
+    # ... run analysis ...
+    result = rapl.stop()
+    print(f"Energy: {result.energy_joules:.2f} J")
+
+# CodeCarbon (cross-platform)
+cc = CodeCarbonMeter()
+if cc.is_available():
+    cc.start()
+    # ... run analysis ...
+    result = cc.stop()
+    print(f"Energy: {result.energy_joules:.2f} J")
+    print(f"Carbon: {result.carbon_grams:.4f} gCO2")
+```
+
+#### Experiment Results
+
+CodeCarbon was verified with a real experiment:
+- **Repository**: flask (pallets/flask)
+- **Commits analyzed**: 10
+- **Energy measured**: 160.6 J
+- **Carbon emissions**: 0.0119 gCO2
+- **Duration**: 11.28 seconds
 
 ### Pattern Database
 
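The experiment figures added in the hunk above can be cross-checked with a little arithmetic. The sketch below is not part of the package; it only divides the quoted numbers (160.6 J, 0.0119 gCO2, 11.28 s, 10 commits) to get the average power draw and the grid carbon intensity they imply. The variable and function names are illustrative, and the conversion 1 kWh = 3.6 MJ is the only outside fact used.

```python
# Figures quoted in the README's "Experiment Results" list.
ENERGY_JOULES = 160.6
CARBON_GRAMS = 0.0119
DURATION_SECONDS = 11.28
COMMITS_ANALYZED = 10

JOULES_PER_KWH = 3.6e6  # 1 kWh = 3.6 MJ


def average_power_watts(energy_joules: float, duration_seconds: float) -> float:
    """Mean power over the run: energy divided by elapsed time."""
    return energy_joules / duration_seconds


def implied_carbon_intensity(carbon_grams: float, energy_joules: float) -> float:
    """Grid intensity in gCO2 per kWh implied by an energy/emissions pair."""
    energy_kwh = energy_joules / JOULES_PER_KWH
    return carbon_grams / energy_kwh


power = average_power_watts(ENERGY_JOULES, DURATION_SECONDS)
intensity = implied_carbon_intensity(CARBON_GRAMS, ENERGY_JOULES)
per_commit = ENERGY_JOULES / COMMITS_ANALYZED

print(f"Average power: {power:.2f} W")
print(f"Implied carbon intensity: {intensity:.1f} gCO2/kWh")
print(f"Energy per commit: {per_commit:.2f} J")
```

Under these inputs the run averaged roughly 14.2 W, about 16 J per analyzed commit, and the emissions figure corresponds to a grid intensity of about 267 gCO2/kWh, so the three quoted numbers are mutually consistent.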
@@ -683,77 +663,6 @@ Alpine containers, Infrastructure as Code, renewable energy regions, container o
 ### 15. General (8 patterns)
 Feature flags, incremental processing, precomputation, background jobs, workflow optimization
 
-## CLI Commands
-
-| Command | Description | Key Options |
-|---------|-------------|-------------|
-| `fetch` | Fetch repositories from GitHub with custom keywords | `--max-repos`, `--min-stars`, `--languages`, `--keywords` |
-| `extract` | Extract commit history from repositories | `--max-commits` per repository |
-| `analyze` | Analyze commits for green patterns | `--enable-nlp`, `--enable-ml-features`, `--enable-diff-analysis` |
-| `aggregate` | Aggregate analysis results | `--enable-temporal`, `--temporal-granularity`, `--enable-enhanced-stats` |
-| `report` | Generate comprehensive report | Creates Markdown and CSV outputs |
-| `pipeline` | Run complete analysis pipeline | `--max-repos`, `--max-commits` (all-in-one) |
-| `status` | Show current analysis status | Displays progress and file statistics |
-
-### Command Details
-
-#### Fetch Repositories
-```bash
-# Fetch with custom search keywords
-greenmining fetch --max-repos 100 --min-stars 50 --languages Python --keywords "kubernetes docker"
-
-# Fetch microservices (default)
-greenmining fetch --max-repos 100 --min-stars 50 --languages Python
-```
-Options:
-- `--max-repos`: Maximum repositories to fetch (default: 100)
-- `--min-stars`: Minimum GitHub stars (default: 100)
-- `--languages`: Filter by programming languages (default: "Python,Java,Go,JavaScript,TypeScript")
-- `--keywords`: Custom search keywords (default: "microservices")
-
-#### Extract Commits
-```bash
-greenmining extract --max-commits 50
-```
-Options:
-- `--max-commits`: Maximum commits per repository (default: 50)
-
-#### Analyze Commits (with Advanced Features)
-```bash
-# Basic analysis
-greenmining analyze
-
-# Advanced analysis with all features
-greenmining analyze --enable-nlp --enable-ml-features --enable-diff-analysis --batch-size 20
-```
-Options:
-- `--batch-size`: Batch size for processing (default: 10)
-- `--enable-diff-analysis`: Enable code diff analysis (slower but more accurate)
-- `--enable-nlp`: Enable NLP-enhanced pattern detection with morphological variants and synonyms
-- `--enable-ml-features`: Enable ML feature extraction for model training
-
-#### Aggregate Results (with Temporal Analysis)
-```bash
-# Basic aggregation
-greenmining aggregate
-
-# Advanced aggregation with temporal trends
-greenmining aggregate --enable-temporal --temporal-granularity quarter --enable-enhanced-stats
-```
-Options:
-- `--enable-enhanced-stats`: Enable enhanced statistical analysis (correlations, effect sizes)
-- `--enable-temporal`: Enable temporal trend analysis
-- `--temporal-granularity`: Time period granularity (choices: day, week, month, quarter, year)
-
-#### Run Pipeline
-```bash
-greenmining pipeline --max-repos 50 --max-commits 100
-```
-Options:
-- `--max-repos`: Repositories to analyze
-- `--max-commits`: Commits per repository
-- Executes: fetch → extract → analyze → aggregate → report
-
 ## Output Files
 
 All outputs are saved to the `data/` directory:
@@ -793,6 +702,7 @@ ruff check greenmining/ tests/
 - PyDriller >= 2.5
 - pandas >= 2.2.0
 - click >= 8.1.7
+- codecarbon >= 2.0.0 (optional, for cross-platform energy measurement)
 
 ## License
 
@@ -34,37 +34,6 @@ docker pull adambouafia/greenmining:latest
 
 ## Quick Start
 
-### CLI Usage
-
-```bash
-# Set your GitHub token
-export GITHUB_TOKEN="your_github_token"
-
-# Run full analysis pipeline
-greenmining pipeline --max-repos 100
-
-# Fetch repositories with custom keywords
-greenmining fetch --max-repos 100 --min-stars 100 --keywords "kubernetes docker cloud-native"
-
-# Fetch with default (microservices)
-greenmining fetch --max-repos 100 --min-stars 100
-
-# Extract commits
-greenmining extract --max-commits 50
-
-# Analyze for green patterns
-greenmining analyze
-
-# Analyze with advanced features
-greenmining analyze --enable-nlp --enable-ml-features --enable-diff-analysis
-
-# Aggregate results with temporal analysis
-greenmining aggregate --enable-temporal --temporal-granularity quarter --enable-enhanced-stats
-
-# Generate report
-greenmining report
-```
-
 ### Python API
 
 #### Basic Pattern Detection
@@ -143,23 +112,10 @@ extractor = CommitExtractor(
 # Initialize analyzer with advanced features
 analyzer = DataAnalyzer(
     enable_diff_analysis=False, # Enable code diff analysis (slower but more accurate)
-    enable_nlp=True, # Enable NLP-enhanced pattern detection
-    enable_ml_features=True, # Enable ML feature extraction
     patterns=None, # Custom pattern dict (default: GSF_PATTERNS)
     batch_size=10 # Batch processing size (default: 10)
 )
 
-# Optional: Configure NLP analyzer separately
-nlp_analyzer = NLPAnalyzer(
-    enable_stemming=True, # Enable morphological analysis (optimize→optimizing)
-    enable_synonyms=True # Enable semantic synonym matching (cache→buffer)
-)
-
-# Optional: Configure ML feature extractor
-ml_extractor = MLFeatureExtractor(
-    green_keywords=None # Custom keyword list (default: built-in 19 keywords)
-)
-
 # Extract commits from first repo
 commits = extractor.extract_commits(
     repository=repos[0], # PyGithub Repository object
@@ -175,18 +131,9 @@ commits = extractor.extract_commits(
 
 **DataAnalyzer Parameters:**
 - `enable_diff_analysis` (bool, default=False): Enable code diff analysis (slower)
-- `enable_nlp` (bool, default=False): Enable NLP-enhanced pattern detection
-- `enable_ml_features` (bool, default=False): Enable ML feature extraction
 - `patterns` (dict, optional): Custom pattern dictionary (default: GSF_PATTERNS)
 - `batch_size` (int, default=10): Number of commits to process in each batch
 
-**NLPAnalyzer Parameters:**
-- `enable_stemming` (bool, default=True): Enable morphological variant matching
-- `enable_synonyms` (bool, default=True): Enable semantic synonym expansion
-
-**MLFeatureExtractor Parameters:**
-- `green_keywords` (list[str], optional): Custom green keywords list
-
 # Analyze commits for green patterns
 results = []
 for commit in commits:
@@ -252,7 +199,7 @@ from greenmining.analyzers.qualitative_analyzer import QualitativeAnalyzer
 # Initialize aggregator with all advanced features
 aggregator = DataAggregator(
     config=None, # Config object (optional)
-
+    enable_stats=True, # Enable statistical analysis (correlations, trends)
     enable_temporal=True, # Enable temporal trend analysis
     temporal_granularity="quarter" # Time granularity: day/week/month/quarter/year
 )
@@ -276,7 +223,7 @@ aggregated = aggregator.aggregate(
 
 **DataAggregator Parameters:**
 - `config` (Config, optional): Configuration object
-- `
+- `enable_stats` (bool, default=False): Enable pattern correlations and effect size analysis
 - `enable_temporal` (bool, default=False): Enable temporal trend analysis over time
 - `temporal_granularity` (str, default="quarter"): Time granularity (day/week/month/quarter/year)
 
@@ -398,8 +345,6 @@ extractor.save_results(
 # STAGE 3: Analyze Commits
 print("\nAnalyzing commits...")
 analyzer = DataAnalyzer(
-    enable_nlp=True,
-    enable_ml_features=True,
     enable_diff_analysis=False, # Set to True for detailed code analysis (slower)
 )
 analyzed_commits = analyzer.analyze_commits(all_commits)
@@ -416,7 +361,7 @@ analyzer.save_results(analyzed_commits, output_dir / "analyzed.json")
 # STAGE 4: Aggregate Results
 print("\nAggregating results...")
 aggregator = DataAggregator(
-
+    enable_stats=True,
     enable_temporal=True,
     temporal_granularity="quarter",
 )
@@ -443,8 +388,8 @@ print(f"\n📁 Results saved in: {output_dir.absolute()}")
 
 1. **Fetches repositories** from GitHub based on keywords and filters
 2. **Extracts commits** from each repository (up to 1000 per repo)
-3. **Analyzes commits** for green software patterns
-4. **Aggregates results** with temporal analysis and
+3. **Analyzes commits** for green software patterns
+4. **Aggregates results** with temporal analysis and statistics
 5. **Saves results** to JSON and CSV files for further analysis
 
 **Expected output files:**
@@ -459,17 +404,13 @@ print(f"\n📁 Results saved in: {output_dir.absolute()}")
 ### Docker Usage
 
 ```bash
-#
-docker run -v $(pwd)/data:/app/data \
-  adambouafia/greenmining:latest
-
-# With custom configuration
-docker run -v $(pwd)/.env:/app/.env:ro \
-  -v $(pwd)/data:/app/data \
-  adambouafia/greenmining:latest pipeline --max-repos 50
+# Interactive shell with Python
+docker run -it -v $(pwd)/data:/app/data \
+  adambouafia/greenmining:latest python
 
-#
-docker run -
+# Run Python script
+docker run -v $(pwd)/data:/app/data \
+  adambouafia/greenmining:latest python your_script.py
 ```
 
 ## Configuration
@@ -495,14 +436,12 @@ EXCLUDE_BOT_COMMITS=true
 
 # Optional - Analysis Features
 ENABLE_DIFF_ANALYSIS=false
-ENABLE_NLP=true
-ENABLE_ML_FEATURES=true
 BATCH_SIZE=10
 
 # Optional - Temporal Analysis
 ENABLE_TEMPORAL=true
 TEMPORAL_GRANULARITY=quarter
-
+ENABLE_STATS=true
 
 # Optional - Output
 OUTPUT_DIR=./data
@@ -532,14 +471,12 @@ config = Config(
 
     # Analysis Options
     enable_diff_analysis=False, # Enable code diff analysis
-    enable_nlp=True, # Enable NLP features
-    enable_ml_features=True, # Enable ML feature extraction
     batch_size=10, # Batch processing size
 
     # Temporal Analysis
     enable_temporal=True, # Enable temporal trend analysis
     temporal_granularity="quarter", # day/week/month/quarter/year
-
+    enable_stats=True, # Enable statistical analysis
 
     # Output Configuration
     output_dir="./data", # Output directory path
@@ -565,6 +502,50 @@ config = Config(
 - **Docker Support**: Pre-built images for containerized analysis
 - **Programmatic API**: Full Python API for custom workflows and integrations
 - **Clean Architecture**: Modular design with services layer (Fetcher, Extractor, Analyzer, Aggregator, Reports)
+- **Energy Measurement**: Real-time energy consumption tracking via RAPL (Linux) or CodeCarbon (cross-platform)
+
+### Energy Measurement
+
+greenmining includes built-in energy measurement capabilities for tracking the carbon footprint of your analysis:
+
+#### Backend Options
+
+| Backend | Platform | Metrics | Requirements |
+|---------|----------|---------|--------------|
+| **RAPL** | Linux (Intel/AMD) | CPU/RAM energy (Joules) | `/sys/class/powercap/` access |
+| **CodeCarbon** | Cross-platform | Energy + Carbon emissions (gCO2) | `pip install codecarbon` |
+
+#### Python API
+
+```python
+from greenmining.energy import RAPLEnergyMeter, CodeCarbonMeter
+
+# RAPL (Linux only)
+rapl = RAPLEnergyMeter()
+if rapl.is_available():
+    rapl.start()
+    # ... run analysis ...
+    result = rapl.stop()
+    print(f"Energy: {result.energy_joules:.2f} J")
+
+# CodeCarbon (cross-platform)
+cc = CodeCarbonMeter()
+if cc.is_available():
+    cc.start()
+    # ... run analysis ...
+    result = cc.stop()
+    print(f"Energy: {result.energy_joules:.2f} J")
+    print(f"Carbon: {result.carbon_grams:.4f} gCO2")
+```
+
+#### Experiment Results
+
+CodeCarbon was verified with a real experiment:
+- **Repository**: flask (pallets/flask)
+- **Commits analyzed**: 10
+- **Energy measured**: 160.6 J
+- **Carbon emissions**: 0.0119 gCO2
+- **Duration**: 11.28 seconds
 
 ### Pattern Database
 
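Both snippets in the README hunk above follow the same `is_available()` / `start()` / `stop()` sequence, so an exception raised between `start()` and `stop()` would leave the meter running. One way to guard against that is a small context manager, sketched below. It assumes only the three methods shown in the README; `measured` and `FakeMeter` are hypothetical names introduced here and are not part of greenmining. The fake meter exists so the sketch runs without the package installed.

```python
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def measured(meter):
    """Run a block under `meter`, calling stop() even if the block raises.

    Yields a dict whose "result" key is filled in after the block ends;
    it stays None when the backend reports itself unavailable.
    """
    holder = {"result": None}
    if not meter.is_available():
        yield holder  # backend missing: run the block unmeasured
        return
    meter.start()
    try:
        yield holder
    finally:
        holder["result"] = meter.stop()


class FakeMeter:
    """Hypothetical stand-in exposing the same three methods as the README."""

    def __init__(self, available=True):
        self.available = available
        self.running = False

    def is_available(self):
        return self.available

    def start(self):
        self.running = True

    def stop(self):
        self.running = False
        return SimpleNamespace(energy_joules=1.5)


meter = FakeMeter()
with measured(meter) as outcome:
    total = sum(range(1000))  # placeholder for the analysis work

if outcome["result"] is not None:
    print(f"Energy: {outcome['result'].energy_joules:.2f} J")
```

With the real package, passing `RAPLEnergyMeter()` or `CodeCarbonMeter()` in place of `FakeMeter()` should work as long as those classes behave as the README describes; that has not been checked against the 1.0.4 source.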
@@ -629,77 +610,6 @@ Alpine containers, Infrastructure as Code, renewable energy regions, container o
 ### 15. General (8 patterns)
 Feature flags, incremental processing, precomputation, background jobs, workflow optimization
 
-## CLI Commands
-
-| Command | Description | Key Options |
-|---------|-------------|-------------|
-| `fetch` | Fetch repositories from GitHub with custom keywords | `--max-repos`, `--min-stars`, `--languages`, `--keywords` |
-| `extract` | Extract commit history from repositories | `--max-commits` per repository |
-| `analyze` | Analyze commits for green patterns | `--enable-nlp`, `--enable-ml-features`, `--enable-diff-analysis` |
-| `aggregate` | Aggregate analysis results | `--enable-temporal`, `--temporal-granularity`, `--enable-enhanced-stats` |
-| `report` | Generate comprehensive report | Creates Markdown and CSV outputs |
-| `pipeline` | Run complete analysis pipeline | `--max-repos`, `--max-commits` (all-in-one) |
-| `status` | Show current analysis status | Displays progress and file statistics |
-
-### Command Details
-
-#### Fetch Repositories
-```bash
-# Fetch with custom search keywords
-greenmining fetch --max-repos 100 --min-stars 50 --languages Python --keywords "kubernetes docker"
-
-# Fetch microservices (default)
-greenmining fetch --max-repos 100 --min-stars 50 --languages Python
-```
-Options:
-- `--max-repos`: Maximum repositories to fetch (default: 100)
-- `--min-stars`: Minimum GitHub stars (default: 100)
-- `--languages`: Filter by programming languages (default: "Python,Java,Go,JavaScript,TypeScript")
-- `--keywords`: Custom search keywords (default: "microservices")
-
-#### Extract Commits
-```bash
-greenmining extract --max-commits 50
-```
-Options:
-- `--max-commits`: Maximum commits per repository (default: 50)
-
-#### Analyze Commits (with Advanced Features)
-```bash
|
-
# Basic analysis
|
|
670
|
-
greenmining analyze
|
|
671
|
-
|
|
672
|
-
# Advanced analysis with all features
|
|
673
|
-
greenmining analyze --enable-nlp --enable-ml-features --enable-diff-analysis --batch-size 20
|
|
674
|
-
```
|
|
675
|
-
Options:
|
|
676
|
-
- `--batch-size`: Batch size for processing (default: 10)
|
|
677
|
-
- `--enable-diff-analysis`: Enable code diff analysis (slower but more accurate)
|
|
678
|
-
- `--enable-nlp`: Enable NLP-enhanced pattern detection with morphological variants and synonyms
|
|
679
|
-
- `--enable-ml-features`: Enable ML feature extraction for model training
|
|
680
|
-
|
|
681
|
-
#### Aggregate Results (with Temporal Analysis)
|
|
682
|
-
```bash
|
|
683
|
-
# Basic aggregation
|
|
684
|
-
greenmining aggregate
|
|
685
|
-
|
|
686
|
-
# Advanced aggregation with temporal trends
|
|
687
|
-
greenmining aggregate --enable-temporal --temporal-granularity quarter --enable-enhanced-stats
|
|
688
|
-
```
|
|
689
|
-
Options:
|
|
690
|
-
- `--enable-enhanced-stats`: Enable enhanced statistical analysis (correlations, effect sizes)
|
|
691
|
-
- `--enable-temporal`: Enable temporal trend analysis
|
|
692
|
-
- `--temporal-granularity`: Time period granularity (choices: day, week, month, quarter, year)
|
|
693
|
-
|
|
694
|
-
#### Run Pipeline
|
|
695
|
-
```bash
|
|
696
|
-
greenmining pipeline --max-repos 50 --max-commits 100
|
|
697
|
-
```
|
|
698
|
-
Options:
|
|
699
|
-
- `--max-repos`: Repositories to analyze
|
|
700
|
-
- `--max-commits`: Commits per repository
|
|
701
|
-
- Executes: fetch → extract → analyze → aggregate → report
|
|
702
|
-
|
|
703
613
|
## Output Files
|
|
704
614
|
|
|
705
615
|
All outputs are saved to the `data/` directory:
|
|
@@ -739,6 +649,7 @@ ruff check greenmining/ tests/
|
|
|
739
649
|
- PyDriller >= 2.5
|
|
740
650
|
- pandas >= 2.2.0
|
|
741
651
|
- click >= 8.1.7
|
|
652
|
+
- codecarbon >= 2.0.0 (optional, for cross-platform energy measurement)
|
|
742
653
|
|
|
743
654
|
## License
|
|
744
655
|
|