greenmining 1.0.3__tar.gz → 1.0.4__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (61)
  1. {greenmining-1.0.3/greenmining.egg-info → greenmining-1.0.4}/PKG-INFO +61 -151
  2. {greenmining-1.0.3 → greenmining-1.0.4}/README.md +58 -147
  3. greenmining-1.0.4/greenmining/__init__.py +43 -0
  4. greenmining-1.0.4/greenmining/__main__.py +12 -0
  5. greenmining-1.0.4/greenmining/__version__.py +3 -0
  6. greenmining-1.0.4/greenmining/analyzers/__init__.py +13 -0
  7. {greenmining-1.0.3 → greenmining-1.0.4}/greenmining/analyzers/code_diff_analyzer.py +151 -61
  8. {greenmining-1.0.3 → greenmining-1.0.4}/greenmining/analyzers/qualitative_analyzer.py +15 -81
  9. {greenmining-1.0.3 → greenmining-1.0.4}/greenmining/analyzers/statistical_analyzer.py +8 -69
  10. {greenmining-1.0.3 → greenmining-1.0.4}/greenmining/analyzers/temporal_analyzer.py +16 -72
  11. greenmining-1.0.4/greenmining/config.py +188 -0
  12. greenmining-1.0.4/greenmining/controllers/__init__.py +7 -0
  13. greenmining-1.0.4/greenmining/controllers/repository_controller.py +231 -0
  14. greenmining-1.0.4/greenmining/energy/__init__.py +13 -0
  15. greenmining-1.0.4/greenmining/energy/base.py +165 -0
  16. greenmining-1.0.4/greenmining/energy/codecarbon_meter.py +146 -0
  17. greenmining-1.0.4/greenmining/energy/rapl.py +157 -0
  18. {greenmining-1.0.3 → greenmining-1.0.4}/greenmining/gsf_patterns.py +4 -26
  19. {greenmining-1.0.3 → greenmining-1.0.4}/greenmining/models/__init__.py +1 -5
  20. {greenmining-1.0.3 → greenmining-1.0.4}/greenmining/models/aggregated_stats.py +4 -4
  21. {greenmining-1.0.3 → greenmining-1.0.4}/greenmining/models/analysis_result.py +4 -4
  22. {greenmining-1.0.3 → greenmining-1.0.4}/greenmining/models/commit.py +5 -5
  23. {greenmining-1.0.3 → greenmining-1.0.4}/greenmining/models/repository.py +5 -5
  24. greenmining-1.0.4/greenmining/presenters/__init__.py +7 -0
  25. {greenmining-1.0.3 → greenmining-1.0.4}/greenmining/presenters/console_presenter.py +24 -24
  26. greenmining-1.0.4/greenmining/services/__init__.py +17 -0
  27. {greenmining-1.0.3 → greenmining-1.0.4}/greenmining/services/commit_extractor.py +8 -152
  28. {greenmining-1.0.3 → greenmining-1.0.4}/greenmining/services/data_aggregator.py +45 -175
  29. {greenmining-1.0.3 → greenmining-1.0.4}/greenmining/services/data_analyzer.py +9 -202
  30. greenmining-1.0.4/greenmining/services/github_fetcher.py +212 -0
  31. greenmining-1.0.4/greenmining/services/github_graphql_fetcher.py +371 -0
  32. greenmining-1.0.4/greenmining/services/local_repo_analyzer.py +387 -0
  33. {greenmining-1.0.3 → greenmining-1.0.4}/greenmining/services/reports.py +33 -137
  34. {greenmining-1.0.3 → greenmining-1.0.4}/greenmining/utils.py +21 -149
  35. {greenmining-1.0.3 → greenmining-1.0.4/greenmining.egg-info}/PKG-INFO +61 -151
  36. {greenmining-1.0.3 → greenmining-1.0.4}/greenmining.egg-info/SOURCES.txt +12 -10
  37. {greenmining-1.0.3 → greenmining-1.0.4}/greenmining.egg-info/requires.txt +2 -2
  38. {greenmining-1.0.3 → greenmining-1.0.4}/pyproject.toml +4 -8
  39. greenmining-1.0.3/greenmining/__init__.py +0 -61
  40. greenmining-1.0.3/greenmining/__main__.py +0 -6
  41. greenmining-1.0.3/greenmining/__version__.py +0 -3
  42. greenmining-1.0.3/greenmining/analyzers/__init__.py +0 -17
  43. greenmining-1.0.3/greenmining/analyzers/ml_feature_extractor.py +0 -512
  44. greenmining-1.0.3/greenmining/analyzers/nlp_analyzer.py +0 -365
  45. greenmining-1.0.3/greenmining/cli.py +0 -471
  46. greenmining-1.0.3/greenmining/config.py +0 -141
  47. greenmining-1.0.3/greenmining/controllers/__init__.py +0 -11
  48. greenmining-1.0.3/greenmining/controllers/repository_controller.py +0 -172
  49. greenmining-1.0.3/greenmining/main.py +0 -37
  50. greenmining-1.0.3/greenmining/presenters/__init__.py +0 -11
  51. greenmining-1.0.3/greenmining/services/__init__.py +0 -13
  52. greenmining-1.0.3/greenmining/services/github_fetcher.py +0 -323
  53. greenmining-1.0.3/greenmining.egg-info/entry_points.txt +0 -2
  54. greenmining-1.0.3/pytest.ini +0 -22
  55. {greenmining-1.0.3 → greenmining-1.0.4}/CHANGELOG.md +0 -0
  56. {greenmining-1.0.3 → greenmining-1.0.4}/LICENSE +0 -0
  57. {greenmining-1.0.3 → greenmining-1.0.4}/MANIFEST.in +0 -0
  58. {greenmining-1.0.3 → greenmining-1.0.4}/greenmining.egg-info/dependency_links.txt +0 -0
  59. {greenmining-1.0.3 → greenmining-1.0.4}/greenmining.egg-info/top_level.txt +0 -0
  60. {greenmining-1.0.3 → greenmining-1.0.4}/setup.cfg +0 -0
  61. {greenmining-1.0.3 → greenmining-1.0.4}/setup.py +0 -0
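Among the files added in 1.0.4 is a RAPL energy backend (`greenmining/energy/rapl.py`), which per the README diff below reads Linux's `/sys/class/powercap/` interface. The sketch below is a rough, self-contained illustration of what reading that kernel interface involves; the helper names are illustrative assumptions, not the package's actual API.

```python
from pathlib import Path

# Linux powercap sysfs root; RAPL zones appear as intel-rapl:* directories.
RAPL_ROOT = Path("/sys/class/powercap")


def read_energy_uj(zone: Path) -> int:
    """Read a RAPL zone's cumulative energy counter (microjoules)."""
    return int((zone / "energy_uj").read_text())


def delta_joules(start_uj: int, end_uj: int, max_uj: int) -> float:
    """Energy consumed between two counter reads, in joules.

    The kernel counter wraps at max_energy_range_uj, so a smaller end
    reading than start means one wraparound occurred.
    """
    raw = end_uj - start_uj if end_uj >= start_uj else (max_uj - start_uj) + end_uj
    return raw / 1_000_000


def rapl_available() -> bool:
    """True if any intel-rapl zone is exposed under powercap."""
    return any(RAPL_ROOT.glob("intel-rapl:*/energy_uj"))
```

On a host with RAPL exposed, one would sample `energy_uj` before and after the workload and pass the zone's `max_energy_range_uj` as the wrap limit; greenmining's own `RAPLEnergyMeter` and its result types are not assumed here.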
--- greenmining-1.0.3/greenmining.egg-info/PKG-INFO
+++ greenmining-1.0.4/PKG-INFO
@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: greenmining
-Version: 1.0.3
+Version: 1.0.4
 Summary: Analyze GitHub repositories to identify green software engineering patterns and energy-efficient practices
 Author-email: Adam Bouafia <a.bouafia@student.vu.nl>
 License: MIT
@@ -23,20 +23,19 @@ Classifier: Programming Language :: Python :: 3.11
 Classifier: Programming Language :: Python :: 3.12
 Classifier: Programming Language :: Python :: 3.13
 Classifier: Operating System :: OS Independent
-Classifier: Environment :: Console
 Requires-Python: >=3.9
 Description-Content-Type: text/markdown
 License-File: LICENSE
 Requires-Dist: PyGithub>=2.1.1
 Requires-Dist: PyDriller>=2.5
 Requires-Dist: pandas>=2.2.0
-Requires-Dist: click>=8.1.7
 Requires-Dist: colorama>=0.4.6
 Requires-Dist: tabulate>=0.9.0
 Requires-Dist: tqdm>=4.66.0
 Requires-Dist: matplotlib>=3.8.0
 Requires-Dist: plotly>=5.18.0
 Requires-Dist: python-dotenv>=1.0.0
+Requires-Dist: requests>=2.31.0
 Provides-Extra: dev
 Requires-Dist: pytest>=7.4.0; extra == "dev"
 Requires-Dist: pytest-cov>=4.1.0; extra == "dev"
@@ -44,7 +43,7 @@ Requires-Dist: pytest-mock>=3.12.0; extra == "dev"
 Requires-Dist: black>=23.12.0; extra == "dev"
 Requires-Dist: ruff>=0.1.9; extra == "dev"
 Requires-Dist: mypy>=1.8.0; extra == "dev"
-Requires-Dist: build>=1.0.3; extra == "dev"
+Requires-Dist: build>=1.0.4; extra == "dev"
 Requires-Dist: twine>=4.0.2; extra == "dev"
 Provides-Extra: docs
 Requires-Dist: sphinx>=7.2.0; extra == "docs"
@@ -88,37 +87,6 @@ docker pull adambouafia/greenmining:latest
 
 ## Quick Start
 
-### CLI Usage
-
-```bash
-# Set your GitHub token
-export GITHUB_TOKEN="your_github_token"
-
-# Run full analysis pipeline
-greenmining pipeline --max-repos 100
-
-# Fetch repositories with custom keywords
-greenmining fetch --max-repos 100 --min-stars 100 --keywords "kubernetes docker cloud-native"
-
-# Fetch with default (microservices)
-greenmining fetch --max-repos 100 --min-stars 100
-
-# Extract commits
-greenmining extract --max-commits 50
-
-# Analyze for green patterns
-greenmining analyze
-
-# Analyze with advanced features
-greenmining analyze --enable-nlp --enable-ml-features --enable-diff-analysis
-
-# Aggregate results with temporal analysis
-greenmining aggregate --enable-temporal --temporal-granularity quarter --enable-enhanced-stats
-
-# Generate report
-greenmining report
-```
-
 ### Python API
 
 #### Basic Pattern Detection
@@ -197,23 +165,10 @@ extractor = CommitExtractor(
 # Initialize analyzer with advanced features
 analyzer = DataAnalyzer(
     enable_diff_analysis=False,  # Enable code diff analysis (slower but more accurate)
-    enable_nlp=True,             # Enable NLP-enhanced pattern detection
-    enable_ml_features=True,     # Enable ML feature extraction
     patterns=None,               # Custom pattern dict (default: GSF_PATTERNS)
     batch_size=10                # Batch processing size (default: 10)
 )
 
-# Optional: Configure NLP analyzer separately
-nlp_analyzer = NLPAnalyzer(
-    enable_stemming=True,  # Enable morphological analysis (optimize→optimizing)
-    enable_synonyms=True   # Enable semantic synonym matching (cache→buffer)
-)
-
-# Optional: Configure ML feature extractor
-ml_extractor = MLFeatureExtractor(
-    green_keywords=None  # Custom keyword list (default: built-in 19 keywords)
-)
-
 # Extract commits from first repo
 commits = extractor.extract_commits(
     repository=repos[0],  # PyGithub Repository object
@@ -229,18 +184,9 @@ commits = extractor.extract_commits(
 
 **DataAnalyzer Parameters:**
 - `enable_diff_analysis` (bool, default=False): Enable code diff analysis (slower)
-- `enable_nlp` (bool, default=False): Enable NLP-enhanced pattern detection
-- `enable_ml_features` (bool, default=False): Enable ML feature extraction
 - `patterns` (dict, optional): Custom pattern dictionary (default: GSF_PATTERNS)
 - `batch_size` (int, default=10): Number of commits to process in each batch
 
-**NLPAnalyzer Parameters:**
-- `enable_stemming` (bool, default=True): Enable morphological variant matching
-- `enable_synonyms` (bool, default=True): Enable semantic synonym expansion
-
-**MLFeatureExtractor Parameters:**
-- `green_keywords` (list[str], optional): Custom green keywords list
-
 # Analyze commits for green patterns
 results = []
 for commit in commits:
@@ -306,7 +252,7 @@ from greenmining.analyzers.qualitative_analyzer import QualitativeAnalyzer
 # Initialize aggregator with all advanced features
 aggregator = DataAggregator(
     config=None,                    # Config object (optional)
-    enable_enhanced_stats=True,     # Enable statistical analysis (correlations, trends)
+    enable_stats=True,              # Enable statistical analysis (correlations, trends)
     enable_temporal=True,           # Enable temporal trend analysis
     temporal_granularity="quarter"  # Time granularity: day/week/month/quarter/year
 )
@@ -330,7 +276,7 @@ aggregated = aggregator.aggregate(
 
 **DataAggregator Parameters:**
 - `config` (Config, optional): Configuration object
-- `enable_enhanced_stats` (bool, default=False): Enable pattern correlations and effect size analysis
+- `enable_stats` (bool, default=False): Enable pattern correlations and effect size analysis
 - `enable_temporal` (bool, default=False): Enable temporal trend analysis over time
 - `temporal_granularity` (str, default="quarter"): Time granularity (day/week/month/quarter/year)
 
@@ -452,8 +398,6 @@ extractor.save_results(
 # STAGE 3: Analyze Commits
 print("\nAnalyzing commits...")
 analyzer = DataAnalyzer(
-    enable_nlp=True,
-    enable_ml_features=True,
     enable_diff_analysis=False,  # Set to True for detailed code analysis (slower)
 )
 analyzed_commits = analyzer.analyze_commits(all_commits)
@@ -470,7 +414,7 @@ analyzer.save_results(analyzed_commits, output_dir / "analyzed.json")
 # STAGE 4: Aggregate Results
 print("\nAggregating results...")
 aggregator = DataAggregator(
-    enable_enhanced_stats=True,
+    enable_stats=True,
     enable_temporal=True,
     temporal_granularity="quarter",
 )
@@ -497,8 +441,8 @@ print(f"\n📁 Results saved in: {output_dir.absolute()}")
 
 1. **Fetches repositories** from GitHub based on keywords and filters
 2. **Extracts commits** from each repository (up to 1000 per repo)
-3. **Analyzes commits** for green software patterns using NLP and ML
-4. **Aggregates results** with temporal analysis and enhanced statistics
+3. **Analyzes commits** for green software patterns
+4. **Aggregates results** with temporal analysis and statistics
 5. **Saves results** to JSON and CSV files for further analysis
 
 **Expected output files:**
@@ -513,17 +457,13 @@ print(f"\n📁 Results saved in: {output_dir.absolute()}")
 ### Docker Usage
 
 ```bash
-# Run analysis pipeline
-docker run -v $(pwd)/data:/app/data \
-  adambouafia/greenmining:latest --help
-
-# With custom configuration
-docker run -v $(pwd)/.env:/app/.env:ro \
-  -v $(pwd)/data:/app/data \
-  adambouafia/greenmining:latest pipeline --max-repos 50
+# Interactive shell with Python
+docker run -it -v $(pwd)/data:/app/data \
+  adambouafia/greenmining:latest python
 
-# Interactive shell
-docker run -it adambouafia/greenmining:latest /bin/bash
+# Run Python script
+docker run -v $(pwd)/data:/app/data \
+  adambouafia/greenmining:latest python your_script.py
 ```
 
 ## Configuration
@@ -549,14 +489,12 @@ EXCLUDE_BOT_COMMITS=true
 
 # Optional - Analysis Features
 ENABLE_DIFF_ANALYSIS=false
-ENABLE_NLP=true
-ENABLE_ML_FEATURES=true
 BATCH_SIZE=10
 
 # Optional - Temporal Analysis
 ENABLE_TEMPORAL=true
 TEMPORAL_GRANULARITY=quarter
-ENABLE_ENHANCED_STATS=true
+ENABLE_STATS=true
 
 # Optional - Output
 OUTPUT_DIR=./data
@@ -586,14 +524,12 @@ config = Config(
 
     # Analysis Options
     enable_diff_analysis=False,  # Enable code diff analysis
-    enable_nlp=True,             # Enable NLP features
-    enable_ml_features=True,     # Enable ML feature extraction
     batch_size=10,               # Batch processing size
 
     # Temporal Analysis
     enable_temporal=True,            # Enable temporal trend analysis
     temporal_granularity="quarter",  # day/week/month/quarter/year
-    enable_enhanced_stats=True,      # Enable statistical analysis
+    enable_stats=True,               # Enable statistical analysis
 
     # Output Configuration
     output_dir="./data",  # Output directory path
@@ -619,6 +555,50 @@ config = Config(
 - **Docker Support**: Pre-built images for containerized analysis
 - **Programmatic API**: Full Python API for custom workflows and integrations
 - **Clean Architecture**: Modular design with services layer (Fetcher, Extractor, Analyzer, Aggregator, Reports)
+- **Energy Measurement**: Real-time energy consumption tracking via RAPL (Linux) or CodeCarbon (cross-platform)
+
+### Energy Measurement
+
+greenmining includes built-in energy measurement capabilities for tracking the carbon footprint of your analysis:
+
+#### Backend Options
+
+| Backend | Platform | Metrics | Requirements |
+|---------|----------|---------|--------------|
+| **RAPL** | Linux (Intel/AMD) | CPU/RAM energy (Joules) | `/sys/class/powercap/` access |
+| **CodeCarbon** | Cross-platform | Energy + Carbon emissions (gCO2) | `pip install codecarbon` |
+
+#### Python API
+
+```python
+from greenmining.energy import RAPLEnergyMeter, CodeCarbonMeter
+
+# RAPL (Linux only)
+rapl = RAPLEnergyMeter()
+if rapl.is_available():
+    rapl.start()
+    # ... run analysis ...
+    result = rapl.stop()
+    print(f"Energy: {result.energy_joules:.2f} J")
+
+# CodeCarbon (cross-platform)
+cc = CodeCarbonMeter()
+if cc.is_available():
+    cc.start()
+    # ... run analysis ...
+    result = cc.stop()
+    print(f"Energy: {result.energy_joules:.2f} J")
+    print(f"Carbon: {result.carbon_grams:.4f} gCO2")
+```
+
+#### Experiment Results
+
+CodeCarbon was verified with a real experiment:
+- **Repository**: flask (pallets/flask)
+- **Commits analyzed**: 10
+- **Energy measured**: 160.6 J
+- **Carbon emissions**: 0.0119 gCO2
+- **Duration**: 11.28 seconds
 
 ### Pattern Database
 
@@ -683,77 +663,6 @@ Alpine containers, Infrastructure as Code, renewable energy regions, container o
 ### 15. General (8 patterns)
 Feature flags, incremental processing, precomputation, background jobs, workflow optimization
 
-## CLI Commands
-
-| Command | Description | Key Options |
-|---------|-------------|-------------|
-| `fetch` | Fetch repositories from GitHub with custom keywords | `--max-repos`, `--min-stars`, `--languages`, `--keywords` |
-| `extract` | Extract commit history from repositories | `--max-commits` per repository |
-| `analyze` | Analyze commits for green patterns | `--enable-nlp`, `--enable-ml-features`, `--enable-diff-analysis` |
-| `aggregate` | Aggregate analysis results | `--enable-temporal`, `--temporal-granularity`, `--enable-enhanced-stats` |
-| `report` | Generate comprehensive report | Creates Markdown and CSV outputs |
-| `pipeline` | Run complete analysis pipeline | `--max-repos`, `--max-commits` (all-in-one) |
-| `status` | Show current analysis status | Displays progress and file statistics |
-
-### Command Details
-
-#### Fetch Repositories
-```bash
-# Fetch with custom search keywords
-greenmining fetch --max-repos 100 --min-stars 50 --languages Python --keywords "kubernetes docker"
-
-# Fetch microservices (default)
-greenmining fetch --max-repos 100 --min-stars 50 --languages Python
-```
-Options:
-- `--max-repos`: Maximum repositories to fetch (default: 100)
-- `--min-stars`: Minimum GitHub stars (default: 100)
-- `--languages`: Filter by programming languages (default: "Python,Java,Go,JavaScript,TypeScript")
-- `--keywords`: Custom search keywords (default: "microservices")
-
-#### Extract Commits
-```bash
-greenmining extract --max-commits 50
-```
-Options:
-- `--max-commits`: Maximum commits per repository (default: 50)
-
-#### Analyze Commits (with Advanced Features)
-```bash
-# Basic analysis
-greenmining analyze
-
-# Advanced analysis with all features
-greenmining analyze --enable-nlp --enable-ml-features --enable-diff-analysis --batch-size 20
-```
-Options:
-- `--batch-size`: Batch size for processing (default: 10)
-- `--enable-diff-analysis`: Enable code diff analysis (slower but more accurate)
-- `--enable-nlp`: Enable NLP-enhanced pattern detection with morphological variants and synonyms
-- `--enable-ml-features`: Enable ML feature extraction for model training
-
-#### Aggregate Results (with Temporal Analysis)
-```bash
-# Basic aggregation
-greenmining aggregate
-
-# Advanced aggregation with temporal trends
-greenmining aggregate --enable-temporal --temporal-granularity quarter --enable-enhanced-stats
-```
-Options:
-- `--enable-enhanced-stats`: Enable enhanced statistical analysis (correlations, effect sizes)
-- `--enable-temporal`: Enable temporal trend analysis
-- `--temporal-granularity`: Time period granularity (choices: day, week, month, quarter, year)
-
-#### Run Pipeline
-```bash
-greenmining pipeline --max-repos 50 --max-commits 100
-```
-Options:
-- `--max-repos`: Repositories to analyze
-- `--max-commits`: Commits per repository
-- Executes: fetch → extract → analyze → aggregate → report
-
 ## Output Files
 
 All outputs are saved to the `data/` directory:
@@ -793,6 +702,7 @@ ruff check greenmining/ tests/
 - PyDriller >= 2.5
 - pandas >= 2.2.0
 - click >= 8.1.7
+- codecarbon >= 2.0.0 (optional, for cross-platform energy measurement)
 
 ## License
 
--- greenmining-1.0.3/README.md
+++ greenmining-1.0.4/README.md
@@ -34,37 +34,6 @@ docker pull adambouafia/greenmining:latest
 
 ## Quick Start
 
-### CLI Usage
-
-```bash
-# Set your GitHub token
-export GITHUB_TOKEN="your_github_token"
-
-# Run full analysis pipeline
-greenmining pipeline --max-repos 100
-
-# Fetch repositories with custom keywords
-greenmining fetch --max-repos 100 --min-stars 100 --keywords "kubernetes docker cloud-native"
-
-# Fetch with default (microservices)
-greenmining fetch --max-repos 100 --min-stars 100
-
-# Extract commits
-greenmining extract --max-commits 50
-
-# Analyze for green patterns
-greenmining analyze
-
-# Analyze with advanced features
-greenmining analyze --enable-nlp --enable-ml-features --enable-diff-analysis
-
-# Aggregate results with temporal analysis
-greenmining aggregate --enable-temporal --temporal-granularity quarter --enable-enhanced-stats
-
-# Generate report
-greenmining report
-```
-
 ### Python API
 
 #### Basic Pattern Detection
@@ -143,23 +112,10 @@ extractor = CommitExtractor(
 # Initialize analyzer with advanced features
 analyzer = DataAnalyzer(
     enable_diff_analysis=False,  # Enable code diff analysis (slower but more accurate)
-    enable_nlp=True,             # Enable NLP-enhanced pattern detection
-    enable_ml_features=True,     # Enable ML feature extraction
    patterns=None,               # Custom pattern dict (default: GSF_PATTERNS)
     batch_size=10                # Batch processing size (default: 10)
 )
 
-# Optional: Configure NLP analyzer separately
-nlp_analyzer = NLPAnalyzer(
-    enable_stemming=True,  # Enable morphological analysis (optimize→optimizing)
-    enable_synonyms=True   # Enable semantic synonym matching (cache→buffer)
-)
-
-# Optional: Configure ML feature extractor
-ml_extractor = MLFeatureExtractor(
-    green_keywords=None  # Custom keyword list (default: built-in 19 keywords)
-)
-
 # Extract commits from first repo
 commits = extractor.extract_commits(
     repository=repos[0],  # PyGithub Repository object
@@ -175,18 +131,9 @@ commits = extractor.extract_commits(
 
 **DataAnalyzer Parameters:**
 - `enable_diff_analysis` (bool, default=False): Enable code diff analysis (slower)
-- `enable_nlp` (bool, default=False): Enable NLP-enhanced pattern detection
-- `enable_ml_features` (bool, default=False): Enable ML feature extraction
 - `patterns` (dict, optional): Custom pattern dictionary (default: GSF_PATTERNS)
 - `batch_size` (int, default=10): Number of commits to process in each batch
 
-**NLPAnalyzer Parameters:**
-- `enable_stemming` (bool, default=True): Enable morphological variant matching
-- `enable_synonyms` (bool, default=True): Enable semantic synonym expansion
-
-**MLFeatureExtractor Parameters:**
-- `green_keywords` (list[str], optional): Custom green keywords list
-
 # Analyze commits for green patterns
 results = []
 for commit in commits:
@@ -252,7 +199,7 @@ from greenmining.analyzers.qualitative_analyzer import QualitativeAnalyzer
 # Initialize aggregator with all advanced features
 aggregator = DataAggregator(
     config=None,                    # Config object (optional)
-    enable_enhanced_stats=True,     # Enable statistical analysis (correlations, trends)
+    enable_stats=True,              # Enable statistical analysis (correlations, trends)
     enable_temporal=True,           # Enable temporal trend analysis
     temporal_granularity="quarter"  # Time granularity: day/week/month/quarter/year
 )
@@ -276,7 +223,7 @@ aggregated = aggregator.aggregate(
 
 **DataAggregator Parameters:**
 - `config` (Config, optional): Configuration object
-- `enable_enhanced_stats` (bool, default=False): Enable pattern correlations and effect size analysis
+- `enable_stats` (bool, default=False): Enable pattern correlations and effect size analysis
 - `enable_temporal` (bool, default=False): Enable temporal trend analysis over time
 - `temporal_granularity` (str, default="quarter"): Time granularity (day/week/month/quarter/year)
 
@@ -398,8 +345,6 @@ extractor.save_results(
 # STAGE 3: Analyze Commits
 print("\nAnalyzing commits...")
 analyzer = DataAnalyzer(
-    enable_nlp=True,
-    enable_ml_features=True,
     enable_diff_analysis=False,  # Set to True for detailed code analysis (slower)
 )
 analyzed_commits = analyzer.analyze_commits(all_commits)
@@ -416,7 +361,7 @@ analyzer.save_results(analyzed_commits, output_dir / "analyzed.json")
 # STAGE 4: Aggregate Results
 print("\nAggregating results...")
 aggregator = DataAggregator(
-    enable_enhanced_stats=True,
+    enable_stats=True,
     enable_temporal=True,
     temporal_granularity="quarter",
 )
@@ -443,8 +388,8 @@ print(f"\n📁 Results saved in: {output_dir.absolute()}")
 
 1. **Fetches repositories** from GitHub based on keywords and filters
 2. **Extracts commits** from each repository (up to 1000 per repo)
-3. **Analyzes commits** for green software patterns using NLP and ML
-4. **Aggregates results** with temporal analysis and enhanced statistics
+3. **Analyzes commits** for green software patterns
+4. **Aggregates results** with temporal analysis and statistics
 5. **Saves results** to JSON and CSV files for further analysis
 
 **Expected output files:**
@@ -459,17 +404,13 @@ print(f"\n📁 Results saved in: {output_dir.absolute()}")
 ### Docker Usage
 
 ```bash
-# Run analysis pipeline
-docker run -v $(pwd)/data:/app/data \
-  adambouafia/greenmining:latest --help
-
-# With custom configuration
-docker run -v $(pwd)/.env:/app/.env:ro \
-  -v $(pwd)/data:/app/data \
-  adambouafia/greenmining:latest pipeline --max-repos 50
+# Interactive shell with Python
+docker run -it -v $(pwd)/data:/app/data \
+  adambouafia/greenmining:latest python
 
-# Interactive shell
-docker run -it adambouafia/greenmining:latest /bin/bash
+# Run Python script
+docker run -v $(pwd)/data:/app/data \
+  adambouafia/greenmining:latest python your_script.py
 ```
 
 ## Configuration
@@ -495,14 +436,12 @@ EXCLUDE_BOT_COMMITS=true
 
 # Optional - Analysis Features
 ENABLE_DIFF_ANALYSIS=false
-ENABLE_NLP=true
-ENABLE_ML_FEATURES=true
 BATCH_SIZE=10
 
 # Optional - Temporal Analysis
 ENABLE_TEMPORAL=true
 TEMPORAL_GRANULARITY=quarter
-ENABLE_ENHANCED_STATS=true
+ENABLE_STATS=true
 
 # Optional - Output
 OUTPUT_DIR=./data
@@ -532,14 +471,12 @@ config = Config(
 
     # Analysis Options
     enable_diff_analysis=False,  # Enable code diff analysis
-    enable_nlp=True,             # Enable NLP features
-    enable_ml_features=True,     # Enable ML feature extraction
     batch_size=10,               # Batch processing size
 
     # Temporal Analysis
     enable_temporal=True,            # Enable temporal trend analysis
     temporal_granularity="quarter",  # day/week/month/quarter/year
-    enable_enhanced_stats=True,      # Enable statistical analysis
+    enable_stats=True,               # Enable statistical analysis
 
     # Output Configuration
     output_dir="./data",  # Output directory path
@@ -565,6 +502,50 @@ config = Config(
 - **Docker Support**: Pre-built images for containerized analysis
 - **Programmatic API**: Full Python API for custom workflows and integrations
 - **Clean Architecture**: Modular design with services layer (Fetcher, Extractor, Analyzer, Aggregator, Reports)
+- **Energy Measurement**: Real-time energy consumption tracking via RAPL (Linux) or CodeCarbon (cross-platform)
+
+### Energy Measurement
+
+greenmining includes built-in energy measurement capabilities for tracking the carbon footprint of your analysis:
+
+#### Backend Options
+
+| Backend | Platform | Metrics | Requirements |
+|---------|----------|---------|--------------|
+| **RAPL** | Linux (Intel/AMD) | CPU/RAM energy (Joules) | `/sys/class/powercap/` access |
+| **CodeCarbon** | Cross-platform | Energy + Carbon emissions (gCO2) | `pip install codecarbon` |
+
+#### Python API
+
+```python
+from greenmining.energy import RAPLEnergyMeter, CodeCarbonMeter
+
+# RAPL (Linux only)
+rapl = RAPLEnergyMeter()
+if rapl.is_available():
+    rapl.start()
+    # ... run analysis ...
+    result = rapl.stop()
+    print(f"Energy: {result.energy_joules:.2f} J")
+
+# CodeCarbon (cross-platform)
+cc = CodeCarbonMeter()
+if cc.is_available():
+    cc.start()
+    # ... run analysis ...
+    result = cc.stop()
+    print(f"Energy: {result.energy_joules:.2f} J")
+    print(f"Carbon: {result.carbon_grams:.4f} gCO2")
+```
+
+#### Experiment Results
+
+CodeCarbon was verified with a real experiment:
+- **Repository**: flask (pallets/flask)
+- **Commits analyzed**: 10
+- **Energy measured**: 160.6 J
+- **Carbon emissions**: 0.0119 gCO2
+- **Duration**: 11.28 seconds
 
 ### Pattern Database
 
@@ -629,77 +610,6 @@ Alpine containers, Infrastructure as Code, renewable energy regions, container o
 ### 15. General (8 patterns)
 Feature flags, incremental processing, precomputation, background jobs, workflow optimization
 
-## CLI Commands
-
-| Command | Description | Key Options |
-|---------|-------------|-------------|
-| `fetch` | Fetch repositories from GitHub with custom keywords | `--max-repos`, `--min-stars`, `--languages`, `--keywords` |
-| `extract` | Extract commit history from repositories | `--max-commits` per repository |
-| `analyze` | Analyze commits for green patterns | `--enable-nlp`, `--enable-ml-features`, `--enable-diff-analysis` |
-| `aggregate` | Aggregate analysis results | `--enable-temporal`, `--temporal-granularity`, `--enable-enhanced-stats` |
-| `report` | Generate comprehensive report | Creates Markdown and CSV outputs |
-| `pipeline` | Run complete analysis pipeline | `--max-repos`, `--max-commits` (all-in-one) |
-| `status` | Show current analysis status | Displays progress and file statistics |
-
-### Command Details
-
-#### Fetch Repositories
-```bash
-# Fetch with custom search keywords
-greenmining fetch --max-repos 100 --min-stars 50 --languages Python --keywords "kubernetes docker"
-
-# Fetch microservices (default)
-greenmining fetch --max-repos 100 --min-stars 50 --languages Python
-```
-Options:
-- `--max-repos`: Maximum repositories to fetch (default: 100)
-- `--min-stars`: Minimum GitHub stars (default: 100)
-- `--languages`: Filter by programming languages (default: "Python,Java,Go,JavaScript,TypeScript")
-- `--keywords`: Custom search keywords (default: "microservices")
-
-#### Extract Commits
-```bash
-greenmining extract --max-commits 50
-```
-Options:
-- `--max-commits`: Maximum commits per repository (default: 50)
-
-#### Analyze Commits (with Advanced Features)
-```bash
-# Basic analysis
-greenmining analyze
-
-# Advanced analysis with all features
-greenmining analyze --enable-nlp --enable-ml-features --enable-diff-analysis --batch-size 20
-```
-Options:
-- `--batch-size`: Batch size for processing (default: 10)
-- `--enable-diff-analysis`: Enable code diff analysis (slower but more accurate)
-- `--enable-nlp`: Enable NLP-enhanced pattern detection with morphological variants and synonyms
-- `--enable-ml-features`: Enable ML feature extraction for model training
-
-#### Aggregate Results (with Temporal Analysis)
-```bash
-# Basic aggregation
-greenmining aggregate
-
-# Advanced aggregation with temporal trends
-greenmining aggregate --enable-temporal --temporal-granularity quarter --enable-enhanced-stats
-```
-Options:
-- `--enable-enhanced-stats`: Enable enhanced statistical analysis (correlations, effect sizes)
-- `--enable-temporal`: Enable temporal trend analysis
-- `--temporal-granularity`: Time period granularity (choices: day, week, month, quarter, year)
-
-#### Run Pipeline
-```bash
-greenmining pipeline --max-repos 50 --max-commits 100
-```
-Options:
-- `--max-repos`: Repositories to analyze
-- `--max-commits`: Commits per repository
-- Executes: fetch → extract → analyze → aggregate → report
-
 ## Output Files
 
 All outputs are saved to the `data/` directory:
@@ -739,6 +649,7 @@ ruff check greenmining/ tests/
 - PyDriller >= 2.5
 - pandas >= 2.2.0
 - click >= 8.1.7
+- codecarbon >= 2.0.0 (optional, for cross-platform energy measurement)
 
 ## License