greenmining-0.1.4.tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (38)
  1. greenmining-0.1.4/CHANGELOG.md +79 -0
  2. greenmining-0.1.4/LICENSE +21 -0
  3. greenmining-0.1.4/MANIFEST.in +32 -0
  4. greenmining-0.1.4/PKG-INFO +335 -0
  5. greenmining-0.1.4/README.md +280 -0
  6. greenmining-0.1.4/greenmining/__init__.py +20 -0
  7. greenmining-0.1.4/greenmining/__main__.py +6 -0
  8. greenmining-0.1.4/greenmining/__version__.py +3 -0
  9. greenmining-0.1.4/greenmining/cli.py +370 -0
  10. greenmining-0.1.4/greenmining/config.py +120 -0
  11. greenmining-0.1.4/greenmining/controllers/__init__.py +11 -0
  12. greenmining-0.1.4/greenmining/controllers/repository_controller.py +117 -0
  13. greenmining-0.1.4/greenmining/gsf_patterns.py +802 -0
  14. greenmining-0.1.4/greenmining/main.py +37 -0
  15. greenmining-0.1.4/greenmining/models/__init__.py +12 -0
  16. greenmining-0.1.4/greenmining/models/aggregated_stats.py +30 -0
  17. greenmining-0.1.4/greenmining/models/analysis_result.py +48 -0
  18. greenmining-0.1.4/greenmining/models/commit.py +71 -0
  19. greenmining-0.1.4/greenmining/models/repository.py +89 -0
  20. greenmining-0.1.4/greenmining/presenters/__init__.py +11 -0
  21. greenmining-0.1.4/greenmining/presenters/console_presenter.py +141 -0
  22. greenmining-0.1.4/greenmining/services/__init__.py +13 -0
  23. greenmining-0.1.4/greenmining/services/commit_extractor.py +282 -0
  24. greenmining-0.1.4/greenmining/services/data_aggregator.py +442 -0
  25. greenmining-0.1.4/greenmining/services/data_analyzer.py +333 -0
  26. greenmining-0.1.4/greenmining/services/github_fetcher.py +266 -0
  27. greenmining-0.1.4/greenmining/services/reports.py +531 -0
  28. greenmining-0.1.4/greenmining/utils.py +320 -0
  29. greenmining-0.1.4/greenmining.egg-info/PKG-INFO +335 -0
  30. greenmining-0.1.4/greenmining.egg-info/SOURCES.txt +59 -0
  31. greenmining-0.1.4/greenmining.egg-info/dependency_links.txt +1 -0
  32. greenmining-0.1.4/greenmining.egg-info/entry_points.txt +2 -0
  33. greenmining-0.1.4/greenmining.egg-info/requires.txt +25 -0
  34. greenmining-0.1.4/greenmining.egg-info/top_level.txt +1 -0
  35. greenmining-0.1.4/pyproject.toml +191 -0
  36. greenmining-0.1.4/pytest.ini +22 -0
  37. greenmining-0.1.4/setup.cfg +4 -0
  38. greenmining-0.1.4/setup.py +7 -0
@@ -0,0 +1,79 @@
1
+ # Changelog
2
+
3
+ ## [0.1.4] - 2025-12-02
4
+
5
+ ### Added
6
+ - New release
7
+
8
+
9
+ ## [0.1.3] - 2025-12-02
10
+
11
+ ### Added
12
+ - New release
13
+
14
+
15
+ ## [0.1.2] - 2025-12-02
16
+
17
+ ### Added
18
+ - New release
19
+
20
+
21
+ ## [0.1.1] - 2025-12-02
22
+
23
+ ### Added
24
+ - New release
25
+
26
+
27
+ All notable changes to this project will be documented in this file.
28
+
29
+ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
30
+ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
31
+
32
+ ## [Unreleased]
33
+
34
+ ### Added
35
+ - Initial project structure with MCP architecture
36
+ - GSF patterns database (76 patterns, 190 keywords)
37
+ - GitHub repository fetching and analysis
38
+ - PyDriller integration for commit mining
39
+ - Pattern matching engine
40
+ - Green awareness detection
41
+ - Data analysis and reporting
42
+ - CLI interface with Click
43
+ - Docker support with multi-stage builds
44
+ - GitHub Actions CI/CD pipeline
45
+ - PyPI publishing workflow
46
+ - Docker Hub and GHCR publishing
47
+ - Comprehensive test suite
48
+ - Documentation and examples
49
+
50
+ ### Changed
51
+ - N/A
52
+
53
+ ### Deprecated
54
+ - N/A
55
+
56
+ ### Removed
57
+ - N/A
58
+
59
+ ### Fixed
60
+ - N/A
61
+
62
+ ### Security
63
+ - N/A
64
+
65
+ ## [0.1.0] - TBD
66
+
67
+ ### Added
68
+ - Initial release
69
+ - Core functionality for GSF pattern mining
70
+ - CLI tool `greenmining`
71
+ - Support for 100 microservices repositories
72
+ - Pattern matching with 76 GSF patterns
73
+ - Green awareness analysis
74
+ - Data export capabilities
75
+ - Docker containerization
76
+ - CI/CD automation
77
+
78
+ [Unreleased]: https://github.com/yourusername/greenmining/compare/v0.1.0...HEAD
79
+ [0.1.0]: https://github.com/yourusername/greenmining/releases/tag/v0.1.0
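Reviewer note on CHANGELOG.md: the file cites Keep a Changelog, which puts the `[Unreleased]` section first and then lists releases newest first. Here the 0.1.1 to 0.1.4 entries sit above the preamble and above `[Unreleased]`. The sketch below shows one way such an ordering could be checked. The `release_headings` helper, the regular expression, and the abridged sample text are illustrative assumptions and are not part of greenmining.

```python
import re

# Matches level-2 headings such as "## [0.1.4] - 2025-12-02" or "## [Unreleased]".
# Level-3 headings ("### Added") do not match because "## " must be followed by "[".
HEADING = re.compile(r"^## \[([^\]]+)\](?: - (\S+))?\s*$")


def release_headings(changelog_text):
    """Return (version, date) pairs in the order they appear in the file."""
    found = []
    for line in changelog_text.splitlines():
        match = HEADING.match(line)
        if match:
            found.append((match.group(1), match.group(2)))
    return found


# Abridged stand-in for the CHANGELOG.md shown in this diff (assumption: only
# the heading order matters for the check).
sample = """# Changelog

## [0.1.4] - 2025-12-02

### Added
- New release

## [Unreleased]

### Added
- Initial project structure with MCP architecture

## [0.1.0] - TBD
"""

headings = release_headings(sample)
unreleased_is_first = bool(headings) and headings[0][0] == "Unreleased"
print(headings)
print("Unreleased section first:", unreleased_is_first)
```

A check like this could run in CI before publishing, alongside a check that the link references at the bottom no longer point at the `yourusername` placeholder.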
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2024 Green Microservices Mining
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
@@ -0,0 +1,32 @@
1
+ # Include important files in source distribution
2
+ include README.md
3
+ include LICENSE
4
+ include CHANGELOG.md
5
+ include pyproject.toml
6
+ include setup.py
7
+ include pytest.ini
8
+
9
+ # Include package data
10
+ recursive-include greenmining *.py
11
+ recursive-include greenmining py.typed
12
+
13
+ # Exclude unnecessary files
14
+ exclude cli-roadmap.md
15
+ exclude PYPI_ROADMAP.md
16
+ exclude CICD_SETUP.md
17
+ recursive-exclude tests *
18
+ recursive-exclude docs *
19
+ recursive-exclude scripts *
20
+ recursive-exclude prototype *
21
+ recursive-exclude .github *
22
+ exclude Dockerfile
23
+ exclude docker-compose.yml
24
+ exclude .gitignore
25
+ exclude .env
26
+ exclude .dockerignore
27
+
28
+ # Exclude Python cache and build artifacts
29
+ global-exclude __pycache__
30
+ global-exclude *.py[co]
31
+ global-exclude *.so
32
+ global-exclude .DS_Store
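Reviewer note on MANIFEST.in: the `global-exclude` lines use shell-style glob patterns, so `*.py[co]` covers both `.pyc` and `.pyo` files while leaving `.py` sources alone. The sketch below approximates that matching with the standard-library `fnmatch` module. It is an assumption that `fnmatch` is close enough for illustration; setuptools translates the patterns itself, and the `is_globally_excluded` helper is hypothetical.

```python
from fnmatch import fnmatchcase

# The four global-exclude patterns from the MANIFEST.in in this diff.
GLOBAL_EXCLUDES = ["__pycache__", "*.py[co]", "*.so", ".DS_Store"]


def is_globally_excluded(filename):
    """Return True if a bare file name matches any global-exclude pattern.

    Matching is case-sensitive and applied to the file name only, which is an
    approximation of how the patterns behave in a real build.
    """
    return any(fnmatchcase(filename, pattern) for pattern in GLOBAL_EXCLUDES)


for name in ["cli.py", "cli.pyc", "cli.pyo", "_speedups.so", ".DS_Store"]:
    print(name, is_globally_excluded(name))
```

To confirm what a real build includes, listing the contents of the generated sdist archive is more reliable than reasoning about the patterns.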
@@ -0,0 +1,335 @@
1
+ Metadata-Version: 2.4
2
+ Name: greenmining
3
+ Version: 0.1.4
4
+ Summary: Green Software Foundation (GSF) patterns mining tool for microservices repositories
5
+ Author-email: Your Name <your.email@example.com>
6
+ Maintainer-email: Your Name <your.email@example.com>
7
+ License: MIT
8
+ Project-URL: Homepage, https://github.com/yourusername/greenmining
9
+ Project-URL: Documentation, https://github.com/yourusername/greenmining#readme
10
+ Project-URL: Repository, https://github.com/yourusername/greenmining
11
+ Project-URL: Issues, https://github.com/yourusername/greenmining/issues
12
+ Project-URL: Changelog, https://github.com/yourusername/greenmining/blob/main/CHANGELOG.md
13
+ Keywords: green-software,gsf,sustainability,carbon-footprint,microservices,mining,repository-analysis,energy-efficiency,github-analysis
14
+ Classifier: Development Status :: 3 - Alpha
15
+ Classifier: Intended Audience :: Developers
16
+ Classifier: Intended Audience :: Science/Research
17
+ Classifier: Topic :: Software Development :: Quality Assurance
18
+ Classifier: Topic :: Scientific/Engineering :: Information Analysis
19
+ Classifier: License :: OSI Approved :: MIT License
20
+ Classifier: Programming Language :: Python :: 3
21
+ Classifier: Programming Language :: Python :: 3.9
22
+ Classifier: Programming Language :: Python :: 3.10
23
+ Classifier: Programming Language :: Python :: 3.11
24
+ Classifier: Programming Language :: Python :: 3.12
25
+ Classifier: Programming Language :: Python :: 3.13
26
+ Classifier: Operating System :: OS Independent
27
+ Classifier: Environment :: Console
28
+ Requires-Python: >=3.9
29
+ Description-Content-Type: text/markdown
30
+ License-File: LICENSE
31
+ Requires-Dist: PyGithub>=2.1.1
32
+ Requires-Dist: PyDriller>=2.5
33
+ Requires-Dist: pandas>=2.2.0
34
+ Requires-Dist: click>=8.1.7
35
+ Requires-Dist: colorama>=0.4.6
36
+ Requires-Dist: tabulate>=0.9.0
37
+ Requires-Dist: tqdm>=4.66.0
38
+ Requires-Dist: matplotlib>=3.8.0
39
+ Requires-Dist: plotly>=5.18.0
40
+ Requires-Dist: python-dotenv>=1.0.0
41
+ Provides-Extra: dev
42
+ Requires-Dist: pytest>=7.4.0; extra == "dev"
43
+ Requires-Dist: pytest-cov>=4.1.0; extra == "dev"
44
+ Requires-Dist: pytest-mock>=3.12.0; extra == "dev"
45
+ Requires-Dist: black>=23.12.0; extra == "dev"
46
+ Requires-Dist: ruff>=0.1.9; extra == "dev"
47
+ Requires-Dist: mypy>=1.8.0; extra == "dev"
48
+ Requires-Dist: build>=1.0.3; extra == "dev"
49
+ Requires-Dist: twine>=4.0.2; extra == "dev"
50
+ Provides-Extra: docs
51
+ Requires-Dist: sphinx>=7.2.0; extra == "docs"
52
+ Requires-Dist: sphinx-rtd-theme>=2.0.0; extra == "docs"
53
+ Requires-Dist: myst-parser>=2.0.0; extra == "docs"
54
+ Dynamic: license-file
55
+
56
+ # greenmining
57
+
58
+ Green mining for microservices repositories.
59
+
60
+ [![PyPI](https://img.shields.io/pypi/v/greenmining)](https://pypi.org/project/greenmining/)
61
+ [![Python](https://img.shields.io/pypi/pyversions/greenmining)](https://pypi.org/project/greenmining/)
62
+ [![License](https://img.shields.io/github/license/adam-bouafia/greenmining)](LICENSE)
63
+
64
+ ## Overview
65
+
66
+ `greenmining` is a Python library and CLI tool for analyzing GitHub repositories to identify green software engineering practices. It detects 76 official Green Software Foundation patterns across cloud, web, AI, database, networking, and general categories.
67
+
68
+ ## Features
69
+
70
+ - 🔍 **76 GSF Patterns**: Detect official Green Software Foundation patterns
71
+ - 📊 **Repository Mining**: Analyze 100+ microservices repositories from GitHub
72
+ - 📈 **Green Awareness Detection**: Identify sustainability-focused commits
73
+ - 📄 **Comprehensive Reports**: Generate analysis reports in multiple formats
74
+ - 🐳 **Docker Support**: Run in containers for consistent environments
75
+ - ⚡ **Fast Analysis**: Parallel processing and checkpoint system
76
+
77
+ ## Installation
78
+
79
+ ### Via pip
80
+
81
+ ```bash
82
+ pip install greenmining
83
+ ```
84
+
85
+ ### From source
86
+
87
+ ```bash
88
+ git clone https://github.com/adam-bouafia/greenmining.git
89
+ cd greenmining
90
+ pip install -e .
91
+ ```
92
+
93
+ ### With Docker
94
+
95
+ ```bash
96
+ docker pull adambouafia/greenmining:latest
97
+ ```
98
+
99
+ ## Quick Start
100
+
101
+ ### CLI Usage
102
+
103
+ ```bash
104
+ # Set your GitHub token
105
+ export GITHUB_TOKEN="your_github_token"
106
+
107
+ # Run full analysis pipeline
108
+ greenmining pipeline --max-repos 100
109
+
110
+ # Fetch repositories
111
+ greenmining fetch --max-repos 100 --min-stars 100
112
+
113
+ # Extract commits
114
+ greenmining extract --max-commits 50
115
+
116
+ # Analyze for green patterns
117
+ greenmining analyze
118
+
119
+ # Generate report
120
+ greenmining report
121
+ ```
122
+
123
+ ### Python API
124
+
125
+ #### Basic Pattern Detection
126
+
127
+ ```python
128
+ from greenmining import GSF_PATTERNS, is_green_aware, get_pattern_by_keywords
129
+
130
+ # Check available patterns
131
+ print(f"Total GSF patterns: {len(GSF_PATTERNS)}") # 76
132
+
133
+ # Detect green awareness in commit messages
134
+ commit_msg = "Optimize Redis caching to reduce energy consumption"
135
+ if is_green_aware(commit_msg):
136
+     patterns = get_pattern_by_keywords(commit_msg)
137
+     print(f"Matched patterns: {patterns}")
138
+     # Output: ['Cache Static Data', 'Use Efficient Cache Strategies']
139
+ ```
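Reviewer note: the README example above relies on keyword-based detection, but this part of the diff does not show how `is_green_aware` or `get_pattern_by_keywords` are implemented. The following is a minimal sketch of one plausible approach, case-insensitive substring matching. The `match_patterns` function, the `SAMPLE_PATTERNS` data, and its keyword lists are hypothetical stand-ins. Only the field names (`name`, `category`, `keywords`) and the pattern name "Cache Static Data" come from the README.

```python
# Hypothetical two-entry stand-in for GSF_PATTERNS. The real database is
# described as holding 76 patterns; these ids and keywords are made up.
SAMPLE_PATTERNS = {
    "example_001": {
        "name": "Cache Static Data",
        "category": "web",
        "keywords": ["cache", "caching"],
    },
    "example_002": {
        "name": "Example Autoscaling Pattern",  # hypothetical name
        "category": "cloud",
        "keywords": ["autoscal", "scale down"],
    },
}


def match_patterns(message, patterns):
    """Return names of patterns with at least one keyword found in message."""
    text = message.lower()
    return [
        pattern["name"]
        for pattern in patterns.values()
        if any(keyword in text for keyword in pattern["keywords"])
    ]


def looks_green_aware(message, patterns):
    """Treat a message as green-aware if any pattern keyword matches."""
    return bool(match_patterns(message, patterns))


msg = "Optimize Redis caching to reduce energy consumption"
print(match_patterns(msg, SAMPLE_PATTERNS))
print(looks_green_aware("Fix typo in docs", SAMPLE_PATTERNS))
```

Plain substring matching is easy to audit but can over-match (for example, "cache" inside an unrelated identifier), so the package may well use something stricter.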
140
+
141
+ #### Analyze Repository Commits
142
+
143
+ ```python
144
+ from greenmining.services.github_fetcher import GitHubFetcher
145
+ from greenmining.services.commit_extractor import CommitExtractor
146
+ from greenmining.services.data_analyzer import DataAnalyzer
147
+ from greenmining.config import Config
148
+
149
+ # Initialize services
150
+ config = Config()
151
+ fetcher = GitHubFetcher(config)
152
+ extractor = CommitExtractor(config)
153
+ analyzer = DataAnalyzer(config)
154
+
155
+ # Fetch repositories
156
+ repos = fetcher.fetch_repositories(max_repos=10, min_stars=100)
157
+
158
+ # Extract commits from first repo
159
+ commits = extractor.extract_commits(repos[0], max_commits=50)
160
+
161
+ # Analyze commits for green patterns
162
+ results = []
163
+ for commit in commits:
164
+     result = analyzer.analyze_commit(commit)
165
+     if result['green_aware']:
166
+         results.append(result)
167
+         print(f"Green commit found: {commit.message[:50]}...")
168
+         print(f" Patterns: {result['known_pattern']}")
169
+ ```
170
+
171
+ #### Access GSF Patterns Data
172
+
173
+ ```python
174
+ from greenmining import GSF_PATTERNS
175
+
176
+ # Get all cloud patterns
177
+ cloud_patterns = {
178
+     pid: pattern for pid, pattern in GSF_PATTERNS.items()
179
+     if pattern['category'] == 'cloud'
180
+ }
181
+ print(f"Cloud patterns: {len(cloud_patterns)}")
182
+
183
+ # Get pattern details
184
+ cache_pattern = GSF_PATTERNS['gsf_001']
185
+ print(f"Pattern: {cache_pattern['name']}")
186
+ print(f"Category: {cache_pattern['category']}")
187
+ print(f"Keywords: {cache_pattern['keywords']}")
188
+ print(f"Impact: {cache_pattern['sci_impact']}")
189
+ ```
190
+
191
+ #### Generate Custom Reports
192
+
193
+ ```python
194
+ from greenmining.services.data_aggregator import DataAggregator
195
+ from greenmining.config import Config
196
+
197
+ config = Config()
198
+ aggregator = DataAggregator(config)
199
+
200
+ # Load analysis results
201
+ results = aggregator.load_analysis_results()
202
+
203
+ # Generate statistics
204
+ stats = aggregator.calculate_statistics(results)
205
+ print(f"Total commits analyzed: {stats['total_commits']}")
206
+ print(f"Green-aware commits: {stats['green_aware_count']}")
207
+ print(f"Top patterns: {stats['top_patterns'][:5]}")
208
+
209
+ # Export to CSV
210
+ aggregator.export_to_csv(results, "output.csv")
211
+ ```
212
+
213
+ #### Batch Analysis
214
+
215
+ ```python
216
+ from greenmining.controllers.repository_controller import RepositoryController
217
+ from greenmining.config import Config
218
+
219
+ config = Config()
220
+ controller = RepositoryController(config)
221
+
222
+ # Run full pipeline programmatically
223
+ controller.fetch_repositories(max_repos=50)
224
+ controller.extract_commits(max_commits=100)
225
+ controller.analyze_commits()
226
+ controller.aggregate_results()
227
+ controller.generate_report()
228
+
229
+ print("Analysis complete! Check data/ directory for results.")
230
+ ```
231
+
232
+ ### Docker Usage
233
+
234
+ ```bash
235
+ # Run analysis pipeline
236
+ docker run -v $(pwd)/data:/app/data \
237
+     adambouafia/greenmining:latest --help
238
+
239
+ # With custom configuration
240
+ docker run -v $(pwd)/.env:/app/.env:ro \
241
+     -v $(pwd)/data:/app/data \
242
+     adambouafia/greenmining:latest pipeline --max-repos 50
243
+
244
+ # Interactive shell
245
+ docker run -it adambouafia/greenmining:latest /bin/bash
246
+ ```
247
+
248
+ ## Configuration
249
+
250
+ Create a `.env` file or set environment variables:
251
+
252
+ ```bash
253
+ GITHUB_TOKEN=your_github_personal_access_token
254
+ MAX_REPOS=100
255
+ COMMITS_PER_REPO=50
256
+ OUTPUT_DIR=./data
257
+ ```
258
+
259
+ ## GSF Pattern Categories
260
+
261
+ - **Cloud** (40 patterns): Autoscaling, serverless, right-sizing, region selection
262
+ - **Web** (15 patterns): CDN, caching, lazy loading, compression
263
+ - **AI/ML** (8 patterns): Model optimization, pruning, quantization
264
+ - **Database** (6 patterns): Indexing, query optimization, connection pooling
265
+ - **Networking** (4 patterns): Protocol optimization, connection reuse
266
+ - **General** (3 patterns): Code efficiency, resource management
267
+
268
+ ## CLI Commands
269
+
270
+ | Command | Description |
271
+ |---------|-------------|
272
+ | `fetch` | Fetch microservices repositories from GitHub |
273
+ | `extract` | Extract commit history from repositories |
274
+ | `analyze` | Analyze commits for green patterns |
275
+ | `aggregate` | Aggregate analysis results |
276
+ | `report` | Generate comprehensive report |
277
+ | `pipeline` | Run complete analysis pipeline |
278
+ | `status` | Show current analysis status |
279
+
280
+ ## Output Files
281
+
282
+ All outputs are saved to the `data/` directory:
283
+
284
+ - `repositories.json` - Repository metadata
285
+ - `commits.json` - Extracted commit data
286
+ - `analysis_results.json` - Pattern analysis results
287
+ - `aggregated_statistics.json` - Summary statistics
288
+ - `green_analysis_results.csv` - CSV export for spreadsheets
289
+ - `green_microservices_analysis.md` - Final report
290
+
291
+ ## Development
292
+
293
+ ```bash
294
+ # Clone repository
295
+ git clone https://github.com/adam-bouafia/greenmining.git
296
+ cd greenmining
297
+
298
+ # Install development dependencies
299
+ pip install -e ".[dev]"
300
+
301
+ # Run tests
302
+ pytest tests/
303
+
304
+ # Run with coverage
305
+ pytest --cov=greenmining tests/
306
+
307
+ # Format code
308
+ black greenmining/ tests/
309
+ ruff check greenmining/ tests/
310
+ ```
311
+
312
+ ## Requirements
313
+
314
+ - Python 3.9+
315
+ - PyGithub >= 2.1.1
316
+ - PyDriller >= 2.5
317
+ - pandas >= 2.2.0
318
+ - click >= 8.1.7
319
+
320
+ ## License
321
+
322
+ MIT License - See [LICENSE](LICENSE) for details.
323
+
324
+ ## Contributing
325
+
326
+ Contributions are welcome! Please open an issue or submit a pull request.
327
+
328
+ ## Links
329
+
330
+ - **GitHub**: https://github.com/adam-bouafia/greenmining
331
+ - **PyPI**: https://pypi.org/project/greenmining/
332
+ - **Docker Hub**: https://hub.docker.com/r/adambouafia/greenmining
333
+ - **Documentation**: https://github.com/adam-bouafia/greenmining#readme
334
+
335
+
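Reviewer note on PKG-INFO: the header block uses the RFC 822 style key/value layout of Python core metadata, so the runtime dependencies and the `dev` / `docs` extras can be separated mechanically. The sketch below does this with the standard-library `email` parser on an abridged copy of the header. The `split_requirements` helper is hypothetical, and the marker handling is deliberately simplistic: it only recognises markers of the form `extra == "name"`.

```python
from email.parser import Parser

# Abridged copy of the PKG-INFO header from this diff.
PKG_INFO_HEADER = """\
Metadata-Version: 2.4
Name: greenmining
Version: 0.1.4
Requires-Python: >=3.9
Requires-Dist: PyGithub>=2.1.1
Requires-Dist: PyDriller>=2.5
Provides-Extra: dev
Requires-Dist: pytest>=7.4.0; extra == "dev"
Provides-Extra: docs
Requires-Dist: sphinx>=7.2.0; extra == "docs"
"""


def split_requirements(metadata_text):
    """Split Requires-Dist entries into runtime deps and per-extra deps."""
    message = Parser().parsestr(metadata_text)
    runtime, extras = [], {}
    for requirement in message.get_all("Requires-Dist", []):
        spec, _, marker = requirement.partition(";")
        marker = marker.strip()
        if marker.startswith("extra =="):
            # Take the quoted extra name to the right of "==".
            extra_name = marker.split("==", 1)[1].strip().strip("\"'")
            extras.setdefault(extra_name, []).append(spec.strip())
        else:
            runtime.append(spec.strip())
    return message["Name"], message["Version"], runtime, extras


name, version, runtime, extras = split_requirements(PKG_INFO_HEADER)
print(name, version)
print("runtime:", runtime)
print("extras:", extras)
```

For an installed package, `importlib.metadata.metadata("greenmining")` returns the same fields without manual parsing, and a dedicated requirement parser handles compound markers that this sketch ignores. The same pass would also be a natural place to flag the placeholder `Your Name <your.email@example.com>` and `yourusername` values that shipped in this release.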