greenmining 0.1.10__tar.gz → 0.1.12__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- {greenmining-0.1.10 → greenmining-0.1.12}/CHANGELOG.md +27 -0
- {greenmining-0.1.10/greenmining.egg-info → greenmining-0.1.12}/PKG-INFO +174 -38
- greenmining-0.1.12/README.md +417 -0
- greenmining-0.1.12/greenmining/__init__.py +61 -0
- {greenmining-0.1.10 → greenmining-0.1.12}/greenmining/cli.py +9 -3
- {greenmining-0.1.10 → greenmining-0.1.12}/greenmining/controllers/repository_controller.py +10 -3
- {greenmining-0.1.10 → greenmining-0.1.12}/greenmining/gsf_patterns.py +795 -2
- {greenmining-0.1.10 → greenmining-0.1.12/greenmining.egg-info}/PKG-INFO +174 -38
- {greenmining-0.1.10 → greenmining-0.1.12}/pyproject.toml +7 -10
- greenmining-0.1.10/README.md +0 -280
- greenmining-0.1.10/greenmining/__init__.py +0 -20
- {greenmining-0.1.10 → greenmining-0.1.12}/LICENSE +0 -0
- {greenmining-0.1.10 → greenmining-0.1.12}/MANIFEST.in +0 -0
- {greenmining-0.1.10 → greenmining-0.1.12}/greenmining/__main__.py +0 -0
- {greenmining-0.1.10 → greenmining-0.1.12}/greenmining/__version__.py +0 -0
- {greenmining-0.1.10 → greenmining-0.1.12}/greenmining/config.py +0 -0
- {greenmining-0.1.10 → greenmining-0.1.12}/greenmining/controllers/__init__.py +0 -0
- {greenmining-0.1.10 → greenmining-0.1.12}/greenmining/main.py +0 -0
- {greenmining-0.1.10 → greenmining-0.1.12}/greenmining/models/__init__.py +0 -0
- {greenmining-0.1.10 → greenmining-0.1.12}/greenmining/models/aggregated_stats.py +0 -0
- {greenmining-0.1.10 → greenmining-0.1.12}/greenmining/models/analysis_result.py +0 -0
- {greenmining-0.1.10 → greenmining-0.1.12}/greenmining/models/commit.py +0 -0
- {greenmining-0.1.10 → greenmining-0.1.12}/greenmining/models/repository.py +0 -0
- {greenmining-0.1.10 → greenmining-0.1.12}/greenmining/presenters/__init__.py +0 -0
- {greenmining-0.1.10 → greenmining-0.1.12}/greenmining/presenters/console_presenter.py +0 -0
- {greenmining-0.1.10 → greenmining-0.1.12}/greenmining/services/__init__.py +0 -0
- {greenmining-0.1.10 → greenmining-0.1.12}/greenmining/services/commit_extractor.py +0 -0
- {greenmining-0.1.10 → greenmining-0.1.12}/greenmining/services/data_aggregator.py +0 -0
- {greenmining-0.1.10 → greenmining-0.1.12}/greenmining/services/data_analyzer.py +0 -0
- {greenmining-0.1.10 → greenmining-0.1.12}/greenmining/services/github_fetcher.py +0 -0
- {greenmining-0.1.10 → greenmining-0.1.12}/greenmining/services/reports.py +0 -0
- {greenmining-0.1.10 → greenmining-0.1.12}/greenmining/utils.py +0 -0
- {greenmining-0.1.10 → greenmining-0.1.12}/greenmining.egg-info/SOURCES.txt +0 -0
- {greenmining-0.1.10 → greenmining-0.1.12}/greenmining.egg-info/dependency_links.txt +0 -0
- {greenmining-0.1.10 → greenmining-0.1.12}/greenmining.egg-info/entry_points.txt +0 -0
- {greenmining-0.1.10 → greenmining-0.1.12}/greenmining.egg-info/requires.txt +0 -0
- {greenmining-0.1.10 → greenmining-0.1.12}/greenmining.egg-info/top_level.txt +0 -0
- {greenmining-0.1.10 → greenmining-0.1.12}/pytest.ini +0 -0
- {greenmining-0.1.10 → greenmining-0.1.12}/setup.cfg +0 -0
- {greenmining-0.1.10 → greenmining-0.1.12}/setup.py +0 -0
@@ -1,5 +1,32 @@
 # Changelog
 
+## [0.1.12] - 2025-12-03
+
+### Added
+- Custom search keywords for repository fetching (`--keywords` option)
+- `fetch_repositories()` function exposed in public API
+- Users can now search for any topic (kubernetes, docker, serverless, etc.)
+
+### Changed
+- README updated to reflect 122 patterns (was showing 76 in PyPI description)
+- CLI `fetch` command now accepts `--keywords` parameter
+- Repository fetching no longer hardcoded to "microservices"
+
+### Fixed
+- Outdated pattern count in PyPI package description
+
+## [0.1.11] - 2025-12-03
+
+### Added
+- Expanded pattern database from 76 to 122 patterns
+- Added 9 new categories (Resource, Caching, Data, Async, Code, Monitoring, Network, Microservices, Infrastructure)
+- Expanded keywords from 190 to 321
+- VU Amsterdam 2024 research patterns for ML systems
+
+### Changed
+- README with comprehensive feature documentation
+- Detection rate improved to 37.15% (up from 33.79%)
+
 ## [0.1.7] - 2025-12-02
 
 ### Added
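The 0.1.11 entry reports the detection rate rising from 33.79% to 37.15%. The CHANGELOG does not define the metric. The sketch below assumes it is the share of analyzed commits flagged as green-aware and also shows the difference between an absolute (percentage-point) change and a relative change. `detection_rate` and `change_between` are hypothetical helpers, not part of greenmining.

```python
def detection_rate(flags: list[bool]) -> float:
    """Percentage of commits flagged as green-aware (assumed definition)."""
    if not flags:
        return 0.0
    return 100.0 * sum(flags) / len(flags)


def change_between(old_pct: float, new_pct: float) -> tuple[float, float]:
    """Return (percentage-point change, relative change in percent), rounded to 2 places."""
    points = new_pct - old_pct
    relative = 100.0 * points / old_pct
    return round(points, 2), round(relative, 2)


# Toy example: 3 of 8 commits flagged
print(detection_rate([True, False, True, False, False, True, False, False]))  # 37.5

# The two figures quoted in the CHANGELOG
points, relative = change_between(33.79, 37.15)
print(f"+{points} percentage points ({relative}% relative)")  # +3.36 percentage points (9.94% relative)
```

Whatever the exact definition, the reported change is about 3.36 percentage points. That is roughly a 9.9% relative increase over the earlier rate.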
@@ -1,15 +1,14 @@
 Metadata-Version: 2.4
 Name: greenmining
-Version: 0.1.
+Version: 0.1.12
 Summary: Analyze GitHub repositories to identify green software engineering patterns and energy-efficient practices
-Author-email:
-Maintainer-email: Your Name <your.email@example.com>
+Author-email: Adam Bouafia <a.bouafia@student.vu.nl>
 License: MIT
-Project-URL: Homepage, https://github.com/
-Project-URL: Documentation, https://github.com/
-Project-URL: Repository, https://github.com/
-Project-URL: Issues, https://github.com/
-Project-URL: Changelog, https://github.com/
+Project-URL: Homepage, https://github.com/adam-bouafia/greenmining
+Project-URL: Documentation, https://github.com/adam-bouafia/greenmining#readme
+Project-URL: Repository, https://github.com/adam-bouafia/greenmining
+Project-URL: Issues, https://github.com/adam-bouafia/greenmining/issues
+Project-URL: Changelog, https://github.com/adam-bouafia/greenmining/blob/main/CHANGELOG.md
 Keywords: green-software,gsf,sustainability,carbon-footprint,microservices,mining,repository-analysis,energy-efficiency,github-analysis
 Classifier: Development Status :: 3 - Alpha
 Classifier: Intended Audience :: Developers
@@ -63,11 +62,11 @@ Green mining for microservices repositories.
 
 ## Overview
 
-`greenmining` is a Python library and CLI tool for analyzing GitHub repositories to identify green software engineering practices and energy-efficient patterns. It detects
+`greenmining` is a Python library and CLI tool for analyzing GitHub repositories to identify green software engineering practices and energy-efficient patterns. It detects 122 sustainable software patterns across cloud, web, AI, database, networking, and general categories, including advanced patterns from VU Amsterdam 2024 research on green architectural tactics for ML systems.
 
 ## Features
 
-- 🔍 **
+- 🔍 **122 Sustainability Patterns**: Detect energy-efficient and environmentally conscious coding practices across 15 categories (expanded from 76)
 - 📊 **Repository Mining**: Analyze 100+ microservices repositories from GitHub
 - 📈 **Green Awareness Detection**: Identify sustainability-focused commits
 - 📄 **Comprehensive Reports**: Generate analysis reports in multiple formats
@@ -107,7 +106,10 @@ export GITHUB_TOKEN="your_github_token"
 # Run full analysis pipeline
 greenmining pipeline --max-repos 100
 
-# Fetch repositories
+# Fetch repositories with custom keywords
+greenmining fetch --max-repos 100 --min-stars 100 --keywords "kubernetes docker cloud-native"
+
+# Fetch with default (microservices)
 greenmining fetch --max-repos 100 --min-stars 100
 
 # Extract commits
@@ -128,7 +130,7 @@ greenmining report
 from greenmining import GSF_PATTERNS, is_green_aware, get_pattern_by_keywords
 
 # Check available patterns
-print(f"Total patterns: {len(GSF_PATTERNS)}") #
+print(f"Total patterns: {len(GSF_PATTERNS)}") # 122 patterns across 15 categories
 
 # Detect green awareness in commit messages
 commit_msg = "Optimize Redis caching to reduce energy consumption"
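The context lines above pass a commit message to `is_green_aware()`, but the hunk does not show how the matching works. The sketch below assumes case-insensitive, whole-word matching of each pattern's `keywords` list (the field the README prints for `gsf_001`) against the message. `MINI_PATTERNS`, `matching_patterns` and `looks_green_aware` are hypothetical stand-ins. They do not reproduce greenmining's actual patterns, keywords or output.

```python
import re

# Hypothetical miniature pattern table; the real GSF_PATTERNS is far larger.
MINI_PATTERNS = {
    "p1": {"name": "Cache Static Data", "category": "web", "keywords": ["cache", "caching"]},
    "p2": {"name": "Reduce Energy Use", "category": "general", "keywords": ["energy", "power consumption"]},
    "p3": {"name": "Compress Transmitted Data", "category": "networking", "keywords": ["gzip", "compress"]},
}


def matching_patterns(message: str, patterns: dict = MINI_PATTERNS) -> list[str]:
    """Names of patterns with at least one keyword present as a whole word or phrase."""
    text = message.lower()
    hits = []
    for pattern in patterns.values():
        for keyword in pattern["keywords"]:
            # \b anchors stop "cache" from matching inside "cachet"
            if re.search(rf"\b{re.escape(keyword)}\b", text):
                hits.append(pattern["name"])
                break  # one hit per pattern is enough
    return hits


def looks_green_aware(message: str) -> bool:
    """True if any pattern matches the message."""
    return bool(matching_patterns(message))


msg = "Optimize Redis caching to reduce energy consumption"
print(matching_patterns(msg))  # ['Cache Static Data', 'Reduce Energy Use']
print(looks_green_aware("Fix typo in README"))  # False
```

Whole-word matching gives up some recall to gain precision. In this sketch `caching` needs its own keyword because `\bcache\b` does not match inside it.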
@@ -138,22 +140,42 @@ if is_green_aware(commit_msg):
 # Output: ['Cache Static Data', 'Use Efficient Cache Strategies']
 ```
 
+#### Fetch Repositories with Custom Keywords (NEW)
+
+```python
+from greenmining import fetch_repositories
+
+# Fetch repositories with custom search keywords
+repos = fetch_repositories(
+    github_token="your_github_token",
+    max_repos=50,
+    min_stars=500,
+    keywords="kubernetes cloud-native",
+    languages=["Python", "Go"]
+)
+
+print(f"Found {len(repos)} repositories")
+for repo in repos[:5]:
+    print(f"- {repo.full_name} ({repo.stars} stars)")
+```
+
 #### Analyze Repository Commits
 
 ```python
-from greenmining.services.github_fetcher import GitHubFetcher
 from greenmining.services.commit_extractor import CommitExtractor
 from greenmining.services.data_analyzer import DataAnalyzer
-from greenmining
+from greenmining import fetch_repositories
 
-#
-
-
-
-
+# Fetch repositories with custom keywords
+repos = fetch_repositories(
+    github_token="your_token",
+    max_repos=10,
+    keywords="serverless edge-computing"
+)
 
-#
-
+# Initialize services
+extractor = CommitExtractor()
+analyzer = DataAnalyzer()
 
 # Extract commits from first repo
 commits = extractor.extract_commits(repos[0], max_commits=50)
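The new `fetch_repositories()` call takes free-text `keywords`, a `min_stars` threshold and a `languages` list. The diff does not show how these become a GitHub request. One plausible approach is to build repository-search query strings with GitHub's `stars:>=N` and `language:NAME` qualifiers, one query per language. The sketch below takes that approach, which is an assumption. `build_search_queries` is hypothetical and makes no network calls.

```python
def build_search_queries(keywords: str, min_stars: int, languages: list[str]) -> list[str]:
    """Build one GitHub repository-search query string per language (hypothetical helper)."""
    terms = " ".join(keywords.split())  # collapse runs of whitespace
    queries = []
    for lang in languages:
        lang = lang.strip()
        if not lang:
            continue  # skip empty entries, e.g. from a trailing comma
        queries.append(f"{terms} stars:>={min_stars} language:{lang}")
    return queries


# Same arguments as the fetch_repositories() example in the hunk above
for q in build_search_queries("kubernetes cloud-native", 500, ["Python", "Go"]):
    print(q)

# The CLI documents --languages as a comma-separated string
default_langs = "Python,Java,Go,JavaScript,TypeScript".split(",")
print(len(build_search_queries("microservices", 100, default_langs)))  # 5
```

Language names that contain spaces would need quoting in a real GitHub search query. This sketch does not handle that case.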
@@ -173,12 +195,18 @@ for commit in commits:
 ```python
 from greenmining import GSF_PATTERNS
 
-# Get all
+# Get all patterns by category
 cloud_patterns = {
     pid: pattern for pid, pattern in GSF_PATTERNS.items()
     if pattern['category'] == 'cloud'
 }
-print(f"Cloud patterns: {len(cloud_patterns)}")
+print(f"Cloud patterns: {len(cloud_patterns)}") # 40 patterns
+
+ai_patterns = {
+    pid: pattern for pid, pattern in GSF_PATTERNS.items()
+    if pattern['category'] == 'ai'
+}
+print(f"AI/ML patterns: {len(ai_patterns)}") # 19 patterns
 
 # Get pattern details
 cache_pattern = GSF_PATTERNS['gsf_001']
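The README hunk above builds one dict comprehension per category (`cloud_patterns`, `ai_patterns`), and each one rescans the whole table. When every category is needed, a single pass that buckets patterns by their `category` field does the same job once. The sketch below assumes only that each entry is a dict with a `'category'` key, as in the README. `sample` is a hypothetical three-entry stand-in for `GSF_PATTERNS`.

```python
from collections import Counter, defaultdict


def group_by_category(patterns: dict[str, dict]) -> dict[str, dict[str, dict]]:
    """Bucket {pattern_id: pattern} into {category: {pattern_id: pattern}} in one pass."""
    groups: dict[str, dict[str, dict]] = defaultdict(dict)
    for pid, pattern in patterns.items():
        groups[pattern["category"]][pid] = pattern
    return dict(groups)  # plain dict so missing categories raise KeyError


# Hypothetical stand-in shaped like the GSF_PATTERNS entries used above
sample = {
    "p1": {"name": "Scale Down Idle Resources", "category": "cloud"},
    "p2": {"name": "Use Spot Instances", "category": "cloud"},
    "p3": {"name": "Quantize Models", "category": "ai"},
}

groups = group_by_category(sample)
print(sorted(groups))        # ['ai', 'cloud']
print(len(groups["cloud"]))  # 2

# Per-category counts without building the buckets
counts = Counter(p["category"] for p in sample.values())
print(counts.most_common())  # [('cloud', 2), ('ai', 1)]
```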
@@ -186,6 +214,13 @@ print(f"Pattern: {cache_pattern['name']}")
 print(f"Category: {cache_pattern['category']}")
 print(f"Keywords: {cache_pattern['keywords']}")
 print(f"Impact: {cache_pattern['sci_impact']}")
+
+# List all available categories
+categories = set(p['category'] for p in GSF_PATTERNS.values())
+print(f"Available categories: {sorted(categories)}")
+# Output: ['ai', 'async', 'caching', 'cloud', 'code', 'data',
+# 'database', 'general', 'infrastructure', 'microservices',
+# 'monitoring', 'network', 'networking', 'resource', 'web']
 ```
 
 #### Generate Custom Reports
@@ -256,26 +291,127 @@ COMMITS_PER_REPO=50
 OUTPUT_DIR=./data
 ```
 
+## Features
+
+### Core Capabilities
+
+- **Pattern Detection**: Automatically identifies 122 sustainability patterns across 15 categories
+- **Keyword Analysis**: Scans commit messages using 321 green software keywords
+- **Custom Repository Fetching**: Fetch repositories with custom search keywords (not limited to microservices)
+- **Repository Analysis**: Analyzes repositories from GitHub with flexible filtering
+- **Batch Processing**: Analyze hundreds of repositories and thousands of commits
+- **Multi-format Output**: Generates Markdown reports, CSV exports, and JSON data
+- **Statistical Analysis**: Calculates green-awareness metrics, pattern distribution, and trends
+- **Docker Support**: Pre-built images for containerized analysis
+- **Programmatic API**: Full Python API for custom workflows and integrations
+- **Clean Architecture**: Modular design with services layer (Fetcher, Extractor, Analyzer, Aggregator, Reports)
+
+### Pattern Database
+
+**122 green software patterns based on:**
+- Green Software Foundation (GSF) Patterns Catalog
+- VU Amsterdam 2024 research on ML system sustainability
+- ICSE 2024 conference papers on sustainable software
+
+### Detection Performance
+
+- **Coverage**: 67% of patterns actively detect in real-world commits
+- **Accuracy**: 100% true positive rate for green-aware commits
+- **Categories**: 15 distinct sustainability domains covered
+- **Keywords**: 321 detection terms across all patterns
+
 ## GSF Pattern Categories
 
-
-
-
--
-
-
+**122 patterns across 15 categories:**
+
+### 1. Cloud (40 patterns)
+Auto-scaling, serverless computing, right-sizing instances, region selection for renewable energy, spot instances, idle resource detection, cloud-native architectures
+
+### 2. Web (17 patterns)
+CDN usage, caching strategies, lazy loading, asset compression, image optimization, minification, code splitting, tree shaking, prefetching
+
+### 3. AI/ML (19 patterns)
+Model optimization, pruning, quantization, edge inference, batch optimization, efficient training, model compression, hardware acceleration, green ML pipelines
+
+### 4. Database (5 patterns)
+Indexing strategies, query optimization, connection pooling, prepared statements, database views, denormalization for efficiency
+
+### 5. Networking (8 patterns)
+Protocol optimization, connection reuse, HTTP/2, gRPC, efficient serialization, compression, persistent connections
+
+### 6. Network (6 patterns)
+Request batching, GraphQL optimization, API gateway patterns, circuit breakers, rate limiting, request deduplication
+
+### 7. Caching (2 patterns)
+Multi-level caching, cache invalidation strategies, data deduplication, distributed caching
+
+### 8. Resource (2 patterns)
+Resource limits, dynamic allocation, memory management, CPU throttling
+
+### 9. Data (3 patterns)
+Efficient serialization formats, pagination, streaming, data compression
+
+### 10. Async (3 patterns)
+Event-driven architecture, reactive streams, polling elimination, non-blocking I/O
+
+### 11. Code (4 patterns)
+Algorithm optimization, code efficiency, garbage collection tuning, memory profiling
+
+### 12. Monitoring (3 patterns)
+Energy monitoring, performance profiling, APM tools, observability patterns
+
+### 13. Microservices (4 patterns)
+Service decomposition, colocation strategies, graceful shutdown, service mesh optimization
+
+### 14. Infrastructure (4 patterns)
+Alpine containers, Infrastructure as Code, renewable energy regions, container optimization
+
+### 15. General (8 patterns)
+Feature flags, incremental processing, precomputation, background jobs, workflow optimization
 
 ## CLI Commands
 
-| Command | Description |
-
-| `fetch` | Fetch
-| `extract` | Extract commit history from repositories |
-| `analyze` | Analyze commits for green patterns |
-| `aggregate` | Aggregate analysis results |
-| `report` | Generate comprehensive report |
-| `pipeline` | Run complete analysis pipeline |
-| `status` | Show current analysis status |
+| Command | Description | Key Options |
+|---------|-------------|-------------|
+| `fetch` | Fetch repositories from GitHub with custom keywords | `--max-repos`, `--min-stars`, `--languages`, `--keywords` |
+| `extract` | Extract commit history from repositories | `--max-commits` per repository |
+| `analyze` | Analyze commits for green patterns | Auto-detects patterns from 122-pattern database |
+| `aggregate` | Aggregate analysis results | Generates statistics and summaries |
+| `report` | Generate comprehensive report | Creates Markdown and CSV outputs |
+| `pipeline` | Run complete analysis pipeline | `--max-repos`, `--max-commits` (all-in-one) |
+| `status` | Show current analysis status | Displays progress and file statistics |
+
+### Command Details
+
+#### Fetch Repositories
+```bash
+# Fetch with custom search keywords
+greenmining fetch --max-repos 100 --min-stars 50 --languages Python --keywords "kubernetes docker"
+
+# Fetch microservices (default)
+greenmining fetch --max-repos 100 --min-stars 50 --languages Python
+```
+Options:
+- `--max-repos`: Maximum repositories to fetch (default: 100)
+- `--min-stars`: Minimum GitHub stars (default: 100)
+- `--languages`: Filter by programming languages (default: "Python,Java,Go,JavaScript,TypeScript")
+- `--keywords`: Custom search keywords (default: "microservices")
+
+#### Extract Commits
+```bash
+greenmining extract --max-commits 50
+```
+Options:
+- `--max-commits`: Maximum commits per repository (default: 50)
+
+#### Run Pipeline
+```bash
+greenmining pipeline --max-repos 50 --max-commits 100
+```
+Options:
+- `--max-repos`: Repositories to analyze
+- `--max-commits`: Commits per repository
+- Executes: fetch → extract → analyze → aggregate → report
 
 ## Output Files
 
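The `pipeline` command is documented as running fetch → extract → analyze → aggregate → report in sequence. The sketch below models those stage boundaries as plain functions, each consuming the previous stage's output. Everything in it is hypothetical: the stage functions, the toy data and `GREEN_KEYWORDS` are illustrative only. They are not greenmining's services (`CommitExtractor`, `DataAnalyzer` and the others), which talk to GitHub and write files.

```python
from functools import reduce

# Hypothetical keyword list used by the toy analyze stage
GREEN_KEYWORDS = ("cache", "energy", "idle", "compress")


def fetch(_):
    # Stand-in for GitHub search: returns repository names
    return ["org/service-a", "org/service-b"]


def extract(repos):
    # Stand-in for commit extraction: (repo, message) pairs from toy data
    toy = {
        "org/service-a": ["Add cache layer", "Fix typo"],
        "org/service-b": ["Scale down idle workers", "Bump version", "Refactor"],
    }
    return [(repo, msg) for repo in repos for msg in toy[repo]]


def analyze(commits):
    # Flag each commit whose lowercased message contains any keyword
    return [(repo, msg, any(k in msg.lower() for k in GREEN_KEYWORDS)) for repo, msg in commits]


def aggregate(results):
    total = len(results)
    green = sum(1 for _, _, is_green in results if is_green)
    return {"total": total, "green": green, "rate": round(100 * green / total, 2) if total else 0.0}


def report(stats):
    return f"{stats['green']}/{stats['total']} commits green-aware ({stats['rate']}%)"


STAGES = [fetch, extract, analyze, aggregate, report]
print(reduce(lambda data, stage: stage(data), STAGES, None))  # 2/5 commits green-aware (40.0%)
```

Each toy stage depends only on the previous stage's output, so any single stage can be rerun on saved intermediate data. The separate `extract`, `analyze`, `aggregate` and `report` commands expose the same split.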