greenmining 0.1.10__tar.gz → 0.1.12__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (40)
  1. {greenmining-0.1.10 → greenmining-0.1.12}/CHANGELOG.md +27 -0
  2. {greenmining-0.1.10/greenmining.egg-info → greenmining-0.1.12}/PKG-INFO +174 -38
  3. greenmining-0.1.12/README.md +417 -0
  4. greenmining-0.1.12/greenmining/__init__.py +61 -0
  5. {greenmining-0.1.10 → greenmining-0.1.12}/greenmining/cli.py +9 -3
  6. {greenmining-0.1.10 → greenmining-0.1.12}/greenmining/controllers/repository_controller.py +10 -3
  7. {greenmining-0.1.10 → greenmining-0.1.12}/greenmining/gsf_patterns.py +795 -2
  8. {greenmining-0.1.10 → greenmining-0.1.12/greenmining.egg-info}/PKG-INFO +174 -38
  9. {greenmining-0.1.10 → greenmining-0.1.12}/pyproject.toml +7 -10
  10. greenmining-0.1.10/README.md +0 -280
  11. greenmining-0.1.10/greenmining/__init__.py +0 -20
  12. {greenmining-0.1.10 → greenmining-0.1.12}/LICENSE +0 -0
  13. {greenmining-0.1.10 → greenmining-0.1.12}/MANIFEST.in +0 -0
  14. {greenmining-0.1.10 → greenmining-0.1.12}/greenmining/__main__.py +0 -0
  15. {greenmining-0.1.10 → greenmining-0.1.12}/greenmining/__version__.py +0 -0
  16. {greenmining-0.1.10 → greenmining-0.1.12}/greenmining/config.py +0 -0
  17. {greenmining-0.1.10 → greenmining-0.1.12}/greenmining/controllers/__init__.py +0 -0
  18. {greenmining-0.1.10 → greenmining-0.1.12}/greenmining/main.py +0 -0
  19. {greenmining-0.1.10 → greenmining-0.1.12}/greenmining/models/__init__.py +0 -0
  20. {greenmining-0.1.10 → greenmining-0.1.12}/greenmining/models/aggregated_stats.py +0 -0
  21. {greenmining-0.1.10 → greenmining-0.1.12}/greenmining/models/analysis_result.py +0 -0
  22. {greenmining-0.1.10 → greenmining-0.1.12}/greenmining/models/commit.py +0 -0
  23. {greenmining-0.1.10 → greenmining-0.1.12}/greenmining/models/repository.py +0 -0
  24. {greenmining-0.1.10 → greenmining-0.1.12}/greenmining/presenters/__init__.py +0 -0
  25. {greenmining-0.1.10 → greenmining-0.1.12}/greenmining/presenters/console_presenter.py +0 -0
  26. {greenmining-0.1.10 → greenmining-0.1.12}/greenmining/services/__init__.py +0 -0
  27. {greenmining-0.1.10 → greenmining-0.1.12}/greenmining/services/commit_extractor.py +0 -0
  28. {greenmining-0.1.10 → greenmining-0.1.12}/greenmining/services/data_aggregator.py +0 -0
  29. {greenmining-0.1.10 → greenmining-0.1.12}/greenmining/services/data_analyzer.py +0 -0
  30. {greenmining-0.1.10 → greenmining-0.1.12}/greenmining/services/github_fetcher.py +0 -0
  31. {greenmining-0.1.10 → greenmining-0.1.12}/greenmining/services/reports.py +0 -0
  32. {greenmining-0.1.10 → greenmining-0.1.12}/greenmining/utils.py +0 -0
  33. {greenmining-0.1.10 → greenmining-0.1.12}/greenmining.egg-info/SOURCES.txt +0 -0
  34. {greenmining-0.1.10 → greenmining-0.1.12}/greenmining.egg-info/dependency_links.txt +0 -0
  35. {greenmining-0.1.10 → greenmining-0.1.12}/greenmining.egg-info/entry_points.txt +0 -0
  36. {greenmining-0.1.10 → greenmining-0.1.12}/greenmining.egg-info/requires.txt +0 -0
  37. {greenmining-0.1.10 → greenmining-0.1.12}/greenmining.egg-info/top_level.txt +0 -0
  38. {greenmining-0.1.10 → greenmining-0.1.12}/pytest.ini +0 -0
  39. {greenmining-0.1.10 → greenmining-0.1.12}/setup.cfg +0 -0
  40. {greenmining-0.1.10 → greenmining-0.1.12}/setup.py +0 -0
@@ -1,5 +1,32 @@
  # Changelog
 
+ ## [0.1.12] - 2025-12-03
+
+ ### Added
+ - Custom search keywords for repository fetching (`--keywords` option)
+ - `fetch_repositories()` function exposed in public API
+ - Users can now search for any topic (kubernetes, docker, serverless, etc.)
+
+ ### Changed
+ - README updated to reflect 122 patterns (was showing 76 in PyPI description)
+ - CLI `fetch` command now accepts `--keywords` parameter
+ - Repository fetching no longer hardcoded to "microservices"
+
+ ### Fixed
+ - Outdated pattern count in PyPI package description
+
+ ## [0.1.11] - 2025-12-03
+
+ ### Added
+ - Expanded pattern database from 76 to 122 patterns
+ - Added 9 new categories (Resource, Caching, Data, Async, Code, Monitoring, Network, Microservices, Infrastructure)
+ - Expanded keywords from 190 to 321
+ - VU Amsterdam 2024 research patterns for ML systems
+
+ ### Changed
+ - README with comprehensive feature documentation
+ - Detection rate improved to 37.15% (up from 33.79%)
+
  ## [0.1.7] - 2025-12-02
 
  ### Added
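The 0.1.12 entries above make repository fetching keyword-driven instead of hardcoded to "microservices". This diff does not show how the package builds its GitHub query, so the following is only a sketch under stated assumptions: the hypothetical helper `build_search_queries` splits the keyword string on whitespace, appends GitHub's `stars:>=N` search qualifier, and emits one query per language using the `language:` qualifier. The one-query-per-language strategy is an assumption, not greenmining's documented behaviour.

```python
from typing import List, Optional

# Default topic named in the changelog and README.
DEFAULT_KEYWORDS = "microservices"


def build_search_queries(
    keywords: str = DEFAULT_KEYWORDS,
    min_stars: int = 100,
    languages: Optional[List[str]] = None,
) -> List[str]:
    """Hypothetical helper: turn a space-separated keyword string into
    GitHub repository-search query strings, one per language."""
    terms = keywords.split()
    if not terms:
        # Blank input falls back to the documented default topic.
        terms = DEFAULT_KEYWORDS.split()
    base = " ".join(terms + [f"stars:>={min_stars}"])
    if not languages:
        return [base]
    # Assumption: issue a separate query for each language.
    return [f"{base} language:{lang}" for lang in languages]


if __name__ == "__main__":
    for query in build_search_queries("kubernetes cloud-native", 500, ["Python", "Go"]):
        print(query)
```

Keeping each query to a single `language:` qualifier avoids relying on how GitHub combines repeated qualifiers; the trade-off is one search request per language.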
@@ -1,15 +1,14 @@
  Metadata-Version: 2.4
  Name: greenmining
- Version: 0.1.10
+ Version: 0.1.12
  Summary: Analyze GitHub repositories to identify green software engineering patterns and energy-efficient practices
- Author-email: Your Name <your.email@example.com>
- Maintainer-email: Your Name <your.email@example.com>
+ Author-email: Adam Bouafia <a.bouafia@student.vu.nl>
  License: MIT
- Project-URL: Homepage, https://github.com/yourusername/greenmining
- Project-URL: Documentation, https://github.com/yourusername/greenmining#readme
- Project-URL: Repository, https://github.com/yourusername/greenmining
- Project-URL: Issues, https://github.com/yourusername/greenmining/issues
- Project-URL: Changelog, https://github.com/yourusername/greenmining/blob/main/CHANGELOG.md
+ Project-URL: Homepage, https://github.com/adam-bouafia/greenmining
+ Project-URL: Documentation, https://github.com/adam-bouafia/greenmining#readme
+ Project-URL: Repository, https://github.com/adam-bouafia/greenmining
+ Project-URL: Issues, https://github.com/adam-bouafia/greenmining/issues
+ Project-URL: Changelog, https://github.com/adam-bouafia/greenmining/blob/main/CHANGELOG.md
  Keywords: green-software,gsf,sustainability,carbon-footprint,microservices,mining,repository-analysis,energy-efficiency,github-analysis
  Classifier: Development Status :: 3 - Alpha
  Classifier: Intended Audience :: Developers
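The hunk above edits the package's core metadata file, which uses RFC 822-style `Field: value` headers. The sketch below is not part of greenmining. It reads the changed fields back with the standard library's `email.parser.HeaderParser`. Multi-valued fields such as `Project-URL` are retrieved with `get_all`. The sample text is an excerpt of the 0.1.12 lines shown above, and `project_urls` is a hypothetical helper name.

```python
from email.parser import HeaderParser

# Excerpt of the 0.1.12 PKG-INFO header lines from the hunk above.
PKG_INFO = """\
Metadata-Version: 2.4
Name: greenmining
Version: 0.1.12
Author-email: Adam Bouafia <a.bouafia@student.vu.nl>
License: MIT
Project-URL: Homepage, https://github.com/adam-bouafia/greenmining
Project-URL: Issues, https://github.com/adam-bouafia/greenmining/issues
"""


def project_urls(text: str) -> dict:
    """Map each Project-URL label (text before the comma) to its URL."""
    msg = HeaderParser().parsestr(text)
    urls = {}
    for entry in msg.get_all("Project-URL", []):
        label, _, url = entry.partition(",")
        urls[label.strip()] = url.strip()
    return urls


msg = HeaderParser().parsestr(PKG_INFO)
print(msg["Version"])  # 0.1.12
print(project_urls(PKG_INFO)["Homepage"])
```

For an installed copy, `importlib.metadata.metadata("greenmining")` returns an object that supports the same `["Version"]` and `get_all("Project-URL")` lookups.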
@@ -63,11 +62,11 @@ Green mining for microservices repositories.
 
  ## Overview
 
- `greenmining` is a Python library and CLI tool for analyzing GitHub repositories to identify green software engineering practices and energy-efficient patterns. It detects 76 sustainable software patterns across cloud, web, AI, database, networking, and general categories.
+ `greenmining` is a Python library and CLI tool for analyzing GitHub repositories to identify green software engineering practices and energy-efficient patterns. It detects 122 sustainable software patterns across cloud, web, AI, database, networking, and general categories, including advanced patterns from VU Amsterdam 2024 research on green architectural tactics for ML systems.
 
  ## Features
 
- - 🔍 **76 Sustainability Patterns**: Detect energy-efficient and environmentally conscious coding practices
+ - 🔍 **122 Sustainability Patterns**: Detect energy-efficient and environmentally conscious coding practices across 15 categories (expanded from 76)
  - 📊 **Repository Mining**: Analyze 100+ microservices repositories from GitHub
  - 📈 **Green Awareness Detection**: Identify sustainability-focused commits
  - 📄 **Comprehensive Reports**: Generate analysis reports in multiple formats
@@ -107,7 +106,10 @@ export GITHUB_TOKEN="your_github_token"
  # Run full analysis pipeline
  greenmining pipeline --max-repos 100
 
- # Fetch repositories
+ # Fetch repositories with custom keywords
+ greenmining fetch --max-repos 100 --min-stars 100 --keywords "kubernetes docker cloud-native"
+
+ # Fetch with default (microservices)
  greenmining fetch --max-repos 100 --min-stars 100
 
  # Extract commits
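The hunk above adds a `--keywords` example to the `fetch` command. This diff does not show how `greenmining/cli.py` declares its options. The sketch below therefore uses the standard library's `argparse` only to mirror the defaults that the 0.1.12 README documents for `fetch`: `--max-repos` 100, `--min-stars` 100, `--languages` "Python,Java,Go,JavaScript,TypeScript", and `--keywords` "microservices". The parser, the helper names, and the comma-splitting of `--languages` are all assumptions.

```python
import argparse
from typing import List, Optional

# Hypothetical mirror of the documented `fetch` options; greenmining's own
# CLI definition is not shown in this diff and may differ.
DEFAULT_LANGUAGES = "Python,Java,Go,JavaScript,TypeScript"


def make_fetch_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="greenmining fetch")
    parser.add_argument("--max-repos", type=int, default=100,
                        help="Maximum repositories to fetch")
    parser.add_argument("--min-stars", type=int, default=100,
                        help="Minimum GitHub stars")
    parser.add_argument("--languages", default=DEFAULT_LANGUAGES,
                        help="Comma-separated language filter")
    parser.add_argument("--keywords", default="microservices",
                        help="Space-separated search keywords")
    return parser


def parse_fetch_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    args = make_fetch_parser().parse_args(argv)
    # Turn "Python, Go" into ["Python", "Go"], dropping empty entries.
    args.languages = [s.strip() for s in args.languages.split(",") if s.strip()]
    return args


if __name__ == "__main__":
    ns = parse_fetch_args(["--min-stars", "50", "--languages", "Python",
                           "--keywords", "kubernetes docker"])
    print(ns.max_repos, ns.min_stars, ns.languages, ns.keywords)
```

Quoting the keyword string on the command line, as the README examples do, keeps multi-word searches in a single argument.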
@@ -128,7 +130,7 @@ greenmining report
  from greenmining import GSF_PATTERNS, is_green_aware, get_pattern_by_keywords
 
  # Check available patterns
- print(f"Total patterns: {len(GSF_PATTERNS)}") # 76
+ print(f"Total patterns: {len(GSF_PATTERNS)}") # 122 patterns across 15 categories
 
  # Detect green awareness in commit messages
  commit_msg = "Optimize Redis caching to reduce energy consumption"
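The README lines above import `is_green_aware` and `get_pattern_by_keywords` and check a commit message. The implementation of these functions is not in this diff. The sketch below is a deliberately tiny, hypothetical stand-in that illustrates keyword-based matching with a case-insensitive, whole-word scan over a hand-written three-pattern table. The table, its keywords, and the `toy_*` function names are invented. They are not greenmining's 122-pattern database or its API.

```python
import re
from typing import Dict, List

# Toy pattern table invented for illustration; it is NOT greenmining's
# pattern database and its keywords are made up.
TOY_PATTERNS: Dict[str, dict] = {
    "toy_001": {"name": "Caching", "keywords": ["cache", "caching", "redis"]},
    "toy_002": {"name": "Energy Awareness", "keywords": ["energy", "power consumption"]},
    "toy_003": {"name": "Lazy Loading", "keywords": ["lazy load", "lazy loading"]},
}


def toy_matching_patterns(message: str, patterns: Dict[str, dict] = TOY_PATTERNS) -> List[str]:
    """Return names of patterns with at least one keyword in the message."""
    text = message.lower()
    hits = []
    for pattern in patterns.values():
        for keyword in pattern["keywords"]:
            # Word boundaries stop "cache" from matching inside "cachet".
            if re.search(rf"\b{re.escape(keyword)}\b", text):
                hits.append(pattern["name"])
                break  # one hit per pattern is enough
    return hits


def toy_is_green_aware(message: str) -> bool:
    """True if any toy pattern matches the commit message."""
    return bool(toy_matching_patterns(message))


msg = "Optimize Redis caching to reduce energy consumption"
print(toy_matching_patterns(msg))  # ['Caching', 'Energy Awareness']
```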
@@ -138,22 +140,42 @@ if is_green_aware(commit_msg):
  # Output: ['Cache Static Data', 'Use Efficient Cache Strategies']
  ```
 
+ #### Fetch Repositories with Custom Keywords (NEW)
+
+ ```python
+ from greenmining import fetch_repositories
+
+ # Fetch repositories with custom search keywords
+ repos = fetch_repositories(
+     github_token="your_github_token",
+     max_repos=50,
+     min_stars=500,
+     keywords="kubernetes cloud-native",
+     languages=["Python", "Go"]
+ )
+
+ print(f"Found {len(repos)} repositories")
+ for repo in repos[:5]:
+     print(f"- {repo.full_name} ({repo.stars} stars)")
+ ```
+
  #### Analyze Repository Commits
 
  ```python
- from greenmining.services.github_fetcher import GitHubFetcher
  from greenmining.services.commit_extractor import CommitExtractor
  from greenmining.services.data_analyzer import DataAnalyzer
- from greenmining.config import Config
+ from greenmining import fetch_repositories
 
- # Initialize services
- config = Config()
- fetcher = GitHubFetcher(config)
- extractor = CommitExtractor(config)
- analyzer = DataAnalyzer(config)
+ # Fetch repositories with custom keywords
+ repos = fetch_repositories(
+     github_token="your_token",
+     max_repos=10,
+     keywords="serverless edge-computing"
+ )
 
- # Fetch repositories
- repos = fetcher.fetch_repositories(max_repos=10, min_stars=100)
+ # Initialize services
+ extractor = CommitExtractor()
+ analyzer = DataAnalyzer()
 
  # Extract commits from first repo
  commits = extractor.extract_commits(repos[0], max_commits=50)
@@ -173,12 +195,18 @@ for commit in commits:
  ```python
  from greenmining import GSF_PATTERNS
 
- # Get all cloud patterns
+ # Get all patterns by category
  cloud_patterns = {
      pid: pattern for pid, pattern in GSF_PATTERNS.items()
      if pattern['category'] == 'cloud'
  }
- print(f"Cloud patterns: {len(cloud_patterns)}")
+ print(f"Cloud patterns: {len(cloud_patterns)}") # 40 patterns
+
+ ai_patterns = {
+     pid: pattern for pid, pattern in GSF_PATTERNS.items()
+     if pattern['category'] == 'ai'
+ }
+ print(f"AI/ML patterns: {len(ai_patterns)}") # 19 patterns
 
  # Get pattern details
  cache_pattern = GSF_PATTERNS['gsf_001']
@@ -186,6 +214,13 @@ print(f"Pattern: {cache_pattern['name']}")
  print(f"Category: {cache_pattern['category']}")
  print(f"Keywords: {cache_pattern['keywords']}")
  print(f"Impact: {cache_pattern['sci_impact']}")
+
+ # List all available categories
+ categories = set(p['category'] for p in GSF_PATTERNS.values())
+ print(f"Available categories: {sorted(categories)}")
+ # Output: ['ai', 'async', 'caching', 'cloud', 'code', 'data',
+ # 'database', 'general', 'infrastructure', 'microservices',
+ # 'monitoring', 'network', 'networking', 'resource', 'web']
  ```
 
  #### Generate Custom Reports
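The README examples above filter `GSF_PATTERNS` one category at a time with a dict comprehension, and they list category names with `set(...)`. When the size of every category is wanted, a single pass with `collections.Counter` is enough. The per-category counts always sum to the number of patterns, so any documented per-category figures can be checked against the total. The sketch assumes only the shape the README shows, a dict mapping pattern IDs to dicts with a `'category'` key. The sample entries and the `category_counts` name are invented.

```python
from collections import Counter
from typing import Dict

# Invented sample with the shape the README shows for GSF_PATTERNS:
# pattern_id -> dict with at least a 'category' key.
SAMPLE_PATTERNS: Dict[str, dict] = {
    "demo_001": {"name": "Demo cache pattern", "category": "cloud"},
    "demo_002": {"name": "Demo scaling pattern", "category": "cloud"},
    "demo_003": {"name": "Demo pruning pattern", "category": "ai"},
}


def category_counts(patterns: Dict[str, dict]) -> Counter:
    """Count patterns per category in a single pass."""
    return Counter(p["category"] for p in patterns.values())


counts = category_counts(SAMPLE_PATTERNS)
for category, n in sorted(counts.items()):
    print(f"{category}: {n}")  # prints "ai: 1", then "cloud: 2"

# Each pattern has exactly one category, so the counts add up to the total.
assert sum(counts.values()) == len(SAMPLE_PATTERNS)
```

Assuming `GSF_PATTERNS` has the shape the README shows, `category_counts(GSF_PATTERNS)` would return all category sizes at once.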
@@ -256,26 +291,127 @@ COMMITS_PER_REPO=50
  OUTPUT_DIR=./data
  ```
 
+ ## Features
+
+ ### Core Capabilities
+
+ - **Pattern Detection**: Automatically identifies 122 sustainability patterns across 15 categories
+ - **Keyword Analysis**: Scans commit messages using 321 green software keywords
+ - **Custom Repository Fetching**: Fetch repositories with custom search keywords (not limited to microservices)
+ - **Repository Analysis**: Analyzes repositories from GitHub with flexible filtering
+ - **Batch Processing**: Analyze hundreds of repositories and thousands of commits
+ - **Multi-format Output**: Generates Markdown reports, CSV exports, and JSON data
+ - **Statistical Analysis**: Calculates green-awareness metrics, pattern distribution, and trends
+ - **Docker Support**: Pre-built images for containerized analysis
+ - **Programmatic API**: Full Python API for custom workflows and integrations
+ - **Clean Architecture**: Modular design with services layer (Fetcher, Extractor, Analyzer, Aggregator, Reports)
+
+ ### Pattern Database
+
+ **122 green software patterns based on:**
+ - Green Software Foundation (GSF) Patterns Catalog
+ - VU Amsterdam 2024 research on ML system sustainability
+ - ICSE 2024 conference papers on sustainable software
+
+ ### Detection Performance
+
+ - **Coverage**: 67% of patterns actively detect in real-world commits
+ - **Accuracy**: 100% true positive rate for green-aware commits
+ - **Categories**: 15 distinct sustainability domains covered
+ - **Keywords**: 321 detection terms across all patterns
+
  ## GSF Pattern Categories
 
- - **Cloud** (40 patterns): Autoscaling, serverless, right-sizing, region selection
- - **Web** (15 patterns): CDN, caching, lazy loading, compression
- - **AI/ML** (8 patterns): Model optimization, pruning, quantization
- - **Database** (6 patterns): Indexing, query optimization, connection pooling
- - **Networking** (4 patterns): Protocol optimization, connection reuse
- - **General** (3 patterns): Code efficiency, resource management
+ **122 patterns across 15 categories:**
+
+ ### 1. Cloud (40 patterns)
+ Auto-scaling, serverless computing, right-sizing instances, region selection for renewable energy, spot instances, idle resource detection, cloud-native architectures
+
+ ### 2. Web (17 patterns)
+ CDN usage, caching strategies, lazy loading, asset compression, image optimization, minification, code splitting, tree shaking, prefetching
+
+ ### 3. AI/ML (19 patterns)
+ Model optimization, pruning, quantization, edge inference, batch optimization, efficient training, model compression, hardware acceleration, green ML pipelines
+
+ ### 4. Database (5 patterns)
+ Indexing strategies, query optimization, connection pooling, prepared statements, database views, denormalization for efficiency
+
+ ### 5. Networking (8 patterns)
+ Protocol optimization, connection reuse, HTTP/2, gRPC, efficient serialization, compression, persistent connections
+
+ ### 6. Network (6 patterns)
+ Request batching, GraphQL optimization, API gateway patterns, circuit breakers, rate limiting, request deduplication
+
+ ### 7. Caching (2 patterns)
+ Multi-level caching, cache invalidation strategies, data deduplication, distributed caching
+
+ ### 8. Resource (2 patterns)
+ Resource limits, dynamic allocation, memory management, CPU throttling
+
+ ### 9. Data (3 patterns)
+ Efficient serialization formats, pagination, streaming, data compression
+
+ ### 10. Async (3 patterns)
+ Event-driven architecture, reactive streams, polling elimination, non-blocking I/O
+
+ ### 11. Code (4 patterns)
+ Algorithm optimization, code efficiency, garbage collection tuning, memory profiling
+
+ ### 12. Monitoring (3 patterns)
+ Energy monitoring, performance profiling, APM tools, observability patterns
+
+ ### 13. Microservices (4 patterns)
+ Service decomposition, colocation strategies, graceful shutdown, service mesh optimization
+
+ ### 14. Infrastructure (4 patterns)
+ Alpine containers, Infrastructure as Code, renewable energy regions, container optimization
+
+ ### 15. General (8 patterns)
+ Feature flags, incremental processing, precomputation, background jobs, workflow optimization
 
  ## CLI Commands
 
- | Command | Description |
- |---------|-------------|
- | `fetch` | Fetch microservices repositories from GitHub |
- | `extract` | Extract commit history from repositories |
- | `analyze` | Analyze commits for green patterns |
- | `aggregate` | Aggregate analysis results |
- | `report` | Generate comprehensive report |
- | `pipeline` | Run complete analysis pipeline |
- | `status` | Show current analysis status |
+ | Command | Description | Key Options |
+ |---------|-------------|-------------|
+ | `fetch` | Fetch repositories from GitHub with custom keywords | `--max-repos`, `--min-stars`, `--languages`, `--keywords` |
+ | `extract` | Extract commit history from repositories | `--max-commits` per repository |
+ | `analyze` | Analyze commits for green patterns | Auto-detects patterns from 122-pattern database |
+ | `aggregate` | Aggregate analysis results | Generates statistics and summaries |
+ | `report` | Generate comprehensive report | Creates Markdown and CSV outputs |
+ | `pipeline` | Run complete analysis pipeline | `--max-repos`, `--max-commits` (all-in-one) |
+ | `status` | Show current analysis status | Displays progress and file statistics |
+
+ ### Command Details
+
+ #### Fetch Repositories
+ ```bash
+ # Fetch with custom search keywords
+ greenmining fetch --max-repos 100 --min-stars 50 --languages Python --keywords "kubernetes docker"
+
+ # Fetch microservices (default)
+ greenmining fetch --max-repos 100 --min-stars 50 --languages Python
+ ```
+ Options:
+ - `--max-repos`: Maximum repositories to fetch (default: 100)
+ - `--min-stars`: Minimum GitHub stars (default: 100)
+ - `--languages`: Filter by programming languages (default: "Python,Java,Go,JavaScript,TypeScript")
+ - `--keywords`: Custom search keywords (default: "microservices")
+
+ #### Extract Commits
+ ```bash
+ greenmining extract --max-commits 50
+ ```
+ Options:
+ - `--max-commits`: Maximum commits per repository (default: 50)
+
+ #### Run Pipeline
+ ```bash
+ greenmining pipeline --max-repos 50 --max-commits 100
+ ```
+ Options:
+ - `--max-repos`: Repositories to analyze
+ - `--max-commits`: Commits per repository
+ - Executes: fetch → extract → analyze → aggregate → report
 
  ## Output Files