ligamagic-scraper 0.6.0

data/README.md ADDED
@@ -0,0 +1,614 @@
1
+ # Liga Magic Scraper
2
+
3
+ A Ruby gem for scraping product data from Liga Magic's card listings using Capybara and Selenium.
4
+
5
+ - 💎 Can be used as a gem or standalone CLI tool
6
+ - 🖥️ Supports headed (visible) and headless browser modes
7
+ - 📁 Saves results to JSON files in the `scrapped/` directory
8
+ ## Roadmap
9
+
10
+ - [Global Ligamagic product scraping](#global-ligamagic-product-scraping)
10
+ - [Store-specific scraping](#store-specific-scraping)
11
+ - [Alert System](#alert-system)
12
+
13
+ ## Features
14
+
15
+ ### Global Ligamagic product scraping
16
+ - [X] Extracts product name, ID, and pricing information
17
+ - [X] Automatically clicks "Load More" to paginate through all products
18
+ - [X] Stops when unavailable products are encountered
19
+ - [X] Uses a global-search flag (`-g` or `--global`)
20
+
21
+ ### Store-specific scraping
23
+ - [X] Extracts product name and pricing information from any Liga Magic store
24
+ - [X] Accepts store domain name only (e.g., `test-store`)
25
+ - [X] Store search with search term support (`-u STORE -s TERM`)
26
+ - [X] Store listing with max pages limit (`-u STORE -p N`)
27
+ - [X] Automatic pagination with safety limits
28
+ - [X] Automatically orders by price (most expensive first)
29
+ - [X] Filters only in-stock items automatically
30
+ - [X] Automatically builds clean URLs with only necessary parameters
31
+ - [X] Uses `--store` or `-u` flag
32
+ - [X] Extracts store name and ID automatically
33
+ - [X] Corrected selectors for card-desc structure
34
+ - [!] **Limitation**: Store searches with search terms cannot extract price/qty due to CSS obfuscation (see docs)
35
+
36
+ ### Alert System
37
+ - [X] Detects changes between scrapes (new products, removed products, price changes, availability changes)
38
+ - [X] Compares current scrape with most recent previous scrape
39
+ - [X] Multiple alert types supported: file, telegram
40
+ - [X] CLI flag to enable alerts: `-a` or `--alerts`
41
+ - [X] Configurable alert types via `--alert-types`
42
+ - [X] File alert fully implemented with organized directory structure
43
+ - [ ] Telegram alert action - currently stub only
44
+
45
+ ## Requirements
46
+
47
+ - Ruby 2.7 or higher
48
+ - Chrome browser installed
49
+ - ChromeDriver (automatically managed by selenium-webdriver)
50
+
51
+ ## Installation
52
+
53
+ ### As a Global Command
54
+
55
+ Install the gem globally to use the `ligamagic-scraper` command anywhere:
56
+
57
+ ```bash
58
+ # Build and install the gem
59
+ gem build ligamagic-scraper.gemspec
60
+ gem install ./ligamagic-scraper-0.6.0.gem
61
+ ```
62
+
63
+ ### As a Library in Your Project
64
+
65
+ Add this line to your application's Gemfile:
66
+
67
+ ```ruby
68
+ gem 'ligamagic-scraper', path: '/path/to/ligamagic-scrapper'
69
+ ```
70
+
71
+ Then install the dependencies:
72
+
73
+ ```bash
74
+ bundle install
75
+ ```
76
+
77
+ ## Usage
78
+
79
+ ### Command Line Interface (Global Installation)
80
+
81
+ After installing the gem globally, you can use the `ligamagic-scraper` command from anywhere:
82
+
83
+ ```bash
84
+ ligamagic-scraper -s "booster box"
85
+ ```
86
+
87
+ **Search Options:**
88
+ - `-s`, `--search SEARCH` - Search term (global search or store search with -u)
89
+ - `-u`, `--store DOMAIN` - Store domain (e.g., `test-store`)
90
+ - `-p`, `--pages N` - Number of pages to scrape (max: 5; required with `-u` when `-s` is not given)
91
+
92
+ **Options:**
93
+ - `-g`, `--global` - Use global Liga Magic search (default, optional)
94
+ - `-b`, `--browser-mode MODE` - Browser mode: `headed` (default, visible browser) or `headless` (no UI)
95
+ - `-a`, `--alerts` - Enable alert system (detects changes from previous scrapes)
96
+ - `--alert-types TYPES` - Alert types: `file`, `email`, `telegram`, `webhook` (comma-separated, default: file)
97
+ - `-h`, `--help` - Show help message
98
+ - `--version` - Show version
99
+
100
+ **Usage Examples:**
101
+
102
+ | Use Case | Command | Pages | Browser |
103
+ |----------|---------|-------|---------|
104
+ | **Global Search** | | | |
105
+ | Basic global search | `ligamagic-scraper -s "booster box"` | Unlimited | Visible |
106
+ | Global search headless | `ligamagic-scraper -s "dual land" -b headless` | Unlimited | Headless |
107
+ | Global search with alerts | `ligamagic-scraper -s "commander" -a` | Unlimited | Visible |
108
+ | **Store Listing** | | | |
109
+ | List 2 pages | `ligamagic-scraper -u test-store -p 2` | 2 | Visible |
110
+ | List 5 pages (max) | `ligamagic-scraper -u test-store -p 5` | 5 | Visible |
111
+ | Request 10 pages (capped at 5) | `ligamagic-scraper -u test-store -p 10` | 5 (max) | Visible |
112
+ | List store headless | `ligamagic-scraper -u test-store -p 3 -b headless` | 3 | Headless |
113
+ | List store with alerts | `ligamagic-scraper -u test-store -p 4 -a` | 4 | Visible |
114
+ | **Store Search** | | | |
115
+ | Search within store | `ligamagic-scraper -u test-store -s "volcanic"` | Unlimited | Visible |
116
+ | Store search headless | `ligamagic-scraper -u test-store -s "force of will" -b headless` | Unlimited | Headless |
117
+ | Store search with alerts | `ligamagic-scraper -u test-store -s "mox" -a` | Unlimited | Visible |
118
+ | **Advanced** | | | |
119
+ | Multiple alert types | `ligamagic-scraper -s "lotus" -a --alert-types file,telegram` | Unlimited | Visible |
120
+ | Store with custom alerts | `ligamagic-scraper -u test-store -s "power nine" -a --alert-types file,email` | Unlimited | Visible |
121
+
122
+ **Search Modes:**
123
+
124
+ The gem supports three search modes:
125
+
126
+ | Mode | Flags | Pages | When to Use |
127
+ |------|-------|-------|-------------|
128
+ | **Global Search** | `-s TERM` | Unlimited | Search across all Liga Magic stores |
129
+ | **Store Listing** | `-u STORE -p N` | 1-5 (max) | List products from a specific store |
130
+ | **Store Search** | `-u STORE -s TERM` | Unlimited | Search for specific products within a store |
131
+
132
+ **Notes:**
133
+ - **Maximum 5 pages** allowed for store listings (values above 5 are capped)
134
+ - `-p 2` uses 2 pages
135
+ - `-p 5` uses 5 pages
136
+ - `-p 10` is capped to 5 pages
137
+ - Store searches with search term have unlimited pagination
138
+ - **Automatic filtering**: Store results are automatically:
139
+ - Ordered by price (most expensive first)
140
+ - Filtered to show only in-stock items
141
+ - **Limitation**: Store searches with search terms (`-u STORE -s TERM`) cannot extract price/qty
142
+ - Liga Magic uses CSS sprite obfuscation for anti-scraping protection
143
+ - Only card name and ID are extracted when searching
144
+ - Store listings (`-u STORE -p N`) work normally with full price/qty data
145
+ - Use `-b headless` for background scraping without browser UI
146
+
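The page cap described in the notes above can be sketched in a few lines of Ruby. This is only an illustration, not the gem's actual code: `MAX_STORE_PAGES` and `effective_pages` are hypothetical names, and treating values below 1 as 1 is an assumption.

```ruby
# Hypothetical helper illustrating the documented page cap for store listings.
MAX_STORE_PAGES = 5

def effective_pages(requested)
  # Integer#clamp keeps the value inside 1..MAX_STORE_PAGES.
  requested.to_i.clamp(1, MAX_STORE_PAGES)
end

puts effective_pages(2)   # 2
puts effective_pages(10)  # 5
```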
147
+ ### Using as a Library
148
+
149
+ You can also import and use the scraper in your Ruby projects:
150
+
151
+ #### Global Search
152
+
153
+ ```ruby
154
+ require 'ligamagic_scraper'
155
+
156
+ # Create a global scraper instance
157
+ scraper = LigaMagicScraper::GlobalScraper.new(
158
+ search_term: "booster box",
159
+ browser_mode: 'headless'
160
+ )
161
+
162
+ # Run the scrape
163
+ results = scraper.scrape
164
+
165
+ # Save results to JSON
166
+ scraper.save_to_json(results)
167
+
168
+ # Access the results
169
+ results.each do |product|
170
+ puts "#{product[:name]}: R$ #{product[:min_price]}"
171
+ end
172
+
173
+ # Access execution logs
174
+ scraper.formatted_logs.each { |log| puts log }
175
+ ```
176
+
177
+ #### Store-Specific Search
178
+
179
+ ```ruby
180
+ require 'ligamagic_scraper'
181
+
182
+ # Create a store scraper instance for listing (with max pages)
183
+ scraper = LigaMagicScraper::StoreScraper.new(
184
+ store_domain: "test-store",
185
+ max_pages: 5,
186
+ browser_mode: 'headless'
187
+ )
188
+
189
+ # Run the scrape
190
+ results = scraper.scrape
191
+
192
+ # Save results to JSON
193
+ scraper.save_to_json(results)
194
+
195
+ # Access the results
196
+ results.each do |product|
197
+ puts "[#{product[:card_id]}] #{product[:name]}: R$ #{product[:price]} (#{product[:qtd]} available)"
198
+ end
199
+
200
+ # Access execution logs
201
+ scraper.formatted_logs.each { |log| puts log }
202
+ ```
203
+
204
+ ### Development Usage (Without Installing)
205
+
206
+ If you're developing or testing, you can run the executable directly from the project:
207
+
208
+ ```bash
209
+ # With visible browser (default)
210
+ ./bin/ligamagic-scraper -s "booster box"
211
+
212
+ # With headless mode
213
+ ./bin/ligamagic-scraper -s "commander deck" -b headless
214
+
215
+ # Or with bundle exec
216
+ bundle exec ./bin/ligamagic-scraper -s "commander deck" -b headless
217
+ ```
218
+
219
+ ### Programmatic Logging
220
+
221
+ All scrapers collect execution logs that can be accessed programmatically:
222
+
223
+ ```ruby
224
+ scraper = LigaMagicScraper::GlobalScraper.new(search_term: "test")
225
+ results = scraper.scrape
226
+
227
+ # Access full log entries (with timestamp, level, source)
228
+ scraper.logs
229
+ # => [
230
+ # { timestamp: 2025-10-27 12:00:00, level: :info, message: "🚀 Starting...", source: "LigaMagicScraper::GlobalScraper" },
231
+ # ...
232
+ # ]
233
+
234
+ # Access just the messages (for display)
235
+ scraper.formatted_logs
236
+ # => ["🚀 Starting Liga Magic global search scraper...", "🔍 Search term: test", ...]
237
+
238
+ # Filter by level
239
+ error_logs = scraper.logs.select { |l| l[:level] == :error }
240
+
241
+ # Display logs
242
+ scraper.formatted_logs.each { |msg| puts msg }
243
+ ```
244
+
245
+ **Log Levels:**
246
+ - `:info` - Major milestones (starting, completing, saving)
247
+ - `:debug` - Detailed progress (pagination clicks, individual product extraction)
248
+ - `:warning` - Non-fatal issues (skipped items, limits reached)
249
+ - `:error` - Errors and exceptions
250
+
251
+ **Note:** The CLI automatically displays all logs during execution. When the gem is used as a library, logs are collected silently and can be accessed via `scraper.logs` or `scraper.formatted_logs`.
252
+
253
+ ## How it Works
254
+
255
+ The scraper will:
256
+ 1. Visit the Liga Magic search page with your search term
257
+ 2. Automatically click "Load More" until all available products are loaded
258
+ 3. Extract product names, IDs, slugs, and prices
259
+ 4. Display results in a formatted output
260
+ 5. Save results to a timestamped JSON file under `scrapped/` (for global searches, `scrapped/global/YYYYMMDD_HHMMSS__search_slug.json`)
261
+ 6. Return a Ruby array of hashes with the data
262
+
263
+ ## Output Format
264
+
265
+ ### Global Search Output
266
+
267
+ The global scraper returns an array of hashes with product details:
268
+
269
+ ```ruby
270
+ [
271
+ {
272
+ id: "48851",
273
+ slug: "caixa_de_booster_amonkhet",
274
+ name: "Caixa de Booster - Amonkhet",
275
+ min_price: 1499.0,
276
+ avg_price: 1499.0,
277
+ max_price: 1499.0
278
+ },
279
+ {
280
+ id: "80494",
281
+ slug: "caixa_de_booster_dominaria",
282
+ name: "Caixa de Booster - Dominaria",
283
+ min_price: 1399.99,
284
+ avg_price: 1482.66,
285
+ max_price: 1499.95
286
+ },
287
+ # ...
288
+ ]
289
+ ```
290
+
291
+ **Global Search Fields:**
292
+ - `id`: Unique product code from Liga Magic (pcode), e.g., "48851" (can be null)
293
+ - `slug`: URL-friendly version of product name (normalized, no accents, always present)
294
+ - `name`: Full product name in Portuguese
295
+ - `min_price`: Minimum price across all stores (float)
296
+ - `avg_price`: Average price across all stores (float)
297
+ - `max_price`: Maximum price across all stores (float)
298
+
299
+ ### Store-Specific Output
300
+
301
+ The store scraper returns an array of hashes with a different set of fields:
302
+
303
+ ```ruby
304
+ [
305
+ {
306
+ card_id: "16149",
307
+ name: "Volcanic Island",
308
+ slug: "volcanic_island",
309
+ price: 2500.0,
310
+ qtd: 1,
311
+ available: true
312
+ },
313
+ # ...
314
+ ]
315
+ ```
316
+
317
+ **Store-Specific Fields:**
318
+ - `card_id`: Unique card identifier from Liga Magic (extracted from link URL)
319
+ - `name`: Full product name
320
+ - `slug`: URL-friendly version of product name
321
+ - `price`: Product price at this store (float)
322
+ - `qtd`: Quantity available (integer)
323
+ - `available`: Whether the product is available (boolean, true when `qtd > 0`)
324
+
325
+ ### JSON File Format
326
+
327
+ Results are automatically saved to organized directories:
328
+
329
+ **Global Searches:**
330
+ - Directory: `scrapped/global/`
331
+ - Format: `YYYYMMDD_HHMMSS__search_slug.json`
332
+ - Examples:
333
+ - `scrapped/global/20251027_143022__booster_box.json`
334
+ - `scrapped/global/20251027_151530__commander_deck.json`
335
+
336
+ **Store Searches:**
337
+ - Directory: `scrapped/stores/{store_domain}/`
338
+ - Format (without search): `YYYYMMDD_HHMMSS.json`
339
+ - Format (with search): `YYYYMMDD_HHMMSS__search_slug.json`
340
+ - Examples:
341
+ - `scrapped/stores/test-store/20251027_143022.json` (store listing)
342
+ - `scrapped/stores/test-store/20251027_151530__volcanic_island.json` (store search)
343
+
344
+ The datetime format allows multiple scrapes per day, which is useful for price tracking and change detection.
345
+
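As a rough illustration of the naming scheme above, the following sketch builds a global-search file path from a search term and a time. `slugify` and `global_scrape_path` are hypothetical helpers written for this example; the gem's own slug and path logic may differ.

```ruby
# Hypothetical helpers; the gem's own slug and path logic may differ.
def slugify(text)
  text.unicode_normalize(:nfd)   # split accented letters into base + mark
      .gsub(/\p{Mn}/, "")        # drop the combining marks (accents)
      .downcase
      .gsub(/[^a-z0-9]+/, "_")   # collapse every other run of characters into "_"
      .gsub(/\A_+|_+\z/, "")     # trim leading and trailing underscores
end

def global_scrape_path(search_term, time = Time.now)
  stamp = time.strftime("%Y%m%d_%H%M%S")
  File.join("scrapped", "global", "#{stamp}__#{slugify(search_term)}.json")
end

puts global_scrape_path("booster box", Time.new(2025, 10, 27, 14, 30, 22))
# scrapped/global/20251027_143022__booster_box.json
```

Because the timestamp comes first and is zero-padded, sorting these file names alphabetically also sorts them chronologically.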
346
+ JSON structure:
347
+ ```json
348
+ {
349
+ "search_term": "booster box",
350
+ "scraped_at": "2025-10-26T14:30:00-03:00",
351
+ "total_products": 42,
352
+ "products": [
353
+ {
354
+ "id": "48851",
355
+ "slug": "caixa_de_booster_amonkhet",
356
+ "name": "Caixa de Booster - Amonkhet",
357
+ "min_price": 1499.0,
358
+ "avg_price": 1499.0,
359
+ "max_price": 1499.0
360
+ }
361
+ ]
362
+ }
363
+ ```
364
+
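A saved file with the structure above can be read back using Ruby's standard `json` library. The sketch below parses an inline sample instead of a real file so that it runs on its own; the sample values are copied from the example above, with `total_products` set to 1 to match the single product shown.

```ruby
require 'json'

# Inline sample mirroring the documented structure; in practice the text
# would come from File.read on a file under scrapped/.
json_text = <<~JSON
  {
    "search_term": "booster box",
    "scraped_at": "2025-10-26T14:30:00-03:00",
    "total_products": 1,
    "products": [
      { "id": "48851", "slug": "caixa_de_booster_amonkhet",
        "name": "Caixa de Booster - Amonkhet",
        "min_price": 1499.0, "avg_price": 1499.0, "max_price": 1499.0 }
    ]
  }
JSON

data = JSON.parse(json_text, symbolize_names: true)

# Pick the product with the lowest minimum price.
cheapest = data[:products].min_by { |product| product[:min_price] }
puts "#{data[:search_term]}: #{cheapest[:name]} from R$ #{cheapest[:min_price]}"
```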
365
+ ## Alert System
366
+
367
+ The alert system automatically detects changes between scrapes and can notify you through multiple channels.
368
+
369
+ ### How It Works
370
+
371
+ 1. **Enable alerts** with the `-a` or `--alerts` flag
372
+ 2. The system **automatically finds** the most recent previous scrape file
373
+ 3. **Compares** the current scrape with the previous one
374
+ 4. **Detects changes**:
375
+ - 🆕 New products added
376
+ - ❌ Products removed
377
+ - 💰 Price changes (with percentage change)
378
+ - 📊 Quantity changes (for store scraper)
379
+ - 📦 Availability changes
380
+ 5. **Sends notifications** via configured alert types
381
+
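Step 2 above (finding the most recent previous scrape) could look roughly like the sketch below. It assumes the file naming described under "JSON File Format"; `previous_scrape` is a hypothetical helper, not part of the gem's documented API.

```ruby
# Hypothetical helper: picks the newest earlier scrape for the same search.
# File names start with YYYYMMDD_HHMMSS, so comparing them as plain strings
# also compares them chronologically.
def previous_scrape(dir, slug, current_file)
  Dir.glob(File.join(dir, "*__#{slug}.json"))
     .reject { |path| File.basename(path) == File.basename(current_file) }
     .max_by { |path| File.basename(path) }
end
```

The helper returns `nil` when no earlier file exists, which a caller could treat as "nothing to compare against yet".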
382
+ ### Alert Types
383
+
384
+ - **file**: Save alert to JSON file in `alerts_json/` directory (implemented)
385
+ - **telegram**: Send Telegram message (stub - not implemented yet)
386
+
387
+ ### CLI Usage
388
+
389
+ ```bash
390
+ # Basic alert (uses file alert)
391
+ ligamagic-scraper -s "commander deck" -a
392
+
393
+ # Multiple alert types
394
+ ligamagic-scraper -u test-store -s "volcanic" -a --alert-types file,email,telegram
395
+
396
+ # Alerts with headless mode
397
+ ligamagic-scraper -s "booster box" -a -b headless
398
+ ```
399
+
400
+ ### Library Usage
401
+
402
+ ```ruby
403
+ require 'ligamagic_scraper'
404
+
405
+ # Configure alerts
406
+ alert_config = {
407
+ enabled: true,
408
+ alert_types: [:file, :email],
409
+ compare_previous: true
410
+ }
411
+
412
+ # Create scraper with alerts
413
+ scraper = LigaMagicScraper::GlobalScraper.new(
414
+ search_term: "commander deck",
415
+ alert_config: alert_config
416
+ )
417
+
418
+ results = scraper.scrape
419
+ scraper.save_to_json(results) # Automatically processes alerts
420
+ ```
421
+
422
+ ### Detected Changes Format
423
+
424
+ ```ruby
425
+ {
426
+ has_changes: true,
427
+ timestamp: "2025-10-27T...",
428
+ search_type: "global",
429
+ search_term: "commander deck",
430
+ new_products: [...], # Products that weren't in previous scrape
431
+ removed_products: [...], # Products that are no longer available
432
+ price_changes: [ # Products with price changes
433
+ {
434
+ id: "12345",
435
+ name: "Product Name",
436
+ previous_min: 100.0,
437
+ current_min: 85.0,
438
+ change: -15.0,
439
+ change_percent: -15.0
440
+ }
441
+ ],
442
+ quantity_changes: [ # Quantity changes (store scraper only)
443
+ {
444
+ id: "16149",
445
+ name: "Volcanic Island",
446
+ previous_qtd: 3,
447
+ current_qtd: 1,
448
+ change: -2
449
+ }
450
+ ],
451
+ availability_changes: [...] # Products that became available/unavailable
452
+ }
453
+ ```
454
+
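To make the format above concrete, here is a minimal sketch of how two global scrapes might be compared. It is not the gem's implementation: `detect_changes` is a hypothetical function, it assumes every product has a non-null `id`, and it covers only new products, removed products, and minimum-price changes.

```ruby
# Hypothetical comparison of two global scrapes (arrays of product hashes).
def detect_changes(previous, current)
  prev_by_id = previous.to_h { |item| [item[:id], item] }
  curr_by_id = current.to_h { |item| [item[:id], item] }

  # Products present in both scrapes whose minimum price differs.
  price_changes = curr_by_id.filter_map do |id, product|
    old = prev_by_id[id]
    next if old.nil? || old[:min_price] == product[:min_price]

    change = (product[:min_price] - old[:min_price]).round(2)
    percent = old[:min_price].zero? ? nil : (change / old[:min_price] * 100).round(2)
    { id: id, name: product[:name], previous_min: old[:min_price],
      current_min: product[:min_price], change: change, change_percent: percent }
  end

  new_products = current.reject { |item| prev_by_id.key?(item[:id]) }
  removed_products = previous.reject { |item| curr_by_id.key?(item[:id]) }

  { has_changes: [new_products, removed_products, price_changes].any? { |list| !list.empty? },
    new_products: new_products,
    removed_products: removed_products,
    price_changes: price_changes }
end
```

With a previous minimum of 100.0 and a current minimum of 85.0, this yields `change: -15.0` and `change_percent: -15.0`, matching the example above.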
455
+ ### Alert Output
456
+
457
+ **File Alert** saves changes to organized directories:
458
+
459
+ **Global Searches:**
460
+ - Directory: `alerts_json/global/`
461
+ - Example: `alerts_json/global/20251027_143022.json`
462
+
463
+ **Store Searches:**
464
+ - Directory: `alerts_json/stores/{store_domain}/`
465
+ - Example: `alerts_json/stores/kamm-store/20251027_143022.json`
466
+
467
+ Each alert file contains full change details as JSON.
468
+
469
+ **Example Alert File:**
470
+ ```json
471
+ {
472
+ "has_changes": true,
473
+ "timestamp": "2025-10-27T14:30:22-03:00",
474
+ "search_type": "global",
475
+ "search_term": "commander deck",
476
+ "new_products": [...],
477
+ "removed_products": [...],
478
+ "price_changes": [...],
479
+ "quantity_changes": [...],
480
+ "availability_changes": [...]
481
+ }
482
+ ```
483
+
484
+ ## Configuration
485
+
486
+ The scraper uses:
487
+ - **Driver**: Selenium Chrome (headed mode by default, headless mode available)
488
+ - **Browser Modes**:
489
+ - `headed` (default): Visible browser window for monitoring
490
+ - `headless`: No UI, ideal for servers and automation
491
+ - **Wait Time**: 10 seconds for dynamic content
492
+ - **Base URL**: `https://www.ligamagic.com.br/?view=cards%2Fsearch&tipo=1`
493
+ - **Max Pagination**: 50 clicks maximum (configurable via `MAX_CLICKS` constant)
494
+ - **Output Directory**: `scrapped/` (created automatically)
495
+
496
+ ## Development
497
+
498
+ After checking out the repo, run `bundle install` to install dependencies.
499
+
500
+ ### Manual Build and Install
501
+
502
+ To build and install the gem manually:
503
+
504
+ ```bash
505
+ gem build ligamagic-scraper.gemspec
506
+ gem install ./ligamagic-scraper-0.6.0.gem
507
+ ```
508
+
509
+ To uninstall:
510
+
511
+ ```bash
512
+ gem uninstall ligamagic-scraper
513
+ ```
514
+
515
+ ## Version Management & Releases
516
+
517
+ The gem includes automated Rake tasks for version management following [Semantic Versioning](https://semver.org/).
518
+
519
+ ### Quick Release (Recommended)
520
+
521
+ The easiest way to release a new version is using the automated release tasks:
522
+
523
+ ```bash
524
+ # For bug fixes (0.1.0 -> 0.1.1)
525
+ rake release_patch
526
+
527
+ # For new features (0.1.0 -> 0.2.0)
528
+ rake release_minor
529
+
530
+ # For breaking changes (0.1.0 -> 1.0.0)
531
+ rake release_major
532
+ ```
533
+
534
+ These commands will automatically:
535
+ 1. Bump the version number
536
+ 2. Clean old gem files
537
+ 3. Build the new gem
538
+ 4. Install it locally
539
+
540
+ ### Check Current Version
541
+
542
+ ```bash
543
+ rake version
544
+ # Output: Current version: 0.1.0
545
+ ```
546
+
547
+ ### Manual Version Bumping
548
+
549
+ If you just want to bump the version without building:
550
+
551
+ ```bash
552
+ # Bump patch version (0.1.0 -> 0.1.1)
553
+ rake bump_patch
554
+
555
+ # Bump minor version (0.1.0 -> 0.2.0)
556
+ rake bump_minor
557
+
558
+ # Bump major version (0.1.0 -> 1.0.0)
559
+ rake bump_major
560
+ ```
561
+
562
+ ### Build & Install Tasks
563
+
564
+ ```bash
565
+ # Build and install locally (recommended)
566
+ rake install_local
567
+
568
+ # Just build (without installing)
569
+ rake build
570
+
571
+ # Clean old gem files
572
+ rake clean
573
+
574
+ # Uninstall the gem
575
+ rake uninstall
576
+ ```
577
+
578
+ ### Semantic Versioning Guide
579
+
580
+ - **Patch** (X.X.1): Bug fixes, small tweaks, no new features
581
+ - **Minor** (X.1.0): New features, backward compatible changes
582
+ - **Major** (1.0.0): Breaking changes, major redesigns
583
+
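The three bump levels follow the usual Semantic Versioning arithmetic, which can be illustrated as below. `bump_version` is a hypothetical helper for this example and is not taken from the gem's Rakefile.

```ruby
# Hypothetical helper mirroring what a bump task does to a version string.
def bump_version(version, part)
  major, minor, patch = version.split(".").map(&:to_i)
  case part
  when :major then "#{major + 1}.0.0"                 # resets minor and patch
  when :minor then "#{major}.#{minor + 1}.0"          # resets patch
  when :patch then "#{major}.#{minor}.#{patch + 1}"
  else raise ArgumentError, "unknown part: #{part.inspect}"
  end
end

puts bump_version("0.1.0", :patch)  # 0.1.1
puts bump_version("0.1.0", :minor)  # 0.2.0
puts bump_version("0.1.0", :major)  # 1.0.0
```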
584
+ ### Example Workflow
585
+
586
+ ```bash
587
+ # 1. Make your code changes
588
+ # 2. Update CHANGELOG.md with your changes
589
+ # 3. Release the new version
590
+ rake release_patch
591
+
592
+ # Output:
593
+ # 📈 Version bumped: 0.1.0 -> 0.1.1
594
+ # 🧹 Cleaning old gem files...
595
+ # 🔨 Building gem...
596
+ # 📦 Installing ligamagic-scraper-0.1.1.gem...
597
+ # ✅ Gem installed successfully!
598
+ ```
599
+
600
+ ### All Available Rake Tasks
601
+
602
+ To see all available tasks:
603
+
604
+ ```bash
605
+ rake -T
606
+ ```
607
+
608
+ ## License
609
+
610
+ The gem is available as open source under the terms of the [MIT License](LICENSE).
611
+
612
+ ## Contributing
613
+
614
+ Bug reports and pull requests are welcome!