@jambudipa/spider 0.2.0 → 0.2.1
- package/README.md +81 -35
- package/dist/index.js +832 -111
- package/dist/index.js.map +1 -1
- package/package.json +4 -2
package/README.md CHANGED
@@ -1,6 +1,14 @@
 # @jambudipa/spider
 
-
+[](https://github.com/jambudipa/spider/actions)
+[](https://codecov.io/gh/jambudipa/spider)
+[](https://badge.fury.io/js/@jambudipa%2Fspider)
+[](https://nodejs.org/)
+[](https://opensource.org/licenses/MIT)
+
+A powerful, Effect-based web crawling framework for modern TypeScript applications. Built for type safety, composability, and enterprise-scale crawling operations.
+
+> **⚠️ Pre-Release API**: Spider is currently in pre-release development (v0.x.x). The API may change frequently as we refine the library towards a stable v1.0.0 release. Consider this when using Spider in production environments and expect potential breaking changes in minor version updates.
 
 ## 🏆 **Battle-Tested Against Real-World Scenarios**
 
@@ -25,11 +33,20 @@ A powerful, Effect.js-based web crawling framework for modern TypeScript applica
 | **Invalid Referer Blocking** | Header-based access control | Anti-Block |
 | **Persistent Cookie Blocking** | Long-term blocking mechanisms | Anti-Block |
 
-🎯 **[View Live Test Results](https://github.com/jambudipa/spider/actions)** | 📊 **
+🎯 **[View Live Test Results](https://github.com/jambudipa/spider/actions/workflows/ci.yml)** | 📊 **All Scenario Tests Passing** | 🚀 **Production Ready**
+
+> **Live Testing**: Our CI pipeline runs all 16 web scraping scenarios against real websites daily, ensuring Spider remains robust against changing web technologies.
+
+### 🔍 **Current Status** (Updated: Aug 2025)
+- ✅ **Core Functionality**: All web scraping scenarios working
+- ✅ **Type Safety**: Full TypeScript compilation without errors
+- ✅ **Build System**: Package builds successfully for distribution
+- ✅ **Test Suite**: 92+ scenario tests passing against live websites
+- ⚠️ **Code Quality**: 1,163 linting issues identified (technical debt - does not affect functionality)
 
 ## ✨ Key Features
 
-- **🔥 Effect
+- **🔥 Effect Foundation**: Type-safe, functional composition with robust error handling
 - **⚡ High Performance**: Concurrent crawling with intelligent worker pool management
 - **🤖 Robots.txt Compliant**: Automatic robots.txt parsing and compliance checking
 - **🔄 Resumable Crawls**: State persistence and crash recovery capabilities
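Reviewer note on the "Robots.txt Compliant" feature listed in the hunk above: the diff does not show how Spider parses or enforces robots.txt, so the following is only a minimal sketch of the general technique, assuming plain prefix rules with longest-match precedence. The names `RobotsRules`, `parseRobots` and `isAllowed` are hypothetical and are not part of Spider's API. Wildcards (`*`, `$`) and "most specific group wins" selection are deliberately left out.

```typescript
// Hypothetical sketch, NOT Spider's implementation.
// Collects Allow/Disallow prefixes from every group that matches the agent.
interface RobotsRules {
  allow: string[];
  disallow: string[];
}

function parseRobots(text: string, userAgent: string): RobotsRules {
  const rules: RobotsRules = { allow: [], disallow: [] };
  const agent = userAgent.toLowerCase();
  let applies = false;      // does the current group apply to our agent?
  let inAgentLines = false; // still reading consecutive User-agent lines?

  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*$/, "").trim(); // drop comments
    const sep = line.indexOf(":");
    if (sep === -1) continue; // blank or malformed line
    const field = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();

    if (field === "user-agent") {
      if (!inAgentLines) applies = false; // a new group starts here
      inAgentLines = true;
      const token = value.toLowerCase();
      if (token === "*" || (token !== "" && agent.includes(token))) {
        applies = true;
      }
    } else {
      inAgentLines = false;
      if (!applies || value === "") continue; // empty Disallow allows everything
      if (field === "allow") rules.allow.push(value);
      if (field === "disallow") rules.disallow.push(value);
    }
  }
  return rules;
}

// Longest matching prefix wins; Allow wins a tie; no match means allowed.
function isAllowed(rules: RobotsRules, path: string): boolean {
  const longest = (prefixes: string[]): number =>
    prefixes
      .filter((prefix) => path.startsWith(prefix))
      .reduce((max, prefix) => Math.max(max, prefix.length), -1);
  return longest(rules.allow) >= longest(rules.disallow);
}

const robots = `
User-agent: *
Disallow: /private/
Allow: /private/public/
`;
const rules = parseRobots(robots, "example-bot");
console.log(isAllowed(rules, "/private/data"));        // false
console.log(isAllowed(rules, "/private/public/page")); // true
```

A crawler would typically fetch `/robots.txt` once per host, cache the parsed rules, and call a check like `isAllowed` before queueing each URL. Whether Spider does exactly that is not stated in this diff.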
@@ -71,22 +88,30 @@ Effect.runPromise(program.pipe(
 ))
 ```
 
-##
+## 📚 Documentation
 
-
-- **[Getting Started Guide](./docs/guides/getting-started.md)** - Complete setup and first crawl
-- **[Examples](./docs/examples/)** - Working examples to get you started
-- **[Basic Configuration](./docs/guides/configuration.md)** - Configuration options
+**Comprehensive documentation is now available** following the [Diátaxis framework](https://diataxis.fr/) for better learning and reference:
 
-###
-
-- **[Anti-Bot Protection](./docs/guides/anti-bot.md)** - Bypass blocking mechanisms
-- **[Security Handling](./docs/guides/security.md)** - Authentication and sessions
+### 🎓 New to Spider?
+Start with our **[Tutorial](./docs/tutorial/getting-started.md)** - a hands-on guide that takes you from installation to building advanced scrapers.
 
-###
-
-- **[
-- **[
+### 📋 Need to solve a specific problem?
+Check our **[How-to Guides](./docs/how-to/)** for targeted solutions:
+- **[Authentication](./docs/how-to/authentication.md)** - Handle logins, sessions, and auth flows
+- **[Data Extraction](./docs/how-to/data-extraction.md)** - Extract structured data from HTML
+- **[Resumable Operations](./docs/how-to/resumable-operations.md)** - Build fault-tolerant crawlers
+
+### 📚 Need technical details?
+See our **[Reference Documentation](./docs/reference/)**:
+- **[API Reference](./docs/reference/api-reference.md)** - Complete API documentation
+- **[Configuration](./docs/reference/configuration.md)** - All configuration options
+
+### 🧠 Want to understand the design?
+Read our **[Explanations](./docs/explanation/)**:
+- **[Architecture](./docs/explanation/architecture.md)** - System design and philosophy
+- **[Web Scraping Concepts](./docs/explanation/web-scraping-concepts.md)** - Core principles
+
+**📖 [Browse All Documentation →](./docs/README.md)**
 
 ## 🛠️ Quick Configuration
 
@@ -382,46 +407,67 @@ npm install
 # Build the package
 npm run build
 
-# Run tests
+# Run tests (all scenarios)
 npm test
 
 # Run tests with coverage
 npm run test:coverage
 
-# Type checking
+# Type checking (must pass)
 npm run typecheck
 
-#
+# Validate CI setup locally
+npm run ci:validate
+
+# Code quality (has known issues)
+npm run lint # Shows 1,163 issues
+npm run format # Formats code consistently
+```
+
+### 🛠️ Contributing & Code Quality
+
+**Current State**: The codebase is fully functional with comprehensive test coverage, but has technical debt in code style consistency.
+
+- ✅ **Functional Changes**: All PRs must pass scenario tests
+- ✅ **Type Safety**: TypeScript compilation must succeed
+- ✅ **Build System**: Package must build without errors
+- 🔄 **Code Style**: Help wanted fixing linting issues (great first contribution!)
+
+**Contributing to Code Quality**:
+```bash
+# See specific linting issues
 npm run lint
 
-#
-npm run
+# Fix auto-fixable issues
+npm run lint:fix
+
+# Focus areas for improvement:
+# - Unused variable cleanup (877 issues)
+# - Return type annotations (286 issues)
+# - Nullish coalescing operators
+# - Console.log removal in production code
 ```
 
 ## License
 
 MIT License - see [LICENSE](LICENSE) file for details.
 
-## 📚 Documentation
+## 📚 Complete Documentation
 
-
+All documentation is organized in the [`/docs`](./docs/) directory following the [Diátaxis framework](https://diataxis.fr/):
 
-
--
--
--
+- **🎓 [Tutorial](./docs/tutorial/)** - Learning-oriented lessons for getting started
+- **📋 [How-to Guides](./docs/how-to/)** - Problem-solving guides for specific tasks
+- **📚 [Reference](./docs/reference/)** - Technical reference and API documentation
+- **🧠 [Explanation](./docs/explanation/)** - Understanding-oriented documentation
 
-
-- **[Documentation Index](./docs/README.md)** - Overview of all available documentation
-- **[User Guides](./docs/guides/)** - Step-by-step tutorials and best practices
-- **[Feature Documentation](./docs/features/)** - Deep dives into key capabilities
-- **[Advanced Examples](./docs/examples/)** - Real-world usage patterns
+**📖 [Start with the Documentation Index →](./docs/README.md)**
 
 ## Support
 
-- [GitHub Issues](https://github.com/jambudipa/spider/issues)
-- [
-- [
+- [GitHub Issues](https://github.com/jambudipa/spider/issues) - Bug reports and feature requests
+- [Documentation](./docs/) - Comprehensive guides and reference material
+- [Tutorial](./docs/tutorial/getting-started.md) - Step-by-step learning guide
 
 ---
 
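Reviewer note on the "Nullish coalescing operators" focus area added in the Contributing section of this diff: the README does not say which lint rule reports these issues or show an offending line, so the following is only an illustrative sketch of why such a cleanup can change behavior and is not purely cosmetic. `DelayOptions`, `delayWithOr` and `delayWithNullish` are hypothetical names, not Spider code.

```typescript
// Hypothetical option bag; not taken from Spider's configuration types.
interface DelayOptions {
  delayMs?: number;
}

// `||` falls back on ANY falsy value, so an explicit 0 is silently replaced.
function delayWithOr(options: DelayOptions): number {
  return options.delayMs || 1000;
}

// `??` falls back only on null or undefined, so an explicit 0 is kept.
function delayWithNullish(options: DelayOptions): number {
  return options.delayMs ?? 1000;
}

console.log(delayWithOr({ delayMs: 0 }));      // 1000
console.log(delayWithNullish({ delayMs: 0 })); // 0
console.log(delayWithNullish({}));             // 1000
```

Because the two operators differ for `0`, `""` and `false`, each replacement is worth reviewing individually instead of being applied blindly by an auto-fixer.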