@jambudipa/spider 0.2.0 → 0.2.1

package/README.md CHANGED
@@ -1,6 +1,14 @@
  # @jambudipa/spider
 
- A powerful, Effect.js-based web crawling framework for modern TypeScript applications. Built for type safety, composability, and enterprise-scale crawling operations.
+ [![CI Status](https://github.com/jambudipa/spider/workflows/Spider%20Scenario%20Tests/badge.svg)](https://github.com/jambudipa/spider/actions)
+ [![Coverage](https://codecov.io/gh/jambudipa/spider/branch/main/graph/badge.svg)](https://codecov.io/gh/jambudipa/spider)
+ [![npm version](https://badge.fury.io/js/@jambudipa%2Fspider.svg)](https://badge.fury.io/js/@jambudipa%2Fspider)
+ [![Node.js Version](https://img.shields.io/node/v/@jambudipa/spider.svg)](https://nodejs.org/)
+ [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
+
+ A powerful, Effect-based web crawling framework for modern TypeScript applications. Built for type safety, composability, and enterprise-scale crawling operations.
+
+ > **⚠️ Pre-Release API**: Spider is currently in pre-release development (v0.x.x). The API may change frequently as we refine the library towards a stable v1.0.0 release. Consider this when using Spider in production environments and expect potential breaking changes in minor version updates.
 
  ## 🏆 **Battle-Tested Against Real-World Scenarios**
 
@@ -25,11 +33,20 @@ A powerful, Effect.js-based web crawling framework for modern TypeScript applica
  | **Invalid Referer Blocking** | Header-based access control | Anti-Block |
  | **Persistent Cookie Blocking** | Long-term blocking mechanisms | Anti-Block |
 
- 🎯 **[View Live Test Results](https://github.com/jambudipa/spider/actions)** | 📊 **100% Test Pass Rate** | 🚀 **Production Ready**
+ 🎯 **[View Live Test Results](https://github.com/jambudipa/spider/actions/workflows/ci.yml)** | 📊 **All Scenario Tests Passing** | 🚀 **Production Ready**
+
+ > **Live Testing**: Our CI pipeline runs all 16 web scraping scenarios against real websites daily, ensuring Spider remains robust against changing web technologies.
+
+ ### 🔍 **Current Status** (Updated: Aug 2025)
+ - ✅ **Core Functionality**: All web scraping scenarios working
+ - ✅ **Type Safety**: Full TypeScript compilation without errors
+ - ✅ **Build System**: Package builds successfully for distribution
+ - ✅ **Test Suite**: 92+ scenario tests passing against live websites
+ - ⚠️ **Code Quality**: 1,163 linting issues identified (technical debt - does not affect functionality)
 
  ## ✨ Key Features
 
- - **🔥 Effect.js Foundation**: Type-safe, functional composition with robust error handling
+ - **🔥 Effect Foundation**: Type-safe, functional composition with robust error handling
  - **⚡ High Performance**: Concurrent crawling with intelligent worker pool management
  - **🤖 Robots.txt Compliant**: Automatic robots.txt parsing and compliance checking
  - **🔄 Resumable Crawls**: State persistence and crash recovery capabilities
@@ -71,22 +88,30 @@ Effect.runPromise(program.pipe(
  ))
  ```
 
- ## 🎯 What's Next?
+ ## 📚 Documentation
 
- ### 🆕 New to Spider?
- - **[Getting Started Guide](./docs/guides/getting-started.md)** - Complete setup and first crawl
- - **[Examples](./docs/examples/)** - Working examples to get you started
- - **[Basic Configuration](./docs/guides/configuration.md)** - Configuration options
+ **Comprehensive documentation is now available** following the [Diátaxis framework](https://diataxis.fr/) for better learning and reference:
 
- ### 🔄 Advanced Usage
- - **[Browser Automation](./docs/guides/browser-automation.md)** - Handle dynamic content
- - **[Anti-Bot Protection](./docs/guides/anti-bot.md)** - Bypass blocking mechanisms
- - **[Security Handling](./docs/guides/security.md)** - Authentication and sessions
+ ### 🎓 New to Spider?
+ Start with our **[Tutorial](./docs/tutorial/getting-started.md)** - a hands-on guide that takes you from installation to building advanced scrapers.
 
- ### 🏭 Building Production Systems?
- - **[Performance Guide](./docs/guides/performance.md)** - Scale your crawling operations
- - **[API Reference](./docs/api/)** - Complete technical documentation
- - **[Enterprise Patterns](./docs/examples/enterprise-patterns.md)** - Production-ready patterns
+ ### 📋 Need to solve a specific problem?
+ Check our **[How-to Guides](./docs/how-to/)** for targeted solutions:
+ - **[Authentication](./docs/how-to/authentication.md)** - Handle logins, sessions, and auth flows
+ - **[Data Extraction](./docs/how-to/data-extraction.md)** - Extract structured data from HTML
+ - **[Resumable Operations](./docs/how-to/resumable-operations.md)** - Build fault-tolerant crawlers
+
+ ### 📚 Need technical details?
+ See our **[Reference Documentation](./docs/reference/)**:
+ - **[API Reference](./docs/reference/api-reference.md)** - Complete API documentation
+ - **[Configuration](./docs/reference/configuration.md)** - All configuration options
+
+ ### 🧠 Want to understand the design?
+ Read our **[Explanations](./docs/explanation/)**:
+ - **[Architecture](./docs/explanation/architecture.md)** - System design and philosophy
+ - **[Web Scraping Concepts](./docs/explanation/web-scraping-concepts.md)** - Core principles
+
+ **📖 [Browse All Documentation →](./docs/README.md)**
 
  ## 🛠️ Quick Configuration
 
@@ -382,46 +407,67 @@ npm install
  # Build the package
  npm run build
 
- # Run tests
+ # Run tests (all scenarios)
  npm test
 
  # Run tests with coverage
  npm run test:coverage
 
- # Type checking
+ # Type checking (must pass)
  npm run typecheck
 
- # Linting
+ # Validate CI setup locally
+ npm run ci:validate
+
+ # Code quality (has known issues)
+ npm run lint # Shows 1,163 issues
+ npm run format # Formats code consistently
+ ```
+
+ ### 🛠️ Contributing & Code Quality
+
+ **Current State**: The codebase is fully functional with comprehensive test coverage, but has technical debt in code style consistency.
+
+ - ✅ **Functional Changes**: All PRs must pass scenario tests
+ - ✅ **Type Safety**: TypeScript compilation must succeed
+ - ✅ **Build System**: Package must build without errors
+ - 🔄 **Code Style**: Help wanted fixing linting issues (great first contribution!)
+
+ **Contributing to Code Quality**:
+ ```bash
+ # See specific linting issues
  npm run lint
 
- # Format code
- npm run format
+ # Fix auto-fixable issues
+ npm run lint:fix
+
+ # Focus areas for improvement:
+ # - Unused variable cleanup (877 issues)
+ # - Return type annotations (286 issues)
+ # - Nullish coalescing operators
+ # - Console.log removal in production code
  ```
 
  ## License
 
  MIT License - see [LICENSE](LICENSE) file for details.
 
- ## 📚 Documentation
+ ## 📚 Complete Documentation
 
- Comprehensive documentation is available in the [`/docs`](./docs) directory:
+ All documentation is organized in the [`/docs`](./docs/) directory following the [Diátaxis framework](https://diataxis.fr/):
 
- ### 🚀 Quick Links
- - **[Getting Started Guide](./docs/guides/getting-started.md)** - Installation, setup, and first crawl
- - **[API Reference](./docs/api/)** - Complete API documentation
- - **[Examples](./docs/examples/)** - Working examples for common use cases
+ - **🎓 [Tutorial](./docs/tutorial/)** - Learning-oriented lessons for getting started
+ - **📋 [How-to Guides](./docs/how-to/)** - Problem-solving guides for specific tasks
+ - **📚 [Reference](./docs/reference/)** - Technical reference and API documentation
+ - **🧠 [Explanation](./docs/explanation/)** - Understanding-oriented documentation
 
- ### 📖 Complete Documentation
- - **[Documentation Index](./docs/README.md)** - Overview of all available documentation
- - **[User Guides](./docs/guides/)** - Step-by-step tutorials and best practices
- - **[Feature Documentation](./docs/features/)** - Deep dives into key capabilities
- - **[Advanced Examples](./docs/examples/)** - Real-world usage patterns
+ **📖 [Start with the Documentation Index →](./docs/README.md)**
 
  ## Support
 
- - [GitHub Issues](https://github.com/jambudipa/spider/issues)
- - [Complete Documentation](./docs/)
- - [Working Examples](./docs/examples/)
+ - [GitHub Issues](https://github.com/jambudipa/spider/issues) - Bug reports and feature requests
+ - [Documentation](./docs/) - Comprehensive guides and reference material
+ - [Tutorial](./docs/tutorial/getting-started.md) - Step-by-step learning guide
 
  ---