@booklib/core 2.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (374)
  1. package/.cursor/rules/booklib-standards.mdc +40 -0
  2. package/.gemini/context.md +372 -0
  3. package/AGENTS.md +166 -0
  4. package/CHANGELOG.md +226 -0
  5. package/CLAUDE.md +81 -0
  6. package/CODE_OF_CONDUCT.md +31 -0
  7. package/CONTRIBUTING.md +304 -0
  8. package/LICENSE +21 -0
  9. package/PLAN.md +28 -0
  10. package/README.ja.md +198 -0
  11. package/README.ko.md +198 -0
  12. package/README.md +503 -0
  13. package/README.pt-BR.md +198 -0
  14. package/README.uk.md +241 -0
  15. package/README.zh-CN.md +198 -0
  16. package/SECURITY.md +9 -0
  17. package/agents/architecture-reviewer.md +136 -0
  18. package/agents/booklib-reviewer.md +90 -0
  19. package/agents/data-reviewer.md +107 -0
  20. package/agents/jvm-reviewer.md +146 -0
  21. package/agents/python-reviewer.md +128 -0
  22. package/agents/rust-reviewer.md +115 -0
  23. package/agents/ts-reviewer.md +110 -0
  24. package/agents/ui-reviewer.md +117 -0
  25. package/assets/logo.svg +36 -0
  26. package/bin/booklib-mcp.js +304 -0
  27. package/bin/booklib.js +1705 -0
  28. package/bin/skills.cjs +1292 -0
  29. package/booklib-router.mdc +36 -0
  30. package/booklib.config.json +19 -0
  31. package/commands/animation-at-work.md +10 -0
  32. package/commands/clean-code-reviewer.md +10 -0
  33. package/commands/data-intensive-patterns.md +10 -0
  34. package/commands/data-pipelines.md +10 -0
  35. package/commands/design-patterns.md +10 -0
  36. package/commands/domain-driven-design.md +10 -0
  37. package/commands/effective-java.md +10 -0
  38. package/commands/effective-kotlin.md +10 -0
  39. package/commands/effective-python.md +10 -0
  40. package/commands/effective-typescript.md +10 -0
  41. package/commands/kotlin-in-action.md +10 -0
  42. package/commands/lean-startup.md +10 -0
  43. package/commands/microservices-patterns.md +10 -0
  44. package/commands/programming-with-rust.md +10 -0
  45. package/commands/refactoring-ui.md +10 -0
  46. package/commands/rust-in-action.md +10 -0
  47. package/commands/skill-router.md +10 -0
  48. package/commands/spring-boot-in-action.md +10 -0
  49. package/commands/storytelling-with-data.md +10 -0
  50. package/commands/system-design-interview.md +10 -0
  51. package/commands/using-asyncio-python.md +10 -0
  52. package/commands/web-scraping-python.md +10 -0
  53. package/community/registry.json +1616 -0
  54. package/hooks/hooks.json +23 -0
  55. package/hooks/posttooluse-capture.mjs +67 -0
  56. package/hooks/suggest.js +153 -0
  57. package/lib/agent-behaviors.js +40 -0
  58. package/lib/agent-detector.js +96 -0
  59. package/lib/config-loader.js +39 -0
  60. package/lib/conflict-resolver.js +148 -0
  61. package/lib/context-builder.js +574 -0
  62. package/lib/discovery-engine.js +298 -0
  63. package/lib/doctor/hook-installer.js +83 -0
  64. package/lib/doctor/usage-tracker.js +87 -0
  65. package/lib/engine/ai-features.js +253 -0
  66. package/lib/engine/auditor.js +103 -0
  67. package/lib/engine/bm25-index.js +178 -0
  68. package/lib/engine/capture.js +120 -0
  69. package/lib/engine/corrections.js +198 -0
  70. package/lib/engine/doctor.js +195 -0
  71. package/lib/engine/graph-injector.js +137 -0
  72. package/lib/engine/graph.js +161 -0
  73. package/lib/engine/handoff.js +405 -0
  74. package/lib/engine/indexer.js +242 -0
  75. package/lib/engine/parser.js +53 -0
  76. package/lib/engine/query-expander.js +42 -0
  77. package/lib/engine/reranker.js +40 -0
  78. package/lib/engine/rrf.js +59 -0
  79. package/lib/engine/scanner.js +151 -0
  80. package/lib/engine/searcher.js +139 -0
  81. package/lib/engine/session-coordinator.js +306 -0
  82. package/lib/engine/session-manager.js +429 -0
  83. package/lib/engine/synthesizer.js +70 -0
  84. package/lib/installer.js +70 -0
  85. package/lib/instinct-block.js +33 -0
  86. package/lib/mcp-config-writer.js +88 -0
  87. package/lib/paths.js +57 -0
  88. package/lib/profiles/design.md +19 -0
  89. package/lib/profiles/general.md +16 -0
  90. package/lib/profiles/research-analysis.md +22 -0
  91. package/lib/profiles/software-development.md +23 -0
  92. package/lib/profiles/writing-content.md +19 -0
  93. package/lib/project-initializer.js +916 -0
  94. package/lib/registry/skills.js +102 -0
  95. package/lib/registry-searcher.js +99 -0
  96. package/lib/rules/rules-manager.js +169 -0
  97. package/lib/skill-fetcher.js +333 -0
  98. package/lib/well-known-builder.js +70 -0
  99. package/lib/wizard/index.js +404 -0
  100. package/lib/wizard/integration-detector.js +41 -0
  101. package/lib/wizard/project-detector.js +100 -0
  102. package/lib/wizard/prompt.js +156 -0
  103. package/lib/wizard/registry-embeddings.js +107 -0
  104. package/lib/wizard/skill-recommender.js +69 -0
  105. package/llms-full.txt +254 -0
  106. package/llms.txt +70 -0
  107. package/package.json +45 -0
  108. package/research-reports/2026-04-01-current-architecture.md +160 -0
  109. package/research-reports/IDEAS.md +93 -0
  110. package/rules/common/clean-code.md +42 -0
  111. package/rules/java/effective-java.md +42 -0
  112. package/rules/kotlin/effective-kotlin.md +37 -0
  113. package/rules/python/effective-python.md +38 -0
  114. package/rules/rust/rust.md +37 -0
  115. package/rules/typescript/effective-typescript.md +42 -0
  116. package/scripts/gen-llms-full.mjs +36 -0
  117. package/scripts/gen-og.mjs +142 -0
  118. package/scripts/validate-frontmatter.js +25 -0
  119. package/skills/animation-at-work/SKILL.md +270 -0
  120. package/skills/animation-at-work/assets/example_asset.txt +1 -0
  121. package/skills/animation-at-work/evals/evals.json +44 -0
  122. package/skills/animation-at-work/evals/results.json +13 -0
  123. package/skills/animation-at-work/examples/after.md +64 -0
  124. package/skills/animation-at-work/examples/before.md +35 -0
  125. package/skills/animation-at-work/references/api_reference.md +369 -0
  126. package/skills/animation-at-work/references/review-checklist.md +79 -0
  127. package/skills/animation-at-work/scripts/audit_animations.py +295 -0
  128. package/skills/animation-at-work/scripts/example.py +1 -0
  129. package/skills/clean-code-reviewer/SKILL.md +444 -0
  130. package/skills/clean-code-reviewer/audit.json +35 -0
  131. package/skills/clean-code-reviewer/evals/evals.json +185 -0
  132. package/skills/clean-code-reviewer/evals/results.json +13 -0
  133. package/skills/clean-code-reviewer/examples/after.md +48 -0
  134. package/skills/clean-code-reviewer/examples/before.md +33 -0
  135. package/skills/clean-code-reviewer/references/api_reference.md +158 -0
  136. package/skills/clean-code-reviewer/references/practices-catalog.md +282 -0
  137. package/skills/clean-code-reviewer/references/review-checklist.md +254 -0
  138. package/skills/clean-code-reviewer/scripts/pre-review.py +206 -0
  139. package/skills/data-intensive-patterns/SKILL.md +267 -0
  140. package/skills/data-intensive-patterns/assets/example_asset.txt +1 -0
  141. package/skills/data-intensive-patterns/evals/evals.json +54 -0
  142. package/skills/data-intensive-patterns/evals/results.json +13 -0
  143. package/skills/data-intensive-patterns/examples/after.md +61 -0
  144. package/skills/data-intensive-patterns/examples/before.md +38 -0
  145. package/skills/data-intensive-patterns/references/api_reference.md +34 -0
  146. package/skills/data-intensive-patterns/references/patterns-catalog.md +551 -0
  147. package/skills/data-intensive-patterns/references/review-checklist.md +193 -0
  148. package/skills/data-intensive-patterns/scripts/adr.py +213 -0
  149. package/skills/data-intensive-patterns/scripts/example.py +1 -0
  150. package/skills/data-pipelines/SKILL.md +259 -0
  151. package/skills/data-pipelines/assets/example_asset.txt +1 -0
  152. package/skills/data-pipelines/evals/evals.json +45 -0
  153. package/skills/data-pipelines/evals/results.json +13 -0
  154. package/skills/data-pipelines/examples/after.md +97 -0
  155. package/skills/data-pipelines/examples/before.md +37 -0
  156. package/skills/data-pipelines/references/api_reference.md +301 -0
  157. package/skills/data-pipelines/references/review-checklist.md +181 -0
  158. package/skills/data-pipelines/scripts/example.py +1 -0
  159. package/skills/data-pipelines/scripts/new_pipeline.py +444 -0
  160. package/skills/design-patterns/SKILL.md +271 -0
  161. package/skills/design-patterns/assets/example_asset.txt +1 -0
  162. package/skills/design-patterns/evals/evals.json +46 -0
  163. package/skills/design-patterns/evals/results.json +13 -0
  164. package/skills/design-patterns/examples/after.md +52 -0
  165. package/skills/design-patterns/examples/before.md +29 -0
  166. package/skills/design-patterns/references/api_reference.md +1 -0
  167. package/skills/design-patterns/references/patterns-catalog.md +726 -0
  168. package/skills/design-patterns/references/review-checklist.md +173 -0
  169. package/skills/design-patterns/scripts/example.py +1 -0
  170. package/skills/design-patterns/scripts/scaffold.py +807 -0
  171. package/skills/domain-driven-design/SKILL.md +142 -0
  172. package/skills/domain-driven-design/assets/example_asset.txt +1 -0
  173. package/skills/domain-driven-design/evals/evals.json +48 -0
  174. package/skills/domain-driven-design/evals/results.json +13 -0
  175. package/skills/domain-driven-design/examples/after.md +80 -0
  176. package/skills/domain-driven-design/examples/before.md +43 -0
  177. package/skills/domain-driven-design/references/api_reference.md +1 -0
  178. package/skills/domain-driven-design/references/patterns-catalog.md +545 -0
  179. package/skills/domain-driven-design/references/review-checklist.md +158 -0
  180. package/skills/domain-driven-design/scripts/example.py +1 -0
  181. package/skills/domain-driven-design/scripts/scaffold.py +421 -0
  182. package/skills/effective-java/SKILL.md +227 -0
  183. package/skills/effective-java/assets/example_asset.txt +1 -0
  184. package/skills/effective-java/evals/evals.json +46 -0
  185. package/skills/effective-java/evals/results.json +13 -0
  186. package/skills/effective-java/examples/after.md +83 -0
  187. package/skills/effective-java/examples/before.md +37 -0
  188. package/skills/effective-java/references/api_reference.md +1 -0
  189. package/skills/effective-java/references/items-catalog.md +955 -0
  190. package/skills/effective-java/references/review-checklist.md +216 -0
  191. package/skills/effective-java/scripts/checkstyle_setup.py +211 -0
  192. package/skills/effective-java/scripts/example.py +1 -0
  193. package/skills/effective-kotlin/SKILL.md +271 -0
  194. package/skills/effective-kotlin/assets/example_asset.txt +1 -0
  195. package/skills/effective-kotlin/audit.json +29 -0
  196. package/skills/effective-kotlin/evals/evals.json +45 -0
  197. package/skills/effective-kotlin/evals/results.json +13 -0
  198. package/skills/effective-kotlin/examples/after.md +36 -0
  199. package/skills/effective-kotlin/examples/before.md +38 -0
  200. package/skills/effective-kotlin/references/api_reference.md +1 -0
  201. package/skills/effective-kotlin/references/practices-catalog.md +1228 -0
  202. package/skills/effective-kotlin/references/review-checklist.md +126 -0
  203. package/skills/effective-kotlin/scripts/example.py +1 -0
  204. package/skills/effective-python/SKILL.md +441 -0
  205. package/skills/effective-python/evals/evals.json +44 -0
  206. package/skills/effective-python/evals/results.json +13 -0
  207. package/skills/effective-python/examples/after.md +56 -0
  208. package/skills/effective-python/examples/before.md +40 -0
  209. package/skills/effective-python/ref-01-pythonic-thinking.md +202 -0
  210. package/skills/effective-python/ref-02-lists-and-dicts.md +146 -0
  211. package/skills/effective-python/ref-03-functions.md +186 -0
  212. package/skills/effective-python/ref-04-comprehensions-generators.md +211 -0
  213. package/skills/effective-python/ref-05-classes-interfaces.md +188 -0
  214. package/skills/effective-python/ref-06-metaclasses-attributes.md +209 -0
  215. package/skills/effective-python/ref-07-concurrency.md +213 -0
  216. package/skills/effective-python/ref-08-robustness-performance.md +248 -0
  217. package/skills/effective-python/ref-09-testing-debugging.md +253 -0
  218. package/skills/effective-python/ref-10-collaboration.md +175 -0
  219. package/skills/effective-python/references/api_reference.md +218 -0
  220. package/skills/effective-python/references/practices-catalog.md +483 -0
  221. package/skills/effective-python/references/review-checklist.md +190 -0
  222. package/skills/effective-python/scripts/lint.py +173 -0
  223. package/skills/effective-typescript/SKILL.md +262 -0
  224. package/skills/effective-typescript/audit.json +29 -0
  225. package/skills/effective-typescript/evals/evals.json +37 -0
  226. package/skills/effective-typescript/evals/results.json +13 -0
  227. package/skills/effective-typescript/examples/after.md +70 -0
  228. package/skills/effective-typescript/examples/before.md +47 -0
  229. package/skills/effective-typescript/references/api_reference.md +118 -0
  230. package/skills/effective-typescript/references/practices-catalog.md +371 -0
  231. package/skills/effective-typescript/scripts/review.py +169 -0
  232. package/skills/kotlin-in-action/SKILL.md +261 -0
  233. package/skills/kotlin-in-action/assets/example_asset.txt +1 -0
  234. package/skills/kotlin-in-action/evals/evals.json +43 -0
  235. package/skills/kotlin-in-action/evals/results.json +13 -0
  236. package/skills/kotlin-in-action/examples/after.md +53 -0
  237. package/skills/kotlin-in-action/examples/before.md +39 -0
  238. package/skills/kotlin-in-action/references/api_reference.md +1 -0
  239. package/skills/kotlin-in-action/references/practices-catalog.md +436 -0
  240. package/skills/kotlin-in-action/references/review-checklist.md +204 -0
  241. package/skills/kotlin-in-action/scripts/example.py +1 -0
  242. package/skills/kotlin-in-action/scripts/setup_detekt.py +224 -0
  243. package/skills/lean-startup/SKILL.md +160 -0
  244. package/skills/lean-startup/assets/example_asset.txt +1 -0
  245. package/skills/lean-startup/evals/evals.json +43 -0
  246. package/skills/lean-startup/evals/results.json +13 -0
  247. package/skills/lean-startup/examples/after.md +80 -0
  248. package/skills/lean-startup/examples/before.md +34 -0
  249. package/skills/lean-startup/references/api_reference.md +319 -0
  250. package/skills/lean-startup/references/review-checklist.md +137 -0
  251. package/skills/lean-startup/scripts/example.py +1 -0
  252. package/skills/lean-startup/scripts/new_experiment.py +286 -0
  253. package/skills/microservices-patterns/SKILL.md +384 -0
  254. package/skills/microservices-patterns/evals/evals.json +45 -0
  255. package/skills/microservices-patterns/evals/results.json +13 -0
  256. package/skills/microservices-patterns/examples/after.md +69 -0
  257. package/skills/microservices-patterns/examples/before.md +40 -0
  258. package/skills/microservices-patterns/references/patterns-catalog.md +391 -0
  259. package/skills/microservices-patterns/references/review-checklist.md +169 -0
  260. package/skills/microservices-patterns/scripts/new_service.py +583 -0
  261. package/skills/programming-with-rust/SKILL.md +209 -0
  262. package/skills/programming-with-rust/evals/evals.json +37 -0
  263. package/skills/programming-with-rust/evals/results.json +13 -0
  264. package/skills/programming-with-rust/examples/after.md +107 -0
  265. package/skills/programming-with-rust/examples/before.md +59 -0
  266. package/skills/programming-with-rust/references/api_reference.md +152 -0
  267. package/skills/programming-with-rust/references/practices-catalog.md +335 -0
  268. package/skills/programming-with-rust/scripts/review.py +142 -0
  269. package/skills/refactoring-ui/SKILL.md +362 -0
  270. package/skills/refactoring-ui/assets/example_asset.txt +1 -0
  271. package/skills/refactoring-ui/evals/evals.json +45 -0
  272. package/skills/refactoring-ui/evals/results.json +13 -0
  273. package/skills/refactoring-ui/examples/after.md +85 -0
  274. package/skills/refactoring-ui/examples/before.md +58 -0
  275. package/skills/refactoring-ui/references/api_reference.md +355 -0
  276. package/skills/refactoring-ui/references/review-checklist.md +114 -0
  277. package/skills/refactoring-ui/scripts/audit_css.py +250 -0
  278. package/skills/refactoring-ui/scripts/example.py +1 -0
  279. package/skills/rust-in-action/SKILL.md +350 -0
  280. package/skills/rust-in-action/evals/evals.json +38 -0
  281. package/skills/rust-in-action/evals/results.json +13 -0
  282. package/skills/rust-in-action/examples/after.md +156 -0
  283. package/skills/rust-in-action/examples/before.md +56 -0
  284. package/skills/rust-in-action/references/practices-catalog.md +346 -0
  285. package/skills/rust-in-action/scripts/review.py +147 -0
  286. package/skills/skill-router/SKILL.md +186 -0
  287. package/skills/skill-router/evals/evals.json +38 -0
  288. package/skills/skill-router/evals/results.json +13 -0
  289. package/skills/skill-router/examples/after.md +63 -0
  290. package/skills/skill-router/examples/before.md +39 -0
  291. package/skills/skill-router/references/api_reference.md +24 -0
  292. package/skills/skill-router/references/routing-heuristics.md +89 -0
  293. package/skills/skill-router/references/skill-catalog.md +174 -0
  294. package/skills/skill-router/scripts/route.py +266 -0
  295. package/skills/spring-boot-in-action/SKILL.md +340 -0
  296. package/skills/spring-boot-in-action/evals/evals.json +39 -0
  297. package/skills/spring-boot-in-action/evals/results.json +13 -0
  298. package/skills/spring-boot-in-action/examples/after.md +185 -0
  299. package/skills/spring-boot-in-action/examples/before.md +84 -0
  300. package/skills/spring-boot-in-action/references/practices-catalog.md +403 -0
  301. package/skills/spring-boot-in-action/scripts/review.py +184 -0
  302. package/skills/storytelling-with-data/SKILL.md +241 -0
  303. package/skills/storytelling-with-data/assets/example_asset.txt +1 -0
  304. package/skills/storytelling-with-data/evals/evals.json +47 -0
  305. package/skills/storytelling-with-data/evals/results.json +13 -0
  306. package/skills/storytelling-with-data/examples/after.md +50 -0
  307. package/skills/storytelling-with-data/examples/before.md +33 -0
  308. package/skills/storytelling-with-data/references/api_reference.md +379 -0
  309. package/skills/storytelling-with-data/references/review-checklist.md +111 -0
  310. package/skills/storytelling-with-data/scripts/chart_review.py +301 -0
  311. package/skills/storytelling-with-data/scripts/example.py +1 -0
  312. package/skills/system-design-interview/SKILL.md +233 -0
  313. package/skills/system-design-interview/assets/example_asset.txt +1 -0
  314. package/skills/system-design-interview/evals/evals.json +46 -0
  315. package/skills/system-design-interview/evals/results.json +13 -0
  316. package/skills/system-design-interview/examples/after.md +94 -0
  317. package/skills/system-design-interview/examples/before.md +27 -0
  318. package/skills/system-design-interview/references/api_reference.md +582 -0
  319. package/skills/system-design-interview/references/review-checklist.md +201 -0
  320. package/skills/system-design-interview/scripts/example.py +1 -0
  321. package/skills/system-design-interview/scripts/new_design.py +421 -0
  322. package/skills/using-asyncio-python/SKILL.md +290 -0
  323. package/skills/using-asyncio-python/assets/example_asset.txt +1 -0
  324. package/skills/using-asyncio-python/evals/evals.json +43 -0
  325. package/skills/using-asyncio-python/evals/results.json +13 -0
  326. package/skills/using-asyncio-python/examples/after.md +68 -0
  327. package/skills/using-asyncio-python/examples/before.md +39 -0
  328. package/skills/using-asyncio-python/references/api_reference.md +267 -0
  329. package/skills/using-asyncio-python/references/review-checklist.md +149 -0
  330. package/skills/using-asyncio-python/scripts/check_blocking.py +270 -0
  331. package/skills/using-asyncio-python/scripts/example.py +1 -0
  332. package/skills/web-scraping-python/SKILL.md +280 -0
  333. package/skills/web-scraping-python/assets/example_asset.txt +1 -0
  334. package/skills/web-scraping-python/evals/evals.json +46 -0
  335. package/skills/web-scraping-python/evals/results.json +13 -0
  336. package/skills/web-scraping-python/examples/after.md +109 -0
  337. package/skills/web-scraping-python/examples/before.md +40 -0
  338. package/skills/web-scraping-python/references/api_reference.md +393 -0
  339. package/skills/web-scraping-python/references/review-checklist.md +163 -0
  340. package/skills/web-scraping-python/scripts/example.py +1 -0
  341. package/skills/web-scraping-python/scripts/new_scraper.py +231 -0
  342. package/skills/writing-plans/audit.json +34 -0
  343. package/tests/agent-detector.test.js +83 -0
  344. package/tests/corrections.test.js +245 -0
  345. package/tests/doctor/hook-installer.test.js +72 -0
  346. package/tests/doctor/usage-tracker.test.js +140 -0
  347. package/tests/engine/benchmark-eval.test.js +31 -0
  348. package/tests/engine/bm25-index.test.js +85 -0
  349. package/tests/engine/capture-command.test.js +35 -0
  350. package/tests/engine/capture.test.js +17 -0
  351. package/tests/engine/graph-augmented-search.test.js +107 -0
  352. package/tests/engine/graph-injector.test.js +44 -0
  353. package/tests/engine/graph.test.js +216 -0
  354. package/tests/engine/hybrid-searcher.test.js +74 -0
  355. package/tests/engine/indexer-bm25.test.js +37 -0
  356. package/tests/engine/mcp-tools.test.js +73 -0
  357. package/tests/engine/project-initializer-mcp.test.js +99 -0
  358. package/tests/engine/query-expander.test.js +36 -0
  359. package/tests/engine/reranker.test.js +51 -0
  360. package/tests/engine/rrf.test.js +49 -0
  361. package/tests/engine/srag-prefix.test.js +47 -0
  362. package/tests/instinct-block.test.js +23 -0
  363. package/tests/mcp-config-writer.test.js +60 -0
  364. package/tests/project-initializer-new-agents.test.js +48 -0
  365. package/tests/rules/rules-manager.test.js +230 -0
  366. package/tests/well-known-builder.test.js +40 -0
  367. package/tests/wizard/integration-detector.test.js +31 -0
  368. package/tests/wizard/project-detector.test.js +51 -0
  369. package/tests/wizard/prompt-session.test.js +61 -0
  370. package/tests/wizard/prompt.test.js +16 -0
  371. package/tests/wizard/registry-embeddings.test.js +35 -0
  372. package/tests/wizard/skill-recommender.test.js +34 -0
  373. package/tests/wizard/slot-count.test.js +25 -0
  374. package/vercel.json +21 -0
@@ -0,0 +1,109 @@
# After

A scraper using `requests.Session` for connection reuse, `BeautifulSoup` for HTML parsing, per-request retry logic, and polite rate limiting between pages.

```python
import logging
import time
from dataclasses import dataclass

import requests
from bs4 import BeautifulSoup
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

logger = logging.getLogger(__name__)

USER_AGENT = "JobResearchBot/1.0 (contact: scraping@mycompany.com)"
REQUEST_DELAY_SECONDS = 2.0


@dataclass
class JobListing:
    title: str
    company: str
    salary: str


def make_session() -> requests.Session:
    """Create a session with retry logic and a descriptive User-Agent."""
    session = requests.Session()
    session.headers.update({"User-Agent": USER_AGENT})

    retry_policy = Retry(
        total=3,
        backoff_factor=1.5,
        status_forcelist=[429, 500, 502, 503, 504],
        allowed_methods=["GET"],
    )
    adapter = HTTPAdapter(max_retries=retry_policy)
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    return session


def parse_job_listings(html: str) -> list[JobListing]:
    """Extract job listings from a page of HTML using BeautifulSoup."""
    soup = BeautifulSoup(html, "html.parser")
    jobs = []

    for card in soup.select("article.job-card"):
        title_el = card.select_one("h2.job-title")
        company_el = card.select_one("span.company")
        salary_el = card.select_one("div.salary")

        if title_el is None:
            logger.debug("Skipping card with no title element")
            continue

        jobs.append(JobListing(
            title=title_el.get_text(strip=True),
            company=company_el.get_text(strip=True) if company_el else "",
            salary=salary_el.get_text(strip=True) if salary_el else "Not specified",
        ))

    return jobs


def scrape_jobs(base_url: str, num_pages: int) -> list[JobListing]:
    """Scrape job listings across multiple pages with rate limiting."""
    session = make_session()
    all_jobs: list[JobListing] = []

    for page in range(1, num_pages + 1):
        url = f"{base_url}?page={page}"
        logger.info("Fetching page %d: %s", page, url)

        try:
            response = session.get(url, timeout=15)
            response.raise_for_status()
        except requests.HTTPError as exc:
            logger.error("HTTP error on page %d: %s", page, exc)
            break
        except requests.RequestException as exc:
            logger.error("Request failed on page %d: %s — stopping", page, exc)
            break

        page_jobs = parse_job_listings(response.text)
        logger.info("Extracted %d listings from page %d", len(page_jobs), page)
        all_jobs.extend(page_jobs)

        if page < num_pages:
            time.sleep(REQUEST_DELAY_SECONDS)  # be polite

    return all_jobs


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    jobs = scrape_jobs("https://jobs.example.com/listings", num_pages=20)
    print(f"Total jobs scraped: {len(jobs)}")
```

Key improvements:
- `requests.Session` with `HTTPAdapter` reuses TCP connections and retries on transient server errors — one session for all pages instead of a new connection per request (Ch 1, 14: Session reuse and retry)
- `BeautifulSoup` with CSS selectors replaces regex HTML parsing — correct, readable, and resilient to attribute ordering changes (Ch 2: Use BeautifulSoup, not regex, for HTML)
- `parse_job_listings` is a pure function that takes an HTML string and returns typed `JobListing` dataclasses — easily unit-tested with saved HTML fixtures (Ch 15: Testing scrapers)
- `None` checks on each element before `.get_text()` prevent `AttributeError` when elements are missing (Ch 2: Defensive parsing)
- `time.sleep(REQUEST_DELAY_SECONDS)` between pages respects the server; `USER_AGENT` identifies the bot with a contact address (Ch 14, 18: Rate limiting and identification)
- Specific `requests.HTTPError` and `requests.RequestException` replace the bare `except` — errors are logged with page context and the crawl stops gracefully (Ch 1, 14: Error handling)
@@ -0,0 +1,40 @@
# Before

A scraper that hammers a job listings site with no delays, parses HTML with regex, swallows all errors, and creates a new TCP connection for every page.

```python
import urllib.request
import re

def scrape_jobs(base_url, num_pages):
    all_jobs = []

    for page in range(1, num_pages + 1):
        url = base_url + "?page=" + str(page)
        try:
            # New connection every request, no headers, no rate limiting
            response = urllib.request.urlopen(url)
            html = response.read().decode("utf-8")
        except:
            # Swallows every error — silent failures
            continue

        # Parsing HTML with regex — fragile and incorrect
        titles = re.findall(r'<h2 class="job-title">(.*?)</h2>', html)
        companies = re.findall(r'<span class="company">(.*?)</span>', html)
        salaries = re.findall(r'<div class="salary">(.*?)</div>', html)

        for i in range(len(titles)):
            job = {
                "title": titles[i],
                "company": companies[i] if i < len(companies) else "",
                "salary": salaries[i] if i < len(salaries) else "",
            }
            all_jobs.append(job)

    return all_jobs


jobs = scrape_jobs("https://jobs.example.com/listings", 20)
print(f"Scraped {len(jobs)} jobs")
```
@@ -0,0 +1,393 @@
# Web Scraping with Python — Practices Catalog

Chapter-by-chapter catalog of practices from *Web Scraping with Python* by Ryan Mitchell, for building scrapers.

---

## Chapter 1: Your First Web Scraper

### Basic Fetching
- **urllib.request** — `urlopen(url)` returns an HTTPResponse object; call `.read()` for the HTML bytes
- **requests library** — Preferred over urllib; `requests.get(url)` with headers, params, timeout support
- **Error handling** — Catch `HTTPError` (4xx/5xx), `URLError` (server not found), and connection timeouts
- **Response checking** — Always check `response.status_code`; handle 403 (forbidden), 404 (not found), 500 (server error)

### BeautifulSoup Basics
- **Creating soup** — `BeautifulSoup(html, 'html.parser')` or use `'lxml'` for speed
- **Direct tag access** — `soup.h1`, `soup.title` returns first matching tag
- **Tag attributes** — `tag.attrs` returns dict; `tag['href']` for specific attribute; `tag.get_text()` for text content
- **None checking** — Always check if `soup.find()` returns None before accessing attributes
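A minimal demonstration of these basics on an inline snippet (the markup is invented for the example):

```python
from bs4 import BeautifulSoup

html = """
<html><head><title>Job Board</title></head>
<body>
  <h1>Openings</h1>
  <a href="/jobs/1" class="listing">Data Engineer</a>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

title_text = soup.title.get_text()           # direct tag access: first <title>
link = soup.find("a", {"class": "listing"})

if link is not None:                         # always None-check find() results
    href = link["href"]                      # a single attribute
    attrs = link.attrs                       # the full attribute dict

missing = soup.find("table")                 # no <table> in the document: None
```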

---

## Chapter 2: Advanced HTML Parsing

### find and findAll
- **`find(tag, attributes, recursive, text, keywords)`** — Returns first matching element
- **`findAll(tag, attributes, recursive, text, limit, keywords)`** — Returns list of all matches
- **Attribute filtering** — `find('div', {'class': 'price'})`, `find('span', {'id': 'result'})`
- **Multiple tags** — `findAll(['h1', 'h2', 'h3'])` matches any of the listed tags
- **Text search** — `findAll(text='exact match')` or `findAll(text=re.compile('pattern'))`
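On a small invented snippet, the same calls look like this (`find_all` and `string=` are the current bs4 spellings of `findAll` and `text=`):

```python
import re

from bs4 import BeautifulSoup

html = """
<div>
  <h2>Summary</h2>
  <div class="price">$1,200</div>
  <h3>Details</h3>
  <span id="result">42 matches</span>
  <p>Sale: $99 today</p>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

price = soup.find("div", {"class": "price"})            # first match by attributes
result = soup.find("span", {"id": "result"})
headings = soup.find_all(["h2", "h3"])                  # match any listed tag
dollar_nodes = soup.find_all(string=re.compile(r"\$[\d,]+"))  # text-pattern search
```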

### CSS Selectors
- **`select(selector)`** — Use CSS selectors: `soup.select('div.content > p')`, `soup.select('#main .item')`
- **Common selectors** — `tag`, `.class`, `#id`, `tag.class`, `parent > child`, `ancestor descendant`, `tag[attr=val]`
- **Pseudo-selectors** — `:nth-of-type()`, `:first-child`, etc. for positional selection
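The same selector forms in action on an invented fragment:

```python
from bs4 import BeautifulSoup

html = """
<div id="main">
  <div class="content"><p>First</p><p>Second</p></div>
  <ul><li class="item">A</li><li class="item">B</li></ul>
  <a href="/wiki/Python">Python</a>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

paras = soup.select("div.content > p")          # direct children only
items = soup.select("#main .item")              # descendants of #main
wiki = soup.select('a[href="/wiki/Python"]')    # attribute selector
second = soup.select_one("div.content p:nth-of-type(2)")  # positional
```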

### Navigating the DOM Tree
- **Children** — `tag.children` (direct children iterator), `tag.descendants` (all descendants)
- **Siblings** — `tag.next_sibling`, `tag.previous_sibling`, `tag.next_siblings` (iterator)
- **Parents** — `tag.parent`, `tag.parents` (iterator up to document root)
- **Navigation tip** — NavigableString objects (text nodes) count as siblings; use `.find_next_sibling('tag')` to skip
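The text-node pitfall is easy to reproduce: whitespace between tags parses as a NavigableString sibling, so `.next_sibling` is often not the next tag.

```python
from bs4 import BeautifulSoup, Tag

html = "<tr><td>Name</td> <td>Role</td> <td>Team</td></tr>"
soup = BeautifulSoup(html, "html.parser")

first = soup.find("td")
raw_sibling = first.next_sibling              # the " " text node, not a tag
next_cell = first.find_next_sibling("td")     # skips text nodes: the Role cell
parent_name = first.parent.name               # "tr"
cells = [c for c in first.parent.children if isinstance(c, Tag)]  # tags only
```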

### Regular Expressions with BeautifulSoup
- **Regex in find** — `soup.find('img', {'src': re.compile(r'\.jpg$')})` matches pattern against attribute
- **Regex in findAll** — `soup.findAll('a', {'href': re.compile(r'^/wiki/')})` for link patterns
- **Text regex** — `soup.findAll(text=re.compile(r'\$[\d,]+'))` for finding price patterns
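For example, separating internal wiki links and image types on an invented fragment:

```python
import re

from bs4 import BeautifulSoup

html = """
<a href="/wiki/Web_scraping">internal</a>
<a href="https://example.com/page">external</a>
<img src="/img/logo.jpg"><img src="/img/banner.png">
"""
soup = BeautifulSoup(html, "html.parser")

# The regex is matched against the attribute value of each candidate tag
wiki_links = soup.find_all("a", {"href": re.compile(r"^/wiki/")})
jpgs = soup.find_all("img", {"src": re.compile(r"\.jpg$")})
```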

### Lambda Functions
- **Lambda filters** — `soup.find_all(lambda tag: len(tag.attrs) == 2)` for custom tag filtering
- **Complex conditions** — Combine tag name, attributes, text content in lambda for precise selection
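Any callable that takes a tag and returns a boolean works as a filter; for instance:

```python
from bs4 import BeautifulSoup

html = """
<p class="a" id="x">two attributes</p>
<p class="b">one attribute</p>
<span class="c" id="y">two attributes, span</span>
"""
soup = BeautifulSoup(html, "html.parser")

# Exactly two attributes, any tag name
two_attrs = soup.find_all(lambda tag: len(tag.attrs) == 2)

# Combine tag name and attribute count for a tighter match
two_attr_paras = soup.find_all(lambda tag: tag.name == "p" and len(tag.attrs) == 2)
```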
52
+
53
+ ---
54
+
55
+ ## Chapter 3: Writing Web Crawlers
56
+
57
+ ### Single-Domain Crawling
58
+ - **Internal link collection** — Find all `<a>` tags; filter for same-domain links using `urlparse`
59
+ - **URL normalization** — Resolve relative URLs with `urljoin`; strip fragments and query strings for dedup
60
+ - **Visited tracking** — Maintain a `set()` of visited URLs; check before fetching
61
+ - **Breadth-first** — Use a queue (`collections.deque`) for BFS traversal of the site
62
+ - **Depth-first** — Use a stack (list) for DFS; useful for deep hierarchical sites
63
+
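The steps above (internal-link filtering, URL normalization, visited set, BFS queue) fit together as a small crawler. This is a sketch, not a production crawler: link extraction uses a regex stand-in instead of BeautifulSoup so it is stdlib-only, and fetching is injected so the example runs against canned pages rather than the network. All URLs are hypothetical.

```python
import re
from collections import deque
from urllib.parse import urljoin, urldefrag, urlparse

def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl of a single domain.

    fetch(url) -> HTML string; injected so the crawler can be tested
    against canned pages instead of the live network.
    """
    domain = urlparse(start_url).netloc
    queue = deque([start_url])
    visited = set()
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        if url in visited:
            continue
        visited.add(url)
        html = fetch(url)
        for href in re.findall(r'href="([^"]+)"', html):
            # Resolve relative URL and strip the fragment for dedup
            absolute, _ = urldefrag(urljoin(url, href))
            if urlparse(absolute).netloc == domain and absolute not in visited:
                queue.append(absolute)
    return visited

# Canned site: two internal pages plus one external link
pages = {
    'http://example.com/': '<a href="/a">a</a> <a href="http://other.com/">x</a>',
    'http://example.com/a': '<a href="/#top">home</a>',
}
found = crawl('http://example.com/', lambda url: pages.get(url, ''))
```

Swapping `deque.popleft()` for `list.pop()` turns the same loop into the DFS variant.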
64
+ ### Building Robust Crawlers
65
+ - **Recursive crawling** — Function that fetches page, extracts links, recurses on unvisited links
66
+ - **Data extraction during crawl** — Extract target data while crawling; don't just collect URLs
67
+ - **Depth limiting** — Set maximum crawl depth to prevent infinite recursion
68
+ - **URL deduplication** — Normalize URLs before adding to visited set; handle trailing slashes, www prefix
69
+
70
+ ---
71
+
72
+ ## Chapter 4: Web Crawling Models
73
+
74
+ ### Planning a Crawl
75
+ - **Site mapping** — Understand site structure before coding; identify URL patterns, pagination, categories
76
+ - **Crawl scope** — Define which pages/sections to include or exclude
77
+ - **Data schema** — Define what to extract before building; normalize across different page layouts
78
+
79
+ ### Handling Different Layouts
80
+ - **Template detection** — Sites may use different templates for different content types
81
+ - **Conditional parsing** — Check page type (product vs category vs article) and apply appropriate parser
82
+ - **Data normalization** — Map different field names/formats from different layouts to a unified schema
83
+
84
+ ### Cross-Site Crawling
85
+ - **Multi-domain** — Maintain per-domain settings (delays, selectors, credentials)
86
+ - **Link following policies** — Decide which external links to follow; whitelist/blacklist domains
87
+ - **Politeness per domain** — Track per-domain request timing; respect each site's robots.txt
88
+
89
+ ---
90
+
91
+ ## Chapter 5: Scrapy
92
+
93
+ ### Scrapy Architecture
94
+ - **Spider** — Defines how to crawl and parse; subclass `scrapy.Spider`; implement `parse()` method
95
+ - **Items** — Structured data containers; define fields with `scrapy.Item` and `scrapy.Field()`
96
+ - **Pipelines** — Process items after extraction; validate, clean, store to database/file
97
+ - **Middleware** — Hook into request/response processing; add headers, proxy rotation, retry logic
98
+ - **Settings** — Configure concurrency (`CONCURRENT_REQUESTS`), delays (`DOWNLOAD_DELAY`), user agent, etc.
99
+
100
+ ### CrawlSpider
101
+ - **Rules** — Define `Rule(LinkExtractor(...), callback=...)` for automatic link following
102
+ - **LinkExtractor** — Filter links by `allow` (regex), `deny`, `restrict_css`, `restrict_xpaths`
103
+ - **Callback** — Assign parse methods to different URL patterns; `follow=True` for recursive crawling
104
+
105
+ ### Scrapy Best Practices
106
+ - **Item loaders** — Use `ItemLoader` for cleaner extraction with input/output processors
107
+ - **Logging** — Configure log levels (`LOG_LEVEL = 'INFO'`); log to file for production runs
108
+ - **Autothrottle** — Enable `AUTOTHROTTLE_ENABLED` for adaptive request pacing
109
+ - **Feed exports** — Built-in export to JSON, CSV, XML via `-o output.json`
110
+ - **Contracts** — Add docstring-based contracts for spider testing
111
+
112
+ ---
113
+
114
+ ## Chapter 6: Storing Data
115
+
116
+ ### File Storage
117
+ - **CSV** — Use `csv.writer` or `csv.DictWriter`; handle encoding with `encoding='utf-8'`
118
+ - **JSON** — Use `json.dump()` for structured data; JSON Lines for streaming/appending
119
+ - **Raw files** — Download images, PDFs with `urllib.request.urlretrieve()` (a legacy interface) or `requests.get(url, stream=True)` to avoid loading large files into memory
120
+
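A minimal sketch of both file formats, using an in-memory buffer in place of a real file (with real files, open CSVs with `newline=''` and `encoding='utf-8'`):

```python
import csv
import io
import json

rows = [
    {'title': 'Widget', 'price': 19.99},
    {'title': 'Gadget', 'price': 5.00},
]

# CSV with DictWriter: header row plus one row per dict
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=['title', 'price'])
writer.writeheader()
writer.writerows(rows)

# JSON Lines: one object per line, append-friendly for streaming scrapes
jsonl = '\n'.join(json.dumps(r) for r in rows)
records = [json.loads(line) for line in jsonl.splitlines()]
```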
121
+ ### Database Storage
122
+ - **MySQL** — Use `pymysql` connector; parameterized queries to prevent SQL injection
123
+ - **PostgreSQL** — Use `psycopg2`; connection pooling for concurrent scrapers
124
+ - **SQLite** — Use built-in `sqlite3` for lightweight local storage; good for prototyping
125
+ - **Schema design** — Design tables to match extracted data; use appropriate types; add indexes on lookup columns
126
+
127
+ ### Email Integration
128
+ - **smtplib** — Send scraped data or alerts via email; useful for monitoring scraper results
129
+ - **Notifications** — Alert on scraper failures, unusual data patterns, or completion
130
+
131
+ ### Storage Best Practices
132
+ - **Idempotent storage** — Check for duplicates before inserting; use UPSERT patterns
133
+ - **Raw preservation** — Store raw HTML alongside extracted data for re-parsing capability
134
+ - **Batch operations** — Use bulk inserts for efficiency; commit in batches, not per-row
135
+ - **Connection management** — Use context managers; close connections properly; handle reconnection
136
+
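Idempotent storage, parameterized queries, and batched commits can be shown together with the built-in `sqlite3` module (the `ON CONFLICT` upsert syntax needs SQLite 3.24+, standard in modern Python builds); the table and URLs are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute("""
    CREATE TABLE products (
        url   TEXT PRIMARY KEY,   -- natural key makes re-scrapes idempotent
        title TEXT,
        price REAL
    )
""")

UPSERT = """INSERT INTO products (url, title, price) VALUES (?, ?, ?)
            ON CONFLICT(url) DO UPDATE SET title=excluded.title,
                                           price=excluded.price"""

batch = [
    ('http://example.com/widget', 'Widget', 19.99),
    ('http://example.com/gadget', 'Gadget', 5.00),
]
with conn:  # one transaction for the whole batch, not per-row commits
    conn.executemany(UPSERT, batch)

# Re-scraping the same URL with a new price updates rather than duplicates
with conn:
    conn.executemany(UPSERT, [('http://example.com/widget', 'Widget', 17.99)])

count = conn.execute('SELECT COUNT(*) FROM products').fetchone()[0]
price = conn.execute(
    "SELECT price FROM products WHERE url = 'http://example.com/widget'"
).fetchone()[0]
```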
137
+ ---
138
+
139
+ ## Chapter 7: Reading Documents
140
+
141
+ ### PDF Extraction
142
+ - **PDFMiner** — Extract text from PDFs; handle multi-column layouts and tables
143
+ - **Page-by-page** — Process PDFs page by page for memory efficiency
144
+ - **Tables in PDFs** — Use tabula-py or camelot for structured table extraction
145
+
146
+ ### Word Documents
147
+ - **python-docx** — Read `.docx` files; extract paragraphs, tables, headers
148
+ - **Older formats** — Handle `.doc` files with antiword or textract
149
+
150
+ ### Encoding
151
+ - **Character detection** — Use `chardet` to detect file encoding when unknown
152
+ - **UTF-8 normalization** — Convert all text to UTF-8; handle BOM (Byte Order Mark)
153
+ - **HTML encoding** — Read `<meta charset>` tag; handle entity references (`&amp;`, `&lt;`)
154
+
155
+ ---
156
+
157
+ ## Chapter 8: Cleaning Dirty Data
158
+
159
+ ### String Normalization
160
+ - **Whitespace** — Strip leading/trailing whitespace; normalize internal whitespace (multiple spaces to one)
161
+ - **Unicode normalization** — Use `unicodedata.normalize('NFKD', text)` for consistent Unicode representation
162
+ - **Case normalization** — Lowercase for comparison; preserve original for display
163
+
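The three normalization steps combine into one small stdlib-only helper (note that NFKD decomposes accented characters into base letter plus combining mark, and folds non-breaking spaces into plain spaces):

```python
import re
import unicodedata

def normalize(text):
    # NFKD folds compatibility characters (non-breaking spaces, ligatures)
    text = unicodedata.normalize('NFKD', text)
    # Collapse runs of internal whitespace and trim the ends
    text = re.sub(r'\s+', ' ', text).strip()
    return text

raw = '\u00a0 Caf\u00e9\u00a0\u00a0Menu \n'
clean = normalize(raw)
key = clean.lower()   # lowercase copy for comparison; keep clean for display
```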
164
+ ### Regex Cleaning
165
+ - **Pattern extraction** — Use regex groups to extract structured data from messy text (prices, dates, phone numbers)
166
+ - **Substitution** — `re.sub()` to remove or replace unwanted characters and patterns
167
+ - **Compiled patterns** — Pre-compile frequently used patterns with `re.compile()` for performance
168
+
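A short sketch of pre-compiled extraction and substitution on an invented messy string:

```python
import re

# Pre-compiled once, reused on every record
PRICE = re.compile(r'\$([\d,]+(?:\.\d{2})?)')

messy = 'Sale! Was $1,299.00, now only $999.00 (save $300)'
prices = [float(m.replace(',', '')) for m in PRICE.findall(messy)]

# Substitution: strip everything except word characters and spaces
slug = re.sub(r'[^\w\s]', '', 'Hello, World!').strip()
```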
169
+ ### Data Normalization
170
+ - **Date formats** — Parse various date formats with `dateutil.parser`; store in ISO 8601
171
+ - **Number formats** — Handle commas, currency symbols, percentage signs; convert to numeric types
172
+ - **Address normalization** — Standardize address components; handle abbreviations
173
+
174
+ ### OpenRefine
175
+ - **Faceting** — Group similar values to find inconsistencies
176
+ - **Clustering** — Automatically find and merge similar values (fingerprint, n-gram, etc.)
177
+ - **GREL expressions** — Transform data with OpenRefine's expression language
178
+
179
+ ---
180
+
181
+ ## Chapter 9: Natural Language Processing
182
+
183
+ ### Text Analysis
184
+ - **N-grams** — Extract sequences of N words; useful for finding common phrases and patterns
185
+ - **Frequency analysis** — Count word/phrase frequencies; identify key topics in scraped text
186
+ - **Stop words** — Filter common words (the, is, at) to focus on meaningful content
187
+
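N-gram extraction, stop-word filtering, and frequency counting fit in a few lines with `collections.Counter`; the tiny stop-word set and sample sentence are placeholders (real analyses use a full list such as NLTK's):

```python
import re
from collections import Counter

STOP_WORDS = frozenset({'the', 'is', 'at', 'of', 'a'})  # toy subset

def ngrams(text, n=2):
    words = [w for w in re.findall(r"[a-z']+", text.lower())
             if w not in STOP_WORDS]
    # Sliding window of n consecutive words, counted
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))

text = 'the web page loads the web page fast'
common = ngrams(text).most_common(1)   # most frequent bigram
```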
188
+ ### Markov Models
189
+ - **Text generation** — Build Markov chains from scraped text; generate similar-style text
190
+ - **Chain order** — Higher order (2-gram, 3-gram) produces more coherent but less varied output
191
+
192
+ ### NLTK
193
+ - **Tokenization** — Split text into words and sentences with NLTK tokenizers
194
+ - **Part-of-speech tagging** — Tag words as nouns, verbs, etc. for structured extraction
195
+ - **Named entity recognition** — Extract names, organizations, locations from text
196
+ - **Stemming/lemmatization** — Reduce words to base forms for better matching and analysis
197
+
198
+ ---
199
+
200
+ ## Chapter 10: Crawling Through Forms and Logins
201
+
202
+ ### Form Submission
203
+ - **POST requests** — `requests.post(url, data={'field': 'value'})` for form submission
204
+ - **CSRF tokens** — Extract hidden CSRF token from form HTML; include in POST data
205
+ - **Form fields** — Inspect form with browser DevTools; identify all required fields including hidden ones
206
+ - **File uploads** — Use `files` parameter in `requests.post()` for multipart form data
207
+
208
+ ### Session Management
209
+ - **requests.Session()** — Maintains cookies across requests; handles redirects; connection pooling
210
+ - **Cookie persistence** — Session object automatically stores and sends cookies
211
+ - **Login flow** — GET login page → extract CSRF → POST credentials → use session for authenticated pages
212
+
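The CSRF step of the login flow can be sketched offline against a saved copy of a hypothetical login page; the field names and URL are invented, token extraction uses a stdlib regex (a real scraper might use BeautifulSoup), and the actual `requests.Session` calls are shown only as comments since they need a live site:

```python
import re

# A saved copy of a (hypothetical) login page with a hidden CSRF field
login_html = """
<form method="post" action="/login">
  <input type="hidden" name="csrf_token" value="a1b2c3d4">
  <input type="text" name="username">
  <input type="password" name="password">
</form>
"""

def extract_csrf(html, field='csrf_token'):
    match = re.search(r'name="%s"\s+value="([^"]+)"' % re.escape(field), html)
    return match.group(1) if match else None

token = extract_csrf(login_html)
payload = {'username': 'user', 'password': 'secret', 'csrf_token': token}

# With requests, the full flow would be:
#   session = requests.Session()
#   session.post('https://example.com/login', data=payload)
#   ...the session now carries the auth cookies for later requests.
```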
213
+ ### Authentication
214
+ - **HTTP Basic Auth** — `requests.get(url, auth=('user', 'pass'))` for Basic authentication
215
+ - **Token-based** — Extract auth token from login response; send in headers for subsequent requests
216
+ - **OAuth** — Use `requests-oauthlib` for OAuth-protected APIs
217
+ - **Session expiry** — Detect expired sessions (redirects to login); re-authenticate automatically
218
+
219
+ ---
220
+
221
+ ## Chapter 11: Scraping JavaScript
222
+
223
+ ### Selenium WebDriver
224
+ - **Setup** — `webdriver.Chrome()` or `webdriver.Firefox()`; needs a matching driver binary (Selenium 4.6+ can download one automatically via Selenium Manager)
225
+ - **Headless mode** — `options.add_argument('--headless')` for browser without GUI; essential for servers
226
+ - **Navigation** — `driver.get(url)`; `driver.find_element(By.CSS_SELECTOR, selector)`
227
+ - **Interaction** — `.click()`, `.send_keys()`, `.clear()` on elements; simulate user behavior
228
+
229
+ ### Waiting for Content
230
+ - **Implicit waits** — `driver.implicitly_wait(10)` sets default wait for element finding
231
+ - **Explicit waits** — `WebDriverWait(driver, 10).until(EC.presence_of_element_located(...))` for specific conditions
232
+ - **Expected conditions** — `element_to_be_clickable`, `visibility_of_element_located`, `text_to_be_present_in_element`
233
+ - **Custom waits** — Write lambda conditions for complex wait scenarios
234
+
235
+ ### JavaScript Execution
236
+ - **Execute script** — `driver.execute_script('return document.title')` runs JS in page context
237
+ - **Scroll page** — `driver.execute_script('window.scrollTo(0, document.body.scrollHeight)')` for infinite scroll
238
+ - **Extract data** — Execute JS to extract data from page variables, localStorage, or DOM
239
+
240
+ ### Ajax Handling
241
+ - **Wait for Ajax** — Wait for specific elements that load asynchronously
242
+ - **Network monitoring** — Intercept XHR requests to find underlying API endpoints
243
+ - **Alternative approach** — If you can identify the API endpoint, use `requests` directly instead of Selenium
244
+
245
+ ---
246
+
247
+ ## Chapter 12: Crawling Through APIs
248
+
249
+ ### REST API Basics
250
+ - **HTTP methods** — GET (read), POST (create), PUT (update), DELETE (remove)
251
+ - **JSON responses** — `response.json()` for parsing; handle nested objects and arrays
252
+ - **Headers** — Set `Accept: application/json`, `Authorization: Bearer token`
253
+ - **Query parameters** — `requests.get(url, params={'key': 'value'})` for clean URL building
254
+
255
+ ### Undocumented APIs
256
+ - **Browser DevTools** — Use Network tab to discover API calls made by JavaScript
257
+ - **XHR filtering** — Filter network requests to XHR/Fetch to find data endpoints
258
+ - **Request replication** — Copy request headers, cookies, parameters from DevTools to Python
259
+ - **API reverse engineering** — Study request patterns to understand pagination, filtering, authentication
260
+
261
+ ### API Best Practices
262
+ - **Rate limiting** — Respect rate limit headers; implement backoff on 429 responses
263
+ - **Pagination** — Handle cursor-based, offset-based, and link-header pagination
264
+ - **Error handling** — Retry on 5xx errors with exponential backoff; don't retry on 4xx
265
+ - **Authentication** — Store API keys securely; handle token refresh for OAuth
266
+
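The retry policy above (backoff on 429/5xx, fail fast on other 4xx) can be sketched as a small wrapper; the request and the sleep are both injected so the policy is testable without a network, and the response shape is an assumption of this sketch:

```python
import time

def fetch_with_backoff(do_request, max_retries=4, base_delay=1.0,
                       sleep=time.sleep):
    """Retry on 429/5xx with exponential backoff; fail fast on other 4xx.

    do_request() -> (status_code, body); injected for testability.
    """
    for attempt in range(max_retries):
        status, body = do_request()
        if status < 400:
            return body
        if status == 429 or status >= 500:
            sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
            continue
        raise RuntimeError('client error %d, not retrying' % status)
    raise RuntimeError('gave up after %d attempts' % max_retries)

# Simulated endpoint: two 503s, then success
responses = iter([(503, ''), (503, ''), (200, '{"ok": true}')])
delays = []
body = fetch_with_backoff(lambda: next(responses), sleep=delays.append)
```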
267
+ ---
268
+
269
+ ## Chapter 13: Image Processing and OCR
270
+
271
+ ### Pillow (PIL)
272
+ - **Image loading** — `Image.open(path)` or from URL response content
273
+ - **Manipulation** — Resize, crop, rotate, filter for preprocessing before OCR
274
+ - **Thresholding** — Convert to grayscale; apply threshold for clean black/white text
275
+
276
+ ### Tesseract OCR
277
+ - **pytesseract** — `pytesseract.image_to_string(image)` for text extraction from images
278
+ - **Preprocessing** — Clean images before OCR: denoise, deskew, threshold, resize
279
+ - **Language support** — Specify language with `lang='eng'`; install language packs as needed
280
+ - **Confidence** — Use `image_to_data()` for per-word confidence scores; filter low confidence
281
+
282
+ ### CAPTCHA Handling
283
+ - **Simple CAPTCHAs** — Preprocessing + OCR may solve simple text CAPTCHAs
284
+ - **Complex CAPTCHAs** — Consider CAPTCHA-solving services or rethink approach (use API instead)
285
+ - **Ethical note** — CAPTCHAs exist to prevent automated access; respect their purpose
286
+
287
+ ---
288
+
289
+ ## Chapter 14: Avoiding Scraping Traps
290
+
291
+ ### Headers and Identity
292
+ - **User-Agent** — Set a realistic browser User-Agent string; rotate for large-scale scraping
293
+ - **Accept headers** — Include Accept, Accept-Language, Accept-Encoding to mimic real browsers
294
+ - **Referer** — Set appropriate Referer header when navigating between pages
295
+ - **Cookie handling** — Accept and send cookies; use sessions for automatic management
296
+
297
+ ### Behavioral Patterns
298
+ - **Request timing** — Add random delays between requests (1-5 seconds); avoid perfectly regular intervals
299
+ - **Navigation patterns** — Don't jump straight to data pages; mimic human browsing (home → category → product)
300
+ - **Click patterns** — With Selenium, click through pages naturally rather than jumping directly to URLs
301
+
302
+ ### Honeypot Detection
303
+ - **Hidden links** — Check for CSS `display:none` or `visibility:hidden` links; avoid following them
304
+ - **Hidden form fields** — Pre-filled hidden fields may be traps; don't submit unexpected values
305
+ - **Link patterns** — Suspicious URL patterns or link text may indicate honeypots
306
+
307
+ ### IP and Session Management
308
+ - **Proxy rotation** — Rotate IP addresses for large-scale scraping; use proxy services
309
+ - **Session rotation** — Create new sessions periodically; don't use same cookies indefinitely
310
+ - **Fingerprint diversity** — Vary headers, timing, and behavior to avoid fingerprinting
311
+
312
+ ---
313
+
314
+ ## Chapter 15: Testing Scrapers
315
+
316
+ ### Unit Testing
317
+ - **Parse function tests** — Test parsing functions with saved HTML files; verify extracted data
318
+ - **Fixture files** — Save representative HTML pages as test fixtures; don't hit live sites in tests
319
+ - **Edge cases** — Test with missing elements, empty pages, different layouts, malformed HTML
320
+
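A minimal illustration of fixture-based parse testing, assuming `beautifulsoup4`; the parse function, selectors, and fixture are all hypothetical stand-ins for whatever the scraper actually extracts:

```python
from bs4 import BeautifulSoup

def parse_product(html):
    """Extract fields from a product page; None for anything missing."""
    soup = BeautifulSoup(html, 'html.parser')
    title = soup.select_one('h1.title')
    price = soup.select_one('span.price')
    return {
        'title': title.get_text(strip=True) if title else None,
        'price': price.get_text(strip=True) if price else None,
    }

# Fixture: a representative page saved to disk, never fetched live in tests
FIXTURE = '<h1 class="title"> Widget </h1><span class="price">$19.99</span>'

def test_parse_product():
    assert parse_product(FIXTURE) == {'title': 'Widget', 'price': '$19.99'}

def test_missing_price():  # edge case: element absent from the page
    assert parse_product('<h1 class="title">X</h1>')['price'] is None

test_parse_product()
test_missing_price()
```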
321
+ ### Integration Testing
322
+ - **End-to-end** — Test full scrape pipeline from fetch to storage with known target pages
323
+ - **Selenium tests** — Use Selenium for testing JavaScript-heavy scraping flows
324
+ - **Mock responses** — Use `responses` or `requests-mock` libraries for HTTP mocking in tests
325
+
326
+ ### Testing Best Practices
327
+ - **Site change detection** — Periodically check if site structure has changed; alert on selector failures
328
+ - **Regression testing** — Compare current results against known-good baselines
329
+ - **CI integration** — Run scraper tests in CI pipeline; catch issues before deployment
330
+
331
+ ---
332
+
333
+ ## Chapter 16: Parallel Web Scraping
334
+
335
+ ### Threading
336
+ - **threading module** — Use for I/O-bound scraping; the GIL is released while threads block on network I/O
337
+ - **Thread pool** — `concurrent.futures.ThreadPoolExecutor` for managed thread pools
338
+ - **Thread safety** — Use locks for shared state (counters, result lists); prefer queues for task distribution
339
+
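A thread-pool sketch of the pattern: here `fetch` just sleeps to stand in for a blocking HTTP request, and results are collected on the main thread so no lock is needed:

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def fetch(url):
    # Stand-in for an HTTP request: sleeping releases the GIL like I/O does
    time.sleep(0.01)
    return url, len(url)

urls = ['http://example.com/%d' % i for i in range(8)]

results = {}
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(fetch, u) for u in urls]
    for future in as_completed(futures):
        url, size = future.result()
        results[url] = size   # collected in the main thread: no lock needed
```

Swapping in `ProcessPoolExecutor` gives the CPU-bound variant with the same interface.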
340
+ ### Multiprocessing
341
+ - **multiprocessing module** — Use for CPU-bound processing (parsing, cleaning); bypasses GIL
342
+ - **Process pool** — `concurrent.futures.ProcessPoolExecutor` for managed process pools
343
+ - **Inter-process communication** — Use Queue for task distribution; Pipe for point-to-point
344
+
345
+ ### Queue-Based Architecture
346
+ - **Producer-consumer** — Producer adds URLs to queue; consumers fetch and parse in parallel
347
+ - **URL frontier** — Priority queue for managing which URLs to crawl next
348
+ - **Result aggregation** — Collect results from workers into shared storage
349
+
350
+ ### Parallel Best Practices
351
+ - **Per-domain limits** — Limit concurrent requests per domain even with parallel scraping
352
+ - **Graceful shutdown** — Handle KeyboardInterrupt; drain queues cleanly on shutdown
353
+ - **Error isolation** — One worker's failure shouldn't crash the entire scraping operation
354
+ - **Progress tracking** — Log completed/remaining tasks; monitor worker health
355
+
356
+ ---
357
+
358
+ ## Chapter 17: Remote Scraping
359
+
360
+ ### Tor
361
+ - **Tor proxy** — Route requests through Tor network for anonymity; `socks5://127.0.0.1:9150`
362
+ - **IP verification** — Check IP with a service like httpbin.org/ip to verify Tor is active
363
+ - **Performance** — Tor is slow; use only when anonymity is required
364
+ - **Circuit rotation** — Signal Tor to create new circuit for fresh IP; don't rotate too frequently
365
+
366
+ ### Proxy Services
367
+ - **Rotating proxies** — Commercial proxy services provide rotating IP pools
368
+ - **Proxy types** — HTTP/HTTPS proxies, SOCKS proxies; understand the difference
369
+ - **Proxy configuration** — `requests.get(url, proxies={'http': proxy_url})`; or configure in Scrapy settings
370
+
371
+ ### Cloud-Based Scraping
372
+ - **Headless instances** — Run scrapers on cloud VMs (AWS, GCP, DigitalOcean) for scale
373
+ - **Containerization** — Docker containers for consistent scraper environments
374
+ - **Scheduling** — Use cron, cloud schedulers, or orchestration tools for recurring scrapes
375
+ - **Cost management** — Right-size instances; use spot/preemptible instances for batch scraping
376
+
377
+ ---
378
+
379
+ ## Chapter 18: Legalities and Ethics
380
+
381
+ ### Legal Framework
382
+ - **robots.txt** — Machine-readable file at `/robots.txt`; specifies which paths are allowed/disallowed
383
+ - **Terms of Service** — Many sites prohibit scraping in ToS; understand the legal weight
384
+ - **CFAA** — Computer Fraud and Abuse Act (US); accessing computers "without authorization" is a federal crime
385
+ - **Copyright** — Scraped data may be copyrighted; fair use depends on purpose and amount
386
+ - **GDPR** — If scraping personal data of EU citizens, GDPR obligations apply
387
+
388
+ ### Ethical Scraping
389
+ - **Respect the site** — Don't overload servers; honor rate limits; scrape during off-peak hours
390
+ - **Identify yourself** — Use a descriptive User-Agent; provide contact email for site administrators
391
+ - **Minimize footprint** — Only scrape what you need; don't archive entire sites unnecessarily
392
+ - **Data handling** — Handle scraped personal data responsibly; minimize collection and storage
393
+ - **Give back** — If possible, contribute to the site or community; don't just extract value