contextractor 0.3.1 → 0.3.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (2)
  1. package/README.md +7 -5
  2. package/package.json +1 -1
package/README.md CHANGED
@@ -2,7 +2,9 @@
 
  Extract clean, readable content from any website using [Trafilatura](https://trafilatura.readthedocs.io/).
 
- Available as: [npm CLI](#install) | [Docker](#docker) | [Apify actor](https://apify.com/shortc/contextractor) | [Web app](https://contextractor.com)
+ Available as: [npm CLI](#install) | [Docker](#docker) | [Apify actor](https://apify.com/glueo/contextractor)
+
+ Try the [Playground](https://contextractor.com) to configure extraction settings and preview commands before running.
 
  ## Install
 
@@ -22,7 +24,7 @@ Works with zero config. Pass URLs directly, or use a config file for complex set
 
  ```bash
  contextractor https://example.com --precision --format json -o ./results
- contextractor --config config.yaml --max-pages 10
+ contextractor --config config.json --max-pages 10
  ```
 
  ### CLI Options
@@ -31,7 +33,7 @@ contextractor --config config.yaml --max-pages 10
  contextractor [OPTIONS] [URLS...]
 
  Crawl Settings:
- --config, -c Path to YAML or JSON config file
+ --config, -c Path to JSON config file
  --output-dir, -o Output directory
  --format, -f Output format (txt, markdown, json, jsonl, xml, xmltei)
  --max-pages Max pages to crawl (0 = unlimited)
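The hunks in this diff drop YAML from the documented `--config` formats: 0.3.1 listed "YAML or JSON", while 0.3.3 lists only JSON. The diff touches only the README and the version string, so whether 0.3.3 actually rejects a YAML file at runtime is not shown here. As a hedged sketch for anyone migrating, a config could be validated before it is passed to `--config`. `parse_config` and `load_config` are hypothetical helpers, not part of contextractor; they check only that the file is a JSON object (the README example opens with `{`), not which keys the CLI accepts.

```python
import json
from pathlib import Path


def parse_config(name: str, text: str) -> dict:
    """Validate config text as a JSON object.

    Hypothetical helper, not part of contextractor: it checks only that the
    text is a JSON object and does not know which keys the CLI accepts.
    """
    # Catch leftover YAML files early, since the README no longer lists YAML.
    if Path(name).suffix.lower() in {".yaml", ".yml"}:
        raise ValueError(f"{name}: YAML is no longer documented; convert it to JSON")
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"{name}: invalid JSON ({exc.msg}, line {exc.lineno})") from exc
    # Assumption: the README example starts with "{", so expect a top-level object.
    if not isinstance(data, dict):
        raise ValueError(f"{name}: top-level value must be a JSON object")
    return data


def load_config(path: str) -> dict:
    """Read a config file from disk and validate it with parse_config."""
    p = Path(path)
    return parse_config(p.name, p.read_text(encoding="utf-8"))
```

For example, calling `load_config("config.json")` before running `contextractor --config config.json` surfaces a syntax error with its line number, and a leftover `config.yaml` raises a `ValueError` that points at the conversion.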
@@ -95,7 +97,7 @@ CLI flags override config file settings. Merge order: `defaults → config file
 
  ### Config File (optional)
 
- Supports both JSON and YAML format. JSON examples shown below:
+ Use a JSON config file to set options:
 
  ```json
  {
@@ -214,7 +216,7 @@ docker run -v ./output:/output ghcr.io/contextractor/contextractor https://examp
  Use a config file:
 
  ```bash
- docker run -v ./config.yaml:/config.yaml ghcr.io/contextractor/contextractor --config /config.yaml
+ docker run -v ./config.json:/config.json ghcr.io/contextractor/contextractor --config /config.json
  ```
 
  All CLI flags work the same inside Docker.
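The `@@ -95,7 +97,7 @@` hunk header quotes the README: CLI flags override config file settings, and the merge order begins `defaults → config file` (the rest of that line is cut off in this view). A minimal sketch of that layering follows, assuming a shallow merge and treating a flag that was not given as `None`. `merge_settings` and every key and value in the example are hypothetical; they are not contextractor's actual defaults or config schema.

```python
from typing import Any, Mapping


def merge_settings(
    defaults: Mapping[str, Any],
    config_file: Mapping[str, Any],
    cli_flags: Mapping[str, Any],
) -> dict[str, Any]:
    """Layer settings so later sources win: defaults, then config file, then CLI.

    Assumptions: shallow merge; a CLI entry set to None means "flag not given".
    """
    merged: dict[str, Any] = dict(defaults)
    merged.update(config_file)
    merged.update({k: v for k, v in cli_flags.items() if v is not None})
    return merged


# Hypothetical keys and values, for illustration only.
defaults = {"format": "txt", "max_pages": 0, "output_dir": "./output"}
config_file = {"format": "markdown", "max_pages": 50}
cli_flags = {"format": None, "max_pages": 10}

print(merge_settings(defaults, config_file, cli_flags))
# → {'format': 'markdown', 'max_pages': 10, 'output_dir': './output'}
```

Here the config file's `format` survives because no format flag was passed, while the command-line `max_pages` overrides the file's value, matching "CLI flags override config file settings."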
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "contextractor",
- "version": "0.3.1",
+ "version": "0.3.3",
  "description": "Extract web content from URLs with configurable extraction options",
  "license": "MIT",
  "repository": {