apify 3.0.4-beta.37 → 3.0.4-beta.45
- package/.turbo/turbo-build.log +1 -1
- package/.turbo/turbo-copy.log +2 -2
- package/README.md +15 -30
- package/dist/README.md +15 -30
- package/dist/package.json +5 -5
- package/package.json +5 -5
package/.turbo/turbo-build.log
CHANGED
package/.turbo/turbo-copy.log
CHANGED
@@ -1,5 +1,5 @@
-apify:copy: cache hit, replaying output
+apify:copy: cache hit, replaying output 8040d99fe3b191ed
 apify:copy:
-apify:copy: > apify@3.0.4-beta.
+apify:copy: > apify@3.0.4-beta.45+e2198a19 copy
 apify:copy: > ts-node -T ../../scripts/copy.ts --readme=local
 apify:copy:
package/README.md
CHANGED
@@ -5,42 +5,31 @@
 [](https://discord.gg/jyEM2PRvMU)
 [](https://github.com/apify/apify-sdk-js/actions/workflows/test-and-release.yml)

-Apify SDK provides the tools required to run your own Apify
-a brand-new module - [`crawlee`](https://npmjs.org/crawlee) (which you can use outside Apify too!), while keeping the Apify specific parts in this module!
+Apify SDK provides the tools required to run your own Apify actors. The crawlers and scraping related tools, previously included in Apify SDK (v2), have been split into a brand-new module - [`crawlee`](https://npmjs.org/crawlee), while keeping the Apify specific parts in this module.

-> Would you like to work with us on Crawlee, Apify SDK or similar projects?
+> Would you like to work with us on Crawlee, Apify SDK or similar projects? We are hiring [Node.js engineers](https://apify.com/jobs#senior-node.js-engineer).

 ## Upgrading from v2

-A lot of things have changed since version 2 of the Apify SDK, including the split of the crawlers to the new [`crawlee`](https://npmjs.org/crawlee) module.
-But fear not, as we've written a guide to help you easily migrate from v2 to v3! Visit the [Upgrading Guide](https://crawlee.dev/docs/upgrading/upgrading-to-v3)
-to find out what changes you need to make (especially the section related to this very [Apify SDK](https://crawlee.dev/docs/upgrading/upgrading-to-v3#apify-sdk)),
-and, if you encounter any issues, join our [Discord server](https://discord.gg/jyEM2PRvMU) for help!
+A lot of things have changed since version 2 of the Apify SDK, including the split of the crawlers to the new [`crawlee`](https://npmjs.org/crawlee) module. We've written a guide to help you easily migrate from v2 to v3. Visit the [Upgrading Guide](https://sdk.apify.com/docs/upgrading/upgrading-to-v3) to find out what changes you need to make (especially the section related to this very [Apify SDK](https://sdk.apify.com/docs/upgrading/upgrading-to-v3#apify-sdk)), and, if you encounter any issues, join our [Discord server](https://discord.gg/jyEM2PRvMU) for help!

 ## Quick Start

 This short tutorial will set you up to start using Apify SDK in a minute or two.
-If you want to learn more, proceed to the [Apify Platform](https://
+If you want to learn more, proceed to the [Apify Platform](https://sdk.apify.com/docs/guides/apify-platform)
 guide that will take you step by step through running your actor on Apify's platform.

-
-
-Apify SDK requires [Node.js](https://nodejs.org/en/) 16 or later.
-Add Apify SDK to any Node.js project by running:
+Apify SDK requires [Node.js](https://nodejs.org/en/) 16 or later. Add Apify SDK to any Node.js project by running:

 ```bash
 npm install apify crawlee playwright
 ```

-> For this example, we'll also install the [`crawlee`](https://npmjs.org/crawlee) module, as it now provides the crawlers that were previously exported
-> by Apify SDK. If you don't plan to use crawlers in your actors, then you don't need to install it!
-> Keep in mind that neither `playwright` nor `puppeteer` are bundled with `crawlee` in order to reduce install size and allow greater
-> flexibility. That's why we manually install it with NPM. You can choose one, both, or neither.
+> For this example, we'll also install the [`crawlee`](https://npmjs.org/crawlee) module, as it now provides the crawlers that were previously exported by Apify SDK. If you don't plan to use crawlers in your actors, then you don't need to install it. Keep in mind that neither `playwright` nor `puppeteer` are bundled with `crawlee` in order to reduce install size and allow greater flexibility. That's why we manually install it with NPM. You can choose one, both, or neither.

-There are two ways to initialize your actor: by using the `Actor.main()` function you're probably used to, or by calling `Actor.init()` and `Actor.exit()` manually
-which also helps reduce the indentation level of your code, to keep it more readable.
+There are two ways to initialize your actor: by using the `Actor.main()` function you're probably used to, or by calling `Actor.init()` and `Actor.exit()` manually. We prefer explicitly calling `init` and `exit`.

-
+### Using `Actor.init()` and `Actor.exit()`

 ```typescript
 import { Actor } from 'apify';
@@ -54,19 +43,17 @@ const crawler = new PlaywrightCrawler({
         const title = await page.title();
         console.log(`Title of ${request.url}: ${title}`);

-        // Add URLs that
-        await enqueueLinks(
-            globs: ['https://www.iana.org/**'],
-        });
+        // Add URLs that point to the same hostname.
+        await enqueueLinks();
     },
 });

-await crawler.run(['https://
+await crawler.run(['https://crawlee.dev/']);

 await Actor.exit();
 ```

-
+### Using `Actor.main()`

 ```typescript
 import { Actor } from 'apify';
@@ -79,14 +66,12 @@ await Actor.main(async () => {
             const title = await page.title();
             console.log(`Title of ${request.url}: ${title}`);

-            // Add URLs that
-            await enqueueLinks(
-                globs: ['https://www.iana.org/**'],
-            });
+            // Add URLs that point to the same hostname.
+            await enqueueLinks();
         },
     });

-    await crawler.run(['https://
+    await crawler.run(['https://crawlee.dev/']);
 });
 ```
package/dist/README.md
CHANGED
@@ -5,42 +5,31 @@
 [](https://discord.gg/jyEM2PRvMU)
 [](https://github.com/apify/apify-sdk-js/actions/workflows/test-and-release.yml)

-Apify SDK provides the tools required to run your own Apify
-a brand-new module - [`crawlee`](https://npmjs.org/crawlee) (which you can use outside Apify too!), while keeping the Apify specific parts in this module!
+Apify SDK provides the tools required to run your own Apify actors. The crawlers and scraping related tools, previously included in Apify SDK (v2), have been split into a brand-new module - [`crawlee`](https://npmjs.org/crawlee), while keeping the Apify specific parts in this module.

-> Would you like to work with us on Crawlee, Apify SDK or similar projects?
+> Would you like to work with us on Crawlee, Apify SDK or similar projects? We are hiring [Node.js engineers](https://apify.com/jobs#senior-node.js-engineer).

 ## Upgrading from v2

-A lot of things have changed since version 2 of the Apify SDK, including the split of the crawlers to the new [`crawlee`](https://npmjs.org/crawlee) module.
-But fear not, as we've written a guide to help you easily migrate from v2 to v3! Visit the [Upgrading Guide](https://crawlee.dev/docs/upgrading/upgrading-to-v3)
-to find out what changes you need to make (especially the section related to this very [Apify SDK](https://crawlee.dev/docs/upgrading/upgrading-to-v3#apify-sdk)),
-and, if you encounter any issues, join our [Discord server](https://discord.gg/jyEM2PRvMU) for help!
+A lot of things have changed since version 2 of the Apify SDK, including the split of the crawlers to the new [`crawlee`](https://npmjs.org/crawlee) module. We've written a guide to help you easily migrate from v2 to v3. Visit the [Upgrading Guide](https://sdk.apify.com/docs/upgrading/upgrading-to-v3) to find out what changes you need to make (especially the section related to this very [Apify SDK](https://sdk.apify.com/docs/upgrading/upgrading-to-v3#apify-sdk)), and, if you encounter any issues, join our [Discord server](https://discord.gg/jyEM2PRvMU) for help!

 ## Quick Start

 This short tutorial will set you up to start using Apify SDK in a minute or two.
-If you want to learn more, proceed to the [Apify Platform](https://
+If you want to learn more, proceed to the [Apify Platform](https://sdk.apify.com/docs/guides/apify-platform)
 guide that will take you step by step through running your actor on Apify's platform.

-
-
-Apify SDK requires [Node.js](https://nodejs.org/en/) 16 or later.
-Add Apify SDK to any Node.js project by running:
+Apify SDK requires [Node.js](https://nodejs.org/en/) 16 or later. Add Apify SDK to any Node.js project by running:

 ```bash
 npm install apify crawlee playwright
 ```

-> For this example, we'll also install the [`crawlee`](https://npmjs.org/crawlee) module, as it now provides the crawlers that were previously exported
-> by Apify SDK. If you don't plan to use crawlers in your actors, then you don't need to install it!
-> Keep in mind that neither `playwright` nor `puppeteer` are bundled with `crawlee` in order to reduce install size and allow greater
-> flexibility. That's why we manually install it with NPM. You can choose one, both, or neither.
+> For this example, we'll also install the [`crawlee`](https://npmjs.org/crawlee) module, as it now provides the crawlers that were previously exported by Apify SDK. If you don't plan to use crawlers in your actors, then you don't need to install it. Keep in mind that neither `playwright` nor `puppeteer` are bundled with `crawlee` in order to reduce install size and allow greater flexibility. That's why we manually install it with NPM. You can choose one, both, or neither.

-There are two ways to initialize your actor: by using the `Actor.main()` function you're probably used to, or by calling `Actor.init()` and `Actor.exit()` manually
-which also helps reduce the indentation level of your code, to keep it more readable.
+There are two ways to initialize your actor: by using the `Actor.main()` function you're probably used to, or by calling `Actor.init()` and `Actor.exit()` manually. We prefer explicitly calling `init` and `exit`.

-
+### Using `Actor.init()` and `Actor.exit()`

 ```typescript
 import { Actor } from 'apify';
@@ -54,19 +43,17 @@ const crawler = new PlaywrightCrawler({
         const title = await page.title();
         console.log(`Title of ${request.url}: ${title}`);

-        // Add URLs that
-        await enqueueLinks(
-            globs: ['https://www.iana.org/**'],
-        });
+        // Add URLs that point to the same hostname.
+        await enqueueLinks();
     },
 });

-await crawler.run(['https://
+await crawler.run(['https://crawlee.dev/']);

 await Actor.exit();
 ```

-
+### Using `Actor.main()`

 ```typescript
 import { Actor } from 'apify';
@@ -79,14 +66,12 @@ await Actor.main(async () => {
             const title = await page.title();
             console.log(`Title of ${request.url}: ${title}`);

-            // Add URLs that
-            await enqueueLinks(
-                globs: ['https://www.iana.org/**'],
-            });
+            // Add URLs that point to the same hostname.
+            await enqueueLinks();
         },
     });

-    await crawler.run(['https://
+    await crawler.run(['https://crawlee.dev/']);
 });
 ```
package/dist/package.json
CHANGED
@@ -1,6 +1,6 @@
 {
     "name": "apify",
-    "version": "3.0.4-beta.
+    "version": "3.0.4-beta.45+e2198a19",
     "description": "The scalable web crawling and scraping library for JavaScript/Node.js. Enables development of data extraction and web automation jobs (not only) with headless Chrome and Puppeteer.",
     "engines": {
         "node": ">=16.0.0"
@@ -57,13 +57,13 @@
         "@apify/consts": "^2.0.0",
         "@apify/log": "^2.1.0",
         "@apify/utilities": "^2.1.1",
-        "@crawlee/core": "^3.0.
-        "@crawlee/types": "^3.0.
-        "@crawlee/utils": "^3.0.
+        "@crawlee/core": "^3.0.3",
+        "@crawlee/types": "^3.0.3",
+        "@crawlee/utils": "^3.0.3",
         "apify-client": "^2.6.0",
         "ow": "^0.28.1",
         "semver": "^7.3.7",
         "ws": "^7.5.8"
     },
-    "gitHead": "
+    "gitHead": "e2198a19a0fd8aa91c95f13f337dcec214f2bbc6"
 }
package/package.json
CHANGED
@@ -1,6 +1,6 @@
 {
     "name": "apify",
-    "version": "3.0.4-beta.
+    "version": "3.0.4-beta.45+e2198a19",
     "description": "The scalable web crawling and scraping library for JavaScript/Node.js. Enables development of data extraction and web automation jobs (not only) with headless Chrome and Puppeteer.",
     "engines": {
         "node": ">=16.0.0"
@@ -57,13 +57,13 @@
         "@apify/consts": "^2.0.0",
         "@apify/log": "^2.1.0",
         "@apify/utilities": "^2.1.1",
-        "@crawlee/core": "^3.0.
-        "@crawlee/types": "^3.0.
-        "@crawlee/utils": "^3.0.
+        "@crawlee/core": "^3.0.3",
+        "@crawlee/types": "^3.0.3",
+        "@crawlee/utils": "^3.0.3",
         "apify-client": "^2.6.0",
         "ow": "^0.28.1",
         "semver": "^7.3.7",
         "ws": "^7.5.8"
     },
-    "gitHead": "
+    "gitHead": "e2198a19a0fd8aa91c95f13f337dcec214f2bbc6"
 }
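
A note on the new version string: `3.0.4-beta.45+e2198a19` carries a `+` suffix that matches the first eight characters of the new `gitHead`. Under Semantic Versioning 2.0.0, everything after `+` is build metadata and is ignored when ordering versions, so precedence between the two releases is decided by the prerelease identifiers (`beta.37` vs `beta.45`). The sketch below illustrates that ordering rule. It is not the implementation used by the `semver` package listed in the dependencies; `parseVersion`, `compareIdentifiers`, and `comparePrecedence` are hypothetical helper names, and the parser assumes only the `major.minor.patch-prerelease+build` shape seen in this diff.

```typescript
// Hypothetical helpers for illustration only; not part of apify or semver.
interface ParsedVersion {
    core: number[];       // [major, minor, patch]
    prerelease: string[]; // e.g. ['beta', '45']
    build: string;        // e.g. 'e2198a19'; ignored for precedence
}

function parseVersion(version: string): ParsedVersion {
    // Build metadata follows the first '+'.
    const [rest, build = ''] = version.split('+', 2);
    // Prerelease identifiers follow the first '-'.
    const dash = rest.indexOf('-');
    const corePart = dash === -1 ? rest : rest.slice(0, dash);
    const prePart = dash === -1 ? '' : rest.slice(dash + 1);
    return {
        core: corePart.split('.').map(Number),
        prerelease: prePart === '' ? [] : prePart.split('.'),
        build,
    };
}

function compareIdentifiers(a: string, b: string): number {
    const aNum = /^\d+$/.test(a);
    const bNum = /^\d+$/.test(b);
    if (aNum && bNum) return Number(a) - Number(b); // numeric identifiers compare as numbers
    if (aNum) return -1;                            // numeric ranks below alphanumeric
    if (bNum) return 1;
    return a < b ? -1 : a > b ? 1 : 0;              // otherwise compare in ASCII order
}

function comparePrecedence(x: string, y: string): number {
    const a = parseVersion(x);
    const b = parseVersion(y);
    for (let i = 0; i < 3; i++) {
        const diff = a.core[i] - b.core[i];
        if (diff !== 0) return diff;
    }
    // A version without prerelease identifiers outranks one that has them.
    if (a.prerelease.length === 0 || b.prerelease.length === 0) {
        return b.prerelease.length - a.prerelease.length;
    }
    const shared = Math.min(a.prerelease.length, b.prerelease.length);
    for (let i = 0; i < shared; i++) {
        const diff = compareIdentifiers(a.prerelease[i], b.prerelease[i]);
        if (diff !== 0) return diff;
    }
    // With equal leading identifiers, the longer prerelease list ranks higher.
    return a.prerelease.length - b.prerelease.length;
}

// beta.37 sorts before beta.45, and the '+e2198a19' suffix plays no part.
console.log(comparePrecedence('3.0.4-beta.37', '3.0.4-beta.45+e2198a19') < 0); // true
```

Comparing the numeric identifiers as numbers rather than strings matters here: a plain string comparison would place `beta.9` after `beta.10`.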