@gby/got-scraping 4.1.3
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +164 -0
- package/dist/index.d.ts +284 -0
- package/dist/index.js +1117 -0
- package/dist/index.js.map +1 -0
- package/package.json +67 -0
package/README.md
ADDED
@@ -0,0 +1,164 @@

> # ⚠️⚠️⚠️ `got-scraping` is EOL ⚠️⚠️⚠️
>
> After many years of development, we decided to deprecate the `got-scraping` package.
> The package will no longer receive updates or support.
>
> For new projects, we recommend using [`impit`](https://github.com/apify/impit). `impit` is a modern, powerful, and flexible HTTP client with a `fetch` API based on Rust's `reqwest` library. It provides a similar feature set to `got-scraping`, including browser-like request headers, proxy support, and more.
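
A minimal migration sketch, assuming the `fetch`-style API described in the `impit` README (the `Impit` class and its `browser` option are taken from that project's docs and may differ between versions):

```javascript
import { Impit } from 'impit';

// Emulate a Chrome-like browser fingerprint (assumed constructor option).
const impit = new Impit({ browser: 'chrome' });

const response = await impit.fetch('https://apify.com');
console.log(await response.text());
```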

## Got Scraping

Got Scraping is a small but powerful [`got` extension](https://github.com/sindresorhus/got) with the purpose of sending browser-like requests out of the box. This is essential in the web scraping industry to blend in with regular website traffic.

## Installation

```
$ npm install got-scraping
```

## The module is now ESM only

This means you have to import it with an `import` expression or the dynamic `import()` function. You can do so by either migrating your project to ESM, or by importing `got-scraping` in an async context:

```diff
-const { gotScraping } = require('got-scraping');
+import { gotScraping } from 'got-scraping';
```

If you cannot migrate to ESM, here's an example of how to import it in an async context:

```javascript
let gotScraping;

async function fetchWithGotScraping(url) {
    gotScraping ??= (await import('got-scraping')).gotScraping;

    return gotScraping.get(url);
}
```

**Note:**
> - Node.js >=16 is required due to instability of HTTP/2 support in lower versions.

## API

The Got Scraping package is built using the [`got.extend(...)`](https://github.com/sindresorhus/got/blob/main/documentation/10-instances.md) functionality, therefore it supports all the features Got has.

Interested in what's [under the hood](#under-the-hood)?

```javascript
import { gotScraping } from 'got-scraping';

gotScraping
    .get('https://apify.com')
    .then(({ body }) => console.log(body));
```

### options

#### `proxyUrl`

Type: **`string`**

URL of the HTTP or HTTPS based proxy. HTTP/2 proxies are supported as well.

```javascript
import { gotScraping } from 'got-scraping';

gotScraping
    .get({
        url: 'https://apify.com',
        proxyUrl: 'http://username:password@myproxy.com:1234',
    })
    .then(({ body }) => console.log(body));
```

#### `useHeaderGenerator`

Type: **`boolean`**\
Default: **`true`**

Whether to generate browser-like headers for the request.
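
For example, header generation can be disabled per request when you want full control over the headers being sent (a minimal sketch using only the options documented above):

```javascript
// Skip the generated browser-like headers and send only what you specify.
const response = await gotScraping({
    url: 'https://apify.com',
    useHeaderGenerator: false,
    headers: {
        'user-agent': 'my-custom-client/1.0',
    },
});
```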

#### `headerGeneratorOptions`

See the [`HeaderGeneratorOptions`](https://github.com/apify/fingerprint-suite/tree/master/packages/header-generator#headergeneratoroptions) docs.

```javascript
const response = await gotScraping({
    url: 'https://api.apify.com/v2/browser-info',
    headerGeneratorOptions: {
        browsers: [
            {
                name: 'chrome',
                minVersion: 87,
                maxVersion: 89
            }
        ],
        devices: ['desktop'],
        locales: ['de-DE', 'en-US'],
        operatingSystems: ['windows', 'linux'],
    }
});
```

#### `sessionToken`

A non-primitive unique object which describes the current session. By default, it's `undefined`, so new headers will be generated every time. Headers generated with the same `sessionToken` never change.
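
A small sketch of how a session token might be used (the token is just any object you keep a reference to; the URL is the same example endpoint used above):

```javascript
// Keep one token per logical "session"; requests sharing it get identical generated headers.
const sessionToken = {};

const first = await gotScraping({ url: 'https://api.apify.com/v2/browser-info', sessionToken });
const second = await gotScraping({ url: 'https://api.apify.com/v2/browser-info', sessionToken });
// Both requests above were sent with the same generated browser-like headers.
```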

## Under the hood

Thanks to the included [`header-generator`](https://github.com/apify/fingerprint-suite/tree/master/packages/header-generator) package, you can choose various browsers from different operating systems and devices. It generates all the headers automatically so you can focus on the important stuff instead.

Yet another goal is to simplify the usage of proxies. Just pass the `proxyUrl` option and you are set. Got Scraping automatically detects the HTTP protocol that the proxy server supports. After the connection is established, it performs another ALPN negotiation for the end server. Once that is complete, Got Scraping can proceed with HTTP requests.

Using the same HTTP version that browsers do is important as well. Most modern browsers use HTTP/2, so Got Scraping makes use of it too. Fortunately, this is already supported by Got - it automatically handles [ALPN protocol negotiation](https://en.wikipedia.org/wiki/Application-Layer_Protocol_Negotiation) to select the best available protocol.

HTTP/1.1 headers are always automatically formatted in [`Pascal-Case`](https://pl.wikipedia.org/wiki/PascalCase). However, there is an exception: [`x-`](https://datatracker.ietf.org/doc/html/rfc7231#section-8.3.1) headers are not modified in *any* way.

By default, Got Scraping uses an insecure HTTP parser, which allows it to access websites with non-spec-compliant web servers.
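
If needed, the parser can be switched back to strict mode per request. This is a sketch based on the `insecureHTTPParser` context option exposed in this package's type declarations (see `Context` in `dist/index.d.ts` below):

```javascript
// Opt back into Node's strict HTTP parser for a single request.
const response = await gotScraping({
    url: 'https://apify.com',
    insecureHTTPParser: false,
});
```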

Last but not least, Got Scraping comes with an updated TLS configuration. Some websites fingerprint it and compare it with real browsers. While Node.js doesn't support OpenSSL 3 yet, the current configuration should still work flawlessly.

To get more detailed information about the implementation, please refer to the [source code](https://github.com/apify/got-scraping/blob/master/src/index.ts).

## Tips

This package can only generate the standard attributes. You might want to add the [`referer` header](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Referer) if necessary. Please bear in mind that these headers are made for GET requests for HTML documents. If you want to make POST requests or GET requests for any other content type, you should alter these headers according to your needs. You can do so by passing a `headers` option or writing a custom [Got handler](https://github.com/sindresorhus/got/blob/main/documentation/10-instances.md).

This package should provide a solid start for your browser request emulation process. All websites are built differently, and some of them might require additional special care.

### Overriding request headers

```javascript
const response = await gotScraping({
    url: 'https://apify.com/',
    headers: {
        'user-agent': 'test',
    },
});
```

For more advanced usage, please refer to the [Got documentation](https://github.com/sindresorhus/got/#documentation).

### JSON mode

You can parse JSON with this package too, but please bear in mind that request header generation is done specifically for the `HTML` content type. You might want to alter the generated headers to match the browser ones.

```javascript
const response = await gotScraping({
    responseType: 'json',
    url: 'https://api.apify.com/v2/browser-info',
});
```
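
Since the generated headers target HTML navigation requests, one option is to override the `Accept` header when calling JSON APIs (a minimal sketch; the exact headers a real browser sends for `fetch`/XHR requests differ per browser):

```javascript
const response = await gotScraping({
    responseType: 'json',
    url: 'https://api.apify.com/v2/browser-info',
    headers: {
        accept: 'application/json',
    },
});
```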

### Error recovery

This section covers possible errors that might happen due to different site implementations.

```
RequestError: Client network socket disconnected before secure TLS connection was established
```

The error above can be a result of the server not supporting the provided TLS settings. Try changing the `ciphers` parameter to either `undefined` or a custom value.
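
For example, falling back to Node's default cipher list can be done through Got's `https` options (a sketch; adjust the cipher string to whatever the target server accepts):

```javascript
const response = await gotScraping({
    url: 'https://example.com',
    https: {
        // undefined falls back to the Node.js default cipher suite.
        ciphers: undefined,
    },
});
```
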
package/dist/index.d.ts
ADDED
@@ -0,0 +1,284 @@

// @ts-ignore Patch needed for ES20xx compatibility while this module uses Node16/NodeNext resolutions
import * as got from 'got';
// @ts-ignore Patch needed for ES20xx compatibility while this module uses Node16/NodeNext resolutions
import { Options, OptionsInit as OptionsInit$1, Agents, CancelableRequest, Response, Request, ExtendOptions, HTTPAlias, PaginationOptions, PaginateData, Got } from 'got';
// @ts-ignore Patch needed for ES20xx compatibility while this module uses Node16/NodeNext resolutions
export * from 'got';
// @ts-ignore Patch needed for ES20xx compatibility while this module uses Node16/NodeNext resolutions
export { OptionsInit as GotOptionsInit } from 'got';
import http, { Agent, ClientRequest, ClientRequestArgs, AgentOptions } from 'node:http';
import https from 'node:https';
import { HeaderGenerator } from 'header-generator';
import { NetConnectOpts } from 'node:net';
import { Duplex } from 'node:stream';
import { URL as URL$1 } from 'node:url';

/**
 * @see https://github.com/nodejs/node/blob/533cafcf7e3ab72e98a2478bc69aedfdf06d3a5e/lib/_http_client.js#L129-L162
 * @see https://github.com/nodejs/node/blob/533cafcf7e3ab72e98a2478bc69aedfdf06d3a5e/lib/_http_client.js#L234-L246
 * @see https://github.com/nodejs/node/blob/533cafcf7e3ab72e98a2478bc69aedfdf06d3a5e/lib/_http_client.js#L304-L305
 * Wraps an existing Agent instance,
 * so there's no need to replace `agent.addRequest`.
 */
declare class WrappedAgent<T extends Agent> implements Agent {
    agent: T;
    constructor(agent: T);
    addRequest(request: ClientRequest, options: ClientRequestArgs): void;
    get keepAlive(): boolean;
    get maxSockets(): Agent['maxSockets'];
    get options(): AgentOptions;
    get defaultPort(): number;
    get protocol(): string;
    destroy(): void;
    get maxFreeSockets(): Agent['maxFreeSockets'];
    get maxTotalSockets(): Agent['maxTotalSockets'];
    get freeSockets(): Agent['freeSockets'];
    get sockets(): Agent['sockets'];
    get requests(): Agent['requests'];
    on(eventName: string | symbol, listener: (...args: any[]) => void): this;
    once(eventName: string | symbol, listener: (...args: any[]) => void): this;
    off(eventName: string | symbol, listener: (...args: any[]) => void): this;
    addListener(eventName: string | symbol, listener: (...args: any[]) => void): this;
    removeListener(eventName: string | symbol, listener: (...args: any[]) => void): this;
    removeAllListeners(eventName?: string | symbol): this;
    setMaxListeners(n: number): this;
    getMaxListeners(): number;
    listeners(eventName: Parameters<Agent['listeners']>[0]): ReturnType<Agent['listeners']>;
    rawListeners(eventName: Parameters<Agent['rawListeners']>[0]): ReturnType<Agent['rawListeners']>;
    emit(eventName: string | symbol, ...args: any[]): boolean;
    eventNames(): (string | symbol)[];
    listenerCount(eventName: string | symbol): number;
    prependListener(eventName: string | symbol, listener: (...args: any[]) => void): this;
    prependOnceListener(eventName: string | symbol, listener: (...args: any[]) => void): this;
    createConnection(options: NetConnectOpts, callback?: (err: Error | null, stream: Duplex) => void): Duplex;
    keepSocketAlive(socket: Duplex): void;
    reuseSocket(socket: Duplex, request: ClientRequest): void;
    getName(options?: any): string;
}

/**
 * Transforms the casing of the headers to Pascal-Case.
 */
declare class TransformHeadersAgent<T extends Agent> extends WrappedAgent<T> {
    /**
     * Transforms the request via header normalization.
     */
    transformRequest(request: ClientRequest, { sortHeaders }: {
        sortHeaders: boolean;
    }): void;
    addRequest(request: ClientRequest, options: ClientRequestArgs): void;
    toPascalCase(header: string): string;
}

declare function browserHeadersHook(options: Options): Promise<void>;

declare function customOptionsHook(raw: OptionsInit$1, options: Options): void;

declare function http2Hook(options: Options): void;

declare function insecureParserHook(options: Options): void;

declare function optionsValidationHandler(options: unknown): void;

declare function proxyHook(options: Options): Promise<void>;
declare function getAgents(parsedProxyUrl: URL$1, rejectUnauthorized: boolean): Promise<Agents>;

declare function tlsHook(options: Options): void;

interface Context {
    proxyUrl?: string;
    headerGeneratorOptions?: Record<string, unknown>;
    useHeaderGenerator?: boolean;
    headerGenerator?: {
        getHeaders: (options: Record<string, unknown>) => Record<string, string>;
    };
    insecureHTTPParser?: boolean;
    sessionToken?: object;
    /** @private */
    sessionData?: unknown;
    /** @private */
    resolveProtocol?: (data: unknown) => {
        alpnProtocol: string;
    } | Promise<{
        alpnProtocol: string;
    }>;
}
type OptionsInit = OptionsInit$1 & Context;

type Except<ObjectType, KeysType extends keyof ObjectType> = Pick<ObjectType, Exclude<keyof ObjectType, KeysType>>;
type Merge<FirstType, SecondType> = Except<FirstType, Extract<keyof FirstType, keyof SecondType>> & SecondType;
type ExtendedGotRequestFunction = {
    (url: string | URL, options?: ExtendedOptionsOfTextResponseBody): CancelableRequest<Response<string>>;
    <T>(url: string | URL, options?: ExtendedOptionsOfJSONResponseBody): CancelableRequest<Response<T>>;
    (url: string | URL, options?: ExtendedOptionsOfBufferResponseBody): CancelableRequest<Response<Buffer>>;
    (url: string | URL, options?: ExtendedOptionsOfUnknownResponseBody): CancelableRequest<Response>;
    (options: ExtendedOptionsOfTextResponseBody): CancelableRequest<Response<string>>;
    <T>(options: ExtendedOptionsOfJSONResponseBody): CancelableRequest<Response<T>>;
    (options: ExtendedOptionsOfBufferResponseBody): CancelableRequest<Response<Buffer>>;
    (options: ExtendedOptionsOfUnknownResponseBody): CancelableRequest<Response>;
    (url: string | URL, options?: (Merge<ExtendedOptionsOfTextResponseBody, ResponseBodyOnly>)): CancelableRequest<string>;
    <T>(url: string | URL, options?: (Merge<ExtendedOptionsOfJSONResponseBody, ResponseBodyOnly>)): CancelableRequest<T>;
    (url: string | URL, options?: (Merge<ExtendedOptionsOfBufferResponseBody, ResponseBodyOnly>)): CancelableRequest<Buffer>;
    (options: (Merge<ExtendedOptionsOfTextResponseBody, ResponseBodyOnly>)): CancelableRequest<string>;
    <T>(options: (Merge<ExtendedOptionsOfJSONResponseBody, ResponseBodyOnly>)): CancelableRequest<T>;
    (options: (Merge<ExtendedOptionsOfBufferResponseBody, ResponseBodyOnly>)): CancelableRequest<Buffer>;
    (url: string | URL, options?: Merge<OptionsInit, {
        isStream: true;
    }>): Request;
    (options: Merge<OptionsInit, {
        isStream: true;
    }>): Request;
    (url: string | URL, options?: OptionsInit): CancelableRequest | Request;
    (options: OptionsInit): CancelableRequest | Request;
    (url: undefined, options: undefined, defaults: Options): CancelableRequest | Request;
};
type ExtendedOptionsOfTextResponseBody = Merge<OptionsInit, {
    isStream?: false;
    resolveBodyOnly?: false;
    responseType?: 'text';
}>;
type ExtendedOptionsOfJSONResponseBody = Merge<OptionsInit, {
    isStream?: false;
    resolveBodyOnly?: false;
    responseType?: 'json';
}>;
type ExtendedOptionsOfBufferResponseBody = Merge<OptionsInit, {
    isStream?: false;
    resolveBodyOnly?: false;
    responseType: 'buffer';
}>;
type ExtendedOptionsOfUnknownResponseBody = Merge<OptionsInit, {
    isStream?: false;
    resolveBodyOnly?: false;
}>;
type ResponseBodyOnly = {
    resolveBodyOnly: true;
};
type ExtendedGotStreamFunction = ((url?: string | URL, options?: Merge<OptionsInit, {
    isStream?: true;
}>) => Request) & ((options?: Merge<OptionsInit, {
    isStream?: true;
}>) => Request);
type ExtendedExtendOptions = ExtendOptions & OptionsInit;
type ExtendedGotStream = ExtendedGotStreamFunction & Record<HTTPAlias, ExtendedGotStreamFunction>;
type ExtendedPaginationOptions<ElementType, BodyType> = PaginationOptions<ElementType, BodyType> & {
    paginate?: (data: PaginateData<BodyType, ElementType>) => OptionsInit | false;
};
type ExtendedOptionsWithPagination<T = unknown, R = unknown> = Merge<OptionsInit, {
    pagination?: ExtendedPaginationOptions<T, R>;
}>;
type ExtendedGotPaginate = {
    /**
    Same as `GotPaginate.each`.
    */
    <T, R = unknown>(url: string | URL, options?: ExtendedOptionsWithPagination<T, R>): AsyncIterableIterator<T>;
    /**
    Same as `GotPaginate.each`.
    */
    <T, R = unknown>(options?: ExtendedOptionsWithPagination<T, R>): AsyncIterableIterator<T>;
    /**
    Returns an async iterator.

    See pagination.options for more pagination options.

    @example
    ```
    import { gotScraping } from 'got-scraping';

    const countLimit = 10;

    const pagination = gotScraping.paginate('https://api.github.com/repos/sindresorhus/got/commits', {
        pagination: { countLimit }
    });

    console.log(`Printing latest ${countLimit} Got commits (newest to oldest):`);

    for await (const commitData of pagination) {
        console.log(commitData.commit.message);
    }
    ```
    */
    each: (<T, R = unknown>(url: string | URL, options?: ExtendedOptionsWithPagination<T, R>) => AsyncIterableIterator<T>) & (<T, R = unknown>(options?: ExtendedOptionsWithPagination<T, R>) => AsyncIterableIterator<T>);
    /**
    Returns a Promise for an array of all results.

    See pagination.options for more pagination options.

    @example
    ```
    import { gotScraping } from 'got-scraping';

    const countLimit = 10;

    const results = await gotScraping.paginate.all('https://api.github.com/repos/sindresorhus/got/commits', {
        pagination: { countLimit }
    });

    console.log(`Printing latest ${countLimit} Got commits (newest to oldest):`);
    console.log(results);
    ```
    */
    all: (<T, R = unknown>(url: string | URL, options?: ExtendedOptionsWithPagination<T, R>) => Promise<T[]>) & (<T, R = unknown>(options?: ExtendedOptionsWithPagination<T, R>) => Promise<T[]>);
};
type GotScraping = {
    stream: ExtendedGotStream;
    paginate: ExtendedGotPaginate;
    defaults: Got['defaults'];
    extend: (...instancesOrOptions: Array<GotScraping | ExtendedExtendOptions>) => GotScraping;
} & Record<HTTPAlias, ExtendedGotRequestFunction> & ExtendedGotRequestFunction;

/**
 * Create the options for the Got Scraping instance.
 * @returns The extend options (ExtendOptions) for got.
 */
declare function createGotScrapingOptions(): {
    handlers: got.HandlerFunction[];
    mutableDefaults: boolean;
    http2: boolean;
    https: {
        rejectUnauthorized: boolean;
    };
    throwHttpErrors: boolean;
    timeout: {
        request: number;
    };
    retry: {
        limit: number;
    };
    headers: {
        'user-agent': undefined;
    };
    context: {
        headerGenerator: HeaderGenerator;
        useHeaderGenerator: boolean;
        insecureHTTPParser: boolean;
    };
    agent: {
        http: TransformHeadersAgent<http.Agent>;
        https: TransformHeadersAgent<https.Agent>;
    };
    hooks: {
        init: (typeof customOptionsHook)[];
        beforeRequest: (typeof insecureParserHook)[];
        beforeRedirect: got.BeforeRedirectHook[];
    };
};
declare const gotScraping: GotScraping;

declare const hooks: {
    init: (typeof customOptionsHook)[];
    beforeRequest: (typeof insecureParserHook)[];
    beforeRedirect: got.BeforeRedirectHook[];
    fixDecompress: got.HandlerFunction;
    insecureParserHook: typeof insecureParserHook;
    sessionDataHook: (options: Options) => void;
    http2Hook: typeof http2Hook;
    proxyHook: typeof proxyHook;
    browserHeadersHook: typeof browserHeadersHook;
    tlsHook: typeof tlsHook;
    optionsValidationHandler: typeof optionsValidationHandler;
    customOptionsHook: typeof customOptionsHook;
    refererHook: got.BeforeRedirectHook;
};

export { Context, ExtendedExtendOptions, ExtendedGotPaginate, ExtendedGotRequestFunction, ExtendedGotStream, ExtendedGotStreamFunction, ExtendedOptionsOfBufferResponseBody, ExtendedOptionsOfJSONResponseBody, ExtendedOptionsOfTextResponseBody, ExtendedOptionsOfUnknownResponseBody, ExtendedOptionsWithPagination, ExtendedPaginationOptions, GotScraping, OptionsInit, ResponseBodyOnly, TransformHeadersAgent, createGotScrapingOptions, getAgents, gotScraping, hooks };