@promptbook/website-crawler 0.88.0-8 → 0.88.0
- package/README.md +21 -27
- package/esm/index.es.js +79 -18
- package/esm/index.es.js.map +1 -1
- package/esm/typings/src/_packages/utils.index.d.ts +2 -0
- package/esm/typings/src/config.d.ts +1 -1
- package/esm/typings/src/errors/PipelineExecutionError.d.ts +5 -0
- package/esm/typings/src/errors/utils/ErrorJson.d.ts +5 -0
- package/esm/typings/src/utils/expectation-counters/countCharacters.d.ts +3 -0
- package/esm/typings/src/utils/expectation-counters/countLines.d.ts +3 -0
- package/esm/typings/src/utils/expectation-counters/countPages.d.ts +3 -0
- package/esm/typings/src/utils/expectation-counters/countParagraphs.d.ts +3 -0
- package/esm/typings/src/utils/expectation-counters/countSentences.d.ts +3 -0
- package/esm/typings/src/utils/expectation-counters/countWords.d.ts +3 -0
- package/esm/typings/src/utils/serialization/jsonStringsToJsons.d.ts +9 -0
- package/esm/typings/src/utils/serialization/jsonStringsToJsons.test.d.ts +1 -0
- package/package.json +2 -2
- package/umd/index.umd.js +82 -21
- package/umd/index.umd.js.map +1 -1
package/README.md
CHANGED
@@ -23,10 +23,6 @@
 
 
 
-<blockquote style="color: #ff8811">
-<b>⚠ Warning:</b> This is a pre-release version of the library. It is not yet ready for production use. Please look at <a href="https://www.npmjs.com/package/@promptbook/core?activeTab=versions">latest stable release</a>.
-</blockquote>
-
 ## 📦 Package `@promptbook/website-crawler`
 
 - Promptbooks are [divided into several](#-packages) packages, all are published from [single monorepo](https://github.com/webgptorg/promptbook).
@@ -70,6 +66,9 @@ This shift is going to happen, whether we are ready for it or not. Our mission i
 
 
 
+
+
+
 ## 🚀 Get started
 
 Take a look at the simple starter kit with books integrated into the **Hello World** sample applications:
@@ -81,6 +80,8 @@ Take a look at the simple starter kit with books integrated into the **Hello Wor
 
 
 
+
+
 ## 💜 The Promptbook Project
 
 Promptbook project is ecosystem of multiple projects and tools, following is a list of most important pieces of the project:
@@ -116,6 +117,14 @@ Promptbook project is ecosystem of multiple projects and tools, following is a l
 </tbody>
 </table>
 
+Hello world examples:
+
+- [Hello world](https://github.com/webgptorg/hello-world)
+- [Hello world in Node.js](https://github.com/webgptorg/hello-world-node-js)
+- [Hello world in Next.js](https://github.com/webgptorg/hello-world-next-js)
+
+
+
 We also have a community of developers and users of **Promptbook**:
 
 - [Discord community](https://discord.gg/x3QWNaa89N)
@@ -282,16 +291,9 @@ Or you can install them separately:
 
 ## 📚 Dictionary
 
-
-
-
-
-
-### 📚 Dictionary
-
 The following glossary is used to clarify certain concepts:
 
-
+### General LLM / AI terms
 
 - **Prompt drift** is a phenomenon where the AI model starts to generate outputs that are not aligned with the original prompt. This can happen due to the model's training data, the prompt's wording, or the model's architecture.
 - **Pipeline, workflow or chain** is a sequence of tasks that are executed in a specific order. In the context of AI, a pipeline can refer to a sequence of AI models that are used to process data.
@@ -304,11 +306,11 @@ The following glossary is used to clarify certain concepts:
 
 
 
-_Note:
+_Note: This section is not complete dictionary, more list of general AI / LLM terms that has connection with Promptbook_
 
 
 
-
+### 💯 Core concepts
 
 - [📚 Collection of pipelines](https://github.com/webgptorg/promptbook/discussions/65)
 - [📯 Pipeline](https://github.com/webgptorg/promptbook/discussions/64)
@@ -321,7 +323,7 @@ _Note: Thos section is not complete dictionary, more list of general AI / LLM te
 - [🔣 Words not tokens](https://github.com/webgptorg/promptbook/discussions/29)
 - [☯ Separation of concerns](https://github.com/webgptorg/promptbook/discussions/32)
 
-
+#### Advanced concepts
 
 - [📚 Knowledge (Retrieval-augmented generation)](https://github.com/webgptorg/promptbook/discussions/41)
 - [🌏 Remote server](https://github.com/webgptorg/promptbook/discussions/89)
@@ -338,17 +340,9 @@ _Note: Thos section is not complete dictionary, more list of general AI / LLM te
 
 
 
-
-
-- Anonymous mode
-- Application mode
-
-
-
-## 🔌 Usage in Typescript / Javascript
+## 🚂 Promptbook Engine
 
-
-- [Usage with client and remote server](./examples/usage/remote)
+
 
 ## ➕➖ When to use Promptbook?
 
@@ -414,11 +408,11 @@ See [TODO.md](./TODO.md)
 <div style="display: flex; align-items: center; gap: 20px;">
 
 <a href="https://promptbook.studio/">
-<img src="./design/promptbook-studio-logo.png" alt="Partner 3" height="
+<img src="./design/promptbook-studio-logo.png" alt="Partner 3" height="70">
 </a>
 
 <a href="https://technologickainkubace.org/en/about-technology-incubation/about-the-project/">
-<img src="./other/partners/CI-Technology-Incubation.png" alt="Technology Incubation" height="
+<img src="./other/partners/CI-Technology-Incubation.png" alt="Technology Incubation" height="70">
 </a>
 
 </div>
package/esm/index.es.js
CHANGED
@@ -7,8 +7,8 @@ import { mkdir, rm } from 'fs/promises';
 import { basename, join, dirname } from 'path';
 import { format } from 'prettier';
 import parserHtml from 'prettier/parser-html';
-import { Subject } from 'rxjs';
 import { randomBytes } from 'crypto';
+import { Subject } from 'rxjs';
 import { forTime } from 'waitasecond';
 import sha256 from 'crypto-js/sha256';
 import { lookup, extension } from 'mime-types';
@@ -29,7 +29,7 @@ const BOOK_LANGUAGE_VERSION = '1.0.0';
  * @generated
  * @see https://github.com/webgptorg/promptbook
  */
-const PROMPTBOOK_ENGINE_VERSION = '0.88.0
+const PROMPTBOOK_ENGINE_VERSION = '0.88.0';
 /**
  * TODO: string_promptbook_version should be constrained to the all versions of Promptbook engine
  * Note: [💞] Ignore a discrepancy between file name and entity name
@@ -188,7 +188,7 @@ const DEFAULT_MAX_PARALLEL_COUNT = 5; // <- TODO: [🤹♂️]
  *
  * @public exported from `@promptbook/core`
  */
-const DEFAULT_MAX_EXECUTION_ATTEMPTS =
+const DEFAULT_MAX_EXECUTION_ATTEMPTS = 10; // <- TODO: [🤹♂️]
 // <- TODO: [🕝] Make also `BOOKS_DIRNAME_ALTERNATIVES`
 /**
  * Where to store the temporary downloads
@@ -2154,6 +2154,21 @@ class MissingToolsError extends Error {
     }
 }
 
+/**
+ * Generates random token
+ *
+ * Note: This function is cryptographically secure (it uses crypto.randomBytes internally)
+ *
+ * @private internal helper function
+ * @returns secure random token
+ */
+function $randomToken(randomness) {
+    return randomBytes(randomness).toString('hex');
+}
+/**
+ * TODO: Maybe use nanoid instead https://github.com/ai/nanoid
+ */
+
 /**
  * This error indicates errors during the execution of the pipeline
  *
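The added `$randomToken` helper hex-encodes the output of Node's `crypto.randomBytes`, so a `randomness` of 8 bytes yields 16 hexadecimal characters. A minimal sketch of that behavior, assuming a CommonJS context (the bundle itself uses an ESM import) and using the hypothetical name `randomTokenSketch`:

```javascript
const { randomBytes } = require('node:crypto');

// Hypothetical stand-in that mirrors the added `$randomToken` helper
function randomTokenSketch(randomness) {
    // `randomness` is a byte count; hex encoding produces two characters per byte
    return randomBytes(randomness).toString('hex');
}

const token = randomTokenSketch(8);
console.log(token.length); // 16
console.log(`error-${token}`); // same shape as the id built with `error-${$randomToken(8)}`
```

Because the token is hexadecimal, its alphabet is limited to `0-9a-f`.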
@@ -2161,11 +2176,17 @@ class MissingToolsError extends Error {
  */
 class PipelineExecutionError extends Error {
     constructor(message) {
+        // Added id parameter
         super(message);
         this.name = 'PipelineExecutionError';
+        // TODO: [🐙] DRY - Maybe $randomId
+        this.id = `error-${$randomToken(8 /* <- TODO: To global config + Use Base58 to avoid simmilar char conflicts */)}`;
         Object.setPrototypeOf(this, PipelineExecutionError.prototype);
     }
 }
+/**
+ * TODO: !!!!!! Add id to all errors
+ */
 
 /**
  * Determine if the pipeline is fully prepared
@@ -2205,18 +2226,33 @@ function isPipelinePrepared(pipeline) {
  */
 
 /**
- *
-
- *
- *
- * @private internal helper function
- * @returns secure random token
+ * Recursively converts JSON strings to JSON objects
+
+ * @public exported from `@promptbook/utils`
  */
-function
-
+function jsonStringsToJsons(object) {
+    if (object === null) {
+        return object;
+    }
+    if (Array.isArray(object)) {
+        return object.map(jsonStringsToJsons);
+    }
+    if (typeof object !== 'object') {
+        return object;
+    }
+    const newObject = { ...object };
+    for (const [key, value] of Object.entries(object)) {
+        if (typeof value === 'string' && isValidJsonString(value)) {
+            newObject[key] = JSON.parse(value);
+        }
+        else {
+            newObject[key] = jsonStringsToJsons(value);
+        }
+    }
+    return newObject;
 }
 /**
- * TODO:
+ * TODO: Type the return type correctly
  */
 
 /**
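To see what the added `jsonStringsToJsons` does to a result object, the sketch below repeats the function from the hunk above. The diff does not show the body of `isValidJsonString`, so the version here is a hypothetical stand-in based on `JSON.parse`, and the property names in the sample object are illustrative only.

```javascript
// Hypothetical stand-in: the real `isValidJsonString` is not part of this diff
function isValidJsonString(value) {
    try {
        JSON.parse(value);
        return true;
    } catch (error) {
        return false;
    }
}

// Same logic as the added lines in the hunk above
function jsonStringsToJsons(object) {
    if (object === null) return object;
    if (Array.isArray(object)) return object.map(jsonStringsToJsons);
    if (typeof object !== 'object') return object;
    const newObject = { ...object };
    for (const [key, value] of Object.entries(object)) {
        if (typeof value === 'string' && isValidJsonString(value)) {
            newObject[key] = JSON.parse(value);
        } else {
            newObject[key] = jsonStringsToJsons(value);
        }
    }
    return newObject;
}

// Illustrative input; these property names are assumptions, not the package's schema
const executionResult = {
    output: { answer: '{"city":"Prague","population":1300000}', greeting: 'Hello' },
    items: ['{"a":1}'],
};

const converted = jsonStringsToJsons(executionResult);
console.log(converted.output.answer.city); // 'Prague'
console.log(converted.output.greeting); // 'Hello' (not valid JSON, left as a string)
console.log(typeof converted.items[0]); // 'string'
```

Two properties of the code as written are worth noting. Only string values stored directly under an object key are parsed, so a JSON string sitting directly in an array passes through unchanged. The input object is also not mutated, because each level is copied before assignment.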
@@ -2356,7 +2392,7 @@ const ALL_ERRORS = {
  * @public exported from `@promptbook/utils`
  */
 function deserializeError(error) {
-    const { name, stack } = error;
+    const { name, stack, id } = error; // Added id
     let { message } = error;
     let ErrorClass = ALL_ERRORS[error.name];
     if (ErrorClass === undefined) {
@@ -2371,7 +2407,9 @@ function deserializeError(error) {
             ${block(stack || '')}
         `);
     }
-
+    const deserializedError = new ErrorClass(message);
+    deserializedError.id = id; // Assign id to the error object
+    return deserializedError;
 }
 
 /**
@@ -2421,17 +2459,19 @@ function assertsTaskSuccessful(executionResult) {
  */
 function createTask(options) {
     const { taskType, taskProcessCallback } = options;
+    // TODO: [🐙] DRY
     const taskId = `${taskType.toLowerCase().substring(0, 4)}-${$randomToken(8 /* <- TODO: To global config + Use Base58 to avoid simmilar char conflicts */)}`;
     let status = 'RUNNING';
     const createdAt = new Date();
     let updatedAt = createdAt;
     const errors = [];
     const warnings = [];
-
+    let currentValue = {};
     const partialResultSubject = new Subject();
     // <- Note: Not using `BehaviorSubject` because on error we can't access the last value
     const finalResultPromise = /* not await */ taskProcessCallback((newOngoingResult) => {
         Object.assign(currentValue, newOngoingResult);
+        // <- TODO: assign deep
         partialResultSubject.next(newOngoingResult);
     });
     finalResultPromise
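The new `// <- TODO: assign deep` comment points at a consequence of `Object.assign`: it merges only top-level keys, so a later partial result replaces a nested object wholesale instead of merging into it. A minimal sketch of that effect, with a hypothetical `onPartialResult` callback and illustrative property names that are not taken from the package:

```javascript
let currentValue = {};

// Hypothetical stand-in for the callback passed to `taskProcessCallback`
function onPartialResult(newOngoingResult) {
    Object.assign(currentValue, newOngoingResult); // shallow merge, as in the hunk above
}

onPartialResult({ status: 'RUNNING', usage: { inputTokens: 10 } });
onPartialResult({ usage: { outputTokens: 5 } });

console.log(currentValue);
// `status` survives, but `usage` is now only { outputTokens: 5 }
```

A deep merge would keep `inputTokens` alongside `outputTokens`. Whether that is the desired behavior for partial results is left open by the TODO.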
@@ -2451,7 +2491,8 @@ function createTask(options) {
         // And delete `ExecutionTask.currentValue.preparedPipeline`
         assertsTaskSuccessful(executionResult);
         status = 'FINISHED';
-
+        currentValue = jsonStringsToJsons(executionResult);
+        // <- TODO: [🧠] Is this a good idea to convert JSON strins to JSONs?
         partialResultSubject.next(executionResult);
     }
     catch (error) {
@@ -2515,19 +2556,21 @@ function createTask(options) {
  */
 function serializeError(error) {
     const { name, message, stack } = error;
+    const { id } = error;
     if (!Object.keys(ALL_ERRORS).includes(name)) {
         console.error(spaceTrim$1((block) => `
-
+
             Cannot serialize error with name "${name}"
 
             ${block(stack || message)}
-
+
         `));
     }
     return {
         name: name,
         message,
         stack,
+        id, // Include id in the serialized object
     };
 }
 
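Taken together, the `serializeError` and `deserializeError` changes let an error keep its `id` across a JSON round trip. The constructor generates a fresh id on every instantiation, so the deserializer has to overwrite it with the serialized one. The sketch below is a reduced illustration, not the package's implementation: the registry holds a single class, the fallback to `Error` is an assumption, and a counter replaces the random token so the output is deterministic.

```javascript
let nextId = 0; // deterministic stand-in for the random token used by the package

class PipelineExecutionError extends Error {
    constructor(message) {
        super(message);
        this.name = 'PipelineExecutionError';
        this.id = `error-${nextId++}`;
        Object.setPrototypeOf(this, PipelineExecutionError.prototype);
    }
}

const ALL_ERRORS = { PipelineExecutionError }; // reduced registry for this sketch

function serializeError(error) {
    const { name, message, stack, id } = error;
    return { name, message, stack, id };
}

function deserializeError(json) {
    const { name, message, id } = json;
    const ErrorClass = ALL_ERRORS[name] || Error; // simplified fallback (assumption)
    const deserializedError = new ErrorClass(message);
    deserializedError.id = id; // replace the id the constructor just generated
    return deserializedError;
}

const original = new PipelineExecutionError('Something failed');
const wire = JSON.parse(JSON.stringify(serializeError(original)));
const restored = deserializeError(wire);

console.log(original.id, restored.id); // both 'error-0'
```

Without the explicit assignment in `deserializeError`, the restored error in this sketch would carry `error-1`, and the two ends of a remote call could no longer correlate the same failure.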
@@ -4325,6 +4368,9 @@ function countCharacters(text) {
     text = text.replace(/\p{Extended_Pictographic}(\u{200D}\p{Extended_Pictographic})*/gu, '-');
     return text.length;
 }
+/**
+ * TODO: [🥴] Implement counting in formats - like JSON, CSV, XML,...
+ */
 
 /**
  * Number of characters per standard line with 11pt Arial font size.
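The two context lines of `countCharacters` show how emoji are handled: each pictographic character, including sequences joined with a zero-width joiner (U+200D), is replaced by a single `-` before the length is taken. The sketch below reproduces only those two visible lines under the hypothetical name `countCharactersSketch`; any earlier steps of the real function are not part of this diff.

```javascript
// Reproduces only the two lines visible in the hunk above
function countCharactersSketch(text) {
    text = text.replace(/\p{Extended_Pictographic}(\u{200D}\p{Extended_Pictographic})*/gu, '-');
    return text.length;
}

// Family emoji: three pictographs joined by zero-width joiners
const family = '\u{1F468}\u200D\u{1F469}\u200D\u{1F467}';
const sample = `a${family}b`;

console.log(sample.length); // 10 UTF-16 code units
console.log(countCharactersSketch(sample)); // 3
```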
@@ -4356,6 +4402,9 @@ function countLines(text) {
     const lines = text.split('\n');
     return lines.reduce((count, line) => count + Math.ceil(line.length / CHARACTERS_PER_STANDARD_LINE), 0);
 }
+/**
+ * TODO: [🥴] Implement counting in formats - like JSON, CSV, XML,...
+ */
 
 /**
  * Counts number of pages in the text
@@ -4367,6 +4416,9 @@ function countLines(text) {
 function countPages(text) {
     return Math.ceil(countLines(text) / LINES_PER_STANDARD_PAGE);
 }
+/**
+ * TODO: [🥴] Implement counting in formats - like JSON, CSV, XML,...
+ */
 
 /**
  * Counts number of pages in the text
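`countLines` and `countPages` measure text in "standard" lines and pages rather than raw newlines: a long line counts as several standard lines, and pages are derived from that total. The sketch below uses only the lines visible in the two hunks above. The diff does not show the values of `CHARACTERS_PER_STANDARD_LINE` and `LINES_PER_STANDARD_PAGE`, so the numbers here are assumptions chosen for illustration.

```javascript
const CHARACTERS_PER_STANDARD_LINE = 63; // assumed value, not shown in this diff
const LINES_PER_STANDARD_PAGE = 44; // assumed value, not shown in this diff

// Only the visible part of `countLines`; earlier steps, if any, are not in the diff
function countLines(text) {
    const lines = text.split('\n');
    return lines.reduce((count, line) => count + Math.ceil(line.length / CHARACTERS_PER_STANDARD_LINE), 0);
}

function countPages(text) {
    return Math.ceil(countLines(text) / LINES_PER_STANDARD_PAGE);
}

console.log(countLines('a'.repeat(64))); // 2, one character wraps onto a second standard line
console.log(countLines('first\n\nsecond')); // 2, the empty line contributes Math.ceil(0) = 0
console.log(countPages('x\n'.repeat(45))); // 2, 45 standard lines exceed one assumed page
```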
@@ -4376,6 +4428,9 @@ function countPages(text) {
 function countParagraphs(text) {
     return text.split(/\n\s*\n/).filter((paragraph) => paragraph.trim() !== '').length;
 }
+/**
+ * TODO: [🥴] Implement counting in formats - like JSON, CSV, XML,...
+ */
 
 /**
  * Split text into sentences
@@ -4393,6 +4448,9 @@ function splitIntoSentences(text) {
 function countSentences(text) {
     return splitIntoSentences(text).length;
 }
+/**
+ * TODO: [🥴] Implement counting in formats - like JSON, CSV, XML,...
+ */
 
 /**
  * Counts number of words in the text
@@ -4406,6 +4464,9 @@ function countWords(text) {
     text = text.replace(/([a-z])([A-Z])/g, '$1 $2');
     return text.split(/[^a-zа-я0-9]+/i).filter((word) => word.length > 0).length;
 }
+/**
+ * TODO: [🥴] Implement counting in formats - like JSON, CSV, XML,...
+ */
 
 /**
  * Index of all counter functions
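The paragraph and word counters in the hunks above rely on two small regular-expression steps: paragraphs are separated by blank (or whitespace-only) lines, and words are split on anything outside Latin letters, Cyrillic `а-я`, and digits, after camelCase boundaries have been turned into spaces. The sketch below copies `countParagraphs` as shown and reproduces only the two visible lines of `countWords` under the hypothetical name `countWordsSketch`; earlier normalization steps of the real function are not part of this diff.

```javascript
function countParagraphs(text) {
    return text.split(/\n\s*\n/).filter((paragraph) => paragraph.trim() !== '').length;
}

// Only the two lines visible in the diff; earlier steps of `countWords` are not shown
function countWordsSketch(text) {
    text = text.replace(/([a-z])([A-Z])/g, '$1 $2'); // split camelCase boundaries
    return text.split(/[^a-zа-я0-9]+/i).filter((word) => word.length > 0).length;
}

console.log(countParagraphs('first\n\nsecond\n \nthird')); // 3
console.log(countWordsSketch('helloWorld, foo-bar')); // 4
console.log(countWordsSketch('привет мир')); // 2
```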