@promptbook/website-crawler 0.88.0-8 → 0.88.0

package/README.md CHANGED
@@ -23,10 +23,6 @@
23
23
 
24
24
 
25
25
 
26
- <blockquote style="color: #ff8811">
27
- <b>⚠ Warning:</b> This is a pre-release version of the library. It is not yet ready for production use. Please look at <a href="https://www.npmjs.com/package/@promptbook/core?activeTab=versions">latest stable release</a>.
28
- </blockquote>
29
-
30
26
  ## 📦 Package `@promptbook/website-crawler`
31
27
 
32
28
  - Promptbooks are [divided into several](#-packages) packages, all are published from [single monorepo](https://github.com/webgptorg/promptbook).
@@ -70,6 +66,9 @@ This shift is going to happen, whether we are ready for it or not. Our mission i
70
66
 
71
67
 
72
68
 
69
+
70
+
71
+
73
72
  ## 🚀 Get started
74
73
 
75
74
  Take a look at the simple starter kit with books integrated into the **Hello World** sample applications:
@@ -81,6 +80,8 @@ Take a look at the simple starter kit with books integrated into the **Hello Wor
81
80
 
82
81
 
83
82
 
83
+
84
+
84
85
  ## 💜 The Promptbook Project
85
86
 
86
87
  Promptbook project is ecosystem of multiple projects and tools, following is a list of most important pieces of the project:
@@ -116,6 +117,14 @@ Promptbook project is ecosystem of multiple projects and tools, following is a l
116
117
  </tbody>
117
118
  </table>
118
119
 
120
+ Hello world examples:
121
+
122
+ - [Hello world](https://github.com/webgptorg/hello-world)
123
+ - [Hello world in Node.js](https://github.com/webgptorg/hello-world-node-js)
124
+ - [Hello world in Next.js](https://github.com/webgptorg/hello-world-next-js)
125
+
126
+
127
+
119
128
  We also have a community of developers and users of **Promptbook**:
120
129
 
121
130
  - [Discord community](https://discord.gg/x3QWNaa89N)
@@ -282,16 +291,9 @@ Or you can install them separately:
282
291
 
283
292
  ## 📚 Dictionary
284
293
 
285
-
286
-
287
-
288
-
289
-
290
- ### 📚 Dictionary
291
-
292
294
  The following glossary is used to clarify certain concepts:
293
295
 
294
- #### General LLM / AI terms
296
+ ### General LLM / AI terms
295
297
 
296
298
  - **Prompt drift** is a phenomenon where the AI model starts to generate outputs that are not aligned with the original prompt. This can happen due to the model's training data, the prompt's wording, or the model's architecture.
297
299
  - **Pipeline, workflow or chain** is a sequence of tasks that are executed in a specific order. In the context of AI, a pipeline can refer to a sequence of AI models that are used to process data.
@@ -304,11 +306,11 @@ The following glossary is used to clarify certain concepts:
304
306
 
305
307
 
306
308
 
307
- _Note: Thos section is not complete dictionary, more list of general AI / LLM terms that has connection with Promptbook_
309
+ _Note: This section is not complete dictionary, more list of general AI / LLM terms that has connection with Promptbook_
308
310
 
309
311
 
310
312
 
311
- #### 💯 Core concepts
313
+ ### 💯 Core concepts
312
314
 
313
315
  - [📚 Collection of pipelines](https://github.com/webgptorg/promptbook/discussions/65)
314
316
  - [📯 Pipeline](https://github.com/webgptorg/promptbook/discussions/64)
@@ -321,7 +323,7 @@ _Note: Thos section is not complete dictionary, more list of general AI / LLM te
321
323
  - [🔣 Words not tokens](https://github.com/webgptorg/promptbook/discussions/29)
322
324
  - [☯ Separation of concerns](https://github.com/webgptorg/promptbook/discussions/32)
323
325
 
324
- ##### Advanced concepts
326
+ #### Advanced concepts
325
327
 
326
328
  - [📚 Knowledge (Retrieval-augmented generation)](https://github.com/webgptorg/promptbook/discussions/41)
327
329
  - [🌏 Remote server](https://github.com/webgptorg/promptbook/discussions/89)
@@ -338,17 +340,9 @@ _Note: Thos section is not complete dictionary, more list of general AI / LLM te
338
340
 
339
341
 
340
342
 
341
- ### Terms specific to Promptbook TypeScript implementation
342
-
343
- - Anonymous mode
344
- - Application mode
345
-
346
-
347
-
348
- ## 🔌 Usage in Typescript / Javascript
343
+ ## 🚂 Promptbook Engine
349
344
 
350
- - [Simple usage](./examples/usage/simple-script)
351
- - [Usage with client and remote server](./examples/usage/remote)
345
+ ![Schema of Promptbook Engine](./documents/promptbook-engine.svg)
352
346
 
353
347
  ## ➕➖ When to use Promptbook?
354
348
 
@@ -414,11 +408,11 @@ See [TODO.md](./TODO.md)
414
408
  <div style="display: flex; align-items: center; gap: 20px;">
415
409
 
416
410
  <a href="https://promptbook.studio/">
417
- <img src="./design/promptbook-studio-logo.png" alt="Partner 3" height="100">
411
+ <img src="./design/promptbook-studio-logo.png" alt="Partner 3" height="70">
418
412
  </a>
419
413
 
420
414
  <a href="https://technologickainkubace.org/en/about-technology-incubation/about-the-project/">
421
- <img src="./other/partners/CI-Technology-Incubation.png" alt="Technology Incubation" height="100">
415
+ <img src="./other/partners/CI-Technology-Incubation.png" alt="Technology Incubation" height="70">
422
416
  </a>
423
417
 
424
418
  </div>
package/esm/index.es.js CHANGED
@@ -7,8 +7,8 @@ import { mkdir, rm } from 'fs/promises';
7
7
  import { basename, join, dirname } from 'path';
8
8
  import { format } from 'prettier';
9
9
  import parserHtml from 'prettier/parser-html';
10
- import { Subject } from 'rxjs';
11
10
  import { randomBytes } from 'crypto';
11
+ import { Subject } from 'rxjs';
12
12
  import { forTime } from 'waitasecond';
13
13
  import sha256 from 'crypto-js/sha256';
14
14
  import { lookup, extension } from 'mime-types';
@@ -29,7 +29,7 @@ const BOOK_LANGUAGE_VERSION = '1.0.0';
29
29
  * @generated
30
30
  * @see https://github.com/webgptorg/promptbook
31
31
  */
32
- const PROMPTBOOK_ENGINE_VERSION = '0.88.0-8';
32
+ const PROMPTBOOK_ENGINE_VERSION = '0.88.0';
33
33
  /**
34
34
  * TODO: string_promptbook_version should be constrained to the all versions of Promptbook engine
35
35
  * Note: [💞] Ignore a discrepancy between file name and entity name
@@ -188,7 +188,7 @@ const DEFAULT_MAX_PARALLEL_COUNT = 5; // <- TODO: [🤹‍♂️]
188
188
  *
189
189
  * @public exported from `@promptbook/core`
190
190
  */
191
- const DEFAULT_MAX_EXECUTION_ATTEMPTS = 3; // <- TODO: [🤹‍♂️]
191
+ const DEFAULT_MAX_EXECUTION_ATTEMPTS = 10; // <- TODO: [🤹‍♂️]
192
192
  // <- TODO: [🕝] Make also `BOOKS_DIRNAME_ALTERNATIVES`
193
193
  /**
194
194
  * Where to store the temporary downloads
@@ -2154,6 +2154,21 @@ class MissingToolsError extends Error {
2154
2154
  }
2155
2155
  }
2156
2156
 
2157
+ /**
2158
+ * Generates random token
2159
+ *
2160
+ * Note: This function is cryptographically secure (it uses crypto.randomBytes internally)
2161
+ *
2162
+ * @private internal helper function
2163
+ * @returns secure random token
2164
+ */
2165
+ function $randomToken(randomness) {
2166
+ return randomBytes(randomness).toString('hex');
2167
+ }
2168
+ /**
2169
+ * TODO: Maybe use nanoid instead https://github.com/ai/nanoid
2170
+ */
2171
+
2157
2172
  /**
2158
2173
  * This error indicates errors during the execution of the pipeline
2159
2174
  *
@@ -2161,11 +2176,17 @@ class MissingToolsError extends Error {
2161
2176
  */
2162
2177
  class PipelineExecutionError extends Error {
2163
2178
  constructor(message) {
2179
+ // Added id parameter
2164
2180
  super(message);
2165
2181
  this.name = 'PipelineExecutionError';
2182
+ // TODO: [🐙] DRY - Maybe $randomId
2183
+ this.id = `error-${$randomToken(8 /* <- TODO: To global config + Use Base58 to avoid simmilar char conflicts */)}`;
2166
2184
  Object.setPrototypeOf(this, PipelineExecutionError.prototype);
2167
2185
  }
2168
2186
  }
2187
+ /**
2188
+ * TODO: !!!!!! Add id to all errors
2189
+ */
2169
2190
 
2170
2191
  /**
2171
2192
  * Determine if the pipeline is fully prepared
@@ -2205,18 +2226,33 @@ function isPipelinePrepared(pipeline) {
2205
2226
  */
2206
2227
 
2207
2228
  /**
2208
- * Generates random token
2209
- *
2210
- * Note: This function is cryptographically secure (it uses crypto.randomBytes internally)
2211
- *
2212
- * @private internal helper function
2213
- * @returns secure random token
2229
+ * Recursively converts JSON strings to JSON objects
2230
+
2231
+ * @public exported from `@promptbook/utils`
2214
2232
  */
2215
- function $randomToken(randomness) {
2216
- return randomBytes(randomness).toString('hex');
2233
+ function jsonStringsToJsons(object) {
2234
+ if (object === null) {
2235
+ return object;
2236
+ }
2237
+ if (Array.isArray(object)) {
2238
+ return object.map(jsonStringsToJsons);
2239
+ }
2240
+ if (typeof object !== 'object') {
2241
+ return object;
2242
+ }
2243
+ const newObject = { ...object };
2244
+ for (const [key, value] of Object.entries(object)) {
2245
+ if (typeof value === 'string' && isValidJsonString(value)) {
2246
+ newObject[key] = JSON.parse(value);
2247
+ }
2248
+ else {
2249
+ newObject[key] = jsonStringsToJsons(value);
2250
+ }
2251
+ }
2252
+ return newObject;
2217
2253
  }
2218
2254
  /**
2219
- * TODO: Maybe use nanoid instead https://github.com/ai/nanoid
2255
+ * TODO: Type the return type correctly
2220
2256
  */
2221
2257
 
2222
2258
  /**
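A minimal sketch of how the added `jsonStringsToJsons` behaves, with the function body copied from the hunk above so it runs on its own. `isValidJsonString` is not shown in this diff, so the version below (try `JSON.parse`, catch the failure) is an assumption, and the `input` object is hypothetical sample data.

```javascript
// Assumption: the real `isValidJsonString` is not part of this diff;
// this stand-in accepts anything that `JSON.parse` can parse.
function isValidJsonString(value) {
    try {
        JSON.parse(value);
        return true;
    } catch (error) {
        return false;
    }
}

// Copied from the hunk above
function jsonStringsToJsons(object) {
    if (object === null) {
        return object;
    }
    if (Array.isArray(object)) {
        return object.map(jsonStringsToJsons);
    }
    if (typeof object !== 'object') {
        return object;
    }
    const newObject = { ...object };
    for (const [key, value] of Object.entries(object)) {
        if (typeof value === 'string' && isValidJsonString(value)) {
            newObject[key] = JSON.parse(value);
        } else {
            newObject[key] = jsonStringsToJsons(value);
        }
    }
    return newObject;
}

// Hypothetical input mixing JSON strings, nested objects, arrays and plain text
const input = {
    outputParameters: '{"greeting":"Hello"}',
    nested: { usage: '{"tokens":12}' },
    list: ['{"kept":"as string"}'],
    plain: 'not json',
};

const result = jsonStringsToJsons(input);
console.log(result);
```

Two details follow from the code as written. String values that are direct properties of an object are parsed, but string elements of an array are passed to the function as top-level values and come back unchanged. Under the stand-in helper, strings such as `"42"` or `"true"` are also valid JSON and would become a number or a boolean; whether the real helper behaves the same way is not visible in this diff.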
@@ -2356,7 +2392,7 @@ const ALL_ERRORS = {
2356
2392
  * @public exported from `@promptbook/utils`
2357
2393
  */
2358
2394
  function deserializeError(error) {
2359
- const { name, stack } = error;
2395
+ const { name, stack, id } = error; // Added id
2360
2396
  let { message } = error;
2361
2397
  let ErrorClass = ALL_ERRORS[error.name];
2362
2398
  if (ErrorClass === undefined) {
@@ -2371,7 +2407,9 @@ function deserializeError(error) {
2371
2407
  ${block(stack || '')}
2372
2408
  `);
2373
2409
  }
2374
- return new ErrorClass(message);
2410
+ const deserializedError = new ErrorClass(message);
2411
+ deserializedError.id = id; // Assign id to the error object
2412
+ return deserializedError;
2375
2413
  }
2376
2414
 
2377
2415
  /**
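This release adds an `id` field to `PipelineExecutionError` and carries it through `serializeError` and `deserializeError`. The sketch below reduces those three pieces to their `id` handling to show the round trip. It is a simplification, not the package code: `$randomToken` is replaced by a deterministic counter so the sketch has no dependencies (in the diff the suffix comes from `randomBytes(8).toString('hex')`, that is, 16 hex characters), `ALL_ERRORS` holds a single class, and the fallback for unknown error names is shortened.

```javascript
// Stand-in for `$randomToken`: a counter instead of `crypto.randomBytes`,
// used here only to keep the sketch dependency-free and deterministic.
let counter = 0;
function $randomToken() {
    counter += 1;
    return String(counter).padStart(16, '0');
}

class PipelineExecutionError extends Error {
    constructor(message) {
        super(message);
        this.name = 'PipelineExecutionError';
        this.id = `error-${$randomToken()}`;
        Object.setPrototypeOf(this, PipelineExecutionError.prototype);
    }
}

// Reduced to one class for the sketch
const ALL_ERRORS = { PipelineExecutionError };

function serializeError(error) {
    const { name, message, stack, id } = error;
    return { name, message, stack, id };
}

function deserializeError(error) {
    const { name, message, id } = error;
    const ErrorClass = ALL_ERRORS[name] || Error; // simplified fallback
    const deserializedError = new ErrorClass(message);
    deserializedError.id = id; // overwrites the id the constructor just generated
    return deserializedError;
}

const original = new PipelineExecutionError('Task failed');
// Simulate sending the serialized error over the wire
const wire = JSON.parse(JSON.stringify(serializeError(original)));
const restored = deserializeError(wire);

console.log(original.id, restored.id);
```

Because the constructor always generates a fresh id, the assignment after construction is what keeps the id stable across the client/server boundary. One consequence of the code as written: if the serialized payload has no `id`, the assignment sets `id` to `undefined`, replacing the constructor-generated one.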
@@ -2421,17 +2459,19 @@ function assertsTaskSuccessful(executionResult) {
2421
2459
  */
2422
2460
  function createTask(options) {
2423
2461
  const { taskType, taskProcessCallback } = options;
2462
+ // TODO: [🐙] DRY
2424
2463
  const taskId = `${taskType.toLowerCase().substring(0, 4)}-${$randomToken(8 /* <- TODO: To global config + Use Base58 to avoid simmilar char conflicts */)}`;
2425
2464
  let status = 'RUNNING';
2426
2465
  const createdAt = new Date();
2427
2466
  let updatedAt = createdAt;
2428
2467
  const errors = [];
2429
2468
  const warnings = [];
2430
- const currentValue = {};
2469
+ let currentValue = {};
2431
2470
  const partialResultSubject = new Subject();
2432
2471
  // <- Note: Not using `BehaviorSubject` because on error we can't access the last value
2433
2472
  const finalResultPromise = /* not await */ taskProcessCallback((newOngoingResult) => {
2434
2473
  Object.assign(currentValue, newOngoingResult);
2474
+ // <- TODO: assign deep
2435
2475
  partialResultSubject.next(newOngoingResult);
2436
2476
  });
2437
2477
  finalResultPromise
@@ -2451,7 +2491,8 @@ function createTask(options) {
2451
2491
  // And delete `ExecutionTask.currentValue.preparedPipeline`
2452
2492
  assertsTaskSuccessful(executionResult);
2453
2493
  status = 'FINISHED';
2454
- Object.assign(currentValue, executionResult);
2494
+ currentValue = jsonStringsToJsons(executionResult);
2495
+ // <- TODO: [🧠] Is this a good idea to convert JSON strins to JSONs?
2455
2496
  partialResultSubject.next(executionResult);
2456
2497
  }
2457
2498
  catch (error) {
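The switch from `const currentValue` with `Object.assign` to `let currentValue` with reassignment changes what the task holds once it finishes. The sketch below contrasts the two with hypothetical partial and final results; a shallow copy stands in for `jsonStringsToJsons(executionResult)`, and none of the key names are taken from the package.

```javascript
// Hypothetical partial and final results
const partialResult = { progress: 0.5, draft: 'Hel' };
const finalResult = { isSuccessful: true, outputParameters: { text: 'Hello' } };

// 0.88.0-8: the final result is merged into the accumulated value
const before = {};
Object.assign(before, partialResult);
Object.assign(before, finalResult);

// 0.88.0: the final result replaces the accumulated value
let after = {};
Object.assign(after, partialResult);
after = { ...finalResult }; // stands in for `jsonStringsToJsons(executionResult)`

console.log(Object.keys(before)); // keys from both partial and final results
console.log(Object.keys(after)); // keys from the final result only
```

In other words, keys that appeared only in partial results are no longer present after the task finishes. Whether any consumer relied on them is not visible in this diff.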
@@ -2515,19 +2556,21 @@ function createTask(options) {
2515
2556
  */
2516
2557
  function serializeError(error) {
2517
2558
  const { name, message, stack } = error;
2559
+ const { id } = error;
2518
2560
  if (!Object.keys(ALL_ERRORS).includes(name)) {
2519
2561
  console.error(spaceTrim$1((block) => `
2520
-
2562
+
2521
2563
  Cannot serialize error with name "${name}"
2522
2564
 
2523
2565
  ${block(stack || message)}
2524
-
2566
+
2525
2567
  `));
2526
2568
  }
2527
2569
  return {
2528
2570
  name: name,
2529
2571
  message,
2530
2572
  stack,
2573
+ id, // Include id in the serialized object
2531
2574
  };
2532
2575
  }
2533
2576
 
@@ -4325,6 +4368,9 @@ function countCharacters(text) {
4325
4368
  text = text.replace(/\p{Extended_Pictographic}(\u{200D}\p{Extended_Pictographic})*/gu, '-');
4326
4369
  return text.length;
4327
4370
  }
4371
+ /**
4372
+ * TODO: [🥴] Implement counting in formats - like JSON, CSV, XML,...
4373
+ */
4328
4374
 
4329
4375
  /**
4330
4376
  * Number of characters per standard line with 11pt Arial font size.
@@ -4356,6 +4402,9 @@ function countLines(text) {
4356
4402
  const lines = text.split('\n');
4357
4403
  return lines.reduce((count, line) => count + Math.ceil(line.length / CHARACTERS_PER_STANDARD_LINE), 0);
4358
4404
  }
4405
+ /**
4406
+ * TODO: [🥴] Implement counting in formats - like JSON, CSV, XML,...
4407
+ */
4359
4408
 
4360
4409
  /**
4361
4410
  * Counts number of pages in the text
@@ -4367,6 +4416,9 @@ function countLines(text) {
4367
4416
  function countPages(text) {
4368
4417
  return Math.ceil(countLines(text) / LINES_PER_STANDARD_PAGE);
4369
4418
  }
4419
+ /**
4420
+ * TODO: [🥴] Implement counting in formats - like JSON, CSV, XML,...
4421
+ */
4370
4422
 
4371
4423
  /**
4372
4424
  * Counts number of paragraphs in the text
@@ -4376,6 +4428,9 @@ function countPages(text) {
4376
4428
  function countParagraphs(text) {
4377
4429
  return text.split(/\n\s*\n/).filter((paragraph) => paragraph.trim() !== '').length;
4378
4430
  }
4431
+ /**
4432
+ * TODO: [🥴] Implement counting in formats - like JSON, CSV, XML,...
4433
+ */
4379
4434
 
4380
4435
  /**
4381
4436
  * Split text into sentences
@@ -4393,6 +4448,9 @@ function splitIntoSentences(text) {
4393
4448
  function countSentences(text) {
4394
4449
  return splitIntoSentences(text).length;
4395
4450
  }
4451
+ /**
4452
+ * TODO: [🥴] Implement counting in formats - like JSON, CSV, XML,...
4453
+ */
4396
4454
 
4397
4455
  /**
4398
4456
  * Counts number of words in the text
@@ -4406,6 +4464,9 @@ function countWords(text) {
4406
4464
  text = text.replace(/([a-z])([A-Z])/g, '$1 $2');
4407
4465
  return text.split(/[^a-zа-я0-9]+/i).filter((word) => word.length > 0).length;
4408
4466
  }
4467
+ /**
4468
+ * TODO: [🥴] Implement counting in formats - like JSON, CSV, XML,...
4469
+ */
4409
4470
 
4410
4471
  /**
4411
4472
  * Index of all counter functions