@promptbook/markitdown 0.88.0-9 → 0.89.0-1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (25)
  1. package/README.md +35 -14
  2. package/esm/index.es.js +62 -26
  3. package/esm/index.es.js.map +1 -1
  4. package/esm/typings/src/_packages/core.index.d.ts +2 -2
  5. package/esm/typings/src/_packages/types.index.d.ts +10 -0
  6. package/esm/typings/src/config.d.ts +1 -1
  7. package/esm/typings/src/errors/PipelineExecutionError.d.ts +5 -0
  8. package/esm/typings/src/errors/utils/ErrorJson.d.ts +5 -0
  9. package/esm/typings/src/llm-providers/_common/utils/count-total-usage/LlmExecutionToolsWithTotalUsage.d.ts +7 -0
  10. package/esm/typings/src/llm-providers/_common/utils/count-total-usage/{countTotalUsage.d.ts → countUsage.d.ts} +1 -1
  11. package/esm/typings/src/playground/BrjappConnector.d.ts +64 -0
  12. package/esm/typings/src/playground/brjapp-api-schema.d.ts +12879 -0
  13. package/esm/typings/src/playground/playground.d.ts +5 -0
  14. package/esm/typings/src/remote-server/socket-types/_subtypes/PromptbookServer_Identification.d.ts +2 -1
  15. package/esm/typings/src/remote-server/types/RemoteServerOptions.d.ts +15 -3
  16. package/esm/typings/src/types/typeAliases.d.ts +2 -2
  17. package/esm/typings/src/utils/expectation-counters/countCharacters.d.ts +3 -0
  18. package/esm/typings/src/utils/expectation-counters/countLines.d.ts +3 -0
  19. package/esm/typings/src/utils/expectation-counters/countPages.d.ts +3 -0
  20. package/esm/typings/src/utils/expectation-counters/countParagraphs.d.ts +3 -0
  21. package/esm/typings/src/utils/expectation-counters/countSentences.d.ts +3 -0
  22. package/esm/typings/src/utils/expectation-counters/countWords.d.ts +3 -0
  23. package/package.json +2 -2
  24. package/umd/index.umd.js +65 -29
  25. package/umd/index.umd.js.map +1 -1
package/README.md CHANGED
@@ -58,6 +58,8 @@ Rest of the documentation is common for **entire promptbook ecosystem**:
 
 During the computer revolution, we have seen [multiple generations of computer languages](https://github.com/webgptorg/promptbook/discussions/180), from the physical rewiring of the vacuum tubes through low-level machine code to the high-level languages like Python or JavaScript. And now, we're on the edge of the **next revolution**!
 
+
+
 It's a revolution of writing software in **plain human language** that is understandable and executable by both humans and machines – and it's going to change everything!
 
 The incredible growth in power of microprocessors and the Moore's Law have been the driving force behind the ever-more powerful languages, and it's been an amazing journey! Similarly, the large language models (like GPT or Claude) are the next big thing in language technology, and they're set to transform the way we interact with computers.
@@ -68,6 +70,9 @@ This shift is going to happen, whether we are ready for it or not. Our mission i
 
 
 
+
+
+
 ## 🚀 Get started
 
 Take a look at the simple starter kit with books integrated into the **Hello World** sample applications:
@@ -79,6 +84,8 @@ Take a look at the simple starter kit with books integrated into the **Hello Wor
 
 
 
+
+
 ## 💜 The Promptbook Project
 
 Promptbook project is ecosystem of multiple projects and tools, following is a list of most important pieces of the project:
@@ -114,22 +121,35 @@ Promptbook project is ecosystem of multiple projects and tools, following is a l
 </tbody>
 </table>
 
+Hello world examples:
+
+- [Hello world](https://github.com/webgptorg/hello-world)
+- [Hello world in Node.js](https://github.com/webgptorg/hello-world-node-js)
+- [Hello world in Next.js](https://github.com/webgptorg/hello-world-next-js)
+
+
+
 We also have a community of developers and users of **Promptbook**:
 
 - [Discord community](https://discord.gg/x3QWNaa89N)
 - [Landing page `ptbk.io`](https://ptbk.io)
 - [Github discussions](https://github.com/webgptorg/promptbook/discussions)
 - [LinkedIn `Promptbook`](https://linkedin.com/company/promptbook)
-- [Facebook `Promptbook`](https://www.facebook.com/61560776453536)
+- [Facebook `Promptbook`](https://www.facebook.com/61560776453536)
 
 And **Promptbook.studio** branded socials:
 
+
+
 - [Instagram `@promptbook.studio`](https://www.instagram.com/promptbook.studio/)
 
 And **Promptujeme** sub-brand:
 
 _/Subbrand for Czech clients/_
 
+
+
+
 - [Promptujeme.cz](https://www.promptujeme.cz/)
 - [Facebook `Promptujeme`](https://www.facebook.com/promptujeme/)
 
@@ -147,6 +167,8 @@ _/Sub-brand for images and graphics generated via Promptbook prompting/_
 
 ## 💙 The Book language
 
+
+
 Following is the documentation and blueprint of the [Book language](https://github.com/webgptorg/book).
 
 Book is a language that can be used to write AI applications, agents, workflows, automations, knowledgebases, translators, sheet processors, email automations and more. It allows you to harness the power of AI models in human-like terms, without the need to know the specifics and technicalities of the models.
@@ -196,6 +218,8 @@ Personas can have access to different knowledge, tools and actions. They can als
 
 - [PERSONA](https://github.com/webgptorg/promptbook/blob/main/documents/commands/PERSONA.md)
 
+
+
 ### **How:** Knowledge, Instruments and Actions
 
 The resources used by the personas are used to do the work.
@@ -271,11 +295,9 @@ Or you can install them separately:
 
 ## 📚 Dictionary
 
-### 📚 Dictionary
-
 The following glossary is used to clarify certain concepts:
 
-#### General LLM / AI terms
+### General LLM / AI terms
 
 - **Prompt drift** is a phenomenon where the AI model starts to generate outputs that are not aligned with the original prompt. This can happen due to the model's training data, the prompt's wording, or the model's architecture.
 - **Pipeline, workflow or chain** is a sequence of tasks that are executed in a specific order. In the context of AI, a pipeline can refer to a sequence of AI models that are used to process data.
@@ -286,9 +308,13 @@ The following glossary is used to clarify certain concepts:
 - **Retrieval-augmented generation** is a machine learning paradigm where a model generates text by retrieving relevant information from a large database of text. This approach combines the benefits of generative models and retrieval models.
 - **Longtail** refers to non-common or rare events, items, or entities that are not well-represented in the training data of machine learning models. Longtail items are often challenging for models to predict accurately.
 
-_Note: Thos section is not complete dictionary, more list of general AI / LLM terms that has connection with Promptbook_
 
-#### 💯 Core concepts
+
+_Note: This section is not complete dictionary, more list of general AI / LLM terms that has connection with Promptbook_
+
+
+
+### 💯 Core concepts
 
 - [📚 Collection of pipelines](https://github.com/webgptorg/promptbook/discussions/65)
 - [📯 Pipeline](https://github.com/webgptorg/promptbook/discussions/64)
@@ -301,7 +327,7 @@ _Note: Thos section is not complete dictionary, more list of general AI / LLM te
 - [🔣 Words not tokens](https://github.com/webgptorg/promptbook/discussions/29)
 - [☯ Separation of concerns](https://github.com/webgptorg/promptbook/discussions/32)
 
-##### Advanced concepts
+#### Advanced concepts
 
 - [📚 Knowledge (Retrieval-augmented generation)](https://github.com/webgptorg/promptbook/discussions/41)
 - [🌏 Remote server](https://github.com/webgptorg/promptbook/discussions/89)
@@ -316,11 +342,6 @@ _Note: Thos section is not complete dictionary, more list of general AI / LLM te
 - [👮 Agent adversary expectations](https://github.com/webgptorg/promptbook/discussions/39)
 - [view more](https://github.com/webgptorg/promptbook/discussions/categories/concepts)
 
-### Terms specific to Promptbook TypeScript implementation
-
-- Anonymous mode
-- Application mode
-
 
 
 ## 🚂 Promptbook Engine
@@ -391,11 +412,11 @@ See [TODO.md](./TODO.md)
 <div style="display: flex; align-items: center; gap: 20px;">
 
 <a href="https://promptbook.studio/">
-    <img src="./design/promptbook-studio-logo.png" alt="Partner 3" height="100">
+    <img src="./design/promptbook-studio-logo.png" alt="Partner 3" height="70">
 </a>
 
 <a href="https://technologickainkubace.org/en/about-technology-incubation/about-the-project/">
-    <img src="./other/partners/CI-Technology-Incubation.png" alt="Technology Incubation" height="100">
+    <img src="./other/partners/CI-Technology-Incubation.png" alt="Technology Incubation" height="70">
 </a>
 
 </div>
package/esm/index.es.js CHANGED
@@ -5,8 +5,8 @@ import hexEncoder from 'crypto-js/enc-hex';
 import { basename, join, dirname } from 'path';
 import { format } from 'prettier';
 import parserHtml from 'prettier/parser-html';
-import { Subject } from 'rxjs';
 import { randomBytes } from 'crypto';
+import { Subject } from 'rxjs';
 import { forTime } from 'waitasecond';
 import sha256 from 'crypto-js/sha256';
 import { lookup, extension } from 'mime-types';
@@ -26,7 +26,7 @@ const BOOK_LANGUAGE_VERSION = '1.0.0';
  * @generated
  * @see https://github.com/webgptorg/promptbook
  */
-const PROMPTBOOK_ENGINE_VERSION = '0.88.0-9';
+const PROMPTBOOK_ENGINE_VERSION = '0.89.0-1';
 /**
  * TODO: string_promptbook_version should be constrained to the all versions of Promptbook engine
  * Note: [💞] Ignore a discrepancy between file name and entity name
@@ -158,7 +158,7 @@ const DEFAULT_MAX_PARALLEL_COUNT = 5; // <- TODO: [🤹‍♂️]
  *
  * @public exported from `@promptbook/core`
 **/
-const DEFAULT_MAX_EXECUTION_ATTEMPTS = 3; // <- TODO: [🤹‍♂️]
+const DEFAULT_MAX_EXECUTION_ATTEMPTS = 10; // <- TODO: [🤹‍♂️]
 // <- TODO: [🕝] Make also `BOOKS_DIRNAME_ALTERNATIVES`
 /**
  * Where to store the temporary downloads
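The hunk above only shows the constant changing: the default retry budget `DEFAULT_MAX_EXECUTION_ATTEMPTS` grows from 3 to 10. As a hedged sketch of what such a bound governs, a consumer typically looks like the hypothetical helper below (this is an illustration, not the engine's actual executor):

```typescript
const DEFAULT_MAX_EXECUTION_ATTEMPTS = 10; // <- new default in this release (was 3)

// Hypothetical illustration of how a retry bound like this is consumed;
// the real retry loop lives inside the Promptbook execution engine.
async function executeWithRetries<T>(
    attempt: () => Promise<T>,
    maxAttempts: number = DEFAULT_MAX_EXECUTION_ATTEMPTS,
): Promise<T> {
    let lastError: unknown;
    for (let attemptIndex = 0; attemptIndex < maxAttempts; attemptIndex++) {
        try {
            return await attempt(); // success -> stop retrying
        } catch (error) {
            lastError = error; // remember the failure and try again
        }
    }
    throw lastError; // all attempts exhausted
}
```

With the new default, a flaky LLM call gets up to 10 tries before the failure propagates.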
@@ -2003,6 +2003,21 @@ class MissingToolsError extends Error {
     }
 }
 
+/**
+ * Generates random token
+ *
+ * Note: This function is cryptographically secure (it uses crypto.randomBytes internally)
+ *
+ * @private internal helper function
+ * @returns secure random token
+ */
+function $randomToken(randomness) {
+    return randomBytes(randomness).toString('hex');
+}
+/**
+ * TODO: Maybe use nanoid instead https://github.com/ai/nanoid
+ */
+
 /**
  * This error indicates errors during the execution of the pipeline
  *
@@ -2010,11 +2025,17 @@ class MissingToolsError extends Error {
  */
 class PipelineExecutionError extends Error {
     constructor(message) {
+        // Added id parameter
         super(message);
         this.name = 'PipelineExecutionError';
+        // TODO: [🐙] DRY - Maybe $randomId
+        this.id = `error-${$randomToken(8 /* <- TODO: To global config + Use Base58 to avoid simmilar char conflicts */)}`;
         Object.setPrototypeOf(this, PipelineExecutionError.prototype);
     }
 }
+/**
+ * TODO: !!!!!! Add id to all errors
+ */
 
 /**
  * Determine if the pipeline is fully prepared
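Restated as a self-contained sketch (the real class calls the package's `$randomToken` helper; here `randomBytes` is inlined), the changed error class now stamps every instance with a random `id` of the form `error-<16 hex chars>` so individual failures can be tracked across serialization boundaries:

```typescript
import { randomBytes } from 'crypto';

// Sketch of the changed class from the hunk above.
class PipelineExecutionError extends Error {
    public id: string;

    constructor(message: string) {
        super(message);
        this.name = 'PipelineExecutionError';
        // Each instance gets a unique correlation id, e.g. `error-3f9c0a1b2c3d4e5f`
        this.id = `error-${randomBytes(8).toString('hex')}`;
        // Restore the prototype chain broken by extending built-in Error
        Object.setPrototypeOf(this, PipelineExecutionError.prototype);
    }
}
```

The `Object.setPrototypeOf` call keeps `instanceof PipelineExecutionError` working when the code is compiled to older ECMAScript targets.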
@@ -2053,21 +2074,6 @@ function isPipelinePrepared(pipeline) {
  * - [♨] Are tasks prepared
  */
 
-/**
- * Generates random token
- *
- * Note: This function is cryptographically secure (it uses crypto.randomBytes internally)
- *
- * @private internal helper function
- * @returns secure random token
- */
-function $randomToken(randomness) {
-    return randomBytes(randomness).toString('hex');
-}
-/**
- * TODO: Maybe use nanoid instead https://github.com/ai/nanoid
- */
-
 /**
  * Recursively converts JSON strings to JSON objects
 
@@ -2258,7 +2264,7 @@ const ALL_ERRORS = {
  * @public exported from `@promptbook/utils`
  */
 function deserializeError(error) {
-    const { name, stack } = error;
+    const { name, stack, id } = error; // Added id
     let { message } = error;
     let ErrorClass = ALL_ERRORS[error.name];
     if (ErrorClass === undefined) {
@@ -2273,7 +2279,9 @@ function deserializeError(error) {
             ${block(stack || '')}
         `);
     }
-    return new ErrorClass(message);
+    const deserializedError = new ErrorClass(message);
+    deserializedError.id = id; // Assign id to the error object
+    return deserializedError;
 }
 
 /**
@@ -2323,6 +2331,7 @@ function assertsTaskSuccessful(executionResult) {
  */
 function createTask(options) {
     const { taskType, taskProcessCallback } = options;
+    // TODO: [🐙] DRY
     const taskId = `${taskType.toLowerCase().substring(0, 4)}-${$randomToken(8 /* <- TODO: To global config + Use Base58 to avoid simmilar char conflicts */)}`;
     let status = 'RUNNING';
     const createdAt = new Date();
@@ -2355,7 +2364,7 @@ function createTask(options) {
             assertsTaskSuccessful(executionResult);
             status = 'FINISHED';
             currentValue = jsonStringsToJsons(executionResult);
-            // <- TODO: Convert JSON values in string to JSON objects
+            // <- TODO: [🧠] Is this a good idea to convert JSON strins to JSONs?
             partialResultSubject.next(executionResult);
         }
         catch (error) {
@@ -2419,19 +2428,21 @@ function createTask(options) {
  */
 function serializeError(error) {
     const { name, message, stack } = error;
+    const { id } = error;
     if (!Object.keys(ALL_ERRORS).includes(name)) {
         console.error(spaceTrim((block) => `
-
+
             Cannot serialize error with name "${name}"
 
             ${block(stack || message)}
-
+
        `));
    }
    return {
        name: name,
        message,
        stack,
+        id, // Include id in the serialized object
    };
}

@@ -2574,8 +2585,9 @@ function addUsage(...usageItems) {
  * @returns LLM tools with same functionality with added total cost counting
  * @public exported from `@promptbook/core`
  */
-function countTotalUsage(llmTools) {
+function countUsage(llmTools) {
     let totalUsage = ZERO_USAGE;
+    const spending = new Subject();
     const proxyTools = {
         get title() {
             // TODO: [🧠] Maybe put here some suffix
@@ -2585,12 +2597,15 @@ function countTotalUsage(llmTools) {
             // TODO: [🧠] Maybe put here some suffix
             return llmTools.description;
         },
-        async checkConfiguration() {
+        checkConfiguration() {
             return /* not await */ llmTools.checkConfiguration();
         },
         listModels() {
             return /* not await */ llmTools.listModels();
         },
+        spending() {
+            return spending.asObservable();
+        },
         getTotalUsage() {
             // <- Note: [🥫] Not using getter `get totalUsage` but `getTotalUsage` to allow this object to be proxied
             return totalUsage;
@@ -2601,6 +2616,7 @@ function countTotalUsage(llmTools) {
             // console.info('[🚕] callChatModel through countTotalUsage');
             const promptResult = await llmTools.callChatModel(prompt);
             totalUsage = addUsage(totalUsage, promptResult.usage);
+            spending.next(promptResult.usage);
             return promptResult;
         };
     }
@@ -2609,6 +2625,7 @@ function countTotalUsage(llmTools) {
             // console.info('[🚕] callCompletionModel through countTotalUsage');
             const promptResult = await llmTools.callCompletionModel(prompt);
             totalUsage = addUsage(totalUsage, promptResult.usage);
+            spending.next(promptResult.usage);
             return promptResult;
         };
     }
@@ -2617,6 +2634,7 @@ function countTotalUsage(llmTools) {
             // console.info('[🚕] callEmbeddingModel through countTotalUsage');
             const promptResult = await llmTools.callEmbeddingModel(prompt);
             totalUsage = addUsage(totalUsage, promptResult.usage);
+            spending.next(promptResult.usage);
             return promptResult;
         };
     }
@@ -3513,7 +3531,7 @@ async function preparePipeline(pipeline, tools, options) {
     // TODO: [🚐] Make arrayable LLMs -> single LLM DRY
     const _llms = arrayableToArray(tools.llm);
     const llmTools = _llms.length === 1 ? _llms[0] : joinLlmExecutionTools(..._llms);
-    const llmToolsWithUsage = countTotalUsage(llmTools);
+    const llmToolsWithUsage = countUsage(llmTools);
     // <- TODO: [🌯]
     /*
         TODO: [🧠][🪑][🔃] Should this be done or not
@@ -4343,6 +4361,9 @@ function countCharacters(text) {
     text = text.replace(/\p{Extended_Pictographic}(\u{200D}\p{Extended_Pictographic})*/gu, '-');
     return text.length;
 }
+/**
+ * TODO: [🥴] Implement counting in formats - like JSON, CSV, XML,...
+ */
 
 /**
  * Number of characters per standard line with 11pt Arial font size.
@@ -4374,6 +4395,9 @@ function countLines(text) {
     const lines = text.split('\n');
     return lines.reduce((count, line) => count + Math.ceil(line.length / CHARACTERS_PER_STANDARD_LINE), 0);
 }
+/**
+ * TODO: [🥴] Implement counting in formats - like JSON, CSV, XML,...
+ */
 
 /**
  * Counts number of pages in the text
@@ -4385,6 +4409,9 @@ function countLines(text) {
 function countPages(text) {
     return Math.ceil(countLines(text) / LINES_PER_STANDARD_PAGE);
 }
+/**
+ * TODO: [🥴] Implement counting in formats - like JSON, CSV, XML,...
+ */
 
 /**
  * Counts number of paragraphs in the text
@@ -4394,6 +4421,9 @@ function countPages(text) {
 function countParagraphs(text) {
     return text.split(/\n\s*\n/).filter((paragraph) => paragraph.trim() !== '').length;
 }
+/**
+ * TODO: [🥴] Implement counting in formats - like JSON, CSV, XML,...
+ */
 
 /**
  * Split text into sentences
@@ -4411,6 +4441,9 @@ function splitIntoSentences(text) {
 function countSentences(text) {
     return splitIntoSentences(text).length;
 }
+/**
+ * TODO: [🥴] Implement counting in formats - like JSON, CSV, XML,...
+ */
 
 /**
  * Counts number of words in the text
@@ -4424,6 +4457,9 @@ function countWords(text) {
     text = text.replace(/([a-z])([A-Z])/g, '$1 $2');
     return text.split(/[^a-zа-я0-9]+/i).filter((word) => word.length > 0).length;
 }
+/**
+ * TODO: [🥴] Implement counting in formats - like JSON, CSV, XML,...
+ */
 
 /**
  * Index of all counter functions
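The expectation counters annotated above chain simple heuristics: characters estimate lines, lines estimate pages, and words are split on non-alphanumeric boundaries (after breaking camelCase). A self-contained sketch, with the two layout constants' values assumed for illustration (their real values are defined elsewhere in this file):

```typescript
const CHARACTERS_PER_STANDARD_LINE = 63; // <- assumed value for illustration
const LINES_PER_STANDARD_PAGE = 44; // <- assumed value for illustration

// Each physical line contributes ceil(length / line width) "standard lines",
// so long lines count as several and empty lines as zero.
function countLines(text: string): number {
    return text
        .split('\n')
        .reduce((count, line) => count + Math.ceil(line.length / CHARACTERS_PER_STANDARD_LINE), 0);
}

// Pages are derived from the standard-line estimate.
function countPages(text: string): number {
    return Math.ceil(countLines(text) / LINES_PER_STANDARD_PAGE);
}

// Words: break camelCase first, then split on non-alphanumeric runs
// (the character class also covers Cyrillic, as in the diffed source).
function countWords(text: string): number {
    text = text.replace(/([a-z])([A-Z])/g, '$1 $2');
    return text.split(/[^a-zа-я0-9]+/i).filter((word) => word.length > 0).length;
}
```

These heuristics back the pipeline expectations (e.g. "at most 1 page"), which is why each function now carries the TODO about format-aware counting for JSON, CSV, XML and similar inputs.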