@promptbook/markitdown 0.88.0-9 → 0.89.0-1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (25)
  1. package/README.md +35 -14
  2. package/esm/index.es.js +62 -26
  3. package/esm/index.es.js.map +1 -1
  4. package/esm/typings/src/_packages/core.index.d.ts +2 -2
  5. package/esm/typings/src/_packages/types.index.d.ts +10 -0
  6. package/esm/typings/src/config.d.ts +1 -1
  7. package/esm/typings/src/errors/PipelineExecutionError.d.ts +5 -0
  8. package/esm/typings/src/errors/utils/ErrorJson.d.ts +5 -0
  9. package/esm/typings/src/llm-providers/_common/utils/count-total-usage/LlmExecutionToolsWithTotalUsage.d.ts +7 -0
  10. package/esm/typings/src/llm-providers/_common/utils/count-total-usage/{countTotalUsage.d.ts → countUsage.d.ts} +1 -1
  11. package/esm/typings/src/playground/BrjappConnector.d.ts +64 -0
  12. package/esm/typings/src/playground/brjapp-api-schema.d.ts +12879 -0
  13. package/esm/typings/src/playground/playground.d.ts +5 -0
  14. package/esm/typings/src/remote-server/socket-types/_subtypes/PromptbookServer_Identification.d.ts +2 -1
  15. package/esm/typings/src/remote-server/types/RemoteServerOptions.d.ts +15 -3
  16. package/esm/typings/src/types/typeAliases.d.ts +2 -2
  17. package/esm/typings/src/utils/expectation-counters/countCharacters.d.ts +3 -0
  18. package/esm/typings/src/utils/expectation-counters/countLines.d.ts +3 -0
  19. package/esm/typings/src/utils/expectation-counters/countPages.d.ts +3 -0
  20. package/esm/typings/src/utils/expectation-counters/countParagraphs.d.ts +3 -0
  21. package/esm/typings/src/utils/expectation-counters/countSentences.d.ts +3 -0
  22. package/esm/typings/src/utils/expectation-counters/countWords.d.ts +3 -0
  23. package/package.json +2 -2
  24. package/umd/index.umd.js +65 -29
  25. package/umd/index.umd.js.map +1 -1
package/README.md CHANGED
@@ -58,6 +58,8 @@ Rest of the documentation is common for **entire promptbook ecosystem**:
 
 During the computer revolution, we have seen [multiple generations of computer languages](https://github.com/webgptorg/promptbook/discussions/180), from the physical rewiring of the vacuum tubes through low-level machine code to the high-level languages like Python or JavaScript. And now, we're on the edge of the **next revolution**!
 
+
+
 It's a revolution of writing software in **plain human language** that is understandable and executable by both humans and machines – and it's going to change everything!
 
 The incredible growth in power of microprocessors and the Moore's Law have been the driving force behind the ever-more powerful languages, and it's been an amazing journey! Similarly, the large language models (like GPT or Claude) are the next big thing in language technology, and they're set to transform the way we interact with computers.
@@ -68,6 +70,9 @@ This shift is going to happen, whether we are ready for it or not. Our mission i
 
 
 
+
+
+
 ## 🚀 Get started
 
 Take a look at the simple starter kit with books integrated into the **Hello World** sample applications:
@@ -79,6 +84,8 @@ Take a look at the simple starter kit with books integrated into the **Hello Wor
 
 
 
+
+
 ## 💜 The Promptbook Project
 
 Promptbook project is ecosystem of multiple projects and tools, following is a list of most important pieces of the project:
@@ -114,22 +121,35 @@ Promptbook project is ecosystem of multiple projects and tools, following is a l
 </tbody>
 </table>
 
+Hello world examples:
+
+- [Hello world](https://github.com/webgptorg/hello-world)
+- [Hello world in Node.js](https://github.com/webgptorg/hello-world-node-js)
+- [Hello world in Next.js](https://github.com/webgptorg/hello-world-next-js)
+
+
+
 We also have a community of developers and users of **Promptbook**:
 
 - [Discord community](https://discord.gg/x3QWNaa89N)
 - [Landing page `ptbk.io`](https://ptbk.io)
 - [Github discussions](https://github.com/webgptorg/promptbook/discussions)
 - [LinkedIn `Promptbook`](https://linkedin.com/company/promptbook)
-- [Facebook `Promptbook`](https://www.facebook.com/61560776453536)
+- [Facebook `Promptbook`](https://www.facebook.com/61560776453536)
 
 And **Promptbook.studio** branded socials:
 
+
+
 - [Instagram `@promptbook.studio`](https://www.instagram.com/promptbook.studio/)
 
 And **Promptujeme** sub-brand:
 
 _/Subbrand for Czech clients/_
 
+
+
+
 - [Promptujeme.cz](https://www.promptujeme.cz/)
 - [Facebook `Promptujeme`](https://www.facebook.com/promptujeme/)
 
@@ -147,6 +167,8 @@ _/Sub-brand for images and graphics generated via Promptbook prompting/_
 
 ## 💙 The Book language
 
+
+
 Following is the documentation and blueprint of the [Book language](https://github.com/webgptorg/book).
 
 Book is a language that can be used to write AI applications, agents, workflows, automations, knowledgebases, translators, sheet processors, email automations and more. It allows you to harness the power of AI models in human-like terms, without the need to know the specifics and technicalities of the models.
@@ -196,6 +218,8 @@ Personas can have access to different knowledge, tools and actions. They can als
 
 - [PERSONA](https://github.com/webgptorg/promptbook/blob/main/documents/commands/PERSONA.md)
 
+
+
 ### **How:** Knowledge, Instruments and Actions
 
 The resources used by the personas are used to do the work.
@@ -271,11 +295,9 @@ Or you can install them separately:
 
 ## 📚 Dictionary
 
-### 📚 Dictionary
-
 The following glossary is used to clarify certain concepts:
 
-#### General LLM / AI terms
+### General LLM / AI terms
 
 - **Prompt drift** is a phenomenon where the AI model starts to generate outputs that are not aligned with the original prompt. This can happen due to the model's training data, the prompt's wording, or the model's architecture.
 - **Pipeline, workflow or chain** is a sequence of tasks that are executed in a specific order. In the context of AI, a pipeline can refer to a sequence of AI models that are used to process data.
@@ -286,9 +308,13 @@ The following glossary is used to clarify certain concepts:
 - **Retrieval-augmented generation** is a machine learning paradigm where a model generates text by retrieving relevant information from a large database of text. This approach combines the benefits of generative models and retrieval models.
 - **Longtail** refers to non-common or rare events, items, or entities that are not well-represented in the training data of machine learning models. Longtail items are often challenging for models to predict accurately.
 
-_Note: Thos section is not complete dictionary, more list of general AI / LLM terms that has connection with Promptbook_
 
-#### 💯 Core concepts
+
+_Note: This section is not complete dictionary, more list of general AI / LLM terms that has connection with Promptbook_
+
+
+
+### 💯 Core concepts
 
 - [📚 Collection of pipelines](https://github.com/webgptorg/promptbook/discussions/65)
 - [📯 Pipeline](https://github.com/webgptorg/promptbook/discussions/64)
@@ -301,7 +327,7 @@ _Note: Thos section is not complete dictionary, more list of general AI / LLM te
 - [🔣 Words not tokens](https://github.com/webgptorg/promptbook/discussions/29)
 - [☯ Separation of concerns](https://github.com/webgptorg/promptbook/discussions/32)
 
-##### Advanced concepts
+#### Advanced concepts
 
 - [📚 Knowledge (Retrieval-augmented generation)](https://github.com/webgptorg/promptbook/discussions/41)
 - [🌏 Remote server](https://github.com/webgptorg/promptbook/discussions/89)
@@ -316,11 +342,6 @@ _Note: Thos section is not complete dictionary, more list of general AI / LLM te
 - [👮 Agent adversary expectations](https://github.com/webgptorg/promptbook/discussions/39)
 - [view more](https://github.com/webgptorg/promptbook/discussions/categories/concepts)
 
-### Terms specific to Promptbook TypeScript implementation
-
-- Anonymous mode
-- Application mode
-
 
 
 ## 🚂 Promptbook Engine
@@ -391,11 +412,11 @@ See [TODO.md](./TODO.md)
 <div style="display: flex; align-items: center; gap: 20px;">
 
 <a href="https://promptbook.studio/">
-    <img src="./design/promptbook-studio-logo.png" alt="Partner 3" height="100">
+    <img src="./design/promptbook-studio-logo.png" alt="Partner 3" height="70">
 </a>
 
 <a href="https://technologickainkubace.org/en/about-technology-incubation/about-the-project/">
-    <img src="./other/partners/CI-Technology-Incubation.png" alt="Technology Incubation" height="100">
+    <img src="./other/partners/CI-Technology-Incubation.png" alt="Technology Incubation" height="70">
 </a>
 
 </div>
package/esm/index.es.js CHANGED
@@ -5,8 +5,8 @@ import hexEncoder from 'crypto-js/enc-hex';
 import { basename, join, dirname } from 'path';
 import { format } from 'prettier';
 import parserHtml from 'prettier/parser-html';
-import { Subject } from 'rxjs';
 import { randomBytes } from 'crypto';
+import { Subject } from 'rxjs';
 import { forTime } from 'waitasecond';
 import sha256 from 'crypto-js/sha256';
 import { lookup, extension } from 'mime-types';
@@ -26,7 +26,7 @@ const BOOK_LANGUAGE_VERSION = '1.0.0';
  * @generated
  * @see https://github.com/webgptorg/promptbook
  */
-const PROMPTBOOK_ENGINE_VERSION = '0.88.0-9';
+const PROMPTBOOK_ENGINE_VERSION = '0.89.0-1';
 /**
  * TODO: string_promptbook_version should be constrained to the all versions of Promptbook engine
  * Note: [💞] Ignore a discrepancy between file name and entity name
@@ -158,7 +158,7 @@ const DEFAULT_MAX_PARALLEL_COUNT = 5; // <- TODO: [🤹‍♂️]
  *
  * @public exported from `@promptbook/core`
 **/
-const DEFAULT_MAX_EXECUTION_ATTEMPTS = 3; // <- TODO: [🤹‍♂️]
+const DEFAULT_MAX_EXECUTION_ATTEMPTS = 10; // <- TODO: [🤹‍♂️]
 // <- TODO: [🕝] Make also `BOOKS_DIRNAME_ALTERNATIVES`
 /**
  * Where to store the temporary downloads
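The hunk above only shows the constant changing: the default retry budget `DEFAULT_MAX_EXECUTION_ATTEMPTS` grows from 3 to 10. As a hedged sketch of what such a bound governs, a consumer typically looks like the hypothetical helper below (this is an illustration, not the engine's actual executor):

```typescript
const DEFAULT_MAX_EXECUTION_ATTEMPTS = 10; // <- new default in this release (was 3)

// Hypothetical illustration of how a retry bound like this is consumed;
// the real retry loop lives inside the Promptbook execution engine.
async function executeWithRetries<T>(
    attempt: () => Promise<T>,
    maxAttempts: number = DEFAULT_MAX_EXECUTION_ATTEMPTS,
): Promise<T> {
    let lastError: unknown;
    for (let attemptIndex = 0; attemptIndex < maxAttempts; attemptIndex++) {
        try {
            return await attempt(); // success -> stop retrying
        } catch (error) {
            lastError = error; // remember the failure and try again
        }
    }
    throw lastError; // all attempts exhausted
}
```

With the new default, a flaky LLM call gets up to 10 tries before the failure propagates.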
@@ -2003,6 +2003,21 @@ class MissingToolsError extends Error {
     }
 }
 
+/**
+ * Generates random token
+ *
+ * Note: This function is cryptographically secure (it uses crypto.randomBytes internally)
+ *
+ * @private internal helper function
+ * @returns secure random token
+ */
+function $randomToken(randomness) {
+    return randomBytes(randomness).toString('hex');
+}
+/**
+ * TODO: Maybe use nanoid instead https://github.com/ai/nanoid
+ */
+
 /**
  * This error indicates errors during the execution of the pipeline
  *
@@ -2010,11 +2025,17 @@ class MissingToolsError extends Error {
  */
 class PipelineExecutionError extends Error {
     constructor(message) {
+        // Added id parameter
         super(message);
         this.name = 'PipelineExecutionError';
+        // TODO: [🐙] DRY - Maybe $randomId
+        this.id = `error-${$randomToken(8 /* <- TODO: To global config + Use Base58 to avoid simmilar char conflicts */)}`;
         Object.setPrototypeOf(this, PipelineExecutionError.prototype);
     }
 }
+/**
+ * TODO: !!!!!! Add id to all errors
+ */
 
 /**
  * Determine if the pipeline is fully prepared
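Restated as a self-contained sketch (the real class calls the package's `$randomToken` helper; here `randomBytes` is inlined), the changed error class now stamps every instance with a random `id` of the form `error-<16 hex chars>` so individual failures can be tracked across serialization boundaries:

```typescript
import { randomBytes } from 'crypto';

// Sketch of the changed class from the hunk above.
class PipelineExecutionError extends Error {
    public id: string;

    constructor(message: string) {
        super(message);
        this.name = 'PipelineExecutionError';
        // Each instance gets a unique correlation id, e.g. `error-3f9c0a1b2c3d4e5f`
        this.id = `error-${randomBytes(8).toString('hex')}`;
        // Restore the prototype chain broken by extending built-in Error
        Object.setPrototypeOf(this, PipelineExecutionError.prototype);
    }
}
```

The `Object.setPrototypeOf` call keeps `instanceof PipelineExecutionError` working when the code is compiled to older ECMAScript targets.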
@@ -2053,21 +2074,6 @@ function isPipelinePrepared(pipeline) {
  * - [♨] Are tasks prepared
  */
 
-/**
- * Generates random token
- *
- * Note: This function is cryptographically secure (it uses crypto.randomBytes internally)
- *
- * @private internal helper function
- * @returns secure random token
- */
-function $randomToken(randomness) {
-    return randomBytes(randomness).toString('hex');
-}
-/**
- * TODO: Maybe use nanoid instead https://github.com/ai/nanoid
- */
-
 /**
  * Recursively converts JSON strings to JSON objects
 
@@ -2258,7 +2264,7 @@ const ALL_ERRORS = {
  * @public exported from `@promptbook/utils`
  */
 function deserializeError(error) {
-    const { name, stack } = error;
+    const { name, stack, id } = error; // Added id
     let { message } = error;
     let ErrorClass = ALL_ERRORS[error.name];
     if (ErrorClass === undefined) {
@@ -2273,7 +2279,9 @@ function deserializeError(error) {
             ${block(stack || '')}
         `);
     }
-    return new ErrorClass(message);
+    const deserializedError = new ErrorClass(message);
+    deserializedError.id = id; // Assign id to the error object
+    return deserializedError;
 }
 
 /**
@@ -2323,6 +2331,7 @@ function assertsTaskSuccessful(executionResult) {
  */
 function createTask(options) {
     const { taskType, taskProcessCallback } = options;
+    // TODO: [🐙] DRY
     const taskId = `${taskType.toLowerCase().substring(0, 4)}-${$randomToken(8 /* <- TODO: To global config + Use Base58 to avoid simmilar char conflicts */)}`;
     let status = 'RUNNING';
     const createdAt = new Date();
@@ -2355,7 +2364,7 @@ function createTask(options) {
             assertsTaskSuccessful(executionResult);
             status = 'FINISHED';
             currentValue = jsonStringsToJsons(executionResult);
-            // <- TODO: Convert JSON values in string to JSON objects
+            // <- TODO: [🧠] Is this a good idea to convert JSON strins to JSONs?
             partialResultSubject.next(executionResult);
         }
         catch (error) {
@@ -2419,19 +2428,21 @@ function createTask(options) {
  */
 function serializeError(error) {
     const { name, message, stack } = error;
+    const { id } = error;
     if (!Object.keys(ALL_ERRORS).includes(name)) {
         console.error(spaceTrim((block) => `
-
+
             Cannot serialize error with name "${name}"
 
             ${block(stack || message)}
-
+
        `));
    }
    return {
        name: name,
        message,
        stack,
+        id, // Include id in the serialized object
    };
}

@@ -2574,8 +2585,9 @@ function addUsage(...usageItems) {
  * @returns LLM tools with same functionality with added total cost counting
  * @public exported from `@promptbook/core`
  */
-function countTotalUsage(llmTools) {
+function countUsage(llmTools) {
     let totalUsage = ZERO_USAGE;
+    const spending = new Subject();
     const proxyTools = {
         get title() {
             // TODO: [🧠] Maybe put here some suffix
@@ -2585,12 +2597,15 @@ function countTotalUsage(llmTools) {
             // TODO: [🧠] Maybe put here some suffix
             return llmTools.description;
         },
-        async checkConfiguration() {
+        checkConfiguration() {
             return /* not await */ llmTools.checkConfiguration();
         },
         listModels() {
             return /* not await */ llmTools.listModels();
         },
+        spending() {
+            return spending.asObservable();
+        },
         getTotalUsage() {
             // <- Note: [🥫] Not using getter `get totalUsage` but `getTotalUsage` to allow this object to be proxied
             return totalUsage;
@@ -2601,6 +2616,7 @@ function countTotalUsage(llmTools) {
             // console.info('[🚕] callChatModel through countTotalUsage');
             const promptResult = await llmTools.callChatModel(prompt);
             totalUsage = addUsage(totalUsage, promptResult.usage);
+            spending.next(promptResult.usage);
             return promptResult;
         };
     }
@@ -2609,6 +2625,7 @@ function countTotalUsage(llmTools) {
             // console.info('[🚕] callCompletionModel through countTotalUsage');
             const promptResult = await llmTools.callCompletionModel(prompt);
             totalUsage = addUsage(totalUsage, promptResult.usage);
+            spending.next(promptResult.usage);
             return promptResult;
         };
     }
@@ -2617,6 +2634,7 @@ function countTotalUsage(llmTools) {
             // console.info('[🚕] callEmbeddingModel through countTotalUsage');
             const promptResult = await llmTools.callEmbeddingModel(prompt);
             totalUsage = addUsage(totalUsage, promptResult.usage);
+            spending.next(promptResult.usage);
             return promptResult;
         };
     }
@@ -3513,7 +3531,7 @@ async function preparePipeline(pipeline, tools, options) {
     // TODO: [🚐] Make arrayable LLMs -> single LLM DRY
     const _llms = arrayableToArray(tools.llm);
     const llmTools = _llms.length === 1 ? _llms[0] : joinLlmExecutionTools(..._llms);
-    const llmToolsWithUsage = countTotalUsage(llmTools);
+    const llmToolsWithUsage = countUsage(llmTools);
     // <- TODO: [🌯]
     /*
         TODO: [🧠][🪑][🔃] Should this be done or not
@@ -4343,6 +4361,9 @@ function countCharacters(text) {
     text = text.replace(/\p{Extended_Pictographic}(\u{200D}\p{Extended_Pictographic})*/gu, '-');
     return text.length;
 }
+/**
+ * TODO: [🥴] Implement counting in formats - like JSON, CSV, XML,...
+ */
 
 /**
  * Number of characters per standard line with 11pt Arial font size.
@@ -4374,6 +4395,9 @@ function countLines(text) {
     const lines = text.split('\n');
     return lines.reduce((count, line) => count + Math.ceil(line.length / CHARACTERS_PER_STANDARD_LINE), 0);
 }
+/**
+ * TODO: [🥴] Implement counting in formats - like JSON, CSV, XML,...
+ */
 
 /**
  * Counts number of pages in the text
@@ -4385,6 +4409,9 @@ function countLines(text) {
 function countPages(text) {
     return Math.ceil(countLines(text) / LINES_PER_STANDARD_PAGE);
 }
+/**
+ * TODO: [🥴] Implement counting in formats - like JSON, CSV, XML,...
+ */
 
 /**
  * Counts number of paragraphs in the text
@@ -4394,6 +4421,9 @@ function countPages(text) {
 function countParagraphs(text) {
     return text.split(/\n\s*\n/).filter((paragraph) => paragraph.trim() !== '').length;
 }
+/**
+ * TODO: [🥴] Implement counting in formats - like JSON, CSV, XML,...
+ */
 
 /**
  * Split text into sentences
@@ -4411,6 +4441,9 @@ function splitIntoSentences(text) {
 function countSentences(text) {
     return splitIntoSentences(text).length;
 }
+/**
+ * TODO: [🥴] Implement counting in formats - like JSON, CSV, XML,...
+ */
 
 /**
  * Counts number of words in the text
@@ -4424,6 +4457,9 @@ function countWords(text) {
     text = text.replace(/([a-z])([A-Z])/g, '$1 $2');
     return text.split(/[^a-zа-я0-9]+/i).filter((word) => word.length > 0).length;
 }
+/**
+ * TODO: [🥴] Implement counting in formats - like JSON, CSV, XML,...
+ */
 
 /**
  * Index of all counter functions
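The expectation counters annotated above chain simple heuristics: characters estimate lines, lines estimate pages, and words are split on non-alphanumeric boundaries (after breaking camelCase). A self-contained sketch, with the two layout constants' values assumed for illustration (their real values are defined elsewhere in this file):

```typescript
const CHARACTERS_PER_STANDARD_LINE = 63; // <- assumed value for illustration
const LINES_PER_STANDARD_PAGE = 44; // <- assumed value for illustration

// Each physical line contributes ceil(length / line width) "standard lines",
// so long lines count as several and empty lines as zero.
function countLines(text: string): number {
    return text
        .split('\n')
        .reduce((count, line) => count + Math.ceil(line.length / CHARACTERS_PER_STANDARD_LINE), 0);
}

// Pages are derived from the standard-line estimate.
function countPages(text: string): number {
    return Math.ceil(countLines(text) / LINES_PER_STANDARD_PAGE);
}

// Words: break camelCase first, then split on non-alphanumeric runs
// (the character class also covers Cyrillic, as in the diffed source).
function countWords(text: string): number {
    text = text.replace(/([a-z])([A-Z])/g, '$1 $2');
    return text.split(/[^a-zа-я0-9]+/i).filter((word) => word.length > 0).length;
}
```

These heuristics back the pipeline expectations (e.g. "at most 1 page"), which is why each function now carries the TODO about format-aware counting for JSON, CSV, XML and similar inputs.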