npm - @promptbook/markitdown - Versions diffs - 0.89.0-9 → 0.89.0 - Mend

@promptbook/markitdown 0.89.0-9 → 0.89.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (40)

package/README.md CHANGED

@@ -23,10 +23,6 @@
-<blockquote style="color: #ff8811">
-    <b>⚠ Warning:</b> This is a pre-release version of the library. It is not yet ready for production use. Please look at <a href="https://www.npmjs.com/package/@promptbook/core?activeTab=versions">latest stable release</a>.
-</blockquote>
 ## 📦 Package `@promptbook/markitdown`
 - Promptbooks are [divided into several](#-packages) packages, all are published from [single monorepo](https://github.com/webgptorg/promptbook).
@@ -244,6 +240,10 @@ But unlike programming languages, it is designed to be understandable by non-pro
+## 🔒 Security
+For information on reporting security vulnerabilities, see our [Security Policy](./SECURITY.md).
 ## 📦 Packages _(for developers)_
 This library is divided into several packages, all are published from [single monorepo](https://github.com/webgptorg/promptbook).
@@ -300,7 +300,7 @@ The following glossary is used to clarify certain concepts:
 ### General LLM / AI terms
 -   **Prompt drift** is a phenomenon where the AI model starts to generate outputs that are not aligned with the original prompt. This can happen due to the model's training data, the prompt's wording, or the model's architecture.
--   **Pipeline, workflow or chain** is a sequence of tasks that are executed in a specific order. In the context of AI, a pipeline can refer to a sequence of AI models that are used to process data.
+-   [**Pipeline, workflow scenario or chain** is a sequence of tasks that are executed in a specific order. In the context of AI, a pipeline can refer to a sequence of AI models that are used to process data.](https://github.com/webgptorg/promptbook/discussions/88)
 -   **Fine-tuning** is a process where a pre-trained AI model is further trained on a specific dataset to improve its performance on a specific task.
 -   **Zero-shot learning** is a machine learning paradigm where a model is trained to perform a task without any labeled examples. Instead, the model is provided with a description of the task and is expected to generate the correct output.
 -   **Few-shot learning** is a machine learning paradigm where a model is trained to perform a task with only a few labeled examples. This is in contrast to traditional machine learning, where models are trained on large datasets.
@@ -308,10 +308,6 @@ The following glossary is used to clarify certain concepts:
 -   **Retrieval-augmented generation** is a machine learning paradigm where a model generates text by retrieving relevant information from a large database of text. This approach combines the benefits of generative models and retrieval models.
 -   **Longtail** refers to non-common or rare events, items, or entities that are not well-represented in the training data of machine learning models. Longtail items are often challenging for models to predict accurately.
 _Note: This section is not complete dictionary, more list of general AI / LLM terms that has connection with Promptbook_
@@ -425,6 +421,8 @@ See [TODO.md](./TODO.md)
 ## 🖋️ Contributing
-We are open to pull requests, feedback, and suggestions.
+You can also ⭐ star the project, [follow us on GitHub](https://github.com/hejny) or [various other social networks](https://www.pavolhejny.com/contact/).
+We are open to [pull requests, feedback, and suggestions](./CONTRIBUTING.md).
+## 📞 Support
-You can also ⭐ star the project, [follow us on GitHub](https://github.com/hejny) or [various other social networks](https://www.pavolhejny.com/contact/).
+If you need help or have questions, please check our [Support Resources](./SUPPORT.md).

package/esm/index.es.js CHANGED

@@ -26,7 +26,7 @@ const BOOK_LANGUAGE_VERSION = '1.0.0';
  * @generated
  * @see https://github.com/webgptorg/promptbook
  */
-const PROMPTBOOK_ENGINE_VERSION = '0.89.0-9';
+const PROMPTBOOK_ENGINE_VERSION = '0.89.0';
 /**
  * TODO: string_promptbook_version should be constrained to the all versions of Promptbook engine
  * Note: [💞] Ignore a discrepancy between file name and entity name
@@ -89,6 +89,7 @@ const ADMIN_EMAIL = 'pavol@ptbk.io';
  * @public exported from `@promptbook/core`
  */
 const ADMIN_GITHUB_NAME = 'hejny';
+//            <- TODO: [🐊] Pick the best claim
 /**
  * When the title is not provided, the default title is used
  *
@@ -121,6 +122,7 @@ const VALUE_STRINGS = {
     infinity: '(infinity; ∞)',
     negativeInfinity: '(negative infinity; -∞)',
     unserializable: '(unserializable value)',
+    circular: '(circular JSON)',
 };
 /**
  * Small number limit
@@ -160,7 +162,7 @@ const DEFAULT_MAX_PARALLEL_COUNT = 5; // <- TODO: [🤹‍♂️]
  */
 const DEFAULT_MAX_EXECUTION_ATTEMPTS = 10; // <- TODO: [🤹‍♂️]
 // <- TODO: [🕝] Make also `BOOKS_DIRNAME_ALTERNATIVES`
-// TODO: !!!!!! Just .promptbook dir, hardocode others
+// TODO: Just `.promptbook` in config, hardcode subfolders like `download-cache` or `execution-cache`
 /**
  * Where to store the temporary downloads
  *
@@ -878,9 +880,60 @@ class ParseError extends Error {
  * TODO: Maybe split `ParseError` and `ApplyError`
  */
+/**
+ * This error type indicates that somewhere in the code non-Error object was thrown and it was wrapped into the `WrappedError`
+ *
+ * @public exported from `@promptbook/core`
+ */
+class WrappedError extends Error {
+    constructor(whatWasThrown) {
+        const tag = `[🤮]`;
+        console.error(tag, whatWasThrown);
+        super(spaceTrim$1(`
+                Non-Error object was thrown
+                Note: Look for ${tag} in the console for more details
+                Please report issue on ${ADMIN_EMAIL}
+            `));
+        this.name = 'WrappedError';
+        Object.setPrototypeOf(this, WrappedError.prototype);
+    }
+}
+/**
+ * Helper used in catch blocks to assert that the error is an instance of `Error`
+ *
+ * @param whatWasThrown Any object that was thrown
+ * @returns Nothing if the error is an instance of `Error`
+ * @throws `WrappedError` or `UnexpectedError` if the error is not standard
+ *
+ * @private within the repository
+ */
+function assertsError(whatWasThrown) {
+    // Case 1: Handle error which was rethrown as `WrappedError`
+    if (whatWasThrown instanceof WrappedError) {
+        const wrappedError = whatWasThrown;
+        throw wrappedError;
+    }
+    // Case 2: Handle unexpected errors
+    if (whatWasThrown instanceof UnexpectedError) {
+        const unexpectedError = whatWasThrown;
+        throw unexpectedError;
+    }
+    // Case 3: Handle standard errors - keep them up to consumer
+    if (whatWasThrown instanceof Error) {
+        return;
+    }
+    // Case 4: Handle non-standard errors - wrap them into `WrappedError` and throw
+    throw new WrappedError(whatWasThrown);
+}
 /**
  * Function isValidJsonString will tell you if the string is valid JSON or not
  *
+ * @param value The string to check
+ * @returns True if the string is a valid JSON string, false otherwise
+ *
  * @public exported from `@promptbook/utils`
  */
 function isValidJsonString(value /* <- [👨‍⚖️] */) {
@@ -889,9 +942,7 @@ function isValidJsonString(value /* <- [👨‍⚖️] */) {
         return true;
     }
     catch (error) {
-        if (!(error instanceof Error)) {
-            throw error;
-        }
+        assertsError(error);
         if (error.message.includes('Unexpected token')) {
             return false;
         }
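
Reviewer sketch (not part of the package): the hunks above replace the repeated `if (!(error instanceof Error)) throw error;` guard with `assertsError(error)`. The snippet below is a minimal stand-alone approximation of that control flow. The class bodies are simplified stand-ins (no `spaceTrim`, no console logging), and `whatWasThrown` as a stored field and `describeFailure` are hypothetical names used only for illustration.

```javascript
// Simplified stand-ins for the error classes shown in the diff
class UnexpectedError extends Error {
    constructor(message) {
        super(message);
        this.name = 'UnexpectedError';
        Object.setPrototypeOf(this, UnexpectedError.prototype);
    }
}

class WrappedError extends Error {
    constructor(whatWasThrown) {
        super('Non-Error object was thrown');
        this.name = 'WrappedError';
        this.whatWasThrown = whatWasThrown; // <- hypothetical field, not in the diff
        Object.setPrototypeOf(this, WrappedError.prototype);
    }
}

// Same four cases as in the diff, condensed
function assertsError(whatWasThrown) {
    // Cases 1 and 2: already wrapped or unexpected errors are rethrown as they are
    if (whatWasThrown instanceof WrappedError || whatWasThrown instanceof UnexpectedError) {
        throw whatWasThrown;
    }
    // Case 3: standard errors are left for the caller to handle
    if (whatWasThrown instanceof Error) {
        return;
    }
    // Case 4: anything else (string, number, plain object, ...) is wrapped
    throw new WrappedError(whatWasThrown);
}

// Hypothetical caller showing the catch-block pattern used across the bundle
function describeFailure(action) {
    try {
        action();
        return 'ok';
    } catch (error) {
        assertsError(error);
        // From here on, `error` is known to be an `Error`
        return `handled: ${error.message}`;
    }
}

console.log(describeFailure(() => { throw new Error('boom'); })); // handled: boom
```

Two behavioral differences from the old guard follow from the diffed code: a thrown non-Error value now surfaces as a `WrappedError` instead of being rethrown raw, and an `UnexpectedError` is rethrown by `assertsError` itself, so a later `if (error instanceof UnexpectedError) throw error;` in the same catch block can no longer be reached.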
@@ -1244,9 +1295,7 @@ function checkSerializableAsJson(options) {
                 JSON.stringify(value); // <- TODO: [0]
             }
             catch (error) {
-                if (!(error instanceof Error)) {
-                    throw error;
-                }
+                assertsError(error);
                 throw new UnexpectedError(spaceTrim((block) => `
                             \`${name}\` is not serializable
@@ -2035,7 +2084,7 @@ class PipelineExecutionError extends Error {
     }
 }
 /**
- * TODO: !!!!!! Add id to all errors
+ * TODO: [🧠][🌂] Add id to all errors
  */
 /**
@@ -2246,7 +2295,10 @@ const PROMPTBOOK_ERRORS = {
     PipelineExecutionError,
     PipelineLogicError,
     PipelineUrlError,
+    AuthenticationError,
+    PromptbookFetchError,
     UnexpectedError,
+    WrappedError,
     // TODO: [🪑]> VersionMismatchError,
 };
 /**
@@ -2263,8 +2315,6 @@ const COMMON_JAVASCRIPT_ERRORS = {
     TypeError,
     URIError,
     AggregateError,
-    AuthenticationError,
-    PromptbookFetchError,
     /*
   Note: Not widely supported
   > InternalError,
@@ -2387,8 +2437,8 @@ function createTask(options) {
                 updatedAt = new Date();
                 errors.push(...executionResult.errors);
                 warnings.push(...executionResult.warnings);
-                // <- TODO: !!! Only unique errors and warnings should be added (or filtered)
-                // TODO: [🧠] !!! errors, warning, isSuccessful  are redundant both in `ExecutionTask` and `ExecutionTask.currentValue`
+                // <- TODO: [🌂] Only unique errors and warnings should be added (or filtered)
+                // TODO: [🧠] !! errors, warning, isSuccessful  are redundant both in `ExecutionTask` and `ExecutionTask.currentValue`
                 //            Also maybe move `ExecutionTask.currentValue.usage` -> `ExecutionTask.usage`
                 //            And delete `ExecutionTask.currentValue.preparedPipeline`
                 assertsTaskSuccessful(executionResult);
@@ -2398,6 +2448,7 @@ function createTask(options) {
                 partialResultSubject.next(executionResult);
             }
             catch (error) {
+                assertsError(error);
                 status = 'ERROR';
                 errors.push(error);
                 partialResultSubject.error(error);
@@ -2789,14 +2840,15 @@ class MultipleLlmExecutionTools {
                 }
             }
             catch (error) {
-                if (!(error instanceof Error) || error instanceof UnexpectedError) {
+                assertsError(error);
+                if (error instanceof UnexpectedError) {
                     throw error;
                 }
                 errors.push({ llmExecutionTools, error });
             }
         }
         if (errors.length === 1) {
-            throw errors[0];
+            throw errors[0].error;
         }
         else if (errors.length > 1) {
             throw new PipelineExecutionError(
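
Reviewer sketch (not part of the package): the change from `throw errors[0]` to `throw errors[0].error` matters because the array holds records of the form `{ llmExecutionTools, error }`, not errors. The snippet below illustrates the difference with a hypothetical record; the `title` property and the helper names are assumptions made for the example.

```javascript
// Hypothetical record mirroring `errors.push({ llmExecutionTools, error })`
const errors = [
    { llmExecutionTools: { title: 'Demo tools' }, error: new TypeError('Model unavailable') },
];

// Old behavior: the whole record is thrown
function throwRecord() {
    throw errors[0];
}

// New behavior: only the inner error is thrown
function throwInnerError() {
    throw errors[0].error;
}

// Small helper that returns whatever a function throws
function caught(fn) {
    try {
        fn();
    } catch (thrown) {
        return thrown;
    }
    return undefined;
}

const before = caught(throwRecord);
const after = caught(throwInnerError);

console.log(before instanceof Error); // false
console.log(after instanceof Error, after.message); // true Model unavailable
```

With the old form, a consumer that checks `instanceof Error` or reads `.message` would receive a plain object, and under the new `assertsError` helper such a value would end up wrapped in a `WrappedError`.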
@@ -3251,9 +3303,7 @@ const promptbookFetch = async (urlOrRequest, init) => {
         return await fetch(urlOrRequest, init);
     }
     catch (error) {
-        if (!(error instanceof Error)) {
-            throw error;
-        }
+        assertsError(error);
         let url;
         if (typeof urlOrRequest === 'string') {
             url = urlOrRequest;
@@ -3484,9 +3534,7 @@ async function prepareKnowledgePieces(knowledgeSources, tools, options) {
             knowledgePreparedUnflatten[index] = pieces;
         }
         catch (error) {
-            if (!(error instanceof Error)) {
-                throw error;
-            }
+            assertsError(error);
             console.warn(error);
             // <- TODO: [🏮] Some standard way how to transform errors into warnings and how to handle non-critical fails during the tasks
         }
@@ -3778,13 +3826,19 @@ function valueToString(value) {
             return value.toISOString();
         }
         else {
-            return JSON.stringify(value);
+            try {
+                return JSON.stringify(value);
+            }
+            catch (error) {
+                if (error instanceof TypeError && error.message.includes('circular structure')) {
+                    return VALUE_STRINGS.circular;
+                }
+                throw error;
+            }
         }
     }
     catch (error) {
-        if (!(error instanceof Error)) {
-            throw error;
-        }
+        assertsError(error);
         console.error(error);
         return VALUE_STRINGS.unserializable;
     }
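
Reviewer sketch (not part of the package): the inner `try`/`catch` added to `valueToString` can be exercised in isolation as below. `stringifyOrPlaceholder` is a hypothetical name for the extracted logic. The check relies on the engine's error message containing `circular structure`, which holds for V8 (Node.js, Chromium); other engines may word the message differently, in which case the error falls through to the outer handler.

```javascript
const VALUE_STRINGS = {
    unserializable: '(unserializable value)',
    circular: '(circular JSON)',
};

// Hypothetical extraction of the inner try/catch from `valueToString`
function stringifyOrPlaceholder(value) {
    try {
        return JSON.stringify(value);
    } catch (error) {
        // V8 reports cycles as a TypeError mentioning "circular structure"
        if (error instanceof TypeError && error.message.includes('circular structure')) {
            return VALUE_STRINGS.circular;
        }
        // Anything else is left to the outer handler
        throw error;
    }
}

// An object that references itself
const node = { name: 'root' };
node.self = node;

console.log(stringifyOrPlaceholder({ a: 1 })); // {"a":1}
console.log(stringifyOrPlaceholder(node)); // on V8: (circular JSON)
```

Other serialization failures, such as a `BigInt` value, are still rethrown here, so in the diffed function they continue to reach the outer `catch` and yield `VALUE_STRINGS.unserializable`.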
@@ -3841,9 +3895,7 @@ function extractVariablesFromJavascript(script) {
             }
     }
     catch (error) {
-        if (!(error instanceof Error)) {
-            throw error;
-        }
+        assertsError(error);
         throw new ParseError(spaceTrim$1((block) => `
                     Can not extract variables from the script
                     ${block(error.stack || error.message)}
@@ -3962,6 +4014,28 @@ const MANDATORY_CSV_SETTINGS = Object.freeze({
     // encoding: 'utf-8',
 });
+/**
+ * Function to check if a string is valid CSV
+ *
+ * @param value The string to check
+ * @returns True if the string is a valid CSV string, false otherwise
+ *
+ * @public exported from `@promptbook/utils`
+ */
+function isValidCsvString(value) {
+    try {
+        // A simple check for CSV format: at least one comma and no invalid characters
+        if (value.includes(',') && /^[\w\s,"']+$/.test(value)) {
+            return true;
+        }
+        return false;
+    }
+    catch (error) {
+        assertsError(error);
+        return false;
+    }
+}
 /**
  * Definition for CSV spreadsheet
  *
@@ -3972,7 +4046,7 @@ const CsvFormatDefinition = {
     formatName: 'CSV',
     aliases: ['SPREADSHEET', 'TABLE'],
     isValid(value, settings, schema) {
-        return true;
+        return isValidCsvString(value);
     },
     canBeValid(partialValue, settings, schema) {
         return true;
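
Reviewer sketch (not part of the package): `CsvFormatDefinition.isValid` previously accepted everything and now delegates to `isValidCsvString`. The check is a character allow-list (word characters, whitespace, commas, and quotes) plus a required comma, so it is stricter than CSV itself. The snippet below copies the condition from the diff without the `try`/`catch` and probes a few inputs; the sample strings are made up for illustration.

```javascript
// Condition copied from the diff, error handling omitted for brevity
function isValidCsvString(value) {
    return value.includes(',') && /^[\w\s,"']+$/.test(value);
}

const samples = [
    'name,age\nAlice,30', // letters, digits, comma, newline
    'name;age', // no comma at all
    'single column', // no comma at all
    'item,price\nTea,1.5', // "." is outside the allowed character class
    'id,code\n1,A-7', // "-" is outside the allowed character class
];

for (const sample of samples) {
    console.log(JSON.stringify(sample), isValidCsvString(sample));
}
// Only the first sample is accepted
```

If this reading is right, CSV output containing decimals, hyphens, e-mail addresses, or non-ASCII letters would now fail `isValid`, which is worth keeping in mind when upgrading from 0.89.0-9.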
@@ -4126,6 +4200,30 @@ const TextFormatDefinition = {
  * TODO: [🏢] Allow to expect something inside each item of list and other formats
  */
+/**
+ * Function to check if a string is valid XML
+ *
+ * @param value
+ * @returns True if the string is a valid XML string, false otherwise
+ *
+ * @public exported from `@promptbook/utils`
+ */
+function isValidXmlString(value) {
+    try {
+        const parser = new DOMParser();
+        const parsedDocument = parser.parseFromString(value, 'application/xml');
+        const parserError = parsedDocument.getElementsByTagName('parsererror');
+        if (parserError.length > 0) {
+            return false;
+        }
+        return true;
+    }
+    catch (error) {
+        assertsError(error);
+        return false;
+    }
+}
 /**
  * Definition for XML format
  *
@@ -4135,7 +4233,7 @@ const XmlFormatDefinition = {
     formatName: 'XML',
     mimeType: 'application/xml',
     isValid(value, settings, schema) {
-        return true;
+        return isValidXmlString(value);
     },
     canBeValid(partialValue, settings, schema) {
         return true;
@@ -4708,9 +4806,7 @@ async function executeAttempts(options) {
                                 break scripts;
                             }
                             catch (error) {
-                                if (!(error instanceof Error)) {
-                                    throw error;
-                                }
+                                assertsError(error);
                                 if (error instanceof UnexpectedError) {
                                     throw error;
                                 }
@@ -4780,9 +4876,7 @@ async function executeAttempts(options) {
                             break scripts;
                         }
                         catch (error) {
-                            if (!(error instanceof Error)) {
-                                throw error;
-                            }
+                            assertsError(error);
                             if (error instanceof UnexpectedError) {
                                 throw error;
                             }
@@ -5403,9 +5497,7 @@ async function executePipeline(options) {
         await Promise.all(resolving);
     }
     catch (error /* <- Note: [3] */) {
-        if (!(error instanceof Error)) {
-            throw error;
-        }
+        assertsError(error);
         // Note: No need to rethrow UnexpectedError
         // if (error instanceof UnexpectedError) {
         // Note: Count usage, [🧠] Maybe put to separate function executionReportJsonToUsage + DRY [🤹‍♂️]