PyPI - discovery-engine-api - Versions diffs - 0.2.93__tar.gz → 0.2.94__tar.gz - Mend

discovery-engine-api 0.2.93.tar.gz → 0.2.94.tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (13)

{discovery_engine_api-0.2.93 → discovery_engine_api-0.2.94}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: discovery-engine-api
-Version: 0.2.93
+Version: 0.2.94
 Summary: Python SDK for Disco API
 Project-URL: Homepage, https://www.leap-labs.com
 Project-URL: Documentation, https://disco.leap-labs.com/llms-full.txt
@@ -78,19 +78,26 @@ Get your API key from the [Developers page](https://disco.leap-labs.com/develope
 await engine.discover(
     file: str | Path | pd.DataFrame,  # Dataset to analyze
     target_column: str,                 # Column to predict/analyze
-    analysis_depth: int = 2,          # 1=fast, higher=deeper search
+    analysis_depth: int = 2,            # 2=default, higher=deeper analysis
     visibility: str = "public",         # "public" (free) or "private" (credits)
     title: str | None = None,           # Dataset title
     description: str | None = None,     # Dataset description
     column_descriptions: dict[str, str] | None = None,  # Improves pattern explanations
-    excluded_columns: list[str] | None = None,           # Columns to exclude (e.g., IDs)
+    excluded_columns: list[str] | None = None,           # Columns to exclude — see below
+    use_llms: bool = False,             # True = LLM explanations (costs more) — see below
     timeout: float = 1800,              # Max seconds to wait
+    # Additional kwargs forwarded to run_async():
+    # task, author, source_url, timeseries_groups, ...
 )
 ```
 > **Tip:** Providing `column_descriptions` significantly improves pattern explanations. If your columns have non-obvious names, always describe them.
-> **Depth and visibility:** Public runs are free; results are published to the public gallery. Private runs consume credits based on file size × depth.
+> **`use_llms`:** Default `False`. Slower and more expensive, but you get smarter pre-processing, literature context and novelty assessment. Set to `True` if you want Disco-generated pattern descriptions, novelty assessment with citations, and report summaries. **Public runs always use LLMs regardless of this setting.** What changes when false: pattern descriptions fall back to generic text, novelty is not assessed (all patterns marked confirmatory, no citations), report summaries are omitted, integer columns with few unique values (e.g. "month" 1-12, "hour" 0-23) may be misclassified as categorical instead of continuous, and high-cardinality text columns get generic cluster names instead of descriptive ones. Use `engine.estimate()` to check credit cost before running.
+> **Visibility:** `"public"` runs are free but results are published, and analysis depth is locked to 2. `"private"` runs keep results confidential and consume credits.
+> **`excluded_columns`:** Always exclude identifiers (row IDs, UUIDs), data leakage (target renamed/reformatted), and tautological columns (alternative encodings of the same construct as the target). For example, if your target is `serious`, exclude `serious_outcome`, `not_serious`, `death` — they're part of the same classification system.
 ## Examples
@@ -111,32 +118,25 @@ result = await engine.discover(
         "age": "Patient age in years",
         "bmi": "Body mass index",
     },
-    excluded_columns=["patient_id", "timestamp"],
+    excluded_columns=["patient_id", "timestamp", "outcome_text"],  # IDs + tautological
 )
 ```
-### Inspecting Columns Before Running
-If you need to see the dataset's columns before choosing a target column, upload first and inspect:
+### Running in the Background
-```python
-# Upload once and get the server's parsed column list
-upload = await engine.upload_file(file="data.csv", title="My dataset")
-print(upload["columns"])   # [{"name": "col1", "type": "continuous", ...}, ...]
-print(upload["rowCount"])  # e.g., 5000
+Runs take 3–15 minutes. While waiting, the SDK logs progress automatically:
-# Pass the result to avoid re-uploading
-result = await engine.run_async(
-    file="data.csv",
-    target_column="col1",
-    wait=True,
-    upload_result=upload,  # skips the upload step
-)
+```
+Waiting for run abc123 to complete...
+  Status: waiting (position 2 in queue) | Est. wait: ~8 min | Upgrade at disco.leap-labs.com/account for priority processing
+  Status: processing (preprocessing — Processing data...) | Elapsed: 34.2s | ETA: ~6 min
+  Status: processing (training — Modelling data...) | Elapsed: 98.7s | ETA: ~4 min
+  Status: processing (interpreting — Extracting patterns...) | Elapsed: 284.1s | ETA: ~2 min
+  Status: processing (reporting — Building report...) | Elapsed: 412.3s | ETA: ~1 min
+Run completed in 467.8s
 ```
-### Running in the Background
-Runs take 3–15 minutes. If you need to do other work while Disco runs:
+If you need to do other work while Disco runs:
 ```python
 import asyncio
@@ -161,6 +161,29 @@ async def main():
 result = asyncio.run(main())
 ```
+### Inspecting Columns Before Running
+If you need to see the dataset's columns before choosing a target column — e.g., when column names are not obvious — upload first, inspect, then run without re-uploading:
+```python
+# Upload once and get the server's parsed column list
+upload = await engine.upload_file(file="data.csv", title="My dataset")
+# upload["file"]    -> {"key": "uploads/abc123.csv", "name": "data.csv",
+#                        "size": 1048576, "fileHash": "sha256:..."}
+# upload["columns"] -> [{"name": "col1", "type": "continuous", ...}, ...]
+# upload["rowCount"] -> 5000
+print(upload["columns"])
+print(upload["rowCount"])
+# Pass the result to avoid re-uploading
+result = await engine.run_async(
+    file="data.csv",
+    target_column="col1",
+    wait=True,
+    upload_result=upload,  # skips the upload step
+)
+```
 ### Synchronous Usage
 For scripts and Jupyter notebooks:
@@ -212,7 +235,7 @@ print(f"Explore: {result.report_url}")
 ## Credits and Pricing
-- **Public runs**: Free. Results published to public gallery.
+- **Public runs**: Free. Results published to public gallery. Locked to depth=2.
 - **Private runs**: Credits scale with file size, depth, and run configuration. $0.10 per credit. Use `engine.estimate()` to check cost before running.
 ```python
@@ -223,13 +246,27 @@ estimate = await engine.estimate(
     analysis_depth=2,
     visibility="private",
 )
-# estimate["cost"]["credits"] -> 21
-# estimate["account"]["sufficient"] -> True/False
+# estimate["cost"]["credits"]               -> 55
+# estimate["cost"]["price_usd"]             -> 5.5
+# estimate["time_estimate"]["estimated_seconds"] -> 360
+# estimate["account"]["sufficient"]         -> True/False
+# estimate["limits"]["max_analysis_depth"]  -> 23  (num_columns - 2)
 ```
 Manage credits and plans at [disco.leap-labs.com/account](https://disco.leap-labs.com/account).
+## Expected Data Format
+Disco expects a **flat table** — columns for features, rows for samples.
+- **One row per observation** — a patient, a sample, a transaction, a measurement, etc.
+- **One column per feature** — numeric, categorical, datetime, or free text are all fine
+- **One target column** — the outcome to analyze. Must have at least 2 distinct values.
+- **Missing values are OK** — Disco handles them automatically. Don't drop rows or impute beforehand.
+Not supported: images, raw text documents, nested/hierarchical JSON, multi-sheet Excel (use the first sheet or export to CSV).
 ## File Size Limits
 Uploads up to **5 GB**. Files are uploaded directly to cloud storage using presigned URLs.
@@ -245,16 +282,30 @@ Supported formats: **CSV**, **TSV**, **Excel (.xlsx)**, **JSON**, **Parquet**, *
 @dataclass
 class EngineResult:
     run_id: str
+    report_id: str | None                          # Report UUID (used in report_url)
     status: str                                    # "pending", "processing", "completed", "failed"
+    dataset_title: str | None                      # Title of the dataset
+    dataset_description: str | None                # Description of the dataset
+    total_rows: int | None
+    target_column: str | None                      # Column being predicted/analyzed
+    task: str | None                               # "regression", "binary_classification", "multiclass_classification"
     summary: Summary | None                        # LLM-generated insights
     patterns: list[Pattern]                        # Discovered patterns (the core output)
     columns: list[Column]                          # Feature info and statistics
-    feature_importance: FeatureImportance | None   # Global importance scores
     correlation_matrix: list[CorrelationEntry]     # Feature correlations
-    report_url: str | None                         # Shareable link to interactive web report
-    task: str | None                               # "regression", "binary_classification", "multiclass_classification"
-    total_rows: int | None
+    feature_importance: FeatureImportance | None   # Global importance scores
+    job_id: str | None                             # Job ID for tracking
+    job_status: str | None                         # Job queue status
+    queue_position: int | None                     # Position in queue when pending (1 = next up)
+    current_step: str | None                       # Active pipeline step (preprocessing, training, interpreting, reporting)
+    current_step_message: str | None               # Human-readable description of the current step
+    estimated_seconds: int | None                  # Estimated total processing time in seconds
+    estimated_wait_seconds: int | None             # Estimated queue wait time in seconds (pending only)
     error_message: str | None
+    report_url: str | None                         # Shareable link to interactive web report
+    hints: list[str]                               # Upgrade hints (non-empty for free-tier users with hidden patterns)
+    hidden_deep_count: int                         # Patterns hidden for free-tier accounts (upgrade to see all)
+    hidden_deep_novel_count: int                   # Novel patterns hidden for free-tier accounts
 ```
 ### Pattern
@@ -263,6 +314,8 @@ class EngineResult:
 @dataclass
 class Pattern:
     id: str
+    task: str                           # "regression", "binary_classification", "multiclass_classification"
+    target_column: str                  # Column being analyzed
     description: str                    # Human-readable description
     conditions: list[dict]              # Conditions defining the pattern
     p_value: float                      # FDR-adjusted p-value
@@ -272,8 +325,10 @@ class Pattern:
     citations: list[dict]               # Academic citations
     target_change_direction: str        # "max" (increases target) or "min" (decreases)
     abs_target_change: float            # Magnitude of effect
+    target_score: float                 # Mean target value (regression) or class fraction (classification) in the subgroup
     support_count: int                  # Rows matching this pattern
     support_percentage: float           # Percentage of dataset
+    target_class: str | None            # For classification tasks
     target_mean: float | None           # For regression tasks
     target_std: float | None
 ```
@@ -323,6 +378,7 @@ class Summary:
     overview: str                       # High-level summary of findings
     key_insights: list[str]             # Main takeaways
     novel_patterns: PatternGroup        # Novel pattern IDs and explanation
+    selected_pattern_id: str | None     # ID of the highlighted/featured pattern
 ```
 ### Column
@@ -342,17 +398,22 @@ class Column:
     std: float | None
     min: float | None
     max: float | None
+    iqr_min: float | None               # 25th percentile
+    iqr_max: float | None               # 75th percentile
+    mode: str | None                    # Most common value (categorical columns)
+    approx_unique: int | None           # Approximate distinct value count
+    null_percentage: float | None
     feature_importance_score: float | None  # Signed importance score
 ```
 ### FeatureImportance
-Computed using **Hierarchical Perturbation (HiPe)**, an ablation-based method. Scores are **signed** — positive means the feature increases the prediction, negative means it decreases it.
+Scores are **signed** — positive means the feature increases the prediction, negative means it decreases it.
 ```python
 @dataclass
 class FeatureImportance:
-    kind: str                           # "global"
+    kind: str                           # "global" | "local"
     baseline: float                     # Baseline model output
     scores: list[FeatureImportanceScore]
@@ -366,12 +427,13 @@ class FeatureImportanceScore:
 ## Error Handling
 ```python
-from discovery import (
-    Engine,
+from discovery import Engine
+from discovery.errors import (
     AuthenticationError,
     InsufficientCreditsError,
     RateLimitError,
     RunFailedError,
+    RunNotFoundError,
     PaymentRequiredError,
 )
@@ -381,11 +443,15 @@ except AuthenticationError as e:
     print(e.suggestion)  # "Check your API key at https://disco.leap-labs.com/developers"
 except InsufficientCreditsError as e:
     print(f"Need {e.credits_required}, have {e.credits_available}")
-    print(e.suggestion)  # "Purchase credits or run publicly for free"
+    print(e.suggestion)  # "Run with visibility='public' (free, results published) or purchase credits with engine.purchase_credits()."
 except RateLimitError as e:
     print(f"Retry after {e.retry_after} seconds")
 except RunFailedError as e:
     print(f"Run {e.run_id} failed: {e}")
+except RunNotFoundError as e:
+    print(f"Run {e.run_id} not found — may have been cleaned up")
+except PaymentRequiredError as e:
+    print(e.suggestion)  # "Attach a payment method with engine.add_payment_method(...)"
 except TimeoutError:
     pass  # Retrieve later with engine.wait_for_completion(run_id)
 ```
@@ -395,7 +461,7 @@ All errors include a `suggestion` field with actionable instructions.
 ## MCP Server
-Disco is available as an [MCP server](https://disco.leap-labs.com/.well-known/mcp.json) with tools for the full discovery lifecycle — estimate, analyze, check status, get results, manage account.
+Disco is available as an [MCP server](https://disco.leap-labs.com/.well-known/mcp.json) with tools for the full discovery lifecycle — estimate, analyze, check status, get results, manage account. To subscribe or purchase credits via MCP, call `discovery_add_payment_method` first to attach a Stripe payment method.
 ```json
 {

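The new `engine.estimate()` fields shown in this diff (`cost.price_usd`, `account.sufficient`, `limits.max_analysis_depth`) are enough to drive a simple pre-run decision. The sketch below is hypothetical: `choose_run_config` is not part of the SDK. It only reads a dict shaped like the README's example response, and its fallback policy (switch to a free public run at depth 2 when credits are insufficient) is an illustrative assumption built on the documented rule that public runs are free and locked to depth 2.

```python
from typing import Any

PUBLIC_DEPTH = 2  # public runs are documented as locked to analysis_depth=2


def choose_run_config(estimate: dict[str, Any], analysis_depth: int) -> dict[str, Any]:
    """Hypothetical helper: decide visibility from an estimate() response.

    `estimate` is assumed to have the shape shown in the README example, and
    `analysis_depth` is the depth the estimate was requested for.
    """
    max_depth = estimate["limits"]["max_analysis_depth"]
    if analysis_depth > max_depth:
        # The README gives max_analysis_depth as num_columns - 2.
        raise ValueError(f"analysis_depth {analysis_depth} exceeds maximum {max_depth}")
    if estimate["account"]["sufficient"]:
        # Enough credits: keep results confidential with a private run.
        return {
            "visibility": "private",
            "analysis_depth": analysis_depth,
            "price_usd": estimate["cost"]["price_usd"],
        }
    # Not enough credits: fall back to a free public run (results are published).
    return {"visibility": "public", "analysis_depth": PUBLIC_DEPTH, "price_usd": 0.0}
```

The returned `visibility` and `analysis_depth` values map onto the documented `engine.discover(...)` parameters of the same names. Whether an automatic switch to a public run is acceptable depends on whether the data may be published.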
{discovery_engine_api-0.2.93 → discovery_engine_api-0.2.94}/README.md RENAMED

@@ -41,19 +41,26 @@ Get your API key from the [Developers page](https://disco.leap-labs.com/develope
 await engine.discover(
     file: str | Path | pd.DataFrame,  # Dataset to analyze
     target_column: str,                 # Column to predict/analyze
-    analysis_depth: int = 2,          # 1=fast, higher=deeper search
+    analysis_depth: int = 2,            # 2=default, higher=deeper analysis
     visibility: str = "public",         # "public" (free) or "private" (credits)
     title: str | None = None,           # Dataset title
     description: str | None = None,     # Dataset description
     column_descriptions: dict[str, str] | None = None,  # Improves pattern explanations
-    excluded_columns: list[str] | None = None,           # Columns to exclude (e.g., IDs)
+    excluded_columns: list[str] | None = None,           # Columns to exclude — see below
+    use_llms: bool = False,             # True = LLM explanations (costs more) — see below
     timeout: float = 1800,              # Max seconds to wait
+    # Additional kwargs forwarded to run_async():
+    # task, author, source_url, timeseries_groups, ...
 )
 ```
 > **Tip:** Providing `column_descriptions` significantly improves pattern explanations. If your columns have non-obvious names, always describe them.
-> **Depth and visibility:** Public runs are free; results are published to the public gallery. Private runs consume credits based on file size × depth.
+> **`use_llms`:** Default `False`. Slower and more expensive, but you get smarter pre-processing, literature context and novelty assessment. Set to `True` if you want Disco-generated pattern descriptions, novelty assessment with citations, and report summaries. **Public runs always use LLMs regardless of this setting.** What changes when false: pattern descriptions fall back to generic text, novelty is not assessed (all patterns marked confirmatory, no citations), report summaries are omitted, integer columns with few unique values (e.g. "month" 1-12, "hour" 0-23) may be misclassified as categorical instead of continuous, and high-cardinality text columns get generic cluster names instead of descriptive ones. Use `engine.estimate()` to check credit cost before running.
+> **Visibility:** `"public"` runs are free but results are published, and analysis depth is locked to 2. `"private"` runs keep results confidential and consume credits.
+> **`excluded_columns`:** Always exclude identifiers (row IDs, UUIDs), data leakage (target renamed/reformatted), and tautological columns (alternative encodings of the same construct as the target). For example, if your target is `serious`, exclude `serious_outcome`, `not_serious`, `death` — they're part of the same classification system.
 ## Examples
@@ -74,32 +81,25 @@ result = await engine.discover(
         "age": "Patient age in years",
         "bmi": "Body mass index",
     },
-    excluded_columns=["patient_id", "timestamp"],
+    excluded_columns=["patient_id", "timestamp", "outcome_text"],  # IDs + tautological
 )
 ```
-### Inspecting Columns Before Running
-If you need to see the dataset's columns before choosing a target column, upload first and inspect:
+### Running in the Background
-```python
-# Upload once and get the server's parsed column list
-upload = await engine.upload_file(file="data.csv", title="My dataset")
-print(upload["columns"])   # [{"name": "col1", "type": "continuous", ...}, ...]
-print(upload["rowCount"])  # e.g., 5000
+Runs take 3–15 minutes. While waiting, the SDK logs progress automatically:
-# Pass the result to avoid re-uploading
-result = await engine.run_async(
-    file="data.csv",
-    target_column="col1",
-    wait=True,
-    upload_result=upload,  # skips the upload step
-)
+```
+Waiting for run abc123 to complete...
+  Status: waiting (position 2 in queue) | Est. wait: ~8 min | Upgrade at disco.leap-labs.com/account for priority processing
+  Status: processing (preprocessing — Processing data...) | Elapsed: 34.2s | ETA: ~6 min
+  Status: processing (training — Modelling data...) | Elapsed: 98.7s | ETA: ~4 min
+  Status: processing (interpreting — Extracting patterns...) | Elapsed: 284.1s | ETA: ~2 min
+  Status: processing (reporting — Building report...) | Elapsed: 412.3s | ETA: ~1 min
+Run completed in 467.8s
 ```
-### Running in the Background
-Runs take 3–15 minutes. If you need to do other work while Disco runs:
+If you need to do other work while Disco runs:
 ```python
 import asyncio
@@ -124,6 +124,29 @@ async def main():
 result = asyncio.run(main())
 ```
+### Inspecting Columns Before Running
+If you need to see the dataset's columns before choosing a target column — e.g., when column names are not obvious — upload first, inspect, then run without re-uploading:
+```python
+# Upload once and get the server's parsed column list
+upload = await engine.upload_file(file="data.csv", title="My dataset")
+# upload["file"]    -> {"key": "uploads/abc123.csv", "name": "data.csv",
+#                        "size": 1048576, "fileHash": "sha256:..."}
+# upload["columns"] -> [{"name": "col1", "type": "continuous", ...}, ...]
+# upload["rowCount"] -> 5000
+print(upload["columns"])
+print(upload["rowCount"])
+# Pass the result to avoid re-uploading
+result = await engine.run_async(
+    file="data.csv",
+    target_column="col1",
+    wait=True,
+    upload_result=upload,  # skips the upload step
+)
+```
 ### Synchronous Usage
 For scripts and Jupyter notebooks:
@@ -175,7 +198,7 @@ print(f"Explore: {result.report_url}")
 ## Credits and Pricing
-- **Public runs**: Free. Results published to public gallery.
+- **Public runs**: Free. Results published to public gallery. Locked to depth=2.
 - **Private runs**: Credits scale with file size, depth, and run configuration. $0.10 per credit. Use `engine.estimate()` to check cost before running.
 ```python
@@ -186,13 +209,27 @@ estimate = await engine.estimate(
     analysis_depth=2,
     visibility="private",
 )
-# estimate["cost"]["credits"] -> 21
-# estimate["account"]["sufficient"] -> True/False
+# estimate["cost"]["credits"]               -> 55
+# estimate["cost"]["price_usd"]             -> 5.5
+# estimate["time_estimate"]["estimated_seconds"] -> 360
+# estimate["account"]["sufficient"]         -> True/False
+# estimate["limits"]["max_analysis_depth"]  -> 23  (num_columns - 2)
 ```
 Manage credits and plans at [disco.leap-labs.com/account](https://disco.leap-labs.com/account).
+## Expected Data Format
+Disco expects a **flat table** — columns for features, rows for samples.
+- **One row per observation** — a patient, a sample, a transaction, a measurement, etc.
+- **One column per feature** — numeric, categorical, datetime, or free text are all fine
+- **One target column** — the outcome to analyze. Must have at least 2 distinct values.
+- **Missing values are OK** — Disco handles them automatically. Don't drop rows or impute beforehand.
+Not supported: images, raw text documents, nested/hierarchical JSON, multi-sheet Excel (use the first sheet or export to CSV).
 ## File Size Limits
 Uploads up to **5 GB**. Files are uploaded directly to cloud storage using presigned URLs.
@@ -208,16 +245,30 @@ Supported formats: **CSV**, **TSV**, **Excel (.xlsx)**, **JSON**, **Parquet**, *
 @dataclass
 class EngineResult:
     run_id: str
+    report_id: str | None                          # Report UUID (used in report_url)
     status: str                                    # "pending", "processing", "completed", "failed"
+    dataset_title: str | None                      # Title of the dataset
+    dataset_description: str | None                # Description of the dataset
+    total_rows: int | None
+    target_column: str | None                      # Column being predicted/analyzed
+    task: str | None                               # "regression", "binary_classification", "multiclass_classification"
     summary: Summary | None                        # LLM-generated insights
     patterns: list[Pattern]                        # Discovered patterns (the core output)
     columns: list[Column]                          # Feature info and statistics
-    feature_importance: FeatureImportance | None   # Global importance scores
     correlation_matrix: list[CorrelationEntry]     # Feature correlations
-    report_url: str | None                         # Shareable link to interactive web report
-    task: str | None                               # "regression", "binary_classification", "multiclass_classification"
-    total_rows: int | None
+    feature_importance: FeatureImportance | None   # Global importance scores
+    job_id: str | None                             # Job ID for tracking
+    job_status: str | None                         # Job queue status
+    queue_position: int | None                     # Position in queue when pending (1 = next up)
+    current_step: str | None                       # Active pipeline step (preprocessing, training, interpreting, reporting)
+    current_step_message: str | None               # Human-readable description of the current step
+    estimated_seconds: int | None                  # Estimated total processing time in seconds
+    estimated_wait_seconds: int | None             # Estimated queue wait time in seconds (pending only)
     error_message: str | None
+    report_url: str | None                         # Shareable link to interactive web report
+    hints: list[str]                               # Upgrade hints (non-empty for free-tier users with hidden patterns)
+    hidden_deep_count: int                         # Patterns hidden for free-tier accounts (upgrade to see all)
+    hidden_deep_novel_count: int                   # Novel patterns hidden for free-tier accounts
 ```
 ### Pattern
@@ -226,6 +277,8 @@ class EngineResult:
 @dataclass
 class Pattern:
     id: str
+    task: str                           # "regression", "binary_classification", "multiclass_classification"
+    target_column: str                  # Column being analyzed
     description: str                    # Human-readable description
     conditions: list[dict]              # Conditions defining the pattern
     p_value: float                      # FDR-adjusted p-value
@@ -235,8 +288,10 @@ class Pattern:
     citations: list[dict]               # Academic citations
     target_change_direction: str        # "max" (increases target) or "min" (decreases)
     abs_target_change: float            # Magnitude of effect
+    target_score: float                 # Mean target value (regression) or class fraction (classification) in the subgroup
     support_count: int                  # Rows matching this pattern
     support_percentage: float           # Percentage of dataset
+    target_class: str | None            # For classification tasks
     target_mean: float | None           # For regression tasks
     target_std: float | None
 ```
@@ -286,6 +341,7 @@ class Summary:
     overview: str                       # High-level summary of findings
     key_insights: list[str]             # Main takeaways
     novel_patterns: PatternGroup        # Novel pattern IDs and explanation
+    selected_pattern_id: str | None     # ID of the highlighted/featured pattern
 ```
 ### Column
@@ -305,17 +361,22 @@ class Column:
     std: float | None
     min: float | None
     max: float | None
+    iqr_min: float | None               # 25th percentile
+    iqr_max: float | None               # 75th percentile
+    mode: str | None                    # Most common value (categorical columns)
+    approx_unique: int | None           # Approximate distinct value count
+    null_percentage: float | None
     feature_importance_score: float | None  # Signed importance score
 ```
 ### FeatureImportance
-Computed using **Hierarchical Perturbation (HiPe)**, an ablation-based method. Scores are **signed** — positive means the feature increases the prediction, negative means it decreases it.
+Scores are **signed** — positive means the feature increases the prediction, negative means it decreases it.
 ```python
 @dataclass
 class FeatureImportance:
-    kind: str                           # "global"
+    kind: str                           # "global" | "local"
     baseline: float                     # Baseline model output
     scores: list[FeatureImportanceScore]
@@ -329,12 +390,13 @@ class FeatureImportanceScore:
 ## Error Handling
 ```python
-from discovery import (
-    Engine,
+from discovery import Engine
+from discovery.errors import (
     AuthenticationError,
     InsufficientCreditsError,
     RateLimitError,
     RunFailedError,
+    RunNotFoundError,
     PaymentRequiredError,
 )
@@ -344,11 +406,15 @@ except AuthenticationError as e:
     print(e.suggestion)  # "Check your API key at https://disco.leap-labs.com/developers"
 except InsufficientCreditsError as e:
     print(f"Need {e.credits_required}, have {e.credits_available}")
-    print(e.suggestion)  # "Purchase credits or run publicly for free"
+    print(e.suggestion)  # "Run with visibility='public' (free, results published) or purchase credits with engine.purchase_credits()."
 except RateLimitError as e:
     print(f"Retry after {e.retry_after} seconds")
 except RunFailedError as e:
     print(f"Run {e.run_id} failed: {e}")
+except RunNotFoundError as e:
+    print(f"Run {e.run_id} not found — may have been cleaned up")
+except PaymentRequiredError as e:
+    print(e.suggestion)  # "Attach a payment method with engine.add_payment_method(...)"
 except TimeoutError:
     pass  # Retrieve later with engine.wait_for_completion(run_id)
 ```
@@ -358,7 +424,7 @@ All errors include a `suggestion` field with actionable instructions.
 ## MCP Server
-Disco is available as an [MCP server](https://disco.leap-labs.com/.well-known/mcp.json) with tools for the full discovery lifecycle — estimate, analyze, check status, get results, manage account.
+Disco is available as an [MCP server](https://disco.leap-labs.com/.well-known/mcp.json) with tools for the full discovery lifecycle — estimate, analyze, check status, get results, manage account. To subscribe or purchase credits via MCP, call `discovery_add_payment_method` first to attach a Stripe payment method.
 ```json
 {

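The new "Expected Data Format" and `excluded_columns` guidance can be partly checked locally before uploading. The sketch below is a hypothetical helper, not an SDK function. It assumes the table is given as a plain `{column: values}` dict with `None` for missing values, and it enforces only the mechanical rules stated in the README: equal-length columns and at least 2 distinct target values. Its identifier check is a crude heuristic (every non-missing value is a distinct string). It will miss integer IDs and may flag free-text columns, and it cannot detect leakage or tautological columns, which need domain knowledge, as in the README's `serious` example.

```python
from typing import Any


def preflight(table: dict[str, list[Any]], target_column: str) -> dict[str, Any]:
    """Hypothetical pre-upload check for a flat table given as {column: values}.

    None marks a missing value; missing values are left in place, since the
    README says Disco handles them and rows should not be dropped or imputed.
    """
    if target_column not in table:
        raise KeyError(f"target column {target_column!r} not found")
    row_counts = {len(values) for values in table.values()}
    if len(row_counts) != 1:
        raise ValueError("every column must have the same number of rows")
    distinct_targets = {v for v in table[target_column] if v is not None}
    if len(distinct_targets) < 2:
        raise ValueError("target column must have at least 2 distinct values")
    id_like = []
    for name, values in table.items():
        if name == target_column:
            continue
        present = [v for v in values if v is not None]
        # Crude heuristic: every non-missing value is a distinct string, as with
        # row IDs or UUIDs. Review these before passing them to excluded_columns.
        if (
            len(present) > 1
            and all(isinstance(v, str) for v in present)
            and len(set(present)) == len(present)
        ):
            id_like.append(name)
    return {"rows": row_counts.pop(), "id_like_columns": id_like}
```

For example, a table with columns `patient_id`, `age`, `ward` and `outcome` would report `patient_id` as a candidate for `excluded_columns`, while leaving the numeric `age` column alone even though its values are unique.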
{discovery_engine_api-0.2.93 → discovery_engine_api-0.2.94}/discovery/__init__.py RENAMED

@@ -1,6 +1,6 @@
 """Disco Python SDK."""
-__version__ = "0.2.93"
+__version__ = "0.2.94"
 from discovery.client import Engine
 from discovery.types import (

discovery_engine_api-0.2.94/discovery/integrations/crewai.py ADDED

@@ -0,0 +1,118 @@
+"""CrewAI tool wrapper for Disco (Discovery Engine).
+Install: pip install discovery-engine-api crewai
+Usage:
+    from discovery.integrations.crewai import DiscoTool
+    tool = DiscoTool(api_key="disco_...")
+    agent = Agent(tools=[tool], ...)
+"""
+from __future__ import annotations
+import json
+from typing import Any
+from crewai.tools import BaseTool
+from pydantic import BaseModel, Field
+class DiscoInput(BaseModel):
+    """Input for the Disco discovery tool."""
+    file_url: str = Field(
+        description="URL of the tabular dataset to analyse (CSV, Excel, Parquet, JSON, etc.)"
+    )
+    target_column: str = Field(
+        description="The column to predict/explain — the outcome you want to understand"
+    )
+    visibility: str = Field(
+        default="public",
+        description="'public' (free, results published) or 'private' (costs credits, results private)",
+    )
+    analysis_depth: int = Field(
+        default=2,
+        description="Analysis depth — higher means deeper analysis but more credits. Default 2.",
+    )
+    excluded_columns: list[str] = Field(
+        default_factory=list,
+        description="Columns to exclude (IDs, data leakage, tautological columns)",
+    )
+    use_llms: bool = Field(
+        default=False,
+        description="If True, enables LLM-powered summaries, literature context, and novelty assessment. Slower and more expensive. Public runs always use LLMs.",
+    )
+class DiscoTool(BaseTool):
+    """Superhuman exploratory data analysis.
+    Disco finds novel, statistically validated patterns in tabular data — the
+    feature interactions, subgroup effects, and conditional relationships that
+    correlation analysis, LLMs, and manual exploration miss. Every finding comes
+    with p-values, effect sizes, and academic literature citations.
+    Free for public data. No ML expertise required.
+    """
+    name: str = "disco"
+    description: str = (
+        "Automated scientific discovery from tabular data. Use when you need to find "
+        "patterns, interactions, or subgroup effects in a dataset — especially when you "
+        "don't know what to look for. Returns statistically validated patterns with "
+        "p-values, effect sizes, and literature citations. Free for public data."
+    )
+    args_schema: type[BaseModel] = DiscoInput
+    api_key: str = ""
+    def __init__(self, api_key: str, **kwargs: Any):
+        super().__init__(api_key=api_key, **kwargs)
+    def _run(
+        self,
+        file_url: str,
+        target_column: str,
+        visibility: str = "public",
+        analysis_depth: int = 2,
+        excluded_columns: list[str] | None = None,
+        use_llms: bool = False,
+    ) -> str:
+        from discovery import Engine
+        engine = Engine(api_key=self.api_key)
+        result = engine.discover_sync(
+            file=file_url,
+            target_column=target_column,
+            visibility=visibility,
+            analysis_depth=analysis_depth,
+            excluded_columns=excluded_columns or [],
+            use_llms=use_llms,
+        )
+        patterns = []
+        for p in result.patterns:
+            patterns.append(
+                {
+                    "description": p.description,
+                    "conditions": p.conditions,
+                    "p_value": p.p_value,
+                    "effect_size": p.abs_target_change,
+                    "direction": p.target_change_direction,
+                    "support_count": p.support_count,
+                    "support_percentage": p.support_percentage,
+                    "novelty": p.novelty_type,
+                    "novelty_explanation": p.novelty_explanation,
+                    "citations": p.citations,
+                }
+            )
+        output = {
+            "report_url": result.report_url,
+            "pattern_count": len(patterns),
+            "patterns": patterns,
+        }
+        if hasattr(result, "summary") and result.summary:
+            output["summary"] = result.summary.overview
+        return json.dumps(output, indent=2, default=str)
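Both 0.2.94 wrappers flatten each pattern into a plain dict and serialise the result to JSON for the agent. Below is a minimal, self-contained sketch of that output shape. It uses a hypothetical `StubPattern` dataclass in place of the SDK's pattern objects, modelling only the attributes `_run` reads. The real types of those attributes are not shown in this diff.

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field
from typing import Any


@dataclass
class StubPattern:
    """Hypothetical stand-in for an SDK pattern object (not the real class)."""
    description: str
    conditions: list[Any]
    p_value: float
    abs_target_change: float
    target_change_direction: str
    support_count: int
    support_percentage: float
    novelty_type: str
    novelty_explanation: str
    citations: list[Any] = field(default_factory=list)


def format_patterns(report_url: str, patterns: list[StubPattern], overview: str | None = None) -> str:
    """Build the same JSON layout that DiscoTool._run returns in 0.2.94."""
    rows = [
        {
            "description": p.description,
            "conditions": p.conditions,
            "p_value": p.p_value,
            "effect_size": p.abs_target_change,  # renamed on output
            "direction": p.target_change_direction,
            "support_count": p.support_count,
            "support_percentage": p.support_percentage,
            "novelty": p.novelty_type,  # renamed on output
            "novelty_explanation": p.novelty_explanation,
            "citations": p.citations,
        }
        for p in patterns
    ]
    output: dict[str, Any] = {"report_url": report_url, "pattern_count": len(rows), "patterns": rows}
    if overview:
        # The wrappers only add "summary" when the result carries one.
        output["summary"] = overview
    # default=str stringifies any value json cannot encode natively.
    return json.dumps(output, indent=2, default=str)
```

Two names change on the way out: `abs_target_change` is emitted as `effect_size`, and `novelty_type` as `novelty`.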

discovery_engine_api-0.2.94/discovery/integrations/langchain.py ADDED

@@ -0,0 +1,122 @@
+"""LangChain tool wrapper for Disco (Discovery Engine).
+Install: pip install discovery-engine-api langchain-core
+Usage:
+    from discovery.integrations.langchain import DiscoTool
+    tool = DiscoTool(api_key="disco_...")
+    result = tool.invoke({"file_url": "https://example.com/data.csv", "target_column": "outcome"})
+"""
+from __future__ import annotations
+import asyncio
+import json
+from typing import Any
+from langchain_core.tools import BaseTool
+from pydantic import BaseModel, Field
+class DiscoInput(BaseModel):
+    """Input for the Disco discovery tool."""
+    file_url: str = Field(
+        description="URL of the tabular dataset to analyse (CSV, Excel, Parquet, JSON, etc.)"
+    )
+    target_column: str = Field(
+        description="The column to predict/explain — the outcome you want to understand"
+    )
+    visibility: str = Field(
+        default="public",
+        description="'public' (free, results published) or 'private' (costs credits, results private)",
+    )
+    analysis_depth: int = Field(
+        default=2,
+        description="Analysis depth — higher means deeper analysis but more credits. Default 2.",
+    )
+    excluded_columns: list[str] = Field(
+        default_factory=list,
+        description="Columns to exclude (IDs, data leakage, tautological columns)",
+    )
+    use_llms: bool = Field(
+        default=False,
+        description="If True, enables LLM-powered summaries, literature context, and novelty assessment. Slower and more expensive. Public runs always use LLMs.",
+    )
+class DiscoTool(BaseTool):
+    """Superhuman exploratory data analysis.
+    Disco finds novel, statistically validated patterns in tabular data — the
+    feature interactions, subgroup effects, and conditional relationships that
+    correlation analysis, LLMs, and manual exploration miss. Every finding comes
+    with p-values, effect sizes, and academic literature citations.
+    Free for public data. No ML expertise required.
+    """
+    name: str = "disco"
+    description: str = (
+        "Automated scientific discovery from tabular data. Use when you need to find "
+        "patterns, interactions, or subgroup effects in a dataset — especially when you "
+        "don't know what to look for. Returns statistically validated patterns with "
+        "p-values, effect sizes, and literature citations. Free for public data."
+    )
+    args_schema: type[BaseModel] = DiscoInput
+    api_key: str = ""
+    def __init__(self, api_key: str, **kwargs: Any):
+        super().__init__(api_key=api_key, **kwargs)
+    def _run(self, **kwargs: Any) -> str:
+        return asyncio.run(self._arun(**kwargs))
+    async def _arun(
+        self,
+        file_url: str,
+        target_column: str,
+        visibility: str = "public",
+        analysis_depth: int = 2,
+        excluded_columns: list[str] | None = None,
+        use_llms: bool = False,
+    ) -> str:
+        from discovery import Engine
+        engine = Engine(api_key=self.api_key)
+        result = await engine.discover(
+            file=file_url,
+            target_column=target_column,
+            visibility=visibility,
+            analysis_depth=analysis_depth,
+            excluded_columns=excluded_columns or [],
+            use_llms=use_llms,
+        )
+        patterns = []
+        for p in result.patterns:
+            patterns.append(
+                {
+                    "description": p.description,
+                    "conditions": p.conditions,
+                    "p_value": p.p_value,
+                    "effect_size": p.abs_target_change,
+                    "direction": p.target_change_direction,
+                    "support_count": p.support_count,
+                    "support_percentage": p.support_percentage,
+                    "novelty": p.novelty_type,
+                    "novelty_explanation": p.novelty_explanation,
+                    "citations": p.citations,
+                }
+            )
+        output = {
+            "report_url": result.report_url,
+            "pattern_count": len(patterns),
+            "patterns": patterns,
+        }
+        if hasattr(result, "summary") and result.summary:
+            output["summary"] = result.summary.overview
+        return json.dumps(output, indent=2, default=str)
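In 0.2.94, the LangChain wrapper's `_run` delegates to `_arun` through `asyncio.run()`. That call raises `RuntimeError` when the calling thread already has a running event loop, for example a Jupyter cell or an async agent runtime that invokes the sync path. One possible workaround is sketched below. It assumes the coroutine does not depend on objects bound to the outer loop. `run_coroutine_blocking` is a hypothetical helper, not part of the SDK, and it blocks the calling thread until the coroutine finishes.

```python
from __future__ import annotations

import asyncio
import concurrent.futures
from collections.abc import Coroutine
from typing import Any, TypeVar

T = TypeVar("T")


def run_coroutine_blocking(coro: Coroutine[Any, Any, T]) -> T:
    """Run *coro* to completion from synchronous code, with or without a running loop."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop is running in this thread, so asyncio.run is safe here.
        return asyncio.run(coro)
    # A loop is already running in this thread: execute the coroutine on a
    # fresh event loop in a worker thread and wait for its result.
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        return pool.submit(asyncio.run, coro).result()
```

A wrapper could then implement `_run` as `return run_coroutine_blocking(self._arun(**kwargs))`.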

{discovery_engine_api-0.2.93 → discovery_engine_api-0.2.94}/pyproject.toml RENAMED

@@ -1,6 +1,6 @@
 [project]
 name = "discovery-engine-api"
-version = "0.2.93"
+version = "0.2.94"
 description = "Python SDK for Disco API"
 readme = "README.md"
 requires-python = ">=3.10"
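The 0.2.94 release replaces `DiscoveryEngineTool` with `DiscoTool` in both integration modules. Code that must run against both 0.2.93 and 0.2.94 therefore needs to know which interface is installed. The sketch below gates on the installed version. `parse_release` and `has_disco_tool` are hypothetical helpers, and the threshold reflects only this diff. A version-independent alternative is to wrap `from discovery.integrations.langchain import DiscoTool` in `try`/`except ImportError`.

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version


def parse_release(v: str) -> tuple[int, ...]:
    """Parse a plain dotted release such as "0.2.94" into a tuple of ints.

    Pre-release, post-release and local suffixes are not handled in this sketch.
    """
    return tuple(int(part) for part in v.split("."))


def has_disco_tool(installed: str | None = None) -> bool:
    """Return True if the SDK release is at least 0.2.94.

    Assumption: releases after 0.2.94 keep the DiscoTool interface shown in this diff.
    """
    if installed is None:
        try:
            installed = version("discovery-engine-api")
        except PackageNotFoundError:
            return False  # SDK not installed at all
    return parse_release(installed) >= (0, 2, 94)
```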

discovery_engine_api-0.2.93/discovery/integrations/crewai.py DELETED

@@ -1,96 +0,0 @@
-"""CrewAI tool wrapper for Disco.
-Usage:
-    from discovery.integrations.crewai import DiscoveryEngineTool
-    tool = DiscoveryEngineTool(api_key="disco_...")
-    # Add to your CrewAI agent
-    agent = Agent(tools=[tool], ...)
-"""
-from __future__ import annotations
-import json
-from typing import Any
-from crewai.tools import BaseTool
-from pydantic import Field
-from discovery import Engine
-class DiscoveryEngineTool(BaseTool):
-    """CrewAI tool that runs Disco on tabular data.
-    Finds novel, statistically validated patterns — feature interactions,
-    subgroup effects, and conditional relationships — that correlation analysis,
-    LLMs, and hypothesis-driven approaches miss.
-    """
-    name: str = "Disco"
-    description: str = (
-        "Run Disco on tabular data to find novel, statistically "
-        "validated patterns that you cannot find with pandas, SQL, or by prompting "
-        "an LLM to analyze data. Use when you need to go beyond correlation. "
-        "Input: JSON with 'file' (path), 'target_column' (column to analyze). "
-        "Optional: 'visibility' (public/private), 'analysis_depth' (search depth). "
-        "Returns patterns with conditions, p-values, novelty scores, citations."
-    )
-    api_key: str = Field(description="Disco API key (disco_...)")
-    quiet: bool = Field(default=True, description="Suppress progress output")
-    def _run(self, query: str) -> str:
-        """Run Disco."""
-        try:
-            params = json.loads(query)
-        except json.JSONDecodeError:
-            return json.dumps({"error": "Input must be JSON with 'file' and 'target_column' keys."})
-        file_path = params.get("file")
-        target_column = params.get("target_column")
-        if not file_path or not target_column:
-            return json.dumps({"error": "Missing required keys: 'file' and 'target_column'."})
-        engine = Engine(api_key=self.api_key, quiet=self.quiet)
-        try:
-            result = engine.discover_sync(
-                file=file_path,
-                target_column=target_column,
-                analysis_depth=params.get("analysis_depth", 2),
-                visibility=params.get("visibility", "public"),
-            )
-        except Exception as e:
-            return json.dumps({"error": str(e), "suggestion": getattr(e, "suggestion", None)})
-        return _format_result(result)
-def _format_result(result: Any) -> str:
-    """Format EngineResult as a JSON string."""
-    patterns = []
-    for p in result.patterns:
-        patterns.append(
-            {
-                "description": p.description,
-                "conditions": p.conditions,
-                "p_value": p.p_value,
-                "novelty_type": p.novelty_type,
-                "novelty_explanation": p.novelty_explanation,
-                "effect_size": p.abs_target_change,
-                "direction": p.target_change_direction,
-                "support_percentage": p.support_percentage,
-            }
-        )
-    output: dict[str, Any] = {
-        "status": result.status,
-        "patterns": patterns,
-        "report_url": result.report_url,
-        "dashboard_urls": result.dashboard_urls,
-    }
-    if result.summary:
-        output["summary"] = result.summary.overview
-        output["key_insights"] = result.summary.key_insights
-    return json.dumps(output, indent=2)
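Both 0.2.93 wrappers took a single JSON string with `file` and `target_column` keys. The 0.2.94 `DiscoTool` instead takes structured arguments validated by `DiscoInput`, and the dataset argument is now named `file_url`. Callers that built the old query strings could translate them with a helper like the hypothetical `legacy_query_to_kwargs` below. Note that this diff documents only URLs for the new field, while the old `file` key was described as a path.

```python
from __future__ import annotations

import json
from typing import Any


def legacy_query_to_kwargs(query: str) -> dict[str, Any]:
    """Translate a 0.2.93-style JSON query into 0.2.94 DiscoTool arguments.

    Hypothetical migration helper; it is not shipped with the SDK.
    """
    params = json.loads(query)
    if not params.get("file") or not params.get("target_column"):
        # Same requirement the 0.2.93 wrappers enforced.
        raise ValueError("Missing required keys: 'file' and 'target_column'.")
    kwargs: dict[str, Any] = {
        "file_url": params["file"],  # renamed: "file" -> "file_url"
        "target_column": params["target_column"],
    }
    # Optional keys keep their names; omitted ones fall back to the tool defaults.
    for key in ("visibility", "analysis_depth"):
        if key in params:
            kwargs[key] = params[key]
    return kwargs
```

The resulting dict can be passed to the LangChain tool's `invoke`, as in the new module docstring: `tool.invoke(legacy_query_to_kwargs(query))`.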

discovery_engine_api-0.2.93/discovery/integrations/langchain.py DELETED

@@ -1,105 +0,0 @@
-"""LangChain tool wrapper for Disco.
-Usage:
-    from discovery.integrations.langchain import DiscoveryEngineTool
-    tool = DiscoveryEngineTool(api_key="disco_...")
-    # Add to your agent's tool list
-    agent = initialize_agent(tools=[tool], llm=llm)
-"""
-from __future__ import annotations
-import asyncio
-import json
-from typing import Any
-from langchain.tools import BaseTool
-from pydantic import Field
-from discovery import Engine
-class DiscoveryEngineTool(BaseTool):
-    """LangChain tool that runs Disco on tabular data.
-    Finds novel, statistically validated patterns — feature interactions,
-    subgroup effects, and conditional relationships — that correlation analysis,
-    LLMs, and hypothesis-driven approaches miss.
-    """
-    name: str = "discovery_engine"
-    description: str = (
-        "Run Disco on tabular data to find novel, statistically "
-        "validated patterns that you cannot find with pandas, SQL, or by prompting "
-        "an LLM to analyze data. Use this when you need to go beyond correlation "
-        "and find things nobody thought to look for. "
-        "Input should be a JSON string with keys: "
-        '"file" (path to CSV/Excel/Parquet), "target_column" (column to analyze). '
-        'Optional: "visibility" (public/private), "analysis_depth" (search depth). '
-        "Returns structured patterns with conditions, p-values, novelty scores, "
-        "and citations."
-    )
-    api_key: str = Field(description="Disco API key (disco_...)")
-    quiet: bool = Field(default=True, description="Suppress progress output")
-    def _run(self, query: str) -> str:
-        """Run synchronously."""
-        return asyncio.get_event_loop().run_until_complete(self._arun(query))
-    async def _arun(self, query: str) -> str:
-        """Run Disco asynchronously."""
-        try:
-            params = json.loads(query)
-        except json.JSONDecodeError:
-            # Treat as just a file path with no target column
-            return json.dumps({"error": "Input must be JSON with 'file' and 'target_column' keys."})
-        file_path = params.get("file")
-        target_column = params.get("target_column")
-        if not file_path or not target_column:
-            return json.dumps({"error": "Missing required keys: 'file' and 'target_column'."})
-        engine = Engine(api_key=self.api_key, quiet=self.quiet)
-        try:
-            result = await engine.discover(
-                file=file_path,
-                target_column=target_column,
-                analysis_depth=params.get("analysis_depth", 2),
-                visibility=params.get("visibility", "public"),
-            )
-        except Exception as e:
-            return json.dumps({"error": str(e), "suggestion": getattr(e, "suggestion", None)})
-        return _format_result(result)
-def _format_result(result: Any) -> str:
-    """Format EngineResult as a JSON string for the LLM."""
-    patterns = []
-    for p in result.patterns:
-        patterns.append(
-            {
-                "description": p.description,
-                "conditions": p.conditions,
-                "p_value": p.p_value,
-                "novelty_type": p.novelty_type,
-                "novelty_explanation": p.novelty_explanation,
-                "effect_size": p.abs_target_change,
-                "direction": p.target_change_direction,
-                "support_percentage": p.support_percentage,
-            }
-        )
-    output: dict[str, Any] = {
-        "status": result.status,
-        "patterns": patterns,
-        "report_url": result.report_url,
-        "dashboard_urls": result.dashboard_urls,
-    }
-    if result.summary:
-        output["summary"] = result.summary.overview
-        output["key_insights"] = result.summary.key_insights
-    return json.dumps(output, indent=2)
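The output JSON changed as well. Compared with 0.2.93, the 0.2.94 wrappers make the following changes:

- The top-level `status`, `dashboard_urls`, and `key_insights` keys are dropped, and `pattern_count` is added.
- Each pattern's `novelty_type` is renamed to `novelty`, and `support_count` and `citations` are added.
- Exceptions are no longer caught and returned as `{"error": ...}`, so errors now propagate to the caller.

The sketch below reads patterns from either format. `read_patterns` is a hypothetical helper.

```python
from __future__ import annotations

import json
from typing import Any


def read_patterns(tool_output: str) -> list[dict[str, Any]]:
    """Return pattern dicts from 0.2.93 or 0.2.94 wrapper output.

    The novelty label is normalised to the 0.2.94 key name "novelty".
    """
    data = json.loads(tool_output)
    if "error" in data:
        # Only the 0.2.93 wrappers reported failures as JSON.
        raise RuntimeError(data["error"])
    patterns = []
    for raw in data.get("patterns", []):
        p = dict(raw)  # copy so the parsed input is not mutated
        if "novelty_type" in p and "novelty" not in p:
            p["novelty"] = p.pop("novelty_type")
        patterns.append(p)
    return patterns
```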

{discovery_engine_api-0.2.93 → discovery_engine_api-0.2.94}/.gitignore RENAMED

File without changes

{discovery_engine_api-0.2.93 → discovery_engine_api-0.2.94}/discovery/client.py RENAMED

File without changes

{discovery_engine_api-0.2.93 → discovery_engine_api-0.2.94}/discovery/errors.py RENAMED

File without changes

{discovery_engine_api-0.2.93 → discovery_engine_api-0.2.94}/discovery/integrations/__init__.py RENAMED

File without changes

{discovery_engine_api-0.2.93 → discovery_engine_api-0.2.94}/discovery/types.py RENAMED

File without changes
