nosible 0.2.1__tar.gz → 0.2.3__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (27)
  1. {nosible-0.2.1/src/nosible.egg-info → nosible-0.2.3}/PKG-INFO +79 -14
  2. {nosible-0.2.1 → nosible-0.2.3}/README.md +77 -11
  3. {nosible-0.2.1 → nosible-0.2.3}/pyproject.toml +4 -5
  4. {nosible-0.2.1 → nosible-0.2.3}/src/nosible/classes/result.py +17 -8
  5. {nosible-0.2.1 → nosible-0.2.3}/src/nosible/classes/result_set.py +42 -22
  6. {nosible-0.2.1 → nosible-0.2.3}/src/nosible/nosible_client.py +47 -56
  7. {nosible-0.2.1 → nosible-0.2.3/src/nosible.egg-info}/PKG-INFO +79 -14
  8. {nosible-0.2.1 → nosible-0.2.3}/src/nosible.egg-info/SOURCES.txt +0 -1
  9. {nosible-0.2.1 → nosible-0.2.3}/src/nosible.egg-info/requires.txt +1 -2
  10. {nosible-0.2.1 → nosible-0.2.3}/tests/test_02_results.py +24 -2
  11. {nosible-0.2.1 → nosible-0.2.3}/tests/test_03_search_searchset.py +0 -1
  12. {nosible-0.2.1 → nosible-0.2.3}/tests/test_04_snippets.py +0 -1
  13. nosible-0.2.1/src/nosible/utils/question_builder.py +0 -131
  14. {nosible-0.2.1 → nosible-0.2.3}/LICENSE +0 -0
  15. {nosible-0.2.1 → nosible-0.2.3}/setup.cfg +0 -0
  16. {nosible-0.2.1 → nosible-0.2.3}/setup.py +0 -0
  17. {nosible-0.2.1 → nosible-0.2.3}/src/nosible/__init__.py +0 -0
  18. {nosible-0.2.1 → nosible-0.2.3}/src/nosible/classes/search.py +0 -0
  19. {nosible-0.2.1 → nosible-0.2.3}/src/nosible/classes/search_set.py +0 -0
  20. {nosible-0.2.1 → nosible-0.2.3}/src/nosible/classes/snippet.py +0 -0
  21. {nosible-0.2.1 → nosible-0.2.3}/src/nosible/classes/snippet_set.py +0 -0
  22. {nosible-0.2.1 → nosible-0.2.3}/src/nosible/classes/web_page.py +0 -0
  23. {nosible-0.2.1 → nosible-0.2.3}/src/nosible/utils/json_tools.py +0 -0
  24. {nosible-0.2.1 → nosible-0.2.3}/src/nosible/utils/rate_limiter.py +0 -0
  25. {nosible-0.2.1 → nosible-0.2.3}/src/nosible.egg-info/dependency_links.txt +0 -0
  26. {nosible-0.2.1 → nosible-0.2.3}/src/nosible.egg-info/top_level.txt +0 -0
  27. {nosible-0.2.1 → nosible-0.2.3}/tests/test_01_nosible.py +0 -0
@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: nosible
-Version: 0.2.1
+Version: 0.2.3
 Summary: Python client for the NOSIBLE Search API
 Home-page: https://github.com/NosibleAI/nosible
 Author: Stuart Reid, Matthew Dicks, Richard Taylor, Gareth Warburton
@@ -27,7 +27,6 @@ Classifier: Operating System :: OS Independent
 Requires-Python: >=3.9
 Description-Content-Type: text/markdown
 License-File: LICENSE
-Requires-Dist: requests
 Requires-Dist: polars
 Requires-Dist: duckdb
 Requires-Dist: openai
@@ -35,8 +34,8 @@ Requires-Dist: tantivy
 Requires-Dist: pyrate-limiter
 Requires-Dist: tenacity
 Requires-Dist: cryptography
-Requires-Dist: pandas
 Requires-Dist: pyarrow
+Requires-Dist: pandas
 Dynamic: author
 Dynamic: home-page
 Dynamic: license-file
@@ -80,13 +79,15 @@ uv pip install nosible
 **Requirements**:
 
 * Python 3.9+
-* requests
 * polars
-* cryptography
-* tenacity
-* pyrate-limiter
-* tantivy
+* duckdb
 * openai
+* tantivy
+* pyrate-limiter
+* tenacity
+* cryptography
+* pyarrow
+* pandas
 
 ### 🔑 Authentication
 
@@ -140,9 +141,28 @@ os.environ["LLM_API_KEY"] = "sk-..."
 
 ### 🚀 Examples
 
-#### Fast Search
+#### Search
+
+The `search` and `searches` functions enable you to retrieve **up to 100** results for a single query. This is ideal for most use cases where you need to retrieve information quickly and efficiently.
+
+- Use the `search` method when you need between **10 and 100** results for a single query.
+- The same applies to the `searches` and `.similar()` methods.
 
-Retrieve up to 100 results with optional filters:
+- A search will return a set of `Result` objects.
+- The `Result` object represents a single search result and provides methods to access the result's properties.
+- `url`: The URL of the search result.
+- `title`: The title of the search result.
+- `description`: A brief description or summary of the search result.
+- `netloc`: The network location (domain) of the URL.
+- `published`: The publication date of the search result.
+- `visited`: The date and time when the result was visited.
+- `author`: The author of the content.
+- `content`: The main content or body of the search result.
+- `language`: The language code of the content (e.g., 'en' for English).
+- `similarity`: Similarity score with respect to a query or reference.
+
+These can be accessed directly from the `Result` object: `print(result.title)` or
+`print(result["title"])`
 
 ```python
 from nosible import Nosible
@@ -169,9 +189,44 @@ with Nosible(
 print([r.title for r in results])
 ```
 
+#### Expansions
+
+**Prompt expansions** are questions **lexically** and **semantically similar** to your main question. Expansions are added alongside your search query to improve your search results. You can add up to 10 expansions per search.
+
+- You can add your **own expansions** by passing a list of strings to the `expansions` parameter.
+- You can also have your expansions automatically generated by setting `autogenerate_expansions` to `True` when running the search.
+- For expansions to be generated, you will need the `LLM_API_KEY` to be set in the environment or passed to the `Nosible` constructor.
+- By default, we use OpenRouter as an endpoint. However, **we support any OpenAI-compatible endpoint**. If you
+  want to use a different endpoint, follow [this](https://nosible-py.readthedocs.io/en/latest/configuration.html#change-llm-base-url) guide in the docs.
+- You can change this model with the argument **expansions_model**.
+
+```python
+# Example of using your own expansions
+with Nosible() as nos:
+    results = nos.search(
+        question="How have the Trump tariffs impacted the US economy?",
+        expansions=[
+            "What are the consequences of Trump's 2018 steel and aluminum tariffs on American manufacturers?",
+            "How did Donald Trump's tariffs on Chinese imports influence US import prices and inflation?",
+            "What impact did the Section 232 tariffs under President Trump have on US agricultural exports?",
+            "In what ways have Trump's trade duties affected employment levels in the US automotive sector?",
+            "How have the tariffs imposed by the Trump administration altered American consumer goods pricing nationwide?",
+            "What economic outcomes resulted from President Trump's protective tariffs for the United States economy?",
+            "How did Trump's solar panel tariffs change investment trends in the US energy market?",
+            "What have been the financial effects of Trump's Section 301 tariffs on Chinese electronics imports?",
+            "How did Trump's trade barriers influence GDP growth and trade deficits in the United States?",
+            "In what manner did Donald Trump's import taxes reshape competitiveness of US steel producers globally?",
+        ],
+        n_results=10,
+    )
+
+    print(results)
+```
+
 
 #### Parallel Searches
 
-Run multiple queries concurrently:
+Allows you to run multiple searches concurrently, yielding results as they come in.
+- You can pass a list of questions to the `searches` method.
 
 ```python
 from nosible import Nosible
@@ -190,7 +245,12 @@ with Nosible(nosible_api_key="basic|abcd1234...", llm_api_key="sk-...") as clien
 
 #### Bulk Search
 
-Fetch thousands of results for offline analysis:
+Bulk search enables you to retrieve a large number of results in a single request, making it ideal for large-scale data analysis and processing.
+
+- Use the `bulk_search` method when you need more than 1,000 results for a single query.
+- You can request between **1,000 and 10,000** results per query.
+- All parameters available in the standard `search` method—such as `expansions`, `include_companies`, `include_languages`, and more—are also supported in `bulk_search`.
+- A bulk search for 10,000 results typically completes in about 30 seconds or less.
 
 ```python
 from nosible import Nosible
@@ -244,9 +304,14 @@ with Nosible(nosible_api_key="basic|abcd1234...") as client:
 print([r for r in results])
 ```
 
-#### Sentiment Analysis
+#### Sentiment
 
-Compute sentiment for a single result (uses GPT-4o; requires an LLM API key):
+This fetches a sentiment score for each search result.
+- The sentiment score is a float between `-1` and `1`, where `-1` is **negative**, `0` is **neutral**, and `1` is **positive**.
+- The sentiment model can be changed by passing the `sentiment_model` parameter to the `Nosible` constructor.
+- The `sentiment_model` defaults to "openai/gpt-4o", which is a powerful model for sentiment analysis.
+- You can also change the base URL for the LLM API by passing the `openai_base_url` parameter to the `Nosible` constructor.
+- The `openai_base_url` defaults to OpenRouter's API endpoint.
 
 ```python
 from nosible import Nosible
@@ -36,13 +36,15 @@ uv pip install nosible
 **Requirements**:
 
 * Python 3.9+
-* requests
 * polars
-* cryptography
-* tenacity
-* pyrate-limiter
-* tantivy
+* duckdb
 * openai
+* tantivy
+* pyrate-limiter
+* tenacity
+* cryptography
+* pyarrow
+* pandas
 
 ### 🔑 Authentication
 
@@ -96,9 +98,28 @@ os.environ["LLM_API_KEY"] = "sk-..."
 
 ### 🚀 Examples
 
-#### Fast Search
+#### Search
+
+The `search` and `searches` functions enable you to retrieve **up to 100** results for a single query. This is ideal for most use cases where you need to retrieve information quickly and efficiently.
+
+- Use the `search` method when you need between **10 and 100** results for a single query.
+- The same applies to the `searches` and `.similar()` methods.
 
-Retrieve up to 100 results with optional filters:
+- A search will return a set of `Result` objects.
+- The `Result` object represents a single search result and provides methods to access the result's properties.
+- `url`: The URL of the search result.
+- `title`: The title of the search result.
+- `description`: A brief description or summary of the search result.
+- `netloc`: The network location (domain) of the URL.
+- `published`: The publication date of the search result.
+- `visited`: The date and time when the result was visited.
+- `author`: The author of the content.
+- `content`: The main content or body of the search result.
+- `language`: The language code of the content (e.g., 'en' for English).
+- `similarity`: Similarity score with respect to a query or reference.
+
+These can be accessed directly from the `Result` object: `print(result.title)` or
+`print(result["title"])`
 
 ```python
 from nosible import Nosible
@@ -125,9 +146,44 @@ with Nosible(
 print([r.title for r in results])
 ```
 
+#### Expansions
+
+**Prompt expansions** are questions **lexically** and **semantically similar** to your main question. Expansions are added alongside your search query to improve your search results. You can add up to 10 expansions per search.
+
+- You can add your **own expansions** by passing a list of strings to the `expansions` parameter.
+- You can also have your expansions automatically generated by setting `autogenerate_expansions` to `True` when running the search.
+- For expansions to be generated, you will need the `LLM_API_KEY` to be set in the environment or passed to the `Nosible` constructor.
+- By default, we use OpenRouter as an endpoint. However, **we support any OpenAI-compatible endpoint**. If you
+  want to use a different endpoint, follow [this](https://nosible-py.readthedocs.io/en/latest/configuration.html#change-llm-base-url) guide in the docs.
+- You can change this model with the argument **expansions_model**.
+
+```python
+# Example of using your own expansions
+with Nosible() as nos:
+    results = nos.search(
+        question="How have the Trump tariffs impacted the US economy?",
+        expansions=[
+            "What are the consequences of Trump's 2018 steel and aluminum tariffs on American manufacturers?",
+            "How did Donald Trump's tariffs on Chinese imports influence US import prices and inflation?",
+            "What impact did the Section 232 tariffs under President Trump have on US agricultural exports?",
+            "In what ways have Trump's trade duties affected employment levels in the US automotive sector?",
+            "How have the tariffs imposed by the Trump administration altered American consumer goods pricing nationwide?",
+            "What economic outcomes resulted from President Trump's protective tariffs for the United States economy?",
+            "How did Trump's solar panel tariffs change investment trends in the US energy market?",
+            "What have been the financial effects of Trump's Section 301 tariffs on Chinese electronics imports?",
+            "How did Trump's trade barriers influence GDP growth and trade deficits in the United States?",
+            "In what manner did Donald Trump's import taxes reshape competitiveness of US steel producers globally?",
+        ],
+        n_results=10,
+    )
+
+    print(results)
+```
+
 
 #### Parallel Searches
 
-Run multiple queries concurrently:
+Allows you to run multiple searches concurrently, yielding results as they come in.
+- You can pass a list of questions to the `searches` method.
 
 ```python
 from nosible import Nosible
@@ -146,7 +202,12 @@ with Nosible(nosible_api_key="basic|abcd1234...", llm_api_key="sk-...") as clien
 
 #### Bulk Search
 
-Fetch thousands of results for offline analysis:
+Bulk search enables you to retrieve a large number of results in a single request, making it ideal for large-scale data analysis and processing.
+
+- Use the `bulk_search` method when you need more than 1,000 results for a single query.
+- You can request between **1,000 and 10,000** results per query.
+- All parameters available in the standard `search` method—such as `expansions`, `include_companies`, `include_languages`, and more—are also supported in `bulk_search`.
+- A bulk search for 10,000 results typically completes in about 30 seconds or less.
 
 ```python
 from nosible import Nosible
@@ -200,9 +261,14 @@ with Nosible(nosible_api_key="basic|abcd1234...") as client:
 print([r for r in results])
 ```
 
-#### Sentiment Analysis
+#### Sentiment
 
-Compute sentiment for a single result (uses GPT-4o; requires an LLM API key):
+This fetches a sentiment score for each search result.
+- The sentiment score is a float between `-1` and `1`, where `-1` is **negative**, `0` is **neutral**, and `1` is **positive**.
+- The sentiment model can be changed by passing the `sentiment_model` parameter to the `Nosible` constructor.
+- The `sentiment_model` defaults to "openai/gpt-4o", which is a powerful model for sentiment analysis.
+- You can also change the base URL for the LLM API by passing the `openai_base_url` parameter to the `Nosible` constructor.
+- The `openai_base_url` defaults to OpenRouter's API endpoint.
 
 ```python
 from nosible import Nosible
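The README above notes that `Result` fields can be read either as attributes (`result.title`) or with dictionary-style indexing (`result["title"]`). A minimal stand-in class (not the package's real `Result`, just an illustration of that dual-access pattern) shows how little machinery it takes:

```python
from dataclasses import dataclass, asdict

@dataclass
class MiniResult:
    # Illustrative stand-in for a few of the Result fields listed above.
    url: str = None
    title: str = None
    similarity: float = None

    def __getitem__(self, key):
        # Dict-style access falls back to the dataclass fields.
        return asdict(self)[key]

r = MiniResult(url="https://example.com", title="Example Domain", similarity=0.99)
print(r.title)      # Example Domain
print(r["title"])   # Example Domain
```

Both access styles return the same value, so downstream code can treat results either as objects or as records.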
@@ -1,6 +1,6 @@
 [project]
 name = "nosible"
-version = "0.2.1"
+version = "0.2.3"
 description = "Python client for the NOSIBLE Search API"
 readme = { file = "README.md", content-type = "text/markdown" }
 requires-python = ">=3.9"
@@ -12,7 +12,6 @@ authors = [
 ]
 
 dependencies = [
-    "requests",
     "polars",
     "duckdb",
     "openai",
@@ -20,8 +19,8 @@ dependencies = [
     "pyrate-limiter",
     "tenacity",
     "cryptography",
-    "pandas",
     "pyarrow",
+    "pandas",
 ]
 
 license = "MIT"
@@ -60,7 +59,7 @@ where = ["src"]
 dev-dependencies = [
     "pytest",
     "pytest-doctestplus",
-    "requests-cache",
     "pytest-xdist",
-    "urllib3==1.26.15"
+    "urllib3==1.26.15",
+    "hishel",
 ]
@@ -3,9 +3,8 @@ from __future__ import annotations
 from dataclasses import asdict, dataclass
 from typing import TYPE_CHECKING
 
-from openai import OpenAI
-
 from nosible.classes.web_page import WebPageData
+from nosible.utils.json_tools import print_dict
 
 if TYPE_CHECKING:
     from nosible.classes.result_set import ResultSet
@@ -102,11 +101,21 @@ class Result:
         0.99 | Example Domain
         >>> result = Result(title=None, similarity=None)
         >>> print(str(result))
-        N/A | No Title
+        {
+            "url": null,
+            "title": null,
+            "description": null,
+            "netloc": null,
+            "published": null,
+            "visited": null,
+            "author": null,
+            "content": null,
+            "language": null,
+            "similarity": null,
+            "url_hash": null
+        }
         """
-        similarity = f"{self.similarity:.2f}" if self.similarity is not None else "N/A"
-        title = self.title or "No Title"
-        return f"{similarity:>6} | {title}"
+        return print_dict(self.to_dict())
 
     def __getitem__(self, key: str) -> str | float | bool | None:
         """
@@ -295,12 +304,12 @@ class Result:
 
         The response must be a float in [-1.0, 1.0]. No other text must be returned.
         """
-
+        from openai import OpenAI
         llm_client = OpenAI(base_url="https://openrouter.ai/api/v1", api_key=client.llm_api_key)
 
         # Call the chat completions endpoint.
         resp = llm_client.chat.completions.create(
-            model="openai/gpt-4o", messages=[{"role": "user", "content": prompt.strip()}], temperature=0.7
+            model=client.sentiment_model, messages=[{"role": "user", "content": prompt.strip()}], temperature=0.7
         )
 
         raw = resp.choices[0].message.content
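The sentiment prompt in the hunk above requires the model to return a bare float in [-1.0, 1.0]. A defensive parse of that raw reply might look like the helper below (a hypothetical illustration, not part of the nosible package: it extracts the first number from the reply and clamps it to the documented range):

```python
import re

def parse_sentiment(raw: str) -> float:
    """Extract the first float from an LLM reply and clamp it to [-1.0, 1.0]."""
    match = re.search(r"-?\d+(?:\.\d+)?", raw)
    if match is None:
        raise ValueError(f"No numeric sentiment in reply: {raw!r}")
    # Clamp, since models occasionally ignore range instructions.
    return max(-1.0, min(1.0, float(match.group())))

print(parse_sentiment("0.85"))           # 0.85
print(parse_sentiment("Sentiment: -2"))  # -1.0
```

Clamping rather than rejecting out-of-range replies trades strictness for robustness; either choice is defensible.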
@@ -2,15 +2,15 @@ from __future__ import annotations
 
 from collections.abc import Iterator
 from dataclasses import dataclass, field
-
-import duckdb
-import pandas as pd
-import polars as pl
-from tantivy import Document, Index, SchemaBuilder
+from typing import TYPE_CHECKING
 
 from nosible.classes.result import Result
 from nosible.utils.json_tools import json_dumps, json_loads
 
+if TYPE_CHECKING:
+    import pandas as pd
+    import polars as pl
+
 
 @dataclass(frozen=True)
 class ResultSet(Iterator[Result]):
@@ -182,28 +182,34 @@
         # Setup if required
         return self
 
-    def __getitem__(self, key: int) -> Result:
+    def __getitem__(self, key: int | slice) -> Result | ResultSet:
         """
-        Get a Result by index.
+        Get a Result by index or a list of Results by slice.
 
         Parameters
         ----------
-        key : int
-            Index of the result to retrieve.
+        key : int or slice
+            Index or slice of the result(s) to retrieve.
 
         Returns
         -------
-        Result
-            The Result at the specified index.
+        Result or ResultSet
+            A single Result if `key` is an integer, or a ResultSet containing the sliced results if `key` is a slice.
 
         Raises
         ------
         IndexError
             If index is out of range.
+        TypeError
+            If key is not an integer or slice.
         """
-        if 0 <= key < len(self.results):
-            return self.results[key]
-        raise IndexError(f"Index {key} out of range for ResultSet with length {len(self.results)}.")
+        if isinstance(key, int):
+            if 0 <= key < len(self.results):
+                return self.results[key]
+            raise IndexError(f"Index {key} out of range for ResultSet with length {len(self.results)}.")
+        if isinstance(key, slice):
+            return ResultSet(self.results[key])
+        raise TypeError("ResultSet indices must be integers or slices.")
 
     def __add__(self, other: ResultSet | Result) -> ResultSet:
         """
@@ -316,6 +322,8 @@
         Document returned
         Document returned
         """
+        from tantivy import Document, Index, SchemaBuilder
+
         # Build the Tantivy schema
         schema_builder = SchemaBuilder()
         # Int for doc retrieval.
@@ -439,6 +447,9 @@
         Traceback (most recent call last):
         ValueError: Cannot analyze by 'foobar' - not a valid field.
         """
+        import pandas as pd
+        import polars as pl
+
         # Convert to Polars DataFrame
         df: pl.DataFrame = self.to_polars()
 
@@ -571,6 +582,10 @@
         >>> "url" in df.columns
         True
         """
+        # Lazy import for runtime, but allow static type checking
+
+        import polars as pl
+
         return pl.DataFrame(self.to_dicts())
 
     def to_pandas(self) -> pd.DataFrame:
@@ -911,7 +926,7 @@
         import duckdb
 
         # Convert to Polars DataFrame and then to Arrow Table
-        df = self.to_polars()
+        df = self.to_polars()  # noqa: F841
         # Connect to DuckDB and write the Arrow Table to a table
         con = duckdb.connect(out)
         # Write the DataFrame to the specified table name, replacing if exists
@@ -964,6 +979,8 @@
         >>> results[0].title
         'Example Domain'
         """
+        import polars as pl
+
         try:
             df = pl.read_csv(file_path)
         except Exception as e:
@@ -1124,6 +1141,8 @@
         >>> print(len(df))
         1
         """
+        import polars as pl
+
         pl_df = pl.from_pandas(df)
         return cls.from_polars(pl_df)
 
@@ -1239,6 +1258,8 @@
         >>> results[0].title
         'Example Domain'
         """
+        import polars as pl
+
         try:
             df = pl.read_parquet(file_path)
         except Exception as e:
@@ -1288,6 +1309,8 @@
         >>> results[0].title
         'Example Domain'
         """
+        import polars as pl
+
         try:
             df = pl.read_ipc(file_path)
         except Exception as e:
@@ -1340,7 +1363,11 @@
         >>> loaded[0].title
         'Example Domain'
         """
+        import polars as pl
+
         try:
+            import duckdb
+
             con = duckdb.connect(file_path, read_only=True)
         except Exception as e:
             raise RuntimeError(f"Failed to connect to DuckDB file '{file_path}': {e}") from e
@@ -1492,10 +1519,3 @@
         """
         # TODO: cleanup handles, sessions, etc.
         pass
-
-
-if __name__ == "__main__":
-    import doctest
-
-    doctest.testmod(optionflags=doctest.ELLIPSIS | doctest.NORMALIZE_WHITESPACE)
-    print("All tests passed!")
@@ -2,21 +2,17 @@ import gzip
 import json
 import logging
 import os
+import re
 import sys
 import textwrap
 import time
 import types
-import typing
 from collections.abc import Iterator
 from concurrent.futures import ThreadPoolExecutor
 from datetime import datetime
-from typing import Union, Optional
+from typing import Optional, Union
 
-import polars as pl
-import requests
-from cryptography.fernet import Fernet
-from openai import OpenAI
-from polars import SQLContext
+import httpx
 from tenacity import (
     before_sleep_log,
     retry,
@@ -32,7 +28,6 @@ from nosible.classes.search_set import SearchSet
 from nosible.classes.snippet_set import SnippetSet
 from nosible.classes.web_page import WebPageData
 from nosible.utils.json_tools import json_loads
-from nosible.utils.question_builder import _get_question
 from nosible.utils.rate_limiter import PLAN_RATE_LIMITS, RateLimiter, _rate_limited
 
 # Set up a module‐level logger.
@@ -56,6 +51,8 @@ class Nosible:
         Base URL for the OpenAI-compatible LLM API. (default is OpenRouter's API endpoint)
     sentiment_model : str, optional
         Model to use for sentiment analysis (default is "openai/gpt-4o").
+    expansions_model : str, optional
+        Model to use for expansions (default is "openai/gpt-4o").
     timeout : int
         Request timeout for HTTP calls.
     retries : int,
@@ -94,7 +91,8 @@
     - The `nosible_api_key` is required to access the Nosible Search API.
     - The `llm_api_key` is optional and used for LLM-based query expansions.
     - The `openai_base_url` defaults to OpenRouter's API endpoint.
-    - The `sentiment_model` is used for generating query expansions and sentiment analysis.
+    - The `sentiment_model` is used for sentiment analysis.
+    - The `expansions_model` is used for generating query expansions.
    - The `timeout`, `retries`, and `concurrency` parameters control the behavior of HTTP requests.
 
     Examples
@@ -106,10 +104,11 @@
 
     def __init__(
         self,
-        nosible_api_key: str = None,
-        llm_api_key: str = None,
+        nosible_api_key: Optional[str] = None,
+        llm_api_key: Optional[str] = None,
         openai_base_url: str = "https://openrouter.ai/api/v1",
         sentiment_model: str = "openai/gpt-4o",
+        expansions_model: str = "openai/gpt-4o",
         timeout: int = 30,
         retries: int = 5,
         concurrency: int = 10,
@@ -142,6 +141,7 @@
         self.llm_api_key = llm_api_key or os.getenv("LLM_API_KEY")
         self.openai_base_url = openai_base_url
         self.sentiment_model = sentiment_model
+        self.expansions_model = expansions_model
         # Network parameters
         self.timeout = timeout
         self.retries = retries
@@ -162,7 +162,7 @@
             reraise=True,
             stop=stop_after_attempt(self.retries) | stop_after_delay(self.timeout),
             wait=wait_exponential(multiplier=1, min=1, max=10),
-            retry=retry_if_exception_type(requests.exceptions.RequestException),
+            retry=retry_if_exception_type(httpx.RequestError),
             before_sleep=before_sleep_log(self.logger, logging.WARNING),
         )(self._post)
 
@@ -171,12 +171,12 @@
             reraise=True,
             stop=stop_after_attempt(self.retries) | stop_after_delay(self.timeout),
             wait=wait_exponential(multiplier=1, min=1, max=10),
-            retry=retry_if_exception_type(requests.exceptions.RequestException),
+            retry=retry_if_exception_type(httpx.RequestError),
             before_sleep=before_sleep_log(self.logger, logging.WARNING),
         )(self._generate_expansions)
 
         # Thread pool for parallel searches
-        self._session = requests.Session()
+        self._session = httpx.Client(follow_redirects=True)
         self._executor = ThreadPoolExecutor(max_workers=self.concurrency)
 
         # Headers
@@ -201,7 +201,6 @@
 
     def search(
         self,
-        *,
         search: Search = None,
         question: str = None,
         expansions: list[str] = None,
@@ -873,6 +872,8 @@
         ...
         ValueError: Bulk search cannot have more than 10000 results per query.
         """
+        from cryptography.fernet import Fernet
+
         previous_level = self.logger.level
         if verbose:
             self.logger.setLevel(logging.INFO)
@@ -981,7 +982,7 @@
         resp = self._post(url="https://www.nosible.ai/search/v1/slow-search", payload=payload)
         try:
             resp.raise_for_status()
-        except requests.HTTPError as e:
+        except httpx.HTTPStatusError as e:
             raise ValueError(f"[{question!r}] HTTP {resp.status_code}: {resp.text}") from e
 
         data = resp.json()
@@ -993,7 +994,7 @@
         decrypt_using = data.get("decrypt_using")
         for _ in range(100):
             dl = self._session.get(download_from, timeout=self.timeout)
-            if dl.ok:
+            if dl.status_code == 200:
                 fernet = Fernet(decrypt_using.encode())
                 decrypted = fernet.decrypt(dl.content)
                 decompressed = gzip.decompress(decrypted)
@@ -1053,7 +1054,7 @@
         ... ans = nos.answer(
         ...     query="How is research governance and decision-making structured between Google and DeepMind?",
         ...     n_results=100,
-        ...     show_context=True
+        ...     show_context=True,
         ... )  # doctest: +ELLIPSIS +NORMALIZE_WHITESPACE
         <BLANKLINE>
         Doc 1
@@ -1067,11 +1068,7 @@
             raise ValueError("An LLM API key is required for answer().")
 
         # Retrieve top documents
-        results = self.search(
-            question=query,
-            n_results=n_results,
-            min_similarity=min_similarity,
-        )
+        results = self.search(question=query, n_results=n_results, min_similarity=min_similarity)
 
         # Build RAG context
         context = ""
@@ -1090,7 +1087,7 @@
             print(textwrap.dedent(context))
 
         # Craft prompt
-        prompt = (f"""
+        prompt = f"""
         # TASK DESCRIPTION
 
         You are a helpful assistant. Use the following context to answer the question.
@@ -1102,15 +1099,12 @@
         ## Context
         {context}
         """
-        )
+        from openai import OpenAI
 
         # Call LLM
         client = OpenAI(base_url=self.openai_base_url, api_key=self.llm_api_key)
         try:
-            response = client.chat.completions.create(
-                model = model,
-                messages = [{"role": "user", "content": prompt}],
-            )
+            response = client.chat.completions.create(model=model, messages=[{"role": "user", "content": prompt}])
         except Exception as e:
             raise RuntimeError(f"LLM API error: {e}") from e
 
@@ -1123,13 +1117,7 @@
         return "Answer:\n" + response.choices[0].message.content.strip()
 
     @_rate_limited("visit")
-    def visit(
-        self,
-        html: str = "",
-        recrawl: bool = False,
-        render: bool = False,
-        url: str = None
-    ) -> WebPageData:
+    def visit(self, html: str = "", recrawl: bool = False, render: bool = False, url: str = None) -> WebPageData:
         """
         Visit a given URL and return a structured WebPageData object for the page.
 
@@ -1262,10 +1250,7 @@
             payload["sql_filter"] = "SELECT loc, published FROM engine"
 
         # Send the POST to the /trend endpoint
-        response = self._post(
-            url="https://www.nosible.ai/search/v1/trend",
-            payload=payload,
-        )
+        response = self._post(url="https://www.nosible.ai/search/v1/trend", payload=payload)
         # Will raise ValueError on rate-limit or auth errors
         response.raise_for_status()
         payload = response.json().get("response", {})
@@ -1365,7 +1350,7 @@
             return False
             # If we reach here, the response is unexpected
             return False
-        except requests.HTTPError:
+        except httpx.HTTPError:
             return False
         except:
             return False
@@ -1460,7 +1445,7 @@
         out = [
             "Below are the rate limits for all NOSIBLE plans.",
             "To upgrade your package, visit https://www.nosible.ai/products.\n",
-            "Unless otherwise indicated, bulk searches are limited to one-at-a-time per API key.\n"
+            "Unless otherwise indicated, bulk searches are limited to one-at-a-time per API key.\n",
         ]
 
         user_plan = self._get_user_plan()
@@ -1521,7 +1506,7 @@
         except Exception:
             pass
 
-    def _post(self, url: str, payload: dict, headers: dict = None, timeout: int = None) -> requests.Response:
+    def _post(self, url: str, payload: dict, headers: dict = None, timeout: int = None) -> httpx.Response:
         """
         Internal helper to send a POST request with retry logic.
 
@@ -1553,7 +1538,7 @@
 
         Returns
         -------
-        requests.Response
+        httpx.Response
             The HTTP response object.
         """
         response = self._session.post(
@@ -1561,18 +1546
  json=payload,
1562
1547
  headers=headers if headers is not None else self.headers,
1563
1548
  timeout=timeout if timeout is not None else self.timeout,
1549
+ follow_redirects=True,
1564
1550
  )
1565
1551
 
1566
1552
  # If unauthorized, or if the payload is string too short, treat as invalid API key
1567
1553
  if response.status_code == 401:
1568
1554
  raise ValueError("Your API key is not valid.")
1569
1555
  if response.status_code == 422:
1570
- # Only inspect JSON if it’s a JSON response
1571
1556
  content_type = response.headers.get("Content-Type", "")
1572
1557
  if content_type.startswith("application/json"):
1573
1558
  body = response.json()
1574
1559
  if isinstance(body, list):
1575
- body = body[0] # NOSIBLE returns a list of errors
1560
+ body = body[0]
1576
1561
  print(body)
1577
1562
  if body.get("type") == "string_too_short":
1578
1563
  raise ValueError("Your API key is not valid: Too Short.")
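The 401/422 handling in this hunk can be exercised in isolation with a stub response object. This is a hypothetical sketch: `check_auth` and `FakeResponse` are illustrative names, not part of the client.

```python
class FakeResponse:
    """Minimal stand-in for an httpx.Response (illustrative only)."""
    def __init__(self, status_code, json_body=None, content_type="application/json"):
        self.status_code = status_code
        self._json = json_body
        self.headers = {"Content-Type": content_type}

    def json(self):
        return self._json


def check_auth(response):
    # Mirrors the hunk above: 401 always means an invalid key; a 422 JSON
    # body may arrive as a list of errors, in which case the first entry
    # is the one inspected.
    if response.status_code == 401:
        raise ValueError("Your API key is not valid.")
    if response.status_code == 422:
        if response.headers.get("Content-Type", "").startswith("application/json"):
            body = response.json()
            if isinstance(body, list):
                body = body[0]
            if body.get("type") == "string_too_short":
                raise ValueError("Your API key is not valid: Too Short.")
    return response


try:
    check_auth(FakeResponse(401))
except ValueError as e:
    print(e)  # Your API key is not valid.
```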
@@ -1711,12 +1696,14 @@ class Nosible:
  - Contextual Example: Swap "diabetes treatment" with "insulin therapy" or "blood sugar management".

  """.replace(" ", "")
+ # Lazy load
+ from openai import OpenAI

  client = OpenAI(base_url=self.openai_base_url, api_key=self.llm_api_key)

  # Call the chat completions endpoint.
  resp = client.chat.completions.create(
- model=self.sentiment_model, messages=[{"role": "user", "content": prompt.strip()}], temperature=0.7
+ model=self.expansions_model, messages=[{"role": "user", "content": prompt.strip()}], temperature=0.7
  )

  raw = resp.choices[0].message.content
@@ -1776,14 +1763,16 @@ class Nosible:
  ...
  ValueError: Invalid date for 'visited_start': '2023/12/31'. Expected ISO format 'YYYY-MM-DD'.
  """
+ dateregex = r"^\d{4}-\d{2}-\d{2}"
+
+ if not re.match(dateregex, string):
+ raise ValueError(f"Invalid date for '{name}': {string!r}. Expected ISO format 'YYYY-MM-DD'.")
+
  try:
  # datetime.fromisoformat accepts both YYYY-MM-DD and full timestamps
  parsed = datetime.fromisoformat(string)
  except Exception:
- raise ValueError(
- f"Invalid date for '{name}': {string!r}. "
- "Expected ISO format 'YYYY-MM-DD'."
- )
+ raise ValueError(f"Invalid date for '{name}': {string!r}. Expected ISO format 'YYYY-MM-DD'.")

  def _format_sql(
  self,
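The validation added in the hunk above (a regex pre-check followed by `datetime.fromisoformat`) can be sketched standalone. `verify_date` below is a hypothetical mirror of the private helper, not the client's actual code:

```python
import re
from datetime import datetime


def verify_date(name: str, string: str) -> None:
    # Reject anything that does not start with YYYY-MM-DD, then let
    # fromisoformat catch impossible dates such as 2023-02-30.
    if not re.match(r"^\d{4}-\d{2}-\d{2}", string):
        raise ValueError(f"Invalid date for {name!r}: {string!r}. Expected ISO format 'YYYY-MM-DD'.")
    try:
        datetime.fromisoformat(string)
    except ValueError:
        raise ValueError(f"Invalid date for {name!r}: {string!r}. Expected ISO format 'YYYY-MM-DD'.")


verify_date("published_start", "2023-12-31")  # passes silently
```

The regex alone would accept a date like `2023-02-30`, which is why the `fromisoformat` round-trip is still needed.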
@@ -1996,9 +1985,11 @@ class Nosible:
  "company_3",
  "doc_hash",
  ]
+ import polars as pl # Lazy import
+
  # Create a dummy DataFrame with correct columns and no rows
  df = pl.DataFrame({col: [] for col in columns})
- ctx = SQLContext()
+ ctx = pl.SQLContext()
  ctx.register("engine", df)
  try:
  ctx.execute(sql)
@@ -2019,10 +2010,10 @@ class Nosible:

  def __exit__(
  self,
- _exc_type: typing.Optional[type[BaseException]],
- _exc_val: typing.Optional[BaseException],
- _exc_tb: typing.Optional[types.TracebackType],
- ) -> typing.Optional[bool]:
+ _exc_type: Optional[type[BaseException]],
+ _exc_val: Optional[BaseException],
+ _exc_tb: Optional[types.TracebackType],
+ ) -> Optional[bool]:
  """
  Always clean up (self.close()), but let exceptions propagate.
  Return True only if you really want to suppress an exception.
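The `__exit__` signature simplified above follows the standard context-manager protocol. A minimal sketch of the same pattern, with a hypothetical `MiniClient` standing in for the real class:

```python
import types
from typing import Optional


class MiniClient:
    def __init__(self):
        self.closed = False

    def close(self) -> None:
        self.closed = True

    def __enter__(self) -> "MiniClient":
        return self

    def __exit__(
        self,
        _exc_type: Optional[type],
        _exc_val: Optional[BaseException],
        _exc_tb: Optional[types.TracebackType],
    ) -> Optional[bool]:
        # Always clean up, but return None so exceptions propagate.
        self.close()
        return None


with MiniClient() as c:
    pass
print(c.closed)  # True
```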
@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: nosible
- Version: 0.2.1
+ Version: 0.2.3
  Summary: Python client for the NOSIBLE Search API
  Home-page: https://github.com/NosibleAI/nosible
  Author: Stuart Reid, Matthew Dicks, Richard Taylor, Gareth Warburton
@@ -27,7 +27,6 @@ Classifier: Operating System :: OS Independent
  Requires-Python: >=3.9
  Description-Content-Type: text/markdown
  License-File: LICENSE
- Requires-Dist: requests
  Requires-Dist: polars
  Requires-Dist: duckdb
  Requires-Dist: openai
@@ -35,8 +34,8 @@ Requires-Dist: tantivy
  Requires-Dist: pyrate-limiter
  Requires-Dist: tenacity
  Requires-Dist: cryptography
- Requires-Dist: pandas
  Requires-Dist: pyarrow
+ Requires-Dist: pandas
  Dynamic: author
  Dynamic: home-page
  Dynamic: license-file
@@ -80,13 +79,15 @@ uv pip install nosible
  **Requirements**:

  * Python 3.9+
- * requests
  * polars
- * cryptography
- * tenacity
- * pyrate-limiter
- * tantivy
+ * duckdb
  * openai
+ * tantivy
+ * pyrate-limiter
+ * tenacity
+ * cryptography
+ * pyarrow
+ * pandas

  ### 🔑 Authentication

@@ -140,9 +141,28 @@ os.environ["LLM_API_KEY"] = "sk-..."

  ### 🚀 Examples

- #### Fast Search
+ #### Search
+
+ The `search` and `searches` functions enable you to retrieve **up to 100** results for a single query. This is ideal for most use cases where you need to retrieve information quickly and efficiently.
+
+ - Use the `search` method when you need between **10 and 100** results for a single query.
+ - The same applies to the `searches` and `.similar()` methods.

- Retrieve up to 100 results with optional filters:
+ - A search will return a set of `Result` objects.
+ - The `Result` object represents a single search result and provides methods to access the result's properties.
+ - `url`: The URL of the search result.
+ - `title`: The title of the search result.
+ - `description`: A brief description or summary of the search result.
+ - `netloc`: The network location (domain) of the URL.
+ - `published`: The publication date of the search result.
+ - `visited`: The date and time when the result was visited.
+ - `author`: The author of the content.
+ - `content`: The main content or body of the search result.
+ - `language`: The language code of the content (e.g., 'en' for English).
+ - `similarity`: Similarity score with respect to a query or reference.
+
+ They can be accessed directly from the `Result` object: `print(result.title)` or
+ `print(result["title"])`
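The dual attribute/key access described above can be illustrated with a tiny stand-in class. This is a hypothetical sketch (`MiniResult` is not the real `Result` implementation, which carries more fields and logic):

```python
class MiniResult:
    """Toy stand-in showing attribute- and key-style access (not the real Result)."""
    def __init__(self, **fields):
        self.__dict__.update(fields)

    def __getitem__(self, key):
        # Key access simply delegates to the attribute namespace.
        return self.__dict__[key]


r = MiniResult(title="Example headline", url="https://example.com", similarity=0.87)
print(r.title)     # Example headline
print(r["title"])  # Example headline
```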

  ```python
  from nosible import Nosible
@@ -169,9 +189,44 @@ with Nosible(
  print([r.title for r in results])
  ```

+ #### Expansions
+
+ **Prompt expansions** are questions **lexically** and **semantically similar** to your main question. Expansions are added alongside your search query to improve your search results. You can add up to 10 expansions per search.
+
+ - You can add your **own expansions** by passing a list of strings to the `expansions` parameter.
+ - You can also have expansions generated automatically by setting `autogenerate_expansions` to `True` when running the search.
+ - For expansions to be generated, you will need the `LLM_API_KEY` to be set in the environment or passed to the `Nosible` constructor.
+ - By default, we use OpenRouter as the endpoint. However, **we support any OpenAI-compatible endpoint**. If you
+ want to use a different endpoint, follow [this](https://nosible-py.readthedocs.io/en/latest/configuration.html#change-llm-base-url) guide in the docs.
+ - You can change the model used with the **expansions_model** argument.
+
+ ```python
+ # Example of using your own expansions
+ with Nosible() as nos:
+ results = nos.search(
+ question="How have the Trump tariffs impacted the US economy?",
+ expansions=[
+ "What are the consequences of Trump's 2018 steel and aluminum tariffs on American manufacturers?",
+ "How did Donald Trump's tariffs on Chinese imports influence US import prices and inflation?",
+ "What impact did the Section 232 tariffs under President Trump have on US agricultural exports?",
+ "In what ways have Trump's trade duties affected employment levels in the US automotive sector?",
+ "How have the tariffs imposed by the Trump administration altered American consumer goods pricing nationwide?",
+ "What economic outcomes resulted from President Trump's protective tariffs for the United States economy?",
+ "How did Trump's solar panel tariffs change investment trends in the US energy market?",
+ "What have been the financial effects of Trump's Section 301 tariffs on Chinese electronics imports?",
+ "How did Trump's trade barriers influence GDP growth and trade deficits in the United States?",
+ "In what manner did Donald Trump's import taxes reshape competitiveness of US steel producers globally?",
+ ],
+ n_results=10,
+ )
+
+ print(results)
+ ```
+
  #### Parallel Searches

- Run multiple queries concurrently:
+ Runs multiple searches concurrently and yields the results as they come in.
+ - You can pass a list of questions to the `searches` method.

  ```python
  from nosible import Nosible
@@ -190,7 +245,12 @@ with Nosible(nosible_api_key="basic|abcd1234...", llm_api_key="sk-...") as clien

  #### Bulk Search

- Fetch thousands of results for offline analysis:
+ Bulk search enables you to retrieve a large number of results in a single request, making it ideal for large-scale data analysis and processing.
+
+ - Use the `bulk_search` method when you need more than 1,000 results for a single query.
+ - You can request between **1,000 and 10,000** results per query.
+ - All parameters available in the standard `search` method—such as `expansions`, `include_companies`, `include_languages`, and more—are also supported in `bulk_search`.
+ - A bulk search for 10,000 results typically completes in about 30 seconds or less.

  ```python
  from nosible import Nosible
@@ -244,9 +304,14 @@ with Nosible(nosible_api_key="basic|abcd1234...") as client:
  print([r for r in results])
  ```

- #### Sentiment Analysis
+ #### Sentiment

- Compute sentiment for a single result (uses GPT-4o; requires an LLM API key):
+ This fetches a sentiment score for each search result.
+ - The sentiment score is a float between `-1` and `1`, where `-1` is **negative**, `0` is **neutral**, and `1` is **positive**.
+ - The sentiment model can be changed by passing the `sentiment_model` parameter to the `Nosible` constructor.
+ - The `sentiment_model` defaults to "openai/gpt-4o", which is a powerful model for sentiment analysis.
+ - You can also change the base URL for the LLM API by passing the `openai_base_url` parameter to the `Nosible` constructor.
+ - The `openai_base_url` defaults to OpenRouter's API endpoint.
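The score range described above maps naturally onto labels. `label_sentiment` below is a hypothetical helper for interpreting the returned float; it is not part of the client, and the `threshold` cutoff is an arbitrary illustrative choice:

```python
def label_sentiment(score: float, threshold: float = 0.33) -> str:
    # Scores are floats in [-1, 1]; bucket them into three labels.
    if not -1.0 <= score <= 1.0:
        raise ValueError("Sentiment scores are expected to lie in [-1, 1].")
    if score <= -threshold:
        return "negative"
    if score >= threshold:
        return "positive"
    return "neutral"


print(label_sentiment(-0.8))  # negative
print(label_sentiment(0.1))   # neutral
```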

  ```python
  from nosible import Nosible
@@ -17,7 +17,6 @@ src/nosible/classes/snippet.py
  src/nosible/classes/snippet_set.py
  src/nosible/classes/web_page.py
  src/nosible/utils/json_tools.py
- src/nosible/utils/question_builder.py
  src/nosible/utils/rate_limiter.py
  tests/test_01_nosible.py
  tests/test_02_results.py
@@ -1,4 +1,3 @@
- requests
  polars
  duckdb
  openai
@@ -6,5 +5,5 @@ tantivy
  pyrate-limiter
  tenacity
  cryptography
- pandas
  pyarrow
+ pandas
@@ -1,6 +1,5 @@
- import pandas as pd
  import pytest
-
+ from polars.dependencies import pandas as pd
  from nosible import Result, ResultSet


@@ -127,3 +126,26 @@ def test_resultset_to_pandas(search_data):
  assert "netloc" in df.columns
  assert "published" in df.columns
  assert "similarity" in df.columns
+
+
+ def test_resultset_getitem(search_data):
+ """
+ Test the __getitem__ method of ResultSet.
+
+ This test checks if the ResultSet can be indexed with an integer or a slice,
+ and if it raises an IndexError for out-of-range indices.
+
+ Raises
+ ------
+ TypeError
+ If the key is not an integer or a slice.
+ IndexError
+ If the index is out of range.
+ """
+ assert isinstance(search_data[0], Result)
+ assert isinstance(search_data[1:3], ResultSet)
+
+ with pytest.raises(IndexError):
+ _ = search_data[len(search_data)] # Out of range index
+ with pytest.raises(TypeError):
+ _ = search_data["invalid"] # Invalid type for index
@@ -1,4 +1,3 @@
- import pandas as pd
  from nosible import Search, SearchSet
  import pytest

@@ -1,4 +1,3 @@
- import pandas as pd
  from nosible import Snippet, SnippetSet, WebPageData
  import pytest

@@ -1,131 +0,0 @@
- import random
-
- COMPANIES = [
- "Apple Inc.",
- "Microsoft Corporation",
- "Amazon.com, Inc.",
- "Alphabet Inc.",
- "Meta Platforms, Inc.",
- "Tesla, Inc.",
- "Berkshire Hathaway Inc.",
- "NVIDIA Corporation",
- "JPMorgan Chase & Co.",
- "Johnson & Johnson",
- "Walmart Inc.",
- "Visa Inc.",
- "Mastercard Incorporated",
- "Procter & Gamble Co.",
- "UnitedHealth Group Incorporated",
- "Bank of America Corporation",
- "Home Depot, Inc.",
- "Nestlé S.A.",
- "Samsung Electronics Co., Ltd.",
- "LVMH Moët Hennessy – Louis Vuitton",
- "ASML Holding N.V.",
- "Exxon Mobil Corporation",
- "Intel Corporation",
- "Pfizer Inc.",
- "The Coca-Cola Company",
- "PepsiCo, Inc.",
- "Chevron Corporation",
- "Merck & Co., Inc.",
- "Novartis International AG",
- "Toyota Motor Corporation",
- "Oracle Corporation",
- "Cisco Systems, Inc.",
- "Adobe Inc.",
- "Salesforce, Inc.",
- "Netflix, Inc.",
- "International Business Machines Corporation (IBM)",
- "The Walt Disney Company",
- "HSBC Holdings plc",
- "McDonald's Corporation",
- "Nike, Inc.",
- "Qualcomm Incorporated",
- "Roche Holding AG",
- "SAP SE",
- "Abbott Laboratories",
- "Costco Wholesale Corporation",
- "Broadcom Inc.",
- "Accenture plc",
- "Chevron Corporation",
- "Texas Instruments Incorporated",
- "Unilever PLC"
- ]
-
- THINGS_TO_KNOW = [
- "Company name and branding",
- "Founding date and history",
- "Founders and key executives",
- "Headquarters and global offices",
- "Mission, vision, and values",
- "Core products and services",
- "Business model and revenue streams",
- "Annual revenue and growth rate",
- "Profit margins (gross, operating, net)",
- "Market capitalization and valuation",
- "Key financial ratios (P/E, ROE, ROI)",
- "Stock price history and recent performance",
- "Major investors and shareholder structure",
- "Recent mergers, acquisitions, or divestitures",
- "R&D spending and innovation pipeline",
- "Competitive landscape and main rivals",
- "Market share by region or segment",
- "Customer segments and target markets",
- "Pricing strategy and positioning",
- "Supply chain structure and partners",
- "Distribution channels and logistics",
- "Marketing and advertising strategies",
- "Brand perception and reputation",
- "ESG (Environmental, Social, Governance) scores",
- "Sustainability initiatives and impact",
- "Corporate culture and employee count",
- "Employee satisfaction and turnover rates",
- "Leadership and governance practices",
- "Regulatory environment and compliance record",
- "Key risks and litigation history",
- "Recent news, press releases, and media coverage",
- "Patents, trademarks, and IP portfolio",
- "Digital transformation and tech stack",
- "Website traffic and social media metrics",
- "Mobile app usage and customer reviews",
- "Partnerships and strategic alliances",
- "Future outlook and analyst recommendations",
- ]
-
- JOB_TITLES = [
- "Chief Executive Officer (CEO)",
- "Chief Financial Officer (CFO)",
- "Chief Operating Officer (COO)",
- "Chief Technology Officer (CTO)",
- "Chief Marketing Officer (CMO)",
- "Head of Investor Relations",
- "Director of Corporate Strategy",
- "Business Development Manager",
- "Product Manager",
- "Marketing Manager",
- "Brand Manager",
- "Financial Analyst",
- "Equity Research Analyst",
- "Market Research Analyst",
- "Consultant",
- "Venture Capital Associate",
- "Private Equity Associate",
- "Operations Manager",
- "Supply Chain Manager",
- "Human Resources Manager",
- "Sustainability Officer",
- "Compliance Officer",
- "Legal Counsel",
- "Risk Manager",
- "Data Analyst",
- "IT Manager",
- "Sales Director",
- "Account Manager",
- "Customer Success Manager",
- "Public Relations Manager"
- ]
-
- def _get_question():
- return (f"I am a {random.choice(JOB_TITLES)} and I want to know {random.choice(THINGS_TO_KNOW)}"
- f"about {random.choice(COMPANIES)}")