nosible 0.2.1__tar.gz → 0.2.3__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (27)
  1. {nosible-0.2.1/src/nosible.egg-info → nosible-0.2.3}/PKG-INFO +79 -14
  2. {nosible-0.2.1 → nosible-0.2.3}/README.md +77 -11
  3. {nosible-0.2.1 → nosible-0.2.3}/pyproject.toml +4 -5
  4. {nosible-0.2.1 → nosible-0.2.3}/src/nosible/classes/result.py +17 -8
  5. {nosible-0.2.1 → nosible-0.2.3}/src/nosible/classes/result_set.py +42 -22
  6. {nosible-0.2.1 → nosible-0.2.3}/src/nosible/nosible_client.py +47 -56
  7. {nosible-0.2.1 → nosible-0.2.3/src/nosible.egg-info}/PKG-INFO +79 -14
  8. {nosible-0.2.1 → nosible-0.2.3}/src/nosible.egg-info/SOURCES.txt +0 -1
  9. {nosible-0.2.1 → nosible-0.2.3}/src/nosible.egg-info/requires.txt +1 -2
  10. {nosible-0.2.1 → nosible-0.2.3}/tests/test_02_results.py +24 -2
  11. {nosible-0.2.1 → nosible-0.2.3}/tests/test_03_search_searchset.py +0 -1
  12. {nosible-0.2.1 → nosible-0.2.3}/tests/test_04_snippets.py +0 -1
  13. nosible-0.2.1/src/nosible/utils/question_builder.py +0 -131
  14. {nosible-0.2.1 → nosible-0.2.3}/LICENSE +0 -0
  15. {nosible-0.2.1 → nosible-0.2.3}/setup.cfg +0 -0
  16. {nosible-0.2.1 → nosible-0.2.3}/setup.py +0 -0
  17. {nosible-0.2.1 → nosible-0.2.3}/src/nosible/__init__.py +0 -0
  18. {nosible-0.2.1 → nosible-0.2.3}/src/nosible/classes/search.py +0 -0
  19. {nosible-0.2.1 → nosible-0.2.3}/src/nosible/classes/search_set.py +0 -0
  20. {nosible-0.2.1 → nosible-0.2.3}/src/nosible/classes/snippet.py +0 -0
  21. {nosible-0.2.1 → nosible-0.2.3}/src/nosible/classes/snippet_set.py +0 -0
  22. {nosible-0.2.1 → nosible-0.2.3}/src/nosible/classes/web_page.py +0 -0
  23. {nosible-0.2.1 → nosible-0.2.3}/src/nosible/utils/json_tools.py +0 -0
  24. {nosible-0.2.1 → nosible-0.2.3}/src/nosible/utils/rate_limiter.py +0 -0
  25. {nosible-0.2.1 → nosible-0.2.3}/src/nosible.egg-info/dependency_links.txt +0 -0
  26. {nosible-0.2.1 → nosible-0.2.3}/src/nosible.egg-info/top_level.txt +0 -0
  27. {nosible-0.2.1 → nosible-0.2.3}/tests/test_01_nosible.py +0 -0
@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: nosible
-Version: 0.2.1
+Version: 0.2.3
 Summary: Python client for the NOSIBLE Search API
 Home-page: https://github.com/NosibleAI/nosible
 Author: Stuart Reid, Matthew Dicks, Richard Taylor, Gareth Warburton
@@ -27,7 +27,6 @@ Classifier: Operating System :: OS Independent
 Requires-Python: >=3.9
 Description-Content-Type: text/markdown
 License-File: LICENSE
-Requires-Dist: requests
 Requires-Dist: polars
 Requires-Dist: duckdb
 Requires-Dist: openai
@@ -35,8 +34,8 @@ Requires-Dist: tantivy
 Requires-Dist: pyrate-limiter
 Requires-Dist: tenacity
 Requires-Dist: cryptography
-Requires-Dist: pandas
 Requires-Dist: pyarrow
+Requires-Dist: pandas
 Dynamic: author
 Dynamic: home-page
 Dynamic: license-file
@@ -80,13 +79,15 @@ uv pip install nosible
 **Requirements**:
 
 * Python 3.9+
-* requests
 * polars
-* cryptography
-* tenacity
-* pyrate-limiter
-* tantivy
+* duckdb
 * openai
+* tantivy
+* pyrate-limiter
+* tenacity
+* cryptography
+* pyarrow
+* pandas
 
 ### 🔑 Authentication
 
@@ -140,9 +141,28 @@ os.environ["LLM_API_KEY"] = "sk-..."
 
 ### 🚀 Examples
 
-#### Fast Search
+#### Search
+
+The `search` and `searches` functions enable you to retrieve **up to 100** results for a single query. This is ideal for most use cases where you need to retrieve information quickly and efficiently.
+
+- Use the `search` method when you need between **10 and 100** results for a single query.
+- The same applies to the `searches` and `.similar()` methods.
 
-Retrieve up to 100 results with optional filters:
+- A search will return a set of `Result` objects.
+- The `Result` object represents a single search result and provides methods to access the result's properties.
+- `url`: The URL of the search result.
+- `title`: The title of the search result.
+- `description`: A brief description or summary of the search result.
+- `netloc`: The network location (domain) of the URL.
+- `published`: The publication date of the search result.
+- `visited`: The date and time when the result was visited.
+- `author`: The author of the content.
+- `content`: The main content or body of the search result.
+- `language`: The language code of the content (e.g., 'en' for English).
+- `similarity`: Similarity score with respect to a query or reference.
+
+These can be accessed directly from the `Result` object: `print(result.title)` or
+`print(result["title"])`
 
 ```python
 from nosible import Nosible
@@ -169,9 +189,44 @@ with Nosible(
 print([r.title for r in results])
 ```
 
+#### Expansions
+
+**Prompt expansions** are questions **lexically** and **semantically similar** to your main question. Expansions are added alongside your search query to improve your search results. You can add up to 10 expansions per search.
+
+- You can add your **own expansions** by passing a list of strings to the `expansions` parameter.
+- You can also have your expansions automatically generated by setting `autogenerate_expansions` to `True` when running the search.
+- For expansions to be generated, you will need the `LLM_API_KEY` to be set in the environment or passed to the `Nosible` constructor.
+- By default, we use OpenRouter as an endpoint. However, **we support any OpenAI-compatible endpoint**. If you
+  want to use a different endpoint, follow [this](https://nosible-py.readthedocs.io/en/latest/configuration.html#change-llm-base-url) guide in the docs.
+- You can change this model with the argument **expansions_model**.
+
+```python
+# Example of using your own expansions
+with Nosible() as nos:
+    results = nos.search(
+        question="How have the Trump tariffs impacted the US economy?",
+        expansions=[
+            "What are the consequences of Trump's 2018 steel and aluminum tariffs on American manufacturers?",
+            "How did Donald Trump's tariffs on Chinese imports influence US import prices and inflation?",
+            "What impact did the Section 232 tariffs under President Trump have on US agricultural exports?",
+            "In what ways have Trump's trade duties affected employment levels in the US automotive sector?",
+            "How have the tariffs imposed by the Trump administration altered American consumer goods pricing nationwide?",
+            "What economic outcomes resulted from President Trump's protective tariffs for the United States economy?",
+            "How did Trump's solar panel tariffs change investment trends in the US energy market?",
+            "What have been the financial effects of Trump's Section 301 tariffs on Chinese electronics imports?",
+            "How did Trump's trade barriers influence GDP growth and trade deficits in the United States?",
+            "In what manner did Donald Trump's import taxes reshape competitiveness of US steel producers globally?",
+        ],
+        n_results=10,
+    )
+
+    print(results)
+```
+
 
 #### Parallel Searches
 
-Run multiple queries concurrently:
+Allows you to run multiple searches concurrently, yielding results as they come in.
+- You can pass a list of questions to the `searches` method.
 
 ```python
 from nosible import Nosible
@@ -190,7 +245,12 @@ with Nosible(nosible_api_key="basic|abcd1234...", llm_api_key="sk-...") as clien
 
 #### Bulk Search
 
-Fetch thousands of results for offline analysis:
+Bulk search enables you to retrieve a large number of results in a single request, making it ideal for large-scale data analysis and processing.
+
+- Use the `bulk_search` method when you need more than 1,000 results for a single query.
+- You can request between **1,000 and 10,000** results per query.
+- All parameters available in the standard `search` method—such as `expansions`, `include_companies`, `include_languages`, and more—are also supported in `bulk_search`.
+- A bulk search for 10,000 results typically completes in about 30 seconds or less.
 
 ```python
 from nosible import Nosible
@@ -244,9 +304,14 @@ with Nosible(nosible_api_key="basic|abcd1234...") as client:
 print([r for r in results])
 ```
 
-#### Sentiment Analysis
+#### Sentiment
 
-Compute sentiment for a single result (uses GPT-4o; requires an LLM API key):
+This fetches a sentiment score for each search result.
+- The sentiment score is a float between `-1` and `1`, where `-1` is **negative**, `0` is **neutral**, and `1` is **positive**.
+- The sentiment model can be changed by passing the `sentiment_model` parameter to the `Nosible` constructor.
+- The `sentiment_model` defaults to "openai/gpt-4o", which is a powerful model for sentiment analysis.
+- You can also change the base URL for the LLM API by passing the `openai_base_url` parameter to the `Nosible` constructor.
+- The `openai_base_url` defaults to OpenRouter's API endpoint.
 
 ```python
 from nosible import Nosible
@@ -36,13 +36,15 @@ uv pip install nosible
 **Requirements**:
 
 * Python 3.9+
-* requests
 * polars
-* cryptography
-* tenacity
-* pyrate-limiter
-* tantivy
+* duckdb
 * openai
+* tantivy
+* pyrate-limiter
+* tenacity
+* cryptography
+* pyarrow
+* pandas
 
 ### 🔑 Authentication
 
@@ -96,9 +98,28 @@ os.environ["LLM_API_KEY"] = "sk-..."
 
 ### 🚀 Examples
 
-#### Fast Search
+#### Search
+
+The `search` and `searches` functions enable you to retrieve **up to 100** results for a single query. This is ideal for most use cases where you need to retrieve information quickly and efficiently.
+
+- Use the `search` method when you need between **10 and 100** results for a single query.
+- The same applies to the `searches` and `.similar()` methods.
 
-Retrieve up to 100 results with optional filters:
+- A search will return a set of `Result` objects.
+- The `Result` object represents a single search result and provides methods to access the result's properties.
+- `url`: The URL of the search result.
+- `title`: The title of the search result.
+- `description`: A brief description or summary of the search result.
+- `netloc`: The network location (domain) of the URL.
+- `published`: The publication date of the search result.
+- `visited`: The date and time when the result was visited.
+- `author`: The author of the content.
+- `content`: The main content or body of the search result.
+- `language`: The language code of the content (e.g., 'en' for English).
+- `similarity`: Similarity score with respect to a query or reference.
+
+These can be accessed directly from the `Result` object: `print(result.title)` or
+`print(result["title"])`
 
 ```python
 from nosible import Nosible
@@ -125,9 +146,44 @@ with Nosible(
 print([r.title for r in results])
 ```
 
+#### Expansions
+
+**Prompt expansions** are questions **lexically** and **semantically similar** to your main question. Expansions are added alongside your search query to improve your search results. You can add up to 10 expansions per search.
+
+- You can add your **own expansions** by passing a list of strings to the `expansions` parameter.
+- You can also have your expansions automatically generated by setting `autogenerate_expansions` to `True` when running the search.
+- For expansions to be generated, you will need the `LLM_API_KEY` to be set in the environment or passed to the `Nosible` constructor.
+- By default, we use OpenRouter as an endpoint. However, **we support any OpenAI-compatible endpoint**. If you
+  want to use a different endpoint, follow [this](https://nosible-py.readthedocs.io/en/latest/configuration.html#change-llm-base-url) guide in the docs.
+- You can change this model with the argument **expansions_model**.
+
+```python
+# Example of using your own expansions
+with Nosible() as nos:
+    results = nos.search(
+        question="How have the Trump tariffs impacted the US economy?",
+        expansions=[
+            "What are the consequences of Trump's 2018 steel and aluminum tariffs on American manufacturers?",
+            "How did Donald Trump's tariffs on Chinese imports influence US import prices and inflation?",
+            "What impact did the Section 232 tariffs under President Trump have on US agricultural exports?",
+            "In what ways have Trump's trade duties affected employment levels in the US automotive sector?",
+            "How have the tariffs imposed by the Trump administration altered American consumer goods pricing nationwide?",
+            "What economic outcomes resulted from President Trump's protective tariffs for the United States economy?",
+            "How did Trump's solar panel tariffs change investment trends in the US energy market?",
+            "What have been the financial effects of Trump's Section 301 tariffs on Chinese electronics imports?",
+            "How did Trump's trade barriers influence GDP growth and trade deficits in the United States?",
+            "In what manner did Donald Trump's import taxes reshape competitiveness of US steel producers globally?",
+        ],
+        n_results=10,
+    )
+
+    print(results)
+```
+
 
 #### Parallel Searches
 
-Run multiple queries concurrently:
+Allows you to run multiple searches concurrently, yielding results as they come in.
+- You can pass a list of questions to the `searches` method.
 
 ```python
 from nosible import Nosible
@@ -146,7 +202,12 @@ with Nosible(nosible_api_key="basic|abcd1234...", llm_api_key="sk-...") as clien
 
 #### Bulk Search
 
-Fetch thousands of results for offline analysis:
+Bulk search enables you to retrieve a large number of results in a single request, making it ideal for large-scale data analysis and processing.
+
+- Use the `bulk_search` method when you need more than 1,000 results for a single query.
+- You can request between **1,000 and 10,000** results per query.
+- All parameters available in the standard `search` method—such as `expansions`, `include_companies`, `include_languages`, and more—are also supported in `bulk_search`.
+- A bulk search for 10,000 results typically completes in about 30 seconds or less.
 
 ```python
 from nosible import Nosible
@@ -200,9 +261,14 @@ with Nosible(nosible_api_key="basic|abcd1234...") as client:
 print([r for r in results])
 ```
 
-#### Sentiment Analysis
+#### Sentiment
 
-Compute sentiment for a single result (uses GPT-4o; requires an LLM API key):
+This fetches a sentiment score for each search result.
+- The sentiment score is a float between `-1` and `1`, where `-1` is **negative**, `0` is **neutral**, and `1` is **positive**.
+- The sentiment model can be changed by passing the `sentiment_model` parameter to the `Nosible` constructor.
+- The `sentiment_model` defaults to "openai/gpt-4o", which is a powerful model for sentiment analysis.
+- You can also change the base URL for the LLM API by passing the `openai_base_url` parameter to the `Nosible` constructor.
+- The `openai_base_url` defaults to OpenRouter's API endpoint.
 
 ```python
 from nosible import Nosible
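The README above notes that `Result` fields can be read either as attributes (`result.title`) or with dictionary-style indexing (`result["title"]`). A minimal stand-in class (not the package's real `Result`, just an illustration of that dual-access pattern) shows how little machinery it takes:

```python
from dataclasses import dataclass, asdict

@dataclass
class MiniResult:
    # Illustrative stand-in for a few of the Result fields listed above.
    url: str = None
    title: str = None
    similarity: float = None

    def __getitem__(self, key):
        # Dict-style access falls back to the dataclass fields.
        return asdict(self)[key]

r = MiniResult(url="https://example.com", title="Example Domain", similarity=0.99)
print(r.title)      # Example Domain
print(r["title"])   # Example Domain
```

Both access styles return the same value, so downstream code can treat results either as objects or as records.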
@@ -1,6 +1,6 @@
 [project]
 name = "nosible"
-version = "0.2.1"
+version = "0.2.3"
 description = "Python client for the NOSIBLE Search API"
 readme = { file = "README.md", content-type = "text/markdown" }
 requires-python = ">=3.9"
@@ -12,7 +12,6 @@ authors = [
 ]
 
 dependencies = [
-    "requests",
     "polars",
     "duckdb",
     "openai",
@@ -20,8 +19,8 @@ dependencies = [
     "pyrate-limiter",
     "tenacity",
     "cryptography",
-    "pandas",
     "pyarrow",
+    "pandas",
 ]
 
 license = "MIT"
@@ -60,7 +59,7 @@ where = ["src"]
 dev-dependencies = [
     "pytest",
     "pytest-doctestplus",
-    "requests-cache",
     "pytest-xdist",
-    "urllib3==1.26.15"
+    "urllib3==1.26.15",
+    "hishel",
 ]
@@ -3,9 +3,8 @@ from __future__ import annotations
 from dataclasses import asdict, dataclass
 from typing import TYPE_CHECKING
 
-from openai import OpenAI
-
 from nosible.classes.web_page import WebPageData
+from nosible.utils.json_tools import print_dict
 
 if TYPE_CHECKING:
     from nosible.classes.result_set import ResultSet
@@ -102,11 +101,21 @@ class Result:
         0.99 | Example Domain
         >>> result = Result(title=None, similarity=None)
         >>> print(str(result))
-        N/A | No Title
+        {
+            "url": null,
+            "title": null,
+            "description": null,
+            "netloc": null,
+            "published": null,
+            "visited": null,
+            "author": null,
+            "content": null,
+            "language": null,
+            "similarity": null,
+            "url_hash": null
+        }
         """
-        similarity = f"{self.similarity:.2f}" if self.similarity is not None else "N/A"
-        title = self.title or "No Title"
-        return f"{similarity:>6} | {title}"
+        return print_dict(self.to_dict())
 
     def __getitem__(self, key: str) -> str | float | bool | None:
         """
@@ -295,12 +304,12 @@ class Result:
 
         The response must be a float in [-1.0, 1.0]. No other text must be returned.
         """
-
+        from openai import OpenAI
         llm_client = OpenAI(base_url="https://openrouter.ai/api/v1", api_key=client.llm_api_key)
 
         # Call the chat completions endpoint.
         resp = llm_client.chat.completions.create(
-            model="openai/gpt-4o", messages=[{"role": "user", "content": prompt.strip()}], temperature=0.7
+            model=client.sentiment_model, messages=[{"role": "user", "content": prompt.strip()}], temperature=0.7
         )
 
         raw = resp.choices[0].message.content
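The sentiment prompt in the hunk above requires the model to return a bare float in [-1.0, 1.0]. A defensive parse of that raw reply might look like the helper below (a hypothetical illustration, not part of the nosible package: it extracts the first number from the reply and clamps it to the documented range):

```python
import re

def parse_sentiment(raw: str) -> float:
    """Extract the first float from an LLM reply and clamp it to [-1.0, 1.0]."""
    match = re.search(r"-?\d+(?:\.\d+)?", raw)
    if match is None:
        raise ValueError(f"No numeric sentiment in reply: {raw!r}")
    # Clamp, since models occasionally ignore range instructions.
    return max(-1.0, min(1.0, float(match.group())))

print(parse_sentiment("0.85"))           # 0.85
print(parse_sentiment("Sentiment: -2"))  # -1.0
```

Clamping rather than rejecting out-of-range replies trades strictness for robustness; either choice is defensible.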
@@ -2,15 +2,15 @@ from __future__ import annotations
 
 from collections.abc import Iterator
 from dataclasses import dataclass, field
-
-import duckdb
-import pandas as pd
-import polars as pl
-from tantivy import Document, Index, SchemaBuilder
+from typing import TYPE_CHECKING
 
 from nosible.classes.result import Result
 from nosible.utils.json_tools import json_dumps, json_loads
 
+if TYPE_CHECKING:
+    import pandas as pd
+    import polars as pl
+
 
 @dataclass(frozen=True)
 class ResultSet(Iterator[Result]):
@@ -182,28 +182,34 @@
         # Setup if required
         return self
 
-    def __getitem__(self, key: int) -> Result:
+    def __getitem__(self, key: int | slice) -> Result | ResultSet:
         """
-        Get a Result by index.
+        Get a Result by index or a list of Results by slice.
 
         Parameters
         ----------
-        key : int
-            Index of the result to retrieve.
+        key : int or slice
+            Index or slice of the result(s) to retrieve.
 
         Returns
         -------
-        Result
-            The Result at the specified index.
+        Result or ResultSet
+            A single Result if `key` is an integer, or a ResultSet containing the sliced results if `key` is a slice.
 
         Raises
         ------
         IndexError
             If index is out of range.
+        TypeError
+            If key is not an integer or slice.
         """
-        if 0 <= key < len(self.results):
-            return self.results[key]
-        raise IndexError(f"Index {key} out of range for ResultSet with length {len(self.results)}.")
+        if isinstance(key, int):
+            if 0 <= key < len(self.results):
+                return self.results[key]
+            raise IndexError(f"Index {key} out of range for ResultSet with length {len(self.results)}.")
+        if isinstance(key, slice):
+            return ResultSet(self.results[key])
+        raise TypeError("ResultSet indices must be integers or slices.")
 
     def __add__(self, other: ResultSet | Result) -> ResultSet:
         """
@@ -316,6 +322,8 @@
         Document returned
         Document returned
         """
+        from tantivy import Document, Index, SchemaBuilder
+
         # Build the Tantivy schema
         schema_builder = SchemaBuilder()
         # Int for doc retrieval.
@@ -439,6 +447,9 @@
         Traceback (most recent call last):
         ValueError: Cannot analyze by 'foobar' - not a valid field.
         """
+        import pandas as pd
+        import polars as pl
+
         # Convert to Polars DataFrame
         df: pl.DataFrame = self.to_polars()
 
@@ -571,6 +582,10 @@
         >>> "url" in df.columns
         True
         """
+        # Lazy import for runtime, but allow static type checking
+
+        import polars as pl
+
         return pl.DataFrame(self.to_dicts())
 
     def to_pandas(self) -> pd.DataFrame:
@@ -911,7 +926,7 @@
         import duckdb
 
         # Convert to Polars DataFrame and then to Arrow Table
-        df = self.to_polars()
+        df = self.to_polars()  # noqa: F841
         # Connect to DuckDB and write the Arrow Table to a table
         con = duckdb.connect(out)
         # Write the DataFrame to the specified table name, replacing if exists
@@ -964,6 +979,8 @@
         >>> results[0].title
         'Example Domain'
         """
+        import polars as pl
+
         try:
             df = pl.read_csv(file_path)
         except Exception as e:
@@ -1124,6 +1141,8 @@
         >>> print(len(df))
         1
         """
+        import polars as pl
+
         pl_df = pl.from_pandas(df)
         return cls.from_polars(pl_df)
 
@@ -1239,6 +1258,8 @@
         >>> results[0].title
         'Example Domain'
         """
+        import polars as pl
+
         try:
             df = pl.read_parquet(file_path)
         except Exception as e:
@@ -1288,6 +1309,8 @@
         >>> results[0].title
         'Example Domain'
         """
+        import polars as pl
+
         try:
             df = pl.read_ipc(file_path)
         except Exception as e:
@@ -1340,7 +1363,11 @@
         >>> loaded[0].title
         'Example Domain'
         """
+        import polars as pl
+
         try:
+            import duckdb
+
             con = duckdb.connect(file_path, read_only=True)
         except Exception as e:
             raise RuntimeError(f"Failed to connect to DuckDB file '{file_path}': {e}") from e
@@ -1492,10 +1519,3 @@
         """
         # TODO: cleanup handles, sessions, etc.
         pass
-
-
-if __name__ == "__main__":
-    import doctest
-
-    doctest.testmod(optionflags=doctest.ELLIPSIS | doctest.NORMALIZE_WHITESPACE)
-    print("All tests passed!")
@@ -2,21 +2,17 @@ import gzip
 import json
 import logging
 import os
+import re
 import sys
 import textwrap
 import time
 import types
-import typing
 from collections.abc import Iterator
 from concurrent.futures import ThreadPoolExecutor
 from datetime import datetime
-from typing import Union, Optional
+from typing import Optional, Union
 
-import polars as pl
-import requests
-from cryptography.fernet import Fernet
-from openai import OpenAI
-from polars import SQLContext
+import httpx
 from tenacity import (
     before_sleep_log,
     retry,
@@ -32,7 +28,6 @@ from nosible.classes.search_set import SearchSet
 from nosible.classes.snippet_set import SnippetSet
 from nosible.classes.web_page import WebPageData
 from nosible.utils.json_tools import json_loads
-from nosible.utils.question_builder import _get_question
 from nosible.utils.rate_limiter import PLAN_RATE_LIMITS, RateLimiter, _rate_limited
 
 # Set up a module‐level logger.
@@ -56,6 +51,8 @@ class Nosible:
         Base URL for the OpenAI-compatible LLM API. (default is OpenRouter's API endpoint)
     sentiment_model : str, optional
         Model to use for sentiment analysis (default is "openai/gpt-4o").
+    expansions_model : str, optional
+        Model to use for expansions (default is "openai/gpt-4o").
     timeout : int
         Request timeout for HTTP calls.
     retries : int,
@@ -94,7 +91,8 @@
     - The `nosible_api_key` is required to access the Nosible Search API.
     - The `llm_api_key` is optional and used for LLM-based query expansions.
     - The `openai_base_url` defaults to OpenRouter's API endpoint.
-    - The `sentiment_model` is used for generating query expansions and sentiment analysis.
+    - The `sentiment_model` is used for sentiment analysis.
+    - The `expansions_model` is used for generating query expansions.
    - The `timeout`, `retries`, and `concurrency` parameters control the behavior of HTTP requests.
 
     Examples
@@ -106,10 +104,11 @@
 
     def __init__(
         self,
-        nosible_api_key: str = None,
-        llm_api_key: str = None,
+        nosible_api_key: Optional[str] = None,
+        llm_api_key: Optional[str] = None,
         openai_base_url: str = "https://openrouter.ai/api/v1",
         sentiment_model: str = "openai/gpt-4o",
+        expansions_model: str = "openai/gpt-4o",
         timeout: int = 30,
         retries: int = 5,
         concurrency: int = 10,
@@ -142,6 +141,7 @@
         self.llm_api_key = llm_api_key or os.getenv("LLM_API_KEY")
         self.openai_base_url = openai_base_url
         self.sentiment_model = sentiment_model
+        self.expansions_model = expansions_model
         # Network parameters
         self.timeout = timeout
         self.retries = retries
@@ -162,7 +162,7 @@
             reraise=True,
             stop=stop_after_attempt(self.retries) | stop_after_delay(self.timeout),
             wait=wait_exponential(multiplier=1, min=1, max=10),
-            retry=retry_if_exception_type(requests.exceptions.RequestException),
+            retry=retry_if_exception_type(httpx.RequestError),
             before_sleep=before_sleep_log(self.logger, logging.WARNING),
         )(self._post)
 
@@ -171,12 +171,12 @@
             reraise=True,
             stop=stop_after_attempt(self.retries) | stop_after_delay(self.timeout),
             wait=wait_exponential(multiplier=1, min=1, max=10),
-            retry=retry_if_exception_type(requests.exceptions.RequestException),
+            retry=retry_if_exception_type(httpx.RequestError),
             before_sleep=before_sleep_log(self.logger, logging.WARNING),
         )(self._generate_expansions)
 
         # Thread pool for parallel searches
-        self._session = requests.Session()
+        self._session = httpx.Client(follow_redirects=True)
         self._executor = ThreadPoolExecutor(max_workers=self.concurrency)
 
         # Headers
@@ -201,7 +201,6 @@
 
     def search(
         self,
-        *,
         search: Search = None,
         question: str = None,
         expansions: list[str] = None,
@@ -873,6 +872,8 @@
         ...
         ValueError: Bulk search cannot have more than 10000 results per query.
         """
+        from cryptography.fernet import Fernet
+
         previous_level = self.logger.level
         if verbose:
             self.logger.setLevel(logging.INFO)
@@ -981,7 +982,7 @@
         resp = self._post(url="https://www.nosible.ai/search/v1/slow-search", payload=payload)
         try:
             resp.raise_for_status()
-        except requests.HTTPError as e:
+        except httpx.HTTPStatusError as e:
             raise ValueError(f"[{question!r}] HTTP {resp.status_code}: {resp.text}") from e
 
         data = resp.json()
@@ -993,7 +994,7 @@
         decrypt_using = data.get("decrypt_using")
         for _ in range(100):
             dl = self._session.get(download_from, timeout=self.timeout)
-            if dl.ok:
+            if dl.status_code == 200:
                 fernet = Fernet(decrypt_using.encode())
                 decrypted = fernet.decrypt(dl.content)
                 decompressed = gzip.decompress(decrypted)
@@ -1053,7 +1054,7 @@
         ... ans = nos.answer(
         ...     query="How is research governance and decision-making structured between Google and DeepMind?",
         ...     n_results=100,
-        ...     show_context=True
+        ...     show_context=True,
         ... )  # doctest: +ELLIPSIS +NORMALIZE_WHITESPACE
         <BLANKLINE>
         Doc 1
@@ -1067,11 +1068,7 @@
             raise ValueError("An LLM API key is required for answer().")
 
         # Retrieve top documents
-        results = self.search(
-            question=query,
-            n_results=n_results,
-            min_similarity=min_similarity,
-        )
+        results = self.search(question=query, n_results=n_results, min_similarity=min_similarity)
 
         # Build RAG context
         context = ""
@@ -1090,7 +1087,7 @@
             print(textwrap.dedent(context))
 
         # Craft prompt
-        prompt = (f"""
+        prompt = f"""
         # TASK DESCRIPTION
 
         You are a helpful assistant. Use the following context to answer the question.
@@ -1102,15 +1099,12 @@
         ## Context
         {context}
         """
-        )
+        from openai import OpenAI
 
         # Call LLM
         client = OpenAI(base_url=self.openai_base_url, api_key=self.llm_api_key)
         try:
-            response = client.chat.completions.create(
-                model = model,
-                messages = [{"role": "user", "content": prompt}],
-            )
+            response = client.chat.completions.create(model=model, messages=[{"role": "user", "content": prompt}])
         except Exception as e:
             raise RuntimeError(f"LLM API error: {e}") from e
 
@@ -1123,13 +1117,7 @@
         return "Answer:\n" + response.choices[0].message.content.strip()
 
     @_rate_limited("visit")
-    def visit(
-        self,
-        html: str = "",
-        recrawl: bool = False,
-        render: bool = False,
-        url: str = None
-    ) -> WebPageData:
+    def visit(self, html: str = "", recrawl: bool = False, render: bool = False, url: str = None) -> WebPageData:
         """
         Visit a given URL and return a structured WebPageData object for the page.
 
@@ -1262,10 +1250,7 @@
             payload["sql_filter"] = "SELECT loc, published FROM engine"
 
         # Send the POST to the /trend endpoint
-        response = self._post(
-            url="https://www.nosible.ai/search/v1/trend",
-            payload=payload,
-        )
+        response = self._post(url="https://www.nosible.ai/search/v1/trend", payload=payload)
         # Will raise ValueError on rate-limit or auth errors
         response.raise_for_status()
         payload = response.json().get("response", {})
@@ -1365,7 +1350,7 @@
             return False
             # If we reach here, the response is unexpected
             return False
-        except requests.HTTPError:
+        except httpx.HTTPError:
             return False
         except:
             return False
@@ -1460,7 +1445,7 @@
         out = [
             "Below are the rate limits for all NOSIBLE plans.",
             "To upgrade your package, visit https://www.nosible.ai/products.\n",
-            "Unless otherwise indicated, bulk searches are limited to one-at-a-time per API key.\n"
+            "Unless otherwise indicated, bulk searches are limited to one-at-a-time per API key.\n",
         ]
 
         user_plan = self._get_user_plan()
@@ -1521,7 +1506,7 @@
         except Exception:
             pass
 
-    def _post(self, url: str, payload: dict, headers: dict = None, timeout: int = None) -> requests.Response:
+    def _post(self, url: str, payload: dict, headers: dict = None, timeout: int = None) -> httpx.Response:
         """
         Internal helper to send a POST request with retry logic.
 
@@ -1553,7 +1538,7 @@
 
         Returns
         -------
-        requests.Response
+        httpx.Response
             The HTTP response object.
         """
         response = self._session.post(
@@ -1561,18 +1546
  json=payload,
1562
1547
  headers=headers if headers is not None else self.headers,
1563
1548
  timeout=timeout if timeout is not None else self.timeout,
1549
+ follow_redirects=True,
1564
1550
  )
1565
1551
 
1566
1552
  # If unauthorized, or if the payload is string too short, treat as invalid API key
1567
1553
  if response.status_code == 401:
1568
1554
  raise ValueError("Your API key is not valid.")
1569
1555
  if response.status_code == 422:
1570
- # Only inspect JSON if it’s a JSON response
1571
1556
  content_type = response.headers.get("Content-Type", "")
1572
1557
  if content_type.startswith("application/json"):
1573
1558
  body = response.json()
1574
1559
  if isinstance(body, list):
1575
- body = body[0] # NOSIBLE returns a list of errors
1560
+ body = body[0]
1576
1561
  print(body)
1577
1562
  if body.get("type") == "string_too_short":
1578
1563
  raise ValueError("Your API key is not valid: Too Short.")
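The 401/422 handling in this hunk can be exercised in isolation with a stub response object. This is a hypothetical sketch: `check_auth` and `FakeResponse` are illustrative names, not part of the client.

```python
class FakeResponse:
    """Minimal stand-in for an httpx.Response (illustrative only)."""
    def __init__(self, status_code, json_body=None, content_type="application/json"):
        self.status_code = status_code
        self._json = json_body
        self.headers = {"Content-Type": content_type}

    def json(self):
        return self._json


def check_auth(response):
    # Mirrors the hunk above: 401 always means an invalid key; a 422 JSON
    # body may arrive as a list of errors, in which case the first entry
    # is the one inspected.
    if response.status_code == 401:
        raise ValueError("Your API key is not valid.")
    if response.status_code == 422:
        if response.headers.get("Content-Type", "").startswith("application/json"):
            body = response.json()
            if isinstance(body, list):
                body = body[0]
            if body.get("type") == "string_too_short":
                raise ValueError("Your API key is not valid: Too Short.")
    return response


try:
    check_auth(FakeResponse(401))
except ValueError as e:
    print(e)  # Your API key is not valid.
```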
@@ -1711,12 +1696,14 @@ class Nosible:
  - Contextual Example: Swap "diabetes treatment" with "insulin therapy" or "blood sugar management".

  """.replace(" ", "")
+ # Lazy load
+ from openai import OpenAI

  client = OpenAI(base_url=self.openai_base_url, api_key=self.llm_api_key)

  # Call the chat completions endpoint.
  resp = client.chat.completions.create(
- model=self.sentiment_model, messages=[{"role": "user", "content": prompt.strip()}], temperature=0.7
+ model=self.expansions_model, messages=[{"role": "user", "content": prompt.strip()}], temperature=0.7
  )

  raw = resp.choices[0].message.content
@@ -1776,14 +1763,16 @@ class Nosible:
  ...
  ValueError: Invalid date for 'visited_start': '2023/12/31'. Expected ISO format 'YYYY-MM-DD'.
  """
+ dateregex = r"^\d{4}-\d{2}-\d{2}"
+
+ if not re.match(dateregex, string):
+ raise ValueError(f"Invalid date for '{name}': {string!r}. Expected ISO format 'YYYY-MM-DD'.")
+
  try:
  # datetime.fromisoformat accepts both YYYY-MM-DD and full timestamps
  parsed = datetime.fromisoformat(string)
  except Exception:
- raise ValueError(
- f"Invalid date for '{name}': {string!r}. "
- "Expected ISO format 'YYYY-MM-DD'."
- )
+ raise ValueError(f"Invalid date for '{name}': {string!r}. Expected ISO format 'YYYY-MM-DD'.")

  def _format_sql(
  self,
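The validation added in the hunk above (a regex pre-check followed by `datetime.fromisoformat`) can be sketched standalone. `verify_date` below is a hypothetical mirror of the private helper, not the client's actual code:

```python
import re
from datetime import datetime


def verify_date(name: str, string: str) -> None:
    # Reject anything that does not start with YYYY-MM-DD, then let
    # fromisoformat catch impossible dates such as 2023-02-30.
    if not re.match(r"^\d{4}-\d{2}-\d{2}", string):
        raise ValueError(f"Invalid date for {name!r}: {string!r}. Expected ISO format 'YYYY-MM-DD'.")
    try:
        datetime.fromisoformat(string)
    except ValueError:
        raise ValueError(f"Invalid date for {name!r}: {string!r}. Expected ISO format 'YYYY-MM-DD'.")


verify_date("published_start", "2023-12-31")  # passes silently
```

The regex alone would accept a date like `2023-02-30`, which is why the `fromisoformat` round-trip is still needed.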
@@ -1996,9 +1985,11 @@ class Nosible:
  "company_3",
  "doc_hash",
  ]
+ import polars as pl # Lazy import
+
  # Create a dummy DataFrame with correct columns and no rows
  df = pl.DataFrame({col: [] for col in columns})
- ctx = SQLContext()
+ ctx = pl.SQLContext()
  ctx.register("engine", df)
  try:
  ctx.execute(sql)
@@ -2019,10 +2010,10 @@ class Nosible:

  def __exit__(
  self,
- _exc_type: typing.Optional[type[BaseException]],
- _exc_val: typing.Optional[BaseException],
- _exc_tb: typing.Optional[types.TracebackType],
- ) -> typing.Optional[bool]:
+ _exc_type: Optional[type[BaseException]],
+ _exc_val: Optional[BaseException],
+ _exc_tb: Optional[types.TracebackType],
+ ) -> Optional[bool]:
  """
  Always clean up (self.close()), but let exceptions propagate.
  Return True only if you really want to suppress an exception.
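The `__exit__` signature simplified above follows the standard context-manager protocol. A minimal sketch of the same pattern, with a hypothetical `MiniClient` standing in for the real class:

```python
import types
from typing import Optional


class MiniClient:
    def __init__(self):
        self.closed = False

    def close(self) -> None:
        self.closed = True

    def __enter__(self) -> "MiniClient":
        return self

    def __exit__(
        self,
        _exc_type: Optional[type],
        _exc_val: Optional[BaseException],
        _exc_tb: Optional[types.TracebackType],
    ) -> Optional[bool]:
        # Always clean up, but return None so exceptions propagate.
        self.close()
        return None


with MiniClient() as c:
    pass
print(c.closed)  # True
```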
@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: nosible
- Version: 0.2.1
+ Version: 0.2.3
  Summary: Python client for the NOSIBLE Search API
  Home-page: https://github.com/NosibleAI/nosible
  Author: Stuart Reid, Matthew Dicks, Richard Taylor, Gareth Warburton
@@ -27,7 +27,6 @@ Classifier: Operating System :: OS Independent
  Requires-Python: >=3.9
  Description-Content-Type: text/markdown
  License-File: LICENSE
- Requires-Dist: requests
  Requires-Dist: polars
  Requires-Dist: duckdb
  Requires-Dist: openai
@@ -35,8 +34,8 @@ Requires-Dist: tantivy
  Requires-Dist: pyrate-limiter
  Requires-Dist: tenacity
  Requires-Dist: cryptography
- Requires-Dist: pandas
  Requires-Dist: pyarrow
+ Requires-Dist: pandas
  Dynamic: author
  Dynamic: home-page
  Dynamic: license-file
@@ -80,13 +79,15 @@ uv pip install nosible
  **Requirements**:

  * Python 3.9+
- * requests
  * polars
- * cryptography
- * tenacity
- * pyrate-limiter
- * tantivy
+ * duckdb
  * openai
+ * tantivy
+ * pyrate-limiter
+ * tenacity
+ * cryptography
+ * pyarrow
+ * pandas

  ### 🔑 Authentication

@@ -140,9 +141,28 @@ os.environ["LLM_API_KEY"] = "sk-..."

  ### 🚀 Examples

- #### Fast Search
+ #### Search
+
+ The `search` and `searches` functions enable you to retrieve **up to 100** results for a single query. This is ideal for most use cases where you need to retrieve information quickly and efficiently.
+
+ - Use the `search` method when you need between **10 and 100** results for a single query.
+ - The same applies to the `searches` and `.similar()` methods.

- Retrieve up to 100 results with optional filters:
+ - A search will return a set of `Result` objects.
+ - The `Result` object represents a single search result and provides methods to access the result's properties.
+ - `url`: The URL of the search result.
+ - `title`: The title of the search result.
+ - `description`: A brief description or summary of the search result.
+ - `netloc`: The network location (domain) of the URL.
+ - `published`: The publication date of the search result.
+ - `visited`: The date and time when the result was visited.
+ - `author`: The author of the content.
+ - `content`: The main content or body of the search result.
+ - `language`: The language code of the content (e.g., 'en' for English).
+ - `similarity`: Similarity score with respect to a query or reference.
+
+ They can be accessed directly from the `Result` object: `print(result.title)` or
+ `print(result["title"])`
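The dual attribute/key access described above can be illustrated with a tiny stand-in class. This is a hypothetical sketch (`MiniResult` is not the real `Result` implementation, which carries more fields and logic):

```python
class MiniResult:
    """Toy stand-in showing attribute- and key-style access (not the real Result)."""
    def __init__(self, **fields):
        self.__dict__.update(fields)

    def __getitem__(self, key):
        # Key access simply delegates to the attribute namespace.
        return self.__dict__[key]


r = MiniResult(title="Example headline", url="https://example.com", similarity=0.87)
print(r.title)     # Example headline
print(r["title"])  # Example headline
```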

  ```python
  from nosible import Nosible
@@ -169,9 +189,44 @@ with Nosible(
  print([r.title for r in results])
  ```

+ #### Expansions
+
+ **Prompt expansions** are questions **lexically** and **semantically similar** to your main question. Expansions are added alongside your search query to improve your search results. You can add up to 10 expansions per search.
+
+ - You can add your **own expansions** by passing a list of strings to the `expansions` parameter.
+ - You can also have expansions generated automatically by setting `autogenerate_expansions` to `True` when running the search.
+ - For expansions to be generated, you will need the `LLM_API_KEY` to be set in the environment or passed to the `Nosible` constructor.
+ - By default, we use OpenRouter as the endpoint. However, **we support any OpenAI-compatible endpoint**. If you
+ want to use a different endpoint, follow [this](https://nosible-py.readthedocs.io/en/latest/configuration.html#change-llm-base-url) guide in the docs.
+ - You can change the model used with the **expansions_model** argument.
+
+ ```python
+ # Example of using your own expansions
+ with Nosible() as nos:
+ results = nos.search(
+ question="How have the Trump tariffs impacted the US economy?",
+ expansions=[
+ "What are the consequences of Trump's 2018 steel and aluminum tariffs on American manufacturers?",
+ "How did Donald Trump's tariffs on Chinese imports influence US import prices and inflation?",
+ "What impact did the Section 232 tariffs under President Trump have on US agricultural exports?",
+ "In what ways have Trump's trade duties affected employment levels in the US automotive sector?",
+ "How have the tariffs imposed by the Trump administration altered American consumer goods pricing nationwide?",
+ "What economic outcomes resulted from President Trump's protective tariffs for the United States economy?",
+ "How did Trump's solar panel tariffs change investment trends in the US energy market?",
+ "What have been the financial effects of Trump's Section 301 tariffs on Chinese electronics imports?",
+ "How did Trump's trade barriers influence GDP growth and trade deficits in the United States?",
+ "In what manner did Donald Trump's import taxes reshape competitiveness of US steel producers globally?",
+ ],
+ n_results=10,
+ )
+
+ print(results)
+ ```
+
  #### Parallel Searches

- Run multiple queries concurrently:
+ Runs multiple searches concurrently and yields the results as they come in.
+ - You can pass a list of questions to the `searches` method.

  ```python
  from nosible import Nosible
@@ -190,7 +245,12 @@ with Nosible(nosible_api_key="basic|abcd1234...", llm_api_key="sk-...") as clien

  #### Bulk Search

- Fetch thousands of results for offline analysis:
+ Bulk search enables you to retrieve a large number of results in a single request, making it ideal for large-scale data analysis and processing.
+
+ - Use the `bulk_search` method when you need more than 1,000 results for a single query.
+ - You can request between **1,000 and 10,000** results per query.
+ - All parameters available in the standard `search` method—such as `expansions`, `include_companies`, `include_languages`, and more—are also supported in `bulk_search`.
+ - A bulk search for 10,000 results typically completes in about 30 seconds or less.

  ```python
  from nosible import Nosible
@@ -244,9 +304,14 @@ with Nosible(nosible_api_key="basic|abcd1234...") as client:
  print([r for r in results])
  ```

- #### Sentiment Analysis
+ #### Sentiment

- Compute sentiment for a single result (uses GPT-4o; requires an LLM API key):
+ This fetches a sentiment score for each search result.
+ - The sentiment score is a float between `-1` and `1`, where `-1` is **negative**, `0` is **neutral**, and `1` is **positive**.
+ - The sentiment model can be changed by passing the `sentiment_model` parameter to the `Nosible` constructor.
+ - The `sentiment_model` defaults to "openai/gpt-4o", which is a powerful model for sentiment analysis.
+ - You can also change the base URL for the LLM API by passing the `openai_base_url` parameter to the `Nosible` constructor.
+ - The `openai_base_url` defaults to OpenRouter's API endpoint.
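The score range described above maps naturally onto labels. `label_sentiment` below is a hypothetical helper for interpreting the returned float; it is not part of the client, and the `threshold` cutoff is an arbitrary illustrative choice:

```python
def label_sentiment(score: float, threshold: float = 0.33) -> str:
    # Scores are floats in [-1, 1]; bucket them into three labels.
    if not -1.0 <= score <= 1.0:
        raise ValueError("Sentiment scores are expected to lie in [-1, 1].")
    if score <= -threshold:
        return "negative"
    if score >= threshold:
        return "positive"
    return "neutral"


print(label_sentiment(-0.8))  # negative
print(label_sentiment(0.1))   # neutral
```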

  ```python
  from nosible import Nosible
@@ -17,7 +17,6 @@ src/nosible/classes/snippet.py
  src/nosible/classes/snippet_set.py
  src/nosible/classes/web_page.py
  src/nosible/utils/json_tools.py
- src/nosible/utils/question_builder.py
  src/nosible/utils/rate_limiter.py
  tests/test_01_nosible.py
  tests/test_02_results.py
@@ -1,4 +1,3 @@
- requests
  polars
  duckdb
  openai
@@ -6,5 +5,5 @@ tantivy
  pyrate-limiter
  tenacity
  cryptography
- pandas
  pyarrow
+ pandas
@@ -1,6 +1,5 @@
- import pandas as pd
  import pytest
-
+ from polars.dependencies import pandas as pd
  from nosible import Result, ResultSet


@@ -127,3 +126,26 @@ def test_resultset_to_pandas(search_data):
  assert "netloc" in df.columns
  assert "published" in df.columns
  assert "similarity" in df.columns
+
+
+ def test_resultset_getitem(search_data):
+ """
+ Test the __getitem__ method of ResultSet.
+
+ This test checks if the ResultSet can be indexed with an integer or a slice,
+ and if it raises an IndexError for out-of-range indices.
+
+ Raises
+ ------
+ TypeError
+ If the key is not an integer or a slice.
+ IndexError
+ If the index is out of range.
+ """
+ assert isinstance(search_data[0], Result)
+ assert isinstance(search_data[1:3], ResultSet)
+
+ with pytest.raises(IndexError):
+ _ = search_data[len(search_data)] # Out of range index
+ with pytest.raises(TypeError):
+ _ = search_data["invalid"] # Invalid type for index
@@ -1,4 +1,3 @@
- import pandas as pd
  from nosible import Search, SearchSet
  import pytest

@@ -1,4 +1,3 @@
- import pandas as pd
  from nosible import Snippet, SnippetSet, WebPageData
  import pytest

@@ -1,131 +0,0 @@
- import random
-
- COMPANIES = [
- "Apple Inc.",
- "Microsoft Corporation",
- "Amazon.com, Inc.",
- "Alphabet Inc.",
- "Meta Platforms, Inc.",
- "Tesla, Inc.",
- "Berkshire Hathaway Inc.",
- "NVIDIA Corporation",
- "JPMorgan Chase & Co.",
- "Johnson & Johnson",
- "Walmart Inc.",
- "Visa Inc.",
- "Mastercard Incorporated",
- "Procter & Gamble Co.",
- "UnitedHealth Group Incorporated",
- "Bank of America Corporation",
- "Home Depot, Inc.",
- "Nestlé S.A.",
- "Samsung Electronics Co., Ltd.",
- "LVMH Moët Hennessy – Louis Vuitton",
- "ASML Holding N.V.",
- "Exxon Mobil Corporation",
- "Intel Corporation",
- "Pfizer Inc.",
- "The Coca-Cola Company",
- "PepsiCo, Inc.",
- "Chevron Corporation",
- "Merck & Co., Inc.",
- "Novartis International AG",
- "Toyota Motor Corporation",
- "Oracle Corporation",
- "Cisco Systems, Inc.",
- "Adobe Inc.",
- "Salesforce, Inc.",
- "Netflix, Inc.",
- "International Business Machines Corporation (IBM)",
- "The Walt Disney Company",
- "HSBC Holdings plc",
- "McDonald's Corporation",
- "Nike, Inc.",
- "Qualcomm Incorporated",
- "Roche Holding AG",
- "SAP SE",
- "Abbott Laboratories",
- "Costco Wholesale Corporation",
- "Broadcom Inc.",
- "Accenture plc",
- "Chevron Corporation",
- "Texas Instruments Incorporated",
- "Unilever PLC"
- ]
-
- THINGS_TO_KNOW = [
- "Company name and branding",
- "Founding date and history",
- "Founders and key executives",
- "Headquarters and global offices",
- "Mission, vision, and values",
- "Core products and services",
- "Business model and revenue streams",
- "Annual revenue and growth rate",
- "Profit margins (gross, operating, net)",
- "Market capitalization and valuation",
- "Key financial ratios (P/E, ROE, ROI)",
- "Stock price history and recent performance",
- "Major investors and shareholder structure",
- "Recent mergers, acquisitions, or divestitures",
- "R&D spending and innovation pipeline",
- "Competitive landscape and main rivals",
- "Market share by region or segment",
- "Customer segments and target markets",
- "Pricing strategy and positioning",
- "Supply chain structure and partners",
- "Distribution channels and logistics",
- "Marketing and advertising strategies",
- "Brand perception and reputation",
- "ESG (Environmental, Social, Governance) scores",
- "Sustainability initiatives and impact",
- "Corporate culture and employee count",
- "Employee satisfaction and turnover rates",
- "Leadership and governance practices",
- "Regulatory environment and compliance record",
- "Key risks and litigation history",
- "Recent news, press releases, and media coverage",
- "Patents, trademarks, and IP portfolio",
- "Digital transformation and tech stack",
- "Website traffic and social media metrics",
- "Mobile app usage and customer reviews",
- "Partnerships and strategic alliances",
- "Future outlook and analyst recommendations",
- ]
-
- JOB_TITLES = [
- "Chief Executive Officer (CEO)",
- "Chief Financial Officer (CFO)",
- "Chief Operating Officer (COO)",
- "Chief Technology Officer (CTO)",
- "Chief Marketing Officer (CMO)",
- "Head of Investor Relations",
- "Director of Corporate Strategy",
- "Business Development Manager",
- "Product Manager",
- "Marketing Manager",
- "Brand Manager",
- "Financial Analyst",
- "Equity Research Analyst",
- "Market Research Analyst",
- "Consultant",
- "Venture Capital Associate",
- "Private Equity Associate",
- "Operations Manager",
- "Supply Chain Manager",
- "Human Resources Manager",
- "Sustainability Officer",
- "Compliance Officer",
- "Legal Counsel",
- "Risk Manager",
- "Data Analyst",
- "IT Manager",
- "Sales Director",
- "Account Manager",
- "Customer Success Manager",
- "Public Relations Manager"
- ]
-
- def _get_question():
- return (f"I am a {random.choice(JOB_TITLES)} and I want to know {random.choice(THINGS_TO_KNOW)}"
- f"about {random.choice(COMPANIES)}")