cohere-compass-sdk 1.3.2__tar.gz

@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2025 Cohere
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
@@ -0,0 +1,354 @@
1
+ Metadata-Version: 2.1
2
+ Name: cohere-compass-sdk
3
+ Version: 1.3.2
4
+ Summary: Cohere Compass SDK
5
+ Requires-Python: >=3.9,<4.0
6
+ Classifier: Programming Language :: Python :: 3
7
+ Classifier: Programming Language :: Python :: 3.9
8
+ Classifier: Programming Language :: Python :: 3.10
9
+ Classifier: Programming Language :: Python :: 3.11
10
+ Requires-Dist: Deprecated (>=1.2.18,<2.0.0)
11
+ Requires-Dist: fsspec (>=2024.6.1)
12
+ Requires-Dist: joblib (==1.4.2)
13
+ Requires-Dist: pydantic (>=2.6.3)
14
+ Requires-Dist: requests (>=2.25.0,<3.0.0)
15
+ Requires-Dist: tenacity (>=8.2.3,<9.0.0)
16
+ Description-Content-Type: text/markdown
17
+
18
+ # Cohere Compass SDK
19
+
20
+ [![Checked with pyright](https://microsoft.github.io/pyright/img/pyright_badge.svg)](https://microsoft.github.io/pyright/)
21
+
22
+ The Compass SDK is a Python library that allows you to parse documents and insert them
23
+ into a Compass index.
24
+
25
+ In order to parse documents, the Compass SDK relies on the Compass Parser API, which is
26
+ a RESTful API that receives files and returns parsed documents. This requires a hosted
27
+ Compass server.
28
+
29
+ The Compass SDK provides a `CompassParserClient` that lets you interact with the parser
30
+ API from your Python code in a convenient manner. The `CompassParserClient` provides
31
+ methods to parse single and multiple files, as well as entire folders, and supports
32
+ multiple file types (e.g., `pdf`, `docx`, `json`, `csv`, etc.) as well as different file
33
+ systems (e.g., local, S3, GCS, etc.).
34
+
35
+ To insert parsed documents into a `Compass` index, the Compass SDK provides a
36
+ `CompassClient` class that lets you interact with a Compass API server. The Compass API
37
+ is also a RESTful API that lets you create, delete, and search documents in a Compass
38
+ index. To install a Compass API service, please refer to the [Compass
39
+ documentation](https://github.com/cohere-ai/compass).
40
+
41
+ ## Table of Contents
42
+
43
+ <!--
44
+ Do NOT remove the line below; it is used by markdown-toc to automatically generate the
45
+ Table of Contents.
46
+
47
+ To update the Table Of Contents, execute the following command in the repo root dir:
48
+
49
+ markdown-toc -i README.md
50
+
51
+ If you don't have the markdown-toc tool, you can install it with:
52
+
53
+ npm i -g markdown-toc # use sudo if you use a system-wide node installation.
54
+ -->
55
+
56
+ <!-- toc -->
57
+
58
+ - [Getting Started](#getting-started)
59
+ - [Local Development](#local-development)
60
+   - [Create Python Virtual Environment](#create-python-virtual-environment)
61
+   - [Running Tests Locally](#running-tests-locally)
62
+     - [VSCode Users](#vscode-users)
63
+   - [Pre-commit](#pre-commit)
64
+
65
+ <!-- tocstop -->
66
+
67
+ ## Getting Started
68
+
69
+ Fill in your Compass URL, parser URL, bearer token (if any), and path to test data below
70
+ for an end-to-end run of parsing and searching.
71
+
72
+ ### Installation
73
+
74
+ ```bash
75
+ pip install git+https://github.com/cohere-ai/cohere-compass-sdk.git
76
+ ```
77
+
78
+ ```python
79
+ from cohere_compass.clients.compass import CompassClient
80
+ from cohere_compass.clients.parser import CompassParserClient
81
+ from cohere_compass.models.config import MetadataStrategy, MetadataConfig
82
+
83
+ api_url = "<COMPASS_URL>"
84
+ parser_url = "<PARSER URL>"
85
+ bearer_token = "<PASS BEARER TOKEN IF ANY OTHERWISE LEAVE IT BLANK>"
86
+
87
+ index = "test-index"
88
+ data_to_index = "<PATH_TO_TEST_DATA>"
89
+
90
+ # Parse the files before indexing
91
+ parsing_client = CompassParserClient(parser_url = parser_url)
92
+ metadata_config = MetadataConfig(
93
+     metadata_strategy=MetadataStrategy.No_Metadata,
94
+     commandr_extractable_attributes=["date", "link", "page_title", "authors"]
95
+ )
96
+
97
+ docs_to_index = parsing_client.process_folder(folder_path=data_to_index, metadata_config=metadata_config, recursive=True)
98
+
99
+ # Create index and insert files
100
+ compass_client = CompassClient(index_url=api_url, bearer_token=bearer_token)
101
+ compass_client.create_index(index_name=index)
102
+ results = compass_client.insert_docs(index_name=index, docs=docs_to_index)
103
+
104
+ result = compass_client.search_chunks(index_name=index, query="test", top_k=1)
105
+ print(f"Results preview: \n {result.hits} ... \n \n ")
106
+ ```
107
+
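Hard-coding the URL and token placeholders is fine for a first run. As a minimal sketch of an alternative, the values could be read from environment variables. The variable names (`COMPASS_API_URL`, `COMPASS_PARSER_URL`, `COMPASS_BEARER_TOKEN`) and the `CompassSettings` helper are hypothetical and not part of the SDK.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class CompassSettings:
    """Connection settings for the example above (hypothetical helper, not part of the SDK)."""

    api_url: str
    parser_url: str
    bearer_token: str = ""  # blank means "no token", mirroring the placeholder text


def load_settings(env: Mapping[str, str] = os.environ) -> CompassSettings:
    """Read settings from environment variables; the variable names are assumptions."""
    required = ("COMPASS_API_URL", "COMPASS_PARSER_URL")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise ValueError(f"Missing required environment variables: {', '.join(missing)}")
    return CompassSettings(
        api_url=env["COMPASS_API_URL"],
        parser_url=env["COMPASS_PARSER_URL"],
        bearer_token=env.get("COMPASS_BEARER_TOKEN", ""),
    )
```

The resulting fields could then stand in for `api_url`, `parser_url`, and `bearer_token` in the example above.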
108
+ ### Adding filters to documents
109
+
110
+ #### Adding filter via dict
111
+
112
+ ```python
113
+ from cohere_compass.clients.compass import CompassClient
114
+ from cohere_compass.clients.parser import CompassParserClient
115
+ from cohere_compass.models.search import SearchFilter
116
+
117
+ api_url = "<COMPASS_URL>"
118
+ parser_url = "<PARSER URL>"
119
+ data_to_index = "<PATH_TO_TEST_DATA>"
120
+ index = "test-index"
121
+ bearer_token = "<PASS BEARER TOKEN IF ANY OTHERWISE LEAVE IT BLANK>"
122
+
123
+ parsing_client = CompassParserClient(parser_url = parser_url)
124
+ custom_context_dict = {
125
+     "doc_purpose": "demo"
126
+ }
127
+
128
+ docs_to_index = parsing_client.process_folder(folder_path=data_to_index, recursive=True, custom_context=custom_context_dict)
129
+
130
+ compass_client = CompassClient(index_url=api_url, bearer_token=bearer_token)
131
+ filter = SearchFilter(type=SearchFilter.FilterType.EQ, field="content.doc_purpose", value="demo")
132
+ result = compass_client.search_chunks(index_name=index, query="*", filters=[filter])
133
+ print(f"Results preview: \n {result.hits} ... \n \n ")
134
+ ```
135
+
136
+ #### Adding filter via function
137
+
138
+ ```python
139
+ from cohere_compass.clients.compass import CompassClient
140
+ from cohere_compass.clients.parser import CompassParserClient
141
+ from cohere_compass.models.search import SearchFilter
142
+ from cohere_compass.models.documents import CompassDocument
143
+
144
+ api_url = "<COMPASS_URL>"
145
+ parser_url = "<PARSER URL>"
146
+ data_to_index = "<PATH_TO_TEST_DATA>"
147
+ index = "test-index"
148
+ bearer_token = "<PASS BEARER TOKEN IF ANY OTHERWISE LEAVE IT BLANK>"
149
+
150
+ parsing_client = CompassParserClient(parser_url = parser_url)
151
+
152
+ def custom_context_fn(input: CompassDocument):
153
+     content = input.content
154
+     if len(input.chunks) > 2:
155
+         content["new_doc_field"] = "more_than_two_chunks"
156
+     else:
157
+         content["new_doc_field"] = "less_than_two_chunks"
158
+     return content
159
+
160
+
161
+ docs_to_index = parsing_client.process_folder(folder_path=data_to_index, recursive=True, custom_context=custom_context_fn)
162
+
163
+ compass_client = CompassClient(index_url=api_url, bearer_token=bearer_token)
164
+ filter = SearchFilter(type=SearchFilter.FilterType.EQ, field="content.new_doc_field", value="less_than_two_chunks")
165
+ result = compass_client.search_chunks(index_name=index, query="*", filters=[filter])
166
+ print(f"Results preview: \n {result.hits} ... \n \n ")
167
+ ```
168
+
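To see which value the function above assigns without a running parser, the same branching can be exercised against a stand-in document. This is only a sketch: `StubDocument` is a hypothetical class that mimics the two attributes used here (`content` and `chunks`) and is not the SDK's `CompassDocument`.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class StubDocument:
    """Hypothetical stand-in exposing only the attributes the function reads."""

    content: Dict[str, Any] = field(default_factory=dict)
    chunks: List[str] = field(default_factory=list)


def custom_context_fn(doc: StubDocument) -> Dict[str, Any]:
    # Same branching as the README function, applied to a copy of the content.
    content = dict(doc.content)
    if len(doc.chunks) > 2:
        content["new_doc_field"] = "more_than_two_chunks"
    else:
        # Documents with exactly two chunks also take this branch.
        content["new_doc_field"] = "less_than_two_chunks"
    return content


two = custom_context_fn(StubDocument(chunks=["a", "b"]))
three = custom_context_fn(StubDocument(chunks=["a", "b", "c"]))
print(two["new_doc_field"])    # less_than_two_chunks
print(three["new_doc_field"])  # more_than_two_chunks
```

Note that a document with exactly two chunks receives the `less_than_two_chunks` value, so the filter in the example above matches documents with two or fewer chunks.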
169
+ ### RBAC
170
+
171
+ ```python
172
+ from cohere_compass.clients.access_control import CompassRootClient
173
+ from cohere_compass.models.access_control import Group, Permission, Policy, Role, User
174
+ from requests.exceptions import HTTPError
175
+
176
+ ROOT_BEARER_TOKEN = "<ROOT_BEARER_TOKEN>"
177
+ API_URL = "<API_URL>"
178
+ client = CompassRootClient(API_URL, ROOT_BEARER_TOKEN)
179
+
180
+ user = User(user_name="<USER_NAME>")
181
+ group = Group(group_name="<GROUP_NAME>")
182
+ role = Role(role_name="<ROLE_NAME>")
183
+ indexes = ["<ALLOWED_INDEX or REGEX>"]
184
+ permission = Permission.WRITE # or Permission.READ
185
+
186
+ try:
187
+     # Create Users
188
+     users = client.create_users([user])
189
+
190
+     # Create Groups
191
+     groups = client.create_groups([group])
192
+
193
+     # Add Users to a Group
194
+     memberships = client.add_members_to_group(group.group_name, [user.user_name])
195
+
196
+     # Add Policies and Create a Role
197
+     role.policies = [
198
+         Policy(permission=Permission.READ, indexes=indexes),
199
+     ]
200
+     roles = client.create_roles([role])
201
+
202
+     # Update Role Policies
203
+     role.policies = [
204
+         Policy(permission=Permission.READ, indexes=indexes),
205
+         Policy(permission=Permission.WRITE, indexes=indexes),
206
+     ]
207
+     role = client.update_role(role)
208
+
209
+     # Assign Roles to a Group
210
+     role_assignments = client.add_roles_to_group(group.group_name, [role.role_name])
211
+
212
+     # Token for each created user to access the indexes
213
+     USER_TO_TOKENS = {u.user_name: u.token for u in users}
214
+ except HTTPError as e:
215
+     if e.response.status_code == 409:
216
+         print("An entity already exists", e.response.json())
217
+ ```
218
+
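The example above chains users, groups, roles, and policies. As a conceptual sketch only, the relationship can be modelled in plain Python. This is not the Compass server's implementation. The class names are hypothetical stand-ins, and two behaviours are assumptions made for illustration: index patterns are matched with `re.fullmatch`, and `WRITE` does not imply `READ`.

```python
import re
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Set


class StubPermission(Enum):
    READ = "read"
    WRITE = "write"


@dataclass
class StubPolicy:
    permission: StubPermission
    indexes: List[str]  # exact index names or regular expressions


@dataclass
class RbacModel:
    """Toy model: users belong to groups, groups hold roles, roles hold policies."""

    group_members: Dict[str, Set[str]] = field(default_factory=dict)
    group_roles: Dict[str, Set[str]] = field(default_factory=dict)
    role_policies: Dict[str, List[StubPolicy]] = field(default_factory=dict)

    def is_allowed(self, user: str, permission: StubPermission, index: str) -> bool:
        for group, members in self.group_members.items():
            if user not in members:
                continue
            for role in self.group_roles.get(group, set()):
                for policy in self.role_policies.get(role, []):
                    # Assumption: a pattern must match the whole index name.
                    if policy.permission is permission and any(
                        re.fullmatch(pattern, index) for pattern in policy.indexes
                    ):
                        return True
        return False


rbac = RbacModel(
    group_members={"analysts": {"alice"}},
    group_roles={"analysts": {"reader"}},
    role_policies={"reader": [StubPolicy(StubPermission.READ, ["test-.*"])]},
)
print(rbac.is_allowed("alice", StubPermission.READ, "test-index"))   # True
print(rbac.is_allowed("alice", StubPermission.WRITE, "test-index"))  # False
print(rbac.is_allowed("bob", StubPermission.READ, "test-index"))     # False
```

In this model a user gains a permission only through a group that carries a role with a matching policy, which mirrors the order of the calls in the example above.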
219
+ ### Reading RBAC Information
220
+
221
+ ```python
222
+ from cohere_compass.clients.access_control import CompassRootClient
223
+ from cohere_compass.models.access_control import Group, Role, User, PageDirection
224
+ from requests.exceptions import HTTPError
225
+
226
+ ROOT_BEARER_TOKEN = "<ROOT_BEARER_TOKEN>"
227
+ API_URL = "<API_URL>"
228
+ client = CompassRootClient(API_URL, ROOT_BEARER_TOKEN)
229
+
230
+ user = User(user_name="<USER_NAME>")
231
+ group = Group(group_name="<GROUP_NAME>")
232
+ role = Role(role_name="<ROLE_NAME>")
233
+
234
+ # List all Users in the RBAC system
235
+ # First page
236
+ user_page = client.get_users_page()
237
+ # Subsequent pages
238
+ user_page = client.get_users_page(page_info=user_page.page_info, direction=PageDirection.NEXT)
239
+
240
+ # List all Groups in the RBAC system
241
+ # First page
242
+ group_page = client.get_groups_page()
243
+ # Subsequent pages
244
+ group_page = client.get_groups_page(page_info=group_page.page_info, direction=PageDirection.NEXT)
245
+
246
+ # List all Roles in the RBAC system
247
+ # First page
248
+ role_page = client.get_roles_page()
249
+ # Subsequent pages
250
+ role_page = client.get_roles_page(page_info=role_page.page_info, direction=PageDirection.NEXT)
251
+
252
+ # Get the Group Details (all data + first page each of Users who are Members and Roles Assigned)
253
+ detailed_group = client.get_detailed_group(group.group_name)
254
+
255
+ # Get pages of Group's User Memberships
256
+ # First page
257
+ memberships = client.get_group_members_page(group.group_name)
258
+ # Subsequent pages (can use the users_page_info from details)
259
+ memberships = client.get_group_members_page(group.group_name, page_info=memberships.page_info, direction=PageDirection.NEXT)
260
+
261
+ # Get pages of Group's Roles Assignments
262
+ # First page
263
+ role_assignments = client.get_group_roles_page(group.group_name)
264
+ # Subsequent pages (can use the role_page_info from details)
265
+ role_assignments = client.get_group_roles_page(group.group_name, page_info=role_assignments.page_info, direction=PageDirection.NEXT)
266
+
267
+ # Get the User Details (all data + first page of Groups that the User is a Member of)
268
+ detailed_user = client.get_detailed_user(user.user_name)
269
+
270
+ # Get pages of User's Group Memberships
271
+ # First page
272
+ group_memberships = client.get_user_groups_page(user.user_name)
273
+ # Subsequent pages (can use the group_page_info from details)
274
+ group_memberships = client.get_user_groups_page(user.user_name, page_info=group_memberships.page_info, direction=PageDirection.NEXT)
275
+
276
+ # Get the Role Details (all data + first page of Groups the Role is Assigned to)
277
+ detailed_role = client.get_detailed_role(role.role_name)
278
+
279
+ # Get pages of Role's Group Assignments
280
+ group_assignments = client.get_role_groups_page(role.role_name)
281
+ # Subsequent pages (can use the group_page_info from details)
282
+ group_assignments = client.get_role_groups_page(role.role_name, page_info=group_assignments.page_info, direction=PageDirection.NEXT)
283
+
284
+ # Filtering any Page type query, exemplified on Users Page, but works with all.
285
+ user_page = client.get_users_page(filter="<SOME_NAME_OR_NAME_PARTIAL>")
286
+ ```
287
+
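The calls above fetch a first page and then one subsequent page. A generic loop over all pages might look like the sketch below. The README does not show how the SDK signals the last page, so `has_next` is a caller-supplied predicate here, and `StubPage`, `DATA`, and `fetch` are hypothetical stand-ins used only to make the sketch runnable.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, List, Optional, TypeVar

P = TypeVar("P")


def iter_pages(
    fetch_first: Callable[[], P],
    fetch_next: Callable[[P], P],
    has_next: Callable[[P], bool],
) -> Iterator[P]:
    """Yield the first page, then follow pages while has_next(page) is true."""
    page = fetch_first()
    yield page
    while has_next(page):
        page = fetch_next(page)
        yield page


@dataclass
class StubPage:
    items: List[str]
    next_offset: Optional[int]  # None marks the last page (an assumption of this stub)


DATA = ["alice", "bob", "carol", "dave", "erin"]


def fetch(offset: int, size: int = 2) -> StubPage:
    """Return a slice of DATA as a page, with the offset of the following page."""
    end = offset + size
    return StubPage(DATA[offset:end], end if end < len(DATA) else None)


pages = iter_pages(
    fetch_first=lambda: fetch(0),
    fetch_next=lambda page: fetch(page.next_offset),
    has_next=lambda page: page.next_offset is not None,
)
all_users = [name for page in pages for name in page.items]
print(all_users)  # ['alice', 'bob', 'carol', 'dave', 'erin']
```

With the SDK, `fetch_first` could wrap `client.get_users_page()` and `fetch_next` could wrap `client.get_users_page(page_info=page.page_info, direction=PageDirection.NEXT)`. The end-of-results check would need to come from the SDK's page type.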
288
+ ### Deleting RBAC
289
+
290
+ ```python
291
+ from cohere_compass.clients.access_control import CompassRootClient
292
+ from cohere_compass.models.access_control import Group, Role, User
293
+
294
+ ROOT_BEARER_TOKEN = "<ROOT_BEARER_TOKEN>"
295
+ API_URL = "<API_URL>"
296
+ client = CompassRootClient(API_URL, ROOT_BEARER_TOKEN)
297
+
298
+ user = User(user_name="<USER_NAME>")
299
+ group = Group(group_name="<GROUP_NAME>")
300
+ role = Role(role_name="<ROLE_NAME>")
301
+
302
+ # removing Roles from a Group
303
+ removed_roles = client.remove_roles_from_group(group.group_name, [role.role_name])
304
+
305
+ # removing Members from a Group
306
+ removed_members = client.remove_members_from_group(group.group_name, [user.user_name])
307
+
308
+ # deleting Roles
309
+ deleted_roles = client.delete_roles([role.role_name])
310
+
311
+ # deleting Groups
312
+ deleted_groups = client.delete_groups([group.group_name])
313
+
314
+ # deleting Users
315
+ deleted_users = client.delete_users([user.user_name])
316
+ ```
317
+
318
+ ## Local Development
319
+
320
+ ### Create Python Virtual Environment
321
+
322
+ We use Poetry to manage our Python environment. To create the virtual environment use
323
+ the following command:
324
+
325
+ ```bash
326
+ poetry install
327
+ ```
328
+
329
+ ### Running Tests Locally
330
+
331
+ We use `pytest` for testing. Run the tests with the following command:
332
+
333
+ ```bash
334
+ poetry run python -m pytest
335
+ ```
336
+
337
+ #### VSCode Users
338
+
339
+ We provide a `.vscode` folder for developers who prefer VSCode. Open the repository
340
+ folder in VSCode and it should pick up our settings.
341
+
342
+ ### Pre-commit
343
+
344
+ We love and appreciate Coding Standards and so we enforce them in our code base.
345
+ However, without automation, enforcing Coding Standards usually results in a lot of
346
+ frustration for developers when they publish Pull Requests and our linters complain. So,
347
+ we automate our formatting and linting with [pre-commit](https://pre-commit.com/). All
348
+ you need to do is install our `pre-commit` hook so the code gets formatted automatically
349
+ when you commit your changes locally:
350
+
351
+ ```bash
352
+ pip install pre-commit
353
+ ```
354
+