cmem-plugin-pgvector 0.5.0__tar.gz → 0.6.0__tar.gz

@@ -1,7 +1,7 @@
1
1
  Metadata-Version: 2.3
2
2
  Name: cmem-plugin-pgvector
3
- Version: 0.5.0
4
- Summary: Store embedding vectors into a Postgres vector store.
3
+ Version: 0.6.0
4
+ Summary: Store and search for embedding vectors in a Postgres vector store.
5
5
  License: Apache-2.0
6
6
  Keywords: eccenca Corporate Memory,plugin
7
7
  Author: eccenca GmbH
@@ -24,43 +24,20 @@ Description-Content-Type: text/markdown
24
24
 
25
25
  # cmem-plugin-pgvector
26
26
 
27
- [![poetry][poetry-shield]][poetry-link] [![ruff][ruff-shield]][ruff-link] [![mypy][mypy-shield]][mypy-link] [![copier][copier-shield]][copier]
28
-
29
- Store embedding vectors into a Postgres vector store.
30
-
31
- This plugin consumes the costumable entity's paths ```embedding```, ```text``` and ```metadata``` as following:
32
-
33
- - The text path contain the text used to generate the embeddings, default ```text```.
34
- - The embedding path contain the embedding representation of the text, default ```embedding```.
35
- - The metadata path contain the information that will be associated with the embedding, default all paths.
27
+ Store and search for embedding vectors in a Postgres vector store.
36
28
 
37
29
  [![eccenca Corporate Memory][cmem-shield]][cmem-link]
38
30
 
39
- ## Use
40
-
41
- Interact with Large Language Models.
42
-
43
31
  This is a plugin for [eccenca](https://eccenca.com) [Corporate Memory](https://documentation.eccenca.com).
44
32
 
45
- You can install it with the [cmemc](https://eccenca.com/go/cmemc) command line
46
- clients like this:
33
+ You can install it with the [cmemc](https://eccenca.com/go/cmemc) command line client like this:
47
34
 
48
- ```
35
+ ``` bash
49
36
  cmemc admin workspace python install cmem-plugin-pgvector
50
37
  ```
51
38
 
52
- ### Parameters
53
-
54
- - ```collection_name```: The name of the collection where the embeddings are going to be stored, default ```my_collection```
55
- - ```user```:the database user
56
- - ```password```: the database password
57
- - ```host```: the databse host, i.e. locahost
58
- - ```port```: the database port, default ```5432```
59
- - ```database```: the name of the database
60
- - ```pre_delete_collection```: boolean parameter indicating if the collection should be cleanse before insertion, default ```false```
61
- - ```embedding_path```: output path that will contain the generated embedding, default ```embedding```
62
- - ```text_path```: path containing the text used for genereting the embedding, default ```text```
63
- - ```metadata_paths```: paths from the entity that will be stored along with the embedding, default all paths
39
+ [![pypi version](https://img.shields.io/pypi/v/cmem-plugin-pgvector)](https://pypi.org/project/cmem-plugin-pgvector) [![license](https://img.shields.io/pypi/l/cmem-plugin-pgvector)](https://pypi.org/project/cmem-plugin-pgvector)
40
+ [![poetry][poetry-shield]][poetry-link] [![ruff][ruff-shield]][ruff-link] [![mypy][mypy-shield]][mypy-link] [![copier][copier-shield]][copier]
64
41
 
65
42
  [cmem-link]: https://documentation.eccenca.com
66
43
  [cmem-shield]: https://img.shields.io/endpoint?url=https://dev.documentation.eccenca.com/badge.json
@@ -0,0 +1,28 @@
1
+ # cmem-plugin-pgvector
2
+
3
+ Store and search for embedding vectors in a Postgres vector store.
4
+
5
+ [![eccenca Corporate Memory][cmem-shield]][cmem-link]
6
+
7
+ This is a plugin for [eccenca](https://eccenca.com) [Corporate Memory](https://documentation.eccenca.com).
8
+
9
+ You can install it with the [cmemc](https://eccenca.com/go/cmemc) command line client like this:
10
+
11
+ ``` bash
12
+ cmemc admin workspace python install cmem-plugin-pgvector
13
+ ```
14
+
15
+ [![pypi version](https://img.shields.io/pypi/v/cmem-plugin-pgvector)](https://pypi.org/project/cmem-plugin-pgvector) [![license](https://img.shields.io/pypi/l/cmem-plugin-pgvector)](https://pypi.org/project/cmem-plugin-pgvector)
16
+ [![poetry][poetry-shield]][poetry-link] [![ruff][ruff-shield]][ruff-link] [![mypy][mypy-shield]][mypy-link] [![copier][copier-shield]][copier]
17
+
18
+ [cmem-link]: https://documentation.eccenca.com
19
+ [cmem-shield]: https://img.shields.io/endpoint?url=https://dev.documentation.eccenca.com/badge.json
20
+ [poetry-link]: https://python-poetry.org/
21
+ [poetry-shield]: https://img.shields.io/endpoint?url=https://python-poetry.org/badge/v0.json
22
+ [ruff-link]: https://docs.astral.sh/ruff/
23
+ [ruff-shield]: https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json&label=Code%20Style
24
+ [mypy-link]: https://mypy-lang.org/
25
+ [mypy-shield]: https://www.mypy-lang.org/static/mypy_badge.svg
26
+ [copier]: https://copier.readthedocs.io/
27
+ [copier-shield]: https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/copier-org/copier/master/img/badge/badge-grayscale-inverted-border-purple.json
28
+
@@ -0,0 +1,131 @@
1
+ """PGVector commons"""
2
+
3
+ from typing import Any, ClassVar
4
+
5
+ import psycopg
6
+ from cmem_plugin_base.dataintegration.context import (
7
+ PluginContext,
8
+ )
9
+ from cmem_plugin_base.dataintegration.description import PluginParameter
10
+ from cmem_plugin_base.dataintegration.parameter.password import PasswordParameterType
11
+ from cmem_plugin_base.dataintegration.types import (
12
+ Autocompletion,
13
+ IntParameterType,
14
+ StringParameterType,
15
+ )
16
+
17
+
18
+ def get_collection_names(
19
+ dbname: str, user: str, password: str, host: str = "localhost", port: int = 5432
20
+ ) -> list[str]:
21
+ """Return list of collection names"""
22
+ # Create a connection to the database
23
+ with (
24
+ psycopg.connect(dbname=dbname, user=user, password=password, host=host, port=port) as conn,
25
+ conn.cursor() as cursor,
26
+ ):
27
+ # Execute query
28
+ cursor.execute("SELECT name FROM public.langchain_pg_collection;")
29
+ return [row[0] for row in cursor.fetchall()] # Fetch all names
30
+
31
+
32
+ class PGVectorCollection(StringParameterType):
33
+ """PGVector Collection Type"""
34
+
35
+ autocompletion_depends_on_parameters: ClassVar[list[str]] = [
36
+ "host",
37
+ "port",
38
+ "database",
39
+ "user",
40
+ "password",
41
+ ]
42
+
43
+ # auto complete for values
44
+ allow_only_autocompleted_values: bool = True
45
+ # auto complete for labels
46
+ autocomplete_value_with_labels: bool = True
47
+
48
+ def autocomplete(
49
+ self,
50
+ query_terms: list[str],
51
+ depend_on_parameter_values: list[Any],
52
+ context: PluginContext,
53
+ ) -> list[Autocompletion]:
54
+ """Return all collections that match ANY of the provided query terms."""
55
+ _ = context
56
+ host = depend_on_parameter_values[0]
57
+ port = depend_on_parameter_values[1]
58
+ dbname = depend_on_parameter_values[2]
59
+ user = depend_on_parameter_values[3]
60
+ password = depend_on_parameter_values[4]
61
+ password = password if isinstance(password, str) else password.decrypt()
62
+ result = []
63
+ try:
64
+ collections = get_collection_names(
65
+ host=host, port=port, dbname=dbname, user=user, password=password
66
+ )
67
+ filtered_models = set()
68
+ if query_terms:
69
+ for term in query_terms:
70
+ for collection in collections:
71
+ if term in collection:
72
+ filtered_models.add(collection)
73
+ else:
74
+ filtered_models = set(collections)
75
+ result = [Autocompletion(value=f"{_}", label=f"{_}") for _ in filtered_models]
76
+ except Exception as error:
77
+ raise ValueError(
78
+ "Failed to retrieve collection names from Postgres. Please check the connection parameters."
79
+ ) from error
80
+ result.sort(key=lambda x: x.label)
81
+ return result
82
+
83
+
84
+ class DatabaseParams:
85
+ """Common Plugin parameters"""
86
+
87
+ host = PluginParameter(
88
+ name="host",
89
+ label="Database Host",
90
+ description="The hostname of the postgres database service.",
91
+ default_value="pgvector",
92
+ )
93
+ port = PluginParameter(
94
+ name="port",
95
+ label="Database Port",
96
+ param_type=IntParameterType(),
97
+ description="The port number of the postgres database service.",
98
+ default_value=5432,
99
+ )
100
+ user = PluginParameter(
101
+ name="user",
102
+ label="Database User",
103
+ description="The account name used to login to the postgres database service.",
104
+ default_value="pgvector",
105
+ )
106
+ password = PluginParameter(
107
+ name="password",
108
+ label="Database Password",
109
+ param_type=PasswordParameterType(),
110
+ description="The password of the database account.",
111
+ )
112
+ database = PluginParameter(
113
+ name="database",
114
+ label="Database Name",
115
+ description="The database name.",
116
+ default_value="pgvector",
117
+ )
118
+ collection_name = PluginParameter(
119
+ name="collection_name",
120
+ label="Collection Name",
121
+ description="The name of the collection that will be used for search.",
122
+ param_type=PGVectorCollection(),
123
+ )
124
+
125
+ def as_list(self) -> list[PluginParameter]:
126
+ """Provide all parameters as list"""
127
+ return [
128
+ getattr(self, attr)
129
+ for attr in dir(self)
130
+ if not callable(getattr(self, attr)) and not attr.startswith("__")
131
+ ]
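
Note on the `autocomplete` filter in this file: the nested loop adds a collection as soon as any single query term is a substring of its name, so the result is a union over terms rather than an intersection. The sketch below reproduces that behaviour outside the plugin so the difference is easy to see. `filter_collections` and its `match_all` flag are hypothetical names used only for illustration; they are not part of the package.

```python
def filter_collections(
    collections: list[str],
    query_terms: list[str],
    match_all: bool = False,
) -> list[str]:
    """Filter collection names by substring match on the query terms.

    Hypothetical helper: match_all=False mirrors the loop in the diff
    (a name is kept if ANY term matches), match_all=True keeps a name
    only if EVERY term matches.
    """
    if not query_terms:
        # No terms: offer every collection, as the plugin does.
        return sorted(set(collections))
    combine = all if match_all else any
    return sorted(
        {name for name in collections if combine(term in name for term in query_terms)}
    )


names = ["docs_en", "docs_de", "images"]
print(filter_collections(names, ["docs", "en"]))                  # any-term match
print(filter_collections(names, ["docs", "en"], match_all=True))  # all-term match
```

With the terms `docs` and `en`, the any-term variant keeps both `docs_*` collections, while the all-term variant keeps only `docs_en`.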
@@ -0,0 +1,221 @@
1
+ """Search Task"""
2
+
3
+ import json
4
+ from ast import literal_eval
5
+ from collections.abc import Generator, Sequence
6
+
7
+ from cmem_plugin_base.dataintegration.context import ExecutionContext, ExecutionReport
8
+ from cmem_plugin_base.dataintegration.description import Icon, Plugin, PluginParameter
9
+ from cmem_plugin_base.dataintegration.entity import Entities, Entity, EntityPath, EntitySchema
10
+ from cmem_plugin_base.dataintegration.parameter.password import Password
11
+ from cmem_plugin_base.dataintegration.plugins import WorkflowPlugin
12
+ from cmem_plugin_base.dataintegration.ports import (
13
+ FixedNumberOfInputs,
14
+ FixedSchemaPort,
15
+ )
16
+ from cmem_plugin_base.dataintegration.types import IntParameterType
17
+ from langchain_core.documents import Document
18
+ from langchain_postgres import PGVector
19
+
20
+ from cmem_plugin_pgvector.commons import DatabaseParams
21
+
22
+
23
+ @Plugin(
24
+ label="Search Vector Embeddings",
25
+ description="Search for top-k metadata stored in Postgres Vector Store (PGVector).",
26
+ documentation="""
27
+ This workflow task searches for the top-k metadata stored in the Postgres Vector Store.
28
+
29
+ The incoming embedding entities are used to retrieve the nearest top-k
30
+ vectors in the collection stored in the Postgres Vector Store.
31
+ It is possible to specify which path is used for searching as well as which Postgres
32
+ Vector Store and collection to query.
33
+
34
+ The task uses the embeddings from the path configured with the Embedding Query Path
35
+ parameter (`embedding_query_path`, default value: `_embedding`) to search over the collection.
36
+ The results are provided in the output path configured with the Search Result Path parameter
37
+ (`search_result_path`, default value: `_search_result`).
38
+
39
+ The results in this output are structured like this:
40
+
41
+ ``` json
42
+ [
43
+ {
44
+ "id": "...",
45
+ "metadata": "...",
46
+ "content": "...",
47
+ "score": "..."
48
+ },
49
+ ...
50
+ ]
51
+ ```
52
+ """,
53
+ icon=Icon(package=__package__, file_name="postgresql.svg"),
54
+ plugin_id="cmem_plugin_pgvector-Search",
55
+ parameters=[
56
+ *DatabaseParams().as_list(),
57
+ PluginParameter(
58
+ name="embedding_query_path",
59
+ label="Embedding Query Path",
60
+ description="""The path containing the embedding to be used for searching.""",
61
+ default_value="_embedding",
62
+ ),
63
+ PluginParameter(
64
+ name="search_result_path",
65
+ label="Search Result Path",
66
+ description="""The path containing the search result in the output entities.""",
67
+ default_value="_search_result",
68
+ ),
69
+ PluginParameter(
70
+ name="top_k",
71
+ label="Top-k",
72
+ description="The number of entries to be returned in the search result.",
73
+ default_value=10,
74
+ param_type=IntParameterType(),
75
+ ),
76
+ ],
77
+ )
78
+ class PGVectorSearchPlugin(WorkflowPlugin):
79
+ """PGVectorSearchPlugin: Enable the search of vectors in a Postgres Vector Store."""
80
+
81
+ connection_string: str
82
+ user: str
83
+ password: str
84
+ host: str
85
+ port: int
86
+ database: str
87
+ collection_name: str
88
+ embedding_query_path: str
89
+ inputs: Sequence[Entities]
90
+ db: PGVector
91
+ execution_context: ExecutionContext
92
+ report: ExecutionReport
93
+ search_result_path: str
94
+ top_k: int
95
+
96
+ def __init__( # noqa: PLR0913
97
+ self,
98
+ host: str = DatabaseParams.host.default_value,
99
+ port: int = DatabaseParams.port.default_value,
100
+ user: str = DatabaseParams.user.default_value,
101
+ password: Password | str = "",
102
+ database: str = DatabaseParams.database.default_value,
103
+ collection_name: str = DatabaseParams.collection_name.default_value,
104
+ search_result_path: str = "_search_result",
105
+ embedding_query_path: str = "_embedding",
106
+ top_k: int = 10,
107
+ ) -> None:
108
+ self.collection_name = collection_name
109
+ self.user = user
110
+ self.host = host
111
+ self.port = port
112
+ self.database = database
113
+ self.embedding_query_path = embedding_query_path
114
+ self.search_result_path = search_result_path
115
+ self.top_k = top_k
116
+
117
+ str_password = self.password = password if isinstance(password, str) else password.decrypt()
118
+ self.connection_string = (
119
+ f"postgresql+psycopg://{user}:{str_password}@{host}:{port}/{database}"
120
+ )
121
+
122
+ self.report = ExecutionReport()
123
+ self.report.operation = "search"
124
+ self.report.operation_desc = "searches"
125
+
126
+ self.db = PGVector(
127
+ collection_name=self.collection_name,
128
+ connection=self.connection_string,
129
+ embeddings=None, # type: ignore # noqa: PGH003
130
+ use_jsonb=True,
131
+ pre_delete_collection=False,
132
+ )
133
+ self._setup_ports()
134
+
135
+ def _setup_ports(self) -> None:
136
+ """Configure input and output ports depending on the configuration"""
137
+ input_paths = [EntityPath(path=self.embedding_query_path)]
138
+ input_schema = EntitySchema(type_uri="entity", paths=input_paths)
139
+ self.input_ports = FixedNumberOfInputs(ports=[FixedSchemaPort(schema=input_schema)])
140
+
141
+ output_schema = self._generate_output_schema(input_schema=input_schema)
142
+ self.output_port = FixedSchemaPort(schema=output_schema)
143
+
144
+ def _generate_output_schema(self, input_schema: EntitySchema) -> EntitySchema:
145
+ """Get output schema"""
146
+ paths = list(input_schema.paths).copy()
147
+ paths.append(EntityPath(self.search_result_path))
148
+ return EntitySchema(type_uri=input_schema.type_uri, paths=paths)
149
+
150
+ @staticmethod
151
+ def _entity_to_dict(paths: Sequence[EntityPath], entity: Entity) -> dict[str, list[str]]:
152
+ """Create a dict representation of an entity"""
153
+ entity_dic = {}
154
+ for key, value in zip(paths, entity.values, strict=False):
155
+ entity_dic[key.path] = list(value)
156
+ return entity_dic
157
+
158
+ def _update_report(self, count: int) -> None:
159
+ """Update the report"""
160
+ self.report.entity_count = count
161
+ self.execution_context.report.update(self.report)
162
+
163
+ def _cancel_workflow(self) -> bool:
164
+ """Cancel workflow"""
165
+ try:
166
+ if self.execution_context.workflow.status() == "Canceling":
167
+ self.log.info("End task (Cancelled Workflow).")
168
+ return True
169
+ except AttributeError:
170
+ pass
171
+ return False
172
+
173
+ def _docs_to_json(self, docs: list[tuple[Document, float]]) -> list:
174
+ """Convert a list of Documents to a list of metadata"""
175
+ doc_list: list = []
176
+ for doc_tuple in docs:
177
+ json_entity = {}
178
+ json_entity["id"] = doc_tuple[0].id
179
+ json_entity["metadata"] = str(doc_tuple[0].metadata)
180
+ json_entity["content"] = doc_tuple[0].page_content
181
+ json_entity["score"] = str(doc_tuple[1])
182
+ doc_list.append(json_entity)
183
+ return doc_list
184
+
185
+ def _process_entities(self, entities: Entities) -> Generator[Entity]:
186
+ """Process incoming entities' embeddings in vector search"""
187
+ schema_paths: list[EntityPath] = list(entities.schema.paths)
188
+ n_processed_entries: int = 0
189
+ self._update_report(n_processed_entries)
190
+ for entity in entities.entities:
191
+ if self._cancel_workflow():
192
+ return
193
+ entity_dict = self._entity_to_dict(schema_paths, entity)
194
+ embedding: list[float] = literal_eval(entity_dict[self.embedding_query_path][0])
195
+ result: list[tuple[Document, float]] = self.db.similarity_search_with_score_by_vector(
196
+ embedding=embedding, k=self.top_k
197
+ )
198
+ json_result = self._docs_to_json(result)
199
+ entity_dict[self.search_result_path] = [json.dumps(json_result)]
200
+ values = list(entity_dict.values())
201
+ n_processed_entries += 1
202
+ self._update_report(n_processed_entries)
203
+ yield Entity(uri=entity.uri, values=values)
204
+
205
+ def execute(
206
+ self,
207
+ inputs: Sequence[Entities],
208
+ context: ExecutionContext,
209
+ ) -> Entities:
210
+ """Run the workflow operator."""
211
+ self.log.info("Start searching collection.")
212
+ self.inputs = inputs
213
+ self.execution_context = context
214
+ try:
215
+ first_input: Entities = self.inputs[0]
216
+ except IndexError as error:
217
+ raise ValueError("Input port not connected.") from error
218
+ entities = self._process_entities(first_input)
219
+ schema = self._generate_output_schema(first_input.schema)
220
+ self.log.info("End")
221
+ return Entities(entities=entities, schema=schema)
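
Note on consuming the search output: `_docs_to_json` serialises `metadata` with `str(...)` and `score` with `str(...)` before the whole list is passed to `json.dumps`, so a downstream consumer has to decode in two steps. The following is a minimal sketch of that decoding, assuming the metadata contains only plain Python literals. `parse_search_result` and the sample values are hypothetical and only illustrate the structure documented above.

```python
import json
from ast import literal_eval


def parse_search_result(raw: str) -> list[dict]:
    """Decode one `_search_result` value into dicts with typed fields."""
    hits = []
    for hit in json.loads(raw):
        hits.append(
            {
                "id": hit["id"],
                # metadata was written with str(dict), i.e. as a Python literal,
                # so it is parsed with literal_eval rather than json.loads
                "metadata": literal_eval(hit["metadata"]),
                "content": hit["content"],
                # score was written with str(float)
                "score": float(hit["score"]),
            }
        )
    return hits


# Hypothetical sample shaped like the plugin's output for a single hit.
sample = json.dumps(
    [{"id": "doc-1", "metadata": str({"lang": "en"}), "content": "hello", "score": str(0.25)}]
)
print(parse_search_result(sample))
```

`literal_eval` is used here because `str(dict)` produces single-quoted keys, which `json.loads` would reject.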
@@ -1,7 +1,4 @@
1
- """Random values workflow plugin module
2
-
3
- Remove this and other example files after bootstrapping your project.
4
- """
1
+ """Store Task"""
5
2
 
6
3
  from ast import literal_eval
7
4
  from collections.abc import Sequence
@@ -10,15 +7,16 @@ from typing import Any
10
7
  from cmem_plugin_base.dataintegration.context import ExecutionContext, ExecutionReport
11
8
  from cmem_plugin_base.dataintegration.description import Icon, Plugin, PluginParameter
12
9
  from cmem_plugin_base.dataintegration.entity import Entities, Entity, EntityPath
13
- from cmem_plugin_base.dataintegration.parameter.password import Password, PasswordParameterType
10
+ from cmem_plugin_base.dataintegration.parameter.password import Password
14
11
  from cmem_plugin_base.dataintegration.plugins import WorkflowPlugin
15
12
  from cmem_plugin_base.dataintegration.ports import (
16
13
  FixedNumberOfInputs,
17
14
  UnknownSchemaPort,
18
15
  )
19
- from cmem_plugin_base.dataintegration.types import IntParameterType
20
16
  from langchain_postgres import PGVector
21
17
 
18
+ from cmem_plugin_pgvector.commons import DatabaseParams
19
+
22
20
 
23
21
  class DataContainer:
24
22
  """Encapsulate the data to be added to the database."""
@@ -26,19 +24,19 @@ class DataContainer:
26
24
  def __init__(self):
27
25
  self.texts = []
28
26
  self.embeddings = []
29
- self.metadatas = []
27
+ self.metadata = []
30
28
 
31
29
  def add(self, text: str, embedding: list[float], metadata: dict) -> None:
32
30
  """Add objects to the respective lists."""
33
31
  self.texts.append(text)
34
32
  self.embeddings.append(embedding)
35
- self.metadatas.append(metadata)
33
+ self.metadata.append(metadata)
36
34
 
37
35
  def clear(self) -> None:
38
36
  """Clear all three lists."""
39
37
  self.texts.clear()
40
38
  self.embeddings.clear()
41
- self.metadatas.clear()
39
+ self.metadata.clear()
42
40
 
43
41
  def size(self) -> int:
44
42
  """Return the size of the lists (assuming all lists have the same length)."""
@@ -46,8 +44,8 @@ class DataContainer:
46
44
 
47
45
 
48
46
  @Plugin(
49
- label="Postgres Vector Store",
50
- description="Store embeddings into Postgres Vector Store.",
47
+ label="Store Vector Embeddings",
48
+ description="Store embeddings into Postgres Vector Store (PGVector).",
51
49
  documentation="""
52
50
  This workflow plugin stores embeddings in the Postgres Vector Store.
53
51
 
@@ -57,44 +55,9 @@ It is possible to specify either the name of the attributes containing the vecto
57
55
  metadata.
58
56
  """,
59
57
  icon=Icon(package=__package__, file_name="postgresql.svg"),
58
+ plugin_id="cmem_plugin_pgvector-Store",
60
59
  parameters=[
61
- PluginParameter(
62
- name="host",
63
- label="Database Host",
64
- description="The hostname of the postgres database service.",
65
- default_value="pgvector",
66
- ),
67
- PluginParameter(
68
- name="port",
69
- label="Database Port",
70
- param_type=IntParameterType(),
71
- description="The port number of the postgres database service.",
72
- default_value=5432,
73
- ),
74
- PluginParameter(
75
- name="user",
76
- label="Database User",
77
- description="The account name used to login to the postgres database service.",
78
- default_value="pgvector",
79
- ),
80
- PluginParameter(
81
- name="password",
82
- label="Database Password",
83
- param_type=PasswordParameterType(),
84
- description="The password of the database account.",
85
- ),
86
- PluginParameter(
87
- name="database",
88
- label="Database Name",
89
- description="The database name.",
90
- default_value="pgvector",
91
- ),
92
- PluginParameter(
93
- name="collection_name",
94
- label="Collection Name",
95
- description="The name of the collection, where the embeddings are going to be stored.",
96
- default_value="my_collection",
97
- ),
60
+ *DatabaseParams().as_list(),
98
61
  PluginParameter(
99
62
  name="pre_delete_collection",
100
63
  label="Pre Delete Collection",
@@ -153,12 +116,12 @@ class PGVectorStorePlugin(WorkflowPlugin):
153
116
 
154
117
  def __init__( # noqa: PLR0913
155
118
  self,
156
- host: str = "pgvector",
157
- port: int = 5432,
158
- user: str = "pgvector",
119
+ host: str = DatabaseParams.host.default_value,
120
+ port: int = DatabaseParams.port.default_value,
121
+ user: str = DatabaseParams.user.default_value,
159
122
  password: Password | str = "",
160
- database: str = "pgvector",
161
- collection_name: str = "my_collection",
123
+ database: str = DatabaseParams.database.default_value,
124
+ collection_name: str = DatabaseParams.collection_name.default_value,
162
125
  pre_delete_collection: bool = True,
163
126
  source_path: str = "_embedding_source",
164
127
  embedding_path: str = "_embedding",
@@ -247,14 +210,14 @@ class PGVectorStorePlugin(WorkflowPlugin):
247
210
  else self._metadata(entity, schema_paths),
248
211
  )
249
212
  if container.size() == self.batch_processing_size:
250
- self.db.add_embeddings(container.texts, container.embeddings, container.metadatas)
213
+ self.db.add_embeddings(container.texts, container.embeddings, container.metadata)
251
214
  n_processed_entries += container.size()
252
215
  self._update_report(n_processed_entries)
253
216
  container.clear()
254
217
  if self._cancel_workflow():
255
218
  return
256
219
  if container.size() > 0:
257
- self.db.add_embeddings(container.texts, container.embeddings, container.metadatas)
220
+ self.db.add_embeddings(container.texts, container.embeddings, container.metadata)
258
221
  n_processed_entries += container.size()
259
222
  self._update_report(n_processed_entries)
260
223
 
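
Note on the batching in this hunk: rows are flushed whenever the container reaches `batch_processing_size`, and a final flush after the loop handles the remainder. The sketch below shows the same pattern in isolation. `store_in_batches` is a hypothetical helper, and its `add_embeddings` argument is a stand-in callable, not the `PGVector.add_embeddings` method itself.

```python
from collections.abc import Callable, Iterable


def store_in_batches(
    rows: Iterable[tuple[str, list[float], dict]],
    batch_size: int,
    add_embeddings: Callable[[list[str], list[list[float]], list[dict]], None],
) -> int:
    """Send rows to `add_embeddings` in batches and return the number processed."""
    texts: list[str] = []
    embeddings: list[list[float]] = []
    metadata: list[dict] = []
    processed = 0
    for text, embedding, meta in rows:
        texts.append(text)
        embeddings.append(embedding)
        metadata.append(meta)
        if len(texts) == batch_size:
            # Full batch: flush and start fresh lists.
            add_embeddings(texts, embeddings, metadata)
            processed += len(texts)
            texts, embeddings, metadata = [], [], []
    if texts:
        # Flush the remainder that did not fill a whole batch.
        add_embeddings(texts, embeddings, metadata)
        processed += len(texts)
    return processed


calls: list[int] = []  # records the size of every flushed batch
rows = [(f"text {i}", [float(i)], {"n": i}) for i in range(7)]
total = store_in_batches(rows, 3, lambda t, e, m: calls.append(len(t)))
print(total, calls)
```

Seven rows with a batch size of three produce two full batches and one final batch of a single row.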
@@ -1,8 +1,8 @@
1
1
  [tool.poetry]
2
2
  name = "cmem-plugin-pgvector"
3
- version = "0.5.0"
3
+ version = "0.6.0"
4
4
  license = "Apache-2.0"
5
- description = "Store embedding vectors into a Postgres vector store."
5
+ description = "Store and search for embedding vectors in a Postgres vector store."
6
6
  authors = ["eccenca GmbH <cmempy-developer@eccenca.com>"]
7
7
  maintainers = ["Edgard Marx <edgard.marx@eccenca.com>"]
8
8
  classifiers = [
@@ -1,51 +0,0 @@
1
- # cmem-plugin-pgvector
2
-
3
- [![poetry][poetry-shield]][poetry-link] [![ruff][ruff-shield]][ruff-link] [![mypy][mypy-shield]][mypy-link] [![copier][copier-shield]][copier]
4
-
5
- Store embedding vectors into a Postgres vector store.
6
-
7
- This plugin consumes the costumable entity's paths ```embedding```, ```text``` and ```metadata``` as following:
8
-
9
- - The text path contain the text used to generate the embeddings, default ```text```.
10
- - The embedding path contain the embedding representation of the text, default ```embedding```.
11
- - The metadata path contain the information that will be associated with the embedding, default all paths.
12
-
13
- [![eccenca Corporate Memory][cmem-shield]][cmem-link]
14
-
15
- ## Use
16
-
17
- Interact with Large Language Models.
18
-
19
- This is a plugin for [eccenca](https://eccenca.com) [Corporate Memory](https://documentation.eccenca.com).
20
-
21
- You can install it with the [cmemc](https://eccenca.com/go/cmemc) command line
22
- clients like this:
23
-
24
- ```
25
- cmemc admin workspace python install cmem-plugin-llm
26
- ```
27
-
28
- ### Parameters
29
-
30
- - ```collection_name```: The name of the collection where the embeddings are going to be stored, default ```my_collection```
31
- - ```user```:the database user
32
- - ```password```: the database password
33
- - ```host```: the databse host, i.e. locahost
34
- - ```port```: the database port, default ```5432```
35
- - ```database```: the name of the database
36
- - ```pre_delete_collection```: boolean parameter indicating if the collection should be cleanse before insertion, default ```false```
37
- - ```embedding_path```: output path that will contain the generated embedding, default ```embedding```
38
- - ```text_path```: path containing the text used for genereting the embedding, default ```text```
39
- - ```metadata_paths```: paths from the entity that will be stored along with the embedding, default all paths
40
-
41
- [cmem-link]: https://documentation.eccenca.com
42
- [cmem-shield]: https://img.shields.io/endpoint?url=https://dev.documentation.eccenca.com/badge.json
43
- [poetry-link]: https://python-poetry.org/
44
- [poetry-shield]: https://img.shields.io/endpoint?url=https://python-poetry.org/badge/v0.json
45
- [ruff-link]: https://docs.astral.sh/ruff/
46
- [ruff-shield]: https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json&label=Code%20Style
47
- [mypy-link]: https://mypy-lang.org/
48
- [mypy-shield]: https://www.mypy-lang.org/static/mypy_badge.svg
49
- [copier]: https://copier.readthedocs.io/
50
- [copier-shield]: https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/copier-org/copier/master/img/badge/badge-grayscale-inverted-border-purple.json
51
-