haystack-pixeltable 0.1.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,112 @@
+ Apache License
+ Version 2.0, January 2004
+ http://www.apache.org/licenses/
+
+ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
+
+ 1. Definitions.
+
+ "License" shall mean the terms and conditions for use, reproduction,
+ and distribution as defined by Sections 1 through 9 of this document.
+
+ "Licensor" shall mean the copyright owner or entity authorized by
+ the copyright owner that is granting the License.
+
+ "Legal Entity" shall mean the union of the acting entity and all
+ other entities that control, are controlled by, or are under common
+ control with that entity. For the purposes of this definition,
+ "control" means (i) the power, direct or indirect, to cause the
+ direction or management of such entity, whether by contract or
+ otherwise, or (ii) ownership of fifty percent (50%) or more of the
+ outstanding shares, or (iii) beneficial ownership of such entity.
+
+ "You" (or "Your") shall mean an individual or Legal Entity
+ exercising permissions granted by this License.
+
+ "Source" form shall mean the preferred form for making modifications,
+ including but not limited to software source code, documentation
+ source, and configuration files.
+
+ "Object" form shall mean any form resulting from mechanical
+ transformation or translation of a Source form, including but
+ not limited to compiled object code, generated documentation,
+ and conversions to other media types.
+
+ "Work" shall mean the work of authorship, whether in Source or
+ Object form, made available under the License, as indicated by a
+ copyright notice that is included in or attached to the work.
+
+ "Derivative Works" shall mean any work, whether in Source or Object
+ form, that is based on (or derived from) the Work and for which the
+ editorial revisions, annotations, elaborations, or other modifications
+ represent, as a whole, an original work of authorship.
+
+ "Contribution" shall mean any work of authorship, including
+ the original version of the Work and any modifications or additions
+ to that Work, that is intentionally submitted to the Licensor for
+ inclusion in the Work by the copyright owner or by an individual or
+ Legal Entity authorized to submit on behalf of the copyright owner.
+
+ "Contributor" shall mean Licensor and any individual or Legal Entity
+ on behalf of whom a Contribution has been received by the Licensor and
+ subsequently incorporated within the Work.
+
+ 2. Grant of Copyright License. Subject to the terms and conditions of
+ this License, each Contributor hereby grants to You a perpetual,
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+ copyright license to reproduce, prepare Derivative Works of,
+ publicly display, publicly perform, sublicense, and distribute the
+ Work and such Derivative Works in Source or Object form.
+
+ 3. Grant of Patent License. Subject to the terms and conditions of
+ this License, each Contributor hereby grants to You a perpetual,
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+ (except as stated in this section) patent license to make, have made,
+ use, offer to sell, sell, import, and otherwise transfer the Work.
+
+ 4. Redistribution. You may reproduce and distribute copies of the
+ Work or Derivative Works thereof in any medium, with or without
+ modifications, and in Source or Object form, provided that You
+ meet the following conditions:
+
+ (a) You must give any other recipients of the Work or
+ Derivative Works a copy of this License; and
+
+ (b) You must cause any modified files to carry prominent notices
+ stating that You changed the files; and
+
+ (c) You must retain, in the Source form of any Derivative Works
+ that You distribute, all copyright, patent, trademark, and
+ attribution notices from the Source form of the Work,
+ excluding those notices that do not pertain to any part of
+ the Derivative Works; and
+
+ (d) If the Work includes a "NOTICE" text file as part of its
+ distribution, then any Derivative Works that You distribute must
+ include a readable copy of the attribution notices contained
+ within such NOTICE file.
+
+ 5. Submission of Contributions.
+
+ 6. Trademarks. This License does not grant permission to use the trade
+ names, trademarks, service marks, or product names of the Licensor.
+
+ 7. Disclaimer of Warranty.
+
+ 8. Limitation of Liability.
+
+ 9. Accepting Warranty or Additional Liability.
+
+ Copyright 2024-2026 Pixeltable, Inc.
+
+ Licensed under the Apache License, Version 2.0 (the "License");
+ you may not use this file except in compliance with the License.
+ You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
@@ -0,0 +1,191 @@
+ Metadata-Version: 2.4
+ Name: haystack-pixeltable
+ Version: 0.1.0
+ Summary: Haystack Document Store and Retriever backed by Pixeltable multimodal data infrastructure.
+ Author-email: Pixeltable <contact@pixeltable.com>
+ License-Expression: Apache-2.0
+ Project-URL: Homepage, https://github.com/pixeltable/haystack-pixeltable
+ Project-URL: Repository, https://github.com/pixeltable/haystack-pixeltable
+ Project-URL: Documentation, https://docs.pixeltable.com/
+ Project-URL: Issues, https://github.com/pixeltable/haystack-pixeltable/issues
+ Project-URL: Discord, https://discord.gg/QPyqFYx2UN
+ Keywords: haystack,pixeltable,document-store,retriever,multimodal,embeddings,rag
+ Classifier: Development Status :: 4 - Beta
+ Classifier: Intended Audience :: Developers
+ Classifier: Programming Language :: Python :: 3
+ Classifier: Programming Language :: Python :: 3.10
+ Classifier: Programming Language :: Python :: 3.11
+ Classifier: Programming Language :: Python :: 3.12
+ Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
+ Requires-Python: >=3.10
+ Description-Content-Type: text/markdown
+ License-File: LICENSE
+ Requires-Dist: haystack-ai>=2.6.0
+ Requires-Dist: pixeltable>=0.2.28
+ Requires-Dist: numpy
+ Provides-Extra: dev
+ Requires-Dist: pytest>=8.0; extra == "dev"
+ Requires-Dist: ruff>=0.4; extra == "dev"
+ Dynamic: license-file
+
+ # haystack-pixeltable
+
+ [![PyPI](https://img.shields.io/pypi/v/haystack-pixeltable)](https://pypi.org/project/haystack-pixeltable/)
+ [![CI](https://github.com/pixeltable/haystack-pixeltable/actions/workflows/ci.yml/badge.svg)](https://github.com/pixeltable/haystack-pixeltable/actions/workflows/ci.yml)
+ [![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](LICENSE)
+
+ Haystack Document Store and Retriever backed by [Pixeltable](https://pixeltable.com/) — persistent, versioned, multimodal data infrastructure for AI applications.
+
+ ## Installation
+
+ ```bash
+ pip install haystack-pixeltable
+ ```
+
+ ## Quick Start
+
+ ### Document Store
+
+ ```python
+ from haystack import Document
+ from haystack_pixeltable import PixeltableDocumentStore
+
+ store = PixeltableDocumentStore(
+     table_name="myproject.docs",
+     embedding_dimension=1536,
+ )
+
+ # Write documents (placeholder embeddings; in practice use real 1536-dim vectors from an embedder)
+ store.write_documents([
+     Document(content="Pixeltable is multimodal data infrastructure.", meta={"category": "docs"}, embedding=[0.1] * 1536),
+     Document(content="Haystack is a framework for building RAG pipelines.", meta={"category": "docs"}, embedding=[0.2] * 1536),
+ ])
+
+ # Filter documents
+ results = store.filter_documents(
+     filters={"field": "meta.category", "operator": "==", "value": "docs"}
+ )
+
+ # Count
+ print(store.count_documents())
+ ```
+
+ ### Retriever (Similarity Search)
+
+ ```python
+ from haystack_pixeltable import PixeltableDocumentStore, PixeltableRetriever
+
+ store = PixeltableDocumentStore(
+     table_name="myproject.docs",
+     embedding_dimension=1536,
+ )
+ retriever = PixeltableRetriever(document_store=store, top_k=5)
+
+ # Search by embedding vector (placeholder values; the length must match embedding_dimension)
+ result = retriever.run(query_embedding=[0.1] * 1536)
+ for doc in result["documents"]:
+     print(f"{doc.content} (score: {doc.score:.3f})")
+ ```
+
+ ### In a Haystack Pipeline
+
+ ```python
+ from haystack import Pipeline
+ from haystack.components.embedders import SentenceTransformersTextEmbedder, SentenceTransformersDocumentEmbedder
+ from haystack.components.writers import DocumentWriter
+ from haystack_pixeltable import PixeltableDocumentStore, PixeltableRetriever
+
+ store = PixeltableDocumentStore(
+     table_name="rag.knowledge",
+     embedding_dimension=768,  # all-mpnet-base-v2, the default model of both embedders, outputs 768-dim vectors
+ )
+
+ # Indexing pipeline
+ indexing = Pipeline()
+ indexing.add_component("embedder", SentenceTransformersDocumentEmbedder())
+ indexing.add_component("writer", DocumentWriter(document_store=store))
+ indexing.connect("embedder", "writer")
+
+ # Query pipeline
+ query = Pipeline()
+ query.add_component("embedder", SentenceTransformersTextEmbedder())
+ query.add_component("retriever", PixeltableRetriever(document_store=store, top_k=5))
+ query.connect("embedder.embedding", "retriever.query_embedding")
+ ```
+
+ ## Filtering
+
+ The Document Store supports the [Haystack filter specification](https://docs.haystack.deepset.ai/docs/metadata-filtering):
+
+ ```python
+ # Simple equality
+ store.filter_documents(filters={"field": "meta.category", "operator": "==", "value": "science"})
+
+ # Comparison operators: ==, !=, >, >=, <, <=
+ store.filter_documents(filters={"field": "meta.score", "operator": ">", "value": 0.8})
+
+ # Compound AND
+ store.filter_documents(filters={
+     "operator": "AND",
+     "conditions": [
+         {"field": "meta.category", "operator": "==", "value": "science"},
+         {"field": "meta.score", "operator": ">", "value": 0.5},
+     ],
+ })
+
+ # Compound OR
+ store.filter_documents(filters={
+     "operator": "OR",
+     "conditions": [
+         {"field": "meta.source", "operator": "==", "value": "arxiv"},
+         {"field": "meta.source", "operator": "==", "value": "pubmed"},
+     ],
+ })
+ ```
+
+ ## Pixeltable Escape Hatch: `.table`
+
+ The `.table` property gives direct access to the underlying Pixeltable table for operations beyond the Haystack interface:
+
+ ```python
+ store = PixeltableDocumentStore(table_name="myproject.docs", embedding_dimension=1536)
+ t = store.table
+
+ # Add a computed column
+ import pixeltable.functions.openai as openai
+ t.add_computed_column(
+     summary=openai.chat_completions(
+         messages=[{"role": "user", "content": t.content}],
+         model="gpt-4o-mini",
+     )
+ )
+
+ # Use arbitrary Pixeltable queries
+ results = t.where(t.meta["category"] == "science").select(t.content, t.summary).collect()
+
+ # Version history
+ print(t.count(version=-1))  # row count at previous version
+ ```
+
+ ## Why Pixeltable?
+
+ | Feature | Pixeltable | Chroma | Qdrant | pgvector |
+ |---------|-----------|--------|--------|----------|
+ | Persistent storage | Built-in | Opt-in | Opt-in | Built-in |
+ | Computed columns | Native | No | No | No |
+ | Version history | Native | No | No | No |
+ | Multimodal types | Image, Video, Audio, Document | Text only | Text only | Text only |
+ | Metadata filtering | JSON + SQL predicates | Limited | Rich | SQL |
+ | Embedding auto-compute | Via computed columns | Manual | Manual | Manual |
+
+ ## Development
+
+ ```bash
+ pip install -e ".[dev]"
+ pytest tests/ -v
+ ruff check . && ruff format --check .
+ ```
+
+ ## License
+
+ Apache 2.0
@@ -0,0 +1,161 @@
+ # haystack-pixeltable
+
+ [![PyPI](https://img.shields.io/pypi/v/haystack-pixeltable)](https://pypi.org/project/haystack-pixeltable/)
+ [![CI](https://github.com/pixeltable/haystack-pixeltable/actions/workflows/ci.yml/badge.svg)](https://github.com/pixeltable/haystack-pixeltable/actions/workflows/ci.yml)
+ [![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](LICENSE)
+
+ Haystack Document Store and Retriever backed by [Pixeltable](https://pixeltable.com/) — persistent, versioned, multimodal data infrastructure for AI applications.
+
+ ## Installation
+
+ ```bash
+ pip install haystack-pixeltable
+ ```
+
+ ## Quick Start
+
+ ### Document Store
+
+ ```python
+ from haystack import Document
+ from haystack_pixeltable import PixeltableDocumentStore
+
+ store = PixeltableDocumentStore(
+     table_name="myproject.docs",
+     embedding_dimension=1536,
+ )
+
+ # Write documents (placeholder embeddings; in practice use real 1536-dim vectors from an embedder)
+ store.write_documents([
+     Document(content="Pixeltable is multimodal data infrastructure.", meta={"category": "docs"}, embedding=[0.1] * 1536),
+     Document(content="Haystack is a framework for building RAG pipelines.", meta={"category": "docs"}, embedding=[0.2] * 1536),
+ ])
+
+ # Filter documents
+ results = store.filter_documents(
+     filters={"field": "meta.category", "operator": "==", "value": "docs"}
+ )
+
+ # Count
+ print(store.count_documents())
+ ```
+
+ ### Retriever (Similarity Search)
+
+ ```python
+ from haystack_pixeltable import PixeltableDocumentStore, PixeltableRetriever
+
+ store = PixeltableDocumentStore(
+     table_name="myproject.docs",
+     embedding_dimension=1536,
+ )
+ retriever = PixeltableRetriever(document_store=store, top_k=5)
+
+ # Search by embedding vector (placeholder values; the length must match embedding_dimension)
+ result = retriever.run(query_embedding=[0.1] * 1536)
+ for doc in result["documents"]:
+     print(f"{doc.content} (score: {doc.score:.3f})")
+ ```
+
+ ### In a Haystack Pipeline
+
+ ```python
+ from haystack import Pipeline
+ from haystack.components.embedders import SentenceTransformersTextEmbedder, SentenceTransformersDocumentEmbedder
+ from haystack.components.writers import DocumentWriter
+ from haystack_pixeltable import PixeltableDocumentStore, PixeltableRetriever
+
+ store = PixeltableDocumentStore(
+     table_name="rag.knowledge",
+     embedding_dimension=768,  # all-mpnet-base-v2, the default model of both embedders, outputs 768-dim vectors
+ )
+
+ # Indexing pipeline
+ indexing = Pipeline()
+ indexing.add_component("embedder", SentenceTransformersDocumentEmbedder())
+ indexing.add_component("writer", DocumentWriter(document_store=store))
+ indexing.connect("embedder", "writer")
+
+ # Query pipeline
+ query = Pipeline()
+ query.add_component("embedder", SentenceTransformersTextEmbedder())
+ query.add_component("retriever", PixeltableRetriever(document_store=store, top_k=5))
+ query.connect("embedder.embedding", "retriever.query_embedding")
+ ```
+
+ ## Filtering
+
+ The Document Store supports the [Haystack filter specification](https://docs.haystack.deepset.ai/docs/metadata-filtering):
+
+ ```python
+ # Simple equality
+ store.filter_documents(filters={"field": "meta.category", "operator": "==", "value": "science"})
+
+ # Comparison operators: ==, !=, >, >=, <, <=
+ store.filter_documents(filters={"field": "meta.score", "operator": ">", "value": 0.8})
+
+ # Compound AND
+ store.filter_documents(filters={
+     "operator": "AND",
+     "conditions": [
+         {"field": "meta.category", "operator": "==", "value": "science"},
+         {"field": "meta.score", "operator": ">", "value": 0.5},
+     ],
+ })
+
+ # Compound OR
+ store.filter_documents(filters={
+     "operator": "OR",
+     "conditions": [
+         {"field": "meta.source", "operator": "==", "value": "arxiv"},
+         {"field": "meta.source", "operator": "==", "value": "pubmed"},
+     ],
+ })
+ ```
+
+ ## Pixeltable Escape Hatch: `.table`
+
+ The `.table` property gives direct access to the underlying Pixeltable table for operations beyond the Haystack interface:
+
+ ```python
+ store = PixeltableDocumentStore(table_name="myproject.docs", embedding_dimension=1536)
+ t = store.table
+
+ # Add a computed column
+ import pixeltable.functions.openai as openai
+ t.add_computed_column(
+     summary=openai.chat_completions(
+         messages=[{"role": "user", "content": t.content}],
+         model="gpt-4o-mini",
+     )
+ )
+
+ # Use arbitrary Pixeltable queries
+ results = t.where(t.meta["category"] == "science").select(t.content, t.summary).collect()
+
+ # Version history
+ print(t.count(version=-1))  # row count at previous version
+ ```
+
+ ## Why Pixeltable?
+
+ | Feature | Pixeltable | Chroma | Qdrant | pgvector |
+ |---------|-----------|--------|--------|----------|
+ | Persistent storage | Built-in | Opt-in | Opt-in | Built-in |
+ | Computed columns | Native | No | No | No |
+ | Version history | Native | No | No | No |
+ | Multimodal types | Image, Video, Audio, Document | Text only | Text only | Text only |
+ | Metadata filtering | JSON + SQL predicates | Limited | Rich | SQL |
+ | Embedding auto-compute | Via computed columns | Manual | Manual | Manual |
+
+ ## Development
+
+ ```bash
+ pip install -e ".[dev]"
+ pytest tests/ -v
+ ruff check . && ruff format --check .
+ ```
+
+ ## License
+
+ Apache 2.0
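The README describes `PixeltableRetriever` as returning the `top_k` documents most similar to a query embedding, each with a `score`. It does not say which similarity metric produces that score. The sketch below assumes cosine similarity and uses a brute-force scan with `heapq.nlargest`, whereas a real store would typically rely on an index. `cosine`, `top_k` and the three-document corpus are hypothetical and are not part of the package's API.

```python
import heapq
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} != {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query: list[float], corpus: list[tuple[str, list[float]]], k: int = 5
) -> list[tuple[str, float]]:
    """Score every (content, embedding) pair and keep the k best, highest score first."""
    scored = ((content, cosine(query, emb)) for content, emb in corpus)
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Toy 3-dimensional embeddings standing in for real model output.
corpus = [
    ("Pixeltable is multimodal data infrastructure.", [1.0, 0.0, 0.0]),
    ("Haystack is a framework for building RAG pipelines.", [0.0, 1.0, 0.0]),
    ("A document about both.", [0.6, 0.8, 0.0]),
]
results = top_k([1.0, 0.0, 0.0], corpus, k=2)
for content, score in results:
    print(f"{content} (score: {score:.3f})")
```

The length check in `cosine` reflects why `embedding_dimension` has to match the embedder's output size: vectors of different lengths cannot be compared.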
@@ -0,0 +1,8 @@
+ """Haystack Document Store and Retriever backed by Pixeltable."""
+
+ from haystack_pixeltable.document_store import PixeltableDocumentStore
+ from haystack_pixeltable.retriever import PixeltableRetriever
+
+ __all__ = ["PixeltableDocumentStore", "PixeltableRetriever"]
+
+ __version__ = "0.1.0"