swarmauri_parser_bertembedding 0.8.0.dev4__tar.gz → 0.8.0.dev21__tar.gz

@@ -0,0 +1,135 @@
1
+ Metadata-Version: 2.4
2
+ Name: swarmauri_parser_bertembedding
3
+ Version: 0.8.0.dev21
4
+ Summary: Swarmauri Bert Embedding Parser
5
+ License-Expression: Apache-2.0
6
+ License-File: LICENSE
7
+ Keywords: swarmauri,parser,bertembedding,bert,embedding
8
+ Author: Jacob Stewart
9
+ Author-email: jacob@swarmauri.com
10
+ Requires-Python: >=3.10,<3.13
11
+ Classifier: License :: OSI Approved :: Apache Software License
12
+ Classifier: Programming Language :: Python :: 3.10
13
+ Classifier: Programming Language :: Python :: 3.11
14
+ Classifier: Programming Language :: Python :: 3.12
15
+ Classifier: Natural Language :: English
16
+ Classifier: Development Status :: 3 - Alpha
17
+ Classifier: Intended Audience :: Developers
18
+ Classifier: Topic :: Software Development :: Libraries :: Application Frameworks
19
+ Requires-Dist: swarmauri_base
20
+ Requires-Dist: swarmauri_core
21
+ Requires-Dist: swarmauri_standard
22
+ Requires-Dist: torch
23
+ Requires-Dist: transformers (>=4.45.0)
24
+ Description-Content-Type: text/markdown
25
+
26
+ ![Swarmauri Logo](https://github.com/swarmauri/swarmauri-sdk/blob/3d4d1cfa949399d7019ae9d8f296afba773dfb7f/assets/swarmauri.brand.theme.svg)
27
+
28
+ <p align="center">
29
+ <a href="https://pypi.org/project/swarmauri_parser_bertembedding/">
30
+ <img src="https://img.shields.io/pypi/dm/swarmauri_parser_bertembedding" alt="PyPI - Downloads"/></a>
31
+ <a href="https://hits.sh/github.com/swarmauri/swarmauri-sdk/tree/master/pkgs/community/swarmauri_parser_bertembedding/">
32
+ <img alt="Hits" src="https://hits.sh/github.com/swarmauri/swarmauri-sdk/tree/master/pkgs/community/swarmauri_parser_bertembedding.svg"/></a>
33
+ <a href="https://pypi.org/project/swarmauri_parser_bertembedding/">
34
+ <img src="https://img.shields.io/pypi/pyversions/swarmauri_parser_bertembedding" alt="PyPI - Python Version"/></a>
35
+ <a href="https://pypi.org/project/swarmauri_parser_bertembedding/">
36
+ <img src="https://img.shields.io/pypi/l/swarmauri_parser_bertembedding" alt="PyPI - License"/></a>
37
+ <a href="https://pypi.org/project/swarmauri_parser_bertembedding/">
38
+ <img src="https://img.shields.io/pypi/v/swarmauri_parser_bertembedding?label=swarmauri_parser_bertembedding&color=green" alt="PyPI - swarmauri_parser_bertembedding"/></a>
39
+ </p>
40
+
41
+ ---
42
+
43
+ # Swarmauri Parser Bert Embedding
44
+
45
+ Parser that converts text into embeddings using a Hugging Face BERT encoder. Produces `Document` objects whose metadata carries the averaged token embedding so downstream Swarmauri pipelines can work with dense vectors.
46
+
47
+ ## Features
48
+
49
+ - Uses `transformers.BertModel` + `BertTokenizer` (default `bert-base-uncased`).
50
+ - Accepts single strings or lists of strings and emits `Document` instances with original text and embedding metadata.
51
+ - Runs in inference (`eval`) mode with automatic `torch.no_grad()` handling.
52
+ - Works on CPU by default; configure PyTorch device settings to leverage GPU.
53
+
54
+ ## Prerequisites
55
+
56
+ - Python 3.10, 3.11, or 3.12 (the package metadata requires `>=3.10,<3.13`).
57
+ - PyTorch compatible with your hardware (`torch` is a declared dependency and is installed automatically; install CUDA-enabled wheels manually when needed).
58
+ - Internet access on first run so Hugging Face downloads tokenizer/model weights (or warm the cache ahead of time).
59
+
60
+ ## Installation
61
+
62
+ ```bash
63
+ # pip
64
+ pip install swarmauri_parser_bertembedding
65
+
66
+ # poetry
67
+ poetry add swarmauri_parser_bertembedding
68
+
69
+ # uv (pyproject-based projects)
70
+ uv add swarmauri_parser_bertembedding
71
+ ```
72
+
73
+ ## Quickstart
74
+
75
+ ```python
76
+ from swarmauri_parser_bertembedding import BERTEmbeddingParser
77
+
78
+ parser = BERTEmbeddingParser(parser_model_name="bert-base-uncased")
79
+
80
+ documents = parser.parse([
81
+     "Swarmauri agents cooperate over shared memory.",
82
+     "Dense embeddings power semantic search.",
83
+ ])
84
+
85
+ for doc in documents:
86
+     vector = doc.metadata["embedding"]
87
+     print(doc.content)
88
+     print(len(vector), vector[:5])
89
+ ```
90
+
91
+ ## Custom Models & Devices
92
+
93
+ ```python
94
+ import torch
95
+ from swarmauri_parser_bertembedding import BERTEmbeddingParser
96
+ from transformers import BertModel
97
+
98
+ class GPUParser(BERTEmbeddingParser):
99
+     def __init__(self, **kwargs):
100
+         super().__init__(**kwargs)
101
+         self._model = BertModel.from_pretrained(self.parser_model_name).to("cuda")
102
+
103
+ parser = GPUParser(parser_model_name="bert-base-multilingual-cased")
104
+ parser._model.eval()
105
+ ```
106
+
107
+ ## Batch Embeddings at Scale
108
+
109
+ ```python
110
+ from tqdm import tqdm
111
+ from swarmauri_parser_bertembedding import BERTEmbeddingParser
112
+
113
+ texts = [f"Paragraph {i}" for i in range(1000)]
114
+ parser = BERTEmbeddingParser()
115
+
116
+ batched_docs = []
117
+ batch_size = 32
118
+ for start in tqdm(range(0, len(texts), batch_size)):
119
+     batch = texts[start:start + batch_size]
120
+     batched_docs.extend(parser.parse(batch))
121
+ ```
122
+
123
+ Persist the resulting vectors into Swarmauri vector stores (Redis, Qdrant, etc.) via the metadata field.
124
+
125
+ ## Tips
126
+
127
+ - Preprocess text to match model expectations (lowercase for uncased BERT, language-specific cleanup for multilingual models).
128
+ - For extremely long documents, consider chunking before calling `parse` to respect the 512 token limit.
129
+ - Use PyTorch's `to("cuda")` or `to("mps")` to execute on GPUs or Apple silicon accelerators.
130
+ - Cache Hugging Face weights in CI/CD environments (`HF_HOME=/cache/hf`) to avoid repeated downloads.
131
+
132
+ ## Want to help?
133
+
134
+ If you want to contribute to swarmauri-sdk, read up on our [guidelines for contributing](https://github.com/swarmauri/swarmauri-sdk/blob/master/contributing.md) that will help you get started.
135
+
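The README added above says each `Document` carries an "averaged token embedding". As a rough illustration of that idea, the sketch below averages per-token vectors in plain Python. The function name `mean_pool` and the use of an attention mask to skip padding are assumptions made for this example; the diff does not show whether the package masks padding tokens before averaging.

```python
from typing import List, Sequence


def mean_pool(
    token_vectors: Sequence[Sequence[float]],
    attention_mask: Sequence[int],
) -> List[float]:
    """Average the vectors of tokens whose mask entry is non-zero.

    Hypothetical helper for illustration; not part of the package API.
    """
    if len(token_vectors) != len(attention_mask):
        raise ValueError("one mask entry is required per token vector")
    # Keep only real tokens; padding positions carry a mask value of 0.
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    if not kept:
        raise ValueError("at least one token must be unmasked")
    dim = len(kept[0])
    # Column-wise mean across the kept token vectors.
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Two real tokens followed by one padding position.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
pooled = mean_pool(tokens, mask)
print(pooled)  # [2.0, 3.0]
```

With `bert-base-uncased` the same operation would run over 768-dimensional hidden states, which is why `len(vector)` in the Quickstart is expected to be much larger than in this toy example.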
@@ -0,0 +1,109 @@
1
+ ![Swarmauri Logo](https://github.com/swarmauri/swarmauri-sdk/blob/3d4d1cfa949399d7019ae9d8f296afba773dfb7f/assets/swarmauri.brand.theme.svg)
2
+
3
+ <p align="center">
4
+ <a href="https://pypi.org/project/swarmauri_parser_bertembedding/">
5
+ <img src="https://img.shields.io/pypi/dm/swarmauri_parser_bertembedding" alt="PyPI - Downloads"/></a>
6
+ <a href="https://hits.sh/github.com/swarmauri/swarmauri-sdk/tree/master/pkgs/community/swarmauri_parser_bertembedding/">
7
+ <img alt="Hits" src="https://hits.sh/github.com/swarmauri/swarmauri-sdk/tree/master/pkgs/community/swarmauri_parser_bertembedding.svg"/></a>
8
+ <a href="https://pypi.org/project/swarmauri_parser_bertembedding/">
9
+ <img src="https://img.shields.io/pypi/pyversions/swarmauri_parser_bertembedding" alt="PyPI - Python Version"/></a>
10
+ <a href="https://pypi.org/project/swarmauri_parser_bertembedding/">
11
+ <img src="https://img.shields.io/pypi/l/swarmauri_parser_bertembedding" alt="PyPI - License"/></a>
12
+ <a href="https://pypi.org/project/swarmauri_parser_bertembedding/">
13
+ <img src="https://img.shields.io/pypi/v/swarmauri_parser_bertembedding?label=swarmauri_parser_bertembedding&color=green" alt="PyPI - swarmauri_parser_bertembedding"/></a>
14
+ </p>
15
+
16
+ ---
17
+
18
+ # Swarmauri Parser Bert Embedding
19
+
20
+ Parser that converts text into embeddings using a Hugging Face BERT encoder. Produces `Document` objects whose metadata carries the averaged token embedding so downstream Swarmauri pipelines can work with dense vectors.
21
+
22
+ ## Features
23
+
24
+ - Uses `transformers.BertModel` + `BertTokenizer` (default `bert-base-uncased`).
25
+ - Accepts single strings or lists of strings and emits `Document` instances with original text and embedding metadata.
26
+ - Runs in inference (`eval`) mode with automatic `torch.no_grad()` handling.
27
+ - Works on CPU by default; configure PyTorch device settings to leverage GPU.
28
+
29
+ ## Prerequisites
30
+
31
+ - Python 3.10, 3.11, or 3.12 (the package metadata requires `>=3.10,<3.13`).
32
+ - PyTorch compatible with your hardware (`torch` is a declared dependency and is installed automatically; install CUDA-enabled wheels manually when needed).
33
+ - Internet access on first run so Hugging Face downloads tokenizer/model weights (or warm the cache ahead of time).
34
+
35
+ ## Installation
36
+
37
+ ```bash
38
+ # pip
39
+ pip install swarmauri_parser_bertembedding
40
+
41
+ # poetry
42
+ poetry add swarmauri_parser_bertembedding
43
+
44
+ # uv (pyproject-based projects)
45
+ uv add swarmauri_parser_bertembedding
46
+ ```
47
+
48
+ ## Quickstart
49
+
50
+ ```python
51
+ from swarmauri_parser_bertembedding import BERTEmbeddingParser
52
+
53
+ parser = BERTEmbeddingParser(parser_model_name="bert-base-uncased")
54
+
55
+ documents = parser.parse([
56
+     "Swarmauri agents cooperate over shared memory.",
57
+     "Dense embeddings power semantic search.",
58
+ ])
59
+
60
+ for doc in documents:
61
+     vector = doc.metadata["embedding"]
62
+     print(doc.content)
63
+     print(len(vector), vector[:5])
64
+ ```
65
+
66
+ ## Custom Models & Devices
67
+
68
+ ```python
69
+ import torch
70
+ from swarmauri_parser_bertembedding import BERTEmbeddingParser
71
+ from transformers import BertModel
72
+
73
+ class GPUParser(BERTEmbeddingParser):
74
+     def __init__(self, **kwargs):
75
+         super().__init__(**kwargs)
76
+         self._model = BertModel.from_pretrained(self.parser_model_name).to("cuda")
77
+
78
+ parser = GPUParser(parser_model_name="bert-base-multilingual-cased")
79
+ parser._model.eval()
80
+ ```
81
+
82
+ ## Batch Embeddings at Scale
83
+
84
+ ```python
85
+ from tqdm import tqdm
86
+ from swarmauri_parser_bertembedding import BERTEmbeddingParser
87
+
88
+ texts = [f"Paragraph {i}" for i in range(1000)]
89
+ parser = BERTEmbeddingParser()
90
+
91
+ batched_docs = []
92
+ batch_size = 32
93
+ for start in tqdm(range(0, len(texts), batch_size)):
94
+     batch = texts[start:start + batch_size]
95
+     batched_docs.extend(parser.parse(batch))
96
+ ```
97
+
98
+ Persist the resulting vectors into Swarmauri vector stores (Redis, Qdrant, etc.) via the metadata field.
99
+
100
+ ## Tips
101
+
102
+ - Preprocess text to match model expectations (lowercase for uncased BERT, language-specific cleanup for multilingual models).
103
+ - For extremely long documents, consider chunking before calling `parse` to respect the 512 token limit.
104
+ - Use PyTorch's `to("cuda")` or `to("mps")` to execute on GPUs or Apple silicon accelerators.
105
+ - Cache Hugging Face weights in CI/CD environments (`HF_HOME=/cache/hf`) to avoid repeated downloads.
106
+
107
+ ## Want to help?
108
+
109
+ If you want to contribute to swarmauri-sdk, read up on our [guidelines for contributing](https://github.com/swarmauri/swarmauri-sdk/blob/master/contributing.md) that will help you get started.
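The Tips section above recommends chunking long documents before calling `parse` so each input respects BERT's 512-token limit. A minimal sketch of one way to do that is shown below. The helper `chunk_tokens`, the default budget of 510 (512 minus the `[CLS]` and `[SEP]` positions), and the 50-token overlap are all assumptions for illustration. Whitespace-separated words stand in for WordPiece tokens here; real token counts are usually higher, so a smaller budget is safer in practice.

```python
from typing import List


def chunk_tokens(
    tokens: List[str],
    max_len: int = 510,
    overlap: int = 50,
) -> List[List[str]]:
    """Split a token list into overlapping windows of at most max_len items.

    Hypothetical helper for illustration; not part of the package API.
    """
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    if not 0 <= overlap < max_len:
        raise ValueError("overlap must satisfy 0 <= overlap < max_len")
    step = max_len - overlap
    chunks: List[List[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + max_len])
        # Stop once a window has reached the end of the input.
        if start + max_len >= len(tokens):
            break
    return chunks


words = [f"w{i}" for i in range(1200)]
chunks = chunk_tokens(words)
print([len(chunk) for chunk in chunks])  # [510, 510, 280]
```

Each chunk could then be rejoined with `" ".join(chunk)` and passed to `parser.parse(...)` as one item of a list, yielding one `Document` per chunk.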
@@ -1,6 +1,6 @@
1
1
  [project]
2
2
  name = "swarmauri_parser_bertembedding"
3
- version = "0.8.0.dev4"
3
+ version = "0.8.0.dev21"
4
4
  description = "Swarmauri Bert Embedding Parser"
5
5
  license = "Apache-2.0"
6
6
  readme = "README.md"
@@ -11,6 +11,10 @@ classifiers = [
11
11
  "Programming Language :: Python :: 3.10",
12
12
  "Programming Language :: Python :: 3.11",
13
13
  "Programming Language :: Python :: 3.12",
14
+ "Natural Language :: English",
15
+ "Development Status :: 3 - Alpha",
16
+ "Intended Audience :: Developers",
17
+ "Topic :: Software Development :: Libraries :: Application Frameworks",
14
18
  ]
15
19
  authors = [{ name = "Jacob Stewart", email = "jacob@swarmauri.com" }]
16
20
  dependencies = [
@@ -20,6 +24,13 @@ dependencies = [
20
24
  "swarmauri_base",
21
25
  "swarmauri_standard",
22
26
  ]
27
+ keywords = [
28
+ "swarmauri",
29
+ "parser",
30
+ "bertembedding",
31
+ "bert",
32
+ "embedding",
33
+ ]
23
34
 
24
35
  [tool.uv.sources]
25
36
  swarmauri_core = { workspace = true }
@@ -1,67 +0,0 @@
1
- Metadata-Version: 2.3
2
- Name: swarmauri_parser_bertembedding
3
- Version: 0.8.0.dev4
4
- Summary: Swarmauri Bert Embedding Parser
5
- License: Apache-2.0
6
- Author: Jacob Stewart
7
- Author-email: jacob@swarmauri.com
8
- Requires-Python: >=3.10,<3.13
9
- Classifier: License :: OSI Approved :: Apache Software License
10
- Classifier: Programming Language :: Python :: 3.10
11
- Classifier: Programming Language :: Python :: 3.11
12
- Classifier: Programming Language :: Python :: 3.12
13
- Requires-Dist: swarmauri_base
14
- Requires-Dist: swarmauri_core
15
- Requires-Dist: swarmauri_standard
16
- Requires-Dist: torch
17
- Requires-Dist: transformers (>=4.45.0)
18
- Description-Content-Type: text/markdown
19
-
20
-
21
- ![Swamauri Logo](https://res.cloudinary.com/dbjmpekvl/image/upload/v1730099724/Swarmauri-logo-lockup-2048x757_hww01w.png)
22
-
23
- <p align="center">
24
- <a href="https://pypi.org/project/swarmauri_parser_bertembedding/">
25
- <img src="https://img.shields.io/pypi/dm/swarmauri_parser_bertembedding" alt="PyPI - Downloads"/></a>
26
- <a href="https://hits.sh/github.com/swarmauri/swarmauri-sdk/tree/master/pkgs/community/swarmauri_parser_bertembedding/">
27
- <img alt="Hits" src="https://hits.sh/github.com/swarmauri/swarmauri-sdk/tree/master/pkgs/community/swarmauri_parser_bertembedding.svg"/></a>
28
- <a href="https://pypi.org/project/swarmauri_parser_bertembedding/">
29
- <img src="https://img.shields.io/pypi/pyversions/swarmauri_parser_bertembedding" alt="PyPI - Python Version"/></a>
30
- <a href="https://pypi.org/project/swarmauri_parser_bertembedding/">
31
- <img src="https://img.shields.io/pypi/l/swarmauri_parser_bertembedding" alt="PyPI - License"/></a>
32
- <a href="https://pypi.org/project/swarmauri_parser_bertembedding/">
33
- <img src="https://img.shields.io/pypi/v/swarmauri_parser_bertembedding?label=swarmauri_parser_bertembedding&color=green" alt="PyPI - swarmauri_parser_bertembedding"/></a>
34
- </p>
35
-
36
- ---
37
-
38
- # Swarmauri Parser Bert Embedding
39
-
40
- A parser that transforms input text into document embeddings using BERT.
41
-
42
- ## Installation
43
-
44
- ```bash
45
- pip install swarmauri_parser_bertembedding
46
- ```
47
-
48
- ## Usage
49
- Basic usage examples with code snippets
50
- ```python
51
- from swarmauri.parsers.BERTEmbeddingParser import BERTEmbeddingParser
52
-
53
- # Initialize the parser
54
- parser = BERTEmbeddingParser()
55
-
56
- # Parse some text data
57
- documents = parser.parse("Your text data here")
58
-
59
- # Access the embeddings
60
- for doc in documents:
61
-     print(doc.content)
62
- ```
63
-
64
- ## Want to help?
65
-
66
- If you want to contribute to swarmauri-sdk, read up on our [guidelines for contributing](https://github.com/swarmauri/swarmauri-sdk/blob/master/contributing.md) that will help you get started.
67
-
@@ -1,47 +0,0 @@
1
-
2
- ![Swamauri Logo](https://res.cloudinary.com/dbjmpekvl/image/upload/v1730099724/Swarmauri-logo-lockup-2048x757_hww01w.png)
3
-
4
- <p align="center">
5
- <a href="https://pypi.org/project/swarmauri_parser_bertembedding/">
6
- <img src="https://img.shields.io/pypi/dm/swarmauri_parser_bertembedding" alt="PyPI - Downloads"/></a>
7
- <a href="https://hits.sh/github.com/swarmauri/swarmauri-sdk/tree/master/pkgs/community/swarmauri_parser_bertembedding/">
8
- <img alt="Hits" src="https://hits.sh/github.com/swarmauri/swarmauri-sdk/tree/master/pkgs/community/swarmauri_parser_bertembedding.svg"/></a>
9
- <a href="https://pypi.org/project/swarmauri_parser_bertembedding/">
10
- <img src="https://img.shields.io/pypi/pyversions/swarmauri_parser_bertembedding" alt="PyPI - Python Version"/></a>
11
- <a href="https://pypi.org/project/swarmauri_parser_bertembedding/">
12
- <img src="https://img.shields.io/pypi/l/swarmauri_parser_bertembedding" alt="PyPI - License"/></a>
13
- <a href="https://pypi.org/project/swarmauri_parser_bertembedding/">
14
- <img src="https://img.shields.io/pypi/v/swarmauri_parser_bertembedding?label=swarmauri_parser_bertembedding&color=green" alt="PyPI - swarmauri_parser_bertembedding"/></a>
15
- </p>
16
-
17
- ---
18
-
19
- # Swarmauri Parser Bert Embedding
20
-
21
- A parser that transforms input text into document embeddings using BERT.
22
-
23
- ## Installation
24
-
25
- ```bash
26
- pip install swarmauri_parser_bertembedding
27
- ```
28
-
29
- ## Usage
30
- Basic usage examples with code snippets
31
- ```python
32
- from swarmauri.parsers.BERTEmbeddingParser import BERTEmbeddingParser
33
-
34
- # Initialize the parser
35
- parser = BERTEmbeddingParser()
36
-
37
- # Parse some text data
38
- documents = parser.parse("Your text data here")
39
-
40
- # Access the embeddings
41
- for doc in documents:
42
-     print(doc.content)
43
- ```
44
-
45
- ## Want to help?
46
-
47
- If you want to contribute to swarmauri-sdk, read up on our [guidelines for contributing](https://github.com/swarmauri/swarmauri-sdk/blob/master/contributing.md) that will help you get started.