entity-ent 0.1.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2026 Beag Research
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
@@ -0,0 +1,116 @@
+ Metadata-Version: 2.4
+ Name: entity-ent
+ Version: 0.1.0
+ Summary: Entity-conditioned language model with category-theoretic reasoning
+ Author: beag labs
+ License: Apache-2.0
+ Project-URL: Repository, https://github.com/beaglabs/entity-ent
+ Project-URL: Models, https://huggingface.co/beaglabs
+ Keywords: entity-reasoning,category-theory,graph-reasoning,nlp,llm,language-model
+ Classifier: Development Status :: 3 - Alpha
+ Classifier: Intended Audience :: Science/Research
+ Classifier: License :: OSI Approved :: Apache Software License
+ Classifier: Programming Language :: Python :: 3
+ Classifier: Programming Language :: Python :: 3.10
+ Classifier: Programming Language :: Python :: 3.11
+ Classifier: Programming Language :: Python :: 3.12
+ Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
+ Requires-Python: >=3.10
+ Description-Content-Type: text/markdown
+ License-File: LICENSE
+ Requires-Dist: torch>=2.0
+ Requires-Dist: transformers>=4.40
+ Requires-Dist: peft>=0.10
+ Requires-Dist: safetensors>=0.4
+ Requires-Dist: tqdm>=4.64
+ Provides-Extra: serve
+ Requires-Dist: fastapi>=0.100; extra == "serve"
+ Requires-Dist: pydantic>=2.0; extra == "serve"
+ Provides-Extra: train
+ Requires-Dist: datasets>=2.14; extra == "train"
+ Requires-Dist: accelerate>=0.30; extra == "train"
+ Dynamic: license-file
+
+ # ent
+
+ Entity-conditioned language models with category-theoretic reasoning.
+
+ `ent` augments pretrained transformers with a structured entity stream: tokens
+ are mapped to hashed entity IDs, decoded into learned embeddings, and
+ injected into the LM hidden space before generation. The inference engine
+ layers on abstraction hierarchies, graph reasoning, working memory, and
+ deterministic program execution.
+
+ ## Install
+
+ ```bash
+ pip install entity-ent
+ ```
+
+ For serving:
+
+ ```bash
+ pip install entity-ent[serve]
+ ```
+
+ ## Quickstart
+
+ ```python
+ from transformers import AutoTokenizer
+ from ent.training.train import EntitySmolWrapper
+
+ base_model = "HuggingFaceTB/SmolLM2-1.7B-Instruct"
+ tokenizer = AutoTokenizer.from_pretrained(base_model)
+ tokenizer.pad_token = tokenizer.eos_token
+
+ model = EntitySmolWrapper.from_pretrained(
+     path="beaglabs/lancero-1.7B",
+     base_model_name=base_model,
+     device="cuda",
+     tokenizer=tokenizer,
+ )
+
+ response = model.chat([
+     {"role": "user", "content": "What type of entity is 'Paris'?"}
+ ])
+ print(response)
+ ```
+
+ Or load with `trust_remote_code` from HuggingFace:
+
+ ```python
+ from transformers import AutoModel, AutoTokenizer
+ model = AutoModel.from_pretrained("beaglabs/lancero-1.7B", trust_remote_code=True)
+ ```
+
+ ## Architecture
+
+ ```
+ Token stream ──→ Entity hashing ──→ Entity embeddings ──→ Projection
+      │                                                        │
+      ▼                                                        ▼
+ Token embeddings ───────────────────→ Sum ──→ Transformer ──→ Output
+ ```
+
+ The entity stream provides explicit identity, type, and structural
+ information that subword tokens cannot express. The broader engine includes:
+
+ - **Abstraction**: Type-aware is-a/specializes-to hierarchies
+ - **Graph reasoning**: BFS path finding between entity pairs
+ - **Working memory**: Confidence-scored active filtering
+ - **Durable memory**: Semantic and procedural records across calls
+ - **Program execution**: Deterministic arithmetic, comparison, deduction
+
+ Symbolic queries (entity resolution, type lookups, graph traversal, math)
+ run on hashes alone — zero neural inference. Open-ended generation falls
+ through to the entity-conditioned LM.
+
+ ## Models
+
+ | Model | Base | Link |
+ |-------|------|------|
+ | Lancero 1.7B | SmolLM2-1.7B-Instruct | [beaglabs/lancero-1.7B](https://huggingface.co/beaglabs/lancero-1.7B) |
+
+ ## License
+
+ Apache 2.0
@@ -0,0 +1,83 @@
+ # ent
+
+ Entity-conditioned language models with category-theoretic reasoning.
+
+ `ent` augments pretrained transformers with a structured entity stream: tokens
+ are mapped to hashed entity IDs, decoded into learned embeddings, and
+ injected into the LM hidden space before generation. The inference engine
+ layers on abstraction hierarchies, graph reasoning, working memory, and
+ deterministic program execution.
+
+ ## Install
+
+ ```bash
+ pip install entity-ent
+ ```
+
+ For serving:
+
+ ```bash
+ pip install entity-ent[serve]
+ ```
+
+ ## Quickstart
+
+ ```python
+ from transformers import AutoTokenizer
+ from ent.training.train import EntitySmolWrapper
+
+ base_model = "HuggingFaceTB/SmolLM2-1.7B-Instruct"
+ tokenizer = AutoTokenizer.from_pretrained(base_model)
+ tokenizer.pad_token = tokenizer.eos_token
+
+ model = EntitySmolWrapper.from_pretrained(
+     path="beaglabs/lancero-1.7B",
+     base_model_name=base_model,
+     device="cuda",
+     tokenizer=tokenizer,
+ )
+
+ response = model.chat([
+     {"role": "user", "content": "What type of entity is 'Paris'?"}
+ ])
+ print(response)
+ ```
+
+ Or load with `trust_remote_code` from HuggingFace:
+
+ ```python
+ from transformers import AutoModel, AutoTokenizer
+ model = AutoModel.from_pretrained("beaglabs/lancero-1.7B", trust_remote_code=True)
+ ```
+
+ ## Architecture
+
+ ```
+ Token stream ──→ Entity hashing ──→ Entity embeddings ──→ Projection
+      │                                                        │
+      ▼                                                        ▼
+ Token embeddings ───────────────────→ Sum ──→ Transformer ──→ Output
+ ```
+
+ The entity stream provides explicit identity, type, and structural
+ information that subword tokens cannot express. The broader engine includes:
+
+ - **Abstraction**: Type-aware is-a/specializes-to hierarchies
+ - **Graph reasoning**: BFS path finding between entity pairs
+ - **Working memory**: Confidence-scored active filtering
+ - **Durable memory**: Semantic and procedural records across calls
+ - **Program execution**: Deterministic arithmetic, comparison, deduction
+
+ Symbolic queries (entity resolution, type lookups, graph traversal, math)
+ run on hashes alone — zero neural inference. Open-ended generation falls
+ through to the entity-conditioned LM.
+
+ ## Models
+
+ | Model | Base | Link |
+ |-------|------|------|
+ | Lancero 1.7B | SmolLM2-1.7B-Instruct | [beaglabs/lancero-1.7B](https://huggingface.co/beaglabs/lancero-1.7B) |
+
+ ## License
+
+ Apache 2.0
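
The README's "Graph reasoning" bullet mentions BFS path finding between entity pairs, but the package's graph API is not visible in this diff. As a rough illustration of the technique only, here is a generic breadth-first shortest-path search over an adjacency dict. The function name `bfs_path`, the use of entity names as node keys (rather than hashes), and the sample is-a edges are all assumptions made for this sketch.

```python
from collections import deque


def bfs_path(edges, start, goal):
    """Return a shortest path from start to goal as a list of nodes, or None."""
    if start == goal:
        return [start]
    parents = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in edges.get(node, ()):
            if neighbour in parents:
                continue
            parents[neighbour] = node
            if neighbour == goal:
                # Walk the parent links back to the start, then reverse.
                path = [goal]
                while parents[path[-1]] is not None:
                    path.append(parents[path[-1]])
                return path[::-1]
            queue.append(neighbour)
    return None


# Hypothetical directed is-a edges, keyed by entity name.
edges = {"Paris": ["city"], "city": ["place"], "Lyon": ["city"], "place": []}
forward = bfs_path(edges, "Paris", "place")
backward = bfs_path(edges, "place", "Paris")
```

Because the edges are directed, the search finds a path from `"Paris"` to `"place"` but not the reverse; an undirected graph would need each edge stored in both directions.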
File without changes
@@ -0,0 +1,29 @@
+ """Categorical attention: type-aware attention biases.
+
+ Encodes entity-type relationships as additive query+key biases per type.
+ O(seq²) memory (one (batch, 1, seq, seq) tensor) but computed via fast
+ broadcast add — no gather or outer-product overhead.
+ """
+
+ import torch
+ import torch.nn as nn
+
+
+ class CategoricalAttentionBias(nn.Module):
+     def __init__(self, num_heads: int, num_types: int = 32):  # num_heads is not used below
+         super().__init__()
+         self.q_bias = nn.Embedding(num_types, 1)  # one scalar per type, query side
+         self.k_bias = nn.Embedding(num_types, 1)  # one scalar per type, key side
+
+     def forward(
+         self,
+         token_types: torch.LongTensor,
+         attention_mask: torch.Tensor | None = None,
+     ) -> torch.Tensor:
+         q = self.q_bias(token_types).squeeze(-1)  # (batch, seq)
+         k = self.k_bias(token_types).squeeze(-1)  # (batch, seq)
+         bias = q.unsqueeze(-1) + k.unsqueeze(-2)  # (batch, seq, seq) broadcast-add
+         bias = bias.unsqueeze(1)  # (batch, 1, seq, seq), shared across heads
+         if attention_mask is not None:  # mask must broadcast to (batch, 1, seq, seq) after unsqueeze(1), e.g. (batch, seq, seq)
+             bias = bias.masked_fill(~attention_mask.bool().unsqueeze(1), float("-inf"))
+         return bias
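
To make the broadcast-add concrete: for query position `i` and key position `j`, the bias is `q_bias[type_i] + k_bias[type_j]`. The plain-Python sketch below computes that table for a single sequence. The helper name `type_attention_bias`, the sample bias tables, and the per-key padding mask convention are assumptions for illustration, not the package's API.

```python
import math


def type_attention_bias(token_types, q_bias, k_bias, key_mask=None):
    """Return a seq x seq additive bias: q_bias[type_i] + k_bias[type_j].

    key_mask[j] == False marks key position j as padding (set to -inf).
    """
    seq = len(token_types)
    bias = [
        [q_bias[token_types[i]] + k_bias[token_types[j]] for j in range(seq)]
        for i in range(seq)
    ]
    if key_mask is not None:
        for row in bias:
            for j, keep in enumerate(key_mask):
                if not keep:
                    row[j] = -math.inf
    return bias


# Hypothetical values: 3 types, 3 tokens, the last token is padding.
q_bias = [0.0, 0.5, -1.0]
k_bias = [0.25, 0.0, 2.0]
bias = type_attention_bias([0, 2, 1], q_bias, k_bias, key_mask=[True, True, False])
```

One consequence worth noting, if the bias is added to pre-softmax attention scores: softmax is unchanged by adding a constant to a whole row, so the query-side term does not alter the resulting attention weights. Only the key-side term does.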
@@ -0,0 +1,64 @@
+ """baseline — hash-based entity decoder.
+
+ Maps compact entity hashes through a learned decoder that jointly predicts
+ type, class, and scope — the model discovers categorical structure itself
+ rather than relying on hand-crafted semantic token strings.
+
+ Contrast with semantic CEIDs (robot.recon.w11.a = 4+ tokens):
+ - Hash-based: single compact ID → learned structured representation
+ - Supports categorical attention by providing type/class/scope predictions
+ - Trainable end-to-end: the decoder IS the functor
+ """
+
+ import torch
+ import torch.nn as nn
+ import torch.nn.functional as F
+
+
+ class EntityDecoder(nn.Module):
+     def __init__(
+         self,
+         num_entities: int = 131072,
+         hidden_dim: int = 896,
+         num_types: int = 32,
+         num_classes: int = 64,
+         num_scopes: int = 128,  # note: the 17-bit hash layout has a 3-bit scope field (8 values)
+     ):
+         super().__init__()
+         self.entity_embed = nn.Embedding(num_entities, hidden_dim)
+         self.type_head = nn.Linear(hidden_dim, num_types)
+         self.class_head = nn.Linear(hidden_dim, num_classes)
+         self.scope_head = nn.Linear(hidden_dim, num_scopes)
+         self.num_types = num_types
+         self.num_classes = num_classes
+         self.num_scopes = num_scopes
+
+     def forward(
+         self, entity_ids: torch.LongTensor
+     ) -> tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor]:
+         e = self.entity_embed(entity_ids)  # (..., hidden_dim)
+         type_logits = self.type_head(e)  # (..., num_types)
+         class_logits = self.class_head(e)  # (..., num_classes)
+         scope_logits = self.scope_head(e)  # (..., num_scopes)
+         return e, type_logits, class_logits, scope_logits
+
+     def compute_structural_loss(
+         self,
+         type_logits: torch.Tensor,
+         class_logits: torch.Tensor,
+         scope_logits: torch.Tensor,
+         type_labels: torch.LongTensor,
+         class_labels: torch.LongTensor,
+         scope_labels: torch.LongTensor,
+         entity_mask: torch.Tensor,  # used for boolean indexing, so expected to be a bool tensor
+     ) -> torch.Tensor:
+         type_loss = F.cross_entropy(
+             type_logits[entity_mask], type_labels[entity_mask]
+         )
+         class_loss = F.cross_entropy(
+             class_logits[entity_mask], class_labels[entity_mask]
+         )
+         scope_loss = F.cross_entropy(
+             scope_logits[entity_mask], scope_labels[entity_mask]
+         )
+         return type_loss + class_loss + scope_loss  # unweighted sum of the three mean losses
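
The structural loss above is an unweighted sum of three mean cross-entropies, each taken only over positions selected by `entity_mask`. The dependency-free sketch below reproduces that arithmetic on nested lists so the effect of the mask is easy to check by hand. The function names, the `heads` dict layout, and the sample numbers are assumptions for this illustration.

```python
import math


def cross_entropy(logits, labels):
    """Mean negative log-likelihood over rows (the 'mean' reduction)."""
    total = 0.0
    for row, label in zip(logits, labels):
        peak = max(row)  # subtract the max for numerical stability
        log_norm = peak + math.log(sum(math.exp(x - peak) for x in row))
        total += log_norm - row[label]
    return total / len(labels)


def structural_loss(heads, entity_mask):
    """Sum the per-head losses over positions where entity_mask is True.

    heads maps a head name to (logits, labels), one row per position.
    """
    keep = [i for i, flag in enumerate(entity_mask) if flag]
    loss = 0.0
    for logits, labels in heads.values():
        loss += cross_entropy([logits[i] for i in keep], [labels[i] for i in keep])
    return loss


# Hypothetical data: two classes per head, three positions, the last one masked out.
uniform = ([[0.0, 0.0], [0.0, 0.0], [5.0, 0.0]], [0, 1, 1])
heads = {"type": uniform, "class": uniform, "scope": uniform}
loss = structural_loss(heads, entity_mask=[True, True, False])
```

With the confidently wrong third position masked out, each head contributes `ln 2`, so the total is `3 * ln 2`. In this sketch an all-False mask leaves nothing to average and raises `ZeroDivisionError`, so a caller would want to guard that case.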
@@ -0,0 +1,73 @@
+ """Schema-based entity hash encoding.
+
+ 17-bit layout (131072 entities):
+   [16:12]  type      (5 bits, 0-31)
+   [11:6]   class     (6 bits, 0-63)
+   [5:3]    scope     (3 bits, 0-7)
+   [2]      arity     (1 bit, low=0 high=1)
+   [1]      role      (1 bit, caller=0 callee=1)
+   [0]      morphism  (1 bit, static=0 dynamic=1)
+ """
+
+ import torch
+
+ SHIFT_TYPE = 12
+ SHIFT_CLASS = 6
+ SHIFT_SCOPE = 3
+ SHIFT_ARITY = 2
+ SHIFT_ROLE = 1
+ SHIFT_MORPHISM = 0
+
+ MASK_TYPE = 0x1F
+ MASK_CLASS = 0x3F
+ MASK_SCOPE = 0x7
+ MASK_ARITY = 0x1
+ MASK_ROLE = 0x1
+ MASK_MORPHISM = 0x1
+
+ NUM_ENTITIES = 1 << 17
+ NUM_TYPES = 1 << 5
+ NUM_CLASSES = 1 << 6
+ NUM_SCOPES = 1 << 3
+
+
+ def encode(
+     type_id: int,
+     class_id: int,
+     scope_id: int,
+     arity: int,
+     role: int,
+     morphism: int,
+ ) -> int:
+     return (
+         (type_id << SHIFT_TYPE)
+         | (class_id << SHIFT_CLASS)
+         | (scope_id << SHIFT_SCOPE)
+         | (arity << SHIFT_ARITY)
+         | (role << SHIFT_ROLE)
+         | morphism
+     )
+
+
+ def get_type(entity_hash: torch.Tensor) -> torch.Tensor:
+     return (entity_hash >> SHIFT_TYPE) & MASK_TYPE
+
+
+ def get_class(entity_hash: torch.Tensor) -> torch.Tensor:
+     return (entity_hash >> SHIFT_CLASS) & MASK_CLASS
+
+
+ def get_scope(entity_hash: torch.Tensor) -> torch.Tensor:
+     return (entity_hash >> SHIFT_SCOPE) & MASK_SCOPE
+
+
+ def get_arity(entity_hash: torch.Tensor) -> torch.Tensor:
+     return (entity_hash >> SHIFT_ARITY) & MASK_ARITY
+
+
+ def get_role(entity_hash: torch.Tensor) -> torch.Tensor:
+     return (entity_hash >> SHIFT_ROLE) & MASK_ROLE
+
+
+ def get_morphism(entity_hash: torch.Tensor) -> torch.Tensor:
+     return (entity_hash >> SHIFT_MORPHISM) & MASK_MORPHISM
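
The 17-bit layout can be checked with a round trip. The sketch below mirrors the shifts and masks from the file using plain Python integers, so it runs without `torch`. The `FIELDS` table, the `decode` helper, the range check inside `encode`, and the sample values are additions for illustration and are not part of the package.

```python
# Shifts and masks copied from the layout above; FIELDS and decode() are hypothetical.
FIELDS = {
    # name: (shift, mask)
    "type": (12, 0x1F),
    "class": (6, 0x3F),
    "scope": (3, 0x7),
    "arity": (2, 0x1),
    "role": (1, 0x1),
    "morphism": (0, 0x1),
}


def encode(**values: int) -> int:
    """Pack the six fields into one 17-bit integer, rejecting out-of-range values."""
    packed = 0
    for name, (shift, mask) in FIELDS.items():
        value = values[name]
        if not 0 <= value <= mask:
            raise ValueError(f"{name}={value} does not fit in mask {mask:#x}")
        packed |= value << shift
    return packed


def decode(entity_hash: int) -> dict[str, int]:
    """Unpack every field with the same shift-then-mask rule as the get_* helpers."""
    return {name: (entity_hash >> shift) & mask for name, (shift, mask) in FIELDS.items()}


sample = {"type": 3, "class": 10, "scope": 5, "arity": 1, "role": 0, "morphism": 1}
entity_hash = encode(**sample)      # 12288 + 640 + 40 + 4 + 0 + 1 = 12973
round_trip = decode(entity_hash)    # equals sample
```

The `encode` shown in the diff does not range-check its arguments, so an oversized field (for example `class_id=64`) would spill into the neighbouring `type` bits. The check in this sketch is one way to surface that early.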
@@ -0,0 +1,60 @@
+ """Murmuration router: lightweight functor routing.
+
+ Routes tokens to specialized functor modules based on content and
+ CEID scope (local interaction rules rather than global learned gating).
+
+ Enabled in Phase 3 — disabled during Phase 1 pretraining.
+ """
+
+ import torch
+ import torch.nn as nn
+ import torch.nn.functional as F
+
+
+ class MurmurationRouter(nn.Module):
+     def __init__(
+         self,
+         hidden_size: int,
+         routing_dim: int = 512,
+         num_functors: int = 8,
+         top_k: int = 2,
+         num_scopes: int = 8,
+     ):
+         super().__init__()
+         self.hidden_size = hidden_size
+         self.num_functors = num_functors
+         self.top_k = top_k
+         self.gate = nn.Sequential(
+             nn.Linear(hidden_size, routing_dim),
+             nn.SiLU(),
+             nn.Linear(routing_dim, num_functors),
+         )
+         self.scope_bias = nn.Embedding(num_scopes, num_functors)
+
+     def forward(
+         self,
+         hidden_states: torch.Tensor,
+         scope_ids: torch.LongTensor | None = None,
+     ) -> tuple[torch.Tensor, torch.Tensor]:
+         logits = self.gate(hidden_states)
+         if scope_ids is not None:
+             logits = logits + self.scope_bias(scope_ids)
+         routing_weights = F.softmax(logits, dim=-1)
+         top_k_weights, top_k_indices = torch.topk(routing_weights, self.top_k, dim=-1)
+         top_k_weights = top_k_weights / top_k_weights.sum(dim=-1, keepdim=True)
+         return top_k_weights, top_k_indices
+
+
+ class FunctorModule(nn.Module):
+     """A single functor — a small FFN specialized for one morphism class."""
+
+     def __init__(self, hidden_size: int, intermediate_mult: int = 1):
+         super().__init__()
+         self.net = nn.Sequential(
+             nn.Linear(hidden_size, hidden_size * intermediate_mult),
+             nn.GELU(),
+             nn.Linear(hidden_size * intermediate_mult, hidden_size),
+         )
+
+     def forward(self, x: torch.Tensor) -> torch.Tensor:
+         return self.net(x)
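
The router's forward pass has three steps: add an optional per-scope bias to the gate logits, take a softmax, then keep the top-k probabilities and renormalise them to sum to 1. The plain-Python sketch below walks through that arithmetic for one token. The function name `route` and the sample logits are assumptions for illustration; the learned gate MLP is replaced here by hand-picked logits.

```python
import math


def route(logits, top_k=2, scope_bias=None):
    """Softmax over functor logits, keep the top_k, renormalise them to sum to 1."""
    if scope_bias is not None:
        logits = [x + b for x, b in zip(logits, scope_bias)]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Indices of the top_k largest probabilities, highest first.
    indices = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)[:top_k]
    kept = sum(probs[i] for i in indices)
    weights = [probs[i] / kept for i in indices]
    return weights, indices


# Hypothetical gate logits for four functors.
weights, indices = route([2.0, 0.0, 1.0, -1.0], top_k=2)
# The same logits with a scope bias that favours functor 2.
biased_weights, biased_indices = route(
    [2.0, 0.0, 1.0, -1.0], top_k=2, scope_bias=[0.0, 0.0, 3.0, 0.0]
)
```

After renormalisation the kept weights depend only on the gap between the selected logits, so the scope bias can both reorder the selection and shift how the weight is split.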
@@ -0,0 +1,65 @@
+ """ent — entity-conditioned language models with category-theoretic reasoning.
+
+ Key components:
+     category — EntityCategory, EntityMorphism, EntityFunctor, Subcategory
+     store — EntityStore (entity decoder weights, fast lookups)
+     reasoner — EntityReasoner (symbolic queries over entity graphs)
+     decoder — EntityConditionedDecoder (small LM for text generation)
+     inference — EntInferenceEngine (abstraction, graph reasoning, working memory, program execution)
+     training — EntitySmolWrapper (entity-conditioned generation with LoRA)
+     serving — FastAPI server with OpenAI-compatible /v1/chat/completions
+ """
+
+ from ent.core.category import (
+     EntityObject,
+     EntityMorphism,
+     EntityCategory,
+     StructuralMorphism,
+     EmbeddingFunctor,
+     TypeClassifierFunctor,
+     Subcategory,
+ )
+ from ent.core.store import EntityStore
+ from ent.core.reasoner import EntityGraph, EntityReasoner
+ from ent.models.decoder import DecoderConfig, EntityConditionedDecoder
+ from ent.core.inference import (
+     AbstractionNode,
+     AbstractionEdge,
+     AbstractionGraph,
+     MemoryItem,
+     DurableMemoryRecord,
+     DurableMemoryStore,
+     CandidateAnswer,
+     InferenceState,
+     EntityAbstractionLayer,
+     ProgramExecutor,
+     EntInferenceEngine,
+ )
+
+ __all__ = [
+     "EntityObject",
+     "EntityMorphism",
+     "EntityCategory",
+     "StructuralMorphism",
+     "EmbeddingFunctor",
+     "TypeClassifierFunctor",
+     "Subcategory",
+     "EntityStore",
+     "EntityGraph",
+     "EntityReasoner",
+     "DecoderConfig",
+     "EntityConditionedDecoder",
+     "AbstractionNode",
+     "AbstractionEdge",
+     "AbstractionGraph",
+     "MemoryItem",
+     "DurableMemoryRecord",
+     "DurableMemoryStore",
+     "CandidateAnswer",
+     "InferenceState",
+     "EntityAbstractionLayer",
+     "ProgramExecutor",
+     "EntInferenceEngine",
+ ]
+
+ __version__ = "0.1.0"
@@ -0,0 +1 @@
+ """Core entity reasoning primitives."""