englisp-1.0.0.tar.gz

Files changed (42)
  1. englisp-1.0.0/PKG-INFO +366 -0
  2. englisp-1.0.0/README.md +350 -0
  3. englisp-1.0.0/englisp/__init__.py +96 -0
  4. englisp-1.0.0/englisp/canonicalizer.py +1242 -0
  5. englisp-1.0.0/englisp/compiler.py +895 -0
  6. englisp-1.0.0/englisp/graph_db.py +283 -0
  7. englisp-1.0.0/englisp/interpreter.py +850 -0
  8. englisp-1.0.0/englisp/loader.py +406 -0
  9. englisp-1.0.0/englisp/minimizer.py +380 -0
  10. englisp-1.0.0/englisp/ontology.py +100 -0
  11. englisp-1.0.0/englisp/parser.py +764 -0
  12. englisp-1.0.0/englisp/xbar.py +94 -0
  13. englisp-1.0.0/englisp.egg-info/PKG-INFO +366 -0
  14. englisp-1.0.0/englisp.egg-info/SOURCES.txt +40 -0
  15. englisp-1.0.0/englisp.egg-info/dependency_links.txt +1 -0
  16. englisp-1.0.0/englisp.egg-info/requires.txt +6 -0
  17. englisp-1.0.0/englisp.egg-info/top_level.txt +1 -0
  18. englisp-1.0.0/pyproject.toml +51 -0
  19. englisp-1.0.0/setup.cfg +4 -0
  20. englisp-1.0.0/tests/test_admin_auth.py +111 -0
  21. englisp-1.0.0/tests/test_api.py +67 -0
  22. englisp-1.0.0/tests/test_auth.py +156 -0
  23. englisp-1.0.0/tests/test_canonicalizer.py +86 -0
  24. englisp-1.0.0/tests/test_compiler.py +131 -0
  25. englisp-1.0.0/tests/test_dag.py +64 -0
  26. englisp-1.0.0/tests/test_db_compiler.py +88 -0
  27. englisp-1.0.0/tests/test_email_recovery.py +122 -0
  28. englisp-1.0.0/tests/test_inference.py +108 -0
  29. englisp-1.0.0/tests/test_interpreter.py +125 -0
  30. englisp-1.0.0/tests/test_launch_improvements.py +92 -0
  31. englisp-1.0.0/tests/test_minimizer.py +126 -0
  32. englisp-1.0.0/tests/test_multilingual.py +156 -0
  33. englisp-1.0.0/tests/test_ontology.py +87 -0
  34. englisp-1.0.0/tests/test_parser.py +113 -0
  35. englisp-1.0.0/tests/test_pragmatics.py +87 -0
  36. englisp-1.0.0/tests/test_primes.py +79 -0
  37. englisp-1.0.0/tests/test_programming.py +172 -0
  38. englisp-1.0.0/tests/test_rdf_export.py +83 -0
  39. englisp-1.0.0/tests/test_robustness.py +108 -0
  40. englisp-1.0.0/tests/test_subscriptions.py +118 -0
  41. englisp-1.0.0/tests/test_tense.py +137 -0
  42. englisp-1.0.0/tests/test_validation.py +89 -0
englisp-1.0.0/PKG-INFO ADDED
@@ -0,0 +1,366 @@
1
+ Metadata-Version: 2.4
2
+ Name: englisp
3
+ Version: 1.0.0
4
+ Summary: A Bidirectional Bridge Between Natural Language and Computation
5
+ Author-email: Russell Shen <russellshen7@gmail.com>
6
+ License-Expression: CC-BY-NC-ND-4.0
7
+ Classifier: Programming Language :: Python :: 3
8
+ Classifier: Operating System :: OS Independent
9
+ Requires-Python: >=3.13
10
+ Description-Content-Type: text/markdown
11
+ Requires-Dist: fastapi>=0.110.0
12
+ Requires-Dist: uvicorn>=0.28.0
13
+ Provides-Extra: dev
14
+ Requires-Dist: pytest>=8.0.0; extra == "dev"
15
+ Requires-Dist: httpx>=0.27.0; extra == "dev"
16
+
17
+ # EngLISP: A Bidirectional Bridge Between Natural Language and Computation
18
+
19
+ EngLISP is a structured, bidirectional translation system that bridges the gap between the expressive, ambiguous world of human language and the precise, functional world of computation.
20
+
21
+ This repository implements the full four-stage pipeline described in the [EngLISP Technical Specification](SPECIFICATION.md), enabling seamless round-trip conversions between human sentences, X-bar syntactic trees, rotated S-expressions, and compressed MinimaLIST configurations.
22
+
23
+ ---
24
+
25
+ ## Documentation Guides
26
+
27
+ To dive deeper into the theoretical, practical, and lexical design of the EngLISP engine, explore the following documentation:
28
+
29
+ * **[EngLISP Technical Specification](SPECIFICATION.md)**: Defines the formal specifications of the 4-stage translation pipeline, S-expression rotation rules, semantic minimization algorithms, and native Lisp/Scheme compile-time macro expansion.
30
+ * **[Lexical & Disambiguation Strategies](LEXICAL_STRATEGIES.md)**: Explains the scaling strategy for large-scale lexicons, including mapping grammatical terminals to language-neutral Synset IDs (BabelNet/WordNet), context-aware Lesk Word Sense Disambiguation (WSD), and Earley chart parsing rules.
31
+ * **[Use-Cases & Applications Guide](USE_CASES.md)**: Explores downstream deployment scenarios such as neuro-symbolic LLM pipeline injection, zero-redundancy network serialization, multi-agent protocols, exact semantic search indexing, and explainable AI audit trails.
32
+
33
+ ---
34
+
35
+ ## The Four-Stage Pipeline
36
+
37
+ ```
38
+ Stage 1: Natural Language
39
+ ↕ (Parsing & Generation)
40
+ Stage 2: X-bar Syntax Tree (Linguistic IR)
41
+ ↕ (Rotation Canonicalization)
42
+ Stage 3: EngLISP S-Expression (Computational IR)
43
+ ↕ (Semantic Minimization & Expansion)
44
+ Stage 4: MinimaLIST EngLISP (Minimal sufficient form)
45
+ ```
46
+
47
+ 1. **Stage 1 — Natural Language**: Input English text (e.g., *"The quick brown fox jumped over the lazy dog"*).
48
+ 2. **Stage 2 — X-bar Tree**: A linguistically faithful syntactic tree encoding structural head-complement-specifier relations.
49
+ 3. **Stage 3 — EngLISP S-Expression**: A rotated Lisp S-expression putting semantic heads/verbs first at each level, e.g., `(jumped (fox the quick brown) (over (dog the lazy)))`.
50
+ 4. **Stage 4 — MinimaLIST EngLISP**: A minimized sufficient representation that eliminates redundant surface elements (like default determiners) and simplifies logic (double negation elimination), e.g., `(jumped (fox quick brown) (over (dog lazy)))`.
51
+
52
+ ---
53
+
54
+ ## Pipeline Walkthrough Example
55
+
56
+ To illustrate the forward and reverse transformations, here is the complete life cycle of the sentence *"The dog can chase the cat."*:
57
+
58
+ ### Forward Pipeline (Stage 1 &rarr; Stage 4)
59
+
60
+ 1. **Stage 1 (Natural Language)**:
61
+ ```
62
+ "The dog can chase the cat."
63
+ ```
64
+
65
+ 2. **Stage 2 (X-bar Syntax Tree)**:
66
+ The sentence is parsed into a hierarchical constituency tree:
67
+ ```
68
+ IP [phrase]
69
+ NP [specifier]
70
+ Det [specifier]: "the"
71
+ N' [bar]
72
+ N [head]: "dog"
73
+ I' [bar]
74
+ I [head]: "can"
75
+ VP [complement]
76
+ V' [bar]
77
+ V [head]: "chase"
78
+ NP [complement]
79
+ Det [specifier]: "the"
80
+ N' [bar]
81
+ N [head]: "cat"
82
+ ```
83
+
84
+ 3. **Stage 3 (EngLISP S-Expression)**:
85
+ The syntax tree is canonicalized and rotated. The inflection head `"can"` wraps the clause, and the main verb `"chase"` is rotated to the front of the inner expression:
86
+ ```lisp
87
+ (can (chase (dog the) (cat the)))
88
+ ```
89
+
90
+ 4. **Stage 4 (MinimaLIST EngLISP)**:
91
+ The S-expression is semantically compressed. Default determiners (`the`) are pruned to reduce redundancy:
92
+ ```lisp
93
+ (can (chase dog cat))
94
+ ```
95
+
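The Stage 2 to Stage 3 rotation shown in the walkthrough above can be sketched on a toy tree. This is a minimal illustration, not the package's canonicalizer: `Node`, `flatten`, and `rotate` are hypothetical names, and the rule used here (head first, subject moved under the verb, inflection head wrapping the clause) is inferred only from the example output.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """Hypothetical stand-in for an X-bar node (not the package's XBarNode)."""
    category: str                 # e.g. "IP", "NP", "V"
    role: str                     # "phrase", "specifier", "bar", "head", "complement"
    word: Optional[str] = None    # set on terminals only
    children: list = field(default_factory=list)


def flatten(node):
    """Collect (head word, specifiers, complements), looking through bar levels."""
    head, specs, comps = None, [], []
    for child in node.children:
        if child.role == "bar":
            h, s, c = flatten(child)
            head = h or head
            specs += s
            comps += c
        elif child.role == "head":
            head = child.word
        elif child.role == "specifier":
            specs.append(child)
        elif child.role == "complement":
            comps.append(child)
    return head, specs, comps


def rotate(node):
    """Rotate a phrase into a head-first nested list."""
    if node.word is not None:          # terminal: just the word
        return node.word
    head, specs, comps = flatten(node)
    if node.category == "IP":
        # The inflection head wraps the clause; the subject becomes
        # the first argument of the main verb.
        verb_expr = rotate(comps[0])
        if not isinstance(verb_expr, list):
            verb_expr = [verb_expr]
        clause = [verb_expr[0]] + [rotate(s) for s in specs] + verb_expr[1:]
        return [head, clause]
    parts = [head] + [rotate(s) for s in specs] + [rotate(c) for c in comps]
    return parts if len(parts) > 1 else head


def noun_phrase(role, det, noun):
    return Node("NP", role, children=[
        Node("Det", "specifier", det),
        Node("N'", "bar", children=[Node("N", "head", noun)]),
    ])


tree = Node("IP", "phrase", children=[
    noun_phrase("specifier", "the", "dog"),
    Node("I'", "bar", children=[
        Node("I", "head", "can"),
        Node("VP", "complement", children=[
            Node("V'", "bar", children=[
                Node("V", "head", "chase"),
                noun_phrase("complement", "the", "cat"),
            ]),
        ]),
    ]),
])

print(rotate(tree))  # ['can', ['chase', ['dog', 'the'], ['cat', 'the']]]
```

The sketch covers only the modal-plus-transitive-verb shape used in the walkthrough; other clause types would need additional rules.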
96
+ ---
97
+
98
+ ### Reverse Pipeline (Stage 4 &rarr; Stage 1)
99
+
100
+ 1. **Stage 4 (MinimaLIST)**:
101
+ ```lisp
102
+ (can (chase dog cat))
103
+ ```
104
+
105
+ 2. **Stage 3 (EngLISP S-Expression)**:
106
+ The minimizer expands the bare noun arguments back to their canonical noun phrase lists by reintroducing the default determiner (`the`):
107
+ ```lisp
108
+ (can (chase (dog the) (cat the)))
109
+ ```
110
+
111
+ 3. **Stage 2 (X-bar Tree)**:
112
+ The canonicalizer maps the S-expression back to the hierarchical X-bar syntax tree (identical to the parsed tree shown above).
113
+
114
+ 4. **Stage 1 (Natural Language)**:
115
+ The generator traverses the leaf terminals, handles spacing/capitalization, and synthesizes:
116
+ ```
117
+ "The dog can chase the cat."
118
+ ```
119
+
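The reverse steps above (determiner reintroduction, then linearization) can be sketched for this one sentence shape. This is an illustrative toy under stated assumptions, not the package's minimizer or generator: `NOUNS`, `expand`, `noun_phrase`, and `linearize` are hypothetical, and `linearize` only handles the `(modal (verb subject object))` pattern.

```python
DEFAULT_DETERMINER = "the"
NOUNS = {"dog", "cat"}  # hypothetical mini-lexicon for the example


def expand(sexpr):
    """Turn bare noun arguments back into (noun determiner) lists."""
    if isinstance(sexpr, str):
        return [sexpr, DEFAULT_DETERMINER] if sexpr in NOUNS else sexpr
    head, *args = sexpr
    # The head stays as it is; only arguments are expanded.
    return [head] + [expand(arg) for arg in args]


def noun_phrase(np):
    """Head-first (noun det mod ...) becomes surface order 'det mod ... noun'."""
    if isinstance(np, str):
        return np
    noun, *modifiers = np
    return " ".join(modifiers + [noun])


def linearize(sexpr):
    """Generate a sentence from (modal (verb subject object))."""
    modal, (verb, subject, obj) = sexpr
    text = " ".join([noun_phrase(subject), modal, verb, noun_phrase(obj)])
    return text[0].upper() + text[1:] + "."


minimal = ["can", ["chase", "dog", "cat"]]
canonical = expand(minimal)
print(canonical)             # ['can', ['chase', ['dog', 'the'], ['cat', 'the']]]
print(linearize(canonical))  # The dog can chase the cat.
```

Because `expand` leaves already-expanded noun phrases untouched, applying it twice gives the same result as applying it once.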
120
+ ---
121
+
122
+ ## LSON (Lisp Symbolic Object Notation) Data Format
123
+
124
+ EngLISP S-expressions and MinimaLIST structures are serialized using **LSON**, a data representation format designed to improve on JSON in three ways:
125
+ * **Boilerplate-Free Syntax**: Strips away JSON's commas, colons, and quote-redundancies, yielding a compact byte footprint.
126
+ * **Native Graph Sharing**: Uses Lisp anchor `#N=` and backreference `#N#` syntax to serialize circular networks and shared memory nodes natively (DAG hash-consing).
127
+ * **Homoiconicity**: Functions simultaneously as structured data and an executable Abstract Syntax Tree (AST).
128
+
129
+ For a detailed comparison and payload size measurements, see the [EngLISP Use-Cases & Applications Guide](USE_CASES.md#E).
130
+
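To make the `#N=` / `#N#` sharing idea concrete, here is a small sketch of a serializer that labels any list reached more than once. It is not the package's `to_string`; `to_lson` is a hypothetical name, and detecting sharing by Python object identity is an assumption made for illustration. The JSON comparison uses only the standard library.

```python
import json


def to_lson(sexpr):
    """Serialize nested lists, labelling shared list objects with #N= / #N#."""
    counts = {}

    def count(node):
        # First pass: count how many times each list object is reached.
        if isinstance(node, list):
            counts[id(node)] = counts.get(id(node), 0) + 1
            if counts[id(node)] == 1:      # descend only on the first visit
                for child in node:
                    count(child)

    count(sexpr)
    labels = {}

    def emit(node):
        if not isinstance(node, list):
            return str(node)
        key = id(node)
        if key in labels:                  # already written: emit a backreference
            return f"#{labels[key]}#"
        prefix = ""
        if counts[key] > 1:                # shared node: anchor it on first use
            labels[key] = len(labels) + 1
            prefix = f"#{labels[key]}="
        return prefix + "(" + " ".join(emit(child) for child in node) + ")"

    return emit(sexpr)


dog = ["dog", "the"]
dag = ["and", ["chased", dog, "cat"], ["barked", dog]]

lson_text = to_lson(dag)
json_text = json.dumps(dag, separators=(",", ":"))  # most compact JSON form
print(lson_text)  # (and (chased #1=(dog the) cat) (barked #1#))
print(len(lson_text), len(json_text))
```

Note that JSON has no way to express that both occurrences of `dog` are the same node, so it writes the sub-list twice.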
131
+ ---
132
+
133
+ ## Features
134
+
135
+ - **Bidirectional Transformations**: Edit any computational S-expression (Stage 3 or 4) and generate the corresponding natural language sentence and syntactic tree.
136
+ - **Glassmorphic Web Dashboard**: An interactive, modern dark-mode visual interface to explore the stages in real-time.
137
+ - **Dynamic SVG Tree Renderer**: Displays collapsible, color-coded X-bar constituent trees directly in the browser.
138
+ - **Minimization Optimizer**: Automatic rewrite rules for:
139
+ - Double negation removal: `(not (not happy))` &rarr; `happy`
140
+ - Negation-to-antonym reduction: `(not happy)` &rarr; `sad`
141
+ - Passive-to-Active voice restructuring: `(was chased by X)` &rarr; `(chased X)`
142
+ - Modifier compound semantic compression: `(dog young)` &rarr; `puppy`
143
+ - Determiner pruning: `(dog the)` &rarr; `dog`
144
+
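Several of the rewrite rules listed above can be sketched as a single recursive pass over nested lists. This is a hedged illustration rather than the package's minimizer: the lookup tables and the `minimize` function are hypothetical, the rule ordering is an assumption, and the passive-to-active rule is left out.

```python
# Hypothetical lookup tables; the real lexicon is not part of this sketch.
ANTONYMS = {"happy": "sad"}
COMPOUNDS = {("dog", "young"): "puppy"}
DEFAULT_DETERMINERS = {"the"}


def is_negation(expr):
    return isinstance(expr, list) and len(expr) == 2 and expr[0] == "not"


def minimize(expr):
    """Apply the rewrite rules to a non-empty nested list or an atom."""
    if isinstance(expr, str):
        return expr
    # Double negation is checked before recursing, so (not (not x))
    # is not rewritten to (not antonym-of-x) first.
    if is_negation(expr) and is_negation(expr[1]):
        return minimize(expr[1][1])
    head, *args = [minimize(part) for part in expr]
    # Negation-to-antonym reduction.
    if head == "not" and len(args) == 1 and isinstance(args[0], str) and args[0] in ANTONYMS:
        return ANTONYMS[args[0]]
    # Determiner pruning.
    args = [a for a in args if not (isinstance(a, str) and a in DEFAULT_DETERMINERS)]
    # Modifier compound compression (only for flat, all-atom phrases).
    if isinstance(head, str) and all(isinstance(a, str) for a in args):
        compound = COMPOUNDS.get((head, *args))
        if compound is not None:
            return compound
    return [head, *args] if args else head


print(minimize(["not", ["not", "happy"]]))  # happy
print(minimize(["not", "happy"]))           # sad
print(minimize(["dog", "young"]))           # puppy
print(minimize(["can", ["chase", ["dog", "the"], ["cat", "the"]]]))
# ['can', ['chase', 'dog', 'cat']]
```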
145
+ ---
146
+
147
+ ## Installation & Setup
148
+
149
+ Make sure you have **Python 3.13+** installed.
150
+
151
+ ### Option A: Remote Installation (To use EngLISP in your own projects)
152
+ You can install EngLISP directly from the remote GitHub repository without cloning it locally. This automatically installs `englisp` and its dependencies in your active Python environment:
153
+
154
+ ```bash
155
+ pip install git+https://github.com/russellshen/The-EngLISP-Project.git
156
+ ```
157
+
158
+ ### Option B: Local Installation (To run the dashboard visualizer or run tests)
159
+ 1. Clone the repository and navigate to the directory:
160
+ ```bash
161
+ git clone https://github.com/russellshen/The-EngLISP-Project.git
162
+ cd The-EngLISP-Project
163
+ ```
164
+ 2. Install the package in editable mode along with development dependencies:
165
+ ```bash
166
+ pip install -e ".[dev]"
167
+ ```
168
+
169
+ ### Lexical Databases & Fallback Samples
170
+
171
+ * **Public Fallback Samples**: This repository comes bundled with lightweight fallback sample datasets (`sample_*.lson` files under `englisp/resources/`). These let the entire parsing, generation, compilation, and interpretation pipeline run out of the box on the test suite and sample inputs.
172
+ * **Full Production Database**: The full-scale multilingual database (containing 254 partition files mapping over 100,000 nouns, verbs, adjectives, grammatical genders, and translations linked to BabelNet and WordNet synsets) is kept in a separate private repository to protect the project's data IP and commercial viability.
173
+ * **Requesting Access**: If you are an academic researcher, open-source contributor, or commercial partner interested in utilizing the full-scale dictionaries or licensing the dataset, please reach out directly via the **Commercial Licensing & Contact** section below.
174
+
175
+ ---
176
+
177
+ ## How to Run
178
+
179
+ To run the interactive web visualizer:
180
+
181
+ 1. Launch the FastAPI server:
182
+ ```bash
183
+ python run.py
184
+ ```
185
+ 2. Open your browser and navigate to:
186
+ [http://127.0.0.1:8000](http://127.0.0.1:8000)
187
+
188
+ To run the automated test suite:
189
+ ```bash
190
+ python -m pytest
191
+ ```
192
+
193
+ ---
194
+
195
+ ## Python Developer API
196
+
197
+ For larger projects, computational pipelines, and non-toy integration, developers can import the `englisp` package directly into their own Python projects.
198
+
199
+ ### Getting Started
200
+
201
+ To use the API, make sure the `englisp` package is installed or that its directory is on your Python path:
202
+
203
+ ```python
204
+ import englisp
205
+ ```
206
+
207
+ The package exposes **8 core functions** (6 atomic functions covering the 3 bidirectional stage transformations, plus 2 direct "all-the-way" shortcuts) and **2 S-expression string serialization helpers**.
208
+
209
+ ---
210
+
211
+ ### 1. Bidirectional Stage Transformations (6 Atomic Functions)
212
+
213
+ #### Stage 1 (NL) &leftrightarrow; Stage 2 (X-bar Tree)
214
+ * **`nl_to_xbar(text: str, lang: str = "auto") -> XBarNode`**
215
+ Parses natural language text into an X-bar tree AST.
216
+ * `lang`: Supports `"en"`, `"fr"`, or `"auto"` (which detects the language dynamically).
217
+ * **`xbar_to_nl(node: XBarNode, lang: str = "en") -> str`**
218
+ Synthesizes grammatical natural language text from an X-bar tree.
219
+
220
+ ```python
221
+ # Parse English (auto-detected)
222
+ tree = englisp.nl_to_xbar("The dog chased the cat.", lang="auto")
223
+ print(tree.category) # Output: IP
224
+ print(tree.pretty_print()) # Outputs structured text tree
225
+
226
+ # Generate back
227
+ text = englisp.xbar_to_nl(tree, lang="en")
228
+ print(text) # Output: "The dog chased the cat."
229
+ ```
230
+
231
+ #### Stage 2 (X-bar Tree) &leftrightarrow; Stage 3 (EngLISP S-Expression)
232
+ * **`xbar_to_englisp(node: XBarNode, lang: str = "en") -> list`**
233
+ Translates an X-bar tree into a canonical, rotated EngLISP S-expression list.
234
+ * **`englisp_to_xbar(sexpr: list, lang: str = "en") -> XBarNode`**
235
+ Reconstructs an X-bar tree from an EngLISP S-expression list.
236
+
237
+ ```python
238
+ # Rotate the tree into a head-first Lisp list
239
+ sexpr = englisp.xbar_to_englisp(tree, lang="en")
240
+ print(sexpr) # Output: ['chased', ['dog', 'the'], ['cat', 'the']]
241
+
242
+ # Reconstruct X-bar tree from list
243
+ tree_rebuilt = englisp.englisp_to_xbar(sexpr, lang="en")
244
+ ```
245
+
246
+ #### Stage 3 (EngLISP) &leftrightarrow; Stage 4 (MinimaLIST S-Expression)
247
+ * **`englisp_to_minimalist(sexpr: list) -> list`**
248
+ Compresses an EngLISP S-expression list into MinimaLIST form (applying double-negation removal, antonym reduction, voice shifts, and determiner pruning).
249
+ * **`minimalist_to_englisp(sexpr: list) -> list`**
250
+ Expands a MinimaLIST S-expression back into canonical EngLISP form.
251
+
252
+ ```python
253
+ # Compress S-expression list
254
+ min_sexpr = englisp.englisp_to_minimalist(sexpr)
255
+ print(min_sexpr) # Output: ['chased', 'dog', 'cat']
256
+
257
+ # Expand S-expression list back
258
+ expanded_sexpr = englisp.minimalist_to_englisp(min_sexpr)
259
+ print(expanded_sexpr) # Output: ['chased', ['dog', 'the'], ['cat', 'the']]
260
+ ```
261
+
262
+ ---
263
+
264
+ ### 2. Direct "All the Way" Transformations (2 Shortcut Functions)
265
+
266
+ For quick end-to-end processing without dealing with intermediate X-bar nodes:
267
+
268
+ #### Stage 1 (NL) &rarr; Stage 4 (MinimaLIST)
269
+ * **`nl_to_minimalist(text: str, lang: str = "auto") -> list`**
270
+ Directly converts natural language text to a minimized MinimaLIST S-expression list.
271
+
272
+ ```python
273
+ min_list = englisp.nl_to_minimalist("The dog was not unhappy.", lang="en")
274
+ print(min_list) # Output: ['happy', 'dog'] (pruned det & double negation)
275
+ ```
276
+
277
+ #### Stage 4 (MinimaLIST) &rarr; Stage 1 (NL)
278
+ * **`minimalist_to_nl(sexpr: list, lang: str = "en") -> str`**
279
+ Directly expands a MinimaLIST S-expression and generates the corresponding surface sentence.
280
+
281
+ ```python
282
+ sentence = englisp.minimalist_to_nl(['chased', 'dog', 'cat'], lang="fr")
283
+ print(sentence) # Output: "Le chien chassait le chat." (Cross-lingual translation!)
284
+ ```
285
+
286
+ ---
287
+
288
+ ### 3. Serialization Helpers (2 Utility Functions)
289
+
290
+ Lisp S-expressions are represented as standard nested Python lists in code. To print them or write them to configuration files, serialize them to/from strings:
291
+
292
+ * **`to_string(sexpr: list) -> str`**
293
+ Serializes a nested list to a Lisp-style S-expression string. Supports **DAG hash-consing backreferences** (`#1=...` / `#1#`) for shared memory nodes.
294
+ * **`from_string(s: str) -> list`**
295
+ Parses a Lisp S-expression string back into a nested list structure.
296
+
297
+ ```python
298
+ # Serialize S-expression list with shared memory sub-expressions (DAG)
299
+ dog_ref = ["dog", "the"]
300
+ dag_sexpr = ["and", ["chased", dog_ref, "cat"], ["barked", dog_ref]]
301
+ print(englisp.to_string(dag_sexpr))
302
+ # Output: (and (chased #1=(dog the) cat) (barked #1#))
303
+
304
+ # Parse S-expression string back to python lists
305
+ parsed_list = englisp.from_string("(and (chased #1=(dog the) cat) (barked #1#))")
306
+ print(parsed_list)
307
+ # Output: ['and', ['chased', ['dog', 'the'], 'cat'], ['barked', ['dog', 'the']]]
308
+ ```
309
+
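As a rough picture of what parsing such a string involves, the sketch below reads an S-expression with `#N=` anchors and `#N#` backreferences into nested lists. It is not the package's `from_string`: `parse_lson` is a hypothetical name, atoms are kept as plain strings, and a backreference is only resolved after its anchor has been fully read, so self-referential (cyclic) structures are not handled.

```python
import re

# Anchors and backreferences are tried before parentheses and plain atoms.
TOKEN = re.compile(r"#\d+=|#\d+#|[()]|[^\s()]+")


def parse_lson(text):
    """Parse an S-expression string into nested lists, resolving shared nodes."""
    tokens = TOKEN.findall(text)
    anchors = {}
    pos = 0

    def read():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of input")
        token = tokens[pos]
        pos += 1
        anchor = re.fullmatch(r"#(\d+)=", token)
        if anchor:
            node = read()                        # the labelled expression follows
            anchors[int(anchor.group(1))] = node
            return node
        backref = re.fullmatch(r"#(\d+)#", token)
        if backref:
            return anchors[int(backref.group(1))]  # same object, not a copy
        if token == "(":
            items = []
            while pos < len(tokens) and tokens[pos] != ")":
                items.append(read())
            if pos >= len(tokens):
                raise ValueError("missing ')'")
            pos += 1                             # consume ')'
            return items
        if token == ")":
            raise ValueError("unexpected ')'")
        return token

    result = read()
    if pos != len(tokens):
        raise ValueError("trailing tokens after expression")
    return result


parsed = parse_lson("(and (chased #1=(dog the) cat) (barked #1#))")
print(parsed)
# ['and', ['chased', ['dog', 'the'], 'cat'], ['barked', ['dog', 'the']]]
print(parsed[1][1] is parsed[2][1])  # True: both positions share one list
```

Returning the same list object for a backreference is a design choice of this sketch; whether the real helper shares or copies the node is not stated in the documentation above.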
310
+ ---
311
+
312
+ ## Code Directory Structure
313
+
314
+ - `englisp/xbar.py`: Hierarchical X-bar node class modeling syntax tree representations.
315
+ - `englisp/parser.py`: Deterministic natural language parser and generator.
316
+ - `englisp/canonicalizer.py`: Tree-rotation S-expression translator (rotates heads/verbs to the front).
317
+ - `englisp/minimizer.py`: Semantic optimizer applying rewrite and expansion rules for MinimaLIST forms.
318
+ - `englisp/interpreter.py`: Stateful S-expression evaluator supporting arithmetic calculations, backward-chaining logical rules, cycle-detection loop protection, and Explainable AI (XAI) proof trails.
319
+ - `englisp/compiler.py`: S-expression compiler that translates canonical forms to native Common Lisp and Scheme.
320
+ - `englisp/ontology.py`: Graph database mapping and Word Sense Disambiguation (WSD) synset resolution.
321
+ - `englisp/graph_db.py`: Semantic graph database tracking entity relationships and `IS_A` type inheritance.
322
+ - `englisp/loader.py`: Alphabetically partitioned lazy-loading database driver.
323
+ - `web/server.py`: FastAPI server serving the backend API endpoints.
324
+ - `web/static/`: Frontend dashboard containing:
325
+ - `index.html`: Clean glassmorphism layout with Outfit and Inter typography.
326
+ - `styles.css`: Dark-indigo themed visual system.
327
+ - `app.js`: Interactive UI logic, API calls, and custom SVG hierarchical tree layout drawer.
328
+
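The `IS_A` type inheritance mentioned for `englisp/graph_db.py` can be pictured as a transitive lookup over parent links. The sketch below is an assumption about the general technique, not a description of that module: `TypeGraph`, `add_is_a`, and `is_a` are hypothetical names, and the visited set is there so that an accidental cycle in the data cannot cause an endless loop.

```python
class TypeGraph:
    """Hypothetical minimal store for IS_A edges between type names."""

    def __init__(self):
        self.parents = {}  # child type -> set of direct parent types

    def add_is_a(self, child, parent):
        self.parents.setdefault(child, set()).add(parent)

    def is_a(self, entity, target):
        """Return True if `target` is reachable from `entity` via IS_A edges.

        Every type is treated as being of its own type (reflexive).
        """
        seen = set()
        stack = [entity]
        while stack:
            current = stack.pop()
            if current == target:
                return True
            if current in seen:      # already explored: skip to avoid cycles
                continue
            seen.add(current)
            stack.extend(self.parents.get(current, ()))
        return False


graph = TypeGraph()
graph.add_is_a("puppy", "dog")
graph.add_is_a("dog", "mammal")
graph.add_is_a("mammal", "animal")

print(graph.is_a("puppy", "animal"))  # True
print(graph.is_a("animal", "puppy"))  # False
```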
329
+ ---
330
+
331
+ ## License, Copyright, & Feedback
332
+
333
+ [![CC BY-NC-ND 4.0](https://licensebuttons.net/l/by-nc-nd/4.0/88x31.png)](https://creativecommons.org/licenses/by-nc-nd/4.0/)
334
+
335
+ Copyright © 2026 Russell Shen. All rights reserved.
336
+
337
+ This project and its documentation are licensed under the **Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)** license.
338
+
339
+ ### License Clarification & Terms
340
+
341
+ Under the CC BY-NC-ND 4.0 license, you are free to download, copy, and share this codebase for personal, academic, or non-commercial study, subject to the following strict conditions:
342
+
343
+ 1. **Attribution (BY)**: You must give appropriate credit to the author (**Russell Shen**), provide a link to this license, and indicate if any modifications were made. You must do so in a reasonable manner, but not in any way that suggests endorsement.
344
+ 2. **Non-Commercial (NC)**: You may not use the material for commercial purposes. This explicitly prohibits using this software, its algorithms, data files, or documentation in any commercial products, revenue-generating activities, paid API services, or closed-source corporate projects.
345
+ 3. **No Derivatives (ND)**: If you remix, transform, or build upon this material, you are permitted to do so only for private, personal use. You **may not distribute** any modified or derived versions of the code, specifications, or datasets to the public or any third party.
346
+
347
+ ### Support & Sponsorship
348
+
349
+ If you find the EngLISP project useful and want to support its ongoing development, optimization, and research, please consider sponsoring:
350
+
351
+ * **GitHub Sponsors**: [Sponsor Russell Shen on GitHub](https://github.com/sponsors/russellshen)
352
+
353
+ Your support helps maintain the public code, keep the hosted playground running, and fund future multi-lingual expansions.
354
+
355
+ ### Questions, Suggestions, & Feedback
356
+
357
+ If you have honest, good-faith questions, suggestions, or ideas about the EngLISP project or the LSON specification, please feel free to reach out. I welcome community feedback, academic inquiries, and theoretical discussions.
358
+
359
+ ### Commercial Licensing & Contact
360
+
361
+ Any use outside the narrow scope of the CC BY-NC-ND 4.0 license is strictly prohibited without a separate commercial agreement. Parties interested in commercial deployment, proprietary closed-source integration, SaaS hosting, or distributing modified versions, or who have general feedback and inquiries, may contact the author directly:
362
+
363
+ * **Russell Shen**
364
+ * 📧 [russellshen7@gmail.com](mailto:russellshen7@gmail.com)
365
+
366
+ *Licensing terms, scope, and compensation are subject to separate negotiation and are granted only by explicit written agreement.*