@jaepil/uqa 0.1.0

Files changed (233)
  1. package/LICENSE +661 -0
  2. package/README.md +788 -0
  3. package/dist/types/src/analysis/analyzer.d.ts +27 -0
  4. package/dist/types/src/analysis/analyzer.d.ts.map +1 -0
  5. package/dist/types/src/analysis/char-filter.d.ts +27 -0
  6. package/dist/types/src/analysis/char-filter.d.ts.map +1 -0
  7. package/dist/types/src/analysis/token-filter.d.ts +73 -0
  8. package/dist/types/src/analysis/token-filter.d.ts.map +1 -0
  9. package/dist/types/src/analysis/tokenizer.d.ts +42 -0
  10. package/dist/types/src/analysis/tokenizer.d.ts.map +1 -0
  11. package/dist/types/src/api/query-builder.d.ts +103 -0
  12. package/dist/types/src/api/query-builder.d.ts.map +1 -0
  13. package/dist/types/src/cli/repl.d.ts +42 -0
  14. package/dist/types/src/cli/repl.d.ts.map +1 -0
  15. package/dist/types/src/core/functor.d.ts +38 -0
  16. package/dist/types/src/core/functor.d.ts.map +1 -0
  17. package/dist/types/src/core/hierarchical.d.ts +24 -0
  18. package/dist/types/src/core/hierarchical.d.ts.map +1 -0
  19. package/dist/types/src/core/posting-list.d.ts +43 -0
  20. package/dist/types/src/core/posting-list.d.ts.map +1 -0
  21. package/dist/types/src/core/types.d.ts +116 -0
  22. package/dist/types/src/core/types.d.ts.map +1 -0
  23. package/dist/types/src/engine.d.ts +130 -0
  24. package/dist/types/src/engine.d.ts.map +1 -0
  25. package/dist/types/src/execution/batch.d.ts +144 -0
  26. package/dist/types/src/execution/batch.d.ts.map +1 -0
  27. package/dist/types/src/execution/physical.d.ts +18 -0
  28. package/dist/types/src/execution/physical.d.ts.map +1 -0
  29. package/dist/types/src/execution/relational.d.ts +174 -0
  30. package/dist/types/src/execution/relational.d.ts.map +1 -0
  31. package/dist/types/src/execution/scan.d.ts +41 -0
  32. package/dist/types/src/execution/scan.d.ts.map +1 -0
  33. package/dist/types/src/execution/spill.d.ts +39 -0
  34. package/dist/types/src/execution/spill.d.ts.map +1 -0
  35. package/dist/types/src/fdw/arrow-handler.d.ts +28 -0
  36. package/dist/types/src/fdw/arrow-handler.d.ts.map +1 -0
  37. package/dist/types/src/fdw/duckdb-handler.d.ts +32 -0
  38. package/dist/types/src/fdw/duckdb-handler.d.ts.map +1 -0
  39. package/dist/types/src/fdw/foreign-table.d.ts +47 -0
  40. package/dist/types/src/fdw/foreign-table.d.ts.map +1 -0
  41. package/dist/types/src/fdw/handler.d.ts +52 -0
  42. package/dist/types/src/fdw/handler.d.ts.map +1 -0
  43. package/dist/types/src/fusion/attention.d.ts +23 -0
  44. package/dist/types/src/fusion/attention.d.ts.map +1 -0
  45. package/dist/types/src/fusion/boolean.d.ts +6 -0
  46. package/dist/types/src/fusion/boolean.d.ts.map +1 -0
  47. package/dist/types/src/fusion/learned.d.ts +12 -0
  48. package/dist/types/src/fusion/learned.d.ts.map +1 -0
  49. package/dist/types/src/fusion/log-odds.d.ts +87 -0
  50. package/dist/types/src/fusion/log-odds.d.ts.map +1 -0
  51. package/dist/types/src/fusion/query-features.d.ts +8 -0
  52. package/dist/types/src/fusion/query-features.d.ts.map +1 -0
  53. package/dist/types/src/graph/centrality.d.ts +35 -0
  54. package/dist/types/src/graph/centrality.d.ts.map +1 -0
  55. package/dist/types/src/graph/cross-paradigm.d.ts +68 -0
  56. package/dist/types/src/graph/cross-paradigm.d.ts.map +1 -0
  57. package/dist/types/src/graph/cypher/ast.d.ts +175 -0
  58. package/dist/types/src/graph/cypher/ast.d.ts.map +1 -0
  59. package/dist/types/src/graph/cypher/compiler.d.ts +60 -0
  60. package/dist/types/src/graph/cypher/compiler.d.ts.map +1 -0
  61. package/dist/types/src/graph/cypher/lexer.d.ts +82 -0
  62. package/dist/types/src/graph/cypher/lexer.d.ts.map +1 -0
  63. package/dist/types/src/graph/cypher/parser.d.ts +50 -0
  64. package/dist/types/src/graph/cypher/parser.d.ts.map +1 -0
  65. package/dist/types/src/graph/delta.d.ts +31 -0
  66. package/dist/types/src/graph/delta.d.ts.map +1 -0
  67. package/dist/types/src/graph/graph-embedding.d.ts +13 -0
  68. package/dist/types/src/graph/graph-embedding.d.ts.map +1 -0
  69. package/dist/types/src/graph/incremental-match.d.ts +16 -0
  70. package/dist/types/src/graph/incremental-match.d.ts.map +1 -0
  71. package/dist/types/src/graph/index.d.ts +58 -0
  72. package/dist/types/src/graph/index.d.ts.map +1 -0
  73. package/dist/types/src/graph/join.d.ts +18 -0
  74. package/dist/types/src/graph/join.d.ts.map +1 -0
  75. package/dist/types/src/graph/message-passing.d.ts +17 -0
  76. package/dist/types/src/graph/message-passing.d.ts.map +1 -0
  77. package/dist/types/src/graph/operators.d.ts +81 -0
  78. package/dist/types/src/graph/operators.d.ts.map +1 -0
  79. package/dist/types/src/graph/pattern.d.ts +63 -0
  80. package/dist/types/src/graph/pattern.d.ts.map +1 -0
  81. package/dist/types/src/graph/posting-list.d.ts +24 -0
  82. package/dist/types/src/graph/posting-list.d.ts.map +1 -0
  83. package/dist/types/src/graph/rpq-optimizer.d.ts +25 -0
  84. package/dist/types/src/graph/rpq-optimizer.d.ts.map +1 -0
  85. package/dist/types/src/graph/store.d.ts +61 -0
  86. package/dist/types/src/graph/store.d.ts.map +1 -0
  87. package/dist/types/src/graph/temporal-filter.d.ts +12 -0
  88. package/dist/types/src/graph/temporal-filter.d.ts.map +1 -0
  89. package/dist/types/src/graph/temporal-pattern-match.d.ts +26 -0
  90. package/dist/types/src/graph/temporal-pattern-match.d.ts.map +1 -0
  91. package/dist/types/src/graph/temporal-traverse.d.ts +25 -0
  92. package/dist/types/src/graph/temporal-traverse.d.ts.map +1 -0
  93. package/dist/types/src/graph/versioned-store.d.ts +64 -0
  94. package/dist/types/src/graph/versioned-store.d.ts.map +1 -0
  95. package/dist/types/src/index.d.ts +20 -0
  96. package/dist/types/src/index.d.ts.map +1 -0
  97. package/dist/types/src/joins/base.d.ts +18 -0
  98. package/dist/types/src/joins/base.d.ts.map +1 -0
  99. package/dist/types/src/joins/cross-paradigm.d.ts +46 -0
  100. package/dist/types/src/joins/cross-paradigm.d.ts.map +1 -0
  101. package/dist/types/src/joins/cross.d.ts +9 -0
  102. package/dist/types/src/joins/cross.d.ts.map +1 -0
  103. package/dist/types/src/joins/index.d.ts +7 -0
  104. package/dist/types/src/joins/index.d.ts.map +1 -0
  105. package/dist/types/src/joins/inner.d.ts +7 -0
  106. package/dist/types/src/joins/inner.d.ts.map +1 -0
  107. package/dist/types/src/joins/outer.d.ts +16 -0
  108. package/dist/types/src/joins/outer.d.ts.map +1 -0
  109. package/dist/types/src/joins/semi.d.ts +22 -0
  110. package/dist/types/src/joins/semi.d.ts.map +1 -0
  111. package/dist/types/src/joins/sort-merge.d.ts +7 -0
  112. package/dist/types/src/joins/sort-merge.d.ts.map +1 -0
  113. package/dist/types/src/math/linalg.d.ts +27 -0
  114. package/dist/types/src/math/linalg.d.ts.map +1 -0
  115. package/dist/types/src/math/random.d.ts +15 -0
  116. package/dist/types/src/math/random.d.ts.map +1 -0
  117. package/dist/types/src/operators/aggregation.d.ts +65 -0
  118. package/dist/types/src/operators/aggregation.d.ts.map +1 -0
  119. package/dist/types/src/operators/attention.d.ts +16 -0
  120. package/dist/types/src/operators/attention.d.ts.map +1 -0
  121. package/dist/types/src/operators/backend.d.ts +52 -0
  122. package/dist/types/src/operators/backend.d.ts.map +1 -0
  123. package/dist/types/src/operators/base.d.ts +32 -0
  124. package/dist/types/src/operators/base.d.ts.map +1 -0
  125. package/dist/types/src/operators/boolean.d.ts +23 -0
  126. package/dist/types/src/operators/boolean.d.ts.map +1 -0
  127. package/dist/types/src/operators/calibrated-vector.d.ts +56 -0
  128. package/dist/types/src/operators/calibrated-vector.d.ts.map +1 -0
  129. package/dist/types/src/operators/deep-fusion.d.ts +104 -0
  130. package/dist/types/src/operators/deep-fusion.d.ts.map +1 -0
  131. package/dist/types/src/operators/deep-learn.d.ts +87 -0
  132. package/dist/types/src/operators/deep-learn.d.ts.map +1 -0
  133. package/dist/types/src/operators/hierarchical.d.ts +44 -0
  134. package/dist/types/src/operators/hierarchical.d.ts.map +1 -0
  135. package/dist/types/src/operators/hybrid.d.ts +75 -0
  136. package/dist/types/src/operators/hybrid.d.ts.map +1 -0
  137. package/dist/types/src/operators/learned-fusion.d.ts +15 -0
  138. package/dist/types/src/operators/learned-fusion.d.ts.map +1 -0
  139. package/dist/types/src/operators/multi-field.d.ts +13 -0
  140. package/dist/types/src/operators/multi-field.d.ts.map +1 -0
  141. package/dist/types/src/operators/multi-stage.d.ts +12 -0
  142. package/dist/types/src/operators/multi-stage.d.ts.map +1 -0
  143. package/dist/types/src/operators/primitive.d.ts +75 -0
  144. package/dist/types/src/operators/primitive.d.ts.map +1 -0
  145. package/dist/types/src/operators/progressive-fusion.d.ts +13 -0
  146. package/dist/types/src/operators/progressive-fusion.d.ts.map +1 -0
  147. package/dist/types/src/operators/sparse.d.ts +12 -0
  148. package/dist/types/src/operators/sparse.d.ts.map +1 -0
  149. package/dist/types/src/planner/cardinality.d.ts +110 -0
  150. package/dist/types/src/planner/cardinality.d.ts.map +1 -0
  151. package/dist/types/src/planner/cost-model.d.ts +16 -0
  152. package/dist/types/src/planner/cost-model.d.ts.map +1 -0
  153. package/dist/types/src/planner/executor.d.ts +48 -0
  154. package/dist/types/src/planner/executor.d.ts.map +1 -0
  155. package/dist/types/src/planner/join-enumerator.d.ts +76 -0
  156. package/dist/types/src/planner/join-enumerator.d.ts.map +1 -0
  157. package/dist/types/src/planner/join-graph.d.ts +61 -0
  158. package/dist/types/src/planner/join-graph.d.ts.map +1 -0
  159. package/dist/types/src/planner/join-order.d.ts +24 -0
  160. package/dist/types/src/planner/join-order.d.ts.map +1 -0
  161. package/dist/types/src/planner/optimizer.d.ts +62 -0
  162. package/dist/types/src/planner/optimizer.d.ts.map +1 -0
  163. package/dist/types/src/planner/parallel.d.ts +43 -0
  164. package/dist/types/src/planner/parallel.d.ts.map +1 -0
  165. package/dist/types/src/scoring/bayesian-bm25.d.ts +22 -0
  166. package/dist/types/src/scoring/bayesian-bm25.d.ts.map +1 -0
  167. package/dist/types/src/scoring/bm25.d.ts +20 -0
  168. package/dist/types/src/scoring/bm25.d.ts.map +1 -0
  169. package/dist/types/src/scoring/calibration.d.ts +68 -0
  170. package/dist/types/src/scoring/calibration.d.ts.map +1 -0
  171. package/dist/types/src/scoring/external-prior.d.ts +55 -0
  172. package/dist/types/src/scoring/external-prior.d.ts.map +1 -0
  173. package/dist/types/src/scoring/fusion-wand.d.ts +19 -0
  174. package/dist/types/src/scoring/fusion-wand.d.ts.map +1 -0
  175. package/dist/types/src/scoring/multi-field.d.ts +10 -0
  176. package/dist/types/src/scoring/multi-field.d.ts.map +1 -0
  177. package/dist/types/src/scoring/parameter-learner.d.ts +42 -0
  178. package/dist/types/src/scoring/parameter-learner.d.ts.map +1 -0
  179. package/dist/types/src/scoring/vector.d.ts +42 -0
  180. package/dist/types/src/scoring/vector.d.ts.map +1 -0
  181. package/dist/types/src/scoring/wand.d.ts +51 -0
  182. package/dist/types/src/scoring/wand.d.ts.map +1 -0
  183. package/dist/types/src/sql/compiler.d.ts +551 -0
  184. package/dist/types/src/sql/compiler.d.ts.map +1 -0
  185. package/dist/types/src/sql/expr-evaluator.d.ts +43 -0
  186. package/dist/types/src/sql/expr-evaluator.d.ts.map +1 -0
  187. package/dist/types/src/sql/fts-query.d.ts +60 -0
  188. package/dist/types/src/sql/fts-query.d.ts.map +1 -0
  189. package/dist/types/src/sql/table.d.ts +72 -0
  190. package/dist/types/src/sql/table.d.ts.map +1 -0
  191. package/dist/types/src/storage/abc/document-store.d.ts +44 -0
  192. package/dist/types/src/storage/abc/document-store.d.ts.map +1 -0
  193. package/dist/types/src/storage/abc/graph-store.d.ts +109 -0
  194. package/dist/types/src/storage/abc/graph-store.d.ts.map +1 -0
  195. package/dist/types/src/storage/abc/inverted-index.d.ts +118 -0
  196. package/dist/types/src/storage/abc/inverted-index.d.ts.map +1 -0
  197. package/dist/types/src/storage/block-max-index.d.ts +45 -0
  198. package/dist/types/src/storage/block-max-index.d.ts.map +1 -0
  199. package/dist/types/src/storage/btree-index.d.ts +18 -0
  200. package/dist/types/src/storage/btree-index.d.ts.map +1 -0
  201. package/dist/types/src/storage/catalog.d.ts +87 -0
  202. package/dist/types/src/storage/catalog.d.ts.map +1 -0
  203. package/dist/types/src/storage/document-store.d.ts +20 -0
  204. package/dist/types/src/storage/document-store.d.ts.map +1 -0
  205. package/dist/types/src/storage/index-abc.d.ts +16 -0
  206. package/dist/types/src/storage/index-abc.d.ts.map +1 -0
  207. package/dist/types/src/storage/index-manager.d.ts +19 -0
  208. package/dist/types/src/storage/index-manager.d.ts.map +1 -0
  209. package/dist/types/src/storage/index-types.d.ts +10 -0
  210. package/dist/types/src/storage/index-types.d.ts.map +1 -0
  211. package/dist/types/src/storage/inverted-index.d.ts +55 -0
  212. package/dist/types/src/storage/inverted-index.d.ts.map +1 -0
  213. package/dist/types/src/storage/ivf-index.d.ts +57 -0
  214. package/dist/types/src/storage/ivf-index.d.ts.map +1 -0
  215. package/dist/types/src/storage/managed-connection.d.ts +64 -0
  216. package/dist/types/src/storage/managed-connection.d.ts.map +1 -0
  217. package/dist/types/src/storage/spatial-index.d.ts +18 -0
  218. package/dist/types/src/storage/spatial-index.d.ts.map +1 -0
  219. package/dist/types/src/storage/sqlite-document-store.d.ts +54 -0
  220. package/dist/types/src/storage/sqlite-document-store.d.ts.map +1 -0
  221. package/dist/types/src/storage/sqlite-graph-store.d.ts +59 -0
  222. package/dist/types/src/storage/sqlite-graph-store.d.ts.map +1 -0
  223. package/dist/types/src/storage/sqlite-inverted-index.d.ts +75 -0
  224. package/dist/types/src/storage/sqlite-inverted-index.d.ts.map +1 -0
  225. package/dist/types/src/storage/transaction.d.ts +15 -0
  226. package/dist/types/src/storage/transaction.d.ts.map +1 -0
  227. package/dist/types/src/storage/vector-index.d.ts +23 -0
  228. package/dist/types/src/storage/vector-index.d.ts.map +1 -0
  229. package/dist/uqa.es.js +16156 -0
  230. package/dist/uqa.es.js.map +1 -0
  231. package/dist/uqa.umd.js +11 -0
  232. package/dist/uqa.umd.js.map +1 -0
  233. package/package.json +73 -0
package/README.md ADDED
@@ -0,0 +1,788 @@
1
+ # UQA-JS -- Unified Query Algebra for the Browser
2
+
3
+ A multi-paradigm database engine that unifies **relational**, **text retrieval**, **vector search**, **graph query**, and **geospatial** paradigms under a single algebraic structure, using posting lists as the universal abstraction. SQL interface targets **PostgreSQL 17** compatibility.
4
+
5
+ > **Note:** UQA-JS is the TypeScript/browser port of [UQA](https://github.com/cognica-io/uqa) (Python). The unified query algebra theory behind this project is deployed in production as [Cognica Database](https://cognica.io), a commercial multi-paradigm database engine built in C++20/23. UQA-JS brings the full algebraic framework to the browser via WebAssembly-based dependencies, with identical semantics to the Python reference implementation.
6
+
7
+ ## Background
8
+
9
+ Modern data systems are fragmented into specialized engines: relational databases built on relational algebra, search engines on probabilistic IR models, vector databases on geometric similarity, and graph databases on traversal semantics. UQA eliminates this fragmentation by proving that a single algebraic structure can express operations across all four paradigms.
10
+
11
+ ### Posting Lists as Universal Abstraction
12
+
13
+ The core insight is that **posting lists** -- sorted sequences of `(document_id, payload)` pairs -- can represent result sets from any paradigm. A posting list $L$ is defined as:
14
+
15
+ $$
16
+ L = [(id_1, payload_1),\ (id_2, payload_2),\ \ldots,\ (id_n, payload_n)]
17
+ $$
18
+
19
+ where $id_i < id_j$ for all $i < j$. A bijection $PL: 2^{\mathcal{D}} \rightarrow \mathcal{L}$ maps document sets to posting lists and back, allowing set-theoretic reasoning to carry over directly.
20
+
21
+ ### Boolean Algebra
22
+
23
+ The structure $(\mathcal{L},\ \cup,\ \cap,\ \overline{\cdot},\ \emptyset,\ \mathcal{D})$ forms a **complete Boolean algebra** -- satisfying commutativity, associativity, distributivity, identity, and complement laws. This means any combination of AND, OR, and NOT across paradigms is algebraically well-defined, and query optimization can exploit lattice-theoretic rewrite rules.
24
+
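The Boolean operations described above can be sketched as linear merges over sorted posting lists. This is a minimal illustration, not the package's actual `PostingList` class: the `Posting` and `PostingList` types and the `intersect` and `union` helpers are hypothetical names, and keeping the left-hand payload on a match is an assumed merge policy.

```typescript
// Hypothetical types for illustration; not the package's PostingList API.
type Posting<P> = { docId: number; payload: P };
type PostingList<P> = Posting<P>[]; // invariant: strictly increasing docId

// Intersection (AND): advance whichever cursor points at the smaller docId.
function intersect<P>(a: PostingList<P>, b: PostingList<P>): PostingList<P> {
  const out: PostingList<P> = [];
  let i = 0;
  let j = 0;
  while (i < a.length && j < b.length) {
    if (a[i].docId === b[j].docId) {
      out.push(a[i]); // assumed policy: keep the left payload on a match
      i++;
      j++;
    } else if (a[i].docId < b[j].docId) {
      i++;
    } else {
      j++;
    }
  }
  return out;
}

// Union (OR): standard two-way merge that emits each docId once.
function union<P>(a: PostingList<P>, b: PostingList<P>): PostingList<P> {
  const out: PostingList<P> = [];
  let i = 0;
  let j = 0;
  while (i < a.length || j < b.length) {
    if (j >= b.length || (i < a.length && a[i].docId < b[j].docId)) {
      out.push(a[i++]);
    } else if (i >= a.length || b[j].docId < a[i].docId) {
      out.push(b[j++]);
    } else {
      out.push(a[i]); // same docId on both sides
      i++;
      j++;
    }
  }
  return out;
}

// Toy result sets standing in for a text operator and a vector operator.
const textHits: PostingList<string> = [
  { docId: 1, payload: "text" },
  { docId: 3, payload: "text" },
  { docId: 5, payload: "text" },
];
const vectorHits: PostingList<string> = [
  { docId: 3, payload: "vector" },
  { docId: 4, payload: "vector" },
  { docId: 5, payload: "vector" },
];

console.log(intersect(textHits, vectorHits).map((p) => p.docId)); // [ 3, 5 ]
console.log(union(textHits, vectorHits).map((p) => p.docId)); // [ 1, 3, 4, 5 ]
```

Because both inputs are sorted by `docId`, each operation runs in time linear in the combined list length and its output is again a valid posting list, which is what lets the operators compose.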
25
+ ### Cross-Paradigm Operators
26
+
27
+ Primitive operators map each paradigm into the posting list space:
28
+
29
+ | Operator | Definition | Paradigm |
30
+ |----------|-----------|----------|
31
+ | $T(t)$ | $PL(\lbrace d \in \mathcal{D} \mid t \in term(d, f) \rbrace)$ | Text retrieval |
32
+ | $V_\theta(q)$ | $PL(\lbrace d \in \mathcal{D} \mid sim(vec(d, f),\ q) \geq \theta \rbrace)$ | Vector search |
33
+ | $KNN_k(q)$ | $PL(D_k)$ where $\|D_k\| = k$, ranked by similarity | Vector search |
34
+ | $Filter_{f,v}(L)$ | $L \cap PL(\lbrace d \in \mathcal{D} \mid d.f = v \rbrace)$ | Relational |
35
+ | $Score_q(L)$ | $(L,\ [s_1, \ldots, s_{\|L\|}])$ | Scoring |
36
+
37
+ Because every operator produces a posting list, they compose freely. A hybrid text + vector search is simply an intersection:
38
+
39
+ $$
40
+ Hybrid_{t,q,\theta} = T(t) \cap V_\theta(q)
41
+ $$
42
+
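As a rough illustration of the hybrid formula above, the sketch below evaluates $T(t)$ and $V_\theta(q)$ by brute force over a three-document toy corpus and intersects the resulting id lists. The `Doc` shape and the `termOp`, `vectorOp`, and `hybrid` helpers are hypothetical, and cosine similarity is only one possible choice for $sim$. A real engine would consult an inverted index and a vector index rather than scanning every document.

```typescript
// Hypothetical document shape for illustration only.
interface Doc {
  id: number;
  terms: string[];
  vec: number[];
}

// Cosine similarity, assumed here as the sim(...) function.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// T(t): ascending ids of documents whose terms contain t.
function termOp(docs: Doc[], t: string): number[] {
  return docs
    .filter((d) => d.terms.indexOf(t) !== -1)
    .map((d) => d.id)
    .sort((x, y) => x - y);
}

// V_theta(q): ascending ids of documents with similarity >= theta.
function vectorOp(docs: Doc[], q: number[], theta: number): number[] {
  return docs
    .filter((d) => cosine(d.vec, q) >= theta)
    .map((d) => d.id)
    .sort((x, y) => x - y);
}

// Hybrid = T(t) ∩ V_theta(q).
function hybrid(docs: Doc[], t: string, q: number[], theta: number): number[] {
  const v = vectorOp(docs, q, theta);
  return termOp(docs, t).filter((id) => v.indexOf(id) !== -1);
}

const corpus: Doc[] = [
  { id: 1, terms: ["attention", "transformer"], vec: [1, 0] },
  { id: 2, terms: ["attention"], vec: [0, 1] },
  { id: 3, terms: ["bert"], vec: [1, 0.1] },
];

console.log(termOp(corpus, "attention")); // [ 1, 2 ]
console.log(vectorOp(corpus, [1, 0], 0.9)); // [ 1, 3 ]
console.log(hybrid(corpus, "attention", [1, 0], 0.9)); // [ 1 ]
```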
43
+ ### Graph Extension
44
+
45
+ The second paper extends the framework to graph data by establishing a **Graph-Posting List Isomorphism**. A graph posting list $L_G = [(id_1, G_1), \ldots, (id_n, G_n)]$ maps to standard posting lists via:
46
+
47
+ $$
48
+ \Phi(L_G) = PL\left(\bigcup_{i=1}^{n} \phi_{G \rightarrow D}(G_i)\right)
49
+ $$
50
+
51
+ This isomorphism preserves Boolean operations -- $\Phi(L_G^1 \cup_G L_G^2) = \Phi(L_G^1) \cup \Phi(L_G^2)$ -- so graph traversals, pattern matches, and path queries integrate seamlessly with text, vector, and relational operations under the same algebra.
52
+
53
+ ### Vector Calibration
54
+
55
+ The fifth paper addresses a fundamental gap in hybrid search: vector similarity scores (cosine similarity, inner product, Euclidean distance) are geometric quantities, not probabilities. A cosine similarity of 0.85 does not mean an 85% chance of relevance, yet hybrid systems routinely combine such scores with calibrated lexical signals through ad-hoc normalization. The paper presents a Bayesian calibration framework that transforms vector scores into calibrated relevance probabilities through a likelihood ratio formulation:
56
+
57
+ $$
58
+ \text{logit}\ P(R=1 \mid d) = \log \frac{f_R(d)}{f_G(d)} + \text{logit}\ P(R=1)
59
+ $$
60
+
61
+ where $f_R(d)$ is the local distance density among relevant documents and $f_G(d)$ is the global background density. This has the same additive structure as Bayesian BM25 calibration, establishing a structural identity between lexical and dense retrieval scoring. Both densities are extracted from statistics already computed during ANN index construction and search -- IVF cell populations and intra-cluster distances, HNSW edge distances and search trajectories -- at negligible additional cost. The resulting calibrated vector scores integrate with Bayesian BM25 through additive log-odds:
62
+
63
+ $$
64
+ \text{logit}\ P(R \mid d_{vec}, s_{bm25}) = \underbrace{\log \frac{\hat{f}_R(d)}{f_G(d)}}_{\text{calibrated vector}} + \underbrace{\alpha(s_{bm25} - \beta)}_{\text{calibrated lexical}} + \underbrace{\text{logit}\ P_{base}}_{\text{corpus prior}}
65
+ $$
66
+
67
+ This completes the probabilistic unification of sparse and dense retrieval: both paradigms are calibrated through the same Bayesian likelihood ratio structure, each drawing on the statistics of its native index. For full treatment, see [Paper 5](docs/papers/5.%20Vector%20Scores%20as%20Likelihood%20Ratios%20-%20Index-Derived%20Bayesian%20Calibration%20for%20Hybrid%20Search.pdf).
68
+
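The additive log-odds combination above can be evaluated directly once the three terms are available. The sketch below is a plain transcription of that formula, not the package's fusion code: the `FusionInputs` interface and the `fusedProbability` function are hypothetical names, and the density estimates are taken as given inputs rather than derived from index statistics.

```typescript
// logit and its inverse (the logistic sigmoid).
const logit = (p: number): number => Math.log(p / (1 - p));
const sigmoid = (x: number): number => 1 / (1 + Math.exp(-x));

// Hypothetical input bundle; the field names are assumptions for this sketch.
interface FusionInputs {
  fRelevant: number; // estimated density among relevant documents, f_R(d)
  fGlobal: number; // global background density, f_G(d)
  bm25: number; // raw BM25 score, s_bm25
  alpha: number; // alpha from the formula
  beta: number; // beta from the formula
  basePrior: number; // corpus prior P_base, strictly between 0 and 1
}

// logit P = log(f_R / f_G) + alpha * (s_bm25 - beta) + logit(P_base)
function fusedProbability(x: FusionInputs): number {
  const vectorTerm = Math.log(x.fRelevant / x.fGlobal);
  const lexicalTerm = x.alpha * (x.bm25 - x.beta);
  const priorTerm = logit(x.basePrior);
  return sigmoid(vectorTerm + lexicalTerm + priorTerm);
}

// With no evidence either way, the result equals the prior.
const neutral: FusionInputs = {
  fRelevant: 1,
  fGlobal: 1,
  bm25: 5,
  alpha: 1,
  beta: 5,
  basePrior: 0.5,
};
console.log(fusedProbability(neutral)); // 0.5
```

Working in log-odds keeps each signal's contribution additive, so a neutral signal (a likelihood ratio of 1, or a BM25 score equal to $\beta$) leaves the other terms untouched.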
69
+ ### Compositional Completeness
70
+
71
+ The framework guarantees that **any query expressible as a combination of relational, text, vector, and graph operations** has a representation in the unified algebra (Theorem 3.3.5). This is not merely an interface unification -- the algebraic closure ensures that cross-paradigm queries (e.g., "find papers cited by graph neighbors whose embeddings are similar to a query vector and whose titles match a keyword") are first-class operations with well-defined optimization rules.
72
+
73
+ For full formal treatment, see [Paper 1](docs/papers/1.%20A%20Unified%20Mathematical%20Framework%20for%20Query%20Algebras%20Across%20Heterogeneous%20Data%20Paradigms.pdf), [Paper 2](docs/papers/2.%20Extending%20the%20Unified%20Mathematical%20Framework%20to%20Support%20Graph%20Data%20Structures.pdf), [Paper 3](docs/papers/3.%20Bayesian%20BM25%20-%20A%20Probabilistic%20Framework%20for%20Hybrid%20Text%20and%20Vector%20Search.pdf), and [Paper 5](docs/papers/5.%20Vector%20Scores%20as%20Likelihood%20Ratios%20-%20Index-Derived%20Bayesian%20Calibration%20for%20Hybrid%20Search.pdf).
74
+
75
+ ## Installation
76
+
77
+ ```bash
78
+ npm install uqa
79
+ ```
80
+
81
+ ## Quick Start
82
+
83
+ ### Creating an Engine
84
+
85
+ ```typescript
86
+ import { Engine } from "uqa";
87
+
88
+ // In-memory engine
89
+ const engine = await Engine.create();
90
+
91
+ // Persistent engine (SQLite via sql.js WASM)
92
+ const engine = await Engine.create({ dbPath: "research.db" });
93
+ ```
94
+
95
+ ### SQL Queries
96
+
97
+ ```typescript
98
+ // Create a table
99
+ await engine.sql(`
100
+ CREATE TABLE papers (
101
+ id SERIAL PRIMARY KEY,
102
+ title TEXT NOT NULL,
103
+ year INTEGER NOT NULL,
104
+ citations INTEGER DEFAULT 0
105
+ )
106
+ `);
107
+
108
+ // Insert data
109
+ await engine.sql(`INSERT INTO papers (title, year, citations) VALUES
110
+ ('attention is all you need', 2017, 90000),
111
+ ('bert pre-training', 2019, 75000),
112
+ ('gpt language models', 2020, 50000)
113
+ `);
114
+
115
+ // Full-text search with BM25 scoring
116
+ const result = await engine.sql(`
117
+ SELECT title, _score FROM papers
118
+ WHERE text_match(title, 'attention') ORDER BY _score DESC
119
+ `);
120
+ console.log(result.entries);
121
+
122
+ // Full-text search with @@ operator (query string mini-language)
123
+ const result = await engine.sql(`
124
+ SELECT title, _score FROM papers
125
+ WHERE title @@ 'attention AND transformer' ORDER BY _score DESC
126
+ `);
127
+
128
+ // K-nearest neighbor vector search
129
+ const result = await engine.sql(`
130
+ SELECT title, _score FROM papers
131
+ WHERE knn_match(embedding, ARRAY[0.1, 0.2, 0.3, 0.4], 5)
132
+ ORDER BY _score DESC
133
+ `);
134
+
135
+ // Multi-signal fusion: text + vector + graph
136
+ const result = await engine.sql(`
137
+ SELECT title, _score FROM papers
138
+ WHERE fuse_log_odds(
139
+ text_match(title, 'attention'),
140
+ knn_match(embedding, ARRAY[0.1, 0.2, 0.3, 0.4], 5),
141
+ traverse_match(1, 'cited_by', 2)
142
+ ) AND year >= 2020
143
+ ORDER BY _score DESC
144
+ `);
145
+
146
+ // Multi-field search across title + abstract
147
+ const result = await engine.sql(`
148
+ SELECT title, _score FROM papers
149
+ WHERE multi_field_match(title, abstract, 'attention transformer')
150
+ ORDER BY _score DESC
151
+ `);
152
+
153
+ // Multi-stage retrieval: broad recall -> precise re-ranking
154
+ const result = await engine.sql(`
155
+ SELECT title, _score FROM papers
156
+ WHERE staged_retrieval(
157
+ bayesian_match(title, 'transformer attention'), 50,
158
+ bayesian_match(abstract, 'self attention mechanism'), 10
159
+ ) ORDER BY _score DESC
160
+ `);
161
+
162
+ // Deep fusion: multi-layer neural network as SQL
163
+ const result = await engine.sql(`
164
+ SELECT id, _score FROM patches
165
+ WHERE deep_fusion(
166
+ layer(knn_match(embedding, $1, 16)),
167
+ convolve('spatial', ARRAY[0.6, 0.4]),
168
+ pool('spatial', 'max', 2),
169
+ flatten(),
170
+ dense(ARRAY[...], ARRAY[...], output_channels => 4, input_channels => 8),
171
+ softmax(),
172
+ gating => 'relu'
173
+ ) ORDER BY _score DESC
174
+ `);
175
+
176
+ // JOINs with qualified columns
177
+ const result = await engine.sql(`
178
+ SELECT e.name, d.name AS dept, e.salary
179
+ FROM employees e
180
+ INNER JOIN departments d ON e.dept_id = d.id
181
+ ORDER BY e.salary DESC
182
+ `);
183
+
184
+ // Window functions
185
+ const result = await engine.sql(`
186
+ SELECT rep, sale_date, amount,
187
+ SUM(amount) OVER (ORDER BY sale_date
188
+ ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total
189
+ FROM sales
190
+ `);
191
+
192
+ // Recursive CTE
193
+ const result = await engine.sql(`
194
+ WITH RECURSIVE org_tree AS (
195
+ SELECT id, name, 1 AS depth FROM org_chart WHERE manager_id IS NULL
196
+ UNION ALL
197
+ SELECT o.id, o.name, t.depth + 1
198
+ FROM org_chart o INNER JOIN org_tree t ON o.manager_id = t.id
199
+ )
200
+ SELECT name, depth FROM org_tree ORDER BY depth
201
+ `);
202
+ ```
203
+
204
+ ### Graph Operations (Cypher)
205
+
206
+ ```typescript
207
+ // Create a named graph
208
+ await engine.sql("SELECT * FROM create_graph('social')");
209
+
210
+ // Create vertices and edges via Cypher
211
+ await engine.sql(`
212
+ SELECT * FROM cypher('social', $$
213
+ CREATE (a:Person {name: 'Alice', age: 30})-[:KNOWS]->(b:Person {name: 'Bob', age: 25})
214
+ RETURN a.name, b.name
215
+ $$) AS (a_name agtype, b_name agtype)
216
+ `);
217
+
218
+ // Query the graph
219
+ await engine.sql(`
220
+ SELECT * FROM cypher('social', $$
221
+ MATCH (p:Person)-[:KNOWS]->(friend:Person)
222
+ WHERE p.age > 25
223
+ RETURN p.name, friend.name, p.age
224
+ ORDER BY p.name
225
+ $$) AS (name agtype, friend agtype, age agtype)
226
+ `);
227
+
228
+ // Graph traversal and regular path queries via SQL
229
+ await engine.sql("SELECT _doc_id, title FROM traverse(1, 'cited_by', 2)");
230
+ await engine.sql("SELECT _doc_id, title FROM rpq('cited_by/cited_by', 1)");
231
+
232
+ // Centrality algorithms
233
+ await engine.sql(`
234
+ SELECT * FROM pagerank(0.85, 'social')
235
+ `);
236
+ ```
237
+
238
+ ### Geospatial Queries
239
+
240
+ ```typescript
241
+ await engine.sql(`
242
+ CREATE TABLE restaurants (
243
+ id SERIAL PRIMARY KEY,
244
+ name TEXT NOT NULL,
245
+ cuisine TEXT NOT NULL,
246
+ location POINT
247
+ )
248
+ `);
249
+
250
+ await engine.sql("CREATE INDEX idx_loc ON restaurants USING rtree (location)");
251
+
252
+ // Spatial range query with Haversine distance
253
+ const result = await engine.sql(`
254
+ SELECT name, ROUND(ST_Distance(location, POINT(-73.9857, 40.7484)), 0) AS dist_m
255
+ FROM restaurants
256
+ WHERE spatial_within(location, POINT(-73.9857, 40.7484), 5000)
257
+ ORDER BY dist_m
258
+ `);
259
+
260
+ // Spatial + text + vector fusion
261
+ const result = await engine.sql(`
262
+ SELECT name, _score FROM restaurants
263
+ WHERE fuse_log_odds(
264
+ text_match(description, 'pizza'),
265
+ spatial_within(location, POINT(-73.9969, 40.7306), 3000),
266
+ knn_match(embedding, $1, 5)
267
+ ) ORDER BY _score DESC
268
+ `);
269
+ ```
270
+
271
+ ### QueryBuilder API
272
+
273
+ ```typescript
274
+ import { Engine, Equals, GreaterThanOrEqual } from "uqa";
275
+
276
+ const engine = await Engine.create();
277
+
278
+ // Text search with scoring
279
+ const result = await engine
280
+ .query({ table: "papers" })
281
+ .term("attention", { field: "title" })
282
+ .scoreBayesianBM25("attention")
283
+ .execute();
284
+
285
+ // Nested data: filter + aggregate
286
+ const result = await engine
287
+ .query({ table: "orders" })
288
+ .filter("shipping.city", new Equals("Seoul"))
289
+ .pathAggregate("items.price", "sum")
290
+ .execute();
291
+
292
+ // Graph traversal + aggregation
293
+ const team = engine
294
+ .query({ table: "employees" })
295
+ .traverse(2, "manages", { maxHops: 2 });
296
+ const total = team.vertexAggregate("salary", "sum");
297
+
298
+ // Multi-field search with per-field weights
299
+ const result = await engine
300
+ .query({ table: "papers" })
301
+ .scoreMultiFieldBayesian("attention", ["title", "abstract"], [2.0, 1.0])
302
+ .execute();
303
+
304
+ // Multi-stage pipeline: broad recall -> re-rank
305
+ const s1 = engine
306
+ .query({ table: "papers" })
307
+ .scoreBayesianBM25("transformer", "title");
308
+ const s2 = engine
309
+ .query({ table: "papers" })
310
+ .scoreBayesianBM25("attention", "abstract");
311
+ const result = await engine
312
+ .query({ table: "papers" })
313
+ .multiStage([
314
+ [s1, 50],
315
+ [s2, 10],
316
+ ])
317
+ .execute();
318
+
319
+ // Temporal graph traversal
320
+ const result = await engine
321
+ .query({ table: "social" })
322
+ .temporalTraverse(1, "knows", { maxHops: 2, timestamp: 1700000000.0 })
323
+ .execute();
324
+
325
+ // Facets over all documents
326
+ const facets = engine.query({ table: "papers" }).facet("status");
327
+ ```
328
+
329
+ ### Text Analysis
330
+
331
+ ```typescript
332
+ // Create a custom analyzer via SQL
333
+ await engine.sql(`
334
+ SELECT * FROM create_analyzer('english_stem', '{
335
+ "tokenizer": {"type": "standard"},
336
+ "token_filters": [
337
+ {"type": "lowercase"},
338
+ {"type": "stop", "language": "english"},
339
+ {"type": "porter_stem"}
340
+ ],
341
+ "char_filters": []
342
+ }')
343
+ `);
344
+
345
+ await engine.sql("SELECT * FROM list_analyzers()");
346
+ ```
347
+
348
+ ## Architecture
349
+
350
+ ```mermaid
351
+ graph TD
352
+ SQL[SQL Parser<br/>libpg-query WASM] --> Compiler[SQL Compiler]
353
+ QB[QueryBuilder<br/>Fluent API] --> Operators
354
+
355
+ Compiler --> Optimizer[Query Optimizer]
356
+ Optimizer --> Operators[Operator Tree]
357
+ Operators --> Executor[Plan Executor]
358
+ Operators --> Cypher[Cypher Compiler<br/>openCypher]
359
+
360
+ Executor --> DS[Document Store<br/>sql.js]
361
+ Executor --> II[Inverted Index<br/>sql.js + Analyzer]
362
+ Executor --> VI["Vector Index<br/>IVF"]
363
+ Executor --> SI[Spatial Index<br/>R*Tree]
364
+ Executor --> GS[Graph Store<br/>sql.js<br/>Named Graphs]
365
+
366
+ subgraph Scoring ["Scoring (bayesian-bm25)"]
367
+ BM25[BM25]
368
+ BBFS[Bayesian BM25]
369
+ VS[Vector Scorer]
370
+ end
371
+
372
+ subgraph Fusion ["Fusion (bayesian-bm25)"]
373
+ LO[Log-Odds]
374
+ PB[Probabilistic Boolean]
375
+ end
376
+
377
+ Operators --> Scoring
378
+ Operators --> Fusion
379
+ ```
380
+
381
+ ### Package Structure
382
+
383
+ ```
+ src/
+   core/        PostingList, types, hierarchical documents, functors
+   analysis/    Text analysis pipeline: CharFilter, Tokenizer, TokenFilter, Analyzer,
+                dual index/search analyzers
+   storage/     Backend-agnostic stores with sql.js persistence: documents, inverted index,
+                vectors (IVF), spatial (R*Tree), graph
+   operators/   Operator algebra (boolean, primitive, hybrid, aggregation
+                (count/sum/avg/min/max/quantile), hierarchical (with cost estimation),
+                sparse, multi-field, attention fusion, learned fusion, multi-stage,
+                deep fusion (ResNet/GNN/CNN/DenseNet), deep learning (training pipeline))
+   scoring/     BM25, Bayesian BM25, VectorScorer, WAND/BlockMaxWAND, calibration,
+                parameter learning, external prior, multi-field, fusion WAND
+                (via bayesian-bm25), adaptive WAND, bound tightness
+   fusion/      Log-odds conjunction (fuse + fuse_mean), probabilistic boolean, attention
+                fusion, learned fusion, query features (via bayesian-bm25), adaptive fusion
+   graph/       GraphStore, traversal, pattern matching, RPQ, bounded RPQ, weighted paths,
+                centrality (PageRank, HITS, betweenness), cross-paradigm, indexes,
+                subgraph index, incremental matching, temporal filter/traverse/pattern,
+                delta/versioned store, message passing, embeddings, named graphs,
+                property indexes, join operators, RPQ optimizer, pattern negation
+   cypher/      openCypher lexer, parser, AST, posting-list-based compiler
+   joins/       Hash, sort-merge, index, graph, cross-paradigm, similarity joins,
+                semi-join, anti-join
+   execution/   Volcano iterator engine: columnar batches, vectorized operators
+   planner/     Cost model, cardinality estimator, optimizer, DPccp join enumerator
+   sql/         SQL compiler (libpg-query WASM), expression evaluator, FTS query parser,
+                table DDL/DML
+   api/         Fluent QueryBuilder
+ tests/         2,832 tests across 108 test files
+ ```
414
+
415
+ ## SQL Reference
416
+
417
+ ### SQL Interface
418
+
419
+ | Category | Syntax |
420
+ |----------|--------|
421
+ | DDL | `CREATE TABLE [IF NOT EXISTS]`, `CREATE TEMPORARY TABLE`, `DROP TABLE [IF EXISTS]`, `CREATE TABLE AS SELECT`, `ALTER TABLE` (ADD/DROP/RENAME COLUMN, SET/DROP DEFAULT, SET/DROP NOT NULL, ALTER TYPE USING), `TRUNCATE TABLE`, `CREATE INDEX`, `DROP INDEX`, `CREATE SEQUENCE`/`NEXTVAL`/`CURRVAL`/`SETVAL`, `ALTER SEQUENCE` |
422
+ | Constraints | `PRIMARY KEY`, `NOT NULL`, `DEFAULT`, `UNIQUE`, `CHECK`, `FOREIGN KEY` (with insert/update/delete validation) |
423
+ | DML | `INSERT INTO ... VALUES`, `INSERT INTO ... SELECT`, `INSERT ... ON CONFLICT DO NOTHING/UPDATE`, `INSERT ... RETURNING`, `UPDATE ... SET ... WHERE [RETURNING]`, `UPDATE ... FROM` (join), `DELETE FROM ... WHERE [RETURNING]`, `DELETE ... USING` (join) |
424
+ | DQL | `SELECT [DISTINCT] ... FROM ... WHERE ... GROUP BY ... HAVING ... ORDER BY [NULLS FIRST/LAST] ... LIMIT ... OFFSET`, `FETCH FIRST n ROWS ONLY`, standalone `VALUES` |
425
+ | Joins | `INNER JOIN`, `LEFT JOIN`, `RIGHT JOIN`, `FULL OUTER JOIN`, `CROSS JOIN` with equality and non-equality `ON` conditions |
426
+ | Set Ops | `UNION [ALL]`, `INTERSECT [ALL]`, `EXCEPT [ALL]` with chaining |
427
+ | Subqueries | `IN (SELECT ...)`, `EXISTS (SELECT ...)`, scalar subqueries, correlated subqueries, derived tables (`FROM (SELECT ...) AS alias`) |
428
+ | CTEs | `WITH name AS (SELECT ...)`, `WITH RECURSIVE` |
429
+ | Views | `CREATE VIEW`, `DROP VIEW` |
430
+ | Window | `ROW_NUMBER`, `RANK`, `DENSE_RANK`, `NTILE`, `LAG`, `LEAD`, `NTH_VALUE`, `PERCENT_RANK`, `CUME_DIST`, aggregates `OVER (PARTITION BY ... ORDER BY ... ROWS/RANGE BETWEEN ...)`, `WINDOW w AS (...)`, `FILTER (WHERE ...)` on window aggregates |
431
+ | Aggregates | `COUNT [DISTINCT]`, `SUM`, `AVG`, `MIN`, `MAX`, `STRING_AGG`, `ARRAY_AGG`, `BOOL_AND`/`EVERY`, `BOOL_OR`, `STDDEV`/`VARIANCE`, `PERCENTILE_CONT/DISC`, `MODE`, `JSON_OBJECT_AGG`, `CORR`, `COVAR_POP/SAMP`, `REGR_*` (10 functions), `deep_learn(...)`, `FILTER (WHERE ...)`, `ORDER BY` within aggregate |
432
+ | Types | `INTEGER`, `BIGINT`, `SERIAL`, `TEXT`, `VARCHAR`, `REAL`, `FLOAT`, `DOUBLE PRECISION`, `NUMERIC(p,s)`, `BOOLEAN`, `DATE`, `TIME`, `TIMESTAMP`, `TIMESTAMPTZ`, `INTERVAL`, `JSON`/`JSONB`, `UUID`, `BYTEA`, `INTEGER[]` (arrays), `VECTOR(N)`, `POINT` |
433
+ | Date/Time | `EXTRACT`, `DATE_TRUNC`, `DATE_PART`, `NOW()`, `CURRENT_DATE`, `CURRENT_TIMESTAMP`, `CURRENT_TIME`, `CLOCK_TIMESTAMP`, `TIMEOFDAY`, `AGE`, `TO_CHAR`, `TO_DATE`, `TO_TIMESTAMP`, `MAKE_DATE`, `MAKE_TIMESTAMP`, `MAKE_INTERVAL`, `TO_NUMBER`, `OVERLAPS`, `ISFINITE` |
434
+ | JSON | `->`, `->>`, `#>`, `#>>` operators, `@>` / `<@` containment, `?` / `?|` / `?&` key existence, `JSONB_SET`, `JSONB_STRIP_NULLS`, `JSON_BUILD_OBJECT`, `JSON_BUILD_ARRAY`, `JSON_OBJECT_KEYS`, `JSON_EXTRACT_PATH`, `JSON_TYPEOF`, `JSON_AGG`, `::jsonb` cast |
435
+ | Table Funcs | `GENERATE_SERIES`, `UNNEST`, `REGEXP_SPLIT_TO_TABLE`, `JSON_EACH`/`JSON_EACH_TEXT`, `JSON_ARRAY_ELEMENTS`/`JSON_ARRAY_ELEMENTS_TEXT` |
436
+ | FTS | `column @@ 'query'` full-text search operator with query string mini-language: bare terms, `"phrases"`, `field:term`, `field:[vector]`, `AND`/`OR`/`NOT`, implicit AND, parenthesized grouping, hybrid text+vector fusion |
437
+ | Functions | 90+ scalar functions: string (`CONCAT_WS`, `POSITION`, `LPAD`, `REVERSE`, `MD5`, `OVERLAY`, `REGEXP_MATCH`, `ENCODE`/`DECODE`, ...), math (`POWER`, `SQRT`, `LN`, `CBRT`, `GCD`, `LCM`, `MIN_SCALE`, `TRIM_SCALE`, trig, ...), conditional (`GREATEST`, `LEAST`, `NULLIF`) |
438
+ | Prepared | `PREPARE name AS ...`, `EXECUTE name(params)`, `DEALLOCATE name` |
439
+ | Utility | `EXPLAIN SELECT ...`, `ANALYZE [table]` |
440
+ | Transactions | `BEGIN`, `COMMIT`, `ROLLBACK`, `SAVEPOINT` |
441
+ | System | `information_schema.columns`, `pg_catalog.pg_tables`, `pg_catalog.pg_views`, `pg_catalog.pg_indexes`, `pg_catalog.pg_type` |
442
+
443
+ ### Extended WHERE Functions
444
+
445
+ | Function | Description |
446
+ |----------|-------------|
447
+ | `column @@ 'query'` | Full-text search operator with query string mini-language (boolean, phrase, field targeting, hybrid text+vector) |
448
+ | `text_match(field, 'query')` | Full-text search with BM25 scoring |
449
+ | `bayesian_match(field, 'query')` | Bayesian BM25 -- calibrated P(relevant) in [0,1] |
450
+ | `knn_match(field, vector, k)` | K-nearest neighbor vector search (vector as `ARRAY[...]` or `$N`) |
451
+ | `traverse_match(start, 'label', hops)` | Graph reachability as a scored signal |
452
+ | `path_filter(path, value)` | Hierarchical equality filter (any-match on arrays) |
453
+ | `path_filter(path, op, value)` | Hierarchical comparison filter (`>`, `<`, `>=`, `<=`, `!=`) |
454
+ | `spatial_within(field, POINT(x,y), dist)` | Geospatial range query (R*Tree + Haversine) |
455
+ | `sparse_threshold(signal, threshold)` | ReLU thresholding: max(0, score - threshold) |
456
+ | `multi_field_match(f1, f2, ..., query)` | Multi-field Bayesian BM25 with log-odds fusion |
457
+ | `bayesian_match_with_prior(f, q, pf, mode)` | Bayesian BM25 with external prior (recency/authority) |
458
+ | `temporal_traverse(start, lbl, hops, ts)` | Time-aware graph traversal |
459
+ | `message_passing(k, agg, property)` | GNN k-layer neighbor aggregation |
460
+ | `graph_embedding(dims, k)` | Structural graph embeddings |
461
+ | `vector_exclude(f, pos, neg, k, theta)` | Vector exclusion: positive minus negative similarity |
462
+ | `pagerank([damping[, iter[, tol]]][, 'graph'])` | PageRank centrality scoring |
463
+ | `hits([iter[, tol]][, 'graph'])` | HITS hub/authority scoring |
464
+ | `betweenness(['graph'])` | Betweenness centrality (Brandes) |
465
+ | `weighted_rpq('expr', start, 'prop'[, 'agg'[, threshold]])` | Weighted RPQ with aggregate predicates |
466
+
467
+ ### Fusion Meta-Functions
468
+
469
+ | Function | Description |
470
+ |----------|-------------|
471
+ | `fuse_log_odds(sig1, sig2, ...[, alpha][, 'relu'\|'swish'])` | Log-odds conjunction with optional gating |
472
+ | `fuse_prob_and(sig1, sig2, ...)` | Probabilistic AND: P = prod(P_i) |
473
+ | `fuse_prob_or(sig1, sig2, ...)` | Probabilistic OR: P = 1 - prod(1 - P_i) |
474
+ | `fuse_prob_not(signal)` | Probabilistic NOT: P = 1 - P_signal |
475
+ | `fuse_attention(sig1, sig2, ...)` | Attention-weighted log-odds fusion |
476
+ | `fuse_learned(sig1, sig2, ...)` | Learned-weight log-odds fusion |
477
+ | `staged_retrieval(sig1, k1, sig2, k2, ...)` | Multi-stage cascading retrieval pipeline |
478
+ | `progressive_fusion(sig1, sig2, k1, sig3, k2[, alpha][, 'gating'])` | Progressive multi-stage WAND fusion |
479
+ | `deep_fusion(layer(...), propagate(...), convolve(...), ...[, gating])` | Multi-layer Bayesian fusion (ResNet + GNN + CNN) |
480
+
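The three probabilistic formulas in the table above can be written out directly. The following is a minimal sketch, not UQA's implementation: the helper names `probAnd`, `probOr`, and `probNot` are hypothetical, and each input is assumed to be an already calibrated probability in [0, 1] (for example, the output of `bayesian_match`).

```typescript
// Hypothetical helpers mirroring the formulas in the table; not UQA's code.
// Every input is assumed to be a calibrated probability in [0, 1].

/** Probabilistic AND: P = prod(P_i). */
function probAnd(signals: number[]): number {
  return signals.reduce((acc, p) => acc * p, 1);
}

/** Probabilistic OR: P = 1 - prod(1 - P_i). */
function probOr(signals: number[]): number {
  return 1 - signals.reduce((acc, p) => acc * (1 - p), 1);
}

/** Probabilistic NOT: P = 1 - P_signal. */
function probNot(p: number): number {
  return 1 - p;
}

// Example: combine a text signal and a vector signal for one document.
const textScore = 0.8;
const vectorScore = 0.5;
console.log("AND:", probAnd([textScore, vectorScore]));
console.log("OR: ", probOr([textScore, vectorScore]));
console.log("NOT:", probNot(textScore));
```

Under these definitions AND can only lower the combined probability and OR can only raise it, which is why the two behave like a strict and a lenient way of combining signals.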
481
+ ### Deep Fusion Layer Functions
482
+
483
+ Used inside `deep_fusion()` to compose neural network pipelines:
484
+
485
+ | Function | Description |
486
+ |----------|-------------|
487
+ | `layer(sig1, sig2, ...)` | Signal layer: log-odds conjunction with residual connection (ResNet) |
488
+ | `propagate('label', 'agg'[, 'dir'])` | Graph propagation: spread scores through edges (GNN) |
489
+ | `convolve('label', ARRAY[w...][, 'dir'])` | Spatial convolution: weighted multi-hop BFS aggregation (CNN) |
490
+ | `pool('label', 'method', size[, 'dir'])` | Spatial downsampling via greedy BFS partitioning |
491
+ | `dense(ARRAY[W], ARRAY[b], output_channels => N, input_channels => M)` | Fully connected layer |
492
+ | `flatten()` | Collapse spatial nodes into a single vector |
493
+ | `global_pool('avg'\|'max'\|'avg_max')` | Channel-preserving spatial reduction (alternative to flatten) |
494
+ | `softmax()` | Classification head (numerically stable) |
495
+ | `batch_norm([epsilon => 1e-5])` | Per-channel normalization across nodes |
496
+ | `dropout(p)` | Inference-mode scaling by (1 - p) |
497
+ | `attention(n_heads => N, mode => 'content'\|'random_qk'\|'learned_v')` | Self-attention: context-dependent PoE (Theorem 8.3) |
498
+ | `model('name', $1)` | Load trained model and create full inference pipeline |
499
+ | `embed(vector, in_channels => C, grid_h => H, grid_w => W)` | Inject raw embedding vector into channel map |
500
+
501
+ ### Deep Learning Functions
502
+
503
+ | Function | Description |
504
+ |----------|-------------|
505
+ | `deep_learn('model', label, embedding, 'edge_label', layers...[, gating][, lambda][, l1_ratio][, prune_ratio])` | SELECT aggregate: train a CNN classifier analytically (ridge regression, no backpropagation). Optional L1 regularization and magnitude pruning. |
506
+ | `deep_predict('model', embedding)` | Per-row scalar: inference with trained model, returns class probabilities |
507
+ | `build_grid_graph('table', rows, cols, 'label')` | FROM-clause: construct 4-connected grid graph for spatial convolution |
508
+
509
+ ### SELECT Spatial Functions
510
+
511
+ | Function | Description |
512
+ |----------|-------------|
513
+ | `ST_Distance(point1, point2)` | Haversine great-circle distance in meters |
514
+ | `ST_Within(point1, point2, dist)` | Distance predicate (boolean) |
515
+ | `ST_DWithin(point1, point2, dist)` | Alias for ST_Within |
516
+ | `POINT(x, y)` | Construct a POINT value (longitude, latitude) |
517
+
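For readers unfamiliar with the Haversine formula behind `ST_Distance`, here is a self-contained sketch. It is an illustration under stated assumptions rather than the library's code: the `Point` interface, the function names, and the mean Earth radius constant are all choices made for this example. The `x`/`y` ordering follows the `POINT(x, y)` convention above (longitude, latitude).

```typescript
// Sketch only: the radius constant is an assumption, not a value taken from UQA.
const EARTH_RADIUS_M = 6371008.8;

// x = longitude, y = latitude, both in degrees (same order as POINT(x, y)).
interface Point {
  x: number;
  y: number;
}

function toRadians(degrees: number): number {
  return (degrees * Math.PI) / 180;
}

/** Great-circle distance between two points, in meters. */
function haversineMeters(a: Point, b: Point): number {
  const dLat = toRadians(b.y - a.y);
  const dLon = toRadians(b.x - a.x);
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRadians(a.y)) * Math.cos(toRadians(b.y)) * Math.sin(dLon / 2) ** 2;
  // Clamp to 1 to guard against rounding slightly above the asin domain.
  return 2 * EARTH_RADIUS_M * Math.asin(Math.min(1, Math.sqrt(h)));
}

/** Distance predicate in the spirit of ST_Within / ST_DWithin. */
function withinDistance(a: Point, b: Point, distMeters: number): boolean {
  return haversineMeters(a, b) <= distMeters;
}

// One degree of latitude along a meridian is roughly 111 km.
const origin: Point = { x: 0, y: 0 };
const oneDegreeNorth: Point = { x: 0, y: 1 };
console.log(haversineMeters(origin, oneDegreeNorth));
console.log(withinDistance(origin, oneDegreeNorth, 120000));
```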
518
+ ### FROM-Clause Table Functions
519
+
520
+ | Function | Description |
521
+ |----------|-------------|
522
+ | `traverse(start, 'label', hops)` | BFS graph traversal |
523
+ | `rpq('path_expr', start)` | Regular path query (NFA simulation) |
524
+ | `text_search('query', 'field', 'table')` | Table-scoped full-text search |
525
+ | `generate_series(start, stop[, step])` | Generate a series of values |
526
+ | `unnest(array)` | Expand an array to a set of rows |
527
+ | `regexp_split_to_table(str, pattern)` | Split string by regex into rows |
528
+ | `json_each(json)` / `json_each_text(json)` | Expand JSON object to key/value rows |
529
+ | `json_array_elements(json)` | Expand JSON array to a set of rows |
530
+ | `pagerank([damping][, 'table_or_graph'])` | PageRank centrality as table source |
531
+ | `hits([iter][, 'table_or_graph'])` | HITS hub/authority as table source |
532
+ | `betweenness(['table_or_graph'])` | Betweenness centrality as table source |
533
+ | `graph_add_vertex(id, 'label', 'table'[, 'props'])` | Add graph vertex to table's graph store |
534
+ | `graph_add_edge(eid, src, tgt, 'label', 'table'[, 'props'])` | Add graph edge to table's graph store |
535
+ | `create_graph('name')` | Create a named graph namespace |
536
+ | `drop_graph('name')` | Drop a named graph |
537
+ | `cypher('graph', $$ query $$) AS (cols)` | Execute openCypher query on a named graph |
538
+ | `create_analyzer('name', 'config')` | Create a custom text analyzer (JSON config) |
539
+ | `drop_analyzer('name')` | Drop a custom text analyzer |
540
+ | `set_table_analyzer('tbl', 'field', 'name'[, 'phase'])` | Assign index/search analyzer to a field |
541
+ | `list_analyzers()` | List all registered analyzers |
542
+ | `build_grid_graph('table', rows, cols, 'label')` | Construct 4-connected grid graph for spatial convolution |
543
+
544
+ ### Persistence
545
+
546
+ When an engine is created with `dbPath`, all data is persisted to SQLite (via sql.js WASM) in the following tables:
547
+
548
+ | Store | SQLite Table | Description |
549
+ |-------|-------------|-------------|
550
+ | Documents | `_data_{table}` | Typed columns per table |
551
+ | Inverted Index | `_inverted_{table}_{field}` | Per-table per-field posting lists |
552
+ | Field Stats | `_field_stats_{table}` | Per-table field-level statistics (BM25) |
553
+ | Doc Lengths | `_doc_lengths_{table}` | Per-table per-document token lengths (BM25) |
554
+ | Vectors | `_ivf_centroids_{table}_{field}`, `_ivf_lists_{table}_{field}` | IVF index via `CREATE INDEX ... USING hnsw` or `USING ivf` |
555
+ | Spatial | `_rtree_{table}_{field}` | R*Tree virtual table for POINT columns |
556
+ | Graph | `_graph_vertices_{table}`, `_graph_edges_{table}` | Per-table adjacency-indexed graph with vertex labels |
557
+ | Named Graphs | `_graph_catalog_{table}`, `_graph_membership_{table}` | Per-graph partitioned adjacency with catalog and membership tables |
558
+ | B-tree Indexes | SQLite indexes on `_data_{table}` | `CREATE INDEX` support |
559
+ | Analyzers | `_analyzers` | Custom text analyzer configurations |
560
+ | Field Analyzers | `_table_field_analyzers` | Per-field index/search analyzer assignments |
561
+ | Foreign Servers | `_foreign_servers` | FDW server definitions (type, connection options) |
562
+ | Foreign Tables | `_foreign_tables` | FDW table definitions (columns, source, options) |
563
+ | Path Indexes | `_path_indexes` | Pre-computed label-sequence RPQ accelerators |
564
+ | Statistics | `_column_stats` | Per-table histograms and MCVs for optimizer |
565
+ | Models | `_models` | Trained deep learning model configurations (JSON) |
566
+
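When inspecting a persisted database file by hand, it can help to expand the `{table}` and `{field}` placeholders from the table above. The helper below is a hypothetical convenience written for this example (UQA does not document such a function); it only substitutes names into the patterns listed above and covers a subset of the stores.

```typescript
// Hypothetical helper: expands the naming patterns from the persistence table.
// It does not query SQLite and is not part of the UQA API.
interface PersistenceTableNames {
  documents: string;
  invertedIndex: string;
  fieldStats: string;
  docLengths: string;
  ivfCentroids: string;
  ivfLists: string;
  rtree: string;
  graphVertices: string;
  graphEdges: string;
}

function persistenceTables(table: string, field: string): PersistenceTableNames {
  return {
    documents: `_data_${table}`,
    invertedIndex: `_inverted_${table}_${field}`,
    fieldStats: `_field_stats_${table}`,
    docLengths: `_doc_lengths_${table}`,
    ivfCentroids: `_ivf_centroids_${table}_${field}`,
    ivfLists: `_ivf_lists_${table}_${field}`,
    rtree: `_rtree_${table}_${field}`,
    graphVertices: `_graph_vertices_${table}`,
    graphEdges: `_graph_edges_${table}`,
  };
}

// Example: names expected for a table "articles" with an indexed field "body".
console.log(persistenceTables("articles", "body"));
```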
567
+ ### Query Optimizer
568
+
569
+ - Algebraic simplification (idempotent intersection/union, absorption law, empty elimination)
570
+ - Cost-based optimization with equi-depth histograms and Most Common Values (MCV)
571
+ - **DPccp join order optimization** (Moerkotte & Neumann, 2006) -- O(3^n) dynamic programming over connected subgraph complement pairs; produces optimal bushy join trees for INNER JOIN chains with 2+ relations; greedy fallback for 16+ relations
572
+ - Filter pushdown into intersections (recursive through nested IntersectOperators)
573
+ - Vector threshold merge with floating-point tolerance
574
+ - Intersect operand reordering by execution cost (cheapest first)
575
+ - Fusion signal reordering by cost (cheapest first)
576
+ - Early termination in IntersectOperator (skip remaining operands when accumulator is empty)
577
+ - Predicate-aware cardinality damping (same-column vs different-column correlation)
578
+ - Join-algorithm-aware DPccp cost model (index join vs hash join threshold)
579
+ - R*Tree spatial index scan for POINT column range queries
580
+ - B-tree index scan substitution (replace full scans when profitable)
581
+ - Cross-paradigm cardinality estimation for text, vector, graph, fusion, temporal, and GNN operators
582
+ - Edge property filter pushdown into graph pattern constraints
583
+ - Join-pattern fusion (merge intersected pattern matches with shared variables)
584
+ - Cross-paradigm join cost models (text similarity, vector similarity, graph, hybrid joins)
585
+ - Threshold-aware vector selectivity estimation (4-tier threshold buckets)
586
+ - Temporal graph cardinality correction with timestamp/range selectivity
587
+ - Path index acceleration for simple Concat-of-Labels RPQ expressions and Cypher MATCH patterns
588
+ - CTE inlining for single-reference non-recursive CTEs
589
+ - Predicate pushdown into views and derived tables
590
+ - Implicit cross join reordering via DPccp when equijoin predicates exist in WHERE
591
+ - Filter pushdown into graph traverse operators (vertex predicate BFS pruning)
592
+ - Graph-aware fusion signal reordering with per-graph cost model
593
+ - Named graph scoped statistics (degree distribution, label degree, vertex label counts)
594
+ - Information-theoretic cardinality lower bounds (entropy-based, histogram-aware)
595
+ - Hierarchical operator cost estimation (PathFilter, PathProject, PathUnnest, PathAggregate)
596
+ - Negation-aware pattern match cost estimation
597
+
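Two of the rules above, intersect operand reordering (cheapest first) and early termination when the accumulator is empty, are easy to illustrate in isolation. The sketch below is not the `IntersectOperator` implementation: posting lists are modeled as sorted arrays of document IDs, list length stands in for the optimizer's cost estimate, and the function names are hypothetical.

```typescript
// Illustrative only. Posting lists are sorted arrays of doc IDs, and
// list length is used as a stand-in for the estimated execution cost.

/** Merge-style intersection of two ascending arrays. */
function intersectSorted(a: number[], b: number[]): number[] {
  const out: number[] = [];
  let i = 0;
  let j = 0;
  while (i < a.length && j < b.length) {
    if (a[i] === b[j]) {
      out.push(a[i]);
      i++;
      j++;
    } else if (a[i] < b[j]) {
      i++;
    } else {
      j++;
    }
  }
  return out;
}

/**
 * Intersects all operands, cheapest first, and stops as soon as the
 * accumulator is empty. `evaluated` counts the operands actually consumed.
 */
function intersectAll(operands: number[][]): { result: number[]; evaluated: number } {
  if (operands.length === 0) {
    return { result: [], evaluated: 0 };
  }
  // Cheapest (shortest) list first keeps the accumulator small.
  const ordered = [...operands].sort((x, y) => x.length - y.length);
  let acc = ordered[0];
  let evaluated = 1;
  for (let k = 1; k < ordered.length; k++) {
    if (acc.length === 0) {
      break; // early termination: nothing left to match
    }
    acc = intersectSorted(acc, ordered[k]);
    evaluated++;
  }
  return { result: acc, evaluated };
}

console.log(intersectAll([[1, 2, 3, 4, 5], [2, 4], [4, 9]]));
console.log(intersectAll([[1, 2, 3], [7], [8], [1, 2]]));
```

In the second call the two single-element lists are intersected first, the accumulator becomes empty, and the larger lists are never scanned.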
598
+ ## API Reference
599
+
600
+ ### Core Exports
601
+
602
+ ```typescript
+ // Core data structures
+ import {
+   PostingList,
+   GeneralizedPostingList,
+   HierarchicalDocument,
+ } from "uqa";
+
+ // Type system (predicates)
+ import {
+   Equals,
+   NotEquals,
+   GreaterThan,
+   GreaterThanOrEqual,
+   LessThan,
+   LessThanOrEqual,
+   InSet,
+   Between,
+   IsNull,
+   IsNotNull,
+   Like,
+   ILike,
+   IndexStats,
+ } from "uqa";
+
+ // Storage backends
+ import {
+   MemoryDocumentStore,
+   MemoryInvertedIndex,
+   FlatVectorIndex,
+   MemoryGraphStore,
+ } from "uqa";
+
+ // Scoring
+ import {
+   BM25Scorer,
+   createBM25Params,
+   BayesianBM25Scorer,
+   createBayesianBM25Params,
+ } from "uqa";
+
+ // Operators
+ import {
+   Operator,
+   TermOperator,
+   KNNOperator,
+   FilterOperator,
+   UnionOperator,
+   IntersectOperator,
+   ComplementOperator,
+ } from "uqa";
+
+ // Analysis
+ import {
+   Analyzer,
+   standardAnalyzer,
+   whitespaceAnalyzer,
+   keywordAnalyzer,
+ } from "uqa";
+
+ // Engine and QueryBuilder
+ import { Engine, QueryBuilder } from "uqa";
+
+ // SQL
+ import { Table } from "uqa";
+ ```
668
+
669
+ ### Engine
670
+
671
+ | Method | Description |
672
+ |--------|-------------|
673
+ | `Engine.create(options?)` | Create an engine instance (async, initializes WASM) |
674
+ | `engine.sql(query, params?)` | Execute a SQL query |
675
+ | `engine.query(options)` | Create a QueryBuilder for fluent queries |
676
+ | `engine.getDocument(id, table)` | Retrieve a document by ID |
677
+ | `engine.close()` | Close the engine and release resources |
678
+
679
+ ### QueryBuilder
680
+
681
+ | Method | Description |
682
+ |--------|-------------|
683
+ | `.term(query, options?)` | Full-text term search |
684
+ | `.filter(field, predicate)` | Apply a filter predicate |
685
+ | `.knn(field, vector, k)` | K-nearest neighbor search |
686
+ | `.traverse(start, label, options?)` | Graph traversal |
687
+ | `.scoreBM25(query, field?)` | Score with BM25 |
688
+ | `.scoreBayesianBM25(query, field?)` | Score with Bayesian BM25 |
689
+ | `.scoreMultiFieldBayesian(query, fields, weights?)` | Multi-field Bayesian scoring |
690
+ | `.fuseLogOdds(signals)` | Log-odds fusion of multiple signals |
691
+ | `.multiStage(stages)` | Multi-stage retrieval pipeline |
692
+ | `.pathAggregate(path, func)` | Hierarchical path aggregation |
693
+ | `.vertexAggregate(property, func)` | Graph vertex aggregation |
694
+ | `.temporalTraverse(start, label, options?)` | Temporal graph traversal |
695
+ | `.facet(field)` | Facet computation |
696
+ | `.execute()` | Execute and return results |
697
+
698
+ ## Dependencies
699
+
700
+ | Package | Purpose | Notes |
701
+ |---------|---------|-------|
702
+ | [libpg-query](https://www.npmjs.com/package/libpg-query) | SQL parsing | PostgreSQL 17 parser compiled to WASM |
703
+ | [bayesian-bm25](https://www.npmjs.com/package/bayesian-bm25) | BM25/Bayesian scoring and fusion | Probabilistic scoring framework |
704
+ | [sql.js](https://www.npmjs.com/package/sql.js) | SQLite persistence | SQLite compiled to WASM |
705
+ | [apache-arrow](https://www.npmjs.com/package/apache-arrow) | Columnar data format | Arrow IPC for execution engine batches |
706
+ | [@duckdb/duckdb-wasm](https://www.npmjs.com/package/@duckdb/duckdb-wasm) | Foreign Data Wrapper backend | Parquet/CSV/JSON query pushdown |
707
+ | [xterm](https://www.npmjs.com/package/xterm) | Terminal emulation | Browser-based SQL shell |
708
+ | [highlight.js](https://www.npmjs.com/package/highlight.js) | Syntax highlighting | SQL syntax highlighting in shell |
709
+ | [comlink](https://www.npmjs.com/package/comlink) | Web Worker communication | RPC for off-main-thread execution |
710
+
711
+ ## Browser Compatibility
712
+
713
+ UQA-JS runs in any modern browser that supports WebAssembly:
714
+
715
+ | Browser | Minimum Version |
716
+ |---------|----------------|
717
+ | Chrome | 57+ |
718
+ | Firefox | 52+ |
719
+ | Safari | 11+ |
720
+ | Edge | 79+ |
721
+
722
+ All heavy computation (SQL parsing, SQLite persistence, scoring) is performed via WASM modules. The engine can optionally run in a Web Worker via Comlink for non-blocking UI interaction.
723
+
724
+ ### Node.js
725
+
726
+ Node.js 18+ is supported for server-side usage and testing.
727
+
728
+ ## Build
729
+
730
+ ```bash
+ # Install dependencies
+ npm install
+
+ # Type check
+ npm run check
+
+ # Build ESM + UMD bundles
+ npm run build
+
+ # Run tests
+ npm test
+
+ # Lint
+ npm run lint
+ ```
746
+
747
+ The build produces:
748
+
749
+ | Output | Path | Format |
750
+ |--------|------|--------|
751
+ | ES module | `dist/uqa.es.js` | ESM |
752
+ | UMD bundle | `dist/uqa.umd.js` | UMD (browser global) |
753
+ | Type declarations | `dist/types/` | `.d.ts` files |
754
+
755
+ ## Tests
756
+
757
+ ```bash
+ # Run all tests
+ npm test
+
+ # Run a specific test file
+ npx vitest run tests/test_sql.test.ts
+
+ # Run tests in watch mode
+ npm run test:watch
+ ```
767
+
768
+ 2,832 tests across 108 test files covering:
769
+
770
+ - Boolean algebra axioms verified with 100 random trials each
771
+ - De Morgan's laws and sorted-order invariants
772
+ - Test coverage for the operator, storage, scoring, graph, SQL, execution, and planner modules
773
+
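The randomized axiom checks described above follow a common property-testing pattern. The sketch below shows that pattern in plain TypeScript; it is not taken from the UQA test suite, it does not use `PostingList`, and every name in it is hypothetical. Sets of document IDs are modeled as ascending arrays over a small fixed universe, and a seeded generator keeps the trials reproducible.

```typescript
// Hypothetical property check; not code from the UQA test suite.
// Sets of document IDs are modeled as strictly ascending number arrays.
const UNIVERSE: number[] = Array.from({ length: 32 }, (_, i) => i);

// Small deterministic LCG so a failing trial can be reproduced from its seed.
function makeRandom(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (Math.imul(state, 1664525) + 1013904223) >>> 0;
    return state / 4294967296;
  };
}

const union = (a: number[], b: number[]): number[] =>
  UNIVERSE.filter((x) => a.includes(x) || b.includes(x));
const intersect = (a: number[], b: number[]): number[] =>
  UNIVERSE.filter((x) => a.includes(x) && b.includes(x));
const complement = (a: number[]): number[] =>
  UNIVERSE.filter((x) => !a.includes(x));
const sameSet = (a: number[], b: number[]): boolean =>
  a.length === b.length && a.every((x, i) => x === b[i]);
const isSorted = (a: number[]): boolean =>
  a.every((x, i) => i === 0 || a[i - 1] < x);

/** Runs `trials` random checks and reports whether every one passed. */
function checkAxioms(trials: number, seed: number): boolean {
  const rand = makeRandom(seed);
  const randomSet = (): number[] => UNIVERSE.filter(() => rand() < 0.5);
  for (let t = 0; t < trials; t++) {
    const a = randomSet();
    const b = randomSet();
    const ok =
      // De Morgan's laws
      sameSet(complement(union(a, b)), intersect(complement(a), complement(b))) &&
      sameSet(complement(intersect(a, b)), union(complement(a), complement(b))) &&
      // Absorption and idempotence
      sameSet(union(a, intersect(a, b)), a) &&
      sameSet(intersect(a, a), a) &&
      // Sorted-order invariant on results
      isSorted(union(a, b)) &&
      isSorted(intersect(a, b));
    if (!ok) {
      return false;
    }
  }
  return true;
}

console.log(checkAxioms(100, 42));
```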
774
+ ## Papers
775
+
776
+ The theoretical foundation is described in the following papers (available in `docs/papers/`):
777
+
778
+ 1. [A Unified Mathematical Framework for Query Algebras Across Heterogeneous Data Paradigms](docs/papers/1.%20A%20Unified%20Mathematical%20Framework%20for%20Query%20Algebras%20Across%20Heterogeneous%20Data%20Paradigms.pdf)
779
+ 2. [Extending the Unified Mathematical Framework to Support Graph Data Structures](docs/papers/2.%20Extending%20the%20Unified%20Mathematical%20Framework%20to%20Support%20Graph%20Data%20Structures.pdf)
780
+ 3. [Bayesian BM25: A Probabilistic Framework for Hybrid Text and Vector Search](docs/papers/3.%20Bayesian%20BM25%20-%20A%20Probabilistic%20Framework%20for%20Hybrid%20Text%20and%20Vector%20Search.pdf)
781
+ 4. [Bayesian Fusion as Neural Computation](docs/papers/4.%20Bayesian%20Fusion%20as%20Neural%20Computation.pdf)
782
+ 5. [Vector Scores as Likelihood Ratios: Index-Derived Bayesian Calibration for Hybrid Search](docs/papers/5.%20Vector%20Scores%20as%20Likelihood%20Ratios%20-%20Index-Derived%20Bayesian%20Calibration%20for%20Hybrid%20Search.pdf)
783
+
784
+ ## License
785
+
786
+ AGPL-3.0-only -- see [LICENSE](LICENSE).
787
+
788
+ Copyright (c) 2023-2026 Cognica, Inc.