frogql 0.1.0__tar.gz

Files changed (209)
  1. frogql-0.1.0/.github/workflows/ci.yml +73 -0
  2. frogql-0.1.0/.github/workflows/release.yml +112 -0
  3. frogql-0.1.0/.gitignore +33 -0
  4. frogql-0.1.0/ARCHITECTURE.md +197 -0
  5. frogql-0.1.0/CLAUDE.md +356 -0
  6. frogql-0.1.0/Cargo.lock +1679 -0
  7. frogql-0.1.0/Cargo.toml +64 -0
  8. frogql-0.1.0/LICENSE +21 -0
  9. frogql-0.1.0/MANUAL.md +274 -0
  10. frogql-0.1.0/PKG-INFO +112 -0
  11. frogql-0.1.0/README.md +485 -0
  12. frogql-0.1.0/bench/.gitignore +2 -0
  13. frogql-0.1.0/bench/BENCHMARK_PLAN.md +75 -0
  14. frogql-0.1.0/bench/LDBC_BENCHMARK.md +237 -0
  15. frogql-0.1.0/bench/LDBC_BENCH_PLAN.md +180 -0
  16. frogql-0.1.0/bench/TYPECHECKER_BENCHMARK.md +200 -0
  17. frogql-0.1.0/bench/cross-system/README.md +102 -0
  18. frogql-0.1.0/bench/cross-system/compare_results.py +163 -0
  19. frogql-0.1.0/bench/cross-system/gqlite/run.sh +68 -0
  20. frogql-0.1.0/bench/cross-system/graphqlite/ic2.cypher +28 -0
  21. frogql-0.1.0/bench/cross-system/graphqlite/requirements.txt +1 -0
  22. frogql-0.1.0/bench/cross-system/graphqlite/run.py +253 -0
  23. frogql-0.1.0/bench/cross-system/graphqlite/setup.py +212 -0
  24. frogql-0.1.0/bench/cross-system/run_all.sh +191 -0
  25. frogql-0.1.0/bench/ldbc-queries/ic1.toml +23 -0
  26. frogql-0.1.0/bench/ldbc-queries/ic10.toml +25 -0
  27. frogql-0.1.0/bench/ldbc-queries/ic11.toml +22 -0
  28. frogql-0.1.0/bench/ldbc-queries/ic12.toml +25 -0
  29. frogql-0.1.0/bench/ldbc-queries/ic13.toml +22 -0
  30. frogql-0.1.0/bench/ldbc-queries/ic14.toml +24 -0
  31. frogql-0.1.0/bench/ldbc-queries/ic2.toml +29 -0
  32. frogql-0.1.0/bench/ldbc-queries/ic3.toml +22 -0
  33. frogql-0.1.0/bench/ldbc-queries/ic4.toml +23 -0
  34. frogql-0.1.0/bench/ldbc-queries/ic5.toml +21 -0
  35. frogql-0.1.0/bench/ldbc-queries/ic6.toml +22 -0
  36. frogql-0.1.0/bench/ldbc-queries/ic7.toml +23 -0
  37. frogql-0.1.0/bench/ldbc-queries/ic8.toml +21 -0
  38. frogql-0.1.0/bench/ldbc-queries/ic9.toml +21 -0
  39. frogql-0.1.0/bench/queries/1-tree.gql +1 -0
  40. frogql-0.1.0/bench/queries/2-3-lollipop.gql +1 -0
  41. frogql-0.1.0/bench/queries/2-comb.gql +1 -0
  42. frogql-0.1.0/bench/queries/2-tree.gql +1 -0
  43. frogql-0.1.0/bench/queries/3-4-lollipop.gql +1 -0
  44. frogql-0.1.0/bench/queries/3-clique.gql +1 -0
  45. frogql-0.1.0/bench/queries/3-cycle.gql +1 -0
  46. frogql-0.1.0/bench/queries/3-path.gql +1 -0
  47. frogql-0.1.0/bench/queries/4-clique.gql +1 -0
  48. frogql-0.1.0/bench/queries/4-cycle.gql +1 -0
  49. frogql-0.1.0/bench/queries/4-path.gql +1 -0
  50. frogql-0.1.0/bench/scripts/csv_to_json.py +168 -0
  51. frogql-0.1.0/bench/scripts/download_livejournal.sh +27 -0
  52. frogql-0.1.0/bench/scripts/generate_queries.py +89 -0
  53. frogql-0.1.0/bench/scripts/run_bench.sh +103 -0
  54. frogql-0.1.0/docs/JOIN_STRATEGY_NOTES.md +457 -0
  55. frogql-0.1.0/docs/graph-type-catalog-plan.md +481 -0
  56. frogql-0.1.0/docs/implemented-optimizations.md +169 -0
  57. frogql-0.1.0/docs/iso-gql-gaps.md +140 -0
  58. frogql-0.1.0/docs/possible-optimizations.md +109 -0
  59. frogql-0.1.0/docs/rules.md +183 -0
  60. frogql-0.1.0/docs/storage-architecture.md +782 -0
  61. frogql-0.1.0/docs/typechecker_migration.md +546 -0
  62. frogql-0.1.0/examples/address_queries.json +237 -0
  63. frogql-0.1.0/examples/bom.gdb +0 -0
  64. frogql-0.1.0/examples/books_queries.json +362 -0
  65. frogql-0.1.0/examples/disney.gdb +0 -0
  66. frogql-0.1.0/examples/disney_queries.json +112 -0
  67. frogql-0.1.0/examples/financial_financial_management.gdb +0 -0
  68. frogql-0.1.0/examples/financial_financial_management_queries.json +4122 -0
  69. frogql-0.1.0/examples/financial_fraud_detection.gdb +0 -0
  70. frogql-0.1.0/examples/financial_fraud_detection_queries.json +3857 -0
  71. frogql-0.1.0/examples/financial_payment.gdb +0 -0
  72. frogql-0.1.0/examples/financial_payment_queries.json +3882 -0
  73. frogql-0.1.0/examples/fraud_detection.gdb +0 -0
  74. frogql-0.1.0/examples/gameofthrones.gdb +0 -0
  75. frogql-0.1.0/examples/grandstack.gdb +0 -0
  76. frogql-0.1.0/examples/hockey.gdb +0 -0
  77. frogql-0.1.0/examples/hockey_queries.json +232 -0
  78. frogql-0.1.0/examples/information_data_lineage.gdb +0 -0
  79. frogql-0.1.0/examples/information_data_lineage_queries.json +3577 -0
  80. frogql-0.1.0/examples/information_technology_identity_and_access_management.gdb +0 -0
  81. frogql-0.1.0/examples/information_technology_identity_and_access_management_queries.json +3532 -0
  82. frogql-0.1.0/examples/information_technology_iot.gdb +0 -0
  83. frogql-0.1.0/examples/information_technology_iot_queries.json +3357 -0
  84. frogql-0.1.0/examples/information_technology_it_asset_management.gdb +0 -0
  85. frogql-0.1.0/examples/information_technology_it_asset_management_queries.json +3632 -0
  86. frogql-0.1.0/examples/knowledge_general_knowledge.gdb +0 -0
  87. frogql-0.1.0/examples/knowledge_general_knowledge_queries.json +1847 -0
  88. frogql-0.1.0/examples/knowledge_graph_geography.gdb +0 -0
  89. frogql-0.1.0/examples/knowledge_graph_geography_queries.json +3212 -0
  90. frogql-0.1.0/examples/manufacturing_bombill_of_materials.gdb +0 -0
  91. frogql-0.1.0/examples/manufacturing_bombill_of_materials_queries.json +2447 -0
  92. frogql-0.1.0/examples/manufacturing_production_process.gdb +0 -0
  93. frogql-0.1.0/examples/manufacturing_production_process_queries.json +2982 -0
  94. frogql-0.1.0/examples/moivelens.gdb +0 -0
  95. frogql-0.1.0/examples/moivelens_queries.json +3262 -0
  96. frogql-0.1.0/examples/movies.gdb +0 -0
  97. frogql-0.1.0/examples/northwind.gdb +0 -0
  98. frogql-0.1.0/examples/olympics_queries.json +377 -0
  99. frogql-0.1.0/examples/soccer_2016.gdb +0 -0
  100. frogql-0.1.0/examples/soccer_2016_queries.json +262 -0
  101. frogql-0.1.0/examples/social_network_recommendation.gdb +0 -0
  102. frogql-0.1.0/examples/social_network_recommendation_queries.json +2307 -0
  103. frogql-0.1.0/examples/social_network_twitter.gdb +0 -0
  104. frogql-0.1.0/examples/social_network_twitter_queries.json +2987 -0
  105. frogql-0.1.0/examples/stackoverflow2.gdb +0 -0
  106. frogql-0.1.0/examples/student_loan.gdb +0 -0
  107. frogql-0.1.0/examples/student_loan_queries.json +487 -0
  108. frogql-0.1.0/examples/typecheck_demo.rs +20 -0
  109. frogql-0.1.0/examples/typecheck_repl_smoke.rs +146 -0
  110. frogql-0.1.0/examples/video_games_queries.json +422 -0
  111. frogql-0.1.0/examples/world.gdb +0 -0
  112. frogql-0.1.0/examples/world_queries.json +282 -0
  113. frogql-0.1.0/pyproject.toml +45 -0
  114. frogql-0.1.0/python/Cargo.toml +17 -0
  115. frogql-0.1.0/python/LICENSE +21 -0
  116. frogql-0.1.0/python/README.md +80 -0
  117. frogql-0.1.0/python/src/lib.rs +620 -0
  118. frogql-0.1.0/scripts/convert_dev_datasets.py +395 -0
  119. frogql-0.1.0/src/bin/bench_queries.rs +110 -0
  120. frogql-0.1.0/src/bin/bench_setup.rs +303 -0
  121. frogql-0.1.0/src/bin/convert_edgelist.rs +195 -0
  122. frogql-0.1.0/src/bin/frogql.rs +1517 -0
  123. frogql-0.1.0/src/bin/ldbc_bench.rs +1215 -0
  124. frogql-0.1.0/src/bin/typecheck_bench.rs +571 -0
  125. frogql-0.1.0/src/elaborate/mod.rs +192 -0
  126. frogql-0.1.0/src/lib.rs +168 -0
  127. frogql-0.1.0/src/model/csv_loader.rs +1061 -0
  128. frogql-0.1.0/src/model/graph.rs +457 -0
  129. frogql-0.1.0/src/model/graph_access.rs +91 -0
  130. frogql-0.1.0/src/model/mod.rs +4 -0
  131. frogql-0.1.0/src/model/value.rs +211 -0
  132. frogql-0.1.0/src/optimizer/existential.rs +116 -0
  133. frogql-0.1.0/src/optimizer/mod.rs +10 -0
  134. frogql-0.1.0/src/optimizer/pushdown.rs +501 -0
  135. frogql-0.1.0/src/pager/header.rs +218 -0
  136. frogql-0.1.0/src/pager/mod.rs +11 -0
  137. frogql-0.1.0/src/pager/page.rs +267 -0
  138. frogql-0.1.0/src/pager/pager.rs +451 -0
  139. frogql-0.1.0/src/parser/grammar.rs +1914 -0
  140. frogql-0.1.0/src/parser/lexer.rs +560 -0
  141. frogql-0.1.0/src/parser/mod.rs +6 -0
  142. frogql-0.1.0/src/runtime/assignment.rs +112 -0
  143. frogql-0.1.0/src/runtime/catalog.rs +247 -0
  144. frogql-0.1.0/src/runtime/engine.rs +2130 -0
  145. frogql-0.1.0/src/runtime/ltj/algorithm.rs +387 -0
  146. frogql-0.1.0/src/runtime/ltj/iterator.rs +182 -0
  147. frogql-0.1.0/src/runtime/ltj/mod.rs +5 -0
  148. frogql-0.1.0/src/runtime/ltj/pattern_extract.rs +903 -0
  149. frogql-0.1.0/src/runtime/ltj/triple_index.rs +295 -0
  150. frogql-0.1.0/src/runtime/ltj/veo.rs +46 -0
  151. frogql-0.1.0/src/runtime/mod.rs +52 -0
  152. frogql-0.1.0/src/runtime/result.rs +148 -0
  153. frogql-0.1.0/src/store/catalog_io.rs +218 -0
  154. frogql-0.1.0/src/store/disk.rs +492 -0
  155. frogql-0.1.0/src/store/disk_index.rs +489 -0
  156. frogql-0.1.0/src/store/io.rs +394 -0
  157. frogql-0.1.0/src/store/lazy.rs +931 -0
  158. frogql-0.1.0/src/store/mod.rs +8 -0
  159. frogql-0.1.0/src/store/record.rs +322 -0
  160. frogql-0.1.0/src/store/secondary_index.rs +419 -0
  161. frogql-0.1.0/src/store/string_table.rs +634 -0
  162. frogql-0.1.0/src/syntax/descriptor.rs +78 -0
  163. frogql-0.1.0/src/syntax/expr.rs +254 -0
  164. frogql-0.1.0/src/syntax/mod.rs +5 -0
  165. frogql-0.1.0/src/syntax/path_pattern.rs +102 -0
  166. frogql-0.1.0/src/syntax/query.rs +329 -0
  167. frogql-0.1.0/src/syntax/statement.rs +72 -0
  168. frogql-0.1.0/src/typing/checker.rs +929 -0
  169. frogql-0.1.0/src/typing/descriptor_type.rs +53 -0
  170. frogql-0.1.0/src/typing/format.rs +293 -0
  171. frogql-0.1.0/src/typing/inference.rs +401 -0
  172. frogql-0.1.0/src/typing/label_type.rs +207 -0
  173. frogql-0.1.0/src/typing/mod.rs +11 -0
  174. frogql-0.1.0/src/typing/path_type.rs +275 -0
  175. frogql-0.1.0/src/typing/property_type.rs +319 -0
  176. frogql-0.1.0/src/typing/simple_type.rs +367 -0
  177. frogql-0.1.0/src/typing/type_environment.rs +305 -0
  178. frogql-0.1.0/src/typing/validate.rs +337 -0
  179. frogql-0.1.0/src/typing/variable_type.rs +509 -0
  180. frogql-0.1.0/test_data/fraud.json +128 -0
  181. frogql-0.1.0/test_data/movies.json +5193 -0
  182. frogql-0.1.0/test_data/social-network.json +78 -0
  183. frogql-0.1.0/tests/aggregates_proptest.proptest-regressions +7 -0
  184. frogql-0.1.0/tests/aggregates_proptest.rs +170 -0
  185. frogql-0.1.0/tests/bench_test.rs +679 -0
  186. frogql-0.1.0/tests/coalesce_test.rs +191 -0
  187. frogql-0.1.0/tests/compile_diagnostics.rs +66 -0
  188. frogql-0.1.0/tests/count_test.rs +807 -0
  189. frogql-0.1.0/tests/elaborate_test.rs +69 -0
  190. frogql-0.1.0/tests/exists_fold_test.rs +195 -0
  191. frogql-0.1.0/tests/exists_runtime_test.rs +241 -0
  192. frogql-0.1.0/tests/float_test.rs +128 -0
  193. frogql-0.1.0/tests/graph_type_test.rs +893 -0
  194. frogql-0.1.0/tests/lattice_proptest.rs +1697 -0
  195. frogql-0.1.0/tests/list_test.rs +150 -0
  196. frogql-0.1.0/tests/multi_match_proptest.rs +198 -0
  197. frogql-0.1.0/tests/multi_match_test.rs +314 -0
  198. frogql-0.1.0/tests/null_test.rs +102 -0
  199. frogql-0.1.0/tests/optional_match_test.rs +391 -0
  200. frogql-0.1.0/tests/order_by_test.rs +406 -0
  201. frogql-0.1.0/tests/parse_and_run_test.rs +257 -0
  202. frogql-0.1.0/tests/parser_test.rs +1096 -0
  203. frogql-0.1.0/tests/record_test.rs +185 -0
  204. frogql-0.1.0/tests/runtime_test.rs +820 -0
  205. frogql-0.1.0/tests/store_runtime_test.rs +194 -0
  206. frogql-0.1.0/tests/text2gql_test.rs +258 -0
  207. frogql-0.1.0/tests/typecheck_gaps_order_by_test.rs +292 -0
  208. frogql-0.1.0/tests/typecheck_smoke.rs +71 -0
  209. frogql-0.1.0/tests/typecheck_test.rs +665 -0
@@ -0,0 +1,73 @@
+name: CI
+
+on:
+  push:
+    branches: [ main ]
+  pull_request:
+    branches: [ main ]
+
+concurrency:
+  group: ci-${{ github.ref }}
+  cancel-in-progress: true
+
+jobs:
+  fmt:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - uses: dtolnay/rust-toolchain@stable
+        with:
+          components: rustfmt
+      - run: cargo fmt --all -- --check
+
+  check:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - uses: dtolnay/rust-toolchain@stable
+      - uses: Swatinem/rust-cache@v2
+      - run: cargo check --workspace --all-targets
+
+  clippy:
+    runs-on: ubuntu-latest
+    env:
+      RUSTFLAGS: "-D warnings"
+    steps:
+      - uses: actions/checkout@v4
+      - uses: dtolnay/rust-toolchain@stable
+        with:
+          components: clippy
+      - uses: Swatinem/rust-cache@v2
+      - run: cargo clippy --workspace --all-targets -- -D clippy::all
+
+  test:
+    needs: check
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - uses: dtolnay/rust-toolchain@stable
+      - uses: Swatinem/rust-cache@v2
+      # Lib unit tests (inline `#[cfg(test)] mod tests` in src/).
+      - name: Lib tests
+        run: cargo test --workspace --lib
+      # bench_test has pre-existing failures — exclude it (see CLAUDE.md).
+      # All other integration binaries listed explicitly. Adding a new
+      # `tests/foo.rs` requires adding `--test foo` here, otherwise it's
+      # silently skipped from CI.
+      - name: Integration tests
+        run: |
+          cargo test --test parser_test \
+            --test runtime_test \
+            --test store_runtime_test \
+            --test text2gql_test \
+            --test typecheck_test \
+            --test typecheck_smoke \
+            --test elaborate_test \
+            --test float_test \
+            --test list_test \
+            --test record_test \
+            --test count_test \
+            --test parse_and_run_test \
+            --test compile_diagnostics \
+            --test graph_type_test \
+            --test lattice_proptest
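The workflow comment warns that any `tests/foo.rs` without a matching `--test foo` entry is silently skipped. The file list bears this out: `tests/null_test.rs`, for example, has no `--test null_test` entry in the workflow. A drift check along the following lines could catch such cases. This is a minimal sketch, not part of the package; the function names `listed_tests` and `missing_from_ci` are hypothetical, and the caller is assumed to supply the file names and the workflow text.

```rust
use std::collections::BTreeSet;

/// Collect every name that directly follows a `--test` flag in the given text.
fn listed_tests(workflow: &str) -> BTreeSet<String> {
    let mut out = BTreeSet::new();
    let mut tokens = workflow.split_whitespace();
    while let Some(tok) = tokens.next() {
        if tok == "--test" {
            if let Some(name) = tokens.next() {
                out.insert(name.to_string());
            }
        }
    }
    out
}

/// Return the stems of test files that are neither listed in the workflow
/// nor deliberately excluded (e.g. `bench_test`). The result is sorted.
fn missing_from_ci(test_files: &[&str], workflow: &str, excluded: &[&str]) -> Vec<String> {
    let listed = listed_tests(workflow);
    let mut missing: Vec<String> = test_files
        .iter()
        .filter_map(|f| f.strip_suffix(".rs"))
        .filter(|stem| !listed.contains(*stem) && !excluded.contains(stem))
        .map(str::to_string)
        .collect();
    missing.sort();
    missing
}
```

Calling `missing_from_ci` with the names under `tests/`, the contents of `ci.yml`, and `&["bench_test"]` as the exclusion list yields the integration tests that CI never runs.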
@@ -0,0 +1,112 @@
+name: Release
+
+# Build wheels and publish to PyPI when a tag like v0.1.0 is pushed.
+# Uses abi3-py38 so one wheel per (os, arch) supports CPython 3.8+.
+#
+# Setup: add MATURIN_PYPI_TOKEN as a repo secret (the API token from
+# python/pypi_token, never commit it).
+
+on:
+  push:
+    tags:
+      - "v*"
+  workflow_dispatch:
+
+permissions:
+  contents: read
+
+jobs:
+  linux:
+    runs-on: ubuntu-latest
+    strategy:
+      matrix:
+        target: [x86_64, aarch64]
+    steps:
+      - uses: actions/checkout@v4
+      - uses: actions/setup-python@v5
+        with:
+          python-version: "3.12"
+      - name: Build wheel (manylinux ${{ matrix.target }})
+        uses: PyO3/maturin-action@v1
+        with:
+          working-directory: python
+          target: ${{ matrix.target }}
+          args: --release --out dist
+          manylinux: auto
+      - uses: actions/upload-artifact@v4
+        with:
+          name: wheels-linux-${{ matrix.target }}
+          path: python/dist
+
+  macos:
+    runs-on: macos-14
+    strategy:
+      matrix:
+        target: [x86_64, aarch64]
+    steps:
+      - uses: actions/checkout@v4
+      - uses: actions/setup-python@v5
+        with:
+          python-version: "3.12"
+      - name: Build wheel (macOS ${{ matrix.target }})
+        uses: PyO3/maturin-action@v1
+        with:
+          working-directory: python
+          target: ${{ matrix.target }}
+          args: --release --out dist
+      - uses: actions/upload-artifact@v4
+        with:
+          name: wheels-macos-${{ matrix.target }}
+          path: python/dist
+
+  windows:
+    runs-on: windows-latest
+    steps:
+      - uses: actions/checkout@v4
+      - uses: actions/setup-python@v5
+        with:
+          python-version: "3.12"
+      - name: Build wheel (windows x86_64)
+        uses: PyO3/maturin-action@v1
+        with:
+          working-directory: python
+          target: x86_64
+          args: --release --out dist
+      - uses: actions/upload-artifact@v4
+        with:
+          name: wheels-windows-x86_64
+          path: python/dist
+
+  sdist:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - name: Build sdist
+        uses: PyO3/maturin-action@v1
+        with:
+          working-directory: python
+          command: sdist
+          args: --out dist
+      - uses: actions/upload-artifact@v4
+        with:
+          name: sdist
+          path: python/dist
+
+  release:
+    name: Publish to PyPI
+    runs-on: ubuntu-latest
+    needs: [linux, macos, windows, sdist]
+    if: startsWith(github.ref, 'refs/tags/v')
+    environment: pypi
+    steps:
+      - uses: actions/download-artifact@v4
+        with:
+          path: dist
+          merge-multiple: true
+      - name: Publish
+        uses: PyO3/maturin-action@v1
+        env:
+          MATURIN_PYPI_TOKEN: ${{ secrets.MATURIN_PYPI_TOKEN }}
+        with:
+          command: upload
+          args: --non-interactive --skip-existing dist/*
@@ -0,0 +1,33 @@
+/target
+*.gdb
+!examples/*.gdb
+__pycache__/
+/bench/data/
+# Bench artifacts (timestamped per run; regenerate via the bench
+# scripts under bench/scripts/). Never useful to commit.
+/bench/results/
+# Cross-system bench artifacts: timestamped run outputs and external-
+# system loaded data (each system imports LDBC CSVs into its native
+# format under bench/data/cross-system/<system>/).
+/bench/cross-system/results/
+/bench/data/cross-system/
+/python/.venv/
+/python/foo.py
+/python/pypi_token
+/python/.mypy_cache/
+/python/dist/
+*.swp
+# Large generated databases (re-create with scripts/convert_dev_datasets.py)
+examples/neoflix.gdb
+examples/olympics.gdb
+examples/address.gdb
+examples/recommendations.gdb
+examples/twitter.gdb
+examples/video_games.gdb
+examples/bluesky.gdb
+examples/buzzoverflow.gdb
+examples/books.gdb
+examples/ldbc-sf01.gdb
+# LDBC SNB benchmark inputs (~100 MB at SF=0.1; rebuild with `gqlite ... --import-ldbc-csv`)
+/bench/social_network-*
+latex
@@ -0,0 +1,197 @@
+# GQLite Architecture & Implementation Notes
+
+This document captures the full design and implementation state of GQLite for context continuity.
+
+## What This Is
+
+A Rust implementation of a GQL (ISO Graph Query Language) graph database with single-file storage inspired by SQLite. Built as a research prototype accompanying an academic paper on GQL path pattern matching. The Python reference implementation lives in `../pygql/`.
+
+**Repo:** `pleiad/gqlite` on GitHub.
+
+## Architecture Overview
+
+```
+┌──────────────────────────────────────────────────────────┐
+│                  gqlite::compile(query)                  │
+│    Public entry point: parse → optimize → return AST     │
+└──────────────┬───────────────────────────────────────────┘
+
+    ┌──────────┼──────────┐
+    │          │          │
+    ▼          ▼          ▼
+ Parser    Optimizer   Runtime Engine
+ (lexer +  (predicate  (evaluates AST against graph)
+ recursive pushdown)   generic over GraphAccess trait
+ descent)                 │
+
+               ┌──────────┼──────────────┐
+               │          │              │
+               ▼          ▼              ▼
+             Graph  LazyGraphStore DiskGraphStore
+       (in-memory) (lazy records) (disk indexes)
+               │          │              │
+               │     ┌────┴────┐    ┌────┴────┐
+               │     │  Pager  │    │  Pager  │
+               │     │(LRU     │    │(LRU     │
+               │     │ cache)  │    │ cache)  │
+               │     └────┬────┘    └────┬────┘
+               │          │              │
+               │      File I/O       File I/O
+               │     (.gql file)    (.gql file)
+```
+
+## Query Pipeline
+
+```
+"(x: Person)-[:Knows]->(y)"
+    → parse()       → PathPattern AST
+    → optimize()    → rewritten AST (predicate pushdown)
+    → runtime.run() → IntermediateResult { rows: Vec<ResultRow> }
+```
+
+## Module Map
+
+### `src/lib.rs`
+Entry point: `pub fn compile(query: &str) -> Result<PathPattern, String>` — parse + optimize.
+
+### `src/parser/` — GQL Parser
+- `lexer.rs` — Hand-written tokenizer. Handles compound tokens (`]->`, `<-[`, `~[`, `]~`).
+- `grammar.rs` — Recursive descent parser. Precedence: union (`|`) < concat (adjacency) < quantifiers (`{n,m}`, `*`, `+`, `?`). Expressions: logical < comparison < arithmetic < unary.
+
+### `src/optimizer/` — AST Optimization
+- `pushdown.rs` — Predicate pushdown. Extracts `x.attr is type` from WHERE AND-chains, merges into descriptors. `((x)-[y]->(z) WHERE x.a is bool) → (x:{a:bool})-[y]->(z)`. Only AND conjuncts; OR stays as filter.
+
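The pushdown rule described for `pushdown.rs` can be sketched as below. This is a minimal illustration under stated assumptions, not the package's code: the `Expr` enum here is a hypothetical three-variant stand-in for the real AST, descriptors are modelled as a plain map from variable to attribute to type name, and a repeated constraint on the same attribute simply overwrites the earlier one (a real implementation would presumably take the meet of the two types).

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified WHERE expression.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    /// `var.attr is ty`
    IsType { var: String, attr: String, ty: String },
    And(Box<Expr>, Box<Expr>),
    Or(Box<Expr>, Box<Expr>),
}

/// Constraints merged into descriptors: variable -> attribute -> type name.
type Descriptors = BTreeMap<String, BTreeMap<String, String>>;

/// Split a WHERE expression into pushed-down constraints (written to `out`)
/// and a residual filter (the return value, `None` if nothing is left).
fn push_down(expr: Expr, out: &mut Descriptors) -> Option<Expr> {
    match expr {
        Expr::IsType { var, attr, ty } => {
            out.entry(var).or_default().insert(attr, ty);
            None
        }
        Expr::And(l, r) => {
            let left = push_down(*l, out);
            let right = push_down(*r, out);
            match (left, right) {
                (None, None) => None,
                (Some(e), None) | (None, Some(e)) => Some(e),
                (Some(a), Some(b)) => Some(Expr::And(Box::new(a), Box::new(b))),
            }
        }
        // OR is not a conjunct, so it stays behind as a runtime filter.
        other @ Expr::Or(..) => Some(other),
    }
}
```

Only the AND spine is traversed, which mirrors the stated restriction: anything under an OR is returned untouched as the residual filter.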
+### `src/syntax/` — AST Types
+- `path_pattern.rs` — `PathPattern` enum: Node, EdgeRight/Left/Undirected/AnyDirection, Concat, Union, Filter, Repeat, Questioned.
+- `descriptor.rs` — `Descriptor`: optional variable name + `DescriptorType`.
+- `expr.rs` — `Expr` enum: Const, AttrLookup, Binop, Unop, Type. `BinOp`/`UnOp` enums with `delta()` for type checking.
+
+### `src/typing/` — Lattice-Based Type System
+- `simple_type.rs` — `SimpleType`: Z (int), S (str), B (bool), Star (*), Zero (⊥), Union, List. Meet/join/subtype.
+- `label_type.rs` — `LabelType`: Label, Star, Top, Empty, And, Or, Neg. Boolean algebra. `from_list()`, `is_subtype()`, `meet()`, `as_simple_label()`.
+- `property_type.rs` — `PropertyType`: Open `{}` (extra attrs OK), Closed `{{}}` (exact), Zero. Meet/subtype.
+- `descriptor_type.rs` — `DescriptorType` = LabelType + PropertyType.
+- `variable_type.rs` — `VariableType`: Node, EdgeDirectional, EdgeNonDirectional, Union, List, Zero. `Schema` struct.
+
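To make the meet/join/subtype vocabulary of `simple_type.rs` concrete, here is a small sketch of a lattice over the three base types. It is an illustration under assumptions, not the package's definition: `List` is omitted, a union is modelled as a non-empty set of bases, `Star` is treated as the top element and `Zero` as the bottom, and subtyping is derived from meet.

```rust
use std::collections::BTreeSet;

#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Base { Z, S, B } // int, str, bool

/// Simplified stand-in for `SimpleType` (no `List`).
#[derive(Debug, Clone, PartialEq, Eq)]
enum SimpleType {
    Zero,                  // bottom: no value inhabits it
    Union(BTreeSet<Base>), // invariant: the set is non-empty
    Star,                  // top: any value
}

impl SimpleType {
    /// Build a type from a list of bases; an empty list gives `Zero`.
    fn of(bases: &[Base]) -> Self {
        let set: BTreeSet<Base> = bases.iter().copied().collect();
        if set.is_empty() { SimpleType::Zero } else { SimpleType::Union(set) }
    }

    /// Greatest lower bound: values admitted by both types.
    fn meet(&self, other: &Self) -> Self {
        use SimpleType::*;
        match (self, other) {
            (Zero, _) | (_, Zero) => Zero,
            (Star, t) | (t, Star) => t.clone(),
            (Union(a), Union(b)) => {
                let set: BTreeSet<Base> = a.intersection(b).copied().collect();
                if set.is_empty() { Zero } else { Union(set) }
            }
        }
    }

    /// Least upper bound: values admitted by either type.
    fn join(&self, other: &Self) -> Self {
        use SimpleType::*;
        match (self, other) {
            (Star, _) | (_, Star) => Star,
            (Zero, t) | (t, Zero) => t.clone(),
            (Union(a), Union(b)) => Union(a.union(b).copied().collect()),
        }
    }

    /// `self <= other` in the lattice order, i.e. meeting changes nothing.
    fn is_subtype(&self, other: &Self) -> bool {
        self.meet(other) == *self
    }
}
```

Deriving `is_subtype` from `meet` keeps the two operations consistent by construction, at the cost of one extra allocation per check.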
+### `src/runtime/` — Query Execution
+- `engine.rs` — `Runtime<G: GraphAccess>`. Key optimizations:
+  - **Label-indexed scanning**: `run_node_pattern` checks for simple label, uses `nodes_with_label()` index.
+  - **Adjacency-driven concat**: `run_concat_pattern` detects edge/node on right side, uses `outgoing_edges()`/`incoming_edges()` instead of cross-product.
+  - **Hash-join fallback**: for complex right-side patterns, groups by first-node-id for O(n+m) instead of O(n×m).
+  - **Repetition hash-join**: builds hash map on grouped results once, reuses for each iteration.
+- `assignment.rs` — `Assignment`: variable→PathValue bindings. `can_unify()`, `unify()`, `fill_nones()`, `to_group()`, `concat_group()`.
+- `result.rs` — `ResultRow` (path + assignment), `IntermediateResult` (vec of rows), `ExprResult`.
+
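The hash-join fallback for concatenation can be sketched as follows. This is a simplified illustration, not the engine's code: paths are reduced to a hypothetical list of node IDs (no edges, no variable assignments to unify), and every path is assumed to contain at least one node.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified path: node IDs only.
#[derive(Debug, Clone, PartialEq)]
struct Path {
    nodes: Vec<u32>,
}

impl Path {
    fn first_node_id(&self) -> u32 {
        self.nodes[0]
    }
    fn last_node_id(&self) -> u32 {
        *self.nodes.last().expect("paths are assumed non-empty")
    }
}

/// Concatenate every left path with every right path that starts where the
/// left one ends, without comparing all n*m pairs.
fn hash_join_concat(left: &[Path], right: &[Path]) -> Vec<Path> {
    // Build phase: group the right side by first node ID, O(m).
    let mut by_first: HashMap<u32, Vec<&Path>> = HashMap::new();
    for p in right {
        by_first.entry(p.first_node_id()).or_default().push(p);
    }
    // Probe phase: one expected O(1) lookup per left path.
    let mut out = Vec::new();
    for l in left {
        if let Some(matches) = by_first.get(&l.last_node_id()) {
            for r in matches {
                let mut nodes = l.nodes.clone();
                // Skip the shared junction node so it appears only once.
                nodes.extend_from_slice(&r.nodes[1..]);
                out.push(Path { nodes });
            }
        }
    }
    out
}
```

The build and probe phases cost O(n+m) in expectation; emitting the joined paths adds time proportional to the output size. For repetition, the same `by_first` map could be built once and probed on every iteration, which is the reuse the last bullet describes.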
+### `src/model/` — Graph Data Model
+- `value.rs` — `Value` (Int/Str/Bool), `PathValue` (Node/EdgeDirectional/EdgeUndirectional/Nothing/List), `Path` (with `can_concat`, `concat`, `first_node_id`, `last_node_id`).
+- `graph.rs` — `Graph` struct: nodes, edges_d, edges_u, labels, props, endpoints, src, tgt + indexes (label_to_nodes/edges, outgoing/incoming/undirected_adj). Constructors: `from_file()` (JSON), `from_json_str()`, `from_json_value()`, `from_raw()`, `open()` (.gql), `save()`.
+- `graph_access.rs` — `GraphAccess` trait: 17 methods. Core: nodes(), edges_directed/undirected(), labels(), props(), src(), tgt(), endpoints(), is_directed(), edge_path_value(). Index-aware: nodes_with_label(), directed/undirected_edges_with_label(). Adjacency: outgoing_edges(), incoming_edges(), undirected_edges_of().
+
+### `src/store/` — Persistent Storage
+- `string_table.rs` — Deduplicated string interning on pages. `intern()` → u32, `resolve()` → &str. Multi-page with overflow. `str_to_id` HashMap is public (used by DiskGraphStore).
+- `record.rs` — Binary encode/decode for node/edge cells on slotted pages. Node: user_id_sid, label_sids, props (name_sid + typed value). Edge: node record + src_iid + tgt_iid + directed flag.
+- `io.rs` — `save_graph()` and `load_graph()`. Writes node/edge data pages + on-disk indexes (label index, adjacency, ID index). `load_graph()` reads pages and rebuilds Graph via `from_raw()`.
+- `lazy.rs` — `LazyGraphStore`: compact indexes in memory (IDs, topology), records read on demand through page cache. Uses `RefCell<Pager>` for interior mutability. `Box::leak` for returning references to lazily-loaded data.
+- `disk.rs` — `DiskGraphStore`: adds on-disk label/adjacency indexes. Reads index pages through cache. Same `Box::leak` pattern.
+- `disk_index.rs` — On-disk index format: sorted page chains. Label index (label_sid → page chain of element IDs), adjacency (node_iid → page chain of triples), ID index (sorted pairs for binary search). Page chain format: 8-byte header (type, count, next_page), then fixed-size entries.
+
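The interning contract of `string_table.rs` (`intern()` returns a `u32`, `resolve()` maps it back) can be illustrated with an in-memory sketch. It deliberately ignores the paged, multi-page on-disk layout, and it returns `Option<&str>` from `resolve()` rather than a bare `&str`; both are simplifications made here, not statements about the package.

```rust
use std::collections::HashMap;

/// In-memory stand-in for the paged string table.
#[derive(Default)]
struct StringTable {
    strings: Vec<String>,            // id -> string
    str_to_id: HashMap<String, u32>, // string -> id
}

impl StringTable {
    /// Return the existing ID for `s`, or assign the next free one.
    fn intern(&mut self, s: &str) -> u32 {
        if let Some(&id) = self.str_to_id.get(s) {
            return id;
        }
        let id = self.strings.len() as u32;
        self.strings.push(s.to_string());
        self.str_to_id.insert(s.to_string(), id);
        id
    }

    /// Map an ID back to its string, or `None` if the ID was never issued.
    fn resolve(&self, id: u32) -> Option<&str> {
        self.strings.get(id as usize).map(String::as_str)
    }
}
```

Because labels and property names repeat heavily across nodes and edges, records can store the 4-byte ID (the `*_sid` fields in `record.rs`) instead of the string itself.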
+### `src/pager/` — Page-Level I/O
+- `page.rs` — 4KB slotted pages. Header: type, cell_count, cell_area_start. Cell pointer array grows forward, cell data grows backward. `insert_cell()`, `cell_offset()`, `read_at()`, `free_space()`.
+- `header.rs` — File header (page 0): magic `GQLDB\0`, version, page_size, page_count, free_list_head, node/edge counts, root page pointers (string_table, node_data, edge_data, label_index, adjacency, edge_label_index, node_id_index, edge_id_index).
+- `pager.rs` — `Pager`: create/open database file, read/write pages through LRU cache. `allocate_page()` (reuses from LIFO free list or extends file), `free_page()`. Configurable cache size (default 2000 pages = ~8MB). Cache stats (hits/misses).
+
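The slotted-page layout in `page.rs` (pointer array growing forward, cell data growing backward) can be sketched as below. The byte widths are assumptions made for illustration: an 8-byte header region and 2-byte little-endian cell pointers. The header fields are also kept as struct fields here rather than serialized into the page, so this is not the package's on-disk layout.

```rust
const PAGE_SIZE: usize = 4096;
const HEADER_SIZE: usize = 8; // assumed width of the page header
const PTR_SIZE: usize = 2; // assumed: one u16 offset per cell

struct Page {
    data: [u8; PAGE_SIZE],
    cell_count: usize,
    cell_area_start: usize, // lowest byte offset used by cell data
}

impl Page {
    fn new() -> Self {
        Page { data: [0; PAGE_SIZE], cell_count: 0, cell_area_start: PAGE_SIZE }
    }

    /// Bytes left between the end of the pointer array and the cell area.
    fn free_space(&self) -> usize {
        self.cell_area_start - (HEADER_SIZE + self.cell_count * PTR_SIZE)
    }

    /// Store a cell at the back of the page and append a pointer to it at the
    /// front. Returns the slot index, or `None` if the cell does not fit.
    fn insert_cell(&mut self, cell: &[u8]) -> Option<usize> {
        if cell.len() + PTR_SIZE > self.free_space() {
            return None;
        }
        let start = self.cell_area_start - cell.len();
        self.data[start..self.cell_area_start].copy_from_slice(cell);
        let ptr_at = HEADER_SIZE + self.cell_count * PTR_SIZE;
        self.data[ptr_at..ptr_at + PTR_SIZE].copy_from_slice(&(start as u16).to_le_bytes());
        self.cell_area_start = start;
        self.cell_count += 1;
        Some(self.cell_count - 1)
    }

    /// Byte offset of the cell stored in `slot`, read from the pointer array.
    fn cell_offset(&self, slot: usize) -> Option<usize> {
        if slot >= self.cell_count {
            return None;
        }
        let ptr_at = HEADER_SIZE + slot * PTR_SIZE;
        Some(u16::from_le_bytes([self.data[ptr_at], self.data[ptr_at + 1]]) as usize)
    }
}
```

Growing the two regions toward each other means a page is full only when they meet, regardless of whether it holds many small cells or a few large ones.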
+## Three Storage Modes
+
+| Mode | Memory | Speed | Use Case |
+|------|--------|-------|----------|
+| `Graph` | O(graph_size) ~6 bytes/element | Fastest (HashMap lookups) | <1M elements |
+| `LazyGraphStore` | O(indexes) ~2 bytes/element | ~1.8x slower (page cache reads) | 1M-10M elements |
+| `DiskGraphStore` | O(indexes + disk_index_roots) | ~2-3x slower (disk index reads) | Same as Lazy currently |
+
+**Current limitation:** Both Lazy and Disk still hold user ID strings (`Vec<String>`) and ID-to-index HashMaps in memory because `GraphAccess` trait returns `&str`. To get true O(cache_size) memory, the trait would need to use u32 internal IDs everywhere with string resolution only at result formatting time.
+
+## On-Disk File Format (.gql)
+
+```
+Page 0: File Header
+  Magic: "GQLDB\0"
+  Version: 1
+  Page size: 4096
+  Page count, free list head
+  Root pointers: string_table, node_data, edge_data,
+                 label_index, adjacency, edge_label_index,
+                 node_id_index, edge_id_index
+
+Pages 1+:
+  StringTable pages (type=5): length-prefixed UTF-8 strings
+  NodeData pages (type=1): slotted pages with encoded node cells
+  EdgeData pages (type=2): slotted pages with encoded edge cells
+  LabelIndex pages (type=4): sorted page chains
+  Adjacency pages (type=3): triple page chains
+  Free pages (type=7): linked list via first 4 bytes
+```
+
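The page-type tags and the magic string listed above can be mirrored in a small sketch. Only the values shown in the format listing are used; the names `PageType::from_tag` and `has_valid_magic` are hypothetical, and the byte offsets of the remaining header fields are not stated in this document, so they are left out rather than guessed.

```rust
/// Magic bytes at the start of page 0, as given in the format listing.
const MAGIC: &[u8; 6] = b"GQLDB\0";

/// Page-type tags from the format listing. Tag 6 does not appear there,
/// so it is treated as unknown here.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PageType {
    NodeData = 1,
    EdgeData = 2,
    Adjacency = 3,
    LabelIndex = 4,
    StringTable = 5,
    Free = 7,
}

impl PageType {
    /// Decode a raw tag byte; unknown tags yield `None`.
    fn from_tag(tag: u8) -> Option<Self> {
        match tag {
            1 => Some(PageType::NodeData),
            2 => Some(PageType::EdgeData),
            3 => Some(PageType::Adjacency),
            4 => Some(PageType::LabelIndex),
            5 => Some(PageType::StringTable),
            7 => Some(PageType::Free),
            _ => None,
        }
    }
}

/// Check that a buffer holding page 0 begins with the expected magic bytes.
fn has_valid_magic(page0: &[u8]) -> bool {
    page0.starts_with(MAGIC)
}
```

Rejecting unknown tags and a wrong magic up front lets a reader fail fast on a file that is not a database of this format, before any page chain is followed.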
+## Key Design Decisions
+
+1. **Enums not trait objects** for the type system. Python uses isinstance(); Rust enums with match are the direct equivalent.
+
+2. **GraphAccess trait** for storage abstraction. Runtime is `Runtime<G: GraphAccess>` — same query code for all backends.
+
+3. **`Box::leak` for lazy stores**. The trait returns `&LabelType` and `&Props` but lazy stores compute these on the fly. Leaking is bounded by query result size.
+
+4. **Python mutation side-effect in repetition**. Python's `to_group()` mutates assignments in-place, affecting the original `ir`. In Rust, we clone and use the grouped version for both `res` and the hash map. This was a subtle porting bug.
+
+5. **Hash-join over sort-merge**. For concat and repetition, grouping by first-node-id in a HashMap gives O(n+m) expected. The hash map for repetition is built once and reused across iterations.
+
+6. **Predicate pushdown as compilation phase**. `compile()` = parse + optimize. The optimizer rewrites the AST before the runtime ever sees it.
+
+7. **Label syntax: `:` prefix required**. `-[:Transfer]->` not `-[Transfer]->`. Without `:`, `Transfer` is parsed as a variable name, not a label. This matches the Python Lark grammar.
+
+## Test Coverage
+
+189 tests across 6 test files:
+- `tests/parser_test.rs` (31) — AST structure from parsed queries
+- `tests/runtime_test.rs` (26) — hand-built AST execution
+- `tests/parse_and_run_test.rs` (41) — end-to-end: string → compile → run
+- `tests/store_runtime_test.rs` (31) — save → reopen → run
+- `tests/bench_test.rs` (4) — benchmarks with memory tracking
+- `src/` inline tests (56) — unit tests for all modules
+
+## Benchmarks (10K nodes, 55K edges, release mode)
+
+| Query | Graph | Lazy | Disk |
+|-------|-------|------|------|
+| Label scan: Person | 1.6ms | 4.2ms | 3.2ms |
+| 1-hop traversal | 13.5ms | 28.2ms | 58.4ms |
+| 2-hop chain | 24.2ms | 44.9ms | 113.7ms |
+| Repeat {1,2} | 24.0ms | 35.7ms | 33.8ms |
+
+Memory at 100K nodes / 550K edges: Graph=603 MB, Lazy=212 MB, Disk=169 MB.
+
+## What's NOT Implemented
+
+- **Typechecker** (Phase 3 from original plan) — schema inference + type checking. Port of `pygql/gql/typechecker.py`. Skipped as optional.
+- **CLI binary** — no `main.rs`. Library only.
+- **True O(cache_size) DiskGraphStore** — needs `GraphAccess` redesign to use u32 internal IDs.
+- **Cost-based query planning** — choosing join order by selectivity.
+- **Write-back page cache** — current is write-through.
+- **Transactions / WAL** — not needed for read-heavy research workload.
+- **Unbounded repetition** (`*`, `+`) at runtime — only bounded `{lb, ub}`.
+
+## Relationship to Python Implementation
+
+The Rust runtime produces identical results to `pygql/` for all queries. Tests were ported from:
+- `pygql/test/runtime_test.py` → `tests/runtime_test.rs` + `tests/parse_and_run_test.rs`
+- `pygql/test/parser_test.py` → `tests/parser_test.rs`
+
+Test databases (`test_data/fraud.json`, `test_data/social-network.json`) are copies of `pygql/dbs/`.
+
+## Building and Testing
+
+```bash
+cargo build --release
+cargo test  # all 189 tests
+cargo test --test bench_test --release -- --nocapture  # benchmarks
+cargo test --test bench_test bench_graph_vs_lazy --release -- --nocapture  # 3-way comparison
+cargo test --test bench_test bench_memory_scaling --release -- --nocapture  # memory scaling
+```