frogql 0.1.0__tar.gz
- frogql-0.1.0/.github/workflows/ci.yml +73 -0
- frogql-0.1.0/.github/workflows/release.yml +112 -0
- frogql-0.1.0/.gitignore +33 -0
- frogql-0.1.0/ARCHITECTURE.md +197 -0
- frogql-0.1.0/CLAUDE.md +356 -0
- frogql-0.1.0/Cargo.lock +1679 -0
- frogql-0.1.0/Cargo.toml +64 -0
- frogql-0.1.0/LICENSE +21 -0
- frogql-0.1.0/MANUAL.md +274 -0
- frogql-0.1.0/PKG-INFO +112 -0
- frogql-0.1.0/README.md +485 -0
- frogql-0.1.0/bench/.gitignore +2 -0
- frogql-0.1.0/bench/BENCHMARK_PLAN.md +75 -0
- frogql-0.1.0/bench/LDBC_BENCHMARK.md +237 -0
- frogql-0.1.0/bench/LDBC_BENCH_PLAN.md +180 -0
- frogql-0.1.0/bench/TYPECHECKER_BENCHMARK.md +200 -0
- frogql-0.1.0/bench/cross-system/README.md +102 -0
- frogql-0.1.0/bench/cross-system/compare_results.py +163 -0
- frogql-0.1.0/bench/cross-system/gqlite/run.sh +68 -0
- frogql-0.1.0/bench/cross-system/graphqlite/ic2.cypher +28 -0
- frogql-0.1.0/bench/cross-system/graphqlite/requirements.txt +1 -0
- frogql-0.1.0/bench/cross-system/graphqlite/run.py +253 -0
- frogql-0.1.0/bench/cross-system/graphqlite/setup.py +212 -0
- frogql-0.1.0/bench/cross-system/run_all.sh +191 -0
- frogql-0.1.0/bench/ldbc-queries/ic1.toml +23 -0
- frogql-0.1.0/bench/ldbc-queries/ic10.toml +25 -0
- frogql-0.1.0/bench/ldbc-queries/ic11.toml +22 -0
- frogql-0.1.0/bench/ldbc-queries/ic12.toml +25 -0
- frogql-0.1.0/bench/ldbc-queries/ic13.toml +22 -0
- frogql-0.1.0/bench/ldbc-queries/ic14.toml +24 -0
- frogql-0.1.0/bench/ldbc-queries/ic2.toml +29 -0
- frogql-0.1.0/bench/ldbc-queries/ic3.toml +22 -0
- frogql-0.1.0/bench/ldbc-queries/ic4.toml +23 -0
- frogql-0.1.0/bench/ldbc-queries/ic5.toml +21 -0
- frogql-0.1.0/bench/ldbc-queries/ic6.toml +22 -0
- frogql-0.1.0/bench/ldbc-queries/ic7.toml +23 -0
- frogql-0.1.0/bench/ldbc-queries/ic8.toml +21 -0
- frogql-0.1.0/bench/ldbc-queries/ic9.toml +21 -0
- frogql-0.1.0/bench/queries/1-tree.gql +1 -0
- frogql-0.1.0/bench/queries/2-3-lollipop.gql +1 -0
- frogql-0.1.0/bench/queries/2-comb.gql +1 -0
- frogql-0.1.0/bench/queries/2-tree.gql +1 -0
- frogql-0.1.0/bench/queries/3-4-lollipop.gql +1 -0
- frogql-0.1.0/bench/queries/3-clique.gql +1 -0
- frogql-0.1.0/bench/queries/3-cycle.gql +1 -0
- frogql-0.1.0/bench/queries/3-path.gql +1 -0
- frogql-0.1.0/bench/queries/4-clique.gql +1 -0
- frogql-0.1.0/bench/queries/4-cycle.gql +1 -0
- frogql-0.1.0/bench/queries/4-path.gql +1 -0
- frogql-0.1.0/bench/scripts/csv_to_json.py +168 -0
- frogql-0.1.0/bench/scripts/download_livejournal.sh +27 -0
- frogql-0.1.0/bench/scripts/generate_queries.py +89 -0
- frogql-0.1.0/bench/scripts/run_bench.sh +103 -0
- frogql-0.1.0/docs/JOIN_STRATEGY_NOTES.md +457 -0
- frogql-0.1.0/docs/graph-type-catalog-plan.md +481 -0
- frogql-0.1.0/docs/implemented-optimizations.md +169 -0
- frogql-0.1.0/docs/iso-gql-gaps.md +140 -0
- frogql-0.1.0/docs/possible-optimizations.md +109 -0
- frogql-0.1.0/docs/rules.md +183 -0
- frogql-0.1.0/docs/storage-architecture.md +782 -0
- frogql-0.1.0/docs/typechecker_migration.md +546 -0
- frogql-0.1.0/examples/address_queries.json +237 -0
- frogql-0.1.0/examples/bom.gdb +0 -0
- frogql-0.1.0/examples/books_queries.json +362 -0
- frogql-0.1.0/examples/disney.gdb +0 -0
- frogql-0.1.0/examples/disney_queries.json +112 -0
- frogql-0.1.0/examples/financial_financial_management.gdb +0 -0
- frogql-0.1.0/examples/financial_financial_management_queries.json +4122 -0
- frogql-0.1.0/examples/financial_fraud_detection.gdb +0 -0
- frogql-0.1.0/examples/financial_fraud_detection_queries.json +3857 -0
- frogql-0.1.0/examples/financial_payment.gdb +0 -0
- frogql-0.1.0/examples/financial_payment_queries.json +3882 -0
- frogql-0.1.0/examples/fraud_detection.gdb +0 -0
- frogql-0.1.0/examples/gameofthrones.gdb +0 -0
- frogql-0.1.0/examples/grandstack.gdb +0 -0
- frogql-0.1.0/examples/hockey.gdb +0 -0
- frogql-0.1.0/examples/hockey_queries.json +232 -0
- frogql-0.1.0/examples/information_data_lineage.gdb +0 -0
- frogql-0.1.0/examples/information_data_lineage_queries.json +3577 -0
- frogql-0.1.0/examples/information_technology_identity_and_access_management.gdb +0 -0
- frogql-0.1.0/examples/information_technology_identity_and_access_management_queries.json +3532 -0
- frogql-0.1.0/examples/information_technology_iot.gdb +0 -0
- frogql-0.1.0/examples/information_technology_iot_queries.json +3357 -0
- frogql-0.1.0/examples/information_technology_it_asset_management.gdb +0 -0
- frogql-0.1.0/examples/information_technology_it_asset_management_queries.json +3632 -0
- frogql-0.1.0/examples/knowledge_general_knowledge.gdb +0 -0
- frogql-0.1.0/examples/knowledge_general_knowledge_queries.json +1847 -0
- frogql-0.1.0/examples/knowledge_graph_geography.gdb +0 -0
- frogql-0.1.0/examples/knowledge_graph_geography_queries.json +3212 -0
- frogql-0.1.0/examples/manufacturing_bombill_of_materials.gdb +0 -0
- frogql-0.1.0/examples/manufacturing_bombill_of_materials_queries.json +2447 -0
- frogql-0.1.0/examples/manufacturing_production_process.gdb +0 -0
- frogql-0.1.0/examples/manufacturing_production_process_queries.json +2982 -0
- frogql-0.1.0/examples/moivelens.gdb +0 -0
- frogql-0.1.0/examples/moivelens_queries.json +3262 -0
- frogql-0.1.0/examples/movies.gdb +0 -0
- frogql-0.1.0/examples/northwind.gdb +0 -0
- frogql-0.1.0/examples/olympics_queries.json +377 -0
- frogql-0.1.0/examples/soccer_2016.gdb +0 -0
- frogql-0.1.0/examples/soccer_2016_queries.json +262 -0
- frogql-0.1.0/examples/social_network_recommendation.gdb +0 -0
- frogql-0.1.0/examples/social_network_recommendation_queries.json +2307 -0
- frogql-0.1.0/examples/social_network_twitter.gdb +0 -0
- frogql-0.1.0/examples/social_network_twitter_queries.json +2987 -0
- frogql-0.1.0/examples/stackoverflow2.gdb +0 -0
- frogql-0.1.0/examples/student_loan.gdb +0 -0
- frogql-0.1.0/examples/student_loan_queries.json +487 -0
- frogql-0.1.0/examples/typecheck_demo.rs +20 -0
- frogql-0.1.0/examples/typecheck_repl_smoke.rs +146 -0
- frogql-0.1.0/examples/video_games_queries.json +422 -0
- frogql-0.1.0/examples/world.gdb +0 -0
- frogql-0.1.0/examples/world_queries.json +282 -0
- frogql-0.1.0/pyproject.toml +45 -0
- frogql-0.1.0/python/Cargo.toml +17 -0
- frogql-0.1.0/python/LICENSE +21 -0
- frogql-0.1.0/python/README.md +80 -0
- frogql-0.1.0/python/src/lib.rs +620 -0
- frogql-0.1.0/scripts/convert_dev_datasets.py +395 -0
- frogql-0.1.0/src/bin/bench_queries.rs +110 -0
- frogql-0.1.0/src/bin/bench_setup.rs +303 -0
- frogql-0.1.0/src/bin/convert_edgelist.rs +195 -0
- frogql-0.1.0/src/bin/frogql.rs +1517 -0
- frogql-0.1.0/src/bin/ldbc_bench.rs +1215 -0
- frogql-0.1.0/src/bin/typecheck_bench.rs +571 -0
- frogql-0.1.0/src/elaborate/mod.rs +192 -0
- frogql-0.1.0/src/lib.rs +168 -0
- frogql-0.1.0/src/model/csv_loader.rs +1061 -0
- frogql-0.1.0/src/model/graph.rs +457 -0
- frogql-0.1.0/src/model/graph_access.rs +91 -0
- frogql-0.1.0/src/model/mod.rs +4 -0
- frogql-0.1.0/src/model/value.rs +211 -0
- frogql-0.1.0/src/optimizer/existential.rs +116 -0
- frogql-0.1.0/src/optimizer/mod.rs +10 -0
- frogql-0.1.0/src/optimizer/pushdown.rs +501 -0
- frogql-0.1.0/src/pager/header.rs +218 -0
- frogql-0.1.0/src/pager/mod.rs +11 -0
- frogql-0.1.0/src/pager/page.rs +267 -0
- frogql-0.1.0/src/pager/pager.rs +451 -0
- frogql-0.1.0/src/parser/grammar.rs +1914 -0
- frogql-0.1.0/src/parser/lexer.rs +560 -0
- frogql-0.1.0/src/parser/mod.rs +6 -0
- frogql-0.1.0/src/runtime/assignment.rs +112 -0
- frogql-0.1.0/src/runtime/catalog.rs +247 -0
- frogql-0.1.0/src/runtime/engine.rs +2130 -0
- frogql-0.1.0/src/runtime/ltj/algorithm.rs +387 -0
- frogql-0.1.0/src/runtime/ltj/iterator.rs +182 -0
- frogql-0.1.0/src/runtime/ltj/mod.rs +5 -0
- frogql-0.1.0/src/runtime/ltj/pattern_extract.rs +903 -0
- frogql-0.1.0/src/runtime/ltj/triple_index.rs +295 -0
- frogql-0.1.0/src/runtime/ltj/veo.rs +46 -0
- frogql-0.1.0/src/runtime/mod.rs +52 -0
- frogql-0.1.0/src/runtime/result.rs +148 -0
- frogql-0.1.0/src/store/catalog_io.rs +218 -0
- frogql-0.1.0/src/store/disk.rs +492 -0
- frogql-0.1.0/src/store/disk_index.rs +489 -0
- frogql-0.1.0/src/store/io.rs +394 -0
- frogql-0.1.0/src/store/lazy.rs +931 -0
- frogql-0.1.0/src/store/mod.rs +8 -0
- frogql-0.1.0/src/store/record.rs +322 -0
- frogql-0.1.0/src/store/secondary_index.rs +419 -0
- frogql-0.1.0/src/store/string_table.rs +634 -0
- frogql-0.1.0/src/syntax/descriptor.rs +78 -0
- frogql-0.1.0/src/syntax/expr.rs +254 -0
- frogql-0.1.0/src/syntax/mod.rs +5 -0
- frogql-0.1.0/src/syntax/path_pattern.rs +102 -0
- frogql-0.1.0/src/syntax/query.rs +329 -0
- frogql-0.1.0/src/syntax/statement.rs +72 -0
- frogql-0.1.0/src/typing/checker.rs +929 -0
- frogql-0.1.0/src/typing/descriptor_type.rs +53 -0
- frogql-0.1.0/src/typing/format.rs +293 -0
- frogql-0.1.0/src/typing/inference.rs +401 -0
- frogql-0.1.0/src/typing/label_type.rs +207 -0
- frogql-0.1.0/src/typing/mod.rs +11 -0
- frogql-0.1.0/src/typing/path_type.rs +275 -0
- frogql-0.1.0/src/typing/property_type.rs +319 -0
- frogql-0.1.0/src/typing/simple_type.rs +367 -0
- frogql-0.1.0/src/typing/type_environment.rs +305 -0
- frogql-0.1.0/src/typing/validate.rs +337 -0
- frogql-0.1.0/src/typing/variable_type.rs +509 -0
- frogql-0.1.0/test_data/fraud.json +128 -0
- frogql-0.1.0/test_data/movies.json +5193 -0
- frogql-0.1.0/test_data/social-network.json +78 -0
- frogql-0.1.0/tests/aggregates_proptest.proptest-regressions +7 -0
- frogql-0.1.0/tests/aggregates_proptest.rs +170 -0
- frogql-0.1.0/tests/bench_test.rs +679 -0
- frogql-0.1.0/tests/coalesce_test.rs +191 -0
- frogql-0.1.0/tests/compile_diagnostics.rs +66 -0
- frogql-0.1.0/tests/count_test.rs +807 -0
- frogql-0.1.0/tests/elaborate_test.rs +69 -0
- frogql-0.1.0/tests/exists_fold_test.rs +195 -0
- frogql-0.1.0/tests/exists_runtime_test.rs +241 -0
- frogql-0.1.0/tests/float_test.rs +128 -0
- frogql-0.1.0/tests/graph_type_test.rs +893 -0
- frogql-0.1.0/tests/lattice_proptest.rs +1697 -0
- frogql-0.1.0/tests/list_test.rs +150 -0
- frogql-0.1.0/tests/multi_match_proptest.rs +198 -0
- frogql-0.1.0/tests/multi_match_test.rs +314 -0
- frogql-0.1.0/tests/null_test.rs +102 -0
- frogql-0.1.0/tests/optional_match_test.rs +391 -0
- frogql-0.1.0/tests/order_by_test.rs +406 -0
- frogql-0.1.0/tests/parse_and_run_test.rs +257 -0
- frogql-0.1.0/tests/parser_test.rs +1096 -0
- frogql-0.1.0/tests/record_test.rs +185 -0
- frogql-0.1.0/tests/runtime_test.rs +820 -0
- frogql-0.1.0/tests/store_runtime_test.rs +194 -0
- frogql-0.1.0/tests/text2gql_test.rs +258 -0
- frogql-0.1.0/tests/typecheck_gaps_order_by_test.rs +292 -0
- frogql-0.1.0/tests/typecheck_smoke.rs +71 -0
- frogql-0.1.0/tests/typecheck_test.rs +665 -0
frogql-0.1.0/.github/workflows/ci.yml
ADDED
@@ -0,0 +1,73 @@
+name: CI
+
+on:
+  push:
+    branches: [ main ]
+  pull_request:
+    branches: [ main ]
+
+concurrency:
+  group: ci-${{ github.ref }}
+  cancel-in-progress: true
+
+jobs:
+  fmt:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - uses: dtolnay/rust-toolchain@stable
+        with:
+          components: rustfmt
+      - run: cargo fmt --all -- --check
+
+  check:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - uses: dtolnay/rust-toolchain@stable
+      - uses: Swatinem/rust-cache@v2
+      - run: cargo check --workspace --all-targets
+
+  clippy:
+    runs-on: ubuntu-latest
+    env:
+      RUSTFLAGS: "-D warnings"
+    steps:
+      - uses: actions/checkout@v4
+      - uses: dtolnay/rust-toolchain@stable
+        with:
+          components: clippy
+      - uses: Swatinem/rust-cache@v2
+      - run: cargo clippy --workspace --all-targets -- -D clippy::all
+
+  test:
+    needs: check
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - uses: dtolnay/rust-toolchain@stable
+      - uses: Swatinem/rust-cache@v2
+      # Lib unit tests (inline `#[cfg(test)] mod tests` in src/).
+      - name: Lib tests
+        run: cargo test --workspace --lib
+      # bench_test has pre-existing failures — exclude it (see CLAUDE.md).
+      # All other integration binaries listed explicitly. Adding a new
+      # `tests/foo.rs` requires adding `--test foo` here, otherwise it's
+      # silently skipped from CI.
+      - name: Integration tests
+        run: |
+          cargo test --test parser_test \
+            --test runtime_test \
+            --test store_runtime_test \
+            --test text2gql_test \
+            --test typecheck_test \
+            --test typecheck_smoke \
+            --test elaborate_test \
+            --test float_test \
+            --test list_test \
+            --test record_test \
+            --test count_test \
+            --test parse_and_run_test \
+            --test compile_diagnostics \
+            --test graph_type_test \
+            --test lattice_proptest
frogql-0.1.0/.github/workflows/release.yml
ADDED
@@ -0,0 +1,112 @@
+name: Release
+
+# Build wheels and publish to PyPI when a tag like v0.1.0 is pushed.
+# Uses abi3-py38 so one wheel per (os, arch) supports CPython 3.8+.
+#
+# Setup: add MATURIN_PYPI_TOKEN as a repo secret (the API token from
+# python/pypi_token, never commit it).
+
+on:
+  push:
+    tags:
+      - "v*"
+  workflow_dispatch:
+
+permissions:
+  contents: read
+
+jobs:
+  linux:
+    runs-on: ubuntu-latest
+    strategy:
+      matrix:
+        target: [x86_64, aarch64]
+    steps:
+      - uses: actions/checkout@v4
+      - uses: actions/setup-python@v5
+        with:
+          python-version: "3.12"
+      - name: Build wheel (manylinux ${{ matrix.target }})
+        uses: PyO3/maturin-action@v1
+        with:
+          working-directory: python
+          target: ${{ matrix.target }}
+          args: --release --out dist
+          manylinux: auto
+      - uses: actions/upload-artifact@v4
+        with:
+          name: wheels-linux-${{ matrix.target }}
+          path: python/dist
+
+  macos:
+    runs-on: macos-14
+    strategy:
+      matrix:
+        target: [x86_64, aarch64]
+    steps:
+      - uses: actions/checkout@v4
+      - uses: actions/setup-python@v5
+        with:
+          python-version: "3.12"
+      - name: Build wheel (macOS ${{ matrix.target }})
+        uses: PyO3/maturin-action@v1
+        with:
+          working-directory: python
+          target: ${{ matrix.target }}
+          args: --release --out dist
+      - uses: actions/upload-artifact@v4
+        with:
+          name: wheels-macos-${{ matrix.target }}
+          path: python/dist
+
+  windows:
+    runs-on: windows-latest
+    steps:
+      - uses: actions/checkout@v4
+      - uses: actions/setup-python@v5
+        with:
+          python-version: "3.12"
+      - name: Build wheel (windows x86_64)
+        uses: PyO3/maturin-action@v1
+        with:
+          working-directory: python
+          target: x86_64
+          args: --release --out dist
+      - uses: actions/upload-artifact@v4
+        with:
+          name: wheels-windows-x86_64
+          path: python/dist
+
+  sdist:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - name: Build sdist
+        uses: PyO3/maturin-action@v1
+        with:
+          working-directory: python
+          command: sdist
+          args: --out dist
+      - uses: actions/upload-artifact@v4
+        with:
+          name: sdist
+          path: python/dist
+
+  release:
+    name: Publish to PyPI
+    runs-on: ubuntu-latest
+    needs: [linux, macos, windows, sdist]
+    if: startsWith(github.ref, 'refs/tags/v')
+    environment: pypi
+    steps:
+      - uses: actions/download-artifact@v4
+        with:
+          path: dist
+          merge-multiple: true
+      - name: Publish
+        uses: PyO3/maturin-action@v1
+        env:
+          MATURIN_PYPI_TOKEN: ${{ secrets.MATURIN_PYPI_TOKEN }}
+        with:
+          command: upload
+          args: --non-interactive --skip-existing dist/*
frogql-0.1.0/.gitignore
ADDED
@@ -0,0 +1,33 @@
+/target
+*.gdb
+!examples/*.gdb
+__pycache__/
+/bench/data/
+# Bench artifacts (timestamped per run; regenerate via the bench
+# scripts under bench/scripts/). Never useful to commit.
+/bench/results/
+# Cross-system bench artifacts: timestamped run outputs and external-
+# system loaded data (each system imports LDBC CSVs into its native
+# format under bench/data/cross-system/<system>/).
+/bench/cross-system/results/
+/bench/data/cross-system/
+/python/.venv/
+/python/foo.py
+/python/pypi_token
+/python/.mypy_cache/
+/python/dist/
+*.swp
+# Large generated databases (re-create with scripts/convert_dev_datasets.py)
+examples/neoflix.gdb
+examples/olympics.gdb
+examples/address.gdb
+examples/recommendations.gdb
+examples/twitter.gdb
+examples/video_games.gdb
+examples/bluesky.gdb
+examples/buzzoverflow.gdb
+examples/books.gdb
+examples/ldbc-sf01.gdb
+# LDBC SNB benchmark inputs (~100 MB at SF=0.1; rebuild with `gqlite ... --import-ldbc-csv`)
+/bench/social_network-*
+latex
frogql-0.1.0/ARCHITECTURE.md
ADDED
@@ -0,0 +1,197 @@
+# GQLite Architecture & Implementation Notes
+
+This document captures the full design and implementation state of GQLite for context continuity.
+
+## What This Is
+
+A Rust implementation of a GQL (ISO Graph Query Language) graph database with single-file storage inspired by SQLite. Built as a research prototype accompanying an academic paper on GQL path pattern matching. The Python reference implementation lives in `../pygql/`.
+
+**Repo:** `pleiad/gqlite` on GitHub.
+
+## Architecture Overview
+
+```
+┌──────────────────────────────────────────────────────────┐
+│                  gqlite::compile(query)                  │
+│    Public entry point: parse → optimize → return AST     │
+└──────────────┬───────────────────────────────────────────┘
+               │
+    ┌──────────┼──────────┐
+    │          │          │
+    ▼          ▼          ▼
+ Parser    Optimizer    Runtime Engine
+(lexer +   (predicate   (evaluates AST against graph)
+ recursive  pushdown)    generic over GraphAccess trait
+ descent)                         │
+                                  │
+                       ┌──────────┼──────────────┐
+                       │          │              │
+                       ▼          ▼              ▼
+                     Graph LazyGraphStore DiskGraphStore
+               (in-memory) (lazy records) (disk indexes)
+                       │          │              │
+                       │     ┌────┴────┐    ┌────┴────┐
+                       │     │  Pager  │    │  Pager  │
+                       │     │(LRU     │    │(LRU     │
+                       │     │ cache)  │    │ cache)  │
+                       │     └────┬────┘    └────┬────┘
+                       │          │              │
+                       │      File I/O       File I/O
+                       │     (.gql file)    (.gql file)
+```
+
+## Query Pipeline
+
+```
+"(x: Person)-[:Knows]->(y)"
+  → parse()        → PathPattern AST
+  → optimize()     → rewritten AST (predicate pushdown)
+  → runtime.run()  → IntermediateResult { rows: Vec<ResultRow> }
+```
+
+## Module Map
+
+### `src/lib.rs`
+Entry point: `pub fn compile(query: &str) -> Result<PathPattern, String>` — parse + optimize.
+
+### `src/parser/` — GQL Parser
+- `lexer.rs` — Hand-written tokenizer. Handles compound tokens (`]->`, `<-[`, `~[`, `]~`).
+- `grammar.rs` — Recursive descent parser. Precedence: union (`|`) < concat (adjacency) < quantifiers (`{n,m}`, `*`, `+`, `?`). Expressions: logical < comparison < arithmetic < unary.
+
+### `src/optimizer/` — AST Optimization
+- `pushdown.rs` — Predicate pushdown. Extracts `x.attr is type` from WHERE AND-chains, merges into descriptors. `((x)-[y]->(z) WHERE x.a is bool) → (x:{a:bool})-[y]->(z)`. Only AND conjuncts; OR stays as filter.
+
+### `src/syntax/` — AST Types
+- `path_pattern.rs` — `PathPattern` enum: Node, EdgeRight/Left/Undirected/AnyDirection, Concat, Union, Filter, Repeat, Questioned.
+- `descriptor.rs` — `Descriptor`: optional variable name + `DescriptorType`.
+- `expr.rs` — `Expr` enum: Const, AttrLookup, Binop, Unop, Type. `BinOp`/`UnOp` enums with `delta()` for type checking.
+
+### `src/typing/` — Lattice-Based Type System
+- `simple_type.rs` — `SimpleType`: Z (int), S (str), B (bool), Star (*), Zero (⊥), Union, List. Meet/join/subtype.
+- `label_type.rs` — `LabelType`: Label, Star, Top, Empty, And, Or, Neg. Boolean algebra. `from_list()`, `is_subtype()`, `meet()`, `as_simple_label()`.
+- `property_type.rs` — `PropertyType`: Open `{}` (extra attrs OK), Closed `{{}}` (exact), Zero. Meet/subtype.
+- `descriptor_type.rs` — `DescriptorType` = LabelType + PropertyType.
+- `variable_type.rs` — `VariableType`: Node, EdgeDirectional, EdgeNonDirectional, Union, List, Zero. `Schema` struct.
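Reviewer sketch (not part of the diff): the meet/join/subtype operations listed for `simple_type.rs` can be illustrated on a cut-down lattice. The enum below is an assumption made for illustration only. It keeps just the five atomic variants named above and drops `Union` and `List`, so two unrelated base types can only join to `Star` here, whereas the real type presumably builds a union instead.

```rust
/// Hypothetical, cut-down lattice: Zero <= {Z, S, B} <= Star.
/// `Union` and `List` from the notes above are intentionally left out.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SimpleType {
    Z,    // integers
    S,    // strings
    B,    // booleans
    Star, // top: any value
    Zero, // bottom: no value
}

impl SimpleType {
    /// `self <= other` in the lattice order.
    pub fn is_subtype(self, other: SimpleType) -> bool {
        self == other || self == SimpleType::Zero || other == SimpleType::Star
    }

    /// Greatest lower bound of the two types.
    pub fn meet(self, other: SimpleType) -> SimpleType {
        if self.is_subtype(other) {
            self
        } else if other.is_subtype(self) {
            other
        } else {
            SimpleType::Zero
        }
    }

    /// Least upper bound of the two types (in this union-free sketch).
    pub fn join(self, other: SimpleType) -> SimpleType {
        if self.is_subtype(other) {
            other
        } else if other.is_subtype(self) {
            self
        } else {
            SimpleType::Star
        }
    }
}
```

In this sketch `SimpleType::Z.meet(SimpleType::S)` falls to `Zero`, which is the property a typechecker can use to reject a pattern whose constraints cannot be satisfied together.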
+
+### `src/runtime/` — Query Execution
+- `engine.rs` — `Runtime<G: GraphAccess>`. Key optimizations:
+  - **Label-indexed scanning**: `run_node_pattern` checks for simple label, uses `nodes_with_label()` index.
+  - **Adjacency-driven concat**: `run_concat_pattern` detects edge/node on right side, uses `outgoing_edges()`/`incoming_edges()` instead of cross-product.
+  - **Hash-join fallback**: for complex right-side patterns, groups by first-node-id for O(n+m) instead of O(n×m).
+  - **Repetition hash-join**: builds hash map on grouped results once, reuses for each iteration.
+- `assignment.rs` — `Assignment`: variable→PathValue bindings. `can_unify()`, `unify()`, `fill_nones()`, `to_group()`, `concat_group()`.
+- `result.rs` — `ResultRow` (path + assignment), `IntermediateResult` (vec of rows), `ExprResult`.
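Reviewer sketch (not part of the diff): the hash-join fallback described for `engine.rs` groups the right-hand results by first node id and probes that map once per left-hand result. The code below is a minimal illustration under stated assumptions. `Path` as a plain vector of node ids and the function name `hash_join_concat` are hypothetical, and variable assignments are ignored entirely.

```rust
use std::collections::HashMap;

/// A path reduced to the node ids it visits (hypothetical stand-in).
pub type Path = Vec<u32>;

/// Joins each left path with every right path that starts at the
/// node where the left path ends.
pub fn hash_join_concat(left: &[Path], right: &[Path]) -> Vec<Path> {
    // Build phase: group right-hand paths by their first node id, O(m).
    let mut by_first: HashMap<u32, Vec<&Path>> = HashMap::new();
    for p in right {
        if let Some(&first) = p.first() {
            by_first.entry(first).or_default().push(p);
        }
    }

    // Probe phase: one expected-O(1) lookup per left path, O(n) plus output.
    let mut out = Vec::new();
    for l in left {
        let Some(last) = l.last() else { continue };
        if let Some(matches) = by_first.get(last) {
            for r in matches {
                let mut joined = l.clone();
                joined.extend_from_slice(&r[1..]); // skip the shared node
                out.push(joined);
            }
        }
    }
    out
}
```

For repetition, the same `by_first` map could be built once outside the loop and probed on every iteration, which is the reuse the notes describe.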
+
+### `src/model/` — Graph Data Model
+- `value.rs` — `Value` (Int/Str/Bool), `PathValue` (Node/EdgeDirectional/EdgeUndirectional/Nothing/List), `Path` (with `can_concat`, `concat`, `first_node_id`, `last_node_id`).
+- `graph.rs` — `Graph` struct: nodes, edges_d, edges_u, labels, props, endpoints, src, tgt + indexes (label_to_nodes/edges, outgoing/incoming/undirected_adj). Constructors: `from_file()` (JSON), `from_json_str()`, `from_json_value()`, `from_raw()`, `open()` (.gql), `save()`.
+- `graph_access.rs` — `GraphAccess` trait: 17 methods. Core: nodes(), edges_directed/undirected(), labels(), props(), src(), tgt(), endpoints(), is_directed(), edge_path_value(). Index-aware: nodes_with_label(), directed/undirected_edges_with_label(). Adjacency: outgoing_edges(), incoming_edges(), undirected_edges_of().
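Reviewer sketch (not part of the diff): the point of a storage trait like `GraphAccess` is that query code is written once against the trait and works for any backend. The example below is an assumption-heavy illustration. It borrows two method names from the list above, but the signatures, the `MemGraph` backend, and the `one_hop` method are hypothetical and much simpler than a 17-method trait.

```rust
use std::collections::HashMap;

/// Hypothetical two-method subset of a storage trait.
pub trait GraphAccess {
    fn nodes_with_label(&self, label: &str) -> Vec<u32>;
    /// Returns (edge id, target node id) pairs for one source node.
    fn outgoing_edges(&self, node: u32) -> Vec<(u32, u32)>;
}

/// Hypothetical in-memory backend backed by two hash maps.
#[derive(Default)]
pub struct MemGraph {
    pub label_to_nodes: HashMap<String, Vec<u32>>,
    pub outgoing: HashMap<u32, Vec<(u32, u32)>>,
}

impl GraphAccess for MemGraph {
    fn nodes_with_label(&self, label: &str) -> Vec<u32> {
        self.label_to_nodes.get(label).cloned().unwrap_or_default()
    }

    fn outgoing_edges(&self, node: u32) -> Vec<(u32, u32)> {
        self.outgoing.get(&node).cloned().unwrap_or_default()
    }
}

/// Query code that only knows about the trait, not the backend.
pub struct Runtime<'g, G: GraphAccess> {
    graph: &'g G,
}

impl<'g, G: GraphAccess> Runtime<'g, G> {
    pub fn new(graph: &'g G) -> Self {
        Runtime { graph }
    }

    /// (source, target) pairs for a pattern shaped like `(x:label)-[]->(y)`.
    pub fn one_hop(&self, label: &str) -> Vec<(u32, u32)> {
        let mut rows = Vec::new();
        // Label index first, then adjacency lists: no cross-product.
        for src in self.graph.nodes_with_label(label) {
            for (_edge, tgt) in self.graph.outgoing_edges(src) {
                rows.push((src, tgt));
            }
        }
        rows
    }
}
```

A disk-backed store would implement the same two methods by reading index pages, and `Runtime::one_hop` would not change.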
+
+### `src/store/` — Persistent Storage
+- `string_table.rs` — Deduplicated string interning on pages. `intern()` → u32, `resolve()` → &str. Multi-page with overflow. `str_to_id` HashMap is public (used by DiskGraphStore).
+- `record.rs` — Binary encode/decode for node/edge cells on slotted pages. Node: user_id_sid, label_sids, props (name_sid + typed value). Edge: node record + src_iid + tgt_iid + directed flag.
+- `io.rs` — `save_graph()` and `load_graph()`. Writes node/edge data pages + on-disk indexes (label index, adjacency, ID index). `load_graph()` reads pages and rebuilds Graph via `from_raw()`.
+- `lazy.rs` — `LazyGraphStore`: compact indexes in memory (IDs, topology), records read on demand through page cache. Uses `RefCell<Pager>` for interior mutability. `Box::leak` for returning references to lazily-loaded data.
+- `disk.rs` — `DiskGraphStore`: adds on-disk label/adjacency indexes. Reads index pages through cache. Same `Box::leak` pattern.
+- `disk_index.rs` — On-disk index format: sorted page chains. Label index (label_sid → page chain of element IDs), adjacency (node_iid → page chain of triples), ID index (sorted pairs for binary search). Page chain format: 8-byte header (type, count, next_page), then fixed-size entries.
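Reviewer sketch (not part of the diff): the deduplicated interning described for `string_table.rs` maps each distinct string to a stable `u32` so that records can store small ids instead of repeated text. The version below is in-memory only and is an assumption made for illustration. It reuses the names `intern`, `resolve`, and `str_to_id` from the notes, but leaves out page storage and overflow, and `resolve` returns an `Option` here rather than a bare `&str`.

```rust
use std::collections::HashMap;

/// In-memory interner; the paged, on-disk layout is not modelled.
#[derive(Default)]
pub struct StringTable {
    strings: Vec<String>,            // id -> string
    str_to_id: HashMap<String, u32>, // string -> id
}

impl StringTable {
    /// Returns the existing id for `s`, or assigns the next free one.
    pub fn intern(&mut self, s: &str) -> u32 {
        if let Some(&id) = self.str_to_id.get(s) {
            return id;
        }
        let id = self.strings.len() as u32;
        self.strings.push(s.to_string());
        self.str_to_id.insert(s.to_string(), id);
        id
    }

    /// Looks an id back up; `None` for ids that were never handed out.
    pub fn resolve(&self, id: u32) -> Option<&str> {
        self.strings.get(id as usize).map(String::as_str)
    }
}
```

Interning the same label twice yields the same id, so label comparisons in an index can be plain integer comparisons.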
+
+### `src/pager/` — Page-Level I/O
+- `page.rs` — 4KB slotted pages. Header: type, cell_count, cell_area_start. Cell pointer array grows forward, cell data grows backward. `insert_cell()`, `cell_offset()`, `read_at()`, `free_space()`.
+- `header.rs` — File header (page 0): magic `GQLDB\0`, version, page_size, page_count, free_list_head, node/edge counts, root page pointers (string_table, node_data, edge_data, label_index, adjacency, edge_label_index, node_id_index, edge_id_index).
+- `pager.rs` — `Pager`: create/open database file, read/write pages through LRU cache. `allocate_page()` (reuses from LIFO free list or extends file), `free_page()`. Configurable cache size (default 2000 pages = ~8MB). Cache stats (hits/misses).
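Reviewer sketch (not part of the diff): a slotted page as described for `page.rs` keeps a pointer array that grows forward from the header and cell data that grows backward from the end of the page, with free space in between. The layout constants below are assumptions, not the package's actual format: an 8-byte header region, 4-byte slots holding a `u16` offset and a `u16` length, and the header fields held as struct fields rather than serialized. `read_cell` is a hypothetical helper name.

```rust
const PAGE_SIZE: usize = 4096;
const HEADER_SIZE: usize = 8; // assumed size of the page header region
const SLOT_SIZE: usize = 4; // assumed: u16 offset + u16 length per cell

pub struct Page {
    data: Vec<u8>,
    cell_count: usize,
    cell_area_start: usize, // lowest byte used by cell data
}

impl Page {
    pub fn new() -> Self {
        Page { data: vec![0; PAGE_SIZE], cell_count: 0, cell_area_start: PAGE_SIZE }
    }

    /// Bytes between the end of the slot array and the start of cell data.
    pub fn free_space(&self) -> usize {
        self.cell_area_start - (HEADER_SIZE + self.cell_count * SLOT_SIZE)
    }

    /// Stores a cell and returns its slot index, or `None` if it does not fit.
    pub fn insert_cell(&mut self, cell: &[u8]) -> Option<usize> {
        if cell.len() + SLOT_SIZE > self.free_space() {
            return None;
        }
        // Cell data grows backward from the end of the page.
        let offset = self.cell_area_start - cell.len();
        self.data[offset..offset + cell.len()].copy_from_slice(cell);
        // The slot array grows forward from the header.
        let slot = HEADER_SIZE + self.cell_count * SLOT_SIZE;
        self.data[slot..slot + 2].copy_from_slice(&(offset as u16).to_le_bytes());
        self.data[slot + 2..slot + 4].copy_from_slice(&(cell.len() as u16).to_le_bytes());
        self.cell_area_start = offset;
        self.cell_count += 1;
        Some(self.cell_count - 1)
    }

    /// Reads a cell back through its slot entry.
    pub fn read_cell(&self, index: usize) -> Option<&[u8]> {
        if index >= self.cell_count {
            return None;
        }
        let slot = HEADER_SIZE + index * SLOT_SIZE;
        let offset = u16::from_le_bytes([self.data[slot], self.data[slot + 1]]) as usize;
        let len = u16::from_le_bytes([self.data[slot + 2], self.data[slot + 3]]) as usize;
        Some(&self.data[offset..offset + len])
    }
}
```

Because cells are addressed through slots, a cell can later be moved within the page without changing the slot index that other structures refer to.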
+
+## Three Storage Modes
+
+| Mode | Memory | Speed | Use Case |
+|------|--------|-------|----------|
+| `Graph` | O(graph_size) ~6 bytes/element | Fastest (HashMap lookups) | <1M elements |
+| `LazyGraphStore` | O(indexes) ~2 bytes/element | ~1.8x slower (page cache reads) | 1M-10M elements |
+| `DiskGraphStore` | O(indexes + disk_index_roots) | ~2-3x slower (disk index reads) | Same as Lazy currently |
+
+**Current limitation:** Both Lazy and Disk still hold user ID strings (`Vec<String>`) and ID-to-index HashMaps in memory because `GraphAccess` trait returns `&str`. To get true O(cache_size) memory, the trait would need to use u32 internal IDs everywhere with string resolution only at result formatting time.
+
+## On-Disk File Format (.gql)
+
+```
+Page 0: File Header
+  Magic: "GQLDB\0"
+  Version: 1
+  Page size: 4096
+  Page count, free list head
+  Root pointers: string_table, node_data, edge_data,
+                 label_index, adjacency, edge_label_index,
+                 node_id_index, edge_id_index
+
+Pages 1+:
+  StringTable pages (type=5): length-prefixed UTF-8 strings
+  NodeData pages (type=1): slotted pages with encoded node cells
+  EdgeData pages (type=2): slotted pages with encoded edge cells
+  LabelIndex pages (type=4): sorted page chains
+  Adjacency pages (type=3): triple page chains
+  Free pages (type=7): linked list via first 4 bytes
+```
+
+## Key Design Decisions
+
+1. **Enums not trait objects** for the type system. Python uses isinstance(); Rust enums with match are the direct equivalent.
+
+2. **GraphAccess trait** for storage abstraction. Runtime is `Runtime<G: GraphAccess>` — same query code for all backends.
+
+3. **`Box::leak` for lazy stores**. The trait returns `&LabelType` and `&Props` but lazy stores compute these on the fly. Leaking is bounded by query result size.
+
+4. **Python mutation side-effect in repetition**. Python's `to_group()` mutates assignments in-place, affecting the original `ir`. In Rust, we clone and use the grouped version for both `res` and the hash map. This was a subtle porting bug.
+
+5. **Hash-join over sort-merge**. For concat and repetition, grouping by first-node-id in a HashMap gives O(n+m) expected. The hash map for repetition is built once and reused across iterations.
+
+6. **Predicate pushdown as compilation phase**. `compile()` = parse + optimize. The optimizer rewrites the AST before the runtime ever sees it.
+
+7. **Label syntax: `:` prefix required**. `-[:Transfer]->` not `-[Transfer]->`. Without `:`, `Transfer` is parsed as a variable name, not a label. This matches the Python Lark grammar.
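Reviewer sketch (not part of the diff): design decision 3 above relies on `Box::leak` to hand out a reference to a value that was only just decoded. The example below shows the mechanism in isolation and is an assumption made for illustration. `Props` as a list of pairs, the `PropsAccess` trait, and `LazyStore::decode` are hypothetical and stand in for whatever the lazy stores actually read from a page.

```rust
/// Hypothetical stand-in for a decoded property map.
#[derive(Debug, PartialEq)]
pub struct Props(pub Vec<(String, i64)>);

/// The trait returns a borrow, which suits stores that keep every
/// record in memory but not stores that decode records on demand.
pub trait PropsAccess {
    fn props(&self, id: u32) -> &Props;
}

pub struct LazyStore;

impl LazyStore {
    /// Pretends to decode a record from a page on every call.
    fn decode(&self, id: u32) -> Props {
        Props(vec![("id".to_string(), i64::from(id))])
    }
}

impl PropsAccess for LazyStore {
    fn props(&self, id: u32) -> &Props {
        // `Box::leak` yields a `&'static mut Props`, which coerces to the
        // shorter-lived `&Props` the trait asks for. The allocation is never
        // freed, so memory grows with the number of calls.
        Box::leak(Box::new(self.decode(id)))
    }
}
```

The trade-off is that every call allocates memory that is never reclaimed, which is why the notes qualify the leak as bounded by query result size. A cache keyed by id, or a trait that returns owned values, would avoid the leak at the cost of a different API.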
+
+## Test Coverage
+
+189 tests across 6 test files:
+- `tests/parser_test.rs` (31) — AST structure from parsed queries
+- `tests/runtime_test.rs` (26) — hand-built AST execution
+- `tests/parse_and_run_test.rs` (41) — end-to-end: string → compile → run
+- `tests/store_runtime_test.rs` (31) — save → reopen → run
+- `tests/bench_test.rs` (4) — benchmarks with memory tracking
+- `src/` inline tests (56) — unit tests for all modules
+
+## Benchmarks (10K nodes, 55K edges, release mode)
+
+| Query | Graph | Lazy | Disk |
+|-------|-------|------|------|
+| Label scan: Person | 1.6ms | 4.2ms | 3.2ms |
+| 1-hop traversal | 13.5ms | 28.2ms | 58.4ms |
+| 2-hop chain | 24.2ms | 44.9ms | 113.7ms |
+| Repeat {1,2} | 24.0ms | 35.7ms | 33.8ms |
+
+Memory at 100K nodes / 550K edges: Graph=603 MB, Lazy=212 MB, Disk=169 MB.
+
+## What's NOT Implemented
+
+- **Typechecker** (Phase 3 from original plan) — schema inference + type checking. Port of `pygql/gql/typechecker.py`. Skipped as optional.
+- **CLI binary** — no `main.rs`. Library only.
+- **True O(cache_size) DiskGraphStore** — needs `GraphAccess` redesign to use u32 internal IDs.
+- **Cost-based query planning** — choosing join order by selectivity.
+- **Write-back page cache** — current is write-through.
+- **Transactions / WAL** — not needed for read-heavy research workload.
+- **Unbounded repetition** (`*`, `+`) at runtime — only bounded `{lb, ub}`.
+
+## Relationship to Python Implementation
+
+The Rust runtime produces identical results to `pygql/` for all queries. Tests were ported from:
+- `pygql/test/runtime_test.py` → `tests/runtime_test.rs` + `tests/parse_and_run_test.rs`
+- `pygql/test/parser_test.py` → `tests/parser_test.rs`
+
+Test databases (`test_data/fraud.json`, `test_data/social-network.json`) are copies of `pygql/dbs/`.
+
+## Building and Testing
+
+```bash
+cargo build --release
+cargo test                                                                  # all 189 tests
+cargo test --test bench_test --release -- --nocapture                       # benchmarks
+cargo test --test bench_test bench_graph_vs_lazy --release -- --nocapture   # 3-way comparison
+cargo test --test bench_test bench_memory_scaling --release -- --nocapture  # memory scaling
+```