RubyGems - rdf-lmdb - Versions diffs - 0.3.2 → 0.3.3 - Mend

rdf-lmdb 0.3.2 → 0.3.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (6)

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: ea35cfcc72e22d7a71f047728b412575ce712691c2815a3693d23f94909a78b2
-  data.tar.gz: ab6e65528c0b9bc70bf4d18a7f47b2f47206dbfd1d09aa99c4551a799694b3a0
+  metadata.gz: 0c021040c4ca9c7021c79808072cf55ce2c4f378f995726dfd7eec48c1f1b3ca
+  data.tar.gz: 62d9f85388152eddb2f7965ac32cb4fcbb0955d35ab410fa36726632f8f235c0
 SHA512:
-  metadata.gz: 483d8b372297ee978595110432dd21170cd34fafde285b7f0f1269fea9d5d06428066b19e8e9b6cf294c6e71dbf378e1b30e37215e36810fae19dd6ebbeda5fc
-  data.tar.gz: 20fac6ce5e493fa10ef923bd12e0e549e33b83a46f859b623f5fe5838ab42348ec1ef67884bfbe7baaa396ae1937a6bb3256897f86e8fd0343b0705ac2ed423d
+  metadata.gz: c76ff224facd0128e6554018bb85937a0f27175f061e2a59093ba007bd6c790330a93ec20c5346d0ba5ec45f242bad0f393d05b322b8d8ce80d705ee122af1f6
+  data.tar.gz: 5771bec23b120002df5d94e2c8f1e85657409f447924cacb1f2ec464e1fab5751ac1adbaf5fe3286e7e96051fe012e530ddac8ead2097e3e39a0033195b599c1
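`checksums.yaml` records SHA256 and SHA512 digests of the two archives packed inside the `.gem`, `metadata.gz` and `data.tar.gz`. The changes above therefore just reflect the new archive contents. The sketch below shows one way to check such digests. It assumes both members have already been read into byte strings, and `verify_checksums` is a hypothetical helper, not a RubyGems API.

```ruby
require 'digest'
require 'yaml'

# Hypothetical helper (not a RubyGems API): check archive members against a
# checksums.yaml document of the shape shown above.
# members maps member names ('metadata.gz', 'data.tar.gz') to their raw bytes.
def verify_checksums(checksums_yaml, members)
  algorithms = { 'SHA256' => Digest::SHA256, 'SHA512' => Digest::SHA512 }
  YAML.safe_load(checksums_yaml).all? do |algo, expected_by_name|
    digest = algorithms.fetch(algo) # KeyError for an algorithm we do not know
    expected_by_name.all? do |name, expected|
      members.key?(name) && digest.hexdigest(members.fetch(name)) == expected
    end
  end
end
```

A `.gem` file is itself a plain tar archive. Its members are `metadata.gz` and `data.tar.gz`, plus a gzipped `checksums.yaml.gz`. RubyGems' `Gem::Package::TarReader` can iterate those entries to produce the `members` hash.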

data/TODO.org ADDED

@@ -0,0 +1,138 @@
+#+STARTUP: showall hidestars indent
+* Preamble
+- Let us first disabuse ourselves of the notion that this is anything more than a toy database.
+- That said, it's written in a language which is easy to experiment with, on top of a simple database which is easy to use.
+- Also, none of what I'm proposing in here is peculiar to either Ruby or LMDB.
+  - Indeed, any language and any direct-attached key-value store that does transactions could support this (I think?)
+- So whereas other products like [[https://github.com/oxigraph/oxigraph][Oxigraph]] are focused on features like SPARQL, I am particularly interested in how you lay out a key-value store /in general/ such that you can represent an RDF store with characteristics like:
+  - RDF-star (which I should just do anyway)
+  - a change history (i.e., undo)
+  - dealing with multiple users
+    - (i.e., access control)
+  - efficient storage of typed literals
+  - efficient handling of large literals and ~data:~ URIs
+    - unicode normalization for literals for sure
+    - outsourcing to content-addressable storage would be ideal
+    - There are going to be really silly SPARQL queries like searching substrings in ~data:~ URIs
+      - at the basic graph level we will probably just have to serve those up and deal with the cost of doing that
+  - Inferencing:
+    - RDFS, OWL, SHACL inferencing for basic graph queries
+      - don't /generate/ statements here, just return them if the inferences resolve
+  - Layering:
+    - think [[https://en.wikipedia.org/wiki/Union_mount][~unionfs~]] but for RDF stores
+    - "union graphs", contexts which merge two or more other contexts together
+      - no context is kind of like the union of all contexts
+      - except triples have to be stored in an invisible null context if they aren't explicitly ascribed a context
+        - if you select without a context it should return statements from all contexts at once
+        - if you delete a triple (i.e., not a quad) it should delete it from all contexts (y/n?)
+      - it should be possible to specify contexts that union other arbitrary contexts together
+        - this should recurse but probably not loop/self-reference
+        - the question (as ever) will be when you write to one of these, what happens?
+    - "consensus graphs" which extend the idea of union graphs to a shared reality for multiple users
+    - "proxy graphs" that map to other systems (e.g. SQL)
+      - or even other RDF stores
+    - statement-generating layers that do things we actually /do/ want statements in the graph for, but /generated/ rather than stored (or perhaps merely /cached/, and thus not subject to versioning)
+      - e.g. "soft" inferences, stuff written in vocab specs that there was no way to express formally at the time
+        - I'm thinking specifically how ~?c a skos:OrderedCollection; skos:memberList (?m1 ?m2 ?mn)~ implies ~?c skos:member ?m1~ and so on.
+        - Totally achievable with [[https://www.w3.org/TR/shacl-af/#rules][SHACL rules]]
+      - e.g. stateful or aggregate statements computed from other statements
+        - again this is totally doable with SHACL.
+* TODO RDF-star
+- at root there are terms
+  - terms can be normalized and hashed
+  - each term is assigned a numeric identifier that is local to the database and not otherwise exposed
+    - assume this is a native-endian ~size_t~ integer; we are not gonna screw around with portability across cpu architectures
+      - so intel (and apple silicon coincidentally) will be 64-bit little-endian
+  - statements are composed of terms
+    - statements can be represented as: ~statement id => [subject id, predicate id, object id]~
+  - quad stores have contexts
+    - a context is just a term
+    - ~context id => statement id~
+      - also ~statement id => context id~
+  - the gist of RDF* is that entire statements can also be terms
+    - and this can be recursive
+    - so subjects and objects can now be /statements/ in addition to URIs and bnodes (and literals for objects)
+  - so it shouldn't be the end of the world to make that a thing
+  - albeit backward-compatibility to existing stores might be a problem
+    - well if anybody wants to hire me to do that for them, they can
+* TODO change history
+- anyway, that aside, what we're actually after is being able to access the state of the database at the instant of a particular transaction
+  - random access is ideal
+  - indeed random access is probably /necessary/, all things considered
+- so there should be a basic key-value map that maps statement identifiers to statements
+  - then there is another one that maps statements to contexts; this is how contexts are handled
+- each transaction can basically be seen as a "meta-context"
+  - i.e., the state after the transaction is committed may as well have its own context URL.
+  - the grammar of change in an rdf store reduces to:
+    - statements added
+    - statements removed
+  - we can work with this
+- again, you have layer /zero/ which maps between terms and hashes/internal IDs
+  - this is like saying "the database has seen /these/ terms."
+- you have layer /one/ which maps statements (which are also considered terms) to their referents
+  - this is like saying "the database has seen /these/ statements."
+  - (again note statements are also terms under RDF*.)
+- layer /two/ says which /contexts/ the statements belong to.
+  - this is like saying "the /context/ currently contains these statements."
+  - there is a "null" context that includes all statements ever
+** TODO make a sandwich layer between raw statements and context for current state
+- between-/ish/: you can easily imagine removing a statement from one context and adding it to another within a single transaction
+- every transaction can be represented as adding and/or removing zero or more quads such that the union of both sets is nonempty
+  - otherwise there's nothing to record
+  - in other words to be recorded as a transaction you have to /either/ add /or/ remove at least one quad, otherwise it's a no-op
+- originally considered using generated contexts as a surrogate interface for identifying individual states
+  - this obviously isn't going to work because a context implies what remains is a /triple/, not a /quad/, so diffs that don't change anything but the context of a given statement aren't going to be visible
+  - although ehhh that's gonna be weird already because you'll have to have individual contexts for the add side /and/ remove side
+    - how else are you going to represent statements that were removed?
+- anyway there is the technical problem of how to implement this without a shitload of waste
+  - change ID
+  - statements removed
+  - statements added
+- if the change ID monotonically increases (it should, at least internally), then on retrieval we just do this:
+  - retrieve the statement from whatever stateless storage
+  - check if it has been added by whatever change ID we're currently looking at
+  - check if it has not been subsequently removed
+    - if it /has/ been subsequently removed, check if it has been re-added
+    - basically we need a mapping of statement ID to change ID
+      - why not just stick a bit on the end of that as to whether it's added or removed
+      - so we have ~added~ and ~removed~ tables of the form ~change id => statement id~
+      - we also have i dunno, ~state~ or something of the form ~statement id => change id, bit for added/removed~
+** TODO global mtime
+- which resources have been affected since this time/transaction id
+* TODO principals (multi-user)
+- each individual user gets their own quad store from their point of view
+- "consensus graph" for multiple users
+  - union of individual spaces
+    - one context identifier everybody involved can read in its totality
+  - every statement you /add/ goes into your own slice and is visible to everybody in the group
+  - you can't add or delete statements in other people's slices and they can't change yours
+  - though they should be able to transfer ownership of a set of statements to you somehow
+    - (but the person receiving should be able to decline the transfer)
+** TODO lensing
+- my concern here is a way to have a single repository that can support multiple users without "leaking" content from other users
+  - like we /could/ just partition these on the disk but there are reasons not to do this:
+    1. it's gonna be a pain in the ass for downstream applications in the best case
+    2. there will be at least /some/ material in common across all users, so content will be unnecessarily duplicated
+    3. LMDB uses ~mmap~ and so running multiple repositories will use lots of ram
+- i.e., rudimentary "access control"
+- it would behave like a second, invisible context
+- something like ~lensed_repo = repo.lens uri~
+  - then you can use ~lensed_repo~ without worrying that it will leak
+** TODO access control
+- evaluate different approaches
+  - resource-based
+    - individual resources or sets of resources?
+    - privileges:
+      - know the existence of a resource
+        - i.e. you don't see statements with this rsource
+      - read statements where the resource is a subject
+        - going to have to censor ~owl:inverseOf~ etc, i.e. access control will have to be evaluated before inferences
+      - add statements with this subject
+      - remove statements with this subject
+  - statement-based
+    - just access-control entire contexts?
+    - that would probably be easiest
+  - identity-oriented vs capability-oriented
+    - would kinda love to do capability-oriented
+* TODO layered graphs
+- yeah this is gonna be hard lol
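The union-graph notes in the preamble ask for contexts that merge other contexts. These should nest recursively but must not loop or reference themselves. One way to get that behaviour is a depth-first expansion that:

- tracks the current path and rejects any context that reappears on it;
- still lets the same context be reached by two different routes.

The sketch below covers the read side only, since the notes leave writes open. It assumes a Hash of union definitions and quads stored as four-element arrays. `resolve_union` and `select_in` are hypothetical names.

```ruby
require 'set'

# Hypothetical: unions maps a union context to the contexts it merges.
# Returns the set of concrete (non-union) contexts; raises on a cycle.
def resolve_union(ctx, unions, path = [])
  raise ArgumentError, "cycle: #{(path + [ctx]).join(' -> ')}" if path.include?(ctx)
  members = unions[ctx]
  return Set[ctx] unless members # not a union, so it is a concrete context
  members.each_with_object(Set.new) do |member, acc|
    acc.merge(resolve_union(member, unions, path + [ctx]))
  end
end

# Select quads ([s, p, o, context]) whose context falls under ctx.
def select_in(quads, ctx, unions)
  wanted = resolve_union(ctx, unions)
  quads.select { |quad| wanted.include?(quad[3]) }
end
```

Diamonds are fine here. If `u:all` merges `g:a` and `u:ab`, and `u:ab` also merges `g:a`, then `g:a` appears once in the result. Only a true cycle raises.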
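The "soft" inference example says that `skos:memberList` on a `skos:OrderedCollection` implies `skos:member` for each element of the list. Computing it amounts to walking an RDF collection (`rdf:first` / `rdf:rest` down to `rdf:nil`). The notes point to SHACL rules for this. The sketch below swaps in a hand-written Ruby walk over plain `[s, p, o]` string arrays to show what such a generating layer would emit. It returns the triples rather than storing them. `implied_members` is a hypothetical name.

```ruby
RDF_NS  = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'
SKOS_NS = 'http://www.w3.org/2004/02/skos/core#'

RDF_TYPE, RDF_FIRST, RDF_REST, RDF_NIL = %w[type first rest nil].map { |l| RDF_NS + l }
SKOS_ORDERED, SKOS_MEMBER_LIST, SKOS_MEMBER =
  %w[OrderedCollection memberList member].map { |l| SKOS_NS + l }

# Hypothetical generating layer: returns the skos:member triples implied by
# skos:memberList on ordered collections, without storing them.
# triples: array of [subject, predicate, object] strings.
def implied_members(triples)
  objects = Hash.new { |h, k| h[k] = [] } # [subject, predicate] => objects
  triples.each { |s, p, o| objects[[s, p]] << o }
  ordered = triples.select { |_, p, o| p == RDF_TYPE && o == SKOS_ORDERED }.map(&:first)
  ordered.uniq.flat_map do |collection|
    objects[[collection, SKOS_MEMBER_LIST]].flat_map do |node|
      members = []
      seen = {} # guards against malformed, cyclic lists
      until node.nil? || node == RDF_NIL || seen[node]
        seen[node] = true
        members.concat(objects[[node, RDF_FIRST]])
        node = objects[[node, RDF_REST]].first
      end
      members.map { |m| [collection, SKOS_MEMBER, m] }
    end
  end
end
```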
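The RDF-star notes describe three things:

- terms that are normalized, hashed, and mapped to local integer ids;
- statements stored as `statement id => [subject id, predicate id, object id]`;
- statements that can themselves be used as terms.

The 0.3.3 code declares databases of the same shape (`hash2term`, `int2term`, `statement`). The minimal in-memory sketch below rests on three assumptions:

- Hashes stand in for LMDB.
- `String#unicode_normalize(:nfc)` stands in for the unf gem.
- Ruby's `J` pack directive (native-endian, pointer-width unsigned) stands in for `size_t`.

`TermTable` and its methods are hypothetical, not the gem's API.

```ruby
require 'digest'

# Hypothetical in-memory stand-in for the hash2term, int2term and statement
# databases; plain Hashes replace LMDB.
class TermTable
  PACK = 'J*' # native-endian pointer-width unsigned ints, standing in for size_t

  def initialize
    @hash2term  = {} # sha256 of normalized term => id
    @int2term   = {} # id => normalized term string
    @statement  = {} # statement id => packed [s, p, o] ids
    @packed2sid = {} # reverse index so identical triples share one id
    @next_id    = 0  # one counter, so statement ids live in the term id space
  end

  # Normalize and hash a term, returning its stable local id.
  def intern(term)
    norm = term.unicode_normalize(:nfc)
    key  = Digest::SHA256.digest(norm)
    @hash2term[key] ||= begin
      id = (@next_id += 1)
      @int2term[id] = norm
      id
    end
  end

  # Strings are interned as terms; an Integer must be an existing statement id
  # (RDF-star quoting). The notes only call for this in subject/object position.
  def add(s, p, o)
    ids = [s, p, o].map do |t|
      next intern(t) unless t.is_a?(Integer)
      raise ArgumentError, "#{t} is not a statement id" unless @statement.key?(t)
      t
    end
    packed = ids.pack(PACK)
    @packed2sid[packed] ||= begin
      sid = (@next_id += 1)
      @statement[sid] = packed
      sid
    end
  end

  def triple(sid)
    @statement.fetch(sid).unpack(PACK)
  end

  def term(id)
    @int2term.fetch(id)
  end
end
```

Because the bytes are packed native-endian, a store written this way is not portable across CPU byte orders. The notes accept this explicitly.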
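The change-history notes sketch three tables:

- `added` and `removed` tables keyed by change id;
- a `state` map from statement id to change id, with an added/removed bit.

They also require that a recorded transaction add or remove at least one quad. With monotonically increasing change ids, each statement's events stay sorted. Deciding whether a statement is visible at change N then reduces to a binary search for its last event at or before N. The sketch below works under those assumptions, using Hashes in place of LMDB. `ChangeLog` and its methods are hypothetical.

```ruby
# Hypothetical sketch of the added / removed / state tables from the
# change-history notes. Change ids only increase, so event lists stay sorted.
class ChangeLog
  def initialize
    @added   = {} # change id => [statement ids]
    @removed = {} # change id => [statement ids]
    @state   = Hash.new { |h, k| h[k] = [] } # statement id => [[change id, added?], ...]
    @last    = 0
  end

  # Record one transaction; returns its change id, or nil for a no-op.
  def commit(added: [], removed: [])
    return nil if added.empty? && removed.empty?
    cid = (@last += 1)
    @added[cid]   = added.dup
    @removed[cid] = removed.dup
    # removals first, so a statement both removed and added in one change ends up present
    removed.each { |sid| @state[sid] << [cid, false] }
    added.each   { |sid| @state[sid] << [cid, true] }
    cid
  end

  # Is the statement present in the state as of change id `at`?
  def visible?(sid, at)
    events = @state.fetch(sid, [])
    after  = events.bsearch_index { |event| event[0] > at } # first event later than `at`
    last   = after ? after - 1 : events.size - 1
    last >= 0 && events[last][1]
  end

  # Random access to a whole snapshot (naive scan over every known statement).
  def snapshot(at)
    @state.keys.select { |sid| visible?(sid, at) }.sort
  end

  # Statements touched by any change after `since`, cf. the "global mtime" note.
  def changed_since(since)
    touched = @added.select { |cid, _| cid > since }.values +
              @removed.select { |cid, _| cid > since }.values
    touched.flatten.uniq.sort
  end
end
```

Take statement 10: it is added in change 1, removed in change 2, and re-added in change 3. Then `visible?(10, 2)` is false and `visible?(10, 3)` is true.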

data/lib/rdf/lmdb/version.rb CHANGED

@@ -1,5 +1,5 @@
 module RDF
   module LMDB
-    VERSION = "0.3.2"
+    VERSION = "0.3.3"
   end
 end

data/lib/rdf/lmdb.rb CHANGED

@@ -5,6 +5,7 @@ require 'rdf/ntriples'
 require 'pathname'
 require 'lmdb'
 require 'digest'
+require 'time'
 require 'unf' # lol unf unf unf
 module RDF
@@ -165,6 +166,9 @@ Currently you have to dump from the old layout and reload the new one. Sorry!
         # databases are opened in a transaction, who knew
         @lmdb.transaction do # |t|
           @dbs = {
+            # this is the control database, it gets no flags
+            control:   [],
+            # actual instance data
             statement: [:integerkey], # key: int; val: ints
             hash2term: [],            # key: sha256, val: int
             int2term:  [:integerkey], # key: int, val: string
@@ -187,6 +191,9 @@ Currently you have to dump from the old layout and reload the new one. Sorry!
               **(flags + [:create]).map { |f| [f, true] }.to_h)]
           end.to_h
+          # this will write the mtime if it isn't already there
+          mtime
           # t.commit
         end
         @lmdb.sync
@@ -635,6 +642,14 @@ Currently you have to dump from the old layout and reload the new one. Sorry!
         ret
       end
+      def log_mtime time = nil
+        time ||= Time.now in: ?Z
+        nsecs = time.utc.to_r
+        nsecs = (nsecs * 10**9).numerator
+        @lmdb.transaction { @dbs[:control]['mtime'] = [nsecs].pack ?q }
+        time
+      end
       public
       def initialize dir = nil, uri: nil, title: nil, **options, &block
@@ -679,11 +694,28 @@ Currently you have to dump from the old layout and reload the new one. Sorry!
         @lmdb.close
       end
+      # Return a {::Time} object representing when the store was last written.
+      #
+      # @return [Time] said modification time
+      #
+      def mtime
+        if packed = @dbs[:control]['mtime']
+          nsecs = Rational(packed.unpack1(?q), 10 ** 9)
+          Time.at nsecs, in: ?Z
+        else
+          log_mtime
+        end
+      end
       # data manipulation
       def insert_statement statement
         complete! statement
-        @lmdb.transaction { |t| add_one statement; t.commit }
+        @lmdb.transaction do |t|
+          add_one statement
+          log_mtime
+          t.commit # cargo cult?
+        end
         nil
       end
@@ -698,6 +730,9 @@ Currently you have to dump from the old layout and reload the new one. Sorry!
           else
             rm_one statement
           end
+          log_mtime
           t.commit
         end
         nil
@@ -709,6 +744,7 @@ Currently you have to dump from the old layout and reload the new one. Sorry!
             complete! statement
             add_one statement
           end
+          log_mtime
         end
         nil
@@ -736,6 +772,9 @@ Currently you have to dump from the old layout and reload the new one. Sorry!
           end
           clean_terms hashes.uniq
+          log_mtime
           t.commit
         end
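The new `control` database holds a single `mtime` key. `log_mtime` stores the time as integer nanoseconds since the epoch, packed with `q` (native-endian signed 64-bit). `mtime` decodes that value, or writes one first if it is absent. The sketch below reproduces the encoding outside LMDB, assuming a plain Hash in place of `@dbs[:control]`. `MtimeStore` is a hypothetical name.

```ruby
require 'time' # for Time#iso8601 in the usage below

# Hypothetical stand-in for @dbs[:control]: a Hash of key => packed bytes.
class MtimeStore
  def initialize
    @control = {}
  end

  # Store a Time as integer nanoseconds since the epoch, packed with 'q'
  # (native-endian signed 64-bit), as log_mtime in the diff does.
  # Returns the Time that was actually stored (truncated to whole nanoseconds).
  def log_mtime(time = Time.now)
    nsecs = (time.to_r * 10**9).floor # to_r is zone-independent; floor drops sub-ns
    @control['mtime'] = [nsecs].pack('q')
    Time.at(Rational(nsecs, 10**9)).utc
  end

  # Decode the stored value, writing one first if the key is absent.
  def mtime
    if (packed = @control['mtime'])
      Time.at(Rational(packed.unpack1('q'), 10**9)).utc
    else
      log_mtime
    end
  end
end
```

After `store.log_mtime`, calling `store.mtime.iso8601(9)` shows the stored value at full nanosecond precision. Two small deviations from the diff are deliberate:

- `to_r` is called directly because `Time#utc` converts its receiver in place.
- `floor` replaces `numerator` because `(r * 10**9).numerator` equals the nanosecond count only when the product is a whole number. That holds for `Time.now` on typical platforms, but not for an arbitrary `Time` such as `Time.at(Rational(1, 3))`.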

data/rdf-lmdb.gemspec CHANGED

@@ -36,5 +36,5 @@ robust key-value store.
   # stuff we use
   spec.add_runtime_dependency 'unf',  '~> 0.1'
   spec.add_runtime_dependency 'rdf',  '~> 3'
-  spec.add_runtime_dependency 'lmdb', '~> 0.6.1'
+  spec.add_runtime_dependency 'lmdb', '~> 0.6', '>= 0.6.2'
 end
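The `lmdb` constraint changes from `'~> 0.6.1'` (at least 0.6.1, below 0.7) to `'~> 0.6', '>= 0.6.2'` (at least 0.6.2, below 1.0). The metadata below records both requirements. RubyGems' own `Gem::Requirement` can confirm the difference. The version list is illustrative only and says nothing about which `lmdb` releases actually exist.

```ruby
require 'rubygems'

OLD_REQ = Gem::Requirement.new('~> 0.6.1')           # 0.3.2: >= 0.6.1, < 0.7
NEW_REQ = Gem::Requirement.new('~> 0.6', '>= 0.6.2') # 0.3.3: >= 0.6.2, < 1.0

%w[0.6.1 0.6.2 0.7.0 1.0.0].each do |v|
  version = Gem::Version.new(v)
  puts format('%-6s old: %-5s new: %s', v,
              OLD_REQ.satisfied_by?(version), NEW_REQ.satisfied_by?(version))
end
```

The upshot is that 0.6.1 is no longer accepted, while 0.7.x releases now are.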

metadata CHANGED

@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: rdf-lmdb
 version: !ruby/object:Gem::Version
-  version: 0.3.2
+  version: 0.3.3
 platform: ruby
 authors:
 - Dorian Taylor
 autorequire:
 bindir: exe
 cert_chain: []
-date: 2024-02-19 00:00:00.000000000 Z
+date: 2025-04-05 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: bundler
@@ -100,14 +100,20 @@ dependencies:
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: 0.6.1
+        version: '0.6'
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: 0.6.2
   type: :runtime
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
     - - "~>"
       - !ruby/object:Gem::Version
-        version: 0.6.1
+        version: '0.6'
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: 0.6.2
 description: |
   This module implements RDF::Repository on top of LMDB, a fast and
   robust key-value store.
@@ -124,6 +130,7 @@ files:
 - LICENSE
 - README.md
 - Rakefile
+- TODO.org
 - bin/console
 - bin/setup
 - lib/rdf-lmdb.rb
@@ -149,7 +156,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
     - !ruby/object:Gem::Version
       version: '0'
 requirements: []
-rubygems_version: 3.3.15
+rubygems_version: 3.4.20
 signing_key:
 specification_version: 4
 summary: Symas LMDB back-end for RDF::Repository