declarative_policy 1.0.0 → 1.1.0
- checksums.yaml +4 -4
- data/.gitignore +2 -0
- data/.gitlab-ci.yml +59 -16
- data/.rubocop.yml +4 -1
- data/CHANGELOG.md +8 -0
- data/CONTRIBUTING.md +41 -0
- data/Gemfile +7 -8
- data/Gemfile.lock +37 -20
- data/LICENSE.txt +4 -1
- data/README.md +6 -4
- data/benchmarks/repeated_invocation.rb +37 -0
- data/declarative_policy.gemspec +1 -1
- data/doc/caching.md +299 -1
- data/doc/defining-policies.md +29 -3
- data/doc/optimization.md +277 -0
- data/lib/declarative_policy/base.rb +60 -28
- data/lib/declarative_policy/cache.rb +1 -1
- data/lib/declarative_policy/condition.rb +4 -2
- data/lib/declarative_policy/configuration.rb +7 -1
- data/lib/declarative_policy/rule.rb +5 -5
- data/lib/declarative_policy/runner.rb +58 -26
- data/lib/declarative_policy/version.rb +1 -1
- data/lib/declarative_policy.rb +30 -40
- metadata +11 -7
data/doc/caching.md
CHANGED
@@ -1,4 +1,302 @@
|
|
1
1
|
# Caching
|
2
2
|
|
3
|
-
|
3
|
+
This library deals with making observations about the state of
|
4
|
+
a system (usually performing I/O, such as making a database query),
|
5
|
+
and combining these facts into logical propositions.
|
4
6
|
|
7
|
+
In order to make this performant, the library transparently caches repeated
|
8
|
+
observations of conditions. Understanding how caching works is useful for
|
9
|
+
designing good policies and using them effectively.
|
10
|
+
|
11
|
+
## What is cached?
|
12
|
+
|
13
|
+
If a policy is instantiated with a cache, then the following things will be
|
14
|
+
stored in it:
|
15
|
+
|
16
|
+
- Policy instances (there will only ever be one policy per `user/subject` pair
|
17
|
+
for the lifetime of the cache).
|
18
|
+
- Condition results
|
19
|
+
|
20
|
+
The correctness of these cached values depends on the correctness of the
|
21
|
+
cache-keys. We assume the objects in your domain have a `#id` method that
|
22
|
+
fully captures the notion of object identity. See [Cache keys](#cache-keys) for
|
23
|
+
details. All cache keys begin with `"/dp/"`.
|
24
|
+
|
25
|
+
Policies themselves cache the results of the abilities they compute.
|
26
|
+
|
27
|
+
Policies distinguish between facts based on the type of the fact:
|
28
|
+
|
29
|
+
- Boolean facts: implemented with `condition`.
|
30
|
+
- Abilities: implemented with `rule` blocks.
|
31
|
+
- Non-boolean facts: implemented by policy instance methods.
|
32
|
+
|
33
|
+
For example, consider a policy for countries:
|
34
|
+
|
35
|
+
```ruby
|
36
|
+
class CountryPolicy < DeclarativePolicy::Base
|
37
|
+
condition(:citizen) { @user.citizen_of?(country.country_code) }
|
38
|
+
condition(:eu_citizen, scope: :user) { @user.citizen_of?(*Unions::EU) }
|
39
|
+
condition(:eu_member, scope: :subject) { Unions::EU.include?(country.country_code) }
|
40
|
+
|
41
|
+
condition(:has_visa_waiver) { country.visa_waivers.any? { |c| @user.citizen_of?(c) } }
|
42
|
+
condition(:permanent_resident) { visa_category == :permanent }
|
43
|
+
condition(:has_work_visa) { visa_category == :work }
|
44
|
+
condition(:has_current_visa) { has_visa_waiver? || current_visa.present? }
|
45
|
+
condition(:has_business_visa) { has_visa_waiver? || has_work_visa? || visa_category == :business }
|
46
|
+
|
47
|
+
condition(:full_rights, score: 20) { citizen? || permanent_resident? }
|
48
|
+
condition(:banned) { country.banned_list.include?(@user) }
|
49
|
+
|
50
|
+
rule { eu_member & eu_citizen }.enable :freedom_of_movement
|
51
|
+
rule { full_rights | can?(:freedom_of_movement) }.enable :settle
|
52
|
+
rule { can?(:settle) | has_current_visa }.enable :enter_country
|
53
|
+
rule { can?(:settle) | has_business_visa }.enable :attend_meetings
|
54
|
+
rule { can?(:settle) | has_work_visa }.enable :work
|
55
|
+
rule { citizen }.enable :vote
|
56
|
+
rule { ~citizen & ~permanent_resident }.enable :apply_for_visa
|
57
|
+
rule { banned }.prevent :enter_country, :apply_for_visa
|
58
|
+
|
59
|
+
def current_visa
|
60
|
+
return @current_visa if defined?(@current_visa)
|
61
|
+
|
62
|
+
@current_visa = country.active_visas.find_by(applicant: @user)
|
63
|
+
end
|
64
|
+
|
65
|
+
def visa_category
|
66
|
+
current_visa&.category
|
67
|
+
end
|
68
|
+
|
69
|
+
def country
|
70
|
+
@subject
|
71
|
+
end
|
72
|
+
end
|
73
|
+
```
|
74
|
+
|
75
|
+
This is a reasonably realistic policy - there are a few pieces of state (the
|
76
|
+
country, the list of visa waiver agreements, the list of citizenships the user
|
77
|
+
holds, the kind of visa the user has, if they have one, the current list of
|
78
|
+
banned users), and these are combined to determine a range of abilities (whether
|
79
|
+
one can visit or live in or vote in a certain country). Importantly, these
|
80
|
+
pieces of information are re-used between abilities - the citizenship status is
|
81
|
+
relevant to all abilities, whereas the banned list is only considered on entry
|
82
|
+
and when applying for a new visa.
|
83
|
+
|
84
|
+
If we imagine that some of these operations are reasonably expensive (fetching
|
85
|
+
the current visa status, or checking the banned list, for example), then it
|
86
|
+
follows that we really care about avoiding re-computation of these facts. In the
|
87
|
+
policy above we can see a few strategies that are taken to avoid this:
|
88
|
+
|
89
|
+
- Conditions are re-used liberally.
|
90
|
+
- Non-boolean facts are cached at the policy level.
|
91
|
+
|
92
|
+
## Re-using conditions
|
93
|
+
|
94
|
+
Rules can and should re-use conditions as much as possible. Condition
|
95
|
+
observations are cached automatically, so referring to the same condition in
|
96
|
+
multiple rules is encouraged. Conditions can also refer to other conditions by
|
97
|
+
using the predicate methods that are created for them (see `full_rights`, which
|
98
|
+
refers to the `:citizen` condition as `citizen?`).
|
99
|
+
|
100
|
+
Note that referring to conditions inside other conditions can be DRY, but it
|
101
|
+
limits the ability of the library to optimize the steps (see
|
102
|
+
[optimization](./optimization.md)). For example in the `:has_current_visa`
|
103
|
+
condition, the sub-conditions will always be tested in the order
|
104
|
+
`has_visa_waiver` then `current_visa.present?`. It is recommended not to rely
|
105
|
+
heavily on this kind of abstraction.
|
106
|
+
|
107
|
+
## Re-using rules
|
108
|
+
|
109
|
+
Entire rule-sets can be re-used with `can?`. This is a form of logical
|
110
|
+
implication where a previous conclusion can be used in a further rule. Examples
|
111
|
+
of this here are `can?(:settle)` and `can?(:freedom_of_movement)`. This can
|
112
|
+
prevent having to repeat long groups of conditions in rule definitions. This
|
113
|
+
abstraction is transparent to the optimizer.
|
114
|
+
|
115
|
+
## Non-boolean values must be managed manually
|
116
|
+
|
117
|
+
The condition `has_current_visa` and the more specific
|
118
|
+
`has_{work,business}_visa` all refer to the same piece of state - the
|
119
|
+
`#current_visa`. Since this is not a boolean (but is here a database record with
|
120
|
+
a `#category` attribute), this cannot be a condition, but must be managed by the
|
121
|
+
policy itself.
|
122
|
+
|
123
|
+
The best approach here is to use normal Ruby methods and instance variables for
|
124
|
+
such values. The policy instances themselves are cached, so that any two
|
125
|
+
invocations of `DeclarativePolicy.policy_for(user, object)` with identical
|
126
|
+
`user` and `object` arguments will always return the same policy object. This
|
127
|
+
means instance variables stored on the policy will be available for the lifetime
|
128
|
+
of the cache.
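The `current_visa` helper in the policy above memoizes with `defined?` rather than `||=`. A minimal sketch of why that matters when the looked-up value can be `nil` (for example, a user with no visa). `MemoExample` and its methods are hypothetical names used only for illustration and are not part of the library:

```ruby
# Hypothetical class comparing two memoization styles for a lookup
# that may legitimately return nil.
class MemoExample
  attr_reader :lookups

  def initialize(&lookup)
    @lookup = lookup
    @lookups = 0
  end

  # Memoizes any result, including nil and false.
  def with_defined
    return @with_defined if defined?(@with_defined)

    @lookups += 1
    @with_defined = @lookup.call
  end

  # Re-runs the lookup whenever the stored result is nil or false.
  def with_or_equals
    @with_or_equals ||= begin
      @lookups += 1
      @lookup.call
    end
  end
end
```

With a lookup that returns `nil`, repeated calls to `with_defined` should perform the lookup once, while `with_or_equals` repeats it on every call.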
|
129
|
+
|
130
|
+
Methods can be used for the usual reasons of clarity (such as referring to the
|
131
|
+
`@subject` as `country`) and brevity (such as `visa_category`).
|
132
|
+
|
133
|
+
## Cache lifetime
|
134
|
+
|
135
|
+
The cache is provided by the user of the library, passing it to the
|
136
|
+
`.policy_for` method. For example:
|
137
|
+
|
138
|
+
```ruby
|
139
|
+
DeclarativePolicy.policy_for(user, country, cache: some_cache_value)
|
140
|
+
```
|
141
|
+
|
142
|
+
The object only needs to implement the following methods:
|
143
|
+
|
144
|
+
- `cache[key: String] -> Boolean?`: Fetch the cached value
|
145
|
+
- `cache.key?(key: String) -> Boolean`: Test if the key is cached
|
146
|
+
- `cache[key: String] = Boolean`: Cache a value
|
147
|
+
|
148
|
+
Obviously, a `Hash` will work just fine, but so will a wrapper around a
|
149
|
+
[`Concurrent::Map`](https://ruby-concurrency.github.io/concurrent-ruby/1.1.4/Concurrent/Map.html),
|
150
|
+
or even a map that delegates to Redis with a TTL for each key, so long as the
|
151
|
+
object supports these methods. Keys are never deleted by the library, and values
|
152
|
+
are only computed if the key is not cached, so it is up to the application code
|
153
|
+
to determine the life-time of each key.
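As an illustration of the interface described above, here is a minimal sketch of an in-memory cache that expires keys after a fixed time-to-live. `ExpiringCache`, its `ttl:` option and the injectable `clock:` are assumptions made for this example; they are not provided by the library.

```ruby
# Hypothetical in-memory cache with a per-key time-to-live (in seconds).
# It implements the three methods listed above, plus #delete.
class ExpiringCache
  Entry = Struct.new(:value, :expires_at)

  def initialize(ttl:, clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @ttl = ttl
    @clock = clock
    @entries = {}
  end

  # True only if the key is present and has not expired.
  def key?(key)
    entry = @entries[key]
    return false if entry.nil?
    return true if @clock.call < entry.expires_at

    @entries.delete(key)
    false
  end

  # Returns the cached value, or nil if the key is absent or expired.
  def [](key)
    key?(key) ? @entries[key].value : nil
  end

  def []=(key, value)
    @entries[key] = Entry.new(value, @clock.call + @ttl)
  end

  def delete(key)
    @entries.delete(key)&.value
  end
end
```

An instance could then be passed as the `cache:` argument to `.policy_for`. Because `false` is a valid cached value, `key?` is what distinguishes a cached `false` from a missing key.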
|
154
|
+
|
155
|
+
Clearly, cache-invalidation is a hard problem. At GitLab we share a single cache
|
156
|
+
object for each request - so any single request can freely request a permission
|
157
|
+
check multiple times (or even compute related abilities, such as
|
158
|
+
`:enter_country` and `:settle`) and know that no work is duplicated. This
|
159
|
+
allows developers to reason declaratively, and add permission checks where
|
160
|
+
needed, without worrying about performance.
|
161
|
+
|
162
|
+
## Cache sharing: scopes
|
163
|
+
|
164
|
+
Not all conditions are equally specific. The condition `citizen` refers to
|
165
|
+
both the user and the country, and so can only be used when checking both the
|
166
|
+
user and the country. We say that this is the `normal` scope.
|
167
|
+
|
168
|
+
This is not always true however. Sometimes a condition refers only to the user.
|
169
|
+
For example, above we have two conditions: `eu_citizen` and `eu_member`:
|
170
|
+
|
171
|
+
```ruby
|
172
|
+
condition(:eu_citizen, scope: :user) { @user.citizen_of?(*Unions::EU) }
|
173
|
+
condition(:eu_member, scope: :subject) { Unions::EU.include?(country.country_code) }
|
174
|
+
```
|
175
|
+
|
176
|
+
`eu_citizen` refers only to the user, and `eu_member` refers only to the
|
177
|
+
country.
|
178
|
+
|
179
|
+
If we have a user that wants to enter multiple countries on a grand European
|
180
|
+
tour, we could check this with:
|
181
|
+
|
182
|
+
```ruby
|
183
|
+
itinerary.countries.all? { |c| DeclarativePolicy.policy_for(user, c).allowed?(:enter_country) }
|
184
|
+
```
|
185
|
+
|
186
|
+
If `eu_citizen` were declared with the `normal` scope, then this would have a lot of cache
|
187
|
+
misses. By using the `:user` scope on `eu_citizen`, we only check EU citizenship
|
188
|
+
once.
|
189
|
+
|
190
|
+
Similarly for `eu_member`, if a team of football players want to visit a
|
191
|
+
country, then we could check this with:
|
192
|
+
|
193
|
+
```ruby
|
194
|
+
team.players.all? { |user| DeclarativePolicy.policy_for(user, country).allowed?(:enter_country) }
|
195
|
+
```
|
196
|
+
|
197
|
+
Again, by declaring `eu_member` as having the `:subject` scope, this ensures we
|
198
|
+
only check EU membership once, not once for each football player.
|
199
|
+
|
200
|
+
The last scope is `:global`, used when the condition is universally true:
|
201
|
+
|
202
|
+
```ruby
|
203
|
+
condition(:earth_destroyed_by_meteor, scope: :global) { !Planet::Earth.exists? }
|
204
|
+
|
205
|
+
rule { earth_destroyed_by_meteor }.prevent_all
|
206
|
+
```
|
207
|
+
|
208
|
+
In this case, it doesn't matter who the user is or even where they are going:
|
209
|
+
the condition will be computed once (per cache lifetime) for all combinations.
|
210
|
+
|
211
|
+
Because of the implications for sharing, the scope determines the
|
212
|
+
[`#score`](https://gitlab.com/gitlab-org/declarative-policy/blob/2ab9dbdf44fb37beb8d0f7c131742d47ae9ef5d0/lib/declarative_policy/condition.rb#L58-77) of
|
213
|
+
the condition (if not provided explicitly). The intention is to prefer values we
|
214
|
+
are more likely (all other things being equal) to re-use:
|
215
|
+
|
216
|
+
- Conditions we have already cached get a score of `0`.
|
217
|
+
- Conditions that are in the `:global` scope get a score of `2`.
|
218
|
+
- Conditions that are in the `:user` or `:subject` scopes get a score of `8`.
|
219
|
+
- Conditions that are in the `:normal` scope get a score of `16`.
|
220
|
+
|
221
|
+
Bear helper-methods in mind when defining scopes. While the instance level cache
|
222
|
+
for non-boolean values would not be shared, as long as the derived condition is
|
223
|
+
shared (for example by being in the `:user` scope, rather than the `:normal`
|
224
|
+
scope), helper-methods will also benefit from improved cache hits.
|
225
|
+
|
226
|
+
### Preferred scope
|
227
|
+
|
228
|
+
In the example situations above (a single user visiting many countries, or a
|
229
|
+
football team visiting one country), we know which is more likely to be useful,
|
230
|
+
the `:subject` or the `:user` scope. We can inform the optimizer of this
|
231
|
+
by setting `DeclarativePolicy.preferred_scope`.
|
232
|
+
|
233
|
+
To do this, check the abilities within a block bounded
|
234
|
+
by [`DeclarativePolicy.with_preferred_scope`](https://gitlab.com/gitlab-org/declarative-policy/blob/481c322a74f76c325d3ccab7f2f3cc2773e8168b/lib/declarative_policy/preferred_scope.rb#L7-13).
|
235
|
+
For example:
|
236
|
+
|
237
|
+
```ruby
|
238
|
+
cache = {}
|
239
|
+
|
240
|
+
# preferring to run user-scoped conditions
|
241
|
+
DeclarativePolicy.with_preferred_scope(:user) do
|
242
|
+
itinerary.countries.all? do |c|
|
243
|
+
DeclarativePolicy.policy_for(user, c, cache: cache).allowed?(:enter_country)
|
244
|
+
end
|
245
|
+
end
|
246
|
+
|
247
|
+
# preferring to run subject-scoped conditions
|
248
|
+
DeclarativePolicy.with_preferred_scope(:subject) do
|
249
|
+
team.players.all? do |player|
|
250
|
+
DeclarativePolicy.policy_for(player, country, cache: cache).allowed?(:enter_country)
|
251
|
+
end
|
252
|
+
end
|
253
|
+
|
254
|
+
```
|
255
|
+
|
256
|
+
When we set `preferred_scope`, this reduces the default score for conditions in
|
257
|
+
that scope, so that they are more likely to be executed first. Instead of `8`,
|
258
|
+
they are given a default score of `4`.
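The default scores described in this section can be summarized in a small sketch. `default_score` is a hypothetical helper written for this illustration. It mirrors the numbers listed above but is not the library's implementation, and it ignores explicitly provided scores.

```ruby
# Hypothetical helper mirroring the default scores listed in this document.
def default_score(scope, cached: false, preferred_scope: nil)
  # Anything already in the cache is free to re-use.
  return 0 if cached

  case scope
  when :global then 2
  # A :user or :subject condition is cheaper when its scope is preferred.
  when :user, :subject then scope == preferred_scope ? 4 : 8
  when :normal then 16
  else raise ArgumentError, "unknown scope: #{scope.inspect}"
  end
end
```

Under this sketch, setting the preferred scope to `:user` only changes the score of `:user` conditions; `:subject` conditions keep their default.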
|
259
|
+
|
260
|
+
## Cache keys
|
261
|
+
|
262
|
+
In order for an object to be cached, it should be able to identify itself
|
263
|
+
with a suitable cache key. A good cache key will identify an object, without
|
264
|
+
containing irrelevant information - a database `#id` is perfect, and this
|
265
|
+
library defaults to calling an `#id` method on objects, falling back to
|
266
|
+
`object_id`.
|
267
|
+
|
268
|
+
Relying on `object_id` is not recommended since otherwise equivalent objects
|
269
|
+
have different `object_id` values, so keying on `object_id` gives poor cache hit rates. All
|
270
|
+
policy subjects should implement `#id` for this reason. `ActiveRecord` models
|
271
|
+
with an `id` primary key do not need any extra configuration.
|
272
|
+
|
273
|
+
Please see: [`DeclarativePolicy::Cache`](https://gitlab.com/gitlab-org/declarative-policy/blob/master/lib/declarative_policy/cache.rb).
|
274
|
+
|
275
|
+
## Cache invalidation
|
276
|
+
|
277
|
+
Generally, cache invalidation is best avoided. It is very hard to get right, and
|
278
|
+
relying on it opens you up to subtle but pernicious bugs that are hard to
|
279
|
+
reproduce and debug.
|
280
|
+
|
281
|
+
The best strategy is to run all permission checks upfront, before mutating any
|
282
|
+
state that might change a permission computation. For instance, if you want to
|
283
|
+
make a user an administrator, then check for permission **before** assigning
|
284
|
+
administrator privileges.
|
285
|
+
|
286
|
+
However, it isn't always possible to avoid needing to mark certain parts of the
|
287
|
+
cached state as dirty (in need of re-computation). If this is needed, then you
|
288
|
+
can call the `DeclarativePolicy.invalidate(cache, keys)` method. This takes an
|
289
|
+
enumerable of dirty keys, and:
|
290
|
+
|
291
|
+
- removes the cached condition results from the cache
|
292
|
+
- marks the abilities that depend on those conditions as dirty, and in need of
|
293
|
+
re-computation.
|
294
|
+
|
295
|
+
The responsibility for determining which cache-keys are dirty falls on the
|
296
|
+
client. You could, for example, do this by observing which keys are added to the
|
297
|
+
cache (knowing that condition keys all start with `"/dp/condition/"`), or by
|
298
|
+
scanning the cache for keys that match a heuristic.
|
299
|
+
|
300
|
+
This method is the only place where the `#delete` method is called on the cache.
|
301
|
+
If you do not call `.invalidate`, there is no need for the cache to implement
|
302
|
+
`#delete`.
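One way to observe which keys are added, sketched under the assumption that the cache is a thin wrapper around a `Hash`. `KeyTrackingCache` is a hypothetical class for illustration; only the `"/dp/condition/"` prefix comes from the text above.

```ruby
require 'set'

# Hypothetical wrapper that remembers which condition keys were written,
# so they can later be passed to DeclarativePolicy.invalidate(cache, keys).
class KeyTrackingCache
  CONDITION_PREFIX = '/dp/condition/'

  attr_reader :condition_keys

  def initialize(store = {})
    @store = store
    @condition_keys = Set.new
  end

  def [](key)
    @store[key]
  end

  def key?(key)
    @store.key?(key)
  end

  def []=(key, value)
    # Only condition results are tracked; other entries are ignored.
    @condition_keys << key if key.start_with?(CONDITION_PREFIX)
    @store[key] = value
  end

  def delete(key)
    @condition_keys.delete(key)
    @store.delete(key)
  end
end
```

The client still has to decide which of the recorded keys are actually dirty; the wrapper only narrows the search to keys that hold condition results.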
|
data/doc/defining-policies.md
CHANGED
@@ -74,7 +74,7 @@ condition(:owns) { @subject.owner == @user }
|
|
74
74
|
condition(:has_access_to) { @subject.owner.trusts?(@user) }
|
75
75
|
condition(:old_enough_to_drive) { @user.age >= laws.minimum_age }
|
76
76
|
condition(:has_driving_license) { @user.driving_license&.valid? }
|
77
|
-
condition(:intoxicated, score: 5) { @user.blood_alcohol
|
77
|
+
condition(:intoxicated, score: 5) { @user.blood_alcohol > laws.max_blood_alcohol }
|
78
78
|
condition(:has_access_to, score: 3) { @subject.owner.trusts?(@user) }
|
79
79
|
```
|
80
80
|
|
@@ -108,8 +108,7 @@ Rules are conclusions we can draw based on the facts:
|
|
108
108
|
rule { owns }.enable :drive_vehicle
|
109
109
|
rule { has_access_to }.enable :drive_vehicle
|
110
110
|
rule { ~old_enough_to_drive }.prevent :drive_vehicle
|
111
|
-
rule { intoxicated }.prevent :drive_vehicle
|
112
|
-
rule { ~has_driving_license }.prevent :drive_vehicle
|
111
|
+
rule { intoxicated | ~has_driving_license }.prevent :drive_vehicle
|
113
112
|
```
|
114
113
|
|
115
114
|
Rules are combined such that each ability must be enabled at least once, and not
|
@@ -130,6 +129,33 @@ access the `@user` or `@subject`, or any methods on the policy instance. You
|
|
130
129
|
should not perform I/O in a rule. They exist solely to define the logical rules
|
131
130
|
of implication and combination between conditions.
|
132
131
|
|
132
|
+
The available operations inside a rule block are:
|
133
|
+
|
134
|
+
- Bare words to refer to conditions in the policy, or on any delegate.
|
135
|
+
For example `owns`. This is equivalent to `cond(:owns)`, but as a matter of
|
136
|
+
general style, bare words are preferred.
|
137
|
+
- `~` to negate any rule. For example `~owns`, or `~(intoxicated | banned)`.
|
138
|
+
- `&` or `all?` to combine rules such that all must succeed. For example:
|
139
|
+
`old_enough_to_drive & has_driving_license` or `all?(old_enough_to_drive, has_driving_license)`.
|
140
|
+
- `|` or `any?` to combine rules such that one must succeed. For example:
|
141
|
+
`intoxicated | banned` or `any?(intoxicated, banned)`.
|
142
|
+
- `can?` to refer to the result of evaluating an ability. For example,
|
143
|
+
`can?(:sell_vehicle)`.
|
144
|
+
- `delegate(:delegate_name, :condition_name)` to refer to a specific
|
145
|
+
condition on a named delegate. Use of this is rare, but can be used to
|
146
|
+
handle overrides. For example if a vehicle policy defines a delegate as
|
147
|
+
`delegate :registration`, then we could refer to that
|
148
|
+
as `rule { delegate(:registration, :valid) }`.
|
149
|
+
|
150
|
+
Note: Be careful not to confuse `DeclarativePolicy::Base.condition` with
|
151
|
+
`DeclarativePolicy::RuleDSL#cond`.
|
152
|
+
|
153
|
+
- `condition` constructs a condition from a name and a block. For example:
|
154
|
+
`condition(:adult) { @subject.age >= country.age_of_majority }`.
|
155
|
+
- `cond` constructs a rule which refers to a condition by name. For example:
|
156
|
+
`rule { cond(:adult) }.enable :vote`. Use of `cond` is rare - it is nicer to
|
157
|
+
use the bare word form: `rule { adult }.enable :vote`.
|
158
|
+
|
133
159
|
### Complex conditions
|
134
160
|
|
135
161
|
Conditions may be combined in the rule blocks:
|
data/doc/optimization.md
ADDED
@@ -0,0 +1,277 @@
|
|
1
|
+
# Optimization
|
2
|
+
|
3
|
+
This library cares a lot about performance, and includes features that
|
4
|
+
aim to limit the impact of permission checks on an application. In particular,
|
5
|
+
effort is made to ensure that repeated checks of the same permission are
|
6
|
+
efficient, aiming to eliminate repeated computation and unnecessary I/O.
|
7
|
+
|
8
|
+
The key observation: permission checks generally involve some facts
|
9
|
+
about the real world, and this involves (relatively expensive) I/O to compute.
|
10
|
+
These facts are then combined in some way to generate a judgment. Not all facts
|
11
|
+
are necessary to know in order to determine a judgment. The main aims of the
|
12
|
+
library:
|
13
|
+
|
14
|
+
- Avoid unnecessary work.
|
15
|
+
- If we must do work, do the least work possible.
|
16
|
+
|
17
|
+
The library enables you to define both how to compute these facts
|
18
|
+
(conditions), and how to combine them (rules), but the library is entirely
|
19
|
+
responsible for the scheduling of when to compute each fact.
|
20
|
+
|
21
|
+
## Making truth
|
22
|
+
|
23
|
+
This library is essentially a build-system for truth - you can think of it as
|
24
|
+
similar to [`make`](https://www.gnu.org/software/make/), but:
|
25
|
+
|
26
|
+
- Instead of `targets` there are `abilities`.
|
27
|
+
- Instead of `files`, we produce `boolean` values.
|
28
|
+
|
29
|
+
We have no notion of freshness - uncached conditions are always re-computed, but
|
30
|
+
just like `make`, we try to do the least work possible in order to evaluate the
|
31
|
+
given ability.
|
32
|
+
|
33
|
+
For the interested, this corresponds to
|
34
|
+
[`memo`](https://hackage.haskell.org/package/build-1.0/docs/src/Build.System.html#memo) in
|
35
|
+
the taxonomy of build systems (although the scheduler here is somewhat smarter
|
36
|
+
about the relative order of dependencies).
|
37
|
+
|
38
|
+
## Optimization is reducing computation of expensive I/O
|
39
|
+
|
40
|
+
In the context of this library, optimization refers to ways we can:
|
41
|
+
|
42
|
+
- Expose the smallest possible units of I/O to the scheduler.
|
43
|
+
- Never run a computation twice.
|
44
|
+
- Indicate to the scheduler which computations should be run first.
|
45
|
+
|
46
|
+
For example, if a policy defines the following rule:
|
47
|
+
|
48
|
+
```ruby
|
49
|
+
rule { fact_a & fact_b }.enable :some_ability
|
50
|
+
```
|
51
|
+
|
52
|
+
The core of the matter: if we know in advance that `fact_a == false`, then we do not need to compute
|
53
|
+
`fact_b`. Conversely, if we know in advance that `fact_b == false`, then we do
|
54
|
+
not need to run `fact_a`. The same goes for `fact_a | fact_b`: if either fact is known to be `true`, we do not need to compute the other.
|
55
|
+
|
56
|
+
In this case:
|
57
|
+
|
58
|
+
- The smallest possible units of I/O are `fact_a` and `fact_b`, and the library
|
59
|
+
is aware of them.
|
60
|
+
- The library uses the [cache](./caching.md) to avoid running a condition more
|
61
|
+
than once.
|
62
|
+
- It does not matter which order we run these conditions in - the scheduler is
|
63
|
+
free to re-order them if it thinks that `fact_b` is somehow more efficient to
|
64
|
+
compute than `fact_a`.
|
65
|
+
|
66
|
+
## The scheduling logic
|
67
|
+
|
68
|
+
The problem each permission check seeks to solve is determining the truth value
|
69
|
+
of a proposition of the form:
|
70
|
+
|
71
|
+
```pseudo
|
72
|
+
any? enabling-conditions && not (any? preventing-conditions)
|
73
|
+
```
|
74
|
+
|
75
|
+
If `[a, b, c]` are enabling conditions, and `[x, y, z]` are preventing
|
76
|
+
conditions, then this could be expressed as:
|
77
|
+
|
78
|
+
```ruby
|
79
|
+
(a | b | c) & ~x & ~y & ~z
|
80
|
+
```
|
81
|
+
|
82
|
+
But the [scheduler](../lib/declarative_policy/runner.rb) represents this
|
83
|
+
as a flat list of rules - conditions and their outcomes:
|
84
|
+
|
85
|
+
```pseudo
|
86
|
+
[
|
87
|
+
(a, :enable),
|
88
|
+
(b, :enable),
|
89
|
+
(c, :enable),
|
90
|
+
(x, :prevent),
|
91
|
+
(y, :prevent),
|
92
|
+
(z, :prevent)
|
93
|
+
]
|
94
|
+
```
|
95
|
+
|
96
|
+
They aren't necessarily run in this order, however. Instead, we try to order
|
97
|
+
the list to minimize unnecessary work.
|
98
|
+
|
99
|
+
The
|
100
|
+
[logic](https://gitlab.com/gitlab-org/declarative-policy/blob/659ac0525773a76cf8712d47b3c2dadd03b758c9/lib/declarative_policy/runner.rb#L80-112)
|
101
|
+
to process this list is (in pseudo-code):
|
102
|
+
|
103
|
+
```pseudo
|
104
|
+
while any-enable-rule-remains?(rules)
|
105
|
+
rule := pop-cheapest-remaining-rule(rules)
|
106
|
+
fact := observe-io-and-update-cache rule.condition
|
107
|
+
|
108
|
+
if fact and rule.prevents?
|
109
|
+
return prevented
|
110
|
+
else if fact and rule.enables?
|
111
|
+
skip-all-other-enabling-rules!
|
112
|
+
enabled? := true
|
113
|
+
|
114
|
+
if enabled?
|
115
|
+
return enabled
|
116
|
+
else
|
117
|
+
return prevented
|
118
|
+
```
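A runnable sketch of one reading of this loop, assuming that prevent rules are still checked after an ability has been enabled and that the loop stops early once nothing left can enable it. `Step` and `allowed?` are hypothetical names for this illustration and do not reproduce the library's `Runner`.

```ruby
# Hypothetical, simplified scheduler. Each step has a name (used as its
# cache key), an action (:enable or :prevent), a score and a check lambda.
Step = Struct.new(:name, :action, :score, :check)

def allowed?(steps, cache = {})
  remaining = steps.dup
  enabled = false

  until remaining.empty?
    # Nothing left can enable the ability, so stop early.
    break if !enabled && remaining.none? { |s| s.action == :enable }

    # Re-score on every iteration: a cached observation costs nothing.
    step = remaining.min_by { |s| cache.key?(s.name) ? 0 : s.score }
    remaining.delete(step)

    cache[step.name] = step.check.call unless cache.key?(step.name)
    next unless cache[step.name]

    return false if step.action == :prevent

    # One passing enable rule is enough; skip the other enabling rules.
    enabled = true
    remaining.reject! { |s| s.action == :enable }
  end

  enabled
end

# Usage: two failing enable rules and one prevent rule.
observed = []
probe = ->(name, result) { -> { observed << name; result } }

steps = [
  Step.new(:a, :enable, 1, probe.call(:a, false)),
  Step.new(:b, :enable, 2, probe.call(:b, false)),
  Step.new(:not_c, :prevent, 3, probe.call(:not_c, false))
]

allowed = allowed?(steps)
```

In this usage, both enabling rules fail, so the most expensive step is never observed.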
|
119
|
+
|
120
|
+
The process for ordering rules is that each condition has a score, and we prefer
|
121
|
+
the rules with the lowest `score`. Cached values have a score of `0`. Composite
|
122
|
+
conditions (such as `a | b | c`) have a score that is the sum of the scores of
|
123
|
+
their components.
|
124
|
+
|
125
|
+
The evaluation of one rule results in updating the cache, so other rules might
|
126
|
+
become cheaper, during policy evaluation. To take this into account, we re-score
|
127
|
+
the set of rules on each iteration of the main loop.
|
128
|
+
|
129
|
+
## Consequences for the policy-writer
|
130
|
+
|
131
|
+
While interesting in its own right, this has some practical consequences for the
|
132
|
+
policy writer:
|
133
|
+
|
134
|
+
### Flat is better than nested
|
135
|
+
|
136
|
+
The scheduler can do a better job of arranging work into the smallest possible
|
137
|
+
chunks if the definitions are as flat as possible, meaning this:
|
138
|
+
|
139
|
+
```ruby
|
140
|
+
rule { condition_a }.enable :some_ability
|
141
|
+
rule { condition_b }.prevent :some_ability
|
142
|
+
```
|
143
|
+
|
144
|
+
Is easier to optimise than:
|
145
|
+
|
146
|
+
```ruby
|
147
|
+
rule { condition_a & ~condition_b }.enable :some_ability
|
148
|
+
```
|
149
|
+
|
150
|
+
We do attempt to flatten and de-nest logical expressions, but it is not always
|
151
|
+
possible to raise all expressions to the top level. All things being
|
152
|
+
equal, we recommend using the declarative style.
|
153
|
+
|
154
|
+
#### An example of sub-optimal scheduling
|
155
|
+
|
156
|
+
The scheduler is only able to re-order conditions that can be flattened out to
|
157
|
+
the top level. For example, given the following definition:
|
158
|
+
|
159
|
+
```ruby
|
160
|
+
condition(:a, score: 1) { ... }
|
161
|
+
condition(:b, score: 2) { ... }
|
162
|
+
condition(:c, score: 3) { ... }
|
163
|
+
|
164
|
+
rule { a & c }.enable :some_ability
|
165
|
+
rule { b & c }.enable :some_ability
|
166
|
+
```
|
167
|
+
|
168
|
+
The conditions are evaluated in the following order:
|
169
|
+
|
170
|
+
- `a & c` (score = 4):
|
171
|
+
- `a` (score = 1)
|
172
|
+
- `c` (score = 3)
|
173
|
+
- `b & c` (score = 2):
|
174
|
+
- `c` (score = 0 [cached])
|
175
|
+
- `b` (score = 2)
|
176
|
+
|
177
|
+
If instead this were three top level rules:
|
178
|
+
|
179
|
+
```ruby
|
180
|
+
rule { a }.enable :some_ability
|
181
|
+
rule { b }.enable :some_ability
|
182
|
+
rule { ~c }.prevent :some_ability
|
183
|
+
```
|
184
|
+
|
185
|
+
Then this would be evaluated as:
|
186
|
+
|
187
|
+
- `a` (score = 1)
|
188
|
+
- `b` (score = 2)
|
189
|
+
- `c` (score = 3)
|
190
|
+
|
191
|
+
If `a` and `b` fail, then `c` is never evaluated, saving the most
|
192
|
+
expensive call.
|
193
|
+
|
194
|
+
The total evaluated costs for each arrangement are:
|
195
|
+
|
196
|
+
| Failing conditions | Nested cost | Flat cost |
|
197
|
+
|--------------------|-----------------|---------------|
|
198
|
+
| none | 4 `(a, c)` | 4 `(a, c)` |
|
199
|
+
| all | 3 `(a, b)` | 3 `(a, b)` |
|
200
|
+
| `a` | 6 `(a, b, c)` | 6 `(a, b, c)` |
|
201
|
+
| `b` | 4 `(a, c)` | 4 `(a, c)` |
|
202
|
+
| `c` | 4 `(a, c, c=0)` | 4 `(a, c)` |
|
203
|
+
| `a` and `b` | 4 `(a, c, c=0)` | 3 `(a, b)` |
|
204
|
+
| `a` and `c` | 6 `(a, b, c)` | 6 `(a, b, c)` |
|
205
|
+
| `b` and `c` | 4 `(a, c, c=0)` | 4 `(a, c)` |
|
206
|
+
|
207
|
+
While the overall costs for all arrangements are very similar,
|
208
|
+
the flat representation is strictly superior, and does not even need to
|
209
|
+
rely on the cache for this behavior.
|
210
|
+
|
211
|
+
### Getting the scope right matters
|
212
|
+
|
213
|
+
By default, the outcome of each rule is cached against a key like
|
214
|
+
`(rule.condition.key, user.key, subject.key)`. (For more information, read
|
215
|
+
[caching](./caching.md).) This makes sense for some things like:
|
216
|
+
|
217
|
+
```ruby
|
218
|
+
condition(:owns_vehicle) { @user == @subject.owner }
|
219
|
+
```
|
220
|
+
|
221
|
+
In this case, the result depends on both the `@user` and the `@subject`. Not all
|
222
|
+
conditions are like that, though! The following condition only refers to the
|
223
|
+
subject:
|
224
|
+
|
225
|
+
```ruby
|
226
|
+
condition(:roadworthy) { @subject.warrant_of_fitness.current? }
|
227
|
+
```
|
228
|
+
|
229
|
+
If we cached this against `(user_a, car_a)` and then tested it
|
230
|
+
against `(user_b, car_a)` it would not match, and we would have to re-compute
|
231
|
+
the condition, even though the road-worthiness of a vehicle does not depend on
|
232
|
+
the driver. See [caching](./caching.md) for more discussion on scopes.
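To make the effect of scope on cache hits concrete, here is a sketch of how a scope could decide which identifiers go into a condition's cache key. `condition_cache_key` and the key layout are assumptions for illustration; the library's real key format may differ.

```ruby
# Hypothetical key builder: the scope decides which identifiers are
# included, and therefore which checks can share a cached result.
def condition_cache_key(name, scope, user_id, subject_id)
  parts =
    case scope
    when :normal then [user_id, subject_id]
    when :user then [user_id]
    when :subject then [subject_id]
    when :global then []
    else raise ArgumentError, "unknown scope: #{scope.inspect}"
    end

  ['/dp/condition', name, *parts].join('/')
end

# With the :subject scope, two different users checking the same car
# produce the same key, so the second check can be served from the cache.
key_for_user_a = condition_cache_key(:roadworthy, :subject, 'user_a', 'car_a')
key_for_user_b = condition_cache_key(:roadworthy, :subject, 'user_b', 'car_a')
```

With the `:normal` scope the two keys would differ, which is the cache miss described above.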
|
233
|
+
|
234
|
+
Because more general conditions are more sharable, all things being equal, it is
|
235
|
+
better to evaluate a condition that might be shared later, rather than one that
|
236
|
+
is less likely to be shared. For this reason, when we sort the rules,
|
237
|
+
we prefer ones with more general scopes to more specific ones.
|
238
|
+
|
239
|
+
### Getting the score right matters
|
240
|
+
|
241
|
+
Each condition has a `score`, which is an abstract weight. By default this is
|
242
|
+
determined by the scope.
|
243
|
+
|
244
|
+
However, if you know that a condition is very expensive to run, then it makes sense
|
245
|
+
to give it a higher score, meaning it's only evaluated if we really need
|
246
|
+
to. On the other hand, if a condition is very likely to be determinative, then
|
247
|
+
giving it a lower score would ensure we test it first.
|
248
|
+
|
249
|
+
For example, take two conditions, one which queries the local DB, and one
|
250
|
+
which makes an external API call. If they are otherwise equivalent, calling
|
251
|
+
the database one first is likely to be more efficient, as it might save us needing
|
252
|
+
to make the external API call. Conditions that are
|
253
|
+
[pure](https://en.wikipedia.org/wiki/Pure_function) can even be given a value of
|
254
|
+
`0`, as no I/O is required to compute them.
|
255
|
+
|
256
|
+
```ruby
|
257
|
+
condition(:local_db) { @subject.related_object.present? }
|
258
|
+
condition(:pure, score: 0) { @subject.some_attribute? }
|
259
|
+
condition(:external_api, score: API_SCORE) { ExternalService.get(@subject.id).ok? }
|
260
|
+
|
261
|
+
# these are run in the order: pure, local_db, external_api
|
262
|
+
rule { external_api & pure & local_db }.enable :some_ability
|
263
|
+
```
|
264
|
+
|
265
|
+
The other consideration is the likelihood that a condition is determinative. For
|
266
|
+
example, if `condition_a` is true 80% of the time, and `condition_b` is true
|
267
|
+
20% of the time, then we should prefer to run `condition_a` first if these conditions
|
268
|
+
enable an ability (because 80% of the time we don't need to run `condition_b`).
|
269
|
+
But if they prevent an ability, then we would prefer to run `condition_b` first,
|
270
|
+
because again, 80% of the time we can skip `condition_a`. This consideration is
|
271
|
+
more subtle. It requires knowing both the distribution of the condition, and
|
272
|
+
the consequence of its outcome, but this can be used to further optimize the
|
273
|
+
order of evaluation by marking some conditions as more likely to affect the
|
274
|
+
outcome.
|
275
|
+
|
276
|
+
All things being equal, we prefer to run prevent rules, because they have this
|
277
|
+
property - they are more likely to save extra work.
|