@fluidframework/merge-tree 2.0.0-internal.1.1.0 → 2.0.0-internal.1.2.0.93071
- package/dist/client.d.ts.map +1 -1
- package/dist/client.js +11 -0
- package/dist/client.js.map +1 -1
- package/dist/mergeTree.d.ts.map +1 -1
- package/dist/mergeTree.js.map +1 -1
- package/dist/snapshotChunks.d.ts +2 -1
- package/dist/snapshotChunks.d.ts.map +1 -1
- package/dist/snapshotChunks.js.map +1 -1
- package/dist/textSegment.d.ts +1 -1
- package/dist/textSegment.js.map +1 -1
- package/{DEV.md → docs/DEV.md} +0 -0
- package/docs/Obliterate.md +639 -0
- package/{REFERENCEPOSITIONS.md → docs/REFERENCEPOSITIONS.md} +0 -0
- package/lib/client.d.ts.map +1 -1
- package/lib/client.js +11 -0
- package/lib/client.js.map +1 -1
- package/lib/mergeTree.d.ts.map +1 -1
- package/lib/mergeTree.js.map +1 -1
- package/lib/snapshotChunks.d.ts +2 -1
- package/lib/snapshotChunks.d.ts.map +1 -1
- package/lib/snapshotChunks.js.map +1 -1
- package/lib/textSegment.d.ts +1 -1
- package/lib/textSegment.js.map +1 -1
- package/package.json +14 -14
- package/src/client.ts +12 -0
- package/src/mergeTree.ts +1 -1
- package/src/snapshotChunks.ts +2 -1
- package/src/textSegment.ts +1 -1
|
@@ -0,0 +1,639 @@
|
|
|
1
|
+
# Merge Tree Obliterate
|
|
2
|
+
|
|
3
|
+
This document covers motivation, spec, and design for the upcoming "obliterate" feature of merge-tree.
|
|
4
|
+
|
|
5
|
+
## Spec
|
|
6
|
+
|
|
7
|
+
A concise description of merge-tree's current merge conflict resolution strategy is as follows:
|
|
8
|
+
|
|
9
|
+
- Insertion of a text segment only conflicts with other insertions at the same location.
|
|
10
|
+
The conflict is resolved by inserting the segment added later nearer to the start of the string.
|
|
11
|
+
For example, from an initial state of "abc", if the operations [insert "hi " at 0] from client 1
|
|
12
|
+
and [insert "bye " at 0] from client 2 are sequenced in that order, the resulting state is "bye hi abc".
|
|
13
|
+
- Range operations (delete, annotate) apply to the range at the time the operation was issued.
|
|
14
|
+
Specifically, insertion of a segment into a range that is concurrently deleted or annotated
|
|
15
|
+
will not result in that inserted segment being deleted or annotated. For example, from an initial state "012",
|
|
16
|
+
if client 1 issues [delete the range [1, 3)] and client 2 concurrently issues [insert "hi" at index 2 (i.e. between "1" and "2")],
|
|
17
|
+
the resulting text is "0hi".
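The two rules above can be replayed in a toy model. The following is a minimal sketch, assuming a hypothetical one-character-per-segment list; `Seg`, `Perspective`, `insert`, and `remove` are illustrative names, not merge-tree's real API. Positions are resolved from the issuing client's perspective, which is all the two examples need.

```typescript
// Hypothetical one-character-per-segment model; not merge-tree's real data structures.
interface Seg {
    char: string;
    seq: number; // sequence number of the insert
    clientId: number; // client that inserted the character
    removedSeq?: number; // sequence number of the first remove, if any
    removedClientId?: number;
}

interface Perspective {
    refSeq: number;
    clientId: number;
}

// A character is visible when the viewer has seen its insert but not its removal.
function isVisible(seg: Seg, p: Perspective): boolean {
    const inserted = seg.seq <= p.refSeq || seg.clientId === p.clientId;
    const removed =
        seg.removedSeq !== undefined &&
        (seg.removedSeq <= p.refSeq || seg.removedClientId === p.clientId);
    return inserted && !removed;
}

function fromText(text: string): Seg[] {
    return text.split("").map((char) => ({ char, seq: 0, clientId: 0 }));
}

function toText(segs: Seg[]): string {
    return segs.filter((s) => s.removedSeq === undefined).map((s) => s.char).join("");
}

// Insert at `pos` as seen from `p`. Stopping at the first matching slot places
// later-sequenced concurrent inserts nearer the start of the string.
function insert(segs: Seg[], text: string, pos: number, seq: number, p: Perspective): void {
    let seen = 0;
    let at = 0;
    for (const seg of segs) {
        if (seen === pos) break;
        if (isVisible(seg, p)) seen++;
        at++;
    }
    const added: Seg[] = text.split("").map((char) => ({ char, seq, clientId: p.clientId }));
    segs.splice(at, 0, ...added);
}

// Remove only the characters the issuer could see inside [start, end).
function remove(segs: Seg[], start: number, end: number, seq: number, p: Perspective): void {
    let seen = 0;
    for (const seg of segs) {
        if (!isVisible(seg, p)) continue;
        if (seen >= start && seen < end && seg.removedSeq === undefined) {
            seg.removedSeq = seq;
            seg.removedClientId = p.clientId;
        }
        seen++;
    }
}

const tie = fromText("abc");
insert(tie, "hi ", 0, 1, { refSeq: 0, clientId: 1 });
insert(tie, "bye ", 0, 2, { refSeq: 0, clientId: 2 });
const tieResult = toText(tie); // "bye hi abc"

const range = fromText("012");
remove(range, 1, 3, 1, { refSeq: 0, clientId: 1 });
insert(range, "hi", 2, 2, { refSeq: 0, clientId: 2 });
const rangeResult = toText(range); // "0hi"
```

The concurrently inserted "hi" survives because the remove only marks characters that were visible to client 1 when it issued the op.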
|
|
18
|
+
|
|
19
|
+
The merge outcomes for ranges are easy to understand, but not always desirable.
|
|
20
|
+
Oftentimes, when consumers want to work with ranges, they may want their operation to apply to concurrently inserted segments.
|
|
21
|
+
In the example above, these semantics would look like so:
|
|
22
|
+
|
|
23
|
+
```
|
|
24
|
+
// Initial state at seq 0: "012"
|
|
25
|
+
{ seq: 1, refSeq: 0, clientId: 1, op: <insert "hi" at index 2> }
|
|
26
|
+
{ seq: 2, refSeq: 0, clientId: 2, op: <delete the range [1, 3)> }
|
|
27
|
+
// final desired state: "0"
|
|
28
|
+
```
|
|
29
|
+
|
|
30
|
+
```
|
|
31
|
+
// Initial state at seq 0: "012"
|
|
32
|
+
{ seq: 1, refSeq: 0, clientId: 2, op: <delete the range [1, 3)> }
|
|
33
|
+
{ seq: 2, refSeq: 0, clientId: 1, op: <insert "hi" at index 2> }
|
|
34
|
+
// final desired state: "0"
|
|
35
|
+
```
|
|
36
|
+
|
|
37
|
+
A `SharedString` feature request for a removal operation with these semantics dubbed them "obliterate".
|
|
38
|
+
|
|
39
|
+
At an implementation level, these semantics can be viewed in two parts:
|
|
40
|
+
- The range specification is resolved at the time the op is sequenced
|
|
41
|
+
- Any subsequent segments inserted into that range concurrently should also be removed
|
|
42
|
+
|
|
43
|
+
The first clause handles concurrent inserts before the removal is sequenced, and the second clause handles concurrent inserts after the removal is sequenced.
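A minimal sketch of the two clauses, assuming the same kind of hypothetical one-character-per-segment model (all names are illustrative, not merge-tree's API). The neighbor check in `insert` is a deliberate simplification: it only inspects the two immediately adjacent segments and only handles inserts strictly inside the range.

```typescript
// Hypothetical one-character-per-segment model of obliterate.
interface Seg {
    char: string;
    seq: number;
    clientId: number;
    movedSeq?: number; // seq of the obliterate that removed this character
    movedClientId?: number;
}

interface Perspective {
    refSeq: number;
    clientId: number;
}

function isVisible(seg: Seg, p: Perspective): boolean {
    const inserted = seg.seq <= p.refSeq || seg.clientId === p.clientId;
    const moved =
        seg.movedSeq !== undefined &&
        (seg.movedSeq <= p.refSeq || seg.movedClientId === p.clientId);
    return inserted && !moved;
}

function fromText(text: string): Seg[] {
    return text.split("").map((char) => ({ char, seq: 0, clientId: 0 }));
}

function toText(segs: Seg[]): string {
    return segs.filter((s) => s.movedSeq === undefined).map((s) => s.char).join("");
}

// Clause 1: endpoints are located in the issuer's view, but every live segment
// between them is marked, including inserts the issuer had not yet seen.
function obliterate(segs: Seg[], start: number, end: number, seq: number, p: Perspective): void {
    let seen = 0;
    let first = -1;
    let last = -1;
    segs.forEach((seg, i) => {
        if (!isVisible(seg, p)) return;
        if (seen >= start && seen < end) {
            if (first < 0) first = i;
            last = i;
        }
        seen++;
    });
    segs.forEach((seg, i) => {
        if (i >= first && i <= last && seg.movedSeq === undefined) {
            seg.movedSeq = seq;
            seg.movedClientId = p.clientId;
        }
    });
}

// Clause 2: an insert that lands between two segments removed by the same
// obliterate, which the inserting client had not yet seen, is removed on arrival.
function insert(segs: Seg[], text: string, pos: number, seq: number, p: Perspective): void {
    let seen = 0;
    let at = 0;
    for (const seg of segs) {
        if (seen === pos) break;
        if (isVisible(seg, p)) seen++;
        at++;
    }
    const prev = at > 0 ? segs[at - 1] : undefined;
    const next = at < segs.length ? segs[at] : undefined;
    let inheritedSeq: number | undefined;
    let inheritedClient: number | undefined;
    if (
        prev !== undefined && next !== undefined &&
        prev.movedSeq !== undefined && prev.movedSeq === next.movedSeq &&
        prev.movedSeq > p.refSeq
    ) {
        inheritedSeq = prev.movedSeq;
        inheritedClient = prev.movedClientId;
    }
    const added: Seg[] = text.split("").map((char) => ({
        char, seq, clientId: p.clientId, movedSeq: inheritedSeq, movedClientId: inheritedClient,
    }));
    segs.splice(at, 0, ...added);
}

// Insert sequenced first, obliterate second.
const insertFirst = fromText("012");
insert(insertFirst, "hi", 2, 1, { refSeq: 0, clientId: 1 });
obliterate(insertFirst, 1, 3, 2, { refSeq: 0, clientId: 2 });
const insertFirstResult = toText(insertFirst); // "0"

// Obliterate sequenced first, insert second.
const obliterateFirst = fromText("012");
obliterate(obliterateFirst, 1, 3, 1, { refSeq: 0, clientId: 2 });
insert(obliterateFirst, "hi", 2, 2, { refSeq: 0, clientId: 1 });
const obliterateFirstResult = toText(obliterateFirst); // "0"
```

Both orderings converge on "0", matching the desired final states in the examples above.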
|
|
44
|
+
|
|
45
|
+
However, there is a way to view obliterate's semantics as a special case of a "move" operation,
|
|
46
|
+
which preserves content identity such that concurrently inserted segments will be inserted to the range at its current location.
|
|
47
|
+
A main motivator here from the app perspective might be the idea that if user 1 cuts and pastes an entire paragraph to a different section of the document
|
|
48
|
+
while user 2 edits it, the desired merge outcome would likely be for user 2's edit to apply to the paragraph in its new location.
|
|
49
|
+
Roughly, anywhere an application would want obliterate merge semantics on user delete of some content,
|
|
50
|
+
the same application would want move semantics if the user instead cut and pasted the content somewhere else.
|
|
51
|
+
|
|
52
|
+
There have historically been feature requests for move semantics inside merge-tree (for example [issue 8518](https://github.com/microsoft/FluidFramework/issues/8518)),
|
|
53
|
+
so it makes sense to implement obliterate in a way that can be extended to cover move semantics in the future.
|
|
54
|
+
|
|
55
|
+
For that reason, naming choices of fields and semantics for the remainder of the document will be written in terms of obliterate being the special case
|
|
56
|
+
"move this range out of existence".
|
|
57
|
+
This should alleviate any back-compat issues if/when we do decide to implement move (esp. fields that end up in ops or snapshots).
|
|
58
|
+
The current proposal is to use the runtime value "null" to represent "out of existence", but this choice is flexible.
|
|
59
|
+
In prose, for terseness that operation will still be called obliterate.
|
|
60
|
+
After describing obliterate's design, this document [digs into how the design can be extended to work for move](#move).
|
|
61
|
+
|
|
62
|
+
Notice that the above examples always insert text at positions strictly inside the removed range.
|
|
63
|
+
If the insert operation was instead before the "1" or after the "2", one can imagine different applications wanting different behavior:
|
|
64
|
+
either the obliterated region should expand to include that text, or it should not.
|
|
65
|
+
This topic will be covered in the [endpoint behavior](#endpoint-behavior) section,
|
|
66
|
+
but for eventual consistency strategy discussion one should assume that the design should generally support both options
|
|
67
|
+
(and leave it up to merge-tree to restrict degrees of freedom as it sees fit).
|
|
68
|
+
|
|
69
|
+
## Eventual Consistency Strategy
|
|
70
|
+
|
|
71
|
+
This section is focused on how one could implement the "obliterate" semantics inside merge tree in an eventually consistent fashion.
|
|
72
|
+
This will constitute the bulk of the complexity of the feature.
|
|
73
|
+
Since obliterate is generally a "different kind of remove," there may be a nice abstraction to introduce at the code level to generalize
|
|
74
|
+
removal information. However, to defer such niceties, this design document will assume fields are inlined and focus on
|
|
75
|
+
the strategy for ensuring eventual consistency. If such an abstraction is introduced, ideally it would enable better "pay-to-play" of
|
|
76
|
+
common code paths based on merge-tree feature usage.
|
|
77
|
+
As an example, `BaseSegment.split()` needs to copy segment properties to the split segment.
|
|
78
|
+
Consequently, new properties added to segments cause `split()` to unnecessarily copy undefined values.
|
|
79
|
+
|
|
80
|
+
There are a few aspects of merge tree's bookkeeping and general feature set that require consideration when designing new op semantics:
|
|
81
|
+
|
|
82
|
+
- Any changes to direct fields of tree nodes themselves (either new data or changes to bookkeeping of existing data)
|
|
83
|
+
- How the feature interacts with an increasing collab window and zamboni
|
|
84
|
+
- Impact on the partial lengths scheme
|
|
85
|
+
- Bookkeeping and handling of overlapping removals (note some may be obliterates and some may not be)
|
|
86
|
+
- Reconnection
|
|
87
|
+
- Snapshotting impact
|
|
88
|
+
|
|
89
|
+
We'll first present an overview of a potential scheme for implementing the obliterate op, then comment on these aspects.
|
|
90
|
+
|
|
91
|
+
### High-level bookkeeping changes
|
|
92
|
+
|
|
93
|
+
Segments will be augmented with `movedSeq` and `localMovedSeq` fields which generally align with the semantics of `seq`, `localSeq`, `removedSeq`, and `localRemovedSeq`.
|
|
94
|
+
When segments are moved and not just obliterated, they will also contain a reference to the destination segment.
|
|
95
|
+
This may look as follows:
|
|
96
|
+
|
|
97
|
+
```typescript
|
|
98
|
+
/**
|
|
99
|
+
* Tracks information about when and where this segment was moved to.
|
|
100
|
+
* @example - Suppose a merge tree had 3 TextSegments "X", "A", and "B", and
|
|
101
|
+
* received the operation `move({ start: 0, end: 1 }, { dest: 3 }, { seq: 30 })` (moving the "X"
|
|
102
|
+
* after the "A" and the "B").
|
|
103
|
+
* After processing this operation, it would have the segments `[<moved "X" tombstone>, "A", "B", "X"]`.
|
|
104
|
+
* The moved "X" tombstone segment would have the following IMoveInfo: `{ movedSeq: 30, moveDst: <reference to living "X" segment>}`
|
|
105
|
+
*/
|
|
106
|
+
export interface IMoveInfo {
|
|
107
|
+
/**
|
|
108
|
+
* Local seq at which this segment was moved if the move is yet-to-be acked. Only set on the tombstone "source" segment of the move.
|
|
109
|
+
*/
|
|
110
|
+
localMovedSeq?: number;
|
|
111
|
+
/**
|
|
112
|
+
* Seq at which this segment was moved. Only set on the tombstone "source" segment of the move.
|
|
113
|
+
*/
|
|
114
|
+
movedSeq: number;
|
|
115
|
+
/**
|
|
116
|
+
* A reference to the inserted destination segment corresponding to this segment's move.
|
|
117
|
+
* If undefined, the move was an obliterate.
|
|
118
|
+
*/
|
|
119
|
+
moveDst?: ReferencePosition;
|
|
120
|
+
|
|
121
|
+
/**
|
|
122
|
+
* List of client IDs that have moved this segment.
|
|
123
|
+
* The client that actually moved the segment (i.e. whose move op was sequenced first) is stored as the first
|
|
124
|
+
* client in this list. Other clients in the list have all issued concurrent ops to move the segment.
|
|
125
|
+
*/
|
|
126
|
+
movedClientIds: number[];
|
|
127
|
+
}
|
|
128
|
+
|
|
129
|
+
export interface ISegment extends Partial<IRemovalInfo>, Partial<IMoveInfo> {
|
|
130
|
+
// ...
|
|
131
|
+
}
|
|
132
|
+
```
|
|
133
|
+
|
|
134
|
+
The `moveDst` reference position functions as a redirection pointer when another client attempts to concurrently insert into the moved range: the usual approach
|
|
135
|
+
for locating a node at some `{ pos, refSeq, clientId }` applies, and if the resulting segment has been moved, one can follow the trail of moves to find the segment's
|
|
136
|
+
current location.
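Following the trail of moves might look like the sketch below. It assumes a simplified shape in which `moveDst` points directly at the destination segment rather than being a `ReferencePosition`; `MovedSegment` and `resolveCurrent` are hypothetical names introduced only for illustration.

```typescript
// Hypothetical shape: `moveDst` is simplified to a direct pointer at the destination.
interface MovedSegment {
    text: string;
    movedSeq?: number;
    moveDst?: MovedSegment; // undefined on a moved segment means "obliterated"
}

// Follow the trail of moves from a tombstone to the segment's current location.
// Returns undefined when the content was moved out of existence.
function resolveCurrent(start: MovedSegment): MovedSegment | undefined {
    let current: MovedSegment = start;
    while (current.movedSeq !== undefined) {
        if (current.moveDst === undefined) {
            return undefined;
        }
        current = current.moveDst;
    }
    return current;
}

// "X" was moved at seq 30, and its new copy was moved again at seq 40.
const livingX: MovedSegment = { text: "X" };
const secondTombstone: MovedSegment = { text: "X", movedSeq: 40, moveDst: livingX };
const firstTombstone: MovedSegment = { text: "X", movedSeq: 30, moveDst: secondTombstone };
const obliteratedY: MovedSegment = { text: "Y", movedSeq: 35 };
```

An insert that resolves to `firstTombstone` would be redirected to `livingX`, while one that resolves to `obliteratedY` has no destination and would itself be removed.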
|
|
137
|
+
|
|
138
|
+
Note that `movedSeq` and `localMovedSeq` act very similarly to `removedSeq` and `localRemovedSeq` when considering the length of a segment at a given
|
|
139
|
+
perspective: if the perspective is from after the segment was moved, the tombstone segment should have length 0.
|
|
140
|
+
However, these fields need to be independent from `removedSeq` due to the possibility of a removal and a move overlapping, as well as the differences
|
|
141
|
+
in how concurrent inserts are handled into a removed or a moved range.
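As a rough illustration of the length rule, the sketch below treats `removedSeq` and `movedSeq` as independent reasons for a segment to have length 0. It only covers acked perspectives (no local sequence numbers), and `SegmentSketch` and `lengthAt` are hypothetical names.

```typescript
// Hypothetical segment shape; only acked (sequenced) state is modeled.
interface SegmentSketch {
    text: string;
    seq: number;
    removedSeq?: number;
    movedSeq?: number;
}

// Length of a segment as seen by a perspective that has processed ops up to `refSeq`.
function lengthAt(seg: SegmentSketch, refSeq: number): number {
    if (seg.seq > refSeq) {
        return 0; // not yet inserted from this perspective
    }
    const removed = seg.removedSeq !== undefined && seg.removedSeq <= refSeq;
    const moved = seg.movedSeq !== undefined && seg.movedSeq <= refSeq;
    return removed || moved ? 0 : seg.text.length;
}

const movedOnly: SegmentSketch = { text: "X", seq: 10, movedSeq: 30 };
const removedThenMoved: SegmentSketch = { text: "Y", seq: 10, removedSeq: 20, movedSeq: 30 };

const beforeMove = lengthAt(movedOnly, 29); // 1
const afterMove = lengthAt(movedOnly, 30); // 0
const betweenRemoveAndMove = lengthAt(removedThenMoved, 25); // 0
```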
|
|
142
|
+
|
|
143
|
+
### Remote perspective
|
|
144
|
+
|
|
145
|
+
We now move to some lower-level implementation details on how to ensure eventual consistency operating in this model.
|
|
146
|
+
|
|
147
|
+
First, consider the behavior a client must have when processing an obliterate op it didn't submit.
|
|
148
|
+
For concreteness and ease of explanation, say this op is `{ seq: 50, refSeq: 40, clientId: 2, op: <move the range [10, 15) to null> }`.
|
|
149
|
+
The processing client should first mark all segments between the segment `getContainingSegment({ pos: 10, refSeq: 40, clientId: 2 })` and
|
|
150
|
+
`getContainingSegment({ pos: 15, refSeq: 40, clientId: 2 })` that are alive (i.e. inserted, not removed) from the perspective
|
|
151
|
+
`{ seq: 50, clientId: localClientId }` as obliterated.
|
|
152
|
+
Note this means that if a segment in the range was concurrently removed, it won't be marked as moved as well.
|
|
153
|
+
The marking process should be roughly equivalent to what happens in a "remove" operation, but instead of updating `removedSeq`/`localRemovedSeq`
|
|
154
|
+
it updates `movedSeq` and `localMovedSeq`.
|
|
155
|
+
|
|
156
|
+
The other interesting difference between this operation and a normal removal is its inclusion of segments inserted between seq 40 and seq 50.
|
|
157
|
+
The current API on merge tree used for `markRangeRemoved` (which is `mapRange`) doesn't support iterating in this fashion,
|
|
158
|
+
but could easily be extended to do so.
|
|
159
|
+
One way to do that would be to decouple the `refSeq` and length calculations used for locating the positions and the `refSeq` used for
|
|
160
|
+
deciding whether or not to descend and `map` children nodes.
|
|
161
|
+
|
|
162
|
+
This handles removal of any concurrently inserted segments sequenced before the obliterate op, as well as local ops sequenced after the
|
|
163
|
+
obliterate op (since we use `localClientId`).
|
|
164
|
+
However, the client still needs to ensure concurrently inserted segments sequenced after the obliterate op are immediately removed.
|
|
165
|
+
The insert codepath will therefore need to take into account if the destination is inside of an ongoing moved area.
|
|
166
|
+
Excursions are a good tool for this job, but checking is still easier said than done.
|
|
167
|
+
Concretely, and continuing with the example operations given above, suppose this insertion happens:
|
|
168
|
+
|
|
169
|
+
```
|
|
170
|
+
{ seq: 60, refSeq: 40, clientId: 3, op: <insert "hello" at index 10> }
|
|
171
|
+
```
|
|
172
|
+
|
|
173
|
+
After locating the insertion point and updating the merge tree, we need to decide if the resulting segment is inside of a moved region.
|
|
174
|
+
If we happened to know the `seq` of the move we were testing for, this would be easy: the first adjacent segment in each direction from
|
|
175
|
+
the perspective of `{ seq: 50, clientId: localClientId }` can inform us if we're either inside or directly adjacent to that moved range.
|
|
176
|
+
Thus, a naive implementation could check all sequence numbers in the collab window.
|
|
177
|
+
The obvious optimization of only checking seq numbers of move ops would improve this slightly.
|
|
178
|
+
But we can do asymptotically better by leveraging the tree structure.
|
|
179
|
+
It would be ideal if we only needed to perform one commonly short excursion in each direction.
|
|
180
|
+
The only candidate that makes much sense is from the perspective of `{ seq: 60, clientId: 3 }` (i.e. the client submitting the insert op at
|
|
181
|
+
the time the op is sequenced).
|
|
182
|
+
The problem with this perspective is that ops 51 through 59 may have inserted a segment between the inserted "hello"
|
|
183
|
+
and the obliterated range that was submitted by a client which has already acked the obliterate.
|
|
184
|
+
For example, `{ seq: 55, clientId: 5, refSeq: 50, op: <insert "i won't be obliterated" at index 10> }`.
|
|
185
|
+
|
|
186
|
+
The forward excursion would need to continue past this segment in order to conclude it isn't in an obliterated range.
|
|
187
|
+
If that was the only such concurrent insert, the next segment it would visit would be an obliterated one and we'd decide
|
|
188
|
+
whether or not to include the newly inserted segment as part of the obliterated region based on some endpoint merge strategy.
|
|
189
|
+
|
|
190
|
+
The key insight is that visiting the segment with seq 55 does provide the excursion with information: since the segment
|
|
191
|
+
was inserted at seq 55 and isn't moved or removed, any move operation must have occurred before seq 55.
|
|
192
|
+
If we keep track of the smallest sequence number of alive segments that we've visited, we therefore have an upper bound
|
|
193
|
+
for any possible adjacent move op.
|
|
194
|
+
Thus, we can halt the excursion as soon as this upper bound falls below the sequence number of the earliest obliterate operation within the collab window.
|
|
195
|
+
If we alternatively reach a segment that has been moved concurrently to the insert we're processing, we can also stop
|
|
196
|
+
and use some endpoint resolution strategy.
|
|
197
|
+
|
|
198
|
+
The guarantee we get for a removed segment isn't quite as good: we only know that the move must have come either before
|
|
199
|
+
the segment was inserted or after it was removed (since move doesn't impact segments that are removed before its application).
|
|
200
|
+
We *could* track this as part of our excursion by maintaining a set of disjoint intervals at which an obliterate "might have happened"
|
|
201
|
+
and exiting as soon as we know no obliterate is possible, but this is probably more effort than required: only decreasing our upper bound
|
|
202
|
+
for removed segments if our existing upper bound is below when the segment was removed is a reasonable intermediate approach that uses
|
|
203
|
+
less bookkeeping overhead.
|
|
204
|
+
|
|
205
|
+
All-in-all, the insert logic modification might look something like this:
|
|
206
|
+
|
|
207
|
+
```typescript
|
|
208
|
+
function wasRemovedAfter(seg: ISegment, seq: number): boolean {
|
|
209
|
+
return seg.removedSeq !== undefined && seg.removedSeq !== UnassignedSequenceNumber && seg.removedSeq > seq;
|
|
210
|
+
}
|
|
211
|
+
|
|
212
|
+
function insertingWalk(args /* mostly omitted */, op) {
|
|
213
|
+
/* regular insert logic goes here */
|
|
214
|
+
|
|
215
|
+
let moveUpperBound = Number.POSITIVE_INFINITY;
|
|
216
|
+
let movedSegment: ISegment | undefined = undefined;
|
|
217
|
+
const smallestSeqMoveOp = this.getSmallestSeqMoveOp();
|
|
218
|
+
const findAdjacentMovedSegment = (seg) => {
|
|
219
|
+
if (seg.movedSeq && seg.movedSeq > op.referenceSequenceNumber) {
|
|
220
|
+
movedSegment = seg;
|
|
221
|
+
return false;
|
|
222
|
+
}
|
|
223
|
+
|
|
224
|
+
if (!isRemovedAndAcked(seg) || wasRemovedAfter(seg, moveUpperBound)) {
|
|
225
|
+
moveUpperBound = Math.min(moveUpperBound, seg.seq);
|
|
226
|
+
}
|
|
227
|
+
// If we've reached a segment that existed before any of our in-collab-window move ops
|
|
228
|
+
// happened, no need to continue.
|
|
229
|
+
return moveUpperBound > smallestSeqMoveOp;
|
|
230
|
+
}
|
|
231
|
+
forwardExcursion(insertSegment, findAdjacentMovedSegment);
|
|
232
|
+
const furtherMovedSegment = movedSegment;
|
|
233
|
+
moveUpperBound = Number.POSITIVE_INFINITY;
|
|
234
|
+
movedSegment = undefined;
|
|
235
|
+
backwardExcursion(insertSegment, findAdjacentMovedSegment);
|
|
236
|
+
const nearerMovedSegment = movedSegment;
|
|
237
|
+
if (
|
|
238
|
+
(nearerMovedSegment && breakEndpointTie(nearerMovedSegment, insertSegment, op)) ||
|
|
239
|
+
(furtherMovedSegment && breakEndpointTie(insertSegment, furtherMovedSegment, op))
|
|
240
|
+
) {
|
|
241
|
+
// These objects will be analogous to return from `toRemovalInfo`.
|
|
242
|
+
const nearMoveInfo = toMoveInfo(nearerMovedSegment);
|
|
243
|
+
const farMoveInfo = toMoveInfo(furtherMovedSegment);
|
|
244
|
+
// The inserted segment could potentially be adjacent to two different moved regions.
|
|
245
|
+
// We mark it as moved using the info from the earlier such operation.
|
|
246
|
+
const moveInfo = min(nearMoveInfo, farMoveInfo);
|
|
247
|
+
markSegmentMoved(insertSegment, moveInfo, op);
|
|
248
|
+
}
|
|
249
|
+
}
|
|
250
|
+
|
|
251
|
+
```
|
|
252
|
+
|
|
253
|
+
In reality it will be a bit more complicated: this does not properly handle inserting walks performed for local edits (which should never be immediately obliterated),
|
|
254
|
+
nor does it handle local, un-acked obliterates (which will be covered in the next section).
|
|
255
|
+
It's worth noting that removals between the obliterated seq and the inserting op's seq don't complicate things much because excursions visit all segments, regardless of visibility.
|
|
256
|
+
|
|
257
|
+
This limits the segment excursions to not be longer than the number of consecutive segments adjacent to the insertion
|
|
258
|
+
point that are all within the collaboration window.
|
|
259
|
+
That's probably performant enough, but if we want to optimize further at some memory cost it is probably possible to use the
|
|
260
|
+
partialLengths information to skip over blocks in some cases if the sequence numbers of obliterate ops are stored on
|
|
261
|
+
each merge block.
|
|
262
|
+
|
|
263
|
+
### Local perspective
|
|
264
|
+
|
|
265
|
+
Next, we move to the local handling of a move op while it's in flight.
|
|
266
|
+
For consistency with the rest of merge tree's segment state machine, the state transitions of `{ localMovedSeq, movedSeq }` and `{ localRemovedSeq, removedSeq }` should align: `movedSeq` is set to `UnassignedSequenceNumber` while the op is in flight, with `localMovedSeq` recording the local seq at which the move happened; on ack of the op, `localMovedSeq` is cleared and `movedSeq` is replaced with the op's seq.
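The transitions described above can be sketched as two small functions. This is an illustration only: `UNASSIGNED` is a stand-in for `UnassignedSequenceNumber` (its value here is an assumption), and `MoveState`, `applyLocalMove`, and `ackLocalMove` are hypothetical names.

```typescript
// Stand-in for UnassignedSequenceNumber; the concrete value is an assumption of this sketch.
const UNASSIGNED = -1;

interface MoveState {
    movedSeq?: number;
    localMovedSeq?: number;
}

// Local application: the move is in flight, so only the local seq is known.
function applyLocalMove(seg: MoveState, localSeq: number): void {
    if (seg.movedSeq !== undefined && seg.movedSeq !== UNASSIGNED) {
        return; // already moved by a sequenced op; nothing to record locally
    }
    seg.movedSeq = UNASSIGNED;
    seg.localMovedSeq = localSeq;
}

// Ack: the server assigned `seq` to the op, so the local bookkeeping is dropped.
function ackLocalMove(seg: MoveState, seq: number): void {
    if (seg.movedSeq !== UNASSIGNED) {
        return; // not pending (never moved locally, or superseded by a sequenced move)
    }
    seg.movedSeq = seq;
    delete seg.localMovedSeq;
}

const pending: MoveState = {};
applyLocalMove(pending, 7);
const inFlight = { movedSeq: pending.movedSeq, localMovedSeq: pending.localMovedSeq };
ackLocalMove(pending, 50);
```

After `ackLocalMove`, the segment carries only `movedSeq: 50`, which is the same state a remote client would hold after processing the op.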
|
|
267
|
+
|
|
268
|
+
While a move op is in flight, any non-local insertions into a locally moved range need to be immediately moved to the range's current location
|
|
269
|
+
(or removed, if it was obliterated).
|
|
270
|
+
This can be accomplished by tweaking the `findAdjacentMovedSegment` function above to account for `localMovedSeq`:
|
|
271
|
+
|
|
272
|
+
```typescript
|
|
273
|
+
const findAdjacentMovedSegment = (seg) => {
|
|
274
|
+
if ((seg.movedSeq && seg.movedSeq > op.referenceSequenceNumber) ||
|
|
275
|
+
seg.localMovedSeq !== undefined) {
|
|
276
|
+
movedSegment = seg;
|
|
277
|
+
return false;
|
|
278
|
+
}
|
|
279
|
+
|
|
280
|
+
if (!isRemovedAndAcked(seg) || wasRemovedAfter(seg, moveUpperBound)) {
|
|
281
|
+
moveUpperBound = Math.min(moveUpperBound, seg.seq);
|
|
282
|
+
}
|
|
283
|
+
// If we've reached a segment that existed before any of our in-collab-window move ops
|
|
284
|
+
// happened, no need to continue.
|
|
285
|
+
return moveUpperBound > smallestSeqMoveOp;
|
|
286
|
+
}
|
|
287
|
+
```
|
|
288
|
+
|
|
289
|
+
We don't need to worry about the analogous problem of extending the excursion as a result of segments between the insert location and a local move
|
|
290
|
+
because any such segments would have also been marked as locally moved when they were inserted into the merge tree.
|
|
291
|
+
In the sample code written for the remote segment, this will also require `markSegmentMoved` to tolerate marking segments with local obliteration info.
|
|
292
|
+
|
|
293
|
+
Much of the same logic that goes into conflicting local + remote removal will need to be applied for move.
|
|
294
|
+
Nothing stands out as a conceptual issue or hurdle in this realm, though. Just tricky conditionals.
|
|
295
|
+
|
|
296
|
+
Once the op is acked, the behavior in the [Remote perspective](#remote-perspective) section suffices for any further concurrent segments.
|
|
297
|
+
|
|
298
|
+
### Other aspects
|
|
299
|
+
|
|
300
|
+
#### Zamboni
|
|
301
|
+
|
|
302
|
+
Zamboni will need updating to account for the new bookkeeping fields, but there aren't any conceptual issues in this realm since zamboni cleans up unnecessary data for segments outside of the collaboration window and the only difference between remove and obliterate happens within the collab window.
|
|
303
|
+
|
|
304
|
+
#### Snapshot
|
|
305
|
+
|
|
306
|
+
Segments in the snapshot will need to serialize and rehydrate the newly added properties.
|
|
307
|
+
Most of the types are plain old data and serialize to JSON (e.g. via `JSON.stringify`) with no issue.
|
|
308
|
+
When move is implemented (and so `moveDst` can actually be a local reference rather than undefined), that field will need some special handling.
|
|
309
|
+
Several schemes are possible, but in the end it should convert to either a `pos` within some view of the merge-tree or an index+offset into the array
|
|
310
|
+
of serialized segments.
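One of the possible schemes, the index+offset variant, is sketched below. The shapes are assumptions made for illustration: a live `moveDst` is modeled as a direct segment pointer plus offset rather than a real local reference, and `serialize`/`rehydrate` are hypothetical helpers, not part of the snapshot code.

```typescript
interface LiveSegment {
    text: string;
    movedSeq?: number;
    moveDst?: { segment: LiveSegment; offset: number };
}

interface SerializedSegment {
    text: string;
    movedSeq?: number;
    moveDst?: { index: number; offset: number };
}

// Replace each in-memory destination pointer with an index into the serialized array.
function serialize(segments: LiveSegment[]): SerializedSegment[] {
    return segments.map((seg) => {
        const out: SerializedSegment = { text: seg.text };
        if (seg.movedSeq !== undefined) {
            out.movedSeq = seg.movedSeq;
        }
        if (seg.moveDst !== undefined) {
            out.moveDst = {
                index: segments.indexOf(seg.moveDst.segment),
                offset: seg.moveDst.offset,
            };
        }
        return out;
    });
}

// Rebuild the segments first, then re-link destinations once every target exists.
function rehydrate(data: SerializedSegment[]): LiveSegment[] {
    const live: LiveSegment[] = data.map((d) => {
        const seg: LiveSegment = { text: d.text };
        if (d.movedSeq !== undefined) {
            seg.movedSeq = d.movedSeq;
        }
        return seg;
    });
    data.forEach((d, i) => {
        const source = live[i];
        const target = d.moveDst !== undefined ? live[d.moveDst.index] : undefined;
        if (source !== undefined && target !== undefined && d.moveDst !== undefined) {
            source.moveDst = { segment: target, offset: d.moveDst.offset };
        }
    });
    return live;
}

// The [<moved "X" tombstone>, "A", "B", "X"] example from the IMoveInfo comment.
const living: LiveSegment = { text: "X" };
const tree: LiveSegment[] = [
    { text: "X", movedSeq: 30, moveDst: { segment: living, offset: 0 } },
    { text: "A" },
    { text: "B" },
    living,
];
const wire = JSON.stringify(serialize(tree));
const restored = rehydrate(JSON.parse(wire) as SerializedSegment[]);
```

Indices are only stable within a single snapshot, so this scheme assumes the tombstone and its destination are written to the same serialized array.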
|
|
311
|
+
|
|
312
|
+
#### Reconnection
|
|
313
|
+
|
|
314
|
+
When a move op is rebased, there will need to be local fixup of the range marked moved locally, since the resulting range may expand with different semantics (different ops
|
|
315
|
+
will be concurrent to the rebased version). Since locally applying a move doesn't impact any sequenced segment state (and merge policy is to override pending local moves
|
|
316
|
+
with any remote ones just like the remove merge policy), at worst this can be done inefficiently by walking the range, resetting state, and re-applying.
|
|
317
|
+
|
|
318
|
+
The methods necessary for interpreting where the new range should be in the rebased view of the local merge-tree already exist and are used for regular reconnect (e.g.
|
|
319
|
+
to remove a range of content), so should not present additional trouble.
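The "walk, reset, re-apply" fallback could look roughly like the sketch below. It assumes the rebased range `[start, end)` has already been computed by the caller and that it falls on segment boundaries; `UNASSIGNED`, `Seg`, and `reapplyPendingMove` are hypothetical names.

```typescript
// Stand-in for UnassignedSequenceNumber; the concrete value is an assumption of this sketch.
const UNASSIGNED = -1;

interface Seg {
    text: string;
    removedSeq?: number;
    movedSeq?: number;
    localMovedSeq?: number;
}

// Undo the pending move identified by `localSeq`, then mark the rebased range again.
function reapplyPendingMove(segs: Seg[], localSeq: number, start: number, end: number): void {
    // Reset: only un-acked state written by this pending op is cleared.
    for (const seg of segs) {
        if (seg.movedSeq === UNASSIGNED && seg.localMovedSeq === localSeq) {
            delete seg.movedSeq;
            delete seg.localMovedSeq;
        }
    }
    // Re-apply: positions count only segments that are still visible locally.
    let pos = 0;
    for (const seg of segs) {
        if (seg.removedSeq !== undefined || seg.movedSeq !== undefined) {
            continue;
        }
        if (pos >= start && pos + seg.text.length <= end) {
            seg.movedSeq = UNASSIGNED;
            seg.localMovedSeq = localSeq;
        }
        pos += seg.text.length;
    }
}

// The pending move originally covered "b" and "c"; the rebased range is [2, 4).
const segs: Seg[] = [
    { text: "a" },
    { text: "b", movedSeq: UNASSIGNED, localMovedSeq: 1 },
    { text: "c", movedSeq: UNASSIGNED, localMovedSeq: 1 },
    { text: "d" },
];
reapplyPendingMove(segs, 1, 2, 4);
const marked = segs.filter((s) => s.localMovedSeq === 1).map((s) => s.text).join(""); // "cd"
```

Because only pending local state is touched, sequenced `movedSeq` and `removedSeq` values are left exactly as they were.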
|
|
320
|
+
|
|
321
|
+
#### Partial Lengths
|
|
322
|
+
|
|
323
|
+
One key capability of merge-tree is its ability to resolve the information `{ pos, clientId, refSeq }` (and potentially `localSeq`, when resolving for the local client) into a particular
|
|
324
|
+
segment + offset in the merge-tree's leaves.
|
|
325
|
+
It does this efficiently by storing indexing structures at each internal node that allow querying for that node's length at any such perspective within the collab window,
|
|
326
|
+
then leveraging those structures in an efficient tree walk.
|
|
327
|
+
|
|
328
|
+
Adding additional tree operations that any client can undertake means that all other clients must be able to reason about their peers' current states.
|
|
329
|
+
For example, `movedSeq` and `localMovedSeq` will need to be considered when calculating the length of a node/range from a given perspective.
|
|
330
|
+
If the duplicated segment that's inserted as the result of a move is given the `clientId` of the moving client (as opposed to the originating client)
|
|
331
|
+
and `seq` of the move operation, generally existing partial lengths logic will work correctly for non-concurrently inserted segments if
|
|
332
|
+
`movedSeq` and `localMovedSeq` on the tombstoned segment are interpreted analogously to `removedSeq` and `localRemovedSeq`.
|
|
333
|
+
Note that this would require updating the description of the `clientId` field, and for attribution purposes we may want to track the clientId that originally
|
|
334
|
+
created the segment separately from the clientId that most recently caused the segment to be where it is (via move).
|
|
335
|
+
|
|
336
|
+
Things get more complicated when considering resolution of node lengths for concurrently inserted segments.
|
|
337
|
+
The remainder of this section assumes the content is obliterated rather than moved; there are additional difficulties for partial lengths when dealing with
|
|
338
|
+
overlapping moves not covered in this document (they are probably solvable, but may require changes to the representation of the partial lengths indexing
|
|
339
|
+
structure rather than just its data).
|
|
340
|
+
|
|
341
|
+
Concretely, let's consider how partial lengths might look for a segment concurrently inserted into a moved region.
|
|
342
|
+
|
|
343
|
+
Suppose:
|
|
344
|
+
|
|
345
|
+
```
|
|
346
|
+
// Initial state at seq 0: "0123456789"
|
|
347
|
+
{ seq: 1, refSeq: 0, clientId: 1, op: <move [0, 5) out of existence> }
|
|
348
|
+
{ seq: 2, refSeq: 0, clientId: 2, op: <insert "hi" at 2> }
|
|
349
|
+
{ seq: 3, refSeq: 0, clientId: 2, op: <insert "hello" at 9> }
|
|
350
|
+
```
|
|
351
|
+
|
|
352
|
+
The desired final state in this case would be "56hello789". After seq 2, client 0 (an observer) has segments that look like so (clientIds that aren't relevant are omitted):
|
|
353
|
+
```
|
|
354
|
+
[
|
|
355
|
+
{ seq: 0, movedSeq: 1, text: "01", movedClientIds: [1] },
|
|
356
|
+
{ seq: 2, movedSeq: 1, clientId: 2, text: "hi", movedClientIds: [1] },
|
|
357
|
+
{ seq: 0, movedSeq: 1, text: "234", movedClientIds: [1] },
|
|
358
|
+
{ seq: 0, text: "56789" }
|
|
359
|
+
]
|
|
360
|
+
```
|
|
361
|
+
|
|
362
|
+
If these segments are all in a single block and the minimum sequence number is 0, their parent's partial lengths resemble the following:
|
|
363
|
+
|
|
364
|
+
```
|
|
365
|
+
{
|
|
366
|
+
minLength: 10, // length of "0123456789"
|
|
367
|
+
partialLengths: [{ seq: 1, seglen: -5 }, { seq: 2, seglen: 0 }],
|
|
368
|
+
clientSeqNumbers: [[], [{ seq: 1, seglen: -5 }], /* client 2 */[ ?? ]]
|
|
369
|
+
}
|
|
370
|
+
```
|
|
371
|
+
|
|
372
|
+
This data reflects the fact that the subsequence starts at length 10 at seq 0, an observer client sees the length of the subsequence shrink by 5 at seq 1,
|
|
373
|
+
and doesn't see the length change afterward (note such a client hasn't yet received seq 3). It also looks correct for resolving client 1's perspective: even if
|
|
374
|
+
the refSeq isn't at least 1, `clientSeqNumbers[1]` will still cause the current client's interpretation of client 1's view to include the removal of the
|
|
375
|
+
range `[0, 5)`. Client 2 is the tricky one: the length of the block from client 2's perspective should be 12 at refSeq 0, but 5 at either refSeq 1 or 2.
|
|
376
|
+
It looks odd, but this can be accomplished by adding a `{ seq: 1 /* comes from movedSeq */, seglen: 2 }` entry to `clientSeqNumbers[2]`.
|
|
377
|
+
The intuition is that client 2 counts the length of the segment unless its `refSeq >= movedSeq`, and the method used in partial lengths computes the length of
|
|
378
|
+
a subsequence using
|
|
379
|
+
|
|
380
|
+
(length at min seq) + (any deltas between minSeq and refSeq) + (any deltas for ops submitted by remote client between refSeq and now),
|
|
381
|
+
|
|
382
|
+
so the last term counts this entry precisely when it's desired.
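To check the arithmetic, the formula can be evaluated against the partial lengths object above with the proposed `{ seq: 1, seglen: 2 }` entry filled in for client 2. This is a sketch of the formula as stated in this section, not the real partial lengths implementation; `BlockLengths` and `lengthAt` are hypothetical names.

```typescript
interface PartialLength {
    seq: number;
    seglen: number;
}

interface BlockLengths {
    minLength: number;
    partialLengths: PartialLength[];
    clientSeqNumbers: PartialLength[][]; // indexed by clientId
}

function sum(entries: PartialLength[], keep: (seq: number) => boolean): number {
    return entries.filter((e) => keep(e.seq)).reduce((total, e) => total + e.seglen, 0);
}

// (length at min seq) + (deltas up to refSeq) + (the client's own deltas after refSeq)
function lengthAt(block: BlockLengths, refSeq: number, clientId: number): number {
    const sequenced = sum(block.partialLengths, (seq) => seq <= refSeq);
    const own = sum(block.clientSeqNumbers[clientId] ?? [], (seq) => seq > refSeq);
    return block.minLength + sequenced + own;
}

const block: BlockLengths = {
    minLength: 10, // length of "0123456789"
    partialLengths: [{ seq: 1, seglen: -5 }, { seq: 2, seglen: 0 }],
    clientSeqNumbers: [[], [{ seq: 1, seglen: -5 }], [{ seq: 1, seglen: 2 }]],
};

const client2AtRefSeq0 = lengthAt(block, 0, 2); // 12
const client2AtRefSeq1 = lengthAt(block, 1, 2); // 5
const client2AtRefSeq2 = lengthAt(block, 2, 2); // 5
const client1AtRefSeq0 = lengthAt(block, 0, 1); // 5
const observerAtRefSeq0 = lengthAt(block, 0, 0); // 10
```

Client 2's entry only contributes while its `refSeq` is below the `movedSeq` of 1, which is exactly the window in which it still sees its own "hi".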
|
|
383
|
+
|
|
384
|
+
What happens if the insert and the obliterate are concurrent but sequenced in the other order?
|
|
385
|
+
|
|
386
|
+
```
|
|
387
|
+
// Initial state at seq 0: "0123456789"
|
|
388
|
+
{ seq: 1, refSeq: 0, clientId: 2, op: <insert "hi" at 2> }
|
|
389
|
+
{ seq: 2, refSeq: 0, clientId: 1, op: <move [0, 5) out of existence> }
|
|
390
|
+
{ seq: 3, refSeq: 0, clientId: 2, op: <insert "hello" at 9> }
|
|
391
|
+
```
|
|
392
|
+
|
|
393
|
+
The segment state after seq 2 from client 0's perspective will look mostly the same:
|
|
394
|
+
|
|
395
|
+
```
|
|
396
|
+
[
|
|
397
|
+
{ seq: 0, movedSeq: 2, text: "01", movedClientIds: [1] },
|
|
398
|
+
{ seq: 1, movedSeq: 2, clientId: 2, text: "hi", movedClientIds: [1] },
|
|
399
|
+
{ seq: 0, movedSeq: 2, text: "234", movedClientIds: [1] },
|
|
400
|
+
{ seq: 0, text: "56789" }
|
|
401
|
+
]
|
|
402
|
+
```
|
|
403
|
+
|
|
404
|
+
And the partial lengths object might look like this:
|
|
405
|
+
|
|
406
|
+
```
|
|
407
|
+
{
|
|
408
|
+
minLength: 10, // length of "0123456789"
|
|
409
|
+
partialLengths: [{ seq: 1, seglen: 2 }, { seq: 2, seglen: -7 }],
|
|
410
|
+
clientSeqNumbers: [[], [{ seq: 1, seglen: 2 }, { seq: 2, seglen: -7 }], /* client 2 */[{ seq: 1, seglen: 2 }]]
|
|
411
|
+
}
|
|
412
|
+
```
|
|
413
|
+
|
|
414
|
+
Note that in this case, client 1's `clientSeqNumbers` needed to be fixed up to include an entry for the concurrently inserted segment.
|
|
415
|
+
Thus, when an obliterate/move affects a concurrently inserted segment, it's generally possible to modify the generated partial lengths'
|
|
416
|
+
`clientSeqNumbers` for the client whose concurrent op was sequenced later, using the information on the inserted segment to derive
|
|
417
|
+
correct values.
|
|
418
|
+
|
|
419
|
+
This strategy is also consistent with the existing strategy for overlapping delete: see the following snippet from `addClientSeqNumberFromPartial`:
|
|
420
|
+
|
|
421
|
+
```typescript
|
|
422
|
+
if (partialLength.overlapRemoveClients) {
|
|
423
|
+
partialLength.overlapRemoveClients.map((oc: Property<number, IOverlapClient>) => {
|
|
424
|
+
// Original client entry was handled above
|
|
425
|
+
if (partialLength.clientId !== oc.data.clientId) {
|
|
426
|
+
this.addClientSeqNumber(oc.data.clientId, partialLength.seq, oc.data.seglen);
|
|
427
|
+
}
|
|
428
|
+
return true;
|
|
429
|
+
});
|
|
430
|
+
}
|
|
431
|
+
```
|
|
432
|
+
|
|
433
|
+
The other interesting case to go through is when an obliterate/move conflicts with another obliterate/move.
|
|
434
|
+
|
|
435
|
+
##### Review of overlapping removal
|
|
436
|
+
|
|
437
|
+
This section illustrates the basic existing handling for overlapping removal.
|
|
438
|
+
It can probably be skipped by readers familiar with the scheme, but is here to help the reader determine where assumptions may break down or go wrong
|
|
439
|
+
for overlapping obliterate/move.
|
|
440
|
+
|
|
441
|
+
Overlapping removal of a segment is tracked using the `removedClientIds` field, which is used in partial lengths to add adjustment entries to avoid double-counting
|
|
442
|
+
removal.
|
|
443
|
+
For example, suppose client 1 and client 2 concurrently remove the range `[0, 5)` and each performs some more ops before acking the other's remove.
|
|
444
|
+
That might look something like this:
|
|
445
|
+
|
|
446
|
+
```
|
|
447
|
+
// Initial state at seq 0: "0123456789"
|
|
448
|
+
{ seq: 1, refSeq: 0, clientId: 1, op: <remove [0, 5)> }
|
|
449
|
+
{ seq: 2, refSeq: 0, clientId: 2, op: <remove [0, 5)> }
|
|
450
|
+
{ seq: 3, refSeq: 0, clientId: 2, op: <insert "hi" at 2> }
|
|
451
|
+
```
|
|
452
|
+
|
|
453
|
+
The correct final state is "56hi789". Consider what happens when a listener client (say, client 0) attempts to interpret the insertion of "hi" by client 2.
|
|
454
|
+
Before processing, its merge tree segment state would look like so:
|
|
455
|
+
|
|
456
|
+
```
|
|
457
|
+
[
|
|
458
|
+
{ seq: 0, removedSeq: 1, removedClientIds: [1, 2], text: "01234" },
|
|
459
|
+
{ seq: 0, text: "56789" }
|
|
460
|
+
]
|
|
461
|
+
```
|
|
462
|
+
|
|
463
|
+
The constructed partial lengths object for the root of the merge tree would then be:
|
|
464
|
+
|
|
465
|
+
```
|
|
466
|
+
{
|
|
467
|
+
minLength: 10,
|
|
468
|
+
partialLengths: [{ seq: 1, seglen: -5 }],
|
|
469
|
+
clientSeqNumbers: [[], /* client 1 */[{ seq: 1, seglen: -5 }], /* client 2 */[{ seq: 1, seglen: -5 }]]
|
|
470
|
+
}
|
|
471
|
+
```
|
|
472
|
+
|
|
473
|
+
Note that client 2's delta applies from seq 1 onward rather than seq 2, since it's constructed using the `removedSeq` and `removedClientIds` on the removed segment.
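
As a rough illustration of that construction, the sketch below derives the removal entries from the segment state shown above. It is not merge-tree's implementation: `SegmentState`, `Delta`, and `removalDeltas` are hypothetical names, and the sketch only handles removals (insertions above minSeq are ignored).

```typescript
// Hypothetical, simplified shapes; the real merge-tree types carry more fields.
interface SegmentState {
    seq: number;
    text: string;
    removedSeq?: number;
    removedClientIds?: number[];
}

interface Delta {
    seq: number;
    seglen: number;
}

// Builds removal deltas only. Every client listed in `removedClientIds` gets an
// entry keyed on `removedSeq`, the seq of the first remove to be sequenced.
function removalDeltas(
    segments: SegmentState[],
    clientCount: number,
): { partialLengths: Delta[]; clientSeqNumbers: Delta[][] } {
    const partialLengths: Delta[] = [];
    const clientSeqNumbers: Delta[][] = [];
    for (let i = 0; i < clientCount; i++) {
        clientSeqNumbers.push([]);
    }
    for (const segment of segments) {
        if (segment.removedSeq === undefined) {
            continue; // segment is still present, so it contributes no removal delta
        }
        const seglen = -segment.text.length;
        partialLengths.push({ seq: segment.removedSeq, seglen });
        for (const clientId of segment.removedClientIds ?? []) {
            clientSeqNumbers[clientId].push({ seq: segment.removedSeq, seglen });
        }
    }
    return { partialLengths, clientSeqNumbers };
}
```

Applied to the two segments above with three clients, this yields a single `{ seq: 1, seglen: -5 }` entry overall and one such entry each for clients 1 and 2.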
|
|
474
|
+
|
|
475
|
+
Client 0 would determine where to insert the op with seq 3 by:
|
|
476
|
+
|
|
477
|
+
1. Asking the root for its length at `{ clientId: 2, refSeq: 0 }`
|
|
478
|
+
- This calculation is (length at minSeq) + (any deltas between minSeq and refSeq) + (all deltas for ops submitted by client 2) - (deltas for ops submitted by client 2 at or before refSeq)
|
|
479
|
+
- From the above bookkeeping, it would compute 10 + 0 + (-5) - 0 = 5
|
|
480
|
+
2. Asking for the length of the first child at `{ clientId: 2, refSeq: 0 }`
|
|
481
|
+
- Conditionals here are a bit tedious, but we'd see that clientId 2 is in the segment's `removedClientIds` list, so it has length 0
|
|
482
|
+
3. Asking for the length of the second child at `{ clientId: 2, refSeq: 0 }`
|
|
483
|
+
- The segment is inserted and not removed, so it has length 5.
|
|
484
|
+
Since the search is looking for an accumulated position of 2, it determines that the correct insertion point is amidst this segment.
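
The length query in step 1 can be sketched as follows. This is a minimal illustration of the formula, assuming minSeq is 0 and using hypothetical names (`PartialLengthsSketch`, `sumUpTo`, `lengthAt`); the real partial lengths code is organized differently.

```typescript
interface LengthDelta {
    seq: number;
    seglen: number;
}

// Hypothetical mirror of the bookkeeping objects shown in this document.
interface PartialLengthsSketch {
    minLength: number;
    partialLengths: LengthDelta[];
    clientSeqNumbers: LengthDelta[][];
}

// Sum of all deltas sequenced at or before `refSeq`.
function sumUpTo(entries: LengthDelta[], refSeq: number): number {
    return entries
        .filter((entry) => entry.seq <= refSeq)
        .reduce((total, entry) => total + entry.seglen, 0);
}

// (length at minSeq) + (deltas up to refSeq) + (all of the client's deltas)
// - (the client's deltas already counted because they are at or before refSeq)
function lengthAt(lengths: PartialLengthsSketch, clientId: number, refSeq: number): number {
    const clientEntries = lengths.clientSeqNumbers[clientId] ?? [];
    const allClientDeltas = sumUpTo(clientEntries, Number.POSITIVE_INFINITY);
    return (
        lengths.minLength +
        sumUpTo(lengths.partialLengths, refSeq) +
        allClientDeltas -
        sumUpTo(clientEntries, refSeq)
    );
}

const overlappingRemoveRoot: PartialLengthsSketch = {
    minLength: 10,
    partialLengths: [{ seq: 1, seglen: -5 }],
    clientSeqNumbers: [[], [{ seq: 1, seglen: -5 }], [{ seq: 1, seglen: -5 }]],
};
console.log(lengthAt(overlappingRemoveRoot, 2, 0)); // 5, matching 10 + 0 + (-5) - 0
```

The final subtraction is what keeps a client's own op from being counted twice once refSeq has advanced past it.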
|
|
485
|
+
|
|
486
|
+
##### Overlapping obliterate
|
|
487
|
+
|
|
488
|
+
The same general strategy used for overlapping removes should be sufficient for tracking overlapping obliteration of segments.
|
|
489
|
+
It relies only on information about when and by whom a segment was removed, and the main difference between remove and obliterate comes
|
|
490
|
+
from which segments they affect rather than how the segments are affected.
|
|
491
|
+
|
|
492
|
+
Note also that because `movedSeq` is distinct from `removedSeq`, the corresponding partial lengths entry for `movedClientIds[0]` obliterating the segment can be entered
|
|
493
|
+
distinctly from the partial lengths entry for `removedClientIds[0]` removing the segment.
|
|
494
|
+
|
|
495
|
+
Again, the interesting case to check is if two separate clients issue obliterate ops amidst a concurrent insert (otherwise it is functionally identical to the remove case).
|
|
496
|
+
|
|
497
|
+
```
|
|
498
|
+
// Initial state at seq 0: "0123456789"
|
|
499
|
+
{ seq: 1, refSeq: 0, clientId: 1, op: <obliterate [0, 5)> }
|
|
500
|
+
{ seq: 2, refSeq: 0, clientId: 2, op: <obliterate [0, 5)> }
|
|
501
|
+
{ seq: 3, refSeq: 0, clientId: 3, op: <insert "hi" at 2> }
|
|
502
|
+
```
|
|
503
|
+
|
|
504
|
+
The segment state of some observing client after seq 3 is essentially the same as in the non-overlapping example:
|
|
505
|
+
|
|
506
|
+
```
|
|
507
|
+
[
|
|
508
|
+
{ seq: 0, movedSeq: 1, text: "01", movedClientIds: [1, 2] },
|
|
509
|
+
{ seq: 3, movedSeq: 1, clientId: 3, text: "hi", movedClientIds: [1, 2] },
|
|
510
|
+
{ seq: 0, movedSeq: 1, text: "234", movedClientIds: [1, 2] },
|
|
511
|
+
{ seq: 0, text: "56789" }
|
|
512
|
+
]
|
|
513
|
+
```
|
|
514
|
+
|
|
515
|
+
From the observing client's perspective, the interpretation of each client's text if they were to submit an op with refSeq 0 through 3 is as follows:
|
|
516
|
+
|
|
517
|
+
| refSeq | client 1 | client 2 | client 3 |
| ------ | -------- | -------- | -------- |
|
|
518
|
+
| 0 | 56789 | 56789 | 01hi23456789 |
|
|
519
|
+
| 1 | 56789 | 56789 | 56789 |
|
|
520
|
+
| 2 | 56789 | 56789 | 56789 |
|
|
521
|
+
| 3 | 56789 | 56789 | 56789 |
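
The rows of this table can be reproduced with a small visibility check over the segment state. This is a simplified sketch rather than merge-tree's actual logic: `ObservedSegment`, `isVisible`, and `visibleText` are hypothetical names, and only insertion and obliterate (`movedSeq`/`movedClientIds`) are modeled.

```typescript
interface ObservedSegment {
    seq: number;
    clientId?: number;
    text: string;
    movedSeq?: number;
    movedClientIds?: number[];
}

// A segment is visible to (clientId, refSeq) if that client has seen its insertion
// (or inserted it) and has not seen (or issued) an obliterate covering it.
function isVisible(segment: ObservedSegment, clientId: number, refSeq: number): boolean {
    const inserted = segment.seq <= refSeq || segment.clientId === clientId;
    const moved =
        segment.movedSeq !== undefined &&
        (segment.movedSeq <= refSeq || (segment.movedClientIds ?? []).indexOf(clientId) !== -1);
    return inserted && !moved;
}

function visibleText(segments: ObservedSegment[], clientId: number, refSeq: number): string {
    return segments
        .filter((segment) => isVisible(segment, clientId, refSeq))
        .map((segment) => segment.text)
        .join("");
}

const stateAfterSeq3: ObservedSegment[] = [
    { seq: 0, movedSeq: 1, text: "01", movedClientIds: [1, 2] },
    { seq: 3, movedSeq: 1, clientId: 3, text: "hi", movedClientIds: [1, 2] },
    { seq: 0, movedSeq: 1, text: "234", movedClientIds: [1, 2] },
    { seq: 0, text: "56789" },
];
console.log(visibleText(stateAfterSeq3, 3, 0)); // "01hi23456789"
console.log(visibleText(stateAfterSeq3, 1, 0)); // "56789"
```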
|
|
522
|
+
|
|
523
|
+
The corresponding lengths table is exactly what's achieved by combining the overlapping remove strategy with the strategy for bookkeeping concurrently inserted segments:
|
|
524
|
+
|
|
525
|
+
```
|
|
526
|
+
{
|
|
527
|
+
minLength: 10,
|
|
528
|
+
partialLengths: [{ seq: 1, seglen: -5 }, { seq: 3, seglen: 0 }],
|
|
529
|
+
clientSeqNumbers: [
|
|
530
|
+
[],
|
|
531
|
+
[{ seq: 1, seglen: -5 }],
|
|
532
|
+
[{ seq: 1, seglen: -5 }], /* comes from adding clientSeqNumber to all entries in movedClientIds */
|
|
533
|
+
[{ seq: 1, seglen: 2 }] /* comes from the inserted "hi" segment which has movedSeq <= seq */
|
|
534
|
+
]
|
|
535
|
+
}
|
|
536
|
+
```
|
|
537
|
+
|
|
538
|
+
This approach also works if the operations are sequenced in a different order, for example with the insert sequenced before or between the two obliterates.
|
|
539
|
+
|
|
540
|
+
## Endpoint Behavior
|
|
541
|
+
|
|
542
|
+
One important consideration is what happens near the endpoints of the removed range.
|
|
543
|
+
There are two general possibilities: either the obliterate expands to include segments inserted
|
|
544
|
+
adjacent to the endpoint, or it doesn't.
|
|
545
|
+
|
|
546
|
+
In the initial implementation, we should pick some fixed endpoint behavior analogous to how insertion merge policy is fixed.
|
|
547
|
+
Each option is roughly equivalent in difficulty.
|
|
548
|
+
|
|
549
|
+
If applications request more degrees of freedom in this area, the framework for merge outcomes described in [Move](#move) is a good starting point.
|
|
550
|
+
|
|
551
|
+
## Public API
|
|
552
|
+
|
|
553
|
+
The public API of sequence will need to be updated for users to leverage the obliterate operation. The most obvious way to extend it would be to align the API shape with
|
|
554
|
+
`removeRange`:
|
|
555
|
+
|
|
556
|
+
```typescript
|
|
557
|
+
class SharedSegmentSequence<TInterval extends IInterval> {
|
|
558
|
+
public obliterateRange(start: number, end: number)
|
|
559
|
+
}
|
|
560
|
+
```
|
|
561
|
+
|
|
562
|
+
One interesting alternative is to align the public API of sequence with the idea that there are two conceptual kinds of ranges: slice ranges and set ranges (see the next section
|
|
563
|
+
for details).
|
|
564
|
+
If we did this, we might instead unify `removeRange` and `obliterateRange` into a single method taking in such a range object.
|
|
565
|
+
This would have the nice property of naturally extending to annotate operations, if we anticipate wanting to be able to annotate slice ranges.
|
|
566
|
+
|
|
567
|
+
## Move
|
|
568
|
+
|
|
569
|
+
There are several different possible options for defining merge outcomes for the "move" operation.
|
|
570
|
+
The upcoming SharedTree DDS has done a lot of thinking in this area and landed on a relatively simple set of semantics that give reasonable
|
|
571
|
+
outcomes in most cases (see [issue 9658](https://github.com/microsoft/FluidFramework/issues/9658) for some very detailed reading).
|
|
572
|
+
|
|
573
|
+
These semantics are implementable in merge-tree, are compatible with feature requests for obliterate, and generally seem like a good direction to take
|
|
574
|
+
that we can later extend if applications request it.
|
|
575
|
+
|
|
576
|
+
There are a few primitive concepts that all of the merge outcomes depend on.
|
|
577
|
+
|
|
578
|
+
First, a sequence of length `N` is conceptualized as an interleaving set of `N+1` *gaps* and `N` nodes.
|
|
579
|
+
Nodes in the sequence may move, but the gaps between the nodes do not.
|
|
580
|
+
|
|
581
|
+
Insertion into the sequence is performed by specifying a gap to insert in as well as a direction that the inserted content prefers to tend toward
|
|
582
|
+
in case other content is inserted/moved concurrently into the same gap.
|
|
583
|
+
|
|
584
|
+
> Merge-tree already conceptualizes insert locations similarly: it names the gaps `0` through `N`. It does not permit app-level specification of concurrent merges,
|
|
585
|
+
> but that degree of freedom doesn't need to be exposed.
|
|
586
|
+
|
|
587
|
+
Next, there are two types of range specifications: *set ranges* and *slice ranges*.
|
|
588
|
+
|
|
589
|
+
A *set range* targets exactly the objects in a given range at the time it was specified. In merge-tree terms, the segments that the range affects are
|
|
590
|
+
resolved from the perspective of the submitting client at its refSeq, and only those segments undergo whatever operation applies (move, annotate, remove).
|
|
591
|
+
|
|
592
|
+
> Merge-tree's `remove` operation has set range semantics, since it doesn't cause removal of any concurrently inserted segments.
|
|
593
|
+
> It's worth noting that a move operation with set range semantics is conceivable inside this framework, and not something merge-tree currently implements.
|
|
594
|
+
> E.g., if the set range "CDE" inside a string "ABCDEF" was moved to the end of the string, and someone concurrently moved "B" and "C" to the start,
|
|
595
|
+
> the string may end up "BAFCDE" or "BCAFDE" depending on the sequencing order of the moves.
|
|
596
|
+
|
|
597
|
+
Finally, a *slice range* specifies a start location and an end location, where a location has the same object shape as an insert destination: a gap plus a merge direction.
|
|
598
|
+
The range of nodes that the operation affects is interpreted at the time the operation applies, and any concurrent insertions/moves of content *into* that range
|
|
599
|
+
are also affected. The merge direction should be interpreted as relative to a "phantom segment" in the gap specifying the slice endpoint.
|
|
600
|
+
For example, in the string "ABCDE", the slice range
|
|
601
|
+
`[{ pos: 0, merge: <concurrent segments merge nearer> }, { pos: 3, merge: <concurrent segments merge further> })` referring to "ABC" would
|
|
602
|
+
not expand at either endpoint whereas if the merge options were flipped, it would expand at both endpoints.
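
A minimal sketch of the endpoint rule, using hypothetical types (`Place`, `SliceRange`) and a hypothetical helper `sliceIncludesInsertAt`. It assumes that "nearer" means concurrent content lands on the start-of-string side of the phantom segment and "further" means the end-of-string side.

```typescript
type MergeSide = "nearer" | "further";

// Same shape as an insert destination: a gap plus a merge direction.
interface Place {
    pos: number;
    merge: MergeSide;
}

interface SliceRange {
    start: Place;
    end: Place;
}

// Does content concurrently inserted into gap `gap` fall inside the slice?
function sliceIncludesInsertAt(range: SliceRange, gap: number): boolean {
    // At the start gap, content is inside only if it merges further than the phantom start.
    const afterStart =
        gap > range.start.pos || (gap === range.start.pos && range.start.merge === "further");
    // At the end gap, content is inside only if it merges nearer than the phantom end.
    const beforeEnd =
        gap < range.end.pos || (gap === range.end.pos && range.end.merge === "nearer");
    return afterStart && beforeEnd;
}

// "ABC" within "ABCDE", with the merge options from the example above.
const nonExpanding: SliceRange = {
    start: { pos: 0, merge: "nearer" },
    end: { pos: 3, merge: "further" },
};
console.log(sliceIncludesInsertAt(nonExpanding, 0)); // false: does not expand at the start
console.log(sliceIncludesInsertAt(nonExpanding, 1)); // true: strictly inside the slice
console.log(sliceIncludesInsertAt(nonExpanding, 3)); // false: does not expand at the end
```

Flipping both merge options makes the checks at gaps 0 and 3 return `true`, which is the "expands at both endpoints" case.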
|
|
603
|
+
|
|
604
|
+
> These semantics align with the proposed merge-tree `move` operation. Like insert, we can fix the direction things should merge
|
|
605
|
+
> (in this case it instead affects which way the move "expands") if consumers don't need the extra degrees of freedom.
|
|
606
|
+
|
|
607
|
+
Notice that because gaps don't move, this set of outcomes doesn't suffer from problems like a range specification becoming invalid (which happens with
|
|
608
|
+
how the legacy shared-tree assigns semantics to its ops, where each is relative to an id).
|
|
609
|
+
It also gives reasonable merge outcomes which basically amount to "first move wins."
|
|
610
|
+
Consider the following two troublesome cases of overlapping move.
|
|
611
|
+
|
|
612
|
+
#### Move within a move
|
|
613
|
+
|
|
614
|
+
```
|
|
615
|
+
// Initial state: "12345 AB CD"
|
|
616
|
+
{ seq: 1, refSeq: 0, clientId: 1, op: <move 2 through 4 to after "A"> } // (all of the op specification would actually be in terms of indices)
|
|
617
|
+
{ seq: 2, refSeq: 0, clientId: 2, op: <move 3 to after "C"> }
|
|
618
|
+
```
|
|
619
|
+
|
|
620
|
+
With this sequencing order, client 2's move finds that 3 has already been moved, so we'd end up with "15 A234B CD".
|
|
621
|
+
With the other order, we'd get "15 A24B C3D".
|
|
622
|
+
|
|
623
|
+
Both outcomes are reasonable; clients 1 and 2 effectively expressed opposing desires on where the 3 should go.
|
|
624
|
+
|
|
625
|
+
#### Move of a single endpoint past the other
|
|
626
|
+
|
|
627
|
+
```
|
|
628
|
+
// Initial state: "Paragraph 1<br>Paragraph 2<br>Paragraph 3<br>Paragraph 4<br>Paragraph 5"
|
|
629
|
+
{ seq: 1, refSeq: 0, clientId: 1, op: <move paragraphs 2 through 3 to the gap after paragraph 5> } // (all of the op specification would actually be in terms of indices)
|
|
630
|
+
{ seq: 2, refSeq: 0, clientId: 2, op: <move paragraphs 3 through 4 to the gap after paragraph 5> }
|
|
631
|
+
```
|
|
632
|
+
|
|
633
|
+
Client 1's op succeeds without conflict, giving an intermediate paragraph order of "14523".
|
|
634
|
+
Then client 2's op has a start endpoint targeting a tombstoned segment for paragraph 3, so it only affects paragraph 4.
|
|
635
|
+
The final state is "15423", since merge-tree places the later-sequenced content nearer (the same policy it uses for concurrent inserts).
|
|
636
|
+
|
|
637
|
+
If the ops are sequenced in the other order, the final state would instead be "15234".
|
|
638
|
+
|
|
639
|
+
Both of these outcomes are again generally plausible.
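
Both examples can be replayed with a toy model of "first move wins". This is a sketch under simplifying assumptions, not the proposed implementation: `Slot`, `MoveOp`, `applyMove`, and `replay` are hypothetical names, every op is expressed against the initial state (refSeq 0), concurrent inserts are not modeled, and moved content is placed immediately after the node preceding the destination gap so that later-sequenced content lands nearer.

```typescript
interface Slot {
    text: string;
    home: number | null; // index in the initial state, or null for a moved copy
    tombstone: boolean; // true once the node has been moved away from this slot
}

interface MoveOp {
    start: number; // first initial index in the range (inclusive)
    end: number; // last initial index in the range (exclusive)
    destGap: number; // gap k sits immediately before initial index k
}

function applyMove(slots: Slot[], op: MoveOp): void {
    const moved: Slot[] = [];
    for (const slot of slots) {
        const inRange = slot.home !== null && slot.home >= op.start && slot.home < op.end;
        // First move wins: a node that was already moved is left where it went.
        if (inRange && !slot.tombstone) {
            slot.tombstone = true;
            moved.push({ text: slot.text, home: null, tombstone: false });
        }
    }
    // Gaps don't move: anchor on the original slot before the gap, tombstoned or not.
    const anchor = slots.findIndex((slot) => slot.home === op.destGap - 1);
    slots.splice(anchor + 1, 0, ...moved);
}

function replay(initial: string, ops: MoveOp[]): string {
    const slots: Slot[] = initial
        .split("")
        .map((text, home) => ({ text, home, tombstone: false }));
    for (const op of ops) {
        applyMove(slots, op);
    }
    return slots
        .filter((slot) => !slot.tombstone)
        .map((slot) => slot.text)
        .join("");
}

// Move within a move: "12345 AB CD"
const move234AfterA: MoveOp = { start: 1, end: 4, destGap: 7 };
const move3AfterC: MoveOp = { start: 2, end: 3, destGap: 10 };
console.log(replay("12345 AB CD", [move234AfterA, move3AfterC])); // "15 A234B CD"
console.log(replay("12345 AB CD", [move3AfterC, move234AfterA])); // "15 A24B C3D"

// Move of a single endpoint past the other: one character per paragraph
const move23ToEnd: MoveOp = { start: 1, end: 3, destGap: 5 };
const move34ToEnd: MoveOp = { start: 2, end: 4, destGap: 5 };
console.log(replay("12345", [move23ToEnd, move34ToEnd])); // "15423"
console.log(replay("12345", [move34ToEnd, move23ToEnd])); // "15234"
```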
|
|
File without changes
|