webring 1.1.0 → 1.1.2
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/CHANGELOG.md +18 -0
- package/README.md +107 -30
- package/dist/cache.d.ts.map +1 -1
- package/dist/cache.js +5 -2
- package/dist/cache.js.map +1 -1
- package/dist/fetch.d.ts.map +1 -1
- package/dist/fetch.js.map +1 -1
- package/dist/index.test.js +35 -74
- package/dist/index.test.js.map +1 -1
- package/dist/types.d.ts +64 -0
- package/dist/types.d.ts.map +1 -1
- package/dist/types.js +26 -7
- package/dist/types.js.map +1 -1
- package/package.json +33 -12
- package/src/__snapshots__/index.test.ts.snap +3 -142
- package/src/cache.ts +5 -2
- package/src/fetch.ts +1 -0
- package/src/index.test.ts +39 -74
- package/src/testdata/rss-1.xml +2135 -0
- package/src/testdata/rss-10.xml +1029 -0
- package/src/testdata/rss-11.xml +881 -0
- package/src/testdata/rss-12.xml +4193 -0
- package/src/testdata/rss-13.xml +518 -0
- package/src/testdata/rss-14.xml +155 -0
- package/src/testdata/rss-15.xml +731 -0
- package/src/testdata/rss-16.xml +44 -0
- package/src/testdata/rss-17.xml +120 -0
- package/src/testdata/rss-18.xml +50 -0
- package/src/testdata/rss-19.xml +690 -0
- package/src/testdata/rss-2.xml +2186 -0
- package/src/testdata/rss-3.xml +80356 -0
- package/src/testdata/rss-4.xml +1601 -0
- package/src/testdata/rss-5.xml +3991 -0
- package/src/testdata/rss-6.xml +825 -0
- package/src/testdata/rss-7.xml +35 -0
- package/src/testdata/rss-8.xml +12516 -0
- package/src/testdata/rss-9.xml +369 -0
- package/src/types.ts +26 -8
@@ -0,0 +1,4193 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
+<title>samwho.dev</title>
+<subtitle>Personal website of Sam Rose.</subtitle>
+<link href="https://samwho.dev/rss.xml" rel="self" type="application/atom+xml"/>
+<link href="https://samwho.dev"/>
+<generator uri="https://www.getzola.org/">Zola</generator>
+<updated>2024-06-01T00:00:00+00:00</updated>
+<id>https://samwho.dev/rss.xml</id>
+
+<entry xml:lang="en">
+<title>A Commitment to Art and Dogs</title>
+<published>2024-06-01T00:00:00+00:00</published>
+<updated>2024-06-01T00:00:00+00:00</updated>
+<author>
+<name>Unknown</name>
+</author>
+<link rel="alternate" href="https://samwho.dev/dogs/" type="text/html"/>
+<id>https://samwho.dev/dogs/</id>
+
+<content type="html"><style>
+.dog-line {
+display: flex;
+flex-wrap: nowrap;
+flex-direction: row;
+width: 100%;
+height: 10rem;
+margin-top: 2rem;
+margin-bottom: 2rem;
+}
+
+.dog-line img {
+flex-grow: 1;
+height: auto;
+margin: 0;
+padding: 0;
+object-fit: contain;
+}
+
+.dog-grid {
+display: grid;
+grid-template-columns: repeat(4, 1fr);
+grid-gap: 1rem;
+margin-top: 2rem;
+margin-bottom: 2rem;
+}
+</style>
+<p>Back in <a href="/memory-allocation">Memory Allocation</a>, I introduced Haskie.</p>
+<blockquote class="haskie">
+<img src="/images/haskie-triumphant-200px.png" alt="A husky puppy looking triumphant" />
+<p>
+Hello!
+</p>
+</blockquote>
+<p>The idea behind Haskie was to create a character that could ask questions the
+reader might have, and to "soften" the posts to make them feel less
+intimidating. I got some feedback from people that Haskie was a bit too
+childish, and didn't feel like he belonged in posts about serious topics.
+This feedback was in the minority, though, and most people liked him. So I kept
+him and used him again in <a href="/hashing">Hashing</a>.</p>
+<p>Having a proxy to the reader was useful. I could anticipate areas of confusion
+and clear them up without creating enormous walls of text. I don't like it
+when the entire screen is filled with text, I like to break it up with images
+and interactive elements. And now dogs.</p>
+<p>Then in <a href="/bloom-filters">Bloom Filters</a>, I found myself needing a
+character to represent the "adult in the room." If Haskie was my proxy to the
+reader, this new character would serve as a proxy to all of the material I
+learned from in the writing of the post. This is Sage.</p>
+<blockquote class="haskie">
+<img src="/images/sage-happy-200px.png" alt="A husky looking happy" />
+<p>
+Well met!
+</p>
+</blockquote>
+<p>I liked the idea of having a cast of characters, each with their own personality
+and purpose. But I had a few problems.</p>
+<h2 id="problems"><a class="anchor" href="#problems">#</a>
+Problems</h2>
+<p>Both Haskie and Sage, because I have no artistic ability, were generated by AI.
+Back when I made them I was making no money from this blog, and I had no idea if
+I was going to keep them around. I didn't want to invest money in an idea that
+could flop, so I didn't feel bad about using AI to try it out.</p>
+<p>Since then, however, I have been paid twice to write posts for companies, and I
+know that I'm keeping the dogs. <strong>It wasn't ethical to continue piggybacking on
+AI</strong>.</p>
+<p>While ethics were the primary motivation, there were some other smaller problems
+with the dogs:</p>
+<ol>
+<li>The visual style of them, while I did like it, never felt like it fit
+with the rest of my personal brand.</li>
+<li>It was difficult to get AI to generate consistent dogs. You'll notice
+differences in coat colouration and features between variants of the same dog.</li>
+<li>The AI-generated images look bad at small sizes.</li>
+</ol>
+<p>So I worked with the wonderful <a href="https://www.andycarolan.com/">Andy
+Carolan</a> to create a new design for my dogs. A design that would be
+consistent, fit with my brand, and look good at any size.</p>
+<h2 id="haskie-sage-and-doe"><a class="anchor" href="#haskie-sage-and-doe">#</a>
+Haskie, Sage, and Doe</h2>
+<div class="dog-line">
+<img src="/images/dogs/haskie/default.svg" alt="A husky puppy called Haskie" />
+<img src="/images/dogs/sage/default.svg" alt="A husky called Sage" />
+<img src="/images/dogs/doe/default.svg" alt="A husky puppy called Doe" />
+</div>
+<p>The redesigned dogs are consistent, use simple colours and shapes, and use the
+SVG file format to look good at any size. Each variant clocks in at around
+20kb, which is slightly larger than the small AI-generated images, but I'll be
+able to use them at any size.</p>
+<p>Together the dogs represent a family unit: Sage as the dad, Haskie as the
+youngest child, and Doe as his older sister.</p>
+<p>They also come in a variety of poses, so I can use them to represent different
+emotions or actions.</p>
+<div class="dog-grid">
+<img src="/images/dogs/haskie/bored.svg" alt="A husky puppy called Haskie looking bored" />
+<img src="/images/dogs/haskie/concerned.svg" alt="A husky puppy called Haskie looking concerned" />
+<img src="/images/dogs/haskie/confused.svg" alt="A husky puppy called Haskie looking confused" />
+<img src="/images/dogs/haskie/triumphant.svg" alt="A husky puppy called Haskie looking triumphant" />
+<img src="/images/dogs/sage/caution2.svg" alt="A husky called Sage looking cautioning" />
+<img src="/images/dogs/sage/caution.svg" alt="A husky called Sage looking cautioning" />
+<img src="/images/dogs/sage/despair.svg" alt="A husky called Sage looking despaired" />
+<img src="/images/dogs/sage/proud.svg" alt="A husky called Sage looking proud" />
+<img src="/images/dogs/doe/amazed.svg" alt="A husky puppy called Doe looking amazed" />
+<img src="/images/dogs/doe/mischief.svg" alt="A husky puppy called Doe looking mischievous" />
+<img src="/images/dogs/doe/protective.svg" alt="A husky puppy called Doe looking protective" />
+<img src="/images/dogs/doe/proud.svg" alt="A husky puppy called Doe looking proud" />
+</div>
+<p>We were careful to make the dogs recognisable apart. They differ in colour, ear
+shape, tail shape, and collar tag. Sage and Doe have further distinguishing
+features: Sage with his glasses, and Doe with her bandana. Doe's bandana uses
+the same colours as the <a
+href="https://en.wikipedia.org/wiki/Transgender_flag"> transgender flag</a>,
+to show my support for the trans community and as a nod to her identity.</p>
+<h2 id="going-forward"><a class="anchor" href="#going-forward">#</a>
+Going forward</h2>
+<p>I'm so happy with the new dogs, and plan to use them in my posts going forward.
+I suspect I will, at some point, replace the dogs in my old posts as well.
+I don't plan to add any more characters, and I want to be careful to avoid
+overusing them. I don't want them to become a crutch, or to distract from the
+content of the posts.</p>
+<p>I also haven't forgotten the many people that pointed out to me that you can't
+pet the dogs. I'm working on it.</p>
+</content>
+
+</entry>
+
+
+
+
+<entry xml:lang="en">
+<title>Bloom Filters</title>
+<published>2024-02-19T00:00:00+00:00</published>
+<updated>2024-02-19T00:00:00+00:00</updated>
+<author>
+<name>Unknown</name>
+</author>
+<link rel="alternate" href="https://samwho.dev/bloom-filters/" type="text/html"/>
+<id>https://samwho.dev/bloom-filters/</id>
+
+<content type="html"><style>
+.bf {
+width: 100%;
+height: 150px;
+}
+
+@media only screen and (min-width: 320px) and (max-width: 479px) {
+.bf {
+height: 200px;
+}
+}
+
+@media only screen and (min-width: 480px) and (max-width: 676px) {
+.bf {
+height: 200px;
+}
+}
+
+@media only screen and (min-width: 677px) and (max-width: 991px) {
+.bf {
+height: 150px;
+}
+}
+
+form {
+display: flex;
+flex-direction: column;
+align-items: center;
+justify-content: stretch;
+}
+
+input {
+border: 1px solid rgb(119, 119, 119);
+padding: 0.25rem;
+border-radius: 0.25rem;
+height: 2em;
+line-height: 2em;
+}
+
+.aside {
+padding: 2rem;
+width: 100vw;
+position: relative;
+margin-left: -50vw;
+left: 50%;
+background-color: #eeeeee;
+
+display: flex;
+align-items: center;
+flex-direction: column;
+}
+
+.aside > * {
+flex-grow: 1;
+}
+
+.aside p {
+padding-left: 1rem;
+padding-right: 1rem;
+max-width: 780px;
+font-style: italic;
+font-family: Lora, serif;
+text-align: center;
+}
+
+</style>
+<noscript>
+<div class=aside>
+<p>
+This page makes heavy use of JavaScript to visualise the concepts discussed.
+Viewing it without JavaScript will be a strange experience, as the text
+talks about the visualisations. I strongly recommend either enabling
+JavaScript, or not wasting your time.
+</p>
+</div>
+</noscript>
+<p>Everyone has a set of tools they use to solve problems. Growing this set helps
+you to solve ever more difficult problems. In this post, I'm going to teach you
+about a tool you may not have heard of before. It's a niche tool that won't
+apply to many problems, but when it does you'll find it invaluable. It's called
+a "bloom filter."</p>
+<s-dog name=sage mode=warning>
+<b style="color: #d08770">Before you continue!</b> This post
+assumes you know what a <b>hash function</b> is, and if you don't it's going
+to be tricky to understand. Sam has written a post about hash functions, and
+recommends that you <b><a href="/hashing">read this first</a>.</b>
+</s-dog>
+<h3 id="what-bloom-filters-can-do"><a class="anchor" href="#what-bloom-filters-can-do">#</a>
+What bloom filters can do</h3>
+<p>Bloom filters are similar to the <code>Set</code> data structure. You can add items to
+them, and check if an item is present. Here's what it might look like to use
+a bloom filter in JavaScript, using a made-up <code>BloomFilter</code> class:</p>
+<pre data-lang="javascript" style="background-color:#2e3440;color:#d8dee9;" class="language-javascript "><code class="language-javascript" data-lang="javascript"><span style="color:#81a1c1;">let </span><span>bf </span><span style="color:#81a1c1;">= new </span><span style="color:#8fbcbb;">BloomFilter</span><span>()</span><span style="color:#eceff4;">;
+</span><span>bf</span><span style="color:#81a1c1;">.</span><span style="color:#88c0d0;">add</span><span>(</span><span style="color:#a3be8c;">&quot;Ant&quot;</span><span>)</span><span style="color:#eceff4;">;
+</span><span>bf</span><span style="color:#81a1c1;">.</span><span style="color:#88c0d0;">add</span><span>(</span><span style="color:#a3be8c;">&quot;Rhino&quot;</span><span>)</span><span style="color:#eceff4;">;
+</span><span>bf</span><span style="color:#81a1c1;">.</span><span style="color:#88c0d0;">contains</span><span>(</span><span style="color:#a3be8c;">&quot;Ant&quot;</span><span>)</span><span style="color:#eceff4;">; </span><span style="color:#616e88;">// true
+</span><span>bf</span><span style="color:#81a1c1;">.</span><span style="color:#88c0d0;">contains</span><span>(</span><span style="color:#a3be8c;">&quot;Rhino&quot;</span><span>)</span><span style="color:#eceff4;">; </span><span style="color:#616e88;">// true
+</span></code></pre>
+<p>While this looks almost identical to a <code>Set</code>, there are some key differences.
+Bloom filters are what's called a <strong>probabilistic data structure</strong>. Where a
+<code>Set</code> can give you a concrete "yes" or "no" answer when you call <code>contains</code>, a
+bloom filter can't. Bloom filters can give definite "no"s, but they can't be
+certain about "yes."</p>
+<p>In the example above, when we ask <code>bf</code> if it contains <code>"Ant"</code> and <code>"Rhino"</code>, the
+<code>true</code> that it returns isn't a guarantee that they're present. We know that
+they're present because we added them just a couple of lines before, but it
+would be possible for this to happen:</p>
+<pre data-lang="javascript" style="background-color:#2e3440;color:#d8dee9;" class="language-javascript "><code class="language-javascript" data-lang="javascript"><span style="color:#81a1c1;">let </span><span>bf </span><span style="color:#81a1c1;">= new </span><span style="color:#8fbcbb;">BloomFilter</span><span>()</span><span style="color:#eceff4;">;
+</span><span>bf</span><span style="color:#81a1c1;">.</span><span style="color:#88c0d0;">add</span><span>(</span><span style="color:#a3be8c;">&quot;Ant&quot;</span><span>)</span><span style="color:#eceff4;">;
+</span><span>bf</span><span style="color:#81a1c1;">.</span><span style="color:#88c0d0;">add</span><span>(</span><span style="color:#a3be8c;">&quot;Rhino&quot;</span><span>)</span><span style="color:#eceff4;">;
+</span><span>bf</span><span style="color:#81a1c1;">.</span><span style="color:#88c0d0;">contains</span><span>(</span><span style="color:#a3be8c;">&quot;Fox&quot;</span><span>)</span><span style="color:#eceff4;">; </span><span style="color:#616e88;">// true
+</span></code></pre>
+<p>We'll demonstrate <em>why</em> over the course of this post. For now, we'll say that
+when bloom filters return <code>true</code> it doesn't mean "yes", it means "maybe". When
+this happens and the item has never been added before, it's called a
+<strong>false-positive</strong>.</p>
+<p>The opposite, claiming "no" when the answer is "yes," is called a
+<strong>false-negative</strong>. A bloom filter will <em>never</em> give a false-negative, and this
+is what makes them useful.</p>
+<s-dog name=haskie mode=confused>
+A data structure that lies to you?! How could that possibly be useful?
+</s-dog>
+<p>It's not strictly lying, it's just not giving you a definite answer. Let's look
+at an example where we can use this property to our advantage.</p>
+<h3 id="when-bloom-filters-are-useful"><a class="anchor" href="#when-bloom-filters-are-useful">#</a>
+When bloom filters are useful</h3>
+<p>Imagine you're building a web browser, and you want to protect users from
+malicious links. You could build and maintain a list of all known malicious
+links and check the list every time a user navigates the browser. If the link
+they're trying to visit is in the list, you warn the user that they might be
+about to visit a malicious website.</p>
+<p>If we assume there are, say, 1,000,000 malicious links on the Internet, and each
+link is 20 characters long, then the list of malicious links would be 20MB in
+size. This isn't a huge amount of data, but it's not small either. If you have
+lots of users and want to keep this list up to date, the bandwidth could add up.</p>
+<p>However, if you're happy to accept being wrong 0.0001% of the time (1 in a
+million), you could use a bloom filter which can store the same data in 3.59MB.
+That's an 82% reduction in size, and all it costs you is showing the user an
+incorrect warning 1 in every million links visited. If you wanted to take it
+even further, and you were happy to accept being wrong 0.1% of the time (1 in
+1000), the bloom filter would only be 1.8MB.</p>
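The 3.59MB and 1.8MB figures quoted in the feed above can be reproduced with the standard bloom filter sizing formula, m = -n·ln(p) / (ln 2)², which gives the number of bits m needed to hold n items at false-positive rate p. A quick sketch (the function name `bloomSizeMB` is my own, not anything from the package or the post):

```javascript
// Megabytes needed for a bloom filter holding `items` entries at
// false-positive rate `fpr`, using m = -n * ln(p) / (ln 2)^2.
function bloomSizeMB(items, fpr) {
  const bits = (-items * Math.log(fpr)) / Math.log(2) ** 2;
  return bits / 8 / 1000 / 1000; // bits -> bytes -> megabytes
}

bloomSizeMB(1000000, 0.000001); // ≈ 3.59, matching the figure quoted above
bloomSizeMB(1000000, 0.001);    // ≈ 1.80
```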
+<p>This use-case isn't hypothetical, either. Google Chrome used a bloom filter for
+this exact purpose until 2012. If you were worried about showing a warning when
+it wasn't needed, you could always make an API that has the full list of
+malicious links in a database. When the bloom filter says "maybe," you would
+then make an API call to check the full list to be sure. No more spurious
+warnings, and the bloom filter would save you from having to call the API for
+every link visited.</p>
+<h3 id="how-bloom-filters-work"><a class="anchor" href="#how-bloom-filters-work">#</a>
+How bloom filters work</h3>
+<p>At its core, a bloom filter is an array of <s-word>bits</s-word>. When it is
+created, all of the <s-word>bits</s-word> are set to 0. We're going to represent
+this as a grid of circles, with each circle representing 1 <s-word>bit</s-word>.
+Our bloom filters in this post are all going to have 32 <s-word>bits</s-word> in
+total.</p>
+<p><s-bloom-filter id=bf0 class="bf" hashes=sha1,sha256,sha512 bits=32></s-bloom-filter></p>
+<s-dog name=sam padding=false>
+I'm experimenting with alternate colour palettes. If you find the above
+difficult to read, or just don't like it, please try <a
+href="?palette=tol#bf0">this one</a> and let me know what you think.
+<a href="?palette=wong#bf0">Click here</a> to go back to normal.
+</s-dog>
+<p>To add an item to the bloom filter, we're going to hash it with 3 different hash
+functions, then use the 3 resulting values to set 3 <s-word>bits</s-word>. If
+you're not familiar with hashing, I recommend reading <a href="/hashing">my post</a> about
+it before continuing.</p>
+<p>For this post I'm choosing to use 3 of the
+<a href="https://en.wikipedia.org/wiki/Secure_Hash_Algorithms">SHA</a> family of hash
+functions: <s-word>sha1</s-word>, <s-word>sha256</s-word>, and
+<s-word>sha512</s-word>. Here's what our bloom filter looks like if we add the
+value "foo" to it:</p>
+<s-bloom-filter id=bf1 class="bf" hashes=sha1,sha256,sha512 bits=32>
+<add value="foo">
+</s-bloom-filter>
+<p>The <s-word>bits</s-word> in positions <s-bitlink bfid=bf1
+highlight=true>15</s-bitlink>, <s-bitlink bfid=bf1>16</s-bitlink> and <s-bitlink
+bfid=bf1>27</s-bitlink> have been set. Other <s-word>bits</s-word>, e.g.
+<s-bitlink bfid=bf1>1</s-bitlink> have not been set. You can hover or tap the
+<s-word>bits</s-word> in this paragraph to highlight them in the visualisation.
+We get to this state by taking the hash value of "foo" for each of our 3 hash
+functions and modulo it by the number of <s-word>bits</s-word> in our bloom
+filter. Modulo gets us the remainder when dividing by 32, so we get 27 with
+<s-word>sha1</s-word>, 15 with <s-word>sha256</s-word> and 16 with
+<s-word>sha512</s-word>. The table below shows what's happening, and you can try
+inputting your own values to see what <s-word>bits</s-word> they would set if
+added.</p>
+<div style="display: flex; justify-content: center; margin-top: 1em">
+<input
+type="text"
+value=foo
+style="
+width: 20em;
+max-width: 100%;
+margin-left: 0.5em;
+margin-right: 0.5em;
+"
+oninput="
+let hv = document.getElementById('hv1');
+hv.setAttribute('value', this.value);
+"
+/>
+</div>
+<hash-values id="hv1" bfid=bf1 value=foo mod=32></hash-values>
+<p>Go ahead and <s-word>add</s-word> a few of your own values to our bloom filter
+below and see what happens. There's also a <s-word>check</s-word> button that
+will tell you if a value is present within the bloom filter. A value is only
+considered present if all of the <s-word>bits</s-word> checked are set. You can
+start again by hitting the <s-word>clear</s-word> button.</p>
+<p><s-word-adder bfid=bf2></s-word-adder>
+<s-bloom-filter id=bf2 hashes="sha1,sha256,sha512" bits=32 class=bf></s-bloom-filter></p>
+<p>You might occasionally notice that only 2, or even 1, <s-word>bits</s-word> get
+set. This happens when 2 or more of our hash functions produce the same value,
+or we attempt to set a <s-word>bit</s-word> that has already been set. Taking
+that a bit further, have a think about the implications of a bloom filter that
+has every <s-word>bit</s-word> set.</p>
+<s-dog name=haskie mode=concerned>
+Hmm... If every <s-word>bit</s-word> is set, then won't the bloom filter
+claim it contains every item you check? That's a false-positive every
+time!
+</s-dog>
+<p>Exactly right. A bloom filter with every <s-word>bit</s-word> set is equivalent
+to a <code>Set</code> that always returns <code>true</code> for <code>contains</code>. It will claim to contain
+everything you ask it about, even if that thing was never added.</p>
+<h2 id="false-positive-rates"><a class="anchor" href="#false-positive-rates">#</a>
+False-positive rates</h2>
+<p>The rate of false-positives in our bloom filter will grow as the percentage of
+set <s-word>bits</s-word> increases. Drag the slider below the graph to see how
+the false-positive rate changes as the number of set <s-word>bits</s-word>
+increases.</p>
+<s-graph id=graph1 style="width: 100%; aspect-ratio: 2/1;" drawupto=0>
+<axes>
+<x tics=0.2 ticformat="(tic * 100).toFixed(0) + '%'" label="Bits set" max=1>
+<y tics=0.2 ticformat="(tic * 100).toFixed(0) + '%'" label="Chance of false-positive" max=1>
+</axes>
+<lines>
+<line y="Math.pow(x, 3)">
+</lines>
+</s-graph>
+<s-slider
+value=0
+min=0
+max=1
+step=any
+onchange="
+document.getElementById('graph1').setAttribute('drawupto', e.target.value)
+">
+</s-slider>
+<p>It grows slowly at first, but as we get closer to having all
+<s-word>bits</s-word> set the rate increases. This is because we calculate the
+false-positive rate as <code>x^3</code>, where <code>x</code> is the percentage of set
+<s-word>bits</s-word> and <code>3</code> is the number of hash functions used. To give an
+example of why we calculate it with this formula, imagine we have a bloom filter
+with half of its bits set, <code>x = 0.5</code>. If we assume that our hash function has
+an equal chance of setting any of the bits, then the chance that all 3 hash
+functions set a bit that is already set is <code>0.5 * 0.5 * 0.5</code>, or <code>x^3</code>.</p>
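The estimate described in that paragraph is small enough to write down directly. A one-line sketch (the function name `falsePositiveRate` is my own):

```javascript
// Chance that all k probed bits are already set when a fraction x of the
// filter's bits are set: x multiplied by itself k times, i.e. x^k.
function falsePositiveRate(x, k) {
  return Math.pow(x, k);
}

falsePositiveRate(0.5, 3); // 0.125, the 0.5 * 0.5 * 0.5 example above
```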
+<p>Let's have a look at the false-positive rate of bloom filters that use different
+numbers of hash functions.</p>
+<s-graph id=graph2 style="width: 100%; aspect-ratio: 2/1;" drawupto=0>
+<legend position=top-left>
+<axes>
+<x tics=0.2 ticformat="(tic * 100).toFixed(0) + '%'" label="Bits set" max=1>
+<y tics=0.2 ticformat="(tic * 100).toFixed(0) + '%'" label="Chance of false-positive" max=1>
+</axes>
+<lines>
+<line y="Math.pow(x, 1)" label="k=1">
+<line y="Math.pow(x, 2)" label="k=2">
+<line y="Math.pow(x, 3)" label="k=3">
+<line y="Math.pow(x, 4)" label="k=4">
+<line y="Math.pow(x, 5)" label="k=5">
+</lines>
+</s-graph>
+<s-slider
+value=0
+min=0
+max=1
+step=any
+onchange="
+document.getElementById('graph2').setAttribute('drawupto', e.target.value)
+">
+</s-slider>
+<s-dog name=haskie>
+It looks like the more hash functions we use, the better our false-positive rate
+is. Doesn't that mean we should always use lots of hash functions? Why don't
+we use, like, 100?
+</s-dog>
+<p>The problem that using lots of hash functions introduces is that it makes the
+bloom filter fill up faster. The more hash functions you use, the more
+<s-word>bits</s-word> get set for each item you add. There's also the cost of
+hashing itself. Hash functions aren't free, and while the hash functions you'd
+use in a bloom filter try to be as fast as possible, it's still more expensive
+to run 100 of them than it is to run 3.</p>
+<p>It's possible to calculate how full a bloom filter will be after inserting a
+number of items, based on the number of hash functions used. The graph below
+assumes a bloom filter with 1000 <s-word>bits</s-word>.</p>
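The fill curves the feed plots follow from a short expression: each insertion performs k bit-setting operations, each of which leaves any given bit unset with probability (1 - 1/m), so after n insertions the expected number of set bits is m(1 - (1 - 1/m)^(kn)). A sketch, with a function name of my own choosing:

```javascript
// Expected number of set bits in an m-bit bloom filter after n insertions
// using k hash functions: m * (1 - (1 - 1/m)^(k*n)).
function expectedBitsSet(m, k, n) {
  return m * (1 - Math.pow(1 - 1 / m, k * n));
}

expectedBitsSet(1000, 3, 0); // 0: nothing added yet
```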
+<s-graph id=graph3 style="width: 100%; aspect-ratio: 2/1;" drawupto=0>
+<legend position=top-left>
+<axes>
+<x tics=100 label="Items added" max=1000>
+<y tics=100 label="Bits set" max=1000>
+</axes>
+<lines>
+<line y="1000 * (1 - Math.pow(1 - 1/1000, 5 * x))" label="k=5">
+<line y="1000 * (1 - Math.pow(1 - 1/1000, 4 * x))" label="k=4">
+<line y="1000 * (1 - Math.pow(1 - 1/1000, 3 * x))" label="k=3">
+<line y="1000 * (1 - Math.pow(1 - 1/1000, 2 * x))" label="k=2">
+<line y="1000 * (1 - Math.pow(1 - 1/1000, 1 * x))" label="k=1">
+</lines>
+</s-graph>
+<s-slider
+value=0
+min=0
+max=1000
+step=any
+onchange="
+document.getElementById('graph3').setAttribute('drawupto', e.target.value)
+">
+</s-slider>
+<p>The more hash functions we use, the faster we set all of the bits. You'll notice that
+the curve tails off as more items are added. This is because the more
+<s-word>bits</s-word> that are set, the more likely it is that we'll attempt to
+set a <s-word>bit</s-word> that has already been set.</p>
+<p>In practice, 1000 <s-word>bits</s-word> is a very small bloom filter, occupying
+only 125 bytes of memory. Modern computers have a lot of memory, so let's crank this
+up to 100,000 <s-word>bits</s-word> (12.5kB) and see what happens.</p>
+<s-graph id=graph4 style="width: 100%; aspect-ratio: 2/1;" drawupto=0>
+<legend position=top-left>
+<axes>
+<x tics=100 label="Items added" max=1000>
+<y tics=10000 ticformat="(tic / 1000).toFixed(0) + 'k'" label="Bits set" max=100000>
+</axes>
+<lines>
+<line y="100000 * (1 - Math.pow(1 - 1/100000, 5 * x))" label="k=5">
+<line y="100000 * (1 - Math.pow(1 - 1/100000, 4 * x))" label="k=4">
+<line y="100000 * (1 - Math.pow(1 - 1/100000, 3 * x))" label="k=3">
+<line y="100000 * (1 - Math.pow(1 - 1/100000, 2 * x))" label="k=2">
+<line y="100000 * (1 - Math.pow(1 - 1/100000, 1 * x))" label="k=1">
+</lines>
+</s-graph>
+<s-slider
+value=0
+min=0
+max=1000
+step=any
+onchange="
+document.getElementById('graph4').setAttribute('drawupto', e.target.value)
+">
+</s-slider>
+<p>The lines barely leave the bottom of the graph, meaning the bloom filter will
+be very empty and the false-positive rate will be low. All this cost us was
+12.5kB of memory, which is still a very small amount by 2024 standards.</p>
+<h2 id="tuning-a-bloom-filter"><a class="anchor" href="#tuning-a-bloom-filter">#</a>
+Tuning a bloom filter</h2>
+<p>Picking the correct number of hash functions and <s-word>bits</s-word> for a bloom
+filter is a fine balance. Fortunately for us, if we know up-front how many
+unique items we want to store, and what our desired false-positive rate is, we
+can calculate the optimal number of hash functions, and the required number of
+<s-word>bits</s-word>.</p>
+<p>The <a href="https://en.wikipedia.org/wiki/Bloom_filter">bloom filter</a> page on Wikipedia
+covers the mathematics involved, which I'm going to translate into JavaScript
+functions for us to use. I want to stress that you don't need to understand the
+maths to use a bloom filter or read this post. I'm including the link to it only
+for completeness.</p>
+<h3 id="optimal-number-of-bits"><a class="anchor" href="#optimal-number-of-bits">#</a>
+Optimal number of bits</h3>
+<p>The following JavaScript function, which might look a bit scary but bear with
+me, takes the number of items you want to store (<code>items</code>) and the desired
+false-positive rate (<code>fpr</code>, where 1% == <code>0.01</code>), and returns how many
+<s-word>bits</s-word> you will need to achieve that false-positive rate.</p>
+<pre data-lang="javascript" style="background-color:#2e3440;color:#d8dee9;" class="language-javascript "><code class="language-javascript" data-lang="javascript"><span style="color:#81a1c1;">function </span><span style="color:#88c0d0;">bits</span><span>(items</span><span style="color:#eceff4;">, </span><span>fpr) {
+</span><span>  </span><span style="color:#81a1c1;">const </span><span style="font-weight:bold;color:#d8dee9;">n </span><span style="color:#81a1c1;">= -</span><span>items </span><span style="color:#81a1c1;">* </span><span style="color:#8fbcbb;">Math</span><span style="color:#81a1c1;">.</span><span style="font-style:italic;color:#88c0d0;">log</span><span>(fpr)</span><span style="color:#eceff4;">;
|
|
529
|
+
</span><span> </span><span style="color:#81a1c1;">const </span><span style="font-weight:bold;color:#d8dee9;">d </span><span style="color:#81a1c1;">= </span><span style="color:#8fbcbb;">Math</span><span style="color:#81a1c1;">.</span><span style="font-style:italic;color:#88c0d0;">log</span><span>(</span><span style="color:#b48ead;">2</span><span>) </span><span style="color:#81a1c1;">** </span><span style="color:#b48ead;">2</span><span style="color:#eceff4;">;
|
|
530
|
+
</span><span> </span><span style="color:#81a1c1;">return </span><span style="color:#8fbcbb;">Math</span><span style="color:#81a1c1;">.</span><span style="font-style:italic;color:#88c0d0;">ceil</span><span>(n </span><span style="color:#81a1c1;">/ </span><span>d)</span><span style="color:#eceff4;">;
|
|
531
|
+
</span><span>}
|
|
532
|
+
</span></code></pre>
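As a quick sanity check, here's the same function as plain code with a worked example. The numbers in the comment are my own, not from the post:

```javascript
// bits() as defined above, repeated so this snippet runs on its own.
function bits(items, fpr) {
  const n = -items * Math.log(fpr);
  const d = Math.log(2) ** 2;
  return Math.ceil(n / d);
}

// Storing 1,000 items with a 1% false-positive rate needs 9,586 bits,
// which is only about 1.2kB of memory.
console.log(bits(1000, 0.01)); // 9586
```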
<p>We can see how this grows for a variety of <code>fpr</code> values in the graph below.</p>
<s-graph id=graph5 style="width: 100%; aspect-ratio: 2/1;" drawupto=0>
<legend position=top-left>
<axes>
<x tics=100 label="Items you plan to add" max=1000>
<y tics=1000 ticformat="(tic / 1000).toFixed(0) + 'k'" label="Bits required" max=10000>
</axes>
<lines>
<line y="Math.ceil((-x * Math.log(0.01)) / (Math.log(2) ** 2))" label="fpr=0.01 (1%)">
<line y="Math.ceil((-x * Math.log(0.05)) / (Math.log(2) ** 2))" label="fpr=0.05 (5%)">
<line y="Math.ceil((-x * Math.log(0.10)) / (Math.log(2) ** 2))" label="fpr=0.1 (10%)">
</lines>
</s-graph>
<s-slider
value=0
min=0
max=1000
step=any
onchange="
document.getElementById('graph5').setAttribute('drawupto', e.target.value)
">
</s-slider>
<h3 id="optimal-number-of-hash-functions"><a class="anchor" href="#optimal-number-of-hash-functions">#</a>
Optimal number of hash functions</h3>
<p>After we've used the JavaScript above to calculate how many
<s-word>bits</s-word> we need, we can use the following function to calculate
the optimal number of hash functions to use:</p>
<pre data-lang="javascript" style="background-color:#2e3440;color:#d8dee9;" class="language-javascript "><code class="language-javascript" data-lang="javascript"><span style="color:#81a1c1;">function </span><span style="color:#88c0d0;">hashFunctions</span><span>(bits</span><span style="color:#eceff4;">, </span><span>items) {
</span><span> </span><span style="color:#81a1c1;">return </span><span style="color:#8fbcbb;">Math</span><span style="color:#81a1c1;">.</span><span style="font-style:italic;color:#88c0d0;">ceil</span><span>((bits </span><span style="color:#81a1c1;">/ </span><span>items) </span><span style="color:#81a1c1;">* </span><span style="color:#8fbcbb;">Math</span><span style="color:#81a1c1;">.</span><span style="font-style:italic;color:#88c0d0;">log</span><span>(</span><span style="color:#b48ead;">2</span><span>))</span><span style="color:#eceff4;">;
</span><span>}
</span></code></pre>
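Continuing the worked example from before (again, my numbers rather than the post's): feeding in the bit count needed for 1,000 items at a 1% false-positive rate gives the optimal number of hash functions for that configuration.

```javascript
// hashFunctions() as defined above, repeated so this snippet runs on its own.
function hashFunctions(bits, items) {
  return Math.ceil((bits / items) * Math.log(2));
}

// The ~9.6k bits required for 1,000 items at 1% works out to 7 hash functions.
console.log(hashFunctions(9586, 1000)); // 7
```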
<p>Pause for a second here and have a think about how the number of hash functions
might grow based on the size of the bloom filter and the number of items you
expect to add. Do you think you'll use more hash functions, or fewer, as the
bloom filter gets larger? What about as the number of items increases?</p>
<s-graph id=graph6 style="width: 100%; aspect-ratio: 2/1;" drawupto=0>
<legend position=top-left>
<axes>
<x tics=1000 ticformat="(tic / 1000).toFixed(0) + 'k'" label="Bits in bloom filter" max=10000>
<y tics=1 label="Optimal hash functions" max=10>
</axes>
<lines>
<line y="Math.ceil((x / 100) * Math.log(2))" label="items = 100">
<line y="Math.ceil((x / 1000) * Math.log(2))" label="items = 1000">
<line y="Math.ceil((x / 5000) * Math.log(2))" label="items = 5000">
</lines>
</s-graph>
<s-slider
value=0
min=0
max=10000
step=any
onchange="
document.getElementById('graph6').setAttribute('drawupto', e.target.value)
">
</s-slider>
<p>The more items you plan to add, the fewer hash functions you should use. Yet, a
larger bloom filter means you can use more hash functions. More hash functions
keep the false-positive rate lower for longer, but more items fill up the bloom
filter faster. It's a complex balancing act, and I am thankful that
mathematicians have done the hard work of figuring it out for us.</p>
<h3 id="caution"><a class="anchor" href="#caution">#</a>
Caution</h3>
<p>While we can stand on the shoulders of giants and pick the optimal number of
<s-word>bits</s-word> and hash functions for our bloom filter, it's important to
remember that these rely on you giving good estimates of the number of items you
expect to add, and choosing a false-positive rate that's acceptable for your
use-case. These numbers might be difficult to come up with, and I recommend
erring on the side of caution. If you're not sure, it's likely better to use a
larger bloom filter than you think you need.</p>
<h2 id="removing-items-from-a-bloom-filter"><a class="anchor" href="#removing-items-from-a-bloom-filter">#</a>
Removing items from a bloom filter</h2>
<p>We've spent the whole post talking about adding things to a bloom filter, and
the optimal parameters to use. We haven't spoken at all about removing items.</p>
<p>And that's because you can't!</p>
<p>In a bloom filter, we're using <s-word>bits</s-word>, individual 1s and 0s, to
track the presence of items. If we were to remove an item by setting its
<s-word>bits</s-word> to 0, we might also be removing other items by accident.
There's no way of knowing.</p>
<p>Click the buttons of the bloom filter below to see this in action. First we will
add "foo", then "baz", and then we will remove "baz". Hit "clear" if you want
to start again.</p>
<s-add-remove bfid=bf3>
<add value="foo">
<add value="baz">
<remove value="baz">
<clear>
</s-add-remove>
<s-bloom-filter id=bf3 hashes="sha1,sha256,sha512" bits=32 class=bf></s-bloom-filter>
<p>The end result of this sequence is a bloom filter that doesn't contain "baz",
but doesn't contain "foo" either. Because both "foo" and "baz" set
<s-word>bit</s-word> <s-bitlink bfid=bf3>27</s-bitlink>, we accidentally clobber
the presence of "foo" while removing "baz".</p>
<p>Something else you might have noticed playing with the above example is that if
you add "foo" and then attempt to remove "baz" before having added it, nothing
happens. Even though <s-bitlink bfid=bf3>27</s-bitlink> is set,
<s-word>bits</s-word> <s-bitlink bfid=bf3>18</s-bitlink> and <s-bitlink
bfid=bf3>23</s-bitlink> are not, so the bloom filter cannot contain "baz".
Because of this, it won't unset <s-bitlink bfid=bf3>27</s-bitlink>.</p>
<h3 id="counting-bloom-filters"><a class="anchor" href="#counting-bloom-filters">#</a>
Counting bloom filters</h3>
<p>While you can't remove items from a standard bloom filter, there are variants
that allow you to do so. One of these variants is called a "counting bloom
filter," which uses an array of counters instead of bits to keep track of items.</p>
<s-add-remove bfid=bf4>
<add value="foo">
<add value="baz">
<remove value="baz">
<clear>
</s-add-remove>
<s-bloom-filter id=bf4 counting=true hashes="sha1,sha256,sha512" bits=32 class=bf></s-bloom-filter>
<p>Now when you go through the sequence, the end result is that the bloom filter
still contains "foo." It solves the problem.</p>
<p>The trade-off, though, is that counters are bigger than <s-word>bits</s-word>.
With 4 bits per counter you can increment up to 15. With 8 bits per counter you
can increment up to 255. You'll need to pick a counter size sufficient to never
reach the maximum value, otherwise you risk corrupting the bloom filter. Using
8x more memory than a standard bloom filter could be a big deal, especially if
you're using a bloom filter to save memory in the first place. Think hard about
whether you really need to be able to remove items from your bloom filter.</p>
<p>Counting bloom filters also introduce the possibility of false-negatives, which
are impossible in standard bloom filters. Consider the following example.</p>
<s-add-remove bfid=bf5>
<add value="loved">
<add value="your">
<remove value="response">
<clear>
</s-add-remove>
<s-bloom-filter id=bf5 counting=true hashes="sha1,sha256,sha512" bits=32 class=bf></s-bloom-filter>
<p>Because "loved" and "response" both hash to the <s-word>bits</s-word>
<s-bitlink bfid=bf5>5</s-bitlink>, <s-bitlink bfid=bf5>22</s-bitlink>, and
<s-bitlink bfid=bf5>26</s-bitlink>, when we remove "response" we also remove "loved". If
we write this as JavaScript the problem becomes clearer:</p>
<pre data-lang="javascript" style="background-color:#2e3440;color:#d8dee9;" class="language-javascript "><code class="language-javascript" data-lang="javascript"><span style="color:#81a1c1;">let </span><span>bf </span><span style="color:#81a1c1;">= new </span><span style="color:#8fbcbb;">CountingBloomFilter</span><span>()</span><span style="color:#eceff4;">;
</span><span>bf</span><span style="color:#81a1c1;">.</span><span style="color:#88c0d0;">add</span><span>(</span><span style="color:#a3be8c;">&quot;loved&quot;</span><span>)</span><span style="color:#eceff4;">;
</span><span>bf</span><span style="color:#81a1c1;">.</span><span style="color:#88c0d0;">add</span><span>(</span><span style="color:#a3be8c;">&quot;your&quot;</span><span>)</span><span style="color:#eceff4;">;
</span><span>bf</span><span style="color:#81a1c1;">.</span><span style="color:#88c0d0;">remove</span><span>(</span><span style="color:#a3be8c;">&quot;response&quot;</span><span>)</span><span style="color:#eceff4;">;
</span><span>bf</span><span style="color:#81a1c1;">.</span><span style="color:#88c0d0;">contains</span><span>(</span><span style="color:#a3be8c;">&quot;loved&quot;</span><span>)</span><span style="color:#eceff4;">; </span><span style="color:#616e88;">// false
</span></code></pre>
<p>Even though we know for sure we've added "loved" in this snippet, the call to
<code>contains</code> will return <code>false</code>. This sort of false-negative can't happen in a
standard bloom filter, and it removes one of the key benefits of using a bloom
filter in the first place: the guarantee of no false-negatives.</p>
<h2 id="bloom-filters-in-the-real-world"><a class="anchor" href="#bloom-filters-in-the-real-world">#</a>
Bloom filters in the real world</h2>
<p>Real-world users of bloom filters include <a href="https://www.akamai.com/">Akamai</a>, who
use them to avoid caching web pages that are accessed once and never again. They
do this by storing all page accesses in a bloom filter, and only writing them
into cache if the bloom filter says they've been seen before. This does result
in some pages being cached on the first access, but that's fine because it's
still an improvement. It would be impractical for them to store all page
accesses in a <code>Set</code>, so they accept the small false-positive rate in favour of
the significantly smaller bloom filter. Akamai released a
<a href="https://web.archive.org/web/20210814193152/https://www.akamai.com/us/en/multimedia/documents/technical-publication/algorithmic-nuggets-in-content-delivery-technical-publication.pdf">paper</a>
about this that goes into the full details if you're interested.</p>
<p>Google's
<a href="https://storage.googleapis.com/pub-tools-public-publication-data/pdf/68a74a85e1662fe02ff3967497f31fda7f32225c.pdf">BigTable</a>
is a distributed key-value store, and uses bloom filters internally to know what
keys are stored within. When a read request for a key comes in, a bloom filter
in memory is first checked to see if the key is in the database. If not,
BigTable can respond with "not found" without ever needing to read from disk.
Sometimes the bloom filter will say a key might be in the database when it isn't,
but this is fine because when that happens a disk access will confirm the key in
fact isn't in the database.</p>
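That read path can be sketched in a few lines. This is an illustration of the pattern rather than BigTable's actual API; the function and parameter names are made up.

```javascript
// Hypothetical sketch: consult an in-memory bloom filter before paying for a
// disk read. `bloom` and `disk` stand in for real components.
function read(key, bloom, disk) {
  if (!bloom.contains(key)) {
    // No false-negatives: if the filter says no, the key is definitely
    // absent, and we've answered "not found" without touching disk.
    return null;
  }
  // Possibly a false positive: the disk read gives the definitive answer.
  return disk.get(key);
}
```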
<h2 id="conclusion"><a class="anchor" href="#conclusion">#</a>
Conclusion</h2>
<p>Bloom filters, while niche, can be a huge optimisation in the right situation.
They're a wonderful application of hash functions, and a great example of making
a deliberate trade-off to achieve a specific goal.</p>
<p>Trade-offs, and combining simpler building blocks to create more complex,
purpose-built data structures, are present everywhere in software engineering.
Being able to spot where a data structure could net a big win can separate you
from the pack, and take your career to the next level.</p>
<p>I hope you've enjoyed this post, and that you find a way to apply bloom filters
to a problem you're working on.</p>
<p><em>Join the discussion on <a href="https://news.ycombinator.com/item?id=39439505">Hacker News</a> or <a href="https://lobste.rs/s/gwyglz/visual_interactive_guide_bloom_filters">Lobste.rs</a>!</em></p>
<h2 id="acknowledgements"><a class="anchor" href="#acknowledgements">#</a>
Acknowledgements</h2>
<p>An enormous thank you to my reviewers, without whom this post would be a shadow of
what you read today. In no particular order:</p>
<p><a href="https://rylon.dev">rylon</a>, <a href="https://indy.codes">Indy</a>,
<a href="https://twitter.com/AaronKalair">Aaron</a>, Sophie, <a href="https://dvsj.in">Davis</a>,
<a href="https://eduardmoldovan.com">ed</a>, <a href="https://github.com/mike12345567">Michael
Drury</a>, <a href="https://antonz.org/">Anton Zhiyanov</a>,
<a href="https://appliedgo.net/">Christoph Berger</a>, <a href="https://github.com/aptkingston">Andrew Kingston</a>, <a href="https://thattommyhall.com">Tom Hall</a>.</p>
</content>
|
</entry>
|
722
|
+
|
|
723
|
+
|
|
724
|
+
|
|
725
|
+
|
|
726
|
+
|
|
727
|
+
<entry xml:lang="en">
|
|
728
|
+
<title>Hashing</title>
|
|
729
|
+
<published>2023-05-24T00:00:00+00:00</published>
|
|
730
|
+
<updated>2023-05-24T00:00:00+00:00</updated>
|
|
731
|
+
<author>
|
|
732
|
+
<name>Unknown</name>
|
|
733
|
+
</author>
|
|
734
|
+
<link rel="alternate" href="https://samwho.dev/hashing/" type="text/html"/>
|
|
735
|
+
<id>https://samwho.dev/hashing/</id>
|
|
736
|
+
|
|
737
|
+
<content type="html"><style>
|
|
738
|
+
form {
|
|
739
|
+
padding-top: 0.5em;
|
|
740
|
+
padding-left: 0.5em;
|
|
741
|
+
padding-right: 0.5em;
|
|
742
|
+
display: flex;
|
|
743
|
+
justify-content: center;
|
|
744
|
+
gap: 0.3em;
|
|
745
|
+
}
|
|
746
|
+
|
|
747
|
+
form input[type=text] {
|
|
748
|
+
flex: 4 1 auto;
|
|
749
|
+
min-width: 0;
|
|
750
|
+
border-radius: 0.3em;
|
|
751
|
+
border: 1px solid #aaaaaa;
|
|
752
|
+
padding: 0.3em;
|
|
753
|
+
}
|
|
754
|
+
|
|
755
|
+
form button {
|
|
756
|
+
flex: 1 1 auto;
|
|
757
|
+
max-width: 140px;
|
|
758
|
+
}
|
|
759
|
+
|
|
760
|
+
form button:disabled {
|
|
761
|
+
opacity: 0.5 !important;
|
|
762
|
+
}
|
|
763
|
+
|
|
764
|
+
form button.add {
|
|
765
|
+
background-color: #009E73;
|
|
766
|
+
color: white;
|
|
767
|
+
border: 0;
|
|
768
|
+
border-radius: 0.3em;
|
|
769
|
+
cursor: pointer;
|
|
770
|
+
}
|
|
771
|
+
|
|
772
|
+
form button.check {
|
|
773
|
+
background-color: #56B4E9;
|
|
774
|
+
color: white;
|
|
775
|
+
border: 0;
|
|
776
|
+
border-radius: 0.3em;
|
|
777
|
+
cursor: pointer;
|
|
778
|
+
}
|
|
779
|
+
|
|
780
|
+
form button.clear {
|
|
781
|
+
background-color: #D55E00;
|
|
782
|
+
color: white;
|
|
783
|
+
border: 0;
|
|
784
|
+
border-radius: 0.3em;
|
|
785
|
+
cursor: pointer;
|
|
786
|
+
}
|
|
787
|
+
|
|
788
|
+
.grid-2x2 {
|
|
789
|
+
display: "grid";
|
|
790
|
+
}
|
|
791
|
+
|
|
792
|
+
.grid {
|
|
793
|
+
user-select: none;
|
|
794
|
+
cursor: pointer;
|
|
795
|
+
margin-top: 1rem;
|
|
796
|
+
margin-bottom: 1rem;
|
|
797
|
+
border: 1px solid #009E73;
|
|
798
|
+
width: 100%;
|
|
799
|
+
display: grid;
|
|
800
|
+
grid-template-columns: repeat(8, 1fr);
|
|
801
|
+
grid-template-rows: repeat(2, 1fr);
|
|
802
|
+
}
|
|
803
|
+
|
|
804
|
+
.grid-item {
|
|
805
|
+
display: flex;
|
|
806
|
+
align-items: center;
|
|
807
|
+
justify-content: center;
|
|
808
|
+
aspect-ratio: 1/1;
|
|
809
|
+
}
|
|
810
|
+
|
|
811
|
+
.grid-active {
|
|
812
|
+
background-color: #009E73;
|
|
813
|
+
color: white;
|
|
814
|
+
}
|
|
815
|
+
|
|
816
|
+
.above-grid {
|
|
817
|
+
display: flex;
|
|
818
|
+
justify-content: center;
|
|
819
|
+
}
|
|
820
|
+
|
|
821
|
+
.hash-examples {
|
|
822
|
+
padding-top: 0.5rem;
|
|
823
|
+
padding-bottom: 0.5rem;
|
|
824
|
+
margin: auto;
|
|
825
|
+
display: flex;
|
|
826
|
+
flex-direction: column;
|
|
827
|
+
align-items: center;
|
|
828
|
+
}
|
|
829
|
+
|
|
830
|
+
.hash-examples div {
|
|
831
|
+
margin: auto;
|
|
832
|
+
}
|
|
833
|
+
|
|
834
|
+
.hash-examples code {
|
|
835
|
+
display: block;
|
|
836
|
+
white-space: pre;
|
|
837
|
+
font-weight: bold;
|
|
838
|
+
}
|
|
839
|
+
|
|
840
|
+
.hash-examples p {
|
|
841
|
+
font-size: 0.75rem;
|
|
842
|
+
font-style: italic;
|
|
843
|
+
text-align: center;
|
|
844
|
+
font-family: Lora, serif;
|
|
845
|
+
width: 75%;
|
|
846
|
+
}
|
|
847
|
+
|
|
848
|
+
.blob {
|
|
849
|
+
cursor: pointer;
|
|
850
|
+
background: #CC79A7;
|
|
851
|
+
display: flex;
|
|
852
|
+
justify-content: center;
|
|
853
|
+
align-items: center;
|
|
854
|
+
font-size: 1.5rem;
|
|
855
|
+
color: white;
|
|
856
|
+
border-radius: 50%;
|
|
857
|
+
margin: 10px;
|
|
858
|
+
height: 3rem;
|
|
859
|
+
width: 3rem;
|
|
860
|
+
min-width: 3rem;
|
|
861
|
+
max-width: 3rem;
|
|
862
|
+
|
|
863
|
+
box-shadow: 0 0 0 0 #CC79A7FF;
|
|
864
|
+
transform: scale(1);
|
|
865
|
+
animation: pulse 2s infinite;
|
|
866
|
+
}
|
|
867
|
+
|
|
868
|
+
@keyframes pulse {
|
|
869
|
+
0% {
|
|
870
|
+
transform: scale(0.85);
|
|
871
|
+
box-shadow: 0 0 0 0 #CC79A77F;
|
|
872
|
+
}
|
|
873
|
+
|
|
874
|
+
70% {
|
|
875
|
+
transform: scale(1);
|
|
876
|
+
box-shadow: 0 0 0 1rem rgba(0, 0, 0, 0);
|
|
877
|
+
}
|
|
878
|
+
|
|
879
|
+
100% {
|
|
880
|
+
transform: scale(0.85);
|
|
881
|
+
box-shadow: 0 0 0 0 rgba(0, 0, 0, 0);
|
|
882
|
+
}
|
|
883
|
+
}
|
|
884
|
+
|
|
885
|
+
.blob-click {
|
|
886
|
+
cursor: default;
|
|
887
|
+
animation: tick 1s linear;
|
|
888
|
+
background: #009E73FF;
|
|
889
|
+
}
|
|
890
|
+
|
|
891
|
+
@keyframes tick {
|
|
892
|
+
0% {
|
|
893
|
+
transform: scale(1);
|
|
894
|
+
box-shadow: 0 0 0 0 #009E73FF;
|
|
895
|
+
}
|
|
896
|
+
|
|
897
|
+
50% {
|
|
898
|
+
box-shadow: 0 0 0 1rem #009E737F;
|
|
899
|
+
}
|
|
900
|
+
|
|
901
|
+
100% {
|
|
902
|
+
box-shadow: 0 0 0 2rem #009E7300;
|
|
903
|
+
}
|
|
904
|
+
}
|
|
905
|
+
|
|
906
|
+
.aside {
|
|
907
|
+
padding: 2rem;
|
|
908
|
+
width: 100vw;
|
|
909
|
+
position: relative;
|
|
910
|
+
margin-left: -50vw;
|
|
911
|
+
left: 50%;
|
|
912
|
+
background-color: #eeeeee;
|
|
913
|
+
|
|
914
|
+
display: flex;
|
|
915
|
+
align-items: center;
|
|
916
|
+
flex-direction: column;
|
|
917
|
+
}
|
|
918
|
+
|
|
919
|
+
.aside > * {
|
|
920
|
+
flex-grow: 1;
|
|
921
|
+
}
|
|
922
|
+
|
|
923
|
+
.aside p {
|
|
924
|
+
padding-left: 1rem;
|
|
925
|
+
padding-right: 1rem;
|
|
926
|
+
max-width: 780px;
|
|
927
|
+
font-style: italic;
|
|
928
|
+
font-family: Lora, serif;
|
|
929
|
+
text-align: center;
|
|
930
|
+
}
|
|
931
|
+
|
|
932
|
+
.pct25 {
|
|
933
|
+
width: 100%;
|
|
934
|
+
height: 200px;
|
|
935
|
+
}
|
|
936
|
+
|
|
937
|
+
.datasets th {
|
|
938
|
+
text-align: left;
|
|
939
|
+
}
|
|
940
|
+
|
|
941
|
+
.datasets td {
|
|
942
|
+
text-align: left;
|
|
943
|
+
}
|
|
944
|
+
|
|
945
|
+
.datasets {
|
|
946
|
+
table-layout: fixed;
|
|
947
|
+
}
|
|
948
|
+
|
|
949
|
+
</style>
|
|
950
|
+
<noscript>
|
|
951
|
+
<div class=aside>
|
|
952
|
+
<p>
|
|
953
|
+
This page makes heavy use of JavaScript to visualise the concepts discussed.
|
|
954
|
+
Viewing it without JavaScript will be a strange experience, as the text
|
|
955
|
+
talks about the visualisations. I strongly recommend either enabling
|
|
956
|
+
JavaScript, or not wasting your time.
|
|
957
|
+
</p>
|
|
958
|
+
</div>
|
|
959
|
+
</noscript>
|
|
960
|
+
<p>As a programmer, you use hash functions every day. They're used in databases
|
|
961
|
+
to optimise queries, they're used in data structures to make things faster,
|
|
962
|
+
they're used in security to keep data safe. Almost every interaction you have
|
|
963
|
+
with technology will involve hash functions in one way or another.</p>
|
|
964
|
+
<p>Hash functions are foundational, and they are <strong>everywhere</strong>.</p>
|
|
965
|
+
<p>But what <em>is</em> a hash function, and how do they work?</p>
|
|
966
|
+
<p>In this post, we're going to demystify hash functions. We're going to start by
|
|
967
|
+
looking at a simple hash function, then we're going to learn how to test if a
|
|
968
|
+
hash function is good or not, and then we're going to look at a real-world use
|
|
969
|
+
of hash functions: the hash map.</p>
|
|
970
|
+
<script>
|
|
971
|
+
document.addEventListener("DOMContentLoaded", function () {
|
|
972
|
+
var blob = document.querySelector(".blob");
|
|
973
|
+
blob.addEventListener("click", function () {
|
|
974
|
+
blob.classList.add("blob-click");
|
|
975
|
+
blob.innerText = "✓";
|
|
976
|
+
});
|
|
977
|
+
});
|
|
978
|
+
</script>
|
|
979
|
+
<div class=aside>
|
|
980
|
+
<div class=blob>
|
|
981
|
+
</div>
|
|
982
|
+
<p>
|
|
983
|
+
This article has visualisations that can be <span class="purple
|
|
984
|
+
bold">clicked</span>.
|
|
985
|
+
</p>
|
|
986
|
+
</div>
|
|
987
|
+
<h2 id="what-is-a-hash-function"><a class="anchor" href="#what-is-a-hash-function">#</a>
|
|
988
|
+
What <em>is</em> a hash function?</h2>
|
|
989
|
+
<p>Hash functions are functions that take an input, usually a string, and produce a
|
|
990
|
+
number. If you were to call a hash function multiple times with the same input,
|
|
991
|
+
it will always return the same number, and that number returned will always be
|
|
992
|
+
within a promised range. What that range is will depend on the hash function,
|
|
993
|
+
some use 32-bit integers (so 0 to 4 billion), others go much larger.</p>
|
|
994
|
+
<p>If we were to write a dummy hash function in JavaScript, it might look like
|
|
995
|
+
this:</p>
|
|
996
|
+
<pre data-lang="javascript" style="background-color:#2e3440;color:#d8dee9;" class="language-javascript "><code class="language-javascript" data-lang="javascript"><span style="color:#81a1c1;">function </span><span style="color:#88c0d0;">hash</span><span>(input) {
|
|
997
|
+
</span><span> </span><span style="color:#81a1c1;">return </span><span style="color:#b48ead;">0</span><span style="color:#eceff4;">;
|
|
998
|
+
</span><span>}
|
|
999
|
+
</span></code></pre>
|
|
1000
|
+
<p>Even without knowing <em>how</em> hash functions are used, it's probably no surprise
|
|
1001
|
+
that this hash function is useless. Let's see how we can measure how good a
|
|
1002
|
+
hash function is, and after that we'll do a deep dive on how they're used
|
|
1003
|
+
within hash maps.</p>
|
|
1004
|
+
<h2 id="what-makes-a-hash-function-good"><a class="anchor" href="#what-makes-a-hash-function-good">#</a>
|
|
1005
|
+
What makes a hash function good?</h2>
|
|
1006
|
+
<p>Because <code>input</code> can be any string, but the number returned is within some
|
|
1007
|
+
promised range, it's possible that two different inputs can return the same
|
|
1008
|
+
number. This is called a "collision," and good hash functions try to minimise
|
|
1009
|
+
how many collisions they produce.</p>
|
|
1010
|
+
<p>It's not possible to completely eliminate collisions, though. If we wrote a hash
|
|
1011
|
+
function that returned a number in the range 0 to 7, and we gave it 9 unique
|
|
1012
|
+
inputs, we're guaranteed at least 1 collision.</p>
|
|
1013
|
+
<div class="hash-examples">
|
|
1014
|
+
<div>
|
|
1015
|
+
<code>hash("to") == 3</code>
|
|
1016
|
+
<code>hash("the") == 2</code>
|
|
1017
|
+
<code>hash("café") == 0</code>
|
|
1018
|
+
<code>hash("de") == 6</code>
|
|
1019
|
+
<code>hash("versailles") == 4</code>
|
|
1020
|
+
<code>hash("for") == 5</code>
|
|
1021
|
+
<code>hash("coffee") == 0</code>
|
|
1022
|
+
<code>hash("we") == 7</code>
|
|
1023
|
+
<code>hash("had") == 1</code>
|
|
1024
|
+
</div>
|
|
1025
|
+
<p>
|
|
1026
|
+
Output values from a well-known hash function, modulo 8. No matter what 9
|
|
1027
|
+
values we pass, there are only 8 unique numbers and so collisions are
|
|
1028
|
+
inevitable. The goal is to have as few as possible.
|
|
1029
|
+
</p>
|
|
1030
|
+
</div>
|
|
1031
|
+
<p>To visualise collisions, I'm going to use a grid. Each square of the grid is
|
|
1032
|
+
going to represent a number output by a hash function. Here's an example 8x2
|
|
1033
|
+
grid. <span class="purple bold">Click</span> on the grid to increment the
|
|
1034
|
+
example hash output value and see how we map it to a grid square. See what
|
|
1035
|
+
happens when you get a number larger than the number of grid squares.</p>
|
|
1036
|
+
<script>
|
|
1037
|
+
document.addEventListener("DOMContentLoaded", () => {
|
|
1038
|
+
let grid = document.getElementById("first-grid");
|
|
1039
|
+
let hash = document.getElementById("grid-hash");
|
|
1040
|
+
let modulo = document.getElementById("grid-modulo");
|
|
1041
|
+
grid.addEventListener("click", (e) => {
|
|
1042
|
+
e.preventDefault();
|
|
1043
|
+
let number = parseInt(hash.innerText) + 1;
|
|
1044
|
+
hash.innerText = number.toString();
|
|
1045
|
+
modulo.innerText = (number % 16).toString();
|
|
1046
|
+
|
|
1047
|
+
grid.querySelector(".grid-active").classList.remove("grid-active");
|
|
1048
|
+
grid.children[number % 16].classList.add("grid-active");
|
|
1049
|
+
return false;
|
|
1050
|
+
});
|
|
1051
|
+
});
|
|
1052
|
+
</script>
|
|
1053
|
+
<div class=above-grid>
|
|
1054
|
+
<code style="color: #009E73; font-weight: bold;">
|
|
1055
|
+
<span id="grid-hash">13</span> % 16 == <span id="grid-modulo">13</span>
|
|
1056
|
+
</code>
|
|
1057
|
+
</div>
|
|
1058
|
+
<div class="grid" id="first-grid">
|
|
1059
|
+
<div class="grid-item">0</div>
|
|
1060
|
+
<div class="grid-item">1</div>
|
|
1061
|
+
<div class="grid-item">2</div>
|
|
1062
|
+
<div class="grid-item">3</div>
|
|
1063
|
+
<div class="grid-item">4</div>
|
|
1064
|
+
<div class="grid-item">5</div>
|
|
1065
|
+
<div class="grid-item">6</div>
|
|
1066
|
+
<div class="grid-item">7</div>
|
|
1067
|
+
<div class="grid-item">8</div>
|
|
1068
|
+
<div class="grid-item">9</div>
|
|
1069
|
+
<div class="grid-item">10</div>
|
|
1070
|
+
<div class="grid-item">11</div>
|
|
1071
|
+
<div class="grid-item">12</div>
|
|
1072
|
+
<div class="grid-item grid-active">13</div>
|
|
1073
|
+
<div class="grid-item">14</div>
|
|
1074
|
+
<div class="grid-item">15</div>
|
|
1075
|
+
</div>
|
|
1076
|
+
<p>Every time we hash a value, we're going to make its corresponding square on the
|
|
1077
|
+
grid a bit darker. The idea is to create an easy way to see how well a hash
|
|
1078
|
+
function avoids collisions. What we're looking for is a nice, even distribution.
|
|
1079
|
+
We'll know that the hash function isn't good if we have clumps or patterns of
|
|
1080
|
+
dark squares.</p>
|
|
1081
|
+
<blockquote class="haskie">
|
|
1082
|
+
<img src="/images/haskie-confused-200px.png" />
|
|
1083
|
+
<p>
|
|
1084
|
+
You said that when a hash function outputs the same value for 2 different
|
|
1085
|
+
inputs, that's a collision. But if we have a hash function that outputs values
|
|
1086
|
+
in a big range, and we mapped those to a small grid, aren't we going to create
|
|
1087
|
+
lots of collisions on the grid that aren't actually collisions? On our 8x2
|
|
1088
|
+
grid, 1 and 17 both map to the 2nd square.
|
|
1089
|
+
</p>
|
|
1090
|
+
</blockquote>
|
|
1091
|
+
<p>This is a great observation. You're absolutely right, we're going to be creating
|
|
1092
|
+
"pseudo-collisions" on our grid. It's okay, though, because if the hash function
|
|
1093
|
+
is good we will still see an even distribution. Incrementing every square by 100
|
|
1094
|
+
is just as good a distribution as incrementing every square by 1. If we have a
|
|
1095
|
+
bad hash function that collides a lot, that will still stand out. We'll see
|
|
1096
|
+
this shortly.</p>
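To make the pseudo-collision bookkeeping concrete, here's a minimal sketch of the grid logic: a square is just the hash value modulo the number of squares. The 16-square size mirrors the 8x2 grid above; everything else is assumed detail, not the article's actual visualisation code.

```javascript
// Map hash values onto a fixed number of grid squares and count hits.
// 1 and 17 are a "pseudo-collision": different hash values, same square.
const SQUARES = 16; // an 8x2 grid
const counts = new Array(SQUARES).fill(0);

function record(hashValue) {
  counts[hashValue % SQUARES] += 1;
}

record(1);
record(17);
console.log(counts[1]); // → 2: both values land on the 2nd square
```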
|
|
1097
|
+
<p>Let's take a larger grid and hash 1,000 randomly-generated strings. You can
|
|
1098
|
+
<span class="purple bold">click</span> on the grid to hash a new set of random
|
|
1099
|
+
inputs, and the grid will animate to show you each input being hashed and placed
|
|
1100
|
+
on the grid.</p>
|
|
1101
|
+
<p><heat-map
|
|
1102
|
+
class="pct25"
|
|
1103
|
+
iterations=1000
|
|
1104
|
+
blockSize=20
|
|
1105
|
+
valueFn=randomUUID
|
|
1106
|
+
hashFn=murmur3
|
|
1107
|
+
color=green>
|
|
1108
|
+
</heat-map></p>
|
|
1109
|
+
<p>The values are nice and evenly distributed because we're using a good,
|
|
1110
|
+
well-known hash function called <code class="green">murmur3</code>. This hash
|
|
1111
|
+
is widely used in the real world because it has great distribution while also
|
|
1112
|
+
being really, really fast.</p>
|
|
1113
|
+
<p>What would our grid look like if we used a <em>bad</em> hash function?</p>
|
|
1114
|
+
<pre data-lang="javascript" style="background-color:#2e3440;color:#d8dee9;" class="language-javascript "><code class="language-javascript" data-lang="javascript"><span style="color:#81a1c1;">function </span><span style="color:#88c0d0;">hash</span><span>(input) {
|
|
1115
|
+
</span><span> </span><span style="color:#81a1c1;">let </span><span>hash </span><span style="color:#81a1c1;">= </span><span style="color:#b48ead;">0</span><span style="color:#eceff4;">;
|
|
1116
|
+
</span><span> </span><span style="color:#81a1c1;">for </span><span>(</span><span style="color:#81a1c1;">let </span><span>c </span><span style="color:#81a1c1;">of </span><span>input) {
|
|
1117
|
+
</span><span> hash </span><span style="color:#81a1c1;">+= </span><span>c</span><span style="color:#81a1c1;">.</span><span style="color:#88c0d0;">charCodeAt</span><span>(</span><span style="color:#b48ead;">0</span><span>)</span><span style="color:#eceff4;">;
|
|
1118
|
+
</span><span> }
|
|
1119
|
+
</span><span> </span><span style="color:#81a1c1;">return </span><span>hash </span><span style="color:#81a1c1;">% </span><span style="color:#b48ead;">1000000</span><span style="color:#eceff4;">;
|
|
1120
|
+
</span><span>}
|
|
1121
|
+
</span></code></pre>
|
|
1122
|
+
<p>This hash function loops through the string that we're given and sums the
|
|
1123
|
+
numeric values of each character. It then constrains the value to between 0
|
|
1124
|
+
and 999,999 by using the modulus operator (<code>%</code>). Let's call this hash function
|
|
1125
|
+
<code class="red">stringSum</code>.</p>
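Here is the same function as plain, runnable JavaScript, with one consequence of its definition worth noting: because it only sums character codes, order doesn't matter, so any two strings made of the same letters collide.

```javascript
// stringSum, as defined above. Summing character codes ignores order,
// so anagrams always produce the same hash.
function stringSum(input) {
  let hash = 0;
  for (let c of input) {
    hash += c.charCodeAt(0);
  }
  return hash % 1000000;
}

console.log(stringSum("antlers") === stringSum("rentals")); // → true
```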
|
|
1126
|
+
<p>Here it is on the grid. Reminder, this is 1,000 randomly generated strings that
|
|
1127
|
+
we're hashing.</p>
|
|
1128
|
+
<p><heat-map
|
|
1129
|
+
class="pct25"
|
|
1130
|
+
iterations=1000
|
|
1131
|
+
blockSize=20
|
|
1132
|
+
valueFn=randomUUID
|
|
1133
|
+
hashFn=stringSum
|
|
1134
|
+
color=red>
|
|
1135
|
+
</heat-map></p>
|
|
1136
|
+
<p>This doesn't look all that different from <code class="green">murmur3</code>.
|
|
1137
|
+
What gives?</p>
|
|
1138
|
+
<p>The problem is that the strings we're giving to be hashed are random. Let's see
|
|
1139
|
+
how each function performs when given input that is not random: the numbers from
|
|
1140
|
+
1 to 1000 converted to strings.</p>
|
|
1141
|
+
<div style="display: flex; flex-wrap: wrap; gap: 0.5rem; justify-content: space-between; padding-top: 1rem; padding-bottom: 1rem">
|
|
1142
|
+
<heat-map
|
|
1143
|
+
style="flex-grow: 1; width: 40%; min-width: 200px; aspect-ratio: 2/1;"
|
|
1144
|
+
iterations=1000
|
|
1145
|
+
blockSize=10
|
|
1146
|
+
valueFn=intToStr
|
|
1147
|
+
hashFn=murmur3
|
|
1148
|
+
color=green>
|
|
1149
|
+
</heat-map>
|
|
1150
|
+
<heat-map
|
|
1151
|
+
style="flex-grow: 1; width: 40%; min-width: 200px; aspect-ratio: 2/1;"
|
|
1152
|
+
iterations=1000
|
|
1153
|
+
blockSize=10
|
|
1154
|
+
valueFn=intToStr
|
|
1155
|
+
hashFn=stringSum
|
|
1156
|
+
color=red>
|
|
1157
|
+
</heat-map>
|
|
1158
|
+
</div>
|
|
1159
|
+
<p>Now the problem is more clear. When the input isn't random, the output of <code
|
|
1160
|
+
class="red">stringSum</code> forms a pattern. Our <code
|
|
1161
|
+
class="green">murmur3</code> grid, however, looks the same as how it looked with
|
|
1162
|
+
random values.</p>
|
|
1163
|
+
<p>How about if we hash the <a href="https://github.com/powerlanguage/word-lists/blob/master/1000-most-common-words.txt">top 1,000 most common English words</a>:</p>
|
|
1164
|
+
<div style="display: flex; flex-wrap: wrap; gap: 0.5rem; justify-content: space-between; padding-top: 1rem; padding-bottom: 1rem">
|
|
1165
|
+
<heat-map
|
|
1166
|
+
style="flex-grow: 1; width: 40%; min-width: 200px; aspect-ratio: 2/1;"
|
|
1167
|
+
iterations=1000
|
|
1168
|
+
blockSize=10
|
|
1169
|
+
valueFn=commonWords
|
|
1170
|
+
hashFn=murmur3
|
|
1171
|
+
color=green>
|
|
1172
|
+
</heat-map>
|
|
1173
|
+
<heat-map
|
|
1174
|
+
style="flex-grow: 1; width: 40%; min-width: 200px; aspect-ratio: 2/1;"
|
|
1175
|
+
iterations=1000
|
|
1176
|
+
blockSize=10
|
|
1177
|
+
valueFn=commonWords
|
|
1178
|
+
hashFn=stringSum
|
|
1179
|
+
color=red>
|
|
1180
|
+
</heat-map>
|
|
1181
|
+
</div>
|
|
1182
|
+
<p>It's more subtle, but we do see a pattern on the <code class="bold
|
|
1183
|
+
red">stringSum</code> grid. As usual, <code class="green bold">murmur3</code>
|
|
1184
|
+
looks the same as it always does.</p>
|
|
1185
|
+
<p>This is the power of a good hash function: no matter the input,
|
|
1186
|
+
the output is evenly distributed. Let's look at one more way to visualise
|
|
1187
|
+
this, and then talk about why it matters.</p>
|
|
1188
|
+
<h3 id="the-avalanche-effect"><a class="anchor" href="#the-avalanche-effect">#</a>
|
|
1189
|
+
The avalanche effect</h3>
|
|
1190
|
+
<p>Another way hash functions get evaluated is on something called the "avalanche
|
|
1191
|
+
effect." This refers to how many bits in the output value change when just a
|
|
1192
|
+
single bit of the input changes. To say that a hash function has a good
|
|
1193
|
+
avalanche effect, a single bit flip in the input should result in an average of
|
|
1194
|
+
50% of the output bits flipping.</p>
|
|
1195
|
+
<p>It's this property that helps hash functions avoid forming patterns in the grid.
|
|
1196
|
+
If small changes in the input result in small changes in the output, you get
|
|
1197
|
+
patterns. Patterns indicate poor distribution, and a higher rate of collisions.</p>
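One way to measure the avalanche effect is to hash an input, flip a single input bit, hash again, and count how many output bits differ. Below is a sketch of that measurement; the 32-bit mixer is murmur3's finalizer step, standing in for a full hash function since murmur3 isn't built into JavaScript.

```javascript
// mix32 is murmur3's 32-bit finalizer, used here as a stand-in hash.
function mix32(x) {
  x ^= x >>> 16;
  x = Math.imul(x, 0x85ebca6b);
  x ^= x >>> 13;
  x = Math.imul(x, 0xc2b2ae35);
  x ^= x >>> 16;
  return x >>> 0;
}

// Count the set bits in a 32-bit value.
function popcount(x) {
  let count = 0;
  while (x !== 0) {
    count += x & 1;
    x >>>= 1;
  }
  return count;
}

// Estimate avalanche: flip one input bit per trial and average the
// fraction of output bits that change. A good hash scores close to 0.5.
function avalanche(hashFn, trials = 1000) {
  let flipped = 0;
  for (let i = 0; i < trials; i++) {
    const input = (Math.random() * 0x100000000) >>> 0;
    const bit = 1 << (i % 32); // flip a different bit each trial
    flipped += popcount(hashFn(input) ^ hashFn(input ^ bit));
  }
  return flipped / (trials * 32); // fraction of output bits flipped
}

console.log(avalanche(mix32));          // close to 0.5
console.log(avalanche((x) => x >>> 0)); // 1/32: only the flipped bit changes
```

The identity "hash" at the end behaves like stringSum does in the visualisation below: only the bit you flipped comes out different.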
|
|
1198
|
+
<p>Below, we are visualising the avalanche effect by showing two 8-bit binary
|
|
1199
|
+
numbers. The top number is the input value, and the bottom number is the <code
|
|
1200
|
+
class="green bold">murmur3</code> output value. <span class="purple
|
|
1201
|
+
bold">Click</span> on it to <span class="purple bold">flip a single bit</span>
|
|
1202
|
+
in the input. Bits that change in the output will be <span class="green
|
|
1203
|
+
bold">green</span>, bits that stay the same will be <span class="red
|
|
1204
|
+
bold">red</span>.</p>
|
|
1205
|
+
<p><avalanche-effect
|
|
1206
|
+
style="width: 100%; height: 200px;"
|
|
1207
|
+
hashFn="murmur3">
|
|
1208
|
+
</avalanche-effect></p>
|
|
1209
|
+
<p><span class="green bold">murmur3</span> does well, though you will notice that
|
|
1210
|
+
sometimes fewer than 50% of the bits flip and sometimes more. This is okay,
|
|
1211
|
+
provided that it is 50% on average.</p>
|
|
1212
|
+
<p>Let's see how <span class="bold red">stringSum</span> performs.</p>
|
|
1213
|
+
<p><avalanche-effect
|
|
1214
|
+
style="width: 100%; height: 200px;"
|
|
1215
|
+
hashFn="stringSum">
|
|
1216
|
+
</avalanche-effect></p>
|
|
1217
|
+
<p>Well, this is embarrassing. The output is equal to the input, and so only a single
|
|
1218
|
+
bit flips each time. This does make sense, because <span class="bold
|
|
1219
|
+
red">stringSum</span> just sums the numeric value of each character in the
|
|
1220
|
+
string. This example only hashes the equivalent of a single character, which
|
|
1221
|
+
means the output will always be the same as the input.</p>
|
|
1222
|
+
<h2 id="why-all-of-this-matters"><a class="anchor" href="#why-all-of-this-matters">#</a>
|
|
1223
|
+
Why all of this matters</h2>
|
|
1224
|
+
<p>We've taken the time to understand some of the ways to determine if a hash
|
|
1225
|
+
function is good, but we've not spent any time talking about why it matters.
|
|
1226
|
+
Let's fix that by talking about hash maps.</p>
|
|
1227
|
+
<p>To understand hash maps, we first must understand what a map is. A map is a data
|
|
1228
|
+
structure that allows you to store key-value pairs. Here's an example in
|
|
1229
|
+
JavaScript:</p>
|
|
1230
|
+
<pre data-lang="javascript" style="background-color:#2e3440;color:#d8dee9;" class="language-javascript "><code class="language-javascript" data-lang="javascript"><span style="color:#81a1c1;">let </span><span>map </span><span style="color:#81a1c1;">= new </span><span style="color:#8fbcbb;">Map</span><span>()</span><span style="color:#eceff4;">;
|
|
1231
|
+
</span><span>map</span><span style="color:#81a1c1;">.</span><span style="color:#88c0d0;">set</span><span>(</span><span style="color:#a3be8c;">&quot;hello&quot;</span><span style="color:#eceff4;">, </span><span style="color:#a3be8c;">&quot;world&quot;</span><span>)</span><span style="color:#eceff4;">;
|
|
1232
|
+
</span><span style="color:#8fbcbb;">console</span><span style="color:#81a1c1;">.</span><span style="color:#88c0d0;">log</span><span>(map</span><span style="color:#81a1c1;">.</span><span style="color:#88c0d0;">get</span><span>(</span><span style="color:#a3be8c;">&quot;hello&quot;</span><span>))</span><span style="color:#eceff4;">;
|
|
1233
|
+
</span></code></pre>
|
|
1234
|
+
<p>Here we take a key-value pair (<code>"hello"</code> → <code>"world"</code>) and store it in the map.
|
|
1235
|
+
Then we print out the value associated with the key <code>"hello"</code>, which will be
|
|
1236
|
+
<code>"world"</code>.</p>
|
|
1237
|
+
<p>A more fun real-world use-case would be to find anagrams. An anagram is when two
|
|
1238
|
+
different words contain the same letters, for example "antlers" and "rentals"
|
|
1239
|
+
or "article" and "recital." If you have a list of words and you want to find
|
|
1240
|
+
all of the anagrams, you can sort the letters in each word alphabetically and
|
|
1241
|
+
use that as a key in a map.</p>
|
|
1242
|
+
<pre data-lang="javascript" style="background-color:#2e3440;color:#d8dee9;" class="language-javascript "><code class="language-javascript" data-lang="javascript"><span style="color:#81a1c1;">let </span><span>words </span><span style="color:#81a1c1;">= </span><span>[
|
|
1243
|
+
</span><span> </span><span style="color:#a3be8c;">&quot;antlers&quot;</span><span style="color:#eceff4;">,
|
|
1244
|
+
</span><span> </span><span style="color:#a3be8c;">&quot;rentals&quot;</span><span style="color:#eceff4;">,
|
|
1245
|
+
</span><span> </span><span style="color:#a3be8c;">&quot;sternal&quot;</span><span style="color:#eceff4;">,
|
|
1246
|
+
</span><span> </span><span style="color:#a3be8c;">&quot;article&quot;</span><span style="color:#eceff4;">,
|
|
1247
|
+
</span><span> </span><span style="color:#a3be8c;">&quot;recital&quot;</span><span style="color:#eceff4;">,
|
|
1248
|
+
</span><span> </span><span style="color:#a3be8c;">&quot;flamboyant&quot;</span><span style="color:#eceff4;">,
|
|
1249
|
+
</span><span>]</span><span style="color:#eceff4;">;
|
|
1250
|
+
</span><span>
|
|
1251
|
+
</span><span style="color:#81a1c1;">let </span><span>map </span><span style="color:#81a1c1;">= new </span><span style="color:#8fbcbb;">Map</span><span>()</span><span style="color:#eceff4;">;
|
|
1252
|
+
</span><span>
|
|
1253
|
+
</span><span style="color:#81a1c1;">for </span><span>(</span><span style="color:#81a1c1;">let </span><span>word </span><span style="color:#81a1c1;">of </span><span>words) {
|
|
1254
|
+
</span><span> </span><span style="color:#81a1c1;">let </span><span>key </span><span style="color:#81a1c1;">= </span><span>word</span><span style="color:#81a1c1;">.</span><span style="color:#88c0d0;">split</span><span>(</span><span style="color:#a3be8c;">&quot;&quot;</span><span>)</span><span style="color:#81a1c1;">.</span><span style="color:#88c0d0;">sort</span><span>()</span><span style="color:#81a1c1;">.</span><span style="color:#88c0d0;">join</span><span>(</span><span style="color:#a3be8c;">&quot;&quot;</span><span>)</span><span style="color:#eceff4;">;
|
|
1255
|
+
</span><span>
|
|
1256
|
+
</span><span> </span><span style="color:#81a1c1;">if </span><span>(</span><span style="color:#81a1c1;">!</span><span>map</span><span style="color:#81a1c1;">.</span><span style="color:#88c0d0;">has</span><span>(key)) {
|
|
1257
|
+
</span><span> map</span><span style="color:#81a1c1;">.</span><span style="color:#88c0d0;">set</span><span>(key</span><span style="color:#eceff4;">, </span><span>[])</span><span style="color:#eceff4;">;
|
|
1258
|
+
</span><span> }
|
|
1259
|
+
</span><span> map</span><span style="color:#81a1c1;">.</span><span style="color:#88c0d0;">get</span><span>(key)</span><span style="color:#81a1c1;">.</span><span style="color:#88c0d0;">push</span><span>(word)</span><span style="color:#eceff4;">;
|
|
1260
|
+
</span><span>}
|
|
1261
|
+
</span></code></pre>
|
|
1262
|
+
<p>This code results in a map with the following structure:</p>
|
|
1263
|
+
<pre data-lang="json" style="background-color:#2e3440;color:#d8dee9;" class="language-json "><code class="language-json" data-lang="json"><span>{
|
|
1264
|
+
</span><span> </span><span style="color:#a3be8c;">&quot;aelnrst&quot;</span><span style="color:#eceff4;">: </span><span>[</span><span style="color:#a3be8c;">&quot;antlers&quot;</span><span style="color:#eceff4;">, </span><span style="color:#a3be8c;">&quot;rentals&quot;</span><span style="color:#eceff4;">, </span><span style="color:#a3be8c;">&quot;sternal&quot;</span><span>]</span><span style="color:#eceff4;">,
|
|
1265
|
+
</span><span> </span><span style="color:#a3be8c;">&quot;aceilrt&quot;</span><span style="color:#eceff4;">: </span><span>[</span><span style="color:#a3be8c;">&quot;article&quot;</span><span style="color:#eceff4;">, </span><span style="color:#a3be8c;">&quot;recital&quot;</span><span>]</span><span style="color:#eceff4;">,
|
|
1266
|
+
</span><span> </span><span style="color:#a3be8c;">&quot;aabflmnoty&quot;</span><span style="color:#eceff4;">: </span><span>[</span><span style="color:#a3be8c;">&quot;flamboyant&quot;</span><span>]
|
|
1267
|
+
</span><span>}
|
|
1268
|
+
</span></code></pre>
|
|
1269
|
+
<h3 id="implementing-our-own-simple-hash-map"><a class="anchor" href="#implementing-our-own-simple-hash-map">#</a>
|
|
1270
|
+
Implementing our own simple hash map</h3>
|
|
1271
|
+
<p>Hash maps are one of many map implementations, and there are many ways to
|
|
1272
|
+
implement hash maps. The simplest way, and the way we're going to demonstrate,
|
|
1273
|
+
is to use a list of lists. The inner lists are often referred to as "buckets" in
|
|
1274
|
+
the real world, so that's what we'll call them here. A hash function is used on
|
|
1275
|
+
the key to determine which bucket to store the key-value pair in, then the
|
|
1276
|
+
key-value pair is added to that bucket.</p>
|
|
1277
|
+
<p>Let's walk through a simple hash map implementation in JavaScript. We're going
|
|
1278
|
+
to go through it bottom-up, so we'll see some utility methods before getting to
|
|
1279
|
+
the <code>set</code> and <code>get</code> implementations.</p>
|
|
1280
|
+
<pre data-lang="javascript" style="background-color:#2e3440;color:#d8dee9;" class="language-javascript "><code class="language-javascript" data-lang="javascript"><span style="color:#81a1c1;">class </span><span style="color:#8fbcbb;">HashMap </span><span>{
|
|
1281
|
+
</span><span> </span><span style="color:#81a1c1;">constructor</span><span>() {
|
|
1282
|
+
</span><span> </span><span style="color:#81a1c1;">this.</span><span>bs </span><span style="color:#81a1c1;">= </span><span>[[]</span><span style="color:#eceff4;">, </span><span>[]</span><span style="color:#eceff4;">, </span><span>[]]</span><span style="color:#eceff4;">;
|
|
1283
|
+
</span><span> }
|
|
1284
|
+
</span><span>}
|
|
1285
|
+
</span></code></pre>
|
|
1286
|
+
<p>We start off by creating a <code>HashMap</code> class with a constructor that sets up 3
|
|
1287
|
+
buckets. We use 3 buckets and the short variable name <code>bs</code> so that this code
|
|
1288
|
+
displays nicely on devices with smaller screens. In reality, you could have
|
|
1289
|
+
however many buckets you want (and better variable names).</p>
|
|
1290
|
+
<pre data-lang="javascript" style="background-color:#2e3440;color:#d8dee9;" class="language-javascript "><code class="language-javascript" data-lang="javascript"><span style="color:#81a1c1;">class </span><span style="color:#8fbcbb;">HashMap </span><span>{
|
|
1291
|
+
</span><span> </span><span style="color:#616e88;">// ...
|
|
1292
|
+
</span><span> </span><span style="color:#88c0d0;">bucket</span><span>(key) {
|
|
1293
|
+
</span><span> </span><span style="color:#81a1c1;">let </span><span>h </span><span style="color:#81a1c1;">= </span><span style="color:#88c0d0;">murmur3</span><span>(key)</span><span style="color:#eceff4;">;
|
|
1294
|
+
</span><span> </span><span style="color:#81a1c1;">return this.</span><span>bs[h </span><span style="color:#81a1c1;">% this.</span><span>bs</span><span style="color:#81a1c1;">.</span><span>length]</span><span style="color:#eceff4;">;
|
|
1295
|
+
</span><span> }
|
|
1296
|
+
</span><span>}
|
|
1297
|
+
</span></code></pre>
|
|
1298
|
+
<p>The <code>bucket</code> method uses <code class="bold green">murmur3</code> on the <code>key</code>
|
|
1299
|
+
passed in to find a bucket to use. This is the only place in our hash map code
|
|
1300
|
+
that a hash function is used.</p>
|
|
1301
|
+
<pre data-lang="javascript" style="background-color:#2e3440;color:#d8dee9;" class="language-javascript "><code class="language-javascript" data-lang="javascript"><span style="color:#81a1c1;">class </span><span style="color:#8fbcbb;">HashMap </span><span>{
|
|
1302
|
+
</span><span> </span><span style="color:#616e88;">// ...
|
|
1303
|
+
</span><span> </span><span style="color:#88c0d0;">entry</span><span>(bucket</span><span style="color:#eceff4;">, </span><span>key) {
|
|
1304
|
+
</span><span> </span><span style="color:#81a1c1;">for </span><span>(</span><span style="color:#81a1c1;">let </span><span>e </span><span style="color:#81a1c1;">of </span><span>bucket) {
|
|
1305
|
+
</span><span> </span><span style="color:#81a1c1;">if </span><span>(e</span><span style="color:#81a1c1;">.</span><span>key </span><span style="color:#81a1c1;">=== </span><span>key) {
|
|
1306
|
+
</span><span> </span><span style="color:#81a1c1;">return </span><span>e</span><span style="color:#eceff4;">;
|
|
1307
|
+
</span><span> }
|
|
1308
|
+
</span><span> }
|
|
1309
|
+
</span><span> </span><span style="color:#81a1c1;">return null</span><span style="color:#eceff4;">;
|
|
1310
|
+
</span><span> }
|
|
1311
|
+
</span><span>}
|
|
1312
|
+
</span></code></pre>
|
|
1313
|
+
<p>The <code>entry</code> method takes a <code>bucket</code> and a <code>key</code> and scans the bucket until it
|
|
1314
|
+
finds an entry with the given key. If no entry is found, <code>null</code> is returned.</p>
|
|
1315
|
+
<pre data-lang="javascript" style="background-color:#2e3440;color:#d8dee9;" class="language-javascript "><code class="language-javascript" data-lang="javascript"><span style="color:#81a1c1;">class </span><span style="color:#8fbcbb;">HashMap </span><span>{
|
|
1316
|
+
</span><span> </span><span style="color:#616e88;">// ...
|
|
1317
|
+
</span><span> </span><span style="color:#88c0d0;">set</span><span>(key</span><span style="color:#eceff4;">, </span><span>value) {
|
|
1318
|
+
</span><span> </span><span style="color:#81a1c1;">let </span><span>b </span><span style="color:#81a1c1;">= this.</span><span style="color:#88c0d0;">bucket</span><span>(key)</span><span style="color:#eceff4;">;
|
|
1319
|
+
</span><span> </span><span style="color:#81a1c1;">let </span><span>e </span><span style="color:#81a1c1;">= this.</span><span style="color:#88c0d0;">entry</span><span>(b</span><span style="color:#eceff4;">, </span><span>key)</span><span style="color:#eceff4;">;
|
|
1320
|
+
</span><span> </span><span style="color:#81a1c1;">if </span><span>(e) {
|
|
1321
|
+
</span><span> e</span><span style="color:#81a1c1;">.</span><span>value </span><span style="color:#81a1c1;">= </span><span>value</span><span style="color:#eceff4;">;
|
|
1322
|
+
</span><span> </span><span style="color:#81a1c1;">return</span><span style="color:#eceff4;">;
|
|
1323
|
+
</span><span> }
|
|
1324
|
+
</span><span> b</span><span style="color:#81a1c1;">.</span><span style="color:#88c0d0;">push</span><span>({ key</span><span style="color:#eceff4;">, </span><span>value })</span><span style="color:#eceff4;">;
|
|
1325
|
+
</span><span> }
|
|
1326
|
+
</span><span>}
|
|
1327
|
+
</span></code></pre>
|
|
1328
|
+
<p>The <code>set</code> method is the first one we should recognise from our earlier
|
|
1329
|
+
JavaScript <code>Map</code> examples. It takes a key-value pair and stores it in our hash
|
|
1330
|
+
map. It does this by using the <code>bucket</code> and <code>entry</code> methods we created earlier.
|
|
1331
|
+
If an entry is found, its value is overwritten. If no entry is found, the
|
|
1332
|
+
key-value pair is added to the map. In JavaScript, <code>{ key, value }</code> is
|
|
1333
|
+
shorthand for <code>{ key: key, value: value }</code>.</p>
|
|
1334
|
+
<pre data-lang="javascript" style="background-color:#2e3440;color:#d8dee9;" class="language-javascript "><code class="language-javascript" data-lang="javascript"><span style="color:#81a1c1;">class </span><span style="color:#8fbcbb;">HashMap </span><span>{
|
|
1335
|
+
</span><span> </span><span style="color:#616e88;">// ...
|
|
1336
|
+
</span><span> </span><span style="color:#88c0d0;">get</span><span>(key) {
|
|
1337
|
+
</span><span> </span><span style="color:#81a1c1;">let </span><span>b </span><span style="color:#81a1c1;">= this.</span><span style="color:#88c0d0;">bucket</span><span>(key)</span><span style="color:#eceff4;">;
|
|
1338
|
+
</span><span> </span><span style="color:#81a1c1;">let </span><span>e </span><span style="color:#81a1c1;">= this.</span><span style="color:#88c0d0;">entry</span><span>(b</span><span style="color:#eceff4;">, </span><span>key)</span><span style="color:#eceff4;">;
|
|
1339
|
+
</span><span> </span><span style="color:#81a1c1;">if </span><span>(e) {
|
|
1340
|
+
</span><span> </span><span style="color:#81a1c1;">return </span><span>e</span><span style="color:#81a1c1;">.</span><span>value</span><span style="color:#eceff4;">;
|
|
1341
|
+
</span><span> }
|
|
1342
|
+
</span><span> </span><span style="color:#81a1c1;">return null</span><span style="color:#eceff4;">;
|
|
1343
|
+
</span><span> }
|
|
1344
|
+
</span><span>}
|
|
1345
|
+
</span></code></pre>
|
|
1346
|
+
<p>The <code>get</code> method is very similar to <code>set</code>. It uses <code>bucket</code> and <code>entry</code> to find
|
|
1347
|
+
the entry related to the <code>key</code> passed in, just like <code>set</code> does. If an entry is
|
|
1348
|
+
found, its <code>value</code> is returned. If one isn't found, <code>null</code> is returned.</p>
|
|
1349
|
+
<p>That was quite a lot of code. What you should take away from it is that our
|
|
1350
|
+
hash map is a list of lists, and a hash function is used to know which of the
|
|
1351
|
+
lists to store and retrieve a given key from.</p>
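The pieces above can be assembled into something runnable. An FNV-1a string hash stands in for murmur3 here, since murmur3 isn't built into JavaScript; the rest matches the sketch above.

```javascript
// FNV-1a string hash, a stand-in for murmur3 in this sketch.
function fnv1a(str) {
  let h = 0x811c9dc5;
  for (const c of str) {
    h ^= c.charCodeAt(0);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0;
}

class HashMap {
  constructor() {
    this.bs = [[], [], []];
  }
  // Hash the key to pick one of the buckets.
  bucket(key) {
    return this.bs[fnv1a(key) % this.bs.length];
  }
  // Scan a bucket for the entry with this key, or null.
  entry(bucket, key) {
    for (const e of bucket) {
      if (e.key === key) return e;
    }
    return null;
  }
  set(key, value) {
    const b = this.bucket(key);
    const e = this.entry(b, key);
    if (e) {
      e.value = value;
      return;
    }
    b.push({ key, value });
  }
  get(key) {
    const b = this.bucket(key);
    const e = this.entry(b, key);
    return e ? e.value : null;
  }
}

const map = new HashMap();
map.set("hello", "world");
console.log(map.get("hello")); // → "world"
```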
|
|
1352
|
+
<p>Here's a visual representation of this hash map in action. <span class="purple
|
|
1353
|
+
bold">Click</span> anywhere on the buckets to add a new key-value pair using our
|
|
1354
|
+
<code>set</code> method. To keep the visualisation simple, if a bucket "overflows",
|
|
1355
|
+
the buckets are all reset.</p>
|
|
1356
|
+
<p><hash-map hashFn=murmur3 valueFn=intToStr></hash-map></p>
|
|
1357
|
+
<p>Because we're using <code class="green bold">murmur3</code> as our hash
|
|
1358
|
+
function, you should see good distribution between the buckets. It's expected
|
|
1359
|
+
you'll see <em>some</em> imbalance, but it should generally be quite even.</p>
|
|
1360
|
+
<p>To get a value out of the hash map, we first hash the key to figure out which
|
|
1361
|
+
bucket the value will be in. Then we have to compare the key we're searching for
|
|
1362
|
+
against all of the keys in the bucket.</p>
|
|
1363
|
+
<p>It's this search step that we minimise through hashing, and why <code
|
|
1364
|
+
class="bold green">murmur3</code> is optimised for speed. The faster the hash
|
|
1365
|
+
function, the faster we find the right bucket to search, the faster our hash
|
|
1366
|
+
map is overall.</p>
|
|
1367
|
+
<p>This is also why reducing collisions is so crucial. If we did decide to use that
|
|
1368
|
+
dummy hash function from all the way at the start of this article, the one that
|
|
1369
|
+
returns 0 all the time, we'd put all of our key-value pairs into the first
|
|
1370
|
+
bucket. Finding anything could mean we have to check all of the values in the
|
|
1371
|
+
hash map. With a good hash function, with good distribution, we reduce the
|
|
1372
|
+
amount of searching we have to do to 1/N, where N is the number of buckets.</p>
|
|
1373
|
+
<p>Let's see how <code class="red bold">stringSum</code> does.</p>
|
|
1374
|
+
<p><hash-map hashFn=stringSum valueFn=intToStr></hash-map></p>
|
|
1375
|
+
<p>Interestingly, <code class="bold red">stringSum</code> seems to distribute
|
|
1376
|
+
values quite well. You might notice a pattern, but the overall distribution looks
|
|
1377
|
+
good.</p>
|
|
1378
|
+
<blockquote class="haskie">
|
|
1379
|
+
<img src="/images/haskie-triumphant-200px.png" />
|
|
1380
|
+
<p>
|
|
1381
|
+
Finally! A win for <code class="bold red">stringSum</code>. I knew it would
|
|
1382
|
+
be good for something.
|
|
1383
|
+
</p>
|
|
1384
|
+
</blockquote>
|
|
1385
|
+
<p>Not so fast, Haskie. We need to talk about a serious problem. The distribution
|
|
1386
|
+
looks okay on these sequential numbers, but we've seen that <code class="bold
|
|
1387
|
+
red">stringSum</code> doesn't have a good avalanche effect. This doesn't end
|
|
1388
|
+
well.</p>
|
|
1389
|
+
<h2 id="real-world-collisions"><a class="anchor" href="#real-world-collisions">#</a>
|
|
1390
|
+
Real-world collisions</h2>
|
|
1391
|
+
<p>Let's look at 2 real-world data sets: IP addresses and English words. What I'm
|
|
1392
|
+
going to do is take 100,000,000 random IP addresses and <a href="https://github.com/dwyl/english-words">466,550 English
|
|
1393
|
+
words</a>, hash all of them with both <code class="bold green">murmur3</code>
|
|
1394
|
+
and <code class="red bold">stringSum</code>, and see how many collisions we get.</p>
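The counting methodology is simple: hash each value in turn, and record a collision whenever a hash has been seen before. A sketch of it, with the actual data sets omitted:

```javascript
// Count collisions in a data set: every time a hash value repeats,
// that's one collision.
function countCollisions(hashFn, values) {
  const seen = new Set();
  let collisions = 0;
  for (const v of values) {
    const h = hashFn(v);
    if (seen.has(h)) {
      collisions += 1;
    } else {
      seen.add(h);
    }
  }
  return collisions;
}

// A constant hash collides on everything after the first value.
console.log(countCollisions(() => 0, ["a", "b", "c"])); // → 2
```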
|
|
1395
|
+
<p><b>IP Addresses</b></p>
|
|
1396
|
+
<table class="datasets">
|
|
1397
|
+
<th>
|
|
1398
|
+
</th>
|
|
1399
|
+
<th>
|
|
1400
|
+
<code class="bold green">murmur3</code>
|
|
1401
|
+
</th>
|
|
1402
|
+
<th>
|
|
1403
|
+
<code class="bold red">stringSum</code>
|
|
1404
|
+
</th>
|
|
1405
|
+
<tr>
|
|
1406
|
+
<td>Collisions</td>
|
|
1407
|
+
<td>1,156,959</td>
|
|
1408
|
+
<td>99,999,566</td>
|
|
1409
|
+
</tr>
|
|
1410
|
+
<tr>
|
|
1411
|
+
<td></td>
|
|
1412
|
+
<td>1.157%</td>
|
|
1413
|
+
<td>99.999%</td>
|
|
1414
|
+
</tr>
|
|
1415
|
+
</table>
|
|
1416
|
+
<p><b>English words</b></p>
|
|
1417
|
+
<table class="datasets">
|
|
1418
|
+
<th>
|
|
1419
|
+
</th>
|
|
1420
|
+
<th>
|
|
1421
|
+
<code class="bold green">murmur3</code>
|
|
1422
|
+
</th>
|
|
1423
|
+
<th>
|
|
1424
|
+
<code class="bold red">stringSum</code>
|
|
1425
|
+
</th>
|
|
1426
|
+
<tr>
|
|
1427
|
+
<td>Collisions</td>
|
|
1428
|
+
<td>25</td>
|
|
1429
|
+
<td>464,220</td>
|
|
1430
|
+
</tr>
|
|
1431
|
+
<tr>
|
|
1432
|
+
<td></td>
|
|
1433
|
+
<td>0.005%</td>
|
|
1434
|
+
<td>99.5%</td>
|
|
1435
|
+
</tr>
|
|
1436
|
+
</table>
|
|
1437
|
+
<p>When we use hash maps for real, we aren't usually storing random values in them.
|
|
1438
|
+
We can imagine counting the number of times we've seen an IP address in
|
|
1439
|
+
rate-limiting code for a server. Or code that counts the occurrences of words in
|
|
1440
|
+
books throughout history to track their origin and popularity. <code class="bold
|
|
1441
|
+
red">stringSum</code> sucks for these applications because of its extremely
|
|
1442
|
+
high collision rate.</p>
|
|
1443
|
+
<h2 id="manufactured-collisions"><a class="anchor" href="#manufactured-collisions">#</a>
|
|
1444
|
+
Manufactured collisions</h2>
|
|
1445
|
+
<p>Now it's <code class="bold green">murmur3</code>'s turn for some bad news.
|
|
1446
|
+
It's not just collisions caused by similarity in the input we have to worry
|
|
1447
|
+
about. Check this out.</p>
|
|
1448
|
+
<p><hash-map hashFn=murmur3 valueFn=murmur3Collisions></hash-map></p>
|
|
1449
|
+
<p>What's happening here? Why do all of these gibberish strings hash to the same
|
|
1450
|
+
number?</p>
|
|
1451
|
+
<p>I hashed 141 trillion random strings to find values that hash to the number
|
|
1452
|
+
<code>1228476406</code> when using <code class="green bold">murmur3</code>. Hash functions
|
|
1453
|
+
have to always return the same output for a specific input, so it's possible to
|
|
1454
|
+
find collisions by brute force.</p>
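At toy scale, the brute-force idea looks like the sketch below. Against the weak stringSum function it finds collisions almost instantly rather than in 25 minutes; the choice of 7-letter random lowercase strings is arbitrary, not how the real search was done.

```javascript
function stringSum(input) {
  let hash = 0;
  for (let c of input) {
    hash += c.charCodeAt(0);
  }
  return hash % 1000000;
}

// Generate random lowercase strings until `count` of them hash to the
// same value as `target`.
function findCollisions(hashFn, target, count) {
  const wanted = hashFn(target);
  const found = [];
  while (found.length < count) {
    const candidate = Array.from({ length: 7 }, () =>
      String.fromCharCode(97 + Math.floor(Math.random() * 26))
    ).join("");
    if (candidate !== target && hashFn(candidate) === wanted) {
      found.push(candidate);
    }
  }
  return found;
}

// e.g. findCollisions(stringSum, "antlers", 3) returns three gibberish
// strings that all hash to the same value as "antlers".
```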
|
|
1455
|
+
<blockquote class="haskie">
|
|
1456
|
+
<img src="/images/haskie-concerned-200px.png" />
|
|
1457
|
+
<p>
|
|
1458
|
+
I'm sorry, 141 <b>trillion</b>? Like... 141 and then 12 zeroes?
|
|
1459
|
+
</p>
|
|
1460
|
+
</blockquote>
|
|
1461
|
+
<p>Yes, and it only took me 25 minutes. <a href="https://computers-are-fast.github.io/">Computers are fast</a>.</p>
|
|
1462
|
+
<p>Bad actors having easy access to collisions can be devastating if your software
|
|
1463
|
+
builds hash maps out of user input. Take HTTP headers, for example. An HTTP
|
|
1464
|
+
request looks like this:</p>
|
|
1465
|
+
<pre data-lang="txt" style="background-color:#2e3440;color:#d8dee9;" class="language-txt "><code class="language-txt" data-lang="txt"><span>GET / HTTP/1.1
|
|
1466
|
+
</span><span>Accept: */*
|
|
1467
|
+
</span><span>Accept-Encoding: gzip, deflate
|
|
1468
|
+
</span><span>Connection: keep-alive
|
|
1469
|
+
</span><span>Host: google.com
|
|
1470
|
+
</span></code></pre>
|
|
1471
|
+
<p>You don't have to understand all of the words, just that the first line is the
|
|
1472
|
+
path being requested and all of the other lines are headers. Headers are <code>Key: Value</code> pairs, so HTTP servers tend to use maps to store them. Nothing stops us
|
|
1473
|
+
from passing any headers we want, so we can be really mean and pass headers we
|
|
1474
|
+
know will cause collisions. This can significantly slow down the server.</p>
|
|
1475
|
+
<p><a href="https://cryptanalysis.eu/blog/2011/12/28/effective-dos-attacks-against-web-application-plattforms-hashdos/">This isn't
|
|
1476
|
+
theoretical, either</a>.
|
|
1477
|
+
If you search "HashDoS" you'll find a lot more examples of this. It was a really
|
|
1478
|
+
big deal in the mid-2000s.</p>
|
|
1479
|
+
<p>There are a few ways to mitigate this specific to HTTP servers: ignoring
|
|
1480
|
+
gibberish header keys and limiting the number of headers you store, for
|
|
1481
|
+
example. But modern hash functions like <code class="bold green">murmur3</code>
|
|
1482
|
+
offer a more generalised solution: randomisation.</p>
|
|
1483
|
+
<p>Earlier in this post we showed some examples of hash function implementations.
|
|
1484
|
+
Those implementations took a single argument: <code>input</code>. Lots of modern hash
|
|
1485
|
+
functions take a 2nd parameter: <code>seed</code> (sometimes called <code>salt</code>). In the case
|
|
1486
|
+
of <code class="green bold">murmur3</code>, this seed is a number.</p>
|
|
1487
|
+
<p>So far, we've been using 0 as the seed. Let's see what happens with the
|
|
1488
|
+
collisions I've collected when we use a seed of 1.</p>
|
|
1489
|
+
<p><hash-map hashFn=murmur3 valueFn=murmur3Collisions seed=1></hash-map></p>
|
|
1490
|
+
<p>Just like that, 0 to 1, the collisions are gone. This is the purpose of the
|
|
1491
|
+
seed: it randomises the output of the hash function in an unpredictable way.
|
|
1492
|
+
How it achieves this is beyond the scope of this article, all hash functions do
|
|
1493
|
+
this in their own way.</p>
|
|
1494
|
+
<p>The hash function still returns the same output for the same input, it's just
|
|
1495
|
+
that the input is a combination of <code>input</code> and <code>seed</code>. Things that collide with
|
|
1496
|
+
one seed shouldn't collide when using another. Programming languages often
|
|
1497
|
+
generate a random number to use as the seed when the process starts, so that
|
|
1498
|
+
every time you run your program the seed is different. As a bad guy, not knowing
|
|
1499
|
+
the seed, it is now impossible for me to reliably cause harm.</p>
|
|
1500
|
+
<p>If you look closely in the above visualisation and the one before it, they're
|
|
1501
|
+
the same values being hashed but they produce different hash values. The
|
|
1502
|
+
implication of this is that if you hash a value with one seed, and want to be
|
|
1503
|
+
able to compare against it in the future, you need to make sure you use the same
|
|
1504
|
+
seed.</p>
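<p>Here's a small sketch of how a seed can fold into a hash function's state. It uses FNV-1a rather than murmur3, because FNV-1a's mixing step fits in a few lines; the principle, that the seed perturbs the starting state, carries over.</p>

```python
# A seeded variant of the FNV-1a hash. This is NOT murmur3's algorithm,
# just an illustration of folding a seed into the initial state.
FNV_OFFSET = 2166136261
FNV_PRIME = 16777619

def fnv1a(data, seed=0):
    # XOR the seed into the starting state, then hash as normal,
    # keeping the state within 32 bits.
    h = (FNV_OFFSET ^ seed) % (2 ** 32)
    for byte in data.encode("utf-8"):
        h = (h ^ byte) * FNV_PRIME % (2 ** 32)
    return h

# Same input and same seed: always the same hash.
assert fnv1a("hello", seed=42) == fnv1a("hello", seed=42)
# Same input, different seed: a different hash.
assert fnv1a("hello", seed=0) != fnv1a("hello", seed=1)
```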
<p>Having different values for different seeds doesn't affect the hash map
use-case, because hash maps only live for the duration the program is running.
Provided you use the same seed for the lifetime of the program, your hash maps
will continue to work just fine. If you ever store hash values outside of your
program, in a file for example, you need to be careful you know what seed has
been used.</p>
<h2 id="playground"><a class="anchor" href="#playground">#</a>
Playground</h2>
<p>As is tradition, I've made a playground for you to write your own hash functions
and see them visualised with the grids seen in this article. Click
<a href="/hashing-playground/">here</a> to try it!</p>
<h2 id="conclusion"><a class="anchor" href="#conclusion">#</a>
Conclusion</h2>
<p>We've covered what a hash function is, some ways to measure how good it is,
what happens when it's not good, and some of the ways they can be broken by
bad actors.</p>
<p>The universe of hash functions is a large one, and we've really only scratched
the surface in this post. We haven't spoken about cryptographic vs
non-cryptographic hashing, we've touched on only 1 of the thousands of use-cases
for hash functions, and we haven't talked about how exactly modern hash
functions actually work.</p>
<p>Some further reading I recommend if you're really enthusiastic about this topic
and want to learn more:</p>
<ul>
<li><a href="https://github.com/rurban/smhasher">https://github.com/rurban/smhasher</a> this repository is the gold standard
for testing how good hash functions are. They run a tonne of tests against
a wide range of hash functions and present the results in a big table. It
will be difficult to understand what all of the tests are for, but this is
where the state of the art of hash testing lives.</li>
<li><a href="https://djhworld.github.io/hyperloglog/">https://djhworld.github.io/hyperloglog/</a> this is an interactive piece
on a data structure called HyperLogLog. It's used to efficiently count the
number of unique elements in very, very large sets. It uses
hashing to do it in a really clever way.</li>
<li><a href="https://www.gnu.org/software/gperf/">https://www.gnu.org/software/gperf/</a> is a piece of software that, when given
the expected set of things you want to hash, can generate a "perfect" hash
function automatically.</li>
</ul>
<p>Feel free to join the discussion on <a href="https://news.ycombinator.com/item?id=36401747">Hacker News</a>!</p>
<h2 id="acknowledgements"><a class="anchor" href="#acknowledgements">#</a>
Acknowledgements</h2>
<p>Thanks to everyone who read early drafts and provided invaluable feedback.</p>
<ul>
<li><a href="https://delroth.net/">delroth</a>, <a href="https://twitter.com/Manon_Lef/">Manon</a>, <a href="https://medium.com/@AaronKalair">Aaron</a>, <a href="https://twitter.com/TheCoppinger">Charlie</a></li>
</ul>
<p>And everyone who helped me find <code class="green bold">murmur3</code> hash
collisions:</p>
<ul>
<li><a href="https://indy.codes/">Indy</a>, <a href="https://medium.com/@AaronKalair">Aaron</a>, Max</li>
</ul>
<h2 id="patreon"><a class="anchor" href="#patreon">#</a>
Patreon</h2>
<p>After the success of <a href="/load-balancing/">Load Balancing</a> and <a href="/memory-allocation/">Memory
Allocation</a>, I have decided to set up a Patreon page:
<a href="https://patreon.com/samwho">https://patreon.com/samwho</a>. For all of these articles going forward, I am
going to post a Patreon-exclusive behind-the-scenes post talking about
decisions, difficulties, and lessons learned from each post. It will give you
a deep look into how these articles evolve, and I'm really stoked about
the one I've written for this one.</p>
<p>If you enjoy my writing, and want to support it going forward, I'd really
appreciate you becoming a patron. ❤️</p>
</content>

</entry>




<entry xml:lang="en">
<title>Memory Allocation</title>
<published>2023-04-13T00:00:00+00:00</published>
<updated>2023-04-13T00:00:00+00:00</updated>
<author>
<name>Unknown</name>
</author>
<link rel="alternate" href="https://samwho.dev/memory-allocation/" type="text/html"/>
<id>https://samwho.dev/memory-allocation/</id>
<content type="html"><script src="/js/gsap/gsap.min.js"></script>
<script src="/js/gsap/PixiPlugin.min.js"></script>
<script type="module" src="/js/memory-allocation.js"></script>
<style>
.memory {
width: 100%;
margin-bottom: 1.5em;
margin-top: 0.5em;
}
input[type=range]:focus {
outline: none;
}
a[simulation] {
cursor: pointer;
}
.size {
color: #0072B2 !important;
font-weight: bold;
}
.free {
color: #009E73 !important;
font-weight: bold;
}
.allocated {
color: #D55E00 !important;
font-weight: bold;
}
.usable-memory {
color: #E69F00 !important;
font-weight: bold;
}
</style>
<p>One thing that all programs on your computer have in common is a need for
memory. Programs need to be loaded from your hard drive into memory before they
can be run. While running, the majority of what programs do is load values from
memory, do some computation on them, and then store the result back in memory.</p>
<p>In this post I'm going to introduce you to the basics of memory allocation.
Allocators exist because it's not enough to have memory available, you need to
use it effectively. We will visually explore how simple allocators work. We'll
see some of the problems that they try to solve, and some of the techniques used
to solve them. At the end of this post, you should know everything you need to
know to write your own allocator.</p>
<h2 id="malloc-and-free"><a class="anchor" href="#malloc-and-free">#</a>
<code>malloc</code> and <code>free</code></h2>
<p>To understand the job of a memory allocator, it's essential to understand how
programs request and return memory. <code>malloc</code> and <code>free</code> are functions that were
first introduced in a recognisable form in UNIX v7 in 1979(!). Let's take a look
at a short C program demonstrating their use.</p>
<blockquote class="haskie">
<img src="/images/haskie-concerned-200px.png" />
<p>
Woah, hold on. I've never written any C code before. Will I still be able
to follow along?
</p>
</blockquote>
<p>If you have beginner-level familiarity with another language, e.g. JavaScript,
Python, or C#, you should have no problem following along. You don't need to
understand every word, as long as you get the overall idea. This is the only
C code in the article, I promise.</p>
<pre data-lang="c" style="background-color:#2e3440;color:#d8dee9;" class="language-c "><code class="language-c" data-lang="c"><span style="color:#5e81ac;">#include </span><span style="color:#a3be8c;">&lt;stdlib.h&gt;
</span><span>
</span><span style="color:#81a1c1;">int </span><span style="color:#88c0d0;">main</span><span>() {
</span><span> </span><span style="color:#81a1c1;">void *</span><span>ptr </span><span style="color:#81a1c1;">= </span><span style="color:#88c0d0;">malloc</span><span>(</span><span style="color:#b48ead;">4</span><span>)</span><span style="color:#eceff4;">;
</span><span> </span><span style="color:#88c0d0;">free</span><span>(ptr)</span><span style="color:#eceff4;">;
</span><span> </span><span style="color:#81a1c1;">return </span><span style="color:#b48ead;">0</span><span style="color:#eceff4;">;
</span><span>}
</span></code></pre>
<p>In the above program we ask for 4 bytes of memory by calling <code>malloc(4)</code>, we
store the value returned in a variable called <code>ptr</code>, then we indicate that we're
done with the memory by calling <code>free(ptr)</code>.</p>
<p>These two functions are how almost all programs manage the memory they use.
Even when you're not writing C, the code that is executing your Java, Python,
Ruby, JavaScript, and so on makes use of <code>malloc</code> and <code>free</code>.</p>
<h2 id="what-is-memory"><a class="anchor" href="#what-is-memory">#</a>
What is memory?</h2>
<p>The smallest unit of memory that allocators work with is called a "byte." A byte
can store any number between 0 and 255. You can think of memory as being a long
sequence of bytes. We're going to represent this sequence as a grid of squares,
with each square representing a byte of memory.</p>
<div class="memory" bytes="32" slider=false>
</div>
<p>In the C code from before, <code>malloc(4)</code> allocates 4 bytes of memory. We're going
to represent memory that has been allocated as darker squares.</p>
<div class="memory" bytes="32" slider=false>
<malloc size="4" addr="0x0"></malloc>
</div>
<p>Then <code>free(ptr)</code> tells the allocator we're done with that memory. It is returned
to the pool of available memory.</p>
<p>Here's what 4 <code>malloc</code> calls followed by 4 <code>free</code> calls looks like. You'll
notice there's now a slider. Dragging the slider to the right advances time
forward, and dragging it left rewinds. You can also click anywhere on the grid
and then use the arrow keys on your keyboard, or you can use the left and right
buttons. The ticks along the slider represent calls to <code>malloc</code> and <code>free</code>.</p>
<div class="memory" bytes="32">
<malloc size="4" addr="0x0"></malloc>
<malloc size="5" addr="0x4"></malloc>
<malloc size="6" addr="0x9"></malloc>
<malloc size="7" addr="0xf"></malloc>
<free addr="0x0"></free>
<free addr="0x4"></free>
<free addr="0x9"></free>
<free addr="0xf"></free>
</div>
<blockquote class="haskie">
<img src="/images/haskie-confused-200px.png" />
<p>
Wait a sec... What is <code>malloc</code> actually returning as a value?
What does it mean to "give" memory to a program?
</p>
</blockquote>
<p>What <code>malloc</code> returns is called a "pointer" or a "memory address." It's a number
that identifies a byte in memory. We typically write addresses in a form called
"hexadecimal." Hexadecimal numbers are written with a <code>0x</code> prefix to distinguish
them from decimal numbers. Move the slider below to see a comparison between
decimal numbers and hexadecimal numbers.</p>
<div
id="hexadecimal-demo"
style="display: flex; width: 100%; flex-direction: column;"
>
<div style="width: 100%; font-size: 2.5rem; display: flex; justify-content: center;">
<code id="decimal" style="min-width: 4rem; text-align: right">0</code>
<code style="min-width: 5rem; text-align: center">==</code>
<code id="hexadecimal" style="min-width: 7rem; text-align: left">0x0</code>
</div>
<div>
<input
id="hexadecimal-slider"
type="range"
min="0"
max="32"
step="1"
value="0"
list="hexadecimal-demo-list"
style="width: 100%" />
<datalist id="hexadecimal-demo-list">
<option value=1></option>
<option value=2></option>
<option value=3></option>
<option value=4></option>
<option value=5></option>
<option value=6></option>
<option value=7></option>
<option value=8></option>
<option value=9></option>
<option value=10></option>
<option value=11></option>
<option value=12></option>
<option value=13></option>
<option value=14></option>
<option value=15></option>
<option value=16></option>
<option value=17></option>
<option value=18></option>
<option value=19></option>
<option value=20></option>
<option value=21></option>
<option value=22></option>
<option value=23></option>
<option value=24></option>
<option value=25></option>
<option value=26></option>
<option value=27></option>
<option value=28></option>
<option value=29></option>
<option value=30></option>
<option value=31></option>
<option value=32></option>
</datalist>
</div>
</div>
<p>Here's our familiar grid of memory. Each byte is annotated with its address in
hexadecimal form. For space reasons, I've omitted the <code>0x</code> prefix.</p>
<div class="memory" bytes=32 slider=false>
<annotate type="text" addr=0x0 text=0></annotate>
<annotate type="text" addr=0x1 text=1></annotate>
<annotate type="text" addr=0x2 text=2></annotate>
<annotate type="text" addr=0x3 text=3></annotate>
<annotate type="text" addr=0x4 text=4></annotate>
<annotate type="text" addr=0x5 text=5></annotate>
<annotate type="text" addr=0x6 text=6></annotate>
<annotate type="text" addr=0x7 text=7></annotate>
<annotate type="text" addr=0x8 text=8></annotate>
<annotate type="text" addr=0x9 text=9></annotate>
<annotate type="text" addr=0xA text=A></annotate>
<annotate type="text" addr=0xB text=B></annotate>
<annotate type="text" addr=0xC text=C></annotate>
<annotate type="text" addr=0xD text=D></annotate>
<annotate type="text" addr=0xE text=E></annotate>
<annotate type="text" addr=0xF text=F></annotate>
<annotate type="text" addr=0x10 text=10></annotate>
<annotate type="text" addr=0x11 text=11></annotate>
<annotate type="text" addr=0x12 text=12></annotate>
<annotate type="text" addr=0x13 text=13></annotate>
<annotate type="text" addr=0x14 text=14></annotate>
<annotate type="text" addr=0x15 text=15></annotate>
<annotate type="text" addr=0x16 text=16></annotate>
<annotate type="text" addr=0x17 text=17></annotate>
<annotate type="text" addr=0x18 text=18></annotate>
<annotate type="text" addr=0x19 text=19></annotate>
<annotate type="text" addr=0x1A text=1A></annotate>
<annotate type="text" addr=0x1B text=1B></annotate>
<annotate type="text" addr=0x1C text=1C></annotate>
<annotate type="text" addr=0x1D text=1D></annotate>
<annotate type="text" addr=0x1E text=1E></annotate>
<annotate type="text" addr=0x1F text=1F></annotate>
</div>
<p>The examples we use in this article pretend that your computer only has a very
small amount of memory, but in real life you have billions of bytes to work
with. Real addresses are much larger than what we're using here, but the idea is
exactly the same. Memory addresses are numbers that refer to a specific byte in
memory.</p>
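<p>If you want to play with the conversion yourself, Python's built-in <code>hex</code>, <code>int</code>, and <code>format</code> translate between the two notations. A quick sketch, separate from the visualisation above:</p>

```python
# Hexadecimal is just another way of writing the same number.
assert hex(26) == "0x1a"
assert int("1a", 16) == 26

# The byte labels in the grid above, 0 through 1F, minus the 0x prefix:
labels = [format(i, "X") for i in range(32)]
assert labels[10] == "A"
assert labels[31] == "1F"
```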
<h2 id="the-simplest-malloc"><a class="anchor" href="#the-simplest-malloc">#</a>
The simplest <code>malloc</code></h2>
<p>The "hello world" of <code>malloc</code> implementations would hand out blocks of memory by
keeping track of where the previous block ended and starting the next block
right after. Below we represent where the next block should start with a grey
square.</p>
<div class="memory" bytes="32">
<allocator path="/js/allocators/stack.js"></allocator>
<malloc size="4" addr="0x0"></malloc>
<malloc size="5" addr="0x4"></malloc>
<malloc size="6" addr="0x9"></malloc>
<malloc size="7" addr="0xf"></malloc>
</div>
<p>You'll notice no memory is <code>free</code>d. If we're only keeping track of where the
next block should start, and we don't know where previous blocks start or end,
<code>free</code> doesn't have enough information to do anything. So it doesn't. This is
called a "memory leak" because, once allocated, the memory can never be used
again.</p>
<p>Believe it or not, this isn't a completely useless implementation. For programs
that use a known amount of memory, this can be a very efficient strategy. It's
extremely fast and extremely simple. As a general-purpose memory allocator,
though, we can't get away with having no <code>free</code> implementation.</p>
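<p>This strategy is often called a "bump" allocator, and it fits in a few lines. Here's a Python sketch of the idea, not the article's own implementation (that lives in <code>/js/allocators/stack.js</code>): <code>malloc</code> bumps a cursor forward and <code>free</code> is a no-op.</p>

```python
# A "bump" allocator: hand out the next free address, never reclaim.
class BumpAllocator:
    def __init__(self, total_bytes):
        self.total = total_bytes
        self.cursor = 0  # where the next block will start

    def malloc(self, size):
        if self.cursor + size > self.total:
            return None  # out of memory
        addr = self.cursor
        self.cursor += size
        return addr

    def free(self, addr):
        # We don't track block starts or sizes, so there's nothing to
        # do here: the memory leaks.
        pass

a = BumpAllocator(32)
# The same sequence as the visualisation above: 0x0, 0x4, 0x9, 0xf.
assert [a.malloc(n) for n in (4, 5, 6, 7)] == [0, 4, 9, 15]
```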
<h2 id="the-simplest-general-purpose-malloc"><a class="anchor" href="#the-simplest-general-purpose-malloc">#</a>
The simplest general-purpose <code>malloc</code></h2>
<p>In order to <code>free</code> memory, we need to keep better track of memory. We can do
this by saving the address and size of all allocations, and the address and size
of blocks of free memory. We'll call these an "allocation list" and a "free
list" respectively.</p>
<div class="memory" bytes="32" slider=false>
<allocator path="/js/allocators/freelist.js">
<options coalesce=false></options>
</allocator>
</div>
<p>We're representing free list entries as 2 grey squares linked together with a
line. You can imagine this entry being represented in code as <code>address=0</code> and
<code>size=32</code>. When our program starts, all of memory is marked as free. When
<code>malloc</code> is called, we loop through our free list until we find a block large
enough to accommodate it. When we find one, we save the address and size of the
allocation in our allocation list, and shrink the free list entry accordingly.</p>
<div class="memory" bytes="32" slider=false>
<allocator path="/js/allocators/freelist.js">
<options coalesce=false></options>
</allocator>
<malloc id=1 size=4></malloc>
</div>
<blockquote class="haskie">
<img src="/images/haskie-confused-200px.png" />
<p>
Where do we save allocations and free list entries? Aren't we pretending our
computer only has 32 bytes of memory?
</p>
</blockquote>
<p>You caught me. One of the benefits of being a memory allocator is that you're in
charge of memory. You could store your allocation/free list in a reserved area
that's just for you. Or you could store it inline, in a few bytes immediately
preceding each allocation. For now, assume we have reserved some unseen memory
for ourselves and we're using it to store our allocation and free lists.</p>
<p>So what about <code>free</code>? Because we've saved the address and size of the allocation
in our allocation list, we can search that list and move the allocation back
into the free list. Without the size information, we wouldn't be able to do this.</p>
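<p>Here's a Python sketch of that scheme. The article's own implementation is the JavaScript in <code>/js/allocators/freelist.js</code>; this just mirrors the description: a first-fit <code>malloc</code> that shrinks free-list entries, and a <code>free</code> that uses the recorded size to return blocks. No coalescing yet.</p>

```python
# Free-list allocator sketch. Free-list entries are (address, size)
# pairs; the allocation list maps address to size.
class FreeListAllocator:
    def __init__(self, total_bytes):
        self.free_list = [(0, total_bytes)]
        self.allocations = {}  # address mapped to size

    def malloc(self, size):
        # First fit: take the first free block large enough.
        for i, (addr, block_size) in enumerate(self.free_list):
            if block_size >= size:
                if block_size == size:
                    self.free_list.pop(i)
                else:
                    # Shrink the free entry from the front.
                    self.free_list[i] = (addr + size, block_size - size)
                self.allocations[addr] = size
                return addr
        return None  # no block large enough

    def free(self, addr):
        # The recorded size is what makes free possible at all.
        size = self.allocations.pop(addr)
        self.free_list.append((addr, size))
        self.free_list.sort()

a = FreeListAllocator(32)
p = a.malloc(4)
assert p == 0
a.free(p)
# The 4 bytes are available again, but as a separate free entry:
assert a.free_list == [(0, 4), (4, 28)]
```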
<div class="memory" bytes="32">
<allocator path="/js/allocators/freelist.js">
<options coalesce=false></options>
</allocator>
<malloc id=1 size=4></malloc>
<free id=1></free>
</div>
<p>Our free list now has 2 entries. This might look harmless, but actually
represents a significant problem. Let's see that problem in action.</p>
<div class="memory" bytes="32">
<allocator path="/js/allocators/freelist.js">
<options coalesce=false></options>
</allocator>
<malloc id=1 size=4></malloc>
<malloc id=2 size=4></malloc>
<malloc id=3 size=4></malloc>
<malloc id=4 size=4></malloc>
<malloc id=5 size=4></malloc>
<malloc id=6 size=4></malloc>
<malloc id=7 size=4></malloc>
<malloc id=8 size=4></malloc>
<free id=1></free>
<free id=2></free>
<free id=3></free>
<free id=4></free>
<free id=5></free>
<free id=6></free>
<free id=7></free>
<free id=8></free>
</div>
<p>We allocated 8 blocks of memory, each 4 bytes in size. Then we <code>free</code>d them all,
resulting in 8 free list entries. The problem we have now is that if we tried
to do a <code>malloc(8)</code>, there are no items in our free list that can hold 8 bytes
and the <code>malloc(8)</code> will fail.</p>
<p>To solve this, we need to do a bit more work. When we <code>free</code> memory, we should
make sure that if the block we return to the free list is next to any other
free blocks, we combine them together. This is called "coalescing."</p>
<div class="memory" bytes="32">
<allocator path="/js/allocators/freelist.js">
<options coalesce=true></options>
</allocator>
<malloc id=1 size=4></malloc>
<malloc id=2 size=4></malloc>
<malloc id=3 size=4></malloc>
<malloc id=4 size=4></malloc>
<malloc id=5 size=4></malloc>
<malloc id=6 size=4></malloc>
<malloc id=7 size=4></malloc>
<malloc id=8 size=4></malloc>
<free id=1></free>
<free id=2></free>
<free id=3></free>
<free id=4></free>
<free id=5></free>
<free id=6></free>
<free id=7></free>
<free id=8></free>
</div>
<p>Much better.</p>
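<p>Coalescing itself can be sketched in a few lines. This assumes the shape from the earlier sketch, a sorted list of <code>(address, size)</code> pairs, and merges any entries that sit directly next to each other:</p>

```python
def coalesce(free_list):
    # Merge adjacent (address, size) free-list entries.
    merged = []
    for addr, size in sorted(free_list):
        if merged and merged[-1][0] + merged[-1][1] == addr:
            # This block starts exactly where the previous one ends:
            # grow the previous entry instead of keeping two.
            prev_addr, prev_size = merged[-1]
            merged[-1] = (prev_addr, prev_size + size)
        else:
            merged.append((addr, size))
    return merged

# Eight freed 4-byte blocks collapse back into one 32-byte block,
# so a later malloc(8) can succeed.
eight_blocks = [(i * 4, 4) for i in range(8)]
assert coalesce(eight_blocks) == [(0, 32)]
```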
<h2 id="fragmentation"><a class="anchor" href="#fragmentation">#</a>
Fragmentation</h2>
<p>A perfectly coalesced free list doesn't solve all of our problems. The following
example shows a longer sequence of allocations. Have a look at the state memory
is in at the end.</p>
<div class="memory" bytes="32">
<allocator path="/js/allocators/freelist.js">
<options coalesce=true></options>
</allocator>
<malloc id=1 size=1></malloc>
<malloc id=2 size=2></malloc>
<free id=1></free>
<malloc id=3 size=2></malloc>
<malloc id=4 size=3></malloc>
<free id=3></free>
<malloc id=5 size=3></malloc>
<malloc id=6 size=4></malloc>
<free id=5></free>
<malloc id=7 size=4></malloc>
<malloc id=8 size=5></malloc>
<free id=7></free>
<free id=2></free>
<free id=4></free>
<malloc id=9 size=4></malloc>
<malloc id=10 size=4></malloc>
<malloc id=11 size=4></malloc>
<malloc id=12 size=5></malloc>
</div>
<p>We end this sequence with 6 of our 32 bytes free, but they're split into 2
blocks of 3 bytes. If we had to service a <code>malloc(6)</code>, while we have enough free
memory in theory, we wouldn't be able to. This is called "fragmentation."</p>
<blockquote class="haskie">
<img src="/images/haskie-confused-200px.png" />
<p>
Couldn't we rearrange the memory to get a block of 6 contiguous bytes? Some
sort of defragmentation process?
</p>
</blockquote>
<p>Sadly not. Remember earlier we talked about how the return value of <code>malloc</code> is
the address of a byte in memory? Moving allocations won't change the pointers we
have already returned from <code>malloc</code>. We would change the value those pointers
are pointing at, effectively breaking them. This is one of the downsides of the
<code>malloc</code>/<code>free</code> API.</p>
<p>If we can't move allocations after creating them, we need to be more careful
about where we put them to begin with.</p>
<p>One way to combat fragmentation is, confusingly, to overallocate. If we always
allocate a minimum of 4 bytes, even when the request is for 1 byte, watch what
happens. This is the exact same sequence of allocations as above.</p>
<div class="memory" bytes="32">
<allocator path="/js/allocators/freelist.js">
<options coalesce=true minsize=4></options>
</allocator>
<malloc id=1 size=1></malloc>
<malloc id=2 size=2></malloc>
<free id=1></free>
<malloc id=3 size=2></malloc>
<malloc id=4 size=3></malloc>
<free id=3></free>
<malloc id=5 size=3></malloc>
<malloc id=6 size=4></malloc>
<free id=5></free>
<malloc id=7 size=4></malloc>
<malloc id=8 size=5></malloc>
<free id=7></free>
<free id=2></free>
<free id=4></free>
<malloc id=9 size=4></malloc>
<malloc id=10 size=4></malloc>
<malloc id=11 size=4></malloc>
<malloc id=12 size=5></malloc>
</div>
<p>Now we can service a <code>malloc(6)</code>. It's worth keeping in mind that this is just
one example. Programs will call <code>malloc</code> and <code>free</code> in very different patterns
depending on what they do, which makes it challenging to design an allocator
that always performs well.</p>
<blockquote class="haskie">
<img src="/images/haskie-confused-200px.png" />
<p>
After the first <code>malloc</code>, the start of the free list seems to fall
out of sync with allocated memory. Is that a bug in the visualisation?
</p>
</blockquote>
<p>No, that's a side-effect of overallocating. The visualisation shows "true"
memory use, whereas the free list is updated from the allocator's perspective.
So when the first <code>malloc</code> happens, 1 byte of memory is allocated but the free
list entry is moved forward 4 bytes. We trade some wasted space in return for
less fragmentation.</p>
<p>It's worth noting that this unused space that results from overallocation is
another form of fragmentation. It's memory that cannot be used until the
allocation that created it is freed. As a result, we wouldn't want to go too
wild with overallocation. If our program only ever allocated 1 byte at a time,
for example, we'd be wasting 75% of all memory.</p>
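<p>The rounding itself is trivial; the interesting part is the trade-off. A sketch, with the 4-byte minimum matching the <code>minsize=4</code> option above:</p>

```python
# Overallocation: round every request up to a minimum block size.
# Small requests waste a little space but leave fewer awkward 1-, 2-
# and 3-byte holes behind when freed.
MIN_SIZE = 4

def rounded_size(requested):
    return max(requested, MIN_SIZE)

assert rounded_size(1) == 4
assert rounded_size(3) == 4
assert rounded_size(5) == 5  # requests at or above the minimum pass through

# The cost: a program that only ever asks for 1 byte wastes 3 of
# every 4 bytes, i.e. 75% of memory.
waste = (MIN_SIZE - 1) / MIN_SIZE
assert waste == 0.75
```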
|
|
2006
|
+
<p>Another way to combat fragmentation is to segment memory into a space for small
|
|
2007
|
+
allocations and a space for big ones. In this next visualisation we start with
|
|
2008
|
+
two free lists. The lighter grey one is for allocations 3 bytes or smaller,
|
|
2009
|
+
and the darker grey one is for allocations 4 bytes or larger. Again, this is
|
|
2010
|
+
the exact same sequence of allocations as before.</p>
<div id="segmented-1" class="memory" bytes=32>
<allocator path="/js/allocators/segmented-freelist.js">
<options
coalesce=true
smallfreelistthreshold=3
smallfreelistsize=6>
</options>
</allocator>
<malloc id=1 size=1></malloc>
<malloc id=2 size=2></malloc>
<free id=1></free>
<malloc id=3 size=2></malloc>
<malloc id=4 size=3></malloc>
<free id=3></free>
<malloc id=5 size=3></malloc>
<malloc id=6 size=4></malloc>
<free id=5></free>
<malloc id=7 size=4></malloc>
<malloc id=8 size=5></malloc>
<free id=7></free>
<free id=2></free>
<free id=4></free>
<malloc id=9 size=4></malloc>
<malloc id=10 size=4></malloc>
<malloc id=11 size=4></malloc>
<malloc id=12 size=5></malloc>
</div>
<p>Nice! This also reduces fragmentation. If we're strictly only allowing
allocations of 3 bytes or less in the first segment, though, then we can't
service that <code>malloc(6)</code>. The trade-off here is that reserving a segment of
memory for smaller allocations gives you less memory to work with for bigger
ones.</p>
<blockquote class="haskie">
<img src="/images/haskie-triumphant-200px.png" />
<p>
Hey, <a simulation="segmented-1" position=4>the first allocation in the dark
grey free list</a> is 3 bytes! You said this was for allocations 4 bytes and
up. What gives?
</p>
</blockquote>
<p>Got me again. This implementation I've written will put small allocations in the
dark grey space when the light grey space is full. It will overallocate when it
does this, otherwise we'd end up with avoidable fragmentation in the dark grey
space thanks to small allocations.</p>
<p>Allocators that split memory up based on the size of allocation are called
"slab allocators." In practice they have many more size classes than the 2 in
our example.</p>
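<p>To make the idea concrete, here's a sketch of how a slab-style allocator might round a request up to a size class. The classes and the helper name are made up for illustration; real allocators like <code>dlmalloc</code> use many more classes.</p>

```javascript
// Hypothetical size classes for a slab-style allocator. Real
// implementations use many more, and spread them more carefully.
const SIZE_CLASSES = [8, 16, 32, 64, 128, 256];

// Round a requested size up to the smallest class that fits. Returns
// null when the request is too large for any class; a real allocator
// would fall back to a general-purpose allocation path in that case.
function sizeClassFor(size) {
  for (const cls of SIZE_CLASSES) {
    if (size <= cls) return cls;
  }
  return null;
}
```

<p>Because every allocation within a class is the same size, a freed slot can be reused by any later allocation in that class, which keeps fragmentation down inside each slab.</p>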
<h2 id="a-quick-malloc-puzzle"><a class="anchor" href="#a-quick-malloc-puzzle">#</a>
A quick <code>malloc</code> puzzle</h2>
<p>What happens if you <code>malloc(0)</code>? Have a think about this before playing with
the slider below.</p>
<div class="memory" bytes=32>
<allocator path="/js/allocators/freelist.js">
<options coalesce=true minsize=4></options>
</allocator>
<malloc id=1 size=0></malloc>
<malloc id=2 size=0></malloc>
<malloc id=3 size=0></malloc>
<malloc id=4 size=0></malloc>
<malloc id=5 size=0></malloc>
<malloc id=6 size=0></malloc>
<malloc id=7 size=0></malloc>
<malloc id=8 size=0></malloc>
</div>
<p>This is using our free list implementation that mandates a minimum size of 4
bytes for allocations. All memory gets allocated, but none is actually used.
Do you think this is correct behaviour?</p>
<p>It turns out that what happens when you <code>malloc(0)</code> differs between
implementations. Some of them behave as above, allocating space they probably
didn't have to. Others will return what's called a "null pointer", a special
pointer that will crash your program if you try to read or write the memory it
points to. Others pick one specific location in memory and return that same
location for all calls to <code>malloc(0)</code>, regardless of how many times it is called.</p>
<p>Moral of the story? Don't <code>malloc(0)</code>.</p>
<h2 id="inline-bookkeeping"><a class="anchor" href="#inline-bookkeeping">#</a>
Inline bookkeeping</h2>
<p>Remember earlier on when you asked about where allocation list and free list
information gets stored, and I gave an unsatisfying answer about how it's
stored in some other area of memory we've reserved for ourselves?</p>
<blockquote class="haskie">
<img src="/images/haskie-concerned-200px.png" />
<p>
Yes...
</p>
</blockquote>
<p>This isn't the only way to do it. Lots of allocators store information right
next to the blocks of memory they relate to. Have a look at this.</p>
<div class="memory" bytes=32 slider=false>
<allocator path="/js/allocators/inline.js">
</allocator>
</div>
<p>What we have here is memory with no allocations, but free list information
stored inline in that memory. Each block of memory, free or used, gets 3
additional bytes of bookkeeping information. If <code>address</code> is the address of the
first byte of the block, here's the layout of a block:</p>
<ol>
<li><code>address + 0</code> is the <span class="size">size</span> of the block</li>
<li><code>address + 1</code> is whether the block is <span class="free">free (1)</span> or <span class="allocated">used (2)</span></li>
<li><code>address + 2</code> is where the <span class="usable-memory">usable memory</span> starts</li>
<li><code>address + 2 + size</code> is the <span class="size">size</span> of the block again</li>
</ol>
<p>So in the above example, the byte at <code>0x0</code> is storing the value 29. This means
it's a block containing 29 bytes of memory. The value 1 at <code>0x1</code> indicates that
the block is free memory.</p>
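<p>Here's a minimal sketch of that layout in JavaScript, treating memory as a plain array of bytes. The helper names are hypothetical, not the playground's actual API.</p>

```javascript
const FREE = 1;
const USED = 2;

// Write a block's bookkeeping into `memory` starting at `address`:
// the size at `address`, the free/used flag at `address + 1`, and the
// size again at `address + 2 + size`, just past the usable memory.
function writeBlock(memory, address, size, flag) {
  memory[address] = size;
  memory[address + 1] = flag;
  memory[address + 2 + size] = size;
}

function blockSize(memory, address) {
  return memory[address];
}

function blockIsFree(memory, address) {
  return memory[address + 1] === FREE;
}
```

<p>With 32 bytes of memory, a single free block has 29 usable bytes: <code>writeBlock(memory, 0, 29, FREE)</code> puts 29 at <code>0x0</code>, 1 at <code>0x1</code>, and 29 again at <code>0x1f</code>, matching the visualisation above.</p>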
<blockquote class="haskie">
<img src="/images/haskie-concerned-200px.png" />
<p>
We store the <span class="size">size</span> twice? Isn't that wasteful?
</p>
</blockquote>
<p>It seems wasteful at first, but it is necessary if we want to do any form of
coalescing. Let's take a look at an example.</p>
<div class="memory" bytes=32 slider=false>
<allocator path="/js/allocators/inline.js">
</allocator>
<malloc id=1 size=4></malloc>
</div>
<p>Here we've allocated 4 bytes of memory. To do this, our <code>malloc</code> implementation
starts at the beginning of memory and checks to see if the block there is used.
It knows that at <code>address + 1</code> it will find either a 1 or a 2. If it finds a
1, it can check the value at <code>address</code> for how big the block is. If it is big
enough, it can allocate into it. If it's not big enough, it knows it can add
the value it finds in <code>address</code> to <code>address</code> to get to the start of the next
block of memory.</p>
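<p>As a sketch, that first-fit search might look like the following. One detail the prose glosses over: to land on the next block's first byte you also step over the 3 bytes of bookkeeping. This is hypothetical code, not the playground's implementation.</p>

```javascript
// Walk the blocks from the start of memory and return the address of
// the first free block big enough for `size`, or null if none fits.
// Layout per block: size at `address`, free (1) / used (2) flag at
// `address + 1`, usable memory from `address + 2`, and the size
// repeated at `address + 2 + size`.
function findFreeBlock(memory, size) {
  let address = 0;
  while (address < memory.length) {
    const blockSize = memory[address];
    const isFree = memory[address + 1] === 1;
    if (isFree && blockSize >= size) {
      return address;
    }
    // Step over this block: its usable bytes plus 3 bytes of bookkeeping.
    address += blockSize + 3;
  }
  return null; // no block can satisfy the request
}
```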
<p>This has resulted in the creation of a used block (notice the 2 stored in the
2nd byte), and it has pushed the start of the free block forward by 7 bytes. Let's
do the same again and allocate another 4 bytes.</p>
<div class="memory" bytes=32 slider=false>
<allocator path="/js/allocators/inline.js">
</allocator>
<malloc id=1 size=4></malloc>
<malloc id=2 size=4></malloc>
</div>
<p>Next, let's <code>free</code> our first <code>malloc(4)</code>. The implementation of <code>free</code> is where
storing information inline starts to shine. In our previous allocators, we had
to search the allocation list to know the size of the block being <code>free</code>d. Now
we know we'll find it at <code>address</code>. What's better than that is that for this
<code>free</code>, we don't even need to know how big the allocation is. We can just set
<code>address + 1</code> to 1!</p>
<div class="memory" bytes=32 slider=false>
<allocator path="/js/allocators/inline.js">
</allocator>
<malloc id=1 size=4></malloc>
<malloc id=2 size=4></malloc>
<free id=1></free>
</div>
<p>How great is that? Simple, fast.</p>
<p>What if we wanted to free the 2nd block of used memory? We know that we want to
coalesce to avoid fragmentation, but how do we do that? This is where the
seemingly wasteful bookkeeping comes into play.</p>
<p>When we coalesce, we check to see the state of the blocks immediately before and
immediately after the block we're <code>free</code>ing. We know that we can get to the next
block by adding the value at <code>address</code> to <code>address</code>, but how do we get to the
previous block? We take the value at <code>address - 1</code> and <em>subtract</em> that from
<code>address</code>. Without this duplicated size information at the end of the block, it
would be impossible to find the previous block and impossible to coalesce
properly.</p>
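<p>Putting that together, a sketch of <code>free</code> with coalescing might look like this. It's hypothetical code using the same block layout as above, and it assumes the tags in memory are well-formed.</p>

```javascript
// Free the block at `address`, merging it with any free neighbours.
// Layout per block: size at `address`, free (1) / used (2) flag at
// `address + 1`, usable memory from `address + 2`, size repeated at
// `address + 2 + size`. A block of size n occupies n + 3 bytes total.
function free(memory, address) {
  let start = address;
  let size = memory[address];

  // The previous block's trailing size tag sits at `address - 1`.
  // Subtracting that size plus 3 bytes of bookkeeping gives its start.
  if (address >= 3) {
    const prevSize = memory[address - 1];
    const prevStart = address - prevSize - 3;
    if (memory[prevStart + 1] === 1) {
      start = prevStart;
      size += prevSize + 3; // absorb the block and its bookkeeping
    }
  }

  // The next block starts just past this block's trailing size tag.
  const nextStart = address + memory[address] + 3;
  if (nextStart < memory.length && memory[nextStart + 1] === 1) {
    size += memory[nextStart] + 3;
  }

  // Write the merged block's boundary tags.
  memory[start] = size;
  memory[start + 1] = 1;
  memory[start + 2 + size] = size;
}
```

<p>Freeing a block flanked by free blocks on both sides folds all three into one, which is how the simulation below ends up back at a single 29-byte free block.</p>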
<div class="memory" bytes=32>
<allocator path="/js/allocators/inline.js">
</allocator>
<malloc id=1 size=4></malloc>
<malloc id=2 size=4></malloc>
<free id=1></free>
<free id=2></free>
</div>
<p>Allocators that store bookkeeping information like this alongside allocations
are called "boundary tag allocators."</p>
<blockquote class="haskie">
<img src="/images/haskie-concerned-200px.png" />
<p>
What's stopping a program from modifying the bookkeeping information? Wouldn't
that completely break memory?
</p>
</blockquote>
<p>Surprisingly, nothing truly prevents this. We rely heavily, as an industry, on
the correctness of code. You might have heard of "buffer overrun" or "use after
free" bugs before. These are when a program modifies memory past the end of an
allocated block, or accidentally uses a block of memory after <code>free</code>ing it.
These are indeed catastrophic. They can result in your program crashing
immediately, or crashing several minutes, hours, or even days later. They can
also result in hackers using the bug to gain access to
systems they shouldn't have access to.</p>
<p>We're seeing a rise in popularity of "memory safe" languages, for example Rust.
These languages invest a lot in making sure it's not possible to make these
types of mistake in the first place. Exactly how they do that is outside of
the scope of this article, but if this interests you I highly recommend giving
Rust a try.</p>
<p>You might have also realised that calling <code>free</code> on a pointer that's in the
middle of a block of memory could also have disastrous consequences. Depending
on what values are in memory, the allocator could be tricked into thinking it's
<code>free</code>ing something but what it's really doing is modifying memory it shouldn't
be.</p>
<p>To get around this, some allocators inject "magic" values as part of the
bookkeeping information. They store, say, <code>0x55</code> at <code>address + 2</code>. This would
waste an extra byte of memory per allocation, but would allow them to know when
a mistake has been made. To reduce the impact of this, allocators often disable
this behaviour by default and allow you to enable it only when you're debugging.</p>
<h2 id="playground"><a class="anchor" href="#playground">#</a>
Playground</h2>
<p>If you're keen to take your new-found knowledge and try your hand at writing
your own allocators, you can click <a href="/allocator-playground">here</a> to go to my
allocator playground. You'll be able to write JavaScript code that implements
the <code>malloc</code>/<code>free</code> API and visualise how it works!</p>
<h2 id="conclusion"><a class="anchor" href="#conclusion">#</a>
Conclusion</h2>
<p>We've covered a lot in this post, and if it has left you yearning for more you
won't be disappointed. I've specifically avoided the topics of virtual memory,
<code>brk</code> vs <code>mmap</code>, the role of CPU caches, and the endless tricks real <code>malloc</code>
implementations pull out of their sleeves. There's no shortage of information
about memory allocators on the Internet, and if you've read this far you should
be well-placed to dive into it.</p>
<p>Join the discussion on <a href="https://news.ycombinator.com/item?id=36029087">Hacker News</a>!</p>
<h3 id="acknowledgments"><a class="anchor" href="#acknowledgments">#</a>
Acknowledgments</h3>
<p>Special thanks to the following people:</p>
<ul>
<li><a href="https://chrisdown.name">Chris Down</a> for lending me his extensive knowledge of real-world
memory allocators.</li>
<li><a href="https://zemlan.in/">Anton Verinov</a> for lending me his extensive knowledge of the web,
browser developer tools, and user experience.</li>
<li>Blake Becker, Matt Kaspar, Krista Horn, Jason Peddle, and
<a href="https://joshwcomeau.com">Josh W. Comeau</a> for their insight and constructive
reviews.</li>
</ul>
<!--
## Real-world performance

By making use of boundary tags, we saw that `free` can be made really fast.
No list traversals required, just inspection and manipulation of a few bytes
of bookkeeping information. But `malloc` still has to traverse the list of
all blocks, free or used, to find one that can fit the current request.

How do we make `malloc` fast?

What we're asking here isn't for the fastest possible `malloc` implementation.
We've already seen that, at the very start of the article. The `malloc` that
can't `free`. That's not what we want. We want a `malloc` that gets close to
that speed, without creating a fragmented heap. We want high throughput, low
fragmentation.

There's no one-size-fits-all solution here, so I will list a few ways real-world
allocators try to achieve this.

### Segmenting/binning memory

We touched on this earlier, but a common approach to balancing throughput and
fragmentation is by splitting memory up into segments reserved for allocations
of a specific size. In our example we had 2 segments: 1 for small allocations
and 1 for big allocations.

A `malloc` implementation called `dlmalloc` ("Doug Lea's `malloc`") splits
memory up into 64(!) different size classes that it calls "bins." Each size
class has a linked list of free blocks of memory associated with it. These lists
begin empty, and as memory is `free`d it gets added to the appropriate list.

When there are no readily available free blocks in these bins, memory is
allocated from what `dlmalloc` refers to as "the wilderness." This is just the
free memory available to the program at the very beginning. `dlmalloc` takes
from here when it needs to, but prefers to look in the size class bins first. It
only takes from the wilderness when it can't take from anywhere else.

If memory isn't available in the appropriate bin, but is available in the next
bin up, `dlmalloc` will take a larger block, split it in 2, allocate one, and
put the other in the appropriate bin for later use. Likewise, when `free`ing,
`dlmalloc` will coalesce blocks with their neighbours and return the resulting
block to the largest bin it can. These techniques help reduce fragmentation.

### Caching

Another cool trick `dlmalloc` uses is called the "designated victim." When
`dlmalloc` takes a block from a bin larger than the current allocation, it
caches the remainder block to be used as the preferred location for the next
allocation that does not perfectly match a bin size.

`dlmalloc` also caches whether or not bins have blocks in them using a bitmap it
calls the "binmap." This is a 32 bit value where each bit represents the state
of a given bin. If it's a 1, the bin has blocks ready to use in it. If it's 0,
it doesn't. This makes finding an appropriate bin really, really fast.

`phkmalloc`, the spiritual predecessor to `dlmalloc`, maintains a counter of how
many blocks are free in a given "arena" of memory. `phkmalloc` works on a tiered
system, putting smaller blocks inside of larger blocks, in order to be able to
return the larger blocks back to the operating system when they have been
completely freed. It calls these larger blocks "arenas."

### Locality

I've not touched on this topic at all, but it is very important especially in
more modern `malloc` implementations. Without going into too much detail,
memory access involves a hierarchy of caches. Retrieving a value from memory is
slow, so between your CPU and your memory sit layers of caches. Each layer
closer to the CPU is faster but smaller. These caches can range in size from
100MB at the largest layer to a few dozen kilobytes at the smallest. It's common
for CPUs to have 3 layers.

When you fetch a value from memory, your CPU will actually fetch more than
needed. It does this because it is likely that memory close together is needed
at the same time. This is called "spatial locality." Hard drives do the same
thing.

`malloc` is uniquely positioned to take advantage of this fact. If your `malloc`
implementation intentionally places blocks close to blocks that were `malloc`ed
around the same time, it will increase the chance that a single fetch from
memory will hit multiple blocks of soon-needed memory.

Locality is also an argument in favour of storing your bookkeeping information
separate to allocated memory. The more tightly packed a program's in-use memory
is, the more likely it is to get cached together. Also, if you need to traverse
lists to find free blocks, and your list traversal works using boundary tags,
you will be accessing memory that is unlikely to be used. These accesses get
cached, evicting potentially useful memory in the process.

This is why `dlmalloc` maintains its "binmap" separate from the bins themselves.
You can check the status of all bins with minimal cache disturbance.

### Multithreading

The last topic we're going to talk about is multithreading. It's very common for
computers in 2023 to have many CPU cores, but this wasn't always the case. For
example, when `phkmalloc` was written there was no consideration for
multithreading in the implementation. `dlmalloc` is not thread-safe by default,
but comes with an option to make it thread-safe at a huge performance cost.

One of the first `malloc` implementations to optimise for multithreaded
use-cases was `jemalloc` ("Jason Evans' `malloc`"). Written around 2006,
`jemalloc` makes the observation that if 2 CPU cores try to access the same
area of the CPU cache, they will "fight" over the "ownership" of that area of
the cache.

What does this mean?

CPU cache is split into "lines." These lines are typically 64 bytes in size,
and represent a 64 byte chunk of memory. Each CPU core has its own dedicated set
of caches, so the same 64 byte chunk of memory could potentially be cached on
multiple cores.

When 2 threads running on 2 different cores try to access the same line of
cache, by trying to access the same area of memory and finding it already
cached, it can become a problem. If they both write to it, that will trigger the
cache to be synchronised across CPU cores, and this can be really slow. Without
this synchronisation, the same 64 byte region of memory could appear to hold
different values depending on what core you see it from.

To avoid this, `jemalloc` splits memory up into 2MB chunks, and each chunk can
only be accessed by 1 thread. Giving threads unique ownership of parts of memory
guarantees that you will avoid this slow cache synchronisation. It also gives
your `malloc` implementation thread-safety, because you know that all threads
will be operating on memory they have sole ownership of.
-->
</content>

</entry>


<entry xml:lang="en">
<title>Load Balancing</title>
<published>2023-04-10T00:00:00+00:00</published>
<updated>2023-04-10T00:00:00+00:00</updated>
<author>
<name>Unknown</name>
</author>
<link rel="alternate" href="https://samwho.dev/load-balancing/" type="text/html"/>
<id>https://samwho.dev/load-balancing/</id>

<content type="html"><script type="module" src="/js/load-balancers.js"></script>
<style>
.simulation {
width: 100%;
display: flex;
justify-content: center;
align-items: center;
margin-bottom: 2.5em;
}

.load-balancer {
color: black;
font-weight: bold;
}

.request {
color: #04BF8A;
font-weight: bold;
}

.server {
color: #999999;
font-weight: bold;
}

.dropped {
color: red;
font-weight: bold;
}

.lds-dual-ring {
display: inline-block;
width: 80px;
height: 80px;
}
.lds-dual-ring:after {
content: " ";
display: block;
width: 64px;
height: 64px;
margin: 8px;
border-radius: 50%;
border: 6px solid #000;
border-color: #000 transparent #000 transparent;
animation: lds-dual-ring 1.2s linear infinite;
}
@keyframes lds-dual-ring {
0% {
transform: rotate(0deg);
}
100% {
transform: rotate(360deg);
}
}
</style>
<p>Past a certain point, web applications outgrow a single server deployment.
Companies either want to increase their availability, scalability, or both! To
do this, they deploy their application across multiple servers with a load
balancer in front to distribute incoming requests. Big companies may need
thousands of servers running their web application to handle the load.</p>
<p>In this post we're going to focus on the ways that a single load balancer might
distribute HTTP requests to a set of servers. We'll start from the bottom and
work our way up to modern load balancing algorithms.</p>
<h2 id="visualising-the-problem"><a class="anchor" href="#visualising-the-problem">#</a>
Visualising the problem</h2>
<p>Let's start at the beginning: a single <span class="load-balancer">load
balancer</span> sending <span class="request">requests</span> to a single <span
class="server">server</span>. <span class="request">Requests</span> are being
sent at a rate of 1 request per second (RPS), and each <span
class="request">request</span> reduces in size as the <span
class="server">server</span> processes it.</p>
<div id="1" class="simulation" style="height: 200px">
<div class="lds-dual-ring"></div>
</div>
<p>For a lot of websites, this setup works just fine. Modern <span
class="server">servers</span> are powerful and can handle a lot of <span
class="request">requests</span>. But what happens when they can't keep up?</p>
<div id="2" class="simulation" style="height: 200px">
<div class="lds-dual-ring"></div>
</div>
<p>Here we see that a rate of 3 RPS causes some <span
class="request">requests</span> to get <span class="dropped">dropped</span>. If
a <span class="request">request</span> arrives at the <span
class="server">server</span> while another <span class="request">request</span>
is being processed, the <span class="server">server</span> will <span
class="dropped">drop</span> it. This will result in an error being shown to the
user and is something we want to avoid. We can add another <span
class="server">server</span> to our <span class="load-balancer">load
balancer</span> to fix this.</p>
<div id="3" class="simulation" style="height: 200px">
<div class="lds-dual-ring"></div>
</div>
<p>No more <span class="dropped">dropped</span> <span
class="request">requests</span>! The way our <span class="load-balancer">load
balancer</span> is behaving here, sending a request to each <span
class="server">server</span> in turn, is called "round robin" load balancing.
It's one of the simplest forms of load balancing, and works well when your <span
class="server">servers</span> are all equally powerful and your <span
class="request">requests</span> are all equally expensive.</p>
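<p>A round robin picker can be sketched in a few lines. This is hypothetical code, not the code behind the simulations.</p>

```javascript
// Return a function that hands back servers one at a time, in order,
// wrapping around to the start of the list.
function makeRoundRobin(servers) {
  let next = 0;
  return function pick() {
    const server = servers[next];
    next = (next + 1) % servers.length;
    return server;
  };
}

const pick = makeRoundRobin(["server-1", "server-2", "server-3"]);
// Successive calls cycle: server-1, server-2, server-3, server-1, ...
```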
<div id="4" class="simulation" style="height: 200px">
<div class="lds-dual-ring"></div>
</div>
<h2 id="when-round-robin-doesn-t-cut-it"><a class="anchor" href="#when-round-robin-doesn-t-cut-it">#</a>
When round robin doesn't cut it</h2>
<p>In the real world, it's rare for <span class="server">servers</span> to be
equally powerful and <span class="request">requests</span> to be equally
expensive. Even if you use the exact same <span class="server">server</span>
hardware, performance may differ. Applications may have to service many
different types of <span class="request">requests</span>, and these will likely
have different performance characteristics.</p>
<p>Let's see what happens when we vary <span class="request">request</span> cost.
In the following simulation, <span class="request">requests</span> aren't
equally expensive. You'll be able to see this by some <span
class="request">requests</span> taking longer to shrink than others.</p>
<div id="5" class="simulation" style="height: 200px">
<div class="lds-dual-ring"></div>
</div>
<p>While most <span class="request">requests</span> get served successfully, we do
<span class="dropped">drop</span> some. One of the ways we can mitigate this is
to have a "request queue."</p>
<div id="6" class="simulation" style="height: 250px">
<div class="lds-dual-ring"></div>
</div>
<p>Request queues help us deal with uncertainty, but it's a trade-off. We will
<span class="dropped">drop</span> fewer <span class="request">requests</span>,
but at the cost of some <span class="request">requests</span> having a higher
latency. If you watch the above simulation long enough, you might notice the
<span class="request">requests</span> subtly changing colour. The longer they go
without being served, the more their colour will change. You'll also notice that
thanks to the <span class="request">request</span> cost variance, <span
class="server">servers</span> start to exhibit an imbalance. Queues will get
backed up on <span class="server">servers</span> that get unlucky and have to
serve multiple expensive <span class="request">requests</span> in a row. If
a queue is full, we will <span class="dropped">drop</span> the <span
class="request">request</span>.</p>
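<p>That queueing behaviour can be sketched as a small bounded queue per server. This is hypothetical code, not the simulation's implementation.</p>

```javascript
// A bounded request queue for one server: requests wait their turn,
// and when the queue is full, new requests are dropped instead.
class RequestQueue {
  constructor(capacity) {
    this.capacity = capacity;
    this.items = [];
  }

  // Returns true if the request was queued, false if it was dropped.
  enqueue(request) {
    if (this.items.length >= this.capacity) {
      return false; // queue full: drop the request
    }
    this.items.push(request);
    return true;
  }

  // The server takes the oldest waiting request next.
  dequeue() {
    return this.items.shift();
  }
}
```

<p>Choosing the capacity is the latency trade-off in miniature: a bigger queue drops fewer requests but lets the ones at the back wait longer.</p>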
<p>Everything said above applies equally to <span class="server">servers</span>
that vary in power. In the next simulation we also vary the power of each
<span class="server">server</span>, which is represented visually with a darker
shade of grey.</p>
<div id="7" class="simulation" style="height: 250px">
<div class="lds-dual-ring"></div>
</div>
<p>The <span class="server">servers</span> are given a random power value, but odds
are some are less powerful than others and quickly start to <span
class="dropped">drop</span> <span class="request">requests</span>. At the same
time, the more powerful <span class="server">servers</span> sit idle most of the
time. This scenario shows the key weakness of round robin: variance.</p>
<p>Despite its flaws, however, round robin is still the default HTTP load balancing
method for <a href="https://nginx.org/en/docs/http/load_balancing.html">nginx</a>.</p>
<h2 id="improving-on-round-robin"><a class="anchor" href="#improving-on-round-robin">#</a>
Improving on round robin</h2>
<p>It's possible to tweak round robin to perform better with variance. There's an
algorithm called "weighted round robin" which involves getting humans
to tag each <span class="server">server</span> with a weight that dictates how
many <span class="request">requests</span> to send to it.</p>
<p>In this simulation, we use each <span class="server">server's</span> known power
value as its weight, and we give more powerful <span
class="server">servers</span> more <span class="request">requests</span> as we
loop through them.</p>
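<p>One simple way to sketch weighted round robin is to repeat each server in the rotation as many times as its weight. The weights below are made up; in the simulation they come from each server's power value.</p>

```javascript
// Build a rotation where each server appears `weight` times, then
// round robin over it. A server with weight 2 receives twice the
// requests of a server with weight 1.
function makeWeightedRoundRobin(servers) {
  const schedule = [];
  for (const { name, weight } of servers) {
    for (let i = 0; i < weight; i++) {
      schedule.push(name);
    }
  }
  let next = 0;
  return function pick() {
    const server = schedule[next];
    next = (next + 1) % schedule.length;
    return server;
  };
}
```

<p>Real implementations tend to interleave the schedule rather than sending a weighted server's share in a burst, but the proportions work out the same.</p>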
|
|
2534
|
+
<div id="8" class="simulation" style="height: 250px">
|
|
2535
|
+
<div class="lds-dual-ring"></div>
|
|
2536
|
+
</div>
|
|
2537
|
+
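<p>The weighting scheme described above can be sketched in a few lines. This is an illustrative sketch only, not the post's actual simulation code; the server names and weights are invented.</p>

```python
import itertools

# Weighted round robin: each server appears in the rotation as many
# times as its (human-assigned) weight. Names and weights are made up.
servers = {"server-a": 1, "server-b": 3, "server-c": 2}

# Expand the weights into a flat rotation: a, b, b, b, c, c.
rotation = [name for name, weight in servers.items() for _ in range(weight)]

def weighted_round_robin():
    """Yield the server each incoming request should be sent to."""
    yield from itertools.cycle(rotation)

picker = weighted_round_robin()
first_six = [next(picker) for _ in range(6)]
# One full cycle: server-a once, server-b three times, server-c twice.
```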
<p>While this handles the variance of <span class="server">server</span> power
better than vanilla round robin, we still have <span
class="request">request</span> variance to contend with. In practice, getting
humans to set the weight by hand falls apart quickly. Boiling <span
class="server">server</span> performance down to a single number is hard, and
would require careful load testing with real workloads. This is rarely done, so
another variant of weighted round robin calculates weights dynamically by using
a proxy metric: latency.</p>
<p>It stands to reason that if one <span class="server">server</span> serves
<span class="request">requests</span> 3 times faster than another <span class="server">server</span>, it should receive
3 times more <span class="request">requests</span> than the other <span class="server">server</span>.</p>
<div id="9" class="simulation" style="height: 250px">
<div class="lds-dual-ring"></div>
</div>
<p>I've added text to each <span class="server">server</span> this time that shows
the average latency of the last 3 <span class="request">requests</span> served.
We then decide whether to send 1, 2, or 3 <span class="request">requests</span>
to each <span class="server">server</span> based on the relative differences in
the latencies. The result is very similar to the initial weighted round robin
simulation, but there's no need to specify the weight of each <span
class="server">server</span> up front. This algorithm will also be able to adapt
to changes in <span class="server">server</span> performance over time. This is
called "dynamic weighted round robin."</p>
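<p>Deriving the weights from observed latency can be sketched like this. A minimal sketch under stated assumptions: the latency samples are invented, and the "round to a 1-3 request weight relative to the slowest server" rule is one simple way to turn relative latency into a weight, not the post's exact formula.</p>

```python
from collections import deque

# Track the average latency of each server's last 3 requests.
recent = {
    "server-a": deque([10.0, 12.0, 11.0], maxlen=3),  # fast
    "server-b": deque([30.0, 33.0, 36.0], maxlen=3),  # roughly 3x slower
}

def requests_per_turn(latencies):
    """How many requests to send each server per loop through them."""
    averages = {s: sum(q) / len(q) for s, q in latencies.items()}
    slowest = max(averages.values())
    # A server 3x faster than the slowest gets 3 requests per turn.
    return {s: max(1, round(slowest / avg)) for s, avg in averages.items()}

weights = requests_per_turn(recent)
```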
<p>Let's see how it handles a complex situation, with high variance in both <span
class="server">server</span> power and <span class="request">request</span>
cost. The following simulation uses randomised values, so feel free to refresh
the page a few times to see it adapt to new variants.</p>
<div id="10" class="simulation" style="height: 250px">
<div class="lds-dual-ring"></div>
</div>
<h2 id="moving-away-from-round-robin"><a class="anchor" href="#moving-away-from-round-robin">#</a>
Moving away from round robin</h2>
<p>Dynamic weighted round robin seems to account well for variance in both <span
class="server">server</span> power and <span class="request">request
</span> cost. But what if I told you we could do even better, and with a simpler
algorithm?</p>
<div id="11" class="simulation" style="height: 250px">
<div class="lds-dual-ring"></div>
</div>
<p>This is called "least connections" load balancing.</p>
<p>Because the <span class="load-balancer">load balancer</span> sits between the
<span class="server">server</span> and the user, it can accurately keep track
of how many outstanding <span class="request">requests</span> each <span
class="server">server</span> has. Then when a new <span class="request">
request</span> comes in and it's time to determine where to send it, it knows
which <span class="server">servers</span> have the least work to do and
prioritises those.</p>
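<p>The bookkeeping this requires is tiny, which is part of the algorithm's appeal. A sketch with invented server names, not the post's simulation code:</p>

```python
# Least connections: track in-flight requests per server and send each
# new request to the server with the fewest.
in_flight = {"server-a": 0, "server-b": 0, "server-c": 0}

def send():
    """Pick the least-loaded server and record the outstanding request."""
    server = min(in_flight, key=in_flight.get)
    in_flight[server] += 1
    return server

def finish(server):
    """Called when a server responds; one fewer outstanding request."""
    in_flight[server] -= 1

# Three requests land on three different servers...
assignments = [send() for _ in range(3)]
# ...and once server-a finishes, it's first in line again.
finish("server-a")
next_server = send()
```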
<p>This algorithm performs extremely well regardless of how much variance exists.
It cuts through uncertainty by maintaining an accurate understanding of what
each <span class="server">server</span> is doing. It also has the benefit of
being very simple to implement.</p>
<p>Let's see this in action in a similarly complex simulation, using the same parameters
we gave the dynamic weighted round robin algorithm above. Again, these
parameters are randomised within given ranges, so refresh the page to see new
variants.</p>
<div id="12" class="simulation" style="height: 250px">
<div class="lds-dual-ring"></div>
</div>
<p>While this algorithm is a great balance between simplicity and performance, it's
not immune to <span class="dropped">dropping</span> <span
class="request">requests</span>. However, what you'll notice is that the only
time this algorithm <span class="dropped">drops</span> <span
class="request">requests</span> is when there is literally no more queue space
available. It will make sure all available resources are in use, and that makes
it a great default choice for most workloads.</p>
<h2 id="optimizing-for-latency"><a class="anchor" href="#optimizing-for-latency">#</a>
Optimizing for latency</h2>
<p>Up until now I've been avoiding a crucial part of the discussion: what we're
optimising for. Implicitly, I've been considering <span
class="dropped">dropped</span> <span class="request">requests</span> to be
really bad and seeking to avoid them. This is a nice goal, but it's not the
metric we most want to optimise for in an HTTP <span class="load-balancer">load
balancer</span>.</p>
<p>What we're often more concerned about is latency. This is measured in
milliseconds from the moment a <span class="request">request</span> is created
to the moment it has been served. When we're discussing latency in this context,
it is common to talk about different "percentiles." For example, the 50th
percentile (also called the "median") is defined as the millisecond value for
which 50% of requests are below, and 50% are above.</p>
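<p>That definition can be made concrete with a few lines of code. This is one simple nearest-rank-style computation with invented latencies, shown for illustration; real tools differ in how they interpolate between samples.</p>

```python
# The p-th percentile: the latency at or below which p% of requests fall.
def percentile(latencies_ms, p):
    ordered = sorted(latencies_ms)
    # Index of the sample such that p% of samples sit at or below it.
    index = max(0, int(len(ordered) * p / 100) - 1)
    return ordered[index]

latencies = list(range(1, 101))  # 1ms..100ms, one request each
median = percentile(latencies, 50)  # half the requests were 50ms or less
p99 = percentile(latencies, 99)     # all but the slowest 1% were 99ms or less
```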
<p>I ran 3 simulations with identical parameters for 60 seconds and took a variety
of measurements every second. Each simulation varied only by the load balancing
algorithm used. Let's compare the medians for each of the 3 simulations:</p>
<div id="graph-medians"></div>
<p>You might not have expected it, but round robin has the best median latency. If
we weren't looking at any other data points, we'd miss the full story. Let's
take a look at the 95th and 99th percentiles.</p>
<div id="graph-higher"></div>
<p>Note: there's no colour difference between the different percentiles for each
load balancing algorithm. Higher percentiles will always be higher on the graph.</p>
<p>We see that round robin doesn't perform well in the higher percentiles. How can
it be that round robin has a great median, but bad 95th and 99th percentiles?</p>
<p>In round robin, the state of each <span class="server">server</span> isn't
considered, so you'll get quite a lot of <span class="request">requests</span>
going to <span class="server">servers</span> that are idle. This is how we get
the low 50th percentile. On the flip side, we'll also happily send <span
class="request">requests</span> to <span class="server">servers</span> that are
overloaded, hence the bad 95th and 99th percentiles.</p>
<p>We can take a look at the full data in histogram form:</p>
<div id="histogram-1"></div>
<p>I chose the parameters for these simulations to avoid <span
class="dropped">dropping</span> any <span class="request">requests</span>. This
guarantees we compare the same number of data points for all 3 algorithms.
Let's run the simulations again but with an increased RPS value, designed to
push all of the algorithms past what they can handle. The following is a graph
of cumulative <span class="request">requests</span> <span
class="dropped">dropped</span> over time.</p>
<div id="graph-dropped"></div>
<p>Least connections handles overload much better, but the cost of doing that is
slightly higher 95th and 99th percentile latencies. Depending on your use-case,
this might be a worthwhile trade-off.</p>
<h2 id="one-last-algorithm"><a class="anchor" href="#one-last-algorithm">#</a>
One last algorithm</h2>
<p>If we <em>really</em> want to optimise for latency, we need an algorithm that takes
latency into account. Wouldn't it be great if we could combine the dynamic
weighted round robin algorithm with the least connections algorithm? The latency
of weighted round robin and the resilience of least connections.</p>
<p>Turns out we're not the first people to have this thought. Below is a simulation
using an algorithm called "peak exponentially weighted moving average" (or
PEWMA). It's a long and complex name but hang in there, I'll break down how it
works in a moment.</p>
<div id="13" class="simulation" style="height: 250px">
<div class="lds-dual-ring"></div>
</div>
<p>I've set specific parameters for this simulation that are guaranteed to exhibit
an expected behaviour. If you watch closely, you'll notice that the algorithm
just stops sending <span class="request">requests</span> to the leftmost <span
class="server">server</span> after a while. It does this because it figures out
that all of the other <span class="server">servers</span> are faster, and
there's no need to send <span class="request">requests</span> to the slowest
one. That will just result in <span class="request">requests</span> with a
higher latency.</p>
<p>So how does it do this? It combines techniques from dynamic weighted round robin
with techniques from least connections, and sprinkles a little bit of its own
magic on top.</p>
<p>For each <span class="server">server</span>, the algorithm keeps track of the
latency from the last N <span class="request">requests</span>. Instead of using
this to calculate an average, it sums the values but with an exponentially
decreasing scale factor. This results in a value where the older a latency is,
the less it contributes to the sum. Recent <span class="request">requests</span>
influence the calculation more than old ones.</p>
<p>That value is then taken and multiplied by the number of open connections to the
<span class="server">server</span> and the result is the value we use to choose
which <span class="server">server</span> to send the next <span
class="request">request</span> to. Lower is better.</p>
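<p>The scoring described above can be sketched directly. The decay factor and latency samples here are invented for illustration, and real PEWMA implementations tune this differently; the point is only the shape of the calculation: decayed latency sum times open connections, lower score wins.</p>

```python
# Exponentially decay the last N latencies (newest first) so recent
# requests dominate, then multiply by the server's open connections.
def score(latencies_newest_first, open_connections, decay=0.5):
    weighted = sum(
        latency * (decay ** age)
        for age, latency in enumerate(latencies_newest_first)
    )
    return weighted * open_connections

# A slow-but-idle server vs a fast-but-busy one.
slow_idle = score([90.0, 100.0, 95.0], open_connections=1)
fast_busy = score([10.0, 12.0, 11.0], open_connections=4)
# The fast server wins despite having more open connections.
```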
<p>So how does it compare? First let's take a look at the 50th, 95th, and 99th
percentiles when compared against the least connections data from earlier.</p>
<div id="pewma-graph"></div>
<p>We see a marked improvement across the board! It's far more pronounced at the
higher percentiles, but consistently present for the median as well. Here we
can see the same data in histogram form.</p>
<div id="pewma-histogram"></div>
<p>How about <span class="dropped">dropped</span> <span
class="request">requests</span>?</p>
<div id="pewma-dropped"></div>
<p>It starts out performing better, but over time performs worse than least
connections. This makes sense. PEWMA is opportunistic in that it tries to get
the best latency, and this means it may sometimes leave a <span class="server">
server</span> less than fully loaded.</p>
<p>I want to add here that PEWMA has a lot of parameters that can be tweaked. The
implementation I wrote for this post uses a configuration that seemed to work
well for the situations I tested it in, but further tweaking could get you
better results vs least connections. This is one of the downsides of PEWMA vs
least connections: extra complexity.</p>
<h2 id="conclusion"><a class="anchor" href="#conclusion">#</a>
Conclusion</h2>
<p>I spent a long time on this post. It was difficult to balance realism against
ease of understanding, but I feel good about where I landed. I'm hopeful that
being able to see how these complex systems behave in practice, in ideal and
less-than-ideal scenarios, helps you grow an intuitive understanding of when
they would best apply to your workloads.</p>
<p><strong>Obligatory disclaimer</strong>: You must always benchmark your own workloads over
taking advice from the Internet as gospel. My simulations here ignore some real
life constraints (server slow start, network latency), and are set up to display
specific properties of each algorithm. They aren't realistic benchmarks to be
taken at face value.</p>
<p>To round this out, I leave you with a version of the simulation that lets you
tweak most of the parameters in real time. Have fun!</p>
<p><strong>EDIT</strong>: <em>Thanks to everyone who participated in the discussions on
<a href="https://news.ycombinator.com/item?id=35588797">Hacker News</a>,
<a href="https://twitter.com/samwhoo/status/1645429789107318789?s=20">Twitter</a> and
<a href="https://lobste.rs/s/kydugs/load_balancing">Lobste.rs</a>!</em></p>
<p><em>You all had a tonne of great questions and I tried to answer all of them.
Some of the common themes were about missing things, either algorithms (like
"power of 2 choices") or downsides of algorithms covered (like how "least
connections" handles errors from servers).</em></p>
<p><em>I tried to strike a balance between post length and complexity of the
simulations. I'm quite happy with where I landed, but like you I also wish I
could have covered more. I'd love to see people taking inspiration from this
and covering more topics in this space in a visual way. Please ping me if you
do!</em></p>
<p><em>The other common theme was "how did you make this?" I used
<a href="https://pixijs.com/">PixiJS</a> and I'm really happy with how it turned out. It's
my first time using this library and it was quite easy to get to grips with.
If writing visual explanations like this is something you're interested in,
I recommend it!</em></p>
<h2 id="playground"><a class="anchor" href="#playground">#</a>
Playground</h2>
<div id="fin" class="simulation" style="height: 450px; margin-top: 20px">
<div class="lds-dual-ring"></div>
</div>
</content>

</entry>


<entry xml:lang="en">
<title>Practical Problems with Auto-Increment</title>
<published>2023-03-25T00:00:00+00:00</published>
<updated>2023-03-25T00:00:00+00:00</updated>
<author>
<name>Unknown</name>
</author>
<link rel="alternate" href="https://samwho.dev/blog/practical-problems-with-auto-increment/" type="text/html"/>
<id>https://samwho.dev/blog/practical-problems-with-auto-increment/</id>

<content type="html">&lt;p&gt;In this post I'm going to demonstrate 2 reasons I will be avoiding
auto-increment fields in Postgres and MySQL in future. I'm going to prefer using
UUID fields unless I have a <em>very</em> good reason not to.</p>
<h2 id="mysql-8-0-auto-increment-id-re-use"><a class="anchor" href="#mysql-8-0-auto-increment-id-re-use">#</a>
MySQL &lt;8.0 auto-increment ID re-use</h2>
<p>If you're running an older version of MySQL, it's possible for auto-incrementing
IDs to get re-used. Let's see this in action.</p>
<pre data-lang="txt" style="background-color:#2e3440;color:#d8dee9;" class="language-txt "><code class="language-txt" data-lang="txt"><span>$ docker volume create mysql-data
</span><span>$ docker run --platform linux/amd64 -e MYSQL_ROOT_PASSWORD=my-secret-pw -p 3306:3306 -v mysql-data:/var/lib/mysql mysql:5.7
</span></code></pre>
<p>This gets us a Docker container of MySQL 5.7 running, attached to a volume that
will persist the data between runs of this container. Next let's get a simple
schema we can work with:</p>
<pre data-lang="txt" style="background-color:#2e3440;color:#d8dee9;" class="language-txt "><code class="language-txt" data-lang="txt"><span>$ docker run -it --rm --network host --platform linux/amd64 mysql:5.7 mysql -h 127.0.0.1 -P 3306 -u root -p
</span><span>mysql&gt; CREATE DATABASE my_database;
</span><span>Query OK, 1 row affected (0.01 sec)
</span><span>
</span><span>mysql&gt; USE my_database;
</span><span>Database changed
</span><span>mysql&gt; CREATE TABLE my_table (
</span><span> -&gt; ID INT AUTO_INCREMENT PRIMARY KEY
</span><span> -&gt; );
</span><span>Query OK, 0 rows affected (0.02 sec)
</span></code></pre>
<p>Now let's insert a couple of rows.</p>
<pre data-lang="txt" style="background-color:#2e3440;color:#d8dee9;" class="language-txt "><code class="language-txt" data-lang="txt"><span>mysql&gt; INSERT INTO my_table () VALUES ();
</span><span>Query OK, 1 row affected (0.03 sec)
</span><span>
</span><span>mysql&gt; INSERT INTO my_table () VALUES ();
</span><span>Query OK, 1 row affected (0.01 sec)
</span><span>
</span><span>mysql&gt; INSERT INTO my_table () VALUES ();
</span><span>Query OK, 1 row affected (0.01 sec)
</span><span>
</span><span>mysql&gt; SELECT * FROM my_table;
</span><span>+----+
</span><span>| ID |
</span><span>+----+
</span><span>| 1 |
</span><span>| 2 |
</span><span>| 3 |
</span><span>+----+
</span><span>3 rows in set (0.01 sec)
</span></code></pre>
<p>So far so good. We can restart the MySQL server and run the same SELECT
statement again and get the same result.</p>
<p>Let's delete a row.</p>
<pre data-lang="txt" style="background-color:#2e3440;color:#d8dee9;" class="language-txt "><code class="language-txt" data-lang="txt"><span>mysql&gt; DELETE FROM my_table WHERE ID=3;
</span><span>Query OK, 1 row affected (0.03 sec)
</span><span>
</span><span>mysql&gt; SELECT * FROM my_table;
</span><span>+----+
</span><span>| ID |
</span><span>+----+
</span><span>| 1 |
</span><span>| 2 |
</span><span>+----+
</span><span>2 rows in set (0.00 sec)
</span></code></pre>
<p>Let's insert a new row to make sure the ID 3 doesn't get reused.</p>
<pre data-lang="txt" style="background-color:#2e3440;color:#d8dee9;" class="language-txt "><code class="language-txt" data-lang="txt"><span>mysql&gt; INSERT INTO my_table () VALUES ();
</span><span>Query OK, 1 row affected (0.02 sec)
</span><span>
</span><span>mysql&gt; SELECT * FROM my_table;
</span><span>+----+
</span><span>| ID |
</span><span>+----+
</span><span>| 1 |
</span><span>| 2 |
</span><span>| 4 |
</span><span>+----+
</span><span>3 rows in set (0.00 sec)
</span></code></pre>
<p>Perfect. Let's delete that latest row, restart the server, and then insert
a new row.</p>
<pre data-lang="txt" style="background-color:#2e3440;color:#d8dee9;" class="language-txt "><code class="language-txt" data-lang="txt"><span>mysql&gt; DELETE FROM my_table WHERE ID=4;
</span><span>Query OK, 1 row affected (0.01 sec)
</span><span>
</span><span>mysql&gt; SELECT * FROM my_table;
</span><span>ERROR 2013 (HY000): Lost connection to MySQL server during query
</span><span>
</span><span>$ docker run -it --rm --network host --platform linux/amd64 mysql:5.7 mysql -h 127.0.0.1 -P 3306 -u root -p
</span><span>
</span><span>mysql&gt; USE my_database;
</span><span>Database changed
</span><span>
</span><span>mysql&gt; SELECT * FROM my_table;
</span><span>+----+
</span><span>| ID |
</span><span>+----+
</span><span>| 1 |
</span><span>| 2 |
</span><span>+----+
</span><span>2 rows in set (0.00 sec)
</span><span>
</span><span>mysql&gt; INSERT INTO my_table () VALUES ();
</span><span>Query OK, 1 row affected (0.03 sec)
</span><span>
</span><span>mysql&gt; SELECT * FROM my_table;
</span><span>+----+
</span><span>| ID |
</span><span>+----+
</span><span>| 1 |
</span><span>| 2 |
</span><span>| 3 |
</span><span>+----+
</span><span>3 rows in set (0.00 sec)
</span></code></pre>
<p>Eep. MySQL has re-used the ID 3. This is because of the way that auto-increment
works in InnoDB: on server restart, it figures out the next ID to use
by effectively running this query:</p>
<pre data-lang="SQL" style="background-color:#2e3440;color:#d8dee9;" class="language-SQL "><code class="language-SQL" data-lang="SQL"><span style="color:#81a1c1;">SELECT </span><span style="color:#88c0d0;">MAX</span><span>(ID) </span><span style="color:#81a1c1;">FROM</span><span> my_table;
</span></code></pre>
<p>If you had deleted the most recent records from the table just before restart,
IDs that had been used will be re-used when the server comes back up.</p>
<p>In theory, this <em>shouldn't</em> cause you trouble. Best practice dictates that
you shouldn't be using IDs from database tables outside of that table unless
it's some foreign key field, and you certainly wouldn't leak that ID out of
your system, right?</p>
<p>In practice, this stuff happens and can cause devastatingly subtle bugs. MySQL
8.0 changed this behaviour by storing the auto-increment value on disk in a way
that persists across restarts.</p>
<h2 id="postgres-sequence-values-don-t-get-replicated"><a class="anchor" href="#postgres-sequence-values-don-t-get-replicated">#</a>
|
|
2875
|
+
Postgres sequence values don't get replicated</h2>
|
|
2876
|
+
<p>Like MySQL 8.0, Postgres stores auto-increment values on disk. It does this in
|
|
2877
|
+
a schema object called a "sequence." When you create an auto-incrementing
|
|
2878
|
+
field in Postgres, behind the scenes a sequence will be created to back that
|
|
2879
|
+
field and durably keep track of what the next value should be.</p>
|
|
2880
|
+
<p>Let's take a look at that in practice.</p>
|
|
2881
|
+
<pre data-lang="txt" style="background-color:#2e3440;color:#d8dee9;" class="language-txt "><code class="language-txt" data-lang="txt"><span>$ docker volume create postgres-14-data
|
|
2882
|
+
</span><span>$ docker run --network host -e POSTGRES_PASSWORD=my-secret-pw -v postgres-14-data:/var/lib/postgresql/data -p postgres:14
|
|
2883
|
+
</span></code></pre>
|
|
2884
|
+
<p>With Postgres up and running, let's go ahead and create our table:</p>
|
|
2885
|
+
<pre data-lang="txt" style="background-color:#2e3440;color:#d8dee9;" class="language-txt "><code class="language-txt" data-lang="txt"><span>$ docker run -it --rm --network host postgres:14 psql -h 127.0.0.1 -U postgres
|
|
2886
|
+
</span><span>postgres=# CREATE TABLE my_table (id SERIAL PRIMARY KEY);
|
|
2887
|
+
</span><span>CREATE TABLE
|
|
2888
|
+
</span></code></pre>
|
|
2889
|
+
<p>And insert a few rows:</p>
|
|
2890
|
+
<pre data-lang="txt" style="background-color:#2e3440;color:#d8dee9;" class="language-txt "><code class="language-txt" data-lang="txt"><span>postgres=# INSERT INTO my_table DEFAULT VALUES;
|
|
2891
|
+
</span><span>INSERT 0 1
|
|
2892
|
+
</span><span>postgres=# INSERT INTO my_table DEFAULT VALUES;
|
|
2893
|
+
</span><span>INSERT 0 1
|
|
2894
|
+
</span><span>postgres=# INSERT INTO my_table DEFAULT VALUES;
|
|
2895
|
+
</span><span>INSERT 0 1
|
|
2896
|
+
</span><span>postgres=# SELECT * FROM my_table;
|
|
2897
|
+
</span><span> id
|
|
2898
|
+
</span><span>----
|
|
2899
|
+
</span><span> 1
|
|
2900
|
+
</span><span> 2
|
|
2901
|
+
</span><span> 3
|
|
2902
|
+
</span><span>(3 rows)
|
|
2903
|
+
</span></code></pre>
|
|
2904
|
+
<p>So far so good. Let's take a look at the table:</p>
|
|
2905
|
+
<pre data-lang="txt" style="background-color:#2e3440;color:#d8dee9;" class="language-txt "><code class="language-txt" data-lang="txt"><span>postgres=# \d my_table
|
|
2906
|
+
</span><span> Table &quot;public.my_table&quot;
|
|
2907
|
+
</span><span> Column | Type | Collation | Nullable | Default
|
|
2908
|
+
</span><span>--------+---------+-----------+----------+--------------------------------------
|
|
2909
|
+
</span><span> id | integer | | not null | nextval(&#39;my_table_id_seq&#39;::regclass)
|
|
2910
|
+
</span><span>Indexes:
|
|
2911
|
+
</span><span> &quot;my_table_pkey&quot; PRIMARY KEY, btree (id)
|
|
2912
|
+
</span></code></pre>
|
|
2913
|
+
<p>This output tells us that the default value for our <code>id</code> field is the <code>nextval</code>
|
|
2914
|
+
of <code>my_table_id_seq</code>. Let's take a look at <code>my_table_id_seq</code>:</p>
|
|
2915
|
+
<pre data-lang="txt" style="background-color:#2e3440;color:#d8dee9;" class="language-txt "><code class="language-txt" data-lang="txt"><span>postgres=# \d my_table_id_seq
|
|
2916
|
+
</span><span> Sequence &quot;public.my_table_id_seq&quot;
|
|
2917
|
+
</span><span> Type | Start | Minimum | Maximum | Increment | Cycles? | Cache
|
|
2918
|
+
</span><span>---------+-------+---------+------------+-----------+---------+-------
|
|
2919
|
+
</span><span> integer | 1 | 1 | 2147483647 | 1 | no | 1
|
|
2920
|
+
</span><span>Owned by: public.my_table.id
|
|
2921
|
+
</span><span>
|
|
2922
|
+
</span><span>postgres=# SELECT currval(&#39;my_table_id_seq&#39;);
|
|
2923
|
+
</span><span> currval
|
|
2924
|
+
</span><span>---------
|
|
2925
|
+
</span><span> 3
|
|
2926
|
+
</span><span>(1 row)
|
|
2927
|
+
</span></code></pre>
|
|
2928
|
+
<p>Neat, we have a bonafide object in Postgres that's keeping track of the
|
|
2929
|
+
auto-incrementing ID value. If we were to repeat what we did in MySQL, delete
|
|
2930
|
+
some rows and restart, we wouldn't have the same problem here. <code>my_table_id_seq</code>
|
|
2931
|
+
is saved to disk and doesn't lose its place.</p>
|
|
2932
|
+
<p>Or does it?</p>
|
|
2933
|
+
<p>If you want to update Postgres to a new major version, the way you typically
|
|
2934
|
+
accomplish that is by creating a new Postgres instance on the version you want
|
|
2935
|
+
to upgrade to, logically replicate from the old instance to the new one, and
|
|
2936
|
+
then switch your application to talk to the new one.</p>
|
|
2937
|
+
<p>First we need to restart our Postgres 14 with some new configuration to allow
|
|
2938
|
+
logical replication:</p>
|
|
2939
|
+
<pre data-lang="txt" style="background-color:#2e3440;color:#d8dee9;" class="language-txt "><code class="language-txt" data-lang="txt"><span>$ docker run --network host -e POSTGRES_PASSWORD=my-secret-pw -v postgres-14-data:/var/lib/postgresql/data -p postgres:14 -c wal_level=logical
|
|
2940
|
+
</span></code></pre>
|
|
2941
|
+
<p>Now let's get Postgres 15 up and running:</p>
|
|
2942
|
+
<pre data-lang="txt" style="background-color:#2e3440;color:#d8dee9;" class="language-txt "><code class="language-txt" data-lang="txt"><span>$ docker volume create postgres-15-data
|
|
2943
|
+
</span><span>$ docker run --network host -e POSTGRES_PASSWORD=my-secret-pw -v postgres-15-data:/var/lib/postgresql/data postgres:15 postgres:14 -c wal_level=logical -p 5431
|
|
2944
|
+
</span></code></pre>
|
|
2945
|
+
<p>Next up, we create a "publication" on our Postgres 14 instance:</p>
|
|
2946
|
+
<pre data-lang="txt" style="background-color:#2e3440;color:#d8dee9;" class="language-txt "><code class="language-txt" data-lang="txt"><span>postgres=# CREATE PUBLICATION my_publication FOR ALL TABLES;
|
|
2947
|
+
</span><span>CREATE PUBLICATION
|
|
2948
|
+
</span></code></pre>
|
|
2949
|
+
<p>Then we create our "my_table" table and a "subscription" on our Postgres 15
|
|
2950
|
+
instance:</p>
|
|
2951
|
+
<pre data-lang="txt" style="background-color:#2e3440;color:#d8dee9;" class="language-txt "><code class="language-txt" data-lang="txt"><span>postgres=# CREATE TABLE my_table (id SERIAL PRIMARY KEY);
|
|
2952
|
+
</span><span>CREATE TABLE
|
|
2953
|
+
</span><span>postgres=# CREATE SUBSCRIPTION my_subscription CONNECTION &#39;host=127.0.0.1 port=5432 dbname=postgres user=postgres password=my-secret-pw&#39; PUBLICATION my_publication;
|
|
2954
|
+
</span><span>NOTICE: created replication slot &quot;my_subscription&quot; on publisher
|
|
2955
|
+
</span><span>CREATE SUBSCRIPTION
|
|
2956
|
+
</span></code></pre>
|
|
2957
|
+
<p>After doing this, we should see data syncing between old and new instances:</p>
|
|
2958
|
+
<pre data-lang="txt" style="background-color:#2e3440;color:#d8dee9;" class="language-txt "><code class="language-txt" data-lang="txt"><span>$ docker run -it --rm --network host postgres:15 psql -h 127.0.0.1 -U postgres -p 5432 -c &quot;SELECT * FROM my_table&quot;
|
|
2959
|
+
</span><span> id
|
|
2960
|
+
</span><span>----
|
|
2961
|
+
</span><span> 1
|
|
2962
|
+
</span><span> 2
|
|
2963
|
+
</span><span> 3
|
|
2964
|
+
</span><span>(3 rows)
|
|
2965
|
+
</span><span>
|
|
2966
|
+
</span><span>$ docker run -it --rm --network host postgres:15 psql -h 127.0.0.1 -U postgres -p 5431 -c &quot;SELECT * FROM my_table&quot;
|
|
2967
|
+
</span><span> id
|
|
2968
|
+
</span><span>----
|
|
2969
|
+
</span><span> 1
|
|
2970
|
+
</span><span> 2
|
|
2971
|
+
</span><span> 3
|
|
2972
|
+
</span><span>(3 rows)
|
|
2973
|
+
</span><span>
|
|
2974
|
+
</span><span>$ docker run -it --rm --network host postgres:15 psql -h 127.0.0.1 -U postgres -p 5432 -c &quot;INSERT INTO my_table DEFAULT VALUES&quot;
|
|
2975
|
+
</span><span>INSERT 0 1
|
|
2976
|
+
</span><span>
|
|
2977
|
+
</span><span>$ docker run -it --rm --network host postgres:15 psql -h 127.0.0.1 -U postgres -p 5431 -c &quot;SELECT * FROM my_table&quot;
</span><span> id
</span><span>----
</span><span> 1
</span><span> 2
</span><span> 3
</span><span> 4
</span><span>(4 rows)
</span></code></pre>
<p>So what's the problem?</p>
<p>Well...</p>
<pre data-lang="txt" style="background-color:#2e3440;color:#d8dee9;" class="language-txt "><code class="language-txt" data-lang="txt"><span>$ docker run -it --rm --network host postgres:15 psql -h 127.0.0.1 -U postgres -p 5432 -c &quot;SELECT nextval(&#39;my_table_id_seq&#39;)&quot;
</span><span> nextval
</span><span>---------
</span><span> 5
</span><span>(1 row)
</span><span>
</span><span>$ docker run -it --rm --network host postgres:15 psql -h 127.0.0.1 -U postgres -p 5431 -c &quot;SELECT nextval(&#39;my_table_id_seq&#39;)&quot;
</span><span> nextval
</span><span>---------
</span><span> 1
</span><span>(1 row)
</span></code></pre>
<p>The sequence value is not replicated. If we try to insert a row into Postgres
15 we get this:</p>
<pre data-lang="txt" style="background-color:#2e3440;color:#d8dee9;" class="language-txt "><code class="language-txt" data-lang="txt"><span>$ docker run -it --rm --network host postgres:15 psql -h 127.0.0.1 -U postgres -p 5431 -c &quot;INSERT INTO my_table DEFAULT VALUES&quot;
</span><span>ERROR: duplicate key value violates unique constraint &quot;my_table_pkey&quot;
</span><span>DETAIL: Key (id)=(2) already exists.
</span></code></pre>
<p>Note: It's tried to insert id=2 here because when we called <code>nextval</code> earlier, it
modified the sequence.</p>
<p>This can make major Postgres version updates very tricky if you rely heavily on
auto-incrementing ID fields. You need to modify the sequence values manually to
values you know for a fact won't be reached during the process of the upgrade,
and then you likely need to disable writes during the upgrade depending on
your workload.</p>
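<p>As a sketch of what that manual bump might look like (the sequence name here is the default one Postgres generated for us; the offset of 1,000,000 is an arbitrary value I'm assuming is safely out of reach for your workload):</p>
<pre data-lang="txt" style="background-color:#2e3440;color:#d8dee9;" class="language-txt "><code class="language-txt" data-lang="txt"><span>$ docker run -it --rm --network host postgres:15 psql -h 127.0.0.1 -U postgres -p 5431 -c &quot;SELECT setval(&#39;my_table_id_seq&#39;, 1000000)&quot;
</span><span>  setval
</span><span>---------
</span><span> 1000000
</span><span>(1 row)
</span></code></pre>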
<h2 id="conclusion"><a class="anchor" href="#conclusion">#</a>
Conclusion</h2>
<p>You can avoid all of the above pain by using UUID fields instead of
auto-incrementing integers. These have the benefit of being unpredictable and
not leaking information about the cardinality of the underlying table if you do end
up using them outside of the table (which you shouldn't).</p>
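<p>For example (a sketch with a made-up table name; <code>gen_random_uuid()</code> is built in from Postgres 13 onwards, and older versions can get it from the <code>pgcrypto</code> extension):</p>
<pre data-lang="txt" style="background-color:#2e3440;color:#d8dee9;" class="language-txt "><code class="language-txt" data-lang="txt"><span>postgres=# CREATE TABLE my_uuid_table (id UUID PRIMARY KEY DEFAULT gen_random_uuid());
</span><span>CREATE TABLE
</span></code></pre>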
<p>Thanks to <a href="https://incident.io/blog/one-two-skip-a-few">this article</a> from the
wonderful folks at <a href="https://incident.io">Incident.io</a>, I am now aware of the <a href="https://en.wikipedia.org/wiki/German_tank_problem">German tank problem</a>.
Well worth reading both the linked article, and the Wikipedia page, for more
reasons not to use auto-increment ID fields.</p>
</content>
</entry>
<entry xml:lang="en">
<title>Getting an Autism Diagnosis</title>
<published>2023-03-02T00:00:00+00:00</published>
<updated>2023-03-02T00:00:00+00:00</updated>
<author>
<name>Unknown</name>
</author>
<link rel="alternate" href="https://samwho.dev/blog/getting-an-autism-diagnosis/" type="text/html"/>
<id>https://samwho.dev/blog/getting-an-autism-diagnosis/</id>
<content type="html"><p>On the 3rd of March 2022, we received a letter informing us that our eldest son,
Max, has Autism Spectrum Disorder. The letter was the end result of a long
process. I’m going to talk about that process from start to finish, in as much
detail as I can.</p>
<p>This post would not have been possible without my wife's dedication to our 2
children, her persistence in the face of long odds, and her diligent note
taking. Sophie, I love you.</p>
<p><img src="/images/max-autism-diagnosis.jpeg" alt="A photograph of a letter explaining that our son, Max, has Autism Spectrum Disorder and Speech and Language difficulties" /></p>
<h2 id="prologue"><a class="anchor" href="#prologue">#</a>
Prologue</h2>
<p>This post spiritually follows on from <a href="/blog/having-a-baby/">this one</a>. I ended
that post by saying I’d like to write another post about the first months of
parenthood. That never happened, because the first months of parenthood are an
extreme test of patience and resolve.</p>
<p>Being new parents, we didn’t know what we were doing. Max was irritable,
difficult to get to sleep, and dropped down to the 9th percentile for weight.
This last piece of information was a shock to us, and to our health visitor, and
led to the revelation that Sophie wasn’t producing enough breastmilk. Nobody’s
fault, just the way it was.</p>
<p>To tell the truth, this was a relief. After we switched to formula feeding,
Max’s temperament changed almost overnight. He was content, he slept better, I
was able to help with feeding, and Sophie’s nipples were able to heal.</p>
<p>Max was 9 months old when the UK went into its first full lockdown of the
COVID-19 pandemic on March 26th 2020. The week before the lockdown was
announced, we had visited a local nursery and been given forms to fill in to
confirm Max’s attendance. We were excited to get him spending time with kids his
own age, and we were looking forward to getting some time off from parenting.</p>
<p>Instead, we all had to stay home 24/7 by law. I was in the fortunate position
to already be working from home, and working in an industry that was relatively
unaffected by the pandemic. But none of this helped Max’s social development.</p>
<p>I tell you all of this because it’s relevant to Max’s diagnosis. While we got
his diagnosis earlier than most, a fact we are ever grateful for, it did take
longer than it would have had there been no pandemic. We attributed a lot of his
behaviour to having lived most of his life in lockdown.</p>
<h2 id="how-we-realised-max-was-different"><a class="anchor" href="#how-we-realised-max-was-different">#</a>
How we realised Max was different</h2>
<p>Max had his first “settling in” session at nursery on February 8th 2021. These
are short sessions, often only a few hours long, designed to ease your child in
to the nursery setting. Max’s first settling in session didn’t go very well, a
fact that when taken in isolation isn't unusual. He spent the majority of the
time screaming, and we ended up cutting it short.</p>
<p>Despite this, Sophie and I settled into a good rhythm with nursery. Max went
there twice a week. It was a good balance between cost, us getting time off, and
Max getting time with his peers. But Max wasn’t taking to it, and one evening a
member of staff took Sophie to one side and said Max “isn’t where we expect him
to be developmentally.”</p>
<p>It's hard to describe how this made me feel. Is this our fault? Are we not
parenting well enough?</p>
<p>Nursery's concerns were that Max screams a lot, can’t follow instructions, and
his speech was less developed than his peers. They suggested it could be Max’s
hearing, and they recommended we talk to our GP to set up a hearing test. We
called our GP and had a frustrating conversation in which he asked us if <em>we</em>
think Max has problems hearing. We explained that we had been recommended to
get a hearing test by his nursery. But did <em>we</em> think he had problems hearing?
I didn’t think Max had any problems hearing, but I didn’t want to say that
because I am not a medical professional.</p>
<p>Nursery also recommended we contact our health visitor to ask for a
developmental review. In our area, kids used to go through a developmental
review at 2 years old. This changed in recent years, and it’s done at 3 years
old now. The review is made up of questions that gauge things your child can and
cannot do. At the end, you get a score. If you want to see what’s on it, you can
search “ASQ:3 24 months” and you’ll find loads of PDFs, all very similar.</p>
<p>Max didn’t do well at this, we had to say that he wasn’t able to do most of the
things they asked about. Because of his low score, we did another questionnaire
called the ASQ:SE:2. This one focuses on social and emotional development.
Again, Max scored low. It was the result of these two tests that led our health
visitor to refer us to our local council’s special educational needs and
disability (SEND) team. This happened on the 7th May 2021.</p>
<p>This referral included appointments with a paediatrician (which we were told
would take a few months, but actually ended up taking almost a full year),
speech and language therapist, and the SEND team themselves. The 7th of May was
the first point at which it felt “serious,” and we started to suspect he may be
autistic.</p>
<h2 id="jumping-through-all-the-hoops"><a class="anchor" href="#jumping-through-all-the-hoops">#</a>
Jumping through all the hoops</h2>
<p>On the 12th of August we had our appointment with the SEND team. We’ll call her
Janet. It took place at nursery, and the day before it Janet had been at nursery
to spend time with Max and observe his behaviour. I remember messaging my boss
saying I need to be away from work for “an hour or so.” The meeting lasted 4
hours. It was obvious that Janet had spent a lot of time with Max, and had taken
detailed notes.</p>
<p>It was in this meeting that we asked “do you think Max is autistic?” Janet said
yes, he probably is. We had asked other people this same question, because it
had been on our mind since the ASQ, but everybody had been cagey about it. “Oh I
couldn’t say”, “I’m not qualified to make that diagnosis”, etc. We appreciated
how forthcoming Janet was with us, and the risk she took personally being open
about it. She confided in us later that a lot of parents don't like hearing
that their child might be autistic, so she wasn’t surprised that most people
didn’t want to say.</p>
<p>After this meeting, Janet referred us to the Early Years SEND panel. It was
decided by them that Max had special needs. It’s important to note at this point
that we didn’t have an autism diagnosis. Help is based on need, not diagnosis.
Autism or not, Max needed help with his development and our local council would
give us that help regardless. Janet even went as far as to say that the
diagnosis is irrelevant to the SEND team, he’ll get the help he needs based on
what they observe.</p>
<p>On the 26th of August 2021 we had our first speech and language therapy (SLT)
appointment. This was an introductory session, Max played with some toy cars
while we spoke about his behaviours. One of the things we took away from this
appointment was to put Max's toys into clear plastic containers that he would
need to ask us to open for him. This helps to cement the need for communication.
We still do that to this day, and we do believe it has helped.</p>
<p>On the 13th of September 2021, our special needs practitioner got in touch with
us for the first time. We’ll call her Fay. She arranged “play sessions” with
Max, these happened every few weeks from 22nd of September 2021 to 23rd of
September 2022. In these sessions, Fay presents Max with toys designed to test
him in different ways. Some of them require him to match colours, some of them
shapes, some of them are things you play with alongside another person, and some are
toys you aren’t meant to touch at all. All of these test how he reacts to
situations, how focused he is, how well he appreciates and accepts playing with
others.</p>
<p>Something those play sessions taught me is that playing with children is a
skill. Fay was able to get Max to play in ways we would have said were
impossible without seeing it for ourselves. She was able to get him to do things
when she said so, and more crucially to not do things he obviously wanted to do.
He responded to her extremely well, and it was a joy to watch her work with him.</p>
<p>We had heard it previously from Janet, but Fay confirmed it: Max has no
difficulty learning. The way he learns is different to most other kids, though,
and will benefit from a more tailored approach. Looking back, it is probably
around this time we started forming our opinion that Max should attend a special
needs school.</p>
<p>On the 30th of September 2021 we had Max's first hearing test. The way that they
wanted to test Max's hearing was with a set of stacking cups. The doctor would
demonstrate making a noise, then putting a cup onto the stack. Then she would
make the noise again and add another cup. Then she would try and get the child
to do the same. Max, however, wasn't at the level of understanding required to
complete this test.</p>
<p>The backup test that she had involved a shelf of toys. The shelf was a grid,
like a set of IKEA Kallax shelves, and in each square was a toy that could make
a noise. The doctor would trigger each toy to make a noise and the idea was for
Max to look at the toy that made the noise. Unfortunately, Max was terrified of
the toys and screamed uncontrollably upon seeing them.</p>
<p>After this, the test was rescheduled for January 10th 2022. This time they tried
to have him listen to a cartoon on the television while they put a sensor in his
ear to take some measurements. He refused to let them put the device in his ear
long enough to get any readings. They tried the first set of tests again, but he
reacted the same way he did before. A third test was scheduled on February 3rd
2022.</p>
<p>This time they wanted to try and do the test while Max was asleep, but we
weren't able to get Max to sleep at the time of the appointment. However, to
everyone's surprise, he went in to the doctor's office and did the stacked cup
test immediately without prompting. He passed with flying colours, ruling out
that hearing was causing his language difficulties. To this day we don't know
what changed.</p>
<p>On the 17th of February 2022, 6 months after the first appointment, we had our
second SLT appointment. We had to chase them up to get this to happen, as we had
not heard back from them. In this appointment we learned about "transition
objects", objects to help children go from one activity to another. We used
bubbles to help get Max to get in the bath, and a toy game controller to help
get him in to the car. He still uses the toy controller today, though he has
grown to enjoy bath time enough to not need the bubbles.</p>
<p>On the same day, in the afternoon, we got a phone call telling us that there had
been a short-notice cancellation with the paediatrician and would we like to do
our appointment on the 20th of February? You're damn right we would! We had been
waiting for this appointment since May 2021, and had heard from friends that
waiting over a year was common. Some people wait more than 2 years. To get seen
in less than a year is rare.</p>
<p>Janet, our SEND coordinator, had compiled all of the paperwork from our other
appointments and sent them to the paediatrician. She had also let the
paediatrician know that we were receptive to a diagnosis (not all parents are).
Unfortunately, for whatever reason, this documentation didn't get to the
paediatrician ready for the appointment. The paediatrician had moved from
another area and wasn't fully set up with her email yet.</p>
<p>However, with what Sophie was able to find on her phone, and after observing Max
for about an hour, the paediatrician told us she felt comfortable giving him a
diagnosis of autism there and then, with an official letter to follow.</p>
<h2 id="wrapping-up"><a class="anchor" href="#wrapping-up">#</a>
Wrapping up</h2>
<p>Getting an autism diagnosis before the age of 3 is uncommon, and I have to
express appreciation for everyone involved in the process. While the help given
to children should be based on needs alone, we have found it helpful to have a
recognised diagnosis.</p>
<p>What I plan to write about next is the process we went through to get Max in to
a special needs school, a process which came to an end just a few weeks ago. We
are ecstatic.</p>
</content>
</entry>
<entry xml:lang="en">
<title>I finally figured out how to take notes!</title>
<published>2022-02-14T00:00:00+00:00</published>
<updated>2022-02-14T00:00:00+00:00</updated>
<author>
<name>Unknown</name>
</author>
<link rel="alternate" href="https://samwho.dev/blog/note-taking/" type="text/html"/>
<id>https://samwho.dev/blog/note-taking/</id>
<content type="html"><p>I’ve never been good at taking notes. I’ve tried. Oh boy, have I tried. Name a
piece of note taking software, odds are I’ve tried it. I’ve even tried going old
school with pen and paper. Nothing sticks.</p>
<p>Until recently.</p>
<p>Some time ago, I learned about Apple’s
<a href="https://apps.apple.com/gb/app/shortcuts/id1462947752">Shortcuts</a> app. It’s an
app on iOS, iPadOS, and MacOS that allows you to automate actions between apps.
It’s a little like <a href="https://ifttt.com">IFTTT</a>. I played with it and made a few
fun things. I created a keyboard shortcut that could turn my lights on and off,
for example. I didn’t take it much further than that.</p>
<p>Since the start of the new year, I’ve been taking on more responsibility at
work. This has meant an increase in meetings, and an increase in me being
responsible for making sure things are moving forward. This means I often have
to follow up on things after a meeting, and I would sometimes forget to do this.
This would not do, I thought, and decided it was time to start taking meeting
notes.</p>
<p>I had some requirements in mind:</p>
<ol>
<li>I want to be able to tag notes. I’d like to track things like date, who was
there, what the key topics were, and be able to search based on these tags.</li>
<li>I need the ability to create action items, and be able to ask “what action
items have I not yet done?”</li>
<li>It has to be super easy. I want to be able to jump into a meeting and have my
meeting notes ready to go.</li>
</ol>
<p>Turns out, combining Apple Shortcuts with <a href="https://bear.app">Bear</a> hits all of
these requirements.</p>
<h2 id="shortcuts"><a class="anchor" href="#shortcuts">#</a>
Shortcuts</h2>
<p>I have two Shortcuts I use to make my note taking life much easier:</p>
<ol>
<li>A shortcut that creates a meeting note.</li>
<li>A shortcut that opens or creates a daily “scratch” note, for note taking
outside of meetings.</li>
</ol>
<p>The meeting note shortcut does the following:</p>
<ol>
<li>Looks in my work calendar for the most recent meeting that started in the
last 30 minutes.</li>
<li>It then creates a note with the meeting title as the note title, and it adds
tags for each person who accepted the calendar invite. It also adds a tag for
the current date, my current location, and the current temperature outside. Just
a bit of fun.</li>
</ol>
<p>I trigger this shortcut by typing cmd+ctrl+m. Any meeting I go in to, the first
thing I do while I’m waiting for people to arrive is hit that shortcut, the note
pops up a few seconds later, and I’m ready to take notes.</p>
<p>The daily scratch note shortcut is much simpler. It creates a note with the
current date as the title, and all of the same non-meeting-specific tags as the
meeting note: date, location, temperature. The only difference is it first
searches for a note with the current date as the title and, if it finds one,
opens that instead of creating a new one. I trigger this shortcut with
cmd+ctrl+s.</p>
<p><a href="/images/shortcut.png"><img src="/images/shortcut.png" alt="My daily scratch note shortcut" /></a></p>
<p>After a second or two, a note that looks like this opens up on my screen:</p>
<p><a href="/images/note.png"><img src="/images/note.png" alt="A daily scratch note" /></a></p>
<h2 id="bear"><a class="anchor" href="#bear">#</a>
Bear</h2>
<p>Other than being a beautiful demonstration of not implementing every single
feature your user base asks for, the primary thing Bear excels at in my workflow
is TODO management.</p>
<p>At any point in any note, you can create a TODO. This manifests as a list item
with a checkbox, much like GitHub’s TODOs. You can have as many TODOs as you
want in a note, and Bear has a section of its navigation menu that will show you
all notes with outstanding TODOs.</p>
<p><a href="/images/note-with-todo.png"><img src="/images/note-with-todo.png" alt="A daily scratch note with TODOs" /></a></p>
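<p>For reference, the GitHub-style syntax I'm comparing Bear's TODOs to looks like this (an illustration in GitHub-flavoured Markdown with made-up items, not Bear's own markup):</p>
<pre data-lang="txt" style="background-color:#2e3440;color:#d8dee9;" class="language-txt "><code class="language-txt" data-lang="txt"><span>- [ ] Chase up the budget approval
</span><span>- [x] Send out meeting notes
</span></code></pre>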
<h2 id="conclusion"><a class="anchor" href="#conclusion">#</a>
Conclusion</h2>
<p>I’ve been using this new system for about a week now, which is longer than I’ve
been able to stick with any other note taking system. Nothing else has ever felt
as natural to me as this does.</p>
<p>The key outcome, though, is that I feel more on top of things now. I’m not
dropping the ball on things people ask me to do in meetings. People don’t have
to chase me for things as much, which makes me feel good and I’m sure it makes
them feel good as well.</p>
</content>
</entry>
<entry xml:lang="en">
<title>Adventures in Homelab: Part 1</title>
<published>2021-05-02T00:00:00+00:00</published>
<updated>2021-05-02T00:00:00+00:00</updated>
<author>
<name>Unknown</name>
</author>
<link rel="alternate" href="https://samwho.dev/blog/adventures-in-homelab-part-1/" type="text/html"/>
<id>https://samwho.dev/blog/adventures-in-homelab-part-1/</id>
<content type="html"><p>If you work in tech, and you use the cloud in any way, you've probably heard of
Kubernetes. It's inescapable now, and there's no shortage of takes on it.</p>
<p>I've worked in a few companies that have used Kubernetes, but never been close
to it. I've used in-house tools that communicate with it, or CI/CD systems that
deploy my code in to it automatically. This has left me not really knowing what
Kubernetes is or how it works.</p>
<p>That changes now.</p>
<p>I'm embarking on a journey to create a production-ready Kubernetes cluster in my
own home.</p>
<h2 id="what-s-in-this-post"><a class="anchor" href="#what-s-in-this-post">#</a>
What's in this post</h2>
<p>At the end of this post I will have shown you:</p>
<ul>
<li>How I installed Arch Linux on 3 Raspberry Pi 4Bs and got them ready to run kubelets</li>
<li>How I bootstrapped a bare metal Kubernetes cluster on those Raspberry Pis</li>
<li>How I set up pod networking in the cluster</li>
</ul>
<h2 id="things-i-bought"><a class="anchor" href="#things-i-bought">#</a>
Things I bought</h2>
<p>I've also wanted to slide down the <a href="https://reddit.com/r/homelab">/r/homelab</a> rabbit hole for a while, so
here's what I bought to get started:</p>
<ul>
<li>3x <a href="https://www.amazon.co.uk/gp/product/B08M39828H/ref=ppx_yo_dt_b_asin_title_o08_s04?ie=UTF8&amp;psc=1">Raspberry Pi 4 model B</a> with <a href="https://www.amazon.co.uk/gp/product/B07VKF1CK8/ref=ppx_yo_dt_b_asin_title_o08_s00?ie=UTF8&amp;psc=1">power supplies</a> and <a href="https://www.amazon.co.uk/gp/product/B08GYKNCCP/ref=ppx_yo_dt_b_asin_title_o08_s04?ie=UTF8&amp;psc=1">SD cards</a></li>
<li>A <a href="https://www.amazon.co.uk/gp/product/B08Q8MTGHS/ref=ppx_yo_dt_b_asin_title_o00_s00?ie=UTF8&amp;psc=1">rack mount for the Raspberry Pis</a></li>
<li>A <a href="https://www.amazon.co.uk/gp/product/B013PGHUZS/ref=ppx_yo_dt_b_asin_title_o09_s00?ie=UTF8&amp;psc=1">12u 19" cabinet</a></li>
<li>A <a href="https://www.amazon.co.uk/gp/product/B07DFBX347/ref=ppx_yo_dt_b_asin_title_o08_s02?ie=UTF8&amp;psc=1">16 PoE port gigabit ethernet switch</a></li>
<li>A <a href="https://www.amazon.co.uk/gp/product/B08NXD85CK/ref=ppx_yo_dt_b_asin_title_o05_s00?ie=UTF8&amp;psc=1">rack mountable power strip</a></li>
<li>A <a href="https://www.amazon.co.uk/gp/product/B008X3JHJQ/ref=ppx_yo_dt_b_asin_title_o07_s00?ie=UTF8&amp;psc=1">rack mountable shelf</a></li>
<li>Some <a href="https://www.amazon.co.uk/gp/product/B004FEGBTQ/ref=ppx_yo_dt_b_asin_title_o08_s03?ie=UTF8&amp;psc=1">teeny weeny network cables</a></li>
</ul>
<p>Here's what it all looks like when put together:</p>
<p><img src="/images/localkube-1.jpg" alt="My home server rack from top to bottom: the 3 raspberry pis mounted in their mounting bracket, the PoE switch below them with patch cables running to each pi, and the shelf below that holding a UPS and a NAS" /></p>
<p>Also pictured here is the shelf, which is holding a <a href="https://www.amazon.co.uk/gp/product/B07BZCD927/ref=ppx_yo_dt_b_search_asin_title?ie=UTF8&amp;psc=1">UPS</a> on the left and a
<a href="https://www.amazon.co.uk/gp/product/B075DDZ894/ref=ppx_yo_dt_b_search_asin_title?ie=UTF8&amp;psc=1">NAS</a> on the right. I had those things already, so didn't list them as part
of what I bought for this project. While the UPS is optional, the NAS is quite
critical to my setup. It will eventually host all of the persistent data for
my cluster. More about this in the 2nd part of this series.</p>
<h2 id="preparing-the-pis"><a class="anchor" href="#preparing-the-pis">#</a>
Preparing the Pis</h2>
<p>The first step is to get an OS running on the Raspberry Pis. While the official
documentation on <a href="https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/">creating a bare metal Kubernetes cluster</a> recommends using
a deb/rpm-compatible Linux distribution, I'm a long-time fan of Arch Linux.
Surely I can't be the first person to want to do this on Arch?</p>
<p>Fortunately, I'm not. Morten Linderud, part of the Arch Linux security team, has
written a <a href="https://linderud.dev/blog/kubernetes-in-arch-linux/">great blog post</a> on getting a bare metal Kubernetes cluster
working using Arch Linux. There's only one small gotcha: he didn't do it on
Raspberry Pis.</p>
<p>Before running through the steps in his blog post, we need to get Arch running
on the Pis. I followed <a href="https://archlinuxarm.org/platforms/armv8/broadcom/raspberry-pi-4">this guide</a> from the official Arch Linux ARM website,
which worked perfectly. I followed the ARMv7 installation guide because the
disclaimer for AArch64 put me off a little. This decision hasn't hurt me so far,
though I have occasionally had to look harder for docker images built for ARM
and not ARM64 (thanks, Apple).</p>
<p>I'm going to use <code>kubeadm</code> to bootstrap my cluster, and while <code>kubeadm</code> is an
officially supported package in the Arch Linux repos, there's no ARM build of
it. There is, however, an ARM build <a href="https://aur.archlinux.org/packages/kubeadm-bin/">in the AUR</a>. I installed <a href="https://aur.archlinux.org/packages/yay/">yay</a> as
my preferred AUR tool.</p>
<p>To save some time, I'll tell you I needed to install all of the following on
each pi:</p>
<pre data-lang="bash" style="background-color:#2e3440;color:#d8dee9;" class="language-bash "><code class="language-bash" data-lang="bash"><span style="color:#88c0d0;">yay</span><span> -S kubeadm kubelet crictl conntrack-tools ethtool ebtables cni-plugins containerd socat
</span></code></pre>
<p>A lot of them came up during the <code>kubeadm init</code> process. It runs a set of
"preflight checks" that require you to install necessary binaries. It also
checks to make sure your system has various capabilities, and one of these
was missing for me: memory cgroups. I had to add the following onto the end
of <code>/boot/cmdline.txt</code>:</p>
|
|
3384
|
+
<pre data-lang="txt" style="background-color:#2e3440;color:#d8dee9;" class="language-txt "><code class="language-txt" data-lang="txt"><span>cgroup_enable=memory
|
|
3385
|
+
</span></code></pre>
|
|
3386
|
+
<p>And reboot. It also warned me that the <code>hugetlb</code> cgroup wasn't enabled, but it
|
|
3387
|
+
was an optional dependency and I decided to ignore it. This hasn't bitten me so
|
|
3388
|
+
far.</p>
|
|
3389
|
+
<p>The last thing I did was set the hostname of each of the nodes. Modify
|
|
3390
|
+
<code>/etc/hostname</code> and name the nodes as you see fit. I used <code>kubernetes-master</code>,
|
|
3391
|
+
<code>kubernetes-worker-1</code>, and <code>kubernetes-worker-2</code>. I also gave them static IPs in
|
|
3392
|
+
my local network, and DNS names to make communicating with them easier.</p>
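<p>As a concrete sketch of that naming scheme (the hostnames are the ones above; the IP addresses and the <code>.local</code> names are placeholders for whatever static addresses and DNS records you set up on your own network):</p>

```txt
# /etc/hostname on each node, respectively
kubernetes-master
kubernetes-worker-1
kubernetes-worker-2

# Example static-IP-to-name mappings, e.g. /etc/hosts on your dev machine
# or records in your local DNS. The 192.168.1.x addresses are made up.
192.168.1.10    kubernetes-master.local
192.168.1.11    kubernetes-worker-1.local
192.168.1.12    kubernetes-worker-2.local
```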
<h2 id="bootstrapping-the-cluster"><a class="anchor" href="#bootstrapping-the-cluster">#</a>
Bootstrapping the cluster</h2>
<p>Step 1 to bootstrapping a cluster is to set up your master node. The Kubernetes
project ships a tool called <code>kubeadm</code> (Kubernetes admin) that makes this very
easy. I ran the following:</p>
<pre data-lang="bash" style="background-color:#2e3440;color:#d8dee9;" class="language-bash "><code class="language-bash" data-lang="bash"><span style="color:#88c0d0;">kubeadm</span><span> init --pod-network-cidr 10.244.0.0/16 --upload-certs
</span></code></pre>
<p>The flag <code>--pod-network-cidr</code> is the desired subnet you want pods to live in. I
chose something that's very different to my home network so I would be able to
distinguish them. The flag <code>--upload-certs</code> I'm not really sure about. Martin
Linderud uses it in his blog post, so I did as well. From reading the
documentation on the flag it looks like I didn't need it, so try without it if
you're feeling adventurous.</p>
<p><code>kubeadm init</code> runs a set of preflight checks first. It's possible you will fail
some of those checks. In that case, make sure you do some searching to figure
out what's wrong and fix it before continuing.</p>
<p>When <code>kubeadm init</code> finishes, you'll see output that looks like this:</p>
<pre data-lang="txt" style="background-color:#2e3440;color:#d8dee9;" class="language-txt "><code class="language-txt" data-lang="txt"><span>Your Kubernetes control-plane has initialized successfully!
</span><span>
</span><span>To start using your cluster, you need to run the following as a regular user:
</span><span>
</span><span>  mkdir -p $HOME/.kube
</span><span>  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
</span><span>  sudo chown $(id -u):$(id -g) $HOME/.kube/config
</span><span>
</span><span>You should now deploy a Pod network to the cluster.
</span><span>Run &quot;kubectl apply -f [podnetwork].yaml&quot; with one of the options listed at:
</span><span>  /docs/concepts/cluster-administration/addons/
</span><span>
</span><span>You can now join any number of machines by running the following on each node
</span><span>as root:
</span><span>
</span><span>  kubeadm join &lt;control-plane-host&gt;:&lt;control-plane-port&gt; --token &lt;token&gt; --discovery-token-ca-cert-hash sha256:&lt;hash&gt;
</span></code></pre>
<p>Save that <code>kubeadm join</code> command; you'll need it in a few minutes.</p>
<p>It was at this point that I also copied the <code>~/.kube/config</code> file to my main
development machine and closed the SSH connection to my master node.</p>
<h2 id="pod-networking"><a class="anchor" href="#pod-networking">#</a>
Pod networking</h2>
<p><img src="/images/now-this-is-pod-networking.jpg" alt="Anakin Skywalker pod-racing with the caption &quot;Now this is pod networking!&quot;" /></p>
<p>Pod networking has come up a couple of times now, but what is it?</p>
<p>A Kubernetes cluster consists of 0-n nodes. A node is a physical machine running
the <code>kubelet</code> daemon configured to be a part of your cluster. On a node, 0-n
pods can be running. A pod is a collection of 1-n containers that share a local
network. They're called pods as a reference to a pod of dolphins, according to
the book <a href="https://www.oreilly.com/library/view/kubernetes-up-and/9781492046523/">Kubernetes: Up and Running</a>.</p>
<p>Because the networks that Kubernetes clusters are deployed in are extremely
varied (from cloud providers to datacenters to home networks), and needs will
differ dramatically, Kubernetes doesn't ship clusters with a way for pods to
communicate with other pods by default. You need to select a third-party
solution that fits your needs.</p>
<p>Deciding what pod networking solution is best for you is outside of the scope
of this article; I'll just say that I went with <a href="https://github.com/flannel-io/flannel">flannel</a>. It sounded simple
and just sorts out networking between pods without any extra fancy features.
Its limitations, primarily that nodes must be on the same physical network as
each other, were not a concern for me.</p>
<p>Normally, you would install flannel like this:</p>
<pre data-lang="bash" style="background-color:#2e3440;color:#d8dee9;" class="language-bash "><code class="language-bash" data-lang="bash"><span style="color:#88c0d0;">kubectl</span><span> apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
</span></code></pre>
<p>But I found that this didn't work for me. There were two reasons:</p>
<ol>
<li>I was missing the <code>cni-plugins</code> package</li>
<li>The default backend flannel uses, <code>vxlan</code>, didn't work for some reason</li>
</ol>
<p>While 1 took some time to figure out, largely by doing lots of
<code>kubectl describe pod</code> commands, it was a simple fix once I saw the error
message.</p>
<p>2, however, was tricky. Pod-to-pod communication by pure pod IP address worked
fine, but any communication through a cluster IP address hung indefinitely.
After a lot of searching, I found someone suggesting to switch away from
flannel's default <code>vxlan</code> backend to the <code>host-gw</code> backend.</p>
<p>What does all of this mean? Fuck if I know. All I know is that it fixed the
problem I was having. If you download the flannel manifest from the command
above and find the <code>ConfigMap</code> called <code>kube-flannel-cfg</code>, modify the bit called
<code>net-conf.json</code> so that it looks like this:</p>
<pre data-lang="json" style="background-color:#2e3440;color:#d8dee9;" class="language-json "><code class="language-json" data-lang="json"><span>{
</span><span> </span><span style="color:#a3be8c;">&quot;Network&quot;</span><span style="color:#eceff4;">: </span><span style="color:#a3be8c;">&quot;10.244.0.0/16&quot;</span><span style="color:#eceff4;">,
</span><span> </span><span style="color:#a3be8c;">&quot;Backend&quot;</span><span style="color:#eceff4;">: </span><span>{
</span><span> </span><span style="color:#a3be8c;">&quot;Type&quot;</span><span style="color:#eceff4;">: </span><span style="color:#a3be8c;">&quot;host-gw&quot;
</span><span> }
</span><span>}
</span></code></pre>
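<p>If you'd rather script the edit than make it by hand, the substitution is small enough for sed. A minimal sketch, assuming the manifest uses the exact quoting shown above (the file below is a locally recreated fragment for illustration, not the real manifest):</p>

```shell
# Recreate the net-conf.json fragment locally to demonstrate the edit.
# Against the real manifest you'd run the same sed on kube-flannel.yml.
cat > net-conf.json <<'EOF'
{
  "Network": "10.244.0.0/16",
  "Backend": {
    "Type": "vxlan"
  }
}
EOF

# Swap flannel's default vxlan backend for host-gw.
sed -i 's/"Type": "vxlan"/"Type": "host-gw"/' net-conf.json
cat net-conf.json
```

<p>After making the same change to the downloaded manifest, <code>kubectl apply -f kube-flannel.yml</code> picks it up.</p>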
<p>Last but not least, I found that I had to restart my master node after all of
these changes. It took a minute or two to boot back up, but when it did
I was greeted with this:</p>
<pre data-lang="txt" style="background-color:#2e3440;color:#d8dee9;" class="language-txt "><code class="language-txt" data-lang="txt"><span>$ kubectl get nodes
</span><span>NAME                      STATUS   ROLES                  AGE   VERSION
</span><span>kubernetes-master.local   Ready    control-plane,master   14m   v1.21.0
</span></code></pre>
<h2 id="adding-the-worker-nodes"><a class="anchor" href="#adding-the-worker-nodes">#</a>
Adding the worker nodes</h2>
<p>Remember that <code>kubeadm join</code> command I said to save for later? Now is later.
Adding nodes to your cluster is as simple as running that join command on each
node.</p>
<p>One bit of weirdness I experienced is that after newly joining a node to the
cluster, it would get stuck in the <code>NotReady</code> state. This resolved itself after
rebooting each node. I'm not sure what that's all about; I'm assuming network
voodoo with flannel.</p>
<pre data-lang="txt" style="background-color:#2e3440;color:#d8dee9;" class="language-txt "><code class="language-txt" data-lang="txt"><span>$ kubectl get nodes
</span><span>NAME                        STATUS   ROLES                  AGE   VERSION
</span><span>kubernetes-master.local     Ready    control-plane,master   20m   v1.21.0
</span><span>kubernetes-worker-1.local   Ready    &lt;none&gt;                 2m    v1.21.0
</span><span>kubernetes-worker-2.local   Ready    &lt;none&gt;                 1m    v1.21.0
</span></code></pre>
<h2 id="conclusion"><a class="anchor" href="#conclusion">#</a>
Conclusion</h2>
<p>Now that we have a working bare metal Kubernetes cluster, we're ready to start
running things on it. We still have a long way to go until our cluster can run
any kind of workload we want. We'll need to handle load balancing, persistent
storage, and ingress resources. All of that is going to be in part 2.</p>
<p>We have an even longer way to go until we could call this a production-ready
cluster. The main thing missing for that is that we'd need to run 3 master
nodes, and this is something I want to explore in a future post.</p>
</content>

</entry>


<entry xml:lang="en">
<title>Simple Complex Easy Hard</title>
<published>2021-04-18T00:00:00+00:00</published>
<updated>2021-04-18T00:00:00+00:00</updated>
<author>
<name>Unknown</name>
</author>
<link rel="alternate" href="https://samwho.dev/blog/simple-complex-easy-hard/" type="text/html"/>
<id>https://samwho.dev/blog/simple-complex-easy-hard/</id>

<content type="html"><p>You might have noticed the last time you were doing chores or tackling a
tricky problem at work, that when something is hard it's not always hard in
the same way. The hard you experience when doing chores, that mind-numbing,
I-can't-be-bothered hard, is different to the hard you might experience when
debugging an elusive bug in a distributed system.</p>
<p>Why?</p>
<h2 id="the-2-axes-of-difficulty"><a class="anchor" href="#the-2-axes-of-difficulty">#</a>
The 2 axes of difficulty</h2>
<p>There are many things that determine whether a task is difficult or not, but you
can make a start on getting more granular by splitting difficulty into two axes:
simple-complex and easy-hard.</p>
<p>What's the difference?</p>
<p>A sudoku puzzle is <em>complex</em>. It depends on your skill at sudoku puzzles, and
you're able to do more complex sudoku puzzles the more you practice and hone
this skill. A task is complex when the number of people that <em>could</em> do it
tends toward zero. If a lot of people could do it, that would make it
<em>simple</em>.</p>
<p>The other scale measures how much effort must be expended to complete the
task. If you're put off by how much effort will be involved, it's likely
<em>hard</em>. Something is hard when the number of people that are willing to put
the effort in tends toward zero.</p>
<h2 id="why-does-this-matter"><a class="anchor" href="#why-does-this-matter">#</a>
Why does this matter?</h2>
<p>This distinction can help you with time estimations. Complexity is one of the
key things that introduces uncertainty into estimating how long a task will
take. It's not unusual to get halfway through a sudoku puzzle and realise
you've made a mistake somewhere, and need to backtrack. On the other hand,
you know quite accurately how long it's going to take you to mow the lawn.</p>
<p>Adding this vocabulary to your work tasks can help people to understand what
to expect. A simple-easy task is likely to be predictably quick. A
simple-hard task predictably long. A complex-hard task is anyone's guess, and
may even be worth breaking into several smaller tasks.</p>
<h2 id="other-examples"><a class="anchor" href="#other-examples">#</a>
Other examples</h2>
<p><img src="/images/simple-complex-easy-hard.png" alt="Image showing 2 axes crossing in the middle, X axis is simple-complex, Y-axis is easy-hard, with a variety of examples of tasks. Looking after a newborn is simple hard, mowing the lawn is simple-middle, putting a mug in the dishwasher is easy-simple, wiring a plug is easy-middle, tracking down a bug that doesn&#39;t happen when you attach a debugger is complex-hard" /></p>
<p>I'm sure you might disagree with some of these; I didn't spend an enormous
amount of time thinking about them. If you have other examples, I'd love to hear
about them.</p>
</content>

</entry>


<entry xml:lang="en">
<title>Scale is Poison</title>
<published>2021-03-09T00:00:00+00:00</published>
<updated>2021-03-09T00:00:00+00:00</updated>
<author>
<name>Unknown</name>
</author>
<link rel="alternate" href="https://samwho.dev/blog/scale-is-poison/" type="text/html"/>
<id>https://samwho.dev/blog/scale-is-poison/</id>

<content type="html"><p>It's March 9th 2021 and Google Calendar still doesn't have a dark mode. The iOS app update notes for the Just Eat app are still boasting about the app now supporting contact-free delivery, and have done for all 25 releases in the last 11 months that I can see on the App Store. Twitter's TweetDeck still doesn't support creating or participating in polls.</p>
<h2 id="the-problem"><a class="anchor" href="#the-problem">#</a>
The Problem</h2>
<p>I've worked for a wide range of companies throughout my career. From a big US tech giant to a small Norwegian SaaS platform, with some bits in between. To me the problem is clear: big companies can't afford to give a fuck.</p>
<p>Why is it that companies paying Google $100,000 a month for GCP get ghosted by support staff, yet Jason Fried regularly answers questions about his company <a href="https://twitter.com/search?q=%23askjf">directly on Twitter</a>?</p>
<p>The answer is scale.</p>
<blockquote class="twitter-tweet"><p lang="en" dir="ltr">Scale dehumanizes.<br><br>Resisting the pull of scale is a recipe for a happy life.</p>&mdash; David Perell (@david_perell) <a href="https://twitter.com/david_perell/status/1325846398093455360?ref_src=twsrc%5Etfw">November 9, 2020</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
<p>When you scale, you automate. This is good and bad. It's nice to be able to get a refund automatically when an item is missing from your order. It's frustrating trying to figure out the right incantation to trick a chatbot into connecting you to a human. It's terrifying when we encode racism into <a href="https://www.bbc.co.uk/news/technology-54349538">national approval processes</a>.</p>
<p>The problem doesn't just apply to customers. Staff suffer as well. When I worked in a big US tech company, as soon as I stopped being a model employee I got stonewalled and pushed out. Exactly no effort was made to sympathise with my side of the story (my boss was bullying me); it was much easier for them to do nothing and hope I'd leave.</p>
<p>Conversely, small companies have treated me exceptionally well. When my 2020 tax filing told me I owed the government 10,000 GBP because of an <a href="https://twitter.com/samwhoo/status/1321039455428497408?s=20">unfortunate cocktail of mistakes</a> both myself and the company accountants had made, the CEO couldn't do enough for me.</p>
<h2 id="the-solution"><a class="anchor" href="#the-solution">#</a>
The Solution</h2>
<p>Easy. Don't scale.</p>
<p>Gasp.</p>
<p>I'm serious. Don't scale past the number of users you can excellently serve. Don't scale to a point where you can't excellently polish your software. If it's becoming difficult and slow to implement new features, or react to new platform updates (e.g. widgets in iOS 14), stop. If your customers spend more time on hold than using your service, stop.</p>
<p>This works in reverse, too. Annoyed that the only person you can get to listen to you is a chatbot? Move to smaller companies that give a shit. Here are some moves I made that I'm really happy with:</p>
<ul>
<li>Gmail to <a href="https://fastmail.com/">Fastmail</a> and <a href="https://hey.com">HEY</a>. Gmail has barely changed in years, and is a mess. Fastmail is reliable and privacy-focused; HEY is new and beautiful and evolving every month.</li>
<li>EDF to <a href="https://bulb.co.uk/">Bulb</a>. One of the rare switches that saved me money. Bulb have a great app, write great update notes, and have good customer service.</li>
<li>Sky to <a href="https://www.aa.net.uk/">Andrews and Arnold</a>. Not a switch you'll want to make unless you're certain, AA are an ISP designed for techies. They're expensive but reliable, transparent, and not afraid of letting you pull the levers.</li>
</ul>
<p>In almost all cases, moves like this will cost you money. One of the reasons companies scale is because it's economical to do so. Being small costs more, and I appreciate not everyone can afford that, but if you can it's worth it.</p>
<p>Have you made similar switches you're happy with? I'd love to hear about them. <a href="https://twitter.com/samwhoo">Tweet me</a>.</p>
<h2 id="the-aside"><a class="anchor" href="#the-aside">#</a>
The Aside</h2>
<p>You might be thinking: "Sam, you're being unfair. I know loads of people that work in large software companies, they give lots of fucks and they're doing their best."</p>
<p>I don't doubt it. I've got a lot of friends in these companies, too, and I know for certain that they're good people.</p>
<p>In the end, though, they end up getting dragged into the miasma. Priorities are set by people 3-4 rungs up the ladder and there's minimal wiggle room. Performance reviews eat a month of your time per year or more, and make sure you're only working on exactly what they want you to be working on. The fetishisation of "impact" ensures that details are ignored forever.</p>
<p>But they pay you a butt-load of money, so it's hard to leave.</p>
</content>

</entry>
|
|
3606
|
+
|
|
3607
|
+
|
|
3608
|
+
<entry xml:lang="en">
|
|
3609
|
+
<title>Fun With Rust's Traits</title>
|
|
3610
|
+
<published>2020-10-17T00:00:00+00:00</published>
|
|
3611
|
+
<updated>2020-10-17T00:00:00+00:00</updated>
|
|
3612
|
+
<author>
|
|
3613
|
+
<name>Unknown</name>
|
|
3614
|
+
</author>
|
|
3615
|
+
<link rel="alternate" href="https://samwho.dev/blog/fun-with-rust-traits/" type="text/html"/>
|
|
3616
|
+
<id>https://samwho.dev/blog/fun-with-rust-traits/</id>
|
|
3617
|
+
|
|
3618
|
+
<content type="html"><p>Rust's trait system is wonderful. Everyone I know that has used it agrees
|
|
3619
|
+
with this statement. It's a great way to encode shared behaviour between
|
|
3620
|
+
data types, and create flexible APIs.</p>
|
|
3621
|
+
<p>It's also great for writing nonsense like this:</p>
|
|
3622
|
+
<pre data-lang="rust" style="background-color:#2e3440;color:#d8dee9;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#81a1c1;">use </span><span>std</span><span style="color:#81a1c1;">::</span><span>f64</span><span style="color:#81a1c1;">::</span><span>consts</span><span style="color:#81a1c1;">::</span><span>PI</span><span style="color:#eceff4;">;
|
|
3623
|
+
</span><span style="color:#81a1c1;">use </span><span>lisp</span><span style="color:#81a1c1;">::</span><span>prelude</span><span style="color:#81a1c1;">::*</span><span style="color:#eceff4;">;
|
|
3624
|
+
</span><span>
|
|
3625
|
+
</span><span style="color:#81a1c1;">fn </span><span style="color:#88c0d0;">main</span><span>() {
|
|
3626
|
+
</span><span> </span><span style="color:#81a1c1;">let</span><span> r </span><span style="color:#81a1c1;">= </span><span style="color:#b48ead;">3.0</span><span style="color:#eceff4;">;
|
|
3627
|
+
</span><span> </span><span style="color:#81a1c1;">let</span><span> res </span><span style="color:#81a1c1;">= </span><span style="color:#88c0d0;">eval</span><span>((mul</span><span style="color:#eceff4;">, </span><span>PI</span><span style="color:#eceff4;">, </span><span>(mul</span><span style="color:#eceff4;">,</span><span> r</span><span style="color:#eceff4;">,</span><span> r)))</span><span style="color:#eceff4;">;
|
|
3628
|
+
</span><span> println!(</span><span style="color:#a3be8c;">&quot;</span><span style="color:#ebcb8b;">{}</span><span style="color:#a3be8c;">&quot;</span><span style="color:#eceff4;">,</span><span> res)</span><span style="color:#eceff4;">;
|
|
3629
|
+
</span><span>}
|
|
3630
|
+
</span></code></pre>
|
|
3631
|
+
<p>This program prints the area of a circle with radius 3: <code>28.274</code>.</p>
|
|
3632
|
+
<h2 id="good-grief-what-have-you-done"><a class="anchor" href="#good-grief-what-have-you-done">#</a>
|
|
3633
|
+
Good grief, what have you done?</h2>
|
|
3634
|
+
<p>At its core, Lisp is just a tree where each node evalutes to some value. The
|
|
3635
|
+
node is easy to express in Rust:</p>
|
|
3636
|
+
<pre data-lang="rust" style="background-color:#2e3440;color:#d8dee9;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#81a1c1;">trait </span><span style="color:#8fbcbb;">Node </span><span>{
|
|
3637
|
+
</span><span> </span><span style="color:#81a1c1;">type Return</span><span style="color:#eceff4;">;
|
|
3638
|
+
</span><span> </span><span style="color:#81a1c1;">fn </span><span style="color:#88c0d0;">eval</span><span>(self) </span><span style="color:#eceff4;">-&gt; </span><span style="color:#81a1c1;">Self::</span><span>Return</span><span style="color:#eceff4;">;
|
|
3639
|
+
</span><span>}
|
|
3640
|
+
</span></code></pre>
|
|
3641
|
+
<p>Then our <code>eval</code> function is a one-liner. We just take a node and evaluate it:</p>
|
|
3642
|
+
<pre data-lang="rust" style="background-color:#2e3440;color:#d8dee9;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#81a1c1;">pub fn </span><span style="color:#88c0d0;">eval</span><span>&lt;N, R&gt;(n</span><span style="color:#eceff4;">:</span><span> N) </span><span style="color:#eceff4;">-&gt;</span><span> R
|
|
3643
|
+
</span><span style="color:#81a1c1;">where
|
|
3644
|
+
</span><span> N</span><span style="color:#eceff4;">: </span><span>Node&lt;Return = R&gt;,
|
|
3645
|
+
</span><span>{
|
|
3646
|
+
</span><span> n</span><span style="color:#81a1c1;">.</span><span style="color:#88c0d0;">eval</span><span>()
|
|
3647
|
+
</span><span>}
|
|
3648
|
+
</span></code></pre>
|
|
3649
|
+
<p>In Lisp, the first element in a node is usually a function, and the rest of
|
|
3650
|
+
the elements are the arguments to that function. Here's the simplest case: a
|
|
3651
|
+
function that takes no arguments.</p>
|
|
3652
|
+
<pre data-lang="rust" style="background-color:#2e3440;color:#d8dee9;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#81a1c1;">impl</span><span>&lt;F, R&gt; Node </span><span style="color:#81a1c1;">for</span><span> (F,)
|
|
3653
|
+
</span><span style="color:#81a1c1;">where
|
|
3654
|
+
</span><span> F</span><span style="color:#eceff4;">:</span><span> Fn() -&gt; R,
|
|
3655
|
+
</span><span>{
|
|
3656
|
+
</span><span> </span><span style="color:#81a1c1;">type Return =</span><span> R</span><span style="color:#eceff4;">;
|
|
3657
|
+
</span><span> </span><span style="color:#81a1c1;">fn </span><span style="color:#88c0d0;">eval</span><span>(self) </span><span style="color:#eceff4;">-&gt; </span><span style="color:#81a1c1;">Self::</span><span>Return {
|
|
3658
|
+
</span><span> </span><span style="color:#81a1c1;">self.</span><span style="color:#b48ead;">0</span><span>()
|
|
3659
|
+
</span><span> }
|
|
3660
|
+
</span><span>}
|
|
3661
|
+
</span></code></pre>
|
|
3662
|
+
<p>This is saying that a single element tuple, containing a function that takes
|
|
3663
|
+
no arguments and returns a value, is evaluated by executing that function and
|
|
3664
|
+
returning its value.</p>
|
|
3665
|
+
<p>This would allow us to write and execute this <code>main</code> function:</p>
|
|
3666
|
+
<pre data-lang="rust" style="background-color:#2e3440;color:#d8dee9;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#81a1c1;">use </span><span>lisp</span><span style="color:#81a1c1;">::</span><span>prelude</span><span style="color:#81a1c1;">::*</span><span style="color:#eceff4;">;
|
|
3667
|
+
</span><span>
|
|
3668
|
+
</span><span style="color:#81a1c1;">fn </span><span style="color:#88c0d0;">hello_world</span><span>() {
|
|
3669
|
+
</span><span> println!(</span><span style="color:#a3be8c;">&quot;Hello, world!&quot;</span><span>)</span><span style="color:#eceff4;">;
|
|
3670
|
+
</span><span>}
|
|
3671
|
+
</span><span>
|
|
3672
|
+
</span><span style="color:#81a1c1;">fn </span><span style="color:#88c0d0;">main</span><span>() {
|
|
3673
|
+
</span><span> </span><span style="color:#88c0d0;">eval</span><span>((hello_world</span><span style="color:#eceff4;">,</span><span>))</span><span style="color:#eceff4;">;
|
|
3674
|
+
</span><span>}
|
|
3675
|
+
</span></code></pre>
|
|
3676
|
+
<p>Scaling up to functions that take arguments is a case of adding those arguments
|
|
3677
|
+
to the Node implementation:</p>
|
|
3678
|
+
<pre data-lang="rust" style="background-color:#2e3440;color:#d8dee9;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#81a1c1;">impl</span><span>&lt;F, A, R&gt; Node </span><span style="color:#81a1c1;">for</span><span> (F, A)
|
|
3679
|
+
</span><span style="color:#81a1c1;">where
|
|
3680
|
+
</span><span> F</span><span style="color:#eceff4;">:</span><span> Fn(A) -&gt; R,
|
|
3681
|
+
</span><span>{
|
|
3682
|
+
</span><span> </span><span style="color:#81a1c1;">type Return =</span><span> R</span><span style="color:#eceff4;">;
|
|
3683
|
+
</span><span> </span><span style="color:#81a1c1;">fn </span><span style="color:#88c0d0;">eval</span><span>(self) </span><span style="color:#eceff4;">-&gt; </span><span style="color:#81a1c1;">Self::</span><span>Return {
|
|
3684
|
+
</span><span> </span><span style="color:#81a1c1;">self.</span><span style="color:#b48ead;">0</span><span>(</span><span style="color:#81a1c1;">self.</span><span style="color:#b48ead;">1</span><span>)
|
|
3685
|
+
</span><span> }
|
|
3686
|
+
</span><span>}
|
|
3687
|
+
</span></code></pre>
|
|
3688
|
+
<p>But wait. This is where we hit our first snag. In Lisp, arguments to a
|
|
3689
|
+
function can also be nodes in the tree that need evaluating. Consider the
|
|
3690
|
+
following expression:</p>
|
|
3691
|
+
<pre data-lang="lisp" style="background-color:#2e3440;color:#d8dee9;" class="language-lisp "><code class="language-lisp" data-lang="lisp"><span>(</span><span style="color:#81a1c1;">+ </span><span style="color:#b48ead;">1 </span><span>(</span><span style="color:#81a1c1;">+ </span><span style="color:#b48ead;">2 3</span><span>))
|
|
3692
|
+
</span></code></pre>
|
|
3693
|
+
<p>We would expect the value to be 6. In the trait we just defined, this wouldn't
|
|
3694
|
+
work. We aren't accounting for arguments also being nodes. Fortunately the
|
|
3695
|
+
fix isn't too complicated:</p>
|
|
3696
|
+
<pre data-lang="rust" style="background-color:#2e3440;color:#d8dee9;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#81a1c1;">impl</span><span>&lt;F, A, B, R&gt; Node </span><span style="color:#81a1c1;">for</span><span> (F, A)
|
|
3697
|
+
</span><span style="color:#81a1c1;">where
|
|
3698
|
+
</span><span> F</span><span style="color:#eceff4;">:</span><span> Fn(B) -&gt; R,
|
|
3699
|
+
</span><span> A</span><span style="color:#eceff4;">: </span><span>Node&lt;Return = B&gt;,
|
|
3700
|
+
</span><span>{
|
|
3701
|
+
</span><span> </span><span style="color:#81a1c1;">type Return =</span><span> R</span><span style="color:#eceff4;">;
|
|
3702
|
+
</span><span> </span><span style="color:#81a1c1;">fn </span><span style="color:#88c0d0;">eval</span><span>(self) </span><span style="color:#eceff4;">-&gt; </span><span style="color:#81a1c1;">Self::</span><span>Return {
|
|
3703
|
+
</span><span> </span><span style="color:#81a1c1;">self.</span><span style="color:#b48ead;">0</span><span>(</span><span style="color:#81a1c1;">self.</span><span style="color:#b48ead;">1.</span><span style="color:#88c0d0;">eval</span><span>())
|
|
3704
|
+
</span><span> }
|
|
3705
|
+
</span><span>}
|
|
3706
|
+
</span></code></pre>
|
|
3707
|
+
<p>And we can keep scaling this to as many arguments as we expect functions to
|
|
3708
|
+
need.</p>
|
|
3709
|
+
<pre data-lang="rust" style="background-color:#2e3440;color:#d8dee9;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#81a1c1;">impl</span><span>&lt;F, A1, A2, A3, A4, R1, R2, R3, R4, R&gt; Node </span><span style="color:#81a1c1;">for</span><span> (F, A1, A2, A3, A4)
|
|
3710
|
+
</span><span style="color:#81a1c1;">where
|
|
3711
|
+
</span><span> F</span><span style="color:#eceff4;">:</span><span> Fn(R1, R2, R3, R4) -&gt; R,
|
|
3712
|
+
</span><span> A1</span><span style="color:#eceff4;">: </span><span>Node&lt;Return = R1&gt;,
|
|
3713
|
+
</span><span> A2</span><span style="color:#eceff4;">: </span><span>Node&lt;Return = R2&gt;,
|
|
3714
|
+
</span><span> A3</span><span style="color:#eceff4;">: </span><span>Node&lt;Return = R3&gt;,
|
|
3715
|
+
</span><span> A4</span><span style="color:#eceff4;">: </span><span>Node&lt;Return = R4&gt;,
|
|
3716
|
+
</span><span>{
|
|
3717
|
+
</span><span> </span><span style="color:#81a1c1;">type Return =</span><span> R</span><span style="color:#eceff4;">;
|
|
3718
|
+
</span><span> </span><span style="color:#81a1c1;">fn </span><span style="color:#88c0d0;">eval</span><span>(self) </span><span style="color:#eceff4;">-&gt; </span><span style="color:#81a1c1;">Self::</span><span>Return {
|
|
3719
|
+
</span><span> </span><span style="color:#81a1c1;">self.</span><span style="color:#b48ead;">0</span><span>(</span><span style="color:#81a1c1;">self.</span><span style="color:#b48ead;">1.</span><span style="color:#88c0d0;">eval</span><span>()</span><span style="color:#eceff4;">, </span><span style="color:#81a1c1;">self.</span><span style="color:#b48ead;">2.</span><span style="color:#88c0d0;">eval</span><span>()</span><span style="color:#eceff4;">, </span><span style="color:#81a1c1;">self.</span><span style="color:#b48ead;">3.</span><span style="color:#88c0d0;">eval</span><span>()</span><span style="color:#eceff4;">, </span><span style="color:#81a1c1;">self.</span><span style="color:#b48ead;">4.</span><span style="color:#88c0d0;">eval</span><span>())
|
|
3720
|
+
</span><span> }
|
|
3721
|
+
</span><span>}
|
|
3722
|
+
</span></code></pre>
|
|
3723
|
+
<p>With these trait definitions, we have made it possible to evaluate Rust
tuples recursively in much the same way Lisp is evaluated.</p>
<p>The last thing we need now is to define primitive types as nodes, otherwise
we won't be able to evaluate anything.</p>
|
|
3727
|
+
<pre data-lang="rust" style="background-color:#2e3440;color:#d8dee9;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#81a1c1;">impl </span><span>Node </span><span style="color:#81a1c1;">for </span><span>i32 {
|
|
3728
|
+
</span><span> </span><span style="color:#81a1c1;">type Return = Self</span><span style="color:#eceff4;">;
|
|
3729
|
+
</span><span> </span><span style="color:#81a1c1;">fn </span><span style="color:#88c0d0;">eval</span><span>(self) </span><span style="color:#eceff4;">-&gt; </span><span style="color:#81a1c1;">Self::</span><span>Return {
|
|
3730
|
+
</span><span> </span><span style="color:#81a1c1;">self
|
|
3731
|
+
</span><span> }
|
|
3732
|
+
</span><span>}
|
|
3733
|
+
</span></code></pre>
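<p>To make the evaluation scheme above concrete, here is a trimmed, self-contained sketch. The trait and <code>eval</code> follow the post; the <code>add</code> and <code>mul</code> closures are illustrative stand-ins for the helper functions defined later.</p>

```rust
// Minimal reconstruction of the Node trait described above.
trait Node {
    type Return;
    fn eval(self) -> Self::Return;
}

// Primitive values evaluate to themselves.
impl Node for i32 {
    type Return = i32;
    fn eval(self) -> i32 {
        self
    }
}

// A two-argument "function call": evaluate both arguments recursively,
// then apply the function in position 0.
impl<F, A, B, R1, R2, R> Node for (F, A, B)
where
    F: Fn(R1, R2) -> R,
    A: Node<Return = R1>,
    B: Node<Return = R2>,
{
    type Return = R;
    fn eval(self) -> R {
        self.0(self.1.eval(), self.2.eval())
    }
}

fn main() {
    let add = |a: i32, b: i32| a + b;
    let mul = |a: i32, b: i32| a * b;
    // Lisp-style (add 1 (mul 2 3)):
    let result = (add, 1, (mul, 2, 3)).eval();
    assert_eq!(result, 7);
    println!("{result}");
}
```

<p>The inner tuple is evaluated first, so this prints <code>7</code>, mirroring how a Lisp reduces an expression inside-out.</p>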
|
|
3734
|
+
<p>To avoid the drudgery of repeating these 6 lines of code for every type I can
think of, I wrote a small macro to help me.</p>
|
|
3736
|
+
<pre data-lang="rust" style="background-color:#2e3440;color:#d8dee9;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#88c0d0;">macro_rules! </span><span>identity_node {
|
|
3737
|
+
</span><span> ( </span><span style="color:#81a1c1;">$</span><span>($t</span><span style="color:#eceff4;">:</span><span style="color:#81a1c1;">ty</span><span>),</span><span style="color:#81a1c1;">* </span><span>) </span><span style="color:#81a1c1;">=&gt; </span><span>{
|
|
3738
|
+
</span><span> </span><span style="color:#81a1c1;">$</span><span>(
|
|
3739
|
+
</span><span> </span><span style="color:#81a1c1;">impl </span><span>Node </span><span style="color:#81a1c1;">for </span><span>$t {
|
|
3740
|
+
</span><span> </span><span style="color:#81a1c1;">type Return = </span><span>$t</span><span style="color:#eceff4;">;
|
|
3741
|
+
</span><span> </span><span style="color:#81a1c1;">fn </span><span style="color:#88c0d0;">eval</span><span>(self) </span><span style="color:#eceff4;">-&gt; </span><span style="color:#81a1c1;">Self::</span><span>Return {
|
|
3742
|
+
</span><span> </span><span style="color:#81a1c1;">self
|
|
3743
|
+
</span><span> }
|
|
3744
|
+
</span><span> }
|
|
3745
|
+
</span><span> )</span><span style="color:#81a1c1;">*
|
|
3746
|
+
</span><span> }</span><span style="color:#eceff4;">;
|
|
3747
|
+
</span><span>}
|
|
3748
|
+
</span><span>
|
|
3749
|
+
</span><span>identity_node!(</span><span style="color:#81a1c1;">char</span><span style="color:#eceff4;">, </span><span style="color:#81a1c1;">i8</span><span style="color:#eceff4;">, </span><span style="color:#81a1c1;">i16</span><span style="color:#eceff4;">, </span><span style="color:#81a1c1;">i32</span><span style="color:#eceff4;">, </span><span style="color:#81a1c1;">i64</span><span style="color:#eceff4;">, </span><span style="color:#81a1c1;">i128</span><span style="color:#eceff4;">, </span><span style="color:#81a1c1;">u8</span><span style="color:#eceff4;">, </span><span style="color:#81a1c1;">u16</span><span style="color:#eceff4;">, </span><span style="color:#81a1c1;">u32</span><span style="color:#eceff4;">, </span><span style="color:#81a1c1;">u64</span><span style="color:#eceff4;">, </span><span style="color:#81a1c1;">u128</span><span style="color:#eceff4;">, </span><span style="color:#81a1c1;">f32</span><span style="color:#eceff4;">, </span><span style="color:#81a1c1;">f64</span><span style="color:#eceff4;">, </span><span style="color:#8fbcbb;">String</span><span>)</span><span style="color:#eceff4;">;
|
|
3750
|
+
</span></code></pre>
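<p>As a quick sanity check that the macro expands the way we expect, here is a standalone reproduction (the trait is re-declared, and the type list trimmed, so the snippet compiles on its own):</p>

```rust
trait Node {
    type Return;
    fn eval(self) -> Self::Return;
}

// The identity-impl macro from above: one Node impl per listed type.
macro_rules! identity_node {
    ( $($t:ty),* ) => {
        $(
            impl Node for $t {
                type Return = $t;
                fn eval(self) -> Self::Return {
                    self
                }
            }
        )*
    };
}

identity_node!(i32, f64, String);

fn main() {
    assert_eq!(42_i32.eval(), 42);
    assert_eq!(3.5_f64.eval(), 3.5);
    assert_eq!(String::from("hi").eval(), "hi");
}
```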
|
|
3751
|
+
<p>With all of this in place, it's possible to evaluate expressions. I made some helper
functions for arithmetic operations to avoid the ugliness of using <code>Mul::mul</code> and
<code>Add::add</code> directly:</p>
|
|
3754
|
+
<pre data-lang="rust" style="background-color:#2e3440;color:#d8dee9;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#81a1c1;">use </span><span>std</span><span style="color:#81a1c1;">::</span><span>ops</span><span style="color:#81a1c1;">::</span><span>{Add</span><span style="color:#eceff4;">,</span><span> Div</span><span style="color:#eceff4;">,</span><span> Mul</span><span style="color:#eceff4;">,</span><span> Sub}</span><span style="color:#eceff4;">;
|
|
3755
|
+
</span><span>
|
|
3756
|
+
</span><span style="color:#81a1c1;">pub fn </span><span style="color:#88c0d0;">add</span><span>&lt;A, B, C&gt;(a</span><span style="color:#eceff4;">:</span><span> A, b</span><span style="color:#eceff4;">:</span><span> B) </span><span style="color:#eceff4;">-&gt;</span><span> C
|
|
3757
|
+
</span><span style="color:#81a1c1;">where
|
|
3758
|
+
</span><span> A</span><span style="color:#eceff4;">: </span><span>Add&lt;B, Output = C&gt;,
|
|
3759
|
+
</span><span>{
|
|
3760
|
+
</span><span> a </span><span style="color:#81a1c1;">+</span><span> b
|
|
3761
|
+
</span><span>}
|
|
3762
|
+
</span><span>
|
|
3763
|
+
</span><span style="color:#81a1c1;">pub fn </span><span style="color:#88c0d0;">sub</span><span>&lt;A, B, C&gt;(a</span><span style="color:#eceff4;">:</span><span> A, b</span><span style="color:#eceff4;">:</span><span> B) </span><span style="color:#eceff4;">-&gt;</span><span> C
|
|
3764
|
+
</span><span style="color:#81a1c1;">where
|
|
3765
|
+
</span><span> A</span><span style="color:#eceff4;">: </span><span>Sub&lt;B, Output = C&gt;,
|
|
3766
|
+
</span><span>{
|
|
3767
|
+
</span><span> a </span><span style="color:#81a1c1;">-</span><span> b
|
|
3768
|
+
</span><span>}
|
|
3769
|
+
</span><span>
|
|
3770
|
+
</span><span style="color:#81a1c1;">pub fn </span><span style="color:#88c0d0;">mul</span><span>&lt;A, B, C&gt;(a</span><span style="color:#eceff4;">:</span><span> A, b</span><span style="color:#eceff4;">:</span><span> B) </span><span style="color:#eceff4;">-&gt;</span><span> C
|
|
3771
|
+
</span><span style="color:#81a1c1;">where
|
|
3772
|
+
</span><span> A</span><span style="color:#eceff4;">: </span><span>Mul&lt;B, Output = C&gt;,
|
|
3773
|
+
</span><span>{
|
|
3774
|
+
</span><span> a </span><span style="color:#81a1c1;">*</span><span> b
|
|
3775
|
+
</span><span>}
|
|
3776
|
+
</span><span>
|
|
3777
|
+
</span><span style="color:#81a1c1;">pub fn </span><span style="color:#88c0d0;">div</span><span>&lt;A, B, C&gt;(a</span><span style="color:#eceff4;">:</span><span> A, b</span><span style="color:#eceff4;">:</span><span> B) </span><span style="color:#eceff4;">-&gt;</span><span> C
|
|
3778
|
+
</span><span style="color:#81a1c1;">where
|
|
3779
|
+
</span><span> A</span><span style="color:#eceff4;">: </span><span>Div&lt;B, Output = C&gt;,
|
|
3780
|
+
</span><span>{
|
|
3781
|
+
</span><span> a </span><span style="color:#81a1c1;">/</span><span> b
|
|
3782
|
+
</span><span>}
|
|
3783
|
+
</span></code></pre>
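<p>These helpers are ordinary generic functions, so they can also be called directly. They defer entirely to the <code>std::ops</code> traits, which means anything with a suitable <code>Add</code> or <code>Mul</code> impl works, not just numbers:</p>

```rust
use std::ops::{Add, Mul};

// Same shape as the helpers above, exercised outside the tuple DSL.
pub fn add<A, B, C>(a: A, b: B) -> C
where
    A: Add<B, Output = C>,
{
    a + b
}

pub fn mul<A, B, C>(a: A, b: B) -> C
where
    A: Mul<B, Output = C>,
{
    a * b
}

fn main() {
    assert_eq!(add(1, 2), 3);
    assert_eq!(mul(2.5_f64, 4.0), 10.0);
    // String has Add<&str, Output = String>, so this concatenates.
    assert_eq!(add(String::from("foo"), "bar"), "foobar");
}
```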
|
|
3784
|
+
<h2 id="further-work"><a class="anchor" href="#further-work">#</a>
Further work</h2>
<p>The above allows the first example, calculating the radius of a circle, to
compile and run. But what about more useful examples like iterating over
things, mapping them, and reducing them?</p>
|
|
3789
|
+
<pre data-lang="rust" style="background-color:#2e3440;color:#d8dee9;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#81a1c1;">use </span><span>lisp</span><span style="color:#81a1c1;">::</span><span>prelude</span><span style="color:#81a1c1;">::*</span><span style="color:#eceff4;">;
|
|
3790
|
+
</span><span>
|
|
3791
|
+
</span><span style="color:#81a1c1;">fn </span><span style="color:#88c0d0;">add_one</span><span>(a</span><span style="color:#eceff4;">: </span><span style="color:#81a1c1;">i32</span><span>) </span><span style="color:#eceff4;">-&gt; </span><span style="color:#81a1c1;">i32 </span><span>{
|
|
3792
|
+
</span><span> a </span><span style="color:#81a1c1;">+ </span><span style="color:#b48ead;">1
|
|
3793
|
+
</span><span>}
|
|
3794
|
+
</span><span>
|
|
3795
|
+
</span><span style="color:#81a1c1;">fn </span><span style="color:#88c0d0;">main</span><span>() {
|
|
3796
|
+
</span><span> </span><span style="color:#81a1c1;">let</span><span> v </span><span style="color:#81a1c1;">= </span><span>vec![</span><span style="color:#b48ead;">1</span><span style="color:#eceff4;">, </span><span style="color:#b48ead;">2</span><span style="color:#eceff4;">, </span><span style="color:#b48ead;">3</span><span>]</span><span style="color:#eceff4;">;
|
|
3797
|
+
</span><span> </span><span style="color:#81a1c1;">let</span><span> res</span><span style="color:#eceff4;">: </span><span style="color:#8fbcbb;">Vec</span><span>&lt;</span><span style="color:#81a1c1;">i32</span><span>&gt; </span><span style="color:#81a1c1;">= </span><span style="color:#88c0d0;">eval</span><span>((to_vec</span><span style="color:#eceff4;">, </span><span>(reduce</span><span style="color:#eceff4;">, </span><span style="color:#b48ead;">0</span><span style="color:#eceff4;">,</span><span> add</span><span style="color:#eceff4;">, </span><span>(map</span><span style="color:#eceff4;">,</span><span> add_one</span><span style="color:#eceff4;">, </span><span>vec![</span><span style="color:#b48ead;">1</span><span style="color:#eceff4;">, </span><span style="color:#b48ead;">2</span><span style="color:#eceff4;">, </span><span style="color:#b48ead;">3</span><span>]))))</span><span style="color:#eceff4;">;
|
|
3798
|
+
</span><span> println!(</span><span style="color:#a3be8c;">&quot;</span><span style="color:#ebcb8b;">{:?}</span><span style="color:#a3be8c;">&quot;</span><span style="color:#eceff4;">,</span><span> res)</span><span style="color:#eceff4;">;
|
|
3799
|
+
</span><span>}
|
|
3800
|
+
</span></code></pre>
|
|
3801
|
+
<p>Wouldn't it be cool if this worked?</p>
<p>Alas, I couldn't figure out a way to do it. I was able to define the functions:</p>
|
|
3803
|
+
<pre data-lang="rust" style="background-color:#2e3440;color:#d8dee9;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#81a1c1;">use </span><span>std</span><span style="color:#81a1c1;">::</span><span>iter</span><span style="color:#81a1c1;">::</span><span>Map</span><span style="color:#eceff4;">;
|
|
3804
|
+
</span><span>
|
|
3805
|
+
</span><span style="color:#81a1c1;">pub fn </span><span style="color:#88c0d0;">map</span><span>&lt;I, E, F, R&gt;(f</span><span style="color:#eceff4;">:</span><span> F, i</span><span style="color:#eceff4;">:</span><span> I) </span><span style="color:#eceff4;">-&gt; </span><span>Map&lt;</span><span style="color:#81a1c1;">I::</span><span>IntoIter, F&gt;
|
|
3806
|
+
</span><span style="color:#81a1c1;">where
|
|
3807
|
+
</span><span> F</span><span style="color:#eceff4;">:</span><span> FnMut(E) -&gt; R,
|
|
3808
|
+
</span><span> I</span><span style="color:#eceff4;">: </span><span style="color:#8fbcbb;">IntoIterator</span><span>&lt;Item = E&gt;,
|
|
3809
|
+
</span><span>{
|
|
3810
|
+
</span><span> i</span><span style="color:#81a1c1;">.</span><span style="color:#88c0d0;">into_iter</span><span>()</span><span style="color:#81a1c1;">.</span><span style="color:#88c0d0;">map</span><span>(f)
|
|
3811
|
+
</span><span>}
|
|
3812
|
+
</span><span>
|
|
3813
|
+
</span><span style="color:#81a1c1;">pub fn </span><span style="color:#88c0d0;">reduce</span><span>&lt;I, E, F, R&gt;(init</span><span style="color:#eceff4;">:</span><span> R, f</span><span style="color:#eceff4;">:</span><span> F, i</span><span style="color:#eceff4;">:</span><span> I) </span><span style="color:#eceff4;">-&gt;</span><span> R
|
|
3814
|
+
</span><span style="color:#81a1c1;">where
|
|
3815
|
+
</span><span> F</span><span style="color:#eceff4;">:</span><span> FnMut(R, E) -&gt; R,
|
|
3816
|
+
</span><span> I</span><span style="color:#eceff4;">: </span><span style="color:#8fbcbb;">IntoIterator</span><span>&lt;Item = E&gt;,
|
|
3817
|
+
</span><span>{
|
|
3818
|
+
</span><span> i</span><span style="color:#81a1c1;">.</span><span style="color:#88c0d0;">into_iter</span><span>()</span><span style="color:#81a1c1;">.</span><span style="color:#88c0d0;">fold</span><span>(init</span><span style="color:#eceff4;">,</span><span> f)
|
|
3819
|
+
</span><span>}
|
|
3820
|
+
</span><span>
|
|
3821
|
+
</span><span style="color:#81a1c1;">pub fn </span><span style="color:#88c0d0;">to_vec</span><span>&lt;T, I&gt;(i</span><span style="color:#eceff4;">:</span><span> I) </span><span style="color:#eceff4;">-&gt; </span><span style="color:#8fbcbb;">Vec</span><span>&lt;T&gt;
|
|
3822
|
+
</span><span style="color:#81a1c1;">where
|
|
3823
|
+
</span><span> I</span><span style="color:#eceff4;">: </span><span style="color:#8fbcbb;">IntoIterator</span><span>&lt;Item = T&gt;,
|
|
3824
|
+
</span><span>{
|
|
3825
|
+
</span><span> i</span><span style="color:#81a1c1;">.</span><span style="color:#88c0d0;">into_iter</span><span>()</span><span style="color:#81a1c1;">.</span><span style="color:#88c0d0;">collect</span><span>()
|
|
3826
|
+
</span><span>}
|
|
3827
|
+
</span></code></pre>
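<p>Called directly rather than through tuples, these compose without trouble. Here's a sketch of roughly the un-Lisped version of the wished-for example, with an assumed <code>add_one</code> helper:</p>

```rust
use std::iter::Map;

// map and reduce as defined above.
pub fn map<I, E, F, R>(f: F, i: I) -> Map<I::IntoIter, F>
where
    F: FnMut(E) -> R,
    I: IntoIterator<Item = E>,
{
    i.into_iter().map(f)
}

pub fn reduce<I, E, F, R>(init: R, f: F, i: I) -> R
where
    F: FnMut(R, E) -> R,
    I: IntoIterator<Item = E>,
{
    i.into_iter().fold(init, f)
}

fn add_one(a: i32) -> i32 {
    a + 1
}

fn main() {
    // (reduce 0 + (map add_one [1 2 3])) => 2 + 3 + 4 = 9
    let sum = reduce(0, |acc, x| acc + x, map(add_one, vec![1, 2, 3]));
    assert_eq!(sum, 9);
    println!("{sum}");
}
```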
|
|
3828
|
+
<p>But I wasn't able to figure out how to make it possible to pass functions as
arguments. I hit compile-time errors every time I tried.</p>
<p>For example, this does not compile:</p>
|
|
3831
|
+
<pre data-lang="rust" style="background-color:#2e3440;color:#d8dee9;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#81a1c1;">impl</span><span>&lt;F, A, R&gt; Node </span><span style="color:#81a1c1;">for </span><span>F
|
|
3832
|
+
</span><span style="color:#81a1c1;">where
|
|
3833
|
+
</span><span> F</span><span style="color:#eceff4;">:</span><span> Fn(A) -&gt; R
|
|
3834
|
+
</span><span>{
|
|
3835
|
+
</span><span> </span><span style="color:#81a1c1;">type Return = Self</span><span style="color:#eceff4;">;
|
|
3836
|
+
</span><span> </span><span style="color:#81a1c1;">fn </span><span style="color:#88c0d0;">eval</span><span>(self) </span><span style="color:#eceff4;">-&gt; </span><span style="color:#81a1c1;">Self::</span><span>Return {
|
|
3837
|
+
</span><span> </span><span style="color:#81a1c1;">self
|
|
3838
|
+
</span><span> }
|
|
3839
|
+
</span><span>}
|
|
3840
|
+
</span></code></pre>
|
|
3841
|
+
<p>The error is:</p>
|
|
3842
|
+
<pre data-lang="txt" style="background-color:#2e3440;color:#d8dee9;" class="language-txt "><code class="language-txt" data-lang="txt"><span>error[E0207]: the type parameter `A` is not constrained by the impl trait, self type, or predicates
</span><span> --&gt; src\lisp\node.rs:98:9
</span><span> |
</span><span>98 | impl&lt;F, A, R&gt; Node for F
</span><span> | ^ unconstrained type parameter
</span><span>
</span><span>error[E0207]: the type parameter `R` is not constrained by the impl trait, self type, or predicates
</span><span> --&gt; src\lisp\node.rs:98:12
</span><span> |
</span><span>98 | impl&lt;F, A, R&gt; Node for F
</span><span> | ^ unconstrained type parameter
</span></code></pre>
|
|
3854
|
+
<p>This <em>does</em> compile:</p>
|
|
3855
|
+
<pre data-lang="rust" style="background-color:#2e3440;color:#d8dee9;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#81a1c1;">impl</span><span>&lt;A, R&gt; Node </span><span style="color:#81a1c1;">for </span><span>fn(A) -&gt; R
|
|
3856
|
+
</span><span>{
|
|
3857
|
+
</span><span> </span><span style="color:#81a1c1;">type Return = Self</span><span style="color:#eceff4;">;
|
|
3858
|
+
</span><span> </span><span style="color:#81a1c1;">fn </span><span style="color:#88c0d0;">eval</span><span>(self) </span><span style="color:#eceff4;">-&gt; </span><span style="color:#81a1c1;">Self::</span><span>Return {
|
|
3859
|
+
</span><span> </span><span style="color:#81a1c1;">self
|
|
3860
|
+
</span><span> }
|
|
3861
|
+
</span><span>}
|
|
3862
|
+
</span></code></pre>
|
|
3863
|
+
<p>But if I try and use it like so:</p>
|
|
3864
|
+
<pre data-lang="rust" style="background-color:#2e3440;color:#d8dee9;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#81a1c1;">use </span><span>lisp</span><span style="color:#81a1c1;">::</span><span>prelude</span><span style="color:#81a1c1;">::*</span><span style="color:#eceff4;">;
|
|
3865
|
+
</span><span>
|
|
3866
|
+
</span><span style="color:#81a1c1;">fn </span><span style="color:#88c0d0;">run</span><span>&lt;A, R&gt;(f</span><span style="color:#eceff4;">:</span><span> impl Fn(A) -&gt; R, a</span><span style="color:#eceff4;">:</span><span> A) </span><span style="color:#eceff4;">-&gt;</span><span> R {
|
|
3867
|
+
</span><span> </span><span style="color:#88c0d0;">f</span><span>(a)
|
|
3868
|
+
</span><span>}
|
|
3869
|
+
</span><span>
|
|
3870
|
+
</span><span style="color:#81a1c1;">fn </span><span style="color:#88c0d0;">hello</span><span>(name</span><span style="color:#eceff4;">:</span><span> String) {
|
|
3871
|
+
</span><span> println!(</span><span style="color:#a3be8c;">&quot;Hello, </span><span style="color:#ebcb8b;">{}</span><span style="color:#a3be8c;">!&quot;</span><span style="color:#eceff4;">,</span><span> name)</span><span style="color:#eceff4;">;
|
|
3872
|
+
</span><span>}
|
|
3873
|
+
</span><span>
|
|
3874
|
+
</span><span style="color:#81a1c1;">fn </span><span style="color:#88c0d0;">main</span><span>() {
|
|
3875
|
+
</span><span> </span><span style="color:#81a1c1;">let</span><span> name </span><span style="color:#81a1c1;">= </span><span style="color:#a3be8c;">&quot;Sam&quot;</span><span style="color:#81a1c1;">.</span><span style="color:#88c0d0;">to_owned</span><span>()</span><span style="color:#eceff4;">;
|
|
3876
|
+
</span><span> </span><span style="color:#88c0d0;">eval</span><span>((run</span><span style="color:#eceff4;">,</span><span> hello</span><span style="color:#eceff4;">,</span><span> name))</span><span style="color:#eceff4;">;
|
|
3877
|
+
</span><span>}
|
|
3878
|
+
</span></code></pre>
|
|
3879
|
+
<p>I get this error:</p>
|
|
3880
|
+
<pre data-lang="txt" style="background-color:#2e3440;color:#d8dee9;" class="language-txt "><code class="language-txt" data-lang="txt"><span>error[E0277]: the trait bound `fn(std::string::String) {hello}: lisp::prelude::Node` is not satisfied
</span><span> --&gt; examples\fn_as_arg.rs:13:10
</span><span> |
</span><span>13 | eval((run, hello, name));
</span><span> | ^^^^^^^^^^^^^^^^^^ the trait `lisp::prelude::Node` is not implemented for `fn(std::string::String) {hello}`
</span><span> |
</span><span> ::: C:\Users\hello\Documents\GitHub\rust-lisp-with-traits\src\lisp\mod.rs:11:8
</span><span> |
</span><span>11 | N: Node&lt;Return = R&gt;,
</span><span> | ---------------- required by this bound in `lisp::prelude::eval`
</span><span> |
</span><span> = note: required because of the requirements on the impl of `lisp::prelude::Node` for `(fn(_, std::string::String) -&gt; _ {run::&lt;std::string::String, _, _&gt;}, fn(std::string::String) {hello}, std::string::String)`
</span></code></pre>
|
|
3892
|
+
</span></code></pre>
|
|
3893
|
+
<p>If I implement the following:</p>
|
|
3894
|
+
<pre data-lang="rust" style="background-color:#2e3440;color:#d8dee9;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#81a1c1;">impl</span><span>&lt;A, R&gt; Node </span><span style="color:#81a1c1;">for </span><span>Box&lt;dyn </span><span style="color:#8fbcbb;">Fn</span><span>(A) </span><span style="color:#eceff4;">-&gt;</span><span> R&gt; {
|
|
3895
|
+
</span><span> </span><span style="color:#81a1c1;">type Return = Self</span><span style="color:#eceff4;">;
|
|
3896
|
+
</span><span> </span><span style="color:#81a1c1;">fn </span><span style="color:#88c0d0;">eval</span><span>(self) </span><span style="color:#eceff4;">-&gt; </span><span style="color:#81a1c1;">Self::</span><span>Return {
|
|
3897
|
+
</span><span> </span><span style="color:#81a1c1;">self
|
|
3898
|
+
</span><span> }
|
|
3899
|
+
</span><span>}
|
|
3900
|
+
</span><span>
|
|
3901
|
+
</span><span style="color:#81a1c1;">impl</span><span>&lt;T&gt; Node </span><span style="color:#81a1c1;">for </span><span>Box&lt;T&gt; {
|
|
3902
|
+
</span><span> </span><span style="color:#81a1c1;">type Return =</span><span> T</span><span style="color:#eceff4;">;
|
|
3903
|
+
</span><span> </span><span style="color:#81a1c1;">fn </span><span style="color:#88c0d0;">eval</span><span>(self) </span><span style="color:#eceff4;">-&gt; </span><span style="color:#81a1c1;">Self::</span><span>Return {
|
|
3904
|
+
</span><span> </span><span style="color:#81a1c1;">*self
|
|
3905
|
+
</span><span> }
|
|
3906
|
+
</span><span>}
|
|
3907
|
+
</span></code></pre>
|
|
3908
|
+
<p>I can get away with this:</p>
|
|
3909
|
+
<pre data-lang="rust" style="background-color:#2e3440;color:#d8dee9;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#88c0d0;">eval</span><span>((run</span><span style="color:#eceff4;">, </span><span style="color:#8fbcbb;">Box</span><span style="color:#81a1c1;">::</span><span>new(hello)</span><span style="color:#eceff4;">,</span><span> name))</span><span style="color:#eceff4;">;
|
|
3910
|
+
</span></code></pre>
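<p>For what it's worth, one avenue that may avoid the allocation is explicitly coercing the function item to a plain <code>fn</code> pointer with <code>as</code>, so that the <code>fn(A) -> R</code> impl shown earlier applies. This is a sketch against a minimal reconstruction of the trait, not the post's actual crate:</p>

```rust
trait Node {
    type Return;
    fn eval(self) -> Self::Return;
}

impl Node for String {
    type Return = String;
    fn eval(self) -> String {
        self
    }
}

// The fn-pointer impl that compiles, as shown above.
impl<A, R> Node for fn(A) -> R {
    type Return = Self;
    fn eval(self) -> Self {
        self
    }
}

// Two-argument call tuples.
impl<F, A1, A2, R1, R2, R> Node for (F, A1, A2)
where
    F: Fn(R1, R2) -> R,
    A1: Node<Return = R1>,
    A2: Node<Return = R2>,
{
    type Return = R;
    fn eval(self) -> R {
        self.0(self.1.eval(), self.2.eval())
    }
}

fn run<A, R>(f: impl Fn(A) -> R, a: A) -> R {
    f(a)
}

fn greet(name: String) -> String {
    format!("Hello, {}!", name)
}

fn main() {
    let name = "Sam".to_owned();
    // `as` turns the zero-sized function item into a real fn pointer,
    // which is the type the Node impl above covers.
    let msg = (run, greet as fn(String) -> String, name).eval();
    assert_eq!(msg, "Hello, Sam!");
    println!("{msg}");
}
```

<p>I haven't run this against the post's crate, so treat it as a starting point rather than a confirmed fix.</p>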
|
|
3911
|
+
<p>And it works as expected, but feels a bit meh. If you've read this far, and
know how to get this to work without needing to box the function, I'd love to
hear from you. I'm <a href="https://twitter.com/samwhoo">@samwhoo</a> on Twitter, and the
full code is here: <a href="https://github.com/samwho/rust-lisp-with-traits">https://github.com/samwho/rust-lisp-with-traits</a>.</p>
|
|
3915
|
+
</content>
|
|
3916
|
+
|
|
3917
|
+
</entry>
|
|
3918
|
+
|
|
3919
|
+
|
|
3920
|
+
<entry xml:lang="en">
<title>Dude, where's my main?</title>
<published>2020-09-12T00:00:00+00:00</published>
<updated>2020-09-12T00:00:00+00:00</updated>
<author>
<name>Unknown</name>
</author>
<link rel="alternate" href="https://samwho.dev/blog/dude-wheres-my-main/" type="text/html"/>
<id>https://samwho.dev/blog/dude-wheres-my-main/</id>
|
|
3929
|
+
|
|
3930
|
+
<summary type="html"><p>So I was writing a debugger. You know the sort of thing: breakpoints,
stepping, checking the value of variables. It was going wonderfully until I
tried to debug my debugger <em>with</em> my debugger. <code>main</code> was just... gone. It
would run, but trying to set a breakpoint on it crashed the program. It took
me weeks to figure out why. This is that story.</p>
</summary>
|
|
3936
|
+
|
|
3937
|
+
</entry>
|
|
3938
|
+
|
|
3939
|
+
|
|
3940
|
+
<entry xml:lang="en">
<title>Having a Baby</title>
<published>2020-04-03T00:00:00+00:00</published>
<updated>2020-04-03T00:00:00+00:00</updated>
<author>
<name>Unknown</name>
</author>
<link rel="alternate" href="https://samwho.dev/blog/having-a-baby/" type="text/html"/>
<id>https://samwho.dev/blog/having-a-baby/</id>
|
|
3949
|
+
|
|
3950
|
+
<content type="html"><p>During the pregnancy of our first child, I kept a journal. I don’t keep journals. I don’t feel like I have much to say in them. This was different. The whole experience was new, and there was a lot to learn.</p>
|
|
3951
|
+
<p>This post is a cleaning up and stitching together of that journal. It has ups and downs. It has useful tips and useless anecdotes. I’ve tried to keep as much in as I can, to capture the whole experience. A lot is missing.</p>
|
|
3952
|
+
<h2 id="beginning"><a class="anchor" href="#beginning">#</a>
|
|
3953
|
+
Beginning</h2>
|
|
3954
|
+
<p>It became real on October 19th, 2018. We paid for an 8-week scan after having multiple positive pregnancy tests. Normally the first scan is at 12-weeks, but we were impatient.</p>
|
|
3955
|
+
<p><img src="/images/8-weeks.jpg" alt="8-week scan, baby is barely visible in a sac of amniotic fluid" /></p>
|
|
3956
|
+
<p>We called him “jellybean” for the longest time, but Sophie had decided on the name Max years prior. I liked it, too.</p>
|
|
3957
|
+
<h2 id="gestation"><a class="anchor" href="#gestation">#</a>
|
|
3958
|
+
Gestation</h2>
|
|
3959
|
+
<p>This is the fancy word they use instead of “pregnancy.” We started noticing a bump at around 10-11 weeks, and by the 12-week scan we had a recognisable little human. It surprised me how much like a real person they look at such a young age.</p>
|
|
3960
|
+
<p><img src="/images/12-weeks.jpg" alt="12 week scan, baby is much more visible and has an obvious head and body" /></p>
|
|
3961
|
+
<p>The advice is to wait until 12 weeks before you tell people you’re pregnant. We didn’t, I don’t think anyone does. The idea is that the further through the pregnancy you get, the higher the odds you’ll make it all the way. The odds significantly improve at the end of the first trimester.</p>
|
|
3962
|
+
<p>At 20 weeks you go for another scan and it’s at this one you can find out the sex of the baby. Not always, sometimes the baby refuses to co-operate, but for us it was obvious he was a boy. We had asked about the sex at 12 weeks. We’re first-time parents, we didn’t know. The sonographer sighed at us. “No.”</p>
|
|
3963
|
+
<p>Toward the end of the second trimester we started to prepare all the stuff that we would need. You’re not supposed to buy car seats second hand, so we bought one new. We also caught a Mothercare closing down sale and got a buggy half-price. I had a lot of fun crashing it in to things to horrify onlookers.</p>
|
|
3964
|
+
<p>Things I would recommend expecting parents buy: a waist-height changing table (your back will thank you), a yoga ball (bouncing makes babies sleepy), and a Perfect Prep machine if you’re bottle feeding. Things I would recommend you don’t buy: baby clothes. Everyone else will buy those for you.</p>
|
|
3965
|
+
<p>At the start of the third trimester we learnt that Sophie had gestational diabetes. It’s a common complication, affecting 1 in 20 pregnancies. Sophie had to stab herself in the finger to take her blood sugar levels 4 times a day until the day she gave birth. It didn’t look any fun, and at first she struggled to draw enough blood for the tests. This meant she sometimes had to do it multiple times per reading. I couldn’t help, either, as she had to get used to doing it herself because I would be at work half of the time she had to do them.</p>
|
|
3966
|
+
<p>The worry with the diabetes is that because you’re not breaking down the sugar you take in, it means there’s more sugar going to your baby. This makes baby grow faster, and if it grows too fast it won’t be able to get out safely. Fortunately we had no worries there, as Max was consistently under the average weight until the day he was born.</p>
|
|
3967
|
+
<p>Gestational diabetes meant we were now classed as a “high risk” pregnancy, and were assigned a consultant. Consultants are the top of the food chain, they are who the buck stops with when it comes to patient care. When we met our consultant, she was with a colleague. They had obviously worked together a long time, they were able to finish each other’s sentences and made a lot of jokes. I liked them a lot.</p>
|
|
3968
|
+
<p>There was another minor complication in the timing of the pregnancy, which was that 7 months in to it we were planning to get married. The wedding planning had begun some time before we knew we were pregnant. The only thing it ended up changing was the dress.</p>
|
|
3969
|
+
<p>The third trimester is also when we did some parenting classes. In the UK we have the “National Children’s Trust,” or NCT. They set up classes that last a couple of days where parents who are all due around the same time learn about how to look after their baby.</p>
|
|
3970
|
+
<p>I remember almost nothing from the NCT classes. Their primary value, given we paid around £200 to attend, was the group of parents we instantly became friends with. It is literally a scheme where you pay for new friends, and it is worth it. We all still talk every day.</p>
|
|
3971
|
+
<p>Toward the end of the pregnancy Sophie started wearing my t-shirts. I didn’t mind, except they fit her really well. She also became very tired, and spent most evenings asleep on the sofa. The calm before the storm, in a lot of ways.</p>
|
|
3972
|
+
<p>Max’s due date came and went. This is normally not a worry, babies are late all the time, but the gestational diabetes makes getting him out a little more pressing.</p>
|
|
3973
|
+
<p>There are a bunch of things you can do to try to bring on labour. A lot of them are myths but some of them are legit. Having sex has been shown to work. If it doesn’t, you can go for a “sweep.” This is as icky as it sounds. A midwife will sweep their finger around your cervix, separating parts of the amniotic sac from the cervix, releasing hormones that may start labour. Or may not.</p>
|
|
3974
|
+
<p>If all of this fails, a hospital-based induction becomes necessary. On the 5th of June this is where we found ourselves.</p>
|
|
3975
|
+
<h2 id="induction"><a class="anchor" href="#induction">#</a>
|
|
3976
|
+
Induction</h2>
|
|
3977
|
+
<p>One of the things I was worrying about as the partner in this process was getting to hospital when labour began. I don’t drive. Fortunately we had friends nearby who had offered to be our chauffeurs when the time came. We had a WhatsApp group called “Labour A-Team” and everything.</p>
|
|
3978
|
+
<p>It was a relief when we got a phone call at lunch time on the 5th of June asking if we could make our way to the hospital. No panicking, no rushing. Our Labour A-Team still drove us, as the cost of parking at the hospital is astronomical. Packing our bags in to their car almost made it feel like a holiday.</p>
|
|
3979
|
+
<p>The first thing we tried in hospital was a “pessary.” This is a strip of what felt like sandpaper that is inserted in to the vagina. Its job is to release synthetic oxytocin, which is a hormone that is known to help bring on labour (among many other things, you’ll hear a lot about oxytocin if you’re having a baby.)</p>
|
|
3980
|
+
<p>I didn’t know this beforehand, but induction is a slow process. In my mind it sounded like an immediate intervention. You induce labour and wham!, things get going. Not so. There’s also the fun fact that hospitals can only deliver so many babies at a time. Our hospital had two surgical theatres in the maternity ward, and one must always be kept free for emergencies. If they're both occupied, no one can be induced. This means an emergency C-section can block all other deliveries.</p>
<p>Maternity wards are also loud. Midwives visit each bay every 4 hours to take observations, or “obs.” I thought this was an odd choice of abbreviation, given it’s also the abbreviation for “obstetrics.” Obs includes blood pressure, pulse, and a measure of baby’s heartbeat. There are also people in active labour who are yet to be transferred to a “delivery suite,” a private room where you can deliver your baby in relative comfort with your own personal midwife. People in active labour are not known for their serenity.</p>
<p>As there’s not a lot to do on the maternity ward, and there are only so many crosswords I can bring myself to do per day, you can’t help but eavesdrop. Staff talk quite openly, and some of the things we overheard were scary.</p>
<p>The maternity ward we were on was operating with 6 fewer staff than usual. Most of these were, ironically, people on maternity leave. The ward was closed to new arrivals as there weren’t enough staff to safely look after them, and they were directing people to other hospitals quite far away.</p>
<p>We also overheard that both surgical theatres were occupied multiple times, meaning any emergency that did come up was in a queue.</p>
<p>June 7th, 48 hours after we arrived, a midwife checks on us. No real progress. It’s time to move on to the next stage of induction. This involves manually breaking the amniotic sac, and has a high chance of kick-starting labour. The problem is that it has to be done in a delivery suite, and there are none available. As soon as one becomes available they’ll let us know.</p>
<p>What if we go in to labour before that? Best not to think about it.</p>
<p>June 8th, 72 hours after arrival. I want to talk about the chairs. There were 6 bays in our maternity ward, each with a hospital bed, a chair, and a curtain that can be drawn all the way around the bay. No two chairs were the same, and no single chair was comfortable. When we arrived 3 days prior, there was only one other person in the ward so I had a chance to sample my options. I went with a rock-solid reclining chair, thinking that the reclining functionality would outweigh the hardness when it came time to sleep.</p>
<p>I was wrong. The chair reclined, but reluctantly. You had to be pressing back hard on it for it to stay in a reclined position, and lying on the chair was an exercise in balance and stillness. Twice I woke up to being flung forward because I had moved in my sleep.</p>
<p>At around 5pm on June 8th, 77 hours after arriving, a delivery suite became available and we were moved down to it. What a difference! Spacious, private, personal bathroom, soft mats to lie on. I felt like royalty.</p>
<p>A little while before we moved down to the delivery suite, Sophie’s contractions started getting painful. She had been contracting most of the time we had been in hospital, but until now they had only been a minor annoyance. Good timing.</p>
<h2 id="labour"><a class="anchor" href="#labour">#</a>
Labour</h2>
<p>Things started happening faster when we moved to the delivery suite. We moved at around 5pm and the plan was to break Sophie’s waters around midnight. If this didn’t start labour within 2 hours, she would be put on an oxytocin drip. After days of nothing, this was welcome but nerve-wracking.</p>
<p>We had our own midwife now whose job it was to guide us through this process. She was patient but firm. She explained the various things in the room and talked us through what she was doing as she did it. When nothing needed doing we talked casually about how she got in to midwifery; it was nice.</p>
<p>Sophie was given a meal which was not optional. She was told she would need the energy, and they needed to take her final blood sugar reading before going in to labour.</p>
<p>At 7:40pm someone came around to put in a cannula. This is a small flexible tube that’s injected in to the hand so that if any drugs need administering, they can be, and quickly. Sophie has awkward veins, and it took three attempts to get it in right. Her wrists were swollen afterwards.</p>
<p>At 9:20pm contractions were much more frequent and much more painful. This was good and bad news. Good: things are progressing. Bad: things hurt more now. I do what I can, holding her and offering encouraging words, but can’t help but feel useless.</p>
<p>1:05am, June 9th. In the 4 hours that have passed the contractions have become exponentially more painful. I’d been keeping an eye on the readings, which I had quizzed every midwife about so far. There’s a line that measures contraction intensity, and it goes from 0 to 127. When it hits 127, the midwife presses a button that seems to make it go back to 0. I’m assuming this is like when you put a bowl on some scales and press a button to reset the scales to 0 with the bowl on them. This button was getting pressed a lot.</p>
<p>The progression of painkillers we were offered was: gas and air, ibuprofen, codeine, epidural. Gas and air made Sophie projectile vomit, and ibuprofen and codeine did nothing for the pain. Epidurals are no joke. Before having one I had to read and sign a form detailing the risks, as Sophie was in too much pain to read anything. It could have said anything, I think I would have signed it regardless.</p>
<p>After signing the form you need to wait for an anaesthetist to become available. We got lucky: one had just come out of theatre and was able to squeeze us in before his next thing. He quietly and professionally pushed a needle in to Sophie’s spine, and within minutes the pain was gone. The needle part was worrying, as every contraction caused Sophie to convulse. I didn’t ask what would happen if she had a contraction while the needle was in there.</p>
<p>At 2:15am I found myself lying on the floor. In all of the excitement I pulled a muscle in my back. Now she had the epidural in, Sophie was back to her normal self. She had a button she was allowed to press if she felt the pain coming back, but other than that she said all she could feel was the pressure of each contraction. None of the pain. She began extracting our midwife’s life story. I took a picture of the ceiling.</p>
<p><img src="/images/ceiling.jpeg" alt="A picture of some ceiling tiles" /></p>
<p>At 3:15am our midwife checked how dilated we were. This refers to the diameter of the cervix opening, which expands during labour to allow the baby to come out. 10 centimetres! This is the ideal size, and it happened much faster than anyone was expecting. The end is in sight. The plan was to wait for an hour for the body to do its thang and push the baby on its own a bit before starting pushing.</p>
<h2 id="delivery"><a class="anchor" href="#delivery">#</a>
Delivery</h2>
<p>An hour passed, some progress was made, and it was time to start pushing. One of the downsides to having an epidural is that it makes you numb from the waist down. It’s difficult to push when you’re numb from the waist down. As each contraction was coming, the midwife told Sophie to push as if she was pushing a poo out. Sophie said she tried but it was really hard to tell if it made any difference.</p>
<p>In the downtime between pushes, I couldn’t help but look at the monitors. The line tracking Max’s heart rate had become erratic. I could tell the midwife was keeping an eye on it, too. She must have caught me looking at it because she explained that it’s not worrying yet, but does create a sense of urgency.</p>
<p>When we were first brought down to the delivery suite, one of the parts of the room that was explained to us was a big red button above the bed. This was for the midwife to press in an emergency. When pressed, a team of people would very quickly pour in to the room. It’s important to remain calm.</p>
<p>To my surprise, when she pressed the button I did remain calm. This was due, in part, to noticing a friendly face: our consultant. She had popped in about 6 hours earlier to say hi, and that if all went well it would be the last time we saw her. It was good fortune she was working that night, as we later learned she was covering a shift for someone else.</p>
<p>Her assessment was that Max had his umbilical cord wrapped around him in such a way that no amount of pushing was going to get him out. To make matters worse, he had pooped. This can get in to the lungs and make it hard for him to breathe, so getting him out was now an extremely high priority.</p>
<p>The first port of call was a bit more pushing under the supervision of our consultant. When that failed, it was time to try an “instrumental delivery.” This is where you grab the baby with a pair of tongs (“forceps”) and pull it out. I watched as our consultant clipped two halves of a pair of forceps together, yawning as she did.</p>
<p>By god do they pull, too. Our consultant was a fairly small woman, but she had one foot on the bed and leant back on those forceps with all her weight. The baby didn’t budge.</p>
<p>Our consultant wasn’t fucking around, either. Before she started with the forceps she explained what she was about to do. She looked Sophie directly in the eye and said: “this is extremely important: when I say push you must push, when I say stop you must stop immediately. Do you understand?”</p>
<p>As well as our consultant, there were around 8 other people all doing things. I’ve no idea what. Sophie was vomiting again so it was my job to fetch and hold cardboard bowls. I focused on it intently.</p>
<p>Despite all of this the baby wouldn’t come out. In between pulls our consultant did an episiotomy. This is where a cut is made at the bottom of the vagina to make the opening larger. The cut was extended twice as the pulling got more urgent.</p>
<p>After what felt like forever, at 6:23am, the baby just came out. All in one go. Cord well and truly wrapped around his neck, to which our consultant pointed and said: “ha, knew it.” Baby was put on mummy for its first feed as the medical team worked on sewing mummy back up.</p>
<p>While this was happening, our consultant quipped: “I have good news. Your baby has a perfectly functioning bottom. He is currently pooing on you.”</p>
<p>Straight after baby came out I was handed a pair of strange circular scissors. I wasn’t expecting it, and must have looked confused. “Did you think I was going to take this moment away from you?” our consultant smirked. All the scenes in films where people snip through the umbilical cord in one clean motion are a filthy lie. It’s extremely tough, and takes a few goes.</p>
<p>After a few minutes of relief, 2 things went wrong in quick succession. Max started struggling to breathe, and Sophie started haemorrhaging.</p>
<p>Max was taken and placed in an incubator on the other side of the room. A paediatrician started examining him. I felt he was in good hands, and focused on my wife.</p>
<p>Sophie had turned a worrying shade of white. I asked what was happening. No-one answered, or no-one heard me. An alarm started going off, with a robotic voice repeating the phrase “major obstetric haemorrhage” over and over. There were now a lot more people in the room.</p>
<p>Sophie asked me how Max was. I didn’t know, and didn't want to get in anyone's way. Someone suggested I go and see him, though, so I did. The paediatrician introduced himself to me as Si, short for Simon, I think, and asked me if I was worried. I said yes, confused by such an obviously silly question.</p>
<p>He explained to me that the baby had breathed in some of its own poo while in the womb. This isn’t usually a problem, and will clear on its own shortly after birth, but he was keeping an eye on him to be on the safe side. This was reassuring, but it was at this point I made a critical mistake.</p>
<p>I looked back at Sophie.</p>
<p>Sophie herself was fine, she was asking me how the baby was, but her blood was everywhere. It was visibly pouring out of her and pooling around the bed.</p>
<p>Our consultant was pressing on part of Sophie’s stomach with one hand and trying to stop the blood flow with the other. A very tall man was inserting another cannula in to her arm, a bag of blood ready to go. Someone else was injecting her with something to help her blood clot. Someone else was cleaning up vomit. Various other people were standing around attentively, holding things that might be needed.</p>
<p>People kept asking me if I consent to this, do I agree to that, explaining what each thing was before it was used. I wish I could have signed a form up front stating I don’t know fucking anything about medical science and I trust the hospital staff entirely. Each second wasted getting my consent could have been a second Sophie needed.</p>
<p>I started thinking about how impossible it would be to raise a baby on my own. I started thinking about what I would do if I had to go back home with neither a wife nor a child. I couldn’t breathe.</p>
<p>Despite an overwhelming desire to collapse in to a crying mess in the hallway outside, I kept calm. I breathed, and a few minutes later our consultant announced that the blood loss was under control.</p>
<h2 id="recovery"><a class="anchor" href="#recovery">#</a>
Recovery</h2>
<p><img src="/images/born.jpeg" alt="Baby Max, a few minutes after birth, in an incubator wrapped in a towel and wearing a fluffy hat. He&#39;s looking calmly toward the camera." /></p>
<p>The problem with giving birth at 6:23am after spending three days in hospital barely getting any sleep is that you’re tired, but you have to start being a parent straight away. I’d just pulled an all-nighter, Sophie was asleep, the medical team had dispersed and it was just me, the baby, and a midwife.</p>
<p>The first day I flew mostly solo. We had to wake Sophie up every hour or two to feed Max, but other than that it was him and me. Babies don’t need a whole lot of attention at the beginning. You need to change them when they poop, feed them when they’re hungry, cuddle them a bit, but other than that they sleep. The only problem is that they won’t go longer than an hour without needing <em>something</em>.</p>
<p>What they tell you about the first poops is true. It is an awful substance. It sticks to skin in much the same way treacle does, but isn’t delicious. The upside, though, is that all of the subsequent poops are trivial in comparison.</p>
<p>The midwife stood by me as I changed this first nappy. She wasn’t patronising or bossy or judgmental. She talked me through the steps, answered my questions, and smiled at me when it was done. Her presence was reassuring, and she made me feel well looked after.</p>
<p>Monday, 10th of June. 5 days after arriving at hospital to be induced. Sophie’s blood iron levels had gotten very low, so a blood transfusion was needed before we could go home. We’d been moved out of the delivery suite and in to the post-delivery ward.</p>
<p>I thought the pre-delivery ward was loud, but this was a whole other level. Rarely did 10 consecutive minutes pass without a baby crying. On top of that, it was kept at 25C at all times to keep the babies warm. 25C is an obscene temperature, and I spent the whole time on this ward damp.</p>
<p>The blood transfusion needed yet another cannula. This brought the total up to three simultaneous cannulas. Sophie’s arms looked like two strange, bruise-based tattoo sleeves.</p>
<p>By the evening the blood transfusion was done, but we had to stay the night to be observed. I had been sleeping in a chair or on the floor for 5 days, so we thought it would be a good idea if I went home and got a proper night of sleep.</p>
<p>After the delivery went south I had phoned my mum to ask if she would come stay with us a little while to help. She arrived that evening, took me home, we got a McDonald’s, watched some half-arsed TV and then I showered and went to bed. It was a feeling unlike any other, and I’ll never take beds for granted ever again.</p>
<p>I woke up on Tuesday the 11th of June at 10am. I had 26 notifications and 3 missed calls. I had been sleeping through my alarm for 2 hours.</p>
<p>We didn’t know this, but the second night is well-known for being a “cluster-feeding” night. This is when babies demand regular feeds and sleep in extremely short bursts. It was also the night Max found his voice. I got up, packed, wolfed down some breakfast and raced to the hospital.</p>
<p>When I got there, Sophie was a wreck. She hadn’t slept all night and a doctor had just visited to tell us we had to stay another night because Max had a heart murmur. These are little holes in your heart that everyone is born with, but they close shortly after birth. Max’s hadn’t, and they wanted to keep an eye on it.</p>
<p>The night of Tuesday 11th was really hard. Max cried a lot, cluster-fed, and wouldn't sleep. Twice we woke up to the sound of screaming, a midwife gently shaking us asking us to feed our baby. We felt like awful parents, but the depth of our tiredness was endless.</p>
<p>Wednesday the 12th of June. A full week after arriving at hospital. By now I was sharing the bed with Sophie and I dared them to tell me I can’t. Max had calmed down on the cluster feeding and was sleeping a bit more regularly. We were still playing a waiting game that felt like it had no end, so we did a load of crosswords and eavesdropped on the midwives all day, hoping to overhear something about Max.</p>
<p>Some time in the middle of the night on the 13th of June, a doctor visited us from the Neonatal Intensive Care Unit (NICU). She listened to Max’s heart murmur and told us it was small. She could tell because the sound of the blood pumping through it was quite loud, likening it to pushing water through a thin tube. She said she’d need to do an ultrasound on it to be 100% certain it was safe for us to leave, but she didn’t know when one would become available.</p>
<p>Throughout Thursday we really started losing it. The constant noise, lack of sleep, and high temperature were clawing at our sanity. We asked every midwife that came close to us to check on the status of our ultrasound. We asked about discharging ourselves, which they recommended against. We asked if we could schedule the appointment for some time in the future, but they said no.</p>
<p>This nagging must have caused some kind of a stir, though, because at 5:57pm on Thursday the 13th of June we were discharged, with an appointment for an ultrasound the following day. We had spent 198 hours and 20 minutes in hospital. Getting our boy home was the biggest relief I’ve ever had, even if Max wasn’t sure about it.</p>
<p><img src="/images/miffed.jpg" alt="Baby Max looking very annoyed at the camera" /></p>
<h2 id="the-days-to-come"><a class="anchor" href="#the-days-to-come">#</a>
The days to come</h2>
<p>I'd like to write about the first months of parenthood, but this feels like a good point to wrap this post up. It's longer than I planned it to be, and I imagine if you've got this far you need a break! I find it difficult to write about the difficult things in life, and this was one of the most difficult experiences I've ever had. If you read to the end, from the bottom of my heart: thank you.</p>
</content>
</entry>
<entry xml:lang="en">
<title>A Logical Way to Split Long Lines</title>
<published>2019-05-27T00:00:00+00:00</published>
<updated>2019-05-27T00:00:00+00:00</updated>
<author>
<name>Unknown</name>
</author>
<link rel="alternate" href="https://samwho.dev/blog/a-logical-way-to-split-long-lines/" type="text/html"/>
<id>https://samwho.dev/blog/a-logical-way-to-split-long-lines/</id>
<content type="html"><p>Splitting long lines is something we do every day as programmers, but rarely do I hear discussion about how best to do it. Considering our industry-wide obsession with “best practices,” line breaks have managed to stay relatively free from scrutiny.</p>
<p>A few years ago, I learned a method for splitting lines that is logical, language-independent and, most importantly, produces good results.</p>
<h2 id="the-rectangle-method"><a class="anchor" href="#the-rectangle-method">#</a>
The Rectangle Method</h2>
<p>The core principle of this method is to always make sure you can draw a rectangle around an element and all of its children, without having to overlap with any unrelated elements. The outcome is that related things stay closer together, and our eyes rarely have to dart between distant locations.</p>
<p>Confused? So was I. Let’s walk through an example.</p>
<pre data-lang="java" style="background-color:#2e3440;color:#d8dee9;" class="language-java "><code class="language-java" data-lang="java"><span style="color:#8fbcbb;">JavacParser</span><span> parser </span><span style="color:#81a1c1;">=</span><span> parserFactory</span><span style="color:#eceff4;">.</span><span style="color:#88c0d0;">newParser</span><span>(javaInput</span><span style="color:#eceff4;">.</span><span style="color:#88c0d0;">getText</span><span>()</span><span style="color:#eceff4;">, </span><span style="color:#616e88;">/*keepDocComments=*/ </span><span style="color:#81a1c1;">true</span><span style="color:#eceff4;">, </span><span style="color:#616e88;">/*keepEndPos=*/ </span><span style="color:#81a1c1;">true</span><span style="color:#eceff4;">, </span><span style="color:#616e88;">/*keepLineMap=*/ </span><span style="color:#81a1c1;">true</span><span>)</span><span style="color:#eceff4;">;
</span></code></pre>
<p>This line is 139 characters long and was taken from the source code of <a href="https://github.com/google/google-java-format/blob/64242e17f5478eb07a2ca7e409382271765f2524/core/src/main/java/com/google/googlejavaformat/java/Formatter.java#L140-L145">google-java-format</a>. It is composed of a number of elements:</p>
<ol>
<li>A variable declaration. This encompasses the entire line.</li>
<li>The variable declaration splits in to two halves: the type and name on the left hand side, and the expression on the right hand side.</li>
<li>The expression is a single method call, which could be split in to the receiver, method name, and its arguments.</li>
<li>Lastly, each method argument is its own element. Comments included.</li>
</ol>
<p>It’s easy to draw a rectangle around this; it’s just one line. But if we say that our maximum allowed line length is 80, this line is a bit too long and needs to be split.</p>
<h3 id="deciding-where-to-make-the-first-split"><a class="anchor" href="#deciding-where-to-make-the-first-split">#</a>
Deciding where to make the first split</h3>
<p>What do we mean when we say “an element and all of its children”? Programming language syntax can usually be represented as a tree. Our example would look something like this:</p>
<pre data-lang="txt" style="background-color:#2e3440;color:#d8dee9;" class="language-txt "><code class="language-txt" data-lang="txt"><span> Variable declaration
</span><span> / \
</span><span> Type + name Expression
</span><span> / \
</span><span> Receiver Method call
</span><span> / \
</span><span> Name Arguments
</span></code></pre>
<p>We want to split such that you can draw a rectangle around each subtree, without touching any other subtree.</p>
<p>A natural place to make this first split would be just after the <code>=</code>:</p>
<pre data-lang="java" style="background-color:#2e3440;color:#d8dee9;" class="language-java "><code class="language-java" data-lang="java"><span style="color:#8fbcbb;">JavacParser</span><span> parser </span><span style="color:#81a1c1;">=
</span><span> parserFactory</span><span style="color:#eceff4;">.</span><span style="color:#88c0d0;">newParser</span><span>(javaInput</span><span style="color:#eceff4;">.</span><span style="color:#88c0d0;">getText</span><span>()</span><span style="color:#eceff4;">, </span><span style="color:#616e88;">/*keepDocComments=*/ </span><span style="color:#81a1c1;">true</span><span style="color:#eceff4;">, </span><span style="color:#616e88;">/*keepEndPos=*/ </span><span style="color:#81a1c1;">true</span><span style="color:#eceff4;">, </span><span style="color:#616e88;">/*keepLineMap=*/ </span><span style="color:#81a1c1;">true</span><span>)</span><span style="color:#eceff4;">;
</span></code></pre>
<p>This passes the rectangle test because we can draw a rectangle around every element and its children without overlapping with unrelated elements:</p>
<pre data-lang="txt" style="background-color:#2e3440;color:#d8dee9;" class="language-txt "><code class="language-txt" data-lang="txt"><span>┌──────────────────────────────────────────────────────┐
</span><span>│┌──────────────────────┐ │
</span><span>││ JavacParser parser = │ │
</span><span>│└─┬────────────────────┴─────────────────────────────┐│
</span><span>│ │ parserFactory.newParser(javaInput.getText(), ... ││
</span><span>│ └──────────────────────────────────────────────────┘│
</span><span>└──────────────────────────────────────────────────────┘
</span></code></pre>
<p>One big rectangle around the whole thing, and two smaller rectangles, one around the declaration and one around the assignment. Note that no rectangle overlaps any other rectangle.</p>
<p>It’s good progress, but the second line is still 118 characters long so needs splitting again.</p>
<p>Before doing this, I want to show how this would look if we split a little differently:</p>
<pre data-lang="java" style="background-color:#2e3440;color:#d8dee9;" class="language-java "><code class="language-java" data-lang="java"><span style="color:#8fbcbb;">JavacParser</span><span> parser </span><span style="color:#81a1c1;">=</span><span> parserFactory</span><span style="color:#eceff4;">.</span><span style="color:#88c0d0;">newParser</span><span>(
</span><span> javaInput</span><span style="color:#eceff4;">.</span><span style="color:#88c0d0;">getText</span><span>()</span><span style="color:#eceff4;">, </span><span style="color:#616e88;">/*keepDocComments=*/ </span><span style="color:#81a1c1;">true</span><span style="color:#eceff4;">, </span><span style="color:#616e88;">/*keepEndPos=*/ </span><span style="color:#81a1c1;">true</span><span style="color:#eceff4;">, </span><span style="color:#616e88;">/*keepLineMap=*/ </span><span style="color:#81a1c1;">true</span><span>)</span><span style="color:#eceff4;">;
</span></code></pre>
<p>This doesn’t pass the rectangle test:</p>
<pre data-lang="txt" style="background-color:#2e3440;color:#d8dee9;" class="language-txt "><code class="language-txt" data-lang="txt"><span> ┌────────────────────────────────┐
</span><span>JavacParser parser = │ parserFactory.newParser( │
</span><span>┌────────────────────┘ │
</span><span>│ javaInput.getText(), /*keepDocComments=*/ true, ... │
</span><span>└─────────────────────────────────────────────────────┘
</span></code></pre>
<p>It’s not possible to draw a rectangle around the right hand side of the <code>=</code> and catch all of its children without also catching the left hand side of the <code>=</code>. It’s correct that the rectangle method flags this as a bad split. There’s an awfully long way to travel from <code>newParser</code> to its first argument, which might result in your eyes having to dart back and forth more than necessary.</p>
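The rectangle test also lends itself to a mechanical check. Here is a rough sketch of one way to code it up; the <code>Span</code>/<code>Box</code> representation and every name in it are my own illustrative assumptions (not from any real formatter, and requiring Java 16+ for records): treat each syntax element as a set of (row, column-range) spans, and say a split passes if the bounding box of a subtree's spans doesn't touch any span outside the subtree.

```java
import java.util.List;

// Sketch of the rectangle test as code. All names and the span
// representation are illustrative assumptions, not part of any formatter.
class RectangleTest {
    // One run of text: a row (line) plus inclusive start/end columns.
    record Span(int row, int colStart, int colEnd) {}

    // Axis-aligned bounding box over a set of spans.
    record Box(int rowMin, int rowMax, int colMin, int colMax) {
        // True if the given span overlaps this box.
        boolean intersects(Span s) {
            return s.row >= rowMin && s.row <= rowMax
                && s.colEnd >= colMin && s.colStart <= colMax;
        }
    }

    static Box boundingBox(List<Span> spans) {
        int rMin = Integer.MAX_VALUE, rMax = Integer.MIN_VALUE;
        int cMin = Integer.MAX_VALUE, cMax = Integer.MIN_VALUE;
        for (Span s : spans) {
            rMin = Math.min(rMin, s.row);
            rMax = Math.max(rMax, s.row);
            cMin = Math.min(cMin, s.colStart);
            cMax = Math.max(cMax, s.colEnd);
        }
        return new Box(rMin, rMax, cMin, cMax);
    }

    // A subtree passes the rectangle test if its bounding box avoids
    // every span belonging to elements outside the subtree.
    static boolean passes(List<Span> subtree, List<Span> outside) {
        Box box = boundingBox(subtree);
        return outside.stream().noneMatch(box::intersects);
    }

    public static void main(String[] args) {
        // "JavacParser parser =" occupies row 0, columns 0-20.
        List<Span> lhs = List.of(new Span(0, 0, 20));

        // Good split: the whole right hand side sits on row 1, indented.
        List<Span> goodRhs = List.of(new Span(1, 4, 50));
        System.out.println(passes(goodRhs, lhs)); // true

        // Bad split: the call opens on row 0 after the '=', arguments on
        // row 1, so the subtree's box sweeps back over the left hand side.
        List<Span> badRhs = List.of(new Span(0, 21, 45), new Span(1, 4, 30));
        System.out.println(passes(badRhs, lhs)); // false
    }
}
```

Run on the two splits above, the good one passes and the bad one fails, for exactly the reason the diagrams show: the bad subtree's bounding box reaches back over <code>JavacParser parser =</code>.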
<h3 id="deciding-where-to-make-the-second-split"><a class="anchor" href="#deciding-where-to-make-the-second-split">#</a>
Deciding where to make the second split</h3>
<p>There are a couple of ways to make the second split, but the one I would go with is this:</p>
<pre data-lang="java" style="background-color:#2e3440;color:#d8dee9;" class="language-java "><code class="language-java" data-lang="java"><span style="color:#8fbcbb;">JavacParser</span><span> parser </span><span style="color:#81a1c1;">=
</span><span> parserFactory</span><span style="color:#eceff4;">.</span><span style="color:#88c0d0;">newParser</span><span>(
</span><span> javaInput</span><span style="color:#eceff4;">.</span><span style="color:#88c0d0;">getText</span><span>()</span><span style="color:#eceff4;">, </span><span style="color:#616e88;">/*keepDocComments=*/ </span><span style="color:#81a1c1;">true</span><span style="color:#eceff4;">, </span><span style="color:#616e88;">/*keepEndPos=*/ </span><span style="color:#81a1c1;">true</span><span style="color:#eceff4;">, </span><span style="color:#616e88;">/*keepLineMap=*/ </span><span style="color:#81a1c1;">true</span><span>)</span><span style="color:#eceff4;">;
</span></code></pre>
<p>Let’s see how that looks with some rectangles around it:</p>
<pre data-lang="txt" style="background-color:#2e3440;color:#d8dee9;" class="language-txt "><code class="language-txt" data-lang="txt"><span>┌────────────────────────────────────────────────────────────┐
</span><span>│┌──────────────────────┐ │
</span><span>││ JavacParser parser = │ │
</span><span>│└─┬────────────────────┴─────────────────────────────┐ │
</span><span>│ │ parserFactory.newParser( │ │
</span><span>│ └─┬────────────────────────────────────────────────┴────┐ │
</span><span>│ │ javaInput.getText(), /*keepDocComments=*/ true, ... │ │
</span><span>│ └─────────────────────────────────────────────────────┘ │
</span><span>└────────────────────────────────────────────────────────────┘
</span></code></pre>
<p>We’re still good! Note that the above doesn’t draw a rectangle around every element we could, mostly due to space limitations. You can also draw a rectangle from <code>parserFactory.newParser</code> around everything else after it.</p>
<p>Another way of doing the split that would also pass the rectangle test is:</p>
<pre data-lang="java" style="background-color:#2e3440;color:#d8dee9;" class="language-java "><code class="language-java" data-lang="java"><span style="color:#8fbcbb;">JavacParser</span><span> parser </span><span style="color:#81a1c1;">=
</span><span> parserFactory
</span><span> </span><span style="color:#eceff4;">.</span><span style="color:#88c0d0;">newParser</span><span>(
</span><span> javaInput</span><span style="color:#eceff4;">.</span><span style="color:#88c0d0;">getText</span><span>()</span><span style="color:#eceff4;">, </span><span style="color:#616e88;">/*keepDocComments=*/ </span><span style="color:#81a1c1;">true</span><span style="color:#eceff4;">, </span><span style="color:#616e88;">/*keepEndPos=*/ </span><span style="color:#81a1c1;">true</span><span style="color:#eceff4;">, </span><span style="color:#616e88;">/*keepLineMap=*/ </span><span style="color:#81a1c1;">true</span><span>)</span><span style="color:#eceff4;">;
</span></code></pre>
<p>But splitting at the <code>.</code> feels a little too eager to me. You can use less vertical space and lose no clarity by leaving that line as one.</p>
<p>Sadly our third line is still 94 characters long, and needs to be split yet again.</p>
<h2 id="deciding-where-to-make-the-third-split"><a class="anchor" href="#deciding-where-to-make-the-third-split">#</a>
Deciding where to make the third split</h2>
<p>Again, there are multiple routes for this one, but I would go for the following:</p>
<pre data-lang="java" style="background-color:#2e3440;color:#d8dee9;" class="language-java "><code class="language-java" data-lang="java"><span style="color:#8fbcbb;">JavacParser</span><span> parser </span><span style="color:#81a1c1;">=
</span><span> parserFactory</span><span style="color:#eceff4;">.</span><span style="color:#88c0d0;">newParser</span><span>(
</span><span> javaInput</span><span style="color:#eceff4;">.</span><span style="color:#88c0d0;">getText</span><span>()</span><span style="color:#eceff4;">,
</span><span> </span><span style="color:#616e88;">/*keepDocComments=*/ </span><span style="color:#81a1c1;">true</span><span style="color:#eceff4;">,
</span><span> </span><span style="color:#616e88;">/*keepEndPos=*/ </span><span style="color:#81a1c1;">true</span><span style="color:#eceff4;">,
</span><span> </span><span style="color:#616e88;">/*keepLineMap=*/ </span><span style="color:#81a1c1;">true</span><span>)</span><span style="color:#eceff4;">;
</span></code></pre>
<p>With rectangles:</p>
<pre data-lang="txt" style="background-color:#2e3440;color:#d8dee9;" class="language-txt "><code class="language-txt" data-lang="txt"><span>┌──────────────────────────────────┐
</span><span>│┌──────────────────────┐ │
</span><span>││ JavacParser parser = │ │
</span><span>│└─┬────────────────────┴─────┐ │
</span><span>│ │ parserFactory.newParser( │ │
</span><span>│ └─┬────────────────────────┴───┐│
</span><span>│ │ javaInput.getText(), ││
</span><span>│ ├────────────────────────────┤│
</span><span>│ │ /*keepDocComments=*/ true, ││
</span><span>│ ├────────────────────────────┤│
</span><span>│ │ /*keepEndPos=*/ true, ││
</span><span>│ ├────────────────────────────┤│
</span><span>│ │ /*keepLineMap=*/ true); ││
</span><span>│ └────────────────────────────┘│
</span><span>└──────────────────────────────────┘
</span></code></pre>
<p>Again, for space reasons, not all possible rectangles have been drawn.</p>
<p>We could have also had multiple arguments per line:</p>
<pre data-lang="java" style="background-color:#2e3440;color:#d8dee9;" class="language-java "><code class="language-java" data-lang="java"><span style="color:#8fbcbb;">JavacParser</span><span> parser </span><span style="color:#81a1c1;">=
</span><span> parserFactory</span><span style="color:#eceff4;">.</span><span style="color:#88c0d0;">newParser</span><span>(
</span><span> javaInput</span><span style="color:#eceff4;">.</span><span style="color:#88c0d0;">getText</span><span>()</span><span style="color:#eceff4;">, </span><span style="color:#616e88;">/*keepDocComments=*/ </span><span style="color:#81a1c1;">true</span><span style="color:#eceff4;">,
</span><span> </span><span style="color:#616e88;">/*keepEndPos=*/ </span><span style="color:#81a1c1;">true</span><span style="color:#eceff4;">, </span><span style="color:#616e88;">/*keepLineMap=*/ </span><span style="color:#81a1c1;">true</span><span>)</span><span style="color:#eceff4;">;
</span></code></pre>
<p>This passes the rectangle test and none of the lines go past the 80 character limit. However, I usually avoid this as a matter of personal preference.</p>
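<p>Sketching the rectangles for this version too (hand-drawn, and again not every possible rectangle is shown):</p>
<pre data-lang="txt" style="background-color:#2e3440;color:#d8dee9;" class="language-txt "><code class="language-txt" data-lang="txt"><span>┌───────────────────────────────────────────────────────┐
│┌──────────────────────┐                               │
││ JavacParser parser = │                               │
│└─┬────────────────────┴─────┐                         │
│  │ parserFactory.newParser( │                         │
│  └─┬────────────────────────┴────────────────────────┐│
│    │ javaInput.getText(), /*keepDocComments=*/ true, ││
│    ├─────────────────────────────────────────────────┤│
│    │ /*keepEndPos=*/ true, /*keepLineMap=*/ true);   ││
│    └─────────────────────────────────────────────────┘│
└───────────────────────────────────────────────────────┘
</span></code></pre>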
<h2 id="conclusion"><a class="anchor" href="#conclusion">#</a>
Conclusion</h2>
<p>Most of my code follows this style, and I feel it’s easier to read as a result. I’m sure this is just one of many approaches, and I would love to hear about them and how they compare to this one!</p>
</content>
</entry>
</feed>