aiiinotate 0.12.0 → 0.13.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md
CHANGED
|
@@ -1,6 +1,11 @@
|
|
|
1
1
|
# aiiinotate
|
|
2
2
|
|
|
3
|
-
aiiinotate is a fast and lightweight annotation server for IIIF
|
|
3
|
+
aiiinotate is a **fast and lightweight annotation server for IIIF**.
|
|
4
|
+
|
|
5
|
+
- **aiiinotate relies on** `nodejs/fastify` and `mongodb`
|
|
6
|
+
- **provides a REST API** to read/write/update/delete IIIF annotations and index manifests.
|
|
7
|
+
- **is distributed as an NPM package**, can be used through NPM or Docker
|
|
8
|
+
- **is built for scalability and speed**: [in benchmarks](https://github.com/paulhectork/aiiinotate-benchmark) aiiinotate stores up to 10,000,000 annotations and its response times are always between $$\frac{1}{10}$$ and $$\frac{1}{100}$$ seconds
|
|
4
9
|
|
|
5
10
|
NOTE: currently, only annotations following the IIIF presentation API 2.0 and 2.1 are supported.
|
|
6
11
|
|
|
@@ -186,6 +191,16 @@ aiiinotate is well tested: **~90% test coverage** on all files !
|
|
|
186
191
|
|
|
187
192
|
---
|
|
188
193
|
|
|
194
|
+
## Scalability
|
|
195
|
+
|
|
196
|
+
In [benchmarks](https://github.com/paulhectork/aiiinotate-benchmark), aiiinotate response times are between 1/100th and 1/10th of a second with up to 10,000,000 annotations.
|
|
197
|
+
|
|
198
|
+

|
|
199
|
+
|
|
200
|
+
See [scalability.md](./docs/scalability.md) for more information.
|
|
201
|
+
|
|
202
|
+
---
|
|
203
|
+
|
|
189
204
|
## License
|
|
190
205
|
|
|
191
206
|
GNU GPL 3.0.
|
|
@@ -0,0 +1,34 @@
|
|
|
1
|
+
# Scalability
|
|
2
|
+
|
|
3
|
+
## Results
|
|
4
|
+
|
|
5
|
+
In [benchmarks](https://github.com/paulhectork/aiiinotate-benchmark), aiiinotate response times are between 1/100th and 1/10th of a second with up to 10,000,000 annotations.
|
|
6
|
+
|
|
7
|
+

|
|
8
|
+
|
|
9
|
+
## Discussion
|
|
10
|
+
|
|
11
|
+
### Write times
|
|
12
|
+
|
|
13
|
+
The write times presented above (`Write anno. list`, `Write anno.`) are most likely lower than in real-world applications.
|
|
14
|
+
|
|
15
|
+
A major bottleneck when inserting annotations is fetching each annotation's target manifest over HTTP in order to fill the `canvasIdx` field.
|
|
16
|
+
|
|
17
|
+
When running the benchmark, all annotation URIs point to `https://localhost`, which is inaccessible, so the request fails instantly.
|
|
18
|
+
|
|
19
|
+
This was done because HTTP requests are non-deterministic: if we made requests to an external server storing IIIF manifests, we would end up benchmarking that server as well. Fetching manifests from an inaccessible `https://localhost` makes this step deterministic, so the benchmark is more meaningful.
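
As a rough illustration of the step described above, the sketch below looks up a canvas position in an IIIF Presentation 2.x manifest (where canvases live under `sequences[0].canvases` and are identified by `@id`) and falls back to `undefined` when the manifest cannot be fetched. This is not aiiinotate's actual code: the function names, the injected `fetchJson` parameter, the fallback value and the assumption that `canvasIdx` is the canvas position in the manifest are all hypothetical.

```javascript
// Hypothetical helper: position of a canvas inside an IIIF Presentation 2.x manifest.
// Returns undefined when the manifest has no canvases or the canvas is not found.
function canvasIndexInManifest(manifest, canvasUri) {
  const canvases = manifest?.sequences?.[0]?.canvases ?? [];
  const idx = canvases.findIndex((canvas) => canvas["@id"] === canvasUri);
  return idx === -1 ? undefined : idx;
}

// Hypothetical wrapper: fetch the manifest, then compute the index.
// `fetchJson` is injected so that the network call can be swapped out.
// Any fetch failure (e.g. an unreachable host) yields undefined.
async function resolveCanvasIdx(manifestUri, canvasUri, fetchJson) {
  try {
    const manifest = await fetchJson(manifestUri);
    return canvasIndexInManifest(manifest, canvasUri);
  } catch {
    return undefined;
  }
}

// Simulates the benchmark situation: the manifest host cannot be reached,
// so the lookup fails immediately and no index is computed.
const unreachable = () => Promise.reject(new Error("connection refused"));
resolveCanvasIdx(
  "https://localhost/manifest.json", // hypothetical URI
  "https://localhost/canvas/1", // hypothetical URI
  unreachable
).then((idx) => console.log(idx)); // logs undefined
```

In a real deployment the fetch would wait on a remote server before the index could be computed, which is the cost the benchmark deliberately leaves out.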
|
|
20
|
+
|
|
21
|
+
### >10M annotations
|
|
22
|
+
|
|
23
|
+
After benchmarking with 10M annotations, we delete the database and attempt to insert 100M annotations to benchmark response times.
|
|
24
|
+
|
|
25
|
+
With 100M annotations stored, the database size becomes an issue. These scalability limits are mostly caused by MongoDB rather than by the aiiinotate app itself: the larger the database, the more disk space and RAM it uses (since indexes are stored in RAM). To store >100M annotations, the hardware should be scaled:
|
|
26
|
+
|
|
27
|
+
- **vertical scaling**: more RAM and disk space
|
|
28
|
+
- **horizontal scaling**: use a Mongo [sharded cluster](https://www.mongodb.com/docs/manual/sharding/).
|
|
29
|
+
|
|
30
|
+
Note that the benchmark stress-tests your machine, aiiinotate and Mongo at the same time:
|
|
31
|
+
- the results presented above were obtained by running the benchmark, aiiinotate and Mongo on a single machine, so all three compete for RAM and CPU.
|
|
32
|
+
- the benchmark uses multiple threads, costly JSON stringification and fast I/O, and has a high throughput: it sends many queries to aiiinotate and to MongoDB.
|
|
33
|
+
|
|
34
|
+
In real-world use, with lower throughput and without the benchmark itself running, there should be less stress on your MongoDB server, hence lower CPU usage and possibly better scalability.
|