opendal 0.1.6.pre.rc.1-arm64-darwin-23
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +7 -0
- data/.standard.yml +20 -0
- data/.tool-versions +1 -0
- data/.yardopts +1 -0
- data/DEPENDENCIES.md +9 -0
- data/DEPENDENCIES.rust.tsv +277 -0
- data/Gemfile +35 -0
- data/README.md +159 -0
- data/Rakefile +149 -0
- data/core/CHANGELOG.md +4929 -0
- data/core/CONTRIBUTING.md +61 -0
- data/core/DEPENDENCIES.md +3 -0
- data/core/DEPENDENCIES.rust.tsv +185 -0
- data/core/LICENSE +201 -0
- data/core/README.md +228 -0
- data/core/benches/README.md +18 -0
- data/core/benches/ops/README.md +26 -0
- data/core/benches/types/README.md +9 -0
- data/core/benches/vs_fs/README.md +35 -0
- data/core/benches/vs_s3/README.md +55 -0
- data/core/edge/README.md +3 -0
- data/core/edge/file_write_on_full_disk/README.md +14 -0
- data/core/edge/s3_aws_assume_role_with_web_identity/README.md +18 -0
- data/core/edge/s3_read_on_wasm/.gitignore +3 -0
- data/core/edge/s3_read_on_wasm/README.md +42 -0
- data/core/edge/s3_read_on_wasm/webdriver.json +15 -0
- data/core/examples/README.md +23 -0
- data/core/examples/basic/README.md +15 -0
- data/core/examples/concurrent-upload/README.md +15 -0
- data/core/examples/multipart-upload/README.md +15 -0
- data/core/fuzz/.gitignore +5 -0
- data/core/fuzz/README.md +68 -0
- data/core/src/docs/comparisons/vs_object_store.md +183 -0
- data/core/src/docs/performance/concurrent_write.md +101 -0
- data/core/src/docs/performance/http_optimization.md +124 -0
- data/core/src/docs/rfcs/0000_example.md +74 -0
- data/core/src/docs/rfcs/0000_foyer_integration.md +111 -0
- data/core/src/docs/rfcs/0041_object_native_api.md +185 -0
- data/core/src/docs/rfcs/0044_error_handle.md +198 -0
- data/core/src/docs/rfcs/0057_auto_region.md +160 -0
- data/core/src/docs/rfcs/0069_object_stream.md +145 -0
- data/core/src/docs/rfcs/0090_limited_reader.md +155 -0
- data/core/src/docs/rfcs/0112_path_normalization.md +79 -0
- data/core/src/docs/rfcs/0191_async_streaming_io.md +328 -0
- data/core/src/docs/rfcs/0203_remove_credential.md +96 -0
- data/core/src/docs/rfcs/0221_create_dir.md +89 -0
- data/core/src/docs/rfcs/0247_retryable_error.md +87 -0
- data/core/src/docs/rfcs/0293_object_id.md +67 -0
- data/core/src/docs/rfcs/0337_dir_entry.md +191 -0
- data/core/src/docs/rfcs/0409_accessor_capabilities.md +67 -0
- data/core/src/docs/rfcs/0413_presign.md +154 -0
- data/core/src/docs/rfcs/0423_command_line_interface.md +268 -0
- data/core/src/docs/rfcs/0429_init_from_iter.md +107 -0
- data/core/src/docs/rfcs/0438_multipart.md +163 -0
- data/core/src/docs/rfcs/0443_gateway.md +73 -0
- data/core/src/docs/rfcs/0501_new_builder.md +111 -0
- data/core/src/docs/rfcs/0554_write_refactor.md +96 -0
- data/core/src/docs/rfcs/0561_list_metadata_reuse.md +210 -0
- data/core/src/docs/rfcs/0599_blocking_api.md +157 -0
- data/core/src/docs/rfcs/0623_redis_service.md +300 -0
- data/core/src/docs/rfcs/0627_split_capabilities.md +89 -0
- data/core/src/docs/rfcs/0661_path_in_accessor.md +126 -0
- data/core/src/docs/rfcs/0793_generic_kv_services.md +209 -0
- data/core/src/docs/rfcs/0926_object_reader.md +93 -0
- data/core/src/docs/rfcs/0977_refactor_error.md +151 -0
- data/core/src/docs/rfcs/1085_object_handler.md +73 -0
- data/core/src/docs/rfcs/1391_object_metadataer.md +110 -0
- data/core/src/docs/rfcs/1398_query_based_metadata.md +125 -0
- data/core/src/docs/rfcs/1420_object_writer.md +147 -0
- data/core/src/docs/rfcs/1477_remove_object_concept.md +159 -0
- data/core/src/docs/rfcs/1735_operation_extension.md +117 -0
- data/core/src/docs/rfcs/2083_writer_sink_api.md +106 -0
- data/core/src/docs/rfcs/2133_append_api.md +88 -0
- data/core/src/docs/rfcs/2299_chain_based_operator_api.md +99 -0
- data/core/src/docs/rfcs/2602_object_versioning.md +138 -0
- data/core/src/docs/rfcs/2758_merge_append_into_write.md +79 -0
- data/core/src/docs/rfcs/2774_lister_api.md +66 -0
- data/core/src/docs/rfcs/2779_list_with_metakey.md +143 -0
- data/core/src/docs/rfcs/2852_native_capability.md +58 -0
- data/core/src/docs/rfcs/2884_merge_range_read_into_read.md +80 -0
- data/core/src/docs/rfcs/3017_remove_write_copy_from.md +94 -0
- data/core/src/docs/rfcs/3197_config.md +237 -0
- data/core/src/docs/rfcs/3232_align_list_api.md +69 -0
- data/core/src/docs/rfcs/3243_list_prefix.md +128 -0
- data/core/src/docs/rfcs/3356_lazy_reader.md +111 -0
- data/core/src/docs/rfcs/3526_list_recursive.md +59 -0
- data/core/src/docs/rfcs/3574_concurrent_stat_in_list.md +80 -0
- data/core/src/docs/rfcs/3734_buffered_reader.md +64 -0
- data/core/src/docs/rfcs/3898_concurrent_writer.md +66 -0
- data/core/src/docs/rfcs/3911_deleter_api.md +165 -0
- data/core/src/docs/rfcs/4382_range_based_read.md +213 -0
- data/core/src/docs/rfcs/4638_executor.md +215 -0
- data/core/src/docs/rfcs/5314_remove_metakey.md +120 -0
- data/core/src/docs/rfcs/5444_operator_from_uri.md +162 -0
- data/core/src/docs/rfcs/5479_context.md +140 -0
- data/core/src/docs/rfcs/5485_conditional_reader.md +112 -0
- data/core/src/docs/rfcs/5495_list_with_deleted.md +81 -0
- data/core/src/docs/rfcs/5556_write_returns_metadata.md +121 -0
- data/core/src/docs/rfcs/5871_read_returns_metadata.md +112 -0
- data/core/src/docs/rfcs/6189_remove_native_blocking.md +106 -0
- data/core/src/docs/rfcs/6209_glob_support.md +132 -0
- data/core/src/docs/rfcs/6213_options_api.md +142 -0
- data/core/src/docs/rfcs/README.md +62 -0
- data/core/src/docs/upgrade.md +1556 -0
- data/core/src/services/aliyun_drive/docs.md +61 -0
- data/core/src/services/alluxio/docs.md +45 -0
- data/core/src/services/azblob/docs.md +77 -0
- data/core/src/services/azdls/docs.md +73 -0
- data/core/src/services/azfile/docs.md +65 -0
- data/core/src/services/b2/docs.md +54 -0
- data/core/src/services/cacache/docs.md +38 -0
- data/core/src/services/cloudflare_kv/docs.md +21 -0
- data/core/src/services/cos/docs.md +55 -0
- data/core/src/services/d1/docs.md +48 -0
- data/core/src/services/dashmap/docs.md +38 -0
- data/core/src/services/dbfs/docs.md +57 -0
- data/core/src/services/dropbox/docs.md +64 -0
- data/core/src/services/etcd/docs.md +45 -0
- data/core/src/services/foundationdb/docs.md +42 -0
- data/core/src/services/fs/docs.md +49 -0
- data/core/src/services/ftp/docs.md +42 -0
- data/core/src/services/gcs/docs.md +76 -0
- data/core/src/services/gdrive/docs.md +65 -0
- data/core/src/services/ghac/docs.md +84 -0
- data/core/src/services/github/docs.md +52 -0
- data/core/src/services/gridfs/docs.md +46 -0
- data/core/src/services/hdfs/docs.md +140 -0
- data/core/src/services/hdfs_native/docs.md +35 -0
- data/core/src/services/http/docs.md +45 -0
- data/core/src/services/huggingface/docs.md +61 -0
- data/core/src/services/ipfs/docs.md +45 -0
- data/core/src/services/ipmfs/docs.md +14 -0
- data/core/src/services/koofr/docs.md +51 -0
- data/core/src/services/lakefs/docs.md +62 -0
- data/core/src/services/memcached/docs.md +47 -0
- data/core/src/services/memory/docs.md +36 -0
- data/core/src/services/mini_moka/docs.md +19 -0
- data/core/src/services/moka/docs.md +42 -0
- data/core/src/services/mongodb/docs.md +49 -0
- data/core/src/services/monoiofs/docs.md +46 -0
- data/core/src/services/mysql/docs.md +47 -0
- data/core/src/services/obs/docs.md +54 -0
- data/core/src/services/onedrive/docs.md +115 -0
- data/core/src/services/opfs/docs.md +18 -0
- data/core/src/services/oss/docs.md +74 -0
- data/core/src/services/pcloud/docs.md +51 -0
- data/core/src/services/persy/docs.md +43 -0
- data/core/src/services/postgresql/docs.md +47 -0
- data/core/src/services/redb/docs.md +41 -0
- data/core/src/services/redis/docs.md +43 -0
- data/core/src/services/rocksdb/docs.md +54 -0
- data/core/src/services/s3/compatible_services.md +126 -0
- data/core/src/services/s3/docs.md +244 -0
- data/core/src/services/seafile/docs.md +54 -0
- data/core/src/services/sftp/docs.md +49 -0
- data/core/src/services/sled/docs.md +39 -0
- data/core/src/services/sqlite/docs.md +46 -0
- data/core/src/services/surrealdb/docs.md +54 -0
- data/core/src/services/swift/compatible_services.md +53 -0
- data/core/src/services/swift/docs.md +52 -0
- data/core/src/services/tikv/docs.md +43 -0
- data/core/src/services/upyun/docs.md +51 -0
- data/core/src/services/vercel_artifacts/docs.md +40 -0
- data/core/src/services/vercel_blob/docs.md +45 -0
- data/core/src/services/webdav/docs.md +49 -0
- data/core/src/services/webhdfs/docs.md +90 -0
- data/core/src/services/yandex_disk/docs.md +45 -0
- data/core/tests/behavior/README.md +77 -0
- data/core/tests/data/normal_dir/.gitkeep +0 -0
- data/core/tests/data/normal_file.txt +1041 -0
- data/core/tests/data/special_dir !@#$%^&()_+-=;',/.gitkeep +0 -0
- data/core/tests/data/special_file !@#$%^&()_+-=;',.txt +1041 -0
- data/core/users.md +13 -0
- data/extconf.rb +24 -0
- data/lib/opendal.rb +25 -0
- data/lib/opendal_ruby/entry.rb +35 -0
- data/lib/opendal_ruby/io.rb +70 -0
- data/lib/opendal_ruby/metadata.rb +44 -0
- data/lib/opendal_ruby/opendal_ruby.bundle +0 -0
- data/lib/opendal_ruby/operator.rb +29 -0
- data/lib/opendal_ruby/operator_info.rb +26 -0
- data/opendal.gemspec +91 -0
- data/test/blocking_op_test.rb +112 -0
- data/test/capability_test.rb +42 -0
- data/test/io_test.rb +172 -0
- data/test/lister_test.rb +77 -0
- data/test/metadata_test.rb +78 -0
- data/test/middlewares_test.rb +46 -0
- data/test/operator_info_test.rb +35 -0
- data/test/test_helper.rb +36 -0
- metadata +240 -0
@@ -0,0 +1,183 @@
# OpenDAL vs object_store

> NOTE: This document is written by OpenDAL's maintainers and has not been reviewed by
> object_store's maintainers, so it may not be entirely objective.

## About object_store

[object_store](https://crates.io/crates/object_store) is

> A focused, easy to use, idiomatic, high performance, `async` object store library interacting with object stores.

It was initially developed for [InfluxDB IOx](https://github.com/influxdata/influxdb_iox/) and later split out and donated to [Apache Arrow](https://arrow.apache.org/).

## Similarities

### Language

Yes, of course. Both `opendal` and `object_store` are developed in [Rust](https://www.rust-lang.org/), a language empowering everyone to build reliable and efficient software.

### License

Both `opendal` and `object_store` are licensed under [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0).

### Owner

`object_store` is part of `Apache Arrow`, which means it's hosted and maintained by the [Apache Software Foundation](https://www.apache.org/).

`opendal` is now also hosted by the [Apache Software Foundation](https://www.apache.org/).

### Domain

Both `opendal` and `object_store` can be used to access data stored on object storage services. The primary users of both projects are cloud-native databases as well:

- `opendal` is mainly used by:
  - [databend](https://github.com/datafuselabs/databend): A modern Elasticity and Performance cloud data warehouse
  - [GreptimeDB](https://github.com/GreptimeTeam/greptimedb): An open-source, cloud-native, distributed time-series database.
  - [mozilla/sccache](https://github.com/mozilla/sccache/): sccache is ccache with cloud storage
  - [risingwave](https://github.com/risingwavelabs/risingwave): A Distributed SQL Database for Stream Processing
  - [Vector](https://github.com/vectordotdev/vector): A high-performance observability data pipeline.
- `object_store` is mainly used by:
  - [datafusion](https://github.com/apache/arrow-datafusion): Apache Arrow DataFusion SQL Query Engine
  - [InfluxDB IOx](https://github.com/influxdata/influxdb_iox/): The new core of InfluxDB is written in Rust on top of Apache Arrow.

## Differences

### Vision

`opendal` is an Open Data Access Layer that accesses data freely, painlessly, and efficiently. `object_store` is more focused on async object store support.

You will see how the different visions lead to very different routes.

### Design

`object_store` exposes a trait called [`ObjectStore`](https://docs.rs/object_store/latest/object_store/trait.ObjectStore.html) to users.

Users need to build a `dyn ObjectStore` and operate on it directly:

```rust
let object_store: Arc<dyn ObjectStore> = Arc::new(get_object_store());
let path: Path = "data/file01.parquet".try_into()?;
let stream = object_store
    .get(&path)
    .await?
    .into_stream();
```

`opendal` has a similar trait called [`Access`][crate::raw::Access].

But `opendal` doesn't expose this trait to end users directly. Instead, `opendal` exposes a new struct called [`Operator`][crate::Operator] and builds its public API on it.

```rust
let op: Operator = Operator::from_env(Scheme::S3)?;
let r = op.reader("data/file01.parquet").await?;
```

### Interception

Both `object_store` and `opendal` provide a mechanism to intercept operations.

`object_store` calls them `Adapters`:

```rust
let object_store = ThrottledStore::new(get_object_store(), ThrottleConfig::default());
```

`opendal` calls them [`Layer`](crate::raw::Layer)s:

```rust
let op = op.layer(TracingLayer).layer(MetricsLayer);
```

At the time of writing:

object_store (`v0.5.0`) supports:

- ThrottleStore: Rate Throttling
- LimitStore: Concurrent Request Limit

opendal supports:

- ImmutableIndexLayer: immutable in-memory index.
- LoggingLayer: logging.
- MetadataCacheLayer: metadata cache.
- ContentCacheLayer: content data cache.
- MetricsLayer: metrics
- RetryLayer: retry
- SubdirLayer: allows switching directories without changing the original operator.
- TracingLayer: tracing
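
Conceptually, both mechanisms are the classic decorator pattern: a wrapper observes or modifies each call, then delegates to the inner store. A minimal std-only sketch of that idea (the `Store`, `Memory`, and `LoggingLayer` names here are illustrative, not OpenDAL's real `Access`/`Layer` traits):

```rust
/// A tiny interface standing in for a storage backend.
trait Store {
    fn read(&self, path: &str) -> String;
}

/// A trivial in-memory backend.
struct Memory;

impl Store for Memory {
    fn read(&self, path: &str) -> String {
        format!("contents of {path}")
    }
}

/// A "layer": logs every operation, then delegates to the wrapped store.
struct LoggingLayer<S: Store>(S);

impl<S: Store> Store for LoggingLayer<S> {
    fn read(&self, path: &str) -> String {
        println!("read: {path}");
        self.0.read(path)
    }
}

fn main() {
    // Layers compose by nesting, just like `op.layer(a).layer(b)`.
    let store = LoggingLayer(Memory);
    assert_eq!(store.read("data/file01.parquet"), "contents of data/file01.parquet");
}
```

Because a wrapped store implements the same interface as the inner one, layers can be stacked in any order and the rest of the code never needs to know they are there.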

### Services

`opendal` and `object_store` have different visions, so their supported services differ:

| service | opendal         | object_store                            |
|---------|-----------------|-----------------------------------------|
| azblob  | Y               | Y                                       |
| fs      | Y               | Y                                       |
| ftp     | Y               | N                                       |
| gcs     | Y               | Y                                       |
| hdfs    | Y               | Y *(via [datafusion-objectstore-hdfs])* |
| http    | Y *(read only)* | N                                       |
| ipfs    | Y *(read only)* | N                                       |
| ipmfs   | Y               | N                                       |
| memory  | Y               | Y                                       |
| obs     | Y               | N                                       |
| s3      | Y               | Y                                       |

opendal has an idea called [`Capability`][crate::Capability], so its services may have different capability sets. For example, opendal's `http` and `ipfs` are read only.

### Features

`opendal` and `object_store` have different visions, so they have different feature sets:

| opendal   | object_store         | notes                                           |
|-----------|----------------------|-------------------------------------------------|
| metadata  | -                    | get some metadata from the underlying storage   |
| create    | put                  | -                                               |
| read      | get                  | -                                               |
| read      | get_range            | -                                               |
| -         | get_ranges           | opendal doesn't support reading multiple ranges |
| write     | put                  | -                                               |
| stat      | head                 | -                                               |
| delete    | delete               | -                                               |
| -         | list                 | opendal doesn't support list with prefix        |
| list      | list_with_delimiter  | -                                               |
| -         | copy                 | -                                               |
| -         | copy_if_not_exists   | -                                               |
| -         | rename               | -                                               |
| -         | rename_if_not_exists | -                                               |
| presign   | -                    | get a presigned URL of an object                |
| multipart | multipart            | both support it, but the APIs differ            |
| blocking  | -                    | opendal supports a blocking API                 |

## Demo

The most straightforward complete demo of reading a file from s3:

`opendal`

```rust
let mut builder = S3::default();
builder.bucket("example");
builder.access_key_id("access_key_id");
builder.secret_access_key("secret_access_key");

let store = Operator::new(builder)?.finish();
let r = store.reader("data.parquet").await?;
```

`object_store`

```rust
let builder = AmazonS3Builder::new()
    .with_bucket_name("example")
    .with_access_key_id("access_key_id")
    .with_secret_access_key("secret_access_key");

let store = Arc::new(builder.build()?);
let path: Path = "data.parquet".try_into().unwrap();
let stream = store.get(&path).await?.into_stream();
```

[datafusion-objectstore-hdfs]: https://github.com/datafusion-contrib/datafusion-objectstore-hdfs/
@@ -0,0 +1,101 @@
# Concurrent Write

OpenDAL writes data sequentially by default.

```rust
# use opendal::Operator;
# async fn test() {
let w = op.writer("test.txt").await?;
w.write(data1).await?;
w.write(data2).await?;
w.close().await?;
# }
```

Most of the time, this can't maximize write performance due to limitations on a single connection. We can perform concurrent writes to improve performance.

```rust
# use opendal::Operator;
# async fn test() {
let w = op
    .writer_with("test.txt")
    .concurrent(8)
    .await?;

w.write(data1).await?;
w.write(data2).await?;
w.close().await?;
# }
```

After setting `concurrent`, OpenDAL will attempt to write the specified file concurrently. The maximum level of concurrency is determined by the `concurrent` parameter. By default, it is set to 1, indicating a sequential write operation.

Under the hood, OpenDAL maintains a task queue to manage concurrent writes. It spawns asynchronous tasks in the background using the [`Executor`][crate::Executor] and tracks the status of each task. The task queue is flushed when the writer is closed, allowing data to be written concurrently without blocking the main thread.

In our example here, the write of `data1` will not block the write of `data2`: the two writes will be executed concurrently.

The underlying implementation of concurrent writes may vary depending on the backend. For instance, the `s3` backend leverages the S3 Multipart Uploads API to handle concurrent writes, while the `azblob` backend utilizes the Block API for the same purpose.

## Tuning

Two parameters can be tuned to optimize concurrent writes:

- `concurrent`: controls the maximum number of concurrent writes. The default value is 1.
- `chunk`: specifies the size of each chunk of data to be written. The default value varies across storage services.

### `concurrent`

The most important thing to understand is that `concurrent` is not a strict limit. It represents the maximum number of concurrent writes that OpenDAL will attempt to perform. The actual number of concurrent writes may be lower, depending on the input data throughput.

For example, if you set `concurrent` to 8, OpenDAL will attempt to perform up to 8 concurrent writes. However, if the input data throughput is low, it might only carry out 2 or 3 concurrent writes at a time, as there isn't enough data to keep all 8 writes active.

The best value for `concurrent` depends on the specific use case and the underlying storage service. In general, a higher value can lead to better performance, but it highly depends on the storage service and the network conditions. For example, if the storage service is robust and bandwidth is sufficient, you may observe a linear increase in performance with higher `concurrent` values. However, if the storage service has request limits or the network is nearly saturated, increasing `concurrent` may not lead to any performance improvement, and could even degrade performance due to repeated retries on errors.

It's recommended to start with a lower value like `2` or `4` and gradually increase it while monitoring performance and resource usage.
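
One way to picture the "up to N in flight" behavior is a bounded task queue: chunks are dispatched as they arrive, and once `concurrent` tasks are in flight the writer waits for the oldest one before dispatching more. Below is a thread-based, std-only sketch of that idea; OpenDAL's real implementation is async and runs on its `Executor`, so treat this only as an illustration:

```rust
use std::collections::VecDeque;
use std::thread::{self, JoinHandle};

/// Keeps at most `concurrent` tasks in flight, mirroring how a writer
/// configured with `.concurrent(n)` dispatches chunk uploads.
struct TaskQueue {
    concurrent: usize,
    inflight: VecDeque<JoinHandle<()>>,
}

impl TaskQueue {
    fn new(concurrent: usize) -> Self {
        Self { concurrent, inflight: VecDeque::new() }
    }

    /// Dispatch a task, first waiting on the oldest one
    /// if the queue is already at capacity.
    fn push(&mut self, task: impl FnOnce() + Send + 'static) {
        if self.inflight.len() >= self.concurrent {
            self.inflight.pop_front().unwrap().join().unwrap();
        }
        self.inflight.push_back(thread::spawn(task));
    }

    /// Wait for everything still in flight, like `close()` flushing the writer.
    fn close(&mut self) {
        while let Some(t) = self.inflight.pop_front() {
            t.join().unwrap();
        }
    }
}

fn main() {
    use std::sync::atomic::{AtomicUsize, Ordering};
    use std::sync::Arc;

    let done = Arc::new(AtomicUsize::new(0));
    let mut q = TaskQueue::new(2);
    for _ in 0..5 {
        let done = done.clone();
        q.push(move || {
            done.fetch_add(1, Ordering::SeqCst);
        });
    }
    q.close();
    // All five tasks ran, but never more than two at once.
    assert_eq!(done.load(Ordering::SeqCst), 5);
}
```

Note how a caller that produces data slowly never fills the queue, which is exactly why the effective concurrency can stay below the configured maximum.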

### `chunk`

The `chunk` parameter specifies the size of each chunk of data to be written. A larger chunk size can improve performance, but it may also increase memory usage. The default value varies across storage services.

For example, s3 uses `5MiB` as the default chunk size, which is also the minimum chunk size for s3. If you set a smaller chunk size, OpenDAL will automatically adjust it to `5MiB`.

The best value for `chunk` depends on the specific use case and the underlying storage service. For most object storage services, a chunk size of `8MiB` or larger is recommended. However, if you're working with smaller files or have limited memory resources, you may want to use a smaller chunk size.

Please note that if you input small chunks of data, OpenDAL will attempt to merge them into a larger chunk before writing. This helps avoid the overhead of writing numerous small chunks, which can negatively affect performance.
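
The merging behavior can be sketched as a buffer that only emits once it has gathered at least `chunk` bytes, flushing whatever remains on close. This toy version copies into a `Vec` for clarity; OpenDAL's real writer works on `Buffer`s and avoids copying:

```rust
/// Coalesce a stream of small writes into chunks of at least `chunk`
/// bytes, flushing the final (possibly undersized) chunk at the end.
fn coalesce(writes: &[&[u8]], chunk: usize) -> Vec<Vec<u8>> {
    let mut out = Vec::new();
    let mut buf: Vec<u8> = Vec::new();
    for w in writes {
        buf.extend_from_slice(w);
        // Emit a chunk as soon as enough bytes have accumulated.
        while buf.len() >= chunk {
            let rest = buf.split_off(chunk);
            out.push(std::mem::replace(&mut buf, rest));
        }
    }
    // `close()` flushes the remainder.
    if !buf.is_empty() {
        out.push(buf);
    }
    out
}

fn main() {
    // Ten 3-byte writes with an 8-byte chunk: three full chunks + a 6-byte tail.
    let writes: Vec<Vec<u8>> = (0..10u8).map(|i| vec![i; 3]).collect();
    let refs: Vec<&[u8]> = writes.iter().map(|w| w.as_slice()).collect();
    let sizes: Vec<usize> = coalesce(&refs, 8).iter().map(|c| c.len()).collect();
    assert_eq!(sizes, vec![8, 8, 8, 6]);
}
```

The upshot is that callers are free to write in whatever granularity is convenient; only the network requests are shaped by `chunk`.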

## Usage

To upload a large in-memory chunk concurrently:

```rust
# use opendal::Operator;
# async fn test() {
let data = vec![0; 10 * 1024 * 1024]; // 10MiB
let _ = op.write_with("test.txt", data).concurrent(4).await?;
# }
```

`concurrent` and `chunk` also work in [`into_sink`][crate::Writer::into_sink], [`into_bytes_sink`][crate::Writer::into_bytes_sink] and [`into_futures_async_write`][crate::Writer::into_futures_async_write]:

```rust
use std::io;

use bytes::Bytes;
use futures::SinkExt;
use opendal::{Buffer, Operator};
use opendal::Result;

async fn test(op: Operator) -> io::Result<()> {
    let mut w = op
        .writer_with("hello.txt")
        .concurrent(8)
        .chunk(256)
        .await?
        .into_sink();
    let bs = "Hello, World!".as_bytes();
    w.send(Buffer::from(bs)).await?;
    w.close().await?;

    Ok(())
}
```
@@ -0,0 +1,124 @@
# HTTP Optimization

All OpenDAL HTTP-based storage services use the same [HttpClient][crate::raw::HttpClient] abstraction. This design offers users a unified interface for configuring HTTP clients. The default HTTP client is [reqwest](https://crates.io/crates/reqwest), a popular and widely used HTTP client library in Rust.

Many of the services supported by OpenDAL are HTTP-based. This guide aims to provide optimization tips for using HTTP-based storage services. While these tips are also applicable to other HTTP clients, the configuration methods may vary.

Please note that the following optimizations are based on experience and may not be suitable for all scenarios. The most effective way to determine the optimal configuration is to test it in your specific environment.

## HTTP/1.1

According to benchmarks from OpenDAL users, `HTTP/1.1` is generally faster than `HTTP/2` for large-scale download and upload operations.

`reqwest` tends to maintain only a single TCP connection for `HTTP/2`, relying on its built-in multiplexing capabilities. While this works well for small files, such as web page downloads, the design is not ideal for handling large files or massive file-scan OLAP workloads.

When `HTTP/2` is disabled, `reqwest` falls back to `HTTP/1.1` and utilizes its default connection pool. This approach is better suited for large files, as it allows multiple TCP connections to be opened and used concurrently, significantly improving performance for large file downloads and uploads.

If your workloads involve large files or require high throughput, and are not sensitive to latency, consider disabling `HTTP/2` in your configuration.

```rust
let client = reqwest::ClientBuilder::new()
    // Disable http2 for better performance.
    .http1_only()
    .build()
    .expect("http client must be created");

// Update the http client in the operator.
let op = op.update_http_client(|_| HttpClient::with(client));
```

## DNS Caching

`reqwest` uses the DNS resolver provided by Rust's standard library by default, which is backed by the `getaddrinfo` system call under the hood. This system call does not cache results by default, meaning that each time you make a request to a new domain, a DNS lookup will be performed.

Under high-throughput workloads, this can cause significant performance degradation, as each request incurs the overhead of a DNS lookup. It can also negatively affect the resolver, potentially overwhelming it with the volume of requests. In extreme cases, this may effectively amount to a DoS attack on the resolver, rendering it unresponsive.

To mitigate this issue, you can enable DNS caching in `reqwest` by using the `hickory-dns` feature. This feature provides a more efficient DNS resolver that caches results.

```rust
let client = reqwest::ClientBuilder::new()
    // Enable hickory dns for dns caching and async dns resolve.
    .hickory_dns(true)
    .build()
    .expect("http client must be created");

// Update the http client in the operator.
let op = op.update_http_client(|_| HttpClient::with(client));
```

The default DNS cache settings from `hickory_dns` are generally sufficient for most workloads. However, if you have specific requirements, such as sharing the same DNS cache across multiple HTTP clients or configuring the DNS cache size, you can use the `Xuanwo/reqwest-hickory-resolver` crate to set up a custom DNS resolver.

```rust
/// Global shared hickory resolver.
static GLOBAL_HICKORY_RESOLVER: LazyLock<Arc<HickoryResolver>> = LazyLock::new(|| {
    let mut opts = ResolverOpts::default();
    // Only query for the ipv4 address.
    opts.ip_strategy = LookupIpStrategy::Ipv4Only;
    // Use larger cache size for better performance.
    opts.cache_size = 1024;
    // Positive TTL is set to 5 minutes.
    opts.positive_min_ttl = Some(Duration::from_secs(300));
    // Negative TTL is set to 1 minute.
    opts.negative_min_ttl = Some(Duration::from_secs(60));

    Arc::new(
        HickoryResolver::default()
            // Always shuffle the DNS results for better performance.
            .with_shuffle(true)
            .with_options(opts),
    )
});

let client = reqwest::ClientBuilder::new()
    // Use our global hickory resolver instead.
    .dns_resolver(GLOBAL_HICKORY_RESOLVER.clone())
    .build()
    .expect("http client must be created");

// Update the http client in the operator.
let op = op.update_http_client(|_| HttpClient::with(client));
```

The `ResolverOpts` struct has many options that can be configured. For a complete list of options, please refer to the [hickory_resolver documentation](https://docs.rs/hickory-resolver/latest/hickory_resolver/config/struct.ResolverOpts.html).

Here is a summary of the most commonly used options:

- `ip_strategy`: `hickory_resolver` defaults to the `Ipv4thenIpv6` strategy, which first queries for the IPv4 address and then for the IPv6 address. This is generally a good strategy for most workloads. However, if you only need IPv4 addresses, you can set this option to `Ipv4Only` to avoid unnecessary DNS lookups.
- `cache_size`: controls the size of the DNS cache. A larger cache size can improve performance, but it may also increase memory usage. The default value is `32`.
- `positive_min_ttl` and `negative_min_ttl`: control the minimum TTL for positive and negative DNS responses. A longer TTL can improve performance, but it may also increase the risk of stale DNS records. The default value is `None`. Some misbehaving DNS servers return a TTL of `0` even when the record is valid; in that case, you can set a longer minimum TTL to avoid unnecessary DNS lookups.
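
The `*_min_ttl` options act as a floor on whatever TTL the server returns, so entries stay cached for at least the configured duration. A sketch of just that clamping idea (hickory's actual cache applies more rules than this):

```rust
use std::time::Duration;

/// Clamp a server-provided TTL against an optional configured minimum,
/// as a `*_min_ttl` floor would.
fn cache_ttl(server_ttl: Duration, min_ttl: Option<Duration>) -> Duration {
    match min_ttl {
        Some(min) => server_ttl.max(min),
        None => server_ttl,
    }
}

fn main() {
    // A broken server returning TTL 0 still gets cached for 5 minutes.
    assert_eq!(
        cache_ttl(Duration::ZERO, Some(Duration::from_secs(300))),
        Duration::from_secs(300)
    );
    // A healthy 10-minute TTL is left untouched.
    assert_eq!(
        cache_ttl(Duration::from_secs(600), Some(Duration::from_secs(300))),
        Duration::from_secs(600)
    );
}
```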

In addition to the options mentioned above, `Xuanwo/reqwest-hickory-resolver` also offers a `shuffle` option. This setting determines whether the DNS results are shuffled before being returned. Shuffling can enhance performance by distributing the load more evenly across multiple IP addresses.
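
The load-spreading effect can be illustrated with a deterministic round-robin picker over the resolved addresses. Note that the resolver's `shuffle` actually randomizes the result order; round-robin is used here only so the behavior is easy to test:

```rust
use std::net::{IpAddr, Ipv4Addr};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hands out resolved addresses in rotation so successive connections
/// don't all target the first DNS record.
struct Rotor {
    addrs: Vec<IpAddr>,
    next: AtomicUsize,
}

impl Rotor {
    fn new(addrs: Vec<IpAddr>) -> Self {
        Self { addrs, next: AtomicUsize::new(0) }
    }

    fn pick(&self) -> IpAddr {
        let i = self.next.fetch_add(1, Ordering::Relaxed) % self.addrs.len();
        self.addrs[i]
    }
}

fn main() {
    let a = IpAddr::V4(Ipv4Addr::new(192, 0, 2, 1));
    let b = IpAddr::V4(Ipv4Addr::new(192, 0, 2, 2));
    let rotor = Rotor::new(vec![a, b]);
    // Successive picks alternate between the two records.
    assert_eq!(rotor.pick(), a);
    assert_eq!(rotor.pick(), b);
    assert_eq!(rotor.pick(), a);
}
```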
|
|
92
|
+
## Timeout
|
|
93
|
+
|
|
94
|
+
`reqwest` didn't set a default timeout for HTTP requests. This means that if a request hangs or takes too long to complete, it can block the entire process, leading to performance degradation or even application crashes.
|
|
95
|
+
|
|
96
|
+
It's recommended to set a connect timeout for HTTP requests to prevent this issue.
|
|
97
|
+
|
|
98
|
+
```rust
|
|
99
|
+
let client = reqwest::ClientBuilder::new()
|
|
100
|
+
// Set a connect timeout of 5 seconds.
|
|
101
|
+
.connect_timeout(Duration::from_secs(5))
|
|
102
|
+
.build()
|
|
103
|
+
.expect("http client must be created");
|
|
104
|
+
|
|
105
|
+
// Update the http client in the operator.
|
|
106
|
+
let op = op.update_http_client(|_| HttpClient::with(client));
|
|
107
|
+
```
|
|
108
|
+
|
|
109
|
+
It's also recommended to use opendal's [`TimeoutLayer`][crate::layers::TimeoutLayer] to prevent slow requests hangs forever. This layer will automatically cancel the request if it takes too long to complete.
|
|
110
|
+
|
|
111
|
+
```rust
|
|
112
|
+
let op = op.layer(TimeoutLayer::new());
|
|
113
|
+
```
|
|
114
|
+
|
|
115
|
+
## Connection Pool
|
|
116
|
+
|
|
117
|
+
`reqwest` uses a connection pool to manage HTTP connections. This allows multiple requests to share the same connection, reducing the overhead of establishing a new connection for each request.

By default, the connection pool is unlimited, allowing `reqwest` to open as many connections as needed. The default keep-alive timeout is 90 seconds, meaning any connection idle for longer than that is closed.

You can tune these settings via:

- [pool_idle_timeout](https://docs.rs/reqwest/0.12.15/reqwest/struct.ClientBuilder.html#method.pool_idle_timeout): Sets an optional timeout for idle sockets being kept alive.
- [pool_max_idle_per_host](https://docs.rs/reqwest/0.12.15/reqwest/struct.ClientBuilder.html#method.pool_max_idle_per_host): Sets the maximum number of idle connections per host allowed in the pool.
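
As a sketch, tuning both settings together might look like the following. The five-minute idle timeout and the per-host limit of 32 are illustrative values, not recommendations; pick values based on your workload and the server's keep-alive behavior.

```rust
use std::time::Duration;

// Illustrative values only: keep idle connections alive for 5 minutes,
// and allow at most 32 idle connections per host in the pool.
let client = reqwest::ClientBuilder::new()
    .pool_idle_timeout(Duration::from_secs(300))
    .pool_max_idle_per_host(32)
    .build()
    .expect("http client must be created");

// Swap the tuned client into the operator, as shown in the Timeout section.
let op = op.update_http_client(|_| HttpClient::with(client));
```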
@@ -0,0 +1,74 @@
- Proposal Name: (fill me in with a unique ident, `my_awesome_feature`)
- Start Date: (fill me in with today's date, YYYY-MM-DD)
- RFC PR: [apache/opendal#0000](https://github.com/apache/opendal/pull/0000)
- Tracking Issue: [apache/opendal#0000](https://github.com/apache/opendal/issues/0000)

# Summary

One paragraph explanation of the proposal.

# Motivation

Why are we doing this? What use cases does it support? What is the expected outcome?

# Guide-level explanation

Explain the proposal as if it were already included in opendal and you were teaching it to other opendal users. That generally means:

- Introducing new named concepts.
- Explaining the feature mainly in terms of examples.
- Explaining how opendal users should *think* about the feature and how it should impact the way they use opendal. It should explain the impact as concretely as possible.
- If applicable, provide sample error messages, deprecation warnings, or migration guidance.
- If applicable, describe the differences between teaching this to existing opendal users and new opendal users.

# Reference-level explanation

This is the technical portion of the RFC. Explain the design in sufficient detail that:

- Its interaction with other features is clear.
- It is reasonably clear how the feature would be implemented.
- Corner cases are dissected by example.

The section should return to the examples given in the previous section and explain more fully how the detailed proposal makes those examples work.

# Drawbacks

Why should we *not* do this?

# Rationale and alternatives

- Why is this design the best in the space of possible designs?
- What other designs have been considered, and what is the rationale for not choosing them?
- What is the impact of not doing this?

# Prior art

Discuss prior art, both the good and the bad, in relation to this proposal.
A few examples of what this can include are:

- What lessons can we learn from what other communities have done here?

This section is intended to encourage you as an author to think about the lessons from other communities and to provide readers of your RFC with a fuller picture.
If there is no prior art, that is fine - your ideas are interesting to us, whether they are brand new or an adaptation from other projects.

# Unresolved questions

- What parts of the design do you expect to resolve through the RFC process before this gets merged?
- What related issues do you consider out of scope for this RFC that could be addressed in the future independently of the solution that comes out of this RFC?

# Future possibilities

Think about what the natural extension and evolution of your proposal would be and how it would affect opendal. Try to use this section as a tool to more fully consider all possible interactions with the project in your proposal.

Also, consider how this all fits into the roadmap for the project.

This is also a good place to "dump ideas" that are out of scope for the RFC you are writing but otherwise related.

If you have tried and cannot think of any future possibilities, you may state that you cannot think of anything.

Note that having something written down in the future-possibilities section is not a reason to accept the current or a future RFC; such notes should be in the section on motivation or rationale in this or subsequent RFCs. The section merely provides additional information.
@@ -0,0 +1,111 @@
- Proposal Name: `foyer_integration`
- Start Date: 2025-07-07
- RFC PR: [apache/opendal#6370](https://github.com/apache/opendal/pull/6370)
- Tracking Issue: [apache/opendal#6372](https://github.com/apache/opendal/issues/6372)

# Summary

Integrate [*foyer*](https://github.com/foyer-rs/foyer)'s hybrid cache support into OpenDAL for a performance boost and cost reduction.

# Motivation

Object storage is the option most commonly used by OpenDAL users. In cloud object storage services like AWS S3 and GCS, request latency is often one to several orders of magnitude higher than local disks or memory, and these services are often billed per request.

Applications built on these services often introduce caching to optimize storage performance while reducing request overhead. *Foyer* provides a hybrid caching capability spanning memory and disk, offering a better balance between performance and cost, and has thus, alongside OpenDAL, become a dependency of many systems built on cloud object storage, e.g. RisingWave and SlateDB.

However, regardless of which cache component is introduced, users need to operate additional cache-related APIs apart from operating OpenDAL. If *foyer* can be integrated as an optional component of OpenDAL, it can provide users with a friendlier, more convenient, and transparent interaction model.

By introducing the *foyer* integration, users will benefit in the following ways:

- Performance boost and cost reduction by caching with both memory and disk.
- A completely transparent implementation, using the same operation APIs as before.

[RFC#6297](https://github.com/apache/opendal/pull/6297) has proposed a general cache layer design, and *foyer* could be integrated into OpenDAL as a general cache in that way. However, this may not fully leverage *foyer*'s capabilities:

- *Foyer* supports automatic cache refilling on a cache miss. The behavior differs based on the reason for the miss and the statistics of the entry (e.g. entry not in cache, disk operation throttled, age of the entry, etc.). All of these abilities are exposed through a non-standard API, `fetch()`, which other cache libraries don't have.
- *Foyer* supports request deduplication on the same key. *Foyer* ensures that for concurrent access to the same key, only one request actually accesses the disk cache or remote storage, while the other requests wait for that request to return and directly reuse its result, minimizing overhead as much as possible.

Some of these capabilities overlap with the functionality a general cache layer would provide, while others are orthogonal. An independent *foyer* integration (e.g. `FoyerLayer`) can fully leverage *foyer*'s capabilities. At the same time, this does not preclude future integration with *foyer* and other cache libraries through a general cache layer.

# Guide-level explanation

## 1. Enable feature

```toml
opendal = { version = "*", features = ["layers-foyer"] }
```

## 2. Build foyer instance

```rust
let cache = HybridCacheBuilder::new()
    .memory(10)
    .with_shards(1)
    .storage(Engine::Large(LargeEngineOptions::new()))
    .with_device_options(
        DirectFsDeviceOptions::new(dir.path())
            .with_capacity(16 * MiB as usize)
            .with_file_size(1 * MiB as usize),
    )
    .with_recover_mode(RecoverMode::None)
    .build()
    .await
    .unwrap();
```

## 3. Build OpenDAL operator with foyer layer

```rust
let op = Operator::new(Dashmap::default())
    .unwrap()
    .layer(FoyerLayer::new(cache.clone()))
    .finish();
```

## 4. Perform operations as you used to

```rust
op.write("obj-1", "hello").await.unwrap();

assert_eq!(op.list("/").await.unwrap().len(), 1);

op.read("obj-1").await.unwrap();

op.delete("obj-1").await.unwrap();

assert!(op.list("/").await.unwrap().is_empty());
```

# Reference-level explanation

As mentioned in the previous section, this RFC aims to integrate *foyer* to fully leverage its capabilities, rather than to design a generic cache layer. Therefore, a transparent integration can be achieved through a `FoyerLayer`.

`FoyerLayer` holds a reference to the inner accessor and a *foyer* instance. For operations that *foyer* supports and whose behavior is compatible, the `FoyerLayer` uses *foyer* to handle requests, accessing the inner accessor as needed. For operations that *foyer* cannot support, it automatically falls back to the inner accessor's implementation.

Here are the details of the operations that involve *foyer*:

- `read`: Read from the *foyer* hybrid cache; on a cache miss, fall back to the inner accessor's `read` operation.
  - For range reads, *foyer* fetches and caches the whole object and returns the requested range. (Future versions may let users configure whether to cache the entire object or only the part covered by the range.)
- `write`: Insert into the hybrid cache once the inner accessor's `write` operation succeeds.
- `delete`: Delete the object from the *foyer* hybrid cache regardless of whether the inner accessor's `delete` operation succeeds.

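The `read` flow above can be sketched with *foyer*'s `fetch()` API, which provides both refill-on-miss and request deduplication. This is an illustrative sketch only, not the actual `FoyerLayer` implementation: the `inner` accessor, the key/value types, and the error plumbing are assumptions, and `fetch`'s exact signature is simplified here.

```rust
// Illustrative sketch only: types and error handling are simplified,
// and `self.inner` / `self.cache` are assumed fields of the layer.
async fn cached_read(&self, path: &str) -> Result<Buffer> {
    let inner = self.inner.clone();
    let key = path.to_string();
    // foyer's `fetch` ensures concurrent requests for the same key trigger
    // at most one call to the closure; a miss is refilled from the inner
    // accessor and written back to the hybrid (memory + disk) cache.
    let entry = self
        .cache
        .fetch(key.clone(), || async move { inner.read(&key).await })
        .await?;
    Ok(entry.value().clone())
}
```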
# Drawbacks

Since we cannot tell whether other users have updated the data in the underlying storage system, introducing a cache may lead to data inconsistency. Therefore, the *foyer* integration is better suited to object storage workloads that do not update objects in place.

# Rationale and alternatives

[RFC#6297](https://github.com/apache/opendal/pull/6297) has proposed a general cache layer design, but it cannot fully leverage *foyer*'s capabilities. However, the two are not in conflict. Since #6297 has not yet been finalized, I prefer to implement a layer specifically for *foyer* first. This does not affect the future implementation of a general cache layer and can also help quickly identify potential user needs and issues.

# Prior art

*Foyer* has already been applied in systems like RisingWave, ChromaDB, and SlateDB, and we can learn from that experience. Notably, both RisingWave and SlateDB support using OpenDAL as their data access layer. This RFC will provide a smoother experience for users with similar needs.

# Unresolved questions

None.

# Future possibilities

- Based on the experience of implementing the *foyer* layer, a more general cache layer can be developed.
- Adjust the API of *foyer* to align with the usage of OpenDAL, enhancing compatibility between the two.