opendal 0.1.6.pre.rc.1-arm64-darwin-23

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (191)
  1. checksums.yaml +7 -0
  2. data/.standard.yml +20 -0
  3. data/.tool-versions +1 -0
  4. data/.yardopts +1 -0
  5. data/DEPENDENCIES.md +9 -0
  6. data/DEPENDENCIES.rust.tsv +277 -0
  7. data/Gemfile +35 -0
  8. data/README.md +159 -0
  9. data/Rakefile +149 -0
  10. data/core/CHANGELOG.md +4929 -0
  11. data/core/CONTRIBUTING.md +61 -0
  12. data/core/DEPENDENCIES.md +3 -0
  13. data/core/DEPENDENCIES.rust.tsv +185 -0
  14. data/core/LICENSE +201 -0
  15. data/core/README.md +228 -0
  16. data/core/benches/README.md +18 -0
  17. data/core/benches/ops/README.md +26 -0
  18. data/core/benches/types/README.md +9 -0
  19. data/core/benches/vs_fs/README.md +35 -0
  20. data/core/benches/vs_s3/README.md +55 -0
  21. data/core/edge/README.md +3 -0
  22. data/core/edge/file_write_on_full_disk/README.md +14 -0
  23. data/core/edge/s3_aws_assume_role_with_web_identity/README.md +18 -0
  24. data/core/edge/s3_read_on_wasm/.gitignore +3 -0
  25. data/core/edge/s3_read_on_wasm/README.md +42 -0
  26. data/core/edge/s3_read_on_wasm/webdriver.json +15 -0
  27. data/core/examples/README.md +23 -0
  28. data/core/examples/basic/README.md +15 -0
  29. data/core/examples/concurrent-upload/README.md +15 -0
  30. data/core/examples/multipart-upload/README.md +15 -0
  31. data/core/fuzz/.gitignore +5 -0
  32. data/core/fuzz/README.md +68 -0
  33. data/core/src/docs/comparisons/vs_object_store.md +183 -0
  34. data/core/src/docs/performance/concurrent_write.md +101 -0
  35. data/core/src/docs/performance/http_optimization.md +124 -0
  36. data/core/src/docs/rfcs/0000_example.md +74 -0
  37. data/core/src/docs/rfcs/0000_foyer_integration.md +111 -0
  38. data/core/src/docs/rfcs/0041_object_native_api.md +185 -0
  39. data/core/src/docs/rfcs/0044_error_handle.md +198 -0
  40. data/core/src/docs/rfcs/0057_auto_region.md +160 -0
  41. data/core/src/docs/rfcs/0069_object_stream.md +145 -0
  42. data/core/src/docs/rfcs/0090_limited_reader.md +155 -0
  43. data/core/src/docs/rfcs/0112_path_normalization.md +79 -0
  44. data/core/src/docs/rfcs/0191_async_streaming_io.md +328 -0
  45. data/core/src/docs/rfcs/0203_remove_credential.md +96 -0
  46. data/core/src/docs/rfcs/0221_create_dir.md +89 -0
  47. data/core/src/docs/rfcs/0247_retryable_error.md +87 -0
  48. data/core/src/docs/rfcs/0293_object_id.md +67 -0
  49. data/core/src/docs/rfcs/0337_dir_entry.md +191 -0
  50. data/core/src/docs/rfcs/0409_accessor_capabilities.md +67 -0
  51. data/core/src/docs/rfcs/0413_presign.md +154 -0
  52. data/core/src/docs/rfcs/0423_command_line_interface.md +268 -0
  53. data/core/src/docs/rfcs/0429_init_from_iter.md +107 -0
  54. data/core/src/docs/rfcs/0438_multipart.md +163 -0
  55. data/core/src/docs/rfcs/0443_gateway.md +73 -0
  56. data/core/src/docs/rfcs/0501_new_builder.md +111 -0
  57. data/core/src/docs/rfcs/0554_write_refactor.md +96 -0
  58. data/core/src/docs/rfcs/0561_list_metadata_reuse.md +210 -0
  59. data/core/src/docs/rfcs/0599_blocking_api.md +157 -0
  60. data/core/src/docs/rfcs/0623_redis_service.md +300 -0
  61. data/core/src/docs/rfcs/0627_split_capabilities.md +89 -0
  62. data/core/src/docs/rfcs/0661_path_in_accessor.md +126 -0
  63. data/core/src/docs/rfcs/0793_generic_kv_services.md +209 -0
  64. data/core/src/docs/rfcs/0926_object_reader.md +93 -0
  65. data/core/src/docs/rfcs/0977_refactor_error.md +151 -0
  66. data/core/src/docs/rfcs/1085_object_handler.md +73 -0
  67. data/core/src/docs/rfcs/1391_object_metadataer.md +110 -0
  68. data/core/src/docs/rfcs/1398_query_based_metadata.md +125 -0
  69. data/core/src/docs/rfcs/1420_object_writer.md +147 -0
  70. data/core/src/docs/rfcs/1477_remove_object_concept.md +159 -0
  71. data/core/src/docs/rfcs/1735_operation_extension.md +117 -0
  72. data/core/src/docs/rfcs/2083_writer_sink_api.md +106 -0
  73. data/core/src/docs/rfcs/2133_append_api.md +88 -0
  74. data/core/src/docs/rfcs/2299_chain_based_operator_api.md +99 -0
  75. data/core/src/docs/rfcs/2602_object_versioning.md +138 -0
  76. data/core/src/docs/rfcs/2758_merge_append_into_write.md +79 -0
  77. data/core/src/docs/rfcs/2774_lister_api.md +66 -0
  78. data/core/src/docs/rfcs/2779_list_with_metakey.md +143 -0
  79. data/core/src/docs/rfcs/2852_native_capability.md +58 -0
  80. data/core/src/docs/rfcs/2884_merge_range_read_into_read.md +80 -0
  81. data/core/src/docs/rfcs/3017_remove_write_copy_from.md +94 -0
  82. data/core/src/docs/rfcs/3197_config.md +237 -0
  83. data/core/src/docs/rfcs/3232_align_list_api.md +69 -0
  84. data/core/src/docs/rfcs/3243_list_prefix.md +128 -0
  85. data/core/src/docs/rfcs/3356_lazy_reader.md +111 -0
  86. data/core/src/docs/rfcs/3526_list_recursive.md +59 -0
  87. data/core/src/docs/rfcs/3574_concurrent_stat_in_list.md +80 -0
  88. data/core/src/docs/rfcs/3734_buffered_reader.md +64 -0
  89. data/core/src/docs/rfcs/3898_concurrent_writer.md +66 -0
  90. data/core/src/docs/rfcs/3911_deleter_api.md +165 -0
  91. data/core/src/docs/rfcs/4382_range_based_read.md +213 -0
  92. data/core/src/docs/rfcs/4638_executor.md +215 -0
  93. data/core/src/docs/rfcs/5314_remove_metakey.md +120 -0
  94. data/core/src/docs/rfcs/5444_operator_from_uri.md +162 -0
  95. data/core/src/docs/rfcs/5479_context.md +140 -0
  96. data/core/src/docs/rfcs/5485_conditional_reader.md +112 -0
  97. data/core/src/docs/rfcs/5495_list_with_deleted.md +81 -0
  98. data/core/src/docs/rfcs/5556_write_returns_metadata.md +121 -0
  99. data/core/src/docs/rfcs/5871_read_returns_metadata.md +112 -0
  100. data/core/src/docs/rfcs/6189_remove_native_blocking.md +106 -0
  101. data/core/src/docs/rfcs/6209_glob_support.md +132 -0
  102. data/core/src/docs/rfcs/6213_options_api.md +142 -0
  103. data/core/src/docs/rfcs/README.md +62 -0
  104. data/core/src/docs/upgrade.md +1556 -0
  105. data/core/src/services/aliyun_drive/docs.md +61 -0
  106. data/core/src/services/alluxio/docs.md +45 -0
  107. data/core/src/services/azblob/docs.md +77 -0
  108. data/core/src/services/azdls/docs.md +73 -0
  109. data/core/src/services/azfile/docs.md +65 -0
  110. data/core/src/services/b2/docs.md +54 -0
  111. data/core/src/services/cacache/docs.md +38 -0
  112. data/core/src/services/cloudflare_kv/docs.md +21 -0
  113. data/core/src/services/cos/docs.md +55 -0
  114. data/core/src/services/d1/docs.md +48 -0
  115. data/core/src/services/dashmap/docs.md +38 -0
  116. data/core/src/services/dbfs/docs.md +57 -0
  117. data/core/src/services/dropbox/docs.md +64 -0
  118. data/core/src/services/etcd/docs.md +45 -0
  119. data/core/src/services/foundationdb/docs.md +42 -0
  120. data/core/src/services/fs/docs.md +49 -0
  121. data/core/src/services/ftp/docs.md +42 -0
  122. data/core/src/services/gcs/docs.md +76 -0
  123. data/core/src/services/gdrive/docs.md +65 -0
  124. data/core/src/services/ghac/docs.md +84 -0
  125. data/core/src/services/github/docs.md +52 -0
  126. data/core/src/services/gridfs/docs.md +46 -0
  127. data/core/src/services/hdfs/docs.md +140 -0
  128. data/core/src/services/hdfs_native/docs.md +35 -0
  129. data/core/src/services/http/docs.md +45 -0
  130. data/core/src/services/huggingface/docs.md +61 -0
  131. data/core/src/services/ipfs/docs.md +45 -0
  132. data/core/src/services/ipmfs/docs.md +14 -0
  133. data/core/src/services/koofr/docs.md +51 -0
  134. data/core/src/services/lakefs/docs.md +62 -0
  135. data/core/src/services/memcached/docs.md +47 -0
  136. data/core/src/services/memory/docs.md +36 -0
  137. data/core/src/services/mini_moka/docs.md +19 -0
  138. data/core/src/services/moka/docs.md +42 -0
  139. data/core/src/services/mongodb/docs.md +49 -0
  140. data/core/src/services/monoiofs/docs.md +46 -0
  141. data/core/src/services/mysql/docs.md +47 -0
  142. data/core/src/services/obs/docs.md +54 -0
  143. data/core/src/services/onedrive/docs.md +115 -0
  144. data/core/src/services/opfs/docs.md +18 -0
  145. data/core/src/services/oss/docs.md +74 -0
  146. data/core/src/services/pcloud/docs.md +51 -0
  147. data/core/src/services/persy/docs.md +43 -0
  148. data/core/src/services/postgresql/docs.md +47 -0
  149. data/core/src/services/redb/docs.md +41 -0
  150. data/core/src/services/redis/docs.md +43 -0
  151. data/core/src/services/rocksdb/docs.md +54 -0
  152. data/core/src/services/s3/compatible_services.md +126 -0
  153. data/core/src/services/s3/docs.md +244 -0
  154. data/core/src/services/seafile/docs.md +54 -0
  155. data/core/src/services/sftp/docs.md +49 -0
  156. data/core/src/services/sled/docs.md +39 -0
  157. data/core/src/services/sqlite/docs.md +46 -0
  158. data/core/src/services/surrealdb/docs.md +54 -0
  159. data/core/src/services/swift/compatible_services.md +53 -0
  160. data/core/src/services/swift/docs.md +52 -0
  161. data/core/src/services/tikv/docs.md +43 -0
  162. data/core/src/services/upyun/docs.md +51 -0
  163. data/core/src/services/vercel_artifacts/docs.md +40 -0
  164. data/core/src/services/vercel_blob/docs.md +45 -0
  165. data/core/src/services/webdav/docs.md +49 -0
  166. data/core/src/services/webhdfs/docs.md +90 -0
  167. data/core/src/services/yandex_disk/docs.md +45 -0
  168. data/core/tests/behavior/README.md +77 -0
  169. data/core/tests/data/normal_dir/.gitkeep +0 -0
  170. data/core/tests/data/normal_file.txt +1041 -0
  171. data/core/tests/data/special_dir !@#$%^&()_+-=;',/.gitkeep +0 -0
  172. data/core/tests/data/special_file !@#$%^&()_+-=;',.txt +1041 -0
  173. data/core/users.md +13 -0
  174. data/extconf.rb +24 -0
  175. data/lib/opendal.rb +25 -0
  176. data/lib/opendal_ruby/entry.rb +35 -0
  177. data/lib/opendal_ruby/io.rb +70 -0
  178. data/lib/opendal_ruby/metadata.rb +44 -0
  179. data/lib/opendal_ruby/opendal_ruby.bundle +0 -0
  180. data/lib/opendal_ruby/operator.rb +29 -0
  181. data/lib/opendal_ruby/operator_info.rb +26 -0
  182. data/opendal.gemspec +91 -0
  183. data/test/blocking_op_test.rb +112 -0
  184. data/test/capability_test.rb +42 -0
  185. data/test/io_test.rb +172 -0
  186. data/test/lister_test.rb +77 -0
  187. data/test/metadata_test.rb +78 -0
  188. data/test/middlewares_test.rb +46 -0
  189. data/test/operator_info_test.rb +35 -0
  190. data/test/test_helper.rb +36 -0
  191. metadata +240 -0
@@ -0,0 +1,183 @@
# OpenDAL vs object_store

> NOTE: This document is written by OpenDAL's maintainers and has not been reviewed by
> object_store's maintainers, so it may not be entirely objective.

## About object_store

[object_store](https://crates.io/crates/object_store) is

> A focused, easy to use, idiomatic, high performance, `async` object store library interacting with object stores.

It was initially developed for [InfluxDB IOx](https://github.com/influxdata/influxdb_iox/) and later split out and donated to [Apache Arrow](https://arrow.apache.org/).

## Similarities

### Language

Yes, of course. Both `opendal` and `object_store` are developed in [Rust](https://www.rust-lang.org/), a language empowering everyone to build reliable and efficient software.

### License

Both `opendal` and `object_store` are licensed under [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0).

### Owner

`object_store` is part of `Apache Arrow`, which means it's hosted and maintained by the [Apache Software Foundation](https://www.apache.org/).

`opendal` is now also hosted by the [Apache Software Foundation](https://www.apache.org/).
### Domain

Both `opendal` and `object_store` can be used to access data stored on object storage services. The primary users of both projects are cloud-native databases:

- `opendal` is mainly used by:
  - [databend](https://github.com/datafuselabs/databend): A modern Elasticity and Performance cloud data warehouse
  - [GreptimeDB](https://github.com/GreptimeTeam/greptimedb): An open-source, cloud-native, distributed time-series database.
  - [mozilla/sccache](https://github.com/mozilla/sccache/): sccache is ccache with cloud storage
  - [risingwave](https://github.com/risingwavelabs/risingwave): A Distributed SQL Database for Stream Processing
  - [Vector](https://github.com/vectordotdev/vector): A high-performance observability data pipeline.
- `object_store` is mainly used by:
  - [datafusion](https://github.com/apache/arrow-datafusion): Apache Arrow DataFusion SQL Query Engine
  - [InfluxDB IOx](https://github.com/influxdata/influxdb_iox/): The new core of InfluxDB is written in Rust on top of Apache Arrow.

## Differences

### Vision

`opendal` is an Open Data Access Layer that accesses data freely, painlessly, and efficiently. `object_store` is focused on async object store support.

You will see how these different visions lead to very different routes.

### Design

`object_store` exposes a trait called [`ObjectStore`](https://docs.rs/object_store/latest/object_store/trait.ObjectStore.html) to users.

Users need to build a `dyn ObjectStore` and operate on it directly:

```rust
let object_store: Arc<dyn ObjectStore> = Arc::new(get_object_store());
let path: Path = "data/file01.parquet".try_into()?;
let stream = object_store
    .get(&path)
    .await?
    .into_stream();
```

`opendal` has a similar trait called [`Access`][crate::raw::Access].

But `opendal` doesn't expose this trait to end users directly. Instead, `opendal` exposes a new struct called [`Operator`][crate::Operator] and builds its public API on it.

```rust
let op: Operator = Operator::from_env(Scheme::S3)?;
let r = op.reader("data/file01.parquet").await?;
```

### Interception

Both `object_store` and `opendal` provide a mechanism to intercept operations.

`object_store` calls these `Adapters`:

```rust
let object_store = ThrottledStore::new(get_object_store(), ThrottleConfig::default());
```

`opendal` calls this mechanism a [`Layer`](crate::raw::Layer):

```rust
let op = op.layer(TracingLayer).layer(MetricsLayer);
```

At the time of writing:

object_store (`v0.5.0`) supports:

- ThrottledStore: Rate Throttling
- LimitStore: Concurrent Request Limit

opendal supports:

- ImmutableIndexLayer: immutable in-memory index.
- LoggingLayer: logging.
- MetadataCacheLayer: metadata cache.
- ContentCacheLayer: content data cache.
- MetricsLayer: metrics
- RetryLayer: retry
- SubdirLayer: switch directories without changing the original operator.
- TracingLayer: tracing
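
The layer mechanism is essentially the decorator pattern: a layer wraps an accessor and returns a new accessor with extra behavior. As a hypothetical sketch only (this is not OpenDAL's actual `Layer` or `Access` API, and the names here are invented for illustration):

```rust
/// Hypothetical sketch of the interception idea (not OpenDAL's actual API):
/// a layer wraps an accessor and returns a new accessor with extra behavior.
trait Accessor {
    fn read(&self, path: &str) -> String;
}

/// A trivial in-memory service.
struct Memory;
impl Accessor for Memory {
    fn read(&self, path: &str) -> String {
        format!("data at {path}")
    }
}

/// A logging layer decorates any accessor.
struct Logging<A: Accessor>(A);
impl<A: Accessor> Accessor for Logging<A> {
    fn read(&self, path: &str) -> String {
        // A real layer would emit a log line; here we tag the result instead.
        format!("[logged] {}", self.0.read(path))
    }
}
```

Because each layer takes and returns an accessor, layers compose by simple nesting, which is what the chained `op.layer(...).layer(...)` call above expresses.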
### Services

`opendal` and `object_store` have different visions, so they support different sets of services:

| service | opendal         | object_store                            |
|---------|-----------------|-----------------------------------------|
| azblob  | Y               | Y                                       |
| fs      | Y               | Y                                       |
| ftp     | Y               | N                                       |
| gcs     | Y               | Y                                       |
| hdfs    | Y               | Y *(via [datafusion-objectstore-hdfs])* |
| http    | Y *(read only)* | N                                       |
| ipfs    | Y *(read only)* | N                                       |
| ipmfs   | Y               | N                                       |
| memory  | Y               | Y                                       |
| obs     | Y               | N                                       |
| s3      | Y               | Y                                       |

opendal has an idea called [`Capability`][crate::Capability], so its services may have different capability sets. For example, opendal's `http` and `ipfs` services are read only.
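
To make the capability idea concrete, here is a hypothetical sketch (not OpenDAL's actual `Capability` struct, whose real field set is much larger): each service advertises which operations it natively supports, and callers can check before issuing a request.

```rust
/// Hypothetical sketch of the capability idea (not OpenDAL's actual struct):
/// each service advertises which operations it natively supports.
#[derive(Default)]
struct Capability {
    read: bool,
    write: bool,
    list: bool,
}

/// A read-only service like `http` would advertise only `read`.
fn http_capability() -> Capability {
    Capability {
        read: true,
        ..Default::default()
    }
}
```

This lets a single unified API surface cover very different backends while still reporting, per service, what will actually work.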
### Features

`opendal` and `object_store` have different visions, so they have different feature sets:

| opendal   | object_store         | notes                                           |
|-----------|----------------------|-------------------------------------------------|
| metadata  | -                    | get some metadata from the underlying storage   |
| create    | put                  | -                                               |
| read      | get                  | -                                               |
| read      | get_range            | -                                               |
| -         | get_ranges           | opendal doesn't support reading multiple ranges |
| write     | put                  | -                                               |
| stat      | head                 | -                                               |
| delete    | delete               | -                                               |
| -         | list                 | opendal doesn't support list with prefix        |
| list      | list_with_delimiter  | -                                               |
| -         | copy                 | -                                               |
| -         | copy_if_not_exists   | -                                               |
| -         | rename               | -                                               |
| -         | rename_if_not_exists | -                                               |
| presign   | -                    | get a presigned URL for an object               |
| multipart | multipart            | both support it, but the APIs differ            |
| blocking  | -                    | opendal supports a blocking API                 |
## Demo

The most straightforward complete demo of reading a file from S3:

`opendal`

```rust
let mut builder = S3::default();
builder.bucket("example");
builder.access_key_id("access_key_id");
builder.secret_access_key("secret_access_key");

let store = Operator::new(builder)?.finish();
let r = store.reader("data.parquet").await?;
```

`object_store`

```rust
let builder = AmazonS3Builder::new()
    .with_bucket_name("example")
    .with_access_key_id("access_key_id")
    .with_secret_access_key("secret_access_key");

let store = Arc::new(builder.build()?);
let path: Path = "data.parquet".try_into().unwrap();
let stream = store.get(&path).await?.into_stream();
```

[datafusion-objectstore-hdfs]: https://github.com/datafusion-contrib/datafusion-objectstore-hdfs/
@@ -0,0 +1,101 @@
# Concurrent Write

OpenDAL writes data sequentially by default.

```rust
# use opendal::Operator;
# async fn test(op: Operator) -> opendal::Result<()> {
let w = op.writer("test.txt").await?;
w.write(data1).await?;
w.write(data2).await?;
w.close().await?;
# Ok(())
# }
```

Most of the time, this can't maximize write performance due to the limitations of a single connection. We can perform concurrent writes to improve performance.

```rust
# use opendal::Operator;
# async fn test(op: Operator) -> opendal::Result<()> {
let w = op
    .writer_with("test.txt")
    .concurrent(8)
    .await?;

w.write(data1).await?;
w.write(data2).await?;
w.close().await?;
# Ok(())
# }
```

After setting `concurrent`, OpenDAL will attempt to write the specified file concurrently. The maximum level of concurrency is determined by the `concurrent` parameter. By default, it is set to 1, indicating a sequential write operation.

Under the hood, OpenDAL maintains a task queue to manage concurrent writes. It spawns asynchronous tasks in the background using the [`Executor`][crate::Executor] and tracks the status of each task. The task queue is flushed when the writer is closed, allowing data to be written concurrently without blocking the main thread.

In our example here, the write of `data1` will not block the write of `data2`; the two writes will be executed concurrently.

The underlying implementation of concurrent writes may vary depending on the backend. For instance, the `s3` backend leverages the S3 Multipart Upload API to handle concurrent writes, while the `azblob` backend utilizes the Block API for the same purpose.

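The task-queue idea above can be modeled in a few lines. This is a hypothetical std-only sketch (not OpenDAL's actual implementation, which uses async tasks via `Executor`): the writer keeps at most `concurrent` uploads in flight, applies back-pressure by waiting on the oldest task when the queue is full, and flushes everything on close.

```rust
use std::thread;

/// Hypothetical sketch (not OpenDAL's actual code): a writer that keeps at
/// most `concurrent` chunk uploads in flight, modeled with OS threads.
struct ConcurrentWriter {
    concurrent: usize,
    in_flight: Vec<thread::JoinHandle<usize>>, // each task returns bytes uploaded
}

impl ConcurrentWriter {
    fn new(concurrent: usize) -> Self {
        Self { concurrent, in_flight: Vec::new() }
    }

    /// Queue a chunk; if the queue is full, wait for the oldest task first.
    /// Returns how many bytes were flushed while waiting.
    fn write(&mut self, chunk: Vec<u8>) -> usize {
        let mut flushed = 0;
        if self.in_flight.len() >= self.concurrent {
            // Back-pressure: wait for the oldest in-flight upload.
            flushed += self.in_flight.remove(0).join().unwrap();
        }
        self.in_flight.push(thread::spawn(move || {
            // A real backend would issue e.g. an S3 UploadPart call here.
            chunk.len()
        }));
        flushed
    }

    /// Close flushes the whole task queue, like `Writer::close` in the text.
    fn close(&mut self) -> usize {
        self.in_flight.drain(..).map(|h| h.join().unwrap()).sum()
    }
}
```

Note how `write` never blocks while the queue has room, which is exactly why the write of `data1` does not block the write of `data2`.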
## Tuning

There are two parameters that can be tuned to optimize concurrent writes:

- `concurrent`: This parameter controls the maximum number of concurrent writes. The default value is 1.
- `chunk`: This parameter specifies the size of each chunk of data to be written. The default value varies across storage services.

### `concurrent`

The most important thing to understand is that `concurrent` is not a strict limit. It represents the maximum number of concurrent writes that OpenDAL will attempt to perform. The actual number of concurrent writes may be lower, depending on the input data throughput.

For example, if you set `concurrent` to 8, OpenDAL will attempt to perform up to 8 concurrent writes. However, if the input data throughput is low, it might only carry out 2 or 3 concurrent writes at a time, as there isn't enough data to keep all 8 writes active.

The best value for `concurrent` depends on the specific use case and the underlying storage service. In general, a higher value can lead to better performance, but it depends heavily on the storage service and the network conditions. For example, if the storage service is robust and bandwidth is sufficient, you may observe a linear increase in performance with higher `concurrent` values. However, if the storage service has request limits or the network is nearly saturated, increasing `concurrent` may not improve performance and could even degrade it due to repeated retries on errors.

It's recommended to start with a lower value like `2` or `4` and gradually increase it while monitoring performance and resource usage.

### `chunk`

The `chunk` parameter specifies the size of each chunk of data to be written. A larger chunk size can improve performance, but it may also increase memory usage. The default value varies across storage services.

For example, S3 uses `5MiB` as the default chunk size, which is also the minimum chunk size for S3. If you set a smaller chunk size, OpenDAL will automatically adjust it to `5MiB`.

The best value for `chunk` depends on the specific use case and the underlying storage service. For most object storage services, a chunk size of `8MiB` or larger is recommended. However, if you're working with smaller files or have limited memory resources, you may want to use a smaller chunk size.

Please note that if you input small chunks of data, OpenDAL will attempt to merge them into a larger chunk before writing. This helps avoid the overhead of writing numerous small chunks, which can negatively affect performance.

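The chunk-size clamping and small-write merging described above can be sketched as follows. This is a hypothetical std-only illustration (not OpenDAL's actual buffering code): the requested chunk size is raised to the service minimum, small writes accumulate in a buffer, and full chunks are flushed as they fill up, with a final short chunk emitted on close.

```rust
/// Hypothetical sketch (not OpenDAL's actual code): buffer small writes and
/// emit fixed-size chunks, clamping the chunk size to a service minimum.
struct ChunkBuffer {
    chunk_size: usize,
    buf: Vec<u8>,
    emitted: Vec<usize>, // sizes of chunks "uploaded" so far
}

impl ChunkBuffer {
    /// Clamp the requested chunk size to the service minimum (5 MiB for S3).
    fn new(requested: usize, service_min: usize) -> Self {
        Self {
            chunk_size: requested.max(service_min),
            buf: Vec::new(),
            emitted: Vec::new(),
        }
    }

    /// Merge a small write into the buffer; flush every full chunk.
    fn write(&mut self, data: &[u8]) {
        self.buf.extend_from_slice(data);
        while self.buf.len() >= self.chunk_size {
            let chunk: Vec<u8> = self.buf.drain(..self.chunk_size).collect();
            self.emitted.push(chunk.len());
        }
    }

    /// On close, flush whatever remains as a final (possibly short) chunk.
    fn close(&mut self) {
        if !self.buf.is_empty() {
            let len = self.buf.len();
            self.buf.clear();
            self.emitted.push(len);
        }
    }
}
```

The small numbers here stand in for megabyte-scale sizes; the point is that many tiny writes still produce few, uniformly sized uploads.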
## Usage

To upload a large in-memory chunk concurrently:

```rust
# use opendal::Operator;
# async fn test(op: Operator) -> opendal::Result<()> {
let data = vec![0; 10 * 1024 * 1024]; // 10MiB
let _ = op.write_with("test.txt", data).concurrent(4).await?;
# Ok(())
# }
```

`concurrent` and `chunk` also work in [`into_sink`][crate::Writer::into_sink], [`into_bytes_sink`][crate::Writer::into_bytes_sink] and [`into_futures_async_write`][crate::Writer::into_futures_async_write]:

```rust
use std::io;

use bytes::Bytes;
use futures::SinkExt;
use opendal::{Buffer, Operator};

async fn test(op: Operator) -> io::Result<()> {
    let mut w = op
        .writer_with("hello.txt")
        .concurrent(8)
        .chunk(256)
        .await?
        .into_sink();
    let bs = "Hello, World!".as_bytes();
    w.send(Buffer::from(bs)).await?;
    w.close().await?;

    Ok(())
}
```
@@ -0,0 +1,124 @@
# HTTP Optimization

All OpenDAL HTTP-based storage services use the same [HttpClient][crate::raw::HttpClient] abstraction. This design offers users a unified interface for configuring HTTP clients. The default HTTP client is [reqwest](https://crates.io/crates/reqwest), a popular and widely used HTTP client library in Rust.

Many of the services supported by OpenDAL are HTTP-based. This guide aims to provide optimization tips for using HTTP-based storage services. While these tips are also applicable to other HTTP clients, the configuration methods may vary.

Please note that the following optimizations are based on experience and may not be suitable for all scenarios. The most effective way to determine the optimal configuration is to test it in your specific environment.

## HTTP/1.1

According to benchmarks from OpenDAL users, `HTTP/1.1` is generally faster than `HTTP/2` for large-scale download and upload operations.

`reqwest` tends to maintain only a single TCP connection for `HTTP/2`, relying on its built-in multiplexing capabilities. While this works well for small files, such as web page downloads, the design is not ideal for handling large files or massive file-scan OLAP workloads.

When `HTTP/2` is disabled, `reqwest` falls back to `HTTP/1.1` and utilizes its default connection pool. This approach is better suited for large files, as it allows multiple TCP connections to be opened and used concurrently, significantly improving performance for large file downloads and uploads.

If your workloads involve large files or require high throughput, and are not sensitive to latency, consider disabling `HTTP/2` in your configuration.

```rust
let client = reqwest::ClientBuilder::new()
    // Disable http2 for better performance.
    .http1_only()
    .build()
    .expect("http client must be created");

// Update the http client in the operator.
let op = op.update_http_client(|_| HttpClient::with(client));
```

## DNS Caching

By default, `reqwest` uses the DNS resolver provided by Rust's standard library, which is backed by the `getaddrinfo` system call under the hood. This system call does not cache results, meaning that each time you make a request to a new domain, a DNS lookup will be performed.

Under high-throughput workloads, this can cause significant performance degradation, as each request incurs the overhead of a DNS lookup. It can also negatively affect the resolver, potentially overwhelming it with the volume of requests. In extreme cases, this may amount to a DoS attack on the resolver, rendering it unresponsive.

To mitigate this issue, you can enable DNS caching in `reqwest` via the `hickory-dns` feature. This feature provides a more efficient DNS resolver that caches results.

```rust
let client = reqwest::ClientBuilder::new()
    // Enable hickory dns for dns caching and async dns resolve.
    .hickory_dns(true)
    .build()
    .expect("http client must be created");

// Update the http client in the operator.
let op = op.update_http_client(|_| HttpClient::with(client));
```

The default DNS cache settings from `hickory_dns` are generally sufficient for most workloads. However, if you have specific requirements, such as sharing the same DNS cache across multiple HTTP clients or configuring the DNS cache size, you can use the `Xuanwo/reqwest-hickory-resolver` crate to set up a custom DNS resolver.

```rust
/// Global shared hickory resolver.
static GLOBAL_HICKORY_RESOLVER: LazyLock<Arc<HickoryResolver>> = LazyLock::new(|| {
    let mut opts = ResolverOpts::default();
    // Only query for the ipv4 address.
    opts.ip_strategy = LookupIpStrategy::Ipv4Only;
    // Use larger cache size for better performance.
    opts.cache_size = 1024;
    // Positive TTL is set to 5 minutes.
    opts.positive_min_ttl = Some(Duration::from_secs(300));
    // Negative TTL is set to 1 minute.
    opts.negative_min_ttl = Some(Duration::from_secs(60));

    Arc::new(
        HickoryResolver::default()
            // Always shuffle the DNS results for better performance.
            .with_shuffle(true)
            .with_options(opts),
    )
});

let client = reqwest::ClientBuilder::new()
    // Use our global hickory resolver instead.
    .dns_resolver(GLOBAL_HICKORY_RESOLVER.clone())
    .build()
    .expect("http client must be created");

// Update the http client in the operator.
let op = op.update_http_client(|_| HttpClient::with(client));
```

The `ResolverOpts` struct has many options that can be configured. For a complete list, please refer to the [hickory_resolver documentation](https://docs.rs/hickory-resolver/latest/hickory_resolver/config/struct.ResolverOpts.html).

Here is a summary of the most commonly used options:

- `ip_strategy`: `hickory_resolver` defaults to the `Ipv4thenIpv6` strategy, which first queries for the IPv4 address and then the IPv6 address. This is generally a good strategy for most workloads. However, if you only need IPv4 addresses, you can set this option to `Ipv4Only` to avoid unnecessary DNS lookups.
- `cache_size`: This option controls the size of the DNS cache. A larger cache size can improve performance, but it may also increase memory usage. The default value is `32`.
- `positive_min_ttl` and `negative_min_ttl`: These options control the minimum TTL for positive and negative DNS responses. A longer TTL can improve performance, but it may also increase the risk of stale DNS records. The default value is `None`. Some misbehaving DNS servers may return a TTL of `0` even when the record is valid. In this case, you can set a longer minimum TTL to avoid unnecessary DNS lookups.

In addition to the options mentioned above, `Xuanwo/reqwest-hickory-resolver` also offers a `shuffle` option. This setting determines whether the DNS results are shuffled before being returned. Shuffling can enhance performance by distributing the load more evenly across multiple IP addresses.

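To make the `positive_min_ttl` behavior concrete, here is a hypothetical sketch (not hickory-resolver's actual code): the effective cache lifetime of an answer is the server-provided TTL, raised to the configured minimum, so that zero-TTL answers from misbehaving servers still get cached briefly.

```rust
use std::time::Duration;

/// Hypothetical sketch of minimum-TTL clamping (not hickory-resolver's
/// actual code): never cache for less than the configured minimum.
fn effective_ttl(server_ttl: Duration, min_ttl: Option<Duration>) -> Duration {
    match min_ttl {
        // Clamp: a server TTL of 0 is raised to the minimum.
        Some(min) => server_ttl.max(min),
        // No minimum configured: trust the server TTL as-is.
        None => server_ttl,
    }
}
```

With `positive_min_ttl = Some(300s)`, a record the server tags with TTL 0 is still cached for 5 minutes, while a record with TTL 600s is left untouched.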
## Timeout

`reqwest` doesn't set a default timeout for HTTP requests. This means that if a request hangs or takes too long to complete, it can block the entire process, leading to performance degradation or even application crashes.

It's recommended to set a connect timeout for HTTP requests to prevent this issue.

```rust
let client = reqwest::ClientBuilder::new()
    // Set a connect timeout of 5 seconds.
    .connect_timeout(Duration::from_secs(5))
    .build()
    .expect("http client must be created");

// Update the http client in the operator.
let op = op.update_http_client(|_| HttpClient::with(client));
```

It's also recommended to use opendal's [`TimeoutLayer`][crate::layers::TimeoutLayer] to prevent slow requests from hanging forever. This layer will automatically cancel a request if it takes too long to complete.

```rust
let op = op.layer(TimeoutLayer::new());
```

## Connection Pool

`reqwest` uses a connection pool to manage HTTP connections. This allows multiple requests to share the same connection, reducing the overhead of establishing new connections for each request.

By default, the connection pool is unlimited, allowing `reqwest` to open as many connections as needed. The default keep-alive timeout is 90 seconds, meaning any connection idle for longer than that will be closed.

You can tune those settings via:

- [pool_idle_timeout](https://docs.rs/reqwest/0.12.15/reqwest/struct.ClientBuilder.html#method.pool_idle_timeout): Set an optional timeout for idle sockets being kept alive.
- [pool_max_idle_per_host](https://docs.rs/reqwest/0.12.15/reqwest/struct.ClientBuilder.html#method.pool_max_idle_per_host): Set the maximum number of idle connections per host allowed in the pool.
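
The keep-alive behavior described above amounts to a simple eviction rule. A hypothetical std-only sketch (not reqwest's actual pool implementation): connections that have been idle longer than the timeout are dropped, the rest stay available for reuse.

```rust
use std::time::Duration;

/// Hypothetical sketch (not reqwest's actual code) of idle-connection
/// eviction: drop any connection idle longer than the keep-alive timeout.
struct Pool {
    idle_timeout: Duration,
    idle_for: Vec<Duration>, // how long each pooled connection has been idle
}

impl Pool {
    /// Evict expired connections; return how many remain reusable.
    fn evict(&mut self) -> usize {
        self.idle_for.retain(|&idle| idle < self.idle_timeout);
        self.idle_for.len()
    }
}
```

With the default 90-second timeout, a connection idle for 120 seconds would be closed on the next pool sweep, while one idle for 30 seconds would be kept and reused.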
@@ -0,0 +1,74 @@
1
+ - Proposal Name: (fill me in with a unique ident, `my_awesome_feature`)
2
+ - Start Date: (fill me in with today's date, YYYY-MM-DD)
3
+ - RFC PR: [apache/opendal#0000](https://github.com/apache/opendal/pull/0000)
4
+ - Tracking Issue: [apache/opendal#0000](https://github.com/apache/opendal/issues/0000)
5
+
6
+ # Summary
7
+
8
+ One paragraph explanation of the proposal.
9
+
10
+ # Motivation
11
+
12
+ Why are we doing this? What use cases does it support? What is the expected outcome?
13
+
14
+ # Guide-level explanation
15
+
+ Explain the proposal as if it were already included in opendal and you were teaching it to other opendal users. That generally means:
+
+ - Introducing new named concepts.
+ - Explaining the feature mainly in terms of examples.
+ - Explaining how opendal users should *think* about the feature and how it should impact the way they use opendal. It should explain the impact as concretely as possible.
+ - If applicable, provide sample error messages, deprecation warnings, or migration guidance.
+ - If applicable, describe the differences between teaching this to existing opendal users and to new opendal users.
+
+ # Reference-level explanation
+
+ This is the technical portion of the RFC. Explain the design in sufficient detail that:
+
+ - Its interaction with other features is clear.
+ - It is reasonably clear how the feature would be implemented.
+ - Corner cases are dissected by example.
+
+ The section should return to the examples given in the previous section and explain more fully how the detailed proposal makes those examples work.
+
+ # Drawbacks
+
+ Why should we *not* do this?
+
+ # Rationale and alternatives
+
+ - Why is this design the best in the space of possible designs?
+ - What other designs have been considered, and what is the rationale for not choosing them?
+ - What is the impact of not doing this?
+
+ # Prior art
+
+ Discuss prior art, both the good and the bad, in relation to this proposal.
+ A few examples of what this can include are:
+
+ - What lessons can we learn from what other communities have done here?
+
+ This section is intended to encourage you as an author to think about the lessons from other communities and to provide readers of your RFC with a fuller picture.
+ If there is no prior art, that is fine - your ideas are interesting to us, whether they are brand new or an adaptation from other projects.
+
+ # Unresolved questions
+
+ - What parts of the design do you expect to resolve through the RFC process before this gets merged?
+ - What related issues do you consider out of scope for this RFC that could be addressed in the future independently of the solution that comes out of this RFC?
+
+ # Future possibilities
+
+ Think about what the natural extension and evolution of your proposal would be and how it would affect opendal. Try to use this section as a tool to more fully consider all possible interactions with the project in your proposal.
+
+ Also, consider how this all fits into the roadmap for the project.
+
+ This is also a good place to "dump ideas" if they are out of scope for the RFC you are writing but otherwise related.
+
+ If you have tried and cannot think of any future possibilities, you may state that you cannot think of anything.
+
+ Note that having something written down in the future-possibilities section is not a reason to accept the current or a future RFC; such notes should be in the section on motivation or rationale in this or subsequent RFCs. The section merely provides additional information.
@@ -0,0 +1,111 @@
+ - Proposal Name: `foyer_integration`
+ - Start Date: 2025-07-07
+ - RFC PR: [apache/opendal#6370](https://github.com/apache/opendal/pull/6370)
+ - Tracking Issue: [apache/opendal#6372](https://github.com/apache/opendal/issues/6372)
+
+ # Summary
+
+ Integrate [*foyer*](https://github.com/foyer-rs/foyer) hybrid cache support into OpenDAL for a performance boost and cost reduction.
+
+ # Motivation
+
+ Object storage is the option most commonly used by OpenDAL users. In cloud object storage services like AWS S3 and GCS, request latency is often one to several orders of magnitude higher than that of local disks or memory, and these services are often billed by the number of requests.
+
+ Applications built on these cloud object storage services often need to introduce caching to optimize storage performance while reducing request overhead. *Foyer* provides a hybrid caching capability spanning memory and disk, offering a better balance between performance and cost, and has thus become a dependency, alongside OpenDAL, for many systems built on cloud object storage, e.g. RisingWave and SlateDB.
+
+ However, regardless of which cache component is introduced, users need to operate additional cache-related APIs apart from operating OpenDAL. If *foyer* can be integrated as an optional component into OpenDAL, it can provide users with a friendlier, more convenient, and transparent interaction method.
+
+ By introducing *foyer* integration, users will benefit in the following ways:
+
+ - Performance boost and cost reduction by caching with both memory and disk.
+ - A completely transparent implementation, using the same operation APIs as before.
+
+ [RFC#6297](https://github.com/apache/opendal/pull/6297) has mentioned a general cache layer design, and *foyer* could be integrated into OpenDAL as a general cache in this way. However, this may not fully leverage *foyer*'s capabilities:
+
+ - *Foyer* supports automatic cache refilling on cache miss. The behavior differs based on the reason for the cache miss and the statistics of the entry (e.g. entry not in cache, disk operation throttled, age of entry, etc). All of these abilities are supported by a non-standard API, `fetch()`, which other cache libraries don't have.
+ - *Foyer* supports request deduplication on the same key. *Foyer* ensures that of concurrent accesses to the same key, only one request actually hits the disk cache or remote storage, while the other requests wait for that request to return and directly reuse its result, minimizing overhead as much as possible.
+
+ These capabilities overlap with some of the functionality provided by a general cache layer, while others are orthogonal. An independent *foyer* integration (e.g. `FoyerLayer`) can fully leverage *foyer*'s capabilities. At the same time, this will not prevent future integration with *foyer* and other cache libraries through a general cache layer.
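To make the deduplication contract concrete, here is a minimal, self-contained sketch of the single-flight idea in plain std Rust. This is not foyer's implementation (foyer coordinates concurrent in-flight loads across threads and cache tiers); it only illustrates the invariant that the loader for a given key runs once and later callers reuse the result. All names here are illustrative:

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Illustrative cache enforcing "one load per key": callers of the same key
/// share a single loader run. Holding the map lock during the load keeps the
/// sketch simple; a real implementation would use per-key waiters instead.
struct SingleFlight {
    entries: Mutex<HashMap<String, Arc<String>>>,
    loads: Mutex<u32>, // how many times a loader actually ran
}

impl SingleFlight {
    fn new() -> Self {
        SingleFlight {
            entries: Mutex::new(HashMap::new()),
            loads: Mutex::new(0),
        }
    }

    /// Return the cached value for `key`, running `load` only if absent.
    fn fetch(&self, key: &str, load: impl FnOnce() -> String) -> Arc<String> {
        let mut entries = self.entries.lock().unwrap();
        if let Some(v) = entries.get(key) {
            return v.clone(); // reuse the result of an earlier load
        }
        *self.loads.lock().unwrap() += 1;
        let v = Arc::new(load());
        entries.insert(key.to_string(), v.clone());
        v
    }
}

fn main() {
    let cache = SingleFlight::new();
    let a = cache.fetch("obj-1", || "payload".to_string());
    let b = cache.fetch("obj-1", || unreachable!("loader must not run twice"));
    assert_eq!(*a, "payload");
    assert_eq!(*b, "payload");
    assert_eq!(*cache.loads.lock().unwrap(), 1);
}
```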
+
+ # Guide-level explanation
+
+ ## 1. Enable the feature
+
+ ```toml
+ opendal = { version = "*", features = ["layers-foyer"] }
+ ```
+
+ ## 2. Build a foyer instance
+
+ ```rust
+ let cache = HybridCacheBuilder::new()
+     .memory(10)
+     .with_shards(1)
+     .storage(Engine::Large(LargeEngineOptions::new()))
+     .with_device_options(
+         DirectFsDeviceOptions::new(dir.path())
+             .with_capacity(16 * MiB as usize)
+             .with_file_size(1 * MiB as usize),
+     )
+     .with_recover_mode(RecoverMode::None)
+     .build()
+     .await
+     .unwrap();
+ ```
+
+ ## 3. Build an OpenDAL operator with the foyer layer
+
+ ```rust
+ let op = Operator::new(Dashmap::default())
+     .unwrap()
+     .layer(FoyerLayer::new(cache.clone()))
+     .finish();
+ ```
+
+ ## 4. Perform operations as you used to
+
+ ```rust
+ op.write("obj-1", "data").await.unwrap();
+
+ assert_eq!(op.list("/").await.unwrap().len(), 1);
+
+ op.read("obj-1").await.unwrap();
+
+ op.delete("obj-1").await.unwrap();
+
+ assert!(op.list("/").await.unwrap().is_empty());
+ ```
+
+ # Reference-level explanation
+
+ As mentioned in the previous section, this RFC aims to integrate *foyer* so as to fully leverage its capabilities, rather than to design a generic cache layer. Therefore, a transparent integration can be achieved through a `FoyerLayer`.
+
+ `FoyerLayer` holds references to both the internal accessor and a *foyer* instance. For operations that *foyer* supports and whose behavior is compatible, the `FoyerLayer` uses *foyer* to handle requests, accessing the internal accessor as needed. For operations that *foyer* cannot support, it automatically falls back to the internal accessor implementation.
+
+ Here are the details of the operations that involve *foyer*:
+
+ - `read`: Read from the *foyer* hybrid cache; if the hybrid cache misses, fall back to the internal accessor's `read` operation.
+   - For a range get, *foyer* caches and fetches the whole object and returns the requested range. (Future versions may support user configuration for whether to cache the entire object or only the range requested.)
+ - `write`: Insert into the hybrid cache once the internal accessor's `write` operation succeeds.
+ - `delete`: Delete the object from the *foyer* hybrid cache regardless of whether the internal accessor's `delete` operation succeeds.
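The delegation rules above can be modeled with a small, self-contained sketch using plain `HashMap`s in place of the real inner accessor and the foyer hybrid cache. The names (`CachedStore`, `backend`, `cache`) are hypothetical and are not OpenDAL's actual traits; the sketch only encodes the read/write/delete cache semantics listed above:

```rust
use std::collections::HashMap;

// Toy model of the delegation rules: `backend` stands in for the inner
// accessor, `cache` for the foyer hybrid cache. Names are illustrative.
struct CachedStore {
    backend: HashMap<String, Vec<u8>>,
    cache: HashMap<String, Vec<u8>>,
}

impl CachedStore {
    fn new() -> Self {
        CachedStore { backend: HashMap::new(), cache: HashMap::new() }
    }

    // read: serve from the cache; on miss, fall back to the backend and refill.
    fn read(&mut self, path: &str) -> Option<Vec<u8>> {
        if let Some(v) = self.cache.get(path) {
            return Some(v.clone());
        }
        let v = self.backend.get(path)?.clone();
        self.cache.insert(path.to_string(), v.clone());
        Some(v)
    }

    // write: populate the cache only after the backend write succeeds.
    fn write(&mut self, path: &str, data: Vec<u8>) {
        self.backend.insert(path.to_string(), data.clone());
        self.cache.insert(path.to_string(), data);
    }

    // delete: drop the cache entry regardless of the backend outcome.
    fn delete(&mut self, path: &str) {
        self.cache.remove(path);
        self.backend.remove(path);
    }
}

fn main() {
    let mut store = CachedStore::new();
    store.write("obj-1", b"data".to_vec());
    assert_eq!(store.read("obj-1"), Some(b"data".to_vec()));
    store.delete("obj-1");
    assert_eq!(store.read("obj-1"), None);
}
```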
+
+ # Drawbacks
+
+ Since we cannot detect whether other users have updated the data in the underlying storage system, introducing a cache may lead to data inconsistency. Therefore, the *foyer* integration is more suitable for object storage systems that do not support updating objects.
+
+ # Rationale and alternatives
+
+ [RFC#6297](https://github.com/apache/opendal/pull/6297) has mentioned a general cache layer design, but it cannot fully leverage *foyer*'s capabilities. However, the two are not in conflict. Because #6297 has not yet been finalized, I prefer to implement a layer specifically for *foyer* first. This does not preclude a future general cache layer and can also help quickly identify potential user needs and issues.
+
+ # Prior art
+
+ *Foyer* has already been applied in systems like RisingWave, ChromaDB, and SlateDB. We can learn from this experience. Notably, both RisingWave and SlateDB support using OpenDAL as the data access layer. This RFC will provide a smoother experience for users with similar needs.
+
+ # Unresolved questions
+
+ None
+
+ # Future possibilities
+
+ - Based on the experience of implementing the *foyer* layer, a more general cache layer can be developed.
+ - Adjust the API of *foyer* to align with the usage of OpenDAL, enhancing compatibility between the two.