opendal 0.1.6.pre.rc.1-aarch64-linux

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (191)
  1. checksums.yaml +7 -0
  2. data/.standard.yml +20 -0
  3. data/.tool-versions +1 -0
  4. data/.yardopts +1 -0
  5. data/DEPENDENCIES.md +9 -0
  6. data/DEPENDENCIES.rust.tsv +277 -0
  7. data/Gemfile +35 -0
  8. data/README.md +159 -0
  9. data/Rakefile +149 -0
  10. data/core/CHANGELOG.md +4929 -0
  11. data/core/CONTRIBUTING.md +61 -0
  12. data/core/DEPENDENCIES.md +3 -0
  13. data/core/DEPENDENCIES.rust.tsv +185 -0
  14. data/core/LICENSE +201 -0
  15. data/core/README.md +228 -0
  16. data/core/benches/README.md +18 -0
  17. data/core/benches/ops/README.md +26 -0
  18. data/core/benches/types/README.md +9 -0
  19. data/core/benches/vs_fs/README.md +35 -0
  20. data/core/benches/vs_s3/README.md +55 -0
  21. data/core/edge/README.md +3 -0
  22. data/core/edge/file_write_on_full_disk/README.md +14 -0
  23. data/core/edge/s3_aws_assume_role_with_web_identity/README.md +18 -0
  24. data/core/edge/s3_read_on_wasm/.gitignore +3 -0
  25. data/core/edge/s3_read_on_wasm/README.md +42 -0
  26. data/core/edge/s3_read_on_wasm/webdriver.json +15 -0
  27. data/core/examples/README.md +23 -0
  28. data/core/examples/basic/README.md +15 -0
  29. data/core/examples/concurrent-upload/README.md +15 -0
  30. data/core/examples/multipart-upload/README.md +15 -0
  31. data/core/fuzz/.gitignore +5 -0
  32. data/core/fuzz/README.md +68 -0
  33. data/core/src/docs/comparisons/vs_object_store.md +183 -0
  34. data/core/src/docs/performance/concurrent_write.md +101 -0
  35. data/core/src/docs/performance/http_optimization.md +124 -0
  36. data/core/src/docs/rfcs/0000_example.md +74 -0
  37. data/core/src/docs/rfcs/0000_foyer_integration.md +111 -0
  38. data/core/src/docs/rfcs/0041_object_native_api.md +185 -0
  39. data/core/src/docs/rfcs/0044_error_handle.md +198 -0
  40. data/core/src/docs/rfcs/0057_auto_region.md +160 -0
  41. data/core/src/docs/rfcs/0069_object_stream.md +145 -0
  42. data/core/src/docs/rfcs/0090_limited_reader.md +155 -0
  43. data/core/src/docs/rfcs/0112_path_normalization.md +79 -0
  44. data/core/src/docs/rfcs/0191_async_streaming_io.md +328 -0
  45. data/core/src/docs/rfcs/0203_remove_credential.md +96 -0
  46. data/core/src/docs/rfcs/0221_create_dir.md +89 -0
  47. data/core/src/docs/rfcs/0247_retryable_error.md +87 -0
  48. data/core/src/docs/rfcs/0293_object_id.md +67 -0
  49. data/core/src/docs/rfcs/0337_dir_entry.md +191 -0
  50. data/core/src/docs/rfcs/0409_accessor_capabilities.md +67 -0
  51. data/core/src/docs/rfcs/0413_presign.md +154 -0
  52. data/core/src/docs/rfcs/0423_command_line_interface.md +268 -0
  53. data/core/src/docs/rfcs/0429_init_from_iter.md +107 -0
  54. data/core/src/docs/rfcs/0438_multipart.md +163 -0
  55. data/core/src/docs/rfcs/0443_gateway.md +73 -0
  56. data/core/src/docs/rfcs/0501_new_builder.md +111 -0
  57. data/core/src/docs/rfcs/0554_write_refactor.md +96 -0
  58. data/core/src/docs/rfcs/0561_list_metadata_reuse.md +210 -0
  59. data/core/src/docs/rfcs/0599_blocking_api.md +157 -0
  60. data/core/src/docs/rfcs/0623_redis_service.md +300 -0
  61. data/core/src/docs/rfcs/0627_split_capabilities.md +89 -0
  62. data/core/src/docs/rfcs/0661_path_in_accessor.md +126 -0
  63. data/core/src/docs/rfcs/0793_generic_kv_services.md +209 -0
  64. data/core/src/docs/rfcs/0926_object_reader.md +93 -0
  65. data/core/src/docs/rfcs/0977_refactor_error.md +151 -0
  66. data/core/src/docs/rfcs/1085_object_handler.md +73 -0
  67. data/core/src/docs/rfcs/1391_object_metadataer.md +110 -0
  68. data/core/src/docs/rfcs/1398_query_based_metadata.md +125 -0
  69. data/core/src/docs/rfcs/1420_object_writer.md +147 -0
  70. data/core/src/docs/rfcs/1477_remove_object_concept.md +159 -0
  71. data/core/src/docs/rfcs/1735_operation_extension.md +117 -0
  72. data/core/src/docs/rfcs/2083_writer_sink_api.md +106 -0
  73. data/core/src/docs/rfcs/2133_append_api.md +88 -0
  74. data/core/src/docs/rfcs/2299_chain_based_operator_api.md +99 -0
  75. data/core/src/docs/rfcs/2602_object_versioning.md +138 -0
  76. data/core/src/docs/rfcs/2758_merge_append_into_write.md +79 -0
  77. data/core/src/docs/rfcs/2774_lister_api.md +66 -0
  78. data/core/src/docs/rfcs/2779_list_with_metakey.md +143 -0
  79. data/core/src/docs/rfcs/2852_native_capability.md +58 -0
  80. data/core/src/docs/rfcs/2884_merge_range_read_into_read.md +80 -0
  81. data/core/src/docs/rfcs/3017_remove_write_copy_from.md +94 -0
  82. data/core/src/docs/rfcs/3197_config.md +237 -0
  83. data/core/src/docs/rfcs/3232_align_list_api.md +69 -0
  84. data/core/src/docs/rfcs/3243_list_prefix.md +128 -0
  85. data/core/src/docs/rfcs/3356_lazy_reader.md +111 -0
  86. data/core/src/docs/rfcs/3526_list_recursive.md +59 -0
  87. data/core/src/docs/rfcs/3574_concurrent_stat_in_list.md +80 -0
  88. data/core/src/docs/rfcs/3734_buffered_reader.md +64 -0
  89. data/core/src/docs/rfcs/3898_concurrent_writer.md +66 -0
  90. data/core/src/docs/rfcs/3911_deleter_api.md +165 -0
  91. data/core/src/docs/rfcs/4382_range_based_read.md +213 -0
  92. data/core/src/docs/rfcs/4638_executor.md +215 -0
  93. data/core/src/docs/rfcs/5314_remove_metakey.md +120 -0
  94. data/core/src/docs/rfcs/5444_operator_from_uri.md +162 -0
  95. data/core/src/docs/rfcs/5479_context.md +140 -0
  96. data/core/src/docs/rfcs/5485_conditional_reader.md +112 -0
  97. data/core/src/docs/rfcs/5495_list_with_deleted.md +81 -0
  98. data/core/src/docs/rfcs/5556_write_returns_metadata.md +121 -0
  99. data/core/src/docs/rfcs/5871_read_returns_metadata.md +112 -0
  100. data/core/src/docs/rfcs/6189_remove_native_blocking.md +106 -0
  101. data/core/src/docs/rfcs/6209_glob_support.md +132 -0
  102. data/core/src/docs/rfcs/6213_options_api.md +142 -0
  103. data/core/src/docs/rfcs/README.md +62 -0
  104. data/core/src/docs/upgrade.md +1556 -0
  105. data/core/src/services/aliyun_drive/docs.md +61 -0
  106. data/core/src/services/alluxio/docs.md +45 -0
  107. data/core/src/services/azblob/docs.md +77 -0
  108. data/core/src/services/azdls/docs.md +73 -0
  109. data/core/src/services/azfile/docs.md +65 -0
  110. data/core/src/services/b2/docs.md +54 -0
  111. data/core/src/services/cacache/docs.md +38 -0
  112. data/core/src/services/cloudflare_kv/docs.md +21 -0
  113. data/core/src/services/cos/docs.md +55 -0
  114. data/core/src/services/d1/docs.md +48 -0
  115. data/core/src/services/dashmap/docs.md +38 -0
  116. data/core/src/services/dbfs/docs.md +57 -0
  117. data/core/src/services/dropbox/docs.md +64 -0
  118. data/core/src/services/etcd/docs.md +45 -0
  119. data/core/src/services/foundationdb/docs.md +42 -0
  120. data/core/src/services/fs/docs.md +49 -0
  121. data/core/src/services/ftp/docs.md +42 -0
  122. data/core/src/services/gcs/docs.md +76 -0
  123. data/core/src/services/gdrive/docs.md +65 -0
  124. data/core/src/services/ghac/docs.md +84 -0
  125. data/core/src/services/github/docs.md +52 -0
  126. data/core/src/services/gridfs/docs.md +46 -0
  127. data/core/src/services/hdfs/docs.md +140 -0
  128. data/core/src/services/hdfs_native/docs.md +35 -0
  129. data/core/src/services/http/docs.md +45 -0
  130. data/core/src/services/huggingface/docs.md +61 -0
  131. data/core/src/services/ipfs/docs.md +45 -0
  132. data/core/src/services/ipmfs/docs.md +14 -0
  133. data/core/src/services/koofr/docs.md +51 -0
  134. data/core/src/services/lakefs/docs.md +62 -0
  135. data/core/src/services/memcached/docs.md +47 -0
  136. data/core/src/services/memory/docs.md +36 -0
  137. data/core/src/services/mini_moka/docs.md +19 -0
  138. data/core/src/services/moka/docs.md +42 -0
  139. data/core/src/services/mongodb/docs.md +49 -0
  140. data/core/src/services/monoiofs/docs.md +46 -0
  141. data/core/src/services/mysql/docs.md +47 -0
  142. data/core/src/services/obs/docs.md +54 -0
  143. data/core/src/services/onedrive/docs.md +115 -0
  144. data/core/src/services/opfs/docs.md +18 -0
  145. data/core/src/services/oss/docs.md +74 -0
  146. data/core/src/services/pcloud/docs.md +51 -0
  147. data/core/src/services/persy/docs.md +43 -0
  148. data/core/src/services/postgresql/docs.md +47 -0
  149. data/core/src/services/redb/docs.md +41 -0
  150. data/core/src/services/redis/docs.md +43 -0
  151. data/core/src/services/rocksdb/docs.md +54 -0
  152. data/core/src/services/s3/compatible_services.md +126 -0
  153. data/core/src/services/s3/docs.md +244 -0
  154. data/core/src/services/seafile/docs.md +54 -0
  155. data/core/src/services/sftp/docs.md +49 -0
  156. data/core/src/services/sled/docs.md +39 -0
  157. data/core/src/services/sqlite/docs.md +46 -0
  158. data/core/src/services/surrealdb/docs.md +54 -0
  159. data/core/src/services/swift/compatible_services.md +53 -0
  160. data/core/src/services/swift/docs.md +52 -0
  161. data/core/src/services/tikv/docs.md +43 -0
  162. data/core/src/services/upyun/docs.md +51 -0
  163. data/core/src/services/vercel_artifacts/docs.md +40 -0
  164. data/core/src/services/vercel_blob/docs.md +45 -0
  165. data/core/src/services/webdav/docs.md +49 -0
  166. data/core/src/services/webhdfs/docs.md +90 -0
  167. data/core/src/services/yandex_disk/docs.md +45 -0
  168. data/core/tests/behavior/README.md +77 -0
  169. data/core/tests/data/normal_dir/.gitkeep +0 -0
  170. data/core/tests/data/normal_file.txt +1041 -0
  171. data/core/tests/data/special_dir !@#$%^&()_+-=;',/.gitkeep +0 -0
  172. data/core/tests/data/special_file !@#$%^&()_+-=;',.txt +1041 -0
  173. data/core/users.md +13 -0
  174. data/extconf.rb +24 -0
  175. data/lib/opendal.rb +25 -0
  176. data/lib/opendal_ruby/entry.rb +35 -0
  177. data/lib/opendal_ruby/io.rb +70 -0
  178. data/lib/opendal_ruby/metadata.rb +44 -0
  179. data/lib/opendal_ruby/opendal_ruby.so +0 -0
  180. data/lib/opendal_ruby/operator.rb +29 -0
  181. data/lib/opendal_ruby/operator_info.rb +26 -0
  182. data/opendal.gemspec +91 -0
  183. data/test/blocking_op_test.rb +112 -0
  184. data/test/capability_test.rb +42 -0
  185. data/test/io_test.rb +172 -0
  186. data/test/lister_test.rb +77 -0
  187. data/test/metadata_test.rb +78 -0
  188. data/test/middlewares_test.rb +46 -0
  189. data/test/operator_info_test.rb +35 -0
  190. data/test/test_helper.rb +36 -0
  191. metadata +240 -0
@@ -0,0 +1,162 @@
1
+ - Proposal Name: `operator_from_uri`
2
+ - Start Date: 2024-12-23
3
+ - RFC PR: [apache/opendal#5444](https://github.com/apache/opendal/pull/5444)
4
+ - Tracking Issue: [apache/opendal#5445](https://github.com/apache/opendal/issues/5445)
5
+
6
+ # Summary
7
+
8
+ This RFC proposes adding URI-based configuration support to OpenDAL, allowing users to create operators directly from URIs. The proposal introduces a new `from_uri` API on both the `Operator` struct and the `Configurator` trait, along with an `OperatorRegistry` to manage operator factories. As part of this change, we will also transition from the `Scheme` enum to string-based scheme identifiers, enabling better modularity and support for service crate splitting.
9
+
10
+ # Motivation
11
+
12
+ Currently, creating an operator in OpenDAL requires explicit configuration through builder patterns. While this approach provides type safety and clear documentation, it can be verbose and inflexible for simple use cases. Many storage systems are naturally identified by URIs (e.g., `s3://bucket/path`, `fs:///path/to/dir`).
13
+
14
+ Adding URI-based configuration would:
15
+
16
+ - Simplify operator creation for common use cases
17
+ - Enable configuration via connection strings (common in many applications)
18
+ - Make OpenDAL more approachable for new users
19
+ - Allow dynamic operator creation based on runtime configuration
20
+
21
+ # Guide-level explanation
22
+
23
+ The new API allows creating operators directly from URIs:
24
+
25
+ ```rust
26
+ // Create an operator using URI
27
+ let op = Operator::from_uri("s3://my-bucket/path", vec![
28
+ ("endpoint".to_string(), "http://localhost:8080".to_string()),
29
+ ])?;
30
+
31
+ // Users can pass options through the URI along with additional key-value pairs
32
+ // The extra options will override identical options specified in the URI
33
+ let op = Operator::from_uri("s3://my-bucket/path?region=us-east-1", vec![
34
+ ("endpoint".to_string(), "http://localhost:8080".to_string()),
35
+ ])?;
36
+
37
+ // Create a file system operator
38
+ let op = Operator::from_uri("fs:///tmp/test", vec![])?;
39
+ ```
40
+
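The override rule described above (extra options win over identical keys in the URI query) can be sketched with the standard library alone. This is a minimal illustration, not OpenDAL's parser: `ParsedUri` and `parse_operator_uri` are hypothetical names, errors are plain strings, and percent-decoding is deliberately left out.

```rust
use std::collections::HashMap;

/// Hypothetical parsed form of an operator URI (illustration only).
#[derive(Debug, PartialEq)]
pub struct ParsedUri {
    pub scheme: String,
    /// Authority part, e.g. the bucket name; empty for `fs:///path`.
    pub name: String,
    /// Path part, always starting with `/`.
    pub root: String,
    pub options: HashMap<String, String>,
}

/// Split `scheme://name/root?k=v&k2=v2` and merge `extra` on top of the
/// options found in the query string.
pub fn parse_operator_uri(
    uri: &str,
    extra: impl IntoIterator<Item = (String, String)>,
) -> Result<ParsedUri, String> {
    let (scheme, rest) = uri
        .split_once("://")
        .ok_or_else(|| format!("missing scheme in uri: {uri}"))?;
    if scheme.is_empty() {
        return Err(format!("empty scheme in uri: {uri}"));
    }

    // Separate the location from the optional query string.
    let (location, query) = match rest.split_once('?') {
        Some((l, q)) => (l, q),
        None => (rest, ""),
    };

    // Everything before the first `/` is the name, the remainder is the root.
    let (name, root) = match location.find('/') {
        Some(idx) => (&location[..idx], &location[idx..]),
        None => (location, "/"),
    };

    let mut options = HashMap::new();
    for pair in query.split('&').filter(|p| !p.is_empty()) {
        let (k, v) = pair.split_once('=').unwrap_or((pair, ""));
        options.insert(k.to_string(), v.to_string());
    }
    // Extra options replace identical keys that came from the URI.
    options.extend(extra);

    Ok(ParsedUri {
        scheme: scheme.to_string(),
        name: name.to_string(),
        root: root.to_string(),
        options,
    })
}
```

With this sketch, `s3://my-bucket/path?region=us-east-1` plus an extra `region` option yields the extra value, and `fs:///tmp/test` yields an empty name with `/tmp/test` as the root.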
41
+ OpenDAL will, by default, register services enabled by features in a global `OperatorRegistry`. Users can also create custom operator registries to support their own schemes or additional options.
42
+
43
+ ```rust
44
+ // Using a custom registry
45
+ let registry = OperatorRegistry::new();
46
+
47
+ // Register builtin builders under desired schemes
48
+ registry.register::<services::S3>(services::S3_SCHEME);
49
+ registry.register::<services::S3>("minio"); // MinIO is S3-compatible
50
+ registry.register::<services::S3>("r2"); // Cloudflare R2 is S3-compatible
51
+
52
+ // Users can define their own scheme names for internal use
53
+ registry.register::<services::S3>("company-storage");
54
+ registry.register::<services::Azblob>("backup-storage");
55
+
56
+ let op = registry.load("company-storage://bucket/path", [])?;
57
+ ```
58
+
59
+ # Reference-level explanation
60
+
61
+ The implementation consists of three main components:
62
+
63
+ 1. The `OperatorFactory` and `OperatorRegistry`:
64
+
65
+ `OperatorFactory` is a function type that takes a URI string plus options and returns an `Operator`. `OperatorRegistry` manages factories registered under different schemes.
66
+
69
+ ```rust
70
+ type OperatorFactory = fn(&str, Vec<(String, String)>) -> Result<Operator>;
71
+
72
+ pub struct OperatorRegistry { ... }
73
+
74
+ impl OperatorRegistry {
75
+ fn register<B: Builder>(&self, scheme: &str) {
76
+ ...
77
+ }
78
+
79
+ fn load(&self, uri: &str, options: impl IntoIterator<Item = (String, String)>) -> Result<Operator> {
80
+ ...
81
+ }
82
+ }
83
+ ```
84
+
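One way the elided bodies of `register` and `load` might look is sketched here. It is an assumption-laden model rather than the RFC's implementation: `Operator` and `Builder` are minimal stand-ins, errors are strings, and a `Mutex` supplies the interior mutability that the `&self` signature of `register` implies.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Stand-in for the real `Operator`; it only records how it was built.
#[derive(Debug, Clone, PartialEq)]
pub struct Operator {
    pub service: &'static str,
    pub uri: String,
}

/// Stand-in for the real `Builder` trait (hypothetical, minimal).
pub trait Builder {
    const SERVICE: &'static str;
}

pub struct S3;
impl Builder for S3 {
    const SERVICE: &'static str = "s3";
}

type OperatorFactory = fn(&str, Vec<(String, String)>) -> Result<Operator, String>;

/// Generic factory: each instantiation is an ordinary function, so it
/// coerces to the plain `OperatorFactory` function pointer.
fn factory<B: Builder>(uri: &str, _options: Vec<(String, String)>) -> Result<Operator, String> {
    Ok(Operator { service: B::SERVICE, uri: uri.to_string() })
}

#[derive(Default)]
pub struct OperatorRegistry {
    // Interior mutability lets `register` take `&self`.
    factories: Mutex<HashMap<String, OperatorFactory>>,
}

impl OperatorRegistry {
    pub fn new() -> Self {
        Self::default()
    }

    /// Map `scheme` to the factory of builder `B`; the same builder may be
    /// registered under several schemes.
    pub fn register<B: Builder>(&self, scheme: &str) {
        self.factories
            .lock()
            .unwrap()
            .insert(scheme.to_string(), factory::<B>);
    }

    /// Look up the factory by the URI's scheme and invoke it.
    pub fn load(
        &self,
        uri: &str,
        options: impl IntoIterator<Item = (String, String)>,
    ) -> Result<Operator, String> {
        let scheme = uri
            .split_once("://")
            .map(|(s, _)| s)
            .ok_or_else(|| format!("missing scheme in uri: {uri}"))?;
        // Copy the function pointer out so the lock is released before the call.
        let f = *self
            .factories
            .lock()
            .unwrap()
            .get(scheme)
            .ok_or_else(|| format!("unsupported scheme: {scheme}"))?;
        f(uri, options.into_iter().collect())
    }
}
```

Because the registry keys are plain strings, registering `S3` under `"minio"` needs no change to the builder itself, which is the property the string-based scheme design relies on.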
85
+ 2. The `Configurator` trait extension:
86
+
87
+ `Configurator` will add a new API to create a configuration from a URI and options. Services should only parse the URI components relevant to their configuration (host, path, query parameters) without concerning themselves with the scheme portion.
88
+
89
+ ```rust
90
+ impl Configurator for S3Config {
91
+ fn from_uri(uri: &Uri, options: &HashMap<String, String>) -> Result<Self> {
92
+ ...
93
+ }
94
+ }
95
+ ```
96
+
97
+ This design allows the same S3 implementation to work whether accessed via `s3://`, `minio://`, or any other user-defined scheme.
98
+
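To illustrate why ignoring the scheme makes one implementation reusable, the sketch below builds a configuration from the authority, path, and options only. `UriParts` and `S3LikeConfig` are hypothetical simplifications; the RFC's real signature takes a `Uri` type and returns OpenDAL's own `Result`.

```rust
use std::collections::HashMap;

/// Hypothetical, already-split URI used only for this illustration.
pub struct UriParts<'a> {
    pub scheme: &'a str,
    pub authority: &'a str,
    pub path: &'a str,
}

/// Simplified stand-in for a service configuration such as `S3Config`.
#[derive(Debug, Default, PartialEq)]
pub struct S3LikeConfig {
    pub bucket: String,
    pub root: String,
    pub region: Option<String>,
    pub endpoint: Option<String>,
}

impl S3LikeConfig {
    /// Build the config from URI components and options.
    /// `uri.scheme` is deliberately never read.
    pub fn from_uri(
        uri: &UriParts<'_>,
        options: &HashMap<String, String>,
    ) -> Result<Self, String> {
        if uri.authority.is_empty() {
            return Err("bucket is required in the uri authority".to_string());
        }
        Ok(Self {
            bucket: uri.authority.to_string(),
            root: if uri.path.is_empty() {
                "/".to_string()
            } else {
                uri.path.to_string()
            },
            region: options.get("region").cloned(),
            endpoint: options.get("endpoint").cloned(),
        })
    }
}
```

Parsing `s3://bucket/data` and `minio://bucket/data` with the same options produces equal configurations in this model.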
99
+ 3. The `Operator` `from_uri` method:
100
+
101
+ The `Operator` struct will gain a new `from_uri` method that creates an operator from a URI and options. This method will use the global `OperatorRegistry` to find the appropriate factory for the scheme.
102
+
103
+ ```rust
104
+ impl Operator {
105
+ pub fn from_uri(
106
+ uri: &str,
107
+ options: impl IntoIterator<Item = (String, String)>,
108
+ ) -> Result<Self> {
109
+ ...
110
+ }
111
+ }
112
+ ```
113
+
114
+ ## Scheme Enum Removal
115
+
116
+ As part of this RFC, we will transition from the `Scheme` enum to string-based identifiers (`&'static str`). This change is necessary because:
117
+
118
+ 1. **Modularity**: Services in separate crates cannot add variants to a core enum
119
+ 2. **Extensibility**: Users and third-party crates can define custom schemes without modifying OpenDAL
120
+ 3. **Simplicity**: Services don't need to know their scheme identifier
121
+
122
+ # Drawbacks
123
+
124
+ - Increases API surface area
125
+ - Less type safety compared to builder patterns
126
+ - Potential for confusing error messages with invalid URIs
127
+ - Need to maintain backwards compatibility
128
+
129
+ # Rationale and alternatives
130
+
131
+ Alternatives considered:
132
+
133
+ 1. Connection string format instead of URIs
134
+ 2. Builder pattern with URI parsing
135
+ 3. Macro-based configuration
136
+
137
+ URI-based configuration was chosen because:
138
+
139
+ - URIs are widely understood
140
+ - Natural fit for storage locations
141
+ - Extensible through custom schemes
142
+ - Common in similar tools
143
+
144
+ # Prior art
145
+
146
+ Similar patterns exist in:
147
+
148
+ - Database connection strings (PostgreSQL, MongoDB)
149
+ - [`object_store::parse_url`](https://docs.rs/object_store/latest/object_store/fn.parse_url.html)
150
+
151
+ # Unresolved questions
152
+
153
+ None
154
+
155
+ # Future possibilities
156
+
157
+ - Support for connection string format
158
+ - Configuration presets like `r2` and `s3` with directory bucket enabled
159
+ - Service crate splitting: Each service can live in its own crate and register itself with the core
160
+ - Plugin system: Allow dynamic loading of service implementations at runtime
161
+ - Service discovery: Automatically register available services based on feature flags or runtime detection
162
+ - Scheme validation and conventions: Provide utilities to validate scheme naming conventions
@@ -0,0 +1,140 @@
1
+ - Proposal Name: `context`
2
+ - Start Date: 2024-12-30
3
+ - RFC PR: [apache/opendal#5480](https://github.com/apache/opendal/pull/5480)
4
+ - Tracking Issue: [apache/opendal#5479](https://github.com/apache/opendal/issues/5479)
5
+
6
+ # Summary
7
+
8
+ Add a `Context` to OpenDAL to distribute global resources such as the HTTP client and runtime.
9
+
10
+ # Motivation
11
+
12
+ OpenDAL now includes two global resources, the `http client` and `runtime`, which are utilized by the specified service across all enabled layers.
13
+
14
+ However, it's a bit challenging for layers to interact with these global resources.
15
+
16
+ ## For http client
17
+
18
+ Layers cannot directly access the HTTP client. The only way to interact with the HTTP client is through the service builder, such as [`S3::http_client()`](https://docs.rs/opendal/latest/opendal/services/struct.S3.html#method.http_client). Layers like logging and metrics do not have direct access to the HTTP client.
19
+
20
+ Users need to implement the `HttpFetcher` trait to interact with the HTTP client. However, the drawback is that users lack context for the given requests; they do not know which service the request originates from or which operation it is performing.
21
+
22
+ ## For runtime
23
+
24
+ OpenDAL has the [`Execute`](https://docs.rs/opendal/latest/opendal/trait.Execute.html) trait for users to implement so that they can interact with the runtime. However, the API is difficult to use, as layers need to extract and construct the `Executor` for every request.
25
+
26
+ For example:
27
+
28
+ ```rust
29
+ async fn read(&self, path: &str, mut args: OpRead) -> Result<(RpRead, Self::Reader)> {
30
+ if let Some(exec) = args.executor().cloned() {
31
+ args = args.with_executor(Executor::with(TimeoutExecutor::new(
32
+ exec.into_inner(),
33
+ self.io_timeout,
34
+ )));
35
+ }
36
+
37
+ self.io_timeout(Operation::Read, self.inner.read(path, args))
38
+ .await
39
+ .map(|(rp, r)| (rp, TimeoutWrapper::new(r, self.io_timeout)))
40
+ }
41
+ ```
42
+
43
+ # Guide-level explanation
44
+
45
+ I propose adding a `Context` to OpenDAL to distribute global resources like the HTTP client and runtime.
46
+
47
+ The `Context` is a struct that contains the global resources, such as the HTTP client and runtime. It is passed to the service builder and layers so that they can interact with the global resources.
48
+
49
+ ```rust
50
+ let ctx = Context::default();
51
+ ctx.update_http_client(|_| my_http_client);
52
+ ctx.update_executor(|_| my_executor);
53
+
54
+ let op = op.with_context(ctx);
55
+ ```
56
+
57
+ The following API will be added:
58
+
59
+ - new struct `Context`
60
+ - `Context::default()`
61
+ - `Context::load_http_client(&self) -> HttpClient`
62
+ - `Context::load_executor(&self) -> Executor`
63
+ - `Context::update_http_client(&self, f: impl FnOnce(HttpClient) -> HttpClient)`
64
+ - `Context::update_executor(&self, f: impl FnOnce(Executor) -> Executor)`
65
+ - `Operator::with_context(ctx: Context) -> Operator`
66
+
67
+ The following API will be deprecated:
68
+
69
+ - `Operator::default_executor`
70
+ - `Operator::with_default_executor`
71
+ - `OpRead::with_executor`
72
+ - `OpRead::executor`
73
+ - `OpWrite::with_executor`
74
+ - `OpWrite::executor`
75
+ - All services builders' `http_client` API
76
+
77
+ # Reference-level explanation
78
+
79
+ We will add a `Context` struct to `AccessInfo`. Every service must use `Context::default()` for `AccessInfo` and store the same instance of `Context` in the service core. All subsequent use of the HTTP client or runtime should go through the `Context` instead.
80
+
81
+ The `Context` itself is a struct wrapped by something like `ArcSwap<T>`, allowing us to update it atomically.
82
+
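The shared, atomically updatable `Context` can be modeled with the standard library. In this sketch `RwLock` stands in for `ArcSwap<T>`, and `HttpClient` and `Executor` are hypothetical placeholder structs. Only the four method names come from the API list in this RFC; everything else is an assumption.

```rust
use std::sync::{Arc, RwLock};

/// Stand-ins for the real `HttpClient` and `Executor` (hypothetical, minimal).
#[derive(Clone, Debug, PartialEq)]
pub struct HttpClient {
    pub name: String,
}

#[derive(Clone, Debug, PartialEq)]
pub struct Executor {
    pub name: String,
}

struct Inner {
    http_client: RwLock<HttpClient>,
    executor: RwLock<Executor>,
}

/// Cloning a `Context` yields another handle to the same shared resources,
/// so a layer and the service core observe each other's updates.
#[derive(Clone)]
pub struct Context {
    inner: Arc<Inner>,
}

impl Default for Context {
    fn default() -> Self {
        Context {
            inner: Arc::new(Inner {
                http_client: RwLock::new(HttpClient { name: "default".to_string() }),
                executor: RwLock::new(Executor { name: "default".to_string() }),
            }),
        }
    }
}

impl Context {
    /// Return a snapshot of the current HTTP client.
    pub fn load_http_client(&self) -> HttpClient {
        self.inner.http_client.read().unwrap().clone()
    }

    /// Return a snapshot of the current executor.
    pub fn load_executor(&self) -> Executor {
        self.inner.executor.read().unwrap().clone()
    }

    /// Replace the HTTP client with the result of `f(current)`.
    pub fn update_http_client(&self, f: impl FnOnce(HttpClient) -> HttpClient) {
        let mut guard = self.inner.http_client.write().unwrap();
        let next = f(guard.clone());
        *guard = next;
    }

    /// Replace the executor with the result of `f(current)`.
    pub fn update_executor(&self, f: impl FnOnce(Executor) -> Executor) {
        let mut guard = self.inner.executor.write().unwrap();
        let next = f(guard.clone());
        *guard = next;
    }
}
```

Taking a closure in the `update_*` methods lets a layer wrap the existing resource instead of discarding it, which is what a timeout layer needs when it decorates the current executor.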
83
+ The layers will switch to `Context` to get the global resources instead of `OpRead`.
84
+
85
+ We no longer need to hijack the read operation.
86
+
87
+ ```rust
88
+ - async fn read(&self, path: &str, mut args: OpRead) -> Result<(RpRead, Self::Reader)> {
89
+ - if let Some(exec) = args.executor().cloned() {
90
+ - args = args.with_executor(Executor::with(TimeoutExecutor::new(
91
+ - exec.into_inner(),
92
+ - self.io_timeout,
93
+ - )));
94
+ - }
95
+ -
96
+ - ...
97
+ - }
98
+ ```
99
+
100
+ Instead, we can directly get the executor from the `Context` during `layer`.
101
+
102
+ ```rust
103
+ impl<A: Access> Layer<A> for TimeoutLayer {
104
+ type LayeredAccess = TimeoutAccessor<A>;
105
+
106
+ fn layer(&self, inner: A) -> Self::LayeredAccess {
107
+ inner
108
+ .info()
109
+ .context()
110
+ .update_executor(|exec| Executor::with(TimeoutExecutor::new(exec, self.io_timeout)));
111
+
112
+ TimeoutAccessor {
113
+ inner,
114
+
115
+ timeout: self.timeout,
116
+ io_timeout: self.io_timeout,
117
+ }
118
+ }
119
+ }
120
+ ```
121
+
122
+ # Drawbacks
123
+
124
+ A small cost (`50ns`) for every operation that calls `load_http_client`.
125
+
126
+ # Rationale and alternatives
127
+
128
+ None.
129
+
130
+ # Prior art
131
+
132
+ None.
133
+
134
+ # Unresolved questions
135
+
136
+ None.
137
+
138
+ # Future possibilities
139
+
140
+ None.
@@ -0,0 +1,112 @@
1
+ - Proposal Name: `conditional_reader`
2
+ - Start Date: 2024-12-31
3
+ - RFC PR: [apache/opendal#5485](https://github.com/apache/opendal/pull/5485)
4
+ - Tracking Issue: [apache/opendal#5486](https://github.com/apache/opendal/issues/5486)
5
+
6
+ # Summary
7
+
8
+ Add `if_match`, `if_none_match`, `if_modified_since` and `if_unmodified_since` options to OpenDAL's `reader_with` API.
9
+
10
+ # Motivation
11
+
12
+ OpenDAL currently supports conditional `reader_with` operations based only on `version`. However, many storage services
13
+ also support conditional operations based on Etag and/or modification time.
14
+
15
+ Adding these options will:
16
+
17
+ - Provide more granular control over read operations.
18
+ - Align OpenDAL with features provided by modern storage services, meeting broader use cases.
19
+
20
+ # Guide-level explanation
21
+
22
+ Four new options will be added to the `reader_with` API:
23
+
24
+ ## `if_match`
25
+
26
+ Return the content only if its Etag matches the specified Etag; otherwise,
27
+ an error kind `ErrorKind::ConditionNotMatch` will be returned:
28
+
29
+ ```rust
30
+ let reader = op.reader_with("path/to/file")
31
+ .if_match(etag)
32
+ .await?;
33
+ ```
34
+
35
+ ## `if_none_match`
36
+
37
+ Return the content only if its Etag does NOT match the specified Etag; otherwise,
38
+ an error kind `ErrorKind::ConditionNotMatch` will be returned:
39
+
40
+ ```rust
41
+ let reader = op.reader_with("path/to/file")
42
+ .if_none_match(etag)
43
+ .await?;
44
+ ```
45
+
46
+ ## `if_modified_since`
47
+
48
+ Return the content if it has been modified since the specified time; otherwise,
49
+ an error kind `ErrorKind::ConditionNotMatch` will be returned:
50
+
51
+ ```rust
52
+ use chrono::{Duration, Utc};
53
+
54
+ let last_check = Utc::now() - Duration::seconds(3600); // 1 hour ago
55
+ let reader = op.reader_with("path/to/file")
56
+ .if_modified_since(last_check)
57
+ .await?;
58
+ ```
59
+
60
+
61
+ ## `if_unmodified_since`
62
+
63
+ Return the content if it has NOT been modified since the specified time; otherwise,
64
+ an error kind `ErrorKind::ConditionNotMatch` will be returned:
65
+
66
+ ```rust
67
+ use chrono::{Duration, Utc};
68
+
69
+ let timestamp = Utc::now() - Duration::seconds(86400); // 24 hours ago
70
+ let reader = op.reader_with("path/to/file")
71
+ .if_unmodified_since(timestamp)
72
+ .await?;
73
+ ```
74
+
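The four conditions can be summarized as a local model. Real services evaluate them server-side, so the sketch below only illustrates the semantics described above. `ReadConditions` and `check_conditions` are hypothetical names, ETags are compared as plain strings, and `std::time::SystemTime` replaces the `chrono` types used in the examples.

```rust
use std::time::SystemTime;

#[derive(Debug, PartialEq)]
pub enum ErrorKind {
    ConditionNotMatch,
}

/// Hypothetical bundle of the four read conditions; `None` means "not set".
#[derive(Default)]
pub struct ReadConditions {
    pub if_match: Option<String>,
    pub if_none_match: Option<String>,
    pub if_modified_since: Option<SystemTime>,
    pub if_unmodified_since: Option<SystemTime>,
}

/// Check every condition that is set against the object's current ETag and
/// modification time; the first failing condition ends the check.
pub fn check_conditions(
    etag: &str,
    last_modified: SystemTime,
    cond: &ReadConditions,
) -> Result<(), ErrorKind> {
    if let Some(expected) = &cond.if_match {
        if expected.as_str() != etag {
            return Err(ErrorKind::ConditionNotMatch);
        }
    }
    if let Some(unwanted) = &cond.if_none_match {
        if unwanted.as_str() == etag {
            return Err(ErrorKind::ConditionNotMatch);
        }
    }
    if let Some(since) = cond.if_modified_since {
        // Fails when the object has not changed after `since`.
        if last_modified <= since {
            return Err(ErrorKind::ConditionNotMatch);
        }
    }
    if let Some(since) = cond.if_unmodified_since {
        // Fails when the object has changed after `since`.
        if last_modified > since {
            return Err(ErrorKind::ConditionNotMatch);
        }
    }
    Ok(())
}
```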
75
+
76
+ # Reference-level explanation
77
+
78
+ The main implementation will include:
79
+
80
+ 1. Add new fields (`if_modified_since`, `if_unmodified_since`) and related functions to `OpRead`.
81
+
82
+ 2. Add the related functions to `FutureReader`
83
+
84
+ 3. Add new capability flags:
85
+ ```rust
86
+ pub struct Capability {
87
+ // ... other fields
88
+ pub read_with_if_modified_since: bool,
89
+ pub read_with_if_unmodified_since: bool,
90
+ }
91
+ ```
92
+ 4. Implement `if_modified_since`, `if_unmodified_since` for the underlying storage service.
93
+
94
+ # Drawbacks
95
+
96
+ - Add complexity to the API
97
+
98
+ # Rationale and alternatives
99
+
100
+ - Follows existing OpenDAL patterns for conditional operations
101
+
102
+ # Prior art
103
+
104
+ None
105
+
106
+ # Unresolved questions
107
+
108
+ None
109
+
110
+ # Future possibilities
111
+
112
+ None
@@ -0,0 +1,81 @@
1
+ - Proposal Name: `list_with_deleted`
2
+ - Start Date: 2025-01-02
3
+ - RFC PR: [apache/opendal#5495](https://github.com/apache/opendal/pull/5495)
4
+ - Tracking Issue: [apache/opendal#5496](https://github.com/apache/opendal/issues/5496)
5
+
6
+ # Summary
7
+
8
+ Add `list_with(path).deleted(true)` to enable users to list deleted files from storage services.
9
+
10
+ # Motivation
11
+
12
+ OpenDAL is currently working on adding support for file versions, allowing users to read, list, and delete them.
13
+
14
+ ```rust
15
+ // Read given version
16
+ op.read_with(path).version(version_id).await;
17
+ // Fetch the metadata of given version.
18
+ op.stat_with(path).version(version_id).await;
19
+ // Delete the given version.
20
+ op.delete_with(path).version(version_id).await;
21
+ // List the path's versions.
22
+ op.list_with(path).versions().await;
23
+ ```
24
+
25
+ However, to implement the complete data recovery workflow, we should also include support for recovering deleted files from storage services. This feature is referred to as `DeleteMarker` in S3 and `Soft Deleted` in Azure Blob Storage or Google Cloud Storage. Users can utilize these deleted files (or versions) to restore files that may have been accidentally deleted.
26
+
27
+ # Guide-level explanation
28
+
29
+ I suggest adding `list_with(path).deleted(true)` to allow users to list deleted files from storage services.
30
+
31
+ ```rust
32
+ let entries = op.list_with(path).deleted(true).await;
33
+ ```
34
+
35
+ Please note that `deleted` here means "including deleted files" rather than "only deleted files." Therefore, `list_with(path).deleted(true)` will list both current files and deleted ones.
36
+
37
+ At the same time, we will add an `is_deleted` field to the `Metadata` struct to indicate whether the file has been deleted. Together with the existing `is_current` field, we will have the following matrix:
38
+
39
+ | `is_current` | `is_deleted` | Description |
40
+ |---------------|--------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
41
+ | `Some(true)` | `false` | **The metadata's associated version is the latest, current version.** This is the normal state, indicating that this version is the most up-to-date and accessible version. |
42
+ | `Some(true)` | `true` | **The metadata's associated version is the latest, deleted version (Latest Delete Marker or Soft Deleted).** This is particularly important in object storage systems like S3. It signifies that this version is the **most recent delete marker**, indicating the object has been deleted. Subsequent GET requests will return 404 errors unless a specific version ID is provided. |
43
+ | `Some(false)` | `false` | **The metadata's associated version is neither the latest version nor deleted.** This indicates that this version is a previous version, still accessible by specifying its version ID. |
44
+ | `Some(false)` | `true` | **The metadata's associated version is not the latest version and is deleted.** This represents a historical version that has been marked for deletion. Users will need to specify the version ID to access it, and accessing it may be subject to specific delete marker behavior (e.g., in S3, it might not return actual data but a specific delete marker response). |
45
+ | `None` | `false` | **The metadata's associated file is not deleted, but its version status is either unknown or it is not the latest version.** This likely indicates that versioning is not enabled for this file, or versioning information is unavailable. |
46
+ | `None` | `true` | **The metadata's associated file is deleted, but its version status is either unknown or it is not the latest version.** This typically means the file was deleted without versioning enabled, or its versioning information is unavailable. This may represent an actual data deletion operation rather than an S3 delete marker. |
47
+
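The matrix above maps directly onto a `match` over the two fields. The sketch below uses a hypothetical two-field `Metadata` and an invented `VersionState` enum whose variant names are labels for the six rows, not OpenDAL types. `visible_entries` models the "including deleted files" meaning of `deleted(true)`.

```rust
/// Hypothetical reduction of `Metadata` to the two fields discussed here.
#[derive(Debug, Clone, PartialEq)]
pub struct Metadata {
    pub is_current: Option<bool>,
    pub is_deleted: bool,
}

/// One label per row of the matrix (names invented for this illustration).
#[derive(Debug, PartialEq)]
pub enum VersionState {
    Current,
    LatestDeleteMarker,
    PreviousVersion,
    PreviousDeleted,
    UnknownVersionLive,
    UnknownVersionDeleted,
}

pub fn classify(meta: &Metadata) -> VersionState {
    match (meta.is_current, meta.is_deleted) {
        (Some(true), false) => VersionState::Current,
        (Some(true), true) => VersionState::LatestDeleteMarker,
        (Some(false), false) => VersionState::PreviousVersion,
        (Some(false), true) => VersionState::PreviousDeleted,
        (None, false) => VersionState::UnknownVersionLive,
        (None, true) => VersionState::UnknownVersionDeleted,
    }
}

/// Model of the listing flag: with `include_deleted` set, deleted entries are
/// returned alongside live ones; without it they are filtered out.
pub fn visible_entries(entries: &[Metadata], include_deleted: bool) -> Vec<Metadata> {
    entries
        .iter()
        .filter(|m| include_deleted || !m.is_deleted)
        .cloned()
        .collect()
}
```

A caller that wants only deleted entries can filter the `deleted(true)` result on `is_deleted` afterwards.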
48
+
49
+ # Reference-level explanation
50
+
51
+ - Implement the `list_with(path).deleted(true)` API for the `Operator`.
52
+ - Add an `is_deleted` field to `Metadata`.
53
+ - Integrate logic for including deleted files into the `list` method of the storage service.
54
+
55
+ # Drawbacks
56
+
57
+ None.
58
+
59
+ # Rationale and alternatives
60
+
61
+ ## Why "including deleted files" rather than "only deleted files"?
62
+
63
+ Most storage services are designed to list files along with deleted files, rather than exclusively listing deleted files. For example:
64
+
65
+ - S3's [ListObjectVersions](https://docs.aws.amazon.com/AmazonS3/latest/API/API_ListObjectVersions.html) API lists all versions of an object, including delete markers.
66
+ - GCS's [list](https://cloud.google.com/storage/docs/json_api/v1/objects/list) API includes a parameter `softDeleted` to display soft-deleted files.
67
+ - AzBlob's [List Blobs](https://learn.microsoft.com/en-us/rest/api/storageservices/list-blobs) API supports the parameter `include=deleted` to list soft-deleted blobs.
68
+
69
+ So, it is more natural to list files along with deleted files, rather than only listing deleted files.
70
+
71
+ # Prior art
72
+
73
+ None.
74
+
75
+ # Unresolved questions
76
+
77
+ None.
78
+
79
+ # Future possibilities
80
+
81
+ None.
@@ -0,0 +1,121 @@
1
+ - Proposal Name: `write_returns_metadata`
2
+ - Start Date: 2025-01-16
3
+ - RFC PR: [apache/opendal#5556](https://github.com/apache/opendal/pull/5556)
4
+ - Tracking Issue: [apache/opendal#5557](https://github.com/apache/opendal/issues/5557)
5
+
6
+ # Summary
7
+
8
+ Enhance write operations by returning metadata after successful writes.
9
+
10
+ # Motivation
11
+
12
+ Currently, write operations (`write`, `write_with`, `writer`, `writer_with`) only return `Result<()>` or `Result<Writer>`.
13
+ Users who need metadata after writing (like `ETag` or `version_id`) must make an additional `stat()` call. This is inefficient
14
+ and can lead to race conditions if the file is modified between the write and stat operations.
15
+
16
+ Many storage services (like S3, GCS, Azure Blob) return metadata in their write responses. We should expose this information
17
+ to users directly after write operations.
18
+
19
+ # Guide-level explanation
20
+
21
+ The write operations will be enhanced to return metadata:
22
+
23
+ ```rust
24
+ // Before
25
+ op.write("path/to/file", data).await?;
26
+ let meta = op.stat("path/to/file").await?;
27
+ if let Some(etag) = meta.etag() {
28
+ println!("File ETag: {}", etag);
29
+ }
30
+
31
+ // After
32
+ let meta = op.write("path/to/file", data).await?;
33
+ if let Some(etag) = meta.etag() {
34
+ println!("File ETag: {}", etag);
35
+ }
36
+ ```
37
+
38
+ For writer operations:
39
+
40
+ ```rust
41
+ // Before
42
+ let mut writer = op.writer("path/to/file").await?;
43
+ writer.write(data).await?;
44
+ writer.close().await?;
45
+ let meta = op.stat("path/to/file").await?;
46
+ if let Some(etag) = meta.etag() {
47
+ println!("File ETag: {}", etag);
48
+ }
49
+
50
+ // After
51
+ let mut writer = op.writer("path/to/file").await?;
52
+ writer.write(data).await?;
53
+ let meta = writer.close().await?;
54
+ if let Some(etag) = meta.etag() {
55
+ println!("File ETag: {}", etag);
56
+ }
57
+ ```
58
+
59
+ The behavior remains unchanged if users don't need the metadata - they can simply ignore the return value.
60
+
61
+ # Reference-level explanation
62
+
63
+ ## Changes to `Operator` API
64
+
65
+ The following functions will be modified to return `Result<Metadata>` instead of `Result<()>`:
66
+
67
+ - `write()`
68
+ - `write_with()`
69
+
70
+ The `writer()` and `writer_with()` return types remain unchanged as they return `Result<Writer>`.
71
+
72
+ ## Changes to struct `Writer`
73
+
74
+ The `Writer` struct will be modified to return `Result<Metadata>` instead of `Result<()>` for the `close()` function.
75
+
76
+ ## Changes to trait `oio::Write` and trait `oio::MultipartWrite`
77
+
78
+ The `Write` trait will be modified to return `Result<Metadata>` instead of `Result<()>` for the `close()` function.
79
+
80
+ The `MultipartWrite` trait will be modified to return `Result<Metadata>` instead of `Result<()>` for the `complete_part()`
81
+ and `write_once` functions.
82
+
83
+ ## Implementation Details
84
+
85
+ For services that return metadata in their write responses:
86
+ - The metadata will be captured from the service response
87
+ - All available fields (etag, version_id, etc.) will be populated
88
+
89
+ For services that don't return metadata in write responses:
90
+ - For `fs`: we can use `stat` to retrieve the metadata before returning. Since the metadata is cached by the kernel,
91
+ this won't cause a performance issue.
92
+ - For other services: a default metadata object will be returned.
93
+
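Capturing metadata from a write response, with a default fallback, might look like the following sketch. The `Metadata` struct and `metadata_from_write_response` are hypothetical simplifications. The header names `ETag` and `x-amz-version-id` are the ones S3 returns for `PutObject`; other services use different names.

```rust
use std::collections::HashMap;

/// Hypothetical reduction of `Metadata` to two fields a write may report.
#[derive(Debug, Default, PartialEq)]
pub struct Metadata {
    pub etag: Option<String>,
    pub version: Option<String>,
}

/// Populate whatever the response provides; fields the service did not send
/// stay `None`, so an empty response yields `Metadata::default()`.
pub fn metadata_from_write_response(headers: &HashMap<String, String>) -> Metadata {
    // HTTP header names are case-insensitive, so compare them that way.
    let get = |name: &str| {
        headers
            .iter()
            .find(|(k, _)| k.eq_ignore_ascii_case(name))
            .map(|(_, v)| v.clone())
    };
    Metadata {
        etag: get("etag"),
        version: get("x-amz-version-id"),
    }
}
```

Returning `Option` fields keeps the two cases from the list above uniform: callers check `etag` the same way whether the service reported it or not.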
94
+
95
+ # Drawbacks
96
+
97
+ - Minor breaking change for users who explicitly type the return value of write operations
98
+ - Additional complexity in the Writer implementation
99
+
100
+ # Rationale and alternatives
101
+
102
+ - Provides a clean, consistent API
103
+ - Maintains backward compatibility for users who ignore the return value
104
+ - Improves performance by avoiding additional stat calls when possible
105
+
106
+ # Prior art
107
+
108
+ Similar patterns exist in other storage SDKs:
109
+
110
+ - `object_store` crate returns metadata in `PutResult` after calling `put_opts`
111
+ - AWS SDK returns metadata in `PutObjectOutput`
112
+ - Azure SDK returns `UploadFileResponse` after uploads
113
+
114
+ # Unresolved questions
115
+
116
+ - None
117
+
118
+
119
+ # Future possibilities
120
+
121
+ - None