opendal 0.1.6.pre.rc.1-arm64-darwin-23
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +7 -0
- data/.standard.yml +20 -0
- data/.tool-versions +1 -0
- data/.yardopts +1 -0
- data/DEPENDENCIES.md +9 -0
- data/DEPENDENCIES.rust.tsv +277 -0
- data/Gemfile +35 -0
- data/README.md +159 -0
- data/Rakefile +149 -0
- data/core/CHANGELOG.md +4929 -0
- data/core/CONTRIBUTING.md +61 -0
- data/core/DEPENDENCIES.md +3 -0
- data/core/DEPENDENCIES.rust.tsv +185 -0
- data/core/LICENSE +201 -0
- data/core/README.md +228 -0
- data/core/benches/README.md +18 -0
- data/core/benches/ops/README.md +26 -0
- data/core/benches/types/README.md +9 -0
- data/core/benches/vs_fs/README.md +35 -0
- data/core/benches/vs_s3/README.md +55 -0
- data/core/edge/README.md +3 -0
- data/core/edge/file_write_on_full_disk/README.md +14 -0
- data/core/edge/s3_aws_assume_role_with_web_identity/README.md +18 -0
- data/core/edge/s3_read_on_wasm/.gitignore +3 -0
- data/core/edge/s3_read_on_wasm/README.md +42 -0
- data/core/edge/s3_read_on_wasm/webdriver.json +15 -0
- data/core/examples/README.md +23 -0
- data/core/examples/basic/README.md +15 -0
- data/core/examples/concurrent-upload/README.md +15 -0
- data/core/examples/multipart-upload/README.md +15 -0
- data/core/fuzz/.gitignore +5 -0
- data/core/fuzz/README.md +68 -0
- data/core/src/docs/comparisons/vs_object_store.md +183 -0
- data/core/src/docs/performance/concurrent_write.md +101 -0
- data/core/src/docs/performance/http_optimization.md +124 -0
- data/core/src/docs/rfcs/0000_example.md +74 -0
- data/core/src/docs/rfcs/0000_foyer_integration.md +111 -0
- data/core/src/docs/rfcs/0041_object_native_api.md +185 -0
- data/core/src/docs/rfcs/0044_error_handle.md +198 -0
- data/core/src/docs/rfcs/0057_auto_region.md +160 -0
- data/core/src/docs/rfcs/0069_object_stream.md +145 -0
- data/core/src/docs/rfcs/0090_limited_reader.md +155 -0
- data/core/src/docs/rfcs/0112_path_normalization.md +79 -0
- data/core/src/docs/rfcs/0191_async_streaming_io.md +328 -0
- data/core/src/docs/rfcs/0203_remove_credential.md +96 -0
- data/core/src/docs/rfcs/0221_create_dir.md +89 -0
- data/core/src/docs/rfcs/0247_retryable_error.md +87 -0
- data/core/src/docs/rfcs/0293_object_id.md +67 -0
- data/core/src/docs/rfcs/0337_dir_entry.md +191 -0
- data/core/src/docs/rfcs/0409_accessor_capabilities.md +67 -0
- data/core/src/docs/rfcs/0413_presign.md +154 -0
- data/core/src/docs/rfcs/0423_command_line_interface.md +268 -0
- data/core/src/docs/rfcs/0429_init_from_iter.md +107 -0
- data/core/src/docs/rfcs/0438_multipart.md +163 -0
- data/core/src/docs/rfcs/0443_gateway.md +73 -0
- data/core/src/docs/rfcs/0501_new_builder.md +111 -0
- data/core/src/docs/rfcs/0554_write_refactor.md +96 -0
- data/core/src/docs/rfcs/0561_list_metadata_reuse.md +210 -0
- data/core/src/docs/rfcs/0599_blocking_api.md +157 -0
- data/core/src/docs/rfcs/0623_redis_service.md +300 -0
- data/core/src/docs/rfcs/0627_split_capabilities.md +89 -0
- data/core/src/docs/rfcs/0661_path_in_accessor.md +126 -0
- data/core/src/docs/rfcs/0793_generic_kv_services.md +209 -0
- data/core/src/docs/rfcs/0926_object_reader.md +93 -0
- data/core/src/docs/rfcs/0977_refactor_error.md +151 -0
- data/core/src/docs/rfcs/1085_object_handler.md +73 -0
- data/core/src/docs/rfcs/1391_object_metadataer.md +110 -0
- data/core/src/docs/rfcs/1398_query_based_metadata.md +125 -0
- data/core/src/docs/rfcs/1420_object_writer.md +147 -0
- data/core/src/docs/rfcs/1477_remove_object_concept.md +159 -0
- data/core/src/docs/rfcs/1735_operation_extension.md +117 -0
- data/core/src/docs/rfcs/2083_writer_sink_api.md +106 -0
- data/core/src/docs/rfcs/2133_append_api.md +88 -0
- data/core/src/docs/rfcs/2299_chain_based_operator_api.md +99 -0
- data/core/src/docs/rfcs/2602_object_versioning.md +138 -0
- data/core/src/docs/rfcs/2758_merge_append_into_write.md +79 -0
- data/core/src/docs/rfcs/2774_lister_api.md +66 -0
- data/core/src/docs/rfcs/2779_list_with_metakey.md +143 -0
- data/core/src/docs/rfcs/2852_native_capability.md +58 -0
- data/core/src/docs/rfcs/2884_merge_range_read_into_read.md +80 -0
- data/core/src/docs/rfcs/3017_remove_write_copy_from.md +94 -0
- data/core/src/docs/rfcs/3197_config.md +237 -0
- data/core/src/docs/rfcs/3232_align_list_api.md +69 -0
- data/core/src/docs/rfcs/3243_list_prefix.md +128 -0
- data/core/src/docs/rfcs/3356_lazy_reader.md +111 -0
- data/core/src/docs/rfcs/3526_list_recursive.md +59 -0
- data/core/src/docs/rfcs/3574_concurrent_stat_in_list.md +80 -0
- data/core/src/docs/rfcs/3734_buffered_reader.md +64 -0
- data/core/src/docs/rfcs/3898_concurrent_writer.md +66 -0
- data/core/src/docs/rfcs/3911_deleter_api.md +165 -0
- data/core/src/docs/rfcs/4382_range_based_read.md +213 -0
- data/core/src/docs/rfcs/4638_executor.md +215 -0
- data/core/src/docs/rfcs/5314_remove_metakey.md +120 -0
- data/core/src/docs/rfcs/5444_operator_from_uri.md +162 -0
- data/core/src/docs/rfcs/5479_context.md +140 -0
- data/core/src/docs/rfcs/5485_conditional_reader.md +112 -0
- data/core/src/docs/rfcs/5495_list_with_deleted.md +81 -0
- data/core/src/docs/rfcs/5556_write_returns_metadata.md +121 -0
- data/core/src/docs/rfcs/5871_read_returns_metadata.md +112 -0
- data/core/src/docs/rfcs/6189_remove_native_blocking.md +106 -0
- data/core/src/docs/rfcs/6209_glob_support.md +132 -0
- data/core/src/docs/rfcs/6213_options_api.md +142 -0
- data/core/src/docs/rfcs/README.md +62 -0
- data/core/src/docs/upgrade.md +1556 -0
- data/core/src/services/aliyun_drive/docs.md +61 -0
- data/core/src/services/alluxio/docs.md +45 -0
- data/core/src/services/azblob/docs.md +77 -0
- data/core/src/services/azdls/docs.md +73 -0
- data/core/src/services/azfile/docs.md +65 -0
- data/core/src/services/b2/docs.md +54 -0
- data/core/src/services/cacache/docs.md +38 -0
- data/core/src/services/cloudflare_kv/docs.md +21 -0
- data/core/src/services/cos/docs.md +55 -0
- data/core/src/services/d1/docs.md +48 -0
- data/core/src/services/dashmap/docs.md +38 -0
- data/core/src/services/dbfs/docs.md +57 -0
- data/core/src/services/dropbox/docs.md +64 -0
- data/core/src/services/etcd/docs.md +45 -0
- data/core/src/services/foundationdb/docs.md +42 -0
- data/core/src/services/fs/docs.md +49 -0
- data/core/src/services/ftp/docs.md +42 -0
- data/core/src/services/gcs/docs.md +76 -0
- data/core/src/services/gdrive/docs.md +65 -0
- data/core/src/services/ghac/docs.md +84 -0
- data/core/src/services/github/docs.md +52 -0
- data/core/src/services/gridfs/docs.md +46 -0
- data/core/src/services/hdfs/docs.md +140 -0
- data/core/src/services/hdfs_native/docs.md +35 -0
- data/core/src/services/http/docs.md +45 -0
- data/core/src/services/huggingface/docs.md +61 -0
- data/core/src/services/ipfs/docs.md +45 -0
- data/core/src/services/ipmfs/docs.md +14 -0
- data/core/src/services/koofr/docs.md +51 -0
- data/core/src/services/lakefs/docs.md +62 -0
- data/core/src/services/memcached/docs.md +47 -0
- data/core/src/services/memory/docs.md +36 -0
- data/core/src/services/mini_moka/docs.md +19 -0
- data/core/src/services/moka/docs.md +42 -0
- data/core/src/services/mongodb/docs.md +49 -0
- data/core/src/services/monoiofs/docs.md +46 -0
- data/core/src/services/mysql/docs.md +47 -0
- data/core/src/services/obs/docs.md +54 -0
- data/core/src/services/onedrive/docs.md +115 -0
- data/core/src/services/opfs/docs.md +18 -0
- data/core/src/services/oss/docs.md +74 -0
- data/core/src/services/pcloud/docs.md +51 -0
- data/core/src/services/persy/docs.md +43 -0
- data/core/src/services/postgresql/docs.md +47 -0
- data/core/src/services/redb/docs.md +41 -0
- data/core/src/services/redis/docs.md +43 -0
- data/core/src/services/rocksdb/docs.md +54 -0
- data/core/src/services/s3/compatible_services.md +126 -0
- data/core/src/services/s3/docs.md +244 -0
- data/core/src/services/seafile/docs.md +54 -0
- data/core/src/services/sftp/docs.md +49 -0
- data/core/src/services/sled/docs.md +39 -0
- data/core/src/services/sqlite/docs.md +46 -0
- data/core/src/services/surrealdb/docs.md +54 -0
- data/core/src/services/swift/compatible_services.md +53 -0
- data/core/src/services/swift/docs.md +52 -0
- data/core/src/services/tikv/docs.md +43 -0
- data/core/src/services/upyun/docs.md +51 -0
- data/core/src/services/vercel_artifacts/docs.md +40 -0
- data/core/src/services/vercel_blob/docs.md +45 -0
- data/core/src/services/webdav/docs.md +49 -0
- data/core/src/services/webhdfs/docs.md +90 -0
- data/core/src/services/yandex_disk/docs.md +45 -0
- data/core/tests/behavior/README.md +77 -0
- data/core/tests/data/normal_dir/.gitkeep +0 -0
- data/core/tests/data/normal_file.txt +1041 -0
- data/core/tests/data/special_dir !@#$%^&()_+-=;',/.gitkeep +0 -0
- data/core/tests/data/special_file !@#$%^&()_+-=;',.txt +1041 -0
- data/core/users.md +13 -0
- data/extconf.rb +24 -0
- data/lib/opendal.rb +25 -0
- data/lib/opendal_ruby/entry.rb +35 -0
- data/lib/opendal_ruby/io.rb +70 -0
- data/lib/opendal_ruby/metadata.rb +44 -0
- data/lib/opendal_ruby/opendal_ruby.bundle +0 -0
- data/lib/opendal_ruby/operator.rb +29 -0
- data/lib/opendal_ruby/operator_info.rb +26 -0
- data/opendal.gemspec +91 -0
- data/test/blocking_op_test.rb +112 -0
- data/test/capability_test.rb +42 -0
- data/test/io_test.rb +172 -0
- data/test/lister_test.rb +77 -0
- data/test/metadata_test.rb +78 -0
- data/test/middlewares_test.rb +46 -0
- data/test/operator_info_test.rb +35 -0
- data/test/test_helper.rb +36 -0
- metadata +240 -0
@@ -0,0 +1,162 @@
- Proposal Name: `operator_from_uri`
- Start Date: 2024-12-23
- RFC PR: [apache/opendal#5444](https://github.com/apache/opendal/pull/5444)
- Tracking Issue: [apache/opendal#5445](https://github.com/apache/opendal/issues/5445)

# Summary

This RFC proposes adding URI-based configuration support to OpenDAL, allowing users to create operators directly from URIs. The proposal introduces a new `from_uri` API on both `Operator` and the `Configurator` trait, along with an `OperatorRegistry` to manage operator factories. As part of this change, we will also transition from the `Scheme` enum to string-based scheme identifiers, enabling better modularity and support for service crate splitting.

# Motivation

Currently, creating an operator in OpenDAL requires explicit configuration through builder patterns. While this approach provides type safety and clear documentation, it can be verbose and inflexible for simple use cases. Many storage systems are naturally identified by URIs (e.g., `s3://bucket/path`, `fs:///path/to/dir`).

Adding URI-based configuration would:

- Simplify operator creation for common use cases
- Enable configuration via connection strings (common in many applications)
- Make OpenDAL more approachable for new users
- Allow dynamic operator creation based on runtime configuration

# Guide-level explanation

The new API allows creating operators directly from URIs:

```rust
// Create an operator using URI
let op = Operator::from_uri("s3://my-bucket/path", vec![
    ("endpoint".to_string(), "http://localhost:8080".to_string()),
])?;

// Users can pass options through the URI along with additional key-value pairs
// The extra options will override identical options specified in the URI
let op = Operator::from_uri("s3://my-bucket/path?region=us-east-1", vec![
    ("endpoint".to_string(), "http://localhost:8080".to_string()),
])?;

// Create a file system operator
let op = Operator::from_uri("fs:///tmp/test", vec![])?;
```
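
To make the override rule from the comments above concrete, here is a minimal sketch of how URI query options and extra options could be merged. The helper `merge_uri_options` is hypothetical and not part of OpenDAL; it uses only the standard library and, as a simplifying assumption, does no percent-decoding.

```rust
use std::collections::HashMap;

/// Hypothetical helper (not an OpenDAL API): collect the options found in the
/// query string of `uri`, then apply `extra` on top of them.
fn merge_uri_options(uri: &str, extra: Vec<(String, String)>) -> HashMap<String, String> {
    let mut merged = HashMap::new();

    // Everything after the first `?` is treated as the query string.
    if let Some((_, query)) = uri.split_once('?') {
        for pair in query.split('&').filter(|p| !p.is_empty()) {
            // A key without `=` is stored with an empty value.
            let (key, value) = pair.split_once('=').unwrap_or((pair, ""));
            merged.insert(key.to_string(), value.to_string());
        }
    }

    // Extra options are inserted last, so they override identical keys.
    for (key, value) in extra {
        merged.insert(key, value);
    }

    merged
}
```

With this sketch, merging `s3://my-bucket/path?region=us-east-1` with an extra `endpoint` option yields both keys, and an `endpoint` given in both places resolves to the extra value.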

OpenDAL will, by default, register services enabled by features in a global `OperatorRegistry`. Users can also create custom operator registries to support their own schemes or additional options.

```rust
// Using a custom registry
let registry = OperatorRegistry::new();

// Register builtin builders under desired schemes
registry.register::<services::S3>(services::S3_SCHEME);
registry.register::<services::S3>("minio"); // MinIO is S3-compatible
registry.register::<services::S3>("r2"); // Cloudflare R2 is S3-compatible

// Users can define their own scheme names for internal use
registry.register::<services::S3>("company-storage");
registry.register::<services::Azblob>("backup-storage");

let op = registry.load("company-storage://bucket/path", [])?;
```

# Reference-level explanation

The implementation consists of three main components:

1. The `OperatorFactory` and `OperatorRegistry`:

`OperatorFactory` is a function type that takes a URI string plus options and returns an `Operator`. `OperatorRegistry` manages factories registered under different schemes.

```rust
type OperatorFactory = fn(&str, Vec<(String, String)>) -> Result<Operator>;

pub struct OperatorRegistry { ... }

impl OperatorRegistry {
    fn register<B: Builder>(&self, scheme: &str) {
        ...
    }

    fn load(&self, uri: &str, options: impl IntoIterator<Item = (String, String)>) -> Result<Operator> {
        ...
    }
}
```
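
As an illustration of the scheme-to-factory lookup, the following is a minimal sketch under stated assumptions: `Operator` is a stand-in struct, errors are plain `String`s, and `register` takes the factory explicitly instead of deriving it from a `Builder` type parameter. None of this is the actual OpenDAL implementation.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Stand-in for OpenDAL's `Operator`; it only records what the factory saw.
#[derive(Debug, PartialEq)]
struct Operator {
    service: &'static str,
    uri: String,
}

/// Same shape as the RFC's `OperatorFactory`, with a `String` error for brevity.
type OperatorFactory = fn(&str, Vec<(String, String)>) -> Result<Operator, String>;

#[derive(Default)]
struct OperatorRegistry {
    // Interior mutability lets `register` take `&self`, as in the RFC signature.
    factories: Mutex<HashMap<String, OperatorFactory>>,
}

impl OperatorRegistry {
    /// Assumption: the factory is passed in directly rather than built from `B: Builder`.
    fn register(&self, scheme: &str, factory: OperatorFactory) {
        self.factories
            .lock()
            .unwrap()
            .insert(scheme.to_string(), factory);
    }

    fn load(
        &self,
        uri: &str,
        options: impl IntoIterator<Item = (String, String)>,
    ) -> Result<Operator, String> {
        // The scheme is everything before `://`.
        let (scheme, _) = uri
            .split_once("://")
            .ok_or_else(|| format!("uri `{uri}` has no scheme"))?;

        // Copy the function pointer out so the lock is released before calling it.
        let factories = self.factories.lock().unwrap();
        let factory = factories
            .get(scheme)
            .copied()
            .ok_or_else(|| format!("scheme `{scheme}` is not registered"))?;
        drop(factories);

        factory(uri, options.into_iter().collect())
    }
}

/// Hypothetical factory that pretends to build an S3 operator.
fn s3_factory(uri: &str, _options: Vec<(String, String)>) -> Result<Operator, String> {
    Ok(Operator {
        service: "s3",
        uri: uri.to_string(),
    })
}
```

Registering `s3_factory` under both `"s3"` and `"company-storage"` lets either scheme resolve to the same service, which mirrors the aliasing shown in the guide-level example.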

2. The `Configurator` trait extension:

`Configurator` will add a new API to create a configuration from a URI and options. Services should only parse the URI components relevant to their configuration (host, path, query parameters) without concerning themselves with the scheme portion.

```rust
impl Configurator for S3Config {
    fn from_uri(uri: &Uri, options: &HashMap<String, String>) -> Result<Self> {
        ...
    }
}
```

This design allows the same S3 implementation to work whether accessed via `s3://`, `minio://`, or any other user-defined scheme.
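
The scheme-agnostic parsing described here can be sketched as follows. `BucketConfig` is a hypothetical, heavily simplified stand-in for a config such as `S3Config`; it takes a plain `&str` rather than a parsed `Uri`, ignores query options, and assumes the URI authority is the bucket name.

```rust
/// Hypothetical, simplified stand-in for a bucket-style service config.
#[derive(Debug, PartialEq)]
struct BucketConfig {
    bucket: String,
    root: String,
}

impl BucketConfig {
    /// Build the config from a URI while ignoring the scheme.
    fn from_uri(uri: &str) -> Result<Self, String> {
        // Drop the scheme: the service does not care which name it was registered under.
        let (_scheme, rest) = uri
            .split_once("://")
            .ok_or_else(|| "missing `://`".to_string())?;

        // Drop the query string; option handling is out of scope for this sketch.
        let rest = rest.split_once('?').map_or(rest, |(before, _)| before);

        // The authority is taken as the bucket, the remainder as the root path.
        let (bucket, root) = match rest.split_once('/') {
            Some((bucket, path)) => (bucket, format!("/{path}")),
            None => (rest, "/".to_string()),
        };
        if bucket.is_empty() {
            return Err("missing bucket".to_string());
        }

        Ok(Self {
            bucket: bucket.to_string(),
            root,
        })
    }
}
```

Because the scheme is discarded before parsing, `s3://my-bucket/path` and `minio://my-bucket/path` produce equal configs in this sketch.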

3. The `Operator` `from_uri` method:

`Operator` will gain a new `from_uri` method to create an operator from a URI and options. This method will use the global `OperatorRegistry` to find the appropriate factory for the scheme.

```rust
impl Operator {
    pub fn from_uri(
        uri: &str,
        options: impl IntoIterator<Item = (String, String)>,
    ) -> Result<Self> {
        ...
    }
}
```

## Scheme Enum Removal

As part of this RFC, we will transition from the `Scheme` enum to string-based identifiers (`&'static str`). This change is necessary because:

1. **Modularity**: Services in separate crates cannot add variants to a core enum
2. **Extensibility**: Users and third-party crates can define custom schemes without modifying OpenDAL
3. **Simplicity**: Services don't need to know their scheme identifier

# Drawbacks

- Increases API surface area
- Less type safety compared to builder patterns
- Potential for confusing error messages with invalid URIs
- Need to maintain backwards compatibility

# Rationale and alternatives

Alternatives considered:

1. Connection string format instead of URIs
2. Builder pattern with URI parsing
3. Macro-based configuration

URI-based configuration was chosen because:

- URIs are widely understood
- Natural fit for storage locations
- Extensible through custom schemes
- Common in similar tools

# Prior art

Similar patterns exist in:

- Database connection strings (PostgreSQL, MongoDB)
- [`object_store::parse_url`](https://docs.rs/object_store/latest/object_store/fn.parse_url.html)

# Unresolved questions

None

# Future possibilities

- Support for connection string format
- Configuration presets like `r2` and `s3` with directory bucket enabled
- Service crate splitting: Each service can live in its own crate and register itself with the core
- Plugin system: Allow dynamic loading of service implementations at runtime
- Service discovery: Automatically register available services based on feature flags or runtime detection
- Scheme validation and conventions: Provide utilities to validate scheme naming conventions
@@ -0,0 +1,140 @@
- Proposal Name: `context`
- Start Date: 2024-12-30
- RFC PR: [apache/opendal#5480](https://github.com/apache/opendal/pull/5480)
- Tracking Issue: [apache/opendal#5479](https://github.com/apache/opendal/issues/5479)

# Summary

Add `Context` to OpenDAL to distribute global resources such as the HTTP client and runtime.

# Motivation

OpenDAL now includes two global resources, the `http client` and `runtime`, which are used by the specified service across all enabled layers.

However, it's a bit challenging for layers to interact with these global resources.

## For http client

Layers cannot directly access the HTTP client. The only way to interact with the HTTP client is through the service builder, such as [`S3::http_client()`](https://docs.rs/opendal/latest/opendal/services/struct.S3.html#method.http_client). Layers like logging and metrics do not have direct access to the HTTP client.

Users need to implement the `HttpFetcher` trait to interact with the HTTP client. However, the drawback is that users lack context for the given requests; they do not know which service the request originates from or which operation it is performing.

## For runtime

OpenDAL has the [`Execute`](https://docs.rs/opendal/latest/opendal/trait.Execute.html) trait for users to implement so that they can interact with the runtime. However, the API is difficult to use, as layers need to extract and construct the `Executor` for every request.

For example:

```rust
async fn read(&self, path: &str, mut args: OpRead) -> Result<(RpRead, Self::Reader)> {
    if let Some(exec) = args.executor().cloned() {
        args = args.with_executor(Executor::with(TimeoutExecutor::new(
            exec.into_inner(),
            self.io_timeout,
        )));
    }

    self.io_timeout(Operation::Read, self.inner.read(path, args))
        .await
        .map(|(rp, r)| (rp, TimeoutWrapper::new(r, self.io_timeout)))
}
```

# Guide-level explanation

So I propose to add a `Context` to OpenDAL to distribute global resources like the HTTP client and runtime.

The `Context` is a struct that contains the global resources, such as the HTTP client and runtime. It is passed to the service builder and layers so that they can interact with the global resources.

```rust
let mut ctx = Context::default();
ctx.set_http_client(my_http_client);
ctx.set_executor(my_executor);

let op = op.with_context(ctx);
```

The following API will be added:

- new struct `Context`
- `Context::default()`
- `Context::load_http_client(&self) -> HttpClient`
- `Context::load_executor(&self) -> Executor`
- `Context::update_http_client(&self, f: impl FnOnce(HttpClient) -> HttpClient)`
- `Context::update_executor(&self, f: impl FnOnce(Executor) -> Executor)`
- `Operator::with_context(ctx: Context) -> Operator`

The following API will be deprecated:

- `Operator::default_executor`
- `Operator::with_default_executor`
- `OpRead::with_executor`
- `OpRead::executor`
- `OpWrite::with_executor`
- `OpWrite::executor`
- All service builders' `http_client` API

# Reference-level explanation

We will add a `Context` struct to `AccessInfo`. Every service must use `Context::default()` for `AccessInfo` and store the same instance of `Context` in the service core. All subsequent use of the HTTP client or runtime should go through the `Context` instead.

The `Context` itself is a struct wrapped by something like `ArcSwap<T>`, allowing us to update it atomically.
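
A rough sketch of the load/update pair may help here. It is an assumption-laden illustration, not the proposed implementation: `Executor` is a stand-in struct carrying only a label, and `Arc<RwLock<_>>` replaces `ArcSwap` so that the example needs nothing beyond the standard library.

```rust
use std::sync::{Arc, RwLock};

/// Hypothetical stand-in for OpenDAL's `Executor`; it only carries a label.
#[derive(Clone, Debug, PartialEq)]
struct Executor {
    name: String,
}

/// Sketch of a shareable context. `Arc<RwLock<_>>` stands in for `ArcSwap`.
#[derive(Clone)]
struct Context {
    executor: Arc<RwLock<Executor>>,
}

impl Default for Context {
    fn default() -> Self {
        Self {
            executor: Arc::new(RwLock::new(Executor {
                name: "default".to_string(),
            })),
        }
    }
}

impl Context {
    /// Return a snapshot of the current executor.
    fn load_executor(&self) -> Executor {
        self.executor.read().unwrap().clone()
    }

    /// Replace the executor with the result of `f`, which receives the current one.
    fn update_executor(&self, f: impl FnOnce(Executor) -> Executor) {
        let mut slot = self.executor.write().unwrap();
        let next = f(slot.clone());
        *slot = next;
    }
}
```

Because every clone of this `Context` shares the same slot, an update made through one handle (for example by a layer) is visible to the service core holding another handle.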

The layers will switch to `Context` to get the global resources instead of `OpRead`.

We no longer need to hijack the read operation.

```diff
- async fn read(&self, path: &str, mut args: OpRead) -> Result<(RpRead, Self::Reader)> {
-     if let Some(exec) = args.executor().cloned() {
-         args = args.with_executor(Executor::with(TimeoutExecutor::new(
-             exec.into_inner(),
-             self.io_timeout,
-         )));
-     }
-
-     ...
- }
```

Instead, we can directly get the executor from the `Context` during `layer`.

```rust
impl<A: Access> Layer<A> for TimeoutLayer {
    type LayeredAccess = TimeoutAccessor<A>;

    fn layer(&self, inner: A) -> Self::LayeredAccess {
        inner
            .info()
            .context()
            .update_executor(|exec| Executor::with(TimeoutExecutor::new(exec, self.io_timeout)));

        TimeoutAccessor {
            inner,

            timeout: self.timeout,
            io_timeout: self.io_timeout,
        }
    }
}
```

# Drawbacks

A small cost (`50ns`) for every operation that calls `load_http_client`.

# Rationale and alternatives

None.

# Prior art

None.

# Unresolved questions

None.

# Future possibilities

None.
@@ -0,0 +1,112 @@
- Proposal Name: `conditional_reader`
- Start Date: 2024-12-31
- RFC PR: [apache/opendal#5485](https://github.com/apache/opendal/pull/5485)
- Tracking Issue: [apache/opendal#5486](https://github.com/apache/opendal/issues/5486)

# Summary

Add `if_match`, `if_none_match`, `if_modified_since` and `if_unmodified_since` options to OpenDAL's `reader_with` API.

# Motivation

OpenDAL currently supports conditional `reader_with` operations based only on `version`. However, many storage services also support conditional operations based on ETag and/or modification time.

Adding these options will:

- Provide more granular control over read operations.
- Align OpenDAL with features provided by modern storage services, meeting broader use cases.

# Guide-level explanation

Four new options will be added to the `reader_with` API:

## `if_match`

Return the content only if its ETag matches the specified ETag; otherwise, an `ErrorKind::ConditionNotMatch` error will be returned:

```rust
let reader = op.reader_with("path/to/file")
    .if_match(etag)
    .await?;
```

## `if_none_match`

Return the content only if its ETag does NOT match the specified ETag; otherwise, an `ErrorKind::ConditionNotMatch` error will be returned:

```rust
let reader = op.reader_with("path/to/file")
    .if_none_match(etag)
    .await?;
```

## `if_modified_since`

Return the content if it has been modified since the specified time; otherwise, an `ErrorKind::ConditionNotMatch` error will be returned:

```rust
use chrono::{Duration, Utc};

let last_check = Utc::now() - Duration::seconds(3600); // 1 hour ago
let reader = op.reader_with("path/to/file")
    .if_modified_since(last_check)
    .await?;
```

## `if_unmodified_since`

Return the content if it has NOT been modified since the specified time; otherwise, an `ErrorKind::ConditionNotMatch` error will be returned:

```rust
use chrono::{Duration, Utc};

let timestamp = Utc::now() - Duration::seconds(86400); // 24 hours ago
let reader = op.reader_with("path/to/file")
    .if_unmodified_since(timestamp)
    .await?;
```

# Reference-level explanation

The main implementation will include:

1. Add new fields (`if_modified_since`, `if_unmodified_since`) and related functions to `OpRead`.

2. Add the related functions to `FutureReader`.

3. Add new capability flags:
```rust
pub struct Capability {
    // ... other fields
    pub read_with_if_modified_since: bool,
    pub read_with_if_unmodified_since: bool,
}
```
4. Implement `if_modified_since` and `if_unmodified_since` for the underlying storage services.
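
To summarize how the four conditions combine, here is a small local model. It is only a sketch: in practice the check is normally evaluated by the storage service itself, and `ReadConditions` and `check_conditions` are hypothetical names, with `std::time::SystemTime` used in place of a `chrono` timestamp and a one-variant stand-in for `ErrorKind`.

```rust
use std::time::SystemTime;

/// One-variant stand-in for OpenDAL's `ErrorKind`.
#[derive(Debug, PartialEq)]
enum ErrorKind {
    ConditionNotMatch,
}

/// Hypothetical container for the four conditional read options.
#[derive(Default)]
struct ReadConditions {
    if_match: Option<String>,
    if_none_match: Option<String>,
    if_modified_since: Option<SystemTime>,
    if_unmodified_since: Option<SystemTime>,
}

/// Check an object's ETag and modification time against the conditions.
/// An unset condition always passes; all set conditions must hold.
fn check_conditions(
    etag: &str,
    last_modified: SystemTime,
    cond: &ReadConditions,
) -> Result<(), ErrorKind> {
    let ok = cond.if_match.as_deref().map_or(true, |e| e == etag)
        && cond.if_none_match.as_deref().map_or(true, |e| e != etag)
        && cond.if_modified_since.map_or(true, |t| last_modified > t)
        && cond.if_unmodified_since.map_or(true, |t| last_modified <= t);

    if ok {
        Ok(())
    } else {
        Err(ErrorKind::ConditionNotMatch)
    }
}
```

Treating unset options as always satisfied keeps plain reads unaffected; only the options a caller actually sets can cause `ConditionNotMatch`.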

# Drawbacks

- Add complexity to the API

# Rationale and alternatives

- Follows existing OpenDAL patterns for conditional operations

# Prior art

None

# Unresolved questions

None

# Future possibilities

None
@@ -0,0 +1,81 @@
- Proposal Name: `list_with_deleted`
- Start Date: 2025-01-02
- RFC PR: [apache/opendal#5495](https://github.com/apache/opendal/pull/0000)
- Tracking Issue: [apache/opendal#5496](https://github.com/apache/opendal/issues/5496)

# Summary

Add `list_with(path).deleted(true)` to enable users to list deleted files from storage services.

# Motivation

OpenDAL is currently working on adding support for file versions, allowing users to read, list, and delete them.

```rust
// Read given version
op.read_with(path).version(version_id).await;
// Fetch the metadata of given version.
op.stat_with(path).version(version_id).await;
// Delete the given version.
op.delete_with(path).version(version_id).await;
// List the path's versions.
op.list_with(path).versions().await;
```

However, to implement the complete data recovery workflow, we should also include support for recovering deleted files from storage services. This feature is referred to as `DeleteMarker` in S3 and `Soft Deleted` in Azure Blob Storage or Google Cloud Storage. Users can utilize these deleted files (or versions) to restore files that may have been accidentally deleted.

# Guide-level explanation

I suggest adding `list_with(path).deleted(true)` to allow users to list deleted files from storage services.

```rust
let entries = op.list_with(path).deleted(true).await;
```

Please note that `deleted` here means "including deleted files" rather than "only deleted files." Therefore, `list_with(path).deleted(true)` will list both current files and deleted ones.

At the same time, we will add an `is_deleted` field to the `Metadata` struct to indicate whether the file has been deleted. Together with the existing `is_current` field, we will have the following matrix:

| `is_current` | `is_deleted` | Description |
|---|---|---|
| `Some(true)` | `false` | **The metadata's associated version is the latest, current version.** This is the normal state, indicating that this version is the most up-to-date and accessible version. |
| `Some(true)` | `true` | **The metadata's associated version is the latest, deleted version (Latest Delete Marker or Soft Deleted).** This is particularly important in object storage systems like S3. It signifies that this version is the **most recent delete marker**, indicating the object has been deleted. Subsequent GET requests will return 404 errors unless a specific version ID is provided. |
| `Some(false)` | `false` | **The metadata's associated version is neither the latest version nor deleted.** This indicates that this version is a previous version, still accessible by specifying its version ID. |
| `Some(false)` | `true` | **The metadata's associated version is not the latest version and is deleted.** This represents a historical version that has been marked for deletion. Users will need to specify the version ID to access it, and accessing it may be subject to specific delete marker behavior (e.g., in S3, it might not return actual data but a specific delete marker response). |
| `None` | `false` | **The metadata's associated file is not deleted, but its version status is either unknown or it is not the latest version.** This likely indicates that versioning is not enabled for this file, or versioning information is unavailable. |
| `None` | `true` | **The metadata's associated file is deleted, but its version status is either unknown or it is not the latest version.** This typically means the file was deleted without versioning enabled, or its versioning information is unavailable. This may represent an actual data deletion operation rather than an S3 delete marker. |
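
The matrix can also be read as an exhaustive `match` over the two fields. The function below is a hypothetical helper for illustration only; its labels are shortened paraphrases of the table rows, not strings defined by OpenDAL.

```rust
/// Hypothetical helper: map `is_current` and `is_deleted` to a short label
/// paraphrasing the corresponding row of the matrix.
fn describe_version(is_current: Option<bool>, is_deleted: bool) -> &'static str {
    match (is_current, is_deleted) {
        (Some(true), false) => "latest, current version",
        (Some(true), true) => "latest delete marker or soft-deleted version",
        (Some(false), false) => "previous version, still accessible by version ID",
        (Some(false), true) => "previous version that is deleted",
        (None, false) => "not deleted, version status unknown",
        (None, true) => "deleted, version status unknown",
    }
}
```

Since the compiler checks that all six combinations are covered, a caller that branches this way cannot silently miss one of the states.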
|
|
47
|
+
|
|
48
|
+
|
|
49
|
+
# Reference-level explanation
|
|
50
|
+
|
|
51
|
+
- Implement the `list_with(path).deleted(true)` API for the `Operator`.
|
|
52
|
+
- Add an `is_deleted` field to `Metadata`.
|
|
53
|
+
- Integrate logic for including deleted files into the `list` method of the storage service.
|
|
54
|
+
|
|
55
|
+
# Drawbacks
|
|
56
|
+
|
|
57
|
+
None.
|
|
58
|
+
|
|
59
|
+
# Rationale and alternatives
|
|
60
|
+
|
|
61
|
+
## Why "including deleted files" rather than "only deleted files"?
|
|
62
|
+
|
|
63
|
+
Most storage services are designed to list files along with deleted files, rather than exclusively listing deleted files. For example:
|
|
64
|
+
|
|
65
|
+
- S3's [ListObjectVersions](https://docs.aws.amazon.com/AmazonS3/latest/API/API_ListObjectVersions.html) API lists all versions of an object, including delete markers.
|
|
66
|
+
- GCS's [list](https://cloud.google.com/storage/docs/json_api/v1/objects/list) API includes a parameter `softDeleted` to display soft-deleted files.
|
|
67
|
+
- AzBlob's [List Blobs](https://learn.microsoft.com/en-us/rest/api/storageservices/list-blobs) API supports the parameter `include=deleted` to list soft-deleted blobs.
|
|
68
|
+
|
|
69
|
+
So, it is more natural to list files along with deleted files, rather than only listing deleted files.
|
|
70
|
+
|
|
71
|
+
# Prior art
|
|
72
|
+
|
|
73
|
+
None.
|
|
74
|
+
|
|
75
|
+
# Unresolved questions
|
|
76
|
+
|
|
77
|
+
None.
|
|
78
|
+
|
|
79
|
+
# Future possibilities
|
|
80
|
+
|
|
81
|
+
None.
|
|
@@ -0,0 +1,121 @@
|
|
|
1
|
+
- Proposal Name: `write_returns_metadata`
|
|
2
|
+
- Start Date: 2025-01-16
|
|
3
|
+
- RFC PR: [apache/opendal#5556](https://github.com/apache/opendal/pull/5556)
|
|
4
|
+
- Tracking Issue: [apache/opendal#5557](https://github.com/apache/opendal/issues/5557)
|
|
5
|
+
|
|
6
|
+
# Summary
|
|
7
|
+
|
|
8
|
+
Enhance write operations by returning metadata after successful writes.
|
|
9
|
+
|
|
10
|
+
# Motivation
|
|
11
|
+
|
|
12
|
+
Currently, write operations (`write`, `write_with`, `writer`, `writer_with`) only return `Result<()>` or `Result<Writer>`.
|
|
13
|
+
Users who need metadata after writing (like `ETag` or `version_id`) must make an additional `stat()` call. This is inefficient
|
|
14
|
+
and can lead to race conditions if the file is modified between the write and stat operations.
|
|
15
|
+
|
|
16
|
+
Many storage services (like S3, GCS, Azure Blob) return metadata in their write responses. We should expose this information
|
|
17
|
+
to users directly after write operations.
|
|
18
|
+
|
|
19
|
+
# Guide-level explanation
|
|
20
|
+
|
|
21
|
+
The write operations will be enhanced to return metadata:
|
|
22
|
+
|
|
23
|
+
```rust
|
|
24
|
+
// Before
|
|
25
|
+
op.write("path/to/file", data).await?;
|
|
26
|
+
let meta = op.stat("path/to/file").await?;
|
|
27
|
+
if let Some(etag) = meta.etag() {
|
|
28
|
+
println!("File ETag: {}", etag);
|
|
29
|
+
}
|
|
30
|
+
|
|
31
|
+
// After
|
|
32
|
+
let meta = op.write("path/to/file", data).await?;
|
|
33
|
+
if let Some(etag) = meta.etag() {
|
|
34
|
+
println!("File ETag: {}", etag);
|
|
35
|
+
}
|
|
36
|
+
```
|
|
37
|
+
|
|
38
|
+
For writer operations:
|
|
39
|
+
|
|
40
|
+
```rust
|
|
41
|
+
// Before
|
|
42
|
+
let mut writer = op.writer("path/to/file").await?;
|
|
43
|
+
writer.write(data).await?;
|
|
44
|
+
writer.close().await?;
|
|
45
|
+
let meta = op.stat("path/to/file").await?;
|
|
46
|
+
if let Some(etag) = meta.etag() {
|
|
47
|
+
println!("File ETag: {}", etag);
|
|
48
|
+
}
|
|
49
|
+
|
|
50
|
+
// After
|
|
51
|
+
let mut writer = op.writer("path/to/file").await?;
|
|
52
|
+
writer.write(data).await?;
|
|
53
|
+
let meta = writer.close().await?;
|
|
54
|
+
if let Some(etag) = meta.etag() {
|
|
55
|
+
println!("File ETag: {}", etag);
|
|
56
|
+
}
|
|
57
|
+
```
|
|
58
|
+
|
|
59
|
+
The behavior remains unchanged for users who don't need the metadata; they can simply ignore the return value.
|
|
60
|
+
|
|
61
|
+
# Reference-level explanation
|
|
62
|
+
|
|
63
|
+
## Changes to `Operator` API
|
|
64
|
+
|
|
65
|
+
The following functions will be modified to return `Result<Metadata>` instead of `Result<()>`:
|
|
66
|
+
|
|
67
|
+
- `write()`
|
|
68
|
+
- `write_with()`
|
|
69
|
+
|
|
70
|
+
The `writer()` and `writer_with()` return types remain unchanged as they return `Result<Writer>`.
|
|
71
|
+
|
|
72
|
+
## Changes to struct `Writer`
|
|
73
|
+
|
|
74
|
+
The `Writer` struct will be modified to return `Result<Metadata>` instead of `Result<()>` for the `close()` function.
|
|
75
|
+
|
|
76
|
+
## Changes to trait `oio::Write` and trait `oio::MultipartWrite`
|
|
77
|
+
|
|
78
|
+
The `Write` trait will be modified to return `Result<Metadata>` instead of `Result<()>` for the `close()` function.
|
|
79
|
+
|
|
80
|
+
The `MultipartWrite` trait will be modified to return `Result<Metadata>` instead of `Result<()>` for the `complete_part()`
|
|
81
|
+
and `write_once()` functions.
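
The shape of this change can be sketched with a simplified trait. This is an illustration only: the real `oio::Write` trait is asynchronous and uses OpenDAL's own `Result` and `Metadata` types, whereas the `Write` trait, `Metadata` struct, and `MemoryWriter` below are hypothetical, synchronous stand-ins.

```rust
/// Hypothetical, reduced stand-in for OpenDAL's `Metadata`.
#[derive(Debug, Default, Clone, PartialEq, Eq)]
pub struct Metadata {
    pub content_length: u64,
    pub etag: Option<String>,
}

/// Simplified, synchronous sketch of a writer trait whose `close`
/// returns metadata instead of `()`.
pub trait Write {
    fn write(&mut self, bs: &[u8]) -> Result<(), String>;
    fn close(&mut self) -> Result<Metadata, String>;
}

/// Hypothetical in-memory writer used to exercise the trait.
#[derive(Default)]
pub struct MemoryWriter {
    buf: Vec<u8>,
}

impl Write for MemoryWriter {
    fn write(&mut self, bs: &[u8]) -> Result<(), String> {
        self.buf.extend_from_slice(bs);
        Ok(())
    }

    fn close(&mut self) -> Result<Metadata, String> {
        // This backend knows only the length; it has no ETag to report.
        Ok(Metadata {
            content_length: self.buf.len() as u64,
            etag: None,
        })
    }
}
```

A caller that does not care about the metadata can still write `writer.close()?;` and discard the returned value.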
|
|
82
|
+
|
|
83
|
+
## Implementation Details
|
|
84
|
+
|
|
85
|
+
For services that return metadata in their write responses:
|
|
86
|
+
- The metadata will be captured from the service response
|
|
87
|
+
- All available fields (etag, version_id, etc.) will be populated
|
|
88
|
+
|
|
89
|
+
For services that don't return metadata in write responses:
|
|
90
|
+
- For `fs`: we can use `stat` to retrieve the metadata before returning. Since the metadata is cached by the kernel,
|
|
91
|
+
this won't cause a performance issue.
|
|
92
|
+
- For other services: a default metadata object will be returned.
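
The two non-`fs` cases above can be sketched as follows. This is a minimal illustration under stated assumptions: `Metadata` is reduced to two optional fields, and the `WriteOutcome` enum and `metadata_after_write` function are hypothetical names, not OpenDAL APIs. The `fs` case, which would call `stat`, is not modeled here.

```rust
/// Hypothetical, reduced stand-in for OpenDAL's `Metadata`.
#[derive(Debug, Default, Clone, PartialEq, Eq)]
pub struct Metadata {
    pub etag: Option<String>,
    pub version_id: Option<String>,
}

/// Hypothetical summary of what a backend reported after a successful write.
pub enum WriteOutcome {
    /// The service included metadata in its write response.
    Reported {
        etag: Option<String>,
        version_id: Option<String>,
    },
    /// The service returned nothing useful.
    Silent,
}

/// Build the metadata handed back to the caller after a write.
pub fn metadata_after_write(outcome: WriteOutcome) -> Metadata {
    match outcome {
        // Populate every field the service made available.
        WriteOutcome::Reported { etag, version_id } => Metadata { etag, version_id },
        // Fall back to a default (empty) metadata object.
        WriteOutcome::Silent => Metadata::default(),
    }
}
```

With this shape, callers treat every field as optional, so code such as `if let Some(etag) = meta.etag` works the same way whether or not the service reported anything.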
|
|
93
|
+
|
|
94
|
+
|
|
95
|
+
# Drawbacks
|
|
96
|
+
|
|
97
|
+
- Minor breaking change for users who explicitly type the return value of write operations
|
|
98
|
+
- Additional complexity in the Writer implementation
|
|
99
|
+
|
|
100
|
+
# Rationale and alternatives
|
|
101
|
+
|
|
102
|
+
- Provides a clean, consistent API
|
|
103
|
+
- Maintains backward compatibility for users who ignore the return value
|
|
104
|
+
- Improves performance by avoiding additional stat calls when possible
|
|
105
|
+
|
|
106
|
+
# Prior art
|
|
107
|
+
|
|
108
|
+
Similar patterns exist in other storage SDKs:
|
|
109
|
+
|
|
110
|
+
- `object_store` crate returns metadata in `PutResult` after calling `put_opts`
|
|
111
|
+
- AWS SDK returns metadata in `PutObjectOutput`
|
|
112
|
+
- Azure SDK returns `UploadFileResponse` after uploads
|
|
113
|
+
|
|
114
|
+
# Unresolved questions
|
|
115
|
+
|
|
116
|
+
- None
|
|
117
|
+
|
|
118
|
+
|
|
119
|
+
# Future possibilities
|
|
120
|
+
|
|
121
|
+
- None
|