opendal 0.1.6.pre.rc.1-aarch64-linux
- checksums.yaml +7 -0
- data/.standard.yml +20 -0
- data/.tool-versions +1 -0
- data/.yardopts +1 -0
- data/DEPENDENCIES.md +9 -0
- data/DEPENDENCIES.rust.tsv +277 -0
- data/Gemfile +35 -0
- data/README.md +159 -0
- data/Rakefile +149 -0
- data/core/CHANGELOG.md +4929 -0
- data/core/CONTRIBUTING.md +61 -0
- data/core/DEPENDENCIES.md +3 -0
- data/core/DEPENDENCIES.rust.tsv +185 -0
- data/core/LICENSE +201 -0
- data/core/README.md +228 -0
- data/core/benches/README.md +18 -0
- data/core/benches/ops/README.md +26 -0
- data/core/benches/types/README.md +9 -0
- data/core/benches/vs_fs/README.md +35 -0
- data/core/benches/vs_s3/README.md +55 -0
- data/core/edge/README.md +3 -0
- data/core/edge/file_write_on_full_disk/README.md +14 -0
- data/core/edge/s3_aws_assume_role_with_web_identity/README.md +18 -0
- data/core/edge/s3_read_on_wasm/.gitignore +3 -0
- data/core/edge/s3_read_on_wasm/README.md +42 -0
- data/core/edge/s3_read_on_wasm/webdriver.json +15 -0
- data/core/examples/README.md +23 -0
- data/core/examples/basic/README.md +15 -0
- data/core/examples/concurrent-upload/README.md +15 -0
- data/core/examples/multipart-upload/README.md +15 -0
- data/core/fuzz/.gitignore +5 -0
- data/core/fuzz/README.md +68 -0
- data/core/src/docs/comparisons/vs_object_store.md +183 -0
- data/core/src/docs/performance/concurrent_write.md +101 -0
- data/core/src/docs/performance/http_optimization.md +124 -0
- data/core/src/docs/rfcs/0000_example.md +74 -0
- data/core/src/docs/rfcs/0000_foyer_integration.md +111 -0
- data/core/src/docs/rfcs/0041_object_native_api.md +185 -0
- data/core/src/docs/rfcs/0044_error_handle.md +198 -0
- data/core/src/docs/rfcs/0057_auto_region.md +160 -0
- data/core/src/docs/rfcs/0069_object_stream.md +145 -0
- data/core/src/docs/rfcs/0090_limited_reader.md +155 -0
- data/core/src/docs/rfcs/0112_path_normalization.md +79 -0
- data/core/src/docs/rfcs/0191_async_streaming_io.md +328 -0
- data/core/src/docs/rfcs/0203_remove_credential.md +96 -0
- data/core/src/docs/rfcs/0221_create_dir.md +89 -0
- data/core/src/docs/rfcs/0247_retryable_error.md +87 -0
- data/core/src/docs/rfcs/0293_object_id.md +67 -0
- data/core/src/docs/rfcs/0337_dir_entry.md +191 -0
- data/core/src/docs/rfcs/0409_accessor_capabilities.md +67 -0
- data/core/src/docs/rfcs/0413_presign.md +154 -0
- data/core/src/docs/rfcs/0423_command_line_interface.md +268 -0
- data/core/src/docs/rfcs/0429_init_from_iter.md +107 -0
- data/core/src/docs/rfcs/0438_multipart.md +163 -0
- data/core/src/docs/rfcs/0443_gateway.md +73 -0
- data/core/src/docs/rfcs/0501_new_builder.md +111 -0
- data/core/src/docs/rfcs/0554_write_refactor.md +96 -0
- data/core/src/docs/rfcs/0561_list_metadata_reuse.md +210 -0
- data/core/src/docs/rfcs/0599_blocking_api.md +157 -0
- data/core/src/docs/rfcs/0623_redis_service.md +300 -0
- data/core/src/docs/rfcs/0627_split_capabilities.md +89 -0
- data/core/src/docs/rfcs/0661_path_in_accessor.md +126 -0
- data/core/src/docs/rfcs/0793_generic_kv_services.md +209 -0
- data/core/src/docs/rfcs/0926_object_reader.md +93 -0
- data/core/src/docs/rfcs/0977_refactor_error.md +151 -0
- data/core/src/docs/rfcs/1085_object_handler.md +73 -0
- data/core/src/docs/rfcs/1391_object_metadataer.md +110 -0
- data/core/src/docs/rfcs/1398_query_based_metadata.md +125 -0
- data/core/src/docs/rfcs/1420_object_writer.md +147 -0
- data/core/src/docs/rfcs/1477_remove_object_concept.md +159 -0
- data/core/src/docs/rfcs/1735_operation_extension.md +117 -0
- data/core/src/docs/rfcs/2083_writer_sink_api.md +106 -0
- data/core/src/docs/rfcs/2133_append_api.md +88 -0
- data/core/src/docs/rfcs/2299_chain_based_operator_api.md +99 -0
- data/core/src/docs/rfcs/2602_object_versioning.md +138 -0
- data/core/src/docs/rfcs/2758_merge_append_into_write.md +79 -0
- data/core/src/docs/rfcs/2774_lister_api.md +66 -0
- data/core/src/docs/rfcs/2779_list_with_metakey.md +143 -0
- data/core/src/docs/rfcs/2852_native_capability.md +58 -0
- data/core/src/docs/rfcs/2884_merge_range_read_into_read.md +80 -0
- data/core/src/docs/rfcs/3017_remove_write_copy_from.md +94 -0
- data/core/src/docs/rfcs/3197_config.md +237 -0
- data/core/src/docs/rfcs/3232_align_list_api.md +69 -0
- data/core/src/docs/rfcs/3243_list_prefix.md +128 -0
- data/core/src/docs/rfcs/3356_lazy_reader.md +111 -0
- data/core/src/docs/rfcs/3526_list_recursive.md +59 -0
- data/core/src/docs/rfcs/3574_concurrent_stat_in_list.md +80 -0
- data/core/src/docs/rfcs/3734_buffered_reader.md +64 -0
- data/core/src/docs/rfcs/3898_concurrent_writer.md +66 -0
- data/core/src/docs/rfcs/3911_deleter_api.md +165 -0
- data/core/src/docs/rfcs/4382_range_based_read.md +213 -0
- data/core/src/docs/rfcs/4638_executor.md +215 -0
- data/core/src/docs/rfcs/5314_remove_metakey.md +120 -0
- data/core/src/docs/rfcs/5444_operator_from_uri.md +162 -0
- data/core/src/docs/rfcs/5479_context.md +140 -0
- data/core/src/docs/rfcs/5485_conditional_reader.md +112 -0
- data/core/src/docs/rfcs/5495_list_with_deleted.md +81 -0
- data/core/src/docs/rfcs/5556_write_returns_metadata.md +121 -0
- data/core/src/docs/rfcs/5871_read_returns_metadata.md +112 -0
- data/core/src/docs/rfcs/6189_remove_native_blocking.md +106 -0
- data/core/src/docs/rfcs/6209_glob_support.md +132 -0
- data/core/src/docs/rfcs/6213_options_api.md +142 -0
- data/core/src/docs/rfcs/README.md +62 -0
- data/core/src/docs/upgrade.md +1556 -0
- data/core/src/services/aliyun_drive/docs.md +61 -0
- data/core/src/services/alluxio/docs.md +45 -0
- data/core/src/services/azblob/docs.md +77 -0
- data/core/src/services/azdls/docs.md +73 -0
- data/core/src/services/azfile/docs.md +65 -0
- data/core/src/services/b2/docs.md +54 -0
- data/core/src/services/cacache/docs.md +38 -0
- data/core/src/services/cloudflare_kv/docs.md +21 -0
- data/core/src/services/cos/docs.md +55 -0
- data/core/src/services/d1/docs.md +48 -0
- data/core/src/services/dashmap/docs.md +38 -0
- data/core/src/services/dbfs/docs.md +57 -0
- data/core/src/services/dropbox/docs.md +64 -0
- data/core/src/services/etcd/docs.md +45 -0
- data/core/src/services/foundationdb/docs.md +42 -0
- data/core/src/services/fs/docs.md +49 -0
- data/core/src/services/ftp/docs.md +42 -0
- data/core/src/services/gcs/docs.md +76 -0
- data/core/src/services/gdrive/docs.md +65 -0
- data/core/src/services/ghac/docs.md +84 -0
- data/core/src/services/github/docs.md +52 -0
- data/core/src/services/gridfs/docs.md +46 -0
- data/core/src/services/hdfs/docs.md +140 -0
- data/core/src/services/hdfs_native/docs.md +35 -0
- data/core/src/services/http/docs.md +45 -0
- data/core/src/services/huggingface/docs.md +61 -0
- data/core/src/services/ipfs/docs.md +45 -0
- data/core/src/services/ipmfs/docs.md +14 -0
- data/core/src/services/koofr/docs.md +51 -0
- data/core/src/services/lakefs/docs.md +62 -0
- data/core/src/services/memcached/docs.md +47 -0
- data/core/src/services/memory/docs.md +36 -0
- data/core/src/services/mini_moka/docs.md +19 -0
- data/core/src/services/moka/docs.md +42 -0
- data/core/src/services/mongodb/docs.md +49 -0
- data/core/src/services/monoiofs/docs.md +46 -0
- data/core/src/services/mysql/docs.md +47 -0
- data/core/src/services/obs/docs.md +54 -0
- data/core/src/services/onedrive/docs.md +115 -0
- data/core/src/services/opfs/docs.md +18 -0
- data/core/src/services/oss/docs.md +74 -0
- data/core/src/services/pcloud/docs.md +51 -0
- data/core/src/services/persy/docs.md +43 -0
- data/core/src/services/postgresql/docs.md +47 -0
- data/core/src/services/redb/docs.md +41 -0
- data/core/src/services/redis/docs.md +43 -0
- data/core/src/services/rocksdb/docs.md +54 -0
- data/core/src/services/s3/compatible_services.md +126 -0
- data/core/src/services/s3/docs.md +244 -0
- data/core/src/services/seafile/docs.md +54 -0
- data/core/src/services/sftp/docs.md +49 -0
- data/core/src/services/sled/docs.md +39 -0
- data/core/src/services/sqlite/docs.md +46 -0
- data/core/src/services/surrealdb/docs.md +54 -0
- data/core/src/services/swift/compatible_services.md +53 -0
- data/core/src/services/swift/docs.md +52 -0
- data/core/src/services/tikv/docs.md +43 -0
- data/core/src/services/upyun/docs.md +51 -0
- data/core/src/services/vercel_artifacts/docs.md +40 -0
- data/core/src/services/vercel_blob/docs.md +45 -0
- data/core/src/services/webdav/docs.md +49 -0
- data/core/src/services/webhdfs/docs.md +90 -0
- data/core/src/services/yandex_disk/docs.md +45 -0
- data/core/tests/behavior/README.md +77 -0
- data/core/tests/data/normal_dir/.gitkeep +0 -0
- data/core/tests/data/normal_file.txt +1041 -0
- data/core/tests/data/special_dir !@#$%^&()_+-=;',/.gitkeep +0 -0
- data/core/tests/data/special_file !@#$%^&()_+-=;',.txt +1041 -0
- data/core/users.md +13 -0
- data/extconf.rb +24 -0
- data/lib/opendal.rb +25 -0
- data/lib/opendal_ruby/entry.rb +35 -0
- data/lib/opendal_ruby/io.rb +70 -0
- data/lib/opendal_ruby/metadata.rb +44 -0
- data/lib/opendal_ruby/opendal_ruby.so +0 -0
- data/lib/opendal_ruby/operator.rb +29 -0
- data/lib/opendal_ruby/operator_info.rb +26 -0
- data/opendal.gemspec +91 -0
- data/test/blocking_op_test.rb +112 -0
- data/test/capability_test.rb +42 -0
- data/test/io_test.rb +172 -0
- data/test/lister_test.rb +77 -0
- data/test/metadata_test.rb +78 -0
- data/test/middlewares_test.rb +46 -0
- data/test/operator_info_test.rb +35 -0
- data/test/test_helper.rb +36 -0
- metadata +240 -0
|
@@ -0,0 +1,80 @@
|
|
|
1
|
+
- Proposal Name: `merge_range_read_into_read`
|
|
2
|
+
- Start Date: 2023-08-20
|
|
3
|
+
- RFC PR: [apache/opendal#2884](https://github.com/apache/opendal/pull/2884)
|
|
4
|
+
- Tracking Issue: [apache/opendal#2885](https://github.com/apache/opendal/issues/2885)
|
|
5
|
+
|
|
6
|
+
# Summary
|
|
7
|
+
|
|
8
|
+
Merge the `range_read` API into `read` by deleting the `op.range_reader(path, range)` and `op.range_read(path, range)` methods.
|
|
9
|
+
|
|
10
|
+
# Motivation
|
|
11
|
+
|
|
12
|
+
Currently OpenDAL has separate `range_read` and `read` APIs:
|
|
13
|
+
|
|
14
|
+
```rust
let bs = op.range_read("path/to/file", 1024..2048).await?;

let bs = op.read("path/to/file").await?;
```
|
|
19
|
+
|
|
20
|
+
The same applies to the `range_reader` and `reader` APIs:
|
|
21
|
+
```rust
let reader = op.range_reader("path/to/file", 1024..2048).await?;

let reader = op.reader("path/to/file").await?;
```
|
|
26
|
+
|
|
27
|
+
|
|
28
|
+
This duplication forces users to learn two different APIs for reading data.
|
|
29
|
+
|
|
30
|
+
With this change, we can:
|
|
31
|
+
|
|
32
|
+
- Simplify the API surface: users only need to learn one reading API.
|
|
33
|
+
- Reduce code duplication between read and range_read implementations.
|
|
34
|
+
- Handle atomic read semantics internally in `reader`.
|
|
35
|
+
|
|
36
|
+
# Guide-level explanation
|
|
37
|
+
|
|
38
|
+
There is no new way to read data from a file. The `read` and `reader` APIs support range reads by default.
|
|
39
|
+
|
|
40
|
+
Calling `read_with("path/to/file").range(range)` will read only the given range of the file.
|
|
41
|
+
|
|
42
|
+
```rust
let bs = op.read_with("path/to/file").range(1024..2048).await?;
```
|
|
45
|
+
|
|
46
|
+
Calling `reader_with("path/to/file").range(range)` will return a `reader` limited to the given range.
|
|
47
|
+
|
|
48
|
+
```rust
let rs = op.reader_with("path/to/file").range(1024..2048).await?;
```
|
|
51
|
+
|
|
52
|
+
There are no longer separate `range_read` and `range_reader` APIs.
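As a rough illustration of why a single entry point is enough, the following self-contained sketch models the builder style over plain in-memory bytes. `ReadOp`, its `call` method, and the free function `read_with` are hypothetical stand-ins invented for this example, not OpenDAL's actual types; the only point is that an optional `range` on one builder covers both the full read and the range read.

```rust
use std::ops::Range;

/// Hypothetical stand-in for a pending read operation (not OpenDAL's type).
struct ReadOp<'a> {
    data: &'a [u8],
    range: Option<Range<usize>>,
}

/// Start a read over `data`; with no further options it reads everything.
fn read_with(data: &[u8]) -> ReadOp<'_> {
    ReadOp { data, range: None }
}

impl<'a> ReadOp<'a> {
    /// Restrict the read to `range`.
    fn range(mut self, range: Range<usize>) -> Self {
        self.range = Some(range);
        self
    }

    /// Execute the read, clamping the range to the available data.
    fn call(self) -> Vec<u8> {
        let len = self.data.len();
        let (start, end) = match self.range {
            Some(r) => {
                let start = r.start.min(len);
                (start, r.end.min(len).max(start))
            }
            None => (0, len),
        };
        self.data[start..end].to_vec()
    }
}
```

In this sketch, `read_with(data).call()` plays the role of a plain read, and `read_with(data).range(6..11).call()` plays the role of the removed range read.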
|
|
53
|
+
|
|
54
|
+
# Reference-level explanation
|
|
55
|
+
|
|
56
|
+
None
|
|
57
|
+
|
|
58
|
+
# Drawbacks
|
|
59
|
+
|
|
60
|
+
None
|
|
61
|
+
|
|
62
|
+
## Breaking Changes
|
|
63
|
+
|
|
64
|
+
This RFC has removed the `range_read` and `range_reader` APIs. If you have been using these APIs, you will need to rewrite your code with `op.read_with("path/to/file").range(1024..2048)` or `op.reader_with("path/to/file").range(1024..2048)`.
|
|
65
|
+
|
|
66
|
+
# Rationale and alternatives
|
|
67
|
+
|
|
68
|
+
None
|
|
69
|
+
|
|
70
|
+
# Prior art
|
|
71
|
+
|
|
72
|
+
None
|
|
73
|
+
|
|
74
|
+
# Unresolved questions
|
|
75
|
+
|
|
76
|
+
None
|
|
77
|
+
|
|
78
|
+
# Future possibilities
|
|
79
|
+
|
|
80
|
+
None
|
|
@@ -0,0 +1,94 @@
|
|
|
1
|
+
- Proposal Name: `remove_write_copy_from`
|
|
2
|
+
- Start Date: 2023-09-06
|
|
3
|
+
- RFC PR: [apache/opendal#3017](https://github.com/apache/opendal/pull/3017)
|
|
4
|
+
- Tracking Issue: [apache/opendal#3017](https://github.com/apache/opendal/issues/3017)
|
|
5
|
+
|
|
6
|
+
# Summary
|
|
7
|
+
|
|
8
|
+
Remove the `oio::Write::copy_from()` API pending a more thoughtful design.
|
|
9
|
+
|
|
10
|
+
# Motivation
|
|
11
|
+
|
|
12
|
+
In [RFC-2083: Writer Sink API](./2083_writer_sink_api.md), we launched an API, initially named `sink` and changed to `copy_from`, that enables data writing from a `Reader` to a `Writer` object.
|
|
13
|
+
|
|
14
|
+
The current API signature is:
|
|
15
|
+
```rust
pub trait Write: Unpin + Send + Sync {
    /// Copies data from the given reader to the writer.
    ///
    /// # Behavior
    ///
    /// - `Ok(n)` indicates successful writing of `n` bytes.
    /// - `Err(err)` indicates a failure, resulting in zero bytes written.
    ///
    /// A situation where `n < size` may arise; the caller should then transmit the remaining bytes until the full amount is written.
    async fn copy_from(&mut self, size: u64, src: oio::Reader) -> Result<u64>;
}
```
|
|
28
|
+
|
|
29
|
+
The API has the following limitations:
|
|
30
|
+
|
|
31
|
+
- Incompatibility with existing buffering and retry mechanisms.
|
|
32
|
+
- Imposes ownership requirements on the reader, complicating its use. The reader must be recreated for every write operation.
|
|
33
|
+
|
|
34
|
+
Due to restrictions in both Rust and Hyper's APIs, the following ideal implementation is currently unattainable:
|
|
35
|
+
|
|
36
|
+
```rust
pub trait Write: Unpin + Send + Sync {
    async fn copy_from(&mut self, size: u64, src: &mut impl oio::Read) -> Result<u64>;
}
```
|
|
41
|
+
|
|
42
|
+
- Rust doesn't allow `impl oio::Read` in a trait method if the trait must remain object safe.
|
|
43
|
+
- hyper doesn't allow a reference here; it requires `impl Stream + 'static`.
|
|
44
|
+
|
|
45
|
+
Given these constraints, the proposal is to remove `oio::Write::copy_from` until a more fitting design becomes feasible.
|
|
46
|
+
|
|
47
|
+
# Guide-level explanation
|
|
48
|
+
|
|
49
|
+
The `Writer::sink()` and `Writer::copy()` methods will be kept, but their internal implementations will be changed to use `AsyncWrite` instead. For example:
|
|
50
|
+
|
|
51
|
+
```diff
 pub async fn copy<R>(&mut self, size: u64, read_from: R) -> Result<u64>
 where
     R: futures::AsyncRead + Send + Sync + Unpin + 'static,
 {
     if let State::Idle(Some(w)) = &mut self.state {
         let r = Box::new(oio::into_streamable_read(
             oio::into_read_from_file(read_from, 0, size),
             64 * 1024,
         ));
-        w.copy_from(size, r).await
+        futures::io::copy(&mut r, w).await
     } else {
         unreachable!(
             "writer state invalid while copy, expect Idle, actual {}",
             self.state
         );
     }
 }
```
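The same idea can be tried out with the standard library alone. The sketch below is a synchronous analogue, not OpenDAL's implementation: it assumes blocking `std::io::Read` and `std::io::Write` in place of the async traits, and the helper name `copy_n` is made up for this example. It copies at most `size` bytes from a reader into a writer without the writer needing a dedicated `copy_from` method.

```rust
use std::io::{self, Read, Write};

/// Copy at most `size` bytes from `src` into `dst`, returning the number of
/// bytes actually copied. `copy_n` is a hypothetical helper for illustration.
fn copy_n<R: Read, W: Write>(size: u64, src: R, dst: &mut W) -> io::Result<u64> {
    // `take` caps the reader at `size` bytes; `io::copy` drives the transfer
    // through the generic `Write` interface.
    io::copy(&mut src.take(size), dst)
}
```

Because the copy is expressed against the generic writer interface, any buffering or retry logic wrapped around the writer stays in effect.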
|
|
71
|
+
|
|
72
|
+
# Reference-level explanation
|
|
73
|
+
|
|
74
|
+
The method `oio::Write::copy_from` will be removed.
|
|
75
|
+
|
|
76
|
+
# Drawbacks
|
|
77
|
+
|
|
78
|
+
The removal eliminates the ability to stream data uploads. A viable alternative is to directly use the `AsyncWrite` implementation offered by `Writer`.
|
|
79
|
+
|
|
80
|
+
# Rationale and Alternatives
|
|
81
|
+
|
|
82
|
+
N/A
|
|
83
|
+
|
|
84
|
+
# Prior Art
|
|
85
|
+
|
|
86
|
+
N/A
|
|
87
|
+
|
|
88
|
+
# Unresolved Questions
|
|
89
|
+
|
|
90
|
+
N/A
|
|
91
|
+
|
|
92
|
+
# Future Possibilities
|
|
93
|
+
|
|
94
|
+
Introduce utility functions such as `Writer::copy_from(r: &dyn AsyncRead)` when possible.
|
|
@@ -0,0 +1,237 @@
|
|
|
1
|
+
- Proposal Name: `config`
|
|
2
|
+
- Start Date: 2023-09-27
|
|
3
|
+
- RFC PR: [apache/opendal#3197](https://github.com/apache/opendal/pull/3197)
|
|
4
|
+
- Tracking Issue: [apache/opendal#3240](https://github.com/apache/opendal/issues/3240)
|
|
5
|
+
|
|
6
|
+
# Summary
|
|
7
|
+
|
|
8
|
+
Expose services config to the user.
|
|
9
|
+
|
|
10
|
+
# Motivation
|
|
11
|
+
|
|
12
|
+
OpenDAL provides two ways to configure services: through a builder pattern and via a map.
|
|
13
|
+
|
|
14
|
+
The `Builder` allows users to configure services using the builder pattern:
|
|
15
|
+
|
|
16
|
+
```rust
// Create fs backend builder.
let mut builder = Fs::default();
// Set the root for fs, all operations will happen under this root.
builder.root("/tmp");

// Build an `Operator` to start operating the storage.
let op: Operator = Operator::new(builder)?.finish();
```
|
|
25
|
+
|
|
26
|
+
The benefit of the builder is that it is type safe and easy to use. However, it is not flexible enough: users must create a new builder for each service they wish to configure, translating user input into the API calls for each respective builder.
|
|
27
|
+
|
|
28
|
+
Consider the following real-world example from one of our users:
|
|
29
|
+
|
|
30
|
+
```rust
let mut builder = services::S3::default();

// Credential.
builder.access_key_id(&cfg.access_key_id);
builder.secret_access_key(&cfg.secret_access_key);
builder.session_token(&cfg.session_token);
builder.role_arn(&cfg.role_arn);
builder.external_id(&cfg.external_id);

// Root.
builder.root(&cfg.root);

// Disable credential loader
if cfg.disable_credential_loader {
    builder.disable_config_load();
    builder.disable_ec2_metadata();
}

// Enable virtual host style
if cfg.enable_virtual_host_style {
    builder.enable_virtual_host_style();
}
```
|
|
54
|
+
|
|
55
|
+
The `Map` approach allows users to configure services using a string-based HashMap:
|
|
56
|
+
|
|
57
|
+
```rust
let map = HashMap::from([
    // Set the root for fs, all operations will happen under this root.
    ("root".to_string(), "/tmp".to_string()),
]);

// Build an `Operator` to start operating the storage.
let op: Operator = Operator::via_map(Scheme::Fs, map)?;
```
|
|
66
|
+
|
|
67
|
+
This approach is simpler since it allows users to configure all services within a single map. However, it is not type safe and not easy to use. Users need to convert their input to strings and make sure every key is correct, and breaking changes could happen silently.
|
|
68
|
+
|
|
69
|
+
This is one of our limitations: we need a way to configure services that is type safe, easy to use and flexible. The other is that there is no way for users to fetch the config of a service after it's built, which makes it hard for users to modify a service's root path dynamically.
|
|
70
|
+
|
|
71
|
+
Our users have to wrap all our configs into an enum and store it in their own struct:
|
|
72
|
+
|
|
73
|
+
```rust
pub enum StorageParams {
    Azblob(StorageAzblobConfig),
    Fs(StorageFsConfig),
    Ftp(StorageFtpConfig),
    Gcs(StorageGcsConfig),
    Hdfs(StorageHdfsConfig),
    Http(StorageHttpConfig),
    Ipfs(StorageIpfsConfig),
    Memory,
    Moka(StorageMokaConfig),
    Obs(StorageObsConfig),
    Oss(StorageOssConfig),
    S3(StorageS3Config),
    Redis(StorageRedisConfig),
    Webhdfs(StorageWebhdfsConfig),
    Cos(StorageCosConfig),
}
```
|
|
92
|
+
|
|
93
|
+
So I propose to expose services config to the users, allowing them to work on config structs directly and fetch the config at runtime.
|
|
94
|
+
|
|
95
|
+
# Guide-level explanation
|
|
96
|
+
|
|
97
|
+
First of all, we will add a config struct for each service. For example, `Fs` will have an `FsConfig` struct and `S3` will have an `S3Config`. The fields within the config struct are public, and the struct is marked as non-exhaustive.
|
|
98
|
+
|
|
99
|
+
```rust
#[non_exhaustive]
pub struct S3Config {
    pub root: Option<String>,
    pub bucket: String,
    pub endpoint: Option<String>,
    pub region: Option<String>,
    ...
}
```
|
|
109
|
+
|
|
110
|
+
Then, we will add a `Config` enum that contains all the config structs. The enum is public and non-exhaustive too.
|
|
111
|
+
|
|
112
|
+
```rust
#[non_exhaustive]
pub enum Config {
    Fs(FsConfig),
    S3(S3Config),
    Custom(&'static str, HashMap<String, String>),
}
```
|
|
120
|
+
|
|
121
|
+
Notably, a `Custom` variant will be added to the enum. This variant aligns with `Scheme::Custom(name)` and allows users to configure custom services.
|
|
122
|
+
|
|
123
|
+
At `Operator` level, we will add `from_config` and `via_config` methods.
|
|
124
|
+
|
|
125
|
+
```rust
impl Operator {
    pub fn via_config(cfg: impl Into<Config>) -> Result<Operator> {}
}
```
|
|
130
|
+
|
|
131
|
+
Additionally, `OperatorInfo` will introduce a new API method, `config()`:
|
|
132
|
+
|
|
133
|
+
```rust
impl OperatorInfo {
    pub fn config(&self) -> Config {}
}
```
|
|
138
|
+
|
|
139
|
+
Users can use `config()` to fetch the config of a service at runtime and construct a new operator based on needs.
|
|
140
|
+
|
|
141
|
+
# Reference-level explanation
|
|
142
|
+
|
|
143
|
+
Every service will have an `XxxConfig` struct.
|
|
144
|
+
|
|
145
|
+
`XxxConfig` will implement the following things:
|
|
146
|
+
|
|
147
|
+
- `Default` trait: All config fields will have a default value.
|
|
148
|
+
- `Deserialize` trait: Allow users to deserialize a config from a string.
|
|
149
|
+
- `Into<Config>` trait: All service configs can be converted to the `Config` enum.
|
|
150
|
+
|
|
151
|
+
Internally, `XxxConfig` will have the following traits:
|
|
152
|
+
|
|
153
|
+
- `FromMap` trait: Allow users to build a config from a hashmap. This will replace the existing `from_map` API in `Builder`.
|
|
154
|
+
- `Into<XxxBuilder>` trait: A config can be converted into its corresponding builder at zero cost.
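The relationships listed above can be sketched in a few lines of standalone Rust. This is an illustrative mock-up, not OpenDAL's code: the single `root` field, the inherent `from_map` function (standing in for the proposed `FromMap` trait), the `config()` accessor on the builder, and the helper `scheme_name` are all assumptions made for the example.

```rust
use std::collections::HashMap;

/// Mock service config with one public field.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct FsConfig {
    pub root: Option<String>,
}

impl FsConfig {
    /// Stand-in for the proposed `FromMap` trait: build a config from string pairs.
    pub fn from_map(map: &HashMap<String, String>) -> Self {
        FsConfig { root: map.get("root").cloned() }
    }
}

/// Mock of the proposed `Config` enum, reduced to two variants.
#[derive(Debug, Clone, PartialEq)]
pub enum Config {
    Fs(FsConfig),
    Custom(&'static str, HashMap<String, String>),
}

/// `From<FsConfig>` also provides `Into<Config>` for `FsConfig`.
impl From<FsConfig> for Config {
    fn from(cfg: FsConfig) -> Self {
        Config::Fs(cfg)
    }
}

/// Mock builder that stores the config instead of duplicating its fields.
#[derive(Debug, Default)]
pub struct FsBuilder {
    config: FsConfig,
}

/// Converting a config into its builder is a plain move.
impl From<FsConfig> for FsBuilder {
    fn from(config: FsConfig) -> Self {
        FsBuilder { config }
    }
}

impl FsBuilder {
    /// The existing builder-style setter keeps working on top of the config.
    pub fn root(&mut self, root: &str) -> &mut Self {
        self.config.root = Some(root.to_string());
        self
    }

    /// Hypothetical accessor used to inspect the stored config.
    pub fn config(&self) -> &FsConfig {
        &self.config
    }
}

/// Any service config can be passed where `impl Into<Config>` is expected.
pub fn scheme_name(cfg: impl Into<Config>) -> &'static str {
    match cfg.into() {
        Config::Fs(_) => "fs",
        Config::Custom(name, _) => name,
    }
}
```

Storing the config inside the builder is what lets the builder's public API stay unchanged while the typed config becomes the single source of truth.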
|
|
155
|
+
|
|
156
|
+
All config fields will be public and non-exhaustive, allowing users to build config this way:
|
|
157
|
+
|
|
158
|
+
```rust
let s3 = S3Config {
    bucket: "test".to_string(),
    endpoint: Some("http://localhost:9000".to_string()),
    ..Default::default()
};
```
|
|
165
|
+
|
|
166
|
+
The public API of existing builders will remain unchanged, although their internal implementations will be modified to utilize `XxxConfig`. Types that can't be represented as `String`, such as `Box<dyn AwsCredentialLoad>` and `HttpClient`, will be kept in `Builder` as before.
|
|
167
|
+
|
|
168
|
+
For example:
|
|
169
|
+
|
|
170
|
+
```rust
#[non_exhaustive]
pub struct S3Config {
    pub root: Option<String>,
    pub bucket: String,
    pub endpoint: Option<String>,
    pub region: Option<String>,
    ...
}

pub struct S3Builder {
    config: S3Config,

    customized_credential_load: Option<Box<dyn AwsCredentialLoad>>,
    http_client: Option<HttpClient>,
}
```
|
|
187
|
+
|
|
188
|
+
# Drawbacks
|
|
189
|
+
|
|
190
|
+
## API Surface
|
|
191
|
+
|
|
192
|
+
This modification will significantly expand OpenDAL's public API surface, making it harder to maintain and increasing the risk of breaking changes. It will also add considerably more work for bindings, which need to implement `XxxConfig` for each service.
|
|
193
|
+
|
|
194
|
+
## Secrets Leakage
|
|
195
|
+
|
|
196
|
+
After our config supports `Serialize`, it's possible that users will serialize the config and log it. This will lead to secrets leakage. We should add a warning in the docs to prevent this. We should also encourage the use of services like AWS IAM instead of static secrets.
|
|
197
|
+
|
|
198
|
+
# Rationale and alternatives
|
|
199
|
+
|
|
200
|
+
## Move `root` out of service config to operator level
|
|
201
|
+
|
|
202
|
+
There is another way to solve the problem: [Move `root` out of service config to operator level](https://github.com/apache/opendal/issues/3151).
|
|
203
|
+
|
|
204
|
+
We can move `root` out of the service config and put it at the `Operator` level. This way, users can configure `root` for all services in one place. However, this is a large breaking change, and users will need to maintain the `root` logic everywhere.
|
|
205
|
+
|
|
206
|
+
# Prior art
|
|
207
|
+
|
|
208
|
+
None.
|
|
209
|
+
|
|
210
|
+
# Unresolved questions
|
|
211
|
+
|
|
212
|
+
None.
|
|
213
|
+
|
|
214
|
+
# Future possibilities
|
|
215
|
+
|
|
216
|
+
## Implement `FromStr` for `Config`
|
|
217
|
+
|
|
218
|
+
We can implement `FromStr` for `Config` so that users can parse a config from a string.
|
|
219
|
+
|
|
220
|
+
```rust
let cfg = Config::from_str("s3://bucket/path/to/file?access_key_id=xxx&secret_access_key=xxx")?;
```
|
|
223
|
+
|
|
224
|
+
## Implement `Serialize` for `Config`
|
|
225
|
+
|
|
226
|
+
We can implement `Serialize` for `Config` so that users can serialize a config.
|
|
227
|
+
|
|
228
|
+
```rust
// Serialize
let bs = serde_json::to_vec(&cfg)?;
// Deserialize
let cfg: Config = serde_json::from_slice(&bs)?;
```
|
|
234
|
+
|
|
235
|
+
## Implement `check` for `Config`
|
|
236
|
+
|
|
237
|
+
Implement check for config so that users can check if a config is valid before `build`.
|
|
@@ -0,0 +1,69 @@
|
|
|
1
|
+
- Proposal Name: `align_list_api`
|
|
2
|
+
- Start Date: 2023-10-07
|
|
3
|
+
- RFC PR: [apache/opendal#3232](https://github.com/apache/opendal/pull/3232)
|
|
4
|
+
- Tracking Issue: [apache/opendal#3236](https://github.com/apache/opendal/issues/3236)
|
|
5
|
+
|
|
6
|
+
# Summary
|
|
7
|
+
|
|
8
|
+
Refactor internal `Page` API to `List` API.
|
|
9
|
+
|
|
10
|
+
# Motivation
|
|
11
|
+
|
|
12
|
+
OpenDAL's `Lister` is implemented by `Page`:
|
|
13
|
+
|
|
14
|
+
```rust
#[async_trait]
pub trait Page: Send + Sync + 'static {
    /// Fetch a new page of [`Entry`]
    ///
    /// `Ok(None)` means all pages have been returned. Any following call
    /// to `next` will always get the same result.
    async fn next(&mut self) -> Result<Option<Vec<Entry>>>;
}
```
|
|
24
|
+
|
|
25
|
+
Each call to `next` will retrieve a page of `Entry` objects. This design is modeled after the `list_object` API used in object storage services. However, this design has several drawbacks:
|
|
26
|
+
|
|
27
|
+
- Services like `fs` and `hdfs` need to buffer the whole page in memory before returning it to the caller.
|
|
28
|
+
- `Page` is not aligned with `opendal::Lister`, which makes the code hard to understand.
|
|
29
|
+
- `Page` is not aligned with `Read` & `Write`, which are poll based.
|
|
30
|
+
|
|
31
|
+
# Guide-level explanation
|
|
32
|
+
|
|
33
|
+
No user-facing changes.
|
|
34
|
+
|
|
35
|
+
# Reference-level explanation
|
|
36
|
+
|
|
37
|
+
We will rename `Page` to `List` and change the API to:
|
|
38
|
+
|
|
39
|
+
```rust
pub trait List: Send + Sync + 'static {
    /// Fetch a new [`Entry`]
    ///
    /// `Ok(None)` means all entries have been returned. Any following call
    /// to `poll_next` will always get the same result.
    fn poll_next(&mut self, cx: &mut Context<'_>) -> Poll<Result<Option<Entry>>>;
}
```
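To make the difference from the page-based design concrete, here is a small standalone sketch of how a paged source could sit behind an entry-by-entry, poll-based interface. It is only an approximation under stated assumptions: `Entry` is reduced to a `String`, errors to a `String`, `poll_next` is assumed to return a `Poll`-wrapped result as poll-based Rust APIs conventionally do, and `PagedLister`, `NoopWake`, and `drain` are hypothetical names introduced for this example.

```rust
use std::collections::VecDeque;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Simplified stand-ins for the real entry and error types.
type Entry = String;
type Result<T> = std::result::Result<T, String>;

trait List {
    /// Yield one entry at a time; `Ok(None)` means the listing is finished.
    fn poll_next(&mut self, cx: &mut Context<'_>) -> Poll<Result<Option<Entry>>>;
}

/// Hypothetical lister backed by pre-fetched pages, as a paged service would return them.
struct PagedLister {
    pages: VecDeque<Vec<Entry>>,
    buffer: VecDeque<Entry>,
}

impl PagedLister {
    fn new(pages: Vec<Vec<Entry>>) -> Self {
        PagedLister { pages: pages.into(), buffer: VecDeque::new() }
    }
}

impl List for PagedLister {
    fn poll_next(&mut self, _cx: &mut Context<'_>) -> Poll<Result<Option<Entry>>> {
        loop {
            // Serve buffered entries first.
            if let Some(entry) = self.buffer.pop_front() {
                return Poll::Ready(Ok(Some(entry)));
            }
            // Refill from the next page, or report the end of the listing.
            match self.pages.pop_front() {
                Some(page) => self.buffer.extend(page),
                None => return Poll::Ready(Ok(None)),
            }
        }
    }
}

/// Waker that does nothing; enough to drive a lister that is always ready.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Collect every entry by polling until `Ok(None)` is returned.
fn drain(lister: &mut impl List) -> Result<Vec<Entry>> {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut out = Vec::new();
    loop {
        match lister.poll_next(&mut cx) {
            Poll::Ready(Ok(Some(entry))) => out.push(entry),
            Poll::Ready(Ok(None)) => return Ok(out),
            Poll::Ready(Err(err)) => return Err(err),
            // Not reachable in this sketch; a real executor would yield here.
            Poll::Pending => continue,
        }
    }
}
```

A service such as `fs` could implement the same trait by yielding each directory entry as it is read, so nothing has to be buffered into a full page first.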
|
|
48
|
+
|
|
49
|
+
All `page` related code will be replaced by `list`.
|
|
50
|
+
|
|
51
|
+
# Drawbacks
|
|
52
|
+
|
|
53
|
+
Breaking changes for raw API.
|
|
54
|
+
|
|
55
|
+
# Rationale and alternatives
|
|
56
|
+
|
|
57
|
+
None
|
|
58
|
+
|
|
59
|
+
# Prior art
|
|
60
|
+
|
|
61
|
+
None
|
|
62
|
+
|
|
63
|
+
# Unresolved questions
|
|
64
|
+
|
|
65
|
+
None
|
|
66
|
+
|
|
67
|
+
# Future possibilities
|
|
68
|
+
|
|
69
|
+
None
|
|
@@ -0,0 +1,128 @@
|
|
|
1
|
+
- Proposal Name: `list_prefix`
|
|
2
|
+
- Start Date: 2023-10-08
|
|
3
|
+
- RFC PR: [apache/opendal#3243](https://github.com/apache/opendal/pull/3243)
|
|
4
|
+
- Tracking Issue: [apache/opendal#3247](https://github.com/apache/opendal/issues/3247)
|
|
5
|
+
|
|
6
|
+
# Summary
|
|
7
|
+
|
|
8
|
+
Allow users to specify a prefix and remove the requirement that the path must end with `/`.
|
|
9
|
+
|
|
10
|
+
# Motivation
|
|
11
|
+
|
|
12
|
+
OpenDAL uses `/` to distinguish between a file and a directory. This design is necessary for object storage services such as S3 and GCS, where both `abc` (file) and `abc/` (directory) can coexist. We require users to provide the correct path to the API. For instance, when using `read("abc/")`, it returns `IsADirectory`, whereas with `list("abc")` it returns `NotADirectory`. This behavior may be perplexing for users.
|
|
13
|
+
|
|
14
|
+
As a side effect of this design, OpenDAL always reports that `stat("not_exist/")` exists, since there is no way for OpenDAL to check whether `not_exist/file_example` exists via a `HeadObject` call.
|
|
15
|
+
|
|
16
|
+
There are some issues and pull requests related to those issues.
|
|
17
|
+
|
|
18
|
+
- [Invalid metadata for dir objects in s3](https://github.com/apache/opendal/issues/3199)
|
|
19
|
+
- [`is_exist` always return true for key end with '/', in S3 service](https://github.com/apache/opendal/issues/2086)
|
|
20
|
+
|
|
21
|
+
POSIX-like file systems also have their own issues, as they lack native support for listing a prefix.
|
|
22
|
+
|
|
23
|
+
Given a file tree like the following:
|
|
24
|
+
|
|
25
|
+
```shell
abc/
abc/def
abc/xyz/
```
|
|
30
|
+
|
|
31
|
+
Calling `list("ab")` will return `NotFound` after we remove the requirement that the path must end with `/`.
|
|
32
|
+
|
|
33
|
+
So I propose the following changes to OpenDAL's API behavior:
|
|
34
|
+
|
|
35
|
+
- Remove the requirement that the path for `list` must end with `/`.
|
|
36
|
+
- Object storage services will use the `list_object` API to check whether a dir exists.
|
|
37
|
+
- Simulate the list prefix behavior for POSIX-like file systems.
|
|
38
|
+
|
|
39
|
+
# Guide-level explanation
|
|
40
|
+
|
|
41
|
+
Given the following file tree:
|
|
42
|
+
|
|
43
|
+
```shell
abc/
abc/def_file
abc/def_dir/
abc/def_dir/xyz_file
abc/def_dir/xyz_dir/
```
|
|
50
|
+
|
|
51
|
+
While listing a path:
|
|
52
|
+
|
|
53
|
+
| Case                    | Path            | Result                              | Description                            |
|-------------------------|-----------------|-------------------------------------|----------------------------------------|
| list dir                | `abc/`          | `abc/def_file` <br/> `abc/def_dir/` | children that match the dir            |
| list prefix             | `abc/def`       | `abc/def_file` <br/> `abc/def_dir/` | children that match the prefix         |
| list file               | `abc/def_file`  | `abc/def_file`                      | the only child that matches the path   |
| list dir without `/`    | `abc/def_dir`   | `abc/def_dir/`                      | the only child that matches the path   |
| list file ends with `/` | `abc/def_file/` | EMPTY                               | no children match the dir              |
| list not exist dir      | `def/`          | EMPTY                               | no children found that match the dir   |
| list not exist file     | `def`           | EMPTY                               | no children found that match the prefix |
|
|
62
|
+
|
|
63
|
+
While listing a path with `delimiter` set to `""`:
|
|
64
|
+
|
|
65
|
+
| Case                    | Path            | Result                                                                                        | Description                            |
|-------------------------|-----------------|-----------------------------------------------------------------------------------------------|----------------------------------------|
| list dir                | `abc/`          | `abc/def_file` <br/> `abc/def_dir/` <br/> `abc/def_dir/xyz_file` <br/> `abc/def_dir/xyz_dir/` | children that match the dir            |
| list prefix             | `abc/def`       | `abc/def_file` <br/> `abc/def_dir/` <br/> `abc/def_dir/xyz_file` <br/> `abc/def_dir/xyz_dir/` | children that match the prefix         |
| list file               | `abc/def_file`  | `abc/def_file`                                                                                | the only child that matches the path   |
| list dir without `/`    | `abc/def_dir`   | `abc/def_dir/` <br/> `abc/def_dir/xyz_file` <br/> `abc/def_dir/xyz_dir/`                      | children that match the path           |
| list file ends with `/` | `abc/def_file/` | EMPTY                                                                                         | no children match the dir              |
| list not exist dir      | `def/`          | EMPTY                                                                                         | no children found that match the dir   |
| list not exist file     | `def`           | EMPTY                                                                                         | no children found that match the prefix |
|
|
74
|
+
|
|
75
|
+
When calling `stat` on a path:
|
|
76
|
+
|
|
77
|
+
| Case                   | Path            | Result                                     |
|------------------------|-----------------|--------------------------------------------|
| stat existing dir      | `abc/`          | Metadata with dir mode                     |
| stat existing file     | `abc/def_file`  | Metadata with file mode                    |
| stat dir without `/`   | `abc/def_dir`   | Error `NotFound` or metadata with dir mode |
| stat file with `/`     | `abc/def_file/` | Error `NotFound`                           |
| stat not existing path | `xyz`           | Error `NotFound`                           |
|
|
84
|
+
|
|
85
|
+
When creating a dir on a path:
|
|
86
|
+
|
|
87
|
+
| Case                        | Path   | Result                     |
|-----------------------------|--------|----------------------------|
| create dir on existing dir  | `abc/` | Ok                         |
| create dir on existing file | `abc`  | Error with `NotADirectory` |
| create dir with `/`         | `xyz/` | Ok                         |
| create dir without `/`      | `xyz`  | Ok with `xyz/` created     |
|
|
93
|
+
|
|
94
|
+
# Reference-level explanation
|
|
95
|
+
|
|
96
|
+
For POSIX-like services, we will:
|
|
97
|
+
|
|
98
|
+
- Simulate the list prefix behavior by listing the parent dir and filtering the children that match the prefix.
|
|
99
|
+
- Return `NotFound` when `stat` is called on an existing file with a trailing `/`.
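The prefix simulation can be sketched without touching a real file system. In the example below the file tree is just a slice of path strings, and `split_parent`, `list_children`, and `list_prefix` are hypothetical helpers written for this illustration rather than OpenDAL functions. It follows the non-recursive behavior in the first table of the guide-level explanation: split the path into a parent dir and a name prefix, list the parent's direct children, and keep those whose names start with the prefix.

```rust
/// Split `path` into its parent dir (including the trailing `/`) and the
/// remaining name prefix. `"abc/def"` becomes `("abc/", "def")`.
fn split_parent(path: &str) -> (&str, &str) {
    match path.rfind('/') {
        Some(idx) => (&path[..=idx], &path[idx + 1..]),
        None => ("", path),
    }
}

/// Direct children of `dir` in `tree` (dirs are spelled with a trailing `/`).
fn list_children<'a>(tree: &[&'a str], dir: &str) -> Vec<&'a str> {
    let mut out = Vec::new();
    for path in tree.iter().copied() {
        if let Some(rest) = path.strip_prefix(dir) {
            // Ignore a trailing `/` so that child dirs count as direct children.
            let name = rest.strip_suffix('/').unwrap_or(rest);
            if !name.is_empty() && !name.contains('/') {
                out.push(path);
            }
        }
    }
    out
}

/// List the parent dir, then keep the children whose names match the prefix.
fn list_prefix<'a>(tree: &[&'a str], path: &str) -> Vec<&'a str> {
    let (parent, name_prefix) = split_parent(path);
    let mut out = Vec::new();
    for child in list_children(tree, parent) {
        if child[parent.len()..].starts_with(name_prefix) {
            out.push(child);
        }
    }
    out
}
```

With the example tree from the guide-level explanation, `list_prefix(&tree, "abc/def")` yields `abc/def_file` and `abc/def_dir/`, while `list_prefix(&tree, "abc/def_file/")` yields nothing.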
|
|
100
|
+
|
|
101
|
+
For object storage services, we will:
|
|
102
|
+
|
|
103
|
+
- Use the `list_object` API when `stat` is called on a path that ends with `/`.
|
|
104
|
+
  - Return dir metadata if the dir exists or has at least one child.
|
|
105
|
+
  - Return `NotFound` if the dir does not exist and has no children.
|
|
106
|
+
- Check the path before creating a dir when the path does not end with `/`.
|
|
107
|
+
  - Return `NotADirectory` if the path exists.
|
|
108
|
+
  - Create the dir with `/` appended.
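The object storage rules above can be modeled with an ordered set of keys standing in for a bucket. This is a sketch under assumptions, not OpenDAL's implementation: `Mode`, `Error`, `stat`, and `create_dir` are simplified types and functions defined only for this example, and the range lookup on the `BTreeSet` plays the role of a `list_object` call with a prefix and a limit of one.

```rust
use std::collections::BTreeSet;

#[derive(Debug, PartialEq)]
enum Mode {
    File,
    Dir,
}

#[derive(Debug, PartialEq)]
enum Error {
    NotFound,
    NotADirectory,
}

/// `stat` over a mock bucket: dir paths are answered by a prefix lookup,
/// file paths by an exact key lookup (the `HeadObject` analogue).
fn stat(keys: &BTreeSet<String>, path: &str) -> Result<Mode, Error> {
    if path.ends_with('/') {
        // The first key >= `path` has `path` as a prefix exactly when the dir
        // key itself exists or the dir has at least one child.
        let found = keys
            .range(path.to_string()..)
            .next()
            .map_or(false, |key| key.starts_with(path));
        if found { Ok(Mode::Dir) } else { Err(Error::NotFound) }
    } else if keys.contains(path) {
        Ok(Mode::File)
    } else {
        Err(Error::NotFound)
    }
}

/// `create_dir` over a mock bucket: reject an existing file of the same name,
/// otherwise create the dir key with `/` appended.
fn create_dir(keys: &mut BTreeSet<String>, path: &str) -> Result<(), Error> {
    if path.ends_with('/') {
        keys.insert(path.to_string());
    } else {
        if keys.contains(path) {
            return Err(Error::NotADirectory);
        }
        keys.insert(format!("{path}/"));
    }
    Ok(())
}
```

In this model `stat("not_exist/")` returns `NotFound`, which is the behavior change the RFC is after, and a dir that exists only implicitly through its children is still reported as a dir.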
|
|
109
|
+
|
|
110
|
+
# Drawbacks
|
|
111
|
+
|
|
112
|
+
None
|
|
113
|
+
|
|
114
|
+
# Rationale and alternatives
|
|
115
|
+
|
|
116
|
+
None
|
|
117
|
+
|
|
118
|
+
# Prior art
|
|
119
|
+
|
|
120
|
+
None
|
|
121
|
+
|
|
122
|
+
# Unresolved questions
|
|
123
|
+
|
|
124
|
+
None
|
|
125
|
+
|
|
126
|
+
# Future possibilities
|
|
127
|
+
|
|
128
|
+
None
|