stac-fastapi-elasticsearch 6.1.0__tar.gz → 6.2.0__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- {stac_fastapi_elasticsearch-6.1.0 → stac_fastapi_elasticsearch-6.2.0}/PKG-INFO +77 -2
- {stac_fastapi_elasticsearch-6.1.0 → stac_fastapi_elasticsearch-6.2.0}/README.md +76 -1
- {stac_fastapi_elasticsearch-6.1.0 → stac_fastapi_elasticsearch-6.2.0}/setup.py +2 -2
- {stac_fastapi_elasticsearch-6.1.0 → stac_fastapi_elasticsearch-6.2.0}/stac_fastapi/elasticsearch/app.py +1 -1
- {stac_fastapi_elasticsearch-6.1.0 → stac_fastapi_elasticsearch-6.2.0}/stac_fastapi/elasticsearch/database_logic.py +96 -46
- {stac_fastapi_elasticsearch-6.1.0 → stac_fastapi_elasticsearch-6.2.0}/stac_fastapi/elasticsearch/version.py +1 -1
- {stac_fastapi_elasticsearch-6.1.0 → stac_fastapi_elasticsearch-6.2.0}/stac_fastapi_elasticsearch.egg-info/PKG-INFO +77 -2
- {stac_fastapi_elasticsearch-6.1.0 → stac_fastapi_elasticsearch-6.2.0}/stac_fastapi_elasticsearch.egg-info/requires.txt +2 -2
- {stac_fastapi_elasticsearch-6.1.0 → stac_fastapi_elasticsearch-6.2.0}/setup.cfg +0 -0
- {stac_fastapi_elasticsearch-6.1.0 → stac_fastapi_elasticsearch-6.2.0}/stac_fastapi/elasticsearch/__init__.py +0 -0
- {stac_fastapi_elasticsearch-6.1.0 → stac_fastapi_elasticsearch-6.2.0}/stac_fastapi/elasticsearch/config.py +0 -0
- {stac_fastapi_elasticsearch-6.1.0 → stac_fastapi_elasticsearch-6.2.0}/stac_fastapi_elasticsearch.egg-info/SOURCES.txt +0 -0
- {stac_fastapi_elasticsearch-6.1.0 → stac_fastapi_elasticsearch-6.2.0}/stac_fastapi_elasticsearch.egg-info/dependency_links.txt +0 -0
- {stac_fastapi_elasticsearch-6.1.0 → stac_fastapi_elasticsearch-6.2.0}/stac_fastapi_elasticsearch.egg-info/entry_points.txt +0 -0
- {stac_fastapi_elasticsearch-6.1.0 → stac_fastapi_elasticsearch-6.2.0}/stac_fastapi_elasticsearch.egg-info/not-zip-safe +0 -0
- {stac_fastapi_elasticsearch-6.1.0 → stac_fastapi_elasticsearch-6.2.0}/stac_fastapi_elasticsearch.egg-info/top_level.txt +0 -0
--- stac_fastapi_elasticsearch-6.1.0/PKG-INFO
+++ stac_fastapi_elasticsearch-6.2.0/PKG-INFO
@@ -1,6 +1,6 @@
 Metadata-Version: 2.1
 Name: stac_fastapi_elasticsearch
-Version: 6.1.0
+Version: 6.2.0
 Summary: An implementation of STAC API based on the FastAPI framework with both Elasticsearch and Opensearch.
 Home-page: https://github.com/stac-utils/stac-fastapi-elasticsearch-opensearch
 License: MIT
@@ -106,6 +106,7 @@ This project is built on the following technologies: STAC, stac-fastapi, FastAPI
 - [Auth](#auth)
 - [Aggregation](#aggregation)
 - [Rate Limiting](#rate-limiting)
+- [Datetime-Based Index Management](#datetime-based-index-management)
 
 ## Documentation & Resources
 
@@ -251,6 +252,81 @@ You can customize additional settings in your `.env` file:
 > [!NOTE]
 > The variables `ES_HOST`, `ES_PORT`, `ES_USE_SSL`, `ES_VERIFY_CERTS` and `ES_TIMEOUT` apply to both Elasticsearch and OpenSearch backends, so there is no need to rename the key names to `OS_` even if you're using OpenSearch.
 
+## Datetime-Based Index Management
+
+### Overview
+
+SFEOS supports two indexing strategies for managing STAC items:
+
+1. **Simple Indexing** (default) - One index per collection
+2. **Datetime-Based Indexing** - Time-partitioned indexes with automatic management
+
+The datetime-based indexing strategy is particularly useful for large temporal datasets. When a user provides a datetime parameter in a query, the system knows exactly which index to search, providing **multiple times faster searches** and significantly **reducing database load**.
+
+### When to Use
+
+**Recommended for:**
+- Systems with large collections containing millions of items
+- Systems requiring high-performance temporal searching
+
+**Pros:**
+- Multiple times faster queries with datetime filter
+- Reduced database load - only relevant indexes are searched
+
+**Cons:**
+- Slightly longer item indexing time (automatic index management)
+- Greater management complexity
+
+### Configuration
+
+#### Enabling Datetime-Based Indexing
+
+Enable datetime-based indexing by setting the following environment variable:
+
+```bash
+ENABLE_DATETIME_INDEX_FILTERING=true
+```
+
+### Related Configuration Variables
+
+| Variable | Description | Default | Example |
+|----------|-------------|---------|---------|
+| `ENABLE_DATETIME_INDEX_FILTERING` | Enables time-based index partitioning | `false` | `true` |
+| `DATETIME_INDEX_MAX_SIZE_GB` | Maximum size limit for datetime indexes (GB) - note: add +20% to target size due to ES/OS compression | `25` | `50` |
+| `STAC_ITEMS_INDEX_PREFIX` | Prefix for item indexes | `items_` | `stac_items_` |
+
+## How Datetime-Based Indexing Works
+
+### Index and Alias Naming Convention
+
+The system uses a precise naming convention:
+
+**Physical indexes:**
+```
+{ITEMS_INDEX_PREFIX}{collection-id}_{uuid4}
+```
+
+**Aliases:**
+```
+{ITEMS_INDEX_PREFIX}{collection-id}                                  # Main collection alias
+{ITEMS_INDEX_PREFIX}{collection-id}_{start-datetime}                 # Temporal alias
+{ITEMS_INDEX_PREFIX}{collection-id}_{start-datetime}_{end-datetime}  # Closed index alias
+```
+
+**Example:**
+
+*Physical indexes:*
+- `items_sentinel-2-l2a_a1b2c3d4-e5f6-7890-abcd-ef1234567890`
+
+*Aliases:*
+- `items_sentinel-2-l2a` - main collection alias
+- `items_sentinel-2-l2a_2024-01-01` - active alias from January 1, 2024
+- `items_sentinel-2-l2a_2024-01-01_2024-03-15` - closed index alias (reached size limit)
+
+### Index Size Management
+
+**Important - Data Compression:** Elasticsearch and OpenSearch automatically compress data. The configured `DATETIME_INDEX_MAX_SIZE_GB` limit refers to the compressed size on disk. It is recommended to add +20% to the target size to account for compression overhead and metadata.
+
 ## Interacting with the API
 
 - **Creating a Collection**:
@@ -559,4 +635,3 @@ You can customize additional settings in your `.env` file:
 - Ensures fair resource allocation among all clients
 
 - **Examples**: Implementation examples are available in the [examples/rate_limit](examples/rate_limit) directory.
-
--- stac_fastapi_elasticsearch-6.1.0/README.md
+++ stac_fastapi_elasticsearch-6.2.0/README.md
@@ -85,6 +85,7 @@ This project is built on the following technologies: STAC, stac-fastapi, FastAPI
 - [Auth](#auth)
 - [Aggregation](#aggregation)
 - [Rate Limiting](#rate-limiting)
+- [Datetime-Based Index Management](#datetime-based-index-management)
 
 ## Documentation & Resources
 
@@ -230,6 +231,81 @@ You can customize additional settings in your `.env` file:
 > [!NOTE]
 > The variables `ES_HOST`, `ES_PORT`, `ES_USE_SSL`, `ES_VERIFY_CERTS` and `ES_TIMEOUT` apply to both Elasticsearch and OpenSearch backends, so there is no need to rename the key names to `OS_` even if you're using OpenSearch.
 
+## Datetime-Based Index Management
+
+### Overview
+
+SFEOS supports two indexing strategies for managing STAC items:
+
+1. **Simple Indexing** (default) - One index per collection
+2. **Datetime-Based Indexing** - Time-partitioned indexes with automatic management
+
+The datetime-based indexing strategy is particularly useful for large temporal datasets. When a user provides a datetime parameter in a query, the system knows exactly which index to search, providing **multiple times faster searches** and significantly **reducing database load**.
+
+### When to Use
+
+**Recommended for:**
+- Systems with large collections containing millions of items
+- Systems requiring high-performance temporal searching
+
+**Pros:**
+- Multiple times faster queries with datetime filter
+- Reduced database load - only relevant indexes are searched
+
+**Cons:**
+- Slightly longer item indexing time (automatic index management)
+- Greater management complexity
+
+### Configuration
+
+#### Enabling Datetime-Based Indexing
+
+Enable datetime-based indexing by setting the following environment variable:
+
+```bash
+ENABLE_DATETIME_INDEX_FILTERING=true
+```
+
+### Related Configuration Variables
+
+| Variable | Description | Default | Example |
+|----------|-------------|---------|---------|
+| `ENABLE_DATETIME_INDEX_FILTERING` | Enables time-based index partitioning | `false` | `true` |
+| `DATETIME_INDEX_MAX_SIZE_GB` | Maximum size limit for datetime indexes (GB) - note: add +20% to target size due to ES/OS compression | `25` | `50` |
+| `STAC_ITEMS_INDEX_PREFIX` | Prefix for item indexes | `items_` | `stac_items_` |
+
+## How Datetime-Based Indexing Works
+
+### Index and Alias Naming Convention
+
+The system uses a precise naming convention:
+
+**Physical indexes:**
+```
+{ITEMS_INDEX_PREFIX}{collection-id}_{uuid4}
+```
+
+**Aliases:**
+```
+{ITEMS_INDEX_PREFIX}{collection-id}                                  # Main collection alias
+{ITEMS_INDEX_PREFIX}{collection-id}_{start-datetime}                 # Temporal alias
+{ITEMS_INDEX_PREFIX}{collection-id}_{start-datetime}_{end-datetime}  # Closed index alias
+```
+
+**Example:**
+
+*Physical indexes:*
+- `items_sentinel-2-l2a_a1b2c3d4-e5f6-7890-abcd-ef1234567890`
+
+*Aliases:*
+- `items_sentinel-2-l2a` - main collection alias
+- `items_sentinel-2-l2a_2024-01-01` - active alias from January 1, 2024
+- `items_sentinel-2-l2a_2024-01-01_2024-03-15` - closed index alias (reached size limit)
+
+### Index Size Management
+
+**Important - Data Compression:** Elasticsearch and OpenSearch automatically compress data. The configured `DATETIME_INDEX_MAX_SIZE_GB` limit refers to the compressed size on disk. It is recommended to add +20% to the target size to account for compression overhead and metadata.
+
 ## Interacting with the API
 
 - **Creating a Collection**:
@@ -538,4 +614,3 @@ You can customize additional settings in your `.env` file:
 - Ensures fair resource allocation among all clients
 
 - **Examples**: Implementation examples are available in the [examples/rate_limit](examples/rate_limit) directory.
-
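The naming convention documented in the section above can be sketched as a small helper. `make_physical_index` and `make_aliases` are hypothetical names for illustration, not SFEOS functions:

```python
import uuid

ITEMS_INDEX_PREFIX = "items_"  # the documented STAC_ITEMS_INDEX_PREFIX default


def make_physical_index(collection_id: str) -> str:
    """Physical index: {prefix}{collection-id}_{uuid4}."""
    return f"{ITEMS_INDEX_PREFIX}{collection_id}_{uuid.uuid4()}"


def make_aliases(collection_id: str, start: str, end: str = "") -> list:
    """Main alias, active temporal alias, and the closed start/end alias."""
    aliases = [
        f"{ITEMS_INDEX_PREFIX}{collection_id}",          # main collection alias
        f"{ITEMS_INDEX_PREFIX}{collection_id}_{start}",  # active temporal alias
    ]
    if end:  # added once the index hits DATETIME_INDEX_MAX_SIZE_GB and is closed
        aliases.append(f"{ITEMS_INDEX_PREFIX}{collection_id}_{start}_{end}")
    return aliases
```

Because the uuid suffix makes physical names opaque, all reads and writes go through the aliases; the physical name only matters to the index-management code.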
--- stac_fastapi_elasticsearch-6.1.0/setup.py
+++ stac_fastapi_elasticsearch-6.2.0/setup.py
@@ -6,8 +6,8 @@ with open("README.md") as f:
     desc = f.read()
 
 install_requires = [
-    "stac-fastapi-core==6.1.0",
-    "sfeos-helpers==6.1.0",
+    "stac-fastapi-core==6.2.0",
+    "sfeos-helpers==6.2.0",
     "elasticsearch[async]~=8.18.0",
     "uvicorn~=0.23.0",
     "starlette>=0.35.0,<0.36.0",
--- stac_fastapi_elasticsearch-6.1.0/stac_fastapi/elasticsearch/app.py
+++ stac_fastapi_elasticsearch-6.2.0/stac_fastapi/elasticsearch/app.py
@@ -117,7 +117,7 @@ post_request_model = create_post_request_model(search_extensions)
 app_config = {
     "title": os.getenv("STAC_FASTAPI_TITLE", "stac-fastapi-elasticsearch"),
    "description": os.getenv("STAC_FASTAPI_DESCRIPTION", "stac-fastapi-elasticsearch"),
-    "api_version": os.getenv("STAC_FASTAPI_VERSION", "6.1.0"),
+    "api_version": os.getenv("STAC_FASTAPI_VERSION", "6.2.0"),
     "settings": settings,
     "extensions": extensions,
     "client": CoreClient(
--- stac_fastapi_elasticsearch-6.1.0/stac_fastapi/elasticsearch/database_logic.py
+++ stac_fastapi_elasticsearch-6.2.0/stac_fastapi/elasticsearch/database_logic.py
@@ -4,7 +4,7 @@ import asyncio
 import logging
 from base64 import urlsafe_b64decode, urlsafe_b64encode
 from copy import deepcopy
-from typing import Any, Dict, Iterable, List, Optional, Tuple, Type
+from typing import Any, Dict, Iterable, List, Optional, Tuple, Type
 
 import attr
 import elasticsearch.helpers as helpers
@@ -27,7 +27,7 @@ from stac_fastapi.extensions.core.transaction.request import (
     PartialItem,
     PatchOperation,
 )
-from stac_fastapi.sfeos_helpers import filter
+from stac_fastapi.sfeos_helpers import filter as filter_module
 from stac_fastapi.sfeos_helpers.database import (
     apply_free_text_filter_shared,
     apply_intersects_filter_shared,
@@ -36,7 +36,6 @@ from stac_fastapi.sfeos_helpers.database import (
     get_queryables_mapping_shared,
     index_alias_by_collection_id,
     index_by_collection_id,
-    indices,
     mk_actions,
     mk_item_id,
     populate_sort_shared,
@@ -59,9 +58,14 @@ from stac_fastapi.sfeos_helpers.mappings import (
     ITEMS_INDEX_PREFIX,
     Geometry,
 )
+from stac_fastapi.sfeos_helpers.search_engine import (
+    BaseIndexInserter,
+    BaseIndexSelector,
+    IndexInsertionFactory,
+    IndexSelectorFactory,
+)
 from stac_fastapi.types.errors import ConflictError, NotFoundError
 from stac_fastapi.types.links import resolve_links
-from stac_fastapi.types.rfc3339 import DateTimeType
 from stac_fastapi.types.stac import Collection, Item
 
 logger = logging.getLogger(__name__)
@@ -139,6 +143,8 @@ class DatabaseLogic(BaseDatabaseLogic):
     sync_settings: SyncElasticsearchSettings = attr.ib(
         factory=SyncElasticsearchSettings
     )
+    async_index_selector: BaseIndexSelector = attr.ib(init=False)
+    async_index_inserter: BaseIndexInserter = attr.ib(init=False)
 
     client = attr.ib(init=False)
     sync_client = attr.ib(init=False)
@@ -147,6 +153,10 @@ class DatabaseLogic(BaseDatabaseLogic):
         """Initialize clients after the class is instantiated."""
         self.client = self.async_settings.create_client
         self.sync_client = self.sync_settings.create_client
+        self.async_index_inserter = IndexInsertionFactory.create_insertion_strategy(
+            self.client
+        )
+        self.async_index_selector = IndexSelectorFactory.create_selector(self.client)
 
     item_serializer: Type[ItemSerializer] = attr.ib(default=ItemSerializer)
     collection_serializer: Type[CollectionSerializer] = attr.ib(
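`IndexInsertionFactory.create_insertion_strategy` and `IndexSelectorFactory.create_selector` are not shown in this diff; presumably they pick a simple or datetime-partitioned strategy based on `ENABLE_DATETIME_INDEX_FILTERING`. A minimal sketch of that factory idea, with invented class names:

```python
class SimpleIndexSelector:
    """One alias per collection; the datetime range is ignored."""

    def select_indexes(self, collection_ids, datetime_search=None):
        return ",".join(f"items_{c}" for c in (collection_ids or ["*"]))


class DatetimeIndexSelector:
    """Would resolve only the time-partitioned aliases overlapping the range."""

    def select_indexes(self, collection_ids, datetime_search=None):
        raise NotImplementedError("left out of this sketch")


def create_selector(datetime_filtering_enabled: bool):
    """Mirror of the factory idea: the strategy is chosen once, at client setup."""
    return DatetimeIndexSelector() if datetime_filtering_enabled else SimpleIndexSelector()
```

Choosing the strategy once in `__attrs_post_init__` keeps the per-request code path free of feature-flag checks.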
@@ -216,15 +226,23 @@ class DatabaseLogic(BaseDatabaseLogic):
         with the index for the Collection as the target index and the combined `mk_item_id` as the document id.
         """
         try:
-
+            response = await self.client.search(
                 index=index_alias_by_collection_id(collection_id),
-
+                body={
+                    "query": {"term": {"_id": mk_item_id(item_id, collection_id)}},
+                    "size": 1,
+                },
             )
+            if response["hits"]["total"]["value"] == 0:
+                raise NotFoundError(
+                    f"Item {item_id} does not exist inside Collection {collection_id}"
+                )
+
+            return response["hits"]["hits"][0]["_source"]
         except ESNotFoundError:
             raise NotFoundError(
                 f"Item {item_id} does not exist inside Collection {collection_id}"
             )
-        return item["_source"]
 
     async def get_queryables_mapping(self, collection_id: str = "*") -> dict:
         """Retrieve mapping of Queryables for search.
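With time-partitioned indexes, an item's physical index is no longer derivable from its ids, which is why the lookup above switches from a direct document get to a search against the collection alias. A sketch of the query body and response handling; the `{item_id}|{collection_id}` id layout here is an assumption for illustration:

```python
def mk_item_id(item_id: str, collection_id: str) -> str:
    # assumed composite document-id layout
    return f"{item_id}|{collection_id}"


def one_item_query(item_id: str, collection_id: str) -> dict:
    """Term query on _id, capped at a single hit."""
    return {"query": {"term": {"_id": mk_item_id(item_id, collection_id)}}, "size": 1}


def extract_item(response: dict) -> dict:
    """Raise when the alias holds no matching document, else return _source."""
    if response["hits"]["total"]["value"] == 0:
        raise LookupError("item not found")
    return response["hits"]["hits"][0]["_source"]
```

The search fans out across every physical index behind the alias, so the caller never needs to know which partition holds the document.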
@@ -260,31 +278,21 @@ class DatabaseLogic(BaseDatabaseLogic):
 
     @staticmethod
     def apply_datetime_filter(
-        search: Search,
-    ) -> Search:
+        search: Search, datetime: Optional[str]
+    ) -> Tuple[Search, Dict[str, Optional[str]]]:
         """Apply a filter to search on datetime, start_datetime, and end_datetime fields.
 
         Args:
             search: The search object to filter.
-
-            - A single datetime string (e.g., "2023-01-01T12:00:00")
-            - A datetime range string (e.g., "2023-01-01/2023-12-31")
-            - A datetime object
-            - A tuple of (start_datetime, end_datetime)
+            datetime: Optional[str]
 
         Returns:
             The filtered search object.
         """
-
-        return search
+        datetime_search = return_date(datetime)
 
-
-
-        datetime_search = return_date(interval)
-        except (ValueError, TypeError) as e:
-            # Handle invalid interval formats if return_date fails
-            logger.error(f"Invalid interval format: {interval}, error: {e}")
-            return search
+        if not datetime_search:
+            return search, datetime_search
 
         if "eq" in datetime_search:
             # For exact matches, include:
@@ -351,7 +359,10 @@ class DatabaseLogic(BaseDatabaseLogic):
             ),
         ]
 
-        return
+        return (
+            search.query(Q("bool", should=should, minimum_should_match=1)),
+            datetime_search,
+        )
 
     @staticmethod
     def apply_bbox_filter(search: Search, bbox: List):
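`return_date` itself is not shown in the diff; judging from how `datetime_search` is used afterwards, it presumably normalizes the STAC `datetime` parameter into the `gte`/`lte`/`eq` dictionary that both this filter and the index selector consume. A hypothetical stand-in sketching that contract:

```python
def parse_datetime_interval(value):
    """Normalize a STAC datetime parameter into a gte/lte/eq dict.

    Hypothetical stand-in for the helpers' return_date; handles a single
    instant, an open or closed "start/end" interval, or no value at all.
    """
    if not value:
        return {}
    if "/" not in value:
        return {"eq": value}
    start, end = value.split("/", 1)
    out = {}
    if start and start != "..":
        out["gte"] = start
    if end and end != "..":
        out["lte"] = end
    return out
```

Returning an empty dict for the no-filter case is what lets the caller short-circuit with `if not datetime_search`.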
@@ -466,7 +477,7 @@ class DatabaseLogic(BaseDatabaseLogic):
             otherwise the original Search object.
         """
         if _filter is not None:
-            es_query =
+            es_query = filter_module.to_es(await self.get_queryables_mapping(), _filter)
             search = search.query(es_query)
 
         return search
@@ -493,6 +504,7 @@ class DatabaseLogic(BaseDatabaseLogic):
         token: Optional[str],
         sort: Optional[Dict[str, Dict[str, str]]],
         collection_ids: Optional[List[str]],
+        datetime_search: Dict[str, Optional[str]],
         ignore_unavailable: bool = True,
     ) -> Tuple[Iterable[Dict[str, Any]], Optional[int], Optional[str]]:
         """Execute a search query with limit and other optional parameters.
@@ -503,6 +515,7 @@ class DatabaseLogic(BaseDatabaseLogic):
             token (Optional[str]): The token used to return the next set of results.
             sort (Optional[Dict[str, Dict[str, str]]]): Specifies how the results should be sorted.
             collection_ids (Optional[List[str]]): The collection ids to search.
+            datetime_search (Dict[str, Optional[str]]): Datetime range used for index selection.
             ignore_unavailable (bool, optional): Whether to ignore unavailable collections. Defaults to True.
 
         Returns:
@@ -523,7 +536,9 @@ class DatabaseLogic(BaseDatabaseLogic):
 
         query = search.query.to_dict() if search.query else None
 
-        index_param =
+        index_param = await self.async_index_selector.select_indexes(
+            collection_ids, datetime_search
+        )
         if len(index_param) > ES_MAX_URL_LENGTH - 300:
             index_param = ITEM_INDICES
             query = add_collections_to_body(collection_ids, query)
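Conceptually, `select_indexes` only has to keep the partitions whose closed `[start, end]` window overlaps the requested range; ISO-8601 date strings make that a plain string comparison. A simplified, hypothetical sketch (the real selector resolves partitions from the alias metadata):

```python
# (alias, start, end) triples for the closed partitions of one collection
PARTITIONS = [
    ("items_c_2024-01-01_2024-03-15", "2024-01-01", "2024-03-15"),
    ("items_c_2024-03-16_2024-06-30", "2024-03-16", "2024-06-30"),
]


def overlaps(idx_start, idx_end, gte, lte):
    """Inclusive interval overlap; ISO dates compare correctly as strings."""
    if lte is not None and idx_start > lte:
        return False
    if gte is not None and idx_end < gte:
        return False
    return True


def select_indexes(partitions, datetime_search):
    """Comma-joined index list, as Elasticsearch accepts in the request URL."""
    if not datetime_search:
        return ",".join(alias for alias, _, _ in partitions)
    gte, lte = datetime_search.get("gte"), datetime_search.get("lte")
    return ",".join(a for a, s, e in partitions if overlaps(s, e, gte, lte))
```

This is also why the code falls back to `ITEM_INDICES` when the joined list would overflow the URL: the selection is passed as a path parameter.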
@@ -590,6 +605,7 @@ class DatabaseLogic(BaseDatabaseLogic):
         geometry_geohash_grid_precision: int,
         geometry_geotile_grid_precision: int,
         datetime_frequency_interval: str,
+        datetime_search,
         ignore_unavailable: Optional[bool] = True,
     ):
         """Return aggregations of STAC Items."""
@@ -625,7 +641,10 @@ class DatabaseLogic(BaseDatabaseLogic):
             if k in aggregations
         }
 
-        index_param =
+        index_param = await self.async_index_selector.select_indexes(
+            collection_ids, datetime_search
+        )
+
         search_task = asyncio.create_task(
             self.client.search(
                 index=index_param,
@@ -667,14 +686,21 @@ class DatabaseLogic(BaseDatabaseLogic):
 
         """
         await self.check_collection_exists(collection_id=item["collection"])
+        alias = index_alias_by_collection_id(item["collection"])
+        doc_id = mk_item_id(item["id"], item["collection"])
 
-        if not exist_ok
-
-
-
-
-
-
+        if not exist_ok:
+            alias_exists = await self.client.indices.exists_alias(name=alias)
+
+            if alias_exists:
+                alias_info = await self.client.indices.get_alias(name=alias)
+                indices = list(alias_info.keys())
+
+                for index in indices:
+                    if await self.client.exists(index=index, id=doc_id):
+                        raise ConflictError(
+                            f"Item {item['id']} in collection {item['collection']} already exists"
+                        )
 
         return self.item_serializer.stac_to_db(item, base_url)
@@ -805,7 +831,6 @@ class DatabaseLogic(BaseDatabaseLogic):
         # Extract item and collection IDs
         item_id = item["id"]
         collection_id = item["collection"]
-
         # Ensure kwargs is a dictionary
         kwargs = kwargs or {}
 
@@ -823,9 +848,12 @@ class DatabaseLogic(BaseDatabaseLogic):
             item=item, base_url=base_url, exist_ok=exist_ok
         )
 
+        target_index = await self.async_index_inserter.get_target_index(
+            collection_id, item
+        )
         # Index the item in the database
         await self.client.index(
-            index=
+            index=target_index,
             id=mk_item_id(item_id, collection_id),
             document=item,
             refresh=refresh,
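The patch path below first locates the document via the alias, then updates it in place. The relevant detail is that a search hit's `_index` field names the physical partition, so the follow-up `update` can target it directly. A sketch of that extraction:

```python
def pick_update_index(search_response: dict) -> str:
    """A hit's _index names the physical (possibly time-partitioned) index
    the document lives in; update calls must target it, not the alias."""
    hits = search_response["hits"]["hits"]
    if not hits:
        raise LookupError("item not found")
    return hits[0]["_index"]
```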
@@ -904,13 +932,28 @@ class DatabaseLogic(BaseDatabaseLogic):
         script = operations_to_script(script_operations)
 
         try:
-            await self.client.
+            search_response = await self.client.search(
                 index=index_alias_by_collection_id(collection_id),
+                body={
+                    "query": {"term": {"_id": mk_item_id(item_id, collection_id)}},
+                    "size": 1,
+                },
+            )
+            if search_response["hits"]["total"]["value"] == 0:
+                raise NotFoundError(
+                    f"Item {item_id} does not exist inside Collection {collection_id}"
+                )
+            document_index = search_response["hits"]["hits"][0]["_index"]
+            await self.client.update(
+                index=document_index,
                 id=mk_item_id(item_id, collection_id),
                 script=script,
                 refresh=True,
             )
-
+        except ESNotFoundError:
+            raise NotFoundError(
+                f"Item {item_id} does not exist inside Collection {collection_id}"
+            )
         except BadRequestError as exc:
             raise HTTPException(
                 status_code=400, detail=exc.info["error"]["caused_by"]
@@ -921,7 +964,9 @@ class DatabaseLogic(BaseDatabaseLogic):
         if new_collection_id:
             await self.client.reindex(
                 body={
-                    "dest": {
+                    "dest": {
+                        "index": f"{ITEMS_INDEX_PREFIX}{new_collection_id}"
+                    },  # noqa
                     "source": {
                         "index": f"{ITEMS_INDEX_PREFIX}{collection_id}",
                         "query": {"term": {"id": {"value": item_id}}},
@@ -929,8 +974,8 @@ class DatabaseLogic(BaseDatabaseLogic):
                     "script": {
                         "lang": "painless",
                         "source": (
-                            f"""ctx._id = ctx._id.replace('{collection_id}', '{new_collection_id}');"""
-                            f"""ctx._source.collection = '{new_collection_id}';"""
+                            f"""ctx._id = ctx._id.replace('{collection_id}', '{new_collection_id}');"""  # noqa
+                            f"""ctx._source.collection = '{new_collection_id}';"""  # noqa
                         ),
                     },
                 },
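The `reindex` call in the hunk above can be read as building the following request body. This sketch mirrors the diff, with `items_` standing in for `ITEMS_INDEX_PREFIX`:

```python
def reindex_item_body(item_id, collection_id, new_collection_id, prefix="items_"):
    """Copy one item into the new collection's alias, rewriting both the
    composite _id and the item's collection field via a painless script."""
    return {
        "dest": {"index": f"{prefix}{new_collection_id}"},
        "source": {
            "index": f"{prefix}{collection_id}",
            "query": {"term": {"id": {"value": item_id}}},
        },
        "script": {
            "lang": "painless",
            "source": (
                f"ctx._id = ctx._id.replace('{collection_id}', '{new_collection_id}');"
                f"ctx._source.collection = '{new_collection_id}';"
            ),
        },
    }
```

Rewriting `ctx._id` inside the script is what keeps the composite document id consistent with the item's new collection.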
@@ -990,9 +1035,9 @@ class DatabaseLogic(BaseDatabaseLogic):
 
         try:
             # Perform the delete operation
-            await self.client.
+            await self.client.delete_by_query(
                 index=index_alias_by_collection_id(collection_id),
-
+                body={"query": {"term": {"_id": mk_item_id(item_id, collection_id)}}},
                 refresh=refresh,
             )
         except ESNotFoundError:
@@ -1092,8 +1137,10 @@ class DatabaseLogic(BaseDatabaseLogic):
             refresh=refresh,
         )
 
-
-
+        if self.async_index_inserter.should_create_collection_index():
+            await self.async_index_inserter.create_simple_index(
+                self.client, collection_id
+            )
 
     async def find_collection(self, collection_id: str) -> Collection:
         """Find and return a collection from the database.
@@ -1367,9 +1414,12 @@ class DatabaseLogic(BaseDatabaseLogic):
 
         # Perform the bulk insert
         raise_on_error = self.async_settings.raise_on_bulk_error
+        actions = await self.async_index_inserter.prepare_bulk_actions(
+            collection_id, processed_items
+        )
         success, errors = await helpers.async_bulk(
             self.client,
-
+            actions,
             refresh=refresh,
             raise_on_error=raise_on_error,
         )
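`prepare_bulk_actions` is not shown in the diff; presumably it routes each item to the partition chosen by the inserter rather than to the single collection alias. A minimal sketch of what such actions look like for `helpers.async_bulk`; the target index and the id layout are assumptions:

```python
def prepare_bulk_actions(target_index: str, collection_id: str, items: list) -> list:
    """One index action per item, all routed to the chosen partition."""
    return [
        {
            "_index": target_index,
            "_id": f"{item['id']}|{collection_id}",  # assumed mk_item_id layout
            "_source": item,
        }
        for item in items
    ]
```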
--- stac_fastapi_elasticsearch-6.1.0/stac_fastapi/elasticsearch/version.py
+++ stac_fastapi_elasticsearch-6.2.0/stac_fastapi/elasticsearch/version.py
@@ -1,2 +1,2 @@
 """library version."""
-__version__ = "6.1.0"
+__version__ = "6.2.0"
--- stac_fastapi_elasticsearch-6.1.0/stac_fastapi_elasticsearch.egg-info/PKG-INFO
+++ stac_fastapi_elasticsearch-6.2.0/stac_fastapi_elasticsearch.egg-info/PKG-INFO
@@ -1,6 +1,6 @@
 Metadata-Version: 2.1
 Name: stac-fastapi-elasticsearch
-Version: 6.1.0
+Version: 6.2.0
 Summary: An implementation of STAC API based on the FastAPI framework with both Elasticsearch and Opensearch.
 Home-page: https://github.com/stac-utils/stac-fastapi-elasticsearch-opensearch
 License: MIT
@@ -106,6 +106,7 @@ This project is built on the following technologies: STAC, stac-fastapi, FastAPI
 - [Auth](#auth)
 - [Aggregation](#aggregation)
 - [Rate Limiting](#rate-limiting)
+- [Datetime-Based Index Management](#datetime-based-index-management)
 
 ## Documentation & Resources
 
@@ -251,6 +252,81 @@ You can customize additional settings in your `.env` file:
 > [!NOTE]
 > The variables `ES_HOST`, `ES_PORT`, `ES_USE_SSL`, `ES_VERIFY_CERTS` and `ES_TIMEOUT` apply to both Elasticsearch and OpenSearch backends, so there is no need to rename the key names to `OS_` even if you're using OpenSearch.
 
+## Datetime-Based Index Management
+
+### Overview
+
+SFEOS supports two indexing strategies for managing STAC items:
+
+1. **Simple Indexing** (default) - One index per collection
+2. **Datetime-Based Indexing** - Time-partitioned indexes with automatic management
+
+The datetime-based indexing strategy is particularly useful for large temporal datasets. When a user provides a datetime parameter in a query, the system knows exactly which index to search, providing **multiple times faster searches** and significantly **reducing database load**.
+
+### When to Use
+
+**Recommended for:**
+- Systems with large collections containing millions of items
+- Systems requiring high-performance temporal searching
+
+**Pros:**
+- Multiple times faster queries with datetime filter
+- Reduced database load - only relevant indexes are searched
+
+**Cons:**
+- Slightly longer item indexing time (automatic index management)
+- Greater management complexity
+
+### Configuration
+
+#### Enabling Datetime-Based Indexing
+
+Enable datetime-based indexing by setting the following environment variable:
+
+```bash
+ENABLE_DATETIME_INDEX_FILTERING=true
+```
+
+### Related Configuration Variables
+
+| Variable | Description | Default | Example |
+|----------|-------------|---------|---------|
+| `ENABLE_DATETIME_INDEX_FILTERING` | Enables time-based index partitioning | `false` | `true` |
+| `DATETIME_INDEX_MAX_SIZE_GB` | Maximum size limit for datetime indexes (GB) - note: add +20% to target size due to ES/OS compression | `25` | `50` |
+| `STAC_ITEMS_INDEX_PREFIX` | Prefix for item indexes | `items_` | `stac_items_` |
+
+## How Datetime-Based Indexing Works
+
+### Index and Alias Naming Convention
+
+The system uses a precise naming convention:
+
+**Physical indexes:**
+```
+{ITEMS_INDEX_PREFIX}{collection-id}_{uuid4}
+```
+
+**Aliases:**
+```
+{ITEMS_INDEX_PREFIX}{collection-id}                                  # Main collection alias
+{ITEMS_INDEX_PREFIX}{collection-id}_{start-datetime}                 # Temporal alias
+{ITEMS_INDEX_PREFIX}{collection-id}_{start-datetime}_{end-datetime}  # Closed index alias
+```
+
+**Example:**
+
+*Physical indexes:*
+- `items_sentinel-2-l2a_a1b2c3d4-e5f6-7890-abcd-ef1234567890`
+
+*Aliases:*
+- `items_sentinel-2-l2a` - main collection alias
+- `items_sentinel-2-l2a_2024-01-01` - active alias from January 1, 2024
+- `items_sentinel-2-l2a_2024-01-01_2024-03-15` - closed index alias (reached size limit)
+
+### Index Size Management
+
+**Important - Data Compression:** Elasticsearch and OpenSearch automatically compress data. The configured `DATETIME_INDEX_MAX_SIZE_GB` limit refers to the compressed size on disk. It is recommended to add +20% to the target size to account for compression overhead and metadata.
+
 ## Interacting with the API
 
 - **Creating a Collection**:
@@ -559,4 +635,3 @@ You can customize additional settings in your `.env` file:
 - Ensures fair resource allocation among all clients
 
 - **Examples**: Implementation examples are available in the [examples/rate_limit](examples/rate_limit) directory.
-