# upstream-sdk 1.0.0 (tar.gz)

## LICENSE

MIT License

Copyright (c) 2024 In-For-Disaster-Analytics Team

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
## PKG-INFO

```
Metadata-Version: 2.4
Name: upstream-sdk
Version: 1.0.0
Summary: Python SDK for Upstream environmental sensor data platform and CKAN integration
Author-email: In-For-Disaster-Analytics Team <info@tacc.utexas.edu>
Maintainer-email: In-For-Disaster-Analytics Team <info@tacc.utexas.edu>
License: MIT
Project-URL: Homepage, https://github.com/In-For-Disaster-Analytics/upstream-python-sdk
Project-URL: Documentation, https://upstream-python-sdk.readthedocs.io
Project-URL: Repository, https://github.com/In-For-Disaster-Analytics/upstream-python-sdk
Project-URL: Issues, https://github.com/In-For-Disaster-Analytics/upstream-python-sdk/issues
Project-URL: Changelog, https://github.com/In-For-Disaster-Analytics/upstream-python-sdk/blob/main/CHANGELOG.md
Keywords: environmental,sensors,data,api,ckan,upstream
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Scientific/Engineering :: Atmospheric Science
Classifier: Topic :: Scientific/Engineering :: GIS
Classifier: Topic :: Internet :: WWW/HTTP :: Dynamic Content
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: requests>=2.25.0
Requires-Dist: pyyaml>=6.0
Requires-Dist: python-dateutil>=2.8.0
Requires-Dist: typing-extensions>=4.0.0; python_version < "3.10"
Requires-Dist: pydantic>=2.0.0
Requires-Dist: urllib3>=1.25.3
Requires-Dist: upstream-api-client>=0.1.4
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
Requires-Dist: pytest-mock>=3.6.0; extra == "dev"
Requires-Dist: pytest-asyncio>=0.21.0; extra == "dev"
Requires-Dist: black>=22.0.0; extra == "dev"
Requires-Dist: flake8>=5.0.0; extra == "dev"
Requires-Dist: mypy>=1.0.0; extra == "dev"
Requires-Dist: pre-commit>=2.20.0; extra == "dev"
Requires-Dist: tox>=4.0.0; extra == "dev"
Requires-Dist: sphinx>=5.0.0; extra == "dev"
Requires-Dist: sphinx-rtd-theme>=1.2.0; extra == "dev"
Requires-Dist: twine>=4.0.0; extra == "dev"
Requires-Dist: build>=0.8.0; extra == "dev"
Provides-Extra: data
Requires-Dist: pandas>=1.3.0; extra == "data"
Requires-Dist: numpy>=1.20.0; extra == "data"
Provides-Extra: examples
Requires-Dist: jupyter>=1.0.0; extra == "examples"
Requires-Dist: matplotlib>=3.5.0; extra == "examples"
Requires-Dist: seaborn>=0.11.0; extra == "examples"
Provides-Extra: all
Requires-Dist: upstream-sdk[data,dev,examples]; extra == "all"
Dynamic: license-file
```
# Upstream Python SDK

A Python SDK for seamless integration with the Upstream environmental sensor data platform and CKAN data portal.

## Overview

The Upstream Python SDK provides a standardized, production-ready toolkit for environmental researchers and organizations to:

- **Authenticate** with the Upstream API and CKAN data portals
- **Manage** environmental monitoring campaigns and stations
- **Upload** sensor data efficiently (with automatic chunking for large datasets)
- **Publish** datasets automatically to CKAN for discoverability
- **Automate** data pipelines for continuous sensor networks

## Key Features

### 🔐 **Unified Authentication**

- Seamless integration with the Upstream API and Tapis/CKAN
- Automatic token management and refresh
- Secure credential handling
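Token refresh is handled for you by the SDK. As an illustration of the idea only (not the SDK's actual `TokenManager`), a minimal token cache might refresh shortly before expiry; `fetch_token` here is a hypothetical callable supplied by the user:

```python
import time


class SimpleTokenManager:
    """Illustrative sketch of token caching; the SDK's AuthManager and
    TokenManager do this internally. `fetch_token` is a hypothetical
    callable returning (token, lifetime_seconds)."""

    def __init__(self, fetch_token, refresh_margin=60):
        self._fetch_token = fetch_token
        self._refresh_margin = refresh_margin
        self._token = None
        self._expires_at = 0.0

    def get_token(self):
        # Refresh when the token is missing or within the safety margin of expiry
        if self._token is None or time.time() >= self._expires_at - self._refresh_margin:
            self._token, lifetime = self._fetch_token()
            self._expires_at = time.time() + lifetime
        return self._token
```

The refresh margin avoids using a token that would expire mid-request.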
### 📊 **Complete Data Workflow**

```python
from upstream import UpstreamClient

# Initialize client
client = UpstreamClient(username="researcher", password="password")

# Create campaign and station
from upstream_api_client.models import CampaignsIn, StationCreate
from datetime import datetime, timedelta

campaign_data = CampaignsIn(
    name="Hurricane Monitoring 2024",
    description="Hurricane monitoring campaign",
    contact_name="Dr. Jane Smith",
    contact_email="jane.smith@university.edu",
    allocation="TACC",
    start_date=datetime.now(),
    end_date=datetime.now() + timedelta(days=365)
)
campaign = client.create_campaign(campaign_data)

station_data = StationCreate(
    name="Galveston Pier",
    description="Hurricane monitoring station at Galveston Pier",
    contact_name="Dr. Jane Smith",
    contact_email="jane.smith@university.edu",
    start_date=datetime.now(),
    active=True
)
station = client.create_station(campaign.id, station_data)

# Upload sensor data
result = client.upload_csv_data(
    campaign_id=campaign.id,
    station_id=station.id,
    sensors_file="sensors.csv",
    measurements_file="measurements.csv"
)

# Automatically creates a discoverable CKAN dataset
print(f"Data published at: {result.ckan_url}")
```

### 🚀 **Production-Ready Features**

- **Automatic chunking** for large datasets (>50MB)
- **Retry mechanisms** with exponential backoff
- **Comprehensive error handling** with detailed messages
- **Progress tracking** for long-running uploads
- **Extensive logging** for debugging and monitoring
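The retry behavior above can be sketched as a small helper. This is an illustrative stand-in, not the SDK's internal mechanism; the injectable `sleep` parameter is an assumption added for testability:

```python
import time


def with_retries(operation, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Illustrative retry-with-exponential-backoff helper (the SDK
    retries internally). Delays grow as base_delay * 2**attempt."""
    for attempt in range(attempts):
        try:
            return operation()
        except Exception:
            if attempt == attempts - 1:
                raise  # exhausted all attempts; surface the error
            sleep(base_delay * (2 ** attempt))
```

For a transient failure that succeeds on the third try, the helper sleeps 1s, then 2s, before returning the result.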
### 🔄 **Automation-Friendly**

Perfect for automated sensor networks:

```python
# Scheduled data upload every 6 hours
def automated_upload():
    # Collect sensor readings and save them to CSV files
    sensors_file, measurements_file = collect_sensor_readings()
    client.upload_csv_data(
        campaign_id=CAMPAIGN_ID,
        station_id=STATION_ID,
        sensors_file=sensors_file,
        measurements_file=measurements_file
    )
```

## Installation

```bash
pip install upstream-sdk
```

For development:

```bash
pip install "upstream-sdk[dev]"
```

## Quick Start

### 1. Basic Setup

```python
from upstream import UpstreamClient

# Initialize with credentials
client = UpstreamClient(
    username="your_username",
    password="your_password",
    base_url="https://upstream-dso.tacc.utexas.edu/dev"
)
```

### 2. Create Campaign

```python
from upstream_api_client.models import CampaignsIn
from datetime import datetime, timedelta

campaign_data = CampaignsIn(
    name="Air Quality Monitoring 2024",
    description="Urban air quality sensor network deployment",
    contact_name="Dr. Jane Smith",
    contact_email="jane.smith@university.edu",
    allocation="TACC",
    start_date=datetime.now(),
    end_date=datetime.now() + timedelta(days=365)
)
campaign = client.create_campaign(campaign_data)
```

### 3. Register Monitoring Station

```python
from upstream_api_client.models import StationCreate
from datetime import datetime

station_data = StationCreate(
    name="Downtown Monitor",
    description="City center air quality station",
    contact_name="Dr. Jane Smith",
    contact_email="jane.smith@university.edu",
    start_date=datetime.now(),
    active=True
)
station = client.create_station(campaign.id, station_data)
```

### 4. Upload Sensor Data

```python
# Upload from CSV files
result = client.upload_csv_data(
    campaign_id=campaign.id,
    station_id=station.id,
    sensors_file="path/to/sensors.csv",
    measurements_file="path/to/measurements.csv"
)

print(f"Uploaded {result.sensors_processed} sensors")
print(f"Added {result.measurements_added} measurements")
```

## Data Format Requirements

### Sensors CSV Format

```csv
alias,variablename,units,postprocess,postprocessscript
temp_01,Air Temperature,°C,,
humidity_01,Relative Humidity,%,,
pm25_01,PM2.5 Concentration,μg/m³,,
```

### Measurements CSV Format

```csv
collectiontime,Lat_deg,Lon_deg,temp_01,humidity_01,pm25_01
2024-01-15T10:30:00Z,30.2672,-97.7431,23.5,65.2,12.8
2024-01-15T10:31:00Z,30.2672,-97.7431,23.7,64.8,13.1
2024-01-15T10:32:00Z,30.2672,-97.7431,23.9,64.5,12.9
```
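Before uploading, it can be useful to sanity-check a measurements file against this format. The SDK exposes `validate_files` for that; the sketch below is a simplified, stand-alone approximation of such a check, not the SDK's actual validator:

```python
import csv
import io
from datetime import datetime

# Columns every measurements CSV must carry (per the format above)
REQUIRED_MEASUREMENT_COLUMNS = {"collectiontime", "Lat_deg", "Lon_deg"}


def check_measurements_csv(text):
    """Return (ok, message): required columns present and ISO 8601 timestamps."""
    reader = csv.DictReader(io.StringIO(text))
    missing = REQUIRED_MEASUREMENT_COLUMNS - set(reader.fieldnames or [])
    if missing:
        return False, f"missing columns: {sorted(missing)}"
    for row in reader:
        # fromisoformat on Python < 3.11 rejects a trailing "Z", so normalize it
        datetime.fromisoformat(row["collectiontime"].replace("Z", "+00:00"))
    return True, "ok"
```

Running this over the sample above succeeds; a file lacking `collectiontime` is rejected with the missing column named.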
## Advanced Usage

### Automated Pipeline Example

```python
import logging
import time

import schedule

from upstream import UpstreamClient

logger = logging.getLogger(__name__)
client = UpstreamClient.from_config("config.yaml")

def hourly_data_upload():
    try:
        # Collect data from sensors into CSV files
        sensors_file, measurements_file = collect_from_weather_station()

        # Upload to Upstream
        result = client.upload_csv_data(
            campaign_id=CAMPAIGN_ID,
            station_id=STATION_ID,
            sensors_file=sensors_file,
            measurements_file=measurements_file
        )

        logger.info(
            f"Successfully uploaded {result.sensors_processed} sensors "
            f"and {result.measurements_added} measurements"
        )

    except Exception as e:
        logger.error(f"Upload failed: {e}")
        # Implement your error handling/alerting

# Schedule uploads every hour and keep the scheduler running
schedule.every().hour.do(hourly_data_upload)
while True:
    schedule.run_pending()
    time.sleep(60)
```

### Large Dataset Handling

```python
# For large files, use chunked upload
result = client.upload_chunked_csv_data(
    campaign_id=campaign.id,
    station_id=station.id,
    sensors_file="sensors.csv",
    measurements_file="large_dataset.csv",  # e.g. a 500MB file
    chunk_size=10000  # rows per chunk
)
```
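Conceptually, chunked upload splits the measurements into fixed-size batches of rows. A minimal stand-alone sketch of that splitting (an illustration, not the SDK's `ChunkManager` itself):

```python
import csv
import io


def iter_measurement_chunks(text, chunk_size):
    """Yield lists of measurement row dicts, chunk_size rows per batch,
    so each batch can be uploaded as its own request."""
    reader = csv.DictReader(io.StringIO(text))
    chunk = []
    for row in reader:
        chunk.append(row)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk  # final partial chunk
```

With 5 data rows and `chunk_size=2`, this yields batches of 2, 2, and 1 rows.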
### Advanced Upload Options

```python
# For more control over uploads, use the advanced method
result = client.upload_sensor_measurement_files(
    campaign_id=campaign.id,
    station_id=station.id,
    sensors_file="sensors.csv",  # file path, bytes, or (filename, bytes) tuple
    measurements_file="measurements.csv",  # file path, bytes, or (filename, bytes) tuple
    chunk_size=1000  # process in chunks of 1000 rows
)
```

### Custom Data Processing

```python
# Pre-process data before upload
def custom_pipeline():
    # Your data collection logic
    raw_data = collect_sensor_data()

    # Apply quality control
    cleaned_data = apply_qc_filters(raw_data)

    # Transform to Upstream format
    upstream_data = transform_data(cleaned_data)

    # Upload processed data
    client.upload_csv_data(
        campaign_id=campaign.id,
        station_id=station.id,
        sensors_file="processed_sensors.csv",
        measurements_file="processed_measurements.csv"
    )
```

## Use Cases

### 🌪️ **Disaster Response Networks**

- Hurricane monitoring stations with automated data upload
- Emergency response sensor deployment
- Real-time environmental hazard tracking

### 🌬️ **Environmental Research**

- Long-term air quality monitoring
- Climate change research networks
- Urban environmental health studies

### 🌊 **Water Monitoring**

- Stream gauge networks
- Water quality assessment programs
- Flood monitoring and prediction

### 🏭 **Industrial Monitoring**

- Emissions monitoring compliance
- Environmental impact assessment
- Regulatory reporting automation

## API Reference

### UpstreamClient Methods

#### Campaign Management

- **`create_campaign(campaign_in: CampaignsIn)`** - Create a new monitoring campaign
- **`get_campaign(campaign_id: str)`** - Get a campaign by ID
- **`list_campaigns(**kwargs)`** - List all campaigns

#### Station Management

- **`create_station(campaign_id: str, station_create: StationCreate)`** - Create a new monitoring station
- **`get_station(station_id: str, campaign_id: str)`** - Get a station by ID
- **`list_stations(campaign_id: str, **kwargs)`** - List stations for a campaign

#### Data Upload

- **`upload_csv_data(campaign_id: str, station_id: str, sensors_file: str, measurements_file: str)`** - Upload CSV files
- **`upload_sensor_measurement_files(campaign_id: str, station_id: str, sensors_file: Union[str, bytes, Tuple], measurements_file: Union[str, bytes, Tuple], chunk_size: int = 1000)`** - Advanced upload with chunking
- **`upload_chunked_csv_data(campaign_id: str, station_id: str, sensors_file: str, measurements_file: str)`** - Chunked upload for large files
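Because `sensors_file` and `measurements_file` also accept `(filename, bytes)` tuples, data generated in memory can be uploaded without touching disk. A sketch of building such a payload for the sensors CSV (the helper name is illustrative; column names are taken from the Data Format Requirements section):

```python
import csv
import io


def sensors_as_upload_tuple(sensors, filename="sensors.csv"):
    """Build an in-memory (filename, bytes) payload of the kind
    upload_sensor_measurement_files accepts. `sensors` is a list of
    dicts keyed by the sensors-CSV column names."""
    fieldnames = ["alias", "variablename", "units", "postprocess", "postprocessscript"]
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(sensors)
    return (filename, buf.getvalue().encode("utf-8"))
```

The resulting tuple can be passed directly as the `sensors_file` argument instead of a path.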
#### Utilities

- **`validate_files(sensors_file: str, measurements_file: str)`** - Validate CSV files
- **`get_file_info(file_path: str)`** - Get information about CSV files
- **`authenticate()`** - Test authentication
- **`logout()`** - Log out and invalidate tokens
- **`publish_to_ckan(campaign_id: str, **kwargs)`** - Publish data to CKAN

### Core Classes

- **`UpstreamClient`** - Main SDK interface
- **`CampaignsIn`** - Campaign creation model
- **`StationCreate`** - Station creation model

### Authentication

- **`AuthManager`** - Handles API authentication
- **`TokenManager`** - Manages the token lifecycle

### Utilities

- **`DataValidator`** - Validates CSV formats
- **`ChunkManager`** - Handles large file uploads
- **`ErrorHandler`** - Comprehensive error handling

## Configuration

### Environment Variables

```bash
UPSTREAM_USERNAME=your_username
UPSTREAM_PASSWORD=your_password
UPSTREAM_BASE_URL=https://upstream-dso.tacc.utexas.edu/dev
CKAN_URL=https://ckan.tacc.utexas.edu
```
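As a sketch, these variables can be assembled into `UpstreamClient` keyword arguments; the helper name and the fallback default are illustrative, not part of the SDK:

```python
import os


def client_kwargs_from_env(env=os.environ):
    """Collect UpstreamClient settings from the environment variables
    documented above, failing fast when credentials are absent."""
    missing = [k for k in ("UPSTREAM_USERNAME", "UPSTREAM_PASSWORD") if k not in env]
    if missing:
        raise KeyError(f"missing required settings: {missing}")
    return {
        "username": env["UPSTREAM_USERNAME"],
        "password": env["UPSTREAM_PASSWORD"],
        # Fall back to the documented development endpoint if unset
        "base_url": env.get("UPSTREAM_BASE_URL",
                            "https://upstream-dso.tacc.utexas.edu/dev"),
    }
```

The result can be splatted into the constructor: `UpstreamClient(**client_kwargs_from_env())`.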
### Configuration File

```yaml
# config.yaml
upstream:
  username: your_username
  password: your_password
  base_url: https://upstream-dso.tacc.utexas.edu/dev

ckan:
  url: https://ckan.tacc.utexas.edu
  auto_publish: true
  default_organization: your-org

upload:
  chunk_size: 10000
  max_file_size_mb: 50
  retry_attempts: 3
  timeout_seconds: 300
```

## Contributing

We welcome contributions! Please see [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.

### Development Setup

```bash
git clone https://github.com/In-For-Disaster-Analytics/upstream-python-sdk.git
cd upstream-python-sdk
pip install -e ".[dev]"
pre-commit install
```

### Running Tests

```bash
pytest                      # Run all tests
pytest tests/test_auth.py   # Run a specific test file
pytest --cov=upstream       # Run with coverage
```

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## Support

- **Documentation**: [https://upstream-python-sdk.readthedocs.io](https://upstream-python-sdk.readthedocs.io)
- **Issues**: [GitHub Issues](https://github.com/In-For-Disaster-Analytics/upstream-python-sdk/issues)
- **Discussions**: [GitHub Discussions](https://github.com/In-For-Disaster-Analytics/upstream-python-sdk/discussions)

## Citation

If you use this SDK in your research, please cite:

```bibtex
@software{upstream_python_sdk,
  title={Upstream Python SDK: Environmental Sensor Data Integration},
  author={In-For-Disaster-Analytics Team},
  year={2024},
  url={https://github.com/In-For-Disaster-Analytics/upstream-python-sdk},
  version={1.0.0}
}
```

## Related Projects

- **[Upstream Platform](https://github.com/In-For-Disaster-Analytics/upstream-docker)** - Main platform repository
- **[Upstream Examples](https://github.com/In-For-Disaster-Analytics/upstream-examples)** - Example workflows and tutorials
- **[CKAN Integration](https://ckan.tacc.utexas.edu)** - Data portal for published datasets

---

**Built for the environmental research community** 🌍
**Enabling automated, reproducible, and discoverable environmental data workflows**