mpcaHydro 2.0.4__py3-none-any.whl → 2.0.5__py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,71 @@
1
+ quality_code,Text,Description,Active
2
+ 0,Unchecked,Unchecked data in progress or data that is not quality coded as part of the workup. Default coding for shifts so the quality codes from Level and Ratings are used for discharges. ,1
3
+ 3,Instantaneous,Instantaneous groundwater measurements or sampled dates for load stations.,1
4
+ 5,Excellent,Discharge measurements that are excellent.,1
5
+ 8,Reliable Interpolation,The value of the data point is an interpolation between adjacent points. Code used for filling gaps less than 4 hours or with no change in data trend likely based on reference information.,1
6
+ 10,Good,Time series data that tracks well and requires no corrections or corrections of very small magnitude or timeseries data that has been reviewed and accepted for precipitation and groundwater level. Also used for discharge measurements and rating points. ,1
7
+ 15,Fair,Time series data that tracks fairly well and requires some corrections of relatively small magnitude. Also used for discharge measurements and rating points. ,1
8
+ 20,Poor,Time series data that tracks poorly and requires significant or many corrections. Also used for discharge measurements and rating points. ,1
9
+ 27,Questionable,"Timeseries data or discharge measurements that are questionable due to operator error, equipment error, etc). Extra scrutiny should be used for these data. ",1
10
+ 28,Unknown data quality,"Unknown quality of time series data, ratings or discharge measurements.",1
11
+ 29,Modeled,"Time-series data, rating point or discharge from a reliable mathematical and\or computer model. ",1
12
+ 34,Estimated,"Time-series data estimated from reference traces, models or extrapolation of the rating curve using supporting data and up to two times the maximum measured discharge.",1
13
+ 35,Unreliable,Time-series data computed with a rating extrapolated without supporting data or beyond two times the maximum measured discharge without a model.,1
14
+ 36,Threshold Exceedance,"Time-series data may be beyond the measuring limits of the monitoring equipment, or outside the bounds of historical extremes.",1
15
+ 40,Default import code,WISKI default coding for gaugings. ,1
16
+ 45,Approved Ext Data,"External data that has been graded externally as ""Approved"".",1
17
+ 48,Unknown Ext Data,External data that has been graded internally as “Unknown”.,1
18
+ 49,Estimated Ext Data,External data that has been graded externally as “Estimated.” Typically this is finalized ice data.,1
19
+ 50,Provisional Ext Data,External data that has been graded internally or externally as “Provisional”.,1
20
+ 80,Ice - Estimated,Ice affected time series data. Discharge computed with ice affected stage data is considered estimated.,1
21
+ 199,199-Logger Unknown,Initial code for data coming to the system from the logger.,1
22
+ 200,200,Initial code for data coming to the system from telemetry or default coding for WISKI timeseries. ,1
23
+ 228,Info Parameter,This parameter is collected for informational purposes only. Data has been through a cursory check only. This is stored in the database and available upon request.,1
24
+ 255,---,System assigned code for gaps in the data set. Records with null values. ,1
25
+ 1,Continuous Data,~Discontinued~ Good TS data that requires no correction.,0
26
+ 2,Edited Data,~Discontinued~ TS data that has been edited. Typically used when spikes are removed or when points are edited manually for datum corrections.,0
27
+ 3,Instantaneous Data,Final WQ data.,0
28
+ 4,Questionable data,~Discontinued~,0
29
+ 5,Excellent measurement,Used to indicate discharge measurements that are excellent as well as excellent sections of the rating.,0
30
+ 10,Good measurement,Used to indicate discharge measurements and sections of the rating that are good and time series data that tracks well and requires no corrections or corrections of very small magnitude.,0
31
+ 12,Modeled measurement,~Discontinued~ Rating point or discharge was obtained from a reliable mathematical and/or computer model. After 3/1/11 use QC148.,0
32
+ 15,Fair measurement,Used to indicate discharge measurements and sections of the rating that are fair and time series data that tracks fairly well and requires some corrections of relatively small magnitude.,0
33
+ 20,Poor measurement,Used to indicate discharge measurements and sections of the rating that are poor and time series data that tracks poorly and requires significant or many corrections.,0
34
+ 25,Unknown measurement,Measurement data not available.,0
35
+ 27,Questionable data,"Flow measurement is very poor and should be given extra scrutiny or time series data that is questionable due to operator error, equipment error, etc.",0
36
+ 30,Good Archived Daily Value,This code is used for archived daily value data that is considered “Good”.,0
37
+ 31,Fair Archived Daily Value,This code is used for archived daily value data that is considered “Fair”.,0
38
+ 32,Poor Archived Daily Value,This code is used for archived daily value data that is considered “Poor”.,0
39
+ 33,Unknown Archived Daily Value,This code is used for archived daily value data that has unknown quality based on lack of documentation.,0
40
+ 34,Estimated Archived Daily Value,This code is used for archived daily value data that has been estimated.,0
41
+ 35,Unreliable Archived Daily Value,This code is used for archived daily value data that is unreliable based on the quality of the supporting time series data and/or rating.,0
42
+ 45,Good External Data,This code is used for external data that has been graded internally as “Good”.,0
43
+ 46,Fair External Data,This code is used for external data that has been graded internally as “Fair”.,0
44
+ 47,Poor External Data,This code is used for external data that has been graded internally as “Poor”.,0
45
+ 48,Unknown External Data,This code is used for external data that has been graded internally as “Unknown”.,0
46
+ 49,Estimated External Data,This code is used for external data that has been graded externally as “Estimated.” Typically this is finalized ice data.,0
47
+ 50,Provisional External Data,This code is used for external data that has been graded internally as “Provisional”.,0
48
+ 51,Telemetry data - DCP,This code is used for time-series data when imported into Hydstra using an automated telemetry method that accesses a DCP through the GOES network. The “questionable measurement” flag is set through the SHEF code that accompanies the DCP data.,0
49
+ 60,Above rating,~Discontinued~,0
50
+ 70,Estimated Data,Value of the data point is estimated.,0
51
+ 76,Reliable interpolation,Value of the data point is an interpolation between adjacent points. ,0
52
+ 80,Ice,"(DISCONTINUED) Used to indicate ice conditions when the data should not be exported. Use in conjunction with 80 to code 232.00 values, run USDAY to compute daily flow, then recode 232.00 80 values to 180 so unit value export cannot occur.",0
53
+ 82,Linear interpolation across a gap in records,~Discontinued~ Points that were added to fill a gap in the data record. The points fall on a straight line between the end points of the gap. This code was changed to 8 in WISKI.,0
54
+ 103,Provisional Instantaneous Data,Provisional WQ data.,0
55
+ 130,Good Provisional Daily Value,This code is used for archived daily value data that is considered “Good” but Provisional because there is only one year of gaging measurements.,0
56
+ 131,Fair Provisional Daily Value,This code is used for archived daily value data that is considered “Fair” but Provisional because there is only one year of gaging measurements.,0
57
+ 132,Poor Provisional Daily Value,This code is used for archived daily value data that is considered “Poor” but Provisional because there is only one year of gaging measurements.,0
58
+ 133,Unknown Provisional Archived Daily Value,This code is used for archived daily value data that has unknown quality based on lack of documentation but Provisional because there is only one year of gaging measurements.,0
59
+ 134,Estimated Provisional Archived Daily Value,This code is used for archived daily value data that has been estimated but Provisional because there is only one year of gaging measurements.,0
60
+ 135,Unreliable Provisional Archived Daily Value,This code is used for archived daily value data that is unreliable based on the quality of the supporting time series data and/or rating but Provisional because there is only one year of gaging measurements.,0
61
+ 140,Data not yet checked,This code is used for time-series data when it is initially imported into Hydstra using manual import methods. ,0
62
+ 141,Telemetry data - not yet checked,This code is used for time-series data when it is imported into Hydstra using an automated telemetry method.,0
63
+ 148,Modeled measurement,Rating point or discharge was obtained from a reliable mathematical and/or computer model.,0
64
+ 149,Extrapolated rating point,Rating point accurately extrapolated using supporting data and is less than two times the maximum measured discharge.,0
65
+ 150,Over-extrapolated rating point,Rating point extrapolated without supporting data or beyond two times the maximum measured discharge without a mathematical model.,0
66
+ 151,Data Missing,"This code is used to flag the end of a period of missing time-series data, before the next good data value.",0
67
+ 160,Above rating,~Discontinued~,0
68
+ 169,Datalogger Hardware Error Code 6999,"This code is used to indicate that a time-series point had a value of 6999 or -6999, a typical hardware error code, and the value was changed.",0
69
+ 170,Estimated Data,"Used to indicate estimated data when the data should not be exported. Often used in conjunction with 70 to code 232.00 values, run USDAY to compute daily flow, then recode 232.00 70 values to 170 so unit value export can not occur.",0
70
+ 180,Ice,Used to indicate ice conditions.,0
71
+ 255,Data Missing,This code is used when data is exported and does not exist for a given time period.,0
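The table above is the quality-code reference added in this release: rows with Active = 1 are the current WISKI grades, while Active = 0 rows preserve the retired Hydstra-era codes. A minimal sketch of how such a table could be used to screen observations — the local filename quality_codes.csv and the example observations DataFrame are assumptions for illustration, not part of the package:

import pandas as pd

# Reference table added in this release (local filename is an assumption).
codes = pd.read_csv('quality_codes.csv')

# Example observations; column names here are illustrative only.
obs = pd.DataFrame({'value': [1.2, 3.4, 5.6],
                    'quality_code': [10, 0, 255]})

# Keep rows whose code is active, excluding Unchecked (0) and the gap marker (255).
active = set(codes.loc[codes['Active'] == 1, 'quality_code']) - {0, 255}
screened = obs[obs['quality_code'].isin(active)]
print(screened)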
Binary file
Binary file
Binary file
mpcaHydro/data_manager.py CHANGED
@@ -8,89 +8,47 @@ Created on Fri Jun 3 10:01:14 2022
8
8
  import pandas as pd
9
9
  #from abc import abstractmethod
10
10
  from pathlib import Path
11
- from mpcaHydro import etlWISKI, etlSWD#, etlEQUIS
11
+ from mpcaHydro import etlSWD
12
+ from mpcaHydro import equis, wiski, warehouse
13
+ from mpcaHydro import xref
14
+ from mpcaHydro import outlets
15
+ from mpcaHydro.reports import reportManager
12
16
  import duckdb
13
17
 
14
- #
15
- '''
16
- Q
17
- WT
18
- TSS
19
- N
20
- TKN
21
- OP
22
- TP
23
- CHLA
24
- DO
25
-
26
-
27
- class Station
28
-
29
- - id
30
- - name
31
- - source
32
- - data
33
-
34
-
35
-
36
-
37
-
38
- '''
39
- WISKI_EQUIS_XREF = pd.read_csv(Path(__file__).parent/'data/WISKI_EQUIS_XREF.csv')
40
- #WISKI_EQUIS_XREF = pd.read_csv('C:/Users/mfratki/Documents/GitHub/hspf_tools/WISKI_EQUIS_XREF.csv')
41
-
42
18
  AGG_DEFAULTS = {'cfs':'mean',
43
19
  'mg/l':'mean',
44
- 'degF': 'mean',
20
+ 'degf': 'mean',
45
21
  'lb':'sum'}
46
22
 
47
23
  UNIT_DEFAULTS = {'Q': 'cfs',
24
+ 'QB': 'cfs',
48
25
  'TSS': 'mg/l',
49
26
  'TP' : 'mg/l',
50
27
  'OP' : 'mg/l',
51
28
  'TKN': 'mg/l',
52
29
  'N' : 'mg/l',
53
- 'WT' : 'degF',
30
+ 'WT' : 'degf',
54
31
  'WL' : 'ft'}
55
32
 
56
- # VALID_UNITS = {'Q': 'cfs',
57
- # 'TSS': 'mg/l','lb',
58
- # 'TP' : 'mg/l',
59
- # 'OP' : 'mg/l',
60
- # 'TKN': 'mg/l',
61
- # 'N' : 'mg/l',
62
- # 'WT' : 'degF',
63
- # 'WL' : 'ft'}
64
33
 
34
+ def validate_constituent(constituent):
35
+ assert constituent in ['Q','TSS','TP','OP','TKN','N','WT','DO','WL','CHLA']
36
+
37
+ def validate_unit(unit):
38
+ assert(unit in ['mg/l','lb','cfs','degF'])
65
39
 
66
- def are_lists_identical(nested_list):
67
- # Sort each sublist
68
- sorted_sublists = [sorted(sublist) for sublist in nested_list]
69
- # Compare all sublists to the first one
70
- return all(sublist == sorted_sublists[0] for sublist in sorted_sublists)
71
40
 
72
- def construct_database(folderpath):
41
+ def build_warehouse(folderpath):
73
42
  folderpath = Path(folderpath)
74
43
  db_path = folderpath.joinpath('observations.duckdb').as_posix()
75
- with duckdb.connect(db_path) as con:
76
- con.execute("DROP TABLE IF EXISTS observations")
77
- datafiles = folderpath.joinpath('*.csv').as_posix()
78
- query = '''
79
- CREATE TABLE observations AS SELECT *
80
- FROM
81
- read_csv_auto(?,
82
- union_by_name = true);
83
-
84
- '''
85
- con.execute(query,[datafiles])
86
-
44
+ warehouse.init_db(db_path)
87
45
 
88
46
  def constituent_summary(db_path):
89
47
  with duckdb.connect(db_path) as con:
90
48
  query = '''
91
49
  SELECT
92
50
  station_id,
93
- source,
51
+ station_origin,
94
52
  constituent,
95
53
  COUNT(*) AS sample_count,
96
54
  year(MIN(datetime)) AS start_date,
@@ -98,7 +56,7 @@ def constituent_summary(db_path):
98
56
  FROM
99
57
  observations
100
58
  GROUP BY
101
- constituent, station_id,source
59
+ constituent, station_id,station_origin
102
60
  ORDER BY
103
61
  sample_count;'''
104
62
 
@@ -108,293 +66,163 @@ def constituent_summary(db_path):
108
66
 
109
67
  class dataManager():
110
68
 
111
- def __init__(self,folderpath):
69
+ def __init__(self,folderpath, oracle_user = None, oracle_password =None):
112
70
 
113
71
  self.data = {}
114
72
  self.folderpath = Path(folderpath)
115
73
  self.db_path = self.folderpath.joinpath('observations.duckdb')
116
-
117
- def _reconstruct_database(self):
118
- construct_database(self.folderpath)
119
74
 
120
-
121
- def constituent_summary(self,constituents = None):
122
- with duckdb.connect(self.db_path) as con:
123
- if constituents is None:
124
- constituents = con.query('''
125
- SELECT DISTINCT
126
- constituent
127
- FROM observations''').to_df()['constituent'].to_list()
128
-
129
- query = '''
130
- SELECT
131
- station_id,
132
- source,
133
- constituent,
134
- COUNT(*) AS sample_count,
135
- year(MIN(datetime)) AS start_date,
136
- year(MAX(datetime)) AS end_date
137
- FROM
138
- observations
139
- WHERE
140
- constituent in (SELECT UNNEST(?))
141
- GROUP BY
142
- constituent,station_id,source
143
- ORDER BY
144
- constituent,sample_count;'''
145
-
146
- df = con.execute(query,[constituents]).fetch_df()
147
- return df
75
+ self.oracle_user = oracle_user
76
+ self.oracle_password = oracle_password
77
+ warehouse.init_db(self.db_path,reset = False)
78
+ self.xref = xref
79
+ self.outlets = outlets
80
+ self.reports = reportManager(self.db_path)
148
81
 
149
- def get_wiski_stations(self):
150
- return list(WISKI_EQUIS_XREF['WISKI_STATION_NO'].unique())
151
82
 
152
- def get_equis_stations(self):
153
- return list(WISKI_EQUIS_XREF['EQUIS_STATION_ID'].unique())
83
+ def connect_to_oracle(self):
84
+ assert self.credentials_exist(), 'Oracle credentials not found. Set ORACLE_USER and ORACLE_PASSWORD environment variables or use swd as station_origin'
85
+ equis.connect(user = self.oracle_user, password = self.oracle_password)
154
86
 
155
- def wiski_equis_alias(self,wiski_station_id):
156
- equis_ids = list(set(WISKI_EQUIS_XREF.loc[WISKI_EQUIS_XREF['WISKI_STATION_NO'] == wiski_station_id,'WISKI_EQUIS_ID'].to_list()))
157
- equis_ids = [equis_id for equis_id in equis_ids if not pd.isna(equis_id)]
158
- if len(equis_ids) == 0:
159
- return []
160
- elif len(equis_ids) > 1:
161
- print(f'Too Many Equis Stations for {wiski_station_id}')
162
- raise
87
+ def credentials_exist(self):
88
+ if (self.oracle_user is not None) & (self.oracle_password is not None):
89
+ return True
163
90
  else:
164
- return equis_ids[0]
165
-
166
- def wiski_equis_associations(self,wiski_station_id):
167
- equis_ids = list(WISKI_EQUIS_XREF.loc[WISKI_EQUIS_XREF['WISKI_STATION_NO'] == wiski_station_id,'EQUIS_STATION_ID'].unique())
168
- equis_ids = [equis_id for equis_id in equis_ids if not pd.isna(equis_id)]
169
- if len(equis_ids) == 0:
170
- return []
171
- else:
172
- return equis_ids
173
-
174
- def equis_wiski_associations(self,equis_station_id):
175
- wiski_ids = list(WISKI_EQUIS_XREF.loc[WISKI_EQUIS_XREF['EQUIS_STATION_ID'] == equis_station_id,'WISKI_STATION_NO'].unique())
176
- wiski_ids = [wiski_id for wiski_id in wiski_ids if not pd.isna(wiski_id)]
177
- if len(wiski_ids) == 0:
178
- return []
179
- else:
180
- return wiski_ids
91
+ return False
181
92
 
182
- def equis_wiski_alias(self,equis_station_id):
183
- wiski_ids = list(set(WISKI_EQUIS_XREF.loc[WISKI_EQUIS_XREF['WISKI_EQUIS_ID'] == equis_station_id,'WISKI_STATION_NO'].to_list()))
184
- wiski_ids = [wiski_id for wiski_id in wiski_ids if not pd.isna(wiski_id)]
185
- if len(wiski_ids) == 0:
186
- return []
187
- elif len(wiski_ids) > 1:
188
- print(f'Too Many WISKI Stations for {equis_station_id}')
189
- raise
190
- else:
191
- return wiski_ids[0]
192
-
193
- def _equis_wiski_associations(self,equis_station_ids):
194
- wiski_stations = [self.equis_wiski_associations(equis_station_id) for equis_station_id in equis_station_ids]
195
- if are_lists_identical(wiski_stations):
196
- return wiski_stations[0]
197
- else:
198
- return []
199
-
200
- def _stations_by_wid(self,wid_no,station_origin):
201
- if station_origin in ['wiski','wplmn']:
202
- station_col = 'WISKI_STATION_NO'
203
- elif station_origin in ['equis','swd']:
204
- station_col = 'EQUIS_STATION_ID'
205
- else:
206
- raise
207
-
208
- return list(WISKI_EQUIS_XREF.loc[WISKI_EQUIS_XREF['WID'] == wid_no,station_col].unique())
93
+ def _build_warehouse(self):
94
+ build_warehouse(self.folderpath)
209
95
 
96
+ def download_station_data(self,station_id,station_origin,overwrite=True,to_csv = False,filter_qc_codes = True, start_year = 1996, end_year = 2030,baseflow_method = 'Boughton'):
97
+ '''
98
+ Method to download data for a specific station and load it into the warehouse.
99
+
101
+ :param station_id: Station identifier
102
+ :param station_origin: Source of the station data: 'wiski', 'equis', or 'swd'
103
+ :param overwrite: Whether to overwrite existing data
104
+ :param to_csv: Whether to export data to CSV
105
+ :param filter_qc_codes: Whether to filter quality control codes
106
+ :param start_year: Start year for data download
107
+ :param end_year: End year for data download
108
+ :param baseflow_method: Method for baseflow calculation
109
+ '''
110
+ with duckdb.connect(self.db_path,read_only=False) as con:
111
+ if overwrite:
112
+ warehouse.drop_station_id(con,station_id,station_origin)
113
+ warehouse.update_views(con)
114
+
115
+ if station_origin == 'wiski':
116
+ df = wiski.download([station_id],start_year = start_year, end_year = end_year)
117
+ warehouse.load_df_to_staging(con,df, 'wiski_raw', replace = overwrite)
118
+ warehouse.load_df_to_analytics(con,wiski.transform(df,filter_qc_codes = filter_qc_codes,baseflow_method = baseflow_method),'wiski') # method includes normalization
119
+
120
+ elif station_origin == 'equis':
121
+ assert self.credentials_exist(), 'Oracle credentials not found. Set ORACLE_USER and ORACLE_PASSWORD environment variables or use swd as station_origin'
122
+ df = equis.download([station_id])
123
+ warehouse.load_df_to_staging(con,df, 'equis_raw',replace = overwrite)
124
+ warehouse.load_df_to_analytics(con,equis.transform(df),'equis')
125
+
126
+ elif station_origin == 'swd':
127
+ df = etlSWD.download(station_id)
128
+ warehouse.load_df_to_staging(con,df, 'swd_raw', replace = overwrite)
129
+ warehouse.load_df_to_analytics(con,etlSWD.transform(df),'swd')
130
+ else:
131
+ raise ValueError('station_origin must be wiski, equis, or swd')
210
132
 
211
- def download_stations_by_wid(self, wid_no,station_origin, folderpath = None, overwrite = False):
212
-
213
- station_ids = self._station_by_wid(wid_no,station_origin)
214
-
215
- if not station_ids.empty:
216
- for _, row in station_ids.iterrows():
217
- self.download_station_data(row['station_id'],station_origin, folderpath, overwrite)
218
-
219
- def _download_station_data(self,station_id,station_origin,overwrite=False):
220
- assert(station_origin in ['wiski','equis','swd','wplmn'])
221
- if station_origin == 'wiski':
222
- #equis_stations = list(WISKI_EQUIS_XREF.loc[WISKI_EQUIS_XREF['WISKI_STATION_NO'] == station_id,'WISKI_EQUIS_ID'].unique())
223
- #[self.download_station_data(equis_station,'equis',overwrite = overwrite) for equis_station in equis_stations]
224
- self.download_station_data(station_id,'wiski',overwrite = overwrite)
225
- equis_alias = self.wiski_equis_alias(station_id)
226
- self.download_station_data(equis_alias,'swd',overwrite = overwrite)
227
- elif station_origin == 'wplmn':
228
- self.download_station_data(station_id,'wplmn',overwrite = overwrite)
229
- equis_alias = self.wiski_equis_alias(station_id)
230
- self.download_station_data(equis_alias,'swd',overwrite = overwrite)
231
- else:
232
- wiski_station = self.equis_wiski_associations(station_id)
233
- #wiski_station = WISKI_EQUIS_XREF.loc[WISKI_EQUIS_XREF['EQUIS_STATION_ID'] == station_id,'WISKI_STATION_NO']
234
- self.download_station_data(station_id,'equis',overwrite = overwrite)
235
- self.download_station_data(wiski_station,'wiski',overwrite = overwrite)
236
-
237
-
238
- def download_station_data(self,station_id,source,folderpath=None,overwrite = False):
239
- assert(source in ['wiski','equis','swd','wplmn'])
240
- station_id = str(station_id)
241
- save_name = station_id
242
- if source == 'wplmn':
243
- save_name = station_id + '_wplmn'
244
-
245
- if folderpath is None:
246
- folderpath = self.folderpath
247
- else:
248
- folderpath = Path(folderpath)
249
-
250
-
251
- if (folderpath.joinpath(save_name + '.csv').exists()) & (not overwrite):
252
- print (f'{station_id} data already downloaded')
253
- return
254
-
255
- if source == 'wiski':
256
- data = etlWISKI.download(station_id)
257
- elif source == 'swd':
258
- data = etlSWD.download(station_id)
259
- elif source == 'equis':
260
- data = etlSWD.download(station_id)
261
- else:
262
- data = etlWISKI.download(station_id,wplmn=True)
263
- #raise NotImplementedError()
264
- #data = etlEQUIS.download(station_id)
133
+ with duckdb.connect(self.db_path,read_only=False) as con:
134
+ warehouse.update_views(con)
265
135
 
266
-
267
-
268
- if len(data) > 0:
269
- data.to_csv(folderpath.joinpath(save_name + '.csv'))
270
- self.data[station_id] = data
271
- else:
272
- print(f'No {source} calibration cata available at Station {station_id}')
273
-
274
-
275
- def _load(self,station_id):
276
- df = pd.read_csv(self.folderpath.joinpath(station_id + '.csv'),
277
- index_col='datetime',
278
- parse_dates=['datetime'],
279
- #usecols=['Ts Date','Station number','variable', 'value','reach_id'],
280
- dtype={'station_id': str, 'value': float, 'variable': str,'constituent':str,'unit':str})
281
- self.data[station_id] = df
136
+ if to_csv:
137
+ self.to_csv(station_id)
138
+
282
139
  return df
283
140
 
284
- def load(self,station_id):
285
- try:
286
- df = self.data[station_id]
287
- except:
288
- self._load(station_id)
141
+ def get_outlets(self):
142
+ with duckdb.connect(self.db_path,read_only=True) as con:
143
+ query = '''
144
+ SELECT *
145
+ FROM outlets.station_reach_pairs
146
+ ORDER BY outlet_id'''
147
+ df = con.execute(query).fetch_df()
289
148
  return df
149
+ def get_station_ids(self,station_origin = None):
150
+ with duckdb.connect(self.db_path,read_only=True) as con:
151
+ if station_origin is None:
152
+ query = '''
153
+ SELECT DISTINCT station_id, station_origin
154
+ FROM analytics.observations'''
155
+ df = con.execute(query).fetch_df()
156
+ else:
157
+ query = '''
158
+ SELECT DISTINCT station_id
159
+ FROM analytics.observations
160
+ WHERE station_origin = ?'''
161
+ df = con.execute(query,[station_origin]).fetch_df()
162
+
163
+ return df['station_id'].to_list()
290
164
 
291
- def info(self,constituent):
292
- return pd.concat([self._load(file.stem) for file in self.folderpath.iterdir() if file.suffix == '.csv'])[['station_id','constituent','value']].groupby(by = ['station_id','constituent']).count()
293
-
294
- def get_wplmn_data(self,station_id,constituent,unit = 'mg/l', agg_period = 'YE', samples_only = True):
295
-
296
- assert constituent in ['Q','TSS','TP','OP','TKN','N','WT','DO','WL','CHLA']
297
- station_id = station_id + '_wplmn'
298
- dfsub = self._load(station_id)
299
-
300
- if samples_only:
301
- dfsub = dfsub.loc[dfsub['quality_id'] == 3]
302
- agg_func = 'mean'
303
-
304
- dfsub = dfsub.loc[(dfsub['constituent'] == constituent) &
305
- (dfsub['unit'] == unit),
306
- ['value','data_format','source']]
307
165
 
166
+ def get_station_data(self,station_ids,constituent,agg_period = None):
308
167
 
309
- df = dfsub[['value']].resample(agg_period).agg(agg_func)
310
-
311
- if df.empty:
312
- dfsub = df
313
- else:
314
-
315
- df['data_format'] = dfsub['data_format'].iloc[0]
316
- df['source'] = dfsub['source'].iloc[0]
317
-
318
- #if (constituent == 'TSS') & (unit == 'lb'): #convert TSS from lbs to us tons
319
- # dfsub['value'] = dfsub['value']/2000
320
-
321
- #dfsub = dfsub.resample('H').mean().dropna()
168
+
169
+ with duckdb.connect(self.db_path,read_only=True) as con:
170
+ query = '''
171
+ SELECT *
172
+ FROM analytics.observations
173
+ WHERE station_id IN ? AND constituent = ?'''
174
+ df = con.execute(query,[station_ids,constituent]).fetch_df()
322
175
 
176
+ unit = UNIT_DEFAULTS[constituent]
177
+ agg_func = AGG_DEFAULTS[unit]
178
+
179
+ df.set_index('datetime',inplace=True)
323
180
  df.attrs['unit'] = unit
324
181
  df.attrs['constituent'] = constituent
325
- return df['value'].to_frame().dropna()
326
-
327
- def get_data(self,station_id,constituent,agg_period = 'D'):
328
- return self._get_data([station_id],constituent,agg_period)
329
-
330
- def _get_data(self,station_ids,constituent,agg_period = 'D',tz_offset = '-6'):
331
- '''
332
-
333
- Returns the processed observational data associated with the calibration specific id.
334
-
335
-
336
- Parameters
337
- ----------
338
- station_id : STR
339
- Station ID as a string
340
- constituent : TYPE
341
- Constituent abbreviation used for calibration. Valid options:
342
- 'Q',
343
- 'TSS',
344
- 'TP',
345
- 'OP',
346
- 'TKN',
347
- 'N',
348
- 'WT',
349
- 'DO',
350
- 'WL']
351
- unit : TYPE, optional
352
- Units of data. The default is 'mg/l'.
353
- sample_flag : TYPE, optional
354
- For WPLMN data this flag determines modeled loads are returned. The default is False.
182
+ if agg_period is not None:
183
+ df = df[['value']].resample(agg_period).agg(agg_func)
184
+ df.attrs['agg_period'] = agg_period
355
185
 
356
- Returns
357
- -------
358
- dfsub : Pands.Series
359
- Pandas series of data. Note that no metadata is returned.
186
+ df.rename(columns={'value': 'observed'}, inplace=True)
187
+ return df
188
+
189
+ def get_outlet_data(self,outlet_id,constituent,agg_period = 'D'):
190
+ with duckdb.connect(self.db_path,read_only=True) as con:
191
+ query = '''
192
+ SELECT *
193
+ FROM analytics.outlet_observations_with_flow
194
+ WHERE outlet_id = ? AND constituent = ?'''
195
+ df = con.execute(query,[outlet_id,constituent]).fetch_df()
360
196
 
361
- '''
362
-
363
- assert constituent in ['Q','TSS','TP','OP','TKN','N','WT','DO','WL','CHLA']
364
-
365
197
  unit = UNIT_DEFAULTS[constituent]
366
198
  agg_func = AGG_DEFAULTS[unit]
367
-
368
- dfsub = pd.concat([self.load(station_id) for station_id in station_ids]) # Check cache
369
- dfsub = dfsub.loc[(dfsub['constituent'] == constituent) &
370
- (dfsub['unit'] == unit),
371
- ['value','data_format','source']]
372
-
373
- df = dfsub[['value']].resample(agg_period).agg(agg_func)
199
+
200
+ df.set_index('datetime',inplace=True)
374
201
  df.attrs['unit'] = unit
375
202
  df.attrs['constituent'] = constituent
376
-
377
- if df.empty:
378
-
379
- return df
380
- else:
381
-
382
- df['data_format'] = dfsub['data_format'].iloc[0]
383
- df['source'] = dfsub['source'].iloc[0]
203
+ if agg_period is not None:
204
+ df = df[['value','flow_value','baseflow_value']].resample(agg_period).agg(agg_func)
205
+ df.attrs['agg_period'] = agg_period
384
206
 
207
+ df.rename(columns={'value': 'observed',
208
+ 'flow_value': 'observed_flow',
209
+ 'baseflow_value': 'observed_baseflow'}, inplace=True)
210
+ return df
385
211
 
386
- # convert to desired timzone before stripping timezone information.
387
- #df.index.tz_convert('UTC-06:00').tz_localize(None)
388
- df.index = df.index.tz_localize(None)
389
- return df['value'].to_frame().dropna()
390
212
 
391
213
 
392
- def validate_constituent(constituent):
393
- assert constituent in ['Q','TSS','TP','OP','TKN','N','WT','DO','WL','CHLA']
394
-
395
- def validate_unit(unit):
396
- assert(unit in ['mg/l','lb','cfs','degF'])
397
-
214
+ def to_csv(self,station_id,folderpath = None):
215
+ if folderpath is None:
216
+ folderpath = self.folderpath
217
+ else:
218
+ folderpath = Path(folderpath)
219
+ df = self._load(station_id)
220
+ if len(df) > 0:
221
+ df.to_csv(folderpath.joinpath(station_id + '.csv'))
222
+ else:
223
+ print(f'No calibration data available at Station {station_id}')
224
+
225
398
226
 
399
227
 
400
228
  # class database():
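Taken together, the data_manager changes replace the per-station CSV cache with a DuckDB warehouse (observations.duckdb) plus staging and analytics tables. A rough usage sketch of the interface as it appears in this diff; the folder path and station ID are placeholders, and behaviour beyond what the diff shows is assumed:

from mpcaHydro.data_manager import dataManager

# Opens or creates observations.duckdb inside the given folder.
dm = dataManager('path/to/calibration_folder')

# Pull WISKI data for one station into the warehouse (placeholder station ID).
dm.download_station_data('12345001', 'wiski', start_year=2010, end_year=2024)

# Daily-mean flow: UNIT_DEFAULTS maps 'Q' to 'cfs', AGG_DEFAULTS maps 'cfs' to 'mean'.
daily_q = dm.get_station_data(['12345001'], 'Q', agg_period='D')
print(daily_q.head())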