acspype 0.1.0__tar.gz

acspype-0.1.0/LICENSE ADDED
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2025 Ian Black
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
acspype-0.1.0/PKG-INFO ADDED
@@ -0,0 +1,120 @@
1
+ Metadata-Version: 2.4
2
+ Name: acspype
3
+ Version: 0.1.0
4
+ Summary: A Python package for serial data acquisition and advanced processing for the Sea-Bird Scientific ACS.
5
+ Author-email: Ian Black <iantimothyblack@gmail.com>
6
+ License-Expression: MIT
7
+ Project-URL: Repository, https://github.com/IanTBlack/acspype
8
+ Project-URL: Issues, https://github.com/IanTBlack/acspype/issues
9
+ Project-URL: Discussion, https://github.com/IanTBlack/acspype/discussion
10
+ Keywords: ACS,Sea-Bird Scientific,SBS
11
+ Classifier: Development Status :: 3 - Alpha
12
+ Classifier: Intended Audience :: Science/Research
13
+ Classifier: Programming Language :: Python :: 3.12
14
+ Classifier: Topic :: Scientific/Engineering
15
+ Requires-Python: >=3.12
16
+ Description-Content-Type: text/markdown
17
+ License-File: LICENSE
18
+ Requires-Dist: bottleneck
19
+ Requires-Dist: fsspec
20
+ Requires-Dist: gsw
21
+ Requires-Dist: pyserial
22
+ Requires-Dist: netCDF4
23
+ Requires-Dist: numpy
24
+ Requires-Dist: pandas
25
+ Requires-Dist: scipy
26
+ Requires-Dist: xarray
27
+ Dynamic: license-file
28
+
29
+ # acspype
30
+
31
+ acspype provides functions for reading [Sea-Bird Scientific ACS](https://www.seabird.com/ac-s-spectral-absorption-and-attenuation-sensor/product?id=60762467715) data over serial and for performing community-accepted processing on ACS data.
32
+
33
+ ACS data are inherently complex and difficult to work with, particularly for new users without strong optics backgrounds.
34
+ This package attempts to simplify the process of reading and processing ACS data so that users can more quickly get to the more advanced data products for their research.
35
+
36
+ # CAUTION
37
+ If you are using this package to acquire data from an ACS over serial, it is your responsibility to ensure that you are
38
+ using the appropriate equipment. Please contact Sea-Bird Scientific for more information about the relevant equipment and
39
+ conditions needed to acquire data from the ACS.
40
+
41
+
42
+ ## What does this package need from you (and Sea-Bird Scientific)?
43
+ In addition to the physical sensor, one who deploys an ACS will also receive a device file (.dev) and a temperature-salinity correction file (.cor) from Sea-Bird Scientific.
44
+
45
+ As of the publication of this software package, the current device file version delivered from the factory is version 3. The device file is a text file that contains factory-derived pure water offsets.
46
+ These offsets are created at select temperature bins, which acspype uses to correct the data for temperature changes within the sensor that affect the internal optics.
47
+ Removing these offsets from a dataset removes the effect of pure water from the absorption and attenuation measurements.
48
+
49
+ The TS4.cor file that is delivered with each ACS provides the empirical temperature and salinity correction coefficients derived by Sullivan et al. (2006).
50
+ These coefficients are used to correct the absorption and attenuation measurements for temperature and salinity effects.
51
+
52
+ ## What does this package not do?
53
+ This package does not provide any functionality for logging data from the ACS. The `serial_no_device_file.ipynb` and `serial_with_device_file.ipynb` notebooks show acspype-specific data structures, which can then be converted to a format suitable for logging ACS data.
54
+ Inherently, the custom data structures are NamedTuple objects. To access that information as a Python dictionary, you can use the `._asdict()` method.
55
+
56
+ In applications where concurrency is required, PostgreSQL is a good option for storing data, since it provides functionality for handling arrays.
57
+ If file-based logging is required and single files are desired, SQLite is a good option. The ACS produces a significant amount of data, so logging to text or .netCDF files may best be done as hourly or daily files to prevent excessive memory usage.
58
+
59
+
60
+
61
+ ## Restrictions
62
+ The data streaming and processing functions are intended to be used separately from one another and not as a single pipeline.
63
+ The ACS outputs data at 4 Hz, and processing a serial packet through to scattering-corrected values cannot be done within 250 ms on most computers.
64
+ Conceptually, this could be resolved by passing data between threads, but for now we will leave that up to the end user to implement.
65
+
66
+ acspype does not typically enforce a strict naming convention for variables created using the software, but the examples provided use terminology commonly found in the ACS manual and in literature that uses ACS data.
67
+ For obtaining data over serial, the provided functions create predefined names, which users can change out at will.
68
+
69
+ The only enforced naming conventions are with the dimensions of any ACS data or metadata file imported with Xarray.
70
+ The following dimensions are required for ACS datasets:
71
+
72
+ - time: The time of the ACS sample, in UTC. Derived from the computer that is reading data over serial.
73
+ - a_wavelength: The absorption wavelength bins derived from the device file.
74
+ - c_wavelength: The attenuation wavelength bins derived from the device file.
75
+
76
+
77
+ For the device file, the required dimensions are:
78
+ - a_wavelength: The absorption wavelength bins derived from the device file.
79
+ - c_wavelength: The attenuation wavelength bins derived from the device file.
80
+ - temperature_bin: The temperature bin associated with the pure water offset.
81
+
82
+ For the TS4.cor file, the required dimensions are:
83
+ - wavelength
84
+
85
+
86
+
87
+ <!--
88
+ ## Installation
89
+
90
+ This package is available on [PyPI](https://pypi.org/project/acpype/) and can be installed using pip:
91
+ `pip install acspype`
92
+
93
+ -->
94
+
95
+ ## Suggested Naming Conventions
96
+ acspype doesn't enforce a strict naming convention for variables created using the software, but the examples provided use terminology commonly found in the ACS manual and in literature that uses ACS data.
97
+
98
+
99
+
100
+
101
+
102
+
103
+
104
+ #### SCRATCH
105
+
106
+
107
+
108
+ Serial Pipeline
109
+ The ACS communicates over RS232 and asynchronously sends data to produce a new binary packet at 4Hz. A serial pipeline should effectively seek out a full ACS binary packet within a serial port’s buffer, extract it, and then assign it a timestamp based on the host computer clock. Those looking to integrate the ACS into a data acquisition system should know that the instrument does not have a real-time clock and only reports the number of milliseconds that have passed since the system received power. After a full packet is identified, it can be parsed to obtain a set of ACS data with values in engineering units (e.g. counts). Using a corresponding device file (ACS-XXX.dev), the engineering units can then be converted to geophysical units which are indexed by wavelength for the absorption (a) and attenuation (c) channels. Real-time correction of an ACS spectrum in the same thread is limited to what can be done in between receiving packets. A full packet will be received roughly every 250 milliseconds. Anecdotally, computation up through measured absorption (am) and attenuation (cm) can safely be done in the same thread, but temperature-salinity correction may lead to serial buffer pile-up and inaccurate timestamps. Steps 0.0 to 1.5 in Table 1 describe what can be done in single thread data acquisition for the ACS.
110
+ acspype also differs from software like pyACS by providing functionality to parse an incoming ACS binary packet into engineering values without explicit knowledge of the information in the ACS's accompanying device file. Conveniently, the first 31 and last 3 bytes of a packet from any ACS (structure version 3) maintain the same format description, and byte 31 provides the number of output wavelengths, which can then be used to calculate the remaining number of bytes within the packet and split it into raw engineering values. This makes it possible, although recommended only as a last resort, to acquire, identify, and log ACS data from any ACS on a serial port without knowledge of the information within the device file. Even so, it is extremely important to keep track of the separate device file that provides the factory-created (or user-created) pure water offsets. A key challenge in processing and merging ACS data is that the number of wavelength outputs and the wavelength bins are very rarely the same between instruments, or even for the same instrument between factory refurbishments. This makes it possible to merge ACS data incorrectly if the data are not indexed by wavelength.
111
+ After processing the measured value, it is up to the user to determine the storage format of the data. acspype does not enforce a storage format or file type for serial data acquisition. For applications that require concurrency, the use of PostgreSQL (PGDG, 2025) is suggested, as it supports the storage and retrieval of arrays. For file-based applications, using SQLite (SQLite Consortium, 2025) is suggested, but arrays will first need to be converted to character-delimited strings. Logging directly to text or netCDF files is discouraged because memory use would increase as the dataset becomes larger, unless functionality is implemented to maintain a consistent file size.
112
+
113
+ Post-Processing Pipeline
114
+ It is presumed that the data processed with acspype will already have been timestamped and split into its engineering parts. acspype requires that the input data be placed in an xarray.Dataset with, at minimum, the coordinates time, a_wavelength, and c_wavelength. Steps 0.1 to 5.1 describe a standard post-processing pipeline for ACS data. Along this pipeline, several key pieces of information are needed.
115
+
116
+ A post-processing pipeline should first follow the conversions and corrections implemented in the suggested serial pipeline. After calculation of the measured value, temperature-salinity correction should then be done for both attenuation and absorption.
117
+
118
+
119
+
120
+
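The Restrictions section of the README above suggests passing data between threads as one way around the 250 ms budget. The following is a minimal sketch of that idea, not part of acspype: a plain list of byte strings stands in for the serial port, and the names `read_packets` and `process_packets` are hypothetical.

```python
import queue
import threading
import time


def read_packets(source, out_q):
    """Timestamp each raw packet on arrival and hand it off immediately."""
    for raw in source:
        out_q.put((time.time(), raw))  # host-clock timestamp, as the README describes
    out_q.put(None)  # sentinel: no more packets


def process_packets(in_q, results):
    """Do the slow work in a second thread so the reader is never blocked."""
    while True:
        item = in_q.get()
        if item is None:
            break
        timestamp, raw = item
        results.append((timestamp, len(raw)))  # stand-in for real processing


# Three synthetic 14-byte "packets" replace a real serial stream (assumption).
fake_source = [b"\xff\x00\xff\x00" + bytes(10) for _ in range(3)]
handoff = queue.Queue()
results = []
reader = threading.Thread(target=read_packets, args=(fake_source, handoff))
worker = threading.Thread(target=process_packets, args=(handoff, results))
reader.start()
worker.start()
reader.join()
worker.join()
```

Because the timestamp is taken in the reader thread, slow processing in the worker should not distort it; the queue simply grows if the worker falls behind.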
@@ -0,0 +1,92 @@
1
+ # acspype
2
+
3
+ acspype provides functions for reading [Sea-Bird Scientific ACS](https://www.seabird.com/ac-s-spectral-absorption-and-attenuation-sensor/product?id=60762467715) data over serial and for performing community-accepted processing on ACS data.
4
+
5
+ ACS data are inherently complex and difficult to work with, particularly for new users without strong optics backgrounds.
6
+ This package attempts to simplify the process of reading and processing ACS data so that users can more quickly get to the more advanced data products for their research.
7
+
8
+ # CAUTION
9
+ If you are using this package to acquire data from an ACS over serial, it is your responsibility to ensure that you are
10
+ using the appropriate equipment. Please contact Sea-Bird Scientific for more information about the relevant equipment and
11
+ conditions needed to acquire data from the ACS.
12
+
13
+
14
+ ## What does this package need from you (and Sea-Bird Scientific)?
15
+ In addition to the physical sensor, one who deploys an ACS will also receive a device file (.dev) and a temperature-salinity correction file (.cor) from Sea-Bird Scientific.
16
+
17
+ As of the publication of this software package, the current device file version delivered from the factory is version 3. The device file is a text file that contains factory-derived pure water offsets.
18
+ These offsets are created at select temperature bins, which acspype uses to correct the data for temperature changes within the sensor that affect the internal optics.
19
+ Removing these offsets from a dataset removes the effect of pure water from the absorption and attenuation measurements.
20
+
21
+ The TS4.cor file that is delivered with each ACS provides the empirical temperature and salinity correction coefficients derived by Sullivan et al. (2006).
22
+ These coefficients are used to correct the absorption and attenuation measurements for temperature and salinity effects.
23
+
24
+ ## What does this package not do?
25
+ This package does not provide any functionality for logging data from the ACS. The `serial_no_device_file.ipynb` and `serial_with_device_file.ipynb` notebooks show acspype-specific data structures, which can then be converted to a format suitable for logging ACS data.
26
+ Inherently, the custom data structures are NamedTuple objects. To access that information as a Python dictionary, you can use the `._asdict()` method.
27
+
28
+ In applications where concurrency is required, PostgreSQL is a good option for storing data, since it provides functionality for handling arrays.
29
+ If file-based logging is required and single files are desired, SQLite is a good option. The ACS produces a significant amount of data, so logging to text or .netCDF files may best be done as hourly or daily files to prevent excessive memory usage.
30
+
31
+
32
+
33
+ ## Restrictions
34
+ The data streaming and processing functions are intended to be used separately from one another and not as a single pipeline.
35
+ The ACS outputs data at 4 Hz, and processing a serial packet through to scattering-corrected values cannot be done within 250 ms on most computers.
36
+ Conceptually, this could be resolved by passing data between threads, but for now we will leave that up to the end user to implement.
37
+
38
+ acspype does not typically enforce a strict naming convention for variables created using the software, but the examples provided use terminology commonly found in the ACS manual and in literature that uses ACS data.
39
+ For obtaining data over serial, the provided functions create predefined names, which users can change out at will.
40
+
41
+ The only enforced naming conventions are with the dimensions of any ACS data or metadata file imported with Xarray.
42
+ The following dimensions are required for ACS datasets:
43
+
44
+ - time: The time of the ACS sample, in UTC. Derived from the computer that is reading data over serial.
45
+ - a_wavelength: The absorption wavelength bins derived from the device file.
46
+ - c_wavelength: The attenuation wavelength bins derived from the device file.
47
+
48
+
49
+ For the device file, the required dimensions are:
50
+ - a_wavelength: The absorption wavelength bins derived from the device file.
51
+ - c_wavelength: The attenuation wavelength bins derived from the device file.
52
+ - temperature_bin: The temperature bin associated with the pure water offset.
53
+
54
+ For the TS4.cor file, the required dimensions are:
55
+ - wavelength
56
+
57
+
58
+
59
+ <!--
60
+ ## Installation
61
+
62
+ This package is available on [PyPI](https://pypi.org/project/acpype/) and can be installed using pip:
63
+ `pip install acspype`
64
+
65
+ -->
66
+
67
+ ## Suggested Naming Conventions
68
+ acspype doesn't enforce a strict naming convention for variables created using the software, but the examples provided use terminology commonly found in the ACS manual and in literature that uses ACS data.
69
+
70
+
71
+
72
+
73
+
74
+
75
+
76
+ #### SCRATCH
77
+
78
+
79
+
80
+ Serial Pipeline
81
+ The ACS communicates over RS232 and asynchronously sends data to produce a new binary packet at 4Hz. A serial pipeline should effectively seek out a full ACS binary packet within a serial port’s buffer, extract it, and then assign it a timestamp based on the host computer clock. Those looking to integrate the ACS into a data acquisition system should know that the instrument does not have a real-time clock and only reports the number of milliseconds that have passed since the system received power. After a full packet is identified, it can be parsed to obtain a set of ACS data with values in engineering units (e.g. counts). Using a corresponding device file (ACS-XXX.dev), the engineering units can then be converted to geophysical units which are indexed by wavelength for the absorption (a) and attenuation (c) channels. Real-time correction of an ACS spectrum in the same thread is limited to what can be done in between receiving packets. A full packet will be received roughly every 250 milliseconds. Anecdotally, computation up through measured absorption (am) and attenuation (cm) can safely be done in the same thread, but temperature-salinity correction may lead to serial buffer pile-up and inaccurate timestamps. Steps 0.0 to 1.5 in Table 1 describe what can be done in single thread data acquisition for the ACS.
82
+ acspype also differs from software like pyACS by providing functionality to parse an incoming ACS binary packet into engineering values without explicit knowledge of the information in the ACS's accompanying device file. Conveniently, the first 31 and last 3 bytes of a packet from any ACS (structure version 3) maintain the same format description, and byte 31 provides the number of output wavelengths, which can then be used to calculate the remaining number of bytes within the packet and split it into raw engineering values. This makes it possible, although recommended only as a last resort, to acquire, identify, and log ACS data from any ACS on a serial port without knowledge of the information within the device file. Even so, it is extremely important to keep track of the separate device file that provides the factory-created (or user-created) pure water offsets. A key challenge in processing and merging ACS data is that the number of wavelength outputs and the wavelength bins are very rarely the same between instruments, or even for the same instrument between factory refurbishments. This makes it possible to merge ACS data incorrectly if the data are not indexed by wavelength.
83
+ After processing the measured value, it is up to the user to determine the storage format of the data. acspype does not enforce a storage format or file type for serial data acquisition. For applications that require concurrency, the use of PostgreSQL (PGDG, 2025) is suggested, as it supports the storage and retrieval of arrays. For file-based applications, using SQLite (SQLite Consortium, 2025) is suggested, but arrays will first need to be converted to character-delimited strings. Logging directly to text or netCDF files is discouraged because memory use would increase as the dataset becomes larger, unless functionality is implemented to maintain a consistent file size.
84
+
85
+ Post-Processing Pipeline
86
+ It is presumed that the data processed with acspype will already have been timestamped and split into its engineering parts. acspype requires that the input data be placed in an xarray.Dataset with, at minimum, the coordinates time, a_wavelength, and c_wavelength. Steps 0.1 to 5.1 describe a standard post-processing pipeline for ACS data. Along this pipeline, several key pieces of information are needed.
87
+
88
+ A post-processing pipeline should first follow the conversions and corrections implemented in the suggested serial pipeline. After calculation of the measured value, temperature-salinity correction should then be done for both attenuation and absorption.
89
+
90
+
91
+
92
+
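The README mentions that the custom data structures are NamedTuple objects exposing `._asdict()`, and that arrays must be converted to character-delimited strings before being stored in SQLite. A minimal sketch of that round trip follows. `FakePacket`, its fields, `to_row`, and the table layout are all hypothetical stand-ins, not acspype's actual structures.

```python
import sqlite3
from typing import NamedTuple


class FakePacket(NamedTuple):
    """Hypothetical stand-in for an acspype packet structure."""
    elapsed_time: int
    a_signal: tuple
    c_signal: tuple


def to_row(packet):
    """Convert a NamedTuple to a dict, joining array fields into delimited strings."""
    row = packet._asdict()
    for key, value in row.items():
        if isinstance(value, (list, tuple)):
            row[key] = ",".join(str(v) for v in value)
    return row


packet = FakePacket(elapsed_time=1250, a_signal=(101, 102, 103), c_signal=(201, 202, 203))
row = to_row(packet)

conn = sqlite3.connect(":memory:")  # in-memory database, for illustration only
conn.execute("CREATE TABLE acs (elapsed_time INTEGER, a_signal TEXT, c_signal TEXT)")
conn.execute("INSERT INTO acs VALUES (:elapsed_time, :a_signal, :c_signal)", row)
conn.commit()

# Reading back: split the delimited string into numbers again.
stored = conn.execute("SELECT a_signal FROM acs").fetchone()[0]
restored = [int(v) for v in stored.split(",")]
```

A comma is used as the delimiter here only for readability; any character that cannot appear in the numeric values would work.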
@@ -0,0 +1,5 @@
1
+ from .dev import ACSDev
2
+ from .tscor import ACSTSCor
3
+ from .stream import ACSStream
4
+ from .processing import parse_packet, calibrate_packet
5
+
@@ -0,0 +1,28 @@
1
+ import numpy as np
2
+
3
+ NUM_PAT = "[+-]?[0-9]*[.]?[0-9]+" # REGEX for any number, float or int, positive or negative.
4
+
5
+ PACKET_REGISTRATION = b'\xff\x00\xff\x00' # Start of every ACS packet.
6
+ PAD_BYTE = b'\x00' # End of every ACS packet.
7
+ WVL_BYTE_OFFSET = 4 + 2 + 1 + 1 + 1 + 3 + 2 + 2 + 2 + 2 + 2 + 2 + 2 + 4 + 1 # See Process Data section in ACS manual.
8
+ NUM_CHECKSUM_BYTES = 2
9
+ PACKET_HEAD = '!4cHBBl7HIBB' # struct descriptor for the static header of a packet.
10
+ PACKET_TAIL = 'Hx' # struct descriptor for the static tail of a packet.
11
+ LPR = len(PACKET_REGISTRATION)
12
+
13
+ class DefaultSerial:
14
+ BAUDRATE: int = 115200
15
+ BYTESIZE: int = 8
16
+ PARITY: str = 'N'
17
+ STOPBITS: int = 1
18
+ FLOWCONTROL: int = 0
19
+ TIMEOUT: int = 3
20
+
21
+ # Raw pressure counts are no longer output by an ACS and can be safely ignored. The reserved_1 and reserved_2 variables are single byte variables that are not used by the ACS and can be ignored.
22
+ ACS_VARS_TO_IGNORE = ['raw_pressure', 'reserved_1', 'reserved_2']
23
+
24
+ #---------- File Creation ----------#
25
+ ENCODING = {'time': {'units': 'nanoseconds since 1900-01-01'}} # xr.Dataset to netcdf encoding for time
26
+
27
+ #---------- PHYSICAL QUANTITIES ----------#
28
+ EST_FLOW_CELL_VOLUME = 30 # in mL, from the ACS Protocol Document, Rev Q.
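The constants above can be checked against each other with the standard `struct` module. The sketch below copies them verbatim and shows how the wavelength count at byte 31 might be used to size a full packet. The per-wavelength layout (four unsigned 16-bit values per wavelength) is an assumption that should be verified against the ACS manual, and `expected_packet_length` and `locate_packet` are hypothetical helpers, not acspype functions.

```python
import struct

# Constants copied from the module above.
PACKET_REGISTRATION = b"\xff\x00\xff\x00"
WVL_BYTE_OFFSET = 4 + 2 + 1 + 1 + 1 + 3 + 2 + 2 + 2 + 2 + 2 + 2 + 2 + 4 + 1
PACKET_HEAD = "!4cHBBl7HIBB"
PACKET_TAIL = "Hx"

HEAD_SIZE = struct.calcsize(PACKET_HEAD)        # static header, in bytes
TAIL_SIZE = struct.calcsize("!" + PACKET_TAIL)  # checksum plus pad byte


def expected_packet_length(num_wavelengths: int) -> int:
    """Assumed layout: four unsigned 16-bit values per wavelength between head and tail."""
    descriptor = f"{PACKET_HEAD}{4 * num_wavelengths}H{PACKET_TAIL}"
    return struct.calcsize(descriptor)


def locate_packet(buffer: bytes):
    """Return (start, num_wavelengths) for the first packet header found, else None."""
    start = buffer.find(PACKET_REGISTRATION)
    if start == -1 or len(buffer) < start + HEAD_SIZE:
        return None
    return start, buffer[start + WVL_BYTE_OFFSET]


# Synthetic buffer: two junk bytes, the registration bytes, zero padding,
# then a wavelength count of 2 at offset 31 from the packet start.
fake = b"\x01\x02" + PACKET_REGISTRATION + bytes(WVL_BYTE_OFFSET - 4) + bytes([2])
```

Note that the header descriptor covers 32 bytes, so the wavelength count sits at zero-based index 31, which is the last byte of the static header.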
@@ -0,0 +1,229 @@
1
+ from datetime import datetime
2
+ import numpy as np
3
+ import re
4
+ from scipy import interpolate
5
+ import xarray as xr
6
+
7
+ from acspype.core import NUM_PAT
8
+
9
+
10
+ class ACSDev:
11
+ """
12
+ A class for parsing ACS .dev files and putting them into a format that is easier to work with for larger or
13
+ multiple file datasets.
14
+
15
+ Generally, users will not call individual functions, but rather use the class to obtain attributes, which are
16
+ created at class instantiation or convert the data to an xarray dataset using the to_xarray function.
17
+ """
18
+
19
+ def __init__(self, filepath: str) -> None:
20
+ """
21
+ Run the following functions at instantiation to parse the .dev file and store the info as class attributes.
22
+
23
+ :param filepath: The filepath to the .dev file.
24
+ :return: None
25
+ """
26
+
27
+ self._filepath = filepath
28
+ self.__read_dev()
29
+ self.__parse_metadata()
30
+ self.__parse_tbins()
31
+ self.__parse_offsets()
32
+ self.__build_interp_funcs()
33
+ self.__check_parse()
34
+
35
+
36
+ def __read_dev(self) -> None:
37
+ """
38
+ Import the .dev file as a text file.
39
+ The file contents are stored as a list of strings in the class attribute self._content.
40
+
41
+ :return: None
42
+ """
43
+
44
+ with open(self._filepath, 'r') as _file:
45
+ self._content = _file.readlines()
46
+
47
+
48
+ def __parse_metadata(self) -> None:
49
+ """
50
+ Parse the .dev file for individual sensor metadata.
51
+ Sensor specific metadata are stored as class attributes.
52
+
53
+ :return: None
54
+ """
55
+
56
+ metadata_lines = [line for line in self._content if 'C and A offset' not in line]
57
+ for line in metadata_lines:
58
+ if 'ACS Meter' in line:
59
+ self.sensor_type = re.findall('(.*?)\n', line)[0]
60
+ elif 'Serial' in line:
61
+ self.sn_hexdec = re.findall('(.*?)\t', line)[0]
62
+ self.sn = 'ACS-' + str(int(self.sn_hexdec[-6:], 16)).zfill(5) # Convert to sn shown on product sticker.
63
+ elif 'structure version' in line:
64
+ self.structure_version = int(re.findall(f'({NUM_PAT})\t', line)[0])
65
+ elif 'tcal' in line or 'Tcal' in line:
66
+ self.tcal, self.ical = [float(v) for v in re.findall(f': ({NUM_PAT}) C', line)]
67
+ cal_date_str = re.findall('file on (.*?)[.]', line)[0].replace(' ', '')
68
+ try: # Sometimes the file date is entered as yyyy or yy. This should handle both cases.
69
+ self.cal_date = datetime.strptime(cal_date_str, '%m/%d/%Y').strftime('%Y-%m-%d')
70
+             except ValueError:
71
+ self.cal_date = datetime.strptime(cal_date_str, '%m/%d/%y').strftime('%Y-%m-%d')
72
+ elif 'Depth calibration' in line:
73
+ (self.depth_cal_1,
74
+ self.depth_cal_2) = [float(v) for v in re.findall(f'({NUM_PAT})', line)]
75
+ elif 'Baud' in line:
76
+ self.baudrate = int(re.findall(f'({NUM_PAT})\t', line)[0])
77
+ elif 'Path' in line:
78
+ self.path_length = float(re.findall(f'({NUM_PAT})\t', line)[0])
79
+ elif 'wavelengths' in line:
80
+ self.num_wavelength = int(re.findall(f'({NUM_PAT})\t', line)[0])
81
+ elif 'number of temperature bins' in line:
82
+ self.num_tbin = int(re.findall(f'({NUM_PAT})\t', line)[0])
83
+ elif 'maxANoise' in line:
84
+ (self.max_a_noise, self.max_c_noise, self.max_a_nonconform, self.max_c_nonconform,
85
+ self.max_a_difference, self.max_c_difference, self.min_a_counts,
86
+ self.min_c_counts, self.min_r_counts, self.max_temp_sd,
87
+ self.max_depth_sd) = [float(v) for v in re.findall(f'({NUM_PAT})\t', line)]
88
+
89
+
90
+ def __parse_tbins(self) -> None:
91
+ """
92
+ Parse the .dev file for temperature bin information.
93
+
94
+ :return: None
95
+ """
96
+ tbin_line = [line for line in self._content if '; temperature bins' in line][0]
97
+ tbins = tbin_line.split('\t')
98
+ tbins = [v for v in tbins if v] # Toss empty strings.
99
+ tbins = [v for v in tbins if v != '\n'] # Toss newline characters.
100
+ self.tbin = np.array([float(v) for v in tbins if 'temperature bins' not in v]) # Convert to float and toss comment.
101
+
102
+
103
+ def __parse_offsets(self) -> None:
104
+ """
105
+ Parse the .dev file for a and c offsets. Data are saved as class attributes for access at a later time.
106
+
107
+ :return: None
108
+ """
109
+
110
+ offset_lines = [line for line in self._content if 'C and A offset' in line]
111
+
112
+ # Create holder arrays to loop over and append data to.
113
+ c_wvls = []
114
+ a_wvls = []
115
+ c_offs = []
116
+ a_offs = []
117
+ c_deltas = []
118
+ a_deltas = []
119
+ wavelength_color_schemes = []
120
+
121
+ for line in offset_lines:
122
+ offsets, c_delta, a_delta = line.split('\t\t')[:-1]
123
+ c_wvl, a_wvl, wvl_color, c_off, a_off = offsets.split('\t')
124
+
125
+ # Convert strings to proper pythonic datatypes.
126
+ c_wvl = float(c_wvl.replace('C', ''))
127
+ a_wvl = float(a_wvl.replace('A', ''))
128
+ c_off = float(c_off)
129
+ a_off = float(a_off)
130
+ c_delta = [float(v) for v in c_delta.split('\t')]
131
+ a_delta = [float(v) for v in a_delta.split('\t')]
132
+
133
+             # Append values to holder arrays.
134
+ c_wvls.append(c_wvl)
135
+ a_wvls.append(a_wvl)
136
+ c_offs.append(c_off)
137
+ a_offs.append(a_off)
138
+ c_deltas.append(c_delta)
139
+ a_deltas.append(a_delta)
140
+ wavelength_color_schemes.append(wvl_color)
141
+
142
+ # Convert holder arrays to numpy arrays.
143
+ self.c_wavelength = np.array(c_wvls)
144
+ self.a_wavelength = np.array(a_wvls)
145
+ self.c_offset = np.array(c_offs)
146
+ self.a_offset = np.array(a_offs)
147
+ self.c_delta_t = np.array(c_deltas)
148
+ self.a_delta_t = np.array(a_deltas)
149
+ self.wavelength_color_schemes = wavelength_color_schemes
150
+
151
+
152
+ def __build_interp_funcs(self) -> None:
153
+ """
154
+ Build interpolation functions for the a and c delta_t values and store as class attributes.
155
+
156
+ :return: None
157
+ """
158
+ self.func_a_delta_t = interpolate.interp1d(self.tbin, self.a_delta_t, axis=1)
159
+ self.func_c_delta_t = interpolate.interp1d(self.tbin, self.c_delta_t, axis=1)
160
+ self.delta_t_interp_method = 'scipy.interpolate.interp1d'
161
+
162
+
163
+ def __check_parse(self) -> None:
164
+ """
165
+ Verify that the shape of the data is as expected.
166
+
167
+ :return: None
168
+ """
169
+
170
+ if len(self.a_wavelength) != self.num_wavelength:
171
+             raise ValueError('Mismatch between number of wavelengths extracted for A and expected from file. '
172
+                              'Please verify the .dev file integrity.')
173
+         if len(self.c_wavelength) != self.num_wavelength:
174
+             raise ValueError('Mismatch between number of wavelengths extracted for C and expected from file. '
175
+                              'Please verify the .dev file integrity.')
176
+         if len(self.c_wavelength) != len(self.a_wavelength):
177
+             raise ValueError('Mismatch between number of wavelengths extracted for A and C. '
178
+                              'Please verify the .dev file integrity.')
179
+         if np.array(self.a_delta_t).shape != (len(self.a_wavelength), self.num_tbin):
180
+             raise ValueError('Mismatch between length of A wavelengths and number of temperature bins. '
181
+                              'Please verify the .dev file integrity.')
182
+         if np.array(self.c_delta_t).shape != (len(self.c_wavelength), self.num_tbin):
183
+             raise ValueError('Mismatch between length of C wavelengths and number of temperature bins. '
184
+                              'Please verify the .dev file integrity.')
185
+
186
+
187
+ def to_xarray(self) -> xr.Dataset:
188
+ """
189
+         Convert the parsed .dev file data to an xarray dataset.
190
+
191
+         Returns: An appropriately dimensioned xarray dataset containing the device file data.
192
+ """
193
+ ds = xr.Dataset()
194
+ ds = ds.assign_coords({'a_wavelength': self.a_wavelength,
195
+ 'c_wavelength': self.c_wavelength,
196
+ 'temperature_bin': self.tbin})
197
+
198
+ ds['a_offset'] = (['a_wavelength'], self.a_offset)
199
+ ds['a_delta_t'] = (['a_wavelength', 'temperature_bin'], self.a_delta_t)
200
+
201
+ ds['c_offset'] = (['c_wavelength'], np.array(self.c_offset))
202
+ ds['c_delta_t'] = (['c_wavelength', 'temperature_bin'], self.c_delta_t)
203
+
204
+ ds.attrs['device_filepath'] = self._filepath
205
+ ds.attrs['sensor_type'] = self.sensor_type
206
+ ds.attrs['serial_number_hexdec'] = self.sn_hexdec
207
+ ds.attrs['serial_number'] = self.sn
208
+ ds.attrs['device_file_structure_version'] = self.structure_version
209
+ ds.attrs['tcal'] = self.tcal
210
+ ds.attrs['ical'] = self.ical
211
+ ds.attrs['calibration_date'] = self.cal_date
212
+ ds.attrs['depth_cal_1'] = self.depth_cal_1
213
+ ds.attrs['depth_cal_2'] = self.depth_cal_2
214
+ ds.attrs['baudrate'] = self.baudrate
215
+ ds.attrs['path_length'] = self.path_length
216
+ ds.attrs['number_of_wavelength_bins'] = self.num_wavelength
217
+ ds.attrs['number_of_temperature_bins'] = self.num_tbin
218
+ ds.attrs['max_a_noise'] = self.max_a_noise
219
+ ds.attrs['max_c_noise'] = self.max_c_noise
220
+ ds.attrs['max_a_nonconform'] = self.max_a_nonconform
221
+ ds.attrs['max_c_nonconform'] = self.max_c_nonconform
222
+ ds.attrs['max_a_difference'] = self.max_a_difference
223
+ ds.attrs['max_c_difference'] = self.max_c_difference
224
+ ds.attrs['min_a_counts'] = self.min_a_counts
225
+ ds.attrs['min_c_counts'] = self.min_c_counts
226
+ ds.attrs['min_r_counts'] = self.min_r_counts
227
+ ds.attrs['max_temp_sd'] = self.max_temp_sd
228
+ ds.attrs['max_depth_sd'] = self.max_depth_sd
229
+ return ds
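Two small conversions in the metadata parser above are easy to check in isolation: the hexadecimal serial number is turned into the decimal form shown on the product sticker, and the calibration date is parsed with either a four-digit or a two-digit year. The sketch below mirrors that logic with the standard library only. The function names and the sample serial string `0x5300010E` are hypothetical and are not taken from a real device file.

```python
from datetime import datetime


def serial_number_from_hex(sn_hexdec: str) -> str:
    """Convert the last six hex digits to a zero-padded decimal serial, as the parser does."""
    return "ACS-" + str(int(sn_hexdec[-6:], 16)).zfill(5)


def parse_cal_date(text: str) -> str:
    """Try a four-digit year first, then fall back to a two-digit year."""
    for fmt in ("%m/%d/%Y", "%m/%d/%y"):
        try:
            return datetime.strptime(text, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue  # wrong year width for this format; try the next one
    raise ValueError(f"Unrecognized calibration date: {text!r}")


sticker = serial_number_from_hex("0x5300010E")  # hypothetical hex serial
long_year = parse_cal_date("3/5/2021")
short_year = parse_cal_date("3/5/21")
```

Catching `ValueError` specifically, rather than using a bare `except`, keeps unrelated failures (for example a keyboard interrupt) from being silently treated as a date-format mismatch.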
@@ -0,0 +1,142 @@
1
+ import numpy as np
2
+ from scipy.interpolate import CubicSpline
3
+ from typing import Union
4
+ import xarray as xr
5
+
6
+
7
+ def find_discontinuity_index(a_wavelengths: Union[list, tuple, np.ndarray, xr.DataArray],
8
+                              c_wavelengths: Union[list, tuple, np.ndarray, xr.DataArray],
9
+                              min_band: int = 535, max_band: int = 600) -> int:
10
+ """
11
+
12
+ This code is modified from the OPTAA processing utilities in the ooi-data-explorations repo.
13
+ https://github.com/IanTBlack/ooi-data-explorations/blob/master/python/ooi_data_explorations/uncabled/utilities/utilities_optaa.py#L104
14
+
15
+ Find the last wavelength index of the first filter based on wavelength differences.
16
+ This function assumes that the discontinuity occurs between min_band and max_band (535 nm and 600 nm by default), a range that is buffered from the values in the ACS Protocol Document, Rev Q.
17
+
18
+ :param a_wavelengths: Absorption wavelengths
19
+ :param c_wavelengths: Attenuation wavelengths
20
+ :return: The last wavelength index at the discontinuity.
21
+ """
22
+
23
+ # Copy into float arrays so the caller's data is never modified and NaN can be assigned below.
24
+ a_wavelengths = np.array(a_wavelengths, dtype=float)
25
+ c_wavelengths = np.array(c_wavelengths, dtype=float)
26
+
27
+ # Set values outside the range to NaN
28
+ a_wavelengths[(a_wavelengths < min_band) | (a_wavelengths > max_band)] = np.nan
29
+ c_wavelengths[(c_wavelengths < min_band) | (c_wavelengths > max_band)] = np.nan
30
+
31
+ # Find the index of the discontinuity
32
+ didx = int(np.nanargmin(np.diff(a_wavelengths) + np.diff(c_wavelengths)))
33
+ return didx
34
+
35
+
36
+
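Reviewer note: the index search above can be checked on synthetic data. The following is a minimal sketch, not the package's own test. It restates the function in simplified form, and the wavelength bins (`first_filter`, `second_filter`, the 4 nm spacing and the 0.3 nm shift between the a and c channels) are hypothetical values chosen only for illustration.

```python
import numpy as np


def find_discontinuity_index(a_wavelengths, c_wavelengths, min_band=535, max_band=600):
    """Simplified restatement of the function shown in the diff."""
    # Float copies, so NaN can be assigned and the caller's data is untouched.
    a = np.array(a_wavelengths, dtype=float)
    c = np.array(c_wavelengths, dtype=float)
    # Mask everything outside the search band.
    a[(a < min_band) | (a > max_band)] = np.nan
    c[(c < min_band) | (c > max_band)] = np.nan
    # The filter break shows up as the smallest combined wavelength step.
    return int(np.nanargmin(np.diff(a) + np.diff(c)))


# Hypothetical bins: 4 nm steps, with a 0.5 nm step where the second filter starts.
first_filter = np.arange(400.0, 572.0, 4.0)    # 400.0 ... 568.0
second_filter = np.arange(568.5, 740.0, 4.0)   # 568.5 ...
a_wl = np.concatenate([first_filter, second_filter])
c_wl = a_wl + 0.3  # assumed small shift between the a and c channels

didx = find_discontinuity_index(a_wl, c_wl)
print(didx, a_wl[didx], a_wl[didx + 1])
```

The returned index points at the last bin of the first filter, so the bins at `didx` and `didx + 1` straddle the break.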
37
+ def _compute_discontinuity_offset(values: Union[list, tuple, np.ndarray],
38
+ wavelength: Union[list, tuple, np.ndarray],
39
+ didx: int) -> float:
40
+ """
41
+ This code is modified from the OPTAA processing utilities in the ooi-data-explorations repo.
42
+ https://github.com/IanTBlack/ooi-data-explorations/blob/master/python/ooi_data_explorations/uncabled/utilities/utilities_optaa.py#L212
43
+
44
+ Compute the scalar discontinuity offset to be applied to the second half of an ACS spectrum.
45
+ NOTE: If any of the three values used to build the spline is infinite, the function returns -999. This prevents math errors associated with creating a cubic spline on infinite values.
46
+ Spectra with infinite values should be removed at some point in the processing pipeline.
47
+
48
+ :param values: The incoming absorption or attenuation values. It is highly recommended that these values be
49
+ representative of ACS data that have been converted to 'geophysical' units (1/m) and corrected for the effects
50
+ of internal temperature on output. That is to say, the recommended input is the measured (a_m and c_m) in the
51
+ ACS protocol documents and manual. acspype inputs would be a_m_discontinuity and c_m_discontinuity.
52
+ :param wavelength: The wavelength bins of the values.
53
+ :param didx: The index of discontinuity.
54
+ :return: The discontinuity offset for the second half of the spectrum.
55
+ """
56
+ _wavelength = np.copy(wavelength)
57
+ _values = np.copy(values)
58
+ _didx = int(np.copy(didx))
59
+
60
+ x = _wavelength[_didx - 2:_didx + 1]
61
+ y = _values[_didx - 2:_didx + 1]
62
+
63
+ if np.isinf(y).any():
64
+ return -999
65
+ else:
66
+ cubic_spline = CubicSpline(x, y)
67
+ interp = cubic_spline(_wavelength[_didx + 1], extrapolate=True)
68
+ offset = interp - _values[_didx + 1]
69
+ return offset
70
+
71
+
72
+ def _apply_discontinuity_offset(values: Union[list, tuple, np.ndarray],
73
+ offset: float,
74
+ didx: int) -> np.ndarray:
75
+
76
+ """
77
+ This code is modified from the OPTAA processing utilities in the ooi-data-explorations repo.
78
+ https://github.com/IanTBlack/ooi-data-explorations/blob/master/python/ooi_data_explorations/uncabled/utilities/utilities_optaa.py#L212
79
+
80
+ Apply a pre-determined discontinuity offset to values after the discontinuity index.
81
+
82
+ :param values: The measured values to apply the discontinuity offset to.
83
+ :param offset: The scalar offset to apply to the values after the discontinuity index.
84
+ :param didx: The discontinuity index computed from find_discontinuity_index.
85
+ :return: A discontinuity-corrected spectrum.
86
+ """
87
+
88
+ _values = np.copy(values)
89
+ _offset = np.copy(offset)
90
+ _didx = int(np.copy(didx))
91
+ _values[_didx + 1:] = _values[_didx + 1:] + _offset
92
+ return _values
93
+
94
+
95
+ def compute_discontinuity_offset(measured, wavelength_dim, discontinuity_index):
96
+ """
97
+ This is a wrapper function for _compute_discontinuity_offset that is vectorized for Xarray.
98
+
99
+ :param measured: The measured values
100
+ :param wavelength_dim: The dimension to calculate the offset on.
101
+ :param discontinuity_index: The index of discontinuity.
102
+ :return: The scalar offset for a given spectrum.
103
+ """
104
+
105
+ offset = xr.apply_ufunc(_compute_discontinuity_offset, measured,
106
+ kwargs = {'wavelength': measured[wavelength_dim].values, 'didx': discontinuity_index},
107
+ input_core_dims = [[wavelength_dim]],
108
+ output_core_dims = [[]],
109
+ vectorize = True)
110
+ return offset
111
+
112
+
113
+ def apply_discontinuity_offset(measured, offset, wavelength_dim, discontinuity_index):
114
+ """
115
+ This is a wrapper function for _apply_discontinuity_offset that is vectorized for Xarray.
116
+ :param measured: The measured values.
117
+ :param offset: The pre-determined discontinuity offset.
118
+ :param wavelength_dim: The wavelength dimension to apply the offset to.
119
+ :param discontinuity_index: The index of discontinuity.
120
+ :return: Spectra with the discontinuity offset applied.
121
+ """
122
+
123
+ dc = xr.apply_ufunc(_apply_discontinuity_offset, measured, offset,
124
+ kwargs = {'didx': discontinuity_index},
125
+ input_core_dims=[[wavelength_dim],[]],
126
+ output_core_dims = [[wavelength_dim]],
127
+ vectorize = True)
128
+ return dc
129
+
130
+
131
+ def discontinuity_correction(measured, wavelength_dim, discontinuity_index):
132
+ """
133
+ This is a convenience function for computing the discontinuity offset and applying it to the measured values in Xarray.
134
+ :param measured: The measured values.
135
+ :param wavelength_dim: The wavelength dimension to apply the correction to.
136
+ :param discontinuity_index: The index of discontinuity.
137
+ :return: A tuple of (discontinuity-corrected spectra, discontinuity offset).
138
+ """
139
+
140
+ offset = compute_discontinuity_offset(measured, wavelength_dim, discontinuity_index)
141
+ dc = apply_discontinuity_offset(measured, offset, wavelength_dim, discontinuity_index)
142
+ return dc, offset
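Reviewer note: the compute-then-apply steps in this file can be traced on a single spectrum with NumPy alone. This is a sketch under stated assumptions, not the package's implementation. The wavelengths, the linear `smooth` spectrum and the 0.05 step are hypothetical. `np.polyfit` is swapped in for `scipy.interpolate.CubicSpline`: as far as the SciPy documentation describes it, a spline built on three points with the default boundary conditions is the parabola through those points, so a quadratic fit should give the same extrapolated value.

```python
import numpy as np

# Hypothetical spectrum: seven bins with the filter break after index 3.
wavelength = np.array([556.0, 560.0, 564.0, 568.0, 568.5, 572.5, 576.5])
didx = 3  # last index of the first filter

# Assumed smooth underlying signal, plus an artificial step on the second filter.
smooth = 0.5 - 0.001 * (wavelength - 400.0)
measured = smooth.copy()
measured[didx + 1:] += 0.05

# Compute step: fit the last three bins of the first filter and extrapolate
# one bin forward. x is centered on its last value for numerical stability.
x = wavelength[didx - 2:didx + 1]
y = measured[didx - 2:didx + 1]
x0 = x[-1]
coeffs = np.polyfit(x - x0, y, 2)  # quadratic stand-in for the 3-point spline
interp = np.polyval(coeffs, wavelength[didx + 1] - x0)
offset = interp - measured[didx + 1]

# Apply step: shift every bin after the discontinuity by the scalar offset.
corrected = measured.copy()
corrected[didx + 1:] += offset

print(offset)
print(corrected)
```

Because the assumed signal is linear, the extrapolation recovers it and the offset cancels the artificial step. Real spectra are curved and noisy, so the correction there is only approximate.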