fishlib 0.6.2__tar.gz → 0.6.4__tar.gz

fishlib-0.6.4/PKG-INFO ADDED
@@ -0,0 +1,208 @@
+ Metadata-Version: 2.1
+ Name: fishlib
+ Version: 0.6.4
+ Summary: A Python library for parsing, standardizing, and comparing seafood product descriptions in foodservice
+ Home-page: https://github.com/KTG0409/fishlib
+ Author: Karen Morton
+ Author-email: kmorton319@gmail.com
+ Keywords: seafood,fish,foodservice,parsing,standardization,pricing,comparison
+ Classifier: Development Status :: 4 - Beta
+ Classifier: Intended Audience :: Developers
+ Classifier: Intended Audience :: Science/Research
+ Classifier: Topic :: Scientific/Engineering :: Information Analysis
+ Classifier: Topic :: Office/Business
+ Classifier: License :: OSI Approved :: MIT License
+ Classifier: Programming Language :: Python :: 3
+ Classifier: Programming Language :: Python :: 3.8
+ Classifier: Programming Language :: Python :: 3.9
+ Classifier: Programming Language :: Python :: 3.10
+ Classifier: Programming Language :: Python :: 3.11
+ Classifier: Programming Language :: Python :: 3.12
+ Classifier: Operating System :: OS Independent
+ Requires-Python: >=3.8
+ Description-Content-Type: text/markdown
+ Provides-Extra: dev
+ License-File: LICENSE
+
+ # fishlib 🐟
+
+ A Python library for parsing, standardizing, and comparing seafood product descriptions in foodservice.
+
+ **The Problem:** Seafood product descriptions are messy. The same product can be described a hundred different ways. Comparing prices across distributors, suppliers, or market data requires deep domain knowledge to know if two items are actually comparable.
+
+ **The Solution:** `fishlib` parses item descriptions into structured attributes, standardizes them to common codes, and enables apples-to-apples comparisons — so you don't need to be a fish expert to work with seafood data.
+
+ ## What's New in 0.6.4
+
+ - **Fixed:** Bare `WALLEYE` no longer false-matches Alaska Pollock (new dedicated `walleye` and `pike` categories)
+ - **Fixed:** Bare-letter trim codes (`A`, `B`, `C`, `D`, `E`) now recognized with word-boundary safety
+ - **Fixed:** Squid `TUBES` no longer misclassified as rings — new `TUBE` form code added
+ - **Added 16 new categories:** trout, sea_bass, branzino, snapper, grouper, monkfish, mackerel, walleye, pike, perch, barramundi, hamachi, rockfish, roe, crawfish, conch
+ - **Expanded oyster:** added Kumamoto, European Flat (Belon), and Olympia species
+ - **Cleaner schema:** removed duplicate codes (`SKOFF` folded into `SKLS`, `MRNTD` kept only in `value_added`, meat-grade `CLAW` renamed to `CLAW_MEAT` to disambiguate from form)
+ - **Zero dependencies:** `pandas` removed from install requirements (never actually imported in library code)
+
+ **Coverage:** 38 categories, 109 species/subspecies, 411 aliases.
+
+ ## Installation
+
+ ```bash
+ pip install fishlib
+ ```
+
+ ## Quick Start
+
+ ```python
+ import fishlib
+
+ # Parse any item description
+ item = fishlib.parse("SALMON FIL ATL SKON D 6OZ IVP")
+
+ print(item['category']) # 'salmon'
+ print(item['subspecies']) # 'atlantic'
+ print(item['form']) # 'FIL'
+ print(item['skin']) # 'SKON'
+ print(item['trim']) # 'D'
+ print(item['size']) # '6OZ'
+ print(item['pack']) # 'IVP'
+
+ # Get a comparison key for matching
+ key = fishlib.comparison_key(item)
+ # "SALMON|ATLANTIC|FIL|SKON|D|6OZ"
+
+ # Check if two items are comparable
+ result = fishlib.match(
+     "SALMON PRTN ATL BNLS SKLS 6 OZ CENTER CUT",
+     "Portico Salmon Fillet 6 oz Boneless / Skinless",
+ )
+ print(result['is_comparable']) # True
+ print(result['confidence']) # e.g., 0.85
+ print(result['different_attributes']) # ['form']
+ print(result['recommendation']) # human-readable summary
+ ```
+
+ ## Features
+
+ ### Parse Item Descriptions
+
+ Turn messy text into structured data:
+
+ ```python
+ fishlib.parse("POLLOCK FIL WILD ALASKA PROCESSED CHINA 6OZ IVP")
+ # {
+ #     'category': 'pollock', 'subspecies': 'alaska',
+ #     'form': 'FIL', 'size': '6OZ', 'pack': 'IVP',
+ #     'origin_harvest': 'USA', 'origin_processed': 'CHN',
+ #     'freeze_cycle': 'TWICE', # finfish processed in Asia ⇒ twice frozen
+ #     ...
+ # }
+ ```
+
+ ### Standardized Codes
+
+ | Attribute | Codes |
+ |-----------|-------|
+ | **Form** | FIL (Fillet), PRTN (Portion), LOIN, WHL (Whole), STEAK, TAIL, TUBE, RING, CLUSTER, LEG, CLAW, MEAT, H&G, PD, PUD, HLSO, EZPL, ... |
+ | **Skin** | SKON (Skin On), SKLS (Skinless) |
+ | **Bone** | BNLS (Boneless), BIN (Bone In), PBO (Pin Bone Out) |
+ | **Trim** | A, B, C, D, E, FTRIM — bare letter or TRIM-D / DTRM variants |
+ | **Pack** | IVP, IQF, CVP, BULK, SHL, TRAY |
+ | **Storage** | FRZ (Frozen), FRSH (Fresh), RFRSH (Refreshed) |
+ | **Harvest** | WILD, FARM |
+ | **Preparation** | RAW, CKD, SMKD, CURED |
+ | **Value-added** | BRDD, STFD, MRNTD, SSNDD, POF |
+
+ ### Origin & Freeze Cycle
+
+ ```python
+ # origin split into harvest (where caught/farmed) and processed (where cut/portioned)
+ item['origin_harvest'] # 'USA' — caught in Alaska
+ item['origin_processed'] # 'CHN' — portioned in China
+ item['freeze_cycle'] # 'TWICE' — inferred from Asian processing of finfish
+
+ # origin is also populated for legacy single-field use
+ item['origin'] # 'USA'
+ ```
+
+ ### Species Coverage
+
+ 38 categories across parent groups:
+
+ - **Finfish:** salmon, tuna, mahi, swordfish, sea_bass, branzino, snapper, grouper, monkfish, mackerel, barramundi, hamachi, rockfish
+ - **Groundfish:** cod (incl. black cod/sablefish), haddock, pollock
+ - **Flatfish:** halibut, flounder, sole
+ - **Freshwater:** tilapia, swai (pangasius/basa/tra), catfish, trout, walleye, pike, perch
+ - **Crustacean:** crab, lobster, shrimp, crawfish
+ - **Mollusk:** scallop, clam, oyster, mussel, calamari, octopus, conch
+ - **Roe:** ikura, tobiko, masago, caviar, uni
+
+ ### Trim Guide (Salmon)
+
+ | Trim | Description | Skin |
+ |------|-------------|------|
+ | **A** | Backbone off, bellybone off | ON |
+ | **B** | A + backfin off, collarbone off, belly fat/fins off | ON |
+ | **C** | B + pin bone out | ON |
+ | **D** | C + back trimmed, tailpiece off, belly membrane off, nape trimmed | ON |
+ | **E** | D + skin removed | OFF |
+
+ **Foodservice standard:** Trim D (skin on) and Trim E (skin off).
+
+ ### Cut Styles (Portions)
+
+ | Style | Description | Value |
+ |-------|-------------|-------|
+ | **Center Cut** | From center of fish only, no tails/nape | Premium |
+ | **Bias** | Cut at angle for better presentation | Premium |
+ | **Block** | Straight cuts end-to-end, includes tails | Mid |
+ | **Random** | Mixed pieces, various shapes | Value |
+
+ ### Match & Compare
+
+ ```python
+ fishlib.is_comparable(item1, item2) # True / False
+ fishlib.match(item1, item2) # full match dict
+ fishlib.find_matches(target, candidates, threshold=0.8)
+ fishlib.explain_difference(item1, item2) # human-readable explanation
+ ```
+
+ ### Reference Data
+
+ ```python
+ from fishlib import reference
+
+ reference.trim_levels('salmon') # Trim A–E definitions
+ reference.is_trim_skin_on('D') # True (trim D is skin-on)
+ reference.cut_style('CENTER') # {'description': ..., 'premium': True}
+ reference.price_tier('salmon', 'king') # {'tier': 'ultra-premium'}
+ ```
+
+ ## Why This Exists
+
+ In foodservice distribution, comparing prices requires knowing if products are truly comparable. A "6oz salmon fillet" from two different sources might be:
+
+ - Center-cut bias portion at $12/lb (premium)
+ - Block-cut with tail pieces at $8/lb (commodity)
+
+ Without the right attributes, price comparisons are meaningless. `fishlib` encodes the domain knowledge needed to make accurate comparisons — so you don't need 20 years of fish experience to work with seafood data.
+
+ ## Contributing
+
+ Contributions welcome. Areas of interest:
+
+ - Additional species and regional variants
+ - International market terminology
+ - Packaging and processing codes
+ - Price reference data
+
+ ## Author
+
+ **Karen Morton** — seafood industry professional with 20+ years of experience in category management and procurement. Built from years of experience managing seafood categories and the realization that this knowledge should be accessible to everyone, not trapped in experts' heads.
+
+ ## Acknowledgments
+
+ Developed with assistance from [Claude](https://claude.ai) (Anthropic) for code scaffolding, refactoring, test harness construction, and documentation. All seafood domain knowledge — species classification, trim logic, cut styles, freeze-cycle inference rules, alias curation, and design decisions — comes from the author's 20+ years in foodservice category management.
+
+ ## License
+
+ MIT License — use it, modify it, share it. Just make seafood data better for everyone.
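
Reviewer note: the header block at the top of `PKG-INFO` is in RFC 822 style, so it can be read with Python's standard `email` parser, which is a quick way to check claims such as "zero dependencies" against the metadata itself. The following is a minimal sketch, assuming an abbreviated copy of the headers above with the diff `+` markers removed; `SAMPLE_PKG_INFO` and `read_core_metadata` are hypothetical names introduced for illustration.

```python
from email import message_from_string

# Abbreviated sample of the PKG-INFO headers shown above (assumption:
# the "+ " diff markers have already been stripped).
SAMPLE_PKG_INFO = """\
Metadata-Version: 2.1
Name: fishlib
Version: 0.6.4
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.12
Requires-Python: >=3.8
Provides-Extra: dev

# fishlib
"""


def read_core_metadata(text):
    """Return (single-valued headers, classifiers, requirements, body)."""
    msg = message_from_string(text)
    headers = {key: msg[key] for key in ("Name", "Version", "Requires-Python")}
    # Multi-use fields need get_all(); it returns the fallback when absent.
    classifiers = msg.get_all("Classifier", [])
    requires = msg.get_all("Requires-Dist", [])
    return headers, classifiers, requires, msg.get_payload()


headers, classifiers, requires, body = read_core_metadata(SAMPLE_PKG_INFO)
print(headers["Name"], headers["Version"])
print("classifiers:", len(classifiers))
print("runtime requirements:", requires)
```

In this sample no `Requires-Dist` header is present, so the requirements list comes back empty, which matches the metadata block in the diff.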
@@ -0,0 +1,182 @@
+ # fishlib 🐟
+
+ A Python library for parsing, standardizing, and comparing seafood product descriptions in foodservice.
+
+ **The Problem:** Seafood product descriptions are messy. The same product can be described a hundred different ways. Comparing prices across distributors, suppliers, or market data requires deep domain knowledge to know if two items are actually comparable.
+
+ **The Solution:** `fishlib` parses item descriptions into structured attributes, standardizes them to common codes, and enables apples-to-apples comparisons — so you don't need to be a fish expert to work with seafood data.
+
+ ## What's New in 0.6.4
+
+ - **Fixed:** Bare `WALLEYE` no longer false-matches Alaska Pollock (new dedicated `walleye` and `pike` categories)
+ - **Fixed:** Bare-letter trim codes (`A`, `B`, `C`, `D`, `E`) now recognized with word-boundary safety
+ - **Fixed:** Squid `TUBES` no longer misclassified as rings — new `TUBE` form code added
+ - **Added 16 new categories:** trout, sea_bass, branzino, snapper, grouper, monkfish, mackerel, walleye, pike, perch, barramundi, hamachi, rockfish, roe, crawfish, conch
+ - **Expanded oyster:** added Kumamoto, European Flat (Belon), and Olympia species
+ - **Cleaner schema:** removed duplicate codes (`SKOFF` folded into `SKLS`, `MRNTD` kept only in `value_added`, meat-grade `CLAW` renamed to `CLAW_MEAT` to disambiguate from form)
+ - **Zero dependencies:** `pandas` removed from install requirements (never actually imported in library code)
+
+ **Coverage:** 38 categories, 109 species/subspecies, 411 aliases.
+
+ ## Installation
+
+ ```bash
+ pip install fishlib
+ ```
+
+ ## Quick Start
+
+ ```python
+ import fishlib
+
+ # Parse any item description
+ item = fishlib.parse("SALMON FIL ATL SKON D 6OZ IVP")
+
+ print(item['category']) # 'salmon'
+ print(item['subspecies']) # 'atlantic'
+ print(item['form']) # 'FIL'
+ print(item['skin']) # 'SKON'
+ print(item['trim']) # 'D'
+ print(item['size']) # '6OZ'
+ print(item['pack']) # 'IVP'
+
+ # Get a comparison key for matching
+ key = fishlib.comparison_key(item)
+ # "SALMON|ATLANTIC|FIL|SKON|D|6OZ"
+
+ # Check if two items are comparable
+ result = fishlib.match(
+     "SALMON PRTN ATL BNLS SKLS 6 OZ CENTER CUT",
+     "Portico Salmon Fillet 6 oz Boneless / Skinless",
+ )
+ print(result['is_comparable']) # True
+ print(result['confidence']) # e.g., 0.85
+ print(result['different_attributes']) # ['form']
+ print(result['recommendation']) # human-readable summary
+ ```
+
+ ## Features
+
+ ### Parse Item Descriptions
+
+ Turn messy text into structured data:
+
+ ```python
+ fishlib.parse("POLLOCK FIL WILD ALASKA PROCESSED CHINA 6OZ IVP")
+ # {
+ #     'category': 'pollock', 'subspecies': 'alaska',
+ #     'form': 'FIL', 'size': '6OZ', 'pack': 'IVP',
+ #     'origin_harvest': 'USA', 'origin_processed': 'CHN',
+ #     'freeze_cycle': 'TWICE', # finfish processed in Asia ⇒ twice frozen
+ #     ...
+ # }
+ ```
+
+ ### Standardized Codes
+
+ | Attribute | Codes |
+ |-----------|-------|
+ | **Form** | FIL (Fillet), PRTN (Portion), LOIN, WHL (Whole), STEAK, TAIL, TUBE, RING, CLUSTER, LEG, CLAW, MEAT, H&G, PD, PUD, HLSO, EZPL, ... |
+ | **Skin** | SKON (Skin On), SKLS (Skinless) |
+ | **Bone** | BNLS (Boneless), BIN (Bone In), PBO (Pin Bone Out) |
+ | **Trim** | A, B, C, D, E, FTRIM — bare letter or TRIM-D / DTRM variants |
+ | **Pack** | IVP, IQF, CVP, BULK, SHL, TRAY |
+ | **Storage** | FRZ (Frozen), FRSH (Fresh), RFRSH (Refreshed) |
+ | **Harvest** | WILD, FARM |
+ | **Preparation** | RAW, CKD, SMKD, CURED |
+ | **Value-added** | BRDD, STFD, MRNTD, SSNDD, POF |
+
+ ### Origin & Freeze Cycle
+
+ ```python
+ # origin split into harvest (where caught/farmed) and processed (where cut/portioned)
+ item['origin_harvest'] # 'USA' — caught in Alaska
+ item['origin_processed'] # 'CHN' — portioned in China
+ item['freeze_cycle'] # 'TWICE' — inferred from Asian processing of finfish
+
+ # origin is also populated for legacy single-field use
+ item['origin'] # 'USA'
+ ```
+
+ ### Species Coverage
+
+ 38 categories across parent groups:
+
+ - **Finfish:** salmon, tuna, mahi, swordfish, sea_bass, branzino, snapper, grouper, monkfish, mackerel, barramundi, hamachi, rockfish
+ - **Groundfish:** cod (incl. black cod/sablefish), haddock, pollock
+ - **Flatfish:** halibut, flounder, sole
+ - **Freshwater:** tilapia, swai (pangasius/basa/tra), catfish, trout, walleye, pike, perch
+ - **Crustacean:** crab, lobster, shrimp, crawfish
+ - **Mollusk:** scallop, clam, oyster, mussel, calamari, octopus, conch
+ - **Roe:** ikura, tobiko, masago, caviar, uni
+
+ ### Trim Guide (Salmon)
+
+ | Trim | Description | Skin |
+ |------|-------------|------|
+ | **A** | Backbone off, bellybone off | ON |
+ | **B** | A + backfin off, collarbone off, belly fat/fins off | ON |
+ | **C** | B + pin bone out | ON |
+ | **D** | C + back trimmed, tailpiece off, belly membrane off, nape trimmed | ON |
+ | **E** | D + skin removed | OFF |
+
+ **Foodservice standard:** Trim D (skin on) and Trim E (skin off).
+
+ ### Cut Styles (Portions)
+
+ | Style | Description | Value |
+ |-------|-------------|-------|
+ | **Center Cut** | From center of fish only, no tails/nape | Premium |
+ | **Bias** | Cut at angle for better presentation | Premium |
+ | **Block** | Straight cuts end-to-end, includes tails | Mid |
+ | **Random** | Mixed pieces, various shapes | Value |
+
+ ### Match & Compare
+
+ ```python
+ fishlib.is_comparable(item1, item2) # True / False
+ fishlib.match(item1, item2) # full match dict
+ fishlib.find_matches(target, candidates, threshold=0.8)
+ fishlib.explain_difference(item1, item2) # human-readable explanation
+ ```
+
+ ### Reference Data
+
+ ```python
+ from fishlib import reference
+
+ reference.trim_levels('salmon') # Trim A–E definitions
+ reference.is_trim_skin_on('D') # True (trim D is skin-on)
+ reference.cut_style('CENTER') # {'description': ..., 'premium': True}
+ reference.price_tier('salmon', 'king') # {'tier': 'ultra-premium'}
+ ```
+
+ ## Why This Exists
+
+ In foodservice distribution, comparing prices requires knowing if products are truly comparable. A "6oz salmon fillet" from two different sources might be:
+
+ - Center-cut bias portion at $12/lb (premium)
+ - Block-cut with tail pieces at $8/lb (commodity)
+
+ Without the right attributes, price comparisons are meaningless. `fishlib` encodes the domain knowledge needed to make accurate comparisons — so you don't need 20 years of fish experience to work with seafood data.
+
+ ## Contributing
+
+ Contributions welcome. Areas of interest:
+
+ - Additional species and regional variants
+ - International market terminology
+ - Packaging and processing codes
+ - Price reference data
+
+ ## Author
+
+ **Karen Morton** — seafood industry professional with 20+ years of experience in category management and procurement. Built from years of experience managing seafood categories and the realization that this knowledge should be accessible to everyone, not trapped in experts' heads.
+
+ ## Acknowledgments
+
+ Developed with assistance from [Claude](https://claude.ai) (Anthropic) for code scaffolding, refactoring, test harness construction, and documentation. All seafood domain knowledge — species classification, trim logic, cut styles, freeze-cycle inference rules, alias curation, and design decisions — comes from the author's 20+ years in foodservice category management.
+
+ ## License
+
+ MIT License — use it, modify it, share it. Just make seafood data better for everyone.
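
Reviewer note: the changelog above says bare-letter trim codes are now recognized "with word-boundary safety". The parser source is not part of this excerpt, so the following is only a sketch of what such a check could look like using the standard `re` module. The pattern and the helper name `find_bare_trim` are assumptions for illustration, not fishlib's implementation.

```python
import re

# Hypothetical pattern: a single letter A-E that stands alone as a token.
# \b keeps the match from firing inside words such as "SALMON" or "COD".
BARE_TRIM = re.compile(r"\b([A-E])\b")


def find_bare_trim(description):
    """Return the first standalone trim letter in the description, or None."""
    match = BARE_TRIM.search(description.upper())
    return match.group(1) if match else None


print(find_bare_trim("SALMON FIL ATL SKON D 6OZ IVP"))  # D
print(find_bare_trim("COD FIL BNLS 8OZ"))  # None
```

A boundary check alone cannot tell a trim letter from other standalone letters (a quality grade, for example), so a real parser would presumably also need to look at category and context.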
@@ -0,0 +1,55 @@
+ """fishlib — parse, standardize, and compare seafood product descriptions."""
+
+ __version__ = "0.6.4"
+ __author__ = "Karen Morton"
+
+ from .parser import parse, parse_batch, extract_key_attributes
+ from .matcher import (
+     comparison_key,
+     match,
+     is_comparable,
+     match_score,
+     find_matches,
+     explain_difference,
+ )
+ from .standards import (
+     standardize_form,
+     standardize_skin,
+     standardize_bone,
+     standardize_trim,
+     standardize_pack,
+     standardize_storage,
+     standardize_cut_style,
+     standardize_harvest,
+     standardize_origin,
+     standardize_size,
+     get_standard_code,
+     get_code_info,
+     list_codes,
+ )
+
+ __all__ = [
+     "__version__",
+     "parse",
+     "parse_batch",
+     "extract_key_attributes",
+     "comparison_key",
+     "match",
+     "is_comparable",
+     "match_score",
+     "find_matches",
+     "explain_difference",
+     "standardize_form",
+     "standardize_skin",
+     "standardize_bone",
+     "standardize_trim",
+     "standardize_pack",
+     "standardize_storage",
+     "standardize_cut_style",
+     "standardize_harvest",
+     "standardize_origin",
+     "standardize_size",
+     "get_standard_code",
+     "get_code_info",
+     "list_codes",
+ ]
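
Reviewer note: `comparison_key` is exported here, but `matcher.py` is not included in this excerpt. The README example shows a parsed salmon item producing `"SALMON|ATLANTIC|FIL|SKON|D|6OZ"`, which suggests a pipe-delimited join of selected attributes with `pack` left out. The sketch below reproduces that one example under those assumptions; `KEY_FIELDS` and `sketch_comparison_key` are hypothetical names, and the real function may choose fields or handle missing values differently.

```python
# Field order inferred from the README example key (an assumption).
KEY_FIELDS = ("category", "subspecies", "form", "skin", "trim", "size")


def sketch_comparison_key(item):
    """Join selected attributes into a pipe-delimited, upper-case key."""
    return "|".join(str(item.get(field) or "").upper() for field in KEY_FIELDS)


parsed = {
    "category": "salmon",
    "subspecies": "atlantic",
    "form": "FIL",
    "skin": "SKON",
    "trim": "D",
    "size": "6OZ",
    "pack": "IVP",  # in the parse output, but absent from the example key
}
print(sketch_comparison_key(parsed))  # SALMON|ATLANTIC|FIL|SKON|D|6OZ
```

Keeping empty slots for missing attributes, as this sketch does, means keys from sparse descriptions still line up position by position; whether fishlib does the same is not shown in the diff.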