food_fish_parser 0.1.0

checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA256:
3
+ metadata.gz: a92347877837b339f13c2140d2955f41410ea7a6258b34a51f1daf35c4009715
4
+ data.tar.gz: 9d554028e69f5925e747054cd6c13f3742014ec37ec0621ae1b6008f72a0a8fc
5
+ SHA512:
6
+ metadata.gz: c93ac59e5393093803ad638ab8992deb0a2af35ed7df7f9e3c0d5666d9477d66660f55bce085cea38b9d3a75c7fd6c17f769af56999b15e745cdb55f4540e6ef
7
+ data.tar.gz: bf20e42335d25ab91068d2dc8bdd6e8db5a0fbc13595297c97587e0c3a8c4daf3b4668eee32742669cfda32c508e8c2d99bf1fb3dc242c705d1efd2bf63e7550
data/LICENSE ADDED
@@ -0,0 +1,22 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2018 Questionmark
4
+ Copyright (c) 2018 wvengen
5
+
6
+ Permission is hereby granted, free of charge, to any person obtaining a copy
7
+ of this software and associated documentation files (the "Software"), to deal
8
+ in the Software without restriction, including without limitation the rights
9
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
10
+ copies of the Software, and to permit persons to whom the Software is
11
+ furnished to do so, subject to the following conditions:
12
+
13
+ The above copyright notice and this permission notice shall be included in all
14
+ copies or substantial portions of the Software.
15
+
16
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
17
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
18
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
19
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
20
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
21
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
22
+ SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,93 @@
1
+ # Food fish parser
2
+
3
+ [![Gem Version](https://badge.fury.io/rb/food_fish_parser.svg)](https://rubygems.org/gems/food_fish_parser)
4
+
5
+ Food products with fish in them often list some details about the particular species,
6
+ fishing method and origin. This [Ruby](https://www.ruby-lang.org/) gem and program parses
7
+ the text found on the product and returns a structured representation.
8
+
9
+ At this moment, the parser mostly recognises Dutch-language text.
10
+
11
+ Please note that this code is in an early stage of development.
12
+
13
+ ## Installation
14
+
15
+ ```
16
+ gem install food_fish_parser
17
+ ```
18
+
19
+ ## Example
20
+
21
+ ```ruby
22
+ require 'food_fish_parser'
23
+
24
+ s = <<EOT.gsub(/\n/, '').strip
25
+ zalm (salmo salar), gekweekt in noorwegen, kweekmethode: kooien.pangasius
26
+ (pangasius spp), gekweekt in vietnam, kweekmethode: vijver. coquilles
27
+ (placopecten magellanicus), vangstgebied noordwest atlantische oceaan fao 21,
28
+ kabeljauw (gadus macrocephalus), vangstgebied stille oceaan fao 67, garnaal
29
+ (litopenaeus vannamei), gekweekt in ecuador, kweekmethode: vijver.
30
+ EOT
31
+ parser = FoodFishParser::Parser.new
32
+ puts parser.parse(s).to_a.inspect
33
+ ```
34
+ This results in a list of detected fish:
35
+ ```ruby
36
+ [
37
+ {
38
+ :names => [{ :common=>"zalm", :latin=>"salmo salar" }],
39
+ :catch_areas => [],
40
+ :catch_methods => [],
41
+ :aquaculture_areas => [{ :text=>"noorwegen", :fao_codes=>[] }],
42
+ :aquaculture_methods => [{ :text=>"kooien" }]
43
+ },
44
+ {
45
+ :names => [{ :common=>"pangasius", :latin=>"pangasius spp" }],
46
+ :catch_areas => [],
47
+ :catch_methods => [],
48
+ :aquaculture_areas => [{ :text=>"vietnam", :fao_codes=>[] }],
49
+ :aquaculture_methods => [{ :text=>"vijver" }]
50
+ },
51
+ {
52
+ :names => [{ :common=>"coquilles", :latin=>"placopecten magellanicus" }],
53
+ :catch_areas => [{ :text=>"noordwest atlantische oceaan", :fao_codes=>["21"] }],
54
+ :catch_methods => [],
55
+ :aquaculture_areas => [],
56
+ :aquaculture_methods => []
57
+ },
58
+ {
59
+ :names => [{ :common=>"kabeljauw", :latin=>"gadus macrocephalus" }],
60
+ :catch_areas => [{ :text=>"stille oceaan", :fao_codes=>["67"] }],
61
+ :catch_methods => [],
62
+ :aquaculture_areas => [],
63
+ :aquaculture_methods => []
64
+ },
65
+ {
66
+ :names => [{ :common=>"garnaal", :latin=>"litopenaeus vannamei" }],
67
+ :catch_areas => [],
68
+ :catch_methods => [],
69
+ :aquaculture_areas => [{ :text=>"ecuador", :fao_codes=>[] }],
70
+ :aquaculture_methods => [{ :text=>"vijver" }]
71
+ }
72
+ ]
73
+ ```
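As a rough illustration of how this structure might be consumed, the sketch below works on a literal copy of two entries from the output above, so it does not call the parser at all. The helper name `describe_fish` and the wording of its summaries are hypothetical and not part of the gem.

```ruby
# Two entries copied from the example output above (no parser call needed).
fishes = [
  {
    names: [{ common: "zalm", latin: "salmo salar" }],
    catch_areas: [],
    catch_methods: [],
    aquaculture_areas: [{ text: "noorwegen", fao_codes: [] }],
    aquaculture_methods: [{ text: "kooien" }]
  },
  {
    names: [{ common: "kabeljauw", latin: "gadus macrocephalus" }],
    catch_areas: [{ text: "stille oceaan", fao_codes: ["67"] }],
    catch_methods: [],
    aquaculture_areas: [],
    aquaculture_methods: []
  }
]

# Hypothetical helper: build a one-line summary for a single fish hash.
def describe_fish(fish)
  # Prefer the common name, fall back to the Latin name.
  name = fish[:names].map { |n| n[:common] || n[:latin] }.join("/")

  if fish[:aquaculture_areas].any?
    origin = "farmed: " + fish[:aquaculture_areas].map { |a| a[:text] }.join(", ")
  elsif fish[:catch_areas].any?
    areas = fish[:catch_areas].map do |a|
      a[:fao_codes].empty? ? a[:text] : "#{a[:text]} (FAO #{a[:fao_codes].join(', ')})"
    end
    origin = "caught: " + areas.join(", ")
  else
    origin = "origin unknown"
  end

  "#{name} - #{origin}"
end

fishes.each { |f| puts describe_fish(f) }
```

Since every key holds an array, which may be empty, checking `any?` before reading an entry keeps such code free of `nil` errors.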
74
+
75
+
76
+ ## Test data
77
+
78
+ [`data/fish-ingredient-samples-qm-nl`](data/fish-ingredient-samples-qm-nl) contains about 2k
79
+ real-world ingredient lists with fish found on the Dutch market. Each line contains one ingredient
80
+ list (newlines are encoded as `\n`, empty lines and those starting with `#` are ignored).
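The line format described here can be read with a few lines of Ruby. This is a minimal sketch that follows the same skip and decode rules as the bundled command-line script. The helper name `read_samples` and the sample text are made up for illustration.

```ruby
# Hypothetical helper: turn the raw text of a sample file into ingredient lists.
def read_samples(text)
  text.each_line
      .reject { |line| line.start_with?("#") || line.strip.empty? } # comments, empty lines
      .map { |line| line.gsub('\\n', "\n").strip }                  # decode literal \n
end

# Made-up sample data; the quoted heredoc keeps the backslash in `\n` literal.
sample = <<~'EOT'
  # made-up sample data
  zalm (salmo salar), gekweekt in noorwegen

  tonijn\nvangstgebied: stille oceaan fao 67
EOT

read_samples(sample).each { |s| p s }
```

Here `'\\n'` in single quotes is the two-character sequence backslash plus `n`, so only encoded newlines are replaced with real ones.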
81
+
82
+
83
+ ## Species
84
+
85
+ This gem does very basic named entity recognition of fish names. There are more fish names than the
86
+ parser can handle, so the detected fish names are limited to those actually found in packaged food products.
87
+ At the moment only a very limited number of names is detected. To add more, expand the _species-found_ text
88
+ files in [species/](species/) and run `species/species-treetop-gen.sh`. This updates the fish name grammars.
89
+
90
+
91
+ ## License
92
+
93
+ This software is distributed under the [MIT license](LICENSE). Data may have a [different license](data/README.md).
data/bin/food_fish_parser ADDED
@@ -0,0 +1,111 @@
1
+ #!/usr/bin/env ruby
2
+ #
3
+ # Parser for food fish lists.
4
+ #
5
+ require 'optparse'
6
+
7
+ $:.push(File.expand_path(File.dirname(__FILE__) + "/../lib"))
8
+ require 'food_fish_parser'
9
+
10
+ begin
11
+ require 'pry'
12
+ def pp(o, color: true)
13
+ if color
14
+ Pry::ColorPrinter.pp(o)
15
+ else
16
+ puts(o.inspect)
17
+ end
18
+ end
19
+ rescue LoadError
20
+ # fallback without color printing
21
+ def pp(o, color: nil)
22
+ puts(o.inspect)
23
+ end
24
+ end
25
+
26
+ def colorize(color, s)
27
+ if color
28
+ "\e[#{color}m#{s}\e[0;22m"
29
+ else
30
+ s
31
+ end
32
+ end
33
+
34
+ def parse_single(s, parsed=nil, parser:, verbosity: 1, print: nil, escape: false, color: false)
35
+ parsed ||= parser.parse(s)
36
+
37
+ return unless print.nil? || (parsed && print == :parsed) || (!parsed && print == :noresult)
38
+
39
+ puts colorize(color && "0;32", escape ? s.gsub("\n", "\\n") : s) if verbosity > 0
40
+
41
+ if parsed
42
+ puts(parsed.inspect) if verbosity > 1
43
+ pp(parsed.to_a, color: color) if verbosity > 0
44
+ return true
45
+ else
46
+ puts "(no result: #{parser.parser.failure_reason})" if verbosity > 0
47
+ return false
48
+ end
49
+ end
50
+
51
+ def parse_file(path, parser:, verbosity: 1, print: nil, escape: false, color: false)
52
+ count_parsed = count_noresult = 0
53
+ File.foreach(path) do |line|
54
+ next if line =~ /^#/ # comment
55
+ next if line =~ /^\s*$/ # empty line
56
+
57
+ line = line.gsub('\\n', "\n").strip
58
+ parsed = parser.parse(line)
59
+ count_parsed += 1 if parsed
60
+ count_noresult += 1 unless parsed
61
+
62
+ parse_single(line, parsed, parser: parser, verbosity: verbosity, print: print, escape: escape, color: color)
63
+ end
64
+
65
+ pct_parsed = 100.0 * count_parsed / (count_parsed + count_noresult)
66
+ pct_noresult = 100.0 * count_noresult / (count_parsed + count_noresult)
67
+ puts "parsed #{colorize(color && "1;32", count_parsed)} (#{pct_parsed.round(1)}%), no result #{colorize(color && "1;31", count_noresult)} (#{pct_noresult.round(1)}%)"
68
+ return count_noresult
69
+ end
70
+
71
+ verbosity = 1
72
+ files = []
73
+ strings = []
74
+ print = nil
75
+ escape = false
76
+ color = true
77
+ OptionParser.new do |opts|
78
+ opts.banner = <<-EOF.gsub(/^ /, '')
79
+ Usage: #{$0} [options] --file|-f <filename>
80
+ #{$0} [options] --string|-s <text>
81
+
82
+ EOF
83
+
84
+ opts.on("-f", "--file FILE", "Parse all lines of the file as fish detail text.") {|f| files << f }
85
+ opts.on("-s", "--string TEXT", "Parse specified fish detail text.") {|s| strings << s }
86
+
87
+ opts.on("-q", "--[no-]quiet", "Only show summary.") {|q| verbosity = q ? 0 : 1 }
88
+ opts.on("-p", "--parsed", "Only show lines that were successfully parsed.") {|p| print = :parsed }
89
+ opts.on("-n", "--noresult", "Only show lines that had no result.") {|p| print = :noresult }
90
+ opts.on("-e", "--[no-]escape", "Escape newlines") {|e| escape = !!e }
91
+ opts.on("-c", "--[no-]color", "Use color") {|e| color = !!e }
92
+ opts.on("-v", "--[no-]verbose", "Show more data (parsed tree).") {|v| verbosity = v ? 2 : 1 }
93
+ opts.on( "--version", "Show program version.") do
94
+ puts("food_fish_parser v#{FoodFishParser::VERSION}")
95
+ exit
96
+ end
97
+ opts.on("-h", "--help", "Show this help") do
98
+ puts(opts)
99
+ exit
100
+ end
101
+ end.parse!
102
+
103
+ if strings.any? || files.any?
104
+ parser = FoodFishParser::Parser.new
105
+ success = true
106
+ strings.each {|s| success &= parse_single(s, parser: parser, verbosity: verbosity, print: print, escape: escape, color: color) }
107
+ files.each {|f| success &= parse_file(f, parser: parser, verbosity: verbosity, print: print, escape: escape, color: color).zero? }
108
+ success or exit(1)
109
+ else
110
+ STDERR.puts("Please specify one or more --file or --string arguments (see --help).")
111
+ end
data/food_fish_parser.gemspec ADDED
@@ -0,0 +1,30 @@
1
+ $:.unshift(File.expand_path(File.dirname(__FILE__) + '/lib'))
2
+ require 'food_fish_parser/version'
3
+
4
+ Gem::Specification.new do |s|
5
+ s.name = 'food_fish_parser'
6
+ s.version = FoodFishParser::VERSION
7
+ s.date = FoodFishParser::VERSION_DATE
8
+ s.summary = 'Parser for fish details found on food products.'
9
+ s.authors = ['wvengen']
10
+ s.email = ['dev-ruby@willem.engen.nl']
11
+ s.homepage = 'https://github.com/q-m/food-fish-parser-ruby'
12
+ s.license = 'MIT'
13
+ s.description = <<-EOD
14
+ Food products that contain fish sometimes indicate details like fishing
15
+ area, method or aquaculture country. This parser knows about various ways
16
+ this is found on a product package, and returns a structured representation
17
+ of the fish ingredient details.
18
+ EOD
19
+ s.metadata = {
20
+ 'bug_tracker_uri' => 'https://github.com/q-m/food-fish-parser-ruby/issues',
21
+ 'source_code_uri' => 'https://github.com/q-m/food-fish-parser-ruby',
22
+ }
23
+
24
+ s.files = `git ls-files *.gemspec lib`.split("\n")
25
+ s.executables = `git ls-files bin`.split("\n").map(&File.method(:basename))
26
+ s.extra_rdoc_files = ['README.md', 'LICENSE']
27
+ s.require_paths = ['lib']
28
+
29
+ s.add_runtime_dependency 'treetop', '~> 1.6'
30
+ end
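The `'~> 1.6'` constraint on treetop is a pessimistic version constraint: it accepts 1.6 and later 1.x releases but not 2.0. The sketch below checks this with the RubyGems API that ships with Ruby. The version numbers in the list are made up for illustration and do not claim to be actual treetop releases.

```ruby
require 'rubygems'

# Same constraint string as in the gemspec above.
requirement = Gem::Requirement.new('~> 1.6')

# Illustrative version numbers only.
%w[1.5.3 1.6.0 1.6.12 1.9.9 2.0.0].each do |v|
  ok = requirement.satisfied_by?(Gem::Version.new(v))
  puts "treetop #{v}: #{ok ? 'accepted' : 'rejected'}"
end
```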
data/lib/food_fish_parser.rb ADDED
@@ -0,0 +1,2 @@
1
+ require_relative 'food_fish_parser/version'
2
+ require_relative 'food_fish_parser/parser'
data/lib/food_fish_parser/grammar.rb ADDED
@@ -0,0 +1,18 @@
1
+ require 'treetop'
2
+ require_relative 'nodes'
3
+
4
+ # @todo find a way to auto-generate Ruby from Treetop files when building gem,
5
+ # see https://stackoverflow.com/q/37794587/2866660
6
+
7
+ # note that the species name files are autogenerated
8
+ Treetop.load File.dirname(__FILE__) + '/grammar/common'
9
+ Treetop.load File.dirname(__FILE__) + '/grammar/fish_name_latin'
10
+ Treetop.load File.dirname(__FILE__) + '/grammar/fish_name_nl'
11
+ Treetop.load File.dirname(__FILE__) + '/grammar/fish_name'
12
+ Treetop.load File.dirname(__FILE__) + '/grammar/words'
13
+ Treetop.load File.dirname(__FILE__) + '/grammar/fao_area'
14
+ Treetop.load File.dirname(__FILE__) + '/grammar/catch_area'
15
+ Treetop.load File.dirname(__FILE__) + '/grammar/catch_method'
16
+ Treetop.load File.dirname(__FILE__) + '/grammar/aquac_area'
17
+ Treetop.load File.dirname(__FILE__) + '/grammar/aquac_method'
18
+ Treetop.load File.dirname(__FILE__) + '/grammar/root'
data/lib/food_fish_parser/grammar/aquac_area.treetop ADDED
@@ -0,0 +1,27 @@
1
+
2
+ module FoodFishParser::Grammar
3
+ grammar AquacArea
4
+ include Common
5
+ include Words
6
+ include FaoArea
7
+
8
+ rule aquac_area_indicator
9
+ (
10
+ 'uit'i / 'gekweekt in'i / 'gekweekt op'i /
11
+ 'aquacultuurproduct uit'i / 'aquacultuur product uit'i
12
+ )
13
+ !char
14
+ ( ws* ( ':' / '>' ) )?
15
+ end
16
+
17
+ rule aquac_area_content
18
+ (
19
+ ( area:( words ) ( ws* comma? ws* fao_area_list_enclosures )? ) /
20
+ ( fao_area_list_enclosures ws* comma? ws* area:( words ) ) /
21
+ fao_area_list_enclosures area:''
22
+ )
23
+ <AquacAreaNode>
24
+ end
25
+
26
+ end
27
+ end
data/lib/food_fish_parser/grammar/aquac_method.treetop ADDED
@@ -0,0 +1,18 @@
1
+ module FoodFishParser::Grammar
2
+ grammar AquacMethod
3
+ include Common
4
+ include Words
5
+
6
+ rule aquac_method_indicator
7
+ ( 'wijze van vangst en kweekmethode: aquacultuur:'i / 'kweekmethoden'i / 'kweekmethode'i )
8
+ !char
9
+ ( ws* ( ':' / '>' ) )?
10
+ end
11
+
12
+ rule aquac_method_content
13
+ words
14
+ <AquacMethodNode>
15
+ end
16
+
17
+ end
18
+ end
data/lib/food_fish_parser/grammar/catch_area.treetop ADDED
@@ -0,0 +1,32 @@
1
+ module FoodFishParser::Grammar
2
+ grammar CatchArea
3
+ include Common
4
+ include Words
5
+ include FaoArea
6
+
7
+ rule catch_area_indicator
8
+ ( 'wild'i ws* )?
9
+ (
10
+ 'gevangen'i ws+ ( 'in'i / 'op'i ) /
11
+ 'visgebied'i / 'vangstgebied'i / 'vangsgebied'i /
12
+ 'betrapt bij'i
13
+ )
14
+ !char
15
+ ( ws* ( ':' / '>' ) )?
16
+ end
17
+
18
+ rule catch_area_indicator_short
19
+ catch_area_indicator /
20
+ ( 'in'i / 'op'i ) !char ( ws* ':' )?
21
+ end
22
+
23
+ rule catch_area_content
24
+ (
25
+ ( area:( words_no_with ) ( ws* comma? ws* fao_area_list_enclosures )? ) /
26
+ ( fao_area_list_enclosures ws* comma? ws* area:( words_no_with ) ) /
27
+ fao_area_list_enclosures area:''
28
+ )
29
+ <CatchAreaNode>
30
+ end
31
+ end
32
+ end
data/lib/food_fish_parser/grammar/catch_method.treetop ADDED
@@ -0,0 +1,24 @@
1
+ module FoodFishParser::Grammar
2
+ grammar CatchMethod
3
+ include Common
4
+ include Words
5
+
6
+ rule catch_method_indicator
7
+ ( 'wild'i ws* )?
8
+ ( 'gevangen'i ws+ with / 'vangstmethode'i / 'vangsmethode'i )
9
+ !char
10
+ ( ws* ( ':' / '>' ) )?
11
+ end
12
+
13
+ rule catch_method_indicator_short
14
+ catch_method_indicator /
15
+ with ( ws* ':' )?
16
+ end
17
+
18
+ rule catch_method_content
19
+ words_no_in_on
20
+ <CatchMethodNode>
21
+ end
22
+
23
+ end
24
+ end
data/lib/food_fish_parser/grammar/common.treetop ADDED
@@ -0,0 +1,38 @@
1
+ module FoodFishParser::Grammar
2
+ grammar Common
3
+
4
+ # whitespace
5
+ rule ws
6
+ [ \t]
7
+ end
8
+
9
+ rule char
10
+ [[:alnum:]] / [-]
11
+ end
12
+
13
+ rule comma
14
+ ','
15
+ end
16
+
17
+ rule dash
18
+ [-֊ ‐ ‑ ‒ – — ― ﹘﹣-]
19
+ end
20
+
21
+
22
+ rule and
23
+ ( 'and'i / 'en'i / 'und'i ) !char / '&'
24
+ end
25
+
26
+ rule or
27
+ ( 'or'i / 'of'i / 'oder'i ) !char / '/'
28
+ end
29
+
30
+ rule and_or
31
+ ( ( 'and/or'i / 'en/of'i ) !char ) / and / or
32
+ end
33
+
34
+ rule with
35
+ ( 'met'i / 'd.m.v'i '.'? / 'with'i ) !char
36
+ end
37
+ end
38
+ end
data/lib/food_fish_parser/grammar/fao_area.treetop ADDED
@@ -0,0 +1,60 @@
1
+ module FoodFishParser::Grammar
2
+ grammar FaoArea
3
+ include Common
4
+
5
+ rule fao_area_list_enclosures
6
+ ( '(' ws* fao_area_list ws* ')' ) /
7
+ ( '|' ws* fao_area_list ) /
8
+ fao_area_list
9
+ end
10
+
11
+ rule fao_area_list
12
+ fao_area_indicator ws*
13
+ ':'? ws*
14
+ fao_area_code
15
+ (
16
+ ( '/' fao_area_code )+ /
17
+ ( ( ',' ws* fao_area_code )+ ws* comma? ws* and ws+ fao_area_code ) /
18
+ ( ',' ws* fao_area_code )+
19
+ )?
20
+ end
21
+
22
+ rule fao_area_indicator
23
+ ( 'FAO'i / 'FA0'i )
24
+ ( ( dash / ws+ ) 'gebied'i )? ( ws* 'nr'i '.'? )?
25
+ end
26
+
27
+ rule fao_area_code
28
+ fao_area_major_code
29
+ (
30
+ ( ws* '(' ws* fao_area_sub_range ws* ')' ) /
31
+ ( fao_area_sub_range )
32
+ )?
33
+ <FaoAreaCodeNode>
34
+ end
35
+
36
+ rule fao_area_major_code
37
+ ( '0' [0-9] [0-9] ) / ( [0-9] [0-9] )
38
+ end
39
+
40
+ rule fao_area_sub_range
41
+ fao_area_sub_code
42
+ ( ws* dash ws* fao_area_sub_code )?
43
+ end
44
+
45
+ rule fao_area_sub_code
46
+ (
47
+ ( ( dash / '/' / ws* )? [ivxIVX]+ ) /
48
+ ( dash [0-9] [0-9] )
49
+ )
50
+ fao_area_suffix?
51
+ (
52
+ ws* '(' ws* [[:digit:]]+ ( ws* dash ws* [[:digit:]]+ )? ws* ')'
53
+ )?
54
+ end
55
+
56
+ rule fao_area_suffix
57
+ [abcdABCD]
58
+ end
59
+ end
60
+ end
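To get a feel for what the indicator and major-code rules above accept, here is a rough regular-expression approximation in plain Ruby. It is only a sketch: it covers the `FAO`/`FA0` indicator, the optional `gebied` and `nr.` parts, and slash- or comma-separated major codes. It ignores sub-ranges, suffixes, enclosures and the extra dash characters, so it is not equivalent to the grammar. The names `FAO_LIST` and `fao_major_codes` are hypothetical.

```ruby
# Rough approximation of fao_area_indicator followed by a list of major codes.
# A major code is two digits, or three digits starting with 0, as in the grammar.
FAO_LIST = /fa[o0](?:[-\s]+gebied)?(?:\s*nr\.?)?\s*:?\s*((?:0\d\d|\d\d)(?:\s*[\/,]\s*(?:0\d\d|\d\d))*)/i

# Hypothetical helper: return all major codes found after an FAO indicator.
def fao_major_codes(text)
  text.scan(FAO_LIST).flat_map { |(list)| list.split(/\s*[\/,]\s*/) }
end

p fao_major_codes("vangstgebied noordwest atlantische oceaan fao 21")
p fao_major_codes("gevangen in FAO-gebied nr. 27/37")
```

The `0` in `fa[o0]` mirrors the grammar's tolerance for `FA0`, presumably a common misspelling or scanning error on packaging.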
data/lib/food_fish_parser/grammar/fish_name.treetop ADDED
@@ -0,0 +1,21 @@
1
+ module FoodFishParser::Grammar
2
+ grammar FishName
3
+ include Common
4
+ include FishNameLatin
5
+ include FishNameNL
6
+
7
+ rule fish_name
8
+ (
9
+ fish_name_nl ws* '(' ws* fish_name_latin ws* ')' /
10
+ fish_name_nl /
11
+ fish_name_latin
12
+ )
13
+ <FishNameNode>
14
+ end
15
+
16
+ rule fish_name_list
17
+ fish_name ( ws+ and_or ws+ fish_name )*
18
+ end
19
+
20
+ end
21
+ end
data/lib/food_fish_parser/grammar/fish_name_latin.treetop ADDED
@@ -0,0 +1,19 @@
1
+ # autogenerated by species-treetop-gen-latin.rb on 2020-03-17
2
+ module FoodFishParser::Grammar
3
+ grammar FishNameLatin
4
+ include Common
5
+
6
+ rule fish_name_latin
7
+ fish_name_latin_first ( ws+ fish_name_latin_second )?
8
+ <FishNameLatinNode>
9
+ end
10
+
11
+ rule fish_name_latin_first
12
+ 'zygochlamys'i / 'zeus'i / 'xiphopenaeus'i / 'xiphias'i / 'undaria'i / 'ulva'i / 'trichiurus'i / 'trachurus'i / 'todarodes'i / 'thunnus'i / 'theragra'i / 'stolephorus'i / 'sprattus'i / 'spirulina'i / 'sparus'i / 'solea'i / 'sepiella'i / 'sepia'i / 'sebastes'i / 'scomber'i / 'sardinella'i / 'sardina'i / 'salmo'i / 'saccharina'i / 'reinhardtius'i / 'psetta'i / 'procambarus'i / 'portunus'i / 'porphyra'i / 'pollachius'i / 'pleuronectes'i / 'pleoticus'i / 'placopecten'i / 'phymatolithon'i / 'perna'i / 'penaeus'i / 'penaeidae'i / 'pelvetia'i / 'pecten'i / 'patinopecten'i / 'parapenaeopsis'i / 'paralomis'i / 'paphia'i / 'pangasius'i / 'pandalus'i / 'palmaria'i / 'pagellus'i / 'pacifische'i / 'ovalipes'i / 'ostrea'i / 'oreochromis'i / 'oncorhynchus'i / 'octopus'i / 'nephrops'i / 'nemipterus'i / 'nelumbo'i / 'mytilus'i / 'mulinia'i / 'micromesistius'i / 'metapenaeus'i / 'merluccius'i / 'merlangius'i / 'melanogrammus'i / 'macruronus'i / 'macrocystis'i / 'lophius'i / 'loligo'i / 'litopenaeus'i / 'lithodes'i / 'limanda'i / 'lethrinus'i / 'lepidotrigla'i / 'lepidopsetta'i / 'lates'i / 'laminaria'i / 'katsuwonus'i / 'illex'i / 'homarus'i / 'himanthalia'i / 'haematococcus'i / 'gracilaria'i / 'gelidium'i / 'gadus'i / 'fucus'i / 'euthynnus'i / 'ensis'i / 'engraulis'i / 'dunaliella'i / 'dosidicus'i / 'dicentrarchus'i / 'crassostrea'i / 'crangon'i / 'clupea'i / 'clarias'i / 'chondrus'i / 'chlorella'i / 'cerastoderma'i / 'caulerpa'i / 'ascophyllum'i / 'anguilla'i / 'anadara'i / 'alle'i / 'alaria'i / 'acipenser'i
13
+ end
14
+
15
+ rule fish_name_latin_second
16
+ 'yessoensis'i / 'vulgaris'i / 'virens'i / 'vesiculosus'i / 'verrucosa'i / 'vannamei'i / 'undulata'i / 'umbilicalis'i / 'tenera'i / 'stylifera'i / 'sprattus'i / 'spp.'i / 'spp'i / 'solea'i / 'scombrus'i / 'santolla'i / 'salina'i / 'salar'i / 'ringens'i / 'pyrifera'i / 'pyrenoidosa'i / 'punctatus'i / 'productus'i / 'pluvialis'i / 'platessa'i / 'platensis'i / 'piscatorius'i / 'pinnatifida'i / 'pilchardus'i / 'pelamis'i / 'pelagicus'i / 'patagonica'i / 'pangasius'i / 'palmata'i / 'pacificus'i / 'officinalis'i / 'ocellatus'i / 'nucifera'i / 'novaezelandiae'i / 'norvegicus'i / 'nodosum'i / 'niloticus'i / 'nerka'i / 'mykiss'i / 'murphyi'i / 'muelleri'i / 'morhua'i / 'monodon'i / 'monoceros'i / 'microptera'i / 'merluccius'i / 'merlangus'i / 'merguiensis'i / 'maximus'i / 'maxima'i / 'marinus'i / 'magellanicus'i / 'macrocephalus'i / 'limanda'i / 'lepturus'i / 'lentillifera'i / 'latissima'i / 'lactuca'i / 'labrax'i / 'kroyeri'i / 'kisutch'i / 'keta'i / 'kabeljauw'i / 'jordani'i / 'japonicus'i / 'japonica'i / 'hippoglossoides'i / 'hexodon'i / 'harengus'i / 'gueldenstaedtii'i / 'granulosa'i / 'gorbuscha'i / 'gladius'i / 'gigas'i / 'gibbosa'i / 'gariepinus'i / 'galloprovincialis'i / 'faber'i / 'esculenta'i / 'encrasicolus'i / 'elongata'i / 'edulis'i / 'edule'i / 'directus'i / 'digitata'i / 'crispus'i / 'crangon'i / 'clarkii'i / 'chilensis'i / 'chalcogramma'i / 'capensis'i / 'canaliculus'i / 'canaliculata'i / 'calcareum'i / 'borealis'i / 'bogaraveo'i / 'bilineata'i / 'australis'i / 'aurata'i / 'argentinus'i / 'antiquata'i / 'anguilla'i / 'anchoita'i / 'americanus'i / 'alle'i / 'albacares'i / 'alalunga'i / 'aeglefinus'i
17
+ end
18
+ end
19
+ end
data/lib/food_fish_parser/grammar/fish_name_nl.treetop ADDED
@@ -0,0 +1,27 @@
1
+ # autogenerated by species-treetop-gen-nl.rb on 2020-03-17
2
+ module FoodFishParser::Grammar
3
+ grammar FishNameNL
4
+ include Common
5
+
6
+ rule fish_name_nl
7
+ ( fish_name_nl_area ws+ )? ( fish_name_nl_attr ws* )? fish_name_nl_name fish_name_nl_suffix?
8
+ <FishNameCommonNode>
9
+ end
10
+
11
+ rule fish_name_nl_area
12
+ 'pacifische'i / 'indische'i / 'groenlandse'i / 'atlantische'i / 'argentijnse'i / 'alaska'i
13
+ end
14
+
15
+ rule fish_name_nl_attr
16
+ 'zwarte'i / 'zwart'i / 'witte'i / 'witpoot'i / 'wit'i / 'roze'i / 'rood'i / 'rode'i / 'rivier'i / 'pijl'i / 'kleine'i / 'klein'i / 'grote'i / 'groot'i / 'groene'i / 'groen'i / 'doorn'i / 'coho'i / 'chum'i / 'blauwe'i / 'blauw'i
17
+ end
18
+
19
+ rule fish_name_nl_name
20
+ 'zonnevis'i / 'zeewolf'i / 'zeesnoek'i / 'zeekreeft'i / 'zeeforel'i / 'zeebaars'i / 'zalm'i / 'wijting'i / 'weekdieren'i / 'weekdier'i / 'vintonijn'i / 'tonijn'i / 'tong'i / 'tilapia'i / 'tarbot'i / 'tapijtschelp'i / 'sprot'i / 'spie'i / 'snotolf'i / 'snoekbaars'i / 'snoek'i / 'skipjack tonijn'i / 'schol'i / 'schelvis'i / 'schelpen'i / 'schelp'i / 'schar'i / 'sardines'i / 'regenboogforel'i / 'raat'i / 'poon'i / 'pollak'i / 'pangasius'i / 'paling'i / 'oogtonijn'i / 'mul'i / 'mosselen'i / 'mossel'i / 'meerval'i / 'mantelschelp'i / 'makreel'i / 'lom'i / 'leng'i / 'kreeft'i / 'koolvis'i / 'kokkel'i / 'karper'i / 'kabeljauw'i / 'hondstong'i / 'hoki'i / 'heilbot'i / 'heek'i / 'hake'i / 'haai'i / 'ha'i / 'gruis'i / 'griet'i / 'geep'i / 'geelvintonijn'i / 'garnalen'i / 'garnaal'i / 'fint'i / 'coquilles'i / 'cocquilles'i / 'botervis'i / 'bot'i / 'beekridder'i / 'barracuda'i / 'baars'i / 'arkschelp'i / 'ansjovis'i / 'albacore tonijn'i
21
+ end
22
+
23
+ rule fish_name_nl_suffix
24
+ 'vlees'i / 'ringen'i / 'ring'i / 'filets'i / 'filet'i
25
+ end
26
+ end
27
+ end
data/lib/food_fish_parser/grammar/root.treetop ADDED
@@ -0,0 +1,55 @@
1
+ module FoodFishParser::Grammar
2
+ grammar Root
3
+ include Common
4
+ include FishName
5
+ include CatchArea
6
+ include CatchMethod
7
+ include AquacArea
8
+ include AquacMethod
9
+
10
+ rule root
11
+ fishes:(
12
+ ( fish ( ws* and_or ws* fish )+ ) /
13
+ ( fish ( ws* ( '.' / comma ) ws* fish )+ ) /
14
+ fish
15
+ )
16
+ ( ws* '.' )?
17
+ <RootNode>
18
+ end
19
+
20
+ rule fish
21
+ (
22
+ ( fish_name_list ( ws* ( comma / ':' ) )? ws+ fish_catch_info ) /
23
+ ( fish_name_list ( ws* ( comma / ':' ) )? ws+ fish_aquac_info ) /
24
+ fish_name_list /
25
+ fish_catch_info /
26
+ fish_aquac_info
27
+ )
28
+ <FishNode>
29
+ end
30
+
31
+ rule fish_catch_info
32
+ (
33
+ catch_method_indicator ws* catch_method_content
34
+ ( ( ws* comma )? ws+ catch_area_indicator_short ws* catch_area_content )?
35
+ ) / (
36
+ catch_area_indicator ws* catch_area_content
37
+ ( ( ws* comma )? ws+ catch_method_indicator_short ws* catch_method_content )?
38
+ )
39
+ end
40
+
41
+ rule fish_aquac_info
42
+ (
43
+ aquac_area_indicator ws* aquac_area_content
44
+ ws* '.' ws* aquac_method_indicator ws* aquac_method_content
45
+ ) / (
46
+ aquac_area_indicator ws* aquac_area_content
47
+ ( ( ws* comma )? ws+ aquac_method_indicator ws* aquac_method_content )?
48
+ ) / (
49
+ aquac_method_indicator ws* aquac_method_content
50
+ ( ( ws* comma )? ws+ aquac_area_indicator ws* aquac_area_content )?
51
+ )
52
+ end
53
+
54
+ end
55
+ end
data/lib/food_fish_parser/grammar/words.treetop ADDED
@@ -0,0 +1,52 @@
1
+ module FoodFishParser::Grammar
2
+ grammar Words
3
+ include Common
4
+ include FishNameLatin
5
+ include FishNameNL
6
+
7
+ rule word
8
+ word_abbr / '(sub)'i? !words_to_avoid char+
9
+ end
10
+
11
+ rule words
12
+ word ( word_sep word )*
13
+ end
14
+
15
+ rule words_no_in_on
16
+ !( ( 'in'i / 'op'i ) !char ) word ( word_sep !( ( 'in'i / 'op'i ) !char ) word )*
17
+ end
18
+
19
+ rule words_no_with
20
+ !with word ( word_sep !with word )*
21
+ end
22
+
23
+ rule word_sep
24
+ ( ws* ( comma / '/' ) ws* ) / ws+
25
+ end
26
+
27
+ rule word_abbr
28
+ ( [a-zA-Z] '.' )+ [a-zA-Z] / [a-zA-Z] '.' ( [a-zA-Z] '.' )+ ![a-zA-Z]
29
+ end
30
+
31
+ # these words should not be considered, because they indicate a new section
32
+ rule words_to_avoid
33
+ (
34
+ fish_name_latin /
35
+ fish_name_nl /
36
+ 'gevangen'i /
37
+ 'visgebied'i /
38
+ 'vangstgebied'i /
39
+ 'vangstmethode'i /
40
+ 'vangsmethode'i /
41
+ 'betrapt'i /
42
+ 'gekweekt'i /
43
+ 'kweekmethode'i /
44
+ 'kweekmethoden'i /
45
+ 'd.m.v'i '.'? /
46
+ 'FAO'i /
47
+ 'FA0'i
48
+ )
49
+ ![[:alpha:]]
50
+ end
51
+ end
52
+ end
data/lib/food_fish_parser/nodes.rb ADDED
@@ -0,0 +1,91 @@
1
+ require 'treetop/runtime'
2
+
3
+ # Needs to be in grammar namespace so Treetop can find the nodes.
4
+ module FoodFishParser
5
+ module Grammar
6
+
7
+ # Additions for Treetop nodes, include this in other nodes where needed.
8
+ module SyntaxNodeAdditions
9
+ def to_a_deep(n, cls)
10
+ if n.is_a?(cls)
11
+ [n]
12
+ elsif n.nonterminal?
13
+ n.elements.map {|m| to_a_deep(m, cls) }.flatten(1).compact
14
+ end
15
+ end
16
+ end
17
+
18
+ # Root object, contains everything else.
19
+ module RootNode
20
+ include SyntaxNodeAdditions
21
+ def to_a
22
+ to_a_deep(fishes, FishNode).map(&:to_h)
23
+ end
24
+ end
25
+
26
+ module FishNode
27
+ include SyntaxNodeAdditions
28
+ def to_h
29
+ {
30
+ names: to_a_deep(self, FishNameNode).map(&:to_h),
31
+ catch_areas: to_a_deep(self, CatchAreaNode).map(&:to_h),
32
+ catch_methods: to_a_deep(self, CatchMethodNode).map(&:to_h),
33
+ aquaculture_areas: to_a_deep(self, AquacAreaNode).map(&:to_h),
34
+ aquaculture_methods: to_a_deep(self, AquacMethodNode).map(&:to_h),
35
+ }
36
+ end
37
+ end
38
+
39
+ module FishNameNode
40
+ include SyntaxNodeAdditions
41
+ def to_h
42
+ {
43
+ common: to_a_deep(self, FishNameCommonNode).first&.text_value,
44
+ latin: to_a_deep(self, FishNameLatinNode).first&.text_value
45
+ }
46
+ end
47
+ end
48
+
49
+ module FishNameCommonNode; end
50
+ module FishNameLatinNode; end
51
+
52
+ module CatchAreaNode
53
+ include SyntaxNodeAdditions
54
+ def to_h
55
+ {
56
+ text: area.text_value,
57
+ fao_codes: to_a_deep(self, FaoAreaCodeNode).map(&:text_value)
58
+ }
59
+ end
60
+ end
61
+
62
+ module FaoAreaCodeNode; end
63
+
64
+ module CatchMethodNode
65
+ def to_h
66
+ {
67
+ text: self.text_value
68
+ }
69
+ end
70
+ end
71
+
72
+ module AquacAreaNode
73
+ include SyntaxNodeAdditions
74
+ def to_h
75
+ {
76
+ text: area.text_value,
77
+ fao_codes: to_a_deep(self, FaoAreaCodeNode).map(&:text_value)
78
+ }
79
+ end
80
+ end
81
+
82
+ module AquacMethodNode
83
+ def to_h
84
+ {
85
+ text: self.text_value
86
+ }
87
+ end
88
+ end
89
+
90
+ end
91
+ end
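The `to_a_deep` helper above walks the syntax tree and collects every node tagged with a given module, without descending into a node once it matches. The self-contained sketch below shows the same traversal on a stand-in tree. `Node`, `Marker` and `collect` are hypothetical names, and `Node` is a plain `Struct`, not Treetop's real syntax node class. Unlike the original, the sketch returns an empty array for non-matching leaves instead of relying on `compact`.

```ruby
# Stand-in for a syntax node; not the real Treetop::Runtime::SyntaxNode.
Node = Struct.new(:label, :elements) do
  def nonterminal?
    !elements.nil? && !elements.empty?
  end
end

# Plays the role of a node module such as FishNode.
module Marker; end

# Collect all nodes extended with `mod`, depth-first, in document order.
def collect(node, mod)
  if node.is_a?(mod)
    [node]                 # matched: stop here, do not look inside
  elsif node.nonterminal?
    node.elements.flat_map { |child| collect(child, mod) }
  else
    []                     # leaf without a match
  end
end

leaf_a = Node.new("a", nil).extend(Marker)
leaf_b = Node.new("b", nil)
inner  = Node.new("inner", [leaf_b, Node.new("c", nil).extend(Marker)])
tree   = Node.new("root", [leaf_a, inner])

p collect(tree, Marker).map(&:label)
```

The `is_a?` check works on modules because the nodes are extended with them at runtime. The sketch assumes this is also how the modules named in the grammar files end up on the parsed nodes.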
data/lib/food_fish_parser/parser.rb ADDED
@@ -0,0 +1,26 @@
1
+ require_relative 'grammar'
2
+
3
+ module FoodFishParser
4
+ class Parser
5
+
6
+ # @!attribute [r] parser
7
+ # @return [Treetop::Runtime::CompiledParser] low-level parser object
8
+ # @note This attribute is there for convenience, but may change in the future. Take care.
9
+ attr_reader :parser
10
+
11
+ # Create a new fish detail parser
12
+ # @return [FoodFishParser::Parser]
13
+ def initialize
14
+ @parser = Grammar::RootParser.new
15
+ end
16
+
17
+ # Parse food fish text into a structured representation.
18
+ #
19
+ # @return [FoodFishParser::Grammar::RootNode] structured representation of fish details
20
+ # @note Unrecognized options are passed to Treetop, but this is not guaranteed to remain so forever.
21
+ def parse(s, **options)
22
+ @parser.parse(s, **options)
23
+ end
24
+
25
+ end
26
+ end
data/lib/food_fish_parser/version.rb ADDED
@@ -0,0 +1,4 @@
1
+ module FoodFishParser
2
+ VERSION = '0.1.0'
3
+ VERSION_DATE = '2020-03-17'
4
+ end
metadata ADDED
@@ -0,0 +1,86 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: food_fish_parser
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.1.0
5
+ platform: ruby
6
+ authors:
7
+ - wvengen
8
+ autorequire:
9
+ bindir: bin
10
+ cert_chain: []
11
+ date: 2020-03-17 00:00:00.000000000 Z
12
+ dependencies:
13
+ - !ruby/object:Gem::Dependency
14
+ name: treetop
15
+ requirement: !ruby/object:Gem::Requirement
16
+ requirements:
17
+ - - "~>"
18
+ - !ruby/object:Gem::Version
19
+ version: '1.6'
20
+ type: :runtime
21
+ prerelease: false
22
+ version_requirements: !ruby/object:Gem::Requirement
23
+ requirements:
24
+ - - "~>"
25
+ - !ruby/object:Gem::Version
26
+ version: '1.6'
27
+ description: |2
28
+ Food products that contain fish sometimes indicate details like fishing
29
+ area, method or aquaculture country. This parser knows about various ways
30
+ this is found on a product package, and returns a structured representation
31
+ of the fish ingredient details.
32
+ email:
33
+ - dev-ruby@willem.engen.nl
34
+ executables:
35
+ - food_fish_parser
36
+ extensions: []
37
+ extra_rdoc_files:
38
+ - README.md
39
+ - LICENSE
40
+ files:
41
+ - LICENSE
42
+ - README.md
43
+ - bin/food_fish_parser
44
+ - food_fish_parser.gemspec
45
+ - lib/food_fish_parser.rb
46
+ - lib/food_fish_parser/grammar.rb
47
+ - lib/food_fish_parser/grammar/aquac_area.treetop
48
+ - lib/food_fish_parser/grammar/aquac_method.treetop
49
+ - lib/food_fish_parser/grammar/catch_area.treetop
50
+ - lib/food_fish_parser/grammar/catch_method.treetop
51
+ - lib/food_fish_parser/grammar/common.treetop
52
+ - lib/food_fish_parser/grammar/fao_area.treetop
53
+ - lib/food_fish_parser/grammar/fish_name.treetop
54
+ - lib/food_fish_parser/grammar/fish_name_latin.treetop
55
+ - lib/food_fish_parser/grammar/fish_name_nl.treetop
56
+ - lib/food_fish_parser/grammar/root.treetop
57
+ - lib/food_fish_parser/grammar/words.treetop
58
+ - lib/food_fish_parser/nodes.rb
59
+ - lib/food_fish_parser/parser.rb
60
+ - lib/food_fish_parser/version.rb
61
+ homepage: https://github.com/q-m/food-fish-parser-ruby
62
+ licenses:
63
+ - MIT
64
+ metadata:
65
+ bug_tracker_uri: https://github.com/q-m/food-fish-parser-ruby/issues
66
+ source_code_uri: https://github.com/q-m/food-fish-parser-ruby
67
+ post_install_message:
68
+ rdoc_options: []
69
+ require_paths:
70
+ - lib
71
+ required_ruby_version: !ruby/object:Gem::Requirement
72
+ requirements:
73
+ - - ">="
74
+ - !ruby/object:Gem::Version
75
+ version: '0'
76
+ required_rubygems_version: !ruby/object:Gem::Requirement
77
+ requirements:
78
+ - - ">="
79
+ - !ruby/object:Gem::Version
80
+ version: '0'
81
+ requirements: []
82
+ rubygems_version: 3.0.3
83
+ signing_key:
84
+ specification_version: 4
85
+ summary: Parser for fish details found on food products.
86
+ test_files: []