food_fish_parser 0.1.0

checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA256:
3
+ metadata.gz: a92347877837b339f13c2140d2955f41410ea7a6258b34a51f1daf35c4009715
4
+ data.tar.gz: 9d554028e69f5925e747054cd6c13f3742014ec37ec0621ae1b6008f72a0a8fc
5
+ SHA512:
6
+ metadata.gz: c93ac59e5393093803ad638ab8992deb0a2af35ed7df7f9e3c0d5666d9477d66660f55bce085cea38b9d3a75c7fd6c17f769af56999b15e745cdb55f4540e6ef
7
+ data.tar.gz: bf20e42335d25ab91068d2dc8bdd6e8db5a0fbc13595297c97587e0c3a8c4daf3b4668eee32742669cfda32c508e8c2d99bf1fb3dc242c705d1efd2bf63e7550
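The `checksums.yaml` above records SHA256 and SHA512 digests of the two archives packed inside the `.gem` file. A `.gem` is a plain tar archive whose members include `metadata.gz`, `data.tar.gz` and a gzipped `checksums.yaml.gz`. Here is a minimal sketch for re-checking the digests. It assumes those members have already been extracted into one directory and `checksums.yaml.gz` has been decompressed. `checksum_mismatches` is a hypothetical helper, not a RubyGems API:

```ruby
require "digest"
require "yaml"

# Digest implementations for the algorithm keys used in checksums.yaml.
DIGESTS = { "SHA256" => Digest::SHA256, "SHA512" => Digest::SHA512 }.freeze

# Hypothetical helper: returns entries like "SHA256 data.tar.gz" whose digest
# does not match; an empty array means every listed file matched.
def checksum_mismatches(checksums_yaml, dir)
  YAML.safe_load(checksums_yaml).flat_map do |algorithm, files|
    digest = DIGESTS.fetch(algorithm)
    files.reject { |name, expected| digest.file(File.join(dir, name)).hexdigest == expected }
         .map { |name, _expected| "#{algorithm} #{name}" }
  end
end
```

For example, after `tar -xf food_fish_parser-0.1.0.gem` and `gunzip checksums.yaml.gz`, calling `checksum_mismatches(File.read("checksums.yaml"), ".")` should return an empty array.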
data/LICENSE ADDED
@@ -0,0 +1,22 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2018 Questionmark
4
+ Copyright (c) 2018 wvengen
5
+
6
+ Permission is hereby granted, free of charge, to any person obtaining a copy
7
+ of this software and associated documentation files (the "Software"), to deal
8
+ in the Software without restriction, including without limitation the rights
9
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
10
+ copies of the Software, and to permit persons to whom the Software is
11
+ furnished to do so, subject to the following conditions:
12
+
13
+ The above copyright notice and this permission notice shall be included in all
14
+ copies or substantial portions of the Software.
15
+
16
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
17
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
18
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
19
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
20
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
21
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
22
+ SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,93 @@
1
+ # Food fish parser
2
+
3
+ [![Gem Version](https://badge.fury.io/rb/food_fish_parser.svg)](https://rubygems.org/gems/food_fish_parser)
4
+
5
+ Food products with fish in them often list some details about the particular species,
6
+ fishing method and origin. This [Ruby](https://www.ruby-lang.org/) gem and program parses
7
+ the text found on the product and returns a structured representation.
8
+
9
+ At this moment, the parser mostly recognises Dutch-language text.
10
+
11
+ Please note that this code is in an early stage of development.
12
+
13
+ ## Installation
14
+
15
+ ```
16
+ gem install food_fish_parser
17
+ ```
18
+
19
+ ## Example
20
+
21
+ ```ruby
22
+ require 'food_fish_parser'
23
+
24
+ s = <<EOT.gsub(/\n/, '').strip
25
+ zalm (salmo salar), gekweekt in noorwegen, kweekmethode: kooien.pangasius
26
+ (pangasius spp), gekweekt in vietnam, kweekmethode: vijver. coquilles
27
+ (placopecten magellanicus), vangstgebied noordwest atlantische oceaan fao 21,
28
+ kabeljauw (gadus macrocephalus), vangstgebied stille oceaan fao 67, garnaal
29
+ (litopenaeus vannamei), gekweekt in ecuador, kweekmethode: vijver.
30
+ EOT
31
+ parser = FoodFishParser::Parser.new
32
+ puts parser.parse(s).to_a.inspect
33
+ ```
34
+ This results in a list of detected fish:
35
+ ```ruby
36
+ [
37
+ {
38
+ :names => [{ :common=>"zalm", :latin=>"salmo salar" }],
39
+ :catch_areas => [],
40
+ :catch_methods => [],
41
+ :aquaculture_areas => [{ :text=>"noorwegen", :fao_codes=>[] }],
42
+ :aquaculture_methods => [{ :text=>"kooien" }]
43
+ },
44
+ {
45
+ :names => [{ :common=>"pangasius", :latin=>"pangasius spp" }],
46
+ :catch_areas => [],
47
+ :catch_methods => [],
48
+ :aquaculture_areas => [{ :text=>"vietnam", :fao_codes=>[] }],
49
+ :aquaculture_methods => [{ :text=>"vijver" }]
50
+ },
51
+ {
52
+ :names => [{ :common=>"coquilles", :latin=>"placopecten magellanicus" }],
53
+ :catch_areas => [{ :text=>"noordwest atlantische oceaan", :fao_codes=>["21"] }],
54
+ :catch_methods => [],
55
+ :aquaculture_areas => [],
56
+ :aquaculture_methods => []
57
+ },
58
+ {
59
+ :names => [{ :common=>"kabeljauw", :latin=>"gadus macrocephalus" }],
60
+ :catch_areas => [{ :text=>"stille oceaan", :fao_codes=>["67"] }],
61
+ :catch_methods => [],
62
+ :aquaculture_areas => [],
63
+ :aquaculture_methods => []
64
+ },
65
+ {
66
+ :names => [{ :common=>"garnaal", :latin=>"litopenaeus vannamei" }],
67
+ :catch_areas => [],
68
+ :catch_methods => [],
69
+ :aquaculture_areas => [{ :text=>"ecuador", :fao_codes=>[] }],
70
+ :aquaculture_methods => [{ :text=>"vijver" }]
71
+ }
72
+ ]
73
+ ```
74
+
75
+
76
+ ## Test data
77
+
78
+ [`data/fish-ingredient-samples-qm-nl`](data/fish-ingredient-samples-qm-nl) contains about 2k
79
+ real-world ingredient lists with fish found on the Dutch market. Each line contains one ingredient
80
+ list (newlines are encoded as `\n`, empty lines and those starting with `#` are ignored).
81
+
82
+
83
+ ## Species
84
+
85
+ This gem does very basic named entity recognition of fish names. There are far more fish names than the
86
+ parser can handle, so detection is limited to names actually found on packaged food products; at the moment
87
+ only a small number of names is recognised. To add more, expand the _species-found_ text
88
+ files in [species/](species/) and run `species/species-treetop-gen.sh`. This updates the fish name grammars.
89
+
90
+
91
+ ## License
92
+
93
+ This software is distributed under the [MIT license](LICENSE). Data may have a [different license](data/README.md).
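`parse(s).to_a` in the README example returns plain hashes and arrays, so downstream code can consume the result without knowing about Treetop. An empty array can mean either that no fish was found or that parsing failed, because `parse` returns `nil` on failure and `nil.to_a` is `[]`.

The following minimal sketch condenses each fish into a name, an origin and its FAO codes. It assumes the hash layout shown in the README example output. `summarize` and the `:caught`/`:farmed` labels are hypothetical and are not part of the gem:

```ruby
# Two entries copied from the README example output above.
SAMPLE_FISHES = [
  {
    names: [{ common: "zalm", latin: "salmo salar" }],
    catch_areas: [], catch_methods: [],
    aquaculture_areas: [{ text: "noorwegen", fao_codes: [] }],
    aquaculture_methods: [{ text: "kooien" }]
  },
  {
    names: [{ common: "kabeljauw", latin: "gadus macrocephalus" }],
    catch_areas: [{ text: "stille oceaan", fao_codes: ["67"] }],
    catch_methods: [],
    aquaculture_areas: [], aquaculture_methods: []
  }
].freeze

# Hypothetical helper: one compact record per detected fish.
def summarize(fishes)
  fishes.map do |fish|
    # Prefer the common name, fall back to the Latin one.
    name = fish[:names].map { |n| n[:common] || n[:latin] }.join(" / ")
    # Assumption: any catch details mean wild-caught, aquaculture details mean farmed.
    origin =
      if fish[:catch_areas].any? || fish[:catch_methods].any?
        :caught
      elsif fish[:aquaculture_areas].any? || fish[:aquaculture_methods].any?
        :farmed
      end
    fao_codes = (fish[:catch_areas] + fish[:aquaculture_areas]).flat_map { |a| a[:fao_codes] }.uniq
    { name: name, origin: origin, fao_codes: fao_codes }
  end
end

summarize(SAMPLE_FISHES)
# => [{name: "zalm", origin: :farmed, fao_codes: []},
#     {name: "kabeljauw", origin: :caught, fao_codes: ["67"]}]
```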
data/bin/food_fish_parser ADDED
@@ -0,0 +1,111 @@
1
+ #!/usr/bin/env ruby
2
+ #
3
+ # Parser for food fish lists.
4
+ #
5
+ require 'optparse'
6
+
7
+ $:.push(File.expand_path(File.dirname(__FILE__) + "/../lib"))
8
+ require 'food_fish_parser'
9
+
10
+ begin
11
+ require 'pry'
12
+ def pp(o, color: true)
13
+ if color
14
+ Pry::ColorPrinter.pp(o)
15
+ else
16
+ puts(o.inspect)
17
+ end
18
+ end
19
+ rescue LoadError
20
+ # fallback without color printing
21
+ def pp(o, color: nil)
22
+ puts(o.inspect)
23
+ end
24
+ end
25
+
26
+ def colorize(color, s)
27
+ if color
28
+ "\e[#{color}m#{s}\e[0;22m"
29
+ else
30
+ s
31
+ end
32
+ end
33
+
34
+ def parse_single(s, parsed=nil, parser:, verbosity: 1, print: nil, escape: false, color: false)
35
+ parsed ||= parser.parse(s)
36
+
37
+ return unless print.nil? || (parsed && print == :parsed) || (!parsed && print == :noresult)
38
+
39
+ puts colorize(color && "0;32", escape ? s.gsub("\n", "\\n") : s) if verbosity > 0
40
+
41
+ if parsed
42
+ puts(parsed.inspect) if verbosity > 1
43
+ pp(parsed.to_a, color: color) if verbosity > 0
44
+ return true
45
+ else
46
+ puts "(no result: #{parser.parser.failure_reason})" if verbosity > 0
47
+ return false
48
+ end
49
+ end
50
+
51
+ def parse_file(path, parser:, verbosity: 1, print: nil, escape: false, color: false)
52
+ count_parsed = count_noresult = 0
53
+ File.foreach(path) do |line|
54
+ next if line =~ /^#/ # comment
55
+ next if line =~ /^\s*$/ # empty line
56
+
57
+ line = line.gsub('\\n', "\n").strip
58
+ parsed = parser.parse(line)
59
+ count_parsed += 1 if parsed
60
+ count_noresult += 1 unless parsed
61
+
62
+ parse_single(line, parsed, parser: parser, verbosity: verbosity, print: print, escape: escape, color: color)
63
+ end
64
+
65
+ pct_parsed = 100.0 * count_parsed / (count_parsed + count_noresult)
66
+ pct_noresult = 100.0 * count_noresult / (count_parsed + count_noresult)
67
+ puts "parsed #{colorize(color && "1;32", count_parsed)} (#{pct_parsed.round(1)}%), no result #{colorize(color && "1;31", count_noresult)} (#{pct_noresult.round(1)}%)"
68
+ return count_noresult == 0 # true only when every line parsed
69
+ end
70
+
71
+ verbosity = 1
72
+ files = []
73
+ strings = []
74
+ print = nil
75
+ escape = false
76
+ color = true
77
+ OptionParser.new do |opts|
78
+ opts.banner = <<-EOF.gsub(/^ /, '')
79
+ Usage: #{$0} [options] --file|-f <filename>
80
+ #{$0} [options] --string|-s <text>
81
+
82
+ EOF
83
+
84
+ opts.on("-f", "--file FILE", "Parse all lines of the file as fish detail text.") {|f| files << f }
85
+ opts.on("-s", "--string TEXT", "Parse specified fish detail text.") {|s| strings << s }
86
+
87
+ opts.on("-q", "--[no-]quiet", "Only show summary.") {|q| verbosity = q ? 0 : 1 }
88
+ opts.on("-p", "--parsed", "Only show lines that were successfully parsed.") {|p| print = :parsed }
89
+ opts.on("-n", "--noresult", "Only show lines that had no result.") {|p| print = :noresult }
90
+ opts.on("-e", "--[no-]escape", "Escape newlines") {|e| escape = !!e }
91
+ opts.on("-c", "--[no-]color", "Use color") {|e| color = !!e }
92
+ opts.on("-v", "--[no-]verbose", "Show more data (parsed tree).") {|v| verbosity = v ? 2 : 1 }
93
+ opts.on( "--version", "Show program version.") do
94
+ puts("food_fish_parser v#{FoodFishParser::VERSION}")
95
+ exit
96
+ end
97
+ opts.on("-h", "--help", "Show this help") do
98
+ puts(opts)
99
+ exit
100
+ end
101
+ end.parse!
102
+
103
+ if strings.any? || files.any?
104
+ parser = FoodFishParser::Parser.new
105
+ success = true
106
+ strings.each {|s| success &= parse_single(s, parser: parser, verbosity: verbosity, print: print, escape: escape, color: color) }
107
+ files.each {|f| success &= parse_file(f, parser: parser, verbosity: verbosity, print: print, escape: escape, color: color) }
108
+ success or exit(1)
109
+ else
110
+ STDERR.puts("Please specify one or more --file or --string arguments (see --help).")
111
+ end
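Before parsing, `parse_file` above skips comment and blank lines and turns the literal two-character sequence `\n` back into a real newline. The sketch below pulls the same decoding out into a standalone function that works on any IO. `read_samples` is a hypothetical name and is not part of the gem:

```ruby
# Decode a sample file the way parse_file treats it: skip comment and
# blank lines, and turn literal "\n" sequences back into real newlines.
def read_samples(io)
  samples = []
  io.each_line do |line|
    next if line.start_with?("#") || line.strip.empty?
    samples << line.gsub('\\n', "\n").strip
  end
  samples
end
```

Usage would be `File.open(path) { |f| read_samples(f) }`. With a repository checkout, `food_fish_parser --quiet --file data/fish-ingredient-samples-qm-nl` runs the whole sample set through the real parser and prints only the parsed / no-result summary.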
data/food_fish_parser.gemspec ADDED
@@ -0,0 +1,30 @@
1
+ $:.unshift(File.expand_path(File.dirname(__FILE__) + '/lib'))
2
+ require 'food_fish_parser/version'
3
+
4
+ Gem::Specification.new do |s|
5
+ s.name = 'food_fish_parser'
6
+ s.version = FoodFishParser::VERSION
7
+ s.date = FoodFishParser::VERSION_DATE
8
+ s.summary = 'Parser for fish details found on food products.'
9
+ s.authors = ['wvengen']
10
+ s.email = ['dev-ruby@willem.engen.nl']
11
+ s.homepage = 'https://github.com/q-m/food-fish-parser-ruby'
12
+ s.license = 'MIT'
13
+ s.description = <<-EOD
14
+ Food products that contain fish sometimes indicate details like fishing
15
+ area, method or aquaculture country. This parser knows about various ways
16
+ this is found on a product package, and returns a structured representation
17
+ of the fish ingredient details.
18
+ EOD
19
+ s.metadata = {
20
+ 'bug_tracker_uri' => 'https://github.com/q-m/food-fish-parser-ruby/issues',
21
+ 'source_code_uri' => 'https://github.com/q-m/food-fish-parser-ruby',
22
+ }
23
+
24
+ s.files = `git ls-files *.gemspec lib`.split("\n")
25
+ s.executables = `git ls-files bin`.split("\n").map(&File.method(:basename))
26
+ s.extra_rdoc_files = ['README.md', 'LICENSE']
27
+ s.require_paths = ['lib']
28
+
29
+ s.add_runtime_dependency 'treetop', '~> 1.6'
30
+ end
data/lib/food_fish_parser.rb ADDED
@@ -0,0 +1,2 @@
1
+ require_relative 'food_fish_parser/version'
2
+ require_relative 'food_fish_parser/parser'
data/lib/food_fish_parser/grammar.rb ADDED
@@ -0,0 +1,18 @@
1
+ require 'treetop'
2
+ require_relative 'nodes'
3
+
4
+ # @todo find a way to auto-generate Ruby from Treetop files when building gem,
5
+ # see https://stackoverflow.com/q/37794587/2866660
6
+
7
+ # note that the species name files are autogenerated
8
+ Treetop.load File.dirname(__FILE__) + '/grammar/common'
9
+ Treetop.load File.dirname(__FILE__) + '/grammar/fish_name_latin'
10
+ Treetop.load File.dirname(__FILE__) + '/grammar/fish_name_nl'
11
+ Treetop.load File.dirname(__FILE__) + '/grammar/fish_name'
12
+ Treetop.load File.dirname(__FILE__) + '/grammar/words'
13
+ Treetop.load File.dirname(__FILE__) + '/grammar/fao_area'
14
+ Treetop.load File.dirname(__FILE__) + '/grammar/catch_area'
15
+ Treetop.load File.dirname(__FILE__) + '/grammar/catch_method'
16
+ Treetop.load File.dirname(__FILE__) + '/grammar/aquac_area'
17
+ Treetop.load File.dirname(__FILE__) + '/grammar/aquac_method'
18
+ Treetop.load File.dirname(__FILE__) + '/grammar/root'
data/lib/food_fish_parser/grammar/aquac_area.treetop ADDED
@@ -0,0 +1,27 @@
1
+
2
+ module FoodFishParser::Grammar
3
+ grammar AquacArea
4
+ include Common
5
+ include Words
6
+ include FaoArea
7
+
8
+ rule aquac_area_indicator
9
+ (
10
+ 'uit'i / 'gekweekt in'i / 'gekweekt op'i /
11
+ 'aquacultuurproduct uit'i / 'aquacultuur product uit'i
12
+ )
13
+ !char
14
+ ( ws* ( ':' / '>' ) )?
15
+ end
16
+
17
+ rule aquac_area_content
18
+ (
19
+ ( area:( words ) ( ws* comma? ws* fao_area_list_enclosures )? ) /
20
+ ( fao_area_list_enclosures ws* comma? ws* area:( words ) ) /
21
+ fao_area_list_enclosures area:''
22
+ )
23
+ <AquacAreaNode>
24
+ end
25
+
26
+ end
27
+ end
data/lib/food_fish_parser/grammar/aquac_method.treetop ADDED
@@ -0,0 +1,18 @@
1
+ module FoodFishParser::Grammar
2
+ grammar AquacMethod
3
+ include Common
4
+ include Words
5
+
6
+ rule aquac_method_indicator
7
+ ( 'wijze van vangst en kweekmethode: aquacultuur:'i / 'kweekmethoden'i / 'kweekmethode'i )
8
+ !char
9
+ ( ws* ( ':' / '>' ) )?
10
+ end
11
+
12
+ rule aquac_method_content
13
+ words
14
+ <AquacMethodNode>
15
+ end
16
+
17
+ end
18
+ end
data/lib/food_fish_parser/grammar/catch_area.treetop ADDED
@@ -0,0 +1,32 @@
1
+ module FoodFishParser::Grammar
2
+ grammar CatchArea
3
+ include Common
4
+ include Words
5
+ include FaoArea
6
+
7
+ rule catch_area_indicator
8
+ ( 'wild'i ws* )?
9
+ (
10
+ 'gevangen'i ws+ ( 'in'i / 'op'i ) /
11
+ 'visgebied'i / 'vangstgebied'i / 'vangsgebied'i /
12
+ 'betrapt bij'i
13
+ )
14
+ !char
15
+ ( ws* ( ':' / '>' ) )?
16
+ end
17
+
18
+ rule catch_area_indicator_short
19
+ catch_area_indicator /
20
+ ( 'in'i / 'op'i ) !char ( ws* ':' )?
21
+ end
22
+
23
+ rule catch_area_content
24
+ (
25
+ ( area:( words_no_with ) ( ws* comma? ws* fao_area_list_enclosures )? ) /
26
+ ( fao_area_list_enclosures ws* comma? ws* area:( words_no_with ) ) /
27
+ fao_area_list_enclosures area:''
28
+ )
29
+ <CatchAreaNode>
30
+ end
31
+ end
32
+ end
data/lib/food_fish_parser/grammar/catch_method.treetop ADDED
@@ -0,0 +1,24 @@
1
+ module FoodFishParser::Grammar
2
+ grammar CatchMethod
3
+ include Common
4
+ include Words
5
+
6
+ rule catch_method_indicator
7
+ ( 'wild'i ws* )?
8
+ ( 'gevangen'i ws+ with / 'vangstmethode'i / 'vangsmethode'i )
9
+ !char
10
+ ( ws* ( ':' / '>' ) )?
11
+ end
12
+
13
+ rule catch_method_indicator_short
14
+ catch_method_indicator /
15
+ with ( ws* ':' )?
16
+ end
17
+
18
+ rule catch_method_content
19
+ words_no_in_on
20
+ <CatchMethodNode>
21
+ end
22
+
23
+ end
24
+ end
data/lib/food_fish_parser/grammar/common.treetop ADDED
@@ -0,0 +1,38 @@
1
+ module FoodFishParser::Grammar
2
+ grammar Common
3
+
4
+ # whitespace
5
+ rule ws
6
+ [ \t]
7
+ end
8
+
9
+ rule char
10
+ [[:alnum:]] / [-]
11
+ end
12
+
13
+ rule comma
14
+ ','
15
+ end
16
+
17
+ rule dash
18
+ [-֊ ‐ ‑ ‒ – — ― ﹘﹣-]
19
+ end
20
+
21
+
22
+ rule and
23
+ ( 'and'i / 'en'i / 'und'i ) !char / '&'
24
+ end
25
+
26
+ rule or
27
+ ( 'or'i / 'of'i / 'oder'i ) !char / '/'
28
+ end
29
+
30
+ rule and_or
31
+ ( ( 'and/or'i / 'en/of'i ) !char ) / and / or
32
+ end
33
+
34
+ rule with
35
+ ( 'met'i / 'd.m.v'i '.'? / 'with'i ) !char
36
+ end
37
+ end
38
+ end
data/lib/food_fish_parser/grammar/fao_area.treetop ADDED
@@ -0,0 +1,60 @@
1
+ module FoodFishParser::Grammar
2
+ grammar FaoArea
3
+ include Common
4
+
5
+ rule fao_area_list_enclosures
6
+ ( '(' ws* fao_area_list ws* ')' ) /
7
+ ( '|' ws* fao_area_list ) /
8
+ fao_area_list
9
+ end
10
+
11
+ rule fao_area_list
12
+ fao_area_indicator ws*
13
+ ':'? ws*
14
+ fao_area_code
15
+ (
16
+ ( '/' fao_area_code )+ /
17
+ ( ( ',' ws* fao_area_code )+ ws* comma? ws* and ws+ fao_area_code ) /
18
+ ( ',' ws* fao_area_code )+
19
+ )?
20
+ end
21
+
22
+ rule fao_area_indicator
23
+ ( 'FAO'i / 'FA0'i )
24
+ ( ( dash / ws+ ) 'gebied'i )? ( ws* 'nr'i '.'? )?
25
+ end
26
+
27
+ rule fao_area_code
28
+ fao_area_major_code
29
+ (
30
+ ( ws* '(' ws* fao_area_sub_range ws* ')' ) /
31
+ ( fao_area_sub_range )
32
+ )?
33
+ <FaoAreaCodeNode>
34
+ end
35
+
36
+ rule fao_area_major_code
37
+ ( '0' [0-9] [0-9] ) / ( [0-9] [0-9] )
38
+ end
39
+
40
+ rule fao_area_sub_range
41
+ fao_area_sub_code
42
+ ( ws* dash ws* fao_area_sub_code )?
43
+ end
44
+
45
+ rule fao_area_sub_code
46
+ (
47
+ ( ( dash / '/' / ws* )? [ivxIVX]+ ) /
48
+ ( dash [0-9] [0-9] )
49
+ )
50
+ fao_area_suffix?
51
+ (
52
+ ws* '(' ws* [[:digit:]]+ ( ws* dash ws* [[:digit:]]+ )? ws* ')'
53
+ )?
54
+ end
55
+
56
+ rule fao_area_suffix
57
+ [abcdABCD]
58
+ end
59
+ end
60
+ end
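The `fao_area_list` rule accepts an `FAO` indicator (or the misprint `FA0`), optionally followed by `gebied` and `nr.`. After that come one or more major area codes joined by `/`, commas or a conjunction such as `en`. For quick experiments outside Treetop, a rough regular-expression approximation of the major-code part might look like the sketch below.

This is an illustration only, and it is looser than the grammar in a few ways:

- It ignores sub-areas such as `IV a`.
- It recognises only ASCII hyphens rather than the full `dash` rule.
- It accepts `en` without a preceding comma.

`FAO_AREA_LIST` and `fao_major_codes` are hypothetical names:

```ruby
# Rough approximation of the fao_area_list grammar rule, major codes only.
FAO_AREA_LIST = /
  \b fa[o0]                 # "FAO", or the misprint "FA0"
  (?: [\s-]* gebied )?      # optional "gebied" or "-gebied"
  (?: \s* nr \.? )?         # optional "nr" or "nr."
  \s* :? \s*
  (                         # capture the whole list of codes
    0?\d\d                  # first major area code, e.g. 27 or 027
    (?: \s* (?: \/ | , | ,?\s*(?:en|and)\s ) \s* 0?\d\d )*   # more codes after a slash, comma, en or and
  )
/xi

# Hypothetical helper: all FAO major area codes mentioned in a text.
def fao_major_codes(text)
  text.scan(FAO_AREA_LIST).flat_map { |(list)| list.scan(/0?\d\d/) }
end
```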
data/lib/food_fish_parser/grammar/fish_name.treetop ADDED
@@ -0,0 +1,21 @@
1
+ module FoodFishParser::Grammar
2
+ grammar FishName
3
+ include Common
4
+ include FishNameLatin
5
+ include FishNameNL
6
+
7
+ rule fish_name
8
+ (
9
+ fish_name_nl ws* '(' ws* fish_name_latin ws* ')' /
10
+ fish_name_nl /
11
+ fish_name_latin
12
+ )
13
+ <FishNameNode>
14
+ end
15
+
16
+ rule fish_name_list
17
+ fish_name ( ws+ and_or ws+ fish_name )*
18
+ end
19
+
20
+ end
21
+ end
data/lib/food_fish_parser/grammar/fish_name_latin.treetop ADDED
@@ -0,0 +1,19 @@
1
+ # autogenerated by species-treetop-gen-latin.rb on 2020-03-17
2
+ module FoodFishParser::Grammar
3
+ grammar FishNameLatin
4
+ include Common
5
+
6
+ rule fish_name_latin
7
+ fish_name_latin_first ( ws+ fish_name_latin_second )?
8
+ <FishNameLatinNode>
9
+ end
10
+
11
+ rule fish_name_latin_first
12
+ 'zygochlamys'i / 'zeus'i / 'xiphopenaeus'i / 'xiphias'i / 'undaria'i / 'ulva'i / 'trichiurus'i / 'trachurus'i / 'todarodes'i / 'thunnus'i / 'theragra'i / 'stolephorus'i / 'sprattus'i / 'spirulina'i / 'sparus'i / 'solea'i / 'sepiella'i / 'sepia'i / 'sebastes'i / 'scomber'i / 'sardinella'i / 'sardina'i / 'salmo'i / 'saccharina'i / 'reinhardtius'i / 'psetta'i / 'procambarus'i / 'portunus'i / 'porphyra'i / 'pollachius'i / 'pleuronectes'i / 'pleoticus'i / 'placopecten'i / 'phymatolithon'i / 'perna'i / 'penaeus'i / 'penaeidae'i / 'pelvetia'i / 'pecten'i / 'patinopecten'i / 'parapenaeopsis'i / 'paralomis'i / 'paphia'i / 'pangasius'i / 'pandalus'i / 'palmaria'i / 'pagellus'i / 'pacifische'i / 'ovalipes'i / 'ostrea'i / 'oreochromis'i / 'oncorhynchus'i / 'octopus'i / 'nephrops'i / 'nemipterus'i / 'nelumbo'i / 'mytilus'i / 'mulinia'i / 'micromesistius'i / 'metapenaeus'i / 'merluccius'i / 'merlangius'i / 'melanogrammus'i / 'macruronus'i / 'macrocystis'i / 'lophius'i / 'loligo'i / 'litopenaeus'i / 'lithodes'i / 'limanda'i / 'lethrinus'i / 'lepidotrigla'i / 'lepidopsetta'i / 'lates'i / 'laminaria'i / 'katsuwonus'i / 'illex'i / 'homarus'i / 'himanthalia'i / 'haematococcus'i / 'gracilaria'i / 'gelidium'i / 'gadus'i / 'fucus'i / 'euthynnus'i / 'ensis'i / 'engraulis'i / 'dunaliella'i / 'dosidicus'i / 'dicentrarchus'i / 'crassostrea'i / 'crangon'i / 'clupea'i / 'clarias'i / 'chondrus'i / 'chlorella'i / 'cerastoderma'i / 'caulerpa'i / 'ascophyllum'i / 'anguilla'i / 'anadara'i / 'alle'i / 'alaria'i / 'acipenser'i
13
+ end
14
+
15
+ rule fish_name_latin_second
16
+ 'yessoensis'i / 'vulgaris'i / 'virens'i / 'vesiculosus'i / 'verrucosa'i / 'vannamei'i / 'undulata'i / 'umbilicalis'i / 'tenera'i / 'stylifera'i / 'sprattus'i / 'spp.'i / 'spp'i / 'solea'i / 'scombrus'i / 'santolla'i / 'salina'i / 'salar'i / 'ringens'i / 'pyrifera'i / 'pyrenoidosa'i / 'punctatus'i / 'productus'i / 'pluvialis'i / 'platessa'i / 'platensis'i / 'piscatorius'i / 'pinnatifida'i / 'pilchardus'i / 'pelamis'i / 'pelagicus'i / 'patagonica'i / 'pangasius'i / 'palmata'i / 'pacificus'i / 'officinalis'i / 'ocellatus'i / 'nucifera'i / 'novaezelandiae'i / 'norvegicus'i / 'nodosum'i / 'niloticus'i / 'nerka'i / 'mykiss'i / 'murphyi'i / 'muelleri'i / 'morhua'i / 'monodon'i / 'monoceros'i / 'microptera'i / 'merluccius'i / 'merlangus'i / 'merguiensis'i / 'maximus'i / 'maxima'i / 'marinus'i / 'magellanicus'i / 'macrocephalus'i / 'limanda'i / 'lepturus'i / 'lentillifera'i / 'latissima'i / 'lactuca'i / 'labrax'i / 'kroyeri'i / 'kisutch'i / 'keta'i / 'kabeljauw'i / 'jordani'i / 'japonicus'i / 'japonica'i / 'hippoglossoides'i / 'hexodon'i / 'harengus'i / 'gueldenstaedtii'i / 'granulosa'i / 'gorbuscha'i / 'gladius'i / 'gigas'i / 'gibbosa'i / 'gariepinus'i / 'galloprovincialis'i / 'faber'i / 'esculenta'i / 'encrasicolus'i / 'elongata'i / 'edulis'i / 'edule'i / 'directus'i / 'digitata'i / 'crispus'i / 'crangon'i / 'clarkii'i / 'chilensis'i / 'chalcogramma'i / 'capensis'i / 'canaliculus'i / 'canaliculata'i / 'calcareum'i / 'borealis'i / 'bogaraveo'i / 'bilineata'i / 'australis'i / 'aurata'i / 'argentinus'i / 'antiquata'i / 'anguilla'i / 'anchoita'i / 'americanus'i / 'alle'i / 'albacares'i / 'alalunga'i / 'aeglefinus'i
17
+ end
18
+ end
19
+ end
data/lib/food_fish_parser/grammar/fish_name_nl.treetop ADDED
@@ -0,0 +1,27 @@
1
+ # autogenerated by species-treetop-gen-nl.rb on 2020-03-17
2
+ module FoodFishParser::Grammar
3
+ grammar FishNameNL
4
+ include Common
5
+
6
+ rule fish_name_nl
7
+ ( fish_name_nl_area ws+ )? ( fish_name_nl_attr ws* )? fish_name_nl_name fish_name_nl_suffix?
8
+ <FishNameCommonNode>
9
+ end
10
+
11
+ rule fish_name_nl_area
12
+ 'pacifische'i / 'indische'i / 'groenlandse'i / 'atlantische'i / 'argentijnse'i / 'alaska'i
13
+ end
14
+
15
+ rule fish_name_nl_attr
16
+ 'zwarte'i / 'zwart'i / 'witte'i / 'witpoot'i / 'wit'i / 'roze'i / 'rood'i / 'rode'i / 'rivier'i / 'pijl'i / 'kleine'i / 'klein'i / 'grote'i / 'groot'i / 'groene'i / 'groen'i / 'doorn'i / 'coho'i / 'chum'i / 'blauwe'i / 'blauw'i
17
+ end
18
+
19
+ rule fish_name_nl_name
20
+ 'zonnevis'i / 'zeewolf'i / 'zeesnoek'i / 'zeekreeft'i / 'zeeforel'i / 'zeebaars'i / 'zalm'i / 'wijting'i / 'weekdieren'i / 'weekdier'i / 'vintonijn'i / 'tonijn'i / 'tong'i / 'tilapia'i / 'tarbot'i / 'tapijtschelp'i / 'sprot'i / 'spie'i / 'snotolf'i / 'snoekbaars'i / 'snoek'i / 'skipjack tonijn'i / 'schol'i / 'schelvis'i / 'schelpen'i / 'schelp'i / 'schar'i / 'sardines'i / 'regenboogforel'i / 'raat'i / 'poon'i / 'pollak'i / 'pangasius'i / 'paling'i / 'oogtonijn'i / 'mul'i / 'mosselen'i / 'mossel'i / 'meerval'i / 'mantelschelp'i / 'makreel'i / 'lom'i / 'leng'i / 'kreeft'i / 'koolvis'i / 'kokkel'i / 'karper'i / 'kabeljauw'i / 'hondstong'i / 'hoki'i / 'heilbot'i / 'heek'i / 'hake'i / 'haai'i / 'ha'i / 'gruis'i / 'griet'i / 'geep'i / 'geelvintonijn'i / 'garnalen'i / 'garnaal'i / 'fint'i / 'coquilles'i / 'cocquilles'i / 'botervis'i / 'bot'i / 'beekridder'i / 'barracuda'i / 'baars'i / 'arkschelp'i / 'ansjovis'i / 'albacore tonijn'i
21
+ end
22
+
23
+ rule fish_name_nl_suffix
24
+ 'vlees'i / 'ringen'i / 'ring'i / 'filets'i / 'filet'i
25
+ end
26
+ end
27
+ end
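The generated name rules list their alternatives in reverse alphabetical order, which is likely deliberate. A PEG ordered choice commits to the first alternative that matches. If `'zwart'i` were listed before `'zwarte'i`, it would match the first five letters of "zwarte" and leave the `e` unconsumed. Sorting in reverse puts every name ahead of any other name that is a prefix of it.

The following minimal sketch generates such a rule body from a word list. `treetop_choice` is hypothetical and is not the gem's `species-treetop-gen` script. It assumes the names contain no single quotes:

```ruby
# Hypothetical helper: builds a Treetop ordered choice of case-insensitive
# literals. Reverse sorting guarantees that a longer name such as "zwarte"
# is tried before its prefix "zwart".
def treetop_choice(names)
  names.map { |name| name.strip.downcase }
       .reject(&:empty?)
       .uniq
       .sort
       .reverse
       .map { |name| "'#{name}'i" }
       .join(" / ")
end
```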
data/lib/food_fish_parser/grammar/root.treetop ADDED
@@ -0,0 +1,55 @@
1
+ module FoodFishParser::Grammar
2
+ grammar Root
3
+ include Common
4
+ include FishName
5
+ include CatchArea
6
+ include CatchMethod
7
+ include AquacArea
8
+ include AquacMethod
9
+
10
+ rule root
11
+ fishes:(
12
+ ( fish ( ws* and_or ws* fish )+ ) /
13
+ ( fish ( ws* ( '.' / comma ) ws* fish )+ ) /
14
+ fish
15
+ )
16
+ ( ws* '.' )?
17
+ <RootNode>
18
+ end
19
+
20
+ rule fish
21
+ (
22
+ ( fish_name_list ( ws* ( comma / ':' ) )? ws+ fish_catch_info ) /
23
+ ( fish_name_list ( ws* ( comma / ':' ) )? ws+ fish_aquac_info ) /
24
+ fish_name_list /
25
+ fish_catch_info /
26
+ fish_aquac_info
27
+ )
28
+ <FishNode>
29
+ end
30
+
31
+ rule fish_catch_info
32
+ (
33
+ catch_method_indicator ws* catch_method_content
34
+ ( ( ws* comma )? ws+ catch_area_indicator_short ws* catch_area_content )?
35
+ ) / (
36
+ catch_area_indicator ws* catch_area_content
37
+ ( ( ws* comma )? ws+ catch_method_indicator_short ws* catch_method_content )?
38
+ )
39
+ end
40
+
41
+ rule fish_aquac_info
42
+ (
43
+ aquac_area_indicator ws* aquac_area_content
44
+ ws* '.' ws* aquac_method_indicator ws* aquac_method_content
45
+ ) / (
46
+ aquac_area_indicator ws* aquac_area_content
47
+ ( ( ws* comma )? ws+ aquac_method_indicator ws* aquac_method_content )?
48
+ ) / (
49
+ aquac_method_indicator ws* aquac_method_content
50
+ ( ( ws* comma )? ws+ aquac_area_indicator ws* aquac_area_content )?
51
+ )
52
+ end
53
+
54
+ end
55
+ end
data/lib/food_fish_parser/grammar/words.treetop ADDED
@@ -0,0 +1,52 @@
1
+ module FoodFishParser::Grammar
2
+ grammar Words
3
+ include Common
4
+ include FishNameLatin
5
+ include FishNameNL
6
+
7
+ rule word
8
+ word_abbr / '(sub)'i? !words_to_avoid char+
9
+ end
10
+
11
+ rule words
12
+ word ( word_sep word )*
13
+ end
14
+
15
+ rule words_no_in_on
16
+ !( ( 'in'i / 'op'i ) !char ) word ( word_sep !( ( 'in'i / 'op'i ) !char ) word )*
17
+ end
18
+
19
+ rule words_no_with
20
+ !with word ( word_sep !with word )*
21
+ end
22
+
23
+ rule word_sep
24
+ ( ws* ( comma / '/' ) ws* ) / ws+
25
+ end
26
+
27
+ rule word_abbr
28
+ ( [a-zA-Z] '.' )+ [a-zA-Z] / [a-zA-Z] '.' ( [a-zA-Z] '.' )+ ![a-zA-Z]
29
+ end
30
+
31
+ # these words should not be considered, because they indicate a new section
32
+ rule words_to_avoid
33
+ (
34
+ fish_name_latin /
35
+ fish_name_nl /
36
+ 'gevangen'i /
37
+ 'visgebied'i /
38
+ 'vangstgebied'i /
39
+ 'vangstmethode'i /
40
+ 'vangsmethode'i /
41
+ 'betrapt'i /
42
+ 'gekweekt'i /
43
+ 'kweekmethode'i /
44
+ 'kweekmethoden'i /
45
+ 'd.m.v'i '.'? /
46
+ 'FAO'i /
47
+ 'FA0'i
48
+ )
49
+ ![[:alpha:]]
50
+ end
51
+ end
52
+ end
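Treetop repetitions are greedy and never give characters back, which is why `word_abbr` needs two alternatives. For `d.m.v.`, the first alternative consumes all three letter-dot pairs, finds no closing letter and fails as a whole. The second alternative then matches the full abbreviation, including its final dot.

Ruby regexps backtrack by default, so a faithful translation needs possessive quantifiers (`++`). The sketch below is for experimenting with the rule outside Treetop. The constant and method names are hypothetical:

```ruby
# Translation of the word_abbr rule; "++" is possessive, so like a PEG
# repetition it never backtracks into fewer letter-dot pairs.
WORD_ABBR = /\A(?:(?:[a-zA-Z]\.)++[a-zA-Z]|[a-zA-Z]\.(?:[a-zA-Z]\.)++(?![a-zA-Z]))/

# Same first alternative with an ordinary backtracking "+", for comparison.
WORD_ABBR_BACKTRACKING = /\A(?:[a-zA-Z]\.)+[a-zA-Z]/

# Returns the abbreviation at the start of text, or nil.
def abbr_prefix(text, pattern = WORD_ABBR)
  text[pattern]
end

abbr_prefix("d.m.v. lijn")                         # => "d.m.v."
abbr_prefix("d.m.v. lijn", WORD_ABBR_BACKTRACKING) # => "d.m.v"
```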
data/lib/food_fish_parser/nodes.rb ADDED
@@ -0,0 +1,91 @@
1
+ require 'treetop/runtime'
2
+
3
+ # Needs to be in grammar namespace so Treetop can find the nodes.
4
+ module FoodFishParser
5
+ module Grammar
6
+
7
+ # Additions for Treetop nodes, include this in other nodes where needed.
8
+ module SyntaxNodeAdditions
9
+ def to_a_deep(n, cls)
10
+ if n.is_a?(cls)
11
+ [n]
12
+ elsif n.nonterminal?
13
+ n.elements.map {|m| to_a_deep(m, cls) }.flatten(1).compact
14
+ end
15
+ end
16
+ end
17
+
18
+ # Root object, contains everything else.
19
+ module RootNode
20
+ include SyntaxNodeAdditions
21
+ def to_a
22
+ to_a_deep(fishes, FishNode).map(&:to_h)
23
+ end
24
+ end
25
+
26
+ module FishNode
27
+ include SyntaxNodeAdditions
28
+ def to_h
29
+ {
30
+ names: to_a_deep(self, FishNameNode).map(&:to_h),
31
+ catch_areas: to_a_deep(self, CatchAreaNode).map(&:to_h),
32
+ catch_methods: to_a_deep(self, CatchMethodNode).map(&:to_h),
33
+ aquaculture_areas: to_a_deep(self, AquacAreaNode).map(&:to_h),
34
+ aquaculture_methods: to_a_deep(self, AquacMethodNode).map(&:to_h),
35
+ }
36
+ end
37
+ end
38
+
39
+ module FishNameNode
40
+ include SyntaxNodeAdditions
41
+ def to_h
42
+ {
43
+ common: to_a_deep(self, FishNameCommonNode).first&.text_value,
44
+ latin: to_a_deep(self, FishNameLatinNode).first&.text_value
45
+ }
46
+ end
47
+ end
48
+
49
+ module FishNameCommonNode; end
50
+ module FishNameLatinNode; end
51
+
52
+ module CatchAreaNode
53
+ include SyntaxNodeAdditions
54
+ def to_h
55
+ {
56
+ text: area.text_value,
57
+ fao_codes: to_a_deep(self, FaoAreaCodeNode).map(&:text_value)
58
+ }
59
+ end
60
+ end
61
+
62
+ module FaoAreaCodeNode; end
63
+
64
+ module CatchMethodNode
65
+ def to_h
66
+ {
67
+ text: self.text_value
68
+ }
69
+ end
70
+ end
71
+
72
+ module AquacAreaNode
73
+ include SyntaxNodeAdditions
74
+ def to_h
75
+ {
76
+ text: area.text_value,
77
+ fao_codes: to_a_deep(self, FaoAreaCodeNode).map(&:text_value)
78
+ }
79
+ end
80
+ end
81
+
82
+ module AquacMethodNode
83
+ def to_h
84
+ {
85
+ text: self.text_value
86
+ }
87
+ end
88
+ end
89
+
90
+ end
91
+ end
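`to_a_deep` collects, in document order, the outermost nodes that carry a given node module. The starting node itself can be one of them. It stops descending as soon as it finds a match, so a name nested inside another name is not reported twice.

The sketch below exercises a copy of it against a tiny hand-built tree. `FakeNode` and `NameMarker` are hypothetical stand-ins for Treetop's `SyntaxNode` and for modules such as `FishNameNode`. They implement only `elements`, `nonterminal?` and `text_value`:

```ruby
# Hypothetical stand-in for a Treetop syntax node: terminals have no elements.
FakeNode = Struct.new(:text_value, :elements) do
  def nonterminal?
    !elements.nil?
  end
end

# Stand-in for a node module such as FishNameNode.
module NameMarker; end

# Copy of SyntaxNodeAdditions#to_a_deep from nodes.rb.
def to_a_deep(n, cls)
  if n.is_a?(cls)
    [n]
  elsif n.nonterminal?
    n.elements.map { |m| to_a_deep(m, cls) }.flatten(1).compact
  end
end

inner = FakeNode.new("kabeljauw", nil).extend(NameMarker)
tree = FakeNode.new("zalm en kabeljauwfilet", [
  FakeNode.new("zalm", nil).extend(NameMarker),
  FakeNode.new(" en ", nil),
  FakeNode.new("kabeljauwfilet", [inner, FakeNode.new("filet", nil)]).extend(NameMarker)
])

to_a_deep(tree, NameMarker).map(&:text_value) # => ["zalm", "kabeljauwfilet"]
```

Unmatched terminals return `nil`, which is why the collected list is compacted.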
data/lib/food_fish_parser/parser.rb ADDED
@@ -0,0 +1,26 @@
1
+ require_relative 'grammar'
2
+
3
+ module FoodFishParser
4
+ class Parser
5
+
6
+ # @!attribute [r] parser
7
+ # @return [Treetop::Runtime::CompiledParser] low-level parser object
8
+ # @note This attribute is there for convenience, but may change in the future. Take care.
9
+ attr_reader :parser
10
+
11
+ # Create a new fish detail parser
12
+ # @return [FoodFishParser::Parser]
13
+ def initialize
14
+ @parser = Grammar::RootParser.new
15
+ end
16
+
17
+ # Parse food fish text into a structured representation.
18
+ #
19
+ # @return [FoodFishParser::Grammar::RootNode, nil] structured representation of fish details, or nil if the text could not be parsed
19
+ # @note Unrecognized options are passed to Treetop, but this is not guaranteed to remain so forever.
21
+ def parse(s, **options)
22
+ @parser.parse(s, **options)
23
+ end
24
+
25
+ end
26
+ end
data/lib/food_fish_parser/version.rb ADDED
@@ -0,0 +1,4 @@
1
+ module FoodFishParser
2
+ VERSION = '0.1.0'
3
+ VERSION_DATE = '2020-03-17'
4
+ end
metadata ADDED
@@ -0,0 +1,86 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: food_fish_parser
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.1.0
5
+ platform: ruby
6
+ authors:
7
+ - wvengen
8
+ autorequire:
9
+ bindir: bin
10
+ cert_chain: []
11
+ date: 2020-03-17 00:00:00.000000000 Z
12
+ dependencies:
13
+ - !ruby/object:Gem::Dependency
14
+ name: treetop
15
+ requirement: !ruby/object:Gem::Requirement
16
+ requirements:
17
+ - - "~>"
18
+ - !ruby/object:Gem::Version
19
+ version: '1.6'
20
+ type: :runtime
21
+ prerelease: false
22
+ version_requirements: !ruby/object:Gem::Requirement
23
+ requirements:
24
+ - - "~>"
25
+ - !ruby/object:Gem::Version
26
+ version: '1.6'
27
+ description: |2
28
+ Food products that contain fish sometimes indicate details like fishing
29
+ area, method or aquaculture country. This parser knows about various ways
30
+ this is found on a product package, and returns a structured representation
31
+ of the fish ingredient details.
32
+ email:
33
+ - dev-ruby@willem.engen.nl
34
+ executables:
35
+ - food_fish_parser
36
+ extensions: []
37
+ extra_rdoc_files:
38
+ - README.md
39
+ - LICENSE
40
+ files:
41
+ - LICENSE
42
+ - README.md
43
+ - bin/food_fish_parser
44
+ - food_fish_parser.gemspec
45
+ - lib/food_fish_parser.rb
46
+ - lib/food_fish_parser/grammar.rb
47
+ - lib/food_fish_parser/grammar/aquac_area.treetop
48
+ - lib/food_fish_parser/grammar/aquac_method.treetop
49
+ - lib/food_fish_parser/grammar/catch_area.treetop
50
+ - lib/food_fish_parser/grammar/catch_method.treetop
51
+ - lib/food_fish_parser/grammar/common.treetop
52
+ - lib/food_fish_parser/grammar/fao_area.treetop
53
+ - lib/food_fish_parser/grammar/fish_name.treetop
54
+ - lib/food_fish_parser/grammar/fish_name_latin.treetop
55
+ - lib/food_fish_parser/grammar/fish_name_nl.treetop
56
+ - lib/food_fish_parser/grammar/root.treetop
57
+ - lib/food_fish_parser/grammar/words.treetop
58
+ - lib/food_fish_parser/nodes.rb
59
+ - lib/food_fish_parser/parser.rb
60
+ - lib/food_fish_parser/version.rb
61
+ homepage: https://github.com/q-m/food-fish-parser-ruby
62
+ licenses:
63
+ - MIT
64
+ metadata:
65
+ bug_tracker_uri: https://github.com/q-m/food-fish-parser-ruby/issues
66
+ source_code_uri: https://github.com/q-m/food-fish-parser-ruby
67
+ post_install_message:
68
+ rdoc_options: []
69
+ require_paths:
70
+ - lib
71
+ required_ruby_version: !ruby/object:Gem::Requirement
72
+ requirements:
73
+ - - ">="
74
+ - !ruby/object:Gem::Version
75
+ version: '0'
76
+ required_rubygems_version: !ruby/object:Gem::Requirement
77
+ requirements:
78
+ - - ">="
79
+ - !ruby/object:Gem::Version
80
+ version: '0'
81
+ requirements: []
82
+ rubygems_version: 3.0.3
83
+ signing_key:
84
+ specification_version: 4
85
+ summary: Parser for fish details found on food products.
86
+ test_files: []