fop_lang 0.4.0 → 0.8.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 8ed95bb4708820a186e6485cc29dbb47286b0f309a1caf91af7778d768b0efb3
4
- data.tar.gz: 85d41728ddae13f3667f0a2d55a5c4dcbc26e8217d4f31466e6ed92038859881
3
+ metadata.gz: e23d8d937f5a4b5e4d74010bb91923dedce019543d4d3baefc228dece938a731
4
+ data.tar.gz: cc97f6953b708498be169352269b861c73c9dbe52ded1a72f4370a8d18d32d48
5
5
  SHA512:
6
- metadata.gz: e650bdf66d8d0b5dcb603eae494f38d4969a19647053b4e81e0612705f5b16a5755d007e4ce01fad9b487224d66eff462738f1e37b011ba2a4bf4a45b0203bb3
7
- data.tar.gz: 99b31736236785cecc85b9bb23ccc3e366713cd23dd04c992a8fff6676a316d5741bfe5058e17cd0812c43d8bf1979aa7546184386018c1f92ed2462a85eb5fb
6
+ metadata.gz: e2cec9cd47a472298f7af0268a9dc03aacce374ed88da7b505e33cb4536f6f1d04107cce7c33eba4718809d54591a3111bfd26971eef3c52073ba1226be4da4f
7
+ data.tar.gz: 80b5700d0cdda44dd021fe48d5c134cb992c6967b10681c43488a5a7276fbf03df7d7a9427a9aa92529569eaf0d134fa789df0c8e27cd2250dc50bcb16727d13
data/README.md CHANGED
@@ -1,57 +1,107 @@
1
1
  # fop_lang
2
2
 
3
- Fop (Filter and OPerations language) is an experimental, tiny expression language in the vein of awk and sed. This is a Ruby implementation. It is useful for simultaneously matching and transforming text input.
3
+ Fop (Filter and OPerations language) is a tiny, experimental language for filtering and operating on text. Think of it like awk but with the condition and action segments combined.
4
4
 
5
- ```ruby
6
- gem 'fop_lang'
7
- ```
5
+ This is a Ruby implementation with both a library interface and a bin command.
8
6
 
9
- ## Release Number Example
7
+ ## Installation
10
8
 
11
- This example takes in GitHub branch names, decides if they're release branches, and if so, increments the version number.
9
+ ```bash
10
+ $ gem install fop_lang
11
+ ```
12
+
13
+ You may use fop in a Ruby script:
12
14
 
13
15
  ```ruby
14
- f = Fop('release-{N}.{N+1}.{N=0}')
16
+ require 'fop_lang'
15
17
 
16
- puts f.apply('release-5.99.1')
17
- => 'release-5.100.0'
18
+ f = Fop('foo {N+1}')
19
+ f.apply('foo 1')
20
+ => "foo 2"
21
+ f.apply('bar 1')
22
+ => nil
23
+ ```
18
24
 
19
- puts f.apply('release-5')
20
- => nil
21
- # doesn't match the pattern
25
+ or run `fop` from the command line:
26
+
27
+ ```bash
28
+ $ echo 'foo 1' | fop 'foo {N+1}'
29
+ foo 2
30
+ $ echo 'bar 1' | fop 'foo {N+1}'
22
31
  ```
23
32
 
24
- ## Anatomy of a Fop expression
33
+ ## Syntax
34
+
35
+ `Text /(R|r)egex/ {N+1}`
36
+
37
+ The above program demonstrates a text match, a regex match, and a match expression. If the input matches all three segments, output is given. If the input was `Text regex 5`, the output would be `Text regex 6`.
38
+
39
+ ### Text match
40
+
41
+ `Text ` and ` ` in the above example.
42
+
43
+ The input must match this text exactly. Whitespace is part of the match. Wildcards (`*`) are allowed. Special characters (`*/{}\`) may be escaped with `\`.
44
+
45
+ The output of a text match will be the matching input.
46
+
47
+ ### Regex match
48
+
49
+ `/(R|r)egex/` in the above example.
50
+
51
+ Regular expressions may be placed between `/`s. If the regular expression contains a `/`, you may escape it with `\`. Special regex characters like `[]()+.*` may also be escaped with `\`.
25
52
 
26
- `Text Literal {Operation}`
53
+ The output of a regex match will be the matching input.
27
54
 
28
- The above expression contains the only two parts of Fop (except for the wildcard and escape characters).
55
+ ### Match expression
29
56
 
30
- **Text Literals**
57
+ `{N+1}` in the above example.
31
58
 
32
- A text literal works how it sounds: the input must match it exactly. If it matches it passes through unchanged. The only exception is the `*` (wildcard) character, which matches 0 or more of anything. Wildcards can be used anywhere except inside `{...}` (operations).
59
+ A match expression both matches on input and modifies that input. An expression is made up of 1 - 3 parts:
33
60
 
34
- If `\` (escape) is used before the special characters `*`, `{` or `}`, then that character is treated like a text literal. It's recommended to use single-quoted Ruby strings with Fop expressions that so you don't need to double-escape.
61
+ 1. The match, e.g. `N` for numeric.
62
+ 2. The operator, e.g. `+` for addition (optional).
63
+ 3. The argument, e.g. `1` for "add one" (required for most operators).
35
64
 
36
- **Operations**
65
+ The output of a match expression will be the _modified_ matching input. If no operator is given, the output will be the matching input.
37
66
 
38
- Operations are the interesting part of Fop, and are specified between `{` and `}`. An Operation can consist of one to three parts:
67
+ **Matches**
39
68
 
40
- 1. Matching class (required): Defines what characters the operation will match and operate on.
41
- * `N` is the numeric class and will match one or more digits.
42
- * `A` is the alpha class and will match one or more letters (lower or upper case).
43
- * `W` is the word class and matches alphanumeric chars and underscores.
44
- * `*` is the wildcard class and greedily matches everything after it.
45
- * `/.../` matches on the supplied regex between the `/`'s. If you're regex contains a `/`, it must be escaped. Capture groups may be referenced in the operator argument as `$1`, `$2`, etc.
46
- 3. Operator (optional): What to do to the matching characters.
47
- * `=` Replace the matching character(s) with the given argument. If no argument is given, drop the matching chars.
48
- * `>` Append the following chars to the matching value.
49
- * `<` Prepend the following chars to the matching value.
50
- * `+` Perform addition on the matching number and the argument (`N` only).
51
- * `-` Subtract the argument from the matching number (`N` only).
52
- 5. Operator argument (required for some operators): meaning varies by operator.
69
+ * `N` matches one or more consecutive digits.
70
+ * `A` matches one or more letters (lower or upper case).
71
+ * `W` matches alphanumeric chars and underscores.
72
+ * `*` greedily matches everything after it.
73
+ * `/regex/` matches on the supplied regex. Capture groups may be referenced in the argument as `$1`, `$2`, etc.
53
74
 
54
- ## More Examples
75
+ **Operators**
76
+
77
+ * `=` Replace the matching character(s) with the given argument. If no argument is given, drop the matching chars.
78
+ * `>` Append the argument to the matching value.
79
+ * `<` Prepend the argument to the matching value.
80
+ * `+` Perform addition on the matching number and the argument (`N` only).
81
+ * `-` Subtract the argument from the matching number (`N` only).
82
+
83
+ **Whitespace**
84
+
85
+ Inside of match expressions, whitespace is an optional separator of terms, i.e. `{ N + 1 }` is the same as `{N+1}`. This means that any spaces in string arguments must be escaped. For example, replacing a word with `foo bar` looks like `{W = foo\ bar}`.
86
+
87
+ ## Examples
88
+
89
+ ### Release Number Example
90
+
91
+ This example takes in GitHub branch names, decides if they're release branches, and if so, increments the version number.
92
+
93
+ ```ruby
94
+ f = Fop('release-{N}.{N+1}.{N=0}')
95
+
96
+ puts f.apply('release-5.99.1')
97
+ => 'release-5.100.0'
98
+
99
+ puts f.apply('release-5')
100
+ => nil
101
+ # doesn't match the pattern
102
+ ```
103
+
104
+ ### More Examples
55
105
 
56
106
  ```ruby
57
107
  f = Fop('release-{N=5}.{N+1}.{N=0}')
@@ -61,10 +111,10 @@ Operations are the interesting part of Fop, and are specified between `{` and `}
61
111
  ```
62
112
 
63
113
  ```ruby
64
- f = Fop('rel{/(ease)?/}-{N=5}.{N+1}.{N=0}')
114
+ f = Fop('rel{/(ease)?/=}-{N=5}.{N+1}.{N=0}')
65
115
 
66
116
  puts f.apply('release-4.99.1')
67
- => 'release-5.100.0'
117
+ => 'rel-5.100.0'
68
118
 
69
119
  puts f.apply('rel-4.99.1')
70
120
  => 'rel-5.100.0'
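
Reviewer note: the README's release example can be followed step by step without installing the gem. The sketch below is a hypothetical, hand-written stand-in for `Fop('release-{N}.{N+1}.{N=0}')`. It assumes only what this diff shows: each segment consumes an anchored prefix of the input, and the whole input must be consumed for a match. The names `RELEASE_STEPS` and `bump_release` are invented for illustration and are not part of fop_lang.

```ruby
# Hypothetical stand-in for Fop('release-{N}.{N+1}.{N=0}'); not the gem's code.
# Each step is [anchored regex, transform]; transform is nil when the matched
# text passes through unchanged.
RELEASE_STEPS = [
  [/\Arelease-/, nil],                  # text match
  [/\A[0-9]+/,   nil],                  # {N}
  [/\A\./,       nil],                  # text match
  [/\A[0-9]+/,   ->(v) { v.to_i + 1 }], # {N+1}
  [/\A\./,       nil],                  # text match
  [/\A[0-9]+/,   ->(_v) { "0" }],       # {N=0}
].freeze

def bump_release(input)
  rest = input.dup # work on a copy so the caller's string is untouched
  out = +""
  RELEASE_STEPS.each do |regex, transform|
    matched = rest.slice!(regex)  # consume the matching prefix, or nil
    return nil if matched.nil?    # any failed segment rejects the input
    out << (transform ? transform.call(matched).to_s : matched)
  end
  rest.empty? ? out : nil         # leftover input also counts as no match
end

p bump_release("release-5.99.1") # => "release-5.100.0"
p bump_release("release-5")      # => nil
```

The second call returns `nil` because the input runs out before the first `.` segment can match, which mirrors the README's "doesn't match the pattern" case.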
data/bin/fop ADDED
@@ -0,0 +1,42 @@
1
+ #!/usr/bin/env ruby
2
+
3
+ # Used for local testing
4
+ # $LOAD_PATH.unshift File.join(File.dirname(__FILE__), '..', 'lib')
5
+
6
+ require 'fop_lang'
7
+ require 'fop/cli'
8
+
9
+ opts = Fop::CLI.options!
10
+
11
+ if opts.version
12
+ puts Fop::VERSION
13
+ exit 0
14
+ end
15
+
16
+ src = opts.src.read.chomp
17
+ if src.empty?
18
+ $stderr.puts "No expression given"
19
+ exit 1
20
+ end
21
+
22
+ fop, errors = Fop.compile(src)
23
+ opts.src.close
24
+ NL = "\n".freeze
25
+
26
+ if errors
27
+ $stderr.puts src
28
+ $stderr.puts errors.join(NL)
29
+ exit 1
30
+ end
31
+
32
+ if opts.check
33
+ $stdout.puts "Syntax OK" unless opts.quiet
34
+ exit 0
35
+ end
36
+
37
+ while (line = gets) do
38
+ line.chomp!
39
+ if (res = fop.apply(line))
40
+ print(res << NL)
41
+ end
42
+ end
data/lib/fop/cli.rb ADDED
@@ -0,0 +1,34 @@
1
+ require 'optparse'
2
+
3
+ module Fop
4
+ module CLI
5
+ Options = Struct.new(:src, :check, :quiet, :version)
6
+
7
+ def self.options!
8
+ options = Options.new
9
+ OptionParser.new do |opts|
10
+ opts.banner = "Usage: fop [options] [ 'prog' | -f progfile ] [ file ... ]"
11
+
12
+ opts.on("-fFILE", "--file=FILE", "Read program from file instead of first argument") do |f|
13
+ options.src = File.open(f)
14
+ options.src.advise(:sequential)
15
+ end
16
+
17
+ opts.on("-c", "--check", "Perform a syntax check on the program and exit") do
18
+ options.check = true
19
+ end
20
+
21
+ opts.on("-q", "--quiet", "Only print errors and output") do
22
+ options.quiet = true
23
+ end
24
+
25
+ opts.on("--version", "Print version and exit") do
26
+ options.version = true
27
+ end
28
+ end.parse!
29
+
30
+ options.src ||= StringIO.new(ARGV.shift || "")
31
+ options
32
+ end
33
+ end
34
+ end
data/lib/fop/compiler.rb ADDED
@@ -0,0 +1,95 @@
1
+ require_relative 'parser'
2
+
3
+ module Fop
4
+ module Compiler
5
+ def self.compile(src)
6
+ parser = Parser.new(src)
7
+ nodes, errors = parser.parse
8
+
9
+ instructions = nodes.map { |node|
10
+ case node
11
+ when Nodes::Text, Nodes::Regex
12
+ Instructions.regex_match(node.regex)
13
+ when Nodes::Expression
14
+ arg_error = Validations.validate_args(node)
15
+ errors << arg_error if arg_error
16
+ Instructions::ExpressionMatch.new(node)
17
+ else
18
+ raise "Unknown node type #{node}"
19
+ end
20
+ }
21
+
22
+ return nil, errors if errors.any?
23
+ return instructions, nil
24
+ end
25
+
26
+ module Instructions
27
+ Op = Struct.new(:proc, :arity, :max_arity)
28
+ BLANK = "".freeze
29
+ OPERATIONS = {
30
+ "=" => Op.new(->(_val, args) { args[0] || BLANK }, 0, 1),
31
+ "+" => Op.new(->(val, args) { val.to_i + args[0].to_i }, 1),
32
+ "-" => Op.new(->(val, args) { val.to_i - args[0].to_i }, 1),
33
+ ">" => Op.new(->(val, args) { val + args[0] }, 1),
34
+ "<" => Op.new(->(val, args) { args[0] + val }, 1),
35
+ }
36
+
37
+ def self.regex_match(regex)
38
+ ->(input) { input.slice! regex }
39
+ end
40
+
41
+ class ExpressionMatch
42
+ def initialize(node)
43
+ @regex = node.regex&.regex
44
+ @op = node.operator_token ? OPERATIONS.fetch(node.operator_token.val) : nil
45
+ @regex_match = node.regex_match
46
+ @args = node.args&.map { |arg|
47
+ arg.has_captures ? arg.segments : arg.segments.join("")
48
+ }
49
+ end
50
+
51
+ def call(input)
52
+ if (match = @regex.match(input))
53
+ val = match.to_s
54
+ blank = val == BLANK
55
+ input.sub!(val, BLANK) unless blank
56
+ found_val = @regex_match || !blank
57
+ if @op and @args and found_val
58
+ args = @args.map { |arg|
59
+ case arg
60
+ when String then arg
61
+ when Array then sub_caps(arg, match.captures)
62
+ else raise "Unexpected arg type #{arg.class.name}"
63
+ end
64
+ }
65
+ @op.proc.call(val, args)
66
+ else
67
+ val
68
+ end
69
+ end
70
+ end
71
+
72
+ private
73
+
74
+ def sub_caps(args, caps)
75
+ args.map { |a|
76
+ a.is_a?(Integer) ? caps[a].to_s : a
77
+ }.join("")
78
+ end
79
+ end
80
+ end
81
+
82
+ module Validations
83
+ def self.validate_args(exp_node)
84
+ op_token = exp_node.operator_token || return
85
+ op = Instructions::OPERATIONS.fetch(op_token.val)
86
+ num = exp_node.args&.size || 0
87
+ arity = op.arity
88
+ max_arity = op.max_arity || arity
89
+ if num < arity or num > max_arity
90
+ Parser::Error.new(:argument, op_token, "#{op_token.val} expects #{arity}..#{max_arity} arguments; #{num} given")
91
+ end
92
+ end
93
+ end
94
+ end
95
+ end
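
Reviewer note: the argument-count check in `Validations.validate_args` is easy to misread because `max_arity` falls back to `arity` when it is omitted. The following is a minimal standalone sketch of that rule, assuming the arities listed in `OPERATIONS` above. `OpArity`, `ARITIES`, and `arity_error` are illustrative names, and the sketch returns a plain message string rather than a `Parser::Error`.

```ruby
# Illustrative stand-in for the diff's Instructions::Op; only the arity
# fields are kept here.
OpArity = Struct.new(:arity, :max_arity)

ARITIES = {
  "=" => OpArity.new(0, 1), # argument optional: {N=} drops the match
  "+" => OpArity.new(1),
  "-" => OpArity.new(1),
  ">" => OpArity.new(1),
  "<" => OpArity.new(1),
}.freeze

# Returns nil when the argument count is acceptable, otherwise a message
# shaped like the one built in validate_args.
def arity_error(operator, arg_count)
  op = ARITIES.fetch(operator)
  max = op.max_arity || op.arity # no explicit maximum means "exactly arity"
  return nil if arg_count >= op.arity && arg_count <= max
  "#{operator} expects #{op.arity}..#{max} arguments; #{arg_count} given"
end

p arity_error("=", 0) # => nil
p arity_error("+", 0) # => "+ expects 1..1 arguments; 0 given"
```

Under this rule `=` is the only operator that accepts zero arguments, which matches the README's statement that a bare `=` drops the matching characters.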
data/lib/fop/nodes.rb CHANGED
@@ -1,44 +1,39 @@
1
1
  module Fop
2
2
  module Nodes
3
- Text = Struct.new(:wildcard, :str) do
4
- def consume!(input)
5
- @regex ||= Regexp.new((wildcard ? ".*" : "^") + Regexp.escape(str))
6
- input.slice!(@regex)
7
- end
8
-
3
+ Text = Struct.new(:wildcard, :str, :regex) do
9
4
  def to_s
10
5
  w = wildcard ? "*" : nil
11
- "Text #{w}#{str}"
6
+ "[#{w}txt] #{str}"
12
7
  end
13
8
  end
14
9
 
15
- Op = Struct.new(:wildcard, :match, :regex_match, :regex, :operator, :operator_arg, :operator_arg_w_caps, :expression) do
16
- def consume!(input)
17
- if (match = regex.match(input))
18
- val = match.to_s
19
- blank = val == Parser::BLANK
20
- input.sub!(val, Parser::BLANK) unless blank
21
- found_val = regex_match || !blank
22
- arg = operator_arg_w_caps ? sub_caps(operator_arg_w_caps, match.captures) : operator_arg
23
- expression && found_val ? expression.call(val, operator, arg) : val
24
- end
10
+ Regex = Struct.new(:wildcard, :src, :regex) do
11
+ def to_s
12
+ w = wildcard ? "*" : nil
13
+ "[#{w}reg] #{src}"
25
14
  end
15
+ end
26
16
 
17
+ Expression = Struct.new(:wildcard, :match, :regex_match, :regex, :operator_token, :args) do
27
18
  def to_s
28
19
  w = wildcard ? "*" : nil
29
- s = "#{w}#{match}"
30
- s << " #{operator} #{operator_arg}" if operator
20
+ s = "[#{w}exp] #{match}"
21
+ if operator_token
22
+ arg_str = args
23
+ .map { |a| a.is_a?(Integer) ? "$#{a+1}" : a.to_s }
24
+ .join("")
25
+ s << " #{operator_token.val} #{arg_str}"
26
+ end
31
27
  s
32
28
  end
29
+ end
33
30
 
34
- private
35
-
36
- def sub_caps(tokens, caps)
37
- tokens.map { |t|
38
- case t
39
- when String then t
40
- when Parser::CaptureGroup then caps[t.index].to_s
41
- else raise Parser::Error, "Unexpected #{t} in capture group"
31
+ Arg = Struct.new(:segments, :has_captures) do
32
+ def to_s
33
+ segments.map { |s|
34
+ case s
35
+ when Integer then "$#{s + 1}"
36
+ else s.to_s
42
37
  end
43
38
  }.join("")
44
39
  end
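
Reviewer note: in the new `Arg` struct, `segments` mixes literal strings with integers, where an integer is a zero-based capture index (the parser stores `$1` as `0`). The sketch below assumes that representation and shows both directions: rendering segments back to source form, as `Arg#to_s` does, and substituting real captures, as `sub_caps` in the compiler does. The method names `render_segments` and `substitute_captures` are hypothetical.

```ruby
# Render segments back to their source form: integer n becomes "$n+1".
def render_segments(segments)
  segments.map { |s| s.is_a?(Integer) ? "$#{s + 1}" : s.to_s }.join
end

# Replace each integer segment with the corresponding regex capture.
# A missing capture becomes an empty string because nil.to_s is "".
def substitute_captures(segments, captures)
  segments.map { |s| s.is_a?(Integer) ? captures[s].to_s : s }.join
end

segments = ["v", 0, ".", 1]          # what an argument like v$1.$2 would parse to
match = /(\d+)-(\d+)/.match("10-20") # captures: ["10", "20"]

p render_segments(segments)                     # => "v$1.$2"
p substitute_captures(segments, match.captures) # => "v10.20"
```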
data/lib/fop/parser.rb CHANGED
@@ -1,181 +1,173 @@
1
+ require_relative 'tokenizer'
1
2
  require_relative 'nodes'
2
3
 
3
4
  module Fop
4
- module Parser
5
- Error = Class.new(StandardError)
6
- CaptureGroup = Struct.new(:index)
5
+ class Parser
6
+ DIGIT = /^[0-9]$/
7
+ REGEX_START = "^".freeze
8
+ REGEX_LAZY_WILDCARD = ".*?".freeze
9
+ REGEX_MATCHES = {
10
+ "N" => "[0-9]+".freeze,
11
+ "W" => "\\w+".freeze,
12
+ "A" => "[a-zA-Z]+".freeze,
13
+ "*" => ".*".freeze,
14
+ }.freeze
15
+ #OPS_WITH_OPTIONAL_ARGS = [Tokenizer::OP_REPLACE]
16
+ TR_REGEX = /.*/
17
+
18
+ Error = Struct.new(:type, :token, :message) do
19
+ def to_s
20
+ "#{type.to_s.capitalize} error: #{message} at column #{token.pos}"
21
+ end
22
+ end
7
23
 
8
- MATCH_NUM = "N".freeze
9
- MATCH_WORD = "W".freeze
10
- MATCH_ALPHA = "A".freeze
11
- MATCH_WILD = "*".freeze
12
- BLANK = "".freeze
13
- OP_REPLACE = "=".freeze
14
- OP_APPEND = ">".freeze
15
- OP_PREPEND = "<".freeze
16
- OP_ADD = "+".freeze
17
- OP_SUB = "-".freeze
18
- OP_MUL = "*".freeze
19
- OP_DIV = "/".freeze
20
- VAR = "$".freeze
21
- CAP_NUM = /^[1-9]$/
24
+ attr_reader :errors
22
25
 
23
- EXP_REPLACE = ->(_val, _op, arg) { arg || BLANK }
24
- EXP_MATH = ->(val, op, arg) { val.to_i.send(op, arg.to_i) }
25
- EXP_APPEND = ->(val, _op, arg) { val + arg }
26
- EXP_PREPEND = ->(val, _op, arg) { arg + val }
26
+ def initialize(src, debug: false)
27
+ @tokenizer = Tokenizer.new(src)
28
+ @errors = []
29
+ end
27
30
 
28
- def self.parse!(tokens)
31
+ def parse
29
32
  nodes = []
30
- curr_node = nil
31
-
32
- tokens.each { |token|
33
- case curr_node
34
- when nil
35
- curr_node = new_node token
36
- when :wildcard
37
- curr_node = new_node token, true
38
- raise Error, "Unexpected * after wildcard" if curr_node == :wildcard
39
- when Nodes::Text
40
- curr_node, finished_node = parse_text curr_node, token
41
- nodes << finished_node if finished_node
42
- when Nodes::Op
43
- nodes << curr_node
44
- curr_node = new_node token
33
+ wildcard = false
34
+ eof = false
35
+ # Top-level parsing. It will always be looking for a String, Regex, or Expression.
36
+ until eof
37
+ @tokenizer.reset_escapes!
38
+ t = @tokenizer.next
39
+ case t.type
40
+ when Tokens::WILDCARD
41
+ errors << Error.new(:syntax, t, "Consecutive wildcards") if wildcard
42
+ wildcard = true
43
+ when Tokens::TEXT
44
+ reg = build_regex!(wildcard, t, Regexp.escape(t.val))
45
+ nodes << Nodes::Text.new(wildcard, t.val, reg)
46
+ wildcard = false
47
+ when Tokens::EXP_OPEN
48
+ nodes << parse_exp!(wildcard)
49
+ wildcard = false
50
+ when Tokens::REG_DELIM
51
+ nodes << parse_regex!(wildcard)
52
+ wildcard = false
53
+ when Tokens::EOF
54
+ eof = true
45
55
  else
46
- raise Error, "Unexpected node #{curr_node}"
56
+ errors << Error.new(:syntax, t, "Unexpected #{t.type}")
47
57
  end
48
- }
49
-
50
- case curr_node
51
- when nil
52
- # noop
53
- when :wildcard
54
- nodes << Nodes::Text.new(true, "")
55
- when Nodes::Text, Nodes::Op
56
- nodes << curr_node
57
- else
58
- raise Error, "Unexpected end node #{curr_node}"
59
58
  end
60
-
61
- nodes
59
+ nodes << Nodes::Text.new(true, "", TR_REGEX) if wildcard
60
+ return nodes, @errors
62
61
  end
63
62
 
64
- private
63
+ def parse_exp!(wildcard = false)
64
+ exp = Nodes::Expression.new(wildcard)
65
+ parse_exp_match! exp
66
+ parse_exp_operator! exp
67
+ if exp.operator_token
68
+ parse_exp_arg! exp
69
+ end
70
+ return exp
71
+ end
65
72
 
66
- def self.new_node(token, wildcard = false)
67
- case token
68
- when Tokenizer::Char
69
- Nodes::Text.new(wildcard, token.char.clone)
70
- when Tokenizer::Op
71
- op = Nodes::Op.new(wildcard)
72
- parse_op! op, token
73
- op
74
- when :wildcard
75
- :wildcard
73
+ def parse_exp_match!(exp)
74
+ @tokenizer.escape.whitespace = false
75
+ @tokenizer.escape.operators = false
76
+ t = @tokenizer.next
77
+ case t.type
78
+ when Tokens::TEXT, Tokens::WILDCARD
79
+ exp.match = t.val
80
+ if (src = REGEX_MATCHES[exp.match])
81
+ reg = Regexp.new((exp.wildcard ? REGEX_LAZY_WILDCARD : REGEX_START) + src)
82
+ exp.regex = Nodes::Regex.new(exp.wildcard, src, reg)
83
+ else
84
+ errors << Error.new(:name, t, "Unknown match type '#{exp.match}'") if exp.regex.nil?
85
+ end
86
+ when Tokens::REG_DELIM
87
+ exp.regex = parse_regex!(exp.wildcard)
88
+ exp.match = exp.regex&.src
89
+ exp.regex_match = true
90
+ @tokenizer.reset_escapes!
76
91
  else
77
- raise Error, "Unexpected #{token}"
92
+ errors << Error.new(:syntax, t, "Unexpected #{t.type}; expected a string or a regex")
78
93
  end
79
94
  end
80
95
 
81
- # @return current node
82
- # @return finished node
83
- def self.parse_text(node, token)
84
- case token
85
- when Tokenizer::Char
86
- node.str << token.char
87
- return node, nil
88
- when Tokenizer::Op
89
- op = new_node token
90
- return op, node
91
- when :wildcard
92
- return :wildcard, node
96
+ def parse_exp_operator!(exp)
97
+ @tokenizer.escape.whitespace = false
98
+ @tokenizer.escape.operators = false
99
+ t = @tokenizer.next
100
+ case t.type
101
+ when Tokens::EXP_CLOSE
102
+ # no op
103
+ when Tokens::OPERATOR, Tokens::TEXT
104
+ exp.operator_token = t
93
105
  else
94
- raise Error, "Unexpected #{token}"
106
+ errors << Error.new(:syntax, t, "Unexpected #{t.type}; expected an operator")
95
107
  end
96
108
  end
97
109
 
98
- def self.parse_op!(node, token)
99
- # parse the matching type
100
- node.regex =
101
- case token.match
102
- when Tokenizer::Char
103
- node.match = token.match.char
104
- node.regex_match = false
105
- case node.match
106
- when MATCH_NUM then Regexp.new((node.wildcard ? ".*?" : "^") + "[0-9]+")
107
- when MATCH_WORD then Regexp.new((node.wildcard ? ".*?" : "^") + "\\w+")
108
- when MATCH_ALPHA then Regexp.new((node.wildcard ? ".*?" : "^") + "[a-zA-Z]+")
109
- when MATCH_WILD then /.*/
110
- else raise Error, "Unknown match type '#{node.match}'"
111
- end
112
- when Tokenizer::Regex
113
- node.match = "/#{token.match.src}/"
114
- node.regex_match = true
115
- Regexp.new((node.wildcard ? ".*?" : "^") + token.match.src)
116
- when nil
117
- raise Error, "Empty operation"
110
+ def parse_exp_arg!(exp)
111
+ @tokenizer.escape.whitespace = false
112
+ @tokenizer.escape.whitespace_sep = false
113
+ @tokenizer.escape.operators = true
114
+ @tokenizer.escape.regex = true
115
+ @tokenizer.escape.regex_capture = false if exp.regex_match
116
+
117
+ arg = Nodes::Arg.new([], false)
118
+ exp.args = []
119
+ found_close, eof = false, false
120
+ until found_close or eof
121
+ t = @tokenizer.next
122
+ case t.type
123
+ when Tokens::TEXT
124
+ arg.segments << t.val
125
+ when Tokens::REG_CAPTURE
126
+ arg.has_captures = true
127
+ arg.segments << t.val.to_i - 1
128
+ errors << Error.new(:syntax, t, "Invalid regex capture; must be between 0 and 9 (found #{t.val})") unless t.val =~ DIGIT
129
+ errors << Error.new(:syntax, t, "Unexpected regex capture; expected str or '}'") if !exp.regex_match
130
+ when Tokens::WHITESPACE_SEP
131
+ if arg.segments.any?
132
+ exp.args << arg
133
+ arg = Nodes::Arg.new([])
134
+ end
135
+ when Tokens::EXP_CLOSE
136
+ found_close = true
137
+ when Tokens::EOF
138
+ eof = true
139
+ errors << Error.new(:syntax, t, "Unexpected #{t.type}; expected str or '}'")
118
140
  else
119
- raise Error, "Unexpected #{token.match}"
141
+ errors << Error.new(:syntax, t, "Unexpected #{t.type}; expected str or '}'")
120
142
  end
121
-
122
- # parse the operator (if any)
123
- if token.operator
124
- raise Error, "Unexpected #{token.operator} for operator" unless token.operator.is_a? Tokenizer::Char
125
- node.operator = token.operator.char
126
- node.operator_arg = token.arg if token.arg and token.arg != BLANK
127
- node.operator_arg_w_caps = parse_captures! node.operator_arg if node.operator_arg and node.regex_match
128
- node.expression =
129
- case node.operator
130
- when OP_REPLACE
131
- EXP_REPLACE
132
- when OP_ADD, OP_SUB, OP_MUL, OP_DIV
133
- raise Error, "Operator #{node.operator} is only available for numeric matches" unless node.match == MATCH_NUM
134
- raise Error, "Operator #{node.operator} expects an argument" if node.operator_arg.nil?
135
- EXP_MATH
136
- when OP_APPEND
137
- raise Error, "Operator #{node.operator} expects an argument" if node.operator_arg.nil?
138
- EXP_APPEND
139
- when OP_PREPEND
140
- raise Error, "Operator #{node.operator} expects an argument" if node.operator_arg.nil?
141
- EXP_PREPEND
142
- else
143
- raise Error, "Unknown operator #{node.operator}"
144
- end
145
143
  end
146
- end
147
-
148
- def self.parse_captures!(arg)
149
- i = 0
150
- iend = arg.size - 1
151
- escape = false
152
- nodes = []
153
-
154
- until i > iend
155
- char = arg[i]
156
- i += 1
144
+ exp.args << arg if arg.segments.any?
157
145
 
158
- if escape
159
- nodes << char
160
- escape = false
161
- next
162
- end
146
+ #if exp.arg.size != 1 and !OPS_WITH_OPTIONAL_ARGS.include?(exp.operator)
147
+ # errors << Error.new(:arg, op_token, "Operator '#{op_token.val}' requires an argument")
148
+ #end
149
+ end
163
150
 
164
- case char
165
- when Tokenizer::ESCAPE
166
- escape = true
167
- when VAR
168
- num = arg[i].to_s
169
- raise Error, "Capture group number must be between 1 and 9; found '#{num}'" unless num =~ CAP_NUM
170
- nodes << CaptureGroup.new(num.to_i - 1)
171
- i += 1
172
- else
173
- nodes << char
174
- end
151
+ def parse_regex!(wildcard)
152
+ @tokenizer.regex_mode!
153
+ t = @tokenizer.next
154
+ reg = Nodes::Regex.new(wildcard, t.val)
155
+ if t.type == Tokens::TEXT
156
+ reg.regex = build_regex!(wildcard, t)
157
+ else
158
+ errors << Error.new(:syntax, t, "Unexpected #{t.type}; expected a string of regex")
175
159
  end
176
160
 
177
- raise Error, "Trailing escape" if escape
178
- nodes
161
+ t = @tokenizer.next
162
+ errors << Error.new(:syntax, t, "Unexpected #{t.type}; expected a string of regex") unless t.type == Tokens::REG_DELIM
163
+ reg
164
+ end
165
+
166
+ def build_regex!(wildcard, token, src = token.val)
167
+ Regexp.new((wildcard ? REGEX_LAZY_WILDCARD : REGEX_START) + src)
168
+ rescue RegexpError => e
169
+ errors << Error.new(:regex, token, e.message)
170
+ nil
179
171
  end
180
172
  end
181
173
  end
data/lib/fop/program.rb CHANGED
@@ -1,22 +1,16 @@
1
- require_relative 'tokenizer'
2
- require_relative 'parser'
3
-
4
1
  module Fop
5
2
  class Program
6
- attr_reader :nodes
7
-
8
- def initialize(src)
9
- tokens = Tokenizer.new(src).tokenize!
10
- @nodes = Parser.parse! tokens
3
+ def initialize(instructions)
4
+ @instructions = instructions
11
5
  end
12
6
 
13
7
  def apply(input)
14
8
  input = input.clone
15
9
  output =
16
- @nodes.reduce("") { |acc, token|
17
- section = token.consume!(input)
18
- return nil if section.nil?
19
- acc + section.to_s
10
+ @instructions.reduce("") { |acc, ins|
11
+ result = ins.call(input)
12
+ return nil if result.nil?
13
+ acc + result.to_s
20
14
  }
21
15
  input.empty? ? output : nil
22
16
  end
data/lib/fop/tokenizer.rb CHANGED
@@ -1,144 +1,194 @@
1
+ require_relative 'tokens'
2
+
1
3
  module Fop
2
4
  class Tokenizer
3
- Char = Struct.new(:char)
4
- Op = Struct.new(:match, :operator, :arg)
5
- Regex = Struct.new(:src)
6
- Error = Class.new(StandardError)
5
+ Token = Struct.new(:pos, :type, :val)
6
+ Escapes = Struct.new(:whitespace, :whitespace_sep, :operators, :regex_capture, :regex, :regex_escape, :wildcards, :exp)
7
7
 
8
- OP_OPEN = "{".freeze
9
- OP_CLOSE = "}".freeze
8
+ EXP_OPEN = "{".freeze
9
+ EXP_CLOSE = "}".freeze
10
10
  ESCAPE = "\\".freeze
11
11
  WILDCARD = "*".freeze
12
- REGEX_MARKER = "/".freeze
12
+ REGEX_DELIM = "/".freeze
13
+ REGEX_CAPTURE = "$".freeze
14
+ OP_REPLACE = "=".freeze
15
+ OP_APPEND = ">".freeze
16
+ OP_PREPEND = "<".freeze
17
+ OP_ADD = "+".freeze
18
+ OP_SUB = "-".freeze
19
+ WHITESPACE = " ".freeze
20
+
21
+ #
22
+ # Controls which "mode" the tokenizer is currently in. This is a necessary result of the syntax lacking
23
+ # explicit string delimiters. That *could* be worked around by requiring users to escape all reserved chars,
24
+ # but that's ugly af. Instead, the parser continually assesses the current context and flips these flags on
25
+ # or off to auto-escape certain chars for the next token.
26
+ #
27
+ attr_reader :escape
13
28
 
14
29
  def initialize(src)
15
30
  @src = src
16
31
  @end = src.size - 1
32
+ @start_i = 0
33
+ @i = 0
34
+ reset_escapes!
17
35
  end
18
36
 
19
- def tokenize!
20
- tokens = []
21
- escape = false
22
- i = 0
23
- until i > @end do
24
- char = @src[i]
25
- i += 1
26
-
27
- if escape
28
- tokens << Char.new(char)
29
- escape = false
30
- next
31
- end
32
-
33
- case char
34
- when ESCAPE
35
- escape = true
36
- when OP_OPEN
37
- i, op = operation! i
38
- tokens << op
39
- when OP_CLOSE
40
- raise "Unexpected #{OP_CLOSE}"
41
- when WILDCARD
42
- tokens << :wildcard
43
- else
44
- tokens << Char.new(char)
45
- end
46
- end
47
-
48
- raise Error, "Trailing escape" if escape
49
- tokens
37
+ # Auto-escape operators and regex capture vars. Appropriate for top-level syntax.
38
+ def reset_escapes!
39
+ @escape = Escapes.new(true, true, true, true)
50
40
  end
51
41
 
52
- private
53
-
54
- def operation!(i)
55
- found_close = false
56
- op = Op.new(nil, nil, "")
42
+ # Auto-escape anything you'd find in a regular expression
43
+ def regex_mode!
44
+ @escape.whitespace = true
45
+ @escape.regex = false # look for the final /
46
+ @escape.regex_escape = true # pass \ through to the regex engine UNLESS it's followed by a /
47
+ @escape.wildcards = true
48
+ @escape.operators = true
49
+ @escape.regex_capture = true
50
+ @escape.exp = true
51
+ end
57
52
 
58
- # Find matcher
59
- until found_close or op.match or i > @end do
60
- char = @src[i]
61
- i += 1
62
- case char
63
- when OP_CLOSE
64
- found_close = true
65
- when REGEX_MARKER
66
- i, reg = regex! i
67
- op.match = reg
53
+ def next
54
+ return Token.new(@i, Tokens::EOF) if @i > @end
55
+ char = @src[@i]
56
+ case char
57
+ when EXP_OPEN
58
+ @i += 1
59
+ token! Tokens::EXP_OPEN
60
+ when EXP_CLOSE
61
+ @i += 1
62
+ token! Tokens::EXP_CLOSE
63
+ when WILDCARD
64
+ @i += 1
65
+ token! Tokens::WILDCARD, WILDCARD
66
+ when REGEX_DELIM
67
+ if @escape.regex
68
+ get_str!
68
69
  else
69
- op.match = Char.new(char)
70
+ @i += 1
71
+ token! Tokens::REG_DELIM
70
72
  end
71
- end
72
-
73
- # Find operator
74
- until found_close or op.operator or i > @end do
75
- char = @src[i]
76
- i += 1
77
- case char
78
- when OP_CLOSE
79
- found_close = true
73
+ when REGEX_CAPTURE
74
+ if @escape.regex_capture
75
+ get_str!
80
76
  else
81
- op.operator = Char.new(char)
77
+ @i += 1
78
+ t = token! Tokens::REG_CAPTURE, @src[@i]
79
+ @i += 1
80
+ @start_i = @i
81
+ t
82
82
  end
83
- end
84
-
85
- # Find operator arg
86
- escape = false
87
- until found_close or i > @end do
88
- char = @src[i]
89
- i += 1
90
-
91
- if escape
92
- op.arg << char
93
- escape = false
94
- next
83
+ when OP_REPLACE, OP_APPEND, OP_PREPEND, OP_ADD, OP_SUB
84
+ if @escape.operators
85
+ get_str!
86
+ else
87
+ @i += 1
88
+ token! Tokens::OPERATOR, char
95
89
  end
96
-
97
- case char
98
- when ESCAPE
99
- escape = true
100
- when OP_OPEN
101
- raise "Unexpected #{OP_OPEN}"
102
- when OP_CLOSE
103
- found_close = true
90
+ when WHITESPACE
91
+ if @escape.whitespace
92
+ get_str!
93
+ elsif !@escape.whitespace_sep
94
+ @i += 1
95
+ token! Tokens::WHITESPACE_SEP
104
96
  else
105
- op.arg << char
97
+ @i += 1
98
+ @start_i = @i
99
+ self.next
106
100
  end
101
+ else
102
+ get_str!
107
103
  end
108
-
109
- raise Error, "Unclosed operation" if !found_close
110
- raise Error, "Trailing escape" if escape
111
- return i, op
112
104
  end
113
105
 
114
- def regex!(i)
115
- escape = false
116
- found_close = false
117
- src = ""
106
+ private
107
+
108
+ def token!(type, val = nil)
109
+ t = Token.new(@start_i, type, val)
110
+ @start_i = @i
111
+ t
112
+ end
118
113
 
119
- until found_close or i > @end
120
- char = @src[i]
121
- i += 1
114
+ def get_str!
115
+ str = ""
116
+ escape, found_end = false, false
117
+ until found_end or @i > @end
118
+ char = @src[@i]
122
119
 
123
120
  if escape
124
- src << char
121
+ @i += 1
122
+ str << char
125
123
  escape = false
126
124
  next
127
125
  end
128
126
 
129
127
  case char
130
128
  when ESCAPE
131
- escape = true
132
- when REGEX_MARKER
133
- found_close = true
129
+ @i += 1
130
+ if @escape.regex_escape and @src[@i] != REGEX_DELIM
131
+ str << char
132
+ else
133
+ escape = true
134
+ end
135
+ when EXP_OPEN
136
+ if @escape.exp
137
+ @i += 1
138
+ str << char
139
+ else
140
+ found_end = true
141
+ end
142
+ when EXP_CLOSE
143
+ if @escape.exp
144
+ @i += 1
145
+ str << char
146
+ else
147
+ found_end = true
148
+ end
149
+ when WILDCARD
150
+ if @escape.wildcards
151
+ @i += 1
152
+ str << char
153
+ else
154
+ found_end = true
155
+ end
156
+ when REGEX_DELIM
157
+ if @escape.regex
158
+ @i += 1
159
+ str << char
160
+ else
161
+ found_end = true
162
+ end
163
+ when REGEX_CAPTURE
164
+ if @escape.regex_capture
165
+ @i += 1
166
+ str << char
167
+ else
168
+ found_end = true
169
+ end
170
+ when OP_REPLACE, OP_APPEND, OP_PREPEND, OP_ADD, OP_SUB
171
+ if @escape.operators
172
+ @i += 1
173
+ str << char
174
+ else
175
+ found_end = true
176
+ end
177
+ when WHITESPACE
178
+ if @escape.whitespace
179
+ @i += 1
180
+ str << char
181
+ else
182
+ found_end = true
183
+ end
134
184
  else
135
- src << char
185
+ @i += 1
186
+ str << char
136
187
  end
137
188
  end
138
189
 
139
- raise Error, "Unclosed regex" if !found_close
140
- raise Error, "Trailing escape" if escape
141
- return i, Regex.new(src)
190
+ return Token.new(@i - 1, Tokens::TR_ESC) if escape
191
+ token! Tokens::TEXT, str
142
192
  end
143
193
  end
144
194
  end
data/lib/fop/tokens.rb ADDED
@@ -0,0 +1,14 @@
1
+ module Fop
2
+ module Tokens
3
+ TEXT = :TXT
4
+ EXP_OPEN = :"{"
5
+ EXP_CLOSE = :"}"
6
+ REG_CAPTURE = :"$"
7
+ REG_DELIM = :/
8
+ WILDCARD = :*
9
+ OPERATOR = :op
10
+ TR_ESC = :"trailing escape"
11
+ WHITESPACE_SEP = :s
12
+ EOF = :EOF
13
+ end
14
+ end
data/lib/fop/version.rb CHANGED
@@ -1,3 +1,3 @@
1
1
  module Fop
2
- VERSION = "0.4.0"
2
+ VERSION = "0.8.0"
3
3
  end
data/lib/fop_lang.rb CHANGED
@@ -1,12 +1,22 @@
1
1
  require_relative 'fop/version'
2
+ require_relative 'fop/compiler'
2
3
  require_relative 'fop/program'
3
4
 
4
5
  def Fop(src)
5
- ::Fop::Program.new(src)
6
+ ::Fop.compile!(src)
6
7
  end
7
8
 
8
9
  module Fop
10
+ def self.compile!(src)
11
+ prog, errors = compile(src)
12
+ # TODO better exception
13
+ raise "Fop errors: " + errors.map(&:message).join(",") if errors
14
+ prog
15
+ end
16
+
9
17
  def self.compile(src)
10
- Program.new(src)
18
+ instructions, errors = ::Fop::Compiler.compile(src)
19
+ return nil, errors if errors
20
+ return Program.new(instructions), nil
11
21
  end
12
22
  end
metadata CHANGED
@@ -1,26 +1,31 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: fop_lang
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.4.0
4
+ version: 0.8.0
5
5
  platform: ruby
6
6
  authors:
7
7
  - Jordan Hollinger
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2021-08-16 00:00:00.000000000 Z
11
+ date: 2021-09-01 00:00:00.000000000 Z
12
12
  dependencies: []
13
13
  description: A micro expression language for Filter and OPerations on text
14
14
  email: jordan.hollinger@gmail.com
15
- executables: []
15
+ executables:
16
+ - fop
16
17
  extensions: []
17
18
  extra_rdoc_files: []
18
19
  files:
19
20
  - README.md
21
+ - bin/fop
22
+ - lib/fop/cli.rb
23
+ - lib/fop/compiler.rb
20
24
  - lib/fop/nodes.rb
21
25
  - lib/fop/parser.rb
22
26
  - lib/fop/program.rb
23
27
  - lib/fop/tokenizer.rb
28
+ - lib/fop/tokens.rb
24
29
  - lib/fop/version.rb
25
30
  - lib/fop_lang.rb
26
31
  homepage: https://jhollinger.github.io/fop-lang-rb/