fop_lang 0.3.0 → 0.7.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: b5f19a543b81c0046dc63fcc1c0769989628017d2c1d1da74ef0db9866a0f2f7
4
- data.tar.gz: 03b6597f9cab97c95ccda8396693bb43d9da729137cb916cc74f7fbecc314b32
3
+ metadata.gz: 798fd7c335f394e878fba2f70a9f60372ea356c79f2dc63392398920d0ffce38
4
+ data.tar.gz: 654786ff77823e8d8dd9a348f958828346e3755e43a04a0f38e711a6c5571ea9
5
5
  SHA512:
6
- metadata.gz: 3a17c82a561e20cbc5cb8abbad5be4f94f02110d60b6130e3e1e9489672c5c134befc6b1daca2f590f083a67934e600fb5d6fa0ea5433181ba3014514c558232
7
- data.tar.gz: 790250c8a79dcf04b381f2dd33cbaa048fd070688ab45446ff87652dcb18844c2d6139d0ead060fa338a57b8590eee0167ea2c25abd84e1d71571f33c49bcbda
6
+ metadata.gz: 6761f3d7dd602d1c93a2387fc73ea14c11484e88d0d319bbf87df98925977aa15de59a63f23aafffafa384ce3b9def9f81edabae669aabc2012b00d3131e46f4
7
+ data.tar.gz: 7f5187cd510d691dda996284d5a400804b7573f67506701e39a6d2909c8a4026b58655f6b2800708e911377ccce790885a2238eed7a75d4873e4b599d23e67df
data/README.md CHANGED
@@ -1,55 +1,99 @@
1
1
  # fop_lang
2
2
 
3
- Fop (Filter and OPerations language) is an experimental, tiny expression language in the vein of awk and sed. This is a Ruby implementation. It is useful for simultaneously matching and transforming text input.
3
+ Fop (Filter and OPerations language) is a tiny, experimental language for filtering and transforming text. Think of it like awk but with the condition and action segments combined.
4
4
 
5
- ```ruby
6
- gem 'fop_lang'
7
- ```
5
+ This is a Ruby implementation with both a library interface and a bin command.
8
6
 
9
- ## Release Number Example
7
+ ## Installation
10
8
 
11
- This example takes in GitHub branch names, decides if they're release branches, and if so, increments the version number.
9
+ ```bash
10
+ $ gem install fop_lang
11
+ ```
12
+
13
+ You may use fop in a Ruby script:
12
14
 
13
15
  ```ruby
14
- f = Fop('release-{N}.{N+1}.{N=0}')
16
+ require 'fop_lang'
15
17
 
16
- puts f.apply('release-5.99.1')
17
- => 'release-5.100.0'
18
+ f = Fop('foo {N+1}')
18
19
 
19
- puts f.apply('release-5')
20
- => nil
21
- # doesn't match the pattern
20
+ f.apply('foo 1')
21
+ => "foo 2"
22
+
23
+ f.apply('bar 1')
24
+ => nil
25
+ ```
26
+
27
+ or run `fop` from the command line:
28
+
29
+ ```bash
30
+ $ echo 'foo 1' | fop 'foo {N+1}'
31
+ foo 2
32
+ $ echo 'bar 1' | fop 'foo {N+1}'
22
33
  ```
23
34
 
24
- ## Anatomy of a Fop expression
35
+ ## Syntax
36
+
37
+ `Text /(R|r)egex/ {N+1}`
38
+
39
+ The above program demonstrates a text match, a regex match, and a match expression. If the input matches all three segments, output is given. If the input was `Text regex 5`, the output would be `Text regex 6`.
40
+
41
+ ### Text match
42
+
43
+ The input must match this text exactly. Whitespace is part of the match. Wildcards (`*`) are allowed. Special characters (`*/{}\`) may be escaped with `\`.
25
44
 
26
- `Text Literal {Operation}`
45
+ The output of a text match will be the matching input.
27
46
 
28
- The above expression contains the only two parts of Fop (except for the wildcard and escape characters).
47
+ ### Regex match
29
48
 
30
- **Text Literals**
49
+ Regular expressions may be placed between `/`s. If the regular expression contains a `/`, you may escape it with `\`. Special regex characters like `[]()+.*` may also be escaped with `\`.
31
50
 
32
- A text literal works how it sounds: the input must match it exactly. The only exception is the `*` (wildcard) character, which matches 0 or more of anything. Wildcards can be used anywhere except inside `{...}` (operations).
51
+ The output of a regex match will be the matching input.
33
52
 
34
- If `\` (escape) is used before the special characters `*`, `{` or `}`, then that character is treated like a text literal. It's recommended to use single-quoted Ruby strings with Fop expressions that so you don't need to double-escape.
53
+ ### Match expression
35
54
 
36
- **Operations**
55
+ A match expression both matches on input and modifies that input. An expression is made up of 1 - 3 parts:
37
56
 
38
- Operations are the interesting part of Fop, and are specified between `{` and `}`. An Operation can consist of one to three parts:
57
+ 1. The match, e.g. `N` for numeric.
58
+ 2. The operator, e.g. `+` for addition (optional).
59
+ 3. The argument, e.g `1` for "add one" (required for most operators).
39
60
 
40
- 1. Matching class (required): Defines what characters the operation will match and operate on.
41
- * `N` is the numeric class and will match one or more digits.
42
- * `A` is the alpha class and will match one or more letters (lower or upper case).
43
- * `W` is the word class and matches alphanumeric chars and underscores.
44
- * `*` is the wildcard class and greedily matches everything after it.
45
- * `/.../` matches on the supplied regex between the `/`'s. If you're regex contains a `/`, it must be escaped.
46
- 3. Operator (optional): What to do to the matching characters.
47
- * `=` Replace the matching character(s) with the given argument. If no argument is given, drop the matching chars. Note that any `/` chars must be escaped, so as not to be mistaken for a regex.
48
- * `+` Perform addition on the matching number and the argument (`N` only).
49
- * `-` Subtract the argument from the matching number (`N` only).
50
- 5. Operator argument (required for some operators): meaning varies by operator.
61
+ The output of a match expression will be the _modified_ matching input. If no operator is given, the output will be the matching input.
62
+
63
+ **Matches**
64
+
65
+ * `N` matches one or more consecutive digits.
66
+ * `A` matches one or more letters (lower or upper case).
67
+ * `W` matches alphanumeric chars and underscores.
68
+ * `*` greedily matches everything after it.
69
+ * `/regex/` matches on the supplied regex. Capture groups may be referenced in the argument as `$1`, `$2`, etc.
70
+
71
+ **Operators**
72
+
73
+ * `=` Replace the matching character(s) with the given argument. If no argument is given, drop the matching chars.
74
+ * `>` Append the argument to the matching value.
75
+ * `<` Prepend the argument to the matching value.
76
+ * `+` Perform addition on the matching number and the argument (`N` only).
77
+ * `-` Subtract the argument from the matching number (`N` only).
78
+
79
+ ## Examples
80
+
81
+ ### Release Number Example
82
+
83
+ This example takes in GitHub branch names, decides if they're release branches, and if so, increments the version number.
84
+
85
+ ```ruby
86
+ f = Fop('release-{N}.{N+1}.{N=0}')
87
+
88
+ puts f.apply('release-5.99.1')
89
+ => 'release-5.100.0'
90
+
91
+ puts f.apply('release-5')
92
+ => nil
93
+ # doesn't match the pattern
94
+ ```
51
95
 
52
- ## More Examples
96
+ ### More Examples
53
97
 
54
98
  ```ruby
55
99
  f = Fop('release-{N=5}.{N+1}.{N=0}')
data/bin/fop ADDED
@@ -0,0 +1,42 @@
1
+ #!/usr/bin/env ruby
2
+
3
+ # Used for local testing
4
+ # $LOAD_PATH.unshift File.join(File.dirname(__FILE__), '..', 'lib')
5
+
6
+ require 'fop_lang'
7
+ require 'fop/cli'
8
+
9
+ opts = Fop::CLI.options!
10
+
11
+ if opts.version
12
+ puts Fop::VERSION
13
+ exit 0
14
+ end
15
+
16
+ src = opts.src.read.chomp
17
+ if src.empty?
18
+ $stderr.puts "No expression given"
19
+ exit 1
20
+ end
21
+
22
+ fop, errors = Fop.compile(src)
23
+ opts.src.close
24
+ NL = "\n".freeze
25
+
26
+ if errors
27
+ $stderr.puts src
28
+ $stderr.puts errors.join(NL)
29
+ exit 1
30
+ end
31
+
32
+ if opts.check
33
+ $stdout.puts "Syntax OK" unless opts.quiet
34
+ exit 0
35
+ end
36
+
37
+ while (line = gets) do
38
+ line.chomp!
39
+ if (res = fop.apply(line))
40
+ print(res << NL)
41
+ end
42
+ end
data/lib/fop/cli.rb ADDED
@@ -0,0 +1,34 @@
1
+ require 'optparse'
2
+
3
+ module Fop
4
+ module CLI
5
+ Options = Struct.new(:src, :check, :quiet, :version)
6
+
7
+ def self.options!
8
+ options = Options.new
9
+ OptionParser.new do |opts|
10
+ opts.banner = "Usage: fop [options] [ 'prog' | -f progfile ] [ file ... ]"
11
+
12
+ opts.on("-fFILE", "--file=FILE", "Read program from file instead of first argument") do |f|
13
+ options.src = File.open(f)
14
+ options.src.advise(:sequential)
15
+ end
16
+
17
+ opts.on("-c", "--check", "Perform a syntax check on the program and exit") do
18
+ options.check = true
19
+ end
20
+
21
+ opts.on("-q", "--quiet", "Only print errors and output") do
22
+ options.quiet = true
23
+ end
24
+
25
+ opts.on("--version", "Print version and exit") do
26
+ options.version = true
27
+ end
28
+ end.parse!
29
+
30
+ options.src ||= StringIO.new(ARGV.shift || "")
31
+ options
32
+ end
33
+ end
34
+ end
@@ -0,0 +1,72 @@
1
+ require_relative 'parser'
2
+
3
+ module Fop
4
+ module Compiler
5
+ def self.compile(src)
6
+ parser = Parser.new(src)
7
+ nodes, errors = parser.parse
8
+
9
+ instructions = nodes.map { |node|
10
+ case node
11
+ when Nodes::Text, Nodes::Regex
12
+ Instructions.regex_match(node.regex)
13
+ when Nodes::Expression
14
+ Instructions::ExpressionMatch.new(node)
15
+ else
16
+ raise "Unknown node type #{node}"
17
+ end
18
+ }
19
+
20
+ return nil, errors if errors.any?
21
+ return instructions, nil
22
+ end
23
+
24
+ module Instructions
25
+ BLANK = "".freeze
26
+ OPERATIONS = {
27
+ "=" => ->(_val, arg) { arg || BLANK },
28
+ "+" => ->(val, arg) { val.to_i + arg.to_i },
29
+ "-" => ->(val, arg) { val.to_i - arg.to_i },
30
+ ">" => ->(val, arg) { val + arg },
31
+ "<" => ->(val, arg) { arg + val },
32
+ }
33
+
34
+ def self.regex_match(regex)
35
+ ->(input) { input.slice! regex }
36
+ end
37
+
38
+ class ExpressionMatch
39
+ def initialize(node)
40
+ @regex = node.regex&.regex
41
+ @op = node.operator ? OPERATIONS.fetch(node.operator) : nil
42
+ @regex_match = node.regex_match
43
+ if node.arg&.any? { |a| a.is_a? Integer }
44
+ @arg, @arg_with_caps = nil, node.arg
45
+ else
46
+ @arg = node.arg&.join("")
47
+ @arg_with_caps = nil
48
+ end
49
+ end
50
+
51
+ def call(input)
52
+ if (match = @regex.match(input))
53
+ val = match.to_s
54
+ blank = val == BLANK
55
+ input.sub!(val, BLANK) unless blank
56
+ found_val = @regex_match || !blank
57
+ arg = @arg_with_caps ? sub_caps(@arg_with_caps, match.captures) : @arg
58
+ @op && found_val ? @op.call(val, arg) : val
59
+ end
60
+ end
61
+
62
+ private
63
+
64
+ def sub_caps(args, caps)
65
+ args.map { |a|
66
+ a.is_a?(Integer) ? caps[a].to_s : a
67
+ }.join("")
68
+ end
69
+ end
70
+ end
71
+ end
72
+ end
data/lib/fop/nodes.rb CHANGED
@@ -1,29 +1,29 @@
1
1
  module Fop
2
2
  module Nodes
3
- Text = Struct.new(:wildcard, :str) do
4
- def consume!(input)
5
- @regex ||= Regexp.new((wildcard ? ".*" : "^") + Regexp.escape(str))
6
- input.slice!(@regex)
7
- end
8
-
3
+ Text = Struct.new(:wildcard, :str, :regex) do
9
4
  def to_s
10
5
  w = wildcard ? "*" : nil
11
- "Text #{w}#{str}"
6
+ "[#{w}txt] #{str}"
12
7
  end
13
8
  end
14
9
 
15
- Op = Struct.new(:wildcard, :match, :regex_match, :regex, :operator, :operator_arg, :expression) do
16
- def consume!(input)
17
- if (val = input.slice!(regex))
18
- found_val = regex_match || val != Parser::BLANK
19
- expression && found_val ? expression.call(val) : val
20
- end
10
+ Regex = Struct.new(:wildcard, :src, :regex) do
11
+ def to_s
12
+ w = wildcard ? "*" : nil
13
+ "[#{w}reg] #{src}"
21
14
  end
15
+ end
22
16
 
17
+ Expression = Struct.new(:wildcard, :match, :regex_match, :regex, :operator, :arg) do
23
18
  def to_s
24
19
  w = wildcard ? "*" : nil
25
- s = "#{w}#{match}"
26
- s << " #{operator} #{operator_arg}" if operator
20
+ s = "[#{w}exp] #{match}"
21
+ if operator
22
+ arg_str = arg
23
+ .map { |a| a.is_a?(Integer) ? "$#{a+1}" : a.to_s }
24
+ .join("")
25
+ s << " #{operator} #{arg_str}"
26
+ end
27
27
  s
28
28
  end
29
29
  end
data/lib/fop/parser.rb CHANGED
@@ -1,136 +1,162 @@
1
+ require_relative 'tokenizer'
1
2
  require_relative 'nodes'
2
3
 
3
4
  module Fop
4
- module Parser
5
- Error = Class.new(StandardError)
5
+ class Parser
6
+ DIGIT = /^[0-9]$/
7
+ REGEX_START = "^".freeze
8
+ REGEX_LAZY_WILDCARD = ".*?".freeze
9
+ REGEX_MATCHES = {
10
+ "N" => "[0-9]+".freeze,
11
+ "W" => "\\w+".freeze,
12
+ "A" => "[a-zA-Z]+".freeze,
13
+ "*" => ".*".freeze,
14
+ }.freeze
15
+ OPS_WITH_OPTIONAL_ARGS = [Tokenizer::OP_REPLACE]
16
+ TR_REGEX = /.*/
6
17
 
7
- MATCH_NUM = "N".freeze
8
- MATCH_WORD = "W".freeze
9
- MATCH_ALPHA = "A".freeze
10
- MATCH_WILD = "*".freeze
11
- BLANK = "".freeze
12
- OP_REPLACE = "=".freeze
13
- OP_ADD = "+".freeze
14
- OP_SUB = "-".freeze
15
- OP_MUL = "*".freeze
16
- OP_DIV = "/".freeze
18
+ Error = Struct.new(:type, :token, :message) do
19
+ def to_s
20
+ "#{type.to_s.capitalize} error: #{message} at column #{token.pos}"
21
+ end
22
+ end
17
23
 
18
- def self.parse!(tokens)
19
- nodes = []
20
- curr_node = nil
24
+ attr_reader :errors
21
25
 
22
- tokens.each { |token|
23
- case curr_node
24
- when nil
25
- curr_node = new_node token
26
- when :wildcard
27
- curr_node = new_node token, true
28
- raise Error, "Unexpected * after wildcard" if curr_node == :wildcard
29
- when Nodes::Text
30
- curr_node, finished_node = parse_text curr_node, token
31
- nodes << finished_node if finished_node
32
- when Nodes::Op
33
- nodes << curr_node
34
- curr_node = new_node token
26
+ def initialize(src, debug: false)
27
+ @tokenizer = Tokenizer.new(src)
28
+ @errors = []
29
+ end
30
+
31
+ def parse
32
+ nodes = []
33
+ wildcard = false
34
+ eof = false
35
+ # Top-level parsing. It will always be looking for a String, Regex, or Expression.
36
+ until eof
37
+ @tokenizer.reset_escapes!
38
+ t = @tokenizer.next
39
+ case t.type
40
+ when Tokens::WILDCARD
41
+ errors << Error.new(:syntax, t, "Consecutive wildcards") if wildcard
42
+ wildcard = true
43
+ when Tokens::TEXT
44
+ reg = build_regex!(wildcard, t, Regexp.escape(t.val))
45
+ nodes << Nodes::Text.new(wildcard, t.val, reg)
46
+ wildcard = false
47
+ when Tokens::EXP_OPEN
48
+ nodes << parse_exp!(wildcard)
49
+ wildcard = false
50
+ when Tokens::REG_DELIM
51
+ nodes << parse_regex!(wildcard)
52
+ wildcard = false
53
+ when Tokens::EOF
54
+ eof = true
35
55
  else
36
- raise Error, "Unexpected node #{curr_node}"
56
+ errors << Error.new(:syntax, t, "Unexpected #{t.type}")
37
57
  end
38
- }
39
-
40
- case curr_node
41
- when nil
42
- # noop
43
- when :wildcard
44
- nodes << Nodes::Text.new(true, "")
45
- when Nodes::Text, Nodes::Op
46
- nodes << curr_node
47
- else
48
- raise "Unexpected end node #{curr_node}"
49
58
  end
50
-
51
- nodes
59
+ nodes << Nodes::Text.new(true, "", TR_REGEX) if wildcard
60
+ return nodes, @errors
52
61
  end
53
62
 
54
- private
63
+ def parse_exp!(wildcard = false)
64
+ exp = Nodes::Expression.new(wildcard)
65
+ parse_exp_match! exp
66
+ op_token = parse_exp_operator! exp
67
+ if exp.operator
68
+ parse_exp_arg! exp, op_token
69
+ end
70
+ return exp
71
+ end
55
72
 
56
- def self.new_node(token, wildcard = false)
57
- case token
58
- when Tokenizer::Char
59
- Nodes::Text.new(wildcard, token.char.clone)
60
- when Tokenizer::Op
61
- op = Nodes::Op.new(wildcard)
62
- parse_op! op, token.tokens
63
- op
64
- when :wildcard
65
- :wildcard
73
+ def parse_exp_match!(exp)
74
+ @tokenizer.escape.operators = false
75
+ t = @tokenizer.next
76
+ case t.type
77
+ when Tokens::TEXT, Tokens::WILDCARD
78
+ exp.match = t.val
79
+ if (src = REGEX_MATCHES[exp.match])
80
+ reg = Regexp.new((exp.wildcard ? REGEX_LAZY_WILDCARD : REGEX_START) + src)
81
+ exp.regex = Nodes::Regex.new(exp.wildcard, src, reg)
82
+ else
83
+ errors << Error.new(:name, t, "Unknown match type '#{exp.match}'") if exp.regex.nil?
84
+ end
85
+ when Tokens::REG_DELIM
86
+ exp.regex = parse_regex!(exp.wildcard)
87
+ exp.match = exp.regex&.src
88
+ exp.regex_match = true
89
+ @tokenizer.reset_escapes!
66
90
  else
67
- raise Error, "Unexpected #{token}"
91
+ errors << Error.new(:syntax, t, "Unexpected #{t.type}; expected a string or a regex")
68
92
  end
69
93
  end
70
94
 
71
- # @return current node
72
- # @return finished node
73
- def self.parse_text(node, token)
74
- case token
75
- when Tokenizer::Char
76
- node.str << token.char
77
- return node, nil
78
- when Tokenizer::Op
79
- op = new_node token
80
- return op, node
81
- when :wildcard
82
- return :wildcard, node
95
+ def parse_exp_operator!(exp)
96
+ @tokenizer.escape.operators = false
97
+ t = @tokenizer.next
98
+ case t.type
99
+ when Tokens::EXP_CLOSE
100
+ # no op
101
+ when Tokens::OPERATOR
102
+ exp.operator = t.val
83
103
  else
84
- raise Error, "Unexpected #{token}"
104
+ errors << Error.new(:syntax, t, "Unexpected #{t.type}; expected an operator")
85
105
  end
106
+ t
86
107
  end
87
108
 
88
- def self.parse_op!(node, tokens)
89
- t = tokens[0] || raise(Error, "Empty operation")
90
- # parse the matching type
91
- node.regex =
92
- case t
93
- when Tokenizer::Char
94
- node.match = t.char
95
- node.regex_match = false
96
- case t.char
97
- when MATCH_NUM then Regexp.new((node.wildcard ? ".*?" : "^") + "[0-9]+")
98
- when MATCH_WORD then Regexp.new((node.wildcard ? ".*?" : "^") + "\\w+")
99
- when MATCH_ALPHA then Regexp.new((node.wildcard ? ".*?" : "^") + "[a-zA-Z]+")
100
- when MATCH_WILD then /.*/
101
- else raise Error, "Unknown match type '#{t.char}'"
102
- end
103
- when Tokenizer::Regex
104
- node.match = "/#{t.src}/"
105
- node.regex_match = true
106
- Regexp.new((node.wildcard ? ".*?" : "^") + t.src)
109
+ def parse_exp_arg!(exp, op_token)
110
+ @tokenizer.escape.operators = true
111
+ @tokenizer.escape.regex = true
112
+ @tokenizer.escape.regex_capture = false if exp.regex_match
113
+
114
+ exp.arg = []
115
+ found_close, eof = false, false
116
+ until found_close or eof
117
+ t = @tokenizer.next
118
+ case t.type
119
+ when Tokens::TEXT
120
+ exp.arg << t.val
121
+ when Tokens::REG_CAPTURE
122
+ exp.arg << t.val.to_i - 1
123
+ errors << Error.new(:syntax, t, "Invalid regex capture; must be between 0 and 9 (found #{t.val})") unless t.val =~ DIGIT
124
+ errors << Error.new(:syntax, t, "Unexpected regex capture; expected str or '}'") if !exp.regex_match
125
+ when Tokens::EXP_CLOSE
126
+ found_close = true
127
+ when Tokens::EOF
128
+ eof = true
129
+ errors << Error.new(:syntax, t, "Unexpected #{t.type}; expected str or '}'")
107
130
  else
108
- raise Error, "Unexpected token #{t}"
131
+ errors << Error.new(:syntax, t, "Unexpected #{t.type}; expected str or '}'")
109
132
  end
133
+ end
110
134
 
111
- # parse the operator (if any)
112
- if (op = tokens[1])
113
- raise Error, "Unexpected #{op}" unless op.is_a? Tokenizer::Char
114
- node.operator = op.char
115
-
116
- arg = tokens[2..-1].reduce("") { |acc, t|
117
- raise Error, "Unexpected #{t}" unless t.is_a? Tokenizer::Char
118
- acc + t.char
119
- }
120
- node.operator_arg = arg == BLANK ? nil : arg
135
+ if exp.arg.size != 1 and !OPS_WITH_OPTIONAL_ARGS.include?(exp.operator)
136
+ errors << Error.new(:arg, op_token, "Operator '#{op_token.val}' requires an argument")
137
+ end
138
+ end
121
139
 
122
- node.expression =
123
- case node.operator
124
- when OP_REPLACE
125
- ->(_) { node.operator_arg || BLANK }
126
- when OP_ADD, OP_SUB, OP_MUL, OP_DIV
127
- raise Error, "Operator #{node.operator} is only available for numeric matches" unless node.match == MATCH_NUM
128
- raise Error, "Operator #{node.operator} expects an argument" if node.operator_arg.nil?
129
- ->(x) { x.to_i.send(node.operator, node.operator_arg.to_i) }
130
- else
131
- raise(Error, "Unknown operator #{node.operator}")
132
- end
140
+ def parse_regex!(wildcard)
141
+ @tokenizer.regex_mode!
142
+ t = @tokenizer.next
143
+ reg = Nodes::Regex.new(wildcard, t.val)
144
+ if t.type == Tokens::TEXT
145
+ reg.regex = build_regex!(wildcard, t)
146
+ else
147
+ errors << Error.new(:syntax, t, "Unexpected #{t.type}; expected a string of regex")
133
148
  end
149
+
150
+ t = @tokenizer.next
151
+ errors << Error.new(:syntax, t, "Unexpected #{t.type}; expected a string of regex") unless t.type == Tokens::REG_DELIM
152
+ reg
153
+ end
154
+
155
+ def build_regex!(wildcard, token, src = token.val)
156
+ Regexp.new((wildcard ? REGEX_LAZY_WILDCARD : REGEX_START) + src)
157
+ rescue RegexpError => e
158
+ errors << Error.new(:regex, token, e.message)
159
+ nil
134
160
  end
135
161
  end
136
162
  end
data/lib/fop/program.rb CHANGED
@@ -1,22 +1,16 @@
1
- require_relative 'tokenizer'
2
- require_relative 'parser'
3
-
4
1
  module Fop
5
2
  class Program
6
- attr_reader :nodes
7
-
8
- def initialize(src)
9
- tokens = Tokenizer.new(src).tokenize!
10
- @nodes = Parser.parse! tokens
3
+ def initialize(instructions)
4
+ @instructions = instructions
11
5
  end
12
6
 
13
7
  def apply(input)
14
8
  input = input.clone
15
9
  output =
16
- @nodes.reduce("") { |acc, token|
17
- section = token.consume!(input)
18
- return nil if section.nil?
19
- acc + section.to_s
10
+ @instructions.reduce("") { |acc, ins|
11
+ result = ins.call(input)
12
+ return nil if result.nil?
13
+ acc + result.to_s
20
14
  }
21
15
  input.empty? ? output : nil
22
16
  end
data/lib/fop/tokenizer.rb CHANGED
@@ -1,123 +1,175 @@
1
+ require_relative 'tokens'
2
+
1
3
  module Fop
2
4
  class Tokenizer
3
- Char = Struct.new(:char)
4
- Op = Struct.new(:tokens)
5
- Regex = Struct.new(:src)
6
- Error = Class.new(StandardError)
5
+ Token = Struct.new(:pos, :type, :val)
6
+ Error = Struct.new(:pos, :message)
7
+ Escapes = Struct.new(:operators, :regex_capture, :regex, :regex_escape, :wildcards, :exp)
7
8
 
8
- OP_OPEN = "{".freeze
9
- OP_CLOSE = "}".freeze
9
+ EXP_OPEN = "{".freeze
10
+ EXP_CLOSE = "}".freeze
10
11
  ESCAPE = "\\".freeze
11
12
  WILDCARD = "*".freeze
12
- REGEX_MARKER = "/".freeze
13
+ REGEX_DELIM = "/".freeze
14
+ REGEX_CAPTURE = "$".freeze
15
+ OP_REPLACE = "=".freeze
16
+ OP_APPEND = ">".freeze
17
+ OP_PREPEND = "<".freeze
18
+ OP_ADD = "+".freeze
19
+ OP_SUB = "-".freeze
20
+
21
+ #
22
+ # Controls which "mode" the tokenizer is currently in. This is a necessary result of the syntax lacking
23
+ # explicit string delimiters. That *could* be worked around by requiring users to escape all reserved chars,
24
+ # but that's ugly af. Instead, the parser continually assesses the current context and flips these flags on
25
+ # or off to auto-escape certain chars for the next token.
26
+ #
27
+ attr_reader :escape
13
28
 
14
29
  def initialize(src)
15
30
  @src = src
16
31
  @end = src.size - 1
32
+ @start_i = 0
33
+ @i = 0
34
+ reset_escapes!
17
35
  end
18
36
 
19
- def tokenize!
20
- tokens = []
21
- escape = false
22
- i = 0
23
- until i > @end do
24
- char = @src[i]
25
- if escape
26
- tokens << Char.new(char)
27
- escape = false
28
- i += 1
29
- next
30
- end
31
-
32
- case char
33
- when ESCAPE
34
- escape = true
35
- i += 1
36
- when OP_OPEN
37
- i, op = operation! i + 1
38
- tokens << op
39
- when OP_CLOSE
40
- raise "Unexpected #{OP_CLOSE}"
41
- when WILDCARD
42
- tokens << :wildcard
43
- i += 1
44
- else
45
- tokens << Char.new(char)
46
- i += 1
47
- end
48
- end
49
-
50
- raise Error, "Trailing escape" if escape
51
- tokens
37
+ # Auto-escape operators and regex capture vars. Appropriate for top-level syntax.
38
+ def reset_escapes!
39
+ @escape = Escapes.new(true, true)
52
40
  end
53
41
 
54
- private
55
-
56
- def operation!(i)
57
- escape = false
58
- found_close = false
59
- tokens = []
42
+ # Auto-escape anything you'd find in a regular expression
43
+ def regex_mode!
44
+ @escape.regex = false # look for the final /
45
+ @escape.regex_escape = true # pass \ through to the regex engine UNLESS it's followed by a /
46
+ @escape.wildcards = true
47
+ @escape.operators = true
48
+ @escape.regex_capture = true
49
+ @escape.exp = true
50
+ end
60
51
 
61
- until found_close or i > @end do
62
- char = @src[i]
63
- if escape
64
- tokens << Char.new(char)
65
- escape = false
66
- i += 1
67
- next
52
+ def next
53
+ return Token.new(@i, Tokens::EOF) if @i > @end
54
+ char = @src[@i]
55
+ case char
56
+ when EXP_OPEN
57
+ @i += 1
58
+ token! Tokens::EXP_OPEN
59
+ when EXP_CLOSE
60
+ @i += 1
61
+ token! Tokens::EXP_CLOSE
62
+ when WILDCARD
63
+ @i += 1
64
+ token! Tokens::WILDCARD, WILDCARD
65
+ when REGEX_DELIM
66
+ if @escape.regex
67
+ get_str!
68
+ else
69
+ @i += 1
70
+ token! Tokens::REG_DELIM
68
71
  end
69
-
70
- case char
71
- when ESCAPE
72
- escape = true
73
- i += 1
74
- when OP_OPEN
75
- raise "Unexpected #{OP_OPEN}"
76
- when OP_CLOSE
77
- found_close = true
78
- i += 1
79
- when REGEX_MARKER
80
- i, reg = regex! i + 1
81
- tokens << reg
72
+ when REGEX_CAPTURE
73
+ if @escape.regex_capture
74
+ get_str!
82
75
  else
83
- tokens << Char.new(char)
84
- i += 1
76
+ @i += 1
77
+ t = token! Tokens::REG_CAPTURE, @src[@i]
78
+ @i += 1
79
+ @start_i = @i
80
+ t
85
81
  end
82
+ when OP_REPLACE, OP_APPEND, OP_PREPEND, OP_ADD, OP_SUB
83
+ if @escape.operators
84
+ get_str!
85
+ else
86
+ @i += 1
87
+ token! Tokens::OPERATOR, char
88
+ end
89
+ else
90
+ get_str!
86
91
  end
87
-
88
- raise Error, "Unclosed operation" if !found_close
89
- raise Error, "Trailing escape" if escape
90
- return i, Op.new(tokens)
91
92
  end
92
93
 
93
- def regex!(i)
94
- escape = false
95
- found_close = false
96
- src = ""
94
+ private
97
95
 
98
- until found_close or i > @end
99
- char = @src[i]
100
- i += 1
96
+ def token!(type, val = nil)
97
+ t = Token.new(@start_i, type, val)
98
+ @start_i = @i
99
+ t
100
+ end
101
+
102
+ def get_str!
103
+ str = ""
104
+ escape, found_end = false, false
105
+ until found_end or @i > @end
106
+ char = @src[@i]
101
107
 
102
108
  if escape
103
- src << char
109
+ @i += 1
110
+ str << char
104
111
  escape = false
105
112
  next
106
113
  end
107
114
 
108
115
  case char
109
116
  when ESCAPE
110
- escape = true
111
- when REGEX_MARKER
112
- found_close = true
117
+ @i += 1
118
+ if @escape.regex_escape and @src[@i] != REGEX_DELIM
119
+ str << char
120
+ else
121
+ escape = true
122
+ end
123
+ when EXP_OPEN
124
+ if @escape.exp
125
+ @i += 1
126
+ str << char
127
+ else
128
+ found_end = true
129
+ end
130
+ when EXP_CLOSE
131
+ if @escape.exp
132
+ @i += 1
133
+ str << char
134
+ else
135
+ found_end = true
136
+ end
137
+ when WILDCARD
138
+ if @escape.wildcards
139
+ @i += 1
140
+ str << char
141
+ else
142
+ found_end = true
143
+ end
144
+ when REGEX_DELIM
145
+ if @escape.regex
146
+ @i += 1
147
+ str << char
148
+ else
149
+ found_end = true
150
+ end
151
+ when REGEX_CAPTURE
152
+ if @escape.regex_capture
153
+ @i += 1
154
+ str << char
155
+ else
156
+ found_end = true
157
+ end
158
+ when OP_REPLACE, OP_APPEND, OP_PREPEND, OP_ADD, OP_SUB
159
+ if @escape.operators
160
+ @i += 1
161
+ str << char
162
+ else
163
+ found_end = true
164
+ end
113
165
  else
114
- src << char
166
+ @i += 1
167
+ str << char
115
168
  end
116
169
  end
117
170
 
118
- raise Error, "Unclosed regex" if !found_close
119
- raise Error, "Trailing escape" if escape
120
- return i, Regex.new(src)
171
+ return Token.new(@i - 1, Tokens::TR_ESC) if escape
172
+ token! Tokens::TEXT, str
121
173
  end
122
174
  end
123
175
  end
data/lib/fop/tokens.rb ADDED
@@ -0,0 +1,13 @@
1
+ module Fop
2
+ module Tokens
3
+ TEXT = :TXT
4
+ EXP_OPEN = :"{"
5
+ EXP_CLOSE = :"}"
6
+ REG_CAPTURE = :"$"
7
+ REG_DELIM = :/
8
+ WILDCARD = :*
9
+ OPERATOR = :op
10
+ TR_ESC = :"trailing escape"
11
+ EOF = :EOF
12
+ end
13
+ end
data/lib/fop/version.rb CHANGED
@@ -1,3 +1,3 @@
1
1
  module Fop
2
- VERSION = "0.3.0"
2
+ VERSION = "0.7.0"
3
3
  end
data/lib/fop_lang.rb CHANGED
@@ -1,12 +1,22 @@
1
1
  require_relative 'fop/version'
2
+ require_relative 'fop/compiler'
2
3
  require_relative 'fop/program'
3
4
 
4
5
  def Fop(src)
5
- ::Fop::Program.new(src)
6
+ ::Fop.compile!(src)
6
7
  end
7
8
 
8
9
  module Fop
10
+ def self.compile!(src)
11
+ prog, errors = compile(src)
12
+ # TODO better exception
13
+ raise "Fop errors: " + errors.map(&:message).join(",") if errors
14
+ prog
15
+ end
16
+
9
17
  def self.compile(src)
10
- Program.new(src)
18
+ instructions, errors = ::Fop::Compiler.compile(src)
19
+ return nil, errors if errors
20
+ return Program.new(instructions), nil
11
21
  end
12
22
  end
metadata CHANGED
@@ -1,26 +1,31 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: fop_lang
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.3.0
4
+ version: 0.7.0
5
5
  platform: ruby
6
6
  authors:
7
7
  - Jordan Hollinger
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2021-08-16 00:00:00.000000000 Z
11
+ date: 2021-08-30 00:00:00.000000000 Z
12
12
  dependencies: []
13
13
  description: A micro expression language for Filter and OPerations on text
14
14
  email: jordan.hollinger@gmail.com
15
- executables: []
15
+ executables:
16
+ - fop
16
17
  extensions: []
17
18
  extra_rdoc_files: []
18
19
  files:
19
20
  - README.md
21
+ - bin/fop
22
+ - lib/fop/cli.rb
23
+ - lib/fop/compiler.rb
20
24
  - lib/fop/nodes.rb
21
25
  - lib/fop/parser.rb
22
26
  - lib/fop/program.rb
23
27
  - lib/fop/tokenizer.rb
28
+ - lib/fop/tokens.rb
24
29
  - lib/fop/version.rb
25
30
  - lib/fop_lang.rb
26
31
  homepage: https://jhollinger.github.io/fop-lang-rb/