fop_lang 0.4.0 → 0.8.0
- checksums.yaml +4 -4
- data/README.md +86 -36
- data/bin/fop +42 -0
- data/lib/fop/cli.rb +34 -0
- data/lib/fop/compiler.rb +95 -0
- data/lib/fop/nodes.rb +22 -27
- data/lib/fop/parser.rb +142 -150
- data/lib/fop/program.rb +6 -12
- data/lib/fop/tokenizer.rb +154 -104
- data/lib/fop/tokens.rb +14 -0
- data/lib/fop/version.rb +1 -1
- data/lib/fop_lang.rb +12 -2
- metadata +8 -3
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: e23d8d937f5a4b5e4d74010bb91923dedce019543d4d3baefc228dece938a731
+  data.tar.gz: cc97f6953b708498be169352269b861c73c9dbe52ded1a72f4370a8d18d32d48
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: e2cec9cd47a472298f7af0268a9dc03aacce374ed88da7b505e33cb4536f6f1d04107cce7c33eba4718809d54591a3111bfd26971eef3c52073ba1226be4da4f
+  data.tar.gz: 80b5700d0cdda44dd021fe48d5c134cb992c6967b10681c43488a5a7276fbf03df7d7a9427a9aa92529569eaf0d134fa789df0c8e27cd2250dc50bcb16727d13
data/README.md
CHANGED
@@ -1,57 +1,107 @@
 # fop_lang
 
-Fop (Filter and OPerations language) is
+Fop (Filter and OPerations language) is a tiny, experimental language for filtering and operating on text. Think of it like awk but with the condition and action segments combined.
 
-
-gem 'fop_lang'
-```
+This is a Ruby implementation with both a library interface and a bin command.
 
-##
+## Installation
 
-
+```bash
+$ gem install fop_lang
+```
+
+You may use fop in a Ruby script:
 
 ```ruby
-
+require 'fop_lang'
 
-
-
+f = Fop('foo {N+1}')
+f.apply('foo 1')
+=> "foo 2"
+f.apply('bar 1')
+=> nil
+```
 
-
-
-
+or run `fop` from the command line:
+
+```bash
+$ echo 'foo 1' | fop 'foo {N+1}'
+foo 2
+$ echo 'bar 1' | fop 'foo {N+1}'
 ```
 
-##
+## Syntax
+
+`Text /(R|r)egex/ {N+1}`
+
+The above program demonstrates a text match, a regex match, and a match expression. If the input matches all three segments, output is given. If the input was `Text regex 5`, the output would be `Text regex 6`.
+
+### Text match
+
+`Text ` and ` ` in the above example.
+
+The input must match this text exactly. Whitespace is part of the match. Wildcards (`*`) are allowed. Special characters (`*/{}\`) may be escaped with `\`.
+
+The output of a text match will be the matching input.
+
+### Regex match
+
+`/(R|r)egex/` in the above example.
+
+Regular expressions may be placed between `/`s. If the regular expression contains a `/`, you may escape it with `\`. Special regex characters like `[]()+.*` may also be escaped with `\`.
 
-
+The output of a regex match will be the matching input.
 
-
+### Match expression
 
-
+`{N+1}` in the above example.
 
-A
+A match expression both matches on input and modifies that input. An expression is made up of 1 - 3 parts:
 
-
+1. The match, e.g. `N` for numeric.
+2. The operator, e.g. `+` for addition (optional).
+3. The argument, e.g `1` for "add one" (required for most operators).
 
-
+The output of a match expression will be the _modified_ matching input. If no operator is given, the output will be the matching input.
 
-
+**Matches**
 
-
-
-
-
-
-* `/.../` matches on the supplied regex between the `/`'s. If you're regex contains a `/`, it must be escaped. Capture groups may be referenced in the operator argument as `$1`, `$2`, etc.
-3. Operator (optional): What to do to the matching characters.
-* `=` Replace the matching character(s) with the given argument. If no argument is given, drop the matching chars.
-* `>` Append the following chars to the matching value.
-* `<` Prepend the following chars to the matching value.
-* `+` Perform addition on the matching number and the argument (`N` only).
-* `-` Subtract the argument from the matching number (`N` only).
-5. Operator argument (required for some operators): meaning varies by operator.
+* `N` matches one or more consecutive digits.
+* `A` matches one or more letters (lower or upper case).
+* `W` matches alphanumeric chars and underscores.
+* `*` greedily matches everything after it.
+* `/regex/` matches on the supplied regex. Capture groups may be referenced in the argument as `$1`, `$2`, etc.
 
-
+**Operators**
+
+* `=` Replace the matching character(s) with the given argument. If no argument is given, drop the matching chars.
+* `>` Append the argument to the matching value.
+* `<` Prepend the argument to the matching value.
+* `+` Perform addition on the matching number and the argument (`N` only).
+* `-` Subtract the argument from the matching number (`N` only).
+
+**Whitespace**
+
+Inside of match expressions, whitespace is an optional seperator of terms, i.e. `{ N + 1 }` is the same as `{N+1}`. This means that any spaces in string arguments must be escaped. For example, replacing a word with `foo bar` looks like `{W = foo\ bar}`.
+
+## Examples
+
+### Release Number Example
+
+This example takes in GitHub branch names, decides if they're release branches, and if so, increments the version number.
+
+```ruby
+f = Fop('release-{N}.{N+1}.{N=0}')
+
+puts f.apply('release-5.99.1')
+=> 'release-5.100.0'
+
+puts f.apply('release-5')
+=> nil
+# doesn't match the pattern
+```
+
+### More Examples
 
 ```ruby
 f = Fop('release-{N=5}.{N+1}.{N=0}')
@@ -61,10 +111,10 @@ Operations are the interesting part of Fop, and are specified between `{` and `}
 ```
 
 ```ruby
-f = Fop('rel{/(ease)
+f = Fop('rel{/(ease)?/=}-{N=5}.{N+1}.{N=0}')
 
 puts f.apply('release-4.99.1')
-=> '
+=> 'rel-5.100.0'
 
 puts f.apply('rel-4.99.1')
 => 'rel-5.100.0'
|
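The operator table in the README hunk above can be read as a small function from (matched text, operator, argument) to output text. The following is a minimal sketch of that reading only; `apply_operator` is a hypothetical name and is not part of the gem's API, and it ignores matching entirely.

```ruby
# Hypothetical stand-in for the README's operator table; not the gem's API.
# `value` is the text a match expression matched, `argument` its optional argument.
def apply_operator(value, operator = nil, argument = nil)
  case operator
  when nil then value                                # no operator: output the match unchanged
  when "=" then argument.to_s                        # replace; with no argument the match is dropped
  when ">" then value + argument                     # append the argument
  when "<" then argument + value                     # prepend the argument
  when "+" then (value.to_i + argument.to_i).to_s    # numeric matches only
  when "-" then (value.to_i - argument.to_i).to_s    # numeric matches only
  else raise ArgumentError, "unknown operator #{operator.inspect}"
  end
end

puts apply_operator("99", "+", "1")     # what {N+1} would do to "99"
puts apply_operator("1", "=", "0")      # what {N=0} would do to "1"
puts apply_operator("ease", "=").inspect
```

Under this reading, `release-{N}.{N+1}.{N=0}` applied to `release-5.99.1` leaves `5` alone, turns `99` into `100`, and replaces `1` with `0`, which agrees with the README's stated output.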
data/bin/fop
ADDED
@@ -0,0 +1,42 @@
+#!/usr/bin/env ruby
+
+# Used for local testing
+# $LOAD_PATH.unshift File.join(File.dirname(__FILE__), '..', 'lib')
+
+require 'fop_lang'
+require 'fop/cli'
+
+opts = Fop::CLI.options!
+
+if opts.version
+  puts Fop::VERSION
+  exit 0
+end
+
+src = opts.src.read.chomp
+if src.empty?
+  $stderr.puts "No expression given"
+  exit 1
+end
+
+fop, errors = Fop.compile(src)
+opts.src.close
+NL = "\n".freeze
+
+if errors
+  $stderr.puts src
+  $stderr.puts errors.join(NL)
+  exit 1
+end
+
+if opts.check
+  $stdout.puts "Syntax OK" unless opts.quiet
+  exit 0
+end
+
+while (line = gets) do
+  line.chomp!
+  if (res = fop.apply(line))
+    print(res << NL)
+  end
+end
|
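The closing loop of `bin/fop` is a plain line filter: each input line is chomped, passed to `apply`, and printed only when the result is not `nil`. A minimal sketch of that loop, decoupled from `gets` and `$stdout` so it can be exercised in isolation, might look like this. `filter_lines` and `FakeProgram` are hypothetical names introduced for illustration; `FakeProgram` merely stands in for a compiled program.

```ruby
require 'stringio'

# Hypothetical helper: passes each line of `input` through `program`
# (anything responding to #apply) and writes non-nil results to `output`.
def filter_lines(program, input, output)
  input.each_line do |line|
    result = program.apply(line.chomp)
    output.puts(result) unless result.nil?
  end
end

# Stand-in for a compiled program: accepts "foo <digits>" and adds one.
FakeProgram = Struct.new(:pattern) do
  def apply(line)
    m = pattern.match(line)
    m && "foo #{m[1].to_i + 1}"
  end
end

out = StringIO.new
filter_lines(FakeProgram.new(/\Afoo (\d+)\z/), StringIO.new("foo 1\nbar 1\nfoo 41\n"), out)
print out.string
```

Lines that do not match produce no output at all, which is the behaviour the README shows for `echo 'bar 1' | fop 'foo {N+1}'`.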
data/lib/fop/cli.rb
ADDED
@@ -0,0 +1,34 @@
+require 'optparse'
+
+module Fop
+  module CLI
+    Options = Struct.new(:src, :check, :quiet, :version)
+
+    def self.options!
+      options = Options.new
+      OptionParser.new do |opts|
+        opts.banner = "Usage: fop [options] [ 'prog' | -f progfile ] [ file ... ]"
+
+        opts.on("-fFILE", "--file=FILE", "Read program from file instead of first argument") do |f|
+          options.src = File.open(f)
+          options.src.advise(:sequential)
+        end
+
+        opts.on("-c", "--check", "Perform a syntax check on the program and exit") do
+          options.check = true
+        end
+
+        opts.on("-q", "--quiet", "Only print errors and output") do
+          options.quiet = true
+        end
+
+        opts.on("--version", "Print version and exit") do
+          options.version = true
+        end
+      end.parse!
+
+      options.src ||= StringIO.new(ARGV.shift || "")
+      options
+    end
+  end
+end
|
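`Fop::CLI.options!` parses the global `ARGV` in place and then takes the first leftover argument as the program source. The sketch below shows the same idea with the argument vector passed in explicitly, which is easier to test. `parse_cli` and `SketchOptions` are hypothetical names, the `-f` and `--version` switches are omitted for brevity, and `stringio` is required explicitly here as an assumption about what a standalone script needs.

```ruby
require 'optparse'
require 'stringio'

SketchOptions = Struct.new(:src, :check, :quiet)

# Hypothetical variant of the option parsing that takes argv explicitly
# instead of mutating the global ARGV.
def parse_cli(argv)
  options = SketchOptions.new
  parser = OptionParser.new do |opts|
    opts.on("-c", "--check") { options.check = true }
    opts.on("-q", "--quiet") { options.quiet = true }
  end
  rest = parser.parse(argv)                      # non-destructive; returns the leftover arguments
  options.src = StringIO.new(rest.first || "")   # first leftover argument is the program text
  [options, rest.drop(1)]                        # the remainder would be input files
end

options, files = parse_cli(["-c", "foo {N+1}", "input.txt"])
puts options.check.inspect
puts options.src.read
puts files.inspect
```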
data/lib/fop/compiler.rb
ADDED
@@ -0,0 +1,95 @@
+require_relative 'parser'
+
+module Fop
+  module Compiler
+    def self.compile(src)
+      parser = Parser.new(src)
+      nodes, errors = parser.parse
+
+      instructions = nodes.map { |node|
+        case node
+        when Nodes::Text, Nodes::Regex
+          Instructions.regex_match(node.regex)
+        when Nodes::Expression
+          arg_error = Validations.validate_args(node)
+          errors << arg_error if arg_error
+          Instructions::ExpressionMatch.new(node)
+        else
+          raise "Unknown node type #{node}"
+        end
+      }
+
+      return nil, errors if errors.any?
+      return instructions, nil
+    end
+
+    module Instructions
+      Op = Struct.new(:proc, :arity, :max_arity)
+      BLANK = "".freeze
+      OPERATIONS = {
+        "=" => Op.new(->(_val, args) { args[0] || BLANK }, 0, 1),
+        "+" => Op.new(->(val, args) { val.to_i + args[0].to_i }, 1),
+        "-" => Op.new(->(val, args) { val.to_i - args[0].to_i }, 1),
+        ">" => Op.new(->(val, args) { val + args[0] }, 1),
+        "<" => Op.new(->(val, args) { args[0] + val }, 1),
+      }
+
+      def self.regex_match(regex)
+        ->(input) { input.slice! regex }
+      end
+
+      class ExpressionMatch
+        def initialize(node)
+          @regex = node.regex&.regex
+          @op = node.operator_token ? OPERATIONS.fetch(node.operator_token.val) : nil
+          @regex_match = node.regex_match
+          @args = node.args&.map { |arg|
+            arg.has_captures ? arg.segments : arg.segments.join("")
+          }
+        end
+
+        def call(input)
+          if (match = @regex.match(input))
+            val = match.to_s
+            blank = val == BLANK
+            input.sub!(val, BLANK) unless blank
+            found_val = @regex_match || !blank
+            if @op and @args and found_val
+              args = @args.map { |arg|
+                case arg
+                when String then arg
+                when Array then sub_caps(arg, match.captures)
+                else raise "Unexpected arg type #{arg.class.name}"
+                end
+              }
+              @op.proc.call(val, args)
+            else
+              val
+            end
+          end
+        end
+
+        private
+
+        def sub_caps(args, caps)
+          args.map { |a|
+            a.is_a?(Integer) ? caps[a].to_s : a
+          }.join("")
+        end
+      end
+    end
+
+    module Validations
+      def self.validate_args(exp_node)
+        op_token = exp_node.operator_token || return
+        op = Instructions::OPERATIONS.fetch(op_token.val)
+        num = exp_node.args&.size || 0
+        arity = op.arity
+        max_arity = op.max_arity || arity
+        if num < arity or num > max_arity
+          Parser::Error.new(:argument, op_token, "#{op_token.val} expects #{arity}..#{max_arity} arguments; #{num} given")
+        end
+      end
+    end
+  end
+end
|
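`Validations.validate_args` checks the number of arguments against a per-operator range: `arity` is the minimum, and `max_arity` falls back to `arity` when it is not set. The sketch below isolates that check using plain strings instead of tokens and `Parser::Error`; `SketchOp`, `SKETCH_OPS`, and `arity_error` are hypothetical names, and only two operators are listed.

```ruby
# Hypothetical mirror of the Op struct: `arity` is the minimum argument count,
# `max_arity` the maximum (treated as equal to `arity` when nil).
SketchOp = Struct.new(:arity, :max_arity)

SKETCH_OPS = {
  "=" => SketchOp.new(0, 1),   # replace: argument optional
  "+" => SketchOp.new(1),      # add: exactly one argument
}.freeze

# Returns nil when the count is acceptable, otherwise an error message
# in the same shape as the one built in validate_args.
def arity_error(operator, arg_count)
  op = SKETCH_OPS.fetch(operator)
  max = op.max_arity || op.arity
  return nil if arg_count.between?(op.arity, max)
  "#{operator} expects #{op.arity}..#{max} arguments; #{arg_count} given"
end

p arity_error("=", 0)
p arity_error("+", 0)
p arity_error("=", 2)
```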
data/lib/fop/nodes.rb
CHANGED
@@ -1,44 +1,39 @@
 module Fop
   module Nodes
-    Text = Struct.new(:wildcard, :str) do
-      def consume!(input)
-        @regex ||= Regexp.new((wildcard ? ".*" : "^") + Regexp.escape(str))
-        input.slice!(@regex)
-      end
-
+    Text = Struct.new(:wildcard, :str, :regex) do
       def to_s
         w = wildcard ? "*" : nil
-        "
+        "[#{w}txt] #{str}"
       end
     end
 
-
-      def
-
-
-          blank = val == Parser::BLANK
-          input.sub!(val, Parser::BLANK) unless blank
-          found_val = regex_match || !blank
-          arg = operator_arg_w_caps ? sub_caps(operator_arg_w_caps, match.captures) : operator_arg
-          expression && found_val ? expression.call(val, operator, arg) : val
-        end
+    Regex = Struct.new(:wildcard, :src, :regex) do
+      def to_s
+        w = wildcard ? "*" : nil
+        "[#{w}reg] #{src}"
       end
+    end
 
+    Expression = Struct.new(:wildcard, :match, :regex_match, :regex, :operator_token, :args) do
       def to_s
         w = wildcard ? "*" : nil
-        s = "#{w}#{match}"
-
+        s = "[#{w}exp] #{match}"
+        if operator_token
+          arg_str = args
+            .map { |a| a.is_a?(Integer) ? "$#{a+1}" : a.to_s }
+            .join("")
+          s << " #{operator_token.val} #{arg_str}"
+        end
         s
       end
+    end
 
-
-
-
-
-
-
-          when Parser::CaptureGroup then caps[t.index].to_s
-          else raise Parser::Error, "Unexpected #{t} in capture group"
+    Arg = Struct.new(:segments, :has_captures) do
+      def to_s
+        segments.map { |s|
+          case s
+          when Integer then "$#{s + 1}"
+          else s.to_s
           end
         }.join("")
       end
|
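In the new `Arg` node, an argument is a list of segments in which a `String` is literal text and an `Integer` is a zero-based capture index (the parser stores `$1` as `0`). The sketch below shows both directions under that assumption: rendering segments back to `$n` notation, as `Arg#to_s` does, and substituting real captures, as `sub_caps` does in the compiler. `render_arg` and `substitute_captures` are hypothetical free-standing names.

```ruby
# Segments are literal Strings or zero-based Integer capture indexes.

# Renders segments back to source notation, e.g. 0 becomes "$1".
def render_arg(segments)
  segments.map { |s| s.is_a?(Integer) ? "$#{s + 1}" : s.to_s }.join
end

# Replaces each capture index with the corresponding captured text.
def substitute_captures(segments, captures)
  segments.map { |s| s.is_a?(Integer) ? captures[s].to_s : s }.join
end

segments = ["v", 0, ".", 1]                     # as if parsed from the argument `v$1.$2`
match = /(\d+)-(\d+)/.match("10-20")

puts render_arg(segments)
puts substitute_captures(segments, match.captures)
```

A capture index with no corresponding group yields `nil`, which `to_s` turns into an empty string rather than raising.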
data/lib/fop/parser.rb
CHANGED
@@ -1,181 +1,173 @@
+require_relative 'tokenizer'
 require_relative 'nodes'
 
 module Fop
-
-
-
+  class Parser
+    DIGIT = /^[0-9]$/
+    REGEX_START = "^".freeze
+    REGEX_LAZY_WILDCARD = ".*?".freeze
+    REGEX_MATCHES = {
+      "N" => "[0-9]+".freeze,
+      "W" => "\\w+".freeze,
+      "A" => "[a-zA-Z]+".freeze,
+      "*" => ".*".freeze,
+    }.freeze
+    #OPS_WITH_OPTIONAL_ARGS = [Tokenizer::OP_REPLACE]
+    TR_REGEX = /.*/
+
+    Error = Struct.new(:type, :token, :message) do
+      def to_s
+        "#{type.to_s.capitalize} error: #{message} at column #{token.pos}"
+      end
+    end
 
-
-    MATCH_WORD = "W".freeze
-    MATCH_ALPHA = "A".freeze
-    MATCH_WILD = "*".freeze
-    BLANK = "".freeze
-    OP_REPLACE = "=".freeze
-    OP_APPEND = ">".freeze
-    OP_PREPEND = "<".freeze
-    OP_ADD = "+".freeze
-    OP_SUB = "-".freeze
-    OP_MUL = "*".freeze
-    OP_DIV = "/".freeze
-    VAR = "$".freeze
-    CAP_NUM = /^[1-9]$/
+    attr_reader :errors
 
-
-
-
-
+    def initialize(src, debug: false)
+      @tokenizer = Tokenizer.new(src)
+      @errors = []
+    end
 
-    def
+    def parse
       nodes = []
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
+      wildcard = false
+      eof = false
+      # Top-level parsing. It will always be looking for a String, Regex, or Expression.
+      until eof
+        @tokenizer.reset_escapes!
+        t = @tokenizer.next
+        case t.type
+        when Tokens::WILDCARD
+          errors << Error.new(:syntax, t, "Consecutive wildcards") if wildcard
+          wildcard = true
+        when Tokens::TEXT
+          reg = build_regex!(wildcard, t, Regexp.escape(t.val))
+          nodes << Nodes::Text.new(wildcard, t.val, reg)
+          wildcard = false
+        when Tokens::EXP_OPEN
+          nodes << parse_exp!(wildcard)
+          wildcard = false
+        when Tokens::REG_DELIM
+          nodes << parse_regex!(wildcard)
+          wildcard = false
+        when Tokens::EOF
+          eof = true
         else
-
+          errors << Error.new(:syntax, t, "Unexpected #{t.type}")
         end
-      }
-
-      case curr_node
-      when nil
-        # noop
-      when :wildcard
-        nodes << Nodes::Text.new(true, "")
-      when Nodes::Text, Nodes::Op
-        nodes << curr_node
-      else
-        raise Error, "Unexpected end node #{curr_node}"
       end
-
-      nodes
+      nodes << Nodes::Text.new(true, "", TR_REGEX) if wildcard
+      return nodes, @errors
     end
 
-
+    def parse_exp!(wildcard = false)
+      exp = Nodes::Expression.new(wildcard)
+      parse_exp_match! exp
+      parse_exp_operator! exp
+      if exp.operator_token
+        parse_exp_arg! exp
+      end
+      return exp
+    end
 
-    def
-
-
-
-
-
-
-
-
-
+    def parse_exp_match!(exp)
+      @tokenizer.escape.whitespace = false
+      @tokenizer.escape.operators = false
+      t = @tokenizer.next
+      case t.type
+      when Tokens::TEXT, Tokens::WILDCARD
+        exp.match = t.val
+        if (src = REGEX_MATCHES[exp.match])
+          reg = Regexp.new((exp.wildcard ? REGEX_LAZY_WILDCARD : REGEX_START) + src)
+          exp.regex = Nodes::Regex.new(exp.wildcard, src, reg)
+        else
+          errors << Error.new(:name, t, "Unknown match type '#{exp.match}'") if exp.regex.nil?
+        end
+      when Tokens::REG_DELIM
+        exp.regex = parse_regex!(exp.wildcard)
+        exp.match = exp.regex&.src
+        exp.regex_match = true
+        @tokenizer.reset_escapes!
       else
-
+        errors << Error.new(:syntax, t, "Unexpected #{t.type}; expected a string or a regex")
       end
     end
 
-
-
-
-
-
-
-
-      when
-
-        return op, node
-      when :wildcard
-        return :wildcard, node
+    def parse_exp_operator!(exp)
+      @tokenizer.escape.whitespace = false
+      @tokenizer.escape.operators = false
+      t = @tokenizer.next
+      case t.type
+      when Tokens::EXP_CLOSE
+        # no op
+      when Tokens::OPERATOR, Tokens::TEXT
+        exp.operator_token = t
       else
-
+        errors << Error.new(:syntax, t, "Unexpected #{t.type}; expected an operator")
       end
     end
 
-    def
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
+    def parse_exp_arg!(exp)
+      @tokenizer.escape.whitespace = false
+      @tokenizer.escape.whitespace_sep = false
+      @tokenizer.escape.operators = true
+      @tokenizer.escape.regex = true
+      @tokenizer.escape.regex_capture = false if exp.regex_match
+
+      arg = Nodes::Arg.new([], false)
+      exp.args = []
+      found_close, eof = false, false
+      until found_close or eof
+        t = @tokenizer.next
+        case t.type
+        when Tokens::TEXT
+          arg.segments << t.val
+        when Tokens::REG_CAPTURE
+          arg.has_captures = true
+          arg.segments << t.val.to_i - 1
+          errors << Error.new(:syntax, t, "Invalid regex capture; must be between 0 and 9 (found #{t.val})") unless t.val =~ DIGIT
+          errors << Error.new(:syntax, t, "Unexpected regex capture; expected str or '}'") if !exp.regex_match
+        when Tokens::WHITESPACE_SEP
+          if arg.segments.any?
+            exp.args << arg
+            arg = Nodes::Arg.new([])
+          end
+        when Tokens::EXP_CLOSE
+          found_close = true
+        when Tokens::EOF
+          eof = true
+          errors << Error.new(:syntax, t, "Unexpected #{t.type}; expected str or '}'")
         else
-
+          errors << Error.new(:syntax, t, "Unexpected #{t.type}; expected str or '}'")
         end
-
-      # parse the operator (if any)
-      if token.operator
-        raise Error, "Unexpected #{token.operator} for operator" unless token.operator.is_a? Tokenizer::Char
-        node.operator = token.operator.char
-        node.operator_arg = token.arg if token.arg and token.arg != BLANK
-        node.operator_arg_w_caps = parse_captures! node.operator_arg if node.operator_arg and node.regex_match
-        node.expression =
-          case node.operator
-          when OP_REPLACE
-            EXP_REPLACE
-          when OP_ADD, OP_SUB, OP_MUL, OP_DIV
-            raise Error, "Operator #{node.operator} is only available for numeric matches" unless node.match == MATCH_NUM
-            raise Error, "Operator #{node.operator} expects an argument" if node.operator_arg.nil?
-            EXP_MATH
-          when OP_APPEND
-            raise Error, "Operator #{node.operator} expects an argument" if node.operator_arg.nil?
-            EXP_APPEND
-          when OP_PREPEND
-            raise Error, "Operator #{node.operator} expects an argument" if node.operator_arg.nil?
-            EXP_PREPEND
-          else
-            raise Error, "Unknown operator #{node.operator}"
-          end
       end
-
-
-    def self.parse_captures!(arg)
-      i = 0
-      iend = arg.size - 1
-      escape = false
-      nodes = []
-
-      until i > iend
-        char = arg[i]
-        i += 1
+      exp.args << arg if arg.segments.any?
 
-
-
-
-
-        end
+      #if exp.arg.size != 1 and !OPS_WITH_OPTIONAL_ARGS.include?(exp.operator)
+      #  errors << Error.new(:arg, op_token, "Operator '#{op_token.val}' requires an argument")
+      #end
+    end
 
-
-
-
-
-
-
-
-
-        else
-          nodes << char
-        end
+    def parse_regex!(wildcard)
+      @tokenizer.regex_mode!
+      t = @tokenizer.next
+      reg = Nodes::Regex.new(wildcard, t.val)
+      if t.type == Tokens::TEXT
+        reg.regex = build_regex!(wildcard, t)
+      else
+        errors << Error.new(:syntax, t, "Unexpected #{t.type}; expected a string of regex")
       end
 
-
-
+      t = @tokenizer.next
+      errors << Error.new(:syntax, t, "Unexpected #{t.type}; expected a string of regex") unless t.type == Tokens::REG_DELIM
+      reg
+    end
+
+    def build_regex!(wildcard, token, src = token.val)
+      Regexp.new((wildcard ? REGEX_LAZY_WILDCARD : REGEX_START) + src)
+    rescue RegexpError => e
+      errors << Error.new(:regex, token, e.message)
+      nil
     end
   end
 end
|
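`build_regex!` prefixes every segment's pattern with either `^` (the segment must start at the current position) or `.*?` (a preceding wildcard lazily skips ahead), and records a `RegexpError` as a parse error instead of raising. A minimal sketch of both behaviours follows; `segment_regex` is a hypothetical name and the error list is a plain array of messages rather than `Parser::Error` structs.

```ruby
# Builds the pattern for one segment: anchored by default, or preceded by a
# lazy wildcard when the segment follows a `*`.
def segment_regex(source, wildcard)
  Regexp.new((wildcard ? ".*?" : "^") + source)
end

anchored = segment_regex(Regexp.escape("foo "), false)
lazy     = segment_regex("[0-9]+", true)

input = +"foo bar 42"
p input.slice!(anchored)   # consumes the literal prefix
p input.slice!(lazy)       # skips "bar " and consumes through the digits
p input                    # nothing is left over

# An anchored numeric segment does not match when other text comes first.
p (+"bar 42").slice!(segment_regex("[0-9]+", false))

# Invalid patterns are collected rather than raised.
errors = []
begin
  segment_regex("(", false)
rescue RegexpError => e
  errors << e.message
end
p errors.size
```

Because the wildcard text is part of the same pattern, the skipped characters are included in the consumed match and therefore in that segment's output.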
data/lib/fop/program.rb
CHANGED
@@ -1,22 +1,16 @@
-require_relative 'tokenizer'
-require_relative 'parser'
-
 module Fop
   class Program
-
-
-    def initialize(src)
-      tokens = Tokenizer.new(src).tokenize!
-      @nodes = Parser.parse! tokens
+    def initialize(instructions)
+      @instructions = instructions
     end
 
     def apply(input)
       input = input.clone
       output =
-        @
-
-          return nil if
-          acc +
+        @instructions.reduce("") { |acc, ins|
+          result = ins.call(input)
+          return nil if result.nil?
+          acc + result.to_s
         }
       input.empty? ? output : nil
     end
|
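`Program#apply` runs the instructions in order against a copy of the input. Each instruction consumes what it matched and returns the text to emit; a `nil` from any instruction, or unconsumed input at the end, makes the whole program return `nil`. The sketch below reproduces that flow with hand-written lambdas for the README's `release-{N}.{N+1}.{N=0}` example. `run_pipeline` and the lambda names are hypothetical, and the lambdas are written by hand rather than produced by the gem's compiler.

```ruby
# Each instruction removes its match from the front of the string it is given
# (mutating it) and returns the text to emit, or nil when it does not match.
def run_pipeline(instructions, input)
  remaining = input.dup
  output = +""
  instructions.each do |instruction|
    result = instruction.call(remaining)
    return nil if result.nil?
    output << result.to_s
  end
  remaining.empty? ? output : nil   # leftover input means no match
end

literal   = ->(text) { ->(s) { s.slice!(/^#{Regexp.escape(text)}/) } }
keep_num  = ->(s) { s.slice!(/^[0-9]+/) }                                  # like {N}
increment = ->(s) { (n = s.slice!(/^[0-9]+/)) && (n.to_i + 1).to_s }       # like {N+1}
zero      = ->(s) { s.slice!(/^[0-9]+/) && "0" }                           # like {N=0}

program = [literal.("release-"), keep_num, literal.("."), increment, literal.("."), zero]

p run_pipeline(program, "release-5.99.1")
p run_pipeline(program, "release-5")
```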
data/lib/fop/tokenizer.rb
CHANGED
@@ -1,144 +1,194 @@
+require_relative 'tokens'
+
 module Fop
   class Tokenizer
-
-
-    Regex = Struct.new(:src)
-    Error = Class.new(StandardError)
+    Token = Struct.new(:pos, :type, :val)
+    Escapes = Struct.new(:whitespace, :whitespace_sep, :operators, :regex_capture, :regex, :regex_escape, :wildcards, :exp)
 
-
-
+    EXP_OPEN = "{".freeze
+    EXP_CLOSE = "}".freeze
     ESCAPE = "\\".freeze
     WILDCARD = "*".freeze
-
+    REGEX_DELIM = "/".freeze
+    REGEX_CAPTURE = "$".freeze
+    OP_REPLACE = "=".freeze
+    OP_APPEND = ">".freeze
+    OP_PREPEND = "<".freeze
+    OP_ADD = "+".freeze
+    OP_SUB = "-".freeze
+    WHITESPACE = " ".freeze
+
+    #
+    # Controls which "mode" the tokenizer is currently in. This is a necessary result of the syntax lacking
+    # explicit string delimiters. That *could* be worked around by requiring users to escape all reserved chars,
+    # but that's ugly af. Instead, the parser continually assesses the current context and flips these flags on
+    # or off to auto-escape certain chars for the next token.
+    #
+    attr_reader :escape
 
     def initialize(src)
       @src = src
       @end = src.size - 1
+      @start_i = 0
+      @i = 0
+      reset_escapes!
     end
 
-
-
-      escape =
-      i = 0
-      until i > @end do
-        char = @src[i]
-        i += 1
-
-        if escape
-          tokens << Char.new(char)
-          escape = false
-          next
-        end
-
-        case char
-        when ESCAPE
-          escape = true
-        when OP_OPEN
-          i, op = operation! i
-          tokens << op
-        when OP_CLOSE
-          raise "Unexpected #{OP_CLOSE}"
-        when WILDCARD
-          tokens << :wildcard
-        else
-          tokens << Char.new(char)
-        end
-      end
-
-      raise Error, "Trailing escape" if escape
-      tokens
+    # Auto-escape operators and regex capture vars. Appropriate for top-level syntax.
+    def reset_escapes!
+      @escape = Escapes.new(true, true, true, true)
     end
 
-
-
-
-
-
+    # Auto-escape anything you'd find in a regular expression
+    def regex_mode!
+      @escape.whitespace = true
+      @escape.regex = false # look for the final /
+      @escape.regex_escape = true # pass \ through to the regex engine UNLESS it's followed by a /
+      @escape.wildcards = true
+      @escape.operators = true
+      @escape.regex_capture = true
+      @escape.exp = true
+    end
 
-
-
-
-
-
-
-
-
-
-
+    def next
+      return Token.new(@i, Tokens::EOF) if @i > @end
+      char = @src[@i]
+      case char
+      when EXP_OPEN
+        @i += 1
+        token! Tokens::EXP_OPEN
+      when EXP_CLOSE
+        @i += 1
+        token! Tokens::EXP_CLOSE
+      when WILDCARD
+        @i += 1
+        token! Tokens::WILDCARD, WILDCARD
+      when REGEX_DELIM
+        if @escape.regex
+          get_str!
         else
-
+          @i += 1
+          token! Tokens::REG_DELIM
         end
-
-
-
-      until found_close or op.operator or i > @end do
-        char = @src[i]
-        i += 1
-        case char
-        when OP_CLOSE
-          found_close = true
+      when REGEX_CAPTURE
+        if @escape.regex_capture
+          get_str!
         else
-
+          @i += 1
+          t = token! Tokens::REG_CAPTURE, @src[@i]
+          @i += 1
+          @start_i = @i
+          t
         end
-
-
-
-
-
-
-        i += 1
-
-        if escape
-          op.arg << char
-          escape = false
-          next
+      when OP_REPLACE, OP_APPEND, OP_PREPEND, OP_ADD, OP_SUB
+        if @escape.operators
+          get_str!
+        else
+          @i += 1
+          token! Tokens::OPERATOR, char
         end
-
-
-
-
-
-
-        when OP_CLOSE
-          found_close = true
+      when WHITESPACE
+        if @escape.whitespace
+          get_str!
+        elsif !@escape.whitespace_sep
+          @i += 1
+          token! Tokens::WHITESPACE_SEP
         else
-
+          @i += 1
+          @start_i = @i
+          self.next
         end
+      else
+        get_str!
       end
-
-      raise Error, "Unclosed operation" if !found_close
-      raise Error, "Trailing escape" if escape
-      return i, op
     end
 
-
-
-
-
+    private
+
+    def token!(type, val = nil)
+      t = Token.new(@start_i, type, val)
+      @start_i = @i
+      t
+    end
 
-
-
-
+    def get_str!
+      str = ""
+      escape, found_end = false, false
+      until found_end or @i > @end
+        char = @src[@i]
 
         if escape
-
+          @i += 1
+          str << char
           escape = false
           next
         end
 
         case char
         when ESCAPE
-
-
-
+          @i += 1
+          if @escape.regex_escape and @src[@i] != REGEX_DELIM
+            str << char
+          else
+            escape = true
+          end
+        when EXP_OPEN
+          if @escape.exp
+            @i += 1
+            str << char
+          else
+            found_end = true
+          end
+        when EXP_CLOSE
+          if @escape.exp
+            @i += 1
+            str << char
+          else
+            found_end = true
+          end
+        when WILDCARD
+          if @escape.wildcards
+            @i += 1
+            str << char
+          else
+            found_end = true
+          end
+        when REGEX_DELIM
+          if @escape.regex
+            @i += 1
+            str << char
+          else
+            found_end = true
+          end
+        when REGEX_CAPTURE
+          if @escape.regex_capture
+            @i += 1
+            str << char
+          else
+            found_end = true
+          end
+        when OP_REPLACE, OP_APPEND, OP_PREPEND, OP_ADD, OP_SUB
+          if @escape.operators
+            @i += 1
+            str << char
+          else
+            found_end = true
+          end
+        when WHITESPACE
+          if @escape.whitespace
+            @i += 1
+            str << char
+          else
+            found_end = true
+          end
         else
-
+          @i += 1
+          str << char
         end
       end
 
-
-
-      return i, Regex.new(src)
+      return Token.new(@i - 1, Tokens::TR_ESC) if escape
+      token! Tokens::TEXT, str
     end
   end
 end
|
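The comment on `attr_reader :escape` explains that the same character can be plain text or a delimiter depending on the parsing context. A much-simplified way to picture this is a text reader that is handed the set of characters that currently end a token, while a backslash always makes the next character literal. The sketch below is only that picture: `read_text`, `TOP_LEVEL`, and `ARGUMENT` are hypothetical names, the two sets are an approximate reading of `reset_escapes!` and `parse_exp_arg!` for a regex-match expression, and the real tokenizer tracks individual flags instead of passing a set.

```ruby
# Reads characters from `src` starting at `pos` until it reaches a character
# listed in `terminators`; a backslash makes the next character literal.
# Returns [text, position_where_reading_stopped].
def read_text(src, pos, terminators)
  text = +""
  escaped = false
  while pos < src.size
    char = src[pos]
    if escaped
      text << char
      escaped = false
    elsif char == "\\"
      escaped = true
    elsif terminators.include?(char)
      break
    else
      text << char
    end
    pos += 1
  end
  [text, pos]
end

TOP_LEVEL = ["{", "}", "*", "/"].freeze        # operators and spaces are ordinary text here
ARGUMENT  = ["{", "}", "*", "$", " "].freeze   # in an argument, a space separates terms

p read_text("a+b {N}", 0, TOP_LEVEL)
p read_text("foo\\ bar baz}", 0, ARGUMENT)
```

The second call shows why the README asks for `{W = foo\ bar}`: inside an argument an unescaped space ends the current term, so the space in `foo bar` has to be escaped to stay part of it.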
data/lib/fop/tokens.rb
ADDED
data/lib/fop/version.rb
CHANGED
data/lib/fop_lang.rb
CHANGED
@@ -1,12 +1,22 @@
 require_relative 'fop/version'
+require_relative 'fop/compiler'
 require_relative 'fop/program'
 
 def Fop(src)
-  ::Fop
+  ::Fop.compile!(src)
 end
 
 module Fop
+  def self.compile!(src)
+    prog, errors = compile(src)
+    # TODO better exception
+    raise "Fop errors: " + errors.map(&:message).join(",") if errors
+    prog
+  end
+
   def self.compile(src)
-
+    instructions, errors = ::Fop::Compiler.compile(src)
+    return nil, errors if errors
+    return Program.new(instructions), nil
   end
 end
|
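The new entry points come as a pair: `compile` returns a `[program, errors]` tuple in which exactly one element is `nil`, and `compile!` wraps it and raises when errors are present. The sketch below shows the same pairing with a toy check in place of the real compiler. `compile_sketch` and `compile_sketch!` are hypothetical names, the brace-counting rule is invented purely for illustration, and `ArgumentError` is an arbitrary choice (the gem currently raises a plain `RuntimeError` and marks this with a TODO).

```ruby
# Hypothetical toy compiler: returns [result, nil] on success or [nil, errors].
def compile_sketch(src)
  errors = []
  errors << "unbalanced braces" unless src.count("{") == src.count("}")
  return nil, errors if errors.any?
  return src.strip, nil
end

# Raising wrapper around the tuple-returning version.
def compile_sketch!(src)
  result, errors = compile_sketch(src)
  raise ArgumentError, "Fop errors: " + errors.join(",") if errors
  result
end

p compile_sketch("foo {N+1}")
p compile_sketch("foo {N+1")

begin
  compile_sketch!("foo {N+1")
rescue ArgumentError => e
  puts e.message
end
```

The tuple form suits callers such as a command-line tool that want to print every error, while the raising form keeps one-off library use short.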
metadata
CHANGED
@@ -1,26 +1,31 @@
 --- !ruby/object:Gem::Specification
 name: fop_lang
 version: !ruby/object:Gem::Version
-  version: 0.
+  version: 0.8.0
 platform: ruby
 authors:
 - Jordan Hollinger
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2021-
+date: 2021-09-01 00:00:00.000000000 Z
 dependencies: []
 description: A micro expression language for Filter and OPerations on text
 email: jordan.hollinger@gmail.com
-executables:
+executables:
+- fop
 extensions: []
 extra_rdoc_files: []
 files:
 - README.md
+- bin/fop
+- lib/fop/cli.rb
+- lib/fop/compiler.rb
 - lib/fop/nodes.rb
 - lib/fop/parser.rb
 - lib/fop/program.rb
 - lib/fop/tokenizer.rb
+- lib/fop/tokens.rb
 - lib/fop/version.rb
 - lib/fop_lang.rb
 homepage: https://jhollinger.github.io/fop-lang-rb/