RubyGems - fop_lang - Versions diffs - 0.4.0 → 0.8.0 - Mend

fop_lang 0.4.0 → 0.8.0

Files changed (13)

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 8ed95bb4708820a186e6485cc29dbb47286b0f309a1caf91af7778d768b0efb3
-  data.tar.gz: 85d41728ddae13f3667f0a2d55a5c4dcbc26e8217d4f31466e6ed92038859881
+  metadata.gz: e23d8d937f5a4b5e4d74010bb91923dedce019543d4d3baefc228dece938a731
+  data.tar.gz: cc97f6953b708498be169352269b861c73c9dbe52ded1a72f4370a8d18d32d48
 SHA512:
-  metadata.gz: e650bdf66d8d0b5dcb603eae494f38d4969a19647053b4e81e0612705f5b16a5755d007e4ce01fad9b487224d66eff462738f1e37b011ba2a4bf4a45b0203bb3
-  data.tar.gz: 99b31736236785cecc85b9bb23ccc3e366713cd23dd04c992a8fff6676a316d5741bfe5058e17cd0812c43d8bf1979aa7546184386018c1f92ed2462a85eb5fb
+  metadata.gz: e2cec9cd47a472298f7af0268a9dc03aacce374ed88da7b505e33cb4536f6f1d04107cce7c33eba4718809d54591a3111bfd26971eef3c52073ba1226be4da4f
+  data.tar.gz: 80b5700d0cdda44dd021fe48d5c134cb992c6967b10681c43488a5a7276fbf03df7d7a9427a9aa92529569eaf0d134fa789df0c8e27cd2250dc50bcb16727d13

data/README.md CHANGED

@@ -1,57 +1,107 @@
 # fop_lang
-Fop (Filter and OPerations language) is an experimental, tiny expression language in the vein of awk and sed. This is a Ruby implementation. It is useful for simultaneously matching and transforming text input.
+Fop (Filter and OPerations language) is a tiny, experimental language for filtering and operating on text. Think of it like awk but with the condition and action segments combined.
-```ruby
-gem 'fop_lang'
-```
+This is a Ruby implementation with both a library interface and a bin command.
-## Release Number Example
+## Installation
-This example takes in GitHub branch names, decides if they're release branches, and if so, increments the version number.
+```bash
+$ gem install fop_lang
+```
+You may use fop in a Ruby script:
 ```ruby
-  f = Fop('release-{N}.{N+1}.{N=0}')
+require 'fop_lang'
-  puts f.apply('release-5.99.1')
-  =>           'release-5.100.0'
+f = Fop('foo {N+1}')
+f.apply('foo 1')
+=> "foo 2"
+f.apply('bar 1')
+=> nil
+```
-  puts f.apply('release-5')
-  => nil
-  # doesn't match the pattern
+or run `fop` from the command line:
+```bash
+$ echo 'foo 1' | fop 'foo {N+1}'
+foo 2
+$ echo 'bar 1' | fop 'foo {N+1}'
 ```
-## Anatomy of a Fop expression
+## Syntax
+`Text /(R|r)egex/ {N+1}`
+The above program demonstrates a text match, a regex match, and a match expression. If the input matches all three segments, output is given. If the input was `Text regex 5`, the output would be `Text regex 6`.
+### Text match
+`Text ` and ` ` in the above example.
+The input must match this text exactly. Whitespace is part of the match. Wildcards (`*`) are allowed. Special characters (`*/{}\`) may be escaped with `\`.
+The output of a text match will be the matching input.
+### Regex match
+`/(R|r)egex/` in the above example.
+Regular expressions may be placed between `/`s. If the regular expression contains a `/`, you may escape it with `\`. Special regex characters like `[]()+.*` may also be escaped with `\`.
-`Text Literal {Operation}`
+The output of a regex match will be the matching input.
-The above expression contains the only two parts of Fop (except for the wildcard and escape characters).
+### Match expression
-**Text Literals**
+`{N+1}` in the above example.
-A text literal works how it sounds: the input must match it exactly. If it matches it passes through unchanged. The only exception is the `*` (wildcard) character, which matches 0 or more of anything. Wildcards can be used anywhere except inside `{...}` (operations).
+A match expression both matches on input and modifies that input. An expression is made up of 1 - 3 parts:
-If `\` (escape) is used before the special characters `*`, `{` or `}`, then that character is treated like a text literal. It's recommended to use single-quoted Ruby strings with Fop expressions so that you don't need to double-escape.
+1. The match, e.g. `N` for numeric.
+2. The operator, e.g. `+` for addition (optional).
+3. The argument, e.g. `1` for "add one" (required for most operators).
-**Operations**
+The output of a match expression will be the _modified_ matching input. If no operator is given, the output will be the matching input.
-Operations are the interesting part of Fop, and are specified between `{` and `}`. An Operation can consist of one to three parts:
+**Matches**
-1. Matching class (required): Defines what characters the operation will match and operate on.
-  * `N` is the numeric class and will match one or more digits.
-  * `A` is the alpha class and will match one or more letters (lower or upper case).
-  * `W` is the word class and matches alphanumeric chars and underscores.
-  * `*` is the wildcard class and greedily matches everything after it.
-  * `/.../` matches on the supplied regex between the `/`'s. If your regex contains a `/`, it must be escaped. Capture groups may be referenced in the operator argument as `$1`, `$2`, etc.
-3. Operator (optional): What to do to the matching characters.
-  * `=` Replace the matching character(s) with the given argument. If no argument is given, drop the matching chars.
-  * `>` Append the following chars to the matching value.
-  * `<` Prepend the following chars to the matching value.
-  * `+` Perform addition on the matching number and the argument (`N` only).
-  * `-` Subtract the argument from the matching number (`N` only).
-5. Operator argument (required for some operators): meaning varies by operator.
+* `N` matches one or more consecutive digits.
+* `A` matches one or more letters (lower or upper case).
+* `W` matches alphanumeric chars and underscores.
+* `*` greedily matches everything after it.
+* `/regex/` matches on the supplied regex. Capture groups may be referenced in the argument as `$1`, `$2`, etc.
-## More Examples
+**Operators**
+* `=` Replace the matching character(s) with the given argument. If no argument is given, drop the matching chars.
+* `>` Append the argument to the matching value.
+* `<` Prepend the argument to the matching value.
+* `+` Perform addition on the matching number and the argument (`N` only).
+* `-` Subtract the argument from the matching number (`N` only).
+**Whitespace**
+Inside of match expressions, whitespace is an optional separator of terms, i.e. `{ N + 1 }` is the same as `{N+1}`. This means that any spaces in string arguments must be escaped. For example, replacing a word with `foo bar` looks like `{W = foo\ bar}`.
+## Examples
+### Release Number Example
+This example takes in GitHub branch names, decides if they're release branches, and if so, increments the version number.
+```ruby
+  f = Fop('release-{N}.{N+1}.{N=0}')
+  puts f.apply('release-5.99.1')
+  =>           'release-5.100.0'
+  puts f.apply('release-5')
+  => nil
+  # doesn't match the pattern
+```
+### More Examples
 ```ruby
   f = Fop('release-{N=5}.{N+1}.{N=0}')
@@ -61,10 +111,10 @@ Operations are the interesting part of Fop, and are specified between `{` and `}
 ```
 ```ruby
-  f = Fop('rel{/(ease)?/}-{N=5}.{N+1}.{N=0}')
+  f = Fop('rel{/(ease)?/=}-{N=5}.{N+1}.{N=0}')
   puts f.apply('release-4.99.1')
-  =>           'release-5.100.0'
+  =>           'rel-5.100.0'
   puts f.apply('rel-4.99.1')
   =>           'rel-5.100.0'
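
Reviewer sketch (not part of the gem): the release-number example from the README can be approximated in plain Ruby to make the semantics of `{N}`, `{N+1}` and `{N=0}` concrete. `bump_release` is a hypothetical helper invented for this illustration; it does not use fop_lang and only mirrors the documented behaviour for this one pattern.

```ruby
# Plain-Ruby approximation of Fop('release-{N}.{N+1}.{N=0}').
# bump_release is a hypothetical name, not part of fop_lang.
def bump_release(branch)
  m = /\Arelease-(\d+)\.(\d+)\.(\d+)\z/.match(branch)
  return nil unless m            # no match => nil, like Fop's apply
  major = m[1]                   # {N}   passes the digits through unchanged
  minor = m[2].to_i + 1          # {N+1} adds one to the matched number
  "release-#{major}.#{minor}.0"  # {N=0} replaces the match with 0
end

puts bump_release('release-5.99.1')   # release-5.100.0
p    bump_release('release-5')        # nil
```

The Fop expression states the match and the transformation in one place, whereas the plain-Ruby version has to split them into a regex and a separate rebuild step.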

data/bin/fop ADDED

@@ -0,0 +1,42 @@
+#!/usr/bin/env ruby
+# Used for local testing
+# $LOAD_PATH.unshift File.join(File.dirname(__FILE__), '..', 'lib')
+require 'fop_lang'
+require 'fop/cli'
+opts = Fop::CLI.options!
+if opts.version
+  puts Fop::VERSION
+  exit 0
+end
+src = opts.src.read.chomp
+if src.empty?
+  $stderr.puts "No expression given"
+  exit 1
+end
+fop, errors = Fop.compile(src)
+opts.src.close
+NL = "\n".freeze
+if errors
+  $stderr.puts src
+  $stderr.puts errors.join(NL)
+  exit 1
+end
+if opts.check
+  $stdout.puts "Syntax OK" unless opts.quiet
+  exit 0
+end
+while (line = gets) do
+  line.chomp!
+  if (res = fop.apply(line))
+    print(res << NL)
+  end
+end

data/lib/fop/cli.rb ADDED

@@ -0,0 +1,34 @@
+require 'optparse'
+require 'stringio' # StringIO is used below for the inline program source
+module Fop
+  module CLI
+    Options = Struct.new(:src, :check, :quiet, :version)
+    def self.options!
+      options = Options.new
+      OptionParser.new do |opts|
+        opts.banner = "Usage: fop [options] [ 'prog' | -f progfile ] [ file ... ]"
+        opts.on("-fFILE", "--file=FILE", "Read program from file instead of first argument") do |f|
+          options.src = File.open(f)
+          options.src.advise(:sequential)
+        end
+        opts.on("-c", "--check", "Perform a syntax check on the program and exit") do
+          options.check = true
+        end
+        opts.on("-q", "--quiet", "Only print errors and output") do
+          options.quiet = true
+        end
+        opts.on("--version", "Print version and exit") do
+          options.version = true
+        end
+      end.parse!
+      options.src ||= StringIO.new(ARGV.shift || "")
+      options
+    end
+  end
+end

data/lib/fop/compiler.rb ADDED

@@ -0,0 +1,95 @@
+require_relative 'parser'
+module Fop
+  module Compiler
+    def self.compile(src)
+      parser = Parser.new(src)
+      nodes, errors = parser.parse
+      instructions = nodes.map { |node|
+        case node
+        when Nodes::Text, Nodes::Regex
+          Instructions.regex_match(node.regex)
+        when Nodes::Expression
+          arg_error = Validations.validate_args(node)
+          errors << arg_error if arg_error
+          Instructions::ExpressionMatch.new(node)
+        else
+          raise "Unknown node type #{node}"
+        end
+      }
+      return nil, errors if errors.any?
+      return instructions, nil
+    end
+    module Instructions
+      Op = Struct.new(:proc, :arity, :max_arity)
+      BLANK = "".freeze
+      OPERATIONS = {
+        "=" => Op.new(->(_val, args) { args[0] || BLANK }, 0, 1),
+        "+" => Op.new(->(val, args) { val.to_i + args[0].to_i }, 1),
+        "-" => Op.new(->(val, args) { val.to_i - args[0].to_i }, 1),
+        ">" => Op.new(->(val, args) { val + args[0] }, 1),
+        "<" => Op.new(->(val, args) { args[0] + val }, 1),
+      }
+      def self.regex_match(regex)
+        ->(input) { input.slice! regex }
+      end
+      class ExpressionMatch
+        def initialize(node)
+          @regex = node.regex&.regex
+          @op = node.operator_token ? OPERATIONS.fetch(node.operator_token.val) : nil
+          @regex_match = node.regex_match
+          @args = node.args&.map { |arg|
+            arg.has_captures ? arg.segments : arg.segments.join("")
+          }
+        end
+        def call(input)
+          if (match = @regex.match(input))
+            val = match.to_s
+            blank = val == BLANK
+            input.sub!(val, BLANK) unless blank
+            found_val = @regex_match || !blank
+            if @op and @args and found_val
+              args = @args.map { |arg|
+                case arg
+                when String then arg
+                when Array then sub_caps(arg, match.captures)
+                else raise "Unexpected arg type #{arg.class.name}"
+                end
+              }
+              @op.proc.call(val, args)
+            else
+              val
+            end
+          end
+        end
+        private
+        def sub_caps(args, caps)
+          args.map { |a|
+            a.is_a?(Integer) ? caps[a].to_s : a
+          }.join("")
+        end
+      end
+    end
+    module Validations
+      def self.validate_args(exp_node)
+        op_token = exp_node.operator_token || return
+        op = Instructions::OPERATIONS.fetch(op_token.val)
+        num = exp_node.args&.size || 0
+        arity = op.arity
+        max_arity = op.max_arity || arity
+        if num < arity or num > max_arity
+          Parser::Error.new(:argument, op_token, "#{op_token.val} expects #{arity}..#{max_arity} arguments; #{num} given")
+        end
+      end
+    end
+  end
+end
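
Reviewer sketch (not part of the gem): the compiler above pairs each operator with a proc and an arity range, and validates the argument count before running anything. The standalone snippet below reproduces that idea in miniature. `operations` and `run_op` are hypothetical names, and plain arrays stand in for the `Op` struct, so treat it as an illustration of the pattern rather than the gem's code.

```ruby
# Hypothetical stand-in for the OPERATIONS table:
# operator => [proc, minimum args, maximum args]
def operations
  {
    "=" => [->(_val, args) { args[0] || "" }, 0, 1],
    "+" => [->(val, args) { val.to_i + args[0].to_i }, 1, 1],
    "-" => [->(val, args) { val.to_i - args[0].to_i }, 1, 1],
    ">" => [->(val, args) { val + args[0] }, 1, 1],
    "<" => [->(val, args) { args[0] + val }, 1, 1],
  }
end

# Returns [result, nil] on success or [nil, message] on an arity error.
def run_op(operator, val, args)
  fn, min, max = operations.fetch(operator)
  unless args.size.between?(min, max)
    return [nil, "#{operator} expects #{min}..#{max} arguments; #{args.size} given"]
  end
  [fn.call(val, args), nil]
end

p run_op("+", "99", ["1"])     # [100, nil]
p run_op("=", "abc", [])       # ["", nil]
p run_op("+", "1", [])         # [nil, "+ expects 1..1 arguments; 0 given"]
```

Keeping arity next to the proc means a new operator only needs one table entry for both validation and execution.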

data/lib/fop/nodes.rb CHANGED

@@ -1,44 +1,39 @@
 module Fop
   module Nodes
-    Text = Struct.new(:wildcard, :str) do
-      def consume!(input)
-        @regex ||= Regexp.new((wildcard ? ".*" : "^") + Regexp.escape(str))
-        input.slice!(@regex)
-      end
+    Text = Struct.new(:wildcard, :str, :regex) do
       def to_s
         w = wildcard ? "*" : nil
-        "Text #{w}#{str}"
+        "[#{w}txt] #{str}"
       end
     end
-    Op = Struct.new(:wildcard, :match, :regex_match, :regex, :operator, :operator_arg, :operator_arg_w_caps, :expression) do
-      def consume!(input)
-        if (match = regex.match(input))
-          val = match.to_s
-          blank = val == Parser::BLANK
-          input.sub!(val, Parser::BLANK) unless blank
-          found_val = regex_match || !blank
-          arg = operator_arg_w_caps ? sub_caps(operator_arg_w_caps, match.captures) : operator_arg
-          expression && found_val ? expression.call(val, operator, arg) : val
-        end
+    Regex = Struct.new(:wildcard, :src, :regex) do
+      def to_s
+        w = wildcard ? "*" : nil
+        "[#{w}reg] #{src}"
       end
+    end
+    Expression = Struct.new(:wildcard, :match, :regex_match, :regex, :operator_token, :args) do
       def to_s
         w = wildcard ? "*" : nil
-        s = "#{w}#{match}"
-        s << " #{operator} #{operator_arg}" if operator
+        s = "[#{w}exp] #{match}"
+        if operator_token
+          arg_str = args
+            .map { |a| a.is_a?(Integer) ? "$#{a+1}" : a.to_s }
+            .join("")
+          s << " #{operator_token.val} #{arg_str}"
+        end
         s
       end
+    end
-      private
-      def sub_caps(tokens, caps)
-        tokens.map { |t|
-          case t
-          when String then t
-          when Parser::CaptureGroup then caps[t.index].to_s
-          else raise Parser::Error, "Unexpected #{t} in capture group"
+    Arg = Struct.new(:segments, :has_captures) do
+      def to_s
+        segments.map { |s|
+          case s
+          when Integer then "$#{s + 1}"
+          else s.to_s
           end
         }.join("")
       end

data/lib/fop/parser.rb CHANGED

@@ -1,181 +1,173 @@
+require_relative 'tokenizer'
 require_relative 'nodes'
 module Fop
-  module Parser
-    Error = Class.new(StandardError)
-    CaptureGroup = Struct.new(:index)
+  class Parser
+    DIGIT = /^[0-9]$/
+    REGEX_START = "^".freeze
+    REGEX_LAZY_WILDCARD = ".*?".freeze
+    REGEX_MATCHES = {
+      "N" => "[0-9]+".freeze,
+      "W" => "\\w+".freeze,
+      "A" => "[a-zA-Z]+".freeze,
+      "*" => ".*".freeze,
+    }.freeze
+    #OPS_WITH_OPTIONAL_ARGS = [Tokenizer::OP_REPLACE]
+    TR_REGEX = /.*/
+    Error = Struct.new(:type, :token, :message) do
+      def to_s
+        "#{type.to_s.capitalize} error: #{message} at column #{token.pos}"
+      end
+    end
-    MATCH_NUM = "N".freeze
-    MATCH_WORD = "W".freeze
-    MATCH_ALPHA = "A".freeze
-    MATCH_WILD = "*".freeze
-    BLANK = "".freeze
-    OP_REPLACE = "=".freeze
-    OP_APPEND = ">".freeze
-    OP_PREPEND = "<".freeze
-    OP_ADD = "+".freeze
-    OP_SUB = "-".freeze
-    OP_MUL = "*".freeze
-    OP_DIV = "/".freeze
-    VAR = "$".freeze
-    CAP_NUM = /^[1-9]$/
+    attr_reader :errors
-    EXP_REPLACE = ->(_val, _op, arg) { arg || BLANK }
-    EXP_MATH = ->(val, op, arg) { val.to_i.send(op, arg.to_i) }
-    EXP_APPEND = ->(val, _op, arg) { val + arg }
-    EXP_PREPEND = ->(val, _op, arg) { arg + val }
+    def initialize(src, debug: false)
+      @tokenizer = Tokenizer.new(src)
+      @errors = []
+    end
-    def self.parse!(tokens)
+    def parse
       nodes = []
-      curr_node = nil
-      tokens.each { |token|
-        case curr_node
-        when nil
-          curr_node = new_node token
-        when :wildcard
-          curr_node = new_node token, true
-          raise Error, "Unexpected * after wildcard" if curr_node == :wildcard
-        when Nodes::Text
-          curr_node, finished_node = parse_text curr_node, token
-          nodes << finished_node if finished_node
-        when Nodes::Op
-          nodes << curr_node
-          curr_node = new_node token
+      wildcard = false
+      eof = false
+      # Top-level parsing. It will always be looking for a String, Regex, or Expression.
+      until eof
+        @tokenizer.reset_escapes!
+        t = @tokenizer.next
+        case t.type
+        when Tokens::WILDCARD
+          errors << Error.new(:syntax, t, "Consecutive wildcards") if wildcard
+          wildcard = true
+        when Tokens::TEXT
+          reg = build_regex!(wildcard, t, Regexp.escape(t.val))
+          nodes << Nodes::Text.new(wildcard, t.val, reg)
+          wildcard = false
+        when Tokens::EXP_OPEN
+          nodes << parse_exp!(wildcard)
+          wildcard = false
+        when Tokens::REG_DELIM
+          nodes << parse_regex!(wildcard)
+          wildcard = false
+        when Tokens::EOF
+          eof = true
         else
-          raise Error, "Unexpected node #{curr_node}"
+          errors << Error.new(:syntax, t, "Unexpected #{t.type}")
         end
-      }
-      case curr_node
-      when nil
-        # noop
-      when :wildcard
-        nodes << Nodes::Text.new(true, "")
-      when Nodes::Text, Nodes::Op
-        nodes << curr_node
-      else
-        raise Error, "Unexpected end node #{curr_node}"
       end
-      nodes
+      nodes << Nodes::Text.new(true, "", TR_REGEX) if wildcard
+      return nodes, @errors
     end
-    private
+    def parse_exp!(wildcard = false)
+      exp = Nodes::Expression.new(wildcard)
+      parse_exp_match! exp
+      parse_exp_operator! exp
+      if exp.operator_token
+        parse_exp_arg! exp
+      end
+      return exp
+    end
-    def self.new_node(token, wildcard = false)
-      case token
-      when Tokenizer::Char
-        Nodes::Text.new(wildcard, token.char.clone)
-      when Tokenizer::Op
-        op = Nodes::Op.new(wildcard)
-        parse_op! op, token
-        op
-      when :wildcard
-        :wildcard
+    def parse_exp_match!(exp)
+      @tokenizer.escape.whitespace = false
+      @tokenizer.escape.operators = false
+      t = @tokenizer.next
+      case t.type
+      when Tokens::TEXT, Tokens::WILDCARD
+        exp.match = t.val
+        if (src = REGEX_MATCHES[exp.match])
+          reg = Regexp.new((exp.wildcard ? REGEX_LAZY_WILDCARD : REGEX_START) + src)
+          exp.regex = Nodes::Regex.new(exp.wildcard, src, reg)
+        else
+          errors << Error.new(:name, t, "Unknown match type '#{exp.match}'") if exp.regex.nil?
+        end
+      when Tokens::REG_DELIM
+        exp.regex = parse_regex!(exp.wildcard)
+        exp.match = exp.regex&.src
+        exp.regex_match = true
+        @tokenizer.reset_escapes!
       else
-        raise Error, "Unexpected #{token}"
+        errors << Error.new(:syntax, t, "Unexpected #{t.type}; expected a string or a regex")
       end
     end
-    # @return current node
-    # @return finished node
-    def self.parse_text(node, token)
-      case token
-      when Tokenizer::Char
-        node.str << token.char
-        return node, nil
-      when Tokenizer::Op
-        op = new_node token
-        return op, node
-      when :wildcard
-        return :wildcard, node
+    def parse_exp_operator!(exp)
+      @tokenizer.escape.whitespace = false
+      @tokenizer.escape.operators = false
+      t = @tokenizer.next
+      case t.type
+      when Tokens::EXP_CLOSE
+        # no op
+      when Tokens::OPERATOR, Tokens::TEXT
+        exp.operator_token = t
       else
-        raise Error, "Unexpected #{token}"
+        errors << Error.new(:syntax, t, "Unexpected #{t.type}; expected an operator")
       end
     end
-    def self.parse_op!(node, token)
-      # parse the matching type
-      node.regex =
-        case token.match
-        when Tokenizer::Char
-          node.match = token.match.char
-          node.regex_match = false
-          case node.match
-          when MATCH_NUM then Regexp.new((node.wildcard ? ".*?" : "^") + "[0-9]+")
-          when MATCH_WORD then Regexp.new((node.wildcard ? ".*?" : "^") + "\\w+")
-          when MATCH_ALPHA then Regexp.new((node.wildcard ? ".*?" : "^") + "[a-zA-Z]+")
-          when MATCH_WILD then /.*/
-          else raise Error, "Unknown match type '#{node.match}'"
-          end
-        when Tokenizer::Regex
-          node.match = "/#{token.match.src}/"
-          node.regex_match = true
-          Regexp.new((node.wildcard ? ".*?" : "^") + token.match.src)
-        when nil
-          raise Error, "Empty operation"
+    def parse_exp_arg!(exp)
+      @tokenizer.escape.whitespace = false
+      @tokenizer.escape.whitespace_sep = false
+      @tokenizer.escape.operators = true
+      @tokenizer.escape.regex = true
+      @tokenizer.escape.regex_capture = false if exp.regex_match
+      arg = Nodes::Arg.new([], false)
+      exp.args = []
+      found_close, eof = false, false
+      until found_close or eof
+        t = @tokenizer.next
+        case t.type
+        when Tokens::TEXT
+          arg.segments << t.val
+        when Tokens::REG_CAPTURE
+          arg.has_captures = true
+          arg.segments << t.val.to_i - 1
+          errors << Error.new(:syntax, t, "Invalid regex capture; must be between 0 and 9 (found #{t.val})") unless t.val =~ DIGIT
+          errors << Error.new(:syntax, t, "Unexpected regex capture; expected str or '}'") if !exp.regex_match
+        when Tokens::WHITESPACE_SEP
+          if arg.segments.any?
+            exp.args << arg
+            arg = Nodes::Arg.new([])
+          end
+        when Tokens::EXP_CLOSE
+          found_close = true
+        when Tokens::EOF
+          eof = true
+          errors << Error.new(:syntax, t, "Unexpected #{t.type}; expected str or '}'")
         else
-          raise Error, "Unexpected #{token.match}"
+          errors << Error.new(:syntax, t, "Unexpected #{t.type}; expected str or '}'")
         end
-      # parse the operator (if any)
-      if token.operator
-        raise Error, "Unexpected #{token.operator} for operator" unless token.operator.is_a? Tokenizer::Char
-        node.operator = token.operator.char
-        node.operator_arg = token.arg if token.arg and token.arg != BLANK
-        node.operator_arg_w_caps = parse_captures! node.operator_arg if node.operator_arg and node.regex_match
-        node.expression =
-          case node.operator
-          when OP_REPLACE
-            EXP_REPLACE
-          when OP_ADD, OP_SUB, OP_MUL, OP_DIV
-            raise Error, "Operator #{node.operator} is only available for numeric matches" unless node.match == MATCH_NUM
-            raise Error, "Operator #{node.operator} expects an argument" if node.operator_arg.nil?
-            EXP_MATH
-          when OP_APPEND
-            raise Error, "Operator #{node.operator} expects an argument" if node.operator_arg.nil?
-            EXP_APPEND
-          when OP_PREPEND
-            raise Error, "Operator #{node.operator} expects an argument" if node.operator_arg.nil?
-            EXP_PREPEND
-          else
-            raise Error, "Unknown operator #{node.operator}"
-          end
       end
-    end
-    def self.parse_captures!(arg)
-      i = 0
-      iend = arg.size - 1
-      escape = false
-      nodes = []
-      until i > iend
-        char = arg[i]
-        i += 1
+      exp.args << arg if arg.segments.any?
-        if escape
-          nodes << char
-          escape = false
-          next
-        end
+      #if exp.arg.size != 1 and !OPS_WITH_OPTIONAL_ARGS.include?(exp.operator)
+      #  errors << Error.new(:arg, op_token, "Operator '#{op_token.val}' requires an argument")
+      #end
+    end
-        case char
-        when Tokenizer::ESCAPE
-          escape = true
-        when VAR
-          num = arg[i].to_s
-          raise Error, "Capture group number must be between 1 and 9; found '#{num}'" unless num =~ CAP_NUM
-          nodes << CaptureGroup.new(num.to_i - 1)
-          i += 1
-        else
-          nodes << char
-        end
+    def parse_regex!(wildcard)
+      @tokenizer.regex_mode!
+      t = @tokenizer.next
+      reg = Nodes::Regex.new(wildcard, t.val)
+      if t.type == Tokens::TEXT
+        reg.regex = build_regex!(wildcard, t)
+      else
+        errors << Error.new(:syntax, t, "Unexpected #{t.type}; expected a string of regex")
       end
-      raise Error, "Trailing escape" if escape
-      nodes
+      t = @tokenizer.next
+      errors << Error.new(:syntax, t, "Unexpected #{t.type}; expected closing '/'") unless t.type == Tokens::REG_DELIM
+      reg
+    end
+    def build_regex!(wildcard, token, src = token.val)
+      Regexp.new((wildcard ? REGEX_LAZY_WILDCARD : REGEX_START) + src)
+    rescue RegexpError => e
+      errors << Error.new(:regex, token, e.message)
+      nil
     end
   end
 end

data/lib/fop/program.rb CHANGED

@@ -1,22 +1,16 @@
-require_relative 'tokenizer'
-require_relative 'parser'
 module Fop
   class Program
-    attr_reader :nodes
-    def initialize(src)
-      tokens = Tokenizer.new(src).tokenize!
-      @nodes = Parser.parse! tokens
+    def initialize(instructions)
+      @instructions = instructions
     end
     def apply(input)
       input = input.clone
       output =
-        @nodes.reduce("") { |acc, token|
-          section = token.consume!(input)
-          return nil if section.nil?
-          acc + section.to_s
+        @instructions.reduce("") { |acc, ins|
+          result = ins.call(input)
+          return nil if result.nil?
+          acc + result.to_s
         }
       input.empty? ? output : nil
     end

data/lib/fop/tokenizer.rb CHANGED

@@ -1,144 +1,194 @@
+require_relative 'tokens'
 module Fop
   class Tokenizer
-    Char = Struct.new(:char)
-    Op = Struct.new(:match, :operator, :arg)
-    Regex = Struct.new(:src)
-    Error = Class.new(StandardError)
+    Token = Struct.new(:pos, :type, :val)
+    Escapes = Struct.new(:whitespace, :whitespace_sep, :operators, :regex_capture, :regex, :regex_escape, :wildcards, :exp)
-    OP_OPEN = "{".freeze
-    OP_CLOSE = "}".freeze
+    EXP_OPEN = "{".freeze
+    EXP_CLOSE = "}".freeze
     ESCAPE = "\\".freeze
     WILDCARD = "*".freeze
-    REGEX_MARKER = "/".freeze
+    REGEX_DELIM = "/".freeze
+    REGEX_CAPTURE = "$".freeze
+    OP_REPLACE = "=".freeze
+    OP_APPEND = ">".freeze
+    OP_PREPEND = "<".freeze
+    OP_ADD = "+".freeze
+    OP_SUB = "-".freeze
+    WHITESPACE = " ".freeze
+    #
+    # Controls which "mode" the tokenizer is currently in. This is a necessary result of the syntax lacking
+    # explicit string delimiters. That *could* be worked around by requiring users to escape all reserved chars,
+    # but that's ugly af. Instead, the parser continually assesses the current context and flips these flags on
+    # or off to auto-escape certain chars for the next token.
+    #
+    attr_reader :escape
     def initialize(src)
       @src = src
       @end = src.size - 1
+      @start_i = 0
+      @i = 0
+      reset_escapes!
     end
-    def tokenize!
-      tokens = []
-      escape = false
-      i = 0
-      until i > @end do
-        char = @src[i]
-        i += 1
-        if escape
-          tokens << Char.new(char)
-          escape = false
-          next
-        end
-        case char
-        when ESCAPE
-          escape = true
-        when OP_OPEN
-          i, op = operation! i
-          tokens << op
-        when OP_CLOSE
-          raise "Unexpected #{OP_CLOSE}"
-        when WILDCARD
-          tokens << :wildcard
-        else
-          tokens << Char.new(char)
-        end
-      end
-      raise Error, "Trailing escape" if escape
-      tokens
+    # Auto-escape operators and regex capture vars. Appropriate for top-level syntax.
+    def reset_escapes!
+      @escape = Escapes.new(true, true, true, true)
     end
-    private
-    def operation!(i)
-      found_close = false
-      op = Op.new(nil, nil, "")
+    # Auto-escape anything you'd find in a regular expression
+    def regex_mode!
+      @escape.whitespace = true
+      @escape.regex = false # look for the final /
+      @escape.regex_escape = true # pass \ through to the regex engine UNLESS it's followed by a /
+      @escape.wildcards = true
+      @escape.operators = true
+      @escape.regex_capture = true
+      @escape.exp = true
+    end
-      # Find matcher
-      until found_close or op.match or i > @end do
-        char = @src[i]
-        i += 1
-        case char
-        when OP_CLOSE
-          found_close = true
-        when REGEX_MARKER
-          i, reg = regex! i
-          op.match = reg
+    def next
+      return Token.new(@i, Tokens::EOF) if @i > @end
+      char = @src[@i]
+      case char
+      when EXP_OPEN
+        @i += 1
+        token! Tokens::EXP_OPEN
+      when EXP_CLOSE
+        @i += 1
+        token! Tokens::EXP_CLOSE
+      when WILDCARD
+        @i += 1
+        token! Tokens::WILDCARD, WILDCARD
+      when REGEX_DELIM
+        if @escape.regex
+          get_str!
         else
-          op.match = Char.new(char)
+          @i += 1
+          token! Tokens::REG_DELIM
         end
-      end
-      # Find operator
-      until found_close or op.operator or i > @end do
-        char = @src[i]
-        i += 1
-        case char
-        when OP_CLOSE
-          found_close = true
+      when REGEX_CAPTURE
+        if @escape.regex_capture
+          get_str!
         else
-          op.operator = Char.new(char)
+          @i += 1
+          t = token! Tokens::REG_CAPTURE, @src[@i]
+          @i += 1
+          @start_i = @i
+          t
         end
-      end
-      # Find operator arg
-      escape = false
-      until found_close or i > @end do
-        char = @src[i]
-        i += 1
-        if escape
-          op.arg << char
-          escape = false
-          next
+      when OP_REPLACE, OP_APPEND, OP_PREPEND, OP_ADD, OP_SUB
+        if @escape.operators
+          get_str!
+        else
+          @i += 1
+          token! Tokens::OPERATOR, char
         end
-        case char
-        when ESCAPE
-          escape = true
-        when OP_OPEN
-          raise "Unexpected #{OP_OPEN}"
-        when OP_CLOSE
-          found_close = true
+      when WHITESPACE
+        if @escape.whitespace
+          get_str!
+        elsif !@escape.whitespace_sep
+          @i += 1
+          token! Tokens::WHITESPACE_SEP
         else
-          op.arg << char
+          @i += 1
+          @start_i = @i
+          self.next
         end
+      else
+        get_str!
       end
-      raise Error, "Unclosed operation" if !found_close
-      raise Error, "Trailing escape" if escape
-      return i, op
     end
-    def regex!(i)
-      escape = false
-      found_close = false
-      src = ""
+    private
+    def token!(type, val = nil)
+      t = Token.new(@start_i, type, val)
+      @start_i = @i
+      t
+    end
-      until found_close or i > @end
-        char = @src[i]
-        i += 1
+    def get_str!
+      str = ""
+      escape, found_end = false, false
+      until found_end or @i > @end
+        char = @src[@i]
         if escape
-          src << char
+          @i += 1
+          str << char
           escape = false
           next
         end
         case char
         when ESCAPE
-          escape = true
-        when REGEX_MARKER
-          found_close = true
+          @i += 1
+          if @escape.regex_escape and @src[@i] != REGEX_DELIM
+            str << char
+          else
+            escape = true
+          end
+        when EXP_OPEN
+          if @escape.exp
+            @i += 1
+            str << char
+          else
+            found_end = true
+          end
+        when EXP_CLOSE
+          if @escape.exp
+            @i += 1
+            str << char
+          else
+            found_end = true
+          end
+        when WILDCARD
+          if @escape.wildcards
+            @i += 1
+            str << char
+          else
+            found_end = true
+          end
+        when REGEX_DELIM
+          if @escape.regex
+            @i += 1
+            str << char
+          else
+            found_end = true
+          end
+        when REGEX_CAPTURE
+          if @escape.regex_capture
+            @i += 1
+            str << char
+          else
+            found_end = true
+          end
+        when OP_REPLACE, OP_APPEND, OP_PREPEND, OP_ADD, OP_SUB
+          if @escape.operators
+            @i += 1
+            str << char
+          else
+            found_end = true
+          end
+        when WHITESPACE
+          if @escape.whitespace
+            @i += 1
+            str << char
+          else
+            found_end = true
+          end
         else
-          src << char
+          @i += 1
+          str << char
         end
       end
-      raise Error, "Unclosed regex" if !found_close
-      raise Error, "Trailing escape" if escape
-      return i, Regex.new(src)
+      return Token.new(@i - 1, Tokens::TR_ESC) if escape
+      token! Tokens::TEXT, str
     end
   end
 end

data/lib/fop/tokens.rb ADDED

@@ -0,0 +1,14 @@
+module Fop
+  module Tokens
+    TEXT = :TXT
+    EXP_OPEN = :"{"
+    EXP_CLOSE = :"}"
+    REG_CAPTURE = :"$"
+    REG_DELIM = :/
+    WILDCARD = :*
+    OPERATOR = :op
+    TR_ESC = :"trailing escape"
+    WHITESPACE_SEP = :s
+    EOF = :EOF
+  end
+end

data/lib/fop/version.rb CHANGED

@@ -1,3 +1,3 @@
 module Fop
-  VERSION = "0.4.0"
+  VERSION = "0.8.0"
 end

data/lib/fop_lang.rb CHANGED

@@ -1,12 +1,22 @@
 require_relative 'fop/version'
+require_relative 'fop/compiler'
 require_relative 'fop/program'
 def Fop(src)
-  ::Fop::Program.new(src)
+  ::Fop.compile!(src)
 end
 module Fop
+  def self.compile!(src)
+    prog, errors = compile(src)
+    # TODO better exception
+    raise "Fop errors: " + errors.map(&:message).join(",") if errors
+    prog
+  end
   def self.compile(src)
-    Program.new(src)
+    instructions, errors = ::Fop::Compiler.compile(src)
+    return nil, errors if errors
+    return Program.new(instructions), nil
   end
 end
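
Reviewer sketch (not part of the gem): this release splits the entry point into a non-raising `compile`, which returns a `[program, errors]` pair, and a raising `compile!`. The toy below shows only that calling convention. `toy_compile` and `toy_compile!` are hypothetical, and the "compiler" merely checks for balanced braces, which is an assumption made to keep the example self-contained.

```ruby
# Non-raising variant: returns [program, nil] or [nil, errors].
# The brace check is a placeholder for real parsing.
def toy_compile(src)
  if src.count("{") != src.count("}")
    return [nil, ["Syntax error: unbalanced braces"]]
  end
  [->(input) { input }, nil]
end

# Raising variant built on top of the non-raising one.
def toy_compile!(src)
  prog, errors = toy_compile(src)
  raise "Fop errors: " + errors.join(",") if errors
  prog
end

prog, errors = toy_compile("foo {N+1}")
p errors                      # nil
_, errors = toy_compile("foo {N")
p errors                      # ["Syntax error: unbalanced braces"]
```

The pair-returning form suits tools like the `fop` command, which wants to print every error and exit, while the raising form keeps `Fop('...')` convenient in scripts.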

metadata CHANGED

@@ -1,26 +1,31 @@
 --- !ruby/object:Gem::Specification
 name: fop_lang
 version: !ruby/object:Gem::Version
-  version: 0.4.0
+  version: 0.8.0
 platform: ruby
 authors:
 - Jordan Hollinger
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2021-08-16 00:00:00.000000000 Z
+date: 2021-09-01 00:00:00.000000000 Z
 dependencies: []
 description: A micro expression language for Filter and OPerations on text
 email: jordan.hollinger@gmail.com
-executables: []
+executables:
+- fop
 extensions: []
 extra_rdoc_files: []
 files:
 - README.md
+- bin/fop
+- lib/fop/cli.rb
+- lib/fop/compiler.rb
 - lib/fop/nodes.rb
 - lib/fop/parser.rb
 - lib/fop/program.rb
 - lib/fop/tokenizer.rb
+- lib/fop/tokens.rb
 - lib/fop/version.rb
 - lib/fop_lang.rb
 homepage: https://jhollinger.github.io/fop-lang-rb/