fop_lang 0.3.0 → 0.7.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: b5f19a543b81c0046dc63fcc1c0769989628017d2c1d1da74ef0db9866a0f2f7
-   data.tar.gz: 03b6597f9cab97c95ccda8396693bb43d9da729137cb916cc74f7fbecc314b32
+   metadata.gz: 798fd7c335f394e878fba2f70a9f60372ea356c79f2dc63392398920d0ffce38
+   data.tar.gz: 654786ff77823e8d8dd9a348f958828346e3755e43a04a0f38e711a6c5571ea9
  SHA512:
-   metadata.gz: 3a17c82a561e20cbc5cb8abbad5be4f94f02110d60b6130e3e1e9489672c5c134befc6b1daca2f590f083a67934e600fb5d6fa0ea5433181ba3014514c558232
-   data.tar.gz: 790250c8a79dcf04b381f2dd33cbaa048fd070688ab45446ff87652dcb18844c2d6139d0ead060fa338a57b8590eee0167ea2c25abd84e1d71571f33c49bcbda
+   metadata.gz: 6761f3d7dd602d1c93a2387fc73ea14c11484e88d0d319bbf87df98925977aa15de59a63f23aafffafa384ce3b9def9f81edabae669aabc2012b00d3131e46f4
+   data.tar.gz: 7f5187cd510d691dda996284d5a400804b7573f67506701e39a6d2909c8a4026b58655f6b2800708e911377ccce790885a2238eed7a75d4873e4b599d23e67df
data/README.md CHANGED
@@ -1,55 +1,99 @@
  # fop_lang

- Fop (Filter and OPerations language) is an experimental, tiny expression language in the vein of awk and sed. This is a Ruby implementation. It is useful for simultaneously matching and transforming text input.
+ Fop (Filter and OPerations language) is a tiny, experimental language for filtering and transforming text. Think of it like awk but with the condition and action segments combined.

- ```ruby
- gem 'fop_lang'
- ```
+ This is a Ruby implementation with both a library interface and a bin command.

- ## Release Number Example
+ ## Installation

- This example takes in GitHub branch names, decides if they're release branches, and if so, increments the version number.
+ ```bash
+ $ gem install fop_lang
+ ```
+
+ You may use fop in a Ruby script:

  ```ruby
- f = Fop('release-{N}.{N+1}.{N=0}')
+ require 'fop_lang'

- puts f.apply('release-5.99.1')
- => 'release-5.100.0'
+ f = Fop('foo {N+1}')

- puts f.apply('release-5')
- => nil
- # doesn't match the pattern
+ f.apply('foo 1')
+ => "foo 2"
+
+ f.apply('bar 1')
+ => nil
+ ```
+
+ or run `fop` from the command line:
+
+ ```bash
+ $ echo 'foo 1' | fop 'foo {N+1}'
+ foo 2
+ $ echo 'bar 1' | fop 'foo {N+1}'
  ```

- ## Anatomy of a Fop expression
+ ## Syntax
+
+ `Text /(R|r)egex/ {N+1}`
+
+ The above program demonstrates a text match, a regex match, and a match expression. If the input matches all three segments, output is given. If the input was `Text regex 5`, the output would be `Text regex 6`.
+
+ ### Text match
+
+ The input must match this text exactly. Whitespace is part of the match. Wildcards (`*`) are allowed. Special characters (`*/{}\`) may be escaped with `\`.

- `Text Literal {Operation}`
+ The output of a text match will be the matching input.

- The above expression contains the only two parts of Fop (except for the wildcard and escape characters).
+ ### Regex match

- **Text Literals**
+ Regular expressions may be placed between `/`s. If the regular expression contains a `/`, you may escape it with `\`. Special regex characters like `[]()+.*` may also be escaped with `\`.

- A text literal works how it sounds: the input must match it exactly. The only exception is the `*` (wildcard) character, which matches 0 or more of anything. Wildcards can be used anywhere except inside `{...}` (operations).
+ The output of a regex match will be the matching input.

- If `\` (escape) is used before the special characters `*`, `{` or `}`, then that character is treated like a text literal. It's recommended to use single-quoted Ruby strings with Fop expressions that so you don't need to double-escape.
+ ### Match expression

- **Operations**
+ A match expression both matches on input and modifies that input. An expression is made up of 1 to 3 parts:

- Operations are the interesting part of Fop, and are specified between `{` and `}`. An Operation can consist of one to three parts:
+ 1. The match, e.g. `N` for numeric.
+ 2. The operator, e.g. `+` for addition (optional).
+ 3. The argument, e.g. `1` for "add one" (required for most operators).

- 1. Matching class (required): Defines what characters the operation will match and operate on.
-    * `N` is the numeric class and will match one or more digits.
-    * `A` is the alpha class and will match one or more letters (lower or upper case).
-    * `W` is the word class and matches alphanumeric chars and underscores.
-    * `*` is the wildcard class and greedily matches everything after it.
-    * `/.../` matches on the supplied regex between the `/`'s. If you're regex contains a `/`, it must be escaped.
- 3. Operator (optional): What to do to the matching characters.
-    * `=` Replace the matching character(s) with the given argument. If no argument is given, drop the matching chars. Note that any `/` chars must be escaped, so as not to be mistaken for a regex.
-    * `+` Perform addition on the matching number and the argument (`N` only).
-    * `-` Subtract the argument from the matching number (`N` only).
- 5. Operator argument (required for some operators): meaning varies by operator.
+ The output of a match expression will be the _modified_ matching input. If no operator is given, the output will be the matching input.
+
+ **Matches**
+
+ * `N` matches one or more consecutive digits.
+ * `A` matches one or more letters (lower or upper case).
+ * `W` matches alphanumeric chars and underscores.
+ * `*` greedily matches everything after it.
+ * `/regex/` matches on the supplied regex. Capture groups may be referenced in the argument as `$1`, `$2`, etc.
+
+ **Operators**
+
+ * `=` Replace the matching character(s) with the given argument. If no argument is given, drop the matching chars.
+ * `>` Append the argument to the matching value.
+ * `<` Prepend the argument to the matching value.
+ * `+` Perform addition on the matching number and the argument (`N` only).
+ * `-` Subtract the argument from the matching number (`N` only).
+
+ ## Examples
+
+ ### Release Number Example
+
+ This example takes in GitHub branch names, decides if they're release branches, and if so, increments the version number.
+
+ ```ruby
+ f = Fop('release-{N}.{N+1}.{N=0}')
+
+ puts f.apply('release-5.99.1')
+ => 'release-5.100.0'
+
+ puts f.apply('release-5')
+ => nil
+ # doesn't match the pattern
+ ```

- ## More Examples
+ ### More Examples

  ```ruby
  f = Fop('release-{N=5}.{N+1}.{N=0}')
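Editor's note: a quick illustration of the `>`/`<` operators and the `$1`-style capture references that are new in this release. This is a sketch based on the README text above; the expressions are mine and are not part of the diff.

```ruby
require 'fop_lang'

# ">" appends the argument to the matched value, "<" prepends it
Fop('v{N>-beta}').apply('v12')              # => "v12-beta"
Fop('name: {W<Mr. }').apply('name: Smith')  # => "name: Mr. Smith"

# Regex capture groups can be referenced in the argument
Fop('{/(\d+)x(\d+)/=$2x$1}').apply('3x4')   # => "4x3"
```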
data/bin/fop ADDED
@@ -0,0 +1,42 @@
+ #!/usr/bin/env ruby
+
+ # Used for local testing
+ # $LOAD_PATH.unshift File.join(File.dirname(__FILE__), '..', 'lib')
+
+ require 'fop_lang'
+ require 'fop/cli'
+
+ opts = Fop::CLI.options!
+
+ if opts.version
+   puts Fop::VERSION
+   exit 0
+ end
+
+ src = opts.src.read.chomp
+ if src.empty?
+   $stderr.puts "No expression given"
+   exit 1
+ end
+
+ fop, errors = Fop.compile(src)
+ opts.src.close
+ NL = "\n".freeze
+
+ if errors
+   $stderr.puts src
+   $stderr.puts errors.join(NL)
+   exit 1
+ end
+
+ if opts.check
+   $stdout.puts "Syntax OK" unless opts.quiet
+   exit 0
+ end
+
+ while (line = gets) do
+   line.chomp!
+   if (res = fop.apply(line))
+     print(res << NL)
+   end
+ end
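Editor's note: a rough sketch of how the new executable behaves, pieced together from the README and the bin script above. The output wording comes from the code; the `prog.fop` file name is made up.

```bash
# Syntax-check a program without running it ("Syntax OK" is printed by the check branch above)
$ fop --check 'release-{N}.{N+1}.{N=0}'
Syntax OK

# Read the program from a file (-f) instead of the first argument, then filter stdin
$ echo 'release-5.99.1' | fop -f prog.fop
release-5.100.0
```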
data/lib/fop/cli.rb ADDED
@@ -0,0 +1,34 @@
+ require 'optparse'
+
+ module Fop
+   module CLI
+     Options = Struct.new(:src, :check, :quiet, :version)
+
+     def self.options!
+       options = Options.new
+       OptionParser.new do |opts|
+         opts.banner = "Usage: fop [options] [ 'prog' | -f progfile ] [ file ... ]"
+
+         opts.on("-fFILE", "--file=FILE", "Read program from file instead of first argument") do |f|
+           options.src = File.open(f)
+           options.src.advise(:sequential)
+         end
+
+         opts.on("-c", "--check", "Perform a syntax check on the program and exit") do
+           options.check = true
+         end
+
+         opts.on("-q", "--quiet", "Only print errors and output") do
+           options.quiet = true
+         end
+
+         opts.on("--version", "Print version and exit") do
+           options.version = true
+         end
+       end.parse!
+
+       options.src ||= StringIO.new(ARGV.shift || "")
+       options
+     end
+   end
+ end
data/lib/fop/compiler.rb ADDED
@@ -0,0 +1,42 @@
+ require_relative 'parser'
+
+ module Fop
+   module Compiler
+     def self.compile(src)
+       parser = Parser.new(src)
+       nodes, errors = parser.parse
+
+       instructions = nodes.map { |node|
+         case node
+         when Nodes::Text, Nodes::Regex
+           Instructions.regex_match(node.regex)
+         when Nodes::Expression
+           Instructions::ExpressionMatch.new(node)
+         else
+           raise "Unknown node type #{node}"
+         end
+       }
+
+       return nil, errors if errors.any?
+       return instructions, nil
+     end
+
+     module Instructions
+       BLANK = "".freeze
+       OPERATIONS = {
+         "=" => ->(_val, arg) { arg || BLANK },
+         "+" => ->(val, arg) { val.to_i + arg.to_i },
+         "-" => ->(val, arg) { val.to_i - arg.to_i },
+         ">" => ->(val, arg) { val + arg },
+         "<" => ->(val, arg) { arg + val },
+       }
+
+       def self.regex_match(regex)
+         ->(input) { input.slice! regex }
+       end
+
+       class ExpressionMatch
+         def initialize(node)
+           @regex = node.regex&.regex
+           @op = node.operator ? OPERATIONS.fetch(node.operator) : nil
+           @regex_match = node.regex_match
+           if node.arg&.any? { |a| a.is_a? Integer }
+             @arg, @arg_with_caps = nil, node.arg
+           else
+             @arg = node.arg&.join("")
+             @arg_with_caps = nil
+           end
+         end
+
+         def call(input)
+           if (match = @regex.match(input))
+             val = match.to_s
+             blank = val == BLANK
+             input.sub!(val, BLANK) unless blank
+             found_val = @regex_match || !blank
+             arg = @arg_with_caps ? sub_caps(@arg_with_caps, match.captures) : @arg
+             @op && found_val ? @op.call(val, arg) : val
+           end
+         end
+
+         private
+
+         def sub_caps(args, caps)
+           args.map { |a|
+             a.is_a?(Integer) ? caps[a].to_s : a
+           }.join("")
+         end
+       end
+     end
+   end
+ end
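Editor's note: a sketch of what the new compiler returns, inferred from the code above (the expression is mine).

```ruby
# Each compiled instruction is a callable: a plain regex-match lambda for text/regex
# nodes, or an ExpressionMatch for {...} expressions.
instructions, errors = Fop::Compiler.compile('foo {N+1}')
errors             # => nil
instructions.size  # => 2 (one match for "foo ", one ExpressionMatch for {N+1})

# The OPERATIONS table drives the operators, e.g. "+" coerces both sides to integers:
Fop::Compiler::Instructions::OPERATIONS["+"].call("5", "1")  # => 6
```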
data/lib/fop/nodes.rb CHANGED
@@ -1,29 +1,29 @@
  module Fop
    module Nodes
-     Text = Struct.new(:wildcard, :str) do
-       def consume!(input)
-         @regex ||= Regexp.new((wildcard ? ".*" : "^") + Regexp.escape(str))
-         input.slice!(@regex)
-       end
-
+     Text = Struct.new(:wildcard, :str, :regex) do
        def to_s
          w = wildcard ? "*" : nil
-         "Text #{w}#{str}"
+         "[#{w}txt] #{str}"
        end
      end

-     Op = Struct.new(:wildcard, :match, :regex_match, :regex, :operator, :operator_arg, :expression) do
-       def consume!(input)
-         if (val = input.slice!(regex))
-           found_val = regex_match || val != Parser::BLANK
-           expression && found_val ? expression.call(val) : val
-         end
+     Regex = Struct.new(:wildcard, :src, :regex) do
+       def to_s
+         w = wildcard ? "*" : nil
+         "[#{w}reg] #{src}"
        end
+     end

+     Expression = Struct.new(:wildcard, :match, :regex_match, :regex, :operator, :arg) do
        def to_s
          w = wildcard ? "*" : nil
-         s = "#{w}#{match}"
-         s << " #{operator} #{operator_arg}" if operator
+         s = "[#{w}exp] #{match}"
+         if operator
+           arg_str = arg
+             .map { |a| a.is_a?(Integer) ? "$#{a+1}" : a.to_s }
+             .join("")
+           s << " #{operator} #{arg_str}"
+         end
          s
        end
      end
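Editor's note: the structs above are now plain data carriers (the matching logic moved into the compiler), and their `to_s` methods produce a compact debug form. A hand-constructed illustration, with values of my own choosing:

```ruby
Fop::Nodes::Text.new(false, "release-", nil).to_s
# => "[txt] release-"
Fop::Nodes::Expression.new(true, "N", nil, nil, "+", ["1"]).to_s
# => "[*exp] N + 1"
```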
data/lib/fop/parser.rb CHANGED
@@ -1,136 +1,162 @@
+ require_relative 'tokenizer'
  require_relative 'nodes'

  module Fop
-   module Parser
-     Error = Class.new(StandardError)
+   class Parser
+     DIGIT = /^[0-9]$/
+     REGEX_START = "^".freeze
+     REGEX_LAZY_WILDCARD = ".*?".freeze
+     REGEX_MATCHES = {
+       "N" => "[0-9]+".freeze,
+       "W" => "\\w+".freeze,
+       "A" => "[a-zA-Z]+".freeze,
+       "*" => ".*".freeze,
+     }.freeze
+     OPS_WITH_OPTIONAL_ARGS = [Tokenizer::OP_REPLACE]
+     TR_REGEX = /.*/

-     MATCH_NUM = "N".freeze
-     MATCH_WORD = "W".freeze
-     MATCH_ALPHA = "A".freeze
-     MATCH_WILD = "*".freeze
-     BLANK = "".freeze
-     OP_REPLACE = "=".freeze
-     OP_ADD = "+".freeze
-     OP_SUB = "-".freeze
-     OP_MUL = "*".freeze
-     OP_DIV = "/".freeze
+     Error = Struct.new(:type, :token, :message) do
+       def to_s
+         "#{type.to_s.capitalize} error: #{message} at column #{token.pos}"
+       end
+     end

-     def self.parse!(tokens)
-       nodes = []
-       curr_node = nil
+     attr_reader :errors

-       tokens.each { |token|
-         case curr_node
-         when nil
-           curr_node = new_node token
-         when :wildcard
-           curr_node = new_node token, true
-           raise Error, "Unexpected * after wildcard" if curr_node == :wildcard
-         when Nodes::Text
-           curr_node, finished_node = parse_text curr_node, token
-           nodes << finished_node if finished_node
-         when Nodes::Op
-           nodes << curr_node
-           curr_node = new_node token
+     def initialize(src, debug: false)
+       @tokenizer = Tokenizer.new(src)
+       @errors = []
+     end
+
+     def parse
+       nodes = []
+       wildcard = false
+       eof = false
+       # Top-level parsing. It will always be looking for a String, Regex, or Expression.
+       until eof
+         @tokenizer.reset_escapes!
+         t = @tokenizer.next
+         case t.type
+         when Tokens::WILDCARD
+           errors << Error.new(:syntax, t, "Consecutive wildcards") if wildcard
+           wildcard = true
+         when Tokens::TEXT
+           reg = build_regex!(wildcard, t, Regexp.escape(t.val))
+           nodes << Nodes::Text.new(wildcard, t.val, reg)
+           wildcard = false
+         when Tokens::EXP_OPEN
+           nodes << parse_exp!(wildcard)
+           wildcard = false
+         when Tokens::REG_DELIM
+           nodes << parse_regex!(wildcard)
+           wildcard = false
+         when Tokens::EOF
+           eof = true
          else
-           raise Error, "Unexpected node #{curr_node}"
+           errors << Error.new(:syntax, t, "Unexpected #{t.type}")
          end
-       }
-
-       case curr_node
-       when nil
-         # noop
-       when :wildcard
-         nodes << Nodes::Text.new(true, "")
-       when Nodes::Text, Nodes::Op
-         nodes << curr_node
-       else
-         raise "Unexpected end node #{curr_node}"
        end
-
-       nodes
+       nodes << Nodes::Text.new(true, "", TR_REGEX) if wildcard
+       return nodes, @errors
      end

-     private
+     def parse_exp!(wildcard = false)
+       exp = Nodes::Expression.new(wildcard)
+       parse_exp_match! exp
+       op_token = parse_exp_operator! exp
+       if exp.operator
+         parse_exp_arg! exp, op_token
+       end
+       return exp
+     end

-     def self.new_node(token, wildcard = false)
-       case token
-       when Tokenizer::Char
-         Nodes::Text.new(wildcard, token.char.clone)
-       when Tokenizer::Op
-         op = Nodes::Op.new(wildcard)
-         parse_op! op, token.tokens
-         op
-       when :wildcard
-         :wildcard
+     def parse_exp_match!(exp)
+       @tokenizer.escape.operators = false
+       t = @tokenizer.next
+       case t.type
+       when Tokens::TEXT, Tokens::WILDCARD
+         exp.match = t.val
+         if (src = REGEX_MATCHES[exp.match])
+           reg = Regexp.new((exp.wildcard ? REGEX_LAZY_WILDCARD : REGEX_START) + src)
+           exp.regex = Nodes::Regex.new(exp.wildcard, src, reg)
+         else
+           errors << Error.new(:name, t, "Unknown match type '#{exp.match}'") if exp.regex.nil?
+         end
+       when Tokens::REG_DELIM
+         exp.regex = parse_regex!(exp.wildcard)
+         exp.match = exp.regex&.src
+         exp.regex_match = true
+         @tokenizer.reset_escapes!
        else
-         raise Error, "Unexpected #{token}"
+         errors << Error.new(:syntax, t, "Unexpected #{t.type}; expected a string or a regex")
        end
      end

-     # @return current node
-     # @return finished node
-     def self.parse_text(node, token)
-       case token
-       when Tokenizer::Char
-         node.str << token.char
-         return node, nil
-       when Tokenizer::Op
-         op = new_node token
-         return op, node
-       when :wildcard
-         return :wildcard, node
+     def parse_exp_operator!(exp)
+       @tokenizer.escape.operators = false
+       t = @tokenizer.next
+       case t.type
+       when Tokens::EXP_CLOSE
+         # no op
+       when Tokens::OPERATOR
+         exp.operator = t.val
        else
-         raise Error, "Unexpected #{token}"
+         errors << Error.new(:syntax, t, "Unexpected #{t.type}; expected an operator")
        end
+       t
      end

-     def self.parse_op!(node, tokens)
-       t = tokens[0] || raise(Error, "Empty operation")
-       # parse the matching type
-       node.regex =
-         case t
-         when Tokenizer::Char
-           node.match = t.char
-           node.regex_match = false
-           case t.char
-           when MATCH_NUM then Regexp.new((node.wildcard ? ".*?" : "^") + "[0-9]+")
-           when MATCH_WORD then Regexp.new((node.wildcard ? ".*?" : "^") + "\\w+")
-           when MATCH_ALPHA then Regexp.new((node.wildcard ? ".*?" : "^") + "[a-zA-Z]+")
-           when MATCH_WILD then /.*/
-           else raise Error, "Unknown match type '#{t.char}'"
-           end
-         when Tokenizer::Regex
-           node.match = "/#{t.src}/"
-           node.regex_match = true
-           Regexp.new((node.wildcard ? ".*?" : "^") + t.src)
+     def parse_exp_arg!(exp, op_token)
+       @tokenizer.escape.operators = true
+       @tokenizer.escape.regex = true
+       @tokenizer.escape.regex_capture = false if exp.regex_match
+
+       exp.arg = []
+       found_close, eof = false, false
+       until found_close or eof
+         t = @tokenizer.next
+         case t.type
+         when Tokens::TEXT
+           exp.arg << t.val
+         when Tokens::REG_CAPTURE
+           exp.arg << t.val.to_i - 1
+           errors << Error.new(:syntax, t, "Invalid regex capture; must be between 0 and 9 (found #{t.val})") unless t.val =~ DIGIT
+           errors << Error.new(:syntax, t, "Unexpected regex capture; expected str or '}'") if !exp.regex_match
+         when Tokens::EXP_CLOSE
+           found_close = true
+         when Tokens::EOF
+           eof = true
+           errors << Error.new(:syntax, t, "Unexpected #{t.type}; expected str or '}'")
          else
-           raise Error, "Unexpected token #{t}"
+           errors << Error.new(:syntax, t, "Unexpected #{t.type}; expected str or '}'")
          end
+       end

-       # parse the operator (if any)
-       if (op = tokens[1])
-         raise Error, "Unexpected #{op}" unless op.is_a? Tokenizer::Char
-         node.operator = op.char
-
-         arg = tokens[2..-1].reduce("") { |acc, t|
-           raise Error, "Unexpected #{t}" unless t.is_a? Tokenizer::Char
-           acc + t.char
-         }
-         node.operator_arg = arg == BLANK ? nil : arg
+       if exp.arg.size != 1 and !OPS_WITH_OPTIONAL_ARGS.include?(exp.operator)
+         errors << Error.new(:arg, op_token, "Operator '#{op_token.val}' requires an argument")
+       end
+     end

-         node.expression =
-           case node.operator
-           when OP_REPLACE
-             ->(_) { node.operator_arg || BLANK }
-           when OP_ADD, OP_SUB, OP_MUL, OP_DIV
-             raise Error, "Operator #{node.operator} is only available for numeric matches" unless node.match == MATCH_NUM
-             raise Error, "Operator #{node.operator} expects an argument" if node.operator_arg.nil?
-             ->(x) { x.to_i.send(node.operator, node.operator_arg.to_i) }
-           else
-             raise(Error, "Unknown operator #{node.operator}")
-           end
+     def parse_regex!(wildcard)
+       @tokenizer.regex_mode!
+       t = @tokenizer.next
+       reg = Nodes::Regex.new(wildcard, t.val)
+       if t.type == Tokens::TEXT
+         reg.regex = build_regex!(wildcard, t)
+       else
+         errors << Error.new(:syntax, t, "Unexpected #{t.type}; expected a string of regex")
        end
+
+       t = @tokenizer.next
+       errors << Error.new(:syntax, t, "Unexpected #{t.type}; expected a string of regex") unless t.type == Tokens::REG_DELIM
+       reg
+     end
+
+     def build_regex!(wildcard, token, src = token.val)
+       Regexp.new((wildcard ? REGEX_LAZY_WILDCARD : REGEX_START) + src)
+     rescue RegexpError => e
+       errors << Error.new(:regex, token, e.message)
+       nil
      end
    end
  end
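Editor's note: a practical consequence of the rewrite is that the parser now accumulates `Error` structs instead of raising on the first problem. A rough sketch; the message text and column follow from the code above but have not been run:

```ruby
parser = Fop::Parser.new('foo {Q}')
nodes, errors = parser.parse
errors.map(&:to_s)
# => ["Name error: Unknown match type 'Q' at column 5"]
```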
data/lib/fop/program.rb CHANGED
@@ -1,22 +1,16 @@
- require_relative 'tokenizer'
- require_relative 'parser'
-
  module Fop
    class Program
-     attr_reader :nodes
-
-     def initialize(src)
-       tokens = Tokenizer.new(src).tokenize!
-       @nodes = Parser.parse! tokens
+     def initialize(instructions)
+       @instructions = instructions
      end

      def apply(input)
        input = input.clone
        output =
-         @nodes.reduce("") { |acc, token|
-           section = token.consume!(input)
-           return nil if section.nil?
-           acc + section.to_s
+         @instructions.reduce("") { |acc, ins|
+           result = ins.call(input)
+           return nil if result.nil?
+           acc + result.to_s
          }
        input.empty? ? output : nil
      end
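Editor's note: `Program#apply` still consumes the input piece by piece and only yields output when everything is consumed. A sketch of that behavior (the expression and inputs are mine):

```ruby
prog, _errors = Fop.compile('foo {N}')
prog.apply('foo 1')        # => "foo 1"
prog.apply('foo 1 extra')  # => nil, the trailing " extra" is never consumed
prog.apply('bar 1')        # => nil, "foo " doesn't match
```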
data/lib/fop/tokenizer.rb CHANGED
@@ -1,123 +1,175 @@
+ require_relative 'tokens'
+
  module Fop
    class Tokenizer
-     Char = Struct.new(:char)
-     Op = Struct.new(:tokens)
-     Regex = Struct.new(:src)
-     Error = Class.new(StandardError)
+     Token = Struct.new(:pos, :type, :val)
+     Error = Struct.new(:pos, :message)
+     Escapes = Struct.new(:operators, :regex_capture, :regex, :regex_escape, :wildcards, :exp)

-     OP_OPEN = "{".freeze
-     OP_CLOSE = "}".freeze
+     EXP_OPEN = "{".freeze
+     EXP_CLOSE = "}".freeze
      ESCAPE = "\\".freeze
      WILDCARD = "*".freeze
-     REGEX_MARKER = "/".freeze
+     REGEX_DELIM = "/".freeze
+     REGEX_CAPTURE = "$".freeze
+     OP_REPLACE = "=".freeze
+     OP_APPEND = ">".freeze
+     OP_PREPEND = "<".freeze
+     OP_ADD = "+".freeze
+     OP_SUB = "-".freeze
+
+     #
+     # Controls which "mode" the tokenizer is currently in. This is a necessary result of the syntax lacking
+     # explicit string delimiters. That *could* be worked around by requiring users to escape all reserved chars,
+     # but that's ugly af. Instead, the parser continually assesses the current context and flips these flags on
+     # or off to auto-escape certain chars for the next token.
+     #
+     attr_reader :escape

      def initialize(src)
        @src = src
        @end = src.size - 1
+       @start_i = 0
+       @i = 0
+       reset_escapes!
      end

-     def tokenize!
-       tokens = []
-       escape = false
-       i = 0
-       until i > @end do
-         char = @src[i]
-         if escape
-           tokens << Char.new(char)
-           escape = false
-           i += 1
-           next
-         end
-
-         case char
-         when ESCAPE
-           escape = true
-           i += 1
-         when OP_OPEN
-           i, op = operation! i + 1
-           tokens << op
-         when OP_CLOSE
-           raise "Unexpected #{OP_CLOSE}"
-         when WILDCARD
-           tokens << :wildcard
-           i += 1
-         else
-           tokens << Char.new(char)
-           i += 1
-         end
-       end
-
-       raise Error, "Trailing escape" if escape
-       tokens
+     # Auto-escape operators and regex capture vars. Appropriate for top-level syntax.
+     def reset_escapes!
+       @escape = Escapes.new(true, true)
      end

-     private
-
-     def operation!(i)
-       escape = false
-       found_close = false
-       tokens = []
+     # Auto-escape anything you'd find in a regular expression
+     def regex_mode!
+       @escape.regex = false # look for the final /
+       @escape.regex_escape = true # pass \ through to the regex engine UNLESS it's followed by a /
+       @escape.wildcards = true
+       @escape.operators = true
+       @escape.regex_capture = true
+       @escape.exp = true
+     end

-       until found_close or i > @end do
-         char = @src[i]
-         if escape
-           tokens << Char.new(char)
-           escape = false
-           i += 1
-           next
+     def next
+       return Token.new(@i, Tokens::EOF) if @i > @end
+       char = @src[@i]
+       case char
+       when EXP_OPEN
+         @i += 1
+         token! Tokens::EXP_OPEN
+       when EXP_CLOSE
+         @i += 1
+         token! Tokens::EXP_CLOSE
+       when WILDCARD
+         @i += 1
+         token! Tokens::WILDCARD, WILDCARD
+       when REGEX_DELIM
+         if @escape.regex
+           get_str!
+         else
+           @i += 1
+           token! Tokens::REG_DELIM
          end
-
-         case char
-         when ESCAPE
-           escape = true
-           i += 1
-         when OP_OPEN
-           raise "Unexpected #{OP_OPEN}"
-         when OP_CLOSE
-           found_close = true
-           i += 1
-         when REGEX_MARKER
-           i, reg = regex! i + 1
-           tokens << reg
+       when REGEX_CAPTURE
+         if @escape.regex_capture
+           get_str!
          else
-           tokens << Char.new(char)
-           i += 1
+           @i += 1
+           t = token! Tokens::REG_CAPTURE, @src[@i]
+           @i += 1
+           @start_i = @i
+           t
          end
+       when OP_REPLACE, OP_APPEND, OP_PREPEND, OP_ADD, OP_SUB
+         if @escape.operators
+           get_str!
+         else
+           @i += 1
+           token! Tokens::OPERATOR, char
+         end
+       else
+         get_str!
        end
-
-       raise Error, "Unclosed operation" if !found_close
-       raise Error, "Trailing escape" if escape
-       return i, Op.new(tokens)
      end

-     def regex!(i)
-       escape = false
-       found_close = false
-       src = ""
+     private

-       until found_close or i > @end
-         char = @src[i]
-         i += 1
+     def token!(type, val = nil)
+       t = Token.new(@start_i, type, val)
+       @start_i = @i
+       t
+     end
+
+     def get_str!
+       str = ""
+       escape, found_end = false, false
+       until found_end or @i > @end
+         char = @src[@i]

          if escape
-           src << char
+           @i += 1
+           str << char
            escape = false
            next
          end

          case char
          when ESCAPE
-           escape = true
-         when REGEX_MARKER
-           found_close = true
+           @i += 1
+           if @escape.regex_escape and @src[@i] != REGEX_DELIM
+             str << char
+           else
+             escape = true
+           end
+         when EXP_OPEN
+           if @escape.exp
+             @i += 1
+             str << char
+           else
+             found_end = true
+           end
+         when EXP_CLOSE
+           if @escape.exp
+             @i += 1
+             str << char
+           else
+             found_end = true
+           end
+         when WILDCARD
+           if @escape.wildcards
+             @i += 1
+             str << char
+           else
+             found_end = true
+           end
+         when REGEX_DELIM
+           if @escape.regex
+             @i += 1
+             str << char
+           else
+             found_end = true
+           end
+         when REGEX_CAPTURE
+           if @escape.regex_capture
+             @i += 1
+             str << char
+           else
+             found_end = true
+           end
+         when OP_REPLACE, OP_APPEND, OP_PREPEND, OP_ADD, OP_SUB
+           if @escape.operators
+             @i += 1
+             str << char
+           else
+             found_end = true
+           end
          else
-           src << char
+           @i += 1
+           str << char
          end
        end

-       raise Error, "Unclosed regex" if !found_close
-       raise Error, "Trailing escape" if escape
-       return i, Regex.new(src)
+       return Token.new(@i - 1, Tokens::TR_ESC) if escape
+       token! Tokens::TEXT, str
      end
    end
  end
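Editor's note: the tokenizer is now pull-based; the parser calls `next` repeatedly and flips the `escape` flags between calls. A rough, unverified sketch of what that looks like in isolation (token shapes assumed from the code above):

```ruby
t = Fop::Tokenizer.new('a {N+1}')
t.next  # => Token(pos: 0, type: :TXT, val: "a ")  -- "+" is auto-escaped at the top level
t.next  # => Token(pos: 2, type: :"{", val: nil)
```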
data/lib/fop/tokens.rb ADDED
@@ -0,0 +1,13 @@
+ module Fop
+   module Tokens
+     TEXT = :TXT
+     EXP_OPEN = :"{"
+     EXP_CLOSE = :"}"
+     REG_CAPTURE = :"$"
+     REG_DELIM = :/
+     WILDCARD = :*
+     OPERATOR = :op
+     TR_ESC = :"trailing escape"
+     EOF = :EOF
+   end
+ end
data/lib/fop/version.rb CHANGED
@@ -1,3 +1,3 @@
  module Fop
-   VERSION = "0.3.0"
+   VERSION = "0.7.0"
  end
data/lib/fop_lang.rb CHANGED
@@ -1,12 +1,22 @@
  require_relative 'fop/version'
+ require_relative 'fop/compiler'
  require_relative 'fop/program'

  def Fop(src)
-   ::Fop::Program.new(src)
+   ::Fop.compile!(src)
  end

  module Fop
+   def self.compile!(src)
+     prog, errors = compile(src)
+     # TODO better exception
+     raise "Fop errors: " + errors.map(&:message).join(",") if errors
+     prog
+   end
+
    def self.compile(src)
-     Program.new(src)
+     instructions, errors = ::Fop::Compiler.compile(src)
+     return nil, errors if errors
+     return Program.new(instructions), nil
    end
  end
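Editor's note: the top-level API now splits into a two-value form and a raising form. Roughly (a sketch; the expressions are mine):

```ruby
prog, errors = Fop.compile('release-{N+1}')
if errors
  errors.each { |e| warn e.to_s }
else
  prog.apply('release-5')  # => "release-6"
end

Fop('release-{N+')  # compile! raises a RuntimeError listing the collected errors
```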
metadata CHANGED
@@ -1,26 +1,31 @@
  --- !ruby/object:Gem::Specification
  name: fop_lang
  version: !ruby/object:Gem::Version
-   version: 0.3.0
+   version: 0.7.0
  platform: ruby
  authors:
  - Jordan Hollinger
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2021-08-16 00:00:00.000000000 Z
+ date: 2021-08-30 00:00:00.000000000 Z
  dependencies: []
  description: A micro expression language for Filter and OPerations on text
  email: jordan.hollinger@gmail.com
- executables: []
+ executables:
+ - fop
  extensions: []
  extra_rdoc_files: []
  files:
  - README.md
+ - bin/fop
+ - lib/fop/cli.rb
+ - lib/fop/compiler.rb
  - lib/fop/nodes.rb
  - lib/fop/parser.rb
  - lib/fop/program.rb
  - lib/fop/tokenizer.rb
+ - lib/fop/tokens.rb
  - lib/fop/version.rb
  - lib/fop_lang.rb
  homepage: https://jhollinger.github.io/fop-lang-rb/