RubyGems - fop_lang - Versions diffs - 0.3.0 → 0.7.0 - Mend

fop_lang 0.3.0 → 0.7.0


Files changed (13)

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: b5f19a543b81c0046dc63fcc1c0769989628017d2c1d1da74ef0db9866a0f2f7
-  data.tar.gz: 03b6597f9cab97c95ccda8396693bb43d9da729137cb916cc74f7fbecc314b32
+  metadata.gz: 798fd7c335f394e878fba2f70a9f60372ea356c79f2dc63392398920d0ffce38
+  data.tar.gz: 654786ff77823e8d8dd9a348f958828346e3755e43a04a0f38e711a6c5571ea9
 SHA512:
-  metadata.gz: 3a17c82a561e20cbc5cb8abbad5be4f94f02110d60b6130e3e1e9489672c5c134befc6b1daca2f590f083a67934e600fb5d6fa0ea5433181ba3014514c558232
-  data.tar.gz: 790250c8a79dcf04b381f2dd33cbaa048fd070688ab45446ff87652dcb18844c2d6139d0ead060fa338a57b8590eee0167ea2c25abd84e1d71571f33c49bcbda
+  metadata.gz: 6761f3d7dd602d1c93a2387fc73ea14c11484e88d0d319bbf87df98925977aa15de59a63f23aafffafa384ce3b9def9f81edabae669aabc2012b00d3131e46f4
+  data.tar.gz: 7f5187cd510d691dda996284d5a400804b7573f67506701e39a6d2909c8a4026b58655f6b2800708e911377ccce790885a2238eed7a75d4873e4b599d23e67df

data/README.md CHANGED

@@ -1,55 +1,99 @@
 # fop_lang
-Fop (Filter and OPerations language) is an experimental, tiny expression language in the vein of awk and sed. This is a Ruby implementation. It is useful for simultaneously matching and transforming text input.
+Fop (Filter and OPerations language) is a tiny, experimental language for filtering and transforming text. Think of it like awk but with the condition and action segments combined.
-```ruby
-gem 'fop_lang'
-```
+This is a Ruby implementation with both a library interface and a bin command.
-## Release Number Example
+## Installation
-This example takes in GitHub branch names, decides if they're release branches, and if so, increments the version number.
+```bash
+$ gem install fop_lang
+```
+You may use fop in a Ruby script:
 ```ruby
-  f = Fop('release-{N}.{N+1}.{N=0}')
+require 'fop_lang'
-  puts f.apply('release-5.99.1')
-  =>           'release-5.100.0'
+f = Fop('foo {N+1}')
-  puts f.apply('release-5')
-  => nil
-  # doesn't match the pattern
+f.apply('foo 1')
+=> "foo 2"
+f.apply('bar 1')
+=> nil
+```
+or run `fop` from the command line:
+```bash
+$ echo 'foo 1' | fop 'foo {N+1}'
+foo 2
+$ echo 'bar 1' | fop 'foo {N+1}'
 ```
-## Anatomy of a Fop expression
+## Syntax
+`Text /(R|r)egex/ {N+1}`
+The above program demonstrates a text match, a regex match, and a match expression. If the input matches all three segments, output is given. If the input was `Text regex 5`, the output would be `Text regex 6`.
+### Text match
+The input must match this text exactly. Whitespace is part of the match. Wildcards (`*`) are allowed. Special characters (`*/{}\`) may be escaped with `\`.
-`Text Literal {Operation}`
+The output of a text match will be the matching input.
-The above expression contains the only two parts of Fop (except for the wildcard and escape characters).
+### Regex match
-**Text Literals**
+Regular expressions may be placed between `/`s. If the regular expression contains a `/`, you may escape it with `\`. Special regex characters like `[]()+.*` may also be escaped with `\`.
-A text literal works how it sounds: the input must match it exactly. The only exception is the `*` (wildcard) character, which matches 0 or more of anything. Wildcards can be used anywhere except inside `{...}` (operations).
+The output of a regex match will be the matching input.
-If `\` (escape) is used before the special characters `*`, `{` or `}`, then that character is treated like a text literal. It's recommended to use single-quoted Ruby strings with Fop expressions so that you don't need to double-escape.
+### Match expression
-**Operations**
+A match expression both matches on input and modifies that input. An expression is made up of 1 - 3 parts:
-Operations are the interesting part of Fop, and are specified between `{` and `}`. An Operation can consist of one to three parts:
+1. The match, e.g. `N` for numeric.
+2. The operator, e.g. `+` for addition (optional).
+3. The argument, e.g. `1` for "add one" (required for most operators).
-1. Matching class (required): Defines what characters the operation will match and operate on.
-  * `N` is the numeric class and will match one or more digits.
-  * `A` is the alpha class and will match one or more letters (lower or upper case).
-  * `W` is the word class and matches alphanumeric chars and underscores.
-  * `*` is the wildcard class and greedily matches everything after it.
-  * `/.../` matches on the supplied regex between the `/`'s. If your regex contains a `/`, it must be escaped.
-3. Operator (optional): What to do to the matching characters.
-  * `=` Replace the matching character(s) with the given argument. If no argument is given, drop the matching chars. Note that any `/` chars must be escaped, so as not to be mistaken for a regex.
-  * `+` Perform addition on the matching number and the argument (`N` only).
-  * `-` Subtract the argument from the matching number (`N` only).
-5. Operator argument (required for some operators): meaning varies by operator.
+The output of a match expression will be the _modified_ matching input. If no operator is given, the output will be the matching input.
+**Matches**
+* `N` matches one or more consecutive digits.
+* `A` matches one or more letters (lower or upper case).
+* `W` matches alphanumeric chars and underscores.
+* `*` greedily matches everything after it.
+* `/regex/` matches on the supplied regex. Capture groups may be referenced in the argument as `$1`, `$2`, etc.
+**Operators**
+* `=` Replace the matching character(s) with the given argument. If no argument is given, drop the matching chars.
+* `>` Append the argument to the matching value.
+* `<` Prepend the argument to the matching value.
+* `+` Perform addition on the matching number and the argument (`N` only).
+* `-` Subtract the argument from the matching number (`N` only).
+## Examples
+### Release Number Example
+This example takes in GitHub branch names, decides if they're release branches, and if so, increments the version number.
+```ruby
+  f = Fop('release-{N}.{N+1}.{N=0}')
+  puts f.apply('release-5.99.1')
+  =>           'release-5.100.0'
+  puts f.apply('release-5')
+  => nil
+  # doesn't match the pattern
+```
-## More Examples
+### More Examples
 ```ruby
   f = Fop('release-{N=5}.{N+1}.{N=0}')

data/bin/fop ADDED

@@ -0,0 +1,42 @@
+#!/usr/bin/env ruby
+# Used for local testing
+# $LOAD_PATH.unshift File.join(File.dirname(__FILE__), '..', 'lib')
+require 'fop_lang'
+require 'fop/cli'
+opts = Fop::CLI.options!
+if opts.version
+  puts Fop::VERSION
+  exit 0
+end
+src = opts.src.read.chomp
+if src.empty?
+  $stderr.puts "No expression given"
+  exit 1
+end
+fop, errors = Fop.compile(src)
+opts.src.close
+NL = "\n".freeze
+if errors
+  $stderr.puts src
+  $stderr.puts errors.join(NL)
+  exit 1
+end
+if opts.check
+  $stdout.puts "Syntax OK" unless opts.quiet
+  exit 0
+end
+while (line = gets) do
+  line.chomp!
+  if (res = fop.apply(line))
+    print(res << NL)
+  end
+end
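
The read loop at the end of `bin/fop` can be tried without the gem installed. The following is a minimal sketch, assuming a stand-in callable in place of a compiled Fop program; `filter_lines` and `stand_in` are hypothetical names used only for illustration and are not part of the gem.

```ruby
require 'stringio'

# Reads lines from `input`, applies `program` to each chomped line, and
# prints only the non-nil results, mirroring the loop in bin/fop.
def filter_lines(program, input, output)
  while (line = input.gets)
    line.chomp!
    if (res = program.call(line))
      output.print(res + "\n")
    end
  end
end

# Hypothetical stand-in for a compiled program: keeps lines starting with
# "foo" and upcases them; every other line yields nil and is dropped.
stand_in = ->(line) { line.start_with?("foo") ? line.upcase : nil }

out = StringIO.new
filter_lines(stand_in, StringIO.new("foo 1\nbar 1\nfoo 2\n"), out)
```

Lines for which the program returns `nil` produce no output at all, which is why the `bar 1` example in the README prints nothing.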

data/lib/fop/cli.rb ADDED

@@ -0,0 +1,34 @@
+require 'optparse'
+module Fop
+  module CLI
+    Options = Struct.new(:src, :check, :quiet, :version)
+    def self.options!
+      options = Options.new
+      OptionParser.new do |opts|
+        opts.banner = "Usage: fop [options] [ 'prog' | -f progfile ] [ file ... ]"
+        opts.on("-fFILE", "--file=FILE", "Read program from file instead of first argument") do |f|
+          options.src = File.open(f)
+          options.src.advise(:sequential)
+        end
+        opts.on("-c", "--check", "Perform a syntax check on the program and exit") do
+          options.check = true
+        end
+        opts.on("-q", "--quiet", "Only print errors and output") do
+          options.quiet = true
+        end
+        opts.on("--version", "Print version and exit") do
+          options.version = true
+        end
+      end.parse!
+      options.src ||= StringIO.new(ARGV.shift || "")
+      options
+    end
+  end
+end
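
To see how the option handling above treats the first positional argument as the program source, here is a reduced sketch that parses an explicit array instead of `ARGV`. It omits the `-f` file option, and `SketchOptions` and `parse_sketch_options` are hypothetical names, not part of the gem.

```ruby
require 'optparse'
require 'stringio'

SketchOptions = Struct.new(:src, :check, :quiet, :version)

# Parses flags out of a copy of argv, then takes the first remaining
# argument (if any) as the program source, wrapped in a StringIO.
def parse_sketch_options(argv)
  argv = argv.dup
  options = SketchOptions.new
  OptionParser.new do |opts|
    opts.on("-c", "--check") { options.check = true }
    opts.on("-q", "--quiet") { options.quiet = true }
    opts.on("--version") { options.version = true }
  end.parse!(argv)
  options.src ||= StringIO.new(argv.shift || "")
  [options, argv]
end

opts, rest = parse_sketch_options(["-c", "foo {N+1}", "input.txt"])
```

Because `parse!` removes the flags it recognizes, whatever is left after the program source (here `input.txt`) stays available as the list of input files.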

data/lib/fop/compiler.rb ADDED

@@ -0,0 +1,72 @@
+require_relative 'parser'
+module Fop
+  module Compiler
+    def self.compile(src)
+      parser = Parser.new(src)
+      nodes, errors = parser.parse
+      instructions = nodes.map { |node|
+        case node
+        when Nodes::Text, Nodes::Regex
+          Instructions.regex_match(node.regex)
+        when Nodes::Expression
+          Instructions::ExpressionMatch.new(node)
+        else
+          raise "Unknown node type #{node}"
+        end
+      }
+      return nil, errors if errors.any?
+      return instructions, nil
+    end
+    module Instructions
+      BLANK = "".freeze
+      OPERATIONS = {
+        "=" => ->(_val, arg) { arg || BLANK },
+        "+" => ->(val, arg) { val.to_i + arg.to_i },
+        "-" => ->(val, arg) { val.to_i - arg.to_i },
+        ">" => ->(val, arg) { val + arg },
+        "<" => ->(val, arg) { arg + val },
+      }
+      def self.regex_match(regex)
+        ->(input) { input.slice! regex }
+      end
+      class ExpressionMatch
+        def initialize(node)
+          @regex = node.regex&.regex
+          @op = node.operator ? OPERATIONS.fetch(node.operator) : nil
+          @regex_match = node.regex_match
+          if node.arg&.any? { |a| a.is_a? Integer }
+            @arg, @arg_with_caps = nil, node.arg
+          else
+            @arg = node.arg&.join("")
+            @arg_with_caps = nil
+          end
+        end
+        def call(input)
+          if (match = @regex.match(input))
+            val = match.to_s
+            blank = val == BLANK
+            input.sub!(val, BLANK) unless blank
+            found_val = @regex_match || !blank
+            arg = @arg_with_caps ? sub_caps(@arg_with_caps, match.captures) : @arg
+            @op && found_val ? @op.call(val, arg) : val
+          end
+        end
+        private
+        def sub_caps(args, caps)
+          args.map { |a|
+            a.is_a?(Integer) ? caps[a].to_s : a
+          }.join("")
+        end
+      end
+    end
+  end
+end
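
The operator table and the capture substitution can be exercised in isolation. The sketch below copies the lambdas from `OPERATIONS` and the body of `sub_caps` into standalone form; `SKETCH_OPERATIONS` is a hypothetical name, and the argument list `[1, " at ", 0]` assumes the parser's convention of storing `$1` as index `0` and `$2` as index `1`.

```ruby
BLANK = "".freeze

# Standalone copy of the operator lambdas: each takes the matched value
# and the (string) argument and returns the replacement.
SKETCH_OPERATIONS = {
  "=" => ->(_val, arg) { arg || BLANK },
  "+" => ->(val, arg) { val.to_i + arg.to_i },
  "-" => ->(val, arg) { val.to_i - arg.to_i },
  ">" => ->(val, arg) { val + arg },
  "<" => ->(val, arg) { arg + val },
}.freeze

# Integers in the argument list are zero-based capture indexes; strings
# are literal text. The pieces are joined into one replacement string.
def sub_caps(args, caps)
  args.map { |a| a.is_a?(Integer) ? caps[a].to_s : a }.join
end

match = /^(\w+)@(\w+)/.match("user@example")
replacement = sub_caps([1, " at ", 0], match.captures)
```

Note that `+` and `-` return Integers rather than Strings; `Program#apply` calls `to_s` on each result before concatenating.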

data/lib/fop/nodes.rb CHANGED

@@ -1,29 +1,29 @@
 module Fop
   module Nodes
-    Text = Struct.new(:wildcard, :str) do
-      def consume!(input)
-        @regex ||= Regexp.new((wildcard ? ".*" : "^") + Regexp.escape(str))
-        input.slice!(@regex)
-      end
+    Text = Struct.new(:wildcard, :str, :regex) do
       def to_s
         w = wildcard ? "*" : nil
-        "Text #{w}#{str}"
+        "[#{w}txt] #{str}"
       end
     end
-    Op = Struct.new(:wildcard, :match, :regex_match, :regex, :operator, :operator_arg, :expression) do
-      def consume!(input)
-        if (val = input.slice!(regex))
-          found_val = regex_match || val != Parser::BLANK
-          expression && found_val ? expression.call(val) : val
-        end
+    Regex = Struct.new(:wildcard, :src, :regex) do
+      def to_s
+        w = wildcard ? "*" : nil
+        "[#{w}reg] #{src}"
       end
+    end
+    Expression = Struct.new(:wildcard, :match, :regex_match, :regex, :operator, :arg) do
       def to_s
         w = wildcard ? "*" : nil
-        s = "#{w}#{match}"
-        s << " #{operator} #{operator_arg}" if operator
+        s = "[#{w}exp] #{match}"
+        if operator
+          arg_str = arg
+            .map { |a| a.is_a?(Integer) ? "$#{a+1}" : a.to_s }
+            .join("")
+          s << " #{operator} #{arg_str}"
+        end
         s
       end
     end

data/lib/fop/parser.rb CHANGED

@@ -1,136 +1,162 @@
+require_relative 'tokenizer'
 require_relative 'nodes'
 module Fop
-  module Parser
-    Error = Class.new(StandardError)
+  class Parser
+    DIGIT = /^[0-9]$/
+    REGEX_START = "^".freeze
+    REGEX_LAZY_WILDCARD = ".*?".freeze
+    REGEX_MATCHES = {
+      "N" => "[0-9]+".freeze,
+      "W" => "\\w+".freeze,
+      "A" => "[a-zA-Z]+".freeze,
+      "*" => ".*".freeze,
+    }.freeze
+    OPS_WITH_OPTIONAL_ARGS = [Tokenizer::OP_REPLACE]
+    TR_REGEX = /.*/
-    MATCH_NUM = "N".freeze
-    MATCH_WORD = "W".freeze
-    MATCH_ALPHA = "A".freeze
-    MATCH_WILD = "*".freeze
-    BLANK = "".freeze
-    OP_REPLACE = "=".freeze
-    OP_ADD = "+".freeze
-    OP_SUB = "-".freeze
-    OP_MUL = "*".freeze
-    OP_DIV = "/".freeze
+    Error = Struct.new(:type, :token, :message) do
+      def to_s
+        "#{type.to_s.capitalize} error: #{message} at column #{token.pos}"
+      end
+    end
-    def self.parse!(tokens)
-      nodes = []
-      curr_node = nil
+    attr_reader :errors
-      tokens.each { |token|
-        case curr_node
-        when nil
-          curr_node = new_node token
-        when :wildcard
-          curr_node = new_node token, true
-          raise Error, "Unexpected * after wildcard" if curr_node == :wildcard
-        when Nodes::Text
-          curr_node, finished_node = parse_text curr_node, token
-          nodes << finished_node if finished_node
-        when Nodes::Op
-          nodes << curr_node
-          curr_node = new_node token
+    def initialize(src, debug: false)
+      @tokenizer = Tokenizer.new(src)
+      @errors = []
+    end
+    def parse
+      nodes = []
+      wildcard = false
+      eof = false
+      # Top-level parsing. It will always be looking for a String, Regex, or Expression.
+      until eof
+        @tokenizer.reset_escapes!
+        t = @tokenizer.next
+        case t.type
+        when Tokens::WILDCARD
+          errors << Error.new(:syntax, t, "Consecutive wildcards") if wildcard
+          wildcard = true
+        when Tokens::TEXT
+          reg = build_regex!(wildcard, t, Regexp.escape(t.val))
+          nodes << Nodes::Text.new(wildcard, t.val, reg)
+          wildcard = false
+        when Tokens::EXP_OPEN
+          nodes << parse_exp!(wildcard)
+          wildcard = false
+        when Tokens::REG_DELIM
+          nodes << parse_regex!(wildcard)
+          wildcard = false
+        when Tokens::EOF
+          eof = true
         else
-          raise Error, "Unexpected node #{curr_node}"
+          errors << Error.new(:syntax, t, "Unexpected #{t.type}")
         end
-      }
-      case curr_node
-      when nil
-        # noop
-      when :wildcard
-        nodes << Nodes::Text.new(true, "")
-      when Nodes::Text, Nodes::Op
-        nodes << curr_node
-      else
-        raise "Unexpected end node #{curr_node}"
       end
-      nodes
+      nodes << Nodes::Text.new(true, "", TR_REGEX) if wildcard
+      return nodes, @errors
     end
-    private
+    def parse_exp!(wildcard = false)
+      exp = Nodes::Expression.new(wildcard)
+      parse_exp_match! exp
+      op_token = parse_exp_operator! exp
+      if exp.operator
+        parse_exp_arg! exp, op_token
+      end
+      return exp
+    end
-    def self.new_node(token, wildcard = false)
-      case token
-      when Tokenizer::Char
-        Nodes::Text.new(wildcard, token.char.clone)
-      when Tokenizer::Op
-        op = Nodes::Op.new(wildcard)
-        parse_op! op, token.tokens
-        op
-      when :wildcard
-        :wildcard
+    def parse_exp_match!(exp)
+      @tokenizer.escape.operators = false
+      t = @tokenizer.next
+      case t.type
+      when Tokens::TEXT, Tokens::WILDCARD
+        exp.match = t.val
+        if (src = REGEX_MATCHES[exp.match])
+          reg = Regexp.new((exp.wildcard ? REGEX_LAZY_WILDCARD : REGEX_START) + src)
+          exp.regex = Nodes::Regex.new(exp.wildcard, src, reg)
+        else
+          errors << Error.new(:name, t, "Unknown match type '#{exp.match}'") if exp.regex.nil?
+        end
+      when Tokens::REG_DELIM
+        exp.regex = parse_regex!(exp.wildcard)
+        exp.match = exp.regex&.src
+        exp.regex_match = true
+        @tokenizer.reset_escapes!
       else
-        raise Error, "Unexpected #{token}"
+        errors << Error.new(:syntax, t, "Unexpected #{t.type}; expected a string or a regex")
       end
     end
-    # @return current node
-    # @return finished node
-    def self.parse_text(node, token)
-      case token
-      when Tokenizer::Char
-        node.str << token.char
-        return node, nil
-      when Tokenizer::Op
-        op = new_node token
-        return op, node
-      when :wildcard
-        return :wildcard, node
+    def parse_exp_operator!(exp)
+      @tokenizer.escape.operators = false
+      t = @tokenizer.next
+      case t.type
+      when Tokens::EXP_CLOSE
+        # no op
+      when Tokens::OPERATOR
+        exp.operator = t.val
       else
-        raise Error, "Unexpected #{token}"
+        errors << Error.new(:syntax, t, "Unexpected #{t.type}; expected an operator")
       end
+      t
     end
-    def self.parse_op!(node, tokens)
-      t = tokens[0] || raise(Error, "Empty operation")
-      # parse the matching type
-      node.regex =
-        case t
-        when Tokenizer::Char
-          node.match = t.char
-          node.regex_match = false
-          case t.char
-          when MATCH_NUM then Regexp.new((node.wildcard ? ".*?" : "^") + "[0-9]+")
-          when MATCH_WORD then Regexp.new((node.wildcard ? ".*?" : "^") + "\\w+")
-          when MATCH_ALPHA then Regexp.new((node.wildcard ? ".*?" : "^") + "[a-zA-Z]+")
-          when MATCH_WILD then /.*/
-          else raise Error, "Unknown match type '#{t.char}'"
-          end
-        when Tokenizer::Regex
-          node.match = "/#{t.src}/"
-          node.regex_match = true
-          Regexp.new((node.wildcard ? ".*?" : "^") + t.src)
+    def parse_exp_arg!(exp, op_token)
+      @tokenizer.escape.operators = true
+      @tokenizer.escape.regex = true
+      @tokenizer.escape.regex_capture = false if exp.regex_match
+      exp.arg = []
+      found_close, eof = false, false
+      until found_close or eof
+        t = @tokenizer.next
+        case t.type
+        when Tokens::TEXT
+          exp.arg << t.val
+        when Tokens::REG_CAPTURE
+          exp.arg << t.val.to_i - 1
+          errors << Error.new(:syntax, t, "Invalid regex capture; must be between 0 and 9 (found #{t.val})") unless t.val =~ DIGIT
+          errors << Error.new(:syntax, t, "Unexpected regex capture; expected str or '}'") if !exp.regex_match
+        when Tokens::EXP_CLOSE
+          found_close = true
+        when Tokens::EOF
+          eof = true
+          errors << Error.new(:syntax, t, "Unexpected #{t.type}; expected str or '}'")
         else
-          raise Error, "Unexpected token #{t}"
+          errors << Error.new(:syntax, t, "Unexpected #{t.type}; expected str or '}'")
         end
+      end
-      # parse the operator (if any)
-      if (op = tokens[1])
-        raise Error, "Unexpected #{op}" unless op.is_a? Tokenizer::Char
-        node.operator = op.char
-        arg = tokens[2..-1].reduce("") { |acc, t|
-          raise Error, "Unexpected #{t}" unless t.is_a? Tokenizer::Char
-          acc + t.char
-        }
-        node.operator_arg = arg == BLANK ? nil : arg
+      if exp.arg.size != 1 and !OPS_WITH_OPTIONAL_ARGS.include?(exp.operator)
+        errors << Error.new(:arg, op_token, "Operator '#{op_token.val}' requires an argument")
+      end
+    end
-        node.expression =
-          case node.operator
-          when OP_REPLACE
-            ->(_) { node.operator_arg || BLANK }
-          when OP_ADD, OP_SUB, OP_MUL, OP_DIV
-            raise Error, "Operator #{node.operator} is only available for numeric matches" unless node.match == MATCH_NUM
-            raise Error, "Operator #{node.operator} expects an argument" if node.operator_arg.nil?
-            ->(x) { x.to_i.send(node.operator, node.operator_arg.to_i) }
-          else
-            raise(Error, "Unknown operator #{node.operator}")
-          end
+    def parse_regex!(wildcard)
+      @tokenizer.regex_mode!
+      t = @tokenizer.next
+      reg = Nodes::Regex.new(wildcard, t.val)
+      if t.type == Tokens::TEXT
+        reg.regex = build_regex!(wildcard, t)
+      else
+        errors << Error.new(:syntax, t, "Unexpected #{t.type}; expected a string of regex")
       end
+      t = @tokenizer.next
+      errors << Error.new(:syntax, t, "Unexpected #{t.type}; expected a string of regex") unless t.type == Tokens::REG_DELIM
+      reg
+    end
+    def build_regex!(wildcard, token, src = token.val)
+      Regexp.new((wildcard ? REGEX_LAZY_WILDCARD : REGEX_START) + src)
+    rescue RegexpError => e
+      errors << Error.new(:regex, token, e.message)
+      nil
     end
   end
 end
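
The new parser collects errors instead of raising. A minimal sketch of that pattern, modeled on `build_regex!` and the `Error` struct above, might look like the following; `SketchToken`, `SketchError`, and `build_regex` are hypothetical names, and the error list is passed in explicitly rather than held in an instance variable.

```ruby
SketchToken = Struct.new(:pos, :type, :val)

SketchError = Struct.new(:type, :token, :message) do
  def to_s
    "#{type.to_s.capitalize} error: #{message} at column #{token.pos}"
  end
end

# Anchors the pattern at the start of the input, or prefixes a lazy
# wildcard when the segment was preceded by `*`. An invalid pattern is
# recorded in `errors` and nil is returned instead of raising.
def build_regex(wildcard, token, errors, src = token.val)
  Regexp.new((wildcard ? ".*?" : "^") + src)
rescue RegexpError => e
  errors << SketchError.new(:regex, token, e.message)
  nil
end

errors = []
anchored = build_regex(false, SketchToken.new(0, :TXT, "[0-9]+"), errors)
lazy     = build_regex(true,  SketchToken.new(0, :TXT, "[0-9]+"), errors)
broken   = build_regex(false, SketchToken.new(4, :TXT, "("), errors)
```

Accumulating errors this way lets the caller report every problem in an expression at once, each with its column position.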

data/lib/fop/program.rb CHANGED

@@ -1,22 +1,16 @@
-require_relative 'tokenizer'
-require_relative 'parser'
 module Fop
   class Program
-    attr_reader :nodes
-    def initialize(src)
-      tokens = Tokenizer.new(src).tokenize!
-      @nodes = Parser.parse! tokens
+    def initialize(instructions)
+      @instructions = instructions
     end
     def apply(input)
       input = input.clone
       output =
-        @nodes.reduce("") { |acc, token|
-          section = token.consume!(input)
-          return nil if section.nil?
-          acc + section.to_s
+        @instructions.reduce("") { |acc, ins|
+          result = ins.call(input)
+          return nil if result.nil?
+          acc + result.to_s
         }
       input.empty? ? output : nil
     end
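
The reworked `apply` consumes the input one instruction at a time and succeeds only if nothing is left over. The sketch below reproduces that loop as a free-standing method; the instruction list is hand-built to approximate `foo {N+1}` and is an assumption for illustration, not output from the real compiler.

```ruby
# Same shape as Compiler::Instructions.regex_match: slices the matching
# prefix off the input and returns it, or nil when there is no match.
def regex_match(regex)
  ->(input) { input.slice!(regex) }
end

# Runs each instruction against the remaining input. Any nil result
# aborts with nil; leftover input after the last instruction also fails.
def apply(instructions, input)
  input = input.dup
  output =
    instructions.reduce("") { |acc, ins|
      result = ins.call(input)
      return nil if result.nil?
      acc + result.to_s
    }
  input.empty? ? output : nil
end

# Hand-built approximation of the program 'foo {N+1}'.
instructions = [
  regex_match(/^foo /),
  ->(input) { (val = input.slice!(/^[0-9]+/)) && val.to_i + 1 },
]
```

Working on a copy of the input keeps `apply` free of side effects on the caller's string even though each instruction mutates its argument.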

data/lib/fop/tokenizer.rb CHANGED

@@ -1,123 +1,175 @@
+require_relative 'tokens'
 module Fop
   class Tokenizer
-    Char = Struct.new(:char)
-    Op = Struct.new(:tokens)
-    Regex = Struct.new(:src)
-    Error = Class.new(StandardError)
+    Token = Struct.new(:pos, :type, :val)
+    Error = Struct.new(:pos, :message)
+    Escapes = Struct.new(:operators, :regex_capture, :regex, :regex_escape, :wildcards, :exp)
-    OP_OPEN = "{".freeze
-    OP_CLOSE = "}".freeze
+    EXP_OPEN = "{".freeze
+    EXP_CLOSE = "}".freeze
     ESCAPE = "\\".freeze
     WILDCARD = "*".freeze
-    REGEX_MARKER = "/".freeze
+    REGEX_DELIM = "/".freeze
+    REGEX_CAPTURE = "$".freeze
+    OP_REPLACE = "=".freeze
+    OP_APPEND = ">".freeze
+    OP_PREPEND = "<".freeze
+    OP_ADD = "+".freeze
+    OP_SUB = "-".freeze
+    #
+    # Controls which "mode" the tokenizer is currently in. This is a necessary result of the syntax lacking
+    # explicit string delimiters. That *could* be worked around by requiring users to escape all reserved chars,
+    # but that's ugly af. Instead, the parser continually assesses the current context and flips these flags on
+    # or off to auto-escape certain chars for the next token.
+    #
+    attr_reader :escape
     def initialize(src)
       @src = src
       @end = src.size - 1
+      @start_i = 0
+      @i = 0
+      reset_escapes!
     end
-    def tokenize!
-      tokens = []
-      escape = false
-      i = 0
-      until i > @end do
-        char = @src[i]
-        if escape
-          tokens << Char.new(char)
-          escape = false
-          i += 1
-          next
-        end
-        case char
-        when ESCAPE
-          escape = true
-          i += 1
-        when OP_OPEN
-          i, op = operation! i + 1
-          tokens << op
-        when OP_CLOSE
-          raise "Unexpected #{OP_CLOSE}"
-        when WILDCARD
-          tokens << :wildcard
-          i += 1
-        else
-          tokens << Char.new(char)
-          i += 1
-        end
-      end
-      raise Error, "Trailing escape" if escape
-      tokens
+    # Auto-escape operators and regex capture vars. Appropriate for top-level syntax.
+    def reset_escapes!
+      @escape = Escapes.new(true, true)
     end
-    private
-    def operation!(i)
-      escape = false
-      found_close = false
-      tokens = []
+    # Auto-escape anything you'd find in a regular expression
+    def regex_mode!
+      @escape.regex = false # look for the final /
+      @escape.regex_escape = true # pass \ through to the regex engine UNLESS it's followed by a /
+      @escape.wildcards = true
+      @escape.operators = true
+      @escape.regex_capture = true
+      @escape.exp = true
+    end
-      until found_close or i > @end do
-        char = @src[i]
-        if escape
-          tokens << Char.new(char)
-          escape = false
-          i += 1
-          next
+    def next
+      return Token.new(@i, Tokens::EOF) if @i > @end
+      char = @src[@i]
+      case char
+      when EXP_OPEN
+        @i += 1
+        token! Tokens::EXP_OPEN
+      when EXP_CLOSE
+        @i += 1
+        token! Tokens::EXP_CLOSE
+      when WILDCARD
+        @i += 1
+        token! Tokens::WILDCARD, WILDCARD
+      when REGEX_DELIM
+        if @escape.regex
+          get_str!
+        else
+          @i += 1
+          token! Tokens::REG_DELIM
         end
-        case char
-        when ESCAPE
-          escape = true
-          i += 1
-        when OP_OPEN
-          raise "Unexpected #{OP_OPEN}"
-        when OP_CLOSE
-          found_close = true
-          i += 1
-        when REGEX_MARKER
-          i, reg = regex! i + 1
-          tokens << reg
+      when REGEX_CAPTURE
+        if @escape.regex_capture
+          get_str!
         else
-          tokens << Char.new(char)
-          i += 1
+          @i += 1
+          t = token! Tokens::REG_CAPTURE, @src[@i]
+          @i += 1
+          @start_i = @i
+          t
         end
+      when OP_REPLACE, OP_APPEND, OP_PREPEND, OP_ADD, OP_SUB
+        if @escape.operators
+          get_str!
+        else
+          @i += 1
+          token! Tokens::OPERATOR, char
+        end
+      else
+        get_str!
       end
-      raise Error, "Unclosed operation" if !found_close
-      raise Error, "Trailing escape" if escape
-      return i, Op.new(tokens)
     end
-    def regex!(i)
-      escape = false
-      found_close = false
-      src = ""
+    private
-      until found_close or i > @end
-        char = @src[i]
-        i += 1
+    def token!(type, val = nil)
+      t = Token.new(@start_i, type, val)
+      @start_i = @i
+      t
+    end
+    def get_str!
+      str = ""
+      escape, found_end = false, false
+      until found_end or @i > @end
+        char = @src[@i]
         if escape
-          src << char
+          @i += 1
+          str << char
           escape = false
           next
         end
         case char
         when ESCAPE
-          escape = true
-        when REGEX_MARKER
-          found_close = true
+          @i += 1
+          if @escape.regex_escape and @src[@i] != REGEX_DELIM
+            str << char
+          else
+            escape = true
+          end
+        when EXP_OPEN
+          if @escape.exp
+            @i += 1
+            str << char
+          else
+            found_end = true
+          end
+        when EXP_CLOSE
+          if @escape.exp
+            @i += 1
+            str << char
+          else
+            found_end = true
+          end
+        when WILDCARD
+          if @escape.wildcards
+            @i += 1
+            str << char
+          else
+            found_end = true
+          end
+        when REGEX_DELIM
+          if @escape.regex
+            @i += 1
+            str << char
+          else
+            found_end = true
+          end
+        when REGEX_CAPTURE
+          if @escape.regex_capture
+            @i += 1
+            str << char
+          else
+            found_end = true
+          end
+        when OP_REPLACE, OP_APPEND, OP_PREPEND, OP_ADD, OP_SUB
+          if @escape.operators
+            @i += 1
+            str << char
+          else
+            found_end = true
+          end
         else
-          src << char
+          @i += 1
+          str << char
         end
       end
-      raise Error, "Unclosed regex" if !found_close
-      raise Error, "Trailing escape" if escape
-      return i, Regex.new(src)
+      return Token.new(@i - 1, Tokens::TR_ESC) if escape
+      token! Tokens::TEXT, str
     end
   end
 end
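
The escape flags described in the tokenizer comment decide whether a reserved character is read as syntax or as plain text. The following is a deliberately reduced sketch covering only the `operators` flag; `SketchEscapes` and `next_token` are hypothetical, and the real tokenizer tracks position and several more flags on the instance.

```ruby
SketchEscapes = Struct.new(:operators)
SKETCH_OPERATORS = %w[= > < + -].freeze

# Returns [[type, value], next_index]. When escape.operators is true,
# operator characters are swallowed into the surrounding text; when it
# is false, they end the text run and are emitted as their own token.
def next_token(src, i, escape)
  char = src[i]
  if SKETCH_OPERATORS.include?(char) && !escape.operators
    return [[:op, char], i + 1]
  end

  str = +""
  while i < src.size
    char = src[i]
    break if SKETCH_OPERATORS.include?(char) && !escape.operators
    str << char
    i += 1
  end
  [[:TXT, str], i]
end

top_level = SketchEscapes.new(true)   # e.g. outside an expression
in_exp    = SketchEscapes.new(false)  # e.g. right after the match type
```

With the flag on, `a+b` is a single text token; with it off, the same input yields `a`, then the operator `+`, then `b`.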

data/lib/fop/tokens.rb ADDED

@@ -0,0 +1,13 @@
+module Fop
+  module Tokens
+    TEXT = :TXT
+    EXP_OPEN = :"{"
+    EXP_CLOSE = :"}"
+    REG_CAPTURE = :"$"
+    REG_DELIM = :/
+    WILDCARD = :*
+    OPERATOR = :op
+    TR_ESC = :"trailing escape"
+    EOF = :EOF
+  end
+end

data/lib/fop/version.rb CHANGED

@@ -1,3 +1,3 @@
 module Fop
-  VERSION = "0.3.0"
+  VERSION = "0.7.0"
 end

data/lib/fop_lang.rb CHANGED

@@ -1,12 +1,22 @@
 require_relative 'fop/version'
+require_relative 'fop/compiler'
 require_relative 'fop/program'
 def Fop(src)
-  ::Fop::Program.new(src)
+  ::Fop.compile!(src)
 end
 module Fop
+  def self.compile!(src)
+    prog, errors = compile(src)
+    # TODO better exception
+    raise "Fop errors: " + errors.map(&:message).join(",") if errors
+    prog
+  end
   def self.compile(src)
-    Program.new(src)
+    instructions, errors = ::Fop::Compiler.compile(src)
+    return nil, errors if errors
+    return Program.new(instructions), nil
   end
 end

metadata CHANGED

@@ -1,26 +1,31 @@
 --- !ruby/object:Gem::Specification
 name: fop_lang
 version: !ruby/object:Gem::Version
-  version: 0.3.0
+  version: 0.7.0
 platform: ruby
 authors:
 - Jordan Hollinger
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2021-08-16 00:00:00.000000000 Z
+date: 2021-08-30 00:00:00.000000000 Z
 dependencies: []
 description: A micro expression language for Filter and OPerations on text
 email: jordan.hollinger@gmail.com
-executables: []
+executables:
+- fop
 extensions: []
 extra_rdoc_files: []
 files:
 - README.md
+- bin/fop
+- lib/fop/cli.rb
+- lib/fop/compiler.rb
 - lib/fop/nodes.rb
 - lib/fop/parser.rb
 - lib/fop/program.rb
 - lib/fop/tokenizer.rb
+- lib/fop/tokens.rb
 - lib/fop/version.rb
 - lib/fop_lang.rb
 homepage: https://jhollinger.github.io/fop-lang-rb/