RubyGems - yara-normalize - Versions diffs - 0.4.0 → 1.0.0 - Mend

yara-normalize 0.4.0 → 1.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (4)

checksums.yaml +4 -4
data/README.rdoc +69 -35
data/lib/yara-normalize/yara-normalize.rb +153 -108
metadata +6 -3

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 3a345a64cbb92b8600dbed3abe6c92219d60a0d27bd04a49bc60f9e141023369
-  data.tar.gz: a7ee233ae22e1260789397a71ad1926f60d65e86714662b2b1e5a65e2ca27cb0
+  metadata.gz: 13f41905bf9f1f9e8d8f30146578908c1752bffb95f8058cd849f985655104d0
+  data.tar.gz: f03ec512ead274a1e8990fe2d6448c2f62361e3a5729db0f0c7f24ea10dc69d9
 SHA512:
-  metadata.gz: a8cb5ab7710807545d146ac7c8d16928296b74dffce0be7963ea815247c8ab3671290c8b236d015ded9dd287eb3b008c8ca4c626279f32440c12b80716b4c9fd
-  data.tar.gz: b82a45e72917d4132aa0e6edaa2fac91b162660f8deb88140a05356efbf42ff8e531c786dd674ec31c96231317bef5d9f30c0bd4f1abed7143851b698bf167f9
+  metadata.gz: 485d6e7e454a7ea967b11d9065f31403a7e8b7a08b9b218d674d4007f07cbe2ec702258794acf021b75b4a35627dcbe6dc4eda2d91b17f37ee392f235fdc5005
+  data.tar.gz: 2379f78b6e1b4d283a4e0f209e76a18597da2b8a8e0b4e596c85e86ad4e59a2c69c748f72e857c86186ebaa460a0e91fb3d58e7308d1308bcd7898042caccb14

data/README.rdoc CHANGED

@@ -1,35 +1,34 @@
 = yara-normalize
-Normalizes Yara Signatures into a repeatable hash even when non-transforming changes are made}
-To enable consistent comparisons between yara rules (signature), a uniform hashing standard was needed.
+Normalizes YARA signatures into a repeatable, stable hash even when
+non-semantic changes are made (whitespace, comments, tag ordering, variable
+renaming, etc.).
-This modules takes just the strings from the strings section, sorts them, then generate a sha1 hash.
-Then, in the conditions section, reorder the boolean expression to make groups first and then replace all variables
-with $a $b $c, etc.  Then hash the result of this.
+To enable consistent comparisons between YARA rules, a uniform fingerprinting
+standard is applied:
-Then, the signature ID is the concatenation of the truncated md5 sum of the sorted strings and the truncated md5 sum of the normalized conditions. E.g., yn01:488085c947cb22ed:d936fceffe.
+1. *Strings section* — each string value (the part after the '=') is extracted,
+   sorted alphabetically, and the sorted list is hashed with SHA-256.  Variable
+   names ($a, $mshtmlExec_1, …) are excluded from the hash so that renaming
+   does not change the fingerprint.
-== Usage
+2. *Condition section* — variable references ($name, #name) are replaced with
+   positional tokens ($0, $1, …) in order of first appearance, so cosmetic
+   renames do not affect the hash.  The resulting text is hashed with SHA-256.
+The rule fingerprint is:
+  yn<VERSION>:<last-16-hex-chars-of-strings-SHA256>:<last-10-hex-chars-of-condition-SHA256>
-See test cases.
+Prior to version 0.4.0 the fingerprint used MD5 and carried the prefix +yn01+.
+Since 0.4.0 the fingerprint uses SHA-256 and carries the prefix +yn02+.  The
+two identifier series are not interchangeable.
+== Usage
   require 'yara-normalize'
-  sig =<<EOS
-  rule DataConversion__wide : IntegerParsing DataConversion {
-     meta:
-      weight =1
-          strings:
-      $="wtoi" nocase
-          $ ="wtol" nocase
-      $= "wtof" nocase
-       $   =   "wtodb" nocase
-  condition:
-      any of them
-  }
-  EOS
-  yn = YaraTools::YaraRule.new(sig)
-  puts yn.hash # => yn01:488085c947cb22ed:d936fceffe
-  puts yn.normalize # =>
+  sig = <<~EOS
     rule DataConversion__wide : IntegerParsing DataConversion {
       meta:
         weight = 1
@@ -41,22 +40,57 @@ See test cases.
       condition:
         any of them
     }
-  puts yn.name # => DataConversion__wide
-  pp yn.tags # => ["IntegerParsing","DataConversion"]
+  EOS
+  yn = YaraTools::YaraRule.new(sig)
+  puts yn.hash
+  # => yn02:6783b7082bed88dc:6821e3f6a3
+  puts yn.name    # => DataConversion__wide
+  pp   yn.tags    # => ["IntegerParsing", "DataConversion"]
+  pp   yn.meta    # => {"weight"=>"1"}
+  pp   yn.strings # => ["$ = \"wtoi\" nocase", ...]
+  puts yn.normalize
+  # => rule DataConversion__wide : IntegerParsing DataConversion {
+  #      meta:
+  #        weight = 1
+  #      strings:
+  #        $ = "wtoi" nocase
+  #        $ = "wtol" nocase
+  #        $ = "wtof" nocase
+  #        $ = "wtodb" nocase
+  #      condition:
+  #        any of them
+  #    }
+Splitting a multi-rule file:
+  rules = YaraTools::Splitter.split(File.read("ruleset.yar"))
+  rules.each { |r| puts "#{r.name}: #{r.hash}" }
+== Security notes
+* Fingerprints use SHA-256 (as of yn02).  MD5-based yn01 hashes should be
+  considered legacy and re-computed.
+* +YaraRule#hash+ overrides Ruby's +Object#hash+.  Do *not* use +YaraRule+
+  objects as Hash keys; the method returns a String fingerprint, not the
+  Integer that Ruby's Hash tables require.
 == Contributing to yara-normalize
-* Check out the latest master to make sure the feature hasn't been implemented or the bug hasn't been fixed yet.
-* Check out the issue tracker to make sure someone already hasn't requested it and/or contributed it.
+* Check out the latest master to make sure the feature hasn't been implemented
+  or the bug hasn't been fixed yet.
+* Check out the issue tracker to make sure someone already hasn't requested it
+  and/or contributed it.
 * Fork the project.
 * Start a feature/bugfix branch.
 * Commit and push until you are happy with your contribution.
-* Make sure to add tests for it. This is important so I don't break it in a future version unintentionally.
-* Please try not to mess with the Rakefile, version, or history. If you want to have your own version, or is otherwise necessary, that is fine, but please isolate to its own commit so I can cherry-pick around it.
+* Make sure to add tests for it. This is important so I don't break it in a
+  future version unintentionally.
+* Please try not to mess with the Rakefile, version, or history.
 == Copyright
-Copyright (c) 2012 chrislee35. See LICENSE.txt for
-further details.
+Copyright (c) 2012 chrislee35. See LICENSE.txt for further details.
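
The fingerprint scheme described in the new README can be sketched in isolation. This is a minimal illustration rather than the gem's own code: `sketch_fingerprint` is a hypothetical helper, and it assumes the string values and condition lines have already been extracted from the rule text. Joining the pieces with '%' before hashing follows the comments in the library diff; the default version "02" mirrors the new VERSION constant.

```ruby
require 'digest'

# Hypothetical helper (not part of yara-normalize): builds a yn-style
# fingerprint from already-extracted string values and condition lines.
def sketch_fingerprint(string_values, condition_lines, version: "02")
  table = {}
  # Replace $name / #name with positional tokens in order of first appearance.
  normalized_condition = condition_lines.map do |line|
    line.gsub(/[\$\#]\w+/) do |ref|
      name = ref[1..-1]
      table[name] ||= table.size.to_s
      ref[0] + table[name]
    end
  end
  # Sort the string values so declaration order does not matter.
  strings_digest   = Digest::SHA256.hexdigest(string_values.sort.join("%"))
  condition_digest = Digest::SHA256.hexdigest(normalized_condition.join("%"))
  "yn#{version}:#{strings_digest[-16, 16]}:#{condition_digest[-10, 10]}"
end

a = sketch_fingerprint(['"wtoi" nocase', '"wtol" nocase'], ['$a and #b > 2'])
b = sketch_fingerprint(['"wtol" nocase', '"wtoi" nocase'], ['$x and #y > 2'])
puts a == b
```

Under these assumptions, reordering the strings or renaming the variables leaves the fingerprint unchanged, while a condition that refers to a different set of variables yields a different one.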

data/lib/yara-normalize/yara-normalize.rb CHANGED

@@ -1,110 +1,155 @@
-require 'digest/md5'
-require 'pp'
+require 'digest'
 module YaraTools
-	VERSION = "01"
-	class YaraRule
-		attr_reader :original, :name, :tags, :meta, :strings, :condition, :normalized_strings
-		def initialize(ruletext)
-			ruletext = ruletext.gsub(/[\r\n]+/,"\n").gsub(/^\s*\/\/.*$/,'')
-			@original = ruletext
-			@lookup_table = {}
-			@next_replacement = 0
-			if ruletext =~ /rule\s+([\w\-]+)(\s*:\s*(\w[\w\s]+\w))?\s*\{\s*(meta:\s*(.*?))?strings:\s*(.*?)\s*condition:\s*(.*?)\s*\}/m
-                name,_,tags,_,meta,strings,condition = $~.captures
-				@name = name
-				@tags = tags.strip.split(/[,\s]+/) if tags
-				@meta = {}
-                if meta
-                    meta.split(/\n/).each do |m|
-                        k,v = m.strip.split(/\s*=\s*/,2)
-                        if v
-                            @meta[k] = v
-                        end
-                    end
-                end
-				@normalized_strings = []
-				@strings = strings.split(/\n/).map do |s|
-					# strip off the spaces from the edges and then replace the first = with ' = '.
-					s = s.strip
-					if s[/\s*=\s*/,0]
-						s[/\s*=\s*/,0] = " = "
-					end
-					if s =~ /= \{([0-9a-fA-F\s]+)\}/
-						# normalize the hex string
-						hexstr = $1.gsub(/\s+/,'').downcase.scan(/../).join(" ")
-						s = s.gsub(/= \{([0-9a-fA-F\s]+)\}/, "= { #{hexstr} }")
-					end
-					_, val = s.split(/ = /,2)
-					if val
-						@normalized_strings << val
-					else
-						@normalized_strings << s
-					end
-					s
-				end
-				@normalized_strings.sort!
-				@condition = condition.split(/\n/).map{|x| x.strip}
-				@normalized_condition = @condition.map{|x| _normalize_condition(x)}
-			end
-		end
-		def _normalize_condition(condition)
-			condition.gsub(/[\$\#]\w+/) do |x|
-				key = x[1,1000]
-				if not @lookup_table[key]
-					@lookup_table[key] = @next_replacement.to_s
-					@next_replacement += 1
-				end
-				x[0].chr+@lookup_table[key]
-			end
-		end
-		def normalize
-			text = "rule #{@name} "
-			if @tags and @tags.length > 0
-				text += ": #{@tags.join(' ')} "
-			end
-			text += "{\n"
-			if @meta and @meta.length > 0
-				text += "  meta:\n"
-				@meta.each do |k,v|
-					text += "    #{k} = #{v}\n"
-				end
-			end
-			if @strings and @strings.length > 0
-				text += "  strings:\n"
-				@strings.each do |s|
-					if s =~ /\w/
-						text += "    #{s}\n"
-					end
-				end
-			end
-			if @condition and @condition.length > 0
-				text += "  condition:\n"
-				@condition.each do |c|
-					if c =~ /\w/
-						text += "    #{c}\n"
-					end
-				end
-			end
-			text + "}"
-		end
-		def hash
-			normalized_strings = @normalized_strings.join("%")
-			normalized_condition = @normalized_condition.join("%")
-			strings_hash = Digest::MD5.hexdigest(normalized_strings)
-			condition_hash = Digest::MD5.hexdigest(normalized_condition)
-			"yn#{VERSION}:#{strings_hash[-16,16]}:#{condition_hash[-10,10]}"
-		end
-	end
-	class Splitter
-		def Splitter.split(ruleset)
-			ruleset.gsub(/[\r\n]+/,"\n").gsub(/^\s*\/\/.*$/,'').scan(/(rule\s+([\w\-]+)(\s*:\s*(\w[\w\s]+\w))?\s*\{\s*(meta:\s*(.*?))?strings:\s*(.*?)\s*condition:\s*(.*?)\s*\})/m).map do |rule|
-				YaraRule.new(rule[0])
-			end
-		end
-	end
+  # Hash format version embedded in every yn-hash identifier.
+  # Increment when the normalization algorithm changes so consumers can
+  # detect that two hashes are not directly comparable (e.g. yn01 vs yn02).
+  VERSION = "02"
+  class YaraRule
+    attr_reader :original, :name, :tags, :meta, :strings, :condition, :normalized_strings
+    def initialize(ruletext)
+      # Normalize line endings and strip single-line (//) comments before
+      # any further parsing so they never appear in meta/strings/condition.
+      ruletext = ruletext.gsub(/[\r\n]+/, "\n").gsub(/^\s*\/\/.*$/, '')
+      @original = ruletext
+      # Lookup table used by _normalize_condition to replace variable names
+      # ($foo, #foo) with stable positional tokens ($0, $1, …) so that
+      # cosmetic renames do not affect the normalized condition hash.
+      @lookup_table = {}
+      @next_replacement = 0
+      # Single-pass regex parse.  The rule grammar is:
+      #   rule <name> [: <tags>] { [meta: …] strings: … condition: … }
+      # The .*? quantifiers are non-greedy so they stop at the first matching
+      # delimiter keyword rather than consuming the whole file.
+      rule_re = /rule\s+([\w\-]+)(\s*:\s*(\w[\w\s]+\w))?\s*\{\s*(meta:\s*(.*?))?strings:\s*(.*?)\s*condition:\s*(.*?)\s*\}/m
+      if ruletext =~ rule_re
+        name, _, tags, _, meta, strings, condition = $~.captures
+        @name = name
+        # Tags are optional; split on whitespace/commas when present.
+        @tags = tags.strip.split(/[,\s]+/) if tags
+        # Parse the meta section into a key/value Hash.  Each line has the
+        # form: key = value (value may contain spaces and quotes).
+        @meta = {}
+        if meta
+          meta.split(/\n/).each do |m|
+            k, v = m.strip.split(/\s*=\s*/, 2)
+            @meta[k] = v if v
+          end
+        end
+        # Parse the strings section, normalizing whitespace around '=' and
+        # canonicalizing any hex byte strings (e.g. { 4D 5A } → { 4d 5a }).
+        @normalized_strings = []
+        @strings = strings.split(/\n/).map do |s|
+          s = s.strip
+          # Collapse any amount of whitespace around '=' to a single ' = '.
+          s[/\s*=\s*/, 0] = " = " if s[/\s*=\s*/, 0]
+          # Hex byte strings: normalise spacing and case so that
+          # { 4D5A } and { 4d 5a } produce the same output.
+          if s =~ /= \{([0-9a-fA-F\s]+)\}/
+            hexstr = $1.gsub(/\s+/, '').downcase.scan(/../).join(" ")
+            s = s.gsub(/= \{([0-9a-fA-F\s]+)\}/, "= { #{hexstr} }")
+          end
+          # Collect only the value portion (right of ' = ') for hashing,
+          # so that variable renames ($a → $b) do not change the hash.
+          _, val = s.split(/ = /, 2)
+          @normalized_strings << (val || s)
+          s
+        end
+        @normalized_strings.sort!
+        @condition = condition.split(/\n/).map(&:strip)
+        @normalized_condition = @condition.map { |x| _normalize_condition(x) }
+      end
+    end
+    # Replace named variable references in a condition line with positional
+    # tokens so that renaming $mshtmlExec_1 → $a does not change the hash.
+    # Both count (#) and match ($) sigils are preserved.
+    # NOTE: This method is intentionally prefixed with _ to signal that it is
+    # an internal implementation detail; do not call it from outside this class.
+    def _normalize_condition(condition)
+      condition.gsub(/[\$\#]\w+/) do |x|
+        key = x[1, 1000]
+        @lookup_table[key] ||= begin
+          val = @next_replacement.to_s
+          @next_replacement += 1
+          val
+        end
+        x[0].chr + @lookup_table[key]
+      end
+    end
+    # Return a canonical, human-readable rendering of the rule with
+    # consistent indentation and ordering.  Tags, meta, strings, and
+    # condition are preserved in their original order.
+    def normalize
+      text = "rule #{@name} "
+      text += ": #{@tags.join(' ')} " if @tags && !@tags.empty?
+      text += "{\n"
+      if @meta && !@meta.empty?
+        text += "  meta:\n"
+        @meta.each { |k, v| text += "    #{k} = #{v}\n" }
+      end
+      if @strings && !@strings.empty?
+        text += "  strings:\n"
+        @strings.each { |s| text += "    #{s}\n" if s =~ /\w/ }
+      end
+      if @condition && !@condition.empty?
+        text += "  condition:\n"
+        @condition.each { |c| text += "    #{c}\n" if c =~ /\w/ }
+      end
+      text + "}"
+    end
+    # Return a stable identifier for this rule in the form:
+    #   yn<VERSION>:<strings_fingerprint>:<condition_fingerprint>
+    #
+    # The strings fingerprint is the last 16 hex chars of the SHA-256 digest
+    # of the sorted, normalised string values joined by '%'.
+    # The condition fingerprint is the last 10 hex chars of the SHA-256 digest
+    # of the normalised condition lines joined by '%'.
+    #
+    # Using SHA-256 (replacing the previous MD5) gives 256-bit collision
+    # resistance and avoids MD5's well-known preimage and collision weaknesses.
+    #
+    # SECURITY NOTE: This method is named `hash` to match the public API, but
+    # it overrides Ruby's built-in Object#hash, which is expected to return an
+    # Integer for use as a Hash table key.  Do NOT use YaraRule objects as Hash
+    # keys; use .hash (this method) only for YARA rule fingerprinting.
+    def hash
+      normalized_strings   = @normalized_strings.join("%")
+      normalized_condition = @normalized_condition.join("%")
+      strings_digest   = Digest::SHA256.hexdigest(normalized_strings)
+      condition_digest = Digest::SHA256.hexdigest(normalized_condition)
+      "yn#{VERSION}:#{strings_digest[-16, 16]}:#{condition_digest[-10, 10]}"
+    end
+  end
+  # Splits a multi-rule YARA file into individual YaraRule objects.
+  class Splitter
+    # Parse a string containing one or more YARA rules and return an Array of
+    # YaraRule instances, one per rule found in +ruleset+.
+    def self.split(ruleset)
+      # Strip line endings and single-line comments before scanning so that
+      # comment text cannot interfere with the rule boundary regex.
+      clean = ruleset.gsub(/[\r\n]+/, "\n").gsub(/^\s*\/\/.*$/, '')
+      rule_re = /(rule\s+([\w\-]+)(\s*:\s*(\w[\w\s]+\w))?\s*\{\s*(meta:\s*(.*?))?strings:\s*(.*?)\s*condition:\s*(.*?)\s*\})/m
+      clean.scan(rule_re).map { |rule| YaraRule.new(rule[0]) }
+    end
+  end
 end
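
The per-line string normalization in the new `initialize` can be tried on its own. The sketch below mirrors the two regular expressions from the diff in a hypothetical standalone function, `normalize_string_line`, which is not part of the gem. One thing it makes visible: the hex pattern only accepts hex digits and whitespace, so a hex string containing wildcards such as `??` is left as written.

```ruby
# Hypothetical standalone version of the string-line normalization
# shown in the diff; it does not touch the gem's classes.
def normalize_string_line(line)
  s = line.strip
  # Collapse whitespace around the first '=' to a single ' = '.
  s = s.sub(/\s*=\s*/, " = ")
  # Canonicalize plain hex byte strings: lowercase, one space between bytes.
  s.sub(/= \{([0-9a-fA-F\s]+)\}/) do
    hex = $1.gsub(/\s+/, '').downcase.scan(/../).join(" ")
    "= { #{hex} }"
  end
end

puts normalize_string_line('  $mz={4D5A 90}')     # prints: $mz = { 4d 5a 90 }
puts normalize_string_line('$ ="wtol" nocase')    # prints: $ = "wtol" nocase
puts normalize_string_line('$h = { 4D ?? 5A }')   # prints: $h = { 4D ?? 5A }
```

Because only the part to the right of ' = ' is collected for hashing, two lines that differ only in spacing, hex case, or variable name contribute the same value to the strings fingerprint.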

metadata CHANGED

@@ -1,13 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: yara-normalize
 version: !ruby/object:Gem::Version
-  version: 0.4.0
+  version: 1.0.0
 platform: ruby
 authors:
 - Chris Lee
+autorequire:
 bindir: bin
 cert_chain: []
-date: 1980-01-02 00:00:00.000000000 Z
+date: 2026-04-25 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: test-unit
@@ -110,6 +111,7 @@ homepage: https://github.com/chrislee35/yara-normalize
 licenses:
 - MIT
 metadata: {}
+post_install_message:
 rdoc_options: []
 require_paths:
 - lib
@@ -124,7 +126,8 @@ required_rubygems_version: !ruby/object:Gem::Requirement
     - !ruby/object:Gem::Version
       version: '0'
 requirements: []
-rubygems_version: 3.7.2
+rubygems_version: 3.4.20
+signing_key:
 specification_version: 4
 summary: Normalizes Yara signatures into a repeatable hash even when non-transforming
   changes are made.