RubyGems: sas-linter 0.2.1 → 0.2.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (6)

checksums.yaml +4 -4
data/README.md +6 -1
data/lib/sas_linter/rules/inconsistent_variable_case.rb +161 -0
data/lib/sas_linter/version.rb +1 -1
data/lib/sas_linter.rb +1 -0
metadata +2 -1

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 306561a9219d046d164095dd03b92fdd2da003c4292188e9d5d75e19eae4c3c9
-  data.tar.gz: 559eab2894b60b33f0a159df624603379a4285c61a6e49425ead209a991ae760
+  metadata.gz: 144b602b9b4eff14c301d9c04852f07f490e9557802f897eb4dd08acbf3d3fa4
+  data.tar.gz: 4aa7e953e1ed8a05cd1f4b94cf698086438a529604019120be0bcc0bb0530be3
 SHA512:
-  metadata.gz: 7da26fa1fbf7cf1fc00753e157c6e5ba2e9ed16b7c2470dccd7585717a35f269ee2636754b572f0ca425f16d4058fea0e212927bc45dd7efc75d504da13202df
-  data.tar.gz: 7f507e8df56ea74b355c0c471a3de0022064860c43b90e779d11a4b0f7ac1ebd20de09ea20f614e04a87c747b1bf774cb8eeda97e00045879f728f040a487e1c
+  metadata.gz: 74e7303ebdcfcfc616cd6e520f8984494d687b3c0c3af6dcba4efba91f7cbfefd61f511bfa535ae4ee9c4f6a481d4774a96b7a4ff20803ce2768782f2b39f5e6
+  data.tar.gz: 7a1dab42267ac076c9d992d7770871c386d52d6f4c6c2cc088372151b6f586e1ff12b963dcab26a47aa0b427396eef989c9f837805a4183e35c43a734b8fbd41

data/README.md CHANGED

@@ -1,6 +1,6 @@
 # sas-linter
-A configurable lint engine for SAS source files. Built on the [`sas-lexer`](https://github.com/mes-amis/sas-lexer-rb) gem (a Ruby FFI binding to Misha Perlov's Rust [`sas-lexer`](https://github.com/mishamsk/sas-lexer)) and ships with eleven pluggable rules covering structural defects, cosmetic issues, and source-header conventions.
+A configurable lint engine for SAS source files. Built on the [`sas-lexer`](https://github.com/mes-amis/sas-lexer-rb) gem (a Ruby FFI binding to Misha Perlov's Rust [`sas-lexer`](https://github.com/mishamsk/sas-lexer)) and ships with thirteen pluggable rules covering structural defects, cosmetic issues, and source-header conventions.
 ## Installation
@@ -63,6 +63,10 @@ rules:
     enabled: true
     autofix: false             # rule supports autofix; off by default
+  inconsistent_variable_case:
+    enabled: true
+    autofix: false             # rewrite every minority casing to the most-common form
   variable_value_out_of_known_range:
     enabled: true
     csv_paths:                          # empty list = rule is a no-op
@@ -135,6 +139,7 @@ findings = linter.lint_file("path/to/source.sas")
 | `malformed_if_condition` | Empty conditions, missing operators, orphan `then`, unbalanced parens, etc. |
 | `missing_assignment_semicolon` | Assignment statements followed by an inline `**` comment but no terminating `;`. |
 | `variable_value_out_of_known_range` | `if VAR = N` / `if VAR in (...)` literals fall outside the variable's documented acceptable values. Loads the catalog from one or more CSVs with configurable column names and column separator (`,`, `;`, tab). |
+| `inconsistent_variable_case` | Identifier appears with more than one casing in the same file (`myVar` vs `MyVar`). SAS treats both as the same variable; autofix rewrites every minority spelling to the most-common form. Skips proc-format definitions and `format.` / `lib.member` references. |
 `bin/sas_lint --list-rules` prints the same set with autofix capability.

data/lib/sas_linter/rules/inconsistent_variable_case.rb ADDED

@@ -0,0 +1,161 @@
+# frozen_string_literal: true
+require_relative "../../sas_linter"
+require "sas_lexer"
+class SasLinter
+  module Rules
+    # Flag identifiers that are spelled with inconsistent letter case
+    # across the file. SAS resolves variable references case-insensitively,
+    # so `myVar` and `MyVar` end up bound to the same column — but mixing
+    # the two within one program is sloppy and makes the source harder to
+    # grep, diff, and read.
+    #
+    # The most-used spelling wins; every other casing is reported (and
+    # rewritten when autofix is on). Ties resolve to the first occurrence
+    # so the canonical form is reading-order deterministic.
+    #
+    # Skipped on purpose:
+    #   * identifiers immediately followed by `.` (format references like
+    #     `agecat.`, library references like `work.foo`);
+    #   * identifiers immediately preceded by `.` (the column half of
+    #     `lib.member` / `dataset.col`) — those name a column in another
+    #     dataset, not a variable in the current step;
+    #   * `value` / `invalue` / `picture` themselves and the format name
+    #     directly following them — these are proc-format definitions,
+    #     not variable references. We match locally rather than tracking
+    #     a `proc format ... run;` block because real-world SAS files
+    #     meant to be `%include`d into a caller's data step often omit
+    #     the terminating `run;`, so a state machine would never close.
+    class InconsistentVariableCase < Rule
+      rule_id :inconsistent_variable_case
+      description "Variable identifiers must use one consistent letter case " \
+                  "across the file; mixing `myVar` and `MyVar` is sloppy."
+      severity :warning
+      TT = SasLexer::Lexer::TokenType
+      # Identifiers that introduce a format / informat / picture
+      # definition in a `proc format` step. The lexer types these as
+      # plain IDENTIFIERs (not keywords), so we recognize them by text.
+      FORMAT_DEF_KEYWORDS = %w[value invalue picture].freeze
+      def self.supports_autofix?
+        true
+      end
+      def check(tokens, path:, all_tokens: nil, source: nil) # rubocop:disable Lint/UnusedMethodArgument
+        findings = []
+        each_inconsistent_use(tokens) do |token, canonical|
+          findings << finding(
+            line: token[:start_line],
+            column: token[:start_column] + 1,
+            message: "variable `#{token[:text]}` is spelled `#{canonical}` " \
+                     "elsewhere in this file — pick one case and stick with it.",
+            path: path
+          )
+        end
+        findings
+      end
+      def autofix(source)
+        return source if source.nil? || source.empty?
+        # If a previous rule's autofix returned ASCII-8BIT (e.g.
+        # EncodingIssues#autofix walks bytes and returns binary), tag
+        # it UTF-8 before slicing. The lexer treats the bytes as UTF-8
+        # and reports character offsets either way; only Ruby's
+        # `String#[]=` cares about the encoding label, and it indexes
+        # by bytes for ASCII-8BIT but by characters for UTF-8 — so a
+        # binary tag plus any multi-byte sequence earlier in the file
+        # would shift every replacement by the byte/char gap.
+        src = source.encoding == Encoding::UTF_8 ? source : source.dup.force_encoding("UTF-8")
+        lexer = SasLexer::Lexer.new
+        begin
+          all_tokens = lexer.tokenize(src)
+        ensure
+          lexer.free
+        end
+        tokens = all_tokens.reject do |t|
+          t[:channel] == SasLexer::Lexer::TokenChannel::HIDDEN ||
+            t[:channel] == SasLexer::Lexer::TokenChannel::COMMENT
+        end
+        edits = []
+        each_inconsistent_use(tokens) do |token, canonical|
+          edits << [token[:start], token[:end], canonical]
+        end
+        # Apply right-to-left so earlier offsets stay valid.
+        out = src.dup
+        edits.sort_by! { |start, _, _| -start }
+        edits.each { |start, finish, repl| out[start...finish] = repl }
+        out
+      end
+      private
+      # Yields `[token, canonical_form]` for every identifier whose
+      # spelling differs from the file-wide canonical case.
+      def each_inconsistent_use(tokens)
+        groups = collect_variable_uses(tokens)
+        groups.each_value do |uses|
+          forms = uses.map { |t| t[:text] }.tally
+          next if forms.size <= 1
+          canonical = canonical_form(forms, uses)
+          uses.each do |t|
+            yield t, canonical unless t[:text] == canonical
+          end
+        end
+      end
+      # Walk default-channel tokens and bucket eligible IDENTIFIER
+      # uses by lowercase name. Format-related identifiers (see class
+      # docstring) are filtered out by `variable_use?`.
+      def collect_variable_uses(tokens)
+        groups = Hash.new { |h, k| h[k] = [] }
+        tokens.each_with_index do |t, i|
+          next unless t[:type] == TT::IDENTIFIER && variable_use?(tokens, i)
+          groups[t[:text].downcase] << t
+        end
+        groups
+      end
+      # Reject `format.` / `lib.member` shapes via byte-adjacency to a
+      # `.` token, and `value <fmt-name>` shapes by checking the
+      # neighboring identifier. The lexer emits the dot separately, so
+      # we use `prev.end == t.start` / `t.end == nxt.start` to tell a
+      # truly-adjacent dot from one that just happens to follow after
+      # whitespace.
+      def variable_use?(tokens, i)
+        t = tokens[i]
+        nxt = tokens[i + 1]
+        prev = i.positive? ? tokens[i - 1] : nil
+        return false if nxt && nxt[:type] == TT::DOT && nxt[:start] == t[:end]
+        return false if prev && prev[:type] == TT::DOT && prev[:end] == t[:start]
+        return false if FORMAT_DEF_KEYWORDS.include?(t[:text].downcase)
+        return false if prev && prev[:type] == TT::IDENTIFIER &&
+                        FORMAT_DEF_KEYWORDS.include?(prev[:text].downcase)
+        true
+      end
+      # Most-used spelling wins; ties go to the first occurrence so the
+      # canonical form matches reading order and stays deterministic
+      # across runs.
+      def canonical_form(forms, uses)
+        max_count = forms.values.max
+        winners = forms.select { |_, c| c == max_count }.keys
+        return winners.first if winners.size == 1
+        uses.each { |t| return t[:text] if winners.include?(t[:text]) }
+        winners.first
+      end
+    end
+  end
+end
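
The core of the new rule can be illustrated outside the gem. What follows is a minimal sketch, not the gem's implementation: `Token`, `scan_identifiers`, `normalize_case`, and `slice_by_label` are hypothetical names introduced here, and a naive regex stands in for `SasLexer`, so none of the rule's skip logic (dot-adjacent identifiers, `value` / `invalue` / `picture`) is modelled. It shows three ideas from the file above: the most-used spelling wins with ties going to the first occurrence, edits are applied right-to-left, and the string's encoding label decides whether offsets index bytes or characters. It assumes Ruby 2.7 or later for `Enumerable#tally`.

```ruby
# frozen_string_literal: true

# Hypothetical stand-in for a lexer token: text plus character offsets
# (end is exclusive). The real rule reads these from SasLexer tokens.
Token = Struct.new(:text, :start, :end, keyword_init: true)

# Naive identifier scan, for illustration only (assumption: every
# word-like run is a variable; the real rule filters far more carefully).
def scan_identifiers(source)
  tokens = []
  source.scan(/[A-Za-z_][A-Za-z0-9_]*/) do
    m = Regexp.last_match
    tokens << Token.new(text: m[0], start: m.begin(0), end: m.end(0))
  end
  tokens
end

# Most-used spelling wins. `tally` builds its hash in first-seen order,
# so the first entry with the maximum count is also the earliest
# occurrence among tied spellings.
def canonical_form(uses)
  counts = uses.map(&:text).tally
  max = counts.values.max
  counts.find { |_, count| count == max }.first
end

# Rewrite every minority casing to the canonical one.
def normalize_case(source)
  edits = []
  scan_identifiers(source).group_by { |t| t.text.downcase }.each_value do |uses|
    canonical = canonical_form(uses)
    uses.each do |t|
      edits << [t.start, t.end, canonical] unless t.text == canonical
    end
  end

  # Apply from the highest offset down so earlier offsets stay valid
  # even if a replacement were to change the string's length.
  out = source.dup
  edits.sort_by { |start, _, _| -start }.each do |start, finish, repl|
    out[start...finish] = repl
  end
  out
end

# Same bytes, different encoding label: ranges index characters on a
# UTF-8 string but bytes on an ASCII-8BIT one.
def slice_by_label(text)
  binary = text.b
  [text[2...3], binary[2...3], binary.dup.force_encoding("UTF-8")[2...3]]
end
```

Under these assumptions, `normalize_case("myVar = 1; MyVar = 2; myVar = 3;")` returns `"myVar = 1; myVar = 2; myVar = 3;"`, and the tied input `"Age = 1; age = 2;"` becomes `"Age = 1; Age = 2;"` because `Age` appears first. `slice_by_label("\u00e9 x")` returns `["x", " ", "x"]`: the two-byte `é` shifts the binary slice by one position, which is the byte/character gap the `autofix` comment describes.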

data/lib/sas_linter/version.rb CHANGED

@@ -1,5 +1,5 @@
 # frozen_string_literal: true
 class SasLinter
-  VERSION = "0.2.1"
+  VERSION = "0.2.3"
 end

data/lib/sas_linter.rb CHANGED

@@ -309,3 +309,4 @@ require_relative "sas_linter/rules/malformed_if_condition"
 require_relative "sas_linter/rules/missing_assignment_semicolon"
 require_relative "sas_linter/rules/variable_value_out_of_known_range"
 require_relative "sas_linter/rules/invalid_numeric_literal"
+require_relative "sas_linter/rules/inconsistent_variable_case"

metadata CHANGED

@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: sas-linter
 version: !ruby/object:Gem::Version
-  version: 0.2.1
+  version: 0.2.3
 platform: ruby
 authors:
 - Craig McNamara
@@ -64,6 +64,7 @@ files:
 - lib/sas_linter/rules/commented_out_guard.rb
 - lib/sas_linter/rules/encoding_issues.rb
 - lib/sas_linter/rules/identical_if_else_branches.rb
+- lib/sas_linter/rules/inconsistent_variable_case.rb
 - lib/sas_linter/rules/invalid_numeric_literal.rb
 - lib/sas_linter/rules/line_endings.rb
 - lib/sas_linter/rules/malformed_if_condition.rb