sas-linter 0.1.0 → 0.2.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (11)

checksums.yaml +4 -4
data/CLAUDE.md +44 -0
data/README.md +65 -20
data/bin/sas_lint +46 -24
data/config/lint.yaml +4 -0
data/lib/sas_linter/formatter.rb +262 -0
data/lib/sas_linter/rules/encoding_issues.rb +6 -1
data/lib/sas_linter/rules/invalid_numeric_literal.rb +49 -0
data/lib/sas_linter/version.rb +1 -1
data/lib/sas_linter.rb +24 -0
metadata +5 -1

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: caf1fe6a2bd278f6a8a12ec8863698dd42223ab1b995c03b660cee841d7b46f4
-  data.tar.gz: f24e0739e60b21693f5a3a9afb926cb45d0b8ee87a826f72d20ae9ef9705c3d5
+  metadata.gz: 306561a9219d046d164095dd03b92fdd2da003c4292188e9d5d75e19eae4c3c9
+  data.tar.gz: 559eab2894b60b33f0a159df624603379a4285c61a6e49425ead209a991ae760
 SHA512:
-  metadata.gz: b6ba9e2692038475acc1db1e0481f21587dc36bc03e26f6863408f74eddeaaabc42df99bbf7763405526ce78e8c2c3ab62d4a2684d9daecdbe0590b90d781223
-  data.tar.gz: b730dcb4ac2d4d9c4082a2a92834f1968e3df6902845273adba44cb29d3ad87ae7863027488ac14528f107f6ff19228f8538b51a20650ca66de30d90ee277ab1
+  metadata.gz: 7da26fa1fbf7cf1fc00753e157c6e5ba2e9ed16b7c2470dccd7585717a35f269ee2636754b572f0ca425f16d4058fea0e212927bc45dd7efc75d504da13202df
+  data.tar.gz: 7f507e8df56ea74b355c0c471a3de0022064860c43b90e779d11a4b0f7ac1ebd20de09ea20f614e04a87c747b1bf774cb8eeda97e00045879f728f040a487e1c

data/CLAUDE.md ADDED

@@ -0,0 +1,44 @@
+# CLAUDE.md
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+## Commands
+```sh
+bundle install                                  # install dev deps
+bundle exec rake spec                           # full test suite (also `rake` default)
+bundle exec rspec spec/sas_linter_spec.rb       # single file
+bundle exec rspec spec/sas_linter_spec.rb:42    # single example by line
+bundle exec rubocop                             # lint Ruby (also `rake rubocop`)
+bin/sas_lint --list-rules                       # CLI sanity check (lists every registered rule)
+bin/sas_lint path/to/file.sas                   # run linter from working tree
+gem build sas-linter.gemspec                    # build the gem locally
+```
+Ruby ≥ 3.4 is required (see gemspec). CI matrix runs ubuntu + macOS on Ruby 3.4 and 4.0.
+## Architecture
+**Rule registry, self-registering subclasses.** `SasLinter::Rule` keeps a class-level `registry` keyed by rule id. Subclasses call `rule_id :foo` in their class body, which triggers `Rule.register(self)`. Requiring a rule file is enough to make it discoverable via `SasLinter::Rule.fetch(:foo)` and to include it in `SasLinter.new` (no rules arg → all registered rules). All rule files are required at the bottom of `lib/sas_linter.rb`; new rules must be added to that require block.
+**Two-channel token stream.** `SasLinter#tokenize` runs `SasLexer::Lexer` once and returns `[default_tokens, all_tokens]`. Default-channel excludes whitespace + comment tokens; `all_tokens` keeps everything. `Rule#check(tokens, path:, all_tokens:, source:)` receives both, plus the raw `source` string. Most rules walk default-channel only; `commented_out_guard` needs comments, source-hygiene rules (`trailing_whitespace`, `tab_expansion`, `line_endings`, `encoding_issues`, `source_headers`) work directly on `source`.
+**Config-driven instantiation.** `SasLinter.from_config(hash)` walks `config["rules"]`, skips `enabled: false` entries, and calls `klass.from_config(opts)` on the rest. The base `Rule.from_config` only forwards `autofix:` — rules with extra options (`encoding_issues`, `tab_expansion`, `variable_value_out_of_known_range`) override `from_config` to map their YAML keys onto kwargs. **Rules omitted from a config default to enabled with no options** so adding a new rule never silently disables it for existing users; to suppress, list with `enabled: false`.
+**Autofix pipeline.** `lint_with_fixes` returns `[findings, modified_source]`. After `check`, the engine runs `autofix(source)` on every rule where `rule.autofix?` (instance flag from config) AND `rule.class.supports_autofix?` (class capability) are both true. Autofixes compose by chaining each rule's output into the next rule's input. `lint_file` writes the result back **only if `modified.b != original.b`** — the `.b` byte compare prevents spurious rewrites when a rule returns a re-encoded but byte-identical string (encoding-tag-only diffs would otherwise overwrite the user's encoding). The CLI's `--no-autofix` strips `autofix: true` from the loaded config hash *before* constructing the linter, so a dry run cannot rewrite a file even if the config requests it.
+**Source encoding fallback.** SAS sources are commonly Windows-1252 or ISO-8859-1. `read_source` reads as BINARY, returns as UTF-8 if already valid, otherwise transcodes Win-1252 → UTF-8 with `invalid: :replace, undef: :replace, replace: "'"`, falling back to ISO-8859-1 on failure. The lexer requires valid UTF-8, so this transcode happens before tokenization.
+**Custom rules.** Subclass `SasLinter::Rule`, declare `rule_id`, `description`, `severity`, implement `check`. To support autofix, override `self.supports_autofix?` to return true and implement `#autofix(source)` returning the rewritten source. Use the protected `finding(line:, column:, message:, path:)` helper rather than building `Finding` structs directly so severity and rule id are filled in consistently.
+## Test fixtures
+`spec/sas_linter_spec.rb` is the integration suite. Per-rule fixtures live in `spec/fixtures/lints/<rule_id>/` as a pair: `lint.sas` (demonstrates the bug, expected to produce findings) and `clean.sas` (same shape, fixed, expected to be silent). Helpers `lint_fixture(name)` and `clean_fixture(name)` resolve those paths. When adding a rule, add a matching fixture pair — the suite's parametric tests rely on the convention. Rule-specific unit specs live under `spec/sas_linter/rules/`.
+## Release flow
+`.github/workflows/publish.yml` publishes to RubyGems on push to `main` via OIDC trusted publishing (no API key in repo). The job is idempotent: it checks for an existing `v<version>` tag on origin and treats RubyGems "already pushed" responses as success. To cut a release, bump `SasLinter::VERSION` in `lib/sas_linter/version.rb` and merge to `main`; the workflow tags `v<version>` and creates a GitHub release.
+## License note
+AGPL-3.0-or-later — chosen to match upstream `sas-lexer`. Redistribution and network-service use trigger source-disclosure obligations; standalone CLI/CI use does not. Keep this in mind before suggesting embedding the linter into a redistributed product.
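
Note on the rule registry described in CLAUDE.md above: the self-registration pattern can be sketched as follows. This is a minimal illustration, not the gem's code. `SketchRule`, `NoTabs`, and `NoTrailingSpace` are hypothetical names, and the real `SasLinter::Rule` may differ in detail.

```ruby
# Hypothetical stand-in for a base class with a class-level registry.
class SketchRule
  @registry = {}

  class << self
    attr_reader :registry

    # With an argument: record the id and register the subclass.
    # Without an argument: return the recorded id.
    def rule_id(id = nil)
      return @rule_id if id.nil?

      @rule_id = id
      SketchRule.register(self)
    end

    def register(klass)
      SketchRule.registry[klass.rule_id] = klass
    end

    def fetch(id)
      SketchRule.registry.fetch(id)
    end
  end
end

# Defining (or requiring) a subclass is enough to make it discoverable.
class NoTabs < SketchRule
  rule_id :no_tabs
end

class NoTrailingSpace < SketchRule
  rule_id :no_trailing_space
end

puts SketchRule.registry.keys.inspect
```

The registry lives on the base class only, so subclasses always write to the same hash regardless of how deep the inheritance chain is.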

data/README.md CHANGED

@@ -37,6 +37,71 @@ bin/sas_lint --no-autofix src/*.sas
 Exit codes: `0` clean, `1` findings, `2` invalid args.
+## YAML config
+Every rule with options, plus its defaults. Rules omitted from the config default to enabled with no options, so adding a new rule to the gem won't silently disable it for users with existing configs. To suppress a rule, list it with `enabled: false`.
+```yaml
+rules:
+  # ── Structural / semantic rules ─────────────────────────────────────
+  unreachable_inner_branch_value:
+    enabled: true              # default for every rule
+  identical_if_else_branches:
+    enabled: true
+  malformed_if_condition:
+    enabled: true
+  commented_out_guard:
+    enabled: true
+  choose_one_template:
+    enabled: true
+  missing_assignment_semicolon:
+    enabled: true
+    autofix: false             # rule supports autofix; off by default
+  variable_value_out_of_known_range:
+    enabled: true
+    csv_paths:                          # empty list = rule is a no-op
+      - metadata/variables.csv
+      - metadata/variables-extra.csv
+    name_column: "Variable"             # default
+    values_column: "Acceptable Values"  # default
+    name_match: case_insensitive        # case_insensitive | exact
+    delimiter: ","                      # CSV column separator: "," | ";" | "\t"
+  # ── Source-hygiene rules (all support autofix) ──────────────────────
+  trailing_whitespace:
+    enabled: true
+    autofix: false
+  tab_expansion:
+    enabled: true
+    autofix: false
+    width: 8                   # tab stop width
+  source_headers:
+    enabled: true
+    autofix: false             # rewrap **…**; 90-char header rows when true
+  line_endings:
+    enabled: true
+    autofix: false             # collapse \r\r\n → \r\n; lone \r → dominant ending
+  encoding_issues:
+    enabled: true
+    autofix: false
+    use_defaults: false        # apply built-in smart-quote / em-dash / Win-1252 map
+    replacements:              # project-specific byte→ASCII rewrites (run BEFORE defaults)
+      "—": "--"
+      "\x85": "Ö"
+```
+`enabled` and `autofix` are accepted on every rule. Options not listed above are ignored.
 ## Library usage
 ```ruby
@@ -99,26 +164,6 @@ Subclasses self-register on the rule registry via `rule_id` — once required, t
 To support autofix, override `self.supports_autofix?` to return `true` and implement `#autofix(source)` to return the rewritten source.
-## YAML config
-```yaml
-rules:
-  malformed_if_condition:
-    enabled: true              # default
-  trailing_whitespace:
-    enabled: true
-    autofix: true
-  encoding_issues:
-    enabled: true
-    use_defaults: true
-    replacements:
-      "—": "--"
-  identical_if_else_branches:
-    enabled: false             # disable a rule
-```
-Rules omitted from the config default to enabled with no options, so adding a new rule to the gem won't silently disable it for users with existing configs.
 ## Testing
 ```sh

data/bin/sas_lint CHANGED

@@ -4,7 +4,7 @@
 require "sas_linter"
 require "optparse"
-options = { config: SasLinter::DEFAULT_CONFIG_PATH, rules: nil, autofix: true }
+options = { config: SasLinter::DEFAULT_CONFIG_PATH, rules: nil, autofix: true, format: false }
 parser = OptionParser.new do |opts|
   opts.banner = "Usage: sas_lint FILE [FILE ...] [options]"
@@ -18,6 +18,12 @@ parser = OptionParser.new do |opts|
     options[:rules] = ids.map(&:to_sym)
   end
+  opts.on("--format",
+          "Reformat file(s) in place using `format:` config options and all autofix-capable " \
+          "rules. Does not report lint findings.") do
+    options[:format] = true
+  end
   opts.on("--no-autofix",
           "Suppress autofix even if the config sets `autofix: true` for some rule. " \
           "Findings are still reported but no file is rewritten.") do
@@ -45,35 +51,51 @@ if ARGV.empty?
   exit 2
 end
-linter =
-  if options[:rules]
-    SasLinter.new(rules: options[:rules])
-  else
-    config = SasLinter.load_config_file(options[:config])
-    # `--no-autofix` strips every rule's autofix flag from the loaded config
-    # before the linter is built, so a dry run can never rewrite a file.
-    if options[:autofix] == false && config.is_a?(Hash) && config["rules"].is_a?(Hash)
-      config["rules"].each_value do |opts_hash|
-        opts_hash["autofix"] = false if opts_hash.is_a?(Hash) && opts_hash["autofix"]
-      end
-    end
-    SasLinter.from_config(config)
-  end
+config = SasLinter.load_config_file(options[:config])
 exit_code = 0
-ARGV.each do |path|
-  unless File.file?(path)
-    warn "sas_lint: #{path}: not a regular file"
-    exit_code = 2
-    next
+if options[:format]
+  formatter = SasLinter::Formatter.from_config(config)
+  linter = options[:rules] ? SasLinter.new(rules: options[:rules]) : SasLinter.from_config(config)
+  ARGV.each do |path|
+    unless File.file?(path)
+      warn "sas_lint: #{path}: not a regular file"
+      exit_code = 2
+      next
+    end
+    linter.format_file(path, formatter: formatter)
   end
+else
+  linter =
+    if options[:rules]
+      SasLinter.new(rules: options[:rules])
+    else
+      # `--no-autofix` strips every rule's autofix flag from the loaded config
+      # before the linter is built, so a dry run can never rewrite a file.
+      if options[:autofix] == false && config.is_a?(Hash) && config["rules"].is_a?(Hash)
+        config["rules"].each_value do |opts_hash|
+          opts_hash["autofix"] = false if opts_hash.is_a?(Hash) && opts_hash["autofix"]
+        end
+      end
+      SasLinter.from_config(config)
+    end
-  findings = linter.lint_file(path)
-  next if findings.empty?
+  ARGV.each do |path|
+    unless File.file?(path)
+      warn "sas_lint: #{path}: not a regular file"
+      exit_code = 2
+      next
+    end
+    findings = linter.lint_file(path)
+    next if findings.empty?
-  exit_code = 1
-  findings.each { |f| puts f.to_s }
+    exit_code = 1
+    findings.each { |f| puts f.to_s }
+  end
 end
 exit exit_code

data/config/lint.yaml ADDED

@@ -0,0 +1,4 @@
+format:
+  keywords: preserve      # preserve | upper | lower
+  operator_spacing: true  # normalize spaces around binary operators and after commas
+  indent_width: 2         # indentation width; 0 or omit to disable
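
Note on `indent_width`: the comment above says `0` or an omitted key disables indentation. The sketch below mirrors that normalization as it appears in the formatter added in this release. The helper name `normalize_indent_width` is hypothetical; in the gem the logic sits inline in `Formatter.from_config`.

```ruby
# Hypothetical helper: map a raw YAML value to an indent width or nil (disabled).
def normalize_indent_width(raw)
  return nil if raw.nil? || raw == false

  width = Integer(raw) # raises ArgumentError for non-numeric strings such as "two"
  width.positive? ? width : nil
end

[2, "4", 0, nil, false].each do |raw|
  puts "#{raw.inspect} -> #{normalize_indent_width(raw).inspect}"
end
```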

data/lib/sas_linter/formatter.rb ADDED

@@ -0,0 +1,262 @@
+# frozen_string_literal: true
+class SasLinter
+  class Formatter
+    BINARY_OP_NAMES = %w[
+      ASSIGN PLUS MINUS STAR FSLASH STAR2
+      LT LE GT GE NE LTGT GTLT
+      AMP PIPE PIPE2 EXCL EXCL2 BPIPE BPIPE2 SOUNDS_LIKE
+    ].freeze
+    UNARY_CANDIDATE_NAMES = %w[PLUS MINUS].freeze
+    NO_SPACE_BEFORE_NAMES = %w[SEMI COMMA RPAREN RBRACK].freeze
+    VALUE_ENDING_NAMES    = %w[
+      IDENTIFIER INTEGER_LITERAL FLOAT_LITERAL FLOAT_EXPONENT_LITERAL
+      STRING_LITERAL NAME_LITERAL DATE_LITERAL TIME_LITERAL DATE_TIME_LITERAL
+      HEX_STRING_LITERAL BIT_TESTING_LITERAL MACRO_VAR_RESOLVE MACRO_IDENTIFIER
+      STRING_EXPR_END BIT_TESTING_LITERAL_EXPR_END DATE_LITERAL_EXPR_END
+      DATE_TIME_LITERAL_EXPR_END HEX_STRING_LITERAL_EXPR_END NAME_LITERAL_EXPR_END
+      TIME_LITERAL_EXPR_END RPAREN RBRACK
+    ].freeze
+    COMMA_NAMES     = %w[COMMA].freeze
+    DATA_PROC_NAMES = %w[KW_DATA KW_PROC].freeze
+    DO_NAMES        = %w[KW_DO].freeze
+    END_NAMES       = %w[KW_END].freeze
+    RUN_QUIT_NAMES  = %w[KW_RUN KW_QUIT].freeze
+    SEMI_NAMES      = %w[SEMI].freeze
+    def self.from_config(config)
+      config = (config || {}).transform_keys(&:to_s)
+      fmt = (config["format"] || {}).transform_keys(&:to_s)
+      keywords = fmt.fetch("keywords", "preserve").to_sym
+      unless %i[preserve upper lower].include?(keywords)
+        raise ArgumentError,
+              "format.keywords must be 'preserve', 'upper', or 'lower' (got '#{keywords}')"
+      end
+      operator_spacing = fmt.key?("operator_spacing") ? !!fmt["operator_spacing"] : false
+      raw_width = fmt["indent_width"]
+      indent_width = case raw_width
+                     when nil, false then nil
+                     else
+                       w = Integer(raw_width)
+                       w > 0 ? w : nil
+                     end
+      new(keywords: keywords, operator_spacing: operator_spacing, indent_width: indent_width)
+    end
+    def initialize(keywords: :preserve, operator_spacing: false, indent_width: nil)
+      @keywords = keywords
+      @operator_spacing = operator_spacing
+      @indent_width = indent_width
+    end
+    def format(source)
+      return source if noop?
+      lexer = SasLexer::Lexer.new
+      all_tokens = begin
+        lexer.tokenize(source)
+      ensure
+        lexer.free
+      end
+      result = reconstruct(all_tokens)
+      result = apply_indentation(result, all_tokens) if @indent_width
+      result
+    end
+    private
+    def noop?
+      @keywords == :preserve && !@operator_spacing && @indent_width.nil?
+    end
+    # --- Type sets (built lazily from sas-lexer vocabulary) ---
+    def type_set(names)
+      tt = SasLexer::Lexer::TokenType
+      names.filter_map { |n| tt.const_get(n) if tt.const_defined?(n) }.to_set
+    end
+    def keyword_types
+      @keyword_types ||= SasLexer::Lexer::TokenType.constants
+        .select { |c| c.to_s.start_with?("KW_", "KWM_") }
+        .map { |c| SasLexer::Lexer::TokenType.const_get(c) }
+        .to_set
+    end
+    def binary_op_types    = @binary_op_types    ||= type_set(BINARY_OP_NAMES)
+    def unary_cand_types   = @unary_cand_types   ||= type_set(UNARY_CANDIDATE_NAMES)
+    def no_sp_before_types = @no_sp_before_types ||= type_set(NO_SPACE_BEFORE_NAMES)
+    def value_ending_types = @value_ending_types ||= type_set(VALUE_ENDING_NAMES)
+    def comma_types        = @comma_types        ||= type_set(COMMA_NAMES)
+    def data_proc_types    = @data_proc_types    ||= type_set(DATA_PROC_NAMES)
+    def do_types           = @do_types           ||= type_set(DO_NAMES)
+    def end_types          = @end_types          ||= type_set(END_NAMES)
+    def run_quit_types     = @run_quit_types     ||= type_set(RUN_QUIT_NAMES)
+    def semi_types         = @semi_types         ||= type_set(SEMI_NAMES)
+    # --- Reconstruction (keyword casing + operator spacing) ---
+    def apply_casing(token)
+      text = token[:text]
+      return text if @keywords == :preserve
+      default_ch = SasLexer::Lexer::TokenChannel::DEFAULT
+      return text unless token[:channel] == default_ch && keyword_types.include?(token[:type])
+      @keywords == :upper ? text.upcase : text.downcase
+    end
+    # Partition all_tokens into [{gap:, tok:}] segments where gap holds the
+    # non-DEFAULT tokens preceding tok, and tok is a DEFAULT-channel token.
+    def segmentize(all_tokens)
+      default_ch = SasLexer::Lexer::TokenChannel::DEFAULT
+      segments = []
+      gap = []
+      all_tokens.each do |t|
+        if t[:channel] == default_ch
+          segments << { gap: gap, tok: t }
+          gap = []
+        else
+          gap << t
+        end
+      end
+      segments << { gap: gap, tok: nil } unless gap.empty?
+      segments
+    end
+    def reconstruct(all_tokens)
+      segments = segmentize(all_tokens)
+      result = +""
+      segments.each_with_index do |seg, idx|
+        prev_prev = idx > 1 ? segments[idx - 2][:tok] : nil
+        prev      = idx > 0 ? segments[idx - 1][:tok] : nil
+        cur       = seg[:tok]
+        gap_text  = seg[:gap].map { |t| t[:text] }.join
+        if @operator_spacing && prev && cur && !gap_text.include?("\n")
+          desired = gap_desired(prev_prev, prev, cur)
+          result << (desired.nil? ? gap_text : desired)
+        else
+          result << gap_text
+        end
+        result << apply_casing(cur) if cur
+      end
+      result
+    end
+    # Returns the desired whitespace between two same-line DEFAULT tokens:
+    #   nil  → leave unchanged
+    #   ""   → no space
+    #   " "  → exactly one space
+    #
+    # Three consecutive DEFAULT tokens are provided so unary PLUS/MINUS can be
+    # distinguished from binary: a PLUS/MINUS is binary only when its preceding
+    # token is a value-ending type (identifier, literal, or closing bracket).
+    def gap_desired(prev_prev_tok, prev_tok, next_tok)
+      pt  = prev_tok[:type]
+      nt  = next_tok[:type]
+      ppt = prev_prev_tok&.[](:type)
+      return ""  if no_sp_before_types.include?(nt)
+      return " " if comma_types.include?(pt)
+      # Space after binary operator — but not after a unary +/-
+      if binary_op_types.include?(pt)
+        if unary_cand_types.include?(pt) && (ppt.nil? || !value_ending_types.include?(ppt))
+          return nil
+        end
+        return " "
+      end
+      # Space before binary operator — but not before a unary +/-
+      if binary_op_types.include?(nt)
+        return nil if unary_cand_types.include?(nt) && !value_ending_types.include?(pt)
+        return " "
+      end
+      nil
+    end
+    # --- Indentation ---
+    # Walk all_tokens and assign an indent level to each source line.
+    # Only the FIRST token on a given line determines its level (||= semantics).
+    # Nesting rules:
+    #   DATA / PROC → level 0; content after their SEMI → level 1
+    #   DO          → content inside indented one further level
+    #   END         → decrements level before assigning the END line's level
+    #   RUN / QUIT  → resets to level 0
+    def compute_line_levels(all_tokens)
+      default_ch = SasLexer::Lexer::TokenChannel::DEFAULT
+      hidden_ch  = SasLexer::Lexer::TokenChannel::HIDDEN
+      levels = {}
+      level = 0
+      after_data_proc = false
+      all_tokens.each do |tok|
+        next if tok[:channel] == hidden_ch
+        line = tok[:start_line]
+        type = tok[:type]
+        unless tok[:channel] == default_ch
+          levels[line] ||= level  # comment tokens — indent at current level
+          next
+        end
+        if data_proc_types.include?(type)
+          level = 0
+          after_data_proc = true
+          levels[line] ||= level
+        elsif run_quit_types.include?(type)
+          level = 0
+          after_data_proc = false
+          levels[line] ||= level
+        elsif do_types.include?(type)
+          levels[line] ||= level
+          level += 1
+        elsif end_types.include?(type)
+          level = [level - 1, 0].max
+          levels[line] ||= level
+        elsif semi_types.include?(type) && after_data_proc
+          # Semicolon ends the DATA/PROC header — step body starts at level 1.
+          # Don't re-assign the SEMI's line (already marked at level 0 by DATA/PROC).
+          after_data_proc = false
+          level = 1
+        else
+          levels[line] ||= level
+        end
+      end
+      levels
+    end
+    # Re-indent each line of source using the computed per-line levels.
+    # Lines with no token coverage (blank lines) are left unchanged.
+    def apply_indentation(source, all_tokens)
+      line_levels = compute_line_levels(all_tokens)
+      source.each_line.with_index.map do |line, idx|
+        lineno     = idx + 1
+        line_level = line_levels[lineno]
+        next line if line_level.nil?
+        eol     = line.match(/\r?\n\z/)&.[](0) || ""
+        body    = line[0...(line.length - eol.length)]
+        stripped = body.lstrip
+        next eol if stripped.empty?  # blank line — strip stray whitespace
+        (" " * (@indent_width * line_level)) + stripped + eol
+      end.join
+    end
+  end
+end
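
Note on `gap_desired`: the unary-versus-binary decision for `+` and `-` can be exercised without the lexer. The sketch below is a simplified reimplementation that uses plain symbols in place of `SasLexer` token types and much smaller type sets; those sets are assumptions for illustration only.

```ruby
require "set"

# Reduced, hypothetical type sets (the gem builds these from sas-lexer constants).
VALUE_ENDING     = Set[:IDENTIFIER, :INTEGER_LITERAL, :RPAREN]
UNARY_CANDIDATES = Set[:PLUS, :MINUS]
BINARY_OPS       = Set[:ASSIGN, :PLUS, :MINUS, :STAR]
NO_SPACE_BEFORE  = Set[:SEMI, :COMMA, :RPAREN]

# Returns "" (no space), " " (one space), or nil (leave the gap unchanged).
def gap_desired(prev_prev, prev, cur)
  return ""  if NO_SPACE_BEFORE.include?(cur)
  return " " if prev == :COMMA

  if BINARY_OPS.include?(prev)
    # A +/- not preceded by a value is unary: leave the gap after it alone.
    unary = UNARY_CANDIDATES.include?(prev) &&
            (prev_prev.nil? || !VALUE_ENDING.include?(prev_prev))
    return unary ? nil : " "
  end

  if BINARY_OPS.include?(cur)
    # A +/- that does not follow a value is unary: leave the gap before it alone.
    unary = UNARY_CANDIDATES.include?(cur) && !VALUE_ENDING.include?(prev)
    return unary ? nil : " "
  end

  nil
end

# `x = -1;` : the MINUS follows ASSIGN, so it is treated as unary.
unary_gap  = gap_desired(:ASSIGN, :MINUS, :INTEGER_LITERAL)
# `a - b`   : the MINUS follows an identifier, so it is treated as binary.
binary_gap = gap_desired(:IDENTIFIER, :MINUS, :IDENTIFIER)

puts "after unary minus:  #{unary_gap.inspect}"
puts "after binary minus: #{binary_gap.inspect}"
```

Returning `nil` rather than `""` for the unary case keeps whatever spacing the author wrote, which matches the "leave unchanged" contract in the comment above `gap_desired`.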

data/lib/sas_linter/rules/encoding_issues.rb CHANGED

@@ -286,8 +286,13 @@ class SasLinter
         findings
       end
+      # `seq` is ASCII-8BIT (from `pack("C*")`); `encode("UTF-8")` on a
+      # binary string replaces every byte ≥ 0x80 with U+FFFD before
+      # `codepoints` runs. The bytes are already a valid UTF-8 sequence
+      # by construction (caller checked `utf8_sequence_length`), so
+      # reinterpret rather than transcode.
       def codepoint(seq)
-        seq.encode("UTF-8", invalid: :replace, undef: :replace).codepoints.first
+        seq.dup.force_encoding("UTF-8").codepoints.first
       end
       # Returns the length (1-4) of a valid UTF-8 sequence starting
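
Note on the `codepoint` fix: the difference between transcoding and reinterpreting a binary string can be reproduced in isolation. The snippet below is a standalone illustration using the two bytes of U+00E9; the variable names are not from the gem.

```ruby
# Two bytes that form a valid UTF-8 sequence for U+00E9, tagged ASCII-8BIT by pack.
seq = [0xC3, 0xA9].pack("C*")

# Old behaviour: transcoding from binary replaces each high byte with U+FFFD.
transcoded = seq.encode("UTF-8", invalid: :replace, undef: :replace)

# New behaviour: keep the bytes, change only the encoding tag.
reinterpreted = seq.dup.force_encoding("UTF-8")

puts format("encode:         U+%04X", transcoded.codepoints.first)
puts format("force_encoding: U+%04X", reinterpreted.codepoints.first)
```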

data/lib/sas_linter/rules/invalid_numeric_literal.rb ADDED

@@ -0,0 +1,49 @@
+# frozen_string_literal: true
+require_relative "../../sas_linter"
+require "sas_lexer"
+class SasLinter
+  module Rules
+    # Flag tokens the lexer typed as INTEGER_LITERAL whose text isn't
+    # actually a valid SAS numeric literal — typically a digit-prefixed
+    # identifier the lexer munched into one token, e.g.
+    #
+    #     if K9d = 1f then do;     * `1f` is not valid SAS
+    #     x = 1d2;                  * `D` exponent is Fortran, not SAS
+    #
+    # Real SAS only accepts two integer-shaped literals:
+    #
+    #   * plain decimal:  `[0-9]+`
+    #   * hex literal:    `[0-9][0-9A-Fa-f]*[xX]` (trailing `x`/`X`)
+    #
+    # Float and exponent forms (`1.0`, `1e3`) get their own token types
+    # (FLOAT_LITERAL / FLOAT_EXPONENT_LITERAL) and aren't checked here.
+    class InvalidNumericLiteral < Rule
+      rule_id :invalid_numeric_literal
+      description "INTEGER_LITERAL must be a plain decimal or a SAS hex " \
+                  "literal (`0FFx`-style); reject suffixes like `1f` or " \
+                  "`1d2` that the lexer accepts but SAS does not."
+      severity :warning
+      TT = SasLexer::Lexer::TokenType
+      VALID = /\A(?:[0-9]+|[0-9][0-9A-Fa-f]*[xX])\z/
+      MESSAGE_SUFFIX = "is not a valid SAS numeric literal — SAS has no " \
+                       "`f`/`F`/`L` numeric suffixes and uses `E` (not `D`) " \
+                       "for exponents; hex literals must end in `x`/`X`."
+      def check(tokens, path:, all_tokens: nil, source: nil) # rubocop:disable Lint/UnusedMethodArgument
+        tokens.filter_map do |t|
+          next unless t[:type] == TT::INTEGER_LITERAL && !VALID.match?(t[:text])
+          finding(
+            line: t[:start_line], column: t[:start_column] + 1,
+            message: "`#{t[:text]}` #{MESSAGE_SUFFIX}", path: path
+          )
+        end
+      end
+    end
+  end
+end
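
Note on the `VALID` pattern: its behaviour on a few literal shapes can be checked directly. The regex below is copied from the rule; the constant name `VALID_INTEGER_SHAPE` and the sample list are illustrative.

```ruby
# Same pattern as the rule's VALID constant: plain decimal, or hex with trailing x/X.
VALID_INTEGER_SHAPE = /\A(?:[0-9]+|[0-9][0-9A-Fa-f]*[xX])\z/

samples = %w[42 0FFx 1AX 1f 1d2]
verdicts = samples.to_h { |text| [text, VALID_INTEGER_SHAPE.match?(text)] }

verdicts.each do |text, ok|
  puts "#{text.ljust(5)} #{ok ? 'accepted' : 'flagged'}"
end
```

Only the first three samples match: `1f` and `1d2` contain hex digits but lack the trailing `x`, so neither alternative of the pattern applies.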

data/lib/sas_linter/version.rb CHANGED

@@ -1,5 +1,5 @@
 # frozen_string_literal: true
 class SasLinter
-  VERSION = "0.1.0"
+  VERSION = "0.2.1"
 end

data/lib/sas_linter.rb CHANGED

@@ -241,6 +241,28 @@ class SasLinter
     findings
   end
+  # Apply formatting to a file in-place. Runs the formatter's own
+  # transformations first, then the autofix pipeline for rules that have
+  # `autofix: true` in config — identical to lint_with_fixes except that the
+  # formatter pass runs first. Rules that haven't been opted in to autofix
+  # (e.g. missing_assignment_semicolon without explicit `autofix: true`) are
+  # left alone so that --format stays cosmetic by default.
+  #
+  # Returns true if the file was rewritten, false if nothing changed.
+  def format_file(path, formatter:)
+    original = read_source(path)
+    modified = formatter.format(original)
+    @rules.each do |rule|
+      next unless rule.autofix? && rule.class.supports_autofix?
+      modified = rule.autofix(modified)
+    end
+    return false if modified.b == original.b
+    File.write(path, modified)
+    true
+  end
   private
   def read_source(path)
@@ -273,6 +295,7 @@ class SasLinter
   end
 end
+require_relative "sas_linter/formatter"
 require_relative "sas_linter/rules/unreachable_inner_branch_value"
 require_relative "sas_linter/rules/identical_if_else_branches"
 require_relative "sas_linter/rules/commented_out_guard"
@@ -285,3 +308,4 @@ require_relative "sas_linter/rules/encoding_issues"
 require_relative "sas_linter/rules/malformed_if_condition"
 require_relative "sas_linter/rules/missing_assignment_semicolon"
 require_relative "sas_linter/rules/variable_value_out_of_known_range"
+require_relative "sas_linter/rules/invalid_numeric_literal"
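
Note on the `modified.b == original.b` guard in `format_file`: comparing binary copies ignores the encoding tag, whereas `String#==` does not when the content is non-ASCII. A standalone illustration (variable names are not from the gem):

```ruby
utf8   = "caf\u00E9"                            # tagged UTF-8
latin1 = utf8.dup.force_encoding("ISO-8859-1")  # identical bytes, different tag

same_string = (utf8 == latin1)      # encoding-aware comparison
same_bytes  = (utf8.b == latin1.b)  # byte-for-byte comparison

puts "String#==:    #{same_string}"
puts "byte compare: #{same_bytes}"
```

With a plain `==`, a string whose only difference is its encoding tag would count as modified and trigger a rewrite; the byte compare treats it as unchanged.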

metadata CHANGED

@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: sas-linter
 version: !ruby/object:Gem::Version
-  version: 0.1.0
+  version: 0.2.1
 platform: ruby
 authors:
 - Craig McNamara
@@ -52,15 +52,19 @@ executables:
 extensions: []
 extra_rdoc_files: []
 files:
+- CLAUDE.md
 - LICENSE
 - README.md
 - Rakefile
 - bin/sas_lint
+- config/lint.yaml
 - lib/sas_linter.rb
+- lib/sas_linter/formatter.rb
 - lib/sas_linter/rules/choose_one_template.rb
 - lib/sas_linter/rules/commented_out_guard.rb
 - lib/sas_linter/rules/encoding_issues.rb
 - lib/sas_linter/rules/identical_if_else_branches.rb
+- lib/sas_linter/rules/invalid_numeric_literal.rb
 - lib/sas_linter/rules/line_endings.rb
 - lib/sas_linter/rules/malformed_if_condition.rb
 - lib/sas_linter/rules/missing_assignment_semicolon.rb