sas-linter 0.1.0 → 0.2.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: caf1fe6a2bd278f6a8a12ec8863698dd42223ab1b995c03b660cee841d7b46f4
4
- data.tar.gz: f24e0739e60b21693f5a3a9afb926cb45d0b8ee87a826f72d20ae9ef9705c3d5
3
+ metadata.gz: 306561a9219d046d164095dd03b92fdd2da003c4292188e9d5d75e19eae4c3c9
4
+ data.tar.gz: 559eab2894b60b33f0a159df624603379a4285c61a6e49425ead209a991ae760
5
5
  SHA512:
6
- metadata.gz: b6ba9e2692038475acc1db1e0481f21587dc36bc03e26f6863408f74eddeaaabc42df99bbf7763405526ce78e8c2c3ab62d4a2684d9daecdbe0590b90d781223
7
- data.tar.gz: b730dcb4ac2d4d9c4082a2a92834f1968e3df6902845273adba44cb29d3ad87ae7863027488ac14528f107f6ff19228f8538b51a20650ca66de30d90ee277ab1
6
+ metadata.gz: 7da26fa1fbf7cf1fc00753e157c6e5ba2e9ed16b7c2470dccd7585717a35f269ee2636754b572f0ca425f16d4058fea0e212927bc45dd7efc75d504da13202df
7
+ data.tar.gz: 7f507e8df56ea74b355c0c471a3de0022064860c43b90e779d11a4b0f7ac1ebd20de09ea20f614e04a87c747b1bf774cb8eeda97e00045879f728f040a487e1c
data/CLAUDE.md ADDED
@@ -0,0 +1,44 @@
1
+ # CLAUDE.md
2
+
3
+ This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
4
+
5
+ ## Commands
6
+
7
+ ```sh
8
+ bundle install # install dev deps
9
+ bundle exec rake spec # full test suite (also `rake` default)
10
+ bundle exec rspec spec/sas_linter_spec.rb # single file
11
+ bundle exec rspec spec/sas_linter_spec.rb:42 # single example by line
12
+ bundle exec rubocop # lint Ruby (also `rake rubocop`)
13
+ bin/sas_lint --list-rules # CLI sanity check (lists every registered rule)
14
+ bin/sas_lint path/to/file.sas # run linter from working tree
15
+ gem build sas-linter.gemspec # build the gem locally
16
+ ```
17
+
18
+ Ruby ≥ 3.4 is required (see gemspec). CI matrix runs ubuntu + macOS on Ruby 3.4 and 4.0.
19
+
20
+ ## Architecture
21
+
22
+ **Rule registry, self-registering subclasses.** `SasLinter::Rule` keeps a class-level `registry` keyed by rule id. Subclasses call `rule_id :foo` in their class body, which triggers `Rule.register(self)`. Requiring a rule file is enough to make it discoverable via `SasLinter::Rule.fetch(:foo)` and to include it in `SasLinter.new` (no rules arg → all registered rules). All rule files are required at the bottom of `lib/sas_linter.rb`; new rules must be added to that require block.
23
+
24
+ **Two-channel token stream.** `SasLinter#tokenize` runs `SasLexer::Lexer` once and returns `[default_tokens, all_tokens]`. The default channel excludes whitespace and comment tokens; `all_tokens` keeps everything. `Rule#check(tokens, path:, all_tokens:, source:)` receives both, plus the raw `source` string. Most rules walk the default channel only; `commented_out_guard` needs comments, and the source-hygiene rules (`trailing_whitespace`, `tab_expansion`, `line_endings`, `encoding_issues`, `source_headers`) work directly on `source`.
25
+
26
+ **Config-driven instantiation.** `SasLinter.from_config(hash)` walks `config["rules"]`, skips `enabled: false` entries, and calls `klass.from_config(opts)` on the rest. The base `Rule.from_config` only forwards `autofix:` — rules with extra options (`encoding_issues`, `tab_expansion`, `variable_value_out_of_known_range`) override `from_config` to map their YAML keys onto kwargs. **Rules omitted from a config default to enabled with no options** so adding a new rule never silently disables it for existing users; to suppress, list with `enabled: false`.
27
+
28
+ **Autofix pipeline.** `lint_with_fixes` returns `[findings, modified_source]`. After `check`, the engine runs `autofix(source)` on every rule where `rule.autofix?` (instance flag from config) AND `rule.class.supports_autofix?` (class capability) are both true. Autofixes compose by chaining each rule's output into the next rule's input. `lint_file` writes the result back **only if `modified.b != original.b`** — the `.b` byte compare prevents spurious rewrites when a rule returns a re-encoded but byte-identical string (encoding-tag-only diffs would otherwise overwrite the user's encoding). The CLI's `--no-autofix` strips `autofix: true` from the loaded config hash *before* constructing the linter, so a dry run cannot rewrite a file even if the config requests it.
29
+
30
+ **Source encoding fallback.** SAS sources are commonly Windows-1252 or ISO-8859-1. `read_source` reads as BINARY, returns as UTF-8 if already valid, otherwise transcodes Win-1252 → UTF-8 with `invalid: :replace, undef: :replace, replace: "'"`, falling back to ISO-8859-1 on failure. The lexer requires valid UTF-8, so this transcode happens before tokenization.
31
+
32
+ **Custom rules.** Subclass `SasLinter::Rule`, declare `rule_id`, `description`, and `severity`, and implement `check`. To support autofix, override `self.supports_autofix?` to return true and implement `#autofix(source)` returning the rewritten source. Use the protected `finding(line:, column:, message:, path:)` helper rather than building `Finding` structs directly, so severity and rule id are filled in consistently.
33
+
34
+ ## Test fixtures
35
+
36
+ `spec/sas_linter_spec.rb` is the integration suite. Per-rule fixtures live in `spec/fixtures/lints/<rule_id>/` as a pair: `lint.sas` (demonstrates the bug, expected to produce findings) and `clean.sas` (same shape, fixed, expected to be silent). Helpers `lint_fixture(name)` and `clean_fixture(name)` resolve those paths. When adding a rule, add a matching fixture pair — the suite's parametric tests rely on the convention. Rule-specific unit specs live under `spec/sas_linter/rules/`.
37
+
38
+ ## Release flow
39
+
40
+ `.github/workflows/publish.yml` publishes to RubyGems on push to `main` via OIDC trusted publishing (no API key in repo). The job is idempotent: it checks for an existing `v<version>` tag on origin and treats RubyGems "already pushed" responses as success. To cut a release, bump `SasLinter::VERSION` in `lib/sas_linter/version.rb` and merge to `main`; the workflow tags `v<version>` and creates a GitHub release.
41
+
42
+ ## License note
43
+
44
+ AGPL-3.0-or-later — chosen to match upstream `sas-lexer`. Redistribution and network-service use trigger source-disclosure obligations; standalone CLI/CI use does not. Keep this in mind before suggesting embedding the linter into a redistributed product.
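The source-encoding fallback described in CLAUDE.md can be sketched in a few lines of Ruby. This is a minimal sketch, not the gem's `read_source`: the helper name `decode_sas_source` is hypothetical, and it takes raw bytes rather than a path so it can be exercised without files (a caller could pass `File.binread(path)`). With the `:replace` options the Windows-1252 step should not raise, so the ISO-8859-1 branch is only a safety net.

```ruby
# Hypothetical helper; the gem performs these steps inside its private read_source.
def decode_sas_source(raw)
  bytes = raw.b # binary copy: inspect the bytes, ignore the incoming encoding tag

  # 1. Already valid UTF-8: relabel the bytes, no conversion needed.
  utf8 = bytes.dup.force_encoding(Encoding::UTF_8)
  return utf8 if utf8.valid_encoding?

  # 2. Otherwise assume Windows-1252, replacing anything unmappable with "'".
  begin
    bytes.dup.force_encoding(Encoding::Windows_1252)
         .encode(Encoding::UTF_8, invalid: :replace, undef: :replace, replace: "'")
  rescue EncodingError
    # 3. Last resort: ISO-8859-1 defines every byte value.
    bytes.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
  end
end
```

Because the lexer requires valid UTF-8, the result is always tagged UTF-8, whichever branch produced it.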
data/README.md CHANGED
@@ -37,6 +37,71 @@ bin/sas_lint --no-autofix src/*.sas
37
37
 
38
38
  Exit codes: `0` clean, `1` findings, `2` invalid args.
39
39
 
40
+ ## YAML config
41
+
42
+ Every rule and its options, with defaults noted in the comments. Rules omitted from the config default to enabled with no options, so adding a new rule to the gem won't silently disable it for users with existing configs. To suppress a rule, list it with `enabled: false`.
43
+
44
+ ```yaml
45
+ rules:
46
+ # ── Structural / semantic rules ─────────────────────────────────────
47
+ unreachable_inner_branch_value:
48
+ enabled: true # default for every rule
49
+
50
+ identical_if_else_branches:
51
+ enabled: true
52
+
53
+ malformed_if_condition:
54
+ enabled: true
55
+
56
+ commented_out_guard:
57
+ enabled: true
58
+
59
+ choose_one_template:
60
+ enabled: true
61
+
62
+ missing_assignment_semicolon:
63
+ enabled: true
64
+ autofix: false # rule supports autofix; off by default
65
+
66
+ variable_value_out_of_known_range:
67
+ enabled: true
68
+ csv_paths: # empty list = rule is a no-op
69
+ - metadata/variables.csv
70
+ - metadata/variables-extra.csv
71
+ name_column: "Variable" # default
72
+ values_column: "Acceptable Values" # default
73
+ name_match: case_insensitive # case_insensitive | exact
74
+ delimiter: "," # CSV column separator: "," | ";" | "\t"
75
+
76
+ # ── Source-hygiene rules (all support autofix) ──────────────────────
77
+ trailing_whitespace:
78
+ enabled: true
79
+ autofix: false
80
+
81
+ tab_expansion:
82
+ enabled: true
83
+ autofix: false
84
+ width: 8 # tab stop width
85
+
86
+ source_headers:
87
+ enabled: true
88
+ autofix: false # rewrap **…**; 90-char header rows when true
89
+
90
+ line_endings:
91
+ enabled: true
92
+ autofix: false # collapse \r\r\n → \r\n; lone \r → dominant ending
93
+
94
+ encoding_issues:
95
+ enabled: true
96
+ autofix: false
97
+ use_defaults: false # apply built-in smart-quote / em-dash / Win-1252 map
98
+ replacements: # project-specific byte→ASCII rewrites (run BEFORE defaults)
99
+ "—": "--"
100
+ "\x85": "Ö"
101
+ ```
102
+
103
+ `enabled` and `autofix` are accepted on every rule. Options not listed above are ignored.
104
+
40
105
  ## Library usage
41
106
 
42
107
  ```ruby
@@ -99,26 +164,6 @@ Subclasses self-register on the rule registry via `rule_id` — once required, t
99
164
 
100
165
  To support autofix, override `self.supports_autofix?` to return `true` and implement `#autofix(source)` to return the rewritten source.
101
166
 
102
- ## YAML config
103
-
104
- ```yaml
105
- rules:
106
- malformed_if_condition:
107
- enabled: true # default
108
- trailing_whitespace:
109
- enabled: true
110
- autofix: true
111
- encoding_issues:
112
- enabled: true
113
- use_defaults: true
114
- replacements:
115
- "—": "--"
116
- identical_if_else_branches:
117
- enabled: false # disable a rule
118
- ```
119
-
120
- Rules omitted from the config default to enabled with no options, so adding a new rule to the gem won't silently disable it for users with existing configs.
121
-
122
167
  ## Testing
123
168
 
124
169
  ```sh
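README and CLAUDE.md both state that rules omitted from the config stay enabled and that only `enabled: false` suppresses one. A minimal sketch of that selection step, assuming a string-keyed YAML hash and a list of registered rule ids; `resolve_rules` is a hypothetical name, and the real `SasLinter.from_config` goes on to call `klass.from_config(opts)` rather than returning a hash:

```ruby
# Hypothetical sketch: which rules run, and with which options.
# registered_ids: every id in the rule registry; config: parsed YAML (string keys).
def resolve_rules(registered_ids, config)
  rules_cfg = (config || {})["rules"] || {}

  registered_ids.each_with_object({}) do |id, selected|
    opts = rules_cfg[id.to_s] || {}  # omitted rule => enabled, no options
    next if opts["enabled"] == false # explicit opt-out

    # Everything except `enabled` is left for the rule's own from_config.
    selected[id] = opts.reject { |key, _| key == "enabled" }
  end
end
```

Iterating the registry rather than the config is what lets a newly added rule run for users whose configs predate it.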
data/bin/sas_lint CHANGED
@@ -4,7 +4,7 @@
4
4
  require "sas_linter"
5
5
  require "optparse"
6
6
 
7
- options = { config: SasLinter::DEFAULT_CONFIG_PATH, rules: nil, autofix: true }
7
+ options = { config: SasLinter::DEFAULT_CONFIG_PATH, rules: nil, autofix: true, format: false }
8
8
  parser = OptionParser.new do |opts|
9
9
  opts.banner = "Usage: sas_lint FILE [FILE ...] [options]"
10
10
 
@@ -18,6 +18,12 @@ parser = OptionParser.new do |opts|
18
18
  options[:rules] = ids.map(&:to_sym)
19
19
  end
20
20
 
21
+ opts.on("--format",
22
+ "Reformat file(s) in place using `format:` config options and all autofix-capable " \
23
+ "rules. Does not report lint findings.") do
24
+ options[:format] = true
25
+ end
26
+
21
27
  opts.on("--no-autofix",
22
28
  "Suppress autofix even if the config sets `autofix: true` for some rule. " \
23
29
  "Findings are still reported but no file is rewritten.") do
@@ -45,35 +51,51 @@ if ARGV.empty?
45
51
  exit 2
46
52
  end
47
53
 
48
- linter =
49
- if options[:rules]
50
- SasLinter.new(rules: options[:rules])
51
- else
52
- config = SasLinter.load_config_file(options[:config])
53
- # `--no-autofix` strips every rule's autofix flag from the loaded config
54
- # before the linter is built, so a dry run can never rewrite a file.
55
- if options[:autofix] == false && config.is_a?(Hash) && config["rules"].is_a?(Hash)
56
- config["rules"].each_value do |opts_hash|
57
- opts_hash["autofix"] = false if opts_hash.is_a?(Hash) && opts_hash["autofix"]
58
- end
59
- end
60
- SasLinter.from_config(config)
61
- end
54
+ config = SasLinter.load_config_file(options[:config])
62
55
 
63
56
  exit_code = 0
64
57
 
65
- ARGV.each do |path|
66
- unless File.file?(path)
67
- warn "sas_lint: #{path}: not a regular file"
68
- exit_code = 2
69
- next
58
+ if options[:format]
59
+ formatter = SasLinter::Formatter.from_config(config)
60
+ linter = options[:rules] ? SasLinter.new(rules: options[:rules]) : SasLinter.from_config(config)
61
+
62
+ ARGV.each do |path|
63
+ unless File.file?(path)
64
+ warn "sas_lint: #{path}: not a regular file"
65
+ exit_code = 2
66
+ next
67
+ end
68
+
69
+ linter.format_file(path, formatter: formatter)
70
70
  end
71
+ else
72
+ linter =
73
+ if options[:rules]
74
+ SasLinter.new(rules: options[:rules])
75
+ else
76
+ # `--no-autofix` strips every rule's autofix flag from the loaded config
77
+ # before the linter is built, so a dry run can never rewrite a file.
78
+ if options[:autofix] == false && config.is_a?(Hash) && config["rules"].is_a?(Hash)
79
+ config["rules"].each_value do |opts_hash|
80
+ opts_hash["autofix"] = false if opts_hash.is_a?(Hash) && opts_hash["autofix"]
81
+ end
82
+ end
83
+ SasLinter.from_config(config)
84
+ end
71
85
 
72
- findings = linter.lint_file(path)
73
- next if findings.empty?
86
+ ARGV.each do |path|
87
+ unless File.file?(path)
88
+ warn "sas_lint: #{path}: not a regular file"
89
+ exit_code = 2
90
+ next
91
+ end
92
+
93
+ findings = linter.lint_file(path)
94
+ next if findings.empty?
74
95
 
75
- exit_code = 1
76
- findings.each { |f| puts f.to_s }
96
+ exit_code = 1
97
+ findings.each { |f| puts f.to_s }
98
+ end
77
99
  end
78
100
 
79
101
  exit exit_code
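The `--no-autofix` branch in `bin/sas_lint` edits the loaded config hash in place. If the same hash were ever reused, a non-mutating variant would keep callers independent. A sketch under that assumption (`strip_autofix` is a hypothetical name); it mirrors the CLI's checks for a Hash-valued `rules` key and only touches entries whose `autofix` is truthy:

```ruby
# Hypothetical non-mutating variant of the CLI's --no-autofix step.
def strip_autofix(config)
  return config unless config.is_a?(Hash) && config["rules"].is_a?(Hash)

  rules = config["rules"].transform_values do |opts|
    # Same condition as the CLI: only Hash entries with a truthy autofix.
    (opts.is_a?(Hash) && opts["autofix"]) ? opts.merge("autofix" => false) : opts
  end
  config.merge("rules" => rules) # new outer hash; the input is not modified
end
```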
data/config/lint.yaml ADDED
@@ -0,0 +1,4 @@
1
+ format:
2
+ keywords: preserve # preserve | upper | lower
3
+ operator_spacing: true # normalize spaces around binary operators and after commas
4
+ indent_width: 2 # indentation width; 0 or omit to disable
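The `operator_spacing` option is implemented by `Formatter#gap_desired`, which looks at three consecutive default-channel tokens so it can tell unary `+`/`-` from binary. The sketch below reproduces that decision table with plain symbols standing in for sas-lexer token types; the type names, the `[gap, type, text]` token shape, and `respace` are hypothetical. As in `reconstruct`, a gap containing a newline is never rewritten.

```ruby
# Simplified stand-ins for the formatter's token-type sets (hypothetical symbols).
BINARY_OPS = %i[assign plus minus star fslash].freeze
UNARY_CANDIDATES = %i[plus minus].freeze
VALUE_ENDING = %i[identifier integer rparen].freeze
NO_SPACE_BEFORE = %i[semi comma rparen].freeze

# nil => keep the original gap, "" => no space, " " => exactly one space.
def gap_desired(prev_prev, prev, nxt)
  return "" if NO_SPACE_BEFORE.include?(nxt)
  return " " if prev == :comma

  if BINARY_OPS.include?(prev)
    # A +/- that does not follow a value is unary: leave the gap after it alone.
    return nil if UNARY_CANDIDATES.include?(prev) && !VALUE_ENDING.include?(prev_prev)

    return " "
  end

  if BINARY_OPS.include?(nxt)
    return nil if UNARY_CANDIDATES.include?(nxt) && !VALUE_ENDING.include?(prev)

    return " "
  end

  nil
end

# tokens: [[gap_before, type, text], ...] for default-channel tokens.
def respace(tokens)
  out = +""
  tokens.each_with_index do |(gap, type, text), i|
    if i.zero? || gap.include?("\n")
      out << gap # first token, or a line break: never touched
    else
      prev_prev = i > 1 ? tokens[i - 2][1] : nil
      desired = gap_desired(prev_prev, tokens[i - 1][1], type)
      out << (desired.nil? ? gap : desired)
    end
    out << text
  end
  out
end

tokens = [["", :identifier, "x"], ["", :assign, "="], ["", :minus, "-"], ["", :integer, "1"],
          ["", :plus, "+"], ["", :identifier, "y"], ["", :semi, ";"]]
puts respace(tokens) # prints "x = -1 + y;"
```

The `-` after `=` keeps its tight binding to `1` because its predecessor is an operator, not a value, while the `+` after `1` is treated as binary.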
@@ -0,0 +1,262 @@
1
+ # frozen_string_literal: true
2
+
3
+ class SasLinter
4
+ class Formatter
5
+ BINARY_OP_NAMES = %w[
6
+ ASSIGN PLUS MINUS STAR FSLASH STAR2
7
+ LT LE GT GE NE LTGT GTLT
8
+ AMP PIPE PIPE2 EXCL EXCL2 BPIPE BPIPE2 SOUNDS_LIKE
9
+ ].freeze
10
+ UNARY_CANDIDATE_NAMES = %w[PLUS MINUS].freeze
11
+ NO_SPACE_BEFORE_NAMES = %w[SEMI COMMA RPAREN RBRACK].freeze
12
+ VALUE_ENDING_NAMES = %w[
13
+ IDENTIFIER INTEGER_LITERAL FLOAT_LITERAL FLOAT_EXPONENT_LITERAL
14
+ STRING_LITERAL NAME_LITERAL DATE_LITERAL TIME_LITERAL DATE_TIME_LITERAL
15
+ HEX_STRING_LITERAL BIT_TESTING_LITERAL MACRO_VAR_RESOLVE MACRO_IDENTIFIER
16
+ STRING_EXPR_END BIT_TESTING_LITERAL_EXPR_END DATE_LITERAL_EXPR_END
17
+ DATE_TIME_LITERAL_EXPR_END HEX_STRING_LITERAL_EXPR_END NAME_LITERAL_EXPR_END
18
+ TIME_LITERAL_EXPR_END RPAREN RBRACK
19
+ ].freeze
20
+ COMMA_NAMES = %w[COMMA].freeze
21
+ DATA_PROC_NAMES = %w[KW_DATA KW_PROC].freeze
22
+ DO_NAMES = %w[KW_DO].freeze
23
+ END_NAMES = %w[KW_END].freeze
24
+ RUN_QUIT_NAMES = %w[KW_RUN KW_QUIT].freeze
25
+ SEMI_NAMES = %w[SEMI].freeze
26
+
27
+ def self.from_config(config)
28
+ config = (config || {}).transform_keys(&:to_s)
29
+ fmt = (config["format"] || {}).transform_keys(&:to_s)
30
+
31
+ keywords = fmt.fetch("keywords", "preserve").to_sym
32
+ unless %i[preserve upper lower].include?(keywords)
33
+ raise ArgumentError,
34
+ "format.keywords must be 'preserve', 'upper', or 'lower' (got '#{keywords}')"
35
+ end
36
+
37
+ operator_spacing = fmt.key?("operator_spacing") ? !!fmt["operator_spacing"] : false
38
+
39
+ raw_width = fmt["indent_width"]
40
+ indent_width = case raw_width
41
+ when nil, false then nil
42
+ else
43
+ w = Integer(raw_width)
44
+ w > 0 ? w : nil
45
+ end
46
+
47
+ new(keywords: keywords, operator_spacing: operator_spacing, indent_width: indent_width)
48
+ end
49
+
50
+ def initialize(keywords: :preserve, operator_spacing: false, indent_width: nil)
51
+ @keywords = keywords
52
+ @operator_spacing = operator_spacing
53
+ @indent_width = indent_width
54
+ end
55
+
56
+ def format(source)
57
+ return source if noop?
58
+
59
+ lexer = SasLexer::Lexer.new
60
+ all_tokens = begin
61
+ lexer.tokenize(source)
62
+ ensure
63
+ lexer.free
64
+ end
65
+
66
+ result = reconstruct(all_tokens)
67
+ result = apply_indentation(result, all_tokens) if @indent_width
68
+ result
69
+ end
70
+
71
+ private
72
+
73
+ def noop?
74
+ @keywords == :preserve && !@operator_spacing && @indent_width.nil?
75
+ end
76
+
77
+ # --- Type sets (built lazily from sas-lexer vocabulary) ---
78
+
79
+ def type_set(names)
80
+ tt = SasLexer::Lexer::TokenType
81
+ names.filter_map { |n| tt.const_get(n) if tt.const_defined?(n) }.to_set
82
+ end
83
+
84
+ def keyword_types
85
+ @keyword_types ||= SasLexer::Lexer::TokenType.constants
86
+ .select { |c| c.to_s.start_with?("KW_", "KWM_") }
87
+ .map { |c| SasLexer::Lexer::TokenType.const_get(c) }
88
+ .to_set
89
+ end
90
+
91
+ def binary_op_types = @binary_op_types ||= type_set(BINARY_OP_NAMES)
92
+ def unary_cand_types = @unary_cand_types ||= type_set(UNARY_CANDIDATE_NAMES)
93
+ def no_sp_before_types = @no_sp_before_types ||= type_set(NO_SPACE_BEFORE_NAMES)
94
+ def value_ending_types = @value_ending_types ||= type_set(VALUE_ENDING_NAMES)
95
+ def comma_types = @comma_types ||= type_set(COMMA_NAMES)
96
+ def data_proc_types = @data_proc_types ||= type_set(DATA_PROC_NAMES)
97
+ def do_types = @do_types ||= type_set(DO_NAMES)
98
+ def end_types = @end_types ||= type_set(END_NAMES)
99
+ def run_quit_types = @run_quit_types ||= type_set(RUN_QUIT_NAMES)
100
+ def semi_types = @semi_types ||= type_set(SEMI_NAMES)
101
+
102
+ # --- Reconstruction (keyword casing + operator spacing) ---
103
+
104
+ def apply_casing(token)
105
+ text = token[:text]
106
+ return text if @keywords == :preserve
107
+
108
+ default_ch = SasLexer::Lexer::TokenChannel::DEFAULT
109
+ return text unless token[:channel] == default_ch && keyword_types.include?(token[:type])
110
+
111
+ @keywords == :upper ? text.upcase : text.downcase
112
+ end
113
+
114
+ # Partition all_tokens into [{gap:, tok:}] segments where gap holds the
115
+ # non-DEFAULT tokens preceding tok, and tok is a DEFAULT-channel token.
116
+ def segmentize(all_tokens)
117
+ default_ch = SasLexer::Lexer::TokenChannel::DEFAULT
118
+ segments = []
119
+ gap = []
120
+ all_tokens.each do |t|
121
+ if t[:channel] == default_ch
122
+ segments << { gap: gap, tok: t }
123
+ gap = []
124
+ else
125
+ gap << t
126
+ end
127
+ end
128
+ segments << { gap: gap, tok: nil } unless gap.empty?
129
+ segments
130
+ end
131
+
132
+ def reconstruct(all_tokens)
133
+ segments = segmentize(all_tokens)
134
+ result = +""
135
+
136
+ segments.each_with_index do |seg, idx|
137
+ prev_prev = idx > 1 ? segments[idx - 2][:tok] : nil
138
+ prev = idx > 0 ? segments[idx - 1][:tok] : nil
139
+ cur = seg[:tok]
140
+ gap_text = seg[:gap].map { |t| t[:text] }.join
141
+
142
+ if @operator_spacing && prev && cur && !gap_text.include?("\n")
143
+ desired = gap_desired(prev_prev, prev, cur)
144
+ result << (desired.nil? ? gap_text : desired)
145
+ else
146
+ result << gap_text
147
+ end
148
+
149
+ result << apply_casing(cur) if cur
150
+ end
151
+
152
+ result
153
+ end
154
+
155
+ # Returns the desired whitespace between two same-line DEFAULT tokens:
156
+ # nil → leave unchanged
157
+ # "" → no space
158
+ # " " → exactly one space
159
+ #
160
+ # Three consecutive DEFAULT tokens are provided so unary PLUS/MINUS can be
161
+ # distinguished from binary: a PLUS/MINUS is binary only when its preceding
162
+ # token is a value-ending type (identifier, literal, or closing bracket).
163
+ def gap_desired(prev_prev_tok, prev_tok, next_tok)
164
+ pt = prev_tok[:type]
165
+ nt = next_tok[:type]
166
+ ppt = prev_prev_tok&.[](:type)
167
+
168
+ return "" if no_sp_before_types.include?(nt)
169
+ return " " if comma_types.include?(pt)
170
+
171
+ # Space after binary operator — but not after a unary +/-
172
+ if binary_op_types.include?(pt)
173
+ if unary_cand_types.include?(pt) && (ppt.nil? || !value_ending_types.include?(ppt))
174
+ return nil
175
+ end
176
+ return " "
177
+ end
178
+
179
+ # Space before binary operator — but not before a unary +/-
180
+ if binary_op_types.include?(nt)
181
+ return nil if unary_cand_types.include?(nt) && !value_ending_types.include?(pt)
182
+
183
+ return " "
184
+ end
185
+
186
+ nil
187
+ end
188
+
189
+ # --- Indentation ---
190
+
191
+ # Walk all_tokens and assign an indent level to each source line.
192
+ # Only the FIRST token on a given line determines its level (||= semantics).
193
+ # Nesting rules:
194
+ # DATA / PROC → level 0; content after their SEMI → level 1
195
+ # DO → content inside indented one further level
196
+ # END → decrements level before assigning the END line's level
197
+ # RUN / QUIT → resets to level 0
198
+ def compute_line_levels(all_tokens)
199
+ default_ch = SasLexer::Lexer::TokenChannel::DEFAULT
200
+ hidden_ch = SasLexer::Lexer::TokenChannel::HIDDEN
201
+ levels = {}
202
+ level = 0
203
+ after_data_proc = false
204
+
205
+ all_tokens.each do |tok|
206
+ next if tok[:channel] == hidden_ch
207
+
208
+ line = tok[:start_line]
209
+ type = tok[:type]
210
+
211
+ unless tok[:channel] == default_ch
212
+ levels[line] ||= level # comment tokens — indent at current level
213
+ next
214
+ end
215
+
216
+ if data_proc_types.include?(type)
217
+ level = 0
218
+ after_data_proc = true
219
+ levels[line] ||= level
220
+ elsif run_quit_types.include?(type)
221
+ level = 0
222
+ after_data_proc = false
223
+ levels[line] ||= level
224
+ elsif do_types.include?(type)
225
+ levels[line] ||= level
226
+ level += 1
227
+ elsif end_types.include?(type)
228
+ level = [level - 1, 0].max
229
+ levels[line] ||= level
230
+ elsif semi_types.include?(type) && after_data_proc
231
+ # Semicolon ends the DATA/PROC header — step body starts at level 1.
232
+ # Don't re-assign the SEMI's line (already marked at level 0 by DATA/PROC).
233
+ after_data_proc = false
234
+ level = 1
235
+ else
236
+ levels[line] ||= level
237
+ end
238
+ end
239
+
240
+ levels
241
+ end
242
+
243
+ # Re-indent each line of source using the computed per-line levels.
244
+ # Lines with no token coverage (blank lines) are left unchanged.
245
+ def apply_indentation(source, all_tokens)
246
+ line_levels = compute_line_levels(all_tokens)
247
+
248
+ source.each_line.with_index.map do |line, idx|
249
+ lineno = idx + 1
250
+ line_level = line_levels[lineno]
251
+ next line if line_level.nil?
252
+
253
+ eol = line.match(/\r?\n\z/)&.[](0) || ""
254
+ body = line[0...(line.length - eol.length)]
255
+ stripped = body.lstrip
256
+ next eol if stripped.empty? # blank line — strip stray whitespace
257
+
258
+ (" " * (@indent_width * line_level)) + stripped + eol
259
+ end.join
260
+ end
261
+ end
262
+ end
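`compute_line_levels` and `apply_indentation` can be illustrated on a toy token stream. This sketch keeps the same nesting rules (DATA/PROC header at level 0 and body at level 1, DO/END nesting, RUN/QUIT reset, first token on a line wins) but swaps sas-lexer for a hypothetical `toy_tokens` that only splits on whitespace and semicolons, so it ignores strings, comments, and macro code; `line_levels` and `reindent` are hypothetical names.

```ruby
# Toy keyword table (hypothetical); the real formatter uses sas-lexer token types.
KEYWORD_TYPES = {
  "data" => :data, "proc" => :proc, "run" => :run, "quit" => :quit,
  "do" => :do, "end" => :end
}.freeze

# Toy tokenizer: [line_number, type] pairs. No strings, comments or macros.
def toy_tokens(source)
  source.each_line.with_index(1).flat_map do |line, lineno|
    line.scan(/;|[^\s;]+/).map do |text|
      type = text == ";" ? :semi : KEYWORD_TYPES.fetch(text.downcase, :other)
      [lineno, type]
    end
  end
end

# Same nesting rules as compute_line_levels: the first token on a line wins.
def line_levels(tokens)
  levels = {}
  level = 0
  in_step_header = false

  tokens.each do |line, type|
    case type
    when :data, :proc # step start: header line at level 0
      level = 0
      in_step_header = true
      levels[line] ||= level
    when :run, :quit # step end: back to level 0
      level = 0
      in_step_header = false
      levels[line] ||= level
    when :do # the DO line stays put; its body goes one deeper
      levels[line] ||= level
      level += 1
    when :end # END dedents before its own line is placed
      level = [level - 1, 0].max
      levels[line] ||= level
    when :semi
      if in_step_header # header's semicolon: step body starts at level 1
        in_step_header = false
        level = 1
      else
        levels[line] ||= level
      end
    else
      levels[line] ||= level
    end
  end
  levels
end

# Re-indent lines that have a level; lines without tokens are left untouched.
def reindent(source, levels, width)
  source.each_line.with_index(1).map do |line, lineno|
    level = levels[lineno]
    next line if level.nil?

    eol = line[/\r?\n\z/] || ""
    body = line[0, line.length - eol.length].lstrip
    next eol if body.empty?

    (" " * (width * level)) + body + eol
  end.join
end

src = "data out;\nset in;\nif x then do;\ny = 1;\nend;\nrun;\n"
print reindent(src, line_levels(toy_tokens(src)), 2)
```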
@@ -286,8 +286,13 @@ class SasLinter
286
286
  findings
287
287
  end
288
288
 
289
+ # `seq` is ASCII-8BIT (from `pack("C*")`); `encode("UTF-8")` on a
290
+ # binary string replaces every byte ≥ 0x80 with U+FFFD before
291
+ # `codepoints` runs. The bytes are already a valid UTF-8 sequence
292
+ # by construction (caller checked `utf8_sequence_length`), so
293
+ # reinterpret rather than transcode.
289
294
  def codepoint(seq)
290
- seq.encode("UTF-8", invalid: :replace, undef: :replace).codepoints.first
295
+ seq.dup.force_encoding("UTF-8").codepoints.first
291
296
  end
292
297
 
293
298
  # Returns the length (1-4) of a valid UTF-8 sequence starting
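The comment on `codepoint` can be checked directly. A small sketch comparing the old and new method bodies on the two-byte UTF-8 sequence for `é`; the method names are hypothetical labels for the two versions:

```ruby
# Old body: transcode from ASCII-8BIT, which maps every byte >= 0x80 to U+FFFD.
def codepoint_via_encode(seq)
  seq.encode("UTF-8", invalid: :replace, undef: :replace).codepoints.first
end

# New body: relabel the same bytes as UTF-8 without converting them.
def codepoint_via_force_encoding(seq)
  seq.dup.force_encoding("UTF-8").codepoints.first
end

e_acute = [0xC3, 0xA9].pack("C*") # bytes of "é"; pack("C*") returns an ASCII-8BIT string
p codepoint_via_encode(e_acute)         # => 65533 (U+FFFD)
p codepoint_via_force_encoding(e_acute) # => 233 (U+00E9)
```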
@@ -0,0 +1,49 @@
1
+ # frozen_string_literal: true
2
+
3
+ require_relative "../../sas_linter"
4
+ require "sas_lexer"
5
+
6
+ class SasLinter
7
+ module Rules
8
+ # Flag tokens the lexer typed as INTEGER_LITERAL whose text isn't
9
+ # actually a valid SAS numeric literal — typically a digit-prefixed
10
+ # identifier the lexer munched into one token, e.g.
11
+ #
12
+ # if K9d = 1f then do; * `1f` is not valid SAS
13
+ # x = 1d2; * `D` exponent is Fortran, not SAS
14
+ #
15
+ # Real SAS only accepts two integer-shaped literals:
16
+ #
17
+ # * plain decimal: `[0-9]+`
18
+ # * hex literal: `[0-9][0-9A-Fa-f]*[xX]` (trailing `x`/`X`)
19
+ #
20
+ # Float and exponent forms (`1.0`, `1e3`) get their own token types
21
+ # (FLOAT_LITERAL / FLOAT_EXPONENT_LITERAL) and aren't checked here.
22
+ class InvalidNumericLiteral < Rule
23
+ rule_id :invalid_numeric_literal
24
+ description "INTEGER_LITERAL must be a plain decimal or a SAS hex " \
25
+ "literal (`0FFx`-style); reject suffixes like `1f` or " \
26
+ "`1d2` that the lexer accepts but SAS does not."
27
+ severity :warning
28
+
29
+ TT = SasLexer::Lexer::TokenType
30
+
31
+ VALID = /\A(?:[0-9]+|[0-9][0-9A-Fa-f]*[xX])\z/
32
+
33
+ MESSAGE_SUFFIX = "is not a valid SAS numeric literal — SAS has no " \
34
+ "`f`/`F`/`L` numeric suffixes and uses `E` (not `D`) " \
35
+ "for exponents; hex literals must end in `x`/`X`."
36
+
37
+ def check(tokens, path:, all_tokens: nil, source: nil) # rubocop:disable Lint/UnusedMethodArgument
38
+ tokens.filter_map do |t|
39
+ next unless t[:type] == TT::INTEGER_LITERAL && !VALID.match?(t[:text])
40
+
41
+ finding(
42
+ line: t[:start_line], column: t[:start_column] + 1,
43
+ message: "`#{t[:text]}` #{MESSAGE_SUFFIX}", path: path
44
+ )
45
+ end
46
+ end
47
+ end
48
+ end
49
+ end
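The `VALID` pattern in `InvalidNumericLiteral` can be exercised on its own. This sketch copies the regex and applies it to a few sample texts; `invalid_integer_texts` is a hypothetical helper, and in the rule itself only tokens already typed `INTEGER_LITERAL` reach this check. Note that `1fx` passes, since it reads as a digit-led hex literal ending in `x`.

```ruby
# Copied from the rule: plain decimal, or digit-led hex ending in x/X.
VALID = /\A(?:[0-9]+|[0-9][0-9A-Fa-f]*[xX])\z/

# Hypothetical helper: keep only the texts the rule would flag.
def invalid_integer_texts(texts)
  texts.reject { |t| VALID.match?(t) }
end

p invalid_integer_texts(%w[42 0FFx 9X 1fx 1f 1d2 7L]) # => ["1f", "1d2", "7L"]
```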
@@ -1,5 +1,5 @@
1
1
  # frozen_string_literal: true
2
2
 
3
3
  class SasLinter
4
- VERSION = "0.1.0"
4
+ VERSION = "0.2.1"
5
5
  end
data/lib/sas_linter.rb CHANGED
@@ -241,6 +241,28 @@ class SasLinter
241
241
  findings
242
242
  end
243
243
 
244
+ # Apply formatting to a file in-place. Runs the formatter's own
245
+ # transformations first, then the autofix pipeline for rules that have
246
+ # `autofix: true` in config — identical to lint_with_fixes except that the
247
+ # formatter pass runs first. Rules that haven't been opted in to autofix
248
+ # (e.g. missing_assignment_semicolon without explicit `autofix: true`) are
249
+ # left alone so that --format stays cosmetic by default.
250
+ #
251
+ # Returns true if the file was rewritten, false if nothing changed.
252
+ def format_file(path, formatter:)
253
+ original = read_source(path)
254
+ modified = formatter.format(original)
255
+ @rules.each do |rule|
256
+ next unless rule.autofix? && rule.class.supports_autofix?
257
+
258
+ modified = rule.autofix(modified)
259
+ end
260
+ return false if modified.b == original.b
261
+
262
+ File.write(path, modified)
263
+ true
264
+ end
265
+
244
266
  private
245
267
 
246
268
  def read_source(path)
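`format_file` runs the formatter, chains every opted-in autofix, and compares bytes with `.b` before writing. A sketch of that core without file I/O, assuming the formatter and fixers are plain callables from String to String (`run_format_pipeline` and the lambdas are hypothetical). The second call shows why the byte compare matters: relabelling a non-ASCII string's encoding makes `==` return false even though no byte changed.

```ruby
# Hypothetical stand-in for format_file's core, minus the file I/O:
# run the formatter, chain each opted-in fixer, then compare raw bytes.
def run_format_pipeline(original, formatter, fixers)
  modified = fixers.reduce(formatter.call(original)) { |src, fix| fix.call(src) }
  changed = modified.b != original.b # byte compare ignores encoding tags
  [modified, changed]
end

no_op_formatter = ->(src) { src }
strip_trailing = ->(src) { src.gsub(/[ \t]+$/, "") }
retag_as_cp1252 = ->(src) { src.dup.force_encoding(Encoding::Windows_1252) }

p run_format_pipeline("x = 1;  \n", no_op_formatter, [strip_trailing]) # => ["x = 1;\n", true]

original = "é\n"
modified, changed = run_format_pipeline(original, no_op_formatter, [retag_as_cp1252])
p changed              # => false
p modified == original # => false: same bytes, incompatible encoding tags
```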
@@ -273,6 +295,7 @@ class SasLinter
273
295
  end
274
296
  end
275
297
 
298
+ require_relative "sas_linter/formatter"
276
299
  require_relative "sas_linter/rules/unreachable_inner_branch_value"
277
300
  require_relative "sas_linter/rules/identical_if_else_branches"
278
301
  require_relative "sas_linter/rules/commented_out_guard"
@@ -285,3 +308,4 @@ require_relative "sas_linter/rules/encoding_issues"
285
308
  require_relative "sas_linter/rules/malformed_if_condition"
286
309
  require_relative "sas_linter/rules/missing_assignment_semicolon"
287
310
  require_relative "sas_linter/rules/variable_value_out_of_known_range"
311
+ require_relative "sas_linter/rules/invalid_numeric_literal"
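The added `require_relative` line is all it takes for `invalid_numeric_literal` to be picked up, because subclasses register themselves when their class body calls `rule_id`. A stand-in sketch of that pattern; `MiniRule`, its `REGISTRY` constant, and the example rule classes are hypothetical and much smaller than `SasLinter::Rule`:

```ruby
# Hypothetical minimal base class showing self-registration via a class macro.
class MiniRule
  REGISTRY = {} # rule id => rule class, shared by every subclass

  class << self
    # `rule_id :foo` in a subclass body stores the id and registers the class.
    # Called without an argument, it returns the stored id.
    def rule_id(id = nil)
      return @rule_id if id.nil?

      @rule_id = id
      MiniRule.register(self)
    end

    def register(klass)
      REGISTRY[klass.rule_id] = klass
    end

    def fetch(id)
      REGISTRY.fetch(id) # raises KeyError for unknown ids
    end

    def all
      REGISTRY.values
    end
  end
end

# Merely loading these class bodies registers them.
class NoTabsRule < MiniRule
  rule_id :no_tabs
end

class NoTodoRule < MiniRule
  rule_id :no_todo
end

p MiniRule.fetch(:no_tabs) # => NoTabsRule
```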
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: sas-linter
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.1.0
4
+ version: 0.2.1
5
5
  platform: ruby
6
6
  authors:
7
7
  - Craig McNamara
@@ -52,15 +52,19 @@ executables:
52
52
  extensions: []
53
53
  extra_rdoc_files: []
54
54
  files:
55
+ - CLAUDE.md
55
56
  - LICENSE
56
57
  - README.md
57
58
  - Rakefile
58
59
  - bin/sas_lint
60
+ - config/lint.yaml
59
61
  - lib/sas_linter.rb
62
+ - lib/sas_linter/formatter.rb
60
63
  - lib/sas_linter/rules/choose_one_template.rb
61
64
  - lib/sas_linter/rules/commented_out_guard.rb
62
65
  - lib/sas_linter/rules/encoding_issues.rb
63
66
  - lib/sas_linter/rules/identical_if_else_branches.rb
67
+ - lib/sas_linter/rules/invalid_numeric_literal.rb
64
68
  - lib/sas_linter/rules/line_endings.rb
65
69
  - lib/sas_linter/rules/malformed_if_condition.rb
66
70
  - lib/sas_linter/rules/missing_assignment_semicolon.rb