sas-linter 0.2.0 → 0.2.2

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: 8179cf3c7d87f82c58c772a5916be519d2a6028c882bbec74de8716ba04acf9b
-   data.tar.gz: 1c6256a8f4e3dea07dcce74aad34f4431d24ee80ad94e41afdaeb71b279a1d08
+   metadata.gz: 4552996a6f33196d26f85d2cd057c525f8aff278b2e4d26791d0295bc5b49ccc
+   data.tar.gz: 72b2f4a69e108b29818cb686b8ffb135768547a0d68a09086a5d4946524af17d
  SHA512:
-   metadata.gz: 43a9b197ef714f8942fc7b1dda78ec62ad5c1d844ff194da6a1dabdf67195eb347a7437542bbdd0c1df70fc3f7c0a6ef9e33e6d05fe1af6b04f4eaa8875cfcab
-   data.tar.gz: 728e57cee72bd5335d715418cec293ea3d890b65bef6944d6fbd85992f972c549662fd398e93b856e3d98c36908aacbcd2654d7a9e1638d0d2be9a332a3170a9
+   metadata.gz: 1efcbb6cf7707971d58ae82089df387a309e5f5941d9a90b72075c1cfa8af660bc21d9089f5d42b72b428b8669535334eb6b1f1bd22967738a58a1057f0d7614
+   data.tar.gz: 6c37528f85bd94d51020b08fb1685a20a1dd1e0eded7a2949c27c57b21b1b4b5204036220826b39e198e6b6c92160634ccbd7bc3b79b657b834a9a1e666233bd
data/README.md CHANGED
@@ -1,6 +1,6 @@
  # sas-linter

- A configurable lint engine for SAS source files. Built on the [`sas-lexer`](https://github.com/mes-amis/sas-lexer-rb) gem (a Ruby FFI binding to Misha Perlov's Rust [`sas-lexer`](https://github.com/mishamsk/sas-lexer)) and ships with eleven pluggable rules covering structural defects, cosmetic issues, and source-header conventions.
+ A configurable lint engine for SAS source files. Built on the [`sas-lexer`](https://github.com/mes-amis/sas-lexer-rb) gem (a Ruby FFI binding to Misha Perlov's Rust [`sas-lexer`](https://github.com/mishamsk/sas-lexer)) and ships with thirteen pluggable rules covering structural defects, cosmetic issues, and source-header conventions.

  ## Installation

@@ -63,6 +63,10 @@ rules:
      enabled: true
      autofix: false # rule supports autofix; off by default

+   inconsistent_variable_case:
+     enabled: true
+     autofix: false # rewrite every minority casing to the most-common form
+
    variable_value_out_of_known_range:
      enabled: true
      csv_paths: # empty list = rule is a no-op
@@ -135,6 +139,7 @@ findings = linter.lint_file("path/to/source.sas")
  | `malformed_if_condition` | Empty conditions, missing operators, orphan `then`, unbalanced parens, etc. |
  | `missing_assignment_semicolon` | Assignment statements followed by an inline `**` comment but no terminating `;`. |
  | `variable_value_out_of_known_range` | `if VAR = N` / `if VAR in (...)` literals fall outside the variable's documented acceptable values. Loads the catalog from one or more CSVs with configurable column names and column separator (`,`, `;`, tab). |
+ | `inconsistent_variable_case` | Identifier appears with more than one casing in the same file (`myVar` vs `MyVar`). SAS treats both as the same variable; autofix rewrites every minority spelling to the most-common form. Skips proc-format definitions and `format.` / `lib.member` references. |

  `bin/sas_lint --list-rules` prints the same set with autofix capability.
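
The `inconsistent_variable_case` row above describes the autofix as rewriting every minority spelling to the most-common form. The following is a minimal sketch of that idea, not the gem's implementation: it assumes a plain regex scan in place of the `sas-lexer` token stream, and `IDENT`, `identifier_uses`, `pick_canonical`, and `normalize_case` are hypothetical names used only for this illustration.

```ruby
# Hypothetical stand-in for the lexer: a regex that matches identifier-like
# words. The real rule works on sas-lexer tokens, not on a regex scan.
IDENT = /[A-Za-z_][A-Za-z0-9_]*/

# Collect every identifier-like word with its character offsets.
def identifier_uses(source)
  uses = []
  source.scan(IDENT) do
    m = Regexp.last_match
    uses << { text: m[0], start: m.begin(0), finish: m.end(0) }
  end
  uses
end

# Most-used spelling wins; on a tie, the spelling that occurs first wins.
def pick_canonical(uses)
  counts = uses.map { |u| u[:text] }.tally
  max = counts.values.max
  uses.find { |u| counts[u[:text]] == max }[:text]
end

# Rewrite every minority spelling to the canonical one.
def normalize_case(source)
  edits = []
  identifier_uses(source).group_by { |u| u[:text].downcase }.each_value do |uses|
    canonical = pick_canonical(uses)
    uses.each do |u|
      edits << [u[:start], u[:finish], canonical] unless u[:text] == canonical
    end
  end

  # Apply from the end of the string backwards so earlier offsets stay valid.
  out = source.dup
  edits.sort_by { |start, _, _| -start }.each do |start, finish, repl|
    out[start...finish] = repl
  end
  out
end

normalize_case("myVar = 1; x = MyVar + myVar;")
# => "myVar = 1; x = myVar + myVar;"
normalize_case("Total = total;")
# => "Total = Total;"   (tie, so the first spelling wins)
```

Because this sketch scans with a regex, it would also touch keywords, string literals, and comments; per the row above, the actual rule skips proc-format definitions and `format.` / `lib.member` references.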
@@ -286,8 +286,13 @@ class SasLinter
          findings
        end

+       # `seq` is ASCII-8BIT (from `pack("C*")`); `encode("UTF-8")` on a
+       # binary string replaces every byte ≥ 0x80 with U+FFFD before
+       # `codepoints` runs. The bytes are already a valid UTF-8 sequence
+       # by construction (caller checked `utf8_sequence_length`), so
+       # reinterpret rather than transcode.
        def codepoint(seq)
-         seq.encode("UTF-8", invalid: :replace, undef: :replace).codepoints.first
+         seq.dup.force_encoding("UTF-8").codepoints.first
        end

        # Returns the length (1-4) of a valid UTF-8 sequence starting
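
The difference between the removed and added lines in this hunk can be reproduced in isolation. A small sketch, assuming a two-byte sequence for "é" built the same way the comment describes (`transcode_first` and `reinterpret_first` are hypothetical helper names for this illustration):

```ruby
# Old behavior: transcode from ASCII-8BIT to UTF-8. Bytes >= 0x80 have no
# mapping out of the binary encoding, so each one becomes U+FFFD.
def transcode_first(seq)
  seq.encode("UTF-8", invalid: :replace, undef: :replace).codepoints.first
end

# New behavior: keep the bytes and only relabel the encoding. `dup` avoids
# changing the encoding of the caller's string.
def reinterpret_first(seq)
  seq.dup.force_encoding("UTF-8").codepoints.first
end

seq = [0xC3, 0xA9].pack("C*") # the UTF-8 bytes of "é", tagged ASCII-8BIT

transcode_first(seq)   # => 65533 (U+FFFD, the replacement character)
reinterpret_first(seq) # => 233   (U+00E9, "é")
```

`force_encoding` does not validate the bytes, which is why the comment in the hunk leans on the caller having already checked the sequence length.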
data/lib/sas_linter/rules/inconsistent_variable_case.rb ADDED
@@ -0,0 +1,151 @@
+ # frozen_string_literal: true
+
+ require_relative "../../sas_linter"
+ require "sas_lexer"
+
+ class SasLinter
+   module Rules
+     # Flag identifiers that are spelled with inconsistent letter case
+     # across the file. SAS resolves variable references case-insensitively,
+     # so `myVar` and `MyVar` end up bound to the same column — but mixing
+     # the two within one program is sloppy and makes the source harder to
+     # grep, diff, and read.
+     #
+     # The most-used spelling wins; every other casing is reported (and
+     # rewritten when autofix is on). Ties resolve to the first occurrence
+     # so the canonical form is reading-order deterministic.
+     #
+     # Skipped on purpose:
+     #   * identifiers immediately followed by `.` (format references like
+     #     `agecat.`, library references like `work.foo`);
+     #   * identifiers immediately preceded by `.` (the column half of
+     #     `lib.member` / `dataset.col`) — those name a column in another
+     #     dataset, not a variable in the current step;
+     #   * `value` / `invalue` / `picture` themselves and the format name
+     #     directly following them — these are proc-format definitions,
+     #     not variable references. We match locally rather than tracking
+     #     a `proc format ... run;` block because real-world SAS files
+     #     meant to be `%include`d into a caller's data step often omit
+     #     the terminating `run;`, so a state machine would never close.
+     class InconsistentVariableCase < Rule
+       rule_id :inconsistent_variable_case
+       description "Variable identifiers must use one consistent letter case " \
+                   "across the file; mixing `myVar` and `MyVar` is sloppy."
+       severity :warning
+
+       TT = SasLexer::Lexer::TokenType
+
+       # Identifiers that introduce a format / informat / picture
+       # definition in a `proc format` step. The lexer types these as
+       # plain IDENTIFIERs (not keywords), so we recognize them by text.
+       FORMAT_DEF_KEYWORDS = %w[value invalue picture].freeze
+
+       def self.supports_autofix?
+         true
+       end
+
+       def check(tokens, path:, all_tokens: nil, source: nil) # rubocop:disable Lint/UnusedMethodArgument
+         findings = []
+         each_inconsistent_use(tokens) do |token, canonical|
+           findings << finding(
+             line: token[:start_line],
+             column: token[:start_column] + 1,
+             message: "variable `#{token[:text]}` is spelled `#{canonical}` " \
+                      "elsewhere in this file — pick one case and stick with it.",
+             path: path
+           )
+         end
+         findings
+       end
+
+       def autofix(source)
+         return source if source.nil? || source.empty?
+
+         lexer = SasLexer::Lexer.new
+         begin
+           all_tokens = lexer.tokenize(source)
+         ensure
+           lexer.free
+         end
+         tokens = all_tokens.reject do |t|
+           t[:channel] == SasLexer::Lexer::TokenChannel::HIDDEN ||
+             t[:channel] == SasLexer::Lexer::TokenChannel::COMMENT
+         end
+
+         edits = []
+         each_inconsistent_use(tokens) do |token, canonical|
+           edits << [token[:start], token[:end], canonical]
+         end
+
+         # Apply right-to-left so earlier offsets stay valid.
+         out = source.dup
+         edits.sort_by! { |start, _, _| -start }
+         edits.each { |start, finish, repl| out[start...finish] = repl }
+         out
+       end
+
+       private
+
+       # Yields `[token, canonical_form]` for every identifier whose
+       # spelling differs from the file-wide canonical case.
+       def each_inconsistent_use(tokens)
+         groups = collect_variable_uses(tokens)
+
+         groups.each_value do |uses|
+           forms = uses.map { |t| t[:text] }.tally
+           next if forms.size <= 1
+
+           canonical = canonical_form(forms, uses)
+           uses.each do |t|
+             yield t, canonical unless t[:text] == canonical
+           end
+         end
+       end
+
+       # Walk default-channel tokens and bucket eligible IDENTIFIER
+       # uses by lowercase name. Format-related identifiers (see class
+       # docstring) are filtered out by `variable_use?`.
+       def collect_variable_uses(tokens)
+         groups = Hash.new { |h, k| h[k] = [] }
+         tokens.each_with_index do |t, i|
+           next unless t[:type] == TT::IDENTIFIER && variable_use?(tokens, i)
+
+           groups[t[:text].downcase] << t
+         end
+         groups
+       end
+
+       # Reject `format.` / `lib.member` shapes via byte-adjacency to a
+       # `.` token, and `value <fmt-name>` shapes by checking the
+       # neighboring identifier. The lexer emits the dot separately, so
+       # we use `prev.end == t.start` / `t.end == nxt.start` to tell a
+       # truly-adjacent dot from one that just happens to follow after
+       # whitespace.
+       def variable_use?(tokens, i)
+         t = tokens[i]
+         nxt = tokens[i + 1]
+         prev = i.positive? ? tokens[i - 1] : nil
+
+         return false if nxt && nxt[:type] == TT::DOT && nxt[:start] == t[:end]
+         return false if prev && prev[:type] == TT::DOT && prev[:end] == t[:start]
+         return false if FORMAT_DEF_KEYWORDS.include?(t[:text].downcase)
+         return false if prev && prev[:type] == TT::IDENTIFIER &&
+                         FORMAT_DEF_KEYWORDS.include?(prev[:text].downcase)
+
+         true
+       end
+
+       # Most-used spelling wins; ties go to the first occurrence so the
+       # canonical form matches reading order and stays deterministic
+       # across runs.
+       def canonical_form(forms, uses)
+         max_count = forms.values.max
+         winners = forms.select { |_, c| c == max_count }.keys
+         return winners.first if winners.size == 1
+
+         uses.each { |t| return t[:text] if winners.include?(t[:text]) }
+         winners.first
+       end
+     end
+   end
+ end
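
The dot-adjacency check in `variable_use?` above can be exercised without the lexer. A rough sketch, assuming hand-built token hashes with the same `:type`, `:start`, and `:end` keys the rule reads; the `:identifier` / `:dot` symbols stand in for `TT::IDENTIFIER` / `TT::DOT`, and `adjacent_dot?`, `QUALIFIED`, and `SPACED` are hypothetical names for this illustration:

```ruby
# True when the token at index i touches a dot token on either side,
# i.e. there is no whitespace between the identifier and the dot.
def adjacent_dot?(tokens, i)
  t = tokens[i]
  nxt = tokens[i + 1]
  prev = i.positive? ? tokens[i - 1] : nil

  return true if nxt && nxt[:type] == :dot && nxt[:start] == t[:end]
  return true if prev && prev[:type] == :dot && prev[:end] == t[:start]

  false
end

# Offsets for the text "work.foo": both identifiers touch the dot.
QUALIFIED = [
  { type: :identifier, text: "work", start: 0, end: 4 },
  { type: :dot,        text: ".",    start: 4, end: 5 },
  { type: :identifier, text: "foo",  start: 5, end: 8 }
].freeze

# Offsets for the text "x .": a space separates the identifier and the dot.
SPACED = [
  { type: :identifier, text: "x", start: 0, end: 1 },
  { type: :dot,        text: ".", start: 2, end: 3 }
].freeze

adjacent_dot?(QUALIFIED, 0) # => true  (library half of `work.foo`)
adjacent_dot?(QUALIFIED, 2) # => true  (member half of `work.foo`)
adjacent_dot?(SPACED, 0)    # => false (dot follows after whitespace)
```

Comparing offsets rather than token order alone is what lets the rule treat `x .` differently from `x.`, since hidden-channel whitespace tokens have already been filtered out by the time the check runs.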
@@ -1,5 +1,5 @@
  # frozen_string_literal: true

  class SasLinter
-   VERSION = "0.2.0"
+   VERSION = "0.2.2"
  end
data/lib/sas_linter.rb CHANGED
@@ -309,3 +309,4 @@ require_relative "sas_linter/rules/malformed_if_condition"
  require_relative "sas_linter/rules/missing_assignment_semicolon"
  require_relative "sas_linter/rules/variable_value_out_of_known_range"
  require_relative "sas_linter/rules/invalid_numeric_literal"
+ require_relative "sas_linter/rules/inconsistent_variable_case"
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: sas-linter
  version: !ruby/object:Gem::Version
-   version: 0.2.0
+   version: 0.2.2
  platform: ruby
  authors:
  - Craig McNamara
@@ -64,6 +64,7 @@ files:
  - lib/sas_linter/rules/commented_out_guard.rb
  - lib/sas_linter/rules/encoding_issues.rb
  - lib/sas_linter/rules/identical_if_else_branches.rb
+ - lib/sas_linter/rules/inconsistent_variable_case.rb
  - lib/sas_linter/rules/invalid_numeric_literal.rb
  - lib/sas_linter/rules/line_endings.rb
  - lib/sas_linter/rules/malformed_if_condition.rb