sas-linter 0.1.0 → 0.2.1
- checksums.yaml +4 -4
- data/CLAUDE.md +44 -0
- data/README.md +65 -20
- data/bin/sas_lint +46 -24
- data/config/lint.yaml +4 -0
- data/lib/sas_linter/formatter.rb +262 -0
- data/lib/sas_linter/rules/encoding_issues.rb +6 -1
- data/lib/sas_linter/rules/invalid_numeric_literal.rb +49 -0
- data/lib/sas_linter/version.rb +1 -1
- data/lib/sas_linter.rb +24 -0
- metadata +5 -1
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 306561a9219d046d164095dd03b92fdd2da003c4292188e9d5d75e19eae4c3c9
+  data.tar.gz: 559eab2894b60b33f0a159df624603379a4285c61a6e49425ead209a991ae760
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 7da26fa1fbf7cf1fc00753e157c6e5ba2e9ed16b7c2470dccd7585717a35f269ee2636754b572f0ca425f16d4058fea0e212927bc45dd7efc75d504da13202df
+  data.tar.gz: 7f507e8df56ea74b355c0c471a3de0022064860c43b90e779d11a4b0f7ac1ebd20de09ea20f614e04a87c747b1bf774cb8eeda97e00045879f728f040a487e1c
data/CLAUDE.md
ADDED
@@ -0,0 +1,44 @@
+# CLAUDE.md
+
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+## Commands
+
+```sh
+bundle install # install dev deps
+bundle exec rake spec # full test suite (also `rake` default)
+bundle exec rspec spec/sas_linter_spec.rb # single file
+bundle exec rspec spec/sas_linter_spec.rb:42 # single example by line
+bundle exec rubocop # lint Ruby (also `rake rubocop`)
+bin/sas_lint --list-rules # CLI sanity check (lists every registered rule)
+bin/sas_lint path/to/file.sas # run linter from working tree
+gem build sas-linter.gemspec # build the gem locally
+```
+
+Ruby ≥ 3.4 is required (see gemspec). CI matrix runs ubuntu + macOS on Ruby 3.4 and 4.0.
+
+## Architecture
+
+**Rule registry, self-registering subclasses.** `SasLinter::Rule` keeps a class-level `registry` keyed by rule id. Subclasses call `rule_id :foo` in their class body, which triggers `Rule.register(self)`. Requiring a rule file is enough to make it discoverable via `SasLinter::Rule.fetch(:foo)` and to include it in `SasLinter.new` (no rules arg → all registered rules). All rule files are required at the bottom of `lib/sas_linter.rb`; new rules must be added to that require block.
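Editor's sketch (not part of the package diff): the self-registration pattern described in this paragraph could look roughly like the following. `SketchRule` and `SketchTrailingWhitespace` are hypothetical stand-ins, not the gem's `SasLinter::Rule` classes, and the real implementation may differ.

```ruby
# Hypothetical stand-in for a rule base class with a class-level registry.
class SketchRule
  class << self
    # The registry lives on the base class and is keyed by rule id.
    def registry
      @registry ||= {}
    end

    # With an argument: declare the id and self-register.
    # Without an argument: read the declared id back.
    def rule_id(id = nil)
      return @rule_id if id.nil?

      @rule_id = id
      SketchRule.register(self)
    end

    def register(klass)
      registry[klass.rule_id] = klass
    end

    def fetch(id)
      registry.fetch(id)
    end
  end
end

# Defining the subclass is enough to make it discoverable.
class SketchTrailingWhitespace < SketchRule
  rule_id :trailing_whitespace
end

found = SketchRule.fetch(:trailing_whitespace)
```

Because registration happens in the class body, simply loading the file that defines a subclass adds it to the registry, which is why the paragraph stresses the require block.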
+
+**Two-channel token stream.** `SasLinter#tokenize` runs `SasLexer::Lexer` once and returns `[default_tokens, all_tokens]`. Default-channel excludes whitespace + comment tokens; `all_tokens` keeps everything. `Rule#check(tokens, path:, all_tokens:, source:)` receives both, plus the raw `source` string. Most rules walk default-channel only; `commented_out_guard` needs comments, source-hygiene rules (`trailing_whitespace`, `tab_expansion`, `line_endings`, `encoding_issues`, `source_headers`) work directly on `source`.
+
+**Config-driven instantiation.** `SasLinter.from_config(hash)` walks `config["rules"]`, skips `enabled: false` entries, and calls `klass.from_config(opts)` on the rest. The base `Rule.from_config` only forwards `autofix:` — rules with extra options (`encoding_issues`, `tab_expansion`, `variable_value_out_of_known_range`) override `from_config` to map their YAML keys onto kwargs. **Rules omitted from a config default to enabled with no options** so adding a new rule never silently disables it for existing users; to suppress, list with `enabled: false`.
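Editor's sketch (not part of the package diff): the "omitted means enabled" behaviour can be illustrated with plain hashes. `REGISTERED_IDS` and `enabled_rule_options` are hypothetical names chosen for this example; the gem's `from_config` instantiates rule classes rather than returning option hashes.

```ruby
# Hypothetical list standing in for the ids held by the rule registry.
REGISTERED_IDS = %i[trailing_whitespace tab_expansion line_endings].freeze

# Returns { rule_id => options } for every rule that should run.
# A rule missing from the config is enabled with no options;
# only an explicit `enabled: false` suppresses it.
def enabled_rule_options(config)
  rules = (config || {})["rules"] || {}
  REGISTERED_IDS.each_with_object({}) do |id, enabled|
    opts = rules[id.to_s] || {}
    next if opts["enabled"] == false

    enabled[id] = opts.reject { |key, _| key == "enabled" }
  end
end

config = {
  "rules" => {
    "tab_expansion" => { "enabled" => false },
    "trailing_whitespace" => { "enabled" => true, "autofix" => true }
  }
}
selected = enabled_rule_options(config)
```

Here `line_endings` is absent from the config and still ends up selected, which is the property that keeps newly added rules active for existing users.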
+
+**Autofix pipeline.** `lint_with_fixes` returns `[findings, modified_source]`. After `check`, the engine runs `autofix(source)` on every rule where `rule.autofix?` (instance flag from config) AND `rule.class.supports_autofix?` (class capability) are both true. Autofixes compose by chaining each rule's output into the next rule's input. `lint_file` writes the result back **only if `modified.b != original.b`** — the `.b` byte compare prevents spurious rewrites when a rule returns a re-encoded but byte-identical string (encoding-tag-only diffs would otherwise overwrite the user's encoding). The CLI's `--no-autofix` strips `autofix: true` from the loaded config hash *before* constructing the linter, so a dry run cannot rewrite a file even if the config requests it.
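Editor's sketch (not part of the package diff): chaining fixers and the byte-level change check can be tried in isolation. The two lambdas in `FIXERS` are hypothetical simplifications (the tab fixer just substitutes four spaces), and `run_autofixes` / `changed?` are illustrative names, not the gem's methods.

```ruby
# Hypothetical fixers; each takes source text and returns rewritten text.
FIXERS = [
  ->(src) { src.gsub("\t", "    ") },     # naive tab replacement
  ->(src) { src.gsub(/[ \t]+$/, "") }     # strip trailing whitespace per line
].freeze

# Feed each fixer's output into the next one.
def run_autofixes(source, fixers)
  fixers.reduce(source) { |src, fix| fix.call(src) }
end

# Compare raw bytes so an encoding-tag-only difference is not a change.
def changed?(original, modified)
  modified.b != original.b
end

fixed = run_autofixes("x\t= 1;  \ny = 2;\n", FIXERS)

utf8 = "caf\u00E9"
retagged = utf8.dup.force_encoding("ISO-8859-1") # same bytes, different tag
tag_only_equal = (utf8 == retagged)
tag_only_changed = changed?(utf8, retagged)
```

With a non-ASCII character in the string, `==` treats the retagged copy as different while the `.b` comparison does not, which is the situation the paragraph's byte compare guards against.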
+
+**Source encoding fallback.** SAS sources are commonly Windows-1252 or ISO-8859-1. `read_source` reads as BINARY, returns as UTF-8 if already valid, otherwise transcodes Win-1252 → UTF-8 with `invalid: :replace, undef: :replace, replace: "'"`, falling back to ISO-8859-1 on failure. The lexer requires valid UTF-8, so this transcode happens before tokenization.
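Editor's sketch (not part of the package diff): a minimal version of the decoding order described here, assuming the input is already a binary string. `decode_sas_source` is a hypothetical helper; the gem's `read_source` is not shown in this diff and may be structured differently.

```ruby
# Decode raw bytes to UTF-8: keep valid UTF-8 as is, otherwise treat the
# bytes as Windows-1252, and fall back to ISO-8859-1 if transcoding raises.
def decode_sas_source(bytes)
  utf8 = bytes.dup.force_encoding("UTF-8")
  return utf8 if utf8.valid_encoding?

  begin
    bytes.encode("UTF-8", "Windows-1252",
                 invalid: :replace, undef: :replace, replace: "'")
  rescue EncodingError
    bytes.encode("UTF-8", "ISO-8859-1")
  end
end

plain = decode_sas_source("data a; run;".b)
accented = decode_sas_source("caf\xE9".b)    # 0xE9 is e-acute in Windows-1252
quoted = decode_sas_source("\x93hi\x94".b)   # 0x93/0x94 are curly quotes
```

The two-argument form of `String#encode` names the source encoding explicitly, so the binary tag on the input does not matter.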
+
+**Custom rules.** Subclass `SasLinter::Rule`, declare `rule_id`, `description`, `severity`, implement `check`. To support autofix, override `self.supports_autofix?` to return true and implement `#autofix(source)` returning the rewritten source. Use the protected `finding(line:, column:, message:, path:)` helper rather than building `Finding` structs directly so severity and rule id are filled in consistently.
+
+## Test fixtures
+
+`spec/sas_linter_spec.rb` is the integration suite. Per-rule fixtures live in `spec/fixtures/lints/<rule_id>/` as a pair: `lint.sas` (demonstrates the bug, expected to produce findings) and `clean.sas` (same shape, fixed, expected to be silent). Helpers `lint_fixture(name)` and `clean_fixture(name)` resolve those paths. When adding a rule, add a matching fixture pair — the suite's parametric tests rely on the convention. Rule-specific unit specs live under `spec/sas_linter/rules/`.
+
+## Release flow
+
+`.github/workflows/publish.yml` publishes to RubyGems on push to `main` via OIDC trusted publishing (no API key in repo). The job is idempotent: it checks for an existing `v<version>` tag on origin and treats RubyGems "already pushed" responses as success. To cut a release, bump `SasLinter::VERSION` in `lib/sas_linter/version.rb` and merge to `main`; the workflow tags `v<version>` and creates a GitHub release.
+
+## License note
+
+AGPL-3.0-or-later — chosen to match upstream `sas-lexer`. Redistribution and network-service use trigger source-disclosure obligations; standalone CLI/CI use does not. Keep this in mind before suggesting embedding the linter into a redistributed product.
data/README.md
CHANGED
@@ -37,6 +37,71 @@ bin/sas_lint --no-autofix src/*.sas
 
 Exit codes: `0` clean, `1` findings, `2` invalid args.
 
+## YAML config
+
+Every rule with options, plus its defaults. Rules omitted from the config default to enabled with no options, so adding a new rule to the gem won't silently disable it for users with existing configs. To suppress a rule, list it with `enabled: false`.
+
+```yaml
+rules:
+  # ── Structural / semantic rules ─────────────────────────────────────
+  unreachable_inner_branch_value:
+    enabled: true # default for every rule
+
+  identical_if_else_branches:
+    enabled: true
+
+  malformed_if_condition:
+    enabled: true
+
+  commented_out_guard:
+    enabled: true
+
+  choose_one_template:
+    enabled: true
+
+  missing_assignment_semicolon:
+    enabled: true
+    autofix: false # rule supports autofix; off by default
+
+  variable_value_out_of_known_range:
+    enabled: true
+    csv_paths: # empty list = rule is a no-op
+      - metadata/variables.csv
+      - metadata/variables-extra.csv
+    name_column: "Variable" # default
+    values_column: "Acceptable Values" # default
+    name_match: case_insensitive # case_insensitive | exact
+    delimiter: "," # CSV column separator: "," | ";" | "\t"
+
+  # ── Source-hygiene rules (all support autofix) ──────────────────────
+  trailing_whitespace:
+    enabled: true
+    autofix: false
+
+  tab_expansion:
+    enabled: true
+    autofix: false
+    width: 8 # tab stop width
+
+  source_headers:
+    enabled: true
+    autofix: false # rewrap **…**; 90-char header rows when true
+
+  line_endings:
+    enabled: true
+    autofix: false # collapse \r\r\n → \r\n; lone \r → dominant ending
+
+  encoding_issues:
+    enabled: true
+    autofix: false
+    use_defaults: false # apply built-in smart-quote / em-dash / Win-1252 map
+    replacements: # project-specific byte→ASCII rewrites (run BEFORE defaults)
+      "—": "--"
+      "\x85": "Ö"
+```
+
+`enabled` and `autofix` are accepted on every rule. Options not listed above are ignored.
+
 ## Library usage
 
 ```ruby
@@ -99,26 +164,6 @@ Subclasses self-register on the rule registry via `rule_id` — once required, t
 
 To support autofix, override `self.supports_autofix?` to return `true` and implement `#autofix(source)` to return the rewritten source.
 
-## YAML config
-
-```yaml
-rules:
-  malformed_if_condition:
-    enabled: true # default
-  trailing_whitespace:
-    enabled: true
-    autofix: true
-  encoding_issues:
-    enabled: true
-    use_defaults: true
-    replacements:
-      "—": "--"
-  identical_if_else_branches:
-    enabled: false # disable a rule
-```
-
-Rules omitted from the config default to enabled with no options, so adding a new rule to the gem won't silently disable it for users with existing configs.
-
 ## Testing
 
 ```sh
data/bin/sas_lint
CHANGED
@@ -4,7 +4,7 @@
 require "sas_linter"
 require "optparse"
 
-options = { config: SasLinter::DEFAULT_CONFIG_PATH, rules: nil, autofix: true }
+options = { config: SasLinter::DEFAULT_CONFIG_PATH, rules: nil, autofix: true, format: false }
 parser = OptionParser.new do |opts|
   opts.banner = "Usage: sas_lint FILE [FILE ...] [options]"
 
@@ -18,6 +18,12 @@ parser = OptionParser.new do |opts|
     options[:rules] = ids.map(&:to_sym)
   end
 
+  opts.on("--format",
+          "Reformat file(s) in place using `format:` config options and all autofix-capable " \
+          "rules. Does not report lint findings.") do
+    options[:format] = true
+  end
+
   opts.on("--no-autofix",
           "Suppress autofix even if the config sets `autofix: true` for some rule. " \
           "Findings are still reported but no file is rewritten.") do
@@ -45,35 +51,51 @@ if ARGV.empty?
   exit 2
 end
 
-
-  if options[:rules]
-    SasLinter.new(rules: options[:rules])
-  else
-    config = SasLinter.load_config_file(options[:config])
-    # `--no-autofix` strips every rule's autofix flag from the loaded config
-    # before the linter is built, so a dry run can never rewrite a file.
-    if options[:autofix] == false && config.is_a?(Hash) && config["rules"].is_a?(Hash)
-      config["rules"].each_value do |opts_hash|
-        opts_hash["autofix"] = false if opts_hash.is_a?(Hash) && opts_hash["autofix"]
-      end
-    end
-    SasLinter.from_config(config)
-  end
+config = SasLinter.load_config_file(options[:config])
 
 exit_code = 0
 
-
-
-
-
-
+if options[:format]
+  formatter = SasLinter::Formatter.from_config(config)
+  linter = options[:rules] ? SasLinter.new(rules: options[:rules]) : SasLinter.from_config(config)
+
+  ARGV.each do |path|
+    unless File.file?(path)
+      warn "sas_lint: #{path}: not a regular file"
+      exit_code = 2
+      next
+    end
+
+    linter.format_file(path, formatter: formatter)
   end
+else
+  linter =
+    if options[:rules]
+      SasLinter.new(rules: options[:rules])
+    else
+      # `--no-autofix` strips every rule's autofix flag from the loaded config
+      # before the linter is built, so a dry run can never rewrite a file.
+      if options[:autofix] == false && config.is_a?(Hash) && config["rules"].is_a?(Hash)
+        config["rules"].each_value do |opts_hash|
+          opts_hash["autofix"] = false if opts_hash.is_a?(Hash) && opts_hash["autofix"]
+        end
+      end
+      SasLinter.from_config(config)
+    end
 
-
-
+  ARGV.each do |path|
+    unless File.file?(path)
+      warn "sas_lint: #{path}: not a regular file"
+      exit_code = 2
+      next
+    end
+
+    findings = linter.lint_file(path)
+    next if findings.empty?
 
-
-
+    exit_code = 1
+    findings.each { |f| puts f.to_s }
+  end
 end
 
 exit exit_code
data/config/lint.yaml
ADDED
data/lib/sas_linter/formatter.rb
ADDED
@@ -0,0 +1,262 @@
+# frozen_string_literal: true
+
+class SasLinter
+  class Formatter
+    BINARY_OP_NAMES = %w[
+      ASSIGN PLUS MINUS STAR FSLASH STAR2
+      LT LE GT GE NE LTGT GTLT
+      AMP PIPE PIPE2 EXCL EXCL2 BPIPE BPIPE2 SOUNDS_LIKE
+    ].freeze
+    UNARY_CANDIDATE_NAMES = %w[PLUS MINUS].freeze
+    NO_SPACE_BEFORE_NAMES = %w[SEMI COMMA RPAREN RBRACK].freeze
+    VALUE_ENDING_NAMES = %w[
+      IDENTIFIER INTEGER_LITERAL FLOAT_LITERAL FLOAT_EXPONENT_LITERAL
+      STRING_LITERAL NAME_LITERAL DATE_LITERAL TIME_LITERAL DATE_TIME_LITERAL
+      HEX_STRING_LITERAL BIT_TESTING_LITERAL MACRO_VAR_RESOLVE MACRO_IDENTIFIER
+      STRING_EXPR_END BIT_TESTING_LITERAL_EXPR_END DATE_LITERAL_EXPR_END
+      DATE_TIME_LITERAL_EXPR_END HEX_STRING_LITERAL_EXPR_END NAME_LITERAL_EXPR_END
+      TIME_LITERAL_EXPR_END RPAREN RBRACK
+    ].freeze
+    COMMA_NAMES = %w[COMMA].freeze
+    DATA_PROC_NAMES = %w[KW_DATA KW_PROC].freeze
+    DO_NAMES = %w[KW_DO].freeze
+    END_NAMES = %w[KW_END].freeze
+    RUN_QUIT_NAMES = %w[KW_RUN KW_QUIT].freeze
+    SEMI_NAMES = %w[SEMI].freeze
+
+    def self.from_config(config)
+      config = (config || {}).transform_keys(&:to_s)
+      fmt = (config["format"] || {}).transform_keys(&:to_s)
+
+      keywords = fmt.fetch("keywords", "preserve").to_sym
+      unless %i[preserve upper lower].include?(keywords)
+        raise ArgumentError,
+              "format.keywords must be 'preserve', 'upper', or 'lower' (got '#{keywords}')"
+      end
+
+      operator_spacing = fmt.key?("operator_spacing") ? !!fmt["operator_spacing"] : false
+
+      raw_width = fmt["indent_width"]
+      indent_width = case raw_width
+                     when nil, false then nil
+                     else
+                       w = Integer(raw_width)
+                       w > 0 ? w : nil
+                     end
+
+      new(keywords: keywords, operator_spacing: operator_spacing, indent_width: indent_width)
+    end
+
+    def initialize(keywords: :preserve, operator_spacing: false, indent_width: nil)
+      @keywords = keywords
+      @operator_spacing = operator_spacing
+      @indent_width = indent_width
+    end
+
+    def format(source)
+      return source if noop?
+
+      lexer = SasLexer::Lexer.new
+      all_tokens = begin
+        lexer.tokenize(source)
+      ensure
+        lexer.free
+      end
+
+      result = reconstruct(all_tokens)
+      result = apply_indentation(result, all_tokens) if @indent_width
+      result
+    end
+
+    private
+
+    def noop?
+      @keywords == :preserve && !@operator_spacing && @indent_width.nil?
+    end
+
+    # --- Type sets (built lazily from sas-lexer vocabulary) ---
+
+    def type_set(names)
+      tt = SasLexer::Lexer::TokenType
+      names.filter_map { |n| tt.const_get(n) if tt.const_defined?(n) }.to_set
+    end
+
+    def keyword_types
+      @keyword_types ||= SasLexer::Lexer::TokenType.constants
+        .select { |c| c.to_s.start_with?("KW_", "KWM_") }
+        .map { |c| SasLexer::Lexer::TokenType.const_get(c) }
+        .to_set
+    end
+
+    def binary_op_types = @binary_op_types ||= type_set(BINARY_OP_NAMES)
+    def unary_cand_types = @unary_cand_types ||= type_set(UNARY_CANDIDATE_NAMES)
+    def no_sp_before_types = @no_sp_before_types ||= type_set(NO_SPACE_BEFORE_NAMES)
+    def value_ending_types = @value_ending_types ||= type_set(VALUE_ENDING_NAMES)
+    def comma_types = @comma_types ||= type_set(COMMA_NAMES)
+    def data_proc_types = @data_proc_types ||= type_set(DATA_PROC_NAMES)
+    def do_types = @do_types ||= type_set(DO_NAMES)
+    def end_types = @end_types ||= type_set(END_NAMES)
+    def run_quit_types = @run_quit_types ||= type_set(RUN_QUIT_NAMES)
+    def semi_types = @semi_types ||= type_set(SEMI_NAMES)
+
+    # --- Reconstruction (keyword casing + operator spacing) ---
+
+    def apply_casing(token)
+      text = token[:text]
+      return text if @keywords == :preserve
+
+      default_ch = SasLexer::Lexer::TokenChannel::DEFAULT
+      return text unless token[:channel] == default_ch && keyword_types.include?(token[:type])
+
+      @keywords == :upper ? text.upcase : text.downcase
+    end
+
+    # Partition all_tokens into [{gap:, tok:}] segments where gap holds the
+    # non-DEFAULT tokens preceding tok, and tok is a DEFAULT-channel token.
+    def segmentize(all_tokens)
+      default_ch = SasLexer::Lexer::TokenChannel::DEFAULT
+      segments = []
+      gap = []
+      all_tokens.each do |t|
+        if t[:channel] == default_ch
+          segments << { gap: gap, tok: t }
+          gap = []
+        else
+          gap << t
+        end
+      end
+      segments << { gap: gap, tok: nil } unless gap.empty?
+      segments
+    end
+
+    def reconstruct(all_tokens)
+      segments = segmentize(all_tokens)
+      result = +""
+
+      segments.each_with_index do |seg, idx|
+        prev_prev = idx > 1 ? segments[idx - 2][:tok] : nil
+        prev = idx > 0 ? segments[idx - 1][:tok] : nil
+        cur = seg[:tok]
+        gap_text = seg[:gap].map { |t| t[:text] }.join
+
+        if @operator_spacing && prev && cur && !gap_text.include?("\n")
+          desired = gap_desired(prev_prev, prev, cur)
+          result << (desired.nil? ? gap_text : desired)
+        else
+          result << gap_text
+        end
+
+        result << apply_casing(cur) if cur
+      end
+
+      result
+    end
+
+    # Returns the desired whitespace between two same-line DEFAULT tokens:
+    #   nil → leave unchanged
+    #   ""  → no space
+    #   " " → exactly one space
+    #
+    # Three consecutive DEFAULT tokens are provided so unary PLUS/MINUS can be
+    # distinguished from binary: a PLUS/MINUS is binary only when its preceding
+    # token is a value-ending type (identifier, literal, or closing bracket).
+    def gap_desired(prev_prev_tok, prev_tok, next_tok)
+      pt = prev_tok[:type]
+      nt = next_tok[:type]
+      ppt = prev_prev_tok&.[](:type)
+
+      return "" if no_sp_before_types.include?(nt)
+      return " " if comma_types.include?(pt)
+
+      # Space after binary operator — but not after a unary +/-
+      if binary_op_types.include?(pt)
+        if unary_cand_types.include?(pt) && (ppt.nil? || !value_ending_types.include?(ppt))
+          return nil
+        end
+        return " "
+      end
+
+      # Space before binary operator — but not before a unary +/-
+      if binary_op_types.include?(nt)
+        return nil if unary_cand_types.include?(nt) && !value_ending_types.include?(pt)
+
+        return " "
+      end
+
+      nil
+    end
+
+    # --- Indentation ---
+
+    # Walk all_tokens and assign an indent level to each source line.
+    # Only the FIRST token on a given line determines its level (||= semantics).
+    # Nesting rules:
+    #   DATA / PROC → level 0; content after their SEMI → level 1
+    #   DO → content inside indented one further level
+    #   END → decrements level before assigning the END line's level
+    #   RUN / QUIT → resets to level 0
+    def compute_line_levels(all_tokens)
+      default_ch = SasLexer::Lexer::TokenChannel::DEFAULT
+      hidden_ch = SasLexer::Lexer::TokenChannel::HIDDEN
+      levels = {}
+      level = 0
+      after_data_proc = false
+
+      all_tokens.each do |tok|
+        next if tok[:channel] == hidden_ch
+
+        line = tok[:start_line]
+        type = tok[:type]
+
+        unless tok[:channel] == default_ch
+          levels[line] ||= level # comment tokens — indent at current level
+          next
+        end
+
+        if data_proc_types.include?(type)
+          level = 0
+          after_data_proc = true
+          levels[line] ||= level
+        elsif run_quit_types.include?(type)
+          level = 0
+          after_data_proc = false
+          levels[line] ||= level
+        elsif do_types.include?(type)
+          levels[line] ||= level
+          level += 1
+        elsif end_types.include?(type)
+          level = [level - 1, 0].max
+          levels[line] ||= level
+        elsif semi_types.include?(type) && after_data_proc
+          # Semicolon ends the DATA/PROC header — step body starts at level 1.
+          # Don't re-assign the SEMI's line (already marked at level 0 by DATA/PROC).
+          after_data_proc = false
+          level = 1
+        else
+          levels[line] ||= level
+        end
+      end
+
+      levels
+    end
+
+    # Re-indent each line of source using the computed per-line levels.
+    # Lines with no token coverage (blank lines) are left unchanged.
+    def apply_indentation(source, all_tokens)
+      line_levels = compute_line_levels(all_tokens)
+
+      source.each_line.with_index.map do |line, idx|
+        lineno = idx + 1
+        line_level = line_levels[lineno]
+        next line if line_level.nil?
+
+        eol = line.match(/\r?\n\z/)&.[](0) || ""
+        body = line[0...(line.length - eol.length)]
+        stripped = body.lstrip
+        next eol if stripped.empty? # blank line — strip stray whitespace
+
+        (" " * (@indent_width * line_level)) + stripped + eol
+      end.join
+    end
+  end
+end
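Editor's sketch (not part of the package diff): the nesting rules in `compute_line_levels` can be traced on a toy token stream. Tokens here are simplified to hypothetical `[type, line]` pairs with symbol types; the real method reads `SasLexer` token hashes and also handles hidden and comment channels, which this sketch leaves out.

```ruby
# Assign an indent level to each line from a stream of [type, line] pairs.
# Only the first token seen on a line fixes that line's level (||=).
def sketch_line_levels(tokens)
  levels = {}
  level = 0
  after_header = false

  tokens.each do |type, line|
    case type
    when :data, :proc
      level = 0
      after_header = true
      levels[line] ||= level
    when :run, :quit
      level = 0
      after_header = false
      levels[line] ||= level
    when :do
      levels[line] ||= level
      level += 1
    when :end
      level = [level - 1, 0].max
      levels[line] ||= level
    when :semi
      if after_header
        # Header semicolon: the step body starts at level 1.
        after_header = false
        level = 1
      else
        levels[line] ||= level
      end
    else
      levels[line] ||= level
    end
  end

  levels
end

# Toy stream for:
#   1: data a;
#   2: if x then do;
#   3: y = 1;
#   4: end;
#   5: run;
stream = [
  [:data, 1], [:ident, 1], [:semi, 1],
  [:if, 2], [:ident, 2], [:then, 2], [:do, 2], [:semi, 2],
  [:ident, 3], [:assign, 3], [:int, 3], [:semi, 3],
  [:end, 4], [:semi, 4],
  [:run, 5], [:semi, 5]
]
levels = sketch_line_levels(stream)
```

In this trace the `do` line keeps the level it had before the block opened, and the `end` line is assigned after the decrement, so opener and closer line up.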
data/lib/sas_linter/rules/encoding_issues.rb
CHANGED
@@ -286,8 +286,13 @@ class SasLinter
         findings
       end
 
+      # `seq` is ASCII-8BIT (from `pack("C*")`); `encode("UTF-8")` on a
+      # binary string replaces every byte ≥ 0x80 with U+FFFD before
+      # `codepoints` runs. The bytes are already a valid UTF-8 sequence
+      # by construction (caller checked `utf8_sequence_length`), so
+      # reinterpret rather than transcode.
       def codepoint(seq)
-        seq.
+        seq.dup.force_encoding("UTF-8").codepoints.first
       end
 
       # Returns the length (1-4) of a valid UTF-8 sequence starting
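Editor's sketch (not part of the package diff): the difference between reinterpreting and transcoding a binary string can be checked directly. The removed line is truncated in this diff, so the transcoding call below, including its `invalid:` / `undef:` options, is an assumption made only to show the U+FFFD effect the new comment describes.

```ruby
# Bytes of U+2014 (em dash) in UTF-8, packed as a binary string.
seq = [0xE2, 0x80, 0x94].pack("C*")
packed_encoding = seq.encoding

# Reinterpret: keep the bytes, change only the encoding tag.
reinterpreted = seq.dup.force_encoding("UTF-8").codepoints.first

# Transcode (assumed options): each high byte has no mapping from binary
# and is replaced, so the original character is lost.
transcoded = seq.encode("UTF-8", invalid: :replace, undef: :replace).codepoints
```

`force_encoding` is only safe here because the bytes are already known to form a valid UTF-8 sequence, as the added comment notes.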
data/lib/sas_linter/rules/invalid_numeric_literal.rb
ADDED
@@ -0,0 +1,49 @@
+# frozen_string_literal: true
+
+require_relative "../../sas_linter"
+require "sas_lexer"
+
+class SasLinter
+  module Rules
+    # Flag tokens the lexer typed as INTEGER_LITERAL whose text isn't
+    # actually a valid SAS numeric literal — typically a digit-prefixed
+    # identifier the lexer munched into one token, e.g.
+    #
+    #   if K9d = 1f then do; * `1f` is not valid SAS
+    #   x = 1d2; * `D` exponent is Fortran, not SAS
+    #
+    # Real SAS only accepts two integer-shaped literals:
+    #
+    #   * plain decimal: `[0-9]+`
+    #   * hex literal: `[0-9][0-9A-Fa-f]*[xX]` (trailing `x`/`X`)
+    #
+    # Float and exponent forms (`1.0`, `1e3`) get their own token types
+    # (FLOAT_LITERAL / FLOAT_EXPONENT_LITERAL) and aren't checked here.
+    class InvalidNumericLiteral < Rule
+      rule_id :invalid_numeric_literal
+      description "INTEGER_LITERAL must be a plain decimal or a SAS hex " \
+                  "literal (`0FFx`-style); reject suffixes like `1f` or " \
+                  "`1d2` that the lexer accepts but SAS does not."
+      severity :warning
+
+      TT = SasLexer::Lexer::TokenType
+
+      VALID = /\A(?:[0-9]+|[0-9][0-9A-Fa-f]*[xX])\z/
+
+      MESSAGE_SUFFIX = "is not a valid SAS numeric literal — SAS has no " \
+                       "`f`/`F`/`L` numeric suffixes and uses `E` (not `D`) " \
+                       "for exponents; hex literals must end in `x`/`X`."
+
+      def check(tokens, path:, all_tokens: nil, source: nil) # rubocop:disable Lint/UnusedMethodArgument
+        tokens.filter_map do |t|
+          next unless t[:type] == TT::INTEGER_LITERAL && !VALID.match?(t[:text])
+
+          finding(
+            line: t[:start_line], column: t[:start_column] + 1,
+            message: "`#{t[:text]}` #{MESSAGE_SUFFIX}", path: path
+          )
+        end
+      end
+    end
+  end
+end
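Editor's sketch (not part of the package diff): the `VALID` pattern from `invalid_numeric_literal.rb` can be exercised without the lexer. The token hashes and the `:integer_literal` / `:identifier` symbols are hypothetical stand-ins for `SasLexer` tokens, and `invalid_integer_texts` is an illustrative helper, not the rule's `check` method.

```ruby
# Same pattern as the rule: plain decimal, or hex digits ending in x/X.
VALID_INTEGER = /\A(?:[0-9]+|[0-9][0-9A-Fa-f]*[xX])\z/

# Collect the text of integer-typed tokens that fail the pattern.
def invalid_integer_texts(tokens)
  tokens
    .select { |t| t[:type] == :integer_literal && !VALID_INTEGER.match?(t[:text]) }
    .map { |t| t[:text] }
end

tokens = [
  { type: :integer_literal, text: "42" },
  { type: :integer_literal, text: "0FFx" },
  { type: :integer_literal, text: "1f" },
  { type: :integer_literal, text: "1d2" },
  { type: :identifier, text: "K9d" }
]
flagged = invalid_integer_texts(tokens)
```

Only tokens already typed as integer literals are tested against the pattern, so an identifier such as `K9d` is never flagged even though it would not match.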
data/lib/sas_linter/version.rb
CHANGED
data/lib/sas_linter.rb
CHANGED
@@ -241,6 +241,28 @@ class SasLinter
     findings
   end
 
+  # Apply formatting to a file in-place. Runs the formatter's own
+  # transformations first, then the autofix pipeline for rules that have
+  # `autofix: true` in config — identical to lint_with_fixes except that the
+  # formatter pass runs first. Rules that haven't been opted in to autofix
+  # (e.g. missing_assignment_semicolon without explicit `autofix: true`) are
+  # left alone so that --format stays cosmetic by default.
+  #
+  # Returns true if the file was rewritten, false if nothing changed.
+  def format_file(path, formatter:)
+    original = read_source(path)
+    modified = formatter.format(original)
+    @rules.each do |rule|
+      next unless rule.autofix? && rule.class.supports_autofix?
+
+      modified = rule.autofix(modified)
+    end
+    return false if modified.b == original.b
+
+    File.write(path, modified)
+    true
+  end
+
   private
 
   def read_source(path)
@@ -273,6 +295,7 @@ class SasLinter
   end
 end
 
+require_relative "sas_linter/formatter"
 require_relative "sas_linter/rules/unreachable_inner_branch_value"
 require_relative "sas_linter/rules/identical_if_else_branches"
 require_relative "sas_linter/rules/commented_out_guard"
@@ -285,3 +308,4 @@ require_relative "sas_linter/rules/encoding_issues"
 require_relative "sas_linter/rules/malformed_if_condition"
 require_relative "sas_linter/rules/missing_assignment_semicolon"
 require_relative "sas_linter/rules/variable_value_out_of_known_range"
+require_relative "sas_linter/rules/invalid_numeric_literal"
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: sas-linter
 version: !ruby/object:Gem::Version
-  version: 0.1
+  version: 0.2.1
 platform: ruby
 authors:
 - Craig McNamara
@@ -52,15 +52,19 @@ executables:
 extensions: []
 extra_rdoc_files: []
 files:
+- CLAUDE.md
 - LICENSE
 - README.md
 - Rakefile
 - bin/sas_lint
+- config/lint.yaml
 - lib/sas_linter.rb
+- lib/sas_linter/formatter.rb
 - lib/sas_linter/rules/choose_one_template.rb
 - lib/sas_linter/rules/commented_out_guard.rb
 - lib/sas_linter/rules/encoding_issues.rb
 - lib/sas_linter/rules/identical_if_else_branches.rb
+- lib/sas_linter/rules/invalid_numeric_literal.rb
 - lib/sas_linter/rules/line_endings.rb
 - lib/sas_linter/rules/malformed_if_condition.rb
 - lib/sas_linter/rules/missing_assignment_semicolon.rb