collie 0.1.0

Files changed (58)
  1. checksums.yaml +7 -0
  2. data/CHANGELOG.md +12 -0
  3. data/Gemfile +10 -0
  4. data/LICENSE.txt +21 -0
  5. data/README.md +333 -0
  6. data/Rakefile +9 -0
  7. data/collie.gemspec +37 -0
  8. data/docs/TUTORIAL.md +588 -0
  9. data/docs/index.html +56 -0
  10. data/docs/playground/README.md +134 -0
  11. data/docs/playground/build-collie-bundle.rb +85 -0
  12. data/docs/playground/css/styles.css +402 -0
  13. data/docs/playground/index.html +146 -0
  14. data/docs/playground/js/app.js +231 -0
  15. data/docs/playground/js/collie-bridge.js +186 -0
  16. data/docs/playground/js/editor.js +129 -0
  17. data/docs/playground/js/examples.js +80 -0
  18. data/docs/playground/js/ruby-runner.js +75 -0
  19. data/docs/playground/test-server.sh +18 -0
  20. data/exe/collie +15 -0
  21. data/lib/collie/analyzer/conflict.rb +114 -0
  22. data/lib/collie/analyzer/reachability.rb +83 -0
  23. data/lib/collie/analyzer/recursion.rb +96 -0
  24. data/lib/collie/analyzer/symbol_table.rb +67 -0
  25. data/lib/collie/ast.rb +183 -0
  26. data/lib/collie/cli.rb +249 -0
  27. data/lib/collie/config.rb +91 -0
  28. data/lib/collie/formatter/formatter.rb +196 -0
  29. data/lib/collie/formatter/options.rb +23 -0
  30. data/lib/collie/linter/base.rb +62 -0
  31. data/lib/collie/linter/registry.rb +34 -0
  32. data/lib/collie/linter/rules/ambiguous_precedence.rb +87 -0
  33. data/lib/collie/linter/rules/circular_reference.rb +89 -0
  34. data/lib/collie/linter/rules/consistent_tag_naming.rb +69 -0
  35. data/lib/collie/linter/rules/duplicate_token.rb +38 -0
  36. data/lib/collie/linter/rules/empty_action.rb +52 -0
  37. data/lib/collie/linter/rules/factorizable_rules.rb +67 -0
  38. data/lib/collie/linter/rules/left_recursion.rb +34 -0
  39. data/lib/collie/linter/rules/long_rule.rb +37 -0
  40. data/lib/collie/linter/rules/missing_start_symbol.rb +38 -0
  41. data/lib/collie/linter/rules/nonterminal_naming.rb +34 -0
  42. data/lib/collie/linter/rules/prec_improvement.rb +54 -0
  43. data/lib/collie/linter/rules/redundant_epsilon.rb +44 -0
  44. data/lib/collie/linter/rules/right_recursion.rb +35 -0
  45. data/lib/collie/linter/rules/token_naming.rb +39 -0
  46. data/lib/collie/linter/rules/trailing_whitespace.rb +46 -0
  47. data/lib/collie/linter/rules/undefined_symbol.rb +55 -0
  48. data/lib/collie/linter/rules/unreachable_rule.rb +49 -0
  49. data/lib/collie/linter/rules/unused_nonterminal.rb +93 -0
  50. data/lib/collie/linter/rules/unused_token.rb +82 -0
  51. data/lib/collie/parser/lexer.rb +349 -0
  52. data/lib/collie/parser/parser.rb +416 -0
  53. data/lib/collie/reporter/github.rb +35 -0
  54. data/lib/collie/reporter/json.rb +52 -0
  55. data/lib/collie/reporter/text.rb +97 -0
  56. data/lib/collie/version.rb +5 -0
  57. data/lib/collie.rb +52 -0
  58. metadata +145 -0
data/docs/TUTORIAL.md ADDED
@@ -0,0 +1,588 @@
1
+ # Collie Tutorial
2
+
3
+ This tutorial will guide you through using Collie to lint and format your grammar files.
4
+
5
+ ## Table of Contents
6
+
7
+ 1. [Getting Started](#getting-started)
8
+ 2. [Your First Grammar File](#your-first-grammar-file)
9
+ 3. [Running the Linter](#running-the-linter)
10
+ 4. [Understanding Lint Rules](#understanding-lint-rules)
11
+ 5. [Auto-correcting Issues](#auto-correcting-issues)
12
+ 6. [Configuring Collie](#configuring-collie)
13
+ 7. [Formatting Grammar Files](#formatting-grammar-files)
14
+ 8. [Working with Lrama Extensions](#working-with-lrama-extensions)
15
+ 9. [CI Integration](#ci-integration)
16
+
17
+ ## Getting Started
18
+
19
+ ### Installation
20
+
21
+ Install Collie using RubyGems:
22
+
23
+ ```bash
24
+ gem install collie
25
+ ```
26
+
27
+ Verify the installation:
28
+
29
+ ```bash
30
+ collie --version
31
+ ```
32
+
33
+ ### Project Setup
34
+
35
+ Create a new directory for your grammar project:
36
+
37
+ ```bash
38
+ mkdir my-parser
39
+ cd my-parser
40
+ ```
41
+
42
+ Initialize a Collie configuration file:
43
+
44
+ ```bash
45
+ collie init
46
+ ```
47
+
48
+ This creates a `.collie.yml` file with default settings.
49
+
50
+ ## Your First Grammar File
51
+
52
+ Create a simple calculator grammar file called `calc.y`:
53
+
54
+ ```yacc
55
+ %token NUMBER
56
+ %token PLUS MINUS TIMES DIVIDE
57
+ %token LPAREN RPAREN
58
+
59
+ %left PLUS MINUS
60
+ %left TIMES DIVIDE
61
+
62
+ %%
63
+
64
+ program
65
+ : expr
66
+ ;
67
+
68
+ expr
69
+ : expr PLUS expr { $$ = $1 + $3; }
70
+ | expr MINUS expr { $$ = $1 - $3; }
71
+ | expr TIMES expr { $$ = $1 * $3; }
72
+ | expr DIVIDE expr { $$ = $1 / $3; }
73
+ | LPAREN expr RPAREN { $$ = $2; }
74
+ | NUMBER { $$ = $1; }
75
+ ;
76
+
77
+ %%
78
+ ```
79
+
80
+ ## Running the Linter
81
+
82
+ Lint your grammar file:
83
+
84
+ ```bash
85
+ collie lint calc.y
86
+ ```
87
+
88
+ Auto-fix issues where possible:
89
+
90
+ ```bash
91
+ collie lint -a calc.y
92
+ ```
93
+
94
+ You should see output indicating any issues found. For the example above, you might see:
95
+
96
+ ```
97
+ calc.y
98
+ 14:1: warning: [LeftRecursion] Rule 'expr' uses left recursion (consider using right recursion for LL parsers)
99
+
100
+ 1 warning(s) found
101
+ ```
102
+
103
+ This warning is informational: left recursion is the natural, stack-efficient form for LR parser generators such as Lrama.
104
+
105
+ ### Understanding the Output
106
+
107
+ Each offense shows:
108
+ - File and location: `calc.y:14:1`
109
+ - Severity: `warning`, `error`, `convention`, or `info`
110
+ - Rule name: `[LeftRecursion]`
111
+ - Message: Description of the issue
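
Because each offense line has a fixed shape, it can be split into fields for scripting. The sketch below is a minimal illustration that assumes the text output looks exactly like the sample above (`LINE:COL: severity: [Rule] message`). `parse_offense` is a hypothetical helper, not part of Collie.

```bash
# Hypothetical helper (not part of Collie): split one offense line into
# "line|column|severity|rule|message". Assumes the sample format shown above.
parse_offense() {
  printf '%s\n' "$1" | sed -n -E \
    's/^[[:space:]]*([0-9]+):([0-9]+):[[:space:]]+([a-z]+):[[:space:]]+\[([A-Za-z]+)\][[:space:]]+(.*)$/\1|\2|\3|\4|\5/p'
}

parse_offense "  14:1: warning: [LeftRecursion] Rule 'expr' uses left recursion"
```

Lines that do not match the pattern, such as the file name or the summary line, produce no output, so the helper can be applied to every line of a report.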
112
+
113
+ ## Understanding Lint Rules
114
+
115
+ ### Error Rules (Must Fix)
116
+
117
+ These indicate actual problems with your grammar:
118
+
119
+ ```yacc
120
+ /* DuplicateToken - Token defined twice */
120
+ %token NUMBER
121
+ %token NUMBER /* Error: duplicate definition */
122
+
123
+ /* UndefinedSymbol - Using undeclared token */
124
+ expr: PLUS UNDEFINED /* Error: UNDEFINED not declared */
125
+
126
+ /* CircularReference - Infinite recursion */
127
+ rule_a: rule_b ;
128
+ rule_b: rule_a ; /* Error: circular reference with no base case */
130
+ ```
131
+
132
+ ### Warning Rules (Should Fix)
133
+
134
+ These indicate potential issues:
135
+
136
+ ```yacc
137
+ /* UnusedToken - Token declared but never used */
137
+ %token UNUSED_TOKEN /* Warning: never referenced in rules */
138
+
139
+ /* AmbiguousPrecedence - Operator without precedence */
140
+ expr: expr '+' expr ; /* Warning: '+' has no %left/%right/%nonassoc declaration */
142
+ ```
143
+
144
+ ### Convention Rules (Style)
145
+
146
+ These enforce naming conventions:
147
+
148
+ ```yacc
149
+ /* TokenNaming - Tokens should be UPPER_CASE */
149
+ %token Number /* Convention: should be NUMBER */
150
+
151
+ /* NonterminalNaming - Nonterminals should be snake_case */
152
+ ExprStmt: expr SEMICOLON ; /* Convention: should be expr_stmt */
154
+ ```
155
+
156
+ ### Info Rules (Optimization Hints)
157
+
158
+ These suggest potential improvements:
159
+
160
+ ```yacc
161
+ /* FactorizableRules - Common prefix can be factored */
161
+ stmt
162
+ : IF LPAREN expr RPAREN stmt
163
+ | IF LPAREN expr RPAREN stmt ELSE stmt
164
+ ;
165
+ /* Info: Consider factoring the common IF LPAREN expr RPAREN prefix */
166
+
167
+ /* RedundantEpsilon - Potentially unnecessary empty production */
168
+ optional_item
169
+ : item
170
+ | /* empty */
171
+ ;
172
+ /* Info: Consider making the item optional at the point of use instead */
174
+ ```
175
+
176
+ ## Auto-correcting Issues
177
+
178
+ Collie can automatically fix certain issues with the `-a` or `--autocorrect` flag:
179
+
180
+ ```bash
181
+ collie lint -a calc.y
182
+ ```
183
+
184
+ ### Autocorrectable Rules
185
+
186
+ The following rules support autocorrect:
187
+
188
+ - `TrailingWhitespace`: Removes trailing spaces and tabs from lines
189
+ - `EmptyAction`: Removes unnecessary empty action blocks `{ }`
190
+
191
+ When you run with `-a`, Collie will:
192
+ 1. Detect all offenses in your grammar file
193
+ 2. Apply fixes for autocorrectable offenses
194
+ 3. Write the corrected source back to the file
195
+ 4. Show how many offenses were auto-corrected
196
+
197
+ Example output:
198
+ ```
199
+ calc.y
200
+ 5:15: convention: [TrailingWhitespace] Trailing whitespace detected
201
+ 12:1: convention: [EmptyAction] Empty action block can be removed
202
+
203
+ Auto-corrected 2 offense(s) in calc.y
204
+ ```
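
To make the effect of the `TrailingWhitespace` autocorrection concrete, the sketch below reproduces the same transformation with `sed`. It only illustrates what the rule description says (removing trailing spaces and tabs). It is not Collie's implementation, and `strip_trailing_ws` is a hypothetical name.

```bash
# Illustration only: remove trailing spaces and tabs from every line on stdin.
strip_trailing_ws() {
  sed -E 's/[[:blank:]]+$//'
}

# Two grammar lines, one ending in spaces and one ending in a tab.
printf '%%token NUMBER   \n%%left PLUS MINUS\t\n' | strip_trailing_ws
```

Comparing the output of such a filter with the file that `collie lint -a` writes back is a quick way to check that only whitespace changed.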
205
+
206
+ ### Combining with Other Options
207
+
208
+ You can combine autocorrect with other options:
209
+
210
+ ```bash
211
+ # Autocorrect only specific rules
212
+ collie lint -a --only TrailingWhitespace calc.y
213
+
214
+ # Autocorrect all except specific rules
215
+ collie lint -a --except EmptyAction calc.y
216
+
217
+ # Autocorrect multiple files
218
+ collie lint -a **/*.y
219
+ ```
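
An unquoted `**/*.y` is normally expanded by the shell before Collie runs. In bash, `**` only recurses into subdirectories when the `globstar` option is enabled; otherwise it behaves like a single `*`. A shell-independent alternative is to collect the files with `find`, as in this minimal sketch. `grammar_files` is a hypothetical helper name.

```bash
# Hypothetical helper: list every .y file under a directory, at any depth.
# $1 is the root directory and defaults to the current directory.
grammar_files() {
  find "${1:-.}" -type f -name '*.y' | sort
}

# Usage sketch (assumes file names without whitespace or newlines):
#   collie lint -a $(grammar_files .)
```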
220
+
221
+ ## Configuring Collie
222
+
223
+ Edit your `.collie.yml` to customize behavior:
224
+
225
+ ### Disabling Specific Rules
226
+
227
+ ```yaml
228
+ rules:
229
+   LeftRecursion:
230
+     enabled: false # Don't warn about left recursion
231
+ ```
232
+
233
+ ### Configuring Rule Options
234
+
235
+ ```yaml
236
+ rules:
237
+   LongRule:
238
+     enabled: true
239
+     max_alternatives: 15 # Allow up to 15 alternatives (default: 10)
240
+
241
+   TokenNaming:
242
+     enabled: true
243
+     pattern: '^[A-Z][A-Z0-9_]*$' # Custom regex pattern
244
+ ```
245
+
246
+ ### Formatter Options
247
+
248
+ ```yaml
249
+ formatter:
250
+   indent_size: 2 # Number of spaces for indentation
251
+   align_tokens: true # Align token declarations
252
+   align_alternatives: true # Align rule alternatives
253
+   blank_lines_around_sections: 1 # Blank lines before/after %%
254
+   max_line_length: 100 # Maximum line length
255
+ ```
256
+
257
+ ### Excluding Files
258
+
259
+ ```yaml
260
+ exclude:
261
+ - 'vendor/**/*'
262
+ - 'generated/**/*'
263
+ - 'tmp/**/*'
264
+ ```
265
+
266
+ ## Formatting Grammar Files
267
+
268
+ ### Check Formatting
269
+
270
+ See if your file needs formatting without modifying it:
271
+
272
+ ```bash
273
+ collie fmt --check calc.y
274
+ ```
275
+
276
+ Exit code 0 means properly formatted, 1 means formatting needed.
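
Those exit codes make `--check` easy to script. The following sketch lists the files that still need formatting. It assumes only the 0/1 convention described above; `COLLIE` and `needs_formatting` are hypothetical names, introduced so the command can be swapped out.

```bash
# COLLIE is a hypothetical override; it defaults to the real executable.
COLLIE="${COLLIE:-collie}"

# Print each file for which `collie fmt --check` exits with status 1.
needs_formatting() {
  for f in "$@"; do
    status=0
    "$COLLIE" fmt --check "$f" >/dev/null 2>&1 || status=$?
    if [ "$status" -eq 1 ]; then
      printf '%s\n' "$f"
    fi
  done
}

# Usage sketch:
#   needs_formatting calc.y other.y
```

Any other non-zero status (for example, a missing executable) is deliberately not reported as "needs formatting".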
277
+
278
+ ### Show Formatting Changes
279
+
280
+ View the diff of what would change:
281
+
282
+ ```bash
283
+ collie fmt --diff calc.y
284
+ ```
285
+
286
+ Output shows unified diff:
287
+
288
+ ```diff
289
+ %token NUMBER
290
+ -%token PLUS MINUS TIMES DIVIDE
291
+ +%token PLUS
292
+ +%token MINUS
293
+ +%token TIMES
294
+ +%token DIVIDE
295
+ ```
296
+
297
+ ### Apply Formatting
298
+
299
+ Format the file in-place:
300
+
301
+ ```bash
302
+ collie fmt calc.y
303
+ ```
304
+
305
+ ### Formatted Output Example
306
+
307
+ Before:
308
+
309
+ ```yacc
310
+ %token NUMBER PLUS MINUS
311
+ %left PLUS MINUS
312
+ %%
313
+ expr:expr PLUS expr|NUMBER;
314
+ %%
315
+ ```
316
+
317
+ After:
318
+
319
+ ```yacc
320
+ %token NUMBER
321
+ %token PLUS
322
+ %token MINUS
323
+
324
+ %left PLUS MINUS
325
+
326
+ %%
327
+
328
+ expr
329
+ : expr PLUS expr
330
+ | NUMBER
331
+ ;
332
+
333
+ %%
334
+ ```
335
+
336
+ ## Working with Lrama Extensions
337
+
338
+ Lrama extends Yacc/Bison with parameterized rules, named references, and inline rules. Collie supports all three.
339
+
340
+ ### Parameterized Rules
341
+
342
+ Define reusable rule templates:
343
+
344
+ ```yacc
345
+ %rule pair(X, Y): X COMMA Y
346
+ { $$ = make_pair($1, $3); }
347
+ ;
348
+
349
+ %%
350
+
351
+ /* Use the template with different types */
352
+ number_pair: pair(NUMBER, NUMBER) ;
353
+ string_pair: pair(STRING, STRING) ;
354
+ ```
355
+
356
+ ### Named References
357
+
358
+ Use descriptive names instead of positional references:
359
+
360
+ ```yacc
361
+ assignment
362
+ : IDENTIFIER[var] EQUALS expr[value]
363
+ { assign_variable($var, $value); }
364
+ ;
365
+
366
+ /* Instead of: */
367
+ /* : IDENTIFIER EQUALS expr { assign_variable($1, $3); } */
368
+ ```
369
+
370
+ ### Inline Rules
371
+
372
+ Mark rules for inline expansion:
373
+
374
+ ```yacc
375
+ %inline opt(X)
376
+ : /* empty */
377
+ | X
378
+ ;
379
+
380
+ %%
381
+
382
+ /* This: */
382
+ optional_semicolon: opt(SEMICOLON) ;
383
+
384
+ /* Expands to: */
386
+ optional_semicolon
387
+ : /* empty */
388
+ | SEMICOLON
389
+ ;
390
+ ```
391
+
392
+ ### Full Example with Lrama Features
393
+
394
+ ```yacc
395
+ %token NUMBER IDENTIFIER
396
+ %token LPAREN RPAREN COMMA
397
+
398
+ %rule list(X): X | list(X) COMMA X ;
399
+
400
+ %%
401
+
402
+ program
403
+ : function_call
404
+ ;
405
+
406
+ function_call
407
+ : IDENTIFIER[func] LPAREN argument_list RPAREN
408
+ { call_function($func, $3); }
409
+ ;
410
+
411
+ argument_list
412
+ : list(expr)
413
+ | /* empty */ { $$ = empty_list(); }
414
+ ;
415
+
416
+ expr
417
+ : NUMBER[n] { $$ = make_number($n); }
418
+ | IDENTIFIER[id] { $$ = make_variable($id); }
419
+ ;
420
+
421
+ %%
422
+ ```
423
+
424
+ ## CI Integration
425
+
426
+ ### GitHub Actions
427
+
428
+ Add Collie to your GitHub Actions workflow.
429
+
430
+ Create `.github/workflows/lint.yml`:
431
+
432
+ ```yaml
433
+ name: Lint Grammar Files
434
+
435
+ on:
436
+   push:
437
+     branches: [ main ]
438
+   pull_request:
439
+     branches: [ main ]
440
+
441
+ jobs:
442
+   lint:
443
+     runs-on: ubuntu-latest
444
+
445
+     steps:
446
+       - uses: actions/checkout@v4
447
+
448
+       - name: Set up Ruby
449
+         uses: ruby/setup-ruby@v1
450
+         with:
451
+           ruby-version: '3.4'
452
+
453
+       - name: Install Collie
454
+         run: gem install collie
455
+
456
+       - name: Lint grammar files
457
+         run: collie lint **/*.y
458
+
459
+       - name: Check formatting
460
+         run: collie fmt --check **/*.y
461
+ ```
462
+
463
+ ### Use Reusable Workflow
464
+
465
+ For a simpler setup, use Collie's reusable workflow:
466
+
467
+ ```yaml
468
+ name: Lint Grammar Files
469
+
470
+ on: [push, pull_request]
471
+
472
+ jobs:
473
+   lint:
474
+     uses: ydah/collie/.github/workflows/lint.yml@main
475
+     with:
476
+       files: 'src/**/*.y'
477
+       config: '.collie.yml'
478
+       fail-on-warnings: false
479
+ ```
480
+
481
+ ### Pre-commit Hook
482
+
483
+ Add a git pre-commit hook to lint before committing.
484
+
485
+ Create `.git/hooks/pre-commit`:
486
+
487
+ ```bash
488
+ #!/bin/bash
489
+
490
+ # Lint staged .y files
491
+ git diff --cached --name-only --diff-filter=ACM | grep '\.y$' | while IFS= read -r file; do
492
+   collie lint "$file"
493
+   if [ $? -ne 0 ]; then
494
+     echo "Lint failed for $file"
495
+     exit 1
496
+   fi
497
+ done
498
+ ```
499
+
500
+ Make it executable:
501
+
502
+ ```bash
503
+ chmod +x .git/hooks/pre-commit
504
+ ```
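
The pre-commit hook above stops at the first failing file. A variant that lints every staged grammar file and reports all failures could look like the sketch below. `lint_paths` and the `COLLIE` override are hypothetical names. The only Collie behavior assumed is that `collie lint` exits non-zero when it finds problems, which the hook above already relies on.

```bash
# COLLIE is a hypothetical override; it defaults to the real executable.
COLLIE="${COLLIE:-collie}"

# Read paths from stdin, lint those ending in .y, and return 1 if any failed.
lint_paths() {
  failed=0
  while IFS= read -r path; do
    case "$path" in
      *.y) ;;
      *) continue ;;
    esac
    # </dev/null keeps the linter from consuming the path list on stdin.
    if ! "$COLLIE" lint "$path" </dev/null; then
      echo "Lint failed for $path" >&2
      failed=1
    fi
  done
  return "$failed"
}

# In .git/hooks/pre-commit:
#   git diff --cached --name-only --diff-filter=ACM | lint_paths || exit 1
```

Reading the path list inside the function, rather than exiting from within the loop, lets the hook show every failing file in a single run.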
505
+
506
+ ## Best Practices
507
+
508
+ ### 1. Start with Default Rules
509
+
510
+ Begin with all rules enabled, then disable specific ones based on your needs.
511
+
512
+ ### 2. Use Consistent Naming
513
+
514
+ Follow these conventions:
515
+ - Tokens: `UPPER_CASE` (e.g., `NUMBER`, `IDENTIFIER`)
516
+ - Nonterminals: `snake_case` (e.g., `expr_stmt`, `function_call`)
517
+ - Type tags: Consistent style (all `snake_case` or all `camelCase`)
518
+
519
+ ### 3. Declare Precedence
520
+
521
+ Always declare precedence for operators:
522
+
523
+ ```yacc
524
+ %left '+' '-'
525
+ %left '*' '/'
526
+ %right '^'
527
+ %nonassoc UMINUS /* Unary minus */
528
+ ```
529
+
530
+ ### 4. Document Complex Rules
531
+
532
+ Add comments for non-obvious grammar rules:
533
+
534
+ ```yacc
535
+ /* Function definition with optional parameter list */
536
+ function_def
537
+ : DEF IDENTIFIER opt_params block
538
+ { $$ = make_function($2, $3, $4); }
539
+ ;
540
+ ```
541
+
542
+ ### 5. Run Collie Regularly
543
+
544
+ Add to your development workflow:
545
+
546
+ ```bash
547
+ # Before committing
548
+ collie lint **/*.y && collie fmt **/*.y
549
+
550
+ # In CI
551
+ collie lint --format github **/*.y
552
+ ```
553
+
554
+ ## Troubleshooting
555
+
556
+ ### "Parser error: unexpected token"
557
+
558
+ Your grammar file has syntax errors. Check:
559
+ - Matching braces in actions `{ }`
560
+ - Proper section separators `%%`
561
+ - Valid token/nonterminal names
562
+
563
+ ### "No offenses detected" but file has issues
564
+
565
+ Check if the rule is enabled in `.collie.yml`:
566
+
567
+ ```yaml
568
+ rules:
569
+   RuleName:
570
+     enabled: true
571
+ ```
572
+
573
+ ### Formatting produces unexpected output
574
+
575
+ File a bug report with:
576
+ 1. Original file content
577
+ 2. Expected output
578
+ 3. Actual output
579
+ 4. Your `.collie.yml` configuration
580
+
581
+ ## Next Steps
582
+
583
+ - Read the [Configuration Guide](CONFIGURATION.md) for advanced config options
584
+ - Check the [Rule Reference](RULES.md) for detailed rule descriptions
585
+ - Explore [examples/](../examples/) for real-world grammar files
586
+ - Join the discussion on [GitHub](https://github.com/ydah/collie)
587
+
588
+ Happy parsing!
data/docs/index.html ADDED
@@ -0,0 +1,56 @@
1
+ <!DOCTYPE html>
2
+ <html lang="en">
3
+ <head>
4
+ <meta charset="UTF-8">
5
+ <meta name="viewport" content="width=device-width, initial-scale=1.0">
6
+ <meta http-equiv="refresh" content="0; url=./playground/">
7
+ <title>Collie - Redirecting to Playground</title>
8
+ <style>
9
+ body {
10
+ font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen, Ubuntu, Cantarell, sans-serif;
11
+ display: flex;
12
+ align-items: center;
13
+ justify-content: center;
14
+ min-height: 100vh;
15
+ margin: 0;
16
+ background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
17
+ color: white;
18
+ text-align: center;
19
+ }
20
+ .content {
21
+ max-width: 500px;
22
+ padding: 2rem;
23
+ }
24
+ h1 {
25
+ font-size: 3rem;
26
+ margin-bottom: 1rem;
27
+ }
28
+ p {
29
+ font-size: 1.2rem;
30
+ margin-bottom: 2rem;
31
+ opacity: 0.9;
32
+ }
33
+ a {
34
+ display: inline-block;
35
+ padding: 1rem 2rem;
36
+ background: white;
37
+ color: #667eea;
38
+ text-decoration: none;
39
+ border-radius: 8px;
40
+ font-weight: 600;
41
+ transition: transform 0.2s;
42
+ }
43
+ a:hover {
44
+ transform: translateY(-2px);
45
+ }
46
+ </style>
47
+ </head>
48
+ <body>
49
+ <div class="content">
50
+ <h1>🐕 Collie</h1>
51
+ <p>Linter and formatter for Lrama-style BNF grammar files</p>
52
+ <p>Redirecting to playground...</p>
53
+ <a href="./playground/">Go to Playground</a>
54
+ </div>
55
+ </body>
56
+ </html>