yara-ffi 4.0.0 → 4.1.0

data/USAGE.md ADDED
@@ -0,0 +1,747 @@
1
+ # yara-ffi Usage Guide
2
+
3
+ This guide explains how to use the yara-ffi Ruby gem, which provides FFI bindings to YARA-X (the modern Rust-based YARA implementation) for pattern matching and malware detection.
4
+
5
+ ## Quick Reference
6
+
7
+ ### Basic Scanning
8
+
9
+ ```ruby
10
+ require 'yara'
11
+
12
+ # Quick test
13
+ results = Yara.test(rule_string, data)
14
+ puts "Matched: #{results.first.rule_name}" if results.first&.match?
15
+
16
+ # Scan with block
17
+ Yara.scan(rule_string, data) do |result|
18
+ puts "Found: #{result.rule_name}"
19
+ end
20
+ ```
21
+
22
+ ### Scanner with Resource Management
23
+
24
+ ```ruby
25
+ # Recommended: Automatic cleanup
26
+ Yara::Scanner.open(rule) do |scanner|
27
+ scanner.compile
28
+ results = scanner.scan(data)
29
+ end
30
+
31
+ # Manual: Must call close()
32
+ scanner = Yara::Scanner.new
33
+ scanner.add_rule(rule)
34
+ scanner.compile
35
+ results = scanner.scan(data)
36
+ scanner.close
37
+ ```
38
+
39
+ ### Pattern Match Analysis
40
+
41
+ ```ruby
42
+ results = Yara.scan(rule, data)
43
+ result = results.first
44
+
45
+ # Check specific patterns
46
+ result.pattern_matched?(:$suspicious) # => true/false
47
+ result.matches_for_pattern(:$api_call) # => [PatternMatch, ...]
48
+ result.total_matches # => 5
49
+
50
+ # Get match details
51
+ matches = result.matches_for_pattern(:$pattern)
52
+ matches.each do |match|
53
+ puts "At offset #{match.offset}: #{match.matched_data(data)}"
54
+ end
55
+ ```
56
+
57
+ ### Rule Metadata & Tags
58
+
59
+ ```ruby
60
+ result = results.first
61
+
62
+ # Metadata access
63
+ result.rule_meta[:author] # Raw access
64
+ result.metadata_string(:author) # Type-safe access
65
+ result.metadata_int(:severity) # Returns Integer or nil
66
+
67
+ # Tags
68
+ result.tags # => ["malware", "trojan"]
69
+ result.has_tag?("malware") # => true
70
+ ```
71
+
72
+ ### Global Variables
73
+
74
+ ```ruby
75
+ # Set individual globals
76
+ scanner.set_global_str("ENV", "production")
77
+ scanner.set_global_bool("DEBUG", false)
78
+ scanner.set_global_int("MAX_SIZE", 1000)
79
+
80
+ # Set multiple globals
81
+ scanner.set_globals({
82
+ "ENV" => "production",
83
+ "DEBUG" => false,
84
+ "RETRIES" => 3
85
+ })
86
+ ```
87
+
88
+ ### Rule Compilation & Serialization
89
+
90
+ ```ruby
91
+ # Advanced compilation
92
+ compiler = Yara::Compiler.new
93
+ compiler.define_global_str("ENV", "prod")
94
+ compiler.add_source(rule1)
95
+ compiler.add_source(rule2)
96
+
97
+ # Serialize for reuse
98
+ serialized = compiler.build_serialized
99
+ File.binwrite("rules.bin", serialized)
100
+
101
+ # Later: deserialize and scan
102
+ data = File.binread("rules.bin")
103
+ scanner = Yara::Scanner.from_serialized(data)
104
+ results = scanner.scan(target_data)
105
+ ```
106
+
107
+ ### Error Handling
108
+
109
+ ```ruby
110
+ begin
111
+ Yara::Scanner.open(rule) do |scanner|
112
+ scanner.compile
113
+ scanner.set_timeout(5000) # 5 seconds
114
+ results = scanner.scan(data)
115
+ end
116
+ rescue Yara::Scanner::CompilationError => e
117
+ puts "Rule error: #{e.message}"
118
+ rescue Yara::Scanner::ScanError => e
119
+ puts "Scan failed: #{e.message}"
120
+ end
121
+ ```
122
+
123
+ ### Performance Tips
124
+
125
+ ```ruby
126
+ # 1. Use serialized rules for repeated scans
127
+ serialized = compiler.build_serialized
128
+ scanners = 10.times.map { Yara::Scanner.from_serialized(serialized) }
129
+
130
+ # 2. Set reasonable timeouts
131
+ scanner.set_timeout(10000) # 10 seconds
132
+
133
+ # 3. Use block syntax for streaming
134
+ Yara.scan(rule, large_data) do |result|
135
+ process_immediately(result) # Don't accumulate all results
136
+ end
137
+ ```
138
+
139
+ ---
140
+
141
+ ## Table of Contents
142
+
143
+ - [Quick Start](#quick-start)
144
+ - [Basic Scanning](#basic-scanning)
145
+ - [Advanced Scanner Usage](#advanced-scanner-usage)
146
+ - [Pattern Matching Analysis](#pattern-matching-analysis)
147
+ - [Rule Compilation & Management](#rule-compilation--management)
148
+ - [Global Variables](#global-variables)
149
+ - [Rule Serialization](#rule-serialization)
150
+ - [Metadata & Tags](#metadata--tags)
151
+ - [Namespaces](#namespaces)
152
+ - [Performance & Timeouts](#performance--timeouts)
153
+ - [Error Handling](#error-handling)
154
+ - [Best Practices](#best-practices)
155
+
156
+ ## Quick Start
157
+
158
+ ### Simple Rule Testing
159
+
160
+ ```ruby
161
+ require 'yara'
162
+
163
+ # Basic rule
164
+ rule = <<-RULE
165
+ rule ExampleRule
166
+ {
167
+ meta:
168
+ description = "Example rule for testing"
169
+ author = "security_team"
170
+
171
+ strings:
172
+ $text_string = "malware"
173
+ $text_regex = /suspicious[0-9]+/
174
+
175
+ condition:
176
+ $text_string or $text_regex
177
+ }
178
+ RULE
179
+
180
+ # Test against data
181
+ results = Yara.test(rule, "This contains malware signatures")
182
+ puts results.first.match? # => true
183
+ puts results.first.rule_name # => "ExampleRule"
184
+ ```
185
+
186
+ ### Scanning with Block Processing
187
+
188
+ ```ruby
189
+ # Process results as they're found
190
+ Yara.scan(rule, "sample data") do |result|
191
+ puts "Matched: #{result.rule_name}"
192
+ puts " Author: #{result.rule_meta[:author]}"
193
+ puts " Total matches: #{result.total_matches}"
194
+ end
195
+ ```
196
+
197
+ ## Basic Scanning
198
+
199
+ ### Using Convenience Methods
200
+
201
+ The `Yara.test` and `Yara.scan` methods provide the simplest interface for basic scanning:
202
+
203
+ ```ruby
204
+ # Quick test - returns ScanResults collection
205
+ results = Yara.test(rule, data)
206
+ results.each { |result| puts "Match: #{result.rule_name}" }
207
+
208
+ # Scan with optional block processing
209
+ scan_results = Yara.scan(rule, data)
210
+ # or
211
+ Yara.scan(rule, data) { |result| process_match(result) }
212
+ ```
213
+
214
+ ### Multiple Rules
215
+
216
+ ```ruby
217
+ rule1 = <<-RULE
218
+ rule RuleOne
219
+ {
220
+ strings: $a = "pattern one"
221
+ condition: $a
222
+ }
223
+ RULE
224
+
225
+ rule2 = <<-RULE
226
+ rule RuleTwo
227
+ {
228
+ strings: $b = "pattern two"
229
+ condition: $b
230
+ }
231
+ RULE
232
+
233
+ scanner = Yara::Scanner.new
234
+ scanner.add_rule(rule1)
235
+ scanner.add_rule(rule2)
236
+ scanner.compile
237
+
238
+ results = scanner.scan("text with pattern one and pattern two")
239
+ puts results.map(&:rule_name) # => ["RuleOne", "RuleTwo"]
240
+ scanner.close
241
+ ```
242
+
243
+ ## Advanced Scanner Usage
244
+
245
+ ### Resource Management
246
+
247
+ A scanner must be closed when you are done with it; otherwise its memory leaks. Prefer the block form, which closes the scanner for you:
248
+
249
+ ```ruby
250
+ # Recommended: Automatic resource cleanup
251
+ Yara::Scanner.open(rule) do |scanner|
252
+ scanner.compile
253
+ results = scanner.scan(data)
254
+ # scanner is automatically closed when block exits
255
+ end
256
+
257
+ # Manual resource management (requires explicit close)
258
+ scanner = Yara::Scanner.new
259
+ scanner.add_rule(rule)
260
+ scanner.compile
261
+ results = scanner.scan(data)
262
+ scanner.close # Required to prevent memory leaks
263
+ ```
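The block form's cleanup guarantee can be illustrated without the gem. The sketch below assumes `open` is built on `ensure`, which is the usual Ruby idiom; `FakeScanner` is a hypothetical stand-in, not yara-ffi's actual implementation.

```ruby
# Hypothetical stand-in for a scanner that owns a native resource.
class FakeScanner
  attr_reader :closed

  # Yields a new scanner and closes it when the block exits,
  # whether the block returns normally or raises.
  def self.open
    scanner = new
    return scanner unless block_given?

    begin
      yield scanner
    ensure
      scanner.close
    end
  end

  def initialize
    @closed = false
  end

  def close
    @closed = true
  end
end

seen = nil
begin
  FakeScanner.open do |scanner|
    seen = scanner
    raise "scan failed" # simulate an error in the middle of the block
  end
rescue RuntimeError
  # The error still reaches the caller; cleanup has already run.
end

puts seen.closed
```

With the manual form, the same guarantee requires wrapping the work in your own `begin`/`ensure`.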
264
+
265
+ ### Scanner Lifecycle
266
+
267
+ The scanner follows a strict compile-then-scan workflow:
268
+
269
+ ```ruby
270
+ scanner = Yara::Scanner.new
271
+
272
+ # 1. Add rules (can add multiple)
273
+ scanner.add_rule(rule1)
274
+ scanner.add_rule(rule2, namespace: "custom")
275
+
276
+ # 2. Compile all rules
277
+ scanner.compile
278
+
279
+ # 3. Scan data (can scan multiple times)
280
+ results1 = scanner.scan(data1)
281
+ results2 = scanner.scan(data2)
282
+
283
+ # 4. Clean up
284
+ scanner.close
285
+ ```
286
+
287
+ ## Pattern Matching Analysis
288
+
289
+ ### Detailed Pattern Match Information
290
+
291
+ Access precise match locations and extracted data:
292
+
293
+ ```ruby
294
+ rule = <<-RULE
295
+ rule PatternAnalysis
296
+ {
297
+ strings:
298
+ $api_call = "GetProcAddress"
299
+ $registry = "HKEY_LOCAL_MACHINE"
300
+ $suspicious = "cmd.exe"
301
+
302
+ condition:
303
+ 2 of them
304
+ }
305
+ RULE
306
+
307
+ data = "Malware uses GetProcAddress and HKEY_LOCAL_MACHINE registry keys"
308
+ results = Yara.scan(rule, data)
309
+ result = results.first
310
+
311
+ # Access pattern matches by name
312
+ api_matches = result.matches_for_pattern(:$api_call)
313
+ api_matches.each do |match|
314
+ puts "API call found at offset #{match.offset}"
315
+ puts "Matched text: '#{match.matched_data(data)}'"
316
+ puts "Match length: #{match.length} bytes"
317
+ end
318
+
319
+ # Get all pattern matches
320
+ result.pattern_matches.each do |pattern_name, matches|
321
+ puts "Pattern #{pattern_name}: #{matches.size} matches"
322
+ matches.each do |match|
323
+ puts " At offset #{match.offset}: '#{match.matched_data(data)}'"
324
+ end
325
+ end
326
+ ```
327
+
328
+ ### Pattern Match Convenience Methods
329
+
330
+ ```ruby
331
+ # Check if specific patterns matched
332
+ if result.pattern_matched?(:$suspicious)
333
+ puts "Suspicious pattern detected!"
334
+ end
335
+
336
+ # Get total match count across all patterns
337
+ puts "Total matches: #{result.total_matches}"
338
+
339
+ # Get all matches sorted by location
340
+ all_matches = result.all_matches.sort_by(&:offset)
341
+ all_matches.each { |m| puts "Match at #{m.offset}" }
342
+
343
+ # Check for overlapping matches
344
+ match1 = result.matches_for_pattern(:$pattern1).first
345
+ match2 = result.matches_for_pattern(:$pattern2).first
346
+ if match1 && match2 && match1.overlaps?(match2)
347
+ puts "Patterns overlap in the data"
348
+ end
349
+ ```
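As a rough sketch of what an overlap check involves, the example below treats each match as a half-open byte range `[offset, offset + length)`. `Span` is a hypothetical stand-in, and whether the gem's `overlaps?` uses exactly these semantics (for example, for adjacent matches) is an assumption.

```ruby
# Hypothetical stand-in for a pattern match; not the gem's PatternMatch class.
Span = Struct.new(:offset, :length) do
  def end_offset
    offset + length
  end

  # Two half-open ranges overlap when each starts before the other ends.
  def overlaps?(other)
    offset < other.end_offset && other.offset < end_offset
  end
end

a = Span.new(0, 5) # covers bytes 0..4
b = Span.new(4, 3) # covers bytes 4..6, shares byte 4 with a
c = Span.new(5, 3) # covers bytes 5..7, only adjacent to a

puts a.overlaps?(b)
puts a.overlaps?(c)
```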
350
+
351
+ ## Rule Compilation & Management
352
+
353
+ ### Using the Compiler Class
354
+
355
+ For advanced compilation scenarios, use `Yara::Compiler` directly:
356
+
357
+ ```ruby
358
+ compiler = Yara::Compiler.new
359
+
360
+ # Add multiple sources
361
+ compiler.add_source(rule1, "rule1.yar")
362
+ compiler.add_source(rule2, "rule2.yar")
363
+
364
+ # Define global variables
365
+ compiler.define_global_str("ENV", "production")
366
+ compiler.define_global_int("MAX_SIZE", 1000)
367
+ compiler.define_global_bool("DEBUG", false)
368
+
369
+ # Build rules
370
+ rules_ptr = compiler.build
371
+
372
+ # Create scanner from compiled rules
373
+ scanner = Yara::Scanner.from_rules(rules_ptr, owns_rules: true)
374
+ results = scanner.scan(data)
375
+
376
+ # Cleanup
377
+ scanner.close
378
+ compiler.destroy
379
+ ```
380
+
381
+ ### Compilation Error Handling
382
+
383
+ ```ruby
384
+ begin
385
+ compiler.add_source("rule bad { condition: undefined_var }")
386
+ compiler.build
387
+ rescue Yara::Compiler::CompileError => e
388
+ # Get detailed error information
389
+ errors = compiler.errors_json
390
+ errors.each do |error|
391
+ puts "Error: #{error['message']}"
392
+ puts "Line: #{error['line']}"
393
+ end
394
+
395
+ # Get warnings too
396
+ warnings = compiler.warnings_json
397
+ warnings.each { |warn| puts "Warning: #{warn['message']}" }
398
+ end
399
+ ```
400
+
401
+ ## Global Variables
402
+
403
+ ### Setting Global Variables on Scanner
404
+
405
+ ```ruby
406
+ rule_with_globals = <<-RULE
407
+ rule ConfigurableRule
408
+ {
409
+ condition:
410
+ ENV == "production" and DEBUG == false and RETRIES >= 3
411
+ }
412
+ RULE
413
+
414
+ scanner = Yara::Scanner.new
415
+ scanner.add_rule(rule_with_globals)
416
+ scanner.compile
417
+
418
+ # Set individual globals
419
+ scanner.set_global_str("ENV", "production")
420
+ scanner.set_global_bool("DEBUG", false)
421
+ scanner.set_global_int("RETRIES", 5)
422
+ scanner.set_global_float("THRESHOLD", 0.95)
423
+
424
+ results = scanner.scan("") # Rule depends only on globals
425
+ scanner.close
426
+ ```
427
+
428
+ ### Bulk Global Variable Setting
429
+
430
+ ```ruby
431
+ # Set multiple globals at once
432
+ globals = {
433
+ "ENV" => "production",
434
+ "DEBUG" => false,
435
+ "RETRIES" => 3,
436
+ "THRESHOLD" => 0.95
437
+ }
438
+
439
+ # Strict mode (default) - raises on undefined globals
440
+ scanner.set_globals(globals)
441
+
442
+ # Lenient mode - silently skips undefined globals
443
+ scanner.set_globals(globals, strict: false)
444
+ ```
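The difference between strict and lenient mode is easiest to see with a stand-in. The sketch below is a guess at how a bulk setter could dispatch on Ruby types; `RecordingScanner` and `apply_globals` are hypothetical names, and the `KeyError` raised for an undefined global is an assumption rather than the gem's documented behavior.

```ruby
# Hypothetical stand-in: records setter calls and knows which globals exist.
class RecordingScanner
  attr_reader :calls

  def initialize(defined)
    @defined = defined
    @calls = []
  end

  %i[set_global_str set_global_bool set_global_int set_global_float].each do |setter|
    define_method(setter) do |name, value|
      raise KeyError, "undefined global: #{name}" unless @defined.include?(name)

      @calls << [setter, name, value]
    end
  end
end

# Choose a typed setter from the Ruby class of each value.
def apply_globals(scanner, globals, strict: true)
  globals.each do |name, value|
    setter = case value
             when String      then :set_global_str
             when true, false then :set_global_bool
             when Integer     then :set_global_int
             when Float       then :set_global_float
             else raise ArgumentError, "unsupported type for #{name}: #{value.class}"
             end
    begin
      scanner.public_send(setter, name, value)
    rescue KeyError
      raise if strict # lenient mode skips globals the rules never declared
    end
  end
end

scanner = RecordingScanner.new(%w[ENV DEBUG RETRIES])
globals = { "ENV" => "production", "DEBUG" => false, "RETRIES" => 3, "THRESHOLD" => 0.95 }
apply_globals(scanner, globals, strict: false)
puts scanner.calls.inspect
```

Here `THRESHOLD` is skipped because the stand-in does not declare it; with `strict: true` the same call would raise instead.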
445
+
446
+ ## Rule Serialization
447
+
448
+ ### Serialize and Deserialize Rules
449
+
450
+ Compile rules once, then persist the result or share it across processes:
451
+
452
+ ```ruby
453
+ # Compile and serialize rules
454
+ compiler = Yara::Compiler.new
455
+ compiler.add_source(rule1)
456
+ compiler.add_source(rule2)
457
+ serialized_rules = compiler.build_serialized
458
+
459
+ # Save to file or database
460
+ File.binwrite("rules.bin", serialized_rules)
461
+
462
+ # Later, deserialize and use
463
+ serialized_data = File.binread("rules.bin")
464
+ scanner = Yara::Scanner.from_serialized(serialized_data)
465
+ results = scanner.scan(data) # No compile step needed!
466
+
467
+ scanner.close
468
+ ```
469
+
470
+ ## Metadata & Tags
471
+
472
+ ### Accessing Rule Metadata
473
+
474
+ ```ruby
475
+ rule_with_metadata = <<-RULE
476
+ rule MetadataExample
477
+ {
478
+ meta:
479
+ author = "Security Team"
480
+ description = "Detects malware patterns"
481
+ version = 2
482
+ severity = 8
483
+ active = true
484
+ confidence = 0.95
485
+
486
+ strings:
487
+ $pattern = "suspicious"
488
+
489
+ condition:
490
+ $pattern
491
+ }
492
+ RULE
493
+
494
+ results = Yara.scan(rule_with_metadata, "suspicious activity")
495
+ result = results.first
496
+
497
+ # Access metadata hash
498
+ puts result.rule_meta[:author] # => "Security Team"
499
+ puts result.rule_meta[:severity] # => 8
500
+ puts result.rule_meta[:active] # => true
501
+
502
+ # Type-safe metadata access
503
+ puts result.metadata_string(:author) # => "Security Team"
504
+ puts result.metadata_int(:severity) # => 8
505
+ puts result.metadata_bool(:active) # => true
506
+ puts result.metadata_float(:confidence) # => 0.95
507
+ ```
508
+
509
+ ### Working with Tags
510
+
511
+ ```ruby
512
+ rule_with_tags = <<-RULE
513
+ rule TaggedRule : malware suspicious windows
514
+ {
515
+ meta:
516
+ author = "Security Team"
517
+
518
+ strings:
519
+ $pattern = "evil"
520
+
521
+ condition:
522
+ $pattern
523
+ }
524
+ RULE
525
+
526
+ results = Yara.scan(rule_with_tags, "evil code")
527
+ result = results.first
528
+
529
+ # Access tags array
530
+ puts result.tags # => ["malware", "suspicious", "windows"]
531
+
532
+ # Check for specific tags
533
+ if result.has_tag?("malware")
534
+ puts "Malware detected!"
535
+ end
536
+
537
+ if result.has_tag?("windows") && result.has_tag?("suspicious")
538
+ puts "Windows-specific suspicious activity"
539
+ end
540
+ ```
541
+
542
+ ## Namespaces
543
+
544
+ ### Organizing Rules with Namespaces
545
+
546
+ ```ruby
547
+ # Add rules to specific namespaces
548
+ scanner = Yara::Scanner.new
549
+ scanner.add_rule(malware_rule, namespace: "malware")
550
+ scanner.add_rule(pup_rule, namespace: "pup")
551
+ scanner.add_rule(generic_rule) # Default namespace
552
+ scanner.compile
553
+
554
+ results = scanner.scan(data)
555
+ results.each do |result|
556
+ puts "Match: #{result.qualified_name}" # e.g., "malware.trojan_rule"
557
+ puts "Namespace: #{result.namespace}" # e.g., "malware"
558
+ end
559
+ scanner.close
560
+ ```
561
+
562
+ ### Namespace in Rule Source
563
+
564
+ ```ruby
565
+ # Namespace can be defined in rule source
566
+ rule_with_namespace = <<-RULE
567
+ namespace malware {
568
+ rule TrojanDetector
569
+ {
570
+ strings: $trojan = "trojan"
571
+ condition: $trojan
572
+ }
573
+ }
574
+ RULE
575
+
576
+ scanner.add_rule(rule_with_namespace)
577
+ ```
578
+
579
+ ## Performance & Timeouts
580
+
581
+ ### Setting Scan Timeouts
582
+
583
+ ```ruby
584
+ scanner = Yara::Scanner.new
585
+ scanner.add_rule(complex_rule)
586
+ scanner.compile
587
+
588
+ # Set timeout to 5 seconds (5000 milliseconds)
589
+ scanner.set_timeout(5000)
590
+
591
+ begin
592
+ results = scanner.scan(large_data)
593
+ rescue Yara::Scanner::ScanError => e
594
+ if e.message.include?("timeout")
595
+ puts "Scan timed out - data too large or rule too complex"
596
+ end
597
+ end
598
+
599
+ scanner.close
600
+ ```
601
+
602
+ ### Optimizing Performance
603
+
604
+ ```ruby
605
+ # Use serialized rules for repeated usage
606
+ compiler = Yara::Compiler.new
607
+ compiler.add_source(ruleset)
608
+ serialized = compiler.build_serialized
609
+
610
+ # Create multiple scanners from same compiled rules
611
+ 10.times.map do
612
+ scanner = Yara::Scanner.from_serialized(serialized)
613
+ # Process data in parallel, one scanner per thread
614
+ Thread.new { scanner.scan(data_chunk) }
615
+ end.each(&:join) # wait for every scan to finish
616
+ ```
617
+
618
+ ## Error Handling
619
+
620
+ ### Common Exception Types
621
+
622
+ ```ruby
623
+ begin
624
+ scanner = Yara::Scanner.new
625
+ scanner.add_rule(rule)
626
+ scanner.compile
627
+ results = scanner.scan(data)
628
+ rescue Yara::Scanner::CompilationError => e
629
+ puts "Rule compilation failed: #{e.message}"
630
+ rescue Yara::Scanner::ScanError => e
631
+ puts "Scanning failed: #{e.message}"
632
+ rescue Yara::Scanner::NotCompiledError => e
633
+ puts "Attempted to scan before compiling: #{e.message}"
634
+ ensure
635
+ scanner&.close
636
+ end
637
+ ```
638
+
639
+ ### Compiler Error Diagnostics
640
+
641
+ ```ruby
642
+ begin
643
+ compiler = Yara::Compiler.new
644
+ compiler.add_source(invalid_rule)
645
+ compiler.build
646
+ rescue Yara::Compiler::CompileError
647
+ # Get structured error information
648
+ errors = compiler.errors_json
649
+ errors.each do |error|
650
+ puts "Error at line #{error['line']}: #{error['message']}"
651
+ end
652
+ ensure
653
+ compiler&.destroy
654
+ end
655
+ ```
656
+
657
+ ## Best Practices
658
+
659
+ ### 1. Resource Management
660
+
661
+ ```ruby
662
+ # Always use block syntax for automatic cleanup
663
+ Yara::Scanner.open(rule) do |scanner|
664
+ scanner.compile
665
+ results = scanner.scan(data)
666
+ # Automatic cleanup
667
+ end
668
+
669
+ # Or ensure manual cleanup
670
+ scanner = Yara::Scanner.new
671
+ begin
672
+ # ... use scanner
673
+ ensure
674
+ scanner.close
675
+ end
676
+ ```
677
+
678
+ ### 2. Efficient Rule Management
679
+
680
+ ```ruby
681
+ # Compile rules once, use many times
682
+ compiler = Yara::Compiler.new
683
+ compiler.add_source(ruleset)
684
+ serialized = compiler.build_serialized
685
+
686
+ # Create scanners as needed
687
+ def create_scanner(rules_data)
688
+ Yara::Scanner.from_serialized(rules_data)
689
+ end
690
+ ```
691
+
692
+ ### 3. Error-Resilient Scanning
693
+
694
+ ```ruby
695
+ def safe_scan(rule, data)
696
+ Yara::Scanner.open(rule) do |scanner|
697
+ scanner.compile
698
+ scanner.set_timeout(10000) # 10 second timeout
699
+
700
+ begin
701
+ results = scanner.scan(data)
702
+ return results
703
+ rescue Yara::Scanner::ScanError => e
704
+ puts "Scan failed: #{e.message}"
705
+ return Yara::ScanResults.new # Empty results
706
+ end
707
+ end
708
+ end
709
+ ```
710
+
711
+ ### 4. Pattern Analysis
712
+
713
+ ```ruby
714
+ def analyze_matches(results, original_data)
715
+ results.each do |result|
716
+ puts "Rule: #{result.rule_name}"
717
+ puts "Tags: #{result.tags.join(', ')}" if result.tags.any?
718
+
719
+ result.pattern_matches.each do |pattern, matches|
720
+ puts " Pattern #{pattern}: #{matches.size} matches"
721
+ matches.each do |match|
722
+ context_start = [match.offset - 10, 0].max
723
+ context_end = [match.end_offset + 10, original_data.length].min
724
+ context = original_data[context_start...context_end]
725
+ puts " #{match.offset}: #{context.inspect}"
726
+ end
727
+ end
728
+ end
729
+ end
730
+ ```
731
+
732
+ ### 5. Global Variable Management
733
+
734
+ ```ruby
735
+ # Define globals at compile time for best performance
736
+ compiler = Yara::Compiler.new
737
+ compiler.define_global_str("ENV", ENV["RAILS_ENV"] || "development")
738
+ compiler.define_global_bool("DEBUG", Rails.env.development?)
739
+
740
+ # Or set globals per scan for dynamic behavior
741
+ scanner.set_globals({
742
+ "CURRENT_TIME" => Time.now.to_i,
743
+ "USER_LEVEL" => user.security_level
744
+ }, strict: false)
745
+ ```
746
+
747
+ This guide covers the major functionality available in yara-ffi. For development information, see [DEVELOPMENT.md](DEVELOPMENT.md).