gmail_search_syntax 0.1.2 → 0.1.4

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: 1b5e08a769d7b375473e7ca0e4afe134e03862ae3f31040c9bb22904ff482b33
-   data.tar.gz: 32624e727131b5bb0779f3f1271b6031e252fbdbdea776b445c598b06f343715
+   metadata.gz: 47b56ee5467d6808c4ceae00289bf291551374a3e5854ed5223a5b2f5ca2f9ac
+   data.tar.gz: 27aa483d8296eb2a3e775aecfaeffc0813f60936cd9091e976eb5159559cafc9
  SHA512:
-   metadata.gz: d93ca6cb4e4d0bab18a9e3ff2620f669f9913cbafa3b389dd1bf3a828329df34be12f202856bba61e66d581dd8cf6c91a665889efeb8e5df3a50d5d60e33d131
-   data.tar.gz: c16da51e8f41ba6a9c001293df098c789b7dd5f3b494d3cfb0f60d3e16e485b8f5d185dd67c0cb1b0cade40d1079eca786c1614c7d7afc90585e97c5d0b8c4e1
+   metadata.gz: 0170d5419e8ab3335a3bd4494f09c3b6e6f5a143cf26791a383f2189a9e12fcd49a596e56555670d018a6ffdffa13b13eed27e554cbe5ff82a221dcd12ed4bd7
+   data.tar.gz: 95d48f2e6aedb4160634db949e32be25823f8b808e71dc113f3ce9e8490b507e136a96372c329f8ff03f2476bc52f4dd31fc89f67a9ba2b7e11e5e0bcebf241f
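The checksums above can be checked against the unpacked gem members with Ruby's standard `Digest` library. The following is a minimal sketch, not part of the gem: the helper name `checksums_match?` and the file name in the usage comment are assumptions made for illustration.

```ruby
require "digest"

# Hypothetical helper: returns true only when both digests of the given
# bytes equal the expected hex strings (as listed in checksums.yaml).
def checksums_match?(bytes, sha256:, sha512:)
  Digest::SHA256.hexdigest(bytes) == sha256 &&
    Digest::SHA512.hexdigest(bytes) == sha512
end

# Usage sketch (assumes metadata.gz was extracted from the .gem archive):
#   bytes = File.binread("metadata.gz")
#   checksums_match?(bytes, sha256: "47b56ee5...", sha512: "0170d541...")
```

Comparing both digests is stricter than needed for integrity alone, but it mirrors the two sections that `checksums.yaml` records.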
@@ -0,0 +1,24 @@
+ name: CI
+
+ on:
+   push:
+     branches: [ main ]
+   pull_request:
+     branches: [ main ]
+
+ jobs:
+   test:
+     runs-on: ubuntu-latest
+
+     steps:
+       - uses: actions/checkout@v4
+
+       - name: Set up Ruby
+         uses: ruby/setup-ruby@v1
+         with:
+           ruby-version: '3.4'
+           bundler-cache: true
+
+       - name: Run tests
+         run: bundle exec rake
+
data/AGENTS.md ADDED
@@ -0,0 +1,104 @@
1
+ # Agent Guidelines for gmail_search_syntax
2
+
3
+ This document outlines the coding standards and workflow requirements for AI agents working on the gmail_search_syntax project.
4
+
5
+ ## Ruby File Standards
6
+
7
+ ### Frozen String Literal
8
+ **ALWAYS** include `# frozen_string_literal: true` at the top of every Ruby file:
9
+
10
+ ```ruby
11
+ # frozen_string_literal: true
12
+
13
+ class MyClass
14
+ # ... implementation
15
+ end
16
+ ```
17
+
18
+ This directive should be the very first line of every `.rb` file so that all string literals in the file are frozen, which prevents accidental mutation and reduces string allocations.
19
+
20
+ ## Documentation Standards
21
+
22
+ ### Markdown Files Location
23
+ When writing step-by-step instructions, documentation, or process descriptions in Markdown format, place them in the `slop/` directory:
24
+
25
+ ```
26
+ slop/
27
+ ├── ARCHITECTURE.md
28
+ ├── GMAIL_BEHAVIOR_COMPARISON.md
29
+ ├── GMAIL_COMPATIBILITY_COMPLETE.md
30
+ └── IMPLEMENTATION_NOTES.md
31
+ ```
32
+
33
+ This keeps the main project directory clean while preserving detailed documentation and implementation notes.
34
+
35
+ ## Code Formatting
36
+
37
+ ### StandardRB Integration
38
+ After creating or modifying any Ruby file, **ALWAYS** run StandardRB to maintain consistent formatting:
39
+
40
+ ```bash
41
+ standardrb --fix /path/to/file.rb
42
+ ```
43
+
44
+ This ensures:
45
+ - Consistent code style across the project
46
+ - Automatic fixing of common formatting issues
47
+ - Compliance with the project's Ruby style guide
48
+ - Uniform indentation, spacing, and syntax
49
+
50
+ ### Workflow
51
+ 1. Create or modify a Ruby file
52
+ 2. Immediately run `standardrb --fix` on the file
53
+ 3. Verify the changes are acceptable
54
+ 4. Continue with development
55
+
56
+ ## Project Context
57
+
58
+ This is a Ruby gem that parses Gmail's search syntax and converts it into an Abstract Syntax Tree (AST). The project includes:
59
+
60
+ - **Core parsing**: Tokenizer, parser, and AST nodes
61
+ - **SQL conversion**: SQLite and Postgres visitors for database queries
62
+ - **Comprehensive testing**: Unit and integration tests
63
+ - **Documentation**: Schema documentation and operator reference
64
+
65
+ ## Key Files
66
+
67
+ - `lib/gmail_search_syntax.rb` - Main entry point
68
+ - `lib/gmail_search_syntax/parser.rb` - Core parsing logic
69
+ - `lib/gmail_search_syntax/sql_visitor.rb` - SQL generation
70
+ - `test/` - Test suite
71
+ - `SCHEMA.md` - Database schema documentation
72
+ - `slop/` - Detailed implementation documentation
73
+
74
+ ## Best Practices
75
+
76
+ 1. **Test Coverage**: Ensure all new functionality has corresponding tests
77
+ 2. **Documentation**: Update relevant documentation when adding features
78
+ 3. **Backward Compatibility**: Maintain API compatibility when possible
79
+ 4. **Performance**: Consider performance implications of parsing changes
80
+ 5. **Gmail Compatibility**: Verify changes against Gmail's actual search behavior
81
+
82
+ ## Example Workflow
83
+
84
+ ```bash
85
+ # 1. Create a new Ruby file
86
+ echo '# frozen_string_literal: true' > lib/gmail_search_syntax/new_feature.rb
87
+
88
+ # 2. Add implementation
89
+ # ... write code ...
90
+
91
+ # 3. Format with StandardRB
92
+ standardrb --fix lib/gmail_search_syntax/new_feature.rb
93
+
94
+ # 4. Create documentation if needed
95
+ echo '# Implementation Notes' > slop/NEW_FEATURE_NOTES.md
96
+
97
+ # 5. Add tests
98
+ # ... write tests ...
99
+
100
+ # 6. Run test suite
101
+ bundle exec rake test
102
+ ```
103
+
104
+ Remember: Consistency in code style and documentation organization is crucial for maintaining this project's quality and readability.
data/Gemfile ADDED
@@ -0,0 +1,3 @@
+ source "https://rubygems.org"
+
+ gemspec
data/README.md CHANGED
@@ -6,7 +6,8 @@ Based on the official Gmail search operators documentation:
6
6
  https://support.google.com/mail/answer/7190
7
7
 
8
8
  > [!TIP]
9
- > This gem was created for [Cora,](https://cora.computer/) your personal e-mail assistant.
9
+ > This gem was created for [Cora,](https://cora.computer/)
10
+ > your personal e-mail assistant.
10
11
  > Send them some love for allowing me to share it.
11
12
 
12
13
  ## Installation
@@ -54,47 +55,6 @@ GmailSearchSyntax.parse!("")
54
55
  # => raises GmailSearchSyntax::EmptyQueryError
55
56
  ```
56
57
 
57
- ### Converting to SQL
58
-
59
- The gem includes a SQLite visitor that can convert Gmail queries to SQL. Here's a complex example:
60
-
61
- ```ruby
62
- require 'gmail_search_syntax'
63
-
64
- # A complex Gmail query with multiple operators
65
- query = '(from:manager OR from:boss) subject:"quarterly review" has:attachment -label:archived after:2024/01/01 larger:5M'
66
-
67
- ast = GmailSearchSyntax.parse!(query)
68
- visitor = GmailSearchSyntax::SQLiteVisitor.new(current_user_email: "user@example.com")
69
- visitor.visit(ast)
70
-
71
- sql, params = visitor.to_query.to_sql
72
- ```
73
-
74
- This generates the following SQL:
75
-
76
- ```sql
77
- SELECT DISTINCT m0.id
78
- FROM messages AS m0
79
- INNER JOIN message_addresses AS ma1 ON m0.id = ma1.message_id
80
- INNER JOIN message_addresses AS ma3 ON m0.id = ma3.message_id
81
- INNER JOIN message_labels AS ml ON m0.id = ml.message_id
82
- INNER JOIN labels AS l ON ml.label_id = l.id
83
- WHERE ((((ma1.address_type = ? OR ma1.address_type = ? OR ma1.address_type = ?)
84
- AND ma1.email_address = ?)
85
- OR ((ma3.address_type = ? OR ma3.address_type = ? OR ma3.address_type = ?)
86
- AND ma3.email_address = ?))
87
- AND m0.subject LIKE ?
88
- AND m0.has_attachment = 1
89
- AND NOT l.name = ?
90
- AND m0.internal_date > ?
91
- AND m0.size_bytes > ?)
92
- ```
93
-
94
- With parameters: `["from", "cc", "bcc", "manager", "from", "cc", "bcc", "boss", "%quarterly review%", "archived", "2024-01-01", 5242880]`
95
-
96
- A similar visitor is provided for PostgreSQL.
97
-
98
58
  ## Supported Operators
99
59
 
100
60
  Email routing: `from:`, `to:`, `cc:`, `bcc:`, `deliveredto:`
@@ -113,6 +73,23 @@ Size: `size:`, `larger:`, `smaller:`
113
73
 
114
74
  There is also a converter from the operators to SQL queries against an embedded SQLite database. This is meant more as an example than a fully-featured store, but it shows what's possible.
115
75
 
76
+ ### Converting to SQL
77
+
78
+ The gem includes a SQLite visitor and a Postgres visitor that convert Gmail queries into the corresponding SQL. See SCHEMA.md for more information.
79
+
80
+ ```ruby
81
+ require 'gmail_search_syntax'
82
+
83
+ # A complex Gmail query with multiple operators
84
+ query = '(from:manager OR from:boss) subject:"quarterly review" has:attachment -label:archived after:2024/01/01 larger:5M'
85
+
86
+ ast = GmailSearchSyntax.parse!(query)
87
+ visitor = GmailSearchSyntax::SQLiteVisitor.new(current_user_email: "user@example.com")
88
+ visitor.visit(ast)
89
+
90
+ sql, params = visitor.to_query.to_sql
91
+ ```
92
+
116
93
  ## Testing
117
94
 
118
95
  ```bash
data/SCHEMA.md CHANGED
@@ -1,15 +1,111 @@
1
- # Database Schema for Gmail Search
1
+ This document describes the database schema designed to support Gmail search syntax queries. The schema is optimized for the search operators defined in `lib/GMAIL_SEARCH_OPERATORS.md`.
2
2
 
3
- ## Overview
3
+ ```ruby
4
+ require 'gmail_search_syntax'
4
5
 
5
- This document describes the SQLite database schema designed to support Gmail search syntax queries. The schema is optimized for the search operators defined in `lib/GMAIL_SEARCH_OPERATORS.md`.
6
+ # A complex Gmail query with multiple operators
7
+ query = '(from:manager OR from:boss) subject:"quarterly review" has:attachment -label:archived after:2024/01/01 larger:5M'
6
8
 
7
- ## Core Tables
9
+ ast = GmailSearchSyntax.parse!(query)
10
+ visitor = GmailSearchSyntax::SQLiteVisitor.new(current_user_email: "user@example.com")
11
+ visitor.visit(ast)
12
+
13
+ sql, params = visitor.to_query.to_sql
14
+ ```
15
+
16
+ generates the following SQL:
17
+
18
+ ```sql
19
+ SELECT DISTINCT m0.id
20
+ FROM messages AS m0
21
+ INNER JOIN message_addresses AS ma1 ON m0.id = ma1.message_id
22
+ INNER JOIN message_addresses AS ma3 ON m0.id = ma3.message_id
23
+ INNER JOIN message_labels AS ml ON m0.id = ml.message_id
24
+ INNER JOIN labels AS l ON ml.label_id = l.id
25
+ WHERE ((((ma1.address_type = ? OR ma1.address_type = ? OR ma1.address_type = ?)
26
+ AND ma1.email_address = ?)
27
+ OR ((ma3.address_type = ? OR ma3.address_type = ? OR ma3.address_type = ?)
28
+ AND ma3.email_address = ?))
29
+ AND m0.subject LIKE ?
30
+ AND m0.has_attachment = 1
31
+ AND NOT l.name = ?
32
+ AND m0.internal_date > ?
33
+ AND m0.size_bytes > ?)
34
+ ```
35
+
36
+ and bound parameters:
37
+
38
+ ```
39
+ [
40
+ "from", "cc", "bcc", "manager", "from", "cc", "bcc",
41
+ "boss", "%quarterly review%", "archived", "2024-01-01", 5242880
42
+ ]
43
+ ```
44
+
45
+ ## String Matching Requirements
46
+
47
+ ### Prefix/Suffix Matching
48
+ Required for:
49
+ - **Email addresses** (from:, to:, cc:, bcc:, deliveredto:)
50
+ - `from:marc@` → prefix match → `WHERE email_address LIKE 'marc@%'`
51
+ - `from:@example.com` → suffix match → `WHERE email_address LIKE '%@example.com'`
52
+ - `from:marc@example.com` → exact match → `WHERE email_address = 'marc@example.com'`
53
+
54
+ - **Mailing lists** (list:)
55
+ - Same pattern as email addresses
56
+
57
+ - **Filenames** (filename:)
58
+ - `filename:pdf` → extension match → `WHERE filename LIKE '%.pdf'`
59
+ - `filename:homework` → prefix match → `WHERE filename LIKE 'homework%'`
60
+
61
+ ### Exact Match Only
62
+ - RFC822 message IDs
63
+ - Boolean/enum fields (is:, has:, in:, category:, label:)
64
+
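The address matching rules listed above can be sketched as a small standalone function. This is an illustration under stated assumptions, not the gem's visitor code: the name `address_condition` and its `column:` parameter are hypothetical, and `LIKE` wildcards inside the value are not escaped here.

```ruby
# Hypothetical helper: maps an address operator value to a parameterized
# SQL condition, following the prefix / suffix / exact rules in SCHEMA.md.
# Returns [sql_fragment, bound_params].
def address_condition(value, column: "email_address")
  if value.end_with?("@")
    # "marc@" -> prefix match
    ["#{column} LIKE ?", ["#{value}%"]]
  elsif value.start_with?("@")
    # "@example.com" -> suffix match
    ["#{column} LIKE ?", ["%#{value}"]]
  else
    # full address -> exact match
    ["#{column} = ?", [value]]
  end
end

address_condition("marc@")            # => ["email_address LIKE ?", ["marc@%"]]
address_condition("@example.com")     # => ["email_address LIKE ?", ["%@example.com"]]
address_condition("marc@example.com") # => ["email_address = ?", ["marc@example.com"]]
```

Keeping the value in the parameter list rather than interpolating it into the SQL matches the parameterized-query approach the visitors use.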
65
+ ## SQL Visitor Usage
66
+
67
+ The library provides two SQL visitor implementations for different database backends: SQLite and Postgres. They are configured to use the schema described below. You convert the search AST nodes into a SQL query using the provided SQL visitors. If you have a different schema, use the visitor code as a template.
68
+
69
+
70
+ ```ruby
71
+ ast = GmailSearchSyntax.parse!("from:amy@example.com newer_than:7d")
72
+ visitor = GmailSearchSyntax::SQLiteVisitor.new(current_user_email: "me@example.com")
73
+ visitor.visit(ast)
74
+
75
+ sql, params = visitor.to_query.to_sql
76
+ # sql: "SELECT DISTINCT m.id FROM messages m ... WHERE ... m.internal_date > datetime('now', ?)"
77
+ # params: ["from", "cc", "bcc", "amy@example.com", "-7 days"]
78
+ ```
79
+
80
+ The visitors implement:
81
+
82
+ - **Parameterized queries**: All user input is bound via `?` placeholders
83
+ - **Automatic table joins**: Joins required tables based on operators
84
+ - **Nested conditions**: Properly handles AND/OR/NOT with parentheses
85
+ - **Special operators**:
86
+ - `from:me` / `to:me` → uses `current_user_email`
87
+ - `in:anywhere` → no location filter
88
+ - `AROUND` → generates an always-false `(1 = 0)` placeholder condition (proximity search is not implemented)
89
+ - **Date handling**:
90
+ - Converts dates from `YYYY/MM/DD` to `YYYY-MM-DD`
91
+ - Parses relative times (`1y`, `2d`, `3m`) to database-specific datetime functions
92
+ - **Size parsing**: Converts `10M`, `1G` to bytes
93
+
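The date, relative-time, and size conversions in the list above can be approximated as follows. This is a rough sketch under assumptions rather than the visitors' actual code: all three helper names are hypothetical, the 1024-based multipliers are inferred from the `larger:5M` → `5242880` example, and the `y` → `years` mapping is assumed by analogy with the documented `d` and `m` cases.

```ruby
SIZE_MULTIPLIERS = {"K" => 1024, "M" => 1024**2, "G" => 1024**3}.freeze
RELATIVE_UNITS = {"d" => "days", "m" => "months", "y" => "years"}.freeze

# "5M" -> 5242880; a bare number is taken as bytes.
def size_to_bytes(value)
  match = value.to_s.match(/\A(\d+)([KMG])?\z/i)
  raise ArgumentError, "unrecognized size: #{value.inspect}" unless match
  match[1].to_i * SIZE_MULTIPLIERS.fetch(match[2].to_s.upcase, 1)
end

# "2024/01/01" -> "2024-01-01"
def gmail_date_to_iso(value)
  value.tr("/", "-")
end

# "7d" -> "7 days"; with negative: true -> "-7 days" (the SQLite-style modifier).
def relative_interval(value, negative: false)
  match = value.match(/\A(\d+)([dmy])\z/)
  raise ArgumentError, "unrecognized relative time: #{value.inspect}" unless match
  interval = "#{match[1]} #{RELATIVE_UNITS.fetch(match[2])}"
  negative ? "-#{interval}" : interval
end
```

The `negative:` flag reflects the one difference shown between the backends: SQLite receives `"-7 days"` as a `datetime` modifier, while Postgres receives `"7 days"` and subtracts the interval in SQL.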
94
+ ## Fuzzy Matching Limitations
95
+
96
+ The current implementation does **not** support:
97
+ - **AROUND operator** (proximity search) - generates an always-false `(1 = 0)` placeholder condition
98
+ - Full-text search with word distance calculations
99
+ - Stemming or phonetic matching
100
+ - Levenshtein distance / typo tolerance
101
+
102
+ These features require additional implementation, potentially using SQLite FTS5 extensions.
8
103
 
9
- ### messages
10
- Primary table storing email message metadata.
104
+ ## Core Tables
11
105
 
12
106
  ```sql
107
+ -- messages
108
+ -- Primary table storing email message metadata.
13
109
  CREATE TABLE messages (
14
110
  id TEXT PRIMARY KEY,
15
111
  rfc822_message_id TEXT,
@@ -55,12 +151,12 @@ CREATE TABLE messages (
55
151
 
56
152
  created_at DATETIME DEFAULT CURRENT_TIMESTAMP
57
153
  );
58
- ```
154
+ `
59
155
 
60
- ### message_addresses
61
- Stores email addresses associated with messages (from, to, cc, bcc, delivered_to).
156
+ -- message_addresses
157
+ -- Stores email addresses associated with messages (from, to, cc, bcc, delivered_to).
158
+ -- The `from:` and `to:` operators search across `from`, `cc`, and `bcc` address types per Gmail specification.
62
159
 
63
- ```sql
64
160
  CREATE TABLE message_addresses (
65
161
  id INTEGER PRIMARY KEY AUTOINCREMENT,
66
162
  message_id TEXT NOT NULL,
@@ -70,26 +166,21 @@ CREATE TABLE message_addresses (
70
166
 
71
167
  FOREIGN KEY (message_id) REFERENCES messages(id) ON DELETE CASCADE
72
168
  );
73
- ```
74
169
 
75
- **Note:** The `from:` and `to:` operators search across `from`, `cc`, and `bcc` address types per Gmail specification.
76
170
 
77
- ### labels
78
- Label definitions with external string IDs.
171
+ -- labels
172
+ -- Label definitions with external string IDs.
79
173
 
80
- ```sql
81
174
  CREATE TABLE labels (
82
175
  id TEXT PRIMARY KEY,
83
176
  name TEXT NOT NULL UNIQUE,
84
177
  is_system_label BOOLEAN DEFAULT 0,
85
178
  created_at DATETIME DEFAULT CURRENT_TIMESTAMP
86
179
  );
87
- ```
88
180
 
89
- ### message_labels
90
- Many-to-many relationship between messages and labels.
181
+ -- message_labels
182
+ -- Many-to-many relationship between messages and labels.
91
183
 
92
- ```sql
93
184
  CREATE TABLE message_labels (
94
185
  id INTEGER PRIMARY KEY AUTOINCREMENT,
95
186
  message_id TEXT NOT NULL,
@@ -100,11 +191,9 @@ CREATE TABLE message_labels (
100
191
  UNIQUE(message_id, label_id)
101
192
  );
102
193
  ```
194
+ -- attachments
195
+ -- File attachments associated with messages.
103
196
 
104
- ### attachments
105
- File attachments associated with messages.
106
-
107
- ```sql
108
197
  CREATE TABLE attachments (
109
198
  id INTEGER PRIMARY KEY AUTOINCREMENT,
110
199
  message_id TEXT NOT NULL,
@@ -116,108 +205,3 @@ CREATE TABLE attachments (
116
205
  );
117
206
  ```
118
207
 
119
- ## String Matching Requirements
120
-
121
- ### Prefix/Suffix Matching
122
- Required for:
123
- - **Email addresses** (from:, to:, cc:, bcc:, deliveredto:)
124
- - `from:marc@` → prefix match → `WHERE email_address LIKE 'marc@%'`
125
- - `from:@example.com` → suffix match → `WHERE email_address LIKE '%@example.com'`
126
- - `from:marc@example.com` → exact match → `WHERE email_address = 'marc@example.com'`
127
-
128
- - **Mailing lists** (list:)
129
- - Same pattern as email addresses
130
-
131
- - **Filenames** (filename:)
132
- - `filename:pdf` → extension match → `WHERE filename LIKE '%.pdf'`
133
- - `filename:homework` → prefix match → `WHERE filename LIKE 'homework%'`
134
-
135
- ### Exact Match Only
136
- - RFC822 message IDs
137
- - Boolean/enum fields (is:, has:, in:, category:, label:)
138
-
139
- ## SQL Visitor Usage
140
-
141
- The library provides two SQL visitor implementations for different database backends:
142
-
143
- ### SQLiteVisitor
144
-
145
- Converts parsed AST into SQLite-compatible SQL queries:
146
-
147
- ```ruby
148
- ast = GmailSearchSyntax.parse!("from:amy@example.com newer_than:7d")
149
- visitor = GmailSearchSyntax::SQLiteVisitor.new(current_user_email: "me@example.com")
150
- visitor.visit(ast)
151
-
152
- sql, params = visitor.to_query.to_sql
153
- # sql: "SELECT DISTINCT m.id FROM messages m ... WHERE ... m.internal_date > datetime('now', ?)"
154
- # params: ["from", "cc", "bcc", "amy@example.com", "-7 days"]
155
- ```
156
-
157
- ### PostgresVisitor
158
-
159
- Converts parsed AST into PostgreSQL-compatible SQL queries:
160
-
161
- ```ruby
162
- ast = GmailSearchSyntax.parse!("from:amy@example.com newer_than:7d")
163
- visitor = GmailSearchSyntax::PostgresVisitor.new(current_user_email: "me@example.com")
164
- visitor.visit(ast)
165
-
166
- sql, params = visitor.to_query.to_sql
167
- # sql: "SELECT DISTINCT m.id FROM messages m ... WHERE ... m.internal_date > (NOW() - ?::interval)"
168
- # params: ["from", "cc", "bcc", "amy@example.com", "7 days"]
169
- ```
170
-
171
- **Note**: `SqlVisitor` is an alias for `SQLiteVisitor` for backward compatibility.
172
-
173
- ### Database-Specific Differences
174
-
175
- The main difference between the visitors is in relative date handling:
176
-
177
- | Feature | SQLite | PostgreSQL |
178
- |---------|--------|------------|
179
- | `older_than:7d` | `datetime('now', '-7 days')` | `NOW() - '7 days'::interval` |
180
- | `newer_than:3m` | `datetime('now', '-3 months')` | `NOW() - '3 months'::interval` |
181
- | Parameter format | `"-7 days"` (negative) | `"7 days"` (positive with cast) |
182
-
183
- All other query generation is identical between the two visitors.
184
-
185
- ### Features
186
-
187
- - **Parameterized queries**: All user input is bound via `?` placeholders
188
- - **Automatic table joins**: Joins required tables based on operators
189
- - **Nested conditions**: Properly handles AND/OR/NOT with parentheses
190
- - **Special operators**:
191
- - `from:me` / `to:me` → uses `current_user_email`
192
- - `in:anywhere` → no location filter
193
- - `AROUND` → generates `(1 = 0)` no-op condition
194
- - **Date handling**:
195
- - Converts dates from `YYYY/MM/DD` to `YYYY-MM-DD`
196
- - Parses relative times (`1y`, `2d`, `3m`) to database-specific datetime functions
197
- - **Size parsing**: Converts `10M`, `1G` to bytes
198
-
199
- ### Query Object
200
-
201
- The `Query` class accumulates SQL components:
202
-
203
- ```ruby
204
- query = visitor.to_query
205
-
206
- query.conditions # Array of WHERE conditions
207
- query.joins # Hash of JOIN clauses
208
- query.params # Array of bound parameters
209
-
210
- sql, params = query.to_sql
211
- # Returns: [sql_string, parameters_array]
212
- ```
213
-
214
- ## Fuzzy Matching Limitations
215
-
216
- The current implementation does **not** support:
217
- - **AROUND operator** (proximity search) - generates no-op `(1 = 0)` condition
218
- - Full-text search with word distance calculations
219
- - Stemming or phonetic matching
220
- - Levenshtein distance / typo tolerance
221
-
222
- These features require additional implementation, potentially using SQLite FTS5 extensions.
223
-
data/examples/demo.rb CHANGED
@@ -1,8 +1,5 @@
  require_relative "../lib/gmail_search_syntax"
-
- puts "Gmail Search Syntax Parser - Demo"
- puts "=" * 50
- puts
+ require "pp"
 
  queries = [
    "from:amy@example.com",
@@ -23,6 +20,5 @@ queries = [
  queries.each do |query|
    puts "Query: #{query}"
    ast = GmailSearchSyntax.parse!(query)
-   puts "AST: #{ast.inspect}"
-   puts
+   pp ast
  end
@@ -0,0 +1,23 @@
+ require_relative "lib/gmail_search_syntax/version"
+
+ Gem::Specification.new do |s|
+   s.name = "gmail_search_syntax"
+   s.version = GmailSearchSyntax::VERSION
+   s.summary = "Gmail search syntax parser"
+   s.authors = ["me@julik.nl"]
+   s.license = "MIT"
+   s.homepage = "https://github.com/julik/gmail_search_syntax"
+   s.required_ruby_version = ">= 3.0"
+
+   s.files = Dir.chdir(__dir__) do
+     `git ls-files -z`.split("\x0").reject do |f|
+       File.basename(f).start_with?(".")
+     end
+   end
+   s.require_paths = ["lib"]
+
+   s.add_development_dependency "minitest", "~> 5.0"
+   s.add_development_dependency "rake", "~> 13.0"
+   s.add_development_dependency "sqlite3", "< 1.6"
+   s.add_development_dependency "standard", "~> 1.0"
+ end
@@ -195,47 +195,17 @@ module GmailSearchSyntax
        when :lbrace
          parse_braces
        when :quoted_string
-         # Quoted strings are consumed as-is, no bareword collection
          value = current_token.value
          advance
          value
        when :word, :email, :number, :date, :relative_time
-         # Collect the initial value and any following barewords
-         # until we hit an operator, special token, or grouping
-         values = []
-         types = []
-
-         # Collect barewords
-         while !eof? && is_bareword_token?
-           # Check if this word is actually an operator (word followed by colon)
-           if current_token.type == :word && peek_token&.type == :colon
-             break
-           end
-
-           values << current_token.value
-           types << current_token.type
-           advance
-         end
-
-         # If we only collected one value and it's a number, preserve its type
-         if values.length == 1 && types[0] == :number
-           values[0]
-         else
-           # Multiple values or non-number: join as string
-           values.map(&:to_s).join(" ")
-         end
-       end
-     end
-
-     def is_bareword_token?
-       return false if eof?
-
-       # Barewords are simple value tokens, not operators or special syntax
-       case current_token.type
-       when :word, :email, :number, :date, :relative_time
-         true
-       else
-         false
+         # Take only a single token as the operator value.
+         # Multi-word values must be explicitly quoted: from:"john smith"
+         # This matches Gmail's actual search behavior where bare words
+         # after an operator are treated as separate search terms.
+         value = current_token.value
+         advance
+         value.is_a?(Integer) ? value : value.to_s
        end
      end
    end
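To make the effect of the parser hunk above concrete, here is a simplified standalone comparison of the two strategies on a plain token array. It is a sketch, not the gem's parser: the `Token` struct and both function names are hypothetical, and the greedy variant omits the old special case that preserved a lone number's type.

```ruby
Token = Struct.new(:type, :value)
BAREWORD_TYPES = %i[word email number date relative_time].freeze

# Old behavior (simplified): keep consuming barewords until a token that
# starts a new operator (word followed by colon) or a non-bareword appears.
def take_value_greedy(tokens, pos)
  values = []
  while pos < tokens.length && BAREWORD_TYPES.include?(tokens[pos].type)
    break if tokens[pos].type == :word && tokens[pos + 1]&.type == :colon
    values << tokens[pos].value.to_s
    pos += 1
  end
  [values.join(" "), pos]
end

# New behavior: exactly one token becomes the operator value.
def take_value_single(tokens, pos)
  value = tokens[pos].value
  [value.is_a?(Integer) ? value : value.to_s, pos + 1]
end

# Tokens following "subject:" in the query `subject:urgent meeting`
tokens = [Token.new(:word, "urgent"), Token.new(:word, "meeting")]
take_value_greedy(tokens, 0) # => ["urgent meeting", 2]
take_value_single(tokens, 0) # => ["urgent", 1]
```

With the single-token version the returned position stops at `meeting`, so the caller parses it as a separate search term.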
@@ -55,9 +55,16 @@ module GmailSearchSyntax
          advance
        when "-"
          next_char = peek_char
-         if next_char && next_char !~ /\s/
+         prev_char = (@position > 0) ? @input[@position - 1] : nil
+         # Negation requires: non-whitespace follows AND (start of input OR whitespace precedes)
+         # Gmail behavior: "Coxlee-Gammage" → Coxlee AND Gammage (hyphen is word separator)
+         # "Coxlee -Gammage" → Coxlee AND NOT Gammage (space+hyphen = negation)
+         if next_char && next_char !~ /\s/ && (prev_char.nil? || prev_char =~ /\s/)
            add_token(:minus, char)
            advance
+         elsif prev_char && prev_char !~ /\s/
+           # Embedded hyphen (preceded by non-whitespace) - skip it as word separator
+           advance
          else
            read_word
          end
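The hyphen rule in the tokenizer hunk above can be tried in isolation with a small standalone function. This is a sketch under assumptions, not the gem's tokenizer: `tokenize_hyphens` is a hypothetical name, it only handles bare words separated by whitespace, and it returns `[:term, word]` / `[:not, word]` pairs instead of the gem's token and AST classes.

```ruby
# Hypothetical mini-tokenizer that applies only the hyphen rule:
# - "-" at start of input or after whitespace, followed by non-whitespace: negation
# - "-" directly after a non-whitespace character: word separator
def tokenize_hyphens(input)
  tokens = []
  current = +""
  negated = false

  # Emit the word collected so far (if any) and reset the negation flag.
  flush = lambda do
    unless current.empty?
      tokens << [negated ? :not : :term, current]
      current = +""
    end
    negated = false
  end

  input.each_char.with_index do |char, i|
    prev_char = i.positive? ? input[i - 1] : nil
    next_char = input[i + 1]

    if char == "-"
      if next_char && next_char !~ /\s/ && (prev_char.nil? || prev_char =~ /\s/)
        flush.call
        negated = true # the next word is negated
      elsif prev_char && prev_char !~ /\s/
        flush.call     # embedded hyphen: ends the current word
      else
        current << char # lone hyphen: kept as part of a word
      end
    elsif char =~ /\s/
      flush.call
    else
      current << char
    end
  end

  flush.call
  tokens
end

tokenize_hyphens("cats-dogs -birds")
# => [[:term, "cats"], [:term, "dogs"], [:not, "birds"]]
```

Checking the preceding character is what separates the two cases; the pre-0.1.4 condition looked only at the following character, which is why `some-outfit` used to negate `outfit`.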
@@ -1,3 +1,3 @@
  module GmailSearchSyntax
-   VERSION = "0.1.2"
+   VERSION = "0.1.4"
  end
@@ -0,0 +1,102 @@
1
+ # Embedded Hyphens in Gmail Search
2
+
3
+ ## Gmail's Actual Behavior
4
+
5
+ Gmail treats hyphens differently depending on whether they are preceded by whitespace:
6
+
7
+ ### Embedded Hyphen (No Preceding Whitespace)
8
+
9
+ When a hyphen appears immediately after a word character (no space before it), Gmail treats it as a **word separator**, not a negation operator. Both parts become separate search tokens that are implicitly ANDed together.
10
+
11
+ ```
12
+ Coxlee-Gammage
13
+ ```
14
+
15
+ Gmail behavior: Search for messages containing both "Coxlee" AND "Gammage". Both tokens get highlighted in search results.
16
+
17
+ Parsed as:
18
+ ```ruby
19
+ GmailSearchSyntax.parse!("Coxlee-Gammage")
20
+ # => #<And [#<StringToken "Coxlee">, #<StringToken "Gammage">]>
21
+ ```
22
+
23
+ ### Space + Hyphen (Negation)
24
+
25
+ When a hyphen is preceded by whitespace (or at the start of input), it functions as the **negation operator**.
26
+
27
+ ```
28
+ Coxlee -Gammage
29
+ ```
30
+
31
+ Gmail behavior: Search for messages containing "Coxlee" but NOT "Gammage".
32
+
33
+ Parsed as:
34
+ ```ruby
35
+ GmailSearchSyntax.parse!("Coxlee -Gammage")
36
+ # => #<And [#<StringToken "Coxlee">, #<Not #<StringToken "Gammage">>]>
37
+ ```
38
+
39
+ ## Examples
40
+
41
+ | Query | Parsed As | Meaning |
42
+ |-------|-----------|---------|
43
+ | `some-outfit` | `some AND outfit` | Contains both "some" and "outfit" |
44
+ | `some -outfit` | `some AND NOT outfit` | Contains "some" but not "outfit" |
45
+ | `a-b-c` | `a AND b AND c` | Contains all three tokens |
46
+ | `-spam` | `NOT spam` | Does not contain "spam" |
47
+ | `cats-dogs -birds` | `cats AND dogs AND NOT birds` | Contains "cats" and "dogs", not "birds" |
48
+
49
+ ## Real-World Use Cases
50
+
51
+ ### Hyphenated Names
52
+ ```
53
+ from:Mary-Jane
54
+ ```
55
+ Parsed as the operator `from:Mary` followed by the separate search term `Jane`, so it matches emails from "Mary" that also contain "Jane".
56
+
57
+ ### Hyphenated Terms
58
+ ```
59
+ self-service
60
+ ```
61
+ Finds messages containing both "self" and "service".
62
+
63
+ ### Compound Words
64
+ ```
65
+ e-commerce
66
+ ```
67
+ Finds messages containing both "e" and "commerce".
68
+
69
+ ## Implementation Details
70
+
71
+ The fix is in the tokenizer (`lib/gmail_search_syntax/tokenizer.rb`). When encountering a `-` character:
72
+
73
+ 1. Check if there's a non-whitespace character following (potential negation or word separator)
74
+ 2. Check if there's whitespace (or nothing) preceding the hyphen
75
+ 3. If preceded by whitespace or at start of input: treat as negation operator (`:minus` token)
76
+ 4. If preceded by non-whitespace: skip the hyphen (acts as word separator, no token emitted)
77
+
78
+ ```ruby
79
+ when "-"
80
+ next_char = peek_char
81
+ prev_char = @position > 0 ? @input[@position - 1] : nil
82
+
83
+ if next_char && next_char !~ /\s/ && (prev_char.nil? || prev_char =~ /\s/)
84
+ # Negation: preceded by whitespace or at start, followed by non-whitespace
85
+ add_token(:minus, char)
86
+ advance
87
+ elsif prev_char && prev_char !~ /\s/
88
+ # Embedded hyphen: preceded by non-whitespace - skip as word separator
89
+ advance
90
+ else
91
+ read_word
92
+ end
93
+ ```
94
+
95
+ ## Bug That Was Fixed
96
+
97
+ Previously, the gem incorrectly treated all hyphens followed by non-whitespace as negation:
98
+
99
+ - **Old (incorrect):** `some-outfit` was parsed as `some AND NOT outfit`
100
+ - **New (correct):** `some-outfit` is parsed as `some AND outfit`
101
+
102
+ This matches Gmail's actual search behavior where hyphenated terms find messages containing both parts of the hyphenated word.
@@ -0,0 +1,84 @@
1
+ # Greedy vs Non-Greedy Operator Value Tokenization
2
+
3
+ ## Summary
4
+
5
+ This document explains the fix in `bugfix-tokens` that changes how operator values are parsed from **greedy** (consuming multiple barewords) to **non-greedy** (single token only), matching Gmail's actual search behavior.
6
+
7
+ ## The Problem
8
+
9
+ The previous implementation used greedy tokenization for operator values. When parsing `label:Cora/Google Drive`, the parser would consume all subsequent barewords (`Cora/Google`, `Drive`) into the operator's value until hitting another operator or special token.
10
+
11
+ **Previous behavior:**
12
+ ```
13
+ label:Cora/Google Drive label:Notes
14
+ → Operator(label, "Cora/Google Drive"), Operator(label, "Notes")
15
+ ```
16
+
17
+ **Gmail's actual behavior:**
18
+ ```
19
+ label:Cora/Google Drive label:Notes
20
+ → Operator(label, "Cora/Google"), StringToken("Drive"), Operator(label, "Notes")
21
+ ```
22
+
23
+ ## Gmail's Actual Behavior
24
+
25
+ In Gmail search, barewords after an operator are treated as **separate search terms**, not as part of the operator's value. To include multiple words in an operator value, you must explicitly quote them:
26
+
27
+ | Input | Gmail Interpretation |
28
+ |-------|---------------------|
29
+ | `subject:urgent meeting` | Subject contains "urgent" AND body contains "meeting" |
30
+ | `subject:"urgent meeting"` | Subject contains "urgent meeting" |
31
+ | `in:anywhere movie` | Search "movie" in all mail locations |
32
+ | `label:test one two` | Label is "test" AND body contains "one" AND "two" |
33
+
34
+ ## The Fix
35
+
36
+ Changed `parse_operator_value` in `lib/gmail_search_syntax/parser.rb` to only consume a single token for bareword values (`:word`, `:email`, `:number`, `:date`, `:relative_time`).
37
+
38
+ ### Before (greedy)
39
+
40
+ ```ruby
41
+ when :word, :email, :number, :date, :relative_time
42
+ values = []
43
+ types = []
44
+
45
+ # Collect barewords until operator or special token
46
+ while !eof? && is_bareword_token?
47
+ if current_token.type == :word && peek_token&.type == :colon
48
+ break
49
+ end
50
+ values << current_token.value
51
+ types << current_token.type
52
+ advance
53
+ end
54
+
55
+ # Join multiple values as string
56
+ values.map(&:to_s).join(" ")
57
+ ```
58
+
59
+ ### After (non-greedy)
60
+
61
+ ```ruby
62
+ when :word, :email, :number, :date, :relative_time
63
+ # Take only a single token as the operator value.
64
+ # Multi-word values must be explicitly quoted: from:"john smith"
65
+ value = current_token.value
66
+ advance
67
+ value.is_a?(Integer) ? value : value.to_s
68
+ ```
69
+
70
+ ## Test Changes
71
+
72
+ Updated tests to reflect the corrected behavior:
73
+
74
+ | Test | Previous Expected | Now Expected |
75
+ |------|-------------------|--------------|
76
+ | `in:anywhere movie` | `Operator("in", "anywhere movie")` | `Operator("in", "anywhere")`, `StringToken("movie")` |
77
+ | `subject:urgent meeting important` | `Operator("subject", "urgent meeting important")` | `Operator("subject", "urgent")`, `StringToken("meeting")`, `StringToken("important")` |
78
+ | `label:test one two three label:another` | 2 operands | 5 operands |
79
+
80
+ ## Implications
81
+
82
+ 1. **Breaking change** for consumers relying on greedy behavior
83
+ 2. Users must now quote multi-word operator values explicitly
84
+ 3. More accurate translation to SQL/other query languages since the semantics now match Gmail
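For consumers affected by the breaking change listed above, one possible mitigation is to quote multi-word values when building query strings. The helper below is a hypothetical sketch and not part of the gem: `operator_term` is an invented name, and because the source does not describe how quotes inside a quoted value are escaped, the sketch simply rejects such input.

```ruby
# Hypothetical helper: builds an operator term, quoting the value when it
# contains whitespace so the whole value is parsed as one operator value.
def operator_term(name, value)
  text = value.to_s
  if text.include?('"')
    raise ArgumentError, "embedded double quotes are not handled in this sketch"
  end
  text.match?(/\s/) ? %(#{name}:"#{text}") : "#{name}:#{text}"
end

operator_term("subject", "urgent meeting") # => subject:"urgent meeting"
operator_term("from", "amy@example.com")   # => from:amy@example.com
```

Single-word values are left unquoted so the generated queries stay close to what a user would type by hand.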
@@ -142,6 +142,91 @@ class GmailSearchSyntaxTest < Minitest::Test
142
142
  assert_equal "movie", ast.operands[1].child.value
143
143
  end
144
144
 
145
+ # Gmail behavior: embedded hyphens (no preceding whitespace) are word separators, not negation
146
+ # "Coxlee-Gammage" → Coxlee AND Gammage (both tokens highlighted)
147
+ # "Coxlee -Gammage" → Coxlee AND NOT Gammage (space+hyphen = negation)
148
+
149
+ def test_embedded_hyphen_is_word_separator
150
+ # Gmail behavior: hyphen without preceding whitespace separates words, not negation
151
+ ast = GmailSearchSyntax.parse!("some-outfit")
152
+ assert_instance_of And, ast
153
+
154
+ assert_equal 2, ast.operands.length
155
+ assert_instance_of StringToken, ast.operands[0]
156
+ assert_equal "some", ast.operands[0].value
157
+
158
+ assert_instance_of StringToken, ast.operands[1]
159
+ assert_equal "outfit", ast.operands[1].value
160
+ end
161
+
162
+ def test_embedded_hyphen_multiple
163
+ # Multiple hyphens: a-b-c → a AND b AND c
164
+ ast = GmailSearchSyntax.parse!("a-b-c")
165
+ assert_instance_of And, ast
166
+
167
+ assert_equal 3, ast.operands.length
168
+ assert_equal "a", ast.operands[0].value
169
+ assert_equal "b", ast.operands[1].value
170
+ assert_equal "c", ast.operands[2].value
171
+ end
172
+
173
+ def test_embedded_hyphen_real_name
174
+ # Real-world case: hyphenated names
175
+ ast = GmailSearchSyntax.parse!("Coxlee-Gammage")
176
+ assert_instance_of And, ast
177
+
178
+ assert_equal 2, ast.operands.length
179
+ assert_equal "Coxlee", ast.operands[0].value
180
+ assert_equal "Gammage", ast.operands[1].value
181
+ end
182
+
183
+ def test_space_hyphen_is_negation
184
+ # Space + hyphen = negation (unchanged behavior)
185
+ ast = GmailSearchSyntax.parse!("cats -dogs")
186
+ assert_instance_of And, ast
187
+
188
+ assert_equal 2, ast.operands.length
189
+ assert_instance_of StringToken, ast.operands[0]
190
+ assert_equal "cats", ast.operands[0].value
191
+
192
+ assert_instance_of Not, ast.operands[1]
193
+ assert_equal "dogs", ast.operands[1].child.value
194
+ end
195
+
196
+ def test_embedded_hyphen_combined_with_negation
197
+ # Mixed: embedded hyphen + space-preceded negation
198
+ ast = GmailSearchSyntax.parse!("some-outfit -dogs")
199
+ assert_instance_of And, ast
200
+
201
+ assert_equal 3, ast.operands.length
202
+ assert_equal "some", ast.operands[0].value
203
+ assert_equal "outfit", ast.operands[1].value
204
+ assert_instance_of Not, ast.operands[2]
205
+ assert_equal "dogs", ast.operands[2].child.value
206
+ end
207
+
208
+ def test_negation_at_start_of_input
209
+ # Negation at start of input still works
210
+ ast = GmailSearchSyntax.parse!("-spam")
211
+ assert_instance_of Not, ast
212
+ assert_equal "spam", ast.child.value
213
+ end
214
+
215
+ def test_embedded_hyphen_with_operator
216
+ # Embedded hyphen in operator context
217
+ ast = GmailSearchSyntax.parse!("from:mary-jane")
218
+ assert_instance_of And, ast
219
+
220
+ # "from:mary" becomes operator, "-jane" is embedded hyphen → "jane" is separate word
221
+ assert_equal 2, ast.operands.length
222
+ assert_instance_of Operator, ast.operands[0]
223
+ assert_equal "from", ast.operands[0].name
224
+ assert_equal "mary", ast.operands[0].value
225
+
226
+ assert_instance_of StringToken, ast.operands[1]
227
+ assert_equal "jane", ast.operands[1].value
228
+ end
229
+
145
230
  def test_around_operator
146
231
  ast = GmailSearchSyntax.parse!("holiday AROUND 10 vacation")
147
232
  assert_instance_of Around, ast
@@ -200,10 +285,13 @@ class GmailSearchSyntaxTest < Minitest::Test
200
285
  end
201
286
 
202
287
  def test_in_anywhere
203
- # With Gmail-compatible bareword consumption, "movie" gets consumed into operator value
204
- # To search for "movie" as text, use: in:anywhere "movie" or use a different operator after
288
+ # Gmail treats barewords after operator as separate search terms
289
+ # in:anywhere movie → search for "movie" in all mail locations
205
290
  ast = GmailSearchSyntax.parse!("in:anywhere movie")
206
- assert_operator({name: "in", value: "anywhere movie"}, ast)
291
+ assert_instance_of And, ast
292
+ assert_equal 2, ast.operands.length
293
+ assert_operator({name: "in", value: "anywhere"}, ast.operands[0])
294
+ assert_string_token({value: "movie"}, ast.operands[1])
207
295
  end
208
296
 
209
297
  def test_is_starred
@@ -660,92 +748,103 @@ class GmailSearchSyntaxTest < Minitest::Test
660
748
  assert_equal 'project\\plan', ast.operands[1].value
661
749
  end
662
750
 
663
- # Gmail behavior: barewords after operator values get consumed into the operator value
664
- # until the next operator is encountered.
665
- # We now implement this Gmail-compatible behavior.
751
+ # Gmail behavior: barewords after operator values are treated as separate search terms.
752
+ # Multi-word operator values must be explicitly quoted: label:"Cora/Google Drive"
666
753
 
667
754
  def test_label_with_space_separated_value_gmail_behavior
668
- # Gmail parses this as: label:"Cora/Google Drive", label:"Notes"
669
- # We now match this behavior
755
+ # Gmail treats barewords as separate search terms
756
+ # To search for label "Cora/Google Drive", you must quote it: label:"Cora/Google Drive"
670
757
  ast = GmailSearchSyntax.parse!("label:Cora/Google Drive label:Notes")
671
758
  assert_instance_of And, ast
672
- assert_equal 2, ast.operands.length
759
+ assert_equal 3, ast.operands.length
673
760
 
674
- # Gmail-compatible: barewords consumed into operator value
761
+ # First operator takes only the first token
675
762
  assert_instance_of Operator, ast.operands[0]
676
763
  assert_equal "label", ast.operands[0].name
677
- assert_equal "Cora/Google Drive", ast.operands[0].value
764
+ assert_equal "Cora/Google", ast.operands[0].value
765
+
766
+ # "Drive" becomes a separate search term
767
+ assert_instance_of StringToken, ast.operands[1]
768
+ assert_equal "Drive", ast.operands[1].value
678
769
 
679
770
  # Second operator parsed correctly
680
- assert_instance_of Operator, ast.operands[1]
681
- assert_equal "label", ast.operands[1].name
682
- assert_equal "Notes", ast.operands[1].value
771
+ assert_instance_of Operator, ast.operands[2]
772
+ assert_equal "label", ast.operands[2].name
773
+ assert_equal "Notes", ast.operands[2].value
683
774
  end
684
775
 
685
776
  def test_subject_with_barewords_gmail_behavior
686
- # Gmail parses: subject:"urgent meeting important"
687
- # We now match this behavior
777
+ # Gmail treats barewords as separate search terms
778
+ # subject:urgent meeting important → subject contains "urgent" AND body contains "meeting" AND "important"
688
779
  ast = GmailSearchSyntax.parse!("subject:urgent meeting important")
689
- assert_instance_of Operator, ast
780
+ assert_instance_of And, ast
781
+ assert_equal 3, ast.operands.length
690
782
 
691
- assert_equal "subject", ast.name
692
- assert_equal "urgent meeting important", ast.value
783
+ assert_operator({name: "subject", value: "urgent"}, ast.operands[0])
784
+ assert_string_token({value: "meeting"}, ast.operands[1])
785
+ assert_string_token({value: "important"}, ast.operands[2])
693
786
  end
694
787
 
695
788
  def test_multiple_barewords_between_operators_gmail_behavior
696
- # Gmail parses: label:"test one two three", label:"another"
697
- # We now match this behavior
789
+ # Gmail treats each bareword as a separate search term
790
+ # label:test one two three label:another → 5 terms
698
791
  ast = GmailSearchSyntax.parse!("label:test one two three label:another")
699
792
  assert_instance_of And, ast
700
- assert_equal 2, ast.operands.length
793
+ assert_equal 5, ast.operands.length
701
794
 
702
- assert_instance_of Operator, ast.operands[0]
703
- assert_equal "label", ast.operands[0].name
704
- assert_equal "test one two three", ast.operands[0].value
705
-
706
- assert_instance_of Operator, ast.operands[1]
707
- assert_equal "label", ast.operands[1].name
708
- assert_equal "another", ast.operands[1].value
795
+ assert_operator({name: "label", value: "test"}, ast.operands[0])
796
+ assert_string_token({value: "one"}, ast.operands[1])
797
+ assert_string_token({value: "two"}, ast.operands[2])
798
+ assert_string_token({value: "three"}, ast.operands[3])
799
+ assert_operator({name: "label", value: "another"}, ast.operands[4])
709
800
  end
710
801
 
711
802
  def test_barewords_stop_at_special_operators
712
- # Bareword collection should stop at OR, AND, AROUND
803
+ # Barewords are separate terms, OR separates two groups
713
804
  ast = GmailSearchSyntax.parse!("subject:urgent meeting OR subject:important call")
714
805
  assert_instance_of Or, ast
715
806
  assert_equal 2, ast.operands.length
716
807
 
717
- assert_instance_of Operator, ast.operands[0]
718
- assert_equal "subject", ast.operands[0].name
719
- assert_equal "urgent meeting", ast.operands[0].value
808
+ # Left side: subject:urgent AND meeting (implicit AND)
809
+ assert_instance_of And, ast.operands[0]
810
+ assert_equal 2, ast.operands[0].operands.length
811
+ assert_operator({name: "subject", value: "urgent"}, ast.operands[0].operands[0])
812
+ assert_string_token({value: "meeting"}, ast.operands[0].operands[1])
720
813
 
721
- assert_instance_of Operator, ast.operands[1]
722
- assert_equal "subject", ast.operands[1].name
723
- assert_equal "important call", ast.operands[1].value
814
+ # Right side: subject:important AND call (implicit AND)
815
+ assert_instance_of And, ast.operands[1]
816
+ assert_equal 2, ast.operands[1].operands.length
817
+ assert_operator({name: "subject", value: "important"}, ast.operands[1].operands[0])
818
+ assert_string_token({value: "call"}, ast.operands[1].operands[1])
724
819
  end
725
820
 
726
821
  def test_barewords_with_mixed_tokens
727
- # Numbers, dates, emails should all be collected as barewords
822
+ # Numbers, dates, emails are all separate search terms
728
823
  ast = GmailSearchSyntax.parse!("subject:meeting 2024 Q1 review")
729
- assert_instance_of Operator, ast
730
- assert_equal "subject", ast.name
731
- assert_equal "meeting 2024 Q1 review", ast.value
824
+ assert_instance_of And, ast
825
+ assert_equal 4, ast.operands.length
826
+
827
+ assert_operator({name: "subject", value: "meeting"}, ast.operands[0])
828
+ assert_string_token({value: 2024}, ast.operands[1])
829
+ assert_string_token({value: "Q1"}, ast.operands[2])
830
+ assert_string_token({value: "review"}, ast.operands[3])
732
831
  end
733
832
 
734
833
  def test_specific_gmail_example_cora_google_drive
735
- # The specific example from the user: label:Cora/Google Drive label:Notes
736
- # This should parse as two separate label operators with multi-word values
834
+ # label:Cora/Google Drive label:Notes
835
+ # "Drive" is a separate search term - to include it in the label, quote it:
836
+ # label:"Cora/Google Drive" label:Notes
737
837
  ast = GmailSearchSyntax.parse!("label:Cora/Google Drive label:Notes")
738
838
  assert_instance_of And, ast
739
- assert_equal 2, ast.operands.length
839
+ assert_equal 3, ast.operands.length
740
840
 
741
- # First operator: label with "Cora/Google Drive"
742
- assert_instance_of Operator, ast.operands[0]
743
- assert_equal "label", ast.operands[0].name
744
- assert_equal "Cora/Google Drive", ast.operands[0].value
841
+ # First operator: label with "Cora/Google" only
842
+ assert_operator({name: "label", value: "Cora/Google"}, ast.operands[0])
843
+
844
+ # "Drive" becomes a separate search term
845
+ assert_string_token({value: "Drive"}, ast.operands[1])
745
846
 
746
847
  # Second operator: label with "Notes"
747
- assert_instance_of Operator, ast.operands[1]
748
- assert_equal "label", ast.operands[1].name
749
- assert_equal "Notes", ast.operands[1].value
848
+ assert_operator({name: "label", value: "Notes"}, ast.operands[2])
750
849
  end
751
850
  end
@@ -94,6 +94,64 @@ class TokenizerTest < Minitest::Test
94
94
  assert_token_stream(expected, tokens)
95
95
  end
96
96
 
97
+ def test_tokenize_embedded_hyphen
98
+ # Gmail behavior: embedded hyphen (no preceding whitespace) is a word separator, not negation
99
+ tokens = tokenize("some-outfit")
100
+ expected = [
101
+ {type: :word, value: "some"},
102
+ {type: :word, value: "outfit"},
103
+ {type: :eof}
104
+ ]
105
+ assert_token_stream(expected, tokens)
106
+ end
107
+
108
+ def test_tokenize_multiple_embedded_hyphens
109
+ # Multiple hyphens: a-b-c → three separate words
110
+ tokens = tokenize("a-b-c")
111
+ expected = [
112
+ {type: :word, value: "a"},
113
+ {type: :word, value: "b"},
114
+ {type: :word, value: "c"},
115
+ {type: :eof}
116
+ ]
117
+ assert_token_stream(expected, tokens)
118
+ end
119
+
120
+ def test_tokenize_hyphenated_name
121
+ # Real-world case: hyphenated names like "Coxlee-Gammage"
122
+ tokens = tokenize("Coxlee-Gammage")
123
+ expected = [
124
+ {type: :word, value: "Coxlee"},
125
+ {type: :word, value: "Gammage"},
126
+ {type: :eof}
127
+ ]
128
+ assert_token_stream(expected, tokens)
129
+ end
130
+
131
+ def test_tokenize_negation_at_start
132
+ # Negation at start of input
133
+ tokens = tokenize("-spam")
134
+ expected = [
135
+ {type: :minus},
136
+ {type: :word, value: "spam"},
137
+ {type: :eof}
138
+ ]
139
+ assert_token_stream(expected, tokens)
140
+ end
141
+
142
+ def test_tokenize_embedded_hyphen_vs_negation
143
+ # Mixed: embedded hyphen + space-preceded negation
144
+ tokens = tokenize("some-outfit -dogs")
145
+ expected = [
146
+ {type: :word, value: "some"},
147
+ {type: :word, value: "outfit"},
148
+ {type: :minus},
149
+ {type: :word, value: "dogs"},
150
+ {type: :eof}
151
+ ]
152
+ assert_token_stream(expected, tokens)
153
+ end
154
+
97
155
  def test_tokenize_around
98
156
  tokens = tokenize("holiday AROUND 10 vacation")
99
157
  expected = [
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: gmail_search_syntax
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.1.2
4
+ version: 0.1.4
5
5
  platform: ruby
6
6
  authors:
7
7
  - me@julik.nl
@@ -69,10 +69,9 @@ executables: []
69
69
  extensions: []
70
70
  extra_rdoc_files: []
71
71
  files:
72
- - ARCHITECTURE.md
73
- - GMAIL_BEHAVIOR_COMPARISON.md
74
- - GMAIL_COMPATIBILITY_COMPLETE.md
75
- - IMPLEMENTATION_NOTES.md
72
+ - ".github/workflows/ci.yml"
73
+ - AGENTS.md
74
+ - Gemfile
76
75
  - README.md
77
76
  - Rakefile
78
77
  - SCHEMA.md
@@ -84,6 +83,7 @@ files:
84
83
  - examples/postgres_vs_sqlite.rb
85
84
  - examples/sql_query.rb
86
85
  - examples/text_vs_substring_demo.rb
86
+ - gmail_search_syntax.gemspec
87
87
  - lib/GMAIL_SEARCH_OPERATORS.md
88
88
  - lib/gmail_search_syntax.rb
89
89
  - lib/gmail_search_syntax/ast.rb
@@ -91,6 +91,12 @@ files:
91
91
  - lib/gmail_search_syntax/sql_visitor.rb
92
92
  - lib/gmail_search_syntax/tokenizer.rb
93
93
  - lib/gmail_search_syntax/version.rb
94
+ - slop/ARCHITECTURE.md
95
+ - slop/EMBEDDED_HYPHENS.md
96
+ - slop/GMAIL_BEHAVIOR_COMPARISON.md
97
+ - slop/GMAIL_COMPATIBILITY_COMPLETE.md
98
+ - slop/GREEDY_VS_NON_GREEDY_TOKENIZATION.md
99
+ - slop/IMPLEMENTATION_NOTES.md
94
100
  - test/gmail_search_syntax_test.rb
95
101
  - test/integration_test.rb
96
102
  - test/postgres_visitor_test.rb
File without changes