gmail_search_syntax 0.1.2 → 0.1.3

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: 1b5e08a769d7b375473e7ca0e4afe134e03862ae3f31040c9bb22904ff482b33
-   data.tar.gz: 32624e727131b5bb0779f3f1271b6031e252fbdbdea776b445c598b06f343715
+   metadata.gz: d7c192efdd70d31b0747a356b1adab4c50160afb81e747d1088f2727f97c7262
+   data.tar.gz: 73f6401e73d2ecae59b6e6741c08786d2da3dffbbdcb6f21095c63bf40761f6d
  SHA512:
-   metadata.gz: d93ca6cb4e4d0bab18a9e3ff2620f669f9913cbafa3b389dd1bf3a828329df34be12f202856bba61e66d581dd8cf6c91a665889efeb8e5df3a50d5d60e33d131
-   data.tar.gz: c16da51e8f41ba6a9c001293df098c789b7dd5f3b494d3cfb0f60d3e16e485b8f5d185dd67c0cb1b0cade40d1079eca786c1614c7d7afc90585e97c5d0b8c4e1
+   metadata.gz: ce609cacd2a6276d9178276ee84e0ccffc80eb8840bd62e6402e2275817ba2052559f903a55648373e87a79527415d6be4df1d4052429c83e3faa9abc1fe2c1c
+   data.tar.gz: 7badbc6024a67752bfab201577ebeb900ba106ed9979110c04eed6c76d35e33e26a7b1147927a2fd9d329f6a7bc28b71d2487d0f26e87f23a9dd7f867fac7e09
@@ -0,0 +1,24 @@
+ name: CI
+
+ on:
+   push:
+     branches: [ main ]
+   pull_request:
+     branches: [ main ]
+
+ jobs:
+   test:
+     runs-on: ubuntu-latest
+
+     steps:
+       - uses: actions/checkout@v4
+
+       - name: Set up Ruby
+         uses: ruby/setup-ruby@v1
+         with:
+           ruby-version: '3.4'
+           bundler-cache: true
+
+       - name: Run tests
+         run: bundle exec rake
+
data/AGENTS.md ADDED
@@ -0,0 +1,104 @@
+ # Agent Guidelines for gmail_search_syntax
+
+ This document outlines the coding standards and workflow requirements for AI agents working on the gmail_search_syntax project.
+
+ ## Ruby File Standards
+
+ ### Frozen String Literal
+ **ALWAYS** include `# frozen_string_literal: true` at the top of every Ruby file:
+
+ ```ruby
+ # frozen_string_literal: true
+
+ class MyClass
+   # ... implementation
+ end
+ ```
+
+ This directive should be the very first line of every `.rb` file to ensure string immutability and improve performance.
+
+ ## Documentation Standards
+
+ ### Markdown Files Location
+ When writing step-by-step instructions, documentation, or process descriptions in Markdown format, place them in the `slop/` directory:
+
+ ```
+ slop/
+ ├── ARCHITECTURE.md
+ ├── GMAIL_BEHAVIOR_COMPARISON.md
+ ├── GMAIL_COMPATIBILITY_COMPLETE.md
+ └── IMPLEMENTATION_NOTES.md
+ ```
+
+ This keeps the main project directory clean while preserving detailed documentation and implementation notes.
+
+ ## Code Formatting
+
+ ### StandardRB Integration
+ After creating or modifying any Ruby file, **ALWAYS** run StandardRB to maintain consistent formatting:
+
+ ```bash
+ standardrb --fix /path/to/file.rb
+ ```
+
+ This ensures:
+ - Consistent code style across the project
+ - Automatic fixing of common formatting issues
+ - Compliance with the project's Ruby style guide
+ - Uniform indentation, spacing, and syntax
+
+ ### Workflow
+ 1. Create or modify a Ruby file
+ 2. Immediately run `standardrb --fix` on the file
+ 3. Verify the changes are acceptable
+ 4. Continue with development
+
+ ## Project Context
+
+ This is a Ruby gem that parses Gmail's search syntax and converts it into an Abstract Syntax Tree (AST). The project includes:
+
+ - **Core parsing**: Tokenizer, parser, and AST nodes
+ - **SQL conversion**: SQLite and Postgres visitors for database queries
+ - **Comprehensive testing**: Unit and integration tests
+ - **Documentation**: Schema documentation and operator reference
+
+ ## Key Files
+
+ - `lib/gmail_search_syntax.rb` - Main entry point
+ - `lib/gmail_search_syntax/parser.rb` - Core parsing logic
+ - `lib/gmail_search_syntax/sql_visitor.rb` - SQL generation
+ - `test/` - Test suite
+ - `SCHEMA.md` - Database schema documentation
+ - `slop/` - Detailed implementation documentation
+
+ ## Best Practices
+
+ 1. **Test Coverage**: Ensure all new functionality has corresponding tests
+ 2. **Documentation**: Update relevant documentation when adding features
+ 3. **Backward Compatibility**: Maintain API compatibility when possible
+ 4. **Performance**: Consider performance implications of parsing changes
+ 5. **Gmail Compatibility**: Verify changes against Gmail's actual search behavior
+
+ ## Example Workflow
+
+ ```bash
+ # 1. Create a new Ruby file
+ echo '# frozen_string_literal: true' > lib/gmail_search_syntax/new_feature.rb
+
+ # 2. Add implementation
+ # ... write code ...
+
+ # 3. Format with StandardRB
+ standardrb --fix lib/gmail_search_syntax/new_feature.rb
+
+ # 4. Create documentation if needed
+ echo '# Implementation Notes' > slop/NEW_FEATURE_NOTES.md
+
+ # 5. Add tests
+ # ... write tests ...
+
+ # 6. Run test suite
+ bundle exec rake test
+ ```
+
+ Remember: Consistency in code style and documentation organization is crucial for maintaining this project's quality and readability.
data/Gemfile ADDED
@@ -0,0 +1,3 @@
+ source "https://rubygems.org"
+
+ gemspec
data/README.md CHANGED
@@ -6,7 +6,8 @@ Based on the official Gmail search operators documentation:
  https://support.google.com/mail/answer/7190
  
  > [!TIP]
- > This gem was created for [Cora,](https://cora.computer/) your personal e-mail assistant.
+ > This gem was created for [Cora,](https://cora.computer/)
+ > your personal e-mail assistant.
  > Send them some love for allowing me to share it.
  
  ## Installation
@@ -54,47 +55,6 @@ GmailSearchSyntax.parse!("")
  # => raises GmailSearchSyntax::EmptyQueryError
  ```
  
- ### Converting to SQL
-
- The gem includes a SQLite visitor that can convert Gmail queries to SQL. Here's a complex example:
-
- ```ruby
- require 'gmail_search_syntax'
-
- # A complex Gmail query with multiple operators
- query = '(from:manager OR from:boss) subject:"quarterly review" has:attachment -label:archived after:2024/01/01 larger:5M'
-
- ast = GmailSearchSyntax.parse!(query)
- visitor = GmailSearchSyntax::SQLiteVisitor.new(current_user_email: "user@example.com")
- visitor.visit(ast)
-
- sql, params = visitor.to_query.to_sql
- ```
-
- This generates the following SQL:
-
- ```sql
- SELECT DISTINCT m0.id
- FROM messages AS m0
- INNER JOIN message_addresses AS ma1 ON m0.id = ma1.message_id
- INNER JOIN message_addresses AS ma3 ON m0.id = ma3.message_id
- INNER JOIN message_labels AS ml ON m0.id = ml.message_id
- INNER JOIN labels AS l ON ml.label_id = l.id
- WHERE ((((ma1.address_type = ? OR ma1.address_type = ? OR ma1.address_type = ?)
- AND ma1.email_address = ?)
- OR ((ma3.address_type = ? OR ma3.address_type = ? OR ma3.address_type = ?)
- AND ma3.email_address = ?))
- AND m0.subject LIKE ?
- AND m0.has_attachment = 1
- AND NOT l.name = ?
- AND m0.internal_date > ?
- AND m0.size_bytes > ?)
- ```
-
- With parameters: `["from", "cc", "bcc", "manager", "from", "cc", "bcc", "boss", "%quarterly review%", "archived", "2024-01-01", 5242880]`
-
- A similar visitor is provided for PostgreSQL.
-
  ## Supported Operators
  
  Email routing: `from:`, `to:`, `cc:`, `bcc:`, `deliveredto:`
@@ -113,6 +73,23 @@ Size: `size:`, `larger:`, `smaller:`
  
  There is also a converter from the operators to SQL queries against an embedded SQLite database. This is meant more as an example than a fully-featured store, but it shows what's possible.
  
+ ### Converting to SQL
+
+ The gem includes a SQLite visitor and a Postgres visitor which converts the Gmail queries into corresponding SQL. See SCHEMA.md for more information.
+
+ ```ruby
+ require 'gmail_search_syntax'
+
+ # A complex Gmail query with multiple operators
+ query = '(from:manager OR from:boss) subject:"quarterly review" has:attachment -label:archived after:2024/01/01 larger:5M'
+
+ ast = GmailSearchSyntax.parse!(query)
+ visitor = GmailSearchSyntax::SQLiteVisitor.new(current_user_email: "user@example.com")
+ visitor.visit(ast)
+
+ sql, params = visitor.to_query.to_sql
+ ```
+
  ## Testing
  
  ```bash
data/SCHEMA.md CHANGED
@@ -1,15 +1,111 @@
- # Database Schema for Gmail Search
+ This document describes the database schema designed to support Gmail search syntax queries. The schema is optimized for the search operators defined in `lib/GMAIL_SEARCH_OPERATORS.md`.
  
- ## Overview
+ ```ruby
+ require 'gmail_search_syntax'
  
- This document describes the SQLite database schema designed to support Gmail search syntax queries. The schema is optimized for the search operators defined in `lib/GMAIL_SEARCH_OPERATORS.md`.
+ # A complex Gmail query with multiple operators
+ query = '(from:manager OR from:boss) subject:"quarterly review" has:attachment -label:archived after:2024/01/01 larger:5M'
  
- ## Core Tables
+ ast = GmailSearchSyntax.parse!(query)
+ visitor = GmailSearchSyntax::SQLiteVisitor.new(current_user_email: "user@example.com")
+ visitor.visit(ast)
+
+ sql, params = visitor.to_query.to_sql
+ ```
+
+ generates the following SQL:
+
+ ```sql
+ SELECT DISTINCT m0.id
+ FROM messages AS m0
+ INNER JOIN message_addresses AS ma1 ON m0.id = ma1.message_id
+ INNER JOIN message_addresses AS ma3 ON m0.id = ma3.message_id
+ INNER JOIN message_labels AS ml ON m0.id = ml.message_id
+ INNER JOIN labels AS l ON ml.label_id = l.id
+ WHERE ((((ma1.address_type = ? OR ma1.address_type = ? OR ma1.address_type = ?)
+ AND ma1.email_address = ?)
+ OR ((ma3.address_type = ? OR ma3.address_type = ? OR ma3.address_type = ?)
+ AND ma3.email_address = ?))
+ AND m0.subject LIKE ?
+ AND m0.has_attachment = 1
+ AND NOT l.name = ?
+ AND m0.internal_date > ?
+ AND m0.size_bytes > ?)
+ ```
+
+ and bound parameters:
+
+ ```
+ [
+   "from", "cc", "bcc", "manager", "from", "cc", "bcc",
+   "boss", "%quarterly review%", "archived", "2024-01-01", 5242880
+ ]
+ ```
+
+ ## String Matching Requirements
+
+ ### Prefix/Suffix Matching
+ Required for:
+ - **Email addresses** (from:, to:, cc:, bcc:, deliveredto:)
+   - `from:marc@` → prefix match → `WHERE email_address LIKE 'marc@%'`
+   - `from:@example.com` → suffix match → `WHERE email_address LIKE '%@example.com'`
+   - `from:marc@example.com` → exact match → `WHERE email_address = 'marc@example.com'`
+
+ - **Mailing lists** (list:)
+   - Same pattern as email addresses
+
+ - **Filenames** (filename:)
+   - `filename:pdf` → extension match → `WHERE filename LIKE '%.pdf'`
+   - `filename:homework` → prefix match → `WHERE filename LIKE 'homework%'`
+
+ ### Exact Match Only
+ - RFC822 message IDs
+ - Boolean/enum fields (is:, has:, in:, category:, label:)
+
+ ## SQL Visitor Usage
+
+ The library provides two SQL visitor implementations for different database backends: SQLite and Postgres. They are configured to use the schema described below. You convert the search AST nodes into a SQL query using the provided SQL visitors. If you have a different schema, use the visitor code as a template.
+
+
+ ```ruby
+ ast = GmailSearchSyntax.parse!("from:amy@example.com newer_than:7d")
+ visitor = GmailSearchSyntax::SQLiteVisitor.new(current_user_email: "me@example.com")
+ visitor.visit(ast)
+
+ sql, params = visitor.to_query.to_sql
+ # sql: "SELECT DISTINCT m.id FROM messages m ... WHERE ... m.internal_date > datetime('now', ?)"
+ # params: ["from", "cc", "bcc", "amy@example.com", "-7 days"]
+ ```
+
+ The visitors implement:
+
+ - **Parameterized queries**: All user input is bound via `?` placeholders
+ - **Automatic table joins**: Joins required tables based on operators
+ - **Nested conditions**: Properly handles AND/OR/NOT with parentheses
+ - **Special operators**:
+   - `from:me` / `to:me` → uses `current_user_email`
+   - `in:anywhere` → no location filter
+   - `AROUND` → generates `(1 = 0)` no-op condition
+ - **Date handling**:
+   - Converts dates from `YYYY/MM/DD` to `YYYY-MM-DD`
+   - Parses relative times (`1y`, `2d`, `3m`) to database-specific datetime functions
+ - **Size parsing**: Converts `10M`, `1G` to bytes
+
+ ## Fuzzy Matching Limitations
+
+ The current implementation does **not** support:
+ - **AROUND operator** (proximity search) - generates no-op `(1 = 0)` condition
+ - Full-text search with word distance calculations
+ - Stemming or phonetic matching
+ - Levenshtein distance / typo tolerance
+
+ These features require additional implementation, potentially using SQLite FTS5 extensions.
  
- ### messages
- Primary table storing email message metadata.
+ ## Core Tables
  
  ```sql
+ -- messages
+ -- Primary table storing email message metadata.
  CREATE TABLE messages (
    id TEXT PRIMARY KEY,
    rfc822_message_id TEXT,
@@ -55,12 +151,12 @@ CREATE TABLE messages (
  
    created_at DATETIME DEFAULT CURRENT_TIMESTAMP
  );
- ```
+ `
  
- ### message_addresses
- Stores email addresses associated with messages (from, to, cc, bcc, delivered_to).
+ -- message_addresses
+ -- Stores email addresses associated with messages (from, to, cc, bcc, delivered_to).
+ -- The `from:` and `to:` operators search across `from`, `cc`, and `bcc` address types per Gmail specification.
  
- ```sql
  CREATE TABLE message_addresses (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    message_id TEXT NOT NULL,
@@ -70,26 +166,21 @@ CREATE TABLE message_addresses (
  
    FOREIGN KEY (message_id) REFERENCES messages(id) ON DELETE CASCADE
  );
- ```
  
- **Note:** The `from:` and `to:` operators search across `from`, `cc`, and `bcc` address types per Gmail specification.
  
- ### labels
- Label definitions with external string IDs.
+ -- labels
+ -- Label definitions with external string IDs.
  
- ```sql
  CREATE TABLE labels (
    id TEXT PRIMARY KEY,
    name TEXT NOT NULL UNIQUE,
    is_system_label BOOLEAN DEFAULT 0,
    created_at DATETIME DEFAULT CURRENT_TIMESTAMP
  );
- ```
  
- ### message_labels
- Many-to-many relationship between messages and labels.
+ -- message_labels
+ -- Many-to-many relationship between messages and labels.
  
- ```sql
  CREATE TABLE message_labels (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    message_id TEXT NOT NULL,
@@ -100,11 +191,9 @@ CREATE TABLE message_labels (
    UNIQUE(message_id, label_id)
  );
  ```
+ -- attachments
+ -- File attachments associated with messages.
  
- ### attachments
- File attachments associated with messages.
-
- ```sql
  CREATE TABLE attachments (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    message_id TEXT NOT NULL,
@@ -116,108 +205,3 @@ CREATE TABLE attachments (
  );
  ```
  
- ## String Matching Requirements
-
- ### Prefix/Suffix Matching
- Required for:
- - **Email addresses** (from:, to:, cc:, bcc:, deliveredto:)
-   - `from:marc@` → prefix match → `WHERE email_address LIKE 'marc@%'`
-   - `from:@example.com` → suffix match → `WHERE email_address LIKE '%@example.com'`
-   - `from:marc@example.com` → exact match → `WHERE email_address = 'marc@example.com'`
-
- - **Mailing lists** (list:)
-   - Same pattern as email addresses
-
- - **Filenames** (filename:)
-   - `filename:pdf` → extension match → `WHERE filename LIKE '%.pdf'`
-   - `filename:homework` → prefix match → `WHERE filename LIKE 'homework%'`
-
- ### Exact Match Only
- - RFC822 message IDs
- - Boolean/enum fields (is:, has:, in:, category:, label:)
-
- ## SQL Visitor Usage
-
- The library provides two SQL visitor implementations for different database backends:
-
- ### SQLiteVisitor
-
- Converts parsed AST into SQLite-compatible SQL queries:
-
- ```ruby
- ast = GmailSearchSyntax.parse!("from:amy@example.com newer_than:7d")
- visitor = GmailSearchSyntax::SQLiteVisitor.new(current_user_email: "me@example.com")
- visitor.visit(ast)
-
- sql, params = visitor.to_query.to_sql
- # sql: "SELECT DISTINCT m.id FROM messages m ... WHERE ... m.internal_date > datetime('now', ?)"
- # params: ["from", "cc", "bcc", "amy@example.com", "-7 days"]
- ```
-
- ### PostgresVisitor
-
- Converts parsed AST into PostgreSQL-compatible SQL queries:
-
- ```ruby
- ast = GmailSearchSyntax.parse!("from:amy@example.com newer_than:7d")
- visitor = GmailSearchSyntax::PostgresVisitor.new(current_user_email: "me@example.com")
- visitor.visit(ast)
-
- sql, params = visitor.to_query.to_sql
- # sql: "SELECT DISTINCT m.id FROM messages m ... WHERE ... m.internal_date > (NOW() - ?::interval)"
- # params: ["from", "cc", "bcc", "amy@example.com", "7 days"]
- ```
-
- **Note**: `SqlVisitor` is an alias for `SQLiteVisitor` for backward compatibility.
-
- ### Database-Specific Differences
-
- The main difference between the visitors is in relative date handling:
-
- | Feature | SQLite | PostgreSQL |
- |---------|--------|------------|
- | `older_than:7d` | `datetime('now', '-7 days')` | `NOW() - '7 days'::interval` |
- | `newer_than:3m` | `datetime('now', '-3 months')` | `NOW() - '3 months'::interval` |
- | Parameter format | `"-7 days"` (negative) | `"7 days"` (positive with cast) |
-
- All other query generation is identical between the two visitors.
-
- ### Features
-
- - **Parameterized queries**: All user input is bound via `?` placeholders
- - **Automatic table joins**: Joins required tables based on operators
- - **Nested conditions**: Properly handles AND/OR/NOT with parentheses
- - **Special operators**:
-   - `from:me` / `to:me` → uses `current_user_email`
-   - `in:anywhere` → no location filter
-   - `AROUND` → generates `(1 = 0)` no-op condition
- - **Date handling**:
-   - Converts dates from `YYYY/MM/DD` to `YYYY-MM-DD`
-   - Parses relative times (`1y`, `2d`, `3m`) to database-specific datetime functions
- - **Size parsing**: Converts `10M`, `1G` to bytes
-
- ### Query Object
-
- The `Query` class accumulates SQL components:
-
- ```ruby
- query = visitor.to_query
-
- query.conditions # Array of WHERE conditions
- query.joins # Hash of JOIN clauses
- query.params # Array of bound parameters
-
- sql, params = query.to_sql
- # Returns: [sql_string, parameters_array]
- ```
-
- ## Fuzzy Matching Limitations
-
- The current implementation does **not** support:
- - **AROUND operator** (proximity search) - generates no-op `(1 = 0)` condition
- - Full-text search with word distance calculations
- - Stemming or phonetic matching
- - Levenshtein distance / typo tolerance
-
- These features require additional implementation, potentially using SQLite FTS5 extensions.
-
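The string-matching, size and date rules that SCHEMA.md describes can be sketched in a few lines of plain Ruby. This is an illustration only, not the gem's visitor code: the helper names `address_condition`, `size_to_bytes` and `normalize_date` are hypothetical, and the sketch assumes binary multipliers (1M = 1,048,576 bytes), which is consistent with the `5242880` bound parameter shown for `larger:5M`. The `K` suffix is likewise an assumption.

```ruby
# Hypothetical helpers illustrating the rules in SCHEMA.md; not the gem's API.

# Returns [sql_fragment, bound_param] for an address operator value.
def address_condition(value)
  if value.start_with?("@")
    ["email_address LIKE ?", "%#{value}"] # from:@example.com -> suffix match
  elsif value.end_with?("@")
    ["email_address LIKE ?", "#{value}%"] # from:marc@ -> prefix match
  else
    ["email_address = ?", value]          # full address -> exact match
  end
end

# Converts a size such as "10M" or "1G" to bytes (binary multipliers assumed).
def size_to_bytes(value)
  multipliers = {"K" => 1024, "M" => 1024**2, "G" => 1024**3}
  match = value.match(/\A(\d+)([KMG])?\z/i)
  raise ArgumentError, "unrecognised size: #{value.inspect}" unless match

  match[1].to_i * multipliers.fetch(match[2].to_s.upcase, 1)
end

# Converts a Gmail date (YYYY/MM/DD) to the YYYY-MM-DD form bound in SQL.
def normalize_date(value)
  value.tr("/", "-")
end
```

Returning the fragment and the bound value separately keeps user input out of the SQL string, in line with the parameterized-query approach the visitors are documented to use.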
data/examples/demo.rb CHANGED
@@ -1,8 +1,5 @@
  require_relative "../lib/gmail_search_syntax"
-
- puts "Gmail Search Syntax Parser - Demo"
- puts "=" * 50
- puts
+ require "pp"
  
  queries = [
    "from:amy@example.com",
@@ -23,6 +20,5 @@ queries = [
  queries.each do |query|
    puts "Query: #{query}"
    ast = GmailSearchSyntax.parse!(query)
-   puts "AST: #{ast.inspect}"
-   puts
+   pp ast
  end
@@ -0,0 +1,23 @@
+ require_relative "lib/gmail_search_syntax/version"
+
+ Gem::Specification.new do |s|
+   s.name = "gmail_search_syntax"
+   s.version = GmailSearchSyntax::VERSION
+   s.summary = "Gmail search syntax parser"
+   s.authors = ["me@julik.nl"]
+   s.license = "MIT"
+   s.homepage = "https://github.com/julik/gmail_search_syntax"
+   s.required_ruby_version = ">= 3.0"
+
+   s.files = Dir.chdir(__dir__) do
+     `git ls-files -z`.split("\x0").reject do |f|
+       File.basename(f).start_with?(".")
+     end
+   end
+   s.require_paths = ["lib"]
+
+   s.add_development_dependency "minitest", "~> 5.0"
+   s.add_development_dependency "rake", "~> 13.0"
+   s.add_development_dependency "sqlite3", "< 1.6"
+   s.add_development_dependency "standard", "~> 1.0"
+ end
@@ -195,47 +195,17 @@ module GmailSearchSyntax
        when :lbrace
          parse_braces
        when :quoted_string
-         # Quoted strings are consumed as-is, no bareword collection
          value = current_token.value
          advance
          value
        when :word, :email, :number, :date, :relative_time
-         # Collect the initial value and any following barewords
-         # until we hit an operator, special token, or grouping
-         values = []
-         types = []
-
-         # Collect barewords
-         while !eof? && is_bareword_token?
-           # Check if this word is actually an operator (word followed by colon)
-           if current_token.type == :word && peek_token&.type == :colon
-             break
-           end
-
-           values << current_token.value
-           types << current_token.type
-           advance
-         end
-
-         # If we only collected one value and it's a number, preserve its type
-         if values.length == 1 && types[0] == :number
-           values[0]
-         else
-           # Multiple values or non-number: join as string
-           values.map(&:to_s).join(" ")
-         end
-       end
-     end
-
-     def is_bareword_token?
-       return false if eof?
-
-       # Barewords are simple value tokens, not operators or special syntax
-       case current_token.type
-       when :word, :email, :number, :date, :relative_time
-         true
-       else
-         false
+         # Take only a single token as the operator value.
+         # Multi-word values must be explicitly quoted: from:"john smith"
+         # This matches Gmail's actual search behavior where bare words
+         # after an operator are treated as separate search terms.
+         value = current_token.value
+         advance
+         value.is_a?(Integer) ? value : value.to_s
        end
      end
    end
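Because only a single token is now taken as the operator value, callers that build query strings programmatically may want to quote multi-word values before handing them to the parser. A minimal sketch, assuming a hypothetical helper `operator_term` that is not part of the gem, and assuming the values themselves contain no double quotes (escaping is not covered here):

```ruby
# Hypothetical helper (not part of gmail_search_syntax): builds an operator
# term and wraps the value in double quotes when it contains whitespace.
def operator_term(name, value)
  text = value.to_s
  text = %("#{text}") if text.match?(/\s/)
  "#{name}:#{text}"
end

query = [
  operator_term("label", "Cora/Google Drive"),
  operator_term("label", "Notes")
].join(" ")

puts query
```

Single-word values are left unquoted, so the helper only changes queries that would otherwise be split into an operator plus separate search terms.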
@@ -1,3 +1,3 @@
  module GmailSearchSyntax
-   VERSION = "0.1.2"
+   VERSION = "0.1.3"
  end
@@ -0,0 +1,84 @@
+ # Greedy vs Non-Greedy Operator Value Tokenization
+
+ ## Summary
+
+ This document explains the fix in `bugfix-tokens` that changes how operator values are parsed from **greedy** (consuming multiple barewords) to **non-greedy** (single token only), matching Gmail's actual search behavior.
+
+ ## The Problem
+
+ The previous implementation used greedy tokenization for operator values. When parsing `label:Cora/Google Drive`, the parser would consume all subsequent barewords (`Cora/Google`, `Drive`) into the operator's value until hitting another operator or special token.
+
+ **Previous behavior:**
+ ```
+ label:Cora/Google Drive label:Notes
+ → Operator(label, "Cora/Google Drive"), Operator(label, "Notes")
+ ```
+
+ **Gmail's actual behavior:**
+ ```
+ label:Cora/Google Drive label:Notes
+ → Operator(label, "Cora/Google"), StringToken("Drive"), Operator(label, "Notes")
+ ```
+
+ ## Gmail's Actual Behavior
+
+ In Gmail search, barewords after an operator are treated as **separate search terms**, not as part of the operator's value. To include multiple words in an operator value, you must explicitly quote them:
+
+ | Input | Gmail Interpretation |
+ |-------|---------------------|
+ | `subject:urgent meeting` | Subject contains "urgent" AND body contains "meeting" |
+ | `subject:"urgent meeting"` | Subject contains "urgent meeting" |
+ | `in:anywhere movie` | Search "movie" in all mail locations |
+ | `label:test one two` | Label is "test" AND body contains "one" AND "two" |
+
+ ## The Fix
+
+ Changed `parse_operator_value` in `lib/gmail_search_syntax/parser.rb` to only consume a single token for bareword values (`:word`, `:email`, `:number`, `:date`, `:relative_time`).
+
+ ### Before (greedy)
+
+ ```ruby
+ when :word, :email, :number, :date, :relative_time
+   values = []
+   types = []
+
+   # Collect barewords until operator or special token
+   while !eof? && is_bareword_token?
+     if current_token.type == :word && peek_token&.type == :colon
+       break
+     end
+     values << current_token.value
+     types << current_token.type
+     advance
+   end
+
+   # Join multiple values as string
+   values.map(&:to_s).join(" ")
+ ```
+
+ ### After (non-greedy)
+
+ ```ruby
+ when :word, :email, :number, :date, :relative_time
+   # Take only a single token as the operator value.
+   # Multi-word values must be explicitly quoted: from:"john smith"
+   value = current_token.value
+   advance
+   value.is_a?(Integer) ? value : value.to_s
+ ```
+
+ ## Test Changes
+
+ Updated tests to reflect the corrected behavior:
+
+ | Test | Previous Expected | Now Expected |
+ |------|-------------------|--------------|
+ | `in:anywhere movie` | `Operator("in", "anywhere movie")` | `Operator("in", "anywhere")`, `StringToken("movie")` |
+ | `subject:urgent meeting important` | `Operator("subject", "urgent meeting important")` | `Operator("subject", "urgent")`, `StringToken("meeting")`, `StringToken("important")` |
+ | `label:test one two three label:another` | 2 operands | 5 operands |
+
+ ## Implications
+
+ 1. **Breaking change** for consumers relying on greedy behavior
+ 2. Users must now quote multi-word operator values explicitly
+ 3. More accurate translation to SQL/other query languages since the semantics now match Gmail
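The difference between the two strategies can be reproduced with a toy splitter over whitespace-separated words. This is a simplified sketch, not the gem's tokenizer or parser: it ignores quoting, grouping and `OR`, and the function names and the `[:operator, name, value]` / `[:term, word]` tuples are made up for illustration.

```ruby
# Toy model of the two strategies; hypothetical, not the gem's implementation.

# Splits "name:value" into [name, value]; returns nil for a plain bareword.
def operator_parts(word)
  match = word.match(/\A(\w+):(.+)\z/)
  match && [match[1], match[2]]
end

# Old behaviour: barewords following an operator are appended to its value.
def parse_greedy(query)
  query.split.each_with_object([]) do |word, nodes|
    if (parts = operator_parts(word))
      nodes << [:operator, parts[0], parts[1]]
    elsif nodes.last && nodes.last[0] == :operator
      nodes.last[2] = "#{nodes.last[2]} #{word}"
    else
      nodes << [:term, word]
    end
  end
end

# New behaviour: an operator takes exactly one token; every other bareword
# becomes its own search term.
def parse_non_greedy(query)
  query.split.map do |word|
    parts = operator_parts(word)
    parts ? [:operator, parts[0], parts[1]] : [:term, word]
  end
end

input = "label:Cora/Google Drive label:Notes"
p parse_greedy(input)
p parse_non_greedy(input)
```

For the sample input, the greedy variant yields two operator nodes while the non-greedy variant yields three nodes, mirroring the operand counts in the updated tests.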
@@ -200,10 +200,13 @@ class GmailSearchSyntaxTest < Minitest::Test
    end
  
    def test_in_anywhere
-     # With Gmail-compatible bareword consumption, "movie" gets consumed into operator value
-     # To search for "movie" as text, use: in:anywhere "movie" or use a different operator after
+     # Gmail treats barewords after operator as separate search terms
+     # in:anywhere movie → search for "movie" in all mail locations
      ast = GmailSearchSyntax.parse!("in:anywhere movie")
-     assert_operator({name: "in", value: "anywhere movie"}, ast)
+     assert_instance_of And, ast
+     assert_equal 2, ast.operands.length
+     assert_operator({name: "in", value: "anywhere"}, ast.operands[0])
+     assert_string_token({value: "movie"}, ast.operands[1])
    end
  
    def test_is_starred
@@ -660,92 +663,103 @@ class GmailSearchSyntaxTest < Minitest::Test
660
663
  assert_equal 'project\\plan', ast.operands[1].value
661
664
  end
662
665
 
663
- # Gmail behavior: barewords after operator values get consumed into the operator value
664
- # until the next operator is encountered.
665
- # We now implement this Gmail-compatible behavior.
666
+ # Gmail behavior: barewords after operator values are treated as separate search terms.
667
+ # Multi-word operator values must be explicitly quoted: label:"Cora/Google Drive"
666
668
 
667
669
  def test_label_with_space_separated_value_gmail_behavior
668
- # Gmail parses this as: label:"Cora/Google Drive", label:"Notes"
669
- # We now match this behavior
670
+ # Gmail treats barewords as separate search terms
671
+ # To search for label "Cora/Google Drive", you must quote it: label:"Cora/Google Drive"
670
672
  ast = GmailSearchSyntax.parse!("label:Cora/Google Drive label:Notes")
671
673
  assert_instance_of And, ast
672
- assert_equal 2, ast.operands.length
674
+ assert_equal 3, ast.operands.length
673
675
 
674
- # Gmail-compatible: barewords consumed into operator value
676
+ # First operator takes only the first token
675
677
  assert_instance_of Operator, ast.operands[0]
676
678
  assert_equal "label", ast.operands[0].name
677
- assert_equal "Cora/Google Drive", ast.operands[0].value
679
+ assert_equal "Cora/Google", ast.operands[0].value
680
+
681
+ # "Drive" becomes a separate search term
682
+ assert_instance_of StringToken, ast.operands[1]
683
+ assert_equal "Drive", ast.operands[1].value
678
684
 
679
685
  # Second operator parsed correctly
680
- assert_instance_of Operator, ast.operands[1]
681
- assert_equal "label", ast.operands[1].name
682
- assert_equal "Notes", ast.operands[1].value
686
+ assert_instance_of Operator, ast.operands[2]
687
+ assert_equal "label", ast.operands[2].name
688
+ assert_equal "Notes", ast.operands[2].value
683
689
  end
684
690
 
685
691
  def test_subject_with_barewords_gmail_behavior
686
- # Gmail parses: subject:"urgent meeting important"
687
- # We now match this behavior
692
+ # Gmail treats barewords as separate search terms
693
+ # subject:urgent meeting important subject contains "urgent" AND body contains "meeting" AND "important"
688
694
  ast = GmailSearchSyntax.parse!("subject:urgent meeting important")
689
- assert_instance_of Operator, ast
695
+ assert_instance_of And, ast
696
+ assert_equal 3, ast.operands.length
690
697
 
691
- assert_equal "subject", ast.name
692
- assert_equal "urgent meeting important", ast.value
698
+ assert_operator({name: "subject", value: "urgent"}, ast.operands[0])
699
+ assert_string_token({value: "meeting"}, ast.operands[1])
+ assert_string_token({value: "important"}, ast.operands[2])
  end
 
  def test_multiple_barewords_between_operators_gmail_behavior
- # Gmail parses: label:"test one two three", label:"another"
- # We now match this behavior
+ # Gmail treats each bareword as a separate search term
+ # label:test one two three label:another → 5 terms
  ast = GmailSearchSyntax.parse!("label:test one two three label:another")
  assert_instance_of And, ast
- assert_equal 2, ast.operands.length
+ assert_equal 5, ast.operands.length
 
- assert_instance_of Operator, ast.operands[0]
- assert_equal "label", ast.operands[0].name
- assert_equal "test one two three", ast.operands[0].value
-
- assert_instance_of Operator, ast.operands[1]
- assert_equal "label", ast.operands[1].name
- assert_equal "another", ast.operands[1].value
+ assert_operator({name: "label", value: "test"}, ast.operands[0])
+ assert_string_token({value: "one"}, ast.operands[1])
+ assert_string_token({value: "two"}, ast.operands[2])
+ assert_string_token({value: "three"}, ast.operands[3])
+ assert_operator({name: "label", value: "another"}, ast.operands[4])
  end
 
  def test_barewords_stop_at_special_operators
- # Bareword collection should stop at OR, AND, AROUND
+ # Barewords are separate terms, OR separates two groups
  ast = GmailSearchSyntax.parse!("subject:urgent meeting OR subject:important call")
  assert_instance_of Or, ast
  assert_equal 2, ast.operands.length
 
- assert_instance_of Operator, ast.operands[0]
- assert_equal "subject", ast.operands[0].name
- assert_equal "urgent meeting", ast.operands[0].value
+ # Left side: subject:urgent AND meeting (implicit AND)
+ assert_instance_of And, ast.operands[0]
+ assert_equal 2, ast.operands[0].operands.length
+ assert_operator({name: "subject", value: "urgent"}, ast.operands[0].operands[0])
+ assert_string_token({value: "meeting"}, ast.operands[0].operands[1])
 
- assert_instance_of Operator, ast.operands[1]
- assert_equal "subject", ast.operands[1].name
- assert_equal "important call", ast.operands[1].value
+ # Right side: subject:important AND call (implicit AND)
+ assert_instance_of And, ast.operands[1]
+ assert_equal 2, ast.operands[1].operands.length
+ assert_operator({name: "subject", value: "important"}, ast.operands[1].operands[0])
+ assert_string_token({value: "call"}, ast.operands[1].operands[1])
  end
 
  def test_barewords_with_mixed_tokens
- # Numbers, dates, emails should all be collected as barewords
+ # Numbers, dates, emails are all separate search terms
  ast = GmailSearchSyntax.parse!("subject:meeting 2024 Q1 review")
- assert_instance_of Operator, ast
- assert_equal "subject", ast.name
- assert_equal "meeting 2024 Q1 review", ast.value
+ assert_instance_of And, ast
+ assert_equal 4, ast.operands.length
+
+ assert_operator({name: "subject", value: "meeting"}, ast.operands[0])
+ assert_string_token({value: 2024}, ast.operands[1])
+ assert_string_token({value: "Q1"}, ast.operands[2])
+ assert_string_token({value: "review"}, ast.operands[3])
  end
 
  def test_specific_gmail_example_cora_google_drive
- # The specific example from the user: label:Cora/Google Drive label:Notes
- # This should parse as two separate label operators with multi-word values
+ # label:Cora/Google Drive label:Notes
+ # "Drive" is a separate search term - to include it in the label, quote it:
+ # label:"Cora/Google Drive" label:Notes
  ast = GmailSearchSyntax.parse!("label:Cora/Google Drive label:Notes")
  assert_instance_of And, ast
- assert_equal 2, ast.operands.length
+ assert_equal 3, ast.operands.length
 
- # First operator: label with "Cora/Google Drive"
- assert_instance_of Operator, ast.operands[0]
- assert_equal "label", ast.operands[0].name
- assert_equal "Cora/Google Drive", ast.operands[0].value
+ # First operator: label with "Cora/Google" only
+ assert_operator({name: "label", value: "Cora/Google"}, ast.operands[0])
+
+ # "Drive" becomes a separate search term
+ assert_string_token({value: "Drive"}, ast.operands[1])
 
  # Second operator: label with "Notes"
- assert_instance_of Operator, ast.operands[1]
- assert_equal "label", ast.operands[1].name
- assert_equal "Notes", ast.operands[1].value
+ assert_operator({name: "label", value: "Notes"}, ast.operands[2])
  end
  end
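The updated tests above describe a non-greedy rule: an operator claims only the token attached to it, and every following bareword is its own search term unless the value is quoted. The following is a minimal standalone sketch of that rule, not the gem's tokenizer or parser. The names `Term`, `Op`, and `split_terms` are hypothetical, and the sketch ignores `OR`, `AND`, grouping, and typed tokens (for example, `2024` would stay a string here).

```ruby
# Hypothetical illustration only; these names do not come from gmail_search_syntax.
Term = Struct.new(:value)      # a free-standing search term
Op = Struct.new(:name, :value) # an operator such as label:Notes

# Split a query into operators and barewords. An operator takes only the
# token directly attached to it; a quoted value may contain spaces.
def split_terms(query)
  # Alternatives: name:"quoted", name:value, "quoted phrase", bareword.
  query.scan(/(?:(\w+):)?(?:"([^"]*)"|(\S+))/).map do |name, quoted, bare|
    value = quoted || bare
    name ? Op.new(name, value) : Term.new(value)
  end
end

split_terms("label:Cora/Google Drive label:Notes").each { |t| p t }
split_terms('label:"Cora/Google Drive" label:Notes').each { |t| p t }
```

In the first query `Drive` comes out as a separate `Term`; in the second the quotes keep it inside the label value, which mirrors the comment in `test_specific_gmail_example_cora_google_drive`.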
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: gmail_search_syntax
  version: !ruby/object:Gem::Version
- version: 0.1.2
+ version: 0.1.3
  platform: ruby
  authors:
  - me@julik.nl
@@ -69,10 +69,9 @@ executables: []
  extensions: []
  extra_rdoc_files: []
  files:
- - ARCHITECTURE.md
- - GMAIL_BEHAVIOR_COMPARISON.md
- - GMAIL_COMPATIBILITY_COMPLETE.md
- - IMPLEMENTATION_NOTES.md
+ - ".github/workflows/ci.yml"
+ - AGENTS.md
+ - Gemfile
  - README.md
  - Rakefile
  - SCHEMA.md
@@ -84,6 +83,7 @@ files:
  - examples/postgres_vs_sqlite.rb
  - examples/sql_query.rb
  - examples/text_vs_substring_demo.rb
+ - gmail_search_syntax.gemspec
  - lib/GMAIL_SEARCH_OPERATORS.md
  - lib/gmail_search_syntax.rb
  - lib/gmail_search_syntax/ast.rb
@@ -91,6 +91,11 @@ files:
  - lib/gmail_search_syntax/sql_visitor.rb
  - lib/gmail_search_syntax/tokenizer.rb
  - lib/gmail_search_syntax/version.rb
+ - slop/ARCHITECTURE.md
+ - slop/GMAIL_BEHAVIOR_COMPARISON.md
+ - slop/GMAIL_COMPATIBILITY_COMPLETE.md
+ - slop/GREEDY_VS_NON_GREEDY_TOKENIZATION.md
+ - slop/IMPLEMENTATION_NOTES.md
  - test/gmail_search_syntax_test.rb
  - test/integration_test.rb
  - test/postgres_visitor_test.rb