gmail_search_syntax 0.1.0
- checksums.yaml +7 -0
- data/ARCHITECTURE.md +338 -0
- data/README.md +129 -0
- data/Rakefile +11 -0
- data/SCHEMA.md +223 -0
- data/examples/alias_collision_fix.rb +43 -0
- data/examples/demo.rb +28 -0
- data/examples/gmail_message_id_demo.rb +118 -0
- data/examples/postgres_vs_sqlite.rb +55 -0
- data/examples/sql_query.rb +47 -0
- data/lib/GMAIL_SEARCH_OPERATORS.md +58 -0
- data/lib/gmail_search_syntax/ast.rb +100 -0
- data/lib/gmail_search_syntax/parser.rb +224 -0
- data/lib/gmail_search_syntax/sql_visitor.rb +496 -0
- data/lib/gmail_search_syntax/tokenizer.rb +152 -0
- data/lib/gmail_search_syntax/version.rb +3 -0
- data/lib/gmail_search_syntax.rb +34 -0
- data/test/gmail_search_syntax_test.rb +691 -0
- data/test/integration_test.rb +668 -0
- data/test/postgres_visitor_test.rb +156 -0
- data/test/sql_visitor_test.rb +346 -0
- data/test/test_helper.rb +27 -0
- data/test/tokenizer_test.rb +185 -0
- metadata +115 -0
checksums.yaml
ADDED
@@ -0,0 +1,7 @@
---
SHA256:
  metadata.gz: bb674a944e51bd81d0690d2d972c0b63bae352a30469d5d1719b959d78316b4f
  data.tar.gz: 03fc2f44531f2d610e6dc189ab7104bf753cb3fff325de5f6e388b4529e7c9d3
SHA512:
  metadata.gz: 13394002e2a5546876608249caeb8ed3379e6cad5bab7aaf324830f5ca885c9838838d3416eb6e4b6f5a17cc728e453dc586648572db769b7e5ec9b6fa758244
  data.tar.gz: f22df8edc76415087dc990be7c895ec2af13545c7238c573cf129bbf8152adceded0f82d1d62dadebdc71f00bcb37edd477ec28661c365efb549dcc6e9688551
data/ARCHITECTURE.md
ADDED
@@ -0,0 +1,338 @@
# Gmail Search Syntax Parser Architecture

A rigorous parser for Gmail's search syntax as documented at [Gmail Help](https://support.google.com/mail/answer/7190).

## Architecture Overview

The parser is built in three stages:

1. **Tokenization** - Breaking the input string into tokens
2. **Parsing** - Building an Abstract Syntax Tree (AST) from tokens
3. **AST** - Minimal tree structure representing the search query

## Tokenization

The `Tokenizer` class scans the input character by character and produces a stream of tokens.

### Token Types

- **Keywords**: `word` - Operator names (from, to, subject, etc.)
- **Punctuation**: `colon`, `lparen`, `rparen`, `lbrace`, `rbrace`, `minus`, `plus`
- **Logical Operators**: `or`, `and`, `around`
- **Values**: `email`, `number`, `date`, `relative_time`, `quoted_string`
- **End**: `eof`

### Key Features

- Recognizes email addresses (contains `@`)
- Parses quoted strings with escape sequences
- Identifies dates (`YYYY/MM/DD` or `MM/DD/YYYY`)
- Recognizes relative times (`1y`, `2d`, `3m`)
- Handles logical operators (`OR`, `AND`, `AROUND`)
- Properly tokenizes negation (`-`)

### Example

```ruby
Input: 'from:amy@example.com subject:"meeting notes"'
Tokens: [word("from"), colon, email("amy@example.com"),
         word("subject"), colon, quoted_string("meeting notes"), eof]
```
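
The scanning approach described above can be sketched in a few lines of Ruby. This is a simplified illustration and not the gem's `Tokenizer`: the `sketch_tokenize` method and its `[type, value]` pair output are hypothetical, and the sketch only classifies words, colons, emails, numbers, and quoted strings (no dates, relative times, escape sequences, negation, or logical operators).

```ruby
require "strscan"

# Hypothetical, simplified tokenizer sketch (not the gem's implementation).
# Returns an array of [type, value] pairs ending with [:eof, nil].
def sketch_tokenize(input)
  scanner = StringScanner.new(input)
  tokens = []
  until scanner.eos?
    if scanner.skip(/\s+/)
      next
    elsif scanner.scan(/:/)
      tokens << [:colon, ":"]
    elsif scanner.scan(/"([^"]*)"/)
      # Capture group 1 holds the text between the quotes.
      tokens << [:quoted_string, scanner[1]]
    elsif scanner.scan(/[^\s:()"{}]+/)
      text = scanner.matched
      type =
        if text.include?("@") then :email
        elsif text.match?(/\A\d+\z/) then :number
        else :word
        end
      tokens << [type, text]
    else
      # Punctuation this sketch does not classify (parentheses, braces, stray quotes).
      tokens << [:punct, scanner.getch]
    end
  end
  tokens << [:eof, nil]
end

p sketch_tokenize('from:amy@example.com subject:"meeting notes"')
```

Classifying values such as emails at this stage, rather than in the parser, mirrors the "strict tokenization" choice the document describes.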

## Parsing

The `Parser` class implements a recursive descent parser that builds an AST from the token stream.

### Operator Precedence (highest to lowest)

1. Unary operators (`-`, `+`)
2. Primary expressions (operators, text, grouping)
3. `AROUND` (proximity search)
4. Implicit `AND` (adjacency)
5. Explicit `AND`
6. `OR`

### Grammar

```
expression     → or_expression
or_expression  → and_expression ( "OR" and_expression )*
and_expression → around_expr ( around_expr )* [ "AND" around_expr ]*
around_expr    → unary_expr [ "AROUND" NUMBER? unary_expr ]
unary_expr     → "-" primary | "+" primary | primary
primary        → "(" expression ")"
               | "{" or_list "}"
               | operator ":" value
               | quoted_string
               | word
```

### Key Features

- **Implicit AND**: Adjacent terms are combined with AND
- **Braces as OR**: `{a b c}` is equivalent to `a OR b OR c`
- **Negation**: `-term` creates a NOT node
- **Grouping**: Parentheses override precedence
- **Operator values**: Can be words, emails, numbers, dates, or even grouped expressions

## AST Structure

The AST is a minimal tree representation with the following node types:

### Node Types

#### `Operator`
Represents a search operator (from, to, subject, etc.)
```ruby
name: String        # "from", "to", "subject", etc.
value: String|Node  # Value or nested expression
```

#### `Text`
Plain text search term.
```ruby
value: String
```

#### `And`
Logical AND combination with 2 or more operands.
```ruby
operands: [Node]  # Array of 2+ nodes
```

#### `Or`
Logical OR combination with 2 or more operands.
```ruby
operands: [Node]  # Array of 2+ nodes
```

#### `Not`
Negation (exclusion).
```ruby
child: Node
```

#### `Group`
Parenthesized grouping.
```ruby
children: [Node]
```

#### `Around`
Proximity search.
```ruby
left: Node
distance: Integer  # Default: 5
right: Node
```
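
To make the node shapes above concrete, the sketch below models them as plain Ruby `Struct`s and builds one tree by hand. The `AstSketch` module and its constructors are assumptions for illustration only; the gem defines its own classes in `lib/gmail_search_syntax/ast.rb`, which may differ.

```ruby
# Hypothetical stand-ins for the node types listed above, modelled as Structs.
# These are not the gem's classes; names and constructors are assumptions.
module AstSketch
  Operator = Struct.new(:name, :value)
  Text     = Struct.new(:value)
  And      = Struct.new(:operands)
  Or       = Struct.new(:operands)
  Not      = Struct.new(:child)
  Around   = Struct.new(:left, :distance, :right)
end

# Hand-built tree for: (from:boss OR from:manager) subject:urgent -label:spam
tree = AstSketch::And.new([
  AstSketch::Or.new([
    AstSketch::Operator.new("from", "boss"),
    AstSketch::Operator.new("from", "manager")
  ]),
  AstSketch::Operator.new("subject", "urgent"),
  AstSketch::Not.new(AstSketch::Operator.new("label", "spam"))
])

puts tree.operands.length           # number of top-level operands
puts tree.operands.last.child.name  # operator wrapped by the Not node
```

Because `And` and `Or` hold an array of operands rather than a left/right pair, a three-term conjunction stays a single node with three children.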

## Supported Operators

Based on Gmail's official documentation:

### Email Routing
- `from:`, `to:`, `cc:`, `bcc:`, `deliveredto:`

### Metadata
- `subject:`, `label:`, `category:`, `list:`

### Dates & Times
- `after:`, `before:`, `older:`, `newer:`, `older_than:`, `newer_than:`

### Attachments
- `has:`, `filename:`

### Status & Location
- `is:`, `in:`

### Size
- `size:`, `larger:`, `smaller:`

### Advanced
- `rfc822msgid:`

## Examples

### Simple Query
```ruby
Input: "from:amy@example.com"
AST: Operator("from", "amy@example.com")
```

### Logical OR
```ruby
Input: "from:amy OR from:bob"
AST: Or([
  Operator("from", "amy"),
  Operator("from", "bob")
])
```

### Multiple OR
```ruby
Input: "{from:a from:b from:c}"
AST: Or([
  Operator("from", "a"),
  Operator("from", "b"),
  Operator("from", "c")
])
```

### Implicit AND
```ruby
Input: "subject:meeting has:attachment"
AST: And([
  Operator("subject", "meeting"),
  Operator("has", "attachment")
])
```

### Multiple AND
```ruby
Input: "from:boss subject:urgent has:attachment"
AST: And([
  Operator("from", "boss"),
  Operator("subject", "urgent"),
  Operator("has", "attachment")
])
```

### Negation
```ruby
Input: "dinner -movie"
AST: And([
  Text("dinner"),
  Not(Text("movie"))
])
```

### Proximity Search
```ruby
Input: "holiday AROUND 10 vacation"
AST: Around(
  Text("holiday"),
  10,
  Text("vacation")
)
```

### Complex Query
```ruby
Input: "(from:boss OR from:manager) subject:urgent -label:spam"
AST: And([
  Or([
    Operator("from", "boss"),
    Operator("from", "manager")
  ]),
  Operator("subject", "urgent"),
  Not(Operator("label", "spam"))
])
```

### Grouped Operator Values
```ruby
Input: "subject:(dinner movie)"
AST: Operator("subject",
  And([
    Text("dinner"),
    Text("movie")
  ])
)
```

### OR/AND Inside Operator Values

**Using Parentheses with OR:**
```ruby
Input: "from:(mischa@ OR julik@)"
AST: Operator("from",
  Or([
    Text("mischa@"),
    Text("julik@")
  ])
)
```

**Using Curly Braces (implicit OR):**
```ruby
Input: "from:{mischa@ marc@}"
AST: Operator("from",
  Or([
    Text("mischa@"),
    Text("marc@")
  ])
)
```

**Combined with Other Conditions:**
```ruby
Input: "from:(alice@ OR bob@) subject:meeting"
AST: And([
  Operator("from",
    Or([
      Text("alice@"),
      Text("bob@")
    ])
  ),
  Operator("subject", "meeting")
])
```

## Testing

The parser includes comprehensive test coverage:

- **84 tests** across two test suites
- **460 assertions** verifying behavior
- Tests for all operators from Gmail documentation
- Edge case handling (empty queries, nested groups, etc.)
- Separate tokenizer tests with strict order verification
- Tests for complex expressions inside operator values:
  - OR/AND with parentheses
  - OR with curly braces (implicit OR)
  - Mixed parentheses and curly braces

Run tests:
```bash
bundle exec rake test
```

## Usage

```ruby
require 'gmail_search_syntax'

ast = GmailSearchSyntax.parse!("from:manager subject:meeting")
# => #<And #<Operator from: "manager"> AND #<Operator subject: "meeting">>

# Access AST nodes
ast.operands[0].name # => "from"
ast.operands[0].value # => "manager"
ast.operands[1].name # => "subject"
ast.operands[1].value # => "meeting"

# Empty queries raise an error
GmailSearchSyntax.parse!("")
# => raises GmailSearchSyntax::EmptyQueryError: "Query cannot be empty"
```

## Design Decisions

1. **Minimal AST**: No redundant nodes; single-child nodes are collapsed
2. **Multi-operand AND/OR**: Support 2+ operands instead of binary left/right structure
3. **Strict tokenization**: Emails, dates, and numbers are recognized at tokenization
4. **Operator precedence**: Matches Gmail's actual behavior
5. **Implicit AND**: Adjacent terms combine naturally
6. **Value flexibility**: Operator values can be expressions (for grouping)
7. **Fail-fast on empty**: `parse!` raises `EmptyQueryError` for empty/whitespace-only input
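
The first two decisions can be illustrated with a small helper that only creates an `And` node when there is something to combine. This is a sketch under assumptions: `CollapsedAnd` and `build_and` are hypothetical names, not part of the gem, and the real parser may collapse nodes differently.

```ruby
# Hypothetical node type; the gem's own And class may differ.
CollapsedAnd = Struct.new(:operands)

# Wraps operands in an And node only when there are two or more;
# a single operand is returned unchanged, so no redundant node is created.
def build_and(operands)
  raise ArgumentError, "no operands" if operands.empty?
  return operands.first if operands.length == 1

  CollapsedAnd.new(operands)
end

p build_and(["from:amy"])                # the lone operand, unwrapped
p build_and(["from:amy", "subject:hi"])  # a multi-operand And node
```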

## Extensions

The parser is designed to be extended:

1. Add semantic validation (valid operator names, date formats)
2. Convert AST to other query formats (SQL, Elasticsearch)
3. Add query optimization (flatten nested ANDs/ORs)
4. Pretty-print queries
5. Query builder API
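
As one example of the optimization idea in item 3, the sketch below flattens an `And` nested directly inside an `And` (and likewise for `Or`) into a single multi-operand node. `FlatAnd`, `FlatOr`, and `flatten_node` are hypothetical names used only here; the gem does not necessarily ship such a pass.

```ruby
# Hypothetical node types; the gem's own AST classes may differ.
FlatAnd = Struct.new(:operands)
FlatOr  = Struct.new(:operands)

# Recursively merges And-inside-And and Or-inside-Or into one node.
# Other values (leaf nodes) are returned unchanged.
def flatten_node(node)
  return node unless node.is_a?(FlatAnd) || node.is_a?(FlatOr)

  merged = node.operands.flat_map do |operand|
    child = flatten_node(operand)
    # Only absorb a child of the same kind; an Or inside an And stays nested.
    child.is_a?(node.class) ? child.operands : [child]
  end
  node.class.new(merged)
end

nested = FlatAnd.new([:a, FlatAnd.new([:b, FlatAnd.new([:c, :d])])])
p flatten_node(nested).operands
```

The rewrite is safe because AND and OR are associative; mixing the two kinds is left untouched.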
data/README.md
ADDED
@@ -0,0 +1,129 @@
# gmail_search_syntax

Parser for Gmail's search syntax. Converts Gmail search queries into an Abstract Syntax Tree.

Based on the official Gmail search operators documentation:
https://support.google.com/mail/answer/7190

> [!TIP]
> This gem was created for [Cora](https://cora.computer/), your personal e-mail assistant.
> Send them some love for allowing me to share it.

## Installation

```ruby
gem 'gmail_search_syntax'
```

## Usage

```ruby
require 'gmail_search_syntax'

ast = GmailSearchSyntax.parse!("from:boss subject:meeting")
# => #<And #<Operator from: "boss"> AND #<Operator subject: "meeting">>
```

Afterwards you can do all sorts of interesting things with the AST, for example transform its nodes into Elasticsearch or SQL queries, or evaluate them bottom-up against arrays in memory.
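
As a rough illustration of the in-memory, bottom-up evaluation mentioned above, the sketch below walks a hand-built tree against an array of message hashes. Everything in it is an assumption for demonstration: the `MemSearch` module, its Struct node types, the substring matching rule, and the message hash keys are hypothetical, and operator values are assumed to be plain strings (no nested expressions).

```ruby
module MemSearch
  # Hypothetical node types standing in for the gem's AST classes.
  Text     = Struct.new(:value)
  Operator = Struct.new(:name, :value)
  And      = Struct.new(:operands)
  Or       = Struct.new(:operands)
  Not      = Struct.new(:child)

  # Returns true when the message hash satisfies the node.
  # Children are evaluated first and their results combined (bottom-up).
  def self.match?(node, message)
    case node
    when And then node.operands.all? { |n| match?(n, message) }
    when Or  then node.operands.any? { |n| match?(n, message) }
    when Not then !match?(node.child, message)
    when Operator
      # Case-insensitive substring match on the field named by the operator.
      message.fetch(node.name.to_sym, "").to_s.downcase.include?(node.value.downcase)
    when Text
      # Bare text matches if any field contains it.
      message.values.any? { |v| v.to_s.downcase.include?(node.value.downcase) }
    else
      raise ArgumentError, "unknown node: #{node.inspect}"
    end
  end
end

messages = [
  { from: "boss@example.com", subject: "Quarterly meeting" },
  { from: "amy@example.com",  subject: "Lunch" }
]

# Hand-built equivalent of: from:boss -lunch
query = MemSearch::And.new([
  MemSearch::Operator.new("from", "boss"),
  MemSearch::Not.new(MemSearch::Text.new("lunch"))
])

hits = messages.select { |m| MemSearch.match?(query, m) }
p hits
```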

### Examples

```ruby
# Simple operator
GmailSearchSyntax.parse!("from:amy@example.com")
# => #<Operator from: "amy@example.com">

# Logical OR
GmailSearchSyntax.parse!("from:amy OR from:bob")
# => #<Or #<Operator from: "amy"> OR #<Operator from: "bob">>

# Negation
GmailSearchSyntax.parse!("dinner -movie")
# => #<And #<Text "dinner"> AND #<Not #<Text "movie">>>

# Proximity search
GmailSearchSyntax.parse!("holiday AROUND 10 vacation")
# => #<Around #<Text "holiday"> AROUND 10 #<Text "vacation">>

# Complex query with OR inside operator values
GmailSearchSyntax.parse!("from:{alice@ bob@} subject:urgent")
# => #<And #<Operator from: #<Or ...>> AND #<Operator subject: "urgent">>

# Empty queries raise an error
GmailSearchSyntax.parse!("")
# => raises GmailSearchSyntax::EmptyQueryError
```

### Converting to SQL

The gem includes a SQLite visitor that can convert Gmail queries to SQL. Here's a complex example:

```ruby
require 'gmail_search_syntax'

# A complex Gmail query with multiple operators
query = '(from:manager OR from:boss) subject:"quarterly review" has:attachment -label:archived after:2024/01/01 larger:5M'

ast = GmailSearchSyntax.parse!(query)
visitor = GmailSearchSyntax::SQLiteVisitor.new(current_user_email: "user@example.com")
visitor.visit(ast)

sql, params = visitor.to_query.to_sql
```

This generates the following SQL:

```sql
SELECT DISTINCT m0.id
FROM messages AS m0
INNER JOIN message_addresses AS ma1 ON m0.id = ma1.message_id
INNER JOIN message_addresses AS ma3 ON m0.id = ma3.message_id
INNER JOIN message_labels AS ml ON m0.id = ml.message_id
INNER JOIN labels AS l ON ml.label_id = l.id
WHERE ((((ma1.address_type = ? OR ma1.address_type = ? OR ma1.address_type = ?)
        AND ma1.email_address = ?)
    OR ((ma3.address_type = ? OR ma3.address_type = ? OR ma3.address_type = ?)
        AND ma3.email_address = ?))
  AND m0.subject LIKE ?
  AND m0.has_attachment = 1
  AND NOT l.name = ?
  AND m0.internal_date > ?
  AND m0.size_bytes > ?)
```

With parameters: `["from", "cc", "bcc", "manager", "from", "cc", "bcc", "boss", "%quarterly review%", "archived", "2024-01-01", 5242880]`

A similar visitor is provided for PostgreSQL.

## Supported Operators

Email routing: `from:`, `to:`, `cc:`, `bcc:`, `deliveredto:`
Metadata: `subject:`, `label:`, `category:`, `list:`
Dates: `after:`, `before:`, `older:`, `newer:`, `older_than:`, `newer_than:`
Attachments: `has:`, `filename:`
Status: `is:`, `in:`
Size: `size:`, `larger:`, `smaller:`

## Features

- Handles operator precedence (negation, AROUND, implicit AND, explicit AND, OR)
- Supports grouping with parentheses and braces
- Recognizes emails, dates, quoted strings, and numbers
- Minimal AST structure

There is also a converter from the operators to SQL queries against an embedded SQLite database. This is meant more as an example than a fully-featured store, but it shows what's possible.

## Testing

```bash
bundle exec rake test
```

## License

MIT

## Legal Notes

Gmail is a trademark of Google LLC.
data/Rakefile
ADDED
data/SCHEMA.md
ADDED
@@ -0,0 +1,223 @@
# Database Schema for Gmail Search

## Overview

This document describes the SQLite database schema designed to support Gmail search syntax queries. The schema is optimized for the search operators defined in `lib/GMAIL_SEARCH_OPERATORS.md`.

## Core Tables

### messages
Primary table storing email message metadata.

```sql
CREATE TABLE messages (
  id TEXT PRIMARY KEY,
  rfc822_message_id TEXT,
  subject TEXT,
  body TEXT,
  internal_date DATETIME,
  size_bytes INTEGER,

  is_important BOOLEAN DEFAULT 0,
  is_starred BOOLEAN DEFAULT 0,
  is_unread BOOLEAN DEFAULT 0,
  is_read BOOLEAN DEFAULT 0,
  is_muted BOOLEAN DEFAULT 0,

  in_inbox BOOLEAN DEFAULT 1,
  in_archive BOOLEAN DEFAULT 0,
  in_snoozed BOOLEAN DEFAULT 0,
  in_spam BOOLEAN DEFAULT 0,
  in_trash BOOLEAN DEFAULT 0,

  has_attachment BOOLEAN DEFAULT 0,
  has_youtube BOOLEAN DEFAULT 0,
  has_drive BOOLEAN DEFAULT 0,
  has_document BOOLEAN DEFAULT 0,
  has_spreadsheet BOOLEAN DEFAULT 0,
  has_presentation BOOLEAN DEFAULT 0,

  has_yellow_star BOOLEAN DEFAULT 0,
  has_orange_star BOOLEAN DEFAULT 0,
  has_red_star BOOLEAN DEFAULT 0,
  has_purple_star BOOLEAN DEFAULT 0,
  has_blue_star BOOLEAN DEFAULT 0,
  has_green_star BOOLEAN DEFAULT 0,
  has_red_bang BOOLEAN DEFAULT 0,
  has_orange_guillemet BOOLEAN DEFAULT 0,
  has_yellow_bang BOOLEAN DEFAULT 0,
  has_green_check BOOLEAN DEFAULT 0,
  has_blue_info BOOLEAN DEFAULT 0,
  has_purple_question BOOLEAN DEFAULT 0,

  category TEXT,
  mailing_list TEXT,

  created_at DATETIME DEFAULT CURRENT_TIMESTAMP
);
```

### message_addresses
Stores email addresses associated with messages (from, to, cc, bcc, delivered_to).

```sql
CREATE TABLE message_addresses (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  message_id TEXT NOT NULL,
  address_type TEXT NOT NULL,
  email_address TEXT NOT NULL,
  display_name TEXT,

  FOREIGN KEY (message_id) REFERENCES messages(id) ON DELETE CASCADE
);
```

**Note:** The `from:` and `to:` operators search across `from`, `cc`, and `bcc` address types per Gmail specification.

### labels
Label definitions with external string IDs.

```sql
CREATE TABLE labels (
  id TEXT PRIMARY KEY,
  name TEXT NOT NULL UNIQUE,
  is_system_label BOOLEAN DEFAULT 0,
  created_at DATETIME DEFAULT CURRENT_TIMESTAMP
);
```

### message_labels
Many-to-many relationship between messages and labels.

```sql
CREATE TABLE message_labels (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  message_id TEXT NOT NULL,
  label_id TEXT NOT NULL,

  FOREIGN KEY (message_id) REFERENCES messages(id) ON DELETE CASCADE,
  FOREIGN KEY (label_id) REFERENCES labels(id) ON DELETE CASCADE,
  UNIQUE(message_id, label_id)
);
```

### attachments
File attachments associated with messages.

```sql
CREATE TABLE attachments (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  message_id TEXT NOT NULL,
  filename TEXT NOT NULL,
  content_type TEXT,
  size_bytes INTEGER,

  FOREIGN KEY (message_id) REFERENCES messages(id) ON DELETE CASCADE
);
```

## String Matching Requirements

### Prefix/Suffix Matching
Required for:
- **Email addresses** (from:, to:, cc:, bcc:, deliveredto:)
  - `from:marc@` → prefix match → `WHERE email_address LIKE 'marc@%'`
  - `from:@example.com` → suffix match → `WHERE email_address LIKE '%@example.com'`
  - `from:marc@example.com` → exact match → `WHERE email_address = 'marc@example.com'`

- **Mailing lists** (list:)
  - Same pattern as email addresses

- **Filenames** (filename:)
  - `filename:pdf` → extension match → `WHERE filename LIKE '%.pdf'`
  - `filename:homework` → prefix match → `WHERE filename LIKE 'homework%'`
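
The address rules above amount to a three-way choice based on where the `@` sits. A minimal sketch of that mapping follows; `address_condition` is a hypothetical helper rather than the visitor's actual method, and it does not escape `%` or `_` characters inside the value.

```ruby
# Hypothetical helper: maps an address-style operator value to a SQL fragment
# and its bound parameter, following the prefix/suffix/exact rules listed above.
def address_condition(value, column: "email_address")
  if value.end_with?("@")
    ["#{column} LIKE ?", "#{value}%"]  # "marc@" -> prefix match
  elsif value.start_with?("@")
    ["#{column} LIKE ?", "%#{value}"]  # "@example.com" -> suffix match
  else
    ["#{column} = ?", value]           # full address -> exact match
  end
end

p address_condition("marc@")
p address_condition("@example.com")
p address_condition("marc@example.com")
```

Returning the fragment and the parameter separately keeps the user-supplied value out of the SQL string, in line with the parameterized-query approach the document describes.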

### Exact Match Only
- RFC822 message IDs
- Boolean/enum fields (is:, has:, in:, category:, label:)

## SQL Visitor Usage

The library provides two SQL visitor implementations for different database backends:

### SQLiteVisitor

Converts parsed AST into SQLite-compatible SQL queries:

```ruby
ast = GmailSearchSyntax.parse!("from:amy@example.com newer_than:7d")
visitor = GmailSearchSyntax::SQLiteVisitor.new(current_user_email: "me@example.com")
visitor.visit(ast)

sql, params = visitor.to_query.to_sql
# sql: "SELECT DISTINCT m.id FROM messages m ... WHERE ... m.internal_date > datetime('now', ?)"
# params: ["from", "cc", "bcc", "amy@example.com", "-7 days"]
```

### PostgresVisitor

Converts parsed AST into PostgreSQL-compatible SQL queries:

```ruby
ast = GmailSearchSyntax.parse!("from:amy@example.com newer_than:7d")
visitor = GmailSearchSyntax::PostgresVisitor.new(current_user_email: "me@example.com")
visitor.visit(ast)

sql, params = visitor.to_query.to_sql
# sql: "SELECT DISTINCT m.id FROM messages m ... WHERE ... m.internal_date > (NOW() - ?::interval)"
# params: ["from", "cc", "bcc", "amy@example.com", "7 days"]
```

**Note**: `SqlVisitor` is an alias for `SQLiteVisitor` for backward compatibility.

### Database-Specific Differences

The main difference between the visitors is in relative date handling:

| Feature | SQLite | PostgreSQL |
|---------|--------|------------|
| `older_than:7d` | `datetime('now', '-7 days')` | `NOW() - '7 days'::interval` |
| `newer_than:3m` | `datetime('now', '-3 months')` | `NOW() - '3 months'::interval` |
| Parameter format | `"-7 days"` (negative) | `"7 days"` (positive with cast) |

All other query generation is identical between the two visitors.
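
The parameter formats in the table can be reproduced with a small conversion helper. This is a sketch, not the visitors' code: `relative_time_param` and `RELATIVE_UNITS` are hypothetical names, and mapping `y` to `years` is an assumption extrapolated from the `d` and `m` rows.

```ruby
# Unit letters and the interval words they expand to ("y" is an assumption).
RELATIVE_UNITS = { "d" => "days", "m" => "months", "y" => "years" }.freeze

# Hypothetical helper: turns "7d" into the bound parameter each backend expects.
def relative_time_param(value, backend)
  match = /\A(\d+)([dmy])\z/.match(value)
  raise ArgumentError, "unsupported relative time: #{value.inspect}" unless match

  amount = "#{match[1]} #{RELATIVE_UNITS.fetch(match[2])}"
  case backend
  when :sqlite   then "-#{amount}"  # bound into datetime('now', ?)
  when :postgres then amount        # bound into NOW() - ?::interval
  else raise ArgumentError, "unknown backend: #{backend.inspect}"
  end
end

p relative_time_param("7d", :sqlite)
p relative_time_param("7d", :postgres)
```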

### Features

- **Parameterized queries**: All user input is bound via `?` placeholders
- **Automatic table joins**: Joins required tables based on operators
- **Nested conditions**: Properly handles AND/OR/NOT with parentheses
- **Special operators**:
  - `from:me` / `to:me` → uses `current_user_email`
  - `in:anywhere` → no location filter
  - `AROUND` → generates an always-false `(1 = 0)` placeholder condition (proximity search is not supported)
- **Date handling**:
  - Converts dates from `YYYY/MM/DD` to `YYYY-MM-DD`
  - Parses relative times (`1y`, `2d`, `3m`) to database-specific datetime functions
- **Size parsing**: Converts `10M`, `1G` to bytes
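
The size and date conversions listed above are simple enough to sketch directly. The helpers below are hypothetical (`parse_size`, `normalize_date`, and the `K` suffix are assumptions); the binary multipliers follow from the README example, where `larger:5M` is bound as `5242880`.

```ruby
# Binary multipliers; "K" is an assumption, "M" and "G" appear in the text.
SIZE_MULTIPLIERS = { "K" => 1024, "M" => 1024**2, "G" => 1024**3 }.freeze

# Hypothetical helper: "5M" -> 5242880; plain digits are taken as bytes.
def parse_size(value)
  match = /\A(\d+)([KMG])?\z/i.match(value)
  raise ArgumentError, "unsupported size: #{value.inspect}" unless match

  match[1].to_i * SIZE_MULTIPLIERS.fetch(match[2].to_s.upcase, 1)
end

# Hypothetical helper: "2024/01/01" -> "2024-01-01".
def normalize_date(value)
  value.tr("/", "-")
end

p parse_size("5M")
p normalize_date("2024/01/01")
```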

### Query Object

The `Query` class accumulates SQL components:

```ruby
query = visitor.to_query

query.conditions # Array of WHERE conditions
query.joins # Hash of JOIN clauses
query.params # Array of bound parameters

sql, params = query.to_sql
# Returns: [sql_string, parameters_array]
```

## Fuzzy Matching Limitations

The current implementation does **not** support:
- **AROUND operator** (proximity search) - generates an always-false `(1 = 0)` placeholder condition, so such terms match nothing
- Full-text search with word distance calculations
- Stemming or phonetic matching
- Levenshtein distance / typo tolerance

These features require additional implementation, potentially using SQLite FTS5 extensions.