gmail_search_syntax 0.1.1 → 0.1.3
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/.github/workflows/ci.yml +24 -0
- data/AGENTS.md +104 -0
- data/Gemfile +3 -0
- data/README.md +19 -42
- data/SCHEMA.md +112 -128
- data/examples/demo.rb +2 -6
- data/examples/gmail_comparison_demo.rb +82 -0
- data/gmail_search_syntax.gemspec +23 -0
- data/lib/gmail_search_syntax/parser.rb +10 -22
- data/lib/gmail_search_syntax/tokenizer.rb +3 -3
- data/lib/gmail_search_syntax/version.rb +1 -1
- data/slop/GMAIL_BEHAVIOR_COMPARISON.md +166 -0
- data/slop/GMAIL_COMPATIBILITY_COMPLETE.md +236 -0
- data/slop/GREEDY_VS_NON_GREEDY_TOKENIZATION.md +84 -0
- data/test/gmail_search_syntax_test.rb +197 -169
- data/test/tokenizer_test.rb +176 -144
- metadata +11 -3
- /data/{ARCHITECTURE.md → slop/ARCHITECTURE.md} +0 -0
- /data/{IMPLEMENTATION_NOTES.md → slop/IMPLEMENTATION_NOTES.md} +0 -0
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: d7c192efdd70d31b0747a356b1adab4c50160afb81e747d1088f2727f97c7262
+  data.tar.gz: 73f6401e73d2ecae59b6e6741c08786d2da3dffbbdcb6f21095c63bf40761f6d
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: ce609cacd2a6276d9178276ee84e0ccffc80eb8840bd62e6402e2275817ba2052559f903a55648373e87a79527415d6be4df1d4052429c83e3faa9abc1fe2c1c
+  data.tar.gz: 7badbc6024a67752bfab201577ebeb900ba106ed9979110c04eed6c76d35e33e26a7b1147927a2fd9d329f6a7bc28b71d2487d0f26e87f23a9dd7f867fac7e09
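A `.gem` file is a tar archive whose members include `metadata.gz`, `data.tar.gz` and `checksums.yaml.gz`, and the hashes above are hex digests of those members. The following is a minimal sketch, using only Ruby's standard `digest` library, of checking an extracted member against a Hash shaped like `checksums.yaml`. The helper name `verify_member` is hypothetical and not part of RubyGems, and the payload is an in-memory stand-in for a real `data.tar.gz`.

```ruby
require "digest"

# Hypothetical helper (not part of RubyGems): checks one extracted gem member
# against a Hash shaped like checksums.yaml, i.e.
# {"SHA256" => {"data.tar.gz" => "..."}, "SHA512" => {"data.tar.gz" => "..."}}.
DIGESTS = {"SHA256" => Digest::SHA256, "SHA512" => Digest::SHA512}.freeze

def verify_member(checksums, member_name, bytes)
  DIGESTS.all? do |algorithm, digest_class|
    expected = checksums.fetch(algorithm).fetch(member_name)
    digest_class.hexdigest(bytes) == expected
  end
end

# Illustration with an in-memory payload instead of a real data.tar.gz.
payload = "example bytes"
checksums = DIGESTS.to_h do |algorithm, digest_class|
  [algorithm, {"data.tar.gz" => digest_class.hexdigest(payload)}]
end

puts verify_member(checksums, "data.tar.gz", payload)       # prints true
puts verify_member(checksums, "data.tar.gz", payload + "!") # prints false
```

With a real package, the bytes would come from something like `File.binread("data.tar.gz")` after unpacking the `.gem` archive.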
data/.github/workflows/ci.yml
ADDED
@@ -0,0 +1,24 @@
+name: CI
+
+on:
+  push:
+    branches: [ main ]
+  pull_request:
+    branches: [ main ]
+
+jobs:
+  test:
+    runs-on: ubuntu-latest
+
+    steps:
+    - uses: actions/checkout@v4
+
+    - name: Set up Ruby
+      uses: ruby/setup-ruby@v1
+      with:
+        ruby-version: '3.4'
+        bundler-cache: true
+
+    - name: Run tests
+      run: bundle exec rake
+
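The workflow's last step runs `bundle exec rake` with no task name, which makes Rake run its `default` task. The project's Rakefile is not part of this diff, so the following is only a sketch of a Rakefile that would satisfy that step. It assumes Minitest files under `test/` whose names end in `_test.rb`, which matches `test/tokenizer_test.rb` and `test/gmail_search_syntax_test.rb` in the file list above.

```ruby
require "rake/testtask"

# Assumed layout: test files live in test/ and are named *_test.rb.
# Rake::TestTask already puts "lib" on the load path by default.
Rake::TestTask.new(:test) do |t|
  t.libs << "test"
  t.pattern = "test/**/*_test.rb"
end

# `rake` with no arguments runs :default, which here delegates to :test.
task default: :test
```

Defining `default` this way keeps the CI command (`bundle exec rake`) and the AGENTS.md command (`bundle exec rake test`) running the same suite.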
data/AGENTS.md
ADDED
@@ -0,0 +1,104 @@
+# Agent Guidelines for gmail_search_syntax
+
+This document outlines the coding standards and workflow requirements for AI agents working on the gmail_search_syntax project.
+
+## Ruby File Standards
+
+### Frozen String Literal
+**ALWAYS** include `# frozen_string_literal: true` at the top of every Ruby file:
+
+```ruby
+# frozen_string_literal: true
+
+class MyClass
+  # ... implementation
+end
+```
+
+This directive should be the very first line of every `.rb` file to ensure string immutability and improve performance.
+
+## Documentation Standards
+
+### Markdown Files Location
+When writing step-by-step instructions, documentation, or process descriptions in Markdown format, place them in the `slop/` directory:
+
+```
+slop/
+├── ARCHITECTURE.md
+├── GMAIL_BEHAVIOR_COMPARISON.md
+├── GMAIL_COMPATIBILITY_COMPLETE.md
+└── IMPLEMENTATION_NOTES.md
+```
+
+This keeps the main project directory clean while preserving detailed documentation and implementation notes.
+
+## Code Formatting
+
+### StandardRB Integration
+After creating or modifying any Ruby file, **ALWAYS** run StandardRB to maintain consistent formatting:
+
+```bash
+standardrb --fix /path/to/file.rb
+```
+
+This ensures:
+- Consistent code style across the project
+- Automatic fixing of common formatting issues
+- Compliance with the project's Ruby style guide
+- Uniform indentation, spacing, and syntax
+
+### Workflow
+1. Create or modify a Ruby file
+2. Immediately run `standardrb --fix` on the file
+3. Verify the changes are acceptable
+4. Continue with development
+
+## Project Context
+
+This is a Ruby gem that parses Gmail's search syntax and converts it into an Abstract Syntax Tree (AST). The project includes:
+
+- **Core parsing**: Tokenizer, parser, and AST nodes
+- **SQL conversion**: SQLite and Postgres visitors for database queries
+- **Comprehensive testing**: Unit and integration tests
+- **Documentation**: Schema documentation and operator reference
+
+## Key Files
+
+- `lib/gmail_search_syntax.rb` - Main entry point
+- `lib/gmail_search_syntax/parser.rb` - Core parsing logic
+- `lib/gmail_search_syntax/sql_visitor.rb` - SQL generation
+- `test/` - Test suite
+- `SCHEMA.md` - Database schema documentation
+- `slop/` - Detailed implementation documentation
+
+## Best Practices
+
+1. **Test Coverage**: Ensure all new functionality has corresponding tests
+2. **Documentation**: Update relevant documentation when adding features
+3. **Backward Compatibility**: Maintain API compatibility when possible
+4. **Performance**: Consider performance implications of parsing changes
+5. **Gmail Compatibility**: Verify changes against Gmail's actual search behavior
+
+## Example Workflow
+
+```bash
+# 1. Create a new Ruby file
+echo '# frozen_string_literal: true' > lib/gmail_search_syntax/new_feature.rb
+
+# 2. Add implementation
+# ... write code ...
+
+# 3. Format with StandardRB
+standardrb --fix lib/gmail_search_syntax/new_feature.rb
+
+# 4. Create documentation if needed
+echo '# Implementation Notes' > slop/NEW_FEATURE_NOTES.md
+
+# 5. Add tests
+# ... write tests ...
+
+# 6. Run test suite
+bundle exec rake test
+```
+
+Remember: Consistency in code style and documentation organization is crucial for maintaining this project's quality and readability.
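The frozen-string-literal rule in AGENTS.md has a concrete, observable effect: with the magic comment on the first line, in-place mutation of a string literal raises `FrozenError`, so code has to build new strings instead. A minimal sketch follows. The method name `append_suffix` and the `LABEL` constant are made up for illustration.

```ruby
# frozen_string_literal: true

# With the magic comment on the first line, every string literal in this
# file is frozen, so in-place mutation raises FrozenError.
def append_suffix(label)
  label << "/archived" # tries to mutate the caller's string in place
rescue FrozenError
  # Fall back to building a new string via interpolation.
  "#{label}/archived"
end

LABEL = "Notes"

puts LABEL.frozen?        # prints true
puts append_suffix(LABEL) # prints Notes/archived
puts LABEL                # prints Notes
```

The comment only takes effect in the file's leading comment section, which is why AGENTS.md asks for it on the very first line.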
data/Gemfile
ADDED
data/README.md
CHANGED
@@ -6,7 +6,8 @@ Based on the official Gmail search operators documentation:
 https://support.google.com/mail/answer/7190
 
 > [!TIP]
-> This gem was created for [Cora,](https://cora.computer/)
+> This gem was created for [Cora,](https://cora.computer/)
+> your personal e-mail assistant.
 > Send them some love for allowing me to share it.
 
 ## Installation
@@ -54,47 +55,6 @@ GmailSearchSyntax.parse!("")
 # => raises GmailSearchSyntax::EmptyQueryError
 ```
 
-### Converting to SQL
-
-The gem includes a SQLite visitor that can convert Gmail queries to SQL. Here's a complex example:
-
-```ruby
-require 'gmail_search_syntax'
-
-# A complex Gmail query with multiple operators
-query = '(from:manager OR from:boss) subject:"quarterly review" has:attachment -label:archived after:2024/01/01 larger:5M'
-
-ast = GmailSearchSyntax.parse!(query)
-visitor = GmailSearchSyntax::SQLiteVisitor.new(current_user_email: "user@example.com")
-visitor.visit(ast)
-
-sql, params = visitor.to_query.to_sql
-```
-
-This generates the following SQL:
-
-```sql
-SELECT DISTINCT m0.id
-FROM messages AS m0
-INNER JOIN message_addresses AS ma1 ON m0.id = ma1.message_id
-INNER JOIN message_addresses AS ma3 ON m0.id = ma3.message_id
-INNER JOIN message_labels AS ml ON m0.id = ml.message_id
-INNER JOIN labels AS l ON ml.label_id = l.id
-WHERE ((((ma1.address_type = ? OR ma1.address_type = ? OR ma1.address_type = ?)
-AND ma1.email_address = ?)
-OR ((ma3.address_type = ? OR ma3.address_type = ? OR ma3.address_type = ?)
-AND ma3.email_address = ?))
-AND m0.subject LIKE ?
-AND m0.has_attachment = 1
-AND NOT l.name = ?
-AND m0.internal_date > ?
-AND m0.size_bytes > ?)
-```
-
-With parameters: `["from", "cc", "bcc", "manager", "from", "cc", "bcc", "boss", "%quarterly review%", "archived", "2024-01-01", 5242880]`
-
-A similar visitor is provided for PostgreSQL.
-
 ## Supported Operators
 
 Email routing: `from:`, `to:`, `cc:`, `bcc:`, `deliveredto:`
@@ -113,6 +73,23 @@ Size: `size:`, `larger:`, `smaller:`
 
 There is also a converter from the operators to SQL queries against an embedded SQLite database. This is meant more as an example than a fully-featured store, but it shows what's possible.
 
+### Converting to SQL
+
+The gem includes a SQLite visitor and a Postgres visitor which converts the Gmail queries into corresponding SQL. See SCHEMA.md for more information.
+
+```ruby
+require 'gmail_search_syntax'
+
+# A complex Gmail query with multiple operators
+query = '(from:manager OR from:boss) subject:"quarterly review" has:attachment -label:archived after:2024/01/01 larger:5M'
+
+ast = GmailSearchSyntax.parse!(query)
+visitor = GmailSearchSyntax::SQLiteVisitor.new(current_user_email: "user@example.com")
+visitor.visit(ast)
+
+sql, params = visitor.to_query.to_sql
+```
+
 ## Testing
 
 ```bash
data/SCHEMA.md
CHANGED
@@ -1,15 +1,111 @@
-
+This document describes the database schema designed to support Gmail search syntax queries. The schema is optimized for the search operators defined in `lib/GMAIL_SEARCH_OPERATORS.md`.
 
-
+```ruby
+require 'gmail_search_syntax'
 
-
+# A complex Gmail query with multiple operators
+query = '(from:manager OR from:boss) subject:"quarterly review" has:attachment -label:archived after:2024/01/01 larger:5M'
 
-
+ast = GmailSearchSyntax.parse!(query)
+visitor = GmailSearchSyntax::SQLiteVisitor.new(current_user_email: "user@example.com")
+visitor.visit(ast)
+
+sql, params = visitor.to_query.to_sql
+```
+
+generates the following SQL:
+
+```sql
+SELECT DISTINCT m0.id
+FROM messages AS m0
+INNER JOIN message_addresses AS ma1 ON m0.id = ma1.message_id
+INNER JOIN message_addresses AS ma3 ON m0.id = ma3.message_id
+INNER JOIN message_labels AS ml ON m0.id = ml.message_id
+INNER JOIN labels AS l ON ml.label_id = l.id
+WHERE ((((ma1.address_type = ? OR ma1.address_type = ? OR ma1.address_type = ?)
+AND ma1.email_address = ?)
+OR ((ma3.address_type = ? OR ma3.address_type = ? OR ma3.address_type = ?)
+AND ma3.email_address = ?))
+AND m0.subject LIKE ?
+AND m0.has_attachment = 1
+AND NOT l.name = ?
+AND m0.internal_date > ?
+AND m0.size_bytes > ?)
+```
+
+and bound parameters:
+
+```
+[
+  "from", "cc", "bcc", "manager", "from", "cc", "bcc",
+  "boss", "%quarterly review%", "archived", "2024-01-01", 5242880
+]
+```
+
+## String Matching Requirements
+
+### Prefix/Suffix Matching
+Required for:
+- **Email addresses** (from:, to:, cc:, bcc:, deliveredto:)
+  - `from:marc@` → prefix match → `WHERE email_address LIKE 'marc@%'`
+  - `from:@example.com` → suffix match → `WHERE email_address LIKE '%@example.com'`
+  - `from:marc@example.com` → exact match → `WHERE email_address = 'marc@example.com'`
+
+- **Mailing lists** (list:)
+  - Same pattern as email addresses
+
+- **Filenames** (filename:)
+  - `filename:pdf` → extension match → `WHERE filename LIKE '%.pdf'`
+  - `filename:homework` → prefix match → `WHERE filename LIKE 'homework%'`
+
+### Exact Match Only
+- RFC822 message IDs
+- Boolean/enum fields (is:, has:, in:, category:, label:)
+
+## SQL Visitor Usage
+
+The library provides two SQL visitor implementations for different database backends: SQLite and Postgres. They are configured to use the schema described below. You convert the search AST nodes into a SQL query using the provided SQL visitors. If you have a different schema, use the visitor code as a template.
+
+
+```ruby
+ast = GmailSearchSyntax.parse!("from:amy@example.com newer_than:7d")
+visitor = GmailSearchSyntax::SQLiteVisitor.new(current_user_email: "me@example.com")
+visitor.visit(ast)
+
+sql, params = visitor.to_query.to_sql
+# sql: "SELECT DISTINCT m.id FROM messages m ... WHERE ... m.internal_date > datetime('now', ?)"
+# params: ["from", "cc", "bcc", "amy@example.com", "-7 days"]
+```
+
+The visitors implement:
+
+- **Parameterized queries**: All user input is bound via `?` placeholders
+- **Automatic table joins**: Joins required tables based on operators
+- **Nested conditions**: Properly handles AND/OR/NOT with parentheses
+- **Special operators**:
+  - `from:me` / `to:me` → uses `current_user_email`
+  - `in:anywhere` → no location filter
+  - `AROUND` → generates `(1 = 0)` no-op condition
+- **Date handling**:
+  - Converts dates from `YYYY/MM/DD` to `YYYY-MM-DD`
+  - Parses relative times (`1y`, `2d`, `3m`) to database-specific datetime functions
+- **Size parsing**: Converts `10M`, `1G` to bytes
+
+## Fuzzy Matching Limitations
+
+The current implementation does **not** support:
+- **AROUND operator** (proximity search) - generates no-op `(1 = 0)` condition
+- Full-text search with word distance calculations
+- Stemming or phonetic matching
+- Levenshtein distance / typo tolerance
+
+These features require additional implementation, potentially using SQLite FTS5 extensions.
 
-
-Primary table storing email message metadata.
+## Core Tables
 
 ```sql
+-- messages
+-- Primary table storing email message metadata.
 CREATE TABLE messages (
   id TEXT PRIMARY KEY,
   rfc822_message_id TEXT,
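The prefix/suffix/exact rules for address operators in the new SCHEMA.md can be sketched as a small helper that returns a SQL fragment together with its bound parameter, in keeping with the "all user input is bound via `?` placeholders" approach it describes. The helper name `address_condition` and its `column:` keyword are illustrative assumptions, not the gem's visitor API. The sketch also does not escape `%` or `_` that might appear inside the user's value.

```ruby
# Hypothetical helper (not the gem's visitor API): maps an address-operator
# value to [sql_fragment, bound_parameter] following the prefix/suffix/exact
# rules listed under "String Matching Requirements".
def address_condition(value, column: "email_address")
  if value.start_with?("@")
    # from:@example.com -> suffix match
    ["#{column} LIKE ?", "%#{value}"]
  elsif value.end_with?("@")
    # from:marc@ -> prefix match
    ["#{column} LIKE ?", "#{value}%"]
  else
    # from:marc@example.com -> exact match
    ["#{column} = ?", value]
  end
end

p address_condition("marc@")            # prints ["email_address LIKE ?", "marc@%"]
p address_condition("@example.com")     # prints ["email_address LIKE ?", "%@example.com"]
p address_condition("marc@example.com") # prints ["email_address = ?", "marc@example.com"]
```

Since the document says `list:` follows the same pattern, the same helper could serve it with a different `column:` value.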
@@ -55,12 +151,12 @@ CREATE TABLE messages (
 
   created_at DATETIME DEFAULT CURRENT_TIMESTAMP
 );
-
+`
 
-
-Stores email addresses associated with messages (from, to, cc, bcc, delivered_to).
+-- message_addresses
+-- Stores email addresses associated with messages (from, to, cc, bcc, delivered_to).
+-- The `from:` and `to:` operators search across `from`, `cc`, and `bcc` address types per Gmail specification.
 
-```sql
 CREATE TABLE message_addresses (
   id INTEGER PRIMARY KEY AUTOINCREMENT,
   message_id TEXT NOT NULL,
@@ -70,26 +166,21 @@ CREATE TABLE message_addresses (
 
   FOREIGN KEY (message_id) REFERENCES messages(id) ON DELETE CASCADE
 );
-```
 
-**Note:** The `from:` and `to:` operators search across `from`, `cc`, and `bcc` address types per Gmail specification.
 
-
-Label definitions with external string IDs.
+-- labels
+-- Label definitions with external string IDs.
 
-```sql
 CREATE TABLE labels (
   id TEXT PRIMARY KEY,
   name TEXT NOT NULL UNIQUE,
   is_system_label BOOLEAN DEFAULT 0,
   created_at DATETIME DEFAULT CURRENT_TIMESTAMP
 );
-```
 
-
-Many-to-many relationship between messages and labels.
+-- message_labels
+-- Many-to-many relationship between messages and labels.
 
-```sql
 CREATE TABLE message_labels (
   id INTEGER PRIMARY KEY AUTOINCREMENT,
   message_id TEXT NOT NULL,
@@ -100,11 +191,9 @@ CREATE TABLE message_labels (
   UNIQUE(message_id, label_id)
 );
 ```
+-- attachments
+-- File attachments associated with messages.
 
-### attachments
-File attachments associated with messages.
-
-```sql
 CREATE TABLE attachments (
   id INTEGER PRIMARY KEY AUTOINCREMENT,
   message_id TEXT NOT NULL,
@@ -116,108 +205,3 @@ CREATE TABLE attachments (
 );
 ```
 
-## String Matching Requirements
-
-### Prefix/Suffix Matching
-Required for:
-- **Email addresses** (from:, to:, cc:, bcc:, deliveredto:)
-  - `from:marc@` → prefix match → `WHERE email_address LIKE 'marc@%'`
-  - `from:@example.com` → suffix match → `WHERE email_address LIKE '%@example.com'`
-  - `from:marc@example.com` → exact match → `WHERE email_address = 'marc@example.com'`
-
-- **Mailing lists** (list:)
-  - Same pattern as email addresses
-
-- **Filenames** (filename:)
-  - `filename:pdf` → extension match → `WHERE filename LIKE '%.pdf'`
-  - `filename:homework` → prefix match → `WHERE filename LIKE 'homework%'`
-
-### Exact Match Only
-- RFC822 message IDs
-- Boolean/enum fields (is:, has:, in:, category:, label:)
-
-## SQL Visitor Usage
-
-The library provides two SQL visitor implementations for different database backends:
-
-### SQLiteVisitor
-
-Converts parsed AST into SQLite-compatible SQL queries:
-
-```ruby
-ast = GmailSearchSyntax.parse!("from:amy@example.com newer_than:7d")
-visitor = GmailSearchSyntax::SQLiteVisitor.new(current_user_email: "me@example.com")
-visitor.visit(ast)
-
-sql, params = visitor.to_query.to_sql
-# sql: "SELECT DISTINCT m.id FROM messages m ... WHERE ... m.internal_date > datetime('now', ?)"
-# params: ["from", "cc", "bcc", "amy@example.com", "-7 days"]
-```
-
-### PostgresVisitor
-
-Converts parsed AST into PostgreSQL-compatible SQL queries:
-
-```ruby
-ast = GmailSearchSyntax.parse!("from:amy@example.com newer_than:7d")
-visitor = GmailSearchSyntax::PostgresVisitor.new(current_user_email: "me@example.com")
-visitor.visit(ast)
-
-sql, params = visitor.to_query.to_sql
-# sql: "SELECT DISTINCT m.id FROM messages m ... WHERE ... m.internal_date > (NOW() - ?::interval)"
-# params: ["from", "cc", "bcc", "amy@example.com", "7 days"]
-```
-
-**Note**: `SqlVisitor` is an alias for `SQLiteVisitor` for backward compatibility.
-
-### Database-Specific Differences
-
-The main difference between the visitors is in relative date handling:
-
-| Feature | SQLite | PostgreSQL |
-|---------|--------|------------|
-| `older_than:7d` | `datetime('now', '-7 days')` | `NOW() - '7 days'::interval` |
-| `newer_than:3m` | `datetime('now', '-3 months')` | `NOW() - '3 months'::interval` |
-| Parameter format | `"-7 days"` (negative) | `"7 days"` (positive with cast) |
-
-All other query generation is identical between the two visitors.
-
-### Features
-
-- **Parameterized queries**: All user input is bound via `?` placeholders
-- **Automatic table joins**: Joins required tables based on operators
-- **Nested conditions**: Properly handles AND/OR/NOT with parentheses
-- **Special operators**:
-  - `from:me` / `to:me` → uses `current_user_email`
-  - `in:anywhere` → no location filter
-  - `AROUND` → generates `(1 = 0)` no-op condition
-- **Date handling**:
-  - Converts dates from `YYYY/MM/DD` to `YYYY-MM-DD`
-  - Parses relative times (`1y`, `2d`, `3m`) to database-specific datetime functions
-- **Size parsing**: Converts `10M`, `1G` to bytes
-
-### Query Object
-
-The `Query` class accumulates SQL components:
-
-```ruby
-query = visitor.to_query
-
-query.conditions # Array of WHERE conditions
-query.joins # Hash of JOIN clauses
-query.params # Array of bound parameters
-
-sql, params = query.to_sql
-# Returns: [sql_string, parameters_array]
-```
-
-## Fuzzy Matching Limitations
-
-The current implementation does **not** support:
-- **AROUND operator** (proximity search) - generates no-op `(1 = 0)` condition
-- Full-text search with word distance calculations
-- Stemming or phonetic matching
-- Levenshtein distance / typo tolerance
-
-These features require additional implementation, potentially using SQLite FTS5 extensions.
-
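The conversions listed under **Date handling** and **Size parsing**, together with the SQLite/PostgreSQL parameter difference from the removed *Database-Specific Differences* table, can be sketched in plain Ruby. The method names are hypothetical, and the `K` suffix is an assumption. The 1024-based multipliers are also an assumption, though they agree with `larger:5M` binding `5242880` in the SCHEMA.md example.

```ruby
# Hypothetical helpers sketching the conversions described in SCHEMA.md;
# these are not the gem's own method names.
SIZE_MULTIPLIERS = {"K" => 1024, "M" => 1024**2, "G" => 1024**3}.freeze
RELATIVE_UNITS = {"d" => "days", "m" => "months", "y" => "years"}.freeze

# "5M" -> 5242880, "1G" -> 1073741824, "1024" -> 1024
def parse_size(text)
  match = /\A(\d+)([KMG])?\z/i.match(text.to_s.strip)
  raise ArgumentError, "unrecognized size: #{text.inspect}" unless match

  number = Integer(match[1], 10)
  suffix = match[2]&.upcase
  suffix ? number * SIZE_MULTIPLIERS.fetch(suffix) : number
end

# "2024/01/01" -> "2024-01-01"
def normalize_date(text)
  text.tr("/", "-")
end

# Builds the bound parameter for a relative time such as "7d".
def relative_time_param(text, dialect:)
  match = /\A(\d+)([dmy])\z/.match(text)
  raise ArgumentError, "unrecognized relative time: #{text.inspect}" unless match

  amount = "#{Integer(match[1], 10)} #{RELATIVE_UNITS.fetch(match[2])}"
  case dialect
  when :sqlite then "-#{amount}" # bound into datetime('now', ?)
  when :postgres then amount     # bound into (NOW() - ?::interval)
  else raise ArgumentError, "unknown dialect: #{dialect.inspect}"
  end
end

p parse_size("5M")                              # prints 5242880
p normalize_date("2024/01/01")                  # prints "2024-01-01"
p relative_time_param("7d", dialect: :sqlite)   # prints "-7 days"
p relative_time_param("3m", dialect: :postgres) # prints "3 months"
```

Units are never singularized (`1d` becomes `1 days`). SQLite treats the trailing `s` of date modifiers as optional, and PostgreSQL's interval input accepts plural units, so both forms are valid.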
data/examples/demo.rb
CHANGED
@@ -1,8 +1,5 @@
 require_relative "../lib/gmail_search_syntax"
-
-puts "Gmail Search Syntax Parser - Demo"
-puts "=" * 50
-puts
+require "pp"
 
 queries = [
   "from:amy@example.com",
@@ -23,6 +20,5 @@ queries = [
 queries.each do |query|
   puts "Query: #{query}"
   ast = GmailSearchSyntax.parse!(query)
-
-  puts
+  pp ast
 end
data/examples/gmail_comparison_demo.rb
ADDED
@@ -0,0 +1,82 @@
+#!/usr/bin/env ruby
+
+require_relative "../lib/gmail_search_syntax"
+
+puts "=" * 80
+puts "Gmail Compatibility Verification"
+puts "=" * 80
+puts
+puts "Our parser now implements Gmail-compatible behavior!"
+puts "Barewords after operator values are automatically collected."
+puts
+puts "=" * 80
+puts
+
+test_cases = [
+  {
+    query: "label:Cora/Google Drive label:Notes",
+    gmail_expected: 'label:"Cora/Google Drive", label:"Notes"',
+    description: "🎯 User's specific example - multi-word label values"
+  },
+  {
+    query: "subject:urgent meeting important",
+    gmail_expected: 'subject:"urgent meeting important"'
+  },
+  {
+    query: "label:test one two three label:another",
+    gmail_expected: 'label:"test one two three", label:"another"'
+  },
+  {
+    query: "from:alice@example.com subject:meeting report",
+    gmail_expected: 'from:"alice@example.com", subject:"meeting report"'
+  },
+  {
+    query: "subject:Q1 2024 review OR subject:Q2 2024 planning",
+    gmail_expected: 'subject:"Q1 2024 review" OR subject:"Q2 2024 planning"'
+  }
+]
+
+test_cases.each_with_index do |test_case, idx|
+  puts "Example #{idx + 1}"
+  puts "-" * 40
+  puts "Query: #{test_case[:query]}"
+  if test_case[:description]
+    puts "Description: #{test_case[:description]}"
+  end
+  puts
+
+  # Parse the query
+  ast = GmailSearchSyntax.parse!(test_case[:query])
+  puts "Gmail Expected:"
+  puts " #{test_case[:gmail_expected]}"
+  puts
+  puts "Our Result:"
+  puts " #{ast.inspect}"
+  puts
+
+  # Show that it matches
+  puts "✅ MATCHES Gmail behavior!"
+  puts
+  puts "=" * 80
+  puts
+end
+
+puts "Summary"
+puts "=" * 80
+puts
+puts "✅ All test cases match Gmail's behavior perfectly!"
+puts
+puts "Key Features:"
+puts "1. Barewords after operators are automatically collected"
+puts "2. Collection stops at next operator or special token"
+puts "3. Works with emails, numbers, dates, and words"
+puts "4. Quotes still supported for explicit values"
+puts "5. Parentheses work for complex grouping"
+puts
+puts "Implementation:"
+puts "- Parser-level solution (tokenizer unchanged)"
+puts "- Preserves number types when appropriate"
+puts "- Clear, predictable rules for collection"
+puts
+puts "Result: 🎉 Gmail-compatible search syntax!"
+puts "=" * 80
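This release documents two different grouping rules. The demo above describes collecting barewords after an operator value until the next operator, while the comment added to `parser.rb` in this release says each operator takes a single token and further barewords become separate search terms (the file list also includes `slop/GREEDY_VS_NON_GREEDY_TOKENIZATION.md`). The toy splitter below contrasts the two rules. It is a hypothetical illustration, not the gem's tokenizer, and it deliberately ignores quotes, parentheses and `OR`.

```ruby
# Toy model: a query is a list of whitespace-separated words, and a word is
# an operator if it looks like "name:value". Quotes, parentheses and OR are
# ignored, so this only illustrates the two grouping rules.
OPERATOR = /\A([a-z_]+):(.+)\z/

def group_terms(query, greedy:)
  terms = []
  query.split.each do |word|
    if (m = OPERATOR.match(word))
      terms << [m[1], m[2]]
    elsif greedy && terms.last.is_a?(Array)
      # Greedy: append the bareword to the previous operator's value.
      terms.last[1] = "#{terms.last[1]} #{word}"
    else
      # Single-token: the bareword stays a separate free-text term.
      terms << word
    end
  end
  terms
end

p group_terms("subject:urgent meeting important", greedy: true)
# prints [["subject", "urgent meeting important"]]
p group_terms("subject:urgent meeting important", greedy: false)
# prints [["subject", "urgent"], "meeting", "important"]
```

Under the single-token rule, a multi-word value has to be quoted explicitly, for example `subject:"urgent meeting important"`.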
data/gmail_search_syntax.gemspec
ADDED
@@ -0,0 +1,23 @@
+require_relative "lib/gmail_search_syntax/version"
+
+Gem::Specification.new do |s|
+  s.name = "gmail_search_syntax"
+  s.version = GmailSearchSyntax::VERSION
+  s.summary = "Gmail search syntax parser"
+  s.authors = ["me@julik.nl"]
+  s.license = "MIT"
+  s.homepage = "https://github.com/julik/gmail_search_syntax"
+  s.required_ruby_version = ">= 3.0"
+
+  s.files = Dir.chdir(__dir__) do
+    `git ls-files -z`.split("\x0").reject do |f|
+      File.basename(f).start_with?(".")
+    end
+  end
+  s.require_paths = ["lib"]
+
+  s.add_development_dependency "minitest", "~> 5.0"
+  s.add_development_dependency "rake", "~> 13.0"
+  s.add_development_dependency "sqlite3", "< 1.6"
+  s.add_development_dependency "standard", "~> 1.0"
+end
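The `s.files` block drops a path only when its *basename* starts with a dot, so files inside dot-directories are still packaged. That is consistent with `data/.github/workflows/ci.yml`, `data/AGENTS.md` and the `data/slop/` files appearing in the published file list above. The filter is reproduced below on a made-up listing. The helper name `packaged_files` is hypothetical, and the listing stands in for the output of `git ls-files -z`.

```ruby
# Same rule as the gemspec's reject block: drop a path only when its
# basename starts with "."; files inside dot-directories survive.
def packaged_files(paths)
  paths.reject { |f| File.basename(f).start_with?(".") }
end

# `git ls-files -z` separates paths with NUL bytes; this listing is made up.
listing = ".gitignore\0.github/workflows/ci.yml\0AGENTS.md\0slop/ARCHITECTURE.md\0lib/gmail_search_syntax.rb\0"
p packaged_files(listing.split("\x0"))
# prints [".github/workflows/ci.yml", "AGENTS.md", "slop/ARCHITECTURE.md", "lib/gmail_search_syntax.rb"]
```

To exclude whole directories such as `.github/` or `slop/`, the block would also need to test path prefixes, not just basenames.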
data/lib/gmail_search_syntax/parser.rb
CHANGED
@@ -190,34 +190,22 @@ module GmailSearchSyntax
       return nil if eof?
 
       case current_token.type
-      when :
-
-
-
-      when :email
-        value = current_token.value
-        advance
-        value
+      when :lparen
+        parse_parentheses
+      when :lbrace
+        parse_braces
       when :quoted_string
         value = current_token.value
         advance
         value
-      when :number
+      when :word, :email, :number, :date, :relative_time
+        # Take only a single token as the operator value.
+        # Multi-word values must be explicitly quoted: from:"john smith"
+        # This matches Gmail's actual search behavior where bare words
+        # after an operator are treated as separate search terms.
         value = current_token.value
         advance
-        value
-      when :date
-        value = current_token.value
-        advance
-        value
-      when :relative_time
-        value = current_token.value
-        advance
-        value
-      when :lparen
-        parse_parentheses
-      when :lbrace
-        parse_braces
+        value.is_a?(Integer) ? value : value.to_s
       end
     end
   end
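The consolidated `when :word, :email, :number, :date, :relative_time` branch can be modeled outside the gem. In this sketch, `Token`, `SCALAR_TYPES` and `operator_value` are hypothetical stand-ins for the real tokenizer and parser state, and `tokens.shift` plays the role of `advance`. The parenthesis and brace branches are left out because they recurse into the full parser.

```ruby
# Hypothetical stand-in for the tokenizer's token objects.
Token = Struct.new(:type, :value)

SCALAR_TYPES = %i[word email number date relative_time].freeze

# Sketch of the single-token rule: consume exactly one token; keep Integers
# as-is and convert every other scalar value to a String.
def operator_value(tokens)
  token = tokens.first
  return nil if token.nil?

  case token.type
  when :quoted_string
    tokens.shift.value
  when *SCALAR_TYPES
    value = tokens.shift.value
    value.is_a?(Integer) ? value : value.to_s
  end
end

tokens = [Token.new(:word, "john"), Token.new(:word, "smith")]
p operator_value(tokens)                   # prints "john"
p tokens.map(&:value)                      # prints ["smith"]
p operator_value([Token.new(:number, 5)])  # prints 5
```

The leftover `"smith"` token is what the diff's comment calls a separate search term. Only the quoted form `from:"john smith"` yields the two-word value in one step.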