gmail_search_syntax 0.1.2 → 0.1.4
- checksums.yaml +4 -4
- data/.github/workflows/ci.yml +24 -0
- data/AGENTS.md +104 -0
- data/Gemfile +3 -0
- data/README.md +19 -42
- data/SCHEMA.md +112 -128
- data/examples/demo.rb +2 -6
- data/gmail_search_syntax.gemspec +23 -0
- data/lib/gmail_search_syntax/parser.rb +7 -37
- data/lib/gmail_search_syntax/tokenizer.rb +8 -1
- data/lib/gmail_search_syntax/version.rb +1 -1
- data/slop/EMBEDDED_HYPHENS.md +102 -0
- data/slop/GREEDY_VS_NON_GREEDY_TOKENIZATION.md +84 -0
- data/test/gmail_search_syntax_test.rb +149 -50
- data/test/tokenizer_test.rb +58 -0
- metadata +11 -5
- /data/{ARCHITECTURE.md → slop/ARCHITECTURE.md} +0 -0
- /data/{GMAIL_BEHAVIOR_COMPARISON.md → slop/GMAIL_BEHAVIOR_COMPARISON.md} +0 -0
- /data/{GMAIL_COMPATIBILITY_COMPLETE.md → slop/GMAIL_COMPATIBILITY_COMPLETE.md} +0 -0
- /data/{IMPLEMENTATION_NOTES.md → slop/IMPLEMENTATION_NOTES.md} +0 -0
checksums.yaml
CHANGED
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
---
|
|
2
2
|
SHA256:
|
|
3
|
-
metadata.gz:
|
|
4
|
-
data.tar.gz:
|
|
3
|
+
metadata.gz: 47b56ee5467d6808c4ceae00289bf291551374a3e5854ed5223a5b2f5ca2f9ac
|
|
4
|
+
data.tar.gz: 27aa483d8296eb2a3e775aecfaeffc0813f60936cd9091e976eb5159559cafc9
|
|
5
5
|
SHA512:
|
|
6
|
-
metadata.gz:
|
|
7
|
-
data.tar.gz:
|
|
6
|
+
metadata.gz: 0170d5419e8ab3335a3bd4494f09c3b6e6f5a143cf26791a383f2189a9e12fcd49a596e56555670d018a6ffdffa13b13eed27e554cbe5ff82a221dcd12ed4bd7
|
|
7
|
+
data.tar.gz: 95d48f2e6aedb4160634db949e32be25823f8b808e71dc113f3ce9e8490b507e136a96372c329f8ff03f2476bc52f4dd31fc89f67a9ba2b7e11e5e0bcebf241f
|
data/.github/workflows/ci.yml
ADDED
|
@@ -0,0 +1,24 @@
|
|
|
1
|
+
name: CI
|
|
2
|
+
|
|
3
|
+
on:
|
|
4
|
+
push:
|
|
5
|
+
branches: [ main ]
|
|
6
|
+
pull_request:
|
|
7
|
+
branches: [ main ]
|
|
8
|
+
|
|
9
|
+
jobs:
|
|
10
|
+
test:
|
|
11
|
+
runs-on: ubuntu-latest
|
|
12
|
+
|
|
13
|
+
steps:
|
|
14
|
+
- uses: actions/checkout@v4
|
|
15
|
+
|
|
16
|
+
- name: Set up Ruby
|
|
17
|
+
uses: ruby/setup-ruby@v1
|
|
18
|
+
with:
|
|
19
|
+
ruby-version: '3.4'
|
|
20
|
+
bundler-cache: true
|
|
21
|
+
|
|
22
|
+
- name: Run tests
|
|
23
|
+
run: bundle exec rake
|
|
24
|
+
|
data/AGENTS.md
ADDED
|
@@ -0,0 +1,104 @@
|
|
|
1
|
+
# Agent Guidelines for gmail_search_syntax
|
|
2
|
+
|
|
3
|
+
This document outlines the coding standards and workflow requirements for AI agents working on the gmail_search_syntax project.
|
|
4
|
+
|
|
5
|
+
## Ruby File Standards
|
|
6
|
+
|
|
7
|
+
### Frozen String Literal
|
|
8
|
+
**ALWAYS** include `# frozen_string_literal: true` at the top of every Ruby file:
|
|
9
|
+
|
|
10
|
+
```ruby
|
|
11
|
+
# frozen_string_literal: true
|
|
12
|
+
|
|
13
|
+
class MyClass
|
|
14
|
+
# ... implementation
|
|
15
|
+
end
|
|
16
|
+
```
|
|
17
|
+
|
|
18
|
+
This directive should be the very first line of every `.rb` file to ensure string immutability and improve performance.
|
|
19
|
+
|
|
20
|
+
## Documentation Standards
|
|
21
|
+
|
|
22
|
+
### Markdown Files Location
|
|
23
|
+
When writing step-by-step instructions, documentation, or process descriptions in Markdown format, place them in the `slop/` directory:
|
|
24
|
+
|
|
25
|
+
```
|
|
26
|
+
slop/
|
|
27
|
+
├── ARCHITECTURE.md
|
|
28
|
+
├── GMAIL_BEHAVIOR_COMPARISON.md
|
|
29
|
+
├── GMAIL_COMPATIBILITY_COMPLETE.md
|
|
30
|
+
└── IMPLEMENTATION_NOTES.md
|
|
31
|
+
```
|
|
32
|
+
|
|
33
|
+
This keeps the main project directory clean while preserving detailed documentation and implementation notes.
|
|
34
|
+
|
|
35
|
+
## Code Formatting
|
|
36
|
+
|
|
37
|
+
### StandardRB Integration
|
|
38
|
+
After creating or modifying any Ruby file, **ALWAYS** run StandardRB to maintain consistent formatting:
|
|
39
|
+
|
|
40
|
+
```bash
|
|
41
|
+
standardrb --fix /path/to/file.rb
|
|
42
|
+
```
|
|
43
|
+
|
|
44
|
+
This ensures:
|
|
45
|
+
- Consistent code style across the project
|
|
46
|
+
- Automatic fixing of common formatting issues
|
|
47
|
+
- Compliance with the project's Ruby style guide
|
|
48
|
+
- Uniform indentation, spacing, and syntax
|
|
49
|
+
|
|
50
|
+
### Workflow
|
|
51
|
+
1. Create or modify a Ruby file
|
|
52
|
+
2. Immediately run `standardrb --fix` on the file
|
|
53
|
+
3. Verify the changes are acceptable
|
|
54
|
+
4. Continue with development
|
|
55
|
+
|
|
56
|
+
## Project Context
|
|
57
|
+
|
|
58
|
+
This is a Ruby gem that parses Gmail's search syntax and converts it into an Abstract Syntax Tree (AST). The project includes:
|
|
59
|
+
|
|
60
|
+
- **Core parsing**: Tokenizer, parser, and AST nodes
|
|
61
|
+
- **SQL conversion**: SQLite and Postgres visitors for database queries
|
|
62
|
+
- **Comprehensive testing**: Unit and integration tests
|
|
63
|
+
- **Documentation**: Schema documentation and operator reference
|
|
64
|
+
|
|
65
|
+
## Key Files
|
|
66
|
+
|
|
67
|
+
- `lib/gmail_search_syntax.rb` - Main entry point
|
|
68
|
+
- `lib/gmail_search_syntax/parser.rb` - Core parsing logic
|
|
69
|
+
- `lib/gmail_search_syntax/sql_visitor.rb` - SQL generation
|
|
70
|
+
- `test/` - Test suite
|
|
71
|
+
- `SCHEMA.md` - Database schema documentation
|
|
72
|
+
- `slop/` - Detailed implementation documentation
|
|
73
|
+
|
|
74
|
+
## Best Practices
|
|
75
|
+
|
|
76
|
+
1. **Test Coverage**: Ensure all new functionality has corresponding tests
|
|
77
|
+
2. **Documentation**: Update relevant documentation when adding features
|
|
78
|
+
3. **Backward Compatibility**: Maintain API compatibility when possible
|
|
79
|
+
4. **Performance**: Consider performance implications of parsing changes
|
|
80
|
+
5. **Gmail Compatibility**: Verify changes against Gmail's actual search behavior
|
|
81
|
+
|
|
82
|
+
## Example Workflow
|
|
83
|
+
|
|
84
|
+
```bash
|
|
85
|
+
# 1. Create a new Ruby file
|
|
86
|
+
echo '# frozen_string_literal: true' > lib/gmail_search_syntax/new_feature.rb
|
|
87
|
+
|
|
88
|
+
# 2. Add implementation
|
|
89
|
+
# ... write code ...
|
|
90
|
+
|
|
91
|
+
# 3. Format with StandardRB
|
|
92
|
+
standardrb --fix lib/gmail_search_syntax/new_feature.rb
|
|
93
|
+
|
|
94
|
+
# 4. Create documentation if needed
|
|
95
|
+
echo '# Implementation Notes' > slop/NEW_FEATURE_NOTES.md
|
|
96
|
+
|
|
97
|
+
# 5. Add tests
|
|
98
|
+
# ... write tests ...
|
|
99
|
+
|
|
100
|
+
# 6. Run test suite
|
|
101
|
+
bundle exec rake test
|
|
102
|
+
```
|
|
103
|
+
|
|
104
|
+
Remember: Consistency in code style and documentation organization is crucial for maintaining this project's quality and readability.
|
data/Gemfile
ADDED
data/README.md
CHANGED
|
@@ -6,7 +6,8 @@ Based on the official Gmail search operators documentation:
|
|
|
6
6
|
https://support.google.com/mail/answer/7190
|
|
7
7
|
|
|
8
8
|
> [!TIP]
|
|
9
|
-
> This gem was created for [Cora,](https://cora.computer/)
|
|
9
|
+
> This gem was created for [Cora,](https://cora.computer/)
|
|
10
|
+
> your personal e-mail assistant.
|
|
10
11
|
> Send them some love for allowing me to share it.
|
|
11
12
|
|
|
12
13
|
## Installation
|
|
@@ -54,47 +55,6 @@ GmailSearchSyntax.parse!("")
|
|
|
54
55
|
# => raises GmailSearchSyntax::EmptyQueryError
|
|
55
56
|
```
|
|
56
57
|
|
|
57
|
-
### Converting to SQL
|
|
58
|
-
|
|
59
|
-
The gem includes a SQLite visitor that can convert Gmail queries to SQL. Here's a complex example:
|
|
60
|
-
|
|
61
|
-
```ruby
|
|
62
|
-
require 'gmail_search_syntax'
|
|
63
|
-
|
|
64
|
-
# A complex Gmail query with multiple operators
|
|
65
|
-
query = '(from:manager OR from:boss) subject:"quarterly review" has:attachment -label:archived after:2024/01/01 larger:5M'
|
|
66
|
-
|
|
67
|
-
ast = GmailSearchSyntax.parse!(query)
|
|
68
|
-
visitor = GmailSearchSyntax::SQLiteVisitor.new(current_user_email: "user@example.com")
|
|
69
|
-
visitor.visit(ast)
|
|
70
|
-
|
|
71
|
-
sql, params = visitor.to_query.to_sql
|
|
72
|
-
```
|
|
73
|
-
|
|
74
|
-
This generates the following SQL:
|
|
75
|
-
|
|
76
|
-
```sql
|
|
77
|
-
SELECT DISTINCT m0.id
|
|
78
|
-
FROM messages AS m0
|
|
79
|
-
INNER JOIN message_addresses AS ma1 ON m0.id = ma1.message_id
|
|
80
|
-
INNER JOIN message_addresses AS ma3 ON m0.id = ma3.message_id
|
|
81
|
-
INNER JOIN message_labels AS ml ON m0.id = ml.message_id
|
|
82
|
-
INNER JOIN labels AS l ON ml.label_id = l.id
|
|
83
|
-
WHERE ((((ma1.address_type = ? OR ma1.address_type = ? OR ma1.address_type = ?)
|
|
84
|
-
AND ma1.email_address = ?)
|
|
85
|
-
OR ((ma3.address_type = ? OR ma3.address_type = ? OR ma3.address_type = ?)
|
|
86
|
-
AND ma3.email_address = ?))
|
|
87
|
-
AND m0.subject LIKE ?
|
|
88
|
-
AND m0.has_attachment = 1
|
|
89
|
-
AND NOT l.name = ?
|
|
90
|
-
AND m0.internal_date > ?
|
|
91
|
-
AND m0.size_bytes > ?)
|
|
92
|
-
```
|
|
93
|
-
|
|
94
|
-
With parameters: `["from", "cc", "bcc", "manager", "from", "cc", "bcc", "boss", "%quarterly review%", "archived", "2024-01-01", 5242880]`
|
|
95
|
-
|
|
96
|
-
A similar visitor is provided for PostgreSQL.
|
|
97
|
-
|
|
98
58
|
## Supported Operators
|
|
99
59
|
|
|
100
60
|
Email routing: `from:`, `to:`, `cc:`, `bcc:`, `deliveredto:`
|
|
@@ -113,6 +73,23 @@ Size: `size:`, `larger:`, `smaller:`
|
|
|
113
73
|
|
|
114
74
|
There is also a converter from the operators to SQL queries against an embedded SQLite database. This is meant more as an example than a fully-featured store, but it shows what's possible.
|
|
115
75
|
|
|
76
|
+
### Converting to SQL
|
|
77
|
+
|
|
78
|
+
The gem includes a SQLite visitor and a Postgres visitor that convert Gmail queries into the corresponding SQL. See SCHEMA.md for more information.
|
|
79
|
+
|
|
80
|
+
```ruby
|
|
81
|
+
require 'gmail_search_syntax'
|
|
82
|
+
|
|
83
|
+
# A complex Gmail query with multiple operators
|
|
84
|
+
query = '(from:manager OR from:boss) subject:"quarterly review" has:attachment -label:archived after:2024/01/01 larger:5M'
|
|
85
|
+
|
|
86
|
+
ast = GmailSearchSyntax.parse!(query)
|
|
87
|
+
visitor = GmailSearchSyntax::SQLiteVisitor.new(current_user_email: "user@example.com")
|
|
88
|
+
visitor.visit(ast)
|
|
89
|
+
|
|
90
|
+
sql, params = visitor.to_query.to_sql
|
|
91
|
+
```
|
|
92
|
+
|
|
116
93
|
## Testing
|
|
117
94
|
|
|
118
95
|
```bash
|
data/SCHEMA.md
CHANGED
|
@@ -1,15 +1,111 @@
|
|
|
1
|
-
|
|
1
|
+
This document describes the database schema designed to support Gmail search syntax queries. The schema is optimized for the search operators defined in `lib/GMAIL_SEARCH_OPERATORS.md`.
|
|
2
2
|
|
|
3
|
-
|
|
3
|
+
```ruby
|
|
4
|
+
require 'gmail_search_syntax'
|
|
4
5
|
|
|
5
|
-
|
|
6
|
+
# A complex Gmail query with multiple operators
|
|
7
|
+
query = '(from:manager OR from:boss) subject:"quarterly review" has:attachment -label:archived after:2024/01/01 larger:5M'
|
|
6
8
|
|
|
7
|
-
|
|
9
|
+
ast = GmailSearchSyntax.parse!(query)
|
|
10
|
+
visitor = GmailSearchSyntax::SQLiteVisitor.new(current_user_email: "user@example.com")
|
|
11
|
+
visitor.visit(ast)
|
|
12
|
+
|
|
13
|
+
sql, params = visitor.to_query.to_sql
|
|
14
|
+
```
|
|
15
|
+
|
|
16
|
+
generates the following SQL:
|
|
17
|
+
|
|
18
|
+
```sql
|
|
19
|
+
SELECT DISTINCT m0.id
|
|
20
|
+
FROM messages AS m0
|
|
21
|
+
INNER JOIN message_addresses AS ma1 ON m0.id = ma1.message_id
|
|
22
|
+
INNER JOIN message_addresses AS ma3 ON m0.id = ma3.message_id
|
|
23
|
+
INNER JOIN message_labels AS ml ON m0.id = ml.message_id
|
|
24
|
+
INNER JOIN labels AS l ON ml.label_id = l.id
|
|
25
|
+
WHERE ((((ma1.address_type = ? OR ma1.address_type = ? OR ma1.address_type = ?)
|
|
26
|
+
AND ma1.email_address = ?)
|
|
27
|
+
OR ((ma3.address_type = ? OR ma3.address_type = ? OR ma3.address_type = ?)
|
|
28
|
+
AND ma3.email_address = ?))
|
|
29
|
+
AND m0.subject LIKE ?
|
|
30
|
+
AND m0.has_attachment = 1
|
|
31
|
+
AND NOT l.name = ?
|
|
32
|
+
AND m0.internal_date > ?
|
|
33
|
+
AND m0.size_bytes > ?)
|
|
34
|
+
```
|
|
35
|
+
|
|
36
|
+
and bound parameters:
|
|
37
|
+
|
|
38
|
+
```
|
|
39
|
+
[
|
|
40
|
+
"from", "cc", "bcc", "manager", "from", "cc", "bcc",
|
|
41
|
+
"boss", "%quarterly review%", "archived", "2024-01-01", 5242880
|
|
42
|
+
]
|
|
43
|
+
```
|
|
44
|
+
|
|
45
|
+
## String Matching Requirements
|
|
46
|
+
|
|
47
|
+
### Prefix/Suffix Matching
|
|
48
|
+
Required for:
|
|
49
|
+
- **Email addresses** (from:, to:, cc:, bcc:, deliveredto:)
|
|
50
|
+
- `from:marc@` → prefix match → `WHERE email_address LIKE 'marc@%'`
|
|
51
|
+
- `from:@example.com` → suffix match → `WHERE email_address LIKE '%@example.com'`
|
|
52
|
+
- `from:marc@example.com` → exact match → `WHERE email_address = 'marc@example.com'`
|
|
53
|
+
|
|
54
|
+
- **Mailing lists** (list:)
|
|
55
|
+
- Same pattern as email addresses
|
|
56
|
+
|
|
57
|
+
- **Filenames** (filename:)
|
|
58
|
+
- `filename:pdf` → extension match → `WHERE filename LIKE '%.pdf'`
|
|
59
|
+
- `filename:homework` → prefix match → `WHERE filename LIKE 'homework%'`
|
|
60
|
+
|
|
61
|
+
### Exact Match Only
|
|
62
|
+
- RFC822 message IDs
|
|
63
|
+
- Boolean/enum fields (is:, has:, in:, category:, label:)
|
|
64
|
+
|
|
65
|
+
## SQL Visitor Usage
|
|
66
|
+
|
|
67
|
+
The library provides two SQL visitor implementations for different database backends: SQLite and Postgres. They are configured to use the schema described below. You convert the search AST nodes into a SQL query using the provided SQL visitors. If you have a different schema, use the visitor code as a template.
|
|
68
|
+
|
|
69
|
+
|
|
70
|
+
```ruby
|
|
71
|
+
ast = GmailSearchSyntax.parse!("from:amy@example.com newer_than:7d")
|
|
72
|
+
visitor = GmailSearchSyntax::SQLiteVisitor.new(current_user_email: "me@example.com")
|
|
73
|
+
visitor.visit(ast)
|
|
74
|
+
|
|
75
|
+
sql, params = visitor.to_query.to_sql
|
|
76
|
+
# sql: "SELECT DISTINCT m.id FROM messages m ... WHERE ... m.internal_date > datetime('now', ?)"
|
|
77
|
+
# params: ["from", "cc", "bcc", "amy@example.com", "-7 days"]
|
|
78
|
+
```
|
|
79
|
+
|
|
80
|
+
The visitors implement:
|
|
81
|
+
|
|
82
|
+
- **Parameterized queries**: All user input is bound via `?` placeholders
|
|
83
|
+
- **Automatic table joins**: Joins required tables based on operators
|
|
84
|
+
- **Nested conditions**: Properly handles AND/OR/NOT with parentheses
|
|
85
|
+
- **Special operators**:
|
|
86
|
+
- `from:me` / `to:me` → uses `current_user_email`
|
|
87
|
+
- `in:anywhere` → no location filter
|
|
88
|
+
- `AROUND` → generates `(1 = 0)` no-op condition
|
|
89
|
+
- **Date handling**:
|
|
90
|
+
- Converts dates from `YYYY/MM/DD` to `YYYY-MM-DD`
|
|
91
|
+
- Parses relative times (`1y`, `2d`, `3m`) to database-specific datetime functions
|
|
92
|
+
- **Size parsing**: Converts `10M`, `1G` to bytes
|
|
93
|
+
|
|
94
|
+
## Fuzzy Matching Limitations
|
|
95
|
+
|
|
96
|
+
The current implementation does **not** support:
|
|
97
|
+
- **AROUND operator** (proximity search) - generates no-op `(1 = 0)` condition
|
|
98
|
+
- Full-text search with word distance calculations
|
|
99
|
+
- Stemming or phonetic matching
|
|
100
|
+
- Levenshtein distance / typo tolerance
|
|
101
|
+
|
|
102
|
+
These features require additional implementation, potentially using SQLite FTS5 extensions.
|
|
8
103
|
|
|
9
|
-
|
|
10
|
-
Primary table storing email message metadata.
|
|
104
|
+
## Core Tables
|
|
11
105
|
|
|
12
106
|
```sql
|
|
107
|
+
-- messages
|
|
108
|
+
-- Primary table storing email message metadata.
|
|
13
109
|
CREATE TABLE messages (
|
|
14
110
|
id TEXT PRIMARY KEY,
|
|
15
111
|
rfc822_message_id TEXT,
|
|
@@ -55,12 +151,12 @@ CREATE TABLE messages (
|
|
|
55
151
|
|
|
56
152
|
created_at DATETIME DEFAULT CURRENT_TIMESTAMP
|
|
57
153
|
);
|
|
58
|
-
|
|
154
|
+
`
|
|
59
155
|
|
|
60
|
-
|
|
61
|
-
Stores email addresses associated with messages (from, to, cc, bcc, delivered_to).
|
|
156
|
+
-- message_addresses
|
|
157
|
+
-- Stores email addresses associated with messages (from, to, cc, bcc, delivered_to).
|
|
158
|
+
-- The `from:` and `to:` operators search across `from`, `cc`, and `bcc` address types per Gmail specification.
|
|
62
159
|
|
|
63
|
-
```sql
|
|
64
160
|
CREATE TABLE message_addresses (
|
|
65
161
|
id INTEGER PRIMARY KEY AUTOINCREMENT,
|
|
66
162
|
message_id TEXT NOT NULL,
|
|
@@ -70,26 +166,21 @@ CREATE TABLE message_addresses (
|
|
|
70
166
|
|
|
71
167
|
FOREIGN KEY (message_id) REFERENCES messages(id) ON DELETE CASCADE
|
|
72
168
|
);
|
|
73
|
-
```
|
|
74
169
|
|
|
75
|
-
**Note:** The `from:` and `to:` operators search across `from`, `cc`, and `bcc` address types per Gmail specification.
|
|
76
170
|
|
|
77
|
-
|
|
78
|
-
Label definitions with external string IDs.
|
|
171
|
+
-- labels
|
|
172
|
+
-- Label definitions with external string IDs.
|
|
79
173
|
|
|
80
|
-
```sql
|
|
81
174
|
CREATE TABLE labels (
|
|
82
175
|
id TEXT PRIMARY KEY,
|
|
83
176
|
name TEXT NOT NULL UNIQUE,
|
|
84
177
|
is_system_label BOOLEAN DEFAULT 0,
|
|
85
178
|
created_at DATETIME DEFAULT CURRENT_TIMESTAMP
|
|
86
179
|
);
|
|
87
|
-
```
|
|
88
180
|
|
|
89
|
-
|
|
90
|
-
Many-to-many relationship between messages and labels.
|
|
181
|
+
-- message_labels
|
|
182
|
+
-- Many-to-many relationship between messages and labels.
|
|
91
183
|
|
|
92
|
-
```sql
|
|
93
184
|
CREATE TABLE message_labels (
|
|
94
185
|
id INTEGER PRIMARY KEY AUTOINCREMENT,
|
|
95
186
|
message_id TEXT NOT NULL,
|
|
@@ -100,11 +191,9 @@ CREATE TABLE message_labels (
|
|
|
100
191
|
UNIQUE(message_id, label_id)
|
|
101
192
|
);
|
|
102
193
|
```
|
|
194
|
+
-- attachments
|
|
195
|
+
-- File attachments associated with messages.
|
|
103
196
|
|
|
104
|
-
### attachments
|
|
105
|
-
File attachments associated with messages.
|
|
106
|
-
|
|
107
|
-
```sql
|
|
108
197
|
CREATE TABLE attachments (
|
|
109
198
|
id INTEGER PRIMARY KEY AUTOINCREMENT,
|
|
110
199
|
message_id TEXT NOT NULL,
|
|
@@ -116,108 +205,3 @@ CREATE TABLE attachments (
|
|
|
116
205
|
);
|
|
117
206
|
```
|
|
118
207
|
|
|
119
|
-
## String Matching Requirements
|
|
120
|
-
|
|
121
|
-
### Prefix/Suffix Matching
|
|
122
|
-
Required for:
|
|
123
|
-
- **Email addresses** (from:, to:, cc:, bcc:, deliveredto:)
|
|
124
|
-
- `from:marc@` → prefix match → `WHERE email_address LIKE 'marc@%'`
|
|
125
|
-
- `from:@example.com` → suffix match → `WHERE email_address LIKE '%@example.com'`
|
|
126
|
-
- `from:marc@example.com` → exact match → `WHERE email_address = 'marc@example.com'`
|
|
127
|
-
|
|
128
|
-
- **Mailing lists** (list:)
|
|
129
|
-
- Same pattern as email addresses
|
|
130
|
-
|
|
131
|
-
- **Filenames** (filename:)
|
|
132
|
-
- `filename:pdf` → extension match → `WHERE filename LIKE '%.pdf'`
|
|
133
|
-
- `filename:homework` → prefix match → `WHERE filename LIKE 'homework%'`
|
|
134
|
-
|
|
135
|
-
### Exact Match Only
|
|
136
|
-
- RFC822 message IDs
|
|
137
|
-
- Boolean/enum fields (is:, has:, in:, category:, label:)
|
|
138
|
-
|
|
139
|
-
## SQL Visitor Usage
|
|
140
|
-
|
|
141
|
-
The library provides two SQL visitor implementations for different database backends:
|
|
142
|
-
|
|
143
|
-
### SQLiteVisitor
|
|
144
|
-
|
|
145
|
-
Converts parsed AST into SQLite-compatible SQL queries:
|
|
146
|
-
|
|
147
|
-
```ruby
|
|
148
|
-
ast = GmailSearchSyntax.parse!("from:amy@example.com newer_than:7d")
|
|
149
|
-
visitor = GmailSearchSyntax::SQLiteVisitor.new(current_user_email: "me@example.com")
|
|
150
|
-
visitor.visit(ast)
|
|
151
|
-
|
|
152
|
-
sql, params = visitor.to_query.to_sql
|
|
153
|
-
# sql: "SELECT DISTINCT m.id FROM messages m ... WHERE ... m.internal_date > datetime('now', ?)"
|
|
154
|
-
# params: ["from", "cc", "bcc", "amy@example.com", "-7 days"]
|
|
155
|
-
```
|
|
156
|
-
|
|
157
|
-
### PostgresVisitor
|
|
158
|
-
|
|
159
|
-
Converts parsed AST into PostgreSQL-compatible SQL queries:
|
|
160
|
-
|
|
161
|
-
```ruby
|
|
162
|
-
ast = GmailSearchSyntax.parse!("from:amy@example.com newer_than:7d")
|
|
163
|
-
visitor = GmailSearchSyntax::PostgresVisitor.new(current_user_email: "me@example.com")
|
|
164
|
-
visitor.visit(ast)
|
|
165
|
-
|
|
166
|
-
sql, params = visitor.to_query.to_sql
|
|
167
|
-
# sql: "SELECT DISTINCT m.id FROM messages m ... WHERE ... m.internal_date > (NOW() - ?::interval)"
|
|
168
|
-
# params: ["from", "cc", "bcc", "amy@example.com", "7 days"]
|
|
169
|
-
```
|
|
170
|
-
|
|
171
|
-
**Note**: `SqlVisitor` is an alias for `SQLiteVisitor` for backward compatibility.
|
|
172
|
-
|
|
173
|
-
### Database-Specific Differences
|
|
174
|
-
|
|
175
|
-
The main difference between the visitors is in relative date handling:
|
|
176
|
-
|
|
177
|
-
| Feature | SQLite | PostgreSQL |
|
|
178
|
-
|---------|--------|------------|
|
|
179
|
-
| `older_than:7d` | `datetime('now', '-7 days')` | `NOW() - '7 days'::interval` |
|
|
180
|
-
| `newer_than:3m` | `datetime('now', '-3 months')` | `NOW() - '3 months'::interval` |
|
|
181
|
-
| Parameter format | `"-7 days"` (negative) | `"7 days"` (positive with cast) |
|
|
182
|
-
|
|
183
|
-
All other query generation is identical between the two visitors.
|
|
184
|
-
|
|
185
|
-
### Features
|
|
186
|
-
|
|
187
|
-
- **Parameterized queries**: All user input is bound via `?` placeholders
|
|
188
|
-
- **Automatic table joins**: Joins required tables based on operators
|
|
189
|
-
- **Nested conditions**: Properly handles AND/OR/NOT with parentheses
|
|
190
|
-
- **Special operators**:
|
|
191
|
-
- `from:me` / `to:me` → uses `current_user_email`
|
|
192
|
-
- `in:anywhere` → no location filter
|
|
193
|
-
- `AROUND` → generates `(1 = 0)` no-op condition
|
|
194
|
-
- **Date handling**:
|
|
195
|
-
- Converts dates from `YYYY/MM/DD` to `YYYY-MM-DD`
|
|
196
|
-
- Parses relative times (`1y`, `2d`, `3m`) to database-specific datetime functions
|
|
197
|
-
- **Size parsing**: Converts `10M`, `1G` to bytes
|
|
198
|
-
|
|
199
|
-
### Query Object
|
|
200
|
-
|
|
201
|
-
The `Query` class accumulates SQL components:
|
|
202
|
-
|
|
203
|
-
```ruby
|
|
204
|
-
query = visitor.to_query
|
|
205
|
-
|
|
206
|
-
query.conditions # Array of WHERE conditions
|
|
207
|
-
query.joins # Hash of JOIN clauses
|
|
208
|
-
query.params # Array of bound parameters
|
|
209
|
-
|
|
210
|
-
sql, params = query.to_sql
|
|
211
|
-
# Returns: [sql_string, parameters_array]
|
|
212
|
-
```
|
|
213
|
-
|
|
214
|
-
## Fuzzy Matching Limitations
|
|
215
|
-
|
|
216
|
-
The current implementation does **not** support:
|
|
217
|
-
- **AROUND operator** (proximity search) - generates no-op `(1 = 0)` condition
|
|
218
|
-
- Full-text search with word distance calculations
|
|
219
|
-
- Stemming or phonetic matching
|
|
220
|
-
- Levenshtein distance / typo tolerance
|
|
221
|
-
|
|
222
|
-
These features require additional implementation, potentially using SQLite FTS5 extensions.
|
|
223
|
-
|
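The value-translation rules that SCHEMA.md lists (prefix, suffix and exact address matching, `YYYY/MM/DD` dates, `10M`/`1G` sizes, and relative times such as `7d`) can be sketched as below. This is a minimal illustration only, not the gem's visitor code: the helper names `address_condition`, `normalize_date`, `size_to_bytes` and `relative_time_param` are hypothetical, the binary multipliers are inferred from `larger:5M` binding to `5242880`, and mapping `y` to years is an assumption.

```ruby
# Hypothetical helpers illustrating the rules described in SCHEMA.md.
# None of these names are part of the gem's API.

# Returns [sql_fragment, bound_parameter] for an address operator value.
# Note: LIKE wildcards inside the value are not escaped in this sketch.
def address_condition(value)
  if value.start_with?("@")
    ["email_address LIKE ?", "%#{value}"]   # from:@example.com -> suffix match
  elsif value.end_with?("@")
    ["email_address LIKE ?", "#{value}%"]   # from:marc@ -> prefix match
  else
    ["email_address = ?", value]            # full address -> exact match
  end
end

# Converts YYYY/MM/DD to YYYY-MM-DD.
def normalize_date(value)
  value.tr("/", "-")
end

# Binary multipliers are assumed, based on 5M => 5242880 in the diff.
SIZE_MULTIPLIERS = {"M" => 1024**2, "G" => 1024**3}.freeze

def size_to_bytes(value)
  match = value.to_s.match(/\A(\d+)([MG])?\z/i)
  return nil unless match
  match[1].to_i * SIZE_MULTIPLIERS.fetch(match[2].to_s.upcase, 1)
end

# "d" and "m" follow the examples in the diff; "y" => years is assumed.
RELATIVE_UNITS = {"d" => "days", "m" => "months", "y" => "years"}.freeze

# SQLite binds a negative offset, Postgres a positive interval.
def relative_time_param(value, dialect)
  match = value.match(/\A(\d+)([dmy])\z/)
  return nil unless match
  interval = "#{match[1]} #{RELATIVE_UNITS.fetch(match[2])}"
  (dialect == :sqlite) ? "-#{interval}" : interval
end

p address_condition("marc@")
p normalize_date("2024/01/01")
p size_to_bytes("5M")
p relative_time_param("7d", :sqlite)
```

Returning the fragment and the parameter as a pair keeps user input out of the SQL string, which is the same property the diff describes for the visitors' `?` placeholders.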
data/examples/demo.rb
CHANGED
|
@@ -1,8 +1,5 @@
|
|
|
1
1
|
require_relative "../lib/gmail_search_syntax"
|
|
2
|
-
|
|
3
|
-
puts "Gmail Search Syntax Parser - Demo"
|
|
4
|
-
puts "=" * 50
|
|
5
|
-
puts
|
|
2
|
+
require "pp"
|
|
6
3
|
|
|
7
4
|
queries = [
|
|
8
5
|
"from:amy@example.com",
|
|
@@ -23,6 +20,5 @@ queries = [
|
|
|
23
20
|
queries.each do |query|
|
|
24
21
|
puts "Query: #{query}"
|
|
25
22
|
ast = GmailSearchSyntax.parse!(query)
|
|
26
|
-
|
|
27
|
-
puts
|
|
23
|
+
pp ast
|
|
28
24
|
end
|
data/gmail_search_syntax.gemspec
ADDED
|
@@ -0,0 +1,23 @@
|
|
|
1
|
+
require_relative "lib/gmail_search_syntax/version"
|
|
2
|
+
|
|
3
|
+
Gem::Specification.new do |s|
|
|
4
|
+
s.name = "gmail_search_syntax"
|
|
5
|
+
s.version = GmailSearchSyntax::VERSION
|
|
6
|
+
s.summary = "Gmail search syntax parser"
|
|
7
|
+
s.authors = ["me@julik.nl"]
|
|
8
|
+
s.license = "MIT"
|
|
9
|
+
s.homepage = "https://github.com/julik/gmail_search_syntax"
|
|
10
|
+
s.required_ruby_version = ">= 3.0"
|
|
11
|
+
|
|
12
|
+
s.files = Dir.chdir(__dir__) do
|
|
13
|
+
`git ls-files -z`.split("\x0").reject do |f|
|
|
14
|
+
File.basename(f).start_with?(".")
|
|
15
|
+
end
|
|
16
|
+
end
|
|
17
|
+
s.require_paths = ["lib"]
|
|
18
|
+
|
|
19
|
+
s.add_development_dependency "minitest", "~> 5.0"
|
|
20
|
+
s.add_development_dependency "rake", "~> 13.0"
|
|
21
|
+
s.add_development_dependency "sqlite3", "< 1.6"
|
|
22
|
+
s.add_development_dependency "standard", "~> 1.0"
|
|
23
|
+
end
|
data/lib/gmail_search_syntax/parser.rb
CHANGED
|
@@ -195,47 +195,17 @@ module GmailSearchSyntax
|
|
|
195
195
|
when :lbrace
|
|
196
196
|
parse_braces
|
|
197
197
|
when :quoted_string
|
|
198
|
-
# Quoted strings are consumed as-is, no bareword collection
|
|
199
198
|
value = current_token.value
|
|
200
199
|
advance
|
|
201
200
|
value
|
|
202
201
|
when :word, :email, :number, :date, :relative_time
|
|
203
|
-
#
|
|
204
|
-
#
|
|
205
|
-
|
|
206
|
-
|
|
207
|
-
|
|
208
|
-
|
|
209
|
-
|
|
210
|
-
# Check if this word is actually an operator (word followed by colon)
|
|
211
|
-
if current_token.type == :word && peek_token&.type == :colon
|
|
212
|
-
break
|
|
213
|
-
end
|
|
214
|
-
|
|
215
|
-
values << current_token.value
|
|
216
|
-
types << current_token.type
|
|
217
|
-
advance
|
|
218
|
-
end
|
|
219
|
-
|
|
220
|
-
# If we only collected one value and it's a number, preserve its type
|
|
221
|
-
if values.length == 1 && types[0] == :number
|
|
222
|
-
values[0]
|
|
223
|
-
else
|
|
224
|
-
# Multiple values or non-number: join as string
|
|
225
|
-
values.map(&:to_s).join(" ")
|
|
226
|
-
end
|
|
227
|
-
end
|
|
228
|
-
end
|
|
229
|
-
|
|
230
|
-
def is_bareword_token?
|
|
231
|
-
return false if eof?
|
|
232
|
-
|
|
233
|
-
# Barewords are simple value tokens, not operators or special syntax
|
|
234
|
-
case current_token.type
|
|
235
|
-
when :word, :email, :number, :date, :relative_time
|
|
236
|
-
true
|
|
237
|
-
else
|
|
238
|
-
false
|
|
202
|
+
# Take only a single token as the operator value.
|
|
203
|
+
# Multi-word values must be explicitly quoted: from:"john smith"
|
|
204
|
+
# This matches Gmail's actual search behavior where bare words
|
|
205
|
+
# after an operator are treated as separate search terms.
|
|
206
|
+
value = current_token.value
|
|
207
|
+
advance
|
|
208
|
+
value.is_a?(Integer) ? value : value.to_s
|
|
239
209
|
end
|
|
240
210
|
end
|
|
241
211
|
end
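The effect of this parser hunk, taking exactly one token as the operator value, can be sketched on a toy token list. This is a simplified model under stated assumptions: the `Token` struct, the `BAREWORD_TYPES` constant and `take_operator_value` are hypothetical stand-ins and do not reproduce the gem's internal parser state.

```ruby
# Hypothetical stand-in for the tokenizer's token objects.
Token = Struct.new(:type, :value)

# Token types the hunk treats as single-token operator values.
BAREWORD_TYPES = %i[word email number date relative_time].freeze

# Consumes one token as the operator value and returns
# [value, remaining_tokens]. Integers keep their type; everything
# else is converted to a string, as in the added line of the hunk.
def take_operator_value(tokens)
  first, *rest = tokens
  return [nil, tokens] unless first && BAREWORD_TYPES.include?(first.type)

  value = first.value
  [value.is_a?(Integer) ? value : value.to_s, rest]
end

# Tokens that would follow "label:" in `label:Cora/Google Drive`.
tokens = [Token.new(:word, "Cora/Google"), Token.new(:word, "Drive")]
value, rest = take_operator_value(tokens)

p value            # the operator value only
p rest.map(&:value) # tokens left over as separate search terms
```

Under this model `Drive` is left in the token stream, so it becomes its own search term rather than part of the label value; a multi-word value has to arrive as a single quoted-string token.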
|
data/lib/gmail_search_syntax/tokenizer.rb
CHANGED
|
@@ -55,9 +55,16 @@ module GmailSearchSyntax
|
|
|
55
55
|
advance
|
|
56
56
|
when "-"
|
|
57
57
|
next_char = peek_char
|
|
58
|
-
|
|
58
|
+
prev_char = (@position > 0) ? @input[@position - 1] : nil
|
|
59
|
+
# Negation requires: non-whitespace follows AND (start of input OR whitespace precedes)
|
|
60
|
+
# Gmail behavior: "Coxlee-Gammage" → Coxlee AND Gammage (hyphen is word separator)
|
|
61
|
+
# "Coxlee -Gammage" → Coxlee AND NOT Gammage (space+hyphen = negation)
|
|
62
|
+
if next_char && next_char !~ /\s/ && (prev_char.nil? || prev_char =~ /\s/)
|
|
59
63
|
add_token(:minus, char)
|
|
60
64
|
advance
|
|
65
|
+
elsif prev_char && prev_char !~ /\s/
|
|
66
|
+
# Embedded hyphen (preceded by non-whitespace) - skip it as word separator
|
|
67
|
+
advance
|
|
61
68
|
else
|
|
62
69
|
read_word
|
|
63
70
|
end
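The three-way branch in this tokenizer hunk can be exercised in isolation with a small sketch. The helper `classify_hyphen` is hypothetical and only mirrors the order of the conditions above; it assumes `peek_char` returns the character immediately after the current position.

```ruby
# Hypothetical helper: classifies the hyphen at `position` in `input`
# using the same condition order as the tokenizer hunk.
def classify_hyphen(input, position)
  prev_char = (position > 0) ? input[position - 1] : nil
  next_char = input[position + 1] # assumed equivalent of peek_char

  if next_char && next_char !~ /\s/ && (prev_char.nil? || prev_char =~ /\s/)
    :negation   # start of input or whitespace before, non-whitespace after
  elsif prev_char && prev_char !~ /\s/
    :separator  # embedded hyphen, skipped as a word separator
  else
    :word       # falls through to read_word in the tokenizer
  end
end

p classify_hyphen("Coxlee-Gammage", 6)
p classify_hyphen("Coxlee -Gammage", 7)
p classify_hyphen("-spam", 0)
p classify_hyphen("a - b", 2)
```

The order of the checks matters: the negation test runs first, so a hyphen is only treated as a separator when it directly follows a non-whitespace character.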
|
data/slop/EMBEDDED_HYPHENS.md
ADDED
|
@@ -0,0 +1,102 @@
|
|
|
1
|
+
# Embedded Hyphens in Gmail Search
|
|
2
|
+
|
|
3
|
+
## Gmail's Actual Behavior
|
|
4
|
+
|
|
5
|
+
Gmail treats hyphens differently depending on whether they are preceded by whitespace:
|
|
6
|
+
|
|
7
|
+
### Embedded Hyphen (No Preceding Whitespace)
|
|
8
|
+
|
|
9
|
+
When a hyphen appears immediately after a word character (no space before it), Gmail treats it as a **word separator**, not a negation operator. Both parts become separate search tokens that are implicitly ANDed together.
|
|
10
|
+
|
|
11
|
+
```
|
|
12
|
+
Coxlee-Gammage
|
|
13
|
+
```
|
|
14
|
+
|
|
15
|
+
Gmail behavior: Search for messages containing both "Coxlee" AND "Gammage". Both tokens get highlighted in search results.
|
|
16
|
+
|
|
17
|
+
Parsed as:
|
|
18
|
+
```ruby
|
|
19
|
+
GmailSearchSyntax.parse!("Coxlee-Gammage")
|
|
20
|
+
# => #<And [#<StringToken "Coxlee">, #<StringToken "Gammage">]>
|
|
21
|
+
```
|
|
22
|
+
|
|
23
|
+
### Space + Hyphen (Negation)
|
|
24
|
+
|
|
25
|
+
When a hyphen is preceded by whitespace (or at the start of input), it functions as the **negation operator**.
|
|
26
|
+
|
|
27
|
+
```
|
|
28
|
+
Coxlee -Gammage
|
|
29
|
+
```
|
|
30
|
+
|
|
31
|
+
Gmail behavior: Search for messages containing "Coxlee" but NOT "Gammage".
|
|
32
|
+
|
|
33
|
+
Parsed as:
|
|
34
|
+
```ruby
|
|
35
|
+
GmailSearchSyntax.parse!("Coxlee -Gammage")
|
|
36
|
+
# => #<And [#<StringToken "Coxlee">, #<Not #<StringToken "Gammage">>]>
|
|
37
|
+
```
|
|
38
|
+
|
|
39
|
+
## Examples
|
|
40
|
+
|
|
41
|
+
| Query | Parsed As | Meaning |
|
|
42
|
+
|-------|-----------|---------|
|
|
43
|
+
| `some-outfit` | `some AND outfit` | Contains both "some" and "outfit" |
|
|
44
|
+
| `some -outfit` | `some AND NOT outfit` | Contains "some" but not "outfit" |
|
|
45
|
+
| `a-b-c` | `a AND b AND c` | Contains all three tokens |
|
|
46
|
+
| `-spam` | `NOT spam` | Does not contain "spam" |
|
|
47
|
+
| `cats-dogs -birds` | `cats AND dogs AND NOT birds` | Contains "cats" and "dogs", not "birds" |
|
|
48
|
+
|
|
49
|
+
## Real-World Use Cases
|
|
50
|
+
|
|
51
|
+
### Hyphenated Names
|
|
52
|
+
```
|
|
53
|
+
from:Mary-Jane
|
|
54
|
+
```
|
|
55
|
+
Searches for emails where "from" contains "Mary" AND message contains "Jane".
|
|
56
|
+
|
|
57
|
+
### Hyphenated Terms
|
|
58
|
+
```
|
|
59
|
+
self-service
|
|
60
|
+
```
|
|
61
|
+
Finds messages containing both "self" and "service".
|
|
62
|
+
|
|
63
|
+
### Compound Words
|
|
64
|
+
```
|
|
65
|
+
e-commerce
|
|
66
|
+
```
|
|
67
|
+
Finds messages containing both "e" and "commerce".
|
|
68
|
+
|
|
69
|
+
## Implementation Details
|
|
70
|
+
|
|
71
|
+
The fix is in the tokenizer (`lib/gmail_search_syntax/tokenizer.rb`). When encountering a `-` character:
|
|
72
|
+
|
|
73
|
+
1. Check if there's a non-whitespace character following (potential negation or word separator)
|
|
74
|
+
2. Check if there's whitespace (or nothing) preceding the hyphen
|
|
75
|
+
3. If preceded by whitespace or at start of input: treat as negation operator (`:minus` token)
|
|
76
|
+
4. If preceded by non-whitespace: skip the hyphen (acts as word separator, no token emitted)
|
|
77
|
+
|
|
78
|
+
```ruby
|
|
79
|
+
when "-"
|
|
80
|
+
next_char = peek_char
|
|
81
|
+
prev_char = @position > 0 ? @input[@position - 1] : nil
|
|
82
|
+
|
|
83
|
+
if next_char && next_char !~ /\s/ && (prev_char.nil? || prev_char =~ /\s/)
|
|
84
|
+
# Negation: preceded by whitespace or at start, followed by non-whitespace
|
|
85
|
+
add_token(:minus, char)
|
|
86
|
+
advance
|
|
87
|
+
elsif prev_char && prev_char !~ /\s/
|
|
88
|
+
# Embedded hyphen: preceded by non-whitespace - skip as word separator
|
|
89
|
+
advance
|
|
90
|
+
else
|
|
91
|
+
read_word
|
|
92
|
+
end
|
|
93
|
+
```
|
|
94
|
+
|
|
95
|
+
## Bug That Was Fixed
|
|
96
|
+
|
|
97
|
+
Previously, the gem incorrectly treated all hyphens followed by non-whitespace as negation:
|
|
98
|
+
|
|
99
|
+
- **Old (incorrect):** `some-outfit` was parsed as `some AND NOT outfit`
|
|
100
|
+
- **New (correct):** `some-outfit` is parsed as `some AND outfit`
|
|
101
|
+
|
|
102
|
+
This matches Gmail's actual search behavior where hyphenated terms find messages containing both parts of the hyphenated word.
|
data/slop/GREEDY_VS_NON_GREEDY_TOKENIZATION.md
ADDED
|
@@ -0,0 +1,84 @@
|
|
|
1
|
+
# Greedy vs Non-Greedy Operator Value Tokenization
|
|
2
|
+
|
|
3
|
+
## Summary
|
|
4
|
+
|
|
5
|
+
This document explains the fix in `bugfix-tokens` that changes how operator values are parsed from **greedy** (consuming multiple barewords) to **non-greedy** (single token only), matching Gmail's actual search behavior.
|
|
6
|
+
|
|
7
|
+
## The Problem
|
|
8
|
+
|
|
9
|
+
The previous implementation used greedy tokenization for operator values. When parsing `label:Cora/Google Drive`, the parser would consume all subsequent barewords (`Cora/Google`, `Drive`) into the operator's value until hitting another operator or special token.
|
|
10
|
+
|
|
11
|
+
**Previous behavior:**
|
|
12
|
+
```
|
|
13
|
+
label:Cora/Google Drive label:Notes
|
|
14
|
+
→ Operator(label, "Cora/Google Drive"), Operator(label, "Notes")
|
|
15
|
+
```
|
|
16
|
+
|
|
17
|
+
**Gmail's actual behavior:**
|
|
18
|
+
```
|
|
19
|
+
label:Cora/Google Drive label:Notes
|
|
20
|
+
→ Operator(label, "Cora/Google"), StringToken("Drive"), Operator(label, "Notes")
|
|
21
|
+
```
|
|
22
|
+
|
|
23
|
+
## Gmail's Actual Behavior
|
|
24
|
+
|
|
25
|
+
In Gmail search, barewords after an operator are treated as **separate search terms**, not as part of the operator's value. To include multiple words in an operator value, you must explicitly quote them:
|
|
26
|
+
|
|
27
|
+
| Input | Gmail Interpretation |
|
|
28
|
+
|-------|---------------------|
|
|
29
|
+
| `subject:urgent meeting` | Subject contains "urgent" AND body contains "meeting" |
|
|
30
|
+
| `subject:"urgent meeting"` | Subject contains "urgent meeting" |
|
|
31
|
+
| `in:anywhere movie` | Search "movie" in all mail locations |
|
|
32
|
+
| `label:test one two` | Label is "test" AND body contains "one" AND "two" |
|
|
33
|
+
|
|
34
|
+
## The Fix
|
|
35
|
+
|
|
36
|
+
Changed `parse_operator_value` in `lib/gmail_search_syntax/parser.rb` to only consume a single token for bareword values (`:word`, `:email`, `:number`, `:date`, `:relative_time`).
|
|
37
|
+
|
|
38
|
+
### Before (greedy)
|
|
39
|
+
|
|
40
|
+
```ruby
|
|
41
|
+
when :word, :email, :number, :date, :relative_time
|
|
42
|
+
values = []
|
|
43
|
+
types = []
|
|
44
|
+
|
|
45
|
+
# Collect barewords until operator or special token
|
|
46
|
+
while !eof? && is_bareword_token?
|
|
47
|
+
if current_token.type == :word && peek_token&.type == :colon
|
|
48
|
+
break
|
|
49
|
+
end
|
|
50
|
+
values << current_token.value
|
|
51
|
+
types << current_token.type
|
|
52
|
+
advance
|
|
53
|
+
end
|
|
54
|
+
|
|
55
|
+
# Join multiple values as string
|
|
56
|
+
values.map(&:to_s).join(" ")
|
|
57
|
+
```
|
|
58
|
+
|
|
59
|
+
### After (non-greedy)
|
|
60
|
+
|
|
61
|
+
```ruby
|
|
62
|
+
when :word, :email, :number, :date, :relative_time
|
|
63
|
+
# Take only a single token as the operator value.
|
|
64
|
+
# Multi-word values must be explicitly quoted: from:"john smith"
|
|
65
|
+
value = current_token.value
|
|
66
|
+
advance
|
|
67
|
+
value.is_a?(Integer) ? value : value.to_s
|
|
68
|
+
```
|
|
69
|
+
|
|
70
|
+
## Test Changes
|
|
71
|
+
|
|
72
|
+
Updated tests to reflect the corrected behavior:
|
|
73
|
+
|
|
74
|
+
| Test | Previous Expected | Now Expected |
|
|
75
|
+
|------|-------------------|--------------|
|
|
76
|
+
| `in:anywhere movie` | `Operator("in", "anywhere movie")` | `Operator("in", "anywhere")`, `StringToken("movie")` |
|
|
77
|
+
| `subject:urgent meeting important` | `Operator("subject", "urgent meeting important")` | `Operator("subject", "urgent")`, `StringToken("meeting")`, `StringToken("important")` |
|
|
78
|
+
| `label:test one two three label:another` | 2 operands | 5 operands |
|
|
79
|
+
|
|
80
|
+
## Implications
|
|
81
|
+
|
|
82
|
+
1. **Breaking change** for consumers relying on greedy behavior
|
|
83
|
+
2. Users must now quote multi-word operator values explicitly
|
|
84
|
+
3. More accurate translation to SQL/other query languages since the semantics now match Gmail
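
To make the difference concrete, the following stand-alone sketch contrasts the two strategies on the example query from this document. It does not use the gem: `parse_terms` is a hypothetical helper, the query is split on whitespace only, any word containing a colon is treated as an operator, and quoting, `OR`, and other special tokens are ignored.

```ruby
# Hypothetical sketch contrasting greedy and non-greedy operator values.
# Not the gem's parser: terms are plain arrays, input is split on whitespace.
def parse_terms(query, greedy:)
  words = query.split
  terms = []

  until words.empty?
    word = words.shift
    if word.include?(":")
      name, value = word.split(":", 2)
      if greedy
        # Old behavior: keep consuming barewords until the next operator
        parts = [value]
        parts << words.shift while !words.empty? && !words.first.include?(":")
        value = parts.join(" ")
      end
      terms << [:operator, name, value]
    else
      # Any remaining bareword is its own search term
      terms << [:string, word]
    end
  end

  terms
end

query = "label:Cora/Google Drive label:Notes"
p parse_terms(query, greedy: true)
p parse_terms(query, greedy: false)
```

With `greedy: true` the sketch produces two operator terms, and with `greedy: false` it produces an operator, a separate string term for `Drive`, and a second operator, matching the before and after results listed above.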
|
|
@@ -142,6 +142,91 @@ class GmailSearchSyntaxTest < Minitest::Test
|
|
|
142
142
|
assert_equal "movie", ast.operands[1].child.value
|
|
143
143
|
end
|
|
144
144
|
|
|
145
|
+
# Gmail behavior: embedded hyphens (no preceding whitespace) are word separators, not negation
|
|
146
|
+
# "Coxlee-Gammage" → Coxlee AND Gammage (both tokens highlighted)
|
|
147
|
+
# "Coxlee -Gammage" → Coxlee AND NOT Gammage (space+hyphen = negation)
|
|
148
|
+
|
|
149
|
+
def test_embedded_hyphen_is_word_separator
|
|
150
|
+
# Gmail behavior: hyphen without preceding whitespace separates words, not negation
|
|
151
|
+
ast = GmailSearchSyntax.parse!("some-outfit")
|
|
152
|
+
assert_instance_of And, ast
|
|
153
|
+
|
|
154
|
+
assert_equal 2, ast.operands.length
|
|
155
|
+
assert_instance_of StringToken, ast.operands[0]
|
|
156
|
+
assert_equal "some", ast.operands[0].value
|
|
157
|
+
|
|
158
|
+
assert_instance_of StringToken, ast.operands[1]
|
|
159
|
+
assert_equal "outfit", ast.operands[1].value
|
|
160
|
+
end
|
|
161
|
+
|
|
162
|
+
def test_embedded_hyphen_multiple
|
|
163
|
+
# Multiple hyphens: a-b-c → a AND b AND c
|
|
164
|
+
ast = GmailSearchSyntax.parse!("a-b-c")
|
|
165
|
+
assert_instance_of And, ast
|
|
166
|
+
|
|
167
|
+
assert_equal 3, ast.operands.length
|
|
168
|
+
assert_equal "a", ast.operands[0].value
|
|
169
|
+
assert_equal "b", ast.operands[1].value
|
|
170
|
+
assert_equal "c", ast.operands[2].value
|
|
171
|
+
end
|
|
172
|
+
|
|
173
|
+
def test_embedded_hyphen_real_name
|
|
174
|
+
# Real-world case: hyphenated names
|
|
175
|
+
ast = GmailSearchSyntax.parse!("Coxlee-Gammage")
|
|
176
|
+
assert_instance_of And, ast
|
|
177
|
+
|
|
178
|
+
assert_equal 2, ast.operands.length
|
|
179
|
+
assert_equal "Coxlee", ast.operands[0].value
|
|
180
|
+
assert_equal "Gammage", ast.operands[1].value
|
|
181
|
+
end
|
|
182
|
+
|
|
183
|
+
def test_space_hyphen_is_negation
|
|
184
|
+
# Space + hyphen = negation (unchanged behavior)
|
|
185
|
+
ast = GmailSearchSyntax.parse!("cats -dogs")
|
|
186
|
+
assert_instance_of And, ast
|
|
187
|
+
|
|
188
|
+
assert_equal 2, ast.operands.length
|
|
189
|
+
assert_instance_of StringToken, ast.operands[0]
|
|
190
|
+
assert_equal "cats", ast.operands[0].value
|
|
191
|
+
|
|
192
|
+
assert_instance_of Not, ast.operands[1]
|
|
193
|
+
assert_equal "dogs", ast.operands[1].child.value
|
|
194
|
+
end
|
|
195
|
+
|
|
196
|
+
def test_embedded_hyphen_combined_with_negation
|
|
197
|
+
# Mixed: embedded hyphen + space-preceded negation
|
|
198
|
+
ast = GmailSearchSyntax.parse!("some-outfit -dogs")
|
|
199
|
+
assert_instance_of And, ast
|
|
200
|
+
|
|
201
|
+
assert_equal 3, ast.operands.length
|
|
202
|
+
assert_equal "some", ast.operands[0].value
|
|
203
|
+
assert_equal "outfit", ast.operands[1].value
|
|
204
|
+
assert_instance_of Not, ast.operands[2]
|
|
205
|
+
assert_equal "dogs", ast.operands[2].child.value
|
|
206
|
+
end
|
|
207
|
+
|
|
208
|
+
def test_negation_at_start_of_input
|
|
209
|
+
# Negation at start of input still works
|
|
210
|
+
ast = GmailSearchSyntax.parse!("-spam")
|
|
211
|
+
assert_instance_of Not, ast
|
|
212
|
+
assert_equal "spam", ast.child.value
|
|
213
|
+
end
|
|
214
|
+
|
|
215
|
+
def test_embedded_hyphen_with_operator
|
|
216
|
+
# Embedded hyphen in operator context
|
|
217
|
+
ast = GmailSearchSyntax.parse!("from:mary-jane")
|
|
218
|
+
assert_instance_of And, ast
|
|
219
|
+
|
|
220
|
+
# "from:mary" becomes operator, "-jane" is embedded hyphen → "jane" is separate word
|
|
221
|
+
assert_equal 2, ast.operands.length
|
|
222
|
+
assert_instance_of Operator, ast.operands[0]
|
|
223
|
+
assert_equal "from", ast.operands[0].name
|
|
224
|
+
assert_equal "mary", ast.operands[0].value
|
|
225
|
+
|
|
226
|
+
assert_instance_of StringToken, ast.operands[1]
|
|
227
|
+
assert_equal "jane", ast.operands[1].value
|
|
228
|
+
end
|
|
229
|
+
|
|
145
230
|
def test_around_operator
|
|
146
231
|
ast = GmailSearchSyntax.parse!("holiday AROUND 10 vacation")
|
|
147
232
|
assert_instance_of Around, ast
|
|
@@ -200,10 +285,13 @@ class GmailSearchSyntaxTest < Minitest::Test
|
|
|
200
285
|
end
|
|
201
286
|
|
|
202
287
|
def test_in_anywhere
|
|
203
|
-
#
|
|
204
|
-
#
|
|
288
|
+
# Gmail treats barewords after operator as separate search terms
|
|
289
|
+
# in:anywhere movie → search for "movie" in all mail locations
|
|
205
290
|
ast = GmailSearchSyntax.parse!("in:anywhere movie")
|
|
206
|
-
|
|
291
|
+
assert_instance_of And, ast
|
|
292
|
+
assert_equal 2, ast.operands.length
|
|
293
|
+
assert_operator({name: "in", value: "anywhere"}, ast.operands[0])
|
|
294
|
+
assert_string_token({value: "movie"}, ast.operands[1])
|
|
207
295
|
end
|
|
208
296
|
|
|
209
297
|
def test_is_starred
|
|
@@ -660,92 +748,103 @@ class GmailSearchSyntaxTest < Minitest::Test
|
|
|
660
748
|
assert_equal 'project\\plan', ast.operands[1].value
|
|
661
749
|
end
|
|
662
750
|
|
|
663
|
-
# Gmail behavior: barewords after operator values
|
|
664
|
-
#
|
|
665
|
-
# We now implement this Gmail-compatible behavior.
|
|
751
|
+
# Gmail behavior: barewords after operator values are treated as separate search terms.
|
|
752
|
+
# Multi-word operator values must be explicitly quoted: label:"Cora/Google Drive"
|
|
666
753
|
|
|
667
754
|
def test_label_with_space_separated_value_gmail_behavior
|
|
668
|
-
# Gmail
|
|
669
|
-
#
|
|
755
|
+
# Gmail treats barewords as separate search terms
|
|
756
|
+
# To search for label "Cora/Google Drive", you must quote it: label:"Cora/Google Drive"
|
|
670
757
|
ast = GmailSearchSyntax.parse!("label:Cora/Google Drive label:Notes")
|
|
671
758
|
assert_instance_of And, ast
|
|
672
|
-
assert_equal
|
|
759
|
+
assert_equal 3, ast.operands.length
|
|
673
760
|
|
|
674
|
-
#
|
|
761
|
+
# First operator takes only the first token
|
|
675
762
|
assert_instance_of Operator, ast.operands[0]
|
|
676
763
|
assert_equal "label", ast.operands[0].name
|
|
677
|
-
assert_equal "Cora/Google
|
|
764
|
+
assert_equal "Cora/Google", ast.operands[0].value
|
|
765
|
+
|
|
766
|
+
# "Drive" becomes a separate search term
|
|
767
|
+
assert_instance_of StringToken, ast.operands[1]
|
|
768
|
+
assert_equal "Drive", ast.operands[1].value
|
|
678
769
|
|
|
679
770
|
# Second operator parsed correctly
|
|
680
|
-
assert_instance_of Operator, ast.operands[
|
|
681
|
-
assert_equal "label", ast.operands[
|
|
682
|
-
assert_equal "Notes", ast.operands[
|
|
771
|
+
assert_instance_of Operator, ast.operands[2]
|
|
772
|
+
assert_equal "label", ast.operands[2].name
|
|
773
|
+
assert_equal "Notes", ast.operands[2].value
|
|
683
774
|
end
|
|
684
775
|
|
|
685
776
|
def test_subject_with_barewords_gmail_behavior
|
|
686
|
-
# Gmail
|
|
687
|
-
#
|
|
777
|
+
# Gmail treats barewords as separate search terms
|
|
778
|
+
# subject:urgent meeting important → subject contains "urgent" AND body contains "meeting" AND "important"
|
|
688
779
|
ast = GmailSearchSyntax.parse!("subject:urgent meeting important")
|
|
689
|
-
assert_instance_of
|
|
780
|
+
assert_instance_of And, ast
|
|
781
|
+
assert_equal 3, ast.operands.length
|
|
690
782
|
|
|
691
|
-
|
|
692
|
-
|
|
783
|
+
assert_operator({name: "subject", value: "urgent"}, ast.operands[0])
|
|
784
|
+
assert_string_token({value: "meeting"}, ast.operands[1])
|
|
785
|
+
assert_string_token({value: "important"}, ast.operands[2])
|
|
693
786
|
end
|
|
694
787
|
|
|
695
788
|
def test_multiple_barewords_between_operators_gmail_behavior
|
|
696
|
-
# Gmail
|
|
697
|
-
#
|
|
789
|
+
# Gmail treats each bareword as a separate search term
|
|
790
|
+
# label:test one two three label:another → 5 terms
|
|
698
791
|
ast = GmailSearchSyntax.parse!("label:test one two three label:another")
|
|
699
792
|
assert_instance_of And, ast
|
|
700
|
-
assert_equal
|
|
793
|
+
assert_equal 5, ast.operands.length
|
|
701
794
|
|
|
702
|
-
|
|
703
|
-
|
|
704
|
-
|
|
705
|
-
|
|
706
|
-
|
|
707
|
-
assert_equal "label", ast.operands[1].name
|
|
708
|
-
assert_equal "another", ast.operands[1].value
|
|
795
|
+
assert_operator({name: "label", value: "test"}, ast.operands[0])
|
|
796
|
+
assert_string_token({value: "one"}, ast.operands[1])
|
|
797
|
+
assert_string_token({value: "two"}, ast.operands[2])
|
|
798
|
+
assert_string_token({value: "three"}, ast.operands[3])
|
|
799
|
+
assert_operator({name: "label", value: "another"}, ast.operands[4])
|
|
709
800
|
end
|
|
710
801
|
|
|
711
802
|
def test_barewords_stop_at_special_operators
|
|
712
|
-
#
|
|
803
|
+
# Barewords are separate terms, OR separates two groups
|
|
713
804
|
ast = GmailSearchSyntax.parse!("subject:urgent meeting OR subject:important call")
|
|
714
805
|
assert_instance_of Or, ast
|
|
715
806
|
assert_equal 2, ast.operands.length
|
|
716
807
|
|
|
717
|
-
|
|
718
|
-
|
|
719
|
-
assert_equal
|
|
808
|
+
# Left side: subject:urgent AND meeting (implicit AND)
|
|
809
|
+
assert_instance_of And, ast.operands[0]
|
|
810
|
+
assert_equal 2, ast.operands[0].operands.length
|
|
811
|
+
assert_operator({name: "subject", value: "urgent"}, ast.operands[0].operands[0])
|
|
812
|
+
assert_string_token({value: "meeting"}, ast.operands[0].operands[1])
|
|
720
813
|
|
|
721
|
-
|
|
722
|
-
|
|
723
|
-
assert_equal
|
|
814
|
+
# Right side: subject:important AND call (implicit AND)
|
|
815
|
+
assert_instance_of And, ast.operands[1]
|
|
816
|
+
assert_equal 2, ast.operands[1].operands.length
|
|
817
|
+
assert_operator({name: "subject", value: "important"}, ast.operands[1].operands[0])
|
|
818
|
+
assert_string_token({value: "call"}, ast.operands[1].operands[1])
|
|
724
819
|
end
|
|
725
820
|
|
|
726
821
|
def test_barewords_with_mixed_tokens
|
|
727
|
-
# Numbers, dates, emails
|
|
822
|
+
# Numbers, dates, emails are all separate search terms
|
|
728
823
|
ast = GmailSearchSyntax.parse!("subject:meeting 2024 Q1 review")
|
|
729
|
-
assert_instance_of
|
|
730
|
-
assert_equal
|
|
731
|
-
|
|
824
|
+
assert_instance_of And, ast
|
|
825
|
+
assert_equal 4, ast.operands.length
|
|
826
|
+
|
|
827
|
+
assert_operator({name: "subject", value: "meeting"}, ast.operands[0])
|
|
828
|
+
assert_string_token({value: 2024}, ast.operands[1])
|
|
829
|
+
assert_string_token({value: "Q1"}, ast.operands[2])
|
|
830
|
+
assert_string_token({value: "review"}, ast.operands[3])
|
|
732
831
|
end
|
|
733
832
|
|
|
734
833
|
def test_specific_gmail_example_cora_google_drive
|
|
735
|
-
#
|
|
736
|
-
#
|
|
834
|
+
# label:Cora/Google Drive label:Notes
|
|
835
|
+
# "Drive" is a separate search term - to include it in the label, quote it:
|
|
836
|
+
# label:"Cora/Google Drive" label:Notes
|
|
737
837
|
ast = GmailSearchSyntax.parse!("label:Cora/Google Drive label:Notes")
|
|
738
838
|
assert_instance_of And, ast
|
|
739
|
-
assert_equal
|
|
839
|
+
assert_equal 3, ast.operands.length
|
|
740
840
|
|
|
741
|
-
# First operator: label with "Cora/Google
|
|
742
|
-
|
|
743
|
-
|
|
744
|
-
|
|
841
|
+
# First operator: label with "Cora/Google" only
|
|
842
|
+
assert_operator({name: "label", value: "Cora/Google"}, ast.operands[0])
|
|
843
|
+
|
|
844
|
+
# "Drive" becomes a separate search term
|
|
845
|
+
assert_string_token({value: "Drive"}, ast.operands[1])
|
|
745
846
|
|
|
746
847
|
# Second operator: label with "Notes"
|
|
747
|
-
|
|
748
|
-
assert_equal "label", ast.operands[1].name
|
|
749
|
-
assert_equal "Notes", ast.operands[1].value
|
|
848
|
+
assert_operator({name: "label", value: "Notes"}, ast.operands[2])
|
|
750
849
|
end
|
|
751
850
|
end
|
data/test/tokenizer_test.rb
CHANGED
|
@@ -94,6 +94,64 @@ class TokenizerTest < Minitest::Test
|
|
|
94
94
|
assert_token_stream(expected, tokens)
|
|
95
95
|
end
|
|
96
96
|
|
|
97
|
+
def test_tokenize_embedded_hyphen
|
|
98
|
+
# Gmail behavior: embedded hyphen (no preceding whitespace) is a word separator, not negation
|
|
99
|
+
tokens = tokenize("some-outfit")
|
|
100
|
+
expected = [
|
|
101
|
+
{type: :word, value: "some"},
|
|
102
|
+
{type: :word, value: "outfit"},
|
|
103
|
+
{type: :eof}
|
|
104
|
+
]
|
|
105
|
+
assert_token_stream(expected, tokens)
|
|
106
|
+
end
|
|
107
|
+
|
|
108
|
+
def test_tokenize_multiple_embedded_hyphens
|
|
109
|
+
# Multiple hyphens: a-b-c → three separate words
|
|
110
|
+
tokens = tokenize("a-b-c")
|
|
111
|
+
expected = [
|
|
112
|
+
{type: :word, value: "a"},
|
|
113
|
+
{type: :word, value: "b"},
|
|
114
|
+
{type: :word, value: "c"},
|
|
115
|
+
{type: :eof}
|
|
116
|
+
]
|
|
117
|
+
assert_token_stream(expected, tokens)
|
|
118
|
+
end
|
|
119
|
+
|
|
120
|
+
def test_tokenize_hyphenated_name
|
|
121
|
+
# Real-world case: hyphenated names like "Coxlee-Gammage"
|
|
122
|
+
tokens = tokenize("Coxlee-Gammage")
|
|
123
|
+
expected = [
|
|
124
|
+
{type: :word, value: "Coxlee"},
|
|
125
|
+
{type: :word, value: "Gammage"},
|
|
126
|
+
{type: :eof}
|
|
127
|
+
]
|
|
128
|
+
assert_token_stream(expected, tokens)
|
|
129
|
+
end
|
|
130
|
+
|
|
131
|
+
def test_tokenize_negation_at_start
|
|
132
|
+
# Negation at start of input
|
|
133
|
+
tokens = tokenize("-spam")
|
|
134
|
+
expected = [
|
|
135
|
+
{type: :minus},
|
|
136
|
+
{type: :word, value: "spam"},
|
|
137
|
+
{type: :eof}
|
|
138
|
+
]
|
|
139
|
+
assert_token_stream(expected, tokens)
|
|
140
|
+
end
|
|
141
|
+
|
|
142
|
+
def test_tokenize_embedded_hyphen_vs_negation
|
|
143
|
+
# Mixed: embedded hyphen + space-preceded negation
|
|
144
|
+
tokens = tokenize("some-outfit -dogs")
|
|
145
|
+
expected = [
|
|
146
|
+
{type: :word, value: "some"},
|
|
147
|
+
{type: :word, value: "outfit"},
|
|
148
|
+
{type: :minus},
|
|
149
|
+
{type: :word, value: "dogs"},
|
|
150
|
+
{type: :eof}
|
|
151
|
+
]
|
|
152
|
+
assert_token_stream(expected, tokens)
|
|
153
|
+
end
|
|
154
|
+
|
|
97
155
|
def test_tokenize_around
|
|
98
156
|
tokens = tokenize("holiday AROUND 10 vacation")
|
|
99
157
|
expected = [
|
metadata
CHANGED
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
|
2
2
|
name: gmail_search_syntax
|
|
3
3
|
version: !ruby/object:Gem::Version
|
|
4
|
-
version: 0.1.
|
|
4
|
+
version: 0.1.4
|
|
5
5
|
platform: ruby
|
|
6
6
|
authors:
|
|
7
7
|
- me@julik.nl
|
|
@@ -69,10 +69,9 @@ executables: []
|
|
|
69
69
|
extensions: []
|
|
70
70
|
extra_rdoc_files: []
|
|
71
71
|
files:
|
|
72
|
-
-
|
|
73
|
-
-
|
|
74
|
-
-
|
|
75
|
-
- IMPLEMENTATION_NOTES.md
|
|
72
|
+
- ".github/workflows/ci.yml"
|
|
73
|
+
- AGENTS.md
|
|
74
|
+
- Gemfile
|
|
76
75
|
- README.md
|
|
77
76
|
- Rakefile
|
|
78
77
|
- SCHEMA.md
|
|
@@ -84,6 +83,7 @@ files:
|
|
|
84
83
|
- examples/postgres_vs_sqlite.rb
|
|
85
84
|
- examples/sql_query.rb
|
|
86
85
|
- examples/text_vs_substring_demo.rb
|
|
86
|
+
- gmail_search_syntax.gemspec
|
|
87
87
|
- lib/GMAIL_SEARCH_OPERATORS.md
|
|
88
88
|
- lib/gmail_search_syntax.rb
|
|
89
89
|
- lib/gmail_search_syntax/ast.rb
|
|
@@ -91,6 +91,12 @@ files:
|
|
|
91
91
|
- lib/gmail_search_syntax/sql_visitor.rb
|
|
92
92
|
- lib/gmail_search_syntax/tokenizer.rb
|
|
93
93
|
- lib/gmail_search_syntax/version.rb
|
|
94
|
+
- slop/ARCHITECTURE.md
|
|
95
|
+
- slop/EMBEDDED_HYPHENS.md
|
|
96
|
+
- slop/GMAIL_BEHAVIOR_COMPARISON.md
|
|
97
|
+
- slop/GMAIL_COMPATIBILITY_COMPLETE.md
|
|
98
|
+
- slop/GREEDY_VS_NON_GREEDY_TOKENIZATION.md
|
|
99
|
+
- slop/IMPLEMENTATION_NOTES.md
|
|
94
100
|
- test/gmail_search_syntax_test.rb
|
|
95
101
|
- test/integration_test.rb
|
|
96
102
|
- test/postgres_visitor_test.rb
|
|
File without changes
|
|