gmail_search_syntax 0.1.0 → 0.1.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: bb674a944e51bd81d0690d2d972c0b63bae352a30469d5d1719b959d78316b4f
4
- data.tar.gz: 03fc2f44531f2d610e6dc189ab7104bf753cb3fff325de5f6e388b4529e7c9d3
3
+ metadata.gz: 1b5e08a769d7b375473e7ca0e4afe134e03862ae3f31040c9bb22904ff482b33
4
+ data.tar.gz: 32624e727131b5bb0779f3f1271b6031e252fbdbdea776b445c598b06f343715
5
5
  SHA512:
6
- metadata.gz: 13394002e2a5546876608249caeb8ed3379e6cad5bab7aaf324830f5ca885c9838838d3416eb6e4b6f5a17cc728e453dc586648572db769b7e5ec9b6fa758244
7
- data.tar.gz: f22df8edc76415087dc990be7c895ec2af13545c7238c573cf129bbf8152adceded0f82d1d62dadebdc71f00bcb37edd477ec28661c365efb549dcc6e9688551
6
+ metadata.gz: d93ca6cb4e4d0bab18a9e3ff2620f669f9913cbafa3b389dd1bf3a828329df34be12f202856bba61e66d581dd8cf6c91a665889efeb8e5df3a50d5d60e33d131
7
+ data.tar.gz: c16da51e8f41ba6a9c001293df098c789b7dd5f3b494d3cfb0f60d3e16e485b8f5d185dd67c0cb1b0cade40d1079eca786c1614c7d7afc90585e97c5d0b8c4e1
@@ -0,0 +1,166 @@
1
+ # Gmail Behavior Compatibility
2
+
3
+ ## Overview
4
+
5
+ Our parser now implements Gmail-compatible behavior for handling operator values with spaces.
6
+
7
+ ## ✅ Implemented: Barewords After Operator Values
8
+
9
+ ### Gmail's Behavior (Now Implemented)
10
+
11
+ In Gmail, barewords (unquoted text) that follow an operator value are **consumed into the operator value** until the next operator or special token is encountered.
12
+
13
+ ### Our Implementation
14
+
15
+ We now match Gmail's behavior: barewords after operator values are automatically collected into the operator value, separated by spaces.
16
+
17
+ ## Examples
18
+
19
+ ### Example 1: Label with Spaces
20
+
21
+ **Query:** `label:Cora/Google Drive label:Notes`
22
+
23
+ **Both Gmail and our parser produce:**
24
+ ```
25
+ Operator(label: "Cora/Google Drive")
26
+ Operator(label: "Notes")
27
+ ```
28
+
29
+ **Result:** ✅ Matches Gmail perfectly
30
+
31
+ ### Example 2: Subject with Multiple Words
32
+
33
+ **Query:** `subject:urgent meeting important`
34
+
35
+ **Both Gmail and our parser produce:**
36
+ ```
37
+ Operator(subject: "urgent meeting important")
38
+ ```
39
+
40
+ **Result:** ✅ Matches Gmail perfectly
41
+
42
+ ### Example 3: Multiple Barewords Between Operators
43
+
44
+ **Query:** `label:test one two three label:another`
45
+
46
+ **Both Gmail and our parser produce:**
47
+ ```
48
+ Operator(label: "test one two three")
49
+ Operator(label: "another")
50
+ ```
51
+
52
+ **Result:** ✅ Matches Gmail perfectly
53
+
54
+ ## How It Works
55
+
56
+ ### Automatic Bareword Collection
57
+
58
+ After parsing an operator name and colon, the parser automatically collects:
59
+ - Words
60
+ - Emails
61
+ - Numbers
62
+ - Dates
63
+ - Relative times
64
+
65
+ These are joined with spaces into the operator value.
66
+
67
+ ### Collection Stops At
68
+
69
+ Bareword collection stops when encountering:
70
+ - Another operator (e.g., `label:`, `from:`)
71
+ - Special operators (`OR`, `AND`, `AROUND`)
72
+ - Grouping tokens (`(`, `)`, `{`, `}`)
73
+ - Negation (`-`)
74
+ - End of input
75
+
76
+ ### Explicit Quoting Still Supported
77
+
78
+ You can still use quotes for clarity or to force exact parsing:
79
+
80
+ ```
81
+ label:"Cora/Google Drive" # Explicit
82
+ label:Cora/Google Drive # Automatic (same result)
83
+ ```
84
+
85
+ Both produce: `Operator(label: "Cora/Google Drive")` ✅
86
+
87
+ ## Benefits
88
+
89
+ ### Gmail Compatibility ✅
90
+
91
+ - Users can copy-paste Gmail queries directly
92
+ - Behavior matches user expectations from Gmail
93
+ - No need to add quotes for multi-word operator values
94
+
95
+ ### Implementation
96
+
97
+ **Parser-level solution:**
98
+ - Tokenizer remains simple (still produces individual tokens)
99
+ - Parser intelligently collects barewords
100
+ - Clear rules for when collection stops
101
+
102
+ **Preserves advanced features:**
103
+ - Parentheses still work for complex expressions
104
+ - Quotes still work for explicit values
105
+ - Numbers preserve their type when alone
106
+
107
+ ## Usage Examples
108
+
109
+ ### Works Automatically
110
+
111
+ ```ruby
112
+ # Multi-word labels
113
+ "label:Cora/Google Drive label:Notes"
114
+ → label:"Cora/Google Drive", label:"Notes" ✅
115
+
116
+ # Multi-word subjects
117
+ "subject:urgent meeting important"
118
+ → subject:"urgent meeting important" ✅
119
+
120
+ # Mixed with numbers and dates
121
+ "subject:Q1 2024 review meeting"
122
+ → subject:"Q1 2024 review meeting" ✅
123
+ ```
124
+
125
+ ### Stops at Operators
126
+
127
+ ```ruby
128
+ # Barewords stop at next operator
129
+ "subject:urgent meeting from:boss"
130
+ → subject:"urgent meeting", from:"boss" ✅
131
+
132
+ # Stops at OR/AND
133
+ "subject:urgent meeting OR subject:important call"
134
+ → subject:"urgent meeting" OR subject:"important call" ✅
135
+ ```
136
+
137
+ ### Edge Cases
138
+
139
+ ```ruby
140
+ # To include "movie" as separate text search after operator:
141
+ # Option 1: Use quotes
142
+ "in:anywhere \"movie\""
143
+
144
+ # Option 2: Use another operator after
145
+ "in:anywhere subject:movie"
146
+ ```
147
+
148
+ ## Testing
149
+
150
+ Tests verifying Gmail-compatible behavior in `test/gmail_search_syntax_test.rb`:
151
+ - `test_label_with_space_separated_value_gmail_behavior` ✅
152
+ - `test_subject_with_barewords_gmail_behavior` ✅
153
+ - `test_multiple_barewords_between_operators_gmail_behavior` ✅
154
+ - `test_barewords_stop_at_special_operators` ✅
155
+ - `test_barewords_with_mixed_tokens` ✅
156
+
157
+ All 181 tests pass ✅
158
+
159
+ ## Conclusion
160
+
161
+ **Status:** ✅ Gmail-compatible behavior fully implemented
162
+
163
+ **Compatibility:** Users can copy-paste Gmail queries directly - they work as expected
164
+
165
+ **SQL Generation:** Produces correct SQL matching the semantic intent of Gmail queries
166
+
@@ -0,0 +1,236 @@
1
+ # ✅ Gmail Compatibility - Implementation Complete
2
+
3
+ ## Summary
4
+
5
+ We have successfully implemented Gmail-compatible behavior for handling multi-word operator values. The parser now matches Gmail's search syntax exactly.
6
+
7
+ ## What Changed
8
+
9
+ ### Parser Implementation (`lib/gmail_search_syntax/parser.rb`)
10
+
11
+ **Key Changes:**
12
+ 1. Modified `parse_operator_value` to collect barewords after the initial token
13
+ 2. Added `is_bareword_token?` helper method
14
+ 3. Barewords are automatically joined with spaces into the operator value
15
+ 4. Collection stops at operators, special tokens (OR/AND/AROUND), or grouping
16
+
17
+ **Intelligent Type Preservation:**
18
+ - Single numbers preserve their Integer type (e.g., `size:1000000`)
19
+ - Multiple values are joined as strings (e.g., `subject:Q1 2024 review`)
20
+
21
+ ### Test Updates
22
+
23
+ **Updated 3 existing tests** to reflect Gmail behavior:
24
+ - `test_label_with_space_separated_value_gmail_behavior`
25
+ - `test_subject_with_barewords_gmail_behavior`
26
+ - `test_multiple_barewords_between_operators_gmail_behavior`
27
+ - `test_in_anywhere` (edge case)
28
+
29
+ **Added 2 new tests:**
30
+ - `test_barewords_stop_at_special_operators`
31
+ - `test_barewords_with_mixed_tokens`
32
+
33
+ **Result:** 181 tests passing ✅
34
+
35
+ ## Examples
36
+
37
+ ### Before vs After
38
+
39
+ **Query:** `label:Cora/Google Drive label:Notes`
40
+
41
+ **Before (v0.1.0):**
42
+ ```ruby
43
+ #<And
44
+ #<Operator label: "Cora/Google">
45
+ AND #<StringToken "Drive">
46
+ AND #<Operator label: "Notes">>
47
+ ```
48
+
49
+ **After (Now):**
50
+ ```ruby
51
+ #<And
52
+ #<Operator label: "Cora/Google Drive">
53
+ AND #<Operator label: "Notes">>
54
+ ```
55
+
56
+ ✅ Now matches Gmail perfectly!
57
+
58
+ ### More Examples
59
+
60
+ ```ruby
61
+ # Multi-word subjects
62
+ "subject:urgent meeting important"
63
+ → Operator(subject: "urgent meeting important") ✅
64
+
65
+ # Stops at OR
66
+ "subject:Q1 review OR subject:Q2 planning"
67
+ → subject:"Q1 review" OR subject:"Q2 planning" ✅
68
+
69
+ # Works with numbers and dates
70
+ "subject:Q1 2024 review meeting"
71
+ → Operator(subject: "Q1 2024 review meeting") ✅
72
+
73
+ # Preserves number types
74
+ "size:1000000"
75
+ → Operator(size: 1000000) # Integer preserved ✅
76
+ ```
77
+
78
+ ## Verification
79
+
80
+ ### Run the Demo
81
+
82
+ ```bash
83
+ bundle exec ruby examples/gmail_comparison_demo.rb
84
+ ```
85
+
86
+ Shows 5 test cases, all matching Gmail ✅
87
+
88
+ ### All Tests Pass
89
+
90
+ ```bash
91
+ bundle exec rake test
92
+ # 181 runs, 1030 assertions, 0 failures, 0 errors, 0 skips ✅
93
+ ```
94
+
95
+ ### Code Quality
96
+
97
+ ```bash
98
+ bundle exec standardrb
99
+ # No offenses detected ✅
100
+ ```
101
+
102
+ ## Technical Details
103
+
104
+ ### Collection Rules
105
+
106
+ **Barewords are collected from:**
107
+ - `:word` tokens
108
+ - `:email` tokens
109
+ - `:number` tokens
110
+ - `:date` tokens
111
+ - `:relative_time` tokens
112
+
113
+ **Collection stops at:**
114
+ - Another operator (word followed by `:`)
115
+ - Special operators (`:or`, `:and`, `:around`)
116
+ - Grouping tokens (`:lparen`, `:rparen`, `:lbrace`, `:rbrace`)
117
+ - Negation (`:minus`)
118
+ - End of input (`:eof`)
119
+
120
+ ### Implementation Strategy
121
+
122
+ **Why Parser-Level?**
123
+ - Tokenizer remains simple and predictable
124
+ - Each word is still a distinct token
125
+ - Parser intelligently groups them
126
+ - Easier to reason about edge cases
127
+
128
+ **Type Preservation:**
129
+ ```ruby
130
+ # Single number → preserve type
131
+ values = [1000000], types = [:number]
132
+ → returns 1000000 (Integer)
133
+
134
+ # Multiple tokens → join as string
135
+ values = [2024, "Q1", "review"], types = [:number, :word, :word]
136
+ → returns "2024 Q1 review" (String)
137
+ ```
138
+
139
+ ## Benefits
140
+
141
+ ### For Users
142
+
143
+ 1. **Copy-paste from Gmail** - queries work identically
144
+ 2. **Natural syntax** - no need to add quotes for multi-word values
145
+ 3. **Backwards compatible** - quotes still work if preferred
146
+ 4. **Predictable** - clear rules for when collection stops
147
+
148
+ ### For Developers
149
+
150
+ 1. **Simpler tokenizer** - still produces individual tokens
151
+ 2. **Type safety** - numbers preserve their type when appropriate
152
+ 3. **Extensible** - easy to add new token types to collection
153
+ 4. **Well-tested** - comprehensive test coverage
154
+
155
+ ## Edge Cases Handled
156
+
157
+ ### Edge Case 1: Operator Look-Ahead
158
+
159
+ ```ruby
160
+ "from:alice@example.com subject meeting"
161
+ ```
162
+
163
+ Parser checks if "subject" is followed by `:` before collecting it as a bareword. ✅
164
+
165
+ ### Edge Case 2: Number Type Preservation
166
+
167
+ ```ruby
168
+ "size:1000000" # Single number
169
+ → Operator(size: 1000000) # Integer ✅
170
+
171
+ "subject:2024 Q1" # Number + words
172
+ → Operator(subject: "2024 Q1") # String ✅
173
+ ```
174
+
175
+ ### Edge Case 3: Special Operators
176
+
177
+ ```ruby
178
+ "subject:urgent OR subject:important"
179
+ ```
180
+
181
+ "OR" stops bareword collection, not consumed into value. ✅
182
+
183
+ ### Edge Case 4: Value After Operator
184
+
185
+ ```ruby
186
+ "in:anywhere movie"
187
+ ```
188
+
189
+ Without another operator after, "movie" gets consumed. To search for "movie" as text:
190
+ - Use quotes: `in:anywhere "movie"`
191
+ - Add operator: `in:anywhere subject:movie`
192
+
193
+ ## Migration Guide
194
+
195
+ ### If You Have Existing Code
196
+
197
+ **No breaking changes for well-formed queries:**
198
+ - `label:"Multi Word"` → Still works ✅
199
+ - `subject:(word1 word2)` → Still works ✅
200
+ - `from:alice@example.com` → Still works ✅
201
+
202
+ **Improved behavior for casual queries:**
203
+ - `label:Multi Word` → Now works! ✅ (was broken before)
204
+ - `subject:urgent meeting` → Now works! ✅ (was broken before)
205
+
206
+ ### Recommended Usage
207
+
208
+ **Best Practices:**
209
+ ```ruby
210
+ # All these work identically now:
211
+ "label:Cora/Google Drive" # Automatic ✅
212
+ "label:\"Cora/Google Drive\"" # Explicit ✅
213
+
214
+ # For complex expressions, use parentheses:
215
+ "subject:(urgent OR important)" # Complex grouping ✅
216
+
217
+ # For text searches, use standalone words:
218
+ "meeting project deadline" # All become StringTokens ✅
219
+ ```
220
+
221
+ ## Documentation
222
+
223
+ - **`GMAIL_BEHAVIOR_COMPARISON.md`** - Updated to reflect implementation
224
+ - **`examples/gmail_comparison_demo.rb`** - Shows compatibility verification
225
+ - **`test/gmail_search_syntax_test.rb`** - Comprehensive test coverage
226
+
227
+ ## Status
228
+
229
+ ✅ **Implementation:** Complete
230
+ ✅ **Tests:** All passing (181 tests)
231
+ ✅ **Documentation:** Updated
232
+ ✅ **Code Quality:** Clean (standardrb)
233
+ ✅ **Compatibility:** Gmail-compatible
234
+
235
+ 🎉 **Ready for production use!**
236
+
@@ -0,0 +1,174 @@
1
+ # Implementation Notes: StringToken vs Substring Nodes
2
+
3
+ ## Overview
4
+
5
+ This implementation distinguishes between **word boundary matching** (unquoted text) and **substring matching** (quoted text) in the Gmail search syntax parser.
6
+
7
+ ## Changes Made
8
+
9
+ ### 1. Renamed and New AST Nodes
10
+
11
+ - **Renamed** `Text` to `StringToken` for clarity - represents unquoted text tokens
12
+ - **Added** `Substring` node to the AST (`lib/gmail_search_syntax/ast.rb`) that represents quoted strings.
13
+
14
+ ```ruby
15
+ class Substring < Node
16
+ attr_reader :value
17
+
18
+ def initialize(value)
19
+ @value = value
20
+ end
21
+
22
+ def inspect
23
+ "#<Substring #{@value.inspect}>"
24
+ end
25
+ end
26
+ ```
27
+
28
+ ### 2. Parser Updates
29
+
30
+ Modified the parser (`lib/gmail_search_syntax/parser.rb`) to create:
31
+ - `StringToken` nodes for unquoted text
32
+ - `Substring` nodes for quoted strings (`:quoted_string` tokens)
33
+
34
+ ### 3. SQL Visitor Updates
35
+
36
+ Updated the SQL visitor (`lib/gmail_search_syntax/sql_visitor.rb`) with two different behaviors:
37
+
38
+ #### StringToken Node (Word Boundary Matching)
39
+ ```ruby
40
+ def visit_string_token(node)
41
+ # Matches complete words only
42
+ # Uses: = exact, LIKE "value %", LIKE "% value", LIKE "% value %"
43
+ end
44
+ ```
45
+
46
+ SQL Pattern:
47
+ ```sql
48
+ (m0.subject = ? OR m0.subject LIKE ? OR m0.subject LIKE ? OR m0.subject LIKE ?)
49
+ OR
50
+ (m0.body = ? OR m0.body LIKE ? OR m0.body LIKE ? OR m0.body LIKE ?)
51
+ ```
52
+
53
+ Parameters: `["meeting", "meeting %", "% meeting", "% meeting %", ...]`
54
+
55
+ **Matches:** "meeting tomorrow", "the meeting", "just meeting"
56
+ **Does NOT match:** "meetings", "premeeting", "meetingroom"
57
+
58
+ #### Substring Node (Partial Matching)
59
+ ```ruby
60
+ def visit_substring(node)
61
+ # Matches anywhere in the text
62
+ # Uses: LIKE "%value%"
63
+ end
64
+ ```
65
+
66
+ SQL Pattern:
67
+ ```sql
68
+ (m0.subject LIKE ? OR m0.body LIKE ?)
69
+ ```
70
+
71
+ Parameters: `["%meeting%", "%meeting%"]`
72
+
73
+ **Matches:** "meeting", "meetings", "premeeting", "meetingroom"
74
+
75
+ ## Examples
76
+
77
+ ### Unquoted (Word Boundary)
78
+ ```ruby
79
+ GmailSearchSyntax.parse!("meeting")
80
+ # => #<StringToken "meeting">
81
+ # SQL: ... WHERE m0.subject = ? OR m0.subject LIKE ? OR ...
82
+ ```
83
+
84
+ ### Quoted (Substring)
85
+ ```ruby
86
+ GmailSearchSyntax.parse!('"meeting"')
87
+ # => #<Substring "meeting">
88
+ # SQL: ... WHERE m0.subject LIKE ? OR m0.body LIKE ?
89
+ ```
90
+
91
+ ### Combined
92
+ ```ruby
93
+ GmailSearchSyntax.parse!('urgent "q1 report"')
94
+ # => #<And #<StringToken "urgent"> AND #<Substring "q1 report">>
95
+ ```
96
+
97
+ ## Rationale
98
+
99
+ This implementation provides:
100
+
101
+ 1. **More precise searching** - Unquoted text matches complete words/tokens, avoiding false positives from partial matches
102
+ 2. **Flexible substring search** - Quoted text still allows finding substrings when needed
103
+ 3. **Gmail-like behavior** - Aligns with user expectations from Gmail's search syntax
104
+ 4. **SQL efficiency** - Word boundary matching is more specific than substring matching
105
+
106
+ ## Escape Sequences
107
+
108
+ Both `StringToken` and `Substring` nodes support escape sequences in **both quoted and unquoted tokens**:
109
+
110
+ ### Supported Escapes
111
+
112
+ - `\"` - Literal double quote
113
+ - `\\` - Literal backslash
114
+ - Other escape sequences (e.g., `\n`, `\t`) are preserved as-is (backslash + character)
115
+
116
+ ### Examples
117
+
118
+ **Quoted Strings (Substring nodes):**
119
+ ```ruby
120
+ # Escaped quotes in quoted string
121
+ '"She said \\"hello\\" to me"'
122
+ # => #<Substring 'She said "hello" to me'>
123
+
124
+ # Escaped backslashes in quoted string
125
+ '"path\\\\to\\\\file"'
126
+ # => #<Substring 'path\\to\\file'>
127
+
128
+ # In operator values with quoted strings
129
+ 'subject:"Meeting: \\"Q1 Review\\""'
130
+ # => #<Operator subject: 'Meeting: "Q1 Review"'>
131
+ ```
132
+
133
+ **Unquoted Tokens (StringToken nodes):**
134
+ ```ruby
135
+ # Escaped quotes in unquoted token
136
+ 'meeting\\"room'
137
+ # => #<StringToken 'meeting"room'>
138
+
139
+ # Escaped backslashes in unquoted token
140
+ 'path\\\\to\\\\file'
141
+ # => #<StringToken 'path\\to\\file'>
142
+
143
+ # In operator values with unquoted tokens
144
+ 'subject:test\\"value'
145
+ # => #<Operator subject: 'test"value'>
146
+ ```
147
+
148
+ This allows you to include literal quotes and backslashes in any token, whether quoted or unquoted.
149
+
150
+ ## Testing
151
+
152
+ All tests pass with comprehensive coverage:
153
+ - Basic functionality tests
154
+ - Escape sequence tests in tokenizer
155
+ - Integration tests for parsing with escaped quotes
156
+ - SQL generation tests with escaped quotes
157
+
158
+ New tests added:
159
+ - `test_quoted_text_search_uses_substring` in `test/sql_visitor_test.rb`
160
+ - `test_tokenize_quoted_string_with_escaped_quote` in `test/tokenizer_test.rb`
161
+ - `test_tokenize_quoted_string_with_escaped_backslash` in `test/tokenizer_test.rb`
162
+ - `test_tokenize_word_with_escaped_quote` in `test/tokenizer_test.rb`
163
+ - `test_tokenize_word_with_escaped_backslash` in `test/tokenizer_test.rb`
164
+ - `test_quoted_string_with_escaped_quotes` in `test/gmail_search_syntax_test.rb`
165
+ - `test_unquoted_text_with_escaped_quote` in `test/gmail_search_syntax_test.rb`
166
+ - `test_unquoted_text_with_escaped_backslash` in `test/gmail_search_syntax_test.rb`
167
+ - `test_subject_with_escaped_quotes` in `test/sql_visitor_test.rb`
168
+ - `test_unquoted_token_with_escaped_quote` in `test/sql_visitor_test.rb`
169
+ - `test_operator_with_unquoted_escaped_quote` in `test/sql_visitor_test.rb`
170
+
171
+ Run demos:
172
+ - `bundle exec ruby examples/text_vs_substring_demo.rb`
173
+ - `bundle exec ruby examples/escaped_quotes_demo.rb`
174
+
data/README.md CHANGED
@@ -39,11 +39,11 @@ GmailSearchSyntax.parse!("from:amy OR from:bob")
39
39
 
40
40
  # Negation
41
41
  GmailSearchSyntax.parse!("dinner -movie")
42
- # => #<And #<Text "dinner"> AND #<Not #<Text "movie">>>
42
+ # => #<And #<StringToken "dinner"> AND #<Not #<StringToken "movie">>>
43
43
 
44
44
  # Proximity search
45
45
  GmailSearchSyntax.parse!("holiday AROUND 10 vacation")
46
- # => #<Around #<Text "holiday"> AROUND 10 #<Text "vacation">>
46
+ # => #<Around #<StringToken "holiday"> AROUND 10 #<StringToken "vacation">>
47
47
 
48
48
  # Complex query with OR inside operator values
49
49
  GmailSearchSyntax.parse!("from:{alice@ bob@} subject:urgent")