orangeslice 1.7.0 → 1.7.1

@@ -0,0 +1,220 @@
1
+ # B2B Database: Generalized Rules for Table Selection
2
+
3
+ These rules are based on 40+ tests covering different filter types, selectivity levels, and cross-table patterns.
4
+
5
+ ---
6
+
7
+ ## The Core Rules
8
+
9
+ ### Rule 1: Single Filter Type Determines Winner
10
+
11
+ | Filter Type | Normalized | Denormalized | Winner |
12
+ | ------------------------ | ---------- | ------------ | ------------------- |
13
+ | **ID lookup** | 4-6ms | 12-31ms | Normalized (3-8x) |
14
+ | **NULL check** | 5ms | 18ms | Normalized (3.6x) |
15
+ | **Numeric range** | 19ms | 187ms | Normalized (10x) |
16
+ | **Numeric comparison** | 115ms | 354ms | Normalized (3x) |
17
+ | **Single skill array** | 209ms | 152ms | Denormalized (1.4x) |
18
+ | **updated_at** (indexed) | 4ms | 14ms | Normalized (3.5x) |
19
+
20
+ **Generalization**: For single filters, normalized wins on indexed columns and numeric filters; denormalized wins on array filters.
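The multipliers in the Winner column are the slower timing divided by the faster one. A minimal Python sketch of that arithmetic, assuming both timings are in milliseconds; the function name `winner` is hypothetical and is not part of the package:

```python
def winner(normalized_ms: float, denormalized_ms: float) -> tuple[str, float]:
    """Return the faster table layout and how many times faster it is."""
    if normalized_ms <= denormalized_ms:
        return "Normalized", round(denormalized_ms / normalized_ms, 1)
    return "Denormalized", round(normalized_ms / denormalized_ms, 1)


# Timings copied from the Rule 1 table.
print(winner(5, 18))     # NULL check row
print(winner(209, 152))  # single skill array row
```

Applying the same function to the other rows gives a quick way to check a reported speedup against its raw timings.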
21
+
22
+ ---
23
+
24
+ ### Rule 2: Text Selectivity is the Key Factor
25
+
26
+ | Headline Term | Normalized | Denormalized | Winner |
27
+ | ------------------------ | ---------- | ------------ | ----------------------- |
28
+ | **CEO** (very common) | 55ms | 131ms | Normalized (2.4x) |
29
+ | **engineer** (common) | 24ms | 136ms | Normalized (5.7x) |
30
+ | **devops** (uncommon) | 409ms | 256ms | **Denormalized (1.6x)** |
31
+ | **tensorflow** (rare) | 28,000ms | 9,873ms | **Denormalized (2.8x)** |
32
+ | **solidity** (very rare) | 27,246ms | 12,829ms | **Denormalized (2.1x)** |
33
+
34
+ **Crossover point**: Between "common" (engineer) and "uncommon" (devops) selectivity; from "uncommon" downward, denormalized wins.
35
+
36
+ **Generalization**:
37
+
38
+ - **Common terms** (CEO, engineer, manager, developer) → Use normalized
39
+ - **Uncommon/rare terms** (devops, kubernetes, blockchain, tensorflow) → Use denormalized
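The generalization above could be applied as a simple lookup. This is only a sketch: the two term sets contain just the examples named in the list, the function name `table_for_headline_term` is hypothetical, and any term outside both sets is reported as unknown rather than guessed.

```python
# Example terms taken from the Rule 2 generalization. These sets are
# illustrative, not an exhaustive classification of headline terms.
COMMON_TERMS = {"ceo", "engineer", "manager", "developer"}
RARE_TERMS = {"devops", "kubernetes", "blockchain", "tensorflow"}


def table_for_headline_term(term: str) -> str:
    """Pick a table layout for a single headline text filter."""
    key = term.strip().lower()
    if key in COMMON_TERMS:
        return "normalized"
    if key in RARE_TERMS:
        return "denormalized"
    return "unknown"  # selectivity not measured here; benchmark before choosing


print(table_for_headline_term("CEO"))
print(table_for_headline_term("tensorflow"))
```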
40
+
41
+ ---
42
+
43
+ ### Rule 3: Filter Count Matters
44
+
45
+ | # of Filters | Pattern | Normalized | Denormalized | Winner |
46
+ | ------------ | ----------------------- | ---------- | ------------ | ----------------------- |
47
+ | 1 | Numeric only | 19-115ms | 187-354ms | Normalized (3-10x) |
48
+ | 1 | Text (common) | 24-55ms | 131-136ms | Normalized (2.4-5.7x) |
49
+ | 1 | Text (rare) | 27-28s | 9-13s | Denormalized (2-2.8x) |
50
+ | 2 | Numeric + text (common) | 119ms | 603ms | Normalized (5x) |
51
+ | 2 | Numeric + text (rare) | 10,643ms | 4,024ms | **Denormalized (2.6x)** |
52
+ | 2 | Array + text | 2,968ms | 1,037ms | **Denormalized (2.9x)** |
53
+ | 3 | Array + numeric + text | 6,948ms | 3,257ms | **Denormalized (2.1x)** |
54
+ | 3 | Company triple filter | 6,568ms | 1,486ms | **Denormalized (4.4x)** |
55
+
56
+ **Generalization**:
57
+
58
+ - **1 filter**: Normalized usually wins (except rare text or arrays)
59
+ - **2 filters**: Depends on selectivity and filter types
60
+ - **3+ filters**: Denormalized usually wins
61
+
62
+ ---
63
+
64
+ ### Rule 4: Filter Location in Cross-Table Queries
65
+
66
+ | Text Filter Location | Normalized | Denormalized | Winner |
67
+ | ---------------------------- | ---------- | -------------- | ------------------------ |
68
+ | **Company description only** | 125ms | 4,166-15,435ms | **Normalized (33-123x)** |
69
+ | **Profile headline only** | TIMEOUT | 602-709ms | **Denormalized (∞)** |
70
+ | **Both profile + company** | 702ms | 3,440ms | **Normalized (4.9x)** |
71
+
72
+ **Generalization**:
73
+
74
+ - Text filter on **company** side → Normalized wins (company table smaller)
75
+ - Text filter on **profile** side → Denormalized wins (profile table huge)
76
+ - Text filter on **both** → Normalized wins if company filter is selective
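The three cases above can be sketched as one function. The name `choose_cross_table` and its boolean parameters are hypothetical. Where the rule gives no clear winner, the sketch returns "benchmark" instead of guessing.

```python
def choose_cross_table(text_on_profile: bool, text_on_company: bool,
                       company_filter_selective: bool = False) -> str:
    """Pick a table layout from where the text filter sits (Rule 4)."""
    if text_on_company and not text_on_profile:
        return "normalized"      # company table is the smaller one
    if text_on_profile and not text_on_company:
        return "denormalized"    # profile table is the huge one
    if text_on_profile and text_on_company:
        # Rule 4 only promises a normalized win for a selective company filter.
        return "normalized" if company_filter_selective else "benchmark"
    return "benchmark"           # no text filter: Rule 4 does not apply


print(choose_cross_table(text_on_profile=True, text_on_company=False))
```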
77
+
78
+ ---
79
+
80
+ ### Rule 5: LIMIT Size Impact
81
+
82
+ | Term Type | LIMIT 10 | LIMIT 100 | LIMIT 500-1000 |
83
+ | --------------------- | ------------------------- | --------------------------- | ------------------------------- |
84
+ | **Common** (engineer) | N: 24ms, D: 42ms (N 1.8x) | N: 24ms, D: 136ms (N 5.7x) | N: 80ms, D: 865ms (N 10.8x) |
85
+ | **Rare** (blockchain) | N: 234ms, D: 241ms (tie) | N: 713ms, D: 384ms (D 1.9x) | N: 3,725ms, D: 2,806ms (D 1.3x) |
86
+
87
+ **Generalization**:
88
+
89
+ - **Common terms**: Normalized scales better with larger LIMITs
90
+ - **Rare terms**: Denormalized ties at LIMIT 10 and wins at LIMIT 100 and above (1.3-1.9x)
91
+
92
+ ---
93
+
94
+ ### Rule 6: Cross-Table JOIN Patterns
95
+
96
+ | Pattern | Normalized | Denormalized | Winner |
97
+ | -------------------------------- | ---------- | ------------ | ---------------------- |
98
+ | Company ID → employees | 48ms | 279ms | Normalized (5.8x) |
99
+ | Company name (org) search | 274ms | 8,600ms | Normalized (31x) |
100
+ | Profile headline + company size | 20,205ms | 217ms | **Denormalized (93x)** |
101
+ | Profile skill + company size | 7,168ms | 536ms | **Denormalized (13x)** |
102
+ | Profile skill + company industry | TIMEOUT | 3,553ms | **Denormalized (∞)** |
103
+ | Profile text + company text | 702ms | 3,440ms | Normalized (4.9x) |
104
+
105
+ **Generalization**:
106
+
107
+ - **Company-first lookups**: Normalized wins
108
+ - **Profile text + company constraint**: Denormalized wins (dramatically)
109
+ - **Both have text filters**: Normalized may win if company filter is selective
110
+
111
+ ---
112
+
113
+ ## The Decision Matrix
114
+
115
+ ### Profile-Only Queries
116
+
117
+ ```
118
+ Single filter?
119
+ ├─ ID/slug/updated_at (indexed) → Normalized
120
+ ├─ Numeric (connections, followers) → Normalized
121
+ ├─ Single skill array → Denormalized (slight)
122
+ ├─ Headline (common term) → Normalized
123
+ └─ Headline (rare term) → Denormalized
124
+
125
+ Multiple filters?
126
+ ├─ All numeric/indexed → Normalized
127
+ ├─ Contains rare headline term → Denormalized
128
+ ├─ Contains multiple skills → Denormalized
129
+ └─ 3+ filters → Denormalized
130
+ ```
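The profile-only tree above could be encoded roughly as follows. The `ProfileQuery` fields and the function name are hypothetical. Branches are checked in the order the tree lists them, which is an assumption wherever they overlap (for example, three filters that are all numeric).

```python
from dataclasses import dataclass


@dataclass
class ProfileQuery:
    """Hypothetical summary of the filters in a profile-only query."""
    filter_count: int
    all_numeric_or_indexed: bool = False
    rare_headline_term: bool = False
    skill_count: int = 0


def choose_profile_table(q: ProfileQuery) -> str:
    if q.filter_count <= 1:
        # Single-filter branch: only a skill array or a rare headline term
        # favors the denormalized table.
        if q.skill_count == 1 or q.rare_headline_term:
            return "denormalized"
        return "normalized"
    # Multiple-filter branch, in the order listed in the tree.
    if q.all_numeric_or_indexed:
        return "normalized"
    if q.rare_headline_term or q.skill_count > 1 or q.filter_count >= 3:
        return "denormalized"
    return "benchmark"  # the tree has no rule for this combination


print(choose_profile_table(ProfileQuery(filter_count=1, rare_headline_term=True)))
```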
131
+
132
+ ### Company-Only Queries
133
+
134
+ ```
135
+ Single filter?
136
+ ├─ ID/slug (indexed) → Normalized
137
+ ├─ employee_count > high threshold → Denormalized (odd case)
138
+ ├─ employee_count range → Normalized or tie
139
+ └─ Description ILIKE → Normalized or tie
140
+
141
+ Multiple filters?
142
+ ├─ 2 filters → Usually tie or slight normalized advantage
143
+ └─ 3+ filters → Denormalized
144
+ ```
145
+
146
+ ### Cross-Table Queries (Profile + Company)
147
+
148
+ ```
149
+ Text filter location?
150
+ ├─ Only on company → Normalized (company table smaller)
151
+ ├─ Only on profile → Denormalized (profile table huge)
152
+ └─ Both → Depends on company filter selectivity
153
+
154
+ Filter type combination?
155
+ ├─ Company-first (ID/name lookup) → Normalized
156
+ ├─ Profile text + company numeric → Denormalized
157
+ ├─ Profile skill + company constraint → Denormalized
158
+ └─ Profile numeric + company text → Normalized (if company selective)
159
+ ```
160
+
161
+ ---
162
+
163
+ ## Quick Reference Table
164
+
165
+ | Scenario | Use This | Speedup |
166
+ | ------------------------------ | ----------------------- | ----------- |
167
+ | ID lookup | Normalized | 3-8x |
168
+ | Slug lookup | Normalized (with key64) | ∞ |
169
+ | updated_at filter | Normalized | 3.5x |
170
+ | Common headline term | Normalized | 2.4-5.7x |
171
+ | **Rare headline term** | Denormalized | 2-2.8x |
172
+ | **Multi-skill search** | Denormalized | 1.4-2.9x |
173
+ | **3+ combined filters**        | Denormalized            | 2.1-4.4x    |
174
+ | Company name via org | Normalized (GIN) | 31-68x |
175
+ | Company description search | Normalized | 1.2x or tie |
176
+ | **Profile headline or skill + company size** | Denormalized | 13-93x |
177
+ | Company text + profile | Normalized | 4.9x |
178
+
179
+ ---
180
+
181
+ ## The Three Key Insights
182
+
183
+ ### 1. Text Selectivity Crossover
184
+
185
+ Common terms match often, so a query with a LIMIT can stop after scanning few rows → Normalized wins.
+ Rare terms match seldom, so the scan covers most or all of the table → Denormalized wins (smaller row size).
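A rough way to see the crossover is to estimate how many rows a scan reads before it has collected LIMIT matches. The sketch below assumes matches are spread uniformly through the table. The function name and the match fractions in the usage lines are hypothetical. The table size is the approximate `linkedin_profile` row count given in this document.

```python
PROFILE_ROWS = 1_150_000_000  # approximate linkedin_profile size from this document


def expected_rows_scanned(limit: int, match_fraction: float,
                          table_rows: int = PROFILE_ROWS) -> int:
    """Estimate rows a sequential scan reads before `limit` matches are found.

    With uniformly spread matches, roughly one row in every
    1 / match_fraction qualifies. The scan never reads more than the table.
    """
    if match_fraction <= 0:
        return table_rows
    return min(table_rows, round(limit / match_fraction))


# Hypothetical match fractions, for illustration only.
print(expected_rows_scanned(100, 0.01))  # a common term
print(expected_rows_scanned(100, 1e-8))  # a very rare term
```

Under these assumptions, the common term needs about 10,000 rows for LIMIT 100, while the rare term hits the cap and reads the whole table.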
187
+
188
+ ### 2. Filter Count Scaling
189
+
190
+ More filters = more full-table scan work → Denormalized's smaller row size advantage compounds.
191
+
192
+ ### 3. Table Size Asymmetry
193
+
194
+ - `linkedin_profile`: ~1.15 billion rows
195
+ - `linkedin_company`: millions of rows (100-1000x smaller)
196
+
197
+ Text filter on the **smaller table** (company) is fast for normalized.
198
+ Text filter on the **larger table** (profile) favors denormalized.
199
+
200
+ ---
201
+
202
+ ## Summary: When to Use Each
203
+
204
+ ### Use Normalized When:
205
+
206
+ 1. Single indexed filter (ID, slug, updated_at)
207
+ 2. Company-first queries
208
+ 3. Common headline terms (CEO, engineer, manager)
209
+ 4. Text filter is on company side (not profile)
210
+ 5. Large LIMIT with common terms
211
+ 6. Aggregations (COUNT, GROUP BY)
212
+
213
+ ### Use Denormalized When:
214
+
215
+ 1. Rare headline terms (kubernetes, blockchain, tensorflow)
216
+ 2. Multiple skills in query
217
+ 3. 3+ combined filters
218
+ 4. Profile text filter + company constraint
219
+ 5. Country filter + headline filter
220
+ 6. Complex regex patterns on headline
@@ -0,0 +1,240 @@
1
+ # B2B Database NLP Test Queries
2
+
3
+ Natural language queries for testing the B2B database agent.
4
+
5
+ ---
6
+
7
+ ## Company Lookups
8
+
9
+ 1. Find the company with domain stripe.com
10
+ 2. Get company info for Anthropic
11
+ 3. Look up the company OpenAI
12
+ 4. What company has the website openai.com?
13
+ 5. Show me details about Stripe
14
+
15
+ ## People at Companies
16
+
17
+ 6. Show me people who work at Stripe
18
+ 7. Find engineers at Stripe
19
+ 8. Find sales and account executives at OpenAI
20
+ 9. Who are the software engineers at Google?
21
+ 10. Find product managers at Meta
22
+
23
+ ## Company Description Searches
24
+
25
+ 11. Find AI video companies
26
+ 12. Find companies building personalization infrastructure
27
+ 13. Find AI platforms for sales and RevOps
28
+ 14. Find companies with developer APIs
29
+ 15. Find companies that do data integration
30
+ 16. Find cybersecurity startups
31
+ 17. Find companies building workflow automation tools
32
+
33
+ ## Top-of-Funnel Company Lists
34
+
35
+ 18. Find SaaS platform software companies
36
+ 19. List all tech companies
37
+ 20. Find B2B software companies
38
+ 21. Show me internet companies
39
+
40
+ ## People by Headline/Title
41
+
42
+ 22. Find people working in AI
43
+ 23. Find CTOs at AI startups
44
+ 24. Find founders, CEOs, CTOs, VPs, and directors
45
+ 25. Find C-suite executives
46
+ 26. Find VP of Sales professionals
47
+ 27. Find data scientists
48
+ 28. Find product managers in tech
49
+
50
+ ## Complex Company Searches
51
+
52
+ 29. Find SaaS companies with usage-based pricing
53
+ 30. Find companies with video APIs or SDKs
54
+ 31. Find companies doing real-time analytics
55
+ 32. Find enterprise security companies
56
+
57
+ ## Funding Queries
58
+
59
+ 33. Find recently funded companies
60
+ 34. Find Series A software companies
61
+ 35. Find companies funded by Andreessen Horowitz
62
+ 36. Find seed stage startups funded recently
63
+ 37. Find Series B companies in fintech
64
+ 38. Find companies that raised over $50M
65
+ 39. Show me YC-backed companies
66
+
67
+ ## Industry Lookups
68
+
69
+ 40. What are the top industries by company count?
70
+ 41. What industry is OpenAI in?
71
+ 42. List all industry categories
72
+
73
+ ## Employee Growth
74
+
75
+ 43. Find fast-growing companies with over 50% growth
76
+ 44. Find hypergrowth companies
77
+ 45. Find companies that doubled headcount this year
78
+ 46. Show me the fastest growing startups
79
+
80
+ ## Targeted Role + Company Searches
81
+
82
+ 47. Find founders at AI companies
83
+ 48. Find CTOs at fintech companies
84
+ 49. Find VPs of Engineering at SaaS companies
85
+ 50. Find sales leaders at Series A startups
86
+ 51. Find marketing executives at e-commerce companies
87
+
88
+ ## People at Company Types
89
+
90
+ 52. Find engineers at tech companies
91
+ 53. Find decision makers at AI video companies
92
+ 54. Find recruiters at fast-growing startups
93
+ 55. Find designers at consumer apps
94
+
95
+ ## Role + Funding Stage
96
+
97
+ 56. Find engineers at Series A startups
98
+ 57. Find VPs of Engineering at Series B companies
99
+ 58. Find CTOs at recently funded AI companies
100
+ 59. Find product managers at seed stage companies
101
+ 60. Find sales reps at Series C companies
102
+
103
+ ## Education
104
+
105
+ 61. Find Stanford alumni who work at Google
106
+ 62. Find people who got their MBA recently
107
+ 63. Find people with Computer Science degrees
108
+ 64. Find MIT graduates in tech
109
+ 65. Find Harvard Business School alumni
110
+
111
+ ## Skills
112
+
113
+ 66. Find people with Python and Machine Learning skills
114
+ 67. Find full-stack developers with React and Node.js
115
+ 68. Find data engineers with SQL and Spark
116
+ 69. Find iOS developers with Swift experience
117
+ 70. Find people who know Kubernetes
118
+
119
+ ## Certifications
120
+
121
+ 71. Find AWS certified professionals
122
+ 72. Find Google Cloud certified people
123
+ 73. Find PMP certified project managers
124
+ 74. Find Salesforce certified admins
125
+
126
+ ## Thought Leaders & Content
127
+
128
+ 75. Find AI thought leaders with popular articles
129
+ 76. Find people who write LinkedIn articles frequently
130
+ 77. Find tech influencers
131
+ 78. Find people writing about machine learning
132
+
133
+ ## Job Postings
134
+
135
+ 79. Show me jobs at Stripe
136
+ 80. Find high-paying engineering jobs over 200k
137
+ 81. Find remote jobs posted recently
138
+ 82. Find senior engineering roles at startups
139
+ 83. Show me product manager openings
140
+
141
+ ## Company Social Activity
142
+
143
+ 84. Find companies posting about AI
144
+ 85. Find viral company posts with high engagement
145
+ 86. Find companies announcing new products
146
+
147
+ ## Company Attributes
148
+
149
+ 87. Find companies specializing in ML or AI
150
+ 88. Find public companies with stock tickers
151
+ 89. Find companies with active career pages
152
+ 90. Find companies with multiple office locations
153
+
154
+ ## Alumni Searches
155
+
156
+ 91. Find ex-Google employees who now work at startups
157
+ 92. Find former Stripe engineers
158
+ 93. Find Meta alumni at AI companies
159
+ 94. Find people who left Amazon for startups
160
+
161
+ ## Analytics & Aggregations
162
+
163
+ 95. What roles are most common at OpenAI?
164
+ 96. What is the average funding by round type?
165
+ 97. How many engineers does Stripe have?
166
+ 98. What's the headcount breakdown at Google?
167
+
168
+ ## Influencers
169
+
170
+ 99. Find LinkedIn influencers
171
+ 100. Find people with over 50,000 followers
172
+ 101. Find well-connected professionals
173
+
174
+ ## Reference Data
175
+
176
+ 102. What are the company types?
177
+ 103. What are the job seniority levels?
178
+ 104. What are the employment types?
179
+ 105. List countries in North America
180
+
181
+ ## Geography
182
+
183
+ 106. Find tech companies in the UK
184
+ 107. Find tech companies in the Bay Area
185
+ 108. Find tech workers in New York
186
+ 109. Find startups in Austin
187
+ 110. Find AI companies in London
188
+
189
+ ## Ranked Companies
190
+
191
+ 111. Show me Fortune 500 companies
192
+ 112. Show me Inc Magazine ranked companies
193
+ 113. Find Forbes Cloud 100 companies
194
+
195
+ ## Volunteers & Awards
196
+
197
+ 114. Find people who volunteer in education
198
+ 115. Find people with innovation awards
199
+ 116. Find award-winning engineers
200
+
201
+ ## Recommendations & Languages
202
+
203
+ 117. Find highly recommended people
204
+ 118. Find multilingual professionals
205
+ 119. Find people who speak Mandarin and English
206
+
207
+ ## Patents & Publications
208
+
209
+ 120. Find patent holders
210
+ 121. Find people with AI patents
211
+ 122. Find published researchers
212
+ 123. Find people with machine learning patents
213
+
214
+ ## Projects & Links
215
+
216
+ 124. Find people with AI/ML projects
217
+ 125. Find people with GitHub profiles
218
+ 126. Find developers with open source contributions
219
+
220
+ ## Full Enrichment
221
+
222
+ 127. Get full company profile for Stripe
223
+ 128. Get complete info on this person
224
+ 129. Enrich this company with funding data
225
+ 130. Get all details about OpenAI
226
+
227
+ ---
228
+
229
+ ## Complex Multi-Criteria Queries
230
+
231
+ 131. Find CTOs at Series A AI companies in San Francisco
232
+ 132. Find Stanford MBAs who are now founders at fintech startups
233
+ 133. Find ex-Google engineers at YC companies
234
+ 134. Find AWS certified developers at fast-growing startups
235
+ 135. Find VPs of Sales at recently funded B2B SaaS companies
236
+ 136. Find people with ML skills who work at Series B companies
237
+ 137. Find founders at AI companies who went to Stanford
238
+ 138. Find CTOs at healthcare tech companies with recent funding
239
+ 139. Find engineers at remote-first companies with Python skills
240
+ 140. Find product managers at e-commerce companies in New York