orangeslice 1.0.0
- package/README.md +107 -0
- package/dist/b2b.d.ts +30 -0
- package/dist/b2b.js +89 -0
- package/dist/index.d.ts +29 -0
- package/dist/index.js +28 -0
- package/dist/queue.d.ts +9 -0
- package/dist/queue.js +48 -0
- package/docs/B2B_CROSS_TABLE_TEST_FINDINGS.md +255 -0
- package/docs/B2B_DATABASE.md +314 -0
- package/docs/B2B_DATABASE_TEST_FINDINGS.md +476 -0
- package/docs/B2B_EMPLOYEE_SEARCH.md +697 -0
- package/docs/B2B_GENERALIZATION_RULES.md +220 -0
- package/docs/B2B_NLP_QUERY_MAPPINGS.md +240 -0
- package/docs/B2B_NORMALIZED_VS_DENORMALIZED.md +952 -0
- package/docs/B2B_SCHEMA.md +1042 -0
- package/docs/B2B_SQL_COMPREHENSIVE_TEST_FINDINGS.md +301 -0
- package/docs/B2B_TABLE_INDICES.ts +496 -0
- package/package.json +33 -0
package/README.md
ADDED
@@ -0,0 +1,107 @@
# orangeslice

Rate-limited B2B API client for AI agents. Call `orangeslice.b2b.sql()` anywhere, anytime - concurrency and rate limiting handled automatically.

## Documentation

**Before writing queries, read the docs in [`./docs/`](./docs/):**

| Doc | What it covers |
|-----|----------------|
| [B2B_DATABASE.md](./docs/B2B_DATABASE.md) | Database overview, API endpoint, request format |
| [B2B_SCHEMA.md](./docs/B2B_SCHEMA.md) | All tables and columns |
| [B2B_EMPLOYEE_SEARCH.md](./docs/B2B_EMPLOYEE_SEARCH.md) | How to search for people/employees |
| [B2B_GENERALIZATION_RULES.md](./docs/B2B_GENERALIZATION_RULES.md) | Query patterns and best practices |
| [B2B_NLP_QUERY_MAPPINGS.md](./docs/B2B_NLP_QUERY_MAPPINGS.md) | Natural language to SQL mappings |
| [B2B_TABLE_INDICES.ts](./docs/B2B_TABLE_INDICES.ts) | TypeScript types for all tables |

Start with `B2B_DATABASE.md` and `B2B_SCHEMA.md` to understand the data model.

## Installation

```bash
npm install orangeslice
```

## Usage

```typescript
import { orangeslice } from 'orangeslice';

// Query the B2B database - automatically rate-limited
const companies = await orangeslice.b2b.sql(`
  SELECT company_name, domain, employee_count
  FROM linkedin_company
  WHERE universal_name = 'stripe'
`);

// Multiple parallel calls are queued (max 2 concurrent by default)
const results = await Promise.all([
  orangeslice.b2b.sql("SELECT * FROM linkedin_company WHERE universal_name = 'stripe'"),
  orangeslice.b2b.sql("SELECT * FROM linkedin_company WHERE universal_name = 'openai'"),
  orangeslice.b2b.sql("SELECT * FROM linkedin_company WHERE universal_name = 'meta'"),
  orangeslice.b2b.sql("SELECT * FROM linkedin_company WHERE universal_name = 'google'"),
  orangeslice.b2b.sql("SELECT * FROM linkedin_company WHERE universal_name = 'amazon'"),
]);
// ^ Only 2 run at once, rest wait in queue automatically
```

## Configuration

```typescript
// Optional - configure before use
orangeslice.b2b.configure({
  proxyUrl: 'http://your-proxy-url:3000/query', // default: B2B_SQL_PROXY_URL env var
  concurrency: 3, // default: 2
  minDelayMs: 200, // default: 100ms between requests
});
```
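
The `minDelayMs` option spaces out request start times. As shipped, `createRateLimiter` in `dist/queue.js` reads `lastRequest` before sleeping and writes it afterwards, so two callers that overlap can end up starting at nearly the same moment. A minimal sketch of one alternative, assuming a slot-reservation approach: each caller reserves its start time up front, so waits never overlap. The name `createSlotReserver` is hypothetical and not part of the package.

```typescript
// Hypothetical helper, not part of orangeslice: reserve evenly spaced start times.
// Given the current time in ms, the returned function tells the caller how long
// to wait before starting its request.
function createSlotReserver(minDelayMs: number): (nowMs: number) => number {
  let nextAllowedMs = 0; // earliest time the next request may start

  return (nowMs: number): number => {
    const startMs = Math.max(nowMs, nextAllowedMs);
    nextAllowedMs = startMs + minDelayMs; // reserve the slot before waiting
    return startMs - nowMs;
  };
}

// Three callers arriving in the same millisecond get staggered waits.
const reserve = createSlotReserver(100);
const waits = [reserve(1000), reserve(1000), reserve(1000)];
console.log(waits); // [ 0, 100, 200 ]
```

In an async wrapper, the caller would sleep for the returned number of milliseconds (for example with `setTimeout`) before issuing its `fetch`.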

## Environment Variables

```bash
B2B_SQL_PROXY_URL=http://165.22.151.131:3000/query
```

## How It Works

```
Agent calls:  [1] [2] [3] [4] [5] [6]
                  ↓
Queue:        [ 1, 2 running ] [ 3, 4, 5, 6 waiting ]
                  ↓
API:          [1] [2]  ← only 2 hit the API at once
                  ↓
              [3] [4]  ← when 1,2 finish, 3,4 start
```

**Agent never has to:**
- Think about concurrency
- Add `await sleep()`
- Worry about rate limits
- Handle API throttling errors
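
The queue diagram above can be replayed as a small simulation. This is a sketch only, assuming each call takes a known duration, `concurrency` is at least 1, and `minDelayMs` is ignored; `simulateQueue` is a hypothetical helper for illustration, not part of the package.

```typescript
// Hypothetical helper, not part of orangeslice: compute when each call would
// start if at most `concurrency` calls run at once and the rest wait in FIFO order.
function simulateQueue(durationsMs: number[], concurrency: number): number[] {
  // slotFreeAt[i] is the time at which slot i finishes its current call.
  const slotFreeAt: number[] = Array.from({ length: concurrency }, () => 0);
  const startTimes: number[] = [];

  for (const duration of durationsMs) {
    // The next waiting call takes whichever slot frees up first.
    let earliest = 0;
    for (let i = 1; i < slotFreeAt.length; i++) {
      if (slotFreeAt[i] < slotFreeAt[earliest]) earliest = i;
    }
    const start = slotFreeAt[earliest];
    startTimes.push(start);
    slotFreeAt[earliest] = start + duration;
  }
  return startTimes;
}

// Six calls of 100 ms each with the default concurrency of 2.
console.log(simulateQueue([100, 100, 100, 100, 100, 100], 2));
// [ 0, 0, 100, 100, 200, 200 ]
```

Calls 1 and 2 start immediately, 3 and 4 start when the first pair finishes, and 5 and 6 follow, which matches the diagram.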

## API Reference

### `orangeslice.b2b.sql<T>(query: string): Promise<T>`

Execute SQL and return rows.

```typescript
const companies = await orangeslice.b2b.sql<Company[]>(
  "SELECT * FROM linkedin_company WHERE employee_count > 1000 LIMIT 10"
);
```

### `orangeslice.b2b.query<T>(query: string): Promise<QueryResult<T>>`

Execute SQL and return full result with metadata.

```typescript
const result = await orangeslice.b2b.query("SELECT * FROM linkedin_company LIMIT 10");
// result.rows, result.rowCount, result.duration_ms
```
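
Judging by `dist/b2b.js`, `sql<T>` and `query<T>` return the proxy's rows as `T` without checking their shape at runtime. A minimal sketch of a guard for the `Company` type used in the examples, assuming the three columns from the Usage query; the column types are guesses that the README does not state, and the sample rows are made up.

```typescript
// Assumed row shape: column names come from the Usage query; the types are guesses.
interface Company {
  company_name: string;
  domain: string | null;
  employee_count: number | null;
}

// Runtime check that an unknown row has the assumed Company shape.
function isCompany(row: unknown): row is Company {
  if (typeof row !== "object" || row === null) return false;
  const r = row as Record<string, unknown>;
  return (
    typeof r.company_name === "string" &&
    (r.domain === null || typeof r.domain === "string") &&
    (r.employee_count === null || typeof r.employee_count === "number")
  );
}

// Made-up sample rows standing in for a response from orangeslice.b2b.sql().
const sampleRows: unknown[] = [
  { company_name: "Example Co", domain: "example.com", employee_count: 1200 },
  { company_name: 42 },
];
const validCompanies: Company[] = sampleRows.filter(isCompany);
console.log(validCompanies.length); // 1
```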

## Note on Concurrency

The rate limit is **per-process**. If you run multiple scripts simultaneously, each has its own queue. For most AI agent use cases (single script), this is fine.
package/dist/b2b.d.ts
ADDED
@@ -0,0 +1,30 @@
/**
 * Configure the B2B client
 */
export declare function configure(options: {
    proxyUrl?: string;
    concurrency?: number;
    minDelayMs?: number;
}): void;
export interface QueryResult<T = Record<string, unknown>> {
    rows: T[];
    rowCount: number;
    duration_ms: number;
}
/**
 * Execute a SQL query against the B2B database.
 * Automatically rate-limited and concurrency-controlled.
 *
 * @example
 * const companies = await b2b.sql<Company[]>("SELECT * FROM linkedin_company WHERE domain = 'stripe.com'");
 */
export declare function sql<T = Record<string, unknown>[]>(query: string): Promise<T>;
/**
 * Execute a SQL query and get full result with metadata
 */
export declare function query<T = Record<string, unknown>>(sqlQuery: string): Promise<QueryResult<T>>;
export declare const b2b: {
    sql: typeof sql;
    query: typeof query;
    configure: typeof configure;
};
package/dist/b2b.js
ADDED
@@ -0,0 +1,89 @@
"use strict";
Object.defineProperty(exports, "__esModule", { value: true });
exports.b2b = void 0;
exports.configure = configure;
exports.sql = sql;
exports.query = query;
const queue_1 = require("./queue");
// Default config
let config = {
    proxyUrl: process.env.B2B_SQL_PROXY_URL || "http://165.22.151.131:3000/query",
    concurrency: 2,
    minDelayMs: 100, // 100ms between requests = max 10/sec
};
// Create queue and rate limiter with defaults
let queue = (0, queue_1.createQueue)(config.concurrency);
let rateLimiter = (0, queue_1.createRateLimiter)(config.minDelayMs);
/**
 * Configure the B2B client
 */
function configure(options) {
    if (options.proxyUrl)
        config.proxyUrl = options.proxyUrl;
    if (options.concurrency) {
        config.concurrency = options.concurrency;
        queue = (0, queue_1.createQueue)(options.concurrency);
    }
    if (options.minDelayMs !== undefined) {
        config.minDelayMs = options.minDelayMs;
        rateLimiter = (0, queue_1.createRateLimiter)(options.minDelayMs);
    }
}
/**
 * Execute a SQL query against the B2B database.
 * Automatically rate-limited and concurrency-controlled.
 *
 * @example
 * const companies = await b2b.sql<Company[]>("SELECT * FROM linkedin_company WHERE domain = 'stripe.com'");
 */
async function sql(query) {
    return queue(async () => {
        return rateLimiter(async () => {
            const response = await fetch(config.proxyUrl, {
                method: "POST",
                headers: { "Content-Type": "application/json" },
                body: JSON.stringify({ sql: query }),
            });
            if (!response.ok) {
                throw new Error(`B2B SQL request failed: ${response.status} ${response.statusText}`);
            }
            const data = (await response.json());
            if (data.error) {
                throw new Error(`B2B SQL error: ${data.error}`);
            }
            return (data.rows || []);
        });
    });
}
/**
 * Execute a SQL query and get full result with metadata
 */
async function query(sqlQuery) {
    return queue(async () => {
        return rateLimiter(async () => {
            const response = await fetch(config.proxyUrl, {
                method: "POST",
                headers: { "Content-Type": "application/json" },
                body: JSON.stringify({ sql: sqlQuery }),
            });
            if (!response.ok) {
                throw new Error(`B2B SQL request failed: ${response.status} ${response.statusText}`);
            }
            const data = (await response.json());
            if (data.error) {
                throw new Error(`B2B SQL error: ${data.error}`);
            }
            return {
                rows: (data.rows || []),
                rowCount: data.rowCount || 0,
                duration_ms: data.duration_ms || 0,
            };
        });
    });
}
// Export as namespace
exports.b2b = {
    sql,
    query,
    configure,
};
package/dist/index.d.ts
ADDED
@@ -0,0 +1,29 @@
import { b2b } from "./b2b";
export { b2b };
/**
 * Main orangeslice namespace
 *
 * @example
 * import { orangeslice } from 'orangeslice';
 *
 * // Configure (optional)
 * orangeslice.b2b.configure({ concurrency: 3 });
 *
 * // Query - automatically rate-limited
 * const companies = await orangeslice.b2b.sql("SELECT * FROM linkedin_company WHERE domain = 'stripe.com'");
 *
 * // Multiple parallel calls are queued automatically
 * const results = await Promise.all([
 *   orangeslice.b2b.sql("SELECT * FROM linkedin_company WHERE universal_name = 'stripe'"),
 *   orangeslice.b2b.sql("SELECT * FROM linkedin_company WHERE universal_name = 'openai'"),
 *   orangeslice.b2b.sql("SELECT * FROM linkedin_company WHERE universal_name = 'meta'"),
 * ]);
 */
export declare const orangeslice: {
    b2b: {
        sql: typeof import("./b2b").sql;
        query: typeof import("./b2b").query;
        configure: typeof import("./b2b").configure;
    };
};
export default orangeslice;
package/dist/index.js
ADDED
@@ -0,0 +1,28 @@
"use strict";
Object.defineProperty(exports, "__esModule", { value: true });
exports.orangeslice = exports.b2b = void 0;
const b2b_1 = require("./b2b");
Object.defineProperty(exports, "b2b", { enumerable: true, get: function () { return b2b_1.b2b; } });
/**
 * Main orangeslice namespace
 *
 * @example
 * import { orangeslice } from 'orangeslice';
 *
 * // Configure (optional)
 * orangeslice.b2b.configure({ concurrency: 3 });
 *
 * // Query - automatically rate-limited
 * const companies = await orangeslice.b2b.sql("SELECT * FROM linkedin_company WHERE domain = 'stripe.com'");
 *
 * // Multiple parallel calls are queued automatically
 * const results = await Promise.all([
 *   orangeslice.b2b.sql("SELECT * FROM linkedin_company WHERE universal_name = 'stripe'"),
 *   orangeslice.b2b.sql("SELECT * FROM linkedin_company WHERE universal_name = 'openai'"),
 *   orangeslice.b2b.sql("SELECT * FROM linkedin_company WHERE universal_name = 'meta'"),
 * ]);
 */
exports.orangeslice = {
    b2b: b2b_1.b2b,
};
exports.default = exports.orangeslice;
package/dist/queue.d.ts
ADDED
@@ -0,0 +1,9 @@
/**
 * Simple concurrency queue - limits how many async operations run at once.
 * Any excess calls are queued and run when a slot opens.
 */
export declare function createQueue(concurrency: number): <T>(fn: () => Promise<T>) => Promise<T>;
/**
 * Rate limiter - ensures minimum delay between requests
 */
export declare function createRateLimiter(minDelayMs: number): <T>(fn: () => Promise<T>) => Promise<T>;
package/dist/queue.js
ADDED
@@ -0,0 +1,48 @@
"use strict";
Object.defineProperty(exports, "__esModule", { value: true });
exports.createQueue = createQueue;
exports.createRateLimiter = createRateLimiter;
/**
 * Simple concurrency queue - limits how many async operations run at once.
 * Any excess calls are queued and run when a slot opens.
 */
function createQueue(concurrency) {
    let active = 0;
    const pending = [];
    const next = () => {
        if (active < concurrency && pending.length > 0) {
            active++;
            const resolve = pending.shift();
            resolve();
        }
    };
    return async (fn) => {
        // Wait for a slot to open
        await new Promise((resolve) => {
            pending.push(resolve);
            next();
        });
        try {
            return await fn();
        }
        finally {
            active--;
            next();
        }
    };
}
/**
 * Rate limiter - ensures minimum delay between requests
 */
function createRateLimiter(minDelayMs) {
    let lastRequest = 0;
    return async (fn) => {
        const now = Date.now();
        const timeSinceLastRequest = now - lastRequest;
        if (timeSinceLastRequest < minDelayMs) {
            await new Promise((resolve) => setTimeout(resolve, minDelayMs - timeSinceLastRequest));
        }
        lastRequest = Date.now();
        return fn();
    };
}
package/docs/B2B_CROSS_TABLE_TEST_FINDINGS.md
ADDED
@@ -0,0 +1,255 @@
# B2B Cross-Table Query Test Findings

Comprehensive performance comparison between normalized tables (`linkedin_profile`, `linkedin_company`) and denormalized views (`lkd_profile`, `lkd_company`) for cross-table queries.

---

## Executive Summary

| Pattern                            | Normalized | Denormalized | Winner       | Speedup |
| ---------------------------------- | ---------- | ------------ | ------------ | ------- |
| **Company ID lookup → employees**  | 48ms       | 279ms        | Normalized   | 5.8x    |
| **Company name (org) search**      | 274ms      | 8,600ms      | Normalized   | 31x     |
| **GIN-indexed org ILIKE**          | 430ms      | 29,409ms     | Normalized   | 68x     |
| **Title ILIKE (common term)**      | 64ms       | 313ms        | Normalized   | 4.9x    |
| **updated_at filter**              | 4ms        | 14ms         | Normalized   | 3.5x    |
| **Company ID direct lookup**       | 4ms        | 31ms         | Normalized   | 7.8x    |
| **Headline (rare term)**           | 2,530ms    | 1,258ms      | Denormalized | 2x      |
| **Skill array search**             | 216ms      | 169ms        | Denormalized | 1.3x    |
| **Industry + employee_count**      | 742ms      | 202ms        | Denormalized | 3.7x    |
| **Headline + company size (JOIN)** | 20,205ms   | 217ms        | Denormalized | 93x     |
| **Multi-skill + company size**     | 28,173ms   | 1,281ms      | Denormalized | 22x     |
| **Skill + company industry**       | TIMEOUT    | 3,553ms      | Denormalized | ∞       |
| **Complex multi-filter + company** | TIMEOUT    | 4,947ms      | Denormalized | ∞       |
| **AI company + SF location**       | TIMEOUT    | 11,061ms     | Denormalized | ∞       |

**Key Finding**: When combining profile text filters (headline, skills) with company constraints (employee_count, industry), **denormalized JOINs are 22-93x faster** and are often the only option that completes within the timeout.
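
The Winner and Speedup columns follow directly from the two timings: the faster side wins, and the speedup is the slower time divided by the faster one. A small sketch of that arithmetic, with `compare` as a hypothetical helper and `null` standing in for TIMEOUT:

```typescript
type Verdict = { winner: "Normalized" | "Denormalized" | "Tie"; speedup: number };

// Hypothetical helper: derive winner and speedup from two timings in ms.
// A null timing models a TIMEOUT, which the table reports as an infinite speedup.
function compare(normalizedMs: number | null, denormalizedMs: number | null): Verdict {
  if (normalizedMs === null && denormalizedMs === null) return { winner: "Tie", speedup: 1 };
  if (normalizedMs === null) return { winner: "Denormalized", speedup: Infinity };
  if (denormalizedMs === null) return { winner: "Normalized", speedup: Infinity };
  if (normalizedMs === denormalizedMs) return { winner: "Tie", speedup: 1 };
  const fast = Math.min(normalizedMs, denormalizedMs);
  const slow = Math.max(normalizedMs, denormalizedMs);
  return {
    winner: normalizedMs < denormalizedMs ? "Normalized" : "Denormalized",
    speedup: slow / fast,
  };
}

// Timings taken from the Executive Summary table.
console.log(compare(48, 279).speedup.toFixed(1));    // 5.8 (Company ID lookup → employees)
console.log(compare(20205, 217).speedup.toFixed(0)); // 93 (Headline + company size)
console.log(compare(null, 3553).winner);             // Denormalized (Skill + company industry)
```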

---

## Critical Pattern: Profile + Company Combined Filters

The most important discovery: **cross-table queries with text filters perform dramatically better with denormalized tables**.

### Normalized Multi-JOIN (Often Fails)

```sql
-- ❌ TIMEOUT or 20+ seconds
SELECT lp.id, lp.first_name, lp.headline, lc.company_name
FROM linkedin_profile lp
JOIN linkedin_profile_position3 pos ON pos.linkedin_profile_id = lp.id
JOIN linkedin_company lc ON lc.id = pos.linkedin_company_id
WHERE pos.end_date IS NULL
  AND lp.headline ILIKE '%engineer%'
  AND lc.employee_count > 1000
LIMIT 50
-- Result: 20,205ms
```

### Denormalized JOIN (Fast)

```sql
-- ✅ 217ms - 93x faster
SELECT lkd.profile_id, lkd.first_name, lkd.headline, lkdc.name
FROM lkd_profile lkd
JOIN lkd_company lkdc ON lkdc.linkedin_company_id = lkd.linkedin_company_id
WHERE lkd.headline ILIKE '%engineer%'
  AND lkdc.employee_count > 1000
LIMIT 50
-- Result: 217ms
```

---

## Test Results by Category

### A. Company-First Queries

| Test | Query                           | Normalized | Denormalized | Winner            |
| ---- | ------------------------------- | ---------- | ------------ | ----------------- |
| A1   | Employees at company ID         | **48ms**   | 279ms        | Normalized (5.8x) |
| A2   | Employees by company name (org) | **274ms**  | 8,600ms      | Normalized (31x)  |
| A3   | Engineers at large companies    | **96ms**   | 234ms        | Normalized (2.4x) |

**Conclusion**: For company-first queries, normalized tables win due to indexed lookups.

### B. Profile-First Queries

| Test | Query                      | Normalized | Denormalized | Winner              |
| ---- | -------------------------- | ---------- | ------------ | ------------------- |
| B1   | Python developers          | 216ms      | **169ms**    | Denormalized (1.3x) |
| B2   | US Data Scientists         | 644ms      | **557ms**    | Denormalized (1.2x) |
| B3   | Senior engineers + company | 4,535ms    | **196ms**    | Denormalized (23x)  |

**Conclusion**: Simple profile queries are similar; profile + company queries favor denormalized.

### C. Complex Prospecting Queries

| Test | Query                                     | Normalized  | Denormalized | Winner            |
| ---- | ----------------------------------------- | ----------- | ------------ | ----------------- |
| C1   | Decision makers at funded startups        | **1,198ms** | 3,124ms      | Normalized (2.6x) |
| C2   | AI company employees in SF                | TIMEOUT     | **11,061ms** | Denormalized (∞)  |
| C3   | Hybrid (normalized profile + lkd_company) | 9,631ms     | -            | -                 |

**Conclusion**: When funding table is used (indexed JOIN), normalized wins. When text filters span tables, denormalized wins.

### D. Company Lookups

| Test | Query                      | Normalized | Denormalized | Winner              |
| ---- | -------------------------- | ---------- | ------------ | ------------------- |
| D1   | Company by ID              | **4ms**    | 31ms         | Normalized (7.8x)   |
| D2   | Industry + employee filter | 742ms      | **202ms**    | Denormalized (3.7x) |

### E. Edge Cases

| Test | Query                 | Normalized | Denormalized | Winner              |
| ---- | --------------------- | ---------- | ------------ | ------------------- |
| E1   | Headline (blockchain) | 713ms      | **384ms**    | Denormalized (1.9x) |
| E2   | Company description   | 144ms      | 152ms        | Tie                 |

### F. Verification Tests

| Test | Query                        | Normalized | Denormalized | Winner              |
| ---- | ---------------------------- | ---------- | ------------ | ------------------- |
| F1   | Multi-skill + company size   | 28,173ms   | **1,281ms**  | Denormalized (22x)  |
| F2   | Country + org (GIN)          | **990ms**  | 4,594ms      | Normalized (4.6x)   |
| F3   | Title regex + company filter | 434ms      | **227ms**    | Denormalized (1.9x) |

### G. Index Pattern Tests

| Test | Query                     | Normalized | Denormalized | Winner            |
| ---- | ------------------------- | ---------- | ------------ | ----------------- |
| G1   | org ILIKE (GIN indexed)   | **430ms**  | 29,409ms     | Normalized (68x)  |
| G2   | headline ILIKE (no index) | 2,530ms    | **1,258ms**  | Denormalized (2x) |
| G3   | title ILIKE               | **64ms**   | 313ms        | Normalized (4.9x) |
| G4   | updated_at filter         | **4ms**    | 14ms         | Normalized (3.5x) |

### H. Cross-Table JOIN Patterns

| Test | Query                     | Normalized | Denormalized | Winner             |
| ---- | ------------------------- | ---------- | ------------ | ------------------ |
| H1   | Headline + employee_count | 20,205ms   | **217ms**    | Denormalized (93x) |
| H2   | Skill + company industry  | TIMEOUT    | **3,553ms**  | Denormalized (∞)   |
| H3   | Multi-filter + company    | TIMEOUT    | **4,947ms**  | Denormalized (∞)   |

---

## Decision Rules for Cross-Table Queries

### Use Normalized (`linkedin_profile` + `linkedin_company` JOINs) When:

1. **Company-first lookup** - Start with company ID, get employees
2. **GIN-indexed field** - Searching `linkedin_profile.org` (company name)
3. **Indexed lookups** - `updated_at`, company ID, profile ID
4. **Title field search** - `linkedin_profile.title` is faster
5. **Indexed JOIN tables** - `linkedin_crunchbase_funding`, `linkedin_profile_position3` by company

### Use Denormalized (`lkd_profile` JOIN `lkd_company`) When:

1. **Headline + company filter** - 93x faster
2. **Skill + company constraint** - Normalized times out
3. **Multi-filter combinations** - 22x faster
4. **Industry + employee_count** - 3.7x faster
5. **Text filter spanning profile + company** - Often only option

### Never Use:

1. `lkd_profile.company_name` ILIKE - Use `linkedin_profile.org` (68x faster)
2. Normalized multi-JOIN with headline filter - Will timeout or be 20s+

---

## Recommended Query Patterns

### Pattern 1: Find Employees at Company by Name

```sql
-- ✅ BEST: Use GIN-indexed org field
SELECT id, first_name, title, headline, org
FROM linkedin_profile
WHERE org ILIKE '%Google%'
LIMIT 50
-- Result: 274ms
```

### Pattern 2: Find Engineers at Large Companies

```sql
-- ✅ BEST: Denormalized JOIN (93x faster)
SELECT lkd.profile_id, lkd.first_name, lkd.headline, lkdc.name, lkdc.employee_count
FROM lkd_profile lkd
JOIN lkd_company lkdc ON lkdc.linkedin_company_id = lkd.linkedin_company_id
WHERE lkd.headline ILIKE '%engineer%'
  AND lkdc.employee_count > 1000
LIMIT 50
-- Result: 217ms
```

### Pattern 3: Find People with Skills at Specific Company Types

```sql
-- ✅ BEST: Denormalized (normalized times out)
SELECT lkd.profile_id, lkd.first_name, lkd.headline, lkdc.name
FROM lkd_profile lkd
JOIN lkd_company lkdc ON lkdc.linkedin_company_id = lkd.linkedin_company_id
WHERE 'Python' = ANY(lkd.skills)
  AND 'SQL' = ANY(lkd.skills)
  AND lkdc.employee_count BETWEEN 100 AND 5000
LIMIT 50
-- Result: 1,281ms (normalized: 28,173ms)
```

### Pattern 4: Prospecting Query (Profile Criteria + Company Criteria)

```sql
-- ✅ BEST: Denormalized for multi-filter
SELECT lkd.profile_id, lkd.first_name, lkd.title, lkdc.name, lkdc.employee_count
FROM lkd_profile lkd
JOIN lkd_company lkdc ON lkdc.linkedin_company_id = lkd.linkedin_company_id
WHERE lkd.title ~* '(manager|director|lead)'
  AND lkdc.employee_count BETWEEN 100 AND 1000
LIMIT 50
-- Result: 227ms (normalized: 434ms)
```

### Pattern 5: Decision Makers at Funded Startups

```sql
-- ✅ BEST: Normalized when using indexed funding table
SELECT DISTINCT lp.id, lp.first_name, lp.title, lc.company_name
FROM linkedin_profile lp
JOIN linkedin_profile_position3 pos ON pos.linkedin_profile_id = lp.id
JOIN linkedin_company lc ON lc.id = pos.linkedin_company_id
JOIN linkedin_crunchbase_funding cf ON cf.linkedin_company_id = lc.id
WHERE pos.end_date IS NULL
  AND lp.title ~* '(CEO|CTO|VP|Director|Head)'
  AND lc.employee_count BETWEEN 10 AND 500
LIMIT 50
-- Result: 1,198ms
```

---

## Summary: The Cross-Table Golden Rules

1. **Company name search** → Always use `linkedin_profile.org` (GIN indexed, 68x faster)
2. **Headline/skill + company constraint** → Always use denormalized JOIN (22-93x faster, normalized often times out)
3. **Company-first lookups** → Use normalized (5-8x faster)
4. **Indexed table JOINs (funding, positions)** → Normalized is fine
5. **Multi-filter profile + company** → Denormalized is the only option that works

### Quick Decision:

```
Need to search by company name?
  └─ YES → Use linkedin_profile.org

Need profile text filter (headline/skills) + company constraint?
  └─ YES → Use lkd_profile JOIN lkd_company

Need company ID lookup or indexed JOIN?
  └─ YES → Use normalized tables

Default for prospecting queries:
  └─ Use lkd_profile JOIN lkd_company
```
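
The Quick Decision tree can also be written as a small function. This is a sketch only: `QueryNeeds`, `TableChoice`, and `chooseTables` are hypothetical names that are not part of the package, and the checks run in the order the tree lists them.

```typescript
// Hypothetical description of what a query needs; not part of the package.
interface QueryNeeds {
  companyNameSearch?: boolean;      // search by company name
  profileTextFilter?: boolean;      // headline or skills filter
  companyConstraint?: boolean;      // e.g. employee_count or industry
  companyIdOrIndexedJoin?: boolean; // company ID lookup or indexed JOIN
}

type TableChoice =
  | "linkedin_profile.org"
  | "lkd_profile JOIN lkd_company"
  | "normalized tables";

function chooseTables(needs: QueryNeeds): TableChoice {
  if (needs.companyNameSearch) return "linkedin_profile.org";
  if (needs.profileTextFilter && needs.companyConstraint) return "lkd_profile JOIN lkd_company";
  if (needs.companyIdOrIndexedJoin) return "normalized tables";
  // Default for prospecting queries.
  return "lkd_profile JOIN lkd_company";
}

console.log(chooseTables({ profileTextFilter: true, companyConstraint: true }));
// lkd_profile JOIN lkd_company
```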