mojulo 0.0.0 → 0.1.1

Files changed (121)
  1. package/README.md +54 -4
  2. package/lib/audit-logger-new.js +11 -0
  3. package/lib/auth/gate.js +25 -0
  4. package/lib/auth/service.js +17 -0
  5. package/lib/auth/session.js +63 -0
  6. package/lib/builder/chat-processor.js +607 -0
  7. package/lib/builder/composer-bridge.js +82 -0
  8. package/lib/builder/evaluator.js +159 -0
  9. package/lib/builder/executor.js +252 -0
  10. package/lib/builder/index.js +48 -0
  11. package/lib/builder/session.js +248 -0
  12. package/lib/builder/system-prompt.js +422 -0
  13. package/lib/builder/tone-presets.js +75 -0
  14. package/lib/builder/tool-executors.js +1527 -0
  15. package/lib/builder/tools.js +338 -0
  16. package/lib/builder/validators.js +239 -0
  17. package/lib/composer/composer.js +225 -0
  18. package/lib/composer/index.js +40 -0
  19. package/lib/composer/protocols/00_base.txt +19 -0
  20. package/lib/composer/protocols/01_knowledge.txt +9 -0
  21. package/lib/composer/protocols/02_form-gathering.txt +32 -0
  22. package/lib/composer/protocols/03_appointments.txt +16 -0
  23. package/lib/composer/protocols/04_triage.txt +15 -0
  24. package/lib/composer/protocols/05_optical-read.txt +22 -0
  25. package/lib/composer/response-builder.js +98 -0
  26. package/lib/config-builder.js +650 -0
  27. package/lib/db/ids.js +10 -0
  28. package/lib/db/index.js +179 -0
  29. package/lib/db/repositories/apiKeys.js +72 -0
  30. package/lib/db/repositories/auditLogs.js +12 -0
  31. package/lib/db/repositories/botSpaces.js +12 -0
  32. package/lib/db/repositories/builderSessions.js +312 -0
  33. package/lib/db/repositories/deploymentEvents.js +12 -0
  34. package/lib/db/repositories/deployments.js +385 -0
  35. package/lib/db/repositories/documents.js +68 -0
  36. package/lib/db/repositories/mcpJobs.js +84 -0
  37. package/lib/deployers/bot-fleet.js +110 -0
  38. package/lib/deployers/bot-proxy.js +72 -0
  39. package/lib/deployers/build.js +89 -0
  40. package/lib/deployers/cloud-deploy.js +310 -0
  41. package/lib/deployers/docker.js +439 -0
  42. package/lib/deployers/fly.js +432 -0
  43. package/lib/deployers/index.js +38 -0
  44. package/lib/deployment-auth.js +36 -0
  45. package/lib/document-parser.js +171 -0
  46. package/lib/embedder/chunker.js +93 -0
  47. package/lib/embedder/local.js +101 -0
  48. package/lib/embedder/preview-rag.js +93 -0
  49. package/lib/envelope-schema.js +54 -0
  50. package/lib/fleet/scoped-sql.js +342 -0
  51. package/lib/form-schema-config/base.js +135 -0
  52. package/lib/form-schema-config/index.js +286 -0
  53. package/lib/form-schema-config/locales/af-ZA.js +153 -0
  54. package/lib/form-schema-config/locales/ar-AE.js +142 -0
  55. package/lib/form-schema-config/locales/ar-SA.js +164 -0
  56. package/lib/form-schema-config/locales/de-DE.js +152 -0
  57. package/lib/form-schema-config/locales/en-AU.js +161 -0
  58. package/lib/form-schema-config/locales/en-CA.js +115 -0
  59. package/lib/form-schema-config/locales/en-GB.js +132 -0
  60. package/lib/form-schema-config/locales/en-IN.js +219 -0
  61. package/lib/form-schema-config/locales/en-MY.js +171 -0
  62. package/lib/form-schema-config/locales/en-NG.js +198 -0
  63. package/lib/form-schema-config/locales/en-PH.js +186 -0
  64. package/lib/form-schema-config/locales/en-SG.js +153 -0
  65. package/lib/form-schema-config/locales/en-US.js +138 -0
  66. package/lib/form-schema-config/locales/es-ES.js +171 -0
  67. package/lib/form-schema-config/locales/es-MX.js +193 -0
  68. package/lib/form-schema-config/locales/fr-CA.js +138 -0
  69. package/lib/form-schema-config/locales/fr-FR.js +155 -0
  70. package/lib/form-schema-config/locales/hi-IN.js +219 -0
  71. package/lib/form-schema-config/locales/it-IT.js +157 -0
  72. package/lib/form-schema-config/locales/ja-JP.js +169 -0
  73. package/lib/form-schema-config/locales/ko-KR.js +140 -0
  74. package/lib/form-schema-config/locales/nl-NL.js +149 -0
  75. package/lib/form-schema-config/locales/pt-BR.js +168 -0
  76. package/lib/form-schema-config/locales/zh-CN.js +172 -0
  77. package/lib/form-schema-config/locales/zh-HK.js +142 -0
  78. package/lib/form-structure-schema.js +191 -0
  79. package/lib/llm-providers.js +828 -0
  80. package/lib/markdown.js +197 -0
  81. package/lib/mcp/catalysts/appointment-to-calendar.md +84 -0
  82. package/lib/mcp/catalysts/conversations-to-channel-digest.md +104 -0
  83. package/lib/mcp/catalysts/document-extract-to-store.md +92 -0
  84. package/lib/mcp/catalysts/knowledge-gap-miner.md +96 -0
  85. package/lib/mcp/catalysts/loader.js +144 -0
  86. package/lib/mcp/catalysts/qualify-lead-to-crm.md +83 -0
  87. package/lib/mcp/catalysts/scan-conversations-for-signal.md +92 -0
  88. package/lib/mcp/catalysts/submission-to-ticket.md +83 -0
  89. package/lib/mcp/catalysts/submissions-to-warehouse.md +103 -0
  90. package/lib/mcp/catalysts/weekly-submissions-digest.md +82 -0
  91. package/lib/mcp/jobs.js +64 -0
  92. package/lib/mcp/server.js +184 -0
  93. package/lib/mcp/session-binding.js +130 -0
  94. package/lib/mcp/tools/build.js +123 -0
  95. package/lib/mcp/tools/catalysts.js +477 -0
  96. package/lib/mcp/tools/context.js +325 -0
  97. package/lib/mcp/tools/fleet.js +391 -0
  98. package/lib/mcp/tools/jobs-tools.js +240 -0
  99. package/lib/mcp/tools/operate.js +314 -0
  100. package/lib/preview/build-preview-config.js +136 -0
  101. package/lib/rate-limiter.js +11 -0
  102. package/lib/resolve-api-key.js +142 -0
  103. package/lib/storage/index.js +40 -0
  104. package/messages/de.json +2136 -0
  105. package/messages/en.json +2136 -0
  106. package/messages/es.json +2136 -0
  107. package/messages/fr.json +2136 -0
  108. package/messages/it.json +2136 -0
  109. package/messages/ja.json +2136 -0
  110. package/messages/ko.json +2136 -0
  111. package/messages/nl.json +2136 -0
  112. package/messages/pl.json +2136 -0
  113. package/messages/pt.json +2136 -0
  114. package/messages/ru.json +2136 -0
  115. package/messages/uk.json +2136 -0
  116. package/messages/zh.json +2136 -0
  117. package/package.json +68 -5
  118. package/scripts/mcp-config.mjs +162 -0
  119. package/scripts/mcp-stdio-loader.mjs +42 -0
  120. package/scripts/mcp-stdio.mjs +108 -0
  121. package/scripts/mojulo-paths.mjs +48 -0
@@ -0,0 +1,144 @@
+ /**
+  * Skill catalyst loader.
+  *
+  * Skill catalysts are curated workflow patterns shipped with the control
+  * plane. They live as .md files in this directory and are exposed via the MCP
+  * server so the user's Claude can pull one, read the bot's shape via existing
+  * operate tools, and **catalyze** the synthesis of a concrete local skill into
+  * the user's `.claude/skills/`.
+  *
+  * The "catalyst" framing is literal: each file enables one phase transition
+  * from a vague user intent + a bot's shape + a destination MCP into a
+  * structured skill artifact. The catalyst is not consumed (the file persists
+  * and can catalyze again for the next bot) and does not appear in the
+  * resulting skill — it's the nucleation point that lets the skill crystallize
+  * out.
+  *
+  * Mojulo only ships the canonical library — there is no user-writable
+  * catalyst directory. Custom or one-off patterns are Claude Code's
+  * responsibility: a user wanting a bespoke workflow either lets Claude
+  * synthesize from scratch or maintains their own catalyst-shaped markdown
+  * locally.
+  *
+  * File format — JSON frontmatter between two `---` fences, then markdown body:
+  *
+  *   ---
+  *   { "id": "...", "name": "...", ... }
+  *   ---
+  *
+  *   # Body markdown the model reads at synthesis time.
+  *
+  * Frontmatter is JSON (not YAML) to keep the loader dep-free and the parse
+  * unambiguous. The body is the value — it's the prompt Claude reads to write
+  * the user's skill, so it carries the workflow reasoning, mapping intent, and
+  * pitfalls.
+  *
+  * Validation faults are loader bugs (the library is curated, not user input)
+  * — we throw with a clear file + field reference so a bad PR fails loudly.
+  */
+
+ import { readdirSync, readFileSync } from 'node:fs';
+ import { join, dirname } from 'node:path';
+ import { fileURLToPath } from 'node:url';
+
+ const CATALYST_DIR = dirname(fileURLToPath(import.meta.url));
+
+ // `valueHook` is required: it's the consultation-mode sentence we read aloud
+ // to position the catalyst when surfacing it via `recommend_catalysts` — one
+ // sentence in user-outcome terms ("turn yesterday's submissions into qualified
+ // CRM contacts"). Without it, the agent has only the `summary` (which is
+ // implementation-shaped) and consultation suggestions sound bureaucratic.
+ const REQUIRED_FIELDS = ['id', 'name', 'summary', 'valueHook'];
+ const FRONTMATTER_FENCE = /^---\s*\n([\s\S]*?)\n---\s*\n?/;
+
+ let _catalog = null;
+
+ function parseCatalystFile(filePath, raw) {
+   const match = raw.match(FRONTMATTER_FENCE);
+   if (!match) {
+     throw new Error(
+       `Catalyst ${filePath} is missing JSON frontmatter (expected '---' fences).`
+     );
+   }
+   let meta;
+   try {
+     meta = JSON.parse(match[1]);
+   } catch (err) {
+     throw new Error(`Catalyst ${filePath} has invalid JSON frontmatter: ${err.message}`);
+   }
+   for (const field of REQUIRED_FIELDS) {
+     if (!meta[field] || typeof meta[field] !== 'string') {
+       throw new Error(`Catalyst ${filePath} is missing required string field '${field}'.`);
+     }
+   }
+   const body = raw.slice(match[0].length).trim();
+   if (!body) {
+     throw new Error(`Catalyst ${filePath} has an empty body — the prose is the catalyst's value.`);
+   }
+   return {
+     id: meta.id,
+     name: meta.name,
+     summary: meta.summary,
+     valueHook: meta.valueHook,
+     version: meta.version ?? 1,
+     category: meta.category || null,
+     requires: meta.requires || {},
+     parameters: Array.isArray(meta.parameters) ? meta.parameters : [],
+     mcpTools: meta.mcpTools || {},
+     body,
+   };
+ }
+
+ function loadCatalog() {
+   const files = readdirSync(CATALYST_DIR).filter((f) => f.endsWith('.md'));
+   const catalysts = new Map();
+   for (const file of files) {
+     const filePath = join(CATALYST_DIR, file);
+     const raw = readFileSync(filePath, 'utf8');
+     const catalyst = parseCatalystFile(filePath, raw);
+     if (catalysts.has(catalyst.id)) {
+       throw new Error(
+         `Catalyst id collision: '${catalyst.id}' is declared in both ${catalysts.get(catalyst.id)._file} and ${file}.`
+       );
+     }
+     catalyst._file = file;
+     catalysts.set(catalyst.id, catalyst);
+   }
+   return catalysts;
+ }
+
+ export function getCatalystCatalog() {
+   if (!_catalog) _catalog = loadCatalog();
+   return _catalog;
+ }
+
+ export function listCatalysts({ category } = {}) {
+   const catalog = getCatalystCatalog();
+   const out = [];
+   for (const catalyst of catalog.values()) {
+     if (category && catalyst.category !== category) continue;
+     out.push({
+       id: catalyst.id,
+       name: catalyst.name,
+       summary: catalyst.summary,
+       valueHook: catalyst.valueHook,
+       category: catalyst.category,
+       requires: catalyst.requires,
+     });
+   }
+   return out;
+ }
+
+ export function getCatalyst(id) {
+   const catalyst = getCatalystCatalog().get(id);
+   if (!catalyst) return null;
+   const { _file, ...rest } = catalyst;
+   return rest;
+ }
+
+ // Test seam: lets the test suite inject a prebuilt catalog Map, or pass nothing to clear the cache and force a reload.
+ export function _resetCatalogForTests(catalog) {
+   _catalog = catalog || null;
+ }
+
+ export { CATALYST_DIR as _CATALYST_DIR_FOR_TESTS, parseCatalystFile as _parseCatalystFileForTests };
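The file format described in the loader's header comment can be exercised in isolation. The following is a minimal sketch that reuses the same fence pattern on an in-memory string; the `splitCatalyst` helper and the `demo-catalyst` text are hypothetical and exist only for illustration, they are not part of the package.

```javascript
// Same fence pattern the loader uses for JSON frontmatter.
const FRONTMATTER_FENCE = /^---\s*\n([\s\S]*?)\n---\s*\n?/;

// Hypothetical catalyst text, for illustration only.
const raw = [
  '---',
  '{ "id": "demo-catalyst", "name": "Demo", "summary": "s", "valueHook": "v" }',
  '---',
  '',
  '# Body markdown the model reads at synthesis time.',
].join('\n');

// Hypothetical helper: split a catalyst string into parsed metadata and body.
function splitCatalyst(text) {
  const match = text.match(FRONTMATTER_FENCE);
  if (!match) throw new Error('missing JSON frontmatter');
  return {
    meta: JSON.parse(match[1]),
    body: text.slice(match[0].length).trim(),
  };
}

const parsed = splitCatalyst(raw);
console.log(parsed.meta.id);
console.log(parsed.body);
```

Keeping the frontmatter as JSON means the split needs only one regular expression and `JSON.parse`, which is the dependency-free property the header comment calls out.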
@@ -0,0 +1,83 @@
+ ---
+ {
+   "id": "qualify-lead-to-crm",
+   "name": "Qualify lead and sync to CRM",
+   "summary": "Score new submissions against the user's rubric and create matching CRM records, skipping low-quality leads.",
+   "valueHook": "Turn yesterday's intake submissions into qualified CRM contacts overnight, deduped and scored.",
+   "version": 1,
+   "category": "crm-sync",
+   "requires": {
+     "protocols": ["formGathering"],
+     "destinationMcpCategory": "crm-like",
+     "destinationExamples": ["HubSpot", "Salesforce", "Pipedrive", "Attio", "Close"]
+   },
+   "parameters": [
+     {
+       "name": "qualifyingCriteria",
+       "prompt": "What makes a 'qualified' submission for your business? (rubric in plain English — e.g., specific industries, geographic area, deal size signals, accepted insurance carriers)"
+     },
+     {
+       "name": "scoreThreshold",
+       "prompt": "Minimum qualifying score (0-100) below which a submission is skipped?",
+       "default": 60
+     },
+     {
+       "name": "dedupeKey",
+       "prompt": "Which submission field detects duplicate contacts in the CRM? (typically email or phone)"
+     }
+   ],
+   "mcpTools": {
+     "mojulo": ["query_submissions", "get_deployment"],
+     "destination": {
+       "description": "A CRM-like MCP exposing search-by-property + contact/deal create. Examples: HubSpot, Salesforce, Pipedrive, Attio."
+     }
+   }
+ }
+ ---
+
+ # Qualify lead and sync to CRM
+
+ This catalyst turns a `formGathering` mojulo bot into a CRM intake pipeline. You score each new submission against the user's rubric, dedupe against existing CRM records, and create a contact (and optionally a deal) for the qualified ones.
+
+ ## How to synthesize the skill
+
+ 1. Call `get_deployment(deploymentId)` to read the bot's form schema. The synthesized skill's mapping is **derived from this schema** — never guess field names.
+ 2. Ask the user the three `parameters` questions in one round.
+ 3. Inspect the bound destination MCP to learn its contact-create surface (field names, required props, search-by-property tool). Field mapping is the catalyst's value-add — don't assume it's `name`/`email`/`phone` everywhere; HubSpot uses `firstname`/`lastname`/`email`, Salesforce uses `FirstName`/`LastName`/`Email`, Attio uses object/attribute pairs.
+ 4. Write `.claude/skills/<bot-slug>-crm-sync/SKILL.md` with the synthesized workflow. The skill takes `deploymentId` and `since` as inputs.
+
+ ## Mapping intent
+
+ The mojulo submission JSON has one entry per form field. Map by **field semantics**, not by position:
+
+ - Identity fields (email, phone, name) → CRM contact identity props. Use the configured `dedupeKey` to search-before-create.
+ - Categorical fields (industry, plan interest, source) → CRM contact properties or pipeline/stage tags on the created deal.
+ - Free-text fields (chief complaint, message, notes) → CRM contact `notes` or a follow-up activity log entry. Do **not** create a deal from free-text alone — these are the noisiest fields.
+ - Timestamp + submission id → store as `mojulo_submission_id` + `mojulo_captured_at` on the contact for traceability.
+
+ When the bot's form schema has a field that doesn't fit any CRM property, **ask the user** during synthesis where it should go. Don't silently drop fields.
+
+ ## Qualifying logic
+
+ Run each submission through a single LLM judgement against `qualifyingCriteria`. Return a score 0-100 and a one-sentence reason. The skill stores the score + reason on the CRM contact (a `mojulo_qualifying_score` property) so the user can audit why something was kept or skipped without re-running the classifier.
+
+ Submissions below `scoreThreshold` are logged but not pushed. The skill emits a decision log per run.
+
+ ## Idempotency
+
+ Use `since` as a high-water cursor on the submission timestamp. Each run pulls only submissions newer than `since`. The synthesized skill should print the new cursor at end of run so the user can pass it back next time — or wire it through a scheduler.
+
+ Independent of the cursor, **always search-before-create** on the `dedupeKey`. Two failure modes the cursor doesn't cover: a user re-runs an old window, or the same person submits twice. Search-before-create is the durable defense.
+
+ ## Pitfalls — surface these to the user
+
+ - **PII back through the LLM.** Form-gathering's design point is that PII bypasses the LLM at capture time. This skill deliberately reintroduces PII at routing time, since qualifying needs to read fields like email or chief complaint. Worth confirming the user is OK with this against the data-handling posture they advertised to end users.
+ - **Irreversible writes.** CRM contact creates are visible to sales reps and trigger downstream automations (welcome sequences, lead-rotation rules). Default the synthesized skill to a `--dry-run` mode that prints decisions without writing. The user explicitly opts into live writes per run.
+ - **Rate limits.** CRMs throttle aggressively. Process submissions serially with a small inter-call delay rather than parallelizing.
+ - **Field-mapping drift.** If the user later edits the bot's form schema, the skill's mapping goes stale silently. Recommend the user re-run the catalyst flow to regenerate the skill when they change the form.
+
+ ## Skill behavior contract
+
+ - **Inputs:** `deploymentId` (string, required), `since` (ISO timestamp, optional — defaults to last-cursor or 24h ago), `dryRun` (bool, default true)
+ - **Outputs:** a per-submission decision log: `{ submissionId, score, reason, action: 'created' | 'updated' | 'skipped-low-score' | 'skipped-duplicate', crmRecordId? }`
+ - **Side effects (live mode only):** CRM contact create/update via the bound MCP. No mojulo-side writes.
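The threshold, search-before-create, and cursor rules in this catalyst can be sketched as pure functions. This is a minimal illustration under stated assumptions, not the synthesized skill itself: `decideAction`, `nextCursor`, the submission shape, and the `knownKeys` set (standing in for the CRM's search-by-property call) are all hypothetical names, and the sketch performs no writes.

```javascript
// Hypothetical helpers: names and shapes are illustrative, not mojulo APIs.

// Decide what to do with one scored submission.
// knownKeys stands in for the CRM's search-by-property lookup on dedupeKey.
function decideAction(submission, { scoreThreshold = 60, knownKeys = new Set() } = {}) {
  if (submission.score < scoreThreshold) return 'skipped-low-score';
  if (knownKeys.has(submission.dedupeValue)) return 'skipped-duplicate';
  return 'created';
}

// Advance the high-water cursor to the newest capturedAt seen.
// Assumes every timestamp is an ISO 8601 UTC string in one consistent format,
// so plain string comparison matches chronological order.
function nextCursor(since, submissions) {
  return submissions.reduce((max, s) => (s.capturedAt > max ? s.capturedAt : max), since);
}

// Made-up sample data.
const submissions = [
  { id: 'sub_1', score: 82, dedupeValue: 'a@example.com', capturedAt: '2024-05-02T09:00:00Z' },
  { id: 'sub_2', score: 41, dedupeValue: 'b@example.com', capturedAt: '2024-05-02T10:30:00Z' },
  { id: 'sub_3', score: 90, dedupeValue: 'c@example.com', capturedAt: '2024-05-02T08:15:00Z' },
];
const knownKeys = new Set(['c@example.com']);

const decisions = submissions.map((s) => ({
  submissionId: s.id,
  score: s.score,
  action: decideAction(s, { scoreThreshold: 60, knownKeys }),
}));
const cursor = nextCursor('2024-05-01T00:00:00Z', submissions);

console.log(decisions);
console.log(cursor);
```

Checking the score before the duplicate lookup keeps low-scoring submissions from costing a CRM search call, which matters given the rate-limit pitfall listed above.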
@@ -0,0 +1,92 @@
+ ---
+ {
+   "id": "scan-conversations-for-signal",
+   "name": "Scan conversations for a signal",
+   "summary": "Sample recent bot conversations, scan each for a user-defined signal (churn intent, competitor mentions, recurring complaints), and route matches to an actuator MCP.",
+   "valueHook": "Sample recent conversations for a signal you care about — churn intent, competitor mentions, recurring complaints — and route matches where the team can act.",
+   "version": 1,
+   "category": "analysis",
+   "requires": {
+     "protocols": [],
+     "destinationMcpCategory": "actuator-like",
+     "destinationExamples": ["Linear", "Slack", "Notion", "Google Sheets"]
+   },
+   "parameters": [
+     {
+       "name": "signalDefinition",
+       "prompt": "What signal are you scanning for? (e.g., 'mentions of competitor X', 'churn intent or cancellation language', 'accessibility complaints', 'recurring feature requests')"
+     },
+     {
+       "name": "sampleSize",
+       "prompt": "How many recent conversations to scan per run?",
+       "default": 30
+     },
+     {
+       "name": "matchAction",
+       "prompt": "What should happen when the signal fires? (e.g., 'file a Linear ticket tagged voice-of-customer', 'post a Slack message to #cs-insights', 'append a row to a Notion database')"
+     }
+   ],
+   "mcpTools": {
+     "mojulo": ["query_conversations", "get_conversation", "get_deployment"],
+     "destination": {
+       "description": "Any MCP that can perform the configured matchAction. Examples: Linear (issue_create), Slack (post_message), Notion (append_block), Sheets (append_row)."
+     }
+   }
+ }
+ ---
+
+ # Scan conversations for a signal
+
+ This is the analytical counterpart to the write-side catalysts. Rather than acting on every submission, it samples conversations, looks for a specific signal in the turn text, and only acts when the signal fires. The point is **sampling, not sweeping** — a bounded scan lets the user tune their signal prompt against real conversations before scaling up.
+
+ This catalyst formalizes a recipe already documented in [docs/mcp-integration.md](docs/mcp-integration.md#recipes) §4 — the formal version adds parameterization and a behavior contract.
+
+ ## How to synthesize the skill
+
+ 1. `get_deployment(deploymentId)` — read the bot's protocols and identity. The signal prompt benefits from knowing what the bot is *for*; "churn intent" means different things on a support bot vs. a sales bot.
+ 2. Ask the user the three `parameters` questions.
+ 3. Inspect the destination MCP for the actuator surface implied by `matchAction`. The mapping from signal-match to action payload is the catalyst's value-add.
+ 4. Write `.claude/skills/<bot-slug>-scan-<signal-slug>/SKILL.md`. Naming includes the signal so multiple signal scans on the same bot don't collide.
+
+ ## Scan logic
+
+ 1. `query_conversations(deploymentId, since?)` to get summaries — already sorted by recency.
+ 2. Take the top `sampleSize`. For each, `get_conversation(deploymentId, conversationId)` to pull the turn list.
+ 3. Run a single LLM judgement per conversation against `signalDefinition`. Return: `{ matched: bool, evidence: '<quoted snippet, ≤200 chars>', confidence: 'low' | 'medium' | 'high' }`.
+ 4. For matches, fire `matchAction` with a payload that includes the conversation id, the evidence snippet, and a link/path back to the source.
+
+ ## Action payload composition
+
+ The synthesized skill should produce one action per match (not one batched action per run). The payload structure:
+
+ - A title or summary derived from `signalDefinition` and the matched conversation
+ - The evidence snippet **with surrounding context** (1 turn before, 1 turn after) — quoting in isolation loses meaning
+ - Conversation id + deployment id + bot name (mojulo trace)
+ - The chain verification URL `<bot-url>/verify/<conversationId>` so the reviewer can confirm authenticity
+
+ ## Sampling discipline
+
+ `sampleSize` defaults to 30 for a reason — it keeps each run's LLM cost predictable and bounded. The user can scale up once they've validated the signal definition holds up. Recommend the synthesized skill default to a small sample for the first few runs, then graduate.
+
+ For continuous monitoring, the right pattern is to combine this skill with `/schedule` so it runs on a cadence. Avoid trying to do "watch all conversations always" — there's no event surface for that, and the polling cost would be silly.
+
+ ## Multiple signals on one bot
+
+ Don't synthesize a multi-signal skill. Each signal gets its own skill instance. Reasons:
+
+ - The signal prompt is the brittle part: tuning one signal shouldn't risk regressing another.
+ - Sampling overlap is acceptable: two skills that each scan the most recent 30 conversations cost roughly twice as much as one.
+ - Action targets often differ per signal (churn → CS Slack; competitor → product Notion; feature request → product backlog).
+
81
+ ## Pitfalls
82
+
83
+ - **False positives flood the actuator.** A loose signal definition fires on too many conversations and floods the destination. The first run with a new signal should default to `--dry-run` so the user sees what would have fired before they wire it live.
84
+ - **Confidence calibration.** "High confidence" from the model doesn't mean the signal is real — it means the model is sure of its own judgement. Recommend the user spot-check 10-20 matches early on to calibrate.
85
+ - **PII through the LLM.** Conversation turns can contain sensitive content. Scanning by definition reads them. Same caveat as the other catalysts — confirm against the bot's data-handling posture.
86
+ - **Stale conversations.** `query_conversations` is unbounded by default. With no `since`, the sample drifts toward the oldest conversations. Always pass `since` (default: 7d).
87
+
88
+ ## Skill behavior contract
89
+
90
+ - **Inputs:** `deploymentId` (required), `sampleSize` (default 30), `since` (default 7d ago, ISO), `dryRun` (default true)
91
+ - **Outputs:** per-conversation scan log `{ conversationId, matched, confidence, evidence?, actionResult? }`
92
+ - **Side effects (live mode):** one destination-MCP action per match.
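The evidence rules in this catalyst (a snippet of at most 200 characters, quoted with one turn of context on each side) can be sketched as two small helpers. The names `clipEvidence` and `withContext` and the `{ role, text }` turn shape are assumptions made for illustration; the actual shape returned by `get_conversation` is not shown in this diff.

```javascript
// Hypothetical helpers; the { role, text } turn shape is an assumption.

// Keep the quoted snippet within the evidence length limit.
function clipEvidence(text, maxChars = 200) {
  return text.length <= maxChars ? text : text.slice(0, maxChars);
}

// Return the matched turn plus `radius` turns on each side,
// clamped to the bounds of the conversation.
function withContext(turns, matchIndex, radius = 1) {
  const start = Math.max(0, matchIndex - radius);
  const end = Math.min(turns.length, matchIndex + radius + 1);
  return turns.slice(start, end);
}

// Made-up conversation for illustration.
const turns = [
  { role: 'user', text: 'Hi, I have a billing question.' },
  { role: 'bot', text: 'Sure, what is going on?' },
  { role: 'user', text: 'I am thinking about cancelling my plan.' },
  { role: 'bot', text: 'Sorry to hear that. Can I ask why?' },
];

const context = withContext(turns, 2);
const evidence = clipEvidence(turns[2].text);

console.log(context.map((t) => `${t.role}: ${t.text}`).join('\n'));
console.log(evidence);
```

Clamping the window at both ends means a match on the first or last turn still yields a valid, shorter context slice rather than an error.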
@@ -0,0 +1,83 @@
+ ---
+ {
+   "id": "submission-to-ticket",
+   "name": "Submission to ITSM ticket",
+   "summary": "Turn new submissions (or triaged conversations) into tickets in Linear/Jira/ServiceNow with routing, priority, and assignment.",
+   "valueHook": "Bot intake becomes routed, prioritized tickets in your team's tracker — no manual triage step.",
+   "version": 1,
+   "category": "itsm",
+   "requires": {
+     "protocols": ["formGathering"],
+     "optionalProtocols": ["triage"],
+     "destinationMcpCategory": "ticketing-like",
+     "destinationExamples": ["Linear", "Jira", "ServiceNow", "GitHub Issues"]
+   },
+   "parameters": [
+     {
+       "name": "routingRules",
+       "prompt": "How should submissions route to teams/projects/assignment groups? (e.g., 'urgent complaints → on-call coordinator; billing issues → finance queue; general → support backlog')"
+     },
+     {
+       "name": "priorityRules",
+       "prompt": "What signals priority? (e.g., 'words like emergency/urgent → high; missed-appointment forms → high; everything else → normal')"
+     },
+     {
+       "name": "titleTemplate",
+       "prompt": "How should the ticket title be composed? (e.g., '[chief_complaint] — [name] ([conversation_id])')"
+     }
+   ],
+   "mcpTools": {
+     "mojulo": ["query_submissions", "get_conversation", "get_deployment"],
+     "destination": {
+       "description": "A ticketing-like MCP exposing issue/ticket create with title, body, priority, project/queue, and assignee fields. Examples: Linear, Jira, ServiceNow, GitHub Issues."
+     }
+   }
+ }
+ ---
+
+ # Submission to ITSM ticket
+
+ This catalyst wires a mojulo bot's submissions (and, when the `triage` protocol is enabled, the routing decision) into a ticketing system. Each submission becomes one ticket with derived priority, project/queue assignment, and a description rich enough that the assignee doesn't need to come back and read the original conversation.
+
+ ## How to synthesize the skill
+
+ 1. `get_deployment(deploymentId)` — read the form schema. If the bot has `triage` enabled, note the routes; they're hints for `routingRules`.
+ 2. Ask the user the three `parameters` questions.
+ 3. Inspect the destination MCP to learn its ticket-create surface, particularly the **project/queue identifier shape** (Linear team id, Jira project key, ServiceNow assignment group sys_id) and the **priority enum** (Linear: 1-4, Jira: named levels from Highest to Lowest by default, ServiceNow: 1-5).
+ 4. Write `.claude/skills/<bot-slug>-ticket-sync/SKILL.md`.
+
+ ## Routing logic
+
+ Apply `routingRules` as a single classification step per submission. The skill picks one project/queue per submission. If the rules don't match cleanly, default to a configured fallback queue rather than guessing — silent misrouting is worse than a clear "unsorted" pile a human can clear.
+
+ When the bot has `triage` enabled, **prefer the bot's own triage decision over re-classifying**. The triage protocol has already routed the conversation against the vector store; re-doing that work risks divergence. Use the triage label as the queue assignment; only apply `routingRules` for priority and any secondary routing axis.
+
+ ## Priority logic
+
+ Same shape: one LLM judgement per submission, returns a priority + one-sentence reason. The reason goes into the ticket body so reviewers see why something was tagged P0 vs P3.
+
+ ## Ticket body composition
+
+ The synthesized skill should build the body from:
+
+ 1. **Submission fields** rendered as a clean key/value list (use the form schema field labels, not raw keys).
+ 2. **Conversation excerpt** — pull the conversation via `get_conversation(deploymentId, conversationId)` and include the last 4-6 turns. Don't dump the whole thing; reviewers will skim.
+ 3. **Mojulo trace** — submission id, conversation id, deployment id, captured-at timestamp. Critical for incident response — the reviewer needs to be able to walk back to the source.
+ 4. **Verification link** — if the conversation has a `chain_hash`, include the bot's `/verify/<conversationId>` URL so the reviewer can confirm tamper-evidence on dispute.
+
+ ## Idempotency
+
+ `since` cursor + a `mojulo_submission_id` field on the ticket (most ticketing systems support custom fields or labels). Search-before-create to avoid double-filing on re-runs. If the system has no custom-field surface, append the submission id to the ticket title as `[sub:...]` and grep on retry.
+
+ ## Pitfalls
+
+ - **Triage-vs-rules conflict.** If both the bot's triage and the skill's `routingRules` apply, the user needs to know which wins. Default to triage. Make the synthesized skill comment this clearly.
+ - **PII in ticket bodies.** Tickets are often visible to wider teams than the form submission was intended for. If the bot collects SSN/DOB/financial info, ask the user during synthesis whether to redact those fields from the ticket body (store identifiers only) and link to the bot's submission view for the full record.
+ - **Alert fatigue.** A new bot may have a backlog of historical submissions. The first run with a wide `since` window can flood a queue. Recommend the user start with a narrow window or pipe the first batch into a triage project for review.
+ - **Closing the loop.** This skill creates tickets; it doesn't close them. Ticket lifecycle stays in the ITSM. If the user later wants the bot to know "this issue was resolved," that's a separate skill in the other direction (and not currently exposed).
+
+ ## Skill behavior contract
+
+ - **Inputs:** `deploymentId` (required), `since` (optional ISO), `dryRun` (default true), `fallbackQueue` (required for live mode — the queue used when routing fails)
+ - **Outputs:** per-submission decision log `{ submissionId, priority, queue, ticketId? }`
+ - **Side effects (live mode only):** ticket create via destination MCP. No mojulo-side writes.
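Two rules from this catalyst lend themselves to a short sketch: the queue precedence (triage label first, then `routingRules`, then the fallback queue) and the title-tag fallback for idempotency when the tracker has no custom fields. The function names and argument shapes below are hypothetical; only the `[sub:...]` tag convention comes from the catalyst text.

```javascript
// Hypothetical helpers: names and shapes are illustrative, not mojulo APIs.

// Queue precedence: the bot's triage label wins, then a routingRules match,
// then the configured fallback queue.
function pickQueue({ triageLabel, ruleMatch }, fallbackQueue) {
  return triageLabel ?? ruleMatch ?? fallbackQueue;
}

// Fallback idempotency marker for trackers without custom fields.
function tagTitle(title, submissionId) {
  return `${title} [sub:${submissionId}]`;
}

// Search-before-create against existing ticket titles.
// Matching on the full bracketed tag avoids sub_1 colliding with sub_10.
function alreadyFiled(existingTitles, submissionId) {
  const tag = `[sub:${submissionId}]`;
  return existingTitles.some((t) => t.includes(tag));
}

// Made-up sample data.
const existingTitles = [tagTitle('Billing dispute', 'sub_10')];

console.log(pickQueue({ triageLabel: 'billing', ruleMatch: 'support' }, 'unsorted'));
console.log(pickQueue({}, 'unsorted'));
console.log(alreadyFiled(existingTitles, 'sub_10'));
console.log(alreadyFiled(existingTitles, 'sub_1'));
```

Using `??` rather than `||` keeps the precedence explicit: only a missing (`null` or `undefined`) label falls through to the next source.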
@@ -0,0 +1,103 @@
1
+ ---
2
+ {
3
+ "id": "submissions-to-warehouse",
4
+ "name": "Submissions to data warehouse",
5
+ "summary": "Append form submissions to an analytical warehouse table (BigQuery/Snowflake/Postgres/Redshift) with stable schema and incremental cursor — append-only, no qualifying logic, ready for SQL analysis downstream.",
6
+ "valueHook": "Submissions land in your analytics warehouse with a stable schema, ready for SQL analysis and dashboards downstream.",
7
+ "version": 1,
8
+ "category": "warehouse",
9
+ "requires": {
10
+ "protocols": ["formGathering"],
11
+ "destinationMcpCategory": "warehouse-like",
12
+ "destinationExamples": ["BigQuery", "Snowflake", "Postgres", "Redshift", "DuckDB"]
13
+ },
14
+ "parameters": [
15
+ {
16
+ "name": "targetTable",
17
+ "prompt": "Fully-qualified target table name (e.g., 'analytics.mojulo_submissions', 'PROD.RAW.BOT_INTAKES'). Will be created if absent and the destination MCP supports DDL; otherwise the user creates it from the schema you propose."
18
+ },
19
+ {
20
+ "name": "columnMapping",
21
+ "prompt": "How should form fields map to warehouse columns? Provide field-name → column-name pairs plus the SQL type each column should use (STRING, TIMESTAMP, NUMERIC, BOOLEAN, JSON). For fields that aren't a clean fit, propose a JSON column and pack them there."
22
+ },
23
+ {
24
+ "name": "partitionStrategy",
25
+ "prompt": "How should the table be partitioned for query performance? (e.g., 'daily by captured_at', 'by deployment_id', 'none — small volume'). Defaults to daily if the warehouse supports time-partitioned tables.",
26
+ "default": "daily by captured_at"
27
+ },
28
+ {
29
+ "name": "backfillOnFirstRun",
30
+ "prompt": "On the first run, should the skill backfill all historical submissions, or start from now-forward only? Backfill is fine for low-volume bots; bounded windows are safer for high-volume.",
31
+ "default": "now-forward only"
32
+ }
33
+ ],
34
+ "mcpTools": {
35
+ "mojulo": ["query_submissions", "get_deployment"],
36
+ "destination": {
37
+ "description": "A warehouse-like MCP that exposes table create (optional) and row append/insert with named columns and types. Examples: BigQuery, Snowflake, Postgres, Redshift, DuckDB. The destination must support either bulk insert (preferred) or single-row insert. Streaming inserts are nice-to-have."
38
+ }
39
+ }
40
+ }
41
+ ---
42
+
43
+ # Submissions to data warehouse
44
+
45
+ This is the analytical-pipeline counterpart to `qualify-lead-to-crm`. Where that catalyst is **opinionated** (scoring, branching, dedupe-as-update), this one is **mechanical** (append-only, schema-of-record, no qualifying judgments). The output is a warehouse table that downstream analysts can query with SQL without knowing anything about mojulo's internals.
46
+
47
+ If you find yourself wanting "qualify the row before inserting" or "update an existing record on second submission" — that's the `qualify-lead-to-crm` catalyst, not this one. This catalyst's value is exactly its lack of opinions; every submission goes through, with the same shape, every time.
48
+
49
+ ## How to synthesize the skill
50
+
51
+ 1. `get_deployment(deploymentId)` — read the form schema. The schema **is** the source-of-truth for `columnMapping`. Never invent columns the bot's form doesn't produce; never silently drop form fields without surfacing them as candidates for a JSON column.
52
+ 2. Ask the user the four `parameters` questions, batched. Propose a default `columnMapping` derived from the form schema (best-guess types per field name) and let the user adjust — don't make them type the whole mapping from scratch.
53
+ 3. Inspect the destination MCP. Confirm it supports the insert path you need (bulk preferred, single-row acceptable). If the user's warehouse MCP only exposes query/read (no write), this catalyst doesn't apply — say so plainly rather than trying to force a path.
54
+ 4. Write `.claude/skills/<bot-slug>-warehouse-sync/SKILL.md`. The skill takes `deploymentId`, `since`, and `dryRun` as inputs (see the behavior contract).
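A minimal sketch of the default `columnMapping` proposal from step 2, assuming the form schema arrives as an array of `{ name, type }` field descriptors. That shape, the helper names, and the type-guessing table are illustrative assumptions, not mojulo's actual API.

```javascript
// Hypothetical lookup from a form field type to a warehouse column type.
// The field type names on the left are assumptions for illustration.
const SQL_TYPE_BY_FIELD_TYPE = new Map([
  ["date", "TIMESTAMP"],
  ["datetime", "TIMESTAMP"],
  ["number", "NUMERIC"],
  ["boolean", "BOOLEAN"],
  ["object", "JSON"],
]);

// Turn a form field name into a snake_case column identifier.
function toColumnName(fieldName) {
  return fieldName
    .replace(/([a-z0-9])([A-Z])/g, "$1_$2") // split camelCase boundaries
    .replace(/[^A-Za-z0-9]+/g, "_") // collapse spaces and punctuation
    .replace(/^_+|_+$/g, "") // trim leading/trailing underscores
    .toLowerCase();
}

// Propose one mapping entry per form field; the user adjusts from here.
function proposeColumnMapping(formFields) {
  return formFields.map((field) => ({
    field: field.name,
    column: toColumnName(field.name),
    // Unknown and free-text field types fall back to STRING.
    sqlType: SQL_TYPE_BY_FIELD_TYPE.get(field.type) ?? "STRING",
  }));
}
```

Falling back to STRING for anything unrecognized keeps the proposal conservative; the user can promote a column to a stricter type when reviewing the mapping.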
55
+
56
+ ## Schema design
57
+
58
+ Every row in the target table has the same shape, regardless of which form was submitted. The columns:
59
+
60
+ - **Universal trace columns** — `submission_id` (PRIMARY KEY), `deployment_id`, `conversation_id`, `captured_at` (TIMESTAMP), `ingested_at` (TIMESTAMP — when this skill ran). These are non-negotiable. They're what makes the warehouse rows joinable to other systems and analyzable over time.
61
+ - **Mapped form columns** — one column per form field, per `columnMapping`. Types chosen to match analytical use (TIMESTAMP for dates, NUMERIC for currency, STRING for free text, BOOLEAN for yes/no, JSON for nested).
62
+ - **Fallback `raw_extras` JSON column** — any form field the user didn't explicitly map lands here. Better to capture-and-defer than to drop. Analysts can JSON-extract later if a field turns out to matter.
63
+ - **Optional partition column** — typically `captured_at` derived to a date for daily partition pruning. Some warehouses (BigQuery) have explicit partition syntax; others (Snowflake) use clustering keys.
64
+
65
+ The schema should be **additive-friendly** — when the bot's form grows new fields, the synthesized skill should detect the unmapped field, route it to `raw_extras`, and emit a clear log line. The user can later promote it to a typed column with an ALTER + a one-time backfill from `raw_extras`. Don't try to auto-ALTER the schema from the skill; warehouse DDL is operator territory.
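The row shape described above can be sketched as follows. The submission shape (`id`, `deploymentId`, `conversationId`, `capturedAt`, `fields`) and the `buildRow` helper are assumptions for illustration; the real payload returned by `query_submissions` may differ.

```javascript
// Build one warehouse row from a submission.
// Assumed submission shape: { id, deploymentId, conversationId, capturedAt, fields }.
// columnMapping is an array of { field, column } pairs.
function buildRow(submission, columnMapping, ingestedAt) {
  // Universal trace columns come first and are always present.
  const row = {
    submission_id: submission.id,
    deployment_id: submission.deploymentId,
    conversation_id: submission.conversationId,
    captured_at: submission.capturedAt,
    ingested_at: ingestedAt,
  };

  const columnByField = new Map(columnMapping.map((m) => [m.field, m.column]));
  const rawExtras = {};
  const unmappedFields = [];

  for (const [field, value] of Object.entries(submission.fields)) {
    if (columnByField.has(field)) {
      row[columnByField.get(field)] = value;
    } else {
      // Capture-and-defer: unmapped fields go to raw_extras instead of being dropped.
      rawExtras[field] = value;
      unmappedFields.push(field);
    }
  }

  // Mapped columns absent from this submission become NULL so every row has the same shape.
  for (const column of columnByField.values()) {
    if (!(column in row)) row[column] = null;
  }

  row.raw_extras = JSON.stringify(rawExtras);
  return { row, unmappedFields };
}
```

Returning `unmappedFields` alongside the row lets the caller emit the log line for new form fields without altering the table.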
66
+
67
+ ## Incremental cursor
68
+
69
+ `since` cursor on `captured_at` is the primary mechanism. Each run:
70
+
71
+ 1. Read the table's `MAX(captured_at)` (or accept `since` as input override).
72
+ 2. `query_submissions(deploymentId, since=...)` to pull only newer rows.
73
+ 3. Bulk-insert the batch.
74
+ 4. Print the new high-water timestamp so the user can pass it back or wire it to a scheduler.
75
+
76
+ The synthesized skill should NOT rely on `submission_id` for incremental cursoring: IDs aren't guaranteed to be monotonic over time in the bot's SQLite. Always use the `captured_at` timestamp.
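The four steps above can be sketched as a single run function. `readHighWaterMark`, `querySubmissions`, and `bulkInsert` are hypothetical injected stand-ins for the destination and mojulo MCP calls, and the sketch assumes `capturedAt` values are ISO-8601 UTC strings in one consistent format, so string comparison matches chronological order.

```javascript
// One incremental run. All three callbacks are hypothetical stand-ins:
//   readHighWaterMark() -> MAX(captured_at) from the destination, or null if empty
//   querySubmissions(deploymentId, since) -> array of { capturedAt, ... }
//   bulkInsert(submissions) -> writes the batch to the destination
async function runIncremental({
  deploymentId,
  since,
  dryRun = true,
  readHighWaterMark,
  querySubmissions,
  bulkInsert,
}) {
  // Step 1: an explicit `since` overrides the table's high-water mark.
  const windowStart = since ?? (await readHighWaterMark());

  // Step 2: pull only submissions newer than the cursor.
  const submissions = await querySubmissions(deploymentId, windowStart);

  // Step 3: write the batch, unless this is a dry run or there is nothing to write.
  if (!dryRun && submissions.length > 0) {
    await bulkInsert(submissions);
  }

  // Step 4: compute the new high-water mark from captured_at, never from IDs.
  let newHighWaterMark = windowStart ?? null;
  for (const submission of submissions) {
    if (newHighWaterMark === null || submission.capturedAt > newHighWaterMark) {
      newHighWaterMark = submission.capturedAt;
    }
  }

  return {
    windowStart,
    rowsSeen: submissions.length,
    rowsInserted: dryRun ? 0 : submissions.length,
    newHighWaterMark,
  };
}
```

Injecting the three callbacks keeps the cursor logic testable without a live warehouse or bot.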
77
+
78
+ ## Idempotency
79
+
80
+ Two layers of defense, both important:
81
+
82
+ - **Cursor-based dedup (primary):** the `since` cursor advances past already-loaded rows. Re-running with the same `since` is a no-op when no new submissions exist.
83
+ - **`submission_id` PRIMARY KEY (safety net):** because cursor logic can fail (an operator manually re-runs an old window, clock skew), a second layer keyed on `submission_id` keeps double-inserts from silently duplicating rows. On Postgres the primary key is enforced, so use `INSERT ... ON CONFLICT DO NOTHING` and re-runs degrade to no-ops instead of errors. BigQuery, Snowflake, and Redshift accept a primary-key declaration but do not enforce it, so there the dedup has to come from the statement itself: use `MERGE ... WHEN NOT MATCHED THEN INSERT` keyed on `submission_id`.
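For the Postgres path, a hedged sketch of building one parameterized multi-row insert with `ON CONFLICT (submission_id) DO NOTHING`. The `buildIdempotentInsert` helper is hypothetical; it assumes the table has a primary key or unique constraint on `submission_id` and that table and column names come from the trusted `columnMapping`, never from submission data.

```javascript
// Build a single bulk INSERT that skips rows whose submission_id already exists.
// Uses Postgres-style positional placeholders ($1, $2, ...).
// `table` and `columns` must be trusted identifiers (they are interpolated as-is).
function buildIdempotentInsert(table, columns, rows) {
  // An INSERT with an empty VALUES list is invalid SQL, so signal "nothing to do".
  if (rows.length === 0) return null;

  const params = [];
  const tuples = rows.map((row) => {
    const placeholders = columns.map((column) => {
      params.push(row[column] ?? null); // missing values are inserted as NULL
      return `$${params.length}`;
    });
    return `(${placeholders.join(", ")})`;
  });

  const sql =
    `INSERT INTO ${table} (${columns.join(", ")}) ` +
    `VALUES ${tuples.join(", ")} ` +
    `ON CONFLICT (submission_id) DO NOTHING`;

  return { sql, params };
}
```

The returned `{ sql, params }` pair can be handed to whatever query call the destination MCP or driver exposes; a re-run over an already-loaded window then inserts zero rows instead of raising a duplicate-key error.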
84
+
85
+ ## Bulk vs. streaming
86
+
87
+ Default to **bulk inserts** (one statement per run, all rows in one batch). Reasons: cheaper per-row, atomic from the warehouse's perspective, easier to reason about for incremental loads. Streaming inserts are tempting for "near-real-time" but introduce per-row cost and break the idempotency story when retries land.
88
+
89
+ If the user's volume is high enough to need streaming (>~1000 submissions/hour sustained), this catalyst isn't the right tool — the bot's webhook ([server.js](../../lite-template/server.js)'s `/api/send-webhook`) is the architecturally-correct path for event-driven warehouse loading, and the skill becomes "drain the webhook DLQ" rather than "scan the bot's SQLite." Surface this distinction to the user if their submission rate is in that range.
90
+
91
+ ## Pitfalls
92
+
93
+ - **PII in the warehouse.** Warehouses are typically more broadly accessible than the bot's SQLite — analysts, BI tools, downstream pipelines, sometimes vendors. If the form captures sensitive fields (DOB, SSN, financial, medical), the synthesized skill should default to **excluding** those columns from the mapping and pointing the user at column-level encryption or a separate restricted-access table. The user can override, but the question must be asked explicitly during synthesis.
94
+ - **Schema drift.** When the bot's form gains/renames/drops fields, the warehouse columns silently misalign. The skill must detect unmapped fields each run and emit a log line; recommend the user re-run the catalyst flow when they change the form.
95
+ - **Type coercion silently lies.** If a form field is "free text" but most rows look numeric, the user might map it to NUMERIC. The day a row arrives with `'N/A'` in that field, the insert fails or the value goes NULL. Default to STRING for any free-text field and let the user explicitly promote to a typed column only when they're certain of the data.
96
+ - **Backfill stampedes.** Choosing a full historical backfill for `backfillOnFirstRun` against a bot with months of submissions will hammer both the bot proxy and the destination warehouse. Recommend chunking the backfill into 7-day windows with a delay between each, and surface progress (`backfilled 2026-01-01..2026-01-07: 312 rows`).
97
+ - **Bot-proxy load.** `query_submissions` proxies through to the bot. For very large windows, the proxy can time out or the bot can be slow to serialize. Recommend keeping per-run windows bounded (≤30 days) and chaining runs if a larger backfill is needed.
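The chunked backfill recommended above can be sketched as a window generator. The `backfillWindows` helper and the `until` bound are assumptions for illustration; the catalyst only documents a `since` argument on `query_submissions`, so an upper bound may have to be applied by filtering on `captured_at` after the fetch.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Split [startIso, endIso) into consecutive half-open windows of at most `windowDays` days.
// Timestamps are ISO-8601 strings; results are normalized to UTC via toISOString().
function backfillWindows(startIso, endIso, windowDays = 7) {
  const end = Date.parse(endIso);
  const step = windowDays * DAY_MS;
  const windows = [];
  for (let cursor = Date.parse(startIso); cursor < end; cursor += step) {
    windows.push({
      since: new Date(cursor).toISOString(),
      // The last window is clipped so it never runs past the requested end.
      until: new Date(Math.min(cursor + step, end)).toISOString(),
    });
  }
  return windows;
}
```

A caller would process the windows in order, pause between them, and print one progress line per window so a long backfill stays observable.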
98
+
99
+ ## Skill behavior contract
100
+
101
+ - **Inputs:** `deploymentId` (required), `since` (optional ISO — defaults to MAX(captured_at) from the destination table, falling back to 24h ago on first run), `dryRun` (default true)
102
+ - **Outputs:** per-run summary: `{ deploymentId, windowStart, windowEnd, rowsInserted, rowsSkippedDuplicate, unmappedFields: [...], newHighWaterMark }`
103
+ - **Side effects (live mode):** bulk insert / merge to the destination warehouse table. No mojulo-side writes. No bot-side writes.
@@ -0,0 +1,82 @@
1
+ ---
2
+ {
3
+ "id": "weekly-submissions-digest",
4
+ "name": "Periodic submissions digest",
5
+ "summary": "Produce a recurring digest of recent bot submissions (counts, trends, notable items) and post it to a doc, channel, or email.",
6
+ "valueHook": "A recurring summary of recent submissions — counts, trends, notable items — posted where stakeholders see it.",
7
+ "version": 1,
8
+ "category": "digest",
9
+ "requires": {
10
+ "protocols": ["formGathering"],
11
+ "destinationMcpCategory": "doc-or-channel-like",
12
+ "destinationExamples": ["Notion", "Slack", "Gmail", "Google Docs"]
13
+ },
14
+ "parameters": [
15
+ {
16
+ "name": "cadenceDescription",
17
+ "prompt": "How often will this run, and what window should each digest cover? (e.g., 'weekly, covering the prior 7 days')"
18
+ },
19
+ {
20
+ "name": "groupBy",
21
+ "prompt": "What dimensions should the digest break submissions down by? (e.g., 'source channel, lead type, urgency tag' — pick fields from the bot's form schema)"
22
+ },
23
+ {
24
+ "name": "notableThreshold",
25
+ "prompt": "What qualifies as a 'notable' submission worth calling out individually in the digest? (e.g., 'high-priority complaints, deals > $10k, returning customer issues')"
26
+ },
27
+ {
28
+ "name": "outputFormat",
29
+ "prompt": "Where does this land, and what format? (e.g., 'Notion page in workspace X', 'Slack message to #bot-digest', 'email to team@example.com')"
30
+ }
31
+ ],
32
+ "mcpTools": {
33
+ "mojulo": ["query_submissions", "get_deployment"],
34
+ "destination": {
35
+ "description": "Any MCP that can write a document or post a message. Examples: Notion (create_page), Slack (post_message), Gmail (send_email), Google Docs (create_document)."
36
+ }
37
+ }
38
+ }
39
+ ---
40
+
41
+ # Periodic submissions digest
42
+
43
+ A digest skill is a low-cost way for a team to stay aware of what a bot is collecting without anyone manually clicking through the dashboard. The synthesis goal is a skill that, run on a cadence (manually or via scheduler), summarizes the recent submission window into the user's chosen output surface.
44
+
45
+ ## How to synthesize the skill
46
+
47
+ 1. `get_deployment(deploymentId)` — read the form schema. The fields listed in `groupBy` must exist; if not, ask the user to pick others.
48
+ 2. Ask the user the four `parameters` questions in one round.
49
+ 3. Inspect the destination MCP's write surface — markdown support, length limits, attachment support. The digest format adapts to what the destination accepts.
50
+ 4. Write `.claude/skills/<bot-slug>-digest/SKILL.md`.
51
+
52
+ ## Digest composition
53
+
54
+ A good digest has four sections, in this order:
55
+
56
+ 1. **Header:** bot name, window covered, total submissions.
57
+ 2. **Counts:** breakdown by each `groupBy` dimension. Tables or bullet lists depending on destination capability.
58
+ 3. **Trends:** period-over-period deltas (week-over-week for a weekly cadence) if a prior digest exists. The synthesized skill should optionally read the prior digest from the destination to compute deltas; if the destination doesn't support read, skip trends.
59
+ 4. **Notable items:** 3-10 submissions matching `notableThreshold`, each with a one-line summary and a link/id back to the source. Keep this section bounded — the digest loses value when it tries to surface everything.
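A minimal sketch of the header, counts, and notable-items sections rendered as markdown. The submission shape (`fields`), the `notable` item shape (`id`, `summary`), and both helper names are assumptions for illustration; the trends section is omitted because it depends on reading a prior digest from the destination.

```javascript
// Count submissions per value of one groupBy dimension, most frequent first.
function countBy(submissions, dimension) {
  const counts = new Map();
  for (const submission of submissions) {
    const key = submission.fields[dimension] ?? "(missing)";
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return [...counts.entries()].sort((a, b) => b[1] - a[1]);
}

// Render the digest as markdown lines: header, counts per dimension, notable items.
function renderDigest({ botName, windowStart, windowEnd, submissions, groupBy, notable, maxNotable = 10 }) {
  const lines = [
    `# ${botName} digest`,
    `Window: ${windowStart} to ${windowEnd}`,
    `Total submissions: ${submissions.length}`,
    "",
  ];
  for (const dimension of groupBy) {
    lines.push(`## By ${dimension}`);
    for (const [value, count] of countBy(submissions, dimension)) {
      lines.push(`- ${value}: ${count}`);
    }
    lines.push("");
  }
  lines.push("## Notable items");
  // Keep the section bounded even when many submissions match the threshold.
  for (const item of notable.slice(0, maxNotable)) {
    lines.push(`- ${item.summary} (${item.id})`);
  }
  return lines.join("\n");
}
```

For destinations without markdown support, the same line list could be joined into plain text instead.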
60
+
61
+ ## Sampling vs full scan
62
+
63
+ For low-volume bots (<200 submissions/window) the skill processes every submission. For higher volume, the skill samples notable items and counts via lightweight aggregation rather than LLM-classifying every row. Set the threshold at synthesis time based on the bot's observed volume — `query_submissions` with a recent window tells you roughly what to expect.
64
+
65
+ ## Idempotency
66
+
67
+ Less critical here than for write-side catalysts — re-running the digest just overwrites or re-posts. But:
68
+
69
+ - For Notion/Doc destinations: search-before-create on the page title to update an existing digest rather than spawn duplicates per run.
70
+ - For Slack/email destinations: there's no idempotency — re-running re-sends. Default the synthesized skill to `--dry-run` mode that prints the digest to stdout, with `--send` required for live.
71
+
72
+ ## Pitfalls
73
+
74
+ - **Stale notable threshold.** The threshold "high-priority complaints" depends on the form having a `priority` or equivalent field. If the form changes, the digest silently goes empty. Recommend the user re-run the catalyst flow when form fields they reference change.
75
+ - **PII in digests.** Digests are often shared more broadly than the form submission was. Default to summarizing identity (count + role + general region) rather than dumping names/emails into the digest body. The user can override if their team needs identity.
76
+ - **Empty windows.** A bot with no submissions in the window shouldn't produce a noisy "0 submissions" digest every week. Default the synthesized skill to skip-when-empty unless the user explicitly wants the heartbeat.
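The skip-when-empty default, combined with the dry-run default from the idempotency section, can be sketched as one small decision function. `decideDelivery` and the `sendWhenEmpty` option are hypothetical names for illustration.

```javascript
// Decide what a digest run should do with its rendered output.
// Returns "skip" (no output), "print" (dry run), or "send" (live post).
function decideDelivery({ submissionCount, dryRun = true, sendWhenEmpty = false }) {
  // Empty windows stay silent unless the user asked for a heartbeat digest.
  if (submissionCount === 0 && !sendWhenEmpty) return "skip";
  // Live delivery must be requested explicitly; the default only prints.
  return dryRun ? "print" : "send";
}
```

Keeping both defaults on the safe side means an accidental re-run neither spams a channel nor posts an empty digest.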
77
+
78
+ ## Skill behavior contract
79
+
80
+ - **Inputs:** `deploymentId` (required), `windowStart` and `windowEnd` (optional ISO — defaults derived from cadence), `dryRun` (default true)
81
+ - **Outputs:** the rendered digest (printed in dry-run mode, posted otherwise)
82
+ - **Side effects (live mode):** one document/message create or update via destination MCP.