codifier 2.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (45)
  1. package/README.md +543 -0
  2. package/commands/codify.md +7 -0
  3. package/commands/onboard.md +7 -0
  4. package/commands/push-memory.md +7 -0
  5. package/commands/recall.md +41 -0
  6. package/commands/remember.md +7 -0
  7. package/commands/research.md +7 -0
  8. package/dist/cli/add.d.ts +5 -0
  9. package/dist/cli/add.d.ts.map +1 -0
  10. package/dist/cli/add.js +25 -0
  11. package/dist/cli/add.js.map +1 -0
  12. package/dist/cli/bin/codifier.d.ts +7 -0
  13. package/dist/cli/bin/codifier.d.ts.map +1 -0
  14. package/dist/cli/bin/codifier.js +47 -0
  15. package/dist/cli/bin/codifier.js.map +1 -0
  16. package/dist/cli/detect.d.ts +15 -0
  17. package/dist/cli/detect.d.ts.map +1 -0
  18. package/dist/cli/detect.js +69 -0
  19. package/dist/cli/detect.js.map +1 -0
  20. package/dist/cli/doctor.d.ts +6 -0
  21. package/dist/cli/doctor.d.ts.map +1 -0
  22. package/dist/cli/doctor.js +71 -0
  23. package/dist/cli/doctor.js.map +1 -0
  24. package/dist/cli/init.d.ts +7 -0
  25. package/dist/cli/init.d.ts.map +1 -0
  26. package/dist/cli/init.js +144 -0
  27. package/dist/cli/init.js.map +1 -0
  28. package/dist/cli/update.d.ts +5 -0
  29. package/dist/cli/update.d.ts.map +1 -0
  30. package/dist/cli/update.js +38 -0
  31. package/dist/cli/update.js.map +1 -0
  32. package/dist/index.js +87 -0
  33. package/package.json +40 -0
  34. package/skills/brownfield-onboard/SKILL.md +142 -0
  35. package/skills/capture-session/SKILL.md +111 -0
  36. package/skills/initialize-project/SKILL.md +185 -0
  37. package/skills/initialize-project/templates/evals-prompt.md +39 -0
  38. package/skills/initialize-project/templates/requirements-prompt.md +44 -0
  39. package/skills/initialize-project/templates/roadmap-prompt.md +44 -0
  40. package/skills/initialize-project/templates/rules-prompt.md +34 -0
  41. package/skills/push-memory/SKILL.md +131 -0
  42. package/skills/research-analyze/SKILL.md +149 -0
  43. package/skills/research-analyze/templates/query-generation-prompt.md +61 -0
  44. package/skills/research-analyze/templates/synthesis-prompt.md +67 -0
  45. package/skills/shared/codifier-tools.md +187 -0
@@ -0,0 +1,131 @@ package/skills/push-memory/SKILL.md

# Skill: Push Memory

**Role:** Any (cross-functional)
**Purpose:** Sync local session learnings from `docs/MEMORY.md` to the shared Codifier knowledge base via `update_memory`. Supports idempotent re-sync via per-entry `[kb:<uuid>]` annotations — entries already pushed are skipped automatically.

See `../shared/codifier-tools.md` for full MCP tool reference.

---

## Prerequisites

- Active MCP connection to the Codifier server
- A `docs/MEMORY.md` file with at least one entry (run `/remember` to capture learnings if this file does not exist)
- A project in the Codifier KB (confirmed in Step 1)

---

## Workflow

Follow these steps in order. You are the state machine — call MCP tools only for data operations.

### Step 1 — Confirm Project

Read `docs/MEMORY.md`. Check that location first; if the file is not there, fall back to `.codifier/docs/MEMORY.md`.

Inspect the file header for a `project_id` field. The header follows this format:

```
# Session Memory

_Project:_ <project_name>
_Project ID:_ <uuid>
_Last updated:_ <date>
```

- If a `project_id` is present in the header: use it for all subsequent MCP calls. Inform the user which project will be used.
- If no `project_id` is in the header: call `manage_projects` with `operation: "list"` and present the results to the user. Ask: **"Which project should these learnings be pushed to?"** If they need a new project, call `manage_projects` with `operation: "create"`. Store the resolved `project_id`.
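The header check above can be sketched in TypeScript (the package ships compiled TypeScript). `parseProjectId` is an illustrative helper name, not part of the Codifier API; the `_Project ID:_` line format is taken from the header example above:

```typescript
// Illustrative sketch: extract the project_id from a MEMORY.md header.
// Returns null when no "_Project ID:_ <uuid>" line is present, which is
// the signal to fall back to manage_projects { operation: "list" }.
function parseProjectId(memoryMd: string): string | null {
  const match = memoryMd.match(/^_Project ID:_\s*([0-9a-f-]{36})\s*$/im);
  return match ? match[1] : null;
}
```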

### Step 2 — Identify Unsynced Entries

Parse `docs/MEMORY.md` and collect all bullet-point entries across all category sections.

Entries follow one of two formats:

- **Synced** (already in the KB): `- [kb:<uuid>] The learning text`
- **Unsynced** (local-only): `- The learning text`

Classify every entry. Entries with a `[kb:<uuid>]` prefix are already synced — do not push them again.

If all entries are already synced, inform the user:

> "All entries in docs/MEMORY.md are already synced to the shared KB. Nothing to push."

Then exit — do not proceed further.
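The classification rule above can be sketched as a small TypeScript helper (the names `Entry` and `classifyEntry` are illustrative; the skill itself is executed by the LLM, not by code):

```typescript
// Illustrative sketch: classify one MEMORY.md line as a synced or unsynced entry.
// An entry is synced when it carries a "[kb:<uuid>]" annotation, per the two
// formats documented above. Non-bullet lines (headings, blanks) return null.
interface Entry {
  text: string;        // learning text without the annotation
  kbId: string | null; // UUID from "[kb:<uuid>]", or null if unsynced
}

function classifyEntry(line: string): Entry | null {
  const bullet = line.match(/^-\s+(.*)$/);
  if (!bullet) return null; // not a bullet entry
  const synced = bullet[1].match(/^\[kb:([0-9a-f-]{36})\]\s+(.*)$/i);
  return synced
    ? { text: synced[2], kbId: synced[1] }
    : { text: bullet[1], kbId: null };
}
```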

### Step 3 — Preview and Confirm

Show the user all unsynced entries grouped by category. Use this format:

```
Unsynced entries to push:

## <category>
- <entry text>
- <entry text>

## <category>
- <entry text>

Push these N entries to the shared KB? [confirm]
```

Wait for the user to confirm before proceeding. If they decline or ask to skip specific entries, respect their choice and adjust the push set accordingly.

### Step 4 — Push Each Entry

For each confirmed unsynced entry, call `update_memory` with:

```json
{
  "project_id": "<from Step 1>",
  "memory_type": "learning",
  "title": "<category>: <first ~60 chars of bullet text>",
  "content": {
    "text": "<full bullet text>",
    "category": "<category>"
  },
  "tags": ["session-context", "<category>"],
  "description": "<full bullet text>"
}
```

Where `<category>` is the section heading under which the entry appears in `docs/MEMORY.md` (e.g., `gotcha`, `convention`, `decision`).

After each successful `update_memory` call:

1. Take the `id` returned in the response.
2. Immediately rewrite that entry in `docs/MEMORY.md` to prepend the `[kb:<uuid>]` annotation:

   Before: `- The actual learning text`
   After: `- [kb:a1b2c3d4-e5f6-7890-abcd-ef1234567890] The actual learning text`

This makes the push resumable. If the process fails partway through, already-pushed entries are marked and will be skipped on the next run.

Push entries one at a time. Do not batch. Write the annotation back to the file after each individual success before moving to the next entry.
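The write-back step can be sketched as follows. `annotateEntry` is a hypothetical helper, and the exact-match lookup is an assumption; a real implementation would need to handle whitespace and duplicate entries more carefully:

```typescript
// Illustrative sketch: prepend the "[kb:<uuid>]" annotation to a just-pushed
// entry. Called once per entry, immediately after update_memory returns an id.
function annotateEntry(memoryMd: string, entryText: string, kbId: string): string {
  const lines = memoryMd.split("\n");
  // Find the unsynced bullet whose text matches exactly.
  const i = lines.findIndex((l) => l.trim() === `- ${entryText}`);
  if (i === -1) return memoryMd; // entry not found; leave the file untouched
  lines[i] = lines[i].replace(`- ${entryText}`, `- [kb:${kbId}] ${entryText}`);
  return lines.join("\n");
}
```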

### Step 5 — Update Header and Summarize

Update the `_Last updated:_ <date>` line in the `docs/MEMORY.md` header to today's date.

Report the final summary to the user:

- How many entries were pushed successfully
- How many entries were skipped (already synced or user-excluded)
- How many entries failed (if any)
- The project they were pushed to (name and ID)

Then tell the user:

> "These learnings are now available to your team via fetch_context with tags: ['session-context']"

> "Any new learnings captured via /remember will appear without a [kb:...] prefix and can be pushed next time."

---

## Error Handling

- **`update_memory` fails for a specific entry**: Log the error, skip that entry, and continue with the remaining entries. Report all failures in the Step 5 summary. Do not write a `[kb:...]` annotation for failed entries — they will be retried on the next push.
- **`docs/MEMORY.md` does not exist**: Inform the user: "No local memory file found. Run `/remember` to capture session learnings first, or `npx @codifier/cli init` to set up your project."
- **MCP connection not available**: Inform the user: "Push requires an active MCP connection to the Codifier server. Verify your MCP config and try again."
- **File write fails after successful `update_memory`**: Inform the user of the annotation that could not be written (entry text + returned UUID) so they can manually add it. The KB push itself succeeded — only the local annotation is missing.

@@ -0,0 +1,149 @@ package/skills/research-analyze/SKILL.md

# Skill: Research & Analyze

**Role:** Researcher
**Purpose:** Define a research objective, discover Athena data warehouse schemas, generate and validate SQL queries, execute them, synthesize the findings into a ResearchFindings.md report, and persist it to the shared knowledge base.

See `../shared/codifier-tools.md` for full MCP tool reference.

---

## Prerequisites

- Active MCP connection to the Codifier server
- AWS Athena credentials configured on the server (`AWS_REGION`, `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `ATHENA_S3_OUTPUT_LOCATION`)
- A project to associate the findings with

---

## Workflow

### Step 1 — Identify or Create the Project

Call `manage_projects` with `operation: "list"` and show the user their existing projects.

Ask: **"Which project should these research findings be associated with?"**

Select or create a project and capture the `project_id`.

### Step 2 — Fetch Prior Research

Call `fetch_context` with `{ project_id, memory_type: "research_finding" }` to surface any prior findings relevant to this session.

If prior findings exist, summarize them briefly: **"Here's what we've found before on this project..."**

### Step 2b — Surface Local Learnings

Attempt to read `docs/MEMORY.md`. If the file does not exist, skip this step silently and continue to Step 3.

If the file exists, scan for entries relevant to the research domain — particularly the `data`, `gotcha`, and `convention` categories. Present relevant local learnings to the user alongside the KB findings from Step 2. This may help refine the research objective.

Note: This is a local file read — no MCP call required.

### Step 3 — Define the Research Objective

Ask the user to describe:

1. **Research objective** — the specific question or hypothesis to investigate
2. **Background context** — business context, prior hypotheses, relevant metrics or KPIs
3. **Time period of interest** — date ranges for the analysis
4. **Known relevant tables** — if the user knows which tables to look at (optional)

Confirm your understanding of the objective before proceeding.

### Step 4 — Discover Available Tables

Call `query_data` with `{ operation: "list-tables", project_id }`.

Present the full table list to the user. Ask: **"Which of these tables are likely relevant to your research objective?"**

### Step 5 — Describe Selected Tables

Call `query_data` with `{ operation: "describe-tables", project_id, table_names: [<user-selected tables>] }`.

Review the returned schemas with the user. Note column names, data types, and any partitioning. Ask if any additional tables should be included.

### Step 6 — Generate SQL Queries

Using the prompt template in `templates/query-generation-prompt.md`, generate SQL queries tailored to the research objective.

**Substitute:**

- `{objective}` — the research objective from Step 3
- `{context}` — background context from Step 3
- `{available_tables}` — full table list from Step 4
- `{table_definitions}` — schema details from Step 5

Present all generated queries to the user. For each query, show:

- Query ID and purpose
- The SQL
- Expected output columns

Ask: **"Do these queries look correct? Which ones should we run, and are there any you'd like to modify?"**

Allow the user to edit, add, or remove queries before execution.

### Step 7 — Execute Approved Queries

For each approved query, call `query_data` with `{ operation: "execute-query", project_id, query: "<sql>" }`.

Execute one query at a time. After each:

- Show the result rows
- Ask: **"Does this look as expected, or should we investigate further before continuing?"**

If a query returns no results, note this explicitly and ask if the query should be revised.
If a query errors, show the error and ask the user how to proceed.

### Step 8 — Synthesize Findings

Using the prompt template in `templates/synthesis-prompt.md`, synthesize all query results into a ResearchFindings.md report.

**Substitute:**

- `{objective}` — the research objective
- `{context}` — background context
- `{query_results}` — all query results (as structured data)
- `{table_definitions}` — the schema reference from Step 5

Present the full ResearchFindings.md to the user. Ask: **"Does this accurately capture the findings? Any corrections or additions?"**

Incorporate feedback.

### Step 9 — Persist Findings

Call `update_memory` (note that `project_id` is required — use the one captured in Step 1):

```json
{
  "project_id": "<from Step 1>",
  "memory_type": "research_finding",
  "title": "ResearchFindings — <objective summary> — <YYYY-MM-DD>",
  "content": {
    "text": "<full ResearchFindings.md markdown>",
    "objective": "<objective>",
    "tables_used": ["<table1>", "<table2>"],
    "queries_run": "<count>"
  },
  "tags": ["research", "<domain-tag>", "<date-tag>"],
  "source_role": "researcher"
}
```

### Step 10 — Summarize

Tell the user:

- Project ID and memory ID of the persisted finding
- Tables queried and query count
- Key findings (2–3 sentence summary)
- How developers can access this finding: `fetch_context` with `{ project_id, memory_type: "research_finding" }`

---

## Error Handling

- If `list-tables` returns empty: Athena credentials may not be configured. Inform the user and check the server configuration.
- If a query exceeds the 100KB result cap: the tool returns a truncation notice. Acknowledge this in the findings methodology section.
- If the user asks to run a non-SELECT query: refuse and explain the SELECT-only constraint. Offer an alternative SELECT formulation if possible.
- If synthesis produces speculative conclusions: flag them explicitly with confidence levels (High/Medium/Low) per the synthesis template.

---

## End-of-Workflow Memory Capture

After completing Step 10, suggest to the user:

> "You may have learned things during this research session worth capturing. Run `/remember` to capture session learnings to docs/MEMORY.md, or `/push-memory` to sync existing local memories to the shared KB."

This is a suggestion only — do not automatically invoke the capture or push Skills.

@@ -0,0 +1,61 @@ package/skills/research-analyze/templates/query-generation-prompt.md

# Prompt Template: Generate SQL Queries

When this template is used, substitute all `{placeholders}` with actual values, then generate the queries as instructed.

---

You are a senior data analyst expert in SQL and data warehousing. Using the research objective and schema information below, generate SQL queries that will answer the research questions effectively.

## Research Objective

{objective}

## Research Context

{context}

## Available Schema

**Tables discovered:**
{available_tables}

**Table definitions:**
{table_definitions}

## Instructions

Generate a set of SQL queries that address the research objective. Organise them from exploratory (broad counts, distributions) to specific (targeted metrics that directly answer the objective).

For EACH query provide:

### Query: {query-id} — {short title}

**Purpose:** one sentence describing what this query answers

**SQL:**
```sql
-- {explanation of non-obvious logic}
SELECT
  ...
FROM {table}
WHERE ...
  AND date_partition BETWEEN '{{start_date}}' AND '{{end_date}}'
LIMIT 1000
```

**Expected output columns:**

| Column | Type | Description |
|--------|------|-------------|
| ...    | ...  | ...         |

**Notes:** caveats, known data quality issues, or follow-up queries suggested

---

**Query writing conventions:**

- Use standard ANSI SQL where possible
- Add comments inside the SQL explaining non-obvious logic
- Parameterise date ranges using placeholders like `{{start_date}}` and `{{end_date}}`
- Include `LIMIT` clauses on exploratory queries
- For Athena: use partition columns in WHERE clauses to control cost
- Only SELECT statements — no DDL or DML

@@ -0,0 +1,67 @@ package/skills/research-analyze/templates/synthesis-prompt.md

# Prompt Template: Synthesize Research Findings

When this template is used, substitute all `{placeholders}` with actual values, then generate the findings report as instructed.

---

You are a senior data scientist and technical writer. Using the research objective, context, and query results below, synthesise a clear and actionable research findings report.

## Research Objective

{objective}

## Research Context

{context}

## Query Results

{query_results}

## Available Schema Reference

{table_definitions}

## Instructions

Produce a research findings report titled `# ResearchFindings.md` with the following sections:

### 1. Executive Summary
2–4 sentences: the most important finding and its business implication.

### 2. Methodology
Describe:
- Data sources used (tables, date ranges)
- Queries run and what each was designed to measure
- Data quality considerations or limitations discovered

### 3. Key Findings
For each significant finding:

**Finding N: {descriptive title}**
- **Evidence:** specific numbers, percentages, or trends from the query results
- **Interpretation:** what this means in business or research terms
- **Confidence:** High / Medium / Low — with reasoning

### 4. Trends and Patterns
Describe temporal trends, correlations, anomalies, or unexpected patterns observed across the query results.

### 5. Limitations and Caveats
Be explicit about:
- Data gaps or missing periods
- Potential biases in the data
- Queries that returned no results and what that implies
- Assumptions made during the analysis

### 6. Recommendations
Actionable next steps based on the findings. Each recommendation must state:
- **Action:** what to do
- **Owner:** who should act on it
- **Rationale:** why this follows from the data

### 7. Follow-up Research Questions
List 3–5 questions this analysis surfaced but could not answer, to guide future research sessions.

---

Format the report as a structured Markdown document suitable for sharing with stakeholders.

@@ -0,0 +1,187 @@ package/skills/shared/codifier-tools.md

# Codifier MCP Tools Reference

This document describes all 5 MCP tools exposed by the Codifier server. Reference this when executing any Codifier skill.

---

## 1. `fetch_context`

Retrieve memories from the shared knowledge base, filtered by project, type, tags, or full-text search.

**Parameters:**

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `project_id` | string (UUID) | ✓ | Project to scope the query to |
| `memory_type` | enum | — | Filter by type: `rule`, `document`, `api_contract`, `learning`, `research_finding` |
| `tags` | string[] | — | All supplied tags must be present on the memory |
| `query` | string | — | Full-text search applied to title and content |
| `limit` | number (1–100) | — | Max results (default: 20) |

**Returns:** Array of memory records with `id`, `title`, `content`, `memory_type`, `tags`, `source_role`, `created_at`.

**Usage patterns:**

- Fetch all rules for a project: `{ project_id, memory_type: "rule" }`
- Fetch researcher findings relevant to auth: `{ project_id, memory_type: "research_finding", tags: ["auth"] }`
- Full-text search across all memory types: `{ project_id, query: "payment processing" }`

---

## 2. `update_memory`

Create a new memory or update an existing one in the shared knowledge base.

**Parameters:**

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `project_id` | string (UUID) | ✓ | Project to scope this memory to |
| `memory_type` | enum | ✓ | `rule`, `document`, `api_contract`, `learning`, `research_finding` |
| `title` | string | ✓ | Short descriptive title |
| `content` | object | ✓ | Structured content payload (any JSON object) |
| `id` | string (UUID) | — | If provided, updates the existing record instead of creating |
| `tags` | string[] | — | Tags for filtering and categorization |
| `category` | string | — | Category grouping (e.g., "security", "error-handling") |
| `description` | string | — | Human-readable summary |
| `confidence` | number (0–1) | — | Confidence score (default: 1.0) |
| `source_role` | string | — | Role that produced this memory (e.g., "developer", "researcher") |

**Returns:** The created or updated memory record including its `id`.

**Usage patterns:**

- Store a generated Rules.md: `{ project_id, memory_type: "document", title: "Rules.md", content: { text: "..." }, source_role: "developer" }`
- Store a research finding: `{ project_id, memory_type: "research_finding", title: "Q4 Retention Analysis", content: { summary: "...", findings: [...] }, source_role: "researcher" }`
- Update an existing memory: `{ project_id, id: "<existing-id>", memory_type: "rule", title: "...", content: {...} }`

---

## 3. `manage_projects`

Create, list, or switch the active project.

**Parameters:**

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `operation` | enum | ✓ | `create`, `list`, or `switch` |
| `name` | string | For `create` | Project name |
| `org` | string | — | Organisation name (optional for `create`) |
| `project_id` | string (UUID) | For `switch` | Project to switch to |

**Returns:**

- `list`: Array of projects with `id`, `name`, `org`, `created_at`
- `create`: The created project record including its `id`
- `switch`: Confirmation of the active project

**Usage patterns:**

- List all projects: `{ operation: "list" }`
- Create a new project: `{ operation: "create", name: "Payments Redesign", org: "Acme Corp" }`
- Switch to an existing project: `{ operation: "switch", project_id: "<uuid>" }`

---

## 4. `pack_repo`

Condense a code repository into a versioned text snapshot using RepoMix. The snapshot is stored in the `repositories` table and can be retrieved for context.

**Parameters:**

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `url` | string | ✓ | Repository URL (e.g., `https://github.com/org/repo`) or local path |
| `project_id` | string (UUID) | ✓ | Project to associate the snapshot with |
| `version_label` | string | — | Version label for this snapshot (e.g., `"v1.2.3"`, `"sprint-5"`, `"2026-02"`) |

**Returns:** Repository record with `id`, `url`, `version_label`, `token_count`, `file_count`, and `created_at`.

**Usage patterns:**

- Pack a public GitHub repo: `{ url: "https://github.com/org/repo", project_id, version_label: "2026-02" }`
- Pack multiple repos for brownfield onboarding: call once per repo URL

**Note:** Large repos may take 30–60 seconds. The packed snapshot is plain text suitable for LLM context.

---

## 5. `query_data`

Discover schemas and execute SELECT queries against an AWS Athena data warehouse.

**Parameters:**

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `operation` | enum | ✓ | `list-tables`, `describe-tables`, or `execute-query` |
| `project_id` | string (UUID) | ✓ | Project UUID for session scoping |
| `query` | string | For `execute-query` | SQL SELECT statement to execute |
| `table_names` | string[] | For `describe-tables` | Tables to describe |

**Returns:**

- `list-tables`: Array of available table names
- `describe-tables`: Schema definitions for requested tables
- `execute-query`: Query results (capped at 100KB; truncation notice included if limit hit)

**Usage patterns:**

- Discover available tables: `{ operation: "list-tables", project_id }`
- Get schema for selected tables: `{ operation: "describe-tables", project_id, table_names: ["events", "users"] }`
- Execute a query: `{ operation: "execute-query", project_id, query: "SELECT user_id, COUNT(*) FROM events GROUP BY 1 LIMIT 100" }`

**Constraints:** Only SELECT statements are permitted. DDL and DML are rejected.

---

## Session Memory Lifecycle

Memory capture is Codifier's foundational capability — every use case produces learnings worth persisting, whether or not it produces a structured artifact. The lifecycle is local-first with on-demand KB sync.

### Flow

```
/remember (capture) → docs/MEMORY.md (local) → user edits → /push-memory (sync to KB)

/recall (retrieve) ← docs/MEMORY.md (local) + fetch_context (KB) ←─┘
```

1. **Capture** (`/remember`): The LLM elicits learnings from the user, structures them as categorized bullet points, and appends them to `docs/MEMORY.md`. No MCP calls. Local file only.

2. **Review**: The user edits `docs/MEMORY.md` directly — add, remove, recategorize, or refine entries. The file is human-readable markdown grouped by category.

3. **Push** (`/push-memory`): The LLM reads `docs/MEMORY.md`, identifies unsynced entries (those without a `[kb:<uuid>]` annotation), and calls `update_memory` once per entry. After each successful push, the returned `id` is written back as a `[kb:<uuid>]` annotation, making the operation idempotent and resumable.

4. **Recall** (`/recall`): Reads `docs/MEMORY.md` for instant local recall (no MCP call), then optionally calls `fetch_context` to supplement with shared team learnings from the KB. Local and KB results are presented as distinct sections, never merged.

### Session-Context Learning Pattern for `update_memory`

When pushing session learnings to the KB, use this pattern:

```json
{
  "project_id": "<project-uuid>",
  "memory_type": "learning",
  "title": "<category>: <first ~60 chars of bullet text>",
  "content": { "text": "<full bullet text>", "category": "<category>" },
  "tags": ["session-context", "<category>"],
  "description": "<full bullet text>"
}
```

**Tag contract:**

- All session learnings carry the `"session-context"` tag — this is the primary filter for retrieving session memories across the team
- The category tag (e.g., `"gotcha"`, `"convention"`, `"architecture"`) is the secondary filter

**Idempotency via `[kb:<uuid>]` annotations:**

- After a successful `update_memory` call, the returned `id` is written into the `docs/MEMORY.md` entry as: `- [kb:<uuid>] The actual learning text`
- On re-push, entries with `[kb:<uuid>]` annotations are skipped (already synced)
- To update an existing KB record, pass the annotated `id` to `update_memory` as the `id` parameter — this triggers an update instead of a create
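The create-vs-update decision above can be sketched as a small argument builder. `buildUpdateMemoryArgs` is an illustrative helper, not part of the server API; it only assembles the documented `update_memory` parameters:

```typescript
// Illustrative sketch: build update_memory arguments for one MEMORY.md entry.
// Passing the annotated kb id as `id` turns the call into an update;
// omitting it creates a new record.
function buildUpdateMemoryArgs(
  projectId: string,
  category: string,
  text: string,
  kbId: string | null
): Record<string, unknown> {
  return {
    project_id: projectId,
    ...(kbId ? { id: kbId } : {}), // re-use the annotated id to update in place
    memory_type: "learning",
    title: `${category}: ${text.slice(0, 60)}`,
    content: { text, category },
    tags: ["session-context", category],
    description: text,
  };
}
```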
170
+
171
+ **Retrieving session learnings:**
172
+ - All session learnings for a project: `fetch_context({ project_id, memory_type: "learning", tags: ["session-context"] })`
173
+ - Filtered by category: `fetch_context({ project_id, memory_type: "learning", tags: ["session-context", "gotcha"] })`
174
+ - Full-text search: `fetch_context({ project_id, memory_type: "learning", tags: ["session-context"], query: "API timeout" })`
175
+
176
+ ### Categories
177
+
178
+ Standard categories for session learnings (not exhaustive — users can add their own):
179
+
180
+ | Category | Use for |
181
+ |----------|---------|
182
+ | `architecture` | System design patterns, structural decisions, component relationships |
183
+ | `gotcha` | Surprising behaviors, edge cases, things that break unexpectedly |
184
+ | `convention` | Naming patterns, file organization, coding standards discovered |
185
+ | `tooling` | Tool configurations, CLI flags, environment setup details |
186
+ | `data` | Data schemas, query patterns, data quality observations |
187
+ | `process` | Workflow insights, team practices, deployment procedures |