@agentled/cli 0.6.0 → 0.6.1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/llms-full.txt
CHANGED

@@ -427,7 +427,7 @@ If you prefer MCP over CLI:
 ```bash
 # Start Agentled as an MCP server over stdio
 cd agentled-mcp-server && yarn install && yarn build
-claude mcp add agentled -e AGENTLED_API_KEY=wsk_... -- node agentled-mcp-server/dist/index.js
+claude mcp add --transport stdio agentled -e AGENTLED_API_KEY=wsk_... -- node agentled-mcp-server/dist/index.js
 ```

 The MCP server exposes the same capabilities but loads ~30 tool definitions into your context window.
package/package.json
CHANGED

@@ -0,0 +1,264 @@

# 10 — Person research: pick the lookup by the signal you have

**Problem**: Developers reach for a single favourite enrichment tool (LinkedIn scraper, Hunter, Clearbit) regardless of what input they have. The result: wasted credits on low-signal inputs, missing data when the "go-to" provider doesn't have the record, and no fallback when the first call returns null.

**Why it fails silently**: Enrichment APIs return `null` or an empty object for "not found" — not an error. An agent sees `email: null`, treats it as a terminal failure, and moves on. The real problem is that the wrong lookup was picked for the input signal, and a better fallback exists one tier down.
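The "null is a routing signal, not a terminal failure" idea can be sketched outside any workflow engine. A minimal, hypothetical sketch (the lookup functions are stand-ins, not real Agentled actions):

```javascript
// Try lookups in ladder order; a null/undefined result means "fall down
// a tier", not "stop". Only an exhausted ladder is a real dead end.
async function resolveWithLadder(input, lookups) {
  for (const lookup of lookups) {
    const result = await lookup(input); // each tier returns null for "not found"
    if (result !== null && result !== undefined) {
      return { result, tier: lookup.name };
    }
  }
  return { result: null, tier: "exhausted" }; // stop: ask for more input signal
}
```

The orchestration in the sections below implements the same loop declaratively, with entry conditions instead of a `for` loop.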

---

## The input signal determines the right lookup

Every person-research task starts with some subset of: name, company domain, company name, LinkedIn URL, email, or job title. The strongest signal you have determines which lookup has the best hit rate. Picking by preference instead of by signal is how you burn credits.

| You have | Best first lookup | What it returns | Typical hit rate |
|---|---|---|---|
| LinkedIn profile URL | LinkedIn profile scraper | full profile: name, headline, company, experience, education | 90%+ |
| Name + company **domain** | Email-finder API (name + domain) | verified email, score | 60–80% |
| Name + company **name** (no domain) | Web search to resolve domain → email-finder | domain first, then email | 40–60% |
| Company domain only | Domain-wide email search | list of public emails with name + role | varies (5–50 rows) |
| Email only | Email verification + reverse lookup | name, company, social profile | 30–50% |
| Name only | Search + disambiguation (LLM + web_search) | candidate list — requires user confirmation | low — ambiguous |
| Job title + company | LinkedIn people search by company + title | candidate profiles | 40–70% |

**Rule**: Route the workflow by input signal at the top. Don't pick the lookup based on which API you like best.

---

## Anti-pattern 1 — LinkedIn-first for everything

Using a LinkedIn profile scraper as the first step regardless of input:

```yaml
# Wrong: LinkedIn scrape when all you have is name + domain
steps:
  - id: find-person
    action: linkedin.get-profile-from-search
    input:
      query: "{{input.name}} {{input.company}}"
```

Problems:
- LinkedIn search by name is ambiguous — returns the wrong person when the name is common
- LinkedIn scrapers are the most expensive tier (rate-limited, sometimes blocked)
- If all you needed was an email, a direct name+domain email-finder would have been 10× cheaper with a higher hit rate

LinkedIn scraping is the right tool when you **already have** the profile URL, or when you need deep profile detail (experience, bio, connections). Not for email-finding.

---

## Anti-pattern 2 — Email-finder with name only

```yaml
# Wrong: email-finder without a domain
- id: find-email
  action: hunter.find-email
  input:
    firstName: "{{input.firstName}}"
    lastName: "{{input.lastName}}"
    # no domain — this is guessing
```

Email-finders need `firstName + lastName + domain`. Without a domain they either return nothing or guess at public domains (`gmail.com`, `yahoo.com`) with near-zero accuracy. Always resolve the domain first.

---

## Anti-pattern 3 — LLM-as-lookup

```yaml
# Wrong: asking the LLM to return an email
- id: find-email
  type: ai-action
  prompt: "What is the email address of {{input.name}} at {{input.company}}?"
```

The model will hallucinate a plausible email (`firstname.lastname@company.com`) with no verification. LLMs are good at **routing** the lookup and **disambiguating** candidates — not at returning verified contact data. Use them as the orchestrator, not the database.

---

## Correct pattern — Signal-based routing with a fallback ladder

```yaml
steps:
  # 1. Triage input: what do we actually have?
  - id: classify-input
    type: ai-action
    prompt: |
      Given input {{input}}, determine the strongest identity signal.
      Return exactly one of:
        linkedin_url
        name_and_domain
        name_and_company
        company_domain_only
        email_only
        job_title_and_company
        name_only
    responseStructure:
      signal: "string"
      extracted: "object — fields you pulled out"

  # 2a. LinkedIn URL → direct profile scrape
  - id: scrape-linkedin
    entryConditions:
      criteria:
        - variable: "{{steps.classify-input.signal}}"
          operator: "=="
          value: "linkedin_url"
    action: linkedin.get-profile-from-url
    input:
      profileUrl: "{{steps.classify-input.extracted.linkedinUrl}}"

  # 2b. Name + domain → email-finder directly
  - id: find-email-direct
    entryConditions:
      criteria:
        - variable: "{{steps.classify-input.signal}}"
          operator: "=="
          value: "name_and_domain"
    action: email-finder.find
    input:
      firstName: "{{steps.classify-input.extracted.firstName}}"
      lastName: "{{steps.classify-input.extracted.lastName}}"
      domain: "{{steps.classify-input.extracted.domain}}"

  # 2c. Name + company → resolve domain first, then email-finder
  - id: resolve-domain
    entryConditions:
      criteria:
        - variable: "{{steps.classify-input.signal}}"
          operator: "=="
          value: "name_and_company"
    type: ai-action
    tools: [web_search]
    prompt: "Find the official website domain for company {{steps.classify-input.extracted.company}}"
    responseStructure:
      domain: "string"

  - id: find-email-resolved
    entryConditions:
      criteria:
        - variable: "{{steps.resolve-domain.domain}}"
          operator: "isNotNull"
    action: email-finder.find
    input:
      firstName: "{{steps.classify-input.extracted.firstName}}"
      lastName: "{{steps.classify-input.extracted.lastName}}"
      domain: "{{steps.resolve-domain.domain}}"

  # 2d. Company domain only → domain-wide email search, then filter
  - id: domain-wide-emails
    entryConditions:
      criteria:
        - variable: "{{steps.classify-input.signal}}"
          operator: "=="
          value: "company_domain_only"
    action: email-finder.find-by-domain
    input:
      domain: "{{steps.classify-input.extracted.domain}}"

  # 2e. Email only → verify + reverse lookup
  - id: reverse-lookup
    entryConditions:
      criteria:
        - variable: "{{steps.classify-input.signal}}"
          operator: "=="
          value: "email_only"
    action: email-finder.verify-and-enrich
    input:
      email: "{{steps.classify-input.extracted.email}}"

  # 2f. Job title + company → LinkedIn people search
  - id: people-search
    entryConditions:
      criteria:
        - variable: "{{steps.classify-input.signal}}"
          operator: "=="
          value: "job_title_and_company"
    action: linkedin.search-people
    input:
      company: "{{steps.classify-input.extracted.company}}"
      title: "{{steps.classify-input.extracted.title}}"

  # 2g. Name only → disambiguation step; don't guess — ask or stop
  - id: disambiguate
    entryConditions:
      criteria:
        - variable: "{{steps.classify-input.signal}}"
          operator: "=="
          value: "name_only"
    type: ai-action
    tools: [web_search]
    prompt: |
      "{{steps.classify-input.extracted.name}}" is ambiguous.
      Search and return up to 5 candidate profiles. Stop here —
      the workflow must not enrich a single candidate without
      user confirmation.
    responseStructure:
      candidates: "array of { name, company, url }"
      needsConfirmation: "boolean"
```

Every declared signal has a branch. Missing a branch means the workflow silently produces no enrichment for that signal — exactly the failure this pattern exists to prevent.
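The branch-coverage invariant is mechanical enough to check in code. A hedged sketch — the step shapes below mirror the YAML examples in this document, not a verified platform schema:

```javascript
// Return every classifier signal that no branch's entry condition matches.
// A non-empty result means some signal silently yields no enrichment.
function uncoveredSignals(declaredSignals, steps) {
  const covered = new Set(
    steps.flatMap((s) => (s.entryConditions?.criteria ?? []).map((c) => c.value))
  );
  return declaredSignals.filter((sig) => !covered.has(sig));
}
```

Running a check like this after authoring (or asking the agent to do the equivalent audit) catches the missing-branch failure before the first execution does.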

---

## When no native action exists — use computer use or scraping

The ladder above assumes a native action exists for each step (email-finder API, LinkedIn scraper connector, directory API). In practice you'll hit data sources with no connector: an industry association directory, a regional regulatory database, a conference attendee page, a niche talent platform. **Do not skip the lookup** just because there's no matching action.

Two fallback tools cover this:

| Source type | Tool | When to use |
|---|---|---|
| Static HTML page (team page, about page, directory listing) | Web scraper (`web-scraping.scrape`) | Content is in the initial HTML, no auth/JS required |
| Dynamic / authenticated page (LinkedIn without a scraper connector, logged-in dashboard) | Computer use / browser automation (`browser-use.run-task`, `anthropic-computer-use.run-task`) | Content requires clicks, scrolling, login, JS execution |

```yaml
# Example: no native connector for this directory — use scraping
- id: association-directory
  action: web-scraping.scrape
  input:
    url: "https://some-industry-association.example.com/members/{{currentItem.slug}}"

# Example: no LinkedIn scraper connector in this workspace — use computer use
- id: linkedin-via-browser
  action: browser-use.extract-data
  input:
    url: "{{currentItem.linkedinUrl}}"
    extractionGoal: "Return the person's current title, company, and email if visible"
```

Rules of thumb:
- **Scrape first**, computer use second. Scraping is ~10× cheaper and faster when it works.
- Feed the scraped HTML / markdown into an AI extraction step — don't try to regex it.
- Computer use is the last tier before giving up. Budget for it explicitly — each `run-task` costs real credits and takes seconds-to-minutes.
- If a source blocks scraping (Cloudflare, JS-only, login wall), jump straight to computer use — retrying the scraper won't help.

---

## Fallback ladder — fall *down*, not *across*

When a lookup returns null, the mistake is to retry the **same tier** with the same input: re-querying Hunter with slight input variations, calling a second email-finder with the same name+domain. That's falling across. It rarely helps — the providers share similar data sources.

Fall **down** the ladder to a different source type:

```
1. Direct API lookup (email-finder, LinkedIn-by-URL, connector action)
   ↓ null
2. Structured directory (Crunchbase, Apollo, Specter, company team page API)
   ↓ null
3. Web search + extraction (LLM + web_search over the open web)
   ↓ null
4. Web scraping (web-scraping.scrape on a static team/about page)
   ↓ null or blocked
5. Computer use / browser automation (browser-use, anthropic-computer-use on dynamic / auth pages)
   ↓ null
6. Stop — ask for more input signal; do not hallucinate
```

Each tier down the ladder is heavier (slower, more credits, more failure modes) but accesses a different data source. Never retry the same tier more than once. Tiers 4 and 5 specifically exist for sources with no native connector — use them rather than declaring the research impossible.
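The tier-to-tier handoff can be wired with null-check entry conditions, in the same style as the routing examples earlier in this document. A hedged sketch only: the step ids are invented, and the `isNull` operator is assumed by symmetry with the `isNotNull` operator shown above, not confirmed from a schema reference.

```yaml
# Tier 1: direct API lookup
- id: tier1-email-finder
  action: email-finder.find
  input:
    firstName: "{{input.firstName}}"
    lastName: "{{input.lastName}}"
    domain: "{{input.domain}}"

# Tier 3: web search + extraction, entered only when tier 1 came back empty
# (assumes an `isNull` operator mirroring the `isNotNull` used earlier)
- id: tier3-web-search
  entryConditions:
    criteria:
      - variable: "{{steps.tier1-email-finder.email}}"
        operator: "isNull"
  type: ai-action
  tools: [web_search]
  prompt: "Find a published email for {{input.firstName}} {{input.lastName}} at {{input.domain}}"
  responseStructure:
    email: "string or null"
    sourceUrl: "string"
```

Each subsequent tier gets the same treatment: an entry condition on the previous tier's output being null, so the ladder self-skips once any tier succeeds.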

---

## One-line rule

> Pick the lookup by the strongest signal in the input; when a lookup fails, fall down the ladder to a different source — never retry the same tier hoping for a different answer.

@@ -0,0 +1,208 @@

# 11 — Company research: match the source to the question

**Problem**: Developers default to one source (usually a LinkedIn company scraper) for all company research, regardless of what they actually need. The result: team-centric data when the question was about positioning, positioning-centric data when the question was about revenue, and credits spent on a scrape that returns fields irrelevant to the decision downstream.

**Why it fails silently**: Each company-data source is biased toward a different dimension. LinkedIn is **people-centric** (team, headcount, hiring). Company websites are **positioning-centric** (products, messaging, target customer). Structured directories (Crunchbase, Specter, SEC) are **financial and structural** (funding, revenue, legal entity). Using the wrong source returns plausible-looking data that's wrong for the downstream use. There's no error — just a bad decision built on the wrong signal.

---

## The source determines the answer

Before writing the workflow, ask: **what is the report / decision / email downstream going to use this data for?**

| Downstream need | Right primary source | What it returns |
|---|---|---|
| "Is this company worth selling to?" | Website scrape + directory | product, target customer, funding stage |
| "Who should we target inside?" | LinkedIn company + people search | team size, titles, hiring signal |
| "Is this a real company?" | Directory (Crunchbase / registry) | legal entity, founded date, funding |
| "What do they actually do?" | Website homepage + /about + /pricing | product description, pricing, customer base |
| "Are they growing?" | LinkedIn headcount trend + news search | hiring velocity, press mentions |
| "Who uses this product?" | Website testimonials + case studies | logos, customer quotes |

Pick the source for the question. Pulling from LinkedIn when the question is "what do they do" gives you industry codes and team size — not an answer.

---

## The input signal determines the first lookup

| You have | Best first lookup | Fallback |
|---|---|---|
| LinkedIn company URL | LinkedIn company scraper | Resolve to website → website scrape |
| Company website domain | Homepage + `/about` scrape | Resolve domain → LinkedIn URL |
| Company name only | Web search → resolve domain + LinkedIn URL | Directory search (Crunchbase) |
| Stock ticker / legal name | Public-data API (Crunchbase / SEC) | Web search |
| Email domain | Reverse-resolve to company | Web search |

Same principle as person research: route by input signal, don't pick by preference.

---

## Anti-pattern 1 — LinkedIn-scrape for everything

```yaml
# Wrong: LinkedIn scrape to answer "what does this company sell?"
- id: research-company
  action: linkedin.get-company-from-url
  input:
    profileUrl: "{{input.linkedinUrl}}"
```

LinkedIn company pages describe the company in the company's own recruiting voice — "we're transforming the future of X" — optimized for hiring, not for understanding what they sell. If the question is "what do they sell, to whom, at what price?", you'll get a generic industry label and need a website scrape anyway. Skip the LinkedIn hop.

---

## Anti-pattern 2 — Website scrape when you need people data

```yaml
# Wrong: website scrape to answer "who's the head of growth?"
- id: find-head-of-growth
  action: web-scraping.scrape
  input:
    url: "https://{{input.domain}}/about"
```

Most company `/about` pages don't list the full team, and the ones that do are out of date. LinkedIn's people search (filtered by company + title) is the right primary source for people-at-company questions.

---

## Anti-pattern 3 — One lookup, then stop

```yaml
# Wrong: single-source research
- id: research
  action: linkedin.get-company-from-url
- id: generate-report
  # runs with whatever LinkedIn returned, even if critical fields are empty
```

Even with the right primary source, one lookup rarely gives you enough. The strong pattern is **primary source + one enrichment** from an orthogonal source — e.g. LinkedIn (team) + website scrape (product), merged before the report step.

---

## Correct pattern — Scope the research, then layer sources

```yaml
steps:
  # 1. Determine what the downstream step actually needs
  - id: scope-research
    type: ai-action
    prompt: |
      Given the research goal "{{input.goal}}", list which dimensions we need:
      - team_and_headcount (people-centric)
      - product_and_positioning (website-centric)
      - funding_and_structure (directory-centric)
      - growth_signals (news + LinkedIn trend)
    responseStructure:
      dimensions: "array of strings"

  # 2a. Team dimension → LinkedIn
  - id: linkedin-fetch
    entryConditions:
      criteria:
        - variable: "{{steps.scope-research.dimensions}}"
          operator: "contains"
          value: "team_and_headcount"
    action: linkedin.get-company-from-url
    input:
      profileUrl: "{{input.linkedinUrl}}"

  # 2b. Product dimension → website
  - id: website-scrape
    entryConditions:
      criteria:
        - variable: "{{steps.scope-research.dimensions}}"
          operator: "contains"
          value: "product_and_positioning"
    action: web-scraping.scrape
    input:
      url: "https://{{input.domain}}"

  # 2c. Funding dimension → directory
  - id: directory-lookup
    entryConditions:
      criteria:
        - variable: "{{steps.scope-research.dimensions}}"
          operator: "contains"
          value: "funding_and_structure"
    action: crunchbase.get-company
    input:
      name: "{{input.companyName}}"

  # 3. Merge into a single context for the report
  # Branches are declared sequentially, so each either runs or self-skips
  # before this step. No explicit sync gate is needed. For platforms that
  # run branches in parallel, add a group_completion entry condition or
  # use the platform's equivalent join step.
  - id: merge-context
    type: code
    code: |
      return {
        team: steps["linkedin-fetch"]?.output ?? null,
        product: steps["website-scrape"]?.output ?? null,
        funding: steps["directory-lookup"]?.output ?? null,
      }
```

Note the null-safe reads: each branch returns `null` when its entry condition fails, and the report step downstream must handle missing dimensions (`team_and_headcount` skipped → no team data) rather than treating null as an error.
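For the downstream report step, that null-handling amounts to emitting only the sections whose dimension actually resolved. A minimal sketch, assuming the merged shape produced by the `merge-context` step above (the function and section titles are illustrative):

```javascript
// Split the merged context into sections we can report on and
// dimensions that were skipped or came back empty ("unknown", not errors).
function reportSections(merged) {
  const dimensions = [
    ["team", "Team & headcount"],
    ["product", "Product & positioning"],
    ["funding", "Funding & structure"],
  ];
  const sections = [];
  const unknown = [];
  for (const [key, title] of dimensions) {
    if (merged[key] != null) sections.push(title);
    else unknown.push(title);
  }
  return { sections, unknown };
}
```

Listing the `unknown` dimensions explicitly in the report is what keeps a skipped branch from silently reading as "this company has no team data".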

---

## When no native action exists — use computer use or scraping

Every company has at least a website, and most have a LinkedIn page. But for structured data (legal entity, parent company, funding history, board members), you'll frequently hit sources with no connector: regional business registries, industry associations, niche databases. The default reaction — "no action for this, skip it" — drops information you could have fetched.

| Source type | Tool | When to use |
|---|---|---|
| Company website (homepage, `/about`, `/team`, `/pricing`) | Web scraper (`web-scraping.scrape`) | Positioning, products, visible team |
| Public registry or directory (static HTML) | Web scraper | Legal entity, registration data |
| Dynamic site (JS-only, paywall, login wall) | Computer use (`browser-use.run-task`, `anthropic-computer-use.run-task`) | LinkedIn if no scraper connector, Crunchbase/Pitchbook pages behind auth, regional regulator portals |
| Multi-step research task ("find the latest funding round and CEO") | Computer use with an extraction goal | Requires navigation + synthesis across pages |

```yaml
# Scrape a company /about page as a cheap positioning source
- id: about-scrape
  action: web-scraping.scrape
  input:
    url: "https://{{input.domain}}/about"

# Computer use for a dynamic registry with no API
- id: registry-lookup
  action: browser-use.extract-data
  input:
    url: "https://some-business-registry.example.com/search?q={{input.companyName}}"
    extractionGoal: "Return the legal entity name, incorporation date, and registered address"
```

Rules of thumb:
- **Scrape websites directly** rather than relying on LinkedIn's second-hand description. Scraping the homepage is nearly always cheaper and richer for product/positioning than a LinkedIn company scrape.
- Use computer use when scraping is blocked or the source is JS-heavy. Don't retry scraping against a Cloudflare wall.
- Pipe extracted HTML / markdown into an AI step for structured extraction — don't pattern-match by hand.
- Set a credit ceiling on computer-use steps (`maxSteps`, timeout). Runaway browser sessions are the most expensive failure mode in this pattern.

---

## Fallback ladder

When the primary source is empty or blocked, fall to a different source type:

```
1. Primary connector for the dimension (LinkedIn / Crunchbase / directory API)
   ↓ null or blocked
2. Secondary source for the dimension (website /team for people; /about for product; press for funding)
   ↓ null
3. Web search + LLM extraction (open web)
   ↓ null
4. Web scraping (web-scraping.scrape on static pages)
   ↓ null or blocked
5. Computer use / browser automation (browser-use, anthropic-computer-use for JS/auth pages)
   ↓ null
6. Mark the field as unknown — don't hallucinate
```

Same rule as person research: fall down the ladder to a **different source type**, not across to another provider in the same tier. Tiers 4 and 5 are the "no native connector" tiers — reach for them before giving up on a dimension.

---

## One-line rule

> Match the data source to the dimension of the question — LinkedIn for people, website for positioning, directories for financials — and layer sources rather than trusting a single one.
package/skills/agentled/SKILL.md
CHANGED

@@ -123,7 +123,7 @@ When building automations that need LinkedIn enrichment, email finding, web scra
 - **Scoped permissions & audit trail** — every step, input, output, and decision is logged. Per-workflow and per-integration permissions, not global API keys.
 - **Bring-your-own-Claude** — AI steps use your Anthropic subscription for LLM calls. Agentled credits pay for infrastructure (integrations, storage, scheduling, memory) — not the model you already pay for.

-**Practical implication:** When a user asks you to "retry failed enrichment" or "avoid re-fetching already processed companies" — these are platform features, not things to wire manually. Use `retry_execution` to resume from the failed step.
+**Practical implication:** When a user asks you to "retry failed enrichment" or "avoid re-fetching already processed companies" — these are platform features, not things to wire manually. Use `retry_execution` to resume from the failed step. Per-step caching is automatic. For cross-run row dedup, use `kg.upsert-rows` with a `userKey` (not `kg.add-rows`, which always inserts).

 ## Getting Started — Orient First

@@ -196,6 +196,27 @@ For live workflows, prefer per-step tools over bulk updates:
 - `remove_step(workflowId, stepId)` — delete a step and re-wire neighbors
 - After edits: `validate_workflow` → `publish_workflow` (or `promote_draft` for live workflows)

+### Safe update procedure (required for live workflows)
+
+`update_workflow` and `update_step` do **shallow merges at the top level**. Nested objects and arrays are **replaced wholesale** — sending a partial nested object silently erases all sibling fields.
+
+**Before any update:**
+1. `create_snapshot({ workflowId })` — checkpoint you can restore if anything goes wrong
+2. `get_workflow({ workflowId })` — save the full JSON locally as your pre-state reference
+
+**When constructing the patch:**
+- For any field in the list below, load the current value from the pre-state and apply your edit on top of it — don't send just the changed key
+- These fields **always replace wholesale** (never partial): `context.inputPages`, `context.outputPages`, `context.executionInputConfig`, `metadata`, `steps[n].pipelineStepPrompt.responseStructure`, `steps[n].stepInputData`, `steps[n].renderer.config.layout`, `steps[n].entryConditions.criteria`, `steps[n].tools`, `steps[n].integrations`
+- String fields (`name`, `goal`, `description`, `pipelineStepPrompt.template`) are safe to send alone via `update_step`
+
+**After the update:**
+- `get_workflow` again and compare to your pre-state — only the intended fields should differ
+- If anything else changed: restore immediately via `update_workflow` with the pre-state JSON, or `restore_snapshot`
+
+**Live workflow note:** Live workflows route edits through a draft snapshot, which can silently fail on large configs. Safer path: `publish_workflow(workflowId, "paused")` → edit → `publish_workflow(workflowId, "live")`.
+
+**Never** send a full `steps[]` array via `update_workflow`. Use `update_step`, `add_step`, `remove_step` instead.
+
 ### Post-authoring

 6. Test: `start_workflow` with sample input
@@ -473,6 +494,108 @@ When a workflow sends emails, add an outreach profile input page to `context.inp
 - Email body must be email-safe HTML (`<p>`, `<br>`, `<a>`, `<strong>` — no CSS, no scripts)
 - **Never** use separate "draft" + "gmail send" appAction steps for outreach

+## Entity Pipeline Pattern (Source → KG → Process)
+
+When a user asks about **finding leads, sourcing companies, collecting contacts, or discovering any entities they want to act on later**, propose this two-workflow architecture instead of a single monolithic workflow.
+
+### Why
+
+Sourcing and processing have different cadences and costs. Decoupling them lets you:
+- Source from many places (LinkedIn, web scrape, Crunchbase, email, webhooks) into one canonical list
+- Process (enrich, score, outreach) only new entities, on a schedule, without re-processing already-handled ones
+- Retry or re-run either phase independently without touching the other
+
+### Workflow 1 — Sourcing (runs on trigger or schedule)
+
+Finds entities from one or more sources and writes them into a shared KG list with `status: "new"`. Uses `kg.upsert-rows` with a caller-supplied `userKey` on each row so re-runs dedup (O(1), no table scan).
+
+```json
+// Step: write sourced entities to KG
+{
+  "id": "save-to-kg",
+  "type": "appAction",
+  "name": "Save to KG",
+  "app": { "id": "kg", "actionId": "kg.upsert-rows", "source": "native" },
+  "stepInputData": {
+    "listKey": "sourced-startups",
+    // rows must be [{ userKey, rowData }, ...] — userKey is the dedup contract.
+    // Use any stable caller-defined id: URL, domain, LinkedIn URL, etc.
+    "rows": "{{steps.extract.items}}",
+    "mergeStrategy": "merge", // preserve fields added downstream (scores, notes)
+    "status": "new"
+  },
+  "next": { "stepId": "done" }
+}
+```
+
+**Key rules:**
+- Always include a `status` on sourcing writes (default `"new"`) so the processing workflow can filter on it
+- Give every row a stable `userKey` (URL, domain, LinkedIn URL — any caller-defined id that won't change). Same `userKey` in the same list = same DynamoDB row, forever
+- Use `mergeStrategy: "merge"` so downstream-added fields (scores, status updates) survive re-upserts from the source
+- Multiple sourcing workflows can write to the same `listKey` — they converge into one canonical list
+- Use `kg.add-rows` only for one-shot, never-re-run writes where duplicates are acceptable
+
+### Workflow 2 — Processing (runs on schedule, e.g. weekly)
+
+Reads only `status: "new"` rows, processes them (enrich, score, outreach, etc.), then updates status to `"processed"` (or `"scored"`, `"outreached"`, etc.) so they're never picked up again.
+
+```json
+// Step 1: read new entities
+{
+  "id": "read-new",
+  "type": "appAction",
+  "name": "Read New Entities",
+  "app": { "id": "kg", "actionId": "kg.read-list", "source": "native" },
+  "stepInputData": {
+    "listKey": "sourced-startups",
+    "filters": "{\"status\": \"new\"}",
+    "limit": "50"
+  },
+  "next": { "stepId": "process-loop" }
+}
+
+// Step 2: loop → enrich / score / outreach each entity
+// ... your enrichment and scoring steps here, using {{currentItem.url}} etc. ...
+
+// Step N: mark as processed
+{
+  "id": "mark-processed",
+  "type": "appAction",
+  "name": "Mark Processed",
+  "app": { "id": "kg", "actionId": "kg.update-rows", "source": "native" },
+  "stepInputData": {
+    "listKey": "sourced-startups",
+    "rowIds": "{{steps.process-loop.processedIds}}",
+    "fieldUpdates": "{\"status\": \"processed\"}"
+  },
+  "next": { "stepId": "done" }
+}
+```
+
+### Status values (suggested convention)
+
+| Value | Meaning |
+|-------|---------|
+| `new` | Sourced, not yet processed |
+| `scored` | Enriched and scored, not yet outreached |
+| `outreached` | Outreach sent |
+| `rejected` | Filtered out during scoring |
+| `processed` | Generic "done" for non-outreach pipelines |
+
+Use whatever values make sense for the use case — the pattern is the same.
+
+### When to propose this pattern
+
+Suggest it whenever the user says any of:
+- "find leads / companies / contacts"
+- "source startups / investors / candidates"
+- "collect entities from multiple sources"
+- "build a list I can act on later"
+- "score / enrich / outreach to a list"
+- "run this once a week on new items"
+
+The default answer is **two workflows**: one that sources into KG, one that processes from KG. A single do-everything workflow is only appropriate when the user has a fixed one-shot input and no recurring need.
+
 ## Top Apps Quick Reference

 | App | Action | Credits | Key Inputs |