@tangle-network/agent-eval 0.68.0 → 0.69.0

Files changed (6)

package/CHANGELOG.md CHANGED

@@ -4,6 +4,19 @@ All notable changes to `@tangle-network/agent-eval` and its sibling `agent-eval-
 ---
+## [0.69.0] — 2026-05-30 — strong generic baseline roles (engineer / researcher / generalist)
+The structured profile (0.68.0) had a hollow top zone — `baselineProfile` took an arbitrary `role` string. Products are file-producing, tool-using agents living in a sandbox, but nothing gave them a strong operator foundation. This adds three generically-useful, verification-first baseline roles distilled from agent-runtime's `coderProfile` doctrine.
+### Added (`profile.*`)
+- **`engineerRole`** — a senior principal / 10x-IC sandbox operator: produce the real artifact then verify it; smallest correct change; **run the checks and fix the root cause — never weaken a test or hide an error**; inspect external-boundary outcomes; "done" = produced AND verified.
+- **`researcherRole`** — read the real sources, cite every material claim, mark inference vs. verified, never fabricate a source/quote/number.
+- **`generalistRole`** — strong default: do over describe, ground claims, verify before done, ask only on genuinely user-owned choices.
+- `BASELINE_ROLES` (keyed `engineer|researcher|generalist`) + `baselineProfileFromRole(role, overrides?)` — pick a foundation, override the environment to describe THIS product's sandbox, then layer domain via `prodProfile`.
+**Layering discipline:** these are domain-AGNOSTIC and verification-first. Domain strength (legal M&A persona, tax-calc rigor) stays in the **product repo** and composes on top via `domain[]`; it is lifted into the substrate only once ≥2 products genuinely reuse it. 3 new tests assert the roles are distinct, verification-first, and carry no product-domain words. Full suite (1642) green.
 ## [0.68.0] — 2026-05-30 — structured AgentProfile (the self-improvement surface stops being an opaque blob)
 The optimizable surface was an opaque string addendum, so the loop could only mutate (and the dashboard only diff) an unstructured blob — you couldn't see *what kind* of improvement a candidate made. This adds a **sectioned `AgentProfile`** primitive (mirrored on Harvey LAB's system-prompt structure) so the surface has named, separately-addressable zones the loop targets one at a time.
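The 0.69.0 entry mentions three tests asserting that the roles are distinct, verification-first, and free of product-domain words, but the tests themselves are not part of this diff. A check along those lines might look like the sketch below. The `roles` values are single sentences quoted from the role strings in `package/dist/index.js`, and `DOMAIN_WORDS` and `checkRoles` are hypothetical names introduced only for illustration.

```typescript
// Illustrative stand-ins: one sentence quoted from each role string in this diff.
// The real constants are exported under the package's `profile` namespace.
const roles: Record<string, string> = {
  engineer:
    "You DO the work: you produce the actual artifact in the workspace, then you verify it.",
  researcher:
    "Cite the source for every material claim; explicitly mark anything you infer rather than verify.",
  generalist:
    "Verify your output before declaring done, and report what you verified vs. what remains uncertain.",
};

// Hypothetical deny-list. The package's actual list of product-domain words is not shown.
const DOMAIN_WORDS = ["legal", "tax", "m&a"];

// Returns a list of human-readable problems; an empty list means all checks passed.
function checkRoles(input: Record<string, string>): string[] {
  const problems: string[] = [];
  const bodies = Object.values(input);

  // Distinct: no two roles may share the same text.
  if (new Set(bodies).size !== bodies.length) {
    problems.push("roles are not distinct");
  }

  for (const [key, body] of Object.entries(input)) {
    const lower = body.toLowerCase();
    // Verification-first: the text must mention verifying in some form.
    if (!/verif/.test(lower)) {
      problems.push(`${key}: no verification language`);
    }
    // Domain-agnostic: plain substring match, so "tax" would also flag "syntax".
    for (const word of DOMAIN_WORDS) {
      if (lower.includes(word)) {
        problems.push(`${key}: contains domain word "${word}"`);
      }
    }
  }
  return problems;
}

const roleProblems = checkRoles(roles);
console.log(roleProblems.length === 0 ? "ok" : roleProblems.join("\n")); // ok
```

The substring match is deliberately crude; a real suite would probably match on word boundaries and run against the full role strings.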

package/dist/index.d.ts CHANGED

@@ -5821,6 +5821,45 @@ declare function defaultRenderStudentPrompt<TInput>(args: {
  *  Throws on failure so the cell is recorded as failed, not silently zeroed. */
 declare function defaultParseStudentLabel<TProduced>(rawContent: string, scenarioId: string): TProduced;
+/**
+ * @experimental
+ *
+ * Strong, generically-useful baseline ROLES — the top zone of an `AgentProfile`
+ * before any domain layer. A product composes one of these with its own
+ * environment description (its sandbox) and its domain guidance, which lives in
+ * the product repo (not here): a profile is `<baseline role> + <environment> +
+ * <domain sections>`. Domain strength is NOT generalized into the substrate —
+ * it is lifted here only once ≥2 products genuinely reuse it.
+ *
+ * Three roles cover the common shapes:
+ *   - `engineerRole`   — builds + verifies real artifacts in a sandbox. Distilled
+ *                        from agent-runtime's `coderProfile` doctrine (minimal
+ *                        correct change, run the checks, fix the root cause,
+ *                        never weaken a test or hide an error).
+ *   - `researcherRole` — gathers from real sources and grounds every claim
+ *                        (cite or mark inferred; never fabricate).
+ *   - `generalistRole` — a strong default: helpful, grounded, verify-before-assert.
+ *
+ * All three are verification-first and domain-agnostic. They describe HOW a
+ * capable IC operates, not WHAT domain it works in.
+ */
+/** Senior-IC engineer operating in a sandbox — produces real artifacts and
+ *  verifies them before declaring done. The shared operator foundation for any
+ *  file-producing, tool-using product agent. */
+declare const engineerRole: string;
+/** Researcher who grounds every claim in real sources — gathers, cites, and
+ *  distinguishes verified fact from inference. */
+declare const researcherRole: string;
+/** Strong general-purpose default — helpful, grounded, verifies before asserting. */
+declare const generalistRole: string;
+/** The named baseline roles, for selection by key (e.g. from a product config). */
+declare const BASELINE_ROLES: {
+    readonly engineer: string;
+    readonly researcher: string;
+    readonly generalist: string;
+};
+type BaselineRoleKey = keyof typeof BASELINE_ROLES;
 /**
  * @experimental — surface may evolve as production agents wire it in.
  *
@@ -5843,6 +5882,7 @@ declare function defaultParseStudentLabel<TProduced>(rawContent: string, scenari
  * varied. Consume this module by its own path (`@tangle-network/agent-eval`
  * exposes it under the `profile` namespace) to avoid the name clash.
  */
 /** A named, addressable region of the system prompt. `evolvable` marks whether
  *  the self-improvement loop is allowed to patch its body; fixed scaffolding
  *  (e.g. a compliance preamble) sets `evolvable: false`. */
@@ -5892,6 +5932,18 @@ declare function baselineProfile(args: {
     toolConventions?: string;
     skills?: ProfileSkill[];
 }): AgentProfile;
+/**
+ * Baseline from one of the strong generic roles (`'engineer' | 'researcher' |
+ * 'generalist'`) — the common case: pick a role foundation, optionally override
+ * the environment to describe THIS product's sandbox, then layer domain via
+ * `prodProfile`. Domain guidance stays in the product repo; this only supplies
+ * the generically-useful role + stock scaffolding.
+ */
+declare function baselineProfileFromRole(role: BaselineRoleKey, args?: {
+    environment?: string;
+    toolConventions?: string;
+    skills?: ProfileSkill[];
+}): AgentProfile;
 /** The production profile: the baseline scaffolding plus the domain sections
  *  shipped after self-improvement. Differs from the baseline ONLY in `domain`
  *  (and any skills the caller layered into the baseline) — the role,
@@ -5918,15 +5970,21 @@ declare function sectionHash(section: AgentProfileSection): string;
 type index_AgentProfile = AgentProfile;
 type index_AgentProfileSection = AgentProfileSection;
+declare const index_BASELINE_ROLES: typeof BASELINE_ROLES;
+type index_BaselineRoleKey = BaselineRoleKey;
 type index_ProfileSkill = ProfileSkill;
 declare const index_applyDomainPatch: typeof applyDomainPatch;
 declare const index_baselineProfile: typeof baselineProfile;
+declare const index_baselineProfileFromRole: typeof baselineProfileFromRole;
+declare const index_engineerRole: typeof engineerRole;
+declare const index_generalistRole: typeof generalistRole;
 declare const index_prodProfile: typeof prodProfile;
 declare const index_profileToSurface: typeof profileToSurface;
 declare const index_renderProfile: typeof renderProfile;
+declare const index_researcherRole: typeof researcherRole;
 declare const index_sectionHash: typeof sectionHash;
 declare namespace index {
-  export { type index_AgentProfile as AgentProfile, type index_AgentProfileSection as AgentProfileSection, type index_ProfileSkill as ProfileSkill, index_applyDomainPatch as applyDomainPatch, index_baselineProfile as baselineProfile, index_prodProfile as prodProfile, index_profileToSurface as profileToSurface, index_renderProfile as renderProfile, index_sectionHash as sectionHash };
+  export { type index_AgentProfile as AgentProfile, type index_AgentProfileSection as AgentProfileSection, index_BASELINE_ROLES as BASELINE_ROLES, type index_BaselineRoleKey as BaselineRoleKey, type index_ProfileSkill as ProfileSkill, index_applyDomainPatch as applyDomainPatch, index_baselineProfile as baselineProfile, index_baselineProfileFromRole as baselineProfileFromRole, index_engineerRole as engineerRole, index_generalistRole as generalistRole, index_prodProfile as prodProfile, index_profileToSurface as profileToSurface, index_renderProfile as renderProfile, index_researcherRole as researcherRole, index_sectionHash as sectionHash };
 }
 export { ANALYST_SEVERITIES, type ActiveLearningOptions, type AdapterRun, AgentDriver, type AgentDriverConfig, AgentEvalError, AgentProfile$1 as AgentProfile, type AgreementResult, type AlignmentOp, Analyst, AnalystContext, AnalystCost, AnalystFinding, AnalystSeverity, AnalyzeTracesInput, AnalyzeTracesOptions, AnalyzeTracesResult, type AntiSlopConfig, type AntiSlopIssue, type AntiSlopReport, Artifact$1 as Artifact, type Artifact as ArtifactCheckArtifact, type ArtifactEventLike, type ArtifactValidator, type AssertCrossFamilyOptions, type AssertSingleBackendOptions, type AutoPrClient, AxGepaSteeringOptimizer, type AxSteeringOptimizerConfig, type BackendDescriptor, BaselineReport, BehaviorAssertion, BenchmarkReport, BenchmarkRunner, BenchmarkRunnerConfig, type BisectOptions, type BisectResult, type BisectStep, BudgetBreachError, BudgetGuard, BudgetLedgerEntry, BudgetSpec, type BuildAgreementJudgeOptions, CallExpectation, type CanaryAlert, type CanaryKind, type CanaryLeak, type CanaryOptions, type CanaryReport, type CanarySeverity, type CandidateScenario, type CausalAttributionReport, type CellVerdict, ChatRequest, CheckResult, CollectedArtifacts, type CommandRunner, type CompareLabels, CompletionCriterion, type CompletionRequirement, type CompletionVerdict, type ConceptComplexity, type ConceptFinding, type ConceptSpec, type ConceptWeightStrategy, type ContinuityCheck, type ContinuityCheckResult, type ContinuityReport, type ContinuitySnapshotPair, type ContractMetric, type ContractReport, ConvergenceTracker, type CorrectnessChecker, type CostEntry, type CostSummary, CostTracker, type CounterfactualContext, type CounterfactualMutation, type CounterfactualResult, type CounterfactualRunner, CreateChatClientOpts, type CreateDefaultReviewerOptions, type CreateSandboxPoolOpts, type CreateTraceAnalystKindOpts, CrossFamilyError, type CrossTraceDiff, type CrossTraceDiffOptions, D1ExperimentStore, type D1ExperimentStoreOptions, type D1Like, type D1PreparedStatementLike, 
DEFAULT_AGENT_SLOS, DEFAULT_COMPLEXITY_WEIGHTS, DEFAULT_FINDERS, DEFAULT_HARNESS_OBJECTIVES, DEFAULT_MUTATION_PRIMITIVES, DEFAULT_MUTATORS, DEFAULT_PR_REVIEW_SCORE_WEIGHTS, DEFAULT_RUN_SCORE_WEIGHTS, DEFAULT_SEVERITY_WEIGHTS, DEFAULT_TRACE_ANALYST_KINDS, Dataset, DatasetScenario, type DecideNextUserTurnOpts, type DeployFamily, type DeployGateLayerInput, type DeployRunResult, type DeployRunner, type DiffPolicy, type DiffScorecardOptions, type DirEntry, type Direction, type DiscoverPersonasOptions, type DiscoveredPersona, DriverResult, DriverState, DualAgentBench, type DualAgentBenchConfig, type DualAgentReport, type DualAgentRound, type DualAgentScenario, type DualAgentScenarioResult, ERROR_COUNT_PATTERNS, type ErrorCountPattern, type EvolutionRound, type ExecutorConfig, type Expectation, type Experiment, type Run as ExperimentRun, type ExperimentStore, ExperimentTracker, type ExportedRewardModel, type ExtractOptions, type ExtractResult, FAILURE_MODE_KIND_SPEC, FINDING_SUBJECT_GRAMMAR_PROMPT, FINDING_SUBJECT_KINDS, type FactorContribution, type FactorialCell, FeedbackLabel, FeedbackTrajectory, FeedbackTrajectoryStore, type FieldAgreementSpec, type FileChange, FileSystemExperimentStore, type FileSystemExperimentStoreOptions, type FindingSubject, type FindingSubjectKind, FindingSubjectStringSchema, type FindingsDiff, FindingsStore, type FlowAction, type FlowLayerEnv, type FlowLayerFactoryInput, type FlowRunner, type FlowRunnerStepResult, type FlowSpec, type FlowStep, type GhCliClientOptions, type GoldScenario, type GoldSplit, type GoldenSeverity, type GoldenSpec, type HarnessAdapter, HarnessConfig, type HarnessExperimentConfig, type HarnessExperimentResult, type HarnessIntervention, type HarnessRunRequest, type HarnessRunResult, type HarnessScenario, type HarnessSelection, type HarnessVariant, type HarnessVariantReport, HoldoutAuditor, type HostedJudgeConfig, type HostedJudgeDimension, type HostedJudgeRequest, type HostedJudgeResponse, type HostedRunCriticConfig, type 
HostedRunScoreRequest, type HostedRunScoreResponse, type HttpGithubClientOptions, type HypothesisManifest, type HypothesisResult, IMPROVEMENT_KIND_SPEC, INTENT_MATCH_JUDGE_VERSION, type ImageData, InMemoryExperimentStore, InMemoryWorkspaceInspector, type InferenceScorer, type InspectorContext, type IntentMatchInput, type IntentMatchOptions, type IntentMatchResult, type InteractionContribution, type JudgeAdapterOpts, type JudgeFamily, type JudgeFleetOptions, JudgeFn, JudgeInput, type JudgeReplayResult, type JudgeRetryOutcome, type JudgeRetryPolicy, JudgeRunner, KIND_EXPECTED_SUBJECTS, KNOWLEDGE_GAP_KIND_SPEC, KNOWLEDGE_POISONING_KIND_SPEC, type KeywordConceptSpec, type KeywordCoverageFinding, type KeywordCoverageOptions, type KeywordCoverageResult, type LangfuseEnvelope, type LangfuseGeneration, type LangfuseScore, Layer, LayerResult, type LiveProofArtifact, type LiveProofConfig, type LiveProofContext, type LiveProofResult, LlmClientOptions, type LlmCorrectnessCheckerOpts, LlmSpan, LockedJsonlAppender, MODEL_PRICING, type MatchResult, type MatcherResult, type MeasurementPolicy, type MergeOptions, MetricsCollector, type MuffledFinder, type MuffledFinding, MultiLayerVerifier, type MultiToolchainLayerConfig, type Mutator, Mutex, type Objective, type Oracle, type OracleObservation, type OracleReport, type OracleResult, type OrthogonalityInput, type OrthogonalityResult, OtelExportConfig, OtelExporter, type OtelPipelineHandle, type OtelPipelineOptions, PairwiseSteeringOptimizer, type ParaphraseRobustnessScenarioInput, type ParaphraseRobustnessScenarioResult, type ParetoResult, type ParseStudentLabel, type PersistedFinding, PersonaConfig, type Playbook, type PlaybookEntry, type PoolSlot, type PrReviewAuditCase, type PrReviewBenchmarkSummary, type PrReviewComment, type PrReviewMatchedFinding, type PrReviewOutcome, type PrReviewReferenceFinding, type PrReviewScore, type PrReviewScoreWeights, type PrReviewSeverity, type PrReviewSource, type ProducedProposal, type 
ProducedState, ProductClient, ProductClientConfig, type PromptHandle, PromptRegistry, type ProposalEventLike, type ProposeAutomatedPullRequestInput, type ProposeAutomatedPullRequestResult, RAW_FINDING_SCHEMA_PROMPT, type RawAnalystFinding, RawAnalystFindingSchema, type RecordRunsOptions, type ReferenceMatchResult, type ReferenceReplayAdapter, type ReferenceReplayAdapterFn, type ReferenceReplayAdapterLike, type ReferenceReplayAggregate, type ReferenceReplayCandidate, type ReferenceReplayCase, type ReferenceReplayCaseRun, type ReferenceReplayExecutionScenario, type ReferenceReplayItem, type ReferenceReplayMatch, type ReferenceReplayMatchStrategy, type ReferenceReplayMatcher, type ReferenceReplayPromotionDecision, type ReferenceReplayPromotionPolicy, type ReferenceReplayRun, type ReferenceReplayRunContext, type ReferenceReplayRunOptions, type ReferenceReplayRunStore, type ReferenceReplayScenario, type ReferenceReplayScenarioScore, type ReferenceReplayScore, type ReferenceReplayScoreOptions, type ReferenceReplaySplit, type ReferenceReplaySplitComparison, type ReferenceReplaySteeringRowsOptions, type ReflectionContext, type ReflectionProposal, ReleaseConfidenceScorecard, ReleaseConfidenceThresholds, type RenderStudentPrompt, type RepoRef, type RequirementCheck, type ReviewerMemoryEntry, type ReviewerOutput, type ReviewerPromptInput, type ReviewerSoftFailDefaults, type ReviewerVerificationSummary, type RobustnessResult, Run$1 as Run, type RunCommandInput, type RunCommandResult, type RunConfig, RunCritic, type RunCriticAdapterOpts, type RunCriticOptions, type RunDiff, type RunDistillationOptions, type RunDistillationResult, RunFilter, RunRecord, type RunScore, type RunScoreWeights, type RunTrace, type RuntimeEventLike, SEMANTIC_CONCEPT_JUDGE_VERSION, SKILL_USAGE_ANALYST, SandboxDriver, SandboxHarnessResult, type SandboxJudgeKind, type SandboxJudgeResult, type SandboxJudgeSpec, type SandboxPool, type SatisfiedBy, type ScanOptions, Scenario, type ScenarioCost, ScenarioFile, 
ScenarioRegistry, ScenarioResult, type Scorecard, type ScorecardCell, type ScorecardCellDiff, type ScorecardDiff, type ScorecardEntry, type ScorecardLogLine, type ScoredTarget, type SelfPlayOptions, type SelfPlayProposer, type SelfPlayScorer, type SemanticConceptJudgeAdapterOpts, type SemanticConceptJudgeInput, type SemanticConceptJudgeOptions, type SemanticConceptJudgeResult, type SeriesConvergenceOptions, type SeriesConvergenceResult, Severity, type SignedManifest, type SignedManifestAlgo, type SingleBackendDivergence, SingleBackendError, type SingleBackendField, type SingleBackendReport, SkillUsageAnalyst, type SkillUsageRecord, type SkillUsageReport, type SkillUsageScanConfig, type Slo, type SloCheckResult, type SloComparator, type SloReport, type SloSeverity, type SlopCategory, type SlotFactory, Span, type SplitGoldOptions, type SteeringBundle, type SteeringDelta, type SteeringOptimizationResult, type SteeringOptimizationRow, type SteeringOptimizationSelector, type SteeringOptimizerBackend, type SteeringOptimizerConfig, type SteeringRolePrompt, type StepAttribution, type SynthesisReason, type SynthesisTarget, type TaskGold, TestResult, type ThresholdContract, TokenCounter, type TokenSpec, type ToolCallEventLike, TraceAnalysisStore, type TraceAnalystAdapterOpts, type TraceAnalystGolden, type TraceAnalystKindSpec, TraceEmitter, TraceEvent, TraceStore, type TraceToolGroupName, type TracedAnalystOptions, type TracedJudgeOptions, Trajectory, TrajectoryStep, type TrialTrace, TurnMetrics, UNIVERSAL_FINDERS, type ValidationContext, type ValidationIssue, type ValidationResult, type VerifierAdapterOpts, VerifyContext, VerifyOptions, type VisualDiffOptions, type VisualDiffResult, type ViteDeployRunnerInput, type WorkflowTopology, type WorkspaceAssertion, type WorkspaceAssertionResult, type WorkspaceInspector, type WorkspaceSnapshot, type WranglerDeployRunnerInput, adversarialJudge, aggregatePrReviewScore, aggregateRunScore, analyzeAntiSlop, analyzeSeries, 
appendScorecard, assertCrossFamily, assertSingleBackend, attributeCounterfactuals, bisect, buildAgreementJudge, buildDriverSystemPrompt, buildReflectionPrompt, buildReviewerPrompt, buildSkillUsageReport, buildTraceToolsForGroup, byteLengthRange, canaryLeakView, canonicalize, causalAttribution, checkBehavioralCanary, checkCanaries, checkSlos, clamp01, codeExecutionJudge, coherenceJudge, collectionPreserved, commentsForSource, commitBisect, compareReferenceReplay, compilerJudge, composeValidators, containsAll, createAntiSlopJudge, createCustomJudge, createDefaultReviewer, createDomainExpertJudge, createIntentMatchJudge, createJudgeAdapter, createLlmCorrectnessChecker, createRunCriticAdapter, createSandboxPool, createSemanticConceptJudge, createSemanticConceptJudgeAdapter, createTraceAnalystAdapter, createTraceAnalystKind, createVerifierAdapter, crossTraceDiff, crowdingDistance, decideNextUserTurn, decideReferenceReplayPromotion, decideReferenceReplayRunPromotion, defaultIsMaterial, defaultJudges, defaultParseStudentLabel, defaultReferenceReplayMatcher, defaultRenderStudentPrompt, deployGateLayer, diffFindings, diffScorecard, discoverPersonas, distillPlaybook, dominates, emitSkillUsageFindings, estimateCost, estimateTokens, evaluateContract, evaluateHypothesis, evaluateOracles, executeScenario, expectAgent, exportRewardModel, extractAssetUrls, extractErrorCount, extractProducedState, fieldAgreement, fileContains, fileExists, findAutoMatchNoExpectation, findConstructorCwdDropped, findFallbackToPass, findLiteralTruePass, findSkipCountsAsPass, flowLayer, formatBenchmarkReport, formatDriverReport, formatFindings, formatScorecardDiff, ghCliClient, precision as goldenPrecision, hashContent, hashJson, htmlContainsElement, httpGithubClient, inMemoryReferenceReplayStore, isModelPriced, isOtelConfigured, jsonHasKeys, jsonShape, jsonlReferenceReplayStore, judgeFamily, keyPreserved, liftSeverity, linterJudge, loadGoldScenarios, loadScorecard, loadScorerFromGrader, 
localCommandRunner, lowercaseMutator, matchGoldens, mergeLayerResults, mergeSteeringBundle, multiToolchainLayer, notBlocked, paraphraseRobustness, paraphraseRobustnessScenarios, paretoFrontier, paretoFrontierWithCrowding, parseCorrectnessResponse, parseFindingSubject, parseGoldJsonl, parseRawFinding, parseReflectionResponse, passOrthogonality, pixelDeltaRatio, politenessPrefixMutator, printDriverSummary, index as profile, promptBisect, proposeAutomatedPullRequest, proposeSynthesisTargets, recordRuns, recordRunsToScorecard, referenceReplayRunsToSteeringRows, referenceReplayScenarioToRunScore, regexMatch, regexMatches, renderFindingSubject, renderMarkdownReport, renderPlaybookMarkdown, renderPriorFindings, renderSteeringText, replayScorerOverCorpus, replayTraceThroughJudge, resetLockedAppendersForTesting, resolveModelPricing, rowCount, rowWhere, runAssertions, runBehavioralCanaries, runCanaries, runCounterfactual, runDistillation, runE2EWorkflow, runExpectations, runHarnessExperiment, runIntentMatchJudge, runJudgeFleet, runKeywordCoverageJudge, runKeywordCoverageJudgeUrl, runLiveProof, runReferenceReplay, runSelfPlay, runSemanticConceptJudge, scalarScore, scanForMuffledGates, scoreContinuity, scorePrReviewComments, scorePrReviewSource, scoreReferenceReplay, securityJudge, selectHarnessVariant, sentenceReorderMutator, signManifest, splitGold, statusAdvanced, summarizeHarnessResults, summarizePrReviewBenchmark, testJudge, textInSnapshot, toLangfuseEnvelope, toPrometheusText, traceJudge, traceJudgeEnsemble, tracedAnalyzeTraces, typoMutator, urlContains, verifyCompletion, verifyManifest, visualDiff, viteDeployRunner, weightedRecall, whitespaceCollapseMutator, withJudgeRetry, withOtelPipeline, wranglerDeployRunner };
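The declarations above describe `BASELINE_ROLES` as intended "for selection by key (e.g. from a product config)", with `BaselineRoleKey` derived via `keyof typeof`. A config value usually arrives as an untyped string, so one way to narrow it safely is a type guard, sketched below. The `BASELINE_ROLES` object here is a local stand-in with placeholder values, and `isBaselineRoleKey` and `resolveRoleKey` are hypothetical helpers that the package does not export. Falling back to `generalist` is an assumption based on the changelog calling it the strong default.

```typescript
// Local stand-in for the exported constant; the real values are the full role prompts.
const BASELINE_ROLES = {
  engineer: "<engineer role prompt>",
  researcher: "<researcher role prompt>",
  generalist: "<generalist role prompt>",
} as const;

// Same derivation as in the declaration file.
type BaselineRoleKey = keyof typeof BASELINE_ROLES;

/** Narrow an untrusted string (e.g. read from a config file) to a known role key. */
function isBaselineRoleKey(value: string): value is BaselineRoleKey {
  // Own-property check so inherited names such as "toString" are rejected.
  return Object.prototype.hasOwnProperty.call(BASELINE_ROLES, value);
}

/** Hypothetical helper: resolve a configured role, falling back to a default key. */
function resolveRoleKey(
  configured: string | undefined,
  fallback: BaselineRoleKey = "generalist",
): BaselineRoleKey {
  return configured !== undefined && isBaselineRoleKey(configured) ? configured : fallback;
}

console.log(resolveRoleKey("engineer")); // engineer
console.log(resolveRoleKey("lawyer"));   // generalist
console.log(resolveRoleKey("toString")); // generalist
```

The resolved key can then be passed to `baselineProfileFromRole`, whose first parameter is typed as `BaselineRoleKey`.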

package/dist/index.js CHANGED

@@ -10719,13 +10719,57 @@ function replacerSortKeys() {
 // src/profile/index.ts
 var profile_exports = {};
 __export(profile_exports, {
+  BASELINE_ROLES: () => BASELINE_ROLES,
   applyDomainPatch: () => applyDomainPatch,
   baselineProfile: () => baselineProfile,
+  baselineProfileFromRole: () => baselineProfileFromRole,
+  engineerRole: () => engineerRole,
+  generalistRole: () => generalistRole,
   prodProfile: () => prodProfile,
   profileToSurface: () => profileToSurface,
   renderProfile: () => renderProfile,
+  researcherRole: () => researcherRole,
   sectionHash: () => sectionHash
 });
+// src/profile/baselines.ts
+var engineerRole = [
+  "You are a senior principal engineer \u2014 a 10x individual contributor \u2014 operating inside an isolated sandbox workspace with real tools (shell, filesystem, editors, test runners).",
+  "You do not behave like a chatbot that describes work. You DO the work: you produce the actual artifact in the workspace, then you verify it.",
+  "",
+  "How you operate:",
+  "- Deliver the smallest correct change that fully satisfies the goal. Bias to the real artifact (the file, the patch, the document), never a description of it.",
+  "- Before declaring done, run the available checks (tests, typecheck, validators, a re-read of what you produced). If a check fails, fix the ROOT CAUSE \u2014 never weaken the check, never hide the error, never fake success.",
+  "- External-boundary calls (shell, network, filesystem) can fail. Inspect the outcome before relying on it; surface failures loudly rather than proceeding on a bad value.",
+  '- State outcomes faithfully: what you verified, what you skipped, what is still failing. "Done" means produced AND verified.'
+].join("\n");
+var researcherRole = [
+  "You are a principal research analyst operating inside a workspace with tools to read sources, search, and record findings.",
+  "Your output is only as good as its grounding. You gather from the real sources in front of you and you ground every material claim.",
+  "",
+  "How you operate:",
+  "- Read the actual sources before concluding. Do not answer from memory when a source is available to check.",
+  "- Cite the source for every material claim; explicitly mark anything you infer rather than verify. Never fabricate a source, a quote, a number, or a citation.",
+  "- Distinguish what the sources establish from what you are extrapolating, and say which is which.",
+  "- When the sources are insufficient or contradictory, say so plainly rather than papering over the gap."
+].join("\n");
+var generalistRole = [
+  "You are a capable, senior generalist assistant operating inside a workspace with real tools.",
+  "You are direct and grounded: you verify before you assert, and you produce real output rather than describing it.",
+  "",
+  "How you operate:",
+  "- Prefer doing over describing \u2014 when the workspace lets you produce the artifact, produce it.",
+  "- Ground claims in what you can check; when you cannot check something, say so instead of guessing confidently.",
+  "- Verify your output before declaring done, and report what you verified vs. what remains uncertain.",
+  "- Ask the user only when a choice is genuinely theirs and you cannot resolve it from the task, the workspace, or sensible defaults."
+].join("\n");
+var BASELINE_ROLES = {
+  engineer: engineerRole,
+  researcher: researcherRole,
+  generalist: generalistRole
+};
+// src/profile/index.ts
 function renderSkill(skill) {
   const lines = [`### ${skill.name}`, skill.description];
   if (skill.triggers.length > 0) {
@@ -10785,6 +10829,9 @@ function baselineProfile(args) {
     domain: []
   };
 }
+function baselineProfileFromRole(role, args = {}) {
+  return baselineProfile({ role: BASELINE_ROLES[role], ...args });
+}
 function prodProfile(baseline, shipped) {
   return { ...baseline, domain: [...baseline.domain, ...shipped] };
 }
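The two helpers in this hunk compose as role, then environment, then domain. The sketch below re-creates that flow with local stand-ins so that it runs without the package. The `Sketch*` types, the `name` and `body` section fields, the placeholder role strings, the default environment text, and the example sandbox description are all assumptions. Only the two one-line function bodies follow the diff.

```typescript
// Local stand-ins. The real AgentProfile and AgentProfileSection have more fields;
// `name` and `body` are assumed here purely for illustration.
interface SketchSection { name: string; body: string }
interface SketchProfile { role: string; environment: string; domain: SketchSection[] }

const SKETCH_ROLES = {
  engineer: "<engineer role prompt>",
  researcher: "<researcher role prompt>",
  generalist: "<generalist role prompt>",
} as const;
type SketchRoleKey = keyof typeof SKETCH_ROLES;

// Assumed default environment text; the package's stock scaffolding is not shown in this diff.
function sketchBaselineProfile(args: { role: string; environment?: string }): SketchProfile {
  return { role: args.role, environment: args.environment ?? "<stock environment>", domain: [] };
}

// Mirrors the diff: look the role up by key, then spread the overrides in.
function sketchBaselineFromRole(
  role: SketchRoleKey,
  args: { environment?: string } = {},
): SketchProfile {
  return sketchBaselineProfile({ role: SKETCH_ROLES[role], ...args });
}

// Mirrors the diff: a new object whose domain is the baseline's plus the shipped sections.
function sketchProdProfile(baseline: SketchProfile, shipped: SketchSection[]): SketchProfile {
  return { ...baseline, domain: [...baseline.domain, ...shipped] };
}

// Hypothetical product: an engineer foundation with its own sandbox description.
const sketchBaseline = sketchBaselineFromRole("engineer", {
  environment: "Node 20 sandbox, repo at /workspace",
});
// Domain guidance stays product-owned and is layered on top.
const sketchProd = sketchProdProfile(sketchBaseline, [
  { name: "tax-rigor", body: "<product-owned guidance>" },
]);

console.log(sketchProd.role === sketchBaseline.role);                  // true
console.log(sketchBaseline.domain.length, sketchProd.domain.length);   // 0 1
```

Note that `...args` is spread after `role`. The declared parameter type has no `role` field, so typed callers cannot override the lookup, but an untyped JavaScript caller that passes `role` inside `args` would replace the looked-up value.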