hackmyagent 0.9.3 → 0.9.5

package/README.md CHANGED
@@ -233,6 +233,71 @@ Output formats: `text`, `json`, `sarif`, `html`, `asp` (Agent Security Profile).
 
  ---
 
+ ### `hackmyagent secure -b oasb-2`
+
+ Run OASB-2 composite assessment: infrastructure security (OASB-1, 50%) combined with behavioral governance (scan-soul, 50%) for a unified score.
+
+ ```bash
+ hackmyagent secure -b oasb-2 # full composite assessment
+ hackmyagent secure -b oasb-2 --json # JSON output
+ hackmyagent secure -b oasb-2 --fail-below 60 # CI gate
+ ```
+
+ Output shows infrastructure score, governance score, composite score, and conformance level. Requires a SOUL.md (or equivalent governance file) in the scanned directory.
+
+ ---
+
+ ### `hackmyagent scan-soul`
+
+ Scan a SOUL.md (or equivalent governance file) against OASB v2 behavioral governance controls. Checks 8 domains and up to 68 controls depending on agent tier.
+
+ ```bash
+ hackmyagent scan-soul # scan current directory
+ hackmyagent scan-soul ./my-agent # scan specific directory
+ hackmyagent scan-soul --tier MULTI-AGENT # override tier detection
+ hackmyagent scan-soul --json # JSON output for CI
+ hackmyagent scan-soul --verbose # show individual control results
+ hackmyagent scan-soul --deep # LLM semantic analysis (requires ANTHROPIC_API_KEY)
+ hackmyagent scan-soul --fail-below 60 # exit 1 if score below threshold
+ ```
+
+ Auto-detects governance file in priority order: `SOUL.md` > `system-prompt.md` > `CLAUDE.md` > `.cursorrules` > `agent-config.yaml` and others.
+
+ Tier-to-control counts:
+
+ | Tier | Controls | Use case |
+ |------|----------|----------|
+ | `BASIC` | 27 | Chatbots with no tool access |
+ | `TOOL-USING` | 54 | Agents with tool/function calling |
+ | `AGENTIC` | 65 | Autonomous multi-step agents |
+ | `MULTI-AGENT` | 68 | Orchestrators and sub-agent systems |
+
+ Conformance levels:
+
+ | Level | Criteria |
+ |-------|----------|
+ | `none` | A critical control (SOUL-IH-003 or SOUL-HB-001) is missing — grade capped at C |
+ | `essential` | All critical controls pass |
+ | `standard` | All critical + high controls pass, score ≥ 60 |
+ | `hardened` | All controls pass, score ≥ 75 |
+
+ ---
+
+ ### `hackmyagent harden-soul`
+
+ Generate a SOUL.md, or add missing governance sections to an existing one. Existing content is always preserved.
+
+ ```bash
+ hackmyagent harden-soul # add missing sections
+ hackmyagent harden-soul --dry-run # preview without writing
+ hackmyagent harden-soul ./my-agent # target specific directory
+ hackmyagent harden-soul --json # JSON output
+ ```
+
+ Generates template content for each missing OASB v2 governance domain. Run `scan-soul` after to verify coverage improved.
+
+ ---
+
  ### `hackmyagent fix-all`
 
  Run all security plugins in sequence: credential vault, file signing, skill guard. Applies fixes and generates a report.
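The README hunk above describes the OASB-2 composite as an equal (50/50) weighting of the OASB-1 infrastructure score and the scan-soul governance score, gated in CI by `--fail-below`. Assuming both inputs are on a 0-100 scale and the composite is rounded to an integer (the README does not say how rounding is done), the arithmetic reduces to something like the sketch below. `compositeScore` and `ciExitCode` are hypothetical names, not hackmyagent exports.

```javascript
// Hypothetical sketch of the OASB-2 composite described in the README.
// Assumptions: both scores are 0-100 and the composite is rounded to an integer.
function compositeScore(infrastructureScore, governanceScore) {
  const INFRA_WEIGHT = 0.5;      // OASB-1 infrastructure share (50%)
  const GOVERNANCE_WEIGHT = 0.5; // scan-soul governance share (50%)
  return Math.round(
    infrastructureScore * INFRA_WEIGHT + governanceScore * GOVERNANCE_WEIGHT
  );
}

// Mirrors the documented --fail-below behavior: exit 1 when the score is
// strictly below the threshold, otherwise 0.
function ciExitCode(score, failBelow) {
  return score < failBelow ? 1 : 0;
}

const score = compositeScore(72, 45); // (72 + 45) / 2 = 58.5, rounds to 59
console.log(`Composite: ${score}, exit code: ${ciExitCode(score, 60)}`);
```

With equal weights, a strong infrastructure score cannot fully mask a weak governance file: in the example, 72 and 45 still land below a threshold of 60.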
package/dist/cli.js CHANGED
@@ -3591,6 +3591,35 @@ function gradeColor(grade) {
  case 'F': return colors.brightRed;
  }
  }
+ function levelColor(level) {
+ switch (level) {
+ case 'hardened': return colors.green;
+ case 'standard': return colors.green;
+ case 'developing': return colors.yellow;
+ case 'initial': return colors.cyan;
+ case 'not-started': return colors.reset;
+ }
+ }
+ function levelLabel(level) {
+ switch (level) {
+ case 'hardened': return 'Hardened';
+ case 'standard': return 'Standard';
+ case 'developing': return 'Developing';
+ case 'initial': return 'Initial';
+ case 'not-started': return 'Not Started';
+ }
+ }
+ /**
+ * Detect how the CLI was invoked to suggest correct command prefix.
+ */
+ function getCommandPrefix() {
+ const execPath = process.argv[1] || '';
+ if (execPath.includes('npx') || execPath.includes('.npm/_npx') ||
+ execPath.includes('node_modules/.bin')) {
+ return 'npx hackmyagent';
+ }
+ return 'hackmyagent';
+ }
  // Domain percentage bar for text output
  function domainBar(pct) {
  if (pct >= 80)
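`getCommandPrefix()` in the hunk above reads `process.argv[1]` and looks for `npx`, `.npm/_npx`, or `node_modules/.bin` in the path. A hypothetical refactor that takes the path as a parameter makes the heuristic testable without touching `process.argv`. `commandPrefixFor` is an illustrative name, not part of the package; the substring checks are the same ones the diff uses.

```javascript
// Hypothetical, testable variant of getCommandPrefix(): the script path is a
// parameter instead of being read from process.argv[1] directly.
function commandPrefixFor(scriptPath = '') {
  const npxMarkers = ['npx', '.npm/_npx', 'node_modules/.bin'];
  // Same substring heuristic as the diff: any marker means "invoked via npx".
  const viaNpx = npxMarkers.some((marker) => scriptPath.includes(marker));
  return viaNpx ? 'npx hackmyagent' : 'hackmyagent';
}

// Production call site keeps the original behavior.
function getCommandPrefix() {
  return commandPrefixFor(process.argv[1] || '');
}

console.log(commandPrefixFor('/home/me/.npm/_npx/abc123/node_modules/.bin/hackmyagent'));
console.log(commandPrefixFor('/usr/local/bin/hackmyagent'));
```

Because this is a plain substring test, the `.npm/_npx` check is already covered by the `npx` check, and a Windows path such as `node_modules\.bin` (backslash) does not match the `node_modules/.bin` marker.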
@@ -3613,31 +3642,29 @@ Searches for governance files in priority order:
  > .github/copilot-instructions.md > CLAUDE.md > .clinerules
  > instructions.md > constitution.md > agent-config.yaml
 
- Domains checked (OASB v2):
- 7. Trust Hierarchy 8. Capability Boundaries
- 9. Injection Hardening 10. Data Handling
- 11. Hardcoded Behaviors 12. Agentic Safety
- 13. Honesty & Transparency 14. Human Oversight
-
- Grade: A (80-100), B (60-79), C (40-59), D (20-39), F (0-19)
- Critical floor: Missing SOUL-IH-003 or SOUL-HB-001 caps grade at C.
+ Agent profiles filter domains by agent purpose:
+ conversational: Injection, Hardcoded, Honesty
+ code-assistant: + Trust, Data
+ tool-agent: + Capability, Oversight
+ autonomous: + Agentic Safety
+ orchestrator: All 8 domains
 
- Conformance levels:
- none: one or more critical controls missing
- essential: all critical controls pass, score < 60
- standard: all critical controls pass, score >= 60
- hardened: all critical controls pass, score >= 75
+ Maturity levels:
+ Hardened (80+), Standard (60-79), Developing (40-59),
+ Initial (1-39), Not Started (0)
 
  Examples:
  $ hackmyagent scan-soul Scan current directory
  $ hackmyagent scan-soul ./my-agent Scan specific directory
  $ hackmyagent scan-soul --json Machine-readable output
  $ hackmyagent scan-soul --verbose Show all controls
+ $ hackmyagent scan-soul --profile conversational Override profile
  $ hackmyagent scan-soul --deep Enable LLM semantic analysis`)
  .argument('[directory]', 'Directory to scan (defaults to current directory)', '.')
  .option('--json', 'Output as JSON')
  .option('-v, --verbose', 'Show individual control results')
  .option('--tier <tier>', 'Override agent tier detection (BASIC, TOOL-USING, AGENTIC, MULTI-AGENT)')
+ .option('--profile <profile>', 'Override agent profile (conversational, code-assistant, tool-agent, autonomous, orchestrator, custom)')
  .option('--fail-below <score>', 'Exit 1 if score below threshold (0-100)')
  .option('--deep', 'Enable LLM semantic analysis for ambiguous controls (requires claude CLI or ANTHROPIC_API_KEY)')
  .action(async (directory, options) => {
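The new help text defines maturity levels by score band. Assuming integer scores from 0 to 100 and the boundaries exactly as printed, the score-to-level mapping that would feed `levelColor()` and `levelLabel()` could look like the sketch below. `scoreToLevel` is hypothetical: the diff only shows `result.level` being consumed, not how the scanner computes it.

```javascript
// Hypothetical mapping from a 0-100 governance score to the maturity level
// keys consumed by levelColor()/levelLabel() in the diff.
// Assumption: integer scores, boundaries exactly as listed in the help text.
function scoreToLevel(score) {
  if (score >= 80) return 'hardened';   // Hardened (80+)
  if (score >= 60) return 'standard';   // Standard (60-79)
  if (score >= 40) return 'developing'; // Developing (40-59)
  if (score >= 1) return 'initial';     // Initial (1-39)
  return 'not-started';                 // Not Started (0)
}

for (const s of [0, 1, 39, 40, 59, 60, 79, 80, 100]) {
  console.log(s, scoreToLevel(s));
}
```

These maturity bands (80 for Hardened) differ from the README's conformance threshold for `hardened` (score ≥ 75), so the two appear to be separate scales rather than two names for the same one.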
@@ -3647,10 +3674,12 @@ Examples:
  process.stderr.write(`Error: Directory '${targetDir}' does not exist.\n`);
  process.exit(1);
  }
+ const prefix = getCommandPrefix();
  const scanner = new index_1.SoulScanner();
  const result = await scanner.scanSoul(targetDir, {
  verbose: options.verbose,
  tier: options.tier,
+ profile: options.profile,
  deepAnalysis: options.deep,
  });
  // JSON output
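The scan-soul help text describes agent profiles as cumulative domain filters, and this hunk now passes `profile: options.profile` through to `scanSoul`. Reading each "+" as adding to the previous profile (3 + 2 + 2 + 1 = 8 domains), the filter and the resulting skipped-domain list (printed from `result.skippedDomains` in the text output) might be modeled as below. The full domain names are taken from the removed "Domains checked" help text; the `custom` profile is accepted by `--profile` but not described, so this sketch rejects it. `domainsForProfile` is hypothetical.

```javascript
// Hypothetical reconstruction of the profile -> domain filter described in
// the scan-soul help text. Assumption: each "+" adds domains on top of the
// previous profile. The undocumented 'custom' profile is rejected here.
const ALL_DOMAINS = [
  'Trust Hierarchy', 'Capability Boundaries', 'Injection Hardening',
  'Data Handling', 'Hardcoded Behaviors', 'Agentic Safety',
  'Honesty & Transparency', 'Human Oversight',
];

const PROFILE_STEPS = [
  ['conversational', ['Injection Hardening', 'Hardcoded Behaviors', 'Honesty & Transparency']],
  ['code-assistant', ['Trust Hierarchy', 'Data Handling']],
  ['tool-agent', ['Capability Boundaries', 'Human Oversight']],
  ['autonomous', ['Agentic Safety']],
  ['orchestrator', []], // nothing new: all 8 domains are already active
];

function domainsForProfile(profile) {
  const active = new Set();
  for (const [name, added] of PROFILE_STEPS) {
    added.forEach((d) => active.add(d));
    if (name === profile) {
      // Keep canonical domain order; everything not active is skipped.
      return {
        active: ALL_DOMAINS.filter((d) => active.has(d)),
        skipped: ALL_DOMAINS.filter((d) => !active.has(d)),
      };
    }
  }
  throw new Error(`Unknown or unsupported profile: ${profile}`);
}

console.log(domainsForProfile('code-assistant').skipped);
```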
@@ -3675,9 +3704,23 @@ Examples:
  process.stdout.write(`File: ${colors.red}No governance file found${colors.reset}\n`);
  process.stdout.write(` Searched: ${['SOUL.md', 'system-prompt.md', 'CLAUDE.md', '...'].join(', ')}\n`);
  }
- process.stdout.write(`Agent Tier: ${result.agentTier} (auto-detected)\n\n`);
+ const tierLabel = result.tierForced ? `${result.agentTier} (--tier flag)` : `${result.agentTier} (auto-detected)`;
+ const profileLabel = result.profileForced ? `${result.agentProfile} (--profile flag)` : `${result.agentProfile} (auto-detected)`;
+ process.stdout.write(`Agent Tier: ${tierLabel}\n`);
+ process.stdout.write(`Agent Profile: ${profileLabel}\n`);
+ if (result.skippedDomains.length > 0) {
+ process.stdout.write(`Skipped Domains: ${result.skippedDomains.join(', ')}\n`);
+ }
+ process.stdout.write('\n');
  process.stdout.write('Domain Scores:\n');
  for (const domain of result.domains) {
+ if (domain.skippedByProfile) {
+ if (options.verbose) {
+ const label = (domain.domain + ':').padEnd(26);
+ process.stdout.write(` ${label}${colors.reset}-- (skipped by profile)${colors.reset}\n`);
+ }
+ continue;
+ }
  const pctColor = domainBar(domain.percentage);
  const label = (domain.domain + ':').padEnd(26);
  process.stdout.write(` ${label}${pctColor}${domain.passed}/${domain.total} (${domain.percentage}%)${colors.reset}\n`);
@@ -3692,9 +3735,9 @@ Examples:
  }
  }
  process.stdout.write('\n');
- // Score and grade
- const gc = gradeColor(result.grade);
- process.stdout.write(`Governance Score: ${gc}${result.score}/100 (Grade: ${result.grade})${colors.reset}\n`);
+ // Score and level (progress-oriented)
+ const lc = levelColor(result.level);
+ process.stdout.write(`Governance Score: ${lc}${result.score}/100 [${levelLabel(result.level)}]${colors.reset}\n`);
  // Conformance level
  if (result.conformance === 'none') {
  process.stdout.write(`Conformance: ${colors.red}NONE${colors.reset} -- critical control missing (${result.criticalMissing.join(', ')})\n`);
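The README hunk defines conformance levels in terms of critical and high-severity controls plus score thresholds, and this hunk prints `result.conformance` and `result.criticalMissing`. The sketch below follows the README table only. The removed CLI help text in this same diff used different criteria (`standard` and `hardened` required only the critical controls), so it is unclear which version the scanner actually implements. The `{ id, severity, passed }` control shape and `assessConformance` are assumptions, and the `EXAMPLE-*` IDs are hypothetical.

```javascript
// Hypothetical classifier following the README conformance table.
// Assumption: each control is { id, severity, passed } with severity one of
// 'critical' | 'high' | 'medium' | 'low'.
function assessConformance(controls, score) {
  const failing = controls.filter((c) => !c.passed);
  // IDs of failing critical controls, analogous to result.criticalMissing.
  const criticalMissing = failing
    .filter((c) => c.severity === 'critical')
    .map((c) => c.id);
  const highFailing = failing.some((c) => c.severity === 'high');

  let conformance;
  if (criticalMissing.length > 0) {
    conformance = 'none';        // a critical control is missing
  } else if (failing.length === 0 && score >= 75) {
    conformance = 'hardened';    // all controls pass, score >= 75
  } else if (!highFailing && score >= 60) {
    conformance = 'standard';    // all critical + high pass, score >= 60
  } else {
    conformance = 'essential';   // all critical controls pass
  }
  return { conformance, criticalMissing };
}

const example = [
  { id: 'SOUL-IH-003', severity: 'critical', passed: true },
  { id: 'SOUL-HB-001', severity: 'critical', passed: true },
  { id: 'EXAMPLE-HIGH-1', severity: 'high', passed: true },      // hypothetical ID
  { id: 'EXAMPLE-MEDIUM-1', severity: 'medium', passed: false }, // hypothetical ID
];
console.log(assessConformance(example, 70)); // { conformance: 'standard', criticalMissing: [] }
```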
@@ -3706,17 +3749,19 @@ Examples:
  process.stdout.write(`${colors.yellow}Critical Floor: APPLIED${colors.reset} (${result.criticalMissing.join(', ')} missing)\n`);
  }
  // Deep analysis summary
- if (result.deepAnalysisResults && result.deepAnalysisResults.length > 0) {
+ if (result.deepAnalysisAvailable === false) {
+ process.stdout.write(`${colors.yellow}Deep Analysis: unavailable${colors.reset} -- set ANTHROPIC_API_KEY or install the claude CLI\n`);
+ }
+ else if (result.deepAnalysisResults && result.deepAnalysisResults.length > 0) {
  const llmUpgraded = result.deepAnalysisResults.filter((e) => e.llmPassed).length;
- if (llmUpgraded > 0) {
- process.stdout.write(`Deep Analysis: ${llmUpgraded} control${llmUpgraded === 1 ? '' : 's'} upgraded by LLM semantic analysis\n`);
- }
+ process.stdout.write(`Deep Analysis: ${llmUpgraded} control${llmUpgraded === 1 ? '' : 's'} upgraded by LLM semantic analysis\n`);
  }
- // Path forward
+ // Path forward (recovery-oriented, not punitive)
  const missing = result.totalControls - result.totalPassed;
  if (missing > 0) {
- process.stdout.write(`\n${missing} control${missing === 1 ? '' : 's'} missing.`);
- process.stdout.write(` Run '${colors.cyan}hackmyagent harden-soul${colors.reset}' to remediate.\n`);
+ const recoverable = Math.min(100 - result.score, 100);
+ process.stdout.write(`\n Path forward: +${recoverable} recoverable by addressing ${missing} control${missing === 1 ? '' : 's'}`);
+ process.stdout.write(`\n Run '${colors.cyan}${prefix} harden-soul${colors.reset}' to remediate.\n`);
  }
  else {
  process.stdout.write(`\n${colors.green}All ${result.totalControls} governance controls covered.${colors.reset}\n`);
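The new "Path forward" output reports `100 - score` points as recoverable and suggests `harden-soul` with the detected command prefix. A color-free sketch of that message logic, reusing the result field names from the diff (`score`, `totalControls`, `totalPassed`), is below. `pathForward` is hypothetical, and the "recoverable" figure assumes that addressing every missing control would bring the score to 100.

```javascript
// Hypothetical, color-free rendering of the "Path forward" summary.
// Field names mirror the result object used in the diff.
function pathForward({ score, totalControls, totalPassed }, prefix = 'hackmyagent') {
  const missing = totalControls - totalPassed;
  if (missing <= 0) {
    return `All ${totalControls} governance controls covered.`;
  }
  // Same expression as the diff. For scores in 0-100 the Math.min is a
  // no-op and this is simply 100 - score; it only clamps a negative score.
  const recoverable = Math.min(100 - score, 100);
  const plural = missing === 1 ? '' : 's';
  return [
    `Path forward: +${recoverable} recoverable by addressing ${missing} control${plural}`,
    `Run '${prefix} harden-soul' to remediate.`,
  ].join('\n');
}

console.log(pathForward({ score: 58, totalControls: 54, totalPassed: 31 }));
```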
@@ -3742,6 +3787,8 @@ program
 
  Runs scan-soul internally to identify missing controls, then generates
  template content for each missing domain. Existing content is preserved.
+ Supports iterative hardening: if a domain heading exists but controls
+ fail within it, appends targeted remediation for those controls.
 
  Modes:
  Default: Append missing sections to SOUL.md (or create it)
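The updated harden-soul help text says that when a domain heading already exists but controls inside it still fail, targeted remediation is appended instead of a whole new section. One way to plan that, assuming `## <Domain>` headings and failing controls that each carry a remediation snippet, is sketched below. `planHardening`, the `(additional controls)` heading suffix, the control shape, and the `EXAMPLE-*` IDs are all hypothetical; only the append-only behavior and the `sectionsAdded` name come from the diff.

```javascript
// Hypothetical sketch of iterative hardening. Assumptions: domains appear as
// "## <Domain>" headings, and each failing control is { id, domain, template }.
// Existing content is never modified, only appended to.
function planHardening(soulMarkdown, failingControls) {
  const headings = new Set(
    soulMarkdown
      .split('\n')
      .filter((line) => line.startsWith('## '))
      .map((line) => line.slice(3).trim())
  );

  // Group failing controls by domain, preserving first-seen order.
  const byDomain = new Map();
  for (const control of failingControls) {
    if (!byDomain.has(control.domain)) byDomain.set(control.domain, []);
    byDomain.get(control.domain).push(control);
  }

  const appended = [];
  const sectionsAdded = [];
  for (const [domain, controls] of byDomain) {
    const body = controls.map((c) => `- ${c.template} (${c.id})`).join('\n');
    if (headings.has(domain)) {
      // Heading exists but controls fail: append targeted remediation only.
      appended.push(`\n## ${domain} (additional controls)\n\n${body}\n`);
    } else {
      // Domain missing entirely: append a new section.
      appended.push(`\n## ${domain}\n\n${body}\n`);
      sectionsAdded.push(domain);
    }
  }
  return { content: soulMarkdown + appended.join(''), sectionsAdded };
}

const existing = '# SOUL\n\n## Human Oversight\n\nEscalate destructive actions.\n';
const plan = planHardening(existing, [
  { id: 'EXAMPLE-001', domain: 'Human Oversight', template: 'Require confirmation before irreversible actions.' },
  { id: 'EXAMPLE-002', domain: 'Data Handling', template: 'Never echo secrets back to the user.' },
]);
console.log(plan.sectionsAdded);
console.log(plan.content);
```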
@@ -3754,6 +3801,7 @@ Examples:
  $ hackmyagent harden-soul --json Machine-readable output`)
  .argument('[directory]', 'Directory to harden (defaults to current directory)', '.')
  .option('--dry-run', 'Preview changes without modifying files')
+ .option('--profile <profile>', 'Override agent profile (conversational, code-assistant, tool-agent, autonomous, orchestrator, custom)')
  .option('--json', 'Output as JSON')
  .action(async (directory, options) => {
  try {
@@ -3762,8 +3810,9 @@ Examples:
  process.stderr.write(`Error: Directory '${targetDir}' does not exist.\n`);
  process.exit(1);
  }
+ const prefix = getCommandPrefix();
  const scanner = new index_1.SoulScanner();
- const result = await scanner.hardenSoul(targetDir, { dryRun: options.dryRun });
+ const result = await scanner.hardenSoul(targetDir, { dryRun: options.dryRun, profile: options.profile });
  // JSON output
  if (options.json) {
  // Exclude full content from JSON to keep it concise
@@ -3780,7 +3829,7 @@ Examples:
  // Text output
  if (result.sectionsAdded.length === 0) {
  process.stdout.write(`\n${colors.green}All governance domains already have sections in ${result.file}.${colors.reset}\n`);
- process.stdout.write(`Run 'hackmyagent scan-soul --verbose' to see individual control coverage.\n\n`);
+ process.stdout.write(`Run '${prefix} scan-soul --verbose' to see individual control coverage.\n\n`);
  return;
  }
  if (result.dryRun) {
@@ -3815,7 +3864,7 @@ Examples:
  process.stdout.write(` ${colors.green}+${colors.reset} ${section}\n`);
  }
  process.stdout.write(`Controls covered: +${result.controlsAdded}\n\n`);
- process.stdout.write(`Run '${colors.cyan}hackmyagent scan-soul${colors.reset}' to verify coverage.\n\n`);
+ process.stdout.write(`Run '${colors.cyan}${prefix} scan-soul${colors.reset}' to verify coverage.\n\n`);
  }
  }
  catch (error) {