@aiagenta2z/agtm 1.0.8 → 1.1.0

package/README.md CHANGED
@@ -1,18 +1,18 @@
1
1
 
2
2
  ### agtm: CLI Tool for AI Agent Management, Skills, Agent Registry, Benchmarks and Hints in AI Agent Marketplace
3
3
 
4
- [GitHub](https://github.com/aiagenta2z/agtm)|[AI Agent Marketplace CLI Doc](https://www.deepnlp.org/doc/ai_agent_marketplace)|[DeepNLP AI Agent Marketplace](https://www.deepnlp.org/store/ai-agent) | [OneKey Agent Router](https://www.deepnlp.org/agent/onekey-mcp-router) | [Agent MCP OneKey Router Ranking](https://www.deepnlp.org/agent/rankings) | [NodeJS agtm](https://www.npmjs.com/package/@aiagenta2z/agtm)
4
+ [GitHub](https://github.com/aiagenta2z/agtm)|[AI Agent Marketplace CLI Doc](https://www.deepnlp.org/doc/ai_agent_marketplace)|[DeepNLP AI Agent Marketplace](https://www.deepnlp.org/store/ai-agent) | [OneKey Gateway](https://deepnlp.org/doc/onekey_gateway) | [Agent MCP OneKey Router Ranking](https://www.deepnlp.org/agent/rankings) | [NodeJS agtm](https://www.npmjs.com/package/@aiagenta2z/agtm)
5
5
 
6
6
  `agtm` (AI Agent Management CLI) unifies skill management, agent registration, marketplace search, and provider CLI execution. Install skills from GitHub, log and rate skill runs, upload agent metadata to registries, query the public marketplace, and run agent toolchains with fuzzy hints.
7
7
 
8
- Features
8
+ ## Features
9
9
 
10
10
  *`agtm skills`*: Manage Skills, Add Skills, List Skills, Log Skills Performance, Skills performance Evaluator, compare to realworld benchmarks
11
11
  *`agtm upload`*: AI Agent Registry, register local agent meta information of json or yaml format(agent.json/agent.yaml) or sync your github source meta including README.md
12
12
  *`agtm search`*: Search the open source AI Agent Marketplace, including github community, huggingface community, product hunt community, deepnlp ai agent marketplace index, etc
13
13
  *`agtm run`*: Run agent clis, don't need to remember, with the powerful hints and completion ability, just type a few characters and "--hint" will help you complete the command line.
14
14
 
15
- Furthermore, `agtm` provides memory to track skill outputs and enables performance rating against industry job level benchmarks. This allows you to score each skill execution and assign a professional tier to your AI Agent's capabilities—for example, evaluating its performance as equivalent to that of an L3 or L5 software engineer, marketing prefessional, etc.
15
+ Furthermore, `agtm` provides memory to track skill outputs and enables performance rating against industry job level benchmarks. This allows you to score each skill execution and assign a professional tier to your AI Agent's capabilities—for example, evaluating its performance as equivalent to that of an L3 or L5 software engineer, marketing professional, etc.
16
16
 
17
17
  ```shell
18
18
  skill_id run_times score level
@@ -29,6 +29,12 @@ code_success_skills 5 0.9 L3(100%)
29
29
  npm install -g @aiagenta2z/agtm
30
30
  ```
31
31
 
32
+ Set up hints and the skills benchmark
33
+ ```shell
34
+ agtm setup --levels ## Needed before `agtm rate`, to sync the benchmarks json to local folder
35
+ agtm setup --hint ## Needed before `agtm run`
36
+ ```
37
+
32
38
  Agtm CLI Options
33
39
 
34
40
  | CLI | Command and Options | Document |
@@ -171,8 +177,16 @@ The `run` command executes agent workflows with interactive hints and fuzzy CLI
171
177
 
172
178
  Let's say you want to run an agent command of Playwright to go to a URL and fetch a webpage. You don't need to remember the full command—type `play`, pick the provider, then pick the CLI action.
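The prefix lookup behind this flow can be sketched roughly as follows. This is a minimal illustration modeled on the `createTrie`, `insertTrie`, and `searchTrie` helpers that appear later in this diff. The `buildIdTrieSketch` function and its indexing rule (full id plus the part after `/`) are assumptions for illustration only, since the package's real `buildIdTrie` is not shown here.

```javascript
// Minimal prefix trie, loosely following the helpers shown in this diff.
function createTrie() {
  return { children: new Map(), terminalValues: new Set() };
}

function insertTrie(trie, key, value) {
  let node = trie;
  for (const ch of key.toLowerCase()) {
    if (!node.children.has(ch)) node.children.set(ch, createTrie());
    node = node.children.get(ch);
  }
  // Values are stored only on the node where the key ends.
  node.terminalValues.add(value);
}

function searchTrie(trie, prefix, limit) {
  let node = trie;
  for (const ch of prefix.toLowerCase()) {
    node = node.children.get(ch);
    if (!node) return [];
  }
  const out = [];
  // Depth-first walk below the prefix node, in sorted order, up to `limit` results.
  const dfs = (current) => {
    if (out.length >= limit) return;
    for (const value of [...current.terminalValues].sort()) {
      if (out.length >= limit) return;
      if (!out.includes(value)) out.push(value);
    }
    for (const key of [...current.children.keys()].sort()) {
      if (out.length >= limit) return;
      dfs(current.children.get(key));
    }
  };
  dfs(node);
  return out;
}

// Hypothetical indexing rule (an assumption, not the package's buildIdTrie):
// each id is reachable by its full form and by the segment after the last "/".
function buildIdTrieSketch(ids) {
  const trie = createTrie();
  for (const id of ids) {
    insertTrie(trie, id, id);
    insertTrie(trie, id.split('/').pop(), id);
  }
  return trie;
}

const trie = buildIdTrieSketch(['microsoft/playwright-cli', 'googleworkspace/cli']);
console.log(searchTrie(trie, 'play', 5));
```

Collecting values by walking the subtree is what lets a short prefix such as `play` match a longer key. A lookup that reads only the values stored on the prefix node itself would return nothing unless the prefix were a complete key.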
173
179
 
180
+
181
+
174
182
  ### Usage
175
183
 
184
+ Remember to set up hints before running the agent CLI:
185
+
186
+ ```shell
187
+ agtm setup --hint
188
+ ```
189
+
176
190
  ```
177
191
  agtm run <provider_unique_id> <agent_cli>
178
192
  ```
@@ -180,9 +194,7 @@ agtm run <provider_unique_id> <agent_cli>
180
194
  ### Example
181
195
 
182
196
  ```shell
183
- rockingdingo@rockingdingodeMacBook-Pro skills_cli % agtm run play
184
- DEBUG: Entering Human Mode | idArg play | commandArgs | options [object Object] | hasHints true | hints [object Object]
185
-
197
+ agtm run play
186
198
  Skill ID suggestions:
187
199
  1. microsoft/playwright-cli
188
200
  2. googleworkspace/cli
@@ -246,6 +258,28 @@ agtm upload --config ./agent.json --endpoint https://www.deepnlp.org/api/ai_agen
246
258
  agtm upload --config ./agent.json --endpoint https://www.aiagenta2z.com/api/ai_agent_marketplace/registry --schema ./schema.json
247
259
  ```
248
260
 
261
+
262
+
263
+ ### Skills Agtm-Cli
264
+
265
+ We provide a skills repo that various agents can use to evaluate skills and run agent CLI hints.
266
+ The skills can be found in the `./skills/` folder.
267
+
268
+ | skill | description |
269
+ | ---- | ---- |
270
+ | agent-cli-hint-completion | This skill uses `agtm run --mode agent` to suggest agent CLI usage. |
271
+ | agent-skills-evaluator | This skill uses `agtm skills log` and `agtm skills rate` to track the performance of other skills with an LLM-based evaluator and match it to professional job-level benchmarks, such as a Google L3 software engineer or an Apple M3 marketing specialist. |
272
+
273
+ ```shell
274
+ npx agtm skills add aiagenta2z/agtm ## install all the skill evaluation and skill cli-hints
275
+ npx agtm skills add aiagenta2z/agtm -s agent-skills-evaluator
276
+ ```
277
+
278
+ ```shell
279
+ npx skills add aiagenta2z/agtm ## install all the skill evaluation and skill cli-hints
280
+ npx skills add aiagenta2z/agtm -s agent-skills-evaluator
281
+ ```
282
+
249
283
  ### Contributing
250
284
 
251
285
  #### Agent CLI List
package/dist/agtm-cli.js CHANGED
@@ -9,6 +9,8 @@ import { execFileSync, spawn } from 'node:child_process';
9
9
  import { createInterface } from 'node:readline/promises';
10
10
  import { fileURLToPath } from 'node:url';
11
11
  import { randomUUID } from 'node:crypto';
12
+ //production setup
13
+ const LOG_ENABLE = false;
12
14
  // --- Configuration ---
13
15
  const BASE_URL = 'https://www.deepnlp.org/api/ai_agent_marketplace';
14
16
  const REGISTRY_ENDPOINT = `${BASE_URL}/registry`;
@@ -19,7 +21,6 @@ const MOCK_RETURN_URL = "https://www.deepnlp.org/store/ai-agent/ai-agent/pub-AI-
19
21
  const CLI_DIR = path.dirname(fileURLToPath(import.meta.url));
20
22
  const MODE_AGENT = 'agent';
21
23
  const MODE_HUMAN = 'human';
22
- const LOG_ENABLE = true;
23
24
  const AGTM_LOCAL_DIR = path.join(process.cwd(), '.agtm');
24
25
  const AGTM_GLOBAL_DIR = path.join(os.homedir(), '.agtm');
25
26
  const SKILL_LOG_DIR_LOCAL = path.join(AGTM_LOCAL_DIR, 'skills', 'log');
@@ -636,7 +637,7 @@ function loadLevelDescriptions(levelFile) {
636
637
  return null;
637
638
  }
638
639
  }
639
- const DEFAULT_EVAL_PROMPT = 'You are an evaluator of skill performance. Score each example from 0.0 to 1.0 and assign a level based on benchmarks. Return JSON only.';
640
+ const DEFAULT_EVAL_SYSTEM_PROMPT = 'System Prompt: You are an evaluator of skill performance. Score each example from 0.0 to 1.0 and assign a level based on benchmarks. Return JSON only. Please output json in format of {"skill_id": <skill_id>, "results": [{"log_id": "<log_id_1>", "score": 1.0, "level": "L3", **extra},{"log_id": "<log_id_2>", "score": 1.0, "level": "L3", **extra}]}';
640
641
  const BENCHMARK_TOP_K = 3;
641
642
  function benchmarkKey(obj) {
642
643
  if (!obj || typeof obj !== 'object')
@@ -683,17 +684,79 @@ async function handleSkillsRatePrepare(options) {
683
684
  console.error(`\n❌ Error: No logs found for skill '${skillId}' in ${logDir}.`);
684
685
  process.exit(1);
685
686
  }
687
+ var userInputPrompt = `User prompt: ${(options.prompt || "")}`;
688
+ var mergeInstruction = DEFAULT_EVAL_SYSTEM_PROMPT + "\n" + userInputPrompt;
686
689
  const levelsData = loadLevelDescriptions(options.benchmark);
687
690
  const benchmarks = normalizeBenchmarks(skillId, levelsData).slice(0, BENCHMARK_TOP_K);
688
691
  const payload = {
689
692
  skill_id: skillId,
690
693
  benchmarks,
691
694
  logs: logs.map(({ log_id, input, output }) => ({ log_id, input, output })),
692
- instructions: options.prompt || DEFAULT_EVAL_PROMPT
695
+ instructions: mergeInstruction
693
696
  };
694
697
  console.log(JSON.stringify(payload, null, 2));
695
698
  }
696
699
  async function handleSkillsRateApply(options) {
700
+ const skillId = options.skill_id;
701
+ if (!skillId) {
702
+ console.error('\n❌ Error: --skill_id is required.');
703
+ process.exit(1);
704
+ }
705
+ if (!options.result) {
706
+ console.error('\n❌ Error: --result <json or base64> is required.');
707
+ process.exit(1);
708
+ }
709
+ let parsed;
710
+ try {
711
+ let raw = options.result;
712
+ // Attempt base64 decode if JSON parsing fails
713
+ try {
714
+ parsed = JSON.parse(raw);
715
+ }
716
+ catch {
717
+ // try decode base64
718
+ raw = Buffer.from(raw, 'base64').toString('utf8');
719
+ parsed = JSON.parse(raw);
720
+ }
721
+ }
722
+ catch (e) {
723
+ console.error(`\n❌ Error: invalid JSON/base64 for --result: ${e.message}`);
724
+ process.exit(1);
725
+ }
726
+ const results = Array.isArray(parsed?.results) ? parsed.results : [];
727
+ if (results.length === 0) {
728
+ console.error('\n❌ Error: --result must contain a non-empty "results" array.');
729
+ process.exit(1);
730
+ }
731
+ const logDir = getLogDir(options.logDir);
732
+ const logs = loadLogs(logDir).filter(l => l.skill_id === skillId);
733
+ const byId = new Map(logs.map(l => [l.log_id, l]));
734
+ let updated = 0;
735
+ const missing = [];
736
+ for (const item of results) {
737
+ const id = item?.log_id;
738
+ if (!id || !byId.has(id)) {
739
+ missing.push(String(id || 'unknown'));
740
+ continue;
741
+ }
742
+ const entry = byId.get(id);
743
+ // Support both 'score' and 'rating'
744
+ if (item.rating !== undefined)
745
+ entry.rating = Number(item.rating);
746
+ if (item.score !== undefined)
747
+ entry.rating = Number(item.score);
748
+ if (item.level !== undefined)
749
+ entry.level = String(item.level);
750
+ // Optional rationale
751
+ if (item.rationale !== undefined)
752
+ entry.rationale = String(item.rationale);
753
+ const target = path.join(logDir, `${entry.log_id}.json`);
754
+ fs.writeFileSync(target, JSON.stringify(entry, null, 2), 'utf8');
755
+ updated += 1;
756
+ }
757
+ console.log(JSON.stringify({ status: 'success', updated, missing }, null, 2));
758
+ }
759
+ async function handleSkillsRateApplyBak(options) {
697
760
  const skillId = options.skill_id;
698
761
  if (!skillId) {
699
762
  console.error('\n❌ Error: --skill_id is required.');
@@ -917,7 +980,7 @@ function fuzzyScore(query, candidate) {
917
980
  return 0.7 * editScore + 0.3 * tokenScore;
918
981
  }
919
982
  function createTrie() {
920
- return { children: new Map(), values: new Set() };
983
+ return { children: new Map(), terminalValues: new Set() };
921
984
  }
922
985
  function insertTrie(trie, key, value) {
923
986
  let node = trie;
@@ -932,8 +995,8 @@ function insertTrie(trie, key, value) {
932
995
  node.children.set(ch, created);
933
996
  node = created;
934
997
  }
935
- node.values.add(value);
936
998
  }
999
+ node.terminalValues.add(value);
937
1000
  }
938
1001
  function searchTrie(trie, prefix, limit) {
939
1002
  let node = trie;
@@ -944,9 +1007,57 @@ function searchTrie(trie, prefix, limit) {
944
1007
  return [];
945
1008
  }
946
1009
  }
947
- const values = Array.from(node.values);
948
- values.sort((a, b) => a.localeCompare(b));
949
- return values.slice(0, limit);
1010
+ const out = [];
1011
+ const seen = new Set();
1012
+ const dfs = (current) => {
1013
+ if (out.length >= limit)
1014
+ return;
1015
+ const terminal = Array.from(current.terminalValues).sort((a, b) => a.localeCompare(b));
1016
+ for (const value of terminal) {
1017
+ if (out.length >= limit)
1018
+ return;
1019
+ if (seen.has(value))
1020
+ continue;
1021
+ seen.add(value);
1022
+ out.push(value);
1023
+ }
1024
+ const keys = Array.from(current.children.keys()).sort((a, b) => a.localeCompare(b));
1025
+ for (const key of keys) {
1026
+ if (out.length >= limit)
1027
+ return;
1028
+ dfs(current.children.get(key));
1029
+ }
1030
+ };
1031
+ dfs(node);
1032
+ return out;
1033
+ }
1034
+ function trieToPersisted(node) {
1035
+ const children = {};
1036
+ for (const [key, child] of node.children.entries()) {
1037
+ children[key] = trieToPersisted(child);
1038
+ }
1039
+ return {
1040
+ children: Object.keys(children).length ? children : undefined,
1041
+ terminalValues: node.terminalValues.size ? Array.from(node.terminalValues).sort((a, b) => a.localeCompare(b)) : undefined
1042
+ };
1043
+ }
1044
+ function persistedToTrie(node) {
1045
+ const trie = createTrie();
1046
+ if (Array.isArray(node.terminalValues)) {
1047
+ for (const value of node.terminalValues) {
1048
+ if (typeof value === 'string' && value.trim()) {
1049
+ trie.terminalValues.add(value);
1050
+ }
1051
+ }
1052
+ }
1053
+ if (node.children && typeof node.children === 'object') {
1054
+ for (const [key, child] of Object.entries(node.children)) {
1055
+ if (!child || typeof child !== 'object')
1056
+ continue;
1057
+ trie.children.set(key, persistedToTrie(child));
1058
+ }
1059
+ }
1060
+ return trie;
950
1061
  }
951
1062
  function mergeHints(target, source) {
952
1063
  for (const [id, entry] of Object.entries(source)) {
@@ -997,6 +1108,26 @@ function writeHintsFile(filePath, hints) {
997
1108
  ensureDir(path.dirname(filePath));
998
1109
  fs.writeFileSync(filePath, JSON.stringify(hints, null, 2));
999
1110
  }
1111
+ function writeHintsTrieFile(filePath, trie) {
1112
+ ensureDir(path.dirname(filePath));
1113
+ fs.writeFileSync(filePath, JSON.stringify(trieToPersisted(trie), null, 2), 'utf8');
1114
+ }
1115
+ function loadHintsTrieFile(filePath) {
1116
+ if (!fs.existsSync(filePath)) {
1117
+ return null;
1118
+ }
1119
+ try {
1120
+ const raw = fs.readFileSync(filePath, 'utf8');
1121
+ const parsed = JSON.parse(raw);
1122
+ if (!parsed || typeof parsed !== 'object') {
1123
+ return null;
1124
+ }
1125
+ return persistedToTrie(parsed);
1126
+ }
1127
+ catch {
1128
+ return null;
1129
+ }
1130
+ }
1000
1131
  function findBundledHintsDir() {
1001
1132
  const candidates = [
1002
1133
  path.resolve(CLI_DIR, 'data', 'config', 'hints'),
@@ -1045,6 +1176,14 @@ async function loadBundledHints() {
1045
1176
  return merged;
1046
1177
  }
1047
1178
  function getHintsPath(useGlobal) {
1179
+ const baseDir = useGlobal ? AGTM_GLOBAL_DIR : AGTM_LOCAL_DIR;
1180
+ return path.join(baseDir, 'hints', 'hints.json');
1181
+ }
1182
+ function getHintsTriePath(useGlobal) {
1183
+ const baseDir = useGlobal ? AGTM_GLOBAL_DIR : AGTM_LOCAL_DIR;
1184
+ return path.join(baseDir, 'hints', 'hints_trie.json');
1185
+ }
1186
+ function getOldHintsPath(useGlobal) {
1048
1187
  if (useGlobal) {
1049
1188
  return path.join(AGTM_GLOBAL_DIR, 'hints.json');
1050
1189
  }
@@ -1063,6 +1202,8 @@ function loadCombinedHints(useGlobal) {
1063
1202
  const localHints = loadHintsFile(getHintsPath(false));
1064
1203
  mergeHints(combined, globalHints);
1065
1204
  mergeHints(combined, localHints);
1205
+ mergeHints(combined, loadHintsFile(getOldHintsPath(true)));
1206
+ mergeHints(combined, loadHintsFile(getOldHintsPath(false)));
1066
1207
  mergeHints(combined, loadHintsFile(getLegacyHintsPath(true)));
1067
1208
  mergeHints(combined, loadHintsFile(getLegacyHintsPath(false)));
1068
1209
  if (useGlobal) {
@@ -1101,6 +1242,49 @@ function filterCliHints(hints, query, limit) {
1101
1242
  const sorted = [...hints].sort((a, b) => a.cli.localeCompare(b.cli));
1102
1243
  return sorted.slice(0, limit);
1103
1244
  }
1245
+ function highlightMatches(text, query) {
1246
+ const trimmed = query.trim();
1247
+ if (!trimmed)
1248
+ return text;
1249
+ const tokens = trimmed
1250
+ .toLowerCase()
1251
+ .split(/[^a-z0-9]+/g)
1252
+ .map((t) => t.trim())
1253
+ .filter(Boolean);
1254
+ if (tokens.length === 0)
1255
+ return text;
1256
+ const lower = text.toLowerCase();
1257
+ const ranges = [];
1258
+ for (const token of tokens) {
1259
+ let idx = lower.indexOf(token);
1260
+ while (idx !== -1) {
1261
+ ranges.push([idx, idx + token.length]);
1262
+ idx = lower.indexOf(token, idx + 1);
1263
+ }
1264
+ }
1265
+ if (ranges.length === 0)
1266
+ return text;
1267
+ ranges.sort((a, b) => a[0] - b[0] || a[1] - b[1]);
1268
+ const merged = [];
1269
+ for (const [start, end] of ranges) {
1270
+ const last = merged[merged.length - 1];
1271
+ if (!last || start > last[1]) {
1272
+ merged.push([start, end]);
1273
+ }
1274
+ else {
1275
+ last[1] = Math.max(last[1], end);
1276
+ }
1277
+ }
1278
+ let out = '';
1279
+ let cursor = 0;
1280
+ for (const [start, end] of merged) {
1281
+ out += text.slice(cursor, start);
1282
+ out += green(text.slice(start, end));
1283
+ cursor = end;
1284
+ }
1285
+ out += text.slice(cursor);
1286
+ return out;
1287
+ }
1104
1288
  async function promptSelection(prompt, options) {
1105
1289
  if (!process.stdin.isTTY) {
1106
1290
  return options.length > 0 ? options[0] : null;
@@ -1124,7 +1308,7 @@ async function promptSelection(prompt, options) {
1124
1308
  rl.close();
1125
1309
  }
1126
1310
  }
1127
- async function promptCommandLine(promptText) {
1311
+ async function promptCommandLineBase(promptText) {
1128
1312
  if (!process.stdin.isTTY) {
1129
1313
  return null;
1130
1314
  }
@@ -1138,7 +1322,33 @@ async function promptCommandLine(promptText) {
1138
1322
  rl.close();
1139
1323
  }
1140
1324
  }
1141
- async function selectSkillId(hints, input, limit = 5) {
1325
+ import readline from 'readline';
1326
+ async function promptCommandLine(promptText, defaultValue) {
1327
+ if (!process.stdin.isTTY)
1328
+ return null;
1329
+ const rl = readline.createInterface({
1330
+ input: process.stdin,
1331
+ output: process.stdout,
1332
+ });
1333
+ try {
1334
+ return await new Promise((resolve) => {
1335
+ rl.question(promptText, (answer) => {
1336
+ rl.close();
1337
+ const trimmed = answer.trim();
1338
+ resolve(trimmed || defaultValue || null);
1339
+ });
1340
+ // Pre-fill default value and move cursor to end
1341
+ if (defaultValue) {
1342
+ rl.write(defaultValue);
1343
+ }
1344
+ });
1345
+ }
1346
+ finally {
1347
+ // just in case
1348
+ rl.close();
1349
+ }
1350
+ }
1351
+ async function selectSkillId(hints, input, limit = 5, trie) {
1142
1352
  const ids = Object.keys(hints);
1143
1353
  if (ids.length === 0) {
1144
1354
  return null;
@@ -1146,9 +1356,9 @@ async function selectSkillId(hints, input, limit = 5) {
1146
1356
  if (input && hints[input]) {
1147
1357
  return input;
1148
1358
  }
1149
- const trie = buildIdTrie(hints);
1359
+ const activeTrie = trie || buildIdTrie(hints);
1150
1360
  const prefix = input || '';
1151
- let suggestions = searchTrie(trie, prefix, limit);
1361
+ let suggestions = searchTrie(activeTrie, prefix, limit);
1152
1362
  if (suggestions.length === 0 && prefix) {
1153
1363
  const scored = ids
1154
1364
  .map((id) => ({ id, score: fuzzyScore(prefix, id) }))
@@ -1159,11 +1369,21 @@ async function selectSkillId(hints, input, limit = 5) {
1159
1369
  if (suggestions.length === 0) {
1160
1370
  return null;
1161
1371
  }
1162
- console.log('\nSkill ID suggestions:');
1372
+ let printedLines = 0;
1373
+ const trackedLog = (message = '') => {
1374
+ console.log(message);
1375
+ printedLines += countConsoleLogLines(message);
1376
+ };
1377
+ trackedLog('');
1378
+ trackedLog('Skill ID suggestions:');
1163
1379
  suggestions.forEach((value, index) => {
1164
- console.log(` ${index + 1}. ${value}`);
1380
+ trackedLog(` ${index + 1}. ${highlightMatches(value, prefix)}`);
1165
1381
  });
1166
- const selected = await promptSelection('\nSelect skill id (number or id): ', suggestions);
1382
+ const selected = await promptSelection('Select skill id (number or id): ', suggestions);
1383
+ printedLines += 1; // prompt line
1384
+ if (process.stdin.isTTY && process.stdout.isTTY) {
1385
+ clearLastLines(printedLines + 1); // +1 for the post-input newline line
1386
+ }
1167
1387
  console.log(`Selected Skill/Cli is ${selected}`);
1168
1388
  if (!selected) {
1169
1389
  return null;
@@ -1186,13 +1406,23 @@ async function selectCliHint(hints, query, limit = 5) {
1186
1406
  if (suggestions.length === 0) {
1187
1407
  return null;
1188
1408
  }
1189
- console.log('\nCommand hints:');
1409
+ let printedLines = 0;
1410
+ const trackedLog = (message = '') => {
1411
+ console.log(message);
1412
+ printedLines += countConsoleLogLines(message);
1413
+ };
1414
+ trackedLog('');
1415
+ trackedLog('Command hints:');
1190
1416
  suggestions.forEach((item, index) => {
1191
1417
  const hintText = item.hint ? ` # ${item.hint}` : '';
1192
- console.log(` ${index + 1}. ${item.cli}${hintText}`);
1418
+ trackedLog(` ${index + 1}. ${highlightMatches(item.cli, query || '')}${hintText}`);
1193
1419
  });
1194
1420
  const options = suggestions.map((item) => item.cli);
1195
- const selected = await promptSelection('\nSelect command (number or input custom): ', options);
1421
+ const selected = await promptSelection('Select command (number or input custom): ', options);
1422
+ printedLines += 1; // prompt line
1423
+ if (process.stdin.isTTY && process.stdout.isTTY) {
1424
+ clearLastLines(printedLines + 1); // +1 for the post-input newline line
1425
+ }
1196
1426
  if (!selected) {
1197
1427
  return null;
1198
1428
  }
@@ -1212,16 +1442,22 @@ async function handleSetup(options) {
1212
1442
  if (options.hint) {
1213
1443
  const bundled = await loadBundledHints();
1214
1444
  const targetPath = getHintsPath(useGlobal);
1445
+ const targetTriePath = getHintsTriePath(useGlobal);
1215
1446
  const legacyPath = getLegacyHintsPath(useGlobal);
1216
1447
  const existing = loadHintsFile(targetPath);
1448
+ const existingOld = loadHintsFile(getOldHintsPath(useGlobal));
1217
1449
  const merged = {};
1218
1450
  mergeHints(merged, bundled);
1451
+ mergeHints(merged, existingOld);
1219
1452
  mergeHints(merged, existing);
1220
1453
  writeHintsFile(targetPath, merged);
1454
+ const trieSource = loadCombinedHints(useGlobal);
1455
+ writeHintsTrieFile(targetTriePath, buildIdTrie(trieSource));
1221
1456
  if (fs.existsSync(path.dirname(legacyPath))) {
1222
1457
  writeHintsFile(legacyPath, merged);
1223
1458
  }
1224
1459
  console.log(`\n✅ Hints cache updated at ${targetPath}`);
1460
+ console.log(`✅ Hints trie updated at ${targetTriePath}`);
1225
1461
  }
1226
1462
  if (options['levels']) {
1227
1463
  const bundledLevelsDir = findBundledLevelsDir();
@@ -1235,6 +1471,26 @@ async function handleSetup(options) {
1235
1471
  console.log(`\n✅ Levels copied to ${targetDir}`);
1236
1472
  }
1237
1473
  }
1474
+ function clearScreen() {
1475
+ // process.stdout.write('\x1Bc');
1476
+ process.stdout.write('\x1b[0f');
1477
+ }
1478
+ function clearLastLines(n) {
1479
+ if (!process.stdout.isTTY)
1480
+ return;
1481
+ for (let i = 0; i < n; i++) {
1482
+ process.stdout.write('\x1b[2K'); // clear current line
1483
+ if (i < n - 1) {
1484
+ process.stdout.write('\x1b[1A'); // move cursor up
1485
+ }
1486
+ }
1487
+ process.stdout.write('\x1b[0G'); // move to start of line
1488
+ }
1489
+ function countConsoleLogLines(message) {
1490
+ if (message === '')
1491
+ return 1;
1492
+ return message.split('\n').length;
1493
+ }
1238
1494
  async function handleRun(idArg, commandArgs = [], options = {}) {
1239
1495
  const isAgent = (options.mode || 'human').toLowerCase() === MODE_AGENT;
1240
1496
  // first load local hints
@@ -1262,6 +1518,7 @@ async function handleRun(idArg, commandArgs = [], options = {}) {
1262
1518
  runtimeHints = await loadBundledHints();
1263
1519
  }
1264
1520
  const activeHints = hasHints ? hints : (runtimeHints || {});
1521
+ const cachedIdTrie = hasHints ? loadHintsTrieFile(getHintsTriePath(false)) : null;
1265
1522
  const ids = Object.keys(activeHints);
1266
1523
  if (ids.length === 0) {
1267
1524
  console.error('\n❌ Error: No hints available.');
@@ -1269,7 +1526,7 @@ async function handleRun(idArg, commandArgs = [], options = {}) {
1269
1526
  }
1270
1527
  if (!idArg || !activeHints[idArg]) {
1271
1528
  const query = idArg || '';
1272
- const trie = buildIdTrie(activeHints);
1529
+ const trie = cachedIdTrie || buildIdTrie(activeHints);
1273
1530
  let suggestions = searchTrie(trie, query, 2);
1274
1531
  if (suggestions.length === 0 && query) {
1275
1532
  const scored = ids
@@ -1280,7 +1537,7 @@ async function handleRun(idArg, commandArgs = [], options = {}) {
1280
1537
  }
1281
1538
  console.log('\nSkill ID suggestions:');
1282
1539
  suggestions.forEach((value, index) => {
1283
- console.log(` ${index + 1}. ${value}`);
1540
+ console.log(` ${index + 1}. ${highlightMatches(value, query)}`);
1284
1541
  const entry = activeHints[value];
1285
1542
  if (entry?.hints?.length) {
1286
1543
  const preview = entry.hints.slice(0, 2).map((h) => `${h.cli}${h.hint ? ` # ${h.hint}` : ''}`);
@@ -1343,24 +1600,27 @@ async function handleRun(idArg, commandArgs = [], options = {}) {
1343
1600
  if (LOG_ENABLE) {
1344
1601
  console.log(`DEBUG: Entering Human Mode | idArg ${idArg} | commandArgs ${commandArgs} | options ${options} | hasHints ${hasHints} | hints ${hints}`);
1345
1602
  }
1603
+ const cachedIdTrie = hasHints ? loadHintsTrieFile(getHintsTriePath(false)) : null;
1346
1604
  // human mode with pause for cli input
1347
1605
  if (!idArg) {
1348
1606
  if (!hasHints) {
1349
1607
  console.error('\n❌ Error: No hints cache found. Run `agtm setup --hint` first.');
1350
1608
  process.exit(1);
1351
1609
  }
1352
- const selected = await selectSkillId(hints);
1610
+ const selected = await selectSkillId(hints, undefined, 5, cachedIdTrie);
1353
1611
  if (!selected) {
1354
1612
  console.error('\n❌ Error: No skill id selected.');
1355
1613
  process.exit(1);
1356
1614
  }
1357
1615
  idArg = selected;
1616
+ // clearScreen();
1358
1617
  }
1359
1618
  else if (hasHints && !hints[idArg]) {
1360
- const selected = await selectSkillId(hints, idArg);
1619
+ const selected = await selectSkillId(hints, idArg, 5, cachedIdTrie);
1361
1620
  if (selected) {
1362
1621
  idArg = selected;
1363
1622
  }
1623
+ // clearScreen();
1364
1624
  }
1365
1625
  let finalCommandArgs = commandArgs;
1366
1626
  if (hasHints && idArg && finalCommandArgs.length > 0 && hints[finalCommandArgs[0]]) {
@@ -1383,7 +1643,7 @@ async function handleRun(idArg, commandArgs = [], options = {}) {
1383
1643
  if (!finalCommandArgs || finalCommandArgs.length === 0) {
1384
1644
  let chosen = null;
1385
1645
  if (hintEntry?.hints && hintEntry.hints.length > 0) {
1386
- const query = await promptCommandLine('\nEnter command (leave empty to list hints): ');
1646
+ const query = await promptCommandLine(`\nEnter command to run (leave empty to list cli hints): `, ``);
1387
1647
  const searchQuery = query || '';
1388
1648
  chosen = await selectCliHint(hintEntry.hints, searchQuery);
1389
1649
  }
@@ -1391,7 +1651,7 @@ async function handleRun(idArg, commandArgs = [], options = {}) {
1391
1651
  finalCommandArgs = chosen.cli.split(/\s+/).filter(Boolean);
1392
1652
  }
1393
1653
  else {
1394
- const manual = await promptCommandLine('\nEnter command to run: ');
1654
+ const manual = await promptCommandLine('\nEnter command line to run: ', ``);
1395
1655
  if (!manual) {
1396
1656
  console.error('\n❌ Error: No command selected.');
1397
1657
  process.exit(1);
@@ -1411,7 +1671,8 @@ async function handleRun(idArg, commandArgs = [], options = {}) {
1411
1671
  process.exit(1);
1412
1672
  }
1413
1673
  const finalCommandLine = finalCommandArgs.join(' ');
1414
- const edited = await promptCommandLine(`\nFinal command [${finalCommandLine}]: `);
1674
+ console.log("\nComplete the Cli with your arguments or leave blank and press Enter");
1675
+ const edited = await promptCommandLine(`\nFinal command line [${finalCommandLine}]:\n`, `${finalCommandLine}`);
1415
1676
  if (edited && edited.trim()) {
1416
1677
  finalCommandArgs = edited.split(/\s+/).filter(Boolean);
1417
1678
  }
@@ -82,6 +82,115 @@ To use the rate command, have to setup the benchmark levels configuration. save
82
82
  agtm setup --levels
83
83
  ```
84
84
 
85
+ #### Description
86
+ This skill runs the `agtm skills rate` command line to evaluate skill outputs.
87
+
88
+ The Agtm Skills CLI manages local skill bundles for supported agents (for example `claude-code`, `codex`, `openclaw`). It can download skills from GitHub, install them into the correct agent folders, list what is installed, record run logs, and apply rating benchmarks.
89
+
90
+ It also serves as a benchmarking tool to evaluate skill outputs:
91
+ **Benchmark** your AI agent against real-world standards — from Google-level engineering to Apple-caliber product launches.
92
+ **Rate** performance of each run with structured scores and levels, helping agents like Claude Code choose the right skills more effectively.
93
+
94
+
95
+ #### Usage
96
+
97
+
98
+ Each time your agent runs a skill, it then runs the follow-up skill `agent-skills-evaluator` to log the run,
99
+ summarizing its input and output and keeping them in a log-file-based memory.
100
+ It then calls `agtm skills log`, `agtm skills rate`, and `agtm skills rate show`.
101
+
102
+ `agtm skills log`: Keep track of skill runs in a local JSON log file.
103
+ `agtm skills rate prepare`: Fetch the evaluator prompt and benchmarks.json, and load the evaluation criteria, such as job levels and task fulfillment.
104
+ `agtm skills rate apply`: Write the LLM-based evaluator's results to the local log records.
105
+ `agtm skills rate show`: Show a table of historical scores and level ratings.
106
+
107
+ ```
108
+ agtm skills log <skill_id> --data '<json_payload>'
109
+ agtm skills rate prepare --skill_id <skill_id> --prompt "<eval_prompt>" --benchmark <path/benchmark.json>
110
+ agtm skills rate apply --skill_id <skill_id> --result '<result_json: log_id>'
111
+ agtm skills rate show --skill_id <skill_id>
112
+ ```
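According to the `handleSkillsRateApply` code in this diff, `--result` is parsed as raw JSON first and as base64-encoded JSON if that fails. The sketch below shows one way to produce and read back such an argument. The helper names `buildResultArg` and `parseResultArg` are hypothetical and not part of the CLI. Base64 may be convenient when shell quoting of the JSON is awkward, though that motivation is an assumption.

```javascript
// Hypothetical helper: serialize evaluator results, optionally as base64.
function buildResultArg(results, { base64 = false } = {}) {
  const json = JSON.stringify({ results });
  return base64 ? Buffer.from(json, 'utf8').toString('base64') : json;
}

// Same parsing order as the diff: raw JSON first, then base64-decoded JSON.
function parseResultArg(raw) {
  try {
    return JSON.parse(raw);
  } catch {
    return JSON.parse(Buffer.from(raw, 'base64').toString('utf8'));
  }
}

const encoded = buildResultArg(
  [{ log_id: '3679a3fe-4d97-4eb1-83bc-f83d711be195', score: 0.9, level: 'L4' }],
  { base64: true }
);
const parsed = parseResultArg(encoded);
console.log(parsed.results[0].level);
```

The resulting string could then be passed as `agtm skills rate apply --skill_id <skill_id> --result "<encoded>"`.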
113
+
114
+ #### Example
115
+ Note: `code_success_skills` is a dummy skill that always produces success results, and `code_fail_skills` is a dummy skill that always produces failure results.
116
+
117
+
118
+ ```shell
119
+ ## log command will output a log_id
120
+ agtm skills log code_success_skills --data '{"input":"generate sql","output":"ok","meta":{"agent":"claude-code"}}'
121
+ agtm skills rate prepare --skill_id code_success_skills --prompt "Evaluate the code execution results"
122
+ agtm skills rate apply --skill_id code_success_skills --result '{"results":[{"log_id":"3679a3fe-4d97-4eb1-83bc-f83d711be195","rating":0.90,"level":"L4"}]}'
123
+ agtm skills rate show ## show the historical skills dashboard, including score, evaluation levels
124
+ ```
125
+
126
+ Note:
127
+ - Persists a run record at `.agtm/skills/log/<uuid>.json` (or the `--logDir` you supply).
128
+ - `<json_payload>` must contain at least `input` and `output`; optional fields (meta, rating, level) are accepted.
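The two notes above can be sketched as a small writer function. This is an illustrative assumption about the record shape (`log_id`, `skill_id`, plus the payload fields), not the package's actual log handler. `writeSkillLog` is a hypothetical name, and the demo writes to a temporary directory instead of `.agtm/skills/log`.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');
const { randomUUID } = require('node:crypto');

// Hypothetical helper: validate the payload, then write <logDir>/<uuid>.json.
function writeSkillLog(logDir, skillId, payload) {
  if (payload.input === undefined || payload.output === undefined) {
    throw new Error('payload must contain at least "input" and "output"');
  }
  // Optional fields (meta, rating, level) pass through unchanged.
  const record = { ...payload, log_id: randomUUID(), skill_id: skillId };
  fs.mkdirSync(logDir, { recursive: true });
  const target = path.join(logDir, `${record.log_id}.json`);
  fs.writeFileSync(target, JSON.stringify(record, null, 2), 'utf8');
  return target;
}

// Demo: a temp directory stands in for .agtm/skills/log or a custom --logDir.
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), 'agtm-log-'));
const saved = writeSkillLog(demoDir, 'code_success_skills', {
  input: 'generate sql',
  output: 'ok',
  meta: { agent: 'claude-code' },
});
console.log(`Saved log to ${saved}`);
```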
129
+
130
+
131
+ #### Pipeline
132
+
133
+ **Step 1. Add log to memory**
134
+ ```
135
+ agtm skills log code_success_skills --data '{"input":"generate sql","output":"ok","meta":{"agent":"claude-code"}}'
136
+ agtm skills log code_fail_skills --data '{"input":"generate sql","output":"failure","meta":{"agent":"claude-code"}}'
137
+
138
+ ```
139
+
140
+ This generates a `{log_id}.json` file that serves as memory:
141
+ ```shell
142
+ ✅ Saved log to .agtm/skills/log/96c216f1-edc5-40f3-b041-b01a68b137a1.json
143
+ ```
144
+
145
+ **Step 2. Prepare Evaluation prompt**
146
+
147
+ Prepare the (<input, output>, benchmark) pairs so that the LLM can compare each <input, output> with the benchmark.
148
+
149
+ ```shell
150
+ agtm skills rate prepare --skill_id code_success_skills --prompt "Evaluate the code execution results"
151
+
152
+ agtm skills rate prepare --skill_id code_fail_skills --prompt "Evaluate the code execution results"
153
+ ```
154
+
155
+ ```shell
156
+ {"skill_id":"code_success_skills","benchmarks":[{"software-engineering":{"Google":[{"level":"L3","title":"Software Engineer II","description":"Entry-level engineer. Delivers well-scoped tasks with guidance. Learning codebase, tools, and best practices.","signals":["task execution","learning velocity","code quality basics"]},{"level":"L4","title":"Software Engineer III","description":"Independent contributor. Owns small features end-to-end. Writes maintainable code and participates in design discussions.","signals":["ownership","code quality","debugging ability"]},{"level":"L5","title":"Senior Software Engineer","description":"Leads projects and drives design decisions. Mentors others and improves system quality.","signals":["technical leadership","system design","mentorship"]},{"level":"L6","title":"Staff Software Engineer","description":"Owns large systems or cross-team initiatives. Sets technical direction and influences multiple teams.","signals":["architecture","cross-team impact","scalability thinking"]},{"level":"L7","title":"Senior Staff Software Engineer","description":"Drives org-level technical strategy. Solves ambiguous, high-impact problems.","signals":["org influence","complex problem solving","long-term vision"]},{"level":"L8","title":"Principal Engineer","description":"Company-wide impact. Defines technical standards and long-term architecture.","signals":["company impact","vision","industry-level thinking"]}]}}],"logs":[{"log_id":"1db0e927-79f1-46c2-b6dd-200d567f631d","input":"generate sql","output":"ok"},{"log_id":"94a2fae9-80ff-4b18-a77a-5714d34bcc20","input":"generate sql","output":"ok"},{"log_id":"96c216f1-edc5-40f3-b041-b01a68b137a1","input":"generate sql","output":"ok"},{"log_id":"b1f76f33-6f45-41e3-ae14-6b598f6aa357","input":"generate sql","output":"ok"}],"instructions":"System Prompt: You are an evaluator of skill performance. Score each example from 0.0 to 1.0 and assign a level based on benchmarks. Return JSON only. 
Please output json in format of {\"skill_id\": <skill_id>, \"results\": [{\"log_id\": \"<log_id_1>\", \"score\": 1.0, \"level\": \"L3\", **extra},{\"log_id\": \"<log_id_2>\", \"score\": 1.0, \"level\": \"L3\", **extra}]}\nUser prompt: Evaluate the code execution results"}
157
+
158
+ {"skill_id":"code_fail_skills","benchmarks":[{"software-engineering":{"Google":[{"level":"L3","title":"Software Engineer II","description":"Entry-level engineer. Delivers well-scoped tasks with guidance. Learning codebase, tools, and best practices.","signals":["task execution","learning velocity","code quality basics"]},{"level":"L4","title":"Software Engineer III","description":"Independent contributor. Owns small features end-to-end. Writes maintainable code and participates in design discussions.","signals":["ownership","code quality","debugging ability"]},{"level":"L5","title":"Senior Software Engineer","description":"Leads projects and drives design decisions. Mentors others and improves system quality.","signals":["technical leadership","system design","mentorship"]},{"level":"L6","title":"Staff Software Engineer","description":"Owns large systems or cross-team initiatives. Sets technical direction and influences multiple teams.","signals":["architecture","cross-team impact","scalability thinking"]},{"level":"L7","title":"Senior Staff Software Engineer","description":"Drives org-level technical strategy. Solves ambiguous, high-impact problems.","signals":["org influence","complex problem solving","long-term vision"]},{"level":"L8","title":"Principal Engineer","description":"Company-wide impact. Defines technical standards and long-term architecture.","signals":["company impact","vision","industry-level thinking"]}]}}],"logs":[{"log_id":"2e5513e7-27ae-4636-9d21-4b57ec9f739b","input":"generate sql","output":"failure"},{"log_id":"563747fb-ea62-4ebc-80c4-1bc1d1c82ed5","input":"generate sql","output":"failure"},{"log_id":"db699754-b1fd-491c-a49f-2af1a41ad1f7","input":"generate sql","output":"failure"}],"instructions":"System Prompt: You are an evaluator of skill performance. Score each example from 0.0 to 1.0 and assign a level based on benchmarks. Return JSON only. 
Please output json in format of {\"skill_id\": <skill_id>, \"results\": [{\"log_id\": \"<log_id_1>\", \"score\": 1.0, \"level\": \"L3\", **extra},{\"log_id\": \"<log_id_2>\", \"score\": 1.0, \"level\": \"L3\", **extra}]}\nUser prompt: Evaluate the code execution results"}
159
+ ```
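
The `prepare` output shown above is a single JSON document with `skill_id`, `benchmarks`, `logs`, and `instructions` keys. As a rough sketch of how an agent wrapper might read it, the hypothetical helper below pulls out the log ids and the benchmark levels. It assumes the nesting seen in the sample (benchmark domain, then company, then a list of level entries) and uses `strict=False` in case the `instructions` string contains a raw line break, as it appears to in the listing; the sample data is a trimmed copy of that listing.

```python
import json


def summarize_prepare_output(raw: str) -> dict:
    """Extract log ids, benchmark levels, and instructions from prepare output.

    Hypothetical helper; the key layout is inferred from the sample output.
    """
    # strict=False tolerates raw control characters (such as a newline)
    # inside JSON strings, which strict parsing would reject.
    data = json.loads(raw, strict=False)
    levels = []
    for benchmark in data.get("benchmarks", []):
        for domain in benchmark.values():      # e.g. "software-engineering"
            for ladder in domain.values():     # e.g. "Google"
                levels.extend(entry["level"] for entry in ladder)
    return {
        "skill_id": data["skill_id"],
        "log_ids": [log["log_id"] for log in data.get("logs", [])],
        "levels": levels,
        "instructions": data.get("instructions", ""),
    }


sample = json.dumps({
    "skill_id": "code_success_skills",
    "benchmarks": [{"software-engineering": {"Google": [
        {"level": "L3", "title": "Software Engineer II"},
        {"level": "L4", "title": "Software Engineer III"},
    ]}}],
    "logs": [{"log_id": "96c216f1-edc5-40f3-b041-b01a68b137a1",
              "input": "generate sql", "output": "ok"}],
    "instructions": "Return JSON only.",
})
summary = summarize_prepare_output(sample)
print(summary["log_ids"], summary["levels"])
```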
160
+
161
+ **Step 3. Run the evaluation prompt from Step 2 with your local agent**
162
+
163
+ Your agent assigns `{"score": double, "level": str}` to each `log_id`:
164
+ ```
165
+ {"skill_id":"code_success_skills","results":[{"log_id":"1db0e927-79f1-46c2-b6dd-200d567f631d","score":1,"level":"L3","rationale":"Successfully executed a well-scoped task generate sql. Matches entry-level performance criteria for task execution."},{"log_id":"94a2fae9-80ff-4b18-a77a-5714d34bcc20","score":1,"level":"L3","rationale":"Successfully executed a well-scoped task generate sql. Matches entry-level performance criteria for task execution."},{"log_id":"96c216f1-edc5-40f3-b041-b01a68b137a1","score":1,"level":"L3","rationale":"Successfully executed a well-scoped task generate sql. Matches entry-level performance criteria for task execution."},{"log_id":"b1f76f33-6f45-41e3-ae14-6b598f6aa357","score":1,"level":"L3","rationale":"Successfully executed a well-scoped task generate sql. Matches entry-level performance criteria for task execution."}]}
166
+
167
+ {"skill_id":"code_fail_skills","results":[{"log_id":"2e5513e7-27ae-4636-9d21-4b57ec9f739b","score":0,"level":"L3"},{"log_id":"563747fb-ea62-4ebc-80c4-1bc1d1c82ed5","score":0,"level":"L3"},{"log_id":"db699754-b1fd-491c-a49f-2af1a41ad1f7","score":0,"level":"L3"}]}
168
+ ```
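
Because the evaluator is an LLM, its reply may be worth checking before it is applied. The sketch below is one possible sanity check, under the assumptions that every prepared `log_id` should receive a result, that `score` lies in the 0.0 to 1.0 range named in the instructions, and that `level` is one of the benchmark levels. The function name `check_results` is hypothetical and is not an `agtm` API.

```python
def check_results(result: dict, expected_log_ids: set, allowed_levels: set) -> list:
    """Return a list of human-readable problems found in an evaluator reply."""
    problems = []
    seen = set()
    for item in result.get("results", []):
        log_id = item.get("log_id")
        seen.add(log_id)
        score = item.get("score")
        # bool is a subclass of int in Python, so reject it explicitly.
        is_number = isinstance(score, (int, float)) and not isinstance(score, bool)
        if not is_number or not 0.0 <= score <= 1.0:
            problems.append(f"{log_id}: score must be a number in [0.0, 1.0]")
        if item.get("level") not in allowed_levels:
            problems.append(f"{log_id}: unknown level {item.get('level')!r}")
    for missing in sorted(expected_log_ids - seen):
        problems.append(f"{missing}: no result returned")
    return problems


reply = {
    "skill_id": "code_fail_skills",
    "results": [
        {"log_id": "2e5513e7-27ae-4636-9d21-4b57ec9f739b", "score": 0, "level": "L3"},
        {"log_id": "563747fb-ea62-4ebc-80c4-1bc1d1c82ed5", "score": 0, "level": "L3"},
    ],
}
expected = {
    "2e5513e7-27ae-4636-9d21-4b57ec9f739b",
    "563747fb-ea62-4ebc-80c4-1bc1d1c82ed5",
    "db699754-b1fd-491c-a49f-2af1a41ad1f7",
}
levels = {"L3", "L4", "L5", "L6", "L7", "L8"}
for problem in check_results(reply, expected, levels):
    print(problem)
```

In this usage the third log id is deliberately left out of the reply, so the check reports it as having no result.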
169
+
170
+ **Step 4. Apply Results to Local Log Status**
171
+
172
+ ```shell
173
+ agtm skills rate apply --skill_id code_success_skills --result '{"skill_id":"code_success_skills","results":[{"log_id":"1db0e927-79f1-46c2-b6dd-200d567f631d","score":1,"level":"L3","rationale":"Successfully executed a well-scoped task generate sql. Matches entry-level performance criteria for task execution."},{"log_id":"94a2fae9-80ff-4b18-a77a-5714d34bcc20","score":1,"level":"L3","rationale":"Successfully executed a well-scoped task generate sql. Matches entry-level performance criteria for task execution."},{"log_id":"96c216f1-edc5-40f3-b041-b01a68b137a1","score":1,"level":"L3","rationale":"Successfully executed a well-scoped task generate sql. Matches entry-level performance criteria for task execution."},{"log_id":"b1f76f33-6f45-41e3-ae14-6b598f6aa357","score":1,"level":"L3","rationale":"Successfully executed a well-scoped task generate. Matches entry-level performance criteria for task execution."}]}'
174
+
175
+ agtm skills rate apply --skill_id code_fail_skills --result '{"skill_id":"code_fail_skills","results":[{"log_id":"2e5513e7-27ae-4636-9d21-4b57ec9f739b","score":0,"level":"L3"},{"log_id":"563747fb-ea62-4ebc-80c4-1bc1d1c82ed5","score":0,"level":"L3"},{"log_id":"db699754-b1fd-491c-a49f-2af1a41ad1f7","score":0,"level":"L3"}]}'
176
+ ```
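
The `--result` argument in the commands above is wrapped in single quotes, which breaks as soon as a `rationale` contains an apostrophe. One way around this, sketched below, is to build the command line programmatically and let `shlex.join` handle the quoting. The function `build_apply_command` is a hypothetical helper; only the `--skill_id` and `--result` flags come from the documented usage.

```python
import json
import shlex


def build_apply_command(skill_id: str, results: list) -> str:
    """Return a shell-safe `agtm skills rate apply` command line."""
    payload = json.dumps(
        {"skill_id": skill_id, "results": results},
        separators=(",", ":"),  # compact JSON, as in the examples above
    )
    # shlex.join quotes each argument so that quotes and spaces inside
    # the JSON payload survive the shell unchanged.
    return shlex.join(
        ["agtm", "skills", "rate", "apply",
         "--skill_id", skill_id, "--result", payload]
    )


results = [{
    "log_id": "2e5513e7-27ae-4636-9d21-4b57ec9f739b",
    "score": 0,
    "level": "L3",
    "rationale": "The skill's output was a failure.",
}]
print(build_apply_command("code_fail_skills", results))
```

When the command is launched from Python rather than pasted into a terminal, passing the argument list directly to `subprocess.run` avoids the shell quoting step altogether.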
177
+
178
+ **Step 5. Show final Result (Optional)**
179
+ ```shell
180
+ agtm skills rate show
181
+ ```
182
+
183
+
184
+ ```shell
185
+ skill_id            run_times score level
186
+ ------------------- --------- ----- -----
187
+ code_fail_skills    3         0.00  L3
188
+ code_success_skills 4         1.00  L3
189
+ ```
190
+
191
+
192
+ #### CLI Documentation
193
+
85
194
  #### Usage
86
195
  ```
87
196
  agtm skills rate prepare --skill_id <skill_id> --prompt "<eval_prompt>" --benchmark <path/benchmark.json>
@@ -131,6 +240,7 @@ write your `customized_agent_benchmark.json` following the formats
131
240
  }
132
241
  ```
133
242
 
243
+
134
244
  ## Supported Agents
135
245
  We provide the same local skills folder layout as the vercel/skills package.
136
246
  Skills can be installed to any of these agents
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@aiagenta2z/agtm",
3
- "version": "1.0.8",
3
+ "version": "1.1.0",
4
4
  "description": "agtm: CLI Tool for AI Agent Management, Skills, Agent Registry, Benchmarks and Hints in AI Agent Marketplace\n",
5
5
  "main": "dist/agtm-cli.js",
6
6
  "type": "module",