@aiagenta2z/agtm 1.0.9 → 1.1.0
- package/README.md +26 -6
- package/data/config/hints/base_hints.json +2 -0
- package/dist/agtm-cli.js +286 -25
- package/docs/skills/README.md +110 -0
- package/package.json +1 -1
package/README.md
CHANGED
|
@@ -1,18 +1,18 @@
|
|
|
1
1
|
|
|
2
2
|
### agtm: CLI Tool for AI Agent Management, Skills, Agent Registry, Benchmarks and Hints in AI Agent Marketplace
|
|
3
3
|
|
|
4
|
-
[GitHub](https://github.com/aiagenta2z/agtm)|[AI Agent Marketplace CLI Doc](https://www.deepnlp.org/doc/ai_agent_marketplace)|[DeepNLP AI Agent Marketplace](https://www.deepnlp.org/store/ai-agent) | [OneKey
|
|
4
|
+
[GitHub](https://github.com/aiagenta2z/agtm)|[AI Agent Marketplace CLI Doc](https://www.deepnlp.org/doc/ai_agent_marketplace)|[DeepNLP AI Agent Marketplace](https://www.deepnlp.org/store/ai-agent) | [OneKey Gateway](https://deepnlp.org/doc/onekey_gateway) | [Agent MCP OneKey Router Ranking](https://www.deepnlp.org/agent/rankings) | [NodeJS agtm](https://www.npmjs.com/package/@aiagenta2z/agtm)
|
|
5
5
|
|
|
6
6
|
`agtm` (AI Agent Management CLI) unifies skill management, agent registration, marketplace search, and provider CLI execution. Install skills from GitHub, log and rate skill runs, upload agent metadata to registries, query the public marketplace, and run agent toolchains with fuzzy hints.
|
|
7
7
|
|
|
8
|
-
Features
|
|
8
|
+
## Features
|
|
9
9
|
|
|
10
10
|
*`agtm skills`*: Manage Skills, Add Skills, List Skills, Log Skills Performance, Skills performance Evaluator, compare to realworld benchmarks
|
|
11
11
|
*`agtm upload`*: AI Agent Registry, register local agent meta information of json or yaml format(agent.json/agent.yaml) or sync your github source meta including README.md
|
|
12
12
|
*`agtm search`*: Search the open source AI Agent Marketplace, including github community, huggingface community, product hunt community, deepnlp ai agent marketplace index, etc
|
|
13
13
|
*`agtm run`*: Run agent CLIs without memorizing their commands. With the hints and completion support, just type a few characters and "--hint" will help you complete the command line.
|
|
14
14
|
|
|
15
|
-
Furthermore, `agtm` provides memory to track skill outputs and enables performance rating against industry job level benchmarks. This allows you to score each skill execution and assign a professional tier to your AI Agent's capabilities—for example, evaluating its performance as equivalent to that of an L3 or L5 software engineer, marketing
|
|
15
|
+
Furthermore, `agtm` provides memory to track skill outputs and enables performance rating against industry job level benchmarks. This allows you to score each skill execution and assign a professional tier to your AI Agent's capabilities—for example, evaluating its performance as equivalent to that of an L3 or L5 software engineer, marketing professional, etc.
|
|
16
16
|
|
|
17
17
|
```shell
|
|
18
18
|
skill_id run_times score level
|
|
@@ -194,9 +194,7 @@ agtm run <provider_unique_id> <agent_cli>
|
|
|
194
194
|
### Example
|
|
195
195
|
|
|
196
196
|
```shell
|
|
197
|
-
|
|
198
|
-
DEBUG: Entering Human Mode | idArg play | commandArgs | options [object Object] | hasHints true | hints [object Object]
|
|
199
|
-
|
|
197
|
+
agtm run play
|
|
200
198
|
Skill ID suggestions:
|
|
201
199
|
1. microsoft/playwright-cli
|
|
202
200
|
2. googleworkspace/cli
|
|
@@ -260,6 +258,28 @@ agtm upload --config ./agent.json --endpoint https://www.deepnlp.org/api/ai_agen
|
|
|
260
258
|
agtm upload --config ./agent.json --endpoint https://www.aiagenta2z.com/api/ai_agent_marketplace/registry --schema ./schema.json
|
|
261
259
|
```
|
|
262
260
|
|
|
261
|
+
|
|
262
|
+
|
|
263
|
+
### Skills Agtm-Cli
|
|
264
|
+
|
|
265
|
+
We provide a skills repo that various agents can use to evaluate skills and run agent CLI hints.
|
|
266
|
+
The skills can be found in the `./skills/` folder.
|
|
267
|
+
|
|
268
|
+
| skill | description |
|
|
269
|
+
| ---- | ---- |
|
|
270
|
+
| agent-cli-hint-completion | This skill uses `agtm run --mode agent` to hint at agent CLI usage |
|
|
271
|
+
| agent-skills-evaluator | This skill uses `agtm skills log` and `agtm skills rate` to track the performance of other skills with an LLM-based evaluator and match it to professional job-level benchmarks, such as a Google L3 software engineer or an Apple M3 marketing specialist. |
|
|
272
|
+
|
|
273
|
+
```shell
|
|
274
|
+
npx agtm skills add aiagenta2z/agtm ## install all the skill evaluation and skill cli-hints
|
|
275
|
+
npx agtm skills add aiagenta2z/agtm -s agent-skills-evaluator
|
|
276
|
+
```
|
|
277
|
+
|
|
278
|
+
```shell
|
|
279
|
+
npx skills add aiagenta2z/agtm ## install all the skill evaluation and skill cli-hints
|
|
280
|
+
npx skills add aiagenta2z/agtm -s agent-skills-evaluator
|
|
281
|
+
```
|
|
282
|
+
|
|
263
283
|
### Contributing
|
|
264
284
|
|
|
265
285
|
#### Agent CLI List
|
package/dist/agtm-cli.js
CHANGED
|
@@ -9,6 +9,8 @@ import { execFileSync, spawn } from 'node:child_process';
|
|
|
9
9
|
import { createInterface } from 'node:readline/promises';
|
|
10
10
|
import { fileURLToPath } from 'node:url';
|
|
11
11
|
import { randomUUID } from 'node:crypto';
|
|
12
|
+
//production setup
|
|
13
|
+
const LOG_ENABLE = false;
|
|
12
14
|
// --- Configuration ---
|
|
13
15
|
const BASE_URL = 'https://www.deepnlp.org/api/ai_agent_marketplace';
|
|
14
16
|
const REGISTRY_ENDPOINT = `${BASE_URL}/registry`;
|
|
@@ -19,7 +21,6 @@ const MOCK_RETURN_URL = "https://www.deepnlp.org/store/ai-agent/ai-agent/pub-AI-
|
|
|
19
21
|
const CLI_DIR = path.dirname(fileURLToPath(import.meta.url));
|
|
20
22
|
const MODE_AGENT = 'agent';
|
|
21
23
|
const MODE_HUMAN = 'human';
|
|
22
|
-
const LOG_ENABLE = true;
|
|
23
24
|
const AGTM_LOCAL_DIR = path.join(process.cwd(), '.agtm');
|
|
24
25
|
const AGTM_GLOBAL_DIR = path.join(os.homedir(), '.agtm');
|
|
25
26
|
const SKILL_LOG_DIR_LOCAL = path.join(AGTM_LOCAL_DIR, 'skills', 'log');
|
|
@@ -636,7 +637,7 @@ function loadLevelDescriptions(levelFile) {
|
|
|
636
637
|
return null;
|
|
637
638
|
}
|
|
638
639
|
}
|
|
639
|
-
const
|
|
640
|
+
const DEFAULT_EVAL_SYSTEM_PROMPT = 'System Prompt: You are an evaluator of skill performance. Score each example from 0.0 to 1.0 and assign a level based on benchmarks. Return JSON only. Please output json in format of {"skill_id": <skill_id>, "results": [{"log_id": "<log_id_1>", "score": 1.0, "level": "L3", **extra},{"log_id": "<log_id_2>", "score": 1.0, "level": "L3", **extra}]}';
|
|
640
641
|
const BENCHMARK_TOP_K = 3;
|
|
641
642
|
function benchmarkKey(obj) {
|
|
642
643
|
if (!obj || typeof obj !== 'object')
|
|
@@ -683,17 +684,79 @@ async function handleSkillsRatePrepare(options) {
|
|
|
683
684
|
console.error(`\n❌ Error: No logs found for skill '${skillId}' in ${logDir}.`);
|
|
684
685
|
process.exit(1);
|
|
685
686
|
}
|
|
687
|
+
var userInputPrompt = `User prompt: ${(options.prompt || "")}`;
|
|
688
|
+
var mergeInstruction = DEFAULT_EVAL_SYSTEM_PROMPT + "\n" + userInputPrompt;
|
|
686
689
|
const levelsData = loadLevelDescriptions(options.benchmark);
|
|
687
690
|
const benchmarks = normalizeBenchmarks(skillId, levelsData).slice(0, BENCHMARK_TOP_K);
|
|
688
691
|
const payload = {
|
|
689
692
|
skill_id: skillId,
|
|
690
693
|
benchmarks,
|
|
691
694
|
logs: logs.map(({ log_id, input, output }) => ({ log_id, input, output })),
|
|
692
|
-
instructions:
|
|
695
|
+
instructions: mergeInstruction
|
|
693
696
|
};
|
|
694
697
|
console.log(JSON.stringify(payload, null, 2));
|
|
695
698
|
}
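The payload that `handleSkillsRatePrepare` prints can be illustrated with a minimal sketch. It assumes the logs and benchmarks are already in memory; `buildRatePayload` and the shortened `SYSTEM_PROMPT` are hypothetical names used only for illustration and are not part of `agtm`.

```javascript
// Hypothetical, shortened stand-in for DEFAULT_EVAL_SYSTEM_PROMPT in the diff.
const SYSTEM_PROMPT = 'System Prompt: You are an evaluator of skill performance.';

// Hypothetical helper: builds the object that `rate prepare` prints as JSON.
function buildRatePayload(skillId, logs, benchmarks, userPrompt) {
  return {
    skill_id: skillId,
    benchmarks,
    // Only log_id, input and output are forwarded; other fields such as meta are dropped.
    logs: logs.map(({ log_id, input, output }) => ({ log_id, input, output })),
    // System prompt and user prompt are joined with a newline, as in the diff.
    instructions: `${SYSTEM_PROMPT}\nUser prompt: ${userPrompt || ''}`,
  };
}

const examplePayload = buildRatePayload(
  'code_success_skills',
  [{ log_id: 'id-1', input: 'generate sql', output: 'ok', meta: { agent: 'claude-code' } }],
  [],
  'Evaluate the code execution results'
);
console.log(JSON.stringify(examplePayload, null, 2));
```

Because the user prompt is appended after the system prompt, an empty `--prompt` still yields a usable instruction string.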
|
|
696
699
|
async function handleSkillsRateApply(options) {
|
|
700
|
+
const skillId = options.skill_id;
|
|
701
|
+
if (!skillId) {
|
|
702
|
+
console.error('\n❌ Error: --skill_id is required.');
|
|
703
|
+
process.exit(1);
|
|
704
|
+
}
|
|
705
|
+
if (!options.result) {
|
|
706
|
+
console.error('\n❌ Error: --result <json or base64> is required.');
|
|
707
|
+
process.exit(1);
|
|
708
|
+
}
|
|
709
|
+
let parsed;
|
|
710
|
+
try {
|
|
711
|
+
let raw = options.result;
|
|
712
|
+
// Attempt base64 decode if JSON parsing fails
|
|
713
|
+
try {
|
|
714
|
+
parsed = JSON.parse(raw);
|
|
715
|
+
}
|
|
716
|
+
catch {
|
|
717
|
+
// try decode base64
|
|
718
|
+
raw = Buffer.from(raw, 'base64').toString('utf8');
|
|
719
|
+
parsed = JSON.parse(raw);
|
|
720
|
+
}
|
|
721
|
+
}
|
|
722
|
+
catch (e) {
|
|
723
|
+
console.error(`\n❌ Error: invalid JSON/base64 for --result: ${e.message}`);
|
|
724
|
+
process.exit(1);
|
|
725
|
+
}
|
|
726
|
+
const results = Array.isArray(parsed?.results) ? parsed.results : [];
|
|
727
|
+
if (results.length === 0) {
|
|
728
|
+
console.error('\n❌ Error: --result must contain a non-empty "results" array.');
|
|
729
|
+
process.exit(1);
|
|
730
|
+
}
|
|
731
|
+
const logDir = getLogDir(options.logDir);
|
|
732
|
+
const logs = loadLogs(logDir).filter(l => l.skill_id === skillId);
|
|
733
|
+
const byId = new Map(logs.map(l => [l.log_id, l]));
|
|
734
|
+
let updated = 0;
|
|
735
|
+
const missing = [];
|
|
736
|
+
for (const item of results) {
|
|
737
|
+
const id = item?.log_id;
|
|
738
|
+
if (!id || !byId.has(id)) {
|
|
739
|
+
missing.push(String(id || 'unknown'));
|
|
740
|
+
continue;
|
|
741
|
+
}
|
|
742
|
+
const entry = byId.get(id);
|
|
743
|
+
// Support both 'score' and 'rating'
|
|
744
|
+
if (item.rating !== undefined)
|
|
745
|
+
entry.rating = Number(item.rating);
|
|
746
|
+
if (item.score !== undefined)
|
|
747
|
+
entry.rating = Number(item.score);
|
|
748
|
+
if (item.level !== undefined)
|
|
749
|
+
entry.level = String(item.level);
|
|
750
|
+
// Optional rationale
|
|
751
|
+
if (item.rationale !== undefined)
|
|
752
|
+
entry.rationale = String(item.rationale);
|
|
753
|
+
const target = path.join(logDir, `${entry.log_id}.json`);
|
|
754
|
+
fs.writeFileSync(target, JSON.stringify(entry, null, 2), 'utf8');
|
|
755
|
+
updated += 1;
|
|
756
|
+
}
|
|
757
|
+
console.log(JSON.stringify({ status: 'success', updated, missing }, null, 2));
|
|
758
|
+
}
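The merge step in the new `handleSkillsRateApply` can be sketched in memory, without the file writes. `applyResults` is a hypothetical name; the field precedence (a `score` overrides a `rating` when both are present) follows the order of the assignments shown in the diff.

```javascript
// Hypothetical in-memory version of the apply step; the real handler also
// writes each updated entry back to <logDir>/<log_id>.json.
function applyResults(logs, results) {
  const byId = new Map(logs.map((l) => [l.log_id, l]));
  let updated = 0;
  const missing = [];
  for (const item of results) {
    const id = item?.log_id;
    if (!id || !byId.has(id)) {
      // Unknown or absent ids are reported instead of aborting the run.
      missing.push(String(id || 'unknown'));
      continue;
    }
    const entry = byId.get(id);
    if (item.rating !== undefined) entry.rating = Number(item.rating);
    if (item.score !== undefined) entry.rating = Number(item.score); // score wins over rating
    if (item.level !== undefined) entry.level = String(item.level);
    if (item.rationale !== undefined) entry.rationale = String(item.rationale);
    updated += 1;
  }
  return { status: 'success', updated, missing };
}

const exampleLogs = [{ log_id: 'a' }, { log_id: 'b' }];
const summary = applyResults(exampleLogs, [
  { log_id: 'a', rating: 0.5, score: 0.9, level: 'L4' },
  { log_id: 'zzz' },
  {},
]);
console.log(JSON.stringify(summary));
```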
|
|
759
|
+
async function handleSkillsRateApplyBak(options) {
|
|
697
760
|
const skillId = options.skill_id;
|
|
698
761
|
if (!skillId) {
|
|
699
762
|
console.error('\n❌ Error: --skill_id is required.');
|
|
@@ -917,7 +980,7 @@ function fuzzyScore(query, candidate) {
|
|
|
917
980
|
return 0.7 * editScore + 0.3 * tokenScore;
|
|
918
981
|
}
|
|
919
982
|
function createTrie() {
|
|
920
|
-
return { children: new Map(),
|
|
983
|
+
return { children: new Map(), terminalValues: new Set() };
|
|
921
984
|
}
|
|
922
985
|
function insertTrie(trie, key, value) {
|
|
923
986
|
let node = trie;
|
|
@@ -932,8 +995,8 @@ function insertTrie(trie, key, value) {
|
|
|
932
995
|
node.children.set(ch, created);
|
|
933
996
|
node = created;
|
|
934
997
|
}
|
|
935
|
-
node.values.add(value);
|
|
936
998
|
}
|
|
999
|
+
node.terminalValues.add(value);
|
|
937
1000
|
}
|
|
938
1001
|
function searchTrie(trie, prefix, limit) {
|
|
939
1002
|
let node = trie;
|
|
@@ -944,9 +1007,57 @@ function searchTrie(trie, prefix, limit) {
|
|
|
944
1007
|
return [];
|
|
945
1008
|
}
|
|
946
1009
|
}
|
|
947
|
-
const
|
|
948
|
-
|
|
949
|
-
|
|
1010
|
+
const out = [];
|
|
1011
|
+
const seen = new Set();
|
|
1012
|
+
const dfs = (current) => {
|
|
1013
|
+
if (out.length >= limit)
|
|
1014
|
+
return;
|
|
1015
|
+
const terminal = Array.from(current.terminalValues).sort((a, b) => a.localeCompare(b));
|
|
1016
|
+
for (const value of terminal) {
|
|
1017
|
+
if (out.length >= limit)
|
|
1018
|
+
return;
|
|
1019
|
+
if (seen.has(value))
|
|
1020
|
+
continue;
|
|
1021
|
+
seen.add(value);
|
|
1022
|
+
out.push(value);
|
|
1023
|
+
}
|
|
1024
|
+
const keys = Array.from(current.children.keys()).sort((a, b) => a.localeCompare(b));
|
|
1025
|
+
for (const key of keys) {
|
|
1026
|
+
if (out.length >= limit)
|
|
1027
|
+
return;
|
|
1028
|
+
dfs(current.children.get(key));
|
|
1029
|
+
}
|
|
1030
|
+
};
|
|
1031
|
+
dfs(node);
|
|
1032
|
+
return out;
|
|
1033
|
+
}
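The prefix search above can be tried in isolation with a minimal sketch. It assumes keys are inserted one character at a time and that values are stored only on the final node; the body of `insertTrie` is only partly visible in the diff, so that part is an assumption.

```javascript
function createTrie() {
  return { children: new Map(), terminalValues: new Set() };
}

// Assumed insertion scheme: walk or create one node per character,
// then record the value on the last node only.
function insertTrie(trie, key, value) {
  let node = trie;
  for (const ch of key) {
    if (!node.children.has(ch)) node.children.set(ch, createTrie());
    node = node.children.get(ch);
  }
  node.terminalValues.add(value);
}

// Walk down to the prefix node, then collect values depth-first in sorted order.
function searchTrie(trie, prefix, limit) {
  let node = trie;
  for (const ch of prefix) {
    node = node.children.get(ch);
    if (!node) return [];
  }
  const out = [];
  const seen = new Set();
  const dfs = (current) => {
    if (out.length >= limit) return;
    const terminal = Array.from(current.terminalValues).sort((a, b) => a.localeCompare(b));
    for (const value of terminal) {
      if (out.length >= limit) return;
      if (seen.has(value)) continue;
      seen.add(value);
      out.push(value);
    }
    const keys = Array.from(current.children.keys()).sort((a, b) => a.localeCompare(b));
    for (const key of keys) {
      if (out.length >= limit) return;
      dfs(current.children.get(key));
    }
  };
  dfs(node);
  return out;
}

const demoTrie = createTrie();
for (const id of ['play', 'playwright', 'plan', 'google']) insertTrie(demoTrie, id, id);
console.log(searchTrie(demoTrie, 'pla', 10));
```

Sorting both the terminal values and the child keys makes the suggestion order deterministic, which matters once the trie is cached on disk and reloaded.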
|
|
1034
|
+
function trieToPersisted(node) {
|
|
1035
|
+
const children = {};
|
|
1036
|
+
for (const [key, child] of node.children.entries()) {
|
|
1037
|
+
children[key] = trieToPersisted(child);
|
|
1038
|
+
}
|
|
1039
|
+
return {
|
|
1040
|
+
children: Object.keys(children).length ? children : undefined,
|
|
1041
|
+
terminalValues: node.terminalValues.size ? Array.from(node.terminalValues).sort((a, b) => a.localeCompare(b)) : undefined
|
|
1042
|
+
};
|
|
1043
|
+
}
|
|
1044
|
+
function persistedToTrie(node) {
|
|
1045
|
+
const trie = createTrie();
|
|
1046
|
+
if (Array.isArray(node.terminalValues)) {
|
|
1047
|
+
for (const value of node.terminalValues) {
|
|
1048
|
+
if (typeof value === 'string' && value.trim()) {
|
|
1049
|
+
trie.terminalValues.add(value);
|
|
1050
|
+
}
|
|
1051
|
+
}
|
|
1052
|
+
}
|
|
1053
|
+
if (node.children && typeof node.children === 'object') {
|
|
1054
|
+
for (const [key, child] of Object.entries(node.children)) {
|
|
1055
|
+
if (!child || typeof child !== 'object')
|
|
1056
|
+
continue;
|
|
1057
|
+
trie.children.set(key, persistedToTrie(child));
|
|
1058
|
+
}
|
|
1059
|
+
}
|
|
1060
|
+
return trie;
|
|
950
1061
|
}
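The persistence round trip can be sketched as follows. `toPlain` and `fromPlain` are hypothetical names that mirror `trieToPersisted` and `persistedToTrie`; the sketch skips the input validation done in the diff and uses the default sort instead of `localeCompare`.

```javascript
function createTrie() {
  return { children: new Map(), terminalValues: new Set() };
}

// Convert Map/Set nodes into plain objects and arrays that JSON can represent.
function toPlain(node) {
  const children = {};
  for (const [key, child] of node.children.entries()) {
    children[key] = toPlain(child);
  }
  return {
    // Properties set to undefined are omitted by JSON.stringify.
    children: Object.keys(children).length ? children : undefined,
    terminalValues: node.terminalValues.size ? Array.from(node.terminalValues).sort() : undefined,
  };
}

// Rebuild Map/Set nodes from the parsed JSON.
function fromPlain(plain) {
  const trie = createTrie();
  for (const value of plain.terminalValues || []) trie.terminalValues.add(value);
  for (const [key, child] of Object.entries(plain.children || {})) {
    trie.children.set(key, fromPlain(child));
  }
  return trie;
}

// Build a two-level trie by hand: "a" -> "ab".
const root = createTrie();
const nodeA = createTrie();
const nodeB = createTrie();
nodeA.terminalValues.add('a');
nodeB.terminalValues.add('ab');
nodeA.children.set('b', nodeB);
root.children.set('a', nodeA);

const json = JSON.stringify(toPlain(root));
const restored = fromPlain(JSON.parse(json));
console.log(json);
```

Returning `undefined` for empty `children` and `terminalValues` keeps leaf nodes compact in the cached file, since those keys never reach the JSON output.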
|
|
951
1062
|
function mergeHints(target, source) {
|
|
952
1063
|
for (const [id, entry] of Object.entries(source)) {
|
|
@@ -997,6 +1108,26 @@ function writeHintsFile(filePath, hints) {
|
|
|
997
1108
|
ensureDir(path.dirname(filePath));
|
|
998
1109
|
fs.writeFileSync(filePath, JSON.stringify(hints, null, 2));
|
|
999
1110
|
}
|
|
1111
|
+
function writeHintsTrieFile(filePath, trie) {
|
|
1112
|
+
ensureDir(path.dirname(filePath));
|
|
1113
|
+
fs.writeFileSync(filePath, JSON.stringify(trieToPersisted(trie), null, 2), 'utf8');
|
|
1114
|
+
}
|
|
1115
|
+
function loadHintsTrieFile(filePath) {
|
|
1116
|
+
if (!fs.existsSync(filePath)) {
|
|
1117
|
+
return null;
|
|
1118
|
+
}
|
|
1119
|
+
try {
|
|
1120
|
+
const raw = fs.readFileSync(filePath, 'utf8');
|
|
1121
|
+
const parsed = JSON.parse(raw);
|
|
1122
|
+
if (!parsed || typeof parsed !== 'object') {
|
|
1123
|
+
return null;
|
|
1124
|
+
}
|
|
1125
|
+
return persistedToTrie(parsed);
|
|
1126
|
+
}
|
|
1127
|
+
catch {
|
|
1128
|
+
return null;
|
|
1129
|
+
}
|
|
1130
|
+
}
|
|
1000
1131
|
function findBundledHintsDir() {
|
|
1001
1132
|
const candidates = [
|
|
1002
1133
|
path.resolve(CLI_DIR, 'data', 'config', 'hints'),
|
|
@@ -1045,6 +1176,14 @@ async function loadBundledHints() {
|
|
|
1045
1176
|
return merged;
|
|
1046
1177
|
}
|
|
1047
1178
|
function getHintsPath(useGlobal) {
|
|
1179
|
+
const baseDir = useGlobal ? AGTM_GLOBAL_DIR : AGTM_LOCAL_DIR;
|
|
1180
|
+
return path.join(baseDir, 'hints', 'hints.json');
|
|
1181
|
+
}
|
|
1182
|
+
function getHintsTriePath(useGlobal) {
|
|
1183
|
+
const baseDir = useGlobal ? AGTM_GLOBAL_DIR : AGTM_LOCAL_DIR;
|
|
1184
|
+
return path.join(baseDir, 'hints', 'hints_trie.json');
|
|
1185
|
+
}
|
|
1186
|
+
function getOldHintsPath(useGlobal) {
|
|
1048
1187
|
if (useGlobal) {
|
|
1049
1188
|
return path.join(AGTM_GLOBAL_DIR, 'hints.json');
|
|
1050
1189
|
}
|
|
@@ -1063,6 +1202,8 @@ function loadCombinedHints(useGlobal) {
|
|
|
1063
1202
|
const localHints = loadHintsFile(getHintsPath(false));
|
|
1064
1203
|
mergeHints(combined, globalHints);
|
|
1065
1204
|
mergeHints(combined, localHints);
|
|
1205
|
+
mergeHints(combined, loadHintsFile(getOldHintsPath(true)));
|
|
1206
|
+
mergeHints(combined, loadHintsFile(getOldHintsPath(false)));
|
|
1066
1207
|
mergeHints(combined, loadHintsFile(getLegacyHintsPath(true)));
|
|
1067
1208
|
mergeHints(combined, loadHintsFile(getLegacyHintsPath(false)));
|
|
1068
1209
|
if (useGlobal) {
|
|
@@ -1101,6 +1242,49 @@ function filterCliHints(hints, query, limit) {
|
|
|
1101
1242
|
const sorted = [...hints].sort((a, b) => a.cli.localeCompare(b.cli));
|
|
1102
1243
|
return sorted.slice(0, limit);
|
|
1103
1244
|
}
|
|
1245
|
+
function highlightMatches(text, query) {
|
|
1246
|
+
const trimmed = query.trim();
|
|
1247
|
+
if (!trimmed)
|
|
1248
|
+
return text;
|
|
1249
|
+
const tokens = trimmed
|
|
1250
|
+
.toLowerCase()
|
|
1251
|
+
.split(/[^a-z0-9]+/g)
|
|
1252
|
+
.map((t) => t.trim())
|
|
1253
|
+
.filter(Boolean);
|
|
1254
|
+
if (tokens.length === 0)
|
|
1255
|
+
return text;
|
|
1256
|
+
const lower = text.toLowerCase();
|
|
1257
|
+
const ranges = [];
|
|
1258
|
+
for (const token of tokens) {
|
|
1259
|
+
let idx = lower.indexOf(token);
|
|
1260
|
+
while (idx !== -1) {
|
|
1261
|
+
ranges.push([idx, idx + token.length]);
|
|
1262
|
+
idx = lower.indexOf(token, idx + 1);
|
|
1263
|
+
}
|
|
1264
|
+
}
|
|
1265
|
+
if (ranges.length === 0)
|
|
1266
|
+
return text;
|
|
1267
|
+
ranges.sort((a, b) => a[0] - b[0] || a[1] - b[1]);
|
|
1268
|
+
const merged = [];
|
|
1269
|
+
for (const [start, end] of ranges) {
|
|
1270
|
+
const last = merged[merged.length - 1];
|
|
1271
|
+
if (!last || start > last[1]) {
|
|
1272
|
+
merged.push([start, end]);
|
|
1273
|
+
}
|
|
1274
|
+
else {
|
|
1275
|
+
last[1] = Math.max(last[1], end);
|
|
1276
|
+
}
|
|
1277
|
+
}
|
|
1278
|
+
let out = '';
|
|
1279
|
+
let cursor = 0;
|
|
1280
|
+
for (const [start, end] of merged) {
|
|
1281
|
+
out += text.slice(cursor, start);
|
|
1282
|
+
out += green(text.slice(start, end));
|
|
1283
|
+
cursor = end;
|
|
1284
|
+
}
|
|
1285
|
+
out += text.slice(cursor);
|
|
1286
|
+
return out;
|
|
1287
|
+
}
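The range-merging logic in `highlightMatches` can be checked with a minimal sketch. To keep the output readable without a terminal, it wraps matches in square brackets instead of calling `green`; the `mark` parameter is a hypothetical addition for that purpose.

```javascript
// Sketch of token highlighting with overlapping or adjacent ranges merged.
function highlight(text, query, mark = (s) => `[${s}]`) {
  const tokens = query.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
  if (tokens.length === 0) return text;
  const lower = text.toLowerCase();
  const ranges = [];
  for (const token of tokens) {
    let idx = lower.indexOf(token);
    while (idx !== -1) {
      ranges.push([idx, idx + token.length]);
      idx = lower.indexOf(token, idx + 1); // step by one so overlapping hits are found
    }
  }
  ranges.sort((a, b) => a[0] - b[0] || a[1] - b[1]);
  const merged = [];
  for (const [start, end] of ranges) {
    const last = merged[merged.length - 1];
    if (!last || start > last[1]) merged.push([start, end]);
    else last[1] = Math.max(last[1], end); // overlapping or touching: extend the previous range
  }
  let out = '';
  let cursor = 0;
  for (const [start, end] of merged) {
    out += text.slice(cursor, start) + mark(text.slice(start, end));
    cursor = end;
  }
  return out + text.slice(cursor);
}

console.log(highlight('microsoft/playwright-cli', 'play'));
```

Merging before rendering avoids nested or doubled markers when two tokens hit overlapping parts of the same id.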
|
|
1104
1288
|
async function promptSelection(prompt, options) {
|
|
1105
1289
|
if (!process.stdin.isTTY) {
|
|
1106
1290
|
return options.length > 0 ? options[0] : null;
|
|
@@ -1124,7 +1308,7 @@ async function promptSelection(prompt, options) {
|
|
|
1124
1308
|
rl.close();
|
|
1125
1309
|
}
|
|
1126
1310
|
}
|
|
1127
|
-
async function
|
|
1311
|
+
async function promptCommandLineBase(promptText) {
|
|
1128
1312
|
if (!process.stdin.isTTY) {
|
|
1129
1313
|
return null;
|
|
1130
1314
|
}
|
|
@@ -1138,7 +1322,33 @@ async function promptCommandLine(promptText) {
|
|
|
1138
1322
|
rl.close();
|
|
1139
1323
|
}
|
|
1140
1324
|
}
|
|
1141
|
-
|
|
1325
|
+
import readline from 'readline';
|
|
1326
|
+
async function promptCommandLine(promptText, defaultValue) {
|
|
1327
|
+
if (!process.stdin.isTTY)
|
|
1328
|
+
return null;
|
|
1329
|
+
const rl = readline.createInterface({
|
|
1330
|
+
input: process.stdin,
|
|
1331
|
+
output: process.stdout,
|
|
1332
|
+
});
|
|
1333
|
+
try {
|
|
1334
|
+
return await new Promise((resolve) => {
|
|
1335
|
+
rl.question(promptText, (answer) => {
|
|
1336
|
+
rl.close();
|
|
1337
|
+
const trimmed = answer.trim();
|
|
1338
|
+
resolve(trimmed || defaultValue || null);
|
|
1339
|
+
});
|
|
1340
|
+
// Pre-fill default value and move cursor to end
|
|
1341
|
+
if (defaultValue) {
|
|
1342
|
+
rl.write(defaultValue);
|
|
1343
|
+
}
|
|
1344
|
+
});
|
|
1345
|
+
}
|
|
1346
|
+
finally {
|
|
1347
|
+
// just in case
|
|
1348
|
+
rl.close();
|
|
1349
|
+
}
|
|
1350
|
+
}
|
|
1351
|
+
async function selectSkillId(hints, input, limit = 5, trie) {
|
|
1142
1352
|
const ids = Object.keys(hints);
|
|
1143
1353
|
if (ids.length === 0) {
|
|
1144
1354
|
return null;
|
|
@@ -1146,9 +1356,9 @@ async function selectSkillId(hints, input, limit = 5) {
|
|
|
1146
1356
|
if (input && hints[input]) {
|
|
1147
1357
|
return input;
|
|
1148
1358
|
}
|
|
1149
|
-
const
|
|
1359
|
+
const activeTrie = trie || buildIdTrie(hints);
|
|
1150
1360
|
const prefix = input || '';
|
|
1151
|
-
let suggestions = searchTrie(
|
|
1361
|
+
let suggestions = searchTrie(activeTrie, prefix, limit);
|
|
1152
1362
|
if (suggestions.length === 0 && prefix) {
|
|
1153
1363
|
const scored = ids
|
|
1154
1364
|
.map((id) => ({ id, score: fuzzyScore(prefix, id) }))
|
|
@@ -1159,11 +1369,21 @@ async function selectSkillId(hints, input, limit = 5) {
|
|
|
1159
1369
|
if (suggestions.length === 0) {
|
|
1160
1370
|
return null;
|
|
1161
1371
|
}
|
|
1162
|
-
|
|
1372
|
+
let printedLines = 0;
|
|
1373
|
+
const trackedLog = (message = '') => {
|
|
1374
|
+
console.log(message);
|
|
1375
|
+
printedLines += countConsoleLogLines(message);
|
|
1376
|
+
};
|
|
1377
|
+
trackedLog('');
|
|
1378
|
+
trackedLog('Skill ID suggestions:');
|
|
1163
1379
|
suggestions.forEach((value, index) => {
|
|
1164
|
-
|
|
1380
|
+
trackedLog(` ${index + 1}. ${highlightMatches(value, prefix)}`);
|
|
1165
1381
|
});
|
|
1166
|
-
const selected = await promptSelection('
|
|
1382
|
+
const selected = await promptSelection('Select skill id (number or id): ', suggestions);
|
|
1383
|
+
printedLines += 1; // prompt line
|
|
1384
|
+
if (process.stdin.isTTY && process.stdout.isTTY) {
|
|
1385
|
+
clearLastLines(printedLines + 1); // +1 for the post-input newline line
|
|
1386
|
+
}
|
|
1167
1387
|
console.log(`Selected Skill/Cli is ${selected}`);
|
|
1168
1388
|
if (!selected) {
|
|
1169
1389
|
return null;
|
|
@@ -1186,13 +1406,23 @@ async function selectCliHint(hints, query, limit = 5) {
|
|
|
1186
1406
|
if (suggestions.length === 0) {
|
|
1187
1407
|
return null;
|
|
1188
1408
|
}
|
|
1189
|
-
|
|
1409
|
+
let printedLines = 0;
|
|
1410
|
+
const trackedLog = (message = '') => {
|
|
1411
|
+
console.log(message);
|
|
1412
|
+
printedLines += countConsoleLogLines(message);
|
|
1413
|
+
};
|
|
1414
|
+
trackedLog('');
|
|
1415
|
+
trackedLog('Command hints:');
|
|
1190
1416
|
suggestions.forEach((item, index) => {
|
|
1191
1417
|
const hintText = item.hint ? ` # ${item.hint}` : '';
|
|
1192
|
-
|
|
1418
|
+
trackedLog(` ${index + 1}. ${highlightMatches(item.cli, query || '')}${hintText}`);
|
|
1193
1419
|
});
|
|
1194
1420
|
const options = suggestions.map((item) => item.cli);
|
|
1195
|
-
const selected = await promptSelection('
|
|
1421
|
+
const selected = await promptSelection('Select command (number or input custom): ', options);
|
|
1422
|
+
printedLines += 1; // prompt line
|
|
1423
|
+
if (process.stdin.isTTY && process.stdout.isTTY) {
|
|
1424
|
+
clearLastLines(printedLines + 1); // +1 for the post-input newline line
|
|
1425
|
+
}
|
|
1196
1426
|
if (!selected) {
|
|
1197
1427
|
return null;
|
|
1198
1428
|
}
|
|
@@ -1212,16 +1442,22 @@ async function handleSetup(options) {
|
|
|
1212
1442
|
if (options.hint) {
|
|
1213
1443
|
const bundled = await loadBundledHints();
|
|
1214
1444
|
const targetPath = getHintsPath(useGlobal);
|
|
1445
|
+
const targetTriePath = getHintsTriePath(useGlobal);
|
|
1215
1446
|
const legacyPath = getLegacyHintsPath(useGlobal);
|
|
1216
1447
|
const existing = loadHintsFile(targetPath);
|
|
1448
|
+
const existingOld = loadHintsFile(getOldHintsPath(useGlobal));
|
|
1217
1449
|
const merged = {};
|
|
1218
1450
|
mergeHints(merged, bundled);
|
|
1451
|
+
mergeHints(merged, existingOld);
|
|
1219
1452
|
mergeHints(merged, existing);
|
|
1220
1453
|
writeHintsFile(targetPath, merged);
|
|
1454
|
+
const trieSource = loadCombinedHints(useGlobal);
|
|
1455
|
+
writeHintsTrieFile(targetTriePath, buildIdTrie(trieSource));
|
|
1221
1456
|
if (fs.existsSync(path.dirname(legacyPath))) {
|
|
1222
1457
|
writeHintsFile(legacyPath, merged);
|
|
1223
1458
|
}
|
|
1224
1459
|
console.log(`\n✅ Hints cache updated at ${targetPath}`);
|
|
1460
|
+
console.log(`✅ Hints trie updated at ${targetTriePath}`);
|
|
1225
1461
|
}
|
|
1226
1462
|
if (options['levels']) {
|
|
1227
1463
|
const bundledLevelsDir = findBundledLevelsDir();
|
|
@@ -1235,6 +1471,26 @@ async function handleSetup(options) {
|
|
|
1235
1471
|
console.log(`\n✅ Levels copied to ${targetDir}`);
|
|
1236
1472
|
}
|
|
1237
1473
|
}
|
|
1474
|
+
function clearScreen() {
|
|
1475
|
+
// process.stdout.write('\x1Bc');
|
|
1476
|
+
process.stdout.write('\x1b[0f');
|
|
1477
|
+
}
|
|
1478
|
+
function clearLastLines(n) {
|
|
1479
|
+
if (!process.stdout.isTTY)
|
|
1480
|
+
return;
|
|
1481
|
+
for (let i = 0; i < n; i++) {
|
|
1482
|
+
process.stdout.write('\x1b[2K'); // clear current line
|
|
1483
|
+
if (i < n - 1) {
|
|
1484
|
+
process.stdout.write('\x1b[1A'); // move cursor up
|
|
1485
|
+
}
|
|
1486
|
+
}
|
|
1487
|
+
process.stdout.write('\x1b[0G'); // move to start of line
|
|
1488
|
+
}
|
|
1489
|
+
function countConsoleLogLines(message) {
|
|
1490
|
+
if (message === '')
|
|
1491
|
+
return 1;
|
|
1492
|
+
return message.split('\n').length;
|
|
1493
|
+
}
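The line bookkeeping behind `clearLastLines` and `countConsoleLogLines` can be sketched without touching the terminal. `clearSequence` is a hypothetical variant that returns the escape string instead of writing it to `process.stdout`, which makes the sequence easy to inspect.

```javascript
// Number of terminal lines a console.log(message) call occupies,
// ignoring soft wrapping of long lines.
function countLines(message) {
  return message === '' ? 1 : message.split('\n').length;
}

// Build the escape sequence that erases the last n lines:
// ESC[2K clears the current line, ESC[1A moves the cursor up one line,
// ESC[0G returns the cursor to the start of the line.
function clearSequence(n) {
  let seq = '';
  for (let i = 0; i < n; i++) {
    seq += '\x1b[2K';
    if (i < n - 1) seq += '\x1b[1A';
  }
  return seq + '\x1b[0G';
}

const printed =
  countLines('') +
  countLines('Skill ID suggestions:') +
  countLines('  1. plan\n  2. play');
console.log(printed, JSON.stringify(clearSequence(2)));
```

One limitation of this approach is that lines wider than the terminal wrap onto extra rows, which a count based on `\n` does not see.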
|
|
1238
1494
|
async function handleRun(idArg, commandArgs = [], options = {}) {
|
|
1239
1495
|
const isAgent = (options.mode || 'human').toLowerCase() === MODE_AGENT;
|
|
1240
1496
|
// first load local hints
|
|
@@ -1262,6 +1518,7 @@ async function handleRun(idArg, commandArgs = [], options = {}) {
|
|
|
1262
1518
|
runtimeHints = await loadBundledHints();
|
|
1263
1519
|
}
|
|
1264
1520
|
const activeHints = hasHints ? hints : (runtimeHints || {});
|
|
1521
|
+
const cachedIdTrie = hasHints ? loadHintsTrieFile(getHintsTriePath(false)) : null;
|
|
1265
1522
|
const ids = Object.keys(activeHints);
|
|
1266
1523
|
if (ids.length === 0) {
|
|
1267
1524
|
console.error('\n❌ Error: No hints available.');
|
|
@@ -1269,7 +1526,7 @@ async function handleRun(idArg, commandArgs = [], options = {}) {
|
|
|
1269
1526
|
}
|
|
1270
1527
|
if (!idArg || !activeHints[idArg]) {
|
|
1271
1528
|
const query = idArg || '';
|
|
1272
|
-
const trie = buildIdTrie(activeHints);
|
|
1529
|
+
const trie = cachedIdTrie || buildIdTrie(activeHints);
|
|
1273
1530
|
let suggestions = searchTrie(trie, query, 2);
|
|
1274
1531
|
if (suggestions.length === 0 && query) {
|
|
1275
1532
|
const scored = ids
|
|
@@ -1280,7 +1537,7 @@ async function handleRun(idArg, commandArgs = [], options = {}) {
|
|
|
1280
1537
|
}
|
|
1281
1538
|
console.log('\nSkill ID suggestions:');
|
|
1282
1539
|
suggestions.forEach((value, index) => {
|
|
1283
|
-
console.log(` ${index + 1}. ${value}`);
|
|
1540
|
+
console.log(` ${index + 1}. ${highlightMatches(value, query)}`);
|
|
1284
1541
|
const entry = activeHints[value];
|
|
1285
1542
|
if (entry?.hints?.length) {
|
|
1286
1543
|
const preview = entry.hints.slice(0, 2).map((h) => `${h.cli}${h.hint ? ` # ${h.hint}` : ''}`);
|
|
@@ -1343,24 +1600,27 @@ async function handleRun(idArg, commandArgs = [], options = {}) {
|
|
|
1343
1600
|
if (LOG_ENABLE) {
|
|
1344
1601
|
console.log(`DEBUG: Entering Human Mode | idArg ${idArg} | commandArgs ${commandArgs} | options ${options} | hasHints ${hasHints} | hints ${hints}`);
|
|
1345
1602
|
}
|
|
1603
|
+
const cachedIdTrie = hasHints ? loadHintsTrieFile(getHintsTriePath(false)) : null;
|
|
1346
1604
|
// human mode with pause for cli input
|
|
1347
1605
|
if (!idArg) {
|
|
1348
1606
|
if (!hasHints) {
|
|
1349
1607
|
console.error('\n❌ Error: No hints cache found. Run `agtm setup --hint` first.');
|
|
1350
1608
|
process.exit(1);
|
|
1351
1609
|
}
|
|
1352
|
-
const selected = await selectSkillId(hints);
|
|
1610
|
+
const selected = await selectSkillId(hints, undefined, 5, cachedIdTrie);
|
|
1353
1611
|
if (!selected) {
|
|
1354
1612
|
console.error('\n❌ Error: No skill id selected.');
|
|
1355
1613
|
process.exit(1);
|
|
1356
1614
|
}
|
|
1357
1615
|
idArg = selected;
|
|
1616
|
+
// clearScreen();
|
|
1358
1617
|
}
|
|
1359
1618
|
else if (hasHints && !hints[idArg]) {
|
|
1360
|
-
const selected = await selectSkillId(hints, idArg);
|
|
1619
|
+
const selected = await selectSkillId(hints, idArg, 5, cachedIdTrie);
|
|
1361
1620
|
if (selected) {
|
|
1362
1621
|
idArg = selected;
|
|
1363
1622
|
}
|
|
1623
|
+
// clearScreen();
|
|
1364
1624
|
}
|
|
1365
1625
|
let finalCommandArgs = commandArgs;
|
|
1366
1626
|
if (hasHints && idArg && finalCommandArgs.length > 0 && hints[finalCommandArgs[0]]) {
|
|
@@ -1383,7 +1643,7 @@ async function handleRun(idArg, commandArgs = [], options = {}) {
|
|
|
1383
1643
|
if (!finalCommandArgs || finalCommandArgs.length === 0) {
|
|
1384
1644
|
let chosen = null;
|
|
1385
1645
|
if (hintEntry?.hints && hintEntry.hints.length > 0) {
|
|
1386
|
-
const query = await promptCommandLine(
|
|
1646
|
+
const query = await promptCommandLine(`\nEnter command to run (leave empty to list cli hints): `, ``);
|
|
1387
1647
|
const searchQuery = query || '';
|
|
1388
1648
|
chosen = await selectCliHint(hintEntry.hints, searchQuery);
|
|
1389
1649
|
}
|
|
@@ -1391,7 +1651,7 @@ async function handleRun(idArg, commandArgs = [], options = {}) {
|
|
|
1391
1651
|
finalCommandArgs = chosen.cli.split(/\s+/).filter(Boolean);
|
|
1392
1652
|
}
|
|
1393
1653
|
else {
|
|
1394
|
-
const manual = await promptCommandLine('\nEnter command to run: ');
|
|
1654
|
+
const manual = await promptCommandLine('\nEnter command line to run: ', ``);
|
|
1395
1655
|
if (!manual) {
|
|
1396
1656
|
console.error('\n❌ Error: No command selected.');
|
|
1397
1657
|
process.exit(1);
|
|
@@ -1411,7 +1671,8 @@ async function handleRun(idArg, commandArgs = [], options = {}) {
|
|
|
1411
1671
|
process.exit(1);
|
|
1412
1672
|
}
|
|
1413
1673
|
const finalCommandLine = finalCommandArgs.join(' ');
|
|
1414
|
-
|
|
1674
|
+
console.log("\nComplete the Cli with your arguments or leave blank and press Enter");
|
|
1675
|
+
const edited = await promptCommandLine(`\nFinal command line [${finalCommandLine}]:\n`, `${finalCommandLine}`);
|
|
1415
1676
|
if (edited && edited.trim()) {
|
|
1416
1677
|
finalCommandArgs = edited.split(/\s+/).filter(Boolean);
|
|
1417
1678
|
}
|
package/docs/skills/README.md
CHANGED
|
@@ -82,6 +82,115 @@ To use the rate command, have to setup the benchmark levels configuration. save
|
|
|
82
82
|
agtm setup --levels
|
|
83
83
|
```
|
|
84
84
|
|
|
85
|
+
#### Description
|
|
86
|
+
This skill runs the `agent rate` command line to evaluate
|
|
87
|
+
|
|
88
|
+
The Agtm Skills CLI manages local skill bundles for supported agents (for example `claude-code`, `codex`, `openclaw`). It can download skills from GitHub, install them into the correct agent folders, list what is installed, record run logs, and apply rating benchmarks.
|
|
89
|
+
|
|
90
|
+
It also serves as a benchmarking tool to evaluate skill outputs:
|
|
91
|
+
**Benchmark** your AI agent against real-world standards — from Google-level engineering to Apple-caliber product launches.
|
|
92
|
+
**Rate** performance of each run with structured scores and levels, helping agents like Claude Code choose the right skills more effectively.
|
|
93
|
+
|
|
94
|
+
|
|
95
|
+
#### Usage
|
|
96
|
+
|
|
97
|
+
|
|
98
|
+
Each time your agent runs a skill, it runs the follow-up skill `agent-skills-evaluator` to track
|
|
99
|
+
the log of this run, with input and output summarized, and keeps it in a log-file-based memory.
|
|
100
|
+
Then it calls `agtm skills log`, `agtm skills rate`, and `agtm skills rate show`.
|
|
101
|
+
|
|
102
|
+
`agtm skills log`: keep track of skills running in a local cache json log file
|
|
103
|
+
`agtm skills rate prepare`: Fetch the evaluator and benchmarks.json, load the criteria of evaluation, such as job levels, task fulfillment.
|
|
104
|
+
`agtm skills rate apply`: Write the scores and levels returned by the LLM-based evaluator into the local log entries.
|
|
105
|
+
`agtm skills rate show`: Show the table of historical scores, level ratings.
|
|
106
|
+
|
|
107
|
+
```
|
|
108
|
+
agtm skills log <skill_id> --data '<json_payload>'
|
|
109
|
+
agtm skills rate prepare --skill_id <skill_id> --prompt "<eval_prompt>" --benchmark <path/benchmark.json>
|
|
110
|
+
agtm skills rate apply --skill_id <skill_id> --result '<result_json: log_id>'
|
|
111
|
+
agtm skills rate show --skill_id <skill_id>
|
|
112
|
+
```
|
|
113
|
+
|
|
114
|
+
#### Example
|
|
115
|
+
Note: `code_success_skills` is a dummy skill that always produces success results, and `code_fail_skills` is a dummy skill that always produces failure results.
|
|
116
|
+
|
|
117
|
+
|
|
118
|
+
```shell
|
|
119
|
+
## log command will output a log_id
|
|
120
|
+
agtm skills log code_success_skills --data '{"input":"generate sql","output":"ok","meta":{"agent":"claude-code"}}'
|
|
121
|
+
agtm skills rate prepare --skill_id code_success_skills --prompt "Evaluate the code execution results"
|
|
122
|
+
agtm skills rate apply --skill_id code_success_skills --result '{"results":[{"log_id":"3679a3fe-4d97-4eb1-83bc-f83d711be195","rating":0.90,"level":"L4"}]}'
|
|
123
|
+
agtm skills rate show ## show the historical skills dashboard, including scores and evaluation levels
|
|
124
|
+
```
|
|
125
|
+
|
|
126
|
+
Note:
|
|
127
|
+
- `agtm skills log` persists a run record at `.agtm/skills/log/<uuid>.json` (or in the `--logDir` you supply).
|
|
128
|
+
- `<json_payload>` must contain at least `input` and `output`; optional fields (meta, rating, level) are accepted.
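The payload rule above can be checked before the CLI is invoked. The following is a minimal sketch, not part of `agtm` itself: the helper name `build_log_command` and the `REQUIRED_FIELDS` constant are hypothetical, and the only facts taken from the README are the `agtm skills log <skill_id> --data '<json_payload>'` form and the requirement that `input` and `output` be present.

```python
import json
import shlex

# Required keys, per the note above; everything else is treated as optional.
REQUIRED_FIELDS = ("input", "output")


def build_log_command(skill_id: str, payload: dict) -> str:
    """Return an `agtm skills log` command line for the given payload.

    Hypothetical helper: it only builds the string, it does not run the CLI.
    """
    missing = [field for field in REQUIRED_FIELDS if field not in payload]
    if missing:
        raise ValueError(f"payload is missing required fields: {missing}")
    # Compact separators keep the JSON on one line, as in the README examples.
    data = json.dumps(payload, separators=(",", ":"))
    return f"agtm skills log {shlex.quote(skill_id)} --data {shlex.quote(data)}"


cmd = build_log_command(
    "code_success_skills",
    {"input": "generate sql", "output": "ok", "meta": {"agent": "claude-code"}},
)
print(cmd)
```

Quoting with `shlex.quote` keeps the JSON intact when the command is pasted into a POSIX shell; other shells may need different quoting.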
|
|
129
|
+
|
|
130
|
+
|
|
131
|
+
#### Pipeline
|
|
132
|
+
|
|
133
|
+
**Step 1. Add log to memory**
|
|
134
|
+
```
|
|
135
|
+
agtm skills log code_success_skills --data '{"input":"generate sql","output":"ok","meta":{"agent":"claude-code"}}'
|
|
136
|
+
agtm skills log code_fail_skills --data '{"input":"generate sql","output":"failure","meta":{"agent":"claude-code"}}'
|
|
137
|
+
|
|
138
|
+
```
|
|
139
|
+
|
|
140
|
+
This generates a `{log_id}.json` file that serves as memory:
|
|
141
|
+
```shell
|
|
142
|
+
✅ Saved log to .agtm/skills/log/96c216f1-edc5-40f3-b041-b01a68b137a1.json
|
|
143
|
+
```
|
|
144
|
+
|
|
145
|
+
**Step 2. Prepare the evaluation prompt**
|
|
146
|
+
|
|
147
|
+
Prepare the (<input, output>, benchmark) pairs so that an LLM can compare each <input, output> with the benchmark.
|
|
148
|
+
|
|
149
|
+
```shell
|
|
150
|
+
agtm skills rate prepare --skill_id code_success_skills --prompt "Evaluate the code execution results"
|
|
151
|
+
|
|
152
|
+
agtm skills rate prepare --skill_id code_fail_skills --prompt "Evaluate the code execution results"
|
|
153
|
+
```
|
|
154
|
+
|
|
155
|
+
```shell
|
|
156
|
+
{"skill_id":"code_success_skills","benchmarks":[{"software-engineering":{"Google":[{"level":"L3","title":"Software Engineer II","description":"Entry-level engineer. Delivers well-scoped tasks with guidance. Learning codebase, tools, and best practices.","signals":["task execution","learning velocity","code quality basics"]},{"level":"L4","title":"Software Engineer III","description":"Independent contributor. Owns small features end-to-end. Writes maintainable code and participates in design discussions.","signals":["ownership","code quality","debugging ability"]},{"level":"L5","title":"Senior Software Engineer","description":"Leads projects and drives design decisions. Mentors others and improves system quality.","signals":["technical leadership","system design","mentorship"]},{"level":"L6","title":"Staff Software Engineer","description":"Owns large systems or cross-team initiatives. Sets technical direction and influences multiple teams.","signals":["architecture","cross-team impact","scalability thinking"]},{"level":"L7","title":"Senior Staff Software Engineer","description":"Drives org-level technical strategy. Solves ambiguous, high-impact problems.","signals":["org influence","complex problem solving","long-term vision"]},{"level":"L8","title":"Principal Engineer","description":"Company-wide impact. Defines technical standards and long-term architecture.","signals":["company impact","vision","industry-level thinking"]}]}}],"logs":[{"log_id":"1db0e927-79f1-46c2-b6dd-200d567f631d","input":"generate sql","output":"ok"},{"log_id":"94a2fae9-80ff-4b18-a77a-5714d34bcc20","input":"generate sql","output":"ok"},{"log_id":"96c216f1-edc5-40f3-b041-b01a68b137a1","input":"generate sql","output":"ok"},{"log_id":"b1f76f33-6f45-41e3-ae14-6b598f6aa357","input":"generate sql","output":"ok"}],"instructions":"System Prompt: You are an evaluator of skill performance. Score each example from 0.0 to 1.0 and assign a level based on benchmarks. Return JSON only. 
Please output json in format of {\"skill_id\": <skill_id>, \"results\": [{\"log_id\": \"<log_id_1>\", \"score\": 1.0, \"level\": \"L3\", **extra},{\"log_id\": \"<log_id_2>\", \"score\": 1.0, \"level\": \"L3\", **extra}]}\nUser prompt: Evaluate the code execution results"}
|
|
157
|
+
|
|
158
|
+
{"skill_id":"code_fail_skills","benchmarks":[{"software-engineering":{"Google":[{"level":"L3","title":"Software Engineer II","description":"Entry-level engineer. Delivers well-scoped tasks with guidance. Learning codebase, tools, and best practices.","signals":["task execution","learning velocity","code quality basics"]},{"level":"L4","title":"Software Engineer III","description":"Independent contributor. Owns small features end-to-end. Writes maintainable code and participates in design discussions.","signals":["ownership","code quality","debugging ability"]},{"level":"L5","title":"Senior Software Engineer","description":"Leads projects and drives design decisions. Mentors others and improves system quality.","signals":["technical leadership","system design","mentorship"]},{"level":"L6","title":"Staff Software Engineer","description":"Owns large systems or cross-team initiatives. Sets technical direction and influences multiple teams.","signals":["architecture","cross-team impact","scalability thinking"]},{"level":"L7","title":"Senior Staff Software Engineer","description":"Drives org-level technical strategy. Solves ambiguous, high-impact problems.","signals":["org influence","complex problem solving","long-term vision"]},{"level":"L8","title":"Principal Engineer","description":"Company-wide impact. Defines technical standards and long-term architecture.","signals":["company impact","vision","industry-level thinking"]}]}}],"logs":[{"log_id":"2e5513e7-27ae-4636-9d21-4b57ec9f739b","input":"generate sql","output":"failure"},{"log_id":"563747fb-ea62-4ebc-80c4-1bc1d1c82ed5","input":"generate sql","output":"failure"},{"log_id":"db699754-b1fd-491c-a49f-2af1a41ad1f7","input":"generate sql","output":"failure"}],"instructions":"System Prompt: You are an evaluator of skill performance. Score each example from 0.0 to 1.0 and assign a level based on benchmarks. Return JSON only. 
Please output json in format of {\"skill_id\": <skill_id>, \"results\": [{\"log_id\": \"<log_id_1>\", \"score\": 1.0, \"level\": \"L3\", **extra},{\"log_id\": \"<log_id_2>\", \"score\": 1.0, \"level\": \"L3\", **extra}]}\nUser prompt: Evaluate the code execution results"}
|
|
159
|
+
```
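An agent consuming the `prepare` output usually needs two things from it: the levels defined by the benchmark and the `log_id` values awaiting a score. The sketch below assumes the nesting shown in the output above (`benchmarks` → domain → company → list of level entries). The helper names are hypothetical, and the `prepared` sample is a trimmed copy of the `code_fail_skills` output with descriptions and higher levels omitted.

```python
def collect_levels(prepared: dict) -> list:
    """Collect every benchmark level name found in a `rate prepare` payload."""
    levels = []
    for group in prepared.get("benchmarks", []):
        for companies in group.values():       # e.g. "software-engineering"
            for ladder in companies.values():  # e.g. "Google"
                levels.extend(entry["level"] for entry in ladder)
    return levels


def collect_log_ids(prepared: dict) -> list:
    """Collect the log_ids that the evaluator is expected to score."""
    return [log["log_id"] for log in prepared.get("logs", [])]


# Trimmed sample: only two levels and the fields used here are kept.
prepared = {
    "skill_id": "code_fail_skills",
    "benchmarks": [
        {"software-engineering": {"Google": [
            {"level": "L3", "title": "Software Engineer II"},
            {"level": "L4", "title": "Software Engineer III"},
        ]}}
    ],
    "logs": [
        {"log_id": "2e5513e7-27ae-4636-9d21-4b57ec9f739b", "input": "generate sql", "output": "failure"},
        {"log_id": "563747fb-ea62-4ebc-80c4-1bc1d1c82ed5", "input": "generate sql", "output": "failure"},
        {"log_id": "db699754-b1fd-491c-a49f-2af1a41ad1f7", "input": "generate sql", "output": "failure"},
    ],
}

print(collect_levels(prepared))
print(collect_log_ids(prepared))
```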
|
|
160
|
+
|
|
161
|
+
**Step 3. The local agent runs the evaluation prompt from Step 2**
|
|
162
|
+
|
|
163
|
+
Your agent assigns `{"score": double, "level": str}` to each `log_id`:
|
|
164
|
+
```
|
|
165
|
+
{"skill_id":"code_success_skills","results":[{"log_id":"1db0e927-79f1-46c2-b6dd-200d567f631d","score":1,"level":"L3","rationale":"Successfully executed a well-scoped task generate sql. Matches entry-level performance criteria for task execution."},{"log_id":"94a2fae9-80ff-4b18-a77a-5714d34bcc20","score":1,"level":"L3","rationale":"Successfully executed a well-scoped task generate sql. Matches entry-level performance criteria for task execution."},{"log_id":"96c216f1-edc5-40f3-b041-b01a68b137a1","score":1,"level":"L3","rationale":"Successfully executed a well-scoped task generate sql. Matches entry-level performance criteria for task execution."},{"log_id":"b1f76f33-6f45-41e3-ae14-6b598f6aa357","score":1,"level":"L3","rationale":"Successfully executed a well-scoped task generate sql. Matches entry-level performance criteria for task execution."}]}
|
|
166
|
+
|
|
167
|
+
{"skill_id":"code_fail_skills","results":[{"log_id":"2e5513e7-27ae-4636-9d21-4b57ec9f739b","score":0,"level":"L3"},{"log_id":"563747fb-ea62-4ebc-80c4-1bc1d1c82ed5","score":0,"level":"L3"},{"log_id":"db699754-b1fd-491c-a49f-2af1a41ad1f7","score":0,"level":"L3"}]}
|
|
168
|
+
```
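Because the evaluator's JSON is produced by an LLM, it may be worth checking it before passing it to `agtm skills rate apply`. The sketch below is one possible pre-check, not something the CLI is known to perform. `validate_results` is a hypothetical helper, and it assumes the `score` key and 0.0 to 1.0 range stated in the evaluator instructions above.

```python
def validate_results(result: dict, expected_log_ids: set, allowed_levels: set) -> list:
    """Return a list of problems; an empty list means the result looks usable."""
    problems = []
    seen = set()
    for item in result.get("results", []):
        log_id = item.get("log_id")
        seen.add(log_id)
        if log_id not in expected_log_ids:
            problems.append(f"unknown log_id: {log_id}")
        score = item.get("score")
        # bool is a subclass of int in Python, so reject it explicitly.
        is_number = isinstance(score, (int, float)) and not isinstance(score, bool)
        if not is_number or not 0.0 <= score <= 1.0:
            problems.append(f"{log_id}: score must be a number in [0.0, 1.0]")
        if item.get("level") not in allowed_levels:
            problems.append(f"{log_id}: unexpected level {item.get('level')!r}")
    for missing in sorted(expected_log_ids - seen):
        problems.append(f"missing result for log_id: {missing}")
    return problems


# The code_fail_skills result from Step 3.
fail_result = {
    "skill_id": "code_fail_skills",
    "results": [
        {"log_id": "2e5513e7-27ae-4636-9d21-4b57ec9f739b", "score": 0, "level": "L3"},
        {"log_id": "563747fb-ea62-4ebc-80c4-1bc1d1c82ed5", "score": 0, "level": "L3"},
        {"log_id": "db699754-b1fd-491c-a49f-2af1a41ad1f7", "score": 0, "level": "L3"},
    ],
}
expected = {item["log_id"] for item in fail_result["results"]}
levels = {"L3", "L4", "L5", "L6", "L7", "L8"}

print(validate_results(fail_result, expected, levels))
```

An empty list here means every logged run received exactly one in-range score and a level that exists in the benchmark.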
|
|
169
|
+
|
|
170
|
+
**Step 4. Apply Results to Local Log Status**
|
|
171
|
+
|
|
172
|
+
```shell
|
|
173
|
+
agtm skills rate apply --skill_id code_success_skills --result '{"skill_id":"code_success_skills","results":[{"log_id":"1db0e927-79f1-46c2-b6dd-200d567f631d","score":1,"level":"L3","rationale":"Successfully executed a well-scoped task generate sql. Matches entry-level performance criteria for task execution."},{"log_id":"94a2fae9-80ff-4b18-a77a-5714d34bcc20","score":1,"level":"L3","rationale":"Successfully executed a well-scoped task generate sql. Matches entry-level performance criteria for task execution."},{"log_id":"96c216f1-edc5-40f3-b041-b01a68b137a1","score":1,"level":"L3","rationale":"Successfully executed a well-scoped task generate sql. Matches entry-level performance criteria for task execution."},{"log_id":"b1f76f33-6f45-41e3-ae14-6b598f6aa357","score":1,"level":"L3","rationale":"Successfully executed a well-scoped task generate sql. Matches entry-level performance criteria for task execution."}]}'
|
|
174
|
+
|
|
175
|
+
agtm skills rate apply --skill_id code_fail_skills --result '{"skill_id":"code_fail_skills","results":[{"log_id":"2e5513e7-27ae-4636-9d21-4b57ec9f739b","score":0,"level":"L3"},{"log_id":"563747fb-ea62-4ebc-80c4-1bc1d1c82ed5","score":0,"level":"L3"},{"log_id":"db699754-b1fd-491c-a49f-2af1a41ad1f7","score":0,"level":"L3"}]}'
|
|
176
|
+
```
|
|
177
|
+
|
|
178
|
+
**Step 5. Show final Result (Optional)**
|
|
179
|
+
```shell
|
|
180
|
+
agtm skills rate show
|
|
181
|
+
```
|
|
182
|
+
|
|
183
|
+
|
|
184
|
+
```shell
|
|
185
|
+
skill_id run_times score level
|
|
186
|
+
------------------- --------- ----- -----
|
|
187
|
+
code_fail_skills 3 0.00 L3
|
|
188
|
+
code_success_skills 4 1.00 L3
|
|
189
|
+
```
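The summary table can be reproduced from the per-run results. The sketch below is illustrative only: how `agtm skills rate show` actually aggregates is not documented here, so the mean score and most frequent level used below are assumptions, as is the helper name `summarize`.

```python
from collections import Counter


def summarize(results_by_skill: dict) -> list:
    """Build (skill_id, run_times, score, level) rows, sorted by skill_id.

    Assumption: score is the mean over runs and level is the most common one.
    """
    rows = []
    for skill_id in sorted(results_by_skill):
        results = results_by_skill[skill_id]
        mean_score = sum(r["score"] for r in results) / len(results)
        level = Counter(r["level"] for r in results).most_common(1)[0][0]
        rows.append((skill_id, len(results), f"{mean_score:.2f}", level))
    return rows


# Scores and levels from Step 3 (log_ids omitted for brevity).
results_by_skill = {
    "code_success_skills": [{"score": 1, "level": "L3"}] * 4,
    "code_fail_skills": [{"score": 0, "level": "L3"}] * 3,
}

rows = summarize(results_by_skill)
print(f"{'skill_id':<20} {'run_times':<9} {'score':<5} level")
for skill_id, run_times, score, level in rows:
    print(f"{skill_id:<20} {run_times:<9} {score:<5} {level}")
```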
|
|
190
|
+
|
|
191
|
+
|
|
192
|
+
#### CLI Documentation
|
|
193
|
+
|
|
85
194
|
#### Usage
|
|
86
195
|
```
|
|
87
196
|
agtm skills rate prepare --skill_id <skill_id> --prompt "<eval_prompt>" --benchmark <path/benchmark.json>
|
|
@@ -131,6 +240,7 @@ write your `customized_agent_benchmark.json` following the formats
|
|
|
131
240
|
}
|
|
132
241
|
```
|
|
133
242
|
|
|
243
|
+
|
|
134
244
|
## Supported Agents
|
|
135
245
|
We use the same local skills folder layout as the `vercel/skills` package.
|
|
136
246
|
Skills can be installed to any of these agents
|
package/package.json
CHANGED