@mison/ling 1.0.0
- package/.agents/.shared/ui-ux-pro-max/data/charts.csv +26 -0
- package/.agents/.shared/ui-ux-pro-max/data/colors.csv +97 -0
- package/.agents/.shared/ui-ux-pro-max/data/icons.csv +101 -0
- package/.agents/.shared/ui-ux-pro-max/data/landing.csv +31 -0
- package/.agents/.shared/ui-ux-pro-max/data/products.csv +97 -0
- package/.agents/.shared/ui-ux-pro-max/data/prompts.csv +24 -0
- package/.agents/.shared/ui-ux-pro-max/data/react-performance.csv +45 -0
- package/.agents/.shared/ui-ux-pro-max/data/stacks/flutter.csv +53 -0
- package/.agents/.shared/ui-ux-pro-max/data/stacks/html-tailwind.csv +56 -0
- package/.agents/.shared/ui-ux-pro-max/data/stacks/jetpack-compose.csv +53 -0
- package/.agents/.shared/ui-ux-pro-max/data/stacks/nextjs.csv +53 -0
- package/.agents/.shared/ui-ux-pro-max/data/stacks/nuxt-ui.csv +51 -0
- package/.agents/.shared/ui-ux-pro-max/data/stacks/nuxtjs.csv +59 -0
- package/.agents/.shared/ui-ux-pro-max/data/stacks/react-native.csv +52 -0
- package/.agents/.shared/ui-ux-pro-max/data/stacks/react.csv +54 -0
- package/.agents/.shared/ui-ux-pro-max/data/stacks/shadcn.csv +61 -0
- package/.agents/.shared/ui-ux-pro-max/data/stacks/svelte.csv +54 -0
- package/.agents/.shared/ui-ux-pro-max/data/stacks/swiftui.csv +51 -0
- package/.agents/.shared/ui-ux-pro-max/data/stacks/vue.csv +50 -0
- package/.agents/.shared/ui-ux-pro-max/data/styles.csv +59 -0
- package/.agents/.shared/ui-ux-pro-max/data/typography.csv +58 -0
- package/.agents/.shared/ui-ux-pro-max/data/ui-reasoning.csv +101 -0
- package/.agents/.shared/ui-ux-pro-max/data/ux-guidelines.csv +100 -0
- package/.agents/.shared/ui-ux-pro-max/data/web-interface.csv +31 -0
- package/.agents/.shared/ui-ux-pro-max/scripts/core.py +258 -0
- package/.agents/.shared/ui-ux-pro-max/scripts/design_system.py +1067 -0
- package/.agents/.shared/ui-ux-pro-max/scripts/search.py +106 -0
- package/.agents/ARCHITECTURE.md +285 -0
- package/.agents/agents/backend-specialist.md +268 -0
- package/.agents/agents/code-archaeologist.md +106 -0
- package/.agents/agents/database-architect.md +225 -0
- package/.agents/agents/debugger.md +225 -0
- package/.agents/agents/devops-engineer.md +242 -0
- package/.agents/agents/documentation-writer.md +104 -0
- package/.agents/agents/explorer-agent.md +73 -0
- package/.agents/agents/frontend-specialist.md +618 -0
- package/.agents/agents/game-developer.md +162 -0
- package/.agents/agents/mobile-developer.md +382 -0
- package/.agents/agents/orchestrator.md +436 -0
- package/.agents/agents/penetration-tester.md +188 -0
- package/.agents/agents/performance-optimizer.md +187 -0
- package/.agents/agents/product-manager.md +112 -0
- package/.agents/agents/product-owner.md +95 -0
- package/.agents/agents/project-planner.md +405 -0
- package/.agents/agents/qa-automation-engineer.md +103 -0
- package/.agents/agents/security-auditor.md +170 -0
- package/.agents/agents/seo-specialist.md +111 -0
- package/.agents/agents/test-engineer.md +158 -0
- package/.agents/mcp_config.json +22 -0
- package/.agents/rules/GEMINI.md +273 -0
- package/.agents/scripts/auto_preview.py +148 -0
- package/.agents/scripts/checklist.py +217 -0
- package/.agents/scripts/session_manager.py +120 -0
- package/.agents/scripts/verify_all.py +327 -0
- package/.agents/skills/api-patterns/SKILL.md +84 -0
- package/.agents/skills/api-patterns/api-style.md +42 -0
- package/.agents/skills/api-patterns/auth.md +24 -0
- package/.agents/skills/api-patterns/documentation.md +26 -0
- package/.agents/skills/api-patterns/graphql.md +41 -0
- package/.agents/skills/api-patterns/rate-limiting.md +31 -0
- package/.agents/skills/api-patterns/response.md +37 -0
- package/.agents/skills/api-patterns/rest.md +40 -0
- package/.agents/skills/api-patterns/scripts/api_validator.py +211 -0
- package/.agents/skills/api-patterns/security-testing.md +122 -0
- package/.agents/skills/api-patterns/trpc.md +41 -0
- package/.agents/skills/api-patterns/versioning.md +22 -0
- package/.agents/skills/app-builder/SKILL.md +75 -0
- package/.agents/skills/app-builder/agent-coordination.md +74 -0
- package/.agents/skills/app-builder/feature-building.md +53 -0
- package/.agents/skills/app-builder/project-detection.md +34 -0
- package/.agents/skills/app-builder/scaffolding.md +118 -0
- package/.agents/skills/app-builder/tech-stack.md +40 -0
- package/.agents/skills/app-builder/templates/SKILL.md +39 -0
- package/.agents/skills/app-builder/templates/astro-static/TEMPLATE.md +76 -0
- package/.agents/skills/app-builder/templates/chrome-extension/TEMPLATE.md +92 -0
- package/.agents/skills/app-builder/templates/cli-tool/TEMPLATE.md +88 -0
- package/.agents/skills/app-builder/templates/electron-desktop/TEMPLATE.md +88 -0
- package/.agents/skills/app-builder/templates/express-api/TEMPLATE.md +83 -0
- package/.agents/skills/app-builder/templates/flutter-app/TEMPLATE.md +90 -0
- package/.agents/skills/app-builder/templates/monorepo-turborepo/TEMPLATE.md +90 -0
- package/.agents/skills/app-builder/templates/nextjs-fullstack/TEMPLATE.md +122 -0
- package/.agents/skills/app-builder/templates/nextjs-saas/TEMPLATE.md +122 -0
- package/.agents/skills/app-builder/templates/nextjs-static/TEMPLATE.md +169 -0
- package/.agents/skills/app-builder/templates/nuxt-app/TEMPLATE.md +134 -0
- package/.agents/skills/app-builder/templates/python-fastapi/TEMPLATE.md +83 -0
- package/.agents/skills/app-builder/templates/react-native-app/TEMPLATE.md +119 -0
- package/.agents/skills/architecture/SKILL.md +57 -0
- package/.agents/skills/architecture/context-discovery.md +43 -0
- package/.agents/skills/architecture/examples.md +94 -0
- package/.agents/skills/architecture/pattern-selection.md +68 -0
- package/.agents/skills/architecture/patterns-reference.md +50 -0
- package/.agents/skills/architecture/trade-off-analysis.md +77 -0
- package/.agents/skills/bash-linux/SKILL.md +201 -0
- package/.agents/skills/behavioral-modes/SKILL.md +264 -0
- package/.agents/skills/brainstorming/SKILL.md +164 -0
- package/.agents/skills/brainstorming/dynamic-questioning.md +359 -0
- package/.agents/skills/clean-code/SKILL.md +200 -0
- package/.agents/skills/code-review-checklist/SKILL.md +125 -0
- package/.agents/skills/database-design/SKILL.md +54 -0
- package/.agents/skills/database-design/database-selection.md +43 -0
- package/.agents/skills/database-design/indexing.md +39 -0
- package/.agents/skills/database-design/migrations.md +50 -0
- package/.agents/skills/database-design/optimization.md +36 -0
- package/.agents/skills/database-design/orm-selection.md +30 -0
- package/.agents/skills/database-design/schema-design.md +56 -0
- package/.agents/skills/database-design/scripts/schema_validator.py +172 -0
- package/.agents/skills/deployment-procedures/SKILL.md +241 -0
- package/.agents/skills/doc.md +177 -0
- package/.agents/skills/documentation-templates/SKILL.md +194 -0
- package/.agents/skills/frontend-design/SKILL.md +418 -0
- package/.agents/skills/frontend-design/animation-guide.md +331 -0
- package/.agents/skills/frontend-design/color-system.md +307 -0
- package/.agents/skills/frontend-design/decision-trees.md +418 -0
- package/.agents/skills/frontend-design/motion-graphics.md +306 -0
- package/.agents/skills/frontend-design/scripts/accessibility_checker.py +183 -0
- package/.agents/skills/frontend-design/scripts/ux_audit.py +727 -0
- package/.agents/skills/frontend-design/typography-system.md +345 -0
- package/.agents/skills/frontend-design/ux-psychology.md +1118 -0
- package/.agents/skills/frontend-design/visual-effects.md +383 -0
- package/.agents/skills/game-development/2d-games/SKILL.md +119 -0
- package/.agents/skills/game-development/3d-games/SKILL.md +135 -0
- package/.agents/skills/game-development/SKILL.md +167 -0
- package/.agents/skills/game-development/game-art/SKILL.md +185 -0
- package/.agents/skills/game-development/game-audio/SKILL.md +190 -0
- package/.agents/skills/game-development/game-design/SKILL.md +129 -0
- package/.agents/skills/game-development/mobile-games/SKILL.md +108 -0
- package/.agents/skills/game-development/multiplayer/SKILL.md +132 -0
- package/.agents/skills/game-development/pc-games/SKILL.md +144 -0
- package/.agents/skills/game-development/vr-ar/SKILL.md +123 -0
- package/.agents/skills/game-development/web-games/SKILL.md +150 -0
- package/.agents/skills/geo-fundamentals/SKILL.md +155 -0
- package/.agents/skills/geo-fundamentals/scripts/geo_checker.py +289 -0
- package/.agents/skills/i18n-localization/SKILL.md +154 -0
- package/.agents/skills/i18n-localization/scripts/i18n_checker.py +241 -0
- package/.agents/skills/intelligent-routing/SKILL.md +335 -0
- package/.agents/skills/lint-and-validate/SKILL.md +44 -0
- package/.agents/skills/lint-and-validate/scripts/lint_runner.py +184 -0
- package/.agents/skills/lint-and-validate/scripts/type_coverage.py +173 -0
- package/.agents/skills/mcp-builder/SKILL.md +176 -0
- package/.agents/skills/mobile-design/SKILL.md +394 -0
- package/.agents/skills/mobile-design/decision-trees.md +516 -0
- package/.agents/skills/mobile-design/mobile-backend.md +491 -0
- package/.agents/skills/mobile-design/mobile-color-system.md +420 -0
- package/.agents/skills/mobile-design/mobile-debugging.md +122 -0
- package/.agents/skills/mobile-design/mobile-design-thinking.md +355 -0
- package/.agents/skills/mobile-design/mobile-navigation.md +458 -0
- package/.agents/skills/mobile-design/mobile-performance.md +767 -0
- package/.agents/skills/mobile-design/mobile-testing.md +356 -0
- package/.agents/skills/mobile-design/mobile-typography.md +432 -0
- package/.agents/skills/mobile-design/platform-android.md +666 -0
- package/.agents/skills/mobile-design/platform-ios.md +561 -0
- package/.agents/skills/mobile-design/scripts/mobile_audit.py +670 -0
- package/.agents/skills/mobile-design/touch-psychology.md +537 -0
- package/.agents/skills/nextjs-react-expert/1-async-eliminating-waterfalls.md +311 -0
- package/.agents/skills/nextjs-react-expert/2-bundle-bundle-size-optimization.md +241 -0
- package/.agents/skills/nextjs-react-expert/3-server-server-side-performance.md +489 -0
- package/.agents/skills/nextjs-react-expert/4-client-client-side-data-fetching.md +263 -0
- package/.agents/skills/nextjs-react-expert/5-rerender-re-render-optimization.md +581 -0
- package/.agents/skills/nextjs-react-expert/6-rendering-rendering-performance.md +431 -0
- package/.agents/skills/nextjs-react-expert/7-js-javascript-performance.md +683 -0
- package/.agents/skills/nextjs-react-expert/8-advanced-advanced-patterns.md +149 -0
- package/.agents/skills/nextjs-react-expert/SKILL.md +286 -0
- package/.agents/skills/nextjs-react-expert/scripts/convert_rules.py +222 -0
- package/.agents/skills/nextjs-react-expert/scripts/react_performance_checker.py +252 -0
- package/.agents/skills/nodejs-best-practices/SKILL.md +333 -0
- package/.agents/skills/parallel-agents/SKILL.md +193 -0
- package/.agents/skills/performance-profiling/SKILL.md +149 -0
- package/.agents/skills/performance-profiling/scripts/lighthouse_audit.py +120 -0
- package/.agents/skills/plan-writing/SKILL.md +152 -0
- package/.agents/skills/powershell-windows/SKILL.md +166 -0
- package/.agents/skills/python-patterns/SKILL.md +441 -0
- package/.agents/skills/red-team-tactics/SKILL.md +203 -0
- package/.agents/skills/refactoring-patterns/SKILL.md +43 -0
- package/.agents/skills/rust-pro/SKILL.md +190 -0
- package/.agents/skills/seo-fundamentals/SKILL.md +135 -0
- package/.agents/skills/seo-fundamentals/scripts/seo_checker.py +215 -0
- package/.agents/skills/server-management/SKILL.md +161 -0
- package/.agents/skills/systematic-debugging/SKILL.md +114 -0
- package/.agents/skills/tailwind-patterns/SKILL.md +269 -0
- package/.agents/skills/tdd-workflow/SKILL.md +149 -0
- package/.agents/skills/testing-patterns/SKILL.md +178 -0
- package/.agents/skills/testing-patterns/scripts/test_runner.py +219 -0
- package/.agents/skills/vulnerability-scanner/SKILL.md +276 -0
- package/.agents/skills/vulnerability-scanner/checklists.md +131 -0
- package/.agents/skills/vulnerability-scanner/scripts/__pycache__/security_scan.cpython-310.pyc +0 -0
- package/.agents/skills/vulnerability-scanner/scripts/security_scan.py +524 -0
- package/.agents/skills/web-design-guidelines/SKILL.md +57 -0
- package/.agents/skills/webapp-testing/SKILL.md +187 -0
- package/.agents/skills/webapp-testing/scripts/playwright_runner.py +173 -0
- package/.agents/workflows/brainstorm.md +113 -0
- package/.agents/workflows/create.md +59 -0
- package/.agents/workflows/debug.md +103 -0
- package/.agents/workflows/deploy.md +176 -0
- package/.agents/workflows/enhance.md +63 -0
- package/.agents/workflows/orchestrate.md +242 -0
- package/.agents/workflows/plan.md +89 -0
- package/.agents/workflows/preview.md +80 -0
- package/.agents/workflows/restore-localize-compat.md +525 -0
- package/.agents/workflows/status.md +86 -0
- package/.agents/workflows/test.md +144 -0
- package/.agents/workflows/ui-ux-pro-max.md +295 -0
- package/.spec/profiles/codex/AGENTS.spec.md +7 -0
- package/.spec/profiles/codex/ling.spec.rules.md +4 -0
- package/.spec/profiles/gemini/GEMINI.spec.md +5 -0
- package/.spec/references/README.md +36 -0
- package/.spec/references/cse-quickstart.md +96 -0
- package/.spec/references/gda-framework.md +394 -0
- package/.spec/references/harness-engineering-digest.md +93 -0
- package/.spec/skills/cybernetic-systems-engineering/SKILL.md +792 -0
- package/.spec/skills/cybernetic-systems-engineering/agents/openai.yaml +5 -0
- package/.spec/skills/cybernetic-systems-engineering/assets/quickstart.md +96 -0
- package/.spec/skills/cybernetic-systems-engineering/references/README.md +36 -0
- package/.spec/skills/cybernetic-systems-engineering/references/gda-framework.md +394 -0
- package/.spec/skills/cybernetic-systems-engineering/scripts/issues.csv +20 -0
- package/.spec/skills/harness-engineering/SKILL.md +100 -0
- package/.spec/skills/harness-engineering/agents/openai.yaml +4 -0
- package/.spec/skills/harness-engineering/references/harness-engineering-digest.md +93 -0
- package/.spec/templates/driver-prompt.md +7 -0
- package/.spec/templates/handoff.md +9 -0
- package/.spec/templates/issues.template.csv +2 -0
- package/.spec/templates/phase-acceptance.md +9 -0
- package/.spec/templates/review-report.md +9 -0
- package/AGENT_FLOW.md +609 -0
- package/CHANGELOG.md +43 -0
- package/LICENSE +21 -0
- package/README.md +359 -0
- package/bin/adapters/base.js +63 -0
- package/bin/adapters/codex.js +421 -0
- package/bin/adapters/gemini.js +157 -0
- package/bin/ag-kit.js +2266 -0
- package/bin/core/builder.js +80 -0
- package/bin/core/generator.js +59 -0
- package/bin/core/resource-loader.js +64 -0
- package/bin/core/transformer.js +208 -0
- package/bin/interactive.js +65 -0
- package/bin/ling.js +3 -0
- package/bin/utils/atomic-writer.js +97 -0
- package/bin/utils/git-helper.js +68 -0
- package/bin/utils/managed-block.js +65 -0
- package/bin/utils/manifest.js +244 -0
- package/bin/utils.js +89 -0
- package/docs/PLAN.md +54 -0
- package/docs/TECH.md +191 -0
- package/package.json +56 -0
- package/scripts/ci-verify.js +110 -0
- package/scripts/clean.js +123 -0
- package/scripts/health-check.js +143 -0
- package/scripts/health-check.sh +6 -0
- package/scripts/postinstall-check.js +112 -0
- package/scripts/run-tests.js +49 -0
- package/tests/atomic-writer.test.js +47 -0
- package/tests/clean-script.test.js +77 -0
- package/tests/cli-smoke.test.js +479 -0
- package/tests/codex-adapter.test.js +132 -0
- package/tests/doctor.test.js +94 -0
- package/tests/gemini-adapter.test.js +30 -0
- package/tests/generator.test.js +48 -0
- package/tests/git-helper.test.js +53 -0
- package/tests/global-sync.test.js +133 -0
- package/tests/health-check-script.test.js +34 -0
- package/tests/managed-block.test.js +41 -0
- package/tests/manifest.test.js +97 -0
- package/tests/package-tarball.test.js +33 -0
- package/tests/phase-c.test.js +107 -0
- package/tests/spec-profile.test.js +86 -0
- package/tests/standards-compliance.test.js +303 -0
- package/tests/transformer.test.js +74 -0
- package/tests/versioning.test.js +51 -0
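@@ -0,0 +1,394 @@
# GDA Framework

## 1. Why this skill needs an overall-design perspective

From the standpoint of engineering practice:

- Junior engineers focus on syntax and frameworks
- Mid-level engineers focus on algorithms and patterns
- What senior engineers actually solve is **how to control complexity**

The theoretical foundation of this skill comes from two directions:

1. **Systems engineering**
   - Concerned with overall goals, boundaries, decomposition, integration, and organization
2. **Engineering cybernetics**
   - Concerned with feedback, stability, time delay, noise, robustness, and regulation strategies

When these are mapped onto software engineering, the core is not mathematical derivation but:

- Treating a software system as a complex system that is observable and controllable, subject to disturbances, and prone to oscillation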

## 2. Five dimensions

### 2.1 First principles

Ask first:

- What is the real business goal
- What are the physical limits
- What are the invariants
- Which constraints can never be compromised

Equivalent questions in software:

- Throughput vs. consistency
- Latency budget
- Network RTT
- Disk IOPS
- Memory / cache boundaries

Corresponding method:

- Constraint-driven design

### 2.2 Axiomatic thinking

A system must be built on a small number of contracts that cannot be overturned:

- API contracts
- Schema contracts
- Legal state-machine transitions
- Event semantics

Typical anti-patterns in software:

- Modules reaching beyond their authority
- Shadow implementations
- Two interpretations of the same state

Corresponding methods:

- Definition as defense
- Contract before implementation
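
As a minimal sketch of "legal state-machine transitions" expressed as a contract, the snippet below lists every allowed transition in one table and rejects everything else. The `OrderState` values and the `transition` helper are hypothetical names chosen for illustration; they are not part of the skill.

```python
from enum import Enum


class OrderState(Enum):
    """Hypothetical states, used only to illustrate the idea."""
    CREATED = "created"
    PAID = "paid"
    SHIPPED = "shipped"
    CANCELLED = "cancelled"


# The contract: every legal transition is listed here; anything else is rejected.
LEGAL_TRANSITIONS = {
    OrderState.CREATED: {OrderState.PAID, OrderState.CANCELLED},
    OrderState.PAID: {OrderState.SHIPPED, OrderState.CANCELLED},
    OrderState.SHIPPED: set(),
    OrderState.CANCELLED: set(),
}


class IllegalTransition(Exception):
    """Raised when a caller attempts a transition outside the contract."""


def transition(current: OrderState, target: OrderState) -> OrderState:
    """Return the target state if the move is legal, otherwise raise."""
    if target not in LEGAL_TRANSITIONS[current]:
        raise IllegalTransition(f"{current.value} -> {target.value} is not allowed")
    return target
```

Keeping the table in one place is one way to avoid "two interpretations of the same state": every module consults the same definition instead of re-deriving it.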

### 2.3 Multi-model thinking

A complex system cannot be understood through a single kind of diagram.

Switch between at least three models:

1. Data-flow model
2. State-machine model
3. Timing / concurrency / queueing model

Typical value in software:

- Problems that are hard to see in a synchronous RPC view become clearer in an event-stream model
- High-concurrency state transitions that stall under OOP are more stable under an Actor / queue model
- Latency problems that a code profiler cannot explain are more intuitive in a queueing model

Corresponding method:

- Orthogonal-projection modeling
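
To illustrate why a queueing model can explain latency that a profiler does not, here is a small sketch using the textbook M/M/1 result (mean time in system = 1 / (service rate - arrival rate)). Choosing M/M/1 is an assumption made for illustration; real services rarely match it exactly, and the function name is hypothetical.

```python
def mm1_mean_latency(arrival_rate: float, service_rate: float) -> float:
    """Mean time in system (waiting + service) for an M/M/1 queue, in seconds.

    Rates are in requests per second. The queue is only stable while the
    arrival rate stays strictly below the service rate.
    """
    if arrival_rate < 0 or service_rate <= 0:
        raise ValueError("arrival_rate must be >= 0 and service_rate must be > 0")
    if arrival_rate >= service_rate:
        raise ValueError("unstable queue: arrival rate >= service rate")
    return 1.0 / (service_rate - arrival_rate)
```

With a service rate of 100 requests per second, the model gives 20 ms at 50% utilization, 100 ms at 90%, and 1 s at 99%: the per-request code cost never changed, only the queue did.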

### 2.4 Analogical transfer

Software systems have no physical form, but you can look for physical isomorphisms:

- Rate limiting / degradation -> leaky bucket / token bucket
- Autoscaling -> PID
- Message queue -> reservoir / buffer pool
- Circuit breaking -> automatic circuit breaker

When the system shows:

- Congestion
- Cascading failure (avalanche)
- Oscillation
- Overshoot

you should actively look for mature control experience from the physical world.
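
The token-bucket analogy above can be sketched in a few lines. This is an illustrative, single-threaded version with the clock passed in explicitly so the behavior is easy to reason about; the class and parameter names are assumptions, not an API from this package.

```python
class TokenBucket:
    """Token bucket: refills at `rate` tokens per second, up to `capacity`."""

    def __init__(self, rate: float, capacity: float, now: float = 0.0) -> None:
        if rate <= 0 or capacity <= 0:
            raise ValueError("rate and capacity must be positive")
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full so an initial burst is allowed
        self.last = now

    def allow(self, now: float, cost: float = 1.0) -> bool:
        """Refill for the elapsed time, then try to spend `cost` tokens."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = max(self.last, now)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

`capacity` plays the role of the reservoir size (how large a burst is absorbed) and `rate` the outflow valve (the sustained throughput).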

### 2.5 Feedback and verification

A system without a feedback loop is flying blind.

You must have:

- metrics
- logs
- traces
- replay / repro
- tests / benchmarks / gate

Pay particular attention to:

- Time delay
- Nonlinearity
- Noise
- Tests that pass by accident
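
One hedged way to catch "tests that pass by accident" is to run the same check several times and treat mixed results as a signal in their own right. The helper below is a sketch; `classify_check` and its labels are hypothetical names.

```python
from typing import Callable


def classify_check(check: Callable[[], bool], runs: int = 5) -> str:
    """Run `check` several times and classify the outcome.

    Returns "stable-pass", "stable-fail", or "flaky" (mixed results).
    """
    if runs < 2:
        raise ValueError("need at least 2 runs to detect flakiness")
    results = [bool(check()) for _ in range(runs)]
    if all(results):
        return "stable-pass"
    if not any(results):
        return "stable-fail"
    return "flaky"
```

A "flaky" verdict means the observation itself is noisy, so a single green run should not be counted as closing the loop.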

## 3. The four GDA steps

### Step 1: Axiom & Boundary

Actions:

1. Clarify the system's ultimate goal
2. Write down the invariants
3. List the hard constraints
4. Work out the physical boundaries

Deliverables:

- Control Contract
- Boundary list
- Risk list

### Step 2: Multi-model Construction

Actions:

1. Establish the static contract domain
   - API
   - Schema
   - Configuration
   - Event formats
2. Establish the dynamic state domain
   - State machines
   - Lifecycles
   - State consolidation
3. Establish the capacity and queueing domain
   - Queues
   - Buffers
   - Backpressure
   - Bottlenecks

Deliverables:

- Minimal system diagram
- State-machine diagram or a text version
- Capacity / delay estimates

### Step 3: Cybernetic Control

Actions:

1. Identify the main error
2. Choose the minimal control input
3. Change only one class of problem at a time
4. Prevent oscillation

Typical control actions:

- Narrow the scope of the change
- Strengthen observation points
- Fall back from a major overhaul to the minimal fix
- Switch from "change the implementation" to "add tests / add contracts / fix interfaces"

### Step 4: Closed-loop Observability

Actions:

1. Verify in layers
   - L0: fast loop
   - L1: medium loop
   - L2: slow loop
2. Make explicit what can only be verified in the real environment
3. Capture review / gate / handoff documents

Deliverables:

- Test matrix
- Audit report
- Gate document
- Handoff

## 4. Direct implications for software architects

### 4.1 Choose the control objective first, not the technology

Wrong questions:

- Which database should we use?
- Should we adopt microservices?

Right questions:

- Which metric must be satisfied first?
- Which invariant must never be broken?
- Which failure mode must be absorbed first?

### 4.2 Define the contract first, not more abstraction

Wrong question:

- How can the implementation be made more "generic"?

Right questions:

- What should be public?
- What exists only for test convenience and should not become a public API?
- Where does the module sit in the business domain?

### 4.3 Not every "tests pass" means the problem is solved

You must distinguish:

- Semantic tests
- Schema contract tests
- Main-path tests
- Real-environment gates

This is one of the most important practical principles of this skill.

## 5. Output style when using this skill

By default, output most of the following:

1. What the current error is
2. What the current estimate of the system state is
3. Why the current change is the minimal control input
4. What the verification layers are
5. Which risks have not yet been absorbed by a closed loop

## 6. Common pitfalls

1. Treating cybernetics as "piling up jargon"
2. Treating systems engineering as "drawing more diagrams"
3. Equating an offline pass with a pass in the real environment
4. Putting test convenience above module responsibility
5. Letting a temporary fix become a new shadow source of truth

## 7. One-page execution template

### Control Contract

- Primary Setpoint:
- Acceptance:
- Guardrail Metrics:
- Sampling Plan:
- Known Delays / Delay Budget:
- Recovery Target:
- Rollback Trigger:
- Constraints:
- Boundary:
- Coupling Notes:
- Approximation Validity:
- Actuator Budget:
- Risks:

### State Estimate

- Entry:
- Key state:
- Key invariants:
- Current error signal:

### Hypotheses

1. H1:
2. H2:
3. H3:

### Experiment Plan

- E1:
- E2:
- E3:

### Patch Plan

- Step 1:
- Step 2:
- Step 3:

### Verification

- L0:
- L1:
- L2:

### Recovery Evidence

- Trigger:
- Recovery time:
- Rollback / restart:

### Observability Evidence

- Metrics / traces:
- Profiling baseline:
- Before vs after:

### Residual Risks

- Risk 1:
- Risk 2:
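
## 8. Appendix: modern mappings

This section does not repeat the main skill. It maps the classic problems of *On Systems Engineering* (《论系统工程》) and *Engineering Cybernetics* (《工程控制论》) onto cloud-native, distributed, and AI-native engineering contexts.

### 8.1 Overall systems-engineering design -> overall cloud-native control structure

- Classic problem:
  - How to set up an overall design department for a large system that unifies goals, boundaries, division of labor, and integration
- Modern mapping:
  - Platform teams, architecture committees, shared-boundary owners, release governance, and unified gates
- Typical example:
  - After a microservice split, a team no longer manages only "its own service"; it also has to define shared APIs, shared schemas, mesh policies, and release windows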
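
### 8.2 Time delay -> asynchronous propagation, CI queueing, cache refresh, and canary delay

- Classic problem:
  - Time delay can make an otherwise stable system oscillate at low frequency
- Modern mapping:
  - Asynchronous queue propagation, CDN or cache refresh, control-plane configuration rollout, CI queues, and the delay before a canary release takes effect
- Typical example:
  - You have just adjusted a rate-limit parameter and the dashboard has not refreshed yet; the second adjustment has already pushed the system too far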
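
The dashboard example can be reproduced with a toy simulation. The sketch below assumes a very simple integrating controller that nudges the state toward a target using an observation that is `delay_steps` old; the model, gain, and names are illustrative assumptions, not a description of any real control plane.

```python
from typing import List


def simulate(delay_steps: int, gain: float = 0.5, target: float = 100.0,
             steps: int = 20) -> List[float]:
    """Return the state trajectory of a controller acting on stale observations."""
    history = [0.0]  # true state over time, starting at 0
    for t in range(steps):
        observed_index = t - delay_steps
        # Until delayed data arrives, the controller still sees the initial state.
        observed = history[observed_index] if observed_index >= 0 else history[0]
        history.append(history[-1] + gain * (target - observed))
    return history
```

With `delay_steps=0` the state approaches the target without crossing it. With `delay_steps=3` the same gain keeps pushing while the observation is stale, and the state overshoots the target before the first correction arrives.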
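
### 8.3 Sampling -> monitoring scrape intervals, batched reporting, and window aggregation

- Classic problem:
  - Discrete sampling introduces aliasing and misjudgment
- Modern mapping:
  - Metrics scrape interval, batched log upload, trace sampling rate, profiling sampling window
- Typical example:
  - Looking at jitter with a 5-second period at 1-minute granularity makes a real oscillation look like an occasional event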
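
A small sketch of the aliasing effect, assuming point sampling (not window averaging) of a synthetic latency signal that spikes once in every 5-second cycle. The signal shape and function names are made up for illustration.

```python
from typing import List


def latency_ms(t_seconds: int) -> int:
    """Hypothetical signal: a 5-second cycle with a spike in its last second."""
    return 500 if t_seconds % 5 == 4 else 20


def sample(interval_seconds: int, duration_seconds: int = 300,
           offset: int = 0) -> List[int]:
    """Point-sample the signal every `interval_seconds`, starting at `offset`."""
    return [latency_ms(t) for t in range(offset, duration_seconds, interval_seconds)]
```

Sampling every second shows both levels. Sampling every 60 seconds always lands on the same phase of the cycle, so the series looks flat (all 20 ms, or all 500 ms with a different offset) and the oscillation disappears from view.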
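
### 8.4 Feedback -> autoscaling, retries, circuit breaking, rollback controllers

- Classic problem:
  - Feedback can improve stability, but wrong feedback also amplifies oscillation
- Modern mapping:
  - HPA, queue-consumer scheduling, automatic rollback, half-open circuit breakers, retry-budget control
- Typical example:
  - When HPA, in-application retries, and gateway retries all operate at once, the system is no longer a single controller but a coupled feedback system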
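
One concrete consequence of stacked retry layers is multiplicative load. The sketch below computes a worst-case bound, assuming each layer retries independently and every outer attempt can trigger the full retry budget of the layers beneath it; the function name is hypothetical.

```python
from math import prod
from typing import Sequence


def worst_case_attempts(retries_per_layer: Sequence[int]) -> int:
    """Upper bound on backend calls caused by a single client request.

    Each layer makes 1 initial attempt plus its configured retries.
    """
    if any(r < 0 for r in retries_per_layer):
        raise ValueError("retry counts must be non-negative")
    return prod(1 + r for r in retries_per_layer)
```

For example, a gateway with 2 retries in front of an application with 3 retries can turn one request into up to 12 backend calls, which is the kind of coupling that an autoscaler then reacts to.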
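
### 8.5 Decoupling -> input/output influence matrix and platform boundaries

- Classic problem:
  - A multivariable system must be decoupled first; otherwise one input damages several outputs at once
- Modern mapping:
  - An influence matrix for tightly coupled knobs such as rate-limit parameters, cache policy, connection pools, batch size, and concurrency
- Typical example:
  - Increasing retries may raise the success rate, but it also raises latency, cost, and queue length at the same time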
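
The retry example can be made quantitative under a simplifying assumption: each attempt succeeds independently with the same probability. Real failures are often correlated, so treat this as a sketch; `retry_effects` is a hypothetical helper.

```python
from typing import Tuple


def retry_effects(success_prob: float, max_retries: int) -> Tuple[float, float]:
    """Return (final success rate, expected attempts per request)."""
    if not 0.0 < success_prob <= 1.0:
        raise ValueError("success_prob must be in (0, 1]")
    if max_retries < 0:
        raise ValueError("max_retries must be non-negative")
    fail = 1.0 - success_prob
    attempts_allowed = max_retries + 1
    success_rate = 1.0 - fail ** attempts_allowed
    # Attempt k+1 only happens if the first k attempts all failed.
    expected_attempts = sum(fail ** k for k in range(attempts_allowed))
    return success_rate, expected_attempts
```

One knob moves two outputs: with a 50% per-attempt success probability, going from 0 to 2 retries lifts the success rate from 0.5 to 0.875 while raising the expected load per request from 1.0 to 1.75 attempts.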
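
### 8.6 Noise -> flakes, monitoring jitter, dependency brownouts

- Classic problem:
  - Noise contaminates observations, so faulty control gets built on spurious signals
- Modern mapping:
  - Flaky tests, monitoring jitter, short network glitches, dependency brownouts, spikes caused by cache breakdown
- Typical example:
  - Treating a short brownout as a permanent failure triggers the wrong scale-out, rollback, or traffic switch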
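
A common way to keep a short brownout from triggering a drastic action is to require several consecutive failed probes first. The sketch below shows that idea only; the threshold of 3 and the class name are assumptions, and a real system would tune them against its probe interval.

```python
class FailureDebouncer:
    """Signal 'act now' only after `threshold` consecutive failed probes."""

    def __init__(self, threshold: int = 3) -> None:
        if threshold < 1:
            raise ValueError("threshold must be at least 1")
        self.threshold = threshold
        self.consecutive_failures = 0

    def observe(self, probe_ok: bool) -> bool:
        """Record one probe result; return True if action should be taken."""
        if probe_ok:
            self.consecutive_failures = 0  # any success resets the streak
        else:
            self.consecutive_failures += 1
        return self.consecutive_failures >= self.threshold
```

The trade-off is added delay: a higher threshold filters more noise but reacts later to a genuine outage.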
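
### 8.7 Adaptation -> online tuning, search, learning-based control

- Classic problem:
  - When the environment keeps changing, a fixed control law stops working
- Modern mapping:
  - Adaptive rate limiting, dynamic concurrency, online capacity prediction, tuning systems with feedback, online policy updates for AI agents
- Typical example:
  - In AI-native systems, fixed prompt routing and fixed timeouts age quickly; the policy must be able to correct itself online based on observations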
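
As one possible illustration of "dynamic concurrency", the sketch below uses additive-increase / multiplicative-decrease (AIMD). AIMD is not named in the source; it is swapped in here as a well-known simple adaptive rule, and all names and default values are assumptions.

```python
class AimdLimit:
    """Concurrency limit that grows slowly and backs off quickly."""

    def __init__(self, initial: float = 10.0, minimum: float = 1.0,
                 maximum: float = 100.0, increase: float = 1.0,
                 backoff: float = 0.5) -> None:
        if not 0.0 < backoff < 1.0:
            raise ValueError("backoff must be between 0 and 1")
        if not minimum <= initial <= maximum:
            raise ValueError("initial must lie within [minimum, maximum]")
        self.limit = initial
        self.minimum = minimum
        self.maximum = maximum
        self.increase = increase
        self.backoff = backoff

    def update(self, overloaded: bool) -> float:
        """Adjust the limit from one observation window and return it."""
        if overloaded:
            self.limit = max(self.minimum, self.limit * self.backoff)
        else:
            self.limit = min(self.maximum, self.limit + self.increase)
        return self.limit
```

How `overloaded` is derived (latency, error rate, queue depth) is left open on purpose, since that observation is exactly where delay and noise from the earlier sections enter.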
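
### 8.8 Redundancy -> multiple replicas, dual writes, shadow paths, and fault-tolerant recovery

- Classic problem:
  - Building a reliable system from unreliable components relies on redundancy, arbitration, and recovery
- Modern mapping:
  - Multi-replica deployment, cross-availability-zone placement, dual write / dual read, shadow traffic, cold and hot standby, idempotent recovery
- Typical example:
  - Dual writing is not "one more layer of insurance"; it introduces new reconciliation, ordering, and consistency problems and must be treated as a state-plane control problem
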
@@ -0,0 +1,93 @@
# Harnesses are underrated: article digest

## Article information

- Title: Harnesses are underrated
- Source: OpenAI
- Published: 2025-07-31
- Authors: Boris Power, Brendan Dolan-Gavitt, Calvin French-Owen, Charles Lamanna, Katie Gamanji, Oliver Greenwood, Romain Huet
- Link: https://openai.com/index/harness-engineering/

## Core thesis

The article argues that model capability sets the theoretical ceiling, while the engineering system (the harness) determines what is actually delivered.
A "good harness" is not a single trick. It places the agent inside a real work loop where it can access documentation, code, tests, CI, and failure feedback, and iterate within clear boundaries.
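
## Key points distilled

### 1. Fix the execution environment before tuning prompt details

- Let the agent use real tools directly: codebase, test commands, logs, PRs, CI.
- If the agent cannot run or verify its changes, even a strong model will struggle to produce stable output.
- In practice, improving the basic harness is often more effective than fine-tuning the prompt.

### 2. Give the agent "controlled autonomy"

- Do not turn the agent into a script that waits for human confirmation at every step.
- Within a controlled budget, let the agent try and iterate on its own first.
- Autonomous execution must come with observability: every attempt has explicit inputs, outputs, and failure signals.

### 3. Reduce risk with tiered permissions

- Start with low privileges (reading code, consulting docs) and open up step by step based on evidence (running tests, changing code, performing restricted operations).
- Permission escalation should have trigger conditions; on failure it should be possible to roll back to a safe level.
- This both raises the success rate and reduces the risk of mistaken operations.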
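
The tiered-permission idea can be sketched as a small state function: escalate one tier at a time when evidence supports it, and drop back to the safe level on failure. The tier names and the `next_permission` helper are hypothetical and only mirror the levels listed above.

```python
from enum import IntEnum


class Permission(IntEnum):
    READ_ONLY = 0        # read code, consult docs
    RUN_TESTS = 1
    EDIT_CODE = 2
    RESTRICTED_OPS = 3   # restricted operations


SAFE_LEVEL = Permission.READ_ONLY


def next_permission(current: Permission, evidence_ok: bool,
                    failed: bool) -> Permission:
    """Escalate one tier on evidence; fall back to the safe level on failure."""
    if failed:
        return SAFE_LEVEL
    if evidence_ok and current < Permission.RESTRICTED_OPS:
        return Permission(current + 1)
    return current
```

Checking `failed` before `evidence_ok` is a deliberate choice in this sketch: a failure always wins over a pending escalation.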
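
### 4. Define explicitly "when to ask a human"

- Neither hand every problem to a human immediately nor retry forever.
- Define escalation thresholds at the system level, for example:
  - the same kind of failure keeps recurring;
  - key information is missing;
  - an out-of-scope or high-risk action is required;
  - a trade-off between several options needs a business decision.
- A request for help should include evidence and candidate options so that a human can decide quickly.

### 5. Bring the agent into the CI loop

- Let agents attempt fixes in parallel, with CI as the objective judge.
- On failure, hand the context over to a human instead of hiding the failure.
- This human-agent relay can shorten time to recovery.

### 6. Focus on real metrics, not demo effects

- Metrics should center on real delivery: task closure rate, fix success rate, regression pass rate, frequency of human intervention, and mean time to repair.
- The article's cases emphasize that once the harness is right, task completion rates rise noticeably.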
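
## Case signals mentioned in the article

- In the HumanEval setting, improving the agent-only harness can raise the completion rate from near 0 to roughly the 50% range.
- In the SWE-bench Verified setting, results first reach about 65% and can reach about 75% after further harness optimization.
- The article also gives a signal from team practice: strengthening the harness can raise the task resolution rate from about 28% to about 65%.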
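
## Practical rules directly related to "human-agent collaboration / proactive requests"

1. Let the agent run independently until it is "provably stuck" before asking a human.
2. When asking, pose only one key decision and offer options A/B/C.
3. State the cost, risk, benefit, and time of each option.
4. After the human confirms, the agent continues executing and returns evidence instead of reopening the discussion.
5. Write the collaboration process into the task log to keep it traceable.

## Reusable escalation thresholds

- 2-3 consecutive failures of the same kind with a consistent error pattern.
- Key context is missing and can no longer be filled in by local inference.
- Production-level permissions, external network access, secrets, or irreversible operations are required.
- The change scope exceeds the agreed boundary and may affect unrelated modules.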
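
The thresholds above could be encoded as a single check that returns the reasons for escalating. This is a sketch under assumptions: the `AttemptState` fields, the idea of a normalized "error signature", and the default threshold of 3 are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AttemptState:
    recent_error_signatures: List[str]     # newest last; normalized error keys
    missing_key_context: bool = False
    needs_privileged_action: bool = False  # prod access, secrets, irreversible ops
    exceeds_change_boundary: bool = False


def escalation_reasons(state: AttemptState, repeat_threshold: int = 3) -> List[str]:
    """Return the reasons to ask a human; an empty list means keep going."""
    if repeat_threshold < 1:
        raise ValueError("repeat_threshold must be at least 1")
    reasons = []
    tail = state.recent_error_signatures[-repeat_threshold:]
    if len(tail) == repeat_threshold and len(set(tail)) == 1:
        reasons.append("repeated identical failure")
    if state.missing_key_context:
        reasons.append("missing key context")
    if state.needs_privileged_action:
        reasons.append("privileged or irreversible action required")
    if state.exceeds_change_boundary:
        reasons.append("change exceeds agreed boundary")
    return reasons
```

Returning reasons rather than a bare boolean keeps the evidence attached to the decision, which matches the rule that a help request should carry its own justification.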

## Reusable help-request structure

1. Current blocker (one sentence).
2. Actions already tried and the evidence (commands, logs, results).
3. Root-cause hypothesis (with confidence).
4. Candidate options A/B/C (cost, risk, estimated time).
5. The single question that needs human confirmation.
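
The five-part structure maps naturally onto a small data type with a renderer. The sketch below is illustrative; the field names and the plain-text layout are assumptions, not a format defined by the skill.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Option:
    label: str      # "A", "B", "C"
    summary: str
    cost: str
    risk: str
    eta: str


@dataclass
class HelpRequest:
    blocker: str
    attempts: List[str]
    hypothesis: str
    confidence: float   # 0.0 to 1.0
    options: List[Option]
    question: str

    def render(self) -> str:
        """Render the request as a five-part plain-text message."""
        lines = [f"1. Blocker: {self.blocker}", "2. Tried so far:"]
        lines += [f"   - {attempt}" for attempt in self.attempts]
        lines.append(
            f"3. Hypothesis ({self.confidence:.0%} confidence): {self.hypothesis}"
        )
        lines.append("4. Options:")
        lines += [
            f"   - {o.label}: {o.summary} (cost: {o.cost}; risk: {o.risk}; ETA: {o.eta})"
            for o in self.options
        ]
        lines.append(f"5. Decision needed: {self.question}")
        return "\n".join(lines)
```

Having exactly one `question` field enforces the "single question" rule structurally rather than by convention.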

## Relationship to the skill

The `SKILL.md` in this directory turns these principles into an executable workflow, focusing on:
- harness first;
- autonomy budget;
- proactive escalation thresholds;
- option-based collaboration questions;
- traceable delivery wrap-up.