ai-execution-protocol 0.3.0 → 0.4.0
- package/AGENTS.md +12 -1
- package/README.md +54 -2
- package/behavior/audit-checklist.yaml +55 -0
- package/behavior/contract.yaml +156 -0
- package/dataset/README.md +38 -0
- package/dataset/raw-examples.md +36 -0
- package/dataset/train.jsonl +3 -0
- package/dataset/validation.jsonl +2 -0
- package/dist/minimal/AGENTS.md +8 -1
- package/dist/minimal/README.md +3 -0
- package/dist/minimal/behavior/audit-checklist.yaml +15 -0
- package/dist/minimal/behavior/contract.yaml +29 -0
- package/dist/minimal/canonical-state.yaml +1 -1
- package/dist/minimal/capabilities/registry.yaml +48 -0
- package/dist/minimal/context-map.yaml +2 -1
- package/dist/minimal/ide-rules/instruction-block.md +23 -0
- package/dist/minimal/memory/INDEX.yaml +1 -1
- package/dist/minimal/protocol/README.yaml +11 -1
- package/dist/minimal/protocol/capability-gate.yaml +56 -0
- package/dist/minimal/protocol/capability-router.yaml +123 -0
- package/dist/minimal/protocol/context-rules.yaml +2 -1
- package/dist/minimal/protocol/fast-path.yaml +8 -1
- package/dist/minimal/protocol/intelligence-router.yaml +63 -0
- package/dist/minimal/protocol/route-packs.yaml +49 -1
- package/dist/minimal/protocol/router.yaml +35 -1
- package/docs/00-visao-geral.md +41 -0
- package/docs/01-modelo-de-execucao.md +25 -0
- package/docs/02-niveis-de-risco.md +62 -0
- package/docs/03-mapeamento-antes-de-alterar.md +48 -0
- package/docs/04-janela-de-contexto.md +56 -0
- package/docs/05-validacao-e-entrega.md +48 -0
- package/docs/06-memoria-e-continuidade.md +27 -0
- package/docs/07-legibilidade-para-ia.md +47 -0
- package/docs/08-posicionamento.md +48 -0
- package/docs/09-governanca-de-mudancas.md +48 -0
- package/docs/10-economia-de-prompt.md +79 -0
- package/docs/11-retencao-de-resultados.md +26 -0
- package/docs/12-instalacao-em-outro-projeto.md +254 -0
- package/docs/13-uso-em-ides.md +137 -0
- package/docs/14-publicacao.md +128 -0
- package/docs/15-contexto-persistente.md +204 -0
- package/docs/16-release-e-atualizacao.md +146 -0
- package/docs/17-documentacao-atomica.md +117 -0
- package/docs/18-memoria-adaptativa.md +107 -0
- package/docs/19-orcamento-de-contexto.md +63 -0
- package/docs/20-validacao-seletiva.md +46 -0
- package/docs/21-roteamento-de-capacidades.md +121 -0
- package/docs/22-roadmap-v1.md +163 -0
- package/docs/23-contrato-comportamental.md +116 -0
- package/docs/24-gate-de-capacidades-e-inteligencia.md +109 -0
- package/docs/README.md +58 -0
- package/eval/README.md +27 -0
- package/eval/rubric.yaml +57 -0
- package/eval/sample-result.yaml +28 -0
- package/install-manifest.json +38 -2
- package/package.json +9 -2
- package/protocol/README.yaml +11 -1
- package/protocol/capability-gate.yaml +56 -0
- package/protocol/capability-router.yaml +123 -0
- package/protocol/context-rules.yaml +2 -1
- package/protocol/fast-path.yaml +8 -1
- package/protocol/intelligence-router.yaml +63 -0
- package/protocol/route-packs.yaml +49 -1
- package/protocol/router.yaml +35 -1
- package/roadmap/v1.yaml +139 -0
- package/schema/README.md +26 -0
- package/schema/behavior-contract.schema.json +31 -0
- package/schema/capability-registry.schema.json +51 -0
- package/schema/evaluated-response.schema.json +27 -0
- package/schema/evaluation-result.schema.json +32 -0
- package/schema/memory-entry.schema.json +55 -0
- package/schema/protocol-rule.schema.json +16 -0
- package/schema/protocol-rule.schema.yaml +28 -0
- package/schema/test-case.schema.json +44 -0
- package/schema/test-case.schema.yaml +37 -0
- package/scripts/README.md +79 -1
- package/scripts/build_dist.py +3 -0
- package/scripts/npm_install_protocol.js +60 -1
- package/scripts/verify_install.py +25 -0
- package/templates/minimal/AGENTS.md +8 -1
- package/templates/minimal/behavior/audit-checklist.yaml +15 -0
- package/templates/minimal/behavior/contract.yaml +29 -0
- package/templates/minimal/canonical-state.yaml +1 -1
- package/templates/minimal/capabilities/registry.yaml +48 -0
- package/templates/minimal/context-map.yaml +2 -1
- package/templates/minimal/ide-rules/instruction-block.md +23 -0
- package/templates/minimal/memory/INDEX.yaml +1 -1
- package/templates/minimal/protocol/capability-gate.yaml +10 -0
- package/templates/minimal/protocol/intelligence-router.yaml +10 -0
@@ -0,0 +1,56 @@
+id: capability_gate
+type: operational_rules
+version: 0.4.0
+purpose: gate_capability_use_before_skill_mcp_or_tool_invocation
+principle: plan_before_use_audit_after_use
+guarantee_boundary:
+  framework_can:
+    - require_capability_plan_before_use
+    - mark_unplanned_use_as_protocol_failure
+    - compare_used_capabilities_with_selected_capabilities
+    - block_high_risk_workflow_when_plan_is_missing
+  host_must:
+    - hide_or_disable_tools_for_physical_enforcement
+    - enforce_runtime_permissions
+required_before_use:
+  - task_objective
+  - risk_level
+  - operation_scope
+  - requested_operations
+  - required_outcome_tags
+  - selected_capabilities
+  - confirmation_status_when_required
+allowed_states:
+  planned:
+    meaning: selected_but_not_invoked
+  used:
+    meaning: invoked_within_plan_and_scope
+  blocked:
+    meaning: needed_but_missing_or_unconfirmed
+  violation:
+    meaning: used_without_plan_or_outside_scope
+rules:
+  - no_skill_mcp_or_remote_tool_before_capability_plan
+  - local_read_can_be_implicit_only_for_level_0_or_1_basic_navigation
+  - level_2_or_3_requires_explicit_capability_plan
+  - publish_write_or_destructive_requires_confirmation_when_policy_requires
+  - used_capability_must_be_subset_of_selected_capabilities
+  - unplanned_use_is_protocol_failure
+  - missing_required_capability_blocks_high_risk_workflow
+audit:
+  compare:
+    - selected_capabilities
+    - used_capabilities
+    - operation_scope
+    - confirmation_status
+  fail_when:
+    - used_not_selected
+    - used_for_unapproved_operation
+    - used_after_blocked_status
+    - publish_without_confirmation
+delivery:
+  include_for_level_2_or_3:
+    - capability_plan_summary
+    - used_capabilities
+    - gate_status
+    - violations_if_any
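The `audit` section of the capability gate above compares what was selected with what was actually used. A minimal Python sketch of that comparison follows. The class names, field names, and the `name:finding` message format are assumptions made for illustration; the package diff only defines the YAML rule names, not an implementation.

```python
from dataclasses import dataclass, field


@dataclass
class CapabilityUse:
    name: str       # hypothetical capability identifier
    operation: str  # "read", "write", "publish" or "destructive"


@dataclass
class CapabilityPlan:
    selected: set                      # selected_capabilities
    approved_operations: set           # operation_scope
    blocked: set = field(default_factory=set)
    publish_confirmed: bool = False    # confirmation_status


def audit_gate(plan, used):
    """Return fail_when findings; an empty list means the gate passed."""
    violations = []
    for use in used:
        if use.name not in plan.selected:
            violations.append(f"used_not_selected:{use.name}")
        if use.operation not in plan.approved_operations:
            violations.append(f"used_for_unapproved_operation:{use.name}")
        if use.name in plan.blocked:
            violations.append(f"used_after_blocked_status:{use.name}")
        if use.operation == "publish" and not plan.publish_confirmed:
            violations.append(f"publish_without_confirmation:{use.name}")
    return violations
```

As the `guarantee_boundary` section notes, a check like this can only flag a violation after the fact; physically preventing the call remains the host's job.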
@@ -0,0 +1,123 @@
+id: capability_router
+type: operational_rules
+version: 0.4.0
+purpose: select_only_necessary_skills_mcps_and_tools
+principle: minimum_capability_set_must_preserve_required_quality
+platform_boundary:
+  can_control:
+    - selection
+    - instruction_loading
+    - invocation
+    - operation_scope
+  cannot_guarantee:
+    - physical_unloading_of_host_exposed_tools
+    - revocation_of_platform_permissions
+  rule: exposed_capability_must_remain_unused_until_selected
+entrypoint:
+  registry: capabilities/registry.yaml
+selection_flow:
+  - classify_task_and_risk
+  - define_required_outcomes_and_operations
+  - inspect_available_capability_metadata
+  - prefer_existing_local_capability
+  - select_smallest_set_covering_required_outcomes
+  - add_dependency_only_when_selected_capability_requires_it
+  - verify_permissions_confirmation_and_validation
+  - stop_discovery_when_coverage_is_complete
+capability_types:
+  - built_in_reasoning
+  - local_tool
+  - skill
+  - mcp
+  - remote_service
+operations:
+  read:
+    effect: none_or_read_only
+  write:
+    effect: state_change
+  publish:
+    effect: external_release
+  destructive:
+    effect: irreversible_or_high_impact
+risk_policy:
+  level_0:
+    external_capability_budget: 0
+    allow:
+      - built_in_reasoning
+    expand_when:
+      - direct_answer_requires_verified_current_data
+  level_1:
+    external_capability_budget: 1
+    prefer:
+      - local_read
+      - focused_skill
+  level_2:
+    external_capability_budget: 3
+    prefer:
+      - specialized_skill
+      - targeted_mcp
+      - local_validation
+  level_3:
+    external_capability_budget: 3
+    principle: higher_risk_means_stricter_permissions_not_more_tools
+    require:
+      - least_privilege
+      - explicit_operation_scope
+      - confirmation_before_sensitive_write_publish_or_destructive
+      - validation_before_and_after
+cost_model:
+  dimensions:
+    - context_tokens
+    - latency
+    - remote_calls
+    - permission_scope
+    - side_effect_risk
+  choose_when:
+    - required_outcome_is_covered
+    - expected_quality_gain_exceeds_incremental_cost
+  never_trade:
+    - correctness
+    - security
+    - required_validation
+    - current_information_when_task_depends_on_it
+preference_order:
+  - built_in_reasoning
+  - existing_project_context
+  - local_read_tool
+  - focused_local_skill
+  - targeted_remote_read
+  - remote_write
+  - publish_or_destructive
+discovery:
+  do:
+    - use_known_available_capabilities_first
+    - search_for_tool_only_when_required_capability_is_missing
+    - load_skill_instructions_only_after_selection
+    - connect_mcp_only_for_matching_operation
+  avoid:
+    - loading_all_skills_before_selection
+    - listing_all_mcp_resources_without_need
+    - installing_adjacent_tools_not_required_by_task
+    - continuing_discovery_after_complete_coverage
+permission_policy:
+  - read_permission_does_not_imply_write_permission
+  - write_permission_does_not_imply_publish_permission
+  - memory_never_authorizes_sensitive_operation
+  - capability_availability_does_not_authorize_use
+  - current_user_request_defines_allowed_scope
+fallback:
+  when_required_coverage_is_missing:
+    - do_not_execute_incomplete_high_risk_workflow
+    - use_safe_local_partial_work_when_independently_valid
+    - report_missing_capability
+    - request_installation_or_user_action_only_when_required
+delivery:
+  include_when_capability_used:
+    - selected_capabilities
+    - selection_reason
+    - operation_scope
+    - confirmation_status_when_required
+    - validation
+  omit:
+    - full_available_capability_catalog
+    - rejected_capabilities_without_audit_need
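The `selection_flow` and `risk_policy` above amount to a covering problem with a per-level budget. The following is one possible reading, sketched in Python as a greedy approximation; greedy selection is not guaranteed to find the true minimum set. The registry shape (`outcomes`, `external`) and the function name are assumptions, since the diff does not show the contents of `capabilities/registry.yaml`.

```python
# external_capability_budget per risk level, copied from risk_policy above.
BUDGET = {0: 0, 1: 1, 2: 3, 3: 3}


def select_capabilities(required, registry, risk_level):
    """Greedily pick capabilities until the required outcomes are covered.

    registry maps name -> {"outcomes": set, "external": bool} and is assumed
    to be listed in preference order (local before remote).
    Returns (selected_names, uncovered_outcomes).
    """
    remaining = set(required)
    selected = []
    external_used = 0
    while remaining:
        best, best_gain = None, 0
        for name, meta in registry.items():
            if name in selected:
                continue
            if meta["external"] and external_used >= BUDGET[risk_level]:
                continue  # budget exhausted for external capabilities
            gain = len(remaining & meta["outcomes"])
            if gain > best_gain:  # strict '>' keeps the earlier, preferred entry on ties
                best, best_gain = name, gain
        if best is None:
            break  # missing coverage: report it instead of discovering more
        selected.append(best)
        if registry[best]["external"]:
            external_used += 1
        remaining -= registry[best]["outcomes"]
    return selected, remaining
```

The loop ends as soon as `remaining` is empty, which mirrors `stop_discovery_when_coverage_is_complete`. A non-empty second return value corresponds to the `fallback.when_required_coverage_is_missing` branch.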
@@ -70,6 +70,7 @@ existing_project_files:
     - .cursorrules
     - CLAUDE.md
     - .github/copilot-instructions.md
+    - .cursor/rules/ai-execution-protocol.mdc
     - package_docs
     - framework_configs
   behavior:
@@ -78,7 +79,7 @@ existing_project_files:
     - treat_generated_or_old_docs_as_untrusted_until_verified
     - keep_protocol_rules_in_AGENTS_and_protocol_folder
     - use_framework_configs_as_technical_source_when_task_touches_framework
-    -
+    - duplicate_protocol_rules_across_ide_files_only_with_marked_integration
   conflict_order:
     - current_user_request
     - AGENTS_protocol_block
@@ -1,11 +1,14 @@
 id: fast_path
 type: agent_entrypoint
-version: 0.
+version: 0.4.0
 purpose: minimum_rules_to_start_any_task
 read_next:
   - router.yaml
   - route-packs.yaml
   - context-budget.yaml
+  - capability-router.yaml
+  - capability-gate.yaml
+  - intelligence-router.yaml
   - modes.yaml
 core_rules:
   - classify_risk_before_action
@@ -24,6 +27,10 @@ core_rules:
   - use_only_matching_memory_subjects
   - check_memory_update_result_after_task
   - use_selective_validation_by_blast_radius
+  - select_minimum_capability_set_before_loading_skills_or_mcps
+  - require_capability_plan_before_skill_mcp_or_remote_tool_use
+  - choose_intelligence_level_proportional_to_risk_and_complexity
+  - follow_behavioral_execution_contract
 risk_short:
   level_0: answer_only
   level_1: small_clear_reversible_isolated_change
@@ -0,0 +1,63 @@
+id: intelligence_router
+type: operational_rules
+version: 0.4.0
+purpose: choose_model_reasoning_and_effort_proportional_to_task_need
+principle: use_the_cheapest_sufficient_intelligence_without_trading_correctness
+levels:
+  minimal:
+    use_when:
+      - level_0_direct_answer
+      - no_current_external_data_needed
+      - no_file_change
+    model_need: low_cost_fast
+    reasoning_depth: low
+    tools: none
+  standard:
+    use_when:
+      - level_1_small_change
+      - focused_file_read
+      - simple_validation
+    model_need: default
+    reasoning_depth: medium
+    tools: local_only
+  deep:
+    use_when:
+      - level_2_flow_bug
+      - refactor
+      - ambiguous_impact
+      - failed_first_validation
+    model_need: stronger_or_more_reasoning
+    reasoning_depth: high
+    tools: selected_local_or_targeted_remote
+  critical:
+    use_when:
+      - level_3_data_auth_security_deploy_publish_destructive
+      - high_blast_radius
+      - irreversible_or_external_side_effect
+    model_need: strongest_available_for_task
+    reasoning_depth: high_with_audit
+    tools: least_privilege_confirmed
+escalate_when:
+  - risk_level_increases
+  - ambiguity_blocks_safe_action
+  - validation_fails
+  - context_conflict_detected
+  - external_current_data_is_required
+  - specialized_modality_is_required
+deescalate_when:
+  - task_is_direct_answer
+  - no_code_or_external_state_needed
+  - validation_plan_is_trivial
+  - previous_high_risk_assumption_is_not_supported_by_evidence
+never_trade:
+  - security
+  - correctness
+  - required_validation
+  - explicit_user_scope
+delivery:
+  include_when_level_2_or_3:
+    - intelligence_level
+    - escalation_reason_if_any
+    - why_lower_level_was_not_enough
+  omit_for_level_0:
+    - model_discussion_unless_user_asks
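The intelligence router above pairs each risk level with an intelligence level and lists signals that justify escalating. A small Python sketch of that mapping follows. The one-step escalation and the function name are assumptions; the YAML names the signals but does not say how far to escalate.

```python
# Risk level -> intelligence level, following the use_when entries above.
LEVEL_BY_RISK = {0: "minimal", 1: "standard", 2: "deep", 3: "critical"}
ORDER = ["minimal", "standard", "deep", "critical"]

# Signal names copied from escalate_when.
ESCALATE_WHEN = {
    "risk_level_increases",
    "ambiguity_blocks_safe_action",
    "validation_fails",
    "context_conflict_detected",
    "external_current_data_is_required",
    "specialized_modality_is_required",
}


def choose_intelligence(risk_level, signals=()):
    """Start at the cheapest sufficient level; go one step up on a known signal."""
    level = LEVEL_BY_RISK[risk_level]
    if level != "critical" and any(s in ESCALATE_WHEN for s in signals):
        level = ORDER[ORDER.index(level) + 1]  # assumption: escalate a single step
    return level
```

Unknown signals are ignored rather than treated as a reason to escalate, which keeps the default aligned with `use_the_cheapest_sufficient_intelligence_without_trading_correctness`.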
@@ -1,6 +1,6 @@
 id: route_packs
 type: route_summary_index
-version: 0.
+version: 0.4.0
 purpose: compact_first_read_before_full_route_files
 principle: read_pack_first_expand_only_when_needed
 use:
@@ -120,10 +120,12 @@ packs:
       - run_post_deploy_check_if_executed
   evaluate_response:
     read_if_pack_insufficient:
+      - ../behavior/contract.yaml
       - ../eval/rubric.yaml
       - ../schema/evaluated-response.schema.json
     do:
       - score_risk_behavior_avoidance_delivery_clarity
+      - check_behavior_contract_alignment
       - apply_automatic_fail_rules
   create_or_edit_yaml:
     read_if_pack_insufficient:
@@ -182,3 +184,49 @@ packs:
       - infer_checks_from_changed_files
       - run_smallest_sufficient_validation
       - expand_when_shared_contract_changes
+  capability_selection:
+    risk: adaptive
+    read_if_pack_insufficient:
+      - capability-router.yaml
+      - capability-gate.yaml
+      - context-budget.yaml
+    do:
+      - define_required_outcomes_and_operations
+      - select_smallest_available_capability_set
+      - load_only_selected_skill_or_mcp
+      - require_confirmation_for_sensitive_remote_effect
+      - audit_used_capabilities_against_selected_plan
+      - stop_discovery_when_quality_coverage_is_complete
+  intelligence_selection:
+    risk: adaptive
+    read_if_pack_insufficient:
+      - intelligence-router.yaml
+      - context-budget.yaml
+    do:
+      - choose_cheapest_sufficient_intelligence_level
+      - escalate_for_risk_ambiguity_validation_failure_or_large_context
+      - deescalate_when_task_is_direct_and_low_risk
+      - do_not_trade_security_correctness_or_validation_for_cost
+  behavior_evaluation:
+    risk: 1
+    read_if_pack_insufficient:
+      - ../behavior/contract.yaml
+      - ../behavior/audit-checklist.yaml
+      - ../eval/rubric.yaml
+    do:
+      - compare_response_to_observable_behaviors
+      - verify_simple_tasks_are_not_overprocessed
+      - verify_critical_tasks_are_not_undercontrolled
+      - apply_behavior_automatic_fail_rules
+  dataset_preparation:
+    risk: 1
+    read_if_pack_insufficient:
+      - ../behavior/contract.yaml
+      - ../behavior/audit-checklist.yaml
+      - prompt-economy.yaml
+      - ../dataset/README.md
+    do:
+      - create_examples_from_observable_behavior
+      - include_good_bad_and_reason
+      - keep_training_examples_consistent
+      - avoid_rewarding_bureaucracy
@@ -1,6 +1,6 @@
 id: protocol_router
 type: read_router
-version: 0.
+version: 0.4.0
 purpose: choose_minimum_protocol_files_by_task
 default_read:
   - fast-path.yaml
@@ -76,8 +76,17 @@ routes:
   evaluate_response:
     read:
       - fast-path.yaml
+      - ../behavior/contract.yaml
       - ../eval/rubric.yaml
       - ../schema/evaluated-response.schema.json
+  behavior_evaluation:
+    risk: 1
+    read:
+      - fast-path.yaml
+      - ../behavior/contract.yaml
+      - ../behavior/audit-checklist.yaml
+      - ../eval/rubric.yaml
+      - ../dataset/README.md
   create_or_edit_yaml:
     read:
       - fast-path.yaml
@@ -113,6 +122,27 @@ routes:
     read:
       - fast-path.yaml
       - selective-validation.yaml
+  capability_selection:
+    risk: adaptive
+    read:
+      - fast-path.yaml
+      - capability-router.yaml
+      - capability-gate.yaml
+      - context-budget.yaml
+  intelligence_selection:
+    risk: adaptive
+    read:
+      - fast-path.yaml
+      - intelligence-router.yaml
+      - context-budget.yaml
+  dataset_preparation:
+    risk: 1
+    read:
+      - fast-path.yaml
+      - ../behavior/contract.yaml
+      - ../behavior/audit-checklist.yaml
+      - prompt-economy.yaml
+      - ../dataset/README.md
 rules:
   - start_with_default_read
   - choose_one_route_if_task_type_is_clear
@@ -120,6 +150,10 @@ rules:
   - expand_from_route_pack_only_when_needed
   - apply_context_budget_to_selected_route
   - retrieve_only_matching_memory_subjects
+  - select_capabilities_before_loading_skill_or_connecting_mcp
+  - require_capability_gate_before_invocation
+  - route_model_or_reasoning_effort_by_risk_and_complexity
+  - use_behavior_contract_when_task_is_about_adherence_dataset_or_training
   - if_route_unclear_read_risk_levels_then_choose_route
   - do_not_read_docs_unless_protocol_is_insufficient
   - do_not_read_cases_unless_testing_or_comparing_behavior
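The router rules above (`start_with_default_read`, `choose_one_route_if_task_type_is_clear`) can be read as a simple lookup. The Python sketch below hard-codes three of the routes added in this version; the function name and the choice to return only the default read for an unknown route are assumptions, and the risk-levels fallback named in `if_route_unclear_read_risk_levels_then_choose_route` is left to the caller because its file is not shown in the diff.

```python
DEFAULT_READ = ["fast-path.yaml"]

# Read lists copied from the routes added in 0.4.0.
ROUTES = {
    "capability_selection": [
        "fast-path.yaml",
        "capability-router.yaml",
        "capability-gate.yaml",
        "context-budget.yaml",
    ],
    "intelligence_selection": [
        "fast-path.yaml",
        "intelligence-router.yaml",
        "context-budget.yaml",
    ],
    "behavior_evaluation": [
        "fast-path.yaml",
        "../behavior/contract.yaml",
        "../behavior/audit-checklist.yaml",
        "../eval/rubric.yaml",
        "../dataset/README.md",
    ],
}


def files_to_read(task_type=None):
    """Return the ordered, de-duplicated read list for at most one route."""
    reads = DEFAULT_READ + ROUTES.get(task_type, [])
    return list(dict.fromkeys(reads))  # preserves order, drops repeats
```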
@@ -0,0 +1,41 @@
+# 00 - Overview
+
+## Core idea
+
+An AI that works on code, projects, or technical tasks should not merely
+obey the literal prompt. It should understand the real intent, classify the
+risk, fetch the minimum context, execute within a controlled scope, and deliver
+with evidence.
+
+The goal is not to create a bigger prompt. The goal is to create a decision
+method.
+
+## Working name
+
+Safe Execution Protocol for AI (Protocolo de Execucao Segura para IA).
+
+## Problem it solves
+
+Human requests tend to be short, informal, or incomplete. Without a method, the AI
+may change files before understanding the impact, read too much context, treat
+a critical task as simple, or deliver without proving what it validated.
+
+## Layers
+
+1. Philosophy: interpret, reduce risk, and execute with evidence.
+2. Process: understand, classify risk, fetch context, map, change,
+   validate, and deliver.
+3. Operation: use templates and short YAML rules to guide agents.
+4. Context: compile only the minimum necessary package before reasoning.
+
+## Main principle
+
+The process must be proportional to the risk.
+
+A simple task should be fast. A task involving a database, authentication, security,
+deployment, real data, or destructive commands requires a critical map, confirmation, and
+strong validation.
+
+The conversation should not be the source of truth. The source of truth should be the
+current state verified through `INDEX.yaml`, `config.yaml`, the installed protocol, and the files
+read in the current task.
@@ -0,0 +1,25 @@
+# 01 - Execution Model
+
+## Core idea
+
+The AI should turn a human request into a clear technical task and choose the
+shortest safe path.
+
+## Flow
+
+1. Understand the objective.
+2. Identify the likely area.
+3. Assess the risk.
+4. Fetch the minimum sufficient context.
+5. Plan the smallest safe change.
+6. Execute.
+7. Validate.
+8. Explain the result, limits, and pending items.
+
+## When to show the interpretation
+
+Show the technical interpretation when there is ambiguity, risk, multiple
+interpretations, or an explicit request.
+
+For simple tasks, solving directly is usually better.
+
@@ -0,0 +1,62 @@
+# 02 - Risk Levels
+
+## Core idea
+
+The amount of context and planning should track the risk.
+
+## Level 0
+
+Simple answer. Changes no file and performs no sensitive action.
+
+## Level 1
+
+Fast path. The task is clear, small, reversible, and isolated.
+
+## Level 2
+
+Impact map. Use it when there is relevant behavior, more than one file,
+ambiguity, or user-facing impact.
+
+## Level 3
+
+Critical map. Use it when the task involves real data, security, authentication,
+permissions, a database, deployment, critical integrations, or destructive commands.
+
+## Escalation rule
+
+Start small. Raise the level when evidence of risk appears. Never lower the
+level by ignoring risk that has already been discovered.
+
+## Proportional classification
+
+Do not treat every multi-step task as level 3.
+
+Start at the lowest safe level for the known scope. Raise it only when
+concrete evidence appears: real data, authentication, permissions, a database,
+deployment, a secret, a destructive command, or uncertain impact on an existing flow.
+
+If the task can be split, classify each subtask by its own scope.
+A critical part can stay blocked at level 3 while a safe part
+proceeds as level 1 or 2.
+
+This also saves context: the AI does not need to load critical context for
+a safe subtask, but it must keep the critical risk on record while it
+remains part of the original request.
+
+## Blocked critical action
+
+If a task was classified as level 3 and the critical part cannot be
+executed, do not lower the risk of the original task just to save
+context.
+
+Mark the critical part as blocked or pending. Then, if there are safe
+and useful parts, split them off as smaller subtasks and classify each
+one by its own scope.
+
+Example: if the request is to publish to production but there is no deploy access, the
+deploy remains a blocked level 3 item. Preparing the README, reviewing `.gitignore`, or
+drafting the publish commands can be handled as a safe subtask with
+less context.
+
+Only reduce the scope when the critical action is explicitly removed from the request
+or when new evidence proves that the critical risk does not apply.
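The level definitions and the escalation rule in the risk-levels document above can be sketched in Python. The evidence tags, the boolean inputs, and both function names are assumptions chosen for illustration; the document describes the rule in prose only.

```python
# Evidence that forces level 3, taken from the "Level 3" description above.
CRITICAL_EVIDENCE = {
    "real_data", "security", "authentication", "permissions",
    "database", "deploy", "critical_integration", "destructive_command",
}


def classify_risk(evidence, changes_files, broad_or_ambiguous):
    """Return the lowest level (0-3) that is safe for the known scope."""
    if set(evidence) & CRITICAL_EVIDENCE:
        return 3
    if broad_or_ambiguous:  # relevant behavior, several files, ambiguity, user impact
        return 2
    if changes_files:       # clear, small, reversible, isolated change
        return 1
    return 0                # answer only


def next_level(current, observed, risk_ruled_out=False):
    """Raise freely; lower only when the discovered risk was removed or disproved."""
    return observed if risk_ruled_out else max(current, observed)
```

Applied to the production-publish example, the deploy subtask stays at `classify_risk({"deploy"}, True, False)`, which is 3, while drafting the README is classified separately at level 1.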
@@ -0,0 +1,48 @@
+# 03 - Mapping Before Changing
+
+## Core idea
+
+Before modifying anything, the AI must know what it is changing, why it is
+changing it, and how to prove that the change worked.
+
+## Minimal map
+
+Use for simple tasks:
+
+- objective;
+- affected area;
+- candidate files;
+- risk;
+- plan;
+- validation.
+
+## Impact map
+
+Use for medium tasks:
+
+- real objective;
+- technical flow;
+- candidate files;
+- risks and side effects;
+- out of scope;
+- change plan;
+- mental rollback;
+- expected validation.
+
+## Critical map
+
+Use for sensitive tasks:
+
+- affected data;
+- permissions;
+- security surface;
+- critical risks;
+- safe plan;
+- required confirmation;
+- rollback.
+
+## Practical rule
+
+Do not edit a file that was not identified as a candidate. If a new file
+becomes necessary, update the map before editing.
+
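The practical rule at the end of the mapping document above (edit only mapped candidates, update the map first otherwise) can be enforced with a small guard. The `ChangeMap` class and its method names are hypothetical; this is a sketch of the rule, not code shipped in the package.

```python
class ChangeMap:
    """Minimal map: the objective plus the files that are allowed to change."""

    def __init__(self, objective, candidates):
        self.objective = objective
        # Each candidate keeps the reason it was added to the map.
        self.candidates = {path: "initial mapping" for path in candidates}

    def add_candidate(self, path, reason):
        # Update the map first, recording why the new file became necessary.
        self.candidates[path] = reason

    def check_edit(self, path):
        # Refuse edits to files that were never mapped as candidates.
        if path not in self.candidates:
            raise PermissionError(
                f"{path} is not a mapped candidate; update the map first"
            )
        return True
```

An agent loop would call `check_edit` before every write, so an unplanned file surfaces as an explicit map update instead of a silent scope expansion.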
@@ -0,0 +1,56 @@
+# 04 - Context Window
+
+## Core idea
+
+The context window is the limit on how much information the AI can consider in a
+conversation or execution.
+
+It includes messages, instructions, files read, outputs, logs, diffs, and
+summaries.
+
+## Risk
+
+Too much context increases cost and can make the AI mix old decisions with
+new information.
+
+## Good practices
+
+- Keep each file to at most 400 lines.
+- Read excerpts instead of whole files when possible.
+- Use text search to locate the relevant part.
+- Avoid repeating long rules in the chat.
+- Separate history from current state.
+- Treat the conversation as an interface, not as the source of truth.
+- Use `protocol/context-compiler.yaml` to assemble a minimal context package
+  before large tasks or tasks with a confusing history.
+- Open a new conversation when continuity becomes risky.
+
+## Recommended MVP
+
+Before vector databases or a full graph, use:
+
+- `canonical-state.yaml` as the summarized current state;
+- `context-map.yaml` as the map of domains and aliases;
+- `decisions/` for active decisions;
+- `INDEX.yaml` as the map;
+- `config.yaml` as the current state;
+- `protocol/router.yaml` as the rule selector;
+- text search to locate candidate files;
+- relevant excerpts instead of whole files;
+- a handoff summary when the conversation gets long.
+
+Cost target: reduce unnecessary context by up to 90% when doing so does not
+compromise security, risk classification, accuracy, or validation.
+
+This target measures irrelevant context avoided. It does not authorize ignoring a file,
+dependency, or validation that is needed.
+
+Use RAG, a knowledge graph, or a semantic cache only when the project has
+enough real volume to justify the cost.
+
+## File limit
+
+No file should exceed 400 lines.
+
+When a file approaches that limit, split it by subject and keep a
+short index.
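The 400-line file limit stated in the context-window document above is easy to check mechanically. The Python sketch below walks a directory and reports files over the limit. The function name and the default suffix filter are assumptions; the document applies the limit to every file and does not describe a checker.

```python
from pathlib import Path

MAX_LINES = 400  # limit stated in the context-window document


def oversized_files(root, limit=MAX_LINES, suffixes=(".md", ".yaml", ".yml")):
    """Return (path, line_count) for each matching file longer than `limit`."""
    results = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            count = len(path.read_text(encoding="utf-8").splitlines())
            if count > limit:
                results.append((str(path), count))
    return results
```

Running it from a pre-publish step and failing on a non-empty result would turn the guideline into a gate; files that come close to the limit are the ones to split by subject.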