direxio-deployer 0.1.0

Files changed (77)
  1. package/AGENTS.md +92 -0
  2. package/LICENSE +21 -0
  3. package/README.md +221 -0
  4. package/README_zh.md +218 -0
  5. package/SKILL.md +722 -0
  6. package/agents/README.md +25 -0
  7. package/agents/openai.yaml +12 -0
  8. package/bin/direxio-deployer.mjs +375 -0
  9. package/package.json +28 -0
  10. package/references/agent-targets.md +128 -0
  11. package/references/architecture.md +44 -0
  12. package/references/bug-history.md +78 -0
  13. package/references/deployment-lessons.md +218 -0
  14. package/references/deployment-optimization-audit.md +317 -0
  15. package/references/deployment-workflow.md +341 -0
  16. package/references/iam-policy.json +52 -0
  17. package/references/runtime-wiring.md +209 -0
  18. package/references/state-machine.md +46 -0
  19. package/references/token-refresh.md +81 -0
  20. package/references/tooling.md +106 -0
  21. package/references/troubleshooting.md +26 -0
  22. package/references/user-journey.md +75 -0
  23. package/references/verification-recovery.md +84 -0
  24. package/references/voip-turn-runbook.md +154 -0
  25. package/references/windows-deployment-notes.md +119 -0
  26. package/scripts/aws-credentials.sh +195 -0
  27. package/scripts/cloud-init/Caddyfile +48 -0
  28. package/scripts/cloud-init/docker-compose.yml +125 -0
  29. package/scripts/cloud-init/init-tokens.sh +238 -0
  30. package/scripts/cloud-init/user-data.yaml +40 -0
  31. package/scripts/destroy.ps1 +77 -0
  32. package/scripts/destroy.sh +589 -0
  33. package/scripts/lib/aws.sh +73 -0
  34. package/scripts/lib/domain.sh +175 -0
  35. package/scripts/lib/operation_report.sh +240 -0
  36. package/scripts/lib/ops.sh +230 -0
  37. package/scripts/lib/paths.sh +35 -0
  38. package/scripts/lib/state.sh +137 -0
  39. package/scripts/mcp-tools-list.mjs +95 -0
  40. package/scripts/orchestrate.ps1 +112 -0
  41. package/scripts/orchestrate.sh +1126 -0
  42. package/scripts/phases/s0_prereq_aws.sh +39 -0
  43. package/scripts/phases/s1_preflight.sh +72 -0
  44. package/scripts/phases/s2_domain.sh +103 -0
  45. package/scripts/phases/s3_provision.sh +421 -0
  46. package/scripts/phases/s4_bootstrap_stack.sh +38 -0
  47. package/scripts/phases/s5_init_tokens.sh +118 -0
  48. package/scripts/phases/s6_wire_local.sh +1435 -0
  49. package/scripts/phases/s7_verify_e2e.sh +136 -0
  50. package/scripts/pricing-estimate.sh +256 -0
  51. package/scripts/render/render-userdata.sh +86 -0
  52. package/scripts/reset-app-data.sh +40 -0
  53. package/scripts/update.sh +30 -0
  54. package/tests/aws_credentials_test.sh +139 -0
  55. package/tests/connect_daemon_runtime_check_test.sh +120 -0
  56. package/tests/default_paths_test.sh +58 -0
  57. package/tests/destroy_local_bridge_test.sh +154 -0
  58. package/tests/destroy_root_identity_test.sh +91 -0
  59. package/tests/destroy_route53_zone_test.sh +80 -0
  60. package/tests/domain_authoritative_dns_test.sh +49 -0
  61. package/tests/mcp_doctor_runtime_check_test.sh +86 -0
  62. package/tests/mcp_smoke_runtime_check_test.sh +121 -0
  63. package/tests/mcp_tools_runtime_check_test.sh +123 -0
  64. package/tests/npm_skill_distribution_test.sh +95 -0
  65. package/tests/operation_report_test.sh +258 -0
  66. package/tests/orchestrate_status_recovery_test.sh +91 -0
  67. package/tests/phase_timeout_test.sh +88 -0
  68. package/tests/pricing_estimate_test.sh +159 -0
  69. package/tests/render_userdata_remote_nodes_test.sh +40 -0
  70. package/tests/root_volume_tracking_test.sh +41 -0
  71. package/tests/route53_overwrite_guard_test.sh +86 -0
  72. package/tests/route53_zone_auto_create_test.sh +66 -0
  73. package/tests/runtime_summary_check_test.sh +203 -0
  74. package/tests/s6_wire_local_test.sh +405 -0
  75. package/tests/skill_structure_test.sh +298 -0
  76. package/tests/update_reset_ops_test.sh +230 -0
  77. package/tests/user_confirmation_gates_test.sh +152 -0
package/references/deployment-workflow.md
@@ -0,0 +1,341 @@
1
+ # Deployment Workflow
2
+
3
+ ## Preflight
4
+
5
+ 1. Confirm `DOMAIN`, `DOMAIN_MODE`, and `CONFIRM_DOMAIN_BINDING=1`.
6
+ 2. Confirm AWS region, credentials, billing, instance type, and costs.
7
+ 3. Check dependencies with the OS-specific commands in `tooling.md`.
8
+ 4. Check current DNS provider. Prefer `DOMAIN_MODE=route53` when the user
9
+ confirms AWS may manage the hosted zone and A record. Use `DOMAIN_MODE=user`
10
+ only as a fallback when no DNS provider automation is available.
11
+ 5. Check state:
12
+
13
+ ```bash
14
+ bash scripts/orchestrate.sh status
15
+ DOMAIN=__DOMAIN__ bash scripts/orchestrate.sh status
16
+ ```
17
+
18
+ If the state already records resources, require exactly one of the following:
19
+
20
+ ```bash
21
+ P2P_EXISTING_STATE_ACTION=continue
22
+ P2P_EXISTING_STATE_ACTION=destroy
23
+ DOMAIN=<different-domain>
24
+ ```
25
+
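The choice above can be checked before the orchestrator is called. This is a minimal sketch, assuming `continue` and `destroy` are the only accepted values, as listed; `validate_state_action` is a hypothetical helper name and is not part of the package.

```bash
#!/usr/bin/env bash
# Hypothetical helper: accept only the two documented actions.
validate_state_action() {
  case "${1:-}" in
    continue|destroy)
      return 0
      ;;
    *)
      echo "set P2P_EXISTING_STATE_ACTION to continue or destroy, or choose a different DOMAIN" >&2
      return 1
      ;;
  esac
}

# Example usage (commented out so the sketch has no side effects):
# validate_state_action "${P2P_EXISTING_STATE_ACTION:-}" && bash scripts/orchestrate.sh
```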
26
+ For first-time credentials, import the selected AWS access-key CSV and verify
27
+ the identity before provisioning. A temporary `DirexioDeployer` IAM user is the
28
+ recommended routine path, but root access keys are allowed when the operator
29
+ explicitly chose them:
30
+
31
+ ```bash
32
+ bash scripts/aws-credentials.sh import-csv /path/to/accessKeys.csv direxio-deployer <region>
33
+ export AWS_PROFILE=direxio-deployer
34
+ bash scripts/aws-credentials.sh verify direxio-deployer
35
+ ```
36
+
37
+ Before the first mutating AWS phase, produce a monthly estimate for the selected
38
+ region and instance size:
39
+
40
+ ```bash
41
+ bash scripts/pricing-estimate.sh \
42
+ --region <region> \
43
+ --instance-type t3.small \
44
+ --disk-gb 8 \
45
+ --domain-mode user
46
+ ```
47
+
48
+ When `state.json` already exists, refresh and persist the same estimate:
49
+
50
+ ```bash
51
+ bash scripts/pricing-estimate.sh --state ~/.direxio/nodes/<service_id>/state.json --write-state
52
+ ```
53
+
54
+ `scripts/orchestrate.sh` also writes `cost_estimate` automatically after region
55
+ selection and refreshes it during phase S3 (provisioning) once the final EC2 instance type is known.
56
+ The estimate includes EC2, gp3 storage, public IPv4, and Route53 hosted-zone
57
+ cost when applicable. It excludes data transfer, TURN relay traffic, domain
58
+ registration, taxes, and AWS credits. Credit coverage is not guaranteed; verify
59
+ credits and actual charges in AWS Billing Console, and set an AWS Budget or
60
+ billing alert before leaving the node running.
61
+
62
+ ## Destroy
63
+
64
+ From the repository root:
65
+
66
+ ```bash
67
+ DOMAIN=__DOMAIN__ bash scripts/destroy.sh
68
+ ```
69
+
70
+ On Windows, run destroy from PowerShell so local service paths stay in Windows
71
+ form and the wrapper selects Git for Windows Bash instead of WSL:
72
+
73
+ ```powershell
74
+ $env:DOMAIN = "__DOMAIN__"
75
+ .\scripts\destroy.ps1
76
+ ```
77
+
78
+ Destroy stops and uninstalls the local `direxio-connect` daemon only when `direxio-connect daemon status --service-name <service_id>` reports a `WorkDir` matching the current service directory, `~/.direxio/nodes/<service_id>/cc-connect`. It then terminates the recorded EC2 instance, verifies the recorded EBS root volume, releases the Elastic IP, deletes the security group and key pair, removes Route53 records/zones created by the deployer, records AWS read-back results under `destroy.evidence`, and removes the corresponding local service directory under `~/.direxio/nodes/<service_id>`. This prevents stale credentials, `state.json` files, and local service registrations from being treated as active deployments later while preserving an audit report for cleanup.
79
+
80
+ Destroy allows root AWS access-key identity when the operator explicitly chose
81
+ root credentials. Use the same deployment profile for teardown that was used
82
+ for provisioning.
83
+
84
+ Use `P2P_KEEP_WORKDIR=1 DOMAIN=__DOMAIN__ bash scripts/destroy.sh` on POSIX, or set `$env:P2P_KEEP_WORKDIR = "1"` before `.\scripts\destroy.ps1` on Windows, only when preserving local state files for debugging; if used, report that the service directory still exists.
85
+
86
+ ## Run
87
+
88
+ From the repository root:
89
+
90
+ ```bash
91
+ AWS_PROFILE=p2p-matrix \
92
+ AWS_DEFAULT_REGION=us-east-1 \
93
+ DOMAIN=__DOMAIN__ \
94
+ DOMAIN_MODE=user \
95
+ CONFIRM_DOMAIN_BINDING=1 \
96
+ INSTANCE_TYPE=t3.small \
97
+ MESSAGE_SERVER_IMAGE=direxio/message-server:latest \
98
+ bash scripts/orchestrate.sh
99
+ ```
100
+
101
+ Exit codes:
102
+
103
+ - `0`: deployment complete.
104
+ - `1`: phase failed; inspect logs and rerun or destroy.
105
+ - `2`: waiting for user/external action, usually DNS or credentials.
106
+
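A wrapper script can branch on these exit codes. The sketch below is illustrative only: `describe_orchestrate_exit` is a hypothetical helper name, and its messages paraphrase the list above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map an orchestrate.sh exit code to its documented meaning.
describe_orchestrate_exit() {
  case "$1" in
    0) echo "complete" ;;
    1) echo "failed: inspect logs, then rerun or destroy" ;;
    2) echo "waiting: user or external action needed (usually DNS or credentials)" ;;
    *) echo "unknown exit code: $1" ;;
  esac
}

# Example usage (commented out so the sketch has no side effects):
# bash scripts/orchestrate.sh; describe_orchestrate_exit "$?"
```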
107
+ ## Operation Report
108
+
109
+ New deploys write a redacted machine-readable report to:
110
+
111
+ ```text
112
+ ~/.direxio/nodes/<service_id>/operation-report.json
113
+ ```
114
+
115
+ Regenerate it from current state with:
116
+
117
+ ```bash
118
+ DOMAIN=__DOMAIN__ bash scripts/orchestrate.sh report new_deploy
119
+ ```
120
+
121
+ Destroy writes its audit report outside the service directory because the
122
+ service directory is removed:
123
+
124
+ ```text
125
+ ~/.direxio/reports/<service_id>/operation-report.json
126
+ ```
127
+
128
+ Reports include operation type, S0-S7 gate status, user-confirmation gates,
129
+ credential/config paths, cc-connect/MCP metadata, AWS resource IDs, billing
130
+ reminders, `billing.cost_estimate`, destroy read-back evidence under
131
+ `destroy.evidence` when applicable, and redaction evidence. They must not
132
+ contain the initialization code, AWS secrets, access tokens, agent tokens, or
133
+ Matrix session tokens. User/runtime evidence is also scrubbed for
134
+ eight-or-more digit numeric strings because users may paste initialization
135
+ codes into confirmation notes. After update/reset, the report must show
136
+ `credentials.status=refresh_pending`, `connect.install_status=refresh_pending`,
137
+ and `mcp.status=refresh_pending` until S5/S6/S7 and runtime checks refresh
138
+ local evidence.
139
+
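The scrub of eight-or-more digit strings described above can be approximated with a single `sed` substitution. This is a sketch of the idea, not the deployer's actual redaction code; the function name `scrub_long_digits` and the `[REDACTED]` marker are assumptions.

```bash
#!/usr/bin/env bash
# Hypothetical helper: replace every run of 8 or more digits read from stdin.
# Shorter digit runs (for example a 7-digit number) pass through unchanged.
scrub_long_digits() {
  sed -E 's/[0-9]{8,}/[REDACTED]/g'
}

# Example usage:
# printf 'user pasted 12345678 into the note\n' | scrub_long_digits
```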
140
+ When the user or runtime evidence confirms a manual product gate, write it back
141
+ to state before regenerating the report. Connect daemon status is a
142
+ service-scoped local bridge check, MCP doctor is a non-polluting runtime check,
143
+ MCP tools is stdio `tools/list` discovery, and MCP smoke is a read-only backend
144
+ call. They are not the full runtime product gate:
145
+
146
+ ```bash
147
+ DOMAIN=__DOMAIN__ bash scripts/orchestrate.sh verify runtime
148
+
149
+ DOMAIN=__DOMAIN__ bash scripts/orchestrate.sh verify connect_daemon
150
+ DOMAIN=__DOMAIN__ bash scripts/orchestrate.sh verify mcp_doctor
151
+ DOMAIN=__DOMAIN__ bash scripts/orchestrate.sh verify mcp_tools
152
+ DOMAIN=__DOMAIN__ bash scripts/orchestrate.sh verify mcp_smoke
153
+
154
+ DIREXIO_CONFIRM_EVIDENCE="user completed app initialization" \
155
+ DOMAIN=__DOMAIN__ bash scripts/orchestrate.sh confirm app_initialization
156
+
157
+ DIREXIO_CONFIRM_EVIDENCE="user sent a message and saw the agent reply" \
158
+ DOMAIN=__DOMAIN__ bash scripts/orchestrate.sh confirm real_chat
159
+
160
+ DIREXIO_CONFIRM_RUNTIME_PROBE=1 \
161
+ DIREXIO_CONFIRM_EVIDENCE="MCP doctor/tool discovery and runtime probe confirmed" \
162
+ DOMAIN=__DOMAIN__ bash scripts/orchestrate.sh confirm agent_mcp_runtime
163
+
164
+ DOMAIN=__DOMAIN__ bash scripts/orchestrate.sh report new_deploy
165
+ ```
166
+
167
+ `confirm agent_mcp_runtime` refuses to write the gate until
168
+ `runtime_checks.summary.status=passed` and `DIREXIO_CONFIRM_RUNTIME_PROBE=1`
169
+ are both present. Use the flag only after the selected runtime/channel probe has
170
+ actually loaded the service-scoped MCP tools; `verify runtime` alone is an
171
+ internal non-polluting check, not the full product gate.
172
+ All `confirm` commands require `DIREXIO_CONFIRM_EVIDENCE` with a concrete
173
+ user/runtime evidence note; do not write user-confirmation gates with generic
174
+ default evidence. The evidence note must be at least 12 characters; avoid
175
+ placeholders such as `ok`, `yes`, or `done`.
176
+
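The evidence rules can be pre-checked before `confirm` is invoked. This is a minimal sketch, assuming only the 12-character minimum and the placeholder examples stated above; `evidence_is_concrete` is a hypothetical name, and the real script may apply different or additional checks.

```bash
#!/usr/bin/env bash
# Hypothetical pre-check for a DIREXIO_CONFIRM_EVIDENCE note.
evidence_is_concrete() {
  local note="${1:-}"
  # Reject the documented placeholders explicitly, even though they
  # would also fail the length check below.
  case "$note" in
    ok|yes|done) return 1 ;;
  esac
  # Reject notes shorter than the documented 12-character minimum.
  [ "${#note}" -ge 12 ] || return 1
  return 0
}

# Example usage (commented out so the sketch has no side effects):
# evidence_is_concrete "$DIREXIO_CONFIRM_EVIDENCE" || echo "evidence note too generic" >&2
```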
177
+ ## Existing Node Update
178
+
179
+ Update the running service image without recreating infrastructure or deleting
180
+ data:
181
+
182
+ ```bash
183
+ DOMAIN=__DOMAIN__ MESSAGE_SERVER_IMAGE=direxio/message-server:latest bash scripts/update.sh
184
+ P2P_EXISTING_STATE_ACTION=continue DOMAIN=__DOMAIN__ bash scripts/orchestrate.sh
185
+ ```
186
+
187
+ `update.sh` SSHes to the recorded EC2 instance, runs Docker Compose pull/up,
188
+ reruns `/opt/p2p/init-tokens.sh`, clears stale local secret fields, stops only
189
+ the matching service-scoped direxio-connect daemon when its `WorkDir` matches
190
+ this service, and marks S4-S7 pending so health, credential sync, local
191
+ MCP/agent wiring, and final verification run again. It does not remove Docker
192
+ volumes.
193
+
194
+ ## Existing Node App Data Reset
195
+
196
+ Reset application data while preserving EC2, public IPv4/EIP, DNS, and Caddy
197
+ TLS volumes:
198
+
199
+ ```bash
200
+ DIREXIO_RESET_APP_DATA_CONFIRM=1 DOMAIN=__DOMAIN__ bash scripts/reset-app-data.sh
201
+ P2P_EXISTING_STATE_ACTION=continue DOMAIN=__DOMAIN__ bash scripts/orchestrate.sh
202
+ ```
203
+
204
+ `reset-app-data.sh` removes only `postgres-data`, `message-config`, and
205
+ `message-data`. It must not remove `caddy-data` or `caddy-config`; losing those
206
+ volumes can trigger certificate reissuance and Let's Encrypt rate limits. It
207
+ stops only the matching service-scoped direxio-connect daemon when its `WorkDir`
208
+ matches this service. After reset, treat old app users, rooms, messages,
209
+ initialization code, access token, agent token, and agent room as stale until
210
+ S5-S7 complete again.
211
+
212
+ ## S4 Bootstrap Timeout / Certificate Rate Limit Recovery
213
+
214
+ When S4 fails with `healthz did not return 200 before timeout`, the most
215
+ common cause is **Let's Encrypt certificate rate limiting** (at most 5
216
+ certificates per exact set of hostnames per 7 days). Caddy retries
217
+ automatically with backoff, but the orchestration script may time out first.
218
+
219
+ **How to check:**
220
+
221
+ ```bash
222
+ ssh -i <keyfile> ubuntu@<public-ip> \
223
+ 'sudo docker logs p2p-caddy-1 --tail 20 2>&1 | grep -i "rateLimit\|retry after\|429"'
224
+ ```
225
+
226
+ If rate-limited, the log shows `retry after <timestamp> UTC`.
227
+
228
+ **Recovery options:**
229
+
230
+ 1. **Wait for rate limit to expire** — Caddy retries in the background.
231
+ Check progress periodically:
232
+ ```bash
233
+ curl -skI https://<DOMAIN>/healthz # returns 200 when cert is ready
234
+ ```
235
+ Once the endpoint returns 200, re-run orchestrate.sh to complete:
236
+ ```bash
237
+ P2P_EXISTING_STATE_ACTION=continue \
238
+ DNS_READY=1 \
239
+ AWS_PROFILE=p2p-matrix \
240
+ AWS_DEFAULT_REGION=us-east-1 \
241
+ DOMAIN=<DOMAIN> \
242
+ DOMAIN_MODE=route53 \
243
+ CONFIRM_DOMAIN_BINDING=1 \
244
+ INSTANCE_TYPE=t3.small \
245
+ bash scripts/orchestrate.sh
246
+ ```
247
+
248
+ 2. **Use a different domain** — If you have multiple domains, destroy the
249
+ current deployment and deploy on a domain without recent cert history:
250
+ ```bash
251
+ bash scripts/destroy.sh
252
+ ```
253
+ On Windows, use `.\scripts\destroy.ps1` from PowerShell.
254
+ Then start again with a fresh domain.
255
+
256
+ 3. **Force Caddy staging CA** (development only) — Set the environment
257
+ variable `CADDY_ACME_CA=https://acme-staging-v02.api.letsencrypt.org/directory`
258
+ in the compose file to get a staging certificate. Staging certs are **not
259
+ trusted by browsers** — use only for testing.
260
+
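Recovery option 1 amounts to polling until the health endpoint answers 200. The loop below is a sketch under assumptions: `wait_for_healthz`, its default attempt count, and its default delay are illustrative and do not come from the deployer scripts. Unlike the manual check above, it omits `-k`, so an untrusted or missing certificate counts as not ready.

```bash
#!/usr/bin/env bash
# Hypothetical helper: poll https://<domain>/healthz until it returns HTTP 200.
# Usage: wait_for_healthz <domain> [attempts] [delay-seconds]
wait_for_healthz() {
  local domain="$1" attempts="${2:-30}" delay="${3:-60}"
  local i=1 code
  while [ "$i" -le "$attempts" ]; do
    # -s: silent, -o /dev/null: discard the body, -w: print only the status code.
    # curl prints 000 when the connection or TLS handshake fails.
    code="$(curl -s -o /dev/null -w '%{http_code}' "https://${domain}/healthz" || true)"
    if [ "$code" = "200" ]; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Example usage (commented out so the sketch has no side effects):
# wait_for_healthz "<DOMAIN>" 30 60 && echo "certificate ready, rerun orchestrate.sh"
```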
261
+ ## Route53 DNS Mode
262
+
263
+ With `DOMAIN_MODE=route53`, S3 reuses a matching hosted zone or creates one,
264
+ records the zone id and nameservers in `state.json`, upserts the A record, and
265
+ waits for DNS to resolve.
266
+
267
+ If the current Route53 A record already points to a different IP, S3 stops
268
+ before changing DNS and records `route53_existing_a_value` plus
269
+ `route53_pending_a_value` in state. Confirm the replacement only after checking
270
+ that the old IP is safe to replace:
271
+
272
+ ```bash
273
+ DIREXIO_CONFIRM_DNS_OVERWRITE=1 \
274
+ P2P_EXISTING_STATE_ACTION=continue \
275
+ DOMAIN=__DOMAIN__ \
276
+ DOMAIN_MODE=route53 \
277
+ CONFIRM_DOMAIN_BINDING=1 \
278
+ bash scripts/orchestrate.sh
279
+ ```
280
+
281
+ If the domain is registered outside Route53, delegate the recorded nameservers
282
+ at the current registrar or through a provider API:
283
+
284
+ ```bash
285
+ jq '.resources | {route53_zone_id, route53_zone_name, route53_name_servers}' ~/.direxio/nodes/<service_id>/state.json
286
+ ```
287
+
288
+ After authoritative DNS returns the new IP, continue with the same state:
289
+
290
+ ```bash
291
+ P2P_EXISTING_STATE_ACTION=continue \
292
+ DOMAIN=__DOMAIN__ \
293
+ DOMAIN_MODE=route53 \
294
+ CONFIRM_DOMAIN_BINDING=1 \
295
+ bash scripts/orchestrate.sh
296
+ ```
297
+
298
+ Destroy deletes deployer-created hosted zones when state records
299
+ `route53_zone_created_by_deployer=true`; pre-existing or user-owned zones are
300
+ left in place.
301
+
302
+ ## Manual DNS Mode
303
+
304
+ Use manual DNS mode only when no DNS provider automation is available. When S3
305
+ emits an Elastic IP, ask the user to set:
306
+
307
+ ```text
308
+ <DOMAIN> A <PUBLIC_IP>
309
+ ```
310
+
311
+ For Cloudflare, use DNS-only, not proxied. For Alibaba/HiChina, edit the A record in Alibaba Cloud DNS.
312
+
313
+ After authoritative DNS returns the new IP:
314
+
315
+ ```bash
316
+ DNS_READY=1 \
317
+ AWS_PROFILE=p2p-matrix \
318
+ AWS_DEFAULT_REGION=us-east-1 \
319
+ DOMAIN=__DOMAIN__ \
320
+ DOMAIN_MODE=user \
321
+ CONFIRM_DOMAIN_BINDING=1 \
322
+ INSTANCE_TYPE=t3.small \
323
+ MESSAGE_SERVER_IMAGE=direxio/message-server:latest \
324
+ P2P_EXISTING_STATE_ACTION=continue \
325
+ bash scripts/orchestrate.sh
326
+ ```
327
+
328
+ ## Initialization Code Field
329
+
330
+ The current backend delivers the user-facing eight-digit app initialization code in the `password` field, alongside a unified user `access_token` and an agent-only `agent_token`.
331
+
332
+ State fields after S5:
333
+
334
+ ```text
335
+ password
336
+ agent_token
337
+ access_token
338
+ as_url
339
+ ```
340
+
341
+ All fields are written to `~/.direxio/nodes/<service_id>/credentials.json` with mode `0600`.
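Writing a secrets file with mode `0600` can be done without a window in which the file is world-readable. The following is a sketch of one common approach, not the deployer's implementation; `write_private_file` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: write stdin to a file that only the owner can read or write.
write_private_file() {
  local path="$1"
  # The restrictive umask applies inside the subshell only, so a newly
  # created file never has group or other permissions.
  ( umask 077 && cat > "$path" ) || return 1
  # chmod also covers the case where the file already existed with wider permissions.
  chmod 600 "$path"
}

# Example usage (commented out so the sketch has no side effects):
# printf '{"profiles":{}}\n' | write_private_file "$HOME/.direxio/nodes/<service_id>/credentials.json"
```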
package/references/iam-policy.json
@@ -0,0 +1,52 @@
1
+ {
2
+ "Version": "2012-10-17",
3
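+ "Comment": "Minimum IAM permissions required for p2p-matrix one-click deployment. Attach this policy when creating the IAM user, then generate an AK/SK for that user and hand it to the agent. This is much safer than granting AdministratorAccess.",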
4
+ "Statement": [
5
+ {
6
+ "Sid": "Preflight",
7
+ "Effect": "Allow",
8
+ "Action": [
9
+ "sts:GetCallerIdentity",
10
+ "servicequotas:GetServiceQuota",
11
+ "ssm:GetParameters",
12
+ "ec2:DescribeVpcs"
13
+ ],
14
+ "Resource": "*"
15
+ },
16
+ {
17
+ "Sid": "EC2Deploy",
18
+ "Effect": "Allow",
19
+ "Action": [
20
+ "ec2:CreateKeyPair",
21
+ "ec2:DeleteKeyPair",
22
+ "ec2:CreateSecurityGroup",
23
+ "ec2:DeleteSecurityGroup",
24
+ "ec2:AuthorizeSecurityGroupIngress",
25
+ "ec2:RunInstances",
26
+ "ec2:TerminateInstances",
27
+ "ec2:DescribeInstances",
28
+ "ec2:DescribeImages",
29
+ "ec2:DescribeSecurityGroups",
30
+ "ec2:DescribeKeyPairs",
31
+ "ec2:CreateTags",
32
+ "ec2:AllocateAddress",
33
+ "ec2:AssociateAddress",
34
+ "ec2:ReleaseAddress",
35
+ "ec2:DescribeAddresses"
36
+ ],
37
+ "Resource": "*"
38
+ },
39
+ {
40
+ "Sid": "DnsOptionalOnlyWhenDomainHostedInRoute53",
41
+ "Effect": "Allow",
42
+ "Action": [
43
+ "route53:ListHostedZones",
44
+ "route53:CreateHostedZone",
45
+ "route53:DeleteHostedZone",
46
+ "route53:ChangeResourceRecordSets",
47
+ "route53:GetChange"
48
+ ],
49
+ "Resource": "*"
50
+ }
51
+ ]
52
+ }
package/references/runtime-wiring.md
@@ -0,0 +1,209 @@
1
+ # Runtime Wiring
2
+
3
+ After deployment, S6 writes service-scoped files under:
4
+
5
+ ```text
6
+ ~/.direxio/nodes/<service_id>/
7
+ ```
8
+
9
+ `service_id` is derived from the deployed service domain.
10
+
11
+ ## Credentials
12
+
13
+ `credentials.json` keeps the backend `password` field, owner access token, agent token, and room identity. User-facing reports should call the `password` value the eight-digit app initialization code:
14
+
15
+ ```json
16
+ {
17
+ "profiles": {
18
+ "default": {
19
+ "password": "<eight-digit-app-initialization-code>",
20
+ "access_token": "<owner-access-token>",
21
+ "agent_room_id": "__ROOM_ID__",
22
+ "direxio_domain": "https://__DOMAIN__",
23
+ "direxio_agent_token": "<agent-token>",
24
+ "direxio_agent_room_id": "__ROOM_ID__",
25
+ "direxio_agent_node_id": "__AGENT_NODE_ID__"
26
+ }
27
+ }
28
+ }
29
+ ```
30
+
31
+ Treat the synced `password` and owner `access_token` as one-time/volatile values. A successful App initialization or token exchange can reset them on the server. Before showing the eight-digit app initialization code or using an owner `access_token` for `/_p2p/command` or Matrix Client API calls, pull the current `/opt/p2p/bootstrap.json` from the server and refresh local credentials instead of using older local output.
32
+
33
+ `env` contains the same service-scoped environment values for shell usage:
34
+
35
+ ```bash
36
+ DIREXIO_DOMAIN=https://__DOMAIN__
37
+ DIREXIO_AGENT_TOKEN=<agent_token>
38
+ DIREXIO_AGENT_ROOM_ID=__ROOM_ID__
39
+ DIREXIO_AGENT_NODE_ID=__AGENT_NODE_ID__
40
+ ```
41
+
42
+ ## MCP Tooling
43
+
44
+ S6 writes MCP snippets under the same service directory:
45
+
46
+ ```text
47
+ ~/.direxio/nodes/<service_id>/mcp/
48
+ ```
49
+
50
+ Generated files:
51
+
52
+ - `codex.toml`: Codex TOML snippet using `[mcp_servers."<server-name>"]`.
53
+ - `openclaw.md`: OpenClaw CLI setup note. It must use `openclaw mcp set`; do not paste MCP JSON into `~/.openclaw/openclaw.json`.
54
+ - `openclaw-server.json`: one OpenClaw MCP server object consumed by `openclaw mcp set`.
55
+ - `hermes.mcp.json`: Hermes JSON snippet using `mcpServers`.
56
+ - `mcp-servers.json`: generic JSON snippet for other MCP clients.
57
+ - `env`: shell exports for checking `direxio-mcp` manually.
58
+
59
+ All snippets and setup artifacts run `direxio-mcp` over stdio and set:
60
+
61
+ ```bash
62
+ DIREXIO_CREDENTIALS_FILE=~/.direxio/nodes/<service_id>/credentials.json
63
+ DIREXIO_AGENT_NODE_ID=__AGENT_NODE_ID__
64
+ ```
65
+
66
+ This is intentionally separate from the `direxio-connect` bridge. MCP uses the deployer credential file; cc-connect uses a direct Matrix Client-Server session in `cc-connect/config.toml`.
67
+
68
+ Install and check the MCP package:
69
+
70
+ ```bash
71
+ npm install -g direxio-mcp@latest
72
+ DIREXIO_CREDENTIALS_FILE=~/.direxio/nodes/<service_id>/credentials.json direxio-mcp doctor --json
73
+ ```
74
+
75
+ ## cc-connect Matrix Bridge
76
+
77
+ S6 calls `agent.matrix_session.create` with the owner token. Current message-server builds must return a Matrix session for `@agent:<server>`, not for `@owner:<server>`. The resulting session is stored at:
78
+
79
+ ```text
80
+ ~/.direxio/nodes/<service_id>/cc-connect/matrix-session.json
81
+ ```
82
+
83
+ S6 then writes:
84
+
85
+ ```text
86
+ ~/.direxio/nodes/<service_id>/cc-connect/config.toml
87
+ ```
88
+
89
+ The config uses:
90
+
91
+ - `type = "matrix"` only.
92
+ - `homeserver` from the deployed Direxio domain.
93
+ - `access_token`, `device_id`, and `user_id` from `agent.matrix_session.create`.
94
+ - `room_id` from the real backend-created `agent_room_id`.
95
+ - `admin_from = "@owner:<server>"` at the project level, so only the portal owner can run privileged commands such as `/dir` and `/shell`.
96
+ - `share_session_in_channel = true` and `group_reply_all = true` for agent-room conversation continuity.
97
+ - `auto_join = false` and `auto_verify = false`; message-server creates and joins the real room.
98
+ - `[speech]` is generated and enabled automatically when S6 can find a speech-to-text API key. Without a key, voice input is not enabled and `direxio-connect` will answer voice messages with its speech configuration warning.
99
+
100
+ `/dir reset` is expected to restore the generated `work_dir` and remove the current project directory override from `cc-connect/data/projects/<project>.state.json`.
101
+
102
+ ## Install Parameters
103
+
104
+ ```bash
105
+ DIREXIO_AGENT_PLATFORM=auto
106
+ DIREXIO_CC_CONNECT_AGENT=<optional cc-connect agent>
107
+ DIREXIO_AGENT_INSTALL=skip|recommend|auto
108
+ DIREXIO_AGENT_INSTALL_MODE=recommended|cc-connect
109
+ DIREXIO_LOCAL_PATH_STYLE=posix|windows
110
+ DIREXIO_CC_CONNECT_AGENT_CMD=<optional agent executable path>
111
+ DIREXIO_<AGENT>_COMMAND=<optional agent-specific executable path>
112
+ DIREXIO_CC_CONNECT_AGENT_OPTIONS_TOML=<optional extra TOML under projects.agent.options>
113
+ DIREXIO_OPENCLAW_COMMAND=<optional OpenClaw executable path>
114
+ DIREXIO_HERMES_COMMAND=<optional Hermes executable path>
115
+ DIREXIO_OPENCLAW_ACP_URL=<required OpenClaw gateway URL>
116
+ DIREXIO_OPENCLAW_ACP_TOKEN_FILE=<required OpenClaw ACP token file>
117
+ DIREXIO_OPENCLAW_ACP_SESSION=<required OpenClaw ACP session>
118
+ DIREXIO_OPENCLAW_ACP_ARGS_TOML=<optional OpenClaw ACP TOML array>
119
+ DIREXIO_HERMES_ACP_ARGS_TOML=<optional Hermes ACP TOML array>
120
+ DIREXIO_CC_CONNECT_NPM_PACKAGE=direxio-connent@latest
121
+ DIREXIO_CC_CONNECT_REPO=https://github.com/YingSuiAI/direxio-connect.git
122
+ DIREXIO_MCP_NPM_PACKAGE=direxio-mcp@latest
123
+ DIREXIO_MCP_COMMAND=direxio-mcp
124
+ DIREXIO_SPEECH_PROVIDER=openai|groq|qwen|gemini
125
+ DIREXIO_SPEECH_API_KEY=<optional generic STT key>
126
+ DIREXIO_SPEECH_BASE_URL=<optional OpenAI-compatible STT base URL>
127
+ DIREXIO_SPEECH_MODEL=<optional STT model>
128
+ DIREXIO_SPEECH_LANGUAGE=zh
129
+ ```
130
+
131
+ Defaults:
132
+
133
+ - `DIREXIO_CC_CONNECT_AGENT` is the preferred explicit selector. It accepts every connent/connect agent: `acp`, `antigravity`, `claudecode`, `codex`, `copilot`, `cursor`, `devin`, `gemini`, `iflow`, `kimi`, `opencode`, `pi`, `qoder`, `reasonix`, and `tmux`.
134
+ - `DIREXIO_AGENT_PLATFORM=auto` detects the local agent runtime and maps it to a `direxio-connect` agent type only when it can identify one unambiguously. OpenClaw and Hermes map to the generic `acp` connect agent. OpenClaw requires explicit real ACP Gateway settings from the current runtime; Hermes uses the `direxio-connect hermes-acp-adapter -- hermes acp` compatibility wrapper by default.
135
+ - `DIREXIO_LOCAL_PATH_STYLE=windows` writes Windows-compatible `data_dir`, `work_dir`, config paths, and install commands. `scripts/orchestrate.ps1` sets this automatically. Linux, macOS, and WSL Bash runs should leave the default `posix` style. Windows Git Bash/MSYS2 users who run `scripts/orchestrate.sh` directly must set `DIREXIO_LOCAL_PATH_STYLE=windows` when the local bridge is a Windows process.
136
+ - `DIREXIO_CC_CONNECT_AGENT_CMD` writes `cmd = "<path>"` into `[projects.agent.options]`. Agent-specific forms such as `DIREXIO_CODEX_COMMAND`, `DIREXIO_CLAUDE_CODE_COMMAND`, `DIREXIO_GEMINI_COMMAND`, `DIREXIO_OPENCODE_COMMAND`, `DIREXIO_QODERCLI_COMMAND`, and `DIREXIO_OPENCLAW_COMMAND` are also accepted. For Hermes, `DIREXIO_HERMES_COMMAND` selects the child Hermes executable behind the adapter, while `DIREXIO_HERMES_ACP_ADAPTER_COMMAND` overrides the adapter command itself.
137
+ - `DIREXIO_CC_CONNECT_AGENT_OPTIONS_TOML` appends agent-specific options under `[projects.agent.options]`; use it for agents with required non-command options such as `reasonix` (`serve_url`) or `tmux` (`session`).
138
+ - OpenClaw Gateway ACP uses `DIREXIO_OPENCLAW_ACP_URL`, `DIREXIO_OPENCLAW_ACP_TOKEN_FILE`, and `DIREXIO_OPENCLAW_ACP_SESSION` to write `--url`, `--token-file`, and `--session`. Complete OpenClaw pairing first and use the real session that should receive Direxio messages; do not guess a default session or reuse old chat output.
139
+ - `DIREXIO_OPENCLAW_ACP_ARGS_TOML` replaces the generated OpenClaw ACP args array, for example `["acp", "--url", "wss://gateway.example.test:18789", "--token-file", "$HOME/.openclaw/gateway.token", "--session", "agent:main:main"]`. `DIREXIO_HERMES_ACP_ARGS_TOML` supplies the child Hermes args; S6 prefixes `["hermes-acp-adapter", "--", "<hermes-command>"]` automatically.
140
+ - `DIREXIO_AGENT_INSTALL=recommend` prints and records the command only.
141
+ - `DIREXIO_AGENT_INSTALL=auto` runs `npm install -g direxio-connent@latest` and then installs the `direxio-connect` daemon with the generated config and `--service-name <service_id>`. It is recorded as installed only when `direxio-connect daemon status --service-name <service_id>` reports `Status: Running` and recent daemon logs do not show ACP session initialization failure; otherwise S6 records `agent_install_status=install_failed`.
142
+ - `DIREXIO_AGENT_INSTALL_MODE=recommended` maps every supported local runtime to `cc-connect`.
143
+ - Speech defaults to `DIREXIO_SPEECH_PROVIDER=openai` and `DIREXIO_SPEECH_LANGUAGE=zh`. Provider-specific keys are also accepted: `DIREXIO_SPEECH_OPENAI_API_KEY` or `OPENAI_API_KEY`, `DIREXIO_SPEECH_GROQ_API_KEY` or `GROQ_API_KEY`, `DIREXIO_SPEECH_QWEN_API_KEY` or `DASHSCOPE_API_KEY`, and `DIREXIO_SPEECH_GEMINI_API_KEY`, `GEMINI_API_KEY`, or `GOOGLE_API_KEY`. Set `DIREXIO_SPEECH_ENABLED=false` to suppress speech config generation even when a key exists.
144
+
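The key names listed for speech can be resolved with nested parameter expansion. This is a sketch only: the source lists the accepted variable names but not their precedence, so the lookup order here (generic key, then `DIREXIO_SPEECH_<PROVIDER>_API_KEY`, then the vendor variable) is an assumption, and `resolve_speech_key` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the first non-empty speech-to-text key for the
# selected provider, or return 1 when speech should stay disabled.
# ASSUMPTION: the precedence order below is illustrative, not documented.
resolve_speech_key() {
  local provider="${DIREXIO_SPEECH_PROVIDER:-openai}" key=""
  # DIREXIO_SPEECH_ENABLED=false suppresses speech config even when a key exists.
  [ "${DIREXIO_SPEECH_ENABLED:-true}" != "false" ] || return 1
  case "$provider" in
    openai) key="${DIREXIO_SPEECH_API_KEY:-${DIREXIO_SPEECH_OPENAI_API_KEY:-${OPENAI_API_KEY:-}}}" ;;
    groq)   key="${DIREXIO_SPEECH_API_KEY:-${DIREXIO_SPEECH_GROQ_API_KEY:-${GROQ_API_KEY:-}}}" ;;
    qwen)   key="${DIREXIO_SPEECH_API_KEY:-${DIREXIO_SPEECH_QWEN_API_KEY:-${DASHSCOPE_API_KEY:-}}}" ;;
    gemini) key="${DIREXIO_SPEECH_API_KEY:-${DIREXIO_SPEECH_GEMINI_API_KEY:-${GEMINI_API_KEY:-${GOOGLE_API_KEY:-}}}}" ;;
    *) return 1 ;;
  esac
  [ -n "$key" ] || return 1
  printf '%s\n' "$key"
}
```

When no key resolves, the function returns non-zero, which matches the documented behavior that voice input stays disabled without a key.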
145
+ Manual command:
146
+
147
+ ```bash
148
+ npm install -g direxio-connent@latest
149
+ direxio-connect daemon install --config ~/.direxio/nodes/<service_id>/cc-connect/config.toml --service-name <service_id> --force
150
+ direxio-connect daemon status --service-name <service_id>
151
+ ```
152
+
153
+ Source fallback:
154
+
155
+ ```bash
156
+ git clone https://github.com/YingSuiAI/direxio-connect.git
157
+ cd direxio-connect
158
+ make build AGENTS=<cc-connect-agent> PLATFORMS_INCLUDE=matrix
159
+ ./direxio-connect daemon install --config ~/.direxio/nodes/<service_id>/cc-connect/config.toml --service-name <service_id> --force
160
+ ```
161
+
162
+ ## State Fields
163
+
164
+ S6 records these bridge-related fields in `state.json`:
165
+
166
+ ```text
167
+ agent_runtime
168
+ agent_install_policy
169
+ agent_install_mode
170
+ agent_install_command
171
+ agent_install_status
172
+ agent_node_id
173
+ agent_service_id
174
+ agent_service_dir
175
+ agent_credentials_file
176
+ agent_env_file
177
+ agent_workspace
178
+ agent_skill_install_path
179
+ agent_global_skill_install_path
180
+ direxio_agent_bridge
181
+ cc_connect_agent
182
+ cc_connect_agent_cmd
183
+ cc_connect_agent_options_toml_present
184
+ cc_connect_npm_package
185
+ cc_connect_repo
186
+ cc_connect_ref
187
+ cc_connect_source_dir
188
+ cc_connect_runtime_dir
189
+ cc_connect_config
190
+ cc_connect_binary
191
+ cc_connect_data_dir
192
+ cc_connect_matrix_session_file
193
+ cc_connect_matrix_user
194
+ cc_connect_matrix_device
195
+ cc_connect_matrix_homeserver
196
+ mcp_npm_package
197
+ mcp_command
198
+ mcp_server_name
199
+ mcp_config_dir
200
+ mcp_credentials_file
201
+ mcp_codex_config
202
+ mcp_openclaw_config
203
+ mcp_hermes_config
204
+ mcp_json_config
205
+ mcp_env_file
206
+ mcp_readme
207
+ mcp_install_command
208
+ mcp_doctor_command
209
+ ```
package/references/state-machine.md
@@ -0,0 +1,46 @@
1
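+ # Deployment State Machine
2
+
3
+ `scripts/orchestrate.sh` is a resumable state machine. By default, `DOMAIN=<domain>` places state and local bridge files together under `~/.direxio/nodes/<service_id>/`; the state machine reads `state.json` in that directory and continues from the first unfinished phase. Running `bash scripts/orchestrate.sh status` without `DOMAIN` scans `~/.direxio/nodes/*/state.json` and lists all local services.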
4
+
5
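+ ## Phases
6
+
7
+ - **S0_PREREQ_AWS**: Validates the AWS CLI, credentials, and account identity.
8
+ - **S1_PREFLIGHT**: Validates the region, default VPC, vCPU quota, and Ubuntu amd64 AMI.
9
+ - **S2_DOMAIN**: Confirms the production long-term domain and its irreversible binding to the Matrix `server_name`.
10
+ - **S3_PROVISION**: Creates the EC2 instance, key pair, security group, and Elastic IP; handles the Route53 hosted zone/A record or waits for external DNS, depending on the DNS mode; renders cloud-init. The default image is `MESSAGE_SERVER_IMAGE=direxio/message-server:latest`.
11
+ - **S4_BOOTSTRAP_STACK**: Waits for cloud-init to install Docker and start `postgres:18 + message-server + caddy + coturn`, then polls `https://<domain>/healthz`.
12
+ - **S5_INIT_TOKENS**: Reads `/opt/p2p/bootstrap.json`, generated by the cloud-side `init-tokens.sh`, over SSH and normalizes `password`, `access_token`, `agent_token`, and the real `agent_room_id`. The cloud-side script first calls `portal.bootstrap` and, if the server does not return a room, creates the real agent room through the Matrix Client API and writes it back. Treat `password` and the owner `access_token` as one-time/volatile credentials; before logging in or calling an API with the token, pull the latest `/opt/p2p/bootstrap.json` from the server again instead of reusing old output.
13
+ - **S6_WIRE_LOCAL**: Writes local credentials, creates the `@agent:<server>` Matrix session, writes `cc-connect/config.toml` and the MCP config snippets, and installs or recommends `direxio-connect` according to the install policy.
14
+ - **S7_VERIFY_E2E**: Verifies `/_p2p`, Matrix versions, well-known, owner.json + CORS, and TURN.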
15
+
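The resume rule (continue from the first unfinished phase, in S0 to S7 order) can be sketched as a small filter. The input format and the `done` status value are assumptions made for illustration; the actual layout of `state.json` is not shown here, and `first_unfinished_phase` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read "<phase> <status>" lines in phase order on stdin
# and print the first phase whose status is not "done".
# Returns 1 when every phase is already done.
first_unfinished_phase() {
  awk '$2 != "done" { print $1; found = 1; exit } END { if (!found) exit 1 }'
}

# Example usage:
# printf 'S0_PREREQ_AWS done\nS1_PREFLIGHT pending\n' | first_unfinished_phase
```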
16
+ ## Cloud Compose Stack
17
+
18
+ - `postgres`: PostgreSQL 18, data volume `/var/lib/postgresql`.
19
+ - `message-init`: Generates the Direxio message-server config and the TURN config.
20
+ - `message-server`: Runs the unified Matrix + P2P backend and exposes port 8008 inside the container.
21
+ - `caddy`: Serves ports 80/443 externally and reverse-proxies `/_matrix/*` and `/_p2p/*`.
22
+ - `coturn`: TURN relay.
23
+
24
+ ## Completion Criteria
25
+
26
+ After S7 automated acceptance passes, deliver:
27
+
28
+ - App domain: `<domain>`
29
+ - Eight-digit app initialization code: the current value of the backend `password` field
30
+ - Local service credentials: `~/.direxio/nodes/<service_id>/credentials.json`
31
+ - Environment file: `~/.direxio/nodes/<service_id>/env`
32
+ - cc-connect config: `~/.direxio/nodes/<service_id>/cc-connect/config.toml`
33
+ - MCP config directory: `~/.direxio/nodes/<service_id>/mcp/`
34
+ - Matrix bridge user: `@agent:<server>`
35
+ - Install command: `npm install -g direxio-connent@latest && direxio-connect daemon install --config <config> --service-name <service_id> --force`
36
+ - MCP check command: `DIREXIO_CREDENTIALS_FILE=<credentials.json> direxio-mcp doctor --json`
37
+ - AWS details: region, instance id, Elastic IP, Route53 hosted zone, SSH command, state.json, destroy command
38
+ - User confirmation gates: app initialization, the message round trip, and Agent/MCP runtime verification must still be recorded separately.
39
+
40
+ ## Common Blockers
41
+
42
+ - DNS does not point to the EIP: S3 returns waiting. In Route53 mode, check the hosted zone/NS delegation first; in the manual DNS fallback, resume with `DNS_READY=1` after the user or DNS provider automation sets the A record.
43
+ - `/healthz` unreachable: check `/var/log/cloud-init-output.log` and `docker compose logs message-server`.
44
+ - bootstrap is missing fields: on the instance, rerun `sudo sh -lc 'cd /opt/p2p && DOMAIN=<domain> bash /opt/p2p/init-tokens.sh'`, then inspect `/opt/p2p/bootstrap.json` on the host and `/var/direxio-message-server/p2p/bootstrap.json` inside the container.
45
+ - `agent_room_id` is missing or is an old fake ID: confirm that `.env` contains `P2P_PORTAL_PASSWORD`, then rerun `/opt/p2p/init-tokens.sh`; the script should create a real Matrix room and write it back.
46
+ - TURN is empty: check `TURN_SECRET`, coturn, and the security group rules for 3478 and 49160-49200/udp.