direxio-deployer 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (77)
  1. package/AGENTS.md +92 -0
  2. package/LICENSE +21 -0
  3. package/README.md +221 -0
  4. package/README_zh.md +218 -0
  5. package/SKILL.md +722 -0
  6. package/agents/README.md +25 -0
  7. package/agents/openai.yaml +12 -0
  8. package/bin/direxio-deployer.mjs +375 -0
  9. package/package.json +28 -0
  10. package/references/agent-targets.md +128 -0
  11. package/references/architecture.md +44 -0
  12. package/references/bug-history.md +78 -0
  13. package/references/deployment-lessons.md +218 -0
  14. package/references/deployment-optimization-audit.md +317 -0
  15. package/references/deployment-workflow.md +341 -0
  16. package/references/iam-policy.json +52 -0
  17. package/references/runtime-wiring.md +209 -0
  18. package/references/state-machine.md +46 -0
  19. package/references/token-refresh.md +81 -0
  20. package/references/tooling.md +106 -0
  21. package/references/troubleshooting.md +26 -0
  22. package/references/user-journey.md +75 -0
  23. package/references/verification-recovery.md +84 -0
  24. package/references/voip-turn-runbook.md +154 -0
  25. package/references/windows-deployment-notes.md +119 -0
  26. package/scripts/aws-credentials.sh +195 -0
  27. package/scripts/cloud-init/Caddyfile +48 -0
  28. package/scripts/cloud-init/docker-compose.yml +125 -0
  29. package/scripts/cloud-init/init-tokens.sh +238 -0
  30. package/scripts/cloud-init/user-data.yaml +40 -0
  31. package/scripts/destroy.ps1 +77 -0
  32. package/scripts/destroy.sh +589 -0
  33. package/scripts/lib/aws.sh +73 -0
  34. package/scripts/lib/domain.sh +175 -0
  35. package/scripts/lib/operation_report.sh +240 -0
  36. package/scripts/lib/ops.sh +230 -0
  37. package/scripts/lib/paths.sh +35 -0
  38. package/scripts/lib/state.sh +137 -0
  39. package/scripts/mcp-tools-list.mjs +95 -0
  40. package/scripts/orchestrate.ps1 +112 -0
  41. package/scripts/orchestrate.sh +1126 -0
  42. package/scripts/phases/s0_prereq_aws.sh +39 -0
  43. package/scripts/phases/s1_preflight.sh +72 -0
  44. package/scripts/phases/s2_domain.sh +103 -0
  45. package/scripts/phases/s3_provision.sh +421 -0
  46. package/scripts/phases/s4_bootstrap_stack.sh +38 -0
  47. package/scripts/phases/s5_init_tokens.sh +118 -0
  48. package/scripts/phases/s6_wire_local.sh +1435 -0
  49. package/scripts/phases/s7_verify_e2e.sh +136 -0
  50. package/scripts/pricing-estimate.sh +256 -0
  51. package/scripts/render/render-userdata.sh +86 -0
  52. package/scripts/reset-app-data.sh +40 -0
  53. package/scripts/update.sh +30 -0
  54. package/tests/aws_credentials_test.sh +139 -0
  55. package/tests/connect_daemon_runtime_check_test.sh +120 -0
  56. package/tests/default_paths_test.sh +58 -0
  57. package/tests/destroy_local_bridge_test.sh +154 -0
  58. package/tests/destroy_root_identity_test.sh +91 -0
  59. package/tests/destroy_route53_zone_test.sh +80 -0
  60. package/tests/domain_authoritative_dns_test.sh +49 -0
  61. package/tests/mcp_doctor_runtime_check_test.sh +86 -0
  62. package/tests/mcp_smoke_runtime_check_test.sh +121 -0
  63. package/tests/mcp_tools_runtime_check_test.sh +123 -0
  64. package/tests/npm_skill_distribution_test.sh +95 -0
  65. package/tests/operation_report_test.sh +258 -0
  66. package/tests/orchestrate_status_recovery_test.sh +91 -0
  67. package/tests/phase_timeout_test.sh +88 -0
  68. package/tests/pricing_estimate_test.sh +159 -0
  69. package/tests/render_userdata_remote_nodes_test.sh +40 -0
  70. package/tests/root_volume_tracking_test.sh +41 -0
  71. package/tests/route53_overwrite_guard_test.sh +86 -0
  72. package/tests/route53_zone_auto_create_test.sh +66 -0
  73. package/tests/runtime_summary_check_test.sh +203 -0
  74. package/tests/s6_wire_local_test.sh +405 -0
  75. package/tests/skill_structure_test.sh +298 -0
  76. package/tests/update_reset_ops_test.sh +230 -0
  77. package/tests/user_confirmation_gates_test.sh +152 -0
@@ -0,0 +1,81 @@
1
+ # Token Refresh
2
+
3
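+ After every redeploy or data-volume wipe, the `password`, owner `access_token`, `agent_token`, and cc-connect Matrix session all change. State machine phase S6 backfills them automatically; use the checks here for manual recovery.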
4
+
5
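+ Treat the `password` and owner `access_token` synced from the server as one-time, volatile credentials. `password` is the backend field name; when shown to users it must be called the eight-digit app initialization code. The server may reset these values as soon as the user completes initialization or a token exchange. Any operation that needs the initialization code again, or that uses `access_token` to call `/_p2p/command`, the Matrix Client API, or similar endpoints, must first re-fetch the latest `/opt/p2p/bootstrap.json` from the server and then update the local `credentials.json`. Do not reuse a password or access token from chat history, an old `state.json`, an old `credentials.json`, or earlier deployment output.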
6
+
7
+ After running `scripts/update.sh` or `scripts/reset-app-data.sh` on an existing node, old local evidence must also be invalidated. The scripts clear the old `password`, `access_token`, `agent_token`, `agent_room_id`, `user_confirmations`, and `runtime_checks`, set `agent_install_status` to `refresh_pending`, stop the local bridge only when its `WorkDir` matches the current service (that is, only the matching service-scoped direxio-connect daemon), and mark S4-S7 as pending again. This prevents old user confirmations, MCP discovery results, agent runtime probes, or old bridge install status from being misapplied to the updated or reset node. `status` shows `Local refresh:` as a reminder that update/reset has cleared the old credentials, user confirmations, runtime checks, and bridge install proof. The next step is to rerun the deployment workflow by resuming `scripts/orchestrate.sh`, which refreshes S4-S7, local credentials, MCP snippets, and runtime checks, and lets S5/S6/S7 and `verify runtime` write current evidence.
8
+
9
+ ## Remote Credentials
10
+
11
+ On the EC2 instance, `/opt/p2p/bootstrap.json`:
12
+
13
+ ```json
14
+ {
15
+   "version": 1,
16
+   "owner_user_id": "__OWNER_USER_ID__",
17
+   "user_id": "__OWNER_USER_ID__",
18
+   "homeserver": "https://__DOMAIN__",
19
+   "access_token": "<ACCESS_TOKEN>",
20
+   "agent_token": "<AGENT_TOKEN>",
21
+   "password": "<APP_INITIALIZATION_CODE>",
22
+   "agent_room_id": "__ROOM_ID__"
23
+ }
24
+ ```
25
+
26
+ Fetch it:
27
+
28
+ ```bash
29
+ ssh -i <key.pem> ubuntu@<ip> 'sudo cat /opt/p2p/bootstrap.json' > bootstrap.json
30
+ ```
31
+
32
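+ If you have just run app initialization, `portal.auth`, manual API calls, or a rerun of S5/S6, or if you are not sure the local credentials are current, run the fetch command above first, then read the eight-digit initialization code from the `password` field or read the `access_token`.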
33
+
34
+ ## Local Service Credentials
35
+
36
+ `~/.direxio/nodes/<service_id>/credentials.json`:
37
+
38
+ ```json
39
+ {
40
+   "profiles": {
41
+     "default": {
42
+       "password": "<APP_INITIALIZATION_CODE>",
43
+       "access_token": "<ACCESS_TOKEN>",
44
+       "agent_room_id": "__ROOM_ID__",
45
+       "direxio_domain": "https://__DOMAIN__",
46
+       "direxio_agent_token": "<AGENT_TOKEN>",
47
+       "direxio_agent_room_id": "__ROOM_ID__",
48
+       "direxio_agent_node_id": "<agent_node_id>"
49
+     }
50
+   }
51
+ }
52
+ ```
53
+
54
+ Permissions must be `0600`:
55
+
56
+ ```bash
57
+ chmod 600 ~/.direxio/nodes/<service_id>/credentials.json
58
+ ```
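The permission requirement above can be checked before the file is used. The following is a minimal sketch, not part of the deployer: `has_mode_600` is a hypothetical helper name, and it assumes a POSIX filesystem where mode bits are meaningful (on Git Bash for Windows they are largely emulated).

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the deployer scripts): succeed only when
# the path is a regular file whose permission bits are exactly 0600.
has_mode_600() {
  local path=$1
  [ -f "$path" ] || return 1
  # find prints the path only when the mode matches exactly;
  # -prune keeps it from descending if a directory is passed by mistake.
  [ -n "$(find "$path" -prune -perm 600 2>/dev/null)" ]
}
```

A caller could run `has_mode_600 "$HOME/.direxio/nodes/<service_id>/credentials.json" || chmod 600 ...` to repair the mode only when needed.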
59
+
60
+ S6 also writes:
61
+
62
+ ```text
63
+ ~/.direxio/nodes/<service_id>/env
64
+ ~/.direxio/nodes/<service_id>/cc-connect/matrix-session.json
65
+ ~/.direxio/nodes/<service_id>/cc-connect/config.toml
66
+ ```
67
+
68
+ After a refresh, reinstall or restart the local bridge:
69
+
70
+ ```bash
71
+ direxio-connect daemon install --config ~/.direxio/nodes/<service_id>/cc-connect/config.toml --service-name <service_id> --force
72
+ direxio-connect daemon status --service-name <service_id>
73
+ ```
74
+
75
+ ## Verification
76
+
77
+ ```bash
78
+ curl -skf https://<domain>/healthz && echo OK
79
+ curl -sk https://<domain>/.well-known/portal/owner.json
80
+ curl -sk https://<domain>/_matrix/client/versions
81
+ ```
@@ -0,0 +1,106 @@
1
+ # Tooling By OS
2
+
3
+ Prepare `bash`, `aws`, `jq`, `ssh`, `scp`, `curl`, and at least one DNS lookup
4
+ tool. Always inspect first, then ask before installing or downloading.
5
+
6
+ ## Detect
7
+
8
+ ```bash
9
+ command -v bash aws jq ssh scp curl
10
+ command -v dig nslookup getent
11
+ aws --version
12
+ jq --version
13
+ ```
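Because only one DNS lookup tool is required, the detection step can pick the first one that exists instead of failing when `dig` is missing. This is a small sketch under that assumption; `first_available` is a hypothetical helper name, not a function from the repository.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the first command from the argument list
# that exists on PATH; return non-zero when none is found.
first_available() {
  local tool
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      printf '%s\n' "$tool"
      return 0
    fi
  done
  return 1
}

# Prefer dig, then nslookup, then getent.
dns_tool=$(first_available dig nslookup getent) || dns_tool=""
echo "DNS lookup tool: ${dns_tool:-none found}"
```

The order of the arguments encodes the preference, so a Windows setup could pass `nslookup` first.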
14
+
15
+ On Windows PowerShell:
16
+
17
+ ```powershell
18
+ Get-Command "C:\Program Files\Git\bin\bash.exe","C:\Program Files\Git\usr\bin\bash.exe",aws,jq,ssh,scp,curl,nslookup,Resolve-DnsName -ErrorAction SilentlyContinue
19
+ ```
20
+
21
+ ## Windows
22
+
23
+ Preferred shell: Git Bash, MSYS2, Cygwin, or a working WSL distro. Do not use `C:\Windows\System32\bash.exe` unless `bash -lc 'echo ok'` succeeds.
24
+
25
+ Standard Git Bash usually does not include `dig`. Use Windows `Resolve-DnsName`
26
+ or `nslookup` for DNS checks instead of blocking deployment on `dig`.
27
+
28
+ Common system installs:
29
+
30
+ ```powershell
31
+ winget install --id Git.Git --exact
32
+ winget install --id Amazon.AWSCLI --exact
33
+ winget install --id jqlang.jq --exact
34
+ ```
35
+
36
+ Workspace-local fallback when package managers are unavailable:
37
+
38
+ ```powershell
39
+ New-Item -ItemType Directory -Force -Path .tools\bin | Out-Null
40
+ Invoke-WebRequest -Uri https://github.com/jqlang/jq/releases/download/jq-1.7.1/jq-windows-amd64.exe -OutFile .tools\bin\jq.exe
41
+ python -m venv .tools\awscli-venv
42
+ .\.tools\awscli-venv\Scripts\python.exe -m pip install --upgrade pip awscli
43
+ ```
44
+
45
+ Create `.tools/bin/aws` for Git Bash:
46
+
47
+ ```bash
48
+ #!/usr/bin/env bash
49
+ SCRIPT_DIR=$(cd "$(dirname "$0")" && pwd)
50
+ exec "$SCRIPT_DIR/../awscli-venv/Scripts/python.exe" -m awscli "$@"
51
+ ```
52
+
53
+ Run ops from Git Bash:
54
+
55
+ ```powershell
56
+ & "C:\Program Files\Git\bin\bash.exe" -lc 'PATH="$PWD/.tools/bin:$PATH"; bash scripts/orchestrate.sh'
57
+ ```
58
+
59
+ ## macOS
60
+
61
+ Preferred installs:
62
+
63
+ ```bash
64
+ brew install awscli jq
65
+ ```
66
+
67
+ If Homebrew is unavailable, ask before using the official AWS CLI pkg installer.
68
+ macOS already includes `ssh`, `scp`, `curl`, and `dig`.
69
+
70
+ ## Linux
71
+
72
+ Run only the line that matches the detected package manager:
73
+
74
+ ```bash
75
+ sudo apt-get update && sudo apt-get install -y awscli jq openssh-client curl dnsutils   # Debian/Ubuntu (apt)
76
+ sudo dnf install -y awscli jq openssh-clients curl bind-utils   # Fedora/RHEL (dnf)
77
+ sudo yum install -y awscli jq openssh-clients curl bind-utils   # older RHEL/CentOS/Amazon Linux (yum)
78
+ sudo pacman -Sy --needed aws-cli jq openssh curl bind-tools   # Arch Linux (pacman)
79
+ sudo zypper install -y aws-cli jq openssh-clients curl bind-utils   # openSUSE (zypper)
80
+ ```
81
+
82
+ If distro packages are too old or missing, ask before using the official AWS CLI zip installer.
83
+
84
+ ## Credentials
85
+
86
+ Prefer a temporary `DirexioDeployer` IAM user or role. If the user provides an
87
+ AWS access-key CSV, import it through the repository helper so command output
88
+ stays redacted and the identity is marked as `root=true|false`:
89
+
90
+ ```bash
91
+ bash scripts/aws-credentials.sh import-csv /path/to/accessKeys.csv direxio-deployer <region>
92
+ export AWS_PROFILE=direxio-deployer
93
+ bash scripts/aws-credentials.sh verify direxio-deployer
94
+ ```
95
+
96
+ Existing profiles can still be used, including root profiles when the operator
97
+ explicitly chooses root credentials:
98
+
99
+ ```bash
100
+ aws configure --profile p2p-matrix
101
+ export AWS_PROFILE=p2p-matrix
102
+ export AWS_DEFAULT_REGION=us-east-1
103
+ aws sts get-caller-identity
104
+ ```
105
+
106
+ Never print secrets or commit them.
@@ -0,0 +1,26 @@
1
+ # Troubleshooting
2
+
3
+ ## cc-connect Bridge
4
+
5
+ - `agent_room_id` must be a real Matrix room id beginning with `!`. Values like `!agent:<domain>` are legacy pseudo ids and must be fixed by redeploying or restarting a current message-server build.
6
+ - Current message-server images require `P2P_PORTAL_PASSWORD` and an explicit `portal.bootstrap` call. The cloud `init-tokens.sh` script is responsible for that call and for creating a real Matrix agent room when the backend credentials file does not already include `agent_room_id`.
7
+ - `agent.matrix_session.create` must return `@agent:<server>`. If it returns `@owner:<server>`, deploy a message-server build that includes agent Matrix session support.
8
+ - `cc-connect/config.toml` must contain one Matrix platform and the same `room_id` as S5/S6 state.
9
+ - `direxio-connect daemon status --service-name <service_id>` checks the local bridge process for the current Direxio node. If no daemon is installed, run the command printed in S6 state `agent_install_command`.
10
+ - If npm install fails, verify `npm view direxio-connent` and that the GitHub release contains the matching `direxio-connect` binary asset.
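The first bullet's room-id rule can be expressed as a small guard. This is a sketch only: `is_real_room_id` is a hypothetical name, and it checks just the two properties stated above (a leading `!`, and not the legacy `!agent:<domain>` form); it does not verify that the room exists on the homeserver.

```bash
#!/usr/bin/env bash
# Hypothetical helper: accept ids that start with "!" and are not the
# legacy pseudo id "!agent:<domain>". Everything else is rejected.
is_real_room_id() {
  local room_id=$1
  case "$room_id" in
    '!agent:'*) return 1 ;;  # legacy pseudo id
    '!'?*)      return 0 ;;  # "!" followed by at least one character
    *)          return 1 ;;  # missing the "!" sigil
  esac
}
```

For example, `is_real_room_id "$agent_room_id" || echo "agent_room_id looks legacy or malformed" >&2` could run before the bridge config is written.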
11
+
12
+ ## Matrix Checks
13
+
14
+ The deployed homeserver should answer:
15
+
16
+ ```bash
17
+ curl -k https://<domain>/_matrix/client/versions
18
+ ```
19
+
20
+ The local bridge should use the Matrix session file at:
21
+
22
+ ```text
23
+ ~/.direxio/nodes/<service_id>/cc-connect/matrix-session.json
24
+ ```
25
+
26
+ Do not hand-edit the access token unless S6 cannot create a session; rerun S6 after refreshing server credentials.
@@ -0,0 +1,75 @@
1
+ # Operator Journey
2
+
3
+ This document is the operator-facing reference for the deployment journey described in the root `SKILL.md`.
4
+
5
+ The important policy is simple:
6
+
7
+ > Direxio deployments require a real, long-lived domain before infrastructure is created.
8
+
9
+ ## Before Running Ops
10
+
11
+ Confirm these items before calling `scripts/orchestrate.sh`:
12
+
13
+ 1. The final Matrix domain is selected, for example `__DOMAIN__`.
14
+ 2. The user understands that Matrix `server_name` is bound to that domain.
15
+ 3. The user has confirmed `CONFIRM_DOMAIN_BINDING=1`.
16
+ 4. AWS CLI v2, `jq`, `ssh`, `scp`, and `curl` are available.
17
+ 5. AWS credentials are configured through `AWS_PROFILE` or environment variables.
18
+ 6. `AWS_DEFAULT_REGION` is explicit.
19
+ 7. `MESSAGE_SERVER_IMAGE` is selected, or the default `direxio/message-server:latest` is accepted.
20
+ 8. Existing state handling is explicit: continue, destroy, or new workdir.
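Part of the checklist above can be checked mechanically before the orchestrator runs. The sketch below is an illustration, not the deployer's own preflight: `preflight_env` is a hypothetical function, and it covers only the environment-variable items (domain, confirmation, region, credentials source). `AWS_ACCESS_KEY_ID` is the standard AWS CLI variable and stands in for "environment variables" here.

```bash
#!/usr/bin/env bash
# Hypothetical preflight helper; not part of the deployer scripts.
# Prints each problem to stderr and returns non-zero if any was found.
preflight_env() {
  local problems=0
  if [ -z "${DOMAIN:-}" ]; then
    echo "DOMAIN is not set" >&2
    problems=$((problems + 1))
  fi
  if [ "${CONFIRM_DOMAIN_BINDING:-}" != "1" ]; then
    echo "CONFIRM_DOMAIN_BINDING=1 has not been confirmed" >&2
    problems=$((problems + 1))
  fi
  if [ -z "${AWS_DEFAULT_REGION:-}" ]; then
    echo "AWS_DEFAULT_REGION is not explicit" >&2
    problems=$((problems + 1))
  fi
  # Credentials may come from a named profile or from environment variables.
  if [ -z "${AWS_PROFILE:-}" ] && [ -z "${AWS_ACCESS_KEY_ID:-}" ]; then
    echo "no AWS_PROFILE or AWS_ACCESS_KEY_ID found" >&2
    problems=$((problems + 1))
  fi
  [ "$problems" -eq 0 ]
}
```

Running `preflight_env && bash scripts/orchestrate.sh` would stop early with a list of missing settings instead of failing inside a phase.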
21
+
22
+ On Windows, first verify that `bash` is a usable POSIX shell:
23
+
24
+ ```powershell
25
+ Get-Command bash.exe -All
26
+ bash -lc 'echo ok; command -v aws; command -v jq; command -v ssh; command -v scp; command -v curl'
27
+ ```
28
+
29
+ ## Domain Modes
30
+
31
+ | Mode | Meaning | DNS behavior |
32
+ |---|---|---|
33
+ | `route53` | User authorizes AWS Route53 automation | S3 reuses or creates the hosted zone, records NS, upserts the A record, and waits for DNS to resolve |
34
+ | `user` | Fallback when no DNS provider automation is available | S3 emits the EIP and waits until the domain A record resolves to it |
35
+
36
+ ## Minimal Command
37
+
38
+ ```bash
39
+ AWS_PROFILE=p2p-matrix \
40
+ AWS_DEFAULT_REGION=us-east-1 \
41
+ DOMAIN=__DOMAIN__ \
42
+ DOMAIN_MODE=user \
43
+ CONFIRM_DOMAIN_BINDING=1 \
44
+ INSTANCE_TYPE=t3.small \
45
+ MESSAGE_SERVER_IMAGE=direxio/message-server:latest \
46
+ bash scripts/orchestrate.sh
47
+ ```
48
+
49
+ ## Token Initialization
50
+
51
+ S5 reads `/opt/p2p/bootstrap.json` from the instance. Current message-server builds initialize on startup and write the backend `password` field plus owner, Matrix, and agent tokens. User-facing delivery should call `password` the eight-digit app initialization code.
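Since the `password` field is delivered to users as an eight-digit code, a format check can catch an empty or truncated value before delivery. This sketch assumes "eight-digit" means exactly eight decimal digits; `is_init_code` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: true only for a string of exactly eight decimal digits.
# Assumption: the app initialization code never contains letters or separators.
is_init_code() {
  [[ $1 =~ ^[0-9]{8}$ ]]
}
```

A delivery step could use `is_init_code "$password" || echo "unexpected initialization code format" >&2` without printing the code itself.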
52
+
53
+ ## Delivery
54
+
55
+ When all phases complete, report:
56
+
57
+ - App domain
58
+ - eight-digit app initialization code, sourced from the backend `password` field
59
+ - `access_token`, `agent_token`, and real `agent_room_id` in local credentials
60
+ - local node credential file status
61
+ - persisted `DIREXIO_DOMAIN`, `DIREXIO_AGENT_TOKEN`, `DIREXIO_AGENT_ROOM_ID`, `DIREXIO_AGENT_NODE_ID`
62
+ - `cc_connect_config`, `cc_connect_matrix_user`, `cc_connect_matrix_device`, and `cc_connect_matrix_homeserver`
63
+ - install policy/mode/status from `DIREXIO_AGENT_INSTALL` and `DIREXIO_AGENT_INSTALL_MODE`
64
+ - manual command: `npm install -g direxio-connent@latest && direxio-connect daemon install --config <cc_connect_config> --service-name <service_id> --force`
65
+ - region, instance ID, public IP, and `state.json` path
66
+ - SSH command
67
+ - stop-billing guidance: ask the agent to destroy this node when finished
68
+ - which gates are automated and which still need user confirmation, because S7 green is not the final product-complete state
69
+
70
+ After delivery, verify the local bridge by checking `direxio-connect daemon status --service-name <service_id>` when installed, or by running the recorded `agent_install_command` if the policy was `recommend`.
71
+
72
+ Destroying AWS resources removes deployer-created Route53 A records and
73
+ attempts to delete hosted zones that state marks as deployer-created. It does
74
+ not remove registered domains, third-party DNS records, or user-owned hosted
75
+ zones.
@@ -0,0 +1,84 @@
1
+ # Verification And Recovery
2
+
3
+ ## Fresh Verification
4
+
5
+ Run the built-in acceptance phase through the state machine:
6
+
7
+ ```bash
8
+ bash scripts/orchestrate.sh status
9
+ ```
10
+
11
+ A completed deployment shows `current: DONE` and S0-S7 as `done`.
12
+
13
+ The status command also prints a `Recovery summary`. When `current` is not
14
+ `DONE`, use that summary as the user-facing explanation instead of exposing only
15
+ raw phase names. It covers:
16
+
17
+ - where the deployment is blocked;
18
+ - whether recorded EC2, public IPv4/EIP, or EBS resources may still be billing;
19
+ - whether it is safe to rerun or must continue with preserved `state.json`;
20
+ - the next action for the current phase;
21
+ - stop-loss guidance through destroy when resources exist.
22
+
23
+ Independent checks:
24
+
25
+ ```bash
26
+ curl -fsS https://<DOMAIN>/healthz
27
+ curl -fsS https://<DOMAIN>/_matrix/client/versions
28
+ curl -fsS https://<DOMAIN>/.well-known/matrix/server
29
+ curl -fsS https://<DOMAIN>/.well-known/portal/owner.json
30
+ ```
31
+
32
+ If local DNS lags but authoritative DNS is correct, use:
33
+
34
+ ```bash
35
+ curl --resolve <DOMAIN>:443:<PUBLIC_IP> -fsS https://<DOMAIN>/healthz
36
+ ```
37
+
38
+ ## Common Waiting Points
39
+
40
+ - S0 waits for valid AWS credentials.
41
+ - S1 waits for default VPC, EC2 quota, or AMI availability.
42
+ - S3 waits for DNS A record.
43
+ - S4 waits for Docker/image pulls/Caddy certificate issuance.
44
+ - S5 waits for `/opt/p2p/bootstrap.json` and password/agent_token extraction.
45
+
46
+ Rerun the same command after fixing the blocker; state resumes from the first unfinished phase.
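The resume rule can be pictured as a scan for the first phase that is not `done`. The sketch below is illustrative only and is not how `scripts/orchestrate.sh` is implemented: `first_unfinished` and the `PHASE:status` argument format are assumptions made for the example.

```bash
#!/usr/bin/env bash
# Hypothetical illustration of "resume from the first unfinished phase".
# Each argument is "PHASE:status"; the function prints the first phase whose
# status is not "done", or DONE when every phase has finished.
first_unfinished() {
  local entry
  for entry in "$@"; do
    if [ "${entry#*:}" != "done" ]; then
      printf '%s\n' "${entry%%:*}"
      return 0
    fi
  done
  printf 'DONE\n'
}

first_unfinished S0:done S1:done S2:done S3:pending S4:pending
```

With the arguments shown, the scan stops at `S3`, which matches the DNS waiting point listed above.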
47
+
48
+ After S3, do not reset or delete state just to silence an error. If EC2, public
49
+ IPv4/EIP, or other AWS resources are recorded, preserve `state.json`, repair the
50
+ blocker, and rerun with `P2P_EXISTING_STATE_ACTION=continue`; or destroy first
51
+ if the user wants to stop billing.
52
+
53
+ ## Destroy
54
+
55
+ Destroy recorded AWS resources while state exists:
56
+
57
+ ```bash
58
+ DOMAIN=__DOMAIN__ bash scripts/destroy.sh
59
+ ```
60
+
61
+ Destroy stops and uninstalls the local `direxio-connect` daemon only when its reported `WorkDir` matches the current service's `~/.direxio/nodes/<service_id>/cc-connect` directory. It then cleans up the recorded EC2 instance, EBS root volume, EIP, key pair, security group, deployer-created Route53 records and zones, and the current service directory on a best-effort basis. Before removing local state, it records AWS read-back cleanup evidence under `destroy.evidence`. User-managed DNS records and purchased domains remain the user's responsibility.
62
+
63
+ After destroy, read the redacted audit report at:
64
+
65
+ ```text
66
+ ~/.direxio/reports/<service_id>/operation-report.json
67
+ ```
68
+
69
+ Use it to report which recorded AWS resources were processed, which AWS
70
+ read-back checks show released or deleted resources, and which external items
71
+ remain outside automatic destroy scope.
72
+
73
+ ## Update / Reset Follow-Up
74
+
75
+ After `scripts/update.sh` or `scripts/reset-app-data.sh`, rerun:
76
+
77
+ ```bash
78
+ P2P_EXISTING_STATE_ACTION=continue DOMAIN=__DOMAIN__ bash scripts/orchestrate.sh
79
+ ```
80
+
81
+ The scripts intentionally mark S4-S7 pending and clear stale local secret
82
+ fields. Do not copy old initialization codes or tokens from chat history,
83
+ `state.json`, or `credentials.json`; S5 must fetch fresh bootstrap data and S6
84
+ must rewrite service-scoped local credentials/MCP snippets.
@@ -0,0 +1,154 @@
1
+ # VoIP / TURN Relay Deployment Plan (Implemented)
2
+
3
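+ > Source: Feishu doc "20260603 - VoIP Call Connectivity Gap". **Implemented as decided**: Option A with self-hosted coturn; the first version serves plaintext `turn:3478` only (udp+tcp, no TLS); the UDP relay range is narrowed to `49160-49200`. For the code, see PR `feat/voip-turn-coturn`.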
4
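+ > ⚠️ Hard criterion for VoIP-specific acceptance: place one call between two real devices and confirm a **relay** ICE candidate in WebRTC internals (a non-empty turnServer in S7 is only a necessary condition). Basic deployment acceptance only requires the automated S7 `turnServer` check to pass; it does not require the user to place a real call every time.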
5
+ >
6
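+ > Gap summary: Matrix call signaling (`m.call.*`) already worked end to end, but `/_matrix/client/v3/voip/turnServer` returned `{}`,
7
+ > so ICE had only host/srflx candidates and no relay candidate, which means calls across NAT or firewalls are bound to fail. **This is purely a backend gap, not a frontend one.**
8
+ > Owner: **only the ops deployment skill** (agent, AS, and client are not involved in TURN).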
9
+
10
+ ---
11
+
12
+ ## 1. TURN Option Decision (Option A Adopted)
13
+
14
+ ### Option A: Self-hosted coturn on the VPS (implemented)
15
+ Add a `coturn` container to compose; Dendrite uses **shared-secret mode** so that `/voip/turnServer` issues short-lived credentials dynamically.
16
+
17
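+ | Dimension | Notes |
18
+ |---|---|
19
+ | Self-contained | ✅ Runs with just one AWS key; no external accounts and no extra monthly fee, which fits the skill's positioning |
20
+ | Dynamic credentials | ✅ Shared-secret mode means the homeserver signs credentials on demand per TTL, meeting the source doc's "short-lived / dynamically issued" requirement |
21
+ | Cost | ✅ No extra spend (reuses the same EC2 instance; TURN relay traffic uses that machine's bandwidth) |
22
+ | Trade-off | ⚠️ TURN ports must be opened in the security group (including a UDP relay range); the coturn container needs the public IP as its external-ip |
23
+ | Change count | 8 changes (see Section 3) |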
24
+
25
+ ### Alternative Option B: External TURN provider (not adopted)
26
+ The skill does not deploy coturn; it only writes the provider-supplied `uris + username/password` into the Dendrite turn section.
27
+
28
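+ | Dimension | Notes |
29
+ |---|---|
30
+ | Operations | ✅ Least effort; the provider carries the relay and no UDP ports need to be opened |
31
+ | Self-contained | ❌ Breaks the "just one AWS key" promise: requires a separate provider signup and key |
32
+ | Cost | ⚠️ May be billed by traffic (Twilio) or capped by quota (Cloudflare free tier) |
33
+ | Dynamic credentials | ⚠️ Most providers also support short-lived credentials, but the skill would have to call their issuing API, which is more complex |
34
+ | Change count | Small (only the Dendrite turn section and S7 acceptance), but adds one more "external credential the user must prepare" |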
35
+
36
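+ **Option A is the final choice.** Rationale: the skill's core selling point is a self-contained deployment inside AWS; Option B would introduce a new external provider signup and key preparation. The only cost of Option A is a few more open ports, and narrowing the UDP relay range keeps the exposed surface to a minimum.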
37
+
38
+ ---
39
+
40
+ ## 2. Port Plan (As Chosen: Narrowed Fixed UDP Range)
41
+
42
+ | Port | Protocol | Purpose | Public? |
43
+ |---|---|---|---|
44
+ | 3478 | udp + tcp | Main TURN/STUN port | ✅ Allowed in the security group |
45
+ | **49160–49200** | udp | Relay media ports (**narrowed fixed range**, about 40 ports) | ✅ This range is allowed |
46
+
47
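+ > The narrowed range is enough for small-scale 1:1 calls; widen it when concurrency grows. coturn pins this range with `min-port/max-port`, and the security group opens only this range.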
48
+ > Caddy does **not** proxy TURN (UDP cannot go through Caddy); the coturn ports are exposed directly on the host network.
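As a quick check on the size of the narrowed range, the inclusive port count can be computed from the two bounds in the table. This is only arithmetic; the variable names are illustrative.

```bash
#!/usr/bin/env bash
# Count the ports in the inclusive relay range 49160-49200.
min_port=49160
max_port=49200
relay_ports=$((max_port - min_port + 1))
echo "relay ports: $relay_ports"
```

The inclusive range holds 41 ports, consistent with the "about 40" figure. Assuming each TURN allocation takes one relay port, this roughly bounds the number of concurrent allocations the instance can serve.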
49
+
50
+ ---
51
+
52
+ ## 3. The 8 Changes (Concrete Snippets, Option A)
53
+
54
+ ### 1. `scripts/cloud-init/docker-compose.yml`: add the coturn service
55
+ ```yaml
56
+   # ── coturn (TURN relay, required for WebRTC calls) ──────────────────────────
57
+   # Uses host networking so UDP relay and external-ip work correctly (container NAT breaks relay).
58
+   # DOMAIN/PUBLIC_IP/TURN_SECRET are injected through .env (written by user-data).
59
+   coturn:
60
+     image: coturn/coturn:latest
61
+     network_mode: host # required for relay; do not attach to the p2p-net bridge network
62
+     restart: unless-stopped
63
+     command:
64
+       - -n
65
+       - --realm=${DOMAIN}
66
+       - --listening-port=3478
67
+       - --min-port=49160
68
+       - --max-port=49200
69
+       - --external-ip=${PUBLIC_IP}
70
+       - --use-auth-secret
71
+       - --static-auth-secret=${TURN_SECRET}
72
+       - --no-cli
73
+       - --no-multicast-peers
74
+       - --no-tls
75
+       - --no-dtls
76
+ ```
77
+ > Note: `network_mode: host` is incompatible with the existing `networks: [p2p-net]`, so coturn alone uses host networking.
78
+ > The other services are unchanged.
79
+
80
+ ### 2. `phases/s3_provision.sh`: add TURN ports to the security group
81
+ Extend the existing `for p in 22 80 443` block:
82
+ ```bash
83
+ # Base: SSH/HTTP/HTTPS
84
+ for p in 22 80 443; do
85
+   aws ec2 authorize-security-group-ingress --group-id "$sg" --protocol tcp --port "$p" --cidr 0.0.0.0/0 >/dev/null
86
+ done
87
+ # TURN: 3478 udp+tcp
88
+ aws ec2 authorize-security-group-ingress --group-id "$sg" --protocol tcp --port 3478 --cidr 0.0.0.0/0 >/dev/null
89
+ aws ec2 authorize-security-group-ingress --group-id "$sg" --protocol udp --port 3478 --cidr 0.0.0.0/0 >/dev/null
90
+ # TURN relay UDP, narrowed range 49160-49200
91
+ aws ec2 authorize-security-group-ingress --group-id "$sg" --protocol udp --port 49160-49200 --cidr 0.0.0.0/0 >/dev/null
92
+ ```
93
+ Update the accompanying comment from "22/80/443 only" to "22/80/443 + TURN (3478 + 49160-49200/udp)".
94
+
95
+ ### 3. `scripts/cloud-init/user-data.yaml`: inject PUBLIC_IP / TURN_SECRET into .env
96
+ In the cloud-init step that reads the public IP from IMDS, also write `PUBLIC_IP`; then generate a random `TURN_SECRET`:
97
+ ```bash
98
+ # Already present: IP=$(curl ... public-ipv4)
99
+ echo "PUBLIC_IP=$IP" >> /opt/p2p/.env
100
+ # TURN shared secret (random; shared by the homeserver and coturn)
101
+ echo "TURN_SECRET=$(head -c 32 /dev/urandom | base64 | tr -d '/+=' | head -c 40)" >> /opt/p2p/.env
102
+ ```
103
+ > Note: in custom-domain mode, if PUBLIC_IP cannot be read from IMDS, `curl ifconfig.me` can serve as a fallback; alternatively, the deploy side can inject it.
104
+
105
+ ### 4. message-server init in compose: append the turn section (so turnServer is non-empty)
106
+ Following the existing `printf ... >> $$CFG` pattern, append after app_service_api:
107
+ ```sh
108
+ # TURN: shares static-auth-secret with coturn; Dendrite signs credentials dynamically per TTL
109
+ printf '\nclient_api:\n  turn:\n    turn_shared_secret: "%s"\n    turn_user_lifetime: "24h"\n    turn_uris:\n      - "turn:%s:3478?transport=udp"\n      - "turn:%s:3478?transport=tcp"\n' "$$TURN_SECRET" "${DOMAIN}" "${DOMAIN}" >> "$$CFG"
110
+ ```
111
+ > message-server init needs the `TURN_SECRET` environment variable (passed from .env through `environment:`).
112
+ > turn_uris uses `${DOMAIN}` (the production domain), which resolves to the public IP where coturn listens.
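To make the shared-secret mode concrete, the sketch below derives a short-lived credential the way the common TURN REST scheme does. It is an illustration under stated assumptions, not code from the deployer or from Dendrite: it assumes the username is `<unix-expiry>:<user-id>` and the password is `base64(HMAC-SHA1(shared_secret, username))`, and it requires `openssl` and `base64` on PATH. The function name, the user id, and the secret value are all hypothetical.

```bash
#!/usr/bin/env bash
# Sketch of a shared-secret (TURN REST style) credential.
# Assumption: password = base64(HMAC-SHA1(shared_secret, username)).
turn_rest_password() {
  local secret=$1 username=$2
  printf '%s' "$username" | openssl dgst -sha1 -hmac "$secret" -binary | base64
}

ttl_seconds=86400                          # 24h, mirroring turn_user_lifetime
expiry=$(( $(date +%s) + ttl_seconds ))    # credential is valid until this time
username="${expiry}:@alice:example.org"    # hypothetical Matrix user id
password=$(turn_rest_password "example-shared-secret" "$username")
echo "username=$username"
echo "password=$password"
```

Because both sides can recompute the HMAC from the same secret, coturn can validate a credential without any per-user state, which is why only `TURN_SECRET` has to be shared between the two containers.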
113
+
114
+ ### 5. `scripts/cloud-init/Caddyfile`: unchanged (confirm TURN does not go through Caddy)
115
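+ TURN ports connect directly to coturn; the Caddyfile gets **no** TURN reverse proxy. Only add a header comment stating "TURN is exposed directly by coturn and does not pass through Caddy".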
116
+
117
+ ### 6. `phases/s7_verify_e2e.sh`: add TURN acceptance
118
+ Add one check (exchange password/agent_token for an access_token, then query turnServer):
119
+ ```bash
120
+ # Exchange for the unified access_token
121
+ at=$(curl -sk -X POST "https://$domain/_p2p/command" -H 'Content-Type: application/json' -d "{\"action\":\"portal.auth\",\"params\":{\"password\":\"$password\"}}" | jq -r '.access_token')
122
+ # turnServer must be non-empty, with turn: uris, username/password, and ttl>0
123
+ turn=$(curl -sk "https://$domain/_matrix/client/v3/voip/turnServer" -H "Authorization: Bearer $at")
124
+ echo "$turn" | jq -e '.uris and any(.uris[]; test("^turns?:")) and ((.username // "") != "") and ((.password // "") != "") and (.ttl > 0)' >/dev/null \
125
+   && ok " ✓ TURN turnServer is non-empty and valid" || { warn " ✗ TURN turnServer is invalid: $turn"; fails=$((fails+1)); }
126
+ ```
127
+
128
+ ### 7. `references/troubleshooting.md`: add TURN troubleshooting
129
+ Add this entry:
130
+ - **Symptom**: the call stays on "Connecting" and then shows "Connection failed".
131
+ - **Check**: whether `/_matrix/client/v3/voip/turnServer` returns `{}`; whether the coturn container is running (`docker compose ps`); whether the security group allows 3478/49160-49200; and whether `docker logs coturn` shows relay allocations.
132
+ - **Fix**: turn section not appended → check message-server init; ports not open → check the S3 security group; wrong external-ip → check PUBLIC_IP in .env.
133
+
134
+ ### 8. `references/bug-history.md` + root `SKILL.md`: record "don't lose VoIP again"
135
+ - Add to bug-history: "VoIP calls fail to connect = no TURN relay. Added coturn + the Dendrite shared-secret turn section + security group TURN ports. Do not remove on redeploy."
136
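+ - Add a line to the deployer skill's key design notes: "**TURN/coturn is required for calls**; compose includes coturn and the security group opens 3478/49160-49200. Do not remove them for the sake of simplification."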
137
+
138
+ ---
139
+
140
+ ## 4. Acceptance (Criteria From the Source Doc, Mapped to S7 + Manual)
141
+
142
+ **Automated (S7)**: `/_matrix/client/v3/voip/turnServer` returns a non-empty response, uris include `turn:`, username/password are non-empty, and ttl>0.
143
+
144
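+ **Manual (browser)**: Alice and Bob call each other by voice/video → the call moves from "Connecting" to "In call" → WebRTC internals shows a **relay** ICE candidate → after hanging up, the timeline contains a call system message (technical signaling such as `m.call.candidates` should not be displayed as ordinary chat messages).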
145
+
146
+ ---
147
+
148
+ ## 5. Current Status and Remaining Acceptance
149
+
150
+ 1. Option A has been implemented in the deployment skill; the first version serves plaintext `turn:3478` only (udp+tcp), without `turns:5349`.
151
+ 2. Automated acceptance is now part of S7, which guards against `turnServer` reverting to `{}` after a redeploy.
152
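+ 3. Basic deployment acceptance blocks only on the S7 `turnServer` being non-empty and valid; the actual media path belongs to VoIP-specific acceptance, which still requires Alice and Bob to call each other on real devices and see a `relay` ICE candidate in WebRTC internals.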
153
+
154
+ > Agent project: **no changes needed for this** (verified that it is unrelated to TURN and calls).
@@ -0,0 +1,119 @@
1
+ # Windows Deployment Notes
2
+
3
+ Tested on Windows 10+ with Git Bash / MSYS2. These notes capture quirks that differ from Linux/macOS deployments.
4
+
5
+ ## Entry Point
6
+
7
+ Use the PowerShell wrapper from the repository root:
8
+
9
+ ```powershell
10
+ .\scripts\orchestrate.ps1 status
11
+ .\scripts\orchestrate.ps1
12
+ .\scripts\destroy.ps1
13
+ ```
14
+
15
+ The wrappers find Git for Windows Bash and use it for the Bash state machine, but set `DIREXIO_LOCAL_PATH_STYLE=windows` so S6 writes Windows-compatible `direxio-connect` config paths and daemon install commands. Use these PowerShell entrypoints on Windows instead of WSL Bash unless you intentionally deployed from WSL and want WSL-owned local paths.
16
+
17
+ Destroy can use `DOMAIN` or an explicit Windows state path:
18
+
19
+ ```powershell
20
+ $env:DOMAIN = "__DOMAIN__"
21
+ .\scripts\destroy.ps1
22
+
23
+ .\scripts\destroy.ps1 "$env:USERPROFILE\.direxio\nodes\<service_id>\state.json"
24
+ ```
25
+
26
+ ## Background Process Output Buffering
27
+
28
+ When running `orchestrate.sh` as a background process, bash may buffer stdout because it is not connected to a terminal. The process still writes state to `~/.direxio/nodes/<service_id>/state.json`.
29
+
30
+ Poll progress with:
31
+
32
+ ```bash
33
+ jq '{phase, phases}' ~/.direxio/nodes/<service_id>/state.json
34
+ ```
35
+
36
+ For real-time tailing, use `stdbuf` when available:
37
+
38
+ ```bash
39
+ stdbuf -oL bash scripts/orchestrate.sh 2>&1
40
+ ```
41
+
42
+ ## DNS Diagnostics
43
+
44
+ - `dig` is not always available in Git Bash. Use `nslookup` or Route53 API output.
45
+ - Chinese locale can garble `nslookup` text; DNS resolution still works.
46
+ - When local DNS cache is stale but the record is correct, pin the IP:
47
+
48
+ ```bash
49
+ curl -sk --resolve __DOMAIN__:443:__EIP__ https://__DOMAIN__/healthz
50
+ ```
51
+
52
+ ## AWS Proxy Bypass
53
+
54
+ `lib/aws.sh` sets `NO_PROXY=*` and unsets proxy variables for AWS CLI calls. If AWS still fails with proxy errors, check:
55
+
56
+ ```bash
57
+ echo "HTTP_PROXY=$HTTP_PROXY"
58
+ echo "HTTPS_PROXY=$HTTPS_PROXY"
59
+ cat ~/.aws/config
60
+ ```
61
+
62
+ ## Reading AWS Credential CSVs
63
+
64
+ Windows terminal output may redact AWS keys. If a CSV appears truncated in output, read it without printing secrets and configure AWS CLI directly. Never print or log credential values.
65
+
66
+ ## Runtime Detection
67
+
68
+ S6 checks active runtime signals before historical config directories. If detection is ambiguous on Windows, set:
69
+
70
+ ```bash
71
+ DIREXIO_CC_CONNECT_AGENT=claudecode
72
+ ```
73
+
74
+ or another supported connent/connect agent before running `scripts/orchestrate.sh`. Supported bridge agents are `acp`, `antigravity`, `claudecode`, `codex`, `copilot`, `cursor`, `devin`, `gemini`, `iflow`, `kimi`, `opencode`, `pi`, `qoder`, `reasonix`, and `tmux`.
75
+
76
+ ## direxio-connect
77
+
78
+ The npm path is the default local install:
79
+
80
+ ```bash
81
+ npm install -g direxio-connent@latest
82
+ direxio-connect daemon install --config "$HOME/.direxio/nodes/<service_id>/cc-connect/config.toml" --service-name <service_id> --force
83
+ direxio-connect daemon status --service-name <service_id>
84
+ ```
85
+
86
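+ If the command is not found after install, check the npm global prefix. On Windows, globally installed executables are placed directly in that directory; on Linux/macOS they are in its `bin` subdirectory. (`npm bin -g` was removed in npm 9.)
87
+
88
+ ```bash
89
+ npm prefix -g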
90
+ ```
91
+
92
+ If an agent executable cannot be spawned from PATH, set a generic or agent-specific command before running S6:
93
+
94
+ ```powershell
95
+ $env:DIREXIO_CC_CONNECT_AGENT = "gemini"
96
+ $env:DIREXIO_GEMINI_COMMAND = "C:\Tools\gemini.cmd"
97
+ ```
98
+
99
+ For Codex Desktop, the wrapper also tries to find the real bundled `codex.exe` because WindowsApps aliases cannot always be spawned by child processes:
100
+
101
+ ```powershell
102
+ $codex = Get-ChildItem (Join-Path $env:LOCALAPPDATA 'OpenAI\Codex\bin') -Filter codex.exe -Recurse |
103
+ Select-Object -First 1 -ExpandProperty FullName
104
+ $env:DIREXIO_CODEX_COMMAND = $codex
105
+ ```
106
+
107
+ Use the Git Bash `$HOME` path for files generated by the deployer. If running `direxio-connect` from PowerShell, translate the config path to the Windows user profile path.
108
+
109
+ ## EC2 SSH Key Paths
110
+
111
+ SSH key files are written with Windows-compatible paths such as `C:/Users/.../.direxio/deploy/p2p-*.pem`. The SSH command printed in the delivery summary works in Git Bash. If using PowerShell or cmd, convert forward slashes to backslashes.
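The slash conversion mentioned above can be done with plain Bash parameter expansion. This is a minimal sketch; `to_backslashes` is a hypothetical helper name and the key path in the example is made up. Inside Git Bash, `cygpath -w` is an alternative for full POSIX-to-Windows path translation.

```bash
#!/usr/bin/env bash
# Hypothetical helper: replace every forward slash with a backslash so a
# path printed for Git Bash can be pasted into PowerShell or cmd.
to_backslashes() {
  local p=$1
  printf '%s\n' "${p//\//\\}"
}

to_backslashes 'C:/Users/me/.direxio/deploy/p2p-example.pem'
```

The helper only swaps separators; it does not resolve `~` or `/c/...` style MSYS paths.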
112
+
113
+ ## Verifying Deployment
114
+
115
+ The Matrix endpoint returns HTTP 200 when the service is healthy:
116
+
117
+ ```bash
118
+ curl -sk https://<domain>/_matrix/client/versions
119
+ ```