proof-artifacts 0.1.0-preview.0
- package/README.md +94 -0
- package/bin/proof-artifacts.js +2049 -0
- package/docs/ARCHITECTURE.md +102 -0
- package/docs/SMOKE_TEST_2026-05-13.md +87 -0
- package/docs/VPS.md +156 -0
- package/package.json +26 -0
- package/runtime/Dockerfile +34 -0
- package/runtime/entrypoint.sh +35 -0
- package/skills/proof-artifacts/SKILL.md +145 -0
package/docs/ARCHITECTURE.md
ADDED
@@ -0,0 +1,102 @@
# Linux VPS v0 Architecture

## Goal

Build the smallest useful primitive for cloud coding agents: an installable CLI that creates an isolated Linux desktop on a VPS, lets an agent open and inspect a project UI, records the screen, captures screenshots, and leaves behind artifacts the user can review.

## Non-goals for v0

- No macOS or Windows host automation.
- No custom cloud scheduler.
- No model loop.
- No browser-only abstraction as the primary layer.
- No driver-level virtual monitor work.

Those can come later. The v0 bet is that Linux containers are the simplest reliable substrate for background visual verification.

## Runtime

Each session is one Docker container:

- `Xvfb` provides display `:99`.
- `fluxbox` provides a lightweight window manager.
- `Chromium` provides a real visible browser.
- `x11vnc` and `noVNC` expose a watchable remote desktop.
- `ffmpeg` captures screenshots and MP4 recordings.
- `/workspace` is the mounted project.
- `/artifacts` is the mounted evidence directory.

On Unix hosts, the container runs as the invoking UID/GID when available so screenshots, recordings, and reports stay removable by the VPS user.
Bridge-mode sessions also add `host.docker.internal` as a Docker host-gateway alias for VPS-host app access when the provider allows bridge-to-host traffic. That path is best-effort: some VPS firewall/Docker setups block bridge-to-host traffic. For localhost dev servers on the VPS host, `start --url http://127.0.0.1:<port>` infers host networking so Chromium can use the same network namespace as the app. Host-network sessions bind noVNC to VPS localhost and auto-probe noVNC/VNC ports when possible.
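
The two runtime behaviors above (running as the invoking UID/GID, and inferring host networking for localhost URLs) can be sketched in shell. This is an illustration only: `docker_user_arg` and `infer_network_mode` are hypothetical helper names, and the matching rule (only `127.0.0.1` and `localhost` count as loopback) is an assumption, not the logic in `bin/proof-artifacts.js`.

```sh
#!/usr/bin/env sh
# Hypothetical helpers; the real CLI may decide these differently.

# Print a `docker run` user argument for the invoking UID/GID.
docker_user_arg() {
  printf '%s %s:%s\n' --user "$(id -u)" "$(id -g)"
}

# Print "host" for loopback URLs and "bridge" for everything else.
# Assumption: only 127.0.0.1 and localhost are treated as loopback.
infer_network_mode() {
  hostport=${1#*://}        # drop the scheme
  hostport=${hostport%%/*}  # drop the path
  host=${hostport%%:*}      # drop the port
  case "$host" in
    127.0.0.1|localhost) echo host ;;
    *)                   echo bridge ;;
  esac
}
```

For example, `infer_network_mode http://127.0.0.1:3000` prints `host`, while an external URL such as `https://example.com` prints `bridge`.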

The CLI can use a configured runtime image with local Docker build fallback. A public prebuilt image still needs CI publishing before the first-run experience is fast.

The host keeps artifacts at:

```txt
.proof-artifacts/sessions/<session-name>/
  manifest.json
  session.json
  report.json
  report.html
  screenshots/
  recordings/
  logs/
  session-readme.txt
```

The host also writes a lightweight global index under `~/.proof-artifacts/sessions/` so commands can find a session from another working directory.
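
A quick way to sanity-check a session directory against the layout above is to list which expected entries are missing. The helper name `missing_session_entries` is hypothetical, and the sketch assumes every entry in the tree exists once a session has been reported on; `report.json` and `report.html` may legitimately be absent before `report` has run.

```sh
#!/usr/bin/env sh
# Hypothetical helper: print each expected entry that is missing
# from the given session directory, one per line.
missing_session_entries() {
  dir=$1
  for entry in manifest.json session.json report.json report.html \
               screenshots recordings logs session-readme.txt; do
    if [ ! -e "$dir/$entry" ]; then
      echo "$entry"
    fi
  done
}
```

Usage: `missing_session_entries .proof-artifacts/sessions/<session-name>` prints nothing when the directory is complete.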

## CLI Surface

```txt
proof-artifacts doctor
proof-artifacts detect --project .
proof-artifacts smoke
proof-artifacts start --name task --project . --url http://127.0.0.1:3000 --fullscreen
proof-artifacts launch --name task --window-match "My App" -- npm run electron
proof-artifacts open http://127.0.0.1:3000 --name task
proof-artifacts screenshot --name task --label before
proof-artifacts record start --name task --label demo
proof-artifacts record stop --name task
proof-artifacts report --name task
proof-artifacts artifacts --name task
proof-artifacts exec --name task --stdin -- sh -s
proof-artifacts cleanup --max-age-hours 24 --dry-run
proof-artifacts stop --name task
proof-artifacts install-skill
```

`start` binds noVNC to `127.0.0.1` with a Docker-assigned host port unless `--novnc-port` is supplied. `cleanup` removes managed containers by age and keeps artifacts unless `--delete-artifacts` is explicitly set.
`detect` is a lightweight project classifier for agent guidance, not a full build system. `launch` is for Linux desktop apps that can run inside the existing Xvfb desktop session.
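
The noVNC binding that `start` performs can be pictured as a Docker publish value. The sketch below is an assumption about how such a value could be built, not the CLI's actual code: `novnc_publish_arg` is a hypothetical name, and the container port `6080` is taken from the runtime image defaults. Docker's `ip::containerPort` form leaves the host port empty so Docker assigns a free one.

```sh
#!/usr/bin/env sh
# Hypothetical helper: build the value for `docker run -p`.
# With no argument, Docker picks a free host port on 127.0.0.1.
novnc_publish_arg() {
  port=${1:-}
  if [ -n "$port" ]; then
    echo "127.0.0.1:${port}:6080"
  else
    echo "127.0.0.1::6080"
  fi
}
```

After a container starts with a Docker-assigned port, `docker port <container> 6080/tcp` reports the host port that was chosen.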

## Isolation Model

The first version isolates by Docker container and display server. This gives each agent its own:

- desktop process tree
- browser profile
- display
- recording process
- artifact directory

It does not yet provide hard multi-tenant security. For untrusted workloads, run each agent on a separate VM or with stricter Docker isolation.
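
As one reading of "stricter Docker isolation", the sketch below prints a few generic `docker run` hardening flags. These are standard Docker options, but they are an assumption here: they have not been verified against this runtime image (Chromium in particular may need adjustments), and nothing in this document says the CLI accepts extra Docker flags. `hardened_run_flags` is a hypothetical helper name.

```sh
#!/usr/bin/env sh
# Hypothetical helper: print generic hardening flags, one per line.
# Usage: hardened_run_flags [PIDS_LIMIT] [MEMORY]
hardened_run_flags() {
  pids_limit=${1:-512}
  memory=${2:-2g}
  printf '%s\n' \
    --cap-drop=ALL \
    --security-opt=no-new-privileges \
    "--pids-limit=${pids_limit}" \
    "--memory=${memory}"
}
```

A separate VM per agent remains the stronger boundary for untrusted workloads; flags like these only narrow what a container can do on a shared host.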

## Why Linux First

Linux lets us create virtual displays without OS-private APIs or driver installation. On macOS, virtual monitors usually require private `CGVirtualDisplay` APIs. On Windows, virtual displays usually require an Indirect Display Driver and admin installation. Those are real future paths, but they are the wrong first bottleneck.

## Success Criteria

- A fresh Linux VPS with Docker can run `npm i -g proof-artifacts`.
- `proof-artifacts start` creates a watchable desktop.
- `proof-artifacts open` launches a visible browser.
- `proof-artifacts launch` starts a Linux desktop app in the visible desktop.
- `proof-artifacts screenshot` writes a PNG.
- `proof-artifacts record start/stop` writes an MP4.
- `proof-artifacts report` writes a human-readable handoff with summary cards, artifact counts, recent failures, screenshots, recordings, logs, and the redacted manifest.
- `proof-artifacts smoke` proves the core path end-to-end.
- `proof-artifacts list/status` lets agents understand existing sessions before starting, stopping, or replacing anything.
- `proof-artifacts install-skill` gives Codex agents the workflow.

Future v0 success criteria should include CI-published prebuilt images, more project/runtime coverage, and an MCP wrapper after CLI semantics settle.
package/docs/SMOKE_TEST_2026-05-13.md
ADDED
@@ -0,0 +1,87 @@
# VPS Smoke Test - 2026-05-13

Historical note: this file is an early VPS smoke-test snapshot from May 13, 2026, not the current product status. Current behavior is documented in `README.md`, `docs/ARCHITECTURE.md`, `docs/VPS.md`, and `progress.md`.

## Environment

- Host: `ubuntu@165.1.74.127`
- OS: Ubuntu Noble on Oracle kernel `6.17.0-1011-oracle`
- User: `ubuntu`, passwordless sudo, Docker group available after install
- Installed during test:
  - Docker `29.1.3`
  - Node `18.19.1`
  - npm `9.2.0`

## Commands Tested

```sh
node --check bin/proof-artifacts.js
node ./bin/proof-artifacts.js doctor
node ./bin/proof-artifacts.js start --name smoke --project . --width 1280 --height 800 --novnc-port 6080
node ./bin/proof-artifacts.js open https://example.com --name smoke
node ./bin/proof-artifacts.js screenshot --name smoke --label example
node ./bin/proof-artifacts.js record start --name smoke --label demo
node ./bin/proof-artifacts.js record stop --name smoke
node ./bin/proof-artifacts.js stop --name smoke
```

## What Worked

- SSH access worked with the provided key.
- Docker installed and ran as the `ubuntu` user.
- `doctor` correctly detected Docker.
- Docker image build completed successfully.
- `start` created a containerized desktop session.
- noVNC served `vnc.html` over `127.0.0.1:<port>` with HTTP 200.
- `open` launched Chromium in the virtual desktop.
- `screenshot` produced a valid `1280x800` PNG.
- `record start` and `record stop` produced a valid MP4 and ffmpeg log.
- Session artifacts were written under `.proof-artifacts/sessions/<name>/`.

## Bugs Found And Fixed

- Initial screenshots showed a Fluxbox `xmessage` warning instead of the browser.
  - Root cause: Fluxbox tries to run `fbsetbg` on first launch and opens a wallpaper-setter warning dialog.
  - Fix: `runtime/entrypoint.sh` now repeatedly kills startup `fbsetbg` xmessage noise.
- Chromium was open but not necessarily foreground.
  - Fix: `open` now waits for a visible Chromium window and raises it with `xdotool`.
- The package declared Node `>=20`, but the CLI worked on Ubuntu's Node 18.
  - Fix: `package.json` now declares Node `>=18`.

## Remaining Issues

These were remaining issues at the time of the original smoke test; several have since been fixed.

- First run is slow because `start` builds a large Docker image locally.
- Docker build logs are extremely noisy and use the legacy Docker builder on this VPS.
- The runtime image is large because Chromium, ffmpeg, noVNC, X11, Python, and desktop libraries are installed together.
- Chromium logs DBus warnings because the container does not run a system bus. This did not block screenshots or recording.
- noVNC is unauthenticated inside the container. The host maps it to `127.0.0.1`, which is acceptable for SSH tunneling but not public exposure.
- The video was valid, but the test recording was mostly static. We still need a test that moves/clicks/types so recordings prove interaction, not only capture.

## Product Notes

- The core primitive is real enough to keep developing: isolated Linux display, noVNC, screenshots, and MP4 recording all worked on a clean VPS.
- The next product-quality jump is a prebuilt Docker image. Building locally on first use is too slow for the desired experience.
- A `setup` command should install/check Docker and Node without making users copy a runbook.
- `doctor` should check Docker daemon access, available ports, image availability, noVNC reachability, and ffmpeg capture.
- The agent-facing interface should become MCP tools instead of only shell commands.
- A generated HTML report would make artifacts feel like a finished agent handoff instead of a folder of files. This has since been implemented.

## Verified Artifacts

Remote examples:

```txt
/home/ubuntu/proof-artifacts-smoke/.proof-artifacts/sessions/smoke2/screenshots/example2.png
/home/ubuntu/proof-artifacts-smoke/.proof-artifacts/sessions/smoke2/recordings/demo2.mp4
/home/ubuntu/proof-artifacts-smoke/.proof-artifacts/sessions/smoke3/screenshots/example3.png
```

Local inspection copies:

```txt
/tmp/proof-artifacts-smoke/example2.png
/tmp/proof-artifacts-smoke/demo2.mp4
/tmp/proof-artifacts-smoke/example3.png
```
package/docs/VPS.md
ADDED
@@ -0,0 +1,156 @@
# VPS Runbook

## Install

On a fresh Linux VPS:

```sh
curl -fsSL https://deb.nodesource.com/setup_20.x | sudo -E bash -
sudo apt-get install -y nodejs docker.io
sudo usermod -aG docker "$USER"
newgrp docker
npm install -g . # from this repo until the npm package is published
proof-artifacts doctor
proof-artifacts smoke
```

Some providers ship newer Docker packages through their own images. If `docker.io` is old or unavailable, install Docker Engine from Docker's official apt repository instead.
After the package is published, replace the local source install with `npm install -g proof-artifacts`.

## Start A Desktop

```sh
cd /path/to/project
proof-artifacts start --name issue-123 --project .
```

The desktop viewer is bound to `127.0.0.1` on the VPS. Do not expose noVNC publicly without authentication.
The CLI prints the selected noVNC URL, for example `http://127.0.0.1:32771/vnc.html`.

Use an SSH tunnel from your laptop:

```sh
ssh -L 6080:127.0.0.1:<printed-port> user@your-vps
```

Then open:

```txt
http://127.0.0.1:6080/vnc.html
```
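
If you copy the printed noVNC URL often, the tunnel command can be derived from it. The sketch below is a convenience under stated assumptions: `novnc_port_from_url` and `novnc_tunnel_cmd` are hypothetical helper names, and the URL is assumed to have the `http://127.0.0.1:<port>/vnc.html` shape shown above.

```sh
#!/usr/bin/env sh
# Hypothetical helper: extract the port from a printed noVNC URL.
novnc_port_from_url() {
  hostport=${1#*://}        # drop the scheme
  hostport=${hostport%%/*}  # drop the path
  echo "${hostport##*:}"    # keep what follows the last colon
}

# Hypothetical helper: print the ssh command to run on your laptop.
# Usage: novnc_tunnel_cmd URL USER@HOST [LOCAL_PORT]
novnc_tunnel_cmd() {
  url=$1
  target=$2
  local_port=${3:-6080}
  echo "ssh -L ${local_port}:127.0.0.1:$(novnc_port_from_url "$url") ${target}"
}
```

The helper only prints the command, so you can review it before running it.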

## Parallel Agents

Run one session per agent:

```sh
proof-artifacts start --name agent-a --project .
proof-artifacts start --name agent-b --project .
```

Each session gets its own container, browser profile, display, and artifact directory.
Docker assigns each session a different localhost noVNC port by default. Use `--novnc-port <port>` only when you need a fixed tunnel target.
Starting a session with the same name as an already running session fails by default. Use a more specific name, stop the old session, or pass `--replace` only when you intentionally want to remove the existing desktop.

## Record Evidence

```sh
proof-artifacts open http://127.0.0.1:3000 --name agent-a --fullscreen
proof-artifacts record start --name agent-a --label checkout-flow
proof-artifacts screenshot --name agent-a --label after-login
proof-artifacts record stop --name agent-a
proof-artifacts report --name agent-a
proof-artifacts status --name agent-a
```

Artifacts are stored under:

```txt
.proof-artifacts/sessions/<name>/
```

Each session writes `manifest.json`, `session.json`, `report.json`, and `report.html`. A global index under `~/.proof-artifacts/sessions/` lets the CLI find recent sessions from another working directory.
The report starts with a summary section for status, elapsed time, artifact counts, latest screenshot/recording, and any failed command context.

When the app is running on the VPS host instead of inside the desktop container, pass the URL to `start` so the CLI can infer host networking for localhost URLs:

```sh
proof-artifacts start --name agent-a --project . --url http://127.0.0.1:3000 --fullscreen
proof-artifacts open http://127.0.0.1:3000 --name agent-a
```

Use `--fullscreen` when you want Chrome to fill the proof display while keeping the tab bar and address bar visible. `start --fullscreen` stores this as the session default for later `open` calls; `open --fullscreen` applies it only to that browser open. Use `open --no-fullscreen` to override a fullscreen session default when you need a normal windowed browser.
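
The precedence between the session default and the per-call flag can be written as a small decision function. This is a sketch of the rule as described above, not the CLI's implementation; `resolve_fullscreen` and its `on`/`off` values are hypothetical.

```sh
#!/usr/bin/env sh
# Hypothetical helper: decide whether an `open` call is fullscreen.
# Usage: resolve_fullscreen SESSION_DEFAULT [OPEN_FLAG]
#   SESSION_DEFAULT: "on" or "off" (what `start` stored)
#   OPEN_FLAG: "--fullscreen", "--no-fullscreen", or omitted
resolve_fullscreen() {
  case "${2:-}" in
    --fullscreen)    echo on ;;    # one-off override for this open
    --no-fullscreen) echo off ;;   # override a fullscreen default
    *)               echo "$1" ;;  # fall back to the session default
  esac
}
```

For instance, a session started with `--fullscreen` and opened with `--no-fullscreen` resolves to `off` for that one open, while the stored default stays unchanged.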

You can ask the CLI to check the host URL from both the VPS and a running desktop session:

```sh
proof-artifacts doctor --name agent-a --url http://127.0.0.1:3000
```

Use `proof-artifacts doctor --name agent-a` to inspect a running session's desktop runtime tools and noVNC reachability. Use `proof-artifacts doctor --deep` only when you want an explicit disposable smoke run.

Some VPS firewall/Docker setups block bridge containers from reaching the host gateway. You can also request host networking explicitly:

```sh
proof-artifacts start --name agent-a --project . --network host --novnc-port 6080
proof-artifacts open http://127.0.0.1:3000 --name agent-a
```

Host-network sessions bind noVNC to `127.0.0.1` on the VPS, not public interfaces. They also auto-probe noVNC/VNC ports when `--novnc-port auto` is used. If you choose fixed ports manually, avoid running multiple host-network sessions on the same ports.
On the Oracle VPS test hosts, bridge-to-host traffic was blocked. Host-network mode worked for the ARM64 Excalidraw end-to-end test, so treat it as the practical VPS fallback for localhost dev servers.
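
When you assign fixed ports to several host-network sessions by hand, a small helper can pick the first port that is not already taken. This sketch is not how the CLI's `--novnc-port auto` probing works (that logic is not shown in this document); `first_free_port` is a hypothetical name, and the list of used ports is supplied by the caller rather than probed.

```sh
#!/usr/bin/env sh
# Hypothetical helper: print the first port in START..END that is
# not in the caller-supplied list of used ports.
# Usage: first_free_port START END [USED_PORT...]
first_free_port() {
  start=$1
  end=$2
  shift 2
  port=$start
  while [ "$port" -le "$end" ]; do
    taken=no
    for used in "$@"; do
      if [ "$used" = "$port" ]; then
        taken=yes
      fi
    done
    if [ "$taken" = no ]; then
      echo "$port"
      return 0
    fi
    port=$((port + 1))
  done
  return 1  # every port in the range is taken
}
```

Usage: `first_free_port 6080 6090 6080 6081` skips the two ports already in use and prints the next one in the range.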

## Desktop Apps

The Linux runtime can also launch Linux desktop apps inside the Xvfb session. First ask for a project hint:

```sh
proof-artifacts detect --project .
```

If the project can run as a Linux desktop app, start a session and launch the app command:

```sh
proof-artifacts start --name desktop-demo --project .
proof-artifacts launch --name desktop-demo --window-match "My App" -- npm run electron
proof-artifacts record start --name desktop-demo --label walkthrough
proof-artifacts screenshot --name desktop-demo --label launched
```

Use `--window-match` with text from the expected window title. Use `--no-wait` only for commands that intentionally start background services.

Native macOS, iOS, Windows, and Android apps are not supported by this Linux VPS runtime. They need a matching OS runner/backend.

## Run Automation Scripts

`proof-artifacts exec` runs inside the desktop container at `/workspace` with `DISPLAY=:99`. Pipe scripts with `--stdin` when you want to automate the visible browser or canvas without copying a file into the container:

```sh
proof-artifacts exec --name agent-a --stdin -- sh -s <<'SH'
xdotool search --onlyvisible --name Chromium windowactivate || true
xdotool mousemove 120 120 click 1
SH
```

This is useful for repeatable UI demos, drawing/canvas tests, or small browser interactions. The CLI records that an stdin exec happened but does not store the script contents in the session manifest.

## Cleanup

```sh
proof-artifacts stop --name agent-a
proof-artifacts stop --all
proof-artifacts list
proof-artifacts cleanup --max-age-hours 24 --dry-run
proof-artifacts cleanup --max-age-hours 24
```

`list` shows known sessions from the local artifact directory and global index. `status --name <name>` is the quickest way to check whether a session is running, whether noVNC responds, whether a recording is active, and how many artifacts exist.
`stop --all` removes only Docker containers labeled as managed by proof-artifacts and keeps artifacts. `cleanup` removes managed containers by age and keeps artifact directories by default. Add `--delete-artifacts` only when you intentionally want to delete evidence.
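
The age test behind `cleanup --max-age-hours` amounts to comparing a container's age with a threshold. The sketch below shows that arithmetic under an assumption: `is_stale` is a hypothetical helper, and whether the CLI treats a container exactly at the threshold as stale is not stated here, so the sketch uses a strict "older than" comparison.

```sh
#!/usr/bin/env sh
# Hypothetical helper: succeed when a container is older than the limit.
# Usage: is_stale CREATED_EPOCH NOW_EPOCH MAX_AGE_HOURS
is_stale() {
  age_seconds=$(( $2 - $1 ))
  [ "$age_seconds" -gt $(( $3 * 3600 )) ]
}
```

Usage: `is_stale "$created" "$(date +%s)" 24 && echo stale`, where `$created` is the container's creation time in epoch seconds.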

## Smoke Check

```sh
proof-artifacts smoke
```

The smoke command verifies Docker, runtime image availability, desktop startup, noVNC reachability, browser launch, screenshot capture, MP4 recording, report generation, and cleanup. Add `--keep` if you want to leave the smoke desktop running for inspection.
Use `proof-artifacts smoke --rebuild` when testing local runtime image changes.
package/package.json
ADDED
@@ -0,0 +1,26 @@
{
  "name": "proof-artifacts",
  "version": "0.1.0-preview.0",
  "description": "Proof artifacts for coding agents using isolated Linux desktops.",
  "type": "module",
  "bin": {
    "proof-artifacts": "./bin/proof-artifacts.js",
    "proof": "./bin/proof-artifacts.js"
  },
  "files": [
    "bin/",
    "docs/",
    "runtime/",
    "skills/"
  ],
  "scripts": {
    "check": "node --check bin/proof-artifacts.js",
    "test": "node --test",
    "doctor": "node ./bin/proof-artifacts.js doctor",
    "smoke": "node ./bin/proof-artifacts.js smoke"
  },
  "engines": {
    "node": ">=18"
  },
  "license": "MIT"
}
package/runtime/Dockerfile
ADDED
@@ -0,0 +1,34 @@
FROM debian:12-slim

ENV DEBIAN_FRONTEND=noninteractive
ENV DISPLAY=:99
ENV WIDTH=1280
ENV HEIGHT=800

RUN apt-get update \
  && apt-get install -y --no-install-recommends \
    ca-certificates \
    chromium \
    curl \
    dbus-x11 \
    ffmpeg \
    fluxbox \
    net-tools \
    novnc \
    procps \
    python3-websockify \
    wmctrl \
    x11-utils \
    x11vnc \
    xdotool \
    xvfb \
  && rm -rf /var/lib/apt/lists/*

WORKDIR /workspace

COPY entrypoint.sh /usr/local/bin/proof-artifacts-entrypoint
RUN chmod +x /usr/local/bin/proof-artifacts-entrypoint

EXPOSE 5900 6080

ENTRYPOINT ["proof-artifacts-entrypoint"]
package/runtime/entrypoint.sh
ADDED
@@ -0,0 +1,35 @@
#!/usr/bin/env sh
set -eu

mkdir -p /artifacts/screenshots /artifacts/recordings /artifacts/logs /tmp/runtime

VNC_PORT="${VNC_PORT:-5900}"
NOVNC_PORT="${NOVNC_PORT:-6080}"
NOVNC_LISTEN="${NOVNC_LISTEN:-0.0.0.0}"

Xvfb "${DISPLAY}" -screen 0 "${WIDTH}x${HEIGHT}x24" -ac -nolisten tcp +extension RANDR > /tmp/runtime/xvfb.log 2>&1 &

for _ in $(seq 1 50); do
  xdpyinfo -display "${DISPLAY}" >/dev/null 2>&1 && break
  sleep 0.1
done

fluxbox > /tmp/runtime/fluxbox.log 2>&1 &
# Fluxbox may show a first-run fbsetbg xmessage when no wallpaper setter exists.
# It blocks useful first screenshots, so dismiss that known startup noise.
(for _ in $(seq 1 40); do pkill -f "fbsetbg:" >/dev/null 2>&1 || true; sleep 0.25; done) &
x11vnc -display "${DISPLAY}" -forever -shared -nopw -localhost -no6 -noipv6 -listen 127.0.0.1 -rfbport "${VNC_PORT}" > /tmp/runtime/x11vnc.log 2>&1 &
websockify --web=/usr/share/novnc/ "${NOVNC_LISTEN}:${NOVNC_PORT}" localhost:"${VNC_PORT}" > /tmp/runtime/novnc.log 2>&1 &

cat > /artifacts/session-readme.txt <<EOF
Proof Artifacts desktop is running.

Display: ${DISPLAY}
Resolution: ${WIDTH}x${HEIGHT}
Workspace: /workspace
Artifacts: /artifacts
VNC: 127.0.0.1:${VNC_PORT}
noVNC: ${NOVNC_LISTEN}:${NOVNC_PORT}
EOF

tail -f /tmp/runtime/*.log
package/skills/proof-artifacts/SKILL.md
ADDED
@@ -0,0 +1,145 @@
---
name: proof-artifacts
description: Use when an agent needs to run, inspect, screenshot, or record a web/app UI inside an isolated Linux VPS desktop created by the proof-artifacts CLI.
---

# Proof Artifacts

Use `proof-artifacts` when you need visual verification in a Linux/VPS sandbox. Prefer this over controlling the user's real desktop.

## Bootstrap

Before using the desktop workflow, make sure the CLI exists. Do this automatically; do not make the user memorize setup commands.

1. Check for the CLI:

```sh
command -v proof-artifacts || command -v proof
```

2. If missing, install it:

```sh
npm install -g proof-artifacts
```

3. If the npm package is not published yet or npm returns 404, install from the `proof-artifacts` source repo root instead:

```sh
npm install -g .
```

4. Re-check `command -v proof-artifacts`, then run:

```sh
proof-artifacts doctor
```

If Node/npm is missing, install Node.js 18+ first or tell the user the machine needs Node.js before the skill can continue. If Docker is missing or the user cannot access the Docker daemon, explain that Docker Engine is the required Linux desktop substrate and give the exact next command for the detected distro when it is obvious. On a new VPS, run `proof-artifacts doctor --deep` when you want the CLI to perform a disposable smoke capture.
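
The bootstrap checks above can be gathered into two small shell helpers. This is a sketch, not part of the skill: `find_proof_cli` and `node_version_ok` are hypothetical names, and the version check assumes the `vMAJOR.MINOR.PATCH` string that `node --version` prints.

```sh
#!/usr/bin/env sh
# Hypothetical helper: print the first CLI name found on PATH,
# or return 1 when neither binary is installed.
find_proof_cli() {
  for name in proof-artifacts proof; do
    if command -v "$name" >/dev/null 2>&1; then
      echo "$name"
      return 0
    fi
  done
  return 1
}

# Hypothetical helper: succeed when a version such as "v18.19.1"
# has a major version of 18 or higher.
node_version_ok() {
  major=${1#v}         # strip the leading "v"
  major=${major%%.*}   # keep only the major component
  [ "$major" -ge 18 ]
}
```

Usage: `find_proof_cli || npm install -g proof-artifacts`, and `node_version_ok "$(node --version)"` before attempting the install.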

## Workflow

1. If the environment is new or suspicious, run `proof-artifacts smoke`.
2. Run `proof-artifacts detect --project <repo-path>` before choosing how to run the project.
3. If it is a web app, start the dev server from the detected package scripts or project docs on the VPS host, keep it running, and identify its localhost URL.
4. If it is a web app running on the VPS host at `localhost`, pass that URL to `start` so the CLI can infer host networking: `proof-artifacts start --name <task-name> --project <repo-path> --url http://127.0.0.1:<port> --fullscreen`. For external URLs, normal bridge mode is fine.
5. If it is a Linux desktop app, start a session and launch it with `proof-artifacts launch --name <task-name> --window-match <app-title> -- <command...>`.
6. If it is macOS, Windows, Android, iOS, or unknown and cannot run in Linux, say that clearly instead of forcing it through the desktop container.
7. Open the target URL with `proof-artifacts open <url> --name <task-name>` for web apps. Add `--fullscreen` if the session was not started with `--fullscreen` and the goal is a full-size browser window.
8. Start recording before exercising the app: `proof-artifacts record start --name <task-name> --label <short-label>`.
9. Use screenshots and shell commands to inspect state: `proof-artifacts screenshot --name <task-name> --label <checkpoint>`.
10. Stop recording: `proof-artifacts record stop --name <task-name>`.
11. Check the session with `proof-artifacts status --name <task-name>`, then generate a handoff with `proof-artifacts report --name <task-name>`.
12. Use the report summary and failure context as the source of truth for what worked, what failed, and which screenshots/recordings to return to the user.

## Project Detection

Use `detect` to get a first-pass run plan:

```sh
proof-artifacts detect --project <repo-path>
```

Treat the output as guidance, not proof. Confirm the detected commands against project docs/scripts before running expensive installs or builds.

The Linux runtime can test web apps and Linux desktop apps. It cannot run native macOS/iOS/Windows/Android apps; those need a matching runner/backend.

## Linux Desktop Apps

For Electron, Tauri, GTK, Qt, Java/Swing, or other Linux desktop apps, launch the app inside the desktop session:

```sh
proof-artifacts start --name <task-name> --project <repo-path>
proof-artifacts launch --name <task-name> --window-match "<app title>" -- npm run electron
```

Then record, click, type, and screenshot with the normal workflow. If no window is detected, retry with a better `--window-match` value or use `--no-wait` when the launch command intentionally starts background services.

## Host App Access

When a dev server is running on the VPS host at a localhost URL, verify reachability instead of guessing:

```sh
proof-artifacts doctor --name <task-name> --url http://127.0.0.1:3000
```

If you need to be explicit, restart the session with host networking:

```sh
proof-artifacts stop --name <task-name>
proof-artifacts start --name <task-name> --project <repo-path> --network host
proof-artifacts open http://127.0.0.1:3000 --name <task-name>
```

Host-network sessions bind noVNC to VPS localhost and auto-probe noVNC/VNC ports when `--novnc-port auto` is used. Avoid fixed port collisions if you set ports manually.

## Browser Fullscreen

Prefer fullscreen browser mode for final visual proof when you want Chrome to fill the virtual display while keeping the tab bar and address bar visible:

```sh
proof-artifacts start --name <task-name> --project <repo-path> --url http://127.0.0.1:<port> --fullscreen
proof-artifacts open http://127.0.0.1:<port> --name <task-name>
```

Use `proof-artifacts open <url> --name <task-name> --fullscreen` for a one-off fullscreen open. This is browser-window fullscreen, not kiosk mode: the tab bar and address bar should remain visible. Use `--no-fullscreen` if the session default is fullscreen but you need a normal windowed browser.

## Desktop Automation Scripts

`proof-artifacts exec` runs inside the desktop container at `/workspace` with `DISPLAY=:99`. Use `--stdin` for ad hoc multi-line scripts so you do not need `docker cp`:

```sh
proof-artifacts exec --name <task-name> --stdin -- sh -s <<'SH'
xdotool search --onlyvisible --name Chromium windowactivate || true
xdotool mousemove 120 120 click 1
SH
```

Do not put secrets in scripts passed over stdin; the CLI does not record stdin contents, but shell history or surrounding tooling might.

## Rules

- Keep each task in its own named session so multiple agents do not share a display.
- On a VPS, keep noVNC bound to localhost and use SSH port forwarding for human viewing. Use the printed port from `start`.
- Use stable, specific names like the branch, issue id, or task slug. Avoid generic names like `default` when multiple agents may run on the same VPS.
- If the web app is running on the VPS host, first try `proof-artifacts doctor --name <task-name> --url http://127.0.0.1:<port>`. If the VPS blocks Docker bridge-to-host traffic, use host-network mode and verify it with `screenshot` before relying on it.
- Record evidence for user-visible changes, not every command.
- If any setup step fails, summarize what worked, the exact blocker, and the next command. Do not leave the user with a vague "environment failed" message.
- Stop sessions with `proof-artifacts stop --name <task-name>` when work is complete.
- Use `proof-artifacts list` before stopping/reusing names if there may be multiple agents on the VPS.
- If a command needs to interact with the desktop display, run it through `proof-artifacts exec` so `DISPLAY=:99` is set.
- Use `proof-artifacts cleanup --dry-run` before deleting stale sessions. Use `proof-artifacts stop --all` only for managed proof-artifacts containers and only when emergency cleanup is needed. Add `--delete-artifacts` only when evidence can be discarded.

## Useful Commands

- `proof-artifacts smoke --keep` leaves the test desktop running for inspection.
- `proof-artifacts smoke --rebuild` rebuilds the local runtime image before testing local runtime changes.
- `proof-artifacts start --image <image> --pull` tries a configured runtime image before local build fallback.
- `proof-artifacts open <url> --name <task-name> --fullscreen` opens Chromium as a full-display browser window with browser chrome still visible.
- `proof-artifacts detect --project <repo-path>` suggests whether the project is web, Linux desktop, or unsupported in the Linux runtime.
- `proof-artifacts launch --name <task-name> --window-match <title> -- <command...>` starts a Linux desktop app inside the session.
- `proof-artifacts exec --name <task-name> --stdin -- sh -s` pipes a multi-line script into the desktop container.
- `proof-artifacts doctor --url <url> --name <task-name>` checks whether a running session can reach a host app URL.
- `proof-artifacts list` shows known sessions from the local artifact directory and global index.
- `proof-artifacts status --name <task-name>` shows runtime state, noVNC reachability, active recording, and artifact counts.
- `proof-artifacts cleanup --max-age-hours 24` removes old managed containers while preserving artifacts.