webctx 0.1.0
- package/AGENTS.md +84 -0
- package/CONTRIBUTORS.md +93 -0
- package/LICENSE +21 -0
- package/Makefile +69 -0
- package/README.md +95 -0
- package/bin/webctx.js +28 -0
- package/cmd/webctx/main.go +11 -0
- package/docs/porting-status.md +173 -0
- package/go.mod +3 -0
- package/internal/app/app.go +139 -0
- package/internal/app/app_test.go +77 -0
- package/internal/app/scrape.go +310 -0
- package/internal/app/tools.go +558 -0
- package/internal/buildinfo/buildinfo.go +16 -0
- package/package.json +55 -0
- package/scripts/postinstall.js +137 -0
package/AGENTS.md
ADDED
@@ -0,0 +1,84 @@
+# AGENTS.md
+
+Guidance for coding agents working in `webctx`.
+
+## Purpose
+
+This repo contains the pure Go port of the `webctx` CLI.
+
+It is no longer a generic starter template. Treat the current CLI behavior as the source of truth unless the user explicitly asks to change it.
+
+## Architecture
+
+- `cmd/webctx/main.go`: process entrypoint, exit-code based dispatch.
+- `internal/app/app.go`: CLI parsing and top-level command routing.
+- `internal/app/tools.go`: search provider clients, ranking, formatting, and HTTP helpers.
+- `internal/app/scrape.go`: GitHub raw-content optimization, `.md` fetch path, Firecrawl queue, and env loading.
+- `internal/app/app_test.go`: unit tests for CLI behavior and core helpers.
+- `bin/webctx.js`: npm shim that invokes the packaged native binary.
+- `scripts/postinstall.js`: downloads release binary on install, falls back to `go build`.
+- `.github/workflows/release.yml`: tag-driven release pipeline.
+- `docs/porting-status.md`: progress log and remaining work for future agents.
+
+## Local commands
+
+Use `make` targets:
+
+- `make fmt`
+- `make test`
+- `make vet`
+- `make lint`
+- `make check`
+- `make build`
+- `make build-all`
+- `make install-local`
+
+Direct commands:
+
+- `go test ./...`
+- `go vet ./...`
+- `npm run lint`
+
+## Current CLI contract
+
+Preserve these commands unless the user explicitly asks to change them:
+
+- `webctx search <query> [--exclude domain1,domain2] [--keyword phrase]`
+- `webctx read-link <url>`
+- `webctx map-site <url>`
+- `webctx --version`
+
+Behavioral expectations:
+
+- `search` combines Brave, Tavily, and Exa results, then re-ranks them with duplicate-aware scoring.
+- `read-link` keeps the current GitHub raw-content fast path, `.md` fast path, and Firecrawl fallback settings.
+- `map-site` keeps the current Firecrawl map request settings.
+- The CLI should remain agent-friendly and emit plain markdown/text output.
+
+## How to change things safely
+
+1. Keep binary naming convention unchanged unless you also update postinstall/workflow:
+   - release assets: `<cli>_<goos>_<goarch>[.exe]`
+   - npm-installed binary path: `bin/<cli>-bin` (or `.exe` on Windows)
+
+2. If changing search behavior, compare against the TypeScript porting notes in `docs/porting-status.md` first.
+
+3. If adding dependencies, commit `go.sum` and make sure the workflow still passes on a clean checkout.
+
+4. If you change release artifacts or version plumbing, update `Makefile`, `.github/workflows/release.yml`, and `scripts/postinstall.js` together.
+
+## Release contract
+
+Release pipeline triggers on `v*` tags and expects:
+
+- `NPM_TOKEN` GitHub secret present.
+- npm package name in `package.json` is publishable under your account/org.
+- repository URL matches the release origin used by `scripts/postinstall.js`.
+
+Release binaries should embed the tagged version into `internal/buildinfo.Version` so `webctx --version` matches the release tag.
+
+## Guardrails
+
+- Prefer additive changes and keep the CLI output stable.
+- Do not silently change Firecrawl request settings unless the user explicitly wants behavioral changes.
+- Do not reintroduce MCP/server code unless requested; this repo is intentionally CLI-only.
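Reviewer note: the binary naming convention in `AGENTS.md` (`<cli>_<goos>_<goarch>[.exe]` for release assets, `bin/<cli>-bin` or `.exe` for the npm-installed binary) can be illustrated with a small Go sketch. The function names `assetName` and `installedBinaryName` are hypothetical and do not come from the package; only the naming patterns are taken from the diff.

```go
package main

import "fmt"

// assetName builds a release asset file name following the
// <cli>_<goos>_<goarch>[.exe] pattern. Hypothetical helper for illustration.
func assetName(cli, goos, goarch string) string {
	name := fmt.Sprintf("%s_%s_%s", cli, goos, goarch)
	if goos == "windows" {
		name += ".exe"
	}
	return name
}

// installedBinaryName mirrors the npm-installed binary rule:
// <cli>-bin everywhere except Windows, where it is <cli>.exe.
// Hypothetical helper for illustration.
func installedBinaryName(cli, goos string) string {
	if goos == "windows" {
		return cli + ".exe"
	}
	return cli + "-bin"
}

func main() {
	fmt.Println(assetName("webctx", "linux", "arm64"))
	fmt.Println(assetName("webctx", "windows", "amd64"))
	fmt.Println(installedBinaryName("webctx", "darwin"))
}
```

Note that Go reports Windows as `windows` while Node's `process.platform` reports `win32`, so a postinstall script and a Go build step would need to map between the two spellings.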
package/CONTRIBUTORS.md
ADDED
@@ -0,0 +1,93 @@
+# CONTRIBUTORS.md
+
+Maintainer notes for `webctx`.
+
+## Prerequisites
+
+- Go `1.26+`
+- Node `18+`
+- npm account with publish rights for the package name in `package.json`
+- GitHub repo admin access
+
+## Local development
+
+```bash
+make check
+make build
+./dist/webctx --help
+```
+
+Example command checks:
+
+```bash
+./dist/webctx --version
+./dist/webctx search "golang http client"
+./dist/webctx read-link https://github.com/amxv/webctx-ts/blob/main/cli.ts
+./dist/webctx map-site https://example.com
+```
+
+Install command locally:
+
+```bash
+make install-local
+webctx --help
+```
+
+## Release process
+
+1. Ensure `main` is green:
+
+```bash
+make check
+```
+
+2. Confirm the release workflow is targeting `webctx` and that `package.json` still points to the correct GitHub repository.
+
+3. Prepare release tag:
+
+```bash
+make release-tag VERSION=0.1.0
+```
+
+4. GitHub Actions `release` workflow runs automatically:
+   - quality checks
+   - cross-platform binary build
+   - GitHub release publish
+   - npm publish
+
+## Required GitHub secret
+
+- `NPM_TOKEN`: npm automation token with publish rights for your package.
+
+Set via GitHub CLI:
+
+```bash
+gh secret set NPM_TOKEN --repo amxv/webctx
+```
+
+## npm token setup
+
+Create token at npm:
+
+- Profile -> Access Tokens -> Create New Token
+- Use an automation/granular token scoped to required package/org
+
+Validate auth locally:
+
+```bash
+npm whoami
+```
+
+## Notes on package naming
+
+`webctx` is already configured. If you ever rename or move the package, update all of the following together:
+
+- `package.json`
+- `bin/webctx.js`
+- `scripts/postinstall.js`
+- `.github/workflows/release.yml`
+- `Makefile`
+
+## Porting reference
+
+The repo includes `docs/porting-status.md` as the running reference for what was ported from `webctx-ts`, what was intentionally excluded, and what future agents should verify before making behavior changes.
package/LICENSE
ADDED
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2026 amxv
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
package/Makefile
ADDED
@@ -0,0 +1,69 @@
+SHELL := /bin/bash
+
+GO ?= go
+GOFMT ?= gofmt
+BIN_NAME ?= webctx
+CMD_PATH ?= ./cmd/$(BIN_NAME)
+DIST_DIR ?= dist
+BIN_PATH ?= $(DIST_DIR)/$(BIN_NAME)
+VERSION ?= $(shell node -p "require('./package.json').version" 2>/dev/null)
+LDFLAGS ?= -s -w -X github.com/amxv/webctx/internal/buildinfo.Version=$(if $(VERSION),$(VERSION),dev)
+
+.PHONY: help fmt test vet lint check build build-all install-local clean release-tag
+
+help:
+	@echo "webctx command runner"
+	@echo ""
+	@echo "Targets:"
+	@echo " make fmt - format Go files"
+	@echo " make test - run go test ./..."
+	@echo " make vet - run go vet ./..."
+	@echo " make lint - run Node script checks"
+	@echo " make check - fmt + test + vet + lint"
+	@echo " make build - build local binary to dist/webctx"
+	@echo " make build-all - build release binaries for 5 target platforms"
+	@echo " make install-local - install CLI to ~/.local/bin/webctx"
+	@echo " make clean - remove dist artifacts"
+	@echo " make release-tag - create and push git tag (requires VERSION=x.y.z)"
+
+fmt:
+	@$(GOFMT) -w $$(find . -type f -name '*.go' -not -path './dist/*')
+
+test:
+	@$(GO) test ./...
+
+vet:
+	@$(GO) vet ./...
+
+lint:
+	@npm run lint
+
+check: fmt test vet lint
+
+build:
+	@mkdir -p $(DIST_DIR)
+	@$(GO) build -trimpath -ldflags="$(LDFLAGS)" -o $(BIN_PATH) $(CMD_PATH)
+
+build-all:
+	@mkdir -p $(DIST_DIR)
+	@for target in "darwin amd64" "darwin arm64" "linux amd64" "linux arm64" "windows amd64"; do \
+		set -- $$target; \
+		GOOS=$$1; GOARCH=$$2; \
+		EXT=""; \
+		if [ "$$GOOS" = "windows" ]; then EXT=".exe"; fi; \
+		echo "Building $(BIN_NAME) for $$GOOS/$$GOARCH"; \
+		CGO_ENABLED=0 GOOS=$$GOOS GOARCH=$$GOARCH $(GO) build -trimpath -ldflags="$(LDFLAGS)" -o "$(DIST_DIR)/$(BIN_NAME)_$${GOOS}_$${GOARCH}$${EXT}" $(CMD_PATH); \
+	done
+
+install-local: build
+	@mkdir -p $$HOME/.local/bin
+	@install -m 755 $(BIN_PATH) $$HOME/.local/bin/$(BIN_NAME)
+	@echo "Installed $(BIN_NAME) to $$HOME/.local/bin/$(BIN_NAME)"
+
+clean:
+	@rm -rf $(DIST_DIR)
+
+release-tag:
+	@test -n "$(VERSION)" || (echo "Usage: make release-tag VERSION=x.y.z" && exit 1)
+	@git tag "v$(VERSION)"
+	@git push origin "v$(VERSION)"
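Reviewer note: the `LDFLAGS` line in the `Makefile` relies on the Go linker's `-X importpath.name=value` flag to overwrite a package-level string variable at link time, falling back to `dev` when `VERSION` is empty. The contents of `internal/buildinfo/buildinfo.go` are not shown in this part of the diff, so the following is only a minimal sketch of the pattern. It assumes a plain string variable and uses `package main` so that it runs on its own; `versionString` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"strings"
)

// Version is intended to be overridden at link time, for example:
//
//	go build -ldflags "-X main.Version=0.1.0" .
//
// In the package under review the Makefile targets
// github.com/amxv/webctx/internal/buildinfo.Version instead of main.Version.
var Version = "dev"

// versionString returns the bare version string and falls back to "dev"
// when the variable was set to an empty or whitespace-only value.
// Hypothetical helper for illustration.
func versionString() string {
	v := strings.TrimSpace(Version)
	if v == "" {
		return "dev"
	}
	return v
}

func main() {
	fmt.Println(versionString())
}
```

The `-X` flag only works on string variables (not constants), which is one reason version values like this are usually declared with `var`.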
package/README.md
ADDED
@@ -0,0 +1,95 @@
+# webctx
+
+`webctx` is a pure Go CLI for agent-friendly web search and page extraction.
+
+## What it does
+
+- `search`: combines Brave, Tavily, and Exa search results, deduplicates them, and re-ranks them
+- `read-link`: returns clean markdown for a single URL using a GitHub raw-content path, a `.md` fast path, and Firecrawl scraping fallback
+- `map-site`: returns a sitemap-style list of URLs and metadata from Firecrawl
+
+## Install
+
+Global npm install:
+
+```bash
+npm i -g webctx
+webctx --help
+```
+
+Build from source:
+
+```bash
+git clone https://github.com/amxv/webctx.git
+cd webctx
+make build
+./dist/webctx --help
+```
+
+## Commands
+
+```bash
+webctx --help
+webctx --version
+webctx search <query> [--exclude domain1,domain2] [--keyword phrase]
+webctx read-link <url>
+webctx map-site <url>
+```
+
+Examples:
+
+```bash
+webctx search "next.js server components"
+webctx search "react hooks" --exclude youtube.com,vimeo.com
+webctx search "drizzle orm" --keyword "migration guide"
+webctx read-link https://docs.example.com/guide
+webctx map-site https://example.com
+```
+
+## Environment variables
+
+The CLI loads `.env.local` when present and reads provider credentials from the environment.
+
+Quick start:
+
+```bash
+cp .env.local.example .env.local
+```
+
+Required by command:
+
+- `search`
+  - `BRAVE_API_KEY`
+  - `TAVILY_API_KEY`
+  - `EXA_API_KEY`
+- `read-link`
+  - `FIRECRAWL_API_KEY` for non-GitHub / non-`.md` URLs
+- `map-site`
+  - `FIRECRAWL_API_KEY`
+
+## Release and distribution
+
+This repo publishes in two ways:
+
+- GitHub Releases for native binaries
+- npm for `npm i -g webctx`
+
+The release workflow triggers on `v*` tags and does the following:
+
+1. runs Go and Node quality checks
+2. builds cross-platform binaries
+3. creates a GitHub Release with those assets
+4. publishes the npm package using the tag version
+
+## Project layout
+
+- `cmd/webctx/main.go`: CLI entrypoint
+- `internal/app/`: CLI parsing, search, ranking, scrape, and Firecrawl queue logic
+- `internal/buildinfo/`: build-time version plumbing for `--version`
+- `bin/webctx.js`: npm shim that invokes the packaged native binary
+- `scripts/postinstall.js`: downloads the release binary on install and falls back to local `go build`
+- `.github/workflows/release.yml`: tag-driven release pipeline
+- `AGENTS.md`: guidance for coding agents
+- `CONTRIBUTORS.md`: maintainer/release notes
+
+See `AGENTS.md` and `CONTRIBUTORS.md` for repo-specific implementation and maintenance details.
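Reviewer note: the "Required by command" list in the README maps each command to the provider keys it needs. A pre-flight check along those lines might look like the sketch below. The `requiredEnv` table and the `missingEnv` function are hypothetical names, not taken from the package, and the table simplifies `read-link`, which per the README only needs `FIRECRAWL_API_KEY` for non-GitHub / non-`.md` URLs.

```go
package main

import "fmt"

// requiredEnv mirrors the README's "Required by command" list.
// Hypothetical table for illustration only.
var requiredEnv = map[string][]string{
	"search":    {"BRAVE_API_KEY", "TAVILY_API_KEY", "EXA_API_KEY"},
	"read-link": {"FIRECRAWL_API_KEY"},
	"map-site":  {"FIRECRAWL_API_KEY"},
}

// missingEnv returns the keys for command that resolve to an empty value.
// The lookup function is injected so the check can be exercised without
// touching the real process environment; pass os.Getenv in real use.
func missingEnv(command string, lookup func(string) string) []string {
	var missing []string
	for _, key := range requiredEnv[command] {
		if lookup(key) == "" {
			missing = append(missing, key)
		}
	}
	return missing
}

func main() {
	env := map[string]string{"BRAVE_API_KEY": "example-value"}
	lookup := func(key string) string { return env[key] }
	fmt.Println(missingEnv("search", lookup))
	fmt.Println(missingEnv("map-site", lookup))
}
```

Injecting the lookup keeps the check independent of whether values came from `.env.local` or from the shell environment.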
package/bin/webctx.js
ADDED
@@ -0,0 +1,28 @@
+#!/usr/bin/env node
+
+const fs = require("node:fs");
+const path = require("node:path");
+const { spawnSync } = require("node:child_process");
+
+const pkg = require("../package.json");
+const cliName = pkg.config?.cliBinaryName || "webctx";
+const executableName = process.platform === "win32" ? `${cliName}.exe` : `${cliName}-bin`;
+const executablePath = path.join(__dirname, executableName);
+
+if (!fs.existsSync(executablePath)) {
+  console.error(`${cliName} binary is not installed. Re-run: npm rebuild -g ${pkg.name}`);
+  process.exit(1);
+}
+
+const child = spawnSync(executablePath, process.argv.slice(2), { stdio: "inherit" });
+
+if (child.error) {
+  console.error(child.error.message);
+  process.exit(1);
+}
+
+if (child.signal) {
+  process.kill(process.pid, child.signal);
+}
+
+process.exit(child.status ?? 1);
package/docs/porting-status.md
ADDED
@@ -0,0 +1,173 @@
+# webctx TypeScript -> Go porting status
+
+This document is the handoff reference for future agents working on `amxv/webctx`.
+
+## Goal
+
+Port the CLI behavior from `amxv/webctx-ts` into pure Go while keeping the command-line interface and provider behavior effectively one-to-one for the CLI use case.
+
+Scope intentionally excludes the MCP/server/dashboard pieces from the TypeScript repo.
+
+## Source areas reviewed in `webctx-ts`
+
+- `cli.ts`
+- `tools/search.ts`
+- `tools/read-link.ts`
+- `tools/map-site.ts`
+- `lib/search/brave.ts`
+- `lib/search/tavily.ts`
+- `lib/search/exa.ts`
+- `lib/ranking.ts`
+- `lib/utils.ts`
+- `lib/scraping.ts`
+- `lib/firecrawl-queue.ts`
+- `lib/rate-limiter.ts`
+
+## Completed
+
+### CLI surface
+
+Implemented in Go:
+
+- `webctx search <query> [--exclude domain1,domain2] [--keyword phrase]`
+- `webctx read-link <url>`
+- `webctx map-site <url>`
+- `webctx --help`
+- `webctx --version`
+
+Notes:
+
+- `--version` now prints the bare version string, matching the TypeScript CLI.
+- Error handling is exit-code based in Go rather than promise rejection based.
+
+### Search port
+
+Implemented:
+
+- Brave HTTP client
+- Tavily HTTP client using direct HTTP API instead of the TypeScript SDK
+- Exa HTTP client
+- provider fan-out with per-provider timeout
+- duplicate-aware reranking
+- excluded-domain filtering
+- HTML entity decoding
+- keyword truncation to 5 words for Exa include-text mode
+- top 35 result output limit
+
+Current behavior matches the TypeScript CLI design:
+
+- normal search mode queries Brave + Tavily + Exa
+- keyword mode queries Exa only
+- user/domain exclusions are applied after provider collection, matching the TypeScript tool flow
+
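Reviewer note: three of the search behaviors listed above (keyword truncation to 5 words, excluded-domain filtering after provider collection, and the top 35 output limit) are simple enough to sketch. The code below is not the implementation in `internal/app/tools.go`, which is not shown here. All names (`result`, `truncateKeyword`, `isExcluded`, `filterAndLimit`) are hypothetical, and the rule that an excluded domain also matches its subdomains is an assumption.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// result is a hypothetical, minimal search result shape.
type result struct {
	Title string
	URL   string
}

// truncateKeyword keeps at most the first five whitespace-separated words.
func truncateKeyword(phrase string) string {
	words := strings.Fields(phrase)
	if len(words) > 5 {
		words = words[:5]
	}
	return strings.Join(words, " ")
}

// isExcluded reports whether rawURL's host equals an excluded domain or is
// one of its subdomains (the subdomain rule is an assumption of this sketch).
func isExcluded(rawURL string, excluded []string) bool {
	u, err := url.Parse(rawURL)
	if err != nil {
		return false
	}
	host := strings.ToLower(u.Hostname())
	for _, d := range excluded {
		d = strings.ToLower(strings.TrimSpace(d))
		if d == "" {
			continue
		}
		if host == d || strings.HasSuffix(host, "."+d) {
			return true
		}
	}
	return false
}

// filterAndLimit drops excluded results and keeps at most limit entries,
// preserving the incoming (already ranked) order.
func filterAndLimit(results []result, excluded []string, limit int) []result {
	kept := make([]result, 0, len(results))
	for _, r := range results {
		if len(kept) >= limit {
			break
		}
		if !isExcluded(r.URL, excluded) {
			kept = append(kept, r)
		}
	}
	return kept
}

func main() {
	results := []result{
		{"Hooks intro", "https://react.dev/reference/react/hooks"},
		{"Video", "https://www.youtube.com/watch?v=1"},
	}
	fmt.Println(truncateKeyword("a very long migration guide phrase here"))
	fmt.Println(filterAndLimit(results, []string{"youtube.com", "vimeo.com"}, 35))
}
```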
+### Read-link port
+
+Implemented:
+
+- GitHub raw-content fast path
+- `.md` fast path with HEAD probe
+- Firecrawl scrape fallback with the same agent-oriented request settings
+- PDF parser enablement for `.pdf` URLs
+
+Kept settings aligned with the TypeScript CLI:
+
+- `formats: ["markdown"]`
+- `onlyMainContent: true`
+- `skipTlsVerification: true`
+- `blockAds: true`
+- `removeBase64Images: true`
+- `maxAge: 600000`
+- same excluded tags list
+
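Reviewer note: the "GitHub raw-content fast path" presumably rewrites a public blob URL into its `raw.githubusercontent.com` equivalent before fetching. The actual logic in `internal/app/scrape.go` is not shown in this part of the diff, so the following is only a sketch of the well-known URL mapping; `githubRawURL` is a hypothetical name.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// githubRawURL converts
//
//	https://github.com/<owner>/<repo>/blob/<ref>/<path>
//
// into
//
//	https://raw.githubusercontent.com/<owner>/<repo>/<ref>/<path>
//
// It returns false for anything that is not a github.com blob URL.
// Hypothetical helper for illustration.
func githubRawURL(raw string) (string, bool) {
	u, err := url.Parse(raw)
	if err != nil || !strings.EqualFold(u.Hostname(), "github.com") {
		return "", false
	}
	parts := strings.Split(strings.Trim(u.EscapedPath(), "/"), "/")
	// Expect: owner, repo, "blob", ref, and at least one path segment.
	if len(parts) < 5 || parts[2] != "blob" {
		return "", false
	}
	rest := append([]string{parts[0], parts[1]}, parts[3:]...)
	return "https://raw.githubusercontent.com/" + strings.Join(rest, "/"), true
}

func main() {
	rawURL, ok := githubRawURL("https://github.com/amxv/webctx-ts/blob/main/cli.ts")
	fmt.Println(rawURL, ok)
}
```

As the live validation notes in this file point out, an unauthenticated raw fetch does not work for private repositories, so a caller would still need a fallback when the raw request fails.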
+### Map-site port
+
+Implemented with the same Firecrawl map settings:
+
+- `sitemap: "include"`
+- `includeSubdomains: true`
+- `ignoreQueryParameters: true`
+- `limit: 5000`
+
+### Firecrawl queue / rate limiting
+
+Implemented in Go:
+
+- singleton-style queue wrapper
+- token bucket limiter at 10 requests/minute
+- serialized queue processing for Firecrawl operations
+
+This is not a literal line-by-line port, but preserves the same operational intent.
+
+### Release/publish setup
+
+Updated to be release-ready for the real `webctx` CLI:
+
+- GitHub Actions workflow now builds `webctx` instead of the old template placeholder
+- release binaries embed the tagged version into `internal/buildinfo.Version`
+- npm metadata now describes the actual CLI instead of the template
+- README / agent / maintainer docs updated for the real repo
+
+## Intentionally not ported
+
+- MCP/server behavior
+- Next.js app/dashboard code
+- database/logging layers unrelated to the CLI
+
+These can be added later only if explicitly requested.
+
+## Current repo files of interest
+
+- `cmd/webctx/main.go`
+- `internal/app/app.go`
+- `internal/app/tools.go`
+- `internal/app/scrape.go`
+- `internal/app/app_test.go`
+- `.github/workflows/release.yml`
+- `scripts/postinstall.js`
+- `README.md`
+
+## Verification already completed
+
+- `go test ./...`
+- `go build ./cmd/webctx`
+
+## Live validation notes
+
+Live CLI validation was run against a real `.env.local` on the Sprite machine.
+
+Confirmed working live:
+
+- combined `search` path returns real web results
+- public GitHub blob `read-link` fast path works
+- Firecrawl-backed `read-link` works
+- Firecrawl-backed `map-site` works
+
+Observed external/provider constraints during live validation:
+
+- `search --keyword` currently depends on Exa-only results and could not be fully validated because the live Exa account returned `NO_MORE_CREDITS`
+- private GitHub blob URLs are not readable via unauthenticated raw-content fetch, so they fall through to the general scrape path
+
+These findings were from live provider behavior, not from compile/test failures in the Go port.
+
+## Good next checks for future agents
+
+1. Run live end-to-end checks against real provider keys for:
+   - normal multi-provider search
+   - Exa keyword-only search mode
+   - GitHub raw-content read-link
+   - `.md` fast path read-link
+   - Firecrawl scrape fallback
+   - Firecrawl map-site
+
+2. Compare a handful of live outputs from `webctx-ts` and Go `webctx` for formatting parity.
+
+3. If performance tuning is needed, focus on:
+   - HTTP client reuse
+   - provider timeout tuning
+   - Firecrawl queue behavior under concurrent use
+
+## Constraints to preserve
+
+- Keep the CLI output simple and agent-friendly.
+- Keep the Firecrawl request settings stable unless explicitly asked to change them.
+- Keep the release asset naming contract stable unless postinstall/workflow are updated together.
package/go.mod
ADDED