@mondoohq/xgrep_linux_amd64 0.0.0 → 0.1.0

Files changed (3)
  1. package/README.md +30 -231
  2. package/package.json +1 -1
  3. package/xgrep +0 -0
package/README.md CHANGED
@@ -2,32 +2,11 @@
 
  A fast, Semgrep-compatible code scanner written in Go.
 
- xgrep scans codebases using Semgrep YAML rule syntax and tree-sitter for language-aware pattern matching.
-
- ## Design Goals
-
- xgrep optimizes for **accuracy**: when it reports a vulnerability, it should be real
- and exploitable. False positives are what kill SAST tools — once a scanner cries wolf,
- people stop reading its output and real bugs slip through. Every rule and engine change
- is judged against these goals:
-
- 1. **Report exploitable issues, not imperfect code.** The bar for a security finding is
-    "exploitable," not "technically imperfect." A technically-true-but-harmless match is
-    treated as noise.
- 2. **Earn precision through dataflow/reachability, not by weakening detection.** Prefer
-    firing when untrusted input actually reaches a dangerous sink over matching code shape
-    alone. Relaxing *what counts as a bug* to cut noise also loses real bugs — add context,
-    don't loosen the pattern.
- 3. **Separate correctness from security.** A code smell (e.g. an unescaped `.` in a
-    hostname regex) is a low-severity correctness note; an exploitable bug is a security
-    finding. Smells must never drown out confirmed vulnerabilities.
- 4. **Calibrate severity and confidence to exploitability.** HIGH/CRITICAL only when impact
-    is demonstrable; uncertain findings are low-confidence "review" items, clearly distinct
-    from confirmed ones.
- 5. **Prefer AST and semantic analysis over regex.** Tree-sitter ASTs and taint dataflow are
-    more precise than text patterns, and are the default.
- 6. **Never suppress a true positive to lower a count.** If xgrep finds a real bug — even a
-    minor one — the fix belongs in the code, not in muting the rule.
+ xgrep scans codebases using Semgrep YAML rule syntax and tree-sitter for language-aware,
+ AST-based pattern matching. It optimizes for **accuracy** — when it reports a
+ vulnerability, it should be real and exploitable — and adds code-intelligence and
+ AI-agent features on top of scanning. See the
+ [design goals](docs/01-getting-started/index.md#design-goals).
 
  ## Installation
 
@@ -43,222 +22,42 @@ cd xgrep
  go build -o xgrep ./cmd/xgrep
  ```
 
- ## Quick Start
+ ## Quick start
 
  ```bash
- # Scan a directory with a rule file
+ # Scan a directory with a rule file (or a directory of rules)
  xgrep -f rules.yaml src/
 
- # Scan with a directory of rules
- xgrep -f rules/ src/
-
- # Use --config/-c as an alias for -f/--rules
- xgrep --config rules.yaml src/
- ```
-
- ## Usage
-
- ```
- xgrep [flags] -f <rules> <targets...>
-
- Flags:
-   -f, --rules string   path to rule file or directory
-   -c, --config string  path to rule file or directory (alias for --rules)
-   --json               output results as JSON
-   --sarif              output results as SARIF
-   -j, --jobs int       number of parallel workers (default: NumCPU)
-   --severity string    minimum severity to report (INFO, WARNING, ERROR)
-   --include string     include only files matching glob pattern
-   --exclude string     exclude files matching glob pattern
-   --max-target-bytes   skip files larger than N bytes
-   -o, --output string  write output to file instead of stdout
-   --rule-id string     only run rules with matching IDs (comma-separated)
-   --skip-rule string   skip rules with matching IDs (comma-separated)
-   --autofix            apply fixes to source files in place
-   --dry-run            show fixes without applying (use with --autofix)
-   --verbose            enable debug output
-
- Subcommands:
-   scan             scan targets (default when -f is provided)
-   inspect          code intelligence: search symbols, navigate definitions, assess impact
-   graph            build and query the code graph
-   mcp              run as an MCP server over stdio (for AI agents)
-   test <path>      run tests on rule files in a directory
-   validate <path>  validate rule files without scanning
-   lsp              start an LSP server over stdio
-   version          print version and exit
- ```
-
- ## Code Intelligence
-
- `xgrep inspect` provides fast code navigation for both humans and AI agents:
-
- ```bash
- # Understand a codebase
- xgrep inspect overview .
-
- # Search for symbols by name
- xgrep inspect symbol "Handler" --kind function
-
- # Fast text search (trigram-indexed via Zoekt)
- xgrep inspect search "TODO|FIXME" --regex --lang go
-
- # Go to definition
- xgrep inspect definition --file src/server.go --line 42
-
- # Find all callers and callees
- xgrep inspect references "ProcessRequest"
-
- # File outline (all symbols)
- xgrep inspect outline src/server.go
-
- # Assess blast radius before changing a function
- xgrep inspect impact "ProcessRequest"
-
- # Show call dependencies (upstream + downstream)
- xgrep inspect deps "ProcessRequest"
- ```
-
- All commands support `--json` for structured output. The code graph and search index are
- cached in `.xgrep/` and rebuild incrementally.
-
- See [docs/CODE_INTELLIGENCE.md](docs/CODE_INTELLIGENCE.md) for full documentation.
-
- ## Code Graph
-
- ```bash
- # Build the code graph (auto-cached to .xgrep/graph.json)
- xgrep graph build .
-
- # Find callers / callees
- xgrep graph callers --json <function-name>
- xgrep graph callees --json <function-name>
-
- # Find call paths between two functions
- xgrep graph paths --json <source> <dest>
-
- # Show N-hop neighborhood with inlined source code
- xgrep graph context <function-name> --depth 2
- ```
-
- ## MCP Server
-
- Run xgrep as an [MCP](https://modelcontextprotocol.io) server for AI agent integration:
-
- ```bash
- xgrep mcp
- ```
-
- Exposes all scan, graph, and inspect capabilities as MCP tools over stdio.
-
- ## Supported Languages
-
- ### Tree-sitter languages (full AST matching)
-
- | Language | Extensions |
- |------------|-------------------------------------|
- | Python | `.py`, `.pyi` |
- | Go | `.go` |
- | Java | `.java` |
- | JavaScript | `.js`, `.jsx`, `.mjs`, `.cjs` |
- | TypeScript | `.ts` |
- | TSX | `.tsx` |
- | Ruby | `.rb` |
- | PHP | `.php` |
- | C | `.c`, `.h` |
- | C++ | `.cc`, `.cpp`, `.cxx`, `.hpp` |
- | C# | `.cs` |
- | Rust | `.rs` |
- | Kotlin | `.kt`, `.kts` |
- | Scala | `.scala`, `.sc` |
- | Bash | `.sh`, `.bash`, `.zsh` |
- | Lua | `.lua` |
- | Julia | `.jl` |
- | OCaml | `.ml`, `.mli` |
- | HTML | `.html`, `.htm`, `.vue` |
- | JSON | `.json` |
- | YAML | `.yaml`, `.yml` |
- | XML | `.xml` |
- | HCL | `.tf`, `.hcl` |
-
- ### Regex-only languages
-
- Dockerfile, Solidity, Swift, Dart, R, Clojure, Elixir, Erlang, Scheme, Lisp, and generic/text files are matched using regex patterns.
-
- ## Rule Format
-
- xgrep supports the Semgrep YAML rule format:
-
- ```yaml
- rules:
-   - id: my-rule
-     pattern: eval(...)
-     message: Avoid using eval()
-     severity: WARNING
-     languages: [python]
- ```
-
- Supported rule features include:
- - `pattern`, `patterns`, `pattern-either`, `pattern-not`, `pattern-inside`, `pattern-not-inside`
- - `pattern-regex`, `pattern-not-regex`
- - Metavariables (`$VAR`, `$...ARGS`)
- - `metavariable-pattern`, `metavariable-regex`, `metavariable-comparison`
- - `focus-metavariable`
- - `fix` (autofix support)
- - Taint analysis (`mode: taint` with `pattern-sources`, `pattern-sinks`, `pattern-sanitizers`, `pattern-propagators`)
- - Supply chain rules (`r2c-internal-project-depends-on`)
- - `options` including `interfile: true` for cross-file analysis
- - `min-version` / `max-version` for engine version constraints
-
- See the [Semgrep rule syntax documentation](https://semgrep.dev/docs/writing-rules/rule-syntax) for details.
-
- ## Testing Rules
-
- Use `xgrep test` to validate rules against annotated test files:
-
- ```bash
- xgrep test rules/
- ```
-
- Test files use comment annotations to mark expected matches:
-
- ```python
- # ruleid: my-rule
- eval(user_input)
-
- # ok: my-rule
- safe_function(data)
-
- # todoruleid: my-rule
- not_yet_supported()
- ```
-
- ## Output Formats
-
- ### Text (default)
-
- ```
- src/app.py:10:my-rule: Avoid using eval()
- ```
-
- ### JSON
-
- ```bash
+ # Machine-readable output
  xgrep -f rules.yaml --json src/
+ xgrep -f rules.yaml --sarif src/ # GitHub Code Scanning
+ xgrep -f rules.yaml --gitlab -o gl-sast-report.json src/ # GitLab SAST
  ```
 
- ### SARIF
+ A scan target can also be a **remote git repository** — xgrep clones it (shallow,
+ default branch) into a temp directory and scans it, no manual clone needed:
 
  ```bash
- xgrep -f rules.yaml --sarif src/
+ xgrep scan github.com/mondoohq/xgrep # host/owner/repo shorthand
+ xgrep scan https://github.com/mondoohq/xgrep # or a full HTTPS/SSH URL
+ xgrep scan github.com/mondoohq/xgrep --ref v1.2.0 # a branch, tag, or commit
  ```
 
- ## LSP Support
+ See the [remote-repository section](docs/02-scanning/cli-reference.md#scanning-a-remote-repository)
+ for `--ref`, `--depth`, and `--full-clone`.
 
- xgrep includes a Language Server Protocol server for editor integration:
+ ## Documentation
 
- ```bash
- xgrep -f rules.yaml lsp
- ```
+ Full documentation lives in [`docs/`](docs/README.md):
+
+ - **[Getting started](docs/01-getting-started/index.md)** — install and run your first scan.
+ - **[Scanning](docs/02-scanning/index.md)** — CLI reference, output formats, supported
+   languages, file filtering, and Semgrep compatibility.
+ - **[Rules](docs/03-rules/index.md)** — writing, syntax, taint analysis, and testing rules.
+ - **[Code intelligence](docs/04-code-intelligence/index.md)** — `xgrep inspect` and the code graph.
+ - **[Integrations](docs/05-integrations/index.md)** — MCP, LSP, and CI.
+ - **[AI agents](docs/06-ai-agents/index.md)** — using xgrep as an agent backend (see also
+   [`AGENTS.md`](AGENTS.md)).
 
- The LSP server communicates over stdio and provides real-time diagnostics as you edit code.
+ Contributors: see [`CLAUDE.md`](CLAUDE.md) and the
+ [architecture decision records](docs/adr).
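Reviewer note: the new README text says a remote target is cloned shallowly into a temp directory and then scanned. A rough manual equivalent can be sketched as below. This is only an illustration, not xgrep's implementation: the helper names, the shorthand-to-HTTPS mapping, and the destination path are assumptions. It also only covers branch and tag refs, since `git clone --branch` does not accept a raw commit hash.

```python
import os
import tempfile
from typing import List, Optional


def to_clone_url(target: str) -> str:
    """Map host/owner/repo shorthand to an HTTPS URL (assumed mapping)."""
    if "://" in target or target.startswith("git@"):
        return target  # already a full HTTPS or SSH URL
    return "https://" + target


def manual_scan_commands(target: str, rules: str, dest: str,
                         ref: Optional[str] = None) -> List[List[str]]:
    """Build (but do not run) the git and xgrep commands for a manual scan."""
    clone = ["git", "clone", "--depth", "1"]  # shallow clone, default branch
    if ref is not None:
        # --branch takes a branch or tag name, not a commit hash.
        clone += ["--branch", ref]
    clone += [to_clone_url(target), dest]
    scan = ["xgrep", "-f", rules, dest]  # local scan form shown in the README
    return [clone, scan]


if __name__ == "__main__":
    # Hypothetical destination inside the system temp directory.
    dest = os.path.join(tempfile.gettempdir(), "xgrep-scan")
    for cmd in manual_scan_commands("github.com/mondoohq/xgrep",
                                    "rules.yaml", dest, ref="v1.2.0"):
        print(" ".join(cmd))
```

The functions return argument lists instead of running anything, so the commands can be inspected before they are passed to `subprocess.run`.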
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "@mondoohq/xgrep_linux_amd64",
-   "version": "0.0.0",
+   "version": "0.1.0",
    "bin": {
      "xgrep_linux_amd64": "xgrep"
    },
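Reviewer note: the only manifest change is the version string. A minimal sketch for classifying such a bump follows, assuming plain `MAJOR.MINOR.PATCH` strings with no pre-release or build suffix. The function names are illustrative and do not come from the package.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Parse a plain MAJOR.MINOR.PATCH string into three integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def bump_kind(old: str, new: str) -> str:
    """Name the most significant component that differs between two versions."""
    old_v, new_v = parse_version(old), parse_version(new)
    for name, before, after in zip(("major", "minor", "patch"), old_v, new_v):
        if before != after:
            return name
    return "none"


if __name__ == "__main__":
    print(bump_kind("0.0.0", "0.1.0"))  # prints "minor"
```

Under Semantic Versioning, a `0.y.z` version is initial development, so a minor bump there does not by itself promise backward compatibility.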
package/xgrep CHANGED
Binary file
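Reviewer note: the diff view cannot show what changed inside the `xgrep` binary. One way to at least confirm that the two shipped binaries differ is to compare their SHA-256 digests after unpacking both package versions locally. The sketch below assumes that setup; the paths in the usage section are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def binaries_differ(old: Path, new: Path) -> bool:
    """True when the two files have different SHA-256 digests."""
    return sha256_of(old) != sha256_of(new)


if __name__ == "__main__":
    # Hypothetical locations of the two unpacked package versions.
    old_bin = Path("0.0.0/package/xgrep")
    new_bin = Path("0.1.0/package/xgrep")
    if old_bin.exists() and new_bin.exists():
        print("changed" if binaries_differ(old_bin, new_bin) else "identical")
```

Reading in chunks keeps memory use flat for large binaries. A digest comparison only shows that the files differ, not what changed in them.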