PyPI - vulnhawk - Versions diffs - 0.1.0__tar.gz - Mend

vulnhawk 0.1.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (35)

vulnhawk-0.1.0/.github/workflows/vulnhawk.yml +26 -0
vulnhawk-0.1.0/.gitignore +13 -0
vulnhawk-0.1.0/CONTRIBUTING.md +53 -0
vulnhawk-0.1.0/LICENSE +21 -0
vulnhawk-0.1.0/PKG-INFO +259 -0
vulnhawk-0.1.0/README.md +221 -0
vulnhawk-0.1.0/action/action.yml +114 -0
vulnhawk-0.1.0/pyproject.toml +76 -0
vulnhawk-0.1.0/tests/__init__.py +0 -0
vulnhawk-0.1.0/tests/fixtures/vulnerable_go.go +87 -0
vulnhawk-0.1.0/tests/fixtures/vulnerable_js.js +120 -0
vulnhawk-0.1.0/tests/fixtures/vulnerable_python.py +130 -0
vulnhawk-0.1.0/tests/test_chunker.py +94 -0
vulnhawk-0.1.0/tests/test_models.py +82 -0
vulnhawk-0.1.0/tests/test_reporters.py +96 -0
vulnhawk-0.1.0/vulnhawk/__init__.py +3 -0
vulnhawk-0.1.0/vulnhawk/cli.py +203 -0
vulnhawk-0.1.0/vulnhawk/llm/__init__.py +0 -0
vulnhawk-0.1.0/vulnhawk/llm/base.py +32 -0
vulnhawk-0.1.0/vulnhawk/llm/claude.py +36 -0
vulnhawk-0.1.0/vulnhawk/llm/ollama.py +45 -0
vulnhawk-0.1.0/vulnhawk/llm/openai_backend.py +39 -0
vulnhawk-0.1.0/vulnhawk/models.py +151 -0
vulnhawk-0.1.0/vulnhawk/reporters/__init__.py +0 -0
vulnhawk-0.1.0/vulnhawk/reporters/json_reporter.py +42 -0
vulnhawk-0.1.0/vulnhawk/reporters/markdown.py +61 -0
vulnhawk-0.1.0/vulnhawk/reporters/sarif.py +104 -0
vulnhawk-0.1.0/vulnhawk/reporters/terminal.py +120 -0
vulnhawk-0.1.0/vulnhawk/rules/__init__.py +0 -0
vulnhawk-0.1.0/vulnhawk/rules/prompts.py +125 -0
vulnhawk-0.1.0/vulnhawk/scanner/__init__.py +0 -0
vulnhawk-0.1.0/vulnhawk/scanner/chunker.py +378 -0
vulnhawk-0.1.0/vulnhawk/scanner/engine.py +236 -0
vulnhawk-0.1.0/vulnhawk/scanner/languages/__init__.py +0 -0
vulnhawk-0.1.0/vulnhawk/utils/__init__.py +0 -0

vulnhawk-0.1.0/.github/workflows/vulnhawk.yml ADDED

@@ -0,0 +1,26 @@
+# Example workflow: Run VulnHawk on every pull request
+name: VulnHawk Security Scan
+on:
+  pull_request:
+  push:
+    branches: [main]
+permissions:
+  security-events: write  # Required for SARIF upload
+jobs:
+  security-scan:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - name: Run VulnHawk
+        uses: momenbasel/vulnhawk@main
+        with:
+          target: '.'
+          backend: 'claude'
+          mode: 'full'
+          severity: 'medium'
+          api-key: ${{ secrets.ANTHROPIC_API_KEY }}
+          fail-on-findings: 'true'

vulnhawk-0.1.0/.gitignore ADDED

@@ -0,0 +1,13 @@
+__pycache__/
+*.pyc
+*.pyo
+*.egg-info/
+dist/
+build/
+.venv/
+venv/
+.pytest_cache/
+.ruff_cache/
+*.sarif
+*.json.bak
+.DS_Store

vulnhawk-0.1.0/CONTRIBUTING.md ADDED

@@ -0,0 +1,53 @@
+# Contributing to VulnHawk
+Thanks for your interest in contributing.
+## Development Setup
+```bash
+git clone https://github.com/momenbasel/vulnhawk.git
+cd vulnhawk
+uv venv .venv && source .venv/bin/activate
+uv pip install -e ".[dev]"
+```
+## Running Tests
+```bash
+pytest
+```
+## Code Style
+We use `ruff` for linting:
+```bash
+ruff check .
+ruff format .
+```
+## Adding Language Support
+1. Create a new file in `vulnhawk/scanner/languages/`
+2. Add the extension mapping in `vulnhawk/models.py` (`Language.from_extension`)
+3. Add the splitter function in `vulnhawk/scanner/chunker.py`
+4. Add test fixtures in `tests/fixtures/`
+5. Add tests in `tests/test_chunker.py`
+## Adding a New LLM Backend
+1. Create a new file in `vulnhawk/llm/`
+2. Implement the `BaseLLM` interface
+3. Add the backend choice in `vulnhawk/cli.py`
+4. Document any required environment variables
+## Reporting Security Issues
+If you find a security vulnerability in VulnHawk itself, please report it privately via GitHub Security Advisories rather than opening a public issue.
+## Pull Requests
+- Keep PRs focused on a single change
+- Add tests for new functionality
+- Update documentation if needed
+- Run `ruff check .` before submitting
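
As a rough illustration of the "Adding a New LLM Backend" steps in CONTRIBUTING.md above, a backend might look like the sketch below. The contents of `vulnhawk/llm/base.py` are not shown in this part of the diff, so the `analyze` method, its signature, the `EchoBackend` class, and the `BACKENDS` registry are all assumptions made for illustration, not the package's actual interface.

```python
from abc import ABC, abstractmethod


class BaseLLM(ABC):
    """Hypothetical stand-in for the interface in vulnhawk/llm/base.py."""

    def __init__(self, model: str) -> None:
        self.model = model

    @abstractmethod
    def analyze(self, system_prompt: str, user_prompt: str) -> str:
        """Return the model's raw text response for one code chunk."""


class EchoBackend(BaseLLM):
    """Toy backend that calls no API; it only reports what it received."""

    def analyze(self, system_prompt: str, user_prompt: str) -> str:
        return f"[{self.model}] received {len(user_prompt)} characters"


# Assumed registry that a CLI option such as `-b` could look names up in.
BACKENDS: dict[str, type[BaseLLM]] = {"echo": EchoBackend}

backend = BACKENDS["echo"](model="demo")
print(backend.analyze("You are a security reviewer.", "def f(): pass"))
```

Keeping the abstract method small means a new backend only has to translate one prompt pair into one response string; retries, parsing, and reporting can then live outside the backend.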

vulnhawk-0.1.0/LICENSE ADDED

@@ -0,0 +1,21 @@
+MIT License
+Copyright (c) 2026 Moamen Basel
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

vulnhawk-0.1.0/PKG-INFO ADDED

@@ -0,0 +1,259 @@
+Metadata-Version: 2.4
+Name: vulnhawk
+Version: 0.1.0
+Summary: AI-powered code security scanner that finds vulnerabilities Semgrep and CodeQL miss
+Project-URL: Homepage, https://github.com/momenbasel/vulnhawk
+Project-URL: Repository, https://github.com/momenbasel/vulnhawk
+Project-URL: Issues, https://github.com/momenbasel/vulnhawk/issues
+Author: Moamen Basel
+License-Expression: MIT
+License-File: LICENSE
+Keywords: ai,appsec,code-analysis,llm,sast,scanner,security,vulnerability
+Classifier: Development Status :: 4 - Beta
+Classifier: Environment :: Console
+Classifier: Intended Audience :: Developers
+Classifier: License :: OSI Approved :: MIT License
+Classifier: Programming Language :: Python :: 3
+Classifier: Programming Language :: Python :: 3.10
+Classifier: Programming Language :: Python :: 3.11
+Classifier: Programming Language :: Python :: 3.12
+Classifier: Programming Language :: Python :: 3.13
+Classifier: Topic :: Security
+Classifier: Topic :: Software Development :: Quality Assurance
+Classifier: Topic :: Software Development :: Testing
+Requires-Python: >=3.10
+Requires-Dist: anthropic>=0.40.0
+Requires-Dist: click>=8.0
+Requires-Dist: httpx>=0.27.0
+Requires-Dist: openai>=1.0
+Requires-Dist: pathspec>=0.12.0
+Requires-Dist: pyyaml>=6.0
+Requires-Dist: rich>=13.0
+Requires-Dist: tiktoken>=0.7.0
+Provides-Extra: dev
+Requires-Dist: pytest-asyncio>=0.23.0; extra == 'dev'
+Requires-Dist: pytest>=8.0; extra == 'dev'
+Requires-Dist: ruff>=0.5.0; extra == 'dev'
+Description-Content-Type: text/markdown
+<p align="center">
+  <img src="https://img.shields.io/pypi/v/vulnhawk?color=blue&label=PyPI" alt="PyPI">
+  <img src="https://img.shields.io/pypi/pyversions/vulnhawk" alt="Python">
+  <img src="https://img.shields.io/github/license/momenbasel/vulnhawk" alt="License">
+  <img src="https://img.shields.io/github/stars/momenbasel/vulnhawk?style=social" alt="Stars">
+</p>
+<h1 align="center">VulnHawk</h1>
+<p align="center">
+  <strong>AI-powered code security scanner that finds vulnerabilities<br>Semgrep and CodeQL miss.</strong>
+</p>
+<p align="center">
+  VulnHawk uses AI to understand your code's <em>business logic</em> - not just pattern matching.<br>
+  It spots missing auth checks, IDOR flaws, and logic bugs that rule-based tools can't detect.
+</p>
+---
+## What Makes VulnHawk Different
+Traditional scanners (Semgrep, CodeQL, Bandit) use pattern matching and AST rules. They're great at finding known patterns, but they **can't understand intent**.
+VulnHawk analyzes your code with AI and cross-references how different parts of your codebase handle security. If 12 endpoints check authorization but one doesn't, VulnHawk catches it.
+| Feature | Semgrep / CodeQL | VulnHawk |
+|---------|-----------------|----------|
+| Detection method | AST pattern matching | AI code understanding |
+| Business logic bugs | Cannot detect | Detects missing auth, IDOR, logic flaws |
+| Cross-file analysis | Requires custom rules | Automatic - compares similar code patterns |
+| Setup | Write rules, configure | Zero config - works immediately |
+| Finding descriptions | Rule IDs and templates | Natural language with attack scenarios |
+| Fix suggestions | Generic recommendations | Context-specific code fixes |
+## Quick Start
+```bash
+pip install vulnhawk
+```
+Set your LLM API key:
+```bash
+export ANTHROPIC_API_KEY=sk-ant-...    # Claude (default)
+# or
+export OPENAI_API_KEY=sk-...           # OpenAI
+# or just run Ollama locally           # Free, no API key needed
+```
+Scan your code:
+```bash
+vulnhawk scan ./src
+```
+That's it. No config files, no rule writing, no setup.
+## Usage
+### Basic scan
+```bash
+vulnhawk scan ./src
+```
+### Focused scanning
+```bash
+# Only check authentication and authorization
+vulnhawk scan ./src --mode auth
+# Only check for injection vulnerabilities
+vulnhawk scan ./api --mode injection
+# Only look for hardcoded secrets
+vulnhawk scan . --mode secrets
+```
+### Output formats
+```bash
+# JSON output
+vulnhawk scan ./src -o json -f results.json
+# SARIF for GitHub Code Scanning
+vulnhawk scan ./src -o sarif -f results.sarif
+# Markdown report
+vulnhawk scan ./src -o markdown -f report.md
+```
+### Different LLM backends
+```bash
+# Claude (default, best results)
+vulnhawk scan ./src -b claude
+# OpenAI
+vulnhawk scan ./src -b openai -m gpt-4o
+# Ollama (free, local, private)
+vulnhawk scan ./src -b ollama -m llama3.1
+```
+### Filter by severity
+```bash
+# Only critical and high
+vulnhawk scan ./src --severity high
+# Everything including info
+vulnhawk scan ./src --severity info
+```
+### Preview what will be scanned
+```bash
+vulnhawk info ./src
+```
+## GitHub Action
+Add VulnHawk to your CI/CD pipeline:
+```yaml
+name: Security Scan
+on: [pull_request]
+permissions:
+  security-events: write
+jobs:
+  vulnhawk:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - uses: momenbasel/vulnhawk@main
+        with:
+          target: '.'
+          api-key: ${{ secrets.ANTHROPIC_API_KEY }}
+          severity: 'medium'
+          fail-on-findings: 'true'
+```
+Results automatically appear in GitHub's **Security** tab via SARIF upload.
+## Scan Modes
+| Mode | What it checks |
+|------|---------------|
+| `full` | Everything (default) |
+| `auth` | Authentication bypass, missing auth checks, session flaws, JWT issues |
+| `injection` | SQLi, command injection, SSTI, NoSQL injection, XSS |
+| `secrets` | Hardcoded API keys, passwords, tokens, connection strings |
+| `config` | Debug mode, verbose errors, permissive CORS, insecure cookies |
+| `crypto` | Weak hashing, hardcoded keys, insecure random, deprecated algorithms |
+## Supported Languages
+- Python
+- JavaScript / TypeScript
+- Go
+- More coming soon (Java, Ruby, PHP, Rust)
+## How It Works
+1. **Discover** - Walks your codebase, respects `.gitignore` and `.vulnhawkignore`
+2. **Chunk** - Splits code into logical pieces (functions, classes, routes) with surrounding context
+3. **Enrich** - For each chunk, includes import context and **related code** from elsewhere in the codebase (this is the key differentiator - it shows the AI how other parts handle auth, validation, etc.)
+4. **Analyze** - Sends enriched chunks to the LLM with security-focused analysis prompts
+5. **Validate** - Cross-references findings, removes duplicates, assigns confidence scores
+6. **Report** - Formats results with code snippets, attack scenarios, and fix suggestions
+The **enrichment step** is what makes VulnHawk fundamentally different. By showing the AI how similar endpoints in your codebase handle security, it can spot the one that doesn't.
+## Configuration
+### .vulnhawkignore
+Create a `.vulnhawkignore` file to exclude paths (same syntax as `.gitignore`):
+```
+# Skip generated code
+generated/
+*.gen.go
+# Skip vendor dependencies
+vendor/
+third_party/
+```
+### Environment Variables
+| Variable | Description |
+|----------|-------------|
+| `ANTHROPIC_API_KEY` | API key for Claude backend |
+| `OPENAI_API_KEY` | API key for OpenAI backend |
+## FAQ
+**How much does it cost to run?**
+Depends on codebase size and LLM backend. A typical scan of a medium project (~100 files) costs about $0.50-$2.00 with Claude. Use Ollama for free local scanning.
+**Will it find everything?**
+No security tool catches everything. VulnHawk is best at finding business logic bugs, missing authorization, and context-dependent vulnerabilities that pattern-matching tools miss. Use it alongside (not instead of) Semgrep/CodeQL.
+**Is my code sent to an external API?**
+Yes, code chunks are sent to the configured LLM provider (Anthropic, OpenAI). Use the Ollama backend for fully local, private scanning.
+**Does it support monorepos?**
+Yes. Point it at any directory and it will scan all supported files recursively.
+## Contributing
+Contributions are welcome. See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.
+```bash
+# Development setup
+git clone https://github.com/momenbasel/vulnhawk.git
+cd vulnhawk
+uv venv .venv && source .venv/bin/activate
+uv pip install -e ".[dev]"
+pytest
+```
+## License
+MIT - see [LICENSE](LICENSE)
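
The `--severity` examples in the description above imply a threshold: `--severity high` keeps only critical and high findings, while `--severity info` keeps everything. A minimal sketch of that filtering follows. The `Severity` enum, the `LOW` level, and `filter_findings` are assumed names for illustration and are not taken from `vulnhawk/models.py`.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Assumed ordering; higher values are more severe."""

    INFO = 0
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


def filter_findings(findings, minimum):
    """Keep findings whose severity is at or above the threshold."""
    return [f for f in findings if f["severity"] >= minimum]


# Hypothetical findings, shaped as plain dicts for brevity.
findings = [
    {"title": "Hardcoded API key", "severity": Severity.CRITICAL},
    {"title": "Missing auth check", "severity": Severity.HIGH},
    {"title": "Verbose error message", "severity": Severity.LOW},
]

for finding in filter_findings(findings, Severity.HIGH):
    print(finding["severity"].name, finding["title"])
```

Using an `IntEnum` lets the threshold be a plain `>=` comparison, so the same function covers every value the `--severity` option accepts.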

vulnhawk-0.1.0/README.md ADDED

@@ -0,0 +1,221 @@
+<p align="center">
+  <img src="https://img.shields.io/pypi/v/vulnhawk?color=blue&label=PyPI" alt="PyPI">
+  <img src="https://img.shields.io/pypi/pyversions/vulnhawk" alt="Python">
+  <img src="https://img.shields.io/github/license/momenbasel/vulnhawk" alt="License">
+  <img src="https://img.shields.io/github/stars/momenbasel/vulnhawk?style=social" alt="Stars">
+</p>
+<h1 align="center">VulnHawk</h1>
+<p align="center">
+  <strong>AI-powered code security scanner that finds vulnerabilities<br>Semgrep and CodeQL miss.</strong>
+</p>
+<p align="center">
+  VulnHawk uses AI to understand your code's <em>business logic</em> - not just pattern matching.<br>
+  It spots missing auth checks, IDOR flaws, and logic bugs that rule-based tools can't detect.
+</p>
+---
+## What Makes VulnHawk Different
+Traditional scanners (Semgrep, CodeQL, Bandit) use pattern matching and AST rules. They're great at finding known patterns, but they **can't understand intent**.
+VulnHawk analyzes your code with AI and cross-references how different parts of your codebase handle security. If 12 endpoints check authorization but one doesn't, VulnHawk catches it.
+| Feature | Semgrep / CodeQL | VulnHawk |
+|---------|-----------------|----------|
+| Detection method | AST pattern matching | AI code understanding |
+| Business logic bugs | Cannot detect | Detects missing auth, IDOR, logic flaws |
+| Cross-file analysis | Requires custom rules | Automatic - compares similar code patterns |
+| Setup | Write rules, configure | Zero config - works immediately |
+| Finding descriptions | Rule IDs and templates | Natural language with attack scenarios |
+| Fix suggestions | Generic recommendations | Context-specific code fixes |
+## Quick Start
+```bash
+pip install vulnhawk
+```
+Set your LLM API key:
+```bash
+export ANTHROPIC_API_KEY=sk-ant-...    # Claude (default)
+# or
+export OPENAI_API_KEY=sk-...           # OpenAI
+# or just run Ollama locally           # Free, no API key needed
+```
+Scan your code:
+```bash
+vulnhawk scan ./src
+```
+That's it. No config files, no rule writing, no setup.
+## Usage
+### Basic scan
+```bash
+vulnhawk scan ./src
+```
+### Focused scanning
+```bash
+# Only check authentication and authorization
+vulnhawk scan ./src --mode auth
+# Only check for injection vulnerabilities
+vulnhawk scan ./api --mode injection
+# Only look for hardcoded secrets
+vulnhawk scan . --mode secrets
+```
+### Output formats
+```bash
+# JSON output
+vulnhawk scan ./src -o json -f results.json
+# SARIF for GitHub Code Scanning
+vulnhawk scan ./src -o sarif -f results.sarif
+# Markdown report
+vulnhawk scan ./src -o markdown -f report.md
+```
+### Different LLM backends
+```bash
+# Claude (default, best results)
+vulnhawk scan ./src -b claude
+# OpenAI
+vulnhawk scan ./src -b openai -m gpt-4o
+# Ollama (free, local, private)
+vulnhawk scan ./src -b ollama -m llama3.1
+```
+### Filter by severity
+```bash
+# Only critical and high
+vulnhawk scan ./src --severity high
+# Everything including info
+vulnhawk scan ./src --severity info
+```
+### Preview what will be scanned
+```bash
+vulnhawk info ./src
+```
+## GitHub Action
+Add VulnHawk to your CI/CD pipeline:
+```yaml
+name: Security Scan
+on: [pull_request]
+permissions:
+  security-events: write
+jobs:
+  vulnhawk:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - uses: momenbasel/vulnhawk@main
+        with:
+          target: '.'
+          api-key: ${{ secrets.ANTHROPIC_API_KEY }}
+          severity: 'medium'
+          fail-on-findings: 'true'
+```
+Results automatically appear in GitHub's **Security** tab via SARIF upload.
+## Scan Modes
+| Mode | What it checks |
+|------|---------------|
+| `full` | Everything (default) |
+| `auth` | Authentication bypass, missing auth checks, session flaws, JWT issues |
+| `injection` | SQLi, command injection, SSTI, NoSQL injection, XSS |
+| `secrets` | Hardcoded API keys, passwords, tokens, connection strings |
+| `config` | Debug mode, verbose errors, permissive CORS, insecure cookies |
+| `crypto` | Weak hashing, hardcoded keys, insecure random, deprecated algorithms |
+## Supported Languages
+- Python
+- JavaScript / TypeScript
+- Go
+- More coming soon (Java, Ruby, PHP, Rust)
+## How It Works
+1. **Discover** - Walks your codebase, respects `.gitignore` and `.vulnhawkignore`
+2. **Chunk** - Splits code into logical pieces (functions, classes, routes) with surrounding context
+3. **Enrich** - For each chunk, includes import context and **related code** from elsewhere in the codebase (this is the key differentiator - it shows the AI how other parts handle auth, validation, etc.)
+4. **Analyze** - Sends enriched chunks to the LLM with security-focused analysis prompts
+5. **Validate** - Cross-references findings, removes duplicates, assigns confidence scores
+6. **Report** - Formats results with code snippets, attack scenarios, and fix suggestions
+The **enrichment step** is what makes VulnHawk fundamentally different. By showing the AI how similar endpoints in your codebase handle security, it can spot the one that doesn't.
+## Configuration
+### .vulnhawkignore
+Create a `.vulnhawkignore` file to exclude paths (same syntax as `.gitignore`):
+```
+# Skip generated code
+generated/
+*.gen.go
+# Skip vendor dependencies
+vendor/
+third_party/
+```
+### Environment Variables
+| Variable | Description |
+|----------|-------------|
+| `ANTHROPIC_API_KEY` | API key for Claude backend |
+| `OPENAI_API_KEY` | API key for OpenAI backend |
+## FAQ
+**How much does it cost to run?**
+Depends on codebase size and LLM backend. A typical scan of a medium project (~100 files) costs about $0.50-$2.00 with Claude. Use Ollama for free local scanning.
+**Will it find everything?**
+No security tool catches everything. VulnHawk is best at finding business logic bugs, missing authorization, and context-dependent vulnerabilities that pattern-matching tools miss. Use it alongside (not instead of) Semgrep/CodeQL.
+**Is my code sent to an external API?**
+Yes, code chunks are sent to the configured LLM provider (Anthropic, OpenAI). Use the Ollama backend for fully local, private scanning.
+**Does it support monorepos?**
+Yes. Point it at any directory and it will scan all supported files recursively.
+## Contributing
+Contributions are welcome. See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.
+```bash
+# Development setup
+git clone https://github.com/momenbasel/vulnhawk.git
+cd vulnhawk
+uv venv .venv && source .venv/bin/activate
+uv pip install -e ".[dev]"
+pytest
+```
+## License
+MIT - see [LICENSE](LICENSE)
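
The enrichment step described in "How It Works" above pairs each chunk with related code from elsewhere in the codebase. The lines shown here do not reveal how `vulnhawk/scanner/engine.py` chooses that related code, so the sketch below swaps in a simple stand-in: ranking other chunks by Jaccard overlap of their identifiers. The `Chunk` class, `related_chunks`, and the sample data are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Chunk:
    name: str
    source: str


def identifiers(source: str) -> set[str]:
    """Collect identifier-like tokens from a piece of source code."""
    return set(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", source))


def related_chunks(target: Chunk, others: list[Chunk], top_k: int = 2) -> list[Chunk]:
    """Rank other chunks by Jaccard similarity of their identifier sets."""
    target_ids = identifiers(target.source)

    def score(chunk: Chunk) -> float:
        ids = identifiers(chunk.source)
        union = target_ids | ids
        return len(target_ids & ids) / len(union) if union else 0.0

    return sorted(others, key=score, reverse=True)[:top_k]


chunks = [
    Chunk("get_invoice", "def get_invoice(user, id): require_owner(user, id); return db.invoice(id)"),
    Chunk("get_order", "def get_order(user, id): require_owner(user, id); return db.order(id)"),
    Chunk("render_footer", "def render_footer(): return template('footer')"),
]
# This endpoint omits the ownership check that its siblings perform.
target = Chunk("get_receipt", "def get_receipt(user, id): return db.receipt(id)")

for chunk in related_chunks(target, chunks):
    print(chunk.name)
```

Placing the two sibling endpoints next to `get_receipt` in the prompt is what would let a model notice that only this one skips `require_owner`, which is the kind of comparison the README describes.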