PyPI - orihime - Versions diffs - 1.9.2__tar.gz - Mend

orihime 1.9.2__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (39)

orihime-1.9.2/PKG-INFO +399 -0
orihime-1.9.2/README.md +368 -0
orihime-1.9.2/orihime/__init__.py +0 -0
orihime-1.9.2/orihime/__main__.py +363 -0
orihime-1.9.2/orihime/complexity_pass.py +248 -0
orihime-1.9.2/orihime/cross_resolver.py +199 -0
orihime-1.9.2/orihime/indexer.py +1078 -0
orihime-1.9.2/orihime/io_fanout_pass.py +298 -0
orihime-1.9.2/orihime/java_extractor.py +1107 -0
orihime-1.9.2/orihime/js_extractor.py +664 -0
orihime-1.9.2/orihime/kotlin_extractor.py +771 -0
orihime-1.9.2/orihime/language.py +78 -0
orihime-1.9.2/orihime/license_checker.py +244 -0
orihime-1.9.2/orihime/mcp_server.py +2697 -0
orihime-1.9.2/orihime/parse_result.py +39 -0
orihime-1.9.2/orihime/path_utils.py +30 -0
orihime-1.9.2/orihime/perf_ingest.py +238 -0
orihime-1.9.2/orihime/resolver.py +657 -0
orihime-1.9.2/orihime/schema.py +176 -0
orihime-1.9.2/orihime/security_config.py +277 -0
orihime-1.9.2/orihime/skills/orihime-call-flow/SKILL.md +171 -0
orihime-1.9.2/orihime/skills/orihime-change-impact/SKILL.md +187 -0
orihime-1.9.2/orihime/skills/orihime-code-assist/SKILL.md +228 -0
orihime-1.9.2/orihime/skills/orihime-design-review/SKILL.md +345 -0
orihime-1.9.2/orihime/skills/orihime-perf-analysis/SKILL.md +228 -0
orihime-1.9.2/orihime/skills/orihime-security-audit/SKILL.md +269 -0
orihime-1.9.2/orihime/skills/orihime-setup/SKILL.md +184 -0
orihime-1.9.2/orihime/ui_server.py +1841 -0
orihime-1.9.2/orihime/walker.py +23 -0
orihime-1.9.2/orihime/write_client.py +46 -0
orihime-1.9.2/orihime/write_server.py +105 -0
orihime-1.9.2/orihime.egg-info/PKG-INFO +399 -0
orihime-1.9.2/orihime.egg-info/SOURCES.txt +37 -0
orihime-1.9.2/orihime.egg-info/dependency_links.txt +1 -0
orihime-1.9.2/orihime.egg-info/entry_points.txt +2 -0
orihime-1.9.2/orihime.egg-info/requires.txt +10 -0
orihime-1.9.2/orihime.egg-info/top_level.txt +1 -0
orihime-1.9.2/pyproject.toml +51 -0
orihime-1.9.2/setup.cfg +4 -0

orihime-1.9.2/PKG-INFO ADDED

@@ -0,0 +1,399 @@
+Metadata-Version: 2.4
+Name: orihime
+Version: 1.9.2
+Summary: Cross-repository code knowledge graph for Java/Kotlin/JS/TS — MCP server, web UI, CLI
+License: MIT
+Project-URL: Homepage, https://github.com/srinivasan-sundaresan95/orihime
+Project-URL: Repository, https://github.com/srinivasan-sundaresan95/orihime
+Project-URL: Bug Tracker, https://github.com/srinivasan-sundaresan95/orihime/issues
+Keywords: mcp,mcp-server,model-context-protocol,code-analysis,call-graph,taint-analysis,sast,java,kotlin,javascript,typescript,kuzudb,tree-sitter,security
+Classifier: Development Status :: 4 - Beta
+Classifier: Intended Audience :: Developers
+Classifier: License :: OSI Approved :: MIT License
+Classifier: Programming Language :: Python :: 3
+Classifier: Programming Language :: Python :: 3.11
+Classifier: Programming Language :: Python :: 3.12
+Classifier: Topic :: Software Development :: Quality Assurance
+Classifier: Topic :: Security
+Classifier: Topic :: Software Development :: Libraries :: Python Modules
+Requires-Python: >=3.11
+Description-Content-Type: text/markdown
+Requires-Dist: kuzu>=0.6
+Requires-Dist: tree-sitter>=0.23
+Requires-Dist: tree-sitter-java>=0.23
+Requires-Dist: tree-sitter-kotlin>=0.3
+Requires-Dist: tree-sitter-javascript>=0.23
+Requires-Dist: tree-sitter-typescript>=0.23
+Requires-Dist: mcp[cli]>=1.0
+Requires-Dist: fastapi>=0.111
+Requires-Dist: uvicorn[standard]>=0.29
+Requires-Dist: httpx>=0.27
+# Orihime
+[![PyPI](https://img.shields.io/pypi/v/orihime)](https://pypi.org/project/orihime/)
+[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
+[![MCP](https://img.shields.io/badge/MCP-server-blue)](https://modelcontextprotocol.io)
+[![Smithery](https://smithery.ai/badge/orihime)](https://smithery.ai/server/orihime)
+A cross-repository code knowledge graph for Java/Kotlin/JavaScript/TypeScript codebases. Orihime indexes your source code into an embedded [KuzuDB](https://kuzudb.com/) graph database using [tree-sitter](https://tree-sitter.github.io/) and exposes the graph through an **MCP server** (for AI assistants), a local web UI, and a CLI.
+> **Mythology**: Orihime (織姫) is Vega — the weaving princess who weaves the fabric of the cosmos. She weaves connections. The tool that weaves your codebase into a single graph.
+---
+## What It Does
+- **Call graph across repositories** — who calls what, across service boundaries, including REST calls resolved to the endpoint they target
+- **Cross-repo taint analysis** — track user-controlled data from HTTP/Kafka/JMS entry points through the call graph to dangerous sinks (SQL injection, path traversal, XXE, deserialization, SSRF, log injection, …)
+- **Security reports** — OWASP Top 10, CWE, PCI DSS, STIG frameworks; second-order injection detection; custom sources/sinks via YAML
+- **Entry-point reachability filtering** — suppress false positives from dead code; only surface findings reachable from real entry points (HTTP handlers, `@KafkaListener`, `@Scheduled`, `@JmsListener`, `@RabbitListener`)
+- **Complexity hints** — static O(n²) loop detection, N+1 JPA risk, unbounded queries, recursive calls — no profiler needed
+- **Performance correlation** — ingest Gatling/JMeter load test results; correlate with the call graph to find confirmed hotspots and Little's Law capacity ceilings per endpoint
+- **License compliance** — scan Maven/Gradle dependencies against SPDX identifiers; flag GPL/AGPL/LGPL in commercial projects
+- **Incremental re-index** — git blob-hash-based skip; only changed files are re-parsed on subsequent runs
+- **Multi-language** — Java, Kotlin, JavaScript, TypeScript (Next.js, Express, React)
+---
+## Quick Start — AI-first (Claude Code)
+The primary way to use Orihime is through an AI assistant via MCP. You index once, then ask questions in natural language — no Cypher, no grep, no reading source files.
+### 1. Install
+```bash
+git clone https://github.com/srinivasan-sundaresan95/orihime.git
+cd orihime
+pip install -e .
+```
+### 2. Register with Claude Code (one-time setup)
+```bash
+python -m orihime register       # writes MCP server entry to ~/.claude/settings.json
+python -m orihime install-skills # copies Claude Code skills to ~/.claude/skills/
+```
+Restart Claude Code. The `orihime` MCP tools and skills (`/orihime-call-flow`, `/orihime-security-audit`, `/orihime-perf-analysis`, `/orihime-change-impact`) are now active.
+### 3. Index your repositories
+```bash
+python -m orihime index --repo /path/to/your/service-a --name service-a
+python -m orihime index --repo /path/to/your/service-b --name service-b
+```
+### 4. Ask questions
+```
+Trace the call flow for GET /api/orders in service-a
+Find SQL injection risks in service-b
+What breaks if I change OrderService.processPayment?
+Which endpoints are approaching saturation?
+```
+No source file reads. No grep. Claude uses the graph directly — typically 5–8 tool calls vs 30+ for source-only analysis.
+> **CLI alternative**: All operations above are also available as Python commands (`python -m orihime index`, `python -m orihime ui`, etc.) if you prefer working outside an AI assistant. See [CLI Reference](#cli-reference) below.
+---
+## Feature Comparison
+| Capability | Orihime | GitNexus | SonarQube Community | SonarQube Developer | SonarQube Enterprise |
+|---|---|---|---|---|---|
+| Cross-repo call graph | ✓ | ✓ | ✗ | ✗ | ✗ |
+| REST endpoint resolution | ✓ | ✓ | ✗ | ✗ | ✗ |
+| MCP integration (AI assistants) | ✓ | ✓ | ✓¹ | ✓¹ | ✓¹ |
+| Claude Code hooks + skills | ✓ | ✓ | ✗ | ✗ | ✗ |
+| Cross-file taint (SAST / injection) | ✓ | ✗ | ✗ | ✓ | ✓ |
+| Second-order injection | ✓ | ✗ | ✗ | ✗ | ✗ |
+| Entry-point reachability filter | ✓ | ✗ | ✗ | ✗ | ✗ |
+| Custom sources/sinks (YAML) | ✓ | ✗ | ✗ | ✗ | ✓² |
+| OWASP/CWE/PCI/STIG compliance reports | ✓ | ✗ | ✗ | ✗ | ✓ |
+| Argument-level taint (value-flow) | ✓ | ✗ | ✗ | ✗ | ✗ |
+| Complexity hints (O(n²), N+1) | ✓ | ✗ | partial | partial | partial |
+| I/O fan-out + serial/parallel analysis | ✓ | ✗ | ✗ | ✗ | ✗ |
+| Perf ingestion + capacity model | ✓ | ✗ | ✗ | ✗ | ✗ |
+| Cross-service cascade risk | ✓ | ✗ | ✗ | ✗ | ✗ |
+| License compliance | ✓ | ✗ | ✗ | ✗ | ✓³ |
+| Embedded DB (no server daemon) | ✓ | ✓ | ✗ | ✗ | ✗ |
+| Indexes Java / Kotlin | ✓ | ✓ | ✓ | ✓ | ✓ |
+| Indexes JS / TS | ✓ | ✓ | ✓ | ✓ | ✓ |
+| License | MIT | PolyForm NC | LGPL | Commercial | Commercial |
+> ¹ Via the official [sonarqube-mcp-server](https://github.com/SonarSource/sonarqube-mcp-server) (SonarSource, production-ready). Works with all SonarQube editions.
+> ² Custom taint sources/sinks require the Advanced Security add-on (Enterprise+).
+> ³ License compliance (SBOM + policy enforcement) requires the Advanced Security add-on (Enterprise+).
+>
+> **GitNexus** (PolyForm Non-Commercial) provides cross-repo call graphs and MCP integration across 14 languages including Java and Kotlin. It does not cover SAST, perf analysis, or compliance reporting.
+---
+## MCP Tools Reference
+### Call Graph
+| Tool | Description |
+|---|---|
+| `find_callers(method_fqn)` | All methods that call the given method |
+| `find_callees(method_fqn)` | All methods called by the given method |
+| `blast_radius(method_fqn, max_depth)` | Transitive set of callers up to N hops |
+| `find_endpoint_callers(http_method, path_pattern)` | Trace back from an HTTP endpoint to its callers |
+| `find_implementations(interface_fqn)` | All classes implementing an interface |
+| `find_superclasses(class_fqn, max_depth)` | Inheritance chain |
+| `find_external_calls(repo_name)` | All calls to methods outside the indexed repo |
+### Discovery
+| Tool | Description |
+|---|---|
+| `search_symbol(query)` | Full-text search across class/method FQNs |
+| `get_file_location(fqn)` | File path and line number for any class or method |
+| `list_repos()` | All indexed repositories |
+| `list_branches(repo_name)` | All indexed branches for a repo |
+| `list_endpoints(repo_name)` | All HTTP endpoints in a repo |
+| `list_unresolved_calls(repo_name)` | REST calls that couldn't be matched to an endpoint |
+| `find_repo_dependencies(repo_name)` | Cross-service DEPENDS_ON edges |
+### ORM / JPA
+| Tool | Description |
+|---|---|
+| `list_entity_relations(repo_name)` | All JPA entity relationships — also used in design review (Phase 1.5) |
+| `find_eager_fetches(repo_name)` | EAGER-fetched collections (N+1 risk) |
+### Security (SAST)
+| Tool | Description |
+|---|---|
+| `find_taint_sinks(repo_name)` | All taint sinks reachable in the call graph |
+| `find_taint_flows(repo_name)` | Value-flow taint: argument → parameter across CALLS edges |
+| `find_cross_service_taint(repo_name, max_depth)` | Taint that crosses service boundaries via REST |
+| `find_second_order_injection(repo_name)` | Taint stored to DB then re-read and used as sink |
+| `find_entry_points(repo_name)` | All HTTP/Kafka/Scheduled/JMS/RabbitMQ entry points |
+| `find_reachable_sinks(repo_name, show_all)` | Taint sinks filtered to those reachable from entry points only |
+| `generate_security_report(repo_name, framework)` | Report in OWASP / CWE / PCI / STIG format |
+| `list_security_config()` | Show active sources, sinks, and sanitizers from YAML config |
+### Complexity & Performance
+| Tool | Description |
+|---|---|
+| `find_complexity_hints(repo_name, min_severity)` | Methods flagged with O(n²), N+1, unbounded-query, recursive |
+| `ingest_perf_results(repo_name, file_path)` | Load Gatling simulation.log, JMeter XML, or JSON perf data |
+| `find_hotspots(repo_name)` | Complexity hints × p99 latency, sorted by risk score |
+| `estimate_capacity(repo_name)` | Little's Law capacity per endpoint; flags near-saturation |
+| `find_cascade_risk(repo_name)` | Cross-service cascade: upstream endpoints limited by downstream saturation |
+### License Compliance
+| Tool | Description |
+|---|---|
+| `find_license_violations(repo_name, allowed, skip_lookup)` | Flag GPL/AGPL/LGPL dependencies via Maven Central |
+### Index
+| Tool | Description |
+|---|---|
+| `index_repo_tool(repo_path, repo_name)` | Trigger an index from within the MCP session |
+---
+## CLI Reference
+All operations are also accessible directly without an AI assistant:
+```
+python -m orihime index        --repo PATH  --name NAME  [--db PATH] [--force] [--branch NAME]
+python -m orihime ui           [--port 7700] [--db PATH]
+python -m orihime serve
+python -m orihime serve-sse    [--port 7702] [--db PATH]
+python -m orihime resolve        [--db PATH]
+python -m orihime write-server   [--port 7701] [--db PATH]
+python -m orihime register       [--db PATH] [--python PATH]
+python -m orihime install-skills
+```
+| Command | Description |
+|---|---|
+| `index` | Parse a repository and write its graph into KuzuDB |
+| `ui` | Start the local web UI on port 7700 |
+| `serve` | Start the MCP server on stdio (for Claude Code, Claude Desktop, any MCP client) |
+| `serve-sse` | Start the MCP server with SSE transport (for CI runners and remote clients) |
+| `resolve` | Match RestCall URL patterns against Endpoints across all indexed repos |
+| `write-server` | Start the write-serialization server for team/server deployments |
+| `register` | Write the Orihime MCP server entry to `~/.claude/settings.json` |
+| `install-skills` | Copy bundled skills to the target AI assistant's config dir (`--agent claude\|cursor\|codex\|copilot\|all`) |
+---
+## Web UI
+```
+http://localhost:7700
+```
+| Page | Description |
+|---|---|
+| `/` | Call graph explorer: search methods, trace callers/callees, visualize CALLS graph |
+| `/findings` | Security + complexity findings table — filter by OWASP category, severity, file |
+| `/api/…` | JSON endpoints backing the UI (also usable directly) |
+---
+## Configuration
+### Environment Variables
+| Variable | Default | Description |
+|---|---|---|
+| `ORIHIME_DB_PATH` | `~/.orihime/orihime.db` | Path to KuzuDB database directory |
+| `ORIHIME_SERVER_URL` | _(unset)_ | URL of the write-serialization server (team mode) |
+### Custom Sources and Sinks
+Create `~/.orihime/security_config.yaml` (or set `ORIHIME_SECURITY_CONFIG`):
+```yaml
+sources:
+  - method_pattern: ".*getCustomUserInput"
+    description: "Custom input source"
+sinks:
+  - method_pattern: ".*legacyExec"
+    sink_type: "COMMAND_INJECTION"
+    description: "Legacy shell executor"
+sanitizers:
+  - method_pattern: ".*sanitizeForLegacy"
+```
+The built-in config covers `HttpServletRequest`, `@RequestParam`, `@PathVariable`, `@RequestBody`, JDBC `execute*`, JPA native queries, `Runtime.exec`, `ProcessBuilder`, XML parsers, `ObjectInputStream`, `Files.get`, `Paths.get`, `new URL`, logging calls, and more.
+---
+## Documentation
+| Doc | Description |
+|---|---|
+| [MCP Server](docs/mcp-server.md) | All MCP tools with parameters and examples |
+| [Extractors](docs/extractors.md) | How Java/Kotlin/JS/TS are parsed; ExtractResult schema |
+| [Security Config](docs/security-config.md) | Custom sources, sinks, sanitizers — YAML reference |
+| [CI Integration](docs/ci-integration.md) | GitHub Actions PR review workflow setup |
+| [Docker](docs/docker.md) | Docker Compose setup for server deployments |
+| [Adding a Language](docs/adding-a-language.md) | How to add a new language extractor |
+| [Cross-Repo Resolution](docs/resolver.md) | How REST calls are matched to endpoints across repos |
+---
+## Team / Server Mode
+KuzuDB has a single-writer constraint. In team deployments where multiple developers re-index simultaneously, run the write-serialization server:
+```bash
+# On the shared server — owns the KuzuDB connection
+python -m orihime write-server --port 7701 --db /shared/orihime.db
+# Each developer's indexer sends writes to the server
+ORIHIME_SERVER_URL=http://server:7701 python -m orihime index --repo /path --name my-service
+```
+Developers running locally without `ORIHIME_SERVER_URL` open KuzuDB directly as always. The web UI and MCP server always read directly from KuzuDB (reads do not go through the write server).
+---
+## Architecture
+```
+Source files
+    │
+    ▼ tree-sitter (Java, Kotlin, JS, TS)
+ParseResult (plain Python dicts, picklable)
+    │
+    ▼ ProcessPoolExecutor (parallel parse workers)
+Phase 2: KuzuDB writes (batched by table, 500-edge transactions)
+    │
+    ▼
+KuzuDB embedded graph  ←──────────────────────────────┐
+    │                                                   │
+    ├── MCP server (FastMCP, stdio)                     │
+    ├── Web UI (Starlette, port 7700)                   │
+    └── Write server (FastAPI, port 7701, team mode) ──┘
+```
+**Graph schema** (SCHEMA_VERSION 10):
+| Node | Key fields |
+|---|---|
+| `Repo` | id, name, root_path |
+| `File` | path, language, blob_hash, branch_name |
+| `Class` | fqn, annotations, is_interface |
+| `Method` | fqn, line_start, annotations, is_entry_point, complexity_hint |
+| `Endpoint` | http_method, path, path_regex |
+| `RestCall` | http_method, url_pattern |
+| `EntityRelation` | source_class, target_class, fetch_type, relation_type |
+| `PerfSample` | endpoint_fqn, p50_ms, p99_ms, rps, source |
+| `CapacityEstimate` | endpoint_fqn, saturation_rps, ceiling_concurrency, risk_level |
+| Relationship | Description |
+|---|---|
+| `CALLS` | Method → Method; carries callee_name, caller_arg_pos, callee_param_pos |
+| `CALLS_REST` | Method → Endpoint (resolved cross-service call) |
+| `UNRESOLVED_CALL` | Method → RestCall (not yet resolved) |
+| `CONTAINS_CLASS` | File → Class |
+| `CONTAINS_METHOD` | Class → Method |
+| `EXPOSES` | Repo → Endpoint |
+| `DEPENDS_ON` | Repo → Repo (cross-service dependency) |
+| `EXTENDS` | Class → Class |
+| `IMPLEMENTS` | Class → Class |
+| `HAS_RELATION` | Class → EntityRelation |
+| `OBSERVED_AT` | Method → PerfSample |
+---
+## Performance
+### Query performance (graph DB)
+Benchmarked on an 845-file Java/Kotlin service:
+| Operation | Time |
+|---|---|
+| Cold index | ~67s |
+| Incremental re-index (no changes) | ~34s |
+| `find_callers` | <5ms |
+| `blast_radius` (depth 3) | <15ms |
+| `find_taint_sinks` (full repo) | <25ms |
+Batch write speedup vs naive per-row writes: **12×**.
+---
+### AI assistant benchmark — tracing a single call flow
+#### Java/Kotlin codebase (845 + 224 files, measured)
+Benchmarked on a 845-file Kotlin service and a 224-file Java service, tracing one controller endpoint through service → repositories → upstream APIs. GitNexus v1.6.3, Orihime v1.9, and a grep+source-read baseline were all measured on the same codebase on the same hardware (WSL2/Ubuntu, Intel i7, 2026-04-30).
+| Approach | Cold index | Query latency | Avg tokens/query | Files read |
+|---|---|---|---|---|
+| **Baseline** — Claude reads source files directly | — | ~4–5 min | ~14,000 | 27 |
+| **GitNexus v1.6.3** | 51.4s | 2–10s⁴ | ~1,490 | 0 |
+| **Orihime v1.9** | **66.6s** | **3–22ms** | **~683** | **0** |
+**Orihime vs baseline: 95% fewer tokens · 200–1,400× faster queries**
+**Orihime vs GitNexus: 2.2× fewer tokens · 200–1,400× faster queries · MCP-native**
+The 7 Orihime tool calls produced ~80% of the structural picture (full controller→service→repo→upstream chain, 27 test methods surfaced, resilience wiring discovered automatically). The remaining ~20% — upstream API URLs, auth headers, branch-level control flow — requires targeted source reads, scoped to ~5 specific files rather than 27.
+GitNexus's cold index is ~1.3× faster on NTFS (Node.js parse throughput advantage). On native Linux this gap narrows to near parity.
+> ⁴ GitNexus query latency is dominated by live GitHub API round trips (1–3 per query × 500–2,000ms each, rate-limit dependent). Blast radius returned results in the wrong direction (upstream imports rather than downstream dependents).
+---
+## License
+MIT
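
Note on the capacity model mentioned in the README above: `estimate_capacity` is described as applying Little's Law per endpoint and flagging near-saturation. The following is a minimal sketch of that arithmetic only, not Orihime's implementation. Little's Law states L = λ·W, so for a fixed concurrency ceiling L and a mean latency W the throughput ceiling is λ = L / W. The function names, the 0.8 threshold, the risk labels, and the choice of a thread-pool size as the concurrency ceiling are all assumptions made for illustration.

```python
def saturation_rps(ceiling_concurrency: int, latency_ms: float) -> float:
    """Little's Law: L = lambda * W, hence lambda_max = L / W.

    latency_ms is W in milliseconds, so multiply by 1000 to get requests/second.
    """
    if ceiling_concurrency <= 0 or latency_ms <= 0:
        raise ValueError("concurrency and latency must be positive")
    return ceiling_concurrency * 1000.0 / latency_ms


def risk_level(observed_rps: float, ceiling_rps: float, threshold: float = 0.8) -> str:
    """Hypothetical classification by utilisation (observed / ceiling)."""
    utilisation = observed_rps / ceiling_rps
    if utilisation >= 1.0:
        return "SATURATED"
    if utilisation >= threshold:
        return "NEAR_SATURATION"
    return "OK"


# Assumed inputs: 200 concurrent workers, 50 ms mean latency, 3500 rps observed.
ceiling = saturation_rps(ceiling_concurrency=200, latency_ms=50.0)
print(ceiling, risk_level(3500.0, ceiling))
```

With these assumed inputs the ceiling is 200 × 1000 / 50 = 4000 requests per second, and an observed 3500 rps is 87.5% utilisation, which the sketch labels as near saturation. Whether the real tool uses mean, p50, or p99 latency as W is not stated in the README.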