npm - @pmaddire/gcie - Versions diffs - 0.1.2 - Mend

@pmaddire/gcie 0.1.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (82)

package/AGENT.md +256 -0
package/AGENT_USAGE.md +231 -0
package/ARCHITECTURE.md +151 -0
package/CLAUDE.md +69 -0
package/DEBUGGING_PLAYBOOK.md +160 -0
package/KNOWLEDGE_INDEX.md +154 -0
package/POTENTIAL_UPDATES +130 -0
package/PROJECT.md +141 -0
package/README.md +371 -0
package/REPO_DIGITAL_TWIN.md +98 -0
package/ROADMAP.md +301 -0
package/SETUP_ANY_REPO.md +85 -0
package/bin/gcie-init.js +20 -0
package/bin/gcie.js +45 -0
package/cli/__init__.py +1 -0
package/cli/app.py +163 -0
package/cli/commands/__init__.py +1 -0
package/cli/commands/cache.py +35 -0
package/cli/commands/context.py +2426 -0
package/cli/commands/context_slices.py +617 -0
package/cli/commands/debug.py +24 -0
package/cli/commands/index.py +17 -0
package/cli/commands/query.py +20 -0
package/cli/commands/setup.py +73 -0
package/config/__init__.py +1 -0
package/config/scanner_config.py +82 -0
package/context/__init__.py +1 -0
package/context/architecture_bootstrap.py +170 -0
package/context/architecture_index.py +185 -0
package/context/architecture_parser.py +170 -0
package/context/architecture_slicer.py +308 -0
package/context/context_router.py +70 -0
package/context/fallback_evaluator.py +21 -0
package/coverage_integration/__init__.py +1 -0
package/coverage_integration/coverage_loader.py +55 -0
package/debugging/__init__.py +12 -0
package/debugging/bug_localizer.py +81 -0
package/debugging/execution_path_analyzer.py +42 -0
package/embeddings/__init__.py +6 -0
package/embeddings/encoder.py +45 -0
package/embeddings/faiss_index.py +72 -0
package/git_integration/__init__.py +1 -0
package/git_integration/git_miner.py +78 -0
package/graphs/__init__.py +17 -0
package/graphs/call_graph.py +70 -0
package/graphs/code_graph.py +81 -0
package/graphs/execution_graph.py +35 -0
package/graphs/git_graph.py +43 -0
package/graphs/graph_store.py +25 -0
package/graphs/node_factory.py +21 -0
package/graphs/test_graph.py +65 -0
package/graphs/validators.py +28 -0
package/graphs/variable_graph.py +51 -0
package/knowledge_index/__init__.py +1 -0
package/knowledge_index/index_builder.py +60 -0
package/knowledge_index/models.py +35 -0
package/knowledge_index/query_api.py +38 -0
package/knowledge_index/store.py +23 -0
package/llm_context/__init__.py +6 -0
package/llm_context/context_builder.py +67 -0
package/llm_context/snippet_selector.py +57 -0
package/package.json +14 -0
package/parser/__init__.py +18 -0
package/parser/ast_parser.py +216 -0
package/parser/call_resolver.py +52 -0
package/parser/models.py +75 -0
package/parser/tree_sitter_adapter.py +56 -0
package/parser/variable_extractor.py +31 -0
package/retrieval/__init__.py +17 -0
package/retrieval/cache.py +22 -0
package/retrieval/hybrid_retriever.py +249 -0
package/retrieval/query_parser.py +38 -0
package/retrieval/ranking.py +43 -0
package/retrieval/semantic_retriever.py +39 -0
package/retrieval/symbolic_retriever.py +80 -0
package/scanner/__init__.py +5 -0
package/scanner/file_filters.py +37 -0
package/scanner/models.py +44 -0
package/scanner/repository_scanner.py +55 -0
package/scripts/bootstrap_from_github.ps1 +41 -0
package/tracing/__init__.py +1 -0
package/tracing/runtime_tracer.py +60 -0

package/AGENT.md ADDED

@@ -0,0 +1,256 @@
+# AGENT.md
+Agent Operating Instructions for GraphCode Intelligence Engine (GCIE)
+This file provides persistent architectural context for coding agents working on this repository.
+Agents must read this file before performing any development tasks.
+---
+PROJECT NAME
+GraphCode Intelligence Engine (GCIE)
+---
+PROJECT PURPOSE
+GCIE is a graph-based code intelligence system designed to drastically reduce LLM token usage when working with large codebases.
+Instead of sending entire files to an LLM, GCIE retrieves only the minimal execution-relevant code required to answer a query.
+Example query:
+"Why is variable diff exploding?"
+GCIE should return only the relevant execution path and functions responsible for modifying the variable.
+---
+HIGH LEVEL ARCHITECTURE
+The system constructs multiple graphs representing the codebase.
+These graphs are combined into a unified knowledge graph used for symbolic and semantic retrieval.
+Graph types include:
+1. Code Structure Graph
+2. Call Graph
+3. Variable Dependency Graph
+4. Execution Trace Graph
+5. Git History Graph
+6. Test Coverage Graph
+---
+SYSTEM COMPONENTS
+parser/
+Parses repository source code using AST.
+Responsible for extracting:
+functions
+classes
+variables
+imports
+assignments
+function calls
+---
+graphs/
+Responsible for building graph representations.
+Graph modules include:
+code_graph.py
+call_graph.py
+variable_graph.py
+execution_graph.py
+git_graph.py
+test_graph.py
+---
+retrieval/
+Responsible for retrieving relevant code based on queries.
+Includes:
+symbolic_retriever.py
+semantic_retriever.py
+hybrid_retriever.py
+---
+embeddings/
+Responsible for embedding code for semantic search.
+Uses SentenceTransformers.
+Embeddings are stored in a FAISS vector index.
+---
+debugging/
+Contains logic for automated bug localization.
+Includes:
+bug_localizer.py
+execution_path_analyzer.py
+---
+llm_context/
+Builds minimal code context for LLM prompts.
+Responsible for formatting retrieved code snippets.
+---
+cli/
+CLI interface for interacting with GCIE.
+---
+CORE DESIGN PRINCIPLES
+Minimal Context Retrieval
+The system should always aim to return the smallest possible code context required to answer a query.
+---
+Graph First Retrieval
+Symbolic graph traversal should be performed before semantic search.
+---
+Hybrid Ranking
+Final results should combine:
+symbolic retrieval
+semantic similarity
+git recency weighting
+test coverage weighting
+---
+GRAPH DATA MODEL
+Nodes may represent:
+files
+classes
+functions
+variables
+commits
+tests
+Edges may represent:
+CALLS
+IMPORTS
+DEFINES
+MODIFIES
+READS
+WRITES
+EXECUTES
+CHANGED_IN
+COVERED_BY
+---
+QUERY PIPELINE
+When a query is received:
+1. Extract relevant symbols (variables/functions)
+2. Perform symbolic graph traversal
+3. Retrieve execution paths
+4. Rank candidates with embeddings
+5. Apply git and coverage weighting
+6. Return minimal code snippets
+---
+BUG LOCALIZATION STRATEGY
+When debugging queries are received:
+1. Identify target variable or function
+2. Find functions modifying that symbol
+3. Trace upstream execution paths
+4. Prioritize recently modified code
+5. Prioritize code with low test coverage
+---
+DEVELOPMENT WORKFLOW
+This repository uses the Get Shit Done (GSD) workflow.
+Agents must follow the process:
+1. Create project specification
+2. Generate roadmap
+3. Plan phases
+4. Execute atomic tasks
+5. Verify outputs
+Agents must not implement large features in a single step.
+---
+IMPLEMENTATION RULES
+Agents must:
+write modular code
+use Python type hints
+write docstrings
+verify features before continuing
+follow phased development
+---
+PERFORMANCE GOALS
+The system should aim to reduce LLM prompt context size by at least 10x compared to naive full-repository prompts.
+---
+FUTURE EXTENSIONS
+Possible future improvements include:
+cross-language support using Tree-sitter
+persistent graph database using Neo4j
+IDE integration
+LLM agent integration
+execution-aware debugging agents
+---
+END OF AGENT MEMORY FILE
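
Reviewer note: the GRAPH DATA MODEL and BUG LOCALIZATION STRATEGY sections of AGENT.md above describe finding the functions that modify a symbol and then tracing upstream callers. The following is a minimal sketch of that idea, not the package's implementation. The edge names (`CALLS`, `MODIFIES`, `WRITES`, `READS`) come from AGENT.md; the triple representation, the function names `build_reverse_index` and `localize`, and the toy node names are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical toy graph as (source, edge_type, target) triples.
# Edge type names follow AGENT.md; the node names are invented.
EDGES = [
    ("train_step", "CALLS", "compute_diff"),
    ("main", "CALLS", "train_step"),
    ("compute_diff", "MODIFIES", "diff"),
    ("log_metrics", "READS", "diff"),
    ("main", "CALLS", "log_metrics"),
]


def build_reverse_index(edges):
    """Map (edge_type, target) to the set of source nodes pointing at it."""
    index = defaultdict(set)
    for source, edge_type, target in edges:
        index[(edge_type, target)].add(source)
    return index


def localize(symbol, edges):
    """Return (functions that modify `symbol`, their transitive callers)."""
    index = build_reverse_index(edges)
    # Step 2 of the strategy: functions that modify or write the symbol.
    modifiers = sorted(index[("MODIFIES", symbol)] | index[("WRITES", symbol)])
    # Step 3: breadth-first walk over CALLS edges, from callee to caller.
    upstream, frontier, seen = [], list(modifiers), set(modifiers)
    while frontier:
        current = frontier.pop(0)
        for caller in sorted(index[("CALLS", current)]):
            if caller not in seen:
                seen.add(caller)
                upstream.append(caller)
                frontier.append(caller)
    return modifiers, upstream


modifiers, upstream = localize("diff", EDGES)
print(modifiers)  # functions that modify the symbol
print(upstream)   # callers reached by walking CALLS edges backwards
```

In this toy graph `log_metrics` is excluded because it only reads `diff`. The git-recency and coverage prioritization steps (4 and 5) are left out of the sketch because the source does not specify how they are weighted.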

package/AGENT_USAGE.md ADDED

@@ -0,0 +1,231 @@
+# GCIE Agent Usage (Portable Default)
+This file is designed to be dropped into any repository and used immediately.
+## Goal
+Retrieve the smallest useful context while preserving edit safety.
+Priority order:
+1. accuracy (must-have coverage)
+2. full-hit reliability
+3. token efficiency
+## Quick Start (Any Repo)
+1. Identify must-have context categories for the task:
+- implementation file(s)
+- wiring/orchestration file(s)
+- validation surface when risk is non-trivial
+- this may be a test, spec, schema, contract, migration, config, or CLI surface depending on the repo
+2. Run one primary retrieval with a file-first, symbol-heavy query:
+```powershell
+gcie.cmd context <path> "<file-first symbol-heavy query>" --intent <edit|debug|refactor|explore> --budget <shape budget>
+```
+3. Check must-have coverage.
+4. If one must-have file is missing, run targeted gap-fill for only that file.
+5. Stop immediately when must-have coverage is complete.
+## Retrieval Modes (Adaptive Router)
+Use three modes and choose by task family:
+1. `plain-context-first` (default for most tasks)
+2. `slicer-first` (for hard routed architecture or multi-hop families)
+3. `direct-file-check` (verification and fast gap closure)
+Plain-context command:
+```powershell
+gcie.cmd context <path> "<query>" --intent <edit|debug|refactor|explore> --budget <shape budget>
+```
+Slicer-first command:
+```powershell
+gcie.cmd context-slices <path> "<query>" --intent <edit|debug|refactor|explore>
+```
+Direct-file-check command:
+```powershell
+rg -n "<symbol1|symbol2|symbol3>" <likely files or subtree>
+```
+Mode-switch rule:
+- start with `plain-context-first` unless setup calibration proved another mode is better for that family
+- use `slicer-first` only for families where routing/architecture slices repeatedly outperform plain context
+- use `direct-file-check` whenever must-have coverage is uncertain or one file remains missing
+- do not keep retrying the same mode indefinitely; switch after one weak result
+Portable starter policy:
+- default all families to `plain-context-first`
+- after first 10-20 tasks, promote individual families to `slicer-first` only if benchmarked better
+- keep a family on plain-context if slicer is more expensive with no accuracy gain
+## Architecture Tracking (Portable, In-Repo)
+To make slicer mode adapt as the repo changes, keep architecture tracking inside the repo where GCIE runs.
+Track these files under `.gcie/`:
+- `.gcie/architecture.md`
+- `.gcie/architecture_index.json`
+- `.gcie/context_config.json`
+How to keep it adaptive:
+1. Bootstrap from user docs once (read-only):
+- `ARCHITECTURE.md`, `README.md`, `PROJECT.md`, `docs/architecture.md`, `docs/system_design.md`
+2. Use `.gcie/architecture.md` as GCIE-owned working architecture map.
+3. Refresh `.gcie/architecture.md` and `.gcie/architecture_index.json` when structural changes happen:
+- new subsystem
+- major module split/merge
+- interface/boundary change
+- dependency-direction change
+- active work-area shift
+4. Do not overwrite user-owned docs unless explicitly asked.
+Architecture confidence rule:
+- if architecture slice confidence is low or required mappings are stale/missing, fall back to plain `context` automatically
+- record fallback reason in `.gcie/context_config.json` when bypassing slicer mode
+## Portable Defaults (Task-Shape Based)
+Use these as a starting point in new repos.
+Primary pass budgets:
+- `auto`: simple same-layer or strong single-file lookup
+- `900`: same-family two-file lookup, frontend-local component lookup
+- `1100`: backend/config pair, same-layer backend pair
+- `1150`: cross-layer UI/API flow
+- `1300-1400`: explicit multi-hop chain (3+ linked files)
+Gap-fill budgets:
+- missing general implementation/wiring file: `900`
+- missing small orchestration or entry file: `500`
+Scope rule:
+- use the smallest path scope that still contains the expected files
+- use repo root (`.`) only for true cross-layer or backend orchestration recovery
+- if explicit targets cluster in one subtree, broad repo-root retrieval is often worse than subtree retrieval
+## Query Construction (Portable)
+Use this pattern:
+`<file-a> <file-b> <function/component> <state-or-arg> <route/flag> <config-key>`
+Guidelines:
+- include explicit file paths when known
+- include 2 to 6 distinctive symbols
+- include a caller or entry anchor when the target is indirect
+- avoid vague summaries and long laundry-list queries
+## Adaptive Loop (When Retrieval Is Weak)
+Treat retrieval as weak if any are true:
+- missing implementation or wiring category
+- generic entry/support files dominate
+- only tiny snippets from the target file appear, with no useful implementation body
+- expected cross-layer endpoint is missing
+Adapt in this order, one change at a time:
+1. Query upgrade:
+- add explicit file paths
+- add missing symbols such as functions, props, routes, flags, or keys
+- add caller or entry anchor
+2. Scope correction:
+- noisy root results: move to subtree scope
+- missing cross-layer or backend anchor: use a targeted root query for that file
+3. Budget bump:
+- raise one rung only, roughly `+100` to `+250`
+4. Targeted gap-fill:
+- fetch only the missing must-have file(s)
+5. Decompose chain, only if needed:
+- for 4+ hops, split into adjacent 2-3 file hops
+## Safe Efficiency Mode
+Use only after stable coverage is achieved.
+Rules:
+- do not lower primary budgets for known hard shapes
+- for a single missing file, try `800` before `900` only if the first pass already found same-family context
+- if `800` misses, immediately retry the stable default
+- if any miss persists, revert that task family to stable settings
+Note:
+- `800` is an experimental efficiency step-down, not a portable default truth
+- keep it only if it preserves full must-have coverage in the current repo
+## Verification Rule
+Always verify with a quick local symbol check before editing:
+```powershell
+rg -n "symbol1|symbol2|symbol3" <likely files>
+```
+GCIE is a context compressor, not the final truth gate.
+If one required file is still missing after retrieval, do direct-file-check first, then run one targeted GCIE call only for that file.
+## Portable Stop Rule
+Stop retrieval when all must-have categories are covered:
+- implementation
+- wiring/orchestration
+- validation surface, when risk justifies it
+Do not continue increasing budgets after sufficiency is reached.
+## First 5 Tasks Calibration (Minimal)
+For a new repo, track these fields for the first 5 tasks:
+- task shape
+- primary budget
+- gap-fill used (Y/N)
+- must-have full-hit (Y/N)
+- total tokens
+If a miss pattern repeats 2+ times in one task family:
+- add one local override for that family only
+- keep all other families on portable defaults
+Update necessity rule:
+- explicit workflow updates are optional, not required for baseline operation
+- if results are stable, keep using portable defaults without changes
+- add or update a local override only when the same miss pattern repeats 2-3 times
+## Optional Appendix: Repo-Specific Overrides (Example)
+These are examples from one mixed-layer repo and are not universal defaults.
+1. `cross_layer_ui_api` override:
+```powershell
+gcie.cmd context frontend "src/App.jsx src/main.jsx <symbols>" --intent edit --budget 900
+gcie.cmd context . "app.py start_convert selected_theme selectedTheme no_ai" --intent edit --budget 900
+```
+2. Stage 3/4 planner-builder pair override (`Plan_slides.py` + `Build_pptx.py`):
+```powershell
+gcie.cmd context . "Plan_slides.py content_slides section_divider figure_slides table_slide" --intent <intent> --budget 900
+gcie.cmd context . "Build_pptx.py build_pptx render_eq_png apply_theme THEME_CHOICES" --intent <intent> --budget 900
+```
+3. Stage 1/2 with `main.py` override:
+```powershell
+gcie.cmd context . "Analyze_pdf_structure.py Extract_pdf_content.py extract_pages split_into_sections extract_images enrich_with_ai" --intent explore --budget 1100
+gcie.cmd context . "main.py Stage 1 Stage 2 extract_pages enrich_with_ai" --intent explore --budget 500
+```
+4. Guardrail example:
+- keep the stable workflow for families that regress under split retrieval
+- example: `llm_client.py + Analyze_pdf_structure.py + Extract_pdf_content.py` in one benchmarked repo
+If this appendix does not match your repo, ignore it and use only the portable sections above.
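
Reviewer note: the "Portable Defaults", "Quick Start", and "Portable Stop Rule" sections of AGENT_USAGE.md above amount to a small decision procedure: pick a budget by task shape, check must-have coverage, then either stop or gap-fill only the missing files. The sketch below illustrates that procedure under stated assumptions. The budget numbers are copied from the file; the dictionary keys, the function name `next_action`, and the `small_entry_files` parameter are hypothetical labels and not part of GCIE.

```python
# Primary-pass budgets from "Portable Defaults". The shape keys are
# hypothetical labels chosen for this sketch, not GCIE identifiers.
PRIMARY_BUDGETS = {
    "single_file": "auto",
    "same_family_pair": 900,
    "backend_config_pair": 1100,
    "cross_layer_ui_api": 1150,
    "multi_hop_chain": 1300,  # the documented range is 1300-1400
}

# Gap-fill budgets from the same section.
GAP_FILL_BUDGETS = {"general": 900, "small_entry": 500}


def next_action(must_have, retrieved, small_entry_files=()):
    """Decide whether to stop or gap-fill, per the portable stop rule.

    Returns ("stop", {}) when every must-have file was retrieved, otherwise
    ("gap_fill", {path: budget}) covering only the missing files.
    """
    missing = sorted(set(must_have) - set(retrieved))
    if not missing:
        return ("stop", {})
    plan = {}
    for path in missing:
        kind = "small_entry" if path in small_entry_files else "general"
        plan[path] = GAP_FILL_BUDGETS[kind]
    return ("gap_fill", plan)


action, plan = next_action(
    must_have=["app.py", "src/App.jsx", "main.py"],
    retrieved=["app.py"],
    small_entry_files={"main.py"},
)
print(action, plan)
```

The sketch deliberately never raises a budget once coverage is complete, which mirrors the instruction not to continue increasing budgets after sufficiency is reached.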

package/ARCHITECTURE.md ADDED

@@ -0,0 +1,151 @@
+# ARCHITECTURE.md
+GraphCode Intelligence Engine (GCIE) Architecture
+This document describes the architecture of the GCIE system.
+---
+SYSTEM PURPOSE
+GCIE is a graph-based code intelligence engine that retrieves minimal execution-relevant code context for LLM workflows.
+The system reduces token usage by retrieving only relevant code paths rather than entire files.
+---
+CORE SUBSYSTEMS
+The system consists of five primary subsystems.
+1. Code Parser
+2. Graph Builder
+3. Knowledge Index
+4. Retrieval Engine
+5. LLM Context Builder
+---
+ARCHITECTURE DIAGRAM
+Repository
+↓
+Repository Scanner
+↓
+AST Parser
+↓
+Symbol Extractor
+↓
+Graph Builders
+↓
+Unified Knowledge Graph
+↓
+Knowledge Index
+↓
+Retrieval Engine
+↓
+LLM Context Builder
+↓
+CLI Interface
+---
+GRAPH SYSTEM
+The graph system builds multiple graphs representing the codebase.
+Code Structure Graph
+Represents relationships between files, classes, and functions.
+Call Graph
+Represents which functions call other functions.
+Variable Dependency Graph
+Represents read/write relationships between functions and variables.
+Execution Trace Graph
+Represents runtime execution paths captured via tracing.
+Git History Graph
+Represents relationships between commits and code elements.
+Test Coverage Graph
+Represents relationships between tests and executed code.
+---
+KNOWLEDGE INDEX
+The Knowledge Index is a structured metadata index of the codebase.
+It stores:
+function metadata
+class metadata
+file metadata
+variable metadata
+dependency metadata
+The Knowledge Index allows fast queries such as:
+Which functions modify variable X
+Which modules depend on module Y
+Which functions call function Z
+These queries can be answered without calling an LLM.
+---
+RETRIEVAL PIPELINE
+Query
+↓
+Symbol Extraction
+↓
+Knowledge Index Query
+↓
+Graph Traversal
+↓
+Semantic Ranking
+↓
+Context Builder
+↓
+LLM
+---
+TOKEN REDUCTION STRATEGY
+Token usage is reduced by:
+1. Graph-based symbolic retrieval
+2. Knowledge index filtering
+3. Semantic ranking
+4. Minimal context packaging
+---
+EXPECTED TOKEN SAVINGS
+Typical context sizes:
+Naive repo prompt:
+20k tokens
+Vector RAG:
+3k tokens
+GCIE graph retrieval:
+300–800 tokens
+---
+END ARCHITECTURE
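
Reviewer note: the EXPECTED TOKEN SAVINGS figures in ARCHITECTURE.md above can be turned into reduction factors and compared with the "at least 10x" goal stated in AGENT.md. The arithmetic below uses only the numbers given in the file. They are the author's expected sizes, not measurements, and the helper name `reduction_factor` is an assumption of this sketch.

```python
# Context sizes in tokens, as listed under EXPECTED TOKEN SAVINGS.
NAIVE_REPO_PROMPT = 20_000
VECTOR_RAG = 3_000
GCIE_LOW, GCIE_HIGH = 300, 800


def reduction_factor(baseline_tokens, context_tokens):
    """How many times smaller the context is than the baseline."""
    return baseline_tokens / context_tokens


# Worst case for GCIE is its largest context, best case its smallest.
vs_naive = (
    reduction_factor(NAIVE_REPO_PROMPT, GCIE_HIGH),
    reduction_factor(NAIVE_REPO_PROMPT, GCIE_LOW),
)
vs_rag = (
    reduction_factor(VECTOR_RAG, GCIE_HIGH),
    reduction_factor(VECTOR_RAG, GCIE_LOW),
)

print(f"vs naive prompt: {vs_naive[0]:.1f}x to {vs_naive[1]:.1f}x")
print(f"vs vector RAG:   {vs_rag[0]:.2f}x to {vs_rag[1]:.1f}x")
```

On these stated figures the reduction against a naive repository prompt is roughly 25x to 67x, which would clear the 10x goal. Against vector RAG it is roughly 3.75x to 10x.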

package/CLAUDE.md ADDED

@@ -0,0 +1,69 @@
+# Agent Instructions
+Before making any code change, run:
+gcie.cmd context-slices . "<task>" --intent <edit|debug|refactor|explore>
+Use only the returned snippets as working context.
+If the context seems insufficient, add a pin or increase budgets.
+## Tested Usage Playbook (updated 2026-03-25)
+### 1) Command form
+On this machine, use:
+gcie.cmd context-slices . "<task>" --intent <edit|debug|refactor|explore> [--profile recall|low] [--pin <path>] [--include-tests]
+(`gcie` PowerShell shim can be blocked by execution policy; `gcie.cmd` works.)
+### 2) Profiles
+- `--profile recall` (default): higher recall, still strong savings.
+- `--profile low`: aggressive budgets, cheaper but can miss files.
+Profile defaults:
+- recall: stage-a 400, stage-b 800, max-total 1200, pin-budget 300
+- low: stage-a 300, stage-b 600, max-total 800, pin-budget 200
+### 3) Tests only when needed
+Do not pull tests unless you are writing/updating tests or touching risky logic:
+- Add `--include-tests` only when necessary.
+### 4) File pinning for must-have wiring files
+If a required file is still missing (commonly `frontend/src/App.jsx`), pin it directly:
+- `gcie.cmd context-slices . "<task wiring keywords>" --pin frontend/src/App.jsx --pin-budget 300 --intent edit`
+### 5) Token gate
+Default max output is `--max-total 1200` tokens for medium tasks.
+If required files are still missing, the tool can exceed this limit to surface more context.
+### 6) Query strategy
+Use explicit nouns from the code, not abstract issue summaries:
+- Better: "api export endpoint doc_type session_id markdown"
+- Better: "Canvas.test.jsx and Canvas.jsx for no architecture nodes generated"
+- Better: "refinement rename subsystem patch persisted session outputs"
+Include filenames or function/prop names where possible.
+### 7) Known command behavior
+- `gcie.cmd context-slices` works for repo paths and uses path-scoped slices.
+- `gcie.cmd query` / `gcie.cmd debug` can fail on directory path `.` in this environment; avoid them for whole-repo lookup.
+- Reindex after major changes: `gcie.cmd index .`
+### 8) Sufficiency gate (required before edits)
+Context is sufficient only when results include all of:
+- Primary implementation file(s)
+- UI/handler wiring file(s) (often `frontend/src/App.jsx`)
+- At least one relevant test file (only when `--include-tests` is used)
+If any are missing, add pins and/or raise budgets before editing.
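
Reviewer note: the command form in section 1 and the sufficiency gate in section 8 of CLAUDE.md above can be expressed as two small helpers. This is a sketch under stated assumptions, not code from the package. It uses only the flags that appear in the file (`--intent`, `--profile`, `--pin`, `--include-tests`); the function names `build_command` and `is_sufficient` and the category labels `implementation`, `wiring`, and `test` are hypothetical.

```python
def build_command(task, intent, profile="recall", pins=(), include_tests=False):
    """Assemble a context-slices argument list using flags shown in CLAUDE.md."""
    argv = [
        "gcie.cmd", "context-slices", ".", task,
        "--intent", intent,
        "--profile", profile,
    ]
    for path in pins:
        argv += ["--pin", path]  # one --pin per must-have file
    if include_tests:
        argv.append("--include-tests")
    return argv


def is_sufficient(categories_found, include_tests=False):
    """Apply the sufficiency gate: implementation and wiring are always
    required; a test file is required only when tests were requested."""
    required = {"implementation", "wiring"}
    if include_tests:
        required.add("test")
    return required <= set(categories_found)


cmd = build_command(
    "api export endpoint doc_type session_id markdown",
    intent="edit",
    pins=["frontend/src/App.jsx"],
)
print(" ".join(cmd))
print(is_sufficient({"implementation", "wiring"}))
print(is_sufficient({"implementation", "wiring"}, include_tests=True))
```

Building the command as a list rather than a single string keeps the quoted task text intact if it is later passed to `subprocess.run`, and the gate returns a plain boolean so a caller can decide whether to add pins or raise budgets before editing.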