PyPI - icsf-cli - Versions diffs - 0.1.0__tar.gz - Mend

icsf-cli 0.1.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (17)

icsf_cli-0.1.0/PKG-INFO +1095 -0
icsf_cli-0.1.0/README.md +1059 -0
icsf_cli-0.1.0/backend/__init__.py +7 -0
icsf_cli-0.1.0/backend/cli.py +202 -0
icsf_cli-0.1.0/backend/cli_api.py +409 -0
icsf_cli-0.1.0/backend/config.py +76 -0
icsf_cli-0.1.0/backend/diag_auth.py +16 -0
icsf_cli-0.1.0/backend/logging_config.py +18 -0
icsf_cli-0.1.0/backend/main.py +1644 -0
icsf_cli-0.1.0/icsf_cli.egg-info/PKG-INFO +1095 -0
icsf_cli-0.1.0/icsf_cli.egg-info/SOURCES.txt +15 -0
icsf_cli-0.1.0/icsf_cli.egg-info/dependency_links.txt +1 -0
icsf_cli-0.1.0/icsf_cli.egg-info/entry_points.txt +2 -0
icsf_cli-0.1.0/icsf_cli.egg-info/requires.txt +16 -0
icsf_cli-0.1.0/icsf_cli.egg-info/top_level.txt +1 -0
icsf_cli-0.1.0/pyproject.toml +55 -0
icsf_cli-0.1.0/setup.cfg +4 -0

icsf_cli-0.1.0/PKG-INFO ADDED

@@ -0,0 +1,1095 @@
+Metadata-Version: 2.4
+Name: icsf-cli
+Version: 0.1.0
+Summary: ICSF – Intelligent Code Security & Fixing Platform (CLI)
+Author-email: Ramu Venkatesan <Ramu.Venkatesan@infoservices.com>
+License: MIT
+Project-URL: Homepage, https://github.com/icsf-testing/icsf-poc
+Project-URL: Source, https://github.com/icsf-testing/icsf-poc
+Project-URL: Issues, https://github.com/icsf-testing/icsf-poc/issues
+Keywords: security,cli,github,java,maven,icsf
+Classifier: Programming Language :: Python :: 3
+Classifier: Programming Language :: Python :: 3 :: Only
+Classifier: License :: OSI Approved :: MIT License
+Classifier: Operating System :: OS Independent
+Classifier: Intended Audience :: Developers
+Classifier: Topic :: Security
+Classifier: Topic :: Software Development :: Libraries :: Application Frameworks
+Requires-Python: >=3.10
+Description-Content-Type: text/markdown
+Requires-Dist: fastapi<1.0.0,>=0.104.1
+Requires-Dist: uvicorn[standard]<0.32.0,>=0.24.0
+Requires-Dist: httpx<1,>=0.28.1
+Requires-Dist: pydantic<3.0.0,>=2.10.0
+Requires-Dist: pydantic[email]<3.0.0,>=2.10.0
+Requires-Dist: python-dotenv==1.0.0
+Requires-Dist: pandas<3.0.0,>=2.0.0
+Requires-Dist: python-multipart<1.0.0,>=0.0.6
+Requires-Dist: boto3<2.0.0,>=1.34.0
+Requires-Dist: botocore<2.0.0,>=1.34.0
+Requires-Dist: pyyaml<7.0.0,>=6.0.1
+Requires-Dist: PyGithub<3.0.0,>=2.1.1
+Requires-Dist: GitPython<4.0.0,>=3.1.40
+Requires-Dist: numpy<3.0.0,>=1.26.0
+Requires-Dist: streamlit<2.0.0,>=1.28.0
+Requires-Dist: starlette<1.0.0,>=0.27.0
+# ICSF – Intelligent Code Security & Fixing Platform
+ICSF is a full-stack, AI-powered platform that automates the discovery, analysis, and remediation of security vulnerabilities in Java/Maven codebases. It combines a multi-agent cognitive fixing pipeline with an autonomous self-healing testing framework (**Atlas**) to deliver a closed-loop system: from vulnerability report → verified, PR-ready fix.
+---
+## Table of Contents
+- [Architecture Overview](#-architecture-overview)
+- [End-to-End Application Flow](#-end-to-end-application-flow)
+- [Backend Deep Dive](#-backend-deep-dive)
+  - [FastAPI Main (`main.py`)](#1-fastapi-main-mainpy--1636-lines)
+  - [Configuration & Credentials](#2-configuration--credentials)
+  - [Pydantic Data Models](#3-pydantic-data-models-modelsagent_modelspy)
+  - [Services Layer](#4-services-layer-services)
+  - [Agents Layer (Cognitive Fixing Loop)](#5-agents-layer-agents--cognitive-fixing-loop)
+  - [Atlas Subsystem (Self-Healing Testing)](#6-atlas-subsystem-atlas--self-healing-testing-framework)
+- [Frontend Deep Dive](#-frontend-deep-dive)
+- [RAG (Retrieval-Augmented Generation)](#-rag-retrieval-augmented-generation)
+- [AI / LLM Integration](#-ai--llm-integration)
+- [Input Requirements](#-input-requirements)
+- [Technical Stack](#-technical-stack)
+- [Getting Started](#-getting-started)
+- [Project Structure](#-project-structure)
+- [API Reference](#-api-reference)
+---
+## 🏗️ Architecture Overview
+ICSF follows a layered, modular architecture:
+```
+┌─────────────────────────────────────────────────────────────────────┐
+│                    Frontend (Streamlit – 5676 lines)                │
+│   Premium dark-mode dashboard · Real-time progress · Lineage graph │
+├─────────────────────────────────────────────────────────────────────┤
+│                   Backend (FastAPI – 1636 lines)                    │
+│         REST API · WebSocket/SSE · Request ID middleware            │
+├────────────┬──────────────────────┬─────────────────────────────────┤
+│  Services  │       Agents         │      Atlas Subsystem            │
+│ (14 files) │  (Cognitive Loop)    │  (Self-Healing Testing)         │
+│            │  5 agents + helpers  │  14 sub-packages                │
+├────────────┼──────────────────────┼─────────────────────────────────┤
+│            │   AWS Bedrock (LLM)  │  SQLite RAG Store               │
+│            │  Claude 3.5 Sonnet   │  Titan Embeddings               │
+│            │  Titan Embeddings    │  Cosine Similarity Search       │
+└────────────┴──────────────────────┴─────────────────────────────────┘
+```
+### Key Design Principles
+| Principle | Implementation |
+|---|---|
+| **Single AI Provider** | All LLM/embedding calls route through AWS Bedrock only |
+| **Multi-Agent Pipeline** | 5 specialized agents, each with a single responsibility |
+| **Self-Healing** | Atlas auto-repairs build failures and test regressions |
+| **Cross-Repo Awareness** | Dependency analysis spans across multiple repositories |
+| **Cost Control** | `CostGuardService` enforces per-run budget limits |
+| **Resilience** | Retry with exponential backoff + circuit breakers |
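The Resilience row mentions retry with exponential backoff. The following is a minimal sketch of that idea only; `retry_with_backoff` and its parameters are hypothetical names chosen for illustration and are not taken from the package, and the circuit-breaker half is not shown.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    func: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    factor: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying on any exception with exponentially growing delays.

    Hypothetical helper: the real package may retry only specific errors.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Delay is base_delay * factor^(attempt - 1): 0.5, 1.0, 2.0, ...
            sleep(base_delay * factor ** (attempt - 1))
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter keeps the helper testable without real waiting; production code would usually also add jitter and cap the maximum delay.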
+---
+## 🔄 End-to-End Application Flow
+```mermaid
+flowchart TD
+    %% Entry & Configuration
+    U((User)) -->|1. Setup| UI[Streamlit Dashboard]
+    UI -->|2. Upload CSV| BE[FastAPI Backend]
+    subgraph "Phase I: Discovery & Mapping"
+        direction TB
+        BE -->|3. Fetch Projects| GH[GitHub API]
+        GH -->|4. List Repos| REPO[(Repository Store)]
+        VS[Vulnerability Mapper] -->|5. Map Assets| FT[(Local Workspace)]
+    end
+    BE --> VS
+    subgraph "Phase II: Quality Baseline – Atlas"
+        direction LR
+        BASE[Lightweight Baseline] -->|6. Verify Build| COV[Capture Coverage]
+    end
+    VS --> BASE
+    subgraph "Phase III: Impact Analysis"
+        direction TB
+        DS[Dependency Service] -->|7. Map Blast Radius| MAP[Call Tree & Usage]
+    end
+    COV --> DS
+    subgraph "Phase IV: Cognitive Fixing Loop"
+        direction TB
+        CC[Code Context Agent] --> FS[Fix Strategy Agent]
+        FS --> CF[Code Fixer Agent]
+        CF --> SV[Safety Validator Agent]
+    end
+    MAP -->|8. Start Fix| CC
+    CC <-->|Rich Context| DS
+    CF -->|9. Apply Fix| FT
+    subgraph "Phase V: Self-Healing Pipeline – Atlas"
+        direction TB
+        BM[BuildMechanic] --> TH[TestHealer]
+        TH --> TG[AI Test Generator]
+        TG --> VR[Validation Report]
+    end
+    SV -->|10. Final Verify| BM
+    BM -->|Self-Heal Build| FT
+    FT --> TH
+    subgraph "Phase VI: Delivery & Sync"
+        direction TB
+        VR --> PR[Batch PR Manager]
+        PR -->|11. Create Sync PR| GH
+        PR -->|12. Update UI| UI
+    end
+    %% RAG Knowledge Loop
+    TG -.->|Save Success Patterns| RAG[(SQLite RAG Store)]
+    RAG -.->|Context Enrichment| CC
+```
+### Phase-by-Phase Walkthrough
+| Phase | What Happens | Key Service |
+|---|---|---|
+| **I. Discovery** | Upload CSV → fetch repos from GitHub API → match vulnerability file paths to repositories using intelligent path normalization | `VulnerabilityService`, `GitHubService` |
+| **II. Baseline** | Run `mvn compile test` on the unmodified code to establish Ground Truth coverage & build health | `AtlasService.run_baseline_only()` |
+| **III. Impact Analysis** | Parse Java files, build global dependency graph, find cross-repo callers of the vulnerable method | `DependencyService` |
+| **IV. Cognitive Fixing** | 4-agent pipeline: Analyze context → Plan strategy → Generate fix → Validate safety | `FixOrchestrator` + 4 Agents |
+| **V. Self-Healing** | BuildMechanic auto-repairs compilation; TestHealer fixes broken tests; AI generates new security-targeted tests | Atlas pipeline |
+| **VI. Delivery** | Aggregate all fixes into a single PR with rich markdown body, push to GitHub | `BatchPRService`, `PRManagerService` |
+---
+## 🔧 Backend Deep Dive
+### 1. FastAPI Main (`main.py`) — 1636 lines
+The central orchestration hub. Defines the REST API, middleware, and all endpoint routes.
+#### Startup & Middleware
+| Component | Purpose |
+|---|---|
+| `_startup_validation()` | Smoke-checks AWS + GitHub credentials on boot |
+| `RequestIDMiddleware` | Injects a UUID `X-Request-ID` header into every request for log correlation |
+| CORS middleware | Configurable via `ALLOWED_ORIGINS` env var |
+#### Pydantic Request/Response Models (inline)
+| Model | Fields | Used By |
+|---|---|---|
+| `GitHubRepoRequest` | `username`, `email`, `token` | `POST /api/github/repos` |
+| `Repository` | `id`, `name`, `full_name`, `clone_url`, `language`, etc. | All repo endpoints |
+| `RepositoriesResponse` | `username`, `total_repos`, `repositories[]` | Repo listing |
+| `TestingRequest` | `repo_url`, `repo_path`, `fixed_files`, `create_pr`, `vulnerability`, etc. | Testing pipeline |
+| `Vulnerability` | `file_name`, `line_no` | Vulnerability mapping |
+| `MappedVulnerability` | `repo: Repository`, `vulnerabilities[]` | Mapping results |
+#### API Endpoints
+| Method | Route | Description |
+|---|---|---|
+| `GET` | `/` | Root welcome |
+| `GET` | `/api/health` | Health check for Docker/LB probes |
+| `GET` | `/api/credentials/github` | Retrieve loaded GitHub credentials |
+| `GET` | `/api/credentials/verify` | Debug credential loading |
+| `POST` | `/api/github/repos` | Fetch repos (POST with body) |
+| `GET` | `/api/github/repos` | Fetch repos (GET with query params) |
+| `POST` | `/api/vulnerabilities/map` | Upload CSV + map vulnerabilities to repos |
+| `POST` | `/api/dependencies/analyze` | Analyze dependencies for a single vulnerability |
+| `POST` | `/api/dependencies/batch-analyze` | Batch dependency analysis for multiple vulnerabilities |
+| `POST` | `/api/fix/orchestrate` | Run the full multi-agent fixing pipeline |
+| `POST` | `/api/pr/create` | Create a single PR with fixed code |
+| `POST` | `/api/testing/start` | Start async testing pipeline job |
+| `GET` | `/api/testing/job/{job_id}` | Poll job status |
+| `GET` | `/api/testing/stream/{job_id}` | SSE event stream for real-time progress |
+| `GET` | `/api/testing/runs` | List recent pipeline runs |
+| `POST` | `/api/testing/run` | Legacy sync testing endpoint |
+| `POST` | `/api/fix/batch` | Batch fix multiple vulnerabilities |
+| `POST` | `/api/pr/merge` | Merge PR with conflict resolution |
+| `POST` | `/api/pr/check-mergeability` | Check PR mergeability |
+| `POST` | `/api/pr/create-batch` | Create single aggregated PR for all fixes |
+---
+### 2. Configuration & Credentials
+#### `config.py` — The Config Class
+| Attribute | Source | Default |
+|---|---|---|
+| `AWS_ACCESS_KEY_ID` | `.env` | — |
+| `AWS_SECRET_ACCESS_KEY` | `.env` | — |
+| `AWS_REGION` | `.env` | `us-east-1` |
+| `AWS_SESSION_TOKEN` | `.env` | `None` |
+| `BEDROCK_MODEL_ID` | `.env` | `anthropic.claude-3-5-sonnet-20240620-v1:0` |
+| `BEDROCK_EMBED_MODEL_ID` | `.env` | `amazon.titan-embed-text-v1` |
+**Key Methods:**
+- `get_github_credentials(force_reload=False)` — Reads `credentials.yaml` for GitHub PAT, username, email
+- `validate_bedrock_credentials()` — Returns `(is_valid, error_msg)` tuple
+- `get_bedrock_config()` — Returns dict with `access_key`, `secret_key`, `region`
+#### `credentials.yaml`
+```yaml
+github:
+  token: ghp_xxxxx
+  username: your-username
+  email: your-email@example.com
+```
+---
+### 3. Pydantic Data Models (`models/agent_models.py`)
+These 10 models define the complete data flow through the multi-agent pipeline:
+```mermaid
+flowchart LR
+    VFR[VulnerabilityFixRequest] --> VA[VulnerabilityAnalysis]
+    VA --> CC[CodeContext]
+    CC --> FS[FixStrategy]
+    FS --> CF[CodeFix]
+    CF --> SV[SafetyValidation]
+    SV --> FE[FixExplanation]
+    VFR --> FOR[FixOrchestrationResult]
+    VA --> FOR
+    CC --> FOR
+    FS --> FOR
+    CF --> FOR
+    SV --> FOR
+    FE --> FOR
+```
+| Model | Role | Key Fields |
+|---|---|---|
+| `VulnerabilityFixRequest` | Input to pipeline | `vulnerability_type`, `file_path`, `line_number`, `repo_path` |
+| `VulnerabilityAnalysis` | Agent 1 output | `severity`, `security_impact`, `root_causes`, `fix_category` |
+| `CodeContext` | Agent 2 output | `code_snippet`, `class_name`, `dependent_files_intra/inter`, `data_flow` |
+| `FixStrategy` | Agent 3 output | `fix_approach`, `code_changes_plan`, `files_to_modify_primary/secondary` |
+| `CodeFix` | Agent 4 output | `fixed_code` (Dict[path→code]), `diff`, `change_summary`, `reasoning` |
+| `SafetyValidation` | Agent 5 output | `validation_status`, `correctness_score`, `breaking_changes`, `issues_found` |
+| `FixExplanation` | Agent 6 output | `vulnerability_summary`, `fix_explanation`, `markdown_report` |
+| `FixOrchestrationResult` | Complete result | Aggregates all agent outputs + `overall_status`, `errors` |
+| `VulnerabilitySeverity` | Enum | `CRITICAL`, `HIGH`, `MEDIUM`, `LOW`, `INFO` |
+| `ValidationResult` | Enum | `APPROVED`, `REJECTED`, `NEEDS_REVIEW` |
+---
+### 4. Services Layer (`services/`)
+The services layer contains 14 files providing the core business logic.
+#### 4.1 `bedrock_service.py` — AWS Bedrock LLM Wrapper (439 lines)
+The primary AI gateway used by the **Agents** layer.
+| Method | Description |
+|---|---|
+| `invoke_claude(prompt, model_id, max_tokens, temperature, system_prompt)` | Synchronous Claude invocation via Bedrock `invoke_model` API |
+| `ainvoke_claude(...)` | Async wrapper using `asyncio.to_thread` |
+| `invoke_llama(prompt, ...)` | Llama 3 70B invocation (different payload format) |
+| `ainvoke_llama(...)` | Async Llama wrapper |
+| `embed_text(text, embed_model_id)` | Generate embeddings via Amazon Titan Embed |
+| `invoke_model(model_id, prompt, ...)` | Generic dispatcher — auto-selects Claude/Llama based on model ID |
+| `test_connection()` | Smoke test with simple prompt |
+**Supported Model Constants:**
+| Constant | Model ID |
+|---|---|
+| `CLAUDE_3_5_SONNET` | `anthropic.claude-3-5-sonnet-20240620-v1:0` |
+| `CLAUDE_3_SONNET` | `anthropic.claude-3-sonnet-20240229-v1:0` |
+| `LLAMA_3_70B` | `meta.llama3-70b-instruct-v1:0` |
+#### 4.2 `github_service.py` — GitHub API Client (455 lines)
+| Method | Description |
+|---|---|
+| `verify_token_and_get_user(username)` | Validate PAT + retrieve user info |
+| `get_user_by_username(username)` | Public API user lookup |
+| `get_username_from_email(email)` | Reverse email → username lookup |
+| `get_user_organizations()` | List authenticated user's orgs |
+| `get_organization_repositories(org_name)` | List all repos in an org (paginated) |
+| `get_all_repositories(username, include_private, include_orgs)` | Aggregated repo fetch (user + org repos) |
+| `get_repository_details(owner, repo_name)` | Single repo metadata |
+| `get_repository_file_tree(owner, repo_name, branch)` | Recursive file tree via Git Tree API |
+#### 4.3 `vulnerability_service.py` — CSV Parser & Repo Mapper (833 lines)
+Parses vulnerability reports from Fortify, Checkmarx, SonarQube, Snyk, etc.
+| Method | Description |
+|---|---|
+| `parse_csv_file(file_content, filename)` | Parse CSV into DataFrame; auto-detects column names |
+| `extract_repo_name_from_url(url)` | Handles HTTPS, SSH, `.git` suffix URLs |
+| `normalize_repo_identifier(repo_name, repo_url)` | Lowercase normalization for matching |
+| `normalize_file_path(file_path)` | Cross-platform path normalization |
+| `get_path_variations(file_path)` | Generates multiple path format variations for fuzzy matching |
+| `match_file_in_repo(file_name, repo_files)` | Intelligent file matching with early-exit optimization |
+| `clone_repository_and_get_files(repo_url, clone_dir)` | Git clone + file tree extraction |
+| `map_vulnerabilities_to_repos(df, repositories, repo_files_map, clone_repos)` | Core mapping: CSV rows → repository + file matches |
+#### 4.4 `dependency_service.py` — Java Dependency Graph Engine (2037 lines)
+The largest service file. Performs static analysis of Java source code and Maven POM files.
+| Method | Description |
+|---|---|
+| `parse_java_file(file_path)` | Extracts package, imports, classes, methods, interfaces, method calls via regex/AST parsing |
+| `parse_pom_xml(pom_path)` | Extracts `groupId`, `artifactId`, `version`, dependencies, parent POM |
+| `find_java_files(repo_path)` | Recursive `.java` file discovery |
+| `find_pom_files(repo_path)` | Recursive `pom.xml` file discovery |
+| `build_global_dependency_graph(all_repos, artifact_index)` | Builds both intra-repo and inter-repo dependency edges. Node identity = `(repo_name, file_path)` |
+| `build_intra_repo_dependencies(repo_path)` | File-to-file dependencies within a single repo (import-graph) |
+| `find_maven_artifact_for_file(file_path, repo_path)` | Map a `.java` file to its Maven artifact coordinates |
+| `find_cross_repo_dependent_files(...)` | **Inter-repo blast radius**: finds files in other repos that depend on the vulnerable file |
+| `_build_cross_repo_dependency_chains(...)` | Transitive dependency chain traversal across repos (up to `max_depth=5`) |
+| `build_maven_artifact_index(all_repos)` | Maps `(groupId, artifactId)` → repository metadata |
+#### 4.5 `fix_orchestrator.py` — Multi-Agent Pipeline Controller (564 lines)
+Coordinates the sequential agent execution:
+```
+Agent 2 (Code Context) → Agent 3 (Fix Strategy) → Agent 4 (Code Fix) → Agent 5 (Safety Validator)
+```
+| Method | Description |
+|---|---|
+| `orchestrate_fix(request, stop_at_agent, validate_fix, max_validation_retries, all_repositories_info)` | Main entry point. Runs agents 2→5 sequentially, with optional validation loop |
+| `get_orchestration_status(result)` | Human-readable status summary |
+| `_create_skeleton_analysis(request)` | Generates a default `VulnerabilityAnalysis` from request data |
+Supports `stop_at_agent` for incremental testing (e.g., run only agents 2-3).
+#### 4.6 `batch_fix_service.py` — Batch Vulnerability Processing (753 lines)
+Processes multiple vulnerabilities in sequence or with controlled concurrency.
+| Method | Description |
+|---|---|
+| `_process_vulnerability_fix(current_idx, vuln_idx, vuln, ...)` | Process a single vulnerability with logging |
+| `_run_testing_agent(repo_path, repo_name, phase, fixed_files)` | Run Atlas testing (baseline or validation phase) |
+| `fix_single_vulnerability(vulnerability, repo_path, ...)` | Single fix with full orchestration |
+| `fix_batch_vulnerabilities(vulnerabilities, repo_path, ..., max_concurrent, auto_create_pr, run_tests_after_fix)` | Main batch entry point. Runs baseline → sequential fixes → validation → optional PR |
+Workflow: **Baseline → Fix each vulnerability → Run Atlas validation → Create aggregated PR**
+#### 4.7 `pr_manager_service.py` — Git & PR Operations (1359 lines)
+Complete Git workflow management.
+| Method | Description |
+|---|---|
+| `_run_git_command(repo_path, command, timeout)` | Safe subprocess wrapper for git commands |
+| `create_branch(repo_path, branch_name, base_branch)` | Create and checkout new branch |
+| `commit_changes(repo_path, files_to_commit, commit_message, author_name, author_email)` | Stage + commit with configurable author |
+| `push_branch(repo_path, branch_name, remote)` | Git push to remote |
+| `create_pull_request(owner, repo, title, body, head_branch, base_branch)` | GitHub API PR creation |
+| `_validate_compilation(repo_path, files_modified)` | Best-effort Maven/Gradle compilation check |
+| `_clean_code_before_validation(code)` | Removes markdown artifacts, separator lines from LLM output |
+| `_validate_java_code(code, file_path)` | Basic Java structure validation (package, class, brace matching) |
+| `apply_fixed_code(repo_path, files_modified, fixed_code_map)` | Write fixed code to files with validation |
+| `create_pr_for_fix(repo_path, repo_owner, ..., include_all_repo_changes)` | Complete workflow: apply → branch → commit → push → PR |
+#### 4.8 `batch_pr_service.py` — Aggregated PR Creation (429 lines)
+| Method | Description |
+|---|---|
+| `_extract_files_and_code(fix_result)` | Parse fix result into `files_modified` + `fixed_code_map` |
+| `create_single_pr(fix_result, ...)` | Create PR for one vulnerability |
+| `create_batch_prs(successful_fixes, ...)` | One PR per vulnerability |
+| `create_single_batch_pr(successful_fixes, ..., test_results)` | **Single aggregated PR** combining all fixes + test results |
+#### 4.9 `atlas_service.py` — Testing Pipeline Façade (430 lines)
+Bridges the backend API to the Atlas subsystem.
+| Method | Description |
+|---|---|
+| `_check_required_tools()` | Validates `git`, `mvn`, `java` are on PATH |
+| `run_testing_pipeline(repo_url, create_pr, job_id)` | Full pipeline on remote repo (clone → test → coverage → PR) |
+| `run_testing_pipeline_local(repo_path, repo_url, fixed_files, ...)` | Full pipeline on already-cloned local repo |
+| `run_baseline_only(repo_path, repo_url)` | Lightweight: build + existing tests + coverage — NO AI |
+#### 4.10 `fix_validator_service.py` — Post-Fix Validation (277 lines)
+| Method | Description |
+|---|---|
+| `validate_fix(repo_path, files_modified)` | Run Maven build + tests on the fixed repo. Uses `BuildMechanic` for auto-repair |
+| `get_validation_feedback(validation_result)` | Generate feedback string for retry loop |
+#### 4.11 `job_manager.py` — Async Job & SSE Streaming (86 lines)
+| Method | Description |
+|---|---|
+| `create_job()` | Create UUID-identified job with `asyncio.Queue` |
+| `update_job(job_id, status, message, progress)` | Update status + push to SSE queue |
+| `end_job(job_id)` | Signal `[DONE]` to SSE stream |
+| `stream_job_events(job_id)` | Async generator for `StreamingResponse` |
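A minimal sketch of how a queue-backed job manager could feed an SSE stream, using only the standard library. The method names follow the table above; the event payload shape, the `data:` framing, and the class body are assumptions rather than the package's code.

```python
import asyncio
import json
import uuid
from typing import AsyncIterator


class JobManager:
    """Illustrative job registry: one asyncio.Queue of events per job."""

    def __init__(self) -> None:
        self._queues: dict[str, asyncio.Queue] = {}

    def create_job(self) -> str:
        job_id = str(uuid.uuid4())
        self._queues[job_id] = asyncio.Queue()
        return job_id

    async def update_job(self, job_id: str, status: str, message: str, progress: int) -> None:
        event = {"status": status, "message": message, "progress": progress}
        await self._queues[job_id].put(json.dumps(event))

    async def end_job(self, job_id: str) -> None:
        # Sentinel that tells the stream consumer to stop.
        await self._queues[job_id].put("[DONE]")

    async def stream_job_events(self, job_id: str) -> AsyncIterator[str]:
        queue = self._queues[job_id]
        while True:
            payload = await queue.get()
            yield f"data: {payload}\n\n"  # one SSE frame per event
            if payload == "[DONE]":
                break


async def demo() -> list[str]:
    jobs = JobManager()
    job_id = jobs.create_job()
    await jobs.update_job(job_id, "running", "Baseline started", 10)
    await jobs.end_job(job_id)
    return [frame async for frame in jobs.stream_job_events(job_id)]


if __name__ == "__main__":
    for frame in asyncio.run(demo()):
        print(frame, end="")
```

In a FastAPI route, an async generator like `stream_job_events` would typically be wrapped in a `StreamingResponse` with media type `text/event-stream`.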
+#### 4.12 `run_history.py` — SQLite Run Persistence (115 lines)
+| Method | Description |
+|---|---|
+| `create_run(repo_url, repo_path)` | Insert new run record |
+| `update_run(run_id, status, result_data, error_msg, cost)` | Update with test/coverage/regression/quality gate reports |
+| `get_recent_runs(limit)` | Fetch recent runs with JSON report parsing |
+Schema: `pipeline_runs(run_id, repo_url, repo_path, status, start_time, end_time, total_cost, test_report, coverage_report, regression_report, quality_gate_report, error_message)`
+#### 4.13 `cost_guard.py` — LLM Cost Limiter (50 lines)
+| Method | Description |
+|---|---|
+| `start_run(run_id)` | Initialize per-run cost tracking |
+| `add_cost(run_id, prompt_tokens, completion_tokens, model_id)` | Accumulate cost; returns `False` if budget exceeded |
+| `get_run_cost(run_id)` | Query accumulated cost |
+Pricing: Claude 3.5 Sonnet — $0.003/1K prompt tokens, $0.015/1K completion tokens. Default budget: **$5.00/run**.
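The pricing and budget figures above translate into simple arithmetic. The sketch below assumes a single price table (the `model_id` parameter from the method table is omitted because only one model's pricing is documented) and a class name, `CostGuard`, chosen for illustration.

```python
class CostGuard:
    """Per-run LLM budget tracker (illustrative, not the package's CostGuardService)."""

    # USD per 1K tokens, taken from the Claude 3.5 Sonnet pricing quoted above.
    PROMPT_PRICE_PER_1K = 0.003
    COMPLETION_PRICE_PER_1K = 0.015

    def __init__(self, budget_usd: float = 5.00) -> None:
        self.budget_usd = budget_usd
        self._costs: dict[str, float] = {}

    def start_run(self, run_id: str) -> None:
        self._costs[run_id] = 0.0

    def add_cost(self, run_id: str, prompt_tokens: int, completion_tokens: int) -> bool:
        """Accumulate cost; return False once the run is over budget."""
        cost = (
            prompt_tokens / 1000 * self.PROMPT_PRICE_PER_1K
            + completion_tokens / 1000 * self.COMPLETION_PRICE_PER_1K
        )
        self._costs[run_id] += cost
        return self._costs[run_id] <= self.budget_usd

    def get_run_cost(self, run_id: str) -> float:
        return self._costs[run_id]
```

For example, a call with 100,000 prompt tokens and 20,000 completion tokens costs 100 × $0.003 + 20 × $0.015 = $0.60, well inside the $5.00 default.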
+---
+### 5. Agents Layer (`agents/`) — Cognitive Fixing Loop
+#### 5.1 `code_context_agent.py` — Blast Radius Mapper (642 lines)
+**Purpose:** Understand the full context around a vulnerability — local code, dependent files, data flow.
+| Method | Logic |
+|---|---|
+| `_read_file_with_context(file_path, line_number, context_lines)` | Extract code snippet + surrounding context. Includes class/method even if vulnerability is in imports |
+| `_extract_class_and_method(file_path, line_number, code_content)` | Regex-based Java class/method extraction |
+| `_analyze_data_flow_and_usage(code_snippet, vulnerability_type, ...)` | **LLM call**: Analyze how user input flows from source → vulnerable sink |
+| `_analyze_method_usage_in_dependents(vulnerable_class, ..., dependent_files)` | Check if the fix will break dependent files by analyzing their imports and usage |
+| `_discover_other_repositories(current_repo_path)` | Scan `temp_cloned_repos/` directory for cross-repo analysis |
+| `analyze(request, vulnerability_analysis, all_repositories_info)` | Main entry: reads file, finds dependents (intra + inter-repo), runs LLM data flow analysis |
+#### 5.2 `fix_strategy_agent.py` — Surgical Planner (633 lines)
+**Purpose:** Design a backward-compatible fix plan.
+| Method | Logic |
+|---|---|
+| `_get_available_java_files(repo_path, max_files)` | Inventory of Java files for validation |
+| `_analyze_file_imports_and_usage(repo_path, file_path)` | Static import analysis to find related files |
+| `_build_strategy_prompt(request, analysis, context)` | Constructs a detailed LLM prompt with vulnerability info, dependents, constraint rules |
+| `_parse_strategy_response(response_content)` | JSON extraction from LLM response |
+| `analyze(request, vulnerability_analysis, code_context)` | **LLM call**: Generate fix strategy; categorizes files as Primary (logic change) or Secondary (impacted usage) |
+**Key Decision Logic:**
+- If a method is called by 50+ files → force **backward-compatible fix** (overloaded method, not breaking change)
+- Uses `FrameworkDetector` for framework-specific recommendations (Spring Security, Jakarta, etc.)
+#### 5.3 `code_fix_agent.py` — Multi-File Code Generator (1071 lines)
+**Purpose:** Generate actual fixed Java code across multiple files.
+| Method | Logic |
+|---|---|
+| `_read_file(file_path)` | Read source file |
+| `_find_nearest_pom_xml(repo_path, file_rel_path)` | Walk up directories to find `pom.xml` |
+| `_project_allows_spring_security(repo_path, file_rel_path, original_code)` | Check if Spring Security dependencies exist before generating SS code |
+| `_dependency_constraints_text(repo_path, ...)` | Generate constraint text for LLM prompt |
+| `_postprocess_for_project_dependencies(code, ...)` | **Deterministic safety net**: strip Spring Security constructs if project doesn't include it |
+| `_generate_diff(original, fixed, file_path)` | Unified diff generation |
+| `_clean_generated_code(code, file_path)` | Aggressive cleanup: removes markdown, `<thinking>` blocks, ensures valid Java |
+| `_generate_fixed_code(original_code, request, ...)` | **LLM call**: Generate complete fixed file with prompt-chain reasoning |
+| `fix_code(request, ..., fix_strategy)` | Main entry: fixes ALL files in `files_to_modify_primary`, runs post-processing |
+Uses `ImportManager.add_missing_imports()` and `SyntaxValidator.validate()` for post-processing.
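Unified diff generation of the kind `_generate_diff` describes can be done with the standard library's `difflib`. This is a sketch assuming git-style `a/` and `b/` path prefixes; the package's actual output format is not shown in the README.

```python
import difflib


def generate_diff(original: str, fixed: str, file_path: str) -> str:
    """Return a unified diff between the original and fixed file contents."""
    return "".join(
        difflib.unified_diff(
            original.splitlines(keepends=True),
            fixed.splitlines(keepends=True),
            fromfile=f"a/{file_path}",  # assumed git-style prefixes
            tofile=f"b/{file_path}",
        )
    )
```

Identical inputs produce an empty string, which gives callers a cheap way to detect that the LLM returned the file unchanged.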
+#### 5.4 `safety_validator_agent.py` — Logic Gate (371 lines)
+**Purpose:** Verify that the fix is correct and introduces no regressions.
+| Method | Logic |
+|---|---|
+| `_format_fixed_code(fixed_code)` | Format code dict for display |
+| `_format_dependent_files_for_validation(code_context)` | Format dependent files context |
+| `_build_validation_prompt(request, ..., code_fix)` | Comprehensive validation prompt |
+| `_parse_validation_response(response_content)` | Extract structured validation data |
+| `_normalize_validation_data(parsed)` | Ensure correct types for downstream consumption |
+| `validate(request, ..., code_fix)` | **LLM call**: Returns `APPROVED`/`REJECTED`/`NEEDS_REVIEW` with `correctness_score` (0-1) |
+#### 5.5 `codebase_analysis_agent.py` — Repository Intelligence (594 lines)
+**Purpose:** Deep structural analysis of the codebase (similar to AI coding assistants).
+| Method | Logic |
+|---|---|
+| `analyze_codebase_structure(repo_path, focus_file)` | Full repo analysis with in-memory cache (TTL=300s) |
+| `find_dependent_files(repo_path, target_file, max_depth)` | Find all files depending on target file |
+| `analyze_code_flow(repo_path, file_path, line_number)` | Data flow analysis around a specific line |
+| `_analyze_architecture(repo_path, java_files)` | Detect project layers (Controller, Service, DAO, etc.) |
+| `_build_dependency_graph(repo_path, java_files)` | Build import-based dependency graph |
+| `_detect_patterns(repo_path, java_files)` | Detect design patterns (Singleton, Factory, Builder, Observer) |
+| `_parse_java_file(file_path)` | Extract package, imports, classes (regex-based) |
+#### 5.6 `agent_improvements.py` — Helper Utilities (368 lines)
+Four static helper classes:
+| Class | Purpose |
+|---|---|
+| `ImportManager` | Auto-detect and add missing Java imports (maps common security classes to their import statements) |
+| `SyntaxValidator` | Basic Java syntax validation (brace matching, package declaration, class structure) |
+| `FrameworkDetector` | Detect frameworks in `pom.xml` (Spring Boot, Spring Security, JPA, Jackson, etc.) with framework-specific fix recommendations |
+| `ContextEnhancer` | Extract full method/class definitions from source code for enhanced prompt context |
+---
+### 6. Atlas Subsystem (`atlas/`) — Self-Healing Testing Framework
+Atlas is a comprehensive, autonomous testing and quality assurance pipeline with 14 sub-packages.
+#### 6.1 `orchestrator/run_pipeline.py` — Pipeline Core (1412 lines)
+The brain of Atlas. Orchestrates the entire testing lifecycle.
+| Function | Description |
+|---|---|
+| `run_full_pipeline(repo_url, ...)` | Clone remote repo → full pipeline |
+| `run_full_pipeline_local(repo_path, ...)` | Full pipeline on local repo |
+| `run_baseline_only(repo_path, ...)` | Lightweight: build + test + coverage only |
+| `_run_baseline_phase(repo_path, ...)` | Build (with restricted auto-fix) + existing tests + JaCoCo coverage  |
+| `_run_validation_phase(repo_path, ...)` | Diff-aware test generation, healing, regression detection (630+ lines) |
+| `_run_full_pipeline_core(...)` | Core pipeline: baseline → validation → quality gate → PR |
+| `evaluate_quality_gate(coverage, unit, min_coverage_pct, max_failures)` | Pass/fail decision on release readiness |
+| `calculate_regression_report(state_mgr, ...)` | Compare current vs baseline to detect regressions/improvements |
+| `_calculate_usage(llm)` | Compute estimated cost from Bedrock token metrics |
+| `run_organization_pipeline(org_url, ...)` | Scan entire GitHub org: run pipeline on each Java/Maven repo |
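As an illustration of the pass/fail decision made by `evaluate_quality_gate`, here is a sketch with assumed inputs. The dictionary keys (`line_coverage_pct`, `failures`, `errors`), the default thresholds, and the return shape are all hypothetical; only the parameter names come from the table above.

```python
def evaluate_quality_gate(
    coverage: dict,
    unit: dict,
    min_coverage_pct: float = 70.0,  # assumed default, not from the README
    max_failures: int = 0,           # assumed default, not from the README
) -> dict:
    """Return a pass/fail verdict plus the reasons for any failure."""
    reasons: list[str] = []
    pct = coverage.get("line_coverage_pct", 0.0)
    failed = unit.get("failures", 0) + unit.get("errors", 0)
    if pct < min_coverage_pct:
        reasons.append(f"coverage {pct:.1f}% is below the {min_coverage_pct:.1f}% minimum")
    if failed > max_failures:
        reasons.append(f"{failed} failing tests exceed the limit of {max_failures}")
    return {"passed": not reasons, "reasons": reasons}
```

Returning the reasons alongside the verdict lets a PR body or dashboard explain why a run was blocked rather than just reporting a boolean.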
+#### 6.2 `agents/build_mechanic.py` — Build Failure Auto-Repair (1133 lines)
+The SRE agent. Diagnoses and fixes compilation failures.
+| Method | Description |
+|---|---|
+| `analyze(stdout, stderr)` | Parse Maven build output → `BuildDiagnosis` (root cause, confidence, hints) |
+| `generate_fix(diagnosis, workspace_path, ...)` | **LLM call**: Generate concrete fix (file patches, POM changes, config files) |
+**Domain Expertise:**
+- Spring Security 6 migration patterns (`WebSecurityConfigurerAdapter` → lambda DSL)
+- Deprecated API detection and deletion
+- Missing dependency resolution (maps class names → Maven coordinates)
+- `COMMON_TEST_HINTS` dictionary: 30+ patterns mapping class names to imports
+- Test assertion guidelines (status codes, JSON paths, mock strategies)
+#### 6.3 `agents/test_healer.py` — Test Failure Doctor (151 lines)
+| Method | Description |
+|---|---|
+| `heal(failed_tests, workspace_path)` | Group failures by class → **LLM call**: generate fixed test file → `AgentFix` |
+| `_find_test_file(root, classname)` | Locate `.java` test file by class name |
+Processes top 10 failures, max 3 classes, max 5 failures per class.
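The three limits above could be applied as in this sketch. The order in which the caps are applied and the `classname` key are assumptions; `select_failures` is a hypothetical helper, not a function from `test_healer.py`.

```python
from collections import defaultdict


def select_failures(
    failed_tests: list[dict],
    max_total: int = 10,
    max_classes: int = 3,
    max_per_class: int = 5,
) -> dict[str, list[dict]]:
    """Group failures by test class, then apply the three caps in turn."""
    grouped: dict[str, list[dict]] = defaultdict(list)
    for failure in failed_tests[:max_total]:  # keep only the top N failures
        grouped[failure["classname"]].append(failure)
    selected: dict[str, list[dict]] = {}
    for classname in list(grouped)[:max_classes]:  # first classes seen win
        selected[classname] = grouped[classname][:max_per_class]
    return selected
```

Capping the work this way bounds the number of LLM calls per healing pass, which fits the per-run budget enforced elsewhere in the pipeline.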
+#### 6.4 `rag/store.py` — SQLite Vector RAG Store (210 lines)
+Lightweight persistent RAG store for test pattern learning.
+| Method | Description |
+|---|---|
+| `upsert(id, kind, embedding, text, metadata)` | Insert/update with normalized float32 embedding blob |
+| `query(embedding, top_k, kind, kinds, score_threshold, include_expired)` | Cosine similarity search via dot product |
+| `get_by_id(id)` | Direct ID lookup |
+| `count(kind)` | Count entries by kind |
+| `evict_expired()` | TTL-based cleanup (default 30 days) |
+**Schema:** `rag_items(id TEXT PK, kind TEXT, created_at INT, embedding BLOB, metadata_json TEXT, text TEXT)`
+**Indexes:** `kind`, `created_at`
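To make the schema and the "cosine via dot product" idea concrete, here is a small standard-library sketch of a store with the same table layout. It is not the package's `store.py`: the class name `TinyVectorStore`, the index names, and the brute-force scan are illustrative assumptions.

```python
import json
import math
import sqlite3
import time
from array import array

SCHEMA = """
CREATE TABLE IF NOT EXISTS rag_items(
    id TEXT PRIMARY KEY, kind TEXT, created_at INT,
    embedding BLOB, metadata_json TEXT, text TEXT);
CREATE INDEX IF NOT EXISTS idx_rag_kind ON rag_items(kind);
CREATE INDEX IF NOT EXISTS idx_rag_created ON rag_items(created_at);
"""


def _normalize(vec):
    """Scale to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return array("f", (x / norm for x in vec))


class TinyVectorStore:
    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.executescript(SCHEMA)

    def upsert(self, id, kind, embedding, text, metadata=None):
        blob = _normalize(embedding).tobytes()  # packed single-precision floats
        self.db.execute(
            "INSERT OR REPLACE INTO rag_items VALUES (?, ?, ?, ?, ?, ?)",
            (id, kind, int(time.time()), blob, json.dumps(metadata or {}), text),
        )
        self.db.commit()

    def query(self, embedding, top_k=5, kind=None, score_threshold=0.25):
        q = _normalize(embedding)
        sql, args = "SELECT id, embedding, text FROM rag_items", ()
        if kind is not None:
            sql, args = sql + " WHERE kind = ?", (kind,)
        hits = []
        for item_id, blob, text in self.db.execute(sql, args):
            vec = array("f")
            vec.frombytes(blob)
            score = sum(a * b for a, b in zip(q, vec))  # cosine on unit vectors
            if score >= score_threshold:
                hits.append((score, item_id, text))
        return sorted(hits, reverse=True)[:top_k]

    def evict_expired(self, ttl_days=30):
        cutoff = int(time.time()) - ttl_days * 86400
        cur = self.db.execute("DELETE FROM rag_items WHERE created_at < ?", (cutoff,))
        self.db.commit()
        return cur.rowcount
```

Normalizing at write time means each query pays only for a dot product per row. A linear scan like this is reasonable for a few thousand patterns, but would need an index structure at larger scale.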
+#### 6.5 `llm/bedrock.py` — Atlas Bedrock Client (163 lines)
+Dedicated Bedrock client for the Atlas subsystem.
+| Method | Description |
+|---|---|
+| `embed_text(text)` | Titan Embeddings: `inputText` → embedding vector |
+| `generate_text(system, user, max_tokens)` | Claude Messages API via Bedrock `invoke_model` |
+Tracks `total_input_tokens`, `total_output_tokens`, `total_embedding_tokens` for cost calculation.
+**Security:** Permanent credentials (`AKIA*`) do NOT use session tokens; temporary credentials (`ASIA*`) require them.
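The credential rule above can be expressed as a small helper that decides whether a session token is passed along. This is a sketch under assumptions: `bedrock_client_kwargs` is a hypothetical name, and the returned dict is meant for `boto3.client("bedrock-runtime", **kwargs)`, which is not called here so the example stays dependency-free.

```python
def bedrock_client_kwargs(access_key, secret_key, session_token=None, region="us-east-1"):
    """Build keyword arguments for boto3.client("bedrock-runtime", **kwargs)."""
    kwargs = {
        "aws_access_key_id": access_key,
        "aws_secret_access_key": secret_key,
        "region_name": region,
    }
    if access_key.startswith("ASIA"):
        # Temporary (STS) credentials are unusable without their session token.
        if not session_token:
            raise ValueError("temporary credentials (ASIA*) require a session token")
        kwargs["aws_session_token"] = session_token
    # Permanent keys (AKIA*): a leftover session token is deliberately dropped.
    return kwargs
```

Dropping a stale token for permanent keys avoids a common failure where an old `AWS_SESSION_TOKEN` in the environment is sent alongside a long-lived key.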
+#### 6.6 `generation/java_unit_test_generator.py` — RAG-Enhanced Test Gen (441 lines)
+| Method | Description |
+|---|---|
+| `generate_minimal_tests_for_repo(target_count, preferred_classes, ...)` | Discover main classes → prioritize by scoring → generate tests |
+| `_generate_single_test(src, repo_path, ...)` | **LLM + RAG call**: Check fingerprint → query RAG for similar patterns → generate JUnit 5 test |
+| `_set_fingerprint(class_key, sha, test_path)` | Store source hash in RAG for idempotent re-runs |
+**Scoring heuristic for class prioritization:**
+- +10 if in preferred classes list
+- +5 for service/controller/repository classes
+- +3 for `@RestController`/`@Service`/`@Repository` annotations
+- −2 for test/config/model classes
+Uses `RepoContractRegistry` for constructor/method signature validation in generated tests.
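The scoring heuristic listed above could be sketched as a single function. How the generator actually recognizes "service/controller/repository" or "test/config/model" classes is not stated, so matching on class-name suffixes is an assumption here, and `score_class` is a hypothetical name.

```python
import re

# Assumption: class roles are recognized by name suffix.
ROLE_NAME = re.compile(r"(Service|Controller|Repository)(Impl)?$")
ROLE_ANNOTATION = re.compile(r"@(RestController|Service|Repository)\b")
LOW_VALUE_NAME = re.compile(r"(Test|Config|Configuration|Model)$")


def score_class(class_name, source, preferred_classes=()):
    """Return a priority score; higher-scoring classes get tests first."""
    score = 0
    if class_name in preferred_classes:
        score += 10  # explicitly requested by the caller
    if ROLE_NAME.search(class_name):
        score += 5   # service/controller/repository by name
    if ROLE_ANNOTATION.search(source):
        score += 3   # Spring stereotype annotation present
    if LOW_VALUE_NAME.search(class_name):
        score -= 2   # test/config/model classes are deprioritized
    return score
```

For example, a preferred `OrderService` annotated with `@Service` scores 10 + 5 + 3 = 18, while an unannotated `AppConfig` scores −2.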
+#### 6.7 `build/` — Build Infrastructure (5 files)
+| File | Purpose |
+|---|---|
+| `maven.py` | Maven command runner (`mvn compile`, `mvn test`, etc.) with subprocess management |
+| `jacoco_injector.py` | Inject JaCoCo Maven plugin into `pom.xml` for code coverage |
+| `spring_test_injector.py` | Inject `spring-boot-starter-test` dependency |
+| `failsafe_injector.py` | Inject Maven Failsafe plugin for integration tests |
+| `dependency_governance.py` | Enforce dependency version governance (BOM alignment, conflict resolution) |
+#### 6.8 `core/` — Core Infrastructure (5 files)
+| File | Purpose |
+|---|---|
+| `config.py` | Atlas-specific configuration (data dirs, model IDs, etc.) |
+| `logging.py` | `RunLogger` class for structured pipeline logging |
+| `state.py` | `PipelineStateManager` — manages baseline/validation state persistence |
+| `shell.py` | Safe shell command execution with timeout |
+| `resilience.py` | **Retry with exponential backoff** (configurable attempts, jitter) + **Circuit Breaker** pattern (CLOSED/OPEN/HALF-OPEN states) + **Rate Limiter** |
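The two main patterns named for `resilience.py` can be illustrated in a few lines each. This is a generic sketch of the techniques, not the module's code: the function and class names, defaults, and the injectable `sleep` and `clock` parameters are assumptions made to keep the example testable.

```python
import random
import time


def retry_with_backoff(fn, attempts=3, base_delay=0.5, max_delay=8.0, sleep=time.sleep):
    """Call fn(); on failure wait base_delay * 2**n plus jitter, then retry."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))  # jitter spreads retries


class CircuitBreaker:
    """CLOSED -> OPEN after repeated failures -> HALF-OPEN after a cool-down."""

    def __init__(self, failure_threshold=3, reset_after=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    @property
    def state(self):
        if self.opened_at is None:
            return "CLOSED"
        if self.clock() - self.opened_at >= self.reset_after:
            return "HALF-OPEN"  # cool-down elapsed: allow one trial call
        return "OPEN"

    def call(self, fn):
        if self.state == "OPEN":
            raise RuntimeError("circuit open: call rejected")
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.state == "HALF-OPEN" or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # (re)open the circuit
            raise
        self.failures = 0
        self.opened_at = None  # success closes the circuit
        return result
```

Retries handle brief, transient errors; the breaker stops a pipeline from repeatedly calling a dependency that is clearly down, which matters when each call is a billed LLM request.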
+#### 6.9 `analysis/` — Code Analysis (3 files)
+| File | Purpose |
+|---|---|
+| `java_maven.py` | Java project analysis: `detect_repo_facts()`, `count_existing_tests()`, `find_domain_models()` |
+| `contract_service.py` | `RepoContractRegistry`: extract class constructors, method signatures for test generation validation |
+| `diff_analyzer.py` | Analyze git diffs to identify functional changes for targeted test generation |
+#### 6.10 `reporting/` — Test Reporting (2 files)
+| File | Purpose |
+|---|---|
+| `models.py` | Report dataclasses: `TestReport`, `CoverageReport`, `BreakageReport`, `GenerationReport`, `RegressionReport`, `QualityGateReport`, `UsageReport`, `FullRunReport` |
+| `parsers.py` | Parse Surefire XML reports, JaCoCo CSV coverage data, classify test failures |
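As an illustration of what parsing these two report formats involves, the sketch below reads a Surefire `<testsuite>` document and the `LINE_MISSED` / `LINE_COVERED` columns of a JaCoCo CSV. The function names and the shape of the returned values are assumptions and do not reflect the signatures in `parsers.py`.

```python
import csv
import io
import xml.etree.ElementTree as ET


def parse_surefire(xml_text):
    """Return (total_tests, failures) from one Surefire <testsuite> document."""
    suite = ET.fromstring(xml_text)
    failures = []
    for case in suite.iter("testcase"):
        # Compare with None explicitly: an Element with no children is falsy.
        problem = case.find("failure")
        if problem is None:
            problem = case.find("error")
        if problem is not None:
            failures.append({
                "classname": case.get("classname"),
                "name": case.get("name"),
                "message": problem.get("message", ""),
            })
    return int(suite.get("tests", 0)), failures


def jacoco_line_coverage(csv_text):
    """Aggregate LINE_MISSED / LINE_COVERED over all rows of a JaCoCo CSV."""
    missed = covered = 0
    for row in csv.DictReader(io.StringIO(csv_text)):
        missed += int(row["LINE_MISSED"])
        covered += int(row["LINE_COVERED"])
    total = missed + covered
    return 100.0 * covered / total if total else 0.0
```

The coverage percentage is computed over summed line counts rather than by averaging per-class percentages, so large classes weigh proportionally more.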
+#### 6.11 `gitops/` — GitHub Integration (3 files)
+| File | Purpose |
+|---|---|
+| `github_pr.py` | Create PRs for Atlas-generated tests |
+| `github_issues.py` | Create GitHub issues for persistent test failures |
+| `github_org.py` | List repos in a GitHub organization for org-wide scanning |
+#### 6.12 `repo/` — Repository Management (2 files)
+| File | Purpose |
+|---|---|
+| `cloner.py` | `RepoCloner`: Clone repos with token authentication |
+| `history.py` | Run history tracking for the Atlas pipeline |
+---
+## 🎨 Frontend Deep Dive
+**Technology:** Streamlit (a single 5676-line `app.py` plus utility modules)
+### UI Components
+The frontend is a premium dark-mode dashboard with glassmorphism styling, gradient headers, and micro-animations. Key CSS tokens:
+- Background: `#0f172a` (dark slate), Secondary: `#1e293b`
+- Accent: `linear-gradient(135deg, #3b82f6, #2dd4bf)` (blue → teal)
+- Font: Inter (body), JetBrains Mono (code)
+### Core Functions
+| Function | Lines | Purpose |
+|---|---|---|
+| `main()` | 45 | Entry point: mode selector (Vulnerability Workflow vs Repository Explorer) |
+| `display_vulnerability_workflow(api_url)` | ~600 | Streamlined flow: Upload → Map → Test → Fix → Verify |
+| `display_repositories(data)` | ~2400 | Full repository explorer with vulnerability cards, dep trees, fix controls |
+| `process_active_batch_fix(selected_repo_id, ...)` | ~1040 | Real-time batch fix processing with progress bars |
+| `display_lineage_graph(result, repo_name, vuln_idx)` | ~275 | NetworkX-based dependency graph visualization |
+| `fetch_repositories(api_base_url, ...)` | 30 | Call backend to fetch GitHub repos |
+| `map_vulnerabilities(api_url, repositories_data, csv_file)` | 28 | Upload CSV and map vulnerabilities |
+| `run_testing_agent(api_url, repo_url)` | 70 | SSE streaming of Atlas pipeline progress |
+| `batch_fix_vulnerabilities(api_url, vulnerabilities, ...)` | ~210 | Call batch fix endpoint with progress callbacks |
+| `display_lineage_graph._extract_paths(items)` | 10 | Extract file paths from dependent files list |
+| `display_setup_progress(current_step)` | ~160 | Animated 3-step progress tracker (Upload → Fetch → Map) |
+| `display_run_history(api_url)` | 70 | Fetch and display pipeline run history table |
+### Frontend Utility Modules
+| File | Purpose |
+|---|---|
+| `src/vulnerability_ui.py` (52KB) | Advanced vulnerability display: cards, severity badges, fix result rendering |
+| `src/lineage.py` (10KB) | Lineage graph data transformations |
+| `utils/atlas_report_comprehensive.py` (21KB) | Comprehensive Atlas report rendering |
+| `utils/integrate_render.py` (4KB) | Report integration helpers |
+---
+## 🧠 RAG (Retrieval-Augmented Generation)
+ICSF uses a custom RAG implementation for test pattern learning:
+### Architecture
+```
+                    ┌──────────────────┐
+                    │   Titan Embed    │
+                    │   (Bedrock)      │
+                    └────────┬─────────┘
+                             │ embedding vector
+                    ┌────────▼─────────┐
+                    │  SqliteVectorRag │
+                    │     Store        │
+                    │  (cosine search) │
+                    └────────┬─────────┘
+                             │ similar patterns
+                    ┌────────▼─────────┐
+                    │  Test Generator  │
+                    │  (LLM prompt)    │
+                    └──────────────────┘
+```
+### How RAG is Used
+1. **Fingerprint Check**: Before generating a test, hash the source file → query RAG for existing fingerprint → skip if unchanged
+2. **Pattern Retrieval**: Query RAG store for similar test patterns (`kind=test_pattern`) with cosine similarity ≥ 0.25
+3. **Context Injection**: Retrieved patterns are injected into the LLM prompt as examples
+4. **Pattern Storage**: After successful test generation, store the pattern in RAG for future use
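The fingerprint check in step 1 is what makes re-runs idempotent. A minimal sketch follows, assuming a SHA-256 hash of the file contents and using a plain dict as a stand-in for the RAG store lookup. The hash algorithm and both function names are assumptions.

```python
import hashlib


def source_fingerprint(source_text):
    """SHA-256 of the source contents (the hash choice is an assumption)."""
    return hashlib.sha256(source_text.encode("utf-8")).hexdigest()


def needs_regeneration(class_key, source_text, fingerprints):
    """True when the class has no stored fingerprint or its source changed.

    `fingerprints` is a dict standing in for the RAG store lookup.
    """
    return fingerprints.get(class_key) != source_fingerprint(source_text)


def record_fingerprint(class_key, source_text, fingerprints):
    """Store the fingerprint only after a test was generated successfully."""
    fingerprints[class_key] = source_fingerprint(source_text)
```

Recording the fingerprint after generation succeeds, rather than before, means a failed run is retried on the next invocation instead of being skipped.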
+### RAG Store Configuration
+| Setting | Value |
+|---|---|
+| **Database** | SQLite (`data/atlas_rag.db`) |
+| **Embedding Model** | Amazon Titan Embed Text v1 |
+| **Embedding Dimension** | 1536 (float32) |
+| **Similarity Metric** | Cosine (via dot product on normalized vectors) |
+| **TTL** | 30 days (auto-eviction of stale entries) |
+| **Score Threshold** | 0.25 minimum cosine similarity |
+---
+## 🤖 AI / LLM Integration
+### Models Used
+| Model | Use Case | Provider |
+|---|---|---|
+| **Claude 3.5 Sonnet** | All reasoning: code analysis, fix generation, strategy planning, safety validation, build repair, test healing | AWS Bedrock |
+| **Amazon Titan Embed Text v1** | Text embeddings for RAG store | AWS Bedrock |
+| **Llama 3 70B** (optional) | Alternative generation model | AWS Bedrock |
+### LLM Call Sites
+| Component | # of LLM Calls | Purpose |
+|---|---|---|
+| `CodeContextAgent` | 1 | Data flow analysis |
+| `FixStrategyAgent` | 1 | Fix strategy planning |
+| `CodeFixAgent` | 1 per file | Code generation |
+| `SafetyValidatorAgent` | 1 | Fix validation |
+| `BuildMechanic` | 1–3 per build failure | Build error diagnosis + fix |
+| `TestHealer` | 1 per test class | Test repair |
+| `JavaUnitTestGenerator` | 1 per source class | Test generation |
+| Total per vulnerability | ~6–12 | Depending on file count and failure iterations |
+### Cost Management
+- `CostGuardService` tracks cost per run with **$5.00 default budget**
+- Pricing model: Claude 3.5 Sonnet @ $0.003/1K input, $0.015/1K output
+- `_calculate_usage()` in the pipeline reports total tokens + estimated cost
+- `BedrockClient` tracks `total_input_tokens`, `total_output_tokens`, `total_embedding_tokens`
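The cost estimate follows directly from the per-1K-token prices and budget listed above. The sketch below works through that arithmetic. The function names are hypothetical, and embedding tokens are left out because no embedding price is stated here.

```python
INPUT_PRICE_PER_1K = 0.003   # USD per 1K input tokens (Claude 3.5 Sonnet, as listed above)
OUTPUT_PRICE_PER_1K = 0.015  # USD per 1K output tokens
DEFAULT_BUDGET_USD = 5.00    # default per-run budget


def estimate_cost(input_tokens, output_tokens):
    """Estimated USD cost of a run from its token totals."""
    return (input_tokens / 1000) * INPUT_PRICE_PER_1K + (output_tokens / 1000) * OUTPUT_PRICE_PER_1K


def within_budget(input_tokens, output_tokens, budget_usd=DEFAULT_BUDGET_USD):
    """True while the estimated cost has not exceeded the budget."""
    return estimate_cost(input_tokens, output_tokens) <= budget_usd
```

For example, 200,000 input tokens and 40,000 output tokens cost roughly 200 × $0.003 + 40 × $0.015 = $1.20, well inside the default budget. Note that output tokens are five times as expensive as input tokens.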
+---
+## 📥 Input Requirements
+### 1. Security Vulnerability Report (CSV)
+Supported scanners: **Fortify**, **Checkmarx**, **SonarQube**, **Snyk**
+| Required Column | Example |
+|---|---|
+| `vulnerability_type` or `category` | Cross-Site Scripting |
+| `file_name` or `file_path` | `src/main/java/com/example/Controller.java` |
+| `line_no` or `line_number` | `42` |
+| `severity` | Critical / High / Medium / Low |
+| `description` | User input is rendered without encoding |
+| `recommendation` | Use OWASP encoder for output encoding |
+| `repo_name` or `link` | `my-app` or `https://github.com/org/my-app` |
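Because each required field accepts alternative column names, a loader has to map whichever header is present onto one canonical name. The sketch below shows one way to do that with the standard `csv` module. The canonical field names and `normalize_report` are assumptions and do not describe `vulnerability_service.py`.

```python
import csv
import io

# Canonical field -> accepted column names, taken from the table above.
ALIASES = {
    "vulnerability_type": ("vulnerability_type", "category"),
    "file_path": ("file_name", "file_path"),
    "line_number": ("line_no", "line_number"),
    "severity": ("severity",),
    "description": ("description",),
    "recommendation": ("recommendation",),
    "repo": ("repo_name", "link"),
}


def normalize_report(csv_text):
    """Parse a scanner CSV into rows keyed by canonical field names."""
    reader = csv.DictReader(io.StringIO(csv_text))
    headers = set(reader.fieldnames or [])
    missing = [f for f, names in ALIASES.items() if not headers.intersection(names)]
    if missing:
        raise ValueError(f"missing required columns: {', '.join(missing)}")
    rows = []
    for raw in reader:
        # Take the first accepted column name that the file actually contains.
        row = {f: next(raw[n] for n in names if n in raw) for f, names in ALIASES.items()}
        row["line_number"] = int(row["line_number"])
        rows.append(row)
    return rows
```

Validating the header once, before reading any rows, produces a single clear error for a malformed export instead of a failure partway through the file.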
+### 2. Version Control Credentials
+- **GitHub PAT**: Requires `repo` and `read:user` scopes
+- Stored in `backend/credentials.yaml`
+### 3. AI Model Access (AWS Bedrock)
+- **AWS credentials**: `AWS_ACCESS_KEY_ID` + `AWS_SECRET_ACCESS_KEY` in `.env`
+- **Region**: `us-east-1` (default) or any Bedrock-enabled region
+- **Model access**: Must have Claude 3.5 Sonnet + Titan Embeddings enabled in your AWS account
+### 4. Build Environment
+- **Java JDK 17+** on PATH
+- **Maven** on PATH
+- **Git** on PATH
+---
+## 🛠️ Technical Stack
+| Layer | Technology | Version |
+|---|---|---|
+| **Language** | Python | 3.10+ |
+| **Backend Framework** | FastAPI | ≥0.104 |
+| **Frontend Framework** | Streamlit | ≥1.28 |
+| **LLM Provider** | AWS Bedrock (Boto3) | ≥1.34 |
+| **Embedding Model** | Amazon Titan Embed Text v1 | — |
+| **Reasoning Model** | Claude 3.5 Sonnet | — |
+| **Database** | SQLite | (stdlib) |
+| **HTTP Client** | httpx | ≥0.28 |
+| **Data Processing** | pandas | ≥2.0 |
+| **Version Control** | GitPython + GitHub API | ≥3.1 (GitPython) |
+| **Graph Analysis** | NetworkX | ≥3.0 |
+| **Validation** | Pydantic | ≥2.10 |
+| **Containerization** | Docker Compose | file format 3.8 |
+| **Build Tools** | Maven, JDK 17+ | — |
+---
+## 🚀 Getting Started
+### Prerequisites
+- **Git**, **Java JDK 17+**, and **Maven** installed and on PATH
+- **Python 3.10+**
+- **AWS credentials** with Bedrock access (Claude 3.5 Sonnet + Titan Embeddings enabled)
+- **GitHub PAT** with `repo` and `read:user` scopes
+### Environment Setup
+1. **Create `.env`** in `backend/`:
+```env
+AWS_ACCESS_KEY_ID=your_key
+AWS_SECRET_ACCESS_KEY=your_secret
+AWS_REGION=us-east-1
+BEDROCK_MODEL_ID=anthropic.claude-3-5-sonnet-20240620-v1:0
+BEDROCK_EMBED_MODEL_ID=amazon.titan-embed-text-v1
+```
+2. **Create `credentials.yaml`** in `backend/`:
+```yaml
+github:
+  token: ghp_your_personal_access_token
+  username: your-github-username
+  email: your-email@example.com
+```
+### Run with Docker (Recommended)
+```bash
+docker-compose up --build
+```
+- **Backend**: http://localhost:8000
+- **Frontend**: http://localhost:8501
+- The backend has a 4 GB memory limit; the frontend has a 1 GB limit
+- Health checks are configured for both services
+### Manual Installation
+**Backend:**
+```bash
+cd backend
+python -m venv venv
+source venv/bin/activate  # or venv\Scripts\activate on Windows
+pip install -r requirements.txt
+uvicorn main:app --reload --host 0.0.0.0 --port 8000
+```
+**Frontend:**
+```bash
+cd frontend
+python -m venv venv
+source venv/bin/activate  # or venv\Scripts\activate on Windows
+pip install -r requirements.txt
+streamlit run app.py --server.port 8501
+```
+---
+## 📂 Project Structure
+```
+ICSF/
+├── backend/
+│   ├── main.py                          # FastAPI entrypoint (1636 lines, 20+ endpoints)
+│   ├── config.py                        # Config class (AWS, Bedrock, GitHub credentials)
+│   ├── credentials.yaml                 # GitHub PAT + user info
+│   ├── logging_config.py               # Global logging configuration
+│   ├── .env                             # AWS credentials (not committed)
+│   │
+│   ├── models/
+│   │   └── agent_models.py              # 10 Pydantic models for pipeline data flow
+│   │
+│   ├── services/                        # 14 service files
+│   │   ├── bedrock_service.py           # AWS Bedrock LLM wrapper (Claude, Llama, Titan)
+│   │   ├── github_service.py            # GitHub API client (repos, orgs, file trees)
+│   │   ├── vulnerability_service.py     # CSV parsing & repo mapping (833 lines)
+│   │   ├── dependency_service.py        # Java dependency graph engine (2037 lines)
+│   │   ├── fix_orchestrator.py          # Multi-agent pipeline controller
+│   │   ├── batch_fix_service.py         # Batch vulnerability processing
+│   │   ├── pr_manager_service.py        # Git operations & PR creation (1359 lines)
+│   │   ├── batch_pr_service.py          # Aggregated PR creation
+│   │   ├── atlas_service.py             # Testing pipeline façade
+│   │   ├── fix_validator_service.py     # Post-fix build/test validation
+│   │   ├── job_manager.py               # Async job & SSE streaming
+│   │   ├── run_history.py               # SQLite run persistence
+│   │   └── cost_guard.py                # LLM cost limiter ($5/run default)
+│   │
+│   ├── agents/                          # 7 agent files (Cognitive Fixing Loop)
+│   │   ├── code_context_agent.py        # Blast radius mapper (642 lines)
+│   │   ├── fix_strategy_agent.py        # Surgical planner (633 lines)
+│   │   ├── code_fix_agent.py            # Multi-file code generator (1071 lines)
+│   │   ├── safety_validator_agent.py    # Logic gate validator (371 lines)
+│   │   ├── codebase_analysis_agent.py   # Repository intelligence (594 lines)
+│   │   └── agent_improvements.py        # Helpers: ImportManager, SyntaxValidator, etc.
+│   │
+│   ├── atlas/                           # Self-Healing Testing Framework
+│   │   ├── orchestrator/
+│   │   │   └── run_pipeline.py          # Pipeline core (1412 lines)
+│   │   ├── agents/
+│   │   │   ├── build_mechanic.py        # Build failure auto-repair (1133 lines)
+│   │   │   ├── test_healer.py           # Test failure doctor (151 lines)
+│   │   │   └── models.py               # Agent data models
+│   │   ├── rag/
+│   │   │   └── store.py                 # SQLite vector RAG store (210 lines)
+│   │   ├── llm/
+│   │   │   └── bedrock.py               # Atlas Bedrock client (163 lines)
+│   │   ├── generation/
+│   │   │   └── java_unit_test_generator.py  # RAG-enhanced test gen (441 lines)
+│   │   ├── build/
+│   │   │   ├── maven.py                 # Maven command runner
+│   │   │   ├── jacoco_injector.py       # JaCoCo coverage plugin injection
+│   │   │   ├── spring_test_injector.py  # Spring test dependency injection
+│   │   │   ├── failsafe_injector.py     # Failsafe plugin injection
+│   │   │   └── dependency_governance.py # Dependency version governance
+│   │   ├── core/
+│   │   │   ├── config.py                # Atlas configuration
+│   │   │   ├── logging.py              # RunLogger
+│   │   │   ├── state.py                # Pipeline state manager
+│   │   │   ├── shell.py                # Safe shell execution
+│   │   │   └── resilience.py           # Retry, circuit breaker, rate limiter
+│   │   ├── analysis/
+│   │   │   ├── java_maven.py           # Java project fact detection
+│   │   │   ├── contract_service.py     # Constructor/method signature registry
+│   │   │   └── diff_analyzer.py        # Git diff → functional change detection
+│   │   ├── reporting/
+│   │   │   ├── models.py               # Report dataclasses
+│   │   │   └── parsers.py              # Surefire XML & JaCoCo CSV parsers
+│   │   ├── gitops/
+│   │   │   ├── github_pr.py            # PR creation for generated tests
+│   │   │   ├── github_issues.py        # Issue creation for failures
+│   │   │   └── github_org.py           # Organization repo listing
+│   │   └── repo/
+│   │       ├── cloner.py               # Repository cloning
+│   │       └── history.py              # Run history tracking
+│   │
+│   ├── scripts/                         # Utility & test scripts
+│   │   ├── test_bedrock_connection.py
+│   │   ├── test_cross_repo_dependencies.py
+│   │   ├── test_dependency_analysis.py
+│   │   ├── test_orchestrator.py
+│   │   ├── analyze_all_matched_files.py
+│   │   └── visualize_dependency_mapping.py
+│   │
+│   └── data/                            # SQLite databases & logs
+│       ├── runs.db                      # Pipeline run history
+│       └── atlas_rag.db                 # RAG vector store
+│
+├── frontend/
+│   ├── app.py                           # Streamlit UI (5676 lines)
+│   ├── src/
+│   │   ├── vulnerability_ui.py          # Vulnerability display components
+│   │   └── lineage.py                   # Lineage graph data transforms
+│   ├── utils/
+│   │   ├── atlas_report_comprehensive.py # Atlas report rendering
+│   │   └── integrate_render.py          # Report integration helpers
+│   ├── requirements.txt
+│   └── Dockerfile
+│
+├── docker-compose.yml                   # Multi-container setup
+├── start_frontend.bat                   # Windows frontend launcher
+└── start_frontend.sh                    # Linux/Mac frontend launcher
+```
+---
+## 📡 API Reference
+### Health & Credentials
+| Endpoint | Method | Description |
+|---|---|---|
+| `/api/health` | GET | Health check (Docker/LB probes) |
+| `/api/credentials/github` | GET | Retrieve loaded GitHub credentials |
+| `/api/credentials/verify` | GET | Debug credential loading |
+### Repository Management
+| Endpoint | Method | Description |
+|---|---|---|
+| `/api/github/repos` | POST/GET | Fetch GitHub repositories |
+### Vulnerability Management
+| Endpoint | Method | Description |
+|---|---|---|
+| `/api/vulnerabilities/map` | POST | Upload CSV + map vulnerabilities |
+| `/api/dependencies/analyze` | POST | Single vulnerability dependency analysis |
+| `/api/dependencies/batch-analyze` | POST | Batch dependency analysis |
+### Fix Operations
+| Endpoint | Method | Description |
+|---|---|---|
+| `/api/fix/orchestrate` | POST | Full multi-agent fix pipeline |
+| `/api/fix/batch` | POST | Batch fix multiple vulnerabilities |
+### Testing Pipeline
+| Endpoint | Method | Description |
+|---|---|---|
+| `/api/testing/start` | POST | Start async testing job |
+| `/api/testing/job/{job_id}` | GET | Poll job status |
+| `/api/testing/stream/{job_id}` | GET | SSE event stream |
+| `/api/testing/runs` | GET | Pipeline run history |
+| `/api/testing/run` | POST | Legacy sync testing |
+### Pull Request Management
+| Endpoint | Method | Description |
+|---|---|---|
+| `/api/pr/create` | POST | Create single PR |
+| `/api/pr/create-batch` | POST | Create aggregated PR |
+| `/api/pr/merge` | POST | Merge with conflict resolution |
+| `/api/pr/check-mergeability` | POST | Check PR mergeability |
+---
+*ICSF — Making code security intelligent, automated, and reliable.*