PyPI - gitm-labs - Versions diffs - 0.0.2__tar.gz → 0.0.3__tar.gz - Mend

gitm-labs 0.0.2tar.gz → 0.0.3tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (167)

gitm_labs-0.0.3/.github/workflows/claude-review.yml ADDED

@@ -0,0 +1,98 @@
+name: Claude Code Review
+on:
+  pull_request:
+    types: [opened, synchronize]
+jobs:
+  review:
+    runs-on: ubuntu-latest
+    permissions:
+      contents: read
+      pull-requests: write
+    steps:
+      - uses: actions/checkout@v4
+        with:
+          fetch-depth: 0
+      - name: Get PR diff
+        run: |
+          git diff origin/${{ github.base_ref }}...HEAD \
+            -- '*.py' '*.sh' '*.md' '*.yaml' '*.yml' \
+            > pr_diff.txt
+          echo "Diff size: $(wc -c < pr_diff.txt) bytes"
+      - name: Run Claude Review
+        env:
+          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
+          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+          PR_NUMBER: ${{ github.event.pull_request.number }}
+          REPO: ${{ github.repository }}
+        run: |
+          DIFF_SIZE=$(wc -c < pr_diff.txt)
+          if [ "$DIFF_SIZE" -eq 0 ]; then
+            echo "No reviewable changes."
+            exit 0
+          fi
+          if [ "$DIFF_SIZE" -gt 75000 ]; then
+            echo "Diff too large (${DIFF_SIZE} bytes), truncating to first 75KB..."
+            head -c 75000 pr_diff.txt > pr_diff_trimmed.txt
+            mv pr_diff_trimmed.txt pr_diff.txt
+          fi
+          python3 << 'EOF'
+          import json, os, subprocess, sys, urllib.request
+          with open("pr_diff.txt") as f:
+              diff = f.read()
+          payload = {
+              "model": "claude-sonnet-4-6",
+              "max_tokens": 16384,
+              "messages": [{
+                  "role": "user",
+                  "content": (
+                      "You are reviewing a GPU compute / ML benchmarking repo "
+                      "(Python, CUDA, CuPy, PyTorch, OpenPCDet, bash). "
+                      "Review this diff and give concise feedback. Use these sections "
+                      "(skip any with nothing to report):\n\n"
+                      "**🐛 Bugs** — logic errors, off-by-ones, wrong assumptions\n"
+                      "**🔒 Security** — shell injection, hardcoded secrets, unsafe paths\n"
+                      "**⚡ Performance** — unnecessary CPU↔GPU transfers, missed parallelism\n"
+                      "**📊 Reproducibility** — seed handling, non-determinism risks\n"
+                      "**💡 Suggestions** — missing error handling, untested edge cases\n\n"
+                      f"```diff\n{diff}\n```"
+                  )
+              }]
+          }
+          req = urllib.request.Request(
+              "https://api.anthropic.com/v1/messages",
+              data=json.dumps(payload).encode(),
+              headers={
+                  "x-api-key": os.environ["ANTHROPIC_API_KEY"],
+                  "anthropic-version": "2023-06-01",
+                  "content-type": "application/json",
+                  "anthropic-beta": "output-128k-2025-02-19"
+              }
+          )
+          try:
+              with urllib.request.urlopen(req) as resp:
+                  data = json.load(resp)
+                  comment = data["content"][0]["text"]
+          except urllib.error.HTTPError as e:
+              print(f"API error {e.code}: {e.read().decode()}", file=sys.stderr)
+              sys.exit(1)
+          body = f"## 🤖 Claude Code Review\n\n{comment}"
+          subprocess.run([
+              "gh", "pr", "comment",
+              os.environ["PR_NUMBER"],
+              "--repo", os.environ["REPO"],
+              "--body", body
+          ], check=True)
+          EOF
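
A note on the truncation step in this workflow: `head -c 75000` cuts on a byte boundary, so a diff containing non-ASCII text can end in a split character, and the later `open("pr_diff.txt").read()` may then raise `UnicodeDecodeError` when the default encoding is strict UTF-8. Below is a minimal sketch of doing the byte-budget cut in Python instead. The helper name `truncate_utf8` is hypothetical and not part of the package.

```python
def truncate_utf8(data: bytes, limit: int) -> str:
    """Keep at most `limit` bytes of `data` and decode them as UTF-8.

    errors="ignore" drops a character that the cut split in half. It also
    drops any other invalid bytes, which is an accepted trade-off in this
    sketch because the text is only used as review input.
    """
    if limit < 0:
        raise ValueError("limit must be non-negative")
    return data[:limit].decode("utf-8", errors="ignore")
```

With this approach the workflow would read `pr_diff.txt` in binary mode (`open("pr_diff.txt", "rb")`) and pass the bytes through the helper, rather than truncating in the shell first.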

gitm_labs-0.0.3/.github/workflows/gemini-pr-review.yml ADDED

@@ -0,0 +1,26 @@
+name: Gemini PR Code Review
+on:
+  pull_request:
+    types: [opened, synchronize, reopened]
+  workflow_dispatch: # Allows manual triggering
+jobs:
+  gemini-review:
+    runs-on: ubuntu-latest
+    permissions:
+      contents: read
+      pull-requests: write
+    steps:
+      - name: Checkout code
+        uses: actions/checkout@v4
+        with:
+          fetch-depth: 0 # Fetches full history so git diff can calculate cleanly
+      - name: Run Gemini Code Review
+        uses: sshnaidm/gemini-code-review-action@v2
+        with:
+          gemini-key: ${{ secrets.GEMINI_API_KEY }}
+          model: "gemini-2.5-flash"
+          context-lines: 10

{gitm_labs-0.0.2 → gitm_labs-0.0.3}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: gitm-labs
-Version: 0.0.2
+Version: 0.0.3
 Summary: Autonomous GPU kernel optimizer
 Author: GITM
 License: Proprietary
@@ -32,6 +32,9 @@ Description-Content-Type: text/markdown
 # gitm-labs
+<img width="1062" height="356" alt="image" src="https://github.com/user-attachments/assets/ffee3fc3-c42f-4fe5-9e31-c6a62a245f44" />
 Behavioral compiler + intervention runtime for GPU-intensive workloads. Given a workload and a time budget, gitm-labs autonomously profiles, attributes, and applies kernel-level interventions to hit a target performance improvement — and produces a provenance report showing exactly what it changed and why.
 ## Install
@@ -115,7 +118,7 @@ The monitor checks observed-vs-predicted against three invariants:
 2. **Memory-traffic** — per-kernel bytes-moved must match predicted.
 3. **Stream-concurrency** — predicted-concurrent kernels must overlap.
-See [docs/invariants.md](docs/invariants.md).
+See [docs/invariants.md](https://github.com/GitM-Labs/runtime/blob/main/docs/invariants.md).
 ### Module responsibilities
@@ -133,8 +136,6 @@ See [docs/invariants.md](docs/invariants.md).
 | `gitm.agents` | Autonomous policy — selects interventions, drives rollback |
 | `gitm.scheduler` | 24-hour loop phase orchestration |
-See [docs/ARCHITECTURE.md](docs/ARCHITECTURE.md) for the full design.
 ## Data layout
 Two environment variables control where data lives:

{gitm_labs-0.0.2 → gitm_labs-0.0.3}/README.md RENAMED

@@ -1,5 +1,8 @@
 # gitm-labs
+<img width="1062" height="356" alt="image" src="https://github.com/user-attachments/assets/ffee3fc3-c42f-4fe5-9e31-c6a62a245f44" />
 Behavioral compiler + intervention runtime for GPU-intensive workloads. Given a workload and a time budget, gitm-labs autonomously profiles, attributes, and applies kernel-level interventions to hit a target performance improvement — and produces a provenance report showing exactly what it changed and why.
 ## Install
@@ -83,7 +86,7 @@ The monitor checks observed-vs-predicted against three invariants:
 2. **Memory-traffic** — per-kernel bytes-moved must match predicted.
 3. **Stream-concurrency** — predicted-concurrent kernels must overlap.
-See [docs/invariants.md](docs/invariants.md).
+See [docs/invariants.md](https://github.com/GitM-Labs/runtime/blob/main/docs/invariants.md).
 ### Module responsibilities
@@ -101,8 +104,6 @@ See [docs/invariants.md](docs/invariants.md).
 | `gitm.agents` | Autonomous policy — selects interventions, drives rollback |
 | `gitm.scheduler` | 24-hour loop phase orchestration |
-See [docs/ARCHITECTURE.md](docs/ARCHITECTURE.md) for the full design.
 ## Data layout
 Two environment variables control where data lives:

gitm_labs-0.0.3/benchmarks/edge/datasets_proposal.md ADDED

@@ -0,0 +1,77 @@
+# Additional edge/robotics datasets — proposal
+> Author: Karthik — for review by Adit before adding to spec.
+Current scope: nuScenes v1.0 + KITTI Object (~47k keyframes combined).
+Below are the next candidates ranked by signal value for the Git.M invariants.
+---
+## Tier 1 — High signal, worth adding
+### Waymo Open Dataset (v2.0)
+- **Size:** ~1,000 segments, 200k frames, 5 lidars (1 top + 4 side)
+- **Why it matters:** Much denser point clouds (64-beam top lidar, like KITTI's, but with
+  wider range and higher annotation quality). Significantly harder for the backbone, so GPU
+  active % is likely higher, which tightens the stream-concurrency signal.
+- **Concern:** Requires a data access agreement (Google form, ~1 week turnaround).
+  Also non-commercial only — verify with Adit before committing.
+- **Manifest rows:** ~200k (5x current KITTI). Build time ~20 min.
+- **Blocker:** License approval.
+### Argoverse 2 (Sensor Dataset)
+- **Size:** 1,000 scenarios, ~30k frames, 2x lidar (spinning + forward-facing).
+- **Why it matters:** Two asynchronous lidar streams per frame — interesting for concurrency
+  invariant because merging two streams before voxelization introduces a sync point.
+  Good test of whether stream-concurrency signal carries to multi-lidar setups.
+- **Download:** Open access via S3 (`s3://argoai-argoverse2/...`). No license gate.
+- **Manifest rows:** ~30k. Adds ~30% to current combined manifest.
+- **Blocker:** None. Could add this week.
+---
+## Tier 2 — Useful if we want breadth
+### ONCE (One Million Scenes)
+- **Size:** ~1M frames, single 40-beam lidar.
+- **Why it matters:** Volume — more frames = tighter convergence bounds and better
+  steady-state GPU utilization measurements. Useful for validating that the 2%
+  convergence requirement holds at scale.
+- **Download:** Open access (Chinese hosting, slow downloads). May need mirror.
+- **Blocker:** Download bandwidth on RunPod. Otherwise no license gate.
+### PandaSet
+- **Size:** ~16k frames, dual lidar (mechanical + solid-state).
+- **Why it matters:** Solid-state lidar has a fundamentally different point density
+  pattern (non-uniform angular resolution). Tests whether the voxelization step
+  behaves differently under non-uniform inputs.
+- **Download:** Open access (free sign-up, direct download).
+- **Blocker:** None.
+---
+## Tier 3 — Lower priority
+### SemanticKITTI (KITTI odometry sequences with semantic labels)
+- **Size:** Same lidar as KITTI Object but sequential (not individual frames).
+  22 sequences, ~43k scans.
+- **Why it matters:** Sequential frames are much more cache-friendly — useful
+  as a control condition to isolate the I/O cache locality effect.
+- **Blocker:** None. Builds on top of existing KITTI download.
+### nuScenes-lidarseg
+- **Same data as nuScenes v1.0** but with per-point semantic labels.
+  No new lidar frames; adds annotation load to the post-processing step.
+- **Why it matters:** Tests sync_stall_pct sensitivity to heavier post-processing.
+---
+## Recommendation
+Add **Argoverse 2** first — no license gate, open S3, meaningful new signal
+(multi-lidar sync point). After that, pursue **Waymo** if the license approval
+clears, since it's the most widely used benchmark for 3D detection and having
+it in the manifest would make the benchmark credible to external readers.
+Skip ONCE for now (download pain) and SemanticKITTI/PandaSet unless we need
+more control conditions.

{gitm_labs-0.0.2 → gitm_labs-0.0.3}/benchmarks/kitti/results.md RENAMED

@@ -2,17 +2,21 @@
 ## Baseline measurements
-| Run | Seed | fps | GPU active % | Data stall % | Sync % | CPU % |
-|-----|------|-----|--------------|-------------|--------|-------|
-| Baseline 1 | 42 | TBD | TBD | TBD | TBD | TBD |
-| Baseline 2 | 43 | TBD | TBD | TBD | TBD | TBD |
-| Baseline 3 | 44 | TBD | TBD | TBD | TBD | TBD |
-| Mean | — | TBD | TBD | TBD | TBD | TBD |
-| Stddev | — | TBD | TBD | TBD | TBD | TBD |
-3-seed fps spread: TBD% — within 2%: TBD
-NVML cross-check (mean utilization): TBD%
+| Run | Seed | fps | GPU active % | Data stall % | Sync % | CPU % | Compute headroom % |
+|-----|------|-----|--------------|-------------|--------|-------|-------------------|
+| Baseline 1 | 42 | TBD | TBD | TBD | TBD | TBD | TBD |
+| Baseline 2 | 43 | TBD | TBD | TBD | TBD | TBD | TBD |
+| Baseline 3 | 44 | TBD | TBD | TBD | TBD | TBD | TBD |
+| Baseline 4 | 45 | TBD | TBD | TBD | TBD | TBD | TBD |
+| Baseline 5 | 46 | TBD | TBD | TBD | TBD | TBD | TBD |
+| Baseline 6 | 47 | TBD | TBD | TBD | TBD | TBD | TBD |
+| Mean | -- | TBD | TBD | TBD | TBD | TBD | TBD |
+| Stddev | -- | TBD | TBD | TBD | TBD | TBD | -- |
+6-seed fps spread: TBD% -- within 2%: TBD
+GPU headroom (compute_headroom_pct = 100 - mean NVML util): TBD%
+Memory free at peak: TBD GB
 ## Stream-concurrency verification
@@ -23,7 +27,7 @@ Host-side voxelization overlaps device-side backbone inference: **TBD**
        python harness/smoke_kitti.py --cfg $OPENPCDET_CFG --ckpt $OPENPCDET_CKPT --n-frames 200
      Open the .nsys-rep in Nsight Systems GUI, zoom in on a few consecutive frames,
      look for CPU voxelization bar overlapping GPU backbone bar.
-     Screenshot → benchmarks/kitti/concurrency_timeline.png
+     Screenshot -> benchmarks/kitti/concurrency_timeline.png
 -->
 ![nsys concurrency timeline](concurrency_timeline.png)
@@ -36,11 +40,11 @@ invariant has no signal for this workload and the benchmark needs review.
 ## Notes
-- Machine: TBD
+- Machine: RunPod y4xbh7yws2e4tu-64410cb0
 - GPU: TBD
 - Driver version: TBD
 - CUDA version: TBD
-- OpenPCDet commit: TBD
-- Config sha256: TBD
+- OpenPCDet commit: 233f849829b6ac19afb8af8837a0246890908755
+- Config sha256: 170a9ffe76cfd8509d1044cfbcf1cbd44c5d320fda81bf0089a8d5efaf1c91c8
 - Checkpoint sha256: 4c83fc0fa02575b9b3e9dec676f698e7a70bb5a795e89f91df8a96b916fa19e2
 - Date: TBD
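
The two summary lines in results.md (`6-seed fps spread` and `compute_headroom_pct = 100 - mean NVML util`) can be sketched as two small functions. The headroom formula is taken from the file. The spread definition, (max - min) / mean, is an assumption, since the diff does not state how spread is computed. The function names are illustrative and are not taken from the package.

```python
from statistics import mean


def fps_spread_pct(fps_values):
    """Return the fps spread in percent, assumed here to be (max - min) / mean."""
    values = list(fps_values)
    if not values:
        raise ValueError("need at least one fps value")
    return (max(values) - min(values)) / mean(values) * 100.0


def compute_headroom_pct(util_samples):
    """Return 100 minus the mean GPU utilization (percent) over the samples."""
    samples = list(util_samples)
    if not samples:
        raise ValueError("need at least one utilization sample")
    return 100.0 - mean(samples)


def within_tolerance(fps_values, tolerance_pct=2.0):
    """True when the seeds agree within the given spread tolerance."""
    return fps_spread_pct(fps_values) <= tolerance_pct
```

Under these assumptions, six per-seed fps values would be passed to `within_tolerance` to fill the "within 2%" field, and the NVML utilization samples from the timed window would be passed to `compute_headroom_pct`.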

gitm_labs-0.0.3/benchmarks/kitti/spec.md ADDED

@@ -0,0 +1,109 @@
+# KITTI edge benchmark spec
+## Section 1: Input definition
+Datasets used:
+- KITTI 3D Object Detection: 7,481 training frames, Velodyne lidar + calibration + labels
+- Data location: `$GITM_DATA_ROOT/kitti/training/`
+- Directory layout:
+  - `velodyne/`   -- 000000.bin to 007480.bin  (float32 XYZI point clouds)
+  - `calib/`      -- 000000.txt to 007480.txt  (camera-lidar calibration)
+  - `label_2/`    -- 000000.txt to 007480.txt  (3D bounding box annotations)
+- Manifest: `benchmarks/kitti/manifest.yaml`
+  - Every file sha256-verified. Pass/fail gated by `python harness/verify_manifest.py`.
+  - Generate: `python harness/gen_kitti_manifest.py --root $GITM_DATA_ROOT/kitti/training`
+## Section 2: Work unit
+One frame processed end-to-end through:
+    voxelization -> 3D backbone (PointPillars) -> BEV head -> NMS -> detections
+Model: OpenPCDet PointPillars (KITTI config)
+- OpenPCDet commit: `233f849829b6ac19afb8af8837a0246890908755`
+- Config (pointpillar.yaml) sha256: `170a9ffe76cfd8509d1044cfbcf1cbd44c5d320fda81bf0089a8d5efaf1c91c8`
+- Checkpoint: `pointpillar_7728.pth`
+- Checkpoint sha256: `4c83fc0fa02575b9b3e9dec676f698e7a70bb5a795e89f91df8a96b916fa19e2`
+Stage breakdown per frame:
+1. Load .bin (np.fromfile) -- CPU / data stall
+2. Voxelization + H2D copy -- CPU / data stall
+3. Backbone + BEV head -- GPU active
+4. NMS + box assembly -- CPU / sync stall
+Implementation: `gitm.benchmarks.kitti.WorkUnit`
+## Section 3: Success metric
+- Top-line metric: `frames_per_second` (timed warm window)
+- Warm-up: 100 frames discarded before timing begins
+- Disk pre-warm: all frames read once before GPU warmup (eliminates OS page cache
+  locality as a seed-ordering confound)
+- Timed window: 7,381 frames (all training frames minus warmup)
+- Convergence: 6 seeds (42-47) must agree within 2% fps spread
+- GPU saturation check: GPU active % must be < 85%
+- Auxiliary: `total_detections` per run (regression sentinel, not a target)
+Baseline result (fill after running `bash harness/run_baselines.sh`):
+| Seed | fps | GPU active % | Data stall % | Sync % | CPU % | Compute headroom % |
+|------|-----|--------------|-------------|--------|-------|-------------------|
+| 42   | TBD | TBD          | TBD         | TBD    | TBD   | TBD               |
+| 43   | TBD | TBD          | TBD         | TBD    | TBD   | TBD               |
+| 44   | TBD | TBD          | TBD         | TBD    | TBD   | TBD               |
+| 45   | TBD | TBD          | TBD         | TBD    | TBD   | TBD               |
+| 46   | TBD | TBD          | TBD         | TBD    | TBD   | TBD               |
+| 47   | TBD | TBD          | TBD         | TBD    | TBD   | TBD               |
+| Mean | TBD | TBD          | TBD         | TBD    | TBD   | TBD               |
+| Stddev | TBD | TBD        | TBD         | TBD    | TBD   | --                |
+6-seed fps spread: TBD -- within 2%: TBD
+## Section 4: Expected stall profile
+| Category | What it is | Expected % | Measured % |
+|----------|-----------|------------|------------|
+| Data stall | lidar .bin decode + host-side voxelization + H2D copy | 20-35% | TBD |
+| Sync stall | NMS serialization on CPU | 10-20% | TBD |
+| GPU active | backbone + BEV head forward pass | 50-65% | TBD |
+| CPU overhead | Python dispatch, dataloader | ~5% | TBD |
+**Critical check:** GPU active must be < 85%. If saturated, flag Adit same day
+for 500-frame shard fallback.
+**Stream-concurrency check (nsys):** host-side voxelization of frame N+1 should
+overlap device-side backbone inference on frame N. Capture nsys timeline and
+commit screenshot to `benchmarks/kitti/results.md`. If overlap is absent, the
+stream-concurrency invariant has no signal -- flag Adit immediately.
+## Section 5: GPU headroom (runtime integration)
+Measured via `gitm.optimizer.headroom_kernel_rank.gpu_headroom()` using NVML
+samples collected at 5 Hz during the timed window.
+| Metric | Expected | Measured |
+|--------|----------|---------|
+| Compute headroom (100 - mean util) | >35% | TBD |
+| Memory free at peak | >10 GB | TBD |
+Per-stage spread (p50/p95 latency per stage across all frames):
+| Stage | mean ms | p50 ms | p95 ms | % of frame |
+|-------|---------|--------|--------|------------|
+| load  | TBD | TBD | TBD | TBD |
+| preprocess (voxelize + H2D) | TBD | TBD | TBD | TBD |
+| inference (backbone + BEV + NMS) | TBD | TBD | TBD | TBD |
+| postprocess (D2H) | TBD | TBD | TBD | TBD |
+Stage spread is emitted as `stage_spread` in each baseline JSON and as
+`stage_spread_report.txt` alongside it.
+## Environment
+- Machine: RunPod y4xbh7yws2e4tu-64410cb0 (2 TB persistent /workspace)
+- GPU: TBD
+- Driver: TBD
+- CUDA: TBD
+- OpenPCDet commit: 233f849829b6ac19afb8af8837a0246890908755
+- Date: TBD
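
Section 5 of the spec asks for mean, p50 and p95 latency per stage. Below is a minimal standard-library sketch of that summary. It assumes per-frame latencies are available as a dict of lists in milliseconds. That layout is hypothetical, because the actual `stage_spread` JSON schema is not shown in this diff. It also takes "% of frame" to mean each stage's share of the summed stage means, which is likewise an assumption.

```python
from statistics import mean, quantiles


def stage_spread(latencies_ms):
    """Summarize per-stage latency: mean, p50, p95, and share of frame time."""
    stats = {}
    for stage, samples in latencies_ms.items():
        if len(samples) < 2:
            raise ValueError(f"stage {stage!r} needs at least two samples")
        # n=100 yields 99 cut points; index 49 is p50 and index 94 is p95.
        cuts = quantiles(samples, n=100, method="inclusive")
        stats[stage] = {
            "mean_ms": mean(samples),
            "p50_ms": cuts[49],
            "p95_ms": cuts[94],
        }
    # Share of frame time, computed from the sum of the per-stage means.
    total = sum(s["mean_ms"] for s in stats.values())
    for s in stats.values():
        s["pct_of_frame"] = 100.0 * s["mean_ms"] / total if total else 0.0
    return stats
```

The `inclusive` method treats the samples as the full population, which suits a run that times every frame in the window. The default `exclusive` method would suit a case where the frames are only a sample of a larger run.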

gitm_labs-0.0.3/gitm/benchmarks/edge/__init__.py ADDED

@@ -0,0 +1,10 @@
+"""GITM edge (nuScenes) benchmark — CenterPoint-PointPillar baseline.
+Mirrors gitm.benchmarks.kitti but for the nuScenes CenterPoint-PointPillar
+(dyn / GPU-voxelization) baseline, with multi-sweep (keyframe + 9 sweeps)
+point accumulation sourced from OpenPCDet's NuScenesDataset.
+"""
+from gitm.benchmarks.edge.workunit import NuScenesWorkUnit, WorkUnitResult
+__all__ = ["NuScenesWorkUnit", "WorkUnitResult"]
