PyPI - cr_proc - Versions diffs - 0.2.0__tar.gz - Mend

cr_proc 0.2.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (13)

cr_proc-0.2.0/PKG-INFO +247 -0
cr_proc-0.2.0/README.md +238 -0
cr_proc-0.2.0/pyproject.toml +23 -0
cr_proc-0.2.0/src/cr_proc/__init__.py +7 -0
cr_proc-0.2.0/src/cr_proc/api/build.py +184 -0
cr_proc-0.2.0/src/cr_proc/api/document.py +249 -0
cr_proc-0.2.0/src/cr_proc/api/load.py +165 -0
cr_proc-0.2.0/src/cr_proc/api/output.py +75 -0
cr_proc-0.2.0/src/cr_proc/api/verify.py +672 -0
cr_proc-0.2.0/src/cr_proc/cli.py +556 -0
cr_proc-0.2.0/src/cr_proc/display.py +157 -0
cr_proc-0.2.0/src/cr_proc/playback.py +559 -0
cr_proc-0.2.0/src/cr_proc/timeutil.py +31 -0

cr_proc-0.2.0/PKG-INFO ADDED

@@ -0,0 +1,247 @@
+Metadata-Version: 2.3
+Name: cr_proc
+Version: 0.2.0
+Summary: A tool for processing BYU CS code recording files.
+Author: Ethan Dye
+Author-email: Ethan Dye <mrtops03@gmail.com>
+Requires-Python: >=3.11
+Description-Content-Type: text/markdown
+# `code_recorder_processor`
+`code_recorder_processor` processes `*.recording.jsonl.gz` files produced by
+the current `jetbrains-recorder` and `vscode-recorder` implementations. It
+reconstructs the edited document, compares that reconstruction to a template,
+and reports suspicious activity such as large external pastes, rapid AI-style
+paste bursts, and time-limit violations.
+## Scope
+The processor is designed around the current recorder implementations, not
+around the historical examples in this repository.
+Current schema expectations:
+- Modern edit events use `type: "edit"`.
+- Status events use typed records such as `type: "focusStatus"`.
+- Events include `timestamp`, `document`, `offset`, `oldFragment`, and
+  `newFragment`.
+Compatibility behavior:
+- Older recordings that omit `type` on edit events are still accepted.
+- If a mixed recording contains both modern typed edits and later stale legacy
+  untyped edits, the processor prefers the typed stream.
+- Example recordings in `recordings/` are fixtures, not the schema source of
+  truth.
+## Installation
+For development inside this repository:
+```bash
+uv sync --dev
+```
+For running commands in the repo without a global install, prefer:
+```bash
+uv run cr_proc --help
+```
+To install the CLI globally from a local checkout:
+```bash
+uv tool install .
+```
+After that, the `cr_proc` command is available directly:
+```bash
+cr_proc --help
+```
+If you want the global command to track local source changes while developing:
+```bash
+uv tool install --editable .
+```
+## Quick Start
+The simplest invocation is to pass only recordings. When `--template` is
+omitted, the processor looks for a matching template file next to each
+recording.
+Single recording:
+```bash
+uv run cr_proc path/to/student.recording.jsonl.gz
+```
+Multiple recordings:
+```bash
+uv run cr_proc recordings/*.recording.jsonl.gz
+```
+Explicit template file:
+```bash
+uv run cr_proc student.recording.jsonl.gz --template template.py
+```
+Template directory:
+```bash
+uv run cr_proc recordings/*.recording.jsonl.gz --template templates/
+```
+Write reconstructed output:
+```bash
+uv run cr_proc student.recording.jsonl.gz --write reconstructed.py
+uv run cr_proc recordings/*.recording.jsonl.gz --write output/
+```
+Compare to submitted files:
+```bash
+uv run cr_proc student.recording.jsonl.gz --submitted submitted.py
+uv run cr_proc recordings/*.recording.jsonl.gz --submitted submissions/
+```
+Write JSON results:
+```bash
+uv run cr_proc recordings/*.recording.jsonl.gz --output-json results.json
+```
+Playback mode:
+```bash
+uv run cr_proc student.recording.jsonl.gz --playback
+```
+This opens a windowed viewer. Use the left/right arrow keys to step through
+edits, `Space` to play or pause, and `Home`/`End` to jump to the beginning or
+final state. The viewer is generated as a local HTML page and opened in your
+default browser.
+Select a specific document from a multi-document recording:
+```bash
+uv run cr_proc multi-file.recording.jsonl.gz --document src/main.py
+```
+## CLI Reference
+Core inputs:
+- `inputs`: One or more recording files or glob patterns.
+- `--template PATH`: Optional template file or template directory.
+- `--document NAME`: Optional override for which document inside the recording
+  should be processed. This matches the recorded document path or filename and
+  is not another local file input.
+Outputs:
+- `--write PATH`: Write reconstructed code. In single-file mode this can be a
+  file or a directory. In batch mode it must be a directory.
+- `--output-json PATH`: Write structured JSON results.
+- `--submitted PATH`: Compare reconstructed code to a submitted file or a
+  directory of submitted files.
+Verification and filtering:
+- `--time-limit MINUTES`: Flag recordings whose active editing time exceeds the
+  limit.
+- `--filter-file FILE`: Exclude recordings matching a path, filename, or base
+  filename.
+- `--filter-function-generation`: Suppress suspicious autocomplete findings
+  that are recognized as IDE-generated boilerplate function stubs.
+Playback:
+- `--playback`: Open a browser-based windowed playback viewer.
+- `--playback-speed FLOAT`: Playback speed multiplier.
+- `--playback-start-event N`: Start playback from a later applied-event index.
+Compatibility aliases:
+- Legacy positional-template usage still works.
+- `--template-dir`, `--output-file`, `--output-dir`, `--submitted-file`, and
+  `--submitted-dir` are still accepted as compatibility aliases.
+## Template Resolution
+When the processor needs a template, it resolves it in this order:
+1. `--template <file>` uses that exact file.
+2. `--template <directory>` searches that directory for the best filename or
+   stem match to the recorded document.
+3. If `--template` is omitted, the processor searches the recording's parent
+   directory.
+4. Legacy positional-template mode treats the last positional argument as a
+   template file when it does not look like a recording path.
+`--document` affects this process by telling the processor which recorded
+document to treat as the target before template matching happens. It only
+selects data already present in the recording.
+If no matching template is found, processing still continues by falling back to
+the recording snapshot as the reconstruction seed.
+## Output Behavior
+Normal user-facing output goes to `stderr`:
+- time summaries
+- suspicious-event summaries
+- template mismatch diffs
+- submitted-file comparison summaries
+- warnings
+Reconstructed code is written only when `--write` is used.
+JSON output is written only when `--output-json` is used.
+## Suspicious Activity Detection
+The processor currently reports:
+- large multi-line external pastes
+- rapid clusters of pasted lines within one second as an AI indicator
+- time-limit violations for single recordings and combined batch activity
+These checks are heuristic. They are intended to surface recordings for review,
+not to act as a standalone disciplinary decision engine.
+## Development
+Run tests:
+```bash
+uv run pytest -q
+```
+Run the bundled example recording:
+```bash
+uv run cr_proc recordings/cs111-homework0/cs111-homework0-ISC.recording.jsonl.gz
+```
+## CI and Release
+GitHub Actions uses `uv`, not Poetry.
+- CI installs dependencies with `uv sync --locked --dev`.
+- CI currently runs on Python `3.11` and `3.14`.
+- The publish workflow builds distributions with `uv build`.
+## Repository Fixtures
+The bundled recordings are documented in
+[`recordings/README.md`](recordings/README.md).
+Those files are useful for regression tests and examples, but some were created
+with older recorder versions and intentionally exercise compatibility paths.
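
The schema expectations listed under "Scope" can be illustrated with a small reader. This sketch is not part of the released files. It assumes a recording is gzip-compressed JSON Lines with one event per line, and that a legacy edit is a record without `type` that still carries both fragments. The sample timestamps, the `focused` field, and the helper names `load_events` and `is_edit_event` are hypothetical.

```python
import gzip
import json
import tempfile
from pathlib import Path


def load_events(path: Path) -> list[dict]:
    """Read one JSON object per non-empty line from a gzip-compressed JSONL file."""
    events = []
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                events.append(json.loads(line))
    return events


def is_edit_event(event: dict) -> bool:
    """Accept typed edits, plus untyped records that look like legacy edits."""
    if event.get("type") == "edit":
        return True
    # Assumption: a legacy edit omits "type" but still has both fragments.
    return "type" not in event and "oldFragment" in event and "newFragment" in event


# Hypothetical sample data, invented for illustration only.
sample = [
    {"type": "focusStatus", "timestamp": "2025-01-01T10:00:00Z", "focused": True},
    {"type": "edit", "timestamp": "2025-01-01T10:00:01Z", "document": "main.py",
     "offset": 0, "oldFragment": "", "newFragment": "print('hi')\n"},
    {"timestamp": "2025-01-01T10:00:02Z", "document": "main.py",
     "offset": 0, "oldFragment": "", "newFragment": "# legacy\n"},
]

with tempfile.TemporaryDirectory() as tmp:
    recording = Path(tmp) / "student.recording.jsonl.gz"
    with gzip.open(recording, "wt", encoding="utf-8") as handle:
        for event in sample:
            handle.write(json.dumps(event) + "\n")
    events = load_events(recording)

edits = [event for event in events if is_edit_event(event)]
```

The sketch keeps both typed and untyped edits. The README states that the processor prefers the typed stream when a recording mixes the two, which this example does not attempt to reproduce.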

cr_proc-0.2.0/README.md ADDED

@@ -0,0 +1,238 @@
+# `code_recorder_processor`
+`code_recorder_processor` processes `*.recording.jsonl.gz` files produced by
+the current `jetbrains-recorder` and `vscode-recorder` implementations. It
+reconstructs the edited document, compares that reconstruction to a template,
+and reports suspicious activity such as large external pastes, rapid AI-style
+paste bursts, and time-limit violations.
+## Scope
+The processor is designed around the current recorder implementations, not
+around the historical examples in this repository.
+Current schema expectations:
+- Modern edit events use `type: "edit"`.
+- Status events use typed records such as `type: "focusStatus"`.
+- Events include `timestamp`, `document`, `offset`, `oldFragment`, and
+  `newFragment`.
+Compatibility behavior:
+- Older recordings that omit `type` on edit events are still accepted.
+- If a mixed recording contains both modern typed edits and later stale legacy
+  untyped edits, the processor prefers the typed stream.
+- Example recordings in `recordings/` are fixtures, not the schema source of
+  truth.
+## Installation
+For development inside this repository:
+```bash
+uv sync --dev
+```
+For running commands in the repo without a global install, prefer:
+```bash
+uv run cr_proc --help
+```
+To install the CLI globally from a local checkout:
+```bash
+uv tool install .
+```
+After that, the `cr_proc` command is available directly:
+```bash
+cr_proc --help
+```
+If you want the global command to track local source changes while developing:
+```bash
+uv tool install --editable .
+```
+## Quick Start
+The simplest invocation is to pass only recordings. When `--template` is
+omitted, the processor looks for a matching template file next to each
+recording.
+Single recording:
+```bash
+uv run cr_proc path/to/student.recording.jsonl.gz
+```
+Multiple recordings:
+```bash
+uv run cr_proc recordings/*.recording.jsonl.gz
+```
+Explicit template file:
+```bash
+uv run cr_proc student.recording.jsonl.gz --template template.py
+```
+Template directory:
+```bash
+uv run cr_proc recordings/*.recording.jsonl.gz --template templates/
+```
+Write reconstructed output:
+```bash
+uv run cr_proc student.recording.jsonl.gz --write reconstructed.py
+uv run cr_proc recordings/*.recording.jsonl.gz --write output/
+```
+Compare to submitted files:
+```bash
+uv run cr_proc student.recording.jsonl.gz --submitted submitted.py
+uv run cr_proc recordings/*.recording.jsonl.gz --submitted submissions/
+```
+Write JSON results:
+```bash
+uv run cr_proc recordings/*.recording.jsonl.gz --output-json results.json
+```
+Playback mode:
+```bash
+uv run cr_proc student.recording.jsonl.gz --playback
+```
+This opens a windowed viewer. Use the left/right arrow keys to step through
+edits, `Space` to play or pause, and `Home`/`End` to jump to the beginning or
+final state. The viewer is generated as a local HTML page and opened in your
+default browser.
+Select a specific document from a multi-document recording:
+```bash
+uv run cr_proc multi-file.recording.jsonl.gz --document src/main.py
+```
+## CLI Reference
+Core inputs:
+- `inputs`: One or more recording files or glob patterns.
+- `--template PATH`: Optional template file or template directory.
+- `--document NAME`: Optional override for which document inside the recording
+  should be processed. This matches the recorded document path or filename and
+  is not another local file input.
+Outputs:
+- `--write PATH`: Write reconstructed code. In single-file mode this can be a
+  file or a directory. In batch mode it must be a directory.
+- `--output-json PATH`: Write structured JSON results.
+- `--submitted PATH`: Compare reconstructed code to a submitted file or a
+  directory of submitted files.
+Verification and filtering:
+- `--time-limit MINUTES`: Flag recordings whose active editing time exceeds the
+  limit.
+- `--filter-file FILE`: Exclude recordings matching a path, filename, or base
+  filename.
+- `--filter-function-generation`: Suppress suspicious autocomplete findings
+  that are recognized as IDE-generated boilerplate function stubs.
+Playback:
+- `--playback`: Open a browser-based windowed playback viewer.
+- `--playback-speed FLOAT`: Playback speed multiplier.
+- `--playback-start-event N`: Start playback from a later applied-event index.
+Compatibility aliases:
+- Legacy positional-template usage still works.
+- `--template-dir`, `--output-file`, `--output-dir`, `--submitted-file`, and
+  `--submitted-dir` are still accepted as compatibility aliases.
+## Template Resolution
+When the processor needs a template, it resolves it in this order:
+1. `--template <file>` uses that exact file.
+2. `--template <directory>` searches that directory for the best filename or
+   stem match to the recorded document.
+3. If `--template` is omitted, the processor searches the recording's parent
+   directory.
+4. Legacy positional-template mode treats the last positional argument as a
+   template file when it does not look like a recording path.
+`--document` affects this process by telling the processor which recorded
+document to treat as the target before template matching happens. It only
+selects data already present in the recording.
+If no matching template is found, processing still continues by falling back to
+the recording snapshot as the reconstruction seed.
+## Output Behavior
+Normal user-facing output goes to `stderr`:
+- time summaries
+- suspicious-event summaries
+- template mismatch diffs
+- submitted-file comparison summaries
+- warnings
+Reconstructed code is written only when `--write` is used.
+JSON output is written only when `--output-json` is used.
+## Suspicious Activity Detection
+The processor currently reports:
+- large multi-line external pastes
+- rapid clusters of pasted lines within one second as an AI indicator
+- time-limit violations for single recordings and combined batch activity
+These checks are heuristic. They are intended to surface recordings for review,
+not to act as a standalone disciplinary decision engine.
+## Development
+Run tests:
+```bash
+uv run pytest -q
+```
+Run the bundled example recording:
+```bash
+uv run cr_proc recordings/cs111-homework0/cs111-homework0-ISC.recording.jsonl.gz
+```
+## CI and Release
+GitHub Actions uses `uv`, not Poetry.
+- CI installs dependencies with `uv sync --locked --dev`.
+- CI currently runs on Python `3.11` and `3.14`.
+- The publish workflow builds distributions with `uv build`.
+## Repository Fixtures
+The bundled recordings are documented in
+[`recordings/README.md`](recordings/README.md).
+Those files are useful for regression tests and examples, but some were created
+with older recorder versions and intentionally exercise compatibility paths.
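
The first three steps of the "Template Resolution" order can be sketched with `pathlib`. This is an illustration and not the package's implementation: the function name `resolve_template` is hypothetical, and the rule that an exact filename match beats a stem-only match is an assumption, since the README only says "best filename or stem match". Legacy positional-template mode (step 4) is left out.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def resolve_template(
    recording: Path, document_name: str, template: Path | None = None
) -> Path | None:
    """Return a template path following the documented order, or None."""
    if template is not None and template.is_file():
        return template  # Step 1: an explicit file is used as-is.
    # Step 2 or 3: search the given directory, else the recording's parent.
    search_dir = template if template is not None else recording.parent
    if not search_dir.is_dir():
        return None
    target = Path(document_name)
    candidates = sorted(
        p for p in search_dir.iterdir() if p.is_file() and p != recording
    )
    # Assumption: an exact filename match wins over a stem-only match.
    for candidate in candidates:
        if candidate.name == target.name:
            return candidate
    for candidate in candidates:
        if candidate.stem == target.stem:
            return candidate
    return None


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    recording = root / "student.recording.jsonl.gz"
    recording.touch()
    (root / "main.py").write_text("# starter\n", encoding="utf-8")
    templates = root / "templates"
    templates.mkdir()
    (templates / "main.txt").write_text("# stem match\n", encoding="utf-8")

    beside = resolve_template(recording, "src/main.py")
    from_dir = resolve_template(recording, "src/main.py", templates)
    missing = resolve_template(recording, "src/other.py")

    beside_name = beside.name if beside else None
    from_dir_name = from_dir.name if from_dir else None
```

A `None` result corresponds to the documented fallback, where processing continues with the recording snapshot as the reconstruction seed.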

cr_proc-0.2.0/pyproject.toml ADDED

@@ -0,0 +1,23 @@
+[project]
+name = "cr_proc"
+version = "0.2.0"
+description = "A tool for processing BYU CS code recording files."
+readme = "README.md"
+requires-python = ">=3.11"
+authors = [
+    { name = "Ethan Dye", email = "mrtops03@gmail.com" },
+]
+dependencies = []
+[project.scripts]
+cr_proc = "cr_proc.cli:main"
+[dependency-groups]
+dev = [
+    "mdformat>=1.0.0,<2.0.0",
+    "pytest>=9.0.2,<10.0.0",
+]
+[build-system]
+requires = ["uv_build>=0.11.1,<0.12.0"]
+build-backend = "uv_build"

cr_proc-0.2.0/src/cr_proc/__init__.py ADDED

@@ -0,0 +1,7 @@
+"""Code Recorder Processor - A tool for processing BYU CS code recording files."""
+from importlib.metadata import version, PackageNotFoundError
+try:
+    __version__ = version("cr_proc")
+except PackageNotFoundError:
+    __version__ = "unknown"
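
The fallback in `__init__.py` matters when the source tree is imported without being installed, because `importlib.metadata.version` then raises `PackageNotFoundError`. A minimal sketch of the same pattern as a reusable helper follows. The helper name `installed_version` and the distribution name used in the demo are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution: str, default: str = "unknown") -> str:
    """Return the installed distribution's version, or `default` when absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return default


# Hypothetical distribution name, chosen so that the lookup fails.
fallback = installed_version("surely-not-an-installed-distribution-xyz")
```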

cr_proc-0.2.0/src/cr_proc/api/build.py ADDED

@@ -0,0 +1,184 @@
+"""Replay edit events to reconstruct document state."""
+from __future__ import annotations
+import sys
+from typing import Any
+from ..timeutil import parse_timestamp
+from .document import filter_events_by_document_with_rename_handling
+from .load import filter_edit_events
+def _normalize_newlines(text: str) -> str:
+    """Normalize CRLF to LF for stable replay and diff behavior."""
+    return text.replace("\r\n", "\n")
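+def _ordered_edit_events(events: tuple[dict[str, Any], ...]) -> list[dict[str, Any]]:
+    """Sort timestamped events chronologically; events with a missing or
+    unparseable timestamp follow them in their original order."""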
+    decorated: list[tuple[int, object, dict[str, Any]]] = []
+    for index, event in enumerate(events):
+        timestamp = event.get("timestamp")
+        if timestamp:
+            try:
+                decorated.append((0, parse_timestamp(str(timestamp)), event))
+                continue
+            except ValueError:
+                pass
+        decorated.append((1, index, event))
+    decorated.sort(key=lambda item: (item[0], item[1]))
+    return [event for _, _, event in decorated]
+def _utf16_units_to_index(text: str, units: int) -> int:
+    """Convert an offset counted in UTF-16 code units to a Python string index."""
+    if units <= 0:
+        return 0
+    consumed = 0
+    index = 0
+    for char in text:
+        if consumed >= units:
+            break
+        consumed += 2 if ord(char) > 0xFFFF else 1
+        index += 1
+    return index
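+def _resolve_offset(document: str, old_fragment: str, offset: int, window: int) -> int:
+    """Return where old_fragment occurs: the exact offset if it matches there,
+    otherwise the closest occurrence within `window` characters."""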
+    if old_fragment == "":
+        return max(0, min(offset, len(document)))
+    if 0 <= offset <= len(document) and document[offset : offset + len(old_fragment)] == old_fragment:
+        return offset
+    start = max(0, offset - window)
+    end = min(len(document), offset + window + len(old_fragment))
+    best_match: tuple[int, int] | None = None
+    search_at = start
+    while True:
+        found = document.find(old_fragment, search_at, end)
+        if found == -1:
+            break
+        distance = abs(found - offset)
+        candidate = (distance, found)
+        if best_match is None or candidate < best_match:
+            best_match = candidate
+        search_at = found + 1
+    if best_match is None:
+        raise ValueError(
+            f"Old fragment not found near offset {offset}.\n"
+            f"old={old_fragment!r}\nold fragment length={len(old_fragment)}"
+        )
+    return best_match[1]
+def _apply_edit(
+    document: str,
+    *,
+    old_fragment: str,
+    new_fragment: str,
+    offset: int,
+    window: int,
+    utf16_mode: bool,
+) -> str:
+    text_offset = _utf16_units_to_index(document, offset) if utf16_mode else offset
+    resolved_offset = _resolve_offset(document, old_fragment, text_offset, window)
+    return (
+        document[:resolved_offset]
+        + new_fragment
+        + document[resolved_offset + len(old_fragment) :]
+    )
+def reconstruct_file_from_events(
+    events: tuple[dict[str, Any], ...],
+    template: str,
+    document_path: str | None = None,
+    *,
+    utf16_mode: bool = False,
+    window: int = 200,
+    normalize_newlines: bool = True,
+    skip_unreplayable: bool = True,
+) -> str:
+    """Replay edit events to reconstruct the final document state."""
+    edit_events = filter_edit_events(events)
+    if not edit_events:
+        return _normalize_newlines(template) if normalize_newlines else template
+    target_document = document_path
+    if target_document is None:
+        recorded_docs = {
+            str(event["document"])
+            for event in edit_events
+            if event.get("document") is not None
+        }
+        if len(recorded_docs) == 1:
+            target_document = next(iter(recorded_docs))
+        else:
+            raise ValueError(
+                "Ambiguous target document: provide document_path explicitly."
+            )
+    doc_events = tuple(
+        filter_events_by_document_with_rename_handling(edit_events, target_document)
+    )
+    ordered_events = _ordered_edit_events(doc_events)
+    if not ordered_events:
+        return _normalize_newlines(template) if normalize_newlines else template
+    document = _normalize_newlines(template) if normalize_newlines else template
+    skipped = 0
+    for event_index, event in enumerate(ordered_events):
+        old_fragment = str(event.get("oldFragment", ""))
+        new_fragment = str(event.get("newFragment", ""))
+        if normalize_newlines:
+            old_fragment = _normalize_newlines(old_fragment)
+            new_fragment = _normalize_newlines(new_fragment)
+        try:
+            offset = int(event.get("offset", 0) or 0)
+        except (TypeError, ValueError):
+            offset = 0
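+        if old_fragment == new_fragment and offset == 0:
+            # Identical fragments at offset 0 act as a full-document snapshot:
+            # a non-empty snapshot replaces the current document state.
+            if old_fragment:
+                document = old_fragment
+            continue
+        if old_fragment == new_fragment:
+            # Any other edit with identical fragments is a no-op.
+            continue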
+        try:
+            document = _apply_edit(
+                document,
+                old_fragment=old_fragment,
+                new_fragment=new_fragment,
+                offset=offset,
+                window=window,
+                utf16_mode=utf16_mode,
+            )
+        except ValueError as exc:
+            if not skip_unreplayable:
+                raise
+            skipped += 1
+            print(
+                "Warning: "
+                f"Skipping event #{event_index} "
+                f"(timestamp: {event.get('timestamp', 'unknown')}): "
+                f"{exc} - document offset may have drifted",
+                file=sys.stderr,
+            )
+    if skipped:
+        print(
+            f"Warning: Skipped {skipped} event(s) due to offset drift",
+            file=sys.stderr,
+        )
+    return document
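
The two less obvious steps in `build.py`, converting UTF-16 offsets and tolerating offset drift, can be tried in isolation. The following is a self-contained sketch modeled on the diff above, not an import of the package. The names `utf16_units_to_index`, `nearest_match`, and `apply_edit` and the sample strings are hypothetical; the default window of 200 mirrors the default shown in `reconstruct_file_from_events`.

```python
def utf16_units_to_index(text: str, units: int) -> int:
    """Convert an offset counted in UTF-16 code units to a Python str index."""
    consumed = 0
    for index, char in enumerate(text):
        if consumed >= units:
            return index
        # Characters outside the Basic Multilingual Plane take two code units.
        consumed += 2 if ord(char) > 0xFFFF else 1
    return len(text)


def nearest_match(document: str, fragment: str, offset: int, window: int = 200) -> int:
    """Return the occurrence of `fragment` closest to `offset` within the window."""
    if not fragment:
        return max(0, min(offset, len(document)))  # Pure insertion: clamp only.
    if 0 <= offset <= len(document) and document.startswith(fragment, offset):
        return offset  # The recorded offset is still accurate.
    start = max(0, offset - window)
    end = min(len(document), offset + window + len(fragment))
    best = None
    found = document.find(fragment, start, end)
    while found != -1:
        if best is None or abs(found - offset) < abs(best - offset):
            best = found
        found = document.find(fragment, found + 1, end)
    if best is None:
        raise ValueError(f"fragment {fragment!r} not found near offset {offset}")
    return best


def apply_edit(document: str, old: str, new: str, offset: int, utf16: bool = False) -> str:
    """Replace `old` with `new` at the resolved position."""
    index = utf16_units_to_index(document, offset) if utf16 else offset
    at = nearest_match(document, old, index)
    return document[:at] + new + document[at + len(old):]


# The recorded offset (8) is stale; the fragment actually starts at index 10.
drifted = apply_edit("x = 1\ny = 2\n", "2", "3", 8)

# The emoji occupies two UTF-16 code units, so offset 3 maps to str index 2.
emoji = apply_edit("a\U0001F600b", "b", "c", 3, utf16=True)
```

Searching only within a bounded window keeps a drifted edit from being applied to an unrelated, distant occurrence of the same text. When nothing matches, the `ValueError` lets the caller decide whether to skip the event or stop, which is the choice `skip_unreplayable` exposes in the diff above.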