cr-proc 0.1.10.tar.gz → 0.1.12.tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: cr_proc
- Version: 0.1.10
+ Version: 0.1.12
  Summary: A tool for processing BYU CS code recording files.
  Author: Ethan Dye
  Author-email: mrtops03@gmail.com
@@ -28,7 +28,8 @@ poetry install

  ## Usage

- The processor can be run using the `cr_proc` command with recording file(s) and a template:
+ The processor can be run using the `cr_proc` command with recording file(s) and
+ a template:

  ```bash
  poetry run cr_proc <path-to-jsonl-file> <path-to-template-file>
@@ -36,7 +37,8 @@ poetry run cr_proc <path-to-jsonl-file> <path-to-template-file>

  ### Batch Processing

- You can process multiple recording files at once (e.g., for different students' submissions):
+ You can process multiple recording files at once (e.g., for different students'
+ submissions):

  ```bash
  # Process multiple files
@@ -47,9 +49,11 @@ poetry run cr_proc recordings/*.jsonl.gz template.py
  ```

  When processing multiple files:
+
  - Each recording is processed independently (for different students/documents)
  - Time calculations and verification are done separately for each file
- - A combined time report is shown at the end summarizing total editing time across all recordings
+ - A combined time report is shown at the end summarizing total editing time
+ across all recordings
  - Results can be output to individual files using `--output-dir`

  ### Arguments
@@ -61,24 +65,34 @@ When processing multiple files:

  ### Options

- - `-t, --time-limit MINUTES`: (Optional) Maximum allowed time in minutes between the
- first and last edit in the recording. Applied individually to each recording file and
- also to the combined total in batch mode. If the elapsed time exceeds this limit, the
- recording is flagged as suspicious.
- - `-d, --document DOCUMENT`: (Optional) Document path or filename to process from the
- recording. Defaults to the document whose extension matches the template file.
- - `-o, --output-json OUTPUT_JSON`: (Optional) Path to output JSON file with verification
- results (time info and suspicious events). In batch mode, creates a single JSON file
- containing all recordings plus the combined time report.
- - `-f, --output-file OUTPUT_FILE`: (Optional) Write reconstructed code to specified file
- instead of stdout. For single files only.
- - `--output-dir OUTPUT_DIR`: (Optional) Directory to write reconstructed code files in
- batch mode. Files are named based on input recording filenames.
- - `-s, --show-autocomplete-details`: (Optional) Show individual auto-complete events in
- addition to aggregate statistics.
- - `-p, --playback`: (Optional) Play back the recording in real-time, showing code evolution.
- - `--playback-speed SPEED`: (Optional) Playback speed multiplier (1.0 = real-time, 2.0 = 2x
- speed, 0.5 = half speed).
+ - `-t, --time-limit MINUTES`: (Optional) Maximum allowed time in minutes between
+ the first and last edit in the recording. Applied individually to each
+ recording file and also to the combined total in batch mode. If the elapsed
+ time exceeds this limit, the recording is flagged as suspicious.
+ - `-d, --document DOCUMENT`: (Optional) Document path or filename to process
+ from the recording. Defaults to the document whose extension matches the
+ template file.
+ - `-o, --output-json OUTPUT_JSON`: (Optional) Path to output JSON file with
+ verification results (time info and suspicious events). In batch mode, creates
+ a single JSON file containing all recordings plus the combined time report.
+ - `-f, --output-file OUTPUT_FILE`: (Optional) Write reconstructed code to
+ specified file instead of stdout. For single files only.
+ - `--output-dir OUTPUT_DIR`: (Optional) Directory to write reconstructed code
+ files in batch mode. Files are named based on input recording filenames.
+ - `--submitted-file SUBMITTED_FILE`: (Optional) Path to the submitted final file
+ to verify against the reconstructed output. If provided, the reconstructed code
+ will be compared to this file and differences will be reported.
+ - `--submitted-dir SUBMITTED_DIR`: (Optional) Directory containing submitted files
+ to verify against the reconstructed output. For each recording file, the
+ corresponding submitted file will be found by matching the filename
+ (e.g., `homework0-ISC.recording.jsonl.gz` will match `homework0-ISC.py`).
+ Cannot be used with `--submitted-file`.
+ - `-s, --show-autocomplete-details`: (Optional) Show individual auto-complete
+ events in addition to aggregate statistics.
+ - `-p, --playback`: (Optional) Play back the recording in real-time, showing
+ code evolution.
+ - `--playback-speed SPEED`: (Optional) Playback speed multiplier (1.0 =
+ real-time, 2.0 = 2x speed, 0.5 = half speed).

  ### Examples

@@ -106,7 +120,20 @@ Save JSON results:
  poetry run cr_proc student1.jsonl.gz student2.jsonl.gz template.py -o results/
  ```

- This will process each recording independently and flag any that exceed 30 minutes.
+ Verify against a single submitted file:
+
+ ```bash
+ poetry run cr_proc homework0.recording.jsonl.gz homework0.py --submitted-file submitted_homework0.py
+ ```
+
+ Verify against submitted files in a directory (batch mode):
+
+ ```bash
+ poetry run cr_proc recordings/*.jsonl.gz template.py --submitted-dir submissions/
+ ```
+
+ This will process each recording independently and flag any that exceed 30
+ minutes.

  The processor will:

@@ -118,8 +145,9 @@ The processor will:

  ### Output

- Reconstructed code files are written to disk using `-f/--output-file` (single file)
- or `--output-dir` (batch mode). The processor does not output reconstructed code to stdout.
+ Reconstructed code files are written to disk using `-f/--output-file` (single
+ file) or `--output-dir` (batch mode). The processor does not output
+ reconstructed code to stdout.

  Verification information, warnings, and errors are printed to stderr, including:

@@ -133,8 +161,8 @@ Verification information, warnings, and errors are printed to stderr, including:

  ### Suspicious Activity Detection

- The processor automatically detects and reports three types of suspicious activity
- patterns:
+ The processor automatically detects and reports three types of suspicious
+ activity patterns:

  #### 1. Time Limit Exceeded

@@ -142,8 +170,8 @@ When the `--time-limit` flag is specified, the processor flags recordings where
  the elapsed time between the first and last edit exceeds the specified limit.
  This can indicate unusually long work sessions or potential external assistance.

- Each recording file is checked independently against the time limit. In batch mode,
- the combined total time is also checked against the limit.
+ Each recording file is checked independently against the time limit. In batch
+ mode, the combined total time is also checked against the limit.

  **Example warning (single file):**

@@ -199,12 +227,14 @@ Events #42-#44 (rapid one-line pastes (AI indicator)): 3 lines, 89 chars

  ### JSON Output Format

- The `--output-json` flag generates JSON files with verification results using a consistent format
- for both single file and batch modes, making it easier for tooling to consume.
+ The `--output-json` flag generates JSON files with verification results using a
+ consistent format for both single file and batch modes, making it easier for
+ tooling to consume.

  #### JSON Structure

  All JSON output follows this unified format:
+
  - `batch_mode`: Boolean indicating if multiple files were processed
  - `total_files`: Number of files processed
  - `verified_count`: How many files passed verification
@@ -219,6 +249,7 @@ All JSON output follows this unified format:
  - `files`: Array of individual results for each recording

  **Single file example:**
+
  ```json
  {
  "batch_mode": false,
@@ -244,6 +275,7 @@ All JSON output follows this unified format:
  ```

  **Batch file example:**
+
  ```json
  {
  "batch_mode": true,
@@ -16,7 +16,8 @@ poetry install

  ## Usage

- The processor can be run using the `cr_proc` command with recording file(s) and a template:
+ The processor can be run using the `cr_proc` command with recording file(s) and
+ a template:

  ```bash
  poetry run cr_proc <path-to-jsonl-file> <path-to-template-file>
@@ -24,7 +25,8 @@ poetry run cr_proc <path-to-jsonl-file> <path-to-template-file>

  ### Batch Processing

- You can process multiple recording files at once (e.g., for different students' submissions):
+ You can process multiple recording files at once (e.g., for different students'
+ submissions):

  ```bash
  # Process multiple files
@@ -35,9 +37,11 @@ poetry run cr_proc recordings/*.jsonl.gz template.py
  ```

  When processing multiple files:
+
  - Each recording is processed independently (for different students/documents)
  - Time calculations and verification are done separately for each file
- - A combined time report is shown at the end summarizing total editing time across all recordings
+ - A combined time report is shown at the end summarizing total editing time
+ across all recordings
  - Results can be output to individual files using `--output-dir`

  ### Arguments
@@ -49,24 +53,34 @@ When processing multiple files:

  ### Options

- - `-t, --time-limit MINUTES`: (Optional) Maximum allowed time in minutes between the
- first and last edit in the recording. Applied individually to each recording file and
- also to the combined total in batch mode. If the elapsed time exceeds this limit, the
- recording is flagged as suspicious.
- - `-d, --document DOCUMENT`: (Optional) Document path or filename to process from the
- recording. Defaults to the document whose extension matches the template file.
- - `-o, --output-json OUTPUT_JSON`: (Optional) Path to output JSON file with verification
- results (time info and suspicious events). In batch mode, creates a single JSON file
- containing all recordings plus the combined time report.
- - `-f, --output-file OUTPUT_FILE`: (Optional) Write reconstructed code to specified file
- instead of stdout. For single files only.
- - `--output-dir OUTPUT_DIR`: (Optional) Directory to write reconstructed code files in
- batch mode. Files are named based on input recording filenames.
- - `-s, --show-autocomplete-details`: (Optional) Show individual auto-complete events in
- addition to aggregate statistics.
- - `-p, --playback`: (Optional) Play back the recording in real-time, showing code evolution.
- - `--playback-speed SPEED`: (Optional) Playback speed multiplier (1.0 = real-time, 2.0 = 2x
- speed, 0.5 = half speed).
+ - `-t, --time-limit MINUTES`: (Optional) Maximum allowed time in minutes between
+ the first and last edit in the recording. Applied individually to each
+ recording file and also to the combined total in batch mode. If the elapsed
+ time exceeds this limit, the recording is flagged as suspicious.
+ - `-d, --document DOCUMENT`: (Optional) Document path or filename to process
+ from the recording. Defaults to the document whose extension matches the
+ template file.
+ - `-o, --output-json OUTPUT_JSON`: (Optional) Path to output JSON file with
+ verification results (time info and suspicious events). In batch mode, creates
+ a single JSON file containing all recordings plus the combined time report.
+ - `-f, --output-file OUTPUT_FILE`: (Optional) Write reconstructed code to
+ specified file instead of stdout. For single files only.
+ - `--output-dir OUTPUT_DIR`: (Optional) Directory to write reconstructed code
+ files in batch mode. Files are named based on input recording filenames.
+ - `--submitted-file SUBMITTED_FILE`: (Optional) Path to the submitted final file
+ to verify against the reconstructed output. If provided, the reconstructed code
+ will be compared to this file and differences will be reported.
+ - `--submitted-dir SUBMITTED_DIR`: (Optional) Directory containing submitted files
+ to verify against the reconstructed output. For each recording file, the
+ corresponding submitted file will be found by matching the filename
+ (e.g., `homework0-ISC.recording.jsonl.gz` will match `homework0-ISC.py`).
+ Cannot be used with `--submitted-file`.
+ - `-s, --show-autocomplete-details`: (Optional) Show individual auto-complete
+ events in addition to aggregate statistics.
+ - `-p, --playback`: (Optional) Play back the recording in real-time, showing
+ code evolution.
+ - `--playback-speed SPEED`: (Optional) Playback speed multiplier (1.0 =
+ real-time, 2.0 = 2x speed, 0.5 = half speed).

  ### Examples

@@ -94,7 +108,20 @@ Save JSON results:
  poetry run cr_proc student1.jsonl.gz student2.jsonl.gz template.py -o results/
  ```

- This will process each recording independently and flag any that exceed 30 minutes.
+ Verify against a single submitted file:
+
+ ```bash
+ poetry run cr_proc homework0.recording.jsonl.gz homework0.py --submitted-file submitted_homework0.py
+ ```
+
+ Verify against submitted files in a directory (batch mode):
+
+ ```bash
+ poetry run cr_proc recordings/*.jsonl.gz template.py --submitted-dir submissions/
+ ```
+
+ This will process each recording independently and flag any that exceed 30
+ minutes.

  The processor will:

@@ -106,8 +133,9 @@ The processor will:

  ### Output

- Reconstructed code files are written to disk using `-f/--output-file` (single file)
- or `--output-dir` (batch mode). The processor does not output reconstructed code to stdout.
+ Reconstructed code files are written to disk using `-f/--output-file` (single
+ file) or `--output-dir` (batch mode). The processor does not output
+ reconstructed code to stdout.

  Verification information, warnings, and errors are printed to stderr, including:

@@ -121,8 +149,8 @@ Verification information, warnings, and errors are printed to stderr, including:

  ### Suspicious Activity Detection

- The processor automatically detects and reports three types of suspicious activity
- patterns:
+ The processor automatically detects and reports three types of suspicious
+ activity patterns:

  #### 1. Time Limit Exceeded

@@ -130,8 +158,8 @@ When the `--time-limit` flag is specified, the processor flags recordings where
  the elapsed time between the first and last edit exceeds the specified limit.
  This can indicate unusually long work sessions or potential external assistance.

- Each recording file is checked independently against the time limit. In batch mode,
- the combined total time is also checked against the limit.
+ Each recording file is checked independently against the time limit. In batch
+ mode, the combined total time is also checked against the limit.

  **Example warning (single file):**

@@ -187,12 +215,14 @@ Events #42-#44 (rapid one-line pastes (AI indicator)): 3 lines, 89 chars

  ### JSON Output Format

- The `--output-json` flag generates JSON files with verification results using a consistent format
- for both single file and batch modes, making it easier for tooling to consume.
+ The `--output-json` flag generates JSON files with verification results using a
+ consistent format for both single file and batch modes, making it easier for
+ tooling to consume.

  #### JSON Structure

  All JSON output follows this unified format:
+
  - `batch_mode`: Boolean indicating if multiple files were processed
  - `total_files`: Number of files processed
  - `verified_count`: How many files passed verification
@@ -207,6 +237,7 @@ All JSON output follows this unified format:
  - `files`: Array of individual results for each recording

  **Single file example:**
+
  ```json
  {
  "batch_mode": false,
@@ -232,6 +263,7 @@ All JSON output follows this unified format:
  ```

  **Batch file example:**
+
  ```json
  {
  "batch_mode": true,
@@ -1,6 +1,6 @@
  [project]
  name = "cr_proc"
- version = "0.1.10"
+ version = "0.1.12"
  description = "A tool for processing BYU CS code recording files."
  authors = [
      {name = "Ethan Dye",email = "mrtops03@gmail.com"}
@@ -169,6 +169,9 @@ def reconstruct_file_from_events(
      from .load import is_edit_event
      events = tuple(e for e in events if is_edit_event(e))

+     # Skip no-op events (oldFragment == newFragment, typically file-open markers)
+     events = tuple(e for e in events if not (e.get("oldFragment") == e.get("newFragment") and e.get("offset") == 0))
+
      # Read template content
      if normalize_newlines:
          template = _normalize_newlines(template)
@@ -197,6 +200,39 @@ def reconstruct_file_from_events(
          # No events for target_doc; return template unchanged
          return template

+     # Handle case where first event is a file-open/load event at offset 0
+     # (IDE captures the file content as seen when opened)
+     if evs and evs[0].get("offset") == 0:
+         first_old = evs[0].get("oldFragment", "")
+         first_new = evs[0].get("newFragment", "")
+
+         if first_old and not template.startswith(first_old):
+             # Check if this looks like a file-open event:
+             # - First event is at offset 0
+             # - oldFragment and newFragment contain significant content (file was loaded)
+             # - Template is much smaller (stub/placeholder)
+             is_likely_file_open = (
+                 first_old == first_new and  # no-op replacement (just file load)
+                 len(first_old) > 50 and  # substantial content
+                 len(template) < len(first_old)  # template is smaller stub
+             )
+
+             if is_likely_file_open:
+                 # Use first event's oldFragment as the template (actual file state when opened)
+                 template = first_old
+             else:
+                 # Template genuinely doesn't match
+                 raise ValueError(
+                     f"Template content does not match recording's initial state.\n"
+                     f"First event expects to replace {len(first_old)} chars starting at offset 0,\n"
+                     f"but template only has {len(template)} chars and starts with:\n"
+                     f"{template[:min(100, len(template))]!r}\n\n"
+                     f"Expected to start with:\n"
+                     f"{first_old[:min(100, len(first_old))]!r}\n\n"
+                     f"Recording was likely made on a different version of the file.\n"
+                     f"Document path in recording: {target_doc}"
+                 )
+
      if utf16_mode:
          # Work in UTF-16-LE byte space
          doc_bytes = template.encode("utf-16-le")
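
The two hunks above change how `reconstruct_file_from_events` begins: no-op events at offset 0 are dropped, and when the supplied template does not match, a plausible file-open event's `oldFragment` is adopted as the starting state. Below is a minimal sketch of the kind of event records this targets; only the fields named in the diff (`offset`, `oldFragment`, `newFragment`) are shown, and the values are illustrative assumptions rather than real recording data.

```python
# Hypothetical edit events, for illustration only (real records carry more fields).

# A "file-open" marker: offset 0 and oldFragment == newFragment.
# The first hunk skips events like this; the second may adopt its oldFragment
# as the template when the supplied template is only a small stub.
file_open_event = {
    "offset": 0,
    "oldFragment": "def main():\n    pass\n",
    "newFragment": "def main():\n    pass\n",
}

# A genuine edit: inserts text at offset 24 without replacing anything.
typing_event = {
    "offset": 24,
    "oldFragment": "",
    "newFragment": "print('hello')\n",
}
```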
@@ -5,9 +5,38 @@ from pathlib import Path, PureWindowsPath, PurePosixPath
  from typing import Any


+ def normalize_path_string(path_str: str) -> str:
+     """
+     Normalize a path string to use forward slashes (POSIX style).
+
+     Handles both Windows-style (backslash) and Unix-style (forward slash) paths
+     regardless of the current platform. Useful for cross-platform consistency
+     when files are created on Windows but processed on other systems.
+
+     Parameters
+     ----------
+     path_str : str
+         Path string (may use Windows or Unix separators)
+
+     Returns
+     -------
+     str
+         Normalized path string using forward slashes
+     """
+     # Try to detect if this is a Windows path (contains backslashes)
+     if "\\" in path_str:
+         # Windows-style path
+         path_obj = PureWindowsPath(path_str)
+     else:
+         # Unix-style path (or just a filename)
+         path_obj = PurePosixPath(path_str)
+
+     return path_obj.as_posix()
+
+
  def _normalize_document_path(doc_path: str) -> tuple[str, str]:
      """
-     Normalize a document path to extract filename and stem.
+     Extract filename and stem from a document path.

      Handles both Windows-style (backslash) and Unix-style (forward slash) paths
      regardless of the current platform.
@@ -22,14 +51,9 @@ def _normalize_document_path(doc_path: str) -> tuple[str, str]:
      tuple[str, str]
          (filename, stem) extracted from the path
      """
-     # Try to detect if this is a Windows path (contains backslashes)
-     if "\\" in doc_path:
-         # Windows-style path
-         path_obj = PureWindowsPath(doc_path)
-     else:
-         # Unix-style path (or just a filename)
-         path_obj = PurePosixPath(doc_path)
-
+     # Normalize to forward slashes first, then parse
+     normalized = normalize_path_string(doc_path)
+     path_obj = PurePosixPath(normalized)
      return path_obj.name, path_obj.stem


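
As a quick reference, here is a condensed sketch of the new helper's behaviour; the function body is copied from the hunk above, while the example paths are assumptions for illustration.

```python
from pathlib import PurePosixPath, PureWindowsPath


def normalize_path_string(path_str: str) -> str:
    """Condensed copy of the helper added above: normalize to forward slashes."""
    if "\\" in path_str:
        return PureWindowsPath(path_str).as_posix()
    return PurePosixPath(path_str).as_posix()


# Expected behaviour, assuming a Windows-recorded path and a POSIX path:
assert normalize_path_string(r"C:\Users\student\homework0-ISC.py") == "C:/Users/student/homework0-ISC.py"
assert normalize_path_string("submissions/homework0-ISC.py") == "submissions/homework0-ISC.py"
```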
@@ -4,6 +4,8 @@ import sys
  from pathlib import Path
  from typing import Any

+ from .document import normalize_path_string
+

  def write_batch_json_output(
      output_path: Path,
@@ -36,15 +38,21 @@ def write_batch_json_output(
      # Convert results to JSON-serializable format
      files_data = []
      for r in results:
-         files_data.append({
-             "jsonl_file": str(r["jsonl_file"]),
+         file_result = {
+             "jsonl_file": normalize_path_string(str(r["jsonl_file"])),
              "document": r["target_document"],
              "verified": r["verified"],
              "time_info": r["time_info"],
              "suspicious_events": r["suspicious_events"],
              "template_diff": r.get("template_diff", ""),
              "reconstructed_code": r["reconstructed"],
-         })
+         }
+
+         # Add submitted_comparison if present
+         if r.get("submitted_comparison") is not None:
+             file_result["submitted_comparison"] = r["submitted_comparison"]
+
+         files_data.append(file_result)

      # Use consistent format for both single and batch modes
      output_data = {
@@ -1,6 +1,7 @@
  from typing import Any
  from datetime import datetime
  import difflib
+ from .document import normalize_path_string

  # ============================================================================
  # Constants for detection thresholds
@@ -837,15 +838,19 @@ def verify(template: str, jsonData: tuple[dict[str, Any], ...]) -> tuple[str, li


  def combine_time_info(
-     time_infos: list[dict[str, Any] | None], time_limit_minutes: int | None
+     all_events: list[tuple[dict[str, Any], ...]], time_limit_minutes: int | None
  ) -> dict[str, Any] | None:
      """
-     Combine time information from multiple recording files.
+     Combine time information from multiple recording files, avoiding double-counting overlapping time.
+
+     Merges all events from multiple recordings, then calculates the actual time spent editing
+     using the same logic as check_time_limit (gap analysis with focus awareness). This ensures
+     overlapping editing sessions are not double-counted.

      Parameters
      ----------
-     time_infos : list[dict[str, Any] | None]
-         List of time information dictionaries from multiple files
+     all_events : list[tuple[dict[str, Any], ...]]
+         List of event tuples from multiple recording files
      time_limit_minutes : int | None
          Time limit to check against
@@ -854,40 +859,94 @@ def combine_time_info(
      dict[str, Any] | None
          Combined time information, or None if no valid data
      """
-     valid_infos = [info for info in time_infos if info is not None]
-     if not valid_infos:
+     # Filter out empty event sets
+     valid_event_sets = [events for events in all_events if events]
+     if not valid_event_sets:
          return None

-     # Sum elapsed times across all sessions
-     total_elapsed = sum(info["minutes_elapsed"] for info in valid_infos)
+     # Merge all events from all recordings into a single tuple
+     merged_events = tuple(
+         event
+         for event_set in valid_event_sets
+         for event in event_set
+     )

-     # Find overall first and last timestamps
-     all_timestamps = []
-     for info in valid_infos:
-         all_timestamps.append(
-             datetime.fromisoformat(info["first_timestamp"].replace("Z", "+00:00"))
-         )
-         all_timestamps.append(
-             datetime.fromisoformat(info["last_timestamp"].replace("Z", "+00:00"))
-         )
+     # Use check_time_limit on the merged events to calculate time properly
+     # This handles overlapping periods automatically since we're now analyzing
+     # all events together chronologically
+     combined_result = check_time_limit(merged_events, time_limit_minutes)

-     first_ts = min(all_timestamps)
-     last_ts = max(all_timestamps)
-     overall_span = (last_ts - first_ts).total_seconds() / 60
+     if combined_result is None:
+         return None

-     result = {
-         "time_limit_minutes": time_limit_minutes,
-         "minutes_elapsed": round(total_elapsed, 2),
-         "first_timestamp": first_ts.isoformat().replace("+00:00", "Z"),
-         "last_timestamp": last_ts.isoformat().replace("+00:00", "Z"),
-         "file_count": len(valid_infos),
-         "overall_span_minutes": round(overall_span, 2),
-     }
+     # Add file_count to the result
+     combined_result["file_count"] = len(valid_event_sets)

-     # For time limit check in combined mode, use the sum of elapsed times
-     if time_limit_minutes is not None:
-         result["exceeds_limit"] = total_elapsed > time_limit_minutes
-     else:
-         result["exceeds_limit"] = False
+     return combined_result

-     return result
+
+ def compare_submitted_file(reconstructed_code: str, submitted_file_path) -> dict[str, Any]:
+     """
+     Compare reconstructed code from recording with a submitted final file.
+
+     Parameters
+     ----------
+     reconstructed_code : str
+         The code reconstructed from the recording
+     submitted_file_path : Path
+         Path to the submitted file
+
+     Returns
+     -------
+     dict[str, Any]
+         Dictionary containing:
+         - matches: bool indicating if the files match
+         - submitted_file: path to the submitted file
+         - diff: unified diff string if files don't match
+         - whitespace_only: bool indicating if only whitespace differs
+     """
+     try:
+         submitted_content = submitted_file_path.read_text()
+     except Exception as e:
+         return {
+             "matches": False,
+             "submitted_file": normalize_path_string(str(submitted_file_path)),
+             "error": f"Failed to read submitted file: {e}",
+             "diff": "",
+             "whitespace_only": False,
+         }
+
+     # Normalize newlines for comparison
+     reconstructed_normalized = _normalize_newlines(reconstructed_code)
+     submitted_normalized = _normalize_newlines(submitted_content)
+
+     # Check exact match
+     matches = reconstructed_normalized == submitted_normalized
+
+     # Check if only whitespace differs
+     whitespace_only = False
+     if not matches:
+         whitespace_only = is_only_whitespace_differences(
+             submitted_normalized, reconstructed_normalized
+         )
+
+     # Generate diff if they don't match
+     diff_text = ""
+     if not matches:
+         reconstructed_lines = reconstructed_normalized.splitlines(keepends=True)
+         submitted_lines = submitted_normalized.splitlines(keepends=True)
+         diff = difflib.unified_diff(
+             reconstructed_lines,
+             submitted_lines,
+             fromfile="reconstructed",
+             tofile="submitted",
+             lineterm="",
+         )
+         diff_text = "".join(diff)
+
+     return {
+         "matches": matches,
+         "submitted_file": normalize_path_string(str(submitted_file_path)),
+         "diff": diff_text,
+         "whitespace_only": whitespace_only,
+     }
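
The docstring above spells out the shape of the dictionary `compare_submitted_file` returns. A short sketch of how a caller might consume it; the sample values are assumptions, not output from the tool.

```python
# Hypothetical result of compare_submitted_file(reconstructed, submitted_path).
comparison = {
    "matches": False,
    "submitted_file": "submissions/homework0-ISC.py",
    "diff": "--- reconstructed\n+++ submitted\n@@ -1 +1 @@\n-x = 1\n+x = 2",
    "whitespace_only": False,
}

if comparison["matches"]:
    print("Submission matches the recording")
elif comparison["whitespace_only"]:
    print("Only whitespace differs")
else:
    print(comparison["diff"])
```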
@@ -18,11 +18,13 @@ from .api.output import write_batch_json_output
  from .api.verify import (
      check_time_limit,
      combine_time_info,
+     compare_submitted_file,
      detect_external_copypaste,
      template_diff,
      verify,
  )
  from .display import (
+     display_submitted_file_comparison,
      display_suspicious_events,
      display_template_diff,
      display_time_info,
@@ -102,6 +104,21 @@ def create_parser() -> argparse.ArgumentParser:
          help="Directory to write reconstructed code files in batch mode (one file per recording). "
          "Files are named based on input recording filenames.",
      )
+     parser.add_argument(
+         "--submitted-file",
+         type=Path,
+         default=None,
+         help="Path to the submitted final file to verify against the reconstructed output. "
+         "If provided, the reconstructed code will be compared to this file.",
+     )
+     parser.add_argument(
+         "--submitted-dir",
+         type=Path,
+         default=None,
+         help="Directory containing submitted files to compare against. "
+         "For each recording, the corresponding submitted file will be found by matching the filename. "
+         "For example, 'homework0-ISC.recording.jsonl.gz' will match 'homework0-ISC.py' in the directory.",
+     )
      parser.add_argument(
          "-s",
          "--show-autocomplete-details",
@@ -169,12 +186,55 @@ def expand_file_patterns(patterns: list[str]) -> list[Path]:
      return existing_files


+ def find_submitted_file(
+     jsonl_file: Path,
+     submitted_dir: Path,
+     target_document: str | None,
+ ) -> Path | None:
+     """
+     Find the submitted file corresponding to a recording file.
+
+     Matches by replacing '.recording.jsonl.gz' with the extension of the
+     target document (or '.py' if not specified).
+
+     Parameters
+     ----------
+     jsonl_file : Path
+         Path to the JSONL recording file
+     submitted_dir : Path
+         Directory containing submitted files
+     target_document : str | None
+         Target document path (to extract extension)
+
+     Returns
+     -------
+     Path | None
+         Path to the submitted file if found, None otherwise
+     """
+     # Determine the file extension from target_document or default to .py
+     extension = ".py"
+     if target_document:
+         extension = Path(target_document).suffix or ".py"
+
+     # Remove '.recording.jsonl.gz' and add the appropriate extension
+     base_name = jsonl_file.name.replace(".recording.jsonl.gz", "")
+     submitted_filename = base_name + extension
+
+     submitted_file = submitted_dir / submitted_filename
+     if submitted_file.exists():
+         return submitted_file
+
+     return None
+
+
  def process_single_file(
      jsonl_path: Path,
      template_data: str,
      target_document: str | None,
      time_limit: int | None,
- ) -> tuple[bool, str, list[dict[str, Any]], dict[str, Any] | None, str]:
+     submitted_file: Path | None = None,
+     submitted_dir: Path | None = None,
+ ) -> tuple[bool, str, list[dict[str, Any]], dict[str, Any] | None, str, tuple[dict[str, Any], ...], dict[str, Any] | None]:
      """
      Process a single JSONL recording file.

@@ -188,17 +248,21 @@ def process_single_file(
          Document to process
      time_limit : int | None
          Time limit in minutes
+     submitted_file : Path | None
+         Path to the submitted file to compare against
+     submitted_dir : Path | None
+         Directory containing submitted files to compare against

      Returns
      -------
      tuple
-         (verified, reconstructed_code, suspicious_events, time_info, template_diff_text)
+         (verified, reconstructed_code, suspicious_events, time_info, template_diff_text, doc_events, submitted_comparison)
      """
      try:
          json_data = load_jsonl(jsonl_path)
      except (FileNotFoundError, ValueError, IOError) as e:
          print(f"Error loading {jsonl_path}: {e}", file=sys.stderr)
-         return False, "", [], None, ""
+         return False, "", [], None, "", (), None

      # Filter events for target document
      doc_events = filter_events_by_document(json_data, target_document)
@@ -207,7 +271,7 @@ def process_single_file(
              f"Warning: No events found for document '{target_document}' in {jsonl_path}",
              file=sys.stderr,
          )
-         return False, "", [], None, ""
+         return False, "", [], None, "", (), None

      # Check time information
      time_info = check_time_limit(doc_events, time_limit)
@@ -218,13 +282,29 @@ def process_single_file(
          reconstructed = reconstruct_file_from_events(
              doc_events, verified_template, document_path=target_document
          )
-         return True, reconstructed, suspicious_events, time_info, ""
+
+         # Compare with submitted file if provided
+         submitted_comparison = None
+         actual_submitted_file = submitted_file
+
+         # If submitted_dir is provided, find the matching file
+         if submitted_dir and not submitted_file:
+             actual_submitted_file = find_submitted_file(jsonl_path, submitted_dir, target_document)
+             if actual_submitted_file:
+                 print(f"Found submitted file: {actual_submitted_file.name}", file=sys.stderr)
+
+         if actual_submitted_file and actual_submitted_file.exists():
+             submitted_comparison = compare_submitted_file(reconstructed, actual_submitted_file)
+         elif actual_submitted_file:
+             print(f"Warning: Submitted file not found: {actual_submitted_file}", file=sys.stderr)
+
+         return True, reconstructed, suspicious_events, time_info, "", doc_events, submitted_comparison
      except ValueError as e:
          # If verification fails but we have events, still try to reconstruct
          print(f"Warning: Verification failed for {jsonl_path}: {e}", file=sys.stderr)
          try:
              if not doc_events:
-                 return False, "", [], time_info, ""
+                 return False, "", [], time_info, "", (), None

              # Compute diff against template and still detect suspicious events
              diff_text = template_diff(template_data, doc_events)
@@ -235,19 +315,35 @@ def process_single_file(
              reconstructed = reconstruct_file_from_events(
                  doc_events, initial_state, document_path=target_document
              )
-             return False, reconstructed, suspicious_events, time_info, diff_text
+
+             # Compare with submitted file if provided
+             submitted_comparison = None
+             actual_submitted_file = submitted_file
+
+             # If submitted_dir is provided, find the matching file
+             if submitted_dir and not submitted_file:
+                 actual_submitted_file = find_submitted_file(jsonl_path, submitted_dir, target_document)
+                 if actual_submitted_file:
+                     print(f"Found submitted file: {actual_submitted_file.name}", file=sys.stderr)
+
+             if actual_submitted_file and actual_submitted_file.exists():
+                 submitted_comparison = compare_submitted_file(reconstructed, actual_submitted_file)
+             elif actual_submitted_file:
+                 print(f"Warning: Submitted file not found: {actual_submitted_file}", file=sys.stderr)
+
+             return False, reconstructed, suspicious_events, time_info, diff_text, doc_events, submitted_comparison
          except Exception as reconstruction_error:
              print(
                  f"Error reconstructing {jsonl_path}: {type(reconstruction_error).__name__}: {reconstruction_error}",
                  file=sys.stderr,
              )
-             return False, "", [], time_info, ""
+             return False, "", [], time_info, "", (), None
      except Exception as e:
          print(
              f"Error processing {jsonl_path}: {type(e).__name__}: {e}",
              file=sys.stderr,
          )
-         return False, "", [], time_info, ""
+         return False, "", [], time_info, "", (), None


  def write_reconstructed_file(
@@ -274,7 +370,7 @@ def write_reconstructed_file(
      """
      try:
          output_path.parent.mkdir(parents=True, exist_ok=True)
-         output_path.write_text(content)
+         output_path.write_text(content + '\n')
          print(f"{file_description} written to: {output_path}", file=sys.stderr)
          return True
      except Exception as e:
@@ -387,8 +483,8 @@ def process_batch(
              file_template_data = template_data

          # Process the file
-         verified, reconstructed, suspicious_events, time_info, diff_text = process_single_file(
-             jsonl_file, file_template_data, target_document, args.time_limit
+         verified, reconstructed, suspicious_events, time_info, diff_text, doc_events, submitted_comparison = process_single_file(
+             jsonl_file, file_template_data, target_document, args.time_limit, args.submitted_file, args.submitted_dir
          )

          if not verified:
@@ -398,6 +494,7 @@ def process_batch(
          display_time_info(time_info)
          display_suspicious_events(suspicious_events, args.show_autocomplete_details)
          display_template_diff(diff_text)
+         display_submitted_file_comparison(submitted_comparison)

          # Store results
          results.append({
@@ -408,6 +505,8 @@ def process_batch(
              "suspicious_events": suspicious_events,
              "time_info": time_info,
              "template_diff": diff_text,
+             "doc_events": doc_events,
+             "submitted_comparison": submitted_comparison,
          })

          # Write output file if requested
@@ -470,14 +569,15 @@ def process_single(

      print(f"Processing: {target_document or template_base}", file=sys.stderr)

-     verified, reconstructed, suspicious_events, time_info, diff_text = process_single_file(
-         jsonl_file, file_template_data, target_document, args.time_limit
+     verified, reconstructed, suspicious_events, time_info, diff_text, doc_events, submitted_comparison = process_single_file(
+         jsonl_file, file_template_data, target_document, args.time_limit, args.submitted_file, args.submitted_dir
      )

      # Display results
      display_time_info(time_info)
      display_suspicious_events(suspicious_events, args.show_autocomplete_details)
      display_template_diff(diff_text)
+     display_submitted_file_comparison(submitted_comparison)

      # Write output file if requested
      if reconstructed and args.output_file:
@@ -492,6 +592,8 @@ def process_single(
          "suspicious_events": suspicious_events,
          "time_info": time_info,
          "template_diff": diff_text,
+         "doc_events": doc_events,
+         "submitted_comparison": submitted_comparison,
      }]

      return results, verified
@@ -526,6 +628,11 @@ def main() -> int:
          parser.print_help()
          return 1

+     # Validate that both --submitted-file and --submitted-dir are not provided simultaneously
+     if args.submitted_file and args.submitted_dir:
+         print("Error: Cannot specify both --submitted-file and --submitted-dir", file=sys.stderr)
+         return 1
+
      # Expand file patterns and validate
      try:
          jsonl_files = expand_file_patterns(jsonl_patterns)
@@ -600,10 +707,10 @@ def main() -> int:
          print_batch_summary(len(results), verified_count, failed_files)

          # Display combined time report
-         time_infos = [r["time_info"] for r in results]
+         all_events = [r["doc_events"] for r in results]
          combined_time = None
-         if any(time_infos):
-             combined_time = combine_time_info(time_infos, args.time_limit)
+         if any(all_events):
+             combined_time = combine_time_info(all_events, args.time_limit)
          display_time_info(combined_time, is_combined=True)

          # Write JSON output
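
The `--submitted-dir` lookup wired in above relies on the filename convention implemented in `find_submitted_file`: strip `.recording.jsonl.gz` and append the target document's extension (`.py` by default). A small sketch of that rule, with an assumed directory layout:

```python
from pathlib import Path

# Assumed layout:
#   recordings/homework0-ISC.recording.jsonl.gz
#   submissions/homework0-ISC.py
recording = Path("recordings/homework0-ISC.recording.jsonl.gz")
submitted_dir = Path("submissions")

# Mirrors the matching rule above: drop ".recording.jsonl.gz", add ".py".
expected = submitted_dir / (recording.name.replace(".recording.jsonl.gz", "") + ".py")
print(expected)  # submissions/homework0-ISC.py
```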
@@ -176,6 +176,39 @@ def display_template_diff(diff_text: str) -> None:
      print(diff_text, file=sys.stderr)


+ def display_submitted_file_comparison(comparison: dict[str, Any] | None) -> None:
+     """
+     Display comparison results between reconstructed code and submitted file.
+
+     Parameters
+     ----------
+     comparison : dict[str, Any] | None
+         Comparison results from compare_submitted_file, or None if no comparison
+     """
+     if not comparison:
+         return
+
+     print("\nSubmitted file comparison:", file=sys.stderr)
+     print(f" Submitted file: {comparison['submitted_file']}", file=sys.stderr)
+
+     if "error" in comparison:
+         print(f" Error: {comparison['error']}", file=sys.stderr)
+         return
+
+     if comparison["matches"]:
+         print(" ✓ Reconstructed code matches submitted file exactly", file=sys.stderr)
+     elif comparison.get("whitespace_only", False):
+         print(" ⚠ Reconstructed code differs only in whitespace from submitted file", file=sys.stderr)
+     else:
+         print(" ✗ Reconstructed code differs from submitted file", file=sys.stderr)
+         if comparison.get("diff"):
+             print("\n Diff (reconstructed → submitted):", file=sys.stderr)
+             # Indent each line of the diff
+             for line in comparison["diff"].split("\n"):
+                 if line:
+                     print(f" {line}", file=sys.stderr)
+
+
  def print_separator() -> None:
      """Print a separator line."""
      print(f"{'='*80}", file=sys.stderr)