cr-proc 0.1.8__tar.gz → 0.1.9__tar.gz

cr_proc-0.1.9/PKG-INFO ADDED
@@ -0,0 +1,280 @@
+ Metadata-Version: 2.4
+ Name: cr_proc
+ Version: 0.1.9
+ Summary: A tool for processing BYU CS code recording files.
+ Author: Ethan Dye
+ Author-email: mrtops03@gmail.com
+ Requires-Python: >=3.14
+ Classifier: Programming Language :: Python :: 3
+ Classifier: Programming Language :: Python :: 3.14
+ Requires-Dist: py-jsonl (>=1.3.22,<2.0.0)
+ Description-Content-Type: text/markdown
+
+ # `code_recorder_processor`
+
+ [![CI](https://github.com/BYU-CS-Course-Ops/code_recorder_processor/actions/workflows/ci.yml/badge.svg)](https://github.com/BYU-CS-Course-Ops/code_recorder_processor/actions/workflows/ci.yml)
+
+ This contains code to process and verify the `*.recorder.jsonl.gz` files that
+ are produced by the
+ [jetbrains-recorder](https://github.com/BYU-CS-Course-Ops/jetbrains-recorder).
+
+ ## Installation
+
+ Install the package and its dependencies using Poetry:
+
+ ```bash
+ poetry install
+ ```
+
+ ## Usage
+
+ The processor can be run using the `cr_proc` command with recording file(s) and a template:
+
+ ```bash
+ poetry run cr_proc <path-to-jsonl-file> <path-to-template-file>
+ ```
+
+ ### Batch Processing
+
+ You can process multiple recording files at once (e.g., for different students' submissions):
+
+ ```bash
+ # Process multiple files
+ poetry run cr_proc file1.jsonl.gz file2.jsonl.gz template.py
+
+ # Using glob patterns
+ poetry run cr_proc recordings/*.jsonl.gz template.py
+ ```
+
+ When processing multiple files:
+ - Each recording is processed independently (for different students/documents)
+ - Time calculations and verification are done separately for each file
+ - A combined time report is shown at the end summarizing total editing time across all recordings
+ - Results can be output to individual files using `--output-dir`
+
+ ### Arguments
+
+ - `<path-to-jsonl-file>`: Path(s) to compressed JSONL file(s)
+   (`*.recorder.jsonl.gz`) produced by the jetbrains-recorder. Supports multiple
+   files and glob patterns like `recordings/*.jsonl.gz`
+ - `<path-to-template-file>`: Path to the initial template file that was recorded
+
+ ### Options
+
+ - `-t, --time-limit MINUTES`: (Optional) Maximum allowed time in minutes between the
+   first and last edit in the recording. Applied individually to each recording file and
+   also to the combined total in batch mode. If the elapsed time exceeds this limit, the
+   recording is flagged as suspicious.
+ - `-d, --document DOCUMENT`: (Optional) Document path or filename to process from the
+   recording. Defaults to the document whose extension matches the template file.
+ - `-o, --output-json OUTPUT_JSON`: (Optional) Path to output JSON file with verification
+   results (time info and suspicious events). In batch mode, creates a single JSON file
+   containing all recordings plus the combined time report.
+ - `-f, --output-file OUTPUT_FILE`: (Optional) Write reconstructed code to specified file
+   instead of stdout. For single files only.
+ - `--output-dir OUTPUT_DIR`: (Optional) Directory to write reconstructed code files in
+   batch mode. Files are named based on input recording filenames.
+ - `-s, --show-autocomplete-details`: (Optional) Show individual auto-complete events in
+   addition to aggregate statistics.
+ - `-p, --playback`: (Optional) Play back the recording in real-time, showing code evolution.
+ - `--playback-speed SPEED`: (Optional) Playback speed multiplier (1.0 = real-time, 2.0 = 2x
+   speed, 0.5 = half speed).
+
+ ### Examples
+
+ Basic usage:
+
+ ```bash
+ poetry run cr_proc homework0.recording.jsonl.gz homework0.py
+ ```
+
+ With time limit flag:
+
+ ```bash
+ poetry run cr_proc homework0.recording.jsonl.gz homework0.py --time-limit 30
+ ```
+
+ Batch processing with output directory:
+
+ ```bash
+ poetry run cr_proc recordings/*.jsonl.gz template.py --output-dir output/
+ ```
+
+ Save JSON results:
+
+ ```bash
+ poetry run cr_proc student1.jsonl.gz student2.jsonl.gz template.py -o results/
+ ```
+
+ This will process each recording independently and save the verification results as JSON.
+
+ The processor will:
+
+ 1. Load the recorded events from the JSONL file
+ 2. Verify that the initial event matches the template (allowances for newline
+    differences are made)
+ 3. Reconstruct the final file state by applying all recorded events
+ 4. Output the reconstructed file contents to stdout
+
+ ### Output
+
+ Reconstructed code files are written to disk using `-f/--output-file` (single file)
+ or `--output-dir` (batch mode). The processor does not output reconstructed code to stdout.
+
+ Verification information, warnings, and errors are printed to stderr, including:
+
+ - The document path being processed
+ - Time information (elapsed time, time span) for each recording
+ - Suspicious copy-paste and AI activity indicators for each file
+ - Batch summary showing:
+   - Verification status of all processed files
+   - Combined time report (total editing time across all recordings)
+   - Time limit violations if applicable
+
+ ### Suspicious Activity Detection
+
+ The processor automatically detects and reports three types of suspicious activity
+ patterns:
+
+ #### 1. Time Limit Exceeded
+
+ When the `--time-limit` flag is specified, the processor flags recordings where
+ the elapsed time between the first and last edit exceeds the specified limit.
+ This can indicate unusually long work sessions or potential external assistance.
+
+ Each recording file is checked independently against the time limit. In batch mode,
+ the combined total time is also checked against the limit.
+
+ **Example warning (single file):**
+
+ ```
+ Elapsed editing time: 45.5 minutes
+ Time span (first to last edit): 62.30 minutes
+
+ Time limit exceeded!
+ Limit: 30 minutes
+ First edit: 2025-01-15T10:00:00+00:00
+ Last edit: 2025-01-15T11:02:18+00:00
+ ```
+
+ **Example warning (batch mode combined report):**
+
+ ```
+ ================================================================================
+ BATCH SUMMARY: Processed 3 files
+ ================================================================================
+ Verified: 3/3
+
+ COMBINED TIME REPORT (3 recordings):
+ Total elapsed editing time: 65.5 minutes
+ Overall time span: 120.45 minutes
+
+ Time limit exceeded!
+ Limit: 60 minutes
+ ```
+
+ #### 2. External Copy-Paste (Multi-line Pastes)
+
+ The processor flags multi-line additions (more than one line) that do not appear
+ to be copied from within the document itself. These indicate content pasted from
+ external sources.
+
+ **Example warning:**
+
+ ```
+ Event #15 (multi-line external paste): 5 lines, 156 chars - newFragment: def helper_function():...
+ ```
+
+ #### 3. Rapid One-line Pastes (AI Indicator)
+
+ When 3 or more single-line pastes occur within a 1-second window, this is
+ flagged as a potential AI activity indicator. Human typing does not typically
+ produce this pattern; rapid sequential pastes suggest automated code generation.
+
+ **Example warning:**
+
+ ```
+ Events #42-#44 (rapid one-line pastes (AI indicator)): 3 lines, 89 chars
+ ```
+
+ ### JSON Output Format
+
+ The `--output-json` flag generates JSON files with verification results using a consistent format
+ for both single file and batch modes, making it easier for tooling to consume.
+
+ #### JSON Structure
+
+ All JSON output follows this unified format:
+ - `batch_mode`: Boolean indicating if multiple files were processed
+ - `total_files`: Number of files processed
+ - `verified_count`: How many files passed verification
+ - `all_verified`: Whether all files passed
+ - `combined_time_info`: Time information (present in both modes):
+   - Single file: Contains time info for that file
+   - Batch mode: Contains combined time report with:
+     - `minutes_elapsed`: Total editing time across all recordings
+     - `overall_span_minutes`: Time span from first to last edit
+     - `file_count`: Number of recordings
+     - `exceeds_limit`: Whether combined time exceeds the limit
+ - `files`: Array of individual results for each recording
+
+ **Single file example:**
+ ```json
+ {
+   "batch_mode": false,
+   "total_files": 1,
+   "verified_count": 1,
+   "all_verified": true,
+   "combined_time_info": {
+     "minutes_elapsed": 15.74,
+     "first_timestamp": "2026-01-15T01:21:35.360168Z",
+     "exceeds_limit": false
+   },
+   "files": [
+     {
+       "jsonl_file": "recording.jsonl.gz",
+       "document": "/path/to/homework.py",
+       "verified": true,
+       "time_info": { ... },
+       "suspicious_events": [ ... ],
+       "reconstructed_code": "..."
+     }
+   ]
+ }
+ ```
+
+ **Batch file example:**
+ ```json
+ {
+   "batch_mode": true,
+   "total_files": 2,
+   "verified_count": 2,
+   "all_verified": true,
+   "combined_time_info": {
+     "minutes_elapsed": 31.24,
+     "overall_span_minutes": 18739.29,
+     "file_count": 2,
+     "exceeds_limit": false
+   },
+   "files": [ /* individual results for each file */ ]
+ }
+ ```
+
+ ### Error Handling
+
+ If verification fails (the recorded initial state doesn't match the template),
+ the processor will:
+
+ - Print an error message to stderr
+ - Display a diff showing the differences
+ - Exit with status code 1
+
+ If file loading or processing errors occur, the processor will:
+
+ - Print a descriptive error message to stderr
+ - Exit with status code 1
+
+ ## Future Ideas
+
+ - Check for odd typing behavior
+
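For tooling that consumes the `--output-json` report documented in the hunk above, a minimal Python sketch might look like the following. It relies only on the top-level keys the README lists (`all_verified`, `verified_count`, `total_files`, `combined_time_info`, `files`). The helper name `summarize_report` and the sample values are hypothetical and are not part of cr_proc.

```python
import json


def summarize_report(report: dict) -> str:
    """Build a one-line summary from a cr_proc-style JSON report (hypothetical helper)."""
    time_info = report.get("combined_time_info") or {}
    # Collect the recordings whose initial state did not match the template.
    failed = [f["jsonl_file"] for f in report.get("files", []) if not f.get("verified")]
    status = "OK" if report.get("all_verified") else "FAILED: " + ", ".join(failed)
    flag = " (over time limit)" if time_info.get("exceeds_limit") else ""
    return (
        f"{report['verified_count']}/{report['total_files']} verified, "
        f"{time_info.get('minutes_elapsed')} min{flag} -> {status}"
    )


# Sample data shaped like the README's batch example; the values are made up.
sample = json.loads("""
{
  "batch_mode": true,
  "total_files": 2,
  "verified_count": 1,
  "all_verified": false,
  "combined_time_info": {"minutes_elapsed": 31.24, "file_count": 2, "exceeds_limit": false},
  "files": [
    {"jsonl_file": "student1.jsonl.gz", "verified": true},
    {"jsonl_file": "student2.jsonl.gz", "verified": false}
  ]
}
""")
print(summarize_report(sample))  # 1/2 verified, 31.24 min -> FAILED: student2.jsonl.gz
```

Because the README states that single-file and batch reports share one format, the same helper should work for both, assuming the keys are present as documented.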
@@ -0,0 +1,267 @@
+ # `code_recorder_processor`
+
+ [![CI](https://github.com/BYU-CS-Course-Ops/code_recorder_processor/actions/workflows/ci.yml/badge.svg)](https://github.com/BYU-CS-Course-Ops/code_recorder_processor/actions/workflows/ci.yml)
+
+ This contains code to process and verify the `*.recorder.jsonl.gz` files that
+ are produced by the
+ [jetbrains-recorder](https://github.com/BYU-CS-Course-Ops/jetbrains-recorder).
+
+ ## Installation
+
+ Install the package and its dependencies using Poetry:
+
+ ```bash
+ poetry install
+ ```
+
+ ## Usage
+
+ The processor can be run using the `cr_proc` command with recording file(s) and a template:
+
+ ```bash
+ poetry run cr_proc <path-to-jsonl-file> <path-to-template-file>
+ ```
+
+ ### Batch Processing
+
+ You can process multiple recording files at once (e.g., for different students' submissions):
+
+ ```bash
+ # Process multiple files
+ poetry run cr_proc file1.jsonl.gz file2.jsonl.gz template.py
+
+ # Using glob patterns
+ poetry run cr_proc recordings/*.jsonl.gz template.py
+ ```
+
+ When processing multiple files:
+ - Each recording is processed independently (for different students/documents)
+ - Time calculations and verification are done separately for each file
+ - A combined time report is shown at the end summarizing total editing time across all recordings
+ - Results can be output to individual files using `--output-dir`
+
+ ### Arguments
+
+ - `<path-to-jsonl-file>`: Path(s) to compressed JSONL file(s)
+   (`*.recorder.jsonl.gz`) produced by the jetbrains-recorder. Supports multiple
+   files and glob patterns like `recordings/*.jsonl.gz`
+ - `<path-to-template-file>`: Path to the initial template file that was recorded
+
+ ### Options
+
+ - `-t, --time-limit MINUTES`: (Optional) Maximum allowed time in minutes between the
+   first and last edit in the recording. Applied individually to each recording file and
+   also to the combined total in batch mode. If the elapsed time exceeds this limit, the
+   recording is flagged as suspicious.
+ - `-d, --document DOCUMENT`: (Optional) Document path or filename to process from the
+   recording. Defaults to the document whose extension matches the template file.
+ - `-o, --output-json OUTPUT_JSON`: (Optional) Path to output JSON file with verification
+   results (time info and suspicious events). In batch mode, creates a single JSON file
+   containing all recordings plus the combined time report.
+ - `-f, --output-file OUTPUT_FILE`: (Optional) Write reconstructed code to specified file
+   instead of stdout. For single files only.
+ - `--output-dir OUTPUT_DIR`: (Optional) Directory to write reconstructed code files in
+   batch mode. Files are named based on input recording filenames.
+ - `-s, --show-autocomplete-details`: (Optional) Show individual auto-complete events in
+   addition to aggregate statistics.
+ - `-p, --playback`: (Optional) Play back the recording in real-time, showing code evolution.
+ - `--playback-speed SPEED`: (Optional) Playback speed multiplier (1.0 = real-time, 2.0 = 2x
+   speed, 0.5 = half speed).
+
+ ### Examples
+
+ Basic usage:
+
+ ```bash
+ poetry run cr_proc homework0.recording.jsonl.gz homework0.py
+ ```
+
+ With time limit flag:
+
+ ```bash
+ poetry run cr_proc homework0.recording.jsonl.gz homework0.py --time-limit 30
+ ```
+
+ Batch processing with output directory:
+
+ ```bash
+ poetry run cr_proc recordings/*.jsonl.gz template.py --output-dir output/
+ ```
+
+ Save JSON results:
+
+ ```bash
+ poetry run cr_proc student1.jsonl.gz student2.jsonl.gz template.py -o results/
+ ```
+
+ This will process each recording independently and save the verification results as JSON.
+
+ The processor will:
+
+ 1. Load the recorded events from the JSONL file
+ 2. Verify that the initial event matches the template (allowances for newline
+    differences are made)
+ 3. Reconstruct the final file state by applying all recorded events
+ 4. Output the reconstructed file contents to stdout
+
+ ### Output
+
+ Reconstructed code files are written to disk using `-f/--output-file` (single file)
+ or `--output-dir` (batch mode). The processor does not output reconstructed code to stdout.
+
+ Verification information, warnings, and errors are printed to stderr, including:
+
+ - The document path being processed
+ - Time information (elapsed time, time span) for each recording
+ - Suspicious copy-paste and AI activity indicators for each file
+ - Batch summary showing:
+   - Verification status of all processed files
+   - Combined time report (total editing time across all recordings)
+   - Time limit violations if applicable
+
+ ### Suspicious Activity Detection
+
+ The processor automatically detects and reports three types of suspicious activity
+ patterns:
+
+ #### 1. Time Limit Exceeded
+
+ When the `--time-limit` flag is specified, the processor flags recordings where
+ the elapsed time between the first and last edit exceeds the specified limit.
+ This can indicate unusually long work sessions or potential external assistance.
+
+ Each recording file is checked independently against the time limit. In batch mode,
+ the combined total time is also checked against the limit.
+
+ **Example warning (single file):**
+
+ ```
+ Elapsed editing time: 45.5 minutes
+ Time span (first to last edit): 62.30 minutes
+
+ Time limit exceeded!
+ Limit: 30 minutes
+ First edit: 2025-01-15T10:00:00+00:00
+ Last edit: 2025-01-15T11:02:18+00:00
+ ```
+
+ **Example warning (batch mode combined report):**
+
+ ```
+ ================================================================================
+ BATCH SUMMARY: Processed 3 files
+ ================================================================================
+ Verified: 3/3
+
+ COMBINED TIME REPORT (3 recordings):
+ Total elapsed editing time: 65.5 minutes
+ Overall time span: 120.45 minutes
+
+ Time limit exceeded!
+ Limit: 60 minutes
+ ```
+
+ #### 2. External Copy-Paste (Multi-line Pastes)
+
+ The processor flags multi-line additions (more than one line) that do not appear
+ to be copied from within the document itself. These indicate content pasted from
+ external sources.
+
+ **Example warning:**
+
+ ```
+ Event #15 (multi-line external paste): 5 lines, 156 chars - newFragment: def helper_function():...
+ ```
+
+ #### 3. Rapid One-line Pastes (AI Indicator)
+
+ When 3 or more single-line pastes occur within a 1-second window, this is
+ flagged as a potential AI activity indicator. Human typing does not typically
+ produce this pattern; rapid sequential pastes suggest automated code generation.
+
+ **Example warning:**
+
+ ```
+ Events #42-#44 (rapid one-line pastes (AI indicator)): 3 lines, 89 chars
+ ```
+
+ ### JSON Output Format
+
+ The `--output-json` flag generates JSON files with verification results using a consistent format
+ for both single file and batch modes, making it easier for tooling to consume.
+
+ #### JSON Structure
+
+ All JSON output follows this unified format:
+ - `batch_mode`: Boolean indicating if multiple files were processed
+ - `total_files`: Number of files processed
+ - `verified_count`: How many files passed verification
+ - `all_verified`: Whether all files passed
+ - `combined_time_info`: Time information (present in both modes):
+   - Single file: Contains time info for that file
+   - Batch mode: Contains combined time report with:
+     - `minutes_elapsed`: Total editing time across all recordings
+     - `overall_span_minutes`: Time span from first to last edit
+     - `file_count`: Number of recordings
+     - `exceeds_limit`: Whether combined time exceeds the limit
+ - `files`: Array of individual results for each recording
+
+ **Single file example:**
+ ```json
+ {
+   "batch_mode": false,
+   "total_files": 1,
+   "verified_count": 1,
+   "all_verified": true,
+   "combined_time_info": {
+     "minutes_elapsed": 15.74,
+     "first_timestamp": "2026-01-15T01:21:35.360168Z",
+     "exceeds_limit": false
+   },
+   "files": [
+     {
+       "jsonl_file": "recording.jsonl.gz",
+       "document": "/path/to/homework.py",
+       "verified": true,
+       "time_info": { ... },
+       "suspicious_events": [ ... ],
+       "reconstructed_code": "..."
+     }
+   ]
+ }
+ ```
+
+ **Batch file example:**
+ ```json
+ {
+   "batch_mode": true,
+   "total_files": 2,
+   "verified_count": 2,
+   "all_verified": true,
+   "combined_time_info": {
+     "minutes_elapsed": 31.24,
+     "overall_span_minutes": 18739.29,
+     "file_count": 2,
+     "exceeds_limit": false
+   },
+   "files": [ /* individual results for each file */ ]
+ }
+ ```
+
+ ### Error Handling
+
+ If verification fails (the recorded initial state doesn't match the template),
+ the processor will:
+
+ - Print an error message to stderr
+ - Display a diff showing the differences
+ - Exit with status code 1
+
+ If file loading or processing errors occur, the processor will:
+
+ - Print a descriptive error message to stderr
+ - Exit with status code 1
+
+ ## Future Ideas
+
+ - Check for odd typing behavior
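The rapid one-line paste rule described in the README (3 or more single-line pastes within a 1-second window) can be illustrated with a short Python sketch. This is not the package's implementation, which the diff does not show. The function `find_rapid_paste_runs`, the `(event_index, timestamp)` input shape, and the choice to measure the window from the first paste of each burst are all assumptions made for illustration.

```python
from datetime import datetime, timedelta


def parse_ts(ts: str) -> datetime:
    # Swap a trailing "Z" for an explicit offset so older Pythons can parse it too.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def find_rapid_paste_runs(pastes, min_count=3, window=timedelta(seconds=1)):
    """Return (first_index, last_index) pairs for bursts of single-line pastes.

    `pastes` is a time-sorted list of (event_index, iso_timestamp) tuples that
    the caller has already narrowed down to single-line pastes (assumption).
    """
    times = [parse_ts(ts) for _, ts in pastes]
    runs = []
    start = 0
    while start < len(pastes):
        end = start
        # Extend the burst while the next paste falls inside the window
        # that opened at `start`.
        while end + 1 < len(pastes) and times[end + 1] - times[start] <= window:
            end += 1
        if end - start + 1 >= min_count:
            runs.append((pastes[start][0], pastes[end][0]))
            start = end + 1
        else:
            start += 1
    return runs


# Made-up sample: events 42-44 arrive within 0.8 seconds of each other.
pastes = [
    (10, "2026-01-15T01:00:00.000000Z"),
    (42, "2026-01-15T01:00:05.100000Z"),
    (43, "2026-01-15T01:00:05.400000Z"),
    (44, "2026-01-15T01:00:05.900000Z"),
    (60, "2026-01-15T01:00:09.000000Z"),
]
print(find_rapid_paste_runs(pastes))  # [(42, 44)]
```

Anchoring the window at the first paste of a burst keeps the check simple; a gap-based rule (each paste within one second of the previous one) would flag longer chains and is an equally plausible reading of the README.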
@@ -1,6 +1,6 @@
  [project]
  name = "cr_proc"
- version = "0.1.8"
+ version = "0.1.9"
  description = "A tool for processing BYU CS code recording files."
  authors = [
  {name = "Ethan Dye",email = "mrtops03@gmail.com"}
@@ -127,6 +127,8 @@ def reconstruct_file_from_events(
      ----------
      events : tuple of dict
          Each dict should contain:
+         - 'type': (optional) event type - only 'edit' events or events without
+           type field are processed (backwards compatible)
          - 'timestamp': ISO 8601 string, e.g., '2026-01-13T22:40:44.137341Z'
          - 'document': absolute path string of the edited file
          - 'offset': integer offset (JetBrains Document uses UTF-16 code units)
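The docstring notes that `offset` is counted in UTF-16 code units, which differs from Python string indices whenever the text contains characters outside the Basic Multilingual Plane. A minimal sketch of one way to translate between the two follows; the helper `utf16_offset_to_index` is hypothetical and is not taken from the package.

```python
def utf16_offset_to_index(text: str, utf16_offset: int) -> int:
    """Map an offset counted in UTF-16 code units to a Python str index."""
    units = 0
    for index, ch in enumerate(text):
        if units >= utf16_offset:
            return index
        # Characters above U+FFFF are encoded as a surrogate pair (two units).
        units += 2 if ord(ch) > 0xFFFF else 1
    return len(text)


text = "a😀b"  # the emoji occupies UTF-16 units 1 and 2
print(utf16_offset_to_index(text, 3))  # 2, the index of "b"
```

An offset that points into the middle of a surrogate pair is rounded up to the next whole character in this sketch; real recordings should not normally produce such offsets.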
@@ -163,6 +165,10 @@ def reconstruct_file_from_events(
          - If the target document cannot be determined.
          - If an edit cannot be applied (oldFragment not found near offset).
      """
+     # Filter to only edit events (backwards compatible with old format)
+     from .load import is_edit_event
+     events = tuple(e for e in events if is_edit_event(e))
+
      # Read template content
      if normalize_newlines:
          template = _normalize_newlines(template)
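The body of `is_edit_event` lives in `.load` and is not part of this diff. Based only on the docstring change above ("only 'edit' events or events without type field are processed"), a stand-in might look like the sketch below. The function body and the `"focus"` event type in the sample data are assumptions for illustration.

```python
def is_edit_event(event: dict) -> bool:
    """Hypothetical stand-in: accept typed 'edit' events and untyped legacy events."""
    # A missing 'type' field defaults to 'edit', which keeps old recordings working.
    return event.get("type", "edit") == "edit"


events = (
    {"type": "edit", "offset": 0},
    {"type": "focus"},   # made-up non-edit event type
    {"offset": 5},       # legacy record without a 'type' field
)
edits = tuple(e for e in events if is_edit_event(e))
print(len(edits))  # 2
```

Filtering at the top of `reconstruct_file_from_events`, as the hunk does, means the rest of the function can keep assuming every event carries edit fields such as `offset`.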