assignment-codeval 0.0.9__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
```
Metadata-Version: 2.4
Name: assignment-codeval
Version: 0.0.9
Summary: CodEval for evaluating programming assignments
Requires-Python: >=3.12
Description-Content-Type: text/markdown
Requires-Dist: canvasapi==3.3.0
Requires-Dist: certifi==2021.10.8
Requires-Dist: charset-normalizer==2.0.9
Requires-Dist: click==8.2.1
Requires-Dist: configparser==5.2.0
Requires-Dist: idna==3.3
Requires-Dist: pytz==2021.3
Requires-Dist: requests==2.27.0
Requires-Dist: urllib3==1.26.7
Requires-Dist: pymongo==4.3.3
Requires-Dist: markdown==3.4.1
Requires-Dist: anthropic>=0.39.0
Requires-Dist: openai>=1.0.0
Requires-Dist: google-generativeai>=0.8.0
Provides-Extra: test
Requires-Dist: pytest>=7.0; extra == "test"
```

# CodEval

CodEval currently has four main components:
## 1. Test Simple I/O Programming Assignments on Canvas
### codeval.ini contents
```
[SERVER]
url=<canvas API>
token=<canvas token>
[RUN]
precommand=
command=
```

Refer to a sample codeval.ini file [here](samples/codeval.ini)

### Command to run:
`python3 codeval.py grade-submissions <a unique part of course name> [FLAGS]`

Example: if the course name on Canvas is CS 149 - Operating Systems, the command can be

`python3 codeval.py grade-submissions CS\ 149`

or

`python3 codeval.py grade-submissions "Operating Systems"`

Use a part of the course name that uniquely identifies the course on Canvas.

### Flags
- **--dry-run/--no-dry-run** (Optional)
  - Default: --dry-run
  - Do not update the results on Canvas. Print the results to the terminal instead.
- **--verbose/--no-verbose** (Optional)
  - Default: --no-verbose
  - Show detailed logs
- **--force/--no-force** (Optional)
  - Default: --no-force
  - Grade submissions even if already graded
- **--copytmpdir/--no-copytmpdir** (Optional)
  - Default: --no-copytmpdir
  - Copy temporary directory content to the current directory for debugging

### Specification Tags
Tags used in a spec file (`<course name>.codeval`)

| Tag | Meaning | Function |
|---|---|---|
| C | Compile Code | Specifies the command to compile the submission code |
| CTO | Compile Timeout | Timeout in seconds for the compile command to run |
| RUN | Run Script | Specifies the script used to evaluate the specification file. Defaults to evaluate.sh. |
| Z | Download Zip | Followed by zip files to download from Canvas for use when running the test cases. |
| CF | Check Function | Followed by a function name and a list of files; checks that the function is used by one of those files. |
| CC | Check Container | Followed by a container name and a list of files; checks that the container is used by one of those files. Primarily supports C++ containers such as std::vector |
| CO | Check Object | Followed by an object name and a list of files; checks that the object is used by one of those files. Primarily supports C++ stream operations |
| CMD/TCMD | Run Command | Followed by a command to run. TCMD causes the evaluation to fail if the command exits with an error. |
| CMP | Compare | Followed by two files to compare. |
| T/HT | Test Case | Followed by the command to run to test the submission. |
| I/IB/IF | Supply Input | Specifies the input for a test case. I adds a newline, IB does not add a newline, IF reads from a file. |
| O/OB/OF | Check Output | Specifies the expected output for a test case. O adds a newline, OB does not add a newline, OF reads from a file. |
| E/EB | Check Error | Specifies the expected error output for a test case. E adds a newline, EB does not. |
| TO | Timeout | Specifies the time limit in seconds for a test case to run. Defaults to 20 seconds. |
| X | Exit Code | Specifies the expected exit code for a test case. Defaults to zero. |
| SS | Start Server | Command containing a start timeout (how long to wait for the server to start), a kill timeout (how long to wait before killing the server), and the command to start the server |

Refer to a sample spec file [here](samples/assignment-name.codeval)
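
Putting a few of these tags together, a minimal spec for a hypothetical `adder` program might look like the sketch below. The program name, compile command, and test values are invented for illustration; see the linked sample for authoritative syntax.

```
C cc -o adder --std=gnu11 adder.c
CTO 30
T ./adder
I 2 3
O 5
TO 10
X 0
```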

## 2. Test Distributed Programming Assignments
### (or complex non I/O programs)
### codeval.ini contents
```
[SERVER]
url=<canvas API>
token=<canvas token>
[RUN]
precommand=
command=
dist_command=
host_ip=
[MONGO]
url=
db=
```

Refer to a sample codeval.ini file [here](samples/codeval.ini)

### Command to run
The command is the same as the [command in #1](#command-to-run):
`python3 codeval.py grade-submissions <a unique part of course name> [FLAGS]`

### Distributed Specification Tags

| Tag | Meaning | Function |
|---|---|---|
| --DT-- | Distributed Tests Begin | Marks the beginning of distributed tests. Used to determine whether the spec file has distributed tests |
| GTO | Global Timeout | A total timeout for all distributed tests, applied separately to homogeneous and heterogeneous tests. Homogeneous tests = GTO value. Heterogeneous tests = 2 * GTO value |
| PORTS | Exposed Ports Count | Maximum number of ports that need to be exposed per Docker container |
| ECMD/ECMDT SYNC/ASYNC | External Command | Command that runs in a controller container, emulating a host machine. ECMDT: evaluation fails if the command returns an error. SYNC: CodEval waits for the command to finish or fail. ASYNC: CodEval does not wait for the command to finish; failure is checked only if ECMDT |
| DTC $int [HOM] [HET] | Distributed Test Config Group | Marks the start of a new group of distributed tests. Replace $int with the number of containers that need to be started for the test group. HOM denotes homogeneous tests, i.e., the user's own submission is executed in the containers. HET denotes heterogeneous tests, i.e., a combination of $int - 1 other users' submissions and the current user's submission is executed in the containers. Either HOM or HET or both may be given |
| ICMD/ICMDT SYNC/ASYNC */n1,n2,n3... | Internal Command | Command that runs in each of the containers. ICMDT: evaluation fails if the command returns an error. SYNC: wait for the command to finish or fail. ASYNC: do not wait for the command to finish; failure is checked only if ICMDT. *: run the command in all containers. n1,n2,...,nx: run the command only in containers indexed n1,n2,...,nx. Containers follow zero-based indexing |
| TESTCMD | Test Command | Command run on the host machine to validate the submission(s) |
| --DTCLEAN-- | Cleanup Commands | Commands to execute after the tests have completed or failed. May contain only ECMD or ECMDT |

### Special placeholders in commands
| Placeholder | Usage |
| --- | --- |
| TEMP_DIR | Used in ECMD/ECMDT; replaced by the temporary directory generated by CodEval during execution |
| HOST_IP | Used in ECMD/ECMDT/ICMD/ICMDT; replaced by the host's IP specified in codeval.ini |
| USERNAME | Used in ICMD/ICMDT; replaced by the username of the user whose submission is being evaluated |
| PORT_$int | Used in ICMD/ICMDT; replaced by a port number assigned to the running Docker container. $int needs to be less than the PORTS value in the specification |

Refer to a sample spec file [here](samples/assignment-name.codeval)
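
To illustrate how the distributed tags and placeholders combine, here is a hypothetical test group sketch. The script names (`node`, `check_cluster.sh`) and all values are invented for illustration; consult the linked sample for authoritative syntax.

```
--DT--
GTO 120
PORTS 2
DTC 3 HOM HET
ICMDT SYNC * ./node --host HOST_IP --port PORT_0 --user USERNAME
TESTCMD ./check_cluster.sh
--DTCLEAN--
ECMD SYNC rm -rf TEMP_DIR/scratch
```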

### Notes
- The config file `codeval.ini` needs to contain the extra entries only if the tag `--DT--` exists in the specification file
- Distributed tests need a running MongoDB service to persist the progress of students running heterogeneous tests


## 3. Test SQL Assignments
### codeval.ini contents
```
[SERVER]
url=<canvas API>
token=<canvas token>
[RUN]
precommand=
command=
dist_command=
host_ip=
sql_command=
```

Refer to a sample codeval.ini file [here](SQL/samples/codeval.ini)

### Command to run
The command is the same as the [command in #1](#command-to-run):
`python3 codeval.py grade-submissions <a unique part of course name> [FLAGS]`

### SQL Specification Tags

| Tag | Meaning | Function |
|------------------|-------------------------|----------------------------------------------------------------------------------------------|
| --SQL-- | SQL Tests Begin | Marks the beginning of SQL tests. Used to determine whether the spec file has SQL-based tests |
| INSERT | Insert Rows in DB | Inserts rows into the SQL database using files or individual insert queries. |
| CONDITIONPRESENT | Check Condition in File | Validates that a required condition is present in the submission files. |
| SCHEMACHECK | Schema Check | Validates submission files for database-related checks such as constraints. |
| TSQL | SQL Test | Marks an SQL test; takes a file or an individual query as input and runs it against the submission files. |

Refer to a sample spec file [here](SQL/samples/ASSIGNMENT:CREATE.codeval)
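
As an illustration, an SQL spec fragment might combine these tags as sketched below. The file name, condition, and query are hypothetical; refer to the linked sample for the exact syntax each tag expects.

```
--SQL--
INSERT seed_rows.sql
CONDITIONPRESENT FOREIGN KEY
SCHEMACHECK
TSQL SELECT name FROM students ORDER BY id;
```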

### Notes
- The config file `codeval.ini` needs to contain the extra entries only if the tag `--SQL--` exists in the specification file
- SQL tests need a separate container image to run the tests in MySQL.


## Create an assignment on Canvas

### Command to create the assignment:
**Syntax:** `python3 codeval.py create-assignment <course_name> <specification_file> [ --dry-run/--no-dry-run ] [ --verbose/--no-verbose ] [ --group_name ]`

**Example:** `python3 codeval.py create-assignment "Practice1" 'a_big_bag_of_strings.txt' --no-dry-run --verbose --group_name "exam 2"`

### Command to grade the assignment:
**Syntax:** `python3 codeval.py grade-submissions <course_name> [ --dry-run/--no-dry-run ] [ --verbose/--no-verbose ] [ --force/--no-force ] [ --copytmpdir/--no-copytmpdir ]`

**Example:** `python3 codeval.py grade-submissions "Practice1" --no-dry-run --force --verbose`

### Assignment description tags

* `CRT_HW START <Assignment_name>` - usually at the beginning of the file. The lines that follow this tag are the assignment description in Markdown.

* `CRT_HW END` - ends the assignment description

## Assignment description macros

* `DISCSN_URL` - substituted with the URL of the discussion that was created for this assignment

* `EXMPLS <no_of_test_cases>` - replaced with the specified number of test cases formatted for display

* `FILE[file_name]` - replaced by a link to the specified file

### MODIFICATIONS REQUIRED IN THE SPECIFICATION FILE.
1) Start the specification file with the tag `CRT_HW START`, followed by a space, followed by the name of the assignment.
   For example: `CRT_HW START Hello World`
2) The lines after the first line contain the description of the assignment in Markdown format.
3) The description ends with a line containing just the tag `CRT_HW END`.
   For example: `CRT_HW END`
4) After this tag, the content for grading the submission begins.

#### Addition of the Discussion Topic in the assignment description.
1) Insert the macro `DISCSN_URL` wherever you want the corresponding discussion topic's link to appear.
   For example: `To access the discussion topic for this assignment, go here: DISCSN_URL`

#### Addition of sample examples in the assignment description.
1) Insert the tag EXMPLS followed by a single space followed by a value.
   The value is the number of test cases to be displayed as sample examples.
   At most it will print all the non-hidden test cases.
   For example: `EXMPLS 5`

#### Addition of links to files uploaded in the CodEval folder in the assignment description.
1) To add a hyperlink to a file, the Markdown format is as follows:
   `[file_name_to_be_displayed](Url_of_the_file)`
   In the parentheses, where the URL is required, insert the tag `FILE[name of file]`.
   For example: `FILE[file_name.extension]`
   If the file is not already in the CodEval folder, it will be extracted from a zip file in the CodEval spec and uploaded automatically.

### UPLOAD THE REQUIRED FILES IN THE CODEVAL FOLDER IN THE FILES SECTION.
1) Create a folder called `assignmentFiles` which should contain all the necessary files, including the specification file.

### EXAMPLE OF THE SPECIFICATION FILE.

```
CRT_HW START Bag Of Strings
# Description
## Problem Statement
- This Is An Example For The Description Of The Assignment In Markdown.
- To Download The File [Hello_World](URL_OF_HW "Helloworld.Txt")

## Sample Examples
EXMPLS 3

## Discussion Topic
Here Is The Link To The Discussion Topic: DISCSN_URL

### Rubric
| Cases | Points|
| ----- |----- |
| Base Points | 50 |

CRT_HW END

C cc -o bigbag --std=gnu11 bigbag.c
```

## 4. Test Assignments with AI Models

Test programming assignments against multiple AI models (Claude, GPT, Gemini) to benchmark their performance.

### Installation

Install the AI provider packages you want to use:

```bash
# Install all AI providers (quoted so shells don't expand the brackets)
pip install "assignment-codeval[ai]"

# Or install specific providers
pip install anthropic             # For Claude models
pip install openai                # For GPT models
pip install google-generativeai   # For Gemini models
```

### codeval.ini contents (optional)
```
[AI]
anthropic_key=sk-ant-...
openai_key=sk-...
google_key=...
```

API keys can also be provided via:
- Environment variables: `ANTHROPIC_API_KEY`, `OPENAI_API_KEY`, `GOOGLE_API_KEY`
- Command line options: `--anthropic-key`, `--openai-key`, `--google-key`

### Command to run
```bash
assignment-codeval test-with-ai <codeval_file> [OPTIONS]
```

### Options
| Option | Description |
|--------|-------------|
| `-o, --output-dir` | Directory to store solutions and results (default: `ai_test_results`) |
| `-n, --attempts` | Number of attempts per model (default: 1) |
| `-m, --models` | Specific models to test (can be used multiple times) |
| `-p, --providers` | Only test models from specific providers: `anthropic`, `openai`, `google` |
| `--anthropic-key` | Anthropic API key |
| `--openai-key` | OpenAI API key |
| `--google-key` | Google API key |

### Examples
```bash
# Test with all Anthropic models
assignment-codeval test-with-ai my_assignment.codeval -p anthropic

# Test with a specific model, 3 attempts each
assignment-codeval test-with-ai my_assignment.codeval -m "Claude Sonnet 4" -n 3

# Test with all providers (requires all API keys)
assignment-codeval test-with-ai my_assignment.codeval -n 2

# Pass an API key directly
assignment-codeval test-with-ai my_assignment.codeval --anthropic-key sk-ant-xxx -p anthropic
```

### Supported Models

| Provider | Models |
|----------|--------|
| Anthropic | Claude Sonnet 4, Claude Opus 4 |
| OpenAI | GPT-4o, GPT-4o Mini, o1, o3-mini |
| Google | Gemini 2.0 Flash, Gemini 1.5 Pro |

Note: You can add additional models using `-m "model-id"`. Check each provider's documentation for available model IDs.

### Output Structure
```
ai_test_results/
├── prompt.txt           # The prompt sent to AI models
├── results.json         # Summary of all results
├── Claude_Sonnet_4/
│   └── attempt_1/
│       ├── raw_response.txt   # Raw AI response
│       ├── solution.c         # Extracted code
│       └── <codeval files>    # Copied for evaluation
├── GPT-4o/
│   └── attempt_1/
│       └── ...
└── ...
```

### Notes
- The command extracts the assignment description from the codeval file (between the `CRT_HW START` and `CRT_HW END` tags)
- Support files from the `support_files/` directory are automatically copied for evaluation
- Results include pass/fail status, response time, and any errors
- Use multiple attempts (`-n`) to account for AI response variability
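
Since `results.json` summarizes pass/fail status and response times, it can be post-processed with a few lines of Python. The field names used below (`model`, `attempt`, `passed`, `response_time`) are assumptions for illustration and may not match the real schema; inspect your own `results.json` before adapting this.

```python
import json
from collections import defaultdict

# Hypothetical results.json content; the real schema may differ.
sample = json.loads("""
[
  {"model": "Claude Sonnet 4", "attempt": 1, "passed": true,  "response_time": 4.2},
  {"model": "Claude Sonnet 4", "attempt": 2, "passed": false, "response_time": 3.9},
  {"model": "GPT-4o",          "attempt": 1, "passed": true,  "response_time": 2.7}
]
""")

# Aggregate pass rate per model across attempts.
tally = defaultdict(lambda: [0, 0])  # model -> [passes, attempts]
for r in sample:
    tally[r["model"]][1] += 1
    tally[r["model"]][0] += r["passed"]

for model, (passes, attempts) in tally.items():
    print(f"{model}: {passes}/{attempts} passed")
```

With multiple attempts per model (`-n`), this kind of aggregation is what makes the variability of AI responses visible.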