assignment-codeval 0.0.9__tar.gz
- assignment_codeval-0.0.9/PKG-INFO +346 -0
- assignment_codeval-0.0.9/README.md +323 -0
- assignment_codeval-0.0.9/pyproject.toml +32 -0
- assignment_codeval-0.0.9/setup.cfg +4 -0
- assignment_codeval-0.0.9/src/assignment_codeval/__init__.py +0 -0
- assignment_codeval-0.0.9/src/assignment_codeval/ai_benchmark.py +528 -0
- assignment_codeval-0.0.9/src/assignment_codeval/canvas_utils.py +109 -0
- assignment_codeval-0.0.9/src/assignment_codeval/cli.py +23 -0
- assignment_codeval-0.0.9/src/assignment_codeval/commons.py +57 -0
- assignment_codeval-0.0.9/src/assignment_codeval/convertMD2Html.py +62 -0
- assignment_codeval-0.0.9/src/assignment_codeval/create_assignment.py +169 -0
- assignment_codeval-0.0.9/src/assignment_codeval/evaluate.py +936 -0
- assignment_codeval-0.0.9/src/assignment_codeval/file_utils.py +62 -0
- assignment_codeval-0.0.9/src/assignment_codeval/github_connect.py +103 -0
- assignment_codeval-0.0.9/src/assignment_codeval/submissions.py +244 -0
- assignment_codeval-0.0.9/src/assignment_codeval.egg-info/PKG-INFO +346 -0
- assignment_codeval-0.0.9/src/assignment_codeval.egg-info/SOURCES.txt +20 -0
- assignment_codeval-0.0.9/src/assignment_codeval.egg-info/dependency_links.txt +1 -0
- assignment_codeval-0.0.9/src/assignment_codeval.egg-info/entry_points.txt +2 -0
- assignment_codeval-0.0.9/src/assignment_codeval.egg-info/requires.txt +17 -0
- assignment_codeval-0.0.9/src/assignment_codeval.egg-info/top_level.txt +1 -0
- assignment_codeval-0.0.9/tests/test_codeval.py +35 -0
|
@@ -0,0 +1,346 @@
|
|
|
1
|
+
Metadata-Version: 2.4
|
|
2
|
+
Name: assignment-codeval
|
|
3
|
+
Version: 0.0.9
|
|
4
|
+
Summary: CodEval for evaluating programming assignments
|
|
5
|
+
Requires-Python: >=3.12
|
|
6
|
+
Description-Content-Type: text/markdown
|
|
7
|
+
Requires-Dist: canvasapi==3.3.0
|
|
8
|
+
Requires-Dist: certifi==2021.10.8
|
|
9
|
+
Requires-Dist: charset-normalizer==2.0.9
|
|
10
|
+
Requires-Dist: click==8.2.1
|
|
11
|
+
Requires-Dist: configparser==5.2.0
|
|
12
|
+
Requires-Dist: idna==3.3
|
|
13
|
+
Requires-Dist: pytz==2021.3
|
|
14
|
+
Requires-Dist: requests==2.27.0
|
|
15
|
+
Requires-Dist: urllib3==1.26.7
|
|
16
|
+
Requires-Dist: pymongo==4.3.3
|
|
17
|
+
Requires-Dist: markdown==3.4.1
|
|
18
|
+
Requires-Dist: anthropic>=0.39.0
|
|
19
|
+
Requires-Dist: openai>=1.0.0
|
|
20
|
+
Requires-Dist: google-generativeai>=0.8.0
|
|
21
|
+
Provides-Extra: test
|
|
22
|
+
Requires-Dist: pytest>=7.0; extra == "test"
|
|
23
|
+
|
|
24
|
+
# CodEval
|
|
25
|
+
|
|
26
|
+
Currently CodEval has 4 main components:
|
|
27
|
+
## 1. Test Simple I/O Programming Assignments on Canvas
|
|
28
|
+
### codeval.ini contents
|
|
29
|
+
```
|
|
30
|
+
[SERVER]
|
|
31
|
+
url=<canvas API>
|
|
32
|
+
token=<canvas token>
|
|
33
|
+
[RUN]
|
|
34
|
+
precommand=
|
|
35
|
+
command=
|
|
36
|
+
```
|
|
37
|
+
|
|
38
|
+
Refer to a sample codeval.ini file [here](samples/codeval.ini)
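As a rough illustration of how the keys listed above could be read, here is a minimal sketch using Python's standard `configparser`. The URL and token values are placeholders, and `load_settings` is a hypothetical helper, not a function from this package.

```python
import configparser

# Sample text mirroring the documented keys; the values are placeholders.
SAMPLE_INI = """\
[SERVER]
url=https://canvas.example.edu
token=example-token
[RUN]
precommand=
command=
"""


def load_settings(text: str) -> dict:
    """Parse codeval.ini-style text and return the four documented settings."""
    # interpolation=None keeps any '%' characters in commands literal.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return {
        "url": parser.get("SERVER", "url"),
        "token": parser.get("SERVER", "token"),
        # Empty values are legal in INI files and come back as "".
        "precommand": parser.get("RUN", "precommand", fallback=""),
        "command": parser.get("RUN", "command", fallback=""),
    }


settings = load_settings(SAMPLE_INI)
print(settings["url"])
```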
|
|
39
|
+
|
|
40
|
+
### Command to run:
|
|
41
|
+
`python3 codeval.py grade-submissions <a unique part of course name> [FLAGS]`
|
|
42
|
+
Example:
|
|
43
|
+
If the course name on Canvas is CS 149 - Operating Systems, the command can be:
|
|
44
|
+
`python3 codeval.py grade-submissions CS\ 149`
|
|
45
|
+
or
|
|
46
|
+
`python3 codeval.py grade-submissions "Operating Systems"`
|
|
47
|
+
Use a part of the course name that can uniquely identify the course on Canvas.
|
|
48
|
+
|
|
49
|
+
### Flags
|
|
50
|
+
- **--dry-run/--no-dry-run** (Optional)
|
|
51
|
+
- Default: --dry-run
|
|
52
|
+
- Do not update the results on Canvas. Print the results to the terminal instead.
|
|
53
|
+
- **--verbose/--no-verbose** (Optional)
|
|
54
|
+
- Default: --no-verbose
|
|
55
|
+
- Show detailed logs
|
|
56
|
+
- **--force/--no-force** (Optional)
|
|
57
|
+
- Default: --no-force
|
|
58
|
+
- Grade submissions even if already graded
|
|
59
|
+
- **--copytmpdir/--no-copytmpdir** (Optional)
|
|
60
|
+
- Default: --no-copytmpdir
|
|
61
|
+
- Copy temporary directory content to current directory for debugging
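The paired on/off flags and their defaults can be modeled as follows. This sketch uses the standard library's `argparse.BooleanOptionalAction`, which is an assumption made for illustration; the package may declare its flags differently, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the four documented flags with their documented defaults."""
    parser = argparse.ArgumentParser(prog="codeval-flags-sketch")
    parser.add_argument("course", help="a unique part of the course name")
    # BooleanOptionalAction generates both --name and --no-name.
    parser.add_argument("--dry-run", action=argparse.BooleanOptionalAction, default=True)
    parser.add_argument("--verbose", action=argparse.BooleanOptionalAction, default=False)
    parser.add_argument("--force", action=argparse.BooleanOptionalAction, default=False)
    parser.add_argument("--copytmpdir", action=argparse.BooleanOptionalAction, default=False)
    return parser


args = build_parser().parse_args(["CS 149", "--no-dry-run", "--verbose"])
print(args.dry_run, args.verbose, args.force, args.copytmpdir)
```

Because `--dry-run` is the default, results are only written back to Canvas when `--no-dry-run` is passed explicitly.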
|
|
62
|
+
|
|
63
|
+
### Specification Tags
|
|
64
|
+
Tags used in a spec file (\<course name>.codeval)
|
|
65
|
+
|
|
66
|
+
| Tag | Meaning | Function |
|
|
67
|
+
|---|---|---|
|
|
68
|
+
| C | Compile Code | Specifies the command to compile the submission code |
|
|
69
|
+
| CTO | Compile Timeout | Timeout in seconds for the compile command to run |
|
|
70
|
+
| RUN | Run Script | Specifies the script to use to evaluate the specification file. Defaults to evaluate.sh. |
|
|
71
|
+
| Z | Download Zip | Will be followed by zip files to download from Canvas to use when running the test cases. |
|
|
72
|
+
| CF | Check Function | Will be followed by a function name and a list of files to check to ensure that the function is used by one of those files. |
|
|
73
|
+
| CC | Check Container | Will be followed by a function name and a list of files to check to ensure that a container is used by one of those files. Primarily supports C++ containers such as std::vector |
|
|
74
|
+
| CO | Check Object | Will be followed by a function name and a list of files to check to ensure that an object is used by one of those files. Primarily supports C++ stream operations |
|
|
75
|
+
| CMD/TCMD | Run Command | Will be followed by a command to run. The TCMD will cause the evaluation to fail if the command exits with an error. |
|
|
76
|
+
| CMP | Compare | Will be followed by two files to compare. |
|
|
77
|
+
| T/HT | Test Case | Will be followed by the command to run to test the submission. |
|
|
78
|
+
| I/IB/IF | Supply Input | Specifies the input for a test case. I adds a newline, IB does not add a newline, IF reads from a file. |
|
|
79
|
+
| O/OB/OF | Check Output | Specifies the expected output for a test case. O adds a newline, OB does not add a newline, OF reads from a file. |
|
|
80
|
+
| E/EB | Check Error | Specifies the expected error output for a test case. E adds a newline, EB does not. |
|
|
81
|
+
| TO | Timeout | Specifies the time limit in seconds for a test case to run. Defaults to 20 seconds. |
|
|
82
|
+
| X | Exit Code | Specifies the expected exit code for a test case. Defaults to zero. |
|
|
83
|
+
| SS | Start Server | Command containing timeout (wait until server starts), kill timeout (wait to kill the server), and the command to start a server |
|
|
84
|
+
|
|
85
|
+
Refer to a sample spec file [here](samples/assignment-name.codeval)
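To make the tag table above more concrete, the sketch below splits spec lines into a tag and its argument. It assumes each line has the form `TAG rest-of-line`, which this README implies but does not state; `parse_spec_line` is a hypothetical helper, and the `T`, `I`, `O`, and `X` sample lines are invented for illustration.

```python
# Tags taken from the table above; the "TAG rest-of-line" layout is an assumption.
KNOWN_TAGS = {
    "C", "CTO", "RUN", "Z", "CF", "CC", "CO", "CMD", "TCMD", "CMP",
    "T", "HT", "I", "IB", "IF", "O", "OB", "OF", "E", "EB", "TO", "X", "SS",
}


def parse_spec_line(line: str):
    """Split one spec line into (tag, argument); return None for non-tag lines."""
    tag, _, argument = line.rstrip("\n").partition(" ")
    if tag not in KNOWN_TAGS:
        return None
    return tag, argument


sample = [
    "C cc -o bigbag --std=gnu11 bigbag.c",
    "T ./bigbag",
    "I hello",
    "O hello",
    "X 0",
]
parsed = [parse_spec_line(line) for line in sample]
for entry in parsed:
    print(entry)
```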
|
|
86
|
+
|
|
87
|
+
## 2. Test Distributed Programming Assignments
|
|
88
|
+
### (or complex non I/O programs)
|
|
89
|
+
### codeval.ini contents
|
|
90
|
+
```
|
|
91
|
+
[SERVER]
|
|
92
|
+
url=<canvas API>
|
|
93
|
+
token=<canvas token>
|
|
94
|
+
[RUN]
|
|
95
|
+
precommand=
|
|
96
|
+
command=
|
|
97
|
+
dist_command=
|
|
98
|
+
host_ip=
|
|
99
|
+
[MONGO]
|
|
100
|
+
url=
|
|
101
|
+
db=
|
|
102
|
+
```
|
|
103
|
+
|
|
104
|
+
Refer to a sample codeval.ini file [here](samples/codeval.ini)
|
|
105
|
+
|
|
106
|
+
### Command to run
|
|
107
|
+
is the same as the [command in #1](#command-to-run):
|
|
108
|
+
`python3 codeval.py grade-submissions <a unique part of course name> [FLAGS]`
|
|
109
|
+
|
|
110
|
+
### Distributed Specification Tags
|
|
111
|
+
|
|
112
|
+
| Tag | Meaning | Function |
|
|
113
|
+
|---|---|---|
|
|
114
|
+
| --DT-- | Distributed Tests Begin | Marks the beginning of distributed tests. Is used to determine if the spec file has distributed tests |
|
|
115
|
+
| GTO | Global timeout | A total timeout for all distributed tests, for each of homogenous and heterogenous tests. Homogenous tests = GTO value. Heterogenous tests = 2 * GTO value |
|
|
116
|
+
| PORTS | Exposed ports count | Maximum number of ports needed to expose per docker container |
|
|
117
|
+
| ECMD/ECMDT SYNC/ASYNC | External Command | Command that runs in a controller container, emulating a host machine. ECMDT: Evaluation fails if the command returns an error. SYNC: CodEval waits for the command to execute or fail. ASYNC: CodEval doesn't wait for the command to execute; failure is checked if ECMDT |
|
|
118
|
+
| DTC $int [HOM] [HET] | Distributed Test Config Group | Signifies the start of a new group of distributed tests. Replace $int with the number of containers that need to be started for the test group. HOM denotes homogenous tests, i.e., the user's own submissions will be executed in the containers. HET denotes heterogenous tests, i.e., a combination of $int - 1 other users' and the current user's submissions will be executed in the containers. Can enter either HOM or HET or both |
|
|
119
|
+
| ICMD/ICMDT SYNC/ASYNC */n1,n2,n3... | Internal Command | Command that runs in each of the containers. ICMDT: Evaluation fails if the command returns an error. SYNC: Wait for the command to execute or fail. ASYNC: Don't wait for the command to execute; failure is checked if ICMDT. *: Run the command in all the containers. n1,n2..nx: Run the command in containers indexed n1,n2..nx only. Containers follow zero-based indexing |
|
|
120
|
+
| TESTCMD | Test Command | Command run on the host machine to validate the submission(s) |
|
|
121
|
+
| --DTCLEAN-- | Cleanup Commands | Commands to execute after the tests have completed or failed. Can contain only ECMD or ECMDT |
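Two rules from the table above lend themselves to a short sketch: the `*` or `n1,n2,...` container selector of ICMD/ICMDT (zero-based), and the GTO rule that heterogenous tests get twice the homogenous timeout. Both function names are hypothetical, and the validation behavior is an assumption rather than the package's documented behavior.

```python
def resolve_targets(selector: str, container_count: int) -> list[int]:
    """Turn '*' or 'n1,n2,...' into a sorted list of zero-based container indexes."""
    if selector == "*":
        return list(range(container_count))
    indexes = sorted({int(part) for part in selector.split(",")})
    for index in indexes:
        if not 0 <= index < container_count:
            raise ValueError(
                f"container index {index} is outside 0..{container_count - 1}"
            )
    return indexes


def global_timeouts(gto: float) -> dict:
    """Apply the GTO rule: HOM tests get GTO, HET tests get 2 * GTO."""
    return {"HOM": gto, "HET": 2 * gto}


print(resolve_targets("*", 3))
print(resolve_targets("0,2", 3))
print(global_timeouts(60))
```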
|
|
122
|
+
|
|
123
|
+
### Special placeholders in commands
|
|
124
|
+
| Placeholder | Usage |
|
|
125
|
+
| --- | --- |
|
|
126
|
+
| TEMP_DIR | used in ECMD/ECMDT to be replaced by the temporary directory generated by CodEval during execution |
|
|
127
|
+
| HOST_IP | used in ECMD/ECMDT/ICMD/ICMDT to be replaced by the host's IP specified in codeval.ini |
|
|
128
|
+
| USERNAME | used in ICMD/ICMDT to be replaced by the user's username whose submission is being evaluated |
|
|
129
|
+
| PORT_$int | used in ICMD/ICMDT to be replaced by a port number assigned to the running docker container. $int needs to be < PORTS value in the specification |
|
|
130
|
+
|
|
131
|
+
Refer to a sample spec file [here](samples/assignment-name.codeval)
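The placeholder substitution described in the table above could look roughly like this. For brevity the sketch applies all four placeholders to a single command, whereas the table restricts some of them to ECMD or ICMD; it also assumes `PORT_$int` is a zero-based index into the container's assigned ports. `fill_placeholders` and every sample value are hypothetical.

```python
import re


def fill_placeholders(command: str, temp_dir: str, host_ip: str,
                      username: str, ports: list[int]) -> str:
    """Replace the documented placeholders in a command string."""
    def port_for(match: re.Match) -> str:
        index = int(match.group(1))
        # PORT_$int must stay below the PORTS count from the spec.
        if index >= len(ports):
            raise ValueError(f"PORT_{index} exceeds the {len(ports)} exposed ports")
        return str(ports[index])

    command = re.sub(r"PORT_(\d+)", port_for, command)
    return (command.replace("TEMP_DIR", temp_dir)
                   .replace("HOST_IP", host_ip)
                   .replace("USERNAME", username))


cmd = fill_placeholders(
    "./server --bind HOST_IP:PORT_0 --user USERNAME",
    temp_dir="/tmp/codeval-demo",
    host_ip="192.0.2.10",
    username="student1",
    ports=[5000, 5001],
)
print(cmd)
```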
|
|
132
|
+
|
|
133
|
+
### Notes
|
|
134
|
+
- The config file `codeval.ini` needs to contain the extra entries only if the tag `--DT--` exists in the specification file
|
|
135
|
+
- Distributed tests need a running MongoDB service to persist the progress of students running heterogenous tests
|
|
136
|
+
|
|
137
|
+
|
|
138
|
+
## 3. Test SQL Assignments
|
|
139
|
+
### codeval.ini contents
|
|
140
|
+
```
|
|
141
|
+
[SERVER]
|
|
142
|
+
url=<canvas API>
|
|
143
|
+
token=<canvas token>
|
|
144
|
+
[RUN]
|
|
145
|
+
precommand=
|
|
146
|
+
command=
|
|
147
|
+
dist_command=
|
|
148
|
+
host_ip=
|
|
149
|
+
sql_command=
|
|
150
|
+
```
|
|
151
|
+
|
|
152
|
+
Refer to a sample codeval.ini file [here](SQL/samples/codeval.ini)
|
|
153
|
+
|
|
154
|
+
### Command to run
|
|
155
|
+
is the same as the [command in #1](#command-to-run):
|
|
156
|
+
`python3 codeval.py grade-submissions <a unique part of course name> [FLAGS]`
|
|
157
|
+
|
|
158
|
+
### SQL Specification Tags
|
|
159
|
+
|
|
160
|
+
| Tag | Meaning | Function |
|
|
161
|
+
|------------------|-------------------------|----------------------------------------------------------------------------------------------|
|
|
162
|
+
| --SQL-- | SQL Tests Begin | Marks the beginning of SQL tests. Is used to determine if the spec file has SQL based tests |
|
|
163
|
+
| INSERT | Insert rows in DB | Insert rows in the SQL database using files or individual insert queries. |
|
|
164
|
+
| CONDITIONPRESENT | Check condition in file | Validate submission files for a required condition to be present in submissions. |
|
|
165
|
+
| SCHEMACHECK | External Command | Validate submission files for database related checks like constraints. |
|
|
166
|
+
| TSQL | SQL Test | Marks an SQL test; takes a file or an individual query as input and runs it against the submission files. |
|
|
167
|
+
|
|
168
|
+
Refer to a sample spec file [here](SQL/samples/ASSIGNMENT:CREATE.codeval)
|
|
169
|
+
|
|
170
|
+
### Notes
|
|
171
|
+
- The config file `codeval.ini` needs to contain the extra entries only if the tag `--SQL--` exists in the specification file
|
|
172
|
+
- SQL tests need a separate container image in order to run in MySQL.
|
|
173
|
+
|
|
174
|
+
|
|
175
|
+
## Create an assignment on Canvas
|
|
176
|
+
|
|
177
|
+
### Command to create the assignment:
|
|
178
|
+
**Syntax:** `python3 codeval.py create-assignment <course_name> <specification_file> [ --dry-run/--no-dry-run ] [ --verbose/--no-verbose ] [ --group_name ]`
|
|
179
|
+
**Example:** `python3 codeval.py create-assignment "Practice1" 'a_big_bag_of_strings.txt' --no-dry-run --verbose --group_name "exam 2"`
|
|
180
|
+
|
|
181
|
+
### Command to grade the assignment:
|
|
182
|
+
**Syntax:** `python3 codeval.py grade-submissions <course_name> [ --dry-run/--no-dry-run ] [ --verbose/--no-verbose ] [ --force/--no-force ] [ --copytmpdir/--no-copytmpdir ]`
|
|
183
|
+
**Example:** `python3 codeval.py grade-submissions "Practice1" --no-dry-run --force --verbose`
|
|
184
|
+
|
|
185
|
+
### Assignment description tags
|
|
186
|
+
|
|
187
|
+
* CRT_HW START <Assignment_name> - usually at the beginning of the file. The lines that follow this tag are the assignment description in Markdown.
|
|
188
|
+
|
|
189
|
+
* CRT_HW END - ends the assignment description
|
|
190
|
+
|
|
191
|
+
## Assignment description macros
|
|
192
|
+
|
|
193
|
+
* DISCSN_URL - this macro will be substituted with the URL of the discussion that was created for this assignment
|
|
194
|
+
|
|
195
|
+
* EXMPLS <no_of_test_cases> - this macro will be replaced with the specified number of test cases formatted for display
|
|
196
|
+
|
|
197
|
+
* FILE[file_name] - this macro will be replaced by a link to the specified file
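The three macros above amount to text substitution over the description. The sketch below is one plausible way to do it; `expand_macros`, the example strings, and the URLs are hypothetical, and the real tool builds the example text and file links from Canvas and the spec file rather than from arguments.

```python
import re


def expand_macros(description: str, discussion_url: str,
                  examples: list[str], file_urls: dict[str, str]) -> str:
    """Expand DISCSN_URL, EXMPLS <n>, and FILE[name] in a description."""
    def examples_block(match: re.Match) -> str:
        count = int(match.group(1))
        # Slicing never yields more examples than are available.
        return "\n".join(examples[:count])

    def file_link(match: re.Match) -> str:
        return file_urls[match.group(1)]

    text = description.replace("DISCSN_URL", discussion_url)
    text = re.sub(r"EXMPLS (\d+)", examples_block, text)
    text = re.sub(r"FILE\[([^\]]+)\]", file_link, text)
    return text


result = expand_macros(
    "Discuss at DISCSN_URL\nEXMPLS 2\n[data](FILE[data.txt])",
    discussion_url="https://canvas.example.edu/discussion/1",
    examples=["example A", "example B", "example C"],
    file_urls={"data.txt": "https://canvas.example.edu/files/data.txt"},
)
print(result)
```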
|
|
198
|
+
|
|
199
|
+
### MODIFICATIONS REQUIRED IN THE SPECIFICATION FILE.
|
|
200
|
+
1) Start the specification file with the tag CRT_HW START followed by a space followed by the name of the assignment.
|
|
201
|
+
``` For ex: CRT_HW START Hello World```
|
|
202
|
+
2) The following lines after the first line will contain the description of the assignment in Markdown format.
|
|
203
|
+
3) The description ends with the last line containing just the tag CRT_HW END .
|
|
204
|
+
``` For ex: CRT_HW END ```
|
|
205
|
+
4) After this tag, the content for grading the submission begins.
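The four steps above describe a file with a description block followed by grading content. A minimal sketch of splitting the two, assuming `CRT_HW START` is on the very first line and `CRT_HW END` sits on a line of its own (`split_spec` is a hypothetical helper):

```python
def split_spec(text: str):
    """Return (assignment_name, description, grading_lines) from a spec file's text."""
    lines = text.splitlines()
    prefix = "CRT_HW START "
    if not lines or not lines[0].startswith(prefix):
        raise ValueError("spec must start with 'CRT_HW START <name>'")
    name = lines[0][len(prefix):].strip()
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "CRT_HW END":
            # Everything between the tags is the Markdown description.
            return name, "\n".join(lines[1:i]), lines[i + 1:]
    raise ValueError("missing 'CRT_HW END'")


spec = """CRT_HW START Bag Of Strings
# Description
EXMPLS 3
CRT_HW END
C cc -o bigbag --std=gnu11 bigbag.c
"""
name, description, grading = split_spec(spec)
print(name)
print(grading)
```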
|
|
206
|
+
|
|
207
|
+
Addition of the Discussion Topic in the assignment description.
|
|
208
|
+
1) Insert the tag DISCUSSION_LINK wherever you want the corresponding discussion topic's link to appear.
|
|
209
|
+
```For ex: To access the discussion topic for this assignment you go here DISCUSSION_LINK```
|
|
210
|
+
|
|
211
|
+
#### Addition of sample examples in the assignment description.
|
|
212
|
+
1) Insert the tag EXMPLS followed by a single space followed by the value.
|
|
213
|
+
Here value is the number of test cases to be displayed as sample examples.
|
|
214
|
+
At most, it prints all the non-hidden test cases.
|
|
215
|
+
For ex: EXMPLS 5
|
|
216
|
+
#### Addition of the links to the files uploaded in the Codeval folder in the assignment description.
|
|
217
|
+
1) To add a hyperlink to a file, use the following Markdown format:
|
|
218
|
+
[file_name_to_be_displayed](Url_of_the_file)
|
|
219
|
+
Inside the parentheses, where the URL is required, insert the tag
|
|
220
|
+
FILE[name of file].
|
|
221
|
+
For ex: FILE[file_name.extension]
|
|
222
|
+
If the file is not already in the Codeval folder, it will be extracted from a zip file in the
|
|
223
|
+
CodEval spec and uploaded automatically.
|
|
224
|
+
|
|
225
|
+
### UPLOAD THE REQUIRED FILES IN CODEVAL FOLDER IN FILES SECTION.
|
|
226
|
+
1) Create a folder called `assignmentFiles` which should contain all the necessary files including
|
|
227
|
+
the specification file.
|
|
228
|
+
|
|
229
|
+
### EXAMPLE OF THE SPECIFICATION FILE.
|
|
230
|
+
|
|
231
|
+
CRT_HW START Bag Of Strings
|
|
232
|
+
# Description
|
|
233
|
+
## Problem Statement
|
|
234
|
+
- This Is An Example For The Description Of The Assignment In Markdown.
|
|
235
|
+
- To Download The File [Hello_World](URL_OF_HW "Helloworld.Txt")
|
|
236
|
+
|
|
237
|
+
## Sample Examples
|
|
238
|
+
EXMPLS 3
|
|
239
|
+
|
|
240
|
+
## Discussion Topic
|
|
241
|
+
Here Is The Link To The Discussion Topic: DISCSN_URL
|
|
242
|
+
|
|
243
|
+
### Rubric
|
|
244
|
+
| Cases | Points|
|
|
245
|
+
| ----- |----- |
|
|
246
|
+
| Base Points | 50 |
|
|
247
|
+
|
|
248
|
+
CRT_HW END
|
|
249
|
+
|
|
250
|
+
C cc -o bigbag --std=gnu11 bigbag.c
|
|
251
|
+
|
|
252
|
+
|
|
253
|
+
## 4. Test Assignments with AI Models
|
|
254
|
+
|
|
255
|
+
Test programming assignments against multiple AI models (Claude, GPT, Gemini) to benchmark their performance.
|
|
256
|
+
|
|
257
|
+
### Installation
|
|
258
|
+
|
|
259
|
+
Install the AI provider packages you want to use:
|
|
260
|
+
|
|
261
|
+
```bash
|
|
262
|
+
# Install all AI providers
|
|
263
|
+
pip install assignment-codeval[ai]
|
|
264
|
+
|
|
265
|
+
# Or install specific providers
|
|
266
|
+
pip install anthropic # For Claude models
|
|
267
|
+
pip install openai # For GPT models
|
|
268
|
+
pip install google-generativeai # For Gemini models
|
|
269
|
+
```
|
|
270
|
+
|
|
271
|
+
### codeval.ini contents (optional)
|
|
272
|
+
```
|
|
273
|
+
[AI]
|
|
274
|
+
anthropic_key=sk-ant-...
|
|
275
|
+
openai_key=sk-...
|
|
276
|
+
google_key=...
|
|
277
|
+
```
|
|
278
|
+
|
|
279
|
+
API keys can also be provided via:
|
|
280
|
+
- Environment variables: `ANTHROPIC_API_KEY`, `OPENAI_API_KEY`, `GOOGLE_API_KEY`
|
|
281
|
+
- Command line options: `--anthropic-key`, `--openai-key`, `--google-key`
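With three possible sources for each key, some lookup order is needed. The README does not state one, so the order below (command line, then environment, then `codeval.ini`) is purely an assumption for illustration; `resolve_key` and the sample key values are hypothetical.

```python
import configparser
import os

ENV_NAMES = {
    "anthropic": "ANTHROPIC_API_KEY",
    "openai": "OPENAI_API_KEY",
    "google": "GOOGLE_API_KEY",
}

INI = """\
[AI]
anthropic_key=ini-anthropic
google_key=ini-google
"""


def resolve_key(provider, cli_value, ini_text, environ=None):
    """Return the first key found: CLI option, environment, then [AI] section."""
    environ = os.environ if environ is None else environ
    if cli_value:
        return cli_value
    env_value = environ.get(ENV_NAMES[provider])
    if env_value:
        return env_value
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    # fallback=None also covers a missing [AI] section.
    return parser.get("AI", f"{provider}_key", fallback=None)


print(resolve_key("google", None, INI, environ={}))
```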
|
|
282
|
+
|
|
283
|
+
### Command to run
|
|
284
|
+
```bash
|
|
285
|
+
assignment-codeval test-with-ai <codeval_file> [OPTIONS]
|
|
286
|
+
```
|
|
287
|
+
|
|
288
|
+
### Options
|
|
289
|
+
| Option | Description |
|
|
290
|
+
|--------|-------------|
|
|
291
|
+
| `-o, --output-dir` | Directory to store solutions and results (default: `ai_test_results`) |
|
|
292
|
+
| `-n, --attempts` | Number of attempts per model (default: 1) |
|
|
293
|
+
| `-m, --models` | Specific models to test (can be used multiple times) |
|
|
294
|
+
| `-p, --providers` | Only test models from specific providers: `anthropic`, `openai`, `google` |
|
|
295
|
+
| `--anthropic-key` | Anthropic API key |
|
|
296
|
+
| `--openai-key` | OpenAI API key |
|
|
297
|
+
| `--google-key` | Google API key |
|
|
298
|
+
|
|
299
|
+
### Examples
|
|
300
|
+
```bash
|
|
301
|
+
# Test with all Anthropic models
|
|
302
|
+
assignment-codeval test-with-ai my_assignment.codeval -p anthropic
|
|
303
|
+
|
|
304
|
+
# Test with specific model, 3 attempts each
|
|
305
|
+
assignment-codeval test-with-ai my_assignment.codeval -m "Claude Sonnet 4" -n 3
|
|
306
|
+
|
|
307
|
+
# Test with all providers (requires all API keys)
|
|
308
|
+
assignment-codeval test-with-ai my_assignment.codeval -n 2
|
|
309
|
+
|
|
310
|
+
# Pass API key directly
|
|
311
|
+
assignment-codeval test-with-ai my_assignment.codeval --anthropic-key sk-ant-xxx -p anthropic
|
|
312
|
+
```
|
|
313
|
+
|
|
314
|
+
### Supported Models
|
|
315
|
+
|
|
316
|
+
| Provider | Models |
|
|
317
|
+
|----------|--------|
|
|
318
|
+
| Anthropic | Claude Sonnet 4, Claude Opus 4 |
|
|
319
|
+
| OpenAI | GPT-4o, GPT-4o Mini, o1, o3-mini |
|
|
320
|
+
| Google | Gemini 2.0 Flash, Gemini 1.5 Pro |
|
|
321
|
+
|
|
322
|
+
Note: You can add additional models using `-m "model-id"`. Check each provider's documentation for available model IDs.
|
|
323
|
+
|
|
324
|
+
### Output Structure
|
|
325
|
+
```
|
|
326
|
+
ai_test_results/
|
|
327
|
+
├── prompt.txt # The prompt sent to AI models
|
|
328
|
+
├── results.json # Summary of all results
|
|
329
|
+
├── Claude_Sonnet_4/
|
|
330
|
+
│ └── attempt_1/
|
|
331
|
+
│ ├── raw_response.txt # Raw AI response
|
|
332
|
+
│ ├── solution.c # Extracted code
|
|
333
|
+
│ └── <codeval files> # Copied for evaluation
|
|
334
|
+
├── GPT-4o/
|
|
335
|
+
│ └── attempt_1/
|
|
336
|
+
│ └── ...
|
|
337
|
+
└── ...
|
|
338
|
+
```
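Judging only from the tree above, each attempt appears to live under a directory named after the model with spaces replaced by underscores. The helper below encodes that guess; `attempt_dir` is hypothetical and the naming rule is inferred from the example, not confirmed by the package.

```python
from pathlib import Path


def attempt_dir(output_dir: str, model_name: str, attempt: int) -> Path:
    """Build <output_dir>/<Model_Name>/attempt_<n> as shown in the tree above."""
    return Path(output_dir) / model_name.replace(" ", "_") / f"attempt_{attempt}"


path = attempt_dir("ai_test_results", "Claude Sonnet 4", 1)
print(path.as_posix())
```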
|
|
339
|
+
|
|
340
|
+
### Notes
|
|
341
|
+
- The command extracts the assignment description from the codeval file (between `CRT_HW START` and `CRT_HW END` tags)
|
|
342
|
+
- Support files from `support_files/` directory are automatically copied for evaluation
|
|
343
|
+
- Results include pass/fail status, response time, and any errors
|
|
344
|
+
- Use multiple attempts (`-n`) to account for AI response variability
|
|
345
|
+
|
|
346
|
+
|
|
@@ -0,0 +1,323 @@
|
|
|
1
|
+
# CodEval
|
|
2
|
+
|
|
3
|
+
Currently CodEval has 4 main components:
|
|
4
|
+
## 1. Test Simple I/O Programming Assignments on Canvas
|
|
5
|
+
### codeval.ini contents
|
|
6
|
+
```
|
|
7
|
+
[SERVER]
|
|
8
|
+
url=<canvas API>
|
|
9
|
+
token=<canvas token>
|
|
10
|
+
[RUN]
|
|
11
|
+
precommand=
|
|
12
|
+
command=
|
|
13
|
+
```
|
|
14
|
+
|
|
15
|
+
Refer to a sample codeval.ini file [here](samples/codeval.ini)
|
|
16
|
+
|
|
17
|
+
### Command to run:
|
|
18
|
+
`python3 codeval.py grade-submissions <a unique part of course name> [FLAGS]`
|
|
19
|
+
Example:
|
|
20
|
+
If the course name on Canvas is CS 149 - Operating Systems, the command can be:
|
|
21
|
+
`python3 codeval.py grade-submissions CS\ 149`
|
|
22
|
+
or
|
|
23
|
+
`python3 codeval.py grade-submissions "Operating Systems"`
|
|
24
|
+
Use a part of the course name that can uniquely identify the course on Canvas.
|
|
25
|
+
|
|
26
|
+
### Flags
|
|
27
|
+
- **--dry-run/--no-dry-run** (Optional)
|
|
28
|
+
- Default: --dry-run
|
|
29
|
+
- Do not update the results on Canvas. Print the results to the terminal instead.
|
|
30
|
+
- **--verbose/--no-verbose** (Optional)
|
|
31
|
+
- Default: --no-verbose
|
|
32
|
+
- Show detailed logs
|
|
33
|
+
- **--force/--no-force** (Optional)
|
|
34
|
+
- Default: --no-force
|
|
35
|
+
- Grade submissions even if already graded
|
|
36
|
+
- **--copytmpdir/--no-copytmpdir** (Optional)
|
|
37
|
+
- Default: --no-copytmpdir
|
|
38
|
+
- Copy temporary directory content to current directory for debugging
|
|
39
|
+
|
|
40
|
+
### Specification Tags
|
|
41
|
+
Tags used in a spec file (\<course name>.codeval)
|
|
42
|
+
|
|
43
|
+
| Tag | Meaning | Function |
|
|
44
|
+
|---|---|---|
|
|
45
|
+
| C | Compile Code | Specifies the command to compile the submission code |
|
|
46
|
+
| CTO | Compile Timeout | Timeout in seconds for the compile command to run |
|
|
47
|
+
| RUN | Run Script | Specifies the script to use to evaluate the specification file. Defaults to evaluate.sh. |
|
|
48
|
+
| Z | Download Zip | Will be followed by zip files to download from Canvas to use when running the test cases. |
|
|
49
|
+
| CF | Check Function | Will be followed by a function name and a list of files to check to ensure that the function is used by one of those files. |
|
|
50
|
+
| CC | Check Container | Will be followed by a function name and a list of files to check to ensure that a container is used by one of those files. Primarily supports C++ containers such as std::vector |
|
|
51
|
+
| CO | Check Object | Will be followed by a function name and a list of files to check to ensure that an object is used by one of those files. Primarily supports C++ stream operations |
|
|
52
|
+
| CMD/TCMD | Run Command | Will be followed by a command to run. The TCMD will cause the evaluation to fail if the command exits with an error. |
|
|
53
|
+
| CMP | Compare | Will be followed by two files to compare. |
|
|
54
|
+
| T/HT | Test Case | Will be followed by the command to run to test the submission. |
|
|
55
|
+
| I/IB/IF | Supply Input | Specifies the input for a test case. I adds a newline, IB does not add a newline, IF reads from a file. |
|
|
56
|
+
| O/OB/OF | Check Output | Specifies the expected output for a test case. O adds a newline, OB does not add a newline, OF reads from a file. |
|
|
57
|
+
| E/EB | Check Error | Specifies the expected error output for a test case. E adds a newline, EB does not. |
|
|
58
|
+
| TO | Timeout | Specifies the time limit in seconds for a test case to run. Defaults to 20 seconds. |
|
|
59
|
+
| X | Exit Code | Specifies the expected exit code for a test case. Defaults to zero. |
|
|
60
|
+
| SS | Start Server | Command containing timeout (wait until server starts), kill timeout (wait to kill the server), and the command to start a server |
|
|
61
|
+
|
|
62
|
+
Refer to a sample spec file [here](samples/assignment-name.codeval)
|
|
63
|
+
|
|
64
|
+
## 2. Test Distributed Programming Assignments
|
|
65
|
+
### (or complex non I/O programs)
|
|
66
|
+
### codeval.ini contents
|
|
67
|
+
```
|
|
68
|
+
[SERVER]
|
|
69
|
+
url=<canvas API>
|
|
70
|
+
token=<canvas token>
|
|
71
|
+
[RUN]
|
|
72
|
+
precommand=
|
|
73
|
+
command=
|
|
74
|
+
dist_command=
|
|
75
|
+
host_ip=
|
|
76
|
+
[MONGO]
|
|
77
|
+
url=
|
|
78
|
+
db=
|
|
79
|
+
```
|
|
80
|
+
|
|
81
|
+
Refer to a sample codeval.ini file [here](samples/codeval.ini)
|
|
82
|
+
|
|
83
|
+
### Command to run
|
|
84
|
+
is the same as the [command in #1](#command-to-run):
|
|
85
|
+
`python3 codeval.py grade-submissions <a unique part of course name> [FLAGS]`
|
|
86
|
+
|
|
87
|
+
### Distributed Specification Tags
|
|
88
|
+
|
|
89
|
+
| Tag | Meaning | Function |
|
|
90
|
+
|---|---|---|
|
|
91
|
+
| --DT-- | Distributed Tests Begin | Marks the beginning of distributed tests. Is used to determine if the spec file has distributed tests |
|
|
92
|
+
| GTO | Global timeout | A total timeout for all distributed tests, for each of homogenous and heterogenous tests. Homogenous tests = GTO value. Heterogenous tests = 2 * GTO value |
|
|
93
|
+
| PORTS | Exposed ports count | Maximum number of ports needed to expose per docker container |
|
|
94
|
+
| ECMD/ECMDT SYNC/ASYNC | External Command | Command that runs in a controller container, emulating a host machine. ECMDT: Evaluation fails if the command returns an error. SYNC: CodEval waits for the command to execute or fail. ASYNC: CodEval doesn't wait for the command to execute; failure is checked if ECMDT |
|
|
95
|
+
| DTC $int [HOM] [HET] | Distributed Test Config Group | Signifies the start of a new group of distributed tests. Replace $int with the number of containers that need to be started for the test group. HOM denotes homogenous tests, i.e., the user's own submissions will be executed in the containers. HET denotes heterogenous tests, i.e., a combination of $int - 1 other users' and the current user's submissions will be executed in the containers. Can enter either HOM or HET or both |
|
|
96
|
+
| ICMD/ICMDT SYNC/ASYNC */n1,n2,n3... | Internal Command | Command that runs in each of the containers. ICMDT: Evaluation fails if the command returns an error. SYNC: Wait for the command to execute or fail. ASYNC: Don't wait for the command to execute; failure is checked if ICMDT. *: Run the command in all the containers. n1,n2..nx: Run the command in containers indexed n1,n2..nx only. Containers follow zero-based indexing |
|
|
97
|
+
| TESTCMD | Test Command | Command run on the host machine to validate the submission(s) |
|
|
98
|
+
| --DTCLEAN-- | Cleanup Commands | Commands to execute after the tests have completed or failed. Can contain only ECMD or ECMDT |
|
|
99
|
+
|
|
100
|
+
### Special placeholders in commands
|
|
101
|
+
| Placeholder | Usage |
|
|
102
|
+
| --- | --- |
|
|
103
|
+
| TEMP_DIR | used in ECMD/ECMDT to be replaced by the temporary directory generated by CodEval during execution |
|
|
104
|
+
| HOST_IP | used in ECMD/ECMDT/ICMD/ICMDT to be replaced by the host's IP specified in codeval.ini |
|
|
105
|
+
| USERNAME | used in ICMD/ICMDT to be replaced by the user's username whose submission is being evaluated |
|
|
106
|
+
| PORT_$int | used in ICMD/ICMDT to be replaced by a port number assigned to the running docker container. $int needs to be < PORTS value in the specification |
|
|
107
|
+
|
|
108
|
+
Refer to a sample spec file [here](samples/assignment-name.codeval)
|
|
109
|
+
|
|
110
|
+
### Notes
|
|
111
|
+
- The config file `codeval.ini` needs to contain the extra entries only if the tag `--DT--` exists in the specification file
|
|
112
|
+
- Distributed tests need a running MongoDB service to persist the progress of students running heterogenous tests
|
|
113
|
+
|
|
114
|
+
|
|
115
|
+
## 3. Test SQL Assignments
|
|
116
|
+
### codeval.ini contents
|
|
117
|
+
```
|
|
118
|
+
[SERVER]
|
|
119
|
+
url=<canvas API>
|
|
120
|
+
token=<canvas token>
|
|
121
|
+
[RUN]
|
|
122
|
+
precommand=
|
|
123
|
+
command=
|
|
124
|
+
dist_command=
|
|
125
|
+
host_ip=
|
|
126
|
+
sql_command=
|
|
127
|
+
```
|
|
128
|
+
|
|
129
|
+
Refer to a sample codeval.ini file [here](SQL/samples/codeval.ini)
|
|
130
|
+
|
|
131
|
+
### Command to run
|
|
132
|
+
is the same as the [command in #1](#command-to-run):
|
|
133
|
+
`python3 codeval.py grade-submissions <a unique part of course name> [FLAGS]`
|
|
134
|
+
|
|
135
|
+
### SQL Specification Tags
|
|
136
|
+
|
|
137
|
+
| Tag | Meaning | Function |
|
|
138
|
+
|------------------|-------------------------|----------------------------------------------------------------------------------------------|
|
|
139
|
+
| --SQL-- | SQL Tests Begin | Marks the beginning of SQL tests. Is used to determine if the spec file has SQL based tests |
|
|
140
|
+
| INSERT | Insert rows in DB | Insert rows in the SQL database using files or individual insert queries. |
|
|
141
|
+
| CONDITIONPRESENT | Check condition in file | Validate submission files for a required condition to be present in submissions. |
|
|
142
|
+
| SCHEMACHECK | External Command | Validate submission files for database related checks like constraints. |
|
|
143
|
+
| TSQL | SQL Test | Marks an SQL test; takes a file or an individual query as input and runs it against the submission files. |
|
|
144
|
+
|
|
145
|
+
Refer to a sample spec file [here](SQL/samples/ASSIGNMENT:CREATE.codeval)
|
|
146
|
+
|
|
147
|
+
### Notes
|
|
148
|
+
- The config file `codeval.ini` needs to contain the extra entries only if the tag `--SQL--` exists in the specification file
|
|
149
|
+
- SQL tests need a separate container image in order to run in MySQL.
|
|
150
|
+
|
|
151
|
+
|
|
152
|
+
## Create an assignment on Canvas
|
|
153
|
+
|
|
154
|
+
### Command to create the assignment:
|
|
155
|
+
**Syntax:** `python3 codeval.py create-assignment <course_name> <specification_file> [ --dry-run/--no-dry-run ] [ --verbose/--no-verbose ] [ --group_name ]`
|
|
156
|
+
**Example:** `python3 codeval.py create-assignment "Practice1" 'a_big_bag_of_strings.txt' --no-dry-run --verbose --group_name "exam 2"`
|
|
157
|
+
|
|
158
|
+
### Command to grade the assignment:
|
|
159
|
+
**Syntax:** `python3 codeval.py grade-submissions <course_name> [ --dry-run/--no-dry-run ] [ --verbose/--no-verbose ] [ --force/--no-force ] [ --copytmpdir/--no-copytmpdir ]`
|
|
160
|
+
**Example:** `python3 codeval.py grade-submissions "Practice1" --no-dry-run --force --verbose`
|
|
161
|
+
|
|
162
|
+
### Assignment description tags
|
|
163
|
+
|
|
164
|
+
* CRT_HW START <Assignment_name> - usually at the beginning of the file. The lines that follow this tag are the assignment description in Markdown.
|
|
165
|
+
|
|
166
|
+
* CRT_HW END - ends the assignment description
|
|
167
|
+
|
|
168
|
+
## Assignment description macros
|
|
169
|
+
|
|
170
|
+
* DISCSN_URL - this macro will be substituted with the URL of the discussion that was created for this assignment
|
|
171
|
+
|
|
172
|
+
* EXMPLS <no_of_test_cases> - this macro will be replaced with the specified number of test cases formatted for display
|
|
173
|
+
|
|
174
|
+
* FILE[file_name] - this macro will be replaced by a link to the specified file
|
|
175
|
+
|
|
176
|
+
### MODIFICATIONS REQUIRED IN THE SPECIFICATION FILE.
|
|
177
|
+
1) Start the specification file with the tag CRT_HW START followed by a space followed by the name of the assignment.
|
|
178
|
+
For example: `CRT_HW START Hello World`
|
|
179
|
+
2) The lines after the first line contain the description of the assignment in Markdown format.
|
|
180
|
+
3) The description ends with a line containing only the tag CRT_HW END.
|
|
181
|
+
For example: `CRT_HW END`
|
|
182
|
+
4) After this tag, the content for grading the submission begins.
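
The layout described in the numbered steps above can be sketched as a small parser. This is an illustrative sketch under the stated rules only; `split_specification` is a hypothetical name and not a function taken from the package.

```python
def split_specification(text: str):
    """Split a specification into (name, description, grading content).

    Hypothetical helper: assumes the file follows the CRT_HW START / CRT_HW END
    layout described above.
    """
    lines = text.splitlines()
    start = end = None
    for i, line in enumerate(lines):
        if start is None and line.startswith("CRT_HW START "):
            start = i
        elif start is not None and line.strip() == "CRT_HW END":
            end = i
            break
    if start is None or end is None:
        raise ValueError("expected CRT_HW START <name> and CRT_HW END tags")

    name = lines[start][len("CRT_HW START "):].strip()
    description = "\n".join(lines[start + 1:end])  # Markdown description
    grading = "\n".join(lines[end + 1:])           # content used for grading
    return name, description, grading
```

A missing tag raises `ValueError` rather than returning partial data, so a malformed specification is caught before anything is uploaded.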
|
|
183
|
+
|
|
184
|
+
#### Addition of the discussion topic link in the assignment description.
|
|
185
|
+
1) Insert the macro DISCSN_URL wherever you want the corresponding discussion topic's link to appear.
|
|
186
|
+
For example: `To access the discussion topic for this assignment, go here: DISCSN_URL`
|
|
187
|
+
|
|
188
|
+
#### Addition of sample examples in the assignment description.
|
|
189
|
+
1) Insert the tag EXMPLS followed by a single space and the value.
|
|
190
|
+
Here value is the number of test cases to be displayed as sample examples.
|
|
191
|
+
At most, it prints all the non-hidden test cases.
|
|
192
|
+
For example: EXMPLS 5
|
|
193
|
+
#### Addition of the links to the files uploaded in the Codeval folder in the assignment description.
|
|
194
|
+
1) To add a hyperlink to a file, use the following Markdown format:
|
|
195
|
+
[file_name_to_be_displayed](Url_of_the_file)
|
|
196
|
+
In the parentheses, where the URL is required, insert the tag
|
|
197
|
+
FILE[name of file].
|
|
198
|
+
For example: FILE[file_name.extension]
|
|
199
|
+
If the file is not already in the Codeval folder, it will be extracted from a zip file in the
|
|
200
|
+
CodEval spec and uploaded automatically.
|
|
201
|
+
|
|
202
|
+
### UPLOAD THE REQUIRED FILES IN CODEVAL FOLDER IN FILES SECTION.
|
|
203
|
+
1) Create a folder called `assignmentFiles` which should contain all the necessary files including
|
|
204
|
+
the specification file.
|
|
205
|
+
|
|
206
|
+
### EXAMPLE OF THE SPECIFICATION FILE.
|
|
207
|
+
|
|
208
|
+
CRT_HW START Bag Of Strings
|
|
209
|
+
# Description
|
|
210
|
+
## Problem Statement
|
|
211
|
+
- This Is An Example For The Description Of The Assignment In Markdown.
|
|
212
|
+
- To Download The File [Hello_World](URL_OF_HW "Helloworld.Txt")
|
|
213
|
+
|
|
214
|
+
## Sample Examples
|
|
215
|
+
EXMPLS 3
|
|
216
|
+
|
|
217
|
+
## Discussion Topic
|
|
218
|
+
Here Is The Link To The Discussion Topic: DISCSN_URL
|
|
219
|
+
|
|
220
|
+
### Rubric
|
|
221
|
+
| Cases | Points|
|
|
222
|
+
| ----- |----- |
|
|
223
|
+
| Base Points | 50 |
|
|
224
|
+
|
|
225
|
+
CRT_HW END
|
|
226
|
+
|
|
227
|
+
C cc -o bigbag --std=gnu11 bigbag.c
|
|
228
|
+
|
|
229
|
+
|
|
230
|
+
## 4. Test Assignments with AI Models
|
|
231
|
+
|
|
232
|
+
Test programming assignments against multiple AI models (Claude, GPT, Gemini) to benchmark their performance.
|
|
233
|
+
|
|
234
|
+
### Installation
|
|
235
|
+
|
|
236
|
+
Install the AI provider packages you want to use:
|
|
237
|
+
|
|
238
|
+
```bash
|
|
239
|
+
# Install all AI providers
|
|
240
|
+
pip install "assignment-codeval[ai]"
|
|
241
|
+
|
|
242
|
+
# Or install specific providers
|
|
243
|
+
pip install anthropic # For Claude models
|
|
244
|
+
pip install openai # For GPT models
|
|
245
|
+
pip install google-generativeai # For Gemini models
|
|
246
|
+
```
|
|
247
|
+
|
|
248
|
+
### codeval.ini contents (optional)
|
|
249
|
+
```
|
|
250
|
+
[AI]
|
|
251
|
+
anthropic_key=sk-ant-...
|
|
252
|
+
openai_key=sk-...
|
|
253
|
+
google_key=...
|
|
254
|
+
```
|
|
255
|
+
|
|
256
|
+
API keys can also be provided via:
|
|
257
|
+
- Environment variables: `ANTHROPIC_API_KEY`, `OPENAI_API_KEY`, `GOOGLE_API_KEY`
|
|
258
|
+
- Command line options: `--anthropic-key`, `--openai-key`, `--google-key`
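
The three key sources above could be combined as in the sketch below. The precedence shown (command line, then environment, then `codeval.ini`) is an assumption, since the README does not state an order. `resolve_api_key` and the `ini_path` default are hypothetical as well.

```python
import configparser
import os

# Environment variable names as listed above.
ENV_VARS = {
    "anthropic": "ANTHROPIC_API_KEY",
    "openai": "OPENAI_API_KEY",
    "google": "GOOGLE_API_KEY",
}


def resolve_api_key(provider, cli_value=None, ini_path="codeval.ini", env=None):
    """Return the first key found, or None (assumed order: CLI, env, ini)."""
    env = os.environ if env is None else env
    if cli_value:
        return cli_value
    if env.get(ENV_VARS[provider]):
        return env[ENV_VARS[provider]]
    parser = configparser.ConfigParser()
    parser.read(ini_path)  # a missing file is silently ignored
    # Looks up e.g. anthropic_key in the [AI] section.
    return parser.get("AI", f"{provider}_key", fallback=None)
```

Returning `None` instead of raising lets a caller skip providers that have no key configured.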
|
|
259
|
+
|
|
260
|
+
### Command to run
|
|
261
|
+
```bash
|
|
262
|
+
assignment-codeval test-with-ai <codeval_file> [OPTIONS]
|
|
263
|
+
```
|
|
264
|
+
|
|
265
|
+
### Options
|
|
266
|
+
| Option | Description |
|
|
267
|
+
|--------|-------------|
|
|
268
|
+
| `-o, --output-dir` | Directory to store solutions and results (default: `ai_test_results`) |
|
|
269
|
+
| `-n, --attempts` | Number of attempts per model (default: 1) |
|
|
270
|
+
| `-m, --models` | Specific models to test (can be used multiple times) |
|
|
271
|
+
| `-p, --providers` | Only test models from specific providers: `anthropic`, `openai`, `google` |
|
|
272
|
+
| `--anthropic-key` | Anthropic API key |
|
|
273
|
+
| `--openai-key` | OpenAI API key |
|
|
274
|
+
| `--google-key` | Google API key |
|
|
275
|
+
|
|
276
|
+
### Examples
|
|
277
|
+
```bash
|
|
278
|
+
# Test with all Anthropic models
|
|
279
|
+
assignment-codeval test-with-ai my_assignment.codeval -p anthropic
|
|
280
|
+
|
|
281
|
+
# Test with a specific model, 3 attempts
|
|
282
|
+
assignment-codeval test-with-ai my_assignment.codeval -m "Claude Sonnet 4" -n 3
|
|
283
|
+
|
|
284
|
+
# Test with all providers (requires all API keys)
|
|
285
|
+
assignment-codeval test-with-ai my_assignment.codeval -n 2
|
|
286
|
+
|
|
287
|
+
# Pass API key directly
|
|
288
|
+
assignment-codeval test-with-ai my_assignment.codeval --anthropic-key sk-ant-xxx -p anthropic
|
|
289
|
+
```
|
|
290
|
+
|
|
291
|
+
### Supported Models
|
|
292
|
+
|
|
293
|
+
| Provider | Models |
|
|
294
|
+
|----------|--------|
|
|
295
|
+
| Anthropic | Claude Sonnet 4, Claude Opus 4 |
|
|
296
|
+
| OpenAI | GPT-4o, GPT-4o Mini, o1, o3-mini |
|
|
297
|
+
| Google | Gemini 2.0 Flash, Gemini 1.5 Pro |
|
|
298
|
+
|
|
299
|
+
Note: You can add additional models using `-m "model-id"`. Check each provider's documentation for available model IDs.
|
|
300
|
+
|
|
301
|
+
### Output Structure
|
|
302
|
+
```
|
|
303
|
+
ai_test_results/
|
|
304
|
+
├── prompt.txt # The prompt sent to AI models
|
|
305
|
+
├── results.json # Summary of all results
|
|
306
|
+
├── Claude_Sonnet_4/
|
|
307
|
+
│ └── attempt_1/
|
|
308
|
+
│ ├── raw_response.txt # Raw AI response
|
|
309
|
+
│ ├── solution.c # Extracted code
|
|
310
|
+
│ └── <codeval files> # Copied for evaluation
|
|
311
|
+
├── GPT-4o/
|
|
312
|
+
│ └── attempt_1/
|
|
313
|
+
│ └── ...
|
|
314
|
+
└── ...
|
|
315
|
+
```
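
Given the layout above, the attempts stored for each model can be enumerated with a short script. This is a sketch that relies only on the directory names shown; `list_attempts` is a hypothetical helper, and it does not read `results.json` because its schema is not described here.

```python
from pathlib import Path


def list_attempts(output_dir="ai_test_results"):
    """Map each model directory name to its attempt directory names."""
    root = Path(output_dir)
    attempts = {}
    # Matches <model>/attempt_<n> one level below the output directory.
    for attempt_dir in sorted(root.glob("*/attempt_*")):
        if attempt_dir.is_dir():
            attempts.setdefault(attempt_dir.parent.name, []).append(attempt_dir.name)
    return attempts
```

Sorting is lexicographic, so `attempt_10` is listed before `attempt_2` when more than nine attempts exist.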
|
|
316
|
+
|
|
317
|
+
### Notes
|
|
318
|
+
- The command extracts the assignment description from the codeval file (between `CRT_HW START` and `CRT_HW END` tags)
|
|
319
|
+
- Support files from the `support_files/` directory are automatically copied for evaluation
|
|
320
|
+
- Results include pass/fail status, response time, and any errors
|
|
321
|
+
- Use multiple attempts (`-n`) to account for AI response variability
|
|
322
|
+
|
|
323
|
+
|