atomic_assessments_import 0.3.0 → 0.4.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (35)
  1. checksums.yaml +4 -4
  2. data/README.md +21 -1
  3. data/docs/plans/2026-02-11-flexible-examsoft-importer-design.md +127 -0
  4. data/docs/plans/2026-02-11-flexible-examsoft-importer-plan.md +2635 -0
  5. data/lib/atomic_assessments_import/csv/converter.rb +3 -3
  6. data/lib/atomic_assessments_import/exam_soft/chunker/heading_split_strategy.rb +38 -0
  7. data/lib/atomic_assessments_import/exam_soft/chunker/horizontal_rule_split_strategy.rb +37 -0
  8. data/lib/atomic_assessments_import/exam_soft/chunker/metadata_marker_strategy.rb +38 -0
  9. data/lib/atomic_assessments_import/exam_soft/chunker/numbered_question_strategy.rb +41 -0
  10. data/lib/atomic_assessments_import/exam_soft/chunker/strategy.rb +22 -0
  11. data/lib/atomic_assessments_import/exam_soft/chunker.rb +46 -0
  12. data/lib/atomic_assessments_import/exam_soft/converter.rb +203 -0
  13. data/lib/atomic_assessments_import/exam_soft/extractor/correct_answer_detector.rb +36 -0
  14. data/lib/atomic_assessments_import/exam_soft/extractor/feedback_detector.rb +50 -0
  15. data/lib/atomic_assessments_import/exam_soft/extractor/metadata_detector.rb +37 -0
  16. data/lib/atomic_assessments_import/exam_soft/extractor/options_detector.rb +44 -0
  17. data/lib/atomic_assessments_import/exam_soft/extractor/question_stem_detector.rb +44 -0
  18. data/lib/atomic_assessments_import/exam_soft/extractor/question_type_detector.rb +51 -0
  19. data/lib/atomic_assessments_import/exam_soft/extractor.rb +96 -0
  20. data/lib/atomic_assessments_import/exam_soft.rb +10 -0
  21. data/lib/atomic_assessments_import/questions/cloze_dropdown.rb +62 -0
  22. data/lib/atomic_assessments_import/questions/essay.rb +20 -0
  23. data/lib/atomic_assessments_import/questions/fill_in_the_blank.rb +49 -0
  24. data/lib/atomic_assessments_import/questions/matching.rb +42 -0
  25. data/lib/atomic_assessments_import/questions/multiple_choice.rb +102 -0
  26. data/lib/atomic_assessments_import/questions/ordering.rb +53 -0
  27. data/lib/atomic_assessments_import/questions/question.rb +106 -0
  28. data/lib/atomic_assessments_import/questions/short_answer.rb +24 -0
  29. data/lib/atomic_assessments_import/utils.rb +21 -0
  30. data/lib/atomic_assessments_import/version.rb +1 -1
  31. data/lib/atomic_assessments_import.rb +30 -12
  32. metadata +43 -5
  33. data/lib/atomic_assessments_import/csv/questions/multiple_choice.rb +0 -104
  34. data/lib/atomic_assessments_import/csv/questions/question.rb +0 -86
  35. data/lib/atomic_assessments_import/csv/utils.rb +0 -24
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 5a2c89288e443b60075a4eff9906a588c4878c8e101fb9b50f50efcbc4ecc9a9
- data.tar.gz: 5fc6957f64095e6993c4aaf1be4112528e34c50dc76d71ffe77601fb80808eb6
+ metadata.gz: d7c2b70c4beeb0f121cdcc5a6246b68806f7ffc19dca1081f4df4c2346b8f684
+ data.tar.gz: 2fb3d3dda322a36af487277fd90875c3cc66a9eb29a442d31cb1297096925f01
  SHA512:
- metadata.gz: 87166f7c32c8dc9deb439e48506052ab876f66981dadd84923700b5cdf8e1293cbe6ab8b8bc5d431752c588281f898b95cdaf58c77c3a9792ebffbb89a416abc
- data.tar.gz: 5d3e27a023a30921c375d3cc9c3dc17188a57d44d25f472376a9069ceb2ca2516e1235e5a0d45cd135d7a81ac5fd2bc615779b794badb10ae596a054e9e501e0
+ metadata.gz: 3aefad8373ea8d8d7f7e3eacbbc2df7f56a4bfa5060c6d0c78b936305eb7dc47e47f141bc631db2b81abcd227a07b9e62a90880c6253525500b9cfafa19f547f
+ data.tar.gz: 707491e1f786a083fc26604d4c44ec6919d87214ae806848644b87760a4cd3271d795b4169c932bff30e7f165e3dcc58fcbdcb4e704602cfc31fe028ca068d48
data/README.md CHANGED
@@ -1,6 +1,14 @@
  # Atomic Assessments Import

- Import converters for atomic assessments. Currently only CSV multiple choice format is supported by this GEM.
+ Import converters for atomic assessments. Currently this GEM supports the following export and file types:
+ * CSV
+ - Multiple Choice
+ * ExamSoft (in RTF, HTML, or DOCX file format)
+ - Multiple Choice
+ - True/False
+ - Fill in the Blank / Cloze
+ - Ordering
+ - Essay

  For QTI conversion, see:

@@ -21,6 +29,14 @@ If bundler is not being used to manage dependencies, install the gem by executin

  $ gem install atomic_assessments_import

+ ## Usage
+ ```
+ Usage: bin/convert <file> <export_path> [converter]
+ <file> Path to CSV or RTF file to convert
+ <export_path> Path for output ZIP file
+ [converter] Which converter to use- 'examsoft' for files coming from ExamSoft, 'csv' for standard CSV files. Defaults to csv if not specified.
+ ```
+
  ## Standalone conversion scripts

  Convert a CSV to a learnosity archive:
@@ -31,6 +47,10 @@ Convert a CSV to json on standard out:

  $ bin/convert_to_json input.csv

+ Convert an ExamSoft RTF to a learnosity archive:
+
+ $ bin/convert input.rtf output.zip examsoft
+
  ## CSV input format

  All columns are optional execpt "Option A", "Option B", and "Correct Answer".
data/docs/plans/2026-02-11-flexible-examsoft-importer-design.md ADDED
@@ -0,0 +1,127 @@
+ # Flexible ExamSoft Importer Design
+
+ ## Problem
+
+ The current ExamSoft converter uses rigid regex patterns tied to an assumed export format. Since we don't have real ExamSoft export files and can't confirm the actual format, the importer needs to be flexible enough to handle format variations gracefully.
+
+ ## Goals
+
+ - Handle unknown ExamSoft export formats without breaking
+ - Support all ExamSoft question types (MCQ, multiple-select, T/F, essay, short answer, fill-in-the-blank, matching, ordering)
+ - Best-effort import with warnings for unparseable content
+ - Easy to extend with new chunking strategies and question types
+
+ ## Pipeline
+
+ ```
+ Input File (docx/html/rtf/etc.)
+ |
+ v
+ 1. Normalize -- Pandoc converts to HTML, Nokogiri parses to DOM
+ |
+ v
+ 2. Chunk -- Split DOM into one chunk per question
+ Tries multiple strategies, picks best
+ |
+ v
+ 3. Extract -- Per chunk: detect question type,
+ extract fields, build row_mock
+ |
+ v
+ Existing Question pipeline (Questions::Question.load -> to_learnosity)
+ ```
+
+ ### Stage 1: Normalize
+
+ Unchanged from current approach. Pandoc converts any input format to HTML. Nokogiri (already in bundle) parses the HTML into a DOM. All subsequent processing works on DOM nodes and text content, not raw HTML strings.
+
+ ### Stage 2: Chunk
+
+ The chunker tries multiple splitting strategies in order and picks the first one that produces reasonable results.
+
+ **Strategies (in priority order):**
+
+ 1. Metadata marker split -- split where `Folder:` or `Type:` appears at the start of a paragraph
+ 2. Numbered question split -- split where a paragraph starts with `\d+)` or `\d+.`
+ 3. Heading split -- split on `<h1>`-`<h6>` tags
+ 4. Horizontal rule split -- split on `<hr>` tags
+
+ **Scoring:** Each strategy produces candidate chunks. The chunker picks the strategy where the most chunks look "question-like" (contain text followed by lettered/numbered items). Must produce > 1 chunk.
+
+ **Exam header:** Content before the first question chunk is treated as a document-level header. Logged for informational purposes (question count, total points, creation date). Can be wired into output later if valuable.
+
+ **Extensibility:** Each strategy is a self-contained class with a `split(doc)` method. Adding a new strategy means writing the class and adding it to the list.
+
+ If no strategy produces good results, the whole document becomes one chunk and the extractor does its best.
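
The Stage 2 selection rule might look roughly like the Ruby sketch below. It is an illustration under stated assumptions, not the gem's code: chunks are modelled as arrays of paragraph strings instead of Nokogiri nodes, and `NumberedSplit`, `HeadingSplit`, `question_like?`, and `pick_chunks` are hypothetical names. The design text describes both "first reasonable result" and "most question-like chunks"; this sketch scores every strategy and lets the earlier one win ties.

```ruby
# Sketch only: chunks are arrays of paragraph strings, not Nokogiri nodes,
# and every class and method name here is hypothetical.

# Lettered option such as "A) 3" or "*b. 4" (numbered items omitted for brevity).
OPTION_LINE = /\A\*?[A-Za-z][.)]\s+\S/

# "Question-like": some text followed by at least two lettered items.
def question_like?(chunk)
  first_option = chunk.index { |line| line.match?(OPTION_LINE) }
  return false if first_option.nil? || first_option.zero?

  chunk.drop(first_option).count { |line| line.match?(OPTION_LINE) } >= 2
end

# Strategy 2: start a new chunk at each paragraph beginning with "1)" or "1.".
class NumberedSplit
  def split(paragraphs)
    paragraphs.slice_before { |para| para.match?(/\A\d+[.)]\s/) }.to_a
  end
end

# Stand-in for the heading strategy: start a new chunk at each "# " line.
class HeadingSplit
  def split(paragraphs)
    paragraphs.slice_before { |para| para.start_with?("# ") }.to_a
  end
end

# Try strategies in priority order and keep the one with the most
# question-like chunks. The strict ">" means an earlier strategy wins ties.
def pick_chunks(paragraphs, strategies)
  best = nil
  best_score = 0
  strategies.each do |strategy|
    chunks = strategy.split(paragraphs)
    next unless chunks.length > 1 # a usable split must yield more than one chunk

    score = chunks.count { |chunk| question_like?(chunk) }
    if score > best_score
      best = chunks
      best_score = score
    end
  end
  best || [paragraphs] # fallback: the whole document becomes one chunk
end
```

Calling `pick_chunks(paragraphs, [HeadingSplit.new, NumberedSplit.new])` on a document with numbered questions and no headings would select the numbered split, because the heading strategy yields a single chunk and is skipped.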
+
+ ### Stage 3: Extract
+
+ The extractor runs independent field detectors against each chunk:
+
+ | Detector | What it looks for | Required? |
+ |------------------|-------------------------------------------------------------------------|------------------------------------|
+ | QuestionType | "Type:" labels, keywords, or inferred from structure | No (defaults based on structure) |
+ | QuestionStem | Main question text before options, after numbered prefix | Yes (warns if missing) |
+ | Options | Lettered/numbered items, bulleted lists | Required for MCQ types |
+ | CorrectAnswer | `*` prefix, bold, "Answer:" label | Required for MCQ types |
+ | Metadata | `Folder:`, `Title:`, `Category:` labels (any order) | No |
+ | Feedback | Text after `~`, or "Explanation:"/"Rationale:" labels | No |
+ | MatchingPairs | Two parallel lists or table structure | Required for matching type |
+ | OrderingSequence | Numbered/labeled sequence with correct order indicator | Required for ordering type |
+
+ Each detector returns its result or nil. The extractor assembles findings into a `row_mock` hash compatible with the existing `Questions::Question.load` pipeline.
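
As a rough illustration of the "result or nil" contract, the sketch below runs a few independent detectors over one chunk and keeps only the non-nil findings. Everything in it is assumed for the example: the method names, the plain-text chunk representation, and the hash keys, which are not the real `row_mock` keys. It covers only the `*` prefix, `~` feedback, and label cues from the table.

```ruby
# Sketch only: a chunk is an array of text lines. Method names and hash keys
# are hypothetical; the gem's actual row_mock layout is not shown in this diff.

OPTION = /\A(\*)?([A-Za-z])[.)]\s+(.+)\z/
LABEL  = /\A(Folder|Title|Category):\s*(.+)\z/

# First line that is not an option, a metadata label, or feedback,
# with any leading "1)" or "1." prefix removed.
def detect_stem(lines)
  stem = lines.find { |l| !l.match?(OPTION) && !l.match?(LABEL) && !l.start_with?("~") }
  stem&.sub(/\A\d+[.)]\s*/, "")
end

# Lettered options as { "A" => "text", ... }, or nil when none are found.
def detect_options(lines)
  pairs = lines.filter_map { |l| (m = l.match(OPTION)) && [m[2].upcase, m[3]] }
  pairs.empty? ? nil : pairs.to_h
end

# Letter of the first option marked with a leading "*", or nil.
def detect_correct_answer(lines)
  lines.filter_map { |l| (m = l.match(OPTION)) && m[1] && m[2].upcase }.first
end

# Metadata labels in any order, or nil when none are present.
def detect_metadata(lines)
  pairs = lines.filter_map { |l| (m = l.match(LABEL)) && [m[1].downcase, m[2]] }
  pairs.empty? ? nil : pairs.to_h
end

# Text after a leading "~", or nil.
def detect_feedback(lines)
  lines.find { |l| l.start_with?("~") }&.delete_prefix("~")&.strip
end

# Run every detector, drop nil findings, and warn about a missing stem.
def extract(lines)
  row = {
    "question text"  => detect_stem(lines),
    "options"        => detect_options(lines),
    "correct answer" => detect_correct_answer(lines),
    "metadata"       => detect_metadata(lines),
    "feedback"       => detect_feedback(lines),
  }.compact
  warnings = []
  warnings << "missing question stem" unless row.key?("question text")
  [row, warnings]
end
```

Because each detector is independent, a chunk with no feedback or metadata still yields a usable row; only the missing stem produces a warning here.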
+
+ ## Question Type Mapping
+
+ | ExamSoft Type | Question Class | Learnosity type | Notes |
+ |-------------------|-----------------------------|-----------------|---------------------------------------------|
+ | Multiple Choice | MultipleChoice (existing) | mcq | Single response |
+ | Multiple Select | MultipleChoice (existing) | mcq | `multiple_responses: true` |
+ | True/False | MultipleChoice (existing) | mcq | Two options (True/False) |
+ | Essay | Essay (new) | longanswer | Optional word limit, sample answer |
+ | Short Answer | ShortAnswer (new) | shorttext | Expected answer(s) |
+ | Fill in the Blank | FillInTheBlank (new) | cloze | Text with blanks, accepted answers per blank|
+ | Matching | Matching (new) | association | Two lists of items to pair |
+ | Ordering | Ordering (new) | orderlist | Items with correct sequence |
+
+ **Future types (out of scope):** Drag and drop, hotspot, numeric/formula, matrix/grid, NGN types (bowtie). When encountered, these are imported best-effort as draft items with a warning.
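
The mapping table could be expressed as a simple lookup, as in the sketch below. The class names and Learnosity type strings are taken from the table; the constant `TYPE_MAP`, the method `resolve_type`, the hash keys, and the warning text are hypothetical. How the gem actually dispatches on type is not shown in this part of the diff.

```ruby
# Sketch only: TYPE_MAP, resolve_type, and the hash keys are hypothetical names.
# Class names and Learnosity types are copied from the table above.
TYPE_MAP = {
  "multiple choice"   => { question_class: "MultipleChoice", learnosity_type: "mcq" },
  "multiple select"   => { question_class: "MultipleChoice", learnosity_type: "mcq",
                           multiple_responses: true },
  "true/false"        => { question_class: "MultipleChoice", learnosity_type: "mcq" },
  "essay"             => { question_class: "Essay",          learnosity_type: "longanswer" },
  "short answer"      => { question_class: "ShortAnswer",    learnosity_type: "shorttext" },
  "fill in the blank" => { question_class: "FillInTheBlank", learnosity_type: "cloze" },
  "matching"          => { question_class: "Matching",       learnosity_type: "association" },
  "ordering"          => { question_class: "Ordering",       learnosity_type: "orderlist" },
}.freeze

# Returns [mapping, warning]. Unknown types yield a nil mapping plus a warning
# so the caller can import the item as a draft instead of failing.
def resolve_type(label)
  mapping = TYPE_MAP[label.to_s.strip.downcase]
  return [mapping, nil] if mapping

  [nil, "unsupported question type #{label.inspect}, import as draft"]
end
```

Returning a warning instead of raising keeps one unrecognized type from aborting the rest of the import.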
+
+ ## Error Handling
+
+ **Approach:** Best-effort throughout. Never fail the whole import due to one bad question.
+
+ **Warning/error levels:**
+
+ - **Info** -- exam header metadata (logged, not surfaced)
+ - **Warning** -- missing optional fields, unsupported question type imported as draft
+ - **Error** -- chunk with no usable content, skipped entirely
+
+ **Item status based on parse completeness:**
+
+ - Fully parsed -> `status: "published"`
+ - Partially parsed (missing required fields or unsupported type) -> `status: "draft"`
+ - Completely unparseable -> skipped, error logged
+
+ All warnings and errors collected in the output `:errors` array with chunk identifiers.
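
The status rules above amount to a three-way classification per chunk. The sketch below shows one possible reading; `classify_chunk`, its parameters, and the message wording are hypothetical, and it only covers the required-field and unsupported-type cases.

```ruby
# Sketch only: classify_chunk and its message format are hypothetical.
# fields maps a field name to the detected value (nil or empty when absent).
def classify_chunk(chunk_id, fields, required:, supported_type: true)
  present = fields.reject do |_name, value|
    value.nil? || (value.respond_to?(:empty?) && value.empty?)
  end

  # Completely unparseable: skip the chunk and log an error.
  if present.empty?
    return { status: nil, errors: ["#{chunk_id}: error: no usable content, skipped"] }
  end

  errors = []
  missing = required - present.keys
  errors << "#{chunk_id}: warning: missing #{missing.join(', ')}" unless missing.empty?
  errors << "#{chunk_id}: warning: unsupported question type" unless supported_type

  # Any warning downgrades the item from "published" to "draft".
  { status: errors.empty? ? "published" : "draft", errors: errors }
end
```

Prefixing each message with the chunk identifier lets a caller concatenate the per-chunk `errors` arrays into the single `:errors` output the design calls for.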
+
+ ## Dependencies
+
+ - **Nokogiri** -- already in bundle (v1.18.3), used for DOM parsing of Pandoc HTML output
+ - **Pandoc** -- already used, unchanged
+ - No new external dependencies
+
+ ## Testing Strategy
+
+ **Fixture-based tests:**
+ - Existing fixtures (simple.docx, simple.html, simple.rtf) for backward compatibility
+ - New fixtures for each question type
+ - "Messy" fixtures: missing fields, mixed types, exam headers, unexpected formatting
+
+ **Unit tests:**
+ - Chunker strategies tested independently
+ - Field detectors tested independently
+ - New question type classes tested same as MultipleChoice
+
+ **Integration tests:**
+ - Full pipeline: file in -> items + questions + warnings out
+ - Partial-parse scenarios: document with N questions where some are unparseable