atomic_assessments_import 0.2.4 → 0.4.0

Files changed (36)
  1. checksums.yaml +4 -4
  2. data/README.md +21 -1
  3. data/docs/plans/2026-02-11-flexible-examsoft-importer-design.md +127 -0
  4. data/docs/plans/2026-02-11-flexible-examsoft-importer-plan.md +2635 -0
  5. data/lib/atomic_assessments_import/csv/converter.rb +3 -3
  6. data/lib/atomic_assessments_import/exam_soft/chunker/heading_split_strategy.rb +38 -0
  7. data/lib/atomic_assessments_import/exam_soft/chunker/horizontal_rule_split_strategy.rb +37 -0
  8. data/lib/atomic_assessments_import/exam_soft/chunker/metadata_marker_strategy.rb +38 -0
  9. data/lib/atomic_assessments_import/exam_soft/chunker/numbered_question_strategy.rb +41 -0
  10. data/lib/atomic_assessments_import/exam_soft/chunker/strategy.rb +22 -0
  11. data/lib/atomic_assessments_import/exam_soft/chunker.rb +46 -0
  12. data/lib/atomic_assessments_import/exam_soft/converter.rb +203 -0
  13. data/lib/atomic_assessments_import/exam_soft/extractor/correct_answer_detector.rb +36 -0
  14. data/lib/atomic_assessments_import/exam_soft/extractor/feedback_detector.rb +50 -0
  15. data/lib/atomic_assessments_import/exam_soft/extractor/metadata_detector.rb +37 -0
  16. data/lib/atomic_assessments_import/exam_soft/extractor/options_detector.rb +44 -0
  17. data/lib/atomic_assessments_import/exam_soft/extractor/question_stem_detector.rb +44 -0
  18. data/lib/atomic_assessments_import/exam_soft/extractor/question_type_detector.rb +51 -0
  19. data/lib/atomic_assessments_import/exam_soft/extractor.rb +96 -0
  20. data/lib/atomic_assessments_import/exam_soft.rb +10 -0
  21. data/lib/atomic_assessments_import/questions/cloze_dropdown.rb +62 -0
  22. data/lib/atomic_assessments_import/questions/essay.rb +20 -0
  23. data/lib/atomic_assessments_import/questions/fill_in_the_blank.rb +49 -0
  24. data/lib/atomic_assessments_import/questions/matching.rb +42 -0
  25. data/lib/atomic_assessments_import/questions/multiple_choice.rb +102 -0
  26. data/lib/atomic_assessments_import/questions/ordering.rb +53 -0
  27. data/lib/atomic_assessments_import/questions/question.rb +106 -0
  28. data/lib/atomic_assessments_import/questions/short_answer.rb +24 -0
  29. data/lib/atomic_assessments_import/utils.rb +21 -0
  30. data/lib/atomic_assessments_import/version.rb +1 -1
  31. data/lib/atomic_assessments_import/writer.rb +1 -1
  32. data/lib/atomic_assessments_import.rb +31 -12
  33. metadata +62 -13
  34. data/lib/atomic_assessments_import/csv/questions/multiple_choice.rb +0 -104
  35. data/lib/atomic_assessments_import/csv/questions/question.rb +0 -86
  36. data/lib/atomic_assessments_import/csv/utils.rb +0 -24
@@ -0,0 +1,2635 @@
1
+ # Flexible ExamSoft Importer Implementation Plan
2
+
3
+ > **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.
4
+
5
+ **Goal:** Refactor the ExamSoft converter from rigid regex parsing into a flexible chunker + field detector pipeline that handles unknown format variations with best-effort extraction.
6
+
7
+ **Architecture:** Pandoc normalizes input to HTML, Nokogiri parses to DOM, a strategy-based chunker splits into per-question chunks, independent field detectors extract data from each chunk, and the existing Question pipeline produces Learnosity output. Warnings accumulate rather than halting.
8
+
9
+ **Tech Stack:** Ruby, RSpec, Nokogiri (already in bundle), PandocRuby (already in bundle), Learnosity format output
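
The best-effort flow in the Architecture paragraph can be illustrated with a small plain-Ruby sketch. It is not the gem's code: `run_pipeline`, `SAMPLE_CHUNKER`, and `SAMPLE_DETECTORS` are hypothetical names, and plain strings stand in for the Pandoc and Nokogiri stages. The only point is that a field that cannot be detected adds a warning instead of raising.

```ruby
# frozen_string_literal: true

# Hypothetical sketch: a missing field produces a warning, not an exception.
def run_pipeline(text, chunker:, detectors:)
  warnings = []
  items = chunker.call(text).each_with_index.map do |chunk, index|
    item = {}
    detectors.each do |field, detector|
      value = detector.call(chunk)
      if value.nil?
        warnings << "Question #{index + 1}: could not detect #{field}"
      else
        item[field] = value
      end
    end
    item
  end
  { items: items, warnings: warnings }
end

# Stand-in stages (assumptions): blank lines separate questions, the first
# line is the stem, and an asterisk marks the correct option.
SAMPLE_CHUNKER = ->(text) { text.split(/\n{2,}/) }
SAMPLE_DETECTORS = {
  stem: ->(chunk) { chunk.lines.first&.strip },
  answer: ->(chunk) { chunk[/^\*([a-z])\)/, 1] },
}.freeze

result = run_pipeline(
  "1) Capital of France?\n*a) Paris\n\n2) What is H2O?\na) Water",
  chunker: SAMPLE_CHUNKER,
  detectors: SAMPLE_DETECTORS
)
puts result[:warnings]
```

The second question has no marked answer, so it still yields an item with a stem, and the gap is reported through `warnings`.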
10
+
11
+ ---
12
+
13
+ ### Task 1: Chunking Strategy Base Class + MetadataMarkerStrategy
14
+
15
+ This is the foundation. The MetadataMarkerStrategy replicates the current chunking behavior (start a new chunk at each paragraph that begins with `Folder:`, optionally preceded by a `Type:` field) so we can verify backward compatibility.
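
To see which paragraphs would start a new chunk, the marker regex from this task can be tried on plain strings. The pattern is copied from the plan's `MetadataMarkerStrategy`; the `question_start?` helper and the sample strings are hypothetical and only serve the illustration.

```ruby
# frozen_string_literal: true

# Pattern copied from the plan's MetadataMarkerStrategy.
MARKER_PATTERN = /\A\s*(?:Type:\s*.+?\s+)?Folder:\s*/i

# Hypothetical helper: true when a paragraph's text would start a new chunk.
def question_start?(text)
  text.strip.match?(MARKER_PATTERN)
end

[
  "Folder: Geography Title: Q1",  # starts with Folder:
  "Type: MA Folder: Geography",   # Type: followed by Folder:
  "Type: MCQ Title: Q2",          # Type: without Folder:
  "*a) Paris",                    # an option line
].each do |text|
  puts "#{question_start?(text)}  #{text}"
end
```

A paragraph that carries `Type:` but no `Folder:` does not match, so under this pattern `Type:` works only as an optional prefix to `Folder:`.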
16
+
17
+ **Files:**
18
+ - Create: `lib/atomic_assessments_import/exam_soft/chunker/strategy.rb`
19
+ - Create: `lib/atomic_assessments_import/exam_soft/chunker/metadata_marker_strategy.rb`
20
+ - Test: `spec/atomic_assessments_import/examsoft/chunker/metadata_marker_strategy_spec.rb`
21
+
22
+ **Step 1: Write the failing test**
23
+
24
+ Create `spec/atomic_assessments_import/examsoft/chunker/metadata_marker_strategy_spec.rb`:
25
+
26
+ ```ruby
27
+ # frozen_string_literal: true
28
+
29
+ require "atomic_assessments_import"
30
+ require "nokogiri"
31
+
32
+ RSpec.describe AtomicAssessmentsImport::ExamSoft::Chunker::MetadataMarkerStrategy do
33
+ describe "#split" do
34
+ it "splits HTML on Folder: markers" do
35
+ html = <<~HTML
36
+ <p>Folder: Geography Title: Q1 Category: Test 1) What is the capital? ~ Explanation</p>
37
+ <p>*a) Paris</p>
38
+ <p>b) London</p>
39
+ <p>Folder: Science Title: Q2 Category: Test 2) What is H2O? ~ Water</p>
40
+ <p>*a) Water</p>
41
+ <p>b) Fire</p>
42
+ HTML
43
+ doc = Nokogiri::HTML.fragment(html)
44
+ strategy = described_class.new
45
+ chunks = strategy.split(doc)
46
+
47
+ expect(chunks.length).to eq(2)
48
+ end
49
+
50
+ it "splits HTML on Type: markers" do
51
+ html = <<~HTML
52
+ <p>Type: MA Folder: Geography Title: Q1 Category: Test 1) Question? ~ Expl</p>
53
+ <p>*a) Answer</p>
54
+ <p>Type: MCQ Folder: Science Title: Q2 Category: Test 2) Question2? ~ Expl</p>
55
+ <p>*a) Answer2</p>
56
+ HTML
57
+ doc = Nokogiri::HTML.fragment(html)
58
+ strategy = described_class.new
59
+ chunks = strategy.split(doc)
60
+
61
+ expect(chunks.length).to eq(2)
62
+ end
63
+
64
+ it "returns empty array when no markers found" do
65
+ html = "<p>Just some text with no markers</p>"
66
+ doc = Nokogiri::HTML.fragment(html)
67
+ strategy = described_class.new
68
+ chunks = strategy.split(doc)
69
+
70
+ expect(chunks).to eq([])
71
+ end
72
+
73
+ it "separates exam header from questions" do
74
+ html = <<~HTML
75
+ <p>Exam: Midterm 2024</p>
76
+ <p>Total Questions: 30</p>
77
+ <p>Folder: Geography Title: Q1 Category: Test 1) Question? ~ Expl</p>
78
+ <p>*a) Answer</p>
79
+ HTML
80
+ doc = Nokogiri::HTML.fragment(html)
81
+ strategy = described_class.new
82
+ chunks = strategy.split(doc)
83
+
84
+ expect(chunks.length).to eq(1)
85
+ expect(strategy.header_nodes).not_to be_empty
86
+ end
87
+
88
+ it "returns chunks as arrays of Nokogiri nodes" do
89
+ html = <<~HTML
90
+ <p>Folder: Geo Title: Q1 Category: Test 1) Question? ~ Expl</p>
91
+ <p>*a) Answer</p>
92
+ <p>b) Wrong</p>
93
+ HTML
94
+ doc = Nokogiri::HTML.fragment(html)
95
+ strategy = described_class.new
96
+ chunks = strategy.split(doc)
97
+
98
+ expect(chunks.length).to eq(1)
99
+ expect(chunks[0]).to all(be_a(Nokogiri::XML::Node))
100
+ end
101
+ end
102
+ end
103
+ ```
104
+
105
+ **Step 2: Run test to verify it fails**
106
+
107
+ Run: `bundle exec rspec spec/atomic_assessments_import/examsoft/chunker/metadata_marker_strategy_spec.rb --format documentation`
108
+ Expected: FAIL — uninitialized constant
109
+
110
+ **Step 3: Write the base Strategy class**
111
+
112
+ Create `lib/atomic_assessments_import/exam_soft/chunker/strategy.rb`:
113
+
114
+ ```ruby
115
+ # frozen_string_literal: true
116
+
117
+ module AtomicAssessmentsImport
118
+ module ExamSoft
119
+ module Chunker
120
+ class Strategy
121
+ attr_reader :header_nodes
122
+
123
+ def initialize
124
+ @header_nodes = []
125
+ end
126
+
127
+ # Subclasses implement this. Returns an array of chunks,
128
+ # where each chunk is an array of Nokogiri nodes belonging to one question.
129
+ # Returns empty array if this strategy doesn't apply to the document.
130
+ def split(doc)
131
+ raise NotImplementedError
132
+ end
133
+ end
134
+ end
135
+ end
136
+ end
137
+ ```
138
+
139
+ **Step 4: Write MetadataMarkerStrategy**
140
+
141
+ Create `lib/atomic_assessments_import/exam_soft/chunker/metadata_marker_strategy.rb`:
142
+
143
+ ```ruby
144
+ # frozen_string_literal: true
145
+
146
+ require_relative "strategy"
147
+
148
+ module AtomicAssessmentsImport
149
+ module ExamSoft
150
+ module Chunker
151
+ class MetadataMarkerStrategy < Strategy
152
+ MARKER_PATTERN = /\A\s*(?:Type:\s*.+?\s+)?Folder:\s*/i
153
+
154
+ def split(doc)
155
+ @header_nodes = []
156
+ chunks = []
157
+ current_chunk = []
158
+ found_first = false
159
+
160
+ doc.children.each do |node|
161
+ text = node.text.strip
162
+ next if text.empty? && !node.name.match?(/^(img|table|hr)$/i)
163
+
164
+ if text.match?(MARKER_PATTERN)
165
+ found_first = true
166
+ chunks << current_chunk unless current_chunk.empty?
167
+ current_chunk = [node]
168
+ elsif found_first
169
+ current_chunk << node
170
+ else
171
+ @header_nodes << node
172
+ end
173
+ end
174
+
175
+ chunks << current_chunk unless current_chunk.empty?
176
+ chunks
177
+ end
178
+ end
179
+ end
180
+ end
181
+ end
182
+ ```
183
+
184
+ **Step 5: Run test to verify it passes**
185
+
186
+ Run: `bundle exec rspec spec/atomic_assessments_import/examsoft/chunker/metadata_marker_strategy_spec.rb --format documentation`
187
+ Expected: PASS
188
+
189
+ **Step 6: Commit**
190
+
191
+ ```bash
192
+ git add lib/atomic_assessments_import/exam_soft/chunker/ spec/atomic_assessments_import/examsoft/chunker/
193
+ git commit -m "feat: add chunker base class and MetadataMarkerStrategy"
194
+ ```
195
+
196
+ ---
197
+
198
+ ### Task 2: NumberedQuestionStrategy
199
+
200
+ **Files:**
201
+ - Create: `lib/atomic_assessments_import/exam_soft/chunker/numbered_question_strategy.rb`
202
+ - Test: `spec/atomic_assessments_import/examsoft/chunker/numbered_question_strategy_spec.rb`
203
+
204
+ **Step 1: Write the failing test**
205
+
206
+ Create `spec/atomic_assessments_import/examsoft/chunker/numbered_question_strategy_spec.rb`:
207
+
208
+ ```ruby
209
+ # frozen_string_literal: true
210
+
211
+ require "atomic_assessments_import"
212
+ require "nokogiri"
213
+
214
+ RSpec.describe AtomicAssessmentsImport::ExamSoft::Chunker::NumberedQuestionStrategy do
215
+ describe "#split" do
216
+ it "splits on paragraphs starting with number-paren pattern" do
217
+ html = <<~HTML
218
+ <p>1) What is the capital of France?</p>
219
+ <p>a) Paris</p>
220
+ <p>b) London</p>
221
+ <p>2) What is H2O?</p>
222
+ <p>a) Water</p>
223
+ <p>b) Fire</p>
224
+ HTML
225
+ doc = Nokogiri::HTML.fragment(html)
226
+ chunks = described_class.new.split(doc)
227
+
228
+ expect(chunks.length).to eq(2)
229
+ end
230
+
231
+ it "splits on paragraphs starting with number-dot pattern" do
232
+ html = <<~HTML
233
+ <p>1. What is the capital of France?</p>
234
+ <p>a) Paris</p>
235
+ <p>2. What is H2O?</p>
236
+ <p>a) Water</p>
237
+ HTML
238
+ doc = Nokogiri::HTML.fragment(html)
239
+ chunks = described_class.new.split(doc)
240
+
241
+ expect(chunks.length).to eq(2)
242
+ end
243
+
244
+ it "returns empty array when no numbered questions found" do
245
+ html = "<p>Just some regular text</p><p>More text</p>"
246
+ doc = Nokogiri::HTML.fragment(html)
247
+ chunks = described_class.new.split(doc)
248
+
249
+ expect(chunks).to eq([])
250
+ end
251
+
252
+ it "separates header content before first question" do
253
+ html = <<~HTML
254
+ <p>Exam: Midterm</p>
255
+ <p>Total: 30 questions</p>
256
+ <p>1) First question?</p>
257
+ <p>2) Second question?</p>
258
+ HTML
259
+ doc = Nokogiri::HTML.fragment(html)
260
+ strategy = described_class.new
261
+ chunks = strategy.split(doc)
262
+
263
+ expect(chunks.length).to eq(2)
264
+ expect(strategy.header_nodes.length).to eq(2)
265
+ end
266
+
267
+ it "does not split on lettered options like a) b) c)" do
268
+ html = <<~HTML
269
+ <p>1) What is the capital of France?</p>
270
+ <p>a) Paris</p>
271
+ <p>b) London</p>
272
+ <p>2) What is H2O?</p>
273
+ HTML
274
+ doc = Nokogiri::HTML.fragment(html)
275
+ chunks = described_class.new.split(doc)
276
+
277
+ expect(chunks.length).to eq(2)
278
+ end
279
+ end
280
+ end
281
+ ```
282
+
283
+ **Step 2: Run test to verify it fails**
284
+
285
+ Run: `bundle exec rspec spec/atomic_assessments_import/examsoft/chunker/numbered_question_strategy_spec.rb --format documentation`
286
+ Expected: FAIL — uninitialized constant
287
+
288
+ **Step 3: Write implementation**
289
+
290
+ Create `lib/atomic_assessments_import/exam_soft/chunker/numbered_question_strategy.rb`:
291
+
292
+ ```ruby
293
+ # frozen_string_literal: true
294
+
295
+ require_relative "strategy"
296
+
297
+ module AtomicAssessmentsImport
298
+ module ExamSoft
299
+ module Chunker
300
+ class NumberedQuestionStrategy < Strategy
301
+ # Matches "1)" or "1." or "12)" etc. at start of text, but NOT single letters like "a)"
302
+ NUMBERED_PATTERN = /\A\s*(\d+)\s*[.)]/
303
+
304
+ def split(doc)
305
+ @header_nodes = []
306
+ chunks = []
307
+ current_chunk = []
308
+ found_first = false
309
+
310
+ doc.children.each do |node|
311
+ text = node.text.strip
312
+ next if text.empty? && !node.name.match?(/^(img|table|hr)$/i)
313
+
314
+ if text.match?(NUMBERED_PATTERN)
315
+ found_first = true
316
+ chunks << current_chunk unless current_chunk.empty?
317
+ current_chunk = [node]
318
+ elsif found_first
319
+ current_chunk << node
320
+ else
321
+ @header_nodes << node
322
+ end
323
+ end
324
+
325
+ chunks << current_chunk unless current_chunk.empty?
326
+ # Only valid if we found more than one chunk (single could be a false positive)
327
+ chunks.length > 1 ? chunks : []
328
+ end
329
+ end
330
+ end
331
+ end
332
+ end
333
+ ```
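
The numbered-question regex used in the implementation above can be checked in isolation. The pattern is copied from the plan; `question_number` and the sample strings are hypothetical. The last sample shows one way a false positive can arise, which fits the plan's choice to discard a lone chunk.

```ruby
# frozen_string_literal: true

# Pattern copied from the plan's NumberedQuestionStrategy.
NUMBERED_PATTERN = /\A\s*(\d+)\s*[.)]/

# Hypothetical helper: returns the leading question number, or nil.
def question_number(text)
  match = NUMBERED_PATTERN.match(text)
  match && match[1].to_i
end

["1) What is the capital?", "12. Next question", "a) Paris", "3.14 is pi"].each do |text|
  puts "#{question_number(text).inspect}  #{text}"
end
```

Lettered options such as `a)` are not matched, while a paragraph that opens with a decimal number is, because the digits are followed by a dot.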
334
+
335
+ **Step 4: Run test to verify it passes**
336
+
337
+ Run: `bundle exec rspec spec/atomic_assessments_import/examsoft/chunker/numbered_question_strategy_spec.rb --format documentation`
338
+ Expected: PASS
339
+
340
+ **Step 5: Commit**
341
+
342
+ ```bash
343
+ git add lib/atomic_assessments_import/exam_soft/chunker/numbered_question_strategy.rb spec/atomic_assessments_import/examsoft/chunker/numbered_question_strategy_spec.rb
344
+ git commit -m "feat: add NumberedQuestionStrategy for chunking"
345
+ ```
346
+
347
+ ---
348
+
349
+ ### Task 3: HeadingSplitStrategy + HorizontalRuleSplitStrategy
350
+
351
+ These two are simple and follow the same pattern, so they're combined.
352
+
353
+ **Files:**
354
+ - Create: `lib/atomic_assessments_import/exam_soft/chunker/heading_split_strategy.rb`
355
+ - Create: `lib/atomic_assessments_import/exam_soft/chunker/horizontal_rule_split_strategy.rb`
356
+ - Test: `spec/atomic_assessments_import/examsoft/chunker/heading_split_strategy_spec.rb`
357
+ - Test: `spec/atomic_assessments_import/examsoft/chunker/horizontal_rule_split_strategy_spec.rb`
358
+
359
+ **Step 1: Write failing tests**
360
+
361
+ Create `spec/atomic_assessments_import/examsoft/chunker/heading_split_strategy_spec.rb`:
362
+
363
+ ```ruby
364
+ # frozen_string_literal: true
365
+
366
+ require "atomic_assessments_import"
367
+ require "nokogiri"
368
+
369
+ RSpec.describe AtomicAssessmentsImport::ExamSoft::Chunker::HeadingSplitStrategy do
370
+ describe "#split" do
371
+ it "splits on heading tags" do
372
+ html = <<~HTML
373
+ <h2>Question 1</h2>
374
+ <p>What is the capital of France?</p>
375
+ <p>a) Paris</p>
376
+ <h2>Question 2</h2>
377
+ <p>What is H2O?</p>
378
+ <p>a) Water</p>
379
+ HTML
380
+ doc = Nokogiri::HTML.fragment(html)
381
+ chunks = described_class.new.split(doc)
382
+
383
+ expect(chunks.length).to eq(2)
384
+ end
385
+
386
+ it "returns empty array when no headings found" do
387
+ html = "<p>No headings here</p>"
388
+ doc = Nokogiri::HTML.fragment(html)
389
+ chunks = described_class.new.split(doc)
390
+
391
+ expect(chunks).to eq([])
392
+ end
393
+
394
+ it "separates header content before first heading" do
395
+ html = <<~HTML
396
+ <p>Exam header info</p>
397
+ <h2>Question 1</h2>
398
+ <h2>Question 2</h2>
399
+ HTML
400
+ doc = Nokogiri::HTML.fragment(html)
401
+ strategy = described_class.new
402
+ chunks = strategy.split(doc)
403
+
404
+ expect(chunks.length).to eq(2)
405
+ expect(strategy.header_nodes).not_to be_empty
406
+ end
407
+ end
408
+ end
409
+ ```
410
+
411
+ Create `spec/atomic_assessments_import/examsoft/chunker/horizontal_rule_split_strategy_spec.rb`:
412
+
413
+ ```ruby
414
+ # frozen_string_literal: true
415
+
416
+ require "atomic_assessments_import"
417
+ require "nokogiri"
418
+
419
+ RSpec.describe AtomicAssessmentsImport::ExamSoft::Chunker::HorizontalRuleSplitStrategy do
420
+ describe "#split" do
421
+ it "splits on hr tags" do
422
+ html = <<~HTML
423
+ <p>Question 1: What is the capital of France?</p>
424
+ <p>a) Paris</p>
425
+ <hr/>
426
+ <p>Question 2: What is H2O?</p>
427
+ <p>a) Water</p>
428
+ HTML
429
+ doc = Nokogiri::HTML.fragment(html)
430
+ chunks = described_class.new.split(doc)
431
+
432
+ expect(chunks.length).to eq(2)
433
+ end
434
+
435
+ it "returns empty array when no hr tags found" do
436
+ html = "<p>No rules here</p>"
437
+ doc = Nokogiri::HTML.fragment(html)
438
+ chunks = described_class.new.split(doc)
439
+
440
+ expect(chunks).to eq([])
441
+ end
442
+
443
+ it "treats content before the first hr as the first chunk" do
444
+ html = <<~HTML
445
+ <p>Question 1</p>
446
+ <hr/>
447
+ <p>Question 2</p>
448
+ HTML
449
+ doc = Nokogiri::HTML.fragment(html)
450
+ strategy = described_class.new
451
+ chunks = strategy.split(doc)
452
+
453
+ expect(chunks.length).to eq(2)
454
+ expect(strategy.header_nodes).to be_empty
455
+ end
456
+ end
457
+ end
458
+ ```
459
+
460
+ **Step 2: Run tests to verify they fail**
461
+
462
+ Run: `bundle exec rspec spec/atomic_assessments_import/examsoft/chunker/heading_split_strategy_spec.rb spec/atomic_assessments_import/examsoft/chunker/horizontal_rule_split_strategy_spec.rb --format documentation`
463
+ Expected: FAIL — uninitialized constants
464
+
465
+ **Step 3: Write implementations**
466
+
467
+ Create `lib/atomic_assessments_import/exam_soft/chunker/heading_split_strategy.rb`:
468
+
469
+ ```ruby
470
+ # frozen_string_literal: true
471
+
472
+ require_relative "strategy"
473
+
474
+ module AtomicAssessmentsImport
475
+ module ExamSoft
476
+ module Chunker
477
+ class HeadingSplitStrategy < Strategy
478
+ HEADING_PATTERN = /^h[1-6]$/i
479
+
480
+ def split(doc)
481
+ @header_nodes = []
482
+ chunks = []
483
+ current_chunk = []
484
+ found_first = false
485
+
486
+ doc.children.each do |node|
487
+ if node.name.match?(HEADING_PATTERN)
488
+ found_first = true
489
+ chunks << current_chunk unless current_chunk.empty?
490
+ current_chunk = [node]
491
+ elsif found_first
492
+ text = node.text.strip
493
+ next if text.empty? && !node.name.match?(/^(img|table|hr)$/i)
494
+
495
+ current_chunk << node
496
+ else
497
+ @header_nodes << node unless node.text.strip.empty?
498
+ end
499
+ end
500
+
501
+ chunks << current_chunk unless current_chunk.empty?
502
+ chunks.length > 1 ? chunks : []
503
+ end
504
+ end
505
+ end
506
+ end
507
+ end
508
+ ```
509
+
510
+ Create `lib/atomic_assessments_import/exam_soft/chunker/horizontal_rule_split_strategy.rb`:
511
+
512
+ ```ruby
513
+ # frozen_string_literal: true
514
+
515
+ require_relative "strategy"
516
+
517
+ module AtomicAssessmentsImport
518
+ module ExamSoft
519
+ module Chunker
520
+ class HorizontalRuleSplitStrategy < Strategy
521
+ def split(doc)
522
+ @header_nodes = []
523
+ chunks = []
524
+ current_chunk = []
525
+ found_first = false
526
+
527
+ doc.children.each do |node|
528
+ if node.name == "hr"
529
+ if current_chunk.empty? && !found_first
530
+ # Content before first hr with no question content is header
531
+ next
532
+ end
533
+ found_first = true
534
+ chunks << current_chunk unless current_chunk.empty?
535
+ current_chunk = []
536
+ elsif found_first || !chunks.empty?
537
+ text = node.text.strip
538
+ next if text.empty? && !node.name.match?(/^(img|table)$/i)
539
+
540
+ current_chunk << node
541
+ else
542
+ text = node.text.strip
543
+ if text.empty?
544
+ next
545
+ else
546
+ # Before any hr — could be header or first question
547
+ current_chunk << node
548
+ end
549
+ end
550
+ end
551
+
552
+ chunks << current_chunk unless current_chunk.empty?
553
+
554
+ if chunks.length > 1
555
+ chunks
556
+ else
557
+ @header_nodes = []
558
+ []
559
+ end
560
+ end
561
+ end
562
+ end
563
+ end
564
+ end
565
+ ```
566
+
567
+ Note: The HorizontalRuleSplitStrategy differs from the other strategies: the `<hr>` is a separator *between* chunks, not part of a chunk. Content before the first `<hr>` becomes the first chunk, a leading `<hr>` with nothing before it is skipped, and as written this strategy never populates `header_nodes`.
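
The difference between a marker that opens a chunk and a separator that sits between chunks can be sketched on plain arrays. This is an illustration only: `split_on_marker` and `split_on_separator` are hypothetical helpers, and strings stand in for Nokogiri nodes.

```ruby
# frozen_string_literal: true

# Marker style (Folder:, numbered, heading): the matching item starts a
# chunk and stays inside it. Items before the first marker form their own
# leading slice.
def split_on_marker(items, &marker)
  items.slice_before(&marker).to_a
end

# Separator style (hr): the separator is dropped and only the items
# between separators are kept. Empty chunks are discarded.
def split_on_separator(items, separator)
  chunks = [[]]
  items.each do |item|
    if item == separator
      chunks << []
    else
      chunks.last << item
    end
  end
  chunks.reject(&:empty?)
end

p split_on_marker(%w[Q1 a b Q2 a]) { |item| item.start_with?("Q") }
p split_on_separator(%w[q1 a hr q2 b], "hr")
```

With the marker style the heading or numbered line is the first element of its chunk; with the separator style the `hr` never appears in the output.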
568
+
569
+ **Step 4: Run tests to verify they pass**
570
+
571
+ Run: `bundle exec rspec spec/atomic_assessments_import/examsoft/chunker/heading_split_strategy_spec.rb spec/atomic_assessments_import/examsoft/chunker/horizontal_rule_split_strategy_spec.rb --format documentation`
572
+ Expected: PASS
573
+
574
+ **Step 5: Commit**
575
+
576
+ ```bash
577
+ git add lib/atomic_assessments_import/exam_soft/chunker/heading_split_strategy.rb lib/atomic_assessments_import/exam_soft/chunker/horizontal_rule_split_strategy.rb spec/atomic_assessments_import/examsoft/chunker/heading_split_strategy_spec.rb spec/atomic_assessments_import/examsoft/chunker/horizontal_rule_split_strategy_spec.rb
578
+ git commit -m "feat: add HeadingSplitStrategy and HorizontalRuleSplitStrategy"
579
+ ```
580
+
581
+ ---
582
+
583
+ ### Task 4: Chunker Orchestrator
584
+
585
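+ The orchestrator tries each strategy in priority order and returns the result of the first one that produces chunks; it does not compare strategies against each other. Note that `chunker.rb` declares `class Chunker` while the strategy files from Tasks 1-3 open `module Chunker`. Ruby raises a `TypeError` when a constant is reopened with the other keyword, so the strategy files need to use `class Chunker` as well before this task's spec can load.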
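
A first-match cascade of this kind can be sketched without Nokogiri. The names below (`first_matching`, `SAMPLE_STRATEGIES`) are hypothetical, the strategies are plain lambdas over arrays of lines, and each one returns an empty array when it does not apply, mirroring the contract described for `Strategy#split`.

```ruby
# frozen_string_literal: true

# Hypothetical strategies, listed in priority order.
SAMPLE_STRATEGIES = {
  folder: ->(lines) {
    marker = ->(line) { line.start_with?("Folder:") }
    lines.any?(&marker) ? lines.slice_before(&marker).to_a : []
  },
  numbered: ->(lines) {
    chunks = lines.slice_before { |line| line.match?(/\A\d+[.)]/) }.to_a
    chunks.length > 1 ? chunks : []
  },
}.freeze

# Returns the first non-empty result; falls back to one chunk plus a warning.
def first_matching(strategies, lines)
  strategies.each do |name, strategy|
    chunks = strategy.call(lines)
    return { strategy: name, chunks: chunks, warnings: [] } unless chunks.empty?
  end

  { strategy: nil, chunks: [lines], warnings: ["No chunking strategy matched."] }
end

p first_matching(SAMPLE_STRATEGIES, ["1) A?", "a) x", "2) B?", "a) y"])
p first_matching(SAMPLE_STRATEGIES, ["Some text", "a) x"])
```

Because the first non-empty result wins, the order of the strategy list is itself part of the behavior: a document that contains both `Folder:` markers and numbered lines is always chunked by the earlier strategy.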
586
+
587
+ **Files:**
588
+ - Create: `lib/atomic_assessments_import/exam_soft/chunker.rb`
589
+ - Test: `spec/atomic_assessments_import/examsoft/chunker_spec.rb`
590
+
591
+ **Step 1: Write the failing test**
592
+
593
+ Create `spec/atomic_assessments_import/examsoft/chunker_spec.rb`:
594
+
595
+ ```ruby
596
+ # frozen_string_literal: true
597
+
598
+ require "atomic_assessments_import"
599
+ require "nokogiri"
600
+
601
+ RSpec.describe AtomicAssessmentsImport::ExamSoft::Chunker do
602
+ describe "#chunk" do
603
+ it "uses MetadataMarkerStrategy when Folder: markers are present" do
604
+ html = <<~HTML
605
+ <p>Folder: Geo Title: Q1 Category: Test 1) Question? ~ Expl</p>
606
+ <p>*a) Answer</p>
607
+ <p>Folder: Sci Title: Q2 Category: Test 2) Question2? ~ Expl</p>
608
+ <p>*a) Answer2</p>
609
+ HTML
610
+ doc = Nokogiri::HTML.fragment(html)
611
+ chunker = described_class.new(doc)
612
+ result = chunker.chunk
613
+
614
+ expect(result[:chunks].length).to eq(2)
615
+ end
616
+
617
+ it "falls back to NumberedQuestionStrategy when no metadata markers" do
618
+ html = <<~HTML
619
+ <p>1) What is the capital of France?</p>
620
+ <p>a) Paris</p>
621
+ <p>b) London</p>
622
+ <p>2) What is H2O?</p>
623
+ <p>a) Water</p>
624
+ <p>b) Fire</p>
625
+ HTML
626
+ doc = Nokogiri::HTML.fragment(html)
627
+ chunker = described_class.new(doc)
628
+ result = chunker.chunk
629
+
630
+ expect(result[:chunks].length).to eq(2)
631
+ end
632
+
633
+ it "falls back to HeadingSplitStrategy when no numbers" do
634
+ html = <<~HTML
635
+ <h2>Question 1</h2>
636
+ <p>What is the capital?</p>
637
+ <p>a) Paris</p>
638
+ <h2>Question 2</h2>
639
+ <p>What is H2O?</p>
640
+ <p>a) Water</p>
641
+ HTML
642
+ doc = Nokogiri::HTML.fragment(html)
643
+ chunker = described_class.new(doc)
644
+ result = chunker.chunk
645
+
646
+ expect(result[:chunks].length).to eq(2)
647
+ end
648
+
649
+ it "returns whole document as single chunk when no strategy matches" do
650
+ html = <<~HTML
651
+ <p>Some question text here</p>
652
+ <p>a) An option</p>
653
+ HTML
654
+ doc = Nokogiri::HTML.fragment(html)
655
+ chunker = described_class.new(doc)
656
+ result = chunker.chunk
657
+
658
+ expect(result[:chunks].length).to eq(1)
659
+ expect(result[:warnings]).to include(/No chunking strategy/i)
660
+ end
661
+
662
+ it "extracts header nodes" do
663
+ html = <<~HTML
664
+ <p>Exam: Midterm 2024</p>
665
+ <p>Total Questions: 30</p>
666
+ <p>Folder: Geo Title: Q1 Category: Test 1) Question? ~ Expl</p>
667
+ <p>*a) Answer</p>
668
+ HTML
669
+ doc = Nokogiri::HTML.fragment(html)
670
+ chunker = described_class.new(doc)
671
+ result = chunker.chunk
672
+
673
+ expect(result[:header_nodes]).not_to be_empty
674
+ end
675
+ end
676
+ end
677
+ ```
678
+
679
+ **Step 2: Run test to verify it fails**
680
+
681
+ Run: `bundle exec rspec spec/atomic_assessments_import/examsoft/chunker_spec.rb --format documentation`
682
+ Expected: FAIL
683
+
684
+ **Step 3: Write implementation**
685
+
686
+ Create `lib/atomic_assessments_import/exam_soft/chunker.rb`:
687
+
688
+ ```ruby
689
+ # frozen_string_literal: true
690
+
691
+ require_relative "chunker/strategy"
692
+ require_relative "chunker/metadata_marker_strategy"
693
+ require_relative "chunker/numbered_question_strategy"
694
+ require_relative "chunker/heading_split_strategy"
695
+ require_relative "chunker/horizontal_rule_split_strategy"
696
+
697
+ module AtomicAssessmentsImport
698
+ module ExamSoft
699
+ class Chunker
700
+ STRATEGIES = [
701
+ Chunker::MetadataMarkerStrategy,
702
+ Chunker::NumberedQuestionStrategy,
703
+ Chunker::HeadingSplitStrategy,
704
+ Chunker::HorizontalRuleSplitStrategy,
705
+ ].freeze
706
+
707
+ def initialize(doc)
708
+ @doc = doc
709
+ end
710
+
711
+ def chunk
712
+ warnings = []
713
+
714
+ STRATEGIES.each do |strategy_class|
715
+ strategy = strategy_class.new
716
+ chunks = strategy.split(@doc)
717
+ next if chunks.empty?
718
+
719
+ return {
720
+ chunks: chunks,
721
+ header_nodes: strategy.header_nodes,
722
+ warnings: warnings,
723
+ }
724
+ end
725
+
726
+ # No strategy matched — return entire document as one chunk
727
+ all_nodes = @doc.children.reject { |n| n.text.strip.empty? && !n.name.match?(/^(img|table|hr)$/i) }
728
+ warnings << "No chunking strategy matched. Treating entire document as a single question."
729
+
730
+ {
731
+ chunks: [all_nodes],
732
+ header_nodes: [],
733
+ warnings: warnings,
734
+ }
735
+ end
736
+ end
737
+ end
738
+ end
739
+ ```
740
+
741
+ **Step 4: Run test to verify it passes**
742
+
743
+ Run: `bundle exec rspec spec/atomic_assessments_import/examsoft/chunker_spec.rb --format documentation`
744
+ Expected: PASS
745
+
746
+ **Step 5: Commit**
747
+
748
+ ```bash
749
+ git add lib/atomic_assessments_import/exam_soft/chunker.rb spec/atomic_assessments_import/examsoft/chunker_spec.rb
750
+ git commit -m "feat: add Chunker orchestrator with strategy cascade"
751
+ ```
752
+
753
+ ---
754
+
755
+ ### Task 5: Field Detectors — QuestionStem, Options, CorrectAnswer
756
+
757
+ These three are the core detectors needed for multiple-choice questions.
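
As a rough picture of what option and correct-answer detection involves, the sketch below parses option lines from plain strings. It is not the gem's detector code: `OPTION_LINE`, `parse_option`, and `correct_letters` are hypothetical, the `a` to `o` letter range is borrowed from the plan's `OPTION_PATTERN`, and only the asterisk marker is handled (not bold text or an `Answer:` label).

```ruby
# frozen_string_literal: true

# Hypothetical pattern: optional asterisk, a letter a-o, then ")" or ".".
OPTION_LINE = /\A\s*(\*)?\s*([a-oA-O])[.)]\s*(.+)\z/m

# Returns a hash in the shape the plan's specs use, or nil for non-options.
def parse_option(text)
  match = OPTION_LINE.match(text.strip)
  return nil unless match

  { letter: match[2].downcase, text: match[3].strip, correct: !match[1].nil? }
end

# Letters of all options marked correct, in document order.
def correct_letters(lines)
  lines.map { |line| parse_option(line) }
       .compact
       .select { |option| option[:correct] }
       .map { |option| option[:letter] }
end

lines = ["1) Which cities are in France?", "*a) Paris", "b) London", "*c) Lyon"]
p correct_letters(lines)
```

Keeping the parsed options as plain hashes lets a later step, such as a correct-answer detector, work from the option list without re-reading the markup.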
758
+
759
+ **Files:**
760
+ - Create: `lib/atomic_assessments_import/exam_soft/extractor/question_stem_detector.rb`
761
+ - Create: `lib/atomic_assessments_import/exam_soft/extractor/options_detector.rb`
762
+ - Create: `lib/atomic_assessments_import/exam_soft/extractor/correct_answer_detector.rb`
763
+ - Test: `spec/atomic_assessments_import/examsoft/extractor/question_stem_detector_spec.rb`
764
+ - Test: `spec/atomic_assessments_import/examsoft/extractor/options_detector_spec.rb`
765
+ - Test: `spec/atomic_assessments_import/examsoft/extractor/correct_answer_detector_spec.rb`
766
+
767
+ **Step 1: Write failing tests**
768
+
769
+ Create `spec/atomic_assessments_import/examsoft/extractor/question_stem_detector_spec.rb`:
770
+
771
+ ```ruby
772
+ # frozen_string_literal: true
773
+
774
+ require "atomic_assessments_import"
775
+ require "nokogiri"
776
+
777
+ RSpec.describe AtomicAssessmentsImport::ExamSoft::Extractor::QuestionStemDetector do
778
+ def nodes_from(html)
779
+ Nokogiri::HTML.fragment(html).children.to_a
780
+ end
781
+
782
+ describe "#detect" do
783
+ it "extracts question text before options" do
784
+ nodes = nodes_from(<<~HTML)
785
+ <p>1) What is the capital of France?</p>
786
+ <p>a) Paris</p>
787
+ <p>b) London</p>
788
+ HTML
789
+ result = described_class.new(nodes).detect
790
+
791
+ expect(result).to eq("What is the capital of France?")
792
+ end
793
+
794
+ it "extracts question text with tilde-separated explanation removed" do
795
+ nodes = nodes_from(<<~HTML)
796
+ <p>Folder: Geo Title: Q1 Category: Test 1) What is the capital? ~ Paris is the capital.</p>
797
+ <p>*a) Paris</p>
798
+ HTML
799
+ result = described_class.new(nodes).detect
800
+
801
+ expect(result).to eq("What is the capital?")
802
+ end
803
+
804
+ it "extracts question text without numbered prefix" do
805
+ nodes = nodes_from(<<~HTML)
806
+ <p>What is the capital of France?</p>
807
+ <p>a) Paris</p>
808
+ HTML
809
+ result = described_class.new(nodes).detect
810
+
811
+ expect(result).to eq("What is the capital of France?")
812
+ end
813
+
814
+ it "returns nil when no question text found" do
815
+ nodes = nodes_from("<p>a) Paris</p><p>b) London</p>")
816
+ result = described_class.new(nodes).detect
817
+
818
+ expect(result).to be_nil
819
+ end
820
+ end
821
+ end
822
+ ```
823
+
824
+ Create `spec/atomic_assessments_import/examsoft/extractor/options_detector_spec.rb`:
825
+
826
+ ```ruby
827
+ # frozen_string_literal: true
828
+
829
+ require "atomic_assessments_import"
830
+ require "nokogiri"
831
+
832
+ RSpec.describe AtomicAssessmentsImport::ExamSoft::Extractor::OptionsDetector do
833
+ def nodes_from(html)
834
+ Nokogiri::HTML.fragment(html).children.to_a
835
+ end
836
+
837
+ describe "#detect" do
838
+ it "extracts lettered options with paren format" do
839
+ nodes = nodes_from(<<~HTML)
840
+ <p>Question text</p>
841
+ <p>a) Paris</p>
842
+ <p>b) London</p>
843
+ <p>c) Berlin</p>
844
+ HTML
845
+ result = described_class.new(nodes).detect
846
+
847
+ expect(result.length).to eq(3)
848
+ expect(result[0][:text]).to eq("Paris")
849
+ expect(result[1][:text]).to eq("London")
850
+ expect(result[2][:text]).to eq("Berlin")
851
+ end
852
+
853
+ it "detects correct answer markers with asterisk" do
854
+ nodes = nodes_from(<<~HTML)
855
+ <p>*a) Paris</p>
856
+ <p>b) London</p>
857
+ HTML
858
+ result = described_class.new(nodes).detect
859
+
860
+ expect(result[0][:correct]).to be true
861
+ expect(result[1][:correct]).to be false
862
+ end
863
+
864
+ it "detects correct answer markers with bold" do
865
+ nodes = nodes_from(<<~HTML)
866
+ <p><strong>a) Paris</strong></p>
867
+ <p>b) London</p>
868
+ HTML
869
+ result = described_class.new(nodes).detect
870
+
871
+ expect(result[0][:correct]).to be true
872
+ expect(result[1][:correct]).to be false
873
+ end
874
+
875
+ it "returns empty array when no options found" do
876
+ nodes = nodes_from("<p>Just a paragraph</p>")
877
+ result = described_class.new(nodes).detect
878
+
879
+ expect(result).to eq([])
880
+ end
881
+
882
+ it "handles uppercase letter options" do
883
+ nodes = nodes_from(<<~HTML)
884
+ <p>A) Paris</p>
885
+ <p>B) London</p>
886
+ HTML
887
+ result = described_class.new(nodes).detect
888
+
889
+ expect(result.length).to eq(2)
890
+ expect(result[0][:text]).to eq("Paris")
891
+ end
892
+ end
893
+ end
894
+ ```
895
+
896
+ Create `spec/atomic_assessments_import/examsoft/extractor/correct_answer_detector_spec.rb`:
897
+
898
+ ```ruby
899
+ # frozen_string_literal: true
900
+
901
+ require "atomic_assessments_import"
902
+ require "nokogiri"
903
+
904
+ RSpec.describe AtomicAssessmentsImport::ExamSoft::Extractor::CorrectAnswerDetector do
905
+ def nodes_from(html)
906
+ Nokogiri::HTML.fragment(html).children.to_a
907
+ end
908
+
909
+ describe "#detect" do
910
+ it "detects correct answers from asterisk-marked options" do
911
+ options = [
912
+ { text: "Paris", letter: "a", correct: true },
913
+ { text: "London", letter: "b", correct: false },
914
+ ]
915
+ result = described_class.new(nodes_from(""), options).detect
916
+
917
+ expect(result).to eq(["a"])
918
+ end
919
+
920
+ it "detects multiple correct answers" do
921
+ options = [
922
+ { text: "Little Rock", letter: "a", correct: true },
923
+ { text: "Denver", letter: "b", correct: true },
924
+ { text: "Detroit", letter: "c", correct: false },
925
+ ]
926
+ result = described_class.new(nodes_from(""), options).detect
927
+
928
+ expect(result).to eq(["a", "b"])
929
+ end
930
+
931
+ it "detects correct answer from Answer: label in chunk" do
932
+ nodes = nodes_from("<p>Answer: A</p>")
933
+ options = [
934
+ { text: "Paris", letter: "a", correct: false },
935
+ { text: "London", letter: "b", correct: false },
936
+ ]
937
+ result = described_class.new(nodes, options).detect
938
+
939
+ expect(result).to eq(["a"])
940
+ end
941
+
942
+ it "returns empty array when no correct answer found" do
943
+ options = [
944
+ { text: "Paris", letter: "a", correct: false },
945
+ { text: "London", letter: "b", correct: false },
946
+ ]
947
+ result = described_class.new(nodes_from(""), options).detect
948
+
949
+ expect(result).to eq([])
950
+ end
951
+ end
952
+ end
953
+ ```
954
+
955
+ **Step 2: Run tests to verify they fail**
956
+
957
+ Run: `bundle exec rspec spec/atomic_assessments_import/examsoft/extractor/ -v`
958
+ Expected: FAIL — uninitialized constants
959
+
960
+ **Step 3: Write implementations**
961
+
962
+ Create `lib/atomic_assessments_import/exam_soft/extractor/question_stem_detector.rb`:
963
+
964
+ ```ruby
965
+ # frozen_string_literal: true
966
+
967
+ module AtomicAssessmentsImport
968
+ module ExamSoft
969
+ module Extractor
970
+ class QuestionStemDetector
971
+ OPTION_PATTERN = /\A\s*\*?[a-oA-O][.)]/
972
+ NUMBERED_PREFIX = /\A\s*\d+\s*[.)]\s*/
973
+ METADATA_PREFIX = /\A\s*(?:Type|Folder|Title|Category):.*?(?=\d+\s*[.)])/m
974
+ TILDE_SPLIT = /\s*~\s*/
975
+
976
+ def initialize(nodes)
977
+ @nodes = nodes
978
+ end
979
+
980
+ def detect
981
+ @nodes.each do |node|
982
+ text = node.text.strip
983
+ next if text.empty?
984
+ next if text.match?(OPTION_PATTERN)
985
+
986
+ # This node contains the question stem (possibly with metadata prefix)
987
+ # Try to extract just the question part
988
+ stem = extract_stem(text)
989
+ return stem unless stem.nil? || stem.empty?
990
+ end
991
+
992
+ nil
993
+ end
994
+
995
+ private
996
+
997
+ def extract_stem(text)
998
+ # Remove metadata prefix if present (Folder:, Title:, Category:, etc.)
999
+ cleaned = text.sub(METADATA_PREFIX, "")
1000
+ # Remove numbered prefix
1001
+ cleaned = cleaned.sub(NUMBERED_PREFIX, "")
1002
+ # Split on tilde (explanation separator) and take the question part
1003
+ cleaned = cleaned.split(TILDE_SPLIT).first
1004
+ cleaned&.strip.presence
1005
+ end
1006
+ end
1007
+ end
1008
+ end
1009
+ end
1010
+ ```
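The cleanup order in `extract_stem` can be tried on plain strings without Nokogiri. What follows is a minimal sketch rather than the plan's implementation: `stem_from` is a hypothetical helper, and `LABEL_PREFIX` is a simplified stand-in for `METADATA_PREFIX` that assumes the metadata labels are always followed by a numbered marker such as `1)`.

```ruby
# frozen_string_literal: true

# Simplified stand-in for METADATA_PREFIX (an assumption, not the plan's regex):
# drop everything from a leading metadata label up to the question number.
LABEL_PREFIX = /\A\s*(?:Type|Folder|Title|Category):.*?(?=\d+\s*[.)])/m
NUMBERED_PREFIX = /\A\s*\d+\s*[.)]\s*/
TILDE_SPLIT = /\s*~\s*/

# Hypothetical helper mirroring the three cleanup steps in extract_stem.
def stem_from(text)
  cleaned = text.sub(LABEL_PREFIX, "")       # 1. metadata labels
  cleaned = cleaned.sub(NUMBERED_PREFIX, "") # 2. "1)" or "1."
  cleaned = cleaned.split(TILDE_SPLIT).first # 3. keep the part before "~"
  stem = cleaned.to_s.strip
  stem.empty? ? nil : stem
end

puts stem_from("Folder: Geography Title: Question 1 Category: Capitals 1) What is the capital of France? ~ Paris.")
puts stem_from("2. Name a prime number.")
```

The order matters: the numbered prefix is only at the start of the string once the metadata labels have been removed, and the tilde split runs last so that feedback text never leaks into the stem.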
1011
+
1012
+ Create `lib/atomic_assessments_import/exam_soft/extractor/options_detector.rb`:
1013
+
1014
+ ```ruby
1015
+ # frozen_string_literal: true
1016
+
1017
+ module AtomicAssessmentsImport
1018
+ module ExamSoft
1019
+ module Extractor
1020
+ class OptionsDetector
1021
+ OPTION_PATTERN = /\A\s*(\*?)([a-oA-O])\s*[.)]\s*(.+)/m
1022
+
1023
+ def initialize(nodes)
1024
+ @nodes = nodes
1025
+ end
1026
+
1027
+ def detect
1028
+ options = []
1029
+
1030
+ @nodes.each do |node|
1031
+ text = node.text.strip
1032
+ match = text.match(OPTION_PATTERN)
1033
+ next unless match
1034
+
1035
+ marker = match[1]
1036
+ letter = match[2].downcase
1037
+ option_text = match[3].strip
1038
+
1039
+ # Check for bold formatting as correct marker
1040
+ bold = node.at_css("strong, b")
1041
+ is_correct = marker == "*" || (bold && bold.text.strip == text.strip)
1042
+
1043
+ options << {
1044
+ text: option_text,
1045
+ letter: letter,
1046
+ correct: is_correct || false,
1047
+ }
1048
+ end
1049
+
1050
+ options
1051
+ end
1052
+ end
1053
+ end
1054
+ end
1055
+ end
1056
+ ```
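To see what `OPTION_PATTERN` captures, the same regex can be applied to plain text lines. This is a sketch under stated assumptions: `parse_options` is a hypothetical helper that takes strings instead of Nokogiri nodes, so the bold-as-correct check is left out.

```ruby
# frozen_string_literal: true

# Same pattern as OptionsDetector: optional "*", a letter a-o, "." or ")", then the text.
OPTION_PATTERN = /\A\s*(\*?)([a-oA-O])\s*[.)]\s*(.+)/m

# Hypothetical helper: parses plain strings, so only the asterisk marker is honored.
def parse_options(lines)
  lines.filter_map do |line|
    match = line.strip.match(OPTION_PATTERN)
    next unless match

    { text: match[3].strip, letter: match[2].downcase, correct: match[1] == "*" }
  end
end

options = parse_options(["1) What is the capital of France?", "*a) Paris", "B. London"])
options.each { |opt| puts opt.inspect }
```

The numbered stem line is skipped because it does not start with a letter in the `a` to `o` range, and the uppercase `B.` is normalized to `"b"`.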
1057
+
1058
+ Create `lib/atomic_assessments_import/exam_soft/extractor/correct_answer_detector.rb`:
1059
+
1060
+ ```ruby
1061
+ # frozen_string_literal: true
1062
+
1063
+ module AtomicAssessmentsImport
1064
+ module ExamSoft
1065
+ module Extractor
1066
+ class CorrectAnswerDetector
1067
+ ANSWER_LABEL_PATTERN = /\bAnswer:\s*([A-Oa-o,;\s]+)/i
1068
+
1069
+ def initialize(nodes, options)
1070
+ @nodes = nodes
1071
+ @options = options
1072
+ end
1073
+
1074
+ def detect
1075
+ # First: check options for correct markers (asterisk, bold)
1076
+ from_markers = @options.select { |o| o[:correct] }.map { |o| o[:letter] }
1077
+ return from_markers unless from_markers.empty?
1078
+
1079
+ # Second: look for "Answer:" label in the chunk
1080
+ @nodes.each do |node|
1081
+ text = node.text.strip
1082
+ match = text.match(ANSWER_LABEL_PATTERN)
1083
+ next unless match
1084
+
1085
+ letters = match[1].scan(/[a-oA-O]/).map(&:downcase)
1086
+ return letters unless letters.empty?
1087
+ end
1088
+
1089
+ []
1090
+ end
1091
+ end
1092
+ end
1093
+ end
1094
+ end
1095
+ ```
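The two-stage lookup in `detect` (option markers first, `Answer:` label second) can be sketched on plain data. `correct_letters` is a hypothetical helper, and `lines` are plain strings standing in for node text.

```ruby
# frozen_string_literal: true

ANSWER_LABEL_PATTERN = /\bAnswer:\s*([A-Oa-o,;\s]+)/i

# Hypothetical helper following the same precedence as CorrectAnswerDetector#detect.
def correct_letters(lines, options)
  # Stage 1: trust explicit markers already recorded on the options.
  marked = options.select { |opt| opt[:correct] }.map { |opt| opt[:letter] }
  return marked unless marked.empty?

  # Stage 2: fall back to the first "Answer:" label that names a letter.
  lines.each do |line|
    match = line.match(ANSWER_LABEL_PATTERN)
    next unless match

    letters = match[1].scan(/[a-o]/i).map(&:downcase)
    return letters unless letters.empty?
  end

  []
end

unmarked = [{ letter: "a", correct: false }, { letter: "b", correct: false }]
p correct_letters(["Answer: A, B"], unmarked)
p correct_letters(["Answer: B"], [{ letter: "a", correct: true }])
```

In the second call the marked option wins and the `Answer:` label is never consulted, which is the precedence the plan's tests rely on.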
1096
+
1097
+ **Step 4: Run tests to verify they pass**
1098
+
1099
+ Run: `bundle exec rspec spec/atomic_assessments_import/examsoft/extractor/ -v`
1100
+ Expected: PASS
1101
+
1102
+ **Step 5: Commit**
1103
+
1104
+ ```bash
1105
+ git add lib/atomic_assessments_import/exam_soft/extractor/ spec/atomic_assessments_import/examsoft/extractor/
1106
+ git commit -m "feat: add core field detectors (stem, options, correct answer)"
1107
+ ```
1108
+
1109
+ ---
1110
+
1111
+ ### Task 6: Field Detectors — Metadata, Feedback, QuestionType
1112
+
1113
+ **Files:**
1114
+ - Create: `lib/atomic_assessments_import/exam_soft/extractor/metadata_detector.rb`
1115
+ - Create: `lib/atomic_assessments_import/exam_soft/extractor/feedback_detector.rb`
1116
+ - Create: `lib/atomic_assessments_import/exam_soft/extractor/question_type_detector.rb`
1117
+ - Test: `spec/atomic_assessments_import/examsoft/extractor/metadata_detector_spec.rb`
1118
+ - Test: `spec/atomic_assessments_import/examsoft/extractor/feedback_detector_spec.rb`
1119
+ - Test: `spec/atomic_assessments_import/examsoft/extractor/question_type_detector_spec.rb`
1120
+
1121
+ **Step 1: Write failing tests**
1122
+
1123
+ Create `spec/atomic_assessments_import/examsoft/extractor/metadata_detector_spec.rb`:
1124
+
1125
+ ```ruby
1126
+ # frozen_string_literal: true
1127
+
1128
+ require "atomic_assessments_import"
1129
+ require "nokogiri"
1130
+
1131
+ RSpec.describe AtomicAssessmentsImport::ExamSoft::Extractor::MetadataDetector do
1132
+ def nodes_from(html)
1133
+ Nokogiri::HTML.fragment(html).children.to_a
1134
+ end
1135
+
1136
+ describe "#detect" do
1137
+ it "extracts folder, title, and category" do
1138
+ nodes = nodes_from("<p>Folder: Geography Title: Question 1 Category: Subject/Capitals, Difficulty/Normal 1) Question?</p>")
1139
+ result = described_class.new(nodes).detect
1140
+
1141
+ expect(result[:folder]).to eq("Geography")
1142
+ expect(result[:title]).to eq("Question 1")
1143
+ expect(result[:categories]).to include("Subject/Capitals")
1144
+ end
1145
+
1146
+ it "extracts type when present" do
1147
+ nodes = nodes_from("<p>Type: MA Folder: Geography Title: Q1 Category: Test 1) Question?</p>")
1148
+ result = described_class.new(nodes).detect
1149
+
1150
+ expect(result[:type]).to eq("ma")
1151
+ end
1152
+
1153
+ it "returns empty hash when no metadata found" do
1154
+ nodes = nodes_from("<p>Just a question with no metadata</p>")
1155
+ result = described_class.new(nodes).detect
1156
+
1157
+ expect(result).to eq({})
1158
+ end
1159
+ end
1160
+ end
1161
+ ```
1162
+
1163
+ Create `spec/atomic_assessments_import/examsoft/extractor/feedback_detector_spec.rb`:
1164
+
1165
+ ```ruby
1166
+ # frozen_string_literal: true
1167
+
1168
+ require "atomic_assessments_import"
1169
+ require "nokogiri"
1170
+
1171
+ RSpec.describe AtomicAssessmentsImport::ExamSoft::Extractor::FeedbackDetector do
1172
+ def nodes_from(html)
1173
+ Nokogiri::HTML.fragment(html).children.to_a
1174
+ end
1175
+
1176
+ describe "#detect" do
1177
+ it "extracts feedback after tilde" do
1178
+ nodes = nodes_from("<p>1) What is the capital? ~ Paris is the capital of France.</p>")
1179
+ result = described_class.new(nodes).detect
1180
+
1181
+ expect(result).to eq("Paris is the capital of France.")
1182
+ end
1183
+
1184
+ it "extracts feedback from Explanation: label" do
1185
+ nodes = nodes_from(<<~HTML)
1186
+ <p>What is the capital?</p>
1187
+ <p>Explanation: Paris is the capital of France.</p>
1188
+ HTML
1189
+ result = described_class.new(nodes).detect
1190
+
1191
+ expect(result).to eq("Paris is the capital of France.")
1192
+ end
1193
+
1194
+ it "extracts feedback from Rationale: label" do
1195
+ nodes = nodes_from(<<~HTML)
1196
+ <p>What is the capital?</p>
1197
+ <p>Rationale: Paris is the capital of France.</p>
1198
+ HTML
1199
+ result = described_class.new(nodes).detect
1200
+
1201
+ expect(result).to eq("Paris is the capital of France.")
1202
+ end
1203
+
1204
+ it "returns nil when no feedback found" do
1205
+ nodes = nodes_from("<p>Just a question</p>")
1206
+ result = described_class.new(nodes).detect
1207
+
1208
+ expect(result).to be_nil
1209
+ end
1210
+ end
1211
+ end
1212
+ ```
1213
+
1214
+ Create `spec/atomic_assessments_import/examsoft/extractor/question_type_detector_spec.rb`:
1215
+
1216
+ ```ruby
1217
+ # frozen_string_literal: true
1218
+
1219
+ require "atomic_assessments_import"
1220
+ require "nokogiri"
1221
+
1222
+ RSpec.describe AtomicAssessmentsImport::ExamSoft::Extractor::QuestionTypeDetector do
1223
+ def nodes_from(html)
1224
+ Nokogiri::HTML.fragment(html).children.to_a
1225
+ end
1226
+
1227
+ describe "#detect" do
1228
+ it "detects type from Type: label" do
1229
+ nodes = nodes_from("<p>Type: MA Folder: Geo 1) Question?</p>")
1230
+ result = described_class.new(nodes, has_options: true).detect
1231
+
1232
+ expect(result).to eq("ma")
1233
+ end
1234
+
1235
+ it "detects essay from Type: label" do
1236
+ nodes = nodes_from("<p>Type: Essay Folder: Geo 1) Question?</p>")
1237
+ result = described_class.new(nodes, has_options: false).detect
1238
+
1239
+ expect(result).to eq("essay")
1240
+ end
1241
+
1242
+ it "defaults to mcq when options are present" do
1243
+ nodes = nodes_from("<p>A question with no type label</p>")
1244
+ result = described_class.new(nodes, has_options: true).detect
1245
+
1246
+ expect(result).to eq("mcq")
1247
+ end
1248
+
1249
+ it "defaults to short_answer when no options" do
1250
+ nodes = nodes_from("<p>A question with no type label and no options</p>")
1251
+ result = described_class.new(nodes, has_options: false).detect
1252
+
1253
+ expect(result).to eq("short_answer")
1254
+ end
1255
+
1256
+ it "detects true/false from Type: label" do
1257
+ nodes = nodes_from("<p>Type: True/False 1) Question?</p>")
1258
+ result = described_class.new(nodes, has_options: true).detect
1259
+
1260
+ expect(result).to eq("true_false")
1261
+ end
1262
+
1263
+ it "detects matching from Type: label" do
1264
+ nodes = nodes_from("<p>Type: Matching 1) Question?</p>")
1265
+ result = described_class.new(nodes, has_options: false).detect
1266
+
1267
+ expect(result).to eq("matching")
1268
+ end
1269
+
1270
+ it "detects ordering from Type: label" do
1271
+ nodes = nodes_from("<p>Type: Ordering 1) Question?</p>")
1272
+ result = described_class.new(nodes, has_options: false).detect
1273
+
1274
+ expect(result).to eq("ordering")
1275
+ end
1276
+
1277
+ it "detects fill_in_the_blank from Type: label" do
1278
+ nodes = nodes_from("<p>Type: Fill in the Blank 1) Question?</p>")
1279
+ result = described_class.new(nodes, has_options: false).detect
1280
+
1281
+ expect(result).to eq("fill_in_the_blank")
1282
+ end
1283
+ end
1284
+ end
1285
+ ```
1286
+
1287
+ **Step 2: Run tests to verify they fail**
1288
+
1289
+ Run: `bundle exec rspec spec/atomic_assessments_import/examsoft/extractor/ -v`
1290
+ Expected: FAIL — uninitialized constants for new detectors
1291
+
1292
+ **Step 3: Write implementations**
1293
+
1294
+ Create `lib/atomic_assessments_import/exam_soft/extractor/metadata_detector.rb`:
1295
+
1296
+ ```ruby
1297
+ # frozen_string_literal: true
1298
+
1299
+ module AtomicAssessmentsImport
1300
+ module ExamSoft
1301
+ module Extractor
1302
+ class MetadataDetector
1303
+ FOLDER_PATTERN = /Folder:\s*(.+?)(?=\s*(?:Title:|Category:|\d+[.)]))/
1304
+ TITLE_PATTERN = /Title:\s*(.+?)(?=\s*(?:Category:|\d+[.)]))/
1305
+ CATEGORY_PATTERN = /Category:\s*(.+?)(?=\s*\d+[.)]|\z)/
1306
+ TYPE_PATTERN = /Type:\s*(\S+)/
1307
+
1308
+ def initialize(nodes)
1309
+ @nodes = nodes
1310
+ end
1311
+
1312
+ def detect
1313
+ # Combine all text from nodes to search for metadata
1314
+ full_text = @nodes.map { |n| n.text.strip }.join(" ")
1315
+ result = {}
1316
+
1317
+ type_match = full_text.match(TYPE_PATTERN)
1318
+ result[:type] = type_match[1].strip.downcase if type_match
1319
+
1320
+ folder_match = full_text.match(FOLDER_PATTERN)
1321
+ result[:folder] = folder_match[1].strip if folder_match
1322
+
1323
+ title_match = full_text.match(TITLE_PATTERN)
1324
+ result[:title] = title_match[1].strip if title_match
1325
+
1326
+ category_match = full_text.match(CATEGORY_PATTERN)
1327
+ if category_match
1328
+ result[:categories] = category_match[1].split(",").map(&:strip)
1329
+ end
1330
+
1331
+ result
1332
+ end
1333
+ end
1334
+ end
1335
+ end
1336
+ end
1337
+ ```
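The metadata patterns rely on a lazy capture bounded by a lookahead, so each field stops at the next label or at the question number. A short sketch on the string from the first spec example, with the three patterns copied from the detector:

```ruby
# frozen_string_literal: true

# Patterns copied from MetadataDetector. Each lazy capture stops at the
# lookahead: the next label or the question number.
FOLDER_PATTERN = /Folder:\s*(.+?)(?=\s*(?:Title:|Category:|\d+[.)]))/
TITLE_PATTERN = /Title:\s*(.+?)(?=\s*(?:Category:|\d+[.)]))/
CATEGORY_PATTERN = /Category:\s*(.+?)(?=\s*\d+[.)]|\z)/

text = "Folder: Geography Title: Question 1 Category: Subject/Capitals, Difficulty/Normal 1) Question?"

# String#[] with a regex and a group index returns that capture (or nil).
folder = text[FOLDER_PATTERN, 1]
title = text[TITLE_PATTERN, 1]
categories = text[CATEGORY_PATTERN, 1].split(",").map(&:strip)

puts folder # Geography
puts title  # Question 1
p categories
```

The title keeps its trailing `1` because the lookahead only treats digits as a question number when they are followed directly by `.` or `)`.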
1338
+
1339
+ Create `lib/atomic_assessments_import/exam_soft/extractor/feedback_detector.rb`:
1340
+
1341
+ ```ruby
1342
+ # frozen_string_literal: true
1343
+
1344
+ module AtomicAssessmentsImport
1345
+ module ExamSoft
1346
+ module Extractor
1347
+ class FeedbackDetector
1348
+ TILDE_PATTERN = /~\s*(.+)/m
1349
+ LABEL_PATTERN = /\A\s*(?:Explanation|Rationale):\s*(.+)/im
1350
+
1351
+ def initialize(nodes)
1352
+ @nodes = nodes
1353
+ end
1354
+
1355
+ def detect
1356
+ # First: look for tilde-separated feedback in any node
1357
+ @nodes.each do |node|
1358
+ text = node.text.strip
1359
+ match = text.match(TILDE_PATTERN)
1360
+ if match
1361
+ feedback = match[1].strip
1362
+ return feedback unless feedback.empty?
1363
+ end
1364
+ end
1365
+
1366
+ # Second: look for labeled feedback (Explanation:, Rationale:)
1367
+ @nodes.each do |node|
1368
+ text = node.text.strip
1369
+ match = text.match(LABEL_PATTERN)
1370
+ return match[1].strip if match
1371
+ end
1372
+
1373
+ nil
1374
+ end
1375
+ end
1376
+ end
1377
+ end
1378
+ end
1379
+ ```
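The detector makes two passes, so tilde feedback in any node takes precedence over a labeled line, even one that appears earlier. A minimal sketch on plain strings, where `feedback_from` is a hypothetical helper and not part of the gem:

```ruby
# frozen_string_literal: true

TILDE_PATTERN = /~\s*(.+)/m
LABEL_PATTERN = /\A\s*(?:Explanation|Rationale):\s*(.+)/im

# Hypothetical helper with the same two-pass order as FeedbackDetector#detect.
def feedback_from(lines)
  # Pass 1: tilde-separated feedback in any line.
  lines.each do |line|
    feedback = line[TILDE_PATTERN, 1].to_s.strip
    return feedback unless feedback.empty?
  end

  # Pass 2: a line that starts with "Explanation:" or "Rationale:".
  lines.each do |line|
    match = line.strip.match(LABEL_PATTERN)
    return match[1].strip if match
  end

  nil
end

puts feedback_from(["1) What is the capital? ~ Paris is the capital of France."])
puts feedback_from(["What is the capital?", "Rationale: Paris is the capital of France."])
p feedback_from(["Just a question"])
```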
1380
+
1381
+ Create `lib/atomic_assessments_import/exam_soft/extractor/question_type_detector.rb`:
1382
+
1383
+ ```ruby
1384
+ # frozen_string_literal: true
1385
+
1386
+ module AtomicAssessmentsImport
1387
+ module ExamSoft
1388
+ module Extractor
1389
+ class QuestionTypeDetector
1390
+ TYPE_LABEL_PATTERN = /Type:\s*(.+?)(?=\s*(?:Folder:|Title:|Category:|\d+[.)]|\z))/i
1391
+
1392
+ TYPE_MAP = {
1393
+ /\Amcq?\z/i => "mcq",
1394
+ /\Amultiple\s*choice\z/i => "mcq",
1395
+ /\Ama\z/i => "ma",
1396
+ /\Amultiple\s*(?:select|answer|response)\z/i => "ma",
1397
+ /\Atrue[\s\/]*false\z/i => "true_false",
1398
+ /\At\s*\/?\s*f\z/i => "true_false",
1399
+ /\Aessay\z/i => "essay",
1400
+ /\Along\s*answer\z/i => "essay",
1401
+ /\Ashort\s*answer\z/i => "short_answer",
1402
+ /\Afill[\s_-]*in[\s_-]*(?:the[\s_-]*)?blank\z/i => "fill_in_the_blank",
1403
+ /\Acloze\z/i => "fill_in_the_blank",
1404
+ /\Amatching\z/i => "matching",
1405
+ /\Aorder(?:ing)?\z/i => "ordering",
1406
+ }.freeze
1407
+
1408
+ def initialize(nodes, has_options:)
1409
+ @nodes = nodes
1410
+ @has_options = has_options
1411
+ end
1412
+
1413
+ def detect
1414
+ # Try to find an explicit Type: label
1415
+ full_text = @nodes.map { |n| n.text.strip }.join(" ")
1416
+ match = full_text.match(TYPE_LABEL_PATTERN)
1417
+
1418
+ if match
1419
+ type_text = match[1].strip
1420
+ TYPE_MAP.each do |pattern, type|
1421
+ return type if type_text.match?(pattern)
1422
+ end
1423
+ # Unknown explicit type — return it lowercased as-is
1424
+ return type_text.downcase.gsub(/\s+/, "_")
1425
+ end
1426
+
1427
+ # No explicit type — infer from structure
1428
+ @has_options ? "mcq" : "short_answer"
1429
+ end
1430
+ end
1431
+ end
1432
+ end
1433
+ end
1434
+ ```
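The lookup order in `detect` is: explicit label mapped through `TYPE_MAP`, then the raw label normalized as-is, then a structural default. The sketch below uses a four-entry subset of `TYPE_MAP` and a hypothetical `detect_type` helper that takes a string instead of nodes.

```ruby
# frozen_string_literal: true

TYPE_LABEL_PATTERN = /Type:\s*(.+?)(?=\s*(?:Folder:|Title:|Category:|\d+[.)]|\z))/i

# Subset of the plan's TYPE_MAP, enough to show the normalization.
TYPE_MAP = {
  /\Amcq?\z/i => "mcq",
  /\Ama\z/i => "ma",
  /\Atrue[\s\/]*false\z/i => "true_false",
  /\Afill[\s_-]*in[\s_-]*(?:the[\s_-]*)?blank\z/i => "fill_in_the_blank",
}.freeze

# Hypothetical helper following QuestionTypeDetector#detect.
def detect_type(text, has_options:)
  label = text[TYPE_LABEL_PATTERN, 1]
  if label
    label = label.strip
    TYPE_MAP.each { |pattern, type| return type if label.match?(pattern) }
    # Unknown explicit type: pass it through lowercased with underscores.
    return label.downcase.gsub(/\s+/, "_")
  end

  # No label: infer from whether options were found.
  has_options ? "mcq" : "short_answer"
end

puts detect_type("Type: Fill in the Blank 1) Question?", has_options: false)
puts detect_type("Type: Hotspot 1) Identify the region.", has_options: false)
puts detect_type("A question with no type label", has_options: true)
```

Passing unknown labels through unchanged is what lets the orchestrator in Task 7 report them as unsupported instead of silently treating them as `mcq`.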
1435
+
1436
+ **Step 4: Run tests to verify they pass**
1437
+
1438
+ Run: `bundle exec rspec spec/atomic_assessments_import/examsoft/extractor/ -v`
1439
+ Expected: PASS
1440
+
1441
+ **Step 5: Commit**
1442
+
1443
+ ```bash
1444
+ git add lib/atomic_assessments_import/exam_soft/extractor/ spec/atomic_assessments_import/examsoft/extractor/
1445
+ git commit -m "feat: add metadata, feedback, and question type detectors"
1446
+ ```
1447
+
1448
+ ---
1449
+
1450
+ ### Task 7: Extractor Orchestrator
1451
+
1452
+ Assembles all detectors and builds the `row_mock` hash.
1453
+
1454
+ **Files:**
1455
+ - Create: `lib/atomic_assessments_import/exam_soft/extractor.rb`
1456
+ - Test: `spec/atomic_assessments_import/examsoft/extractor_spec.rb`
1457
+
1458
+ **Step 1: Write the failing test**
1459
+
1460
+ Create `spec/atomic_assessments_import/examsoft/extractor_spec.rb`:
1461
+
1462
+ ```ruby
1463
+ # frozen_string_literal: true
1464
+
1465
+ require "atomic_assessments_import"
1466
+ require "nokogiri"
1467
+
1468
+ RSpec.describe AtomicAssessmentsImport::ExamSoft::Extractor do
1469
+ def nodes_from(html)
1470
+ Nokogiri::HTML.fragment(html).children.to_a
1471
+ end
1472
+
1473
+ describe "#extract" do
1474
+ it "extracts a complete MCQ question" do
1475
+ nodes = nodes_from(<<~HTML)
1476
+ <p>Folder: Geography Title: Question 1 Category: Subject/Capitals 1) What is the capital of France? ~ Paris is the capital.</p>
1477
+ <p>*a) Paris</p>
1478
+ <p>b) London</p>
1479
+ <p>c) Berlin</p>
1480
+ HTML
1481
+ result = described_class.new(nodes).extract
1482
+
1483
+ expect(result[:row]["question text"]).to eq("What is the capital of France?")
1484
+ expect(result[:row]["option a"]).to eq("Paris")
1485
+ expect(result[:row]["option b"]).to eq("London")
1486
+ expect(result[:row]["option c"]).to eq("Berlin")
1487
+ expect(result[:row]["correct answer"]).to eq("a")
1488
+ expect(result[:row]["title"]).to eq("Question 1")
1489
+ expect(result[:row]["folder"]).to eq("Geography")
1490
+ expect(result[:row]["general feedback"]).to eq("Paris is the capital.")
1491
+ expect(result[:row]["question type"]).to eq("mcq")
1492
+ expect(result[:status]).to eq("published")
1493
+ expect(result[:warnings]).to be_empty
1494
+ end
1495
+
1496
+ it "returns draft status when no correct answer" do
1497
+ nodes = nodes_from(<<~HTML)
1498
+ <p>1) What is the capital of France?</p>
1499
+ <p>a) Paris</p>
1500
+ <p>b) London</p>
1501
+ HTML
1502
+ result = described_class.new(nodes).extract
1503
+
1504
+ expect(result[:status]).to eq("draft")
1505
+ expect(result[:warnings]).to include(/correct answer/i)
1506
+ end
1507
+
1508
+ it "returns draft status when no question text found" do
1509
+ nodes = nodes_from(<<~HTML)
1510
+ <p>a) Paris</p>
1511
+ <p>b) London</p>
1512
+ HTML
1513
+ result = described_class.new(nodes).extract
1514
+
1515
+ expect(result[:status]).to eq("draft")
1516
+ expect(result[:warnings]).to include(/question text/i)
1517
+ end
1518
+
1519
+ it "handles multiple correct answers for MA type" do
1520
+ nodes = nodes_from(<<~HTML)
1521
+ <p>Type: MA Folder: Geo Title: Q1 Category: Test 1) Pick capitals? ~ Explanation</p>
1522
+ <p>*a) Paris</p>
1523
+ <p>*b) Berlin</p>
1524
+ <p>c) Detroit</p>
1525
+ HTML
1526
+ result = described_class.new(nodes).extract
1527
+
1528
+ expect(result[:row]["correct answer"]).to eq("a; b")
1529
+ expect(result[:row]["question type"]).to eq("ma")
1530
+ end
1531
+
1532
+ it "extracts essay questions without options" do
1533
+ nodes = nodes_from(<<~HTML)
1534
+ <p>Type: Essay Folder: Writing Title: Q1 Category: Test 1) Discuss the causes of WWI.</p>
1535
+ HTML
1536
+ result = described_class.new(nodes).extract
1537
+
1538
+ expect(result[:row]["question type"]).to eq("essay")
1539
+ expect(result[:row]["question text"]).to eq("Discuss the causes of WWI.")
1540
+ expect(result[:status]).to eq("published")
1541
+ end
1542
+
1543
+ it "warns for unsupported question types but still imports" do
1544
+ nodes = nodes_from(<<~HTML)
1545
+ <p>Type: Hotspot 1) Identify the region on the map.</p>
1546
+ HTML
1547
+ result = described_class.new(nodes).extract
1548
+
1549
+ expect(result[:status]).to eq("draft")
1550
+ expect(result[:warnings]).to include(/unsupported.*hotspot/i)
1551
+ end
1552
+ end
1553
+ end
1554
+ ```
1555
+
1556
+ **Step 2: Run test to verify it fails**
1557
+
1558
+ Run: `bundle exec rspec spec/atomic_assessments_import/examsoft/extractor_spec.rb -v`
1559
+ Expected: FAIL
1560
+
1561
+ **Step 3: Write implementation**
1562
+
1563
+ Create `lib/atomic_assessments_import/exam_soft/extractor.rb`:
1564
+
1565
+ ```ruby
1566
+ # frozen_string_literal: true
1567
+
1568
+ require_relative "extractor/question_stem_detector"
1569
+ require_relative "extractor/options_detector"
1570
+ require_relative "extractor/correct_answer_detector"
1571
+ require_relative "extractor/metadata_detector"
1572
+ require_relative "extractor/feedback_detector"
1573
+ require_relative "extractor/question_type_detector"
1574
+
1575
+ module AtomicAssessmentsImport
1576
+ module ExamSoft
1577
+ class Extractor
1578
+ SUPPORTED_TYPES = %w[mcq ma true_false essay short_answer fill_in_the_blank matching ordering].freeze
1579
+ # Types that require options and a correct answer
1580
+ OPTION_TYPES = %w[mcq ma true_false].freeze
1581
+
1582
+ def initialize(nodes)
1583
+ @nodes = nodes
1584
+ end
1585
+
1586
+ def extract
1587
+ warnings = []
1588
+
1589
+ # Run detectors
1590
+ options = Extractor::OptionsDetector.new(@nodes).detect
1591
+ has_options = !options.empty?
1592
+
1593
+ metadata = Extractor::MetadataDetector.new(@nodes).detect
1594
+ question_type = Extractor::QuestionTypeDetector.new(@nodes, has_options: has_options).detect
1595
+ stem = Extractor::QuestionStemDetector.new(@nodes).detect
1596
+ feedback = Extractor::FeedbackDetector.new(@nodes).detect
1597
+ correct_answers = has_options ? Extractor::CorrectAnswerDetector.new(@nodes, options).detect : []
1598
+
1599
+ # Determine status
1600
+ status = "published"
1601
+
1602
+ unless SUPPORTED_TYPES.include?(question_type)
1603
+ warnings << "Unsupported question type '#{question_type}', imported as draft"
1604
+ status = "draft"
1605
+ end
1606
+
1607
+ if stem.nil?
1608
+ warnings << "No question text found, imported as draft"
1609
+ status = "draft"
1610
+ end
1611
+
1612
+ if OPTION_TYPES.include?(question_type)
1613
+ if options.empty?
1614
+ warnings << "No options found for #{question_type} question, imported as draft"
1615
+ status = "draft"
1616
+ end
1617
+ if correct_answers.empty?
1618
+ warnings << "No correct answer found, imported as draft"
1619
+ status = "draft"
1620
+ end
1621
+ end
1622
+
1623
+ # Build row_mock
1624
+ row = {
1625
+ "question id" => nil,
1626
+ "folder" => metadata[:folder],
1627
+ "title" => metadata[:title],
1628
+ "category" => metadata[:categories] || [],
1629
+ "import type" => nil,
1630
+ "description" => nil,
1631
+ "question text" => stem,
1632
+ "question type" => question_type,
1633
+ "stimulus review" => nil,
1634
+ "instructor stimulus" => nil,
1635
+ "correct answer" => correct_answers.join("; "),
1636
+ "scoring type" => nil,
1637
+ "points" => nil,
1638
+ "distractor rationale" => nil,
1639
+ "sample answer" => nil,
1640
+ "acknowledgements" => nil,
1641
+ "general feedback" => feedback,
1642
+ "correct feedback" => nil,
1643
+ "incorrect feedback" => nil,
1644
+ "shuffle options" => nil,
1645
+ "template" => question_type,
1646
+ }
1647
+
1648
+ # Add option keys
1649
+ options.each_with_index do |opt, index|
1650
+ letter = ("a".ord + index).chr
1651
+ row["option #{letter}"] = opt[:text]
1652
+ end
1653
+
1654
+ {
1655
+ row: row,
1656
+ status: status,
1657
+ warnings: warnings,
1658
+ }
1659
+ end
1660
+ end
1661
+ end
1662
+ end
1663
+ ```
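Because every warning in `extract` also downgrades the status, the status rule reduces to "published only when there are no warnings". A sketch of that rule in isolation, assuming a hypothetical `review_status` helper and shortened warning messages:

```ruby
# frozen_string_literal: true

SUPPORTED_TYPES = %w[mcq ma true_false essay short_answer fill_in_the_blank matching ordering].freeze
OPTION_TYPES = %w[mcq ma true_false].freeze

# Hypothetical helper: returns [status, warnings] using the same checks as Extractor#extract.
def review_status(question_type:, stem:, options:, correct_answers:)
  warnings = []
  warnings << "Unsupported question type '#{question_type}'" unless SUPPORTED_TYPES.include?(question_type)
  warnings << "No question text found" if stem.nil?

  # Only option-based types need options and a correct answer.
  if OPTION_TYPES.include?(question_type)
    warnings << "No options found for #{question_type} question" if options.empty?
    warnings << "No correct answer found" if correct_answers.empty?
  end

  [warnings.empty? ? "published" : "draft", warnings]
end

p review_status(question_type: "mcq", stem: "Capital?", options: ["Paris"], correct_answers: [])
p review_status(question_type: "essay", stem: "Discuss.", options: [], correct_answers: [])
```

The essay case stays published with no options because `essay` is not in `OPTION_TYPES`.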
1664
+
1665
+ **Step 4: Run test to verify it passes**
1666
+
1667
+ Run: `bundle exec rspec spec/atomic_assessments_import/examsoft/extractor_spec.rb -v`
1668
+ Expected: PASS
1669
+
1670
+ **Step 5: Commit**
1671
+
1672
+ ```bash
1673
+ git add lib/atomic_assessments_import/exam_soft/extractor.rb spec/atomic_assessments_import/examsoft/extractor_spec.rb
1674
+ git commit -m "feat: add Extractor orchestrator with field detection pipeline"
1675
+ ```
1676
+
1677
+ ---
1678
+
1679
+ ### Task 8: New Question Type Classes — Essay and ShortAnswer
1680
+
1681
+ **Files:**
1682
+ - Create: `lib/atomic_assessments_import/questions/essay.rb`
1683
+ - Create: `lib/atomic_assessments_import/questions/short_answer.rb`
1684
+ - Test: `spec/atomic_assessments_import/questions/essay_spec.rb`
1685
+ - Test: `spec/atomic_assessments_import/questions/short_answer_spec.rb`
1686
+ - Modify: `lib/atomic_assessments_import/questions/question.rb:12-18` (add cases to `self.load`)
1687
+
1688
+ **Step 1: Write failing tests**
1689
+
1690
+ Create `spec/atomic_assessments_import/questions/essay_spec.rb`:
1691
+
1692
+ ```ruby
1693
+ # frozen_string_literal: true
1694
+
1695
+ require "atomic_assessments_import"
1696
+
1697
+ RSpec.describe AtomicAssessmentsImport::Questions::Essay do
1698
+ let(:row) do
1699
+ {
1700
+ "question text" => "Discuss the causes of World War I.",
1701
+ "question type" => "essay",
1702
+ "general feedback" => "A good answer covers alliances, imperialism, and nationalism.",
1703
+ "sample answer" => "World War I was caused by...",
1704
+ "points" => "10",
1705
+ }
1706
+ end
1707
+
1708
+ describe "#question_type" do
1709
+ it "returns longanswer" do
1710
+ question = described_class.new(row)
1711
+ expect(question.question_type).to eq("longanswer")
1712
+ end
1713
+ end
1714
+
1715
+ describe "#to_learnosity" do
1716
+ it "returns correct structure" do
1717
+ question = described_class.new(row)
1718
+ result = question.to_learnosity
1719
+
1720
+ expect(result[:type]).to eq("longanswer")
1721
+ expect(result[:widget_type]).to eq("response")
1722
+ expect(result[:data][:stimulus]).to eq("Discuss the causes of World War I.")
1723
+ end
1724
+
1725
+ it "includes max_length when word limit specified" do
1726
+ row["word_limit"] = "500"
1727
+ question = described_class.new(row)
1728
+ result = question.to_learnosity
1729
+
1730
+ expect(result[:data][:max_length]).to eq(500)
1731
+ end
1732
+
1733
+ it "sets metadata" do
1734
+ question = described_class.new(row)
1735
+ result = question.to_learnosity
1736
+
1737
+ expect(result[:data][:metadata][:sample_answer]).to eq("World War I was caused by...")
1738
+ expect(result[:data][:metadata][:general_feedback]).to eq("A good answer covers alliances, imperialism, and nationalism.")
1739
+ end
1740
+ end
1741
+ end
1742
+ ```
1743
+
1744
+ Create `spec/atomic_assessments_import/questions/short_answer_spec.rb`:
1745
+
1746
+ ```ruby
1747
+ # frozen_string_literal: true
1748
+
1749
+ require "atomic_assessments_import"
1750
+
1751
+ RSpec.describe AtomicAssessmentsImport::Questions::ShortAnswer do
1752
+ let(:row) do
1753
+ {
1754
+ "question text" => "What is the chemical symbol for water?",
1755
+ "question type" => "short_answer",
1756
+ "correct answer" => "H2O",
1757
+ "points" => "1",
1758
+ }
1759
+ end
1760
+
1761
+ describe "#question_type" do
1762
+ it "returns shorttext" do
1763
+ question = described_class.new(row)
1764
+ expect(question.question_type).to eq("shorttext")
1765
+ end
1766
+ end
1767
+
1768
+ describe "#to_learnosity" do
1769
+ it "returns correct structure" do
1770
+ question = described_class.new(row)
1771
+ result = question.to_learnosity
1772
+
1773
+ expect(result[:type]).to eq("shorttext")
1774
+ expect(result[:widget_type]).to eq("response")
1775
+ expect(result[:data][:stimulus]).to eq("What is the chemical symbol for water?")
1776
+ end
1777
+
1778
+ it "includes validation with correct answer" do
1779
+ question = described_class.new(row)
1780
+ result = question.to_learnosity
1781
+
1782
+ expect(result[:data][:validation][:valid_response][:value]).to eq("H2O")
1783
+ expect(result[:data][:validation][:valid_response][:score]).to eq(1)
1784
+ end
1785
+ end
1786
+ end
1787
+ ```
1788
+
1789
+ **Step 2: Run tests to verify they fail**
1790
+
1791
+ Run: `bundle exec rspec spec/atomic_assessments_import/questions/essay_spec.rb spec/atomic_assessments_import/questions/short_answer_spec.rb -v`
1792
+ Expected: FAIL — uninitialized constants
1793
+
1794
+ **Step 3: Write implementations**
1795
+
1796
+ Create `lib/atomic_assessments_import/questions/essay.rb`:
1797
+
1798
+ ```ruby
1799
+ # frozen_string_literal: true
1800
+
1801
+ require_relative "question"
1802
+
1803
+ module AtomicAssessmentsImport
1804
+ module Questions
1805
+ class Essay < Question
1806
+ def question_type
1807
+ "longanswer"
1808
+ end
1809
+
1810
+ def question_data
1811
+ data = super
1812
+ word_limit = @row["word_limit"]&.to_i
1813
+ data[:max_length] = word_limit if word_limit && word_limit > 0
1814
+ data
1815
+ end
1816
+ end
1817
+ end
1818
+ end
1819
+ ```
1820
+
1821
+ Create `lib/atomic_assessments_import/questions/short_answer.rb`:
1822
+
1823
+ ```ruby
1824
+ # frozen_string_literal: true
1825
+
1826
+ require_relative "question"
1827
+
1828
+ module AtomicAssessmentsImport
1829
+ module Questions
1830
+ class ShortAnswer < Question
1831
+ def question_type
1832
+ "shorttext"
1833
+ end
1834
+
1835
+ def question_data
1836
+ super.merge(
1837
+ validation: {
1838
+ valid_response: {
1839
+ score: points,
1840
+ value: @row["correct answer"] || "",
1841
+ },
1842
+ }
1843
+ )
1844
+ end
1845
+ end
1846
+ end
1847
+ end
1848
+ ```
1849
+
1850
+ **Step 4: Update Question.load** in `lib/atomic_assessments_import/questions/question.rb`
1851
+
1852
+ Change the `self.load` method to include new types:
1853
+
1854
+ ```ruby
1855
+ def self.load(row)
1856
+ case row["question type"]
1857
+ when nil, "", /multiple choice/i, /mcq/i, /^ma$/i
1858
+ MultipleChoice.new(row)
1859
+ when /true_false/i, /true\/false/i
1860
+ MultipleChoice.new(row)
1861
+ when /essay/i, /longanswer/i
1862
+ Essay.new(row)
1863
+ when /short_answer/i, /shorttext/i
1864
+ ShortAnswer.new(row)
1865
+ else
1866
+ raise "Unknown question type #{row['question type']}"
1867
+ end
1868
+ end
1869
+ ```
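The dispatch relies on `case`/`when` calling `===` on each candidate, so `nil`, the empty string, and regexes can share one branch. The sketch below shows the matching behavior only; `question_kind_for` is a hypothetical helper that returns symbols instead of instantiating the question classes, and it raises `ArgumentError` where the plan uses a bare `raise`.

```ruby
# frozen_string_literal: true

# Hypothetical helper: same branch conditions as Question.load, symbols instead of classes.
def question_kind_for(type)
  case type
  when nil, "", /multiple choice/i, /mcq/i, /^ma$/i, /true_false/i, /true\/false/i
    :multiple_choice
  when /essay/i, /longanswer/i
    :essay
  when /short_answer/i, /shorttext/i
    :short_answer
  else
    raise ArgumentError, "Unknown question type #{type}"
  end
end

p question_kind_for(nil)
p question_kind_for("True/False")
p question_kind_for("Essay")
p question_kind_for("short_answer")
```

`Regexp#===` returns `false` for `nil` instead of raising, so a missing type falls through to the explicit `nil` match and is treated as multiple choice.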
1870
+
1871
+ The new requires do not go in `question.rb`: that file is loaded first, and each subclass requires it. Add them in the extractor/converter that calls `Question.load` instead. The existing pattern is that `converter.rb` files require all question classes, so the new requires belong there.
1872
+
1873
+ Concretely, the existing exam_soft converter already requires `question` and `multiple_choice`, so add the `essay` and `short_answer` requires alongside those.
1874
+
1875
+ **Step 5: Run tests to verify they pass**
1876
+
1877
+ Run: `bundle exec rspec spec/atomic_assessments_import/questions/essay_spec.rb spec/atomic_assessments_import/questions/short_answer_spec.rb -v`
1878
+ Expected: PASS
1879
+
1880
+ **Step 6: Run all tests to check nothing broke**
1881
+
1882
+ Run: `bundle exec rspec`
1883
+ Expected: All pass
1884
+
1885
+ **Step 7: Commit**
1886
+
1887
+ ```bash
1888
+ git add lib/atomic_assessments_import/questions/essay.rb lib/atomic_assessments_import/questions/short_answer.rb lib/atomic_assessments_import/questions/question.rb spec/atomic_assessments_import/questions/essay_spec.rb spec/atomic_assessments_import/questions/short_answer_spec.rb
1889
+ git commit -m "feat: add Essay and ShortAnswer question types"
1890
+ ```
1891
+
1892
+ ---
1893
+
1894
+ ### Task 9: New Question Type Classes — FillInTheBlank, Matching, Ordering
1895
+
1896
+ **Files:**
1897
+ - Create: `lib/atomic_assessments_import/questions/fill_in_the_blank.rb`
1898
+ - Create: `lib/atomic_assessments_import/questions/matching.rb`
1899
+ - Create: `lib/atomic_assessments_import/questions/ordering.rb`
1900
+ - Test: `spec/atomic_assessments_import/questions/fill_in_the_blank_spec.rb`
1901
+ - Test: `spec/atomic_assessments_import/questions/matching_spec.rb`
1902
+ - Test: `spec/atomic_assessments_import/questions/ordering_spec.rb`
1903
+ - Modify: `lib/atomic_assessments_import/questions/question.rb:12-18` (add remaining cases to `self.load`)
1904
+
1905
+ **Step 1: Write failing tests**
1906
+
1907
+ Create `spec/atomic_assessments_import/questions/fill_in_the_blank_spec.rb`:
1908
+
1909
+ ```ruby
1910
+ # frozen_string_literal: true
1911
+
1912
+ require "atomic_assessments_import"
1913
+
1914
+ RSpec.describe AtomicAssessmentsImport::Questions::FillInTheBlank do
1915
+ let(:row) do
1916
+ {
1917
+ "question text" => "The capital of France is {{response}}.",
1918
+ "question type" => "fill_in_the_blank",
1919
+ "correct answer" => "Paris",
1920
+ "points" => "1",
1921
+ }
1922
+ end
1923
+
1924
+ describe "#question_type" do
1925
+ it "returns clozetext" do
1926
+ question = described_class.new(row)
1927
+ expect(question.question_type).to eq("clozetext")
1928
+ end
1929
+ end
1930
+
1931
+ describe "#to_learnosity" do
1932
+ it "returns correct structure" do
1933
+ question = described_class.new(row)
1934
+ result = question.to_learnosity
1935
+
1936
+ expect(result[:type]).to eq("clozetext")
1937
+ expect(result[:data][:stimulus]).to eq("The capital of France is {{response}}.")
1938
+ end
1939
+
1940
+ it "includes validation with correct answer" do
1941
+ question = described_class.new(row)
1942
+ result = question.to_learnosity
1943
+
1944
+ expect(result[:data][:validation][:valid_response][:score]).to eq(1)
1945
+ expect(result[:data][:validation][:valid_response][:value]).to eq(["Paris"])
1946
+ end
1947
+ end
1948
+ end
1949
+ ```
1950
+
1951
+ Create `spec/atomic_assessments_import/questions/matching_spec.rb`:
1952
+
1953
+ ```ruby
1954
+ # frozen_string_literal: true
1955
+
1956
+ require "atomic_assessments_import"
1957
+
1958
+ RSpec.describe AtomicAssessmentsImport::Questions::Matching do
1959
+ let(:row) do
1960
+ {
1961
+ "question text" => "Match the countries to their capitals.",
1962
+ "question type" => "matching",
1963
+ "option a" => "France",
1964
+ "option b" => "Germany",
1965
+ "option c" => "Spain",
1966
+ "match a" => "Paris",
1967
+ "match b" => "Berlin",
1968
+ "match c" => "Madrid",
1969
+ "points" => "3",
1970
+ }
1971
+ end
1972
+
1973
+ describe "#question_type" do
1974
+ it "returns association" do
1975
+ question = described_class.new(row)
1976
+ expect(question.question_type).to eq("association")
1977
+ end
1978
+ end
1979
+
1980
+ describe "#to_learnosity" do
1981
+ it "returns correct structure" do
1982
+ question = described_class.new(row)
1983
+ result = question.to_learnosity
1984
+
1985
+ expect(result[:type]).to eq("association")
1986
+ expect(result[:data][:stimulus]).to eq("Match the countries to their capitals.")
1987
+ end
1988
+
1989
+ it "includes stimulus and possible responses" do
1990
+ question = described_class.new(row)
1991
+ result = question.to_learnosity
1992
+
1993
+ expect(result[:data][:stimulus_list].length).to eq(3)
1994
+ expect(result[:data][:possible_responses].length).to eq(3)
1995
+ end
1996
+
1997
+ it "includes validation" do
1998
+ question = described_class.new(row)
1999
+ result = question.to_learnosity
2000
+
2001
+ expect(result[:data][:validation][:valid_response][:score]).to eq(3)
2002
+ expect(result[:data][:validation][:valid_response][:value].length).to eq(3)
2003
+ end
2004
+ end
2005
+ end
2006
+ ```
2007
+
2008
+ Create `spec/atomic_assessments_import/questions/ordering_spec.rb`:
2009
+
2010
+ ```ruby
2011
+ # frozen_string_literal: true
2012
+
2013
+ require "atomic_assessments_import"
2014
+
2015
+ RSpec.describe AtomicAssessmentsImport::Questions::Ordering do
2016
+ let(:row) do
2017
+ {
2018
+ "question text" => "Arrange these events in chronological order.",
2019
+ "question type" => "ordering",
2020
+ "option a" => "World War I",
2021
+ "option b" => "World War II",
2022
+ "option c" => "Cold War",
2023
+ "correct answer" => "a; b; c",
2024
+ "points" => "3",
2025
+ }
2026
+ end
2027
+
2028
+ describe "#question_type" do
2029
+ it "returns orderlist" do
2030
+ question = described_class.new(row)
2031
+ expect(question.question_type).to eq("orderlist")
2032
+ end
2033
+ end
2034
+
2035
+ describe "#to_learnosity" do
2036
+ it "returns correct structure" do
2037
+ question = described_class.new(row)
2038
+ result = question.to_learnosity
2039
+
2040
+ expect(result[:type]).to eq("orderlist")
2041
+ expect(result[:data][:stimulus]).to eq("Arrange these events in chronological order.")
2042
+ end
2043
+
2044
+ it "includes list of items" do
2045
+ question = described_class.new(row)
2046
+ result = question.to_learnosity
2047
+
2048
+ expect(result[:data][:list].length).to eq(3)
2049
+ end
2050
+
2051
+ it "includes validation with correct order" do
2052
+ question = described_class.new(row)
2053
+ result = question.to_learnosity
2054
+
2055
+ expect(result[:data][:validation][:valid_response][:score]).to eq(3)
2056
+ expect(result[:data][:validation][:valid_response][:value]).to eq(["0", "1", "2"])
2057
+ end
2058
+ end
2059
+ end
2060
+ ```
2061
+
2062
+ **Step 2: Run tests to verify they fail**
2063
+
2064
+ Run: `bundle exec rspec spec/atomic_assessments_import/questions/fill_in_the_blank_spec.rb spec/atomic_assessments_import/questions/matching_spec.rb spec/atomic_assessments_import/questions/ordering_spec.rb -v`
2065
+ Expected: FAIL — uninitialized constants
2066
+
2067
+ **Step 3: Write implementations**
2068
+
2069
+ Create `lib/atomic_assessments_import/questions/fill_in_the_blank.rb`:
2070
+
2071
+ ```ruby
2072
+ # frozen_string_literal: true
2073
+
2074
+ require_relative "question"
2075
+
2076
+ module AtomicAssessmentsImport
2077
+ module Questions
2078
+ class FillInTheBlank < Question
2079
+ def question_type
2080
+ "clozetext"
2081
+ end
2082
+
2083
+ def question_data
2084
+ answers = (@row["correct answer"] || "").split(";").map(&:strip)
2085
+
2086
+ super.merge(
2087
+ validation: {
2088
+ valid_response: {
2089
+ score: points,
2090
+ value: answers,
2091
+ },
2092
+ }
2093
+ )
2094
+ end
2095
+ end
2096
+ end
2097
+ end
2098
+ ```
2099
+
2100
+ Create `lib/atomic_assessments_import/questions/matching.rb`:
2101
+
2102
+ ```ruby
2103
+ # frozen_string_literal: true
2104
+
2105
+ require_relative "question"
2106
+
2107
+ module AtomicAssessmentsImport
2108
+ module Questions
2109
+ class Matching < Question
2110
+ INDEXES = ("a".."o").to_a.freeze
2111
+
2112
+ def question_type
2113
+ "association"
2114
+ end
2115
+
2116
+ def question_data
2117
+ stimulus_list = []
2118
+ possible_responses = []
2119
+ valid_values = []
2120
+
2121
+ INDEXES.each do |letter|
2122
+ option = @row["option #{letter}"]
2123
+ match = @row["match #{letter}"]
2124
+ break unless option
2125
+
2126
+ stimulus_list << option
2127
+ possible_responses << match if match
2128
+ valid_values << match if match
2129
+ end
2130
+
2131
+ super.merge(
2132
+ stimulus_list: stimulus_list,
2133
+ possible_responses: possible_responses,
2134
+ validation: {
2135
+ valid_response: {
2136
+ score: points,
2137
+ value: valid_values,
2138
+ },
2139
+ }
2140
+ )
2141
+ end
2142
+ end
2143
+ end
2144
+ end
2145
+ ```
2146
+
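The column walk in `Matching#question_data` can be tried on its own. The following is a minimal sketch, not the gem's code: `pair_columns` and the sample row are hypothetical, and the row is assumed to be a plain Hash with the same `"option x"` / `"match x"` keys as the spec.

```ruby
# Hypothetical helper (not part of the gem): walks "option a".."option o"
# and the parallel "match a".."match o" columns of a row Hash.
MATCH_LETTERS = ("a".."o").to_a.freeze

def pair_columns(row)
  stems = []
  matches = []

  MATCH_LETTERS.each do |letter|
    option = row["option #{letter}"]
    break unless option # stop at the first missing option column

    stems << option
    match = row["match #{letter}"]
    matches << match if match # an option without a match stays unpaired
  end

  { stimulus_list: stems, possible_responses: matches }
end

SAMPLE_ROW = {
  "option a" => "France", "match a" => "Paris",
  "option b" => "Germany", "match b" => "Berlin",
  # Never reached: "option c" is missing, so the walk stops before "d".
  "option d" => "Spain", "match d" => "Madrid",
}.freeze

PAIRED = pair_columns(SAMPLE_ROW)
```

Two consequences are worth keeping in mind when writing fixtures: a gap in the lettering drops every later column, and an option without a match leaves `possible_responses` (and the validation value built the same way) shorter than `stimulus_list`, so positions no longer line up.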
2147
+ Create `lib/atomic_assessments_import/questions/ordering.rb`:
2148
+
2149
+ ```ruby
2150
+ # frozen_string_literal: true
2151
+
2152
+ require_relative "question"
2153
+
2154
+ module AtomicAssessmentsImport
2155
+ module Questions
2156
+ class Ordering < Question
2157
+ INDEXES = ("a".."o").to_a.freeze
2158
+
2159
+ def question_type
2160
+ "orderlist"
2161
+ end
2162
+
2163
+ def question_data
2164
+ items = []
2165
+ INDEXES.each do |letter|
2166
+ option = @row["option #{letter}"]
2167
+ break unless option
2168
+
2169
+ items << option
2170
+ end
2171
+
2172
+ # Parse correct order from "a; b; c" format
2173
+ order = (@row["correct answer"] || "").split(";").map(&:strip).map(&:downcase)
2174
+ valid_values = order.filter_map { |letter| INDEXES.find_index(letter)&.to_s }
2175
+
2176
+ super.merge(
2177
+ list: items,
2178
+ validation: {
2179
+ valid_response: {
2180
+ score: points,
2181
+ value: valid_values,
2182
+ },
2183
+ }
2184
+ )
2185
+ end
2186
+ end
2187
+ end
2188
+ end
2189
+ ```
2190
+
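The `"a; b; c"` parsing in `Ordering#question_data` is the part most likely to surprise, so here it is in isolation. This is a sketch under the assumption that the correct answer is a semicolon-separated list of option letters; `order_to_indexes` is a hypothetical name, not a method of the gem.

```ruby
ORDER_LETTERS = ("a".."o").to_a.freeze

# Hypothetical helper mirroring the parsing step:
# option letters become zero-based index strings.
def order_to_indexes(correct_answer)
  letters = (correct_answer || "").split(";").map { |part| part.strip.downcase }
  # filter_map silently drops any token that is not a letter in a..o
  letters.filter_map { |letter| ORDER_LETTERS.find_index(letter)&.to_s }
end
```

For example, `order_to_indexes("C; a; b")` yields the indexes for c, a, b in that order, and a `nil` answer yields an empty array. Unknown tokens are discarded without a warning, and a letter is not checked against the number of options actually present, so a typo in the source document can produce a validation value shorter than `list` or pointing past its end.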
2191
+ **Step 4: Update Question.load** in `lib/atomic_assessments_import/questions/question.rb`
2192
+
2193
+ Final version of `self.load`:
2194
+
2195
+ ```ruby
2196
+ def self.load(row)
2197
+ case row["question type"]
2198
+ when nil, "", /multiple choice/i, /mcq/i, /^ma$/i
2199
+ MultipleChoice.new(row)
2200
+ when /true_false/i, /true\/false/i
2201
+ MultipleChoice.new(row)
2202
+ when /essay/i, /longanswer/i
2203
+ Essay.new(row)
2204
+ when /short_answer/i, /shorttext/i
2205
+ ShortAnswer.new(row)
2206
+ when /fill_in_the_blank/i, /cloze/i
2207
+ FillInTheBlank.new(row)
2208
+ when /matching/i, /association/i
2209
+ Matching.new(row)
2210
+ when /ordering/i, /orderlist/i
2211
+ Ordering.new(row)
2212
+ else
2213
+ raise "Unknown question type #{row['question type']}"
2214
+ end
2215
+ end
2216
+ ```
2217
+
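The dispatch in `self.load` relies on `case`/`when` calling `===` on each pattern, which lets `nil`, plain strings, and regular expressions share one `when` clause. The sketch below copies the patterns from the plan but returns a Symbol instead of a question object so that it runs without the gem; `classify` is a hypothetical name.

```ruby
# Stand-in for Question.load: same patterns, Symbols instead of instances.
def classify(type)
  case type
  when nil, "", /multiple choice/i, /mcq/i, /^ma$/i then :multiple_choice
  when /true_false/i, %r{true/false}i then :multiple_choice
  when /essay/i, /longanswer/i then :essay
  when /short_answer/i, /shorttext/i then :short_answer
  when /fill_in_the_blank/i, /cloze/i then :fill_in_the_blank
  when /matching/i, /association/i then :matching
  when /ordering/i, /orderlist/i then :ordering
  else raise "Unknown question type #{type}"
  end
end
```

Clauses are tried top to bottom, and the anchors in `/^ma$/i` keep `"MA"` from swallowing `"matching"`. The patterns expect the underscore spellings (`short_answer`, `fill_in_the_blank`), so a type written with spaces, such as `"short answer"`, falls through to the `else` branch; this sketch assumes the type detector normalizes to the underscore form before `load` is called.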
2218
+ **Step 5: Run tests to verify they pass**
2219
+
2220
+ Run: `bundle exec rspec spec/atomic_assessments_import/questions/ -v`
2221
+ Expected: PASS
2222
+
2223
+ **Step 6: Commit**
2224
+
2225
+ ```bash
2226
+ git add lib/atomic_assessments_import/questions/ spec/atomic_assessments_import/questions/
2227
+ git commit -m "feat: add FillInTheBlank, Matching, and Ordering question types"
2228
+ ```
2229
+
2230
+ ---
2231
+
2232
+ ### Task 10: Refactor ExamSoft::Converter to Use New Pipeline
2233
+
2234
+ Replace the monolithic regex-based converter with the chunker + extractor pipeline.
2235
+
2236
+ **Files:**
2237
+ - Modify: `lib/atomic_assessments_import/exam_soft/converter.rb` (major rewrite)
2238
+ - Modify: `lib/atomic_assessments_import/exam_soft.rb` (add requires)
2239
+
2240
+ **Step 1: Read and understand the existing converter**
2241
+
2242
+ The existing converter is at `lib/atomic_assessments_import/exam_soft/converter.rb`. It handles:
2243
+ 1. File input (String path or Tempfile)
2244
+ 2. Pandoc conversion to HTML
2245
+ 3. Regex chunking + extraction
2246
+ 4. Building row_mock
2247
+ 5. Calling convert_row to build items/questions
2248
+
2249
+ We keep steps 1-2 and 5, and replace steps 3-4 with Chunker + Extractor.
2250
+
2251
+ **Step 2: Rewrite the converter**
2252
+
2253
+ Replace `lib/atomic_assessments_import/exam_soft/converter.rb` with:
2254
+
2255
+ ```ruby
2256
+ # frozen_string_literal: true
2257
+
2258
+ require "pandoc-ruby"
2259
+ require "nokogiri"
2260
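+ require "securerandom" # SecureRandom.uuid
+ require "time" # Time#iso8601
+ require "active_support/core_ext/digest/uuid"
+ require "active_support/core_ext/object/blank" # present?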
2261
+
2262
+ require_relative "../questions/question"
2263
+ require_relative "../questions/multiple_choice"
2264
+ require_relative "../questions/essay"
2265
+ require_relative "../questions/short_answer"
2266
+ require_relative "../questions/fill_in_the_blank"
2267
+ require_relative "../questions/matching"
2268
+ require_relative "../questions/ordering"
2269
+ require_relative "../utils"
2270
+ require_relative "chunker"
2271
+ require_relative "extractor"
2272
+
2273
+ module AtomicAssessmentsImport
2274
+ module ExamSoft
2275
+ class Converter
2276
+ def initialize(file)
2277
+ @file = file
2278
+ end
2279
+
2280
+ def convert
2281
+ html = normalize_to_html
2282
+ doc = Nokogiri::HTML.fragment(html)
2283
+
2284
+ # Chunk the document
2285
+ chunk_result = Chunker.new(doc).chunk
2286
+ all_warnings = chunk_result[:warnings].dup
2287
+
2288
+ # Log header info if present
2289
+ unless chunk_result[:header_nodes].empty?
2290
+ header_text = chunk_result[:header_nodes].map { |n| n.text.strip }.join(" ")
2291
+ all_warnings << "Exam header detected: #{header_text}" unless header_text.empty?
2292
+ end
2293
+
2294
+ items = []
2295
+ questions = []
2296
+
2297
+ chunk_result[:chunks].each_with_index do |chunk_nodes, index|
2298
+ # Extract fields from this chunk
2299
+ extraction = Extractor.new(chunk_nodes).extract
2300
+ all_warnings.concat(extraction[:warnings].map { |w| "Question #{index + 1}: #{w}" })
2301
+
2302
+ row = extraction[:row]
2303
+ status = extraction[:status]
2304
+
2305
+ # Skip completely unparseable chunks
2306
+ if row["question text"].nil? && row["option a"].nil?
2307
+ all_warnings << "Question #{index + 1}: Skipped — no usable content found"
2308
+ next
2309
+ end
2310
+
2311
+ begin
2312
+ item, question_widgets = convert_row(row, status)
2313
+ items << item
2314
+ questions += question_widgets
2315
+ rescue StandardError => e
2316
+ title = row["title"] || "Question #{index + 1}"
2317
+ all_warnings << "#{title}: #{e.message}, imported as draft"
2318
+ # Attempt bare-minimum import
2319
+ begin
2320
+ item, question_widgets = convert_row_minimal(row)
2321
+ items << item
2322
+ questions += question_widgets
2323
+ rescue StandardError
2324
+ all_warnings << "#{title}: Could not import even minimally, skipped"
2325
+ end
2326
+ end
2327
+ end
2328
+
2329
+ {
2330
+ activities: [],
2331
+ items: items,
2332
+ questions: questions,
2333
+ features: [],
2334
+ errors: all_warnings,
2335
+ }
2336
+ end
2337
+
2338
+ private
2339
+
2340
+ def normalize_to_html
2341
+ if @file.is_a?(String)
2342
+ PandocRuby.new([@file], from: @file.split(".").last).to_html
2343
+ else
2344
+ source_type = @file.path.split(".").last.match(/^[a-zA-Z]+/)[0]
2345
+ PandocRuby.new(@file.read, from: source_type).to_html
2346
+ end
2347
+ end
2348
+
2349
+ def categories_to_tags(categories)
2350
+ tags = {}
2351
+ (categories || []).each do |cat|
2352
+ if cat.include?("/")
2353
+ key, value = cat.split("/", 2).map(&:strip)
2354
+ tags[key.to_sym] ||= []
2355
+ tags[key.to_sym] << value
2356
+ else
2357
+ tags[cat.to_sym] ||= []
2358
+ end
2359
+ end
2360
+ tags
2361
+ end
2362
+
2363
+ def convert_row(row, status = "published")
2364
+ source = "<p>ExamSoft Import on #{Time.now.strftime('%Y-%m-%d')}</p>\n"
2365
+ if row["question id"].present?
2366
+ source += "<p>External id: #{row['question id']}</p>\n"
2367
+ end
2368
+
2369
+ question = Questions::Question.load(row)
2370
+ item = {
2371
+ reference: SecureRandom.uuid,
2372
+ title: row["title"] || "",
2373
+ status: status,
2374
+ tags: categories_to_tags(row["category"]),
2375
+ metadata: {
2376
+ import_date: Time.now.iso8601,
2377
+ import_type: row["import_type"] || "examsoft",
2378
+ },
2379
+ source: source,
2380
+ description: row["description"] || "",
2381
+ questions: [
2382
+ {
2383
+ reference: question.reference,
2384
+ type: question.question_type,
2385
+ },
2386
+ ],
2387
+ features: [],
2388
+ definition: {
2389
+ widgets: [
2390
+ {
2391
+ reference: question.reference,
2392
+ widget_type: "response",
2393
+ },
2394
+ ],
2395
+ },
2396
+ }
2397
+ [item, [question.to_learnosity]]
2398
+ end
2399
+
2400
+ def convert_row_minimal(row)
2401
+ # Fallback: create a bare item with just the question text
2402
+ reference = SecureRandom.uuid
2403
+ item = {
2404
+ reference: reference,
2405
+ title: row["title"] || "",
2406
+ status: "draft",
2407
+ tags: {},
2408
+ metadata: {
2409
+ import_date: Time.now.iso8601,
2410
+ import_type: "examsoft",
2411
+ },
2412
+ source: "<p>ExamSoft Import on #{Time.now.strftime('%Y-%m-%d')}</p>\n",
2413
+ description: row["question text"] || "",
2414
+ questions: [],
2415
+ features: [],
2416
+ definition: { widgets: [] },
2417
+ }
2418
+ [item, []]
2419
+ end
2420
+ end
2421
+ end
2422
+ end
2423
+ ```
2424
+
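The `categories_to_tags` helper in the converter above is easiest to understand from its output. The method body below is reproduced from the plan as a standalone function; the sample categories are taken from the fixtures in Task 11, and the assumption that `row["category"]` is an Array of Strings (or `nil`) is mine.

```ruby
# Standalone copy of the converter's private helper, for experimentation.
def categories_to_tags(categories)
  tags = {}
  (categories || []).each do |cat|
    if cat.include?("/")
      # "Type/Value" adds Value under the Type key; only the first "/" splits.
      key, value = cat.split("/", 2).map(&:strip)
      tags[key.to_sym] ||= []
      tags[key.to_sym] << value
    else
      # A bare category becomes a key with no values.
      tags[cat.to_sym] ||= []
    end
  end
  tags
end

SAMPLE_TAGS = categories_to_tags(["Biology/Cells", "Biology/Genetics", "Capitals"])
```

Here `SAMPLE_TAGS` groups both `Biology` values under one key and gives `Capitals` an empty list. If the extractor ever returned a single String rather than an Array, `each` would raise `NoMethodError`, which the `rescue StandardError` in `convert` would turn into a draft import.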
2425
+ **Step 3: Run existing tests to check backward compatibility**
2426
+
2427
+ Run: `bundle exec rspec spec/atomic_assessments_import/examsoft/ -v`
2428
+ Expected: Existing tests should mostly pass. Some may need minor adjustments due to error handling changes (e.g., "raises if no options" now produces a warning instead of an exception).
2429
+
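The change from exceptions to warnings comes from the nested `begin`/`rescue` inside `convert`. A minimal sketch of that control flow, with lambdas standing in for `convert_row` and `convert_row_minimal` (the helper name and its keyword arguments are hypothetical):

```ruby
# Hypothetical stand-in for the converter's per-chunk error handling.
# `full` and `minimal` are callables playing the roles of convert_row
# and convert_row_minimal; neither is the gem's real implementation.
def import_with_fallback(title, full:, minimal:, warnings:)
  full.call
rescue StandardError => e
  warnings << "#{title}: #{e.message}, imported as draft"
  begin
    minimal.call
  rescue StandardError
    warnings << "#{title}: Could not import even minimally, skipped"
    nil
  end
end

FALLBACK_WARNINGS = []
FALLBACK_RESULTS = [
  import_with_fallback("Q1", full: -> { :published }, minimal: -> { :draft },
                             warnings: FALLBACK_WARNINGS),
  import_with_fallback("Q2", full: -> { raise "Missing options" }, minimal: -> { :draft },
                             warnings: FALLBACK_WARNINGS),
  import_with_fallback("Q3", full: -> { raise "Bad row" }, minimal: -> { raise "Still bad" },
                             warnings: FALLBACK_WARNINGS),
].freeze
```

Note that the "imported as draft" message is recorded before the minimal import is attempted, so a chunk that fails both stages leaves two warnings, the first of which is no longer accurate. Specs that match on warning text should account for that.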
2430
+ **Step 4: Update existing ExamSoft specs for new behavior**
2431
+
2432
+ The tests that expect `raise_error` for missing options or correct answers need to change, because the new converter parses on a best-effort basis and records warnings instead of raising. Update `spec/atomic_assessments_import/examsoft/docx_converter_spec.rb`:
2433
+
2434
+ Change the "raises if no options" test to:
2435
+ ```ruby
2436
+ it "warns and imports as draft if no options are given" do
2437
+ no_options = Tempfile.new("temp.docx")
2438
+ original_content = File.read("spec/fixtures/no_options.docx")
2439
+ no_options.write(original_content)
2440
+ no_options.rewind
2441
+
2442
+ data = described_class.new(no_options).convert
2443
+ expect(data[:errors]).to include(a_string_matching(/no options|missing options/i))
2444
+ end
2445
+ ```
2446
+
2447
+ Change the "raises if no correct answer" test to:
2448
+ ```ruby
2449
+ it "warns and imports as draft if no correct answer is given" do
2450
+ no_correct = Tempfile.new("temp.docx")
2451
+ original_content = File.read("spec/fixtures/no_correct.docx")
2452
+ no_correct.write(original_content)
2453
+ no_correct.rewind
2454
+
2455
+ data = described_class.new(no_correct).convert
2456
+ expect(data[:errors]).to include(a_string_matching(/correct answer/i))
2457
+ end
2458
+ ```
2459
+
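These specs pass `"temp.docx"` as the Tempfile basename, which works only because `normalize_to_html` takes the leading letters of whatever follows the last dot in the path. The sketch below isolates that logic; `source_type_for` is a hypothetical name for the expression used in the converter, and the literal paths are made-up examples.

```ruby
require "tempfile"

# Hypothetical helper isolating the extension logic from normalize_to_html.
def source_type_for(path)
  path.split(".").last.match(/^[a-zA-Z]+/)[0]
end

# Basename given as one String: Tempfile appends its unique suffix after it,
# so the path ends in something like "temp.docx<date>-<pid>-<random>".
single = Tempfile.new("temp.docx")
SINGLE_TYPE = source_type_for(single.path)
single.close!

# Basename given as [prefix, extension]: the extension stays at the end.
split = Tempfile.new(["temp", ".docx"])
SPLIT_EXT = File.extname(split.path)
split.close!
```

Both forms resolve to `docx` here. The array form of `Tempfile.new` keeps a conventional extension on the path, which may be the more robust choice for new specs because it does not depend on the regular expression trimming the generated suffix.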
2460
+ Apply similar changes to `html_converter_spec.rb` and `rtf_converter_spec.rb`.
2461
+
2462
+ **Step 5: Run all tests**
2463
+
2464
+ Run: `bundle exec rspec`
2465
+ Expected: All pass
2466
+
2467
+ **Step 6: Commit**
2468
+
2469
+ ```bash
2470
+ git add lib/atomic_assessments_import/exam_soft/ spec/atomic_assessments_import/examsoft/
2471
+ git commit -m "refactor: rewrite ExamSoft converter to use chunker + extractor pipeline"
2472
+ ```
2473
+
2474
+ ---
2475
+
2476
+ ### Task 11: Integration Tests — Mixed Types, Messy Documents, Partial Parse
2477
+
2478
+ **Files:**
2479
+ - Create: `spec/fixtures/mixed_types.html`
2480
+ - Create: `spec/fixtures/messy_document.html`
2481
+ - Create: `spec/atomic_assessments_import/examsoft/integration_spec.rb`
2482
+
2483
+ **Step 1: Create test fixtures**
2484
+
2485
+ Create `spec/fixtures/mixed_types.html`:
2486
+
2487
+ ```html
2488
+ <p>Exam: Midterm 2024</p>
2489
+ <p>Total Questions: 4</p>
2490
+ <p>Folder: Science Title: Q1 Category: Biology/Cells 1) What is the powerhouse of the cell? ~ The mitochondria produces ATP.</p>
2491
+ <p>*a) Mitochondria</p>
2492
+ <p>b) Nucleus</p>
2493
+ <p>c) Ribosome</p>
2494
+ <p>Type: Essay Folder: Writing Title: Q2 Category: English/Composition 2) Discuss the themes of Hamlet.</p>
2495
+ <p>Type: MA Folder: Geography Title: Q3 Category: Capitals 3) Select all European capitals.</p>
2496
+ <p>*a) Paris</p>
2497
+ <p>*b) Berlin</p>
2498
+ <p>c) New York</p>
2499
+ <p>Folder: Science Title: Q4 Category: Chemistry 4) What is the chemical symbol for gold?</p>
2500
+ <p>*a) Au</p>
2501
+ <p>b) Ag</p>
2502
+ <p>c) Fe</p>
2503
+ ```
2504
+
2505
+ Create `spec/fixtures/messy_document.html`:
2506
+
2507
+ ```html
2508
+ <p>Some random header text</p>
2509
+ <p></p>
2510
+ <p>Folder: Test Title: Q1 Category: General 1) A normal question? ~ Normal explanation</p>
2511
+ <p>*a) Correct</p>
2512
+ <p>b) Wrong</p>
2513
+ <p>Folder: Test Title: Q2 Category: General 2) A question with no options at all</p>
2514
+ <p>Folder: Test Title: Q3 Category: General 3) Another normal question? ~ Another explanation</p>
2515
+ <p>*a) Right</p>
2516
+ <p>b) Wrong</p>
2517
+ ```
2518
+
2519
+ **Step 2: Write integration tests**
2520
+
2521
+ Create `spec/atomic_assessments_import/examsoft/integration_spec.rb`:
2522
+
2523
+ ```ruby
2524
+ # frozen_string_literal: true
2525
+
2526
+ require "atomic_assessments_import"
2527
+
2528
+ RSpec.describe "ExamSoft Integration" do
2529
+ describe "mixed question types" do
2530
+ it "handles a document with MCQ, essay, and MA questions" do
2531
+ data = AtomicAssessmentsImport::ExamSoft::Converter.new("spec/fixtures/mixed_types.html").convert
2532
+
2533
+ expect(data[:items].length).to eq(4)
2534
+
2535
+ # MCQ question
2536
+ q1 = data[:questions].find { |q| q[:data][:stimulus]&.include?("powerhouse") }
2537
+ expect(q1).not_to be_nil
2538
+ expect(q1[:type]).to eq("mcq")
2539
+
2540
+ # Essay question
2541
+ q2 = data[:questions].find { |q| q[:data][:stimulus]&.include?("Hamlet") }
2542
+ expect(q2).not_to be_nil
2543
+ expect(q2[:type]).to eq("longanswer")
2544
+
2545
+ # MA question
2546
+ q3 = data[:questions].find { |q| q[:data][:stimulus]&.include?("European capitals") }
2547
+ expect(q3).not_to be_nil
2548
+ end
2549
+
2550
+ it "reports exam header in warnings" do
2551
+ data = AtomicAssessmentsImport::ExamSoft::Converter.new("spec/fixtures/mixed_types.html").convert
2552
+
2553
+ expect(data[:errors]).to include(a_string_matching(/header/i))
2554
+ end
2555
+ end
2556
+
2557
+ describe "messy documents with partial parse" do
2558
+ it "imports what it can and warns about problems" do
2559
+ data = AtomicAssessmentsImport::ExamSoft::Converter.new("spec/fixtures/messy_document.html").convert
2560
+
2561
+ # Should get at least 2 good items (Q1 and Q3)
2562
+ published = data[:items].select { |i| i[:status] == "published" }
2563
+ expect(published.length).to be >= 2
2564
+
2565
+ # Should have warnings about Q2 (no options for what looks like MCQ)
2566
+ expect(data[:errors].length).to be > 0
2567
+ end
2568
+ end
2569
+
2570
+ describe "backward compatibility" do
2571
+ it "produces the same structure from simple.html as before" do
2572
+ data = AtomicAssessmentsImport::ExamSoft::Converter.new("spec/fixtures/simple.html").convert
2573
+
2574
+ expect(data[:items].length).to eq(3)
2575
+ expect(data[:questions].length).to eq(3)
2576
+ expect(data[:activities]).to eq([])
2577
+ expect(data[:features]).to eq([])
2578
+
2579
+ item1 = data[:items].find { |i| i[:title] == "Question 1" }
2580
+ expect(item1).not_to be_nil
2581
+ expect(item1[:status]).to eq("published")
2582
+
2583
+ q1 = data[:questions].find { |q| q[:data][:stimulus] == "What is the capital of France?" }
2584
+ expect(q1).not_to be_nil
2585
+ expect(q1[:data][:options].length).to eq(3)
2586
+ end
2587
+ end
2588
+ end
2589
+ ```
2590
+
2591
+ **Step 3: Run integration tests**
2592
+
2593
+ Run: `bundle exec rspec spec/atomic_assessments_import/examsoft/integration_spec.rb -v`
2594
+ Expected: PASS
2595
+
2596
+ **Step 4: Run full test suite**
2597
+
2598
+ Run: `bundle exec rspec`
2599
+ Expected: All pass
2600
+
2601
+ **Step 5: Commit**
2602
+
2603
+ ```bash
2604
+ git add spec/fixtures/mixed_types.html spec/fixtures/messy_document.html spec/atomic_assessments_import/examsoft/integration_spec.rb
2605
+ git commit -m "test: add integration tests for mixed types, messy docs, backward compat"
2606
+ ```
2607
+
2608
+ ---
2609
+
2610
+ ### Task 12: Final Cleanup and Full Test Run
2611
+
2612
+ **Files:**
2613
+ - Review: all modified files
2614
+ - Clean up: any dead code from old converter, unused comments
2615
+
2616
+ **Step 1: Run full test suite**
2617
+
2618
+ Run: `bundle exec rspec --format documentation`
2619
+ Expected: All pass
2620
+
2621
+ **Step 2: Check for dead code**
2622
+
2623
+ Look for any leftover references to the old regex patterns in the converter. The old `chunk_pattern`, `meta_regex`, `question_regex`, `explanation_regex`, and `options_regex` patterns were local variables in the old `convert` method, not constants, so they should all have disappeared with the rewrite in Task 10; confirm that nothing still refers to them.
2624
+
2625
+ **Step 3: Run rubocop if configured**
2626
+
2627
+ Run: `bundle exec rubocop lib/atomic_assessments_import/exam_soft/ lib/atomic_assessments_import/questions/`
2628
+ Fix any style issues.
2629
+
2630
+ **Step 4: Final commit**
2631
+
2632
+ ```bash
2633
+ git add -A
2634
+ git commit -m "chore: cleanup after ExamSoft converter refactor"
2635
+ ```