wormclaude 1.0.74 → 1.0.76
- package/dist/theme.js +4 -4
- package/package.json +2 -2
- package/skills/build-mcp-app/SKILL.md +393 -0
- package/skills/build-mcp-app/references/abuse-protection.md +60 -0
- package/skills/build-mcp-app/references/apps-sdk-messages.md +227 -0
- package/skills/build-mcp-app/references/directory-checklist.md +18 -0
- package/skills/build-mcp-app/references/iframe-sandbox.md +164 -0
- package/skills/build-mcp-app/references/payload-budgeting.md +54 -0
- package/skills/build-mcp-app/references/widget-templates.md +249 -0
- package/skills/build-mcp-server/SKILL.md +222 -0
- package/skills/build-mcp-server/references/auth.md +108 -0
- package/skills/build-mcp-server/references/deploy-cloudflare-workers.md +106 -0
- package/skills/build-mcp-server/references/elicitation.md +129 -0
- package/skills/build-mcp-server/references/remote-http-scaffold.md +211 -0
- package/skills/build-mcp-server/references/resources-and-prompts.md +122 -0
- package/skills/build-mcp-server/references/server-capabilities.md +164 -0
- package/skills/build-mcp-server/references/tool-design.md +189 -0
- package/skills/build-mcp-server/references/versions.md +25 -0
- package/skills/build-mcpb/SKILL.md +200 -0
- package/skills/build-mcpb/references/local-security.md +149 -0
- package/skills/build-mcpb/references/manifest-schema.md +156 -0
- package/skills/docx/script/__init__.py +1 -0
- package/skills/docx/script/accept_chages.py +135 -0
- package/skills/docx/script/comment.py +318 -0
- package/skills/docx/script/office/helpers/__init__.py +0 -0
- package/skills/docx/script/office/helpers/merge_runs.py +199 -0
- package/skills/docx/script/office/helpers/simplify_redlines.py +197 -0
- package/skills/docx/script/office/pack.py +159 -0
- package/skills/docx/script/office/schemas/ISO-IEC29500-4_2016/dml-chart.xsd +1499 -0
- package/skills/docx/script/office/schemas/ISO-IEC29500-4_2016/dml-chartDrawing.xsd +146 -0
- package/skills/docx/script/office/schemas/ISO-IEC29500-4_2016/dml-diagram.xsd +1085 -0
- package/skills/docx/script/office/schemas/ISO-IEC29500-4_2016/dml-lockedCanvas.xsd +11 -0
- package/skills/docx/script/office/schemas/ISO-IEC29500-4_2016/dml-main.xsd +3081 -0
- package/skills/docx/script/office/schemas/ISO-IEC29500-4_2016/dml-picture.xsd +23 -0
- package/skills/docx/script/office/schemas/ISO-IEC29500-4_2016/dml-spreadsheetDrawing.xsd +185 -0
- package/skills/docx/script/office/schemas/ISO-IEC29500-4_2016/dml-wordprocessingDrawing.xsd +287 -0
- package/skills/docx/script/office/schemas/ISO-IEC29500-4_2016/pml.xsd +1676 -0
- package/skills/docx/script/office/schemas/ISO-IEC29500-4_2016/shared-additionalCharacteristics.xsd +28 -0
- package/skills/docx/script/office/schemas/ISO-IEC29500-4_2016/shared-bibliography.xsd +144 -0
- package/skills/docx/script/office/schemas/ISO-IEC29500-4_2016/shared-commonSimpleTypes.xsd +174 -0
- package/skills/docx/script/office/schemas/ISO-IEC29500-4_2016/shared-customXmlDataProperties.xsd +25 -0
- package/skills/docx/script/office/schemas/ISO-IEC29500-4_2016/shared-customXmlSchemaProperties.xsd +18 -0
- package/skills/docx/script/office/schemas/ISO-IEC29500-4_2016/shared-documentPropertiesCustom.xsd +59 -0
- package/skills/docx/script/office/schemas/ISO-IEC29500-4_2016/shared-documentPropertiesExtended.xsd +56 -0
- package/skills/docx/script/office/schemas/ISO-IEC29500-4_2016/shared-documentPropertiesVariantTypes.xsd +195 -0
- package/skills/docx/script/office/schemas/ISO-IEC29500-4_2016/shared-math.xsd +582 -0
- package/skills/docx/script/office/schemas/ISO-IEC29500-4_2016/shared-relationshipReference.xsd +25 -0
- package/skills/docx/script/office/schemas/ISO-IEC29500-4_2016/sml.xsd +4439 -0
- package/skills/docx/script/office/schemas/ISO-IEC29500-4_2016/vml-main.xsd +570 -0
- package/skills/docx/script/office/schemas/ISO-IEC29500-4_2016/vml-officeDrawing.xsd +509 -0
- package/skills/docx/script/office/schemas/ISO-IEC29500-4_2016/vml-presentationDrawing.xsd +12 -0
- package/skills/docx/script/office/schemas/ISO-IEC29500-4_2016/vml-spreadsheetDrawing.xsd +108 -0
- package/skills/docx/script/office/schemas/ISO-IEC29500-4_2016/vml-wordprocessingDrawing.xsd +96 -0
- package/skills/docx/script/office/schemas/ISO-IEC29500-4_2016/wml.xsd +3646 -0
- package/skills/docx/script/office/schemas/ISO-IEC29500-4_2016/xml.xsd +116 -0
- package/skills/docx/script/office/schemas/ecma/fouth-edition/opc-contentTypes.xsd +42 -0
- package/skills/docx/script/office/schemas/ecma/fouth-edition/opc-coreProperties.xsd +50 -0
- package/skills/docx/script/office/schemas/ecma/fouth-edition/opc-digSig.xsd +49 -0
- package/skills/docx/script/office/schemas/ecma/fouth-edition/opc-relationships.xsd +33 -0
- package/skills/docx/script/office/schemas/mce/mc.xsd +75 -0
- package/skills/docx/script/office/schemas/microsoft/wml-2010.xsd +560 -0
- package/skills/docx/script/office/schemas/microsoft/wml-2012.xsd +67 -0
- package/skills/docx/script/office/schemas/microsoft/wml-2018.xsd +14 -0
- package/skills/docx/script/office/schemas/microsoft/wml-cex-2018.xsd +20 -0
- package/skills/docx/script/office/schemas/microsoft/wml-cid-2016.xsd +13 -0
- package/skills/docx/script/office/schemas/microsoft/wml-sdtdatahash-2020.xsd +4 -0
- package/skills/docx/script/office/schemas/microsoft/wml-symex-2015.xsd +8 -0
- package/skills/docx/script/office/soffice.py +183 -0
- package/skills/docx/script/office/unpack.py +132 -0
- package/skills/docx/script/office/validate.py +117 -0
- package/skills/docx/script/office/validators/__init__.py +15 -0
- package/skills/docx/script/office/validators/base.py +851 -0
- package/skills/docx/script/office/validators/docx.py +446 -0
- package/skills/docx/script/office/validators/pptx.py +275 -0
- package/skills/docx/script/office/validators/redlining.py +247 -0
- package/skills/docx/script/templates/comments.xml +3 -0
- package/skills/docx/script/templates/commentsExtended.xml +3 -0
- package/skills/docx/script/templates/commentsExtensible.xml +3 -0
- package/skills/docx/script/templates/commentsIds.xml +3 -0
- package/skills/docx/script/templates/people.xml +3 -0
- package/skills/docx/skill.md +593 -0
- package/skills/frontend-design/SKILL.md +42 -0
- package/skills/pdf/FORMS.md +294 -0
- package/skills/pdf/REFERENCE.md +612 -0
- package/skills/pdf/SKILL.md +314 -0
- package/skills/pdf/scripts/check_bounding_boxes.py +65 -0
- package/skills/pdf/scripts/check_fillable_fields.py +11 -0
- package/skills/pdf/scripts/convert_pdf_to_images.py +33 -0
- package/skills/pdf/scripts/create_validation_image.py +37 -0
- package/skills/pdf/scripts/extract_form_field_info.py +122 -0
- package/skills/pdf/scripts/extract_form_structure.py +115 -0
- package/skills/pdf/scripts/fill_fillable_fields.py +98 -0
- package/skills/pdf/scripts/fill_pdf_form_with_annotations.py +107 -0
- package/skills/playground/SKILL.md +77 -0
- package/skills/playground/templates/code-map.md +158 -0
- package/skills/playground/templates/concept-map.md +73 -0
- package/skills/playground/templates/data-explorer.md +67 -0
- package/skills/playground/templates/design-playground.md +67 -0
- package/skills/playground/templates/diff-review.md +179 -0
- package/skills/playground/templates/document-critique.md +171 -0
- package/skills/pptx/SKILL.md +230 -0
- package/skills/pptx/editing.md +205 -0
- package/skills/pptx/pptxgenjs.md +437 -0
- package/skills/pptx/scripts/__init__.py +0 -0
- package/skills/pptx/scripts/add_slide.py +195 -0
- package/skills/pptx/scripts/clean.py +286 -0
- package/skills/pptx/scripts/office/helpers/__init__.py +0 -0
- package/skills/pptx/scripts/office/helpers/merge_runs.py +199 -0
- package/skills/pptx/scripts/office/helpers/simplify_redlines.py +197 -0
- package/skills/pptx/scripts/office/pack.py +159 -0
- package/skills/pptx/scripts/office/schemas/ISO-IEC29500-4_2016/dml-chart.xsd +1499 -0
- package/skills/pptx/scripts/office/schemas/ISO-IEC29500-4_2016/dml-chartDrawing.xsd +146 -0
- package/skills/pptx/scripts/office/schemas/ISO-IEC29500-4_2016/dml-diagram.xsd +1085 -0
- package/skills/pptx/scripts/office/schemas/ISO-IEC29500-4_2016/dml-lockedCanvas.xsd +11 -0
- package/skills/pptx/scripts/office/schemas/ISO-IEC29500-4_2016/dml-main.xsd +3081 -0
- package/skills/pptx/scripts/office/schemas/ISO-IEC29500-4_2016/dml-picture.xsd +23 -0
- package/skills/pptx/scripts/office/schemas/ISO-IEC29500-4_2016/dml-spreadsheetDrawing.xsd +185 -0
- package/skills/pptx/scripts/office/schemas/ISO-IEC29500-4_2016/dml-wordprocessingDrawing.xsd +287 -0
- package/skills/pptx/scripts/office/schemas/ISO-IEC29500-4_2016/pml.xsd +1676 -0
- package/skills/pptx/scripts/office/schemas/ISO-IEC29500-4_2016/shared-additionalCharacteristics.xsd +28 -0
- package/skills/pptx/scripts/office/schemas/ISO-IEC29500-4_2016/shared-bibliography.xsd +144 -0
- package/skills/pptx/scripts/office/schemas/ISO-IEC29500-4_2016/shared-commonSimpleTypes.xsd +174 -0
- package/skills/pptx/scripts/office/schemas/ISO-IEC29500-4_2016/shared-customXmlDataProperties.xsd +25 -0
- package/skills/pptx/scripts/office/schemas/ISO-IEC29500-4_2016/shared-customXmlSchemaProperties.xsd +18 -0
- package/skills/pptx/scripts/office/schemas/ISO-IEC29500-4_2016/shared-documentPropertiesCustom.xsd +59 -0
- package/skills/pptx/scripts/office/schemas/ISO-IEC29500-4_2016/shared-documentPropertiesExtended.xsd +56 -0
- package/skills/pptx/scripts/office/schemas/ISO-IEC29500-4_2016/shared-documentPropertiesVariantTypes.xsd +195 -0
- package/skills/pptx/scripts/office/schemas/ISO-IEC29500-4_2016/shared-math.xsd +582 -0
- package/skills/pptx/scripts/office/schemas/ISO-IEC29500-4_2016/shared-relationshipReference.xsd +25 -0
- package/skills/pptx/scripts/office/schemas/ISO-IEC29500-4_2016/sml.xsd +4439 -0
- package/skills/pptx/scripts/office/schemas/ISO-IEC29500-4_2016/vml-main.xsd +570 -0
- package/skills/pptx/scripts/office/schemas/ISO-IEC29500-4_2016/vml-officeDrawing.xsd +509 -0
- package/skills/pptx/scripts/office/schemas/ISO-IEC29500-4_2016/vml-presentationDrawing.xsd +12 -0
- package/skills/pptx/scripts/office/schemas/ISO-IEC29500-4_2016/vml-spreadsheetDrawing.xsd +108 -0
- package/skills/pptx/scripts/office/schemas/ISO-IEC29500-4_2016/vml-wordprocessingDrawing.xsd +96 -0
- package/skills/pptx/scripts/office/schemas/ISO-IEC29500-4_2016/wml.xsd +3646 -0
- package/skills/pptx/scripts/office/schemas/ISO-IEC29500-4_2016/xml.xsd +116 -0
- package/skills/pptx/scripts/office/schemas/ecma/fouth-edition/opc-contentTypes.xsd +42 -0
- package/skills/pptx/scripts/office/schemas/ecma/fouth-edition/opc-coreProperties.xsd +50 -0
- package/skills/pptx/scripts/office/schemas/ecma/fouth-edition/opc-digSig.xsd +49 -0
- package/skills/pptx/scripts/office/schemas/ecma/fouth-edition/opc-relationships.xsd +33 -0
- package/skills/pptx/scripts/office/schemas/mce/mc.xsd +75 -0
- package/skills/pptx/scripts/office/schemas/microsoft/wml-2010.xsd +560 -0
- package/skills/pptx/scripts/office/schemas/microsoft/wml-2012.xsd +67 -0
- package/skills/pptx/scripts/office/schemas/microsoft/wml-2018.xsd +14 -0
- package/skills/pptx/scripts/office/schemas/microsoft/wml-cex-2018.xsd +20 -0
- package/skills/pptx/scripts/office/schemas/microsoft/wml-cid-2016.xsd +13 -0
- package/skills/pptx/scripts/office/schemas/microsoft/wml-sdtdatahash-2020.xsd +4 -0
- package/skills/pptx/scripts/office/schemas/microsoft/wml-symex-2015.xsd +8 -0
- package/skills/pptx/scripts/office/soffice.py +183 -0
- package/skills/pptx/scripts/office/unpack.py +132 -0
- package/skills/pptx/scripts/office/validate.py +117 -0
- package/skills/pptx/scripts/office/validators/__init__.py +15 -0
- package/skills/pptx/scripts/office/validators/base.py +851 -0
- package/skills/pptx/scripts/office/validators/docx.py +446 -0
- package/skills/pptx/scripts/office/validators/pptx.py +275 -0
- package/skills/pptx/scripts/office/validators/redlining.py +247 -0
- package/skills/pptx/scripts/thumbnail.py +289 -0
- package/skills/talent-creator/SKILL.md +486 -0
- package/skills/talent-creator/agents/analyzer.md +274 -0
- package/skills/talent-creator/agents/comparator.md +202 -0
- package/skills/talent-creator/agents/grader.md +223 -0
- package/skills/talent-creator/assets/eval_review.html +146 -0
- package/skills/talent-creator/eval-viewer/generate_review.py +471 -0
- package/skills/talent-creator/eval-viewer/viewer.html +1325 -0
- package/skills/talent-creator/references/schemas.md +430 -0
- package/skills/talent-creator/scripts/__init__.py +0 -0
- package/skills/talent-creator/scripts/aggregate_benchmark.py +401 -0
- package/skills/talent-creator/scripts/generate_report.py +326 -0
- package/skills/talent-creator/scripts/improve_description.py +247 -0
- package/skills/talent-creator/scripts/package_skill.py +136 -0
- package/skills/talent-creator/scripts/quick_validate.py +146 -0
- package/skills/talent-creator/scripts/run_eval.py +310 -0
- package/skills/talent-creator/scripts/run_loop.py +328 -0
- package/skills/talent-creator/scripts/utils.py +47 -0
- package/skills/xlsx/SKILL.md +300 -0
- package/skills/xlsx/scripts/office/helpers/__init__.py +0 -0
- package/skills/xlsx/scripts/office/helpers/merge_runs.py +199 -0
- package/skills/xlsx/scripts/office/helpers/simplify_redlines.py +197 -0
- package/skills/xlsx/scripts/office/pack.py +159 -0
- package/skills/xlsx/scripts/office/schemas/ISO-IEC29500-4_2016/dml-chart.xsd +1499 -0
- package/skills/xlsx/scripts/office/schemas/ISO-IEC29500-4_2016/dml-chartDrawing.xsd +146 -0
- package/skills/xlsx/scripts/office/schemas/ISO-IEC29500-4_2016/dml-diagram.xsd +1085 -0
- package/skills/xlsx/scripts/office/schemas/ISO-IEC29500-4_2016/dml-lockedCanvas.xsd +11 -0
- package/skills/xlsx/scripts/office/schemas/ISO-IEC29500-4_2016/dml-main.xsd +3081 -0
- package/skills/xlsx/scripts/office/schemas/ISO-IEC29500-4_2016/dml-picture.xsd +23 -0
- package/skills/xlsx/scripts/office/schemas/ISO-IEC29500-4_2016/dml-spreadsheetDrawing.xsd +185 -0
- package/skills/xlsx/scripts/office/schemas/ISO-IEC29500-4_2016/dml-wordprocessingDrawing.xsd +287 -0
- package/skills/xlsx/scripts/office/schemas/ISO-IEC29500-4_2016/pml.xsd +1676 -0
- package/skills/xlsx/scripts/office/schemas/ISO-IEC29500-4_2016/shared-additionalCharacteristics.xsd +28 -0
- package/skills/xlsx/scripts/office/schemas/ISO-IEC29500-4_2016/shared-bibliography.xsd +144 -0
- package/skills/xlsx/scripts/office/schemas/ISO-IEC29500-4_2016/shared-commonSimpleTypes.xsd +174 -0
- package/skills/xlsx/scripts/office/schemas/ISO-IEC29500-4_2016/shared-customXmlDataProperties.xsd +25 -0
- package/skills/xlsx/scripts/office/schemas/ISO-IEC29500-4_2016/shared-customXmlSchemaProperties.xsd +18 -0
- package/skills/xlsx/scripts/office/schemas/ISO-IEC29500-4_2016/shared-documentPropertiesCustom.xsd +59 -0
- package/skills/xlsx/scripts/office/schemas/ISO-IEC29500-4_2016/shared-documentPropertiesExtended.xsd +56 -0
- package/skills/xlsx/scripts/office/schemas/ISO-IEC29500-4_2016/shared-documentPropertiesVariantTypes.xsd +195 -0
- package/skills/xlsx/scripts/office/schemas/ISO-IEC29500-4_2016/shared-math.xsd +582 -0
- package/skills/xlsx/scripts/office/schemas/ISO-IEC29500-4_2016/shared-relationshipReference.xsd +25 -0
- package/skills/xlsx/scripts/office/schemas/ISO-IEC29500-4_2016/sml.xsd +4439 -0
- package/skills/xlsx/scripts/office/schemas/ISO-IEC29500-4_2016/vml-main.xsd +570 -0
- package/skills/xlsx/scripts/office/schemas/ISO-IEC29500-4_2016/vml-officeDrawing.xsd +509 -0
- package/skills/xlsx/scripts/office/schemas/ISO-IEC29500-4_2016/vml-presentationDrawing.xsd +12 -0
- package/skills/xlsx/scripts/office/schemas/ISO-IEC29500-4_2016/vml-spreadsheetDrawing.xsd +108 -0
- package/skills/xlsx/scripts/office/schemas/ISO-IEC29500-4_2016/vml-wordprocessingDrawing.xsd +96 -0
- package/skills/xlsx/scripts/office/schemas/ISO-IEC29500-4_2016/wml.xsd +3646 -0
- package/skills/xlsx/scripts/office/schemas/ISO-IEC29500-4_2016/xml.xsd +116 -0
- package/skills/xlsx/scripts/office/schemas/ecma/fouth-edition/opc-contentTypes.xsd +42 -0
- package/skills/xlsx/scripts/office/schemas/ecma/fouth-edition/opc-coreProperties.xsd +50 -0
- package/skills/xlsx/scripts/office/schemas/ecma/fouth-edition/opc-digSig.xsd +49 -0
- package/skills/xlsx/scripts/office/schemas/ecma/fouth-edition/opc-relationships.xsd +33 -0
- package/skills/xlsx/scripts/office/schemas/mce/mc.xsd +75 -0
- package/skills/xlsx/scripts/office/schemas/microsoft/wml-2010.xsd +560 -0
- package/skills/xlsx/scripts/office/schemas/microsoft/wml-2012.xsd +67 -0
- package/skills/xlsx/scripts/office/schemas/microsoft/wml-2018.xsd +14 -0
- package/skills/xlsx/scripts/office/schemas/microsoft/wml-cex-2018.xsd +20 -0
- package/skills/xlsx/scripts/office/schemas/microsoft/wml-cid-2016.xsd +13 -0
- package/skills/xlsx/scripts/office/schemas/microsoft/wml-sdtdatahash-2020.xsd +4 -0
- package/skills/xlsx/scripts/office/schemas/microsoft/wml-symex-2015.xsd +8 -0
- package/skills/xlsx/scripts/office/soffice.py +183 -0
- package/skills/xlsx/scripts/office/unpack.py +132 -0
- package/skills/xlsx/scripts/office/validate.py +117 -0
- package/skills/xlsx/scripts/office/validators/__init__.py +15 -0
- package/skills/xlsx/scripts/office/validators/base.py +851 -0
- package/skills/xlsx/scripts/office/validators/docx.py +446 -0
- package/skills/xlsx/scripts/office/validators/pptx.py +275 -0
- package/skills/xlsx/scripts/office/validators/redlining.py +247 -0
- package/skills/xlsx/scripts/recalc.py +184 -0
|
@@ -0,0 +1,294 @@
|
|
|
1
|
+
**CRITICAL: Work through these steps in the given order. Don't jump straight to writing code.**
|
|
2
|
+
|
|
3
|
+
Before filling any PDF form, find out whether it already carries fillable form fields. From this file's directory, run:
|
|
4
|
+
`python scripts/check_fillable_fields.py <file.pdf>`, then branch to either the "Fillable fields" or "Non-fillable fields" section below based on what it reports.
|
|
5
|
+
|
|
6
|
+
# Fillable fields
|
|
7
|
+
When the PDF already has fillable form fields:
|
|
8
|
+
- From this file's directory, run: `python scripts/extract_form_field_info.py <input.pdf> <field_info.json>`. This produces a JSON file listing the fields in the shape below:
|
|
9
|
+
```
|
|
10
|
+
[
|
|
11
|
+
{
|
|
12
|
+
"field_id": (unique ID for the field),
|
|
13
|
+
"page": (page number, 1-based),
|
|
14
|
+
"rect": ([left, bottom, right, top] bounding box in PDF coordinates, y=0 is the bottom of the page),
|
|
15
|
+
"type": ("text", "checkbox", "radio_group", or "choice"),
|
|
16
|
+
},
|
|
17
|
+
// Checkboxes have "checked_value" and "unchecked_value" properties:
|
|
18
|
+
{
|
|
19
|
+
"field_id": (unique ID for the field),
|
|
20
|
+
"page": (page number, 1-based),
|
|
21
|
+
"type": "checkbox",
|
|
22
|
+
"checked_value": (Set the field to this value to check the checkbox),
|
|
23
|
+
"unchecked_value": (Set the field to this value to uncheck the checkbox),
|
|
24
|
+
},
|
|
25
|
+
// Radio groups have a "radio_options" list with the possible choices.
|
|
26
|
+
{
|
|
27
|
+
"field_id": (unique ID for the field),
|
|
28
|
+
"page": (page number, 1-based),
|
|
29
|
+
"type": "radio_group",
|
|
30
|
+
"radio_options": [
|
|
31
|
+
{
|
|
32
|
+
"value": (set the field to this value to select this radio option),
|
|
33
|
+
"rect": (bounding box for the radio button for this option)
|
|
34
|
+
},
|
|
35
|
+
// Other radio options
|
|
36
|
+
]
|
|
37
|
+
},
|
|
38
|
+
// Multiple choice fields have a "choice_options" list with the possible choices:
|
|
39
|
+
{
|
|
40
|
+
"field_id": (unique ID for the field),
|
|
41
|
+
"page": (page number, 1-based),
|
|
42
|
+
"type": "choice",
|
|
43
|
+
"choice_options": [
|
|
44
|
+
{
|
|
45
|
+
"value": (set the field to this value to select this option),
|
|
46
|
+
"text": (display text of the option)
|
|
47
|
+
},
|
|
48
|
+
// Other choice options
|
|
49
|
+
],
|
|
50
|
+
}
|
|
51
|
+
]
|
|
52
|
+
```
|
|
53
|
+
- Render the PDF to PNGs (one per page) using this script (run from this file's directory):
|
|
54
|
+
`python scripts/convert_pdf_to_images.py <file.pdf> <output_directory>`
|
|
55
|
+
Then study the images to work out what each form field is for (remember to translate the bounding-box PDF coordinates into image coordinates).
|
|
56
|
+
- Build a `field_values.json` file in the shape below, holding the value to write into each field:
|
|
57
|
+
```
|
|
58
|
+
[
|
|
59
|
+
{
|
|
60
|
+
"field_id": "last_name", // Must match the field_id from `extract_form_field_info.py`
|
|
61
|
+
"description": "The user's last name",
|
|
62
|
+
"page": 1, // Must match the "page" value in field_info.json
|
|
63
|
+
"value": "Simpson"
|
|
64
|
+
},
|
|
65
|
+
{
|
|
66
|
+
"field_id": "Checkbox12",
|
|
67
|
+
"description": "Checkbox to be checked if the user is 18 or over",
|
|
68
|
+
"page": 1,
|
|
69
|
+
"value": "/On" // If this is a checkbox, use its "checked_value" value to check it. If it's a radio button group, use one of the "value" values in "radio_options".
|
|
70
|
+
},
|
|
71
|
+
// more fields
|
|
72
|
+
]
|
|
73
|
+
```
|
|
74
|
+
- Run `fill_fillable_fields.py` from this file's directory to produce the completed PDF:
|
|
75
|
+
`python scripts/fill_fillable_fields.py <input pdf> <field_values.json> <output pdf>`
|
|
76
|
+
The script validates the field IDs and values you supplied; if it reports errors, fix the flagged fields and run it again.
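The coordinate translation mentioned in the image-inspection step can be sketched as follows. This is a minimal illustration, not part of the bundled scripts: the function name `pdf_rect_to_image_box` is hypothetical, and the page size and image size are assumed to be supplied by the caller, since the documented `field_info.json` shape does not include them. It flips the y axis because `rect` uses y=0 at the bottom of the page, while image pixels use y=0 at the top.

```python
def pdf_rect_to_image_box(rect, pdf_size, image_size):
    """Convert a [left, bottom, right, top] PDF rect (y=0 at page bottom)
    into a [left, top, right, bottom] pixel box (y=0 at image top).

    pdf_size and image_size are (width, height) tuples supplied by the
    caller; they are assumptions of this sketch, not fields of field_info.json.
    """
    left, bottom, right, top = rect
    pdf_width, pdf_height = pdf_size
    image_width, image_height = image_size
    scale_x = image_width / pdf_width
    scale_y = image_height / pdf_height
    return [
        left * scale_x,
        (pdf_height - top) * scale_y,     # PDF top edge becomes the smaller pixel y
        right * scale_x,
        (pdf_height - bottom) * scale_y,  # PDF bottom edge becomes the larger pixel y
    ]


# Hypothetical page of 600x800 points rendered to a 1200x1600 pixel image.
print(pdf_rect_to_image_box([100, 700, 300, 720], (600, 800), (1200, 1600)))
```

With the resulting pixel box you can locate the field on the rendered page image and read the nearby label to decide what the field is for.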
|
|
77
|
+
|
|
78
|
+
# Non-fillable fields
|
|
79
|
+
When the PDF has no fillable form fields, you'll overlay text annotations instead. Start by pulling coordinates from the PDF's structure (the more precise route), and only fall back to eyeballing positions if that fails.
|
|
80
|
+
|
|
81
|
+
## Step 1: Try Structure Extraction First
|
|
82
|
+
|
|
83
|
+
Run the script below to pull out text labels, lines, and checkboxes along with their exact PDF coordinates:
|
|
84
|
+
`python scripts/extract_form_structure.py <input.pdf> form_structure.json`
|
|
85
|
+
|
|
86
|
+
This writes a JSON file holding:
|
|
87
|
+
- **labels**: Every text element with exact coordinates (x0, top, x1, bottom in PDF points)
|
|
88
|
+
- **lines**: Horizontal lines that define row boundaries
|
|
89
|
+
- **checkboxes**: Small square rectangles detected as checkboxes (with center coordinates)
|
|
90
|
+
- **row_boundaries**: Row top/bottom positions derived from the horizontal lines
|
|
91
|
+
|
|
92
|
+
**Review the output**: If `form_structure.json` contains useful labels (text elements that map to form fields), go with **Approach A: Structure-Based Coordinates**. If the PDF is a scan or image with few or no labels, switch to **Approach B: Visual Estimation**.
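One way to make the review step above less subjective is to count how many labels carry readable text. The sketch below is only an illustration under stated assumptions: it assumes each entry in `labels` has a `"text"` key (the source documents only the coordinate keys), treats `(cid:N)` sequences as unreadable, and uses a hypothetical threshold `min_labels` that the bundled scripts do not define.

```python
import re

# Assumption: labels are dicts with a "text" key. Only x0/top/x1/bottom
# are documented in the source, so adjust the key name if it differs.
CID_PATTERN = re.compile(r"\(cid:\d+\)")


def usable_labels(structure):
    """Return labels whose text is non-empty once (cid:N) debris is removed."""
    result = []
    for label in structure.get("labels", []):
        text = CID_PATTERN.sub("", label.get("text", "")).strip()
        if text:
            result.append(label)
    return result


def choose_approach(structure, min_labels=3):
    """Suggest "A" when enough readable labels exist, otherwise "B".

    min_labels is a hypothetical cut-off chosen for this sketch.
    """
    return "A" if len(usable_labels(structure)) >= min_labels else "B"


sample = {
    "labels": [
        {"text": "Last", "x0": 43, "top": 63, "x1": 62, "bottom": 73},
        {"text": "Name", "x0": 65, "top": 63, "x1": 87, "bottom": 73},
        {"text": "(cid:12)(cid:7)", "x0": 43, "top": 90, "x1": 80, "bottom": 100},
        {"text": "Date", "x0": 300, "top": 63, "x1": 322, "bottom": 73},
    ]
}
print(choose_approach(sample))
```

The count is only a hint; a form with many readable labels that do not correspond to entry areas may still need the visual route for some fields.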
|
|
93
|
+
|
|
94
|
+
---
|
|
95
|
+
|
|
96
|
+
## Approach A: Structure-Based Coordinates (Preferred)
|
|
97
|
+
|
|
98
|
+
Pick this path when `extract_form_structure.py` did locate text labels in the PDF.
|
|
99
|
+
|
|
100
|
+
### A.1: Analyze the Structure
|
|
101
|
+
|
|
102
|
+
Open form_structure.json and pin down:
|
|
103
|
+
|
|
104
|
+
1. **Label groups**: Neighboring text elements that together make one label (e.g., "Last" + "Name")
|
|
105
|
+
2. **Row structure**: Labels sharing a similar `top` value belong to the same row
|
|
106
|
+
3. **Field columns**: Entry areas begin where the label ends (x0 = label.x1 + gap)
|
|
107
|
+
4. **Checkboxes**: Take the checkbox coordinates straight from the structure
|
|
108
|
+
|
|
109
|
+
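**Coordinate system**: PDF coordinates with y=0 at the TOP of the page and y growing downward. Note that this differs from the `rect` values in the fillable-fields workflow, where y=0 is the bottom of the page.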
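The row-structure rule above (labels with a similar `top` share a row) can be sketched like this. The helper name `group_rows` and the `tolerance` value are assumptions made for illustration, and the `"text"` key on each label is likewise assumed; only the coordinate keys come from the documented output.

```python
def group_rows(labels, tolerance=3.0):
    """Group labels whose `top` lies within `tolerance` points of the
    first label in the current row, then order each row left to right.

    tolerance is a hypothetical value; tune it to the form's line spacing.
    """
    rows = []
    for label in sorted(labels, key=lambda item: (item["top"], item["x0"])):
        if rows and abs(label["top"] - rows[-1][0]["top"]) <= tolerance:
            rows[-1].append(label)
        else:
            rows.append([label])
    return [sorted(row, key=lambda item: item["x0"]) for row in rows]


labels = [
    {"text": "Name", "x0": 65, "top": 63.5, "x1": 87, "bottom": 73},
    {"text": "Last", "x0": 43, "top": 63, "x1": 62, "bottom": 73},
    {"text": "Date", "x0": 300, "top": 64, "x1": 322, "bottom": 74},
    {"text": "Yes", "x0": 260, "top": 200, "x1": 280, "bottom": 210},
]
for row in group_rows(labels):
    print([label["text"] for label in row])
```

Adjacent labels within a row, such as "Last" and "Name", can then be merged into one label group before computing entry areas.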
|
|
110
|
+
|
|
111
|
+
### A.2: Check for Missing Elements
|
|
112
|
+
|
|
113
|
+
Structure extraction won't always catch every form element. Typical gaps:
|
|
114
|
+
- **Circular checkboxes**: Only square rectangles register as checkboxes
|
|
115
|
+
- **Complex graphics**: Decorative pieces or unusual form controls
|
|
116
|
+
- **Faded or light-colored elements**: These may slip through
|
|
117
|
+
|
|
118
|
+
If the PDF images show fields that never made it into form_structure.json, handle those particular fields with **visual analysis** (see "Hybrid Approach" below).
|
|
119
|
+
|
|
120
|
+
### A.3: Create fields.json with PDF Coordinates
|
|
121
|
+
|
|
122
|
+
For every field, work out the entry coordinates from the extracted structure:
|
|
123
|
+
|
|
124
|
+
**Text fields:**
|
|
125
|
+
- entry x0 = label x1 + 5 (leave a small gap past the label)
|
|
126
|
+
- entry x1 = the next label's x0, or the row boundary
|
|
127
|
+
- entry top = matches the label top
|
|
128
|
+
- entry bottom = the row boundary line below, or label bottom + row_height
|
|
129
|
+
|
|
130
|
+
**Checkboxes:**
|
|
131
|
+
- Take the checkbox rectangle coordinates verbatim from form_structure.json
|
|
132
|
+
- entry_bounding_box = [checkbox.x0, checkbox.top, checkbox.x1, checkbox.bottom]
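The rules above for text fields and checkboxes can be written out as a small sketch. The function names, the `right_limit` parameter (the next label's x0 or the row boundary, chosen by the caller), and the default `row_height` of 16 are hypothetical; the 5-point gap and the checkbox key names follow the rules as stated.

```python
def text_entry_box(label, right_limit, row_bottom=None, gap=5, row_height=16):
    """Compute [x0, top, x1, bottom] for a text entry area.

    right_limit: the next label's x0, or the row boundary.
    row_bottom: the row boundary line below, if one was extracted;
                otherwise label bottom + row_height is used.
    row_height is a hypothetical default, not a value from the scripts.
    """
    x0 = label["x1"] + gap
    bottom = row_bottom if row_bottom is not None else label["bottom"] + row_height
    return [x0, label["top"], right_limit, bottom]


def checkbox_entry_box(checkbox):
    """Use the extracted checkbox rectangle unchanged."""
    return [checkbox["x0"], checkbox["top"], checkbox["x1"], checkbox["bottom"]]


last_name = {"x0": 43, "top": 63, "x1": 87, "bottom": 73}
print(text_entry_box(last_name, right_limit=260, row_bottom=79))
print(checkbox_entry_box({"x0": 285, "top": 197, "x1": 292, "bottom": 205}))
```

The sample inputs are chosen so that the results line up with the `entry_bounding_box` values in the fields.json example in this section.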
|
|
133
|
+
|
|
134
|
+
Write fields.json with `pdf_width` and `pdf_height` in each page entry (their presence signals that the boxes are in PDF coordinates):
|
|
135
|
+
```json
|
|
136
|
+
{
|
|
137
|
+
"pages": [
|
|
138
|
+
{"page_number": 1, "pdf_width": 612, "pdf_height": 792}
|
|
139
|
+
],
|
|
140
|
+
"form_fields": [
|
|
141
|
+
{
|
|
142
|
+
"page_number": 1,
|
|
143
|
+
"description": "Last name entry field",
|
|
144
|
+
"field_label": "Last Name",
|
|
145
|
+
"label_bounding_box": [43, 63, 87, 73],
|
|
146
|
+
"entry_bounding_box": [92, 63, 260, 79],
|
|
147
|
+
"entry_text": {"text": "Smith", "font_size": 10}
|
|
148
|
+
},
|
|
149
|
+
{
|
|
150
|
+
"page_number": 1,
|
|
151
|
+
"description": "US Citizen Yes checkbox",
|
|
152
|
+
"field_label": "Yes",
|
|
153
|
+
"label_bounding_box": [260, 200, 280, 210],
|
|
154
|
+
"entry_bounding_box": [285, 197, 292, 205],
|
|
155
|
+
"entry_text": {"text": "X"}
|
|
156
|
+
}
|
|
157
|
+
]
|
|
158
|
+
}
|
|
159
|
+
```
|
|
160
|
+
|
|
161
|
+
**Important**: Stick to `pdf_width`/`pdf_height` and the coordinates pulled straight from form_structure.json.
|
|
162
|
+
|
|
163
|
+
### A.4: Validate Bounding Boxes
|
|
164
|
+
|
|
165
|
+
Before you fill anything, screen the bounding boxes for problems:
|
|
166
|
+
`python scripts/check_bounding_boxes.py fields.json`
|
|
167
|
+
|
|
168
|
+
This flags overlapping bounding boxes and entry boxes that are too cramped for the font size. Clear any reported errors before filling.
|
|
169
|
+
|
|
170
|
+
---
|
|
171
|
+
|
|
172
|
+
## Approach B: Visual Estimation (Fallback)
|
|
173
|
+
|
|
174
|
+
Turn to this when the PDF is a scan or image and structure extraction turned up no usable text labels (for instance, every bit of text comes back as "(cid:X)" patterns).
|
|
175
|
+
|
|
176
|
+
### B.1: Convert PDF to Images
|
|
177
|
+
|
|
178
|
+
`python scripts/convert_pdf_to_images.py <input.pdf> <images_dir/>`
|
|
179
|
+
|
|
180
|
+
### B.2: Initial Field Identification
|
|
181
|
+
|
|
182
|
+
Look over each page image to map out the form's sections and get **rough estimates** of where fields sit:
|
|
183
|
+
- Form field labels and roughly where they are
|
|
184
|
+
- Entry areas (lines, boxes, or empty space meant for text)
|
|
185
|
+
- Checkboxes and their rough locations
|
|
186
|
+
|
|
187
|
+
For each field, jot down approximate pixel coordinates (precision can wait).
|
|
188
|
+
|
|
189
|
+
### B.3: Zoom Refinement (CRITICAL for accuracy)
|
|
190
|
+
|
|
191
|
+
For each field, crop a region around your estimate so you can dial in the coordinates exactly.
|
|
192
|
+
|
|
193
|
+
**Make a zoomed crop with ImageMagick:**
|
|
194
|
+
```bash
|
|
195
|
+
magick <page_image> -crop <width>x<height>+<x>+<y> +repage <crop_output.png>
|
|
196
|
+
```
|
|
197
|
+
|
|
198
|
+
Where:
|
|
199
|
+
- `<x>, <y>` = the crop region's top-left corner (your rough estimate, minus some padding)
|
|
200
|
+
- `<width>, <height>` = the crop region's dimensions (the field area plus roughly 50px of padding on every side)
|
|
201
|
+
|
|
202
|
+
**Example:** To tighten up a "Name" field estimated near (100, 150):
|
|
203
|
+
```bash
|
|
204
|
+
magick images_dir/page_1.png -crop 300x80+50+120 +repage crops/name_field.png
|
|
205
|
+
```
|
|
206
|
+
|
|
207
|
+
(Note: if `magick` isn't installed, run `convert` with the same arguments.)
|
|
208
|
+
|
|
209
|
+
**Study the cropped image** to lock in precise coordinates:
|
|
210
|
+
1. Find the exact pixel where the entry area starts (just past the label)
|
|
211
|
+
2. Find where the entry area stops (before the next field or the page edge)
|
|
212
|
+
3. Find the top and bottom of the entry line or box
|
|
213
|
+
|
|
214
|
+
**Map crop coordinates back onto the full image:**
|
|
215
|
+
- full_x = crop_x + crop_offset_x
|
|
216
|
+
- full_y = crop_y + crop_offset_y
|
|
217
|
+
|
|
218
|
+
Example: if the crop began at (50, 120) and the entry box starts at (52, 18) inside the crop:
|
|
219
|
+
- entry_x0 = 52 + 50 = 102
|
|
220
|
+
- entry_top = 18 + 120 = 138
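The crop bookkeeping above is easy to get wrong by hand, so here is a minimal sketch of it. Both helper names are hypothetical. `crop_geometry` only formats the `<width>x<height>+<x>+<y>` argument used in the `magick` command shown earlier, and `crop_to_full` applies the offset formulas.

```python
def crop_geometry(x, y, width, height):
    """Format the -crop geometry argument: <width>x<height>+<x>+<y>."""
    return f"{width}x{height}+{x}+{y}"


def crop_to_full(crop_point, crop_offset):
    """Map a point measured inside the crop back onto the full page image."""
    crop_x, crop_y = crop_point
    offset_x, offset_y = crop_offset
    return (crop_x + offset_x, crop_y + offset_y)


# Crop starting at (50, 120), 300 px wide and 80 px tall.
print(crop_geometry(50, 120, 300, 80))
# Entry box starts at (52, 18) inside that crop.
print(crop_to_full((52, 18), (50, 120)))
```

Keeping the crop offset next to each crop file name (for example in a small dictionary) avoids mixing up offsets when several crops are open at once.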
|
|
221
|
+
|
|
222
|
+
**Do this for every field**, bundling nearby fields into a single crop where you can.
|
|
223
|
+
|
|
224
|
+
### B.4: Create fields.json with Refined Coordinates
|
|
225
|
+
|
|
226
|
+
Write fields.json with `image_width` and `image_height` in each page entry (their presence signals that the boxes are in image pixel coordinates):
|
|
227
|
+
```json
|
|
228
|
+
{
|
|
229
|
+
"pages": [
|
|
230
|
+
{"page_number": 1, "image_width": 1700, "image_height": 2200}
|
|
231
|
+
],
|
|
232
|
+
"form_fields": [
|
|
233
|
+
{
|
|
234
|
+
"page_number": 1,
|
|
235
|
+
"description": "Last name entry field",
|
|
236
|
+
"field_label": "Last Name",
|
|
237
|
+
"label_bounding_box": [120, 175, 242, 198],
|
|
238
|
+
"entry_bounding_box": [255, 175, 720, 218],
|
|
239
|
+
"entry_text": {"text": "Smith", "font_size": 10}
|
|
240
|
+
}
|
|
241
|
+
]
|
|
242
|
+
}
|
|
243
|
+
```
|
|
244
|
+
|
|
245
|
+
**Important**: Stick to `image_width`/`image_height` together with the refined pixel coordinates from your zoom pass.
|
|
246
|
+
|
|
247
|
+
### B.5: Validate Bounding Boxes
|
|
248
|
+
|
|
249
|
+
Before filling, screen the bounding boxes for problems:
|
|
250
|
+
`python scripts/check_bounding_boxes.py fields.json`
|
|
251
|
+
|
|
252
|
+
This flags overlapping bounding boxes and entry boxes too cramped for the font size. Clear any reported errors before filling.
|
|
253
|
+
|
|
254
|
+
---
|
|
255
|
+
|
|
256
|
+
## Hybrid Approach: Structure + Visual
|
|
257
|
+
|
|
258
|
+
Go this route when structure extraction nails most fields but skips a few elements (circular checkboxes, oddball form controls, and the like).
|
|
259
|
+
|
|
260
|
+
1. **Apply Approach A** to the fields that showed up in form_structure.json
|
|
261
|
+
2. **Render the PDF to images** so you can visually inspect the missing fields
|
|
262
|
+
3. **Apply zoom refinement** (from Approach B) to those missing fields
|
|
263
|
+
4. **Reconcile the coordinates**: keep `pdf_width`/`pdf_height` for the structure-extracted fields. For the visually estimated ones, convert image coordinates into PDF coordinates:
|
|
264
|
+
- pdf_x = image_x * (pdf_width / image_width)
|
|
265
|
+
- pdf_y = image_y * (pdf_height / image_height)
|
|
266
|
+
5. **Settle on one coordinate system** in fields.json: convert everything to PDF coordinates with `pdf_width`/`pdf_height`
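The conversion formulas in step 4 can be applied to a whole bounding box at once. This is a sketch with a hypothetical helper name, `image_box_to_pdf`. It assumes both systems place y=0 at the top of the page, as in Approach A, so only scaling is needed and no axis flip. Rounding to two decimals is a choice made here for readability.

```python
def image_box_to_pdf(box, image_size, pdf_size, ndigits=2):
    """Scale an [x0, top, x1, bottom] pixel box into PDF points.

    image_size and pdf_size are (width, height) tuples. Both coordinate
    systems are assumed to have y=0 at the top, so no flip is applied.
    """
    image_width, image_height = image_size
    pdf_width, pdf_height = pdf_size
    scale_x = pdf_width / image_width
    scale_y = pdf_height / image_height
    x0, top, x1, bottom = box
    return [
        round(x0 * scale_x, ndigits),
        round(top * scale_y, ndigits),
        round(x1 * scale_x, ndigits),
        round(bottom * scale_y, ndigits),
    ]


# A box measured on a 1700x2200 image of a 612x792 point page.
print(image_box_to_pdf([255, 175, 720, 218], (1700, 2200), (612, 792)))
```

After conversion, the visually estimated fields can sit in the same fields.json as the structure-extracted ones, with every page entry declaring `pdf_width`/`pdf_height`.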
|
|
267
|
+
|
|
268
|
+
---
|
|
269
|
+
|
|
270
|
+
## Step 2: Validate Before Filling
|
|
271
|
+
|
|
272
|
+
**Always screen the bounding boxes before filling:**
|
|
273
|
+
`python scripts/check_bounding_boxes.py fields.json`
|
|
274
|
+
|
|
275
|
+
This looks for:
|
|
276
|
+
- Overlapping bounding boxes (which would smear text together)
|
|
277
|
+
- Entry boxes too small for the chosen font size
|
|
278
|
+
|
|
279
|
+
Clear any reported errors in fields.json before moving on.
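To see what the two checks above mean in practice, here is an illustrative sketch. It is not the implementation of `check_bounding_boxes.py`, whose exact rules are not shown here. The function names are hypothetical, and comparing box height directly with `font_size` is an assumption that only makes sense when the boxes are in PDF points.

```python
def boxes_overlap(a, b):
    """True when two [x0, top, x1, bottom] boxes share any interior area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def find_problems(fields):
    """Report overlapping entry boxes on the same page and entry boxes
    shorter than their font size (assumes PDF-point coordinates)."""
    problems = []
    for index, field in enumerate(fields):
        box = field["entry_bounding_box"]
        font_size = field.get("entry_text", {}).get("font_size")
        if font_size is not None and box[3] - box[1] < font_size:
            problems.append(f"too short for font: {field['description']}")
        for other in fields[index + 1:]:
            same_page = field["page_number"] == other["page_number"]
            if same_page and boxes_overlap(box, other["entry_bounding_box"]):
                problems.append(
                    f"overlap: {field['description']} / {other['description']}"
                )
    return problems


fields = [
    {"page_number": 1, "description": "Last name entry field",
     "entry_bounding_box": [92, 63, 260, 79],
     "entry_text": {"text": "Smith", "font_size": 10}},
    {"page_number": 1, "description": "US Citizen Yes checkbox",
     "entry_bounding_box": [285, 197, 292, 205],
     "entry_text": {"text": "X"}},
]
print(find_problems(fields))
```

Treat the bundled script as the authority; a sketch like this is mainly useful for understanding why a reported error occurred before adjusting fields.json.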
|
|
280
|
+
|
|
281
|
+
## Step 3: Fill the Form
|
|
282
|
+
|
|
283
|
+
The fill script figures out the coordinate system on its own and handles the conversion:
|
|
284
|
+
`python scripts/fill_pdf_form_with_annotations.py <input.pdf> fields.json <output.pdf>`
|
|
285
|
+
|
|
286
|
+
## Step 4: Verify Output
|
|
287
|
+
|
|
288
|
+
Render the filled PDF to images and confirm the text landed where it should:
|
|
289
|
+
`python scripts/convert_pdf_to_images.py <output.pdf> <verify_images/>`
|
|
290
|
+
|
|
291
|
+
If any text is out of place:
|
|
292
|
+
- **Approach A**: Confirm you're using PDF coordinates from form_structure.json alongside `pdf_width`/`pdf_height`
|
|
293
|
+
- **Approach B**: Confirm the image dimensions line up and the coordinates are accurate pixels
|
|
294
|
+
- **Hybrid**: Confirm the coordinate conversions are right for the visually estimated fields
|