wordhelpers 0.1.0__tar.gz
|
@@ -0,0 +1,215 @@
|
|
|
1
|
+
Metadata-Version: 2.3
|
|
2
|
+
Name: wordhelpers
|
|
3
|
+
Version: 0.1.0
|
|
4
|
+
Summary: Helpers for working with python-docx
|
|
5
|
+
Author: AJ Cruz
|
|
6
|
+
Author-email: 15045766-a-cruz@users.noreply.gitlab.com
|
|
7
|
+
Requires-Python: >=3.13
|
|
8
|
+
Classifier: Programming Language :: Python :: 3
|
|
9
|
+
Classifier: Programming Language :: Python :: 3.13
|
|
10
|
+
Requires-Dist: pydantic (>=2.12.5,<3.0.0)
|
|
11
|
+
Requires-Dist: python-docx (>=1.2.0,<2.0.0)
|
|
12
|
+
Description-Content-Type: text/markdown
|
|
13
|
+
|
|
14
|
+
[](https://img.shields.io/pypi/pyversions/wordhelpers)
|
|
15
|
+
[](https://pypi.python.org/pypi/wordhelpers)
|
|
16
|
+
[](https://github.com/ambv/black)
|
|
17
|
+
|
|
18
|
+
# wordhelpers
|
|
19
|
+
=============
|
|
20
|
+
Helper functions for [python-docx](https://python-docx.readthedocs.io/en/latest/). I found myself re-learning docx every time I wanted to use it in a project, so this provides an abstraction. You represent Word tables as a properly formatted Python dictionary and the helper function converts it to a docx table.
|
|
21
|
+
|
|
22
|
+
# Installation
|
|
23
|
+
wordhelpers can be installed via poetry with: ```poetry add wordhelpers```
|
|
24
|
+
or via pip with: ```pip install wordhelpers```
|
|
25
|
+
|
|
26
|
+
# Usage
|
|
27
|
+
For detailed documentation of the python-docx library see [python-docx](https://python-docx.readthedocs.io/en/latest/)
|
|
28
|
+
|
|
29
|
+
1. Import the python-docx library into your script with:
|
|
30
|
+
```python
|
|
31
|
+
from docx import Document
|
|
32
|
+
```
|
|
33
|
+
1. Import the helpers from this project with:
|
|
34
|
+
```python
|
|
35
|
+
from wordhelpers import build_table, replace_placeholder_with_table
|
|
36
|
+
```
|
|
37
|
+
1. Create the docx Word document object with something like:
|
|
38
|
+
```python
|
|
39
|
+
doc_obj = Document("a_word_template.docx")
|
|
40
|
+
```
|
|
41
|
+
1. Manipulate the document object as required (see the next section of this README for info on how to do that)
|
|
42
|
+
1. When all changes to your document object are complete, write them with:
|
|
43
|
+
```python
|
|
44
|
+
doc_obj.save("output_file.docx")
|
|
45
|
+
```
|
|
46
|
+
# Manipulating the document object
|
|
47
|
+
wordhelpers provides two main functions available to your scripts:
|
|
48
|
+
1. build_table(<doc_obj>, <table_dict>, <remove_leading_para>)
|
|
49
|
+
1. replace_placeholder_with_table(<doc_obj>, <search_string>, <table_obj>)
|
|
50
|
+
|
|
51
|
+
### build_table(<doc_obj>, <table_dict>, <remove_leading_para>)
|
|
52
|
+
The purpose of this function is to allow the script author to model Word tables using Python dictionaries. If formatted properly, the module will translate the Python dictionary to the appropriate python-docx syntax and create the Word table object.
|
|
53
|
+
|
|
54
|
+
The build_table function has the following arguments:
|
|
55
|
+
- **<doc_obj>** - The python-docx Word document object created in step 3 of the "Usage" section above.
|
|
56
|
+
- **<table_dict>** - The Word table model (Python dictionary). The expected Python dictionary format to model a Word table is:
|
|
57
|
+
```python
|
|
58
|
+
{
|
|
59
|
+
"style": None,
|
|
60
|
+
"rows": [
|
|
61
|
+
{
|
|
62
|
+
"cells": [
|
|
63
|
+
{
|
|
64
|
+
"width": None,
|
|
65
|
+
"background": None,
|
|
66
|
+
"paragraphs": [{"style":None,"alignment": "center", "text":"Some Text"}],
|
|
67
|
+
"table": {optional child table}
|
|
68
|
+
},
|
|
69
|
+
{
|
|
70
|
+
"merge": None
|
|
71
|
+
},
|
|
72
|
+
]
|
|
73
|
+
}
|
|
74
|
+
]
|
|
75
|
+
}
|
|
76
|
+
```
|
|
77
|
+
The cell **background** attribute is optional. If supplied with a hexadecimal color code, the cell will be shaded that color.
|
|
78
|
+
|
|
79
|
+
The cell **width** attribute is optional. If supplied with a decimal number (inches), it will hard-code that column's width to the supplied value.
|
|
80
|
+
|
|
81
|
+
The cell **table** attribute is optional. It can be used to nest tables within table cells. If "table" is provided, no other keys are required (background, paragraphs, etc).
|
|
82
|
+
|
|
83
|
+
The paragraph **style** attribute is optional. If set to anything besides None it will use the Word style referenced. The style must already exist in the source/template Word document.
|
|
84
|
+
|
|
85
|
+
The paragraph **alignment** attribute is optional. If set to ```"center"``` it will center-align the text within a cell; if set to ```"right"``` it will right-align it.
|
|
86
|
+
|
|
87
|
+
The **merge** key is optional. If used, the cell will be merged with the preceding cell in the row (the entry above it in the dictionary, the cell to its left in the table). Several consecutive merges can be used to merge more than two cells.
|
|
88
|
+
|
|
89
|
+
By default a paragraph's **text** property will create a single-line (but wrapped) entry in the cell if the value is a string. If you would like to create a multi-line cell entry, supply the value as a list instead of a string. This will instruct the module to add a line break after each list item.
|
|
90
|
+
- **<remove_leading_para>** - This is an optional argument. If not set it will default to True. MS Word tables when created automatically have an empty paragraph at the top/beginning of the table cell. This can create unwanted spacing at the top of the table. By default (value set to "True") the paragraph will be deleted. If you want to keep the paragraph (to add text to it), set this to "False".
|
|
91
|
+
|
|
92
|
+
**IMPORTANT NOTE:** This adds the table object to the very end of your Word file. If you want to relocate it, use the provided `replace_placeholder_with_table()` function (see below).
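
As a minimal sketch of the dictionary format described above, the snippet below builds a one-row model with a multi-line cell and a merged cell using plain Python. The `make_cell` helper is hypothetical (it is not part of wordhelpers), and the merged cell is written as the string `"merge"` because that is what the `build_table` source shipped in this release appears to check for; treat both as assumptions.

```python
# Hypothetical helper, not part of wordhelpers: wraps text in the cell shape
# that the table model above describes.
def make_cell(text, alignment=None):
    paragraph = {"text": text}
    if alignment is not None:
        paragraph["alignment"] = alignment
    return {"paragraphs": [paragraph]}


table_dict = {
    "style": None,
    "rows": [
        {
            "cells": [
                # A list value asks for one line per item in the cell.
                make_cell(["Line one", "Line two"], alignment="center"),
                # Assumption: the merge marker is the plain string "merge".
                "merge",
            ]
        }
    ],
}

first_cell = table_dict["rows"][0]["cells"][0]
print(first_cell["paragraphs"][0]["text"])
```

Passing `table_dict` to `build_table(doc_obj, table_dict)` would then be expected to yield a single row whose two columns are merged into one cell.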
|
|
93
|
+
|
|
94
|
+
### replace_placeholder_with_table(<doc_obj>, <search_string>, <table_obj>)
|
|
95
|
+
The purpose of this function is to search a Word file for a given string (the placeholder) and replace the string with a Word table object.
|
|
96
|
+
|
|
97
|
+
The replace_placeholder_with_table function has the following arguments:
|
|
98
|
+
- **<doc_obj>** - The python-docx Word document object created in step 3 of the "Usage" section above.
|
|
99
|
+
- **<search_string>** - The string to search for in the document object (doc_obj)
|
|
100
|
+
- **<table_obj>** - The python-docx Word Table object that will replace the <search_string> in the document object (doc_obj)
|
|
101
|
+
|
|
102
|
+
It will relocate the table to the placeholder and remove the placeholder.
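
One detail worth sketching: in the source shipped with this release, the placeholder lookup calls `re.match(search, p.text)`, so the search string appears to be treated as a regular expression anchored at the start of the paragraph text. Under that assumption, square brackets in a placeholder need escaping, which the standard library's `re.escape` can do. The placeholder text below is only an illustration.

```python
import re

placeholder = "[py_placeholder1]"
paragraph_text = "[py_placeholder1]"

# Unescaped, the brackets form a regex character class, so the literal
# placeholder text is not matched.
unescaped_hit = re.match(placeholder, paragraph_text)

# re.escape turns the placeholder into a pattern that matches it literally.
escaped_pattern = re.escape(placeholder)
escaped_hit = re.match(escaped_pattern, paragraph_text)

# re.match only matches at the start of the string, so a placeholder that
# appears mid-paragraph is not found.
mid_text_hit = re.match(escaped_pattern, "See " + paragraph_text)

print(unescaped_hit is None, escaped_hit is not None, mid_text_hit is None)
```

If this reading of the source is right, keeping each placeholder alone on its own paragraph in the template is the safest layout.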
|
|
103
|
+
|
|
104
|
+
### EXAMPLE
|
|
105
|
+
We start with a Microsoft Word template named "source-template.docx" that looks like this:
|
|
106
|
+
|
|
107
|
+

|
|
108
|
+
|
|
109
|
+
Our sample Python script looks like this:
|
|
110
|
+
```python
|
|
111
|
+
from docx import Document
|
|
112
|
+
from wordhelpers import build_table, replace_placeholder_with_table
|
|
113
|
+
|
|
114
|
+
doc_obj = Document("source-template.docx")
|
|
115
|
+
|
|
116
|
+
my_dictionary = {
|
|
117
|
+
"style": None,
|
|
118
|
+
"rows": [
|
|
119
|
+
{
|
|
120
|
+
"cells": [
|
|
121
|
+
{
|
|
122
|
+
"paragraphs": [],
|
|
123
|
+
"table": {
|
|
124
|
+
"style": "plain",
|
|
125
|
+
"rows": [
|
|
126
|
+
{
|
|
127
|
+
"cells": [
|
|
128
|
+
{
|
|
129
|
+
"background": "#506279",
|
|
130
|
+
"paragraphs":[{"style": "regularbold", "text": "Header 1:"}]
|
|
131
|
+
},
|
|
132
|
+
{
|
|
133
|
+
"background": "#506279",
|
|
134
|
+
"paragraphs":[{"style": "regularbold", "text": "Header 2:"}]
|
|
135
|
+
},
|
|
136
|
+
{
|
|
137
|
+
"background": "#506279",
|
|
138
|
+
"paragraphs":[{"style": "regularbold", "text": "Header 3:"}]
|
|
139
|
+
}
|
|
140
|
+
]
|
|
141
|
+
},
|
|
142
|
+
{
|
|
143
|
+
"cells": [
|
|
144
|
+
{
|
|
145
|
+
"background": "#D5DCE4",
|
|
146
|
+
"paragraphs":[{"style": "No Spacing", "text": "Row 1 Data 1:"}]
|
|
147
|
+
},
|
|
148
|
+
{
|
|
149
|
+
"background": "#D5DCE4",
|
|
150
|
+
"paragraphs":[{"style": "No Spacing", "text": "Row 1 Data 2:"}]
|
|
151
|
+
},
|
|
152
|
+
{
|
|
153
|
+
"background": "#D5DCE4",
|
|
154
|
+
"paragraphs":[{"style": "No Spacing", "text": "Row 1 Data 3:"}]
|
|
155
|
+
}
|
|
156
|
+
]
|
|
157
|
+
},
|
|
158
|
+
{
|
|
159
|
+
"cells": [
|
|
160
|
+
{
|
|
161
|
+
"paragraphs":[{"style": "No Spacing", "text": "Row 2 Data 1:"}]
|
|
162
|
+
},
|
|
163
|
+
{
|
|
164
|
+
"paragraphs":[{"style": "No Spacing", "text": "Row 2 Data 2:"}]
|
|
165
|
+
},
|
|
166
|
+
{
|
|
167
|
+
"paragraphs":[{"style": "No Spacing", "text": "Row 2 Data 3:"}]
|
|
168
|
+
}
|
|
169
|
+
]
|
|
170
|
+
},
|
|
171
|
+
{
|
|
172
|
+
"cells": [
|
|
173
|
+
{
|
|
174
|
+
"background": "#D5DCE4",
|
|
175
|
+
"paragraphs":[{"style": "No Spacing", "text": "Row 3 Data 1:"}]
|
|
176
|
+
},
|
|
177
|
+
{
|
|
178
|
+
"background": "#D5DCE4",
|
|
179
|
+
"paragraphs":[{"style": "No Spacing", "text": "Row 3 Data 2:"}]
|
|
180
|
+
},
|
|
181
|
+
{
|
|
182
|
+
"background": "#D5DCE4",
|
|
183
|
+
"paragraphs":[{"style": "No Spacing", "text": "Row 3 Data 3:"}]
|
|
184
|
+
}
|
|
185
|
+
]
|
|
186
|
+
}
|
|
187
|
+
]
|
|
188
|
+
}
|
|
189
|
+
}
|
|
190
|
+
]
|
|
191
|
+
}
|
|
192
|
+
]
|
|
193
|
+
}
|
|
194
|
+
|
|
195
|
+
my_table = build_table(doc_obj, my_dictionary)
|
|
196
|
+
|
|
197
|
+
replace_placeholder_with_table(doc_obj, '\[py_placeholder1\]', my_table)
|
|
198
|
+
|
|
199
|
+
doc_obj.save("output_word_doc.docx")
|
|
200
|
+
```
|
|
201
|
+
|
|
202
|
+
We run the Python script and it produces a new Word document named "output_word_doc.docx" that looks like this:
|
|
203
|
+
|
|
204
|
+

|
|
205
|
+
|
|
206
|
+
|
|
207
|
+
The project provides some additional docx functions that may be useful to your project:
|
|
208
|
+
- ```get_para_by_string(doc_obj: _Document, search: str)```: Searches for a keyword in the docx object and returns the paragraph where the keyword is found
|
|
209
|
+
- ```insert_paragraph_after(paragraph: Paragraph, text: str = None, style: str = None)```: Inserts a new paragraph immediately after the given paragraph, with the supplied text and style
|
|
210
|
+
- ```delete_paragraph(paragraph: Paragraph)```: Deletes a given paragraph (after you've inserted text after it for example)
|
|
211
|
+
|
|
212
|
+
As well as the following helper functions for the dictionary table models:
|
|
213
|
+
- ```insert_text_into_row(cell_text: list)```: Builds a row (dictionary) from a list of text where each list item is a column in the row. Supports "merge"
|
|
214
|
+
- ```insert_text_by_table_coords(table: dict, row: int, col: int, text: str)```: Inserts text into a table dictionary given the row & column numbers.
|
|
215
|
+
- ```generate_table(num_rows: int, num_cols: int, header_row: list, style: str = None)```: Generates a basic table dictionary and populates the headers from a list of text (strings).
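
To illustrate the dictionary helpers listed above without requiring python-docx, here is a rough pure-Python sketch of the kind of structure `generate_table` is described as returning. `blank_table` is a hypothetical stand-in written for this example, not the library function, and the exact keys are assumptions based on the table model documented earlier.

```python
def blank_table(num_rows, num_cols, header_row=None, style=None):
    """Hypothetical stand-in: build an empty table model, then fill headers."""
    table = {"style": style, "rows": []}
    for _ in range(num_rows):
        cells = [{"paragraphs": [{"text": [""]}]} for _ in range(num_cols)]
        table["rows"].append({"cells": cells})
    # Header strings go into the first row, one per column.
    for col_idx, header_text in enumerate(header_row or []):
        table["rows"][0]["cells"][col_idx]["paragraphs"][0]["text"] = [header_text]
    return table


model = blank_table(2, 3, header_row=["Name", "Role", "Site"])
# Overwrite one body cell by row and column index (both zero-based here).
model["rows"][1]["cells"][2]["paragraphs"][0]["text"] = ["Lab 4"]
```

The last line does by hand what `insert_text_by_table_coords` is documented to do: address a cell by row and column and replace its text.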
|
|
@@ -0,0 +1,202 @@
|
|
|
1
|
+
[](https://img.shields.io/pypi/pyversions/wordhelpers)
|
|
2
|
+
[](https://pypi.python.org/pypi/wordhelpers)
|
|
3
|
+
[](https://github.com/ambv/black)
|
|
4
|
+
|
|
5
|
+
# wordhelpers
|
|
6
|
+
=============
|
|
7
|
+
Helper functions for [python-docx](https://python-docx.readthedocs.io/en/latest/). I found myself re-learning docx every time I wanted to use it in a project, so this provides an abstraction. You represent Word tables as a properly formatted Python dictionary and the helper function converts it to a docx table.
|
|
8
|
+
|
|
9
|
+
# Installation
|
|
10
|
+
wordhelpers can be installed via poetry with: ```poetry add wordhelpers```
|
|
11
|
+
or via pip with: ```pip install wordhelpers```
|
|
12
|
+
|
|
13
|
+
# Usage
|
|
14
|
+
For detailed documentation of the python-docx library see [python-docx](https://python-docx.readthedocs.io/en/latest/)
|
|
15
|
+
|
|
16
|
+
1. Import the python-docx library into your script with:
|
|
17
|
+
```python
|
|
18
|
+
from docx import Document
|
|
19
|
+
```
|
|
20
|
+
1. Import the helpers from this project with:
|
|
21
|
+
```python
|
|
22
|
+
from wordhelpers import build_table, replace_placeholder_with_table
|
|
23
|
+
```
|
|
24
|
+
1. Create the docx Word document object with something like:
|
|
25
|
+
```python
|
|
26
|
+
doc_obj = Document("a_word_template.docx")
|
|
27
|
+
```
|
|
28
|
+
1. Manipulate the document object as required (see the next section of this README for info on how to do that)
|
|
29
|
+
1. When all changes to your document object are complete, write them with:
|
|
30
|
+
```python
|
|
31
|
+
doc_obj.save("output_file.docx")
|
|
32
|
+
```
|
|
33
|
+
# Manipulating the document object
|
|
34
|
+
wordhelpers provides two main functions available to your scripts:
|
|
35
|
+
1. build_table(<doc_obj>, <table_dict>, <remove_leading_para>)
|
|
36
|
+
1. replace_placeholder_with_table(<doc_obj>, <search_string>, <table_obj>)
|
|
37
|
+
|
|
38
|
+
### build_table(<doc_obj>, <table_dict>, <remove_leading_para>)
|
|
39
|
+
The purpose of this function is to allow the script author to model Word tables using Python dictionaries. If formatted properly, the module will translate the Python dictionary to the appropriate python-docx syntax and create the Word table object.
|
|
40
|
+
|
|
41
|
+
The build_table function has the following arguments:
|
|
42
|
+
- **<doc_obj>** - The python-docx Word document object created in step 3 of the "Usage" section above.
|
|
43
|
+
- **<table_dict>** - The Word table model (Python dictionary). The expected Python dictionary format to model a Word table is:
|
|
44
|
+
```python
|
|
45
|
+
{
|
|
46
|
+
"style": None,
|
|
47
|
+
"rows": [
|
|
48
|
+
{
|
|
49
|
+
"cells": [
|
|
50
|
+
{
|
|
51
|
+
"width": None,
|
|
52
|
+
"background": None,
|
|
53
|
+
"paragraphs": [{"style":None,"alignment": "center", "text":"Some Text"}],
|
|
54
|
+
"table": {optional child table}
|
|
55
|
+
},
|
|
56
|
+
{
|
|
57
|
+
"merge": None
|
|
58
|
+
},
|
|
59
|
+
]
|
|
60
|
+
}
|
|
61
|
+
]
|
|
62
|
+
}
|
|
63
|
+
```
|
|
64
|
+
The cell **background** attribute is optional. If supplied with a hexadecimal color code, the cell will be shaded that color.
|
|
65
|
+
|
|
66
|
+
The cell **width** attribute is optional. If supplied with a decimal number (inches), it will hard-code that column's width to the supplied value.
|
|
67
|
+
|
|
68
|
+
The cell **table** attribute is optional. It can be used to nest tables within table cells. If "table" is provided, no other keys are required (background, paragraphs, etc).
|
|
69
|
+
|
|
70
|
+
The paragraph **style** attribute is optional. If set to anything besides None it will use the Word style referenced. The style must already exist in the source/template Word document.
|
|
71
|
+
|
|
72
|
+
The paragraph **alignment** attribute is optional. If set to ```"center"``` it will center-align the text within a cell; if set to ```"right"``` it will right-align it.
|
|
73
|
+
|
|
74
|
+
The **merge** key is optional. If used, the cell will be merged with the preceding cell in the row (the entry above it in the dictionary, the cell to its left in the table). Several consecutive merges can be used to merge more than two cells.
|
|
75
|
+
|
|
76
|
+
By default a paragraph's **text** property will create a single-line (but wrapped) entry in the cell if the value is a string. If you would like to create a multi-line cell entry, supply the value as a list instead of a string. This will instruct the module to add a line break after each list item.
|
|
77
|
+
- **<remove_leading_para>** - This is an optional argument. If not set it will default to True. MS Word tables when created automatically have an empty paragraph at the top/beginning of the table cell. This can create unwanted spacing at the top of the table. By default (value set to "True") the paragraph will be deleted. If you want to keep the paragraph (to add text to it), set this to "False".
|
|
78
|
+
|
|
79
|
+
**IMPORTANT NOTE:** This adds the table object to the very end of your Word file. If you want to relocate it, use the provided `replace_placeholder_with_table()` function (see below).
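
Since a malformed color code is an easy mistake in the **background** attribute, a small pre-check before calling `build_table` can help. The sketch below mirrors the pattern used by the validator in this release's bundled models (a `#` followed by 3 or 6 hex digits); `is_hex_color` is a hypothetical helper name, not part of wordhelpers.

```python
import re

# Same shape as the pattern in the bundled validator: "#RGB" or "#RRGGBB".
HEX_COLOR = re.compile(r"#([0-9a-fA-F]{3}|[0-9a-fA-F]{6})")


def is_hex_color(value):
    """Return True when value looks like '#RGB' or '#RRGGBB'."""
    return isinstance(value, str) and HEX_COLOR.fullmatch(value) is not None


# Check a few candidate codes before putting them into a table dictionary.
checks = {code: is_hex_color(code) for code in ["#506279", "#D5DCE4", "506279", "#12345"]}
```

Codes that fail this check (a missing `#`, or a digit count other than 3 or 6) would presumably be rejected by the library's own validation as well.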
|
|
80
|
+
|
|
81
|
+
### replace_placeholder_with_table(<doc_obj>, <search_string>, <table_obj>)
|
|
82
|
+
The purpose of this function is to search a Word file for a given string (the placeholder) and replace the string with a Word table object.
|
|
83
|
+
|
|
84
|
+
The replace_placeholder_with_table function has the following arguments:
|
|
85
|
+
- **<doc_obj>** - The python-docx Word document object created in step 3 of the "Usage" section above.
|
|
86
|
+
- **<search_string>** - The string to search for in the document object (doc_obj)
|
|
87
|
+
- **<table_obj>** - The python-docx Word Table object that will replace the <search_string> in the document object (doc_obj)
|
|
88
|
+
|
|
89
|
+
It will relocate the table to the placeholder and remove the placeholder.
|
|
90
|
+
|
|
91
|
+
### EXAMPLE
|
|
92
|
+
We start with a Microsoft Word template named "source-template.docx" that looks like this:
|
|
93
|
+
|
|
94
|
+

|
|
95
|
+
|
|
96
|
+
Our sample Python script looks like this:
|
|
97
|
+
```python
|
|
98
|
+
from docx import Document
|
|
99
|
+
from wordhelpers import build_table, replace_placeholder_with_table
|
|
100
|
+
|
|
101
|
+
doc_obj = Document("source-template.docx")
|
|
102
|
+
|
|
103
|
+
my_dictionary = {
|
|
104
|
+
"style": None,
|
|
105
|
+
"rows": [
|
|
106
|
+
{
|
|
107
|
+
"cells": [
|
|
108
|
+
{
|
|
109
|
+
"paragraphs": [],
|
|
110
|
+
"table": {
|
|
111
|
+
"style": "plain",
|
|
112
|
+
"rows": [
|
|
113
|
+
{
|
|
114
|
+
"cells": [
|
|
115
|
+
{
|
|
116
|
+
"background": "#506279",
|
|
117
|
+
"paragraphs":[{"style": "regularbold", "text": "Header 1:"}]
|
|
118
|
+
},
|
|
119
|
+
{
|
|
120
|
+
"background": "#506279",
|
|
121
|
+
"paragraphs":[{"style": "regularbold", "text": "Header 2:"}]
|
|
122
|
+
},
|
|
123
|
+
{
|
|
124
|
+
"background": "#506279",
|
|
125
|
+
"paragraphs":[{"style": "regularbold", "text": "Header 3:"}]
|
|
126
|
+
}
|
|
127
|
+
]
|
|
128
|
+
},
|
|
129
|
+
{
|
|
130
|
+
"cells": [
|
|
131
|
+
{
|
|
132
|
+
"background": "#D5DCE4",
|
|
133
|
+
"paragraphs":[{"style": "No Spacing", "text": "Row 1 Data 1:"}]
|
|
134
|
+
},
|
|
135
|
+
{
|
|
136
|
+
"background": "#D5DCE4",
|
|
137
|
+
"paragraphs":[{"style": "No Spacing", "text": "Row 1 Data 2:"}]
|
|
138
|
+
},
|
|
139
|
+
{
|
|
140
|
+
"background": "#D5DCE4",
|
|
141
|
+
"paragraphs":[{"style": "No Spacing", "text": "Row 1 Data 3:"}]
|
|
142
|
+
}
|
|
143
|
+
]
|
|
144
|
+
},
|
|
145
|
+
{
|
|
146
|
+
"cells": [
|
|
147
|
+
{
|
|
148
|
+
"paragraphs":[{"style": "No Spacing", "text": "Row 2 Data 1:"}]
|
|
149
|
+
},
|
|
150
|
+
{
|
|
151
|
+
"paragraphs":[{"style": "No Spacing", "text": "Row 2 Data 2:"}]
|
|
152
|
+
},
|
|
153
|
+
{
|
|
154
|
+
"paragraphs":[{"style": "No Spacing", "text": "Row 2 Data 3:"}]
|
|
155
|
+
}
|
|
156
|
+
]
|
|
157
|
+
},
|
|
158
|
+
{
|
|
159
|
+
"cells": [
|
|
160
|
+
{
|
|
161
|
+
"background": "#D5DCE4",
|
|
162
|
+
"paragraphs":[{"style": "No Spacing", "text": "Row 3 Data 1:"}]
|
|
163
|
+
},
|
|
164
|
+
{
|
|
165
|
+
"background": "#D5DCE4",
|
|
166
|
+
"paragraphs":[{"style": "No Spacing", "text": "Row 3 Data 2:"}]
|
|
167
|
+
},
|
|
168
|
+
{
|
|
169
|
+
"background": "#D5DCE4",
|
|
170
|
+
"paragraphs":[{"style": "No Spacing", "text": "Row 3 Data 3:"}]
|
|
171
|
+
}
|
|
172
|
+
]
|
|
173
|
+
}
|
|
174
|
+
]
|
|
175
|
+
}
|
|
176
|
+
}
|
|
177
|
+
]
|
|
178
|
+
}
|
|
179
|
+
]
|
|
180
|
+
}
|
|
181
|
+
|
|
182
|
+
my_table = build_table(doc_obj, my_dictionary)
|
|
183
|
+
|
|
184
|
+
replace_placeholder_with_table(doc_obj, '\[py_placeholder1\]', my_table)
|
|
185
|
+
|
|
186
|
+
doc_obj.save("output_word_doc.docx")
|
|
187
|
+
```
|
|
188
|
+
|
|
189
|
+
We run the Python script and it produces a new Word document named "output_word_doc.docx" that looks like this:
|
|
190
|
+
|
|
191
|
+

|
|
192
|
+
|
|
193
|
+
|
|
194
|
+
The project provides some additional docx functions that may be useful to your project:
|
|
195
|
+
- ```get_para_by_string(doc_obj: _Document, search: str)```: Searches for a keyword in the docx object and returns the paragraph where the keyword is found
|
|
196
|
+
- ```insert_paragraph_after(paragraph: Paragraph, text: str = None, style: str = None)```: Inserts a new paragraph immediately after the given paragraph, with the supplied text and style
|
|
197
|
+
- ```delete_paragraph(paragraph: Paragraph)```: Deletes a given paragraph (after you've inserted text after it for example)
|
|
198
|
+
|
|
199
|
+
As well as the following helper functions for the dictionary table models:
|
|
200
|
+
- ```insert_text_into_row(cell_text: list)```: Builds a row (dictionary) from a list of text where each list item is a column in the row. Supports "merge"
|
|
201
|
+
- ```insert_text_by_table_coords(table: dict, row: int, col: int, text: str)```: Inserts text into a table dictionary given the row & column numbers.
|
|
202
|
+
- ```generate_table(num_rows: int, num_cols: int, header_row: list, style: str = None)```: Generates a basic table dictionary and populates the headers from a list of text (strings).
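
As a rough illustration of the row helper described above, the sketch below turns a list of strings into a row dictionary and passes `"merge"` entries through unchanged. `row_from_text` and `merged_span` are hypothetical stand-ins for this example; they only approximate what `insert_text_into_row` is documented to do and how merged columns might be counted.

```python
def row_from_text(cell_text):
    """Hypothetical stand-in: one cell per string, "merge" kept as a marker."""
    row = {"cells": []}
    for text in cell_text:
        if text.lower() == "merge":
            row["cells"].append("merge")
        else:
            row["cells"].append({"paragraphs": [{"text": [text]}]})
    return row


def merged_span(cells, idx):
    """Count how many columns the cell at idx covers, including itself."""
    span = 1
    while idx + span < len(cells) and cells[idx + span] == "merge":
        span += 1
    return span


row = row_from_text(["Totals", "merge", "42"])
```

Here the first cell would cover two columns and the last cell one, which is the layout a `"merge"` entry is meant to express.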
|
|
@@ -0,0 +1,22 @@
|
|
|
1
|
+
[project]
|
|
2
|
+
name = "wordhelpers"
|
|
3
|
+
version = "0.1.0"
|
|
4
|
+
description = "Helpers for working with python-docx"
|
|
5
|
+
authors = [
|
|
6
|
+
{name = "AJ Cruz",email = "15045766-a-cruz@users.noreply.gitlab.com"}
|
|
7
|
+
]
|
|
8
|
+
readme = "README.md"
|
|
9
|
+
requires-python = ">=3.13"
|
|
10
|
+
dependencies = [
|
|
11
|
+
"python-docx (>=1.2.0,<2.0.0)",
|
|
12
|
+
"pydantic (>=2.12.5,<3.0.0)"
|
|
13
|
+
]
|
|
14
|
+
|
|
15
|
+
|
|
16
|
+
[build-system]
|
|
17
|
+
requires = ["poetry-core>=2.0.0,<3.0.0"]
|
|
18
|
+
build-backend = "poetry.core.masonry.api"
|
|
19
|
+
|
|
20
|
+
[tool.poetry.group.dev.dependencies]
|
|
21
|
+
black = "^26.1.0"
|
|
22
|
+
|
|
@@ -0,0 +1,308 @@
|
|
|
1
|
+
# Built-In imports
|
|
2
|
+
import re
|
|
3
|
+
|
|
4
|
+
# Third-party imports
|
|
5
|
+
from .pydantic_models import WordTableModel
|
|
6
|
+
from docx import Document
|
|
7
|
+
from docx.document import Document as _Document
|
|
8
|
+
from docx.table import Table, _Cell
|
|
9
|
+
from docx.text.paragraph import Paragraph
|
|
10
|
+
from docx.oxml.ns import nsdecls
|
|
11
|
+
from docx.oxml import parse_xml, OxmlElement
|
|
12
|
+
from docx.shared import Inches, Cm
|
|
13
|
+
from docx.enum.text import WD_ALIGN_PARAGRAPH
|
|
14
|
+
|
|
15
|
+
|
|
16
|
+
def cell_shader(cell: _Cell, hex_color: str) -> None:
|
|
17
|
+
shading_elm = parse_xml(
|
|
18
|
+
r'<w:shd {} w:fill="{}"/>'.format(nsdecls("w"), hex_color.strip("#").upper())
|
|
19
|
+
)
|
|
20
|
+
cell._tc.get_or_add_tcPr().append(shading_elm)
|
|
21
|
+
|
|
22
|
+
|
|
23
|
+
def delete_paragraph(paragraph: Paragraph) -> None:
|
|
24
|
+
"""
|
|
25
|
+
Function to delete a given paragraph in a Word document.
|
|
26
|
+
Requires as input the docx paragraph object.
|
|
27
|
+
"""
|
|
28
|
+
p = paragraph._element
|
|
29
|
+
p.getparent().remove(p)
|
|
30
|
+
paragraph._p = paragraph._element = None
|
|
31
|
+
|
|
32
|
+
|
|
33
|
+
def set_col_width(table: Table, column: int, width: int | float) -> None:
|
|
34
|
+
width = Inches(float(width))
|
|
35
|
+
for row in table.rows:
|
|
36
|
+
row.cells[column].width = width
|
|
37
|
+
|
|
38
|
+
|
|
39
|
+
def find_para_by_string(doc_obj: _Document, search: str) -> int | None:
|
|
40
|
+
"""
|
|
41
|
+
Function to find a string in a Word document and return the paragraph
|
|
42
|
+
number. Receives as input the Word document object (using docx module) and
|
|
43
|
+
a search string.
|
|
44
|
+
"""
|
|
45
|
+
for i, p in enumerate(doc_obj.paragraphs):
|
|
46
|
+
if re.match(search, p.text):
|
|
47
|
+
return i
|
|
48
|
+
|
|
49
|
+
|
|
50
|
+
def get_para_by_string(doc_obj: _Document, search: str) -> Paragraph | None:
|
|
51
|
+
"""
|
|
52
|
+
Function to find a string in a Word document and return the paragraph.
|
|
53
|
+
Receives as input the Word document object (using docx module) and
|
|
54
|
+
a search string.
|
|
55
|
+
"""
|
|
56
|
+
for p in doc_obj.paragraphs:
|
|
57
|
+
if re.match(search, p.text):
|
|
58
|
+
return p
|
|
59
|
+
|
|
60
|
+
|
|
61
|
+
def replace_placeholder_with_table(
|
|
62
|
+
doc_obj: _Document, placeholder: str, table: Table
|
|
63
|
+
) -> None:
|
|
64
|
+
"""
|
|
65
|
+
Function to relocate a Word table object to immediately follow a given
|
|
66
|
+
reference paragraph identified by the placeholder. Receives as input the
|
|
67
|
+
placeholder string and the Word table object (using docx module).
|
|
68
|
+
After moving the Word table after the placeholder paragraph, delete the
|
|
69
|
+
placeholder paragraph.
|
|
70
|
+
"""
|
|
71
|
+
# Locate the paragraph from the supplied placeholder text
|
|
72
|
+
paragraph: Paragraph = get_para_by_string(doc_obj, placeholder)
|
|
73
|
+
if not paragraph:
|
|
74
|
+
print(f'WARNING: Could not locate placeholder "{placeholder}"')
|
|
75
|
+
else:
|
|
76
|
+
# Move the Word table to a new paragraph immediately after the placeholder paragraph
|
|
77
|
+
paragraph._p.addnext(table._tbl)
|
|
78
|
+
# Delete the placeholder paragraph
|
|
79
|
+
delete_paragraph(paragraph)
|
|
80
|
+
|
|
81
|
+
|
|
82
|
+
def build_table(
|
|
83
|
+
docx_obj: _Document | Table, table_dict: dict, remove_leading_para: bool = True
|
|
84
|
+
) -> Table:
|
|
85
|
+
"""
|
|
86
|
+
Convert a WordTableModel-style dictionary into a Word table object.
|
|
87
|
+
Supports nested tables and merged cells within a row.
|
|
88
|
+
Automatically sets nested table widths to fill merged cells.
|
|
89
|
+
"""
|
|
90
|
+
raw_table = WordTableModel.model_validate(table_dict)
|
|
91
|
+
|
|
92
|
+
# Create table if docx_obj is _Document
|
|
93
|
+
if isinstance(docx_obj, _Document):
|
|
94
|
+
table = docx_obj.add_table(
|
|
95
|
+
rows=len(raw_table.rows),
|
|
96
|
+
cols=len(raw_table.rows[0].cells) if raw_table.rows else 0,
|
|
97
|
+
)
|
|
98
|
+
elif isinstance(docx_obj, Table):
|
|
99
|
+
table = docx_obj
|
|
100
|
+
else:
|
|
101
|
+
raise ValueError(f"docx_obj must be _Document or Table, got {type(docx_obj)}")
|
|
102
|
+
|
|
103
|
+
# Apply table style
|
|
104
|
+
if raw_table.style:
|
|
105
|
+
table.style = raw_table.style
|
|
106
|
+
|
|
107
|
+
for i, row in enumerate(raw_table.rows):
|
|
108
|
+
table_row = table.rows[i]
|
|
109
|
+
col_idx = 0
|
|
110
|
+
|
|
111
|
+
while col_idx < len(row.cells):
|
|
112
|
+
cell = row.cells[col_idx]
|
|
113
|
+
table_cell = table_row.cells[col_idx]
|
|
114
|
+
|
|
115
|
+
# --- Handle merged cells ---
|
|
116
|
+
if isinstance(cell, str) and cell.lower() == "merge":
|
|
117
|
+
left_cell = table_row.cells[col_idx - 1]
|
|
118
|
+
left_cell.merge(table_cell)
|
|
119
|
+
col_idx += 1
|
|
120
|
+
continue
|
|
121
|
+
|
|
122
|
+
# --- Regular cell formatting ---
|
|
123
|
+
if cell.background_color:
|
|
124
|
+
cell_shader(table_cell, cell.background_color)
|
|
125
|
+
if cell.width:
|
|
126
|
+
set_col_width(table, col_idx, cell.width)
|
|
127
|
+
|
|
128
|
+
# --- Paragraphs ---
|
|
129
|
+
for para_idx, para in enumerate(cell.paragraphs):
|
|
130
|
+
if para_idx >= len(table_cell.paragraphs):
|
|
131
|
+
p = table_cell.add_paragraph()
|
|
132
|
+
else:
|
|
133
|
+
p = table_cell.paragraphs[para_idx]
|
|
134
|
+
|
|
135
|
+
if isinstance(para.text, str):
|
|
136
|
+
p.text = para.text
|
|
137
|
+
elif isinstance(para.text, list):
|
|
138
|
+
# Multi-line text with breaks
|
|
139
|
+
p.text = ""
|
|
140
|
+
for idx, run_text in enumerate(para.text):
|
|
141
|
+
run = p.add_run(run_text)
|
|
142
|
+
if idx < len(para.text) - 1:
|
|
143
|
+
run.add_break()
|
|
144
|
+
|
|
145
|
+
# Paragraph alignment
|
|
146
|
+
if para.alignment:
|
|
147
|
+
alignment_map = {
|
|
148
|
+
"left": WD_ALIGN_PARAGRAPH.LEFT,
|
|
149
|
+
"center": WD_ALIGN_PARAGRAPH.CENTER,
|
|
150
|
+
"right": WD_ALIGN_PARAGRAPH.RIGHT,
|
|
151
|
+
"justify": WD_ALIGN_PARAGRAPH.JUSTIFY,
|
|
152
|
+
"distribute": WD_ALIGN_PARAGRAPH.DISTRIBUTE,
|
|
153
|
+
}
|
|
154
|
+
p.alignment = alignment_map.get(para.alignment.value.lower())
|
|
155
|
+
|
|
156
|
+
# Paragraph style
|
|
157
|
+
if para.style:
|
|
158
|
+
p.style = para.style
|
|
159
|
+
|
|
160
|
+
# --- Nested table ---
|
|
161
|
+
if cell.table:
|
|
162
|
+
# Compute width of merged cell
|
|
163
|
+
merged_cols = 1
|
|
164
|
+
temp_idx = col_idx + 1
|
|
165
|
+
while (
|
|
166
|
+
temp_idx < len(row.cells)
|
|
167
|
+
and isinstance(row.cells[temp_idx], str)
|
|
168
|
+
and row.cells[temp_idx].lower() == "merge"
|
|
169
|
+
):
|
|
170
|
+
merged_cols += 1
|
|
171
|
+
temp_idx += 1
|
|
172
|
+
|
|
173
|
+
# Sum widths of merged columns, or use default
|
|
174
|
+
parent_width = sum(
|
|
175
|
+
table.columns[col_idx + k].width or Cm(2.5)
|
|
176
|
+
for k in range(merged_cols)
|
|
177
|
+
)
|
|
178
|
+
|
|
179
|
+
nested_rows = len(cell.table.rows)
|
|
180
|
+
nested_cols = len(cell.table.rows[0].cells) if cell.table.rows else 0
|
|
181
|
+
|
|
182
|
+
nested_table = table_cell.add_table(rows=nested_rows, cols=nested_cols)
|
|
183
|
+
nested_table.allow_autofit = False # We'll set widths manually
|
|
184
|
+
|
|
185
|
+
# Assign column widths proportionally
|
|
186
|
+
nested_col_width = parent_width / nested_cols
|
|
187
|
+
for col in nested_table.columns:
|
|
188
|
+
for nested_cell in col.cells:
|
|
189
|
+
nested_cell.width = nested_col_width
|
|
190
|
+
|
|
191
|
+
# Recursively build nested table
|
|
192
|
+
build_table(nested_table, cell.table.model_dump())
|
|
193
|
+
|
|
194
|
+
# Remove leading paragraph in merged/nested cell
|
|
195
|
+
if remove_leading_para and table_cell.paragraphs:
|
|
196
|
+
delete_paragraph(table_cell.paragraphs[0])
|
|
197
|
+
|
|
198
|
+
col_idx += 1
|
|
199
|
+
|
|
200
|
+
# Remove leading empty paragraph in document
|
|
201
|
+
if isinstance(docx_obj, _Document) and remove_leading_para and docx_obj.paragraphs:
|
|
202
|
+
if not docx_obj.paragraphs[0].text.strip():
|
|
203
|
+
delete_paragraph(docx_obj.paragraphs[0])
|
|
204
|
+
|
|
205
|
+
return table
|
|
206
|
+
|
|
207
|
+
|
|
208
|
+
def insert_paragraph_after(paragraph: Paragraph, text: str = None, style: str = None):
|
|
209
|
+
"""
|
|
210
|
+
Insert a new paragraph after the given paragraph.
|
|
211
|
+
"""
|
|
212
|
+
|
|
213
|
+
# Create a new empty <w:p> element
|
|
214
|
+
new_p = OxmlElement("w:p")
|
|
215
|
+
|
|
216
|
+
# Insert the new <w:p> after the given paragraph’s <w:p>
|
|
217
|
+
paragraph._p.addnext(new_p)
|
|
218
|
+
|
|
219
|
+
# Wrap the XML element as a python-docx Paragraph
|
|
220
|
+
new_para = Paragraph(new_p, paragraph._parent)
|
|
221
|
+
|
|
222
|
+
# Set text and style
|
|
223
|
+
if text:
|
|
224
|
+
new_para.add_run(text)
|
|
225
|
+
if style:
|
|
226
|
+
new_para.style = style
|
|
227
|
+
|
|
228
|
+
|
|
229
|
+
# TABLE (DICTIONARY) SHORTCUTS
|
|
230
|
+
def insert_text_into_row(cell_text: list) -> dict:
|
|
231
|
+
"""
|
|
232
|
+
Generate a table row dictionary from a list of text strings.
|
|
233
|
+
Each string in the list becomes a cell in the row.
|
|
234
|
+
Assumes no styling.
|
|
235
|
+
|
|
236
|
+
:param cell_text: A list of text strings for each cell in the row. Use "merge" to indicate merged cells.
|
|
237
|
+
:type cell_text: list
|
|
238
|
+
:return: A dictionary representing the table row.
|
|
239
|
+
:rtype: dict
|
|
240
|
+
"""
|
|
241
|
+
|
|
242
|
+
row: dict = {"cells": []}
|
|
243
|
+
for text in cell_text:
|
|
244
|
+
if text.lower() == "merge":
|
|
245
|
+
row["cells"].append("merge")
|
|
246
|
+
continue
|
|
247
|
+
|
|
248
|
+
row["cells"].append(
|
|
249
|
+
{
|
|
250
|
+
"paragraphs": [
|
|
251
|
+
{
|
|
252
|
+
"text": [text],
|
|
253
|
+
},
|
|
254
|
+
],
|
|
255
|
+
}
|
|
256
|
+
)
|
|
257
|
+
|
|
258
|
+
return row
|
|
259
|
+
|
|
260
|
+
|
|
261
|
+
def insert_text_by_table_coords(table: dict, row: int, col: int, text: str) -> dict:
|
|
262
|
+
table["rows"][row]["cells"][col]["paragraphs"][0]["text"] = [text]
|
|
263
|
+
return table
|
|
264
|
+
|
|
265
|
+
|
|
266
|
+
def generate_table(
|
|
267
|
+
num_rows: int, num_cols: int, header_row: list, style: str = None
|
|
268
|
+
) -> dict:
|
|
269
|
+
"""
|
|
270
|
+
Generate a basic table dictionary with specified number of rows and columns.
|
|
271
|
+
Each cell contains empty text.
|
|
272
|
+
|
|
273
|
+
:param num_rows: Number of rows in the table.
|
|
274
|
+
:type num_rows: int
|
|
275
|
+
:param num_cols: Number of columns in the table.
|
|
276
|
+
:type num_cols: int
|
|
277
|
+
:param header_row: A list of text strings for the header row.
|
|
278
|
+
:type header_row: list
|
|
279
|
+
:param style: The style to apply to the table.
|
|
280
|
+
:type style: str
|
|
281
|
+
:return: A dictionary representing the table.
|
|
282
|
+
:rtype: dict
|
|
283
|
+
"""
|
|
284
|
+
|
|
285
|
+
table: dict = {"rows": []}
|
|
286
|
+
|
|
287
|
+
if style:
|
|
288
|
+
table["style"] = style
|
|
289
|
+
|
|
290
|
+
for _ in range(num_rows):
|
|
291
|
+
row: dict = {"cells": []}
|
|
292
|
+
for _ in range(num_cols):
|
|
293
|
+
row["cells"].append(
|
|
294
|
+
{
|
|
295
|
+
"paragraphs": [
|
|
296
|
+
{
|
|
297
|
+
"text": [""],
|
|
298
|
+
},
|
|
299
|
+
],
|
|
300
|
+
}
|
|
301
|
+
)
|
|
302
|
+
table["rows"].append(row)
|
|
303
|
+
|
|
304
|
+
if header_row:
|
|
305
|
+
for col_idx, header_text in enumerate(header_row):
|
|
306
|
+
table["rows"][0]["cells"][col_idx]["paragraphs"][0]["text"] = [header_text]
|
|
307
|
+
|
|
308
|
+
return table
|
|
@@ -0,0 +1,57 @@
|
|
|
1
|
+
# Built-In Imports
|
|
2
|
+
import re
|
|
3
|
+
from enum import Enum
|
|
4
|
+
from typing import Optional
|
|
5
|
+
|
|
6
|
+
# Third-Party Imports
|
|
7
|
+
from pydantic import BaseModel, field_validator, Field
|
|
8
|
+
|
|
9
|
+
|
|
10
|
+
class AlignmentEnum(str, Enum):
|
|
11
|
+
left = "left"
|
|
12
|
+
center = "center"
|
|
13
|
+
right = "right"
|
|
14
|
+
justify = "justify"
|
|
15
|
+
distribute = "distribute"
|
|
16
|
+
|
|
17
|
+
|
|
18
|
+
class WordParagraphModel(BaseModel):
|
|
19
|
+
style: str | None = None
|
|
20
|
+
alignment: AlignmentEnum | None = None
|
|
21
|
+
text: list[str] | str = Field(default_factory=list)
|
|
22
|
+
|
|
23
|
+
|
|
24
|
+
class WordCellModel(BaseModel):
|
|
25
|
+
width: int | float | None = None
|
|
26
|
+
background_color: str | None = None
|
|
27
|
+
paragraphs: list[WordParagraphModel] = Field(default_factory=list)
|
|
28
|
+
table: Optional["WordTableModel"] | None = None # forward reference
|
|
29
|
+
|
|
30
|
+
@field_validator("background_color")
|
|
31
|
+
@classmethod
|
|
32
|
+
def validate_hex_color(cls, v):
|
|
33
|
+
if v is None:
|
|
34
|
+
return v # allow None
|
|
35
|
+
if not isinstance(v, str):
|
|
36
|
+
raise ValueError("background_color must be a string")
|
|
37
|
+
# regex: # followed by 3 or 6 hex digits
|
|
38
|
+
if not re.fullmatch(r"#([0-9a-fA-F]{3}|[0-9a-fA-F]{6})", v):
|
|
39
|
+
raise ValueError(f"'{v}' is not a valid hex color")
|
|
40
|
+
return v
|
|
41
|
+
|
|
42
|
+
|
|
43
|
+
class WordRowModel(BaseModel):
|
|
44
|
+
cells: list[WordCellModel | str] = Field(default_factory=list)
|
|
45
|
+
|
|
46
|
+
@field_validator("cells")
|
|
47
|
+
@classmethod
|
|
48
|
+
def validate_cells(cls, v):
|
|
49
|
+
if isinstance(v, str):
|
|
50
|
+
if not v.strip().lower() == "merge":
|
|
51
|
+
raise ValueError("If a cell is a string, it must be 'merge'")
|
|
52
|
+
return v
|
|
53
|
+
|
|
54
|
+
|
|
55
|
+
class WordTableModel(BaseModel):
|
|
56
|
+
style: str | None = None
|
|
57
|
+
rows: list[WordRowModel] = Field(default_factory=list)
|