markdown-to-confluence 0.3.4__py3-none-any.whl → 0.3.5__py3-none-any.whl

@@ -1,6 +1,6 @@
1
1
  Metadata-Version: 2.4
2
2
  Name: markdown-to-confluence
3
- Version: 0.3.4
3
+ Version: 0.3.5
4
4
  Summary: Publish Markdown files to Confluence wiki
5
5
  Home-page: https://github.com/hunyadi/md2conf
6
6
  Author: Levente Hunyadi
@@ -21,12 +21,12 @@ Classifier: Typing :: Typed
21
21
  Requires-Python: >=3.9
22
22
  Description-Content-Type: text/markdown
23
23
  License-File: LICENSE
24
- Requires-Dist: lxml>=5.3
25
- Requires-Dist: types-lxml>=2024.12.13
26
- Requires-Dist: markdown>=3.7
27
- Requires-Dist: types-markdown>=3.7
28
- Requires-Dist: pymdown-extensions>=10.14
29
- Requires-Dist: pyyaml>=6.0
24
+ Requires-Dist: lxml>=5.4
25
+ Requires-Dist: types-lxml>=2025.3.30
26
+ Requires-Dist: markdown>=3.8
27
+ Requires-Dist: types-markdown>=3.8
28
+ Requires-Dist: pymdown-extensions>=10.15
29
+ Requires-Dist: PyYAML>=6.0
30
30
  Requires-Dist: types-PyYAML>=6.0
31
31
  Requires-Dist: requests>=2.32
32
32
  Requires-Dist: types-requests>=2.32
@@ -198,20 +198,26 @@ root
198
198
  └── Mean vs. median
199
199
  ```
200
200
 
201
+ ### Lists and tables
202
+
203
+ If your Markdown lists or tables don't appear in Confluence as expected, verify that the list or table is delimited by a blank line both before and after, as per strict Markdown syntax. While some previewers accept a more lenient syntax (e.g. an itemized list immediately following a paragraph), *md2conf* uses [Python-Markdown](https://python-markdown.github.io/) internally to convert Markdown into XHTML, which expects the Markdown document to adhere to the stricter syntax.
204
+
201
205
  ### Publishing images
202
206
 
203
207
  Local images referenced in a Markdown file are automatically published to Confluence as attachments to the page.
204
208
 
205
- Unfortunately, Confluence struggles with SVG images, e.g. they may only show in *edit* mode, display in a wrong size or text labels in the image may be truncated. In order to mitigate the issue, whenever *md2conf* encounters a reference to an SVG image in a Markdown file, it checks whether a corresponding PNG image also exists in the same directory, and if a PNG image is found, it is published instead.
209
+ Unfortunately, Confluence struggles with SVG images, e.g. they may only show in *edit* mode, display in a wrong size or text labels in the image may be truncated. (This seems to be a known issue in Confluence.) In order to mitigate the issue, whenever *md2conf* encounters a reference to an SVG image in a Markdown file, it checks whether a corresponding PNG image also exists in the same directory, and if a PNG image is found, it is published instead.
206
210
 
207
211
  External images referenced with an absolute URL retain the original URL.
208
212
 
209
213
  ### Ignoring files
210
214
 
211
- Skip files in a directory with rules defined in `.mdignore`. Each rule should occupy a single line. Rules follow the syntax of [fnmatch](https://docs.python.org/3/library/fnmatch.html#fnmatch.fnmatch). Specifically, `?` matches any single character, and `*` matches zero or more characters. For example, use `up-*.md` to exclude Markdown files that start with `up-`. Lines that start with `#` are treated as comments.
215
+ Skip files in a directory with rules defined in `.mdignore`. Each rule should occupy a single line. Rules follow the syntax (and constraints) of [fnmatch](https://docs.python.org/3/library/fnmatch.html#fnmatch.fnmatch). Specifically, `?` matches any single character, and `*` matches zero or more characters. For example, use `up-*.md` to exclude Markdown files that start with `up-`. Lines that start with `#` are treated as comments.
212
216
 
213
217
  Files that don't have the extension `*.md` are skipped automatically. Hidden directories (whose name starts with `.`) are not recursed into.
214
218
 
219
+ Relative paths to items in a nested directory are not supported. You must put `.mdignore` in the same directory where the items to be skipped reside.
220
+
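As an illustration of the `.mdignore` rules described above, the following is a minimal sketch that applies `fnmatch` patterns to file names within a single directory. The helper names `parse_rules` and `is_ignored` are hypothetical and not part of *md2conf*; the matching logic inside the package may differ.

```python
import fnmatch


def parse_rules(text: str) -> list[str]:
    """Returns the non-empty, non-comment lines of a hypothetical `.mdignore` file."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        # lines that start with `#` are treated as comments
        if line and not line.startswith("#"):
            rules.append(line)
    return rules


def is_ignored(name: str, rules: list[str]) -> bool:
    """Checks a bare file name (no directory part) against each rule."""
    return any(fnmatch.fnmatch(name, rule) for rule in rules)


rules = parse_rules("# skip work-in-progress pages\nup-*.md\ndraft-?.md\n")
```

With these rules, `up-to-date.md` and `draft-1.md` would be skipped, while `draft-10.md` and `index.md` would not, because `?` matches exactly one character.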
215
221
  ### Page title
216
222
 
217
223
  *md2conf* makes a best-effort attempt at setting the Confluence wiki page title when it publishes a Markdown document the first time. The following are probed in this order:
@@ -0,0 +1,23 @@
1
+ markdown_to_confluence-0.3.5.dist-info/licenses/LICENSE,sha256=Pv43so2bPfmKhmsrmXFyAvS7M30-1i1tzjz6-dfhyOo,1077
2
+ md2conf/__init__.py,sha256=Uaqb3maQScpYs3FiH8kuM6pUh5JzE4Vy52MgU9pvMTw,402
3
+ md2conf/__main__.py,sha256=bFcfmSnTWeuhmDm7bJ3jJabZ2S8W9biuAP6_R-Cc9As,8034
4
+ md2conf/api.py,sha256=VxrAJ4yCsdGFVAEQQWw5aONwsMz0b6KvN4EMLXCKOwE,26905
5
+ md2conf/application.py,sha256=SIM4yLHaLnvG7wRJLbRvptrkc0q4JMuAhDnanqsuYzA,6697
6
+ md2conf/converter.py,sha256=ASXhs7g79dOU4x1QhfvKL8mtwth508GTGcb3AUHigC4,37286
7
+ md2conf/emoji.py,sha256=48QJtOD0F3Be1laYLvAOwe0GxrJS-vcfjtCdiBsNcAc,1960
8
+ md2conf/entities.dtd,sha256=M6NzqL5N7dPs_eUA_6sDsiSLzDaAacrx9LdttiufvYU,30215
9
+ md2conf/local.py,sha256=998bBRpDAOywA-L0KD4_VyuL2Xftflv0ler-uNPQZn4,3866
10
+ md2conf/matcher.py,sha256=y5WEZNklTpUoJtMJlulTvfhl_v-UMU6wySJAKit91ig,4940
11
+ md2conf/mermaid.py,sha256=ZETocFDKi_fSYyVR1pJ7fo207YYFSuT44MSYFQ8-cZ0,2562
12
+ md2conf/metadata.py,sha256=Xozg2PjJnis7VQYQT_edIvTb8u0cs_ZizPOAxc1N8vg,1003
13
+ md2conf/processor.py,sha256=jSLFy8hqZJXf3b79jp31Fn9-cm4j9xq4HDChp9pyhP0,6706
14
+ md2conf/properties.py,sha256=TOCXLdTfYkKjRwZaMgvXw0mNCI4opEUwpBXro2Kv2B4,2467
15
+ md2conf/puppeteer-config.json,sha256=-dMTAN_7kNTGbDlfXzApl0KJpAWna9YKZdwMKbpOb60,159
16
+ md2conf/py.typed,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
17
+ md2conf/scanner.py,sha256=iF8NCQAFO6Yut5aAQr7uxfWzVMMt9j3T5ADoVVSJWKQ,3543
18
+ markdown_to_confluence-0.3.5.dist-info/METADATA,sha256=NiXwBXtQ5WhHce_JX7TBUSefQSR5jk5fERe46BL4vwE,18462
19
+ markdown_to_confluence-0.3.5.dist-info/WHEEL,sha256=_zCd3N1l69ArxyTb8rzEoP9TpbYXkqRFSNOD5OuxnTs,91
20
+ markdown_to_confluence-0.3.5.dist-info/entry_points.txt,sha256=F1zxa1wtEObtbHS-qp46330WVFLHdMnV2wQ-ZorRmX0,50
21
+ markdown_to_confluence-0.3.5.dist-info/top_level.txt,sha256=_FJfl_kHrHNidyjUOuS01ngu_jDsfc-ZjSocNRJnTzU,8
22
+ markdown_to_confluence-0.3.5.dist-info/zip-safe,sha256=AbpHGcgLb-kRsJGnwFEktk7uzpZOCcBY74-YBdrKVGs,1
23
+ markdown_to_confluence-0.3.5.dist-info/RECORD,,
@@ -1,5 +1,5 @@
1
1
  Wheel-Version: 1.0
2
- Generator: setuptools (80.8.0)
2
+ Generator: setuptools (80.9.0)
3
3
  Root-Is-Purelib: true
4
4
  Tag: py3-none-any
5
5
 
md2conf/__init__.py CHANGED
@@ -5,7 +5,7 @@ Parses Markdown files, converts Markdown content into the Confluence Storage For
5
5
  Confluence API endpoints to upload images and content.
6
6
  """
7
7
 
8
- __version__ = "0.3.4"
8
+ __version__ = "0.3.5"
9
9
  __author__ = "Levente Hunyadi"
10
10
  __copyright__ = "Copyright 2022-2025, Levente Hunyadi"
11
11
  __license__ = "MIT"
md2conf/api.py CHANGED
@@ -43,13 +43,23 @@ JsonType = Union[
43
43
 
44
44
 
45
45
  class ConfluenceVersion(enum.Enum):
46
+ """
47
+ Confluence REST API version an HTTP request corresponds to.
48
+
49
+ For some operations, Confluence Cloud supports v2 endpoints exclusively. However, for other operations, only v1 endpoints are available via REST API.
50
+ Some versions of Confluence Server and Data Center, unfortunately, don't support v2 endpoints at all.
51
+
52
+ The principal use case for *md2conf* is Confluence Cloud. As such, *md2conf* uses v2 endpoints when available, and resorts to v1 endpoints only when
53
+ necessary.
54
+ """
55
+
46
56
  VERSION_1 = "rest/api"
47
57
  VERSION_2 = "api/v2"
48
58
 
49
59
 
50
60
  class ConfluencePageParentContentType(enum.Enum):
51
61
  """
52
- Content types that can be a parent to a Confluence page
62
+ Content types that can be a parent to a Confluence page.
53
63
  """
54
64
 
55
65
  PAGE = "page"
@@ -80,6 +90,15 @@ LOGGER = logging.getLogger(__name__)
80
90
 
81
91
  @dataclass(frozen=True)
82
92
  class ConfluenceAttachment:
93
+ """
94
+ Holds data for an object uploaded to Confluence as a page attachment.
95
+
96
+ :param id: Unique ID for the attachment.
97
+ :param media_type: MIME type for the attachment.
98
+ :param file_size: Size in bytes.
99
+ :param comment: Description for the attachment.
100
+ """
101
+
83
102
  id: str
84
103
  media_type: str
85
104
  file_size: int
@@ -87,7 +106,18 @@ class ConfluenceAttachment:
87
106
 
88
107
 
89
108
  @dataclass(frozen=True)
90
- class ConfluencePageMetadata:
109
+ class ConfluencePageProperties:
110
+ """
111
+ Holds Confluence page properties used for page synchronization.
112
+
113
+ :param id: Confluence page ID.
114
+ :param space_id: Confluence space ID.
115
+ :param parent_id: Confluence page ID of the immediate parent.
116
+ :param parent_type: Identifies the content type of the parent.
117
+ :param title: Page title.
118
+ :param version: Page version. Incremented when the page is updated.
119
+ """
120
+
91
121
  id: str
92
122
  space_id: str
93
123
  parent_id: str
@@ -97,11 +127,36 @@ class ConfluencePageMetadata:
97
127
 
98
128
 
99
129
  @dataclass(frozen=True)
100
- class ConfluencePage(ConfluencePageMetadata):
130
+ class ConfluencePage(ConfluencePageProperties):
131
+ """
132
+ Holds Confluence page data used for page synchronization.
133
+
134
+ :param content: Page content in Confluence Storage Format.
135
+ """
136
+
101
137
  content: str
102
138
 
103
139
 
140
+ @dataclass(frozen=True)
141
+ class ConfluenceLabel:
142
+ """
143
+ Holds information about a single label.
144
+
145
+ :param id: ID of the label.
146
+ :param name: Name of the label.
147
+ :param prefix: Prefix of the label.
148
+ """
149
+
150
+ id: str
151
+ name: str
152
+ prefix: str
153
+
154
+
104
155
  class ConfluenceAPI:
156
+ """
157
+ Represents an active connection to a Confluence server.
158
+ """
159
+
105
160
  properties: ConfluenceConnectionProperties
106
161
  session: Optional["ConfluenceSession"] = None
107
162
 
@@ -195,7 +250,7 @@ class ConfluenceSession:
195
250
  path: str,
196
251
  query: Optional[dict[str, str]] = None,
197
252
  ) -> JsonType:
198
- "Execute an HTTP request via Confluence API."
253
+ "Executes an HTTP request via Confluence API."
199
254
 
200
255
  url = self._build_url(version, path, query)
201
256
  response = self.session.get(url)
@@ -204,7 +259,33 @@ class ConfluenceSession:
204
259
  response.raise_for_status()
205
260
  return response.json()
206
261
 
207
- def _save(self, version: ConfluenceVersion, path: str, data: dict) -> None:
262
+ def _fetch(
263
+ self, path: str, query: Optional[dict[str, str]] = None
264
+ ) -> list[JsonType]:
265
+ "Retrieves all results of a REST API v2 paginated result-set."
266
+
267
+ items: list[JsonType] = []
268
+ url = self._build_url(ConfluenceVersion.VERSION_2, path, query)
269
+ while True:
270
+ response = self.session.get(url)
271
+ response.raise_for_status()
272
+
273
+ payload = typing.cast(dict[str, JsonType], response.json())
274
+ results = typing.cast(list[JsonType], payload["results"])
275
+ items.extend(results)
276
+
277
+ links = typing.cast(dict[str, JsonType], payload.get("_links", {}))
278
+ link = typing.cast(str, links.get("next", ""))
279
+ if link:
280
+ url = f"https://{self.site.domain}{link}"
281
+ else:
282
+ break
283
+
284
+ return items
285
+
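The pagination loop in `_fetch` above can be sketched in isolation as follows. This is a simplified illustration, assuming a caller-supplied `get_json` callable in place of the real HTTP session; the names `fetch_all`, `get_json`, `fake_pages` and the `example.atlassian.net` URLs are made up for the example and are not part of *md2conf*.

```python
from typing import Any, Callable


def fetch_all(
    get_json: Callable[[str], dict[str, Any]], url: str, domain: str
) -> list[Any]:
    """Collects `results` from every page by following `_links.next`."""
    items: list[Any] = []
    while True:
        payload = get_json(url)
        items.extend(payload["results"])

        # an absent or empty `next` link marks the last page
        link = payload.get("_links", {}).get("next", "")
        if not link:
            return items

        # `next` is a site-relative path, so the domain is prepended
        url = f"https://{domain}{link}"


# made-up responses standing in for two pages of a result-set
first_url = "https://example.atlassian.net/wiki/api/v2/pages/1/labels"
fake_pages = {
    first_url: {
        "results": ["a", "b"],
        "_links": {"next": "/wiki/api/v2/pages/1/labels?cursor=xyz"},
    },
    first_url + "?cursor=xyz": {"results": ["c"], "_links": {}},
}
all_items = fetch_all(fake_pages.__getitem__, first_url, "example.atlassian.net")
```

Passing the fetch function as a parameter keeps the loop testable without network access; the method in the diff uses `self.session.get` and `raise_for_status` instead.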
286
+ def _save(self, version: ConfluenceVersion, path: str, data: JsonType) -> None:
287
+ "Persists data via Confluence REST API."
288
+
208
289
  url = self._build_url(version, path)
209
290
  response = self.session.put(
210
291
  url,
@@ -263,7 +344,7 @@ class ConfluenceSession:
263
344
  self, *, space_id: Optional[str] = None, space_key: Optional[str] = None
264
345
  ) -> Optional[str]:
265
346
  """
266
- Coalesce a space ID or space key into a space ID, accounting for site default.
347
+ Coalesces a space ID or space key into a space ID, accounting for site default.
267
348
 
268
349
  :param space_id: A Confluence space ID.
269
350
  :param space_key: A Confluence space key.
@@ -285,6 +366,10 @@ class ConfluenceSession:
285
366
  def get_attachment_by_name(
286
367
  self, page_id: str, filename: str
287
368
  ) -> ConfluenceAttachment:
369
+ """
370
+ Retrieves a Confluence page attachment by an unprefixed file name.
371
+ """
372
+
288
373
  path = f"/pages/{page_id}/attachments"
289
374
  query = {"filename": filename}
290
375
  data = typing.cast(
@@ -313,6 +398,18 @@ class ConfluenceSession:
313
398
  comment: Optional[str] = None,
314
399
  force: bool = False,
315
400
  ) -> None:
401
+ """
402
+ Uploads a new attachment to a Confluence page.
403
+
404
+ :param page_id: Confluence page ID.
405
+ :param attachment_name: Unprefixed name unique to the page.
406
+ :param attachment_path: Path to the file to upload as an attachment.
407
+ :param raw_data: Raw data to upload as an attachment.
408
+ :param content_type: Attachment MIME type.
409
+ :param comment: Attachment description.
410
+ :param force: Overwrite an existing attachment even if there seem to be no changes.
411
+ """
412
+
316
413
  if attachment_path is None and raw_data is None:
317
414
  raise ArgumentError("required: `attachment_path` or `raw_data`")
318
415
 
@@ -409,7 +506,7 @@ class ConfluenceSession:
409
506
  ) -> None:
410
507
  id = attachment_id.removeprefix("att")
411
508
  path = f"/content/{page_id}/child/attachment/{id}"
412
- data = {
509
+ data: JsonType = {
413
510
  "id": attachment_id,
414
511
  "type": "attachment",
415
512
  "status": "current",
@@ -428,10 +525,11 @@ class ConfluenceSession:
428
525
  space_key: Optional[str] = None,
429
526
  ) -> str:
430
527
  """
431
- Look up a Confluence wiki page ID by title.
528
+ Looks up a Confluence wiki page ID by title.
432
529
 
433
530
  :param title: The page title.
434
- :param space_key: The Confluence space key (unless the default space is to be used).
531
+ :param space_id: The Confluence space ID (unless the default space is to be used). Exclusive with space key.
532
+ :param space_key: The Confluence space key (unless the default space is to be used). Exclusive with space ID.
435
533
  :returns: Confluence page ID.
436
534
  """
437
535
 
@@ -457,7 +555,7 @@ class ConfluenceSession:
457
555
 
458
556
  def get_page(self, page_id: str) -> ConfluencePage:
459
557
  """
460
- Retrieve Confluence wiki page details and content.
558
+ Retrieves Confluence wiki page details and content.
461
559
 
462
560
  :param page_id: The Confluence page ID.
463
561
  :returns: Confluence page info and content.
@@ -486,9 +584,9 @@ class ConfluenceSession:
486
584
  )
487
585
 
488
586
  @functools.cache
489
- def get_page_metadata(self, page_id: str) -> ConfluencePageMetadata:
587
+ def get_page_properties(self, page_id: str) -> ConfluencePageProperties:
490
588
  """
491
- Retrieve Confluence wiki page details.
589
+ Retrieves Confluence wiki page details.
492
590
 
493
591
  :param page_id: The Confluence page ID.
494
592
  :returns: Confluence page info.
@@ -499,7 +597,7 @@ class ConfluenceSession:
499
597
  data = typing.cast(dict[str, JsonType], payload)
500
598
  version = typing.cast(dict[str, JsonType], data["version"])
501
599
 
502
- return ConfluencePageMetadata(
600
+ return ConfluencePageProperties(
503
601
  id=page_id,
504
602
  space_id=typing.cast(str, data["spaceId"]),
505
603
  parent_id=typing.cast(str, data["parentId"]),
@@ -514,7 +612,7 @@ class ConfluenceSession:
514
612
 
515
613
  def get_page_version(self, page_id: str) -> int:
516
614
  """
517
- Retrieve a Confluence wiki page version.
615
+ Retrieves a Confluence wiki page version.
518
616
 
519
617
  :param page_id: The Confluence page ID.
520
618
  :returns: Confluence page version.
@@ -534,7 +632,7 @@ class ConfluenceSession:
534
632
  title: Optional[str] = None,
535
633
  ) -> None:
536
634
  """
537
- Update a page via the Confluence API.
635
+ Updates a page via the Confluence API.
538
636
 
539
637
  :param page_id: The Confluence page ID.
540
638
  :param new_content: Confluence Storage Format XHTML.
@@ -553,7 +651,7 @@ class ConfluenceSession:
553
651
  LOGGER.warning(exc)
554
652
 
555
653
  path = f"/pages/{page_id}"
556
- data = {
654
+ data: JsonType = {
557
655
  "id": page_id,
558
656
  "status": "current",
559
657
  "title": new_title,
@@ -571,10 +669,10 @@ class ConfluenceSession:
571
669
  new_content: str,
572
670
  ) -> ConfluencePage:
573
671
  """
574
- Create a new page via Confluence API.
672
+ Creates a new page via Confluence API.
575
673
  """
576
674
 
577
- parent_page = self.get_page_metadata(parent_id)
675
+ parent_page = self.get_page_properties(parent_id)
578
676
  path = "/pages/"
579
677
  query = {
580
678
  "spaceId": parent_page.space_id,
@@ -615,10 +713,10 @@ class ConfluenceSession:
615
713
 
616
714
  def delete_page(self, page_id: str, *, purge: bool = False) -> None:
617
715
  """
618
- Delete a page via Confluence API.
716
+ Deletes a page via Confluence API.
619
717
 
620
718
  :param page_id: The Confluence page ID.
621
- :param purge: True to completely purge the page, False to move to trash only.
719
+ :param purge: `True` to completely purge the page, `False` to move to trash only.
622
720
  """
623
721
 
624
722
  path = f"/pages/{page_id}"
@@ -645,10 +743,12 @@ class ConfluenceSession:
645
743
  space_key: Optional[str] = None,
646
744
  ) -> Optional[str]:
647
745
  """
648
- Check if a Confluence page exists with the given title.
746
+ Checks if a Confluence page exists with the given title.
649
747
 
650
748
  :param title: Page title. Pages in the same Confluence space must have a unique title.
651
749
  :param space_key: Identifies the Confluence space.
750
+
751
+ :returns: Confluence page ID of a matching page (if found), or `None`.
652
752
  """
653
753
 
654
754
  space_id = self.get_space_id(space_id=space_id, space_key=space_key)
@@ -676,13 +776,13 @@ class ConfluenceSession:
676
776
 
677
777
  def get_or_create_page(self, title: str, parent_id: str) -> ConfluencePage:
678
778
  """
679
- Find a page with the given title, or create a new page if no such page exists.
779
+ Finds a page with the given title, or creates a new page if no such page exists.
680
780
 
681
781
  :param title: Page title. Pages in the same Confluence space must have a unique title.
682
782
  :param parent_id: Identifies the parent page for a new child page.
683
783
  """
684
784
 
685
- parent_page = self.get_page_metadata(parent_id)
785
+ parent_page = self.get_page_properties(parent_id)
686
786
  page_id = self.page_exists(title, space_id=parent_page.space_id)
687
787
 
688
788
  if page_id is not None:
@@ -691,3 +791,22 @@ class ConfluenceSession:
691
791
  else:
692
792
  LOGGER.debug("Creating new page with title: %s", title)
693
793
  return self.create_page(parent_id, title, "")
794
+
795
+ def get_labels(self, page_id: str) -> list[ConfluenceLabel]:
796
+ """
797
+ Retrieves labels for a Confluence page.
798
+
799
+ :param page_id: The Confluence page ID.
800
+ :returns: A list of page labels.
801
+ """
802
+
803
+ items: list[ConfluenceLabel] = []
804
+ path = f"/pages/{page_id}/labels"
805
+ results = self._fetch(path)
806
+ for r in results:
807
+ result = typing.cast(dict[str, JsonType], r)
808
+ id = typing.cast(str, result["id"])
809
+ name = typing.cast(str, result["name"])
810
+ prefix = typing.cast(str, result["prefix"])
811
+ items.append(ConfluenceLabel(id, name, prefix))
812
+ return items
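To make the shape of the data returned by `get_labels` concrete, here is a small standalone sketch that maps label records onto a frozen dataclass mirroring `ConfluenceLabel`. The function `parse_labels` and the sample records are hypothetical; only the three field names (`id`, `name`, `prefix`) are taken from the diff above.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class ConfluenceLabel:
    # mirrors the fields of the dataclass added in this release
    id: str
    name: str
    prefix: str


def parse_labels(results: list[dict[str, Any]]) -> list[ConfluenceLabel]:
    """Converts raw label records into immutable `ConfluenceLabel` objects."""
    return [
        ConfluenceLabel(id=str(r["id"]), name=str(r["name"]), prefix=str(r["prefix"]))
        for r in results
    ]


# made-up sample records, not actual Confluence output
sample = [
    {"id": "101", "name": "howto", "prefix": "global"},
    {"id": "102", "name": "draft", "prefix": "global"},
]
labels = parse_labels(sample)
```

Because the dataclass is frozen, label objects are hashable and can be collected in a set, which may be convenient when comparing the labels on a page with a desired list.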
md2conf/application.py CHANGED
@@ -17,12 +17,11 @@ from .converter import (
17
17
  ConfluenceDocumentOptions,
18
18
  ConfluencePageID,
19
19
  attachment_name,
20
- extract_frontmatter_title,
21
- extract_qualified_id,
22
20
  )
23
21
  from .metadata import ConfluencePageMetadata
24
22
  from .processor import Converter, Processor, ProcessorFactory
25
23
  from .properties import PageError
24
+ from .scanner import Scanner
26
25
 
27
26
  LOGGER = logging.getLogger(__name__)
28
27
 
@@ -49,56 +48,43 @@ class SynchronizingProcessor(Processor):
49
48
  self.api = api
50
49
 
51
50
  def _get_or_create_page(
52
- self,
53
- absolute_path: Path,
54
- parent_id: Optional[ConfluencePageID],
55
- *,
56
- title: Optional[str] = None,
51
+ self, absolute_path: Path, parent_id: Optional[ConfluencePageID]
57
52
  ) -> ConfluencePageMetadata:
58
53
  """
59
54
  Creates a new Confluence page if no page is linked in the Markdown document.
60
55
  """
61
56
 
62
57
  # parse file
63
- with open(absolute_path, "r", encoding="utf-8") as f:
64
- text = f.read()
65
-
66
- qualified_id, text = extract_qualified_id(text)
58
+ document = Scanner().read(absolute_path)
67
59
 
68
60
  overwrite = False
69
- if qualified_id is None:
61
+ if document.page_id is None:
70
62
  # create new Confluence page
71
63
  if parent_id is None:
72
64
  raise PageError(
73
65
  f"expected: parent page ID for Markdown file with no linked Confluence page: {absolute_path}"
74
66
  )
75
67
 
76
- # assign title from front-matter if present
77
- if title is None:
78
- title, _ = extract_frontmatter_title(text)
79
-
80
68
  # use file name (without extension) and path hash if no title is supplied
81
- if title is None:
69
+ if document.title is not None:
70
+ title = document.title
71
+ else:
82
72
  overwrite = True
83
73
  relative_path = absolute_path.relative_to(self.root_dir)
84
74
  hash = hashlib.md5(relative_path.as_posix().encode("utf-8"))
85
75
  digest = "".join(f"{c:x}" for c in hash.digest())
86
76
  title = f"{absolute_path.stem} [{digest}]"
87
77
 
88
- confluence_page = self._create_page(absolute_path, text, title, parent_id)
78
+ confluence_page = self._create_page(
79
+ absolute_path, document.text, title, parent_id
80
+ )
89
81
  else:
90
82
  # look up existing Confluence page
91
- confluence_page = self.api.get_page(qualified_id.page_id)
92
-
93
- space_key = (
94
- self.api.space_id_to_key(confluence_page.space_id)
95
- if confluence_page.space_id
96
- else self.site.space_key
97
- )
83
+ confluence_page = self.api.get_page(document.page_id)
98
84
 
99
85
  return ConfluencePageMetadata(
100
86
  page_id=confluence_page.id,
101
- space_key=space_key,
87
+ space_key=self.api.space_id_to_key(confluence_page.space_id),
102
88
  title=confluence_page.title,
103
89
  overwrite=overwrite,
104
90
  )
@@ -123,7 +109,9 @@ class SynchronizingProcessor(Processor):
123
109
  )
124
110
  return confluence_page
125
111
 
126
- def _save_document(self, document: ConfluenceDocument, path: Path) -> None:
112
+ def _save_document(
113
+ self, page_id: ConfluencePageID, document: ConfluenceDocument, path: Path
114
+ ) -> None:
127
115
  """
128
116
  Saves a new version of a Confluence document.
129
117
 
@@ -133,25 +121,40 @@ class SynchronizingProcessor(Processor):
133
121
  base_path = path.parent
134
122
  for image in document.images:
135
123
  self.api.upload_attachment(
136
- document.id.page_id,
124
+ page_id.page_id,
137
125
  attachment_name(image),
138
126
  attachment_path=base_path / image,
139
127
  )
140
128
 
141
129
  for name, data in document.embedded_images.items():
142
130
  self.api.upload_attachment(
143
- document.id.page_id,
131
+ page_id.page_id,
144
132
  name,
145
133
  raw_data=data,
146
134
  )
147
135
 
148
136
  content = document.xhtml()
137
+ LOGGER.debug("Generated Confluence Storage Format document:\n%s", content)
149
138
 
150
- # leave title as it is for existing pages, update title for pages with randomly assigned title
151
- title = document.title if self.page_metadata[path].overwrite else None
139
+ title = None
140
+ if document.title is not None:
141
+ meta = self.page_metadata[path]
152
142
 
153
- LOGGER.debug("Generated Confluence Storage Format document:\n%s", content)
154
- self.api.update_page(document.id.page_id, content, title=title)
143
+ # update title only for pages with randomly assigned title
144
+ if meta.overwrite:
145
+ conflicting_page_id = self.api.page_exists(
146
+ document.title, space_id=self.api.space_key_to_id(meta.space_key)
147
+ )
148
+ if conflicting_page_id is None:
149
+ title = document.title
150
+ else:
151
+ LOGGER.info(
152
+ "Document title of %s conflicts with Confluence page title of %s",
153
+ path,
154
+ conflicting_page_id,
155
+ )
156
+
157
+ self.api.update_page(page_id.page_id, content, title=title)
155
158
 
156
159
  def _update_markdown(
157
160
  self,
@@ -200,6 +203,8 @@ class SynchronizingProcessorFactory(ProcessorFactory):
200
203
  class Application(Converter):
201
204
  """
202
205
  The entry point for Markdown to Confluence conversion.
206
+
207
+ This is the class instantiated by the command-line application.
203
208
  """
204
209
 
205
210
  def __init__(
md2conf/converter.py CHANGED
@@ -18,16 +18,16 @@ import xml.etree.ElementTree
18
18
  from dataclasses import dataclass
19
19
  from pathlib import Path
20
20
  from typing import Any, Literal, Optional, Union
21
- from urllib.parse import ParseResult, urlparse, urlunparse
21
+ from urllib.parse import ParseResult, quote_plus, urlparse, urlunparse
22
22
 
23
23
  import lxml.etree as ET
24
24
  import markdown
25
- import yaml
26
25
  from lxml.builder import ElementMaker
27
26
 
28
27
  from .mermaid import render_diagram
29
28
  from .metadata import ConfluencePageMetadata, ConfluenceSiteMetadata
30
29
  from .properties import PageError
30
+ from .scanner import ScannedDocument, Scanner
31
31
 
32
32
  namespaces = {
33
33
  "ac": "http://atlassian.com/content",
@@ -66,6 +66,19 @@ def is_relative_url(url: str) -> bool:
66
66
  return not bool(urlparts.scheme) and not bool(urlparts.netloc)
67
67
 
68
68
 
69
+ def encode_title(text: str) -> str:
70
+ "Converts a title string such that it is safe to embed into a Confluence URL."
71
+
72
+ # replace unsafe characters with space
73
+ text = re.sub(r"[^A-Za-z0-9._~()'!*:@,;+?-]+", " ", text)
74
+
75
+ # replace multiple consecutive spaces with single space
76
+ text = re.sub(r"\s\s+", " ", text)
77
+
78
+ # URL-encode
79
+ return quote_plus(text.strip())
80
+
81
+
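The effect of `encode_title` is easiest to see on a couple of inputs. The snippet below reproduces the function body as it appears in the diff above so that it runs standalone; the sample titles are chosen for illustration only.

```python
import re
from urllib.parse import quote_plus


def encode_title(text: str) -> str:
    "Converts a title string such that it is safe to embed into a Confluence URL."

    # replace each run of unsafe characters (including whitespace) with a space
    text = re.sub(r"[^A-Za-z0-9._~()'!*:@,;+?-]+", " ", text)

    # replace multiple consecutive spaces with single space
    text = re.sub(r"\s\s+", " ", text)

    # URL-encode; spaces become `+`
    return quote_plus(text.strip())


print(encode_title("Mean vs. median"))    # Mean+vs.+median
print(encode_title("  A/B   testing  "))  # A+B+testing
```

Characters outside the allowed set, such as `/`, are dropped in favor of a space before encoding, so the resulting URL segment contains no percent-encoded path separators.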
69
82
  def emoji_generator(
70
83
  index: str,
71
84
  shortname: str,
@@ -466,7 +479,7 @@ class ConfluenceStorageFormatConverter(NodeVisitor):
466
479
  "Confluence space key required for building full web URLs"
467
480
  )
468
481
 
469
- page_url = f"{self.site_metadata.base_path}spaces/{space_key}/pages/{link_metadata.page_id}/{link_metadata.title}"
482
+ page_url = f"{self.site_metadata.base_path}spaces/{space_key}/pages/{link_metadata.page_id}/{encode_title(link_metadata.title)}"
470
483
 
471
484
  components = ParseResult(
472
485
  scheme="https",
@@ -949,78 +962,15 @@ class DocumentError(RuntimeError):
949
962
  "Raised when a converted Markdown document has an unexpected element or attribute."
950
963
 
951
964
 
952
- def extract_value(pattern: str, text: str) -> tuple[Optional[str], str]:
953
- values: list[str] = []
954
-
955
- def _repl_func(matchobj: re.Match) -> str:
956
- values.append(matchobj.group(1))
957
- return ""
958
-
959
- text = re.sub(pattern, _repl_func, text, 1, re.ASCII)
960
- value = values[0] if values else None
961
- return value, text
962
-
963
-
964
965
  @dataclass
965
966
  class ConfluencePageID:
966
967
  page_id: str
967
968
 
968
- def __init__(self, page_id: str):
969
- self.page_id = page_id
970
-
971
969
 
972
970
  @dataclass
973
971
  class ConfluenceQualifiedID:
974
972
  page_id: str
975
- space_key: Optional[str] = None
976
-
977
- def __init__(self, page_id: str, space_key: Optional[str] = None):
978
- self.page_id = page_id
979
- self.space_key = space_key
980
-
981
-
982
- def extract_qualified_id(text: str) -> tuple[Optional[ConfluenceQualifiedID], str]:
983
- "Extracts the Confluence page ID and space key from a Markdown document."
984
-
985
- page_id, text = extract_value(r"<!--\s+confluence-page-id:\s*(\d+)\s+-->", text)
986
-
987
- if page_id is None:
988
- return None, text
989
-
990
- # extract Confluence space key
991
- space_key, text = extract_value(r"<!--\s+confluence-space-key:\s*(\S+)\s+-->", text)
992
-
993
- return ConfluenceQualifiedID(page_id, space_key), text
994
-
995
-
996
- def extract_frontmatter(text: str) -> tuple[Optional[str], str]:
997
- "Extracts the front matter from a Markdown document."
998
-
999
- return extract_value(r"(?ms)\A---$(.+?)^---$", text)
1000
-
1001
-
1002
- def extract_frontmatter_title(text: str) -> tuple[Optional[str], str]:
1003
- frontmatter, text = extract_frontmatter(text)
1004
-
1005
- title: Optional[str] = None
1006
- if frontmatter is not None:
1007
- properties = yaml.safe_load(frontmatter)
1008
- if isinstance(properties, dict):
1009
- property_title = properties.get("title")
1010
- if isinstance(property_title, str):
1011
- title = property_title
1012
-
1013
- return title, text
1014
-
1015
-
1016
- def read_qualified_id(absolute_path: Path) -> Optional[ConfluenceQualifiedID]:
1017
- "Reads the Confluence page ID and space key from a Markdown document."
1018
-
1019
- with open(absolute_path, "r", encoding="utf-8") as f:
1020
- document = f.read()
1021
-
1022
- qualified_id, _ = extract_qualified_id(document)
1023
- return qualified_id
973
+ space_key: str
1024
974
 
1025
975
 
1026
976
  @dataclass
@@ -1055,7 +1005,6 @@ class ConversionError(RuntimeError):
1055
1005
 
1056
1006
 
1057
1007
  class ConfluenceDocument:
1058
- id: ConfluenceQualifiedID
1059
1008
  title: Optional[str]
1060
1009
  links: list[str]
1061
1010
  images: list[Path]
@@ -1071,63 +1020,47 @@ class ConfluenceDocument:
1071
1020
  root_dir: Path,
1072
1021
  site_metadata: ConfluenceSiteMetadata,
1073
1022
  page_metadata: dict[Path, ConfluencePageMetadata],
1074
- ) -> "ConfluenceDocument":
1023
+ ) -> tuple[ConfluencePageID, "ConfluenceDocument"]:
1075
1024
  path = path.resolve(True)
1076
1025
 
1077
- with open(path, "r", encoding="utf-8") as f:
1078
- text = f.read()
1026
+ document = Scanner().read(path)
1079
1027
 
1080
- # extract Confluence page ID
1081
- qualified_id, text = extract_qualified_id(text)
1082
- if qualified_id is None:
1028
+ if document.page_id is not None:
1029
+ page_id = ConfluencePageID(document.page_id)
1030
+ else:
1083
1031
  # look up Confluence page ID in metadata
1084
1032
  metadata = page_metadata.get(path)
1085
1033
  if metadata is not None:
1086
- qualified_id = ConfluenceQualifiedID(
1087
- metadata.page_id, metadata.space_key
1088
- )
1089
- if qualified_id is None:
1090
- raise PageError("missing Confluence page ID")
1034
+ page_id = ConfluencePageID(metadata.page_id)
1035
+ else:
1036
+ raise PageError("missing Confluence page ID")
1091
1037
 
1092
- return ConfluenceDocument(
1093
- path, text, qualified_id, options, root_dir, site_metadata, page_metadata
1038
+ return page_id, ConfluenceDocument(
1039
+ path, document, options, root_dir, site_metadata, page_metadata
1094
1040
  )
1095
1041
 
1096
1042
  def __init__(
1097
1043
  self,
1098
1044
  path: Path,
1099
- text: str,
1100
- qualified_id: ConfluenceQualifiedID,
1045
+ document: ScannedDocument,
1101
1046
  options: ConfluenceDocumentOptions,
1102
1047
  root_dir: Path,
1103
1048
  site_metadata: ConfluenceSiteMetadata,
1104
1049
  page_metadata: dict[Path, ConfluencePageMetadata],
1105
1050
  ) -> None:
1106
1051
  self.options = options
1107
- self.id = qualified_id
1108
-
1109
- # extract frontmatter
1110
- self.title, text = extract_frontmatter_title(text)
1111
-
1112
- # extract 'generated-by' tag text
1113
- generated_by_tag, text = extract_value(
1114
- r"<!--\s+generated-by:\s*(.*)\s+-->", text
1115
- )
1116
1052
 
1117
1053
  # convert to HTML
1118
- html = markdown_to_html(text)
1054
+ html = markdown_to_html(document.text)
1119
1055
 
1120
1056
  # parse Markdown document
1121
1057
  if self.options.generated_by is not None:
1122
- if generated_by_tag is not None:
1123
- generated_by_text = generated_by_tag
1124
- else:
1125
- generated_by_text = self.options.generated_by
1058
+ generated_by = document.generated_by or self.options.generated_by
1126
1059
  else:
1127
- generated_by_text = None
1060
+ generated_by = None
1128
1061
 
1129
- if generated_by_text is not None:
1130
- generated_by_html = markdown_to_html(generated_by_text)
1062
+ if generated_by is not None:
1063
+ generated_by_html = markdown_to_html(generated_by)
1131
1064
 
1132
1065
  content = [
1133
1066
  '<ac:structured-macro ac:name="info" ac:schema-version="1">',
@@ -1161,8 +1094,7 @@ class ConfluenceDocument:
1161
1094
  self.images = converter.images
1162
1095
  self.embedded_images = converter.embedded_images
1163
1096
 
1164
- if self.title is None:
1165
- self.title = converter.toc.get_title()
1097
+ self.title = document.title or converter.toc.get_title()
1166
1098
 
1167
1099
  def xhtml(self) -> str:
1168
1100
  return elements_to_string(self.root)
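
The page ID lookup order in the reworked `ConfluenceDocument.create` can be sketched on its own. This is a minimal illustration, not the library's code: `resolve_page_id` and the reduced `PageMetadata` class are hypothetical stand-ins, and `PageError` is redefined locally because its real base class is not visible in this diff.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


class PageError(RuntimeError):
    "Local stand-in for md2conf's PageError (assumption: the real base class may differ)."


@dataclass
class PageMetadata:
    "Hypothetical, reduced stand-in for ConfluencePageMetadata."

    page_id: str


def resolve_page_id(
    inline_page_id: Optional[str],
    path: Path,
    page_metadata: dict[Path, PageMetadata],
) -> str:
    # 1. an ID found in the Markdown document itself takes precedence
    if inline_page_id is not None:
        return inline_page_id

    # 2. otherwise look up the ID recorded for this path during indexing
    metadata = page_metadata.get(path)
    if metadata is not None:
        return metadata.page_id

    # 3. no source left: same error message as in the diff
    raise PageError("missing Confluence page ID")


known = {Path("a.md"): PageMetadata("222")}
from_document = resolve_page_id("111", Path("a.md"), known)
from_metadata = resolve_page_id(None, Path("a.md"), known)
```

The space key no longer takes part in this lookup: the diff replaces `ConfluenceQualifiedID` with a plain `ConfluencePageID` at this point.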
md2conf/local.py CHANGED
@@ -12,16 +12,11 @@ import os
12
12
  from pathlib import Path
13
13
  from typing import Optional
14
14
 
15
- from .converter import (
16
- ConfluenceDocument,
17
- ConfluenceDocumentOptions,
18
- ConfluencePageID,
19
- ConfluenceQualifiedID,
20
- extract_qualified_id,
21
- )
15
+ from .converter import ConfluenceDocument, ConfluenceDocumentOptions, ConfluencePageID
22
16
  from .metadata import ConfluencePageMetadata, ConfluenceSiteMetadata
23
17
  from .processor import Converter, Processor, ProcessorFactory
24
18
  from .properties import PageError
19
+ from .scanner import Scanner
25
20
 
26
21
  LOGGER = logging.getLogger(__name__)
27
22
 
@@ -52,41 +47,39 @@ class LocalProcessor(Processor):
52
47
  self.out_dir = out_dir or root_dir
53
48
 
54
49
  def _get_or_create_page(
55
- self,
56
- absolute_path: Path,
57
- parent_id: Optional[ConfluencePageID],
58
- *,
59
- title: Optional[str] = None,
50
+ self, absolute_path: Path, parent_id: Optional[ConfluencePageID]
60
51
  ) -> ConfluencePageMetadata:
61
52
  """
62
53
  Extracts metadata from a Markdown file.
63
54
  """
64
55
 
65
56
  # parse file
66
- with open(absolute_path, "r", encoding="utf-8") as f:
67
- text = f.read()
68
-
69
- qualified_id, text = extract_qualified_id(text)
70
-
71
- if qualified_id is None:
57
+ document = Scanner().read(absolute_path)
58
+ if document.page_id is not None:
59
+ page_id = document.page_id
60
+ space_key = document.space_key or self.site.space_key or "HOME"
61
+ else:
72
62
  if parent_id is None:
73
63
  raise PageError(
74
64
  f"expected: parent page ID for Markdown file with no linked Confluence page: {absolute_path}"
75
65
  )
76
66
 
77
- hash = hashlib.md5(text.encode("utf-8"))
67
+ hash = hashlib.md5(document.text.encode("utf-8"))
78
68
  digest = "".join(f"{c:x}" for c in hash.digest())
79
69
  LOGGER.info("Identifier %s assigned to page: %s", digest, absolute_path)
80
- qualified_id = ConfluenceQualifiedID(digest)
70
+ page_id = digest
71
+ space_key = self.site.space_key or "HOME"
81
72
 
82
73
  return ConfluencePageMetadata(
83
- page_id=qualified_id.page_id,
84
- space_key=qualified_id.space_key,
74
+ page_id=page_id,
75
+ space_key=space_key,
85
76
  title="",
86
77
  overwrite=True,
87
78
  )
88
79
 
89
- def _save_document(self, document: ConfluenceDocument, path: Path) -> None:
80
+ def _save_document(
81
+ self, page_id: ConfluencePageID, document: ConfluenceDocument, path: Path
82
+ ) -> None:
90
83
  """
91
84
  Saves a new version of a Confluence document.
92
85
 
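
One detail of the identifier assigned in `_get_or_create_page` may be worth knowing when comparing it with other tools: the expression `"".join(f"{c:x}" for c in hash.digest())` formats each byte without zero-padding, so the result is not necessarily the same string as `hexdigest()`. The sketch below isolates that behavior; `unpadded_hex` is a hypothetical helper name used only for illustration.

```python
import hashlib


def unpadded_hex(data: bytes) -> str:
    # same formatting as in the diff: bytes below 0x10 produce a single hex digit
    return "".join(f"{c:x}" for c in data)


# a byte below 0x10 loses its leading zero
short_form = unpadded_hex(b"\x0a\xff")
padded_form = b"\x0a\xff".hex()

# different byte sequences can therefore map to the same string
collision_a = unpadded_hex(b"\x01\x23")
collision_b = unpadded_hex(b"\x12\x03")

# applied to an MD5 digest, the identifier has at most 32 characters
digest = hashlib.md5("# Example page\n".encode("utf-8")).digest()
identifier = unpadded_hex(digest)
```

For a locally generated identifier this is unlikely to matter in practice, but the value should not be assumed to be a canonical 32-character MD5 hex string.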
md2conf/matcher.py CHANGED
@@ -10,15 +10,15 @@ import os.path
10
10
  from dataclasses import dataclass
11
11
  from fnmatch import fnmatch
12
12
  from pathlib import Path
13
- from typing import Iterable, Optional
13
+ from typing import Iterable, Optional, Union, overload
14
14
 
15
15
 
16
- @dataclass
16
+ @dataclass(frozen=True)
17
17
  class Entry:
18
18
  """
19
19
  Represents a file or directory entry.
20
20
 
21
- :param name: Name of the file-system entry.
21
+ :param name: Name of the file-system entry to match against the rule-set.
22
22
  :param is_dir: True if the entry is a directory.
23
23
  """
24
24
 
@@ -43,6 +43,15 @@ class MatcherOptions:
43
43
  self.extension = f".{self.extension}"
44
44
 
45
45
 
46
+ def _entry_name_dir(entry: Union[Entry, os.DirEntry[str]]) -> tuple[str, bool]:
47
+ if isinstance(entry, Entry):
48
+ return entry.name, entry.is_dir
49
+ elif isinstance(entry, os.DirEntry):
50
+ return entry.name, entry.is_dir()
51
+ else:
52
+ raise NotImplementedError("type match not exhaustive")
53
+
54
+
46
55
  class Matcher:
47
56
  "Compares file and directory names against a list of exclude/include patterns."
48
57
 
@@ -58,20 +67,40 @@ class Matcher:
58
67
  else:
59
68
  self.rules = []
60
69
 
70
+ for rule in self.rules:
71
+ if "/" in rule or os.path.sep in rule:
72
+ raise ValueError(f"nested matching not supported: {rule}")
73
+
61
74
  def extension_matches(self, name: str) -> bool:
62
75
  "True if the file name has the expected extension."
63
76
 
64
77
  return self.options.extension is None or name.endswith(self.options.extension)
65
78
 
66
- def is_excluded(self, name: str, is_dir: bool) -> bool:
79
+ @overload
80
+ def is_excluded(self, entry: Entry) -> bool:
81
+ """
82
+ True if the file or directory name matches any of the exclusion patterns.
83
+
84
+ :param entry: A data-class object.
85
+ :returns: True if the name matches at least one of the exclusion patterns.
86
+ """
87
+
88
+ ...
89
+
90
+ @overload
91
+ def is_excluded(self, entry: os.DirEntry[str]) -> bool:
67
92
  """
68
93
  True if the file or directory name matches any of the exclusion patterns.
69
94
 
70
- :param name: Name to match against the rule-set.
71
- :param is_dir: Whether the name identifies a directory.
95
+ :param entry: An object returned by `scandir`.
72
96
  :returns: True if the name matches at least one of the exclusion patterns.
73
97
  """
74
98
 
99
+ ...
100
+
101
+ def is_excluded(self, entry: Union[Entry, os.DirEntry[str]]) -> bool:
102
+ name, is_dir = _entry_name_dir(entry)
103
+
75
104
  # skip hidden files and directories
76
105
  if name.startswith("."):
77
106
  return True
@@ -86,26 +115,38 @@ class Matcher:
86
115
  else:
87
116
  return False
88
117
 
89
- def is_included(self, name: str, is_dir: bool) -> bool:
118
+ @overload
119
+ def is_included(self, entry: Entry) -> bool:
120
+ """
121
+ True if the file or directory name matches none of the exclusion patterns.
122
+
123
+ :param entry: A data-class object.
124
+ :returns: True if the name doesn't match any of the exclusion patterns.
125
+ """
126
+ ...
127
+
128
+ @overload
129
+ def is_included(self, entry: os.DirEntry[str]) -> bool:
90
130
  """
91
131
  True if the file or directory name matches none of the exclusion patterns.
92
132
 
93
- :param name: Name to match against the rule-set.
94
- :param is_dir: Whether the name identifies a directory.
133
+ :param entry: An object returned by `scandir`.
95
134
  :returns: True if the name doesn't match any of the exclusion patterns.
96
135
  """
136
+ ...
97
137
 
98
- return not self.is_excluded(name, is_dir)
138
+ def is_included(self, entry: Union[Entry, os.DirEntry[str]]) -> bool:
139
+ return not self.is_excluded(entry)
99
140
 
100
- def filter(self, items: Iterable[Entry]) -> list[Entry]:
141
+ def filter(self, entries: Iterable[Entry]) -> list[Entry]:
101
142
  """
102
143
  Returns only those elements from the input that don't match any of the exclusion rules.
103
144
 
104
- :param items: A list of names to filter.
145
+ :param entries: A list of names to filter.
105
146
  :returns: A filtered list of names that didn't match any of the exclusion rules.
106
147
  """
107
148
 
108
- return [item for item in items if self.is_included(item.name, item.is_dir)]
149
+ return [entry for entry in entries if self.is_included(entry)]
109
150
 
110
151
  def scandir(self, path: Path) -> list[Entry]:
111
152
  """
md2conf/mermaid.py CHANGED
@@ -79,10 +79,16 @@ def render_diagram(source: str, output_format: Literal["png", "svg"] = "png") ->
79
79
  )
80
80
  stdout, stderr = proc.communicate(input=source.encode("utf-8"))
81
81
  if proc.returncode:
82
- raise RuntimeError(
83
- f"failed to convert Mermaid diagram; exit code: {proc.returncode}, "
84
- f"output:\n{stdout.decode('utf-8')}\n{stderr.decode('utf-8')}"
85
- )
82
+ messages = [
83
+ f"failed to convert Mermaid diagram; exit code: {proc.returncode}"
84
+ ]
85
+ console_output = stdout.decode("utf-8")
86
+ if console_output:
87
+ messages.append(f"output:\n{console_output}")
88
+ console_error = stderr.decode("utf-8")
89
+ if console_error:
90
+ messages.append(f"error:\n{console_error}")
91
+ raise RuntimeError("\n".join(messages))
86
92
  with open(filename, "rb") as image:
87
93
  return image.read()
88
94
 
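
The new error handling in `render_diagram` only adds the `output:` and `error:` sections when the corresponding stream is non-empty. A minimal sketch of that assembly follows; the helper name `format_failure` is an assumption introduced here, since the diff builds the message inline.

```python
def format_failure(returncode: int, stdout: bytes, stderr: bytes) -> str:
    # the first line is always present
    messages = [f"failed to convert Mermaid diagram; exit code: {returncode}"]

    # captured standard output is appended only if there is any
    console_output = stdout.decode("utf-8")
    if console_output:
        messages.append(f"output:\n{console_output}")

    # likewise for captured standard error
    console_error = stderr.decode("utf-8")
    if console_error:
        messages.append(f"error:\n{console_error}")

    return "\n".join(messages)


only_stderr = format_failure(1, b"", b"boom")
nothing_captured = format_failure(2, b"", b"")
```

Compared with the previous single f-string, a failing run with no console output no longer produces an `output:` label followed by empty lines.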
md2conf/metadata.py CHANGED
@@ -37,6 +37,6 @@ class ConfluencePageMetadata:
37
37
  """
38
38
 
39
39
  page_id: str
40
- space_key: Optional[str]
40
+ space_key: str
41
41
  title: str
42
42
  overwrite: bool
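
Because `space_key` is now a required `str`, callers have to supply a value even when the document does not name a space. In `md2conf/local.py` this is done with a fallback chain ending in the literal `"HOME"`. The sketch below restates that chain as a standalone function; `resolve_space_key` is a hypothetical name, and only the `or` chain itself comes from the diff.

```python
from typing import Optional


def resolve_space_key(document_key: Optional[str], site_key: Optional[str]) -> str:
    # first truthy value wins; "HOME" is the default used in the diff
    return document_key or site_key or "HOME"


from_document = resolve_space_key("DOC", "SITE")
from_site = resolve_space_key(None, "SITE")
default_key = resolve_space_key(None, None)
empty_is_missing = resolve_space_key("", "SITE")
```

Note that `or` tests truthiness rather than `is None`, so an empty string is treated the same way as a missing key.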
md2conf/processor.py CHANGED
@@ -69,18 +69,14 @@ class Processor:
69
69
  self._process_page(path)
70
70
 
71
71
  def _process_page(self, path: Path) -> None:
72
- document = ConfluenceDocument.create(
72
+ page_id, document = ConfluenceDocument.create(
73
73
  path, self.options, self.root_dir, self.site, self.page_metadata
74
74
  )
75
- self._save_document(document, path)
75
+ self._save_document(page_id, document, path)
76
76
 
77
77
  @abstractmethod
78
78
  def _get_or_create_page(
79
- self,
80
- absolute_path: Path,
81
- parent_id: Optional[ConfluencePageID],
82
- *,
83
- title: Optional[str] = None,
79
+ self, absolute_path: Path, parent_id: Optional[ConfluencePageID]
84
80
  ) -> ConfluencePageMetadata:
85
81
  """
86
82
  Creates a new Confluence page if no page is linked in the Markdown document.
@@ -88,7 +84,9 @@ class Processor:
88
84
  ...
89
85
 
90
86
  @abstractmethod
91
- def _save_document(self, document: ConfluenceDocument, path: Path) -> None: ...
87
+ def _save_document(
88
+ self, page_id: ConfluencePageID, document: ConfluenceDocument, path: Path
89
+ ) -> None: ...
92
90
 
93
91
  def _index_directory(
94
92
  self, local_dir: Path, parent_id: Optional[ConfluencePageID]
@@ -104,7 +102,7 @@ class Processor:
104
102
  files: list[Path] = []
105
103
  directories: list[Path] = []
106
104
  for entry in os.scandir(local_dir):
107
- if matcher.is_excluded(entry.name, entry.is_dir()):
105
+ if matcher.is_excluded(entry):
108
106
  continue
109
107
 
110
108
  if entry.is_file():
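
The call `matcher.is_excluded(entry)` above works because the matcher now accepts either its own `Entry` data class or an `os.DirEntry`. The following sketch shows the normalization step in isolation, assuming a reduced `Entry` with just the two documented fields; `list_entries` is a hypothetical helper added for the demonstration.

```python
import os
import tempfile
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Entry:
    name: str
    is_dir: bool


def entry_name_dir(entry: "Union[Entry, os.DirEntry[str]]") -> tuple[str, bool]:
    if isinstance(entry, Entry):
        # data class: `is_dir` is a plain attribute
        return entry.name, entry.is_dir
    elif isinstance(entry, os.DirEntry):
        # object from `os.scandir`: `is_dir` is a method
        return entry.name, entry.is_dir()
    else:
        raise NotImplementedError("type match not exhaustive")


def list_entries(path: str) -> list[tuple[str, bool]]:
    # hypothetical helper: normalize everything `os.scandir` yields
    with os.scandir(path) as iterator:
        return sorted(entry_name_dir(entry) for entry in iterator)


with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "docs"))
    with open(os.path.join(tmp, "index.md"), "w", encoding="utf-8") as f:
        f.write("# Index\n")
    scanned = list_entries(tmp)

from_dataclass = entry_name_dir(Entry("notes.md", False))
```

The `@overload` declarations in the diff exist only for type checkers; at run time a single implementation receives both kinds of object, as `entry_name_dir` does here.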
md2conf/scanner.py ADDED
@@ -0,0 +1,117 @@
+ """
+ Publish Markdown files to Confluence wiki.
+
+ Copyright 2022-2025, Levente Hunyadi
+
+ :see: https://github.com/hunyadi/md2conf
+ """
+
+ import re
+ from dataclasses import dataclass
+ from pathlib import Path
+ from typing import Any, Optional
+
+ import yaml
+
+
+ def extract_value(pattern: str, text: str) -> tuple[Optional[str], str]:
+     values: list[str] = []
+
+     def _repl_func(matchobj: re.Match) -> str:
+         values.append(matchobj.group(1))
+         return ""
+
+     text = re.sub(pattern, _repl_func, text, count=1, flags=re.ASCII)
+     value = values[0] if values else None
+     return value, text
+
+
+ def extract_frontmatter_block(text: str) -> tuple[Optional[str], str]:
+     "Extracts the front-matter from a Markdown document as a blob of unparsed text."
+
+     return extract_value(r"(?ms)\A---$(.+?)^---$", text)
+
+
+ def extract_frontmatter_properties(text: str) -> tuple[Optional[dict[str, Any]], str]:
+     "Extracts the front-matter from a Markdown document as a dictionary."
+
+     block, text = extract_frontmatter_block(text)
+
+     properties: Optional[dict[str, Any]] = None
+     if block is not None:
+         data = yaml.safe_load(block)
+         if isinstance(data, dict):
+             properties = data
+
+     return properties, text
+
+
+ def get_string(properties: dict[str, Any], key: str) -> Optional[str]:
+     value = properties.get(key)
+     if value is None:
+         return None
+     elif not isinstance(value, str):
+         raise ValueError(
+             f"expected dictionary value type of `str` for key `{key}`; got value of type `{type(value).__name__}`"
+         )
+     else:
+         return value
+
+
+ @dataclass
+ class ScannedDocument:
+     """
+     An object that holds properties extracted from a Markdown document, including remaining source text.
+
+     :param page_id: Confluence page ID.
+     :param space_key: Confluence space key.
+     :param generated_by: Text identifying the tool that generated the document.
+     :param title: The title extracted from front-matter.
+     :param text: Text that remains after front-matter and inline properties have been extracted.
+     """
+
+     page_id: Optional[str]
+     space_key: Optional[str]
+     generated_by: Optional[str]
+     title: Optional[str]
+     text: str
+
+
+ class Scanner:
+     def read(self, absolute_path: Path) -> ScannedDocument:
+         """
+         Extracts essential properties from a Markdown document.
+         """
+
+         # parse file
+         with open(absolute_path, "r", encoding="utf-8") as f:
+             text = f.read()
+
+         # extract Confluence page ID
+         page_id, text = extract_value(r"<!--\s+confluence-page-id:\s*(\d+)\s+-->", text)
+
+         # extract Confluence space key
+         space_key, text = extract_value(
+             r"<!--\s+confluence-space-key:\s*(\S+)\s+-->", text
+         )
+
+         # extract 'generated-by' tag text
+         generated_by, text = extract_value(r"<!--\s+generated-by:\s*(.*)\s+-->", text)
+
+         title: Optional[str] = None
+
+         # extract front-matter
+         properties, text = extract_frontmatter_properties(text)
+         if properties is not None:
+             page_id = page_id or get_string(properties, "confluence-page-id")
+             space_key = space_key or get_string(properties, "confluence-space-key")
+             generated_by = generated_by or get_string(properties, "generated-by")
+             title = get_string(properties, "title")
+
+         return ScannedDocument(
+             page_id=page_id,
+             space_key=space_key,
+             generated_by=generated_by,
+             title=title,
+             text=text,
+         )
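
The two extraction steps in `Scanner.read` can be tried without installing the package. The sketch below re-implements `extract_value` as shown in the diff and applies the same two patterns to a made-up sample document. YAML parsing is left out on purpose because PyYAML is a third-party dependency, so the front-matter is only returned as raw text here.

```python
import re
from typing import Optional


def extract_value(pattern: str, text: str) -> tuple[Optional[str], str]:
    "Removes the first match of `pattern` from `text` and returns its first capture group."

    values: list[str] = []

    def _repl(match: "re.Match[str]") -> str:
        values.append(match.group(1))
        return ""

    text = re.sub(pattern, _repl, text, count=1, flags=re.ASCII)
    return (values[0] if values else None), text


# made-up sample document for illustration
sample = (
    "---\n"
    "title: Example page\n"
    "---\n"
    "<!-- confluence-page-id: 12345 -->\n"
    "# Heading\n"
)

# step 1: the inline comment is removed first, wherever it appears
page_id, rest = extract_value(r"<!--\s+confluence-page-id:\s*(\d+)\s+-->", sample)

# step 2: the front-matter block must start at the very beginning of the text
block, body = extract_value(r"(?ms)\A---$(.+?)^---$", rest)
```

In the real `Scanner`, a value found in an inline comment takes precedence over the same key in front-matter, because the front-matter value is only consulted through `page_id or get_string(...)`.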
@@ -1,22 +0,0 @@
- markdown_to_confluence-0.3.4.dist-info/licenses/LICENSE,sha256=Pv43so2bPfmKhmsrmXFyAvS7M30-1i1tzjz6-dfhyOo,1077
- md2conf/__init__.py,sha256=9gI6OYCv9-54FzxjNHLOH09H5quUDEMWq9pdbhnwoXM,402
- md2conf/__main__.py,sha256=bFcfmSnTWeuhmDm7bJ3jJabZ2S8W9biuAP6_R-Cc9As,8034
- md2conf/api.py,sha256=ZIYoBXclLbzrrQ_oFRllsTEnQIMbxqd9OD80-AC5qM0,22769
- md2conf/application.py,sha256=eIVeAGUzfdIq1uYLYpTg30UNSq-YcUIY-OgKKK3M4E4,6436
- md2conf/converter.py,sha256=2Sgq1WQd-dCtrdTVrBwhowPC8PmubMNCH1aAcRwntjs,39404
- md2conf/emoji.py,sha256=48QJtOD0F3Be1laYLvAOwe0GxrJS-vcfjtCdiBsNcAc,1960
- md2conf/entities.dtd,sha256=M6NzqL5N7dPs_eUA_6sDsiSLzDaAacrx9LdttiufvYU,30215
- md2conf/local.py,sha256=AOuwyvPOXrRRPGOTDeoVYkMPJ9MI2zqRGAvHuY35wy4,3884
- md2conf/matcher.py,sha256=FgMFPvGiOqGezCs8OyerfsVo-iIHFoI6LRMzdcjM5UY,3693
- md2conf/mermaid.py,sha256=un_KHBDpG5Zad_QD3HN1uBwUxp4I-HVJYhNKbH7KwcA,2312
- md2conf/metadata.py,sha256=9BtNRsICbKzPTs63P70XekNARePdW1DtdKNJqXh2ZFM,1013
- md2conf/processor.py,sha256=Ko_3WqLK6jM-bEN7OD9Vc3g3vhSjRYawz3fG6uoUsXc,6733
- md2conf/properties.py,sha256=TOCXLdTfYkKjRwZaMgvXw0mNCI4opEUwpBXro2Kv2B4,2467
- md2conf/puppeteer-config.json,sha256=-dMTAN_7kNTGbDlfXzApl0KJpAWna9YKZdwMKbpOb60,159
- md2conf/py.typed,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
- markdown_to_confluence-0.3.4.dist-info/METADATA,sha256=PUtJXudDooVfwOzVtohxweWHMjgDv5CIrDvyqiJ0tlg,17745
- markdown_to_confluence-0.3.4.dist-info/WHEEL,sha256=zaaOINJESkSfm_4HQVc5ssNzHCPXhJm0kEUakpsEHaU,91
- markdown_to_confluence-0.3.4.dist-info/entry_points.txt,sha256=F1zxa1wtEObtbHS-qp46330WVFLHdMnV2wQ-ZorRmX0,50
- markdown_to_confluence-0.3.4.dist-info/top_level.txt,sha256=_FJfl_kHrHNidyjUOuS01ngu_jDsfc-ZjSocNRJnTzU,8
- markdown_to_confluence-0.3.4.dist-info/zip-safe,sha256=AbpHGcgLb-kRsJGnwFEktk7uzpZOCcBY74-YBdrKVGs,1
- markdown_to_confluence-0.3.4.dist-info/RECORD,,