python-hwpx 1.3__tar.gz → 1.4__tar.gz
- {python-hwpx-1.3/src/python_hwpx.egg-info → python-hwpx-1.4}/PKG-INFO +2 -1
- {python-hwpx-1.3 → python-hwpx-1.4}/README.md +1 -0
- {python-hwpx-1.3 → python-hwpx-1.4}/pyproject.toml +1 -1
- {python-hwpx-1.3 → python-hwpx-1.4}/src/hwpx/document.py +18 -0
- {python-hwpx-1.3 → python-hwpx-1.4}/src/hwpx/oxml/__init__.py +6 -0
- {python-hwpx-1.3 → python-hwpx-1.4}/src/hwpx/oxml/document.py +118 -1
- {python-hwpx-1.3 → python-hwpx-1.4}/src/hwpx/package.py +88 -0
- {python-hwpx-1.3 → python-hwpx-1.4/src/python_hwpx.egg-info}/PKG-INFO +2 -1
- {python-hwpx-1.3 → python-hwpx-1.4}/tests/test_integration_hwpx_compatibility.py +89 -0
- {python-hwpx-1.3 → python-hwpx-1.4}/LICENSE +0 -0
- {python-hwpx-1.3 → python-hwpx-1.4}/setup.cfg +0 -0
- {python-hwpx-1.3 → python-hwpx-1.4}/src/hwpx/__init__.py +0 -0
- {python-hwpx-1.3 → python-hwpx-1.4}/src/hwpx/data/Skeleton.hwpx +0 -0
- {python-hwpx-1.3 → python-hwpx-1.4}/src/hwpx/opc/package.py +0 -0
- {python-hwpx-1.3 → python-hwpx-1.4}/src/hwpx/oxml/body.py +0 -0
- {python-hwpx-1.3 → python-hwpx-1.4}/src/hwpx/oxml/common.py +0 -0
- {python-hwpx-1.3 → python-hwpx-1.4}/src/hwpx/oxml/header.py +0 -0
- {python-hwpx-1.3 → python-hwpx-1.4}/src/hwpx/oxml/parser.py +0 -0
- {python-hwpx-1.3 → python-hwpx-1.4}/src/hwpx/oxml/schema.py +0 -0
- {python-hwpx-1.3 → python-hwpx-1.4}/src/hwpx/oxml/utils.py +0 -0
- {python-hwpx-1.3 → python-hwpx-1.4}/src/hwpx/templates.py +0 -0
- {python-hwpx-1.3 → python-hwpx-1.4}/src/hwpx/tools/__init__.py +0 -0
- {python-hwpx-1.3 → python-hwpx-1.4}/src/hwpx/tools/_schemas/header.xsd +0 -0
- {python-hwpx-1.3 → python-hwpx-1.4}/src/hwpx/tools/_schemas/section.xsd +0 -0
- {python-hwpx-1.3 → python-hwpx-1.4}/src/hwpx/tools/object_finder.py +0 -0
- {python-hwpx-1.3 → python-hwpx-1.4}/src/hwpx/tools/text_extractor.py +0 -0
- {python-hwpx-1.3 → python-hwpx-1.4}/src/hwpx/tools/validator.py +0 -0
- {python-hwpx-1.3 → python-hwpx-1.4}/src/python_hwpx.egg-info/SOURCES.txt +0 -0
- {python-hwpx-1.3 → python-hwpx-1.4}/src/python_hwpx.egg-info/dependency_links.txt +0 -0
- {python-hwpx-1.3 → python-hwpx-1.4}/src/python_hwpx.egg-info/entry_points.txt +0 -0
- {python-hwpx-1.3 → python-hwpx-1.4}/src/python_hwpx.egg-info/requires.txt +0 -0
- {python-hwpx-1.3 → python-hwpx-1.4}/src/python_hwpx.egg-info/top_level.txt +0 -0
- {python-hwpx-1.3 → python-hwpx-1.4}/tests/test_document_formatting.py +0 -0
- {python-hwpx-1.3 → python-hwpx-1.4}/tests/test_inline_models.py +0 -0
- {python-hwpx-1.3 → python-hwpx-1.4}/tests/test_memo_and_style_editing.py +0 -0
- {python-hwpx-1.3 → python-hwpx-1.4}/tests/test_oxml_parsing.py +0 -0
- {python-hwpx-1.3 → python-hwpx-1.4}/tests/test_section_headers.py +0 -0
- {python-hwpx-1.3 → python-hwpx-1.4}/tests/test_text_extractor_annotations.py +0 -0
@@ -1,6 +1,6 @@
 Metadata-Version: 2.1
 Name: python-hwpx
-Version: 1.
+Version: 1.4
 Summary: A collection of Python utilities for loading and editing Hancom HWPX packages
 Author: python-hwpx Maintainers
 License: Non-Commercial License
@@ -39,6 +39,7 @@ Requires-Dist: pytest>=7.4; extra == "test"
 - **Typed body models** – `hwpx.oxml.body` maps tables, controls, inline shapes, and change-tracking tags to data classes, and `HwpxOxmlParagraph.model`/`HwpxOxmlRun.model` let you query and modify them and then write them back to XML.
 - **Memos and field anchors** – `add_memo_with_anchor()` creates a memo and automatically inserts a MEMO field control so that the memo is displayed right away in 한/글.
 - **Header reference list lookup** – Bullets, paragraph properties, styles, change-tracking entries, and author information are parsed into data classes, and lookup helpers such as `document.bullets` and `document.styles` simplify ID-based searches.
+- **Master-page, history, and version part control** – The master-page/history/version parts listed in the manifest can be edited and saved directly through `document.master_pages`, `document.histories`, and `document.version`.
 - **Style-based text replacement** – Filters by run formatting (color, underline, `charPrIDRef`) to selectively replace or delete text. Strings split by highlight
 markers or tags are also replaced with their formatting preserved.
 - **Text extraction pipeline** – `hwpx.tools.text_extractor.TextExtractor` returns paragraph text while rendering highlights, footnotes, and controls in the way you choose.
@@ -9,6 +9,7 @@
 - **Typed body models** – `hwpx.oxml.body` maps tables, controls, inline shapes, and change-tracking tags to data classes, and `HwpxOxmlParagraph.model`/`HwpxOxmlRun.model` let you query and modify them and then write them back to XML.
 - **Memos and field anchors** – `add_memo_with_anchor()` creates a memo and automatically inserts a MEMO field control so that the memo is displayed right away in 한/글.
 - **Header reference list lookup** – Bullets, paragraph properties, styles, change-tracking entries, and author information are parsed into data classes, and lookup helpers such as `document.bullets` and `document.styles` simplify ID-based searches.
+- **Master-page, history, and version part control** – The master-page/history/version parts listed in the manifest can be edited and saved directly through `document.master_pages`, `document.histories`, and `document.version`.
 - **Style-based text replacement** – Filters by run formatting (color, underline, `charPrIDRef`) to selectively replace or delete text. Strings split by highlight
 markers or tags are also replaced with their formatting preserved.
 - **Text extraction pipeline** – `hwpx.tools.text_extractor.TextExtractor` returns paragraph text while rendering highlights, footnotes, and controls in the way you choose.
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 
 [project]
 name = "python-hwpx"
-version = "1.
+version = "1.4"
 description = "A collection of Python utilities for loading and editing Hancom HWPX packages"
 readme = { file = "README.md", content-type = "text/markdown" }
 license = { text = "Non-Commercial License" }
@@ -13,13 +13,16 @@ from .oxml import (
     Bullet,
     HwpxOxmlDocument,
     HwpxOxmlHeader,
+    HwpxOxmlHistory,
     HwpxOxmlInlineObject,
+    HwpxOxmlMasterPage,
     HwpxOxmlMemo,
     HwpxOxmlParagraph,
     HwpxOxmlRun,
     HwpxOxmlSection,
     HwpxOxmlSectionHeaderFooter,
     HwpxOxmlTable,
+    HwpxOxmlVersion,
     MemoShape,
     ParagraphProperty,
     RunStyle,
@@ -89,6 +92,21 @@ class HwpxDocument:
         """Return the header parts referenced by the document."""
         return self._root.headers
 
+    @property
+    def master_pages(self) -> List[HwpxOxmlMasterPage]:
+        """Return the master-page parts declared in the manifest."""
+        return self._root.master_pages
+
+    @property
+    def histories(self) -> List[HwpxOxmlHistory]:
+        """Return document history parts referenced by the manifest."""
+        return self._root.histories
+
+    @property
+    def version(self) -> HwpxOxmlVersion | None:
+        """Return the version metadata part if present."""
+        return self._root.version
+
     @property
     def memo_shapes(self) -> dict[str, MemoShape]:
         """Return memo shapes available in the header reference lists."""
@@ -19,7 +19,9 @@ from .document import (
     DocumentNumbering,
     HwpxOxmlDocument,
     HwpxOxmlHeader,
+    HwpxOxmlHistory,
     HwpxOxmlInlineObject,
+    HwpxOxmlMasterPage,
     HwpxOxmlMemo,
     HwpxOxmlMemoGroup,
     HwpxOxmlParagraph,
@@ -30,6 +32,7 @@ from .document import (
     HwpxOxmlTable,
     HwpxOxmlTableCell,
     HwpxOxmlTableRow,
+    HwpxOxmlVersion,
     PageMargins,
     PageSize,
     RunStyle,
@@ -126,7 +129,9 @@ __all__ = [
     "DocumentNumbering",
     "HwpxOxmlDocument",
     "HwpxOxmlHeader",
+    "HwpxOxmlHistory",
     "HwpxOxmlInlineObject",
+    "HwpxOxmlMasterPage",
     "HwpxOxmlMemo",
     "HwpxOxmlMemoGroup",
     "HwpxOxmlParagraph",
@@ -137,6 +142,7 @@ __all__ = [
     "HwpxOxmlTable",
     "HwpxOxmlTableCell",
     "HwpxOxmlTableRow",
+    "HwpxOxmlVersion",
     "KeyDerivation",
     "KeyEncryption",
     "LinkInfo",
@@ -1999,6 +1999,61 @@ class HwpxOxmlParagraph:
         self.section.mark_dirty()
 
 
+class _HwpxOxmlSimplePart:
+    """Common base for standalone XML parts that are not sections or headers."""
+
+    def __init__(
+        self,
+        part_name: str,
+        element: ET.Element,
+        document: "HwpxOxmlDocument" | None = None,
+    ):
+        self.part_name = part_name
+        self._element = element
+        self._document = document
+        self._dirty = False
+
+    @property
+    def element(self) -> ET.Element:
+        return self._element
+
+    @property
+    def document(self) -> "HwpxOxmlDocument" | None:
+        return self._document
+
+    def attach_document(self, document: "HwpxOxmlDocument") -> None:
+        self._document = document
+
+    @property
+    def dirty(self) -> bool:
+        return self._dirty
+
+    def mark_dirty(self) -> None:
+        self._dirty = True
+
+    def reset_dirty(self) -> None:
+        self._dirty = False
+
+    def replace_element(self, element: ET.Element) -> None:
+        self._element = element
+        self.mark_dirty()
+
+    def to_bytes(self) -> bytes:
+        return _serialize_xml(self._element)
+
+
+class HwpxOxmlMasterPage(_HwpxOxmlSimplePart):
+    """Represents a master page part in the package."""
+
+
+class HwpxOxmlHistory(_HwpxOxmlSimplePart):
+    """Represents a document history part."""
+
+
+class HwpxOxmlVersion(_HwpxOxmlSimplePart):
+    """Represents the ``version.xml`` part."""
+
+
 class HwpxOxmlSection:
     """Represents the contents of a section XML part."""
 
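The dirty-flag contract of `_HwpxOxmlSimplePart` in the hunk above can be illustrated with a standalone sketch. `SimplePart` and `collect_updates` below are hypothetical stand-ins, not library code: the class mirrors the methods shown in the hunk, and `ET.tostring` is an assumed substitute for the private `_serialize_xml` helper, whose output is not shown in this diff. The sketch suggests that an in-place edit to `element` is not tracked until the caller invokes `mark_dirty()`, whereas `replace_element()` sets the flag itself.

```python
import xml.etree.ElementTree as ET


class SimplePart:
    """Hypothetical stand-in mirroring the methods added in the hunk."""

    def __init__(self, part_name: str, element: ET.Element) -> None:
        self.part_name = part_name
        self._element = element
        self._dirty = False

    @property
    def element(self) -> ET.Element:
        return self._element

    @property
    def dirty(self) -> bool:
        return self._dirty

    def mark_dirty(self) -> None:
        self._dirty = True

    def reset_dirty(self) -> None:
        self._dirty = False

    def replace_element(self, element: ET.Element) -> None:
        self._element = element
        self.mark_dirty()

    def to_bytes(self) -> bytes:
        # Assumption: ET.tostring stands in for the private _serialize_xml.
        return ET.tostring(self._element, encoding="utf-8")


def collect_updates(parts: list[SimplePart]) -> dict[str, bytes]:
    """Serialize only the parts flagged dirty, keyed by part name."""
    return {p.part_name: p.to_bytes() for p in parts if p.dirty}


part = SimplePart("version.xml", ET.Element("version"))
part.element.set("appVersion", "15.0.0.100 WIN32")  # in-place edit
untracked = collect_updates([part])  # the edit alone is not picked up
part.mark_dirty()
tracked = collect_updates([part])    # now the part is serialized
```

The round-trip test later in this diff follows the same pattern: it mutates `element` and then calls `mark_dirty()` before saving.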
@@ -2540,16 +2595,29 @@ class HwpxOxmlDocument:
         manifest: ET.Element,
         sections: Sequence[HwpxOxmlSection],
         headers: Sequence[HwpxOxmlHeader],
+        *,
+        master_pages: Sequence[HwpxOxmlMasterPage] | None = None,
+        histories: Sequence[HwpxOxmlHistory] | None = None,
+        version: HwpxOxmlVersion | None = None,
     ):
         self._manifest = manifest
         self._sections = list(sections)
         self._headers = list(headers)
+        self._master_pages = list(master_pages or [])
+        self._histories = list(histories or [])
+        self._version = version
         self._char_property_cache: dict[str, RunStyle] | None = None
 
         for section in self._sections:
             section.attach_document(self)
         for header in self._headers:
             header.attach_document(self)
+        for master_page in self._master_pages:
+            master_page.attach_document(self)
+        for history in self._histories:
+            history.attach_document(self)
+        if self._version is not None:
+            self._version.attach_document(self)
 
     @classmethod
     def from_package(cls, package: "HwpxPackage") -> "HwpxOxmlDocument":
@@ -2561,12 +2629,35 @@ class HwpxOxmlDocument:
         manifest = package.get_xml(package.MANIFEST_PATH)
         section_paths = package.section_paths()
         header_paths = package.header_paths()
+        master_page_paths = package.master_page_paths()
+        history_paths = package.history_paths()
+        version_path = package.version_path()
 
         sections = [
             HwpxOxmlSection(path, package.get_xml(path)) for path in section_paths
         ]
         headers = [HwpxOxmlHeader(path, package.get_xml(path)) for path in header_paths]
-
+        master_pages = [
+            HwpxOxmlMasterPage(path, package.get_xml(path))
+            for path in master_page_paths
+            if package.has_part(path)
+        ]
+        histories = [
+            HwpxOxmlHistory(path, package.get_xml(path))
+            for path in history_paths
+            if package.has_part(path)
+        ]
+        version = None
+        if version_path and package.has_part(version_path):
+            version = HwpxOxmlVersion(version_path, package.get_xml(version_path))
+        return cls(
+            manifest,
+            sections,
+            headers,
+            master_pages=master_pages,
+            histories=histories,
+            version=version,
+        )
 
     @property
     def manifest(self) -> ET.Element:
@@ -2580,6 +2671,18 @@ class HwpxOxmlDocument:
     def headers(self) -> List[HwpxOxmlHeader]:
         return list(self._headers)
 
+    @property
+    def master_pages(self) -> List[HwpxOxmlMasterPage]:
+        return list(self._master_pages)
+
+    @property
+    def histories(self) -> List[HwpxOxmlHistory]:
+        return list(self._histories)
+
+    @property
+    def version(self) -> HwpxOxmlVersion | None:
+        return self._version
+
     def _ensure_char_property_cache(self) -> dict[str, RunStyle]:
         if self._char_property_cache is None:
             mapping: dict[str, RunStyle] = {}
@@ -2812,6 +2915,14 @@ class HwpxOxmlDocument:
                 headers_dirty = True
         if headers_dirty:
             self.invalidate_char_property_cache()
+        for master_page in self._master_pages:
+            if master_page.dirty:
+                updates[master_page.part_name] = master_page.to_bytes()
+        for history in self._histories:
+            if history.dirty:
+                updates[history.part_name] = history.to_bytes()
+        if self._version is not None and self._version.dirty:
+            updates[self._version.part_name] = self._version.to_bytes()
         return updates
 
     def reset_dirty(self) -> None:
@@ -2820,3 +2931,9 @@ class HwpxOxmlDocument:
             section.reset_dirty()
         for header in self._headers:
             header.reset_dirty()
+        for master_page in self._master_pages:
+            master_page.reset_dirty()
+        for history in self._histories:
+            history.reset_dirty()
+        if self._version is not None:
+            self._version.reset_dirty()
@@ -11,6 +11,21 @@ from zipfile import ZIP_DEFLATED, ZipFile
 _OPF_NS = "http://www.idpf.org/2007/opf/"
 
 
+def _normalized_manifest_value(element: ET.Element) -> str:
+    values = [
+        element.attrib.get("id", ""),
+        element.attrib.get("href", ""),
+        element.attrib.get("media-type", ""),
+        element.attrib.get("properties", ""),
+    ]
+    return " ".join(part.lower() for part in values if part)
+
+
+def _manifest_matches(element: ET.Element, *candidates: str) -> bool:
+    normalized = _normalized_manifest_value(element)
+    return any(candidate in normalized for candidate in candidates if candidate)
+
+
 def _ensure_bytes(value: bytes | str | ET.Element) -> bytes:
     if isinstance(value, bytes):
         return value
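To see how the two manifest helpers in the hunk above behave, the sketch below re-implements them outside the package and runs them against a small, made-up OPF manifest. The helper bodies follow the diff; the `SAMPLE` document, its item IDs, and the public function names are assumptions for illustration only. Because the attribute values are lowercased before the substring test, candidates need to be given in lowercase.

```python
import xml.etree.ElementTree as ET

OPF_NS = "http://www.idpf.org/2007/opf/"


def normalized_manifest_value(element: ET.Element) -> str:
    # Join the lowercased id/href/media-type/properties attributes.
    keys = ("id", "href", "media-type", "properties")
    values = [element.attrib.get(key, "") for key in keys]
    return " ".join(value.lower() for value in values if value)


def manifest_matches(element: ET.Element, *candidates: str) -> bool:
    # Plain substring test against the normalized attribute string.
    normalized = normalized_manifest_value(element)
    return any(c in normalized for c in candidates if c)


# Hypothetical manifest, loosely modeled on the items used in the test hunk.
SAMPLE = f"""<opf:package xmlns:opf="{OPF_NS}">
  <opf:manifest>
    <opf:item id="master-page-0" href="Contents/masterPages/masterPage0.xml"
              media-type="application/xml"/>
    <opf:item id="history" href="Contents/history.xml"
              media-type="application/xml"/>
    <opf:item id="section0" href="Contents/section0.xml"
              media-type="application/xml"/>
  </opf:manifest>
</opf:package>"""

root = ET.fromstring(SAMPLE)
items = root.findall("./opf:manifest/opf:item", {"opf": OPF_NS})

master_hrefs = [
    item.get("href")
    for item in items
    if manifest_matches(item, "masterpage", "master-page")
]
history_hrefs = [
    item.get("href") for item in items if manifest_matches(item, "history")
]
# A mixed-case candidate never matches the lowercased string.
mixed_case_hit = manifest_matches(items[0], "masterPage")
```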
@@ -38,6 +53,10 @@ class HwpxPackage:
         self._spine_cache: list[str] | None = None
         self._section_paths_cache: list[str] | None = None
         self._header_paths_cache: list[str] | None = None
+        self._master_page_paths_cache: list[str] | None = None
+        self._history_paths_cache: list[str] | None = None
+        self._version_path_cache: str | None = None
+        self._version_path_cache_resolved = False
 
     # -- construction ----------------------------------------------------
     @classmethod
@@ -85,6 +104,12 @@ class HwpxPackage:
             self._spine_cache = None
             self._section_paths_cache = None
             self._header_paths_cache = None
+            self._master_page_paths_cache = None
+            self._history_paths_cache = None
+            self._version_path_cache = None
+            self._version_path_cache_resolved = False
+        elif part_name == "version.xml":
+            self._version_path_cache_resolved = False
 
     def get_xml(self, part_name: str) -> ET.Element:
         return ET.fromstring(self.get_part(part_name))
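The separate `_version_path_cache_resolved` flag in the hunks above appears to exist because `None` is itself a valid cached answer ("no version part"), so it cannot double as the "not computed yet" marker the list caches use. A minimal sketch of that pattern follows; `VersionPathCache`, its `lookups` counter, and the simplified lookup rule are hypothetical and only illustrate the caching idea, not the package's actual resolution logic.

```python
from typing import Optional


class VersionPathCache:
    """Hypothetical sketch: cache an optional value behind a resolved flag."""

    def __init__(self, parts: dict[str, bytes]) -> None:
        self._parts = parts
        self._path: Optional[str] = None
        self._resolved = False
        self.lookups = 0  # counts how often the real lookup runs

    def _lookup(self) -> Optional[str]:
        self.lookups += 1
        return "version.xml" if "version.xml" in self._parts else None

    def version_path(self) -> Optional[str]:
        if not self._resolved:
            self._path = self._lookup()
            self._resolved = True
        return self._path

    def set_part(self, name: str, payload: bytes) -> None:
        self._parts[name] = payload
        if name == "version.xml":
            # Writing the part may change the answer, so drop the cache.
            self._resolved = False


cache = VersionPathCache({})
first = cache.version_path()    # looked up: no version part yet
second = cache.version_path()   # served from the cache, still None
lookups_before_write = cache.lookups
cache.set_part("version.xml", b"<version/>")
third = cache.version_path()    # looked up again after invalidation
```

With a `None`-only sentinel, the second call would have repeated the lookup every time the part is absent.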
@@ -101,6 +126,11 @@ class HwpxPackage:
             self._manifest_tree = self.get_xml(self.MANIFEST_PATH)
         return self._manifest_tree
 
+    def _manifest_items(self) -> list[ET.Element]:
+        manifest = self.manifest_tree()
+        ns = {"opf": _OPF_NS}
+        return list(manifest.findall("./opf:manifest/opf:item", ns))
+
     def _resolve_spine_paths(self) -> list[str]:
         if self._spine_cache is None:
             manifest = self.manifest_tree()
@@ -155,6 +185,64 @@ class HwpxPackage:
             self._header_paths_cache = paths
         return list(self._header_paths_cache)
 
+    def master_page_paths(self) -> list[str]:
+        if self._master_page_paths_cache is None:
+            from pathlib import PurePosixPath
+
+            paths = [
+                item.attrib.get("href", "")
+                for item in self._manifest_items()
+                if _manifest_matches(item, "masterpage", "master-page")
+                and item.attrib.get("href")
+            ]
+
+            if not paths:
+                paths = [
+                    name
+                    for name in self._parts.keys()
+                    if "master" in PurePosixPath(name).name.lower()
+                    and "page" in PurePosixPath(name).name.lower()
+                ]
+
+            self._master_page_paths_cache = paths
+        return list(self._master_page_paths_cache)
+
+    def history_paths(self) -> list[str]:
+        if self._history_paths_cache is None:
+            from pathlib import PurePosixPath
+
+            paths = [
+                item.attrib.get("href", "")
+                for item in self._manifest_items()
+                if _manifest_matches(item, "history")
+                and item.attrib.get("href")
+            ]
+
+            if not paths:
+                paths = [
+                    name
+                    for name in self._parts.keys()
+                    if "history" in PurePosixPath(name).name.lower()
+                ]
+
+            self._history_paths_cache = paths
+        return list(self._history_paths_cache)
+
+    def version_path(self) -> str | None:
+        if not self._version_path_cache_resolved:
+            path: str | None = None
+            for item in self._manifest_items():
+                if _manifest_matches(item, "version"):
+                    href = item.attrib.get("href", "").strip()
+                    if href:
+                        path = href
+                        break
+            if path is None and self.has_part("version.xml"):
+                path = "version.xml"
+            self._version_path_cache = path
+            self._version_path_cache_resolved = True
+        return self._version_path_cache
+
     # -- saving ----------------------------------------------------------
     def save(
         self,
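When the manifest yields no match, `master_page_paths()` in the hunk above falls back to scanning part names. The sketch below isolates that fallback; `guess_master_page_parts` and the sample part names are hypothetical, while the matching rule (both `master` and `page` in the lowercased file name) is taken from the diff. It shows that only the final path component is inspected, so a file that merely sits in a `masterPages/` directory is not picked up.

```python
from pathlib import PurePosixPath


def guess_master_page_parts(part_names: list[str]) -> list[str]:
    """Return part names whose file name contains 'master' and 'page'."""
    matches = []
    for name in part_names:
        # .name drops the directories, so only the file name is tested.
        filename = PurePosixPath(name).name.lower()
        if "master" in filename and "page" in filename:
            matches.append(name)
    return matches


# Hypothetical package contents for illustration.
names = [
    "Contents/masterPages/masterPage0.xml",
    "Contents/masterPages/notes.xml",
    "Contents/section0.xml",
    "version.xml",
]
found = guess_master_page_parts(names)
```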
@@ -1,6 +1,6 @@
 Metadata-Version: 2.1
 Name: python-hwpx
-Version: 1.
+Version: 1.4
 Summary: A collection of Python utilities for loading and editing Hancom HWPX packages
 Author: python-hwpx Maintainers
 License: Non-Commercial License
@@ -39,6 +39,7 @@ Requires-Dist: pytest>=7.4; extra == "test"
 - **Typed body models** – `hwpx.oxml.body` maps tables, controls, inline shapes, and change-tracking tags to data classes, and `HwpxOxmlParagraph.model`/`HwpxOxmlRun.model` let you query and modify them and then write them back to XML.
 - **Memos and field anchors** – `add_memo_with_anchor()` creates a memo and automatically inserts a MEMO field control so that the memo is displayed right away in 한/글.
 - **Header reference list lookup** – Bullets, paragraph properties, styles, change-tracking entries, and author information are parsed into data classes, and lookup helpers such as `document.bullets` and `document.styles` simplify ID-based searches.
+- **Master-page, history, and version part control** – The master-page/history/version parts listed in the manifest can be edited and saved directly through `document.master_pages`, `document.histories`, and `document.version`.
 - **Style-based text replacement** – Filters by run formatting (color, underline, `charPrIDRef`) to selectively replace or delete text. Strings split by highlight
 markers or tags are also replaced with their formatting preserved.
 - **Text extraction pipeline** – `hwpx.tools.text_extractor.TextExtractor` returns paragraph text while rendering highlights, footnotes, and controls in the way you choose.
@@ -1,6 +1,7 @@
 from __future__ import annotations
 
 import io
+import xml.etree.ElementTree as ET
 from pathlib import Path
 from zipfile import ZIP_DEFLATED, ZIP_STORED, ZipFile
 
@@ -9,6 +10,7 @@ import pytest
 from hwpx.document import HwpxDocument
 from hwpx.package import HwpxPackage
 from hwpx.tools import load_default_schemas, validate_document
+from hwpx.templates import blank_document_bytes
 
 _MIMETYPE = b"application/hwp+zip"
 _VERSION_XML = (
@@ -137,3 +139,90 @@ def test_fixture_validates_against_reference_schemas(
 
     bytes_report = validate_document(sample_document_bytes)
     assert bytes_report.ok, "Generated sample failed schema validation from bytes"
+
+
+def test_master_page_history_and_version_round_trip(tmp_path: Path) -> None:
+    package = HwpxPackage.open(blank_document_bytes())
+
+    manifest = package.manifest_tree()
+    ns = {"opf": "http://www.idpf.org/2007/opf/"}
+    manifest_list = manifest.find(f"{{{ns['opf']}}}manifest")
+    assert manifest_list is not None
+
+    def add_manifest_item(item_id: str, href: str) -> None:
+        ET.SubElement(
+            manifest_list,
+            f"{{{ns['opf']}}}item",
+            {"id": item_id, "href": href, "media-type": "application/xml"},
+        )
+
+    add_manifest_item("master-page-0", "Contents/masterPages/masterPage0.xml")
+    add_manifest_item("history", "Contents/history.xml")
+    add_manifest_item("version", "version.xml")
+    package.set_xml(package.MANIFEST_PATH, manifest)
+
+    hm_ns = "http://www.hancom.co.kr/hwpml/2011/master-page"
+    master_root = ET.Element(f"{{{hm_ns}}}masterPage")
+    ET.SubElement(
+        master_root,
+        f"{{{hm_ns}}}masterPageItem",
+        {"id": "0", "type": "BOTH", "name": "초기 바탕쪽"},
+    )
+    package.set_xml("Contents/masterPages/masterPage0.xml", master_root)
+
+    hhs_ns = "http://www.hancom.co.kr/hwpml/2011/history"
+    history_root = ET.Element(f"{{{hhs_ns}}}history")
+    history_entry = ET.SubElement(history_root, f"{{{hhs_ns}}}historyEntry", {"id": "0"})
+    comment = ET.SubElement(history_entry, f"{{{hhs_ns}}}comment")
+    comment.text = "초기 내역"
+    package.set_xml("Contents/history.xml", history_root)
+
+    document = HwpxDocument.from_package(package)
+
+    assert len(document.master_pages) == 1
+    assert len(document.histories) == 1
+    version_part = document.version
+    assert version_part is not None
+
+    master_page = document.master_pages[0]
+    master_item = master_page.element.find(f"{{{hm_ns}}}masterPageItem")
+    assert master_item is not None
+    master_item.set("name", "검토용 바탕쪽")
+    master_page.mark_dirty()
+
+    history_part = document.histories[0]
+    history_comment = history_part.element.find(
+        f"{{{hhs_ns}}}historyEntry/{{{hhs_ns}}}comment"
+    )
+    assert history_comment is not None
+    history_comment.text = "업데이트된 변경 기록"
+    history_part.mark_dirty()
+
+    version_part.element.set("appVersion", "15.0.0.100 WIN32")
+    version_part.mark_dirty()
+
+    output_path = tmp_path / "master_history_roundtrip.hwpx"
+    document.save(output_path)
+
+    reopened = HwpxDocument.open(output_path)
+    assert reopened.master_pages
+    assert reopened.histories
+    reopened_version = reopened.version
+    assert reopened_version is not None
+
+    reopened_master_item = reopened.master_pages[0].element.find(
+        f"{{{hm_ns}}}masterPageItem"
+    )
+    assert reopened_master_item is not None
+    assert reopened_master_item.get("name") == "검토용 바탕쪽"
+
+    reopened_history_comment = reopened.histories[0].element.find(
+        f"{{{hhs_ns}}}historyEntry/{{{hhs_ns}}}comment"
+    )
+    assert reopened_history_comment is not None
+    assert reopened_history_comment.text == "업데이트된 변경 기록"
+
+    assert reopened_version.element.get("appVersion") == "15.0.0.100 WIN32"
+    assert "Contents/masterPages/masterPage0.xml" in reopened.package.master_page_paths()
+    assert "Contents/history.xml" in reopened.package.history_paths()
+    assert reopened.package.version_path() == "version.xml"
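The test above builds namespaced elements with f-strings such as `f"{{{hm_ns}}}masterPageItem"`. The doubled braces are f-string escapes, so the result is ElementTree's `{namespace}tag` form. A small standalone sketch, using the master-page namespace URI from the test, shows the resulting tag and that the same element can be found either with the qualified name or with a prefix map; the variable names and the `hm` prefix are chosen here for illustration.

```python
import xml.etree.ElementTree as ET

HM_NS = "http://www.hancom.co.kr/hwpml/2011/master-page"

# "{{" and "}}" are literal braces; "{HM_NS}" is the interpolated value.
root_tag = f"{{{HM_NS}}}masterPage"
root = ET.Element(root_tag)
ET.SubElement(root, f"{{{HM_NS}}}masterPageItem", {"id": "0", "type": "BOTH"})

# Serialize and parse again to mimic a save/open round trip.
parsed = ET.fromstring(ET.tostring(root))

# Lookup with the fully qualified tag name...
by_qualified_name = parsed.find(f"{{{HM_NS}}}masterPageItem")
# ...or with a prefix mapped to the same namespace URI.
by_prefix = parsed.find("hm:masterPageItem", {"hm": HM_NS})
```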
File without changes