PyPI - python-hwpx - Versions diffs - 2.10.1__tar.gz → 2.10.2__tar.gz - Mend

Files changed (113)

{python_hwpx-2.10.1/src/python_hwpx.egg-info → python_hwpx-2.10.2}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: python-hwpx
-Version: 2.10.1
+Version: 2.10.2
 Summary: A Python automation library for opening, editing, creating, and validating HWPX documents without Hangul (the Hancom word processor)
 Author: python-hwpx Maintainers
 License-Expression: Apache-2.0
@@ -115,6 +115,47 @@ hwpx-validate-package 보고서.hwpx
 hwpx-analyze-template 보고서.hwpx
 ```
+### 4. Rich Markdown conversion (preserves formatting, tables, footnotes, and images)
+`export_markdown()` extracts plain text only, whereas `export_rich_markdown()` also preserves inline formatting (`**bold**`, `*italic*`, `~~strikethrough~~`),
+tables (including nested ones, with safe colspan/rowspan handling), shape text, images, footnotes/endnotes, hyperlinks, and auto-detected headings (`#`/`##`).
+```python
+from hwpx import HwpxDocument
+doc = HwpxDocument.open("보고서.hwpx")
+md = doc.export_rich_markdown(
+    image_dir="out/images",          # extract BinData images to disk
+    image_ref_prefix="images/",      # path prefix for ![](images/...) references in the Markdown
+    detect_headings=True,            # auto-detect #/## from Ⅰ./1. numbering patterns
+)
+print(md)
+```
+The module-level `export_markdown` also accepts strings, paths, and bytes as-is:
+```python
+from hwpx.tools.markdown_export import export_markdown
+md = export_markdown("보고서.hwpx")          # path
+md = export_markdown(open("a.hwpx", "rb").read())  # bytes
+```
+### 5. Adding mixed formatting and hyperlinks to a footnote body
+`HwpxOxmlNote` provides the `body_paragraph`, `add_run`, and `add_hyperlink` helpers, so a footnote body
+can be filled with inline formatting and links without handling the paragraph directly.
+```python
+para = section.paragraphs[0]
+note = para.add_footnote("")  # create an empty footnote, then build its body
+note.add_run("자세한 내용은 ", )
+note.add_run("정부 공식 사이트", bold=True)
+note.add_run("를 참고하라: ")
+note.add_hyperlink("https://www.kasa.go.kr", "우주항공청")
+```
 To start, you only need the `open/new -> edit/extract -> save_to_path` flow. Package structure, XML parts, and template regression checks can be added only when you need them.
 ## Where to start reading
@@ -244,6 +285,7 @@ doc.set_footer_text("1 / 10", page_type="BOTH")
 # Merge and split table cells
 table.merge_cells(0, 0, 1, 1)   # merge (0,0) through (1,1)
 table.set_cell_text(0, 0, "병합된 셀", logical=True, split_merged=True)
+table.set_cell_text(0, 0, "line 1\nline 2", split_paragraphs=True)
 # Auto-fill a form-style table
 form = doc.add_table(2, 2)
@@ -257,6 +299,12 @@ doc.fill_by_path({
 })
 ```
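+Indices in `doc.paragraphs` are 0-based and count only paragraphs directly under the body. Paragraphs inside tables
+are not mixed into the body `paragraph_index`; they are addressed through the cell `location` from `get_table_map()`
+(`table_index`, `row`, `col`, `cell_paragraph_index`).
+`get_table_map()` returns `caption_text` and `preceding_paragraph_text` separately,
+and keeps multiple paragraphs in a cell preview joined with `\n`.
 ### 🔍 Text extraction & search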
 ```python

{python_hwpx-2.10.1 → python_hwpx-2.10.2}/README.md RENAMED

@@ -79,6 +79,47 @@ hwpx-validate-package 보고서.hwpx
 hwpx-analyze-template 보고서.hwpx
 ```
+### 4. Rich Markdown conversion (preserves formatting, tables, footnotes, and images)
+`export_markdown()` extracts plain text only, whereas `export_rich_markdown()` also preserves inline formatting (`**bold**`, `*italic*`, `~~strikethrough~~`),
+tables (including nested ones, with safe colspan/rowspan handling), shape text, images, footnotes/endnotes, hyperlinks, and auto-detected headings (`#`/`##`).
+```python
+from hwpx import HwpxDocument
+doc = HwpxDocument.open("보고서.hwpx")
+md = doc.export_rich_markdown(
+    image_dir="out/images",          # extract BinData images to disk
+    image_ref_prefix="images/",      # path prefix for ![](images/...) references in the Markdown
+    detect_headings=True,            # auto-detect #/## from Ⅰ./1. numbering patterns
+)
+print(md)
+```
+The module-level `export_markdown` also accepts strings, paths, and bytes as-is:
+```python
+from hwpx.tools.markdown_export import export_markdown
+md = export_markdown("보고서.hwpx")          # path
+md = export_markdown(open("a.hwpx", "rb").read())  # bytes
+```
+### 5. Adding mixed formatting and hyperlinks to a footnote body
+`HwpxOxmlNote` provides the `body_paragraph`, `add_run`, and `add_hyperlink` helpers, so a footnote body
+can be filled with inline formatting and links without handling the paragraph directly.
+```python
+para = section.paragraphs[0]
+note = para.add_footnote("")  # create an empty footnote, then build its body
+note.add_run("자세한 내용은 ", )
+note.add_run("정부 공식 사이트", bold=True)
+note.add_run("를 참고하라: ")
+note.add_hyperlink("https://www.kasa.go.kr", "우주항공청")
+```
 To start, you only need the `open/new -> edit/extract -> save_to_path` flow. Package structure, XML parts, and template regression checks can be added only when you need them.
 ## Where to start reading
@@ -208,6 +249,7 @@ doc.set_footer_text("1 / 10", page_type="BOTH")
 # Merge and split table cells
 table.merge_cells(0, 0, 1, 1)   # merge (0,0) through (1,1)
 table.set_cell_text(0, 0, "병합된 셀", logical=True, split_merged=True)
+table.set_cell_text(0, 0, "line 1\nline 2", split_paragraphs=True)
 # Auto-fill a form-style table
 form = doc.add_table(2, 2)
@@ -221,6 +263,12 @@ doc.fill_by_path({
 })
 ```
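+Indices in `doc.paragraphs` are 0-based and count only paragraphs directly under the body. Paragraphs inside tables
+are not mixed into the body `paragraph_index`; they are addressed through the cell `location` from `get_table_map()`
+(`table_index`, `row`, `col`, `cell_paragraph_index`).
+`get_table_map()` returns `caption_text` and `preceding_paragraph_text` separately,
+and keeps multiple paragraphs in a cell preview joined with `\n`.
 ### 🔍 Text extraction & search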
 ```python

{python_hwpx-2.10.1 → python_hwpx-2.10.2}/pyproject.toml RENAMED

@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 [project]
 name = "python-hwpx"
-version = "2.10.1"
+version = "2.10.2"
 description = "A Python automation library for opening, editing, creating, and validating HWPX documents without Hangul (the Hancom word processor)"
 readme = { file = "README.md", content-type = "text/markdown" }
 license = "Apache-2.0"

{python_hwpx-2.10.1 → python_hwpx-2.10.2}/src/hwpx/document.py RENAMED

@@ -1472,6 +1472,14 @@ class HwpxDocument:
         from .tools.exporter import export_markdown
         return export_markdown(self, **kwargs)  # type: ignore[arg-type]
+    def export_rich_markdown(self, **kwargs: object) -> str:
+        """Export rich Markdown preserving inline styles, tables, footnotes, hyperlinks, images, and shape text.
+        Keyword args forwarded to :func:`~hwpx.tools.markdown_export.export_markdown`.
+        """
+        from .tools.markdown_export import export_markdown as _rich
+        return _rich(self, **kwargs)  # type: ignore[arg-type]
     # ------------------------------------------------------------------
     # Validation
     # ------------------------------------------------------------------

{python_hwpx-2.10.1 → python_hwpx-2.10.2}/src/hwpx/oxml/document.py RENAMED

@@ -1872,6 +1872,68 @@ class HwpxOxmlNote:
         t.text = _sanitize_text(value)
         self.paragraph.section.mark_dirty()
+    @property
+    def body_paragraph(self) -> "HwpxOxmlParagraph":
+        """Return the note's body ``<hp:p>`` wrapped as :class:`HwpxOxmlParagraph`.
+        The body lives inside ``<hp:subList>`` and is distinct from
+        :attr:`paragraph`, which is the *hosting* paragraph (where the note
+        marker is inserted). Use this to add runs with mixed formatting
+        directly into the note body:
+        >>> note = para.add_footnote("기본 ")
+        >>> note.add_run("청색", char_pr_id_ref=5)
+        """
+        p = self.element.find(f".//{_HP}p")
+        if p is None:
+            raise ValueError("note has no body paragraph element")
+        return HwpxOxmlParagraph(p, self.paragraph.section)
+    def add_run(
+        self,
+        text: str = "",
+        *,
+        char_pr_id_ref: str | int | None = None,
+        bold: bool = False,
+        italic: bool = False,
+        underline: bool = False,
+        color: str | None = None,
+        font: str | None = None,
+        size: int | float | None = None,
+        highlight: str | None = None,
+        strike: bool | None = None,
+        attributes: dict[str, str] | None = None,
+    ) -> "HwpxOxmlRun":
+        """Append a run to the note body paragraph (delegates to body_paragraph.add_run)."""
+        return self.body_paragraph.add_run(
+            text,
+            char_pr_id_ref=char_pr_id_ref,
+            bold=bold,
+            italic=italic,
+            underline=underline,
+            color=color,
+            font=font,
+            size=size,
+            highlight=highlight,
+            strike=strike,
+            attributes=attributes,
+        )
+    def add_hyperlink(
+        self,
+        url: str,
+        display_text: str,
+        *,
+        char_pr_id_ref: str | int | None = None,
+    ) -> "HwpxOxmlInlineObject":
+        """Append a hyperlink to the note body paragraph.
+        Convenience wrapper around ``body_paragraph.add_hyperlink``.
+        """
+        return self.body_paragraph.add_hyperlink(
+            url, display_text, char_pr_id_ref=char_pr_id_ref
+        )
 def _default_sublist_attributes() -> dict[str, str]:
     """Return standard attributes for a ``<hp:subList>`` element.
@@ -2425,6 +2487,9 @@ class HwpxOxmlTableCell:
     @property
     def text(self) -> str:
+        paragraphs = self.paragraphs
+        if paragraphs:
+            return "\n".join(paragraph.text or "" for paragraph in paragraphs)
         parts: list[str] = []
         for t_elem in self.element.findall(f".//{_HP}t"):
             if t_elem.text:
@@ -2433,8 +2498,79 @@ class HwpxOxmlTableCell:
     @text.setter
     def text(self, value: str) -> None:
+        self.set_text(value)
+    def _first_run_char_pr_id_ref(self) -> str:
+        for paragraph in self.paragraphs:
+            for run in paragraph.runs:
+                if run.char_pr_id_ref is not None:
+                    return str(run.char_pr_id_ref)
+        return "0"
+    def _paragraph_format_attrs(self, paragraph: "HwpxOxmlParagraph" | None = None) -> dict[str, str]:
+        source = paragraph.element if paragraph is not None else None
+        attrs = dict(_default_cell_paragraph_attributes())
+        if source is not None:
+            for key in ("paraPrIDRef", "styleIDRef", "pageBreak", "columnBreak", "merged"):
+                value = source.get(key)
+                if value is not None:
+                    attrs[key] = value
+        attrs["id"] = _paragraph_id()
+        return attrs
+    def _run_char_pr_for_line(self, paragraphs: Sequence["HwpxOxmlParagraph"], index: int) -> str:
+        if index < len(paragraphs):
+            for run in paragraphs[index].runs:
+                if run.char_pr_id_ref is not None:
+                    return str(run.char_pr_id_ref)
+        return self._first_run_char_pr_id_ref()
+    def _set_split_paragraph_text(self, value: str) -> None:
+        sublist = self._ensure_sublist()
+        existing = self.paragraphs
+        lines = (value or "").replace("\r\n", "\n").replace("\r", "\n").split("\n")
+        if not lines:
+            lines = [""]
+        for paragraph in list(sublist.findall(f"{_HP}p")):
+            sublist.remove(paragraph)
+        for index, line in enumerate(lines):
+            source = existing[index] if index < len(existing) else existing[0] if existing else None
+            paragraph = _append_child(sublist, f"{_HP}p", self._paragraph_format_attrs(source))
+            run = _append_child(
+                paragraph,
+                f"{_HP}run",
+                {"charPrIDRef": self._run_char_pr_for_line(existing, index)},
+            )
+            _append_text_with_tabs(run, line)
+    def set_text(
+        self,
+        value: str,
+        *,
+        preserve_format: bool = True,
+        split_paragraphs: bool = False,
+    ) -> None:
+        if split_paragraphs:
+            self._set_split_paragraph_text(value)
+            self.element.set("dirty", "1")
+            self.table.mark_dirty()
+            return
         text_element = self._ensure_text_element()
         text_element.text = _sanitize_text(value)
+        for node in self.element.findall(f".//{_HP}t"):
+            if node is text_element:
+                continue
+            if node.text:
+                node.text = ""
+        if not preserve_format:
+            run = text_element
+            while run is not None and _element_local_name(run) != "run":
+                run = run.getparent() if hasattr(run, "getparent") else None
+            if run is not None:
+                run.set("charPrIDRef", "0")
         self.element.set("dirty", "1")
         self.table.mark_dirty()
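Note on the hunk above: the newline handling added to `HwpxOxmlTableCell` can be checked in isolation. The following is a minimal sketch, not the library's code. `split_cell_lines` and `join_cell_paragraphs` are hypothetical stand-ins that reproduce only the string logic of `_set_split_paragraph_text` and the new `text` getter, with no XML involved.

```python
from typing import Iterable, List, Optional


def split_cell_lines(value: Optional[str]) -> List[str]:
    """Normalize CRLF and bare CR to LF, then split into one string per paragraph.

    Mirrors the expression used in the hunk; the function name is hypothetical.
    """
    return (value or "").replace("\r\n", "\n").replace("\r", "\n").split("\n")


def join_cell_paragraphs(paragraph_texts: Iterable[Optional[str]]) -> str:
    """Join per-paragraph texts with LF, treating a missing text as empty."""
    return "\n".join(text or "" for text in paragraph_texts)


if __name__ == "__main__":
    lines = split_cell_lines("line 1\r\nline 2\rline 3")
    print(lines)
    # Writing with split_paragraphs=True and reading back should round-trip
    # the LF-normalized text under these assumptions.
    print(repr(join_cell_paragraphs(lines)))
    # Empty or missing input still yields a single empty line.
    print(split_cell_lines(None))
```

Because `str.split` with an explicit separator always returns at least one element, the `if not lines` fallback in the hunk appears to be unreachable; it reads as a harmless defensive guard.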
@@ -2898,6 +3034,8 @@ class HwpxOxmlTable:
         *,
         logical: bool = False,
         split_merged: bool = False,
+        preserve_format: bool = True,
+        split_paragraphs: bool = False,
     ) -> None:
         if logical:
             entry = self._grid_entry(row_index, col_index)
@@ -2907,7 +3045,11 @@ class HwpxOxmlTable:
                 cell = entry.cell
         else:
             cell = self.cell(row_index, col_index)
-        cell.text = text
+        cell.set_text(
+            text,
+            preserve_format=preserve_format,
+            split_paragraphs=split_paragraphs,
+        )
     def split_merged_cell(
         self, row_index: int, col_index: int
@@ -3797,7 +3939,10 @@ class HwpxOxmlParagraph:
         sublist = _append_child(note_element, f"{_HP}subList", _default_sublist_attributes())
         p_attrs = {"id": _paragraph_id(), **_DEFAULT_PARAGRAPH_ATTRS}
         paragraph = _append_child(sublist, f"{_HP}p", p_attrs)
-        note_run = _append_child(paragraph, f"{_HP}run", {"charPrIDRef": "0"})
+        # Apply the char_pr_id_ref argument to the body run's charPrIDRef as well (same style as the host run).
+        # If None, use "0" (default).
+        body_cpr = "0" if char_pr_id_ref is None else str(char_pr_id_ref)
+        note_run = _append_child(paragraph, f"{_HP}run", {"charPrIDRef": body_cpr})
         t = _append_child(note_run, f"{_HP}t", {})
         t.text = _sanitize_text(text)
         self.section.mark_dirty()
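The note structure that `body_paragraph` and the `add_footnote` change rely on can be sketched with the standard library alone. This is an illustrative sketch under stated assumptions: the namespace URI, the `footNote` element name, and the helper names `build_note` and `find_body_paragraph` are assumptions made for the example and are not taken from the diff. Only the `subList` > `p` > `run` > `t` nesting, the `.//p` lookup, and the `"0"` fallback for `charPrIDRef` come from the hunks above.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Union

# Assumed namespace for the hp: prefix; the diff only shows the _HP constant.
HP_NS = "http://www.hancom.co.kr/hwpml/2011/paragraph"
HP = f"{{{HP_NS}}}"


def build_note(text: str, char_pr_id_ref: Optional[Union[str, int]] = None) -> ET.Element:
    """Build a note element whose body run follows char_pr_id_ref, else "0"."""
    body_cpr = "0" if char_pr_id_ref is None else str(char_pr_id_ref)
    note = ET.Element(f"{HP}footNote")  # element name is an assumption
    sublist = ET.SubElement(note, f"{HP}subList")
    paragraph = ET.SubElement(sublist, f"{HP}p")
    run = ET.SubElement(paragraph, f"{HP}run", {"charPrIDRef": body_cpr})
    ET.SubElement(run, f"{HP}t").text = text
    return note


def find_body_paragraph(note: ET.Element) -> ET.Element:
    """Return the first <hp:p> below the note, as the body_paragraph lookup does."""
    p = note.find(f".//{HP}p")
    if p is None:
        raise ValueError("note has no body paragraph element")
    return p


if __name__ == "__main__":
    styled = find_body_paragraph(build_note("body ", char_pr_id_ref=5))
    default = find_body_paragraph(build_note("body "))
    # The first child of the body paragraph is the run created above.
    print(styled[0].get("charPrIDRef"), default[0].get("charPrIDRef"))
```

Looking up the body paragraph from the note element, rather than from the hosting paragraph, is what keeps runs appended by `add_run` inside the note instead of in the surrounding text.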
