PyPI - libefiling - Versions diffs - 0.1.58__tar.gz → 0.2.0__tar.gz - Mend

libefiling 0.1.58tar.gz → 0.2.0tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (34)

{libefiling-0.1.58 → libefiling-0.2.0}/.gitignore RENAMED Viewed

@@ -178,3 +178,4 @@ old
 output2
 ./images
 ./images
+.envrc

{libefiling-0.1.58 → libefiling-0.2.0}/PKG-INFO RENAMED Viewed

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: libefiling
-Version: 0.1.58
+Version: 0.2.0
 Summary: A Python library for e-filing systems.
 Project-URL: Homepage, https://github.com/hyperion13th144m/libefiling
 Project-URL: Repository, https://github.com/hyperion13th144m/libefiling
@@ -13,7 +13,6 @@ Classifier: Operating System :: POSIX :: Linux
 Classifier: Programming Language :: Python :: 3
 Requires-Python: >=3.12
 Requires-Dist: asn1crypto<2.0.0,>=1.5.1
-Requires-Dist: dotenv<0.10.0,>=0.9.9
 Requires-Dist: pillow>=12.1.1
 Requires-Dist: pydantic<3.0.0,>=2.12.5
 Requires-Dist: pytesseract<0.4.0,>=0.3.13
@@ -54,7 +53,7 @@ pip install libefiling
 ## Usage
 ```python
-from libefiling import parse_archive, ImageConvertParam, generate_sha256, get_document_code, get_doc_id
+from libefiling import parse_archive, ImageConvertParam, Source
 params = [
     ImageConvertParam(
@@ -89,8 +88,13 @@ OUT='output'
 ###     "chemical-formulas", "figures", "equations", "tables", "other-images", "ALL"
 ### ]
 ocr_target = ["other-images"]
-doc_id = generate_sha256(SRC)
-if doc_id === '...':
+# Example: generate the hash value and document code of src, then decide whether to process it
+source = Source.create(SRC)
+document_code = source.get_document_code()
+if document_code not in ['A163', 'A151']:
+    raise ValueError(f"Unsupported document code: {document_code}")
+if source.sha256 == '...':
     print("Already processed")
 else:
   parse_archive(
@@ -102,8 +106,7 @@ else:
     image_max_workers=0,  # 0: set automatically from the CPU count
   )
-print(get_document_code("output/manifest.json"))
-print(get_doc_id("output/manifest.json"))
 ```
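  - generate_sha256 generates a hash value from the archive contents, which can be used to decide whether reprocessing is needed.
  - parse_archive extracts SRC and PROC into OUT. The fourth argument accepts image conversion parameters.
@@ -112,8 +115,7 @@ Various files are extracted into OUT. The fifth argument is the OCR target
   - When image_max_workers is 1: serial execution
   - When image_max_workers is 2 or greater: thread-parallel execution
   - When image_max_workers is 0: set automatically based on the CPU count
- - Given the path to a manifest.json generated by parse_archive, get_document_code returns the document code (e.g. A163).
- - Given the path to a manifest.json generated by parse_archive, get_doc_id returns the doc_id.
+  - The source returned by source = Source.create(SRC) holds the same content as recorded in manifest.json and xml/sources.xml. This means source.sha256 is available before calling parse_archive.
 ### Options for faster image conversion
 By default, resizing uses Pillow. If the environment variable LIBEFILING_RESIZER_BACKEND is set,
@@ -193,3 +195,15 @@ MIT License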
 0.1.56
  - pillow-simd can now be selected for image resizing.
+0.1.60
+ - The get_document_code function now returns the document code when given an archive path or a procedure file, not only a manifest.json.
+ - manifest.json now includes the document code.
+0.2.0
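+ - Changed the document field of manifest.json to the sources field.
+   - The children of sources are now archive and procedure rather than an array.
+   - The sources.document_code field holds the document code.
+ - Discontinued the get_document_code function; use get_document_code of the Source class instead.
+ - Discontinued the get_doc_id and generate_sha256 functions; use sha256 of the Source class instead.
+ - xml/sources.xml is now written out. It holds the same content as the sources field of manifest.json.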

{libefiling-0.1.58 → libefiling-0.2.0}/README.md RENAMED Viewed

@@ -33,7 +33,7 @@ pip install libefiling
 ## Usage
 ```python
-from libefiling import parse_archive, ImageConvertParam, generate_sha256, get_document_code, get_doc_id
+from libefiling import parse_archive, ImageConvertParam, Source
 params = [
     ImageConvertParam(
@@ -68,8 +68,13 @@ OUT='output'
 ###     "chemical-formulas", "figures", "equations", "tables", "other-images", "ALL"
 ### ]
 ocr_target = ["other-images"]
-doc_id = generate_sha256(SRC)
-if doc_id === '...':
+# Example: generate the hash value and document code of src, then decide whether to process it
+source = Source.create(SRC)
+document_code = source.get_document_code()
+if document_code not in ['A163', 'A151']:
+    raise ValueError(f"Unsupported document code: {document_code}")
+if source.sha256 == '...':
     print("Already processed")
 else:
   parse_archive(
@@ -81,8 +86,7 @@ else:
     image_max_workers=0,  # 0: set automatically from the CPU count
   )
-print(get_document_code("output/manifest.json"))
-print(get_doc_id("output/manifest.json"))
 ```
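  - generate_sha256 generates a hash value from the archive contents, which can be used to decide whether reprocessing is needed.
  - parse_archive extracts SRC and PROC into OUT. The fourth argument accepts image conversion parameters.
@@ -91,8 +95,7 @@ Various files are extracted into OUT. The fifth argument is the OCR target
   - When image_max_workers is 1: serial execution
   - When image_max_workers is 2 or greater: thread-parallel execution
   - When image_max_workers is 0: set automatically based on the CPU count
- - Given the path to a manifest.json generated by parse_archive, get_document_code returns the document code (e.g. A163).
- - Given the path to a manifest.json generated by parse_archive, get_doc_id returns the doc_id.
+  - The source returned by source = Source.create(SRC) holds the same content as recorded in manifest.json and xml/sources.xml. This means source.sha256 is available before calling parse_archive.
 ### Options for faster image conversion
 By default, resizing uses Pillow. If the environment variable LIBEFILING_RESIZER_BACKEND is set,
@@ -172,3 +175,15 @@ MIT License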
 0.1.56
  - pillow-simd can now be selected for image resizing.
+0.1.60
+ - The get_document_code function now returns the document code when given an archive path or a procedure file, not only a manifest.json.
+ - manifest.json now includes the document code.
+0.2.0
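+ - Changed the document field of manifest.json to the sources field.
+   - The children of sources are now archive and procedure rather than an array.
+   - The sources.document_code field holds the document code.
+ - Discontinued the get_document_code function; use get_document_code of the Source class instead.
+ - Discontinued the get_doc_id and generate_sha256 functions; use sha256 of the Source class instead.
+ - xml/sources.xml is now written out. It holds the same content as the sources field of manifest.json.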
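
Note for consumers of manifest.json: the 0.2.0 changelog entry above replaces the document object (with a sources array) by a sources object with archive and procedure children. A reader that must accept both layouts could look roughly like the sketch below. It is not part of libefiling: the helper name archive_sha256 and the sample dictionaries are hypothetical, and it assumes that in the 0.1.x layout the first entry of document.sources is the archive, which matches the order used by the old process_manifest shown later in this diff.

```python
import json


def archive_sha256(manifest: dict) -> str:
    """Return the archive's SHA-256 from either manifest layout (hypothetical helper)."""
    if "sources" in manifest:
        # 0.2.0 layout: "sources" is an object with "archive" and "procedure" children.
        return manifest["sources"]["archive"]["sha256"]
    # 0.1.x layout: "document.sources" is a list (assumption: the archive comes first).
    return manifest["document"]["sources"][0]["sha256"]


# Made-up sample data; real manifests contain many more fields.
old_manifest = json.loads(
    '{"document": {"doc_id": "abc", "sources": [{"sha256": "abc"}, {"sha256": "def"}]}}'
)
new_manifest = json.loads(
    '{"sources": {"document_code": "A163",'
    ' "archive": {"sha256": "abc"}, "procedure": {"sha256": "def"}}}'
)

print(archive_sha256(old_manifest))
print(archive_sha256(new_manifest))
```

Checking for the new key first keeps the fallback path limited to manifests written by 0.1.x releases.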

{libefiling-0.1.58 → libefiling-0.2.0}/docs/manifest.md RENAMED Viewed

@@ -40,7 +40,7 @@ manifest.json is based on the following design principles.
 {
   "manifest_version": "1.0.0",
   "generator": { ... },
-  "document": { ... },
+  "sources": { ... },
   "paths": { ... },
   "xml_files": [ ... ],
   "images": [ ... ],
@@ -71,12 +71,11 @@ manifest.json is based on the following design principles.
 - Used for reproducibility and debugging
-## 4.3 document
+## 4.3 sources
 ```json
-"document": {
-  "doc_id": "D000001",
-  "sources": [
-    {
+"sources": {
+  "document_code": "A163",
+  "archive": {
     "filename": "...AAA.JWX",
     "sha256": "...",
     "byte_size": 12345678,
@@ -84,7 +83,7 @@ manifest.json is based on the following design principles.
     "kind": "AA",
     "extension": ".JWX"
   },
-   {
+  "procedure": {
     "filename": "...AFM.XML",
     "sha256": "...",
     "byte_size": 4220,
@@ -92,13 +91,12 @@ manifest.json is based on the following design principles.
     "kind": "FM",
     "extension": ".XML"
   }
-  ]
 }
 ```
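-- doc_id is an ID that uniquely identifies this document unit
-- source is information about the original files
-- archive_sha256 is used for reprocessing checks and tracking
+- document_code is the classification code of the document
+- archive and procedure hold information about the original files
+- sha256 is a hash value generated from the file contents of archive and procedure. It can be used to determine whether a file has already been processed
 - task, kind, and extension are the task area, kind, and extension of the archive, derived from the file name
 - The values of task are as follows
   - A: Application
@@ -116,7 +114,6 @@ manifest.json is based on the following design principles.
   - ER: emergency evacuation transmission file
   - FM: procedure information management file
   - XX: unknown (when none of the above applies)
-- procedure_source is information about the procedure information file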
 ### 4.4 paths
 ```json

{libefiling-0.1.58 → libefiling-0.2.0}/pyproject.toml RENAMED Viewed

@@ -1,6 +1,6 @@
 [project]
 name = "libefiling"
-version = "0.1.58"
+version = "0.2.0"
 description = "A Python library for e-filing systems."
 authors = [{ name = "hyperion13th144m", email = "hyperion13th144m@gmail.com" }]
 requires-python = ">=3.12"
@@ -16,7 +16,6 @@ dependencies = [
     "asn1crypto (>=1.5.1,<2.0.0)",
     "pytesseract (>=0.3.13,<0.4.0)",
     "pydantic (>=2.12.5,<3.0.0)",
-    "dotenv (>=0.9.9,<0.10.0)",
     "pillow>=12.1.1",
 ]
@@ -42,3 +41,22 @@ include = ["src/libefiling"]
 [build-system]
 requires = ["hatchling"]
 build-backend = "hatchling.build"
+[tool.ruff.lint]
+# 1. Enable flake8-bugbear (`B`) rules, in addition to the defaults.
+select = ["E4", "E7", "E9", "F", "B"]
+# 2. Avoid enforcing line-length violations (`E501`)
+ignore = ["E501"]
+# 3. Avoid trying to fix flake8-bugbear (`B`) violations.
+unfixable = ["B"]
+# 4. Ignore `E402` (import violations) in all `__init__.py` files, and in selected subdirectories.
+[tool.ruff.lint.per-file-ignores]
+"**/{tests,docs,tools}/*" = ["E402"]
+"__init__.py" = ["E402"]
+[tool.ruff.format]
+# 5. Use double quotes in `ruff format`.
+quote-style = "double"

{libefiling-0.1.58 → libefiling-0.2.0}/src/libefiling/__init__.py RENAMED Viewed

@@ -1,5 +1,4 @@
 from .archive.utils import generate_sha256
 from .image.params import ImageConvertParam
-from .manifest import Manifest, get_doc_id
+from .manifest import Manifest, Source
 from .parse import parse_archive
-from .xml.utils import get_document_code

{libefiling-0.1.58 → libefiling-0.2.0}/src/libefiling/archive/utils.py RENAMED Viewed

@@ -2,19 +2,19 @@ import hashlib
 from pathlib import Path
-def generate_sha256(archive_path: str | Path) -> str:
-    """return document sha256 based on archive_path content
+def generate_sha256(file_path: str | Path) -> str:
+    """return document sha256 based on file_path content
     Args:
-        archive_path (str | Path): archive path
+        file_path (str | Path): file path
     Returns:
         str: document sha256
     """
     sha256_hash = hashlib.sha256()
-    if isinstance(archive_path, Path):
-        archive_path = str(archive_path)
-    with open(archive_path, "rb") as f:
+    if isinstance(file_path, Path):
+        file_path = str(file_path)
+    with open(file_path, "rb") as f:
         # Read and update hash string value in blocks of 4K
         for byte_block in iter(lambda: f.read(4096), b""):
             sha256_hash.update(byte_block)
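
Note: the block-wise hashing above is what makes the "already processed" check in the README cheap on large archives. The sketch below shows the same technique in a self-contained form together with a simple lookup against a set of known digests. The names sha256_of_file and needs_processing are hypothetical and not part of libefiling, and returning hexdigest() is an assumption, since the return statement of generate_sha256 is outside the hunk.

```python
from __future__ import annotations

import hashlib
import tempfile
from pathlib import Path


def sha256_of_file(file_path: str | Path, block_size: int = 4096) -> str:
    """Hash a file in fixed-size blocks so it never has to fit in memory."""
    digest = hashlib.sha256()
    with open(file_path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def needs_processing(file_path: str | Path, processed: set[str]) -> bool:
    """True when the file's digest is not among the already processed digests."""
    return sha256_of_file(file_path) not in processed


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.bin"  # stand-in for an archive file
    sample.write_bytes(b"example archive bytes" * 1000)
    digest = sha256_of_file(sample)
    first = needs_processing(sample, set())
    second = needs_processing(sample, {digest})
    # The block-wise digest equals the digest of the whole content hashed at once.
    matches_one_shot = digest == hashlib.sha256(sample.read_bytes()).hexdigest()

print(first, second, matches_one_shot)
```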

{libefiling-0.1.58 → libefiling-0.2.0}/src/libefiling/default_config.py RENAMED Viewed

@@ -1,4 +1,4 @@
-from .image.params import ImageConvertParam
+from .image.params import ImageAttribute, ImageConvertParam
 defaultImageParams = [
     ImageConvertParam(
@@ -6,20 +6,20 @@ defaultImageParams = [
         height=300,
         suffix="-thumbnail",
         format=".webp",
-        attributes=[{"key": "sizeTag", "value": "thumbnail"}],
+        attributes=[ImageAttribute(key="sizeTag", value="thumbnail")],
     ),
     ImageConvertParam(
         width=600,
         height=600,
         suffix="-middle",
         format=".webp",
-        attributes=[{"key": "sizeTag", "value": "middle"}],
+        attributes=[ImageAttribute(key="sizeTag", value="middle")],
     ),
     ImageConvertParam(
         width=800,
         height=0,
         suffix="-large",
         format=".webp",
-        attributes=[{"key": "sizeTag", "value": "large"}],
+        attributes=[ImageAttribute(key="sizeTag", value="large")],
     ),
 ]

{libefiling-0.1.58 → libefiling-0.2.0}/src/libefiling/manifest.py RENAMED Viewed

@@ -1,12 +1,13 @@
 from __future__ import annotations
 from datetime import datetime
-from enum import Enum
 from pathlib import Path
-from typing import List, Literal, Optional, get_args
+from typing import List, Optional
+from xml.etree import ElementTree as ET
 from pydantic import BaseModel, Field
+from libefiling.archive.utils import generate_sha256
 from libefiling.image.kind import IMAGE_KIND
 from libefiling.xml.kind import XML_KIND
@@ -30,13 +31,14 @@ class Source(BaseModel):
     extension: str
     @classmethod
-    def create(cls, file_path: str, sha256: str) -> Source:
+    def create(cls, file_path: str | Path) -> Source:
         """Create Source from file path
         Args:
-            file_path (str): file path
+            file_path (str | Path): file path
         """
         filename = Path(file_path).name
+        sha256 = generate_sha256(file_path)
         byte_size = Path(file_path).stat().st_size
         if len(filename) == 63:
             task = filename[56 : 56 + 1]
@@ -54,10 +56,55 @@ class Source(BaseModel):
             extension=extension,
         )
+    def get_document_code(self) -> str:
+        """Get document code from archive file name
+        Returns:
+            str: document code (e.g. A163), or "UNKNOWN" if the file name is shorter than 29 characters
+        """
+        if len(self.filename) < 29:
+            return "UNKNOWN"
+        else:
+            return self.filename[19 : 19 + 9].replace("_", "").strip()
-class DocumentInfo(BaseModel):
-    doc_id: str
-    sources: List[Source]
+class Sources(BaseModel):
+    document_code: str
+    archive: Source
+    procedure: Source
+    def save_as_xml(self, xml_path: str) -> None:
+        """Save Sources as XML file
+        Args:
+            xml_path (str): XML file path to save
+        """
+        root = ET.Element("sources", attrib={"document-code": self.document_code})
+        for source in [self.archive, self.procedure]:
+            ET.SubElement(
+                root,
+                "source",
+                attrib={
+                    "filename": source.filename,
+                    "sha256": source.sha256,
+                    "byte-size": str(source.byte_size),
+                    "task": source.task,
+                    "kind": source.kind,
+                    "extension": source.extension,
+                },
+            )
+        tree = ET.ElementTree(root)
+        tree.write(xml_path, encoding="utf-8", xml_declaration=True)
+    def to_xml_file(self, xml_path: str) -> XmlFile:
+        return XmlFile(
+            filename=Path(xml_path).name,
+            original_filename=None,
+            sha256=generate_sha256(xml_path),
+            encoding=EncodingInfo(detected="UTF-8", normalized_to="UTF-8"),
+            kind="source",
+        )
 # -------------------------
@@ -151,23 +198,8 @@ class Stats(BaseModel):
 class Manifest(BaseModel):
     manifest_version: str = "1.0.0"
     generator: GeneratorInfo
-    document: DocumentInfo
+    sources: Sources
     paths: Paths = Paths()
     xml_files: List[XmlFile] = []
     images: List[ImageEntry] = []
     stats: Stats
-    images: List[ImageEntry] = []
-    stats: Stats
-def get_doc_id(manifest_path: str) -> str | None:
-    """Get document ID from manifest file
-    Args:
-        manifest_path (str): manifest file path (e.g. manifest.json)
-    Returns:
-        str: document ID (e.g. 2024000000000)
-    """
-    mp = Path(manifest_path)
-    manifest = Manifest.model_validate_json(mp.read_text(encoding="utf-8"))
-    return manifest.document.doc_id.strip() if manifest.document.doc_id else None
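
Note: in 0.2.0 the document code is no longer read from procedure.xml; Source.get_document_code slices it out of the archive file name. The sketch below reproduces that slicing as a standalone function so the behavior can be tried without the package. The function name document_code_from_filename is hypothetical, and the sample file name is synthetic: it is padded only so that characters 19 to 27 carry the code and says nothing about the real naming convention.

```python
def document_code_from_filename(filename: str) -> str:
    """Mirror the slicing shown in Source.get_document_code (standalone sketch)."""
    if len(filename) < 29:
        return "UNKNOWN"
    # Characters 19..27 hold the code, padded with underscores.
    return filename[19:28].replace("_", "").strip()


# Synthetic 63-character name: 19 filler chars, a 9-char code field, more filler, suffix.
synthetic_name = "0" * 19 + "A163_____" + "_" * 28 + "AAA.JWX"

code = document_code_from_filename(synthetic_name)
short = document_code_from_filename("short.JWX")
print(len(synthetic_name), code, short)
```

Because the method returns the string "UNKNOWN" rather than None, callers that previously tested for None would need to compare against that sentinel instead.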

{libefiling-0.1.58 → libefiling-0.2.0}/src/libefiling/parse.py RENAMED Viewed

@@ -1,18 +1,16 @@
 import os
-import shutil
 from concurrent.futures import ThreadPoolExecutor
 from datetime import datetime
 from importlib.metadata import version as get_version
 from itertools import chain
 from pathlib import Path
-from typing import Iterable, Iterator, List, Literal, Union, get_args
+from typing import Iterable, Iterator, List
 from libefiling.archive.utils import generate_sha256
 from libefiling.image.kind import OCR_TARGET, detect_image_kind
 from libefiling.image.mediatype import get_media_type
 from libefiling.manifest import (
     DerivedImage,
-    DocumentInfo,
     EncodingInfo,
     GeneratorInfo,
     ImageAttributes,
@@ -20,6 +18,7 @@ from libefiling.manifest import (
     Manifest,
     OcrInfo,
     Source,
+    Sources,
     Stats,
     XmlFile,
 )
@@ -69,7 +68,8 @@ def parse_archive(
     xml_files = process_xml(raw_xml_files, xml_dir)
     ### convert charset of procedure xml to UTF-8 and save to xml_dir
-    xml_files.append(process_procedure_xml(Path(src_procedure_path), xml_dir))
+    proc_xml_path = xml_dir / "procedure.xml"
+    xml_files.append(process_procedure_xml(Path(src_procedure_path), proc_xml_path))
     ### guess language
     lang = guess_language_by_filename(str(xml_dir))
@@ -89,10 +89,21 @@ def parse_archive(
         max_workers=image_max_workers,
     )
+    ### generate sources.xml
+    source_archive = Source.create(src_archive_path)
+    source_proc = Source.create(src_procedure_path)
+    sources = Sources(
+        document_code=source_archive.get_document_code(),
+        archive=source_archive,
+        procedure=source_proc,
+    )
+    sources_xml_path = str(xml_dir / "sources.xml")
+    sources.save_as_xml(sources_xml_path)
+    xml_files.append(sources.to_xml_file(sources_xml_path))
     # generate manifest
     manifest = process_manifest(
-        src_archive_path,
-        src_procedure_path,
+        sources,
         str(xml_dir),
         xml_files,
         images,
@@ -148,16 +159,14 @@ def process_xml(
 def process_procedure_xml(
     src_procedure_path: Path,
-    xml_dir: Path,
-    filename: str = "procedure.xml",
+    xml_path: Path,
 ) -> XmlFile:
-    xml_path = xml_dir / filename
     convert_xml_charset(str(src_procedure_path), str(xml_path))
     return XmlFile(
-        filename=filename,
+        filename=xml_path.name,
         encoding=EncodingInfo(detected="shift_jis", normalized_to="UTF-8"),
         sha256=generate_sha256(xml_path),
-        kind=detect_xml_kind(filename),
+        kind=detect_xml_kind(xml_path.name),
     )
@@ -177,7 +186,9 @@ def process_images(
     workers = _resolve_worker_count(max_workers)
     if workers <= 1 or len(image_list) == 1:
         return [
-            _process_single_image(image, images_dir, ocr_dir, image_params, lang, ocr_target)
+            _process_single_image(
+                image, images_dir, ocr_dir, image_params, lang, ocr_target
+            )
             for image in image_list
         ]
@@ -275,8 +286,7 @@ def get_ocr_text(image: Path, ocr_dir: Path, lang: str) -> OcrInfo:
 def process_manifest(
-    src_archive_path: str,
-    src_procedure_path: str,
+    sources: Sources,
     xml_dir: str,
     xml_files: list[XmlFile],
     images: list[ImageEntry],
@@ -287,17 +297,7 @@ def process_manifest(
             version=get_version("libefiling"),
             created_at=datetime.now(),
         ),
-        document=DocumentInfo(
-            doc_id=generate_sha256(src_archive_path),
-            sources=[
-                Source.create(
-                    src_archive_path, sha256=generate_sha256(src_archive_path)
-                ),
-                Source.create(
-                    src_procedure_path, sha256=generate_sha256(src_procedure_path)
-                ),
-            ],
-        ),
+        sources=sources,
         xml_files=xml_files,
         images=images,
         stats=Stats(
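
Note: parse_archive now writes xml/sources.xml through Sources.save_as_xml, with one sources root element carrying a document-code attribute and one source child per input file. The sketch below writes and reads back a file of that shape using only the standard library. The helper names write_sources_xml and read_sources_xml and all sample values are hypothetical; plain dictionaries stand in for the pydantic Source model.

```python
import tempfile
from pathlib import Path
from xml.etree import ElementTree as ET


def write_sources_xml(xml_path: str, document_code: str, sources: list) -> None:
    """Write a <sources> root with one <source> child per entry."""
    root = ET.Element("sources", attrib={"document-code": document_code})
    for source in sources:
        # XML attribute values must be strings, so byte-size is converted here.
        ET.SubElement(root, "source", attrib={k: str(v) for k, v in source.items()})
    ET.ElementTree(root).write(xml_path, encoding="utf-8", xml_declaration=True)


def read_sources_xml(xml_path: str):
    """Return the document code and the attribute dicts of all <source> elements."""
    root = ET.parse(xml_path).getroot()
    return root.get("document-code"), [dict(el.attrib) for el in root.findall("source")]


# Made-up values for illustration only.
sample_sources = [
    {"filename": "example-archive.JWX", "sha256": "0" * 64, "byte-size": 12345678,
     "task": "A", "kind": "AA", "extension": ".JWX"},
    {"filename": "example-procedure.XML", "sha256": "1" * 64, "byte-size": 4220,
     "task": "A", "kind": "FM", "extension": ".XML"},
]

with tempfile.TemporaryDirectory() as tmp:
    path = str(Path(tmp) / "sources.xml")
    write_sources_xml(path, "A163", sample_sources)
    code, entries = read_sources_xml(path)

print(code)
print([entry["kind"] for entry in entries])
```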

{libefiling-0.1.58 → libefiling-0.2.0}/src/libefiling/xml/kind.py RENAMED Viewed

@@ -23,6 +23,7 @@ XML_KIND = Literal[
     "special-attached-documents",
     "special-st26-sequence-list",
     "procedure",
+    "source",
     "unknown",
 ]
@@ -139,6 +140,11 @@ re_xml: list[XML_RE_MAP] = [
         "regex": re.compile(r"procedure\.xml"),
         "description": "procedure XML procedure.xml",
     },
+    {
+        "kind": "source",
+        "regex": re.compile(r"source\.xml"),
+        "description": "source XML source.xml",
+    },
 ]

libefiling-0.1.58/docs/benchmark_process_images.py DELETED Viewed

@@ -1,43 +0,0 @@
-from pathlib import Path
-from time import perf_counter
-from libefiling.default_config import defaultImageParams
-from libefiling.parse import process_images
-# Benchmark target: tif images under images/
-image_files = sorted(Path("images").rglob("*.tif"))
-if not image_files:
-    raise SystemExit("No *.tif files found under images/")
-sample_size = min(120, len(image_files))
-sample = image_files[:sample_size]
-print(f"sample: {sample_size} files")
-def run_case(max_workers: int | None, out_root: Path) -> float:
-    out_images = out_root / "images"
-    out_ocr = out_root / "ocr"
-    out_images.mkdir(parents=True, exist_ok=True)
-    out_ocr.mkdir(parents=True, exist_ok=True)
-    start = perf_counter()
-    result = process_images(
-        sample,
-        out_images,
-        out_ocr,
-        defaultImageParams,
-        "jpn",
-        None,
-        max_workers=max_workers,
-    )
-    elapsed = perf_counter() - start
-    print(
-        f"max_workers={max_workers} elapsed={elapsed:.3f}s items={len(result)}"
-    )
-    return elapsed
-serial_sec = run_case(1, Path("/tmp/libefiling-bench-serial"))
-parallel_sec = run_case(0, Path("/tmp/libefiling-bench-auto"))
-print(f"speedup: {serial_sec / parallel_sec:.3f}x")
-print(f"time_reduction: {(serial_sec - parallel_sec) / serial_sec * 100:.2f}%")

libefiling-0.1.58/docs/benchmark_resize.py DELETED Viewed

@@ -1,137 +0,0 @@
-"""Benchmark the current PIL implementation in the active environment."""
-import argparse
-import importlib.metadata
-import json
-import statistics
-from pathlib import Path
-from time import perf_counter
-import libefiling.image.convert as conv
-from libefiling.default_config import defaultImageParams
-from libefiling.image.convert import get_size, load_image, resize_image
-def parse_args() -> argparse.Namespace:
-    parser = argparse.ArgumentParser()
-    parser.add_argument("--sample-size", type=int, default=120)
-    parser.add_argument("--repeats", type=int, default=3)
-    parser.add_argument("--backend", default="pillow")
-    parser.add_argument("--json", action="store_true")
-    return parser.parse_args()
-args = parse_args()
-# --------------------------------------------------------------------------- #
-# Collect image files
-# --------------------------------------------------------------------------- #
-DATA_ROOT = Path("images/var/data")
-image_files = sorted(DATA_ROOT.rglob("*.tif"))
-if not image_files:
-    raise SystemExit(f"No *.tif files found under {DATA_ROOT}")
-MAX_SAMPLES = 120
-sample_files = image_files[: min(args.sample_size, len(image_files))]
-print(f"Total *.tif: {len(image_files)}  →  using {len(sample_files)} files")
-# --------------------------------------------------------------------------- #
-# Check whether pillow-simd is installed
-# --------------------------------------------------------------------------- #
-try:
-    simd_ver = importlib.metadata.version("pillow-simd")
-    print(f"pillow-simd : {simd_ver}")
-except importlib.metadata.PackageNotFoundError:
-    simd_ver = None
-    print("pillow-simd : not installed  (the pillow-simd backend falls back to pillow)")
-try:
-    pillow_ver = importlib.metadata.version("Pillow")
-except importlib.metadata.PackageNotFoundError:
-    pillow_ver = "unknown"
-print(f"Pillow      : {pillow_ver}")
-if simd_ver is not None:
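-    print("NOTE: In an environment with pillow-simd installed, PIL itself is the pillow-simd implementation.")
-    print("      'pillow' and 'pillow-simd' in this script are not a comparison of separate binaries;")
-    print("      they compare different code paths over the same PIL implementation.")
-else:
-    print("NOTE: Without pillow-simd installed, the 'pillow-simd' backend falls back to Pillow.")
-    print("      'pillow' and 'pillow-simd' in this script are not a comparison of separate binaries.")
-print("      For a true comparison, run the script separately in distinct Pillow and pillow-simd environments.")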
-print()
-# --------------------------------------------------------------------------- #
-# Preload images into memory (exclude I/O from the benchmark)
-# --------------------------------------------------------------------------- #
-print("Loading images into memory...", end=" ", flush=True)
-loaded_images = []
-for p in sample_files:
-    try:
-        loaded_images.append(load_image(p))
-    except Exception as e:
-        print(f"\nSkipping {p}: {e}")
-print(f"{len(loaded_images)} images loaded.")
-print()
-# Resize targets (use every size in defaultImageParams)
-resize_targets = [(p.width, p.height) for p in defaultImageParams]
-# --------------------------------------------------------------------------- #
-# Benchmark body
-# --------------------------------------------------------------------------- #
-REPEATS = args.repeats
-def run_benchmark(backend: str) -> list[float]:
-    """Resize all samples with the given backend and return the list of elapsed seconds per repetition."""
-    conv.RESIZER_BACKEND = backend
-    times: list[float] = []
-    for _ in range(REPEATS):
-        t0 = perf_counter()
-        for img in loaded_images:
-            for w, h in resize_targets:
-                size = get_size(img, w, h)
-                resize_image(img, size)
-        elapsed = perf_counter() - t0
-        times.append(elapsed)
-    return times
-print(f"[{args.backend}] running ({REPEATS} reps) ...", flush=True)
-times = run_benchmark(args.backend)
-best = min(times)
-avg = statistics.mean(times)
-ops = len(loaded_images) * len(resize_targets)
-throughput = ops / best
-print(f"  best={best:.3f}s  avg={avg:.3f}s  ({ops} resize ops/rep)")
-print()
-print("=" * 50)
-print("  Summary")
-print("=" * 50)
-print(f"  backend        {args.backend}")
-print(f"  pillow         {pillow_ver}")
-print(f"  pillow-simd    {simd_ver or 'not installed'}")
-print(f"  best           {best:.3f}s")
-print(f"  avg            {avg:.3f}s")
-print(f"  throughput     {throughput:.0f} ops/s")
-if args.json:
-    print(
-        json.dumps(
-            {
-                "backend": args.backend,
-                "pillow": pillow_ver,
-                "pillow_simd": simd_ver,
-                "sample_size": len(sample_files),
-                "repeats": REPEATS,
-                "ops_per_repeat": ops,
-                "times": times,
-                "best": best,
-                "avg": avg,
-                "throughput": throughput,
-            }
-        )
-    )

libefiling-0.1.58/docs/benchmark_resize_isolated.py DELETED Viewed

@@ -1,146 +0,0 @@
-"""Run an isolated Pillow vs pillow-simd vs cykooz benchmark in separate virtualenvs."""
-import argparse
-import json
-import shutil
-import subprocess
-import sys
-from pathlib import Path
-ROOT = Path(__file__).resolve().parents[1]
-PYTHON = sys.executable
-ENV_ROOT = ROOT / ".bench-envs"
-COMMON_DEPS = [
-    "asn1crypto>=1.5.1,<2.0.0",
-    "pytesseract>=0.3.13,<0.4.0",
-    "pydantic>=2.12.5,<3.0.0",
-    "dotenv>=0.9.9,<0.10.0",
-]
-ENV_SPECS = [
-    {
-        "name": "pillow",
-        "packages": ["pillow"],
-        "backend": "pillow",
-    },
-    {
-        "name": "pillow-simd",
-        "packages": ["pillow-simd"],
-        "backend": "pillow",
-    },
-    {
-        "name": "cykooz",
-        "packages": ["pillow", "cykooz_resizer"],
-        "backend": "cykooz",
-    },
-]
-def parse_args() -> argparse.Namespace:
-    parser = argparse.ArgumentParser()
-    parser.add_argument("--sample-size", type=int, default=120)
-    parser.add_argument("--repeats", type=int, default=3)
-    parser.add_argument("--keep-envs", action="store_true")
-    return parser.parse_args()
-def run(cmd: list[str], cwd: Path | None = None) -> subprocess.CompletedProcess[str]:
-    return subprocess.run(
-        cmd,
-        cwd=cwd or ROOT,
-        check=True,
-        text=True,
-        capture_output=True,
-    )
-def env_python(env_dir: Path) -> Path:
-    return env_dir / "bin" / "python"
-def setup_env(env_name: str, packages: list[str]) -> Path:
-    env_dir = ENV_ROOT / env_name
-    if env_dir.exists():
-        shutil.rmtree(env_dir)
-    print(f"[setup] {env_name}")
-    run(["uv", "venv", str(env_dir), "--python", PYTHON])
-    python_bin = env_python(env_dir)
-    run(["uv", "pip", "install", "--python", str(python_bin), "-e", ".", "--no-deps"])
-    run(["uv", "pip", "install", "--python", str(python_bin), *COMMON_DEPS, *packages])
-    return env_dir
-def benchmark_env(
-    env_name: str,
-    env_dir: Path,
-    backend: str,
-    sample_size: int,
-    repeats: int,
-) -> dict:
-    python_bin = env_python(env_dir)
-    result = run(
-        [
-            str(python_bin),
-            "docs/benchmark_resize.py",
-            "--backend",
-            backend,
-            "--sample-size",
-            str(sample_size),
-            "--repeats",
-            str(repeats),
-            "--json",
-        ]
-    )
-    print(result.stdout)
-    payload = json.loads(result.stdout.strip().splitlines()[-1])
-    payload["env_name"] = env_name
-    return payload
-def main() -> int:
-    args = parse_args()
-    ENV_ROOT.mkdir(exist_ok=True)
-    env_dirs: dict[str, Path] = {}
-    results: list[dict] = []
-    for spec in ENV_SPECS:
-        env_dirs[spec["name"]] = setup_env(spec["name"], spec["packages"])
-    try:
-        for spec in ENV_SPECS:
-            results.append(
-                benchmark_env(
-                    spec["name"],
-                    env_dirs[spec["name"]],
-                    spec["backend"],
-                    args.sample_size,
-                    args.repeats,
-                )
-            )
-    finally:
-        if not args.keep_envs:
-            shutil.rmtree(ENV_ROOT, ignore_errors=True)
-    by_name = {result["env_name"]: result for result in results}
-    pillow_result = by_name["pillow"]
-    fastest = min(results, key=lambda result: result["best"])
-    print("=" * 60)
-    print("Isolated Comparison")
-    print("=" * 60)
-    for result in results:
-        speedup = pillow_result["best"] / result["best"]
-        print(
-            f"{result['env_name']:<12} best={result['best']:.3f}s "
-            f"throughput={result['throughput']:.0f} ops/s speedup={speedup:.3f}x"
-        )
-    print(f"fastest      {fastest['env_name']} (best={fastest['best']:.3f}s)")
-    return 0
-if __name__ == "__main__":
-    raise SystemExit(main())

libefiling-0.1.58/src/libefiling/xml/utils.py DELETED Viewed

@@ -1,47 +0,0 @@
-from pathlib import Path
-from xml.etree import ElementTree as ET
-from libefiling.manifest import Manifest
-def get_document_code(manifest_path: str) -> str | None:
-    """Get document code from manifest file
-    Args:
-        manifest_path (str): manifest file path (e.g. manifest.json)
-    Returns:
-        str: document code (e.g. A163)
-    """
-    mp = Path(manifest_path)
-    manifest = Manifest.model_validate_json(mp.read_text(encoding="utf-8"))
-    manifest_dir = mp.parent
-    xml_dir = manifest_dir / manifest.paths.xml_dir
-    for xml in manifest.xml_files:
-        if xml.kind == "procedure":
-            return get_document_code_from_procedure(str(xml_dir / xml.filename))
-    else:
-        return None
-def get_document_code_from_procedure(procedure_path: str) -> str | None:
-    """Get document code from procedure.xml file path
-    Args:
-        procedure_path (str): procedure.xml file path
-    Returns:
-        str: document code (e.g. A163)
-    """
-    ns = {"jp": "http://www.jpo.go.jp"}
-    tree = ET.parse(procedure_path)
-    elem = tree.find(".//jp:document-name", ns)
-    if elem is None:
-        return None
-    # Namespaced attributes are stored as expanded QName keys.
-    code = elem.get("{http://www.jpo.go.jp}document-code")
-    return code.strip() if code else None
-if __name__ == "__main__":
-    import sys
-    print(get_document_code(sys.argv[1]))

{libefiling-0.1.58 → libefiling-0.2.0}/LICENSE RENAMED Viewed

File without changes

{libefiling-0.1.58 → libefiling-0.2.0}/docs/README.md RENAMED Viewed

File without changes

{libefiling-0.1.58 → libefiling-0.2.0}/docs/archive_structure_notes.md RENAMED Viewed

File without changes

{libefiling-0.1.58 → libefiling-0.2.0}/docs/file-1.png RENAMED Viewed

File without changes

{libefiling-0.1.58 → libefiling-0.2.0}/docs/file-2.png RENAMED Viewed

File without changes

{libefiling-0.1.58 → libefiling-0.2.0}/docs/file-3.png RENAMED Viewed

File without changes

{libefiling-0.1.58 → libefiling-0.2.0}/src/libefiling/archive/__init__.py RENAMED Viewed

File without changes

{libefiling-0.1.58 → libefiling-0.2.0}/src/libefiling/archive/aaa.py RENAMED Viewed

File without changes

{libefiling-0.1.58 → libefiling-0.2.0}/src/libefiling/archive/extract.py RENAMED Viewed

File without changes

{libefiling-0.1.58 → libefiling-0.2.0}/src/libefiling/archive/handler.py RENAMED Viewed

File without changes

{libefiling-0.1.58 → libefiling-0.2.0}/src/libefiling/archive/nnf.py RENAMED Viewed

File without changes

{libefiling-0.1.58 → libefiling-0.2.0}/src/libefiling/charset.py RENAMED Viewed

File without changes

{libefiling-0.1.58 → libefiling-0.2.0}/src/libefiling/cli.py RENAMED Viewed

File without changes

{libefiling-0.1.58 → libefiling-0.2.0}/src/libefiling/image/__init__.py RENAMED Viewed

File without changes

{libefiling-0.1.58 → libefiling-0.2.0}/src/libefiling/image/convert.py RENAMED Viewed

File without changes

{libefiling-0.1.58 → libefiling-0.2.0}/src/libefiling/image/kind.py RENAMED Viewed

File without changes

{libefiling-0.1.58 → libefiling-0.2.0}/src/libefiling/image/mediatype.py RENAMED Viewed

File without changes

{libefiling-0.1.58 → libefiling-0.2.0}/src/libefiling/image/ocr.py RENAMED Viewed

File without changes

{libefiling-0.1.58 → libefiling-0.2.0}/src/libefiling/image/params.py RENAMED Viewed

File without changes
