PyPI - yomitoku - Versions diffs - 0.8.1__tar.gz → 0.9.1__tar.gz - Mend

Files changed (185)

{yomitoku-0.8.1 → yomitoku-0.9.1}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: yomitoku
-Version: 0.8.1
+Version: 0.9.1
 Summary: Yomitoku is an AI-powered document image analysis package designed specifically for the Japanese language.
 Author-email: Kotaro Kinoshita <kotaro.kinoshita@mlism.com>
 License: CC BY-NC-SA 4.0
@@ -19,6 +19,8 @@ Requires-Dist: shapely>=2.0.6
 Requires-Dist: timm>=1.0.11
 Requires-Dist: torch>=2.5.0
 Requires-Dist: torchvision>=0.20.0
+Provides-Extra: mcp
+Requires-Dist: mcp[cli]>=1.6.0; extra == 'mcp'
 Description-Content-Type: text/markdown
 日本語版 | [English](README_EN.md)
@@ -64,6 +66,7 @@ Markdown でエクスポートした結果は関してはリポジトリ内の[s
 ## 📣 リリース情報
+- 2025 年  4 月  4 日 YomiToku v0.8.0 手書き文字認識のサポート
 - 2024 年 11 月 26 日 YomiToku v0.5.1 (beta) を公開
 ## 💡 インストールの方法

{yomitoku-0.8.1 → yomitoku-0.9.1}/README.md RENAMED

@@ -41,6 +41,7 @@ Markdown でエクスポートした結果は関してはリポジトリ内の[s
 ## 📣 リリース情報
+- 2025 年  4 月  4 日 YomiToku v0.8.0 手書き文字認識のサポート
 - 2024 年 11 月 26 日 YomiToku v0.5.1 (beta) を公開
 ## 💡 インストールの方法

{yomitoku-0.8.1 → yomitoku-0.9.1}/demo/simple_ocr.py RENAMED

@@ -4,9 +4,12 @@ from yomitoku import OCR
 from yomitoku.data.functions import load_pdf
 if __name__ == "__main__":
-    ocr = OCR(visualize=True, device="cpu")
+    ocr = OCR(visualize=True, device="cuda")
     # PDFファイルを読み込み
     imgs = load_pdf("demo/sample.pdf")
+    import time
+    start = time.time()
     for i, img in enumerate(imgs):
         results, ocr_vis = ocr(img)

{yomitoku-0.8.1 → yomitoku-0.9.1}/docs/cli.en.md RENAMED

@@ -107,4 +107,18 @@ If the PDF contains multiple pages, you can export them as a single file.
 ```
 yomitoku ${path_data} -f md --combine
-```
+```
+## Specifying Reading Order
+By default, *Auto* mode automatically detects whether a document is written horizontally or vertically and estimates the appropriate reading order. However, you can explicitly specify a custom reading order. For horizontal documents, the default is `top2left`, and for vertical documents, it is `top2bottom`.
+```
+yomitoku ${path_data} --reading_order left2right
+```
+* `top2bottom`: Prioritizes reading from top to bottom. Useful for multi-column documents such as word processor files with vertical flow.
+* `left2right`: Prioritizes reading from left to right. Suitable for layouts like receipts or health insurance cards, where key-value text pairs are arranged in columns.
+* `right2left`: Prioritizes reading from right to left. Effective for vertically written documents.
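
Note on the hunk above: a minimal sketch of how the documented `--reading_order` flag could be driven from a script. The helper name `build_yomitoku_command` is hypothetical and not part of yomitoku; the accepted values are taken from the argparse hunk in `src/yomitoku/cli/main.py` further down this diff, and the sample path is only an illustration.

```python
import shlex

# Values accepted by --reading_order, copied from the argparse hunk in cli/main.py.
READING_ORDERS = ("auto", "left2right", "top2bottom", "right2left")


def build_yomitoku_command(path_data, reading_order="auto"):
    """Return the argv list for one yomitoku CLI run (hypothetical helper)."""
    if reading_order not in READING_ORDERS:
        raise ValueError(f"unknown reading order: {reading_order!r}")
    return ["yomitoku", path_data, "--reading_order", reading_order]


cmd = build_yomitoku_command("demo/sample.pdf", "left2right")
# Print the command in a copy-pasteable shell form.
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a machine where the `yomitoku` entry point is installed.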

{yomitoku-0.8.1 → yomitoku-0.9.1}/docs/cli.ja.md RENAMED

@@ -104,4 +104,18 @@ PDFに複数ページが含まれる場合に複数ページを一つのファ
 ```
 yomitoku ${path_data} -f md --combine
-```
+```
+## 読み取り順を指定する
+Autoでは、横書きのドキュメント、縦書きのドキュメントを識別し、自動で読み取り順を推定しますが、任意の読み取り順の指定することが可能です。デフォルトでは横書きの文書は`top2left`, 縦書きは`top2bottom`になります。
+```
+yomitoku ${path_data} --reading_order left2right
+```
+- `top2bottom`: 上から下方向に優先的に読み取り順を推定します。段組みのワードドキュメントなどに対して、有効です。
+- `left2right`: 左から右方向に優先的に読み取り順を推定します。レシートや保険証などキーに対して、値を示すテキストが段組みになっているようなレイアウトに有効です。
+- `right2left:` 右から左方向に優先的に読み取り順を推定します。縦書きのドキュメントに対して有効です。

yomitoku-0.9.1/docs/mcp.en.md ADDED

@@ -0,0 +1,48 @@
+# MCP
+This section explains how to use the Yomitoku MCP server in conjunction with Claude Desktop.
+## Installing Yomitoku
+First, install Yomitoku by following the "Installation with uv" section in [Installation](installation.en.md).
+However, to add `mcp` as a dependency during installation, include `mcp` in `--extra` as shown below.
+```bash
+uv sync --extra mcp
+```
+## Setting up Claude Desktop
+Next, add the following configuration to the `mcpServers` section of the Claude Desktop configuration file. (Refer to [here](https://modelcontextprotocol.io/quickstart/user) for how to open the configuration file)
+```json
+{
+  "mcpServers": {
+    "yomitoku": {
+      "command": "uv",
+      "args": [
+        "--directory",
+        "(Absolute path of the directory where Yomitoku was cloned)",
+        "run",
+        "yomitoku_mcp"
+      ],
+      "env": {
+        "RESOURCE_DIR": "(Absolute path of the directory containing files for OCR)"
+      }
+    }
+  }
+}
+```
+For example, if you executed `git clone https://github.com/kotaro-kinoshita/yomitoku.git` in `/Users/your-username/workspace`, then `(Directory where Yomitoku was cloned)` would be `/Users/your-username/workspace/yomitoku`, and if you use `sample.pdf` in the `yomitoku/demo` directory, specify `(Directory containing files for OCR)` as `/Users/your-username/workspace/yomitoku/demo`.
+## Using Claude Desktop
+* Please restart Claude Desktop to apply changes to the configuration file.
+For example, if you use `yomitoku/demo/sample.pdf` as a sample, instruct as follows:
+```txt
+Analyze sample.pdf using OCR and translate it into English.
+```
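
Note on the added file above: the `mcpServers` entry can also be generated rather than typed by hand, which avoids leaving a placeholder in the JSON. This is a minimal sketch; the helper name `yomitoku_mcp_entry` is hypothetical, and the check assumes POSIX-style absolute paths like the ones in the documentation example.

```python
import json
from pathlib import PurePosixPath


def yomitoku_mcp_entry(clone_dir, resource_dir):
    """Build the Claude Desktop config fragment from the docs (hypothetical helper)."""
    for label, value in (("clone_dir", clone_dir), ("resource_dir", resource_dir)):
        # The docs ask for absolute paths in both placeholders.
        if not PurePosixPath(value).is_absolute():
            raise ValueError(f"{label} must be an absolute path: {value!r}")
    return {
        "mcpServers": {
            "yomitoku": {
                "command": "uv",
                "args": ["--directory", clone_dir, "run", "yomitoku_mcp"],
                "env": {"RESOURCE_DIR": resource_dir},
            }
        }
    }


config = yomitoku_mcp_entry(
    "/Users/your-username/workspace/yomitoku",
    "/Users/your-username/workspace/yomitoku/demo",
)
print(json.dumps(config, indent=2))
```

The printed JSON would then be merged into the existing Claude Desktop configuration file by hand.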

yomitoku-0.9.1/docs/mcp.ja.md ADDED

@@ -0,0 +1,50 @@
+# MCP
+ここではYomitokuのMCPサーバーをClaude Desktopに連携して利用する方法を説明します。
+## Yomitokuのインストール
+まずは
+[Installation](installation.ja.md)の「uvでのインストール」に従ってYomitokuをインストールしてください。
+ただし、`mcp`を依存関係に追加するためにインストール時には下記のように`--extra`に`mcp`を加えます。
+```bash
+uv sync --extra mcp
+```
+## Claude Desktopの設定
+次にClaude Desktopの設定ファイルの`mcpServers`に以下ように設定を追加します。(設定ファイルの開き方は[こちら](https://modelcontextprotocol.io/quickstart/user)を参照してください)
+```json
+{
+  "mcpServers": {
+    "yomitoku": {
+      "command": "uv",
+      "args": [
+        "--directory",
+        "(YomitokuをCloneしたディレクトリの絶対パス)",
+        "run",
+        "yomitoku_mcp"
+      ],
+      "env": {
+        "RESOURCE_DIR": "(OCR対象のファイルがあるディレクトリの絶対パス)"
+      }
+    }
+  }
+}
+```
+例えば、`/Users/your-username/workspace`で`git clone https://github.com/kotaro-kinoshita/yomitoku.git`を実行した場合は、`(YomitokuをCloneしたディレクトリ)`は`/Users/your-username/workspace/yomitoku`となり、`yomitoku/demo`ディレクトリの`sample.pdf`を用いる場合は`(OCR対象のファイルがあるディレクトリ)`を`/Users/your-username/workspace/yomitoku/demo`と指定します。
+## Claude Desktopでの利用
+※ 設定ファイルの変更を反映するにはClaude Desktopを再起動してください。
+例えば`yomitoku/demo/sample.pdf`をサンプルとして用いる場合、下記のように指示してください。
+```txt
+sample.pdfをOCRで解析して要約してください。
+```

{yomitoku-0.8.1 → yomitoku-0.9.1}/mkdocs.yml RENAMED

@@ -69,6 +69,7 @@ nav:
   - Installation: installation.md
   - CLI Usage: cli.md
   - Module Usage: module.md
+  - MCP: mcp.md
 repo_url: https://github.com/kotaro-kinoshita/yomitoku-dev

{yomitoku-0.8.1 → yomitoku-0.9.1}/pyproject.toml RENAMED

@@ -70,6 +70,12 @@ explicit = true
 [project.scripts]
 yomitoku = "yomitoku.cli.main:main"
+yomitoku_mcp = "yomitoku.cli.mcp:run_mcp_server"
+[project.optional-dependencies]
+mcp = [
+    "mcp[cli]>=1.6.0",
+]
 [tool.tox]
 legacy_tox_ini = """

{yomitoku-0.8.1 → yomitoku-0.9.1}/src/yomitoku/cli/main.py RENAMED

@@ -3,7 +3,6 @@ import os
 import time
 from pathlib import Path
-import cv2
 import torch
 from ..constants import SUPPORT_OUTPUT_FORMAT
@@ -14,6 +13,8 @@ from ..utils.logger import set_logger
 from ..export import save_csv, save_html, save_json, save_markdown
 from ..export import convert_json, convert_csv, convert_html, convert_markdown
+from ..utils.misc import save_image
 logger = set_logger(__name__, "INFO")
@@ -91,21 +92,23 @@ def process_single_file(args, analyzer, path, format):
         if ocr is not None:
             out_path = os.path.join(
-                args.outdir, f"{dirname}_{filename}_p{page+1}_ocr.jpg"
+                args.outdir, f"{dirname}_{filename}_p{page + 1}_ocr.jpg"
             )
-            cv2.imwrite(out_path, ocr)
+            save_image(ocr, out_path)
             logger.info(f"Output file: {out_path}")
         if layout is not None:
             out_path = os.path.join(
-                args.outdir, f"{dirname}_{filename}_p{page+1}_layout.jpg"
+                args.outdir, f"{dirname}_{filename}_p{page + 1}_layout.jpg"
             )
-            cv2.imwrite(out_path, layout)
+            save_image(layout, out_path)
             logger.info(f"Output file: {out_path}")
-        out_path = os.path.join(args.outdir, f"{dirname}_{filename}_p{page+1}.{format}")
+        out_path = os.path.join(
+            args.outdir, f"{dirname}_{filename}_p{page + 1}.{format}"
+        )
         if format == "json":
             if args.combine:
@@ -340,6 +343,12 @@ def main():
         action="store_true",
         help="if set, ignore meta information(header, footer) in the output",
     )
+    parser.add_argument(
+        "--reading_order",
+        default="auto",
+        type=str,
+        choices=["auto", "left2right", "top2bottom", "right2left"],
+    )
     args = parser.parse_args()
@@ -393,6 +402,7 @@ def main():
         visualize=args.vis,
         device=args.device,
         ignore_meta=args.ignore_meta,
+        reading_order=args.reading_order,
     )
     os.makedirs(args.outdir, exist_ok=True)
@@ -407,7 +417,7 @@ def main():
                 logger.info(f"Processing file: {file_path}")
                 process_single_file(args, analyzer, file_path, format)
                 end = time.time()
-                logger.info(f"Total Processing time: {end-start:.2f} sec")
+                logger.info(f"Total Processing time: {end - start:.2f} sec")
             except Exception:
                 continue
     else:
@@ -415,7 +425,7 @@ def main():
         logger.info(f"Processing file: {path}")
         process_single_file(args, analyzer, path, format)
         end = time.time()
-        logger.info(f"Total Processing time: {end-start:.2f} sec")
+        logger.info(f"Total Processing time: {end - start:.2f} sec")
 if __name__ == "__main__":
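
Note on the `--reading_order` hunk above: because the option is declared with `choices`, argparse itself rejects any other value before the analyzer is built. The sketch below reproduces only that one argument in a standalone parser; `build_parser` and the program name are illustrative and not taken from the package.

```python
import argparse


def build_parser():
    """Standalone parser holding only the new option (illustrative, not the real CLI)."""
    parser = argparse.ArgumentParser(prog="reading-order-sketch")
    parser.add_argument(
        "--reading_order",
        default="auto",
        type=str,
        choices=["auto", "left2right", "top2bottom", "right2left"],
    )
    return parser


args = build_parser().parse_args(["--reading_order", "left2right"])
print(args.reading_order)

# A value outside `choices` makes argparse print a usage error and exit.
try:
    build_parser().parse_args(["--reading_order", "top2left"])
except SystemExit as exc:
    print("rejected, exit code:", exc.code)
```

One consequence is that `top2left`, which the documentation hunks mention as a default, would not be accepted on the command line as the option is declared here.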

yomitoku-0.9.1/src/yomitoku/cli/mcp.py ADDED

@@ -0,0 +1,165 @@
+import json
+import io
+import csv
+import os
+from pathlib import Path
+from mcp.server.fastmcp import Context, FastMCP
+from yomitoku import DocumentAnalyzer
+from yomitoku.data.functions import load_image, load_pdf
+from yomitoku.export import convert_json, convert_markdown, convert_csv, convert_html
+try:
+    RESOURCE_DIR = os.environ["RESOURCE_DIR"]
+except KeyError:
+    raise ValueError("Environment variable 'RESOURCE_DIR' is not set.")
+analyzer = None
+async def load_analyzer(ctx: Context) -> DocumentAnalyzer:
+    """
+    Load the DocumentAnalyzer instance if not already loaded.
+    Args:
+        ctx (Context): The context in which the analyzer is being loaded.
+    Returns:
+        DocumentAnalyzer: The loaded document analyzer instance.
+    """
+    global analyzer
+    if analyzer is None:
+        await ctx.info("Load document analyzer")
+        analyzer = DocumentAnalyzer(visualize=False, device="cuda")
+    return analyzer
+mcp = FastMCP("yomitoku")
+@mcp.tool()
+async def process_ocr(ctx: Context, filename: str, output_format: str) -> str:
+    """
+    Perform OCR on the specified file in the resource direcory and convert
+    the results to the desired format.
+    Args:
+        ctx (Context): The context in which the OCR processing is executed.
+        filename (str): The name of the file to process in the resource directory.
+        output_format (str): The desired format for the output. The available options are:
+            - json: Outputs the text as structured data along with positional information.
+            - markdown: Outputs texts and tables in Markdown format.
+            - html: Outputs texts and tables in HTML format.
+            - csv: Outputs texts and tables in CSV format.
+    Returns:
+        str: The OCR results converted to the specified format.
+    """
+    analyzer = await load_analyzer(ctx)
+    await ctx.info("Start ocr processing")
+    file_path = os.path.join(RESOURCE_DIR, filename)
+    if Path(file_path).suffix[1:].lower() in ["pdf"]:
+        imgs = load_pdf(file_path)
+    else:
+        imgs = load_image(file_path)
+    results = []
+    for page, img in enumerate(imgs):
+        analyzer.img = img
+        result, _, _ = await analyzer.run(img)
+        results.append(result)
+        await ctx.report_progress(page + 1, len(imgs))
+    if output_format == "json":
+        return json.dumps(
+            [
+                convert_json(
+                    result,
+                    out_path=None,
+                    ignore_line_break=True,
+                    img=img,
+                    export_figure=False,
+                    figure_dir=None,
+                ).model_dump()
+                for img, result in zip(imgs, results)
+            ],
+            ensure_ascii=False,
+            sort_keys=True,
+            separators=(",", ": "),
+        )
+    elif output_format == "markdown":
+        return "\n".join(
+            [
+                convert_markdown(
+                    result,
+                    out_path=None,
+                    ignore_line_break=True,
+                    img=img,
+                    export_figure=False,
+                )[0]
+                for img, result in zip(imgs, results)
+            ]
+        )
+    elif output_format == "html":
+        return "\n".join(
+            [
+                convert_html(
+                    result,
+                    out_path=None,
+                    ignore_line_break=True,
+                    img=img,
+                    export_figure=False,
+                    export_figure_letter="",
+                )[0]
+                for img, result in zip(imgs, results)
+            ]
+        )
+    elif output_format == "csv":
+        output = io.StringIO()
+        writer = csv.writer(output, quoting=csv.QUOTE_MINIMAL)
+        for img, result in zip(imgs, results):
+            elements = convert_csv(
+                result,
+                out_path=None,
+                ignore_line_break=True,
+                img=img,
+                export_figure=False,
+            )
+            for element in elements:
+                if element["type"] == "table":
+                    writer.writerows(element["element"])
+                else:
+                    writer.writerow([element["element"]])
+                writer.writerow([""])
+        return output.getvalue()
+    else:
+        raise ValueError(
+            f"Unsupported output format: {output_format}."
+            " Supported formats are json, markdown, html or csv."
+        )
+@mcp.resource("file://list")
+async def get_file_list() -> list[str]:
+    """
+    Retrieve a list of files in the resource directory.
+    Returns:
+        list[str]: A list of filenames in the resource directory.
+    """
+    return os.listdir(RESOURCE_DIR)
+def run_mcp_server():
+    """
+    Run the MCP server.
+    """
+    mcp.run(transport="stdio")
+if __name__ == "__main__":
+    run_mcp_server()
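
Note on the `csv` branch of `process_ocr` above: the flattening logic can be tried in isolation with plain dictionaries. This is a sketch under assumptions: the function name `elements_to_csv` is hypothetical, and the `"paragraph"` type label in the sample data is made up, since the added code only distinguishes `"table"` from everything else.

```python
import csv
import io


def elements_to_csv(elements):
    """Serialize elements the way the csv branch does: tables as rows, others as one cell."""
    output = io.StringIO()
    writer = csv.writer(output, quoting=csv.QUOTE_MINIMAL)
    for element in elements:
        if element["type"] == "table":
            # A table is a list of rows, each row a list of cell strings.
            writer.writerows(element["element"])
        else:
            writer.writerow([element["element"]])
        # One blank separator row after every element.
        writer.writerow([""])
    return output.getvalue()


sample = [
    {"type": "paragraph", "element": "Invoice, March"},
    {"type": "table", "element": [["item", "price"], ["pen", "120"]]},
]
text = elements_to_csv(sample)
print(text)
```

With `QUOTE_MINIMAL`, only cells containing the delimiter, quotes, or line breaks are quoted, so the first sample cell is emitted in quotes while the table cells are not.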

{yomitoku-0.8.1 → yomitoku-0.9.1}/src/yomitoku/data/dataset.py RENAMED

@@ -8,9 +8,11 @@ from .functions import (
     validate_quads,
 )
+from concurrent.futures import ThreadPoolExecutor
 class ParseqDataset(Dataset):
-    def __init__(self, cfg, img, quads):
+    def __init__(self, cfg, img, quads, num_workers=8):
         self.img = img[:, :, ::-1]
         self.quads = quads
         self.cfg = cfg
@@ -22,19 +24,27 @@ class ParseqDataset(Dataset):
             ]
         )
-        validate_quads(self.img, self.quads)
+        with ThreadPoolExecutor(max_workers=num_workers) as executor:
+            data = list(executor.map(self.preprocess, self.quads))
-    def __len__(self):
-        return len(self.quads)
+        self.data = [tensor for tensor in data if tensor is not None]
+    def preprocess(self, quad):
+        if validate_quads(self.img, quad) is None:
+            return None
+        roi_img = extract_roi_with_perspective(self.img, quad)
-    def __getitem__(self, index):
-        polygon = self.quads[index]
-        roi_img = extract_roi_with_perspective(self.img, polygon)
         if roi_img is None:
-            return
+            return None
         roi_img = rotate_text_image(roi_img, thresh_aspect=2)
         resized = resize_with_padding(roi_img, self.cfg.data.img_size)
-        tensor = self.transform(resized)
-        return tensor
+        return resized
+    def __len__(self):
+        return len(self.data)
+    def __getitem__(self, index):
+        return self.transform(self.data[index])
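
Note on the `ParseqDataset` change above: preprocessing moves from `__getitem__` into the constructor, runs on a thread pool, and drops items whose preprocessing returns `None`. The pattern can be sketched without the image code; the `preprocess` stand-in below (reject negatives, square the rest) is purely illustrative.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable, List, Optional


def preprocess(item: int) -> Optional[int]:
    """Stand-in for per-quad preprocessing: reject negatives, square the rest."""
    if item < 0:
        return None
    return item * item


def preprocess_all(items: Iterable[int], num_workers: int = 8) -> List[int]:
    # executor.map yields results in input order, whichever thread finishes first.
    with ThreadPoolExecutor(max_workers=num_workers) as executor:
        data = list(executor.map(preprocess, items))
    # Drop rejected items, mirroring the `tensor is not None` filter in the diff.
    return [value for value in data if value is not None]


print(preprocess_all([3, -1, 2]))  # [9, 4]
```

A side effect of the filter is that the result can be shorter than the input, so in the changed class `len(dataset)` may be smaller than `len(quads)`; any caller that pairs outputs with quads by index would presumably need to account for that.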

{yomitoku-0.8.1 → yomitoku-0.9.1}/src/yomitoku/data/functions.py RENAMED

@@ -191,7 +191,7 @@ def array_to_tensor(img: np.ndarray) -> torch.Tensor:
     return tensor
-def validate_quads(img: np.ndarray, quads: list[list[list[int]]]):
+def validate_quads(img: np.ndarray, quad: list[list[list[int]]]):
     """
     Validate the vertices of the quadrilateral.
@@ -204,23 +204,23 @@ def validate_quads(img: np.ndarray, quads: list[list[list[int]]]):
     """
     h, w = img.shape[:2]
-    for quad in quads:
-        if len(quad) != 4:
-            raise ValueError("The number of vertices must be 4.")
-        for point in quad:
-            if len(point) != 2:
-                raise ValueError("The number of coordinates must be 2.")
-        quad = np.array(quad, dtype=int)
-        x1 = np.min(quad[:, 0])
-        x2 = np.max(quad[:, 0])
-        y1 = np.min(quad[:, 1])
-        y2 = np.max(quad[:, 1])
-        h, w = img.shape[:2]
+    if len(quad) != 4:
+        # raise ValueError("The number of vertices must be 4.")
+        return None
+    for point in quad:
+        if len(point) != 2:
+            return None
+    quad = np.array(quad, dtype=int)
+    x1 = np.min(quad[:, 0])
+    x2 = np.max(quad[:, 0])
+    y1 = np.min(quad[:, 1])
+    y2 = np.max(quad[:, 1])
+    h, w = img.shape[:2]
-        if x1 < 0 or x2 > w or y1 < 0 or y2 > h:
-            raise ValueError(f"The vertices are out of the image. {quad.tolist()}")
+    if x1 < 0 or x2 > w or y1 < 0 or y2 > h:
+        return None
     return True
@@ -237,19 +237,18 @@ def extract_roi_with_perspective(img, quad):
         np.ndarray: extracted image
     """
     dst = img.copy()
-    quad = np.array(quad, dtype=np.float32)
+    quad = np.array(quad, dtype=np.int64)
     width = np.linalg.norm(quad[0] - quad[1])
     height = np.linalg.norm(quad[1] - quad[2])
     width = int(width)
     height = int(height)
     pts1 = np.float32(quad)
     pts2 = np.float32([[0, 0], [width, 0], [width, height], [0, height]])
     M = cv2.getPerspectiveTransform(pts1, pts2)
     dst = cv2.warpPerspective(dst, M, (width, height))
     return dst
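
Note on the `validate_quads` change above: the function now checks a single quadrilateral and signals failure by returning `None` instead of raising `ValueError`. The sketch below restates that contract in pure Python so it runs without NumPy; the name `validate_quad` and the choice to pass the image shape rather than the image are simplifications for illustration.

```python
def validate_quad(img_shape, quad):
    """Return True for a 4-point quad inside the image bounds, otherwise None."""
    h, w = img_shape[:2]
    if len(quad) != 4:
        return None
    for point in quad:
        if len(point) != 2:
            return None
    xs = [int(point[0]) for point in quad]
    ys = [int(point[1]) for point in quad]
    # Same bounds test as the diff: coordinates equal to w or h are still accepted.
    if min(xs) < 0 or max(xs) > w or min(ys) < 0 or max(ys) > h:
        return None
    return True


shape = (100, 200, 3)  # height, width, channels
print(validate_quad(shape, [[0, 0], [50, 0], [50, 20], [0, 20]]))    # True
print(validate_quad(shape, [[0, 0], [250, 0], [250, 20], [0, 20]]))  # None
print(validate_quad(shape, [[0, 0], [50, 0], [50, 20]]))             # None
```

Since the success value is `True` and the failure value is `None`, callers compare with `is None`, as the new `ParseqDataset.preprocess` does. The unchanged annotation `list[list[list[int]]]` in the new signature still describes a list of quads rather than one quad.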

{yomitoku-0.8.1 → yomitoku-0.9.1}/src/yomitoku/document_analyzer.py RENAMED

@@ -86,8 +86,12 @@ def extract_paragraph_within_figure(paragraphs, figures):
                 check_list[i] = True
         figure["direction"] = judge_page_direction(contained_paragraphs)
+        reading_order = (
+            "left2right" if figure["direction"] == "horizontal" else "right2left"
+        )
         figure_paragraphs = prediction_reading_order(
-            contained_paragraphs, figure["direction"]
+            contained_paragraphs, reading_order
         )
         figure["paragraphs"] = sorted(figure_paragraphs, key=lambda x: x.order)
         figure = FigureSchema(**figure)
@@ -126,8 +130,8 @@ def extract_words_within_element(pred_words, element):
     cnt_vertical = word_direction.count("vertical")
     element_direction = "horizontal" if cnt_horizontal > cnt_vertical else "vertical"
-    prediction_reading_order(contained_words, element_direction)
+    order = "left2right" if element_direction == "horizontal" else "right2left"
+    prediction_reading_order(contained_words, order)
     contained_words = sorted(contained_words, key=lambda x: x.order)
     contained_words = "\n".join([content.contents for content in contained_words])
@@ -328,6 +332,7 @@ class DocumentAnalyzer:
         device="cuda",
         visualize=False,
         ignore_meta=False,
+        reading_order="auto",
     ):
         default_configs = {
             "ocr": {
@@ -352,6 +357,8 @@ class DocumentAnalyzer:
             },
         }
+        self.reading_order = reading_order
         if isinstance(configs, dict):
             recursive_update(default_configs, configs)
         else:
@@ -452,9 +459,17 @@ class DocumentAnalyzer:
         elements = page_contents + layout_res.tables + figures
-        prediction_reading_order(headers, page_direction)
-        prediction_reading_order(footers, page_direction)
-        prediction_reading_order(elements, page_direction, self.img)
+        prediction_reading_order(headers, "left2right")
+        prediction_reading_order(footers, "left2right")
+        if self.reading_order == "auto":
+            reading_order = (
+                "right2left" if page_direction == "vertical" else "top2bottom"
+            )
+        else:
+            reading_order = self.reading_order
+        prediction_reading_order(elements, reading_order, self.img)
         for i, element in enumerate(elements):
             element.order += len(headers)
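
Note on the last hunk above: the way `auto` is resolved for page elements can be isolated into a small function. The name `resolve_reading_order` is hypothetical; the branch itself is copied from the diff.

```python
def resolve_reading_order(setting, page_direction):
    """Map the user setting and detected page direction to a concrete order."""
    if setting == "auto":
        return "right2left" if page_direction == "vertical" else "top2bottom"
    # Any explicit setting overrides the detected direction.
    return setting


print(resolve_reading_order("auto", "horizontal"))      # top2bottom
print(resolve_reading_order("auto", "vertical"))        # right2left
print(resolve_reading_order("left2right", "vertical"))  # left2right
```

As far as this hunk shows, `auto` resolves to `top2bottom` for horizontal pages and `right2left` for vertical ones, which differs from the defaults named in the `docs/cli.en.md` and `docs/cli.ja.md` hunks (`top2left` and `top2bottom`). Headers and footers are always ordered with `left2right`, regardless of the setting.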

{yomitoku-0.8.1 → yomitoku-0.9.1}/src/yomitoku/export/export_csv.py RENAMED

@@ -1,7 +1,7 @@
 import csv
 import os
-import cv2
+from ..utils.misc import save_image
 def table_to_csv(table, ignore_line_break):
@@ -54,7 +54,7 @@ def save_figure(
         filename = os.path.splitext(os.path.basename(out_path))[0]
         figure_name = f"{filename}_figure_{i}.png"
         figure_path = os.path.join(save_dir, figure_name)
-        cv2.imwrite(figure_path, figure_img)
+        save_image(figure_img, figure_path)
 def convert_csv(
