PyPI - hanja-tools - Versions diffs - 0.1.0__tar.gz - Mend

hanja-tools 0.1.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (20)

hanja_tools-0.1.0/LICENSE +21 -0
hanja_tools-0.1.0/PKG-INFO +78 -0
hanja_tools-0.1.0/README.md +53 -0
hanja_tools-0.1.0/hanja-tools-PLAN.md +374 -0
hanja_tools-0.1.0/hanja_tools/__init__.py +20 -0
hanja_tools-0.1.0/hanja_tools/data/inmyong_hanja.json +1 -0
hanja_tools-0.1.0/hanja_tools/data/stroke_count.json +1 -0
hanja_tools-0.1.0/hanja_tools/db.py +140 -0
hanja_tools-0.1.0/hanja_tools/inmyong.py +54 -0
hanja_tools-0.1.0/hanja_tools/ohaeng.py +75 -0
hanja_tools-0.1.0/hanja_tools/search.py +132 -0
hanja_tools-0.1.0/hanja_tools/stroke.py +44 -0
hanja_tools-0.1.0/pyproject.toml +45 -0
hanja_tools-0.1.0/scripts/build_inmyong_db.py +54 -0
hanja_tools-0.1.0/scripts/build_stroke_db.py +60 -0
hanja_tools-0.1.0/tests/conftest.py +10 -0
hanja_tools-0.1.0/tests/test_inmyong.py +48 -0
hanja_tools-0.1.0/tests/test_ohaeng.py +59 -0
hanja_tools-0.1.0/tests/test_search.py +94 -0
hanja_tools-0.1.0/tests/test_stroke.py +53 -0

hanja_tools-0.1.0/LICENSE ADDED

@@ -0,0 +1,21 @@
+MIT License
+Copyright (c) 2026 hanja-tools contributors
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

hanja_tools-0.1.0/PKG-INFO ADDED

@@ -0,0 +1,78 @@
+Metadata-Version: 2.4
+Name: hanja-tools
+Version: 0.1.0
+Summary: Korean Hanja utilities: stroke count, five-elements (ohaeng), and personal-name character filter
+Project-URL: Homepage, https://github.com/hyun/hanja-tools
+Project-URL: Repository, https://github.com/hyun/hanja-tools
+License: MIT
+License-File: LICENSE
+Keywords: chinese-characters,hanja,inmyong,korean,ohaeng,stroke
+Classifier: Development Status :: 3 - Alpha
+Classifier: Intended Audience :: Developers
+Classifier: License :: OSI Approved :: MIT License
+Classifier: Programming Language :: Python :: 3
+Classifier: Programming Language :: Python :: 3.11
+Classifier: Programming Language :: Python :: 3.12
+Classifier: Topic :: Text Processing :: Linguistic
+Requires-Python: >=3.11
+Requires-Dist: hanja>=0.14.0
+Provides-Extra: dev
+Requires-Dist: build>=1.0; extra == 'dev'
+Requires-Dist: pytest-cov>=5.0; extra == 'dev'
+Requires-Dist: pytest>=8.0; extra == 'dev'
+Requires-Dist: twine>=5.0; extra == 'dev'
+Description-Content-Type: text/markdown
+# hanja-tools
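+A Python package that brings together Korean Hanja (漢字) utilities.
+It adds the **stroke count, five-element (ohaeng), personal-name (inmyong) filter, and search** features that `suminb/hanja` does not provide.
+## Installation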
+```bash
+pip install hanja-tools
+```
+## Quick start
+```python
+from hanja_tools import HanjaDB
+db = HanjaDB()
+db.get_eum('松')        # '송'  : Hangul reading (from suminb/hanja)
+db.get_stroke('松')     # 8     : stroke count (Unihan kTotalStrokes)
+db.get_ohaeng('松')     # '金'  : element (suri-ohaeng: 8 strokes, last digit 8 → 金)
+db.is_inmyong('松')     # True  : approved for personal names (Supreme Court list)
+db.info('松')
+# {'char': '松', 'eum': '송', 'stroke': 8, 'ohaeng': '金', 'inmyong': True}
+# Search
+db.search_by_eum('송')           # ['松', '誦', '頌', ...]
+db.search_by_stroke(8)           # hanja with 8 strokes
+db.search_by_ohaeng('木')        # hanja whose element is 木
+db.search(eum='송', inmyong=True) # combined criteria
+```
+## Five-element (ohaeng) table (suri-ohaeng method)
+| Last digit of stroke count | Element |
+|---|---|
+| 1, 2 | 木 |
+| 3, 4 | 火 |
+| 5, 6 | 土 |
+| 7, 8 | 金 |
+| 9, 0 | 水 |
+## Data sources
+| Data | Source | License |
+|---|---|---|
+| Hanja → reading | `suminb/hanja` table.yml (27,497 characters) | CC0 |
+| Stroke count | Unicode Unihan Database `kTotalStrokes` | Unicode License |
+| Name hanja (inmyong) | `rutopio/Korean-Name-Hanja-Charset` data-gov.csv | MIT |
+## License
+MIT
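PKG-INFO stores core metadata as RFC 822-style headers. That means the standard library's `email` parser can read it. The sketch below is minimal and makes a few assumptions. The file text is already in a string. The sample is abridged from the headers above. It includes the blank line that separates headers from the long description. The `extra ==` substring test is a shortcut, not real marker parsing. The helper name `read_core_metadata` is hypothetical.

```python
from email.parser import Parser


def read_core_metadata(pkg_info_text: str) -> dict:
    """Extract a few core-metadata fields from PKG-INFO text."""
    msg = Parser().parsestr(pkg_info_text)
    requires = msg.get_all("Requires-Dist") or []
    return {
        "name": msg["Name"],
        "version": msg["Version"],
        "requires_python": msg["Requires-Python"],
        # Shortcut: runtime deps are the entries without an `extra == ...` marker
        "runtime_deps": [r for r in requires if "extra ==" not in r],
        "extras": msg.get_all("Provides-Extra") or [],
        # Everything after the blank line is the long description
        "description": msg.get_payload(),
    }


SAMPLE = """\
Metadata-Version: 2.4
Name: hanja-tools
Version: 0.1.0
Requires-Python: >=3.11
Requires-Dist: hanja>=0.14.0
Provides-Extra: dev
Requires-Dist: pytest>=8.0; extra == 'dev'
Description-Content-Type: text/markdown

# hanja-tools
"""

meta = read_core_metadata(SAMPLE)
print(meta["runtime_deps"])  # ['hanja>=0.14.0']
```

For anything beyond a quick inspection, environment markers should be evaluated properly, for example with the `packaging` library.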

hanja_tools-0.1.0/README.md ADDED

@@ -0,0 +1,53 @@
+# hanja-tools
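+A Python package that brings together Korean Hanja (漢字) utilities.
+It adds the **stroke count, five-element (ohaeng), personal-name (inmyong) filter, and search** features that `suminb/hanja` does not provide.
+## Installation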
+```bash
+pip install hanja-tools
+```
+## Quick start
+```python
+from hanja_tools import HanjaDB
+db = HanjaDB()
+db.get_eum('松')        # '송'  : Hangul reading (from suminb/hanja)
+db.get_stroke('松')     # 8     : stroke count (Unihan kTotalStrokes)
+db.get_ohaeng('松')     # '金'  : element (suri-ohaeng: 8 strokes, last digit 8 → 金)
+db.is_inmyong('松')     # True  : approved for personal names (Supreme Court list)
+db.info('松')
+# {'char': '松', 'eum': '송', 'stroke': 8, 'ohaeng': '金', 'inmyong': True}
+# Search
+db.search_by_eum('송')           # ['松', '誦', '頌', ...]
+db.search_by_stroke(8)           # hanja with 8 strokes
+db.search_by_ohaeng('木')        # hanja whose element is 木
+db.search(eum='송', inmyong=True) # combined criteria
+```
+## Five-element (ohaeng) table (suri-ohaeng method)
+| Last digit of stroke count | Element |
+|---|---|
+| 1, 2 | 木 |
+| 3, 4 | 火 |
+| 5, 6 | 土 |
+| 7, 8 | 金 |
+| 9, 0 | 水 |
+## Data sources
+| Data | Source | License |
+|---|---|---|
+| Hanja → reading | `suminb/hanja` table.yml (27,497 characters) | CC0 |
+| Stroke count | Unicode Unihan Database `kTotalStrokes` | Unicode License |
+| Name hanja (inmyong) | `rutopio/Korean-Name-Hanja-Charset` data-gov.csv | MIT |
+## License
+MIT

hanja_tools-0.1.0/hanja-tools-PLAN.md ADDED

@@ -0,0 +1,374 @@
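+# hanja-tools project plan
+> **Goal**: a standalone Python package that brings together Korean Hanja (漢字) utilities
+> **Background**: a gap found while building ho-namer (an app that suggests calligraphers' pen names, 호): `suminb/hanja` only converts characters to their readings (音) and has no stroke count, five-element (ohaeng), or personal-name filter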
+---
+## 1. Problem statement
+### What suminb/hanja provides, and where it stops
+The existing package installed with `pip install hanja` (`suminb/hanja`, CC0 license) is driven internally by a single `table.yml` file.
+**Internal layout**
+```
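+hanja package (522 KB)
+├── table.yml   ← core: maps each Hanja Unicode character → a one-syllable Hangul reading (27,497 characters)
+├── __init__.py ← public API: translate(), is_hanja(), split_hanja()
+├── impl.py     ← translation logic, including the initial-sound rule (두음법칙)
+├── hangul.py   ← initial-sound rule handling (dooeum function)
+└── table.py    ← YAML loader for table.yml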
+```
+**table.yml format** (key: the Hanja character itself; value: its one-syllable Hangul reading)
+```yaml
+'松': '송'
+'山': '산'
+'水': '수'
+'靑': '청'
+```
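`table.yml` maps each character to one reading. A `search_by_eum`-style lookup needs the reverse direction. The sketch below inverts the mapping once at load time. It assumes the YAML has already been parsed into a dict. The sample uses the four entries above plus 誦, which the README lists under 송. `build_eum_index` is a hypothetical name.

```python
from collections import defaultdict


def build_eum_index(char_to_eum: dict[str, str]) -> dict[str, list[str]]:
    """Invert char → reading into reading → list of chars."""
    index: defaultdict[str, list[str]] = defaultdict(list)
    for ch, eum in char_to_eum.items():
        index[eum].append(ch)
    # Sort by code point so results are stable regardless of file order
    return {eum: sorted(chars) for eum, chars in index.items()}


table = {"松": "송", "山": "산", "水": "수", "靑": "청", "誦": "송"}
index = build_eum_index(table)
print(index["송"])  # ['松', '誦']
```

Building the index once costs one pass over the roughly 27,000 entries. After that, each reading lookup is a single dict access.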
+**Coverage by Unicode block**
+| Unicode block | Range | Characters included |
+|---|---|---|
+| CJK Unified Ideographs | U+4E00~U+9FFF | 20,902 |
+| CJK Extension-A | U+3400~U+4DBF | 6,582 |
+| CJK Compatibility Ideographs | U+F900~U+FAFF | 13 |
+| **Total** | | **27,497** |
+Distinct readings (音): 546. Most common readings: 구 (349 characters), 수 (306), 기 (300)
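+**Features provided vs. missing**
+| Feature | suminb/hanja | Notes |
+|---|---|---|
+| Hanja → Hangul reading | ✅ | CC0, includes the initial-sound rule |
+| Hanja detection `is_hanja()` | ✅ | |
+| Split into Hanja / non-Hanja runs `split_hanja()` | ✅ | |
+| **Stroke count (劃數) lookup** | ❌ missing | needs a separate implementation |
+| **Five-element (五行) classification** | ❌ missing | computed from the stroke count |
+| **Name-hanja (inmyong) filter** | ❌ missing | Supreme Court list, 9,389 characters |
+| **Hanja search** (by reading / stroke count / element) | ❌ missing | |
+| **Hanja meaning** | ❌ missing | |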
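The `split_hanja()` row describes splitting text into Hanja and non-Hanja runs. The sketch below is not suminb/hanja's code. It shows the same idea with `itertools.groupby`. It assumes "Hanja" means the three blocks in the coverage table.

```python
from itertools import groupby


def is_hanja_char(ch: str) -> bool:
    cp = ord(ch)
    return (0x4E00 <= cp <= 0x9FFF      # CJK Unified Ideographs
            or 0x3400 <= cp <= 0x4DBF   # Extension A
            or 0xF900 <= cp <= 0xFAFF)  # Compatibility Ideographs


def split_runs(text: str) -> list[str]:
    """Split text into maximal runs that are all Hanja or all non-Hanja."""
    return ["".join(run) for _, run in groupby(text, key=is_hanja_char)]


print(split_runs("松山은 소나무 산"))  # ['松山', '은 소나무 산']
```

`groupby` only merges adjacent items with the same key. That is exactly the "maximal run" behavior, so no manual state tracking is needed.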
+---
+## 2. Project goals
+Build a **standalone Python package** that fills the gaps in `suminb/hanja`.
+```
+pip install hanja-tools
+```
+### Core features
+```python
+from hanja_tools import HanjaDB
+db = HanjaDB()
+# Hanja → reading (backed by suminb/hanja)
+db.get_eum('松')          # → '송'
+# Stroke count lookup (Unihan kTotalStrokes)
+db.get_stroke('松')       # → 8
+# Element calculation (suri-ohaeng method: last digit of the stroke count)
+db.get_ohaeng('松')       # → '金'  (8 strokes → last digit 8 → 金)
+# Is it a name hanja? (Supreme Court list, 9,389 characters)
+db.is_inmyong('松')       # → True
+# Search
+db.search_by_eum('송')           # → ['松', '誦', '頌', ...]
+db.search_by_stroke(8)           # → ['松', '杰', ...]
+db.search_by_ohaeng('木')        # → [...]
+db.search(eum='송', stroke=8)    # → combined criteria
+```
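A call like `db.search(eum=..., stroke=...)` reads as an AND over whichever criteria are given. Criteria left unset are ignored. The sketch below shows one way to do that. It assumes each character's data sits in a small record. `HanjaRecord` and `search_records` are hypothetical names, not the package's API. The `inmyong` flags in the sample are placeholders, not values from the official list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HanjaRecord:
    char: str
    eum: str
    stroke: int
    ohaeng: str
    inmyong: bool


def search_records(records, *, eum=None, stroke=None, ohaeng=None, inmyong=None):
    """Return chars matching every criterion that is not None (logical AND)."""
    wanted = {"eum": eum, "stroke": stroke, "ohaeng": ohaeng, "inmyong": inmyong}
    active = {k: v for k, v in wanted.items() if v is not None}
    return [r.char for r in records
            if all(getattr(r, k) == v for k, v in active.items())]


# Illustrative records; the inmyong flags are placeholders
RECORDS = [
    HanjaRecord("松", "송", 8, "金", True),
    HanjaRecord("誦", "송", 14, "火", False),
    HanjaRecord("杰", "걸", 8, "金", True),
]
print(search_records(RECORDS, eum="송", stroke=8))  # ['松']
```

`None` is the "no filter" sentinel, so `inmyong=False` still works as a real filter.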
+---
+## 3. 데이터 소스
+### 3-1. Hanja → reading mapping
+- **Source**: the `table.yml` bundled with `suminb/hanja`
+- **License**: CC0 (no restrictions, commercial use included)
+- **Size**: 27,497 characters
+- **Handling**: reuse as-is, or install as a dependency
+### 3-2. Stroke count data
+- **Source**: Unicode Consortium, **Unihan Database** `kTotalStrokes` field
+- **File**: `Unihan_IRGSources.txt` (freely available from Unicode.org)
+- **License**: Unicode License (free, redistributable)
+- **URL**: `https://www.unicode.org/Public/UCD/latest/ucd/Unihan.zip`
+- **Handling**: a one-off parsing script writes `data/stroke_count.json`
+```python
+# Unihan parsing example
+# Unihan_IRGSources.txt format (fields are tab-separated):
+# U+677E  kTotalStrokes  8
+# U+5C71  kTotalStrokes  3
+```
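Each data line in the Unihan text files has the form `U+XXXX<TAB>field<TAB>value`, and comment lines start with `#`. The sketch below is a minimal parser for the lines above. One caveat: UAX #38 describes `kTotalStrokes` as sometimes carrying two space-separated values. Those are the preferred counts for zh-Hans and zh-Hant. Which one to keep is an assumption of this sketch (the `pick` parameter). The plan does not specify it.

```python
def parse_total_strokes(lines, pick: int = 0) -> dict[str, int]:
    """Map each character to its kTotalStrokes value.

    `pick` selects which value to keep when the field lists several
    (0 = first); values past the end fall back to the last one.
    """
    strokes: dict[str, int] = {}
    for raw in lines:
        line = raw.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank and comment lines
        parts = line.split("\t")
        if len(parts) != 3 or parts[1] != "kTotalStrokes":
            continue  # other Unihan fields share the same file
        codepoint, _, value = parts
        values = value.split()
        char = chr(int(codepoint[2:], 16))  # 'U+677E' -> '松'
        strokes[char] = int(values[min(pick, len(values) - 1)])
    return strokes


sample = [
    "# Unihan_IRGSources.txt\n",
    "U+677E\tkTotalStrokes\t8\n",
    "U+5C71\tkTotalStrokes\t3\n",
    "U+5C71\tkRSUnicode\t46.0\n",
]
print(parse_total_strokes(sample))  # {'松': 8, '山': 3}
```

Taking a line iterator rather than a path keeps the parser testable without downloading `Unihan.zip`.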
+### 3-3. Name-hanja (inmyong) list
+- **Source A**: the Namuwiki name-hanja table (based on the Supreme Court original)
+  - URL: `https://namu.wiki/w/한자/인명용%20한자표`
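+  - Text file: `AppendixA-ListofHanjaforUseinPersonalNames.txt`, attached to the Supreme Court PDF
+- **Source B**: `rutopio/Korean-Name-Hanja-Charset` (GitHub, MIT license)
+  - `data-gov.csv`: crawled from the Supreme Court site, 10,163 characters
+  - `data-naver.csv`: crawled from the Naver dictionary, 8,957 characters
+- **Current standard**: June 2024 revision, **9,389 characters** in total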
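The two crawls in Source B differ in size (10,163 vs. 8,957), and neither matches the official 9,389. It is therefore worth checking how the lists overlap before choosing one. The sketch below assumes each list has already been reduced to a set of characters. `compare_sources` is a hypothetical helper, and the sample sets are illustrative.

```python
def compare_sources(a: set[str], b: set[str]) -> dict[str, int]:
    """Summarize how two character lists overlap."""
    return {
        "both": len(a & b),
        "only_a": len(a - b),
        "only_b": len(b - a),
        "union": len(a | b),
    }


gov = {"松", "山", "水", "靑"}
naver = {"松", "山", "淸"}
print(compare_sources(gov, naver))
# {'both': 2, 'only_a': 2, 'only_b': 1, 'union': 5}
```

The `only_a` / `only_b` sets themselves, not just their sizes, are what to inspect by hand. For example, they may reveal variant forms or duplicate entries that explain why a crawl exceeds the official count.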
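+### 3-4. Five-element classification
+The **suri-ohaeng method** (수리오행법) used in Korean name studies, keyed on the last digit of the stroke count:
+| Last digit of stroke count | Element |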
+|---|---|
+| 1, 2 | 木 |
+| 3, 4 | 火 |
+| 5, 6 | 土 |
+| 7, 8 | 金 |
+| 9, 0 | 水 |
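The table reduces to arithmetic on the last digit. The sketch below assumes a hypothetical name, `element_for_strokes`. The package's own `calc_ohaeng_by_stroke` is not shown in this diff, so this is not its implementation.

```python
def element_for_strokes(strokes: int) -> str:
    """Suri-ohaeng: map the last digit of a stroke count to an element.

    Digits 1-2 → 木, 3-4 → 火, 5-6 → 土, 7-8 → 金, 9 and 0 → 水.
    Subtracting 1 before taking % 10 puts 0 right after 9, so each
    pair of digits shares one index into the five-element string.
    """
    if strokes < 1:
        raise ValueError(f"stroke count must be positive, got {strokes}")
    return "木火土金水"[(strokes - 1) % 10 // 2]


print(element_for_strokes(8), element_for_strokes(13), element_for_strokes(20))
# 金 火 水
```

A ten-entry dict keyed by `strokes % 10` is equally valid and arguably easier to audit against the table. The index trick just avoids spelling out every digit.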
+---
+## 4. Tech stack
+| Item | Choice | Notes |
+|---|---|---|
+| Language | Python 3.11+ | |
+| Dependencies | `suminb/hanja`, `pyyaml` | kept minimal |
+| Data format | JSON | loaded at runtime |
+| Testing | pytest | |
+| Packaging | pyproject.toml (uv) | |
+| Distribution | PyPI | `hanja-tools` |
+| License | MIT | |
+---
+## 5. Directory layout
+```
+hanja-tools/
+├── pyproject.toml
+├── README.md
+├── LICENSE
+│
+├── hanja_tools/
+│   ├── __init__.py          # public API
+│   ├── db.py                # main HanjaDB class
+│   ├── stroke.py            # stroke count lookup
+│   ├── ohaeng.py            # element calculation
+│   ├── inmyong.py           # name-hanja filter
+│   ├── search.py            # search
+│   └── data/
+│       ├── stroke_count.json    # parsed Unihan kTotalStrokes
+│       └── inmyong_hanja.json   # Supreme Court name hanja (9,389)
+│
+├── scripts/
+│   ├── build_stroke_db.py   # parse Unihan → build stroke_count.json
+│   └── build_inmyong_db.py  # Supreme Court / Namuwiki → build inmyong_hanja.json
+│
+└── tests/
+    ├── test_stroke.py
+    ├── test_ohaeng.py
+    ├── test_inmyong.py
+    └── test_search.py
+```
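The `data/` JSON files are loaded at runtime. Paths built from the current working directory break once the package is installed. A safer option is to read the files through `importlib.resources`. The sketch below is minimal. The helper name `load_data` and the caching choice are assumptions, not the package's code. The cached object is shared between callers, so callers should treat it as read-only.

```python
import json
from functools import lru_cache
from importlib.resources import files


@lru_cache(maxsize=None)
def load_data(package: str, filename: str):
    """Read <package>/data/<filename> as JSON, caching the parsed result."""
    resource = files(package) / "data" / filename
    return json.loads(resource.read_text(encoding="utf-8"))


# Hypothetical call sites inside the package:
# strokes = load_data("hanja_tools", "stroke_count.json")
# inmyong = frozenset(load_data("hanja_tools", "inmyong_hanja.json"))
```

The data files also have to be included in the built wheel for this to work. That is a packaging configuration detail in `pyproject.toml`.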
+---
+## 6. Build script design
+### `scripts/build_stroke_db.py`
+```python
+"""
+One-off build script: parse kTotalStrokes from the Unicode Unihan
+Database and write data/stroke_count.json.
+"""
+import io
+import json
+import urllib.request
+import zipfile
+UNIHAN_URL = "https://www.unicode.org/Public/UCD/latest/ucd/Unihan.zip"
+def fetch_stroke_count():
+    print("Downloading Unihan.zip...")
+    with urllib.request.urlopen(UNIHAN_URL) as r:
+        data = r.read()
+    stroke_map = {}
+    with zipfile.ZipFile(io.BytesIO(data)) as z:
+        # kTotalStrokes is stored in Unihan_IRGSources.txt (tab-separated lines)
+        with z.open("Unihan_IRGSources.txt") as f:
+            for line in f:
+                line = line.decode('utf-8').strip()
+                if '\tkTotalStrokes\t' in line:
+                    parts = line.split('\t')
+                    codepoint = parts[0]          # 'U+677E'
+                    # The value may list more than one count separated by
+                    # spaces; int() would fail on "7 8", so keep the first
+                    stroke = int(parts[2].split()[0])
+                    char = chr(int(codepoint[2:], 16))
+                    stroke_map[char] = stroke
+    print(f"Collected stroke counts for {len(stroke_map)} characters")
+    return stroke_map
+if __name__ == '__main__':
+    result = fetch_stroke_count()
+    with open('hanja_tools/data/stroke_count.json', 'w', encoding='utf-8') as f:
+        json.dump(result, f, ensure_ascii=False)
+    print("Saved: hanja_tools/data/stroke_count.json")
+```
+### `scripts/build_inmyong_db.py`
+```python
+"""
+Parse data-gov.csv from rutopio/Korean-Name-Hanja-Charset (or the
+Supreme Court txt file) and write inmyong_hanja.json.
+"""
+import csv
+import json
+import urllib.request
+# rutopio/Korean-Name-Hanja-Charset (MIT)
+CSV_URL = "https://raw.githubusercontent.com/rutopio/Korean-Name-Hanja-Charset/main/data-gov.csv"
+def build():
+    chars = set()
+    with urllib.request.urlopen(CSV_URL) as r:
+        # 'utf-8-sig' also drops a leading BOM if the file has one
+        reader = csv.DictReader(r.read().decode('utf-8-sig').splitlines())
+        for row in reader:
+            # 'character' must match the CSV header; short rows yield None
+            ch = (row.get('character') or '').strip()
+            if ch:
+                chars.add(ch)
+    print(f"Collected {len(chars)} name-hanja characters")
+    return sorted(chars)
+if __name__ == '__main__':
+    result = build()
+    with open('hanja_tools/data/inmyong_hanja.json', 'w', encoding='utf-8') as f:
+        json.dump(result, f, ensure_ascii=False)
+    print("Saved: hanja_tools/data/inmyong_hanja.json")
+```
+---
+## 7. Public API design
+```python
+from hanja_tools import HanjaDB, stroke, ohaeng, search
+# === Single-character lookup ===
+db = HanjaDB()
+db.get_eum('松')           # '송'        : Hangul reading
+db.get_stroke('松')        # 8           : stroke count
+db.get_ohaeng('松')        # '金'        : element (suri-ohaeng)
+db.is_inmyong('松')        # True        : approved name hanja
+db.info('松')              # {'char':'松','eum':'송','stroke':8,'ohaeng':'金','inmyong':True}
+# === Search ===
+db.search_by_eum('송')             # ['松','誦','頌',...]  search by reading
+db.search_by_stroke(8)             # all 8-stroke hanja
+db.search_by_ohaeng('木')          # all hanja whose element is 木
+db.search(eum='송', inmyong=True)  # combined criteria
+db.search(stroke=8, ohaeng='金', inmyong=True)
+# === Element calculation (standalone function) ===
+from hanja_tools.ohaeng import calc_ohaeng_by_stroke
+calc_ohaeng_by_stroke(8)   # '金'
+calc_ohaeng_by_stroke(13)  # '火'  (last digit 3)
+# === Stroke count (standalone function) ===
+from hanja_tools.stroke import get_stroke
+get_stroke('松')            # 8
+get_stroke('山')            # 3
+```
+---
+## 8. Test plan
+```
+tests/
+├── test_stroke.py
+│   ├── test_known_strokes()      # checks 松=8, 山=3, 水=4, etc.
+│   ├── test_stroke_not_found()   # unknown character → returns None
+│   └── test_stroke_count_range() # every count within 1 to 64 strokes
+│
+├── test_ohaeng.py
+│   ├── test_ohaeng_mok()         # last digit 1 or 2 → 木
+│   ├── test_ohaeng_hwa()         # last digit 3 or 4 → 火
+│   ├── test_ohaeng_to()          # last digit 5 or 6 → 土
+│   ├── test_ohaeng_geum()        # last digit 7 or 8 → 金
+│   └── test_ohaeng_su()          # last digit 9 or 0 → 水
+│
+├── test_inmyong.py
+│   ├── test_inmyong_true()       # 松 → True
+│   ├── test_inmyong_false()      # a hanja not on the list → False
+│   └── test_inmyong_count()      # at least 9,000 characters
+│
+└── test_search.py
+    ├── test_search_by_eum()
+    ├── test_search_by_stroke()
+    ├── test_search_by_ohaeng()
+    └── test_search_compound()
+```
+---
+## 9. Relationship to ho-namer
+```
+[hanja-tools]  →  pip install hanja-tools
+                         ↓
+                   [ho-namer]
+                   imported from core/hanja.py
+                   used in Phase 6 (yin-yang arrangement, Hanja DB)
+```
+ho-namer's `core/hanja.py` installs `hanja-tools` as a dependency and uses it.
+Until hanja-tools is finished, ho-namer works from a temporary in-tree implementation (`core/hanja_local.py`).
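The interim arrangement is a common optional-import pattern: use hanja-tools when it is installed, otherwise fall back to the in-tree module. The sketch below implements it with `importlib.import_module`. `first_available` is a hypothetical helper. The candidate order is an assumption, and `core.hanja_local` is the placeholder path named above.

```python
import importlib
from types import ModuleType


def first_available(*module_names: str) -> ModuleType:
    """Import and return the first module in the list that can be imported."""
    errors = []
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError as exc:  # also covers ModuleNotFoundError
            errors.append(f"{name}: {exc}")
    raise ImportError("no usable backend: " + "; ".join(errors))


# In ho-namer's core/hanja.py this might read:
# backend = first_available("hanja_tools", "core.hanja_local")
```

Catching `ImportError` also hides import errors raised *inside* an installed module. The collected messages are kept so that such a failure stays diagnosable.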
+---
+## 10. Reference repositories
+| Repository | Role | License |
+|---|---|---|
+| `suminb/hanja` | Hanja → reading table.yml (27,497 characters) | CC0 |
+| `rutopio/Korean-Name-Hanja-Charset` | crawled Supreme Court name-hanja data | MIT |
+| `byung-u/EdenNaming` | reference for naming domain logic | MIT |
+| Unicode Unihan DB | kTotalStrokes stroke data | Unicode License |
+| Namuwiki name-hanja table | text list based on the Supreme Court original | n/a |
+---
+## 11. Development order (recommended)
+```
+Phase 0  project setup (pyproject.toml, directories)
+    ↓
+Phase 1  scripts/build_stroke_db.py → generate data/stroke_count.json
+    ↓
+Phase 2  scripts/build_inmyong_db.py → generate data/inmyong_hanja.json
+    ↓
+Phase 3  stroke.py + test_stroke.py  (stroke lookup)
+    ↓
+Phase 4  ohaeng.py + test_ohaeng.py  (element calculation)
+    ↓
+Phase 5  inmyong.py + test_inmyong.py (name-hanja filter)
+    ↓
+Phase 6  db.py + search.py + test_search.py (unified class + search)
+    ↓
+Phase 7  PyPI release (twine, finalize pyproject.toml)
+```
+---
+*Written: 2026-06-24*
+*Related project: ho-namer (a FastAPI app that suggests calligraphers' pen names)*

hanja_tools-0.1.0/hanja_tools/__init__.py ADDED

@@ -0,0 +1,20 @@
+"""
+hanja_tools — Korean Hanja utilities
+Public API:
+    HanjaDB  : unified access class
+    get_stroke, calc_ohaeng_by_stroke : convenience re-exports
+"""
+from hanja_tools.db import HanjaDB
+from hanja_tools.stroke import get_stroke
+from hanja_tools.ohaeng import calc_ohaeng_by_stroke, get_ohaeng
+from hanja_tools.inmyong import is_inmyong
+__all__ = [
+    "HanjaDB",
+    "get_stroke",
+    "calc_ohaeng_by_stroke",
+    "get_ohaeng",
+    "is_inmyong",
+]