PyPI - cihai-cli - Versions diffs - 0.32.0__tar.gz → 0.34.0__tar.gz - Mend

cihai-cli 0.32.0tar.gz → 0.34.0tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (115)

cihai_cli-0.34.0/.github/contributing.md ADDED

@@ -0,0 +1,26 @@
+# Contributing
+When contributing to this repository, please first discuss the change you wish to make via issue,
+email, or any other method with the maintainers of this repository before making a change.
+See [AGENTS.md](../AGENTS.md) for coding standards and development workflow.
+## Pull Request Process
+1. **Format and lint**: `uv run ruff format .` then `uv run ruff check . --fix --show-fixes`
+2. **Type check**: `uv run mypy`
+3. **Test**: `uv run pytest` — all tests must pass before submitting
+4. **Document**: Update docs if your change affects the public interface
+5. You may merge the Pull Request once you have the sign-off of one other developer. If you
+   do not have permission to do that, you may request a reviewer to merge it for you.
+## Decorum
+- Participants will be tolerant of opposing views.
+- Participants must ensure that their language and actions are free of personal
+  attacks and disparaging personal remarks.
+- When interpreting the words and actions of others, participants should always
+  assume good intentions.
+- Behaviour which can be reasonably considered harassment will not be tolerated.
+Based on [Ruby's Community Conduct Guideline](https://www.ruby-lang.org/en/conduct/)

{cihai_cli-0.32.0 → cihai_cli-0.34.0}/.github/workflows/docs.yml RENAMED

@@ -20,7 +20,7 @@ jobs:
       - uses: actions/checkout@v6
       - name: Filter changed file paths to outputs
-        uses: dorny/paths-filter@v3.0.2
+        uses: dorny/paths-filter@v4.0.1
         id: changes
         with:
           filters: |
@@ -51,7 +51,7 @@ jobs:
         run: uv sync --all-extras --dev
       - name: Install just
-        uses: extractions/setup-just@v3
+        uses: extractions/setup-just@v4
       - name: Print python versions
         if: env.PUBLISH == 'true'
@@ -59,6 +59,15 @@ jobs:
           python -V
           uv run python -V
+      - name: Cache sphinx fonts
+        if: env.PUBLISH == 'true'
+        uses: actions/cache@v5
+        with:
+          path: ~/.cache/sphinx-fonts
+          key: sphinx-fonts-${{ hashFiles('docs/conf.py') }}
+          restore-keys: |
+            sphinx-fonts-
       - name: Build documentation
         if: env.PUBLISH == 'true'
         run: |
@@ -66,7 +75,7 @@ jobs:
       - name: Configure AWS Credentials
         if: env.PUBLISH == 'true'
-        uses: aws-actions/configure-aws-credentials@v5
+        uses: aws-actions/configure-aws-credentials@v6
         with:
           role-to-assume: ${{ secrets.CIHAI_CLI_DOCS_ROLE_ARN }}
           aws-region: us-east-1

{cihai_cli-0.32.0 → cihai_cli-0.34.0}/.github/workflows/tests.yml RENAMED

@@ -41,7 +41,7 @@ jobs:
       - name: Test with pytest
         run: uv run py.test --cov=./ --cov-report=xml
-      - uses: codecov/codecov-action@v5
+      - uses: codecov/codecov-action@v6
         with:
           token: ${{ secrets.CODECOV_TOKEN }}

{cihai_cli-0.32.0 → cihai_cli-0.34.0}/.gitignore RENAMED

@@ -56,3 +56,7 @@ monkeytype.sqlite3
 # Test data
 .cihai_cache/
+# Generated by sphinx_fonts extension (downloaded at build time)
+docs/_static/fonts/
+docs/_static/css/fonts.css

cihai_cli-0.34.0/.tool-versions ADDED

@@ -0,0 +1,3 @@
+just 1.54
+uv 0.11.25
+python 3.14 3.13 3.12 3.11 3.10

cihai_cli-0.34.0/AGENTS.md ADDED

@@ -0,0 +1,435 @@
+# AGENTS.md
+Guidance for AI agents (Cursor, Claude Code, Copilot, etc.) working in this repository.
+## CRITICAL REQUIREMENTS
+### Test Success
+- ALL tests must pass (unit, doctest, lint, type checks) before declaring work complete.
+- Do not describe code as "working" if any test fails.
+- Fix regressions rather than disabling or skipping tests unless explicitly approved.
+## Project Overview
+gp-libs is the shared tooling stack used across the git-pull ecosystem. This repository, `cihai-cli`, is a command-line interface built on top of the `cihai` library to explore the Unihan (CJK) character database. Key abilities:
+- Lookup CJK characters with `cihai info <char>` and YAML-formatted output.
+- Reverse search definitions with `cihai reverse <term>`.
+- Bootstraps and queries the Unihan dataset via `cihai` / `unihan-etl`.
+- Provides a small, typed argparse-based CLI (`src/cihai_cli/cli.py`) exposed as the `cihai` entry point.
+## Development Environment
+This project uses:
+- Python 3.10+
+- [uv](https://github.com/astral-sh/uv) for dependency and task execution
+- [ruff](https://github.com/astral-sh/ruff) for linting/formatting
+- [mypy](https://github.com/python/mypy) with strict settings
+- [pytest](https://docs.pytest.org/) (+ doctests) for testing
+- [gp-libs](https://github.com/gp-libs/gp-libs) for shared docs/testing helpers
+- Sphinx (Furo) for documentation
+## Common Commands
+### Setup
+```bash
+# Install dependencies (editable)
+uv pip install --editable .
+uv pip sync
+# Install with dev extras
+uv pip install --editable . -G dev
+```
+### Tests
+```bash
+just test           # or: uv run pytest
+uv run pytest tests/test_cli.py           # single file
+uv run pytest tests/test_cli.py::test_info_command  # single test
+just start          # run tests then watch with pytest-watcher
+uv run ptw .        # standalone watcher (doctests enabled by default)
+```
+### Linting & Types
+```bash
+just ruff           # uv run ruff check .
+just ruff-format    # uv run ruff format .
+uv run ruff check . --fix --show-fixes
+just mypy           # strict type checking
+```
+### Documentation
+```bash
+just build-docs     # build Sphinx HTML in docs/_build
+just start-docs     # autobuild + livereload
+just design-docs    # update CSS/JS assets
+```
+### Workflow (recommended)
+1) `uv run ruff format .`
+2) `uv run pytest`
+3) `uv run ruff check . --fix --show-fixes`
+4) `uv run mypy`
+5) `uv run pytest` (verify clean)
+## Code Architecture (quick map)
+- `src/cihai_cli/cli.py`: argparse entrypoint, implements `info` and `reverse` commands, logging setup.
+- `src/cihai_cli/__about__.py`: package metadata (`__version__`).
+- Tests: `tests/` (unit) plus doctests in `src/` and `docs/`.
+- Docs: `docs/` Sphinx project (Furo theme).
+## Testing Strategy
+- Pytest with doctests enabled (`addopts` in `pyproject.toml`).
+- Prefer real `cihai` / `unihan_etl` integration over heavy mocking; reuse fixtures where present.
+- Watch mode: `uv run ptw .` (used in `just start`).
+- Coverage via `pytest-cov`; configuration in `pyproject.toml`.
+- Prefer fixtures over mocks (`server`, `session`, etc. when available); use `tmp_path` over `tempfile`, `monkeypatch` over `unittest.mock`.
+## Coding Standards
+- `from __future__ import annotations` required; enforced by ruff.
+- Namespace imports for stdlib/typing (`import typing as t`); third-party packages may use `from X import Y`.
+- Docstrings follow NumPy style (see `tool.ruff.lint.pydocstyle`).
+- Python target version: 3.10 (`tool.ruff.target-version`).
+- Keep CLI output human-friendly YAML; avoid breaking existing flags/args.
+- Doctests: keep concise, narrative Examples blocks; move complex flows to `tests/examples/`.
+## Logging Standards
+These rules guide future logging changes; existing code may not yet conform.
+### Logger setup
+- Use `logging.getLogger(__name__)` in every module
+- Add `NullHandler` in library `__init__.py` files
+- Never configure handlers, levels, or formatters in library code — that's the application's job
+### Structured context via `extra`
+Pass structured data on every log call where useful for filtering, searching, or test assertions.
+**Core keys** (stable, scalar, safe at any log level):
+| Key | Type | Context |
+|-----|------|---------|
+| `unihan_field` | `str` | UNIHAN field name |
+| `unihan_source_file` | `str` | source data file path |
+| `unihan_record_count` | `int` | records processed |
+| `cihai_command` | `str` | CLI command name |
+**Heavy/optional keys** (DEBUG only, potentially large):
+| Key | Type | Context |
+|-----|------|---------|
+| `unihan_stdout` | `list[str]` | subprocess stdout lines (truncate or cap; `%(unihan_stdout)s` produces repr) |
+| `unihan_stderr` | `list[str]` | subprocess stderr lines (same caveats) |
+Treat established keys as compatibility-sensitive — downstream users may build dashboards and alerts on them. Change deliberately.
+### Key naming rules
+- `snake_case`, not dotted; `unihan_` prefix
+- Prefer stable scalars; avoid ad-hoc objects
+- Heavy keys (`unihan_stdout`, `unihan_stderr`) are DEBUG-only; consider companion `unihan_stdout_len` fields or hard truncation (e.g. `stdout[:100]`)
+### Lazy formatting
+`logger.debug("msg %s", val)` not f-strings. Two rationales:
+- Deferred string interpolation: skipped entirely when level is filtered
+- Aggregator message template grouping: `"Running %s"` is one signature grouped ×10,000; f-strings make each line unique
+When computing `val` itself is expensive, guard with `if logger.isEnabledFor(logging.DEBUG)`.
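Reader note (not part of the diff): the lazy-formatting rule above can be checked with a small sketch. The logger name `example.lazy`, the `RenderProbe` and `CollectingHandler` classes, and the `describe` helper are all hypothetical names made up for illustration; only the standard-library `logging` calls are real.

```python
import logging

logger = logging.getLogger("example.lazy")  # hypothetical logger name
logger.propagate = False  # keep the example's records away from the root logger


class RenderProbe:
    """Counts how many times it is interpolated into a log message."""

    def __init__(self) -> None:
        self.renders = 0

    def __str__(self) -> str:
        self.renders += 1
        return "probe"


class CollectingHandler(logging.Handler):
    """Stores rendered messages so the example has something to inspect."""

    def __init__(self) -> None:
        super().__init__()
        self.messages = []

    def emit(self, record: logging.LogRecord) -> None:
        # getMessage() is where "%s" interpolation actually happens.
        self.messages.append(record.getMessage())


handler = CollectingHandler()
logger.addHandler(handler)
probe = RenderProbe()

logger.setLevel(logging.INFO)
logger.debug("running %s", probe)  # filtered out: the probe is never rendered
renders_when_filtered = probe.renders

logger.setLevel(logging.DEBUG)
logger.debug("running %s", probe)  # emitted: rendered once by the handler
renders_when_enabled = probe.renders


def describe(records):
    """Hypothetical stand-in for a computation that is expensive to run."""
    return ",".join(sorted(records))


# Guard the call when producing the argument itself is costly.
if logger.isEnabledFor(logging.DEBUG):
    logger.debug("records %s", describe(["b", "a"]))
```

With an f-string, the probe would be rendered before `logger.debug` is even called, whether or not the level is enabled.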
+### stacklevel for wrappers
+Increment for each wrapper layer so `%(filename)s:%(lineno)d` and OTel `code.filepath` point to the real caller. Verify whenever call depth changes.
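Reader note (not part of the diff): a minimal sketch of the `stacklevel` rule, assuming a single wrapper layer. `log_event`, `load_dataset`, and `RecordCollector` are hypothetical names; `stacklevel` is the standard keyword argument of the `logging` methods.

```python
import logging

logger = logging.getLogger("example.stacklevel")  # hypothetical logger name
logger.propagate = False
logger.setLevel(logging.DEBUG)


class RecordCollector(logging.Handler):
    """Keeps raw LogRecord objects for inspection."""

    def __init__(self) -> None:
        super().__init__()
        self.records = []

    def emit(self, record: logging.LogRecord) -> None:
        self.records.append(record)


collector = RecordCollector()
logger.addHandler(collector)


def log_event(message: str, *args: object) -> None:
    """Hypothetical wrapper: one extra layer, so stacklevel is 2."""
    logger.debug(message, *args, stacklevel=2)


def load_dataset() -> None:
    """Hypothetical caller whose location should appear on the record."""
    log_event("dataset loaded")


load_dataset()
caller_name = collector.records[0].funcName
```

Without `stacklevel=2`, the record would be attributed to `log_event` itself. A second wrapper layer would need `stacklevel=3`.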
+### LoggerAdapter for persistent context
+For objects with stable identity (Dataset, Reader, Exporter), use `LoggerAdapter` to avoid repeating the same `extra` on every call. Lead with the portable pattern (override `process()` to merge); `merge_extra=True` simplifies this on Python 3.13+.
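Reader note (not part of the diff): the portable `process()` override mentioned above might look like the sketch below. `ContextAdapter`, `RecordCollector`, the logger name, and the file name `Unihan_Readings.txt` are illustrative assumptions; the `extra` keys come from the core-keys table in the file.

```python
from __future__ import annotations

import logging
import typing as t


class ContextAdapter(logging.LoggerAdapter):
    """Merge adapter-level context with per-call ``extra``."""

    def process(
        self, msg: t.Any, kwargs: t.MutableMapping[str, t.Any]
    ) -> tuple[t.Any, t.MutableMapping[str, t.Any]]:
        call_extra = kwargs.get("extra") or {}
        # Per-call keys win over adapter-level keys on conflict.
        kwargs["extra"] = {**(self.extra or {}), **call_extra}
        return msg, kwargs


class RecordCollector(logging.Handler):
    """Keeps raw LogRecord objects for inspection."""

    def __init__(self) -> None:
        super().__init__()
        self.records: list[logging.LogRecord] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.records.append(record)


base = logging.getLogger("example.adapter")  # hypothetical logger name
base.propagate = False
base.setLevel(logging.INFO)
collector = RecordCollector()
base.addHandler(collector)

# The source file is attached once; each call adds only what is new.
reader_log = ContextAdapter(base, {"unihan_source_file": "Unihan_Readings.txt"})
reader_log.info("export finished", extra={"unihan_record_count": 100})
record = collector.records[0]
```

The default `LoggerAdapter.process()` replaces any per-call `extra` with the adapter's own mapping, which is why the override merges the two.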
+### Log levels
+| Level | Use for | Examples |
+|-------|---------|----------|
+| `DEBUG` | Internal mechanics, data I/O | Field parsing, record transformation steps |
+| `INFO` | Data lifecycle, user-visible operations | Download completed, export finished, database bootstrapped |
+| `WARNING` | Recoverable issues, deprecation, user-actionable config | Missing optional field, deprecated data format |
+| `ERROR` | Failures that stop an operation | Download failed, parse error, database write failed |
+Config discovery noise belongs in `DEBUG`; only surprising/user-actionable config issues → `WARNING`.
+### Message style
+- Lowercase, past tense for events: `"download completed"`, `"parse error"`
+- No trailing punctuation
+- Keep messages short; put details in `extra`, not the message string
+### Exception logging
+- Use `logger.exception()` only inside `except` blocks when you are **not** re-raising
+- Use `logger.error(..., exc_info=True)` when you need the traceback outside an `except` block
+- Avoid `logger.exception()` followed by `raise` — this duplicates the traceback. Either add context via `extra` that would otherwise be lost, or let the exception propagate
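Reader note (not part of the diff): a sketch of the first exception-logging rule, where the error is handled in place and not re-raised. `parse_count`, `RecordCollector`, and the logger name are hypothetical; `kTotalStrokes` is used only as an example field value.

```python
from __future__ import annotations

import logging

logger = logging.getLogger("example.exceptions")  # hypothetical logger name
logger.propagate = False


class RecordCollector(logging.Handler):
    """Keeps raw LogRecord objects for inspection."""

    def __init__(self) -> None:
        super().__init__()
        self.records: list[logging.LogRecord] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.records.append(record)


collector = RecordCollector()
logger.addHandler(collector)


def parse_count(raw: str) -> int | None:
    """Hypothetical parser that swallows bad input and reports it once."""
    try:
        return int(raw)
    except ValueError:
        # Handled here and not re-raised, so the traceback is logged exactly once.
        logger.exception("parse error", extra={"unihan_field": "kTotalStrokes"})
        return None


good = parse_count("12")
bad = parse_count("twelve")
```

If the function re-raised instead, the guidance above suggests dropping the `logger.exception()` call unless the `extra` context would otherwise be lost.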
+### Testing logs
+Assert on `caplog.records` attributes, not string matching on `caplog.text`:
+- Scope capture: `caplog.at_level(logging.DEBUG, logger="cihai_cli.cli")`
+- Filter records rather than index by position: `[r for r in caplog.records if hasattr(r, "unihan_field")]`
+- Assert on schema: `record.unihan_record_count == 100` not `"100 records" in caplog.text`
+- `caplog.record_tuples` cannot access extra fields — always use `caplog.records`
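Reader note (not part of the diff): the `caplog` guidance above, sketched as a pytest test. `export`, `records_with`, the logger name, and the test name are hypothetical; `caplog.at_level` and `caplog.records` are the pytest fixture's real interface.

```python
from __future__ import annotations

import logging

LOGGER_NAME = "example.export"  # hypothetical logger name
logger = logging.getLogger(LOGGER_NAME)


def export(count: int) -> None:
    """Hypothetical operation that reports its record count via ``extra``."""
    logger.info("export finished", extra={"unihan_record_count": count})


def records_with(records: list[logging.LogRecord], key: str) -> list[logging.LogRecord]:
    """Filter by attribute rather than indexing by position."""
    return [r for r in records if hasattr(r, key)]


def test_export_logs_record_count(caplog) -> None:
    # Scope capture to one logger and level.
    with caplog.at_level(logging.INFO, logger=LOGGER_NAME):
        export(100)

    (record,) = records_with(caplog.records, "unihan_record_count")
    # Assert on the structured field, not on the rendered text.
    assert record.unihan_record_count == 100
```

The single-element unpacking doubles as an assertion that exactly one matching record was captured.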
+### Avoid
+- f-strings/`.format()` in log calls
+- Unguarded logging in hot loops (guard with `isEnabledFor()`)
+- Catch-log-reraise without adding new context
+- `print()` for diagnostics
+- Logging secret env var values (log key names only)
+- Non-scalar ad-hoc objects in `extra`
+- Requiring custom `extra` fields in format strings without safe defaults (missing keys raise `KeyError`)
+## Documentation Standards
+### Code Blocks in Documentation
+When writing documentation (README, CHANGES, docs/), follow these rules for code blocks:
+**One command per code block.** This makes commands individually copyable. For sequential commands, either use separate code blocks or chain them with `&&` or `;` and `\` continuations (keeping it one logical command).
+**Put explanations outside the code block**, not as comments inside.
+Good:
+Run the tests:
+```console
+$ uv run pytest
+```
+Run with coverage:
+```console
+$ uv run pytest --cov
+```
+Bad:
+```console
+# Run the tests
+$ uv run pytest
+# Run with coverage
+$ uv run pytest --cov
+```
+### Shell Command Formatting
+These rules apply to shell commands in documentation (README, CHANGES, docs/), **not** to Python doctests.
+**Use `console` language tag with `$ ` prefix.** This distinguishes interactive commands from scripts and enables prompt-aware copy in many terminals.
+Good:
+```console
+$ uv run pytest
+```
+Bad:
+```bash
+uv run pytest
+```
+**Split long commands with `\` for readability.** Each flag or flag+value pair gets its own continuation line, indented. Positional parameters go on the final line.
+Good:
+```console
+$ pipx install \
+    --suffix=@next \
+    --pip-args '\--pre' \
+    --force \
+    'cihai-cli'
+```
+Bad:
+```console
+$ pipx install --suffix=@next --pip-args '\--pre' --force 'cihai-cli'
+```
+### Changelog Conventions
+These rules apply when authoring entries in `CHANGES`, which is rendered as the Sphinx changelog page. Modeled on Django's release-notes shape — deliverables get titles and prose, not bullets.
+**Release entry boilerplate.** Every release header is `## cihai-cli X.Y.Z (YYYY-MM-DD)`. The file opens with a `## cihai-cli X.Y.Z (unreleased)` placeholder block fenced by `<!-- KEEP THIS PLACEHOLDER ... -->` and `<!-- END PLACEHOLDER ... -->` HTML comments — new release entries land immediately below the END marker, never above it.
+**Open with a multi-sentence lead paragraph.** Plain prose, no italic. Open with the version as sentence subject (*"cihai-cli X.Y.Z ships …"*) so the lead is self-contained when excerpted. Two to four sentences telling the reader what shipped and who cares — user-visible takeaways, not internal mechanism. Cross-reference detail docs with `{ref}` to keep the lead compact.
+**Each deliverable is a section, not a bullet.** Inside `### What's new`, every distinct deliverable gets a `#### Deliverable title (#NN)` heading naming it in user vocabulary, followed by 1-3 prose paragraphs explaining what shipped. Don't wrap a paragraph in `- ` — bullets are for enumerable lists, not paragraph containers. Cross-link detail docs (`See {ref}\`foo\` for details.`) so prose stays focused.
+**The deliverable test.** Before writing an entry, ask: "What's the deliverable, in user vocabulary?" If you can't answer in one sentence, the entry isn't ready. Mechanism (helper internals, byte counters, schema-validation locations) belongs in PR descriptions and code comments, not the changelog.
+**Fixed subheadings**, in this order when present: `### Breaking changes`, `### Dependencies`, `### What's new`, `### Fixes`, `### Documentation`, `### Development`. Dev tooling (helper scripts, internal automation) lives under `### Development`. For breaking changes, show the migration path with concrete inline code (e.g. a `# Before` / `# After` fenced code block). Dependency floor bumps use the form ``Minimum `pkg>=X.Y.Z` (was `>=X.Y.W`)``.
+**PR refs `(#NN)`** sit in each deliverable's `####` heading.
+**When bullets are appropriate.** Catch-all sections (`### Fixes`, occasionally `### Documentation`) with 3+ genuinely small items use bullets — one line each, never paragraphs. If a bullet swells past two lines, promote it to a `#### Title (#NN)` heading with prose body.
+**Anti-patterns.**
+- Fragile metrics: token ceilings, third-party version pins, percent benchmarks, exact byte counts. Describe the *capability*, not the math.
+- Internal jargon: private symbols (leading-underscore identifiers), algorithm names exposed for the first time, backend scaffolding.
+- Walls of text dressed up as bullets.
+- Buried breaking changes — they get their own subheading at the top of the entry.
+**Always link autodoc'd APIs.** Any class, method, function, exception, or attribute that has its own rendered page must be cited via the appropriate role (`{class}`, `{meth}`, `{func}`, `{exc}`, `{attr}`) — never with plain backticks. Doc pages without explicit ref labels use `{doc}`. Plain backticks are correct for code syntax, env vars, parameter names, and file paths that aren't doc pages — anything without an autodoc destination.
+**MyST roles.** Class references use `{class}`, methods use `{meth}`, functions use `{func}`, exceptions use `{exc}`, attributes use `{attr}`, internal anchors use `{ref}`, doc-path links use `{doc}`.
+**Summarization style.** When a user asks "what changed in the latest version?" or similar, lead with the entry's lead paragraph (paraphrased if needed), followed by each `####` deliverable heading under `### What's new` with a one-sentence summary. Cite `(#NN)` only if the user asks for source links. Don't invent versions, dates, or numbers not present in `CHANGES`. Don't quote line numbers or file offsets — those shift as the file evolves.
+## Debugging Tips
+- Lean on `pytest -k <pattern> -vv` for focused failures.
+- For CLI behavior, run `uv run cihai info 好` or `uv run cihai reverse library`.
+- If Unihan DB is missing, CLI bootstraps automatically; avoid altering that flow unless required.
+- Stuck in loops? Pause, minimize to a minimal repro, document exact errors, and restate the hypothesis before another attempt.
+## Git Commit Standards
+Commit subjects: `Scope(type[detail]): concise description`
+Body template:
+```
+why: Reason or impact.
+what:
+- Key technical changes
+- Single topic only
+```
+Guidelines:
+- Subject ≤50 chars; body lines ≤72 chars; imperative mood.
+- One topic per commit; separate subject and body with a blank line.
+- Mark breaking changes with `BREAKING:` and include related issue refs when relevant.
+Common commit types:
+- **feat**: New features or enhancements
+- **fix**: Bug fixes
+- **refactor**: Code restructuring without functional change
+- **docs**: Documentation updates
+- **chore**: Maintenance (dependencies, tooling, config)
+- **test**: Test-related updates
+- **style**: Code style and formatting
+- **py(deps)**: Dependencies
+- **py(deps[dev])**: Dev dependencies
+- **ai(rules[AGENTS])**: AI rule updates
+- **ai(claude[rules])**: Claude Code rules (CLAUDE.md)
+- **ai(claude[command])**: Claude Code command changes
+#### Release commits
+Never create tags. Never push tags. The user handles tagging and tag
+pushes (tags trigger the CI publish workflow).
+Release commit subjects are plain and short: `Tag v<version>`. Put
+the detailed why/what in the commit body. Don't use the
+`Scope(type[detail]):` format for releases — don't bury the lede.
+## Notes & Docs Authoring
+- For `notes/**/*.md`, keep content concise and well-structured (headings, bullets, code fences).
+- Use clear link text `[Title](mdc:URL)` and avoid redundancy; follow llms.txt style when possible.
+## References
+- Project docs: https://cihai-cli.git-pull.com
+- Library docs: https://cihai.git-pull.com
+- Unihan dataset: https://www.unicode.org/charts/unihan.html
+- Shared tooling: https://github.com/gp-libs/gp-libs
+## AI Slop Prevention
+Treat AI slop as **review-hostile noise**, not as proof that text or
+code is wrong. The goal is to maximize information density by removing
+artifacts that make the repository harder to trust or navigate.
+### The Anti-Slop Rubric
+Before committing, audit all AI-assisted changes for these noise
+patterns:
+- **AI Signatures:** Remove "Generated by", footers, conversational
+  filler ("Certainly!", "Here is..."), unexplained emojis (🤖, ✨), and
+  AI-tool metadata.
+- **Brittle References:** Avoid hard-coded line numbers, fragile
+  file/test counts, dated "as of" claims, bare SHAs, and local
+  absolute paths unless they are strict evidentiary artifacts (e.g.,
+  benchmark logs).
+- **Diff Narration:** Do not restate what moved, was renamed, or was
+  removed in artifacts the downstream reader holds: code, docstrings,
+  README, CHANGES, PR descriptions, or release notes. The diff and
+  commit message already carry this history.
+- **Branch-Internal Narrative:** Do not mention intermediate branch
+  states, abandoned approaches, or "no longer" behavior unless users
+  of a published release actually experienced the old state (**The
+  Published-Release Test**).
+- **Low-Value Scaffolding:** Remove ownerless TODOs (`TODO: revisit`),
+  unused future-proofing, debug artifacts, and defensive wrappers that
+  do not protect a currently reachable failure mode.
+- **Prose Inflation:** Replace generic AI "tells" like *comprehensive,
+  robust, seamless, production-ready, leverage, delve, tapestry,* and
+  *best practices* with concrete descriptions of behavior,
+  constraints, or trade-offs.
+### Preservation & Context
+**When unsure, leave the text in place and ask.** Subjective cleanup
+must never be a reason to remove load-bearing rationale.
+- **Preserve the "Why":** You MUST NOT delete comments that document
+  invariants, protocol constraints, platform quirks, security
+  boundaries, and upstream workarounds.
+- **Evidence is Immune:** Preserve exact counts, dates, and SHAs when
+  they serve as evidence in benchmark results, release notes, stack
+  traces, or lockfiles.
+- **Behavior Over Inventory:** A useful description explains what
+  changed for the *system or user*; it does not provide an inventory
+  of files or functions the diff already shows.
+### The Published-Release Test
+Long-running branches accumulate tactical decisions — renames,
+refactors, attempts-then-reverts. When deciding what counts as
+branch-internal, use trunk or the parent branch as the baseline — not
+intermediate states inside the current branch. Ask:
+> Did users of the most recently published release ever experience
+> this old name, old behavior, or bug?
+If the answer is **no**, it is branch-internal narrative. Move it to
+the commit message and describe only the final state in the artifact.
+**Keep in shipped artifacts:**
+- Deprecations and migration guides for symbols that actually shipped.
+- `### Fixes` entries for bugs that affected users of a published
+  release.
+- Comments explaining *why the current code looks this way*
+  (invariants, platform quirks) that make sense to a reader who never
+  saw the previous version.
+### Cleanup in Hindsight
+When applying these rules retroactively from inside a feature branch,
+first establish scope by diffing against the parent branch (or trunk)
+to identify which commits this branch actually introduced. Then:
+- **In-branch commits:** Prompt the user with two options: `fixup!`
+  commits with `git rebase --autosquash` to address each causal commit
+  at its source, or a single cleanup commit at branch tip.
+- **Trunk/Parent commits:** Default to leaving them alone. Act only on
+  explicit user instruction. If the user opts in, fold the cleanup
+  into a single commit at branch tip; do not rewrite shared history.
+- **Scope guard:** If cleaning prior slop would touch a colleague's
+  work or expand the branch beyond its stated goal, stay in lane:
+  protect the current goal and leave prior slop alone.
