ldv-cli (PyPI): 0.9.0.tar.gz → 0.11.0.tar.gz


Files changed (34)

{ldv_cli-0.9.0 → ldv_cli-0.11.0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: ldv-cli
-Version: 0.9.0
+Version: 0.11.0
 Summary: ldv — CLI for the Liquid DataViewer platform (formerly lql)
 Project-URL: Homepage, https://github.com/Liquid4All/lql
 Author: Liquid AI
@@ -140,7 +140,8 @@ ldv datasets create --workspace <id> --hf-bucket <org/bucket> --key <path-or-glo
                                                  From an HF storage bucket (e.g. --key 'data/*.parquet')
 ldv datasets sync <id>                           Trigger sync (HF repo, S3, or HF bucket)
 ldv datasets schema <id>                         Show column schema
-ldv datasets rows <id> [--limit N] [--offset N]  Fetch rows
+ldv datasets rows <id> [-f "col<op>value"] [--columns a,b] [--limit N] [--offset N]
+                                                 Fetch rows (-f/--filter: same syntax everywhere)
 ldv datasets delete <id>                         Delete dataset
 ldv datasets push <id>                           Push to HuggingFace
 ldv datasets push-status <id> [--job <id>]       Check push job status
@@ -174,10 +175,11 @@ ldv preview <src> --offset N           Start at row index N
 ldv preview <src> --title "<title>"    Title shown in the viewer header
 ```
-**Filtering (`--filter`/`-f`).** Show only matching rows — works on local files and
-platform datasets (platform filtering runs server-side). Repeatable; filters AND
-together; string match is case-insensitive. Operators: `=`, `!=`, `~` (contains),
-`>`, `<`, `>=`, `<=`.
+**Filtering (`--filter`/`-f`) — one syntax everywhere.** The same flag and syntax
+work on `preview`, `datasets rows`, and `eval samples`. Show only matching rows —
+`preview` also filters local files (client-side); platform datasets filter
+server-side. Repeatable; filters AND together; string match is case-insensitive.
+Operators: `=`, `!=`, `~` (contains), `>`, `<`, `>=`, `<=`.
 ```
 ldv preview <dataset-id> -f "domain=telecom"
@@ -228,7 +230,7 @@ ldv eval list [--workspace <id>]                 List eval datasets only
                                                  workspace, lists only evals you own.
 ldv eval correctness <id>                        Fast accuracy + correct/incorrect/missing counts
 ldv eval stats <id>                              Accuracy + error-type distribution + token stats
-ldv eval samples <id> [--filter correct|incorrect|missing|all]
+ldv eval samples <id> [-f "col<op>value" ...] [--correct|--incorrect|--missing]
                        [--search <text>] [--error-type <value>]
                        [--columns a,b] [--limit N] [--offset N]
                                                  Slice the dataset for error analysis. Filters
@@ -239,6 +241,8 @@ ldv eval sample <id> --row <index>               Read one full sample (the conve
 Notes:
+- `-f`/`--filter` is the unified column filter — same syntax as `preview` and `datasets rows` (see Filtering above).
+- `--correct` / `--incorrect` / `--missing` are convenience flags for the canonical correctness filter (mutually exclusive). They AND with any `-f` filters, `--search`, and `--error-type`.
 - `--search` matches a substring on the prompt **or** response column (either hit counts). Override the searched columns with `--search-columns a,b`.
 - `--error-type` values come from the `error_field` / `error_distribution` reported by `eval stats`.
 - Use the `index` from `eval samples` directly as `eval sample --row <index>`.
@@ -248,7 +252,8 @@ Typical analysis loop:
 ```bash
 ldv eval list --workspace <id>                   # find the eval dataset
 ldv eval stats <id>                              # accuracy + where the errors cluster
-ldv eval samples <id> --filter incorrect --limit 20   # pull the misses
+ldv eval samples <id> --incorrect --limit 20     # pull the misses
+ldv eval samples <id> --incorrect -f "reasoning_tokens>30000"  # misses that ran long
 ldv eval sample <id> --row 42                    # read one failure in full
 ```

{ldv_cli-0.9.0 → ldv_cli-0.11.0}/README.md RENAMED

@@ -124,7 +124,8 @@ ldv datasets create --workspace <id> --hf-bucket <org/bucket> --key <path-or-glo
                                                  From an HF storage bucket (e.g. --key 'data/*.parquet')
 ldv datasets sync <id>                           Trigger sync (HF repo, S3, or HF bucket)
 ldv datasets schema <id>                         Show column schema
-ldv datasets rows <id> [--limit N] [--offset N]  Fetch rows
+ldv datasets rows <id> [-f "col<op>value"] [--columns a,b] [--limit N] [--offset N]
+                                                 Fetch rows (-f/--filter: same syntax everywhere)
 ldv datasets delete <id>                         Delete dataset
 ldv datasets push <id>                           Push to HuggingFace
 ldv datasets push-status <id> [--job <id>]       Check push job status
@@ -158,10 +159,11 @@ ldv preview <src> --offset N           Start at row index N
 ldv preview <src> --title "<title>"    Title shown in the viewer header
 ```
-**Filtering (`--filter`/`-f`).** Show only matching rows — works on local files and
-platform datasets (platform filtering runs server-side). Repeatable; filters AND
-together; string match is case-insensitive. Operators: `=`, `!=`, `~` (contains),
-`>`, `<`, `>=`, `<=`.
+**Filtering (`--filter`/`-f`) — one syntax everywhere.** The same flag and syntax
+work on `preview`, `datasets rows`, and `eval samples`. Show only matching rows —
+`preview` also filters local files (client-side); platform datasets filter
+server-side. Repeatable; filters AND together; string match is case-insensitive.
+Operators: `=`, `!=`, `~` (contains), `>`, `<`, `>=`, `<=`.
 ```
 ldv preview <dataset-id> -f "domain=telecom"
@@ -212,7 +214,7 @@ ldv eval list [--workspace <id>]                 List eval datasets only
                                                  workspace, lists only evals you own.
 ldv eval correctness <id>                        Fast accuracy + correct/incorrect/missing counts
 ldv eval stats <id>                              Accuracy + error-type distribution + token stats
-ldv eval samples <id> [--filter correct|incorrect|missing|all]
+ldv eval samples <id> [-f "col<op>value" ...] [--correct|--incorrect|--missing]
                        [--search <text>] [--error-type <value>]
                        [--columns a,b] [--limit N] [--offset N]
                                                  Slice the dataset for error analysis. Filters
@@ -223,6 +225,8 @@ ldv eval sample <id> --row <index>               Read one full sample (the conve
 Notes:
+- `-f`/`--filter` is the unified column filter — same syntax as `preview` and `datasets rows` (see Filtering above).
+- `--correct` / `--incorrect` / `--missing` are convenience flags for the canonical correctness filter (mutually exclusive). They AND with any `-f` filters, `--search`, and `--error-type`.
 - `--search` matches a substring on the prompt **or** response column (either hit counts). Override the searched columns with `--search-columns a,b`.
 - `--error-type` values come from the `error_field` / `error_distribution` reported by `eval stats`.
 - Use the `index` from `eval samples` directly as `eval sample --row <index>`.
@@ -232,7 +236,8 @@ Typical analysis loop:
 ```bash
 ldv eval list --workspace <id>                   # find the eval dataset
 ldv eval stats <id>                              # accuracy + where the errors cluster
-ldv eval samples <id> --filter incorrect --limit 20   # pull the misses
+ldv eval samples <id> --incorrect --limit 20     # pull the misses
+ldv eval samples <id> --incorrect -f "reasoning_tokens>30000"  # misses that ran long
 ldv eval sample <id> --row 42                    # read one failure in full
 ```
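The filtering paragraph in this README hunk says filters are repeatable, AND together, and compare strings case-insensitively. The following is a minimal sketch of those semantics on already-parsed filters, assuming each filter is a `(column, operator name, value)` tuple. The names `matches` and `keep` are illustrative and are not the package's API.

```python
from typing import Dict, List, Tuple

Filter = Tuple[str, str, str]  # (column, operator name, value); illustrative shape


def matches(cell: object, op: str, val: str) -> bool:
    """Case-insensitive string operators; numeric comparison for gt/lt/gte/lte."""
    if op == "contains":
        return cell is not None and val.lower() in str(cell).lower()
    if op in ("eq", "neq"):
        equal = cell is not None and str(cell).strip().lower() == val.strip().lower()
        return equal if op == "eq" else not equal
    try:
        c, v = float(cell), float(val)
    except (TypeError, ValueError):
        return False  # a non-numeric cell or value never satisfies a numeric operator
    return {"gt": c > v, "lt": c < v, "gte": c >= v, "lte": c <= v}[op]


def keep(row: Dict[str, object], filters: List[Filter]) -> bool:
    """A row is kept only if every filter matches (AND)."""
    return all(matches(row.get(col), op, val) for col, op, val in filters)


rows = [
    {"domain": "Telecom", "reward": 0.9},
    {"domain": "telecom", "reward": 0.5},
    {"domain": "finance", "reward": 0.95},
]
# Roughly what -f "domain=telecom" -f "reward>=0.8" would select.
filters = [("domain", "eq", "telecom"), ("reward", "gte", "0.8")]
kept = [r for r in rows if keep(r, filters)]
```

Only the first row survives: the second fails the numeric filter and the third fails the string filter.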

{ldv_cli-0.9.0 → ldv_cli-0.11.0}/pyproject.toml RENAMED

@@ -4,7 +4,7 @@ build-backend = "hatchling.build"
 [project]
 name = "ldv-cli"
-version = "0.9.0"
+version = "0.11.0"
 description = "ldv — CLI for the Liquid DataViewer platform (formerly lql)"
 readme = "README.md"
 requires-python = ">=3.12"

{ldv_cli-0.9.0 → ldv_cli-0.11.0}/src/ldv/__init__.py RENAMED

@@ -3,4 +3,4 @@ from importlib.metadata import PackageNotFoundError, version
 try:
     __version__ = version("ldv-cli")
 except PackageNotFoundError:  # not installed (e.g. running from a source checkout)
-    __version__ = "0.9.0"
+    __version__ = "0.10.0"

{ldv_cli-0.9.0 → ldv_cli-0.11.0}/src/ldv/commands/datasets.py RENAMED

@@ -1,7 +1,7 @@
 import json
 import sys
 from pathlib import Path
-from typing import Annotated, Optional
+from typing import Annotated, List, Optional
 import typer
@@ -10,6 +10,7 @@ from .._group import AliasGroup
 from .._opts import ApiUrlOpt, JsonOpt, ProfileOpt
 from ..api import ApiClient
 from ..config import _env
+from ..filters import FILTER_HELP, parse_filters, to_api_filters
 from ..output import print_error, print_grouped_tables, print_json, print_table
 from ..util import q
@@ -339,15 +340,28 @@ def profile_cmd(
 @app.command("rows")
 def rows(
     id: Annotated[str, typer.Argument(help="Dataset ID")],
+    filter_: Annotated[Optional[List[str]], typer.Option("--filter", "-f", help=FILTER_HELP)] = None,
+    columns: Annotated[
+        Optional[str], typer.Option("--columns", help="Comma-separated columns to project")
+    ] = None,
     limit: Annotated[str, typer.Option("--limit", help="Number of rows")] = "20",
     offset: Annotated[str, typer.Option("--offset", help="Row offset")] = "0",
     json_out: JsonOpt = False,
     profile: ProfileOpt = None,
     api_url: ApiUrlOpt = None,
 ) -> None:
-    """Get dataset rows."""
+    """Get dataset rows, optionally filtered (see --filter)."""
     client = ApiClient(profile=profile, api_url=api_url)
-    data = client.get(f"/v1/datasets/{q(id)}/rows", params={"limit": limit, "offset": offset}).json()
+    params = {"limit": limit, "offset": offset}
+    if columns:
+        params["columns"] = str(columns)
+    api_filters = to_api_filters(parse_filters(filter_))
+    if api_filters:
+        data = client.post(
+            f"/v1/datasets/{q(id)}/rows/filter", json={"filters": api_filters}, params=params
+        ).json()
+    else:
+        data = client.get(f"/v1/datasets/{q(id)}/rows", params=params).json()
     if json_out:
         print_json(data)
         return
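The `rows` change in this hunk switches endpoints depending on whether any `-f` filter was given. The sketch below restates that branching as a pure function that describes the request instead of sending it. `plan_rows_request` is a hypothetical helper, and the dataset id is used as-is because the behavior of the `q()` wrapper seen in the diff is not shown here.

```python
from typing import Dict, List, Optional, Tuple

Plan = Tuple[str, str, Dict[str, str], Optional[dict]]


def plan_rows_request(
    dataset_id: str,
    api_filters: List[dict],
    columns: Optional[str] = None,
    limit: str = "20",
    offset: str = "0",
) -> Plan:
    """Return (method, path, query params, JSON body) for a rows fetch."""
    params = {"limit": limit, "offset": offset}
    if columns:
        params["columns"] = columns
    if api_filters:
        # Filters travel in the JSON body; paging and projection stay in the query string.
        path = f"/v1/datasets/{dataset_id}/rows/filter"
        return ("POST", path, params, {"filters": api_filters})
    # No filters: the plain GET endpoint, as before the change.
    return ("GET", f"/v1/datasets/{dataset_id}/rows", params, None)


unfiltered = plan_rows_request("abc", [])
filtered = plan_rows_request(
    "abc",
    [{"column": "lang", "operator": "eq", "value": "en"}],
    columns="a,b",
)
```

One consequence worth noting: `--columns` alone does not trigger the POST path, since only a non-empty filter list does.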

{ldv_cli-0.9.0 → ldv_cli-0.11.0}/src/ldv/commands/evals.py RENAMED

@@ -5,12 +5,14 @@ import sys
 from typing import Annotated, List, Optional
 import typer
+from rich.console import Console
 from .._group import AliasGroup
 from .._opts import ApiUrlOpt, JsonOpt, ProfileOpt
 from ..api import ApiClient
 from ..config import _env
+from ..filters import FILTER_HELP, parse_filters, to_api_filters
 from ..output import print_error, print_json, print_table
 from ..util import q
@@ -178,10 +180,69 @@ def correctness(
     )
+def _bar(pct: float, width: int = 20) -> str:
+    filled = round(pct * width)
+    filled = max(0, min(width, filled))
+    return "█" * filled + "░" * (width - filled)
+@app.command("failures")
+def failures(
+    id: Annotated[str, typer.Argument(help="Dataset ID")],
+    json_out: JsonOpt = False,
+    profile: ProfileOpt = None,
+    api_url: ApiUrlOpt = None,
+) -> None:
+    """Quality analysis: clean vs. dirty rate + failure mode breakdown."""
+    client = ApiClient(profile=profile, api_url=api_url)
+    data = client.get(f"/v1/datasets/{q(id)}/eval-failure-analysis").json()
+    if json_out:
+        print_json(data)
+        return
+    skip = data.get("skip_reason")
+    if skip:
+        sys.stdout.write(f"No failure_analysis column found in this dataset.\n")
+        return
+    total = data.get("total") or 0
+    clean = data.get("clean") or 0
+    dirty = data.get("dirty") or 0
+    clean_rate = data.get("clean_rate") or 0.0
+    dirty_rate = 1.0 - clean_rate
+    console = Console()
+    console.print(f"\n[bold]Quality analysis: {total:,} samples[/bold]\n")
+    console.print(f"  [green]Quality rate[/green]   {_bar(clean_rate)}  {clean_rate * 100:.1f}%")
+    console.print(f"  [red]Issues[/red]         {_bar(dirty_rate)}  {dirty_rate * 100:.1f}%")
+    modes = data.get("mode_distribution") or []
+    if not modes:
+        if dirty == 0:
+            sys.stdout.write("\nNo issues detected.\n")
+        else:
+            sys.stdout.write(f"\n{dirty:,} samples with issues (no mode breakdown available).\n")
+        return
+    sys.stdout.write(f"\nFailure modes  ({dirty:,} samples with issues):\n")
+    name_width = max((len(str(m.get("mode") or "").replace("_", " ")) for m in modes), default=0)
+    name_width = max(name_width, 10)
+    count_width = max((len(str(m.get("count") or 0)) for m in modes), default=0)
+    count_width = max(count_width, 5)
+    for m in modes:
+        name = str(m.get("mode") or "").replace("_", " ")
+        count = m.get("count") or 0
+        rate = m.get("rate") or 0.0
+        bar = _bar(rate)
+        sys.stdout.write(f"  {name:<{name_width}}  {count:>{count_width}}  {bar}  {rate * 100:.1f}%\n")
 @app.command("samples")
 def samples(
     id: Annotated[str, typer.Argument(help="Dataset ID")],
-    filter_: Annotated[str, typer.Option("--filter", help="correct | incorrect | missing | all")] = "all",
+    filter_: Annotated[Optional[List[str]], typer.Option("--filter", "-f", help=FILTER_HELP)] = None,
+    correct: Annotated[bool, typer.Option("--correct", help="Only correct samples")] = False,
+    incorrect: Annotated[bool, typer.Option("--incorrect", help="Only incorrect samples")] = False,
+    missing: Annotated[bool, typer.Option("--missing", help="Only samples with no verdict")] = False,
     search: Annotated[Optional[str], typer.Option("--search", help="Substring match on prompt OR response column")] = None,
     search_columns: Annotated[Optional[str], typer.Option("--search-columns", help="Override which columns --search matches (comma-separated)")] = None,
     error_type: Annotated[Optional[str], typer.Option("--error-type", help="Filter to samples whose error field equals <value>")] = None,
@@ -192,14 +253,17 @@ def samples(
     profile: ProfileOpt = None,
     api_url: ApiUrlOpt = None,
 ) -> None:
-    """List samples filtered by correctness / search / error type (for error analysis)."""
+    """List eval samples filtered by --filter / --correct / --incorrect / --missing / --search / --error-type."""
     client = ApiClient(profile=profile, api_url=api_url)
-    filters: List[dict] = []
+    filters: List[dict] = to_api_filters(parse_filters(filter_))
-    kind = str(filter_ or "all").lower()
-    if kind not in ("all", "correct", "incorrect", "missing"):
-        print_error("--filter must be one of: correct, incorrect, missing, all", "bad_filter")
+    # --correct / --incorrect / --missing are convenience flags for the canonical
+    # correctness filter (server-side reconciliation). Mutually exclusive.
+    chosen = [name for name, on in (("correct", correct), ("incorrect", incorrect), ("missing", missing)) if on]
+    if len(chosen) > 1:
+        print_error("--correct, --incorrect and --missing are mutually exclusive.", "bad_filter")
         raise typer.Exit(1)
+    correctness = chosen[0] if chosen else None
     if search:
         if search_columns:
@@ -230,8 +294,8 @@ def samples(
     params = {"limit": limit, "offset": offset}
     if columns:
         params["columns"] = str(columns)
-    if kind != "all":
-        params["correctness"] = kind
+    if correctness:
+        params["correctness"] = correctness
     data = client.post(f"/v1/datasets/{q(id)}/rows/filter", json={"filters": filters}, params=params).json()
     if json_out:
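The `samples` hunk replaces the old `--filter correct|incorrect|missing|all` value with three mutually exclusive boolean flags. A small sketch of that selection logic follows, under the assumption that raising `ValueError` is an acceptable stand-in for the CLI's `print_error` plus `typer.Exit(1)`. `pick_correctness` is a hypothetical name.

```python
from typing import Optional


def pick_correctness(
    correct: bool = False, incorrect: bool = False, missing: bool = False
) -> Optional[str]:
    """Return the single chosen correctness value, or None when no flag is set."""
    flags = (("correct", correct), ("incorrect", incorrect), ("missing", missing))
    chosen = [name for name, on in flags if on]
    if len(chosen) > 1:
        # The CLI reports this as a "bad_filter" error and exits with status 1.
        raise ValueError("--correct, --incorrect and --missing are mutually exclusive.")
    return chosen[0] if chosen else None


none_set = pick_correctness()
only_incorrect = pick_correctness(incorrect=True)
```

With no flag set the result is `None`, which matches the diff: the `correctness` query parameter is then simply omitted, the equivalent of the old `all`.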

{ldv_cli-0.9.0 → ldv_cli-0.11.0}/src/ldv/commands/instructions.py RENAMED

@@ -84,7 +84,8 @@ A workspace is the top-level container for datasets, spec docs, and members.
   ldv datasets schema <id>             # Column names + types
   ldv datasets profile <id>            # Per-column nulls/cardinality/numeric stats/top values + content token stats
                                        #   [--full-content] exact content scan (slow)  [--skip-content] omit it
-  ldv datasets rows <id> [--limit N] [--offset N]
+  ldv datasets rows <id> [-f "col<op>value" ...] [--columns a,b] [--limit N] [--offset N]
+                                       #   -f/--filter is the same syntax everywhere (see Filtering below)
   ldv datasets delete <id>
   ldv datasets push <id>               # Push edits back to HuggingFace
   ldv datasets push-status <id> [--job <job-id>]
@@ -116,13 +117,17 @@ repeatable), -f/--filter (filter rows; see below), -n/--limit (page size when
 paging a platform dataset), --offset (start row index), --title, --hf, --split,
 --workspace, --profile, --api-url.
-Filtering: -f/--filter "col<op>value" shows only matching rows — works on local
-files and platform datasets (server-side for platform). Repeatable; filters AND
-together; string compare is case-insensitive. Operators: = (eq), != (ne),
-~ (contains), >, <, >=, <=.
+Filtering (one syntax everywhere): -f/--filter "col<op>value" shows only matching
+rows. The SAME flag and syntax work on `preview`, `datasets rows`, and
+`eval samples`. Repeatable; filters AND together; string compare is
+case-insensitive. Operators: = (eq), != (ne), ~ (contains), >, <, >=, <=.
+For `preview` it also runs on local files (client-side); on platform datasets all
+three filter server-side via POST /v1/datasets/{id}/rows/filter.
   ldv preview <dataset-id> -f "domain=telecom" -f "reward>=0.8"
   ldv preview data.jsonl -f "model~lfm"
+  ldv datasets rows <id> -f "lang=en" -f "score<0.5"
+  ldv eval samples <id> -f "reasoning_tokens>30000" --incorrect
 Navigation: two modes toggled with m — pager (one sample at a time; ←/→ or
 n/b switch samples, ↑/↓/j/k scroll) and scroll (all samples; n/b jump between
@@ -149,17 +154,34 @@ primitives for error analysis — YOU do the reasoning over what they return.
   ldv eval stats <id>                  # Accuracy + correctness counts + error-type
                                        # distribution + token stats (the distribution view)
   ldv eval correctness <id>            # Fast accuracy + correct/incorrect/missing counts
-  ldv eval samples <id> [--filter correct|incorrect|missing|all] [--search <text>]
-                       [--error-type <value>] [--columns a,b] [--limit N] [--offset N]
+  ldv eval failures <id>               # Quality analysis: clean-vs-dirty rate + failure mode
+                                       # breakdown from the failure_analysis column.
+                                       # Example output:
+                                       #   Quality analysis: 1,000 samples
+                                       #     Quality rate   ████████████████████░░░░░  80.0%
+                                       #     Issues         █████░░░░░░░░░░░░░░░░░░░░  20.0%
+                                       #   Failure modes  (200 samples with issues):
+                                       #     truncated response   100  ██████████████████  50.0%
+                                       #     missing think tags    80  ██████████████      40.0%
+                                       # If no failure_analysis column exists, prints a clear
+                                       # message and exits 0. Use --json for the raw API response.
+  ldv eval samples <id> [-f "col<op>value" ...] [--correct|--incorrect|--missing]
+                       [--search <text>] [--error-type <value>] [--columns a,b]
+                       [--limit N] [--offset N]
                                        # Slice the dataset for error analysis. Filters AND
                                        # together. Prints an 'index' column per row.
   ldv eval sample <id> --row <index>   # Read one full sample (the conversation) by the
                                        # 'index' from `eval samples`
 Notes:
+  - -f/--filter is the unified column filter (same syntax as preview / datasets rows; see Filtering).
+  - --correct / --incorrect / --missing are convenience flags for the canonical correctness filter
+    (mutually exclusive). They AND with any -f filters and --search / --error-type.
   - --search matches a substring on the prompt OR response column (either one matching is a hit).
   - --error-type values come from the `error_field` / `error_distribution` in `eval stats`.
   - Use the 'index' from `eval samples` directly as `eval sample --row <index>`.
+  - `eval failures` reads the `failure_analysis` column; if absent, skip_reason is set and a
+    clear message is printed. Use --json to get the raw counts for programmatic consumption.
 ## Row Edits
@@ -278,10 +300,13 @@ never goes stale.
 ### Analyze an eval's failure modes
   ldv eval list --json                              # find the eval dataset
+  ldv eval failures <id> --json                     # clean rate + failure mode breakdown
+                                                    #   (mode_distribution: name/count/rate per mode)
   ldv eval stats <id> --json                        # accuracy + error_distribution_incorrect
                                                     #   = the common errors AMONG the misses
-  ldv eval samples <id> --filter incorrect --json   # pull the misses
-  ldv eval samples <id> --filter incorrect --error-type <value> --json   # focus one failure mode
+  ldv eval samples <id> --incorrect --json          # pull the misses
+  ldv eval samples <id> --incorrect --error-type <value> --json   # focus one failure mode
+  ldv eval samples <id> --incorrect -f "reasoning_tokens>30000" --json  # misses that ran long
   ldv eval sample <id> --row <index> --json         # read the full conversation of a miss
   # Then synthesize the common pattern across the misses yourself — the commands give you
   # the data (counts, slices, conversations); the analysis is your job.
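The help text in this file shows bar-chart output for `eval failures`. The sketch below mirrors the `_bar` helper added in `evals.py` (default width 20, clamped to the bar's bounds) and formats one failure-mode line. The column widths used in `mode_line` are illustrative defaults, not values computed from real data.

```python
def bar(pct: float, width: int = 20) -> str:
    """Render a fraction in [0, 1] as a fixed-width block bar, clamping out-of-range input."""
    filled = max(0, min(width, round(pct * width)))
    return "█" * filled + "░" * (width - filled)


def mode_line(name: str, count: int, rate: float, name_width: int = 10, count_width: int = 5) -> str:
    """One failure-mode row: padded name, right-aligned count, bar, percentage."""
    label = name.replace("_", " ")
    return f"  {label:<{name_width}}  {count:>{count_width}}  {bar(rate)}  {rate * 100:.1f}%"


quality = bar(0.8)  # 16 filled cells, 4 empty cells at the default width
example = mode_line("truncated_response", 100, 0.5, name_width=18)
```

At the default width an 80% rate fills 16 of 20 cells, so the sample bars in the help text look to be drawn at a different width and are best read as illustrative.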

{ldv_cli-0.9.0 → ldv_cli-0.11.0}/src/ldv/commands/preview.py RENAMED

@@ -20,6 +20,7 @@ import typer
 from .._opts import ApiUrlOpt, ProfileOpt
 from ..api import ApiClient
+from ..filters import FILTER_HELP, parse_filters, row_matches, to_api_filters
 from ..output import print_error
 from ..util import q
@@ -759,67 +760,6 @@ def _choose_workspace(client: ApiClient, tui_mod) -> Optional[str]:
     return choice
-# --------------------------------------------------------------------------
-# Row filtering (--filter "col<op>value")
-# --------------------------------------------------------------------------
-# Maps each CLI symbol to the platform filter API's operator name (the same
-# names work server-side and locally). _parse_filters picks the earliest operator
-# (longest on a tie), so list order doesn't affect correctness.
-_FILTER_OPS = [(">=", "gte"), ("<=", "lte"), ("!=", "ne"), ("~", "contains"), ("=", "eq"), (">", "gt"), ("<", "lt")]
-_NUMERIC_OPS = {"gt": lambda c, v: c > v, "lt": lambda c, v: c < v, "gte": lambda c, v: c >= v, "lte": lambda c, v: c <= v}
-def _parse_filters(specs: Optional[List[str]]) -> List[tuple]:
-    """Parse ['col=value', 'reward>=0.5', 'name~kod'] → [(col, op, value), ...].
-    Splits on the EARLIEST operator (longest on a tie, so 'reward>=5' is gte not
-    gt), keeping operator chars in the value intact (e.g. 'q=a>b' → col 'q', value
-    'a>b'). Rejects an empty column or value."""
-    out: List[tuple] = []
-    for spec in specs or []:
-        chosen = None  # (index, symbol, op_name)
-        for sym, op in _FILTER_OPS:
-            i = spec.find(sym)
-            if i > 0 and (chosen is None or i < chosen[0] or (i == chosen[0] and len(sym) > len(chosen[1]))):
-                chosen = (i, sym, op)
-        if chosen is None:
-            print_error(
-                f"Invalid --filter '{spec}'. Use col=value, col!=value, col~text, or col>/</>=/<= N.",
-                "bad_filter",
-            )
-            raise typer.Exit(1)
-        i, sym, op = chosen
-        col, val = spec[:i].strip(), spec[i + len(sym):].strip()
-        if not col or not val:
-            print_error(f"Invalid --filter '{spec}': both a column and a value are required.", "bad_filter")
-            raise typer.Exit(1)
-        out.append((col, op, val))
-    return out
-def _cell_matches(cell: object, op: str, val: str) -> bool:
-    if op == "contains":
-        return cell is not None and val.lower() in str(cell).lower()
-    if op in ("eq", "ne"):
-        equal = cell is not None and str(cell).strip().lower() == val.strip().lower()
-        return equal if op == "eq" else not equal
-    try:
-        return _NUMERIC_OPS[op](float(cell), float(val))  # gt/lt/gte/lte
-    except (TypeError, ValueError):
-        return False
-def _row_matches(row: object, filters: List[tuple]) -> bool:
-    """Client-side predicate (local files). A non-dict row can't match a column
-    filter. All filters AND together."""
-    if not filters:
-        return True
-    if not isinstance(row, dict):
-        return False
-    return all(_cell_matches(row.get(col), op, val) for col, op, val in filters)
 # --------------------------------------------------------------------------
 # Command
 # --------------------------------------------------------------------------
@@ -835,10 +775,7 @@ def preview(
     offset: Annotated[int, typer.Option("--offset", help="Start at this row index")] = 0,
     filter_: Annotated[
         Optional[List[str]],
-        typer.Option(
-            "--filter", "-f",
-            help="Filter rows: 'col=value', 'col!=value', 'col~text' (contains), or 'col>/</>=/<= N'. Repeatable (AND).",
-        ),
+        typer.Option("--filter", "-f", help=FILTER_HELP),
     ] = None,
     title: Annotated[Optional[str], typer.Option("--title", help="Title shown in the viewer header")] = None,
     hf: Annotated[
@@ -869,7 +806,7 @@ def preview(
         print_error("The terminal viewer requires 'textual'. Install it: pip install textual", "missing_textual")
         raise typer.Exit(1)
-    filters = _parse_filters(filter_)
+    filters = parse_filters(filter_)
     local_path = Path(source)
     is_local = (not hf) and local_path.exists() and local_path.is_file()
@@ -877,7 +814,7 @@ def preview(
     if is_local:
         rows = _load_local(local_path)
         if filters:
-            rows = [r for r in rows if _row_matches(r, filters)]
+            rows = [r for r in rows if row_matches(r, filters)]
             if not rows:
                 print_error("No rows match the filter(s).", "no_match")
                 raise typer.Exit(3)
@@ -909,7 +846,7 @@ def preview(
         view_title = title or f"dataset {source}"
     page_size = limit if limit and limit > 0 else 25
-    api_filters = [{"column": col, "operator": op, "value": val} for col, op, val in filters]
+    api_filters = to_api_filters(filters)
     def _fetch_page(off: int, lim: int) -> List[object]:
         params = {"limit": str(lim), "offset": str(offset + off)}

ldv_cli-0.11.0/src/ldv/filters.py ADDED

@@ -0,0 +1,99 @@
+"""Shared row-filter syntax for `preview`, `datasets rows`, and `eval samples`.
+One filtering language across the CLI: `--filter "col<op>value"` (repeatable, AND).
+The operator symbols map to the platform filter API's operator names, which work
+both server-side (`POST /v1/datasets/{id}/rows/filter`) and locally (preview's
+client-side matcher for local files).
+"""
+from typing import List, Optional
+import typer
+from .output import print_error
+# Shown in each command's --filter help so the syntax is documented in one place.
+FILTER_HELP = (
+    "Filter rows: 'col=value', 'col!=value', 'col~text' (contains), "
+    "or 'col>/</>=/<= N'. Repeatable (AND)."
+)
+# Maps each CLI symbol to the platform filter API's operator name. parse_filters
+# picks the earliest operator (longest on a tie), so list order doesn't affect
+# correctness.
+_FILTER_OPS = [
+    (">=", "gte"),
+    ("<=", "lte"),
+    ("!=", "neq"),
+    ("~", "contains"),
+    ("=", "eq"),
+    (">", "gt"),
+    ("<", "lt"),
+]
+_NUMERIC_OPS = {
+    "gt": lambda c, v: c > v,
+    "lt": lambda c, v: c < v,
+    "gte": lambda c, v: c >= v,
+    "lte": lambda c, v: c <= v,
+}
+def parse_filters(specs: Optional[List[str]]) -> List[tuple]:
+    """Parse ['col=value', 'reward>=0.5', 'name~kod'] → [(col, op, value), ...].
+    Splits on the EARLIEST operator (longest on a tie, so 'reward>=5' is gte not
+    gt), keeping operator chars in the value intact (e.g. 'q=a>b' → col 'q', value
+    'a>b'). Rejects an empty column or value."""
+    out: List[tuple] = []
+    for spec in specs or []:
+        chosen = None  # (index, symbol, op_name)
+        for sym, op in _FILTER_OPS:
+            i = spec.find(sym)
+            if i > 0 and (
+                chosen is None
+                or i < chosen[0]
+                or (i == chosen[0] and len(sym) > len(chosen[1]))
+            ):
+                chosen = (i, sym, op)
+        if chosen is None:
+            print_error(
+                f"Invalid --filter '{spec}'. Use col=value, col!=value, col~text, or col>/</>=/<= N.",
+                "bad_filter",
+            )
+            raise typer.Exit(1)
+        i, sym, op = chosen
+        col, val = spec[:i].strip(), spec[i + len(sym) :].strip()
+        if not col or not val:
+            print_error(
+                f"Invalid --filter '{spec}': both a column and a value are required.",
+                "bad_filter",
+            )
+            raise typer.Exit(1)
+        out.append((col, op, val))
+    return out
+def to_api_filters(parsed: List[tuple]) -> List[dict]:
+    """[(col, op, val), ...] → the `filters` payload for POST /rows/filter."""
+    return [{"column": col, "operator": op, "value": val} for col, op, val in parsed]
+def cell_matches(cell: object, op: str, val: str) -> bool:
+    if op == "contains":
+        return cell is not None and val.lower() in str(cell).lower()
+    if op in ("eq", "neq"):
+        equal = cell is not None and str(cell).strip().lower() == val.strip().lower()
+        return equal if op == "eq" else not equal
+    try:
+        return _NUMERIC_OPS[op](float(cell), float(val))  # gt/lt/gte/lte
+    except (TypeError, ValueError):
+        return False
+def row_matches(row: object, filters: List[tuple]) -> bool:
+    """Client-side predicate (local files). A non-dict row can't match a column
+    filter. All filters AND together."""
+    if not filters:
+        return True
+    if not isinstance(row, dict):
+        return False
+    return all(cell_matches(row.get(col), op, val) for col, op, val in filters)
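The `parse_filters` docstring describes splitting on the earliest operator, with the longest symbol winning a tie. The sketch below isolates that rule for a single spec so its edge cases can be checked directly. `split_filter` is a hypothetical name, and it raises `ValueError` where the real function prints an error and exits. The operator table is copied from the added module, which maps `!=` to `neq` where the removed `preview.py` code used `ne`.

```python
from typing import Optional, Tuple

OPS = [
    (">=", "gte"), ("<=", "lte"), ("!=", "neq"), ("~", "contains"),
    ("=", "eq"), (">", "gt"), ("<", "lt"),
]


def split_filter(spec: str) -> Tuple[str, str, str]:
    """Split 'col<op>value' into (col, operator name, value)."""
    chosen: Optional[Tuple[int, str, str]] = None  # (index, symbol, operator name)
    for sym, op in OPS:
        i = spec.find(sym)
        # i > 0 rejects both "not found" (-1) and an operator at position 0 (empty column).
        if i > 0 and (
            chosen is None
            or i < chosen[0]
            or (i == chosen[0] and len(sym) > len(chosen[1]))
        ):
            chosen = (i, sym, op)
    if chosen is None:
        raise ValueError(f"no usable operator in {spec!r}")
    i, sym, op = chosen
    col, val = spec[:i].strip(), spec[i + len(sym):].strip()
    if not col or not val:
        raise ValueError(f"column and value are both required in {spec!r}")
    return col, op, val


tie = split_filter("reward>=5")   # '>' and '>=' start at the same index; '>=' wins
nested = split_filter("q=a>b")    # earliest operator is '=', so '>' stays in the value
```

Because the choice depends only on position and symbol length, reordering `OPS` would not change the result, which is the point the module's comment makes.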

{ldv_cli-0.9.0 → ldv_cli-0.11.0}/uv.lock RENAMED

@@ -173,7 +173,7 @@ wheels = [
 [[package]]
 name = "ldv-cli"
-version = "0.9.0"
+version = "0.10.0"
 source = { editable = "." }
 dependencies = [
     { name = "httpx" },