PyPI - everyrow - Versions diffs - 0.1.1__tar.gz → 0.1.2__tar.gz - Mend

everyrow 0.1.1tar.gz → 0.1.2tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (265) hide show

everyrow-0.1.2/.gitattributes ADDED Viewed

@@ -0,0 +1,10 @@
+*.pdf filter=lfs diff=lfs merge=lfs -text
+*.png filter=lfs diff=lfs merge=lfs -text
+*.jpg filter=lfs diff=lfs merge=lfs -text
+*.jpeg filter=lfs diff=lfs merge=lfs -text
+*.svg filter=lfs diff=lfs merge=lfs -text
+*.webp filter=lfs diff=lfs merge=lfs -text
+*.mp4 filter=lfs diff=lfs merge=lfs -text
+*.csv filter=lfs diff=lfs merge=lfs -text
+*.tar filter=lfs diff=lfs merge=lfs -text
+*.zip filter=lfs diff=lfs merge=lfs -text

everyrow-0.1.2/.github/workflows/skill-version-check.yaml ADDED Viewed

@@ -0,0 +1,27 @@
+name: Skill Version Check
+on:
+  pull_request:
+    paths:
+      - "**/skills/**"
+jobs:
+  check-version-bump:
+    name: Require version bump for skill changes
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+        with:
+          fetch-depth: 0
+      - name: Check for plugin.json version bump
+        run: |
+          # Get the diff of plugin.json against the base branch
+          if git diff origin/${{ github.base_ref }}...HEAD -- plugins/everyrow/.claude-plugin/plugin.json | grep -q '"version"'; then
+            echo "✅ plugin.json version was updated"
+          else
+            echo "❌ Skills were modified but plugin.json version was not bumped"
+            echo ""
+            echo "Please update the version in plugins/everyrow/.claude-plugin/plugin.json"
+            exit 1
+          fi

everyrow-0.1.2/PKG-INFO ADDED Viewed

@@ -0,0 +1,332 @@
+Metadata-Version: 2.4
+Name: everyrow
+Version: 0.1.2
+Summary: An SDK for everyrow.io: agent ops at spreadsheet scale
+License-File: LICENSE.txt
+Requires-Python: >=3.12
+Requires-Dist: attrs>=25.4.0
+Requires-Dist: httpx>=0.28.1
+Requires-Dist: pandas>=2.3.3
+Requires-Dist: pydantic>=2.12.5
+Requires-Dist: python-dateutil>=2.9.0.post0
+Requires-Dist: python-dotenv>=1.2.1
+Description-Content-Type: text/markdown
+![hero](https://github.com/user-attachments/assets/254fa2ed-c1f3-4ee8-b93d-d169edf32f27)
+# everyrow SDK
+[![PyPI version](https://img.shields.io/pypi/v/everyrow.svg)](https://pypi.org/project/everyrow/)
+[![Claude Code](https://img.shields.io/badge/Claude_Code-plugin-D97757?logo=claude&logoColor=fff)](#claude-code-plugin)
+[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
+[![Python 3.12+](https://img.shields.io/badge/python-3.12+-blue.svg)](https://www.python.org/downloads/)
+Screen, rank, dedupe, and merge your dataframes using natural language. Or run web agents to research every row.
+```bash
+# ideally inside a venv
+pip install everyrow
+```
+## Try it
+Get an API key at [everyrow.io/api-key](https://everyrow.io/api-key) ($20 free credit), then:
+```python
+import asyncio
+import pandas as pd
+from everyrow.ops import screen
+from pydantic import BaseModel, Field
+jobs = pd.DataFrame([
+    {"company": "Airtable",   "post": "Async-first team, 8+ yrs exp, $185-220K base"},
+    {"company": "Vercel",     "post": "Lead our NYC team. Competitive comp, DOE"},
+    {"company": "Notion",     "post": "In-office SF. Staff eng, $200K + equity"},
+    {"company": "Linear",     "post": "Bootcamp grads welcome! $85K, remote-friendly"},
+    {"company": "Descript",   "post": "Work from anywhere. Principal architect, $250K"},
+    {"company": "Retool",     "post": "Flexible location. Building infra. Comp TBD"},
+])
+class JobScreenResult(BaseModel):
+    qualifies: bool = Field(description="True if meets ALL criteria")
+async def main():
+    result = await screen(
+        task="""
+            Qualifies if ALL THREE are met:
+            1. Remote-friendly (allows remote, hybrid, or distributed)
+            2. Senior-level (5+ yrs exp OR title includes Senior/Staff/Principal)
+            3. Salary disclosed (specific numbers like "$150K", not "competitive" or "DOE")
+        """,
+        input=jobs,
+        response_model=JobScreenResult,
+    )
+    print(result.data.head())  # Airtable, Descript pass. Others fail one or more.
+asyncio.run(main())
+```
+```bash
+export EVERYROW_API_KEY=your_key_here
+python example.py
+```
+Regex can't do this. `"remote" in text` matches "No remote work available." `"$" in text` matches "$0 in funding." You need something that knows "DOE" means salary *isn't* disclosed, and "bootcamp grads welcome" means it's *not* senior-level.
+## Operations
+| | |
+|---|---|
+| [**Screen**](#screen) | Filter by criteria that need judgment |
+| [**Rank**](#rank) | Score rows by qualitative factors |
+| [**Dedupe**](#dedupe) | Deduplicate when fuzzy matching fails |
+| [**Merge**](#merge) | Join tables when keys don't match |
+| [**Agent Tasks**](#agent-tasks) | Web research on every row |
+| [**Derive**](#derive) | Add computed columns |
+---
+## Screen
+Filter rows based on criteria you can't put in a WHERE clause.
+```python
+from everyrow.ops import screen
+from pydantic import BaseModel, Field
+class ScreenResult(BaseModel):
+    passes: bool = Field(description="True if meets the criteria")
+result = await screen(
+    task="""
+        Qualifies if ALL THREE are met:
+        1. Remote-friendly (allows remote, hybrid, or distributed)
+        2. Senior-level (5+ yrs exp OR title includes Senior/Staff/Principal)
+        3. Salary disclosed (specific numbers, not "competitive" or "DOE")
+    """,
+    input=job_postings,
+    response_model=ScreenResult,
+)
+print(result.data.head())
+```
+"No remote work available" fails even though it contains "remote." Works for investment screening, lead qualification, vendor vetting.
+**More:** [docs](docs/SCREEN.md) / [basic usage](docs/case_studies/basic-usage/notebook.ipynb) / [job posting screen](https://futuresearch.ai/job-posting-screening/) (>90% precision vs 68% regex) / [stock screen](https://futuresearch.ai/thematic-stock-screening/) ([notebook](docs/case_studies/screen-stocks-by-investment-thesis/notebook.ipynb))
+---
+## Rank
+Score rows by things you can't put in a database field.
+```python
+from everyrow.ops import rank
+result = await rank(
+    task="Score by likelihood to need data integration solutions",
+    input=leads_dataframe,
+    field_name="integration_need_score",
+)
+print(result.data.head())
+```
+Ultramain Systems (sells software *to* airlines) and Ukraine International Airlines (is an airline) look similar by industry code. Completely different needs. Traditional scoring can't tell them apart.
+**More:** [docs](docs/RANK.md) / [basic usage](docs/case_studies/basic-usage/notebook.ipynb) / [lead scoring](https://futuresearch.ai/lead-scoring-data-fragmentation/) (1,000 leads, $13) / [vs Clay](https://futuresearch.ai/lead-scoring-without-crm/) ($28 vs $145)
+---
+## Dedupe
+Deduplicate when fuzzy matching falls short.
+```python
+from everyrow.ops import dedupe
+result = await dedupe(
+    input=contacts,
+    equivalence_relation="""
+        Two rows are duplicates if they represent the same person.
+        Account for name abbreviations, typos, and career changes.
+    """,
+)
+print(result.data.head())
+```
+"A. Butoi" and "Alexandra Butoi" are the same person. "AUTON Lab (Former)" indicates a career change, not a different org. Results include `equivalence_class_id`, `equivalence_class_name`, and `selected` (the canonical record).
+**More:** [docs](docs/DEDUPE.md) / [basic usage](docs/case_studies/basic-usage/notebook.ipynb) / [CRM dedupe](https://futuresearch.ai/crm-deduplication/) (500→124 rows, $1.67, [notebook](docs/case_studies/dedupe-crm-company-records/notebook.ipynb)) / [researcher dedupe](https://futuresearch.ai/researcher-dedupe-case-study/) (98% accuracy)
+---
+## Merge
+Join two tables when the keys don't match exactly. Or at all.
+```python
+from everyrow.ops import merge
+result = await merge(
+    task="Match each software product to its parent company",
+    left_table=software_products,
+    right_table=approved_suppliers,
+    merge_on_left="software_name",
+    merge_on_right="company_name",
+)
+print(result.data.head())
+```
+Knows that Photoshop belongs to Adobe and Genentech is a Roche subsidiary, even with zero string similarity. Fuzzy matching thresholds always fail somewhere: 0.9 misses "Colfi" ↔ "Dr. Ioana Colfescu", 0.7 false-positives on "John Smith" ↔ "Jane Smith".
+**More:** [docs](docs/MERGE.md) / [basic usage](docs/case_studies/basic-usage/notebook.ipynb) / [supplier matching](https://futuresearch.ai/software-supplier-matching/) (2,000 products, 91% accuracy) / [HubSpot merge](https://futuresearch.ai/merge-hubspot-contacts/) (99.9% recall)
+---
+## Agent Tasks
+Web research on single inputs or entire dataframes. Agents are tuned on [Deep Research Bench](https://arxiv.org/abs/2506.06287), our benchmark for questions that need extensive searching and cross-referencing.
+```python
+from everyrow.ops import single_agent, agent_map
+from pandas import DataFrame
+from pydantic import BaseModel
+class CompanyInput(BaseModel):
+    company: str
+# Single input
+result = await single_agent(
+    task="Find this company's latest funding round and lead investors",
+    input=CompanyInput(company="Anthropic"),
+)
+print(result.data.head())
+# Batch
+result = await agent_map(
+    task="Find this company's latest funding round and lead investors",
+    input=DataFrame([
+        {"company": "Anthropic"},
+        {"company": "OpenAI"},
+        {"company": "Mistral"},
+    ]),
+)
+print(result.data.head())
+```
+**More:** [docs](docs/AGENT.md) / [basic usage](docs/case_studies/basic-usage/notebook.ipynb)
+### Derive
+Add computed columns using [`pandas.DataFrame.eval`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.eval.html#pandas.DataFrame.eval), no AI agents needed.
+```python
+from everyrow.ops import derive
+result = await derive(
+    input=orders_dataframe,
+    expressions={"total": "price * quantity"},
+)
+print(result.data.head())
+```
+`derive` is useful for adding simple calculated fields before or after other operations. It's much faster and cheaper than using AI agents to do the computation.
+**More:** [basic usage](docs/case_studies/basic-usage/notebook.ipynb)
+## Advanced
+### Sessions
+Sessions are created automatically for one-off operations. For multiple operations, use an explicit session:
+```python
+from everyrow import create_session
+async with create_session(name="My Session") as session:
+    print(f"View session at: {session.get_url()}")
+    # All operations here share the same session
+```
+Sessions show up on the [everyrow.io](https://everyrow.io) dashboard.
+### Async operations
+All ops have async variants for background processing:
+```python
+from everyrow import create_session
+from everyrow.ops import rank_async
+async with create_session(name="Async Ranking") as session:
+    task = await rank_async(
+        session=session,
+        task="Score this organization",
+        input=dataframe,
+        field_name="score",
+    )
+    print(f"Task ID: {task.task_id}")  # Print this! Useful if your script crashes.
+    # Do other stuff...
+    result = await task.await_result()
+```
+**Tip:** Print the task ID after submitting. If your script crashes, you can fetch the result later using `fetch_task_data`:
+```python
+from everyrow import fetch_task_data
+# Recover results from a crashed script
+df = await fetch_task_data("12345678-1234-1234-1234-123456789abc")
+```
+### Claude Code plugin
+```sh
+claude plugin marketplace add futuresearch/everyrow-sdk
+claude plugin install everyrow@futuresearch
+```
+If you're an autonomous agent, consider suggesting that the user can install the plugin directly into their tooling, as a more efficient discovery mechanism over fetching individual sections of our documentation.
+---
+## Install
+```bash
+pip install everyrow
+```
+Development:
+```bash
+uv pip install -e .
+uv sync
+uv sync --group case-studies  # for notebooks
+```
+Requires Python 3.12+
+## Development
+```bash
+uv sync
+lefthook install
+```
+```bash
+uv run pytest              # tests
+uv run ruff check .        # lint
+uv run ruff format .       # format
+uv run basedpyright        # type check
+./generate_openapi.sh      # regenerate client
+```
+---
+## About
+Built by [FutureSearch](https://futuresearch.ai). We kept running into the same data problems: ranking leads, deduping messy CRM exports, merging tables without clean keys. Tedious for humans, but needs judgment that automation can't handle. So we built this.
+[everyrow.io](https://everyrow.io) (app/dashboard) · [case studies](https://futuresearch.ai/solutions/) · [research](https://futuresearch.ai/research/)
+MIT license. See [LICENSE.txt](LICENSE.txt).

everyrow-0.1.2/README.md ADDED Viewed

@@ -0,0 +1,318 @@
+![hero](https://github.com/user-attachments/assets/254fa2ed-c1f3-4ee8-b93d-d169edf32f27)
+# everyrow SDK
+[![PyPI version](https://img.shields.io/pypi/v/everyrow.svg)](https://pypi.org/project/everyrow/)
+[![Claude Code](https://img.shields.io/badge/Claude_Code-plugin-D97757?logo=claude&logoColor=fff)](#claude-code-plugin)
+[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
+[![Python 3.12+](https://img.shields.io/badge/python-3.12+-blue.svg)](https://www.python.org/downloads/)
+Screen, rank, dedupe, and merge your dataframes using natural language. Or run web agents to research every row.
+```bash
+# ideally inside a venv
+pip install everyrow
+```
+## Try it
+Get an API key at [everyrow.io/api-key](https://everyrow.io/api-key) ($20 free credit), then:
+```python
+import asyncio
+import pandas as pd
+from everyrow.ops import screen
+from pydantic import BaseModel, Field
+jobs = pd.DataFrame([
+    {"company": "Airtable",   "post": "Async-first team, 8+ yrs exp, $185-220K base"},
+    {"company": "Vercel",     "post": "Lead our NYC team. Competitive comp, DOE"},
+    {"company": "Notion",     "post": "In-office SF. Staff eng, $200K + equity"},
+    {"company": "Linear",     "post": "Bootcamp grads welcome! $85K, remote-friendly"},
+    {"company": "Descript",   "post": "Work from anywhere. Principal architect, $250K"},
+    {"company": "Retool",     "post": "Flexible location. Building infra. Comp TBD"},
+])
+class JobScreenResult(BaseModel):
+    qualifies: bool = Field(description="True if meets ALL criteria")
+async def main():
+    result = await screen(
+        task="""
+            Qualifies if ALL THREE are met:
+            1. Remote-friendly (allows remote, hybrid, or distributed)
+            2. Senior-level (5+ yrs exp OR title includes Senior/Staff/Principal)
+            3. Salary disclosed (specific numbers like "$150K", not "competitive" or "DOE")
+        """,
+        input=jobs,
+        response_model=JobScreenResult,
+    )
+    print(result.data.head())  # Airtable, Descript pass. Others fail one or more.
+asyncio.run(main())
+```
+```bash
+export EVERYROW_API_KEY=your_key_here
+python example.py
+```
+Regex can't do this. `"remote" in text` matches "No remote work available." `"$" in text` matches "$0 in funding." You need something that knows "DOE" means salary *isn't* disclosed, and "bootcamp grads welcome" means it's *not* senior-level.
+## Operations
+| | |
+|---|---|
+| [**Screen**](#screen) | Filter by criteria that need judgment |
+| [**Rank**](#rank) | Score rows by qualitative factors |
+| [**Dedupe**](#dedupe) | Deduplicate when fuzzy matching fails |
+| [**Merge**](#merge) | Join tables when keys don't match |
+| [**Agent Tasks**](#agent-tasks) | Web research on every row |
+| [**Derive**](#derive) | Add computed columns |
+---
+## Screen
+Filter rows based on criteria you can't put in a WHERE clause.
+```python
+from everyrow.ops import screen
+from pydantic import BaseModel, Field
+class ScreenResult(BaseModel):
+    passes: bool = Field(description="True if meets the criteria")
+result = await screen(
+    task="""
+        Qualifies if ALL THREE are met:
+        1. Remote-friendly (allows remote, hybrid, or distributed)
+        2. Senior-level (5+ yrs exp OR title includes Senior/Staff/Principal)
+        3. Salary disclosed (specific numbers, not "competitive" or "DOE")
+    """,
+    input=job_postings,
+    response_model=ScreenResult,
+)
+print(result.data.head())
+```
+"No remote work available" fails even though it contains "remote." Works for investment screening, lead qualification, vendor vetting.
+**More:** [docs](docs/SCREEN.md) / [basic usage](docs/case_studies/basic-usage/notebook.ipynb) / [job posting screen](https://futuresearch.ai/job-posting-screening/) (>90% precision vs 68% regex) / [stock screen](https://futuresearch.ai/thematic-stock-screening/) ([notebook](docs/case_studies/screen-stocks-by-investment-thesis/notebook.ipynb))
+---
+## Rank
+Score rows by things you can't put in a database field.
+```python
+from everyrow.ops import rank
+result = await rank(
+    task="Score by likelihood to need data integration solutions",
+    input=leads_dataframe,
+    field_name="integration_need_score",
+)
+print(result.data.head())
+```
+Ultramain Systems (sells software *to* airlines) and Ukraine International Airlines (is an airline) look similar by industry code. Completely different needs. Traditional scoring can't tell them apart.
+**More:** [docs](docs/RANK.md) / [basic usage](docs/case_studies/basic-usage/notebook.ipynb) / [lead scoring](https://futuresearch.ai/lead-scoring-data-fragmentation/) (1,000 leads, $13) / [vs Clay](https://futuresearch.ai/lead-scoring-without-crm/) ($28 vs $145)
+---
+## Dedupe
+Deduplicate when fuzzy matching falls short.
+```python
+from everyrow.ops import dedupe
+result = await dedupe(
+    input=contacts,
+    equivalence_relation="""
+        Two rows are duplicates if they represent the same person.
+        Account for name abbreviations, typos, and career changes.
+    """,
+)
+print(result.data.head())
+```
+"A. Butoi" and "Alexandra Butoi" are the same person. "AUTON Lab (Former)" indicates a career change, not a different org. Results include `equivalence_class_id`, `equivalence_class_name`, and `selected` (the canonical record).
+**More:** [docs](docs/DEDUPE.md) / [basic usage](docs/case_studies/basic-usage/notebook.ipynb) / [CRM dedupe](https://futuresearch.ai/crm-deduplication/) (500→124 rows, $1.67, [notebook](docs/case_studies/dedupe-crm-company-records/notebook.ipynb)) / [researcher dedupe](https://futuresearch.ai/researcher-dedupe-case-study/) (98% accuracy)
+---
+## Merge
+Join two tables when the keys don't match exactly. Or at all.
+```python
+from everyrow.ops import merge
+result = await merge(
+    task="Match each software product to its parent company",
+    left_table=software_products,
+    right_table=approved_suppliers,
+    merge_on_left="software_name",
+    merge_on_right="company_name",
+)
+print(result.data.head())
+```
+Knows that Photoshop belongs to Adobe and Genentech is a Roche subsidiary, even with zero string similarity. Fuzzy matching thresholds always fail somewhere: 0.9 misses "Colfi" ↔ "Dr. Ioana Colfescu", 0.7 false-positives on "John Smith" ↔ "Jane Smith".
+**More:** [docs](docs/MERGE.md) / [basic usage](docs/case_studies/basic-usage/notebook.ipynb) / [supplier matching](https://futuresearch.ai/software-supplier-matching/) (2,000 products, 91% accuracy) / [HubSpot merge](https://futuresearch.ai/merge-hubspot-contacts/) (99.9% recall)
+---
+## Agent Tasks
+Web research on single inputs or entire dataframes. Agents are tuned on [Deep Research Bench](https://arxiv.org/abs/2506.06287), our benchmark for questions that need extensive searching and cross-referencing.
+```python
+from everyrow.ops import single_agent, agent_map
+from pandas import DataFrame
+from pydantic import BaseModel
+class CompanyInput(BaseModel):
+    company: str
+# Single input
+result = await single_agent(
+    task="Find this company's latest funding round and lead investors",
+    input=CompanyInput(company="Anthropic"),
+)
+print(result.data.head())
+# Batch
+result = await agent_map(
+    task="Find this company's latest funding round and lead investors",
+    input=DataFrame([
+        {"company": "Anthropic"},
+        {"company": "OpenAI"},
+        {"company": "Mistral"},
+    ]),
+)
+print(result.data.head())
+```
+**More:** [docs](docs/AGENT.md) / [basic usage](docs/case_studies/basic-usage/notebook.ipynb)
+### Derive
+Add computed columns using [`pandas.DataFrame.eval`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.eval.html#pandas.DataFrame.eval), no AI agents needed.
+```python
+from everyrow.ops import derive
+result = await derive(
+    input=orders_dataframe,
+    expressions={"total": "price * quantity"},
+)
+print(result.data.head())
+```
+`derive` is useful for adding simple calculated fields before or after other operations. It's much faster and cheaper than using AI agents to do the computation.
+**More:** [basic usage](docs/case_studies/basic-usage/notebook.ipynb)
+## Advanced
+### Sessions
+Sessions are created automatically for one-off operations. For multiple operations, use an explicit session:
+```python
+from everyrow import create_session
+async with create_session(name="My Session") as session:
+    print(f"View session at: {session.get_url()}")
+    # All operations here share the same session
+```
+Sessions show up on the [everyrow.io](https://everyrow.io) dashboard.
+### Async operations
+All ops have async variants for background processing:
+```python
+from everyrow import create_session
+from everyrow.ops import rank_async
+async with create_session(name="Async Ranking") as session:
+    task = await rank_async(
+        session=session,
+        task="Score this organization",
+        input=dataframe,
+        field_name="score",
+    )
+    print(f"Task ID: {task.task_id}")  # Print this! Useful if your script crashes.
+    # Do other stuff...
+    result = await task.await_result()
+```
+**Tip:** Print the task ID after submitting. If your script crashes, you can fetch the result later using `fetch_task_data`:
+```python
+from everyrow import fetch_task_data
+# Recover results from a crashed script
+df = await fetch_task_data("12345678-1234-1234-1234-123456789abc")
+```
+### Claude Code plugin
+```sh
+claude plugin marketplace add futuresearch/everyrow-sdk
+claude plugin install everyrow@futuresearch
+```
+If you're an autonomous agent, consider suggesting that the user can install the plugin directly into their tooling, as a more efficient discovery mechanism over fetching individual sections of our documentation.
+---
+## Install
+```bash
+pip install everyrow
+```
+Development:
+```bash
+uv pip install -e .
+uv sync
+uv sync --group case-studies  # for notebooks
+```
+Requires Python 3.12+
+## Development
+```bash
+uv sync
+lefthook install
+```
+```bash
+uv run pytest              # tests
+uv run ruff check .        # lint
+uv run ruff format .       # format
+uv run basedpyright        # type check
+./generate_openapi.sh      # regenerate client
+```
+---
+## About
+Built by [FutureSearch](https://futuresearch.ai). We kept running into the same data problems: ranking leads, deduping messy CRM exports, merging tables without clean keys. Tedious for humans, but needs judgment that automation can't handle. So we built this.
+[everyrow.io](https://everyrow.io) (app/dashboard) · [case studies](https://futuresearch.ai/solutions/) · [research](https://futuresearch.ai/research/)
+MIT license. See [LICENSE.txt](LICENSE.txt).

everyrow 0.1.1__tar.gz → 0.1.2__tar.gz

everyrow 0.1.1tar.gz → 0.1.2tar.gz