PyPI - compair-core - Versions diffs - 0.3.14__tar.gz → 0.4.0__tar.gz - Mend

compair-core 0.3.14 (tar.gz) → 0.4.0 (tar.gz)


Potentially problematic release.

This version of compair-core might be problematic.

Files changed (45)

{compair_core-0.3.14 → compair_core-0.4.0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: compair-core
-Version: 0.3.14
+Version: 0.4.0
 Summary: Open-source foundation of the Compair collaboration platform.
 Author: RocketResearch, Inc.
 License: MIT
@@ -86,7 +86,8 @@ Container definitions and build pipelines live outside this public package:
 Key environment variables for the core edition:
 - `COMPAIR_EDITION` (`core`) – corresponds to this core local implementation.
-- `COMPAIR_SQLITE_DIR` / `COMPAIR_SQLITE_NAME` – override the default local SQLite path (falls back to `./compair_data` if `/data` is not writable).
+- `COMPAIR_DATABASE_URL` – optional explicit SQLAlchemy URL (e.g. `postgresql+psycopg2://user:pass@host/db`). When omitted, Compair falls back to a local SQLite file.
+- `COMPAIR_DB_DIR` / `COMPAIR_DB_NAME` – directory and filename for the bundled SQLite database (default: `~/.compair-core/data/compair.db`). Legacy `COMPAIR_SQLITE_*` variables remain supported.
 - `COMPAIR_LOCAL_MODEL_URL` – endpoint for your local embeddings/feedback service (defaults to `http://local-model:9000`).
 - `COMPAIR_EMAIL_BACKEND` – the core mailer logs emails to stdout; cloud overrides this with transactional delivery.
 - `COMPAIR_REQUIRE_AUTHENTICATION` (`true`) – set to `false` to run the API in single-user mode without login or account management. When disabled, Compair auto-provisions a local user, group, and long-lived session token so you can upload documents immediately.
@@ -94,6 +95,10 @@ Key environment variables for the core edition:
 - `COMPAIR_INCLUDE_LEGACY_ROUTES` (`false`) – opt-in to the full legacy API surface (used by the hosted product) when running the core edition. Leave unset to expose only the streamlined single-user endpoints in Swagger.
 - `COMPAIR_EMBEDDING_DIM` – force the embedding vector size stored in the database (defaults to 384 for core, 1536 for cloud). Keep this in sync with whichever embedding model you configure.
 - `COMPAIR_VECTOR_BACKEND` (`auto`) – set to `pgvector` when running against PostgreSQL with the pgvector extension, or `json` to store embeddings as JSON (the default for SQLite deployments).
+- `COMPAIR_GENERATION_PROVIDER` (`local`) – choose how feedback is produced. Options: `local` (call the bundled FastAPI service), `openai` (use ChatGPT-compatible APIs with an API key), `http` (POST the request to a custom endpoint), or `fallback` (skip generation and surface similar references only).
+- `COMPAIR_OPENAI_API_KEY` / `COMPAIR_OPENAI_MODEL` – when using the OpenAI provider, supply your API key and optional model name (defaults to `gpt-4o-mini`). The fallback kicks in automatically if the key or SDK is unavailable.
+- `COMPAIR_GENERATION_ENDPOINT` – HTTP endpoint invoked when `COMPAIR_GENERATION_PROVIDER=http`; the service receives a JSON payload (`document`, `references`, `length_instruction`) and should return `{"feedback": ...}`.
+- `COMPAIR_OCR_ENDPOINT` – endpoint the backend calls for OCR uploads (defaults to the bundled Tesseract wrapper at `http://local-ocr:9001/ocr-file`). Provide your own service by overriding this URL.
 See `compair_core/server/settings.py` for the full settings surface.

{compair_core-0.3.14 → compair_core-0.4.0}/README.md RENAMED

@@ -51,7 +51,8 @@ Container definitions and build pipelines live outside this public package:
 Key environment variables for the core edition:
 - `COMPAIR_EDITION` (`core`) – corresponds to this core local implementation.
-- `COMPAIR_SQLITE_DIR` / `COMPAIR_SQLITE_NAME` – override the default local SQLite path (falls back to `./compair_data` if `/data` is not writable).
+- `COMPAIR_DATABASE_URL` – optional explicit SQLAlchemy URL (e.g. `postgresql+psycopg2://user:pass@host/db`). When omitted, Compair falls back to a local SQLite file.
+- `COMPAIR_DB_DIR` / `COMPAIR_DB_NAME` – directory and filename for the bundled SQLite database (default: `~/.compair-core/data/compair.db`). Legacy `COMPAIR_SQLITE_*` variables remain supported.
 - `COMPAIR_LOCAL_MODEL_URL` – endpoint for your local embeddings/feedback service (defaults to `http://local-model:9000`).
 - `COMPAIR_EMAIL_BACKEND` – the core mailer logs emails to stdout; cloud overrides this with transactional delivery.
 - `COMPAIR_REQUIRE_AUTHENTICATION` (`true`) – set to `false` to run the API in single-user mode without login or account management. When disabled, Compair auto-provisions a local user, group, and long-lived session token so you can upload documents immediately.
@@ -59,6 +60,10 @@ Key environment variables for the core edition:
 - `COMPAIR_INCLUDE_LEGACY_ROUTES` (`false`) – opt-in to the full legacy API surface (used by the hosted product) when running the core edition. Leave unset to expose only the streamlined single-user endpoints in Swagger.
 - `COMPAIR_EMBEDDING_DIM` – force the embedding vector size stored in the database (defaults to 384 for core, 1536 for cloud). Keep this in sync with whichever embedding model you configure.
 - `COMPAIR_VECTOR_BACKEND` (`auto`) – set to `pgvector` when running against PostgreSQL with the pgvector extension, or `json` to store embeddings as JSON (the default for SQLite deployments).
+- `COMPAIR_GENERATION_PROVIDER` (`local`) – choose how feedback is produced. Options: `local` (call the bundled FastAPI service), `openai` (use ChatGPT-compatible APIs with an API key), `http` (POST the request to a custom endpoint), or `fallback` (skip generation and surface similar references only).
+- `COMPAIR_OPENAI_API_KEY` / `COMPAIR_OPENAI_MODEL` – when using the OpenAI provider, supply your API key and optional model name (defaults to `gpt-4o-mini`). The fallback kicks in automatically if the key or SDK is unavailable.
+- `COMPAIR_GENERATION_ENDPOINT` – HTTP endpoint invoked when `COMPAIR_GENERATION_PROVIDER=http`; the service receives a JSON payload (`document`, `references`, `length_instruction`) and should return `{"feedback": ...}`.
+- `COMPAIR_OCR_ENDPOINT` – endpoint the backend calls for OCR uploads (defaults to the bundled Tesseract wrapper at `http://local-ocr:9001/ocr-file`). Provide your own service by overriding this URL.
 See `compair_core/server/settings.py` for the full settings surface.
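
The `COMPAIR_GENERATION_ENDPOINT` entry above describes a small JSON contract: the service receives `document`, `references`, and `length_instruction`, and replies with `{"feedback": ...}`. The following is a minimal sketch of a service honoring that contract, using only the Python standard library. The names `build_feedback`, `GenerationHandler`, and `serve`, the port, and the wording of the reply are illustrative assumptions and are not part of the package.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def build_feedback(payload: dict) -> dict:
    """Hypothetical reply logic for the documented request payload."""
    document = (payload.get("document") or "").strip()
    references = payload.get("references") or []
    length_instruction = payload.get("length_instruction") or "1–2 short sentences"
    if not document:
        # Assumption: reuse the "NONE" sentinel that appears elsewhere in this diff.
        return {"feedback": "NONE"}
    first_line = document.splitlines()[0][:200]
    return {
        "feedback": (
            f"({length_instruction}) Reviewed against "
            f"{len(references)} reference(s): {first_line}"
        )
    }


class GenerationHandler(BaseHTTPRequestHandler):
    """Accepts a JSON POST body and answers with a JSON object containing `feedback`."""

    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        body = json.dumps(build_feedback(payload)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(host: str = "127.0.0.1", port: int = 8080) -> None:
    """Start the sketch server; host and port are placeholders."""
    HTTPServer((host, port), GenerationHandler).serve_forever()
```

Calling `serve()` and setting `COMPAIR_GENERATION_PROVIDER=http` together with `COMPAIR_GENERATION_ENDPOINT=http://127.0.0.1:8080/` would, under these assumptions, route feedback requests to this handler.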

{compair_core-0.3.14 → compair_core-0.4.0}/compair_core/api.py RENAMED

@@ -2370,6 +2370,8 @@ def get_activity_feed(
 ):
     """Retrieve recent activities for a user's groups."""
     require_feature(HAS_ACTIVITY, "Activity feed")
+    if not IS_CLOUD:
+        raise HTTPException(status_code=501, detail="Activity feed is only available in the Compair Cloud edition.")
     with compair.Session() as session:
         # Get user's groups
@@ -3514,7 +3516,11 @@ CORE_PATHS: set[str] = {
     "/load_documents",
     "/load_document",
     "/load_document_by_id",
+    "/load_user_files",
     "/create_doc",
+    "/update_doc",
+    "/delete_doc",
+    "/delete_docs",
     "/process_doc",
     "/status/{task_id}",
     "/upload/ocr-file",
@@ -3523,6 +3529,7 @@ CORE_PATHS: set[str] = {
     "/load_references",
     "/load_feedback",
     "/documents/{document_id}/feedback",
+    "/get_activity_feed",
 }
 for route in router.routes:

{compair_core-0.3.14 → compair_core-0.4.0}/compair_core/compair/__init__.py RENAMED

@@ -1,6 +1,7 @@
 from __future__ import annotations
 import os
+from pathlib import Path
 from sqlalchemy import Engine, create_engine
 from sqlalchemy.orm import sessionmaker
@@ -37,27 +38,51 @@ if edition == "cloud":
 def _handle_engine() -> Engine:
+    # Preferred configuration: explicit database URL
+    explicit_url = (
+        os.getenv("COMPAIR_DATABASE_URL")
+        or os.getenv("COMPAIR_DB_URL")
+        or os.getenv("DATABASE_URL")
+    )
+    if explicit_url:
+        if explicit_url.startswith("sqlite:"):
+            return create_engine(explicit_url, connect_args={"check_same_thread": False})
+        return create_engine(explicit_url)
+    # Backwards compatibility with legacy Postgres env variables
     db = os.getenv("DB")
     db_user = os.getenv("DB_USER")
     db_passw = os.getenv("DB_PASSW")
-    db_url = os.getenv("DB_URL")
+    db_host = os.getenv("DB_URL")
-    if all([db, db_user, db_passw, db_url]):
+    if all([db, db_user, db_passw, db_host]):
         return create_engine(
-            f"postgresql+psycopg2://{db_user}:{db_passw}@{db_url}/{db}",
+            f"postgresql+psycopg2://{db_user}:{db_passw}@{db_host}/{db}",
             pool_size=10,
             max_overflow=0,
         )
-    sqlite_dir = os.getenv("COMPAIR_SQLITE_DIR", "/data")
+    # Local default: place an SQLite database inside COMPAIR_DB_DIR
+    db_dir = (
+        os.getenv("COMPAIR_DB_DIR")
+        or os.getenv("COMPAIR_SQLITE_DIR")
+        or os.path.join(Path.home(), ".compair-core", "data")
+    )
+    db_name = os.getenv("COMPAIR_DB_NAME") or os.getenv("COMPAIR_SQLITE_NAME") or "compair.db"
+    db_path = Path(db_dir).expanduser()
     try:
-        os.makedirs(sqlite_dir, exist_ok=True)
+        db_path.mkdir(parents=True, exist_ok=True)
     except OSError:
-        fallback_dir = os.path.join(os.getcwd(), "compair_data")
-        os.makedirs(fallback_dir, exist_ok=True)
-        sqlite_dir = fallback_dir
-    sqlite_path = os.path.join(sqlite_dir, os.getenv("COMPAIR_SQLITE_NAME", "compair.db"))
-    return create_engine(f"sqlite:///{sqlite_path}", connect_args={"check_same_thread": False})
+        fallback_dir = Path(os.getcwd()) / "compair_data"
+        fallback_dir.mkdir(parents=True, exist_ok=True)
+        db_path = fallback_dir
+    sqlite_path = db_path / db_name
+    return create_engine(
+        f"sqlite:///{sqlite_path}",
+        connect_args={"check_same_thread": False},
+    )
 def initialize_database() -> None:
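
The new `_handle_engine` resolves the database in three steps: an explicit URL, then the legacy Postgres variables, then a local SQLite file. The sketch below restates that precedence as a pure function that returns only the URL string, which makes the order easy to check without creating an engine. The function name `resolve_database_url` and the `env` parameter are assumptions made for illustration; the directory creation and the `./compair_data` fallback from the diff are deliberately left out.

```python
import os
from pathlib import Path
from typing import Mapping


def resolve_database_url(env: Mapping[str, str] = os.environ) -> str:
    """Return the URL the engine would use, following the order shown in the diff."""
    # 1. Explicit URL wins; empty values are skipped, like the `or` chain in the diff.
    for key in ("COMPAIR_DATABASE_URL", "COMPAIR_DB_URL", "DATABASE_URL"):
        if env.get(key):
            return env[key]

    # 2. Legacy Postgres variables (DB_URL holds the host in this scheme).
    db, user, passw, host = (env.get(k) for k in ("DB", "DB_USER", "DB_PASSW", "DB_URL"))
    if all([db, user, passw, host]):
        return f"postgresql+psycopg2://{user}:{passw}@{host}/{db}"

    # 3. Local SQLite file; the new COMPAIR_DB_* names take priority over COMPAIR_SQLITE_*.
    db_dir = (
        env.get("COMPAIR_DB_DIR")
        or env.get("COMPAIR_SQLITE_DIR")
        or os.path.join(Path.home(), ".compair-core", "data")
    )
    db_name = env.get("COMPAIR_DB_NAME") or env.get("COMPAIR_SQLITE_NAME") or "compair.db"
    return f"sqlite:///{Path(db_dir).expanduser() / db_name}"
```

Passing a plain dictionary instead of `os.environ` lets the precedence be exercised in isolation, for example `resolve_database_url({"DATABASE_URL": "sqlite:///demo.db"})`.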

compair_core-0.4.0/compair_core/compair/feedback.py ADDED

@@ -0,0 +1,246 @@
+from __future__ import annotations
+import os
+from typing import Any, Iterable, List
+import requests
+from .logger import log_event
+from .models import Document, User
+try:
+    import openai  # type: ignore
+except ImportError:  # pragma: no cover - optional dependency
+    openai = None  # type: ignore
+try:
+    from compair_cloud.feedback import Reviewer as CloudReviewer  # type: ignore
+    from compair_cloud.feedback import get_feedback as cloud_get_feedback  # type: ignore
+except (ImportError, ModuleNotFoundError):
+    CloudReviewer = None  # type: ignore
+    cloud_get_feedback = None  # type: ignore
+class Reviewer:
+    """Edition-aware wrapper that selects a feedback provider based on configuration."""
+    def __init__(self) -> None:
+        self.edition = os.getenv("COMPAIR_EDITION", "core").lower()
+        self.provider = os.getenv("COMPAIR_GENERATION_PROVIDER", "local").lower()
+        self.length_map = {
+            "Brief": "1–2 short sentences",
+            "Detailed": "A couple short paragraphs",
+            "Verbose": "As thorough as reasonably possible without repeating information",
+        }
+        self._cloud_impl = None
+        self._openai_client = None
+        self.openai_model = os.getenv("COMPAIR_OPENAI_MODEL", "gpt-5-nano")
+        self.custom_endpoint = os.getenv("COMPAIR_GENERATION_ENDPOINT")
+        if self.edition == "cloud" and CloudReviewer is not None:
+            self._cloud_impl = CloudReviewer()
+            self.provider = "cloud"
+        else:
+            if self.provider == "openai":
+                api_key = os.getenv("COMPAIR_OPENAI_API_KEY")
+                if api_key and openai is not None:
+                    # Support both legacy (ChatCompletion) and new SDKs
+                    if hasattr(openai, "api_key"):
+                        openai.api_key = api_key  # type: ignore[assignment]
+                    if hasattr(openai, "OpenAI"):
+                        try:  # pragma: no cover - optional runtime dependency
+                            self._openai_client = openai.OpenAI(api_key=api_key)  # type: ignore[attr-defined]
+                        except Exception:  # pragma: no cover - if instantiation fails
+                            self._openai_client = None
+                if self._openai_client is None and not hasattr(openai, "ChatCompletion"):
+                    log_event("openai_feedback_unavailable", reason="openai_library_missing")
+                    self.provider = "fallback"
+            if self.provider == "http" and not self.custom_endpoint:
+                log_event("custom_feedback_unavailable", reason="missing_endpoint")
+                self.provider = "fallback"
+            if self.provider == "local":
+                self.model = os.getenv("COMPAIR_LOCAL_GENERATION_MODEL", "local-feedback")
+                base_url = os.getenv("COMPAIR_LOCAL_MODEL_URL", "http://local-model:9000")
+                route = os.getenv("COMPAIR_LOCAL_GENERATION_ROUTE", "/generate")
+                self.endpoint = f"{base_url.rstrip('/')}{route}"
+            else:
+                self.model = "external"
+                self.endpoint = None
+            if self.provider not in {"local", "openai", "http", "fallback"}:
+                log_event("feedback_provider_unknown", provider=self.provider)
+                self.provider = "fallback"
+    @property
+    def is_cloud(self) -> bool:
+        return self._cloud_impl is not None
+def _reference_snippets(references: Iterable[Any], limit: int = 3) -> List[str]:
+    snippets: List[str] = []
+    for ref in references:
+        snippet = getattr(ref, "content", "") or ""
+        snippet = snippet.replace("\n", " ").strip()
+        if snippet:
+            snippets.append(snippet[:200])
+        if len(snippets) == limit:
+            break
+    return snippets
+def _fallback_feedback(text: str, references: list[Any]) -> str:
+    snippets = _reference_snippets(references)
+    if not snippets:
+        return "NONE"
+    joined = "; ".join(snippets)
+    return f"Consider aligning with these reference passages: {joined}"
+def _openai_feedback(
+    reviewer: Reviewer,
+    doc: Document,
+    text: str,
+    references: list[Any],
+    user: User,
+) -> str | None:
+    if openai is None:
+        return None
+    instruction = reviewer.length_map.get(user.preferred_feedback_length, "1–2 short sentences")
+    ref_text = "\n\n".join(_reference_snippets(references, limit=3))
+    messages = [
+        {
+            "role": "system",
+            "content": (
+                "You are Compair, an assistant that delivers concise, actionable feedback on a user's document. "
+                "Focus on clarity, cohesion, and usefulness."
+            ),
+        },
+        {
+            "role": "user",
+            "content": (
+                f"Document:\n{text}\n\nHelpful reference excerpts:\n{ref_text or 'None provided'}\n\n"
+                f"Respond with {instruction} that highlights the most valuable revision to make next."
+            ),
+        },
+    ]
+    try:
+        if reviewer._openai_client is not None and hasattr(reviewer._openai_client, "responses"):
+            response = reviewer._openai_client.responses.create(  # type: ignore[union-attr]
+                model=reviewer.openai_model,
+                input=messages,
+                max_output_tokens=256,
+            )
+            content = getattr(response, "output_text", None)
+            if not content and hasattr(response, "outputs"):
+                # Legacy compatibility: join content parts
+                parts = []
+                for item in getattr(response, "outputs", []):
+                    parts.extend(getattr(item, "content", []))
+                content = " ".join(getattr(part, "text", "") for part in parts)
+        elif hasattr(openai, "ChatCompletion"):
+            chat_response = openai.ChatCompletion.create(  # type: ignore[attr-defined]
+                model=reviewer.openai_model,
+                messages=messages,
+                temperature=0.3,
+                max_tokens=256,
+            )
+            content = (
+                chat_response["choices"][0]["message"]["content"].strip()  # type: ignore[index, assignment]
+            )
+        else:
+            content = None
+    except Exception as exc:  # pragma: no cover - network/API failure
+        log_event("openai_feedback_failed", error=str(exc))
+        content = None
+    if content:
+        content = content.strip()
+        if content:
+            return content
+    return None
+def _local_feedback(
+    reviewer: Reviewer,
+    text: str,
+    references: list[Any],
+    user: User,
+) -> str | None:
+    payload = {
+        "document": text,
+        "references": [getattr(ref, "content", "") for ref in references],
+        "length_instruction": reviewer.length_map.get(
+            user.preferred_feedback_length,
+            "1–2 short sentences",
+        ),
+    }
+    try:
+        response = requests.post(reviewer.endpoint, json=payload, timeout=30)
+        response.raise_for_status()
+        data = response.json()
+        feedback = data.get("feedback") or data.get("text")
+        if feedback:
+            return str(feedback).strip()
+    except Exception as exc:  # pragma: no cover - network failures stay graceful
+        log_event("local_feedback_failed", error=str(exc))
+    return None
+def _http_feedback(
+    reviewer: Reviewer,
+    text: str,
+    references: list[Any],
+    user: User,
+) -> str | None:
+    if not reviewer.custom_endpoint:
+        return None
+    payload = {
+        "document": text,
+        "references": [getattr(ref, "content", "") for ref in references],
+        "length_instruction": reviewer.length_map.get(
+            user.preferred_feedback_length,
+            "1–2 short sentences",
+        ),
+    }
+    try:
+        response = requests.post(reviewer.custom_endpoint, json=payload, timeout=30)
+        response.raise_for_status()
+        data = response.json()
+        feedback = data.get("feedback") or data.get("text")
+        if isinstance(feedback, str):
+            feedback = feedback.strip()
+        if feedback:
+            return feedback
+    except Exception as exc:  # pragma: no cover - network failures stay graceful
+        log_event("custom_feedback_failed", error=str(exc))
+    return None
+def get_feedback(
+    reviewer: Reviewer,
+    doc: Document,
+    text: str,
+    references: list[Any],
+    user: User,
+) -> str:
+    if reviewer.is_cloud and cloud_get_feedback is not None:
+        return cloud_get_feedback(reviewer._cloud_impl, doc, text, references, user)  # type: ignore[arg-type]
+    if reviewer.provider == "openai":
+        feedback = _openai_feedback(reviewer, doc, text, references, user)
+        if feedback:
+            return feedback
+    if reviewer.provider == "http":
+        feedback = _http_feedback(reviewer, text, references, user)
+        if feedback:
+            return feedback
+    if reviewer.provider == "local" and getattr(reviewer, "endpoint", None):
+        feedback = _local_feedback(reviewer, text, references, user)
+        if feedback:
+            return feedback
+    return _fallback_feedback(text, references)
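
In the new `get_feedback`, every provider path ends in the same fallback: when the selected generator yields nothing, up to three reference excerpts are surfaced instead. The sketch below isolates that behaviour. `reference_snippets` follows the logic of `_reference_snippets` in the diff, while `feedback_with_fallback` and its `generate` callable are hypothetical simplifications. In the package itself each provider function catches its own exceptions, which this sketch folds into a single wrapper.

```python
from types import SimpleNamespace
from typing import Any, Callable, Iterable, Optional


def reference_snippets(references: Iterable[Any], limit: int = 3) -> list[str]:
    """Collect up to `limit` non-empty, single-line excerpts of at most 200 characters."""
    snippets: list[str] = []
    for ref in references:
        snippet = (getattr(ref, "content", "") or "").replace("\n", " ").strip()
        if snippet:
            snippets.append(snippet[:200])
        if len(snippets) == limit:
            break
    return snippets


def feedback_with_fallback(
    generate: Optional[Callable[[], Optional[str]]],
    references: list[Any],
) -> str:
    """Try the generator first; otherwise return reference excerpts or "NONE"."""
    if generate is not None:
        try:
            result = generate()
        except Exception:
            result = None  # a failing provider degrades to the fallback
        if result:
            return result
    snippets = reference_snippets(references)
    if not snippets:
        return "NONE"
    return "Consider aligning with these reference passages: " + "; ".join(snippets)


refs = [
    SimpleNamespace(content="First\nreference"),
    SimpleNamespace(content="   "),  # whitespace-only content is skipped
    SimpleNamespace(content="Second reference"),
]
print(feedback_with_fallback(None, refs))
```

With no generator configured, the call prints the fallback sentence built from the two non-empty references.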

{compair_core-0.3.14 → compair_core-0.4.0}/compair_core/compair/models.py RENAMED

@@ -76,9 +76,13 @@ def _embedding_column():
             raise RuntimeError(
                 "pgvector is required when COMPAIR_VECTOR_BACKEND is set to 'pgvector'."
             )
-        return mapped_column(Vector(EMBEDDING_DIMENSION), nullable=True)
+        return mapped_column(
+            Vector(EMBEDDING_DIMENSION),
+            nullable=True,
+            default=None,
+        )
     # Store embeddings as JSON arrays (works across SQLite/Postgres without pgvector)
-    return mapped_column(JSON, nullable=True)
+    return mapped_column(JSON, nullable=True, default=None)
 def cosine_similarity(vec1: Sequence[float] | None, vec2: Sequence[float] | None) -> float | None:
@@ -279,10 +283,10 @@ class Document(BaseObject):
     doc_type: Mapped[str]
     datetime_created: Mapped[datetime]
     datetime_modified: Mapped[datetime]
+    embedding: Mapped[list[float] | None] = _embedding_column()
     file_key: Mapped[str | None] = mapped_column(String, nullable=True, default=None)
     image_key: Mapped[str | None] = mapped_column(String, nullable=True, default=None)
     is_published: Mapped[bool] = mapped_column(Boolean, default=False)
-    embedding: Mapped[list[float] | None] = _embedding_column()
     user = relationship("User", back_populates="documents")
     groups = relationship("Group", secondary="document_to_group", back_populates="documents")
@@ -315,8 +319,8 @@ class Note(Base):
     author_id: Mapped[str] = mapped_column(ForeignKey("user.user_id", ondelete="CASCADE"), index=True)
     group_id: Mapped[str | None] = mapped_column(ForeignKey("group.group_id", ondelete="CASCADE"), index=True, nullable=True)
     content: Mapped[str] = mapped_column(Text)
-    datetime_created: Mapped[datetime] = mapped_column(default=datetime.now(timezone.utc))
     embedding: Mapped[list[float] | None] = _embedding_column()
+    datetime_created: Mapped[datetime] = mapped_column(default=datetime.now(timezone.utc))
     document = relationship("Document", back_populates="notes")
     author = relationship("User", back_populates="notes")

{compair_core-0.3.14 → compair_core-0.4.0}/compair_core/server/local_model/app.py RENAMED

@@ -46,13 +46,19 @@ class EmbedResponse(BaseModel):
 class GenerateRequest(BaseModel):
+    # Legacy format used by the CLI shim
     system: str | None = None
-    prompt: str
+    prompt: str | None = None
     verbosity: str | None = None
+    # Core API payload (document + references)
+    document: str | None = None
+    references: List[str] | None = None
+    length_instruction: str | None = None
 class GenerateResponse(BaseModel):
-    text: str
+    feedback: str
 @app.post("/embed", response_model=EmbedResponse)
@@ -62,12 +68,20 @@ def embed(request: EmbedRequest) -> EmbedResponse:
 @app.post("/generate", response_model=GenerateResponse)
 def generate(request: GenerateRequest) -> GenerateResponse:
-    prompt = request.prompt.strip()
-    if not prompt:
-        return GenerateResponse(text="NONE")
-    first_sentence = prompt.split("\n", 1)[0][:200]
-    verbosity = request.verbosity or "default"
-    return GenerateResponse(
-        text=f"[local-{verbosity}] Key takeaway: {first_sentence}"
-    )
+    # Determine the main text input (document or prompt)
+    text_input = request.document or request.prompt or ""
+    text_input = text_input.strip()
+    if not text_input:
+        return GenerateResponse(feedback="NONE")
+    first_sentence = text_input.split("\n", 1)[0][:200]
+    verbosity = request.length_instruction or request.verbosity or "brief response"
+    ref_snippet = ""
+    if request.references:
+        top_ref = (request.references[0] or "").strip()
+        if top_ref:
+            ref_snippet = f" Reference: {top_ref[:160]}"
+    feedback = f"[local-feedback] {verbosity}: {first_sentence}{ref_snippet}".strip()
+    return GenerateResponse(feedback=feedback or "NONE")
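
This hunk renames the response field of `/generate` from `text` to `feedback` and accepts the `document` / `references` / `length_instruction` payload alongside the older `prompt` form. A client that may talk to either version can build the new payload and read whichever key is present, which mirrors how `_local_feedback` in this release reads `feedback` first and `text` second. The helper names below are illustrative assumptions.

```python
from typing import Any, Optional


def build_generate_payload(
    document: str,
    references: list[str],
    length_instruction: str = "1–2 short sentences",
) -> dict[str, Any]:
    """Assemble the request body in the document-plus-references form."""
    return {
        "document": document,
        "references": references,
        "length_instruction": length_instruction,
    }


def extract_feedback(data: dict[str, Any]) -> Optional[str]:
    """Read `feedback` (0.4.0) and fall back to `text` (0.3.14 response shape)."""
    value = data.get("feedback") or data.get("text")
    if value is None:
        return None
    text = str(value).strip()
    return text or None
```

For example, `extract_feedback({"text": " legacy reply "})` and `extract_feedback({"feedback": "new reply"})` both return a cleaned string, while an empty response yields `None`.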

compair_core-0.4.0/compair_core/server/local_model/ocr.py ADDED

@@ -0,0 +1,44 @@
+"""Minimal OCR endpoint leveraging pytesseract when available."""
+from __future__ import annotations
+import io
+import os
+from typing import Any, Dict
+from fastapi import FastAPI, File, HTTPException, UploadFile
+app = FastAPI(title="Compair Local OCR", version="0.1.0")
+try:  # Optional dependency
+    import pytesseract  # type: ignore
+    from PIL import Image  # type: ignore
+except ImportError:  # pragma: no cover - optional
+    pytesseract = None  # type: ignore
+    Image = None  # type: ignore
+_OCR_FALLBACK = os.getenv("COMPAIR_LOCAL_OCR_FALLBACK", "text")  # text | none
+def _extract_text(data: bytes) -> str:
+    if pytesseract is None or Image is None:
+        if _OCR_FALLBACK == "text":
+            try:
+                return data.decode("utf-8")
+            except UnicodeDecodeError:
+                return data.decode("latin-1", errors="ignore")
+        return ""
+    try:
+        image = Image.open(io.BytesIO(data))
+        return pytesseract.image_to_string(image)
+    except Exception:
+        return ""
+@app.post("/ocr-file")
+async def ocr_file(file: UploadFile = File(...)) -> Dict[str, Any]:
+    payload = await file.read()
+    text = _extract_text(payload)
+    if not text:
+        raise HTTPException(status_code=501, detail="OCR not available or failed to extract text.")
+    return {"extracted_text": text}
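
When `pytesseract` or Pillow is missing and `COMPAIR_LOCAL_OCR_FALLBACK` is left at `text`, `_extract_text` treats the upload as text: it tries UTF-8 and then Latin-1. The sketch below reproduces only that decoding step so its behaviour can be observed directly; the function name `decode_as_text` is an assumption. Because Latin-1 maps every byte to a character, the second branch cannot fail, so binary image data would likely decode to non-empty but unreadable text instead of producing the 501 response.

```python
def decode_as_text(data: bytes) -> str:
    """Decode uploaded bytes as text, preferring UTF-8 and falling back to Latin-1."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # Latin-1 assigns a character to every byte value, so this never raises.
        return data.decode("latin-1", errors="ignore")


print(decode_as_text("plain text upload".encode("utf-8")))
print(decode_as_text(b"caf\xe9"))  # not valid UTF-8, so the Latin-1 branch is used
```

Setting `COMPAIR_LOCAL_OCR_FALLBACK` to a value other than `text` makes the endpoint return an empty result, and therefore a 501, whenever the OCR libraries are unavailable.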

{compair_core-0.3.14 → compair_core-0.4.0}/compair_core/server/routers/capabilities.py RENAMED

@@ -36,6 +36,10 @@ def capabilities(settings: Settings = Depends(get_settings)) -> dict[str, object
             "docs": None if edition == "core" else 100,
             "feedback_per_day": None if edition == "core" else 50,
         },
+        "features": {
+            "ocr_upload": settings.ocr_enabled,
+            "activity_feed": edition == "cloud",
+        },
         "server": "Compair Cloud" if edition == "cloud" else "Compair Core",
         "version": settings.version,
         "legacy_routes": settings.include_legacy_routes,

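The capabilities response now carries a `features` object with `ocr_upload` and `activity_feed`, and `get_activity_feed` in `api.py` answers with HTTP 501 outside the cloud edition. A client can consult the flags before calling an endpoint rather than handling the 501 afterwards. The helper below is a hedged sketch: the name `feature_enabled` is an assumption, and servers older than 0.4.0, which send no `features` key, are treated as having the feature disabled.

```python
from typing import Any, Mapping


def feature_enabled(capabilities: Mapping[str, Any], name: str) -> bool:
    """Return True only when the server explicitly advertises the feature."""
    features = capabilities.get("features") or {}
    return bool(features.get(name, False))


# Example response shape for a core deployment, based on the fields in the diff.
core_caps = {
    "features": {"ocr_upload": True, "activity_feed": False},
    "server": "Compair Core",
}

if not feature_enabled(core_caps, "activity_feed"):
    print("Skipping /get_activity_feed: not offered by this server")
```

The same check applied to `ocr_upload` can gate calls to `/upload/ocr-file`.
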
{compair_core-0.3.14 → compair_core-0.4.0}/compair_core.egg-info/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: compair-core
-Version: 0.3.14
+Version: 0.4.0
 Summary: Open-source foundation of the Compair collaboration platform.
 Author: RocketResearch, Inc.
 License: MIT
@@ -86,7 +86,8 @@ Container definitions and build pipelines live outside this public package:
 Key environment variables for the core edition:
 - `COMPAIR_EDITION` (`core`) – corresponds to this core local implementation.
-- `COMPAIR_SQLITE_DIR` / `COMPAIR_SQLITE_NAME` – override the default local SQLite path (falls back to `./compair_data` if `/data` is not writable).
+- `COMPAIR_DATABASE_URL` – optional explicit SQLAlchemy URL (e.g. `postgresql+psycopg2://user:pass@host/db`). When omitted, Compair falls back to a local SQLite file.
+- `COMPAIR_DB_DIR` / `COMPAIR_DB_NAME` – directory and filename for the bundled SQLite database (default: `~/.compair-core/data/compair.db`). Legacy `COMPAIR_SQLITE_*` variables remain supported.
 - `COMPAIR_LOCAL_MODEL_URL` – endpoint for your local embeddings/feedback service (defaults to `http://local-model:9000`).
 - `COMPAIR_EMAIL_BACKEND` – the core mailer logs emails to stdout; cloud overrides this with transactional delivery.
 - `COMPAIR_REQUIRE_AUTHENTICATION` (`true`) – set to `false` to run the API in single-user mode without login or account management. When disabled, Compair auto-provisions a local user, group, and long-lived session token so you can upload documents immediately.
@@ -94,6 +95,10 @@ Key environment variables for the core edition:
 - `COMPAIR_INCLUDE_LEGACY_ROUTES` (`false`) – opt-in to the full legacy API surface (used by the hosted product) when running the core edition. Leave unset to expose only the streamlined single-user endpoints in Swagger.
 - `COMPAIR_EMBEDDING_DIM` – force the embedding vector size stored in the database (defaults to 384 for core, 1536 for cloud). Keep this in sync with whichever embedding model you configure.
 - `COMPAIR_VECTOR_BACKEND` (`auto`) – set to `pgvector` when running against PostgreSQL with the pgvector extension, or `json` to store embeddings as JSON (the default for SQLite deployments).
+- `COMPAIR_GENERATION_PROVIDER` (`local`) – choose how feedback is produced. Options: `local` (call the bundled FastAPI service), `openai` (use ChatGPT-compatible APIs with an API key), `http` (POST the request to a custom endpoint), or `fallback` (skip generation and surface similar references only).
+- `COMPAIR_OPENAI_API_KEY` / `COMPAIR_OPENAI_MODEL` – when using the OpenAI provider, supply your API key and optional model name (defaults to `gpt-4o-mini`). The fallback kicks in automatically if the key or SDK is unavailable.
+- `COMPAIR_GENERATION_ENDPOINT` – HTTP endpoint invoked when `COMPAIR_GENERATION_PROVIDER=http`; the service receives a JSON payload (`document`, `references`, `length_instruction`) and should return `{"feedback": ...}`.
+- `COMPAIR_OCR_ENDPOINT` – endpoint the backend calls for OCR uploads (defaults to the bundled Tesseract wrapper at `http://local-ocr:9001/ocr-file`). Provide your own service by overriding this URL.
 See `compair_core/server/settings.py` for the full settings surface.

{compair_core-0.3.14 → compair_core-0.4.0}/compair_core.egg-info/SOURCES.txt RENAMED

@@ -30,6 +30,7 @@ compair_core/server/deps.py
 compair_core/server/settings.py
 compair_core/server/local_model/__init__.py
 compair_core/server/local_model/app.py
+compair_core/server/local_model/ocr.py
 compair_core/server/providers/__init__.py
 compair_core/server/providers/console_mailer.py
 compair_core/server/providers/contracts.py

{compair_core-0.3.14 → compair_core-0.4.0}/pyproject.toml RENAMED

@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 [project]
 name = "compair-core"
-version = "0.3.14"
+version = "0.4.0"
 description = "Open-source foundation of the Compair collaboration platform."
 readme = "README.md"
 license = { text = "MIT" }

compair_core-0.3.14/compair_core/compair/feedback.py DELETED

@@ -1,79 +0,0 @@
-from __future__ import annotations
-import os
-import requests
-from typing import Any
-from .logger import log_event
-from .models import Document, User
-try:
-    from compair_cloud.feedback import Reviewer as CloudReviewer  # type: ignore
-    from compair_cloud.feedback import get_feedback as cloud_get_feedback  # type: ignore
-except (ImportError, ModuleNotFoundError):
-    CloudReviewer = None  # type: ignore
-    cloud_get_feedback = None  # type: ignore
-class Reviewer:
-    """Edition-aware wrapper that falls back to the local feedback endpoint."""
-    def __init__(self) -> None:
-        self.edition = os.getenv("COMPAIR_EDITION", "core").lower()
-        self._cloud_impl = None
-        if self.edition == "cloud" and CloudReviewer is not None:
-            self._cloud_impl = CloudReviewer()
-        else:
-            self.client = None
-            self.model = os.getenv("COMPAIR_LOCAL_GENERATION_MODEL", "local-feedback")
-            base_url = os.getenv("COMPAIR_LOCAL_MODEL_URL", "http://local-model:9000")
-            route = os.getenv("COMPAIR_LOCAL_GENERATION_ROUTE", "/generate")
-            self.endpoint = f"{base_url.rstrip('/')}{route}"
-    @property
-    def is_cloud(self) -> bool:
-        return self._cloud_impl is not None
-def _fallback_feedback(text: str, references: list[Any]) -> str:
-    if not references:
-        return "NONE"
-    top_ref = references[0]
-    snippet = getattr(top_ref, "content", "") or ""
-    snippet = snippet.replace("\n", " ").strip()[:200]
-    if not snippet:
-        return "NONE"
-    return f"Check alignment with this reference: {snippet}"
-def get_feedback(
-    reviewer: Reviewer,
-    doc: Document,
-    text: str,
-    references: list[Any],
-    user: User,
-) -> str:
-    if reviewer.is_cloud and cloud_get_feedback is not None:
-        return cloud_get_feedback(reviewer._cloud_impl, doc, text, references, user)  # type: ignore[arg-type]
-    payload = {
-        "document": text,
-        "references": [getattr(ref, "content", "") for ref in references],
-        "length_instruction": {
-            "Brief": "1–2 short sentences",
-            "Detailed": "A couple short paragraphs",
-            "Verbose": "As thorough as reasonably possible without repeating information",
-        }.get(user.preferred_feedback_length, "1–2 short sentences"),
-    }
-    try:
-        response = requests.post(reviewer.endpoint, json=payload, timeout=30)
-        response.raise_for_status()
-        data = response.json()
-        feedback = data.get("feedback")
-        if feedback:
-            return feedback
-    except Exception as exc:  # pragma: no cover - network failures stay graceful
-        log_event("local_feedback_failed", error=str(exc))
-    return _fallback_feedback(text, references)