@ragieai/skills 0.1.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/.claude-plugin/plugin.json +8 -0
- package/.mcp.json +11 -0
- package/LICENSE +21 -0
- package/README.md +88 -0
- package/dist/index.cjs +55 -0
- package/dist/index.d.cts +16 -0
- package/dist/index.d.ts +16 -0
- package/dist/index.js +26 -0
- package/package.json +43 -0
- package/skills/ragie/SKILL.md +50 -0
- package/skills/ragie/references/api-reference.md +203 -0
- package/skills/ragie/references/ingestion.md +127 -0
- package/skills/ragie/references/mcp.md +84 -0
- package/skills/ragie/references/metadata-filtering.md +149 -0
- package/skills/ragie/references/partitions.md +85 -0
- package/skills/ragie/references/python.md +232 -0
- package/skills/ragie/references/quickstart.md +69 -0
- package/skills/ragie/references/rag-patterns.md +160 -0
- package/skills/ragie/references/retrieval.md +77 -0
package/.claude-plugin/plugin.json
ADDED
@@ -0,0 +1,8 @@
+{
+  "name": "ragieai",
+  "description": "Ragie AI — managed RAG platform for document ingestion, search, and retrieval. Provides skills for integrating Ragie into applications and using the Ragie MCP server.",
+  "author": {
+    "name": "Ragie",
+    "url": "https://ragie.ai"
+  }
+}
package/.mcp.json
ADDED
package/LICENSE
ADDED
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2025 Ragie Corp
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
package/README.md
ADDED
@@ -0,0 +1,88 @@
+# @ragieai/skills
+
+Ragie skills for AI coding agents. Install once and your agent understands how to ingest documents, run retrievals, build RAG pipelines, configure the MCP server, and handle multi-tenancy with Ragie.
+
+Works with Claude Code, Cursor, Cline, Copilot, Windsurf, and [40+ other agents](https://skills.sh).
+
+## Installation
+
+### Via skills CLI (recommended)
+
+Installs into your current project for whichever agent you use:
+
+```bash
+npx skills add ragieai/skills
+```
+
+### Via npm
+
+For programmatic access to skill and reference content:
+
+```bash
+npm install @ragieai/skills
+```
+
+### Local development
+
+Symlink the skill directory directly into Claude Code's skills folder so edits are reflected immediately:
+
+```bash
+ln -s /path/to/ragieai/skills/skills/ragie ~/.claude/skills/ragie
+```
+
+## What's Included
+
+### Skill
+
+One skill — `ragie` — that activates when you ask about integrating Ragie, building RAG pipelines, ingesting documents, configuring the MCP server, or working with retrievals.
+
+### References
+
+| File | Content |
+|------|---------|
+| `quickstart.md` | Install SDK, ingest a document, run first retrieval |
+| `ingestion.md` | Files, URLs, raw text, readiness polling, webhooks, bulk ingest |
+| `retrieval.md` | Search options, `topK`, reranking, hybrid search, metadata filters |
+| `mcp.md` | MCP server URL pattern, configuration, the `retrieve` tool |
+| `partitions.md` | Multi-tenancy, partition isolation, partition management |
+| `metadata-filtering.md` | Tagging documents, filtering at retrieval time |
+| `rag-patterns.md` | RAG responses, streaming, citations, tool use, production checklist |
+| `api-reference.md` | Full REST endpoint reference, error codes |
+| `python.md` | Python SDK equivalents for all patterns |
+
+References are loaded on demand — only what's relevant to the current task is pulled into context.
+
+## MCP Server
+
+The Ragie MCP server exposes a `retrieve` tool scoped to a partition, letting your agent search your knowledge base directly. Configure it with two environment variables:
+
+```bash
+export RAGIE_API_KEY=ragie_...
+export RAGIE_PARTITION=your-partition
+```
+
+The plugin's `.mcp.json` handles the rest. See `mcp.md` for multi-partition setup.
+
+## Programmatic API
+
+```typescript
+import {
+  getSkill,
+  getReference,
+  getSkillPath,
+  getReferencePath,
+  getSkillsDir,
+} from "@ragieai/skills";
+
+// Get skill or reference content as a string
+const skill = getSkill("ragie");
+const quickstart = getReference("quickstart");
+
+// Get absolute file paths
+const skillPath = getSkillPath("ragie");
+const refPath = getReferencePath("rag-patterns");
+```
+
+## License
+
+MIT
package/dist/index.cjs
ADDED
@@ -0,0 +1,55 @@
+"use strict";
+var __defProp = Object.defineProperty;
+var __getOwnPropDesc = Object.getOwnPropertyDescriptor;
+var __getOwnPropNames = Object.getOwnPropertyNames;
+var __hasOwnProp = Object.prototype.hasOwnProperty;
+var __export = (target, all) => {
+  for (var name in all)
+    __defProp(target, name, { get: all[name], enumerable: true });
+};
+var __copyProps = (to, from, except, desc) => {
+  if (from && typeof from === "object" || typeof from === "function") {
+    for (let key of __getOwnPropNames(from))
+      if (!__hasOwnProp.call(to, key) && key !== except)
+        __defProp(to, key, { get: () => from[key], enumerable: !(desc = __getOwnPropDesc(from, key)) || desc.enumerable });
+  }
+  return to;
+};
+var __toCommonJS = (mod) => __copyProps(__defProp({}, "__esModule", { value: true }), mod);
+
+// src/index.ts
+var index_exports = {};
+__export(index_exports, {
+  getReference: () => getReference,
+  getReferencePath: () => getReferencePath,
+  getSkill: () => getSkill,
+  getSkillPath: () => getSkillPath,
+  getSkillsDir: () => getSkillsDir
+});
+module.exports = __toCommonJS(index_exports);
+var import_fs = require("fs");
+var import_path = require("path");
+var root = (0, import_path.join)(__dirname, "..");
+function getReferencePath(name) {
+  return (0, import_path.join)(root, "skills", "ragie", "references", `${name}.md`);
+}
+function getSkillPath(name) {
+  return (0, import_path.join)(root, "skills", name, "SKILL.md");
+}
+function getSkillsDir() {
+  return (0, import_path.join)(root, "skills");
+}
+function getReference(name) {
+  return (0, import_fs.readFileSync)(getReferencePath(name), "utf-8");
+}
+function getSkill(name) {
+  return (0, import_fs.readFileSync)(getSkillPath(name), "utf-8");
+}
+// Annotate the CommonJS export names for ESM import in node:
+0 && (module.exports = {
+  getReference,
+  getReferencePath,
+  getSkill,
+  getSkillPath,
+  getSkillsDir
+});
package/dist/index.d.cts
ADDED
@@ -0,0 +1,16 @@
+declare const REFERENCES: readonly ["quickstart", "ingestion", "retrieval", "mcp", "partitions", "metadata-filtering", "rag-patterns", "api-reference", "python"];
+declare const SKILLS: readonly ["ragie"];
+type ReferenceName = (typeof REFERENCES)[number];
+type SkillName = (typeof SKILLS)[number];
+/** Returns the absolute path to a reference file. */
+declare function getReferencePath(name: ReferenceName): string;
+/** Returns the absolute path to a skill's SKILL.md. */
+declare function getSkillPath(name: SkillName): string;
+/** Returns the absolute path to the skills directory. */
+declare function getSkillsDir(): string;
+/** Returns the content of a reference file as a string. */
+declare function getReference(name: ReferenceName): string;
+/** Returns the content of a skill's SKILL.md as a string. */
+declare function getSkill(name: SkillName): string;
+
+export { type ReferenceName, type SkillName, getReference, getReferencePath, getSkill, getSkillPath, getSkillsDir };
package/dist/index.d.ts
ADDED
@@ -0,0 +1,16 @@
+declare const REFERENCES: readonly ["quickstart", "ingestion", "retrieval", "mcp", "partitions", "metadata-filtering", "rag-patterns", "api-reference", "python"];
+declare const SKILLS: readonly ["ragie"];
+type ReferenceName = (typeof REFERENCES)[number];
+type SkillName = (typeof SKILLS)[number];
+/** Returns the absolute path to a reference file. */
+declare function getReferencePath(name: ReferenceName): string;
+/** Returns the absolute path to a skill's SKILL.md. */
+declare function getSkillPath(name: SkillName): string;
+/** Returns the absolute path to the skills directory. */
+declare function getSkillsDir(): string;
+/** Returns the content of a reference file as a string. */
+declare function getReference(name: ReferenceName): string;
+/** Returns the content of a skill's SKILL.md as a string. */
+declare function getSkill(name: SkillName): string;
+
+export { type ReferenceName, type SkillName, getReference, getReferencePath, getSkill, getSkillPath, getSkillsDir };
package/dist/index.js
ADDED
@@ -0,0 +1,26 @@
+// src/index.ts
+import { readFileSync } from "fs";
+import { join } from "path";
+var root = join(__dirname, "..");
+function getReferencePath(name) {
+  return join(root, "skills", "ragie", "references", `${name}.md`);
+}
+function getSkillPath(name) {
+  return join(root, "skills", name, "SKILL.md");
+}
+function getSkillsDir() {
+  return join(root, "skills");
+}
+function getReference(name) {
+  return readFileSync(getReferencePath(name), "utf-8");
+}
+function getSkill(name) {
+  return readFileSync(getSkillPath(name), "utf-8");
+}
+export {
+  getReference,
+  getReferencePath,
+  getSkill,
+  getSkillPath,
+  getSkillsDir
+};
package/package.json
ADDED
@@ -0,0 +1,43 @@
+{
+  "name": "@ragieai/skills",
+  "version": "0.1.0",
+  "description": "Ragie skills for AI coding agents — document ingestion, retrieval, RAG patterns, and MCP integration",
+  "keywords": [
+    "ragie",
+    "rag",
+    "retrieval",
+    "ai",
+    "skills",
+    "claude",
+    "cursor"
+  ],
+  "license": "MIT",
+  "type": "module",
+  "main": "./dist/index.cjs",
+  "module": "./dist/index.js",
+  "types": "./dist/index.d.ts",
+  "exports": {
+    ".": {
+      "types": "./dist/index.d.ts",
+      "import": "./dist/index.js",
+      "require": "./dist/index.cjs"
+    }
+  },
+  "files": [
+    "dist",
+    "skills",
+    ".claude-plugin",
+    ".mcp.json"
+  ],
+  "scripts": {
+    "build": "tsup src/index.ts --format esm,cjs --dts",
+    "dev": "tsup src/index.ts --format esm,cjs --dts --watch",
+    "test": "vitest run"
+  },
+  "devDependencies": {
+    "@types/node": "^22.0.0",
+    "tsup": "^8.0.0",
+    "typescript": "^5.0.0",
+    "vitest": "^4.1.4"
+  }
+}
package/skills/ragie/SKILL.md
ADDED
@@ -0,0 +1,50 @@
+---
+name: ragie
+description: >
+  This skill should be used when the user wants to "add Ragie to my project", "integrate Ragie",
+  "use Ragie for RAG", "ingest documents with Ragie", "search documents with Ragie",
+  "set up document retrieval", "build a RAG pipeline", "use the Ragie MCP", "query my knowledge base",
+  "connect Ragie to Claude", or mentions Ragie in the context of document search, retrieval-augmented
+  generation, or knowledge base management. Provides end-to-end guidance for the Ragie managed RAG
+  platform: SDK setup, document ingestion, retrieval, MCP usage, and RAG application patterns.
+version: "1.0.0"
+---
+
+# Ragie
+
+Ragie is a fully managed RAG (Retrieval-Augmented Generation) platform. It handles document ingestion, chunking, embedding, and retrieval — available via REST API, Python/TypeScript SDKs, and an MCP server.
+
+## References
+
+Load the relevant reference for the task at hand:
+
+| Reference | When to load |
+|-----------|--------------|
+| `references/quickstart.md` | Getting started, first integration, install instructions |
+| `references/ingestion.md` | Uploading files/URLs/text, readiness polling, webhooks, bulk ingest |
+| `references/retrieval.md` | Search options, `top_k`, reranking, hybrid search, filters |
+| `references/mcp.md` | MCP server setup, `retrieve` tool, URL pattern, multi-partition config |
+| `references/partitions.md` | Multi-tenancy, partition isolation, partition management |
+| `references/metadata-filtering.md` | Tagging documents, filtering at retrieval time |
+| `references/rag-patterns.md` | Building RAG responses, streaming, citations, tool use, production checklist |
+| `references/api-reference.md` | Full REST endpoint reference, error codes, SDK auth |
+| `references/python.md` | Python SDK equivalents for all patterns |
+
+## Core Concepts
+
+| Concept | Description |
+|---------|-------------|
+| **Document** | Any ingested file, URL, or raw text. Processed asynchronously. |
+| **Chunk** | A segment produced by Ragie's splitting pipeline. The unit of retrieval. |
+| **Retrieval** | Hybrid semantic + keyword search across chunks. Returns `scored_chunks`. |
+| **Partition** | Logical namespace for isolation (multi-tenancy, environments). |
+| **Metadata** | Key-value pairs on documents; used for filtering at retrieval time. |
+
+## Quick Decision Guide
+
+- **New to Ragie?** → `references/quickstart.md`
+- **Uploading documents?** → `references/ingestion.md`
+- **Searching / querying?** → `references/retrieval.md`
+- **Using Claude Code's MCP tool?** → `references/mcp.md`
+- **Building a RAG app end-to-end?** → `references/rag-patterns.md`
+- **Multiple tenants or environments?** → `references/partitions.md`
package/skills/ragie/references/api-reference.md
ADDED
@@ -0,0 +1,203 @@
+# Ragie REST API Reference
+
+Base URL: `https://api.ragie.ai`
+
+Authentication: `Authorization: Bearer <RAGIE_API_KEY>` on every request.
+
+---
+
+## Documents
+
+### Create document from URL
+
+```
+POST /documents
+Content-Type: application/json
+
+{
+  "url": "https://example.com/page",
+  "name": "My Doc", // optional display name
+  "partition": "tenant-id", // optional partition
+  "metadata": {} // optional key-value metadata
+}
+```
+
+### Create document from raw bytes
+
+```
+POST /documents/raw
+Content-Type: multipart/form-data
+
+content       <file bytes>
+content_type  application/pdf | text/plain | text/markdown | ...
+name          <string>
+partition     <string> (optional)
+metadata      <json> (optional)
+```
+
+### Get document
+
+```
+GET /documents/{document_id}
+```
+
+Response fields: `id`, `name`, `status` (`pending` | `partitioning` | `partitioned` | `refined` | `chunked` | `indexed` | `summary_indexed` | `keyword_indexed` | `ready` | `failed`), `metadata`, `partition`, `created_at`, `updated_at`.
+
+### List documents
+
+```
+GET /documents?page_size=<n>&cursor=<c>&filter=<json>
+Partition: <partition> ← partition is a header, not a query param
+```
+
+Returns `{ "results": [...], "pagination": { "next_cursor": "..." } }`.
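
A minimal sketch of assembling this request, assuming only what the endpoint description above states: `page_size`, `cursor`, and `filter` travel as query parameters, while the partition goes in a `Partition` header. The `buildListDocumentsRequest` helper and its option names are hypothetical and not part of any Ragie SDK; the function only shapes the request and performs no network I/O.

```typescript
// Hypothetical helper (not a Ragie SDK function): shapes a GET /documents
// request from the documented query parameters and headers.
interface ListDocumentsOptions {
  apiKey: string;
  pageSize?: number;
  cursor?: string;
  filter?: Record<string, unknown>;
  partition?: string;
}

interface PreparedRequest {
  url: string;
  headers: Record<string, string>;
}

function buildListDocumentsRequest(opts: ListDocumentsOptions): PreparedRequest {
  const query: string[] = [];
  if (opts.pageSize !== undefined) query.push(`page_size=${opts.pageSize}`);
  if (opts.cursor !== undefined) query.push(`cursor=${encodeURIComponent(opts.cursor)}`);
  if (opts.filter !== undefined) {
    // The filter is sent as URL-encoded JSON.
    query.push(`filter=${encodeURIComponent(JSON.stringify(opts.filter))}`);
  }

  const headers: Record<string, string> = {
    Authorization: `Bearer ${opts.apiKey}`,
  };
  // The partition is a header, not a query parameter.
  if (opts.partition !== undefined) headers["Partition"] = opts.partition;

  const qs = query.length > 0 ? `?${query.join("&")}` : "";
  return { url: `https://api.ragie.ai/documents${qs}`, headers };
}
```

To walk through several pages, the `pagination.next_cursor` value from one response would be passed back in as `cursor` on the next call, presumably until the response no longer carries one.
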
+
+### Update document metadata
+
+```
+PATCH /documents/{document_id}/metadata
+Content-Type: application/json
+
+{
+  "metadata": { "key": "value" }
+}
+```
+
+Performs a partial update. Keys set to `null` are deleted.
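
The partial-update rule can be modeled locally as follows. This is an illustrative sketch of the semantics described above, not Ragie's server code; `applyMetadataPatch` is a hypothetical name and the metadata value types are an assumption.

```typescript
// Assumed value types, for illustration only.
type MetadataValue = string | number | boolean | string[];
type Metadata = Record<string, MetadataValue>;
type MetadataPatch = Record<string, MetadataValue | null>;

// Returns a new metadata object: patched keys overwrite existing ones,
// keys patched to null are removed, and untouched keys are kept.
function applyMetadataPatch(current: Metadata, patch: MetadataPatch): Metadata {
  const next: Metadata = { ...current };
  for (const [key, value] of Object.entries(patch)) {
    if (value === null) {
      delete next[key];
    } else {
      next[key] = value;
    }
  }
  return next;
}
```

For example, patching `{ type: "report", year: "2024" }` with `{ year: null, reviewed: "true" }` keeps `type`, drops `year`, and adds `reviewed`.
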
+
+### Delete document
+
+```
+DELETE /documents/{document_id}
+```
+
+---
+
+## Retrievals
+
+### Retrieve chunks
+
+```
+POST /retrievals
+Content-Type: application/json
+
+{
+  "query": "What are the rate limits?",
+  "top_k": 8, // default 8
+  "rerank": true, // cross-encoder rerank (recommended)
+  "partition": "tenant-id", // scope to partition (optional)
+  "filter": { "product": "api" }, // metadata filter (optional)
+  "max_chunks_per_document": 2, // limit chunks per source doc (optional)
+  "recency_bias": false // favor recently ingested docs (optional)
+}
+```
+
+Response:
+
+```json
+{
+  "scored_chunks": [
+    {
+      "id": "chunk_...",
+      "index": 0,
+      "text": "...",
+      "score": 0.92,
+      "document_id": "doc_...",
+      "document_name": "API Docs",
+      "document_metadata": {},
+      "metadata": {}
+    }
+  ]
+}
+```
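
As a sketch of consuming this response, the snippet below turns `scored_chunks` into a numbered context block for a prompt. The field names come from the response above; `buildContext`, the `minScore` cutoff, and the output layout are illustrative assumptions rather than anything the API prescribes.

```typescript
// Field names mirror the REST response shown above.
interface ScoredChunk {
  id: string;
  index: number;
  text: string;
  score: number;
  document_id: string;
  document_name: string;
  document_metadata: Record<string, unknown>;
  metadata: Record<string, unknown>;
}

interface RetrievalResponse {
  scored_chunks: ScoredChunk[];
}

// Keeps chunks at or above minScore and labels each with its source document
// so an answer can cite it as [1], [2], ...
function buildContext(response: RetrievalResponse, minScore = 0): string {
  return response.scored_chunks
    .filter((chunk) => chunk.score >= minScore)
    .map((chunk, i) => `[${i + 1}] ${chunk.document_name}\n${chunk.text}`)
    .join("\n\n");
}
```

Numbering happens after filtering, so citation labels stay contiguous even when low-scoring chunks are dropped.
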
+
+---
+
+## Partitions
+
+### List partitions
+
+```
+GET /partitions
+```
+
+### Create partition
+
+```
+POST /partitions
+Content-Type: application/json
+
+{ "name": "tenant-42", "description": "optional" }
+```
+
+### Get partition (usage metrics)
+
+```
+GET /partitions/{partition_id}
+```
+
+Returns document count, pages processed/hosted monthly/total.
+
+### Delete partition (and all its documents)
+
+```
+DELETE /partitions/{partition_id}
+```
+
+---
+
+## Webhooks
+
+Register a webhook endpoint to receive document status change events:
+
+```
+POST /webhook_endpoints
+Content-Type: application/json
+
+{ "url": "https://your-server.com/ragie-webhook" }
+```
+
+Ragie sends `document_status_updated` events when documents reach `indexed`, `keyword_indexed`, `ready`, or `failed` states.
+
+---
+
+## Error Codes
+
+| HTTP | Meaning |
+|------|---------|
+| 401 | Invalid or missing API key |
+| 402 | Usage limit exceeded |
+| 404 | Document / partition not found |
+| 422 | Validation error — response body has `detail` array |
+| 429 | Rate limited — retry with exponential back-off |
+| 5xx | Server error — retry |
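
One way to act on the retry guidance in this table is sketched below. The status classification follows the table; the base delay, the cap, the attempt count, and the helper names `isRetryable`, `backoffDelayMs`, `statusOf`, and `withRetry` are all assumptions for illustration. The sketch also assumes that failed calls throw an object carrying a numeric `status` field, which may not match how a given client reports errors.

```typescript
// 429 and 5xx are retryable per the table; other 4xx codes are not.
function isRetryable(status: number): boolean {
  return status === 429 || (status >= 500 && status <= 599);
}

// Exponential back-off: baseMs, 2*baseMs, 4*baseMs, ... capped at capMs.
function backoffDelayMs(attempt: number, baseMs = 500, capMs = 30_000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// Assumption: errors expose the HTTP status as a numeric `status` property.
function statusOf(err: unknown): number | undefined {
  if (typeof err === "object" && err !== null && "status" in err) {
    const status = (err as { status: unknown }).status;
    return typeof status === "number" ? status : undefined;
  }
  return undefined;
}

// Runs `call`, retrying retryable failures up to maxAttempts calls in total.
async function withRetry<T>(
  call: () => Promise<T>,
  maxAttempts = 5,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await call();
    } catch (err) {
      const status = statusOf(err);
      const retryable = status !== undefined && isRetryable(status);
      if (!retryable || attempt + 1 >= maxAttempts) throw err;
      await sleep(backoffDelayMs(attempt));
    }
  }
}
```

The `sleep` parameter is injectable so the back-off can be exercised without real delays. Production code would usually add random jitter to the delay as well.
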
+
+---
+
+## SDK Install & Auth
+
+### TypeScript / Node
+
+```bash
+npm install ragie
+```
+
+```typescript
+import { Ragie } from "ragie";
+const client = new Ragie({ auth: process.env.RAGIE_API_KEY });
+```
+
+### Python
+
+```bash
+pip install ragie
+```
+
+```python
+import os
+from ragie import Ragie
+client = Ragie(auth=os.environ["RAGIE_API_KEY"])
+```
+
+All SDK methods mirror the REST endpoints and return typed response objects. The SDKs handle pagination, retries, and multipart uploads automatically.
+
+Note: The TypeScript SDK uses camelCase (`createDocumentFromUrl`, `topK`, `scoredChunks`, `patchMetadata`). The REST API and Python SDK use snake_case (`create_document_from_url`, `top_k`, `scored_chunks`, `patch_metadata`).
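
The naming difference can be illustrated with a small key converter. This is only a sketch of the convention for code that calls the REST API directly; `snakeToCamel` and `camelizeKeys` are hypothetical helpers, not functions exported by the Ragie SDK.

```typescript
// Converts one snake_case identifier to camelCase, e.g. top_k -> topK.
function snakeToCamel(name: string): string {
  return name.replace(/_([a-z0-9])/g, (_match, ch: string) => ch.toUpperCase());
}

// Recursively renames object keys; arrays are mapped element by element and
// primitives pass through unchanged.
function camelizeKeys(value: unknown): unknown {
  if (Array.isArray(value)) return value.map((item) => camelizeKeys(item));
  if (typeof value === "object" && value !== null) {
    const out: Record<string, unknown> = {};
    for (const [key, inner] of Object.entries(value)) {
      out[snakeToCamel(key)] = camelizeKeys(inner);
    }
    return out;
  }
  return value;
}
```

A blanket conversion like this would also rename user-defined metadata keys, so a real client would likely need to leave `metadata` objects untouched.
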
package/skills/ragie/references/ingestion.md
ADDED
@@ -0,0 +1,127 @@
+# Ragie Document Ingestion
+
+> Python user? See `references/python.md` for Python equivalents.
+
+## Ingestion Methods
+
+| Method | SDK call | Use when |
+|--------|----------|----------|
+| File upload | `documents.create()` | Uploading files — supports all file types (PDF, DOCX, PPTX, images, …) |
+| In-memory data | `documents.createRaw()` | Creating documents from in-memory text or JSON (scraped content, generated text, structured data) |
+| URL | `documents.createDocumentFromUrl()` | Web pages, public S3/GCS links |
+
+**Prefer `documents.create()`** when uploading files, as it supports all file types including binary formats. **Prefer `createRaw()`** when your data is already in memory as a string or object — it is simpler and avoids unnecessary file/Blob wrapping, but only handles text and JSON.
+
+## From a File
+
+Use `documents.create()` with a `Blob`. This is the only method that supports all file types including binary formats (PDF, DOCX, images, etc.).
+
+```typescript
+import { openAsBlob } from "fs";
+
+const doc = await client.documents.create({
+  file: await openAsBlob("doc.pdf"),
+  name: "doc.pdf",
+  partition: "tenant-42", // optional
+  metadata: { type: "report", year: "2024" }, // optional
+});
+```
+
+## From In-Memory Data (Raw Text or JSON)
+
+**This is the preferred method when your data is already in memory** (e.g., scraped content, generated text, API responses). It accepts strings and plain objects — not binary data.
+
+```typescript
+const doc = await client.documents.createRaw({
+  data: "Your text content here...", // string or plain object (not binary)
+  name: "my-note",
+  partition: "tenant-42", // optional
+});
+```
+
+## From a URL
+
+```typescript
+const doc = await client.documents.createDocumentFromUrl({
+  url: "https://example.com/report.pdf",
+  name: "Q4 Report", // optional display name
+  partition: "tenant-42", // optional partition
+  metadata: { type: "report", year: "2024" }, // optional
+});
+```
+
+## Document Lifecycle
+
+Documents are processed asynchronously through several stages:
+
+`pending` → `partitioning` → `partitioned` → `refined` → `chunked` → `indexed` → `summary_indexed` → `keyword_indexed` → `ready` (or `failed`)
+
+For polling, check `status === "ready"` or `status === "failed"`. Intermediate stages are informational.
+
+### Polling for Readiness
+
+```typescript
+async function waitForReady(
+  client: Ragie,
+  docId: string,
+  timeoutMs = 120_000
+): Promise<void> {
+  const start = Date.now();
+  while (Date.now() - start < timeoutMs) {
+    const doc = await client.documents.get({ documentId: docId });
+    if (doc.status === "ready") return;
+    if (doc.status === "failed") throw new Error(`Document ${docId} failed`);
+    await new Promise((r) => setTimeout(r, 3000));
+  }
+  throw new Error(`Document ${docId} not ready after ${timeoutMs}ms`);
+}
+
+const doc = await client.documents.createDocumentFromUrl({ url });
+await waitForReady(client, doc.id);
+// now safe to retrieve
+```
+
+### Webhooks
+
+Ragie can POST to your server when a document's status changes. Register a webhook endpoint via the Ragie dashboard or `POST /webhook_endpoints`. Ragie sends `document_status_updated` events when documents reach `indexed`, `keyword_indexed`, `ready`, or `failed` states.
+
+Use polling during local development; register a webhook endpoint for production.
+
+## Bulk Ingestion
+
+```typescript
+const docs = await Promise.all(
+  urls.map((url) =>
+    client.documents.createDocumentFromUrl({ url, partition: "my-partition" })
+  )
+);
+```
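
`Promise.all` as written starts every request at once, which for a long URL list may run into the 429 rate limit listed in the API reference. A minimal sketch of a batched variant follows. `chunk`, `ingestInBatches`, and the default batch size are illustrative choices, and `ingestOne` stands in for a call such as `client.documents.createDocumentFromUrl`.

```typescript
// Splits items into consecutive groups of at most `size` elements.
function chunk<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const groups: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    groups.push(items.slice(i, i + size));
  }
  return groups;
}

// Ingests one batch at a time, so at most `batchSize` requests are in flight.
// Results come back in the same order as the input URLs.
async function ingestInBatches<T>(
  urls: readonly string[],
  ingestOne: (url: string) => Promise<T>,
  batchSize = 5,
): Promise<T[]> {
  const results: T[] = [];
  for (const batch of chunk(urls, batchSize)) {
    const batchResults = await Promise.all(batch.map((url) => ingestOne(url)));
    results.push(...batchResults);
  }
  return results;
}
```

Like `Promise.all`, this rejects on the first failed request; collecting per-URL failures instead would call for `Promise.allSettled` inside the loop.
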
+
+## Document Management
+
+```typescript
+// Get a document
+const doc = await client.documents.get({ documentId: docId });
+
+// List documents (returns a PageIterator — async iterable)
+for await (const page of client.documents.list({ partition: "tenant-42", pageSize: 50 })) {
+  for (const doc of page.result.documents) {
+    console.log(doc.id, doc.name, doc.status);
+  }
+}
+
+// Update metadata (partial update — keys set to null are deleted)
+await client.documents.patchMetadata({
+  documentId: docId,
+  patchDocumentMetadataParams: { metadata: { reviewed: "true", version: "v4" } },
+});
+
+// Delete a document
+await client.documents.delete({ documentId: docId });
+```
+
+## Gotchas
+
+- Always check `status === "ready"` before querying — newly ingested documents are not immediately searchable.
+- **Prefer `createRaw()` for in-memory data** — it's simpler when you already have a string or object. **Prefer `documents.create()` for file uploads** — it supports all file types. `createRaw()` only handles text and JSON (`data: string | object`); binary files (PDF, DOCX, etc.) must use `documents.create({ file: blob })`.
+- Supported file types include PDF, DOCX, PPTX, TXT, MD, HTML, and more. Check the dashboard for the full list.