@framers/agentos-skills-registry 0.3.0 → 0.4.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/package.json
CHANGED
|
@@ -0,0 +1,38 @@
+---
+name: ml-content-classifier
+version: '1.0.0'
+description: Real-time content safety classification using ML models (toxicity, prompt injection, jailbreak detection)
+author: Frame.dev
+namespace: wunderland
+category: security
+tags: [guardrails, safety, toxicity, injection, jailbreak, classifier, ml, bert, onnx]
+requires_tools: [classify_content]
+metadata:
+  agentos:
+    emoji: "\U0001F6E1"
+---
+
+# ML Content Classifier
+
+A guardrail automatically classifies your inputs and outputs for safety
+violations (toxicity, prompt injection, jailbreak attempts). The
+classify_content tool is also available for on-demand classification.
+
+## When to Use classify_content
+
+- Before forwarding user-provided text to external APIs
+- To evaluate RAG retrieval results before including them in responses
+- For content moderation workflows
+- To check tool outputs before presenting them to users
+
+## What It Detects
+
+- **Toxicity**: toxic, severe_toxic, obscene, threat, insult, identity_hate
+- **Prompt injection**: attempts to override system instructions
+- **Jailbreak**: role-play attacks, constraint bypasses, system prompt extraction
+
+## Constraints
+
+- Models (~98MB total) load lazily on first classification
+- Classification takes ~20-60ms per chunk (CPU) or ~5-15ms (GPU)
+- The guardrail evaluates every ~200 tokens during streaming
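As a rough sketch of the streaming constraint above (the ~200-token evaluation cadence), assuming a simple whitespace token stream; `classify` and `stream_guardrail` are hypothetical stand-ins, not the skill's actual API:

```python
# Sketch of the guardrail's streaming cadence: buffer tokens and run a
# classification roughly every 200 tokens. `classify` is a placeholder
# stand-in, NOT the skill's real classify_content tool.

def classify(chunk: str) -> dict:
    # The real tool would return toxicity / prompt-injection / jailbreak
    # scores; this stub just reports how many tokens it was given.
    return {"tokens": len(chunk.split()), "flagged": False}

def stream_guardrail(tokens, every=200):
    """Yield one classification result per ~`every` streamed tokens."""
    buffer = []
    for tok in tokens:
        buffer.append(tok)
        if len(buffer) >= every:
            yield classify(" ".join(buffer))
            buffer = []
    if buffer:  # don't drop the final partial chunk
        yield classify(" ".join(buffer))

# 450 streamed tokens -> evaluated as chunks of 200, 200, and 50
results = list(stream_guardrail((f"t{i}" for i in range(450))))
```

The trailing partial-chunk flush matters: without it, the last sub-200-token stretch of a stream would never be classified.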
@@ -0,0 +1,56 @@
+---
+name: pii-redaction
+version: '1.0.0'
+description: Detect and redact personally identifiable information (PII) from text using a four-tier pipeline (regex + NLP + NER + LLM-as-judge)
+author: Frame.dev
+namespace: wunderland
+category: security
+tags: [pii, privacy, redaction, gdpr, hipaa, compliance, security, ner]
+requires_tools: [pii_scan, pii_redact]
+metadata:
+  agentos:
+    emoji: "\U0001F6E1"
+    primaryEnv: PII_LLM_API_KEY
+---
+
+# PII Redaction
+
+You have access to PII detection and redaction capabilities. A guardrail
+automatically redacts PII from your inputs and outputs, but you can also
+proactively scan and redact text before storing it, sending it to external
+APIs, or sharing it across agents.
+
+## When to Use
+
+- Before storing user-provided text in memory or a database
+- Before sending text to third-party APIs or external tools
+- Before sharing content across agents in multi-agent systems
+- When a user asks you to anonymize or de-identify text
+- When handling medical, financial, or legal documents
+
+## Available Tools
+
+### pii_scan
+Scan text and return detected PII entities with type, confidence, and location.
+Use this to audit text without modifying it.
+
+### pii_redact
+Redact PII from text and return the sanitized version. Supported styles:
+- placeholder: [PERSON], [EMAIL], [SSN]
+- mask: J*** S****, ***@***.com
+- hash: [PERSON:a1b2c3d4e5] (deterministic, correlatable)
+- category-tag: <PII type="PERSON">REDACTED</PII>
+
+## Best Practices
+
+- Scan before store: always run pii_scan before writing user data to memory
+- Use the placeholder style for human-readable output
+- Use the hash style when you need to correlate redacted entities across documents
+- If a user explicitly asks you to include their name or email, respect that:
+  the guardrail handles involuntary leakage, not intentional sharing
+
+## Constraints
+
+- NER model (~110MB) loads lazily on first detection of name-like tokens
+- LLM judge calls cost tokens; the judge is only invoked for ambiguous cases
+- Regex patterns cover government IDs from 50+ countries
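The four pii_redact output styles listed above can be made concrete with a toy, single-entity-type sketch. This only illustrates the output shapes: `redact_email` and its regex are hypothetical, not the skill's actual four-tier pipeline.

```python
import hashlib
import re

# Toy redactor demonstrating the four pii_redact output styles for one
# entity type (EMAIL). The real skill combines regex, NLP, NER, and an
# LLM judge; this sketch is regex-only.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.\w+")

def redact_email(text: str, style: str = "placeholder") -> str:
    def repl(m: re.Match) -> str:
        value = m.group(0)
        if style == "placeholder":
            return "[EMAIL]"
        if style == "mask":
            local, domain = value.split("@", 1)
            return f"{local[0]}***@***.{domain.rsplit('.', 1)[-1]}"
        if style == "hash":
            # Deterministic: the same value always maps to the same tag,
            # which is what makes hashed entities correlatable across
            # documents (see Best Practices above).
            digest = hashlib.sha256(value.encode()).hexdigest()[:10]
            return f"[EMAIL:{digest}]"
        # category-tag (and any unrecognized style)
        return '<PII type="EMAIL">REDACTED</PII>'

    return EMAIL.sub(repl, text)

text = "Contact jane.doe@example.com for details."
```

With `style="mask"` the sample becomes `Contact j***@***.com for details.`; with the default placeholder style, `Contact [EMAIL] for details.`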