PyPI - gliner2 - Versions diffs - 1.0.0__tar.gz - Mend

gliner2 1.0.0__tar.gz


Files changed (16)

gliner2-1.0.0/PKG-INFO +282 -0
gliner2-1.0.0/README.md +274 -0
gliner2-1.0.0/gliner2/__init__.py +4 -0
gliner2-1.0.0/gliner2/inference/__init__.py +0 -0
gliner2-1.0.0/gliner2/inference/engine.py +1802 -0
gliner2-1.0.0/gliner2/layers.py +272 -0
gliner2-1.0.0/gliner2/model.py +367 -0
gliner2-1.0.0/gliner2/processor.py +951 -0
gliner2-1.0.0/gliner2/trainer.py +158 -0
gliner2-1.0.0/gliner2.egg-info/PKG-INFO +282 -0
gliner2-1.0.0/gliner2.egg-info/SOURCES.txt +14 -0
gliner2-1.0.0/gliner2.egg-info/dependency_links.txt +1 -0
gliner2-1.0.0/gliner2.egg-info/requires.txt +1 -0
gliner2-1.0.0/gliner2.egg-info/top_level.txt +1 -0
gliner2-1.0.0/pyproject.toml +24 -0
gliner2-1.0.0/setup.cfg +4 -0

gliner2-1.0.0/PKG-INFO ADDED

@@ -0,0 +1,282 @@
+Metadata-Version: 2.4
+Name: gliner2
+Version: 1.0.0
+Maintainer: Urchade Zaratiana
+Requires-Python: >=3.8
+Description-Content-Type: text/markdown
+Requires-Dist: gliner
+---
+# **GLiNER2: Unified Schema-Based Information Extraction**
+> *Next-gen extraction for text, structured data, and classification—powered by [Fastino AI](https://fastino.ai)*
+[![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)
+[![Powered by Fastino](https://img.shields.io/badge/Powered%20by-Fastino-blue)](https://fastino.ai)
+GLiNER2 is the successor to [GLiNER](https://github.com/urchade/GLiNER), introducing a schema-driven framework to consolidate entity extraction, classification, and structured parsing—all within a unified API.
+---
+## ✨ What Makes GLiNER2 Unique?
+| Capability              | Traditional Tools | **GLiNER2** |
+| ----------------------- | ----------------- | ----------- |
+| Entity Extraction       | ✅                 | ✅ Enhanced  |
+| Text Classification     | ❌                 | ✅ New       |
+| Structured Data Parsing | ❌                 | ✅ New       |
+| Unified Schema API      | ❌                 | ✅ New       |
+| Multi-task Processing   | ❌                 | ✅ New       |
+Instead of juggling multiple models, simply define **what** you want and extract it all in **one pass**.
+---
+## 🚀 Quick Start
+### Installation
+```bash
+pip install gliner2
+```
+### Basic Usage
+```python
+from gliner2 import GLiNER2
+extractor = GLiNER2.from_pretrained("fastino/gliner-v2")
+results = extractor.extract_entities(
+    "Dr. Sarah Johnson from Stanford published groundbreaking AI research.",
+    ["person", "organization", "field"]
+)
+print(results)
+# {'entities': {'person': ['Dr. Sarah Johnson'], 'organization': ['Stanford'], 'field': ['AI research']}}
+```
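The output comment in the Basic Usage example shows a nested dictionary keyed by label. As a minimal sketch, assuming that shape holds in general, the result can be flattened into `(label, text)` pairs. `flatten_entities` is a hypothetical helper and not part of gliner2, and the sample dictionary is copied from the comment rather than produced by running the model.

```python
from typing import Dict, List, Tuple


def flatten_entities(results: Dict[str, Dict[str, List[str]]]) -> List[Tuple[str, str]]:
    """Turn {'entities': {label: [text, ...]}} into a flat list of (label, text) pairs."""
    pairs = []
    for label, mentions in results.get("entities", {}).items():
        for mention in mentions:
            pairs.append((label, mention))
    return pairs


# Sample copied from the README's output comment, not from a live model run.
sample = {
    "entities": {
        "person": ["Dr. Sarah Johnson"],
        "organization": ["Stanford"],
        "field": ["AI research"],
    }
}

pairs = flatten_entities(sample)
for label, mention in pairs:
    print(f"{label}: {mention}")
```

A flat list like this is often easier to load into a table or to compare against gold annotations than the nested form.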
+---
+## 🧠 Schema-Based Extraction
+Define a custom schema for **entities**, **classification**, and **structured fields**:
+```python
+schema = (extractor.create_schema()
+    .entities(["person", "company", "location"])
+    .classification("sentiment", ["positive", "negative", "neutral"])
+    .structure("product")
+        .field("name", dtype="str")
+        .field("price", dtype="str")
+        .field("features", dtype="list")
+)
+results = extractor.extract("Apple CEO Tim Cook announced iPhone 15 for $999...", schema)
+```
+---
+## 🎯 Entity Extraction
+### Flexible & Domain-Aware
+```python
+text = "Patient took 400mg ibuprofen for severe headache yesterday."
+results = extractor.extract_entities(text, ["medication", "dosage", "symptom", "timeframe"])
+```
+#### With Descriptions
+```python
+results = extractor.extract_entities(
+    "The API endpoint /users/{id} returns 404 when user not found.",
+    {
+        "endpoint": "API URLs and paths like /users/{id}",
+        "http_status": "HTTP status codes like 200, 404, 500",
+        "error_condition": "Error scenarios and failure cases"
+    }
+)
+```
+> 💡 **Tips**:
+>
+> * Use clear descriptions for ambiguous terms
+> * Prefer specific labels like `"email_address"` over `"email"`
+---
+## 📊 Text Classification
+### Single & Multi-Label Support
+```python
+results = extractor.classify_text(
+    "This product exceeded my expectations!",
+    {"sentiment": ["positive", "negative", "neutral"]}
+)
+```
+### Multi-Label with Threshold
+```python
+results = extractor.classify_text(
+    "The camera is excellent but battery life is disappointing.",
+    {
+        "aspects": {
+            "labels": ["camera", "battery", "display", "performance", "design"],
+            "multi_label": True,
+            "cls_threshold": 0.4
+        }
+    }
+)
+```
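The README does not spell out how `multi_label` and `cls_threshold` interact. One plausible reading, sketched here purely as an assumption, is that multi-label mode keeps every label whose score reaches the threshold, while single-label mode keeps only the top-scoring label. `select_labels` and the scores below are hypothetical illustrations; they are neither gliner2 code nor real model output.

```python
from typing import Dict, List


def select_labels(scores: Dict[str, float], multi_label: bool, threshold: float = 0.5) -> List[str]:
    """Pick labels from per-label scores (assumed semantics, not gliner2's implementation)."""
    if not scores:
        return []
    if multi_label:
        # Keep every label whose score reaches the threshold.
        return [label for label, score in scores.items() if score >= threshold]
    # Single-label: keep only the highest-scoring label.
    return [max(scores, key=scores.get)]


# Invented scores for the camera/battery sentence; a real model would supply these.
example_scores = {
    "camera": 0.91,
    "battery": 0.77,
    "display": 0.12,
    "performance": 0.35,
    "design": 0.08,
}

selected = select_labels(example_scores, multi_label=True, threshold=0.4)
top = select_labels(example_scores, multi_label=False)
print(selected, top)
```

Under this reading, lowering the threshold trades precision for recall: at 0.3 the invented `performance` score would also pass.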
+---
+## 🗃️ Structured Data Extraction
+### Turn Unstructured Text into JSON
+```python
+text = """
+John Smith (CEO) at TechCorp can be reached at john@techcorp.com or +1-555-0123.
+The company, founded in 2010, specializes in AI software with 150 employees.
+"""
+results = extractor.extract_json(
+    text,
+    {
+        "contact": [
+            "name::str::Full name of the person",
+            "title::str::Job title or position",
+            "email::str::Email address",
+            "phone::str::Phone number"
+        ],
+        "company": [
+            "name::str::Company name",
+            "founded::str::Year founded",
+            "industry::str::Business sector",
+            "size::str::Number of employees"
+        ]
+    }
+)
+```
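The field strings passed to `extract_json` follow a `name::dtype::description` pattern. A minimal sketch of splitting such a string into its parts is given here to make the convention explicit. `FieldSpec` and `parse_field_spec` are hypothetical names, the defaults for missing segments are assumptions, and this is not the parser gliner2 itself uses.

```python
from typing import NamedTuple


class FieldSpec(NamedTuple):
    name: str
    dtype: str
    description: str


def parse_field_spec(spec: str) -> FieldSpec:
    """Split 'name::dtype::description' into its three parts."""
    # maxsplit=2 keeps any later '::' inside the description intact.
    parts = spec.split("::", 2)
    name = parts[0].strip()
    # Assumed defaults: 'str' when no dtype is given, empty description when omitted.
    dtype = parts[1].strip() if len(parts) > 1 else "str"
    description = parts[2].strip() if len(parts) > 2 else ""
    return FieldSpec(name, dtype, description)


email_field = parse_field_spec("email::str::Email address")
print(email_field.name, email_field.dtype, email_field.description)
```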
+---
+## 🧩 Multi-Task Extraction with Schemas
+Analyze text with entities, classification, and structured fields—**all at once**.
+```python
+schema = (extractor.create_schema()
+    .entities({
+        "person": "Names of people",
+        "organization": "Companies and institutions",
+        "location": "Geographic locations"
+    })
+    .classification("category", {
+        "business": "Corporate news",
+        "technology": "Tech developments",
+        "research": "Academic studies"
+    })
+    .structure("announcement")
+        .field("what", dtype="str")
+        .field("when", dtype="str")
+        .field("impact", dtype="list")
+        .field("stakeholders", dtype="list")
+)
+```
+---
+## 🧪 Advanced Configuration
+### Precision Control per Field
+```python
+schema = (extractor.create_schema()
+    .structure("financial_data")
+        .field("amount", dtype="str", threshold=0.95)
+        .field("date", dtype="str", threshold=0.8)
+        .field("description", dtype="str", threshold=0.6)
+)
+```
+### Data Type & Choices
+```python
+schema = (extractor.create_schema()
+    .structure("product")
+        .field("name", dtype="str")
+        .field("features", dtype="list")
+        .field("category", dtype="str", choices=["electronics", "software", "service"])
+        .field("tags", dtype="list", choices=["popular", "new", "discounted", "premium"])
+)
+```
+---
+## 🏭 Real-World Applications
+### Healthcare
+```python
+text = "Patient Mary Johnson, age 65, visited Dr. Roberts on March 15th..."
+# Extracts patient, doctor, medications, urgency, and prescriptions
+```
+### Legal Contracts
+```python
+text = "Employment Agreement between TechCorp Inc. and Jane Doe..."
+# Extracts parties, dates, clauses, penalties, obligations
+```
+### Finance
+```python
+text = "Transaction ID: TXN-2024-001. Transfer of $5,000..."
+# Extracts transaction IDs, parties, amounts, purposes
+```
+---
+## 📚 API Summary
+| Component            | Description                   |
+| -------------------- | ----------------------------- |
+| `GLiNER2`            | Main model class              |
+| `create_schema()`    | Schema builder                |
+| `extract()`          | Unified extraction method     |
+| `extract_entities()` | Fast entity-only extraction   |
+| `classify_text()`    | Text classification by schema |
+| `extract_json()`     | Structured record parsing     |
+---
+## 🤝 Contribute
+We welcome contributions! See our [Contributing Guide](CONTRIBUTING.md):
+1. Fork and branch
+2. Add your feature and test it
+3. Submit a PR
+---
+## 🙌 Credits
+* **GLiNER2** by [Fastino AI](https://fastino.ai)
+* **Original GLiNER** by [Urchade Zaratiana](https://github.com/urchade/GLiNER)
+---
+<div align="center"><strong>Built with ❤️ by the Fastino AI team</strong></div>
+---

gliner2-1.0.0/README.md ADDED

@@ -0,0 +1,274 @@
+---
+# **GLiNER2: Unified Schema-Based Information Extraction**
+> *Next-gen extraction for text, structured data, and classification—powered by [Fastino AI](https://fastino.ai)*
+[![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)
+[![Powered by Fastino](https://img.shields.io/badge/Powered%20by-Fastino-blue)](https://fastino.ai)
+GLiNER2 is the successor to [GLiNER](https://github.com/urchade/GLiNER), introducing a schema-driven framework to consolidate entity extraction, classification, and structured parsing—all within a unified API.
+---
+## ✨ What Makes GLiNER2 Unique?
+| Capability              | Traditional Tools | **GLiNER2** |
+| ----------------------- | ----------------- | ----------- |
+| Entity Extraction       | ✅                 | ✅ Enhanced  |
+| Text Classification     | ❌                 | ✅ New       |
+| Structured Data Parsing | ❌                 | ✅ New       |
+| Unified Schema API      | ❌                 | ✅ New       |
+| Multi-task Processing   | ❌                 | ✅ New       |
+Instead of juggling multiple models, simply define **what** you want and extract it all in **one pass**.
+---
+## 🚀 Quick Start
+### Installation
+```bash
+pip install gliner2
+```
+### Basic Usage
+```python
+from gliner2 import GLiNER2
+extractor = GLiNER2.from_pretrained("fastino/gliner-v2")
+results = extractor.extract_entities(
+    "Dr. Sarah Johnson from Stanford published groundbreaking AI research.",
+    ["person", "organization", "field"]
+)
+print(results)
+# {'entities': {'person': ['Dr. Sarah Johnson'], 'organization': ['Stanford'], 'field': ['AI research']}}
+```
+---
+## 🧠 Schema-Based Extraction
+Define a custom schema for **entities**, **classification**, and **structured fields**:
+```python
+schema = (extractor.create_schema()
+    .entities(["person", "company", "location"])
+    .classification("sentiment", ["positive", "negative", "neutral"])
+    .structure("product")
+        .field("name", dtype="str")
+        .field("price", dtype="str")
+        .field("features", dtype="list")
+)
+results = extractor.extract("Apple CEO Tim Cook announced iPhone 15 for $999...", schema)
+```
+---
+## 🎯 Entity Extraction
+### Flexible & Domain-Aware
+```python
+text = "Patient took 400mg ibuprofen for severe headache yesterday."
+results = extractor.extract_entities(text, ["medication", "dosage", "symptom", "timeframe"])
+```
+#### With Descriptions
+```python
+results = extractor.extract_entities(
+    "The API endpoint /users/{id} returns 404 when user not found.",
+    {
+        "endpoint": "API URLs and paths like /users/{id}",
+        "http_status": "HTTP status codes like 200, 404, 500",
+        "error_condition": "Error scenarios and failure cases"
+    }
+)
+```
+> 💡 **Tips**:
+>
+> * Use clear descriptions for ambiguous terms
+> * Prefer specific labels like `"email_address"` over `"email"`
+---
+## 📊 Text Classification
+### Single & Multi-Label Support
+```python
+results = extractor.classify_text(
+    "This product exceeded my expectations!",
+    {"sentiment": ["positive", "negative", "neutral"]}
+)
+```
+### Multi-Label with Threshold
+```python
+results = extractor.classify_text(
+    "The camera is excellent but battery life is disappointing.",
+    {
+        "aspects": {
+            "labels": ["camera", "battery", "display", "performance", "design"],
+            "multi_label": True,
+            "cls_threshold": 0.4
+        }
+    }
+)
+```
+---
+## 🗃️ Structured Data Extraction
+### Turn Unstructured Text into JSON
+```python
+text = """
+John Smith (CEO) at TechCorp can be reached at john@techcorp.com or +1-555-0123.
+The company, founded in 2010, specializes in AI software with 150 employees.
+"""
+results = extractor.extract_json(
+    text,
+    {
+        "contact": [
+            "name::str::Full name of the person",
+            "title::str::Job title or position",
+            "email::str::Email address",
+            "phone::str::Phone number"
+        ],
+        "company": [
+            "name::str::Company name",
+            "founded::str::Year founded",
+            "industry::str::Business sector",
+            "size::str::Number of employees"
+        ]
+    }
+)
+```
+---
+## 🧩 Multi-Task Extraction with Schemas
+Analyze text with entities, classification, and structured fields—**all at once**.
+```python
+schema = (extractor.create_schema()
+    .entities({
+        "person": "Names of people",
+        "organization": "Companies and institutions",
+        "location": "Geographic locations"
+    })
+    .classification("category", {
+        "business": "Corporate news",
+        "technology": "Tech developments",
+        "research": "Academic studies"
+    })
+    .structure("announcement")
+        .field("what", dtype="str")
+        .field("when", dtype="str")
+        .field("impact", dtype="list")
+        .field("stakeholders", dtype="list")
+)
+```
+---
+## 🧪 Advanced Configuration
+### Precision Control per Field
+```python
+schema = (extractor.create_schema()
+    .structure("financial_data")
+        .field("amount", dtype="str", threshold=0.95)
+        .field("date", dtype="str", threshold=0.8)
+        .field("description", dtype="str", threshold=0.6)
+)
+```
+### Data Type & Choices
+```python
+schema = (extractor.create_schema()
+    .structure("product")
+        .field("name", dtype="str")
+        .field("features", dtype="list")
+        .field("category", dtype="str", choices=["electronics", "software", "service"])
+        .field("tags", dtype="list", choices=["popular", "new", "discounted", "premium"])
+)
+```
+---
+## 🏭 Real-World Applications
+### Healthcare
+```python
+text = "Patient Mary Johnson, age 65, visited Dr. Roberts on March 15th..."
+# Extracts patient, doctor, medications, urgency, and prescriptions
+```
+### Legal Contracts
+```python
+text = "Employment Agreement between TechCorp Inc. and Jane Doe..."
+# Extracts parties, dates, clauses, penalties, obligations
+```
+### Finance
+```python
+text = "Transaction ID: TXN-2024-001. Transfer of $5,000..."
+# Extracts transaction IDs, parties, amounts, purposes
+```
+---
+## 📚 API Summary
+| Component            | Description                   |
+| -------------------- | ----------------------------- |
+| `GLiNER2`            | Main model class              |
+| `create_schema()`    | Schema builder                |
+| `extract()`          | Unified extraction method     |
+| `extract_entities()` | Fast entity-only extraction   |
+| `classify_text()`    | Text classification by schema |
+| `extract_json()`     | Structured record parsing     |
+---
+## 🤝 Contribute
+We welcome contributions! See our [Contributing Guide](CONTRIBUTING.md):
+1. Fork and branch
+2. Add your feature and test it
+3. Submit a PR
+---
+## 🙌 Credits
+* **GLiNER2** by [Fastino AI](https://fastino.ai)
+* **Original GLiNER** by [Urchade Zaratiana](https://github.com/urchade/GLiNER)
+---
+<div align="center"><strong>Built with ❤️ by the Fastino AI team</strong></div>
+---

gliner2-1.0.0/gliner2/__init__.py ADDED

@@ -0,0 +1,4 @@
+__version__ = "1.0.0"
+from .model import Extractor, ExtractorConfig
+from .inference.engine import GLiNER2

gliner2-1.0.0/gliner2/inference/__init__.py ADDED

File without changes