gliner2 1.0.0__tar.gz

gliner2-1.0.0/PKG-INFO ADDED
@@ -0,0 +1,282 @@
1
+ Metadata-Version: 2.4
2
+ Name: gliner2
3
+ Version: 1.0.0
4
+ Maintainer: Urchade Zaratiana
5
+ Requires-Python: >=3.8
6
+ Description-Content-Type: text/markdown
7
+ Requires-Dist: gliner
8
+
10
+
11
+ ---
12
+
13
+ # **GLiNER2: Unified Schema-Based Information Extraction**
14
+
15
+ > *Next-gen extraction for text, structured data, and classification—powered by [Fastino AI](https://fastino.ai)*
16
+
17
+ [![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)
18
+ [![Powered by Fastino](https://img.shields.io/badge/Powered%20by-Fastino-blue)](https://fastino.ai)
19
+
20
+ GLiNER2 is the successor to [GLiNER](https://github.com/urchade/GLiNER), introducing a schema-driven framework to consolidate entity extraction, classification, and structured parsing—all within a unified API.
21
+
22
+ ---
23
+
24
+ ## ✨ What Makes GLiNER2 Unique?
25
+
26
+ | Capability | Traditional Tools | **GLiNER2** |
27
+ | ----------------------- | ----------------- | ----------- |
28
+ | Entity Extraction | ✅ | ✅ Enhanced |
29
+ | Text Classification | ❌ | ✅ New |
30
+ | Structured Data Parsing | ❌ | ✅ New |
31
+ | Unified Schema API | ❌ | ✅ New |
32
+ | Multi-task Processing | ❌ | ✅ New |
33
+
34
+ Instead of juggling multiple models, simply define **what** you want and extract it all in **one pass**.
35
+
36
+ ---
37
+
38
+ ## 🚀 Quick Start
39
+
40
+ ### Installation
41
+
42
+ ```bash
43
+ pip install gliner2
44
+ ```
45
+
46
+ ### Basic Usage
47
+
48
+ ```python
49
+ from gliner2 import GLiNER2
50
+
51
+ extractor = GLiNER2.from_pretrained("fastino/gliner-v2")
52
+
53
+ results = extractor.extract_entities(
54
+ "Dr. Sarah Johnson from Stanford published groundbreaking AI research.",
55
+ ["person", "organization", "field"]
56
+ )
57
+ print(results)
58
+ # {'entities': {'person': ['Dr. Sarah Johnson'], 'organization': ['Stanford'], 'field': ['AI research']}}
59
+ ```
60
+
61
+ ---
62
+
63
+ ## 🧠 Schema-Based Extraction
64
+
65
+ Define a custom schema for **entities**, **classification**, and **structured fields**:
66
+
67
+ ```python
68
+ schema = (extractor.create_schema()
69
+ .entities(["person", "company", "location"])
70
+ .classification("sentiment", ["positive", "negative", "neutral"])
71
+ .structure("product")
72
+ .field("name", dtype="str")
73
+ .field("price", dtype="str")
74
+ .field("features", dtype="list")
75
+ )
76
+
77
+ results = extractor.extract("Apple CEO Tim Cook announced iPhone 15 for $999...", schema)
78
+ ```
79
+
80
+ ---
81
+
82
+ ## 🎯 Entity Extraction
83
+
84
+ ### Flexible & Domain-Aware
85
+
86
+ ```python
87
+ text = "Patient took 400mg ibuprofen for severe headache yesterday."
88
+ results = extractor.extract_entities(text, ["medication", "dosage", "symptom", "timeframe"])
89
+ ```
90
+
91
+ #### With Descriptions
92
+
93
+ ```python
94
+ results = extractor.extract_entities(
95
+ "The API endpoint /users/{id} returns 404 when user not found.",
96
+ {
97
+ "endpoint": "API URLs and paths like /users/{id}",
98
+ "http_status": "HTTP status codes like 200, 404, 500",
99
+ "error_condition": "Error scenarios and failure cases"
100
+ }
101
+ )
102
+ ```
103
+
104
+ > 💡 **Tips**:
105
+ >
106
+ > * Use clear descriptions for ambiguous terms
107
+ > * Prefer specific labels like `"email_address"` over `"email"`
108
+
109
+ ---
110
+
111
+ ## 📊 Text Classification
112
+
113
+ ### Single & Multi-Label Support
114
+
115
+ ```python
116
+ results = extractor.classify_text(
117
+ "This product exceeded my expectations!",
118
+ {"sentiment": ["positive", "negative", "neutral"]}
119
+ )
120
+ ```
121
+
122
+ ### Multi-Label with Threshold
123
+
124
+ ```python
125
+ results = extractor.classify_text(
126
+ "The camera is excellent but battery life is disappointing.",
127
+ {
128
+ "aspects": {
129
+ "labels": ["camera", "battery", "display", "performance", "design"],
130
+ "multi_label": True,
131
+ "cls_threshold": 0.4
132
+ }
133
+ }
134
+ )
135
+ ```
136
+
137
+ ---
138
+
139
+ ## 🗃️ Structured Data Extraction
140
+
141
+ ### Turn Unstructured Text into JSON
142
+
143
+ ```python
144
+ text = """
145
+ John Smith (CEO) at TechCorp can be reached at john@techcorp.com or +1-555-0123.
146
+ The company, founded in 2010, specializes in AI software with 150 employees.
147
+ """
148
+
149
+ results = extractor.extract_json(
150
+ text,
151
+ {
152
+ "contact": [
153
+ "name::str::Full name of the person",
154
+ "title::str::Job title or position",
155
+ "email::str::Email address",
156
+ "phone::str::Phone number"
157
+ ],
158
+ "company": [
159
+ "name::str::Company name",
160
+ "founded::str::Year founded",
161
+ "industry::str::Business sector",
162
+ "size::str::Number of employees"
163
+ ]
164
+ }
165
+ )
166
+ ```
167
+
168
+ ---
169
+
170
+ ## 🧩 Multi-Task Extraction with Schemas
171
+
172
+ Analyze text with entities, classification, and structured fields—**all at once**.
173
+
174
+ ```python
175
+ schema = (extractor.create_schema()
176
+ .entities({
177
+ "person": "Names of people",
178
+ "organization": "Companies and institutions",
179
+ "location": "Geographic locations"
180
+ })
181
+ .classification("category", {
182
+ "business": "Corporate news",
183
+ "technology": "Tech developments",
184
+ "research": "Academic studies"
185
+ })
186
+ .structure("announcement")
187
+ .field("what", dtype="str")
188
+ .field("when", dtype="str")
189
+ .field("impact", dtype="list")
190
+ .field("stakeholders", dtype="list")
191
+ )
192
+ ```
193
+
194
+ ---
195
+
196
+ ## 🧪 Advanced Configuration
197
+
198
+ ### Precision Control per Field
199
+
200
+ ```python
201
+ schema = (extractor.create_schema()
202
+ .structure("financial_data")
203
+ .field("amount", dtype="str", threshold=0.95)
204
+ .field("date", dtype="str", threshold=0.8)
205
+ .field("description", dtype="str", threshold=0.6)
206
+ )
207
+ ```
208
+
209
+ ### Data Type & Choices
210
+
211
+ ```python
212
+ schema = (extractor.create_schema()
213
+ .structure("product")
214
+ .field("name", dtype="str")
215
+ .field("features", dtype="list")
216
+ .field("category", dtype="str", choices=["electronics", "software", "service"])
217
+ .field("tags", dtype="list", choices=["popular", "new", "discounted", "premium"])
218
+ )
219
+ ```
220
+
221
+ ---
222
+
223
+ ## 🏭 Real-World Applications
224
+
225
+ ### Healthcare
226
+
227
+ ```python
228
+ text = "Patient Mary Johnson, age 65, visited Dr. Roberts on March 15th..."
229
+ # Extracts patient, doctor, medications, urgency, and prescriptions
230
+ ```
231
+
232
+ ### Legal Contracts
233
+
234
+ ```python
235
+ text = "Employment Agreement between TechCorp Inc. and Jane Doe..."
236
+ # Extracts parties, dates, clauses, penalties, obligations
237
+ ```
238
+
239
+ ### Finance
240
+
241
+ ```python
242
+ text = "Transaction ID: TXN-2024-001. Transfer of $5,000..."
243
+ # Extracts transaction IDs, parties, amounts, purposes
244
+ ```
245
+
246
+ ---
247
+
248
+ ## 📚 API Summary
249
+
250
+ | Component | Description |
251
+ | -------------------- | ----------------------------- |
252
+ | `GLiNER2` | Main model class |
253
+ | `create_schema()` | Schema builder |
254
+ | `extract()` | Unified extraction method |
255
+ | `extract_entities()` | Fast entity-only extraction |
256
+ | `classify_text()` | Text classification by schema |
257
+ | `extract_json()` | Structured record parsing |
258
+
259
+ ---
260
+
261
+ ## 🤝 Contribute
262
+
263
+ We welcome contributions! See our [Contributing Guide](CONTRIBUTING.md):
264
+
265
+ 1. Fork and branch
266
+ 2. Add your feature + test it
267
+ 3. Submit a PR
268
+
269
+ ---
270
+
271
+ ## 🙌 Credits
272
+
273
+ * **GLiNER2** by [Fastino AI](https://fastino.ai)
274
+ * **Original GLiNER** by [Urchade Zaratiana](https://github.com/urchade/GLiNER)
275
+
276
+ ---
277
+
278
+ <div align="center"><strong>Built with ❤️ by the Fastino AI team</strong></div>
279
+
280
+ ---
281
+
@@ -0,0 +1,274 @@
2
+
3
+ ---
4
+
5
+ # **GLiNER2: Unified Schema-Based Information Extraction**
6
+
7
+ > *Next-gen extraction for text, structured data, and classification—powered by [Fastino AI](https://fastino.ai)*
8
+
9
+ [![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)
10
+ [![Powered by Fastino](https://img.shields.io/badge/Powered%20by-Fastino-blue)](https://fastino.ai)
11
+
12
+ GLiNER2 is the successor to [GLiNER](https://github.com/urchade/GLiNER), introducing a schema-driven framework to consolidate entity extraction, classification, and structured parsing—all within a unified API.
13
+
14
+ ---
15
+
16
+ ## ✨ What Makes GLiNER2 Unique?
17
+
18
+ | Capability | Traditional Tools | **GLiNER2** |
19
+ | ----------------------- | ----------------- | ----------- |
20
+ | Entity Extraction | ✅ | ✅ Enhanced |
21
+ | Text Classification | ❌ | ✅ New |
22
+ | Structured Data Parsing | ❌ | ✅ New |
23
+ | Unified Schema API | ❌ | ✅ New |
24
+ | Multi-task Processing | ❌ | ✅ New |
25
+
26
+ Instead of juggling multiple models, simply define **what** you want and extract it all in **one pass**.
27
+
28
+ ---
29
+
30
+ ## 🚀 Quick Start
31
+
32
+ ### Installation
33
+
34
+ ```bash
35
+ pip install gliner2
36
+ ```
37
+
38
+ ### Basic Usage
39
+
40
+ ```python
41
+ from gliner2 import GLiNER2
42
+
43
+ extractor = GLiNER2.from_pretrained("fastino/gliner-v2")
44
+
45
+ results = extractor.extract_entities(
46
+ "Dr. Sarah Johnson from Stanford published groundbreaking AI research.",
47
+ ["person", "organization", "field"]
48
+ )
49
+ print(results)
50
+ # {'entities': {'person': ['Dr. Sarah Johnson'], 'organization': ['Stanford'], 'field': ['AI research']}}
51
+ ```
52
+
53
+ ---
54
+
55
+ ## 🧠 Schema-Based Extraction
56
+
57
+ Define a custom schema for **entities**, **classification**, and **structured fields**:
58
+
59
+ ```python
60
+ schema = (extractor.create_schema()
61
+ .entities(["person", "company", "location"])
62
+ .classification("sentiment", ["positive", "negative", "neutral"])
63
+ .structure("product")
64
+ .field("name", dtype="str")
65
+ .field("price", dtype="str")
66
+ .field("features", dtype="list")
67
+ )
68
+
69
+ results = extractor.extract("Apple CEO Tim Cook announced iPhone 15 for $999...", schema)
70
+ ```
71
+
72
+ ---
73
+
74
+ ## 🎯 Entity Extraction
75
+
76
+ ### Flexible & Domain-Aware
77
+
78
+ ```python
79
+ text = "Patient took 400mg ibuprofen for severe headache yesterday."
80
+ results = extractor.extract_entities(text, ["medication", "dosage", "symptom", "timeframe"])
81
+ ```
82
+
83
+ #### With Descriptions
84
+
85
+ ```python
86
+ results = extractor.extract_entities(
87
+ "The API endpoint /users/{id} returns 404 when user not found.",
88
+ {
89
+ "endpoint": "API URLs and paths like /users/{id}",
90
+ "http_status": "HTTP status codes like 200, 404, 500",
91
+ "error_condition": "Error scenarios and failure cases"
92
+ }
93
+ )
94
+ ```
95
+
96
+ > 💡 **Tips**:
97
+ >
98
+ > * Use clear descriptions for ambiguous terms
99
+ > * Prefer specific labels like `"email_address"` over `"email"`
100
+
101
+ ---
102
+
103
+ ## 📊 Text Classification
104
+
105
+ ### Single & Multi-Label Support
106
+
107
+ ```python
108
+ results = extractor.classify_text(
109
+ "This product exceeded my expectations!",
110
+ {"sentiment": ["positive", "negative", "neutral"]}
111
+ )
112
+ ```
113
+
114
+ ### Multi-Label with Threshold
115
+
116
+ ```python
117
+ results = extractor.classify_text(
118
+ "The camera is excellent but battery life is disappointing.",
119
+ {
120
+ "aspects": {
121
+ "labels": ["camera", "battery", "display", "performance", "design"],
122
+ "multi_label": True,
123
+ "cls_threshold": 0.4
124
+ }
125
+ }
126
+ )
127
+ ```
128
+
129
+ ---
130
+
131
+ ## 🗃️ Structured Data Extraction
132
+
133
+ ### Turn Unstructured Text into JSON
134
+
135
+ ```python
136
+ text = """
137
+ John Smith (CEO) at TechCorp can be reached at john@techcorp.com or +1-555-0123.
138
+ The company, founded in 2010, specializes in AI software with 150 employees.
139
+ """
140
+
141
+ results = extractor.extract_json(
142
+ text,
143
+ {
144
+ "contact": [
145
+ "name::str::Full name of the person",
146
+ "title::str::Job title or position",
147
+ "email::str::Email address",
148
+ "phone::str::Phone number"
149
+ ],
150
+ "company": [
151
+ "name::str::Company name",
152
+ "founded::str::Year founded",
153
+ "industry::str::Business sector",
154
+ "size::str::Number of employees"
155
+ ]
156
+ }
157
+ )
158
+ ```
159
+
160
+ ---
161
+
162
+ ## 🧩 Multi-Task Extraction with Schemas
163
+
164
+ Analyze text with entities, classification, and structured fields—**all at once**.
165
+
166
+ ```python
167
+ schema = (extractor.create_schema()
168
+ .entities({
169
+ "person": "Names of people",
170
+ "organization": "Companies and institutions",
171
+ "location": "Geographic locations"
172
+ })
173
+ .classification("category", {
174
+ "business": "Corporate news",
175
+ "technology": "Tech developments",
176
+ "research": "Academic studies"
177
+ })
178
+ .structure("announcement")
179
+ .field("what", dtype="str")
180
+ .field("when", dtype="str")
181
+ .field("impact", dtype="list")
182
+ .field("stakeholders", dtype="list")
183
+ )
184
+ ```
185
+
186
+ ---
187
+
188
+ ## 🧪 Advanced Configuration
189
+
190
+ ### Precision Control per Field
191
+
192
+ ```python
193
+ schema = (extractor.create_schema()
194
+ .structure("financial_data")
195
+ .field("amount", dtype="str", threshold=0.95)
196
+ .field("date", dtype="str", threshold=0.8)
197
+ .field("description", dtype="str", threshold=0.6)
198
+ )
199
+ ```
200
+
201
+ ### Data Type & Choices
202
+
203
+ ```python
204
+ schema = (extractor.create_schema()
205
+ .structure("product")
206
+ .field("name", dtype="str")
207
+ .field("features", dtype="list")
208
+ .field("category", dtype="str", choices=["electronics", "software", "service"])
209
+ .field("tags", dtype="list", choices=["popular", "new", "discounted", "premium"])
210
+ )
211
+ ```
212
+
213
+ ---
214
+
215
+ ## 🏭 Real-World Applications
216
+
217
+ ### Healthcare
218
+
219
+ ```python
220
+ text = "Patient Mary Johnson, age 65, visited Dr. Roberts on March 15th..."
221
+ # Extracts patient, doctor, medications, urgency, and prescriptions
222
+ ```
223
+
224
+ ### Legal Contracts
225
+
226
+ ```python
227
+ text = "Employment Agreement between TechCorp Inc. and Jane Doe..."
228
+ # Extracts parties, dates, clauses, penalties, obligations
229
+ ```
230
+
231
+ ### Finance
232
+
233
+ ```python
234
+ text = "Transaction ID: TXN-2024-001. Transfer of $5,000..."
235
+ # Extracts transaction IDs, parties, amounts, purposes
236
+ ```
237
+
238
+ ---
239
+
240
+ ## 📚 API Summary
241
+
242
+ | Component | Description |
243
+ | -------------------- | ----------------------------- |
244
+ | `GLiNER2` | Main model class |
245
+ | `create_schema()` | Schema builder |
246
+ | `extract()` | Unified extraction method |
247
+ | `extract_entities()` | Fast entity-only extraction |
248
+ | `classify_text()` | Text classification by schema |
249
+ | `extract_json()` | Structured record parsing |
250
+
251
+ ---
252
+
253
+ ## 🤝 Contribute
254
+
255
+ We welcome contributions! See our [Contributing Guide](CONTRIBUTING.md):
256
+
257
+ 1. Fork and branch
258
+ 2. Add your feature + test it
259
+ 3. Submit a PR
260
+
261
+ ---
262
+
263
+ ## 🙌 Credits
264
+
265
+ * **GLiNER2** by [Fastino AI](https://fastino.ai)
266
+ * **Original GLiNER** by [Urchade Zaratiana](https://github.com/urchade/GLiNER)
267
+
268
+ ---
269
+
270
+ <div align="center"><strong>Built with ❤️ by the Fastino AI team</strong></div>
271
+
272
+ ---
273
+
@@ -0,0 +1,4 @@
1
+ __version__ = "1.0.0"
2
+
3
+ from .model import Extractor, ExtractorConfig
4
+ from .inference.engine import GLiNER2
File without changes
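
The `extract_json()` example in the README added by this diff passes each field as a single `name::dtype::description` string. The diff does not include the code that parses those strings, so the following is only a minimal sketch of how such a spec could be split into its parts. `FieldSpec` and `parse_field_spec` are hypothetical names, not part of the gliner2 package, and the defaults (a `str` dtype, an optional description) are assumptions.

```python
from typing import NamedTuple, Optional


class FieldSpec(NamedTuple):
    """Hypothetical container for one parsed field spec (not a gliner2 type)."""
    name: str
    dtype: str
    description: Optional[str]


def parse_field_spec(spec: str) -> FieldSpec:
    """Split a 'name::dtype::description' string into its three parts.

    Assumptions: a missing dtype falls back to 'str', and the description
    is optional. The real library may behave differently.
    """
    # Split at most twice so a description may itself contain '::'.
    parts = [part.strip() for part in spec.split("::", 2)]
    name = parts[0]
    if not name:
        raise ValueError(f"field spec has no name: {spec!r}")
    dtype = parts[1] if len(parts) > 1 and parts[1] else "str"
    description = parts[2] if len(parts) > 2 and parts[2] else None
    return FieldSpec(name, dtype, description)


# Two of the specs from the README's "contact" structure.
contact_fields = [
    "name::str::Full name of the person",
    "email::str::Email address",
]
parsed = [parse_field_spec(spec) for spec in contact_fields]
```

Splitting with a maximum of two separators keeps any later `::` inside the description text instead of discarding it.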