ruby-spacy 0.2.2 → 0.2.3
- checksums.yaml +4 -4
- data/CHANGELOG.md +4 -0
- data/README.md +153 -41
- data/examples/openai_integration/openai_completion.rb +1 -1
- data/examples/openai_integration/openai_query_1.rb +1 -1
- data/examples/openai_integration/openai_query_2.rb +1 -1
- data/examples/openai_integration/openai_query_3.rb +72 -57
- data/examples/openai_integration/openai_query_4.rb +3 -3
- data/lib/ruby-spacy/version.rb +1 -1
- data/lib/ruby-spacy.rb +32 -13
- metadata +3 -3
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
---
|
2
2
|
SHA256:
|
3
|
-
metadata.gz:
|
4
|
-
data.tar.gz:
|
3
|
+
metadata.gz: 9c9ca5b4cba8eb115192aa0b5a45216d12a9d9e4cdddc253ba55ace52e778afd
|
4
|
+
data.tar.gz: 197c61acfa742048fefff05b35d6045e17dd5cf212667c277537fb984a0ff926
|
5
5
|
SHA512:
|
6
|
-
metadata.gz:
|
7
|
-
data.tar.gz:
|
6
|
+
metadata.gz: 950daeb4f8ee140a15bacf18ea3228f2604a552df8aa12be52fb7a488c78e67b894b8678fbe6fbed74da54beb714e89d02ab1bd46d5c59a908b8ddfbc5c9e7c0
|
7
|
+
data.tar.gz: 84b183babd37f9120c0ac2332eec23dff30d3180da165aaf044bf72ef4be7af4efc2b339ad5ac5b489e3e3b9b44ba33d3df4fca287addbbed05cfa4201b79d75
|
data/CHANGELOG.md
CHANGED
data/README.md
CHANGED
@@ -13,11 +13,12 @@
|
|
13
13
|
| ✅ | Access to pre-trained word vectors |
|
14
14
|
| ✅ | OpenAI Chat/Completion/Embeddings API integration |
|
15
15
|
|
16
|
-
Current Version: `0.2.
|
16
|
+
Current Version: `0.2.3`
|
17
17
|
|
18
|
-
-
|
18
|
+
- spaCy 3.7.0 supported
|
19
|
+
- OpenAI API integration
|
19
20
|
|
20
|
-
## Installation of
|
21
|
+
## Installation of Prerequisites
|
21
22
|
|
22
23
|
**IMPORTANT**: Make sure that the `enable-shared` option is enabled in your Python installation. You can use [pyenv](https://github.com/pyenv/pyenv) to install any version of Python you like. Install Python 3.10.6, for instance, using pyenv with `enable-shared` as follows:
|
23
24
|
|
@@ -109,7 +110,7 @@ Output:
|
|
109
110
|
|:-----:|:--:|:-------:|:--:|:------:|:----:|:-------:|:---:|:-:|:--:|:-------:|
|
110
111
|
| Apple | is | looking | at | buying | U.K. | startup | for | $ | 1 | billion |
|
111
112
|
|
112
|
-
### Part-of-speech and
|
113
|
+
### Part-of-speech and Dependency
|
113
114
|
|
114
115
|
→ [spaCy: Part-of-speech tags and dependencies](https://spacy.io/usage/spacy-101#annotations-pos-deps)
|
115
116
|
|
@@ -149,7 +150,7 @@ Output:
|
|
149
150
|
| 1 | 1 | NUM | CD | compound |
|
150
151
|
| billion | billion | NUM | CD | pobj |
|
151
152
|
|
152
|
-
### Part-of-speech and
|
153
|
+
### Part-of-speech and Dependency (Japanese)
|
153
154
|
|
154
155
|
Ruby code:
|
155
156
|
|
@@ -234,7 +235,7 @@ Output:
|
|
234
235
|
| 1 | d | false | false | NumType = Card |
|
235
236
|
| billion | xxxx | true | false | NumType = Card |
|
236
237
|
|
237
|
-
### Visualizing
|
238
|
+
### Visualizing Dependency
|
238
239
|
|
239
240
|
→ [spaCy: Visualizers](https://spacy.io/usage/visualizers)
|
240
241
|
|
@@ -259,7 +260,7 @@ Output:
|
|
259
260
|
|
260
261
|
![](https://github.com/yohasebe/ruby-spacy/blob/main/examples/get_started/outputs/test_dep.svg)
|
261
262
|
|
262
|
-
### Visualizing
|
263
|
+
### Visualizing Dependency (Compact)
|
263
264
|
|
264
265
|
Ruby code:
|
265
266
|
|
@@ -282,7 +283,7 @@ Output:
|
|
282
283
|
|
283
284
|
![](https://github.com/yohasebe/ruby-spacy/blob/main/examples/get_started/outputs/test_dep_compact.svg)
|
284
285
|
|
285
|
-
### Named
|
286
|
+
### Named Entity Recognition
|
286
287
|
|
287
288
|
→ [spaCy: Named entities](https://spacy.io/usage/spacy-101#annotations-ner)
|
288
289
|
|
@@ -314,7 +315,7 @@ Output:
|
|
314
315
|
| U.K. | 27 | 31 | GPE |
|
315
316
|
| $1 billion | 44 | 54 | MONEY |
|
316
317
|
|
317
|
-
### Named
|
318
|
+
### Named Entity Recognition (Japanese)
|
318
319
|
|
319
320
|
Ruby code:
|
320
321
|
|
@@ -347,7 +348,7 @@ Output:
|
|
347
348
|
| ファミコン | 10 | 15 | PRODUCT |
|
348
349
|
| 14,800円 | 16 | 23 | MONEY |
|
349
350
|
|
350
|
-
### Checking
|
351
|
+
### Checking Availability of Word Vectors
|
351
352
|
|
352
353
|
→ [spaCy: Word vectors and similarity](https://spacy.io/usage/spacy-101#vectors-similarity)
|
353
354
|
|
@@ -380,7 +381,7 @@ Output:
|
|
380
381
|
| banana | true | 6.700014 | false |
|
381
382
|
| afskfsd | false | 0.0 | true |
|
382
383
|
|
383
|
-
### Similarity
|
384
|
+
### Similarity Calculation
|
384
385
|
|
385
386
|
Ruby code:
|
386
387
|
|
@@ -405,7 +406,7 @@ Doc 2: Fast food tastes very good.
|
|
405
406
|
Similarity: 0.7687607012190486
|
406
407
|
```
|
407
408
|
|
408
|
-
### Similarity
|
409
|
+
### Similarity Calculation (Japanese)
|
409
410
|
|
410
411
|
Ruby code:
|
411
412
|
|
@@ -428,7 +429,7 @@ doc2: あいにくの悪天候で残念です。
|
|
428
429
|
Similarity: 0.8684192637149641
|
429
430
|
```
|
430
431
|
|
431
|
-
### Word
|
432
|
+
### Word Vector Calculation
|
432
433
|
|
433
434
|
**Tokyo - Japan + France = Paris ?**
|
434
435
|
|
@@ -475,7 +476,7 @@ Output:
|
|
475
476
|
| 10 | marseille | 0.6370999813079834 |
|
476
477
|
|
477
478
|
|
478
|
-
### Word
|
479
|
+
### Word Vector Calculation (Japanese)
|
479
480
|
|
480
481
|
**東京 - 日本 + フランス = パリ ?**
|
481
482
|
|
@@ -524,7 +525,9 @@ Output:
|
|
524
525
|
|
525
526
|
## OpenAI API Integration
|
526
527
|
|
527
|
-
|
528
|
+
> ⚠️ This feature is currently experimental. Details are subject to change. Please refer to OpenAI's [API reference](https://platform.openai.com/docs/api-reference) and [Ruby OpenAI](https://github.com/alexrudall/ruby-openai) for available parameters (`max_tokens`, `temperature`, etc).
|
529
|
+
|
530
|
+
Easily leverage GPT models within ruby-spacy by using an OpenAI API key. When constructing prompts for the `Doc::openai_query` method, you can incorporate the following token properties of the document. These properties are retrieved through function calls (made internally by GPT when necessary) and seamlessly integrated into your prompt. Note that function calls need `gpt-4o-mini` or greater. The available properties include:
|
528
531
|
|
529
532
|
- `surface`
|
530
533
|
- `lemma`
|
@@ -534,7 +537,7 @@ Easily leverage GPT models within ruby-spacy by using an OpenAI API key. When co
|
|
534
537
|
- `ent_type` (entity type)
|
535
538
|
- `morphology`
|
536
539
|
|
537
|
-
### GPT Prompting
|
540
|
+
### GPT Prompting (Translation)
|
538
541
|
|
539
542
|
Ruby code:
|
540
543
|
|
@@ -549,7 +552,7 @@ doc = nlp.read("The Beatles released 12 studio albums")
|
|
549
552
|
# default parameter values
|
550
553
|
# max_tokens: 1000
|
551
554
|
# temperature: 0.7
|
552
|
-
# model: "gpt-
|
555
|
+
# model: "gpt-4o-mini"
|
553
556
|
res1 = doc.openai_query(
|
554
557
|
access_token: api_key,
|
555
558
|
prompt: "Translate the text to Japanese."
|
@@ -561,7 +564,7 @@ Output:
|
|
561
564
|
|
562
565
|
> ビートルズは12枚のスタジオアルバムをリリースしました。
|
563
566
|
|
564
|
-
### GPT Prompting
|
567
|
+
### GPT Prompting (Elaboration)
|
565
568
|
|
566
569
|
Ruby code:
|
567
570
|
|
@@ -572,6 +575,10 @@ api_key = ENV["OPENAI_API_KEY"]
|
|
572
575
|
nlp = Spacy::Language.new("en_core_web_sm")
|
573
576
|
doc = nlp.read("The Beatles were an English rock band formed in Liverpool in 1960.")
|
574
577
|
|
578
|
+
# default parameter values
|
579
|
+
# max_tokens: 1000
|
580
|
+
# temperature: 0.7
|
581
|
+
# model: "gpt-4o-mini"
|
575
582
|
res = doc.openai_query(
|
576
583
|
access_token: api_key,
|
577
584
|
prompt: "Extract the topic of the document and list 10 entities (names, concepts, locations, etc.) that are relevant to the topic."
|
@@ -580,21 +587,112 @@ res = doc.openai_query(
|
|
580
587
|
|
581
588
|
Output:
|
582
589
|
|
583
|
-
> Topic
|
590
|
+
> **Topic:** The Beatles
|
591
|
+
>
|
592
|
+
> **Relevant Entities:**
|
584
593
|
>
|
585
|
-
>
|
586
|
-
>
|
587
|
-
>
|
588
|
-
>
|
589
|
-
>
|
590
|
-
>
|
591
|
-
>
|
592
|
-
>
|
593
|
-
>
|
594
|
-
>
|
595
|
-
|
596
|
-
|
597
|
-
|
594
|
+
> 1. The Beatles (PERSON)
|
595
|
+
> 2. Liverpool (GPE - Geopolitical Entity)
|
596
|
+
> 3. English (LANGUAGE)
|
597
|
+
> 4. Rock (MUSIC GENRE)
|
598
|
+
> 5. 1960 (DATE)
|
599
|
+
> 6. Band (MUSIC GROUP)
|
600
|
+
> 7. John Lennon (PERSON - key member)
|
601
|
+
> 8. Paul McCartney (PERSON - key member)
|
602
|
+
> 9. George Harrison (PERSON - key member)
|
603
|
+
> 10. Ringo Starr (PERSON - key member)
|
604
|
+
|
605
|
+
### GPT Prompting (JSON Output Using RAG with Token Properties)
|
606
|
+
|
607
|
+
Ruby code:
|
608
|
+
|
609
|
+
```ruby
|
610
|
+
require "ruby-spacy"
|
611
|
+
|
612
|
+
api_key = ENV["OPENAI_API_KEY"]
|
613
|
+
nlp = Spacy::Language.new("en_core_web_sm")
|
614
|
+
doc = nlp.read("The Beatles released 12 studio albums")
|
615
|
+
|
616
|
+
# default parameter values
|
617
|
+
# max_tokens: 1000
|
618
|
+
# temperature: 0.7
|
619
|
+
# model: "gpt-4o-mini"
|
620
|
+
res = doc.openai_query(
|
621
|
+
access_token: api_key,
|
622
|
+
prompt: "List token data of each of the words used in the sentence. Add 'meaning' property and value (brief semantic definition) to each token data. Output as a JSON object."
|
623
|
+
)
|
624
|
+
```
|
625
|
+
|
626
|
+
Output:
|
627
|
+
|
628
|
+
```json
|
629
|
+
{
|
630
|
+
"tokens": [
|
631
|
+
{
|
632
|
+
"surface": "The",
|
633
|
+
"lemma": "the",
|
634
|
+
"pos": "DET",
|
635
|
+
"tag": "DT",
|
636
|
+
"dep": "det",
|
637
|
+
"ent_type": "",
|
638
|
+
"morphology": "{'Definite': 'Def', 'PronType': 'Art'}",
|
639
|
+
"meaning": "A definite article used to specify a noun."
|
640
|
+
},
|
641
|
+
{
|
642
|
+
"surface": "Beatles",
|
643
|
+
"lemma": "beatle",
|
644
|
+
"pos": "NOUN",
|
645
|
+
"tag": "NNS",
|
646
|
+
"dep": "nsubj",
|
647
|
+
"ent_type": "GPE",
|
648
|
+
"morphology": "{'Number': 'Plur'}",
|
649
|
+
"meaning": "A British rock band formed in Liverpool in 1960."
|
650
|
+
},
|
651
|
+
{
|
652
|
+
"surface": "released",
|
653
|
+
"lemma": "release",
|
654
|
+
"pos": "VERB",
|
655
|
+
"tag": "VBD",
|
656
|
+
"dep": "ROOT",
|
657
|
+
"ent_type": "",
|
658
|
+
"morphology": "{'Tense': 'Past', 'VerbForm': 'Fin'}",
|
659
|
+
"meaning": "To make something available to the public."
|
660
|
+
},
|
661
|
+
{
|
662
|
+
"surface": "12",
|
663
|
+
"lemma": "12",
|
664
|
+
"pos": "NUM",
|
665
|
+
"tag": "CD",
|
666
|
+
"dep": "nummod",
|
667
|
+
"ent_type": "CARDINAL",
|
668
|
+
"morphology": "{'NumType': 'Card'}",
|
669
|
+
"meaning": "A cardinal number representing the quantity of twelve."
|
670
|
+
},
|
671
|
+
{
|
672
|
+
"surface": "studio",
|
673
|
+
"lemma": "studio",
|
674
|
+
"pos": "NOUN",
|
675
|
+
"tag": "NN",
|
676
|
+
"dep": "compound",
|
677
|
+
"ent_type": "",
|
678
|
+
"morphology": "{'Number': 'Sing'}",
|
679
|
+
"meaning": "A place where recording or filming takes place."
|
680
|
+
},
|
681
|
+
{
|
682
|
+
"surface": "albums",
|
683
|
+
"lemma": "album",
|
684
|
+
"pos": "NOUN",
|
685
|
+
"tag": "NNS",
|
686
|
+
"dep": "dobj",
|
687
|
+
"ent_type": "",
|
688
|
+
"morphology": "{'Number': 'Plur'}",
|
689
|
+
"meaning": "Collections of music tracks or recordings."
|
690
|
+
}
|
691
|
+
]
|
692
|
+
}
|
693
|
+
```
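
The JSON shown in the sample output above can be consumed with Ruby's standard `json` library. The following is a minimal sketch, not part of ruby-spacy: it assumes the string returned by `doc.openai_query` is plain JSON with a top-level `"tokens"` array, as in the sample (a model may format its answer differently), and `lemmas_for` is a hypothetical helper name introduced only for illustration.

```ruby
require "json"

# Trimmed copy of the sample output shown above. In a real script this string
# would be the value returned by doc.openai_query (assumed here to be plain JSON).
response = <<~JSON
  {
    "tokens": [
      { "surface": "Beatles", "lemma": "beatle", "pos": "NOUN", "dep": "nsubj" },
      { "surface": "released", "lemma": "release", "pos": "VERB", "dep": "ROOT" },
      { "surface": "albums", "lemma": "album", "pos": "NOUN", "dep": "dobj" }
    ]
  }
JSON

# Hypothetical helper: returns the lemmas of all tokens with the given POS tag.
# Falls back to an empty array when the text is not valid JSON.
def lemmas_for(json_text, pos)
  tokens = JSON.parse(json_text).fetch("tokens")
  tokens.select { |t| t["pos"] == pos }.map { |t| t["lemma"] }
rescue JSON::ParserError => e
  warn "Response was not valid JSON: #{e.message}"
  []
end

puts lemmas_for(response, "NOUN").inspect
puts lemmas_for(response, "VERB").inspect
```

Rescuing `JSON::ParserError` is a deliberate choice here, since the model's reply is free text and is not guaranteed to parse.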
|
694
|
+
|
695
|
+
### GPT Prompting (Generate a Syntax Tree using Token Properties)
|
598
696
|
|
599
697
|
Ruby code:
|
600
698
|
|
@@ -603,11 +701,15 @@ require "ruby-spacy"
|
|
603
701
|
|
604
702
|
api_key = ENV["OPENAI_API_KEY"]
|
605
703
|
nlp = Spacy::Language.new("en_core_web_sm")
|
704
|
+
doc = nlp.read("The Beatles released 12 studio albums")
|
606
705
|
|
706
|
+
# default parameter values
|
707
|
+
# max_tokens: 1000
|
708
|
+
# temperature: 0.7
|
607
709
|
res = doc.openai_query(
|
608
710
|
access_token: api_key,
|
609
711
|
model: "gpt-4",
|
610
|
-
prompt: "Generate a tree diagram from the text
|
712
|
+
prompt: "Generate a tree diagram from the text using given token data. Use the following bracketing style: [S [NP [Det the] [N cat]] [VP [V sat] [PP [P on] [NP the mat]]]"
|
611
713
|
)
|
612
714
|
puts res
|
613
715
|
```
|
@@ -647,14 +749,14 @@ doc = nlp.read("Vladimir Nabokov was a")
|
|
647
749
|
# default parameter values
|
648
750
|
# max_tokens: 1000
|
649
751
|
# temperature: 0.7
|
650
|
-
# model: "gpt-
|
752
|
+
# model: "gpt-4o-mini"
|
651
753
|
res = doc.openai_completion(access_token: api_key)
|
652
754
|
puts res
|
653
755
|
```
|
654
756
|
|
655
757
|
Output:
|
656
758
|
|
657
|
-
> Russian-American novelist and
|
759
|
+
> Vladimir Nabokov was a Russian-American novelist, poet, and entomologist, best known for his intricate prose style and innovative narrative techniques. He is most famously recognized for his controversial novel "Lolita," which explores themes of obsession and manipulation. Nabokov's works often reflect his fascination with language, memory, and the nature of art. In addition to his literary accomplishments, he was also a passionate lepidopterist, contributing to the field of butterfly studies. His literary career spanned several decades, and his influence continues to be felt in contemporary literature.
|
658
760
|
|
659
761
|
### Text Embeddings
|
660
762
|
|
@@ -676,12 +778,22 @@ puts res
|
|
676
778
|
Output:
|
677
779
|
|
678
780
|
```
|
679
|
-
-0.
|
680
|
-
-0.
|
681
|
-
|
682
|
-
|
683
|
-
|
684
|
-
|
781
|
+
-0.0023891362
|
782
|
+
-0.016671216
|
783
|
+
0.010879759
|
784
|
+
0.012918914
|
785
|
+
0.0012281279
|
786
|
+
...
|
787
|
+
```
|
788
|
+
|
789
|
+
## Advanced Usage
|
790
|
+
|
791
|
+
### Setting a Timeout
|
792
|
+
|
793
|
+
You can set a timeout for the `Spacy::Language.new` method:
|
794
|
+
|
795
|
+
```ruby
|
796
|
+
nlp = Spacy::Language.new("en_core_web_sm", timeout: 120) # Set timeout to 120 seconds
|
685
797
|
```
|
686
798
|
|
687
799
|
## Author
|
@@ -12,7 +12,7 @@ doc = nlp.read("The Beatles released 12 studio albums")
|
|
12
12
|
# default parameter values
|
13
13
|
# max_tokens: 1000
|
14
14
|
# temperature: 0.7
|
15
|
-
# model: "gpt-
|
15
|
+
# model: "gpt-4o-mini"
|
16
16
|
res = doc.openai_query(access_token: api_key, prompt: "Translate the text to Japanese.")
|
17
17
|
|
18
18
|
puts res
|
@@ -12,7 +12,7 @@ doc = nlp.read("The Beatles were an English rock band formed in Liverpool in 196
|
|
12
12
|
# default parameter values
|
13
13
|
# max_tokens: 1000
|
14
14
|
# temperature: 0.7
|
15
|
-
# model: "gpt-
|
15
|
+
# model: "gpt-4o-mini"
|
16
16
|
res = doc.openai_query(access_token: api_key, prompt: "Extract the topic of the document and list 10 entities (names, concepts, locations, etc.) that are relevant to the topic.")
|
17
17
|
|
18
18
|
puts res
|
@@ -12,63 +12,78 @@ doc = nlp.read("The Beatles released 12 studio albums")
|
|
12
12
|
# default parameter values
|
13
13
|
# max_tokens: 1000
|
14
14
|
# temperature: 0.7
|
15
|
-
# model: "gpt-
|
16
|
-
res = doc.openai_query(
|
15
|
+
# model: "gpt-4o-mini"
|
16
|
+
res = doc.openai_query(
|
17
|
+
access_token: api_key,
|
18
|
+
prompt: "List token data of each of the words used in the sentence. Add 'meaning' property and value (brief semantic definition) to each token data. Output as a JSON object.",
|
19
|
+
max_tokens: 1000,
|
20
|
+
temperature: 0.7,
|
21
|
+
model: "gpt-4o-mini"
|
22
|
+
)
|
17
23
|
|
18
24
|
puts res
|
19
25
|
|
20
|
-
#
|
21
|
-
#
|
22
|
-
#
|
23
|
-
#
|
24
|
-
#
|
25
|
-
#
|
26
|
-
#
|
27
|
-
#
|
28
|
-
#
|
29
|
-
#
|
30
|
-
#
|
31
|
-
#
|
32
|
-
#
|
33
|
-
#
|
34
|
-
#
|
35
|
-
#
|
36
|
-
#
|
37
|
-
#
|
38
|
-
#
|
39
|
-
#
|
40
|
-
#
|
41
|
-
#
|
42
|
-
#
|
43
|
-
#
|
44
|
-
#
|
45
|
-
#
|
46
|
-
#
|
47
|
-
#
|
48
|
-
#
|
49
|
-
#
|
50
|
-
#
|
51
|
-
#
|
52
|
-
#
|
53
|
-
#
|
54
|
-
#
|
55
|
-
#
|
56
|
-
#
|
57
|
-
#
|
58
|
-
#
|
59
|
-
#
|
60
|
-
#
|
61
|
-
#
|
62
|
-
#
|
63
|
-
#
|
64
|
-
#
|
65
|
-
#
|
66
|
-
#
|
67
|
-
#
|
68
|
-
#
|
69
|
-
#
|
70
|
-
#
|
71
|
-
#
|
72
|
-
#
|
73
|
-
#
|
74
|
-
#
|
26
|
+
# {
|
27
|
+
# "tokens": [
|
28
|
+
# {
|
29
|
+
# "surface": "The",
|
30
|
+
# "lemma": "the",
|
31
|
+
# "pos": "DET",
|
32
|
+
# "tag": "DT",
|
33
|
+
# "dep": "det",
|
34
|
+
# "ent_type": "",
|
35
|
+
# "morphology": "{'Definite': 'Def', 'PronType': 'Art'}",
|
36
|
+
# "meaning": "Used to refer to one or more people or things already mentioned or assumed to be common knowledge"
|
37
|
+
# },
|
38
|
+
# {
|
39
|
+
# "surface": "Beatles",
|
40
|
+
# "lemma": "beatle",
|
41
|
+
# "pos": "NOUN",
|
42
|
+
# "tag": "NNS",
|
43
|
+
# "dep": "nsubj",
|
44
|
+
# "ent_type": "GPE",
|
45
|
+
# "morphology": "{'Number': 'Plur'}",
|
46
|
+
# "meaning": "A British rock band formed in Liverpool in 1960"
|
47
|
+
# },
|
48
|
+
# {
|
49
|
+
# "surface": "released",
|
50
|
+
# "lemma": "release",
|
51
|
+
# "pos": "VERB",
|
52
|
+
# "tag": "VBD",
|
53
|
+
# "dep": "ROOT",
|
54
|
+
# "ent_type": "",
|
55
|
+
# "morphology": "{'Tense': 'Past', 'VerbForm': 'Fin'}",
|
56
|
+
# "meaning": "To make something available or known to the public"
|
57
|
+
# },
|
58
|
+
# {
|
59
|
+
# "surface": "12",
|
60
|
+
# "lemma": "12",
|
61
|
+
# "pos": "NUM",
|
62
|
+
# "tag": "CD",
|
63
|
+
# "dep": "nummod",
|
64
|
+
# "ent_type": "CARDINAL",
|
65
|
+
# "morphology": "{'NumType': 'Card'}",
|
66
|
+
# "meaning": "A number representing a quantity"
|
67
|
+
# },
|
68
|
+
# {
|
69
|
+
# "surface": "studio",
|
70
|
+
# "lemma": "studio",
|
71
|
+
# "pos": "NOUN",
|
72
|
+
# "tag": "NN",
|
73
|
+
# "dep": "compound",
|
74
|
+
# "ent_type": "",
|
75
|
+
# "morphology": "{'Number': 'Sing'}",
|
76
|
+
# "meaning": "A place where creative work is done"
|
77
|
+
# },
|
78
|
+
# {
|
79
|
+
# "surface": "albums",
|
80
|
+
# "lemma": "album",
|
81
|
+
# "pos": "NOUN",
|
82
|
+
# "tag": "NNS",
|
83
|
+
# "dep": "dobj",
|
84
|
+
# "ent_type": "",
|
85
|
+
# "morphology": "{'Number': 'Plur'}",
|
86
|
+
# "meaning": "A collection of musical or spoken recordings"
|
87
|
+
# }
|
88
|
+
# ]
|
89
|
+
# }
|
@@ -12,11 +12,11 @@ doc = nlp.read("The Beatles released 12 studio albums")
|
|
12
12
|
# default parameter values
|
13
13
|
# max_tokens: 1000
|
14
14
|
# temperature: 0.7
|
15
|
-
# model: "gpt-
|
15
|
+
# model: "gpt-4o-mini"
|
16
16
|
res = doc.openai_query(
|
17
17
|
access_token: api_key,
|
18
|
-
model: "gpt-
|
19
|
-
prompt: "Generate a tree diagram from the text
|
18
|
+
model: "gpt-4o",
|
19
|
+
prompt: "Generate a tree diagram from the text using given token data. Use the following bracketing style: [S [NP [Det the] [N cat]] [VP [V sat] [PP [P on] [NP the mat]]]"
|
20
20
|
)
|
21
21
|
|
22
22
|
puts res
|
data/lib/ruby-spacy/version.rb
CHANGED
data/lib/ruby-spacy.rb
CHANGED
@@ -1,10 +1,21 @@
|
|
1
1
|
# frozen_string_literal: true
|
2
2
|
|
3
3
|
require_relative "ruby-spacy/version"
|
4
|
-
require "strscan"
|
5
4
|
require "numpy"
|
6
|
-
require "pycall"
|
7
5
|
require "openai"
|
6
|
+
require "pycall"
|
7
|
+
require "strscan"
|
8
|
+
require "timeout"
|
9
|
+
|
10
|
+
begin
|
11
|
+
PyCall.init
|
12
|
+
_spacy = PyCall.import_module("spacy")
|
13
|
+
rescue PyCall::PyError => e
|
14
|
+
puts "Failed to initialize PyCall or import spacy: #{e.message}"
|
15
|
+
puts "Python traceback:"
|
16
|
+
puts e.traceback
|
17
|
+
raise
|
18
|
+
end
|
8
19
|
|
9
20
|
# This module covers the areas of spaCy functionality for _using_ many varieties of its language models, not for _building_ ones.
|
10
21
|
module Spacy
|
@@ -216,7 +227,7 @@ module Spacy
|
|
216
227
|
def openai_query(access_token: nil,
|
217
228
|
max_tokens: 1000,
|
218
229
|
temperature: 0.7,
|
219
|
-
model: "gpt-
|
230
|
+
model: "gpt-4o-mini",
|
220
231
|
messages: [],
|
221
232
|
prompt: nil)
|
222
233
|
if messages.empty?
|
@@ -291,7 +302,7 @@ module Spacy
|
|
291
302
|
end
|
292
303
|
end
|
293
304
|
|
294
|
-
def openai_completion(access_token: nil, max_tokens: 1000, temperature: 0.7, model: "gpt-
|
305
|
+
def openai_completion(access_token: nil, max_tokens: 1000, temperature: 0.7, model: "gpt-4o-mini")
|
295
306
|
messages = [
|
296
307
|
{ role: "system", content: "Complete the text input by the user." },
|
297
308
|
{ role: "user", content: @text }
|
@@ -355,16 +366,24 @@ module Spacy
|
|
355
366
|
|
356
367
|
# Creates a language model instance, which is conventionally referred to by a variable named `nlp`.
|
357
368
|
# @param model [String] A language model installed in the system
|
358
|
-
def initialize(model = "en_core_web_sm", max_retrial: MAX_RETRIAL, retrial: 0)
|
369
|
+
def initialize(model = "en_core_web_sm", max_retrial: MAX_RETRIAL, retrial: 0, timeout: 60)
|
359
370
|
@spacy_nlp_id = "nlp_#{model.object_id}"
|
360
|
-
|
361
|
-
|
362
|
-
|
363
|
-
|
364
|
-
|
365
|
-
|
366
|
-
|
367
|
-
|
371
|
+
begin
|
372
|
+
Timeout.timeout(timeout) do
|
373
|
+
PyCall.exec("import spacy; #{@spacy_nlp_id} = spacy.load('#{model}')")
|
374
|
+
end
|
375
|
+
@py_nlp = PyCall.eval(@spacy_nlp_id)
|
376
|
+
rescue Timeout::Error
|
377
|
+
raise "PyCall execution timed out after #{timeout} seconds"
|
378
|
+
rescue StandardError => e
|
379
|
+
retrial += 1
|
380
|
+
if retrial <= max_retrial
|
381
|
+
sleep 0.5
|
382
|
+
retry
|
383
|
+
else
|
384
|
+
raise "Failed to initialize Spacy after #{max_retrial} attempts: #{e.message}"
|
385
|
+
end
|
386
|
+
end
|
368
387
|
end
|
369
388
|
|
370
389
|
# Reads and analyze the given text.
|
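
The control flow added to `Spacy::Language#initialize` above (a time limit around the load, with retries for other failures) can be sketched in isolation, without PyCall or spaCy. This is an illustrative sketch only: `load_with_timeout`, its `wait` parameter, and its default values are hypothetical and not part of ruby-spacy. As in the diff, a timeout is raised immediately and is not retried, while any other `StandardError` is retried up to `max_retrial` times.

```ruby
require "timeout"

# Hypothetical helper (not part of ruby-spacy): runs the given block under a
# time limit. A timeout is re-raised at once; other errors trigger a retry.
def load_with_timeout(timeout: 60, max_retrial: 5, wait: 0.5)
  retrial = 0
  begin
    Timeout.timeout(timeout) { yield }
  rescue Timeout::Error
    # An exception raised inside a rescue clause is not caught by the
    # sibling rescue below, so a timeout never enters the retry path.
    raise "execution timed out after #{timeout} seconds"
  rescue StandardError => e
    retrial += 1
    if retrial <= max_retrial
      sleep wait
      retry # re-runs the begin block; retrial keeps its value
    end
    raise "failed after #{max_retrial} retries: #{e.message}"
  end
end

# Usage: the block fails twice, then succeeds on the third attempt.
attempts = 0
result = load_with_timeout(timeout: 1, max_retrial: 3, wait: 0) do
  attempts += 1
  raise "flaky" if attempts < 3
  :loaded
end
puts "#{result} after #{attempts} attempts"
```

Listing `rescue Timeout::Error` before `rescue StandardError` matters, because `Timeout::Error` is itself a `StandardError` and would otherwise be retried.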
metadata
CHANGED
@@ -1,14 +1,14 @@
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
2
2
|
name: ruby-spacy
|
3
3
|
version: !ruby/object:Gem::Version
|
4
|
-
version: 0.2.
|
4
|
+
version: 0.2.3
|
5
5
|
platform: ruby
|
6
6
|
authors:
|
7
7
|
- Yoichiro Hasebe
|
8
8
|
autorequire:
|
9
9
|
bindir: bin
|
10
10
|
cert_chain: []
|
11
|
-
date:
|
11
|
+
date: 2024-08-27 00:00:00.000000000 Z
|
12
12
|
dependencies:
|
13
13
|
- !ruby/object:Gem::Dependency
|
14
14
|
name: bundler
|
@@ -224,7 +224,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
|
|
224
224
|
- !ruby/object:Gem::Version
|
225
225
|
version: '0'
|
226
226
|
requirements: []
|
227
|
-
rubygems_version: 3.4.
|
227
|
+
rubygems_version: 3.4.13
|
228
228
|
signing_key:
|
229
229
|
specification_version: 4
|
230
230
|
summary: A wrapper module for using spaCy natural language processing library from
|