ruby-spacy 0.2.2 → 0.2.3

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 9e9edb55398e8926b4fd9c06d65b49538129e34a960098b1ad20535d64a2b787
- data.tar.gz: 639fa3186d563480d0eb268fa948d8b97428fcdf37887dfff64954fcfc86c1f0
+ metadata.gz: 9c9ca5b4cba8eb115192aa0b5a45216d12a9d9e4cdddc253ba55ace52e778afd
+ data.tar.gz: 197c61acfa742048fefff05b35d6045e17dd5cf212667c277537fb984a0ff926
  SHA512:
- metadata.gz: 74367e0cd67a3537b20f73427baf626ada1f123d9c34da1a55795a905c3cfd8239c5cc1a04e6cf92c8312c6338a6300ce95b837d24c642c3dbb77733a25060ed
- data.tar.gz: 4723555e09a6416ec8cb5727b3344756be36a26f65429759508aaee697b245960065cb74699b7f619d552a67b574dbc70da1b53a0ddc434ade45901b0ca72dd7
+ metadata.gz: 950daeb4f8ee140a15bacf18ea3228f2604a552df8aa12be52fb7a488c78e67b894b8678fbe6fbed74da54beb714e89d02ab1bd46d5c59a908b8ddfbc5c9e7c0
+ data.tar.gz: 84b183babd37f9120c0ac2332eec23dff30d3180da165aaf044bf72ef4be7af4efc2b339ad5ac5b489e3e3b9b44ba33d3df4fca287addbbed05cfa4201b79d75
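The checksum lengths above follow from the digest algorithms themselves. A quick stdlib sketch (the input string here is arbitrary, standing in for the actual gem payload):

```ruby
require "digest"

# Any byte string yields a fixed-size hex digest:
# SHA256 -> 64 hex chars, SHA512 -> 128 hex chars,
# matching the lengths of the checksums in the diff above.
payload = "arbitrary bytes standing in for the gem file"
puts Digest::SHA256.hexdigest(payload).length  # => 64
puts Digest::SHA512.hexdigest(payload).length  # => 128
```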
data/CHANGELOG.md CHANGED
@@ -1,5 +1,9 @@
  # Change Log

+ ## 0.2.3 - 2024-08-27
+ - Timeout option added to `Spacy::Language.new`
+ - Default OpenAI models updated to `gpt-4o-mini`
+
  ## 0.2.0 - 2022-10-02
  - spaCy 3.7.0 supported

data/README.md CHANGED
@@ -13,11 +13,12 @@
  | ✅ | Access to pre-trained word vectors |
  | ✅ | OpenAI Chat/Completion/Embeddings API integration |

- Current Version: `0.2.2`
+ Current Version: `0.2.3`

- - Addressed installation issues in some environments
+ - spaCy 3.7.0 supported
+ - OpenAI API integration

- ## Installation of prerequisites
+ ## Installation of Prerequisites

  **IMPORTANT**: Make sure that the `enable-shared` option is enabled in your Python installation. You can use [pyenv](https://github.com/pyenv/pyenv) to install any version of Python you like. Install Python 3.10.6, for instance, using pyenv with `enable-shared` as follows:

@@ -109,7 +110,7 @@ Output:
  |:-----:|:--:|:-------:|:--:|:------:|:----:|:-------:|:---:|:-:|:--:|:-------:|
  | Apple | is | looking | at | buying | U.K. | startup | for | $ | 1 | billion |

- ### Part-of-speech and dependency
+ ### Part-of-speech and Dependency

  → [spaCy: Part-of-speech tags and dependencies](https://spacy.io/usage/spacy-101#annotations-pos-deps)

@@ -149,7 +150,7 @@ Output:
  | 1 | 1 | NUM | CD | compound |
  | billion | billion | NUM | CD | pobj |

- ### Part-of-speech and dependency (Japanese)
+ ### Part-of-speech and Dependency (Japanese)

  Ruby code:

@@ -234,7 +235,7 @@ Output:
  | 1 | d | false | false | NumType = Card |
  | billion | xxxx | true | false | NumType = Card |

- ### Visualizing dependency
+ ### Visualizing Dependency

  → [spaCy: Visualizers](https://spacy.io/usage/visualizers)

@@ -259,7 +260,7 @@ Output:

  ![](https://github.com/yohasebe/ruby-spacy/blob/main/examples/get_started/outputs/test_dep.svg)

- ### Visualizing dependency (compact)
+ ### Visualizing Dependency (Compact)

  Ruby code:

@@ -282,7 +283,7 @@ Output:

  ![](https://github.com/yohasebe/ruby-spacy/blob/main/examples/get_started/outputs/test_dep_compact.svg)

- ### Named entity recognition
+ ### Named Entity Recognition

  → [spaCy: Named entities](https://spacy.io/usage/spacy-101#annotations-ner)

@@ -314,7 +315,7 @@ Output:
  | U.K. | 27 | 31 | GPE |
  | $1 billion | 44 | 54 | MONEY |

- ### Named entity recognition (Japanese)
+ ### Named Entity Recognition (Japanese)

  Ruby code:

@@ -347,7 +348,7 @@ Output:
  | ファミコン | 10 | 15 | PRODUCT |
  | 14,800円 | 16 | 23 | MONEY |

- ### Checking availability of word vectors
+ ### Checking Availability of Word Vectors

  → [spaCy: Word vectors and similarity](https://spacy.io/usage/spacy-101#vectors-similarity)

@@ -380,7 +381,7 @@ Output:
  | banana | true | 6.700014 | false |
  | afskfsd | false | 0.0 | true |

- ### Similarity calculation
+ ### Similarity Calculation

  Ruby code:

@@ -405,7 +406,7 @@ Doc 2: Fast food tastes very good.
  Similarity: 0.7687607012190486
  ```

- ### Similarity calculation (Japanese)
+ ### Similarity Calculation (Japanese)

  Ruby code:

@@ -428,7 +429,7 @@ doc2: あいにくの悪天候で残念です。
  Similarity: 0.8684192637149641
  ```

- ### Word vector calculation
+ ### Word Vector Calculation

  **Tokyo - Japan + France = Paris ?**

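The analogy above ("Tokyo - Japan + France = Paris ?") is plain elementwise vector arithmetic followed by a similarity lookup. A toy sketch with made-up 3-dimensional vectors (real spaCy vectors have hundreds of dimensions; the names and values here are illustrative only, not ruby-spacy output):

```ruby
# Toy 3-d "word vectors" (made up for illustration).
tokyo  = [0.9, 0.1, 0.8]
japan  = [0.8, 0.0, 0.9]
france = [0.1, 0.9, 0.7]
paris  = [0.2, 1.0, 0.6]

# Cosine similarity between two vectors.
def cosine(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  dot / (Math.sqrt(a.sum { |x| x * x }) * Math.sqrt(b.sum { |x| x * x }))
end

# Elementwise Tokyo - Japan + France.
query = tokyo.each_index.map { |i| tokyo[i] - japan[i] + france[i] }
puts cosine(query, paris).round(3)  # => 1.0
```

In the real examples, spaCy compares the resulting vector against every word in the vocabulary and ranks candidates by this same cosine score.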
@@ -475,7 +476,7 @@ Output:
  | 10 | marseille | 0.6370999813079834 |


- ### Word vector calculation (Japanese)
+ ### Word Vector Calculation (Japanese)

  **東京 - 日本 + フランス = パリ ?**

@@ -524,7 +525,9 @@ Output:

  ## OpenAI API Integration

- Easily leverage GPT models within ruby-spacy by using an OpenAI API key. When constructing prompts for the `Doc::openai_query` method, you can incorporate various token properties from the document. These properties are retrieved through function calls and seamlessly integrated into your prompt (`gpt-3.5-turbo-0613` or greater is needed). The available properties include:
+ > ⚠️ This feature is currently experimental. Details are subject to change. Please refer to OpenAI's [API reference](https://platform.openai.com/docs/api-reference) and [Ruby OpenAI](https://github.com/alexrudall/ruby-openai) for available parameters (`max_tokens`, `temperature`, etc.).
+
+ Easily leverage GPT models within ruby-spacy by using an OpenAI API key. When constructing prompts for the `Doc::openai_query` method, you can incorporate the following token properties of the document. These properties are retrieved through function calls (made internally by GPT when necessary) and seamlessly integrated into your prompt. Note that function calling requires `gpt-4o-mini` or later. The available properties include:

  - `surface`
  - `lemma`
@@ -534,7 +537,7 @@ Easily leverage GPT models within ruby-spacy by using an OpenAI API key. When co
  - `ent_type` (entity type)
  - `morphology`

- ### GPT Prompting 1
+ ### GPT Prompting (Translation)

  Ruby code:

@@ -549,7 +552,7 @@ doc = nlp.read("The Beatles released 12 studio albums")
  # default parameter values
  # max_tokens: 1000
  # temperature: 0.7
- # model: "gpt-3.5-turbo-0613"
+ # model: "gpt-4o-mini"
  res1 = doc.openai_query(
    access_token: api_key,
    prompt: "Translate the text to Japanese."
@@ -561,7 +564,7 @@ Output:

  > ビートルズは12枚のスタジオアルバムをリリースしました。

- ### GPT Prompting 2
+ ### GPT Prompting (Elaboration)

  Ruby code:

@@ -572,6 +575,10 @@ api_key = ENV["OPENAI_API_KEY"]
  nlp = Spacy::Language.new("en_core_web_sm")
  doc = nlp.read("The Beatles were an English rock band formed in Liverpool in 1960.")

+ # default parameter values
+ # max_tokens: 1000
+ # temperature: 0.7
+ # model: "gpt-4o-mini"
  res = doc.openai_query(
    access_token: api_key,
    prompt: "Extract the topic of the document and list 10 entities (names, concepts, locations, etc.) that are relevant to the topic."
@@ -580,21 +587,112 @@ res = doc.openai_query(

  Output:

- > Topic: The Beatles
+ > **Topic:** The Beatles
+ >
+ > **Relevant Entities:**
  >
- > Entities:
- > 1. The Beatles (band)
- > 2. English (nationality)
- > 3. Rock band
- > 4. Liverpool (city)
- > 5. 1960 (year)
- > 6. John Lennon (member)
- > 7. Paul McCartney (member)
- > 8. George Harrison (member)
- > 9. Ringo Starr (member)
- > 10. Music
-
- ### GPT Prompting 3
+ > 1. The Beatles (PERSON)
+ > 2. Liverpool (GPE - Geopolitical Entity)
+ > 3. English (LANGUAGE)
+ > 4. Rock (MUSIC GENRE)
+ > 5. 1960 (DATE)
+ > 6. Band (MUSIC GROUP)
+ > 7. John Lennon (PERSON - key member)
+ > 8. Paul McCartney (PERSON - key member)
+ > 9. George Harrison (PERSON - key member)
+ > 10. Ringo Starr (PERSON - key member)
+
+ ### GPT Prompting (JSON Output Using RAG with Token Properties)
+
+ Ruby code:
+
+ ```ruby
+ require "ruby-spacy"
+
+ api_key = ENV["OPENAI_API_KEY"]
+ nlp = Spacy::Language.new("en_core_web_sm")
+ doc = nlp.read("The Beatles released 12 studio albums")
+
+ # default parameter values
+ # max_tokens: 1000
+ # temperature: 0.7
+ # model: "gpt-4o-mini"
+ res = doc.openai_query(
+   access_token: api_key,
+   prompt: "List token data of each of the words used in the sentence. Add 'meaning' property and value (brief semantic definition) to each token data. Output as a JSON object."
+ )
+ ```
+
+ Output:
+
+ ```json
+ {
+   "tokens": [
+     {
+       "surface": "The",
+       "lemma": "the",
+       "pos": "DET",
+       "tag": "DT",
+       "dep": "det",
+       "ent_type": "",
+       "morphology": "{'Definite': 'Def', 'PronType': 'Art'}",
+       "meaning": "A definite article used to specify a noun."
+     },
+     {
+       "surface": "Beatles",
+       "lemma": "beatle",
+       "pos": "NOUN",
+       "tag": "NNS",
+       "dep": "nsubj",
+       "ent_type": "GPE",
+       "morphology": "{'Number': 'Plur'}",
+       "meaning": "A British rock band formed in Liverpool in 1960."
+     },
+     {
+       "surface": "released",
+       "lemma": "release",
+       "pos": "VERB",
+       "tag": "VBD",
+       "dep": "ROOT",
+       "ent_type": "",
+       "morphology": "{'Tense': 'Past', 'VerbForm': 'Fin'}",
+       "meaning": "To make something available to the public."
+     },
+     {
+       "surface": "12",
+       "lemma": "12",
+       "pos": "NUM",
+       "tag": "CD",
+       "dep": "nummod",
+       "ent_type": "CARDINAL",
+       "morphology": "{'NumType': 'Card'}",
+       "meaning": "A cardinal number representing the quantity of twelve."
+     },
+     {
+       "surface": "studio",
+       "lemma": "studio",
+       "pos": "NOUN",
+       "tag": "NN",
+       "dep": "compound",
+       "ent_type": "",
+       "morphology": "{'Number': 'Sing'}",
+       "meaning": "A place where recording or filming takes place."
+     },
+     {
+       "surface": "albums",
+       "lemma": "album",
+       "pos": "NOUN",
+       "tag": "NNS",
+       "dep": "dobj",
+       "ent_type": "",
+       "morphology": "{'Number': 'Plur'}",
+       "meaning": "Collections of music tracks or recordings."
+     }
+   ]
+ }
+ ```
+
+ ### GPT Prompting (Generate a Syntax Tree Using Token Properties)

  Ruby code:

@@ -603,11 +701,15 @@ require "ruby-spacy"

  api_key = ENV["OPENAI_API_KEY"]
  nlp = Spacy::Language.new("en_core_web_sm")
+ doc = nlp.read("The Beatles released 12 studio albums")

+ # default parameter values
+ # max_tokens: 1000
+ # temperature: 0.7
  res = doc.openai_query(
    access_token: api_key,
    model: "gpt-4",
-   prompt: "Generate a tree diagram from the text in the following style: [S [NP [Det the] [N cat]] [VP [V sat] [PP [P on] [NP the mat]]]"
+   prompt: "Generate a tree diagram from the text using given token data. Use the following bracketing style: [S [NP [Det the] [N cat]] [VP [V sat] [PP [P on] [NP the mat]]]"
  )
  puts res
  ```
@@ -647,14 +749,14 @@ doc = nlp.read("Vladimir Nabokov was a")
  # default parameter values
  # max_tokens: 1000
  # temperature: 0.7
- # model: "gpt-3.5-turbo-0613"
+ # model: "gpt-4o-mini"
  res = doc.openai_completion(access_token: api_key)
  puts res
  ```

  Output:

- > Russian-American novelist and lepidopterist. He was born in 1899 in St. Petersburg, Russia, and later emigrated to the United States in 1940. Nabokov is best known for his novel "Lolita," which was published in 1955 and caused much controversy due to its controversial subject matter. Throughout his career, Nabokov wrote many other notable works, including "Pale Fire" and "Ada or Ardor: A Family Chronicle." In addition to his writing, Nabokov was also a passionate butterfly collector and taxonomist, publishing several scientific papers on the subject. He passed away in 1977, leaving behind a rich literary legacy.
+ > Vladimir Nabokov was a Russian-American novelist, poet, and entomologist, best known for his intricate prose style and innovative narrative techniques. He is most famously recognized for his controversial novel "Lolita," which explores themes of obsession and manipulation. Nabokov's works often reflect his fascination with language, memory, and the nature of art. In addition to his literary accomplishments, he was also a passionate lepidopterist, contributing to the field of butterfly studies. His literary career spanned several decades, and his influence continues to be felt in contemporary literature.

  ### Text Embeddings

@@ -676,12 +778,22 @@ puts res
  Output:

  ```
- -0.00208362
- -0.01645165
- 0.0110955965
- 0.012802119
- 0.0012175755
- ...
+ -0.0023891362
+ -0.016671216
+ 0.010879759
+ 0.012918914
+ 0.0012281279
+ ...
+ ```
+
+ ## Advanced Usage
+
+ ### Setting a Timeout
+
+ You can set a timeout for the `Spacy::Language.new` method:
+
+ ```ruby
+ nlp = Spacy::Language.new("en_core_web_sm", timeout: 120) # Set timeout to 120 seconds
  ```

  ## Author
@@ -12,7 +12,7 @@ doc = nlp.read("Vladimir Nabokov was a")
  # default parameter values
  # max_tokens: 1000
  # temperature: 0.7
- # model: "gpt-3.5-turbo-0613"
+ # model: "gpt-4o-mini"
  res = doc.openai_completion(access_token: api_key)
  puts res

@@ -12,7 +12,7 @@ doc = nlp.read("The Beatles released 12 studio albums")
  # default parameter values
  # max_tokens: 1000
  # temperature: 0.7
- # model: "gpt-3.5-turbo-0613"
+ # model: "gpt-4o-mini"
  res = doc.openai_query(access_token: api_key, prompt: "Translate the text to Japanese.")

  puts res
@@ -12,7 +12,7 @@ doc = nlp.read("The Beatles were an English rock band formed in Liverpool in 196
  # default parameter values
  # max_tokens: 1000
  # temperature: 0.7
- # model: "gpt-3.5-turbo-0613"
+ # model: "gpt-4o-mini"
  res = doc.openai_query(access_token: api_key, prompt: "Extract the topic of the document and list 10 entities (names, concepts, locations, etc.) that are relevant to the topic.")

  puts res
@@ -12,63 +12,78 @@ doc = nlp.read("The Beatles released 12 studio albums")
  # default parameter values
  # max_tokens: 1000
  # temperature: 0.7
- # model: "gpt-3.5-turbo-0613"
- res = doc.openai_query(access_token: api_key, prompt: "List detailed morphology data of each of the word used in the sentence")
+ # model: "gpt-4o-mini"
+ res = doc.openai_query(
+   access_token: api_key,
+   prompt: "List token data of each of the words used in the sentence. Add 'meaning' property and value (brief semantic definition) to each token data. Output as a JSON object.",
+   max_tokens: 1000,
+   temperature: 0.7,
+   model: "gpt-4o-mini"
+ )

  puts res

- # Here is the detailed morphology data for each word in the sentence:
- #
- # 1. Token: "The"
- #    - Surface: "The"
- #    - Lemma: "the"
- #    - Part-of-speech: Determiner (DET)
- #    - Tag: DT
- #    - Dependency: Determiner (det)
- #    - Entity type: None
- #    - Morphology: {'Definite': 'Def', 'PronType': 'Art'}
- #
- # 2. Token: "Beatles"
- #    - Surface: "Beatles"
- #    - Lemma: "beatle"
- #    - Part-of-speech: Noun (NOUN)
- #    - Tag: NNS
- #    - Dependency: Noun subject (nsubj)
- #    - Entity type: GPE (Geopolitical Entity)
- #    - Morphology: {'Number': 'Plur'}
- #
- # 3. Token: "released"
- #    - Surface: "released"
- #    - Lemma: "release"
- #    - Part-of-speech: Verb (VERB)
- #    - Tag: VBD
- #    - Dependency: Root
- #    - Entity type: None
- #    - Morphology: {'Tense': 'Past', 'VerbForm': 'Fin'}
- #
- # 4. Token: "12"
- #    - Surface: "12"
- #    - Lemma: "12"
- #    - Part-of-speech: Numeral (NUM)
- #    - Tag: CD
- #    - Dependency: Numeric modifier (nummod)
- #    - Entity type: Cardinal number (CARDINAL)
- #    - Morphology: {'NumType': 'Card'}
- #
- # 5. Token: "studio"
- #    - Surface: "studio"
- #    - Lemma: "studio"
- #    - Part-of-speech: Noun (NOUN)
- #    - Tag: NN
- #    - Dependency: Compound
- #    - Entity type: None
- #    - Morphology: {'Number': 'Sing'}
- #
- # 6. Token: "albums"
- #    - Surface: "albums"
- #    - Lemma: "album"
- #    - Part-of-speech: Noun (NOUN)
- #    - Tag: NNS
- #    - Dependency: Direct object (dobj)
- #    - Entity type: None
- #    - Morphology: {'Number': 'Plur'}
+ # {
+ #   "tokens": [
+ #     {
+ #       "surface": "The",
+ #       "lemma": "the",
+ #       "pos": "DET",
+ #       "tag": "DT",
+ #       "dep": "det",
+ #       "ent_type": "",
+ #       "morphology": "{'Definite': 'Def', 'PronType': 'Art'}",
+ #       "meaning": "Used to refer to one or more people or things already mentioned or assumed to be common knowledge"
+ #     },
+ #     {
+ #       "surface": "Beatles",
+ #       "lemma": "beatle",
+ #       "pos": "NOUN",
+ #       "tag": "NNS",
+ #       "dep": "nsubj",
+ #       "ent_type": "GPE",
+ #       "morphology": "{'Number': 'Plur'}",
+ #       "meaning": "A British rock band formed in Liverpool in 1960"
+ #     },
+ #     {
+ #       "surface": "released",
+ #       "lemma": "release",
+ #       "pos": "VERB",
+ #       "tag": "VBD",
+ #       "dep": "ROOT",
+ #       "ent_type": "",
+ #       "morphology": "{'Tense': 'Past', 'VerbForm': 'Fin'}",
+ #       "meaning": "To make something available or known to the public"
+ #     },
+ #     {
+ #       "surface": "12",
+ #       "lemma": "12",
+ #       "pos": "NUM",
+ #       "tag": "CD",
+ #       "dep": "nummod",
+ #       "ent_type": "CARDINAL",
+ #       "morphology": "{'NumType': 'Card'}",
+ #       "meaning": "A number representing a quantity"
+ #     },
+ #     {
+ #       "surface": "studio",
+ #       "lemma": "studio",
+ #       "pos": "NOUN",
+ #       "tag": "NN",
+ #       "dep": "compound",
+ #       "ent_type": "",
+ #       "morphology": "{'Number': 'Sing'}",
+ #       "meaning": "A place where creative work is done"
+ #     },
+ #     {
+ #       "surface": "albums",
+ #       "lemma": "album",
+ #       "pos": "NOUN",
+ #       "tag": "NNS",
+ #       "dep": "dobj",
+ #       "ent_type": "",
+ #       "morphology": "{'Number': 'Plur'}",
+ #       "meaning": "A collection of musical or spoken recordings"
+ #     }
+ #   ]
+ # }
@@ -12,11 +12,11 @@ doc = nlp.read("The Beatles released 12 studio albums")
  # default parameter values
  # max_tokens: 1000
  # temperature: 0.7
- # model: "gpt-3.5-turbo-0613"
+ # model: "gpt-4o-mini"
  res = doc.openai_query(
    access_token: api_key,
-   model: "gpt-4",
-   prompt: "Generate a tree diagram from the text in the following style: [S [NP [Det the] [N cat]] [VP [V sat] [PP [P on] [NP the mat]]]"
+   model: "gpt-4o",
+   prompt: "Generate a tree diagram from the text using given token data. Use the following bracketing style: [S [NP [Det the] [N cat]] [VP [V sat] [PP [P on] [NP the mat]]]"
  )

  puts res
@@ -2,5 +2,5 @@

  module Spacy
    # The version number of the module
-   VERSION = "0.2.2"
+   VERSION = "0.2.3"
  end
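The `VERSION` bump above sorts as expected under RubyGems' version semantics; a quick sketch:

```ruby
require "rubygems"  # Gem::Version ships with RubyGems, loaded by default in modern Ruby

# Gem::Version implements RubyGems' segment-wise version ordering.
old_version = Gem::Version.new("0.2.2")
new_version = Gem::Version.new("0.2.3")
puts new_version > old_version  # => true
```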
data/lib/ruby-spacy.rb CHANGED
@@ -1,10 +1,21 @@
  # frozen_string_literal: true

  require_relative "ruby-spacy/version"
- require "strscan"
  require "numpy"
- require "pycall"
  require "openai"
+ require "pycall"
+ require "strscan"
+ require "timeout"
+
+ begin
+   PyCall.init
+   _spacy = PyCall.import_module("spacy")
+ rescue PyCall::PyError => e
+   puts "Failed to initialize PyCall or import spacy: #{e.message}"
+   puts "Python traceback:"
+   puts e.traceback
+   raise
+ end

  # This module covers the areas of spaCy functionality for _using_ many varieties of its language models, not for _building_ ones.
  module Spacy
@@ -216,7 +227,7 @@ module Spacy
  def openai_query(access_token: nil,
                   max_tokens: 1000,
                   temperature: 0.7,
-                  model: "gpt-3.5-turbo-0613",
+                  model: "gpt-4o-mini",
                   messages: [],
                   prompt: nil)
    if messages.empty?
@@ -291,7 +302,7 @@ module Spacy
    end
  end

- def openai_completion(access_token: nil, max_tokens: 1000, temperature: 0.7, model: "gpt-3.5-turbo-0613")
+ def openai_completion(access_token: nil, max_tokens: 1000, temperature: 0.7, model: "gpt-4o-mini")
    messages = [
      { role: "system", content: "Complete the text input by the user." },
      { role: "user", content: @text }
@@ -355,16 +366,24 @@ module Spacy

  # Creates a language model instance, which is conventionally referred to by a variable named `nlp`.
  # @param model [String] A language model installed in the system
- def initialize(model = "en_core_web_sm", max_retrial: MAX_RETRIAL, retrial: 0)
+ def initialize(model = "en_core_web_sm", max_retrial: MAX_RETRIAL, retrial: 0, timeout: 60)
    @spacy_nlp_id = "nlp_#{model.object_id}"
-   PyCall.exec("import spacy; #{@spacy_nlp_id} = spacy.load('#{model}')")
-   @py_nlp = PyCall.eval(@spacy_nlp_id)
- rescue StandardError
-   retrial += 1
-   raise "Error: Pycall failed to load Spacy" unless retrial <= max_retrial
-
-   sleep 0.5
-   initialize(model, max_retrial: max_retrial, retrial: retrial)
+   begin
+     Timeout.timeout(timeout) do
+       PyCall.exec("import spacy; #{@spacy_nlp_id} = spacy.load('#{model}')")
+     end
+     @py_nlp = PyCall.eval(@spacy_nlp_id)
+   rescue Timeout::Error
+     raise "PyCall execution timed out after #{timeout} seconds"
+   rescue StandardError => e
+     retrial += 1
+     if retrial <= max_retrial
+       sleep 0.5
+       retry
+     else
+       raise "Failed to initialize Spacy after #{max_retrial} attempts: #{e.message}"
+     end
+   end
  end

  # Reads and analyzes the given text.
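The new initializer above wraps the PyCall model load in `Timeout.timeout` and retries transient failures. A minimal stdlib-only sketch of that timeout-plus-retry pattern, with the PyCall call swapped for an arbitrary block (`load_with_retry` is a hypothetical helper, not part of ruby-spacy):

```ruby
require "timeout"

# Run a block under a timeout, retrying transient failures up to max_retrial times.
def load_with_retry(max_retrial: 2, timeout: 60)
  retrial = 0
  begin
    Timeout.timeout(timeout) { yield }
  rescue Timeout::Error
    raise "Execution timed out after #{timeout} seconds"
  rescue StandardError => e
    retrial += 1
    if retrial <= max_retrial
      sleep 0.1
      retry  # re-runs the begin block, including the Timeout wrapper
    else
      raise "Failed after #{max_retrial} attempts: #{e.message}"
    end
  end
end

attempts = 0
result = load_with_retry(max_retrial: 2, timeout: 5) do
  attempts += 1
  raise "transient failure" if attempts < 3  # succeed on the third try
  :loaded
end
puts result    # => loaded
puts attempts  # => 3
```

Note that `retry` restarts the whole `begin` block, so each attempt gets a fresh timeout window; this mirrors how the `initialize` above replaces the old recursive re-call with `retry`.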
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: ruby-spacy
  version: !ruby/object:Gem::Version
-   version: 0.2.2
+   version: 0.2.3
  platform: ruby
  authors:
  - Yoichiro Hasebe
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2023-10-03 00:00:00.000000000 Z
+ date: 2024-08-27 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
    name: bundler
@@ -224,7 +224,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
  - !ruby/object:Gem::Version
    version: '0'
  requirements: []
- rubygems_version: 3.4.12
+ rubygems_version: 3.4.13
  signing_key:
  specification_version: 4
  summary: A wrapper module for using spaCy natural language processing library from