tumblrbot 1.9.5__tar.gz → 1.9.6__tar.gz

@@ -218,4 +218,4 @@ __marimo__/
218
218
  data
219
219
  *.toml
220
220
  *.jsonl
221
- *.lnk
221
+ tumblrbot.ps1
@@ -1,15 +1,15 @@
1
1
  Metadata-Version: 2.4
2
2
  Name: tumblrbot
3
- Version: 1.9.5
3
+ Version: 1.9.6
4
4
  Summary: An updated bot that posts to Tumblr, based on your very own blog!
5
5
  Requires-Python: >= 3.14
6
6
  Description-Content-Type: text/markdown
7
- Requires-Dist: click
8
7
  Requires-Dist: openai
9
8
  Requires-Dist: pydantic
10
9
  Requires-Dist: requests
11
10
  Requires-Dist: requests-oauthlib
12
11
  Requires-Dist: rich
12
+ Requires-Dist: tenacity
13
13
  Requires-Dist: tiktoken
14
14
  Requires-Dist: tomlkit
15
15
  Project-URL: Funding, https://ko-fi.com/maidscientistizutsumimarin
@@ -84,10 +84,6 @@ Features:
84
84
  - Colorful output, progress bars, and post previews using [rich].
85
85
  - Automatically keeps the [config][configurable] file up-to-date and recreates it if missing (without overriding user settings).
86
86
 
87
- **To-Do:**
88
-
89
- - Add retry logic for rate limiting.
90
-
91
87
  **Known Issues:**
92
88
 
93
89
  - Fine-tuning can fail after the validation phase due to the examples file not passing [OpenAI] moderation checks. There are a few workarounds for this that can be tried in combination:
@@ -103,24 +99,33 @@ Features:
103
99
 
104
100
  **Please submit an issue or contact us for features you want added/reimplemented.**
105
101
 
106
- ## Installation
102
+ ## Installation & Usage
103
+
104
+ ### Downloadable Binary
105
+
106
+ | Pros | Cons |
107
+ | --- | --- |
108
+ | Easier to install | Harder to update |
109
+ | No risk of dependencies breaking | Dependencies may be older |
110
+
111
+ 1. Download the latest release's [tumblrbot.exe].
112
+ 1. Launch `tumblrbot.exe` in the install location.
113
+
114
+ ### PyPI
115
+
116
+ | Pros | Cons |
117
+ | --- | --- |
118
+ | Easier to update | Harder to install |
119
+ | Dependencies may be newer | Dependencies may break |
107
120
 
108
121
  1. Install the latest version of [Python]:
109
122
  - Windows: `winget install python3`
110
123
  - Linux (apt): `apt install python-pip`
111
124
  - Linux (pacman): `pacman -S python-pip`
112
125
  1. Install the [pip] package: `pip install tumblrbot`
113
- - Alternatively, you can install from this repository: `pip install git+https://github.com/MaidThatPrograms/tumblrbot.git`
126
+ - Alternatively, you can install from this repository: `pip install git+https://github.com/MaidScientistIzutsumiMarin/tumblrbot.git`
114
127
  - On Linux, you will have to make a virtual environment or use the flag to install packages system-wide.
115
-
116
- ### Alternative Installation for Windows
117
-
118
- 1. Download the latest release's [tumblrbot.exe].
119
- 1. Run the file directly, or add it to your path, and use it as normal.
120
-
121
- ## Usage
122
-
123
- Run `tumblrbot` from anywhere. Run `tumblrbot --help` for command-line options. Every command-line option corresponds to a value from the [config][configurable].
128
+ 1. Run `tumblrbot` from anywhere. Run `tumblrbot --help` for command-line options. Every command-line option corresponds to a value from the [config][configurable].
124
129
 
125
130
  ## Obtaining Tokens
126
131
 
@@ -177,6 +182,7 @@ Specific Options:
177
182
  To be specific, it should follow the [JSON Lines] file format with one collection of name/value pairs (a dictionary) per line. You can validate your file using the [JSON Lines Validator].
178
183
 
179
184
  - **`post_limit`** - At most, this many valid posts will be included in the training data. This effectively is a filter to select the `N` most recent valid posts from each blog. `0` will use every available valid post.
185
+ - **`moderation_batch_size`** - This controls how many posts are submitted to the OpenAI moderation API per request. There is no limit, but higher numbers will cause you to be rate-limited more often, which can be slower overall. Lower numbers reduce rate-limiting, but can sometimes take longer because more requests are needed. The best value will depend on your computer, your internet connection, and any number of factors on OpenAI's side. The default value is simply what worked best on our computer.
180
186
  - **`filtered_words`** - During training data generation, any posts with the specified words will be removed. Word boundaries are not checked by default, so “the” will also filter out posts with “them” or “thematic”. This setting supports regular expressions, so you can explicitly look for word boundaries by surrounding an entry with “\\\b”, i.e., “\\\bthe\\\b”. Regular expressions have to be escaped like so due to how JSON data is read in. If you are familiar with regular expressions, it could be useful for you to know that every entry is joined with a “|” which is then used to search the post content for any matches.
181
187
  - **`developer_message`** - This message is used for fine-tuning the AI as well as for generating prompts. If you change this, you will need to run the fine-tuning again with the new value before generating posts.
182
188
  - **`user_message`** - This setting is used and works in the same way as `developer_message`.
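
The `filtered_words` behaviour described in this hunk can be sketched in a few lines of Python. This is a minimal illustration, assuming (as the README text states) that every entry is joined with `|` and matched case-insensitively; `build_filter` and the sample posts are hypothetical names invented for the sketch, not part of the package. In Python source a raw string avoids the extra escaping that the config file needs.

```python
from re import IGNORECASE
from re import compile as re_compile


def build_filter(filtered_words):
    """Join every entry with "|" and compile one case-insensitive pattern."""
    return re_compile("|".join(filtered_words), IGNORECASE)


# Hypothetical sample posts, only for illustration.
posts = ["Thematic elements", "Give it to them", "The end"]

loose = build_filter(["the"])  # no word boundaries: also hits "them", "Thematic"
strict = build_filter([r"\bthe\b"])  # explicit word boundaries

loose_hits = [post for post in posts if loose.search(post)]
strict_hits = [post for post in posts if strict.search(post)]
```

Here `loose_hits` contains all three posts, while `strict_hits` contains only `"The end"`. An empty list joins to an empty pattern, which matches every string, so a caller would need to check that the list is non-empty before searching.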
@@ -67,10 +67,6 @@ Features:
67
67
  - Colorful output, progress bars, and post previews using [rich].
68
68
  - Automatically keeps the [config][configurable] file up-to-date and recreates it if missing (without overriding user settings).
69
69
 
70
- **To-Do:**
71
-
72
- - Add retry logic for rate limiting.
73
-
74
70
  **Known Issues:**
75
71
 
76
72
  - Fine-tuning can fail after the validation phase due to the examples file not passing [OpenAI] moderation checks. There are a few workarounds for this that can be tried in combination:
@@ -86,24 +82,33 @@ Features:
86
82
 
87
83
  **Please submit an issue or contact us for features you want added/reimplemented.**
88
84
 
89
- ## Installation
85
+ ## Installation & Usage
86
+
87
+ ### Downloadable Binary
88
+
89
+ | Pros | Cons |
90
+ | --- | --- |
91
+ | Easier to install | Harder to update |
92
+ | No risk of dependencies breaking | Dependencies may be older |
93
+
94
+ 1. Download the latest release's [tumblrbot.exe].
95
+ 1. Launch `tumblrbot.exe` in the install location.
96
+
97
+ ### PyPI
98
+
99
+ | Pros | Cons |
100
+ | --- | --- |
101
+ | Easier to update | Harder to install |
102
+ | Dependencies may be newer | Dependencies may break |
90
103
 
91
104
  1. Install the latest version of [Python]:
92
105
  - Windows: `winget install python3`
93
106
  - Linux (apt): `apt install python-pip`
94
107
  - Linux (pacman): `pacman -S python-pip`
95
108
  1. Install the [pip] package: `pip install tumblrbot`
96
- - Alternatively, you can install from this repository: `pip install git+https://github.com/MaidThatPrograms/tumblrbot.git`
109
+ - Alternatively, you can install from this repository: `pip install git+https://github.com/MaidScientistIzutsumiMarin/tumblrbot.git`
97
110
  - On Linux, you will have to make a virtual environment or use the flag to install packages system-wide.
98
-
99
- ### Alternative Installation for Windows
100
-
101
- 1. Download the latest release's [tumblrbot.exe].
102
- 1. Run the file directly, or add it to your path, and use it as normal.
103
-
104
- ## Usage
105
-
106
- Run `tumblrbot` from anywhere. Run `tumblrbot --help` for command-line options. Every command-line option corresponds to a value from the [config][configurable].
111
+ 1. Run `tumblrbot` from anywhere. Run `tumblrbot --help` for command-line options. Every command-line option corresponds to a value from the [config][configurable].
107
112
 
108
113
  ## Obtaining Tokens
109
114
 
@@ -160,6 +165,7 @@ Specific Options:
160
165
  To be specific, it should follow the [JSON Lines] file format with one collection of name/value pairs (a dictionary) per line. You can validate your file using the [JSON Lines Validator].
161
166
 
162
167
  - **`post_limit`** - At most, this many valid posts will be included in the training data. This effectively is a filter to select the `N` most recent valid posts from each blog. `0` will use every available valid post.
168
+ - **`moderation_batch_size`** - This controls how many posts are submitted to the OpenAI moderation API per request. There is no limit, but higher numbers will cause you to be rate-limited more often, which can be slower overall. Lower numbers reduce rate-limiting, but can sometimes take longer because more requests are needed. The best value will depend on your computer, your internet connection, and any number of factors on OpenAI's side. The default value is simply what worked best on our computer.
163
169
  - **`filtered_words`** - During training data generation, any posts with the specified words will be removed. Word boundaries are not checked by default, so “the” will also filter out posts with “them” or “thematic”. This setting supports regular expressions, so you can explicitly look for word boundaries by surrounding an entry with “\\\b”, i.e., “\\\bthe\\\b”. Regular expressions have to be escaped like so due to how JSON data is read in. If you are familiar with regular expressions, it could be useful for you to know that every entry is joined with a “|” which is then used to search the post content for any matches.
164
170
  - **`developer_message`** - This message is used for fine-tuning the AI as well as for generating prompts. If you change this, you will need to run the fine-tuning again with the new value before generating posts.
165
171
  - **`user_message`** - This setting is used and works in the same way as `developer_message`.
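
To make the `moderation_batch_size` trade-off concrete, the sketch below counts how many moderation requests a given batch size implies. It is only an illustration under stated assumptions: `chunks` is a hypothetical stand-in for `itertools.batched` (written out so it also runs on older Python versions), and the post list is fake data.

```python
from itertools import islice
from math import ceil


def chunks(items, size):
    """Yield successive tuples of at most `size` items, like itertools.batched."""
    iterator = iter(items)
    while batch := tuple(islice(iterator, size)):
        yield batch


posts = [f"post {i}" for i in range(103)]  # fake data for the sketch

request_counts = {}
for batch_size in (10, 25, 100):
    batches = list(chunks(posts, batch_size))
    # One request per batch, so the count is ceil(total / batch_size).
    assert len(batches) == ceil(len(posts) / batch_size)
    request_counts[batch_size] = len(batches)
```

With 103 posts this gives 11, 5, and 2 requests for batch sizes 10, 25, and 100. Fewer, larger requests are not automatically faster, since each one is more likely to hit a rate limit and wait.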
@@ -0,0 +1 @@
1
+ ..\..\Powershell\build.ps1 -ExtraArgs '--collect-all tiktoken_ext'
@@ -1,18 +1,18 @@
1
1
  [project]
2
2
  name = "tumblrbot"
3
- version = "1.9.5"
3
+ version = "1.9.6"
4
4
  description = "An updated bot that posts to Tumblr, based on your very own blog!"
5
5
  readme = "README.md"
6
6
  requires-python = ">= 3.14"
7
7
  dependencies = [
8
- "click",
9
8
  "openai",
10
9
  "pydantic",
11
10
  "requests",
12
11
  "requests-oauthlib",
13
12
  "rich",
13
+ "tenacity",
14
14
  "tiktoken",
15
- "tomlkit",
15
+ "tomlkit"
16
16
  ]
17
17
 
18
18
  [project.urls]
@@ -1,3 +1,5 @@
1
+ from sys import exit as sys_exit
2
+
1
3
  from openai import OpenAI
2
4
  from rich.prompt import Confirm
3
5
  from rich.traceback import install
@@ -35,3 +37,7 @@ def main() -> None:
35
37
 
36
38
  if Confirm.ask("Generate drafts?", default=False):
37
39
  DraftGenerator(openai=openai, tumblr=tumblr).main()
40
+
41
+
42
+ if __name__ == "__main__":
43
+ sys_exit(main())
@@ -0,0 +1,97 @@
1
+ from collections.abc import Generator
2
+ from itertools import batched
3
+ from json import loads
4
+ from math import ceil
5
+ from re import IGNORECASE
6
+ from re import compile as re_compile
7
+ from typing import TYPE_CHECKING, override
8
+
9
+ from openai import RateLimitError
10
+ from rich import print as rich_print
11
+ from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_random_exponential
12
+
13
+ from tumblrbot.utils.common import FlowClass, PreviewLive
14
+ from tumblrbot.utils.models import Example, Message, Post
15
+
16
+ if TYPE_CHECKING:
17
+ from collections.abc import Generator, Iterable
18
+ from pathlib import Path
19
+
20
+ from openai._types import SequenceNotStr
21
+ from openai.types import ModerationCreateResponse, ModerationMultiModalInputParam
22
+
23
+
24
+ class ExamplesWriter(FlowClass):
25
+ @override
26
+ def main(self) -> None:
27
+ self.config.examples_file.parent.mkdir(parents=True, exist_ok=True)
28
+
29
+ examples = [self.create_example(*prompt) for prompt in self.get_custom_prompts()]
30
+ examples.extend(self.create_example(self.config.user_message, str(post)) for post in self.get_valid_posts())
31
+ self.write_examples(examples)
32
+
33
+ rich_print(f"[bold]The examples file can be found at: '{self.config.examples_file}'\n")
34
+
35
+ def create_example(self, user_message: str, assistant_message: str) -> Example:
36
+ return Example(
37
+ messages=[
38
+ Message(role="developer", content=self.config.developer_message),
39
+ Message(role="user", content=user_message),
40
+ Message(role="assistant", content=assistant_message),
41
+ ],
42
+ )
43
+
44
+ def get_custom_prompts(self) -> Generator[tuple[str, str]]:
45
+ self.config.custom_prompts_file.parent.mkdir(parents=True, exist_ok=True)
46
+ self.config.custom_prompts_file.touch(exist_ok=True)
47
+
48
+ with self.config.custom_prompts_file.open("rb") as fp:
49
+ for line in fp:
50
+ data: dict[str, str] = loads(line)
51
+ yield from data.items()
52
+
53
+ # This function mostly exists to make writing examples atomic.
54
+ def write_examples(self, examples: Iterable[Example]) -> None:
55
+ with self.config.examples_file.open("w", encoding="utf_8") as fp:
56
+ for example in examples:
57
+ fp.write(f"{example.model_dump_json()}\n")
58
+
59
+ def get_valid_posts(self) -> Generator[Post]:
60
+ for path in self.get_data_paths():
61
+ posts = list(self.get_valid_posts_from_path(path))
62
+ yield from posts[-self.config.post_limit :]
63
+
64
+ def get_valid_posts_from_path(self, path: Path) -> Generator[Post]:
65
+ pattern = re_compile("|".join(self.config.filtered_words), IGNORECASE)
66
+ with path.open("rb") as fp:
67
+ for line in fp:
68
+ post = Post.model_validate_json(line)
69
+ if post.valid_text_post() and not (post.trail and self.config.filtered_words and pattern.search(str(post))):
70
+ yield post
71
+
72
+ def filter_examples(self) -> None:
73
+ raw_examples = self.config.examples_file.read_bytes().splitlines()
74
+ old_examples = map(Example.model_validate_json, raw_examples)
75
+ new_examples: list[Example] = []
76
+ with PreviewLive() as live:
77
+ for batch in live.progress.track(
78
+ batched(old_examples, self.config.moderation_batch_size, strict=False),
79
+ ceil(len(raw_examples) / self.config.moderation_batch_size),
80
+ description="Removing flagged posts...",
81
+ ):
82
+ response = self.create_moderation_batch(tuple(map(Example.get_assistant_message, batch)))
83
+ new_examples.extend(example for example, moderation in zip(batch, response.results, strict=True) if not moderation.flagged)
84
+
85
+ self.write_examples(new_examples)
86
+
87
+ rich_print(f"[red]Removed {len(raw_examples) - len(new_examples)} posts.\n")
88
+
89
+ @retry(
90
+ stop=stop_after_attempt(10),
91
+ wait=wait_random_exponential(),
92
+ retry=retry_if_exception_type(RateLimitError),
93
+ before_sleep=lambda state: rich_print(f"[yellow]OpenAI rate limit exceeded. Waiting for {state.idle_for} seconds..."),
94
+ reraise=True,
95
+ )
96
+ def create_moderation_batch(self, api_input: str | SequenceNotStr[str] | Iterable[ModerationMultiModalInputParam]) -> ModerationCreateResponse:
97
+ return self.openai.moderations.create(input=api_input)
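
The retry decorator above combines a capped number of attempts with a randomized exponential wait and re-raises the last error. The general idea can be sketched with the standard library alone. This is not tenacity's implementation: `RateLimited`, `call_with_backoff`, and the `base`/`cap` defaults are hypothetical choices made for the illustration.

```python
import random
import time


class RateLimited(Exception):
    """Hypothetical stand-in for a rate-limit error such as openai.RateLimitError."""


def call_with_backoff(func, *, attempts=10, base=1.0, cap=60.0, sleep=time.sleep):
    """Call func(), retrying on RateLimited with randomized exponential backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except RateLimited:
            if attempt == attempts:
                raise  # out of attempts: surface the original error
            # Wait a random time in [0, min(cap, base * 2 ** (attempt - 1))].
            sleep(random.uniform(0, min(cap, base * 2 ** (attempt - 1))))


# Usage: fail twice, then succeed. Waits are recorded instead of slept.
calls = []
waits = []


def flaky():
    calls.append(1)
    if len(calls) < 3:
        raise RateLimited
    return "ok"


result = call_with_backoff(flaky, sleep=waits.append)
```

Passing `sleep` as a parameter keeps the sketch testable without real delays. The random jitter spreads retries out so that several clients hitting the same limit do not all retry at the same moment.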
@@ -3,9 +3,9 @@ from textwrap import dedent
3
3
  from time import sleep
4
4
  from typing import TYPE_CHECKING, override
5
5
 
6
- import rich
7
- from rich import progress
6
+ from rich import print as rich_print
8
7
  from rich.console import Console
8
+ from rich.progress import open as progress_open
9
9
  from rich.prompt import Confirm
10
10
  from tiktoken import encoding_for_model, get_encoding
11
11
 
@@ -21,7 +21,7 @@ if TYPE_CHECKING:
21
21
  class FineTuner(FlowClass):
22
22
  @staticmethod
23
23
  def dedent_print(text: str) -> None:
24
- rich.print(dedent(text).lstrip())
24
+ rich_print(dedent(text).lstrip())
25
25
 
26
26
  @override
27
27
  def main(self) -> None:
@@ -55,12 +55,12 @@ class FineTuner(FlowClass):
55
55
  if self.config.job_id:
56
56
  return self.poll_job_status()
57
57
 
58
- with progress.open(self.config.examples_file, "rb", description=f"Uploading [purple]{self.config.examples_file}[/]...") as fp:
58
+ with progress_open(self.config.examples_file, "rb", description=f"Uploading [purple]{self.config.examples_file}[/]...") as fp:
59
59
  file = self.openai.files.create(
60
60
  file=fp,
61
61
  purpose="fine-tune",
62
62
  )
63
- rich.print()
63
+ rich_print()
64
64
 
65
65
  job = self.openai.fine_tuning.jobs.create(
66
66
  model=self.config.base_model,
@@ -96,7 +96,7 @@ class FineTuner(FlowClass):
96
96
  if job.status != "succeeded":
97
97
  if Confirm.ask("[gray62]Delete uploaded examples file?", default=False):
98
98
  self.openai.files.delete(job.training_file)
99
- rich.print()
99
+ rich_print()
100
100
 
101
101
  if job.status == "failed" and job.error is not None:
102
102
  raise RuntimeError(job.error.message)
@@ -2,12 +2,12 @@ from functools import cache
2
2
  from random import choice, random, sample
3
3
  from typing import TYPE_CHECKING, override
4
4
 
5
- import rich
6
5
  from pydantic import ConfigDict
6
+ from rich import print as rich_print
7
7
  from rich.prompt import IntPrompt
8
8
 
9
9
  from tumblrbot.utils.common import FlowClass, PreviewLive
10
- from tumblrbot.utils.models import Post
10
+ from tumblrbot.utils.models import Block, Post
11
11
 
12
12
  if TYPE_CHECKING:
13
13
  from collections.abc import Iterable
@@ -32,7 +32,7 @@ class DraftGenerator(FlowClass):
32
32
  exception.add_note(f"📉 An error occurred! Generated {i} draft(s) before failing. {message}")
33
33
  raise
34
34
 
35
- rich.print(f":chart_increasing: [bold green]Generated {self.config.draft_count} draft(s).[/] {message}")
35
+ rich_print(f":chart_increasing: [bold green]Generated {self.config.draft_count} draft(s).[/] {message}")
36
36
 
37
37
  def generate_post(self) -> Post:
38
38
  if original := self.get_random_post():
@@ -48,7 +48,7 @@ class DraftGenerator(FlowClass):
48
48
  tags = tags.tags
49
49
 
50
50
  return Post(
51
- content=[Post.Block(type="text", text=text)],
51
+ content=[Block(type="text", text=text)],
52
52
  tags=tags or [],
53
53
  parent_tumblelog_uuid=original.blog.uuid,
54
54
  parent_post_id=original.id,
@@ -1,18 +1,21 @@
1
1
  from abc import abstractmethod
2
- from pathlib import Path
3
2
  from random import choice
4
- from typing import ClassVar, Self, override
3
+ from typing import TYPE_CHECKING, ClassVar, Self, override
5
4
 
6
- from openai import OpenAI
5
+ from openai import OpenAI # noqa: TC002
7
6
  from pydantic import ConfigDict
8
7
  from rich._spinners import SPINNERS
9
- from rich.console import RenderableType
10
8
  from rich.live import Live
11
9
  from rich.progress import MofNCompleteColumn, Progress, SpinnerColumn, TimeElapsedColumn
12
10
  from rich.table import Table
13
11
 
14
12
  from tumblrbot.utils.models import Config, FullyValidatedModel
15
- from tumblrbot.utils.tumblr import TumblrSession
13
+ from tumblrbot.utils.tumblr import TumblrSession # noqa: TC001
14
+
15
+ if TYPE_CHECKING:
16
+ from pathlib import Path
17
+
18
+ from rich.console import RenderableType
16
19
 
17
20
 
18
21
  class FlowClass(FullyValidatedModel):
@@ -1,17 +1,19 @@
1
- from collections.abc import Generator
2
1
  from getpass import getpass
3
2
  from pathlib import Path
4
- from typing import Annotated, Any, Literal, Self, override
3
+ from tomllib import loads
4
+ from typing import TYPE_CHECKING, Annotated, Any, Literal, Self, override
5
5
 
6
- import rich
7
- from openai.types import ChatModel
6
+ from openai.types import ChatModel # noqa: TC002
8
7
  from pydantic import BaseModel, ConfigDict, Field, NonNegativeFloat, NonNegativeInt, PlainSerializer, PositiveFloat, PositiveInt, model_validator
9
- from pydantic.json_schema import SkipJsonSchema
8
+ from pydantic.json_schema import SkipJsonSchema # noqa: TC002
10
9
  from requests_oauthlib import OAuth1Session
10
+ from rich import print as rich_print
11
11
  from rich.panel import Panel
12
12
  from rich.prompt import Prompt
13
13
  from tomlkit import comment, document, dumps # pyright: ignore[reportUnknownVariableType]
14
- from tomllib import loads
14
+
15
+ if TYPE_CHECKING:
16
+ from collections.abc import Generator
15
17
 
16
18
 
17
19
  class FullyValidatedModel(BaseModel):
@@ -58,7 +60,7 @@ class Config(FileSyncSettings):
58
60
 
59
61
  # Writing Examples
60
62
  post_limit: NonNegativeInt = Field(0, description="The number of the most recent posts from each blog that should be included in the training data.")
61
- max_moderation_batch_size: PositiveInt = Field(100, description="The number of posts, at most, to submit to the OpenAI moderation API. This is also capped by the API.")
63
+ moderation_batch_size: PositiveInt = Field(25, description="The number of posts at a time to submit to the OpenAI moderation API.")
62
64
  custom_prompts_file: Path = Field(Path("custom_prompts.jsonl"), description="Where to read in custom prompts from.")
63
65
  filtered_words: list[str] = Field([], description="A case-insensitive list of disallowed words used to filter out training data. Regular expressions are allowed, but must be escaped.")
64
66
 
@@ -80,7 +82,7 @@ class Config(FileSyncSettings):
80
82
 
81
83
  # Generating
82
84
  upload_blog_identifier: str = Field("", description="The identifier of the blog which generated drafts will be uploaded to. This must be a blog associated with the same account as the configured Tumblr secret tokens.")
83
- draft_count: PositiveInt = Field(150, description="The number of drafts to process. This will affect the number of tokens used with OpenAI")
85
+ draft_count: PositiveInt = Field(100, description="The number of drafts to process. This will affect the number of tokens used with OpenAI")
84
86
  tags_chance: NonNegativeFloat = Field(0.1, description="The chance to generate tags for any given post. This will use more OpenAI tokens.")
85
87
  tags_developer_message: str = Field("You will be provided with a block of text, and your task is to extract a very short list of the most important subjects from it.", description="The developer message used to generate tags.")
86
88
  reblog_blog_identifiers: list[str] = Field([], description="The identifiers of blogs that can be reblogged from when generating drafts.")
@@ -88,13 +90,15 @@ class Config(FileSyncSettings):
88
90
  reblog_user_message: str = Field("Please write a comical Tumblr post in response to the following post:\n\n{}", description="The format string for the user message used to reblog posts.")
89
91
 
90
92
  @override
91
- def model_post_init(self, _: object) -> None:
93
+ def model_post_init(self, context: object) -> None:
94
+ super().model_post_init(context)
95
+
92
96
  if not self.download_blog_identifiers:
93
- rich.print("Enter the [cyan]identifiers of your blogs[/] that data should be [bold purple]downloaded[/] from, separated by commas.")
97
+ rich_print("Enter the [cyan]identifiers of your blogs[/] that data should be [bold purple]downloaded[/] from, separated by commas.")
94
98
  self.download_blog_identifiers = list(map(str.strip, Prompt.ask("[bold][Example] [dim]staff.tumblr.com,changes").split(",")))
95
99
 
96
100
  if not self.upload_blog_identifier:
97
- rich.print("Enter the [cyan]identifier of your blog[/] that drafts should be [bold purple]uploaded[/] to.")
101
+ rich_print("Enter the [cyan]identifier of your blog[/] that drafts should be [bold purple]uploaded[/] to.")
98
102
  self.upload_blog_identifier = Prompt.ask("[bold][Example] [dim]staff.tumblr.com or changes").strip()
99
103
 
100
104
 
@@ -109,7 +113,9 @@ class Tokens(FileSyncSettings):
109
113
  tumblr: Tumblr = Tumblr()
110
114
 
111
115
  @override
112
- def model_post_init(self, _: object) -> None:
116
+ def model_post_init(self, context: object) -> None:
117
+ super().model_post_init(context)
118
+
113
119
  # Check if any tokens are missing or if the user wants to reset them, then set tokens if necessary.
114
120
  if not self.openai_api_key:
115
121
  (self.openai_api_key,) = self.online_token_prompt("https://platform.openai.com/api-keys", "API key")
@@ -124,8 +130,8 @@ class Tokens(FileSyncSettings):
124
130
  self.tumblr.client_key,
125
131
  self.tumblr.client_secret,
126
132
  ) as oauth_session:
127
- fetch_response = oauth_session.fetch_request_token("http://tumblr.com/oauth/request_token")
128
- full_authorize_url = oauth_session.authorization_url("http://tumblr.com/oauth/authorize")
133
+ fetch_response = oauth_session.fetch_request_token("http://tumblr.com/oauth/request_token") # pyright: ignore[reportUnknownMemberType]
134
+ full_authorize_url = oauth_session.authorization_url("http://tumblr.com/oauth/authorize") # pyright: ignore[reportUnknownMemberType]
129
135
  (redirect_response,) = self.online_token_prompt(full_authorize_url, "full redirect URL")
130
136
  oauth_response = oauth_session.parse_authorization_response(redirect_response)
131
137
 
@@ -135,7 +141,7 @@ class Tokens(FileSyncSettings):
135
141
  *self.get_oauth_tokens(fetch_response),
136
142
  verifier=oauth_response["oauth_verifier"],
137
143
  ) as oauth_session:
138
- oauth_tokens = oauth_session.fetch_access_token("http://tumblr.com/oauth/access_token")
144
+ oauth_tokens = oauth_session.fetch_access_token("http://tumblr.com/oauth/access_token") # pyright: ignore[reportUnknownMemberType]
139
145
 
140
146
  self.tumblr.resource_owner_key, self.tumblr.resource_owner_secret = self.get_oauth_tokens(oauth_tokens)
141
147
 
@@ -143,11 +149,11 @@ class Tokens(FileSyncSettings):
143
149
  def online_token_prompt(url: str, *tokens: str) -> Generator[str]:
144
150
  formatted_token_string = " and ".join(f"[cyan]{token}[/]" for token in tokens)
145
151
 
146
- rich.print(f"Retrieve your {formatted_token_string} from: {url}")
152
+ rich_print(f"Retrieve your {formatted_token_string} from: {url}")
147
153
  for token in tokens:
148
154
  yield getpass(f"Enter your {token} (masked): ", echo_char="*").strip()
149
155
 
150
- rich.print()
156
+ rich_print()
151
157
 
152
158
  @staticmethod
153
159
  def get_oauth_tokens(token: dict[str, str]) -> tuple[str, str]:
@@ -160,20 +166,22 @@ class Blog(FullyValidatedModel):
160
166
  uuid: str = ""
161
167
 
162
168
 
163
- class ResponseModel(FullyValidatedModel):
164
- class Response(FullyValidatedModel):
165
- blog: Blog = Blog()
166
- posts: list[Any] = []
169
+ class Response(FullyValidatedModel):
170
+ blog: Blog = Blog()
171
+ posts: list[Any] = []
167
172
 
173
+
174
+ class ResponseModel(FullyValidatedModel):
168
175
  response: Response
169
176
 
170
177
 
171
- class Post(FullyValidatedModel):
172
- class Block(FullyValidatedModel):
173
- type: str = ""
174
- text: str = ""
175
- blocks: list[int] = []
178
+ class Block(FullyValidatedModel):
179
+ type: str = ""
180
+ text: str = ""
181
+ blocks: list[int] = []
182
+
176
183
 
184
+ class Post(FullyValidatedModel):
177
185
  blog: SkipJsonSchema[Blog] = Blog()
178
186
  id: SkipJsonSchema[int] = 0
179
187
  parent_tumblelog_uuid: SkipJsonSchema[str] = ""
@@ -212,9 +220,17 @@ class Post(FullyValidatedModel):
212
220
  return bool(self.content) and all(block.type == "text" for block in self.content) and not (self.is_submission or any(block.type == "ask" for block in self.layout))
213
221
 
214
222
 
215
- class Example(FullyValidatedModel):
216
- class Message(FullyValidatedModel):
217
- role: Literal["developer", "user", "assistant"]
218
- content: str
223
+ class Message(FullyValidatedModel):
224
+ role: Literal["developer", "user", "assistant"]
225
+ content: str
219
226
 
227
+
228
+ class Example(FullyValidatedModel):
220
229
  messages: list[Message]
230
+
231
+ def get_assistant_message(self) -> str:
232
+ for message in self.messages:
233
+ if message.role == "assistant":
234
+ return message.content
235
+ msg = "Assistant message not found!"
236
+ raise ValueError(msg)
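
The new `get_assistant_message` helper is what lets `filter_examples` send only the post text to moderation while keeping the full example. A minimal sketch of that pairing follows, using plain dataclasses as hypothetical stand-ins for the pydantic models and a hard-coded `flagged` list in place of a real moderation response.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str
    content: str


@dataclass
class Example:
    messages: list

    def get_assistant_message(self) -> str:
        # Return the first assistant message, as the helper in the diff does.
        for message in self.messages:
            if message.role == "assistant":
                return message.content
        raise ValueError("Assistant message not found!")


examples = [
    Example([Message("user", "hi"), Message("assistant", "first post")]),
    Example([Message("user", "hi"), Message("assistant", "second post")]),
]

# Only the assistant text would be submitted for moderation.
texts = [example.get_assistant_message() for example in examples]

# Pretend moderation flagged only the second text (hypothetical result).
flagged = [False, True]
kept = [example for example, is_flagged in zip(examples, flagged) if not is_flagged]
```

The diff itself pairs results with `zip(..., strict=True)`, which additionally raises if the API returns a different number of results than inputs; plain `zip` is used here only to keep the sketch runnable on older Python versions.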
@@ -2,13 +2,23 @@ from typing import Self
2
2
 
3
3
  from requests import HTTPError, Response
4
4
  from requests_oauthlib import OAuth1Session
5
+ from rich import print as rich_print
6
+ from tenacity import retry, retry_if_exception_message, stop_after_attempt, wait_random_exponential
5
7
 
6
8
  from tumblrbot.utils.models import Post, ResponseModel, Tokens
7
9
 
10
+ rate_limit_retry = retry(
11
+ stop=stop_after_attempt(10),
12
+ wait=wait_random_exponential(min=60),
13
+ retry=retry_if_exception_message(match="429 Client Error: Limit Exceeded for url: .+"),
14
+ before_sleep=lambda state: rich_print(f"[yellow]Tumblr rate limit exceeded. Waiting for {state.idle_for} seconds..."),
15
+ reraise=True,
16
+ )
17
+
8
18
 
9
19
  class TumblrSession(OAuth1Session):
10
20
  def __init__(self, tokens: Tokens) -> None:
11
- super().__init__(**tokens.tumblr.model_dump())
21
+ super().__init__(**tokens.tumblr.model_dump()) # pyright: ignore[reportUnknownMemberType]
12
22
  self.hooks["response"].append(self.response_hook)
13
23
 
14
24
  def __enter__(self) -> Self:
@@ -22,10 +32,12 @@ class TumblrSession(OAuth1Session):
22
32
  error.add_note(response.text)
23
33
  raise
24
34
 
35
+ @rate_limit_retry
25
36
  def retrieve_blog_info(self, blog_identifier: str) -> ResponseModel:
26
37
  response = self.get(f"https://api.tumblr.com/v2/blog/{blog_identifier}/info")
27
38
  return ResponseModel.model_validate_json(response.text)
28
39
 
40
+ @rate_limit_retry
29
41
  def retrieve_published_posts(
30
42
  self,
31
43
  blog_identifier: str,
@@ -43,6 +55,7 @@ class TumblrSession(OAuth1Session):
43
55
  )
44
56
  return ResponseModel.model_validate_json(response.text)
45
57
 
58
+ @rate_limit_retry
46
59
  def create_post(self, blog_identifier: str, post: Post) -> ResponseModel:
47
60
  response = self.post(
48
61
  f"https://api.tumblr.com/v2/blog/{blog_identifier}/posts",
@@ -1,100 +0,0 @@
1
- import re
2
- from itertools import batched
3
- from json import loads
4
- from math import ceil
5
- from re import search
6
- from typing import IO, TYPE_CHECKING, override
7
-
8
- import rich
9
- from openai import BadRequestError
10
-
11
- from tumblrbot.utils.common import FlowClass, PreviewLive
12
- from tumblrbot.utils.models import Example, Post
13
-
14
- if TYPE_CHECKING:
15
- from collections.abc import Generator
16
- from pathlib import Path
17
-
18
-
19
- class ExamplesWriter(FlowClass):
20
- @override
21
- def main(self) -> None:
22
- self.config.examples_file.parent.mkdir(parents=True, exist_ok=True)
23
-
24
- with self.config.examples_file.open("w", encoding="utf_8") as fp:
25
- for user_message, assistant_response in self.get_custom_prompts():
26
- self.write_example(
27
- user_message,
28
- assistant_response,
29
- fp,
30
- )
31
-
32
- for post in self.get_valid_posts():
33
- self.write_example(
34
- self.config.user_message,
35
- str(post),
36
- fp,
37
- )
38
-
39
- rich.print(f"[bold]The examples file can be found at: '{self.config.examples_file}'\n")
40
-
41
- def write_example(self, user_message: str, assistant_message: str, fp: IO[str]) -> None:
42
- example = Example(
43
- messages=[
44
- Example.Message(role="developer", content=self.config.developer_message),
45
- Example.Message(role="user", content=user_message),
46
- Example.Message(role="assistant", content=assistant_message),
47
- ],
48
- )
49
- fp.write(f"{example.model_dump_json()}\n")
50
-
51
- def get_custom_prompts(self) -> Generator[tuple[str, str]]:
52
- self.config.custom_prompts_file.parent.mkdir(parents=True, exist_ok=True)
53
- self.config.custom_prompts_file.touch(exist_ok=True)
54
-
55
- with self.config.custom_prompts_file.open("rb") as fp:
56
- for line in fp:
57
- data: dict[str, str] = loads(line)
58
- yield from data.items()
59
-
60
- def get_valid_posts(self) -> Generator[Post]:
61
- for path in self.get_data_paths():
62
- posts = list(self.get_valid_posts_from_path(path))
63
- yield from posts[-self.config.post_limit :]
64
-
65
- def get_valid_posts_from_path(self, path: Path) -> Generator[Post]:
66
- pattern = re.compile("|".join(self.config.filtered_words), re.IGNORECASE)
67
- with path.open("rb") as fp:
68
- for line in fp:
69
- post = Post.model_validate_json(line)
70
- if post.valid_text_post() and not (post.trail and self.config.filtered_words and pattern.search(str(post))):
71
- yield post
72
-
73
- def filter_examples(self) -> None:
74
- examples = self.config.examples_file.read_text("utf_8").splitlines()
75
- with self.config.examples_file.open("w", encoding="utf_8") as fp:
76
- batch_size = self.get_moderation_batch_size()
77
- removed = 0
78
-
79
- with PreviewLive() as live:
80
- for batch in live.progress.track(
81
- batched(examples, batch_size, strict=False),
82
- ceil(len(examples) / batch_size),
83
- description="Removing flagged posts...",
84
- ):
85
- response = self.openai.moderations.create(input=list(batch))
86
- for example, moderation in zip(batch, response.results, strict=True):
87
- if moderation.flagged:
88
- removed += 1
89
- else:
90
- fp.write(f"{example}\n")
91
- rich.print(f"[red]Removed {removed} posts.\n")
92
-
93
- def get_moderation_batch_size(self) -> int:
94
- try:
95
- self.openai.moderations.create(input=[""] * self.config.max_moderation_batch_size)
96
- except BadRequestError as error:
97
- message = error.response.json()["error"]["message"]
98
- if match := search(r"(\d+)\.", message):
99
- return int(match.group(1))
100
- return self.config.max_moderation_batch_size
File without changes
File without changes