tumblrbot 1.4.3__tar.gz → 1.4.5__tar.gz

@@ -1,7 +1,8 @@
1
1
  # Custom
2
+ .vscode
2
3
  data
3
4
  *.toml
4
- *.json*
5
+ *.jsonl
5
6
 
6
7
  # Byte-compiled / optimized / DLL files
7
8
  __pycache__/
@@ -1,6 +1,6 @@
1
1
  Metadata-Version: 2.4
2
2
  Name: tumblrbot
3
- Version: 1.4.3
3
+ Version: 1.4.5
4
4
  Summary: An updated bot that posts to Tumblr, based on your very own blog!
5
5
  Requires-Python: >= 3.13
6
6
  Description-Content-Type: text/markdown
@@ -12,35 +12,41 @@ Requires-Dist: openai
12
12
  Requires-Dist: pwinput
13
13
  Requires-Dist: pydantic
14
14
  Requires-Dist: pydantic-settings
15
- Requires-Dist: requests
16
- Requires-Dist: requests-cache
17
15
  Requires-Dist: requests-oauthlib
18
16
  Requires-Dist: rich
19
17
  Requires-Dist: tiktoken
20
18
  Requires-Dist: tomlkit
21
19
  Project-URL: Source, https://github.com/MaidThatPrograms/tumblrbot
22
20
 
21
+ # tumblrbot
22
+
23
23
  [OAuth]: https://oauth.net/1
24
- [OpenAI]: https://pypi.org/project/openai
25
24
  [Python]: https://python.org/download
26
- [Tumblr]: https://tumblr.com
27
25
 
26
+ [JSON Lines]: https://jsonlines.org
27
+ [JSON Lines Validator]: https://jsonlines.org/validator
28
+
29
+ [pip]: https://pypi.org
28
30
  [keyring]: https://pypi.org/project/keyring
29
31
  [Rich]: https://pypi.org/project/rich
30
32
 
33
+ [OpenAI]: https://pypi.org/project/openai
34
+ [OpenAI Pricing]: https://platform.openai.com/docs/pricing#fine-tuning
35
+ [OpenAI Tokens]: https://platform.openai.com/settings/organization/api-keys
36
+ [Fine-Tuning Portal]: https://platform.openai.com/finetune
31
37
  [Moderation API]: https://platform.openai.com/docs/api-reference/moderations
32
- [pip]: https://pypi.org
38
+
39
+ [Tumblr]: https://tumblr.com
40
+ [Tumblr Tokens]: https://tumblr.com/oauth/apps
33
41
 
34
42
  [Download]: src/tumblrbot/flow/download.py
35
43
  [Examples]: src/tumblrbot/flow/examples.py
36
44
  [Fine-Tune]: src/tumblrbot/flow/fine_tune.py
37
45
  [Generate]: src/tumblrbot/flow/generate.py
38
46
  [Main]: src/tumblrbot/__main__.py
39
- [README.md]: README.md
40
47
 
41
- [config]: #configuration
42
-
43
- # tumblrbot
48
+ [Config]: #configuration
49
+ [Fine-Tuning]: #manual-fine-tuning
44
50
  [![PyPI - Version](https://img.shields.io/pypi/v/tumblrbot)](https://python.org/pypi/tumblrbot)
45
51
 
46
52
  Description of original project:
@@ -49,6 +55,7 @@ Description of original project:
49
55
  This fork is largely a rewrite of the source code with similarities in its structure and process.
50
56
 
51
57
  Features:
58
+
52
59
  - An [interactive console][Main] for all steps of generating posts for the blog:
53
60
  1. Asks for [OpenAI] and [Tumblr] tokens.
54
61
  - Stores API tokens using [keyring].
@@ -75,14 +82,18 @@ Features:
75
82
  - Automatically keeps the [config] file up-to-date and recreates it if missing.
76
83
 
77
84
  **To-Do:**
85
+
78
86
  - Add code documentation.
79
- - Fix inaccurate post counts when downloading posts.
80
- - Fix file not found error when starting fine-tuning.
81
87
 
88
+ **Known Issues:**
89
+
90
+ - Starting fine-tuning sometimes fails with an error that the training file was not found. We do not currently have a fix or workaround; if this keeps happening, use the online portal for fine-tuning instead. Read more in [fine-tuning].
91
+ - Post counts are inaccurate when downloading posts. We are not certain of the cause, but our tests suggest the [Tumblr] API itself returns inaccurate numbers.
82
92
 
83
93
  **Please submit an issue or contact us for features you want added/reimplemented.**
84
94
 
85
95
  ## Installation
96
+
86
97
  1. Install the latest version of [Python]:
87
98
  - Windows: `winget install python3`
88
99
  - Linux (apt): `apt install python3-pip`
@@ -93,17 +104,23 @@ Features:
93
104
  - See [keyring] for additional requirements if you are not on Windows.
94
105
 
95
106
  ## Usage
96
- Run `tumblrbot` from anywhere. Run `tumblrbot --help` for command-line options. Every command-line option corresponds to a value from the [config](#configuration).
107
+
108
+ Run `tumblrbot` from anywhere. Run `tumblrbot --help` for command-line options. Every command-line option corresponds to a value from the [config].
97
109
 
98
110
  ## Obtaining Tokens
111
+
99
112
  ### OpenAI
100
- API token can be created [here](https://platform.openai.com/settings/organization/api-keys).
113
+
114
+ An API token can be created here: [OpenAI Tokens].
115
+
101
116
  1. Leave everything at the defaults and set `Project` to `Default Project`.
102
117
  1. Press `Create secret key`.
103
118
  1. Press `Copy` to copy the API token to your clipboard.
104
119
 
105
120
  ### Tumblr
106
- API tokens can be created [here](https://tumblr.com/oauth/apps).
121
+
122
+ API tokens can be created here: [Tumblr Tokens].
123
+
107
124
  1. Press `+ Register Application`.
108
125
  1. Enter anything for `Application Name` and `Application Description`.
109
126
  1. Enter any URL for `Application Website` and `Default callback URL`, like `https://example.com`.
@@ -118,20 +135,38 @@ When running this program, you will be prompted to enter all of these tokens. **
118
135
  After inputting the [Tumblr] tokens, you will be given a URL that you need to open in your browser. Press `Allow`, then copy and paste the URL of the page you are redirected to into the console.
119
136
 
120
137
  ## Configuration
138
+
121
139
  All config options can be found in `config.toml` after running the program once. This will be kept up-to-date if there are changes to the config's format in a future update. This also means it may be worthwhile to double-check the config file after an update. Any changes to the config should be in the changelog for a given version.
122
140
 
123
141
  All file options can include directories that will be created when the program is run.
124
142
 
125
- - `custom_prompts_file` You will have to create this file yourself. It should follow the following format:
143
+ - **`custom_prompts_file`** - This file should follow the format below:
144
+
126
145
  ```jsonl
127
- {"user message 1": "assistant response 1",
128
- "user message 2": "assistant response 2"}
146
+ {"user message 1": "assistant response 1"}
147
+ {"user message 1": "assistant response 1"}
148
+ {"user message 2": "assistant response 2", "user message 3": "assistant response 3"}
129
149
  ```
150
+
151
+ To be specific, it should follow the [JSON Lines] file format, with one collection of name/value pairs (a dictionary) per line. You can validate your file using the [JSON Lines Validator], or locally with the sketch at the end of this section.
152
+
130
153
  - **`developer_message`** - This message is used for fine-tuning the AI as well as generating prompts. If you change this, you will need to run the fine-tuning again with the new value before generating posts.
131
154
  - **`user_message`** - This message is used in the same way as `developer_message` and should be treated the same.
132
- - **`expected_epochs`** - The default value here is the default number of epochs for `base_model`. You may have to change this value if you change `base_model`. After running fine-tuning once, you will see the number of epochs used in the [fine-tuning portal](https://platform.openai.com/finetune) under *Hyperparameters*. This value will also be updated automatically if you run fine-tuning through this program.
133
- - **`token_price`** - The default value here is the default token price for `base_model`. You can find the up-to-date value [here](https://platform.openai.com/docs/pricing#fine-tuning), in the *Training* column.
134
- - **`job_id`** - If there is any value here, this program will resume monitoring the corresponding job, instead of starting a new one. This gets set when starting the fine-tuning and is cleared when it is completed. You can find job IDs in the [fine-tuning portal](https://platform.openai.com/finetune).
135
- - **`base_model`** - This value is used to choose the tokenizer for estimating fine-tuning costs. It is also the base model that will be fine-tuned and the model that is used to generate tags. You can find a list of options in the [fine-tuning portal](https://platform.openai.com/finetune) by pressing *+ Create* and opening the drop-down list for *Base Model*. Be sure to update `token_price` if you change this value.
155
+ - **`expected_epochs`** - The default value here is the default number of epochs for `base_model`. You may have to change this value if you change `base_model`. After running fine-tuning once, you will see the number of epochs used in the [fine-tuning portal] under *Hyperparameters*. This value will also be updated automatically if you run fine-tuning through this program.
156
+ - **`token_price`** - The default value here is the default token price for `base_model`. You can find the up-to-date value in [OpenAI Pricing], in the *Training* column. This is the price per 1M training tokens: for example, at $3.00 per 1M tokens, a 500,000-token job costs about $1.50.
157
+ - **`job_id`** - If there is any value here, this program will resume monitoring the corresponding job, instead of starting a new one. This gets set when starting the fine-tuning and is cleared when it is completed. You can read more in [fine-tuning].
158
+ - **`base_model`** - This value is used to choose the tokenizer for estimating fine-tuning costs. It is also the base model that will be fine-tuned and the model that is used to generate tags. You can find a list of options in the [fine-tuning portal] by pressing `+ Create` and opening the drop-down list for `Base Model`. Be sure to update `token_price` if you change this value.
159
+ - **`fine_tuned_model`** - This is set automatically when a monitored fine-tuning job succeeds. You can read more in [fine-tuning].
136
160
  - **`tags_chance`** - This should be between 0 and 1. Setting it to 0 corresponds to a 0% chance (never) to add tags to a post. 1 corresponds to a 100% chance (always) to add tags to a post. Adding tags incurs a very small token cost.
137
161
 
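+ As a rough local check, the minimal sketch below validates a custom prompts file line by line. It is an illustration only, not part of this package, and it assumes the default `custom_prompts.jsonl` path from the [config].
+
+ ```python
+ # Hypothetical local validator for the custom prompts file.
+ # Assumes the default custom_prompts.jsonl path; adjust to match your config.
+ import json
+
+ with open("custom_prompts.jsonl", encoding="utf_8") as fp:
+     for number, line in enumerate(fp, start=1):
+         if not line.strip():
+             continue  # skip blank lines
+         data = json.loads(line)  # raises json.JSONDecodeError on malformed JSON
+         assert isinstance(data, dict), f"line {number} is not a dictionary"
+         for prompt, response in data.items():
+             assert isinstance(prompt, str) and isinstance(response, str), f"line {number} has non-string entries"
+ print("custom prompts file looks valid")
+ ```
+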
162
+ ## Manual Fine-Tuning
163
+
164
+ You can manually upload the examples file to [OpenAI] and start fine-tuning here: [fine-tuning portal].
165
+
166
+ 1. Press `+ Create`.
167
+ 1. Select the desired `Base Model` from the dropdown. This should ideally match the model set in the [config].
168
+ 1. Upload the generated examples file to the section under `Training data`. You can find the path for this in the [config].
169
+ 1. Press `Create`.
170
+ 1. (Optional) Copy the value next to `Job ID` and paste it into the [config] under `job_id`. You can then run the program and monitor its progress as usual.
171
+ 1. Otherwise, once the job is complete, copy the value next to `Output model` and paste it into the [config] under `fine_tuned_model`.
172
+
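+ If you would rather script these steps, the official OpenAI Python SDK can upload the examples file and create the job. The sketch below is an illustration, not part of this package; the file path and model name are placeholders, so match them to your [config].
+
+ ```python
+ # Sketch: start a fine-tuning job with the OpenAI Python SDK.
+ # Assumes OPENAI_API_KEY is set; the path and model below are placeholders.
+ from openai import OpenAI
+
+ client = OpenAI()
+
+ # Upload the generated examples file for fine-tuning.
+ with open("examples.jsonl", "rb") as fp:
+     training_file = client.files.create(file=fp, purpose="fine-tune")
+
+ # Create the job on your chosen base model.
+ job = client.fine_tuning.jobs.create(
+     training_file=training_file.id,
+     model="gpt-4o-mini-2024-07-18",  # placeholder; use the base_model from your config
+ )
+ print(job.id)  # paste into the config under job_id to monitor it with this program
+ ```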
@@ -1,24 +1,32 @@
1
+ # tumblrbot
2
+
1
3
  [OAuth]: https://oauth.net/1
2
- [OpenAI]: https://pypi.org/project/openai
3
4
  [Python]: https://python.org/download
4
- [Tumblr]: https://tumblr.com
5
5
 
6
+ [JSON Lines]: https://jsonlines.org
7
+ [JSON Lines Validator]: https://jsonlines.org/validator
8
+
9
+ [pip]: https://pypi.org
6
10
  [keyring]: https://pypi.org/project/keyring
7
11
  [Rich]: https://pypi.org/project/rich
8
12
 
13
+ [OpenAI]: https://pypi.org/project/openai
14
+ [OpenAI Pricing]: https://platform.openai.com/docs/pricing#fine-tuning
15
+ [OpenAI Tokens]: https://platform.openai.com/settings/organization/api-keys
16
+ [Fine-Tuning Portal]: https://platform.openai.com/finetune
9
17
  [Moderation API]: https://platform.openai.com/docs/api-reference/moderations
10
- [pip]: https://pypi.org
18
+
19
+ [Tumblr]: https://tumblr.com
20
+ [Tumblr Tokens]: https://tumblr.com/oauth/apps
11
21
 
12
22
  [Download]: src/tumblrbot/flow/download.py
13
23
  [Examples]: src/tumblrbot/flow/examples.py
14
24
  [Fine-Tune]: src/tumblrbot/flow/fine_tune.py
15
25
  [Generate]: src/tumblrbot/flow/generate.py
16
26
  [Main]: src/tumblrbot/__main__.py
17
- [README.md]: README.md
18
27
 
19
- [config]: #configuration
20
-
21
- # tumblrbot
28
+ [Config]: #configuration
29
+ [Fine-Tuning]: #manual-fine-tuning
22
30
  [![PyPI - Version](https://img.shields.io/pypi/v/tumblrbot)](https://python.org/pypi/tumblrbot)
23
31
 
24
32
  Description of original project:
@@ -27,6 +35,7 @@ Description of original project:
27
35
  This fork is largely a rewrite of the source code with similarities in its structure and process.
28
36
 
29
37
  Features:
38
+
30
39
  - An [interactive console][Main] for all steps of generating posts for the blog:
31
40
  1. Asks for [OpenAI] and [Tumblr] tokens.
32
41
  - Stores API tokens using [keyring].
@@ -53,14 +62,18 @@ Features:
53
62
  - Automatically keeps the [config] file up-to-date and recreates it if missing.
54
63
 
55
64
  **To-Do:**
65
+
56
66
  - Add code documentation.
57
- - Fix inaccurate post counts when downloading posts.
58
- - Fix file not found error when starting fine-tuning.
59
67
 
68
+ **Known Issues:**
69
+
70
+ - Starting fine-tuning sometimes fails with an error that the training file was not found. We do not currently have a fix or workaround; if this keeps happening, use the online portal for fine-tuning instead. Read more in [fine-tuning].
71
+ - Post counts are inaccurate when downloading posts. We are not certain of the cause, but our tests suggest the [Tumblr] API itself returns inaccurate numbers.
60
72
 
61
73
  **Please submit an issue or contact us for features you want added/reimplemented.**
62
74
 
63
75
  ## Installation
76
+
64
77
  1. Install the latest version of [Python]:
65
78
  - Windows: `winget install python3`
66
79
  - Linux (apt): `apt install python3-pip`
@@ -71,17 +84,23 @@ Features:
71
84
  - See [keyring] for additional requirements if you are not on Windows.
72
85
 
73
86
  ## Usage
74
- Run `tumblrbot` from anywhere. Run `tumblrbot --help` for command-line options. Every command-line option corresponds to a value from the [config](#configuration).
87
+
88
+ Run `tumblrbot` from anywhere. Run `tumblrbot --help` for command-line options. Every command-line option corresponds to a value from the [config].
75
89
 
76
90
  ## Obtaining Tokens
91
+
77
92
  ### OpenAI
78
- API token can be created [here](https://platform.openai.com/settings/organization/api-keys).
93
+
94
+ An API token can be created here: [OpenAI Tokens].
95
+
79
96
  1. Leave everything at the defaults and set `Project` to `Default Project`.
80
97
  1. Press `Create secret key`.
81
98
  1. Press `Copy` to copy the API token to your clipboard.
82
99
 
83
100
  ### Tumblr
84
- API tokens can be created [here](https://tumblr.com/oauth/apps).
101
+
102
+ API tokens can be created here: [Tumblr Tokens].
103
+
85
104
  1. Press `+ Register Application`.
86
105
  1. Enter anything for `Application Name` and `Application Description`.
87
106
  1. Enter any URL for `Application Website` and `Default callback URL`, like `https://example.com`.
@@ -96,19 +115,37 @@ When running this program, you will be prompted to enter all of these tokens. **
96
115
  After inputting the [Tumblr] tokens, you will be given a URL that you need to open in your browser. Press `Allow`, then copy and paste the URL of the page you are redirected to into the console.
97
116
 
98
117
  ## Configuration
118
+
99
119
  All config options can be found in `config.toml` after running the program once. This will be kept up-to-date if there are changes to the config's format in a future update. This also means it may be worthwhile to double-check the config file after an update. Any changes to the config should be in the changelog for a given version.
100
120
 
101
121
  All file options can include directories that will be created when the program is run.
102
122
 
103
- - `custom_prompts_file` You will have to create this file yourself. It should follow the following format:
123
+ - **`custom_prompts_file`** - This file should follow the format below:
124
+
104
125
  ```jsonl
105
- {"user message 1": "assistant response 1",
106
- "user message 2": "assistant response 2"}
126
+ {"user message 1": "assistant response 1"}
127
+ {"user message 1": "assistant response 1"}
128
+ {"user message 2": "assistant response 2", "user message 3": "assistant response 3"}
107
129
  ```
130
+
131
+ To be specific, it should follow the [JSON Lines] file format, with one collection of name/value pairs (a dictionary) per line. You can validate your file using the [JSON Lines Validator], or locally with the sketch at the end of this section.
132
+
108
133
  - **`developer_message`** - This message is used for fine-tuning the AI as well as generating prompts. If you change this, you will need to run the fine-tuning again with the new value before generating posts.
109
134
  - **`user_message`** - This message is used in the same way as `developer_message` and should be treated the same.
110
- - **`expected_epochs`** - The default value here is the default number of epochs for `base_model`. You may have to change this value if you change `base_model`. After running fine-tuning once, you will see the number of epochs used in the [fine-tuning portal](https://platform.openai.com/finetune) under *Hyperparameters*. This value will also be updated automatically if you run fine-tuning through this program.
111
- - **`token_price`** - The default value here is the default token price for `base_model`. You can find the up-to-date value [here](https://platform.openai.com/docs/pricing#fine-tuning), in the *Training* column.
112
- - **`job_id`** - If there is any value here, this program will resume monitoring the corresponding job, instead of starting a new one. This gets set when starting the fine-tuning and is cleared when it is completed. You can find job IDs in the [fine-tuning portal](https://platform.openai.com/finetune).
113
- - **`base_model`** - This value is used to choose the tokenizer for estimating fine-tuning costs. It is also the base model that will be fine-tuned and the model that is used to generate tags. You can find a list of options in the [fine-tuning portal](https://platform.openai.com/finetune) by pressing *+ Create* and opening the drop-down list for *Base Model*. Be sure to update `token_price` if you change this value.
135
+ - **`expected_epochs`** - The default value here is the default number of epochs for `base_model`. You may have to change this value if you change `base_model`. After running fine-tuning once, you will see the number of epochs used in the [fine-tuning portal] under *Hyperparameters*. This value will also be updated automatically if you run fine-tuning through this program.
136
+ - **`token_price`** - The default value here is the default token price for `base_model`. You can find the up-to-date value in [OpenAI Pricing], in the *Training* column. This is the price per 1M training tokens: for example, at $3.00 per 1M tokens, a 500,000-token job costs about $1.50.
137
+ - **`job_id`** - If there is any value here, this program will resume monitoring the corresponding job, instead of starting a new one. This gets set when starting the fine-tuning and is cleared when it is completed. You can read more in [fine-tuning].
138
+ - **`base_model`** - This value is used to choose the tokenizer for estimating fine-tuning costs. It is also the base model that will be fine-tuned and the model that is used to generate tags. You can find a list of options in the [fine-tuning portal] by pressing `+ Create` and opening the drop-down list for `Base Model`. Be sure to update `token_price` if you change this value.
139
+ - **`fine_tuned_model`** - This is set automatically when a monitored fine-tuning job succeeds. You can read more in [fine-tuning].
114
140
  - **`tags_chance`** - This should be between 0 and 1. Setting it to 0 corresponds to a 0% chance (never) to add tags to a post. 1 corresponds to a 100% chance (always) to add tags to a post. Adding tags incurs a very small token cost.
141
+
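+ As a rough local check, the minimal sketch below validates a custom prompts file line by line. It is an illustration only, not part of this package, and it assumes the default `custom_prompts.jsonl` path from the [config].
+
+ ```python
+ # Hypothetical local validator for the custom prompts file.
+ # Assumes the default custom_prompts.jsonl path; adjust to match your config.
+ import json
+
+ with open("custom_prompts.jsonl", encoding="utf_8") as fp:
+     for number, line in enumerate(fp, start=1):
+         if not line.strip():
+             continue  # skip blank lines
+         data = json.loads(line)  # raises json.JSONDecodeError on malformed JSON
+         assert isinstance(data, dict), f"line {number} is not a dictionary"
+         for prompt, response in data.items():
+             assert isinstance(prompt, str) and isinstance(response, str), f"line {number} has non-string entries"
+ print("custom prompts file looks valid")
+ ```
+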
142
+ ## Manual Fine-Tuning
143
+
144
+ You can manually upload the examples file to [OpenAI] and start fine-tuning here: [fine-tuning portal].
145
+
146
+ 1. Press `+ Create`.
147
+ 1. Select the desired `Base Model` from the dropdown. This should ideally match the model set in the [config].
148
+ 1. Upload the generated examples file to the section under `Training data`. You can find the path for this in the [config].
149
+ 1. Press `Create`.
150
+ 1. (Optional) Copy the value next to `Job ID` and paste it into the [config] under `job_id`. You can then run the program and monitor its progress as usual.
151
+ 1. Otherwise, once the job is complete, copy the value next to `Output model` and paste it into the [config] under `fine_tuned_model`.
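+
+ If you would rather script these steps, the official OpenAI Python SDK can upload the examples file and create the job. The sketch below is an illustration, not part of this package; the file path and model name are placeholders, so match them to your [config].
+
+ ```python
+ # Sketch: start a fine-tuning job with the OpenAI Python SDK.
+ # Assumes OPENAI_API_KEY is set; the path and model below are placeholders.
+ from openai import OpenAI
+
+ client = OpenAI()
+
+ # Upload the generated examples file for fine-tuning.
+ with open("examples.jsonl", "rb") as fp:
+     training_file = client.files.create(file=fp, purpose="fine-tune")
+
+ # Create the job on your chosen base model.
+ job = client.fine_tuning.jobs.create(
+     training_file=training_file.id,
+     model="gpt-4o-mini-2024-07-18",  # placeholder; use the base_model from your config
+ )
+ print(job.id)  # paste into the config under job_id to monitor it with this program
+ ```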
@@ -1,6 +1,6 @@
1
1
  [project]
2
2
  name = "tumblrbot"
3
- version = "1.4.3"
3
+ version = "1.4.5"
4
4
  description = "An updated bot that posts to Tumblr, based on your very own blog!"
5
5
  readme = "README.md"
6
6
  requires-python = ">= 3.13"
@@ -13,8 +13,6 @@ dependencies = [
13
13
  "pwinput",
14
14
  "pydantic",
15
15
  "pydantic-settings",
16
- "requests",
17
- "requests-cache",
18
16
  "requests-oauthlib",
19
17
  "rich",
20
18
  "tiktoken",
@@ -19,22 +19,18 @@ def main() -> None:
19
19
  OpenAI(api_key=tokens.openai_api_key.get_secret_value(), http_client=DefaultHttpxClient(http2=True)) as openai,
20
20
  TumblrSession(tokens=tokens) as tumblr,
21
21
  ):
22
- post_downloader = PostDownloader(openai, tumblr)
23
22
  if Confirm.ask("Download latest posts?", default=False):
24
- post_downloader.download()
25
- download_paths = post_downloader.get_data_paths()
23
+ PostDownloader(openai=openai, tumblr=tumblr).main()
26
24
 
27
- examples_writer = ExamplesWriter(openai, tumblr, download_paths)
28
25
  if Confirm.ask("Create training data?", default=False):
29
- examples_writer.write_examples()
30
- estimated_tokens = sum(examples_writer.count_tokens())
26
+ ExamplesWriter(openai=openai, tumblr=tumblr).main()
31
27
 
32
- fine_tuner = FineTuner(openai, tumblr, estimated_tokens)
28
+ fine_tuner = FineTuner(openai=openai, tumblr=tumblr)
33
29
  fine_tuner.print_estimates()
34
30
 
35
31
  message = "Resume monitoring the previous fine-tuning process?" if FlowClass.config.job_id else "Upload data to OpenAI for fine-tuning?"
36
32
  if Confirm.ask(f"{message} [bold]You must do this to set the model to generate drafts from. Alternatively, manually enter a model into the config", default=False):
37
- fine_tuner.fine_tune()
33
+ fine_tuner.main()
38
34
 
39
35
  if Confirm.ask("Generate drafts?", default=False):
40
- DraftGenerator(openai, tumblr).create_drafts()
36
+ DraftGenerator(openai=openai, tumblr=tumblr).main()
@@ -1,13 +1,14 @@
1
1
  from io import TextIOBase
2
2
  from json import dump
3
- from pathlib import Path
3
+ from typing import override
4
4
 
5
5
  from tumblrbot.utils.common import FlowClass, PreviewLive
6
6
  from tumblrbot.utils.models import Post
7
7
 
8
8
 
9
9
  class PostDownloader(FlowClass):
10
- def download(self) -> None:
10
+ @override
11
+ def main(self) -> None:
11
12
  self.config.data_directory.mkdir(parents=True, exist_ok=True)
12
13
 
13
14
  with PreviewLive() as live:
@@ -50,9 +51,3 @@ class PostDownloader(FlowClass):
50
51
  completed += len(posts)
51
52
  else:
52
53
  return
53
-
54
- def get_data_paths(self) -> list[Path]:
55
- return list(map(self.get_data_path, self.config.download_blog_identifiers))
56
-
57
- def get_data_path(self, blog_identifier: str) -> Path:
58
- return (self.config.data_directory / blog_identifier).with_suffix(".jsonl")
@@ -1,27 +1,21 @@
1
1
  from collections.abc import Generator
2
- from dataclasses import dataclass
3
2
  from json import loads
4
3
  from math import ceil
5
- from pathlib import Path
6
4
  from re import search
7
- from typing import IO
5
+ from typing import IO, override
8
6
 
9
7
  import rich
10
8
  from more_itertools import chunked
11
9
  from openai import BadRequestError
12
- from rich.console import Console
13
10
  from rich.prompt import Confirm
14
- from tiktoken import encoding_for_model, get_encoding
15
11
 
16
12
  from tumblrbot.utils.common import FlowClass, PreviewLive
17
13
  from tumblrbot.utils.models import Example, Post
18
14
 
19
15
 
20
- @dataclass
21
16
  class ExamplesWriter(FlowClass):
22
- data_paths: list[Path]
23
-
24
- def write_examples(self) -> None:
17
+ @override
18
+ def main(self) -> None:
25
19
  self.config.examples_file.parent.mkdir(parents=True, exist_ok=True)
26
20
 
27
21
  with self.config.examples_file.open("w", encoding="utf_8") as fp:
@@ -52,16 +46,22 @@ class ExamplesWriter(FlowClass):
52
46
  fp.write(f"{example.model_dump_json()}\n")
53
47
 
54
48
  def get_custom_prompts(self) -> Generator[tuple[str, str]]:
55
- if self.config.custom_prompts_file.exists():
56
- text = self.config.custom_prompts_file.read_text(encoding="utf_8")
57
- yield from loads(text).items()
49
+ self.config.custom_prompts_file.parent.mkdir(parents=True, exist_ok=True)
50
+ self.config.custom_prompts_file.touch(exist_ok=True)
51
+
52
+ with self.config.custom_prompts_file.open("r", encoding="utf_8") as fp:
53
+ for line in fp:
54
+ data: dict[str, str] = loads(line)
55
+ yield from data.items()
58
56
 
59
57
  def get_filtered_posts(self) -> Generator[Post]:
60
- posts = list(self.get_valid_posts())
58
+ posts = self.get_valid_posts()
61
59
 
62
60
  if Confirm.ask("[gray62]Remove posts flagged by the OpenAI moderation? This can sometimes resolve errors with fine-tuning validation, but is slow.", default=False):
63
- removed = 0
64
61
  chunk_size = self.get_moderation_chunk_limit()
62
+ posts = list(posts)
63
+ removed = 0
64
+
65
65
  with PreviewLive() as live:
66
66
  for chunk in live.progress.track(
67
67
  chunked(posts, chunk_size),
@@ -80,7 +80,7 @@ class ExamplesWriter(FlowClass):
80
80
  yield from posts
81
81
 
82
82
  def get_valid_posts(self) -> Generator[Post]:
83
- for data_path in self.data_paths:
83
+ for data_path in self.get_data_paths():
84
84
  with data_path.open(encoding="utf_8") as fp:
85
85
  for line in fp:
86
86
  post = Post.model_validate_json(line)
@@ -96,19 +96,3 @@ class ExamplesWriter(FlowClass):
96
96
  if match := search(r"(\d+)\.", message):
97
97
  return int(match.group(1))
98
98
  return test_n
99
-
100
- def count_tokens(self) -> Generator[int]:
101
- # Based on https://cookbook.openai.com/examples/how_to_count_tokens_with_tiktoken
102
- # and https://cookbook.openai.com/examples/chat_finetuning_data_prep
103
- try:
104
- encoding = encoding_for_model(self.config.base_model)
105
- except KeyError as error:
106
- encoding = get_encoding("o200k_base")
107
- Console(stderr=True, style="logging.level.warning").print(f"[Warning] Using encoding '{encoding.name}': {''.join(error.args)}\n")
108
-
109
- with self.config.examples_file.open(encoding="utf_8") as fp:
110
- for line in fp:
111
- example = Example.model_validate_json(line)
112
- yield len(encoding.encode("assistant")) # every reply is primed with <|start|>assistant<|message|>
113
- for message in example.messages:
114
- yield 4 + len(encoding.encode(message.content))
@@ -1,25 +1,27 @@
1
- from dataclasses import dataclass
1
+ from collections.abc import Generator
2
2
  from datetime import datetime
3
3
  from textwrap import dedent
4
- from time import sleep, time
4
+ from time import sleep
5
+ from typing import override
5
6
 
6
7
  import rich
7
8
  from openai.types.fine_tuning import FineTuningJob
8
9
  from rich import progress
10
+ from rich.console import Console
9
11
  from rich.prompt import Confirm
12
+ from tiktoken import encoding_for_model, get_encoding
10
13
 
11
14
  from tumblrbot.utils.common import FlowClass, PreviewLive
15
+ from tumblrbot.utils.models import Example
12
16
 
13
17
 
14
- @dataclass
15
18
  class FineTuner(FlowClass):
16
- estimated_tokens: int
17
-
18
19
  @staticmethod
19
20
  def dedent_print(text: str) -> None:
20
21
  rich.print(dedent(text).lstrip())
21
22
 
22
- def fine_tune(self) -> None:
23
+ @override
24
+ def main(self) -> None:
23
25
  job = self.create_job()
24
26
 
25
27
  self.dedent_print(f"""
@@ -39,8 +41,6 @@ class FineTuner(FlowClass):
39
41
 
40
42
  live.progress.update(
41
43
  task_id,
42
- total=job.estimated_finish - job.created_at if job.estimated_finish else None,
43
- completed=time() - job.created_at,
44
44
  description=f"Fine-tuning: [italic]{job.status.replace('_', ' ').title()}[/]...",
45
45
  )
46
46
 
@@ -102,16 +102,33 @@ class FineTuner(FlowClass):
102
102
  self.config.fine_tuned_model = job.fine_tuned_model or ""
103
103
 
104
104
  def print_estimates(self) -> None:
105
- total_tokens = self.config.expected_epochs * self.estimated_tokens
105
+ estimated_tokens = sum(self.count_tokens())
106
+ total_tokens = self.config.expected_epochs * estimated_tokens
106
107
  cost_string = self.get_cost_string(total_tokens)
107
108
 
108
109
  self.dedent_print(f"""
109
- Tokens {self.estimated_tokens:,}:
110
+ Tokens {estimated_tokens:,}:
110
111
  Total tokens for [bold orange1]{self.config.expected_epochs}[/] epoch(s): {total_tokens:,}
111
112
  Expected cost when trained with [bold purple]{self.config.base_model}[/]: {cost_string}
112
113
  NOTE: Token values are approximate and may not be 100% accurate, please be aware of this when using the data.
113
114
  [italic red]Amelia, Mutsumi, and Marin are not responsible for any inaccuracies in the token count or estimated price.[/]
114
115
  """)
115
116
 
117
+ def count_tokens(self) -> Generator[int]:
118
+ # Based on https://cookbook.openai.com/examples/how_to_count_tokens_with_tiktoken
119
+ # and https://cookbook.openai.com/examples/chat_finetuning_data_prep
120
+ try:
121
+ encoding = encoding_for_model(self.config.base_model)
122
+ except KeyError as error:
123
+ encoding = get_encoding("o200k_base")
124
+ Console(stderr=True, style="logging.level.warning").print(f"[Warning] Using encoding '{encoding.name}': {''.join(error.args)}\n")
125
+
126
+ with self.config.examples_file.open(encoding="utf_8") as fp:
127
+ for line in fp:
128
+ example = Example.model_validate_json(line)
129
+ yield len(encoding.encode("assistant")) # every reply is primed with <|start|>assistant<|message|>
130
+ for message in example.messages:
131
+ yield 4 + len(encoding.encode(message.content))  # message content plus ~4 tokens of per-message framing
132
+
116
133
  def get_cost_string(self, total_tokens: int) -> str:
117
134
  return f"${self.config.token_price / 1000000 * total_tokens:.2f}"
@@ -1,13 +1,18 @@
1
1
  from random import random
2
+ from typing import override
2
3
 
3
4
  import rich
5
+ from rich.prompt import IntPrompt
4
6
 
5
7
  from tumblrbot.utils.common import FlowClass, PreviewLive
6
8
  from tumblrbot.utils.models import Post
7
9
 
8
10
 
9
11
  class DraftGenerator(FlowClass):
10
- def create_drafts(self) -> None:
12
+ @override
13
+ def main(self) -> None:
14
+ self.config.draft_count = IntPrompt.ask("How many drafts should be generated?", default=self.config.draft_count)
15
+
11
16
  message = f"View drafts here: https://tumblr.com/blog/{self.config.upload_blog_identifier}/drafts"
12
17
 
13
18
  with PreviewLive() as live:
@@ -24,10 +29,7 @@ class DraftGenerator(FlowClass):
24
29
 
25
30
  def generate_post(self) -> Post:
26
31
  content = self.generate_content()
27
- post = Post(
28
- content=[content],
29
- state="draft",
30
- )
32
+ post = Post(content=[content])
31
33
  if tags := self.generate_tags(content):
32
34
  post.tags = tags.tags
33
35
  return post
@@ -39,16 +41,15 @@ class DraftGenerator(FlowClass):
39
41
  model=self.config.fine_tuned_model,
40
42
  ).output_text
41
43
 
42
- return Post.Block(type="text", text=content)
44
+ return Post.Block(text=content)
43
45
 
44
46
  def generate_tags(self, content: Post.Block) -> Post | None:
45
47
  if random() < self.config.tags_chance: # noqa: S311
46
48
  return self.openai.responses.parse(
47
49
  text_format=Post,
48
- input=f"Extract the most important subjects from the following text:\n\n{content.text}",
49
- instructions="You are an advanced text summarization tool. You return the requested data to the user as a list of comma-separated strings.",
50
+ input=content.text,
51
+ instructions=self.config.tags_developer_message,
50
52
  model=self.config.base_model,
51
- temperature=0.5,
52
53
  ).output_parsed
53
54
 
54
55
  return None
@@ -1,25 +1,37 @@
1
- from dataclasses import dataclass
1
+ from abc import abstractmethod
2
2
  from random import choice
3
3
  from typing import ClassVar, Self, override
4
4
 
5
5
  from openai import OpenAI
6
+ from pydantic import ConfigDict
6
7
  from rich._spinners import SPINNERS
7
8
  from rich.console import RenderableType
8
9
  from rich.live import Live
9
10
  from rich.progress import MofNCompleteColumn, Progress, SpinnerColumn, TimeElapsedColumn
10
11
  from rich.table import Table
11
12
 
12
- from tumblrbot.utils.config import Config
13
+ from tumblrbot.utils.config import Config, Path
14
+ from tumblrbot.utils.models import FullyValidatedModel
13
15
  from tumblrbot.utils.tumblr import TumblrSession
14
16
 
15
17
 
16
- @dataclass
17
- class FlowClass:
18
+ class FlowClass(FullyValidatedModel):
19
+ model_config = ConfigDict(arbitrary_types_allowed=True)
20
+
18
21
  config: ClassVar = Config() # pyright: ignore[reportCallIssue]
19
22
 
20
23
  openai: OpenAI
21
24
  tumblr: TumblrSession
22
25
 
26
+ @abstractmethod
27
+ def main(self) -> None: ...
28
+
29
+ def get_data_paths(self) -> list[Path]:
30
+ return list(map(self.get_data_path, self.config.download_blog_identifiers))
31
+
32
+ def get_data_path(self, blog_identifier: str) -> Path:
33
+ return (self.config.data_directory / blog_identifier).with_suffix(".jsonl")
34
+
23
35
 
24
36
  class PreviewLive(Live):
25
37
  def __init__(self) -> None:
@@ -31,7 +31,7 @@ class Config(BaseSettings):
31
31
  data_directory: Path = Field(Path("data"), description="Where to store downloaded post data.")
32
32
 
33
33
  # Writing Examples
34
- custom_prompts_file: Path = Field(Path("custom_prompts.json"), description="Where to read in custom prompts from.")
34
+ custom_prompts_file: Path = Field(Path("custom_prompts.jsonl"), description="Where to read in custom prompts from.")
35
35
 
36
36
  # Writing Examples & Fine-Tuning
37
37
  examples_file: Path = Field(Path("examples.jsonl"), description="Where to output the examples that will be used to fine-tune the model.")
@@ -53,6 +53,7 @@ class Config(BaseSettings):
53
53
  upload_blog_identifier: str = Field("", description="The identifier of the blog which generated drafts will be uploaded to. This must be a blog associated with the same account as the configured Tumblr secret tokens.")
54
54
  draft_count: PositiveInt = Field(150, description="The number of drafts to process. This will affect the number of tokens used with OpenAI")
55
55
  tags_chance: NonNegativeFloat = Field(0.1, description="The chance to generate tags for any given post. This will incur extra calls to OpenAI.")
56
+ tags_developer_message: str = Field("You will be provided with a block of text, and your task is to extract a very short list of the most important subjects from it.", description="The developer message used to generate tags.")
56
57
 
57
58
  @override
58
59
  @classmethod
@@ -3,6 +3,7 @@ from typing import Annotated, Any, ClassVar, Literal, Self, override
3
3
 
4
4
  import rich
5
5
  from keyring import get_password, set_password
6
+ from niquests import Session
6
7
  from openai import BaseModel
7
8
  from pwinput import pwinput
8
9
  from pydantic import ConfigDict, PlainSerializer, SecretStr
@@ -69,6 +70,8 @@ class Tokens(FullyValidatedModel):
69
70
  if not all(self.tumblr.model_dump(mode="json").values()) or Confirm.ask("Reset Tumblr API tokens?", default=False):
70
71
  self.tumblr.client_key, self.tumblr.client_secret = self.online_token_prompt("https://tumblr.com/oauth/apps", "consumer key", "consumer secret")
71
72
 
73
+ # Re-parent requests_oauthlib's OAuth1Session from requests onto niquests' Session so the OAuth flow uses niquests.
+ OAuth1Session.__bases__ = (Session,)
74
+
72
75
  with OAuth1Session(
73
76
  self.tumblr.client_key.get_secret_value(),
74
77
  self.tumblr.client_secret.get_secret_value(),
@@ -95,13 +98,13 @@ class Tokens(FullyValidatedModel):
95
98
 
96
99
  class Post(FullyValidatedModel):
97
100
  class Block(FullyValidatedModel):
98
- type: str = ""
101
+ type: str = "text"
99
102
  text: str = ""
100
103
  blocks: list[int] = [] # noqa: RUF012
101
104
 
102
105
  timestamp: SkipJsonSchema[int] = 0
103
106
  tags: Annotated[list[str], PlainSerializer(",".join)] = [] # noqa: RUF012
104
- state: SkipJsonSchema[Literal["published", "queued", "draft", "private", "unapproved"]] = "published"
107
+ state: SkipJsonSchema[Literal["published", "queued", "draft", "private", "unapproved"]] = "draft"
105
108
 
106
109
  content: SkipJsonSchema[list[Block]] = [] # noqa: RUF012
107
110
  layout: SkipJsonSchema[list[Block]] = [] # noqa: RUF012
@@ -1,21 +1,18 @@
1
1
  from dataclasses import dataclass
2
2
  from typing import Self
3
3
 
4
- from niquests import HTTPError, Session
5
- from requests import Response
6
- from requests_cache import CacheMixin
4
+ from niquests import HTTPError, PreparedRequest, Response, Session
7
5
  from requests_oauthlib import OAuth1
8
6
 
9
7
  from tumblrbot.utils.models import Post, Tokens
10
8
 
11
9
 
12
10
  @dataclass
13
- class TumblrSession(CacheMixin, Session): # pyright: ignore[reportIncompatibleMethodOverride, reportIncompatibleVariableOverride]
11
+ class TumblrSession(Session):
14
12
  tokens: Tokens
15
13
 
16
14
  def __post_init__(self) -> None:
17
- CacheMixin.__init__(self, use_cache_dir=True)
18
- Session.__init__(self, happy_eyeballs=True)
15
+ super().__init__(multiplexed=True, happy_eyeballs=True)
19
16
 
20
17
  self.auth = OAuth1(**self.tokens.tumblr.model_dump(mode="json"))
21
18
  self.hooks["response"].append(self.response_hook)
@@ -24,21 +21,22 @@ class TumblrSession(CacheMixin, Session): # pyright: ignore[reportIncompatibleM
24
21
  super().__enter__()
25
22
  return self
26
23
 
27
- def response_hook(self, response: Response, **_: object) -> None:
28
- try:
29
- response.raise_for_status()
30
- except HTTPError as error:
31
- if response.text:
32
- error.add_note(response.text)
33
- raise
24
+ def response_hook(self, response: PreparedRequest | Response) -> None:
25
+ if isinstance(response, Response):
26
+ try:
27
+ response.raise_for_status()
28
+ except HTTPError as error:
29
+ if response.text:
30
+ error.add_note(response.text)
31
+ raise
34
32
 
35
33
  def retrieve_published_posts(self, blog_identifier: str, after: int) -> Response:
36
34
  return self.get(
37
35
  f"https://api.tumblr.com/v2/blog/{blog_identifier}/posts",
38
36
  params={
39
- "after": after,
37
+ "after": str(after),
40
38
  "sort": "asc",
41
- "npf": True,
39
+ "npf": str(True),
42
40
  },
43
41
  )
44
42
 
File without changes